  Matthew McConaughey is a well-known actor who inspires different things in different people. He's attractive, but intense, easily switching from charming to violently wild. He was for a while the quintessential romantic comedy actor until he suddenly wasn't. He is active socially and spiritually, always coming up with some emotional speech about one thing or another. So what would his autobiography be like?

  Well, it was good, but it felt a little too rehearsed even as it was constructed as a collection of unfiltered anecdotes from the author's life. The title, Greenlights, comes from the understanding that some things in life are opportunities for the future. They don't push you forward, but give you the green light to go; they are open doors. Each of the stories in the book represents a greenlight for McConaughey, regardless of how amazing, fantastical, horrible or dangerous it sounds.

  In short, his crazy parents instilled in him the moral fortitude to choose and then stick with that choice. From a household in which all emotions were heightened - there is a story where his parents have a fight involving a broken nose and knife swinging, followed by wild sex, for example - Matthew learns to live and love wild but mind the consequences. And then, with a series of greenlight events, he gets into acting and fame.

  The way the author says it, his character was formed before he became famous. If you believe he does crazy stuff now, it's because he was always like this and he chose to do it. The wet dreams that also stand for premonitions on what he has to explore, the naked stoned bongo playing at night, the choice to not accept any rom-com scripts anymore, which led to him not working for two years until Hollywood finally managed to see him as an actor and give him other roles.

  Same thing with love. He had a lot of temporary relationships and sex until he met the woman he saw as "the one", wooed her, married her and they have been together ever since. For the role that won him the Oscar, he lost 30% of his weight. I know this doesn't a performance make, but it shows the way McConaughey makes a choice and sticks with it.

  A relevant quote: "What is success to me? Continue to ask yourself that question. How are you prosperous? What is your relevance? Your answer may change over time and that's fine but do yourself this favor – whatever your answer is, don't choose anything that would jeopardize your soul"

  Now, did I like the book? I feel conflicted about it, as it provided insights into how the man thinks and feels, but also felt bland and processed. At no time did I feel I was really understanding the person or experiencing things together with him. As an autobiography it wasn't very effective, but then again the book was never meant to be that, more a statement of belief on how life gives you paths to choose from.

  Bottom line: good, inspiring work, but less personal than I would have liked.


  Prions are a fascinating subject that we know almost nothing about. They are misfolded proteins that somehow proliferate inside our bodies and kill us with 100% efficiency. The diseases produced by prions are the deadliest there are, yet we know little about how prions multiply and even how they manage to kill us.

  Prions, a Challenge for Science, Medicine and Public Health System is a 2001 summary of works on prions. What does it say? That we don't know much. Then it gets terribly technical and, as I am not a biologist, I've decided to stop reading instead of pretending I understand anything. But I did scour the Internet for newer sources of knowledge and my finding is... that we still know shit about prions!

  So, what does misfolding mean? Prions are proteins, long chain molecules that sit at the border of chemistry and mechanics: the way these molecules come to rest (fold) determines both their chemical and mechanical properties. Somehow (and no one actually knows how) a protein that is manufactured by our bodies (and whose function we don't really know) gets folded in the wrong way, leading to behavior that is detrimental to the body (in ways we don't really know). There is also a mechanism that turns proper proteins into this toxic form, much like a zombie invasion at nanoscale. And we don't know how it works.

  Why does it matter? Well, diseases such as scrapie in sheep, chronic wasting disease (CWD) in deer, bovine spongiform encephalopathy (BSE) in cattle (commonly known as "mad cow disease") and Creutzfeldt–Jakob disease (CJD), its variant (vCJD), Gerstmann–Sträussler–Scheinker syndrome (GSS), fatal familial insomnia (FFI), and kuru in humans are caused by prions. There is evidence that the same mechanism that destroys the nervous system in these diseases is also at fault with Alzheimer's. A biological weapon using prions, assuming it affects a large portion of a population, would kill 100% of the victims, decades after the weapon was used and without spreading the disease further.

  And why are prions so deadly? Because the immune system doesn't react to them. They are not viruses, they don't have nucleic acids, they are really tiny proteins that slowly but surely spread throughout the body and end up killing the brain of the victim (not unlike zombies, hmm).

  The leading expert in prions is Stanley B. Prusiner, the man who coined the term prion in 1982. The idea that a disease could be spread by just proteins was developed in the 1960s by people such as biophysicist John Stanley Griffith. Prusiner did a lot of work, but even so, there is little we understand about this, more than half a century later.

  Bottom line: prions are fascinating and show us how much more we have to learn about biochemistry and disease vectors. Even if we hypothesized their existence in the '60s, we still don't know much about how they work. I welcome more research on the subject, as diseases caused by prions, even if rare, are deadly without exception.


  Lightchaser is a story about complacency, one that faults not our silly human nature but an external alien influence. The "Domain" is the place where numerous human cultures live on thousands of planets and the Lightchaser is a starship pilot who moves from planet to planet giving and collecting "collars" which give the wearer extra status and record all of their life experiences. Further down the line, the company that builds the ships, controlled exclusively by AIs, will buy the collars.

  I love Peter F. Hamilton stories because they go far, they allow the reader to dream of futures so vast and amazing that our own existence seems static and impossible to explain. Lightchaser is tiny, self contained, but it breathes the same concept. The book is not the best he wrote and in fact it is a short story with a singular idea, but I enjoyed it. As a fan of Hamilton's, I am certainly going to read everything he ever wrote at some point.

  Bottom line: good hard short sci-fi story. A light (heh!) read.


  The Library of the Unwritten starts with a magical library which holds unwritten books, whether because their author has not written them yet or never got to before they died. An interesting premise, but one which made me afraid it was similar to Sorcery of Thorns. And I feel bad about it, but I did profile A. J. Hackwith before I started reading, which also filled me with apprehension (authors using their initials only make me suspicious). But the book was great! I am so glad to have been proven wrong.

  I don't want to spoil anything, but suffice it to say that humans, muses, demons, angels, fallen angels, elder gods and literary characters who took shape in the real world all appear in the book.

  While the story is a young adult fantasy, the writing is compelling, the characters complex and the plot quite refreshing and captivating. But I have to say I liked the characters the most: tortured souls (befitting a story which takes place in Hell most of the time) who have to resolve their issues in order to grow. All good characters are like that and inspire readers everywhere to do the same. The book also avoided getting mired in occult legislature (like defining a series of rules or a specific magic system) or pushing some gender agenda, instead focusing solely on story and characterization, which I applaud.

  Bottom line: not a masterpiece or anything, but one of the best books I've read recently and a very entertaining vacation read.


  I want humanity to spread to the cosmos, to colonize the Moon, Mars, the asteroid belt or anything other than Earth in whatever order possible. Personally, I think asteroids are our best first bet, but it doesn't matter as long as I am presented with a well crafted argument and solution plan. Unfortunately, How We'll Live on Mars is not that.

  Stephen Petranek starts with the old idea that colonizing Mars will be a human endeavor that will bring glory and scientific evolution and the betterment of humanity. It well may be, but as history has demonstrated, no one cares about anyone else and certainly not for "the world"; they care for wealth. Until the ninth chapter, the author fails to provide any inkling of how a colony on Mars would generate wealth and even there he sees it as a port and manufacturing place for resources extracted from asteroids and nothing more.

  I was curious about how Petranek would solve some thorny issues like the chemical composition of the soil, cosmic radiation, medical emergencies and so on. Don't get me wrong, I think with 8 billion people to spare we can afford to lose as many as are needed as long as they volunteer. I am a strong proponent of individual will and agency and so I despise people who stop progress for fear of losing a few lives. But the author provides nothing but wishful thinking and, when faced with a problem he cannot fix with a simplistic solution, he pivots to another, bigger yet unrelated, problem to which he finds even bigger solutions.

  In fact, without solving the basics, like how to get there in one piece and how to support life once we get there, chapters about terraforming Mars (in centuries!!) are completely useless.

  I like Stephen Petranek's optimism. It inspires me to want to look at space colonization more carefully, find solutions and finally do it. However, when that scrutiny is turned on the book itself, only dust remains. This book is more like a science fiction story from a guy who didn't know how to write fiction and not a realistic manual on how to achieve human expansion on Mars.

  Bottom line: I want us to get to Mars, and quickly, but this book is nothing but daydreaming.


  In a world where humans have solved the issues with biological-electronic interfacing you have people, electronically enhanced people, biologically enhanced robots and robots. One of these part biological robots is thinking for itself and... that's the story in All Systems Red. Some corporate shenanigans, some shooting, some world building, but in the end I wasn't charmed by the characters, the idea or the world itself. Probably it all becomes better in the next (at least five) books written by Martha Wells in the same series, but I don't think I am going to follow through.

  Don't get me wrong, I enjoyed reading the book. It was fun, it was pulp, it was short, but I didn't feel that need for more when it ended.

  One of the things that turn me off from AI stories is when the AIs act and feel and think exactly like a human. In this book in particular, this makes sense somewhat, because the main character is a mix of electronics and biological tissue, but I felt no real difference between the bio-robot and the robo-human characters. System AIs were stupid and robotic while Murderbot is watching TV shows for fun because... it has a skin?

  I can only assume that further down the line they discover it's a Robocop-like situation, that might fix this obvious issue with the story, but frankly I don't care.

  Bottom line: a short fun read that led me nowhere, but was good while on vacation.


  I only remember about Ready Player One that it was fun and pleasant to read, with kids exploring a virtual universe of cultural references to reach the magical MacGuffin. Ready Player Two is almost none of that, instead being boring, by the numbers and most of it written as exposition. It's like Sorento tried to write a Ready Player One book. I really did not like it. What was Ernest Cline thinking?!

  The exposition writing style is the thing that annoyed me first. You know when you are reading a book and it has to explain something that happened in a previous book, so it takes some well placed paragraphs to talk about that in the past tense? Well, this book starts with a third of it written like this. A complete third of the book is just exposition! And maybe it would have been OK if it were fun exposition, but no. It basically says "remember the good fun we had in the other book and the glorious feeling of victory? Well, that all went to shit immediately".

  It then proceeds to explain (also in past tense) how two incredibly sci-fi things just... happened: first a complete machine to brain interface that is just there and you can put it on your head and then... an interstellar starship?! Which, BTW, does nothing for the entire book. It's an impossible to believe part of the story that then has no impact on it.

  Since the Oasis is basically Meta, with a working metaverse, the author does some lazy mental gymnastics to explain how it is still a good thing and how Wade is not Zuckerberg. Only it fails completely. I mean, we are meant to believe Wade temporarily joins the dark side only to recover later, while still remaining a positive character, but he comes off as a hypocrite who has no actual control over himself or what happens. After reading the first half of the book you hope Zuckerberg is going to take over, because Wade is so much worse. And then, the antagonist and a new quest are revealed by matter-of-factly presenting another impossible technological leap.

  No. This book is a total failure. Every character (including the wonderful do-gooder Samantha, voice of conscience and princess of awesome) is unlikeable, the writing style is amateurish and reads like an accountant explaining in a board meeting what has happened, while the plot is full of holes and deus-ex-machinas. But worst of all, by far, is that the book is not fun at all.


  Clive Thompson is a technology journalist and therefore perfectly positioned to write a book about how digital technology really affects us. Does it destroy the world? No! Instead, it makes it better. Most of the time and if used well. In Smarter Than You Think, we read about how computers take over some of our tasks, then enhance them when used cooperatively, how new ways of thinking, awareness and literacy are unlocked by technology and how education can be used to improve how we use tech which then in turn can be used to upgrade education. So this is one non fiction book that paints technology in a rosy light and looks forward to the future. We need more of these.

  A few things popped up for me while reading this book. First, a quote about teachers and medics. If you reach into the past and you pluck a doctor from 20 years ago and bring them into the present, they will not function well, as they did not keep up to date with the latest discoveries and techniques developed. However, a teacher from 200 years ago can still find a job teaching children. The job hasn't fundamentally changed in centuries... until now. Reading about how good teachers have evolved to make use of digital technology is inspiring.

  Then there was the concept of pluralistic ignorance, where people choose to behave in ways they do not adhere to because they are unaware of the position of the people around them. It was sobering. The book shows how the Internet can help dispel this problem by sharing awareness. That is not the same as "spreading awareness", the governmental and social warrior mindset which requires all people to think alike, but the increase in transparency of what people really think.

  Finally there was a small bit about how pessimistic or negative views are statistically interpreted as more serious, realistic and intelligent than positive ones. Which makes writing the book a bit braver and also explains why everyone is whining all the time.

  Of course, this book was written in 2013. Many things have happened since and the toxicity of public discourse combined with the insidious techniques corporations and groups in power use to manipulate everything can sour even the most optimistic of people. However I found the book still relevant and bringing a fresh sense of hope, without feeling like someone tried to push their worldview down my throat or predict the future for me. Instead it studies the many and often unpredictable ways in which people use technology to make things better.

  I can't say it's a masterpiece, but I enjoyed reading a positive and realistic book like Smarter Than You Think. It was a welcome alternative to the gloom and doom we see directed towards us on a daily basis.


  There is a psychological theory that tries to categorize behavior and personality into three: the Child, the Parent and the Adult. I am not really a specialist (I feel that the word "psychological" is an oxymoron), but in short you get the Child, who feels things and acts on impulse and pleasure and is creative, the Parent, who respects and enforces rituals that hold society together and free individuals from trivial decisions, and the Adult, who tries to do the best to mediate between the other two states by striving towards an objective view of reality.

  The roots of Star Trek, from this point of view, are that of an Adult that sometimes leans towards Parent. The show examines our current beliefs by creating fictional situations where they are put to the test. Characters or even entire societies assume archetypal roles, child-like, parent-like, while the role of the heroic Federation crew is to mediate some sort of understanding between them. As any good sci-fi, it is meant to make people think for themselves.

  No other show makes this mission clearer than Star Trek Discovery, which failed miserably to be Star Trek because it pushed its agenda on the viewer, rather than letting them think for themselves and make their own choice. Star Trek has touched so many controversial subjects, usually without taking things too far, but occasionally doing a brilliant job to inspire introspection.

  For example the Borg, which were always "evil" in their attempt to circumvent individuality and absorb everything and everybody in their megaorganism. Yet, with characters such as Hugh and Seven of Nine, grey areas were explored, culminating, I believe, with the conflict between Seven and Janeway, when her individuality is returned to her, but then her choices to return to the Collective are rejected. I still believe that they could have done a deeper job here, but times being what they were and the show being American, they got pretty far as it is. Personally, I would make an entire show about humans and a Borg-like species only.

  Frustrated by rules and rituals (heh!), Seth MacFarlane, a huge Star Trek fan, decided to stop begging people to let him do a Star Trek show and created his own, borrowing what he could from the original show and improving or changing things to escape the confines of copyright. The Orville was born, a show that is a must see for any Star Trek fan. And I have to admit that when I decided to write this post, I was planning to talk about the differences between shows such as Star Trek Next Generation (and DS9 and especially Voyager), which leans a little too much toward the Parent role, and The Orville, which does a pretty good job being an Adult. But then I changed my mind.

  The reason why I've changed my mind is the story of Topa. If you have not watched The Orville yet, please do so because I am going to spoil it for you.

  OK, so Topa is the female child of a two-male Moclan couple in a society that considers females a genetic aberration. When a female infant is born, they immediately change their sex to male and never tell the children they were born different. How apropos this subject is, a society of homosexual males forcefully trans-forming any female baby, analyzed from our current socio-political point of view. And they did a fantastic job... at the beginning.

  You see, the first part of the story is about the disagreement between one parent and the other about whether they should obey the mandated custom of their home planet, even if they are on a Federation (sorry, Union) ship. You can guess which side the crew was leaning toward, yet they had to accept the decision of the people in the culture that child was born into... which was to proceed with the transformation. A disappointment for our American minded future union of planets, but what an episode finale! And before that, the revelation that the most revered poet of the Moclan culture is actually a female living in secrecy and willing to reveal herself to "fight for the cause".

  The second part is when the femaleness of Topa surfaces and makes her feel she lives in the wrong body. Again a lot of politics and scandal and opinions back and forth. This time, the episode is less ambiguous and I think the writers were actually afraid to do it any other way. Or they were lazy. Because at the end they skirt the law and the agreements between species and they reveal to Topa that she was born a female and immediately revert her to a female state in the same episode. A lot of effort went into making the supportive parent look good and the reticent parent look bad.

  Finally (maybe) the episode I saw today, where the female poet, now leader of a colony of all female Moclans that are protected from their homeworld's wrath by a Union agreement, tries to coopt Topa to be part of the "resistance" and she, hero-pressured, accepts, then almost loses her life at the hands of the evil all male Moclan military. I applauded the way it exposed the hypocrisy of the female leader, using a child to further her agenda and also endangering the entire colony that she was responsible for. However, again I felt like the conflict was resolved too quickly and too swiftly towards what we would accept as agreeable: Topa escapes with her life, the entire Union rejects the Moclan way of life and even the conservative parent makes a comeback complete with a full reversal of his opinions. How is the Union going to keep itself together if they can't accept the local idiosyncrasies of member states?

  And here is where the Parent, Adult, Child analysis feels appropriate. Topa, the child who wants to do what she feels is right and damn the consequences, Klyden the parent who won't renounce his custom and beliefs regardless of who that hurts and Bortus, the other parent - with an entire interstellar Union to support him, who has to find an adult way forward in which harm is minimized.

  I feel like the first episode about Topa lifted Orville above Star Trek shows. I know, blasphemy! How can I discount the eternal greatness of Star Trek? Well, because I compare the whole thing with the Seven of Nine storyline, where the show quickly dismissed her desire to return to the Collective as childish and went full Parent Janeway on her, even working towards a Mother/Daughter dynamic between them to justify it all. The Orville episode looked at individual opinions, cultural clashes, diplomatic discourse, the feelings of everyone involved and made the brave choice to not give the audience what it hoped for. Thus, making them think about the whole thing. Now with the other two episodes, I feel like the writers succumbed to societal pressure to resolve the conflict the only way the viewers would accept. And pronto! Before they #metoo MacFarlane! Or maybe that's just stupid and childish, I don't know. I just liked the first episode so much compared with the "classical" other two.

  I think the PAC (Parent-Adult-Child) model is pretty useful in dissecting these Star Trek-like situations. I find it inspiring that the Adult, which is something people supposedly should strive to achieve psychologically, cannot exist in a vacuum. Without Parents and Children, it has no direction, it's like an AI system without a value function, while the two other roles generate this direction from feeling and instinct (genetics) and experience and tradition (culture). Whenever the crew encounter an alien species and enter the inevitable conflict, they have to not only solve the problem, but also do it in a way that is objectively and morally better, while also catering to their often strong feelings about a subject. Fascinating!

  We must be aware of the attraction we people have for strong authoritative figures that "know what's best", just as we must be aware of how easy solutions that feel good in the moment may have disastrous consequences further down the line. In some way, accepting everything from Picard-like people is almost as dangerous as acting like Q all the time.

  Haven't you ever wondered what a show like Star Trek would be like if situations were actually dangerous, where tech solutions would not solve everything in minutes and the alternatives are run, negotiate, intimidate or attack? What if, when meeting some backwater one planet civilization that sentences your people to death for stamping on a flower, instead of spending one hour to save them using some loophole in the local law system, the crew just armed photon torpedoes and said "Choose a city. Any city. Preferably one that you won't need anymore"? Or if phasers would be set on "cut through stone" whenever firing at an alien lunging towards the crew? Or using any and all technology one finds to increase the tactical advantages of your ship and navy?

  But that's the whole point! Star Trek is not about levelling up, it's about finding yourself with just shitty options and still choosing the one that is most principled and logical for everyone involved. About examining one's preconceptions and reaching not a conclusion, but a point of decision where the viewer can spend some time and think. It's about good writing! Compare that with Kirk on a motorcycle and you realize what the roots of Star Trek are all about.

  I wanted to write a post about how Star Trek treats too many situations as a Parent, probably because it was created by people in the 60s and 70s, and is sometimes too eager to put characters in their place because family (yeah, The Fast and the Furious doesn't have a monopoly on that) and how The Orville is going above that. Then I realized that they are actually doing the same thing, most of the time, with Orville just freshening things up and having a little bit more courage when writing their stories. And I love it!

  Happy Trekking!

  This is a very basic tutorial on how to access Microsoft SQL Server data via SQL queries. Since these are generic concepts, they will be applicable in most other SQL variants out there. My hope is that it will provide the necessary tools to quickly "get into it" without having to read (or understand) too much. Where you go from there is on you.

  There are a lot of basic concepts about SQL, so this post will be pretty long.

Connecting to a database

  Let's start with tooling. To access a database you will need SQL Server Management Studio, in my case version 2022, but I will not do anything complicated with it here, therefore any version will do just fine. I will assume you have it installed already as installation is beyond the scope of the blog post. Starting it will prompt for a connection:

  To connect to the local computer, the server will be either . or (local) or the computer name. You can of course connect to any server and you can specify the "instance" and the port number as well. An instance is a specific named installation of SQL Server, which allows one to have multiple installations (and even versions) of SQL Server on the same machine. In fact, each instance has its own port, so if you specify the port number, the name of the instance is ignored. The default port is usually 1433.

  Example of connection server strings: Computer1\SQLEXPRESS, sql.corporate.com,1433, (local), .

  The image here is from a connection to the local machine using Windows Authentication (your windows user). You can connect using SQL Server Authentication, which means providing a username and a password, or using one of the more modern Azure Active Directory methods.

  I will also assume that the connection parameters are known to you, so let's go to the next step.

  Once connected, the Object Explorer window will display the connection you've opened.

  Expanding the Databases node will show the available databases.

  Expanding a database node we get the objects that are part of the database, the most important being:

  • Tables - where the actual data resides
  • Views - abstractions over more complex queries that behave like tables as much as possible, but with some restrictions
  • Stored Procedures - SQL code that can be executed with parameters and may return data results
  • Functions - SQL code that can be executed and returns a value (which can be scalar, like a number or string, or a table type, etc.)

  In essence they are the equivalent of data stores and code that is executed to use those stores. Views, SPs and functions will not be explained in this post, but feel free to read about them afterwards.

  If one expands a table node, the child nodes will contain various things, the most important of which are:

  • Columns - the names and types of each column in the table
  • Indexes - data structures designed to improve the performance of various ways of accessing the data in the table
  • Constraints and Keys - logical restrictions and relationships between tables

  Tables are kind of like Excel sheets: they have rows (data records) and columns (record properties). The power of SQL is that it gives you a way to declare what you want from tabular representations of data and get the results quickly and efficiently.

  Last thing I want to show from the graphical interface is right clicking on a table node, which shows multiple options, including generating simple operations on the table, the CRUD (Create, Read, Update, Delete) operations mostly, which in SQL are called INSERT, SELECT, UPDATE and DELETE respectively.

  The keywords are traditionally written in all caps; I am not shouting at you. Depending on your preferences and of course the coding standards that apply to your project, you can capitalize SQL code however you like. SQL keywords are case insensitive, and with a case-insensitive collation (the usual default in SQL Server) so are table and column names.
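  As a quick illustration of the capitalization point, the two statements below should be treated as the same query by the engine. MyTable is only a placeholder name, and the second statement assumes a database with a case-insensitive collation, so that object names are matched regardless of case.

```sql
-- keywords in upper case, the traditional style
SELECT * FROM MyTable;

-- the same statement in lower case; keywords are never case sensitive,
-- while matching 'mytable' to MyTable assumes a case-insensitive collation
select * from mytable;
```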

Anyway, whatever you choose to "script", it is going to open a so-called query window and show you the text of the query. You then have the option of executing it. Normally no one uses the UI to generate scripts except for getting the column names in order for SELECT or INSERT operations. Most of the time you will just right click on a database and choose New Query or select a database and press Ctrl-N, with the same result.

Getting data from tables

Finally we get to doing something. The operation to read data from SQL is called SELECT. One can specify the columns to be returned or just use * to get them all. It is good practice to always specify the column names in production code, even if you intend to select all columns, as the output of the query will not change if we add more columns in the future. However, we will not be discussing software projects, just how to get or change the data using SQL server, so let's get to it.

The simplest select query is: SELECT * FROM MyTable, which will return all columns of all records of the table. Note that MyTable is the name of a table and the least specific way of accessing that table. The same query can be written as: SELECT * FROM [MyDatabase].[dbo].[MyTable], specifying the database name, the schema name (default one is dbo, but your database can use multiple ones) and only then the table name.

The square bracket syntax is usually not required, but might be needed in special cases, like when a column has the same name as a keyword or if an object has spaces or commas in it (never a good idea, but a distinct possibility), for example: SELECT [Stupid,column] FROM [Stupid table name with spaces]. Here we are selecting a badly named column from a badly named table. Removing the square brackets would result in a syntax error.

In the example above we selected stuff from table CasesSince100 and we got tabular results for every record and the columns defined in the table. But that is not really useful. What we want to do when getting data is:

  • getting data from specific columns
  • formatting the data for our purposes
  • filtering the data on conditions
  • grouping the data
  • ordering the results

So here is a more complex query:

-- everything after two dashes in a line is a comment, ignored by the engine
/* there is also
   a multiline comment syntax */
SELECT TOP 10                            -- just the first 10 records
    c.Entity as Country,                 -- Entity will be returned with the name Country
    CAST(c.[Date] as Date) as [Date],    -- Unfortunate naming, as Date is also a type
    c.cases as Cases                     -- capitalized alias
FROM CasesSince100 c                     -- source for the data, aliased as 'c'
WHERE c.Code='ROU'                       -- conditions to filter by
    AND c.[Date]>'2020-03-01'
ORDER BY c.[Date] DESC                   -- ordering in descending order

  The query above will return at most 10 rows, only for Romania, for dates after March 1st, 2020, ordered from the newest to the oldest. Data returned will be the country name, the date (which was originally a DATETIME and now is cast to a timeless DATE type) and the number of cases.

  Note that I have aliased all columns, so the resulting table has columns named as the aliases. I've also aliased the table name as 'c', which helps in several ways. First of all, Intellisense works better and faster when specifying the table name. All you have to do is type c. and the list of columns will pop up and be filtered as you type. The second reason will become apparent when I am talking about updating and deleting. For the moment just remember that it's a good idea to alias your tables.

  You can alias a table by specifying a name to call it by next to its own name, optionally using 'as', like SELECT ltn.* FROM Schema.LongTableName as ltn. It helps differentiate between ambiguous names (like when two joined tables have columns with the same name), simplifies the code for long named tables and helps with code completion. Note that once a table is aliased, SQL Server expects the alias, not the original table name, as the column prefix (using the original name results in a "could not be bound" error). One can skip the prefix altogether if the column names are unambiguous.
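As a small sketch of the ambiguous names case (JOINs are explained later on; the table variables @Employee and @Department and their rows are made up for illustration, declared inline so the script runs on its own):

```sql
-- Hypothetical data, not part of the tables used elsewhere in this post
DECLARE @Employee TABLE (Id INT, Name NVARCHAR(100), DepartmentId INT);
DECLARE @Department TABLE (Id INT, Name NVARCHAR(100));

INSERT INTO @Employee VALUES (1, N'Ana', 10), (2, N'Bob', 20);
INSERT INTO @Department VALUES (10, N'Sales'), (20, N'IT');

-- Both tables have columns called Id and Name, so each use must say
-- which table it means; the short aliases keep that readable
SELECT
    e.Name as Employee,
    d.Name as Department
FROM @Employee e
INNER JOIN @Department d
ON e.DepartmentId = d.Id;
```

Removing the e. and d. prefixes from Name would make the query fail with an ambiguous column name error.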

Of course these are trivial examples. The power of SQL is that you can get information from multiple sources, aggregate them and structure your database for quick access. More advanced concepts are JOINs and indexes, and I hope you will read until I get there, but for now let's just go through the very basics.

Here is another query that groups and aggregates data:

SELECT TOP 10                            -- top 10 results
    c.Entity as Country,                 -- country name
    SUM(CAST(c.cases as INT)) as Cases   -- cases is text, so we transform it to int
FROM CasesSince100 c
WHERE YEAR([Date])=2020                  -- condition applies a function to the date
GROUP BY c.Entity                        -- groups by country
HAVING SUM(CAST(c.cases as INT))<1000000 -- this is filtering on grouped values
ORDER BY SUM(CAST(c.cases as INT)) DESC  -- order on sum of cases

This query will show us the top 10 countries and the total sum of cases in year 2020, but only for countries where that total is less than a million. There is a lot to unpack here:

  • cases column is declared as NVARCHAR(150) meaning Unicode strings of varied length, but at most 150 characters, so we need to cast it to INT (integer) to be able to apply summing to it
  • there are two different ways of filtering: WHERE, which applies to the data before grouping, then HAVING, which applies to data after grouping
  • filtering and grouping work with unaliased columns, so even if Entity is returned as Country, I cannot do WHERE Country='Romania'; ORDER BY is the exception, as it is applied after the columns are selected and can use the aliases
  • grouping returns one row for each combination of values in the columns the grouping is done on and computes some sort of aggregation for it (in the case above, a sum of cases per country)

Here are the results:

Let me rewrite this in a way that is more readable using what is called a subquery, in other words a query from which I will query once again:

SELECT TOP 10
    Country,
    SUM(Cases) as Cases
FROM (
    SELECT
        c.Entity as Country,
        CAST(c.cases as INT) as Cases,
        YEAR([Date]) as [Year]
    FROM CasesSince100 c
) x
WHERE [Year]=2020
GROUP BY Country
HAVING SUM(Cases)<1000000
ORDER BY Cases DESC

Note that I still have to use SUM(Cases) in the HAVING clause. I could have grouped it in another subquery and selected again and so on. In order to select from a subquery, you need to name it (in our case, we named it x). Also I selected Country from x, which I could have also written as x.Country. As I said before, table prefixes are optional if the column name is unambiguous. Also you may notice that I've given a name to the summed column. I could have skipped that, but that would mean the resulting column would have had no name and the query itself would have been difficult to use in code (extracted column values would have had to be retrieved by index and not by name, which is never recommended).
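As a sketch of that "another subquery" option, here is the same idea with one more level of nesting. It runs against a small made-up table variable: the @Cases name, its rows and the lower threshold of 1000 are assumptions, chosen only so that the script works on its own.

```sql
-- Hypothetical stand-in for CasesSince100, with just a few rows
DECLARE @Cases TABLE (Entity NVARCHAR(100), [Date] DATE, cases NVARCHAR(150));
INSERT INTO @Cases VALUES
    (N'A', '2020-03-01', N'10'),
    (N'A', '2020-03-02', N'20'),
    (N'A', '2019-12-31', N'500'),  -- filtered out by the year condition
    (N'B', '2020-03-01', N'700'),
    (N'B', '2020-03-02', N'800');  -- B totals 1500, over the threshold

SELECT TOP 10
    Country,
    Cases
FROM (
    SELECT
        Country,
        SUM(Cases) as Cases
    FROM (
        SELECT
            c.Entity as Country,
            CAST(c.cases as INT) as Cases,
            YEAR(c.[Date]) as [Year]
        FROM @Cases c
    ) x
    WHERE [Year]=2020
    GROUP BY Country
) y
WHERE Cases<1000       -- a plain WHERE on the grouped value replaces HAVING
ORDER BY Cases DESC;
-- returns a single row: A, 30
```

Whether this reads better than HAVING is a matter of taste; the point is only that a grouped result can be queried like any other table.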

If you think about it, the order of the clauses in a SELECT operation has a major flaw: you are supposed to write SELECT, then specify what columns you want and only then specify where you want the columns to be read from. This makes code completion problematic, which is why the in-code query language for .NET (LINQ) puts the selection at the end. But even so there is a trick:

  • SELECT * and then complete the query
  • go back and replace the * with the column names you want to extract (you will now have Intellisense code completion)
  • the alias of the tables will now come in handy, but even without aliases one can press Ctrl-Space and get a list of possible values to select

Defining tables and inserting data

Before we start inserting information, let's create a table:

CREATE TABLE Food(
    Id INT IDENTITY(1,1) PRIMARY KEY,
    FoodName NVARCHAR(100),
    Quantity INT
)

One important concept in SQL is the primary key. It is a good idea in most cases that your tables have a primary key which identifies each record uniquely and also makes them easy to reference. Let me give you an example. Let's assume that we would put no Id column in our Food table and then we would accidentally add cheese twice. How would you reference the first record as opposed to the second? How would you delete the second one?

A primary key is actually just a special case of a unique index, clustered by default. We will get to indexes later, so don't worry about that yet. It is enough to remember that finding records by the primary key is faster (more efficient) than by any other column combination and that it is the way records are uniquely identified.

The IDENTITY(1,1) notation tells SQL Server that we will not insert values in that column and instead let it put values starting with 1, then increasing with 1 each time. That functionality will become clear when we INSERT data in the table:

INSERT INTO Food(FoodName,Quantity)
VALUES('Bread',1),('Cheese',1),('Pork',2),('Chilly',10)

Selecting from our Food table now gets us these results:

As you can see, we've inserted four records, by only specifying two out of three columns - we skipped Id. Yet SQL has filled the column with values from 1 to 4, starting with 1 and incrementing each time with 1.

The VALUES syntax is specifying inline data, but we could, in fact, insert into a table the results of a query, something like this:

INSERT INTO Food(FoodName,Quantity)
SELECT [Name],Quantity
FROM Store
WHERE [Type]='Food'

There is another syntax for insert that is useful with what are called temporary tables, tables created for the purpose of your session (lifetime of the query window) and that will automatically disappear once the session is over. It looks like this:

SELECT FoodName,Quantity
INTO #temp
FROM Food

This will create a table (temporary because of the # sign in front of it) that will have just FoodName and Quantity as columns, then proceed to save the data there. This table will have neither a primary key nor any type of index and it will work as a simple dump of the data selected. You can add indexes later or alter the table in any way you want, as it works just like a regular table. While this is a convenient syntax (you don't have to write a CREATE TABLE query or think of the type of columns), it has limited usefulness and I recommend not using it in application code.

Just as one creates a table, there are DROP TABLE and ALTER TABLE statements that delete or change the structure of the table, but we won't go into that.

Changing existing data

So now we have some data in a table that we have defined. We will see how the alias syntax I discussed in the SELECT section will come in handy. In short, I propose you use just two basic syntax forms for all CRUD operations: one for INSERT and one for SELECT, UPDATE and DELETE.

But how can you use the same syntax for statements that are so different, I hear you ask? Let me give you some examples of similar code doing just that before I dive into what each operation does.

SELECT *
FROM Food f
WHERE f.Id=4

UPDATE f
SET f.Quantity=9
FROM Food f
WHERE f.Id=4

DELETE FROM f
FROM Food f
WHERE f.Id=4

The last two lines of all operations are exactly the same. These are simple queries, but imagine you have a complex one to craft. The first thing you want to see is that you are updating or deleting the right thing, therefore it makes sense to start with a SELECT query instead, then change it to a DELETE or UPDATE when satisfied. You see I UPDATE and DELETE using the alias I gave the table.

When first learning UPDATE and DELETE statements, one usually gets to this syntax:

UPDATE Food     -- using the table name is cumbersome if in a complex query
SET Quantity=9  -- unless using Food.Quantity and Food.Id
WHERE Id=4      -- you don't get easy Intellisense

DELETE          -- this seems a lot easier to remember
FROM Food       -- but it only works with one table in a simple query
WHERE Id=4

I've outlined some of the reasons I don't use this syntax in the comments, but the most important reason why one shouldn't use it except for very simplistic cases is that you are trying to create a query to destructively change the data in the database and there is no foolproof way to duplicate the same logic in a SELECT query to verify what you are going to change. I've seen people (read that as: I was dumb enough to do it myself) who created an entirely different SELECT statement to verify what they would do, then realized to their horror that the statements were not equivalent and they had updated or deleted the wrong thing!

OK, let's look at UPDATE and DELETE a little closer.

One of the useful clauses for these statements is, just like with SELECT, the TOP clause, which instructs SQL to affect just a finite number of rows. However, because TOP was added later for write operations, you need to enclose the value (or variable) in parentheses. For SELECT you can skip the parentheses for constant values (you still need them for variables):

DELETE TOP (10) FROM MyTable

Another interesting clause, that frankly I have not used a lot, but is essential in some specific cases, is OUTPUT. One can delete or update some rows and at the same time get the rows they have changed. The reason being that first of all in a DELETE statement the rows will be gone, so you won't be able to SELECT them again. But even in an UPDATE operation, the rows chosen to be updated by a query may not be the same if you execute them again. 

SQL does not guarantee the order of rows unless specifically using ORDER BY. So if you execute SELECT TOP 10 * FROM MyTable twice, you may get two different results. Moreover, between the time you UPDATE some rows and you SELECT them in another query, things may change because of other processes running at the same time on the same data.
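A minimal sketch of the difference, using a throwaway table variable (the @MyTable name and its rows are made up for illustration):

```sql
DECLARE @MyTable TABLE (Id INT PRIMARY KEY, Value INT);
INSERT INTO @MyTable VALUES (3, 30), (1, 10), (2, 20);

-- No ORDER BY: the engine is free to return any two rows,
-- even if in practice it often returns the same ones
SELECT TOP 2 * FROM @MyTable;

-- ORDER BY on a unique column: always the rows with Id 1 and 2
SELECT TOP 2 * FROM @MyTable ORDER BY Id;
```

Only the second query has a result you can rely on, which is why TOP without ORDER BY is best kept for quick peeks at the data.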

So let's say we have some form of Invoices and Items tables that reference each other. You want to delete one invoice and all the items associated with it. There is no way of telling SQL to DELETE from multiple tables at the same time, so you DELETE the invoice, OUTPUT its Id, then delete the items for that Id.

CREATE TABLE #deleted(Id INT) -- temporary table, but explicitly created

DELETE FROM Invoice 
OUTPUT Deleted.Id    -- here Deleted is a keyword
INTO #deleted        -- the Id from the deleted rows will be stored here
WHERE Id=2           -- and can even be restored from there

DELETE 
FROM Item
WHERE InvoiceId IN (  -- assuming Item points to its invoice through an InvoiceId column
  SELECT Id FROM #deleted
)  -- a subquery used in a DELETE statement

-- same thing can be written as:
DELETE FROM i
FROM Item i
INNER JOIN #deleted d  -- I will get to JOINs soon
ON i.InvoiceId=d.Id

I have been informed that the INTO syntax is confusing and indeed it is:

  • SELECTing INTO will create a new table with results and throw an exception if the table already exists. The table will have the names and types of the selected values, which may be what one wants for a quick data dump, but it may also cause issues. For example the following query would throw an exception:
    SELECT 'Blog' as [Name]
    INTO #temp
    
    INSERT INTO #temp([Name]) -- String or binary data would be truncated error
    VALUES('Siderite')

    because the Name column of the new temporary table would be VARCHAR(4), just like 'Blog', and 'Siderite' would be too long

  • UPDATEing or DELETEing with OUTPUT INTO will require an existing table with the same number and types of columns as the columns specified in the OUTPUT clause and will throw an exception if it doesn't exist
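Here is a sketch of OUTPUT INTO used with an UPDATE, capturing both the old and the new value of each changed row. The #Food and #changes temporary tables and their columns are made up for this example, so that it can run on its own:

```sql
CREATE TABLE #Food (Id INT PRIMARY KEY, FoodName NVARCHAR(100), Quantity INT);
INSERT INTO #Food VALUES (1, N'Bread', 1), (2, N'Cheese', 1);

-- The target table must exist before OUTPUT ... INTO can fill it
CREATE TABLE #changes (Id INT, OldQuantity INT, NewQuantity INT);

UPDATE f
SET f.Quantity = f.Quantity + 5
OUTPUT Deleted.Id, Deleted.Quantity, Inserted.Quantity  -- values before and after
INTO #changes (Id, OldQuantity, NewQuantity)
FROM #Food f
WHERE f.Id = 2;

SELECT * FROM #changes;  -- one row: 2, 1, 6

DROP TABLE #changes;
DROP TABLE #Food;
```

In an UPDATE, Deleted holds the row as it was before the change and Inserted the row as it is after it.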

One can use derived values in UPDATE statements, not just constants. One can reference the columns already existing or use any type of function that would be allowed in a similar SELECT statement. For example, here is a query to get the tax value of each row and the equivalent update to store it into a separate column:

SELECT
    i.Price, 
    i.TaxPercent, 
    i.Price*i.TaxPercent/100.0 as Tax  -- best practice: SELECT first
FROM Item i

UPDATE i
SET Tax = i.Price*i.TaxPercent/100.0   -- UPDATE next; 100.0 avoids truncation if TaxPercent is an INT
FROM Item i

So here we first do a SELECT, to see if the values we have and calculate are correct and, if satisfied, we UPDATE using the same logic. Always SELECT before you change data, so you know you are changing the right thing.

There is another trick to help you work safely, one that works on small volumes of data, which involves transactions. Transactions are atomic operations (all or nothing) which are defined by starting them with BEGIN TRANSACTION and are finalized with either COMMIT TRANSACTION (save the changes to the database) or ROLLBACK TRANSACTION (revert changes to the database). Transactions are an advanced concept also, so read about it yourself, but remember one can do the following:

  • open a new query window
  • execute BEGIN TRANSACTION
  • do almost anything in the query window
  • if satisfied with the result execute COMMIT TRANSACTION
  • if any issue with what you've done execute ROLLBACK TRANSACTION to undo the changes

Note that this only applies for stuff you do in that query window. Also, all of these operations are being saved in the log of the database, so this works only with small amounts of data. Attempting to do this with large amounts of data will practically duplicate it on disk and take a long time to execute and revert.
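A minimal sketch of that safety net, on a throwaway temporary table (the #Food table here is created just for this example):

```sql
CREATE TABLE #Food (Id INT PRIMARY KEY, FoodName NVARCHAR(100));
INSERT INTO #Food VALUES (1, N'Bread'), (2, N'Cheese');

BEGIN TRANSACTION;

DELETE FROM #Food;            -- oops, forgot the WHERE clause
SELECT COUNT(*) FROM #Food;   -- 0 inside the transaction

ROLLBACK TRANSACTION;         -- undo everything since BEGIN TRANSACTION

SELECT COUNT(*) FROM #Food;   -- 2 again

DROP TABLE #Food;
```

Had the SELECT shown what we wanted, COMMIT TRANSACTION instead of ROLLBACK TRANSACTION would have made the change permanent.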

The NULL value

We need a quick primer on what NULL is. NULL is a placeholder for a value that was not set or is considered unknown. It's a non-value. It is similar to null in C# or JavaScript, but with some significant differences applicable to SQL only. For example, a NULL value (an oxymoron for sure) will never be equal to (or not equal to) or less than or greater than anything. One might expect these two queries to return, between them, all the rows in a table: SELECT * FROM MyTable WHERE Value>5 and SELECT * FROM MyTable WHERE Value<=5. But if any rows have NULL for Value, then they will not appear in either of the query results. That applies to the negation operator NOT as well: SELECT * FROM MyTable WHERE NOT (Value>5) will not return them either.

The behavior of equality comparisons with NULL can be changed by using SET ANSI_NULLS OFF, but the option is deprecated and I am yet to see a database that has ever been set up like this.

To check if a value is or is not NULL, one uses the IS and IS NOT syntax :)

SELECT *
FROM MyTable
WHERE MyValue IS NOT NULL

The NULL concept will be used a lot in the next chapter.
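To make the comparison rules concrete, here is a small sketch on a made-up table variable with four rows, one of them NULL:

```sql
DECLARE @MyTable TABLE (Value INT NULL);
INSERT INTO @MyTable VALUES (1), (5), (10), (NULL);

SELECT COUNT(*) FROM @MyTable WHERE Value > 5;        -- 1 (the 10)
SELECT COUNT(*) FROM @MyTable WHERE Value <= 5;       -- 2 (1 and 5)
SELECT COUNT(*) FROM @MyTable WHERE NOT (Value > 5);  -- 2, the NULL row is still missing
SELECT COUNT(*) FROM @MyTable WHERE Value IS NULL;    -- 1
SELECT COUNT(*) FROM @MyTable WHERE Value = NULL;     -- 0 with the default ANSI_NULLS ON
```

The first two counts add up to three, not four: the NULL row is only reachable through IS NULL.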

Combining data from multiple sources

We finally get to JOIN operations. In most scenarios, you have a database containing multiple tables, with intricate connections between them. Invoices that have items, customers, the employee that processed it, dates, departments, store quantities, etc., all referencing something. Integrating data from multiple tables is a complex subject, but I will touch on just the most common and important parts:

  • INNER JOIN
  • OUTER JOIN
  • EXISTS
  • UNION / UNION ALL

Let's write a query that displays the name of employees and their department. I will show the CREATE TABLE statements, too, in order to see where we get the data from:

CREATE TABLE Employee (
  EmployeeId INT,          -- Best practice: descriptive column names
  FirstName NVARCHAR(100),
  LastName NVARCHAR(100),
  DepartmentId INT)        -- Best practice: use same name for the same thing

CREATE TABLE Department (
  DepartmentId INT,        -- same thing here
  DepartmentName NVARCHAR(100)
)

SELECT
    CONCAT(FirstName,' ',LastName) as Employee,
    DepartmentName
FROM Employee e
INNER JOIN Department d
ON e.DepartmentId=d.DepartmentId

Here it is: INNER JOIN, a clause that combines the data from two tables based ON a condition or series of conditions. For each row of Employee we are looking for the corresponding row of Department. In this example, one employee belongs to only one department, but a department can hold multiple employees. It's what we call a "one to many relationship". One can have "one to one" or "many to many" relationships as well. That is very important when trying to gauge performance (and number of returned rows).

Our query will only find at most one department for each employee, so for 10 employees we will get at most 10 rows of data. Why do I say "at most"? Because the DepartmentId for some employees might not have a corresponding department row in the Department table. INNER JOIN will not generate records if there is no match. But what if I want to see all employees, regardless if their department exists or not? Then we use an OUTER JOIN:

SELECT
    CONCAT(FirstName,' ',LastName) as Employee,
    DepartmentName
FROM Employee e
LEFT OUTER JOIN Department d
ON e.DepartmentId=d.DepartmentId

This will generate results for each Employee and their Department, but show a NULL (without value) result if the department does not exist. In this case LEFT is used to define that there will be rows for each record in the left table (Employee). We could have used RIGHT, in which case we would have rows for each department and NULL employee values for departments that have no employees. There is also the FULL OUTER JOIN option, in which case we will get both departments with NULL employees if none are attached and employees with NULL departments in case the department does not exist (or the employee is not assigned, meaning DepartmentId is NULL).

Note that the keywords INNER and OUTER are completely optional. JOIN is the same thing as INNER JOIN and LEFT JOIN is the same as LEFT OUTER JOIN. I find that specifying them makes the code more readable, but that's a personal choice.

The OUTER JOINs are sometimes used in a non intuitive way to find records that have no match in another table. Here is a query that shows employees that are not assigned to a department:

SELECT
    CONCAT(FirstName,' ',LastName) as Employee
FROM Employee e
LEFT OUTER JOIN Department d
ON e.DepartmentId=d.DepartmentId
WHERE d.DepartmentId IS NULL

Until now, we talked about the WHERE clause as a filter that is applied first (before grouping) so one might intuitively have assumed that the WHERE clauses are applied immediately on the tables we get the data from. If that were the case, then this query would never return anything, because every Department will have a DepartmentId. Instead, what happens here is the tables are LEFT JOINed, then the WHERE clause applies next. In the case of unassigned employees, the department id or name will be NULL, so that is what we are filtering on.

So what happens above is:

  • the Employee table is LEFT JOINed with the Department table
  • for each employee (left) there will be rows that contain the values of the Employee table rows and the values of any matched Department table rows
  • in the case there is no match, NULL values will be returned for the Department table for all columns
  • when we filter by Department.DepartmentId being NULL we don't mean any Department that doesn't have an Id (which is impossible) but any Employee row with no matching Department row, which will have a NULL value where the Department.DepartmentId value would have been in case of a match.
  • not matching can happen for two reasons: Employee.DepartmentId is NULL (meaning the employee has not been assigned to a department) or the value stored there has no associated Department (the department may have been removed for some reason)
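The steps above can be sketched with a few made-up rows that cover both reasons for not matching (the table variables and the names in them are assumptions for illustration):

```sql
DECLARE @Department TABLE (DepartmentId INT, DepartmentName NVARCHAR(100));
DECLARE @Employee TABLE (EmployeeId INT, FirstName NVARCHAR(100), DepartmentId INT);

INSERT INTO @Department VALUES (1, N'Sales');
INSERT INTO @Employee VALUES
    (1, N'Ana', 1),     -- matches Sales
    (2, N'Bob', NULL),  -- never assigned to a department
    (3, N'Cid', 99);    -- points to a department that no longer exists

SELECT e.FirstName, d.DepartmentId, d.DepartmentName
FROM @Employee e
LEFT OUTER JOIN @Department d
ON e.DepartmentId = d.DepartmentId
WHERE d.DepartmentId IS NULL;
-- returns Bob and Cid, both with NULL in the Department columns
```

Without the WHERE clause, the same query returns all three employees, with Sales filled in for Ana.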

Also, note that if we are joining tables on some condition we have to be extra careful with NULL values. Here is how one would join two tables on VARCHAR columns being equal even when NULL:

SELECT *
FROM Table1 t1
INNER JOIN Table2 t2
ON (t1.Value IS NULL AND t2.Value IS NULL) OR t1.Value=t2.Value

SELECT *
FROM Table1 t1
INNER JOIN Table2 t2
ON ISNULL(t1.Value,'')=ISNULL(t2.Value,'')

The second syntax seems promising, doesn't it? It is more readable for sure. Unfortunately, it introduces some assumptions and also decreases the performance of the query (we will talk about performance later on). The assumption is that if Value is an empty string, then it's the same as having no value (being NULL). One could use something like ISNULL(Value,'--NULL--') but now it starts looking worse.

There are other ways of joining two tables (or queries, or table variables, or table functions, etc.), for example by using the IN or the EXISTS/NOT EXISTS clauses or subqueries. Here are some examples:

SELECT *
FROM Table1
WHERE MyValue IN (SELECT MyValue FROM Table2)

SELECT *
FROM Table1
WHERE MyValue = (SELECT TOP 1 MyValue FROM Table2 WHERE Table1.MyValue=Table2.MyValue)

SELECT *
FROM Table1
WHERE NOT EXISTS(SELECT * FROM Table2 WHERE Table1.MyValue=Table2.MyValue)

These are less readable, usually have terrible performance and may not return what you expect them to return.
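One assumed illustration of "may not return what you expect" is the negated form, NOT IN, when the subquery returns a NULL (the table variables and their values are made up):

```sql
DECLARE @Table1 TABLE (MyValue INT);
DECLARE @Table2 TABLE (MyValue INT);
INSERT INTO @Table1 VALUES (1), (2), (3);
INSERT INTO @Table2 VALUES (1), (NULL);

-- One might expect 2 and 3, but this returns no rows at all:
-- comparing anything with the NULL gives unknown, never true
SELECT t1.MyValue
FROM @Table1 t1
WHERE t1.MyValue NOT IN (SELECT t2.MyValue FROM @Table2 t2);

-- NOT EXISTS only checks whether a matching row was found,
-- so it returns 2 and 3
SELECT t1.MyValue
FROM @Table1 t1
WHERE NOT EXISTS (SELECT * FROM @Table2 t2 WHERE t2.MyValue = t1.MyValue);
```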

When I was learning SQL, I thought using a JOIN would be optimal in all cases and subqueries in the WHERE clause were all bad, no exception. That is, in fact, false. There is a specific case where it is better to use a subquery in WHERE instead of JOIN, and that is when trying to find records that have at least one match. It is better to use EXISTS because it is short-circuiting logic which leads to better performance.

Here is an example with different syntax for achieving the same goal:

SELECT DISTINCT d.DepartmentId
FROM Department d
INNER JOIN Employee e
ON e.DepartmentId=d.DepartmentId

SELECT d.DepartmentId
FROM Department d
WHERE EXISTS(SELECT * FROM Employee e WHERE e.DepartmentId=d.DepartmentId)

Here, the search for departments with employees will return the same thing, but in the first situation it will get all employees for all departments, then list the department ids that had employees, while in the second query the department will be returned the moment just one employee that matches is found.

There is another way of combining data from two sources and that is to UNION two or multiple result sets. It is the equivalent of taking rows from multiple sources of the same type and showing them together in the same result set.

Here is a dummy example:

SELECT 1 as Id
UNION
SELECT 2
UNION
SELECT 2

And we execute it and...

What happened? Shouldn't there have been three values? Somehow, when copy pasting the silly example, you added two identical values. UNION will add only distinct values to the result set. Using UNION ALL will show all three values.

SELECT 1 as Id
UNION ALL
SELECT 2
UNION ALL
SELECT 2

SELECT DISTINCT Id FROM (
  SELECT 1 as Id
  UNION ALL
  SELECT 2
  UNION ALL
  SELECT 2
) x

The first query will return 1,2,2 and the second will be the equivalent of the UNION one, returning 1 and 2. Note the DISTINCT keyword.

My recommendation is to never use UNION and instead use UNION ALL everywhere, unless it makes some kind of sense for a very specific scenario, because the operation to DISTINCT values is expensive, especially for many and/or large columns. When results are supposed to be different anyway, UNION and UNION ALL will return the same output, but UNION is going to perform one more pointless distinct operation.

After learning about JOIN, my request to start with SELECT queries and only then modify them to be UPDATE or DELETE begins to make more sense. Take a look at this query:

UPDATE d
SET ToFindManager=1
--SELECT *
FROM Department d
LEFT OUTER JOIN Employee e
ON d.DepartmentId=e.DepartmentId
AND e.[Role]='Manager'
WHERE e.EmployeeId IS NULL

This will set ToFindManager in departments that have no corresponding manager. But if you select the text from SELECT * on and then execute, you will get the results that you are going to update. Same query, executing by selecting different sections of it will either verify or perform the operation.

Indexes and relationships. Performance.

We have seen how to define tables, how to insert, select, update and delete records from them. We've also seen how to integrate data from multiple sources to get what we want. The SQL engine will take our queries, try to understand what we meant, optimize the execution, then give us the results. However, with large enough data, no amount of query optimization will help if the relationships between tables are not properly defined and tables are not prepared for the kind of queries we will execute.

This requires an introduction to indexes, which is a rather advanced idea, both in terms of how to create, use, debug and profile, but also as a computer science concept. I will try to stick to the basics here, and you go and get more in depth from here.

What is an index? It's a separate data structure that will allow quick access to specific parts of the original data. A table of contents in a blog post is an index. It allows you to quickly jump to the section of the post without having to read it all. There are many types of indexes and they are used in different ways.

We've talked about the primary key: (unless specified differently) it's a CLUSTERED, UNIQUE index. It can be on a single column or a combination of columns. Normally, the primary key will be the preferred way to find or join records on, as it physically arranges the table records in order and ensures only one record has a particular primary key.

The difference between CLUSTERED and NONCLUSTERED indexes is that a table can have only one clustered index, which will determine the physical order of record data on the disk. As an example, let's consider a simple table with a single integer column called X. If there is a clustered index on X, then when inserting new values, data will be moved around on the disk to account for this:

CREATE TABLE Test(X INT PRIMARY KEY)

INSERT INTO Test VALUES (10),(1),(20)

INSERT INTO Test VALUES (2),(3)

DELETE FROM Test WHERE X=1

After inserting 10,1 and 20, data on the disk will be in the order of X: a 1, followed by a 10, then a 20. When we insert values 2 and 3, 10 and 20 will have to be moved so that 2 and 3 are inserted. Then, after deleting 1, all data will be moved so that the final physical order of the data (the actual file on the disk holding the database data) will be 2,3,10,20. This will help optimize not only finding the rows, but also efficiently reading them from disk (disk access is the most expensive operation for a database). 

Note: deletion is working a little differently in reality, but in theory this is how it would work.

Nonclustered indexes, on the other hand, keep their own order and reference the records from the original data. For such a simple example as above, the result would be almost identical, but imagine you have the Employee table and you create a nonclustered index on LastName. This means that behind the scenes, a data structure that looks like a table is created, which is ordered by LastName and contains another column for EmployeeId (which is the primary key, the identifier of an employee). When you do SELECT * FROM Employee ORDER BY LastName, the index will be used to first get a list of ids, then select the values from them.

A UNIQUE index also ensures that no two records will have the same combination of values as defined therein. In the case of the primary key, there cannot be two records with the same id. But one can imagine something like:

CREATE UNIQUE INDEX IX_Employee_Name ON Employee(FirstName,LastName)

INSERT INTO Employee (FirstName,LastName)
VALUES('Siderite','Blog')

IX_Employee_Name is a nonclustered unique index on FirstName and LastName. If you execute the insert, it will work the first time, but fail the second time:
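The failure can be sketched on a throwaway temporary table (the #Employee table is created here only so the script runs on its own), catching the error instead of letting it stop the script:

```sql
CREATE TABLE #Employee (FirstName NVARCHAR(100), LastName NVARCHAR(100));
CREATE UNIQUE INDEX IX_Employee_Name ON #Employee(FirstName, LastName);

INSERT INTO #Employee (FirstName, LastName)
VALUES (N'Siderite', N'Blog');       -- works

BEGIN TRY
    INSERT INTO #Employee (FirstName, LastName)
    VALUES (N'Siderite', N'Blog');   -- same combination again
END TRY
BEGIN CATCH
    -- a duplicate key in a unique index is reported as error 2601
    SELECT ERROR_NUMBER() as ErrorNumber, ERROR_MESSAGE() as ErrorMessage;
END CATCH

SELECT COUNT(*) FROM #Employee;      -- still 1

DROP TABLE #Employee;
```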

There is another type of structure called a foreign key, which is a constraint rather than an index. It should be used to define logical relationships between tables. For the Department table, DepartmentId should be a primary key, but in the Employee table, DepartmentId should be defined as a foreign key connecting to the column in the Department table.

Important note: a foreign key defines the relationship, but doesn't index the column. A separate index should be added on the Employee.DepartmentId column for performance reasons.

I don't want to get into foreign keys here. Suffice to say that once this relationship is defined, some things can be achieved automatically, like deleting corresponding Item records by the engine when deleting Invoices. Also the performance of JOIN queries increases.
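Still, a minimal sketch may help. The DemoInvoice and DemoItem tables below are hypothetical, named so that they don't clash with anything else in this post, and the cascading delete is just one of the behaviors a foreign key can be configured with:

```sql
CREATE TABLE DemoInvoice (
    InvoiceId INT PRIMARY KEY
);

CREATE TABLE DemoItem (
    ItemId INT PRIMARY KEY,
    InvoiceId INT NOT NULL,
    CONSTRAINT FK_DemoItem_DemoInvoice FOREIGN KEY (InvoiceId)
        REFERENCES DemoInvoice(InvoiceId)
        ON DELETE CASCADE          -- deleting an invoice deletes its items
);

-- The foreign key does not index the column, so add the index explicitly
CREATE INDEX IX_DemoItem_InvoiceId ON DemoItem(InvoiceId);

INSERT INTO DemoInvoice VALUES (1), (2);
INSERT INTO DemoItem VALUES (10, 1), (11, 1), (12, 2);

DELETE FROM DemoInvoice WHERE InvoiceId = 1;
SELECT * FROM DemoItem;            -- only item 12 is left

DROP TABLE DemoItem;               -- the referencing table has to go first
DROP TABLE DemoInvoice;
```

With the constraint in place, inserting an item for an invoice that does not exist fails as well, which is the other half of what the relationship guarantees.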

Indexes can be used not only on equality, but also other more complex cases: numerical ranges, prefixes, etc. It is important to understand how they are structured, so you know when to use them.

Let's consider the IX_Employee_Name index. The index is practically creating a tree structure on the concatenation of the first and last name of the employee and stores the primary key columns for the table for reference. It will work great for increasing performance of a query like SELECT * FROM Employee ORDER BY FirstName or SELECT * FROM Employee WHERE FirstName LIKE 'Sid%'. However it will not work for LastName queries or contains queries like SELECT * FROM Employee ORDER BY LastName or SELECT * FROM Employee WHERE FirstName LIKE '%derit%'.

That's important because sometimes simpler queries will take more resources than more complicated ones. Here is a dumb example:

CREATE INDEX IX_Employee_Dumb ON Employee(
    FirstName,
    DepartmentId,
    LastName
)

SELECT *
FROM Employee e
WHERE e.FirstName='Siderite'
  AND e.LastName='Blog'

SELECT *
FROM Employee e
WHERE e.FirstName='Siderite'
  AND e.LastName='Blog'
  AND e.DepartmentId=1

The index we create is called IX_Employee_Dumb and it creates a data structure to help find rows by FirstName, DepartmentId and LastName in that order. 

For some reason, in our employee table there are a lot of people called Siderite, but with different departments and last names. The first query will use the index to find all Siderite employees (fast), then look into each and check if LastName is 'Blog' (slow). The second query will directly find the Siderite Blog employee from the department with id 1 (fast), because it uses all columns in the index. As you can see, the order of columns in the index is important, because without the DepartmentId in the WHERE clause, only the first part of the index, for FirstName, can be used. In the last query, because we specify all columns, the entire index can be used to efficiently locate the matching rows.
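
One possible fix, sketched here with a made-up index name, is to put the columns that both queries filter on first. With this order, the first query can seek on FirstName and LastName together, and the second can still use all three columns.

```sql
-- Hypothetical alternative to IX_Employee_Dumb: same columns, different order.
CREATE INDEX IX_Employee_LessDumb ON Employee(
    FirstName,
    LastName,
    DepartmentId
)
```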

Note 2022-09-06: Partitioning a table (an advanced concept) takes precedence over indexes. I had a situation where a table was partitioned on column RowDate into 63 partitions. The primary key was RowId, but when you SELECTed on RowId, there were 63 index seeks performed. If queried on RowId AND RowDate, it went to the containing partition and did only one index seek inside it. So be careful with partitioning. It only provides a benefit if you query on the columns you partition on.
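
A sketch of the two query shapes from that note. The table name dbo.PartitionedRows, the column types and the values are assumptions made for illustration only.

```sql
DECLARE @rowId BIGINT = 42;
DECLARE @rowDate DATE = '2022-09-06';

-- Only the key: the engine may have to seek inside every partition.
SELECT * FROM dbo.PartitionedRows
WHERE RowId = @rowId;

-- Key plus partitioning column: the engine can go straight to one partition.
SELECT * FROM dbo.PartitionedRows
WHERE RowId = @rowId
  AND RowDate = @rowDate;
```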

One more way of optimizing queries is using the INCLUDE clause. Imagine that Employee is a table with a lot of columns. On the disk, each record is taking a lot of space. Now, we want to optimize the way we get just FirstName and LastName when searching in a department:

SELECT FirstName,LastName
FROM Employee
WHERE DepartmentId=@departmentId

That @ syntax is used for variables and parameters. As a general rule, any values you send to an SQL query should be parameterized. So in C#, don't do var sql = "SELECT * FROM MyTable WHERE Id="+id; instead, do var sql="SELECT * FROM MyTable WHERE Id=@id" and add an @id parameter when running the query.
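
The same idea can be sketched in T-SQL itself with sp_executesql, which takes the query text, the parameter declarations and the parameter values as separate arguments. The value 1 is just a sample department id.

```sql
-- The query text never contains the value; it is passed as a typed parameter.
EXEC sp_executesql
    N'SELECT FirstName, LastName FROM Employee WHERE DepartmentId = @departmentId',
    N'@departmentId INT',
    @departmentId = 1;
```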

So, for the query that selects FirstName and LastName by DepartmentId, SQL will do the following:

  • use an index for DepartmentId if any (fast)
  • find the EmployeeId
  • read the (large) records of each employee from the table (slow)
  • extract and return the first and last name for each

But add this index and there is no need to even go to the table:

CREATE INDEX IX_Employee_DepWithNames
  ON Employee(DepartmentId)
  INCLUDE(FirstName,LastName)

What this will do is add the values of FirstName and LastName to the data inside the index and, if only selecting values from the include list, return them from the index directly, without having to read records from the initial table.

Note that DepartmentId is used to locate rows (in WHERE and JOIN ON clauses) while FirstName and LastName are the columns one SELECTs.
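
As a rough illustration, assuming the IX_Employee_DepWithNames index exists, the first query below can be answered from the index alone, while the second still has to go to the table for the remaining columns.

```sql
DECLARE @departmentId INT = 1;

-- Covered: DepartmentId is the index key, FirstName and LastName are included.
SELECT FirstName, LastName
FROM Employee
WHERE DepartmentId = @departmentId;

-- Not covered: SELECT * asks for columns that are not stored in the index.
SELECT *
FROM Employee
WHERE DepartmentId = @departmentId;
```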

Indexes are a very complex concept and I invite you to examine it at length. It might even be fun.

When indexes are bad

Before I close, let me tell you where indexes are NOT recommended.

One might think that adding an index for each type of query would be a good thing and in some scenarios it might, but as usual in database work, it depends. What performance you gain for finding records in SELECT, UPDATE and DELETE statements, you lose with INSERT, UPDATE and DELETE data changes.

As I explained before, indexes are basically hidden tables themselves. Slight differences, but the data they contain is similar, organized in columns. Whenever you change or add data, these indexes will have to be updated, too. It's like writing in multiple tables at the same time and it affects not only the execution time, but also the disk space.

In my opinion, the index and table structure of a database depends mostly on whether you intend to read a lot from it or write a lot to it. And of course, everybody will scowl and say: "I want both! High performance read and write". My recommendation is to separate the two cases as much as possible.

  • You want to insert a lot of data and often? Use large tables with many columns and no indexes, not even primary keys sometimes.
  • You want to update a lot of data and often? Use the same tables to insert the modifications you want to perform.
  • You want to read a lot of data and often? Use small read only tables, well defined, normalized data, clear relationships between tables, a lot of indexes.
  • Have a background process to get inserts and updates and translate them into read only records
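
A very rough sketch of that separation follows. All table and index names here (EmployeeChange, EmployeeRead, IX_EmployeeRead_Department) are hypothetical, and a real background process would need more care around concurrency and error handling.

```sql
-- Write side: a heap with no indexes and no primary key, so inserts are cheap.
CREATE TABLE EmployeeChange (
    EmployeeId   INT           NOT NULL,
    FirstName    NVARCHAR(100) NOT NULL,
    LastName     NVARCHAR(100) NOT NULL,
    DepartmentId INT           NOT NULL,
    ChangedAt    DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
);

-- Read side: keyed and indexed for the queries that will be run against it.
CREATE TABLE EmployeeRead (
    EmployeeId   INT           NOT NULL PRIMARY KEY,
    FirstName    NVARCHAR(100) NOT NULL,
    LastName     NVARCHAR(100) NOT NULL,
    DepartmentId INT           NOT NULL
);

CREATE INDEX IX_EmployeeRead_Department
    ON EmployeeRead(DepartmentId)
    INCLUDE(FirstName, LastName);

-- Background step, run periodically: apply the latest change per employee
-- to the read table, then remove the changes that were processed.
DECLARE @cutoff DATETIME2 = SYSUTCDATETIME();

BEGIN TRANSACTION;

WITH Latest AS (
    SELECT EmployeeId, FirstName, LastName, DepartmentId,
           ROW_NUMBER() OVER (PARTITION BY EmployeeId
                              ORDER BY ChangedAt DESC) AS rn
    FROM EmployeeChange
    WHERE ChangedAt <= @cutoff
)
MERGE EmployeeRead AS tgt
USING (SELECT EmployeeId, FirstName, LastName, DepartmentId
       FROM Latest
       WHERE rn = 1) AS src
    ON tgt.EmployeeId = src.EmployeeId
WHEN MATCHED THEN
    UPDATE SET FirstName    = src.FirstName,
               LastName     = src.LastName,
               DepartmentId = src.DepartmentId
WHEN NOT MATCHED BY TARGET THEN
    INSERT (EmployeeId, FirstName, LastName, DepartmentId)
    VALUES (src.EmployeeId, src.FirstName, src.LastName, src.DepartmentId);

DELETE FROM EmployeeChange
WHERE ChangedAt <= @cutoff;

COMMIT TRANSACTION;
```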

Writing data and reading data, from the SQL engine perspective, are very very different things. They might as well be different software and indeed some companies use one technology to insert data (like NoSQL databases) and another to read it.

Conclusion

I hope the post hasn't been too long and that it will help you when beginning with SQL. Please leave any feedback that you might have; the purpose of this blog is to help people and every perspective helps.

SQL is a very interesting idea and has changed the way people think of data access. However, it has become so complex that most people are still confused even after years of working with it. Every year new features are being added and new ideas are put forward. Yet there are a few concepts, a foundation if you will, that will get you most of the way there. This is what I have tried to distil here. Hope I succeeded.

  I was attempting to optimize an SQL process that was cleaning records from a big table. There are a multitude of ways of doing this, but the pattern that I had adopted for the last few similar tasks was to delete rows in batches using the TOP (@rowCount) syntax. And it had all worked fine until then, but now my "optimization" increased the run time from 6 minutes to 2 hours! Humbled (or more like humiliated) I started to analyze what was going on.
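
  For reference, the batch pattern looks roughly like this. It is a minimal sketch: the table dbo.BigTable, the RowDate column and the cutoff value are placeholders, not the actual objects from my task.

```sql
DECLARE @rowCount INT = 4900;          -- batch size
DECLARE @cutoff DATE = '2020-01-01';   -- placeholder cleanup condition

WHILE 1 = 1
BEGIN
    -- Delete at most @rowCount matching rows per iteration.
    DELETE TOP (@rowCount)
    FROM dbo.BigTable
    WHERE RowDate < @cutoff;

    -- A partial (or empty) batch means there is nothing left to delete.
    IF @@ROWCOUNT < @rowCount BREAK;
END
```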

  First thing I did was to SET STATISTICS IO ON. Then I ran the cleaning task again. And lo and behold, there was a row reporting access to an object that was not part of the query itself. What was going on? At first I thought that I was using a VIEW somewhere, one that I had thought was a table, but no, there was no reference to that object anywhere. But when I looked for that object, it was a view!

  The VIEW in question was a view with SCHEMABINDING, on which several indexes had then been created. That explained it all. If you ever attempted to create an index on a view, you probably got the error "Cannot create index on view, because the view is not schema bound" and then you investigated what that entailed (and probably gave up because of all the restrictions). But in that first moment, when you thought "all I have to do is add WITH SCHEMABINDING and I can index my views!", it seemed like a good idea. It might even be a good idea for several scenarios, but what it also does is create a reverse dependency on the object you are using. Moreover, if you look more carefully at the Microsoft documentation, it says: "The query optimizer may use indexed views to speed up the query execution. The view does not have to be referenced in the query for the optimizer to consider that view for a substitution." So you may find yourself querying a table and the engine queries a view instead!
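
  To show what such a setup looks like, here is a minimal sketch of an indexed view. The view, table and index names are made up; the real view was more complicated. GO is the batch separator used by tools like SSMS and sqlcmd, needed because CREATE VIEW must be the first statement in its batch.

```sql
-- A schema bound view must use two-part object names, and an aggregated
-- indexed view has to include COUNT_BIG(*).
CREATE VIEW dbo.vw_EmployeeCountByDepartment
WITH SCHEMABINDING
AS
SELECT e.DepartmentId,
       COUNT_BIG(*) AS EmployeeCount
FROM dbo.Employee AS e
GROUP BY e.DepartmentId;
GO

-- The first index on a view has to be unique and clustered. From this point
-- on, every change to dbo.Employee also has to maintain this index.
CREATE UNIQUE CLUSTERED INDEX IX_vw_EmployeeCountByDepartment
    ON dbo.vw_EmployeeCountByDepartment (DepartmentId);
```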

  You see, what happens is that every time you delete 4900 rows from a table that is used by a view with indexes on it, those indexes have to be updated as well, so not only is your table affected, but potentially everything that is referenced in the view, too. If it's a complicated view that integrates data from multiple sources, that work is repeated after every batch delete. Again. And again. And again again. It also prohibits you from some operations, like TRUNCATE TABLE, where you get a funny message saying the table is referenced by a view and that is why you can't truncate it. What?!

  Now, I deleted the VIEW and ran the same code. It was faster, but it still took ages because finding the records to delete was a much longer operation than the deletion itself. This post is about this reverse dependency that an indexed view introduces.

  So what is the solution? What if you have the view, you need the view and you also need it indexed? You can disable the indexes before your operation, then enable them again. I believe this will solve most issues, even if it's not a trivial operation. Just remember that in cleaning operations, you need some indexes to find the records to delete as well.
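
  In SQL Server terms, "enabling" a disabled index means rebuilding it. A sketch, assuming a hypothetical indexed view called dbo.vw_EmployeeCountByDepartment:

```sql
-- Disabling the clustered index of an indexed view discards the stored view
-- data, so the view's indexes no longer have to be maintained during deletes.
ALTER INDEX ALL ON dbo.vw_EmployeeCountByDepartment DISABLE;

-- ... run the batch deletes here ...

-- Rebuilding recomputes the view data and re-enables its indexes.
ALTER INDEX ALL ON dbo.vw_EmployeeCountByDepartment REBUILD;
```

  Keep in mind that while the indexes are disabled, queries can't benefit from them, and the rebuild itself can be expensive on a large view.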

  That's it. I hope it helps. Get out of here!

  A Lush and Seething Hell is a collection of two novellas: The Sea Dreams It Is the Sky, where vast magical forces play with death and torture in a fictional, Chile-inspired South American country, and My Heart Struck Sorrow, a story of dark magic working through verse and song.

  John Hornor Jacobs writes well, dragging the reader into the worlds of his mind, but I found it difficult to stay there. Perhaps it's the alert lifestyle of today, full of interruptions and distractions, but it felt easy for me to stop reading and it took some effort to start again. It took me two weeks to read it all and even then it required a conscious decision to push through, though it's not a large book.

  Both stories have a common structure: people who are following the narrative of another and thus are drawn into the same world. Reading about reading, so to speak. They have elements of cosmic horror, although most of it is implied or not clearly explained - the traditional way of approaching the genre - intimating that even the tiniest brushes with these hidden realms are terrifyingly dangerous. What they both repeatedly reminded me of is House of Leaves, though not so convolutedly detailed, and only marginally of any Lovecraftian work.

  Bottom line: I liked both stories, the world building, the style, the slowly getting under the skin horror elements, but I did feel the writing dragged a little.

  Something that feels inspired heavily by Octavia Butler, Semiosis starts with a very interesting premise and continues through generations of human colonists on an alien planet. However, each chapter introduces a new generation, thus abandoning characters and attachments introduced before. In the end it simply feels too clinical, with characterization lacking luster, while still remaining a captivating read.

  The plot centers around a human colony on a distant alien planet. There are only a few dozen people and, with some equipment failures, they find themselves at the mercy of the world's inhabitants. Which are intelligent plants! It is a very interesting premise and both the generational span of the story and the cold calculations of different species that must coexist despite their massive differences reminded me a bit of Xenogenesis. However, Sue Burke didn't have the cruelty required to thoroughly violate her characters that Butler had, so in the end the mood was more positive, perhaps reminiscent of '60s sci-fi, with lots of deliberations and rational arguments as a major part of the story.

  Bottom line: I liked the book. Could have been better, but as a debut it's pretty good. I will probably read the second book sooner or later, because the world of Pax is so full of potential. However, I do believe Semiosis can be taken as a standalone story without the need for a continuation.

  Edward O. Wilson was a biologist who died at the end of 2021, aged 92. Nicknamed "ant man" for his world renowned expertise on ants, he championed concepts such as sociobiology and biodiversity. Reportedly, he was a very nice man, beloved by most of the people he interacted with. And yet, I didn't hear of him because of his scientific writings, but because of a vitriolic article published by Scientific American. In it, the author used Wilson's death and the renewed interest in his autobiography, Naturalist, to decry Wilson's views ("problematic beliefs"). He had tried to explain everything through biological lenses, for example that individual characteristics are caused by evolution and those characteristics cause the characteristics of a group or society or race in a particular environment. The article's author considered that as proof of "scientific racism", but was immediately shut down by scores of scientists who debunked her entire article and pretty much proved she hadn't even read the books she was supposedly basing her writing on.

  So even when I try to filter out the political idiocy that pollutes every aspect of modern life and try to keep up to date with science and technology, I still fall into these toxic holes. Ironically, one of the last chapters in Naturalist talks about how weird it was for one of his colleagues to try to explain biology ideologically (in that case Marxism). Anyway, so I decided to read the book. I usually love autobiographies, especially those of scientists and other driven people, because they make me feel as their authors did. Even if prompted by an ugly example of human stupidity and malice, still something good could come of it.

  Alas, while the book is interesting and takes the reader through much of Wilson's life and work, it merely describes his passion for nature, rather than evoking it. Even as it starts with a personal history and childhood, it feels strangely impersonal. A small boy with hearing issues and partial vision in one eye (accidentally caused by him trying to handle a spiked fish), he was nevertheless taught by his father to never run away from a fight, was partially schooled in educational institutions that prepared children for military careers and held, overall, the belief that anything is possible once you put your mind to it.

  I have no doubt that his approach to life wasn't as analytical as it is portrayed in the book, but what exactly that was is hard to glimpse from this biography. Wilson published Naturalist when he was 65 and, while I am sure he worked some time on it, he treated it as any of his scientific books at the time: facts, history based on journals, actions, expectations, results. I liked the book and I liked Wilson, but I wouldn't particularly recommend Naturalist for anything other than a glimpse into Wilson's nature (pardon the pun).

  First of all, I am not a philosopher, nor have I read Nietzsche. The philosophical aspects that I am discussing are how a layman would interpret them. In this post I am going to discuss anime from the Baki Hanma and JoJo's Bizarre Adventure universes with a nod to Andromeda's race of genetically modified humans called Nietzscheans and also other media portrayals of similar concepts.

  Watching episodes from Baki or JoJo anime I got a weird feeling. Both series, while having completely different plots, focus on humans with superior abilities fighting each other. Nothing new here: both American and Japanese cultures are inundated with this cliché. Yet these shows are strangely humanistic in nature. The characters have impossibly strong muscles, dress in their own special way and are proudly dedicated to particular philosophies that define their path in life. Compared to other people, they are intimidating, entirely dominating, and they are so strong that they defy the laws of medicine and even physics. They use their power in tactical and strategic ways, they hone their skills, they outthink their adversaries and use whatever the environment gives them in order to win. And this in order to gain power only over themselves.

  In so many ways, they reminded me of the Nietzscheans, from Gene Roddenberry's TV series Andromeda (before the show went to shit, so first season only). They also reveled in their physical, mental and knowledge prowess. Violence, to them, was justified as a way to eliminate weakness. The characters in the two anime shows are the same: they risk their health, their lives, in order to test themselves to the limit. As a result, they cannot exist in human society. People can't abide such obvious difference, when these guys are stronger than guns, impossible to detain through cuffs, chains, walls or cages and at any time they can just destroy a normal human being with little to no effort. It is this part that actually got me thinking and writing the blog post.

  Usually in media, people who care only about their own betterment to the point they eschew social norms are portrayed as villains. Human values are represented as communal values: caring about others, respecting their way of life, abiding by social constraints and obeying laws, forming bonds and families, then dedicating effort to maintain and preserve them. The hero will defend, not attack, will arrest, not destroy, will consider, not dismiss, will protect, not invade. In fact, a hero is a social construct and can only exist as society's protector.

  In regular situations, the ones that are considered normal in society, heroes are not needed. Performance is not needed. There are some boundaries in which one is allowed to strive for better output, but only as cogs in a social mechanism that needs them to perform within expected ranges. Only when things go awry, from the breaking of a component (be it a tool, a flow or a person) to some huge disaster, some people "step up" and take over the load. Those are heroes. And here is the dilemma, because someone who has not made the effort of being better than expected of them will not be able to step up, while someone who does make the effort is inevitably vilified during "peace times".

  This reminds me of Rambo, in the first movie and not the ridiculous propaganda sequels. Here is a man who, through circumstances that needed to be tragic and out of his control so as to enhance his heroic status, reached a level above his peers, at least in one particular domain: fighting and killing. He was perfect as a soldier, but as he returns home he has difficulties integrating himself back into society. It takes only a small town sheriff's bullying to bring the beast to the surface. The old adage still stands: the best heroes are all dead.

  Going back to the anime, I found myself in conflict. Here is the usual portrayal of society, a safe place for everybody to live in, defining what human life is and should be like, but functioning as a soulless mechanism. And here is the usual portrayal of the self-absorbed villain, a monstrous being of immense power who threatens the existence of all, but functioning as a proud individual constantly bettering themselves. I feel like the latter option is more humanistic, therefore truly being human is in antithesis to human society.

  Can there be a balance between the two? Could we actually imagine a benign Nietzschean-like society? One that would truly embrace diversity, specialization and performance while despising mediocrity and also not eating itself from within? I find it hard, if not impossible. Still, I can't help but feel a sort of admiration for these larger than life characters and their dedication to a random thing that then defines them forever.

  What do you think?