Just a few days ago I was writing on how important it is to tell Entity Framework what SQL type to use in order to avoid costly conversions. In fact, it wasn't so much an EF issue as it was an SQL one. Converting even character types to character types or changing collation was surprisingly expensive.  In this post I will show you how important it is to choose the right type for your querying columns, especially the primary key. 

  First imagine this scenario: you get some data from an outside source, rows and rows of it, and you have to store them and query them in SQL. The value that uniquely identifies a row is a small string, maybe 50 characters long. How do you proceed?

  My first naive solution was the most obvious one: just create a table that has a column for each value in the rows and put the primary key on the identifying one. But this leads to immediate performance losses:

  • by default, a primary key is a clustered index - text is not sequential, so at every insert the database engine will physically move huge swaths of data in order to place the rows in the alphabetical order of their identifiers
  • a primary key is a unique index - meaning text will have to get compared to other text in order to determine uniqueness, which is slow
  • by default, SQL is case insensitive - meaning that all text comparisons will have to be made taking into account capitalization and accents
  • 50 characters is a lot - even without Unicode support, it's 50 bytes, more than 12 times the size of a 4-byte integer, meaning the primary key index will be both large and slow

  "But!", you will undoubtedly say, "if you put the primary key on some other column, you will still have to create a unique index on the identifier. Isn't this just pushing the problem farther down the road? The size and speed limitations will be the same. And primary keys are clustered only by default, but they can be declared as nonclustered. And SQL doesn't need to be case insensitive, all you have to do is change the collation of the column to a binary one and it will be compared faster. Wouldn't that solve the problem?"

  No. In fact, my final solution, which worked five times faster, did not have an index on the identifier column AT ALL. Incidentally, I did end up changing the collation, but only because the idiots sending me the data were treating it as case sensitive.

  Without further ado, here is what I did:

  • an INT column with IDENTITY(1,1) as the primary key - which ensures a fast insertion due to the sequential nature of the value, fast query speed and low usage of disk space for the index
  • an INT column holding the checksum of the identifier - which when indexed, is fast to query and doesn't use a lot of disk space for the index

   So how do I query on the identifier? Simple: I calculate the checksum of the string and then I look it up in the database - which uses the index to locate the few strings that have the same checksum, then just finds the right one by enumerating through them. I query on the checksum column AND the text identifier. And there is an added bonus: I only need to do this once. If I need the record from the DB again, I query it directly through the integer primary key.
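
  Here is a minimal sketch of that lookup. It is not my actual code: it uses Python with an in-memory SQLite database as a stand-in for SQL Server and Entity Framework, CRC-32 as an assumed checksum algorithm, and hypothetical table, column and function names (items, identifier_checksum, insert_item, find_id).

```python
import sqlite3
import zlib


def checksum(identifier: str) -> int:
    """Stable 32-bit checksum of the identifier (CRC-32 is an assumption)."""
    value = zlib.crc32(identifier.encode("utf-8"))
    # Fold the unsigned CRC into the signed 32-bit range of a SQL INT column.
    return value - (1 << 32) if value >= (1 << 31) else value


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (
        id INTEGER PRIMARY KEY AUTOINCREMENT, -- stands in for INT IDENTITY(1,1)
        identifier TEXT NOT NULL,             -- no index on this column at all
        identifier_checksum INTEGER NOT NULL
    );
    CREATE INDEX ix_items_checksum ON items (identifier_checksum);
    """
)


def insert_item(identifier: str) -> int:
    """Store the identifier together with its checksum; return the integer key."""
    cur = conn.execute(
        "INSERT INTO items (identifier, identifier_checksum) VALUES (?, ?)",
        (identifier, checksum(identifier)),
    )
    return cur.lastrowid


def find_id(identifier: str):
    """Look up by checksum AND identifier, so checksum collisions are harmless."""
    row = conn.execute(
        "SELECT id FROM items WHERE identifier_checksum = ? AND identifier = ?",
        (checksum(identifier), identifier),
    ).fetchone()
    return row[0] if row else None
```

  Once find_id has returned the integer key, every later read can go through the primary key alone.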

  Entity Framework has this automatic memory cache so when I am querying on the database entity using a business model - as good separation of concerns practice would dictate - it gets it really fast from memory. The memory cache also uses just the int to identify an entity, which means double the benefits!

  The eagle eyed reader will have noticed that I am not using a unique index on the identifier, so technically I could create multiple rows with the same one. However, my application is always looking for the existing record first. But if you really worry about data consistency, the index on the checksum column can be replaced with a unique index on the checksum and identifier column. It will take more space, but it will be just as fast.
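
  For the worried, the unique index variant could look roughly like this. Again a sketch under assumptions: SQLite in place of SQL Server, CRC-32 as the checksum, and hypothetical names (items, ux_items_checksum_identifier, try_insert).

```python
import sqlite3
import zlib

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        identifier TEXT NOT NULL,
        identifier_checksum INTEGER NOT NULL
    );
    -- The checksum comes first, so lookups by checksum can still use this index.
    CREATE UNIQUE INDEX ux_items_checksum_identifier
        ON items (identifier_checksum, identifier);
    """
)


def try_insert(identifier: str) -> bool:
    """Return True if the row was inserted, False if the index rejected a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO items (identifier, identifier_checksum) VALUES (?, ?)",
                (identifier, zlib.crc32(identifier.encode("utf-8"))),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

  With this in place the database itself refuses a second row with the same identifier, instead of relying on the application to look first.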

  Another thing that you may have noticed is that I calculate the checksum in code, not with the database provided functions that achieve the same. At first glance, the database route is an instant win: just create a persisted computed column that calculates the checksum or binary checksum of the identifier column. However, this would be awkward when having to query, since you would have to craft a stored procedure or a custom SQL command to take the identifier and query on its checksum. In my case I just calculate a checksum in code - and I don't use the lazy string.GetHashCode function, which may be subject to change and already differs between 32 and 64 bit systems.

  Of course, if you want your text columns to be case and/or accent insensitive, you will have to store the hash code of the lowercase and unaccented string or use an implementation that is case and accent insensitive. This may not be trivial.
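
  To give an idea of what that involves, here is a rough Python sketch of normalizing a string before computing its checksum. The normalization rules (Unicode NFKD decomposition, dropping combining marks, case folding) and the function names are my assumptions for illustration; they will not match every SQL collation exactly, which is precisely the non-trivial part.

```python
import unicodedata
import zlib


def normalize(identifier: str) -> str:
    """Strip accents and fold case so that 'Café' and 'CAFE' compare equal."""
    decomposed = unicodedata.normalize("NFKD", identifier)
    # Combining marks (accents) have a non-zero combining class; drop them.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def insensitive_checksum(identifier: str) -> int:
    """Checksum of the normalized text (CRC-32 is an assumed algorithm)."""
    return zlib.crc32(normalize(identifier).encode("utf-8"))
```

  The stored text would still have to be compared with the same rules after the checksum narrows down the candidates.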

  Further tests showed that just using a non clustered index on the identifier column, even a unique one, was just slightly slower, maybe 5%. However, the space taken by indexes increased by 20%. So I might understand why you would find it a bit off putting and skip the checksum part.

  Hope this helps!

  P.S. Why did this solution provide such a huge performance gain? Obviously the SQL team would have implemented a sort of checksum for their text index, this should have been working natively and faster than any possible implementation I could make. Well, I don't know the answer. In fact, this all could be some quirk of Entity Framework and the SQL queries would not be optimizable to such a degree. I will attempt to test that using purely SQL commands. But meanwhile, all the points I made above are valid and with a little more work you can have a lot more control on how the system works.

  What do you remember from the Terminator movies? It's the Skynet killer robot, obviously, the people who seem to always be related somehow, and a hero that needs saving for the sake of their work in the future, but running for their lives in the present. In Terminator Zero you get all of these, to the point that they feel a little overdone. But the animation is good and the story is interesting, adding some logical elements that I've only seen in Terminator: The Sarah Connor Chronicles, which I liked a lot and wanted more of. I loved that they set the action in a clearly different timeline than our own and also tried to make it clear the ridiculous cycle of trying to fix the past from the future.

  Unfortunately, they've decided to add children to the mix. And I mean children that need a nanny, not 24 year old Claire Danes. Most of the time it's the children and their very Japanese emotions filling the screen, while their father, a mysterious tech mogul, keeps saying cryptic things almost until the end of the movie for no good reason. The Terminator is thankfully not in the shape of Arnie and the human fighter from the future is a woman. It also is set in Japan. The series ends with a promise rather than with closure, although I doubt they will make a second season.

  It's eight episodes of 20 minutes each, but I think the story was a little too simple for 160 minutes and it could have easily been a more concise two hour animation film. What's the difference, really, between a series you release all at once and a feature film anyway?

  While I applaud stories told in animation - readers of this blog may already know that I believe that's how you do and say brave things today, especially in sci-fi and horror - being a Terminator story meant it was locked in some preestablished framework and couldn't be too creative. Just consider taking some pages out of Screamers, for example, and you understand what I mean. I would rather watch seasons and seasons of Terminator anime than hope for something decent in live action anymore. The thing is that they already are very far advanced in special effects, but those also cost a lot of money, meaning that you either underdeliver on viewer expectations or have to make a whole bunch of money to break even. Animation is not like that and it's also a lot more flexible.

  All in all I liked the show and I recommend it, but don't expect too much.

  I've built an application and, like any lazy dev out there, I focused on the business logic, the project structure, the readability, comments, the dependency injection, the unit tests, you know... the code. My preference is to start from top to bottom, so I create more and more detailed implementations of interfaces while going down to the metal. The bottom of this chain is the repository, that class which handles database access, and I've spent little time understanding or optimizing that code. I mean, it's DB access, you read or you write stuff, how difficult can it be?

  When it was time to actually test it, the performance of the application was unexpectedly bad. I profiled it and I was getting reasonable percentages for different types of code, but it was all taking too long. And suddenly my colleague says "well, I tried a few things and now it works twice as fast". Excuse me?! You did WHAT?! I have been trying a few things too, and managed to do diddly squat! Give me that PR to see what you did! And... it was nothing I could see.

  He didn't change the code, he just added or altered the attributes decorating the properties of models. That pissed me off, because I had previously gone to the generated SQL with the SQL Profiler and it was all OK. So I executed my code and his code and recorded the SQL that came out:

  • was it the lazy loading? Nope. The number of instructions and their order was exactly the same
  • was it the explicit declaration of the names of indexes and foreign keys? Nope. Removing those didn't affect performance.
  • was it the ChangeTracker.LazyLoadingEnabled=false thing? Nope, I wasn't using child entities in a way that could be affected.
  • was there some other structure of the generated SQL? No. It was exactly the same SQL! Just my code was using thousands of CPU units and his was using none.
  • was it magic? Probably, because it made no sense whatsoever! Except...

Entity Framework generates simple SQL queries, but it doesn't execute them as you and I would. It constructs a string, then uses sp_executesql to run it. Something like this:

    exec sp_executesql N'SELECT TOP(1) [p].[ID], [p].[TXT], [p].[LUP_TS]
    FROM [sch].[table] AS [p]
    WHERE [p].[ID] = @__p_0',N'@__p_0 nvarchar(64)',@__p_0='xxxx'

Do you see it? I didn't until I started to compare the same SQL in the two versions. And it was the type of the parameters! Note that the aptly named parameter @__p_0 is an NVARCHAR. The actual column in the database was VARCHAR! Meaning that the code above was unnecessarily always converting values in order to compare them. The waste of resources was staggering!

How do you declare the exact database type of your columns? Multiple ways. In my case there were three different problems:

  • no Unicode(false) attribute on the string columns - meaning EF expected the columns to be NVARCHAR
  • no TypeName parameter in the Column attribute where the columns were NTEXT - meaning EF expected them to be NVARCHAR(MAX)
    • I guess one could skip the Unicode thing and instead just specify the type name, but I haven't tested it
  • using MaxLength instead of StringLength - because even if their descriptions are very similar and MaxLength sounds like applying in more cases, it's StringLength that EF wants.

From 40-50ms per processing loop, it dropped to 21ms just by fixing these.

Long story short: parametrized SQL executed with sp_executesql hides a possible performance issue if the columns that you compare or extract have slightly different types than those of the parameters.

Go figure. I hate Entity Framework!

  In The Memory Police Yōko Ogawa describes a small Japanese island ruled by "the memory police", an organization with apparent total power and no opposition whose entire purpose is to make sure the things that "are disappeared" are physically destroyed and arrest anyone on the island who is able to remember them. A very interesting metaphor on the things that only hold value if we remember and fight for them.

  Unfortunately, in this book no one fights for anything! I expected some sort of revelation on how this magical police can make concepts disappear from the minds of people so thoroughly that they can't even put them back in their memory when holding them in their hands. Or some sort of solution to said problem. Some sort of misguided attempt at a revolution. Something! But these are Japanese people, if things are supposed to disappear, they go with it until they are all gone.

  Was the author trying to convey the same frustration that I felt while reading the book? People so ritualistic and conformist that they basically amount to non-player characters, running the same routine until someone turns the game off? Because that frustration was only compounded by the ethereal quality of the internal monologues of characters who noticed things happening and ... then did nothing at all.

  I can't say the book was not decently written and the idea was intriguing, but if you expect the story to go anywhere, well, it doesn't.

  The Stories of My Life is the autobiography of James Patterson, said to be "the most popular storyteller of our time", of whom I honestly had not heard before. It is written in a bunch of very short and out of order chapters, a la Mrs. Bridge, in which he incessantly repeats the advice to outline everything. Very ironic. I liked the character more than the book.

  You see, James Patterson is a type of person that you can't help but admire: he is good at sports, he is good at school, he is good with women, he is good with business and he is a famous writer. And all of this not because anyone handed anything to him, but through hard work and dedication. This guy is the American Dream made flesh.

  He meets famous writers, actors, sports people, business people, several presidents of the United States and so on, he becomes the CEO of the advertising firm he basically interned at and all of this while being nice to people, loving and caring about family and friends and feeling pretty good about himself. And all of this without cocaine!

  So I liked the main character, very inspiring, despite the times having changed so much as to make such a person impossible nowadays, but I can't say I liked the book. The shuffled nature of the stories doesn't really help. It's clear the guy had the outline of the story he wanted to tell, so why write it this way? It didn't improve anything. Is it to clarify that life is a string of scenes and their order and the narrative we tell to ourselves are not that important compared to doing the right thing at the present time? Perhaps. But then it's inevitable that the reader is going to try to unshuffle the scenes into something comprehensible.

  And then there is the ever-present question with an autobiography: how real is all of this? I've read some that sound real and others that feel like the prose version of "Biggest & the Best", by Clawfinger. Are there things in the artificial gaps the author creates between these anecdotes that he doesn't feel like sharing or maybe is not even aware he doesn't? Are the stories in the book overblown to inflate the author's ego? Well, I don't think so. The book actually feels right. Maybe it's not at all accurate - after all that's what a writer's job is, to make things up - but it felt honest.

  What it didn't feel was personal. You see, Patterson is a good writer, he writes with humor and wit, but I didn't feel he was writing about himself, but about this character called Jim Patterson. While honest, it also felt overpolished, the edges smoothed off, and personal is what an autobiography should feel like, something perhaps even more important than being written well.

  Bottom line: really inspiring, felt real, but also impersonal enough to not merit the full mark. I liked it.

  Fortune's Fool is something that feels like Game of Thrones, but set in a 16th century type of world inspired by Spanish and Italian history, focused on a woman ex-princess, now warrior. A lot of intrigue, world building, betrayal and feudal machinations. I didn't feel like going through with it, though. 

  It's not that I didn't like Angela Boord's writing, I just didn't feel like going through the motions with the female ingénue, betrayed by unscrupulous men, forced to see the world as it is, harden, then get betrayed again in a somewhat cathartic situation that will bring closure to her teenage trauma.

  Bottom line: I might pick it up later, if I feel like reading about feudal intrigue and cruelty, but at the moment I choose not to.

  What a wonderful book! Something that feels like a spiritual sibling of The Santaroga Barrier, by Frank Herbert and written in a similar style. The slightly outdated writing style might put you off, but the story is really interesting and well crafted.

  John Wyndham is the author of novels that were adapted by TV and movies that, to be frank, I was a lot more familiar with than the author or his books. These include The Day of the Triffids, Children of the Damned and Chocky. However, I have to say that Trouble with Lichen might be the most interesting by far.

  The plot revolves around a mysterious lichen that contains a substance that can prolong life by slowing down metabolism in a way that doesn't affect anything but growth. Independently discovered by both a seasoned scientist and his brilliant female employee, the substance affects the very fabric of human society.

  I have to admit I just took the book out of my list and started reading it. I expected some popular science book about lichens and instead I had to read a lot of feminist philosophy written in a 1930s type of English writing style. But I continued reading and I was not disappointed. Yes, the plot is not airtight and there are parts that are either anachronistic or sometimes less relevant to the main theme, but the parts that are there are thought provoking and captivating.

  Some of these themes include: women fighting for their rights and getting them, only to then not use them because of societal conformism, world changing discoveries that are immediately threatened with seizure, stifling, destruction by people and organizations with power, political and economical manipulation of the masses, the danger of knowing or owning something of power and value without the proportional means of protecting it and so on. In a way, it reminded me of the wonderful movie The Man in the White Suit which also contrasts what we say we want with what would actually happen if we got it.

  Bottom line: if you get past the olden writing style and some anachronisms you will get your mind excited by some fundamental ideas underlying the functioning of our apparently benign society. I can't recommend it enough. Read it!

  One Second After sounded really interesting: what happens after a massive EMP attack disrupts all electricity use in the United States. However, William R. Forstchen's writing style and the things the book was focusing on really repelled me. Have you ever read one of those American airplane books where everything happens in a small U.S. town, where people all know each other and help each other through tight social networks and they are all God fearing red blooded nationalists and everything is about how the average Joe feels about things and how they fight to protect their families? Well, this is one of them.

  Bottom line: I might pick it up later, but I wanted to relax, not get aggravated, so I did not finish it.

  With a feel similar to Brandon Sanderson's Skyward series, down to the writing style, The Last Human is a young adult novel with some pretty intriguing ideas that stayed with me a long time after I finished reading it.

  Zack Jordan creates a complex world of millions of sentient civilizations held together by The Network, a faster than light framework that allows all of these different species to travel the universe, understand each other and be safe from one another. However, the future of civilizations that refuse to follow the rules of the Network is dire, especially the one of the most hated race of people in the known universe: the vile humans. And of course the main character is a human teenage girl who was told nothing about her species and past and has to discover it all together with the reader.

  Many of the ideas in the book were really interesting, like the legal status of species and artificial intelligences based on the "intelligence tier" and the illusion of having control over your destiny when something hundreds of times your better decides to use you. Also the common question about which is better: freedom, order or something in between. I also liked how the author presented the way different species saw the world. A bit formulaic, but fun!

  Yet the book was not perfect. While some things in it might be considered horrifying by any measure, the plot flows like most YA stories where the main character lacks both control and understanding of the situation, therefore they're considered not responsible for the bad things happening around them. This changes a bit towards the end, but not particularly so. There is also a very interesting relationship between Sarya and her "mother", which then is just left behind and crystallized in a few McGuffins and some principles that the daughter blindly follows when the story requires it. This happens with most characters, really: they are described, used, then mostly discarded.

  The ending ended threads in a satisfactory manner, but most characters remained in the discard bin, which I didn't like. I'd say that Jordan has the writing thing down, he just needs to work more. I would read more of his stuff in the future. To me The Last Human was both a positive surprise and somewhat disappointing. A decent book, though, that I will recommend.

  The Frugal Wizard is a nice little standalone story, a science fantasy that is at once a white room story (man wakes up without memory) and a non-Asian isekai (in a parallel world derived from history, fantasy or gaming). Luckily, not a Cosmere novel, either; you know how I feel about pointless "cinematic universes". I like how these "secret projects" led to more original stories, unconstrained by arbitrary rules of fitting in with anything before or after.

  In the book, a man from a far technological future of mankind, where purchasing access to your own parallel dimension is a reality yet dollars and marketing pamphlets are still a thing, wakes up in a medieval setting without knowing who he is. His character follows a classic hero's arc - a Brandon Sanderson specialty, where he first thinks he's the hero, then finds out he is not, only to become one. The setting is a bit too silly, with a rather disappointing villain that is not fleshed out more than the typical psychotic bully, but it makes up for it with a satisfying redemption plot, some playful romance and a colorful, magical and curated version of medieval England.

  I especially liked the jabs towards the popular depictions of the era, which I hear are quite inaccurate and probably the consequence of creators copying each other until it becomes culture. Fake it till you make it, I guess. But what's with the Odin hate? Everyone seems to dislike the guy lately...

  Bottom line: medium sized book with a silly, but not overly so, premise and a whiff of the early Sanderson work that I fell in love with originally.

  This is a very Brandon Sanderson novella: the willful youth, the sardonic adult hiding their inherent goodness under a veil of insults and bad puns, the logical puzzles, the world building done while telling a concise and compelling story. The only thing that I dreaded was that it was another Cosmere story, trying to square peg something interesting into this pointless joint universe. And it wasn't! Well, not that particular universe.

  Children of the Nameless is set in the extended universe of the card game Magic: The Gathering. The novella was released for free on the website of Wizards of the Coast, the publishers of Magic, through an arrangement that allowed Brandon increased creative control of the story. It is set on the plane of Innistrad several years after the events of Eldritch Moon. It introduces the original characters Tacenda Verlasen and Davriel Cane and follows their story as they seek to uncover the mystery of Tacenda's entire village being taken by geists. Meanwhile, the story is no longer available on the website, or maybe I didn't know how to find it.

  Anyway, back to the story. It was a bit on the childish side, although it featured some gruesome scenes as well. Overall it made me very interested in the characters and maybe the world. There is a "Magic: the Gathering" collection of books on Goodreads and it contains 75 works. This particular magical literary universe was not on my radar before. I doubt I will delve into it any time soon, but it's intriguing.

  Bottom line: fun, short, intense. I liked it!

  The longest of the Bobiverse series books, almost as long as the first three combined - which makes it its own self-contained trilogy, Heaven's River was... drawn out to the point of being boring. Humor and some intense scenes made it interesting, but not only did it spend a lot of attention on trivialities, it also set up some reveals that were both predictable and also rather inconsequential.

  I am not complaining that much. I still liked the book, but the things Dennis E. Taylor flags as important are weird as only someone living in the North American utopia can think of. And yes, I know he is Canadian and he is nice and a computer programmer and a sci-fi references obsessed geek, so basically a perfect human being, yet I can't take seriously the perils of financial ruination of the Bobs or the obsessions over whether the Prime Directive should be followed, enforced, and then enforced over other people, which is self-contradicting! And a lot of talking about emotional and emotion related philosophical issues and how to accommodate them and not hurt people, when everybody else behaves like self-interested psychopaths.

  Anyway, as a slight departure from the original flow of the books in the series, this is mostly about the attempts to rescue the hardware storage of one of the Bobs from an alien superstructure where aliens seem to be living an idyllic life in a pre-steam technology civilization and not a jump from Bob to Bob ad nauseam.

  They do the mission in the most time consuming and pointless way imaginable. And then there is the issue of the "civil war" which is spoiled directly in the book description, but which in the end falls flat as a very random and implausible evolution of the situation. One thing that I found truly original and fascinating is the idea of a quantum soul. But reading the entire book just for that is hardly worth it.

  I am going to probably continue to read the series, but I would have to remember it when the fifth book comes out, which has a pretty heavy description, so I do have hope. Overall this was a below average Bob adventure. I need it to be better.

  Finally a book with an actual ending! I think the Bobiverse series was supposed to be a trilogy, then Dennis E. Taylor continued to write more stuff in the same universe, because I just started a fourth book which has as many pages as the previous three books combined. So if you feel you want to stop somewhere, All These Worlds is where the story actually ends. More exciting than previous books, but also with an underwhelming resolution.

  I mean, humanity is in dire straits. Not only did they stupidly almost kill themselves off, but now a very advanced civilization is threatening them with extinction all over again. It must be hard getting out of that one! No, it's actually very easy, barely an inconvenience! Also having the power to alter solar systems but still getting snagged in moralistic, political and even legal squabbles felt underwhelming.

  Did I mention it was underwhelming? I need the whelming! Whelm me, Taylor!

  Bottom line: if you've read the first two books, for sure you will need to read this one. But don't expect too much. More content, but less resolution.

  The Bobiverse series doesn't have actual books, it has volumes. It's a single story, or rather history, that just goes on and on without any type of marker or closure between books. For We Are Many is therefore just like the first book, but lacking the surprise factor. Just as physical and temporal scales are largely ignored, Dennis E. Taylor often exaggerates the technical ones, placing more complexity on creating lifelike androids than on planetary system harvesters or colony ships for thousands of people. I still like the series, but it's getting blander.

  We are Legion reads as a blend of Andy Weir and Adrian Tchaikovsky: the geekiness is there, the science, the optimism, the humor, the glossing over the complicated stuff :) I liked the first book and I am going to read the others, too. It's like Dennis E. Taylor is their replicant son!

  If anything, the issue is that there are almost no stakes (yet!). The story is about a guy who is translated into an AI then given control of the first von Neumann probe sent from Earth. Then Earth destroys itself, so all that's left is Bob and his many replicas, spreading over the universe.

  Reading the book you kind of feel the power of the Fermi paradox: intelligent technological species that may have started billions of years in the past had all the time in the world to get to us. So where are they? Of course, the biggest technical challenges of space expansion, energy and propulsion, are hand waved away with some gimmicks that make stuff work. However, the book is geeky and fun enough to enjoy even while reading through some dry descriptions of how automated probes dismantle Kuiper belts to replicate themselves.

  Bottom line: light, geeky, fun.