and has 0 comments
Today the free antivirus Avast reported my BitTorrent installation as infected with Win32:Evo-Gen (Susp). It also promptly removed the executable of the program. I tried to reinstall it, also to have the same happen to the installation program. I've reported a false positive (I hope it is) for BitTorrent Stable (7.9.8 build 42577) and then I added *bittorrent* as an exclusion pattern in Avast. I could then reinstall the program, retaining all of the settings and torrents in the download list.

and has 0 comments
I got this bundle of four books in the Alcatraz series, by Brandon Sanderson, and since I loved all of his books so far, I enthusiastically started to read them. First, you need to understand that there are five books in the series, with The Dark Talent just published. I think it pays to wait until you have it before you read the whole thing, since it pretty much reads as one long story (unless you read the beginning and then immediately skip to the end of each book, heh heh heh!). Second, you need to know that at the beginning the writing will appear waaaay too silly and under par compared to the author's other works. After a while it kind of grew on me, but be aware that it is written mostly for kids. Sanderson even says so in the story itself.

So at first I kind of thought this will be the series that breaks the rule, the one that I would not enjoy reading, and it took some time to shake off this feeling. It feels like the author could not make up his mind on what to write: the obligatory book about writers (after all, the rule says "write what you know", so in the end it's inevitable) or the recently obligatory Harry Potter spoof (which fantasy authors are peer pressured into writing). In the end he wrote something that has both: a story about a boy discovering he has magical powers and also a book filled with meta comments and breaks in the Fourth Wall (the character has the Breaking Talent, see) that shows some of the tools and processes in the writing business.

In fact, the more I read, the more I enjoyed the books. The characters are as always incredibly (annoyingly) positive and there is that Sanderson smartness behind even the silliest of exchanges. He references future scenes, with book and page number (which is amazing if you think about it), he hooks you to a scene then berates himself on using hooks in the book, he uses really silly and out of context details only to use them at full effectiveness a book later. In the end, you get something that children will undoubtedly read with giggling pleasure and that adults (especially those interested in writing) will see as a deconstruction of the writing process.

Now, I still feel the Alcatraz series is one of the lesser Brandon Sanderson books. The silliness sometimes feels forced and the way he writes each book changes slightly, as he experiments with the crazy shenanigans that he started with this series. It's still very entertaining, though. Give it a try, or give it to your children so they can get into writing themselves and make your life a living hell when they grow up :)

and has 0 comments

I was in some sort of American campus, one of those where they found a gimmick to show how cool they are. In this case it was reptiles. Alligators were moving on the side of the street, right next to people. They would hiss or even try to bite if you got close enough. I was moving towards the exit, which was close to the sea and leading into a sort of beach, and I was wondering what would happen if one of these large three meter animals would bite someone. And suddenly something happened. Something large, much larger than a crocodile, came out of the water and snapped a fully grown adult alligator like a stork snatches frogs. What was that thing? A family of three was watching, fascinated, moving closer to see what was going on. "Are these people stupid?" I thought.

Sure enough, the dinosaur thing bites through the little kid. The father jumps to help but is completely ignored, his sudden pain and anguish irrelevant to the chomping reptile. Other tourists were being attacked. One in particular drew my eyes: he had one of the things pulling with its teeth on his backpack. The man acted like something annoyed him, making repeated "Ugh!" sounds while trying to climb back on the walkway. His demeanor indicated that he didn't care at all of the reptiles, temporary annoyances that stopped him from returning to normal life. At one moment he paused to scratch an itch on his face. His fingers were bitten clean off, blood gurgling from the stumps. He was in shock.

Somehow, a famous actor jumped from somewhere and made it clear that it had been just a show, although the realism of it was so extreme that I felt an immediate cognitive dissonance and the scene switched.

I was at the villa of a very rich friend and there was wind outside. Really strong wind. Nearby wooden booths that were at the edge of his garden were pushed towards me, threatening to crush me against the wall of the building. I went inside, telling my friend that he should take whatever he needs from outside because the wind is going to tear them away. He calmly pressed a button on a remote and an inflatable wall grew around the compound, holding the wind at bay. Cool trick, I thought, trying to figure out if it was firmly anchored to the ground by wires or it got all its structural strength from the material it was made of. It had to have some sort of Kevlar-like component, otherwise it would have been easy to puncture. I calculated the cost of such a feature as astronomical.

We went in, climbing to the second or third floor. Out the window I saw buildings, like near the center of Bucharest. The wind was wreaking havoc on whatever was not firmly fixed. The building across the street was 10 floors tall and apparently someone was doing roofing work when the wind started. The fire used to heat up the tar was now really intense from all the fresh oxygen and smoke was billowing strongly. Suddenly the top floor caught fire, flames stoked by the wind into unstoppable blazes. I called my friend to the window and showed him what was going on. While we watched, floor after floor were engulfed in flames, explosions starting to be heard from within. The building caught fire like a cigarette smoked in haste. I thought that if the wind went through the building, via broken windows and walls, then the a very high temperature could be reached. And as I was thinking that, the building went down. It didn't collapse, instead it leaned towards our villa and crashed right next to it.

I panicked. I knew that on 9/11 the towers collapsed because of the intense heat, but a nearby building was structurally weakened by the towers falling and it too went down eventually. I ran towards the ground floor, jumping over stairs. If the villa would crumble, I did not want to get stuck inside. While fleeing, I was considering my options. The building that went down had people in it, maybe I should head right there and help people get out. Then I also remembered the thousands of people helping after the September 11 that later got lung cancer from all the toxic stuff they put into buildings. Besides, what do I know about rescuing people from rubble? I would probably walk on their heads and crash more stuff on top of the survivors. I've decided against it, feeling like an awful human being that is smart enough to rationalize anything.

As I got down I noticed too things. First there was a large suffocating quantity of white big grained dust being blown by the wind, making it hard to breath. The light was filtered through this dust, giving it a surreal dusky quality. I used my shirt to create a makeshift breathing mask for the dust. Then there were the people. A sexy young woman, short skirt, skimpy blouse, long hair, the type that you see prowling the city centers, was now crawling on the ground, left leg sectioned just under the knee, but not bleeding, instead ending with a carbonized stump, like a lighted match that you put off before it disintegrated. I saw people with their skins completely burned off, moving slowly through the rubble, reaching for me. Some where nothing more than bloodied skeletons, some were partially covered in tar, the contrast of red inflamed burns and black tar or burned clothing evoking demonic visions of hell. These people were crying in pain and coming towards me from all sides, climbing over the ruins of buildings and cars. The smell was awful. I ran, parkour style.

As I woke up I tried to remember as much of the dream as possible. I still have no idea what I was doing in that campus and how I had gotten there. I was fascinated by the scene with the shocked tourist, as it implies knowledge of human behavior that I don't normally possess when awake. It was completely believable, yet at the same time bizarre and unexpected. Great scene! I've had disaster dreams before, top quality special effects and all, but never was I confronted with the reality of the aftermath (I usually died myself in those). I am certain I made logical decisions in the situation, but at the same time I felt ashamed at the powerlessness, fear and the fact that I was running away. Should I have stayed and helped (and gotten cancer)? Should I have rushed home and blogged about it? Maybe a strategic retreat in which I would have conferred with experts and then maybe got back with professional reinforcements?

What felt new was that while having all those random thoughts and running, I wasn't really thinking of my life. I wasn't considering whether my life is worth saving or what the purpose of my running actually is. It wasn't thinking at all; just running. It was visceral, like my body was on autopilot, taking that stupid head away from the danger before he thinks itself to death. Considering it was a virtual body in my actual head, that's saying something. I am not just sure what.

Anyway, beats the crap out of the one with being late for the exam, not having studied anything...

and has 0 comments
Copyright is something that sounds, err... right. It's about keeping the profits of the sale of work to its author. If I write a book, I expect to get the money from it, not somebody who would print it and sell it in my name. With the digital revolution, copy costs collapsed to zero: anyone can copy anything and spread it globally for free. So what does that mean for me, the author? Well, it sucks. The logical thing for me is to try to fight for my rights, turn to the law which is supposed to protect me, find technical ways of making copying my work harder, preferably impossible, going for the people who are chipping away at my hard earned profits.

However, that only makes sense from the author's point of view, specific to their work, outside of any context. Now, imagine a system that allows me to anonymously share anything to anyone, while they also can get it with no chance of being identified or even detected. Me, the author, can see that my work is being circulated, but I can't stop it or determine who is spreading it. My only recourse is to try to dismantle this devilish system that destroys my living, with the full support of law and various companies that want to get their share of the profits. And here comes the context: in order for me (and the entire media industry) to be able to protect my way of life I need to actively fight against any method that allows anonymous and/or secret sharing of information. In fact, I would be fighting for a global system of surveillance and censorship.

As you have seen, I tried to present the situation from the standpoint of the author. I feel for the poor guy (less for the snakes that get most of the money by controlling distribution and marketing), however it is clear that I can't be on his side. The hypothetical system that I have been describing kind of exists right now in various forms and media companies are making efforts to thwart attempts to make it more user friendly and more widespread. What we see today is the logical continuation of the situation. Forget intelligence companies looking for terrorist threats. Forget tyrannical states trying to find dissidents. Forget the political correctness police trying to expose and shame people for their beliefs. The real money is in stopping people distributing knowledge for free and it has direct (and dire) implications on our ability to speak freely.

Note: I used the expression "speak freely" rather than free speech, because free speech is not actually free at all. Read about it and you will see that is a completely different thing than the ability and permission to express yourself without repercussions.

Of course, I am not the first one to have thought of that. In the world it often has become more effective to use copyright in order to censor things you don't agree with. Just google it and see. Every time you hear something about economic accords, freedom of information, net neutrality, they are all talking about the same thing, only from different angles. The war is active, in full swing, with all of us possible casualties, yet few people are even aware of it. Oh, I am not saying that you can do anything about it... or can you?... but at least you should know about it.

No, it's not about mine, although this blog has had its ups and down. What I want to talk about is the list of blogs I am following and how it (d)evolved.

When I was an enthusiastic beginner in software development I was hunting for interesting blogs that would give me valuable insights into the minds of good developers, the quirks of frameworks, the hidden tools and processes that would make my life better. I was adding blog after blog to my RSS list. Later on, I kind of stopped. I had things to do, work to be done and unfortunately went through some jobs that were not conducive to learning. Perhaps seeing myself as an expert also hindered enthusiasm in learning (note to self: don't do that!). The obsolescence of the tool I was using to read RSS with and the death of Google Reader also did not help. So recently I just went back to that list of blogs and started organizing it with a new tool. I use Feedly now, in case you were wondering.

Today I had an epiphany. I have over 150 blogs that I am "following", 100 of which are software related, yet only very few of them are actually spewing content anymore. In my three year hiatus from blog reading most of the technical blogs just ... stopped. Some of them just plain vanished, complete with content that I had linked to in my own articles. At that time I was considering blogs as permanent as you can get. I mean people just write stuff for the heck of it, so others can read and learn. There would be no reason for any of this to disappear - there are still pages from 1990 active on the Internet, for crying out loud! So what happened?

One theory is that blogs were created as representations of a person's evolution. For example you are a good WPF programmer and you create something like Dr. WPF's blog. When you stop doing WPF (because Microsoft dropped the ball with it!) you stop writing. Perhaps the author still blogs in other places, other blogs that are thematic, I don't know. Another theory is that people just blog at a certain stage in their life; it's like a quarter-life crisis. When they mature, people stop blogging (which says something...). Maybe the social media explosion pushed people away from personalized platforms and they do all their publishing on Twitter, Facebook, LinkedIn, Medium and so on. As the IT industry moves at an ever increasing pace, the blogs may turn into antiquated relics that are obsolete by the time several posts have been published.

I feel sad either way. When I started blogging, people would come to me for help. After all I started the site years before StackOverflow arrived on the scene. I would write about programming, books, anime, life, personal ideas, jokes, space, science, rants, whatever. It happened several times that I was looking for a solution to a problem and found myself explaining it in an older post. People still praise some posts because they refer technology that is maybe a decade old. Others for getting the full picture on how I got to the end result. So for all these vanishing blogs, I feel a sense of loss for all the knowledge that was lost, for all the voices that turned silent.

I know that as a blog dies ten others appear, but there is no sense of origin anymore, no chronological timeline of the evolution of the person writing. I can even go down the "they are not making them like they used to" road. For me a blog would have functioned as a sort of resumé of someones's work. If I liked an article, I would look at others, maybe subscribe. This way I would be connected not with a concept, but with a person: as they grew, I grew. And SEO be damned, I don't care people don't discover my blog anymore because Google can't make up its mind on what I am actually writing about. When people do come, they see me, not just disparate out of context solutions to their 5 minute problems.

So I wrote this article to express my sorrow. I guess that I miss my friends, even if they never knew me.

and has 0 comments
The Emperor's Soul is set in the same world as Elantris, but for all intents and purposes it is a standalone story. It's not a full size book, but it's a bit longer than a short story. In it we find a type of magic called Forging, by which someone can carve and use a complicated seal to change the history of an object or, indeed, a soul. The forging needs to be very precise and as close as possible to the actual history of the target, otherwise the magic doesn't "stick". So what must a brilliant forger do in order to do the forbidden soul forging? They must know their target so intimately that they can't ultimately hurt them.
An intriguing idea, yet Brandon Sanderson does what he does best and focuses on the characters, while the story itself reveals the underlying motivation of each and how it affects the final outcome. The descriptions are minimal and the ending is optimistic and personal as Sanderson often likes his finales.
Since the story is short and I had no trouble reading it in a single evening, it is almost a pity not to read it, even if in itself is just a tiny "what if" tale and offers nothing spectacular.

and has 0 comments
I've read the book in a day. Just like the other two in the series (Steelheart and Firefight), I was caught up in the rhythm of the characters and the overwhelming positivity of the protagonist. Perhaps strange, I kind of missed descriptions in this one, as both locations and characters were left to the imagination and everything was action and dialogue.

Brandon Sanderson ends (why!!?? Whyy?!!?) the Reckoners series with this third book called Calamity. You absolutely have to read the other two books to understand anything about it. In fact, Reckoners feels more like a single story released in three installments than a true trilogy. In it, the team has to reckon (heh heh heh) with their leader going rogue and have a decision to make: either bring him back or kill him. All this while fighting off various Epics in different stages of madness.

I can't really say anything about the story without spoiling the hell out of it. I loved the first two books and I loved this one. Indeed, I have yet to find a Brandon Sanderson book that I don't like. If you are into superhero stories, the Reckoners series has a refreshingly original plot, a wonderful main character and true debate about what heroism really is. I recommend it highly!

and has 0 comments
Some times you need to repeat a code block a number of times and the solution is often a for block.
for (var i=0; i<n; i++)...
This is a complex line to write and most importantly obscures the intent of the code. Wouldn't it be better to have some kind of construct that says "repeat N times" and be intuitively easy to understand? Well there is one:
while (n-->0) ...
No, it isn't some C# construct that you have not heard of before, it's a while loop that checks on the value of n, then decrements it. But it looks great! It almost reads as "while n moves to 0". I liked it and I thought I should share.

and has 0 comments
The Mirror Thief is a really interesting book. It is well written, original in ideas and Martin Seay has his own unique writing style. It is also a very deceptive book, always changing shape, misleading the reader over what he is actually reading.

The book starts in Vegas, with a black former military policeman named Curtis arriving in search of a certain Stanley, in order to give him a message from a friend. The story feels like a detective-noir, but immediately there are things that just don't belong. The apparent mystical talents of Stanley, the very detailed descriptions of the surroundings, using rarely met but very specific words. At the time I thought I was going to read some sort of mystical noir thing akin to Cast a Deadly Spell.

But then the plot switches to the story of Stanley when he was a kid, hustling people on the street and doing various other bad things, his only anchor a book called The Mirror Thief, a poetry book that describes the adventures of a certain Vettor Crivano in Venice at the end of the 16th century. While he doesn't really understand what it is about, he feels that the unnaturally meandering book hides some sort of universal secret. As a reader, you start to doubt what you have read until now. After all, aren't you reading a confusing meandering book with a lot of unexplainable details? Stanley has traversed the United States in order to find the author of the book and ask him to reveal the mystical secret that would give him the power he craves.

And just when you think you figured it out, the book reveals a third story, that of Crivano himself, but not in poetry and apparently unrelated to the book about him. While he is an alchemist and physician, his best skill appears to be fighting, and he only uses it effectively at the end. All main characters: Curtis, Stanley and Crivano feel absurdly human and flawed. Curtis carries a gun everywhere and he doesn't get to use it once, while being disarmed multiple times. He doesn't get what's going on up until the very end. Crivano is also deceived several times, another pawn in the big game of life. Paradoxically Stanley is the one most in control of his life, mostly by rejecting everything society considers normal or even moral and choosing his every step.

To me, the book was most of all about perception. The reader is confusedly pinballed from perspective to perspective, even with each of them painstakingly detailed. While reading the book you learn new words, old words, history of three different times and places and intimately get to know each character. When you get tricked, you are just following characters that get tricked, disappointed or set up themselves. The three stories are really completely unrelated, at most red herrings when mentioned in the others, and offer little closure. It is all about understanding there are other ways of looking at the world.

Bottom line: Martin Seay is often accosted by readers begging for explanations of what they have read. You cannot read the book and not feel it is a good book, but actually enjoying it is a different thing altogether. At the end you start thinking about the book, about the world, about yourself, wondering if you didn't just read the whole thing wrong and whether maybe you should start over.

Other resources:

and has 0 comments
I have been playing a little with the Houdini chess engine and Chess Arena. I limit the ELO of the engine to a set value and then I try to beat it. It's not like playing a human being, but saves me the humiliation of being totally thrashed by another person :) Plus I didn't have Internet. In this case the ELO was set to 1400.

One of the games I played turned out to be extremely interesting (and short). Black tried the Elephant Gambit, to which I replied clumsily, but then it made two horrible mistakes and I saw the correct continuation. What I thought was really interesting is how the computer reacted. In what I thought would lead to some sort of piece advantage after a king and rook fork turned out to be a mating situation. The only solution for the computer was to sacrifice the queen and trying to save her would have resulted in mates or even worse situations.
I will publish the PGN here, so you can explore the variations, but before that I want to point out another greatly interesting move. As the Black queen attempts to escape, the next move is a king-queen-rook fork, yet after the king moves White does not capture the queen but does a seemingly random move: 12. Qd4. Why is that? I leave you to check it out for yourselves!

1. e4 e5 2. Nf3 d5 {The Elephant Gambit} 3. Nxe5 Qf6 4.
d4 dxe4 5. Nc3 Bb4 6.
Bd2 Qf5 {A really bad move giving a 3 point advantage to
White.} 7. Nd5 Bd6 8. c3 {My turn for a bad move,
anything went there, Bc4, g4, but I did this, going back to 1 pawn
advantage.} Bxe5 9. dxe5 Qxe5 {Disastrous move, but interesting. Check out the continuations.} (9.
.. Qd7 {This would have been the only decent move.}) 10. Bf4 Qxf4 {Amazingly, the only move here is to sacrifice the
queen for either bishop or knight. Attempts to save the queen lead to
mate!} (10. .. Qf5 {queen attempts to escape.} 11. Nxc7+ {leads to mate in
1. no matter when the king goes.} Kf8 (11. .. Ke7 12. Qd6#) 12. Qd8#) (10.
.. Qe6 {queen attempts to live just a bit longer.} 11. Nxc7+ {Chessgasm!
and Black's pain is only beginning.} Ke7 12. Qd4 {This is the most
interesting move here and by far the best by Houdini's calculations.} Nf6
(12. .. Nc6 13. Qc5+ Kd8 14. O-O-O+ Nd4 15. Rxd4+ Bd7 16. Qf8+ Qe8 17.
Qxe8#) (12. .. Qc6 13. Bb5 Qf6 14. Qb4+ Kd8 15. Qf8#) (12. .. Qf6 13. Qc5+
Kd7 14. O-O-O+ Qd4 15. Rxd4#) 13. Qc5+ Kd8 14. Nxe6+ Bxe6 15. Qc7+ Ke8 16.
Qxb7 Bd5 17. Qc8+ Ke7 18. Qxh8) 11. Nxf4 Nc6 {White wins
easily from +11 points} 1-0

and has 0 comments
My mind has been wandering around the concept of gamification for a few weeks now. In short, it's the idea of turning a task into a game to increase motivation towards completing it. And while there is no doubt that it works - just check all the stupid games that people play obsessively in order to gain some useless points - it was a day in the park that made it clear how wrong the term is in connecting point systems to games.



I was walking with some friends and I saw two kids playing. One was shooting a ball with his feet and the other was riding a bicycle. The purpose of their ad-hoc game was for the guy with the ball to hit the kid on the bike. They were going at it again and again, squealing in joy, and it hit me: they invented a game. And while all games have a goal, not all of them require points. Moreover, the important thing is not the points themselves, but who controls the point system. That was my epiphany: they invented their own game, with its own goal and point system, but they were controlling the way points were awarded and ultimately how much they mattered. The purpose of the game was actually to challenge the players, to gently explore their limitations and try to push boundaries a little bit further out. It wasn't about losing or winning, it was about learning and becoming.

In fact, another word for point system is currency. We all know how money relates to motivation and happiness, so how come we got conned into believing that turning something into a game means showing flashy animations filled with positive emotion that award you arbitrary sums of arbitrary types of points? I've tried some of these things myself. It feels great at times to go up the (arbitrary) ranks or levels or whatever, while golden chests and diamonds and untold riches are given to me. But soon enough the feeling of emptiness overcomes the fictional rewards. I am not challenging myself, instead I am doing something repetitive and boring. That and the fact that most of these games are traps to make you spend actual cash or more valuable currency to buy theirs. You see, the game of the developers is getting more money. And they call it working, not playing.

I was reading this book today and in it a character says that money is the greatest con: it is only good for making more money. Anything that can be bought can be stolen. And it made a lot of sense to me. When you play gamified platforms like the ones I am describing, the goal of the game quickly changes in your mind. You start to ignore pointless (pardon the pun) details, like storyline, character development, dialogues, the rush of becoming better at something, the skills one acquires, even the fun of playing. Instead you start chasing stars, credits, points, jewels, levels, etc. You can then transfer those points, maybe convert them into something or getting more by converting money into them. How about going around and tricking other players to give you their points? And suddenly, you are playing a different game, the one called work. Nobody can steal the skills you acquire from you, but they can always steal your title or your badge or your trophy or the money you made.

What I am saying is that games have a goal that defines them. Turning that goal into a metric irrevocably perverts the game. Even sports like football, that start off as a way of proving your team is better than the other team and incidentally improving your physical fitness, turn into ugly deformed versions of themselves where the bottom line is getting money from distribution rights, where goals can be bought or stolen by influencing a referee, for example.

I remember this funny story about a porn game that had a very educational goal: make girls reach orgasm. In order to translate this into a computer program, the developers had several measurements of pleasure - indicated in the right side of the screen as colored bars - which all had to go over a threshold in order to make the woman cum. What do you think happened? Players ignored the moaning image of a naked female and instead focused solely on the bars. Focus on the metric and you ignore the actual goal.

To summarize: a game requires of one to define their limits, acknowledge them, then try to break them. While measuring is an important part of defining limits, the point is in breaking them, not in acquiring tokens that somehow prove it to other people. If you want to "gamify" work, then the answer is to do your tasks better and harder and to do it for yourself, because you like who you become. When you do it in order to make more money, that's work, and to win that game you only need to trick your employers or your customers that you are doing what is required. And it's only play when you enjoy who you are when doing it.

As an aside, I know people that are treating making money like a game. For them making more and more money is a good thing, it challenges them, it makes them feel good about themselves. They can be OK people that sometimes just screw you over if they feel the goal of their game is achieved better by it. These people never gamified work, they were playing a game from the very start. They love doing it.

Stay true to the goal! That is the game.

Lab Girl should have been the kind of book I like: a deeply personal autobiography. Hope Jahren writes well, also, and in 14 chapters goes through about 20 years of her life, from the moment she decided she would be a scientist to the moment when she was actually accepted as a full professor by academia. She talks about her Norwegian family education, about the tough mother that never gave her the kind of love she yearned for, she talks about misogyny in science, about deep feelings for her friends, she talks about her bipolar disorder and her pregnancy. Between chapters she interposes a short story about plants, mostly trees, as metaphors for personal growth. And she is an introvert who works and is best friends with a guy who is even more an introvert than she is. What is not to like?

And the truth is that I did like the book, yet I couldn't empathize with her "character". Each chapter is almost self contained, there is no continuity and instead of feeling one with the writer I was getting the impression that she overthinks stuff and everything I read is a memory of a memory of a thought. I also felt there was little science in a book written by someone who loves science, although objectively there is plenty of stuff to rummage through. Perhaps I am not a plant person.

The bottom line is that I was expecting someone autopsying their daily life, not paper wrapping disjointed events that marked their life in general. As it usually is with expectations, I felt a bit disappointed when the author had other plans with her book. It does talk about deep feelings, but I was more interested in the actual events than the internal projection of them. However if you are the kind of person who likes the emotional lens on life, you will probably like the book more than I did.

I am going to discuss in this post an interview question that pops up from time to time. The solution that is usually presented as best is the same, regardless of the inputs. I believe this to be a mistake. Let me explore this with you.

The problem



The problem is simple: given two sorted arrays of very large size, find the most efficient way to compute their intersection (the list of common items in both).

The solution that is given as correct is described here (you will have to excuse its Javiness), for example. The person who provided the answer made a great effort to list various solutions and list their O complexity and the answer inspires confidence, as coming from one who knows what they are talking about. But how correct is it? Another blog post describing the problem and hinting on some extra information that might influence the result is here.

Implementation


Let's start with some code:

var rnd = new Random();
var n = 100000000;
int[] arr1, arr2;
generateArrays(rnd, n, out arr1, out arr2);
var sw = new Stopwatch();
sw.Start();
var count = intersect(arr1, arr2).Count();
sw.Stop();
Console.WriteLine($"{count} intersections in {sw.ElapsedMilliseconds}ms");

Here I am creating two arrays of size n, using a generateArrays method, then I am counting the number of intersections and displaying the time elapsed. In the intersect method I will also count the number of comparisons, so that we avoid for now the complexities of Big O notation (pardon the pun).

As for the generateArrays method, I will use a simple incremented value to make sure the values are sorted, but also randomly generated:

private static void generateArrays(Random rnd, int n, out int[] arr1, out int[] arr2)
{
    arr1 = new int[n];
    arr2 = new int[n];
    int s1 = 0;
    int s2 = 0;
    for (var i = 0; i < n; i++)
    {
        s1 += rnd.Next(1, 100);
        arr1[i] = s1;
        s2 += rnd.Next(1, 100);
        arr2[i] = s2;
    }
}


Note that n is 1e+7, so that the values fit into an integer. If you try a larger value it will overflow and result in negative values, so the array would not be sorted.

Time to explore ways of intersecting the arrays. Let's start with the recommended implementation:

private static IEnumerable<int> intersect(int[] arr1, int[] arr2)
{
    var p1 = 0;
    var p2 = 0;
    var comparisons = 0;
    while (p1<arr1.Length && p2<arr2.Length)
    {
        var v1 = arr1[p1];
        var v2 = arr2[p2];
        comparisons++;
        switch(v1.CompareTo(v2))
        {
            case -1:
                p1++;
                break;
            case 0:
                p1++;
                p2++;
                yield return v1;
                break;
            case 1:
                p2++;
                break;
        }

    }
    Console.WriteLine($"{comparisons} comparisons");
}


Note that I am not counting the comparisons of the two pointers p1 and p2 with the Length of the arrays, which can be optimized by caching the length. They are just as resource using as comparing the array values, yet we discount them in the name of calculating a fictitious growth rate complexity. I am going to do that in the future as well. The optimization of the code itself is not part of the post.

Running the code I get the following output:

19797934 comparisons
199292 intersections in 832ms


The number of comparisons is directly proportional with the value of n, approximately 2n. That is because we look for all the values in both arrays. If we populate the values with odd and even numbers, for example, so no intersections, the number of comparisons will be exactly 2n.

Experiments


Now let me change the intersect method, make it more general:

private static IEnumerable<int> intersect(int[] arr1, int[] arr2)
{
    var p1 = 0;
    var p2 = 0;
    var comparisons = 0;
    while (p1 < arr1.Length && p2 < arr2.Length)
    {
        var v1 = arr1[p1];
        var v2 = arr2[p2];
        comparisons++;
        switch (v1.CompareTo(v2))
        {
            case -1:
                p1 = findIndex(arr1, v2, p1, ref comparisons);
                break;
            case 0:
                p1++;
                p2++;
                yield return v1;
                break;
            case 1:
                p2 = findIndex(arr2, v1, p2, ref comparisons);
                break;
        }

    }
    Console.WriteLine($"{comparisons} comparisons");
}

private static int findIndex(int[] arr, int v, int p, ref int comparisons)
{
    p++;
    while (p < arr.Length)
    {
        comparisons++;
        if (arr[p] >= v) break;
        p++;
    }
    return p;
}

Here I've replaced the increment of the pointers with a findIndex method that keeps incrementing the value of the pointer until the end of the array is reached or a value larger or equal with the one we are searching for was found. The functionality of the method remains the same, since the same effect would have been achieved by the main loop. But now we are free to try to tweak the findIndex method to obtain better results. But before we do that, I am going to P-hack the shit out of this science and generate the arrays differently.

Here is a method of generating two arrays that are different because all of the elements of the first are smaller than the those of the second. At the very end we put a single element that is equal, for the fun of it.

private static void generateArrays(Random rnd, int n, out int[] arr1, out int[] arr2)
{
    arr1 = new int[n];
    arr2 = new int[n];
    for (var i = 0; i < n - 1; i++)
    {
        arr1[i] = i;
        arr2[i] = i + n;
    }
    arr1[n - 1] = n * 3;
    arr2[n - 1] = n * 3;
}


This is the worst case scenario for the algorithm and the value of comparisons is promptly 2n. But what if we would use binary search (what in the StackOverflow answer was dismissed as having O(n*log n) complexity instead of O(n)?) Well, then... the output becomes

49 comparisons
1 intersections in 67ms

Here is the code for the findIndex method that would do that:

private static int findIndex(int[] arr, int v, int p, ref int comparisons)
{
    var start = p + 1;
    var end = arr.Length - 1;
    if (start > end) return start;
    while (true)
    {
        var mid = (start + end) / 2;
        var val = arr[mid];
        if (mid == start)
        {
            comparisons++;
            return val < v ? mid + 1 : mid;
        }
        comparisons++;
        switch (val.CompareTo(v))
        {
            case -1:
                start = mid + 1;
                break;
            case 0:
                return mid;
            case 1:
                end = mid - 1;
                break;
        }
    }
}


49 comparisons is smack on the value of 2*log2(n). Yeah, sure, the data we used was doctored, so let's return to the randomly generated one. In that case, the number of comparisons grows horribly:

304091112 comparisons
199712 intersections in 5095ms

which is larger than n*log2(n).

Why does that happen? Because in the randomly generated data the binary search find its worst case scenario: trying to find the first value. It divides the problem efficiently, but it still has to go through all the data to reach the first element. Surely we can't use this for a general scenario, even if it is fantastic for one specific case. And here is my qualm with the O notation: without specifying the type of input, the solution is just probabilistically the best. Is it?

Let's compare the results so far. We have three ways of generating data: randomly with increments from 1 to 100, odds and evens, small and large values. Then we have two ways of computing the next index to compare: linear and binary search. The approximate numbers of comparisons are as follows:

RandomOddsEvensSmallLarge

Linear 2n 2n 2n
Binary search 3/2*n*log(n) 2*n*log(n) 2*log(n)

Alternatives


Can we create a hybrid findIndex that would have the best of both worlds? I will certainly try. Here is one possible solution:

private static int findIndex(int[] arr, int v, int p, ref int comparisons)
{
    var inc = 1;
    while (true)
    {
        if (p + inc >= arr.Length) inc = 1;
        if (p + inc >= arr.Length) return arr.Length;
        comparisons++;
        switch(arr[p+inc].CompareTo(v))
        {
            case -1:
                p += inc;
                inc *= 2;
                break;
            case 0:
                return p + inc;
            case 1:
                if (inc == 1) return p + inc;
                inc /= 2;
                break;
        }
    }
}


What am I doing here? If I find the value, I return the index; if the value is smaller, not only do I advance the index, but I also increase the speed of the next advance; if the value is larger, then I slow down until I get to 1 again. Warning: I do not claim that this is the optimal algorithm, this is just something that was annoying me and I had to explore it.

OK. Let's see some results. I will decrease the value of n even more, to a million. Then I will generate the values with random increases of up to 10, 100 and 1000. Let's see all of it in action! This time is the actual count of comparisons (in millions):

Random10Random100Random1000OddsEvensSmallLarge

Linear 2 2 2 2 2
Binary search 30 30 30 40 0.00004
Accelerated search 3.4 3.9 3.9 4 0.0002


So for the general cases, the increase in comparisons is at most twice, while for specific cases the decrease can be four orders of magnitude!

Conclusions


Because I had all of this in my head, I made a fool of myself at a job interview. I couldn't reason all of the things I wrote here in a few minutes and so I had to clear my head by composing this long monstrosity.

Is the best solution the one in O(n)? Most of the time. The algorithm is simple, no hidden comparisons, one can understand why it would be universally touted as a good solution. But it's not the best in every case. I have demonstrated here that I can minimize the extra comparisons in standard scenarios and get immense improvements for specific inputs, like arrays that have chunks of elements smaller than the next value in the other array. I would also risk saying that this findIndex version is adaptive to the conditions at hand with improbable scenarios as worst cases. It works reasonable well for normally distributed arrays, it does wonders for "chunky" arrays (in this is included the case when one array is much smaller than the other) and thus is a contender for some kinds of uses.

What I wanted to explore and now express is that finding the upper growth rate of an algorithm is just part of the story. Sometimes the best implementation fails for not adapting to the real input data. I will say this, though, for the default algorithm: it works with IEnumerables, since it never needs to jump forward over some elements. This intuitively gives me reason to believe that it could be optimized using the array/list indexing. Here it is, in IEnumerable fashion:

private static IEnumerable<int> intersect(IEnumerable<int> arr1, IEnumerable<int> arr2)
{
    var e1 = arr1.GetEnumerator();
    var e2 = arr2.GetEnumerator();
    var loop = e1.MoveNext() && e2.MoveNext();
    while (loop)
    {
        var v1 = e1.Current;
        var v2 = e2.Current;
        switch (v1.CompareTo(v2))
        {
            case -1:
                loop = e1.MoveNext();
                break;
            case 0:
                loop = e1.MoveNext() && e2.MoveNext();
                yield return v1;
                break;
            case 1:
                loop = e2.MoveNext();
                break;
        }

    }
}

Extra work


The source code for a project that tests my various ideas can be found on GitHub. There you can find the following algorithms:

  • Standard - the O(m+n) one described above
  • Reverse - same, but starting from the end of the arrays
  • Binary Search - looks for values in the other array using binary search. Complexity O(m*log(n))
  • Smart Choice - when m*log(n)<m+n, it uses the binary search, otherwise the standard one
  • Accelerating - the one that speeds up when looking for values
  • Divide et Impera - recursive algorithm that splits arrays by choosing the middle value of one and binary searching it in the other. Due to the complexity of the recursiveness, it can't be taken seriously, but sometimes gives surprising results
  • Middle out - it takes the middle value of one array and binary searches it in the other, then uses Standard and Reverse on the resulting arrays
  • Pair search - I had high hopes for this, as it looks two positions in front instead of one. Really good for some cases, though generally it is a bit more than Standard


The testing tool takes all algorithms and runs them on randomly generated arrays:

  1. Lengths m and n are chosen randomly from 1 to 1e+6
  2. A random number s of up to 100 "spikes" is chosen
  3. m and n are split into s+1 equal parts
  4. For each spike a random integer range is chosen and filled with random integer values
  5. At the end, the rest of the list is filled with any random values

Results


For really small first array, the Binary Search is king. For equal size arrays, usually the Standard algorithm wins. However there are plenty of cases when Divide et Impera and Pair Search win - usually not by much. Sometimes it happens that Accelerating Search is better than Standard, but Pair Search wins! I still have the nagging feeling that Pair Search can be improved. I feel it in my gut! However I have so many other things to do for me to dwell on this.

Maybe one of you can find the solution! Your mission, should you choose to accept it, is to find a better algorithm for intersecting sorted arrays than the boring standard one.

and has 1 comment
While reading the book Introduction to Algorithms, Third Edition, by Thomas H. Cormen and Charles E. Leiserson, I found a little gem about simultaneously finding the minimum and maximum value in an array in 3*n/2 comparisons instead of the usual 2n. The trick is to take two numbers at a time, compare them with each other and only then compare the smallest one with the minimum and the largest with the maximum.

So instead of:
var min=int.MaxValue;
var max=int.MinValue;
for (var i=0; i<arr.Length; i++) {
var val=arr[i];
if (val>max) max=val;
if (val<min) min=val;
}
you can use this:
var min=int.MaxValue;
var max=int.MinValue;
for (var i=0; i<arr.Length-1; i+=2) {
var v1=arr[i];
var v2=arr[i+1];
if (v1>v2) {
if (v1>max) max=v1;
if (v2<min) min=v2;
} else {
if (v2>max) max=v2;
if (v1<min) min=v1;
}
}
if (arr.Length%2==1) {
var v=arr[arr.Length-1];
if (v>max) max=v;
if (v<min) min=v;
}

In the first case, we take all n values and compare them with the min and max values respectively, so n times 2. In the second example we take every two values (so n/2 times), compare them with each other (1 comparison) and then we compare the smaller value with min and the larger with max (another 2 comparisons), with a combined number of comparisons of n/2 times 3 (plus 2 extra ones if the number of items in the array is odd).

Update: Here is a variant for an IEnumerable<int>, the equivalent of a foreach:
var enumerator = enumerable.GetEnumerator();
var min = int.MaxValue;
var max = int.MinValue;
while (enumerator.MoveNext()) {
var v1 = enumerator.Current;
var v2 = enumerator.MoveNext() ? enumerator.Current : v1;
if (v1 > v2)
{
if (v1 > max) max = v1;
if (v2 < min) min = v2;
}
else
{
if (v2 > max) max = v2;
if (v1 < min) min = v1;
}
}