You will quickly understand why I felt the need to say I was unbiased, but let me first demonstrate just how unbiased I was: I went into a raw foods store on an errand from my wife and wanted to get something for myself. Usually I like cashews and macadamia nuts, but I didn't want to have the conversation about why I'd spent so much on something I eat out of boredom, so I looked around for something else. And there they were, packaged and sold just like any other dried fruits or nuts: bitter apricot kernels. So I bought a 200 g bag.

Back in the office, I opened the bag up and I started eating. They were bitter as hell, but I didn't mind it much. I was eating some of them, then switching to candied ginger (which I'd absolutely love if it weren't so sweet), then back again. After a while, though, I'd had enough. About half of the bag in, I couldn't really find a reason to keep eating them. My colleagues had all refused to eat (and spit) more than half of one. But I was curious what they were actually for. People who love bitter tastes, maybe?

So I went on the Internet and KABOOOM! Mind blown. Just for scale, look for yourself at the dimensions of the can of worms I'd just opened: apricot kernels.

Turns out that the "active ingredient" in apricot kernels is amygdalin, a substance that turns into cyanide in the gut. Yes, you read that right: I had just bitten the cyanide tooth, dying for the motherland before I could spill the beans. Google had already failed miserably, serving as its first result a page that explained how Big Pharma and governments conspire to keep this wonder drug from the public. The second result was Wikipedia, then every single conspiracy nut site, sprinkled with the occasional very dry scientific study that bottom-lined at "we don't really know".

But I am getting ahead of myself. At this point I was already severely biased, so I first need to describe my firsthand experience to you. Short story: accelerated heartbeat, fever, a terrible headache and nausea that lasted for half a day. Also, I didn't die, which was good.

Back to my rant. Some guy looked at the chemical structure of amygdalin, thought it looked like a B-complex vitamin, and named it vitamin B17. It was quickly marketed as a cure for cancer, despite numerous trials showing that it isn't. And no, it's not a vitamin for humans either: it is neither produced by the human body nor needed by it. The bag carried no warning label because it came from outside the European Union, which has a law regarding this. Here is some advice for both the EU and the US. Turkey was OK with it, though, so the label only said "great for cancer, eat 5 to 8 seeds daily, not all at once".

So how fucked was I after eating about a hundred of them? A European Food Safety Authority article said that eating three kernels exceeds the safe level for adults; a toddler could exceed it by eating just one. An article from Cancer Council Australia detailed child fatalities due to ingesting apricot seeds. Another article told of an adult who got poisoned, but he was both stupid and extreme (he was taking a concentrated extract) and didn't die anyway. A thousand other sites were telling me how amazing my health would be now that I had eaten ten times the daily dosage they suggested.

Drowning in the sea of controversy around apricot kernels, I decided to look up the chemical and medicinal treatment for cyanide poisoning. Step 1: decontamination. It was a bit too late to go to the toilet and make myself throw up. Step 2: take some amyl nitrite (and then some intravenous things). Wait, that's a party drug. I could maybe get some in a sex shop. There was no home remedy and, most of all, even if amyl nitrite seems to work, no one seems entirely sure why, beyond its obvious vasodilating effect (the usual explanation is that it oxidizes hemoglobin into methemoglobin, which binds cyanide). Another possible antidote is (ironically) hydroxocobalamin, also called vitamin B12a. In the end, some vitamin C and a headache pill did wonders, just in case you ever eat a bunch of apricot kernels and feel awful. Obviously, if it had been a serious condition I would have died at the keyboard, trying to wade through the marketing posts and the uselessly dry official reports. Also, not enough easily available party drugs, I dare say.

So, days after the bout of shaky hands, fever and the horrible headache that only blood oxygen deprivation can bring, I decided to write this post. I doubt people will find it with Google, but maybe at least my immediate friends will know not to eat this crap.

New Scientist is a science-oriented news site that has been on my periodic reading list for years. They had great content, seemingly unbiased, and a good site structure. But they got greedy. Instead of one in ten articles being "premium", now almost all the articles I want to read are behind a paywall. While I appreciate their content, I will never pay for it, especially when similar (and recently, even better) content can be found on phys.org or arstechnica.com completely free. So, sadly, I need to remove New Scientist from my reading list. I understand there is effort in what they do and that quality requires investment and cost, but brutally switching from an almost free format to a spammy paywall is unacceptable to me.

I have invented a new way to write software when people who hold decision power are not available. It's called Flag Assisted Programming and it goes like this: whenever you have a question on how to proceed with your development, instead of bothering decision makers, add a flag to the configuration that determines which way to go. Then estimate for all the possible answers to your question and implement them all. This way, management not only has more time to do real work, but also the ability to go back and forth on their decisions as they see fit. Bonus points, FAPing allows middle management to say you have A/B testing at least partially implemented, and that you work in a very agile environment.
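A tongue-in-cheek sketch of what Flag Assisted Programming might look like in practice (the flag name and the discount logic here are invented for illustration):

```java
import java.util.Map;

// Flag Assisted Programming in action: the question "how should discounts
// work?" never reached a decision maker, so every possible answer is
// implemented and a configuration flag picks one at deploy time.
public class FapDemo {
    public static double computeDiscount(double price, Map<String, String> config) {
        switch (config.getOrDefault("discount.strategy", "none")) {
            case "flat":       return price - 5.0;  // one possible management answer
            case "percentage": return price * 0.9;  // another possible answer
            default:           return price;        // no decision yet: do nothing
        }
    }

    public static void main(String[] args) {
        // Management decided "flat" today; tomorrow they can flip the flag back.
        System.out.println(computeDiscount(100.0, Map.of("discount.strategy", "flat")));
    }
}
```

Changing the decision is now a configuration edit, not a development task, which is the entire (dubious) point.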

Finally, finally there is a TV series about an asteroid coming towards Earth and what we are going to do about it. It is called Salvation and it fails in every single respect.

The first alarm bell was Jennifer Finnigan, the female costar from Tyrant. She was terribly annoying in that show, where she posed as the voice of reason and common sense while being a nagging and demanding wife to the ruler of a foreign country. I thought, "shame on you, Siderite! Just because she was like that in one show is no reason to hold it against the actress." In Salvation, she plays the annoying, nagging and demanding voice of reason and common sense as girlfriend to the US Secretary of Defense.

But that's the least of the show's problems. The idea is that a brilliant MIT student figures out there is an asteroid coming towards Earth. He tells his professor, who calls someone and then promptly disappears, with goons watching his house. Desperate, the student finds a way to reach an Elon Musk wannabe and tells him the story. Backed by this powerful billionaire, he then contacts the government, which, surprise!, knew all about it and already had a plan. Which fails. Time to bring in the brilliant solution of the people who care: the EM drive! Which needs exactly two billion dollars and one hundred kilograms of refined uranium. And that's just episode 2.

The only moment we actually see the asteroid is in a 3D holographic video projection, generated most likely from the text data output of a tool an MIT student would build. Somehow that turns into a 3D rendering on the billionaire's laptop. Not only does it show the asteroid crashing into Earth, it shows the devastation spreading across the planet as a fire front. Really?

Bottom line: imagine something like Madam Secretary which somehow mated with the pilot episode of the X-Files reboot. Only low budget and boring as hell. There is no science, no real plot, no sympathetic characters, nothing but artificial drama which one would imagine to be pointless in a show about the end of the world, and ridiculously beautiful people acting with the skill of underwear models (Mark Wahlberg excluded, of course). Avoid it at all costs.

Update: oh, in episode 3, the last one I will ever watch, they send a probe to impact the asteroid, and it goes like this: "OK, we have a go from the president!" And in the next minute they watch, in real time, from Io and from a front camera on the probe, as it heads towards the asteroid. I mean... why write a story and not base it on anything real? What's the point? Even superhero movies are more realistic than this disaster. I know the creators of the show made other masterpieces such as Extant, Scream: The TV Series and Hawaii Five-0, so they don't know any better, but at least they could have tried to improve just a tiny sliver. Instead they shat on our TV screens.

Fuck Java! Just fuck it! I have been trying for half an hour to understand why a NullPointerException was being thrown in Java code that I can't debug. It was a simple String object that was null inside a switch statement. According to this link: "The prohibition against using null as a switch label prevents one from writing code that can never be executed. If the switch expression is of a reference type, that is, String or a boxed primitive type or an enum type, then a run-time error will occur if the expression evaluates to null at run time."
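A minimal illustration of the trap and the usual fix (class and method names are my own): switching directly on a null String throws, so you have to guard before entering the switch.

```java
public class SwitchNullDemo {
    // Switching on a null String throws NullPointerException at run time,
    // so guard against null before entering the switch.
    public static String describe(String s) {
        if (s == null) return "unknown"; // without this guard: NPE
        switch (s) {
            case "yes": return "affirmative";
            case "no":  return "negative";
            default:    return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("yes"));
        System.out.println(describe(null)); // no exception, thanks to the guard
    }
}
```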

The Mist is a novella by Stephen King that was adapted into a great horror movie. I mean, I rated the movie 10 out of 10 stars. So when the TV show The Mist came around, I was ecstatic. And then... I watched three episodes of the most insipid and obnoxious series I've seen since Under the Dome. Nay, since Fear the Walking Dead.

Imagine they removed most of the monsters and replaced them with mostly insects, then enhanced everything else: small town politics, family matters, teenagers, etc. OK, the original Mist was great because it showed that the greatest ugliness was not the interdimensional creatures, but the pettiness of humans. However, it struck the right balance between the two. Now, in a TV show that censors words like "fuck", you get teenage angst, drug rape, power-hungry egotistic policemen, one of the most beautiful actresses from Vikings relegated to the role of an overprotective mother, husband-and-wife interactions (lots of those), junkies, amnesiac soldiers, priests, goth kids, nature freaks, old people... oh, the humanity! Three episodes in which nothing happened other than exposition and the introduction of lots of characters no one cares about.

I am tired. I really am tired of hearing that price is driven by supply and demand - which is true, because that's the definition of price; it has nothing to do with actual value. Same with stories: they are all about people, because people care about people and most people are people. No need for anything too exotic when all you need to do to please most people is show them other most people. Grand from a marketing point of view, but quite pointless overall, I would say. But who's gonna listen to me? I am not most people, after all.

Bottom line: lately there has been a lot of effort invested into TV. HBO and Netflix have led the way by caring about their productions enough to make them rival and even beat not only film productions, but also the original literary material. This has led me to hope against hope that The Mist will be the best horror TV show out there, one that would maybe last two or three seasons at most, but burn a bright light. Instead it is a dying fire that wasn't properly lit and is probably going to take two or three seasons just to properly die out without anyone noticing it is gone, yet managing to poison the legacy of the film forever.

As you probably know, whenever I blog something, an automated process sends a post to Facebook and one to Twitter. As a result, some people comment on the blog, some on Facebook or Twitter, but more often someone "likes" my blog post. Don't get me wrong, I appreciate the sentiment, but it is quite meaningless. Why did you like it? Was it well written, well researched, did you find it useful and if so in what way? I would wager that most of the time the feeling is not really that clear cut, either. Maybe you liked most of the article, but then you absolutely hated a paragraph. What should you do then? Like it a bunch of times and hate it once?

This idea that people should express emotion about someone else's content is not only really, really stupid, it is damaging. Why? I am glad you asked - clearly you already understand the gist of my article and have decided to express your desire for knowledge over some inevitable sense of awe and gratitude. Because if it is natural for people to express their emotions about your work, then you have to accept some responsibility for what they feel, and then you fall into the political correctness, safe space, don't-do-anything-because-someone-might-get-hurt pile of shit. Instead, accept that sharing knowledge or even expressing an opinion is nothing more than a data signal that people may or may not use. Don't even get me started on the "why didn't you like my post? Was something wrong with it? Are you angry with me?" insecurity bullshit that may be cute coming from a 12 year old, but is really creepy coming from 50 year olds.

Back to my amazing blog posts, I am really glad you like them. You make my day. I am glowing and I am filled with a sense of happiness that is almost impossible to describe. And then I start to think, and it all goes away. Why did you like it, I wonder? Is it because you feel obligated to like stuff when your friends post? Is it some kind of mercy like? Or did you really enjoy part of the post? Which one was it? Maybe I should reread it and see if I missed something. Mystery like! Nay, more! It is a riddle, wrapped in a mystery, inside an enigma; but perhaps there is a key. That key is personal interest in providing me with useful feedback, the only way you can actually help me improve content.

Let me reiterate this as clearly as I possibly can: the worst thing you can do is try to spare my feelings. First, it is hubris to believe you have any influence on them at all. Second, you are not skilled enough to understand in which direction your actions would influence them anyway. And third, feeling is the stuff that fixes memories in place, but you have to have a memory to fix first! Don't trade a lifetime of knowing something for a few seconds of feeling gratified by some little smiley or bloody heart.

And then there is another reason, maybe one that is more important than everything I have written here. When you make the effort of summarizing what you have read in order to express an opinion you retrieve and generate knowledge in your own head, meaning you will remember it better and it will be more useful to you.

So fuck your wonderful emotions! Give me your thoughts and knowledge instead.

Today I received two DMCA notices. One of them might have been legitimate, but the second was for a file which started with
/*
Copyright (c) 2010, Yahoo! Inc. All rights reserved.
Code licensed under the BSD License:
http://developer.yahoo.com/yui/license.html
version: 2.8.1
*/
Nice, huh?

The funny part is that these are files on my Google Drive, which are not used anywhere anymore and are accessible only to people with a direct link. Well, I removed the sharing on them, just in case. The DMCA process is even more horrid than I thought. The links in the notice point to a general search engine for notices (not to the actual notice) and to some legalese documents, the email comes from noreply-6b094097@google.com, and any hope that I might fight this is quashed by the clearly deliberate way the document is worded.

So remember: Google Drive is not yours, it's Google's. I wonder if I would have gotten the DMCA even if the file was not being shared. There is a high chance I would, since no one should be using the link directly.

Bleah, lawyers!

I have enabled Disqus comments on this blog and it is supposed to work like this: every old comment from Blogger has to be imported into Disqus, and every new comment from Disqus also needs to be saved in the Blogger system. Importing works just fine, but "syncing" does not. Every time someone posts a comment I receive this email:
Hi siderite,
 
You are receiving this email because you've chosen to sync your
comments on Disqus with your Blogger blog. Unfortunately, we were not
able to access this blog.
 
This may happen if you've revoked access to Disqus. To re-enable,
please visit:
https://siderite.disqus.com/admin/discussions/import/platform/blogger/
 
Thanks,
The Disqus Team
Of course, I have not revoked any access, but I "re-enable" it just the same, only to be presented with a link to resync that doesn't work. I mean, it is so crappy that it returns the JavaScript error "e._ajax is undefined" on a line where e._ajax is used instead of e.ajax, and even if that had worked, it uses a config object that is not defined.

It doesn't really matter, because the ajax call just accesses (well, it should access) https://siderite.disqus.com/admin/discussions/import/platform/blogger/resync/. And guess what happens when I go there: I receive an email that the Disqus access in Blogger has been revoked.

No reply from the Disqus team for months, to me or anybody else having this problem. They have a silly page that explains that, of course, they are not at fault: Blogger did some refactoring and broke their system. Yeah, I believe that. They probably renamed the ajax function in jQuery as well. Damn Google!

I am going to discuss in this post an interview question that pops up from time to time. The solution that is usually presented as best is the same, regardless of the inputs. I believe this to be a mistake. Let me explore this with you.

The problem



The problem is simple: given two sorted arrays of very large size, find the most efficient way to compute their intersection (the list of common items in both).

The solution that is usually given as correct is described here (you will have to excuse its Javiness), for example. The person who provided the answer made a great effort to list various solutions along with their O complexity, and the answer inspires confidence, as coming from someone who knows what they are talking about. But how correct is it? Another blog post describing the problem, and hinting at some extra information that might influence the result, is here.

Implementation


Let's start with some code:

var rnd = new Random();
var n = 10000000; // 1e+7, so the generated values fit into an int
int[] arr1, arr2;
generateArrays(rnd, n, out arr1, out arr2);
var sw = new Stopwatch();
sw.Start();
var count = intersect(arr1, arr2).Count();
sw.Stop();
Console.WriteLine($"{count} intersections in {sw.ElapsedMilliseconds}ms");

Here I am creating two arrays of size n, using a generateArrays method, then I am counting the number of intersections and displaying the time elapsed. In the intersect method I will also count the number of comparisons, so that we avoid for now the complexities of Big O notation (pardon the pun).

As for the generateArrays method, I will use a simple incremented value to make sure the values are sorted, but also randomly generated:

private static void generateArrays(Random rnd, int n, out int[] arr1, out int[] arr2)
{
    arr1 = new int[n];
    arr2 = new int[n];
    int s1 = 0;
    int s2 = 0;
    for (var i = 0; i < n; i++)
    {
        s1 += rnd.Next(1, 100);
        arr1[i] = s1;
        s2 += rnd.Next(1, 100);
        arr2[i] = s2;
    }
}


Note that n is 1e+7, so that the values fit into an integer. If you try a larger value, the sums will overflow and produce negative values, so the arrays would not be sorted.

Time to explore ways of intersecting the arrays. Let's start with the recommended implementation:

private static IEnumerable<int> intersect(int[] arr1, int[] arr2)
{
    var p1 = 0;
    var p2 = 0;
    var comparisons = 0;
    while (p1<arr1.Length && p2<arr2.Length)
    {
        var v1 = arr1[p1];
        var v2 = arr2[p2];
        comparisons++;
        switch(v1.CompareTo(v2))
        {
            case -1:
                p1++;
                break;
            case 0:
                p1++;
                p2++;
                yield return v1;
                break;
            case 1:
                p2++;
                break;
        }

    }
    Console.WriteLine($"{comparisons} comparisons");
}


Note that I am not counting the comparisons of the two pointers p1 and p2 against the Length of the arrays, which could be optimized away by caching the lengths. They consume just as many resources as comparing the array values, yet we discount them in the name of calculating a fictitious growth-rate complexity. I am going to do the same in what follows. The optimization of the code itself is not the point of this post.

Running the code I get the following output:

19797934 comparisons
199292 intersections in 832ms


The number of comparisons is directly proportional to the value of n, approximately 2n. That is because we look at all the values in both arrays. If we populate the arrays with odd and even numbers respectively, so there are no intersections, the number of comparisons will be exactly 2n.
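For reference, the odds-and-evens arrays mentioned here can be produced like this (a sketch in Java rather than the post's C#, with invented names, mirroring the generateArrays shape):

```java
public class OddsEvens {
    // Fills arr1 with odd numbers and arr2 with even numbers: both arrays
    // are sorted and share no elements at all, so the linear algorithm must
    // scan everything (exactly 2n comparisons) to prove the intersection empty.
    public static void generateArrays(int n, int[] arr1, int[] arr2) {
        for (int i = 0; i < n; i++) {
            arr1[i] = 2 * i + 1; // 1, 3, 5, ...
            arr2[i] = 2 * i;     // 0, 2, 4, ...
        }
    }
}
```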

Experiments


Now let me change the intersect method, make it more general:

private static IEnumerable<int> intersect(int[] arr1, int[] arr2)
{
    var p1 = 0;
    var p2 = 0;
    var comparisons = 0;
    while (p1 < arr1.Length && p2 < arr2.Length)
    {
        var v1 = arr1[p1];
        var v2 = arr2[p2];
        comparisons++;
        switch (v1.CompareTo(v2))
        {
            case -1:
                p1 = findIndex(arr1, v2, p1, ref comparisons);
                break;
            case 0:
                p1++;
                p2++;
                yield return v1;
                break;
            case 1:
                p2 = findIndex(arr2, v1, p2, ref comparisons);
                break;
        }

    }
    Console.WriteLine($"{comparisons} comparisons");
}

private static int findIndex(int[] arr, int v, int p, ref int comparisons)
{
    p++;
    while (p < arr.Length)
    {
        comparisons++;
        if (arr[p] >= v) break;
        p++;
    }
    return p;
}

Here I've replaced the increment of the pointers with a findIndex method that keeps incrementing the pointer until the end of the array is reached or until a value larger than or equal to the one we are searching for is found. The functionality remains the same, since the main loop would have achieved the same effect. But now we are free to tweak the findIndex method to obtain better results. Before we do that, though, I am going to P-hack the shit out of this science and generate the arrays differently.

Here is a method of generating two arrays that never overlap: all the elements of the first are smaller than those of the second. At the very end we put a single equal element in each, for the fun of it.

private static void generateArrays(Random rnd, int n, out int[] arr1, out int[] arr2)
{
    arr1 = new int[n];
    arr2 = new int[n];
    for (var i = 0; i < n - 1; i++)
    {
        arr1[i] = i;
        arr2[i] = i + n;
    }
    arr1[n - 1] = n * 3;
    arr2[n - 1] = n * 3;
}


This is the worst-case scenario for the algorithm, and the comparison count is promptly 2n. But what if we used binary search (which the StackOverflow answer dismissed as having O(n*log n) complexity instead of O(n))? Well, then... the output becomes

49 comparisons
1 intersections in 67ms

Here is the code for the findIndex method that would do that:

private static int findIndex(int[] arr, int v, int p, ref int comparisons)
{
    var start = p + 1;
    var end = arr.Length - 1;
    if (start > end) return start;
    while (true)
    {
        var mid = (start + end) / 2;
        var val = arr[mid];
        if (mid == start)
        {
            comparisons++;
            return val < v ? mid + 1 : mid;
        }
        comparisons++;
        switch (val.CompareTo(v))
        {
            case -1:
                start = mid + 1;
                break;
            case 0:
                return mid;
            case 1:
                end = mid - 1;
                break;
        }
    }
}


49 comparisons is smack on the value of 2*log2(n). Yeah, sure, the data we used was doctored, so let's return to the randomly generated arrays. In that case, the number of comparisons grows horribly:

304091112 comparisons
199712 intersections in 5095ms

which is larger than n*log2(n).

Why does that happen? Because on the randomly generated data the binary search hits its worst-case scenario: trying to find the first value. It divides the problem efficiently, but it still has to go through all the data to reach the first element. Surely we can't use this for the general scenario, even if it is fantastic for one specific case. And here is my qualm with the O notation: without specifying the type of input, the solution is only probabilistically the best. Is it?

Let's compare the results so far. We have three ways of generating data: randomly, with increments from 1 to 100; odds and evens; small and large values. Then we have two ways of computing the next index to compare: linear and binary search. The approximate numbers of comparisons are as follows:

                Random        Odds/Evens    Small/Large
Linear          2n            2n            2n
Binary search   3/2*n*log(n)  2*n*log(n)    2*log(n)

Alternatives


Can we create a hybrid findIndex that would have the best of both worlds? I will certainly try. Here is one possible solution:

private static int findIndex(int[] arr, int v, int p, ref int comparisons)
{
    var inc = 1;
    while (true)
    {
        if (p + inc >= arr.Length) inc = 1;
        if (p + inc >= arr.Length) return arr.Length;
        comparisons++;
        switch(arr[p+inc].CompareTo(v))
        {
            case -1:
                p += inc;
                inc *= 2;
                break;
            case 0:
                return p + inc;
            case 1:
                if (inc == 1) return p + inc;
                inc /= 2;
                break;
        }
    }
}


What am I doing here? If I find the value, I return the index; if the value is smaller, not only do I advance the index, but I also increase the speed of the next advance; if the value is larger, I slow down until the step gets back to 1. (This is essentially an exponential, or "galloping", search.) Warning: I do not claim that this is the optimal algorithm; this is just something that was annoying me and I had to explore it.

OK. Let's see some results. I will decrease the value of n even more, to a million. Then I will generate the values with random increments of up to 10, 100 and 1000. Let's see it all in action! This time the table shows the actual count of comparisons (in millions):

                    Random10  Random100  Random1000  Odds/Evens  Small/Large
Linear              2         2          2           2           2
Binary search       30        30         30          40          0.00004
Accelerated search  3.4       3.9        3.9         4           0.0002


So for the general cases the number of comparisons at most doubles, while for specific cases the decrease can be four orders of magnitude!

Conclusions


Because I had all of this in my head, I made a fool of myself at a job interview. I couldn't reason through all the things I wrote here in a few minutes, so I had to clear my head by composing this long monstrosity.

Is the best solution the one in O(n)? Most of the time. The algorithm is simple, with no hidden comparisons, and one can understand why it would be universally touted as a good solution. But it's not the best in every case. I have demonstrated here that I can minimize the extra comparisons in standard scenarios and get immense improvements for specific inputs, like arrays that have chunks of elements smaller than the next value in the other array. I would also risk saying that this findIndex version adapts to the conditions at hand, with improbable scenarios as its worst cases. It works reasonably well for normally distributed arrays, it does wonders for "chunky" arrays (which includes the case when one array is much smaller than the other) and is thus a contender for some kinds of uses.

What I wanted to explore, and now express, is that finding the upper growth rate of an algorithm is just part of the story. Sometimes the best implementation fails by not adapting to the real input data. I will say this, though, for the default algorithm: it works with IEnumerables, since it never needs to jump forward over elements. This intuitively suggests that it could be further optimized by exploiting array/list indexing. Here it is, in IEnumerable fashion:

private static IEnumerable<int> intersect(IEnumerable<int> arr1, IEnumerable<int> arr2)
{
    var e1 = arr1.GetEnumerator();
    var e2 = arr2.GetEnumerator();
    var loop = e1.MoveNext() && e2.MoveNext();
    while (loop)
    {
        var v1 = e1.Current;
        var v2 = e2.Current;
        switch (v1.CompareTo(v2))
        {
            case -1:
                loop = e1.MoveNext();
                break;
            case 0:
                loop = e1.MoveNext() && e2.MoveNext();
                yield return v1;
                break;
            case 1:
                loop = e2.MoveNext();
                break;
        }

    }
}

Extra work


The source code for a project that tests my various ideas can be found on GitHub. There you can find the following algorithms:

  • Standard - the O(m+n) one described above
  • Reverse - same, but starting from the end of the arrays
  • Binary Search - looks for values in the other array using binary search. Complexity O(m*log(n))
  • Smart Choice - when m*log(n)<m+n, it uses the binary search, otherwise the standard one
  • Accelerating - the one that speeds up when looking for values
  • Divide et Impera - a recursive algorithm that splits the arrays by choosing the middle value of one and binary searching for it in the other. Due to the overhead of the recursion it can't be taken seriously, but it sometimes gives surprising results
  • Middle out - it takes the middle value of one array and binary searches it in the other, then uses Standard and Reverse on the resulting arrays
  • Pair search - I had high hopes for this, as it looks two positions ahead instead of one. Really good for some cases, though generally it does slightly more comparisons than Standard


The testing tool takes all algorithms and runs them on randomly generated arrays:

  1. Lengths m and n are chosen randomly from 1 to 1e+6
  2. A random number s of up to 100 "spikes" is chosen
  3. m and n are split into s+1 equal parts
  4. For each spike a random integer range is chosen and filled with random integer values
  5. At the end, the rest of the list is filled with any random values
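The steps above could be sketched like this (in Java, and only my reading of the description, not the repository's actual code):

```java
import java.util.Arrays;
import java.util.Random;

// Rough sketch of the spiky test-data generator described above: the array
// is cut into s+1 segments, each segment is filled from its own random value
// range (a "spike"), and the whole array is sorted at the end so the
// intersection algorithms get the sorted input they require.
public class SpikyGenerator {
    public static int[] generate(Random rnd, int length, int spikes) {
        int[] arr = new int[length];
        int segment = Math.max(1, length / (spikes + 1));
        for (int start = 0; start < length; start += segment) {
            int low = rnd.nextInt(1_000_000);       // random range start for this spike
            int range = 1 + rnd.nextInt(1_000_000); // random range width
            for (int i = start; i < Math.min(start + segment, length); i++) {
                arr[i] = low + rnd.nextInt(range);
            }
        }
        Arrays.sort(arr); // keep the input sorted, as the algorithms require
        return arr;
    }
}
```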

Results


For a really small first array, Binary Search is king. For equal-size arrays, the Standard algorithm usually wins. However, there are plenty of cases when Divide et Impera and Pair Search win, usually not by much. Sometimes it happens that Accelerating Search is better than Standard, but Pair Search wins! I still have the nagging feeling that Pair Search can be improved. I feel it in my gut! However, I have too many other things to do to dwell on this.

Maybe one of you can find the solution! Your mission, should you choose to accept it, is to find a better algorithm for intersecting sorted arrays than the boring standard one.

I am writing this post to rant against subscription popups. I've been on the Internet long enough to remember when this was a thing: a window would open up and ask you to enter your email address. We went from that time, through all the technical, stylistic and cultural changes of the Internet, to this Web 3.0 thing, and email subscription popups have emerged again. They are not ads; they simply ask you to let them into your already cluttered inbox because, even before you've had a chance to read anything, what they have to say is so fucking important. Sometimes they ask you to like them on Facebook or whatever crap like that.

Let me tell you how to get rid of these real quick. Install an ad blocker, like Adblock Plus or uBlock Origin. I recommend uBlock Origin, since it is faster and I feel it works better than the older Adblock. This is something anyone should do just to get rid of ads. I've personally never browsed the Internet from a tablet or cell phone because they didn't allow ad blockers. I can't go on the web without them.

What you may not know, though, is that there are several lists of filters you can choose from, which are not enabled by default when you install an ad blocker. One of my favourite lists is Fanboy's Annoyances list. It takes care of popups of all kinds, including subscriptions. But even so, if the default lists don't cover the web site you are looking at, you have the option to pick elements and block them yourself. A basic knowledge of CSS selectors helps, but here is the gist of it: ###something selects the element with the id "something" and ##.something selects elements with the class name "something". Here is an example: <div id="divPopup" class="popup ad annoying"> is a div element that has the id "divPopup" and the class names "popup", "ad" and "annoying".
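For the record, the custom filter lines for that hypothetical div (which you can paste into uBlock Origin's "My filters" pane; the syntax is shared with Adblock Plus) would look like this:

```
! hide any element with the id "divPopup", on every site
###divPopup
! hide elements with the class "popup", but only on example.com
example.com##.popup
```

Lines starting with ! are comments, and prefixing a filter with a domain restricts it to that site, so you don't accidentally hide legitimate content elsewhere.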

One of the reasons why subscription popups are not always blocked is that, besides the elements they cover the page with, they also place some constraints on the page itself. For example, they put a big element over the screen (called an overlay), then a popup element in the center of the screen, and also change the style of the entire page so it cannot scroll down. So if you removed just the overlay and the popup, the page would only show you the upper part and not let you scroll down. This can be solved with another browser extension called Stylish, which allows you to save and apply your own style to pages you visit. The CSS rule that solves this very common scenario is html,body { overflow: auto !important; }. That is all. Just add a new style for the page and paste this in. 19 in 20 chances you will get the scroll back.

To conclude, whenever you see such a stupid, stupid thing appear on the screen, consider blocking subscription popups rather than pressing the close button. Block it once and never see it again. Press the close button and chances are you will have to keep pressing it every time you visit a page.

Now, if I only had a similar option for jump scares in movies...

P.S. Yes, cookie consent popups are included in my rant. Did you know that you can block all cookie nagware from Blogspot in one fell swoop, for example, rather than having to click OK on each blog individually?

In the last few days I've read several articles that all seem to say the same thing: computer algorithms we use today are biased towards wealthy white men, because they are made by companies in wealthy countries and predominantly by white male developers. Therefore they are inherently racist, misogynistic and wasteful. Nice journalistic metaphors were used such as: "Sea of dudes", "discriminatory code", "green algorithms", etc. I call bullshit!

Computer algorithms need to be, most of all, money makers. If Facebook or Google tweak an algorithm one way or another, the result is immediately apparent in the bottom line because of their huge user count. It may be possible that somehow, by cosmic coincidence, wealthy white males would also be the ones making most purchases, moving the most money, and thus the algorithm may appear biased. But it's not. The algorithm performs as it was supposed to. If a face recognition algorithm classifies black people as gorillas, Asian people as blinking, etc., it's not because the algorithm is racist, but because the data it was provided pointed to that result. If, when looking for a movie title, you get torrent links rather than the official web page of the movie, it's because that is what people want more. It's not a glitch, it's the way a machine understands reality. An algorithm is no more racist than the Nazi ovens or the Hiroshima bomb.

What I am trying to say is that code, especially now when it is becoming more and more entangled with machine learning (which is a much better term than the terribly misleading "artificial intelligence"), represents an intersection point between specifications, people's biases and data biases, to which you add horrible bugs. Algorithms, just like the pieces of your brain, are but elements in a puzzle.

"Well, of course, and to make the entire puzzle more socially responsible, we need to make all pieces socially responsible!" That's stupid. It's like working on the paint of a car to make it go faster. Sure, you can use some overengineered paint to reduce drag, but the engine and the wheels are still going to matter more. Male developers don't decide to tweak an algorithm to disregard women, any more than a female human resources employee decides to hire developers based on how much they value women. Owners, managers and money are ultimately what drive decisions.

Stop trying to appear politically correct when you don't know what you are talking about. If a complex computer algorithm that uses math as its underlying mechanism shows a bias, it's not because statistics are racist, but because the data it was fed was biased. The algorithm in question doesn't reveal the small-mindedness of the white developer or of the male mathematician, but a characteristic of the world it sees. Even with people feeding them the wrong data, algorithms are more objective than humans - that is a fact - because often you start developing them before you know what you are looking for; a person always works the other way around. Why not use code to show us where we are wrong, or biased, or angry at how the world is, or plain stupid? We have such a wonderful tool for making judgements from formal principles that we can actually tweak and, instead of scrutinizing the principles, you go nitpicking against the developers and the algorithms. I find it especially humorous to see random data introduced into a generic algorithm producing results that are considered biased because you don't like what you see.

Bottom line: want to change the world and make it better? Here is an algorithm for you: take the world and make it better.

And BTW, I find that constantly accusing developers of being white and male is a form of sexist racism. What do you want me to do? Turn black? If you were truly unbiased, you wouldn't care what the social structure of your IT department is. It's only now, when computers matter so much, that you are bothered by how much the geeks are getting paid.

...is stupid.

For a very long time the only commonly used expression of software was the desktop application. Whether it was a console Linux thing or a full-blown Windows application, it was something that you opened to get things done. In case you wanted to do several things, you either opted for a more complex application or used several of them, usually transferring partial work via the file system, sometimes in more obscure ways. For example, if you want to publish a photo album, you take all the pictures you've taken, process them with image processing software, then save them and load them into a photo album application. For all intents and purposes, the applications are black boxes to each other; they only connect through inputs and outputs and need not know what goes on inside one another.

Enter the web and its novel concept of URLs, Uniform Resource Locators. In theory, everything on the web can be accessible from the outside. You want to link to a page, you have its URL to add as an anchor in your page and boom! A web site references specific resources from another. The development paradigm for these new things was completely different from big monolithic applications. Sites are called sites because they should be a place for resources to sit in; they are places, they have no other role. The resources, on the other hand, can be processed and handled by specific applications like browsers. If a browser is implemented in all operating systems in the same way, then the resources get accessed the same way, making the operating system - the most important part of one's software platform - meaningless. This gets us to this day and age when an OS is there to restrict what you can do, rather than provide you with features. But that's another story altogether.

With increased computing power, storage space, network speeds and the introduction and refining of Javascript - now considered a top contender for the most important programming language ever - we are now able to embed all kinds of crazy features in web pages, so much so that we have reached a time when writing a single page application is not only possible, but the norm. They had to add new functionality to browsers (the History API) just to let a page tweak the browser address without reloading, and that is a big deal! And a really dumb one. Let me explain why.

The original concept was that the web would own the underlying mechanism of resource location. The new concept forces the developer to define what a resource locator means. I can pretty much make my own natural language processing system and have URLs that look like: https://siderite.com/give me that post ranting about the single page apps. And yes, the concept is not new, but the problem is that the implementation is owned by me. I can change it at any time and, since it all started from a desire to implement the newest fashion, it is destined to change. The result is chaos, and that is presuming the software developer thought of all contingencies and the URL system is adequate for linking to resources from this page... which is never true. If the developer is responsible for interpreting what a URL means, then it is hardly "uniform".

Another thing that single page apps lead to is web site bloating. Not only do you have to load the stuff that now is on every popular website, like large pointless images and big fonts and large empty spaces, but also the underlying mechanism of the web app, which tells us where we are, what we can do, what gets loaded etc. And that's extra baggage that no one asked for. A single page app is hard to parse by a machine - and I don't care about SEO here, it's all about the way information is accessible.

My contention is that we are going backwards. We got to the point where connectivity is more important than functionality, where being on the web is more important than having complex, well done features in a desktop app. It forced us to open up everything: resources, communication, protocols, even the development process and the code. And now we are going back to the "one app to rule them all" concept. And I do understand the attraction. How many times did I dream of adding mini games to my blog or making a 3D interface and a circular corner menu and so on. These things are cool! But they are only useful in the context of an existing web page that has value without them. Go to single page websites and try to open them with Javascript disabled. Google has a nice search page that works even then and you know what? The same page with Javascript is six times larger than the one without, and this without large differences in display. Yes, I know that this blog has a lot of stuff loaded with Javascript and that this page would probably be much smaller without it, but the point is that the blog is still usable. For more on this you should take the time to read The Web Obesity Crisis, which is not only terribly true, but immensely funny.

And I also have to say I understand why some sites need to be single page applications, and that is because they are more application than web site. The functionality trumps the content. You can't have an online image processing app work without Javascript, that's insane. You don't need to reference the resource found in a color panel inside the photo editor, you don't need to link to the image used in the color picker and so on. But web sites like Flipboard, for example, that display a blank page when seen without Javascript, are supposed to be news aggregators. You go there to read stuff! It is true we can now decide how much of our page is a site and how much an application, but that doesn't mean we should construct abominations that are neither!

A while ago I wrote another ranty rant about how hijacking another intuitively common web mechanism, scrolling, helps no one. These two patterns go hand in hand and are slowly polluting the Internet. Last week Ars Technica announced a change in their design and implemented it at the same time. They removed the way many users read the news: sequentially, one item after the other, scrolling down and clicking on the one you liked, and resorted to a magazine format where news items sat side by side on a big white page with large design placeholders that looked cool yet did nothing but occupy space and display the number of comments for each. Content took a backseat to commentary. I am glad to report that two days later they reverted their decision, in view of the many negative comments.

I have nothing but respect for web designers, as I usually do for people who do things I am incapable of, however their role should always be to support the purpose of the site. Once things look cool just for the sake of it, you get Apple: a short lived bloom of user friendliness, followed by a vomitous explosion of marketing and pricing, leading to the immediate creation of cheaper clones. Copying a design because you think it is great is normal, copying a bunch of designs because you have no idea what your web page is supposed to do is direct proof you are clueless, and copying a design because everyone else is doing it is just blindly following clueless people.

My advice, as misguided as it could be, is forget about responsiveness and finger sized checkboxes, big images, crisp design and bootstrapped pages and all that crap. Just stop! And think! What are you trying to achieve? And then do it, as a web site, with pages, links and all that old fashioned logic. And if you still need cool design, add it after.

I've noticed an explosion of web sites that try to put all of their stuff on a single page, accessible through nothing else than scrolling. Parallax effects, dynamic URL changes as you scroll down, self-arranging content based on how much you have scrolled, videos that start and stop based on where they are placed in the viewbox, etc. They're all the rage now, like web versions of pop-up books. And, as anything that pops up at you, they are annoying! I know creativity in the design world means copying the hell out of whoever is more fashionable, but I really really really would want people to stop copying this particular Facebook++ type, all slimy fingers on touchscreens abomination.

Take a look at inc.com, for example. Reading about brain hacking and scrolling down I get right into Momofuku, whatever that is, and self playing videos. It's spam, that's what it is. I am perfectly capable of finding links and clicking (or tapping, whatever the modern equivalent of pressing Enter after a few Tab keys is now) to follow the content I am interested in. What I do NOT want is for crappy design asswipes to shove their idea of interesting content down my throat, eyes, ears or any other body organs. Just quit it!

If you are not convinced, read this article that explains how parallax scrolling web sites have become mainstream and gives two different links that list tens of "best web sites" using this design method. They are all obnoxious, slow to load, eye-tiring pieces of crap. Oh look, different parts of the same page move at different speeds! How cool, now I have to scroll up and down just to be able to pay attention to them all, even if they are in the same bloody place!

Am I the only one who feels that way? Am I too old to understand what the cool kids like nowadays or is this exactly what I think it is: another graphical gimmick that values form over substance?

I really missed reading a good science fiction book and when I was in this vulnerable mental state I stumbled upon a very positive review on Ars Technica recommending Ann Leckie's Ancillary trilogy. Ars Technica is one of the sites I respect a lot for the quality of its news, but I have to tell you that after this, my opinion of them plummeted. Let me tell you why.

The only remotely interesting characteristics of the Ancillary series are the premise - that an AI gets trapped in the body of an "ancillary" soldier it once used as just one physical extension among many - and the original idea of using no gender when talking about people. You see, in the future, the empire's language has lost its need to define people by gender, and therefore they are all she, mother, sister, etc. Importantly, the gender, when translated into our backward English, is all female, so as to balance the patriarchal bias of our society. Way to go, Ann! The books also won a ton of awards, which made me doubt myself for a full second before deciding that notorious book awards seem to be just as narrow in focus as notorious movie awards.

Unfortunately, after two books that only exposed antiquated ideas of space operas past, boring scenes and personal biases of the author, I decided to stop. I will not read the third book, the one that maybe would give me some closure as the last in the series. That should tell you how bad I think the books were. On a positive note it vaguely reminded me of Feintuch's Seafort Saga. If you want to read a similarly out of date space opera, but really good and captivating, read that one.

You see, it all happens on ships and stations, where the only thing that doesn't feel like it was taken from feudal stories is... wait... err... no, there is nothing remotely modern about the books. The premise gets lost on someone who focuses exclusively on the emotions of the Artificial Intelligence, rather than on its abilities or actual characteristics. If I were an AI, I would consider that discrimination. The same ideas could be put in some magical kingdom where magical people clone themselves and keep in touch. I don't know who invented this idea that the future will somehow revert to us being pompous, boring nobles who care about family names, clothes, tea and saucer sets (this is from the books, I am not making it up), but enough with it! We have the Internet. And cell phones. That future will not happen! And if it did, no one would care! The main character acts like a motherly person for stupid or young people, no doubt reflecting Leckie's mood as a stay-at-home mom at the time of writing the books. You can basically kill people with impunity in this world of hers, if you are high enough on the social ladder, but swearing is frowned upon, for example.

OK, ranted enough about this. I don't care that her vision of the future sucks. I wouldn't have cared if her writing was bad - which it isn't. It's not great either, though. I do care when I get flooded with review titles like "The book series that brought space opera into the 21st century", by Annalee Newitz, or "Ancillary Justice is the mind-blowing space opera you've been needing", by Annalee Newitz, or "Why I’m Voting for Ann Leckie’s Ancillary Justice", by Justin Landon - a friend of Annalee Newitz' from the Speculative Fiction compilations, and "A mind-bending, award-winning science fiction trilogy that expertly investigates the way we live now.", by Tammy Oler, who is writing with Annalee Newitz at Bitch Media. Do you see a pattern here?

I have to admit that I think it is a feminism thing. So enamored were these girls with a story that doesn't define its characters by gender that they loved the book. Annalee's reviews, though, describe the wonderful universe that Leckie created, with its focus on social change and social commentary, and how it makes one think of how complex things are. I didn't get that at all. I got the typical all-powerful emperor over the space empire thing, with stuck up officers believing they know everything and that everything is owed to them, and the "man/woman of the people" main character who shows them their wicked ways. The rest is horribly boring, not because of the lack of action, but because of the lack of consequence. I kind of think it's either a friend advertising for another or some shared feminist agenda thing.

Bottom line: regardless of my anger with out of proportion reviews, I still believe these books are not worth reading. The first one is acceptable, while the second one just fizzles. I am sure I can find something better to do with my time than to read the third. The great premise of the series is completely wasted with this author and the main character doesn't seem to be or do anything of consequence, while moving from "captain of the ship" to "social rebel" and "crime investigator" at random.