
  One can take a container in which there is water and keep pouring oil in and after a time there will be more oil than water. That's because oil is hydrophobic, it "fears water" in a direct translation of the word. You can then say that the percentage of oil is higher than the percentage of water, that there is more oil in the container. Skin color in a population doesn't work like that, no matter how phobic some people are. Instead of water and oil, it's more like paint. One can take a container in which there is white paint and keep pouring black, red, yellow and brown paint in, but from a very early stage, that paint is no longer white.

  I keep finding these statistics about which part of the world is going to have Whites in a minority after a while. Any statistic counting people by color of skin is purist in nature and, as we should know by now, the quest for purity begets violence. The numbers are irrelevant if the basis of these statistics is conceptually wrong. In a true openly diverse population, white skin color should disappear really quickly. The only chance for it to exist is if people with white skin did not mingle with people of any other color.

  What is a White person? Someone who has white skin? Someone who has European ancestry? Someone who has no ancestry that is not European? Are Jews white? How about Coptic Egyptians? Some Asians are really white, too. There is no argument that uses the concept of White which is not directly dependent on the idea of racial purity. And then there is Non-White. A few days ago someone was noting that it feels weird to use the term Latino, considering how many different countries and interests are represented by the people labeled as such. So how can anyone meaningfully use a term like Non-White, which groups together Black people, Mexicans, Chinese, Indians, Eskimos and Native-Americans, among many others? Two "African-American" people of identical skin color may be as different as someone can imagine: one a many-generation American with slave ancestors, the other a middle-class African recently arrived in the US.

  What I am saying is that the most politically correct terms, used (and imposed) by proponents (and arbiters) of racial justice and equality, are as purist as they could be. The only argument that one can possibly bring here is that purism is somehow different and distinct from racism. This is absurd. One can be a purist and not be racist, but not the other way around. In fact, when people are trying to limit your freedom of expression because some of your words or concepts may be offensive, they are in fact fighting for the purity of ideas, one that is not marred by a specific idea of purity that they are against. These are similar patterns, so similar in fact that I can barely see a difference. No wonder this kind of thinking has taken root most in a country where a part of its founders were called Puritans!

  So how about we change the rhetoric to something that does not imply segregation or a quest for purity or a war on something or cancelling other people or creating safe spaces or hating something that is other? And the phrase above is not ironic, since I am not proposing we fight against this kind of ideas, only that we acknowledge their roots and that we come up with new ones. Let us just grow in different directions, rather than apart.


  Today I read this CNN article: 'Star Trek: Discovery' to introduce history-making non-binary and transgender characters. And it got me thinking about what this means for the Star Trek universe. It means absolutely nothing. Star Trek has had people turned into other species, duplicated, merged, their genetic code altered and fixed, made young and old. It has had species with no gender, multiple genders and various philosophies. It has interspecies relationships, including sexual.

  Star Trek has tackled intolerance many times, usually by showing the Federation crew having contact with an alien species that does the same things we do today, in caricature. It tackled race intolerance, from Kirk's kiss with Uhura to the episode with the species with black on one side and white on the other discriminating against the people who had their colors the other way around. It tackled gender discrimination in multiple situations. It tackled sex change and identity change with the Trill. It featured multi-sex civilisations. The happy tolerance train seems to stop with anything related to using inorganic technology with the human body, but no one is perfect and Janeway was awful with everybody.

  A person who is biologically a man yet desires to be treated as a woman would be normal for Star Trek. It would be inconsequential. If they go the way of the oppressed member of another culture that they meet, they will not solve anything, they will just have another weird alien around, which defeats the purpose. If they go with a non-binary crewmember they should not acknowledge the fact except in passing. Yes, habituate the public with the concept, let them see it in a series and get used to it, but the people in Star Trek should already have passed that point. Hell, they could go with a person who changes their sex every once in a while, to spice things up.

  What I would do is have a character who is clearly of a different sex than the gender they identify with and someone badgering them to have a proper sex change while they refuse. Now that would be a Star Trek worthy dilemma. You want to make it spicy? Have them go to the doctor and change their race instead, behave like a black person while wearing the high tech equivalent of blackface. What? Are you refusing someone the ownership of their identity?

  I really doubt they will go that way, though. Instead they will find some way of bringing the subject up again and again and again and throw it in our faces like a conflict that has to be resolved. In the bright and hopeful future, there should be no conflict about it! This CBS announcement should not have existed. You want to put some transgender people in, OK, put them in. It's not a boasting point, is it? The announcement basically does the opposite of what they claim to do: "Oh, look, we put non binary people in our series! How quaint! Hurrah! Only we do it, come watch the freak show!".

  Please, writers, please please please, don't just write stories then change the gender or race of characters because it's en vogue. Stop it with the gender swapping, which is the creative equivalent of copy and paste. Write with the story in mind, with the context, with the characters as they would normally behave. Don't add characters after you've thought of the story just to make them diverse either. Just write stories with characters that make sense! You don't know people from that demographic? Find one, spend time with them, then adjust your characters accordingly. I am so tired of tiny female action heroes, flamboyant and loud gays and the wise old lesbian. How come no one finds those offensive? It's like someone said "OK, we will have shitty black and female and non-cis characters for now. When people get used to them, we will actually have them do something and be realistic and perhaps in 2245 we'll even have them be sympathetic".

  They tried the woke way from the very beginning in Discovery, with the Stamets/Culber gay couple. They kept showing them kissing and brushing their teeth together and other stuff like that, when it made little difference to the story. Most people on Star Trek are written as single, for some weird reason that makes no sense, unless their relationship furthers the story. Riker and Troi could be the exception, though, yet even they were not kissy kissy on the bridge all the time. I never understood that couple. Dax and Worf made more sense, for crying out loud! And remember Starfleet is a military organization. You may put women and men and trans and aliens and robots together in a crew, but their role is to do their job. Their sex, their gender even less, makes no difference.

  Gene Roddenberry was a dreamer of better futures, where all of our idiotic problems have been left behind and reason prevailed, but even he imagined a third World War leading to humanity changing its ways as a start. Star Trek has always analysed the present from the viewpoint of an idyllic future, a way of looking back that is inherently rational: "Imagine the future you want, then wonder what would people from that time think of you". It's brilliant! Don't break that to bring stupid into the future. To tackle present social issues you have to first be a Trekkie, already there in the exalted future, before you consider the dark ages of the 21st century with a fresh perspective.


  For a more in depth exploration of the concept, read Towards generic high performance sorting algorithms

Sorting

  Consider QuickSort, an algorithm that uses a divide and conquer strategy to sort efficiently and is a favourite in practical implementations.

  It consists of three steps, applied recursively:

  1. find a pivot value
  2. reorder the input array so that all values smaller than the pivot are followed by values larger or equal to it (this is called Partitioning)
  3. apply the algorithm to each part of the array, before and after the pivot
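
  As a rough illustration of the three steps, here is a minimal sketch. It assumes the pivot is simply the last item of the segment and uses a basic in-place partitioning loop; the names quickSort and compare are made up for this example and are not from any library.

```javascript
// Minimal QuickSort sketch: last item as pivot, in-place partitioning.
// quickSort and compare are illustrative names, not a library API.
function quickSort(items, compare, lo = 0, hi = items.length - 1) {
  if (lo >= hi) return items;
  // step 1: find a pivot value (here: the last item of the segment)
  const pivot = items[hi];
  // step 2: partition, moving items smaller than the pivot to the front
  let boundary = lo;
  for (let i = lo; i < hi; i++) {
    if (compare(items[i], pivot) < 0) {
      [items[i], items[boundary]] = [items[boundary], items[i]];
      boundary++;
    }
  }
  // put the pivot between the two parts
  [items[boundary], items[hi]] = [items[hi], items[boundary]];
  // step 3: apply the algorithm to the parts before and after the pivot
  quickSort(items, compare, lo, boundary - 1);
  quickSort(items, compare, boundary + 1, hi);
  return items;
}

const sortedSample = quickSort([23, 1, 31, 0, 5, 26, 15], (a, b) => a - b);
console.log(sortedSample);
```

  Production implementations pick the pivot more carefully and switch to a simpler sort for small segments; this version only shows the shape of the recursion.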

  QuickSort is considered generic, meaning it can sort any type of item, assuming the user provides a comparison function between any two items. A comparison function has the same specific format: compare(item1,item2) returning -1, 0 or 1 depending on whether item1 is smaller, equal or larger than item2, respectively. This formalization of the function lends more credence to the idea that QuickSort is a generic sorting algorithm.

  Multiple optimizations have been proposed for this algorithm, including using insertion sort for small enough array segments, different ways of choosing the pivot, etc., yet the biggest problem was always the optimal way in which to partition the data. The original algorithm chose the pivot as the last value in the input array and the average complexity was O(n log n), but the worst case scenario was O(n^2), when the array was already sorted and the pivot was the largest value. Without extra information you can never find the optimal partitioning schema (which would be to choose the median value of all items in the array segment you are sorting).

  But what if we turn QuickSort on its head? Instead of providing a formalized comparison function and fumbling to get the best partition, why not provide a partitioning function (from which a comparison function is trivial to obtain)? This would allow us to use the so called distribution based sorting algorithms (as opposed to comparison based ones) like Radix, BurstSort, etc, which have a complexity of O(n) in a generic way!

  My proposal for a formal signature of a partitioning function is partitionKey(item,level) returning a byte (0-255) and the sorting algorithm would receive this function and a maximum level value as parameters.

  Let's see a trivial example: an array of values [23,1,31,0,5,26,15] using a partition function that would return digits of the numbers. You would use it like sort(arr,partFunc,2) because the values are two digits numbers. Let's explore a naive Radix sorting:

  • assign 256 buckets, one for each possible value of the partition function result, and start at the maximum (least significant) level
  • put each item in its bucket for the current level
  • concatenate the buckets
  • decrease the level and repeat the process

Concretely:

  • level 1: 23 -> bucket 3, 1 -> 1, 31 -> 1, 0 -> 0, 5 -> 5, 26 -> 6, 15 -> 5 results in [0,1,31,23,5,15,26]
  • level 0: 0 -> 0, 1 -> 0, 31 -> 3, 23 -> 2, 5 -> 0, 15 -> 1, 26 -> 2 results in [0,1,5,15,23,26,31]

Array sorted. Complexity is O(n * k) where k is 2 in this case and depends on the type of values we have, not on the number of items to be sorted!
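
  The same walk-through can be sketched in code. The function names digitKey and naiveRadixSort are invented for this illustration, and digitKey assumes two-digit non-negative numbers, with level 0 as the tens digit and level 1 as the units digit.

```javascript
// Hypothetical partition function for the example: decimal digits of a
// two-digit number, level 0 = tens (most significant), level 1 = units.
function digitKey(item, level) {
  return level === 0 ? Math.floor(item / 10) % 10 : item % 10;
}

// Naive radix sort, starting from the least significant level.
function naiveRadixSort(items, partitionKey, maxLevel) {
  let current = items.slice();
  for (let level = maxLevel - 1; level >= 0; level--) {
    // one bucket for each possible byte value
    const buckets = Array.from({ length: 256 }, () => []);
    // put each item in its bucket for the current level
    for (const item of current) buckets[partitionKey(item, level)].push(item);
    // concatenating the buckets preserves the order from the previous level
    current = buckets.flat();
  }
  return current;
}

const digitsSorted = naiveRadixSort([23, 1, 31, 0, 5, 26, 15], digitKey, 2);
console.log(digitsSorted);
```

  Only 10 of the 256 buckets are ever used here, because the example function returns decimal digits rather than full bytes.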

  More complex distribution sorting algorithms, like BurstSort, optimize their function by using a normal QuickSort in small enough buckets. But QuickSort still requires an item comparison function. Well, it is easy to infer: if partFunc(item1,0) is smaller or larger than partFunc(item2,0) then item1 is smaller or larger than item2. If the partition function values are equal, then increase the level and compare partFunc(item1,1) to partFunc(item2,1).
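
  A possible way to derive that comparison function is sketched below. The helper name compareFromPartition and the sample twoByteKey function (high byte, then low byte of a 16 bit integer) are assumptions made for the example.

```javascript
// Builds a comparison function out of a partition function.
// compareFromPartition is a hypothetical helper name.
function compareFromPartition(partFunc, maxLevel) {
  return (item1, item2) => {
    for (let level = 0; level < maxLevel; level++) {
      const diff = partFunc(item1, level) - partFunc(item2, level);
      // the first level that differs decides the order
      if (diff !== 0) return diff < 0 ? -1 : 1;
    }
    return 0; // equal on every level
  };
}

// sample partition function: high byte, then low byte of a 16 bit integer
const twoByteKey = (item, level) => (level === 0 ? item >> 8 : item & 255);

const compareInts = compareFromPartition(twoByteKey, 2);
const viaCompare = [513, 2, 65534, 256, 2].sort(compareInts);
console.log(viaCompare);
```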

  In short, any distribution sorting algorithm can be used in a generic way provided the user gives it a partitioning function with a formalized signature and a maximum level for its application.

  Let's see some example partitioning functions for various data types:

  • integers from 0 to N - maximum level is the number of bytes needed to represent N (about log256(N), rounded up) and the partition function will return the bytes in the integer from the most significant to the least
    • ex: 65534 (0xFFFE) would return 255 (0xFF) for level 0 and 254 (0xFE) for level 1. 26 would return 0 and 26 for the same levels.
  • integers from -N to N - similarly, one could return 0 or 1 for level 0 if the number is negative or positive or return the bytes of the equivalent positive numbers from 0 to 2N 
  • strings that have a maximum length of N - maximum level would be N and the partition function would return the value of the character at the same position as the level
    • ex: 'ABC' would return 65, 66 and 67 for levels 0,1,2.
  • decimal or floating point or real values - more math intensive functions can be found, but a naive one would be to use a string partitioning function on the values turned to text with a fixed number of digits before and after the decimal separator.
  • date and time - easy to turn these into integers, but also one could just return year, month, day, hour, minute, second, etc based on the level
  • tuples of any of the types above - return the partition values for the first item, then the second and so on and add their maximum levels
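
  A few of these could look like the sketch below. All the names are illustrative, and the value ranges (16 bit integers, character codes under 256, years between 1900 and 2155) are assumptions chosen so that every level fits in a byte.

```javascript
// All names here are illustrative; none come from an existing library.

// integers from 0 to 65535: two levels, most significant byte first
const uint16Key = (n, level) => (level === 0 ? (n >> 8) & 255 : n & 255);

// integers from -32768 to 32767: shift into 0..65535, then reuse uint16Key
const int16Key = (n, level) => uint16Key(n + 32768, level);

// strings: character code at position `level`, or 0 past the end
// (assumes character codes below 256)
const stringKey = (s, level) =>
  level < s.length ? s.charCodeAt(level) & 255 : 0;

// dates: year, month, day by level (assumes years 1900..2155)
const dateKey = (d, level) =>
  [d.getFullYear() - 1900, d.getMonth() + 1, d.getDate()][level];

// tuples: chain the levels of each part, one part after the other;
// parts is a list of { key, levels, get } descriptors
function tupleKey(parts) {
  return (item, level) => {
    for (const part of parts) {
      if (level < part.levels) return part.key(part.get(item), level);
      level -= part.levels;
    }
    return 0;
  };
}

// usage: order by a 3 character code, then by a 16 bit quantity (5 levels)
const orderKey = tupleKey([
  { key: stringKey, levels: 3, get: (o) => o.code },
  { key: uint16Key, levels: 2, get: (o) => o.quantity },
]);
```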

  One does not have to invent these functions, they would be provided to the user based on standard types in code factories. Yet even these code factories will be able to encode more information about the data to be sorted than mere comparison functions. For example, the minimum and maximum values can be computed by going through all the values in the array to be sorted, but why do that if the user already has this information?

  Assuming one cannot find a fixed length to the values to be sorted on, like real values or strings of any length, consider this type of sorting as a first step to order the array as much as possible, then using something like insertion or bubble sort on the result.

Finding a value or computing distinct values

  As an additional positive side effect, there are other processes on lists of items that are considered generic because they use a formalized form function as a parameter. Often found cases include finding the index of an item in a list equal to a given value (thus determining if the value exists in a list) and getting the distinct values from an array. They use an equality function as a parameter which is formalized as returning true or false. Of course, a comparison function could be used, depending on if its result is 0 or not, but a partitioning function can also be used to determine equality, if all of the bytes returned on all of the levels are equal.
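
  To make the equality idea concrete, here is a small sketch. The names equalsFromPartition, indexOfBy and lowerKey are invented for the example, and the sample data assumes short ASCII strings.

```javascript
// Equality derived from a partition function: two items are equal when
// the bytes returned on every level match. Names are illustrative.
function equalsFromPartition(partFunc, maxLevel) {
  return (a, b) => {
    for (let level = 0; level < maxLevel; level++) {
      if (partFunc(a, level) !== partFunc(b, level)) return false;
    }
    return true;
  };
}

// generic search that only needs the partition function
function indexOfBy(items, value, partFunc, maxLevel) {
  const equals = equalsFromPartition(partFunc, maxLevel);
  return items.findIndex((item) => equals(item, value));
}

// example: case-insensitive keys for strings of at most 8 characters
const lowerKey = (s, level) =>
  level < s.length ? s.toLowerCase().charCodeAt(level) & 255 : 0;

const foundAt = indexOfBy(['Ann', 'Bob', 'Cid'], 'BOB', lowerKey, 8);
console.log(foundAt);
```

  Note how the partition function decides what "equal" means: here 'Bob' and 'BOB' match because the keys ignore case.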

  But there is more. The format of the partition function can be used to create a hash set of the values, thus reducing the complexity of the search for a value from O(n) to O(log n) and that of getting distinct values from O(n^2) to O(n log n)!

  In short, all operations on lists of items can be brought together and optimized by using the same format for the function that makes them "generic": that of a partitioning function.

Conclusion

  As you can see, I am rather proud of the concepts I've explained here. Preliminary tests in JavaScript show a 20 fold improvement in performance for ten million items when using RadixSort over the default sort. I would really love feedback from someone who researches algorithms and can even test these assumptions under benchmark settings. Complex as they are, I will probably write multiple posts on the subject, trying to split it (partition it?) into easily digestible bits.

 The concept of using a generic partitioning function format for operations on collections is a theoretical one at the moment. I would love to collaborate with people to get this to production level code, perhaps taking into account advanced concepts like minimizing cache misses and parallelism, not only the theoretical complexity.

 More info and details at Towards generic high performance sorting algorithms

Intro

  I want to examine together with you various types of sort algorithms and the tricks they use to lower the magic O number. I reach the conclusion that high performance algorithms that are labeled as specific to a certain type of data can be made generic or that the generic algorithms aren't really that generic either. I end up proposing a new form of function that can be fed to a sorting function in order to reach better performance than the classic O(n*log(n)). Extra bonus: finding distinct values in a list.

Sorting

  But first, what is sorting? Given a list of items that can be compared to one another as lower or higher, return the list in the order from lowest to highest. Since an item can be any type of data record, to define a generic sorting algorithm we need to feed it the rules that make an item lower than another and that is called the comparison function. Let's try an example in JavaScript:

  // random integer from start to end inclusive
  function rand(start, end) {
    return Math.floor(start + Math.random() * (end - start + 1));
  }
  
  // measure time taken by an action and output it in console
  let perfKey = 0;
  function calcPerf(action) {
    const key = perfKey++;
    performance.mark('start_' + key);
    action();
    performance.mark('end_' + key);
    const measure = performance.measure('measure_' + key, 'start_' + key, 'end_' + key);
    console.log('Action took ' + measure.duration);
  }
  
  // change this based on how powerful the computer is
  const size = 10000000;
  // the input is a list of size 'size' containing random values from 1 to 50000
  const input = [];
  for (let i = 0; i < size; i++)
    input.push(rand(1, 50000));
  
  // a comparison function between two items a and b
  function comparisonFunction(a, b) {
    if (a > b)
      return 1;
    if (a < b)
      return -1;
    return 0;
  }
  
  const output = [];
  // copy input into output, then sort it using the comparison function
  // same copying method will be used for future code
  calcPerf(() => {
    for (let i = 0; i < size; i++)
      output.push(input[i]);
    output.sort(comparisonFunction);
  });

  It's not the crispest code in the world, but it's simple to understand:

  • calcPerf measures the time an action takes and logs it to the console
  • start by creating a big array of random numbers as input
  • copy the array in a result array and sort that with the default browser sort function, to which we give the comparison function as an argument
  • display the time it took for the operation.

  This takes about 4500 milliseconds on my computer.

  Focus on the comparison function. It takes two items and returns a number that is -1, 0 or 1 depending on whether the first item is smaller, equal or larger than the second. Now let's consider the sorting algorithm itself. How does it work?

  A naïve way to do it would be to find the smallest item in the list, move it to the first position in the array, then continue the process with the rest of the array. This would have a complexity of O(n^2). If you don't know what the O complexity is, don't worry, it just provides an easy to spell approximation of how the amount of work would increase with the number of items in the input. In this case, 10 million records, squared, would lead to 100 trillion operations! That's not good.
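
  For illustration only, the naïve approach could be written like the sketch below, with a counter added to show how fast the number of comparisons grows. The name selectionSort is mine and the counter is there just for the demonstration.

```javascript
// Naive sort: repeatedly move the smallest remaining item to the front.
// Returns the sorted copy and the number of comparisons performed.
function selectionSort(items, compare) {
  const result = items.slice();
  let comparisons = 0;
  for (let i = 0; i < result.length - 1; i++) {
    let min = i;
    for (let j = i + 1; j < result.length; j++) {
      comparisons++;
      if (compare(result[j], result[min]) < 0) min = j;
    }
    [result[i], result[min]] = [result[min], result[i]];
  }
  return { result, comparisons };
}

const naive = selectionSort([5, 3, 1, 4, 2], (a, b) => a - b);
console.log(naive.result, naive.comparisons);
```

  The comparison count is always n*(n-1)/2, so for 5 items it is 10, and it grows with the square of the number of items.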

  Other algorithms are much better, bringing the complexity to O(n*log(n)), so assuming base 10, around 70 million operations. But how do they improve on this? Surely in order to sort all items you must compare them to each other. The explanation is that if a<b and b<c you do not need to compare a to c. And each algorithm tries to get to this in a different way.

  However, the basic logic of sorting remains the same: compare all items with a subset of the other items.

Partitioning

  A very common and recommended sorting algorithm is QuickSort. I am not going to go through the entire history of sorting algorithms and what they do, you can check that out yourself, but I can focus on the important innovation that QuickSort added: partitioning. The first step in the algorithm is to choose a value out of the list of items, called the pivot, which the algorithm hopes is as close as possible to the median value, then arrange the items in two groups called partitions: the ones smaller than the pivot and the ones larger than or equal to it. Then it proceeds to do the same to each partition until the partitions are small enough to be sorted by some other sort algorithm, like insertion sort (which older versions of Chrome used for short arrays).

  Let's try to do this manually in our code, just the very first run of the step, to see if it improves the execution time. Lucky for us, we know that the median is around 25000, as the input we generated contains random numbers from 1 to 50000. So let's copy the values from input into two output arrays, then sort each of them. The sorted result would be reading from the first array, then from the second!

  // two output arrays, one for numbers below 25000, the other for the rest
  const output1 = [];
  const output2 = [];
  const pivot = 25000;
  
  calcPerf(() => {
    for (let i = 0; i < size; i++) {
      const val = input[i];
      if (comparisonFunction(val, pivot) < 0)
        output1.push(val);
      else
        output2.push(val);
    }
    // sorting smaller arrays is cheaper
    output1.sort(comparisonFunction);
    output2.sort(comparisonFunction);
  });

  Now, the performance is slightly better. If we do this several times, the time taken would get even lower. The partitioning of the array by an operation that is essentially O(n) (we just go once through the entire input array) reduces the comparisons that will be made in each partition. If we would use the naïve sorting, partitioning would reduce n^2 to n+(n/2)^2+(n/2)^2 (once for each partitioned half), thus n+n^2/2. Each partitioning almost halves the number of operations!

  So, how many times can we halve the number of operations? Imagine that we do this with an array of distinct values, from 1 to 10 million. In the end, we would get to partitions of just one element and that means we did a log2(n) number of operations and for each we added one n (the partitioning operation). That means that the total number of operations is... n*log(n). Each algorithm gets to this in a different way, but at the core of it there is some sort of partitioning, that b value that makes comparing a and c unnecessary.

  Note that we treated the sort algorithm as "generic", meaning we fed it a comparison function between any two items, as if we didn't know how to compare numbers. That means we could have used any type of data as long as we knew the rule for comparison between items.

  There are other types of sorting algorithms that only work on specific types of data, though. Some of them claim a complexity of O(n)! But before we get to them, let's make a short detour.

Distinct values intermezzo

  Another useful operation with lists of items is finding the list of distinct items. From [1,2,2,3] we want to get [1,2,3]. To do this, we often use something called a trie, a tree-like data structure that is used for quickly finding if a value exists or not in a list. It's the thing used for autocorrect or finding a word in a dictionary. It has an O(log n) complexity in checking if an item exists. So in a list of 10 million items, it would take maybe 20 operations to find the item exists or not. That's amazing! You can see that what it does is partition the list down to the item level.

  Unfortunately, this only works for numbers and strings and such primitive values. If we want to make it generic, we need to use a function that determines when two items are equal and then we use it to compare to all the other items we found as distinct so far. That makes using a trie impossible.

  Let me give you an example: we take [1,1,2,3,3,4,5] and we use an externally provided equality function:

  • create an empty output of distinct items
  • take first item (1) and compare with existing distinct items (none)
  • item is not found, so we add it to output
  • take next item (1) and compare with existing distinct items (1)
  • item is found, so we do nothing
  • ...
  • we take the last item (5) and compare with existing items (1,2,3,4)
  • item is not found, so we add it to the output

  The number of operations that must be taken is the number of total items multiplied by the average number of distinct items. That means that for a list of already distinct values, the complexity is O(n^2). Not good: it increases quadratically with the number of items! And we cannot use a trie unless we have some function that would provide us with a distinctive primitive value for an item. So instead of an equality function, we would need a hashing function that returns a number or maybe a string.
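
  The steps listed above translate almost directly into code. This is just a sketch and distinctByEquality is a name made up for it.

```javascript
// Generic distinct using only an equality function.
// Every item is compared with all distinct items found so far,
// which is quadratic when the values are already distinct.
function distinctByEquality(items, equals) {
  const result = [];
  for (const item of items) {
    if (!result.some((seen) => equals(seen, item))) result.push(item);
  }
  return result;
}

const distinctSample = distinctByEquality(
  [1, 1, 2, 3, 3, 4, 5],
  (a, b) => a === b
);
console.log(distinctSample);
```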

  However, given the knowledge we have so far, we can reduce the complexity of finding distinct items to O(n*log(n))! It's as simple as sorting the items, then going through the list and sending to output an item when different from the one before. One little problem here: we need a comparison function for sorting, not an equality one.
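
  A minimal sketch of that idea, assuming a comparison function is available (distinctBySorting is an illustrative name):

```javascript
// Distinct values by sorting first, then skipping consecutive duplicates.
function distinctBySorting(items, compare) {
  const sorted = items.slice().sort(compare);
  const result = [];
  for (let i = 0; i < sorted.length; i++) {
    // output an item only when it differs from the one before it
    if (i === 0 || compare(sorted[i - 1], sorted[i]) !== 0) {
      result.push(sorted[i]);
    }
  }
  return result;
}

const distinctSorted = distinctBySorting([3, 1, 2, 3, 1, 5, 4], (a, b) => a - b);
console.log(distinctSorted);
```

  Unlike the equality based version, the output comes out sorted rather than in the order of first appearance.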

So far

  We looked into the basic operations of sorting and finding distinct values. To be generic, one has to be provided with a comparison function, the other with an equality function. However, if we had a comparison function available, finding distinct generic items would become significantly less complex by using sorting. Sorting is better than comparing every item with every other item because it uses partitioning as an optimization trick.

Breaking the n*log(n) barrier

  As I said above, there are algorithms that claim a much better performance than n*log(n). One of them is called RadixSort. BurstSort is a version of it, optimized for strings. CountSort is a similar algorithm, as well. The only problem with Radix type algorithms is that they only work on numbers or recursively on series of numbers. How do they do that? Well, since we know we have numbers to sort, we can use math to partition the lot of them, thus reducing the cost of the partitioning phase.

  Let's look at our starting code. We know that we have numbers from 1 to 50000. We can also find that out easily by going once through all of them and computing the minimum and maximum value. O(n). We can then partition the numbers by their value. BurstSort starts with a number of "buckets" or lists, then assigns numbers to the buckets based on their value (dividing the value by the number of buckets). If a bucket becomes too large, it is "burst" into another number of smaller buckets. In our case, we can use CountSort, which simply counts each occurrence of a value in an ordered array. Let's see some code:

  // note: rename output if the first example lives in the same scope
  const output = [];
  const buckets = [];
  calcPerf(() => {
    // for each possible value add a counter
    for (let i = 1; i <= 50000; i++)
      buckets.push(0);
    // count all values (input indexes run from 0 to size - 1)
    for (let i = 0; i < size; i++) {
      const val = input[i];
      buckets[val - 1]++;
    }
    // create the output array of sorted values
    for (let i = 1; i <= 50000; i++) {
      const counter = buckets[i - 1];
      for (let j = 0; j < counter; j++)
        output.push(i);
    }
  });

  This does the following:

  • create an array from 1 to 50000 containing zeros (these are the count buckets)
  • for each value in the input, increment the bucket for that value (at that index)
  • at the end just go through all of the buckets and output the index as many times as the value in the bucket shows

  This algorithm generated a sorted output array in 160 milliseconds!

  And of course, it is too good to be true. We used a lot of a priori knowledge:

  • min/max values were already known
  • the values were conveniently close together integers so we can use them as array indexes (an array of size 50000 is not too big)

  I can already hear you sigh "Awwh, so I can't use it!". Do not despair yet!

  The Radix algorithm, that is used only for numbers, is also used on strings. How? Well, a string is reducible to a list of numbers (characters) so one can recursively assign each string into a bucket based on the character value at a certain index. Note that we don't have to go through the entire string, the first few letters are enough to partition the list in small enough lists that can be cheaply sorted.

  Do you see it yet?

A generic partition function

  What if we would not use an equality function or a comparison function or a hashing function as a parameter for our generic sort/distinct algorithm? What if we would use a partition function? This partition function would act like a multilevel hashing function returning values that can also be compared to each other. In other words, the generic partition function could look like this:

  function partitionFunction(item, level) returning a byte

  For strings it returns the numeric value of the character at position level or 0. For numbers it returns the high to low byte in the binary representation of the number. For object instances with multiple properties, it would return a byte for each level in each of the properties that we want to order by. Radix style buckets would use the known values from 0 to 255 (the partition function returns a byte). The fact that the multilevel partitioning function is provided by the user means we can pack in it all the a priori knowledge we have, while keeping the sorting/distinct algorithm unchanged and thus, generic! The sorting will be called by providing two parameters: the partitioning function and the maximum level to which it should be called:

  sort(input, partitioningFunction, maxLevel)
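
  As an illustration for object instances, a partition function might look like the sketch below. The record shape (lastName and age), the choice of four characters for the name and the helper keyBytes are all assumptions made for the example.

```javascript
// Hypothetical record type: { lastName: string, age: number (0..255) }.
// Levels 0..3: first four characters of lastName (0 past the end);
// level 4: age. Assumes character codes below 256.
function personKey(person, level) {
  if (level < 4) {
    return level < person.lastName.length
      ? person.lastName.charCodeAt(level) & 255
      : 0;
  }
  if (level === 4) return person.age & 255;
  return 0;
}
const personMaxLevel = 5;

// The bytes for all levels act like a multilevel hash that can be compared.
const keyBytes = (item) =>
  Array.from({ length: personMaxLevel }, (_, level) => personKey(item, level));

console.log(keyBytes({ lastName: 'Abe', age: 30 }));
```

  With such a function the call would be sort(people, personKey, personMaxLevel), ordering by the start of the last name and then by age.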

A final example

  Here is an implementation of a radix sorting algorithm that receives a multilevel partitioning function using our original input. Note that it is written so that it is easily read and not for performance:

  // will return a sorted array from the input array
  // using the partitioning function up to maxLevel
  function radixSort(input, partitioningFunction, maxLevel) {
    // one bucket for each possible value of the partition function
    let buckets = Array.from({length: 256}, () => []); 
    buckets[0] = input;
    // reverse order, because level 0 should be the most significant
    for (let level = maxLevel-1; level >=0; level--) {
      // for each level we re-sort everything in new buckets
      // but taken in the same order as the previous step buckets
      let tempBuckets = Array.from({length: 256}, () => []);
      for (let bucketIndex = 0; bucketIndex < buckets.length; bucketIndex++) {
        const bucket = buckets[bucketIndex];
        const bucketLength = bucket.length;
        for (let bucketOffset = 0; bucketOffset < bucketLength; bucketOffset++) {
          const val = bucket[bucketOffset];
          const partByte = partitioningFunction(val, level);
          tempBuckets[partByte].push(val);
        }
      }
      // we replace the source buckets with the new ones, then repeat
      buckets = tempBuckets;
    }
    // combine all buckets in an output array
    const output = [].concat(...buckets);
    return output;
  }

  // return value bytes, from the most significant to the least
  // the values are all <50000, so they always fit in 2 bytes
  // (the largest 2 byte value is 0xFFFF, which is 65535)
  function partitioningFunction(item, level) {
    if (level === 0) return item >> 8;
    if (level === 1) return item & 255;
    return 0;
  }
  
  let output3 = [];
  calcPerf(() => {
    output3 = radixSort(input, partitioningFunction, 2);
  });

Want to know how long it took? 1300 milliseconds!

You can see how the same kind of logic can be used to find distinct values, without actually sorting, just by going through each byte from the partitioning function and using them as values in a trie, right?
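
As a rough sketch of that idea (not a tuned implementation), the bytes returned by the partitioning function can be used as keys in a trie built from nested Map objects. The names trieDistinct and twoBytePartition are made up for this illustration, and the sketch assumes that two items are equal exactly when all their bytes up to maxLevel are equal.

```javascript
// returns the distinct items of the input, in order of first appearance,
// using the bytes from the partitioning function as a path in a trie
function trieDistinct(input, partitioningFunction, maxLevel) {
  const root = new Map();
  const output = [];
  for (const item of input) {
    let node = root;
    let isNew = false;
    for (let level = 0; level < maxLevel; level++) {
      const partByte = partitioningFunction(item, level);
      let child = node.get(partByte);
      if (child === undefined) {
        // this path was never walked before, so the item is new
        child = new Map();
        node.set(partByte, child);
        isNew = true;
      }
      node = child;
    }
    if (isNew) output.push(item);
  }
  return output;
}

// same two byte partitioning as in the sorting example
function twoBytePartition(item, level) {
  if (level === 0) return item >> 8;
  if (level === 1) return item & 255;
  return 0;
}

console.log(trieDistinct([300, 5, 300, 261, 5], twoBytePartition, 2)); // [300, 5, 261]
```

No comparison, equality or hashing function is involved: the partitioning function alone decides whether a value was seen before.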

Conclusion

Here is how a generic multilevel partitioning function replaces comparison, equality and hashing functions with a single concept that is then used to get high performance from common data operations such as sorting and finding distinct values.

I will want to work on formalizing this and publishing it as a library or something like that, but until then, what do you think?

Wait, there is more

There is a framework in which something similar is being used: SQL. It's the most common place where ORDER BY and DISTINCT are used. In SQL's case, we use an optimization method that uses indexes, which are tree data structures (typically B-trees rather than tries) storing the keys that we want to order or filter by. Gathering the data to fill a database index also has its complexity. In this case, we pre-partition once and we sort many times. It's another way of reducing the cost of the partitioning!

However, this is just a sub-type of the partition function I am talking about, one that uses a precomputed data structure to reach its goal. The multilevel partition function concept I am describing here may be pure code or some other encoding of information we know out of hand before doing the operation.

Finally, the complexity. What is it? Well, instead of O(n*log(n)) we get O(n*k), where k is the maximum level used in the partition function. This depends on the data, so it's not a constant, but it's the closest to the theoretical limit for sorting, closer to O(n) than the classic log version. In our example, k was also a log, but its base was 256, not 2 as is usually assumed for comparison based sorting.
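
To put a number on k, here is a small sketch that counts how many byte levels are needed for a given maximum value. The helper name levelsNeeded is made up for this illustration, and it assumes non-negative integers.

```javascript
// number of byte levels (base 256 digits) needed for values up to maxValue
// assumption: maxValue is a non-negative integer
function levelsNeeded(maxValue) {
  let levels = 1;
  while (maxValue >= 256) {
    maxValue = Math.floor(maxValue / 256);
    levels++;
  }
  return levels;
}

console.log(levelsNeeded(255));   // 1
console.log(levelsNeeded(49999)); // 2
console.log(levelsNeeded(65536)); // 3
```

For values under 50000, as in the example, k is 2 no matter how many items are being sorted.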

I am not the best algorithm and data structure person, so if you have ideas about it and want to help me out, I would be grateful.  


Why this article should never have been written

  It's a bit too early to do this. I am sure no one in their right mind would want to display any non-positive words about George Floyd at this moment for fear of reprisals. But I feel like that is exactly what must be done. If Floyd was an innocent victim, a hero that overcame his background only to be brought down, literally, by the heavy boot of law enforcement, then what chance do normal people have?

  I didn't want to be writing about this. I think that the entire thing has become grotesque and the only thing that could now bring these Whites and Blacks together, corrupt police and gangstas, racists and anti-racists... is me! I am sure that if I were to enter the argument, all these people angrily hating each other would come together in trying to murder me. Because while I understand both sides, I can't fully endorse any of them. Racists are just dumb. What the hell does race matter in anything? But I do understand anti-anti-racists, the ones that hate being put together with assholes because of the color of their skin. Anti-racism protesters are dumb. Well, maybe. I am sure all of this protesting is finally going to have an impact, and this is good, but have you seen some of these people? Manically jumping on toppled statues and roaring in megaphones about how great they are because they oppose evil. In this whole discussion, again, normal people are left out. They are boring, they don't clump together, they don't stand out, each one has their own slightly different opinion. Well, this is mine.

The gentle giant saint versus the black monster

  Something happened today that pushed me to write this post. I saw a Facebook post that detailed the criminal record of George Floyd. Cocaine dealing, two armed robberies, one of which put him away for four years, addiction and, when he was arrested, methamphetamine and fentanyl in his blood and the incriminating fake twenty dollar bill. Was it true? It is a very important question to ask, because many of these things are complete bullshit. So I googled it. And the result was actually worse: almost nothing!

  There are just two websites that actually advertise Floyd's criminal record: Great Game India - self-titled "Journal on Geopolitics and International Relations" and sporting articles like "Coronavirus Bioweapon - How China Stole Coronavirus From Canada and Weaponized It" and "How A Pornstar & Sci-Fi Writer Influenced WHO Policies On Hydroxychloroquine With Fake Data" - and The Courier Daily, which seems legit. Funny though, when you search for "George Floyd criminal record" you get Game India first and not The Daily Mail, which is linked in their article and which actually did the research and published the court documents attesting to that. They are fifth on the search page. Moreover, during the writing of this blog post, the Courier Daily link disappeared from my Google search and Game India was demoted to second place, with a "gentle giant" story on top instead.

  Either way, other than Game India, no other news outlet even phrases its title so as to indicate George had been a criminal. The few who tackle the subject: The Star, The Daily Mail itself and even The Courier Daily, just portray the man as a flawed individual who nevertheless was set to change, found religion and even moved to Minneapolis to escape his past. And I agree with this viewpoint, because as far as I can see, the armed robbery had been many years before and the man had changed, in both behavior and intent. But hiding this doesn't really help. The Daily Mail article was published on the 26th of May, one day after Floyd's death, and the information therein is either not discussed or spun into a "gentle giant" narrative. He was a bouncer before the Coronavirus hit and he lost his job. George the gentle bouncer?

  One thing is certain, when you search for George's criminal record, it's more likely you get to articles about the criminal records of the arresting officers or Mark Wahlberg's hate crimes than what you actually searched for.

How did George die and why it doesn't matter

  But there is more. How did George die? You would say that having a knee on their throat while they gasp for air saying "I can't breathe" would be enough. But it's not. Several different reports say different things. The first one preliminarily portrays Floyd as a very sick man: coronary artery disease, hypertensive heart disease, even Covid-19. There were "no physical findings that support a diagnosis of traumatic asphyxia or strangulation", but instead they diagnosed it as a heart failure under stress and multiple intoxicants. Finally, two days later, the report admits "a cardiopulmonary arrest while being restrained" by officers who had subjected Floyd to "neck compression". But Floyd's family would not have that, so they commissioned their own autopsy. The result? Floyd died from "asphyxia due to compression of the neck", affecting "blood flow and oxygen going into the brain", and also from "compression of the back, which interferes with breathing". The medical examiner said Floyd had no underlying medical problem that caused or contributed to his death.

  So which was it? It doesn't really matter. He died with a knee on his neck, which should never happen to anyone, and both reports admit it was a homicide. But ignoring all of these other data points doesn't help. People just vilify the policeman and raise George to saintly status. You want to solve something? Start with the truth. All of it. Now both sides have the ammunition required to never change their minds.

  I have not found any article that makes a definitive claim on which report is the correct one, if any. They all lean towards believing the second, because it fits, but if the first one was a complete fabrication, why wasn't anyone charged for it?

Wikipedia v. Facebook

  So of course I would find out about Floyd's criminal past from Facebook. It makes so much sense. It is a pool of hateful bile and rank outrage that brings ugly right up to the surface. But this time it pointed me towards an interesting (albeit short) investigation. Without it, I would have swallowed up the entire completely innocent victim narrative that is pushed on every media outlet. So, once in a blue moon, even Facebook is good for something.

  As you may have noticed above, I took some information from Wikipedia, which has an entire article dedicated to George Floyd's death. It is there that the information about his two medical autopsies is also published. On George Floyd's page, his early life consists of: basketball, football, his family calling him a gentle giant. Then he customized cars, did some rap and was an informal community leader. Only then did he get arrested a few times, then put in jail for five years. He was charged in 2007, went to jail in 2009 and was released in 2013. It's just six years and it does not define a man, but try to say that to a police officer who has just read the fact sheet on his cruiser's terminal and has to arrest a 1.93m tall intoxicated man.

  And you may want to read the entire chain of events. The police didn't just put him on the ground, they talked to him, they put him in their car, they fought, they pulled him out, all while being filmed and surrounded by a crowd.

You will never gonna get it

  How much of this is truth and how much of it is spin? You will never know. There are so many people that have to justify their own shit using carefully chosen bits and pieces from this that there will never be a truthful image of who George Floyd was and what happened to him. He is now more than a man and also much less: he is a symbol, rallying people to cry out against injustice, as they see it. The greatest thing George Floyd ever did was die and after that he stopped being human. How sad is that?

  In truth, he was a flawed man. He was not perfect. He was your everyman. A policeman casually killing him while getting filmed doing it hurts on so many levels because that could be you. That was you or your friend or your relative somewhere. But no, they had to make it about being black and being the gentle giant and being killed by the bad psycho cop and his cronies. It sounds like a Hollywood movie because it is scripted as one. You can be certain that at this point several documentaries and movies are in the works about this. And when you see them, a big time celebrity will be playing Floyd and the cop will be played by that actor who plays psychos in every other movie because he has that face. Once you go black, you can never go back.

  I am not defending anyone here. As I said in the beginning, I am on nobody's side in this. I just hope no one will knee me or my friends to death while everybody films it.

The world has spoken

  I find it amazing that the protests in Minneapolis have spread to the entire world. It makes me hope that they will slowly turn into protests about things that matter even more than the color of one's skin, like our responsibility as a community to carefully choose our guardians, like having to think for ourselves if something is right or wrong and maybe doing something about it. George Floyd was killed slowly, over nine minutes, while people stood around and filmed it. Not just the other officers, but civilian bystanders, too.

  There were people who did something. At one point a witness said: "You got him down. Let him breathe." Another pointed out that Floyd was bleeding from the nose. Another told the officers that Floyd was "not even resisting arrest right now". Yet another said "Get him off the ground ... You could have put him in the car by now. He's not resisting arrest or nothing. You're enjoying it. Look at you. Your body language explains it." But that's all they did. Wikipedia calls them "witnesses", but you have to wonder: what skin color were they? Were they afraid they would be next and that's why all they could do was beg for George's life as he slowly died? Or did they believe the story that American TV has fed them for decades, that cops are ultimately good people who break the rules in order to protect the innocent? Or maybe a more recent belief had taken hold: that filming injustice makes you a hero and it's more than enough.

  The world has spoken. Racism must go, police brutality must go. Let's not replace them by carefully crafted fantasies, though. Let's see the truth as it is so we can make it better.

2020 is great so far

  I am not being sarcastic. After a virus that punched presidents of the free world and dictators alike in the nose, that made people question their fake feelings of safety and forced them to act, here comes this age of protesting how things are. We have been shaken awake. Will we fall asleep again? I am sure we will, but some things will have changed by then. And the year is not yet over.

and has 1 comment

  I want to write this post to talk about the most common mistake I make as a software developer, even after almost 20 years of experience. And it's not code related. It's more human. But I would also like to hear what you think your biggest mistakes are that are not related to lack of experience. Don't be shy!

  My mistake: assumptions.

  I was assigned this bug recently and, wanting to do a fast job and impress people, I investigated the code, noticed a bug, fixed it, then immediately submitted it for review. I had reasons for doing that, because I was new and did not know the application well. The leads would tell me if they thought I did not find the problem. But, out of the goodness of my heart, you see, I decided to test the fix as well. And I discovered that the feature was barely implemented. It was not a bug, it was a full fuck up.

  What happened here? I assumed a certain quality of the code and expected, without any reasonable evidence, to find a small typo or a logic bug that would be solved by writing a few lines of code. Instead, I had to reimplement the whole thing as a fix, I pissed off the lead devs because they had enough on their plate and made the entire team wonder what I was doing there. I mean, I hadn't even tested the fix!

  Doing things fast means working on valid assumptions that allow you to cut corners. In a new environment, with a team you are not familiar with and a code base you don't understand, you do not have the luxury of making assumptions. Instead, do a good job first: investigate thoroughly, see what the reported problem is, find the actual problem (which may be different), come up with a plan of attack, implement it, then test that it had the desired result. Yes, it takes more time than to quickly compile the logic flow in your head and hope for the best, but in some places you don't get second chances at fixing things, teams are more formal, processes need to be followed. Optimism is also based on assumptions. Be a realist instead.

  In order to win a game you need to know its rules. That applies both to the development process and to local politics, which sometimes are more important than the work. Once you are a good player, you can start experimenting. The title "senior developer" is not given in a vacuum, but is relevant (or not) depending on the environment. Yes, you have experience, but you don't have *this* experience yet. In my desire to be efficient and fast I didn't do anything right and I couldn't believe I had been that stupid.

  Now, how about you? What are your silly mistakes that you can't believe you are still making?

and has 0 comments

  I am writing this post to make people aware of the changes that happen around them because of the Covid pandemic. How easy was it for them to pop up and how many of these "extraordinary measures" will stick with us after we get rid of the virus? 

  I wake up and turn on BBC News. First reporting of the day is the mass graves in the US. Yeah, I was surprised, too... but just a little bit. Mass graves? Aren't those things that happen when people want to kill a whole bunch of other people? Like conflict or ethnic cleansing or whatever the euphemism of the day is for war? And in the US? Home of the brave, the free, the rich and the apathetic? Apparently New York has had a special little island close by to use as the dumping ground for dead people that don't have money or relatives or names. It's been a human garbage bin for 150 years!

  Then I open up YouTube and watch this video that is unrelated to the virus, but at the beginning of the video they talk about "the virus that we cannot name". Apparently YouTube has rules to protect "the truth" by censoring free speech. And yes, it's not the government, it's a private company that pretty much can do whatever it freaking wants, and what it wants is for you to not speak some specific words. Facebook does it, your search engine does it, TV stations do it. I open a news site and I see an article about a conspiracy theorist who was "allowed" to speak on BBC about his dumbass ideas. Ofcom, the media regulator in the UK, also has rules about what people can say or not on TV. The next article is about a movie about xenophobes in an elevator picking on an Asian woman who dares cough. The whole idea of the article was to wonder if the film was "unethical" or if it is too soon for Covid movies. Who gets to decide what is true or not, hurtful or not? This is an older question, but now it's come into contrast.

  I go out and I see a military police car, all armored, with a gun rack on top, police stopping cars passing by to check the papers of the drivers. Every minute or so a patrol car would pass, blinking lights on. And the new rules. We are now at the eighth iteration of a military ordinance telling civilians what to do. Governments have instantly given themselves as much power as possible, some using it to further their own agenda, like the prime minister of Hungary moving quickly to pick on gay people. And yes, Israelis laugh at the world going all bristly about these rules that most believe a bit extreme, regardless of how necessary, because in Israel they have these kinds of rules in their regular constitution. This did not stop Netanyahu from getting even more power and using it immediately when he could. And this without mentioning Trump. There was a scene in the American TV series Homeland where it is asked "What do weak presidents do to appear strong?" "They go to war". But we are not at war.

  How did this happen? How did we reach a point in which the military is telling everybody what to do, corporations and pundits and social pressure tell us what to say and governments get extrajudicial power that they use however they see fit in a time of peace?

  When this whole thing started, the first thing I googled (again, Google) was what to do in case you are infected. And all the pages that came up were about Covid and the official recommendation from both politicians and doctors: stay in, wash your hands, leave masks to the medical professionals, don't self medicate, report immediately if you have symptoms. So I repeated the query, now with -covid so that I could see what people were saying about what to do when you get a virus *before* all of this started. And lo and behold, the advice was completely different: hand washing (or waving) doesn't really do much, do self medicate with anti inflammatory drugs to avoid a cytokine storm, take vitamin D (or generate it by being in the sun) and zinc, vitamins C and E also help, masks help both the sick and the healthy and, most of all, keep hydrated by drinking lots of liquids, preferably warm, like soups. Slowly, the "expert view" is changing back to what people in the field were saying before the declaration of the pandemic.

  This is also important: the definition of pandemic is a disease that is prevalent over a country or the entire world. Practically it became a pandemic from the moment the World Health Organization declared it so. Our belief in the terms that are circulated gives them power. I am not saying that Covid is not prevalent over the world, or that you should not take it seriously, but things only began to move when enough people were convinced that it was real. Before action, an ideological pandemic has to happen. The brutal decisions that are being taken in your name right now are based on your belief in various narratives that may be correct or not. BTW, if I search Google for the same thing now, a search page dedicated to Covid-19 appears.

  The question is not of truth, but of utility. "If it is not useful, who cares if it is true?" is the old adage. But then the question becomes: useful to whom? At first they wanted you to come to the hospital, so that they can have as much information and control as possible and to isolate you, then get all the people you came into contact with and do the same to them. It is good for us all as a community, but not particularly for the sick person who is now confined in haste, perhaps with other people that are infected so they can swap strains, and is being taken care of in medical systems unfit for that job. Wearing masks doesn't do much if you are in an infected place, but it can protect both you and others in more relaxed environments, like public transportation or on the street, not to mention that it's a simple way to remind you not to touch your face. But they were way more useful to medical personnel and so they spun this story where you should not use them unless you know you are sick. When the number of masks and the number of sick increased, the narrative changed to use masks, but don't come to the hospital.

  I've said it before and I will repeat it ad nauseam because it is true, it is important and it is verifiable: the only reason the very deadly pandemic in 1918 was called The Spanish Flu was that Spain was neutral and therefore free to report on people getting sick and dying of a disease. All the other countries were caught in their little World War that killed far fewer people, but that put the military in power to enact censorship. It is also the reason why most people today haven't even heard of the 1918 influenza pandemic. We are not in a declared war right now, but the reaction of authorities all over the world is kind of the same. It's impossible to hide things in this world of social media and non stop global TV networks, right? Wrong. There were news outlets in 1918, too, and they all declared themselves independent. There are a billion films and books and plays about the heroes of WWI. Where are the ones about a virus that killed so many people? Things don't have to be hidden from you, just depicted a little differently than reality in a consistent way. Doesn't the current situation appear similar? And we are not in a war.

  Ask yourself this: what narrative is being spun around you and who does it benefit? Have you looked at the problems you have and actively searched for solutions that were not pushed towards you by others? The truth is out there, but you have to actually look for it. Yes, we need to find a solution to the virus that has spread around the world and kills people, but we are not at war. We have a problem and we have to solve it, that is it. So the next time some solemn guy with a grave face tells you what to do, ask yourself, why the hell is he wearing a uniform? We are not at war.


Intro  

  One of the most asked questions related to the novel coronavirus is "what is the mortality rate of the disease?" And most medical professionals and statisticians will choose not to answer it, because so far the data is not consistent enough to tell. Various countries report things differently, have different testing rates and methods and probably different definitions of what it means to be dead or recovered from Covid-19. To give a perfectly informed answer to this is impossible and that is why the people we look to for answers avoid the question, while people who are not professionals are giving all of the possible answers at the same time. I am not a professional, so I can give my answer and you can either trust my way of thinking or not.

  In order to compute mortality with absolute certainty we need several things:

  • the pandemic has to be over
  • the number of deaths from SARS-Cov-2 has to be exactly known
  • the number of people infected with SARS-Cov-2 has to be exactly known

 Then the answer would be the total number of dead over the total number of infected people (100*dead/infected). During the epidemic, though, people tend to use the numbers they have.

Panic!

 One of the most used formulas is: current number of deaths over the total number of infected so far (100*current deaths/current infected). This formula is wrong! Imagine there are two people, A and B. Both get infected at the same time and no one else gets infected after that. A will die from the disease in a week, B will recover in two weeks. If we use the formula above, for the first week the mortality of the disease is 0, then it becomes 50% after a week and it stays that way until the end. If B were to die, too, the mortality would be computed as 0, then 50, then 100%. As you see, not much accuracy. In the case of Covid-19 the outcome of an infection is not known for three weeks or even more (see below).
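
 The two person thought experiment can be replayed in a few lines of code. This is only a sketch of the scenario described above; the names people, bothDie and naiveMortality are made up for the illustration.

```javascript
// hypothetical two person epidemic from the text:
// A dies after 1 week, B recovers after 2 weeks
const people = [
  { name: 'A', outcome: 'dead', week: 1 },
  { name: 'B', outcome: 'recovered', week: 2 },
];
// the variant in which B dies as well
const bothDie = people.map(p => ({ ...p, outcome: 'dead' }));

// naive formula: 100 * current deaths / current infected
function naiveMortality(group, week) {
  const deaths = group
    .filter(p => p.outcome === 'dead' && p.week <= week).length;
  return 100 * deaths / group.length;
}

for (let week = 0; week <= 2; week++) {
  console.log(`week ${week}:`,
    `${naiveMortality(people, week)}%`,
    `(if B dies too: ${naiveMortality(bothDie, week)}%)`);
}
```

 The computed value only settles once every case has reached its final outcome, which is exactly what we don't have in the middle of an epidemic.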

  But let's use it anyway. Today, the 31st of March 2020, this would be 100*37832/786617 which is 4.8%. This is a large number. Applied to the entire world population, it would yield 362 million deaths.

  Accuracy comes from the finality of an outcome. A dead man stays dead, a recovered one stays recovered. A better formula is current number of deaths over the sum of current number of deaths and current number of recovered (100*current deaths/(current deaths+current recovered)). This eliminates the uncertainty of people who are sick, but still have to either die or live. If we knew with certainty who is infected and who is not, who died from Covid-19 and who recovered, this would actually be pretty accurate, wouldn't it?

  If we use it on Covid-19 today, we have  100*37832/(37832+165890), which gives us an 18.57% mortality rate. "What!? Are you insane? That's a fifth of all people!", you will shout, with immediate thoughts of a fifth of the world population: 1.4 billion people.
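
  For anyone who wants to redo the arithmetic, here are both formulas side by side, using only the figures quoted above for the 31st of March 2020. The variable names are made up for the illustration.

```javascript
// figures quoted in the text for the 31st of March 2020
const deaths = 37832;
const infected = 786617;
const recovered = 165890;

// formula 1: deaths over all known cases
const overCases = 100 * deaths / infected;
// formula 2: deaths over closed cases (dead + recovered)
const overClosed = 100 * deaths / (deaths + recovered);

console.log(overCases.toFixed(1));  // 4.8
console.log(overClosed.toFixed(2)); // 18.57
```

  Same data, same day, and the two results differ by a factor of almost four.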

  So which number is the correct one? Neither. And it all comes from the way we define the numbers we use.

Reality

  OK, I kind of tricked you, I apologize. I can't answer the question of mortality, either. My point is that no one can. We can estimate it, but as you have seen, the numbers will fluctuate wildly. And the numbers above are not the extremes of the interval, not by a long shot. Let's explore that further while I explain why numbers derived from bad data cannot be good data.

  What are the types of data that we have right now?

  • deaths
  • infected (cases)
  • recovered
  • tested
  • total population of an area

  And we can't trust any of these.

Cases/infected

  One cannot confirm an infection without testing, which for most countries (and especially the ones with numerous populations) is really lacking. We know from small countries like Iceland that when you test a significant part of the population, half of the infections show no symptoms. The remaining 50% are on average also experiencing mild symptoms. The number of severe cases that can lead to death is relatively small. The takeaway here is that many more people can be infected than we think, making the actual mortality rate very, very small.

  So, can we use the Iceland data to compute mortality? Yes we can, on the part of the population of Iceland that was tested. We can't use that number for anything else and there are still people that have not been infected there. What is this significant percent of the population that was tested? 3%. 3% right now is considered large. Iceland has a population of 360000, less than the neighbourhood I live in. 3% of that is 10800 people. The largest number of tests have been performed in South Korea, a staggering number of 316664. That's only 0.6% of the total population size.

  But, using formula number 2, mortality from the Iceland data would be 100*2/(2+157), which is 1.26%. Clearly this will get skewed quite a lot if one more person dies, so we can't really say anything about that number other than: yay! smaller than 4.8%!

  We can try on South Korean data: 100*162/(162+5408) which gives us a 2.9% mortality rate.
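
  The same formula can be applied to the Iceland and South Korea figures quoted above, and it also shows how much a single extra death moves the result for a small sample. The function name closedCaseMortality is made up for the illustration.

```javascript
// formula number 2: deaths over closed cases (dead + recovered)
function closedCaseMortality(deaths, recovered) {
  return 100 * deaths / (deaths + recovered);
}

// figures quoted in the text
const iceland = closedCaseMortality(2, 157);
const southKorea = closedCaseMortality(162, 5408);
// Iceland again, if just one more person had died
const icelandPlusOne = closedCaseMortality(3, 157);

console.log(iceland.toFixed(2));    // 1.26
console.log(southKorea.toFixed(1)); // 2.9
console.log(icelandPlusOne);        // 1.875
```

  One additional death takes the Iceland value from about 1.26% to 1.875%, which is why so little can be concluded from such small numbers.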

  Yet, assuming we tested a lot of people, wouldn't that give us useful data to make an accurate prediction? It would, only at this time testing is something of a confusing term.

Testing

  What does testing mean? There are two types of tests: based on antibodies and based on RNA, or molecular tests. One tells you that the body is fighting or has fought an infection, the other is saying that you have virus stuff in your system. The first one is faster and cheaper, the other takes more time, but is more accurate. In all of these, some tests are better than others.

  There were reports that people who were infected before and recovered got reinfected later. This is not how that works. The immune system works by recognizing the intruder and creating antibodies to destroy it. Once your body has killed the virus, you still keep the antibodies and the knowledge of what the intruder is. You are, for at least a number of months, immune to the virus. The length of time for this immunity depends not on how forgetful your immune system is, but on how much the virus mutates and can trick it into believing it is not an intruder. As far as we know, SARS-Cov-2 is relatively stable genetically. That's good. So the reason why people were reported to get reinfected was that they were either false positives when they were detected or false negatives when they were considered recovered or false positives when they were considered reinfected.

  Does it mean we can't trust testing at all and it's all useless? No. It means that at the beginning, especially when everybody was panicking, testing was unreliable. We can start trusting tests now, after they have been used and their efficacy determined in large populations. Remember, though, that the pandemic pretty much started in January and for many countries just recently. It takes time for this reality to become the new normal and for people and technology to work in a "proper way".

  Testing is also the official way of determining when someone has recovered.

Recovered

  It is surprisingly difficult to find out what "recovered" means. There are also some rules, implemented in a spotty way by the giants of the Internet, which determine which web pages are "not fake news", but I suspect that the system filters a lot of the legitimate ones as well. A Quora answer to the question says "The operational definition of “recovered” is that after having tested positive for the virus (you have had it) you test negative twice, 3 days apart. If you test negative, that means that no RNA (well, below a certain threshold) from the virus is found in a nasal or throat swab."

  So if you feel perfectly fine, even after having negative effects, you still have to test negative, then test negative again after three days. That means in order to determine one is recovered, two tests have to be used per person, tests that will not be used to determine infection in people with symptoms or in people who have died. I believe this would delay that kind of determination for quite a while.

  In other words, probably the number of recovered is way behind the number of infected and, obviously, of deaths. This means the mortality has to be lower than whatever we can compute using the currently reported values for recovered people.

Deaths

  Surely the number of dead is something we can trust, right? Not at all. When someone dies, their cause of death is determined in very different ways depending on where they died. In situations where the morgues are overflowing with dead from the pandemic and where doctors are much better used for the sick, you cannot trust the official cause of death! Moreover, on a death certificate you can write multiple causes of death. On average there are about two or three; some have up to 20. And would you really use tests for covid for the dead rather than for the sick or recovered?

  Logically it's difficult to assign a death to a clear little category. If a person dies of a heart attack and tests positive for SARS-Cov-2, is it a heart attack? If someone dies of hunger because they lost their job during the pandemic, is it a Covid-19 death or not? If an 87 year old dies, can you really say which of the dozen medical conditions they were suffering from was the culprit?

  So in some situations the number of deaths associated with Covid-19 will be overwhelmingly exaggerated. This is good. It means the actual mortality rate is lower than what we can determine right now.

Population in an area

  Oh, come on! We know how many people there are in an area. How can you not trust this? Easy! Countries like China and Italy and others have implemented quarantine zones. That means that the total number of people in Italy or China is irrelevant as there are different densities of the contagion in regions of the same territory. Even without restrictive measures, geography and local culture as well as local genetic predispositions will work towards skewing any of the relevant values.

  Yeah, you can trust the number of people in small areas, especially if they are isolated, like Iceland, but then you can't trust those numbers in statistics, because they are not significant. As the virus spreads and more and more people get infected, we will be able to trust the values a little more, as computed over the entire world, but it will all be about estimations that we can't use in specific situations.

Infectiousness

  An important factor that will affect the total number of deaths, rather than the percent of dead over infected, is how infectious Covid-19 really is. Not all people exposed to SARS-Cov-2 will get infected. Not all of them are truly immune: some will be, while others will just be resistant enough to not catch the virus. I need a medical expert to tell me how large this factor is. I personally did not find enough information about this type of interaction (or lack thereof) and I suspect it is a small percent. However, most pessimistic scenarios assume 80% of the world population will get infected at some point. That implies a 20% that will not. If anyone knows more about this, please let me know.

Mortality trends

  There is another thing that has to be mentioned. By default viruses go through the process of attenuation when going through large populations. This is the process by which people with milder symptoms have a larger mobility, therefore they infect more people with their milder strain, while sicker people tend to "fall sick" and maybe die, therefore locking the more aggressive strains away from the general population. In this context (and this context only) quarantines and social distancing are actually bad because they limit the mobility of the milder strains as well as of the aggressive ones. In extreme cases, preventing people from interacting, but then taking severely sick people to hospitals and by that having them infect medical personnel and other people is making the disease stronger.

  However, statistically speaking, I expect the mortality of the virus to slowly decrease in time, meaning that even if we could compute the mortality rate exactly right now, it will be different later on.

  What about local authorities and medical administrators? How do they prepare for this if they can't pinpoint the number of sick and dead? The best strategy is to hope for the best while preparing for the worst. Most politicians, though, live in a fantasy world of their own making where words and authority over others affect what and how things are done. There is also the psychological bias of wanting to believe something so much that you start believing it is probable. I am looking at you, Trump! Basically that's all he does. That being said, there are a lot of people who are doing their job and the best they can do is to estimate based on current data, then extrapolate based on the evolution of the data.

  So here is another piece of data, or rather information, that we have overlooked: the direction in which current data is moving. One of the most relevant is what is called "the peak of the contagion". This is the moment when, for whatever reasons, the number of infected and recovered has reached a point where the virus has difficulties finding new people to infect. The number of daily infections starts to decrease and, if you can't assign this drop to some medical or administrative strategy, you can hope it means the worst is behind you. Mind you, the number of total infected is still rising, but slower. I believe this is the one you should keep your attention on. While the number of daily infected people increases in your area, you are not out of the woods yet. 
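
  A minimal sketch of how one could watch for that moment, in Python. The totals below are invented, the function names are mine, and the three-day smoothing window is an arbitrary choice meant to dampen reporting noise, not an established method.

```python
def daily_new_cases(cumulative):
    """Turn running totals into per-day increases."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]


def moving_average(values, window):
    """Smooth out day-to-day reporting noise."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def consistently_decreasing(cumulative, window=3, days=3):
    """True when the last `days` smoothed daily counts are strictly decreasing."""
    smoothed = moving_average(daily_new_cases(cumulative), window)
    recent = smoothed[-days:]
    return len(recent) == days and all(a > b for a, b in zip(recent, recent[1:]))


# Hypothetical cumulative totals: the total keeps rising, but more slowly.
totals = [10, 30, 70, 150, 250, 340, 410, 460, 495, 515]
print(daily_new_cases(totals))
print(consistently_decreasing(totals))
```

  Note that the cumulative total never goes down in this example; only the daily increase does, which is exactly the number to watch.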

Mechanism

  Statistical studies closely correlate the life expectancy of a population with the death rate in that population. In other words there isn't a specific mechanism that only kills old people, for example. In fact, this disease functions like a death probability amplifier. Your chances to die increase proportionally to how likely you were to die anyway. And again, statistically, it doesn't apply to you as an individual. The virus attacks the lungs and depending on your existing defenses, it is more or less successful. (To be fair, the success of a virus is measured in how much it spreads, not how badly it sickens its host. The perfect virus would show no negative symptom and increase the health or survival chances of its host. That's how vampires work!)

  I have no doubt that there are populations that have specific mutations that make them more or less susceptible to SARS-Cov-2, but I think that's not statistically relevant. I may be wrong, though. We can't know right now. There are reports of Italian regions in the middle of the contagion that have no sick people. 

Conclusion

  We cannot say with certainty what the mortality rate is right now. We can't even estimate it properly without going into horrible extremes. For reasons that I cannot ascertain, WHO Director-General Dr Tedros Adhanom Ghebreyesus announced on the 3rd of March a mortality rate estimated at 3.4%. It is immense and I personally believe it was irresponsible to make such a statement at that time. But what do I know? A UK study released today calculates a 1.4% fatality rate.

  My personal belief, and I have to emphasize that it is a belief, no matter how informed, is that the mortality of this disease will be much less than 1%. By mortality I mean the number of people who would not have died otherwise, but instead died of viral pneumonia or organ failure caused by SARS-Cov-2, divided by the total number of people who have been exposed to the virus and whose immune system has fought it. That is still huge. Assuming a rate of infection of 80%, as many scenarios are considering right now, that's up to 0.8% of all people dying, meaning about 60 million people. No matter what proportion of that number will die, it will still be a large number.
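
  The 60 million figure is simple multiplication. A minimal sketch, assuming a world population of roughly 7.8 billion (my assumption, the figure is not stated above) and treating 1% as the upper bound of the mortality:

```python
WORLD_POPULATION = 7_800_000_000  # assumption: rough 2020 estimate


def expected_deaths(population, infection_rate, fatality_rate):
    """People infected times the share of the infected who die."""
    return population * infection_rate * fatality_rate


# 80% infected, 1% of those dying: the upper bound used in the text.
upper_bound = expected_deaths(WORLD_POPULATION, 0.80, 0.01)
print(f"about {upper_bound / 1e6:.0f} million")
```

  With these inputs the result lands a little above 60 million; a lower fatality rate scales it down linearly.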

  The fact that most of these people would have been on their way anyway is not really a consolation. There will be loved grandparents, people that had various conditions and were happily carrying on with their first world protected lives, believing in the power of modern medicine to keep them alive. I really do expect that the average life expectancy, another statistic that would need thousands of words to unpack, will not decrease by a lot. In a sense, I believe this is the relevant one, though, in terms of how many years of life have been robbed from people by this virus. It, too, won't be easy to attribute. How many people will die prematurely because of losing their job, not getting medical attention when they needed it, getting murdered by people made insane by this whole thing, etc?

  Also, because the people who were more likely to die died sooner, or even got medical attention that they would otherwise not have gotten, because pollution dropped, cars killed fewer people, etc, we might actually see a rise of the life expectancy statistic immediately after the pandemic ends.

  Bottom line: look for the daily number of newly infected people and rejoice when it starts consistently decreasing. After the contagion, try to ascertain the drop in average life expectancy. The true effects of this disease, not only in terms of mortality, will only become visible years after the pandemic ends.

  Update: mere days after I wrote this article, the BBC did a similar analysis.

  I didn't want to write about this. Not because of a false sense of security, but because everybody else talked about it. They all have opinions, most of them terribly wrong, but for me to join the fray and tell the world what I think is right would only put me in the same category as them. So no, I abstained. However, there are some things so wrong, so stupidly incorrect, that I can't maintain this silence. So let's begin.

  "The flu" and "a cold" are not scientific terms. They are popular terms and they all relate to respiratory infectious diseases caused by a variety of viruses and sometimes bacteria or a combination thereof. Some of them affect us on a seasonal basis, some of them do not. Rhinoviruses are the ones most often associated with the common cold and they are seasonal. However, a whopping 15% of what is commonly called "a cold" comes from coronaviruses, thus named because of their crown-like shape. Influenza viruses, what we would normally call "flu", are a completely different type of virus. In other words, Covid-19 is more a common cold than a flu, but it's not the seasonal type. Stop wishful thinking that it will all go away with the summer. It will not. Other famous coronavirus diseases are SARS and MERS. The SARS epidemic lasted until July, the MERS epidemic spread just fine in the Middle Eastern summer weather. This will last. It will last for months from the moment I am writing this blog. This will be very important for the next section of the post.

  Also, there is something called the R-naught (R0), the average number of people that one infected person passes the virus to. It predicts, annoyingly accurately, how a disease is going to progress. This virus has an R0 probably twice as high as that of the influenza virus, which we all get, every fucking year. Draw your own conclusions.

  The only reason we got rid of SARS and MERS is because they are only infectious after the symptoms are apparent and the symptoms are pretty damn apparent. Covid-19 is very infectious even before the first cough, when people feel just fine. Surely masks will help, then? Not unless they are airtight. Medical masks are named so because medics use them in order to not cough or spit or breathe inside a patient, maybe during surgery. The air that the doctor breathes comes from the sides of the mask. So if you get sick and you wear the mask it will help the people that have not met you while you had no symptoms yet.

  Washing the hands is always good. It gets rid of all kinds of crap. The primary medium of spreading Covid-19 is air, so you can wash your hands as often as you'd like, it helps very little with that. Stopping touching your face does little good, either. There is a scenario when someone coughs in their hand, touches something, then you touch it, then you pick your nose. Possible, so it's not all worthless, it's just statistically insignificant. What I am saying is that washing your hands and not touching yourself decreases the probability a very small amount. That being said, masturbation does increase the activity of your immune system, so be selective when you touch yourself.

  The idea that old people are the only ones affected is a myth. Age statistically correlates with harsher symptoms because it also correlates with negative health conditions. In other words, people with existing health conditions will be most affected. This includes smokers, obese people, people with high blood pressure, asthma and, of course, fucking old people. The best way to prepare for a SARS-Cov-2 virus (the latest "official" name) is to stay in good health. That means healthy food, less alcohol, no smoking and keeping a healthy weight. So yes, I am fucked, but at least I will die happy... oh, no, I am out of gin!!

  Medically, the only good strategy is to develop a vaccine as soon as possible and distribute it everywhere. It will lead quicker and with fewer casualties to the inevitable end of this pandemic: when more people are immune than those who are not. This will happen naturally after we all get infected and get healthy (or die). All of the news of people who got sick after getting healthy are artefacts of defective testing. All of it! Immunity does not work like that. You either got rid of it and your body knows how to defend itself or you never had it or you had something else or somebody tested you wrong.

  That being said, fuck all anti-vaxxers. You are killing people, you assholes!

  Personally, the best you can do is keep hydrated and eat in a balanced way. You need proteins and zinc and perhaps vitamin C (not sure about that). Warm bone broths will be good. Zinc you get from red meat and plant seeds. There was a report of drinking green tea being negatively correlated with influenza infections (different virus, though). And don't start doing sport now, if you haven't been doing it already, you can't get the pig fat one day before Christmas. Sport is actually decreasing the efficiency of your immune system.

  This is the end of the medical section of this post. There is nothing else. Probiotics won't help, Forsythia won't help, antibiotics will certainly not help. The only thing that fights the virus right now is your immune system, so just help it out. If there was a cure for the common cold you wouldn't get it each year every year.

  But it's not over. Because of people. When people panic, bad things happen. And by panic, I mean letting their emotions get the better of them, I mean not thinking people, not zombie hordes, although sometimes the difference is academic.

  Closing schools and workplaces and public places has one beneficial effect: it makes the infection rate go down. It doesn't stop the spread, it doesn't stop the disease, it just gives more time to the medical system to deal with the afflicted. But at the same time, it closes down manufacturing, supply chains, it affects the livelihood of entire categories of people. So here is where governments should step in, to cover financially the losses these people have to endure. You need money for medical supplies and for keeping healthy. Think of it as sponsoring immune systems.

  The alternative, something we are seeing now in paranoid countries, is closing down essential parts of national economies with no compensation. This is the place and time for an honest cost vs. gain analysis. Make sure the core of your nation is functioning. This is not one of those moments when you play dead for a few minutes and the bear leaves (or falls down next to you because he really likes playing that game). This is something that needs to work for months, if not a year or more. This is not (and never was) a case of stopping a disease, but of managing its effects. Some people are going to die. Some people are going to get sick and survive. Some lucky bastards will cough a few times and go on with their day. Society and the economic system that sustains it must go on, or we will have a lot more problems than a virus.

  Speaking of affected professions, the most affected will be medical personnel. Faced day in and day out with SARS-Cov-2 infections they will get infected in larger numbers than the regular population. Yes, they will be careful, they will wear masks and suits and whatever, but it won't help. Not in a statistical way, the only way we must think right now. It's a numbers game. It's no longer about tragedies, it's about statistics, as Stalin used to say. And these people are not regular people. They've been in school for at least a decade before they can properly work in a hospital where Covid-19 patients will be admitted. You lose one of these, you can't easily replace them. Especially in moron countries like my own, where the medical system is practically begging people to leave and work in other countries. The silver lining is that probably, at the end of the outbreak, there will be a lot more medical people available, since they went through the disease and emerged safe and immune. But there is a lot of time between now and then.

  Closing borders is probably the most idiotic thing one can do, with perhaps the exception of countries that had real problems with immigration before. If sick people don't crowd your borders in order to take advantage of your medical system, closing borders is just dumb. The virus is already in, the only thing you are stopping is the flow of supplies to handle the disease. Easter is coming. People from all over the world will just move around chaotically to spend this religious holiday with their family. It will cause a huge spike in the number of sick people and will probably prompt some really stupid actions taken by governments all over the place. One could argue that religion is dumb at all times, but right now it makes no difference. It's just an acceleration of a process that is already inevitable, Easter or no Easter.

  Statistics again: look at the numbers and you will see that countries get an increase of 30% in infected cases every day. It's an exponential curve. It doesn't care about your biases, your myths, your hopes, your judging. It just grows. China will get infection cases as soon as travelling restrictions relax. Consider the ridiculous situation where one somehow protected their country against infection when the whole of the world went through a global pandemic. It doesn't even matter. It's not even healthy, as sooner or later that virus will affect only them. The best they can do is manage the situation, bottleneck it so that the medical system can cope with it.
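
  To see what 30% per day means, here is a small sketch of the arithmetic in Python. It assumes the growth rate stays constant, which it will not in reality, and it starts from a hypothetical 100 cases; the function names are mine.

```python
import math


def doubling_time(daily_growth):
    """Days needed for the case count to double at a constant daily growth rate."""
    return math.log(2) / math.log(1 + daily_growth)


def projected_cases(initial, daily_growth, days):
    """Case count after `days` of compounding growth."""
    return initial * (1 + daily_growth) ** days


print(f"doubling every {doubling_time(0.30):.1f} days")
print(f"100 cases become roughly {projected_cases(100, 0.30, 30):,.0f} in 30 days")
```

  At 30% a day the count doubles in under three days, which is why waiting a week to react changes the picture so much.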

  Do you know what the most important supply chain is in this situation? Medical supplies. A lot of countries get these from China and India. Because they are cheaper. So they can sell them to you at ten times the price and make those immense profits that generated the name Big Pharma. It's not a conspiracy theory, it's common knowledge. What do you think happens when you close your borders with China and India?

  In this situation, the global economy will stagger. It will be worse than the 2008 crisis. But while that was a crisis generated by artificial and abstract concepts that affected the real economy, that of people working for other people, this one comes as real as it gets, where people can't work anymore. That means less money, fewer resources, scarcity of some resources, less slack to take care of the old and sick in your family. It's a lose-lose situation: the most affected by the pandemic will be affected either by people not being able to care for them or people giving them the disease while caring for them because they must make much more effort and human contact to get the supplies needed. Now, some countries can somehow handle that by employing a healthy transport infrastructure and care system, but in others, where they can barely handle normal quantities of sick people that come to hospitals themselves, they will never be able to cover, even if they wanted to, the effort to give supplies to previously affected people.

  So does that mean you have to go to the supermarket and get all the supplies you might need for months to come? I am afraid to say that it does. The reasonable way to handle this is for the governments of the world to ensure supply and financial support for everybody. Then people wouldn't need to assault shops to get the last existing supplies. If you can trust your government to do that, by all means, trust you will always have a nearby shop to sell you the goods you need to stay alive and healthy. But I ask you this: if you go to the pharmacy and buy their entire stock of some medicine that you might need and then you hear your neighbor, the person you greeted every day when you went to work, died because they couldn't get that medicine, what then? What if you hear they need the medicine now? Will you knock at their door and offer it to them? Maybe at five times the price? Or maybe for free? What if you are the neighbor?

  And you hear that some country has isolated the virus and are making a vaccine. Oh, it's all over, you think. But before they even start mass producing it, they need to test it. People will die because of how overcautious and bureaucratic the system is. People will die when corners are cut. People will die either way. It will take time either way. This thing will be over, but not soon. After they make it, you will still have to get it. That means supply chains and money to buy things.

  Bottom line: it's all about keeping systems going. In your body, the immune system has to be working to fight the disease. In your country, the economy must be working in order to handle the effects of the disease. Fake cures and home remedies are just as damaging as false news of the crisis not being grave, getting over soon or going away by itself.

  Here is a video from a medical professional that is saying a lot of the things I've listed here:

[youtube:E3URhJx0NSw]

  One more thing: consider how easy it was for this panic to lead to countries declaring a national emergency, a protocol that gives extraordinary powers to the government. A few dead here, a few sick there, and suddenly the state has the right to arrest your movement, to detain you unconditionally, to close borders, to censor communications. Make sure that when this is over, you get every single liberty back. No one is going to return freedom to you out of their own good will.

Summary

Once you are finished with the foundation, it doesn't matter who you call to architect your house or fix problems you might have. Businesses and software are exactly like that. Think hard about your foundation, it will save you a lot of effort later on. I've been working in a lot of different places and was surprised to see they didn't know there are other ways of doing things. Here I distill the foundational principles one needs for a good software solution, and maybe not just software:

  • Separation of concerns - processes, components and people should be able to function in isolation. If you can test they work when everything else is shut down, you're good. People should only do what they are good at. Components should do only one thing.
  • Cleanliness - keep your code readable rather than efficient, your process flow intuitive, roles and responsibilities clear. Favor convention over documentation, document anything else.
  • Knowledge sharing - Allow knowledge to be free and alive in your organization by promoting knowledge sharing, collaborative documentation, searchability.

Intro

  I am not the greatest of all developers or architects, but I am good. I know how things should be and what they should do in order to reach a goal. When people ask me about software, my greatest gaps are around specific software tools or some algorithm, not the general craft. That is because of several reasons: I enjoy my work, I've been really enthusiastic in my youth and sponged everything I could in order to become better and I've worked in many different types of organizations so I know multiple ways in which people have tried to do this. As I grow older, the last one may be my most valuable skill, but I am yet to find the employer to realize this.

  You see, what I've learned from my career so far is that most businesses live in a bubble. Used to not only learning software development while working on some task, but also networking with other people in the craft from all across the business, I kind of expected every other developer to be like that. Or at least the senior devs, the dev leads and architects, maybe some managers. But no, most of the time, people are stuck in their little position and never stray from it. They may invoke work-life balance, or they are just burned out, or they just don't care enough, but more often, they haven't even realized what they are missing. And that's the bubble. A bubble is not a prison, it is just a small area in which people stay voluntarily and never get out of.

  This is why gaming development is so different from business app development. That is why development in an administrative business with a small software department is so different from development in a software company. It is why development in small shops is so different than in large software companies. Sometimes people, really smart people, have worked for a long time in only one of these ecosystems and they only know that one. They can hardly conceive of different ways to do things.

  So this is why I am writing this post, to talk about the foundations of things, that part that separates one from the other, forever, no matter what you do afterwards. And this applies to business, people organization, but especially well to large software projects. You see, if you start your solution badly, it will be bad until you rewrite it. Just like a building with a weak foundation. It doesn't matter if you bring the best workers and architects afterwards, they will only build a wonderful house that will fall down when the foundation fails. If you want to make a good thing, first plan it through and build the greatest foundation you can think of and afford. It's much easier to change the roof than the foundation.

  And you wouldn't believe how many times I've been put in the situation of having to fix the unfixable. "Hey, you're smart, right? We started this thing a million years ago, we thought we would save some money, so we got a bunch of junior developers to do it, and it worked! But then it didn't anymore. Fix it!" And I have to explain it to them: you can't scale duct tape. You can go only so much with a thing held together by paper clips, chewing gum and the occasional hero employee with white hair and hunched back and in his twenties.

  Now of course, to an entitled senior engineer like me any software evokes the instinct to rewrite it in their own image. "Also, get some juniors to carve my face into that hill over there!". Sometimes it's just a matter of adapting to the environment, work with what you have. But sometimes you just have to admit things are beyond salvation. Going forward is just a recipe for disaster later on. It's the law of diminishing returns when the original returns were small to begin with. And you wouldn't believe how many people agree with that sentiment, then declare there is nothing that can be done. "They won't give us the budget!" is often muttered. Sometimes it's "We only need this for a few years. After that we start from scratch" and in a few years some "business person" makes a completely uninformed cost and gain analysis and decides building up from existing code is cheaper than starting over. But don't worry, they will suuuurely start from scratch next time.

  Sometimes the task of rewriting something is completely daunting. It's not just the size of it, or the feeling that you've achieved nothing if you have to start over to do the same thing. It's the dread that if you make the same thing and it takes less effort and less money and it works better then you must be inferior. And it's true! You sucked! Own it and do better next time. It's not the same thing, it's version 2.0. You now have something that you couldn't have dreamed of when building version 1.0: an overview. You know what you need, not what you think you will need. Your existing project is basically the D&D campaign you've played so many times that it has become a vast landscape, rich with characters and story. You've mapped it all down.

  This post is an overview. Use it! To be honest, reaching this point is inevitable, there will always be a moment when a version 2.0 makes more sense than continuing with what you have. But you can change how terrible your software is when you get to it. And for this you need the right foundation. And I can teach you to do that. It's not even hard.

Separation of Concerns

  Most important thing: separation of concerns. Components should not be aware of each other. Compare a Lego construction to a brick and mortar one. One you can disassemble and reassemble, adding to it whatever you need, the other you need to tear down and rebuild from zero. Your foundation needs to allow and even enable this. Define clear boundaries that completely separate the flow into independent parts. For example a job description is an interface. It tells the business that if the person occupying a job leaves, another can come and take their place. The place is clearly defined as a boundary that separates a human being from their role in the organization.

  Software components, too, need to be abstracted as interfaces in order to be able to swap them around. And I don't mean the exact concept of interface from some programming languages. I mean that as loosely as one can. A web service is an interface, since it abstracts business logic from user interface. A view model is an interface, as it abstracts the user interface logic from its appearance. A website is an interface, as it performs a different task than another that is completely separated. If you can rewrite an authorization component in isolation and then just replace the one you have and the application continues to work as before, that means you have done well.
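
  The post means "interface" loosely, but the narrowest version of the idea is easy to show in code. Here is a minimal Python sketch of the authorization example; every name in it (Authorizer, AllowListAuthorizer, RoleAuthorizer, delete_report) is hypothetical and only there for illustration.

```python
from typing import Protocol


class Authorizer(Protocol):
    """The boundary: callers only know this one method exists."""

    def is_allowed(self, user: str, action: str) -> bool: ...


class AllowListAuthorizer:
    """Original component: explicit permissions per user."""

    def __init__(self, permissions: dict[str, set[str]]) -> None:
        self._permissions = permissions

    def is_allowed(self, user: str, action: str) -> bool:
        return action in self._permissions.get(user, set())


class RoleAuthorizer:
    """Rewritten component: permissions derived from roles."""

    def __init__(self, roles: dict[str, str], role_actions: dict[str, set[str]]) -> None:
        self._roles = roles
        self._role_actions = role_actions

    def is_allowed(self, user: str, action: str) -> bool:
        role = self._roles.get(user)
        return role is not None and action in self._role_actions.get(role, set())


def delete_report(authorizer: Authorizer, user: str) -> str:
    """Application code: unaware of which authorizer it was given."""
    return "deleted" if authorizer.is_allowed(user, "delete_report") else "denied"


old = AllowListAuthorizer({"ana": {"delete_report"}})
new = RoleAuthorizer({"ana": "admin", "bob": "guest"}, {"admin": {"delete_report"}})

for authorizer in (old, new):
    print(delete_report(authorizer, "ana"), delete_report(authorizer, "bob"))
```

  Swapping `old` for `new` does not touch `delete_report` at all, which is the whole point of the boundary.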

  Separation of concerns should also apply to your work process and the people in it. A software developer should not have to do much outside developing software. A manager should just manage. People should only be in meetings that bring value and should only be in those that actually concern them. If the process becomes too cumbersome, split it up into smaller pieces, hire people to handle each of them. Free the time of your employees to do the job they are best suited for. 

  One important value you gain from isolating components is testing. In fact, you can use testing as a force towards separation of concerns. If you can test a part of your application in isolation (so all other parts do not need to be working for it), then you have successfully separated concerns. Consider a fictional flow: you get on the bus, you get to the market, you find a vegetable stand, you buy a kilo of tomatoes, you get back to the bus, you come home. Now, if you can successfully test your ability to get on a bus, any bus, and get anywhere the bus is going, then in order to test that you can buy tomatoes from the market you only need to test that you can find the stand and buy the tomatoes. Then, if you can test that you can buy anything at any type of stand, you only need to test your ability to find a stand in a market.
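
  The tomato flow can be sketched in Python to show what "testing in isolation" looks like. All the names here (buy_tomatoes, FakeMarket, FakeStand) are made up for the example; the fakes stand in for whatever the real market and stand would be.

```python
def buy_tomatoes(market, kilos):
    """The one concern under test: find the vegetable stand and buy from it."""
    stand = market.find_stand("vegetables")
    if stand is None:
        raise LookupError("no vegetable stand in this market")
    return stand.buy("tomatoes", kilos)


class FakeStand:
    """Stand-in that just records what was bought."""

    def __init__(self):
        self.sold = []

    def buy(self, product, kilos):
        self.sold.append((product, kilos))
        return {"product": product, "kilos": kilos}


class FakeMarket:
    """Stand-in market with a single vegetable stand."""

    def __init__(self, stand):
        self._stand = stand

    def find_stand(self, kind):
        return self._stand if kind == "vegetables" else None


def test_buy_tomatoes_without_any_bus():
    # No bus, no trip: only the buying logic runs.
    stand = FakeStand()
    result = buy_tomatoes(FakeMarket(stand), kilos=1)
    assert result == {"product": "tomatoes", "kilos": 1}
    assert stand.sold == [("tomatoes", 1)]


test_buy_tomatoes_without_any_bus()
```

  Because `buy_tomatoes` receives the market instead of travelling to it, the test runs instantly and does not care whether any bus is working.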

  It seems obvious, right? It feels that way to me. Even now, writing this post, I am thinking I sound like an idiot trying to seem smart, but I've seen droves of developers who don't even consider this. Businesses who are not even aware of this as a possibility. "We have testing teams to make sure the application is working end to end, we don't need unit testing" or "We have end to end automated testing. For each new feature we write new tests". When you hear this, fight it. Their tests, even if written correctly and maintained perfectly, will take as long as one needs to get on a bus and go to the market. And then the other test will take as long as one needs to get on a train and get to the airport. And so on. End to end testing should exist and if you can automate it, great, but it should be sparse, it should function like an occasional audit, not something that supports your confidence in the solution working correctly.

  So go for testable, not for tests. Tests often get a bad rap because someone like me comes and teaches a company to write tests, then they leave and the people in the company either skip testing occasionally or they change bits of the application and don't bother to update the tests. This leads to my next point: clean code.

Cleanliness

  Cleanliness applies to everything, again. The flow of your solution (note that I am being as general as possible) needs to be as clear as possible, at every level. In software this usually translates into readable code and builds up from that. Someone looking at the code should be able to instantly and easily understand what it does. Some junior developers want to write their code as efficiently as possible. They just read somewhere that this method is faster than the other method and want to put that in code. But it boils down to a cost analysis: if they shave one second off a process you run ten times a day, they save one hour per year; if another developer has to spend more than one hour to understand what the code does, the gain means nothing.

  Code should be readable before being fast. Comments in code should document decisions, not explain what is going on. Comments should display information from another level than the code's. Component names, object names, method names, variable names should be self-explanatory. Configuration structures, property names, property values, they should be intuitive and discoverable.

  And there is another aspect to cleanliness. Most development environments have some automated checks for your code. You can add more and even make your own. This results in lists of errors, warnings and notifications. On a flow level, this might translate to people complaining about various things, some in key positions, some not. Unit tests, once you have them, might be passing or failing. It is important to clean that shit up! Do not ignore warnings or even notifications. If you think a warning is wrong, find a way to make it go away, not by ignoring it, but by replacing the complaining component or by marking it specifically in the code as not a valid warning and documenting why. Run all the tests and make sure they are green, or remove the tests that you think are not important (this should not happen usually). The reason is simple: in a sea of ignored warnings you will not see the one that matters.
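  As an illustration of marking a warning as not valid and documenting why, here is a small C# sketch. LegacyApi and the stated reason are made up for the example. The #pragma warning directive and warning CS0618 (use of a member marked obsolete with a message) are regular C# compiler features.

```csharp
using System;

// Hypothetical legacy component, invented for this example.
public static class LegacyApi
{
    [Obsolete("Use the new API instead.")]
    public static int Compute(int x) => x * 2;
}

public static class Program
{
    public static int ComputeForOldClients(int x)
    {
        // Decision (hypothetical): old clients depend on the legacy behavior,
        // so the obsolete call is intentional here and reviewed.
#pragma warning disable CS0618 // Type or member is obsolete
        return LegacyApi.Compute(x);
#pragma warning restore CS0618
    }

    public static void Main() => Console.WriteLine(ComputeForOldClients(21)); // 42
}
```

  The suppression is scoped to a single call and the comment records the decision, so the warning list stays empty without hiding anything else.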

  To be perfectly clear: by clean code I don't mean code that follows design patterns, I don't mean documentation comments on every property and method, I don't mean color coded sections (although that's nice). What I mean is code clean enough to read without cringing or having to look in ten other places to figure out what it does. If your hotdog falls on that code you should be comfortable enough to pick it up and continue eating it.

  Cleanliness should and must be applied to your work process. If the daily meeting is dirty (many people talking about unrelated things) then everybody is wasting time. If the process of finishing a task is not clear, you will have headless chickens instead of professionals trying to complete it. If you have to ask around where to log your hours or who is responsible for a specific job that you need done in order to continue, you need to clean that process. Remove all superfluous things, optimize remaining ones. Remember separation of concerns.

  Cleanliness extends to your project folder structure, your solution structure, your organizational structure. It all has to be intuitive. If you declare a principle, it should inform every query and decision, with no exception. "All software development people are on the fifth floor! Ugh... all except Joe". What if you need Joe? What if you don't know that you need Joe, but you still need him? Favor convention over configuration/documentation, document everything else. And that leads me to the final point: knowledge sharing.

Knowledge Sharing

  To me, knowledge sharing was always natural. In small companies there was always "that guy" who would know everything and couldn't work at all because people came to ask him things. In medium companies there was always some sort of documentation of decisions and project details. In large companies there were platforms like Confluence where people would share structured information, like the complete description of tasks: what they are about, how decisions were made, who is responsible for what, how they were split into specific technical tasks, what problems arose, what the solutions were, etc. And there were always your peers that you could connect to and casually talk about your day.

  Imagine my surprise to find myself working in places where you don't know what anyone else is doing, where you don't know what something is and what it is supposed to do, where there are no guidelines outside random and out of date PowerPoint files, where I am alone with no team, brought in for problems that need strong decisions in order to fix but no one is willing to make them, and already I have no idea who should even attempt to. I solve a common problem, I want to share the solution, there is no place to do that. People are not even in the same building as me. Emails come and go and no one has time to read them.

  Knowledge should live freely in your company. You should be able to search for anything and find it, be able to understand it, contribute to it, add more stuff. It should be more natural for the people in your company to write a blog post than go for coffee and complain. It should be easier to find and consume information from people that left the company than to get it from colleagues at the desk next to you. And maybe this cannot be generalized to all departments, but it is fucking important: people in the office should never need to open Microsoft Office (or any similar product suite). I can't stress that enough.

  You should not need printed documents, so no need for Word. Excel files are great for simple data tasks, but they are not specific. If you need something done repeatedly and you use an Excel sheet, it is probably better to build a tool for it. Don't reinvent the wheel now, but use the best tool for the job. And there are better and more modern tools than PowerPoint files, but I will allow the use of them because, in the context of knowledge sharing, everyone should feel free and confident enough to make a presentation for the team. My tenet still stands, though: the PowerPoint file would be used in a presentation. Hardly anyone else should need to open it. I mean, there would be a video of the presentation available, right?

Vision

  Imagine a park. It is sunny, birds are singing, there are people walking on hardened dirt walkways, cyclists biking on their asphalted bike lanes, benches everywhere, with a small notepad attached to them that people can just pick up and read or write their own notes. Places in the park are clearly indicated with helpful arrows: children's playground, hotdog stand, toilet, football field, bar, ice rink. Everything is clean, everybody is doing what they do best, all is good. You feel hungry, you see the arrow pointing towards the hotdog stand, you walk there calmly and ask for one. The boy there gives you a bun and a wurst. He is new, but he has a colleague that knows exactly how much mustard and ketchup to put on the hotdog. He even asks you if you want curry on it.

  Imagine a park. It is sunny, birds are singing. Some walkways start off as asphalt, then continue as dirt. Some stop suddenly or end in a ditch. There is a place that serves hotdogs next to a toilet. You have to ask around to find out where to find it. You get lost several times, as some people don't know either, but they still come with an opinion, or they are just misinformed. You get tired, but you can't sit on a bench, they are all taken and there are so few of them. You have to look both ways several times before you walk to the stand, because of cyclists. You stand in a line, then order a hotdog. The boy there gives you a bun with a wurst in it. You ask for mustard, but the boy is new and it takes him a while to find it after looking for some paper that tells him where it is. You have to dodge a football that was coming at your head. Someone flushes the toilet.

  I've accepted the old man should teach me as the only solution to becoming a champion, but it is hard to swallow it. He is very old, but mischievous, so whenever I try to learn something from him, he kicks me to the ground. He tricks me again and again and again. I am frustrated, but I am trying to keep my cool. I am strong. If I were to really fight him, he might be smart, but every attack would break bone and then what good would he be? Just a bag of meat and broken shards. I close my eyes, I breathe, I tell myself it is worth it.

  The old man apologizes and offers me a hand, I take it, only to be kicked in the ass and thrown into a jumble of debris. I lose my temper and stomp away. He doesn't understand. Getting angry at him is pointless, hurting him futile. I have nothing to learn from him. I walk through the old grounds of my conquests, now just the walled in and decrepit underground of the large arena above. I feel a presence behind me and I see the old man is following me, eyes to the ground. Contrition? Surely another of his tricks. "Begone!" I roar at him, but he goes to his knees and kowtows in front of me, his hands touching my feet. I feel tears welling up in my eyes. He might as well be a little boy asking for forgiveness. Just who is the teacher and who is the student? Who is the adult here?

  "How did you get to a hundred years or whatever behaving like a little kid?! You are a child!" I shout at him in admonishment. I look around and ghosts of my past awaken my anguish. I feel my face contort into a painful grin as my tears flow freely. "Every week I was coming here to murder people!", I rage, my voice barely my own, a booming, low, animal growl, my expression that of an enraptured madman, for sure. "I would stake my life every time and I would leave, alive, every time!". The images of old fights flash before my wet blurred vision and I imagine that some of the painted white walls might contain some of the scrolls of the ancient arts, built over by a world that doesn't get it anymore. "I loved it!", I say, walking in the dead halls, every step a pulse of power overlaying glorious past over grey reality. My body is shaking with now uncontrollable weeping. "I killed so many people and I miss it... so.... very... MUCH!".

  Does he get it now, I ask myself? Has he even an inkling of the power he needs to teach me to control? I burst through the door to the surface and climb the stairs that get me to the arena above. The seats are packed with oblivious spectators, all watching some performance I don't even care to notice. I breathe in the fresh air and feel better. Ready to come to a final understanding with the old man, if he is capable of it, I turn around. There is little time and we should not fight each other. But the old man is gone.

   I strain my eyes into the darkness of the stairs and I feel it, The Beast, the adversary I need to fight is there. He's got the old man and, even if I cannot see it, I know it is there, all cunning, fury and power. My body roars by itself, a predator sound, strong and fearless, no sound a man should ever be able to make. The arena spectators panic in surprised horror, but I ignore them. I jump into the darkness with animal strength. I will fight this beast, I will meet it head on, I will be the most savage, alone I will remain alive.


  Sometimes a good regular expression can save a lot of time and lead to a robust, yet flexible system that works very efficiently in terms of performance. It may feel like having superpowers. I don't remember when exactly I decided they were useful, but it happened during my PHP period, when the Linux command line system and its multitude of tools made using regular expressions a pretty obvious decision. Fast forward (waaaay forward) and now I am a .NET developer, spoiled by the likes of Visual Studio and the next-next approach to solving everything, yet I still use regular expressions, well... regularly, sometimes even when I shouldn't. The syntax is concise, flexible and easy to use most of the time.

  And yet I see many senior developers avoiding regular expressions like they were the plague. Why is that? In this post I will argue that the syntax makes a regular expression pattern everything developers hate: unreadable and hard to maintain. It doesn't help that many of the characters common in both XML and JSON have special meaning in regular expressions. And even when you are bent on learning them, having multiple flavors depending on the operating system, language and even the particular tool you use makes it difficult. However, using small incremental steps to get the job done and occasionally looking up references for less used features is usually super easy, barely an inconvenience. As for using them in your code, new language features and a few tricks can solve most problems one has with regular expressions.

 The Syntax Issue

The most common scenario for the use of regular expressions is wanting to search, search and replace or validate strings and you almost always start with a bunch of strings that you want to match. For example, you want to match both dates in the format 2020-01-21 and 21/01/2020. Immediately there are problems:

  • do you need to escape the slashes?
  • if you match the digits, are you going for two digit month and day segments or do you also accept something like 21/1/2020?
  • is there the possibility of having strings in your input that look like 321/01/20201, in which case you will still match 21/01/2020, but it's clearly not a date?
  • do you need to validate stuff like months being between 1-12 and days between 1-31? Or worse, validate stuff like 30 February?

But all of these questions, as valid as they are, can be boiled down to one: given my input, is my regular expression matching what I want it to match? And with that mindset, all you have to do is get a representative input and test your regular expression in a tester tool. There are many out there, free, online, all you have to do is open a web site and you are done. My favourite is RegexStorm, because it tests .NET style regex, but a simple Google search will find many and varied tools for reading and writing and testing regular expressions.
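To show what testing against a representative input can look like, here is a minimal C# sketch. The two patterns and the sample inputs are my own assumptions for the date example above, not a definitive date validator. Note that in .NET the slash is not a special character, so it does not need escaping.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateSamples
{
    // Naive: matches the two formats anywhere, even inside longer digit runs.
    public const string Naive = @"\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4}";

    // Guarded: word boundaries reject matches glued to extra digits.
    public const string Guarded = @"\b(?:\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4})\b";

    // Returns the first match or a marker text when nothing matches.
    public static string FirstMatch(string pattern, string input)
    {
        var match = Regex.Match(input, pattern);
        return match.Success ? match.Value : "(no match)";
    }

    public static void Main()
    {
        // A small representative input set, including the awkward cases from the list above.
        foreach (var input in new[] { "2020-01-21", "21/01/2020", "21/1/2020", "321/01/20201" })
        {
            Console.WriteLine($"{input}: naive={FirstMatch(Naive, input)}, guarded={FirstMatch(Guarded, input)}");
        }
    }
}
```

With these inputs the naive pattern still finds 21/01/2020 inside 321/01/20201, while the guarded one rejects it. Neither of them validates things like 30 February, which is probably a job for code rather than for the pattern.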

The syntax does present several problems that you will hit every time:

  • you cannot reuse parts of the pattern
    • in the example above, even if you have clearly three items that you look for - year, month, day - you will need to copy/paste the pattern for each variation you are looking for
  • checking the same part of the input string multiple times is not what regular expressions were created for and even those that support various methods to do that do it in a hackish way
    • example: find any date in several formats, but not the ones that are in March or in 2017
    • look behind and look ahead expressions are usually used for scenarios like this, but they are not easy to use and reduce a lot of the efficiency of the algorithm
  • classic regular expression syntax doesn't support named groups, meaning you often need to find and maintain the correct index for the match
    • what index does one of the capturing groups have?
    • if you change something, how do other indexes change?
    • how do you count groups inside groups? Do they even count if they match nothing?
  • the "flags" indicating how the regular expression engine should interpret the pattern are different in every implementation
    • /x/g will look for all x characters in the string in JavaScript
    • C# doesn't even need a global flag and the flags themselves, like RegexOptions.IgnoreCase, are usually passed as .NET enum values in code, not as part of the regular pattern string (although .NET also accepts inline options such as (?i))
  • from the same class of issues as the two above (inconsistent implementation), many a tool uses a regular expression syntax that is particular to it
    • a glaring example is Visual Studio, which does not use a fully compatible .NET syntax

  The Escape Issue

The starting point for writing any regular expression has to be a regex escaped sample string that you want to match. Escaping means telling the regular expression engine that characters from your sample string that have meaning to it are just simple characters. For example, 12.23.34.56, which is an IP address, if used exactly like that as a regular pattern, will also match 12a23b34c56, because the dot is a special character that matches any character. The pattern working as expected would be 12\.23\.34\.56. Escaping brings several severe problems:

  • it makes the pattern less humanly readable
    • think of a phrase in which all white space has been replaced with \s+ to make it more robust (it\s+makes\s+the\s+pattern\s+less\s+humanly\s+readable)
  • in every flavor of regular expressions you escape with a backslash, but the characters that are considered special are different depending on the specific implementation
  • many characters found in very widely used data formats like XML and JSON are special characters in regular expressions and the escaping character for regex is a special character in JSON and also in the string literals of many popular programming languages, forcing you to "double escape", which magnifies the problem
    • this is often an annoyance when you try to store regular patterns in config files
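In .NET the escaping of a sample string does not have to be done by hand. Here is a small sketch using Regex.Escape on the IP address example; the class and method names are just for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeSamples
{
    public const string Sample = "12.23.34.56";

    // Unescaped: every dot matches any character.
    public static bool MatchesUnescaped(string input) => Regex.IsMatch(input, Sample);

    // Escaped: Regex.Escape turns the sample into a pattern that matches it literally.
    public static bool MatchesEscaped(string input) => Regex.IsMatch(input, Regex.Escape(Sample));

    public static void Main()
    {
        Console.WriteLine(Regex.Escape(Sample));             // 12\.23\.34\.56
        Console.WriteLine(MatchesUnescaped("12a23b34c56"));  // True
        Console.WriteLine(MatchesEscaped("12a23b34c56"));    // False
        Console.WriteLine(MatchesEscaped("12.23.34.56"));    // True

        // Double escaping: the same pattern as a verbatim and as a regular C# string.
        Console.WriteLine(@"\d+" == "\\d+");                 // True
    }
}
```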

  Readability and Maintainability

Perhaps the biggest issue with regular expressions is that they are hard to maintain. Have you ever tried to understand what a complex regular expression does? The author of the code started with some input data and clear objectives of what they wanted to match, went to regular expression testers, found what worked, then dumped a big string in the code or configuration file. Now you have neither the input data nor the things that they wanted matched. You have to decompile the regular pattern in your head and try to divine what it was trying to do. Even when you manage to do that, how often do developers redo the testing step so they verify the changes in a regular expression do what was intended?

Combine this with the escape issue and the duplication of subpatterns issue and you get every developer's nightmare: a piece of code they can't understand and they are afraid to touch, one that is clearly breaking every tenet of their religion, like Don't Repeat Yourself or Keep It Simple Silly, but they can't change. It's like an itch they can't scratch. The usual solution for code like that is to unit test it, but regular expression unit tests are really really ugly:

  • they contain a lot of text variables, on many lines
  • they seem to test the functionality of the regular expression engine, not that of the regular expression pattern itself
  • usually regular expressions are used internally, they are not exposed outside a class, making them difficult to test on their own

  Risk

Last, but not least, regular expressions can work poorly in some specific situations and people don't want to learn the very abstract computer science concepts behind regular expression engines in order to determine how to solve them.

  • a typical example is lazy modifiers (.*? instead of .*) which tell the engine to not be greedy (get the least, not the most)
    • ex: for input "ineffective" the regular expression .*n will work a lot worse than .*?n, because it will first match the entire word, then see it doesn't end with n, then backtrack until it gets to "in" which it finally matches. The other syntax just stops as soon as it finds the n.
  • another typical example is people trying to find the HTML tag that has an attribute and they do something like \<.*href=\"something\".*\/\> and what it matches is everything from the first < in the HTML document, through the href attribute, up to the end of the last atomic (self-closing) tag in the document.
  • the golden hammer strikes again
    • people start with a simple regular expression in the proof of concept, they put it in a config file, then for the real life application they continuously tweak the configured pattern instead of trying to redesign the code, until they get to some horrible monstrosity
    • a regex in an online library that uses look aheads and look behinds solves the immediate problem you have, so you copy paste it in the code and forget about it. Then the production app has serious performance problems. 
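The greedy versus lazy difference is easy to see on a tiny input. This is only a sketch: the HTML snippet is made up and the third pattern, a negated character class, is my own suggestion rather than something from the list above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedySamples
{
    // Returns the text of the first match (empty string when there is none).
    public static string First(string input, string pattern) => Regex.Match(input, pattern).Value;

    public static void Main()
    {
        const string html = "<a href=\"something\"/><br/>";

        // Greedy: runs to the end of the input, then backtracks to the LAST "/>".
        Console.WriteLine(First(html, @"<.*/>"));    // <a href="something"/><br/>

        // Lazy: stops at the FIRST "/>".
        Console.WriteLine(First(html, @"<.*?/>"));   // <a href="something"/>

        // Negated class: cannot run past the end of the tag at all.
        Console.WriteLine(First(html, @"<[^>]*/>")); // <a href="something"/>

        // Same result for both, but the greedy one gets there by backtracking.
        Console.WriteLine(First("ineffective", ".*n") == First("ineffective", ".*?n")); // True
    }
}
```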

  Solutions

  There are two major contexts in which to look for solutions. One is the search/replace situation in a tool, like a text editor. In this case you cannot play with code. The most you can hope for is that you will find a tester online that supports the exact syntax for regular expressions of the tool you are in. A social solution would be to throw shade on lazy developers that think only certain bits of regular expressions should be supported and implemented and then only in one particular flavor that they liked when they were children.

  The second provides more flexibility: you are writing the code and you want to use the power of regular expressions without sacrificing readability, testability and maintainability. Here are some possible solutions:

  • start with simple stuff and learn as you go
    • the overwhelming majority of the time you need only the very basic features of regular expressions, the rest you can look up when you need them
    • if the regular expression becomes too complex it is an indication that maybe it's not the best approach
  • store subpatterns in constants that you can then reuse using templated strings
    • ex: var yearPattern = @"(?<year>\d{4})"; var datePattern = $@"\b(?:{yearPattern}-(?<month>\d{{2}})-(?<day>\d{{2}})|(?<day>\d{{2}})\/(?<month>\d{{2}})\/{yearPattern})\b";
    • note that inside an interpolated string the quantifier braces have to be doubled (\d{{2}}), otherwise C# reads {2} as an interpolation hole and the pattern becomes \d2
    • the example above only stores the year in another variable, but you can store the two different formats, the day, the month, etc
    • in the end your code might look more like the Backus-Naur Form (BNF), in which every separate component is described separately
  • use verbatim strings to make the patterns more readable by avoiding double escaping
    • in C# use @"\d+" not "\\d+"
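    • in JavaScript they are called template literals and use backticks instead of quotes, but they have the same mechanism for escaping characters as normal strings, so on their own they are not a solution; tagging them with String.raw (or using a regex literal like /\d+/) avoids the double escaping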
  • use simple regular expressions or, if not, abstract their use
    • a good solution is using a fluent interface (check this discussion out) that allows the expressivity of human language for something that ends up being a regular expression
    • no, I am not advocating creating your own regular expression syntax... I just think someone probably already did it and you just have to find the right one for you :)
  • look for people who have solved the same problem with regular expressions, so you don't have to reinvent the wheel
  • always test your regular expressions on valid data and pay attention to the time it took to get the matches
  • double check any use of the string methods like IndexOf, StartsWith, Contains, even Substring. Could you use regular expressions?
    • note that you cannot really chain these methods. Replacing a regular expression like ^http[s]?:// with methods always involves several blocks of code and the introduction of cumbersome constants:
      if (text.StartsWith("http")) {
        // 4 works now, but when you change the string above, it stops working
        // you save "http" to a constant and then use .Length, you write yet another line of code
        text=text.Substring(4);
      } else {
        return false;
      }
      if (text.StartsWith("s")) {
        text=text.Substring(1);
      }
      return text.StartsWith("://");
      
      // this can be easily refactored to
      return text
               .IfStartsWith("http")
               .IfStartsWithOrIgnore("s")
               .IfStartsWith("://");
      // but you have to write your own helper methods
      // and you can't save the result in a config file​

  Conclusion

Regular expressions look daunting. Anyone not familiar with the subject will get scared by trying to read a regular expression. Yet most regular expression patterns in use are very simple. No one actually knows by heart the entire feature set of regular expressions: they don't need to. Language features and online tools can help tremendously to make regex readable, maintainable and even testable. Regular expressions shine for partial input validation and efficient string search and replace operations and can be easily stored in configuration files or data stores. When regular expressions become too complex and hard to write, it might be a sign you need to redesign your feature. Often you do not need to rewrite a regular expression, as many libraries with patterns to solve most common problems already exist.


  I see a lot of pages about how to write blog posts. I read them, because I am both curious and sincere in my desire to make my blog popular and spread the knowledge I've amassed here. They are always crap. Take one that says the best tool to get a blog popular is to use Google Trends or Google autocomplete to see what people are searching for. And the results are always incredibly stupid. Like "how to add one to one to get two". I am paraphrasing a bit here, but you get the gist. Go "worldwide" and the first trend is always some Chinese spam. Another post is saying that a blog post should be written as four drafts: one for what you want to say, one for how you want to say it, one for peer reviewed content and the final one that actually is what you want to publish. It sounds great, but it implies a level of work that sometimes is prohibitive relative to the subject of your post. Sometimes you just want to share something as a stream of consciousness and be done with it. Is that better? No. But it sure beats NOT writing anything. There is always time to improve your work and get peer review AFTER publishing it.

  There are two big types of people blogging. The first category is akin to reporters and media people. They want to get their message across for reasons that are rather independent of the message itself. They want to earn money or influence or some other kind of benefit. I don't have any advice for people like that. Not because I look down on their goals, but because I have never blogged for an ulterior reason. The second category of bloggers is akin to writers: they want to get their message across because they feel there is some value in the message itself. I consider myself such a person, although I probably suck as a writer. This post is for people like that.

  The most important part of writing a post is motivation. And I don't mean just the reason for writing it, but the reason for wanting to share it. For me, most of the posts I write are either content that I consume, such as books, or stuff that I think is worth considering or technical stuff that I've stumbled upon and I believe people would want to find if googling for it instead of wasting the time I wasted to solve it. Now, the books and the personal idea posts I totally agree are ego boosting shit: I feel like it's important enough to "share", but I don't really expect people to read it or that there is any inherent value in them other than getting to know me better. And everyone wants to understand other people better on the Internet, right? In the end they are just a personal log of random thoughts I have. My blog is something that represents me and so I feel that I need to share things that are personal to me, including thoughts that are not politically correct or even correct in any possible way. One can use Facebook for this, so I won't write about those posts. They still reach some people and inform their choices, which is something I love.

  What is left is the posts that I work for. You have no idea how much I work on some of these posts. It may take me hours or even days for content that you read in a few minutes. That is because I am testing my ideas in code and creating experiments to validate my beliefs and doing research on how other people did it. A lot of the time I learn a lot from writing these posts. I start with the expectation that I know what I am talking about only to find out that I was wrong. The important part is that I do correct myself and some of the blog posts here are exclusively about discovering how wrong I was. There is nothing more rewarding than writing something that you feel others might benefit from. Perhaps other than getting feedback about how your post benefited others. Publishing your failures is just as important as publishing your successes.

  Yes, I know, if I learn something new by doing what I need to be doing, then sharing the results is like writing for myself, too. It's ego boosting, for sure. However, it would be even more obnoxious to believe no one is like me and so no one would benefit from the same work. There was a time when people came to my blog and asked me about all kinds of technical problems and I worked with them to solve them. These were usually incredibly simple problems that posed difficulties only to the laziest people, but it felt good! Then StackOverflow came along and no one actually interacts with me. But I have solved stupid problems that I still keep getting thanks for, even if (maybe especially because) the technology is really old and obsolete. Many other blogs published cool things about subjects that are not fashionable anymore and then just disappeared. The value of your content is that it may help people in your situation, even if they don't share your sense of now and even if all they take away is how NOT to do things.

  Sometimes you are looking for the solution for a problem and after hours of work you realize the problem was invalid or the solution was deceptively simple. It's the "Oh, I am so stupid!" moment that makes a lot of people shy away from writing about it. I find that these moments are especially important, because other people will surely make the same mistake and be hungry for the answer. OK, you admit to the world you were stupid, but you also help so many other people that would waste time and effort and feel as stupid as you if you had not written the post.

  My take on writing a blog post is that you just have to care about what you are writing. You may not be the best writer out there, you might not even be explaining the thing right, but if you care about what you are writing, then you will make the effort of getting it right eventually. Write what you think and, if you are lucky, people will give you reasons to doubt your results or improve them. Most likely people will NOT care as much about the subject as you, but you are not writing because of them, you are writing for them. Some of your thoughts and toils will reach and help someone and that is what blogging is all about.

  The last thing I want to mention is maintenance. Your work is valid when you write it, but may become obsolete later on. You need to make the effort to update the content, not by removing the posts or deleting their content, but by making clear things have changed, how they did and what can be done about it. It is amazing how many recent posts are reached only because I mentioned them in an "obsolete" post. People search for obsolete content, find out it's too old, then follow the link to your latest solution for that problem. It makes for good reading and even better understanding of how things got to the current point.

  So, bottom line: publish only what you care about and care about your readers, keep the posts up to date, publish both successes and failures.


  Imagine you are playing a computer game, exploring virtual realms and testing your mettle in cooperation or opposition to other players. You are not the best, but you are getting better and you feel that reward system in your brain getting activated and giving you that pleasant buzz, like you are doing something that matters. Suddenly, new players enter the game and they seem indomitable. You can't possibly defeat them: they are faster, incredibly so, they are more accurate, dubiously so, and they seem to have no respect at all for the spirit of the game that you, until just now, enjoyed. They don't want to get better, they want to humiliate you and the other players by just being incredibly better than all with no credible cause other than, yes, they cheat. Somehow they found a way to skirt the rules and thus make them meaningless.

  While this is a scourge that affects all online games, it is also a powerful metaphor about real life. Think about trying to advance in your company, get that job that gives you more money, more prestige and proves to yourself and others that you have become better, a worthy person for that role. Suddenly, a new player arrives, and he is the nephew of the CEO and he gets the job for no credible reason. That is not a game, it's your life. The resentment is real. You can't just change servers or turn off the computer and read a book: this affects you, your family, your loved ones.

  But I will go even further. Imagine that you are trying to lead a good life, according to the moral principles that were instilled in you by family, culture and your own evolution as a human being. You take care of your children and make efforts to set up their lives so that they have many good opportunities. You paid your life insurance, prepared your pension fund and are content that in a decade or so you will finish paying the installments on the house where you plan to retire and live out your golden years. You've taken care of your health, you eat and drink responsibly, you exercise regularly. Suddenly, new players arrive. They have found a way to cheat death. Not only do they have better health, they don't age. They might even get younger and fitter and smarter with no effort at all. Your pension funds implode, because old age becomes irrelevant, and prices skyrocket because there are more people with more money buying stuff and not getting any older and weaker as they go. Your children have no more opportunities, as they can't compete with people who are of the same biological age, but have decades of experience.

  I believe this way of thinking is what informs most ethical ideas. Life is a game, with rules that are either agreed upon or forced upon the players by the limitations of the environment. Cheating at this game sounds both ideal and amoral. We have a zillion stories warning about the perils of playing god, but in the end they are just a reaction of fear to the mere possibility that someone might find a way to hack life.

  And I agree that it is very dangerous, for the very reasons that game hacking is so annoying: it makes the game irrelevant. If people don't care about life anymore, if they have no limits, then what's the point? It's almost a Nietzschean concept that the worth of life cannot exist in a vacuum, it needs suffering and obstacles to overcome. What would the philosopher think of someone who becomes better by overcoming hardship only to be completely overshadowed by someone who just... cheated? What would it mean to live a happy and fulfilling life if you've hacked your brain to feel happy and fulfilled? What would it mean to live a moral life if the ability to disobey rules has been bred out of you?

  Yet, what is the moral ground to not even try, I ask. How can it be moral to conceive of life as just a game? Wouldn't that be meaningless also? I submit that the very possibility of skirting the rules makes them obsolete. I submit that just as talented people are banned from online servers for being too good, talented people are getting sidelined in life by the very same "ethical" way of thinking of life as a static game where people should follow the same rules and achieve the same relative rewards.

  As technology and knowledge and sheer individual power increase, the danger of people playing god is dwarfed by the danger of people killing god inside themselves.

  I see only one solution for this: the expansion of the human race. Only when centralized authority becomes impossible will humanity truly reach its potential. That requires we spread out so far that enforcement can only be local. It will permit us, if you will, to have different servers to play on. Some of them will ban cheaters, some of them will welcome them, and there will be many variations in between. Some of them will try and fail, maybe spectacularly, but some of them will thrive and advance not only the players, but the game itself.

Just had a revelation. There are studies that show that the moment you introduce currency into a social transaction, the dynamics change dramatically, leading to conflict, selfishness and the dissolution of societal and even emotional bonds. For a random reference check this article: Why Good Deeds and Money Don’t Mix.

I've been struggling with this new political correctness movement because 1) I didn't get it, 2) almost every one of the people actively acting offended in this context appears to be... not nice and 3) it doesn't seem to be helping any. So, am I the bad guy? I started to ask myself. Am I a racist homophobic sexist misogynistic normie white male working in the tech field or is there something else going on? Judging by how far from normie the people who actually know me think I am, I started to think about it more.

And it came to me! Political correctness is a form of currency forcefully introduced into our social transactions. Not only is it causing trouble for people who are assholes, but also for normal people who suddenly feel they have to pay something. And, as currency does, it breaks society, not strengthens it.

That is why so many people caught in this are so violent and partisan about it. That is why when you are nice towards a - I don't even know what to call them these days - not white person it feels good, as it would being nice towards anybody else, but when you are forced to do it, it well... feels forced. It feels like duty, like work, like paying a tax. The concept of balance slowly creeps in and makes one push back. Maybe with a joke, maybe with an angry tweet, maybe with something worse like actually picking on someone for their skin color, sex, age, religion or anything else. And they do it because picking on someone for being... I don't know... Romanian, doesn't feel like restoring anything. And now Romanians are pretty angry, because offending Jewish people or people of recent African descent is somehow "wronger", so they get offended and feel left out. It's wrong to pick on anyone either way, deal with it!

In the end, introducing currency just pushes people into two diametrically opposed groups: the payers and the people who are owed. And of course, there are the people who ride the wave and take their little percentage to convert it to any other currency: money, hate, power, etc. We become slaves to the middlemen even when we interact with other people! Hell, they want to introduce ethics for computers now. Where does it end?!

So, as I am an egalitarian in my misanthropy, I submit that you should get offended by people just like any other person would. Leave currency to bankers. Or pick on them! No one ever got into a twist for calling bankers names.