I stumbled upon a very funny compiler error today. Basically, I was creating a constant string by concatenating other constant strings, and it worked. The moment I added an integer, though, I got The expression being assigned to 'Program.x2' must be constant. The code that generated this error is simple:
const string x2 = "string" + 2;
Note that
const string x2 = "string" + "2";
is perfectly valid. I got the same result using VS2010 and VS2015, so it's not a compiler bug; it's intended behavior.

So, what's going on? Well, behind the scenes my code transforms into
const string x2 = "string" + 2.ToString();
which is not a constant expression because of the ToString call!

The only way to solve it was to declare the numeric constant as a string as well.
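A minimal sketch of the fix (the constant name two is mine):
// declaring the number as a string constant keeps the whole
// expression a compile-time constant: string + string is allowed
const string two = "2";
const string x2 = "string" + two; // compiles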

This clause is so obscure that it took me a few minutes to even find the Microsoft reference page for it, so no wonder I didn't know about it. Introduced in SQL Server 2005, the TABLESAMPLE clause limits the rows returned from a table in the FROM clause to a sample: either a number of rows or a percentage.

Usage:
TABLESAMPLE (sample_number [ PERCENT | ROWS ] ) [ REPEATABLE (repeat_seed) ]

REPEATABLE sets the seed of the random number generator, so that running the query again returns the same sample.
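For example, a minimal query (it reuses the Sales.SalesOrderDetail table from the snippet below; the seed value 42 is arbitrary):
SELECT * FROM Sales.SalesOrderDetail
TABLESAMPLE (10 PERCENT) REPEATABLE (42)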

It sounds great at the beginning, until you start seeing the limitations:
  • it cannot be applied to derived tables, tables from linked servers, and tables derived from table-valued functions, rowset functions, or OPENXML
  • the number of rows returned is approximate: 10 ROWS doesn't necessarily return 10 records. In fact, the implementation underneath first converts 10 into a percentage
  • a join of two tables is likely to return a match for each row in both tables; however, if TABLESAMPLE is specified for either of the two tables, some rows returned from the unsampled table are unlikely to have a matching row in the sampled table.
  • it isn't even that random!

Funny enough, even the reference page recommends a different way of getting a random sample of rows from a table:
SELECT * FROM Sales.SalesOrderDetail
WHERE 0.01 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float) / CAST (0x7fffffff AS int)

Even if it's probably not something I'll actually use, at least I've learned something new about SQL.

Update:
More about getting random samples from a table can be found here, in an article that explains why ORDER BY NEWID() is not the way to do it and hints at what really happens in the background when we invoke TABLESAMPLE.
Another interesting article on the subject, focused more on statistical probability, can be found here; it also shows how TABLESAMPLE's cluster sampling can fail in spectacular ways.

I am often left dumbfounded by the motivations other people assign to my actions. Most of the time it is caused by their self-centeredness, their assumption that whatever I do is somehow related more to them than to me. And it made me think: am I a good or bad person, or is it all a matter of perception by others?

I rarely feel like I do something out of the ordinary for other people; instead I do it because that's who I am. I help a colleague because I like to help or I refuse to do so because I feel that what I am doing is more important. Same with friends or romantic relationships. Sometimes I need to make an effort to do something, but it's still my choice, my assessment of the situation and my decision to go a certain way. It's not a value judgment on the person, it's not an asshole move or some out of my way effort to improve their life. What I do IS me.

It's also a weird direction of reasoning, since I am aware of the physical impossibility of "free will" and I subscribe to the school of thought that it is all an illusion. I mean, logic dictates that either the world works top-down, with some central power of will trickling down into reality, or it is merely a manifestation of low level forces and laws of physics that lead inexorably towards the reality we perceive. In other words, if you believe in free will, you have to believe in some sort of god, and I don't. Yet living my life as if I have no free will makes no sense either. I need to play the game if I am to play the game. It's kind of circular.

Getting back to my original question: Isn't good or bad just a label I (and other people) assign to a pattern of behavior that belongs to me? And not before I do things, but always afterwards. Just like the illusion of free will there is the illusion of moral quality that guides my path. While one cannot quantify free will, they can measure the effect my behavior has on their life and goals and determine a value. But then is my "goodness" something like an average? Because then the number of people I affect would matter more than the absolute value of the effect per person. Who cares if I help a colleague or pay attention to my wife? In the big sea of people, I am just a small fish that affects a few other small fish. We could all die tomorrow in the belly of a whale, all that goodness pointless.

So here I am, asking essentially a "who am I" question - painfully aware it has no final answer - in a world I think is determined by tiny laws of physics that create the illusion of self and with a quantity of consequence that is irrelevant even if it weren't so. I am torturing myself for no good reason, ain't I?

Yet the essence of the question still intrigues me. Is it necessary that I feel a good drive for my actions to be a good person, or is it an after-the-fact calculation of their effect that determines that? If I work really well and fast for a month and then do less the next, is it that I did good work in the first month or that I am a lazy bastard in the second? If I pay attention to someone or make a nice gesture, is it something to be lauded, or something to be criticized when I don't do it all the time? Is this a statistical problem or an issue of causality?

And I have to ask this question because if I feel no particular drive to do something and just "am myself", I don't think people should assign all kinds of stupid motivations to my actions. And if I need to make a sustained effort to go outside my routine just to gain moral value... well, it just feels like a lot of bother. And I have to ask it because the same reasoning can be applied to other people. Is my father making terrible efforts to take care of just about everybody in his life, making him some sort of saint, or is it just what he does and he can't help himself, in which case he's just a regular dude?

Personally I feel that I am just an amalgamation of experiences that led to the way I behave. I am neither good nor evil and my actions define me more than my intentions. While there is some sort of consistency that can be statistically assessed, it is highly dependent on the environment and any inference would go down the drain the moment that environment changes. But then, how can I be a good person? And does it even matter?

.NET Core Web API uses Newtonsoft's Json.NET for JSON serialization. In other situations, when you wanted to control Json.NET options, you would do something like
JsonConvert.DefaultSettings = () =>
{
    var settings = new JsonSerializerSettings();
    // do something with settings
    return settings;
};
but in this case it doesn't work. The way to do it is to use the fluent interface and hook into the ConfigureServices(IServiceCollection services) method, after the call to .AddMvc(), like this:
services
    .AddMvc()
    .AddJsonOptions(options =>
    {
        var settings = options.SerializerSettings;
        // do something with settings
    });

In my particular case I wanted to serialize enums as strings, not as integers. To do that, you need to use the StringEnumConverter class. For example, if you wanted to serialize the Gender property of a person as a string, you could define the entity like this:
public class Person
{
    public string Name { get; set; }

    [JsonConverter(typeof(StringEnumConverter))]
    public GenderEnum Gender { get; set; }
}

In order to do this globally, add the converter to the settings converter list:
services
    .AddMvc()
    .AddJsonOptions(options =>
    {
        options.SerializerSettings.Converters.Add(new StringEnumConverter
        {
            CamelCaseText = true
        });
    });

Note that in this case, I also instructed the converter to use camel case. The result of the serialization ends up as:
{"name":"James Carpenter","age":51,"gender":"male"}

I was doing this silly HackerRank algorithm challenge and I got the solution correct, but it would always time out on test 7. I wracked my brain on all sorts of different ideas, but to no avail. I was ready to throw in the towel and check out other people's solutions, only they were all in C++ and seemed pretty similar to my own. And then I made a code change and the test passed: I had replaced LINQ's OrderBy with Array.Sort.

Intrigued, I started investigating. The idea was to create a sorted integer array from a space delimited string of values. I had used Console.ReadLine().Split(' ').Select(s=>int.Parse(s)).OrderBy(v=>v); and it consumed above 7% of the total CPU of the test. Now I was using var arr=Console.ReadLine().Split(' ').Select(s=>int.Parse(s)).ToArray(); Array.Sort(arr); and the CPU usage for that piece of the code was 1.5%. So the LINQ version was almost five times slower. How do the two implementations differ?

Array.Sort should be simple: an in-place quicksort, the best general solution for this sort (heh heh heh) of problem. How about Enumerable.OrderBy? It returns an OrderedEnumerable, which internally uses a Buffer<T> to gather all the values into a container, then uses an EnumerableSorter to... quicksort the values. Hmm...

Let's get back to Array.Sort. It's not as straightforward as it seems. First of all it "tries" a SZSort. If it works, fine, return that. This is an external native code implementation of QuickSort on several native value types. (More on that here) Then it goes to a SorterObjectArray that chooses, based on framework target, to use either an IntrospectiveSort or a DepthLimitedQuickSort. Even the implementation of this DepthLimitedQuickSort is much, much more complex than the quicksort used by OrderBy. IntrospectiveSort seems to be the one preferred for the future and is also heavily optimized, but less complex and easier to understand, perhaps. It uses quicksort, heapsort and insertionsort together.

Now, before you go all "OrderBy sucks!", read more about it. This StackOverflow list of answers seems to indicate that, in the case of strings at least, the performance is similar. A lot of other interesting things there, as well. OrderBy uses a "stable" QuickSort, meaning that two items that compare as equal will appear in their original order. Array.Sort does not guarantee that.

Anyway, the performance difference in my particular case seems to come from the native code implementation of the sort for integers, rather than algorithmic improvements, although I don't have the time right now to grab the various implementations and test them properly. However, just from the way the code reads, I would bet the IntrospectiveSort will compare favorably to the simple Quicksort implementation used in OrderBy.
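For anyone who wants to test it themselves, here is a minimal benchmark sketch; the array size and the seed are arbitrary and the numbers will vary by machine and runtime:

using System;
using System.Diagnostics;
using System.Linq;

class SortBenchmark
{
    static void Main()
    {
        var rnd = new Random(42);
        var data = Enumerable.Range(0, 1000000).Select(_ => rnd.Next()).ToArray();

        var sw = Stopwatch.StartNew();
        var sortedByLinq = data.OrderBy(v => v).ToArray(); // stable sort, works on a copy
        sw.Stop();
        Console.WriteLine($"OrderBy:    {sw.ElapsedMilliseconds} ms");

        var copy = (int[])data.Clone();
        sw.Restart();
        Array.Sort(copy); // in-place, native fast path for primitive types
        sw.Stop();
        Console.WriteLine($"Array.Sort: {sw.ElapsedMilliseconds} ms");
    }
}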

Today I received two DMCA notices. One of them might have been true, but the second was for a file which started with
/*
Copyright (c) 2010, Yahoo! Inc. All rights reserved.
Code licensed under the BSD License:
http://developer.yahoo.com/yui/license.html
version: 2.8.1
*/
Nice, huh?

The funny part is that these are files on my Google Drive, which are not used anywhere anymore and are accessible only by people with a direct link to them. Well, I removed the sharing on them, just in case. The DMCA is even more horrid than I thought. The links in it are general links towards a search engine for notices (not the link to the actual notice) and some legalese documents, the email it comes from is noreply-6b094097@google.com, and any hope that I might fight this is quashed, quite intentionally, by the way the document is worded.

So remember: Google Drive is not yours, it's Google's. I wonder if I would have gotten the DMCA notice even if the file had not been shared. There is a high chance I would have, since no one should have been using the link directly.

Bleah, lawyers!

I have enabled Disqus comments on this blog and it is supposed to work like this: every old comment from Blogger has to be imported into Disqus and every new comment from Disqus also needs to be saved in the Blogger system. Importing works just fine, but "syncing" does not. Every time someone posts a comment I receive this email:
Hi siderite,
 
You are receiving this email because you've chosen to sync your
comments on Disqus with your Blogger blog. Unfortunately, we were not
able to access this blog.
 
This may happen if you've revoked access to Disqus. To re-enable,
please visit:
https://siderite.disqus.com/admin/discussions/import/platform/blogger/
 
Thanks,
The Disqus Team
Of course, I have not revoked any access, but I "re-enable" just the same, only to be presented with a link to resync that doesn't work. I mean, it is so crappy that it returns the JavaScript error "e._ajax is undefined" on a line where e._ajax is used instead of e.ajax, and even if that had worked, it uses a config object that is not defined.

It doesn't really matter, because the ajax call just accesses (well, it should access) https://siderite.disqus.com/admin/discussions/import/platform/blogger/resync/. And guess what happens when I go there: I receive an email that the Disqus access in Blogger has been revoked.

No reply from the Disqus team for months, for me or anybody else having this problem. They have a silly page that explains that, of course, they are not at fault: Blogger did some refactoring and broke their system. Yeah, I believe that. They probably renamed the ajax function in jQuery as well. Damn Google!

I ran into an interesting case today in which we needed to manipulate data from tens of thousands of people daily. Assuming we use table rows for the information, we get a table in which rows are constantly added, updated and deleted. The issue is with the space allocated in table pages.

SQL Server works like this: if it needs space, it allocates a "page", which can contain multiple records. When you delete records, the space is not reclaimed; it remains as is (this is called ghosting). The exception is when all records in a page are deleted, in which case the page is reused as an empty page. When you update a record with more data than it held before (as with a variable length column), the page is split, with the rest of the records on the page moved to a new page.

In a heap table (no clustered index) the space inside pages is reused for new records or for updated records that don't fit in their allocated space. However, if you use a clustered index, like a primary key, the space is not reused, since there needs to be a correlation between the value of the column and its position in the page. And here lies the problem. You may end up with a lot of pages with very few records in them. A typical page is 8 kilobytes, so a table whose records hold just a few integers can fit hundreds of records on a single page.

Fragmentation can be internal (within a page, as described above) or external (between pages, when recycled pages are used for data that is out of order). To read a large swathe of records the disk might be worked hard jumping from page to page to gather what is logically a continuous blob of data. It is disk input/output that kills a database.

OK, back to our case. A possible solution was to store all the data for a user in a "blob", a VARBINARY column. For reads or changes, only the disk space occupied by the blob would be touched, with C# code handling everything else. It's what is called trading CPU for IO, which is generally good. However, this NoSQL-like idea itself smelled bad to me. We are supposed to trust our databases, not work against them. The solution I chose is monitoring index fragmentation and occasionally issuing clustered index rebuilds or reorganizations. I am willing to bet that reading/writing the data equivalent of several table pages is going to be more expensive than selecting the changes I want to make. Also, rebuilding the index will end up storing all the data per user in the same space anyway.
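For reference, a minimal sketch of the monitoring and maintenance part; the table and index names (dbo.UserData, IX_UserData) are made up for illustration:

-- check logical fragmentation of the indexes on the table
SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.UserData'), NULL, NULL, 'LIMITED');

-- reorganize for light fragmentation, rebuild for heavy fragmentation
ALTER INDEX IX_UserData ON dbo.UserData REORGANIZE;
ALTER INDEX IX_UserData ON dbo.UserData REBUILD;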

However, this case made me think. Here is a situation in which the solution might have been (and, in a similar case implemented by someone else, actually was) to micromanage the way the database works. It made me question using a clustered index/primary key on a table.


I had this problem with Perforce where I accidentally Reconciled my offline work with all the files in the /bin and /obj folders, resulting in a huge changelist of over 6000 files. OK, a simple one-button mistake; surely there must be some one-button way of undoing what I just did. It appears there is not.

In order to fix this I have to follow these steps:
  1. Change the settings of Perforce to show files even in changelists larger than 1000 items (the default value)
  2. Select by hand, in the changelist window, the files from the obj and bin folders and use Revert on them
  3. Revert the few other files that were unwanted in the changelist, like .suo and .user files - note that Revert on added files doesn't delete them, it just unadds them
  4. Create a file with paths to ignore and then use p4 set P4IGNORE=<filename> for future reconcile work

What didn't work was adding a filename or path filter when visualizing the changelist, since that is a changelist filter, not a file filter. It will show you changelists containing files that match the pattern, but it will not filter the files inside the changelists themselves.

For reference, the p4ignore file I used looked like this:
p4ignore
bin/Debug
obj/Debug
*.suo
*.user
Note that I also added the p4ignore file itself, although the file was not in any Perforce repository (yet).

"But, Siderite, you should use Git (or whatever source control is the newest fad at the moment)!" Wish that could, my friend, wish that I could.

Having been so pleasantly surprised by First Light, the first book in The Red series, I quickly read the next two books: The Trials and Going Dark. However, possibly due to my high expectations, I have been disappointed by the continuation. Linda Nagata seemed to have reached that sweet spot between current tech trends and emergent future that makes stories feel both hard sci-fi and realistic. The integration between man and machine, the politics run by shadowy megarich "dragons" from the background, nuclear bombs detonated in major US cities, artificial intelligence and so on. The potential was immense!

Yet the author chose to continue the story on the same flat note, like an ode to Stockholm Syndrome, where the hero is repeatedly coerced into running missions that at first seem bullshit, but that in the end he rationalizes as necessary and even dutiful. The constant intrusion of external forces into his emotional balance by way of direct brain stimulation also isolates his feelings and motivations completely from those of the reader. A strange choice, considering the vast possibilities opened by the first book. Frankly, it felt like Nagata enjoyed writing the first book and then was forced by publishers to make it "a trilogy", since that is the norm for fantasy and science fiction, but her heart wasn't really in it.

I don't want to spoil the ending, such as it is, but I will say it was disappointing as well, with no real closure for the reader of any of the important questions raised in First Light. Too bad, since I felt the story was beginning to touch on important subjects that needed to be discussed at a deeper level than just "boots on the ground".

I was listening to this Software Engineering Daily podcast about Facebook Relationship Algorithms and I had this weird idea. The more I was thinking about it, the more realistic it felt (as well as more than a bit creepy). Let me lay it out for you. The podcast is interesting in its own right, so go listen to it, it's instructive.

So they describe these metrics of your Facebook connections that they reached while trying to find an algorithm to detect your romantic relationship. They took a large sample of users who had declared their significant other and tried to find an automated way of predicting that from the other information Facebook had. The first idea was to use connectivity, an often used idea in sociology: the more friends you have in common with someone, the closer you are. But it didn't quite work; one clear counterexample would be coworkers connected on social media. Instead, they realized that such functional connections often cluster, so you would have the cluster of coworkers, your family, your club friends, the people who share your hobby, etc. The romantic partner would be connected with many of these friends, but across clusters; in other words, you would have many common friends that are not really connected with each other. Using this metric, which they called dispersion, they could guess who your partner is, from a list of hundreds of friends, with more than 50% accuracy on the first try.

And here is my idea: why not reverse engineer it? Imagine there is someone you want to hook up with, but you have no idea how to proceed. Maybe asking them on a date is not an option, or maybe, like any engineer out there, you want to maximize the chances that your experiment will work. Why not find the smallest subset of that person's friends that has the largest value of dispersion? Here is what this "hook me up" algorithm would do (a naive sketch of the dispersion scoring follows the list):
  1. Collect the list of friends of your target
  2. Find a sample that are well connected to the target, but less connected with each other
  3. Approach each of them and befriend them
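Here is a naive sketch of the dispersion scoring from step 2; the toy graph and the exact scoring formula are my own simplifications for illustration, not Facebook's actual algorithm:

using System;
using System.Collections.Generic;
using System.Linq;

class DispersionSketch
{
    static void Main()
    {
        // toy friendship graph: ann+bob form one cluster, cat+dan another,
        // while eve bridges both clusters without her contacts knowing each other
        var friends = new Dictionary<string, HashSet<string>>
        {
            ["target"] = new HashSet<string> { "ann", "bob", "cat", "dan", "eve" },
            ["ann"] = new HashSet<string> { "target", "bob", "eve" },
            ["bob"] = new HashSet<string> { "target", "ann" },
            ["cat"] = new HashSet<string> { "target", "dan", "eve" },
            ["dan"] = new HashSet<string> { "target", "cat", "eve" },
            ["eve"] = new HashSet<string> { "target", "ann", "cat", "dan" }
        };

        foreach (var candidate in friends["target"])
        {
            // mutual friends of the target and this candidate
            var mutual = friends["target"].Intersect(friends[candidate]).ToList();

            // dispersion: count pairs of mutual friends NOT connected to each other
            int dispersion = 0;
            for (int i = 0; i < mutual.Count; i++)
                for (int j = i + 1; j < mutual.Count; j++)
                    if (!friends[mutual[i]].Contains(mutual[j]))
                        dispersion++;

            Console.WriteLine($"{candidate}: dispersion {dispersion}");
        }
        // eve scores highest: her shared friends span otherwise unlinked clusters
    }
}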

The result would be that you become the "natural" choice for a relationship, by going backwards and reversing the direction of causality. Automatic stalking, courtesy of your friendly neighborhood software developer: Siderite! We live in an age in which information about us can be used or abused in innumerable ways, and we have become addicted to and stuck with this way of relating. It's not a bad thing, but it has its drawbacks. It is good to know of them.

First Light is Linda Nagata's first book in The Red series, which follows a military man landing right into the middle of an emergence event. Stuck between his duty as a soldier, his love for his girlfriend and father, the maniacal ambitions of an all powerful defense contractor queen and a mysterious God-like entity which seems to like him, our hero does what he can to survive and do good by his own principles.

At first I thought it was going to be one of those cheap soldiering books. It was short, written by a woman, and frankly I expected a standard pulp fiction "read it on a train" kind of thing. Instead I was blown away by the subtlety with which the characters are explored and the way the story is constructed. I loved the book and I plan to read the whole series. I started reading The Dread Hammer, another Nagata book, this time fantasy, but it doesn't even come close to First Light. I may even end up disliking it.

Anyway, I can't say much about the plot without spoiling it, but I can certainly recommend this book. As I said, it is short enough to read and see if it evokes the same feelings. Instead of hurting it, the female perspective of the author enhances the experience and makes it unique. The technical aspects are spot on and the writing style is fluid and easy to read. Top marks!


Update May 2020: I used this on a web site and the body was white. It may be that the bug in Chrome was solved in the meantime.

Update October 2019: a CSS media feature (prefers-color-scheme) can be used in conjunction with this. A recent development, it's a media query that allows a browser to activate CSS code based on the theme set in the operating system. You set your preference in Windows or MacOS or wherever and then sites that use prefers-color-scheme will take advantage of that. Something like this:

@media (prefers-color-scheme: dark) {
  html,img, video, object, [style*=url] {
    -webkit-filter:invert(100%) !important;
    filter:invert(100%) !important;
  }

  /* this was solving a bug in Chrome that seems to have been fixed
  body {
    background:black;
  }
  */
}

  I am a very light-sensitive person. Shine a light in my eyes and you limit my productivity immensely. Not to mention it makes me irritable. Therefore I often have the desire to turn cheerful black-on-white sites into a dark theme, where the colors are reversed. I am sure other people have the same problem, so I thought of building a browser extension with a button for switching between the two.

  The first problem is that I would need to interrogate all the elements in a page, including the ones that will be created later. The second is that even so, I would have problems determining the dominant color of images. But there is something I can use that makes all of this unnecessary: the invert CSS filter! Since I already use a browser extension that injects my own styles into any site - it's called Stylish and I highly recommend it - all I have to do is apply a filter to the entire site, right?

  Wrong! The problem is that when you invert an entire site, all images on the site get inverted, too. That also includes videos and Flash objects. The worst offenders here are the elements that sport a background image that is declared via CSS, since you can't create a CSS selector for them. I am going to present my partial solution and maybe you can help me find a more elegant or more complete one. Here is a general dark theme stylesheet, without the elements that have a background image declared via CSS (it does include those with a background image declared inline, though):

 html,img, video, object, [style*=url] {
    -webkit-filter:invert(100%) !important;
    filter:invert(100%) !important;
  }

 /* this was solving a bug in Chrome that seems to have been fixed
 body {
   background:black;
 }
 */

  What it does is invert the entire page (html), then re-invert video, img, and object elements, as well as those with "url" in the style attribute. In Chrome, at least, there seemed to be a bug where the backgrounds of the direct child elements were not inverted, which meant body, as the first child, needed its background set to black specifically (this seems to have been solved by May 2020). The hack to invert elements with "url" in style is pretty ugly, too.

  What I would consider a real solution is this:

  • inject Javascript to enumerate all elements present and future using document.createTreeWalker and Mutation Observers, check if they have a background image and if so, add a class to them
  • inject the CSS above with an additional rule for the class for the elements with background image

  However, this doesn't completely solve the problem. One of the major issues is that the inverted colors sometimes look dumb. For example, a red background turns cyan, and white text on a light gray background turns into black text on a dark gray background, which makes it hard to read. I've tried various other filters, like hue-rotate or contrast, but they don't really help. Detecting individual color patterns doesn't really work either, as the filter attribute affects an element and all of its children. The CSS above only works because the images are inverted again after the entire page has been inverted.

  The good news is that most of the time you can use the CSS above as a template, then add rules (manually) to fix small issues with background colors. Even if I don't package this in an extension, you have the power to create your own themes for various sites. Never again will you be subject to the tyranny of the happy bright shiny people!

I was glad to attend the 565th SQLSaturday in Bucharest yesterday and, while all presentations were cool, I wanted to share with you some specific points that I found very revealing. Without further ado, here is the list:
  • SQL execution plans are read from right to left - such a simple thing, but I remember trying to read them from left to right and not understanding anything. In SQL Server Management Studio 2016 you also get a "live" version, which shows you the execution plan while it's executing. Really useful for seeing where the blocking operations are.
  • Manually control your statistics updates - execution plans are calculated based on statistics, but the default condition for updating the statistics is that the changes amount to 20% of the table's rows plus 500. This default is completely arbitrary and may cause a lot of pain. Not only does updating the statistics block your table (which means more chances that the table will be locked when it is most used), but sometimes the statistics are not even useful. One example is reports which receive a startdate/enddate range or a count or something like that, which makes the number of rows affected vary immensely with different parameters. Use OPTION (RECOMPILE) for that (see the sketch after this list).
  • Look for a difference between estimated and actual rows in a query plan, which leads to tempdb spills, which lead to unwanted IO operations - before a query runs, an execution plan is created or reused, based on statistics, as I was saying above. Once a plan has been chosen, though, it doesn't change during execution. Basically this means the structure of the plan remains unchanged between the estimated and actual plan. Memory is also requested based on the plan and never adjusted. So if the plan asks for 10KB of memory and you actually need 1000KB, the remaining 990KB will be stored in and served from tempdb, even if there is enough memory available, since the memory requirements don't change from estimated to actual. The reverse is not much better, since a plan may ask for a lot of memory when it only needs a little, thus leaving everything else on that machine with fewer available resources.
  • SQL default settings suck - there was an entire presentation about that, it is useful to think a little about it. So many settings are legacy things that make no sense, like the initial database size, the autogrow size, index fill factors, maxdop (degree of parallelism), parallelism threshold, used memory (ironically, using all of it may be hurtful as it takes it away from other processes which leads to using the swap file), etc.
  • Look for hard page faults - this counter is much more useful than soft page faults, which are faults the system can fix cheaply. A hard page fault is indicative of unnecessary IO operations, which are orders of magnitude slower than memory access.
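As an illustration of the OPTION (RECOMPILE) hint mentioned in the list above (the table and parameters are hypothetical):

-- a report query whose affected row count varies wildly with the parameter range;
-- RECOMPILE trades plan reuse for a plan tailored to the actual parameter values
SELECT SUM(TotalDue)
FROM dbo.Orders
WHERE OrderDate BETWEEN @startdate AND @enddate
OPTION (RECOMPILE);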

There are a lot more things that I want to explore now that I have participated in the event. You may find the files for the presentations in the same place as the full list of talks at SQLSaturday.


Tad Williams probably fancies himself another Tolkien: he writes long descriptions of lands and peoples and languages, shows us poetry and songs, tells us about the rich history of the land. And all of this while we follow yet another common but good boy with a mysterious ancestry, while he and his merry band of helpers fight THE DARK ONE. It's the same old story, with the hapless youth guided by wise but not forthcoming people who tend to die, leave or otherwise shut up before the hero gets the whole story and can do anything about it. Regardless, he is young and lucky, so it's OK.

If The Dragonbone Chair had been fun, or interesting, or had at least shown us a character we could care about, this book would have been readable. As such, it's a trope-filled, boring and sleep-inducing thing. You have to wait until half the book has passed to see the things you predicted from the first few chapters. I couldn't even finish it. More than two thirds into the book, there is no significant part of the story that involves either dragons or chairs.

Bottom line: it sucks!