I had this situation where I had to match a string against several regular expressions and choose the most specific match. Let me explain a little further: you have a string and several regular expressions that match it, and the requirement is to choose the one that matches best. This would be useful if, for example, you want to match mobile manufacturers, so you define one configuration for Apple, another for Android and another for the rest. The regular expressions would be "Apple", "Android" and ".*" (which matches everything). You would want to choose only "Apple" for Apple devices, not ".*".

Now, I've studied regular expressions a bit, but I don't want to implement a regular expression parser just to compute the complexity of the expression tree. What I want is to have various Regex objects and, somehow, compare their complexity. I am not going to pretend that I understand what goes on inside the Regex object; instead I will say that I noticed the complexity of a regular expression is directly proportional to the length of the pattern and, in the case of the Regex object, to the number of items in the _codes Int32 array of the RegexCode object stored in the private code field of the Regex object.

So, the simplest code for that is this:
private double regexComplexity(string pattern)
{
    var reg = new Regex(pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
    var codeField = reg.GetType().GetField("code", BindingFlags.Instance | BindingFlags.NonPublic);
    var code = codeField.GetValue(reg);
    var _codesField = code.GetType().GetField("_codes", BindingFlags.Instance | BindingFlags.NonPublic);
    var codes = (int[])_codesField.GetValue(code);
    return codes.Length;
}

However, we must take a couple of issues into consideration:
  • The regular expression pattern might be wrong
  • Reflection is slow

So, let's first catch any exception and fall back to the length of the pattern as the complexity, then find a way to cache the complexity of a pattern, in case we use it more than once. Also, cache the FieldInfo objects so that they are not looked up on every call.

Here is the final code:
public static class RegularExpressionExtensions
{
    private static readonly ConcurrentDictionary<string, int> _dict;
    private static readonly FieldInfo _codeField;
    private static readonly FieldInfo _codesField;

    static RegularExpressionExtensions()
    {
        _dict = new ConcurrentDictionary<string, int>();
        // cache the FieldInfo objects for the private RegexCode field and its _codes array
        _codeField = typeof(Regex).GetField("code", BindingFlags.Instance | BindingFlags.NonPublic);
        _codesField = _codeField.FieldType.GetField("_codes", BindingFlags.Instance | BindingFlags.NonPublic);
    }

    public static int Complexity(this Regex reg)
    {
        if (reg == null) return 0;
        var pattern = reg.ToString();
        var complexity = _dict.GetOrAdd(pattern, p => getComplexity(reg));
        return complexity;
    }

    private static int getComplexity(Regex reg)
    {
        try
        {
            var code = _codeField.GetValue(reg);
            var codes = (int[])_codesField.GetValue(code);
            return codes.Length;
        }
        catch
        {
            // if reflection fails (for example, the internals change), fall back to the pattern length
            return reg.ToString().Length;
        }
    }
}

Basically we encapsulate the Regex complexity in a static extension class that we can use on any Regex instance as reg.Complexity().
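
Going back to the original scenario, here is a minimal usage sketch: the patterns and input are just the example from above, and it assumes the extension class is in scope and System.Linq is imported.

// pick the most specific of the patterns that match the input
var patterns = new[] { "Apple", "Android", ".*" };
var input = "Apple iPhone";

var best = patterns
    .Select(p => new Regex(p, RegexOptions.Singleline | RegexOptions.IgnoreCase))
    .Where(r => r.IsMatch(input))
    .OrderByDescending(r => r.Complexity())
    .First();

Console.WriteLine(best); // the most specific matching pattern, "Apple" in this case

The most complex pattern that still matches wins, so "Apple" is chosen over ".*".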

Today I published a raw version of a program that solves Pixelo puzzles, a Flash version of the game generally known as Nonogram, Picross, Hanjie, etc. I was absolutely fascinated by the game, not only by the concept, but also by the attention to detail that Tamaii lent to it. The program is called Pixelo Solver and you can find it on GitHub, complete with source code.


I liked working on this because it covers several concepts that I found interesting:
  • Responding to global key combinations
  • Getting a snapshot of the screen and finding an object in it
  • Parsing a visual image for digits in a certain format
  • The algorithm for solving the puzzle in a reasonable amount of time and memory (it was not trivial)

Quick how-to: get all the files from here and copy them into a folder, then run PixeloTools.GameSolver.exe and press Ctrl-Shift-P on your puzzle in the browser.

Usage:
  1. Start PixeloTools.GameSolver.exe
  2. Open the browser to the Pixelo game
  3. Start a puzzle
  4. Press Ctrl-Shift-P

However, if the list of numbers per row or column is too long, the automated parsing will not work. In that case, you may use it offline, with a parameter that defines a file in the format:
5 1 10, 3 3 7, 2 5 4, 2 5 2, 2 1 1 1, 1 1 1 1 1, 1 1 3 1 1, 1 1 1 1 1, 1 1 1, 1 1, 1 1, 2 2 2 4, 2 1 1 1 2 4, 2 1 2 1, 7 4, 2 2 1 2 2 2, 2 1 1 1 1 1 1, 2 1 1 4 2, 1 3 4 2 1, 1 1
8 1 1, 5 2 2 1, 2 1 3, 1 1, 1 7 4 1, 3 5 1, 3 1 1 3, 4 3 4, 3 1 1 3, 3 5 1, 1 7 4 1, 1 2, 1 1 1, 2 2 2, 2 1 1 2, 2 1 2 2, 3 2 1, 3 1 1, 4 1 2 2, 9 2 3
where the first line holds the horizontal (row) clues and the second line the vertical (column) clues. When the parsing fails, you still get something like this in the output of the program. You can just copy-paste the two lines into a file, edit it so that it matches the puzzle, then run:
start "" PixeloTools.GameSolver.exe myFile.txt

The file can also be in XML format, same as in the file the game loads. That's for Tamaii's benefit, mostly.

Let me know what you think in the comments below.

Relational databases have the great advantage of being able to use indexes. These are constructs that trade space and the speed of inserts and updates for query speed. This introduces a problem: what do you do when you have a high-input, high-output application? How can you make your database fast for both changes and queries?

The traditional solution is a write-database and a read-database. Background services ensure that the read-database is updated as fast as necessary with the new data, usually also doing extra computation and caching for the information that needs to be extracted. OLAP systems are a common solution, where the aggregated data required by various reports is computed and stored, ready to be read by your applications. I am not saying this is a bad idea; in fact I am saying this is the best idea for this scenario. The problem I have with it is that you can hardly automate the process. You need to know what you want to read, and you need to write the software that creates the data and aggregates it into what you need.

So I decided to try to build a system that obeys the following rules:
  1. The speed of insert and update operations needs to be unhindered by indexes. Indeed, changes to the original tables should be avoided.
  2. The select operations need to benefit from the indexing system.
  3. The system must be able to determine by itself the data structures needed to cover the first two rules.
  4. Optionally, there should be a way to encapsulate this new behaviour into a separate data structure from the original database.


In order to insert and update as fast as possible, I needed tables that only had primary keys, identity integers rather than uniqueidentifiers, as the latter lead to fragmentation of clustered indexes. In order to not only index the columns that are used in where and join conditions, but also encapsulate the behaviour in some other data structure, I decided to create shadow tables with only the columns that I needed for querying. These tables I would then index in order to improve selects. The connection between the original insert table and the select table would be made via primary keys, and the synchronization between the two types of tables would be done in the background, whenever needed. Best of all, analysis of execution plans could automatically determine the columns needed for this system, create the required tables and indexes, then suggest improvements on the observed queries.

In order to test my theory I created the following tables:
  • OriginalTable - with a primary key identity ingeniously called Id and two other columns called Latitude and Longitude, containing random decimal(18,6) values
  • LocationIndexTable - with a refId integer primary key column that links to the OriginalTable Id - without being a foreign key - and two Latitude and Longitude columns that are indexed by the same nonclustered index
  • LocationIndexTable2 - with a refId integer primary key column that links to the OriginalTable Id - without being a foreign key - and a locId integer column that links to another table called LocationTable and has its own nonclustered index
  • LocationTable - with a primary key identity column and Latitude and Longitude columns. I know that this looks identical to LocationIndexTable, but this is the more complex case where there are multiple records with the same location. LocationTable holds the distinct Latitude and Longitude pairs
  • LocationIndexTable3 - with a refId integer column that links to the OriginalTable Id and is indexed by its own nonclustered index - without being a foreign key - and two Latitude and Longitude columns that are indexed by a clustered index

With 77,179,072 rows in the original table, I attempted the queries for each case, taking care to clear the cache and memory buffers beforehand. Here are the results:
  • SELECT count(1) FROM OriginalTable WHERE Latitude BETWEEN 45.5 AND 46 AND Longitude BETWEEN 8.5 AND 9 - no indexes whatsoever. Result: 30 seconds
  • SELECT count(1) FROM OriginalTable ot INNER JOIN LocationIndexTable lit ON lit.RefId=ot.Id WHERE lit.Latitude BETWEEN 45.5 AND 46 AND lit.Longitude BETWEEN 8.5 AND 9. Result: 17 seconds
  • SELECT count(1) FROM OriginalTable ot INNER JOIN LocationIndexTable2 lit2 ON lit2.RefId=ot.Id INNER JOIN LocationTable lt ON lit2.LocId=lt.Id WHERE lt.Latitude BETWEEN 45.5 AND 46 AND lt.Longitude BETWEEN 8.5 AND 9. Result: 41 seconds
  • SELECT count(1) FROM OriginalTable ot INNER JOIN LocationIndexTable3 lit ON lit.RefId=ot.Id WHERE lit.Latitude BETWEEN 45.5 AND 46 AND lit.Longitude BETWEEN 8.5 AND 9. Result: 22 seconds

Unexpectedly for me, the most comfortable solution also seems to be the fastest. Indeed, one issue is that I didn't have duplicate location data in the database, so there was no advantage in adding a new table to link locations to the original table. That means that in cases where the indexed data has many duplicates, it might be advantageous to experiment with a "distinct" table, although indexing should take this case into account as well. The clustered index is slower than the nonclustered one, which was a surprise. Also, the indexing just made the query twice as fast - I had expected more. Of course, all this benchmarking was done after deleting the cache and buffers with the commands DBCC FREEPROCCACHE; DBCC DROPCLEANBUFFERS. It is interesting to see how fast the queries go without this clearing: the first unindexed query takes 3 or 4 seconds, while all the others take less than 2.

There is one more thing that needs to be addressed: moving these tables into another database. Are the indexes just as fast? They should be, but we must test interdatabase communication. This would allow moving the entire system outside a target database, truly encapsulated, really adding value to a database that remains unaffected. My tests show the following results: 28, 18, 29 and 18 seconds, respectively. Yes, you saw that right: the joins, when the index tables are in another database on the same server, seem faster! Just to make sure, I reran the original tests on the same database and got approximately the same results: 29, 19, 43 and 24 seconds, respectively. The only thing I didn't try (and I don't intend to) is to create primary-foreign key associations, as this means modifying the original tables, something I wish to avoid.

So, after all this I believe my idea has been validated. A lot more work has to be done in order to automate this system, but at least a no-effort manual solution is possible. Let's recap:
  1. The speed of row inserts and updates remains unchanged: the only index is the original primary key, an identity integer that should exist in a table anyway.
  2. The speed of selects is improved by creating tables that have an integer primary key linking to the original table, plus only the columns used in queries, over which indexes are created.
  3. The observed queries can be used to determine which columns need indexing. Even better, computed columns can be used instead of the original columns, which adds a little more performance.
  4. Encapsulation is achieved by not only creating other tables for reading, but also moving these tables into another database, after which, unexpectedly, the performance is even better.


The end result is a system similar to indexing, but which takes space in another database and which is updated on demand, not when the system deems it necessary. Any SQL statements that worked before will continue to work unaffected, but faster alternatives for selects will become available. Overall, I am very satisfied, although I had expected better performance improvements than observed.

I was trying to do a simple thing: configure a daily rolling log for log4net, meaning I wanted log files to be created daily, with the date in their names. The log was already configured and working with a normal FileAppender, so all I had to do was find the correct configuration. There are several answers on the Internet regarding this. I immediately went to the trusty StackOverflow, read the first answers, copy-pasted lazily, and it seemed to work. But it did not. So, warning: the first answer on StackOverflow about this is wrong.

So, my logs had to be named something like Application_20150420.log. That means several things:
  • The name of the appender class has to be set to log4net.Appender.RollingFileAppender
  • The names of the log files need to start with Application_ and end in .log
  • The names of the log files need to contain the date in the correct format
  • The logger needs to know that the files need to be created daily

This is the working configuration:
<appender name="FileAppender" type="log4net.Appender.RollingFileAppender">
  <filter type="log4net.Filter.LevelRangeFilter">
    <acceptOnMatch value="true" />
    <levelMin value="DEBUG" />
  </filter>

  <file type="log4net.Util.PatternString" value="c:\logfiles\Application_.log" />
  <appendToFile value="true" />
  <rollingStyle value="Date" />
  <datePattern value="yyyyMMdd" />
  <preserveLogFileNameExtension value="true" />
  <staticLogFileName value="false" />

  <lockingModel type="log4net.Appender.FileAppender+MinimalLock" />
  <layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%date %-5level %logger - %message%newline" />
  </layout>
</appender>

As you can see, it is a little counter-intuitive. You do not need to specify the date in the file name, it will be added automatically, and you absolutely need to use preserveLogFileNameExtension, otherwise your files would look like Application_.log20150420.
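
For completeness, here is a minimal sketch of reading that configuration and writing a line, assuming classic .NET Framework log4net with the appender above living in the application's config file under a log4net section:

using log4net;
using log4net.Config;

class Program
{
    private static readonly ILog Log = LogManager.GetLogger(typeof(Program));

    static void Main()
    {
        // read the log4net section from app.config/web.config
        XmlConfigurator.Configure();
        Log.Debug("This ends up in Application_yyyyMMdd.log, rolled daily");
    }
}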

A blog reader asked me to help him get rid of the ugly effect of a large background image getting loaded. I thought of several solutions, each more complicated than the last, but in the end settled on one that seems to work well and doesn't require complicated libraries or difficult implementation: using the img onload event.

Let's assume that the background image is on the body element of the page. The solution involves setting a style on the body to hide it (style="display:none"), then adding as a child of the body an image that is also hidden and that, when it finishes loading, shows the body element. Here is the initial code:

<style>
body {
background: url(bg.jpg) no-repeat center center fixed;
}
</style>
<body>


And after:

<style>
body {
background: url(bg.jpg) no-repeat center center fixed;
}
</style>
<body style="display:none">
<img src="bg.jpg" onload="document.body.style.display=''" style="display:none;" />


This loads the image in a hidden img element and shows the body element when the image finishes loading.

The solution might have some problems with Internet Explorer 9, as it seems the load event is not fired for images retrieved from the cache. In that case, a slightly more complex Javascript solution is needed as detailed in this blog post: How to Fix the IE9 Image Onload Bug. Also, in Internet Explorer 5-7 the load event fires for animated GIFs at every loop. I am sure you know it's a bad idea to have an animated GIF as a page background, though :)

Warning: While this hides the effect of slow loading background images, it also hides the page until the image is loaded. This makes the page appear blank until then. More complex solutions would show some simple html content while the page is loading rather than hiding the entire page, but this post is about the simplest solution for the question asked.

A more comprehensive analysis of image preloading, complete with a very nice JavaScript code that covers a lot of cases, can be found at Preloading images using javascript, the right way and without frameworks.

I am starting a new blog series called Blog Question, due to the successful incorporation of a blog chat that works, is free, and does exactly what it should do: Chatango. All except letting me know in real time when a question has been posted on the chat :( . Because of that, many times I don't realize that someone is asking me things and so I fail to answer. As a solution, I will try to answer questions in blog posts, after I do my research. The new label associated with these posts is 'question'.

First off, some assumptions. I will assume that the person who said "I'm working on this project of address validation. Using crf models is my concern." was talking about Conditional Random Fields and meant postal addresses. If you are reading this, please let me know if that is correct. Also, since I am a .NET developer, I will use concepts related to .NET.

I knew nothing about CRFs before writing this post, so bear with me. The Wikipedia article about them is hard to understand for anyone without mathematical training (specifically probability and statistics). However, the first paragraph is pretty clear: Conditional random fields (CRFs) are a class of statistical modelling method often applied in pattern recognition and machine learning, where they are used for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without regard to "neighboring" samples, a CRF can take context into account. In other words, it is a process that classifies data by taking neighboring samples into account.

A blog post that clarified the concept much better was Introduction to Conditional Random Fields. It describes how one uses so-called feature functions to extract a score from a data sample, then aggregates the scores using weights. It also explains how those weights can be automatically computed (machine learning).

In the context of postal address parsing, one would create an interface for feature functions, implement a few of them based on domain-specific knowledge, like "if it's an English or American address, the word before St. is a street name", then compute the weighting of the features by training the system on a manually tagged series of addresses. I guess the feature functions can ignore the neighboring words and also do stuff like "if this regular expression matches the address, then this fragment is a street name".
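
To make that a bit more concrete, here is a minimal sketch of what such a feature function interface could look like in C#. The interface and class names are mine, purely illustrative, and not part of any existing CRF library:

using System;

// hypothetical contract for a feature function over a tokenized address
public interface IFeatureFunction
{
    // score the proposed label for the token at position i, with access to its neighbors
    double Score(string[] tokens, int i, string proposedLabel);
}

// toy domain-specific feature: the word right before "St." is probably a street name
public class WordBeforeStFeature : IFeatureFunction
{
    public double Score(string[] tokens, int i, string proposedLabel)
    {
        var nextIsSt = i + 1 < tokens.Length &&
                       tokens[i + 1].Equals("St.", StringComparison.OrdinalIgnoreCase);
        return nextIsSt && proposedLabel == "StreetName" ? 1.0 : 0.0;
    }
}

Training would then assign a weight to each such function, and the labeling with the best total weighted score wins.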

I find this concept really interesting (thanks for pointing it out to me) since it successfully combines feature extraction as defined by an expert with machine learning. Actually, the expert part is not that relevant, since the automated weighting will just assign a weight close to 0 to any stupid or buggy feature functions.

Of course, one doesn't need to do it from scratch; other people have done it in the past. One blog post that discusses this and also applies more probabilistic methods specifically to postal addresses can be found here: Probabilistic Postal Address Elementalization. Of Hidden Markov Models, Maximum-Entropy Markov Models, Transformation-Based Learning and Conditional Random Fields, she found that the Maximum-Entropy Markov model and the Conditional Random Field taggers consistently had the highest overall accuracy of the group. Both consistently had accuracies over 98%, even on partial addresses. The code for this is public on GitHub, but it's written in Java.

When looking around for this post, I found a lot of references to a specific piece of software called the Stanford Named Entity Recognizer, also written in Java, but which has a .NET port. I haven't used the software, but it seems to be a very thorough implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition, and many options for defining feature extractors. Included with the download are good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION). Perhaps this would also come in handy.

This is as far as I am willing to go without discussing existing code or actually writing some. For more details, contact me and we can work on it.

More random stuff:
The primary advantage of CRFs over hidden Markov models is their conditional nature, resulting in the relaxation of the independence assumptions required by HMMs in order to ensure tractable inference. Additionally, CRFs avoid the label bias problem, a weakness exhibited by maximum entropy Markov models (MEMMs) and other conditional Markov models based on directed graphical models. CRFs outperform both MEMMs and HMMs on a number of real-world sequence labeling tasks. - from Conditional Random Fields: An Introduction

Tutorial on Conditional Random Fields for Sequence Prediction

CRFsuite - Documentation

Extracting named entities in C# using the Stanford NLP Parser

Tutorial: Conditional Random Field (CRF)

Just noting this down for future work: I wanted to replace GETDATE() default column values with SYSUTCDATETIME(). This is the script I used:
-- declare a string that will hold the actual SQL executed
DECLARE @SQL NVARCHAR(MAX) = ''

-- drop each default value constraint, then add another with SYSUTCDATETIME() as the default value
SELECT @SQL = @SQL +
    N'ALTER TABLE [' + t.name + '] DROP CONSTRAINT [' + o.name + '];
ALTER TABLE [' + t.name + '] ADD DEFAULT SYSUTCDATETIME() FOR [' + c.name + '];
'
FROM sys.all_columns c -- get the name of the columns
INNER JOIN sys.tables t -- get the name of the tables containing the columns
    ON c.object_id = t.object_id
INNER JOIN sys.default_constraints o -- we are only interested in default value constraints
    ON c.default_object_id = o.object_id
WHERE o.definition = '(getdate())' -- only interested in the columns with getdate() as default value

-- execute the generated SQL
EXEC sp_executesql @SQL

Recently I created a framework for translating JSON requests from a REST API into entities sent to the database. For simple objects it was very easy: just create an SQL parameter for each property. However, for complex objects - having other objects as properties - this was not a solution. So I used a DataContractSerializer to transform the object to XML, send it as an XML SQL parameter and extract the values from it in the stored procedures. Then I noticed date and time inconsistencies between the two approaches. What was going on?

Let's start with the code. The DateTime object created from the JSON is a date and time value with a timezone, like 16:00 UTC+1, which is 15:00 in universal time. Once you send it as a parameter to a stored procedure, the value received by the stored procedure is 16:00 (the server has the same timezone); in SQL Server, the DATETIME and DATETIME2 types don't store timezone information. However, when sent through XML, the value looks like this: 2015-03-09T16:00:00.0000000+01:00. Using SELECT [Time] = T.Item.value('@Time','DATETIME2') FROM @Xml.nodes('//Location/SDO') as T(Item), the value returned is 15:00! You get 16:00+01 if you translate to DATETIMEOFFSET.
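
To show where that offset comes from, here is a minimal sketch of serializing a Local-kind DateTime with DataContractSerializer; the Location class is just a stand-in for my real entity:

using System;
using System.Runtime.Serialization;
using System.Xml;

[DataContract]
public class Location
{
    [DataMember]
    public DateTime Time { get; set; }
}

class Program
{
    static void Main()
    {
        var location = new Location { Time = new DateTime(2015, 3, 9, 16, 0, 0, DateTimeKind.Local) };
        var serializer = new DataContractSerializer(typeof(Location));
        var sb = new System.Text.StringBuilder();
        using (var writer = XmlWriter.Create(sb))
        {
            serializer.WriteObject(writer, location);
        }
        // a Local-kind DateTime is written with the machine's UTC offset,
        // e.g. <Time>2015-03-09T16:00:00+01:00</Time> on a UTC+1 server
        Console.WriteLine(sb.ToString());
    }
}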

So let's recap: when you send a DateTime with a timezone offset as an SQL parameter, the value reaching the SQL Server is the local time. When you extract a textual value with a timezone offset from XML into a DATETIME, using the .value method, the value you get back is basically the universal time.

Solutions? Well, if you are going to use DateTime, you might as well consider that servers and devices are not always in the same timezone, so always translating values to universal time might be a good idea. Another solution is to extract from XML into a DATETIMEOFFSET type, which holds both the datetime and the timezone offset. Converting that value to DATETIME or DATETIME2 removes the timezone (warning: it does NOT give the local time, unless you are in the same zone as the timezone in the DATETIMEOFFSET value).
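
For the first option, here is a minimal sketch of normalizing to universal time before the value ever reaches SQL Server; the table, column and helper names are made up for the example:

using System;
using System.Data;
using System.Data.SqlClient;

public static class TimeParameters
{
    // always send universal time, since DATETIME2 stores no timezone information
    public static void AddUtcTime(SqlCommand command, string name, DateTime localValue)
    {
        command.Parameters.Add(name, SqlDbType.DateTime2).Value = localValue.ToUniversalTime();
    }
}

The same rule applies on the way out: treat whatever you read from DATETIME2 columns as universal time and convert to local time only for display.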

I was trying to solve another problem: operations that create, alter or drop databases cannot be executed inside a .NET transaction. Therefore I created a new TransactionScope inside the main one, using the TransactionScopeOption.Suppress option, but then I started getting this weird exception: The transaction associated with the current connection has completed but has not been disposed. The transaction must be disposed before the connection can be used to execute SQL statements. But I did Complete and Dispose all my transaction scopes, so what was going on? Long story short, this is a really confusing message for a simple problem: your transaction probably timed out. Create the transaction scope with a large TimeSpan as the Timeout and it will get through. If you can, use a try/catch/finally block in which you Dispose the transaction (just remember that a using construct is basically a try/finally block). In my case, I was conditionally creating this new TransactionScope, so I wasn't using using. The inner transaction would fail and bleed into the outer one, where the confusing exception was then thrown.
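
As a minimal sketch of the fix, this is roughly what creating the scope with an explicit timeout looks like; the thirty-minute value is arbitrary, and the same constructor overload works for the suppressed inner scope:

using System;
using System.Transactions;

// give the scope a generous explicit timeout so long-running work
// does not silently expire the ambient transaction
using (var scope = new TransactionScope(
    TransactionScopeOption.Required,
    TimeSpan.FromMinutes(30)))
{
    // ... database work, including a nested Suppress scope for the DDL ...
    scope.Complete();
} // Dispose runs here; with try/catch/finally you would Dispose in the finally block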

I have another SQL quirk for you today, particularly in Microsoft's SQL Server. It concerns the AVG function when the SUM of the averaged values would result in an overflow for that type: AVG fails with the same error. Let's imagine for a moment that we didn't have an AVG function. In that case, our code would use SUM and COUNT to average values. For the integer column x, the naive SUM(x)/COUNT(*) would fail when the sum goes over the INT maximum value (I am not sure that SUM should fail either, but that is another discussion). So our solution would be something like CONVERT(INT,SUM(CONVERT(BIGINT,x)))/COUNT(*). Obviously, this is the solution for the AVG issue as well: just average the values converted to the biggest type available. So it seems that in their flagship SQL Server, Microsoft implemented the naive version, hmm... To be fair, this behaviour is documented in the T-SQL function page: If the sum exceeds the maximum value for the data type of the return value an error will be returned.

As I see it, it is a bug, as this should have been handled internally, and one is logged on Microsoft Connect here: AVG function causes Arithmetic overflow error. Read the funny comment from a Microsoft employee who says there is a solution proposed, but they don't have time to implement it in SQL Server 2008, so probably they will do it in a future version. I'm trying this on SQL Server 2014; it's 7 years later, Microsoft!!

Just for kicks, I tried the same in MySQL and it works there. More than that, it doesn't fail for SUM, either, as it automatically returns a BIGINT value, pushing the issue to an eventual insert or update in an INT column.

I am relatively new to the entire NuGet ecosystem. What I expected was for things to just work. You know... Microsoft. However, the web of interdependencies seems to be too much even for them. The problems that appear when updating MVC versions, .NET Framework versions, etc., are as annoying as they are unclear. One example: I was trying to publish a project that worked perfectly on my system. I moved it to the server machine and weird things began to happen. The most annoying of them all is that the errors occur at runtime instead of at compile time. Such an error was "Could not load file or assembly System.Web.WebPages, Version=3.0.0.0...". The project reference for WebPages was 2.0.0.0. If I removed it and tried to add a reference, only versions 1.0.0.0 and 2.0.0.0 were available. Meanwhile, Razor was complaining that it didn't find the 3.0.0.0 version of WebPages.

Long story short: don't try to resolve only the library dependencies, but also the framework dependencies. In this case, System.Web.WebPages 3.0.0.0 is only available for the .NET framework 4.5.1. The project was configured as 4.5. Updating the MVC framework after the change to .NET 4.5.1 solved it.

Steps:
  • Change the project's target framework to 4.5.1 (or whatever the newest usable .NET Framework version is)
  • Go to the NuGet Package Manager Console in Visual Studio
  • Run the command Update-Package -reinstall Microsoft.AspNet.Mvc

This, of course, is not a panacea for all problems, but just remember that the .NET framework is important in these scenarios.

I met this situation where I wanted to implement an HTTP interceptor, a piece of JavaScript code that would do something on unauthorized access to my API. The way to do this is irrelevant (and different for jQuery and AngularJS sites), but there is a problem that affects every situation, and that is when you access the API from a different domain than the API's. You see, the browser needs the API server to authorize CORS for every Ajax request that accesses that server from another domain. You might think you did that already in your API, but let me ask you: when there is a problem, like unauthorized access, are you sending CORS headers with your error response? Because if you do not, everything you send, including the HTTP status code, will not be parsed by the browser and any interception will just show a code of 0. The situation is a little confounded by the fact that the browser does announce that you have a CORS access problem, but also displays the status message, which in this case would be "Unauthorized access". This might make you think you can access that message or the status code. Well, you cannot.

The solution is to send the CORS headers: Access-Control-Allow-Origin, Access-Control-Allow-Methods, Access-Control-Allow-Headers. Just set their value to "*" and send them with your error response and now your JavaScript can read the error code and message.
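
As a sketch of the server side, assuming an ASP.NET Web API backend, a delegating handler can make sure the headers are present even on error responses. The handler name is mine; you would register it through config.MessageHandlers:

using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// hypothetical handler that adds CORS headers to every response, including errors
public class AlwaysCorsHandler : DelegatingHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = await base.SendAsync(request, cancellationToken);
        response.Headers.Add("Access-Control-Allow-Origin", "*");
        response.Headers.Add("Access-Control-Allow-Methods", "*");
        response.Headers.Add("Access-Control-Allow-Headers", "*");
        return response;
    }
}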

I was using a JavaScript function that I wanted to accept either an array of arguments or just a bunch of arguments that would be interpreted as an array. As you probably know, inside every JS function you get a variable called arguments which contains the arguments of the call. It is a pseudo-array, not a real array, and has some extra properties. My code looked like this:
function x(arg) {
    if (!arg) {
        arg = [];
    } else if (!arg.length) {
        arg = [];
        for (var i = 0; i < arguments.length; i++) arg.push(arguments[i]);
    }
    // do something with arg
}

The logic is simple: if there is no first parameter, arg becomes an empty array; else, if there is a first argument but it doesn't have a length property (so it's not an array), set arg to an array and push all the arguments of the function as items of that array. But it doesn't work! The point is this: in non-strict mode the named parameters are linked to the arguments object, so the moment you set arg to an empty array, arguments[0] is no longer the original argument, but the empty array. Even worse, the code then adds the array as an item of itself, which makes the object infinitely recursive.

Let's make this simpler:
function x(arg) {
    arg = [];
    console.log(arguments[0]);
}
After you execute x(something) with at least one argument, the console will show an empty array, not the original argument. Weird, huh?

Sometimes you want to run your browser without some protections that are enabled by default. One of these is the cross-origin protection when running a script from the local filesystem. For Chrome, the solution is to run the browser with the command line switch --allow-file-access-from-files. Seems straightforward enough, but if your browser is already open (usually I have at least 10 active tabs, ranging from documentation pages and email to the music I listen to), the command line switches will be ignored and your page will just open as another window in the same instance of Chrome. In order to fix this, you need to use another switch called --user-data-dir. Just make sure this folder exists and can be deleted (because it will be filled with a zillion useless files).

So, how do I run a test.html file that can access files and is located in C:\SomePath? Like this:
start "Chrome with access to files" chrome --allow-file-access-from-files "file:///C:\SomePath\test.html" --user-data-dir="C:\SomePath\UserDir"


A UserDir folder will be created in your path, which you can delete after you finish your work.

Of course, this issue applies to any other flags that you want to use ad-hoc while Chrome is already open.

I was trying to write a simple query in MySQL that would only insert a row if there wasn't one already there, else update the existing one. MySQL has a system that takes care of that for you, provided you have a primary key or a unique index on the table you want to insert into or update: you can choose between INSERT ... ON DUPLICATE KEY UPDATE and REPLACE. However, what if you don't want to add an index on your table just for that?

T-SQL (Microsoft SQL Server) has a syntax that works like this:
IF EXISTS(SELECT * FROM MyTable WHERE y=2)
BEGIN
    UPDATE MyTable SET x=x+1 WHERE y=2;
END
ELSE
BEGIN
    INSERT INTO MyTable (x,y) VALUES(1,2);
END

MySQL also has an IF...THEN syntax, as well as an EXISTS statement. The problem is that they work in a completely different way. IF has an extra THEN keyword, uses ELSEIF instead of ELSE IF and needs to end with END IF. EXISTS works only in WHERE clauses. So, let's translate the query above into MySQL and hope for the best:
INSERT INTO MyTable (x,y) 
SELECT 0,2 FROM DUAL
WHERE NOT EXISTS(SELECT * FROM MyTable WHERE y=2);
UPDATE MyTable SET x=x+1 WHERE y=2;

Notice several things: I am not using any IF and I am updating rows all the time after the conditional insert. I am selecting values from DUAL because any select with a where clause needs a table in MySQL and DUAL is a dummy one built into the engine. (SELECT 1 WHERE 2=2; is not valid in MySQL). Also, I am inserting a value, then updating it, which is not the most efficient.

There is another way, closer to the original T-SQL query, but it doesn't use EXISTS. It looks like this:
IF (SELECT 1=1 FROM MyTable WHERE y=2) THEN
    UPDATE MyTable SET x=x+1 WHERE y=2;
ELSE
    INSERT INTO MyTable (x,y) VALUES(1,2);
END IF;

Things to notice here: I am using 1=1 to return a TRUE value that will make the IF work. Also, PAY ATTENTION: this doesn't work as a simple query. I spent the better part of an hour trying to understand where the syntax error was while trying to execute this directly. Flow operations like IF, WHILE, etc., are only valid in "programs", the MySQL term for stored procedures and functions.

I hope I clarified things for you on this.