Right now there is a battle raging that few of us are aware of. It is about one thing only: control over the Internet, control over communications between people, whether it is a discussion about a two-tiered Internet, one free and one paid, or a ban instituted by one government or another on sites that are considered bad for you. It started as it usually does, with governments and corporations trying to get as much of the pie as possible. Only something was different: the Internet is so basic, so flexible, that the companies regulating its use and owning the hardware it runs on cannot control its flow or its direction. And as great strides have been made by intelligence and commercial entities alike to control the content and to track its use, equally great strides have been made by individuals to conceal their use and escape monitoring and censorship. The biggest and most touted mechanism that allows anonymity on the Internet is called TOR, The Onion Router, and its concept is simple: encrypt all communications and randomly route requests through TOR nodes so that the origin of the access is next to impossible to find. There are other, less known methods of doing this, but TOR is the most used and the best known. It is mostly used as a proxy to anonymize normal Internet access, though, and very few people actually use TOR to access only TOR services.
I am here to tell you, first, that TOR is not enough and, second, that no other software will ever be enough for this kind of use. You see, the TOR nodes I was talking about are people running TOR on their computers and allowing other people to access the "normal" Internet through them. Many of the TOR exit nodes, which form the border between the anonymous TOR world and the transparent Internet, are actually heavily monitored by everyone interested, if not run by them from the beginning. As in an old case where the FBI was running an IP anonymizing proxy, those exit points are the weak spot of the TOR network. Another flaw is the fact that it works as a proxy for normal IP protocols. Some software (Bittorrent, for example) openly sends the originating IP in its data, so it doesn't matter if you go through TOR to download stuff: your IP is still there for the world to see. Since you cannot trust all the software that runs on your computer, you cannot completely trust TOR as a proxy for anonymous Internet access.
The solution, I believe, is to implement the anonymizing and encryption features in the Internet itself. Make it so that there is no address for any of its users or, if there is one, it is something temporary, assigned for a single connection, that can easily be recreated and changed. Do it in such a manner that no one will be able to control the DNS servers and the naming schemes, so that you can call your web site whatever you want, not have to pay for it, and be able to host it without broadcasting to the world where you are. The problems in implementing this are major, but not insurmountable. One of them is that encryption and complicated routing significantly increase access time. However, given the speed of the Internet today, that is not really a big problem anymore.
My thesis is that if freedom of speech, true freedom of speech, is implemented in a technical way, unbiased by any rule other than that you are free to communicate without fear, then no amount of intimidation will be able to break it. As always when human politics have encroached on the territory of personal freedom, the solution is usually technical, at least since Gutenberg made his printing press and probably long before that.
I am not skilled enough myself to think through all the aspects of such a new protocol for the Internet. I am also pretty sure that the opposition to any attempt to do it will be huge. But what if we, technical people, get together and make this work? Borrowing parts from the enormously successful TOR, Bittorrent and Bitcoin, we can architect freedom rather than just talk about it in the context of one war or another. Think about it the next time you, in your free country, get arrested for saying what you believe in or sharing what you know, or try to access a site and find that it is not there anymore, not for you at least.
Last year there were three very good US political shows: Homeland, of course; The Americans, whose main protagonists are two KGB agents pretending to be US citizens; and The Assets, which was not that good, but was about Aldrich Ames, the infamous CIA officer who sold secrets to the Russians. All of these shows presented various intelligence services doing their best, in good conscience, to further the interests of their countries. Motivated people, some you might love, some you might hate, but all doing things for the right reasons. Unfortunately, this year seems to be plagued by disgustingly propagandistic shows like Madam Secretary and State of Affairs, bent on showing the US as spotless white knights and their enemies as faceless evil villains.
Both seemingly want to put forward strong female characters with real power and responsibility, but they do it in a sledgehammer way that makes me cringe. Madam Secretary is about a woman, ex-CIA and now a political university intellectual, who becomes the US Secretary of State after an unforeseen accident befalls the previous secretary. The lead is played by the beautiful Téa Leoni, and the show presents the entire US administration as a bunch of do-gooders, plagued by the moral consequences of using their power and often having to sidestep their conscience in order to save the world from the bogeyman. Not only a powerful woman at work, she is also a mother, having to take care of her daughter and solve family problems together with her teacher husband. The entire image portrayed by the series is so artificial that you can actually see the pink dripping from where it was forcefully painted over all the black bits.
State of Affairs just started. Its lead is Grey's Anatomy star Katherine Heigl, also a CIA woman, with a direct professional and personal connection to the US president, who is a black woman. Her fiancé, none other than the president's son, was killed right in front of her by terrorists. In the first three episodes she has to make decisions to thwart Arab terrorists abducting American doctors and threatening them with decapitation, Russian submarines stealing American secrets by tapping undersea fiber optic cables, and Boko Haram terrorists kidnapping schoolgirls. Meanwhile, she is torn by the fact that the man who killed her fiancé was a former asset of hers. She doesn't tell the president because... she would break her oaths to the CIA. The show has some darkness in it, but it is also artificial, as some hidden entity has proof that would incriminate her and a shady character who might be good or bad or both is circling like a vulture. Soon enough we'll discover her father is somehow involved and her fiancé is not dead and he is actually her brother or something.
In fairness, even after exhausting the original material, Homeland is not necessarily worse. There is also another show, called The Honourable Woman, that I cannot yet say whether I like or not. Great cast: Maggie Gyllenhaal, Stephen Rea (who is not a werewolf in this one... yet :D ) and others. It is a US/UK coproduction, really atmospheric, really dark, but also a bit slow and obtuse. The previous shows may have affected my brain, so I can't yet appreciate the value of this one. It seems to mix real politics with some really heavy stuff like assassinations, the economic battlefront between Israel and Palestine, arms smuggling, etc.
I wrote this post because sometimes I feel like media shows follow the politics of the moment too closely, so closely in fact that many a time they seem to be slightly ahead of it. The bad guys are entities that are not yet enemies of the US or that behave in worse ways than their real-life counterparts; the people in charge often have to bend the rules in order to remain the good guys, even if officially, in reality, those people have not (yet) bent those rules, etc. Unlike war movies that try to erase or at least erode the moral debt of nations and people involved in past wars, I now feel there are films and series that inject the apology before the act has even been committed. At a time when the US intelligence apparatus is being attacked from all sides by news reports of its activities, here they are, all these stories about the good ole CIA, run by intellectual and beautiful women of power who maintain their morality like the good mothers of the nation that they are. Am I paranoid here?
The algorithm works perfectly well and is better than Sift3; however, it's slightly more complex. You might want to start with Sift3 in order to understand where it came from.
Update November 8 2022: I found a bug in the algorithm, related to maxDistance, and I've updated the code. If you didn't use maxDistance, you are unaffected. Basically, the fix is to compare temporaryDistance > maxDistance (before, it was >=) and to move the calculation of the temporary distance after c1 and c2 are updated to their minimum value when a token was not found (otherwise the temporary distance might become larger than the final distance).
Update 28 Mar 2015: I've changed the algorithm significantly. Transpositions are now computed differently, and the cost of a transposition in the final result is 1 rather than 0.5. Also, while I think a value of 1 is better conceptually, I noticed that Sift4 approximates Levenshtein a little better when the cost of a transposition is either 2 or a function depending on the offset difference between c2 and c1, especially when maxOffset grows. This can now be changed via the new transpositionCostEvaluator option function. The problem I am having now is more false positives when the letters/tokens of the two strings are the same but their positions are jumbled differently. With small maxOffset values, like 5 or 10, the result is much better than Sift3; however, when maxOffset grows, lots of matches can be found and the cost of transpositions becomes very important.
Update 27 Mar 2015: Thanks to Emanuele Bastianelli, who discovered a bug that appeared in an edge case, I've updated the algorithms. Now, at the end of the while loop there is an extra check to prevent the algorithm from exiting prematurely, before computing the remaining tokens.
Intro
A really long time ago I wrote the third version of Sift, the string distance algorithm. It so happens that I am going to give a small presentation about this algorithm here in Ispra, so I had the opportunity to review it. I found some inconsistencies and I actually did some research in the field that gave me more ideas. So before giving the presentation I thought of publishing what I think is the fourth version. What's new:
33% more accurate
three different variants: simple, common and general
new concepts added
support for custom value and matching functions, different tokenizer functions, etc.
actually tested with a (slightly more) serious test
more robust, working better for large values of maxOffset
Before I get into the details: I am publishing the algorithm only here for the moment, not on Codeplex or PasteBin or GitHub or whatever. Also, it is written in Javascript now, with the C# and T-SQL versions pending. Of course, it would be great if, as before, the community of people using the algorithm went on to implement it in various programming languages; however, I am a bit apprehensive, because more often than not people came up with their own improvements or interpretations when translating the algorithm into another language. But support is always welcome!
New concepts in Sift4
I created a test that used random strings, but also a huge list of commonly used English phrases, as well as mutations of these strings, adding or removing small bits and so on. I then implemented Sift3, Levenshtein and the new algorithm and computed the error distance between the Levenshtein distance and the two Sift variants. This allowed me to see how the error evolves when changing the algorithm and the parameters. One thing I noticed is that when increasing maxOffset to large values like 15 or 20, the accuracy of Sift3 went down. Also, as pointed out by one commenter on the Sift3 post, there are cases when Sift3(a,b) is different from Sift3(b,a). These are edge cases, but this one in particular grated on me.
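The test harness itself is not published here. As a rough sketch, assuming the "error" is the mean absolute difference between an approximate distance and the Levenshtein distance over a list of string pairs (meanAbsoluteError is a hypothetical helper name), the reference side of such a test could look like this:

```javascript
// Classic dynamic-programming Levenshtein distance:
// insertions, deletions and substitutions all cost 1.
// Only two rows are kept, so memory is O(b.length).
function levenshtein(a, b) {
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;
  let prev = new Array(b.length + 1);
  let curr = new Array(b.length + 1);
  for (let j = 0; j <= b.length; j++) prev[j] = j;
  for (let i = 1; i <= a.length; i++) {
    curr[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    [prev, curr] = [curr, prev]; // the row just computed becomes "prev"
  }
  return prev[b.length];
}

// Hypothetical error measure: mean absolute difference between an
// approximate distance function and Levenshtein over [s1, s2] pairs.
function meanAbsoluteError(approxDistance, pairs) {
  if (pairs.length === 0) return 0;
  let total = 0;
  for (const [s1, s2] of pairs) {
    total += Math.abs(approxDistance(s1, s2) - levenshtein(s1, s2));
  }
  return total / pairs.length;
}
```

Passing, for example, `(a, b) => sift4(a, b, 5)` as approxDistance would give a single number per maxOffset value, which is one way to watch the error evolve as parameters change.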
After implementing Sift4, I can now tell you that the simple version is slightly better than Sift3 for small maxOffset values like 5, and it gets better as the value increases. The common version is a bit more complex, but the error decreases by 33% and stays low for large maxOffset values. The extended or general version receives an options object that can change almost everything, the most important being the tokenizer function. Imagine that you want to compute the distance based not on letters, but on n-grams (groups of n characters). Or that you want to compare texts by their words, maybe even by their synonyms. All of this can be achieved just by changing the tokenizer function. The other parameters define what it means for two tokens to match, what the value of their match is, etc.
One of the new concepts implemented is taken from the Jaro distance. Jaro seems a lot like Sift in that it considers two characters to match if they are in close proximity. Also, if "the streams cross", like 'ab' vs 'ba', one considers them transpositions and removes some of their value from the distance. Actually, looking at the implementation, it might be that I independently discovered the Jaro distance. I will research this further. I don't know if the transposition calculation is optimal. At the moment it uses an array of all matches found up to a point, clearing values from it as the cursors move along the string. The difference between the simple and the common versions of Sift4 is that the simple version does not compute transpositions at all and has no concept of maxDistance. In that respect it is a slightly fixed-up Sift3.
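For comparison, here is a sketch of the standard Jaro similarity. It is a different algorithm from Sift4, included only to show the two ideas mentioned above: a matching window (characters match if they are equal and at most floor(max(len1, len2) / 2) - 1 positions apart) and transpositions (matched characters that appear in a different order). It returns a similarity between 0 and 1, not a distance.

```javascript
// Jaro similarity (not Sift4). 1 means identical, 0 means no matches.
function jaro(s1, s2) {
  if (s1.length === 0 && s2.length === 0) return 1;
  if (s1.length === 0 || s2.length === 0) return 0;
  const matchWindow = Math.max(0, Math.floor(Math.max(s1.length, s2.length) / 2) - 1);
  const matched1 = new Array(s1.length).fill(false);
  const matched2 = new Array(s2.length).fill(false);
  let matches = 0;
  // Each character of s1 takes the first unused equal character of s2 inside the window.
  for (let i = 0; i < s1.length; i++) {
    const start = Math.max(0, i - matchWindow);
    const end = Math.min(s2.length - 1, i + matchWindow);
    for (let j = start; j <= end; j++) {
      if (!matched2[j] && s1[i] === s2[j]) {
        matched1[i] = true;
        matched2[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Walk the matched characters of both strings in order; every position
  // where they differ is half a transposition.
  let halfTranspositions = 0;
  let k = 0;
  for (let i = 0; i < s1.length; i++) {
    if (!matched1[i]) continue;
    while (!matched2[k]) k++;
    if (s1[i] !== s2[k]) halfTranspositions++;
    k++;
  }
  const t = halfTranspositions / 2;
  return (matches / s1.length + matches / s2.length + (matches - t) / matches) / 3;
}
```

The textbook example, 'MARTHA' vs 'MARHTA', has 6 matches and one transposition (T and H), giving (1 + 1 + 5/6) / 3 = 17/18, about 0.944.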
Another new concept added is that of the local substring. Imagine that the largest common subsequence that Sift is actually trying to find in order to determine the distance is made of substrings, separated by non-matching characters. Each of these substrings can be used to improve the distance function. For example, one could argue that 'abcdex' is closer to 'abcde' than 'abcxde' is, because even if the largest common subsequence is 5 in both cases, the largest common substring is 5 for the first string and only 3 for the second. The extended version of the algorithm allows for changing the value of each substring individually.
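To make the 'abcdex' / 'abcxde' example concrete, here is a small sketch that computes both quantities exactly with textbook dynamic programming (not with Sift4): the longest common subsequence allows gaps, the longest common substring does not.

```javascript
// Longest common subsequence length: characters in order, gaps allowed.
function lcsLength(a, b) {
  const dp = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = a[i - 1] === b[j - 1]
        ? dp[i - 1][j - 1] + 1
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  return dp[a.length][b.length];
}

// Longest common substring length: one contiguous run, no gaps.
function longestCommonSubstringLength(a, b) {
  let best = 0;
  const dp = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      if (a[i - 1] === b[j - 1]) {
        dp[i][j] = dp[i - 1][j - 1] + 1; // extend the run ending at (i-1, j-1)
        best = Math.max(best, dp[i][j]);
      }
    }
  }
  return best;
}

console.log(lcsLength("abcdex", "abcde"), longestCommonSubstringLength("abcdex", "abcde")); // 5 5
console.log(lcsLength("abcxde", "abcde"), longestCommonSubstringLength("abcxde", "abcde")); // 5 3
```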
Well, here they are, the three versions. The extended version has some examples at the end for possible parameters.
The code
Simplest Sift4:
// Sift4 - simplest version
// online algorithm to compute the distance between two strings in O(n)
// maxOffset is the number of characters to search for matching letters
function sift4(s1, s2, maxOffset) {
    if (!s1 || !s1.length) {
        if (!s2) {
            return 0;
        }
        return s2.length;
    }
    if (!s2 || !s2.length) {
        return s1.length;
    }
    var l1 = s1.length;
    var l2 = s2.length;
    var c1 = 0; //cursor for string 1
    var c2 = 0; //cursor for string 2
    var lcss = 0; //largest common subsequence
    var local_cs = 0; //local common substring
    while ((c1 < l1) && (c2 < l2)) {
        if (s1.charAt(c1) == s2.charAt(c2)) {
            local_cs++;
        } else {
            lcss += local_cs;
            local_cs = 0;
            if (c1 != c2) {
                c1 = c2 = Math.max(c1, c2); //using max to bypass the need for computing transpositions ('ab' vs 'ba')
            }
            for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                if ((c1 + i < l1) && (s1.charAt(c1 + i) == s2.charAt(c2))) {
                    c1 += i;
                    local_cs++;
                    break;
                }
                if ((c2 + i < l2) && (s1.charAt(c1) == s2.charAt(c2 + i))) {
                    c2 += i;
                    local_cs++;
                    break;
                }
            }
        }
        c1++;
        c2++;
    }
    lcss += local_cs;
    return Math.round(Math.max(l1, l2) - lcss);
}
Common Sift4:
// Sift4 - common version
// online algorithm to compute the distance between two strings in O(n)
// maxOffset is the number of characters to search for matching letters
// maxDistance is the distance at which the algorithm should stop computing the value and just exit (the strings are too different anyway)
function sift4(s1, s2, maxOffset, maxDistance) {
    if (!s1 || !s1.length) {
        if (!s2) {
            return 0;
        }
        return s2.length;
    }
    if (!s2 || !s2.length) {
        return s1.length;
    }
    var l1 = s1.length;
    var l2 = s2.length;
    var c1 = 0; //cursor for string 1
    var c2 = 0; //cursor for string 2
    var lcss = 0; //largest common subsequence
    var local_cs = 0; //local common substring
    var trans = 0; //number of transpositions ('ab' vs 'ba')
    var offset_arr = []; //offset pair array, for computing the transpositions
    while ((c1 < l1) && (c2 < l2)) {
        if (s1.charAt(c1) == s2.charAt(c2)) {
            local_cs++;
            var isTrans = false;
            //see if current match is a transposition
            var i = 0;
            while (i < offset_arr.length) {
                var ofs = offset_arr[i];
                if (c1 <= ofs.c1 || c2 <= ofs.c2) {
                    // when two matches cross, the one considered a transposition is the one with the largest difference in offsets
                    isTrans = Math.abs(c2 - c1) >= Math.abs(ofs.c2 - ofs.c1);
                    if (isTrans) {
                        trans++;
                    } else {
                        if (!ofs.trans) {
                            ofs.trans = true;
                            trans++;
                        }
                    }
                    break;
                } else {
                    if (c1 > ofs.c2 && c2 > ofs.c1) {
                        offset_arr.splice(i, 1);
                    } else {
                        i++;
                    }
                }
            }
            offset_arr.push({
                c1: c1,
                c2: c2,
                trans: isTrans
            });
        } else {
            lcss += local_cs;
            local_cs = 0;
            if (c1 != c2) {
                c1 = c2 = Math.min(c1, c2); //using min allows the computation of transpositions
            }
            if (maxDistance) {
                var temporaryDistance = Math.max(c1, c2) - lcss + trans;
                if (temporaryDistance > maxDistance)
                    return temporaryDistance;
            }
            //if matching characters are found, remove 1 from both cursors (they get incremented at the end of the loop)
            //so that we can have only one code block handling matches
            for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                if ((c1 + i < l1) && (s1.charAt(c1 + i) == s2.charAt(c2))) {
                    c1 += i - 1;
                    c2--;
                    break;
                }
                if ((c2 + i < l2) && (s1.charAt(c1) == s2.charAt(c2 + i))) {
                    c1--;
                    c2 += i - 1;
                    break;
                }
            }
        }
        c1++;
        c2++;
        // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
        if ((c1 >= l1) || (c2 >= l2)) {
            lcss += local_cs;
            local_cs = 0;
            c1 = c2 = Math.min(c1, c2);
        }
    }
    lcss += local_cs;
    return Math.max(l1, l2) - lcss + trans; //add the cost of transpositions to the final result
}
Extended/General Sift4:
// Sift4 - extended version
// online algorithm to compute the distance between two strings in O(n)
// maxOffset is the number of positions to search for matching tokens
// options: the options for the function, allowing for customization of the scope and algorithm:
//     maxDistance: the distance at which the algorithm should stop computing the value and just exit (the strings are too different anyway)
//     tokenizer: a function to transform strings into vectors of tokens
//     tokenMatcher: a function to determine if two tokens are matching (equal)
//     matchingEvaluator: a function to determine the way a token match should be added to the local_cs. For example a fuzzy match could be implemented.
//     localLengthEvaluator: a function to determine the way the local_cs value is added to the lcss. For example longer continuous substrings could be rewarded.
//     transpositionCostEvaluator: a function to determine the value of an individual transposition. For example longer transpositions should have a higher cost.
//     transpositionsEvaluator: a function to determine the way the total cost of transpositions affects the final result
// the options can and should be implemented at a class level, but this is the demo algorithm
function sift4(s1, s2, maxOffset, options) {
    options = extend(options, {
        maxDistance: null,
        tokenizer: function (s) {
            return s ? s.split('') : [];
        },
        tokenMatcher: function (t1, t2) {
            return t1 == t2;
        },
        matchingEvaluator: function (t1, t2) {
            return 1;
        },
        localLengthEvaluator: function (local_cs) {
            return local_cs;
        },
        transpositionCostEvaluator: function (c1, c2) {
            return 1;
        },
        transpositionsEvaluator: function (lcss, trans) {
            return lcss - trans;
        }
    });
    var t1 = options.tokenizer(s1);
    var t2 = options.tokenizer(s2);
    var l1 = t1.length;
    var l2 = t2.length;
    if (l1 == 0)
        return l2;
    if (l2 == 0)
        return l1;
    var c1 = 0; //cursor for string 1
    var c2 = 0; //cursor for string 2
    var lcss = 0; //largest common subsequence
    var local_cs = 0; //local common substring
    var trans = 0; //number of transpositions ('ab' vs 'ba')
    var offset_arr = []; //offset pair array, for computing the transpositions
    while ((c1 < l1) && (c2 < l2)) {
        if (options.tokenMatcher(t1[c1], t2[c2])) {
            local_cs += options.matchingEvaluator(t1[c1], t2[c2]);
            var isTrans = false;
            //see if current match is a transposition
            var i = 0;
            while (i < offset_arr.length) {
                var ofs = offset_arr[i];
                if (c1 <= ofs.c1 || c2 <= ofs.c2) {
                    // when two matches cross, the one considered a transposition is the one with the largest difference in offsets
                    isTrans = Math.abs(c2 - c1) >= Math.abs(ofs.c2 - ofs.c1);
                    if (isTrans) {
                        trans += options.transpositionCostEvaluator(c1, c2);
                    } else {
                        if (!ofs.trans) {
                            ofs.trans = true;
                            trans += options.transpositionCostEvaluator(ofs.c1, ofs.c2);
                        }
                    }
                    break;
                } else {
                    if (c1 > ofs.c2 && c2 > ofs.c1) {
                        offset_arr.splice(i, 1);
                    } else {
                        i++;
                    }
                }
            }
            offset_arr.push({
                c1: c1,
                c2: c2,
                trans: isTrans
            });
        } else {
            lcss += options.localLengthEvaluator(local_cs);
            local_cs = 0;
            if (c1 != c2) {
                c1 = c2 = Math.min(c1, c2); //using min allows the computation of transpositions
            }
            if (options.maxDistance) {
                var temporaryDistance = options.localLengthEvaluator(Math.max(c1, c2)) - options.transpositionsEvaluator(lcss, trans);
                if (temporaryDistance > options.maxDistance)
                    return Math.round(temporaryDistance);
            }
            //if matching tokens are found, remove 1 from both cursors (they get incremented at the end of the loop)
            //so that we can have only one code block handling matches
            for (var i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
                if ((c1 + i < l1) && options.tokenMatcher(t1[c1 + i], t2[c2])) {
                    c1 += i - 1;
                    c2--;
                    break;
                }
                if ((c2 + i < l2) && options.tokenMatcher(t1[c1], t2[c2 + i])) {
                    c1--;
                    c2 += i - 1;
                    break;
                }
            }
        }
        c1++;
        c2++;
        // this covers the case where the last match is on the last token in list, so that it can compute transpositions correctly
        if ((c1 >= l1) || (c2 >= l2)) {
            lcss += options.localLengthEvaluator(local_cs);
            local_cs = 0;
            c1 = c2 = Math.min(c1, c2);
        }
    }
    lcss += options.localLengthEvaluator(local_cs);
    return Math.round(options.localLengthEvaluator(Math.max(l1, l2)) - options.transpositionsEvaluator(lcss, trans)); //add the cost of found transpositions
}
function extend(obj, def) {
    var result = {};
    for (var prop in def) {
        if (!obj || !obj.hasOwnProperty(prop)) {
            result[prop] = def[prop];
        } else {
            result[prop] = obj[prop];
        }
    }
    return result;
}
// possible values for the options
// tokenizers:
function nGramTokenizer(s, n) { //tokenizer:function(s) { return nGramTokenizer(s,2); }
    var result = [];
    if (!s)
        return result;
    for (var i = 0; i <= s.length - n; i++) {
        result.push(s.substring(i, i + n));
    }
    return result;
}
function wordSplitTokenizer(s) { //tokenizer:wordSplitTokenizer
    if (!s)
        return [];
    return s.split(/\s+/);
}
function characterFrequencyTokenizer(s) { //tokenizer:characterFrequencyTokenizer (letters only)
    var result = [];
    for (var i = 0; i <= 25; i++) {
        var val = 0;
        if (s) {
            for (var j = 0; j < s.length; j++) { // j declared locally (it used to leak into the global scope)
                var code = s.charCodeAt(j);
                if (code == i + 65 || code == i + 97)
                    val++;
            }
        }
        result.push(val);
    }
    return result;
}
//tokenMatchers:
function sift4TokenMatcher(t1, t2) { //tokenMatcher:sift4TokenMatcher
    var similarity = 1 - sift4(t1, t2, 5) / Math.max(t1.length, t2.length);
    return similarity > 0.7;
}
//matchingEvaluators:
function sift4MatchingEvaluator(t1, t2) { //matchingEvaluator:sift4MatchingEvaluator
    var similarity = 1 - sift4(t1, t2, 5) / Math.max(t1.length, t2.length);
    return similarity;
}
//localLengthEvaluators:
function rewardLengthEvaluator(l) {
    if (l < 1)
        return l; //0 -> 0
    return l - 1 / (l + 1); //1 -> 0.5, 2 -> 1.67, 9 -> 8.9
}
function rewardLengthEvaluator2(l) {
    return Math.pow(l, 1.5); // 0 -> 0, 1 -> 1, 2 -> 2.83, 10 -> 31.62
}
//transpositionCostEvaluators:
function longerTranspositionsAreMoreCostly(c1, c2) {
    return Math.abs(c2 - c1) / 9 + 1;
}
As always, I will be most happy to know if you used my algorithm and how it performed, as well as receive any suggestion that you might have.
Options explained
Here is some explanation for the options of the general algorithm.
The general algorithm no longer searches for characters, but for tokens. That is why the default tokenizer function splits the values into characters, so that the algorithm works on an array of one-character tokens. Other options are possible, like splitting the strings by whitespace so that the comparisons are done on words, or transforming a string into an array of strings N characters long, the so-called N-grams. The tokenizer can be anything, like the characterFrequencyTokenizer, which turns each word into an array of 26 values, one for each letter a-z, counting how many times that letter occurs in the word.
The tokenMatcher function returns true if two tokens match. They can be fuzzy matched: for example, the sift4TokenMatcher example function uses Sift inside Sift to determine the character distance between two tokens and returns true if they are more than 70% similar.
The matchingEvaluator is a function that returns the value that will be added to the "common substring" length value when two tokens match. The default is 1, but one can use some other metric, like the similarity. Of course, the common substring length loses its meaning when these functions change, but the variable local_cs is still used.
The localLengthEvaluator takes the length value of the local common substring and returns a value that will be added to the longest common subsequence value. Usually it returns the same value as the one provided, but some functions could reward longer substrings.
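A minimal sketch of why the evaluator matters, assuming the matches from the earlier 'abcdex' / 'abcxde' example fall into local substrings of lengths [5] and [3, 2] respectively (accumulate is a hypothetical helper that mimics how local_cs values are added to lcss):

```javascript
// Default behaviour: every matched token counts once.
const identityEvaluator = (len) => len;
// Same curve as rewardLengthEvaluator2 in the extended version:
// longer runs are worth disproportionately more.
const powerEvaluator = (len) => Math.pow(len, 1.5);

// Hypothetical helper: add up the evaluated lengths of several local substrings.
function accumulate(runs, evaluator) {
  return runs.reduce((total, len) => total + evaluator(len), 0);
}

// One run of 5 vs runs of 3 and 2.
console.log(accumulate([5], identityEvaluator), accumulate([3, 2], identityEvaluator)); // 5 5
console.log(accumulate([5], powerEvaluator).toFixed(2), accumulate([3, 2], powerEvaluator).toFixed(2)); // 11.18 8.02
```

With the identity evaluator both cases tie; with the power curve the single long run produces a larger lcss and therefore a smaller distance.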
FAQ
Q: Can you make Sift4 case insensitive? A: Just turn the strings to lower or upper case before you compare them. Since this algorithm is more general, the concept of 'case' might not apply. Or implement a case-insensitive tokenMatcher.
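A possible case-insensitive tokenMatcher for the extended version, as a sketch (the function name is hypothetical). Non-string tokens, such as the counts produced by characterFrequencyTokenizer, fall back to plain equality:

```javascript
// Hypothetical tokenMatcher: string tokens are equal if they are equal
// ignoring case; any other token type uses strict equality.
function caseInsensitiveTokenMatcher(t1, t2) {
  if (typeof t1 === "string" && typeof t2 === "string") {
    return t1.toLowerCase() === t2.toLowerCase();
  }
  return t1 === t2;
}
```

It would be passed as `sift4("Hello", "hELLO", 5, { tokenMatcher: caseInsensitiveTokenMatcher })` with the extended version, where every character pair then counts as a match and the distance should come out as 0.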
Q: Can you make Sift4 compare strings based on their meaning, like using synonyms? A: Use a tokenizer function that splits the strings into words, then replaces them with the most used of their synonyms. A more complex solution would require analyzing the strings beforehand and turning them into some ordered sequence of synonyms or equivalent expressions, then using Sift4 with a word tokenizer (one is provided in the Extended algorithm source).
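A sketch of the simple approach, assuming a tiny hand-made synonym table (SYNONYMS and synonymTokenizer are illustrative names; a real application would load the table from a thesaurus). A Map is used so that words like "constructor" are not accidentally looked up on Object.prototype:

```javascript
// Hypothetical synonym table: each word maps to one canonical representative.
const SYNONYMS = new Map([
  ["big", "large"],
  ["huge", "large"],
  ["quick", "fast"],
  ["rapid", "fast"],
]);

// Split on whitespace (like wordSplitTokenizer), lowercase each word and
// replace it with its canonical synonym when the table knows one.
function synonymTokenizer(s) {
  if (!s || !s.trim()) return [];
  return s
    .trim()
    .split(/\s+/)
    .map((word) => word.toLowerCase())
    .map((word) => (SYNONYMS.has(word) ? SYNONYMS.get(word) : word));
}
```

With `{ tokenizer: synonymTokenizer }` in the extended options, "huge rapid fox" and "big quick fox" both become the tokens ["large", "fast", "fox"].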
Q: I need an implementation for this programming language, can you help? A: I can, but I might not have the time. Ask anyway, maybe I can be persuaded :)
Q: I have been using Sift3 until now, how do I upgrade to Sift4? A: The best way I can think of is to implement Sift4 Simplest, as it needs only the Sift3 code and some minor changes. Since you never needed tokens before, I doubt you need them now. But if you do, I can help, see the above question.
Q: How can I reward you for this fantastic piece of software engineering? A: While I did this for free and I don't expect to make any money out of it and while this algorithm is completely free to use and change as you see fit, I don't mind having a beer every now and then ;)
Q: Your algorithm really sucks because... reasons. A: It may. I would be glad to discuss the reasons, though, and try to fix any problem you encounter.
Q: I compared Sift4 with another algorithm that is much more exact and there are differences. A: Of course, they are different algorithms. This is a fuzzy distance calculator; it doesn't give you the exact value. There are still edge cases. But the idea of Sift is to be fast and relatively accurate, rather than very accurate. If you need more accuracy, try combining Sift with Levenshtein, for example, computing Levenshtein only where Sift says the strings are above a certain similarity.
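A minimal sketch of that combination. It assumes similarity is defined as 1 - distance / longerLength and uses a hypothetical threshold of 0.5; closestMatches and minSimilarity are illustrative names. sift4Simple is the simplest Sift4 from above, renamed so it does not clash with the other sift4 definitions and using strict equality:

```javascript
// Simplest Sift4 (renamed copy), used as the cheap first pass.
function sift4Simple(s1, s2, maxOffset) {
  if (!s1 || !s1.length) return s2 ? s2.length : 0;
  if (!s2 || !s2.length) return s1.length;
  const l1 = s1.length, l2 = s2.length;
  let c1 = 0, c2 = 0, lcss = 0, localCs = 0;
  while (c1 < l1 && c2 < l2) {
    if (s1.charAt(c1) === s2.charAt(c2)) {
      localCs++;
    } else {
      lcss += localCs;
      localCs = 0;
      if (c1 !== c2) c1 = c2 = Math.max(c1, c2);
      for (let i = 0; i < maxOffset && (c1 + i < l1 || c2 + i < l2); i++) {
        if (c1 + i < l1 && s1.charAt(c1 + i) === s2.charAt(c2)) { c1 += i; localCs++; break; }
        if (c2 + i < l2 && s1.charAt(c1) === s2.charAt(c2 + i)) { c2 += i; localCs++; break; }
      }
    }
    c1++;
    c2++;
  }
  lcss += localCs;
  return Math.max(l1, l2) - lcss;
}

// Exact Levenshtein distance, only computed for candidates that survive the filter.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical two-stage search: Sift4 filters, Levenshtein ranks.
function closestMatches(query, candidates, minSimilarity = 0.5, maxOffset = 5) {
  return candidates
    .filter((c) => {
      const longest = Math.max(query.length, c.length);
      if (longest === 0) return true;
      return 1 - sift4Simple(query, c, maxOffset) / longest >= minSimilarity;
    })
    .map((c) => ({ candidate: c, distance: levenshtein(query, c) }))
    .sort((a, b) => a.distance - b.distance);
}
```

For the query "hello" against ["world", "help", "hello"], "world" is dropped by the Sift4 pass and the remaining two are ranked by their exact Levenshtein distances (0 and 2). How much time this saves depends on how many candidates the threshold removes.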
Q: I want to make maxOffset dependent on the length of the strings compared. Can you do that? A: That is a perfect example of why maxOffset should be a parameter of the function rather than a member of the class. Since this implementation is so far Javascript only, just compute the maxOffset that is convenient for you before you compare.
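One possible rule, as a sketch only: a window proportional to the longer string, clamped to a range. The function name, the 0.2 ratio and the 3..20 bounds are hypothetical values chosen for illustration, not recommendations.

```javascript
// Hypothetical helper: maxOffset grows with the longer string, but stays
// between minOffset and maxCap so short strings still get a usable window
// and long strings don't make the search too expensive.
function adaptiveMaxOffset(s1, s2, ratio = 0.2, minOffset = 3, maxCap = 20) {
  const longest = Math.max(s1 ? s1.length : 0, s2 ? s2.length : 0);
  return Math.min(maxCap, Math.max(minOffset, Math.round(longest * ratio)));
}
```

It would be used as `sift4(s1, s2, adaptiveMaxOffset(s1, s2))`; a 50-character string gets a window of 10, very short strings get 3 and anything over 100 characters is capped at 20.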
Q: I want to vary the weight of matches based on the position of the match, for example matches at the beginning of the string could be more valuable than those at the end. A: The position of the match is indeed not sent to the functions that can be specified in the options object of the Sift4 Extended, but that can be trivially changed in the code. I don't think this particular request is very common, though, and I prefer to keep it out of the published implementation to make the code easier to understand.
Q: I found a bug! A: Let me know and I will try to fix it.
Q: If you need to compare large lists of strings, it is better to precompute some things, like specific hashes or suffix trees, etc. This will speed up the comparison tremendously! A: Sift is what is called an online algorithm. It does not precompute anything, it just gets the two strings and the parameters for its functioning and returns the distance. You are correct in what you are saying, but that kind of solution is not in the scope of Sift, at least not version 4.
Q: What are the edge cases for Sift? A: There are probably several, but I haven't really spotted them. One of them is that both letters at a position might match letters at other positions, but only one will count. Example: 'abxx' and 'bayy'. The algorithm will look at position 0, find no match, then try to find the closest match for each letter. Starting with position 0 in the first string, it will find 'a' matched at position 1 in the second. It will increase both counters, and lcss will be increased as well. The next check will be on 'b', the character at position 1 in the first string, matched against position 2 in the second string. There is no match, therefore both counters will be reset to 1 and the search starts again. The 'b' match is lost and the distance is 3 instead of 2. I also think there might be some situations where the counters are not equal and the larger of them reaches the end of its string, thus terminating the algorithm, even though there could have been more matches. Incidentally, I tried to fix both these issues and the error relative to Levenshtein was not really affected, but I am not 100% sure of the implementation.
Q: The algorithm is still asymmetric: Sift4(s1,s2) can be different from Sift4(s2,s1). A: Yes. This is one of the artifacts of the linear nature of the algorithm. A symmetric alternative is Math.min(Sift4(a,b),Sift4(b,a)), but it is, obviously, twice as slow.
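A small generic wrapper for that, as a sketch (symmetricDistance is a hypothetical name). Extra arguments such as maxOffset or the options object are passed through unchanged, so it should work with any of the three sift4 versions:

```javascript
// Wrap any (possibly asymmetric) distance function so that
// d(a, b) === d(b, a), by taking the smaller of the two directions.
// Each comparison now costs two calls.
function symmetricDistance(distance) {
  return function (a, b, ...rest) {
    return Math.min(distance(a, b, ...rest), distance(b, a, ...rest));
  };
}
```

Usage would be `const symmetricSift4 = symmetricDistance(sift4);` followed by `symmetricSift4(s1, s2, 5)`.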
Implementations in other languages
You can find a Go implementation here, written by Jason W. Hutchinson. There is also a Swift implementation here. A Perl 6 (now called Raku) implementation can be found here.
I had this memory problem that I could not understand. OK, the design of the application was not the best in the world, but why would it always give me a damn OutOfMemoryException when I have a super duper computer with a lot of memory and I had already solved issues like heap memory allocation? The reason is that IIS Express, the default ASP.NET development web server on Windows 7, runs in 32-bit mode, meaning it can address at most 4GB of memory no matter how much RAM you have. Well, you can make IIS Express run in 64-bit mode by simply switching it on in the Windows registry. I don't want to copy the content of someone else's article, so here is a link to how to do it: Debugging VS2013 Websites Using 64-bit IIS Express.
Just in case you want the ultra fast version, copy this into a file with the .reg extension and execute it:
You know how many times you need to get the results of a query as a list of strings, let's say a CSV, and there is nothing in Microsoft SQL Server to help you? What you would like is an aggregate function, similar to SUM, that returns the concatenated value of all the strings in the SELECT query.
From SQL Server 2017 on, there is such a function called STRING_AGG and you use it just like one might expect.
If you have an older version of SQL Server... upgrade :) But if you can't, here is a solution:
Before SQL Server 2005 that was the case: you needed to use variables and SELECT @S=@S+[Value] FROM .... But SQL Server 2005 added more XML support and, with it, the data() XML PATH method. Take notice that this method adds a space between atomic values. So, without further ado, here is the code to concatenate several values into a comma separated value string:
DECLARE @T TABLE(S NVARCHAR(Max))
INSERT INTO @T VALUES ('Filet'),('T-bone'),('Sausage'),('Würstel') -- enough with the fruity examples!

SELECT CONVERT(NVARCHAR(Max), (SELECT S + ',' AS 'data()' FROM @T t FOR XML PATH('')))
Result: Filet, T-bone, Sausage, Würstel, (of course, you still have to remove the trailing comma).
Update: What if I have a table with multiple columns and I just want to aggregate one?
The solution is to use the select with WHERE clauses on the GROUP BY columns. Here is an example:
SELECT col1, col2,
    CONVERT(NVARCHAR(Max), (SELECT value + ',' AS 'data()' FROM @T t2 WHERE t1.col1 = t2.col1 AND t1.col2 = t2.col2 FOR XML PATH('')))
FROM @T t1
GROUP BY col1, col2
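Here is a small Python sketch of the intended result of that GROUP BY query, without the XML tricks: group the rows by (col1, col2) and join each group's values. The function name string_agg_by and the three-column row layout are illustrative assumptions. Unlike the SQL above, it produces no trailing comma and no extra spaces. It also keeps the input order, which the SQL query does not guarantee without an ORDER BY.

```python
from collections import defaultdict


def string_agg_by(rows, separator=","):
    """Group (col1, col2, value) rows and join the values of each group."""
    groups = defaultdict(list)
    for col1, col2, value in rows:
        groups[(col1, col2)].append(value)  # collect values per (col1, col2) key
    return {key: separator.join(values) for key, values in groups.items()}
```

For example, string_agg_by([("a", "x", "1"), ("a", "x", "2"), ("b", "y", "3")]) yields one joined string per group: "1,2" for ("a", "x") and "3" for ("b", "y").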
It was inevitable: both Naruto and Sasuke were getting ridiculously strong. In the end they fought the mother of all chakra and... of course they won; then they fought each other, but it was kind of underwhelming, since their power prevented any subtlety and they just went cowboy, punching each other. The last color chapter is about how they leave it all to the next generation, although it is hard to think of anything more the kids could do to top their parents. I loved the entire series and it is easy to understand why: a simple concept, positive feelings like friendship and camaraderie, and weird magical ninja fights. I was a teen when I started watching the anime and now I am freakishly old. Well, life happens. After I got kind of tired of watching the anime, even if it was really well done and followed the manga faithfully, I switched to reading the manga. I like to use Mangastream for my reading purposes, so you can read the entire thing here: Naruto Shippuden. Even if it appears they are writing some Naruto side stories, I am not sure I will ever read them. I am still looking for a manga that can grab me like Naruto did.
I was watching a video from GM Niclas Huschenbeth in which he played lytura in an online game. Amazingly, he lost, but that is what happens when you underestimate your opponent, which I think is what actually went wrong. At the end of the video it was difficult to see exactly what White could have done after a certain point, so I analysed the game with the computer and found some amazing moves. First I will show you the original game. I urge you to think it through, like a chess puzzle, and see which moves you would have played differently before you look at the game as the computer suggested it. You can also watch the video later in the post and, if you like chess, I really recommend Huschenbeth's channel. Not only is he a great player, but he is also a decent, nice guy, and young, too. His Blitz & Talk GM Special videos are especially cool, since he plays with other world-class grandmasters.
But enough of that. Here is the game, up to a point where it didn't really matter what happened: 1. e4 e5 2. f4 exf4 3. Nf3 g5 4. Nc3 g4 5. Ne5 Qh4+ 6. g3 fxg3 7. Qxg4 g2+ 8. Qxh4 gxh1=Q 9. Qh5 Nh6 10. d3 d6 11. Bxh6 Be6 12. Bxf8 Rxf8 13. Nf3 Nd7 14. O-O-O c6 15. Bh3 Nf6 16. Rxh1 Nxh5 17. Bf1 Rg8 18. Ne2 Kd7 19. Kd2 Rg7 20. Ke3 Rag8 21. a3 Nf6 22. h3 b6 23. d4 a5 24. Nf4 Rg3 25. Ne2
Here is the video of the game, to give you the time to think it through:
And finally, here are two lines that the computer recommended. This is considering that the fateful 10. d3 was played already: 1. e4 e5 2. f4 exf4 3. Nf3 g5 4. Nc3 g4 5. Ne5 Qh4+ 6. g3 fxg3 7. Qxg4 g2+ 8. Qxh4 gxh1=Q 9. Qh5 Nh6 10. d3 d6 11. Bxh6 Be6 12. Bxf8 Rxf8 13. Nf3 Nd7 14. O-O-O c6 15. Nb5 Nf6 (15... cxb5 16. Bh3 Qxd1+ 17. Kxd1 O-O-O 18. Bxe6 fxe6 19. Nd4 {White +1.2}) 16. Nxd6+ Kd7 17. Qh3 Bxh3 18. Bxh3+ Kxd6 19. Rxh1 {Black +0.3}
Did you see those Nb5 and Qh3 moves?! Who does that? :)
The wonder of .NET is that most of the time we don't really have to care about stuff like memory allocation; the framework does everything for you. One especially annoying thing, though, is when you are using a basic data structure that is supposed to be efficient and you get stuff like OutOfMemoryException. One such case is List<T>, which in the background uses one big array. This means two things: certain modifying operations are slow, and the items require a contiguous block of memory. If your memory space gets too fragmented, there may not be a free block big enough for hundreds of thousands of items, even if you only need a pointer for each item. That is why you end up with out of memory exceptions. It's not that you don't have enough memory, it's that you don't have a big enough contiguous block of it.
As a solution I give you... the BucketList<T> class. It has a bucket size that defaults to 10000 and it implements a list of lists, each of which will always hold at most as many items as the bucket size specifies. This way, operations that add and remove items only work on arrays of at most 10000 items, and there is no need for one big memory block. I implemented the IList<T> interface explicitly, so that you will never find it comfortable to use an instance as a BucketList, only as an IList<T>. This way you can replace the implementation of the interface with a normal List or whatever other form you like. Enjoy!
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
/// <summary>
/// Create a bucket list from an IEnumerable
/// </summary>
/// <param name="enm">The items to copy into the new list</param>
public BucketList(IEnumerable<T> enm) : this()
{
    var list = (IList<T>)this;
    foreach (var itm in enm)
    {
        list.Add(itm);
    }
}
/// <summary>
/// The item count
/// </summary>
public int Count { get { return ((IList<T>)this).Count; } }
#region IList<T> implementation
int IList<T>.IndexOf(T item)
{
    var index = 0;
    for (var i = 0; i < _list.Count; i++)
    {
        var idx = _list[i].IndexOf(item);
        if (idx < 0)
        {
            index += _list[i].Count;
        }
        else
        {
            index += idx;
            return index;
        }
    }
    return -1;
}
void IList<T>.Insert(int index, T item)
{
    // IList<T> allows inserting at index == Count, which means appending
    if (index == _count)
    {
        ((ICollection<T>)this).Add(item);
        return;
    }
    var idx = 0;
    for (var i = 0; i < _list.Count; i++)
    {
        var lst = _list[i];
        if (index < idx + lst.Count)
        {
            lst.Insert(index - idx, item);
            splitListIfTooLarge(i);
            _count++;
            return;
        }
        idx += lst.Count;
    }
    throw new ArgumentOutOfRangeException("index");
}
void IList<T>.RemoveAt(int index)
{
    var idx = 0;
    for (var i = 0; i < _list.Count; i++)
    {
        var lst = _list[i];
        if (index < idx + lst.Count)
        {
            lst.RemoveAt(index - idx);
            removeListIfEmpty(i);
            _count--;
            return;
        }
        idx += lst.Count;
    }
    throw new ArgumentOutOfRangeException("index");
}
T IList<T>.this[int index]
{
    get
    {
        var idx = 0;
        for (var i = 0; i < _list.Count; i++)
        {
            var lst = _list[i];
            if (index < idx + lst.Count)
            {
                return lst[index - idx];
            }
            idx += lst.Count;
        }
        throw new ArgumentOutOfRangeException("index");
    }
    set
    {
        var idx = 0;
        for (var i = 0; i < _list.Count; i++)
        {
            var lst = _list[i];
            if (index < idx + lst.Count)
            {
                lst[index - idx] = value;
                return; // the item lives in this bucket, nothing left to do
            }
            idx += lst.Count;
        }
        throw new ArgumentOutOfRangeException("index");
    }
}
void ICollection<T>.CopyTo(T[] array, int arrayIndex)
{
    var index = arrayIndex;
    foreach (var lst in _list)
    {
        lst.CopyTo(array, index);
        index += lst.Count;
    }
}
int ICollection<T>.Count { get { return _count; } }
bool ICollection<T>.IsReadOnly { get { return false; } }
bool ICollection<T>.Remove(T item)
{
    for (var i = 0; i < _list.Count; i++)
    {
        var lst = _list[i];
        if (lst.Remove(item))
        {
            _count--;
            removeListIfEmpty(i);
            return true;
        }
    }
    return false;
}
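Here is the same bucketing idea as a short Python sketch, without the .NET interface plumbing. It is not a port of the C# class above. The method names follow Python's list protocol, and negative indexes are deliberately not supported. Splitting an overflowing bucket in half is only one possible strategy; the body of the C# splitListIfTooLarge helper is not shown, so I am not claiming it does the same. In Python this only illustrates the indexing logic, since CPython lists do not have the .NET large-array fragmentation problem in the same form.

```python
class BucketList:
    """Sequence that keeps its items in many small lists ("buckets")."""

    def __init__(self, items=(), bucket_size=10000):
        if bucket_size < 1:
            raise ValueError("bucket_size must be at least 1")
        self._bucket_size = bucket_size
        self._buckets = []  # each bucket holds at most bucket_size items
        self._count = 0
        for item in items:
            self.append(item)

    def __len__(self):
        return self._count

    def __iter__(self):
        for bucket in self._buckets:
            yield from bucket

    def _locate(self, index):
        """Translate a global index into (bucket number, offset in bucket)."""
        if not 0 <= index < self._count:
            raise IndexError(index)  # negative indexes are not supported
        for b, bucket in enumerate(self._buckets):
            if index < len(bucket):
                return b, index
            index -= len(bucket)
        raise IndexError(index)  # only reachable if _count is out of sync

    def __getitem__(self, index):
        b, i = self._locate(index)
        return self._buckets[b][i]

    def __setitem__(self, index, value):
        b, i = self._locate(index)
        self._buckets[b][i] = value

    def append(self, item):
        if not self._buckets or len(self._buckets[-1]) >= self._bucket_size:
            self._buckets.append([])  # start a new bucket when the last is full
        self._buckets[-1].append(item)
        self._count += 1

    def insert(self, index, item):
        if index == self._count:
            self.append(item)
            return
        b, i = self._locate(index)
        bucket = self._buckets[b]
        bucket.insert(i, item)
        self._count += 1
        if len(bucket) > self._bucket_size:
            half = len(bucket) // 2  # split an overflowing bucket in two
            self._buckets[b:b + 1] = [bucket[:half], bucket[half:]]

    def __delitem__(self, index):
        b, i = self._locate(index)
        del self._buckets[b][i]
        self._count -= 1
        if not self._buckets[b]:
            del self._buckets[b]  # never keep empty buckets around
```

Every operation touches at most one bucket of at most bucket_size items. Locating an index costs a walk over the buckets, the same trade-off the C# indexer makes.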
You may have heard of the recent scandal about Internet leaks of nude or personal photos of female celebrities. Dubbed "The Fappening", a play on words combining the (horribly bad, in my opinion) movie The Happening and the term "fap", which refers to masturbation, it is a huge collection of pictures that seem to have been taken by the celebs themselves or by close acquaintances in private surroundings. You know... selfies. They were obviously obtained through some underhanded methods and published in several waves, three at the moment. I am not here to give you torrent links to the leaked material, even if they are fairly easy to find; instead, I am going to talk about the general reaction, as shown by the seed/leech ratios of the torrent downloads: after an initial boom in interest, the updates have been less and less interesting to people. Why is that?
At first I hypothesized that the vehement reaction of the media was partly them joining the fray, like sharks smelling blood, and in their own way pointing people to search the net for the photos (yeah, I don't really believe there is a difference between mentioning something that is easy to find and linking to it), but also that this affair was an obvious attack on the brands that the celebrities stand for. Nobody really cares how well some of the actresses in this situation actually act, as long as they look hot enough and, very important, as long as they seem unattainable. The reaction of the agencies that invested a lot in these brands was, expectedly, violent. However, there is another factor, one that I think makes it all worth discussing: people expected something completely different from what they got. No matter how much we understand the media processes involved in creating a celebrity persona, we don't really (emotionally) believe that it is happening, or we don't understand the extent of the effort. Indeed, when people downloaded the pictures and guiltily and excitedly started to look at the images, they found out... real women. Without the expertise of professional photographers and without the extensive post-processing and media censorship that occur after the pictures are taken, the female celebrities that we collectively idolized appeared as less than goddesses, as just normal people, with zits and saggy tits and all that. Even when they look fabulous, as some of them do, the amateurish way the pictures were taken gives little pleasure. Indeed, the only pleasure that can be extracted from this is akin to rape: they wanted to cheat, to show us just the Photoshopped images of themselves, but we showed them! We took what we wanted.
Look at the torrent statistics, though. The October collection of pictures is at the top, above Fappening 2, which has more seeds than Fappening 3. People lost interest: they were curious, downloaded the stuff, then didn't follow through with the rest. All because they were getting something other than what they had bargained for. Instead of pictures showing more of the beautiful women we yearn for, they showed enough to make those women feel terribly human. The breasts, the asses and all the other hidden skin were hidden not because they were something amazing to hide, but because the myth was more beautiful and sexy, perfect in its imperfect sharing. It raises important questions that I believe are worth exploring: what are we really falling for? What is beauty: just a branded illusion? Why do girls appearing fully clothed and smiling in a music video or a movie seem more desirable than the fully naked and active girls in porn films? Are we really interested in the "reality show" of someone's intimacy, or do we, really, secretly, want these people to show us only the beautiful parts, to make us believe that perfect people exist? Are we all victims of a global romcom? And who is it that is laughing at the comedy aspect of all this?
The autumn season for TV shows is beginning, so I am here again to discuss the ones that I have been watching lately.
The Legend of Korra - This third season was better than the others, but we still have to contend with Korra's helplessness. I also have this nagging feeling that her not being able to do anything and having to be saved by her friends repeatedly has less to do with what people can accomplish together and more to do with the fact that she is female and therefore must be perceived as in distress. And before you ask me when I became a feminist, just read on and see which shows have female lead characters and what happens to them. The magnificent four villains were also ridiculously strong for people who had lived in solitary confinement for years.
The Good Wife - Interesting new dynamic of the show. I am quite fascinated by how they achieve this dynamic equilibrium: giving people what they want, but always changing things one way or another, moving characters around, keeping things interesting. If nothing else, this is a brilliantly constructed TV show.
Homeland - Homeland will have a fourth season, which is to begin soon. They released a sort of recap of what happened so far, but I believe you should skip it, as I think it contains spoilers for the upcoming season. It also conveys nothing of the quality of the first three seasons. This is a good show, you should watch it, even if the lead female character is bipolar and prone to doing crazy things in the name of love. See what I mean?
Gotham - A new superhero series will begin shortly. Hurray! This time it is about Gotham, where every supervillain and superhero is young, at the beginning of their "careers". Why are you throwing an adolescent tantrum, Bruce? Because I'm Batman!
Resurrection - I've decided not to watch it anymore. It is a pale copy of the original.
Vikings - I love this show, however it is beginning to change. It started with eager actors doing a cool project, so they all gave their best and the focus was on the way of the Viking. But now, after a while, the actors are acting more like themselves and the focus of the story shifted towards feudal intrigue.
Suits - The fourth season just ended with Mike back at the law firm and the comic relief guy, Louis Litt, leaving the company. I thought the actor had gotten tired of playing a ridiculous man who doesn't seem to do anything right, but the ending of the season goes in a different direction.
Black Box - As expected, the show was cancelled.
Halt and Catch Fire - A lot of emotion, a lot of tension, a lot of drama. Of course it got renewed for the second season, even if it doesn't make a whole lot of sense.
Under the Dome - I kept watching and watching and watching until I realized this is the new Lost! Every episode something happens that is completely implausible and unrelated to anything in the previous episodes. I will not watch it anymore.
Crossbones - This is a well done show, with good acting and high production values. However it relates to pirates and not even the fun ones. Imagine watching a TV show about drug lords and instead of high powered automatic rifle fights you would see the accountants doing the job of inventorying the proceeds. Black Sails is like that and then they insert some artificial personal drama to spice it up.
The Honourable Woman - Described as "The daughter of an assassinated Zionist arms dealer seeks to legitimise the family business while righting the wrongs done to them in the past.", it is a strange little show. I started watching it, but then I stopped. It is heavy, well acted, good production values. I wasn't in the mood for it for a long time, though. The premise is nothing if not brave.
The Leftovers - I just couldn't watch it anymore. The entire show was about people feeling depressed and/or suicidal because they were not among the 2% of people who magically vanished. Depressing and pointless.
The Witches of East End - I guess it's like The Originals with witches instead of vampires. I keep watching it, though, even if I don't know why. Probably someone placed a spell on me.
Tyrant - Speaking of brave TV show premises, this is a show about an Arab-American who left his birth country because his father was an ass (and the country's tyrannical ruler), and who returns there after his father dies. I like the acting and the premise. What I don't like is the condescending viewpoint of the script: the smart, educated, moral, good-looking American comes and teaches his older brother how to rule the country based on American principles. But the ending of the first season implies that this is not what is going to happen at all. Could it be that this will be the series to show Americans that all tyrants are manufactured, more or less, by circumstances? Check out this quote: "CIA guy: The US is not in the business of regime change. Al Fayed: Say that again with a straight face"
Taxi Brooklyn - A weird premise for a show that doesn't know what it wants to be: a comedy, a car thing or a police procedural. The idea is that a police detective (female and hot, but with daddy issues, of course) loses her right to drive. Instead, she co-opts a French immigrant taxi driver to drive her around. He serves as the comic relief most of the time, but also, probably, as the male reason anything gets done. I decided not to watch it anymore for several reasons: the premise, the scripting and the acting being at least three of them.
Extant - A female astronaut returns after a solo mission of more than nine months. And she is pregnant. Wonderful premise, however several things are not going for the show: the main actress is Halle Berry, whom I completely dislike. Then there is the electronic boy story arc (she and her husband have an artificial son that the husband created), which is either a good subject for another series or completely out of place here. And finally, the "corporate conspiracy" arc, where the guy from Helix is the bad guy. I don't know, imagine Gothika in space, with a little AI sprinkled over for fun. Ugh!
The Bridge - Haven't watched any of the second season episodes; I am waiting for the wife to see it with me. I believe I will be waiting for a long time.
Tokyo Ghoul - I usually make separate blog posts about anime, but this one deserves nothing like that much attention. The main character, a hybrid of man and ghoul (something like a vampire that also likes to eat flesh), is a whiny boy who gets stepped on by just about everybody. In the end he is captured and tortured a lot, which makes him more aggressive and less whiny. But it's still nothing too interesting.
Ghost in the Shell - Arise - the modern reboot of GITS, it is not bad. Unfortunately there are only four OVA episodes, each one released months after the previous one. Still waiting for the fourth one. I like the show a lot, but then I am biased, since I love anything related to the Ghost in the Shell universe.
The Strain - Guillermo del Toro wrote a horror book with vampires and now he is creating the TV show based on the book. So far I like the series, although I've already read the book and a lot of the surprise is gone. It's brutal, with vampires that are neither sexy nor romantic, but just want to drink all your blood and answer unconditionally to their Master. The Master is even scarier. It is not brilliant, but it certainly beats The Walking Dead.
Blood Lad - Horribly stupid anime. This guy who is a prince in the magical world of demons accidentally meets a human girl who then gets killed. He pledges his support to help her ghost find a body again. Just boring.
Longmire - The second season is a lot darker, but for all the wrong reasons, if you ask me. The things I liked in Longmire were to see him being uncompromisingly moral, even if he appears lonely and withdrawn to everybody around him. This season everybody connected to him has to face a metaphorical demon or ten, including Longmire himself. It felt pushed too far, I think.
The Lottery - Just like Extant, The Lottery is a sci-fi series centered around a woman. Naturally, the only thing she can possibly do is worry about children. You see, for some reason no one is able to sire children anymore. A scientist manages to fertilize 100 embryos and there will be a lottery to give the children to 100 couples. Lots of government conspiracies and child protecting going around. I may watch the rest of the episodes, but the pilot didn't convince me at all.
Manhattan - A show about the development of the first atomic bomb. Its take is interesting, focusing on the personal quirks, on the politically incorrect, on the compromises and mistakes, on scientists, military and their spouses alike. What I found fascinating is how it shows that the obnoxious arrogance of someone truly driven and brilliant is almost forced on them, as a defense mechanism against getting pulled down by the mediocre. Don't get me wrong, it is not another show about brilliant assholes a la Dr House, and it does not excuse gratuitous arrogance. Instead it shows how vital it is on the way to success. Given that, the character of Winter is so bloody annoying that I wonder why anyone would put up with such a guy and not kick his ass or just shoot him outright. Perhaps that is another strength of the show: describing how close to failure, for so many different reasons, the Manhattan Project truly was. It was a government project, after all.
The Assets - The series ended with episode 8. In that sense, it is actually a miniseries, as the entire premise of the show comes to an end with the final episode. I wrote before that it got cancelled really soon, and I believe the reason is that all the characters are really unlikable. Also, since it is based on a book that describes a real event whose outcome a lot of Americans already know, the interest was probably small. Also, at the end of the show you realise something: spies are really boring.
Legends - A TV series for the sole purpose of keeping Sean Bean alive! :) Sean Bean is this deep undercover agent with a lot of prefabricated "legends", or fake lives, that he uses to infiltrate criminal or terrorist organizations. This leads him to have identity crises, even questioning if any of his lives, including the real one, are actually real. Sexy Ali Larter is his "handler", which can't hurt.
Outlander - Is this an attempt at a romantic Yankee in King Arthur's Court? A 1945 nurse is thrown back in time to 1743 Scotland. Her healing skills help her join this band of rebellious Scots and experience life there. The synopsis of the show didn't give me much hope for it, but after watching all the episodes so far, it got me hooked. The acting is good and the script is well written. I hope it doesn't deteriorate along the way. It is also intriguing that she has 200 years of extra knowledge, but she doesn't suddenly share antibiotics with the world or try to improve muskets or whatever. Being a woman in a sort of prisoner situation serves to explain that, but how long can it go on like that?
The Divide - About a woman that works for an organization that tries to help the wrongly accused in the US justice system. There is a coverup, a conspiracy, White men wrongly accused of killing a Black family, politics, etc. It started as intriguing, but outside the twist that is probably looming, I don't think there is anything really interesting to me in the show. Too political, I guess.
The Knick - This is one of the good ones. A look at the professional and personal lives of the staff at New York's Knickerbocker Hospital during the early part of the twentieth century, it is directed by Soderbergh, stars Clive Owen and is both brutal and truthful. A must-see for all the new age assholes who like to think medicine was better at that time.
Doctor Who - Season 8 with Capaldi is both interesting and dull. It was supposed to be darker, more intense, but it isn't really. What it is is confusing, though. I didn't like the pilot, but I enjoyed the second and third episodes. I don't know, let's see.
Forever - Another "special" person helping the police. Why?! Oh, why?!! This time it's about a guy who cannot die. Every time he dies, he appears somewhere in water, naked. A single, sexy, female police detective partners with him in order to solve crime. I like the actor, though, even if the script is eerily similar to any of the shows in the genre out there. Let's see how it goes.
Intruders - "Jack Whelan is a former LAPD officer who is asked to investigate some strange occurrences. He tries to find answers, but he's stonewalled at every turn. Baffled, he continues until he starts to concentrate his search around a secret society that chases immortality by seeking refuge in the bodies of others." The cast seems good. I still have to actually watch it, though.
Hysteria - It concerns the idea that people can get afflictions from social media. A perfect reason for Internet control! :) Anyway, the title is perfect as it seems the cause of the problem is hysteria, while the reaction of the people is mass hysteria. The Hannibal Lecter beginning, though, may either be the sign of bad writing or of some ingenious plot device. Wait and see.
Hand of God - Ron Perlman? Sign me up! I've seen the pilot, though, and it's kind of weird. You get this judge whose son just killed himself. He did it because someone raped his wife and made him watch. And so the judge gets born again in a shady church run by a preacher who is an ex-actor and a con artist. Then the judge starts hearing the voice of God. Some things clearly get lost in translation, because He is always putting Ron Perlman in situations where he acts like a total ass whom everybody thinks is insane. Oh, except the insane people, who think he is the new Solomon. But is he? Weird, huh?
Just a quick solution for a problem that seems to be @font-face not functioning at all. You define the @font-face CSS rule, you then add a font-family CSS rule for your element and nothing seems to be happening. The element has the correct family when you look in the browser inspection tool, but the font is never loaded. Nor is it displayed. What could be wrong?
The simple answer in my case is that the element I wanted to style had a rule like this: font-family: 'My Special Font, Verdana, Arial';. The correct rule should have been font-family: 'My Special Font', Verdana, Arial;. The quotes are only for escaping spaces and the like in the individual family names, not for encapsulating the whole "value" of the CSS rule. I know, stupid, but I wasted half an hour on it!
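If you generate CSS from code, the rule above amounts to "quote each family name on its own, never the whole list". The following Python sketch builds such a value. The function name and the set of generic keywords are my own assumptions. The sketch leaves simple single-word names such as Verdana unquoted and quotes anything containing spaces or other special characters. It does not guard against names that collide with CSS keywords such as inherit.

```python
import re

# Generic family keywords must never be quoted.
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy", "system-ui"}
# A single plain identifier such as Verdana can stay unquoted.
PLAIN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*\Z")


def font_family_value(families):
    """Build a CSS font-family value, quoting each family name separately."""
    parts = []
    for name in families:
        if name in GENERIC_FAMILIES or PLAIN_NAME.match(name):
            parts.append(name)
        else:
            # escape backslashes and single quotes, then wrap just this one name
            escaped = name.replace("\\", "\\\\").replace("'", "\\'")
            parts.append("'" + escaped + "'")
    return ", ".join(parts)
```

Calling font_family_value(["My Special Font", "Verdana", "Arial"]) gives the corrected rule's value. Passing the three names as one string, which is exactly the mistake described above, yields a single quoted family that no browser can find.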
Update: If you are behind a proxy, here is some additional code to add right after creating the update session:
'updateSession.WebProxy.AutoDetect = true 'try this first. It doesn't work so well in some environments if no authentication window appears (*cough* Windows 8 *cough*)
strProxy = "proxy name or address:proxy port" 'ex: 1234:999
strProxyUser = "your username"
strProxyPass = "your password"
I am working behind a "secured" web proxy that sometimes skips a beat. As a result, there are days on which I cannot install Windows updates: the normal Windows Update application just fails (with Error Code: 0x80246002) and I am left angry and powerless. Well, there are options. First of all, none of the "solutions" offered by Microsoft seem to work. The most promising one (which may apply to you, but did not apply to me) was that you may have corrupted files in the Download folder for Windows updates. In that case you need to:
Stop the Windows Update service issuing the command line command: net stop wuauserv or by going to Control Panel, Services and manually stopping it.
Go to the download folder parent found at %systemroot%\SoftwareDistribution (cd %systemroot%\SoftwareDistribution) and rename the Download folder (ren Download Download.old)
Start the Windows Update service issuing the command line command: net start wuauserv or by going to Control Panel, Services and manually starting it.
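You can script the three steps above. The following Python sketch is Windows-only and must run from an elevated prompt. The function names are my own. By default it only prints the plan, and the rename mirrors what ren does. Treat it as an illustration of the sequence, not a tested tool.

```python
import ntpath
import os
import subprocess


def plan_reset(system_root):
    """Return the three steps: stop the service, rename Download, start the service."""
    download = ntpath.join(system_root, "SoftwareDistribution", "Download")
    return [
        ("run", ["net", "stop", "wuauserv"]),
        ("rename", download, download + ".old"),
        ("run", ["net", "start", "wuauserv"]),
    ]


def reset_update_downloads(dry_run=True):
    """Execute the plan for this machine or, by default, just print it."""
    steps = plan_reset(os.environ.get("SystemRoot", r"C:\Windows"))
    for step in steps:
        if dry_run:
            print(step)
        elif step[0] == "run":
            # raises CalledProcessError if net.exe fails, e.g. when not elevated
            subprocess.run(step[1], check=True)
        else:
            os.rename(step[1], step[2])
    return steps
```

Call reset_update_downloads() first to check the paths, then reset_update_downloads(dry_run=False) to actually run the steps.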
So my solution was to use a script that downloads and installs the Windows updates from the command line, and I found this link: Searching, Downloading, and Installing Updates, which pretty much provided the solution I was looking for. There are two issues with that script. The first is that it prompts you to accept any EULA that the updates may present. The second is that it downloads all updates, regardless of severity. So here is the script I am using, which fixes these two problems: the EULA is automatically accepted and only Important and Critical updates are downloaded and installed:
Set updateSession = CreateObject("Microsoft.Update.Session")
updateSession.ClientApplicationID = "Siderite :) Sample Script"

Set updateSearcher = updateSession.CreateUpdateSearcher()

WScript.Echo "Searching for updates..." & vbCRLF

Set searchResult = _
    updateSearcher.Search("IsInstalled=0 and Type='Software' and IsHidden=0")

WScript.Echo "List of applicable items on the machine:"

For I = 0 To searchResult.Updates.Count-1
    Set update = searchResult.Updates.Item(I)
    WScript.Echo I + 1 & "> " & update.Title
Next

If searchResult.Updates.Count = 0 Then
    WScript.Echo "There are no applicable updates."
    WScript.Quit
End If

WScript.Echo vbCRLF & "Creating collection of updates to download:"

Set updatesToDownload = CreateObject("Microsoft.Update.UpdateColl")

For I = 0 to searchResult.Updates.Count-1
    Set update = searchResult.Updates.Item(I)

    addThisUpdate = false
    If update.InstallationBehavior.CanRequestUserInput = true Then
        WScript.Echo I + 1 & "> skipping: " & update.Title & _
            " because it requires user input"
    Else
        If update.EulaAccepted = false Then
            ' accept the license agreement automatically instead of prompting
            update.AcceptEula()
            WScript.Echo I + 1 & "> Accept EULA " & update.Title
            addThisUpdate = true
            'WScript.Echo I + 1 & "> note: " & update.Title & " has a license agreement that must be accepted:"
            'WScript.Echo update.EulaText
            'WScript.Echo "Do you accept this license agreement? (Y/N)"
            'strInput = WScript.StdIn.Readline
            'WScript.Echo
            'If (strInput = "Y" or strInput = "y") Then
            '    update.AcceptEula()
            '    addThisUpdate = true
            'Else
            '    WScript.Echo I + 1 & "> skipping: " & update.Title & _
            '        " because the license agreement was declined"
            'End If
        Else
            addThisUpdate = true
        End If
    End If

    ' only Important and Critical updates are processed
    If addThisUpdate AND (update.MsrcSeverity = "Important" OR update.MsrcSeverity = "Critical") Then
        'wscript.echo ("This item is " & update.MsrcSeverity & " and will be processed!")
    Else
        'comment these lines to make it download everything
        wscript.echo (update.Title & " has severity [" & update.MsrcSeverity & "] and will NOT be processed!")
        addThisUpdate=false
    End If

    If addThisUpdate Then
        WScript.Echo I + 1 & "> adding: " & update.Title
        updatesToDownload.Add(update)
    End If
Next

If updatesToDownload.Count = 0 Then
    WScript.Echo "All applicable updates were skipped."
    WScript.Quit
End If

WScript.Echo vbCRLF & "Downloading updates..."

Set downloader = updateSession.CreateUpdateDownloader()
downloader.Updates = updatesToDownload
downloader.Download()

Set updatesToInstall = CreateObject("Microsoft.Update.UpdateColl")
rebootMayBeRequired = false

WScript.Echo vbCRLF & "Successfully downloaded updates:"

' only look at the selected updates, so nothing else that happens to be downloaded gets installed
For I = 0 To updatesToDownload.Count-1
    Set update = updatesToDownload.Item(I)
    If update.IsDownloaded = true Then
        WScript.Echo I + 1 & "> " & update.Title
        updatesToInstall.Add(update)
        If update.InstallationBehavior.RebootBehavior > 0 Then
            rebootMayBeRequired = true
        End If
    End If
Next

If updatesToInstall.Count = 0 Then
    WScript.Echo "No updates were successfully downloaded."
    WScript.Quit
End If

If rebootMayBeRequired = true Then
    WScript.Echo vbCRLF & "These updates may require a reboot."
End If

WScript.Echo vbCRLF & "Would you like to install updates now? (Y/N)"
strInput = WScript.StdIn.Readline
WScript.Echo

If (strInput = "Y" or strInput = "y") Then
    WScript.Echo "Installing updates..."
    Set installer = updateSession.CreateUpdateInstaller()
    installer.Updates = updatesToInstall
    Set installationResult = installer.Install()

    For I = 0 to updatesToInstall.Count - 1
        WScript.Echo I + 1 & "> " & _
            updatesToInstall.Item(i).Title & _
            ": " & installationResult.GetUpdateResult(i).ResultCode
    Next
End If
WScript.StdIn.Readline()
Save the code above in a file called Update.vbs, then create a batch file that looks like this:
@ECHO OFF
start "Command line Windows update" cscript Update.vbs
Run the batch file and the .vbs script will be executed in a command line window that also waits for you to press Enter at the end of execution, so you can see the result.
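Without the COM calls, the per-update decision the script makes looks like the following Python sketch. The Update class is a hypothetical stand-in for the Windows Update API objects. Its fields mirror CanRequestUserInput, EulaAccepted and MsrcSeverity, but nothing here talks to Windows. Note that, like the script, it accepts the EULA before the severity check, so filtered-out updates still get their EULA accepted.

```python
from dataclasses import dataclass
from typing import List, Optional

# Severities the script keeps; everything else is skipped.
ALLOWED_SEVERITIES = {"Important", "Critical"}


@dataclass
class Update:
    title: str
    msrc_severity: Optional[str]  # None/empty for non-security updates
    requires_user_input: bool
    eula_accepted: bool = False


def select_updates(updates: List[Update]) -> List[Update]:
    """Skip interactive updates, auto-accept EULAs, keep Important/Critical only."""
    selected = []
    for u in updates:
        if u.requires_user_input:
            continue  # CanRequestUserInput = true: the script skips these
        if not u.eula_accepted:
            u.eula_accepted = True  # the script calls AcceptEula() without asking
        if u.msrc_severity in ALLOWED_SEVERITIES:
            selected.append(u)
    return selected
```

Removing the severity condition is the equivalent of commenting out the "will NOT be processed" lines in the script, which makes it download everything.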
For solutions that are more sysadmin oriented, follow this link; it offers a lot of options, some of them in PowerShell.
Also, I didn't find a way to install the updates without Windows popping up that annoying prompt to reboot the computer. If you know how to do that, I would be grateful.
I have been watching a weekly space show made by a husband-and-wife team working for SpaceX. Initially called Spacevidcast, it is now called TMRO (pronounced Tomorrow). It is a great show: good production quality, nice humor and, more than anything, a comprehensive weekly video report on events in space exploration, commercial or otherwise. If you are even remotely interested in space, you should subscribe. And they have been doing it all from their own resources and crowdfunding for seven years! You gotta love that.
But the selfish reason I am blogging about them is that I got mentioned on the TMRO show! Click here to see them attempt, and actually manage, to pronounce my Internet nom de guerre. The effort is appreciated.
Because of idiotic firewall rules at my workplace, I am forced to use Hangouts rather than Yahoo Messenger as my instant messenger. I am not going to rant here about which one is better; suffice it to say that most of my friends are on YM, so being on Hangouts doesn't help. Hangouts has many annoyances for me, like its tendency to freeze when the Internet connection drops often, or the features it lacks compared to YM. In fact, I was so annoyed that I planned to write my own professional messenger to rule them all. But that's another story.
I am writing this post because of a behavior of the Google Hangouts instant messenger (which, to be fair, is only a Chrome extension), namely that after a while its green tray icon moves into the "hidden icons" group. I have to customize it every day, sometimes twice, as the setting seems to reset after a period of use, not just on restarts. A Google product forum thread discusses this: System tray icon resets every time Chrome is started, where you can also see a few comments from yours truly.
My first impulse was to write a script or a C# program to fix this, but first I searched the web for a solution and found TrayManager, a C# app that does what the "Customize..." tray link does, and more. One of its best features is a command line! So here is what you do after downloading the software and installing it somewhere:
TrayManager.exe -t "Hangouts" 2
Now, this probably doesn't solve the problem long term; it is just like going into the Customize... link, only faster. Also, it has no side effects if run multiple times, so you can use Task Scheduler to run it periodically. Yatta!
BBC's show The Sky at Night did an episode on the Rosetta mission, called How to Catch a Comet. It is a standard popular science show, with a lot of fake enthusiasm from the reporters and simple language and explanations, but for people who read this blog entry and wonder what the hell Rosetta is, it does the job. The fat black reporter is really annoying, and not because she's black, but because she comes across as completely fake whenever she says anything. Other than that, the show is decent.
You get to learn about comet 67P and the Rosetta probe's features and mission, walk around ESA, talk to scientists and even see a how-to on photographing comets; it was funny to see a shooting star in the night sky while the guy was preparing his camera and talking in the video. Of course, for me the show stopped just when it was getting interesting. I know you can't do much in 29 minutes, but still. I hope they do follow-up shows on Rosetta, and I can't wait for November, when the lander module will try to grapple onto the comet and land.
Just in case I've stirred your interest, here are some links that cover the subject in a lot more detail:
- ESA
  - Euronews: Comet Hunters: Rosetta's race to map 67P, an eight-and-a-half-minute Euronews report from 11 August.
  - ESAHangout: How do we journey to a comet?, a Google Hangout from ESA explaining the mission. It is one hour long and dates from 26 June.
  - Many other videos about Rosetta can be found on the ESA channel.
- A playlist about Rosetta from Mars Underground. The most interesting is this video, published on 11 Aug 2014. It lasts an hour and a half and shows the first mission images and science results.
- Comets - A wonder to Behold, A continuing Stream of Surprises - The Beauty and the Danger: not about Rosetta, but an hour and a half about comets. The documentary argues for a controversial theory about the electric nature of comets. It is well made and presents a lot of evidence, but I know too little about the theory to recommend it. Interesting, though.