Converting timestamps (linux, mysql) to datetime (mssql, t-sql)

Published Jul 28, 2007

Posted in
database
programming

Well, a timestamp is defined as the integer number of seconds from 1st of January 1970, but not 1st January 1970 itself, that would mean 0 seconds and that is reserved as the 'zero time'.

So, converting is easy in T-Sql (Microsoft Sql Server):
@dateTime=DateAdd(second,{d '1970-01-01'},@timeStamp)
@timeStamp=DateDiff(second,{d '1970-01-01'},@dateTime)

The {d 'yyyy-MM-dd'} notation is an ODBC escape sequence.

Debugging T-SQL

Published Jul 18, 2007

and has 0 comments

In order to debug SQL many people open new windows in Query Analyser or Management Studio trying to see where the errors come from and opening transactions and rolling them back and basically be miserable.

Yet, even from Microsoft SQL 2000 stored procedures had debug support. You would use Query Analyser, open Object Browser, right click a stored procedure and select Debug.

However, in SQL 2005 you can't do that anymore. Query Analyser is no longer available, the Management Studio doesn't have debug options and the SQL 2000 Query Analyser doesn't allow you to debug stored procedures on SQL 2005 servers. But there is support for SQL debugging in Visual Studio .NET, in the Professional and Team versions. Let me rephrase: If you have the Express or Standard editions you are out of luck. No SQL 2005 debugging for you. I did some queries on the web searching for third party sql debuggers, maybe something from Microsoft, like their Javascript Debugger (which works better than the in-built javascript debugging in Visual Studio, btw)

There are some ugly problems that may occur:

For example you want to add an SQL Connection to Visual Studio 2005 and you get a "Unable to add data connection. Key not valid for use in specified state." error. The fix is to delete all not working sql connections that are already defined.
Or you cannot step into a stored procedure from ASP.NET code on Windows Server 2003 in which case you need to run the WWW service in IIS 5.0 isolation mode.

Maybe others. In this case, please let me know so I can update the post. Other people need help too, you know?

Even so, SQL debugging is not as straight forward as usual debugging. From the Microsoft entry on How to debug stored procedures in Visual Studio .NET I quote the Limitations of stored procedure debugging:

You cannot "break" execution.
You cannot "edit and continue."
You cannot change the order of statement execution.
Although you can change the value of variables, your changes may not take effect because the variable values are cached.
Output from the SQL PRINT statement is not displayed

Using Regular Expressions in T-SQL

Published Jul 17, 2007

Posted in
database
programming

and has 0 comments

Well, you can! But you need to use the sp_OA* stored procedures and VBScript. Here is a link to the user defined function that allows you to regex in sql:
Regular Expressions in T-SQL

If you check out the comments to this link you can even find a Regex search and replace solution.

You may get this error while trying to use the OLE Automation :
Msg 15281, Level 16, State 1, Line 1
SQL Server blocked access to procedure 'sys.sp_OACreate' of component 'Ole Automation Procedures' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'Ole Automation Procedures' by using sp_configure. For more information about enabling 'Ole Automation Procedures', see "Surface Area Configuration" in SQL Server Books Online.

The solution I've found is use sp_configure to enable Ole Automation Procedures like this:

EXEC sp_configure 'show advanced options', 1 -- make them available
reconfigure

EXEC sp_configure 'Ole Automation Procedures', 1 -- turn on OLE
reconfigure

EXEC sp_configure -- to see the new value
EXEC sp_configure 'show advanced options', 0 -- make them unavailable
reconfigure

How to get rid of 'String or binary data would be truncated', divisions by zero or arithmetic overflows in T-SQL

Published Jun 15, 2007

Posted in
database
programming

and has 0 comments

In MS Sql when you try to insert or update a field value or parameter that is bigger than the defined size you get String or binary data would be truncated and the query fails. In order to get rid of this, you can use SET ANSI_WARNINGS OFF. But, warning (not ANSI :) ), there are some drawbacks that you have to be aware of before using SET ANSI_WARNINGS OFF :

putting SET ANSI_WARNINGS OFF in a stored procedure will force a procedure recompilation each time, so it is better to set it before the execution of the stored procedure
this setting will disable the indexes, so queries might run slower
no error will be generated when a division by zero appears and a null value is returned (this might not sound as a drawback, but think that you don't always want to return null in some cases and you want at least to be aware of them)

What SET ANSI_WARNING does:
- disables the warning message from aggregated functions when finding null values
- returns null on divisions by zero and arithmetic overflows
- values are truncated when the length of the data is bigger than the defined size

If you want only part of the functionality above, also check SET ARITHABORT or SET ARITHIGNORE.

Also here is a list of how these options affect each other.

Super Fast and Accurate string distance algorithm: Sift3

Published Apr 4, 2007

and has 66 comments

Update November 2014: Sift4 is here!! Check out the new improved version of Sift here: Super Fast and Accurate string distance algorithm: Sift4

Update October 6 2014: New stuff, compare Levenstein vs Sift here:

Algorithm: Levenstein Sift

String 1: String 2:

Result:

Update June 25th 2013: I've decided to play a little with the suggestions in the comments and check for validity. This was spurned by the realization that a lot of people use my algorithm. So, in order to celebrate this, here is the "3B" version of the Sift3 algorithm:
It is made in Javascript, this time, as it was easier to test and has the following extra features:

a maxDistance value that tells the algorithm to stop if the strings are already too different.
two pointers c1 and c2, rather than a single pointer c and two offsets
Instead of dividing to 2 the total length of the strings compared, now I divide it with 1.5. Why? Because this way the value is closer to the Levenshtein distance computed per random strings

~~Happy usage!~~ The variant I posted was totally buggy. I removed it. Just use sift3Distance.

Update: the Sift algorithm is now on Github.

A while ago I wrote an entry here about Sift2, an improvement of Sift, the original and silly string distance algorithm. Now I am publishing Sift3, which is way more accurate and even simpler as an algorithm.

I found out that my algorithm is part of a class of algorithms that solve the Longest Common Substring problem, therefore I calculated the LCS, not the distance, then the distance from the LCS. The result is way more robust, easy to understand and closer to the Levenshtein algorithm both on random strings and user databases. Not to mention that there is no goto in this one.

BTW, if you are looking for an algorithm that detects switched words, this is not it :) This just looks for typos and small regional differences between the strings. I mean, you could normalize the strings, so that words are ordered by some mechanism, then it would work because the words wouldn't be switched :)

I promise to work on a word switching algorithm, but not in the near future.
Without further ado, here is the code:

The C# code is a method in an object that has a private member maxOffset. As in Sift2 maxOffset should be around 5 and it represents the range in which to try to find a missing character.

public float Distance(string s1, string s2, int maxOffset)
{
if (String.IsNullOrEmpty(s1))
{
return String.IsNullOrEmpty(s2)
? 0
: s2.Length;
}
if (String.IsNullOrEmpty(s2))
{
return s1.Length;
}
int c = 0;
int offset1 = 0;
int offset2 = 0;
int lcs = 0;
while ((c + offset1 < s1.Length)
&&
(c + offset2 < s2.Length))
{
if (s1[c + offset1] ==
s2[c + offset2])
lcs++;
else
{
offset1 = 0;
offset2 = 0;
for (int i = 0;
i < maxOffset;
i++)
{
if ((c + i < s1.Length)
&&
(s1[c + i] == s2[c]))
{
offset1 = i;
break;
}
if ((c + i < s2.Length)
&&
(s1[c] == s2[c + i]))
{
offset2 = i;
break;
}
}
}
c++;
}
return (s1.Length + s2.Length)/2 - lcs;
}

And here is the T-Sql code. This version is actually an improvement of my original source, gracefully provided by Todd Wolf:

CREATE FUNCTION [DBO].[Sift3distance2]
(
@s1 NVARCHAR(3999),@s2 NVARCHAR(3999),@maxOffset INT
)
RETURNS FLOAT
AS 
BEGIN
DECLARE @s1LEN INT,@s2LEN INT

SELECT @s1LEN=Len(Isnull(@s1,'')),@s2LEN=Len(Isnull(@s2,''))

IF @s1LEN=0 RETURN @s2LEN
ELSE
IF @s2LEN=0 RETURN @s1LEN

IF Isnull(@maxOffset,0)=0 SET @maxOffset=5

DECLARE @currPos INT,@matchCnt INT,@wrkPos INT,@s1Offset INT,@s1Char VARCHAR,@s1Pos INT,@s1Dist INT,@s2Offset INT,@s2Char VARCHAR,@s2Pos INT,@s2Dist INT

SELECT @s1Offset=0,@s2Offset=0,@matchCnt=0,@currPos=0

WHILE(@currPos+@s1Offset<@s1LEN AND @currPos+@s2Offset<@s2LEN)
BEGIN
SET @wrkPos=@currPos+1

IF(Substring(@s1,@wrkPos+@s1Offset,1)=Substring(@s2,@wrkPos+@s2Offset,1)) SET @matchCnt=@matchCnt+1
ELSE
BEGIN
SET @s1Offset=0

SET @s2Offset=0

SELECT @s1Char=Substring(@s1,@wrkPos,1),@s2Char=Substring(@s2,@wrkPos,1)

SELECT @s1Pos=Charindex(@s2Char,@s1,@wrkPos)-1,@s2Pos=Charindex(@s1Char,@s2,@wrkPos)-1

SELECT @s1Dist=@s1Pos-@currPos,@s2Dist=@s2Pos-@currPos

IF(@s1Pos>0 AND (@s1Dist<=@s2Dist OR @s2Pos<1) AND @s1Dist<@maxOffset) SET @s1Offset=(@s1Pos-@wrkPos)+1
ELSE
IF(@s2Pos>0 AND (@s2Dist<@s1Dist OR @s1Pos<1) AND @s2Dist<@maxOffset) SET @s2Offset=(@s2Pos-@wrkPos)+1
END

SET @currPos=@currPos+1
END

RETURN(@s1LEN+@s2LEN)/2.0-@matchCnt
END

It doesn't give the same exact results as my own code, yet the result is close enough and the speed is about 20% higher.

And thanks to Diogo Nechtan, the version in PHP:

function sift3Plus($s1, $s2, $maxOffset) {
$s1Length = strlen($s1); 
$s2Length = strlen($s2);
if (empty($s1)) {
return (empty($s2) ? 0 : $s2Length);
}
if (empty($s2)) {
return $s1Length;
}
$c1 = $c2 = $lcs = 0;

while (($c1 < $s1Length) && ($c2 < $s2Length)) {
if (($d = $s1{$c1}) == $s2{$c2}) {
$lcs++;
} else {
for ($i = 1; $i < $maxOffset; $i++) {
if (($c1 + $i < $s1Length) && (($d = $s1{$c1 + $i}) == $s2{$c2})) {
$c1 += $i;
break;
}
if (($c2 + $i < $s2Length) && (($d = $s1{$c1}) == $s2{$c2 + $i})) {
$c2 += $i;
break;
}
}
}
$c1++;
$c2++;
}
return (($s1Length + $s2Length) / 2 - $lcs);
}

And thanks to Fernando Jorge Mota, the version in Python:

Also, here is the Javascript version, used in Mailcheck, by Derrick Ko and Wei Lu.

function sift3Distance(s1, s2) {
if (s1 == null || s1.length === 0) {
if (s2 == null || s2.length === 0) {
return 0;
} else {
return s2.length;
}
}

if (s2 == null || s2.length === 0) {
return s1.length;
}

var c = 0;
var offset1 = 0;
var offset2 = 0;
var lcs = 0;
var maxOffset = 5;

while ((c + offset1 < s1.length) && (c + offset2 < s2.length)) {
if (s1.charAt(c + offset1) == s2.charAt(c + offset2)) {
lcs++;
} else {
offset1 = 0;
offset2 = 0;
for (var i = 0; i < maxOffset; i++) {
if ((c + i < s1.length) && (s1.charAt(c + i) == s2.charAt(c))) {
offset1 = i;
break;
}
if ((c + i < s2.length) && (s1.charAt(c) == s2.charAt(c + i))) {
offset2 = i;
break;
}
}
}
c++;
}
return (s1.length + s2.length) / 2 - lcs;
}

~~Another implementation, this time in Java, by Eclesia:~~

You might also be interested in a customised version in AutoKey, by Toralf:

Thanks all for your comments and I look forward to more. Just tell me it worked or not and, most important, why. Good luck!

Copying a database from Sql2005 to Sql2000

Published Mar 29, 2007

Posted in
database
programming

and has 0 comments

You sometimes need to copy the exact structure of a database to an Sql2000 server, even if the source server is 2005.

Follow these steps:

open 2005 Sql Server Management Studio

right click on the offending database and go to Tasks -> Generate Scripts

do NOT check Script all objects in the selected database

click Next

set Include if NOT EXISTS to False

set Script for Server Version to SQL Server 2000

try to check only the objects and types of objects you actually need

create the script

delete all occurences of "WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)" from the generated script

Now the script should work on an SQL 2000 Server..
For the copying of data, the Server Management Studio has an option called ~~Copy Database~~Export Data, also in Tasks, that now accepts an Sql 2000 Server as destination.

ODBC Escape sequences!

Published Mar 21, 2007

Posted in
database
programming

and has 0 comments

In other words: those curly bracket things in SQL. What? curly brackets in SQL? Yes! Imagine that :)

The idea is that most database systems adhere to the ODBC standard, at least ODBC 1.0. That means that, when you communicated with a database, you can send so called ODBC escape sequences that are translated into the SQL engine native objects.

Quick example: SELECT {d '2007-03-15'} will work in all ODBC 1.0 compliant DBMSs, including Microsoft SQL Server, MySql, PostgreSQL, Oracle, etc. and select a date object from 15 of March 2007, no matter the server configured country or language.

Interested yet? You can read the ODBC Programmer's Reference for more details. Short story shorter, here are the working and interesting parts (to me) of the ODBC escape sequences:
select {d '2007-02-13' }
select {t '22:20:30' }
select {ts '2007-02-13 22:20:30' }
select {fn curdate()}
select {fn curtime()}
select {fn User()}
select {fn Database()}
select {fn week(getdate())}
select {fn quarter(getdate())}
select {fn monthname(getdate())}
select {fn dayname(getdate())}
select {fn curdate()}
select {fn dayofweek(getdate())}
select {fn dayofyear(getdate())}
select {guid '12345678-1234-1234-1234-123456789012'}

Super fast string distance algorithm: Sift2

Published Jan 9, 2007

and has 40 comments

Update!! Read the Sift3 post. It is an even better algorithm.

A while ago I wrote a little algorithm that tried to cope with the large amount of time that the Levenshtein edit-distance algorithm took to compare two strings. It was a really funny algorithm, but it was fast enough to show me the similarity between strings that, after all, should have been either very similar or different.

Meanwhile I started thinking if I could improve on it. I was sure I could, because it wasn't very scientific, it was empirical. And finally I did it. This is the Sift2 algorithm, along with an SQL server function that can make one million string comparisons in 2.5 minutes. Compared with Levenshtein, Sift2 performs 10 to 25 times faster and it is four times faster than Sift.

The concept:

Starting from the beginning of both words, compare the letters on the same position.
If the same, move forward, else search the letter from the first word in the next maxOffset letters in the second word and viceversa.
Offset the words according to the closest found letter, add 1 to distance
Repeat until one of the words ends
Add to the calculated distance half of the length of string remained unparsed from the other word

That's it! Here is the code:

C# Code (double click to show/hide)

public class StringSift2

 private int maxOffset;

 public StringSift2() : this(5) { }

 public StringSift2(int maxOffset)

 this.maxOffset = maxOffset;

 public float Distance(string s1, string s2)

 if (String.IsNullOrEmpty(s1))

 return

 String.IsNullOrEmpty(s2) ? 0 : s2.Length;

 if (String.IsNullOrEmpty(s2))

 return s1.Length;

 int c = 0;

 int offset1 = 0;

 int offset2 = 0;

 int dist = 0;

 while ((c + offset1 < s1.Length)

 && (c + offset2 < s2.Length))

 if (s1[c + offset1] != s2[c + offset2])

 offset1 = 0;

 offset2 = 0;

 for (int i = 0; i < maxOffset; i++)

 if ((c + i < s1.Length)

 && (s1[c + i] == s2[c]))

 if (i > 0)

 dist++;

 offset1 = i;

 goto ender;

 if ((c + i < s2.Length)

 && (s1[c] == s2[c + i]))

 if (i > 0)

 dist++;

 offset2 = i;

 goto ender;

 dist++;

 ender:

 c++;

 return dist + (s1.Length - offset1

 + s2.Length - offset2)/2 - c;

 public float Similarity(string s1, string s2)

 float dis = Distance(s1, s2);

 int maxLen = Math.Max(s1.Length, s2.Length);

 if (maxLen == 0) return 1;

 else

 return 1 - dis / maxLen;

T-SQL code (double click to show/hide)

CREATE FUNCTION Sift2Distance(@s1 nvarchar(3999), @s2 nvarchar(3999),@maxOffset int)

RETURNS int

AS

BEGIN

IF ISNULL(@s1,'')='' RETURN LEN(isnull(@s2,''))

IF ISNULL(@s2,'')='' RETURN LEN(isnull(@s1,''))

IF isnull(@maxOffset,0)=0 SET @maxOffset=5

 DECLARE @c int,@offset1 int, @offset2 int, @dist int,@i int

 SET @offset1=0

 SET @offset2=0

 SET @dist=0

 SET @c=0

WHILE (@c+@offset1<len(@s1) AND @c+@offset2<len(@s2))

BEGIN

 IF (SUBSTRING(@s1,@c+@offset1+1,1)!=SUBSTRING(@s2,@c+@offset2+1,1))

 BEGIN

 SET @offset1=0

 SET @offset2=0

 SET @i=0

 WHILE (@i<@maxOffset)

 BEGIN

 IF (@c+@i<len(@s1) AND SUBSTRING(@s1,@c+@i+1,1)=SUBSTRING(@s2,@c+1,1))

 BEGIN

 IF (@i>0)

 BEGIN

 SET @dist=@dist+1

 SET @offset1=@i

END

 GOTO ENDER

END

 IF (@c+@i<len(@s2) AND SUBSTRING(@s1,@c+1,1)=SUBSTRING(@s2,@c+@i+1,1))

 BEGIN

 IF (@i>0)

 BEGIN

 SET @dist=@dist+1

 SET @offset2=@i

END

 GOTO ENDER

END

 SET @i=@i+1

END

 SET @dist=@dist+1

END

 ENDER:

 SET @c=@c+1

END

RETURN @dist+(len(@s1)-@offset1+len(@s2)-@offset2)/2-@c

END

CREATE function Sift2Similarity (@s1 nvarchar(3999), @s2 nvarchar(3999),@replaceCost int) returns float

as

BEGIN

 DECLARE @distance int

 SELECT @distance=dbo.Sift2Distance(@s1,@s2,2)

 DECLARE @maxLen int

 SET @maxLen=case when len(@s1)>len(@s2) then len(@s1) else len(@s2) end

 return case when @maxLen=0 then 1 else 1-cast(@distance as float)/@maxLen end

END

You can find a nice OOP Javascript implementation at IT BASE.

Performance:
The algorithm seems to be working fine for letter insertions, typos, letter inversions and such. It gives slightly different values than Levenshtein when big words are inverted. When a lot of letters are inverted, the difference from Levenshtein increases. Example: abcdefghijkl vs. badcfehgjilk (every two letters inverted) results in 0.42 similarity in Levenshtein and 0.08 in Sift2. But let's face it, the two words are really different.

Update:
I've optimised the algorithm a little and also changed the formula so that it matches the Levenshtein distance as close as possible. The basic idea remains the same, but now the average error from Levenshtein (calculated or real company name and address data) is only 3%.
Some people saw the goto line and immediately laughed. Well, I tried replacing it with a break and a subtraction and even removed the subtraction altogether (thus maiming the algorithm) and the goto implementation was still the fastest. So I will continue to use goto, as it is more efficient.

Request:
I have seen a lot of people actually searched on Google and got here. I would really really really like some comments, links to implementations or whatever... It's my baby after all. Thanks!

TransactionScope in .NET

Published Nov 7, 2006

and has 0 comments

Amirthalingam Prasanna's article about the transactional model in NET 2.0.

As far as I understand, the old declarative ADO.NET Begin/RollBack/CommitTransaction model has become obsolete and a new TransactionScope model is used in NET 2.0. You have to add the System.Transactions.dll file to your references.

Basically, the C# code is like this:


using (TransactionScope scope=new TransactionScope
                       (scopeOption,transactionOptions,interopOption)) {
    // do database ops

    // if everything is alright
    scope.Complete();
}

scopeOption is an enum of type TransactionScopeOption, with the options Required (requires a transaction and uses if there is already on open), RequiresNew (always opens a new transaction), Suppress (don't use a transaction, even if one is open)

transactionOptions is of type TransactionOptions which has two interesting properties: IsolationLevel and Timeout.

interopOption is an enum of type EnterpriseServicesInteropOption and specifies how distributed transactions interact with COM+ transactions.

But what about the old NET1.1 framework? Doesn't it have something like that? Here comes Alexander Shirshov with help:
TransactionScope in .NET 1.1

Accessing Row based data in an efficient and maintainable manner

Published Aug 14, 2006

Posted in
database
.NET
programming
C#

and has 0 comments

Accessing Row based data in an efficient and maintainable manner - The Code Project - C# Database

This is a nice and simple article about accessing data from a lot of datarows. While using the string index will make the code more readable, using the integer index will make the code faster. The solution? Get the numeric indexes at the beginning of the loop, based on string indexes. Assert the indexes exist for better debugging. Very elegant. Also, check out the user comments, which are pretty good and to the point.

Code example:


int customerIDIndex = table.Columns.IndexOf("customerID");
int customerFirstNameIndex = table.Columns.IndexOf("firstName");
int customerLastNameIndex = table.Columns.IndexOf("lastName");

System.Diagnostics.Debug.Assert(customerIDIndex > -1, 
    "Database out of sync");
System.Diagnostics.Debug.Assert(customerFirstNameIndex > -1, 
    "Database out of sync");
System.Diagnostics.Debug.Assert(customerLastNameIndex > -1, 
    "Database out of sync");

foreach(DataRow row in table.Rows){
    customer = new Customer();
    customer.ID = (Int32)row[customerIDIndex];
    customer.FirstName = row[customerFirstNameIndex].ToString();
    customer.LastName = row[customerLastNameIndex].ToString();
}//end foreach

ASP.NET 2.0 Authentication and new controls

Published Aug 7, 2006

and has 0 comments

Using SQL Server instead of Access files:
1. Run aspnet_regsql.exe (from the NET Framework 2.0 folder)
2. Add to web.config (inside the configuration tag):
<connectionStrings>
<remove name="LocalSqlServer"/>
<add name="LocalSqlServer" connectionString="Data Source=localhost;Initial Catalog=aspnetdb;Integrated Security=True" providerName="System.Data.SqlClient"/>
</connectionStrings>
(that's because LocalSqlServer is already defined by default. Really dumb)
3. Go to the Website menu in Visual Studio -> ASP.NET Configuration and create users, roles and access rules.

Logging out programatically:
FormsAuthentication.SignOut();

Changing settings for the membership (in system.web section):
<membership defaultProvider="CustomizedProvider">
<providers>
<remove name="AspNetSqlMembershipProvider"/>
<add name="AspNetSqlMembershipProvider"
type="System.Web.Security.SqlMembershipProvider"
connectionStringName="LocalSqlServer"
enablePasswordRetrieval="false"
enablePasswordReset="true"
requiresQuestionAndAnswer="true"
applicationName="/"
requiresUniqueEmail="false"
minRequiredPasswordLength="1"
minRequiredNonalphanumericCharacters="0"
passwordFormat="Hashed"
maxInvalidPasswordAttempts="5"
passwordAttemptWindow="10"
passwordStrengthRegularExpression=""
commentTimeout=""/>
</providers>
</membership>