
Posts

Showing posts from 2011

OpenSuse 12.1 - My new primary OS

OpenSuse 12.1 was released just a few days back. I was expecting an update of all the components. I downloaded the KDE live CD and burned it to a DVD. I normally install Linux from live USB drives created with the UNetbootin utility, but that method does not work well for OpenSuse. The official method of installing from a live USB drive is described at http://en.opensuse.org/Live_USB_stick. The problem with this method is that it formats the USB drive with the ext3 filesystem, so the drive can no longer be used as a regular storage device. Installation ran smoothly, and installing from the live version asked fewer questions than the DVD installation (for example, there is no software selection step). Boot loader configuration still shows the "boot loader 128 GB" error, which has been there since 11.3, so I decided not to install a boot loader at all. I have Kubuntu 11.10 installed on another partition of the HDD, so I used its boot loader instead. Automatic partition detection was really smart; it properly detected my home partition an...

Phonetic comparison of strings

I was working on data clean-up for a schools database. The data can be in any language. One of the things I had to do as part of this was to find possible duplicates. First we tried the Soundex algorithm, which is built into SQL Server; it is a good algorithm but is best suited to names and surnames (it was developed for use in the census). The next algorithm we tried was Levenshtein distance, which takes two strings and gives the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. For a table with 10k rows, checking one field for possible duplicates took about 15 minutes, which is quite long. Levenshtein also does not work well for strings with spaces, which reduces its usefulness considerably. So I searched around for a few better algorithms. I found a Soundex derivative, 'Daitch–Mokotoff Soundex', but it is also meant for surnames, with support for Slavic and Germanic names. Another one I found is Metaphone, which is suitable for most English words, but ...
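To make the two approaches concrete, here is a minimal Python sketch of a common American Soundex variant and of Levenshtein distance using the usual dynamic-programming table. This is an illustration, not SQL Server's implementation: the built-in SOUNDEX() may differ in edge cases (for example the rule for 'h' and 'w'), and this version only looks at ASCII letters. The helper `possible_duplicates` is a hypothetical name used here to show why the pairwise check is slow: a column of n values has n(n-1)/2 pairs, so 10,000 rows means 49,995,000 Levenshtein calls, each costing time proportional to the product of the two string lengths.

```python
from itertools import combinations

# Letter-to-digit table for American Soundex; letters not listed
# (a, e, i, o, u, y, h, w) get no digit.
_SOUNDEX_DIGITS = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(word: str) -> str:
    """Return a 4-character American Soundex code, e.g. 'Robert' -> 'R163'.

    Assumption: only ASCII letters are considered; everything else is ignored.
    """
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        # Vowels reset `prev`, so a repeated digit after a vowel is kept;
        # 'h' and 'w' are transparent, so a repeat across them is dropped.
        if ch not in "hw":
            prev = digit
    return code.ljust(4, "0")


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning `a` into `b` (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the rows as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def possible_duplicates(values, max_distance=2):
    """Hypothetical helper: yield (a, b, distance) for every pair of values
    within `max_distance` edits, comparing all n*(n-1)/2 pairs."""
    for a, b in combinations(values, 2):
        d = levenshtein(a.lower(), b.lower())
        if d <= max_distance:
            yield a, b, d


if __name__ == "__main__":
    print(soundex("Robert"), soundex("Rupert"))  # R163 R163
    print(levenshtein("kitten", "sitting"))      # 3
    for pair in possible_duplicates(["Jon Smith", "John Smith", "Jane Smyth"]):
        print(pair)  # ('Jon Smith', 'John Smith', 1)
```

The threshold of 2 edits is an arbitrary choice for the example; in practice it would need tuning against real data, and grouping rows by a cheap key such as the Soundex code before running Levenshtein is one way to cut down the number of pairs compared.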