Tuesday, November 29, 2011

OpenSuse 12.1 - My new primary OS

OpenSuse 12.1 was released just a few days ago. I was expecting it to bring updates to all the components. I downloaded the KDE live CD and burned it to a DVD. I normally install Linux from live USB drives created with the UNetbootin utility, but that method does not work well for OpenSuse. The official method of installing from a live USB drive is described here: http://en.opensuse.org/Live_USB_stick. The problem with this method is that it formats the USB drive with the ext3 filesystem, so the drive cannot be used as a regular storage device.

Installation ran smoothly. Installing from the live version asked fewer questions than the DVD installation (no software selection, for example). The boot loader configuration still shows the "boot loader 128 GB" error, which has been there since 11.3, so I decided not to install the boot loader at all. I have Kubuntu 11.10 installed on another partition of the HDD, so I used its boot loader. Automatic partition detection was really smart: it correctly detected my home partition and set its mount point to /home, and it detected my existing OpenSuse partition and set it to be formatted and mounted at /. This is a really nice touch. The rest of the installation went smoothly.

Since I did not install the boot loader from OpenSuse, I had to boot into Kubuntu to update the boot loader configuration. I used StartUp Manager for this. It is a very nice utility: I didn't have to do anything except open and close the application, and it automatically detected the new installation and updated the boot loader configuration. I then rebooted into OpenSuse, which automatically logged in the first user. I did not find any big change in boot time; it takes about a minute to get to my KDE desktop. All my hardware and peripherals were detected properly, including WiFi. I tested audio by playing a couple of MP3 files and found the quality much better than with 11.4, but the maximum volume is much lower than in the previous release. After the first boot I was greeted by the default OpenSuse KDE desktop. Since I'm using an existing home partition, I was expecting the same old desktop that I had configured in Kubuntu. It looks like OpenSuse uses the .kde4 folder for KDE settings instead of .kde as in Kubuntu and other distros. I think this is because OpenSuse still supports KDE3.

Drivers and apps
The first thing was the ATI drivers. I followed the instructions from the OpenSuse wiki and they worked fine, whereas I had problems getting the fglrx driver working in Kubuntu. In the last few releases the fglrx driver has had some issues, especially related to Gnome3. I installed version 11.11 of fglrx (the Catalyst Driver Suite), and it still has some problems with Gnome3. KDE has the BlueDevil app for Bluetooth devices, but it was not included in the Live CD version. BlueDevil did not work well for me: I was not able to transfer files via Bluetooth. The next task was to install multimedia applications. I'm using VLC these days as my default video player. Flash Player 11.1 64-bit is also available in the Non-OSS repo. Other applications that I installed are Opera, Thunderbird, QBittorrent, Skype, Google Chrome, Google Earth and Wine.

After using it for a week, I'm pretty happy with the performance and stability of OpenSuse. The overall experience is quite smooth, much better than Kubuntu 11.10. OpenSuse's KDE remains the most polished KDE among all the distros. OpenSuse 12.1 is going to be the default OS on my laptop until I find a better one. I'll give it 4.5 out of 5.

Thursday, November 24, 2011

Phonetic comparison of strings

I was working on data cleanup for a schools database. The data can be in any language.
One of the things I have to do as part of this is find possible duplicates. First we tried the Soundex algorithm, which is built into SQL Server. It is a good algorithm, but it is most suitable for names and surnames (it was developed for use in the census). The next algorithm we tried was Levenshtein distance, which takes two strings and returns the number of single-character edits (insertions, deletions or substitutions) needed to turn one into the other. For a table with 10k rows, checking one field for possible duplicates took about 15 minutes, which is quite long. Levenshtein also does not work well for strings with spaces,
which reduces its usefulness considerably.
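To make the two approaches concrete, here is a minimal Python sketch of classic American Soundex and of the textbook dynamic-programming edit distance. These are not the SQL Server built-in or the dbo.Levenshtein function used in this post (their source is not shown here); the function names and the choice to skip non-letters in soundex are assumptions made for illustration.

```python
# Illustrative sketches only; not the SQL Server implementations.
_GROUPS = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
           ("l", "4"), ("mn", "5"), ("r", "6"))
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(word):
    """Classic 4-character Soundex key. Non-letters are skipped (an assumption)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (result + "000")[:4]


def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(soundex("Robert"), soundex("Rupert"))         # R163 R163
print(levenshtein("kitten", "sitting"))             # 3
print(levenshtein("bank of india", "bankofindia"))  # 2
```

Because this soundex sketch ignores spaces, it may not match SQL Server's SOUNDEX output for multi-word input. The running time is also easy to explain: comparing every row with every other row in a 10k-row table means roughly 50 million distance computations, each proportional to the product of the two string lengths.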
So I searched around for a few better algorithms. I found a Soundex derivative, 'Daitch–Mokotoff Soundex', but it is also meant for surnames, with added support for Slavic and Germanic names. Another one I found is Metaphone, which suits most English words and is somewhat usable for languages related to English (like Spanish), but for languages like Japanese or Korean it doesn't work at all. It is also not very accurate for strings with spaces.
There is an improved version of Metaphone named Double Metaphone. Its language coverage is the same as Metaphone's, but it handles strings with spaces much better: it identifies "bank of india" and "bankofindia" as the same, while none of the other algorithms is able to. Here are the queries and their output:


select soundex('bank of india'),soundex('bankofindia')
-- B520, B521
select dbo.Levenshtein('bank of india','bankofindia')
--NULL
select dbo.Metaphone('bank of india'),dbo.Metaphone('bankofindia')
--bnk of int, bnkfnt
select dbo.DoubleMetaPhone('bank of india'),dbo.DoubleMetaPhone('bankofindia')
--PNKFNPNKFN,PNKFNPNKFN
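
Once an algorithm gives equal keys for equal-sounding strings, possible duplicates can be found by grouping rows on the key instead of comparing every pair. A rough Python sketch of that idea follows; simple_key is a hypothetical stand-in (lowercase, letters and digits only), not Double Metaphone, and possible_duplicates is an illustrative name.

```python
from collections import defaultdict


def simple_key(text):
    # Hypothetical stand-in for a phonetic key: lowercase the text and
    # keep only letters and digits, so spacing differences disappear.
    return "".join(ch for ch in text.lower() if ch.isalnum())


def possible_duplicates(values, key_func=simple_key):
    """Group values whose keys match; return only groups with 2+ members."""
    groups = defaultdict(list)
    for value in values:
        groups[key_func(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


names = ["bank of india", "bankofindia", "Bank Of India", "state bank"]
print(possible_duplicates(names))
# [['bank of india', 'bankofindia', 'Bank Of India']]
```

Any key function could be passed as key_func, including a real phonetic encoder. Grouping makes one pass over the rows, so it avoids the pairwise comparisons that made the Levenshtein check slow.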