Home
| Calendar
| Mail Lists
| List Archives
| Desktop SIG
| Hardware Hacking SIG
Wiki | Flickr | PicasaWeb | Video | Maps & Directions | Installfests | Keysignings Linux Cafe | Meeting Notes | Linux Links | Bling | About BLU |
On Thursday 18 May 2006 9:06 am, markw at mohawksoft.com wrote: > While I like to think I'm no slouch in this department, I'd like to see > if anyone has a better idea: > > You have two strings and you want to quantify how similar they are. > > Levenshtein Distance is not a good choice for two reasons: > > (1) it doesn't really show how similar strings are -- really, for > instance: > > "Tom Lehrer - The Was The Year That Was" and "That Was The Year That Was > - Tom Lehrer" > > Are, to you and me, almost identical, but difficult to detect, > Levenshtein only shows how many steps are required to convert one to > another. > > (2) Levenshtein Distance is an N^2 algorithm and, well, yuck. > > Anyone have any ideas? I currently using an n^2 algorithm that scans > through both strings and used run length matching, but if you know of > something better, I'd love to hear it. Kevin Gleason was working on something similar to that. I'm not sure if Kevin is still at Mount Ida. -- Jerry Feldman <gaf at blu.org> Boston Linux and Unix user group http://www.blu.org PGP key id:C5061EA9 PGP Key fingerprint:053C 73EC 3AC1 5C44 3E14 9245 FB00 3ED5 C506 1EA9
BLU is a member of BostonUserGroups | |
We also thank MIT for the use of their facilities. |