![]() |
Home
| Calendar
| Mail Lists
| List Archives
| Desktop SIG
| Hardware Hacking SIG
Wiki | Flickr | PicasaWeb | Video | Maps & Directions | Installfests | Keysignings Linux Cafe | Meeting Notes | Linux Links | Bling | About BLU |
While I like to think I'm no slouch in this department, I'd like to see if anyone has a better idea: You have two strings and you want to quantify how similar they are. Levenshtein Distance is not a good choice for two reasons: (1) it doesn't really show how similar strings are -- really, for instance: "Tom Lehrer - The Was The Year That Was" and "That Was The Year That Was - Tom Lehrer" Are, to you and me, almost identical, but difficult to detect, Levenshtein only shows how many steps are required to convert one to another. (2) Levenshtein Distance is an N^2 algorithm and, well, yuck. Anyone have any ideas? I currently using an n^2 algorithm that scans through both strings and used run length matching, but if you know of something better, I'd love to hear it.
![]() |
|
BLU is a member of BostonUserGroups | |
We also thank MIT for the use of their facilities. |