Home
| Calendar
| Mail Lists
| List Archives
| Desktop SIG
| Hardware Hacking SIG
Wiki | Flickr | PicasaWeb | Video | Maps & Directions | Installfests | Keysignings Linux Cafe | Meeting Notes | Linux Links | Bling | About BLU |
Here is what you want (Dynamic Programming Algorithm (DPA) for Edit- Distance): http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Edit/ We used this technique at my previous job for validating OCR accuracy. Its very fast. -Josh On May 18, 2006, at 9:06 AM, markw at mohawksoft.com wrote: > While I like to think I'm no slouch in this department, I'd like to > see if > anyone has a better idea: > > You have two strings and you want to quantify how similar they are. > > Levenshtein Distance is not a good choice for two reasons: > > (1) it doesn't really show how similar strings are -- really, for > instance: > > "Tom Lehrer - The Was The Year That Was" and "That Was The Year > That Was - > Tom Lehrer" > > Are, to you and me, almost identical, but difficult to detect, > Levenshtein > only shows how many steps are required to convert one to another. > > (2) Levenshtein Distance is an N^2 algorithm and, well, yuck. > > Anyone have any ideas? I currently using an n^2 algorithm that scans > through both strings and used run length matching, but if you know of > something better, I'd love to hear it. > > _______________________________________________ > Discuss mailing list > Discuss at blu.org > http://olduvai.blu.org/mailman/listinfo/discuss
BLU is a member of BostonUserGroups | |
We also thank MIT for the use of their facilities. |