Home
| Calendar
| Mail Lists
| List Archives
| Desktop SIG
| Hardware Hacking SIG
Wiki | Flickr | PicasaWeb | Video | Maps & Directions | Installfests | Keysignings Linux Cafe | Meeting Notes | Linux Links | Bling | About BLU |
> Here is what you want (Dynamic Programming Algorithm (DPA) for Edit- > Distance): > > http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Edit/ > > We used this technique at my previous job for validating OCR > accuracy. Its very fast. That seems to do what Levensthein distence does, am I missing something? > > -Josh > > On May 18, 2006, at 9:06 AM, markw at mohawksoft.com wrote: > >> While I like to think I'm no slouch in this department, I'd like to >> see if >> anyone has a better idea: >> >> You have two strings and you want to quantify how similar they are. >> >> Levenshtein Distance is not a good choice for two reasons: >> >> (1) it doesn't really show how similar strings are -- really, for >> instance: >> >> "Tom Lehrer - The Was The Year That Was" and "That Was The Year >> That Was - >> Tom Lehrer" >> >> Are, to you and me, almost identical, but difficult to detect, >> Levenshtein >> only shows how many steps are required to convert one to another. >> >> (2) Levenshtein Distance is an N^2 algorithm and, well, yuck. >> >> Anyone have any ideas? I currently using an n^2 algorithm that scans >> through both strings and used run length matching, but if you know of >> something better, I'd love to hear it. >> >> _______________________________________________ >> Discuss mailing list >> Discuss at blu.org >> http://olduvai.blu.org/mailman/listinfo/discuss >
BLU is a member of BostonUserGroups | |
We also thank MIT for the use of their facilities. |