Any computer science geeks?

Jerry Feldman gaf at blu.org
Thu May 18 09:48:23 EDT 2006


On Thursday 18 May 2006 9:06 am, markw at mohawksoft.com wrote:
> While I like to think I'm no slouch in this department, I'd like to see
> if anyone has a better idea:
>
> You have two strings and you want to quantify how similar they are.
>
> Levenshtein Distance is not a good choice for two reasons:
>
> (1) it doesn't really show how similar strings are -- really, for
> instance:
>
> "Tom Lehrer - The Was The Year That Was" and "That Was The Year That Was
> - Tom Lehrer"
>
> Are, to you and me, almost identical, but difficult to detect,
> Levenshtein only shows how many steps are required to convert one to
> another.
>
> (2) Levenshtein Distance is an N^2 algorithm and, well, yuck.
>
> Anyone have any ideas? I currently using an n^2 algorithm that scans
> through both strings and used run length matching, but if you know of
> something better, I'd love to hear it.
Kevin Gleason was working on something similar to that. I'm not sure if 
Kevin is still at Mount Ida. 
-- 
Jerry Feldman <gaf at blu.org>
Boston Linux and Unix user group
http://www.blu.org PGP key id:C5061EA9
PGP Key fingerprint:053C 73EC 3AC1 5C44 3E14 9245 FB00 3ED5 C506 1EA9



More information about the Discuss mailing list