BLU Discuss list archive


Any computer science geeks?

Subject: Any computer science geeks?
From: josh at offthehill.org (Josh ChaitinPollak)
Date: Fri, 19 May 2006 20:28:13 -0400
In-reply-to: <18233.24.91.171.78.1147957561.squirrel@mail.mohawksoft.com>
References: <18233.24.91.171.78.1147957561.squirrel@mail.mohawksoft.com>

Here is what you want (Dynamic Programming Algorithm (DPA) for Edit-Distance):

http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Edit/

We used this technique at my previous job for validating OCR accuracy.
It's very fast.
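Below is a minimal sketch of the dynamic-programming edit distance (Levenshtein) described at the link above. It is written for illustration and is not the code from the linked page or from the OCR project mentioned. The function name `edit_distance` is hypothetical. The sketch keeps only two rows of the DP table. Time is still proportional to len(a) * len(b), but extra memory is only proportional to len(b).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b via dynamic programming.

    Only two rows of the table are kept: time is O(len(a) * len(b)),
    extra space is O(len(b)).
    """
    # prev[j] = edits needed to turn the first i-1 chars of a into b[:j];
    # for i = 0 that is just j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur[j] = min(prev[j] + 1,         # delete ca
                         cur[j - 1] + 1,      # insert cb
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    return prev[len(b)]


print(edit_distance("kitten", "sitting"))  # 3
```

For OCR validation, one plausible use (an assumption, not something stated in the post) is to divide the distance by the length of the reference text to get a character error rate.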

-Josh

On May 18, 2006, at 9:06 AM, markw at mohawksoft.com wrote:

> While I like to think I'm no slouch in this department, I'd like to see if
> anyone has a better idea:
>
> You have two strings and you want to quantify how similar they are.
>
> Levenshtein Distance is not a good choice for two reasons:
>
> (1) it doesn't really show how similar two strings are. For instance:
>
> "Tom Lehrer - That Was The Year That Was" and
> "That Was The Year That Was - Tom Lehrer"
>
> are, to you and me, almost identical, but that is difficult to detect:
> Levenshtein only shows how many steps are required to convert one into
> the other.
>
> (2) Levenshtein Distance is an N^2 algorithm and, well, yuck.
>
> Anyone have any ideas? I'm currently using an n^2 algorithm that scans
> through both strings and uses run-length matching, but if you know of
> something better, I'd love to hear it.
>
> _______________________________________________
> Discuss mailing list
> Discuss at blu.org
> http://olduvai.blu.org/mailman/listinfo/discuss
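Mark's first objection is that reordered titles look very different to edit distance. One order-insensitive alternative is to compare the two strings as multisets of words using Jaccard similarity. Neither poster proposed this in the thread. It is a swapped-in technique, shown here only as a sketch. The helper names `word_bag` and `bag_similarity` are hypothetical. Tokenizing on `\w+` is an assumption about what counts as a "word".

```python
import re
from collections import Counter


def word_bag(s: str) -> Counter:
    # Lower-case and keep only runs of word characters, so punctuation
    # such as the " - " separator is ignored.
    return Counter(re.findall(r"\w+", s.lower()))


def bag_similarity(a: str, b: str) -> float:
    """Multiset Jaccard similarity of the words in a and b (0.0 to 1.0).

    Word order is ignored, and the cost grows roughly linearly with the
    total string length instead of with len(a) * len(b).
    """
    wa, wb = word_bag(a), word_bag(b)
    union = sum((wa | wb).values())  # Counter "|" keeps the max count per word
    if union == 0:
        return 1.0  # neither string has any words: treat them as identical
    return sum((wa & wb).values()) / union  # "&" keeps the min count


x = "Tom Lehrer - That Was The Year That Was"
y = "That Was The Year That Was - Tom Lehrer"
print(bag_similarity(x, y))  # 1.0
print(bag_similarity(x, "Tom Lehrer - Poisoning Pigeons in the Park"))  # 0.25
```

The trade-off is that a bag of words cannot see misspellings inside a word, which is exactly what edit distance catches. The two measures therefore complement each other rather than one replacing the other.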


Boston Linux & Unix / webmaster@blu.org