BLU Discuss list archive


Any computer science geeks?

Subject: Any computer science geeks?
From: dsr at tao.merseine.nu (dsr at tao.merseine.nu)
Date: Thu, 18 May 2006 09:05:58 -0400
In-reply-to: <18233.24.91.171.78.1147957561.squirrel@mail.mohawksoft.com>
References: <18233.24.91.171.78.1147957561.squirrel@mail.mohawksoft.com>

On Thu, May 18, 2006 at 09:06:01AM -0400, markw at mohawksoft.com wrote:
> While I like to think I'm no slouch in this department, I'd like to see if
> anyone has a better idea:
> 
> You have two strings and you want to quantify how similar they are.
> 
> "Tom Lehrer - The Was The Year That Was" and "That Was The Year That Was -
> Tom Lehrer"
> 
> (2) Levenshtein Distance is an N^2 algorithm and, well, yuck.
> 
> Anyone have any ideas? I'm currently using an n^2 algorithm that scans
> through both strings and uses run-length matching, but if you know of
> something better, I'd love to hear it.
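For reference, here is a minimal sketch of the textbook dynamic-programming (Wagner-Fischer) form of Levenshtein distance. It shows where the N^2 comes from: one table cell is filled for every pair of characters, so the work is len(a) * len(b). Keeping only two rows reduces memory to O(min(m, n)) but leaves the time quadratic. Because it compares characters in order, it also tends to score reordered titles like the Tom Lehrer pair as quite distant. The function name `levenshtein` is illustrative, not from the thread.

```python
def levenshtein(a: str, b: str) -> int:
    """Wagner-Fischer edit distance: O(len(a) * len(b)) time, O(min) memory."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as the row
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```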

I don't have anything faster, but you could, for example, apply
SOUNDEX to each word, sort the result, and count how many were
the same.
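A minimal sketch of the word-level idea above, assuming plain American Soundex (not the enhanced variant linked below) and splitting titles on whitespace. Each word is reduced to a 4-character code, both code lists are sorted, and a single merge pass counts how many codes the two titles share. Sorting makes the comparison insensitive to word order, which is exactly what the reordered Tom Lehrer titles need. The names `soundex` and `soundex_word_overlap` are hypothetical, chosen for illustration.

```python
# American Soundex digit classes; vowels, Y, H and W get no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code, or "" if no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def soundex_word_overlap(s1: str, s2: str):
    """Count shared Soundex codes between two titles via a sorted merge.

    Returns (common, words_in_s1, words_in_s2); tokens with no letters
    (such as "-") are ignored.
    """
    codes1 = sorted(filter(None, (soundex(w) for w in s1.split())))
    codes2 = sorted(filter(None, (soundex(w) for w in s2.split())))
    i = j = common = 0
    while i < len(codes1) and j < len(codes2):
        if codes1[i] == codes2[j]:
            common += 1
            i += 1
            j += 1
        elif codes1[i] < codes2[j]:
            i += 1
        else:
            j += 1
    return common, len(codes1), len(codes2)


print(soundex_word_overlap("Tom Lehrer - The Was The Year That Was",
                           "That Was The Year That Was - Tom Lehrer"))
# (7, 8, 8)
```

The result is a raw count; turning it into a score (for example 2 * common / (n1 + n2)) is a further choice not made in the thread. Apart from the sort, the cost is linear in the number of words.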

You'll probably want an enhanced SOUNDEX... see
http://www.creativyst.com/Doc/Articles/SoundEx1/SoundEx1.htm

-dsr-



BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.

Boston Linux & Unix / webmaster@blu.org