BLU Discuss list archive

Any computer science geeks?

Subject: Any computer science geeks?
From: markw at mohawksoft.com (markw at mohawksoft.com)
Date: Thu, 18 May 2006 09:06:01 -0400 (EDT)

While I like to think I'm no slouch in this department, I'd like to see if
anyone has a better idea:

You have two strings and you want to quantify how similar they are.

Levenshtein Distance is not a good choice for two reasons:

(1) It doesn't really show how similar two strings are. For instance:

"Tom Lehrer - That Was The Year That Was" and "That Was The Year That Was -
Tom Lehrer"

are, to you and me, almost identical, but Levenshtein has a hard time
detecting that: it only counts how many edit steps are required to convert
one string into the other.

(2) Levenshtein Distance is an O(N^2) algorithm and, well, yuck.
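
To make both objections concrete, here is a minimal sketch of the textbook
dynamic-programming Levenshtein distance in Python. The function name
`levenshtein` and the variables `s1` and `s2` are illustrative choices, not
anything from an existing library. The nested loops are where the quadratic
cost comes from, and running it on the two titles above gives a nonzero
distance even though a human reads them as the same record.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance; takes O(len(a) * len(b)) time."""
    # prev[j] = distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


s1 = "Tom Lehrer - That Was The Year That Was"
s2 = "That Was The Year That Was - Tom Lehrer"
print(levenshtein(s1, s2))
```

The number printed is a count of single-character edits, so moving the
artist name from the front to the back is charged as many separate deletions
and insertions rather than as one cheap rearrangement.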

Does anyone have any ideas? I am currently using an n^2 algorithm that scans
through both strings and uses run-length matching, but if you know of
something better, I'd love to hear it.
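
The post does not spell out the run-matching algorithm, so the following is
only one possible reading of it, sketched in Python under stated
assumptions: repeatedly find the longest run of characters common to both
strings, count it, mask it out, and stop once the remaining runs are shorter
than a cutoff. The names `longest_common_run`, `run_similarity`, and the
`min_run` parameter are hypothetical. Each pass is quadratic, so this is at
least as expensive as the n^2 approach described above.

```python
def longest_common_run(a, b):
    """Return (length, start_in_a, start_in_b) of the longest common run."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Masked positions hold None and never take part in a run.
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[0]:
                    best = (curr[j], i - curr[j], j - curr[j])
        prev = curr
    return best


def run_similarity(s, t, min_run=3):
    """Score in [0, 1]: the fraction of characters covered by shared runs."""
    if not s and not t:
        return 1.0
    a, b = list(s), list(t)
    matched = 0
    while True:
        length, i, j = longest_common_run(a, b)
        if length == 0 or length < min_run:
            break
        matched += length
        # Mask the run in both strings so it cannot be counted twice.
        a[i:i + length] = [None] * length
        b[j:j + length] = [None] * length
    return 2 * matched / (len(s) + len(t))


s1 = "Tom Lehrer - That Was The Year That Was"
s2 = "That Was The Year That Was - Tom Lehrer"
print(run_similarity(s1, s2))  # prints 1.0
```

Because runs are matched wherever they occur, swapping the artist and the
title costs nothing under this score, which is the behavior the example in
point (1) asks for. The `min_run` cutoff is a guess at how to keep stray
one- or two-character matches from inflating the score.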

Follow-Ups:
- Any computer science geeks?
  - From: josh at offthehill.org (Josh ChaitinPollak)
- Any computer science geeks?
  - From: gaf at blu.org (Jerry Feldman)
- Any computer science geeks?
  - From: gaf at blu.org (Jerry Feldman)
- Any computer science geeks?
  - From: kclark at mtghouse.com (Kevin D. Clark)
- Any computer science geeks?
  - From: dsr at tao.merseine.nu (dsr at tao.merseine.nu)

Boston Linux & Unix / webmaster@blu.org