Talk:Edit distance

Latest comment: 10 months ago by David Eppstein in topic Inspect the sentence
WikiProject Computer science (Rated C-class, Mid-importance)
WikiProject iconThis article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
C This article has been rated as C-Class on Wikipedia's content assessment scale.
 Mid  This article has been rated as Mid-importance on the project's importance scale.
Things you can help WikiProject Computer science with:

WikiProject Computing (Rated C-class)
WikiProject iconThis article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
C This article has been rated as C-Class on Wikipedia's content assessment scale.
 ???  This article has not yet received a rating on the project's importance scale.

Name of articleEdit

As far as I know, the name 'Edit distance traditionally means Levenshtein distance in Computer Science (see [1]). The list presented here also has overlap with string metrics. Maybe that page could be organised into categories instead of starting this page? Klem fra Nils Grimsmo 20:04, 19 March 2007 (UTC)Reply[reply]

You're rignt. I fixed it. Touchatou (talk) 16:19, 31 October 2011 (UTC)Reply[reply]
I think it would be even better to merge some of the many articles here that describe Levenshtein distance or varieties. There are many! Rp (talk) 09:57, 1 November 2011 (UTC)Reply[reply]

Proposed merge with Levenshtein distanceEdit

... as well as while moving the algorithmic parts to Wagner-Fischer algorithm. This page duplicates the information on those two pages.

In the literature about this problem, edit distance is often synonymous with Levenshtein distance, although sometimes it is extended, though seldom very dramatically. See e.g. Gonzalo Navarro's Guided tour to approximate string matching, references on edit distance.

Furthermore, this page spends a lot of time discussing the W-F algorithm, even though many applications of edit distance do not use that algorithm at all, circumventing the explicit computation of pairwise distances as it is simply too expensive. Therefore, this article does not reflect the state of the art in computer science and the discussion of the algorithm should really be separated from the algorithm to compute it. (A short discussion of the W-F algorithm is featured on edit distance, of course, as it's historically important.) QVVERTYVS (hm?) 08:55, 17 January 2014 (UTC)Reply[reply]

Please merge the three! I'm all for it! Rp (talk) 09:10, 17 January 2014 (UTC)Reply[reply]
To clarify: I did not mean to merge all three into a single article. I want the material on Levenshtein distance to be split into edit distance and Wagner-Fischer algorithm, so that we have one overview of the concept and its applications, and one on the basic algorithm that computes it. QVVERTYVS (hm?) 09:24, 17 January 2014 (UTC)Reply[reply]
Maybe a hatnote would be appropriate, along the lines of "This article is about the distance metric; for details of its computation, see Wagner-Fischer algorithm" (?) QVVERTYVS (hm?) 09:27, 17 January 2014 (UTC)Reply[reply]
I think prior to any restructuring it would pay off to hunt for all articles explaining the concept of edit distance or an algorithm to compute it (I'm sure Levenshtein_distance#See_also isn't complete), then design a new structure, and most of all, a way to make that structure immediately clear to readers, so they won't keep adding the same information in different places, or after removal, over and over and over. I stopped editing these articles after going through the Levenshtein distance page history; there is a clear lack of direction. Rp (talk) 09:38, 17 January 2014 (UTC)Reply[reply]
One problem is that there is a whole space of closely related algorithms and distances, so it's not very clear how to chop it up into articles. Rp (talk) 09:38, 17 January 2014 (UTC)Reply[reply]
I've decided to use Navarro as my guide here, since his is one of the best cited overviews of the field. (And because I don't have a copy of Sankoff and Kruskal's book, unfortunately, but that's also quite old.) Following that, rather than random web resources, should give us a common vocabulary. QVVERTYVS (hm?) 10:54, 17 January 2014 (UTC)Reply[reply]

merge - Putting it all in edit distance seems sensible. You could argue that the section on Levenshtein distance would be so large as to warrant its own article, but, for me, the benefits in terms of reduced repetition outweigh this argument in this case. Yaris678 (talk) 10:09, 17 January 2014 (UTC)Reply[reply]

No merge - Levenshtein distance is one of many edit distances. It happens to occupy an overly large percentage of peoples' mind-share, but that is because they don't know much about the other kinds of edit distances. What is really needed is to beef up the edit distance article to cover a lot more of the material. At the moment the edit distance article reads like it was written by somebody who mostly knows about Levenshtein distance, hence the desire to merge. Derek farn (talk) 13:31, 17 January 2014 (UTC)Reply[reply]

I welcome any suggestions as to how to "beef up the article". I've tried to generalize beyond Levenshtein distance + transposition, but I certainly did not read up on all the literature that discusses substring swaps, learning edit weights and q-gram distances, so go ahead. QVVERTYVS (hm?) 22:42, 17 January 2014 (UTC)Reply[reply]
  • Oppose merge. Edit distance is the general idea; it need not go into detail about the specific definitions and algorithms to compute it. The LD article could also have more detail about optimizations -- material that would be out of place in the general SD article. Glrx (talk) 00:56, 19 January 2014 (UTC)Reply[reply]
I want to separate the algorithmic content from the distance metric as such, and generalize the notion of Levenshtein distance in a single article edit distance. Optimizations and variants for specific distance metrics can be discussed in the page Wagner-Fischer algorithm, where IMHO they belong more properly than on a page about a distance metric that is embodied in many more algorithms than just its pairwise DP computation. QVVERTYVS (hm?) 12:18, 19 January 2014 (UTC)Reply[reply]

No merge Just remove/move the stuff thats specifically about Levenshtein distance from the edit distance page. There will be enough left for a general article with the properties and discussion of different approaches. --Ray andrew (talk) 00:23, 26 February 2014 (UTC)Reply[reply]

Merge, but even more urgently: merge with string metric, which covers exactly the same topic. The discussion could be widened to include block edit distances. (talk) 10:14, 23 May 2014 (UTC)Reply[reply]

String metric is even more problematic because it lumps together actual string metrics and similarity functions on sets and histograms. QVVERTYVS (hm?) 10:32, 23 May 2014 (UTC)Reply[reply]

Word metric?Edit

Can the relationship between word metric and this article be clarified? — Preceding unsigned comment added by (talk) 00:23, 9 July 2020 (UTC)Reply[reply]

Inspect the sentenceEdit

@David Eppstein I really can't understand this sentence:

Edit distance with non-negative cost satisfies the axioms of a metric, giving rise to a metric space of strings, when the following conditions are met:

please make that sentence more fluent. Thanks, Hooman Mallahzadeh (talk) 15:59, 14 July 2022 (UTC)Reply[reply]

That sentence is grammatical, fluent, and accurate. Your difficulty with technical English is not a problem with the article. —David Eppstein (talk) 16:47, 14 July 2022 (UTC)Reply[reply]

I disagree. The interjection "giving rise to a metric space of strings" interrupts the flow, the reoccurrence if "metric" doesn’t make it any better, and it seems superfluous to me. Strike it and the sentence will be easier to read. Rp (talk) 21:41, 14 July 2022 (UTC)Reply[reply]

@David Eppstein@Rp In the above sentence, "satisfies" means "would satisfy", am I correct? Is the following sentence more fluent? and semantically correct?

If the following conditions are met, then edit distance with non-negative cost would satisfy the axioms of a metric, and give rise to a metric space of strings:

Hooman Mallahzadeh (talk) 02:41, 15 July 2022 (UTC)Reply[reply]
Um. That doesn't feel idiomatic to me. I'd prefer to match indicative to indicative (if conditions are met, then things happen) or subjunctive to subjunctive (if conditions were met, then things would happen). But the subjunctive version has a connotation of negativity or counterfactuality (if conditions were met, then things would happen, but the conditions may not be met) that I'd prefer to avoid in this case. That's why I prefer the existing wording. —David Eppstein (talk) 05:13, 15 July 2022 (UTC)Reply[reply]