fuzz.ratio is not always commutative #173
Comments
With python-Levenshtein it appears to be commutative (at least for this example).
But I can confirm that it is not commutative without python-Levenshtein.
I think the issue here is the use of difflib, which, per its docs, doesn't actually give true edit distances.
If you put these examples through the SequenceMatcher in difflib you will get the same (but unrounded) results.
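To make the difflib behaviour concrete, here is a minimal sketch using only the standard library. The string pair is not the one from the original report; it is borrowed from the caution in the Python difflib documentation, and the helper name `difflib_ratio` is made up for illustration.

```python
from difflib import SequenceMatcher


def difflib_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] as computed by difflib's SequenceMatcher.

    Hypothetical helper for illustration; not part of fuzzywuzzy.
    """
    return SequenceMatcher(None, a, b).ratio()


# Same two strings, arguments swapped.
forward = difflib_ratio("tide", "diet")
backward = difflib_ratio("diet", "tide")

print(forward, backward)  # 0.25 0.5
```

The matcher greedily picks the longest matching block that starts earliest in the first sequence, so which block it anchors on (and how much can still match around it) depends on argument order.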
difflib does sound like a likely cause. A relatively simple fix for this case could be to try both directions and take the min or max of them. That would fix the reported discrepancy, although it makes it harder to say what is actually being measured, in particular if the difflib result and the python-Levenshtein result are not the same. In the meantime, we decided that a normalized notion of equality did not work for our project, and opted for true edit distance instead. As such, we are not waiting for a fix for this issue.
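The "try both directions" idea could look roughly like the sketch below. It is written against difflib directly rather than fuzzywuzzy, and `symmetric_ratio` and its `combine` parameter are hypothetical names, not an existing API.

```python
from difflib import SequenceMatcher


def symmetric_ratio(a: str, b: str, combine=max) -> float:
    """Order-independent similarity: score both directions, then combine.

    `combine` is expected to be a symmetric function such as max or min.
    Hypothetical helper for illustration; not part of fuzzywuzzy.
    """
    forward = SequenceMatcher(None, a, b).ratio()
    backward = SequenceMatcher(None, b, a).ratio()
    return combine(forward, backward)


# Swapping the arguments no longer changes the result.
print(symmetric_ratio("tide", "diet"), symmetric_ratio("diet", "tide"))
print(symmetric_ratio("tide", "diet", combine=min))
```

This doubles the work per comparison, and the choice between max and min changes what the score means, which is the complication mentioned above.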
Probably no good way to fix this if you are using difflib, and documenting that it isn't commutative is our best shot.
How can I force fuzz.partial_ratio to allow only the 1st argument as the substring, and not the other way round? For example, the following gives misleading matches:
I'm not implementing that in fuzzywuzzy, but you are free to do so in a wrapper function by just checking if one string is in another. Note that you'll definitely incur a performance penalty.
Would you be able to share an example of this? The substring match needs to be
fuzzy as well.
On 5 Jan 2018 9:02 pm, "Jose Diaz-Gonzalez" wrote:
I'm not implementing that in fuzzywuzzy, but you are free to do so in a
wrapper function by just checking if one string is in another. Note that
you'll definitely incur a performance penalty.
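One way such a wrapper might look, as a rough sketch only: it treats the first argument strictly as the candidate substring, never swaps the arguments, and still scores fuzzily by sliding a window over the second string. It uses difflib directly, the name `directed_partial_ratio` is hypothetical, and this is not how fuzzywuzzy itself computes partial_ratio.

```python
from difflib import SequenceMatcher


def directed_partial_ratio(needle: str, haystack: str) -> int:
    """Best 0-100 score of `needle` against any same-length window of `haystack`.

    Returns 0 when `needle` is empty or longer than `haystack`, so the longer
    string is never treated as the substring. Hypothetical helper.
    """
    if not needle or len(needle) > len(haystack):
        return 0
    if needle in haystack:
        # Exact containment: the cheap check suggested in the thread.
        return 100
    width = len(needle)
    best = 0.0
    # Fuzzy fallback: compare needle against every window of the same length.
    for start in range(len(haystack) - width + 1):
        window = haystack[start:start + width]
        best = max(best, SequenceMatcher(None, needle, window).ratio())
    return int(round(100 * best))


print(directed_partial_ratio("abc", "xxabcxx"))  # 100
print(directed_partial_ratio("xxabcxx", "abc"))  # 0
```

The window loop runs one SequenceMatcher comparison per position, which is where the performance penalty mentioned above comes from.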
One of the properties that you'd expect is that the order of the arguments doesn't matter, i.e. fuzz.ratio(A, B) == fuzz.ratio(B, A) for all strings A and B. This property is also listed at https://en.wikipedia.org/wiki/Edit_distance#Properties as one of the fundamental properties of edit distance.
While it seems to hold most of the time, unfortunately this is not always the case:
gives
I haven't tested it with the levenshtein package.