fuzzywuzzy returns 0 instead of 100 in many corner cases #196

rralf · 2018-03-21T14:41:15Z

Hi,

The following issues can be reproduced with python-levenshtein 0.12.0 and fuzzywuzzy 0.16.0.

Several input combinations that should count as identical are scored as completely dissimilar.

fuzz.ratio('', '')
evaluates to 0. Two empty strings are absolutely identical, so this should be 100.

fuzz.ratio('{', '{')
evaluates to 100, while
fuzz.token_sort_ratio(['{'], ['{'])
evaluates erroneously to 0. Inconsistently,
fuzz.token_sort_ratio(['{a'], ['{a'])
evaluates to 100 (which is correct IMO).

Proposal: Independent of the method, fuzzywuzzy should always return 100 if both arguments are absolutely equal. I think this could efficiently be checked.
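A minimal sketch of what such a check could look like, written as a decorator. The names `return_100_if_equal` and `toy_ratio` are hypothetical, and the scorer is a stand-in built on the standard library's `difflib`, not fuzzywuzzy's own code; it only mimics the "empty input scores 0" behaviour described above.

```python
import functools
from difflib import SequenceMatcher


def return_100_if_equal(func):
    """Hypothetical decorator: return 100 when both inputs compare equal."""
    @functools.wraps(func)
    def wrapper(s1, s2, *args, **kwargs):
        # A single equality comparison, done before any scoring work.
        if s1 == s2:
            return 100
        return func(s1, s2, *args, **kwargs)
    return wrapper


@return_100_if_equal
def toy_ratio(s1, s2):
    """Stand-in scorer (an assumption, not fuzzywuzzy's implementation)."""
    # Mimic the reported behaviour: empty input is scored as 0.
    if not s1 or not s2:
        return 0
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


if __name__ == "__main__":
    print(toy_ratio("", ""))      # equal inputs, handled by the decorator
    print(toy_ratio("{", "{"))    # equal inputs, handled by the decorator
    print(toy_ratio("", "a"))     # unequal inputs fall through to the scorer
```

Note that `None == None` would also take the early return here, so the order of this check relative to any None handling is a design decision.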

Thanks!

josegonzalez · 2018-03-21T16:25:08Z

Pull requests - with the accompanying tests - are welcome and encouraged :)

rralf · 2018-03-21T16:29:05Z

First I'd like to discuss these issues; I haven't looked at the code yet. I also still have to check whether this behaviour depends on python-levenshtein (which I'm using) or not.

Maybe there's even some rationale behind this and the behaviour is intended for some reason.

Or in other words: Am I facing a bug or does it work as intended? :-)

josegonzalez · 2018-03-21T16:45:51Z

From my perspective, the tests pass, so any behavior is expected. If you can add this functionality and tests continue to pass, great!

One thing to note is that the ratio may be misleading if the tokens are all the same but actually mis-ordered. That might have actual implications on someone's code, but if we're not testing for that now, it's fine to break imo.

rralf · 2018-03-22T11:13:39Z

I'm currently writing some patches.

Think I will have to touch the check_for_none and check_empty_string decorators... Let me return with a pull request in a couple of hours.

josegonzalez · 2018-03-22T12:39:12Z

Please be sure to include perf benchmarks :)

rralf · 2018-03-22T15:58:33Z

Hi,

Before opening a pull request, here are the fixes.

How can I run performance benchmarks?

Thanks
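One rough way to measure the cost of an added equality check, assuming nothing about the project's own benchmark tooling, is the standard library's `timeit`. In this sketch `plain_ratio`, `checked_ratio`, `PAIRS`, and `bench` are all hypothetical names, and the scorer is again a `difflib` stand-in rather than fuzzywuzzy itself.

```python
import timeit
from difflib import SequenceMatcher


def plain_ratio(s1, s2):
    """Stand-in scorer without the equality check (an assumption)."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def checked_ratio(s1, s2):
    """Same scorer with an equality short-circuit in front."""
    if s1 == s2:
        return 100
    return plain_ratio(s1, s2)


# Illustrative inputs: an equal pair, an unequal pair, and two empty strings.
PAIRS = [
    ("new york mets", "new york mets"),
    ("new york mets", "new york yankees"),
    ("", ""),
]


def bench(scorer, number=2000, repeat=3):
    """Return the best-of-`repeat` time in seconds for scoring all PAIRS `number` times."""
    def run():
        for a, b in PAIRS:
            scorer(a, b)
    return min(timeit.repeat(run, number=number, repeat=repeat))


if __name__ == "__main__":
    print("plain:  ", bench(plain_ratio))
    print("checked:", bench(checked_ratio))
```

Taking the minimum over several repeats reduces noise from other processes; the absolute numbers depend on the machine and only the comparison between the two variants is meaningful.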

zackkitzmiller · 2018-03-22T16:02:17Z

@rralf I looked over those changes and can assure @josegonzalez that they are performant. @josegonzalez do you actually need perf tests here? This changeset looks good to me.

josegonzalez · 2018-03-22T16:10:46Z

Sounds fine with me. PR it.

rralf · 2018-03-22T16:25:11Z

Huh, that went quickly... Thanks for reviewing.

maxbachmann mentioned this issue Apr 2, 2023

replace python-Levenshtein with rapidfuzz seatgeek/thefuzz#10

Merged
