Partial_Ratio not working #279

aW3st · 2020-08-27T17:08:25Z

Having some weird issues using partial ratio. Here's the code:

from fuzzywuzzy import fuzz

test_string = ('completed transactions settlement date trade date '
               'symbol name transaction type account type quantity price commissions & fees amount '
               '12/23 12/23 dividend '
               'appreciation etf dividend - - - $441.99 12/23 12/23 '
               'vig dividend appreciation etf reinvestment cash')

'etf' in test_string # returns True
fuzz.partial_ratio('etf', test_string)

Without python-Levenshtein this returns 33; with python-Levenshtein it returns 67. My understanding of the method is that it should return 100, since the second string contains a substring that matches 'etf' exactly. Any ideas?

(On Python 3.8, btw.)
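To make that expectation concrete, here is a minimal brute-force sketch using only the standard library. It is not fuzzywuzzy's implementation (which only tries the windows suggested by difflib's matching blocks); `partial_ratio_sketch` is a hypothetical helper that slides a window the length of the shorter string across the longer one and keeps the best ratio, with difflib's junk heuristic switched off.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    """Similarity as a 0-100 integer, in the style of fuzz.ratio."""
    # autojunk=False so no characters are silently ignored on long inputs
    return round(100 * SequenceMatcher(None, a, b, autojunk=False).ratio())


def partial_ratio_sketch(s1: str, s2: str) -> int:
    """Hypothetical brute-force partial ratio: the best ratio over every
    window of the longer string that is as long as the shorter string."""
    if not s1 or not s2:
        return 0
    shorter, longer = sorted((s1, s2), key=len)
    width = len(shorter)
    return max(
        ratio(shorter, longer[start:start + width])
        for start in range(len(longer) - width + 1)
    )


print(partial_ratio_sketch('etf', 'appreciation etf dividend'))  # 100
print(partial_ratio_sketch('carvar & clock', 'carvar clock'))    # 83
```

An exact substring always yields 100 here. The second call gives the same 83 reported later in the thread for the 'carvar & clock' example, because neither of those strings is a substring of the other.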

XDGFX · 2020-08-31T16:11:51Z

I'm having the same issue. I would also expect a score of 100 for the example below:

>>> artists_a
'carvar & clock'
>>> artists_b
'carvar clock'
>>> fuzz.partial_ratio(artists_a, artists_b)
83
>>> fuzz.partial_ratio(artists_b, artists_a)
83

I also tried without python-Levenshtein, as suggested in #79, but got exactly the same result.

XDGFX · 2020-08-31T16:20:49Z

Possibly replace partial_ratio with partial_token_sort_ratio, as mentioned in this Stack Overflow answer. It seemed to work as expected for both of our examples.

maxbachmann · 2020-09-01T01:18:58Z

partial_ratio searches for the best alignment between two strings and then calculates fuzz.ratio for that alignment. So in @aW3st's case, where the word 'etf' is part of the second string, you would expect a result of 100, but that's not the case in your example, @XDGFX.
When comparing 'carvar & clock' and 'carvar clock', neither is a substring of the other. partial_token_sort_ratio works, however, because it first preprocesses both strings (lowercasing them and replacing non-alphanumeric characters such as '&' with whitespace) and then sorts their tokens, so both become 'carvar clock'. After that, one is trivially a substring of the other ;)
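A rough stdlib approximation of that preprocessing, assuming fuzzywuzzy's default `full_process=True` behaviour (replace non-word characters with spaces, lowercase, trim) followed by sorting the tokens. `token_sort_key` is a hypothetical name, and the sketch skips fuzzywuzzy's `force_ascii` step.

```python
import re


def token_sort_key(s: str) -> str:
    """Approximate the preprocessing applied before token-sort comparisons."""
    # Replace non-word characters (such as '&' or ',') with spaces, lowercase, trim
    cleaned = re.sub(r'\W', ' ', s).lower().strip()
    # Split on runs of whitespace and re-join the tokens in sorted order
    return ' '.join(sorted(cleaned.split()))


print(token_sort_key('carvar & clock'))  # carvar clock
print(token_sort_key('carvar clock'))    # carvar clock
```

Since both inputs reduce to the identical string, any partial or full ratio computed on the preprocessed forms is 100.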

@aW3st, you tried both with and without python-Levenshtein, and both give wrong results, for different reasons.

python-Levenshtein has a known bug in finding the optimal alignment between strings, which is probably the bug you're encountering here as well. See: Broken partial_ratio functionality with python-Levenshtein #79
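When python-Levenshtein is not installed, fuzzywuzzy falls back to difflib. There the problem appears to be caused by difflib's automatic junk heuristic, which is enabled by default: for a second string of 200 or more characters it ignores characters that occur very frequently, which in your test string includes 'e' and 't', so difflib can never match 'etf' as a whole. Fixing it would require changing line 46 of fuzzywuzzy/fuzz.py (at commit 2188520) from

m = SequenceMatcher(None, shorter, longer)

to

m = SequenceMatcher(None, shorter, longer, autojunk=False)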
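The effect of the heuristic can be checked with difflib alone. This is a minimal sketch assuming CPython's rule that, when the second sequence has 200 or more items, any item occurring more than `len(b) // 100 + 1` times is treated as "popular" and left out of the match index; `longest_block` is a hypothetical helper.

```python
from difflib import SequenceMatcher

# The string from the original report (over 200 characters, so the
# automatic junk heuristic applies when autojunk=True).
test_string = ('completed transactions settlement date trade date '
               'symbol name transaction type account type quantity price commissions & fees amount '
               '12/23 12/23 dividend '
               'appreciation etf dividend - - - $441.99 12/23 12/23 '
               'vig dividend appreciation etf reinvestment cash')


def longest_block(shorter: str, longer: str, autojunk: bool) -> int:
    """Size of the largest matching block difflib finds between the strings."""
    m = SequenceMatcher(None, shorter, longer, autojunk=autojunk)
    return max(block.size for block in m.get_matching_blocks())


# With the heuristic on, 'e' and 't' are "popular" and dropped from the
# index, so only the single character 'f' can be matched.
print(longest_block('etf', test_string, autojunk=True))   # 1
print(longest_block('etf', test_string, autojunk=False))  # 3
```

With only a one-character block to anchor on, the difflib-based partial_ratio ends up scoring a window around the 'f' in 'fees', which is consistent with the 33 reported above.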

As a side note, my library rapidfuzz provides the same string-matching algorithms without this problem, so your example returns a score of 100, as you expected.

aW3st · 2020-09-01T19:59:45Z

Thanks Max, I'll give your library a shot!

thomkav · 2020-09-01T21:40:46Z

@maxbachmann Hi Max, I'm working with @aW3st on a project. We've swapped fuzzywuzzy for your library, and we're seeing great performance. Thanks!

maxbachmann mentioned this issue on Sep 1, 2020: Faulty result of partial ratio (without python-Levenshtein) #264 (open)

maxbachmann mentioned this issue on Mar 22, 2021: Disable automatic junk heuristic of difflib #303 (closed)

maxbachmann mentioned this issue on Apr 2, 2023: replace python-Levenshtein with rapidfuzz seatgeek/thefuzz#10 (merged)
