This repository has been archived by the owner on Aug 26, 2024. It is now read-only.
Finding best matches in a list gives wrong results. #77
Comments
@acslater00 Thoughts?
Yeah this definitely looks broken. Unfortunately I don't think it's safe to run
This example is contrived but not too unrealistic. In retrospect there probably shouldn't be a default
The problem here, I think, is that there is a mismatch between the expected input to the custom scorer (in this case, ratio.partial_ratio) and the input to extract.
Another reasonable solution is the following
ethanwhite added a commit to ethanwhite/core-transient that referenced this issue Nov 4, 2015
1. Set processor=str so that this design flaw in fuzzywuzzy (seatgeek/fuzzywuzzy#77) doesn't result in incorrect ratio calculations caused by lowercasing only one of the strings being compared. 2. Use a simple ratio for the string comparison instead of a weighted average with more complex ratios, which are less appropriate to the task.
ethanwhite added a commit to weecology/bbc-data-rescue that referenced this issue Jan 26, 2016
1. Set processor=str so that this design flaw in fuzzywuzzy (seatgeek/fuzzywuzzy#77) doesn't result in incorrect ratio calculations caused by lowercasing only one of the strings being compared. 2. Use a simple ratio for the string comparison instead of a weighted average with more complex ratios, which are less appropriate to the task.
paulbodean88 pushed a commit to paulbodean88/fuzzywuzzy that referenced this issue Sep 30, 2016
This was referenced Oct 29, 2016
The process.extract function doesn't seem to handle capitalised queries well with some scorers:
This is because here the choice string is processed but the query string is not:
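To make the mismatch concrete, here is a minimal sketch of the behaviour described above. It does not use fuzzywuzzy itself: simple_ratio, lopsided_extract and symmetric_extract are hypothetical stand-ins built on the standard library's difflib, and str.lower stands in for the library's default processor (an assumption, since the real processor does more than lowercasing).

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Case-sensitive similarity on a 0-100 scale (stand-in scorer)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def lopsided_extract(query, choices, scorer, processor=str.lower):
    """Mimics the reported flaw: the choice is processed, the query is not."""
    scored = [(choice, scorer(query, processor(choice))) for choice in choices]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def symmetric_extract(query, choices, scorer, processor=str.lower):
    """Applies the same processor to both sides before scoring."""
    processed_query = processor(query)
    scored = [(choice, scorer(processed_query, processor(choice)))
              for choice in choices]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


choices = ["abc 123", "123"]
broken = lopsided_extract("ABC 123", choices, simple_ratio)
fixed = symmetric_extract("ABC 123", choices, simple_ratio)

print(broken)  # "123" outranks "abc 123" because the letters never match
print(fixed)   # "abc 123" is a perfect match once both sides are lowercased
```

With the capitalised query, only the characters unaffected by case can match, so the shorter choice wins in the lopsided version even though the other choice is identical apart from case.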
With some other scorers (e.g., WRatio) things work fine because those scorers process both strings internally anyway.
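The reason such scorers are unaffected can be sketched as follows. normalising_ratio is a hypothetical scorer, not WRatio; it only assumes that the scorer lowercases both inputs itself, so it no longer matters whether the caller processed the query.

```python
from difflib import SequenceMatcher


def normalising_ratio(a, b):
    """Hypothetical scorer that lowercases both inputs before comparing."""
    a, b = a.lower(), b.lower()
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


query = "ABC 123"                        # unprocessed, capitalised query
processed_choices = ["abc 123", "123"]   # choices as they look after lowercasing

scores = {choice: normalising_ratio(query, choice)
          for choice in processed_choices}
best = max(scores, key=scores.get)

print(scores)
print(best)
```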
A workaround I use right now is: