Wrong (non symmetric) results for some strings if Levenshtein module is not present #5
Comments
Another example is the strings "9038119840000001" and "AS19970718106866036". We get 3 different answers:
This is probably because, without the Levenshtein library, this package falls back to a different algorithm from the standard library's difflib module.
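The asymmetry can be reproduced with the standard library alone. The sketch below assumes the fallback score is `difflib.SequenceMatcher.ratio()` scaled to 0-100 and rounded; `difflib_ratio` is a hypothetical helper name, not part of thefuzz.

```python
import difflib


def difflib_ratio(s1, s2):
    """Similarity on a 0-100 scale using difflib only (hypothetical helper)."""
    matcher = difflib.SequenceMatcher(None, s1, s2)
    return int(round(100 * matcher.ratio()))


# Same two strings, swapped argument order.
print(difflib_ratio("abc", "cadb"))  # 57
print(difflib_ratio("cadb", "abc"))  # 29
```

SequenceMatcher picks the longest matching block first and then recurses to the left and right of it. Which block is found first depends on the argument order, so the total number of matched characters (2 versus 1 here) can differ between the two calls.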
This is all very confusing. Could someone please explain what "If the python-Levenshtein library is not installed" means? Does it mean that if both "thefuzz" and "python-Levenshtein" are installed on the machine, then we should get the same results for "fuzz.ratio('abc', 'cadb')" and "fuzz.ratio('cadb', 'abc')"?
Yes, if both are installed you should get the same results.
@maxbachmann I am using this library in Amazon Redshift in a Python UDF. Both libraries are installed on my Amazon Redshift cluster: running "select * from pg_library;" gives me "python_levenshtein, setuptools and thefuzz", and I still get 29 for "SELECT f_fuzzy_string_match('cadb', 'abc');".
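Then you do not have a valid installation of python-Levenshtein. You can check this with:

    from thefuzz import fuzz
    import difflib

    # True means thefuzz fell back to difflib instead of using Levenshtein
    print(fuzz.SequenceMatcher == difflib.SequenceMatcher)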
@maxbachmann Something is not right here mate. In "StringMatcher.py", line 11 is "from Levenshtein import *", which looks like the project you're maintaining at https://github.com/maxbachmann/Levenshtein. The UDF fails with:

    ERROR: File "/rdsdbdata/user_lib/1/0/105733.zip/Levenshtein/__init__.py", line 17
        __author__: str = "Max Bachmann"
                  ^
    SyntaxError: invalid syntax. Please look at svl_udf_log for more information
    Detail:
    -----------------------------------------------
    error: File "/rdsdbdata/user_lib/1/0/105733.zip/Levenshtein/__init__.py", line 17
        __author__: str = "Max Bachmann"
                  ^
    SyntaxError: invalid syntax. Please look at svl_udf_log for more information
    code: 10000
    context: UDF
    query: 0
    location: udf_client.cpp:364
    process: padbmaster [pid=3125]
    -----------------------------------------------
    [ErrorId: 1-63081de5-55324c9c41e8b72f3fdf7abb]
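One reading of this traceback, not confirmed in the thread at this point: line 17 is an annotated assignment, which Python only accepts from version 3.6 onward (PEP 526), so an older interpreter rejects the file at import time. A minimal sketch for checking the interpreter from inside the environment follows; `supports_variable_annotations` is a hypothetical helper name.

```python
import sys


def supports_variable_annotations(version_info=None):
    """Return True if the interpreter accepts `name: type = value` (PEP 526)."""
    if version_info is None:
        version_info = sys.version_info
    # Variable annotations were added in Python 3.6.
    return tuple(version_info[:2]) >= (3, 6)


print(sys.version)
print(supports_variable_annotations())
```

If this prints False, the installed Levenshtein release cannot be imported by that interpreter, and thefuzz would presumably fall back to difflib.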
both should work
If the python-Levenshtein library is not installed, then fuzz.ratio('abc', 'cadb') returns 57 and fuzz.ratio('cadb', 'abc') returns 29.
If the python-Levenshtein library is used, then both calls return 57.
Similar errors happen for other strings too, e.g.:
'abcd', 'cbda'
'ONYZBOHON', 'ZRKFULFORD'
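For comparison, here is a sketch of a similarity that is symmetric by construction. It scores two strings by the length of their longest common subsequence (LCS), normalized as 2 * LCS / (len(a) + len(b)). This is an illustration only, not the python-Levenshtein implementation, and `lcs_length` and `lcs_ratio` are hypothetical names. It does reproduce the 57 reported above for 'abc' and 'cadb' in both argument orders.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(a, b):
    """Symmetric similarity on a 0-100 scale (hypothetical helper)."""
    total = len(a) + len(b)
    if total == 0:
        return 100
    return int(round(100.0 * 2 * lcs_length(a, b) / total))


pairs = [("abc", "cadb"), ("abcd", "cbda"), ("ONYZBOHON", "ZRKFULFORD")]
for s1, s2 in pairs:
    # Both orders give the same score because LCS does not depend on order.
    print(s1, s2, lcs_ratio(s1, s2), lcs_ratio(s2, s1))
```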