process.extractBests and usage of __str__ #332

Closed
banagale opened this issue Jun 12, 2023 · 2 comments

Labels
question Further information is requested

Comments


banagale commented Jun 12, 2023

I am trying to swap rapidfuzz in as a drop-in replacement in a project that depends on the last version of fuzzywuzzy released before the name change. This became necessary after hitting this issue.

The project uses process.extractBests. I noticed that rapidfuzz does not include process.extractBests.

Is process.extract a drop-in replacement for that old function?

I tried using process.extract and realized that the project relied on fuzzywuzzy reading the __str__ of the objects passed in the choices argument. Later in the code, each returned choice is used as an object. (This let the dev compare on the string while still working with the object itself.)

rapidfuzz does not seem to look at an object's __str__. Is this on purpose? Or perhaps FW should not have done this?

I mention both points because I believe the goal is for the FW API to be fully available in RF. I do not know whether this use of the FW API was unusual or an anti-pattern, though.

maxbachmann added the question (Further information is requested) label Jun 12, 2023
@maxbachmann
Member

In fuzzywuzzy there are both extract and extractBests; the difference is that extractBests has an additional score_cutoff parameter. In RapidFuzz I only have the extract function, which does provide the score_cutoff argument and so is equivalent to extractBests.

There are a couple of differences between RapidFuzz and fuzzywuzzy. In your specific case I assume you are using a scorer like WRatio, which defaults to force_ascii=True in fuzzywuzzy. Your choices are then preprocessed using utils.full_process with force_ascii=True, which runs str(sequence). This behaviour is not supported in rapidfuzz, so you will need to perform this conversion yourself. This can be done e.g. like this:

from rapidfuzz import process
process.extract(query, choices, processor=str)

or in case you want to use the preprocessing function:

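from rapidfuzz import process, utils

def preprocess(seq):
    # convert the object via its __str__, then apply rapidfuzz's default preprocessing
    return utils.default_process(str(seq))

process.extract(query, choices, processor=preprocess)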

@banagale
Author

Thank you for that feedback, Max!

I'll have another run at this and, if I run into difficulty, re-open this issue. I saw #333 and appreciate that. I had seen #26 and presumed the function had previously existed but was obviated.
