Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unique keyword arg #580

Merged
merged 16 commits into from
Sep 5, 2021
Merged

Unique keyword arg #580

merged 16 commits into from
Sep 5, 2021

Conversation

cosmicBboy
Copy link
Collaborator

@cosmicBboy cosmicBboy commented Jul 24, 2021

fixes #390

@fkrull8 I'm re-opening a pull request against the current master branch and will work on getting this merged

@cosmicBboy
Copy link
Collaborator Author

hey @fkrull8 would you mind giving me push access to your fork of pandera? https://github.com/fkrull8/pandera

Just fixed all the merge conflicts mentioned above.

@abyz0123
Copy link
Contributor

abyz0123 commented Aug 1, 2021

Hi @cosmicBboy , apologies for not seeing this earlier! I think I should have just provided you with collaborator access to the repo. The PR was only half finished, as I was having trouble with the property assignment of the new unique keyword in the base class. The tests that I had written should have failed where I was having trouble. If I could get an assist on that, I could pick this PR back up and iron it out.

@cosmicBboy
Copy link
Collaborator Author

no worries @fkrull8, and thanks for the write permissions! I'll iron out some of the tests and hand it back over to you

@codecov
Copy link

codecov bot commented Aug 8, 2021

Codecov Report

Merging #580 (9bcce4f) into master (3191be9) will decrease coverage by 0.67%.
The diff coverage is 98.35%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #580      +/-   ##
==========================================
- Coverage   99.45%   98.77%   -0.68%     
==========================================
  Files          21       29       +8     
  Lines        2572     3359     +787     
==========================================
+ Hits         2558     3318     +760     
- Misses         14       41      +27     
Impacted Files Coverage Δ
pandera/hypotheses.py 94.54% <ø> (ø)
pandera/schema_inference.py 100.00% <ø> (ø)
pandera/error_formatters.py 95.45% <75.00%> (-4.55%) ⬇️
pandera/dtypes.py 92.38% <92.34%> (-7.62%) ⬇️
pandera/strategies.py 97.71% <97.22%> (-2.29%) ⬇️
pandera/schemas.py 99.47% <98.75%> (+0.16%) ⬆️
pandera/engines/engine.py 98.82% <98.82%> (ø)
pandera/engines/pandas_engine.py 99.29% <99.29%> (ø)
pandera/__init__.py 100.00% <100.00%> (ø)
pandera/check_utils.py 100.00% <100.00%> (ø)
... and 28 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3191be9...9bcce4f. Read the comment docs.

@rbngz
Copy link
Contributor

rbngz commented Aug 17, 2021

Hello

Thank you for this PR. This functionality is exactly what we need.
Would you maybe also be able to integrate these changes into the FrictionlessFieldParser in order to fix #587 ?

@cosmicBboy
Copy link
Collaborator Author

hey @rbngz yes! this PR is almost ready.

@fkrull8 I managed to iron out a bunch of the tests since revamping the type system... I might be missing something, but this PR seems to be feature complete?

The only thing blocking is adding test coverage for these non-covered lines of code: https://github.com/pandera-dev/pandera/pull/580/checks?check_run_id=3281386477

@rbngz once this PR is merged would you be open to making a contribution to add support for multi-column uniqueness in FrictionlessFieldParser?

abyz0123 and others added 2 commits August 29, 2021 20:51
…unique_keyword_arg

� Conflicts:
�	pandera/io.py
�	pandera/model.py
�	pandera/model_components.py
�	pandera/schema_components.py
�	pandera/schemas.py
�	tests/core/test_schemas.py
@abyz0123
Copy link
Contributor

abyz0123 commented Sep 1, 2021

Hi @cosmicBboy, apologies for the late follow-up on this. Yes! The issues I was having before seem to have been ironed out. The feature that is not covered with this PR is the on the hypothesis side. As far as I can tell, there is no ready-made methods from hypothesis for setting unique-ness over multiple columns. That would have to be custom-made.

@abyz0123
Copy link
Contributor

abyz0123 commented Sep 2, 2021

I have to admit I'm a bit mystified by the coverage results -- the uncovered bits don't appear to be different than on master branch. The uncovered cases from #580 (comment) have been addressed with dc88f59 .

Sorry about the earlier linting errors, my dev environment wasnt set up properly for tox.

@cosmicBboy
Copy link
Collaborator Author

I have to admit I'm a bit mystified by the coverage results

yeah coverage results can be a little finicky.

@cosmicBboy
Copy link
Collaborator Author

@fkrull8 going to merge this now, thanks and congratulations on your first contribution to pandera 🎉

@cosmicBboy cosmicBboy changed the base branch from master to dev September 5, 2021 22:27
@cosmicBboy cosmicBboy merged commit df78e79 into unionai-oss:dev Sep 5, 2021
cosmicBboy added a commit that referenced this pull request Sep 9, 2021
* Unique keyword arg (#580)

* add copy button to docs (#448)

* Add missing inplace arg to SchemaModel's validate (#450)

* link documentation to github (#449)

Co-authored-by: Niels Bantilan <[email protected]>

* intermediate commit for review by @cosmicBboy

* link documentation to github (#449)

Co-authored-by: Niels Bantilan <[email protected]>

* intermediate commit for review by @cosmicBboy

* WIP

* fix test errors, re-factor allow_duplicates handling

* fix io tests

* fix docs, remove _allow_duplicates private var

* update unique type signature in strategies

* completing tests for setters and lazy evaluation of unique kw

* small fix for the linting errors

* support dataframe-level uniqueness in strategies

* add docs, fix error formatting, add multiindex support

Co-authored-by: Jean-Francois Zinque <[email protected]>
Co-authored-by: tfwillems <[email protected]>
Co-authored-by: fkroll8 <[email protected]>
Co-authored-by: fkroll8 <[email protected]>

* Add support for timezone-aware datetime strategies (#595)

* add support for Any annotation in schema model (#594)

* add support for Any annotation in schema model

the motivation behind this feature is to support column annotations
that can have any type, to support use cases like the one described
in #592, where
custom checks can be applied to any column except for ones that
are explicitly defined in the schema model class attributes

* update pylint, fix lint

* Docs/scaling - Bring Pandera to Spark and Dask (#588)

* scaling.rst

* edited conf

* finished first pass

* removing FugueWorkflow

* Update index.rst

* Update docs/source/scaling.rst

Co-authored-by: Niels Bantilan <[email protected]>

* add support for timezone-aware datetime strategies

* fix le/ge strategies with datetime

* fix mypy errors

Co-authored-by: Niels Bantilan <[email protected]>
Co-authored-by: Kevin Kho <[email protected]>

* support frictionless primary keys with multiple fields

Co-authored-by: Jean-Francois Zinque <[email protected]>
Co-authored-by: tfwillems <[email protected]>
Co-authored-by: fkroll8 <[email protected]>
Co-authored-by: fkroll8 <[email protected]>
Co-authored-by: Kevin Kho <[email protected]>
cosmicBboy added a commit that referenced this pull request Sep 10, 2021
* Unique keyword arg (#580)

* add copy button to docs (#448)

* Add missing inplace arg to SchemaModel's validate (#450)

* link documentation to github (#449)

Co-authored-by: Niels Bantilan <[email protected]>

* intermediate commit for review by @cosmicBboy

* link documentation to github (#449)

Co-authored-by: Niels Bantilan <[email protected]>

* intermediate commit for review by @cosmicBboy

* WIP

* fix test errors, re-factor allow_duplicates handling

* fix io tests

* fix docs, remove _allow_duplicates private var

* update unique type signature in strategies

* completing tests for setters and lazy evaluation of unique kw

* small fix for the linting errors

* support dataframe-level uniqueness in strategies

* add docs, fix error formatting, add multiindex support

Co-authored-by: Jean-Francois Zinque <[email protected]>
Co-authored-by: tfwillems <[email protected]>
Co-authored-by: fkroll8 <[email protected]>
Co-authored-by: fkroll8 <[email protected]>

* Add support for timezone-aware datetime strategies (#595)

* add support for Any annotation in schema model (#594)

* add support for Any annotation in schema model

the motivation behind this feature is to support column annotations
that can have any type, to support use cases like the one described
in #592, where
custom checks can be applied to any column except for ones that
are explicitly defined in the schema model class attributes

* update pylint, fix lint

* Docs/scaling - Bring Pandera to Spark and Dask (#588)

* scaling.rst

* edited conf

* finished first pass

* removing FugueWorkflow

* Update index.rst

* Update docs/source/scaling.rst

Co-authored-by: Niels Bantilan <[email protected]>

* add support for timezone-aware datetime strategies

* fix le/ge strategies with datetime

* fix mypy errors

Co-authored-by: Niels Bantilan <[email protected]>
Co-authored-by: Kevin Kho <[email protected]>

* schemas with multi-index columns correctly report errors (#600)

fixes #589

* strategies module supports undefined checks in regex columns (#599)

* Add support for empty data type annotation in SchemaModel (#602)

* remove artifacts of py3.6 support

* add support for empty data type annotation in SchemaModel

* fix frictionless version in dev dependencies

* fix setuptools version instead of frictionless

* fix setuptools pinning

* remove frictionless from core pandera deps (#609)

* support frictionless primary keys with multiple fields (#608)

* fix validation of check raising error without message (#613)

* docs/requirements.txt pin setuptools (#611)

* bump version 0.7.1

Co-authored-by: Jean-Francois Zinque <[email protected]>
Co-authored-by: tfwillems <[email protected]>
Co-authored-by: fkroll8 <[email protected]>
Co-authored-by: fkroll8 <[email protected]>
Co-authored-by: Kevin Kho <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add unique keyword option to all schemas and schema components
4 participants