Add support for timezone-aware datetime strategies #595

Merged
merged 5 commits into from
Sep 5, 2021

Conversation

Collaborator

@jeffzi jeffzi commented Sep 1, 2021

Version 0.7.0 introduced support for pandas.DatetimeTZDtype in a pandera.DataFrameSchema but that dtype is not recognized by the strategies module. This PR adds support for timezone-aware datetime strategies and fixes #534.

Hypothesis only supports pure numpy datetime64 (i.e. timezone-naive) dtypes. We therefore have to convert back and forth between pandas.Timestamp and datetime64, because checks need to see timezone-aware values while hypothesis.extra.pandas strategies need to see a numpy.dtype. Ideally, this should be fixed upstream.
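
To illustrate the conversion described above, here is a minimal sketch that uses only pandas and numpy. It is not the code from this PR: the helper names `to_tz_aware` and `to_naive` are hypothetical, and UTC is chosen as an assumption to avoid DST edge cases. The idea is that values are generated as naive `datetime64[ns]` and localized afterwards, so checks can see timezone-aware values.

```python
import numpy as np
import pandas as pd


def to_tz_aware(naive: pd.Series, tz: str) -> pd.Series:
    """Hypothetical helper: attach a timezone to naive datetime64 values."""
    # tz_localize interprets the naive values as wall-clock times in `tz`.
    return naive.dt.tz_localize(tz)


def to_naive(aware: pd.Series) -> pd.Series:
    """Hypothetical helper: drop the timezone, keeping the wall-clock times."""
    return aware.dt.tz_localize(None)


# Naive values, as a numpy-based generator would produce them.
naive = pd.Series(
    np.array(["2021-09-01T12:00:00", "2021-09-05T08:30:00"], dtype="datetime64[ns]")
)

aware = to_tz_aware(naive, "UTC")   # what a check would see
round_trip = to_naive(aware)        # back to a plain numpy dtype

print(aware.dtype)
print(round_trip.dtype)
```

With a non-UTC timezone, `tz_localize` can raise on ambiguous or nonexistent local times, which a real implementation would have to handle.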

In addition, a TypeError is now raised when calling strategies.pandas_dtypes_strategy with an unsupported dtype. That should help users understand why the strategy failed.
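
The fail-fast behaviour can be sketched as follows. This is an illustration under stated assumptions, not pandera's implementation: `numpy_dtype_for_strategy` is a hypothetical helper, and the set of supported numpy kinds is an arbitrary choice made for the example.

```python
import numpy as np
import pandas as pd


def numpy_dtype_for_strategy(dtype) -> np.dtype:
    """Hypothetical helper: pick the numpy dtype a generator would work with."""
    if isinstance(dtype, pd.DatetimeTZDtype):
        # Timezone-aware: generate naive values first, localize them later.
        return np.dtype(f"datetime64[{dtype.unit}]")
    # Assumed supported kinds: bool, int, unsigned int, float, datetime64.
    if isinstance(dtype, np.dtype) and dtype.kind in "biufM":
        return dtype
    # Anything else fails early with an explicit message.
    raise TypeError(f"Strategies for dtype '{dtype}' are not supported.")


print(numpy_dtype_for_strategy(pd.DatetimeTZDtype(tz="UTC")))

try:
    numpy_dtype_for_strategy(np.dtype("O"))
except TypeError as exc:
    print(exc)
```

Raising at lookup time, rather than letting generation fail later, puts the offending dtype in the error message.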

cosmicBboy and others added 2 commits September 1, 2021 14:04
* add support for Any annotation in schema model

the motivation behind this feature is to support column annotations
that can have any type, to support use cases like the one described
in unionai-oss#592, where
custom checks can be applied to any column except for ones that
are explicitly defined in the schema model class attributes

* update pylint, fix lint
* scaling.rst

* edited conf

* finished first pass

* removing FugueWorkflow

* Update index.rst

* Update docs/source/scaling.rst

Co-authored-by: Niels Bantilan <[email protected]>

codecov bot commented Sep 1, 2021

Codecov Report

Merging #595 (e9b75a7) into dev (abc817f) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev     #595      +/-   ##
==========================================
+ Coverage   98.73%   98.74%   +0.01%     
==========================================
  Files          29       29              
  Lines        3327     3355      +28     
==========================================
+ Hits         3285     3313      +28     
  Misses         42       42              
| Impacted Files | Coverage Δ |
| --- | --- |
| pandera/engines/pandas_engine.py | 99.30% <100.00%> (+<0.01%) ⬆️ |
| pandera/model.py | 99.15% <100.00%> (+<0.01%) ⬆️ |
| pandera/strategies.py | 97.56% <100.00%> (+0.19%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jeffzi jeffzi force-pushed the bugfix/datetime_tz_strategy branch from 41dfee5 to 4d42bba Compare September 2, 2021 19:14
Collaborator Author

jeffzi commented Sep 2, 2021

I rebased master into dev to catch up with the pylint fixes.

Collaborator

@cosmicBboy cosmicBboy left a comment

thanks @jeffzi! the strategies module is sort of a beast to wrap one's head around, but you seem to have managed :)

@cosmicBboy cosmicBboy merged commit f51c3aa into unionai-oss:dev Sep 5, 2021
Collaborator Author

jeffzi commented Sep 5, 2021

the strategies module is sort of a beast to wrap one's head around

It is, but I was already a bit familiar with it from the development of the data types! Now I have no excuse not to pick up issues related to strategies :)

cosmicBboy added a commit that referenced this pull request Sep 9, 2021
* Unique keyword arg (#580)

* add copy button to docs (#448)

* Add missing inplace arg to SchemaModel's validate (#450)

* link documentation to github (#449)

Co-authored-by: Niels Bantilan <[email protected]>

* intermediate commit for review by @cosmicBboy

* link documentation to github (#449)

Co-authored-by: Niels Bantilan <[email protected]>

* intermediate commit for review by @cosmicBboy

* WIP

* fix test errors, re-factor allow_duplicates handling

* fix io tests

* fix docs, remove _allow_duplicates private var

* update unique type signature in strategies

* completing tests for setters and lazy evaluation of unique kw

* small fix for the linting errors

* support dataframe-level uniqueness in strategies

* add docs, fix error formatting, add multiindex support

Co-authored-by: Jean-Francois Zinque <[email protected]>
Co-authored-by: tfwillems <[email protected]>
Co-authored-by: fkroll8 <[email protected]>
Co-authored-by: fkroll8 <[email protected]>

* Add support for timezone-aware datetime strategies (#595)

* add support for Any annotation in schema model (#594)

* add support for Any annotation in schema model

the motivation behind this feature is to support column annotations
that can have any type, to support use cases like the one described
in #592, where
custom checks can be applied to any column except for ones that
are explicitly defined in the schema model class attributes

* update pylint, fix lint

* Docs/scaling - Bring Pandera to Spark and Dask (#588)

* scaling.rst

* edited conf

* finished first pass

* removing FugueWorkflow

* Update index.rst

* Update docs/source/scaling.rst

Co-authored-by: Niels Bantilan <[email protected]>

* add support for timezone-aware datetime strategies

* fix le/ge strategies with datetime

* fix mypy errors

Co-authored-by: Niels Bantilan <[email protected]>
Co-authored-by: Kevin Kho <[email protected]>

* support frictionless primary keys with multiple fields

Co-authored-by: Jean-Francois Zinque <[email protected]>
Co-authored-by: tfwillems <[email protected]>
Co-authored-by: fkroll8 <[email protected]>
Co-authored-by: fkroll8 <[email protected]>
Co-authored-by: Kevin Kho <[email protected]>
cosmicBboy added a commit that referenced this pull request Sep 10, 2021
* Unique keyword arg (#580)

* add copy button to docs (#448)

* Add missing inplace arg to SchemaModel's validate (#450)

* link documentation to github (#449)

Co-authored-by: Niels Bantilan <[email protected]>

* intermediate commit for review by @cosmicBboy

* link documentation to github (#449)

Co-authored-by: Niels Bantilan <[email protected]>

* intermediate commit for review by @cosmicBboy

* WIP

* fix test errors, re-factor allow_duplicates handling

* fix io tests

* fix docs, remove _allow_duplicates private var

* update unique type signature in strategies

* completing tests for setters and lazy evaluation of unique kw

* small fix for the linting errors

* support dataframe-level uniqueness in strategies

* add docs, fix error formatting, add multiindex support

Co-authored-by: Jean-Francois Zinque <[email protected]>
Co-authored-by: tfwillems <[email protected]>
Co-authored-by: fkroll8 <[email protected]>
Co-authored-by: fkroll8 <[email protected]>

* Add support for timezone-aware datetime strategies (#595)

* add support for Any annotation in schema model (#594)

* add support for Any annotation in schema model

the motivation behind this feature is to support column annotations
that can have any type, to support use cases like the one described
in #592, where
custom checks can be applied to any column except for ones that
are explicitly defined in the schema model class attributes

* update pylint, fix lint

* Docs/scaling - Bring Pandera to Spark and Dask (#588)

* scaling.rst

* edited conf

* finished first pass

* removing FugueWorkflow

* Update index.rst

* Update docs/source/scaling.rst

Co-authored-by: Niels Bantilan <[email protected]>

* add support for timezone-aware datetime strategies

* fix le/ge strategies with datetime

* fix mypy errors

Co-authored-by: Niels Bantilan <[email protected]>
Co-authored-by: Kevin Kho <[email protected]>

* schemas with multi-index columns correctly report errors (#600)

fixes #589

* strategies module supports undefined checks in regex columns (#599)

* Add support for empty data type annotation in SchemaModel (#602)

* remove artifacts of py3.6 support

* add support for empty data type annotation in SchemaModel

* fix frictionless version in dev dependencies

* fix setuptools version instead of frictionless

* fix setuptools pinning

* remove frictionless from core pandera deps (#609)

* support frictionless primary keys with multiple fields (#608)

* fix validation of check raising error without message (#613)

* docs/requirements.txt pin setuptools (#611)

* bump version 0.7.1

Co-authored-by: Jean-Francois Zinque <[email protected]>
Co-authored-by: tfwillems <[email protected]>
Co-authored-by: fkroll8 <[email protected]>
Co-authored-by: fkroll8 <[email protected]>
Co-authored-by: Kevin Kho <[email protected]>