Replies: 9 comments
-
Hi @learnnk, can you please provide a minimal reproducible code snippet with your (i) schema and (ii) data? Basically what you've done above, except in an easily copy-pasteable form (plus the unexpected error/output you're seeing). This will help us help you 🤝
-
Thanks for your response.
import pandera as pa
class Retaildata(pa.SchemaModel):
df = pd.DataFrame({
try:
The output I'm getting is as below:
[1 rows x 6 columns]
As I said, I'm not getting a failure case for the negative value in orderid at index 1.
-
FYI you can use triple backticks (```) to preserve indentation:

```python
import pandera as pa
import pandas as pd
from pandera.typing import Index, DataFrame, Series
from pandera import Check, Column, DataFrameSchema


class Retaildata(pa.SchemaModel):
    orderid: Series[int] = pa.Field(coerce=True, nullable=False, gt=0)
    orderdate: Series[pa.DateTime] = pa.Field(coerce=True)
    orderamount: Series[int] = pa.Field(coerce=True)
    orderstatus: Series[str] = pa.Field(coerce=True, nullable=False)
```

This makes it easier to copy-paste into (in this case) my editor to help debug.
-
Ah, you're seeing just the coercion error (the failure to coerce `abc` to an integer) because pandera doesn't apply the column's checks when coercion fails.

This might be too strict an assumption, but I'd be open to changing this behavior if you and others find it unintuitive. In any case, it would be worth adding an explanation of this behavior to the lazy validation docs: https://pandera.readthedocs.io/en/stable/lazy_validation.html

I'm curious: what is your specific use case, and what are the ways you're using pandera/pandas to clean/validate data? E.g. is this a programmatic data pipeline, or are you using pandera in the context of a GUI where a human updates the state of the data with multiple rounds of validation?
-
Thanks for the quick update.

I'm curious: what is your specific use case, and what are the ways you're using pandera/pandas to clean/validate data? E.g. is this a programmatic data pipeline, or are you using pandera in the context of a GUI where a human updates the state of the data with multiple rounds of validation?

[learnnk] We are using Pandera for data quality validation. We built an in-house framework that automatically generates the schema model from my schema (flat file/DB), and we then validate against it using Pandera's built-in functions.

It would be better if this use case could be supported without depending on column type coercion.

Thanks!!
-
cool, so if I understand correctly, this is a summary of your situation:
my next question is: do you care that the resulting valid data are of the expected dtypes?
-
Here is my response. The output I'm getting is as below. Thanks!
-
Gotcha, so as I mentioned, Recommendation 1: Custom Check with no Dtype, or
-
Converting this to a discussion. @learnnk, feel free to continue the discussion there!
-
Hi Team,
I'm using the Pandera schema model to validate my file, which has an integer column.
My schema model:

    class Retaildata(pa.SchemaModel):
        orderid: Series[int] = pa.Field(coerce=True, nullable=False)
        orderdate: Series[pa.DateTime] = pa.Field(coerce=True)
        orderamount: Series[int] = pa.Field(coerce=True)
        orderstatus: Series[str] = pa.Field(coerce=True, nullable=False)

Data
orderid, orderdate, orderamount, orderstatus
0 1, 2013-07-25 00:00:00.0,11599, PENDING_PAYMENT
1 -1, 2013-07-25 00:00:00.0, 256, PENDING_PAYMENT
2 abc, 2013-07-25 00:00:00.0, 12111, COMPLETE
output
schema_context column ... failure_case index
0 Column orderid ... abc 2
[1 rows x 6 columns]
To test this column
Output: it always shows the failure case for the string value together with its index (point 1), but it does not show the index for point 2 above.
Can you please suggest how to capture the index of the failure case for the negative value (index 1 in the data)?