How to use pandera in conjunction with typeguard and/or as a type checker for the other arguments? #837
-
Question about pandera

Hi. I am facing the following scenario. Due to ecosystem limitations, I can't use any static type checkers, so I got around this by using typeguard on most of my functions to check for the right types at runtime. The issue I'm facing now is with a function that has both pandera and non-pandera type annotations. Typeguard seems to not like that pandas.core.frame.DataFrame (the type of the argument I pass in) is not pandera.typing.pandas.DataFrame.

@typeguard.typechecked
@pa.check_types
def some_function(df: pa.typing.DataFrame[Schema], some_arg: int) -> None:
    ...

This throws:

Any ideas how to get around this? Ultimately, I could make a pull request into typeguard to add functionality to ignore some arguments, but I'm trying to see if there's a simpler solution. Can pandera check the other types as well? Thanks 🐶 😄
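For reference, the "ignore some arguments" idea can be sketched with the standard library alone. This is only an illustration under stated assumptions: `typechecked_except` is a hypothetical decorator, not a typeguard or pandera feature, `TypedFrame` is a stand-in for the pandera-typed argument, and only plain-class annotations are checked.

```python
import functools
import inspect
from typing import get_origin, get_type_hints


def typechecked_except(*skip):
    """Hypothetical decorator: isinstance-check annotated arguments, skipping the named ones."""
    def decorator(func):
        sig = inspect.signature(func)
        hints = get_type_hints(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                expected = hints.get(name)
                # Check only plain classes; skipped names, unannotated
                # arguments, and parameterized generics are left alone.
                is_plain_class = isinstance(expected, type) and get_origin(expected) is None
                if name in skip or not is_plain_class:
                    continue
                if not isinstance(value, expected):
                    raise TypeError(
                        f"{name} must be {expected.__name__}, got {type(value).__name__}"
                    )
            return func(*args, **kwargs)
        return wrapper
    return decorator


class TypedFrame(list):
    """Stand-in for a pandera-typed dataframe annotation (illustration only)."""


@typechecked_except("df")
def some_function(df: TypedFrame, some_arg: int) -> int:
    return len(df) + some_arg


# df is a plain list, not a TypedFrame, but it is skipped by this checker.
result = some_function([1, 2], 3)

# some_arg is still checked.
try:
    some_function([1, 2], "3")
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

In a setup like this, the skipped `df` argument would be left for another validator (for example `pa.check_types`) to handle.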
Replies: 6 comments 1 reply
-
In the meantime, for now I switched to using |
-
hi @acovaci ! Is typeguard a requirement for your use case? Another thing you can do is use |
-
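If you really care about types, then you can actually use pandera.typing.DataFrame[Schema](data) (where data is some valid data you want for instantiating the dataframe). In this case, the types should match and typeguard should stop complaining.

You can think of pandera.typing.DataFrame[Schema](...) as a schema-typed dataframe. Initializing a dataframe like this basically validates the data coming in at initialization.

@typeguard.typechecked
@pa.check_types
def some_function(df: pa.typing.DataFrame[Schema], some_arg: int) -> None:
    ...

# when you invoke the function, make sure to pass in the schema-typed dataframe
some_function(pa.typing.DataFrame[Schema](...), some_arg=1)

Although you're not using mypy, it would still be worth reading this mypy integration guide to understand the gotchas of using typed dataframes.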
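To illustrate why passing a schema-typed dataframe satisfies a runtime checker, here is a minimal stdlib-only sketch. It assumes the checker does an isinstance test against the annotated class, and it uses hypothetical stand-ins (`Frame`, `TypedFrame`, `Schema`, `matches`) rather than the real pandas, pandera, or typeguard objects, so treat it as a model of the behaviour and not as either library's implementation.

```python
from typing import Generic, TypeVar, get_origin

T = TypeVar("T")


class Frame:
    """Stand-in for pandas.DataFrame (illustration only)."""

    def __init__(self, rows):
        self.rows = list(rows)


class TypedFrame(Frame, Generic[T]):
    """Stand-in for pandera.typing.DataFrame: a generic subclass of Frame."""


class Schema:
    """Stand-in for a pandera schema class."""


def matches(value, annotation) -> bool:
    """Assumed checker behaviour: isinstance against the annotation's origin class."""
    origin = get_origin(annotation) or annotation
    return isinstance(value, origin)


plain = Frame([{"a": 1}])
typed = TypedFrame[Schema]([{"a": 1}])

# A base-class instance is not an instance of the subclass, so the check fails.
plain_ok = matches(plain, TypedFrame[Schema])

# An instance built through the typed class passes the same check.
typed_ok = matches(typed, TypedFrame[Schema])
```

Under these assumptions `plain_ok` is False and `typed_ok` is True, which mirrors the situation in the question: the plain dataframe is rejected, the schema-typed one is accepted.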
-
@cosmicBboy Thanks so much, I'll give the guide a read.
Not really, it was just the easiest library to integrate into my ecosystem.
Ah this is awesome, though it seems like I can't pass a config to it? This is how I'm currently using it:

@pydantic.validate_arguments(config={"arbitrary_types_allowed": True})
def some_function(df1: pa.typing.DataFrame[Schema], df2: pd.DataFrame, some_var: int) -> None:
    ...

Sadly, I need this for passing pd.DataFrame arguments, again for ecosystem reasons. If that's not possible, the explicit type instantiation is defo an option. Thanks 👼
-
You're welcome! Converting this to a discussion, mind marking my answer as the correct one?
-
Do you think it might be worth it making a pull request implementing passing a config object to pydantic.BaseModel.Config.arbitrary_types_allowed = True |