Nested Datatypes specified in schema #825
-
How do we specify a nested datatype like a list/array of integers?

I would like to specify a nested data structure in the schema. Below, I am trying to specify a list of integers for "column0". I could not find any documentation on how to specify nested types. Is it possible?

import pandas as pd
import pandera as pa

df = pd.DataFrame({
"column0": [[1, 2], [4], [0, 5, 7], [10], [9, 8]],
"column1": [1, 4, 0, 10, 9],
"column3": ["value_1", "value_2", "value_3", "value_2", "value_1"],
})
# define schema
schema = pa.DataFrameSchema({
"column0": pa.Column(int),  # <-- ERROR HERE
"column1": pa.Column(int, checks=pa.Check.le(10)),
"column3": pa.Column(str),
})
validated_df = schema(df)
print(validated_df)

Getting the following ERROR message:
Replies: 2 comments
-
You can use object as the dtype, and then use custom checks to validate the contents of that column:

# define schema
schema = pa.DataFrameSchema({
"column0": pa.Column(
object,
checks=pa.Check(lambda element: all(isinstance(x, int) for x in element), element_wise=True)
),
"column1": pa.Column(int, checks=pa.Check.le(10)),
"column3": pa.Column(str),
})
-
@cosmicBboy Thank you for your help, that resolved my issue.