Define schemas with pydantic #9969
Is there a way to define table schemas as pydantic models and use that as the source of truth for what the data contains, irrespective of the backend used (duckdb in my case)?
Replies: 1 comment 3 replies
Hey @tgy! We don't have explicit support for
>>> import ibis
>>> sch = ibis.schema({"a": "array<int>", "b": "float32", "c": "str"})
>>> ibis.table(sch, name="my_table")
UnboundTable: my_table
a array<int64>
b float32
c string
Or, if you prefer not to use our string parsing to get the datatypes, you can use explicit ibis dtypes:
>>> import ibis.expr.datatypes as dt
>>> sch = ibis.schema({"a": dt.Array(dt.int64), "b": dt.float32, "c": dt.string})
>>> ibis.table(sch, name="my_table")
UnboundTable: my_table
a array<int64>
b float32
c string
Alternatively, if you have a schema-like thing defined in any one of
>>> from ibis.expr.schema import Schema
>>> sch = Schema.from_polars(schema_obj)  # or `pyarrow`, etc...
Now you can operate on this table like any other table:
>>> t = ibis.table(sch, name="my_table")
>>> expr = t.mutate(c=t.c.replace("foo", "bar"))
>>> expr
r0 := UnboundTable: my_table
a array<int64>
b float32
c string
Project[r0]
a: r0.a
b: r0.b
c: StringReplace(r0.c, pattern='foo', replacement='bar')
Once you have the expression how you want it (or just to test it out), you can run it against actual data by doing something like:
con = ibis.duckdb.connect("my_db_with_that_actual_table_definition.ddb")
con.to_pandas(expr)
You could do something like this? Depending on how you want to type the dataclass, you might have to add in some type mapping (like if you were typing things using Python builtin types instead of Ibis types).
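To make the type-mapping idea concrete, here is a minimal sketch, not the code from the reply above. It assumes a dataclass typed with Python builtin types and turns it into a dict of Ibis type strings. The names `PY_TO_IBIS`, `ibis_type_string`, `dataclass_to_ibis_types`, and `MyTable` are hypothetical, and the mapping only covers a handful of types.

```python
from dataclasses import dataclass, fields
from typing import get_args, get_origin, get_type_hints

# Hypothetical mapping from Python builtin types to Ibis type strings.
# Extend it for whatever types your own models use.
PY_TO_IBIS = {int: "int64", float: "float64", str: "string", bool: "boolean"}


def ibis_type_string(tp) -> str:
    """Translate a Python type annotation into an Ibis type string."""
    # list[T] becomes array<T>, recursing into the element type.
    if get_origin(tp) is list:
        (item,) = get_args(tp)
        return f"array<{ibis_type_string(item)}>"
    try:
        return PY_TO_IBIS[tp]
    except KeyError:
        raise TypeError(f"no Ibis mapping for {tp!r}") from None


def dataclass_to_ibis_types(cls) -> dict[str, str]:
    """Build a {column name: Ibis type string} dict from a dataclass."""
    hints = get_type_hints(cls)  # resolves string annotations too
    return {f.name: ibis_type_string(hints[f.name]) for f in fields(cls)}


@dataclass
class MyTable:
    a: list[int]
    b: float
    c: str


mapping = dataclass_to_ibis_types(MyTable)
print(mapping)
```

The resulting dict has the same shape as the one passed to `ibis.schema(...)` in the first reply, so `ibis.table(ibis.schema(mapping), name="my_table")` should give an unbound table for the dataclass. A pydantic model could probably be handled the same way by iterating over its field annotations instead of `fields(cls)`, but that part is an assumption and untested here.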