Issue with #556
Create a small Parquet file with a nested (list of struct) column:
import numpy as np
import shapely
import pyarrow as pa
import pyarrow.parquet as pq
table = pa.table({
"geometry": shapely.to_wkb(shapely.points(np.ones((3, 2)))),
"col_flat": [0, 1, 2],
"col_nested": [[{"a": 1, "b": 2}]*2]*3
})
pq.write_table(table, "test_nested.parquet")
Reading with pyogrio works with the default settings, but fails with use_arrow=True:
>>> pyogrio.read_dataframe("test_nested.parquet")
col_flat col_nested geometry
0 0 [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}] POINT (1 1)
1 1 [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}] POINT (1 1)
2 2 [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}] POINT (1 1)
>>> pyogrio.read_dataframe("test_nested.parquet", use_arrow=True)
...
File ~/scipy/repos/pyogrio/pyogrio/geopandas.py:336, in read_dataframe(path_or_buffer, layer, encoding, columns, read_geometry, force_2d, skip_features, max_features, where, bbox, mask, fids, sql, sql_dialect, fid_as_index, use_arrow, on_invalid, arrow_to_pandas_kwargs, **kwargs)
    334 for ogr_subtype, c in zip(meta["ogr_subtypes"], df.columns):
    335     if ogr_subtype == "OFSTJSON":
--> 336         df[c] = df[c].map(json.loads, na_action="ignore")
    338 if fid_as_index:
    339     df = df.set_index(meta["fid_column"])
...
TypeError: the JSON object must be str, bytes or bytearray, not int
I think the problem lies in zip(meta["ogr_subtypes"], df.columns), where the order of the columns in meta is not guaranteed to match the order of df.columns.
In this case, I think the mismatch arises because the geometry column is at the start of the file and is not included in the meta?
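To illustrate the suspected misalignment, here is a minimal sketch in plain Python (no pyogrio or pandas involved). The names `fields`, `ogr_subtypes`, `columns` and `decode_json_columns` are hypothetical stand-ins; in particular, it assumes a list of field names is available in the same order as the subtypes, and that the geometry column comes first in the frame but has no entry in the subtypes list. It is not the actual pyogrio code, only a picture of why positional pairing can hit the wrong column and how pairing by name avoids it.

```python
import json

# Hypothetical stand-ins: the subtypes describe attribute fields only,
# while the column mapping also holds the geometry column, in first position.
fields = ["col_flat", "col_nested"]
ogr_subtypes = ["OFSTNone", "OFSTJSON"]
columns = {
    "geometry": [b"\x01", b"\x01"],
    "col_flat": [0, 1],
    "col_nested": ['[{"a": 1, "b": 2}]', '[{"a": 1, "b": 2}]'],
}

# Positional pairing: each subtype is matched with whatever column sits at
# the same index, so "OFSTJSON" ends up paired with "col_flat".
positional = [c for s, c in zip(ogr_subtypes, columns) if s == "OFSTJSON"]

# Name-based pairing: each subtype is matched with the field name it was
# reported for, independent of the column order in the frame.
by_name = [
    name
    for s, name in zip(ogr_subtypes, fields)
    if s == "OFSTJSON" and name in columns
]


def decode_json_columns(columns, names):
    """Return a copy of `columns` with the named columns parsed as JSON."""
    out = dict(columns)
    for name in names:
        out[name] = [json.loads(v) if v is not None else None for v in out[name]]
    return out


decoded = decode_json_columns(columns, by_name)
```

Calling `decode_json_columns(columns, positional)` instead feeds integers to `json.loads` and raises the same kind of `TypeError` as in the traceback above.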