BUG: reading file with JSON ogr subtype is broken with use_arrow=True #592

@jorisvandenbossche


Issue with #556

Create a small Parquet file with a nested (list of struct) column:

import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq
import shapely

table = pa.table({
    # geometry is deliberately the first column in the file
    "geometry": shapely.to_wkb(shapely.points(np.ones((3, 2)))),
    "col_flat": [0, 1, 2],
    # nested column: a list of structs per row
    "col_nested": [[{"a": 1, "b": 2}]*2]*3
})
pq.write_table(table, "test_nested.parquet")

Reading this file with pyogrio works with the default settings, but fails with use_arrow=True:

>>> pyogrio.read_dataframe("test_nested.parquet")
   col_flat                            col_nested     geometry
0         0  [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}]  POINT (1 1)
1         1  [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}]  POINT (1 1)
2         2  [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}]  POINT (1 1)

>>> pyogrio.read_dataframe("test_nested.parquet", use_arrow=True)
...
File ~/scipy/repos/pyogrio/pyogrio/geopandas.py:336, in read_dataframe(path_or_buffer, layer, encoding, columns, read_geometry, force_2d, skip_features, max_features, where, bbox, mask, fids, sql, sql_dialect, fid_as_index, use_arrow, on_invalid, arrow_to_pandas_kwargs, **kwargs)
    334 for ogr_subtype, c in zip(meta["ogr_subtypes"], df.columns):
    335     if ogr_subtype == "OFSTJSON":
--> 336         df[c] = df[c].map(json.loads, na_action="ignore")
    338 if fid_as_index:
    339     df = df.set_index(meta["fid_column"])
...
TypeError: the JSON object must be str, bytes or bytearray, not int

I think the problem lies in zip(meta["ogr_subtypes"], df.columns): the order of the fields in meta is not guaranteed to match the order of df.columns, so a subtype can end up paired with the wrong column.
In this case I think the mismatch arises because the geometry column comes first in the file and is not included in the meta?
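
To illustrate the suspected misalignment, here is a minimal stdlib-only sketch. It assumes that meta also carries a "fields" list with the attribute column names in the same order as meta["ogr_subtypes"]; the helper names, the plain list standing in for df.columns, and the "OFSTNone" placeholder value are made up for illustration and are not pyogrio code.

```python
import json

# Stand-in for df.columns on the Arrow path: geometry comes first,
# as in the test file above.
df_columns = ["geometry", "col_flat", "col_nested"]

# Hypothetical meta: attribute fields only, geometry excluded.
# "fields" is assumed to be ordered like "ogr_subtypes".
meta = {
    "fields": ["col_flat", "col_nested"],
    "ogr_subtypes": ["OFSTNone", "OFSTJSON"],
}


def json_columns_positional(meta, columns):
    """Pair subtypes with columns by position, as in the traceback."""
    return [
        col
        for subtype, col in zip(meta["ogr_subtypes"], columns)
        if subtype == "OFSTJSON"
    ]


def json_columns_by_name(meta, columns):
    """Pair subtypes with the field names from meta instead."""
    return [
        name
        for subtype, name in zip(meta["ogr_subtypes"], meta["fields"])
        if subtype == "OFSTJSON" and name in columns
    ]


# Positional pairing is shifted by the leading geometry column, so the
# integer column gets selected for JSON decoding.
misaligned = json_columns_positional(meta, df_columns)

# Name-based pairing selects the intended column.
aligned = json_columns_by_name(meta, df_columns)

# Decoding an integer value reproduces the kind of TypeError reported above.
try:
    json.loads(0)
    raised = False
except TypeError:
    raised = True

# Decoding the values of the intended column works.
decoded = [json.loads(v) for v in ['[{"a": 1, "b": 2}]'] * 3]
```

If that assumption about meta holds, iterating over the field names from meta rather than over df.columns would make the pairing independent of where the geometry column sits in the file.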
