
Pyogrio and Use Arrow - Read GDB Layer - "Categorical categories cannot be null" #535

@zachariahBinx


Introduction
I came across an interesting issue when using GeoPandas to read gdb layers into a GeoDataFrame with use_arrow=True and the engine set to pyogrio (the default). I found a Stack Exchange post suggesting use_arrow=True to speed up reading and writing, so I wanted to try it out.

Problem
If a gdb has a domain with the properties below and the FIRST code is in the range 1 <= x <= 99, pandas raises the following error: ValueError: Categorical categories cannot be null. If the first code is set to any other number (positive or negative), the layer reads fine. For example, with the first code set to 150 and the second set to 50, reading passes. It fails only when the first code falls in 1 <= x <= 99.

Properties

Domain Name = Domain_One
Description = Some Description Text
Field Type = Long
Domain Type = Coded value Domain
Split Policy = Default
Merge Policy = Default

Solutions

  1. Rearrange the domain codes so the first code falls outside 1 <= x <= 99
  2. Set the GeoPandas I/O engine to "fiona"

Environment Set Up

uv venv --python 3.11
.venv/scripts/activate
uv init
uv add geopandas
uv add pyarrow
uv add fiona

pyproject.toml

[project]
name = "pyogrio-arrow"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "fiona>=1.10.1",
    "geopandas>=1.0.1",
    "pyarrow>=19.0.1",
]

Code

import os
import geopandas
geopandas.options.io_engine = "pyogrio"
os.environ["PYOGRIO_USE_ARROW"] = '1'

def main():
    file_path = r"C:\Users\MY_USER\TEST_DATA\PYOGRIO_USE_ARROW\Test.gdb"
    gdf = geopandas.read_file(file_path, layer="Test_Layer")

if __name__ == "__main__":
    main()

GDB

ArcGIS Pro 3.1.2

Created a gdb with a "Test" polygon feature class containing 4 geometries, and a domain of type "Long" with a code set to 50. The field "Field" uses the domain "Domain_one".


Error Message

Traceback (most recent call last):
  File "C:\Users\MY_USER\PYOGRIO_ARROW\main.py", line 12, in <module>
    main()
  File "C:\Users\MY_USER\PYOGRIO_ARROW\main.py", line 8, in main
    gdf = geopandas.read_file(file_path, layer="Test_Layer")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\geopandas\io\file.py", line 294, in _read_file
    return _read_file_pyogrio(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\geopandas\io\file.py", line 547, in _read_file_pyogrio
    return pyogrio.read_dataframe(path_or_bytes, bbox=bbox, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pyogrio\geopandas.py", line 292, in read_dataframe
    df = table.to_pandas(**kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow\\array.pxi", line 889, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow\\table.pxi", line 5132, in pyarrow.lib.Table._to_pandas
  File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pyarrow\pandas_compat.py", line 824, in table_to_dataframe
    blocks = [
             ^
  File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pyarrow\pandas_compat.py", line 825, in <listcomp>
    _reconstruct_block(item, column_names, ext_columns_dtypes)
  File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pyarrow\pandas_compat.py", line 739, in _reconstruct_block
    arr = _pandas_api.categorical_type.from_codes(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pandas\core\arrays\categorical.py", line 745, in from_codes
    dtype = CategoricalDtype._from_values_or_dtype(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pandas\core\dtypes\dtypes.py", line 338, in _from_values_or_dtype
    dtype = CategoricalDtype(categories, ordered)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pandas\core\dtypes\dtypes.py", line 221, in __init__
    self._finalize(categories, ordered, fastpath=False)
  File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pandas\core\dtypes\dtypes.py", line 378, in _finalize
    categories = self.validate_categories(categories, fastpath=fastpath)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pandas\core\dtypes\dtypes.py", line 576, in validate_categories
    raise ValueError("Categorical categories cannot be null")
ValueError: Categorical categories cannot be null
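The last frames show pandas itself rejecting a null category, independent of GDAL. A minimal sketch that reproduces the same ValueError with pandas alone, assuming the categories pyarrow passes in contain a null (here `None` next to the code 50); `try_from_codes` is a hypothetical wrapper for illustration:

```python
import pandas as pd


def try_from_codes(codes, categories):
    """Return a Categorical, or the ValueError message pandas raises."""
    try:
        return pd.Categorical.from_codes(codes, categories=categories)
    except ValueError as exc:
        return str(exc)


# A null among the categories is rejected, as in the traceback above.
print(try_from_codes([1, 1, 0], categories=[None, 50]))
# prints "Categorical categories cannot be null"

# pandas represents a missing value as code -1, not as a null category.
print(try_from_codes([0, 0, -1], categories=[50]))
```

This is why the error surfaces in `from_codes` rather than in pyogrio: the row values themselves may be fine, but a null inside the category list is never allowed.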
