Description
Introduction
I came across an interesting issue when using GeoPandas to read gdb layers into a GeoDataFrame with use_arrow=True and the engine set to pyogrio (the default). I found a stack post about setting "use_arrow" to True to speed up reading and writing, so I wanted to try it out.
Problem
If a gdb has a domain with the properties below and the FIRST Code is in the range 1 <= x <= 99, pandas raises the following error: ValueError: Categorical categories cannot be null. If the first Code is set to any other number (positive or negative), the read succeeds. If the first Code is set to 150 and the second Code is set to 50, the read also succeeds. It fails only when the first Code is in the range 1 <= x <= 99.
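For reference, the same pandas error can be triggered without any gdb. The traceback further down ends in Categorical.from_codes, so this is a minimal sketch that calls it directly with a categories list containing a null. The categories and codes values are made up for illustration; how a null ends up in the categories during the real read is not established here.

```python
import pandas as pd

# Hypothetical stand-in for the categories built during the Arrow read:
# one null entry followed by one real label. These values are assumptions,
# not data taken from the test gdb.
categories = [None, "Some coded value"]
codes = [1, 1, 1, 1]  # four rows, all pointing at the non-null category

try:
    pd.Categorical.from_codes(codes, categories=categories)
    error_message = None
except ValueError as exc:
    # pandas validates the categories before building the array
    error_message = str(exc)

print(error_message)
```

This only shows that pandas rejects null categories regardless of the codes; it does not explain why the 1 <= x <= 99 range matters.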
Properties
Domain Name = Domain_One
Description = Some Description Text
Field Type = Long
Domain Type = Coded Value Domain
Split Policy = Default
Merge Policy = Default
Solutions
- Rearrange domain codes
- Set the geopandas engine to "fiona"
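The second workaround can be wrapped so the Arrow path is still tried first. This is only a sketch: read_layer_with_fallback and its read parameter are hypothetical names, and it assumes the reader accepts the engine keyword the way geopandas.read_file does.

```python
def read_layer_with_fallback(read, path, layer):
    """Try the default read first; retry with the fiona engine on the null-categories error.

    `read` is expected to behave like geopandas.read_file (an assumption of this sketch).
    """
    try:
        return read(path, layer=layer)
    except ValueError as exc:
        # Only swallow the specific error from this report; re-raise anything else.
        if "Categorical categories cannot be null" not in str(exc):
            raise
        return read(path, layer=layer, engine="fiona")


# Intended usage (requires geopandas and fiona to be installed):
#   import geopandas
#   gdf = read_layer_with_fallback(geopandas.read_file, file_path, "Test_Layer")
```

Passing the reader in as an argument keeps the helper independent of the global geopandas.options.io_engine setting.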
Environment Set Up
uv venv --python 3.11
.venv/scripts/activate
uv init
uv add geopandas
uv add pyarrow
uv add fiona
pyproject.toml
[project]
name = "pyogrio-arrow"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
"fiona>=1.10.1",
"geopandas>=1.0.1",
"pyarrow>=19.0.1",
]
Code
import os
import geopandas
geopandas.options.io_engine = "pyogrio"
os.environ["PYOGRIO_USE_ARROW"] = '1'

def main():
    file_path = r"C:\Users\MY_USER\TEST_DATA\PYOGRIO_USE_ARROW\Test.gdb"
    gdf = geopandas.read_file(file_path, layer="Test_Layer")


if __name__ == "__main__":
    main()
GDB
ArcPro 3.1.2
Created a gdb with a "Test" polygon feature class containing 4 geometries, and a domain of type "Long" with a Code set to 50. The domain of the field "Field" is set to "Domain_One".
Error Message
Traceback (most recent call last):
File "C:\Users\MY_USER\PYOGRIO_ARROW\main.py", line 12, in <module>
main()
File "C:\Users\MY_USER\PYOGRIO_ARROW\main.py", line 8, in main
gdf = geopandas.read_file(file_path, layer="Test_Layer")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\geopandas\io\file.py", line 294, in _read_file
return _read_file_pyogrio(
^^^^^^^^^^^^^^^^^^^
File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\geopandas\io\file.py", line 547, in _read_file_pyogrio
return pyogrio.read_dataframe(path_or_bytes, bbox=bbox, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pyogrio\geopandas.py", line 292, in read_dataframe
df = table.to_pandas(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow\\array.pxi", line 889, in pyarrow.lib._PandasConvertible.to_pandas
File "pyarrow\\table.pxi", line 5132, in pyarrow.lib.Table._to_pandas
File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pyarrow\pandas_compat.py", line 824, in table_to_dataframe
blocks = [
^
File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pyarrow\pandas_compat.py", line 825, in <listcomp>
_reconstruct_block(item, column_names, ext_columns_dtypes)
File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pyarrow\pandas_compat.py", line 739, in _reconstruct_block
arr = _pandas_api.categorical_type.from_codes(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pandas\core\arrays\categorical.py", line 745, in from_codes
dtype = CategoricalDtype._from_values_or_dtype(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pandas\core\dtypes\dtypes.py", line 338, in _from_values_or_dtype
dtype = CategoricalDtype(categories, ordered)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pandas\core\dtypes\dtypes.py", line 221, in __init__
self._finalize(categories, ordered, fastpath=False)
File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pandas\core\dtypes\dtypes.py", line 378, in _finalize
categories = self.validate_categories(categories, fastpath=fastpath)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\MY_USER\PYOGRIO_ARROW\.venv\Lib\site-packages\pandas\core\dtypes\dtypes.py", line 576, in validate_categories
raise ValueError("Categorical categories cannot be null")
ValueError: Categorical categories cannot be null
