An empty string ends up as `dataset.pieces[0]`, and pyarrow ultimately throws the following error because this is not a valid filename:
Traceback (most recent call last):
  File "build_petastorm_dataset.py", line 103, in <module>
    run(args)
  File "build_petastorm_dataset.py", line 79, in run
    .parquet(args.output_url)
  File "/opt/conda/default/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/opt/conda/default/lib/python3.6/site-packages/petastorm/etl/dataset_metadata.py", line 113, in materialize_dataset
    _generate_unischema_metadata(dataset, schema)
  File "/opt/conda/default/lib/python3.6/site-packages/petastorm/etl/dataset_metadata.py", line 206, in _generate_unischema_metadata
    utils.add_to_dataset_metadata(dataset, UNISCHEMA_KEY, serialized_schema)
  File "/opt/conda/default/lib/python3.6/site-packages/petastorm/utils.py", line 115, in add_to_dataset_metadata
    arrow_metadata = compat_get_metadata(dataset.pieces[0], dataset.fs.open)
  File "/opt/conda/default/lib/python3.6/site-packages/petastorm/compat.py", line 31, in compat_get_metadata
    arrow_metadata = piece.get_metadata()
  File "/opt/conda/default/lib/python3.6/site-packages/pyarrow/parquet.py", line 676, in get_metadata
    f = self.open()
  File "/opt/conda/default/lib/python3.6/site-packages/pyarrow/parquet.py", line 683, in open
    reader = self.open_file_func(self.path)
  File "/opt/conda/default/lib/python3.6/site-packages/pyarrow/parquet.py", line 1054, in _open_dataset_file
    buffer_size=dataset.buffer_size
  File "/opt/conda/default/lib/python3.6/site-packages/pyarrow/parquet.py", line 210, in __init__
    read_dictionary=read_dictionary, metadata=metadata)
  File "pyarrow/_parquet.pyx", line 1023, in pyarrow._parquet.ParquetReader.open
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Parquet file size is 0 bytes
To recreate:
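This becomes a problem in `petastorm.utils.add_to_dataset_metadata`, which contains the following line (line 115 of `petastorm/utils.py` in the traceback): `arrow_metadata = compat_get_metadata(dataset.pieces[0], dataset.fs.open)`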
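One possible workaround, sketched here under assumptions, is to skip unusable entries instead of always taking `pieces[0]`. This is not petastorm's or pyarrow's actual code: `Piece`, `first_non_empty_piece`, and the `get_size` callback are hypothetical names used only for illustration, and the sample paths and sizes are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for a dataset piece; only `path` matters here.
Piece = namedtuple("Piece", ["path"])


def first_non_empty_piece(pieces, get_size):
    """Return the first piece with a non-empty path and a non-zero file size.

    `get_size` is a caller-supplied function mapping a path to a size in
    bytes (an assumption; a real filesystem object would be queried instead).
    """
    for piece in pieces:
        if not piece.path:
            continue  # skip an empty-string path like the one reported here
        if get_size(piece.path) == 0:
            continue  # skip zero-byte files, which trigger ArrowInvalid
        return piece
    raise ValueError("no readable Parquet piece found in dataset")


# Illustrative data only: an empty path followed by a real part file.
sizes = {"part-00000.parquet": 1024}
pieces = [Piece(""), Piece("part-00000.parquet")]
chosen = first_non_empty_piece(pieces, lambda path: sizes.get(path, 0))
print(chosen.path)
```

Whether filtering belongs in `add_to_dataset_metadata` or earlier, where the piece list is built, is a design question for the maintainers.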
@megaserg @selitvin