PDB upload error #56

Open
alex-hh opened this issue Nov 9, 2024 · 0 comments

Collaborator

alex-hh commented Nov 9, 2024

"remove_cif": args.remove_cif,

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex/envs/devo/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 5478, in push_to_hub
additions, uploaded_size, dataset_nbytes = self._push_parquet_shards_to_hub(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex/envs/devo/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 5309, in _push_parquet_shards_to_hub
for index, shard in hf_tqdm(
File "/Users/alex/envs/devo/lib/python3.11/site-packages/tqdm/std.py", line 1181, in __iter__
for obj in iterable:
File "/Users/alex/envs/devo/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 5294, in shards_with_embedded_external_files
shard = shard.map(
^^^^^^^^^^
File "/Users/alex/envs/devo/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 562, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex/envs/devo/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3057, in map
for rank, done, content in Dataset._map_single(**dataset_kwargs):
File "/Users/alex/envs/devo/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3475, in _map_single
writer.write_table(batch)
File "/Users/alex/envs/devo/lib/python3.11/site-packages/datasets/arrow_writer.py", line 627, in write_table
pa_table = pa_table.combine_chunks()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/table.pxi", line 4387, in pyarrow.lib.Table.combine_chunks
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: offset overflow while concatenating arrays
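
For context on the final error: Arrow's default `string` and `binary` types store 32-bit offsets, so a single contiguous array can hold at most 2**31 - 1 bytes of data. `combine_chunks()` concatenates a column's chunks into one array, and it raises "offset overflow while concatenating arrays" when the combined size exceeds that limit. The sketch below is only an illustration of that arithmetic in plain Python, not a confirmed diagnosis or fix for this issue. The helper names `combined_fits_int32` and `group_rows_under_limit` are hypothetical and not part of `datasets` or `pyarrow`, and the assumption that large PDB/CIF text columns are what push a shard over the limit is untested.

```python
# Largest offset a 32-bit-offset Arrow string/binary array can store.
INT32_MAX = 2**31 - 1


def combined_fits_int32(chunk_nbytes):
    """Return True if chunks with these byte sizes fit in one 32-bit-offset array."""
    return sum(chunk_nbytes) <= INT32_MAX


def group_rows_under_limit(row_nbytes, limit=INT32_MAX):
    """Greedily group consecutive row indices so each group's bytes stay <= limit."""
    groups, current, current_bytes = [], [], 0
    for i, n in enumerate(row_nbytes):
        if n > limit:
            # A single value this large cannot be stored with 32-bit offsets at all.
            raise ValueError(f"row {i} is {n} bytes, above the limit of {limit}")
        if current and current_bytes + n > limit:
            # Adding this row would overflow the group, so start a new one.
            groups.append(current)
            current, current_bytes = [], 0
        current.append(i)
        current_bytes += n
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Two 1 GiB chunks total exactly 2**31 bytes, one more than the limit.
    print(combined_fits_int32([2**30, 2**30]))
    # With a toy limit of 8 bytes, three 4-byte rows need two groups.
    print(group_rows_under_limit([4, 4, 4], limit=8))
```

If this is indeed the cause, two things that might be worth trying are passing a smaller `max_shard_size` to `push_to_hub`, or casting the large text column to `large_string`, which uses 64-bit offsets. Neither has been verified against this dataset.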
