SFT datasets error #18

Open
arthasyou opened this issue Oct 27, 2023 · 0 comments
Labels: bug, needs triage

arthasyou commented Oct 27, 2023

from datasets import load_dataset

dataset = load_dataset("ShengbinYue/DISC-Law-SFT")

----------------------------------------------------------------------------------
error:
Generating train split: 166758 examples [00:00, 184286.58 examples/s]
Traceback (most recent call last):
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/builder.py", line 1940, in _prepare_split_single
    writer.write_table(table)
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/arrow_writer.py", line 572, in write_table
    pa_table = table_cast(pa_table, self._schema)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/table.py", line 2328, in table_cast
    return cast_table_to_schema(table, schema)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/table.py", line 2286, in cast_table_to_schema
    raise ValueError(f"Couldn't cast\n{table.schema}\nto\n{features}\nbecause column names don't match")
ValueError: Couldn't cast
id: string
reference: list<item: string>
  child 0, item: string
input: string
output: string
to
{'id': Value(dtype='string', id=None), 'input': Value(dtype='string', id=None), 'output': Value(dtype='string', id=None)}
because column names don't match

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2153, in load_dataset
    builder_instance.download_and_prepare(
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/builder.py", line 954, in download_and_prepare
    self._download_and_prepare(
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/builder.py", line 1049, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/builder.py", line 1813, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "/home/ysx/miniconda3/lib/python3.11/site-packages/datasets/builder.py", line 1958, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
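
The `ValueError` above reports that some rows carry an extra `reference` column (a list of strings), while the schema being written has only `id`, `input`, and `output`. A minimal sketch, assuming the dataset's JSONL files have been downloaded locally, of how one might check which files use which columns. The helper names `column_sets` and `group_files_by_columns` are hypothetical and not part of `datasets`; only the standard library is used.

```python
import json


def column_sets(jsonl_path):
    """Return the distinct column-name tuples found in one JSONL file."""
    seen = set()
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Sort the keys so that key order does not create false mismatches.
            seen.add(tuple(sorted(record)))
    return seen


def group_files_by_columns(paths):
    """Map each column-name tuple to the list of files whose rows use it."""
    groups = {}
    for path in paths:
        for cols in column_sets(path):
            groups.setdefault(cols, []).append(str(path))
    return groups
```

Files that land in the same group share a schema, so they could probably be loaded together by passing only those files through the `data_files` argument of `load_dataset`, instead of letting the loader merge every file in the repository into one `train` split.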
Charlie-XIAO added the usage, bug, and needs triage labels and removed the usage label on Oct 29, 2023
Charlie-XIAO changed the title from "sft datasets error" to "SFT datasets error" on Nov 22, 2023