Analyzer not processing multiple "attributes" #254

Open
zxnie opened this issue Mar 12, 2025 · 0 comments

zxnie commented Mar 12, 2025

I'm passing the configuration with -c config.yaml.
When "attributes" is a list with more than one element, the analyzer fails:

uv run dolma -c analyze.yaml stat
attributes:
- cudo/attributes/c4_v2
- cudo/attributes/pii_regex_with_counts_fast_v2
bins: 20
debug: false
processes: 1
regex: null
report: stats
seed: 0
total: true
work_dir:
  input: null
  output: null
Traceback (most recent call last):
  File "/home/user/Projects/llm-data-prep-1/.venv/bin/dolma", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/Projects/llm-data-prep-1/.venv/lib/python3.11/site-packages/dolma/cli/__main__.py", line 93, in main
    return cli.run_from_args(args=args, config=config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/Projects/llm-data-prep-1/.venv/lib/python3.11/site-packages/dolma/cli/__init__.py", line 192, in run_from_args
    return cls.run(parsed_config)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/Projects/llm-data-prep-1/.venv/lib/python3.11/site-packages/dolma/cli/analyzer.py", line 76, in run
    create_and_run_analyzer(
  File "/home/user/Projects/llm-data-prep-1/.venv/lib/python3.11/site-packages/dolma/core/analyzer.py", line 321, in create_and_run_analyzer
    analyzer = AnalyzerProcessor(
               ^^^^^^^^^^^^^^^^^^
  File "/home/user/Projects/llm-data-prep-1/.venv/lib/python3.11/site-packages/dolma/core/parallel.py", line 154, in __init__
    raise ValueError(
ValueError: The number of source and destination prefixes must be the same (got 2 and 1)
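The error seems to come from a length check that pairs each source prefix with a destination prefix. The sketch below reproduces the mismatch. It assumes each entry in "attributes" becomes one source prefix and the single "report" path is the only destination prefix. check_prefixes and as_list are hypothetical names for illustration, not dolma's actual code:

```python
from typing import List, Sequence, Union

# Hypothetical sketch of the kind of check that raises the ValueError above;
# NOT dolma's actual implementation. Assumption: every entry in "attributes"
# is a source prefix, and the single "report" path is the only destination.


def as_list(prefix: Union[str, Sequence[str]]) -> List[str]:
    """Normalize a single prefix or a list of prefixes to a list."""
    return [prefix] if isinstance(prefix, str) else list(prefix)


def check_prefixes(
    source: Union[str, Sequence[str]], destination: Union[str, Sequence[str]]
) -> None:
    """Raise if sources and destinations cannot be paired one-to-one."""
    sources, destinations = as_list(source), as_list(destination)
    if len(sources) != len(destinations):
        raise ValueError(
            "The number of source and destination prefixes must be the same "
            f"(got {len(sources)} and {len(destinations)})"
        )


attributes = [
    "cudo/attributes/c4_v2",
    "cudo/attributes/pii_regex_with_counts_fast_v2",
]

check_prefixes(attributes[:1], "stats")  # 1 source, 1 destination: passes

try:
    check_prefixes(attributes, "stats")  # 2 sources, 1 destination
except ValueError as err:
    # prints: The number of source and destination prefixes must be the same (got 2 and 1)
    print(err)
```

Under this assumption, a single attribute pairs one-to-one with the report path. Two attributes leave the second source without a destination.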

When I provide only one attribute, it works as expected.
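Since a single attribute works, one possible workaround is to run the analyzer once per attribute until multiple attributes are supported. The sketch below does this and reuses the exact command from the report (uv run dolma -c <file> stat). The assumptions are that "report" is an output path each run would otherwise overwrite, so every run gets its own sub-path, and that JSON output (valid YAML for this simple content) is accepted as a config file. split_per_attribute and run_all are hypothetical helpers, not part of dolma:

```python
import json
import subprocess
from pathlib import Path
from typing import Dict, List

# Hypothetical workaround sketch, not an official dolma feature: write one
# config per attribute and invoke the same CLI command once for each.
# Assumption: "report" is an output path, so each run gets its own sub-path.

BASE_CONFIG: Dict = {
    "attributes": [
        "cudo/attributes/c4_v2",
        "cudo/attributes/pii_regex_with_counts_fast_v2",
    ],
    "bins": 20,
    "debug": False,
    "processes": 1,
    "regex": None,
    "report": "stats",
    "seed": 0,
    "total": True,
}


def split_per_attribute(config: Dict) -> List[Dict]:
    """Return one shallow copy of `config` per attribute, each with one attribute."""
    configs = []
    for attribute in config["attributes"]:
        name = attribute.rstrip("/").split("/")[-1]  # e.g. "c4_v2"
        configs.append(
            dict(config, attributes=[attribute], report=f"{config['report']}/{name}")
        )
    return configs


def run_all(config: Dict, out_dir: Path, dry_run: bool = True) -> List[List[str]]:
    """Write per-attribute config files and, unless dry_run, run the CLI on each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for i, single in enumerate(split_per_attribute(config)):
        path = out_dir / f"analyze_{i}.yaml"
        # Assumption: the config loader accepts JSON, which is valid YAML here.
        path.write_text(json.dumps(single, indent=2))
        cmd = ["uv", "run", "dolma", "-c", str(path), "stat"]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing run
    return commands
```

With dry_run left at True, run_all(BASE_CONFIG, Path("configs")) only writes the files and returns the commands, so they can be inspected before being executed.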

Env info: Python 3.11.11, Ubuntu 24.04.2 LTS