⛏️ Add cli dict parsing for grpo_config #3082

Tavish9 · 2025-03-14T03:04:53Z

What does this PR do?

Adds dict parsing logic to model_init_kwargs in grpo_config, enabling dynamic configuration via the CLI. Users can now pass dictionary-like strings (e.g., --model_init_kwargs '{"torch_dtype":"bfloat16"}') through command-line arguments, which are automatically parsed into Python dicts for the target fields.

The logic is the same as in TrainingArguments.
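
As a rough illustration of the idea (not the code from this PR), the sketch below uses a plain dataclass whose `__post_init__` turns a JSON string into a dict. `DemoConfig` is a hypothetical stand-in for `GRPOConfig`; only the field name `model_init_kwargs` comes from the description above, and the "starts with `{`" check is an assumption about how a dict-like string might be recognised.

```python
import json
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class DemoConfig:
    # Hypothetical stand-in for GRPOConfig; only the field name comes from the PR.
    model_init_kwargs: Optional[Union[dict, str]] = None

    def __post_init__(self):
        value = self.model_init_kwargs
        # CLI arguments arrive as strings; a JSON object starts with "{".
        if isinstance(value, str) and value.startswith("{"):
            self.model_init_kwargs = json.loads(value)


# The string is what a CLI parser would hand over for
# --model_init_kwargs '{"torch_dtype": "bfloat16"}'
cfg = DemoConfig(model_init_kwargs='{"torch_dtype": "bfloat16"}')
print(cfg.model_init_kwargs)
```

Values that are already dicts, or left as `None`, pass through unchanged.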

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

qgallouedec · 2025-03-14T12:44:08Z

Thanks! Does it work if you directly modify transformers.training_args._VALID_DICT_FIELDS instead?

Tavish9 · 2025-03-14T12:56:22Z

Yes, but transformers.training_args and trl.GRPOConfig should each have their own independent _VALID_DICT_FIELDS, as is expected of a private attribute.

GRPOConfig.__post_init__ first processes its own _VALID_DICT_FIELDS and then those of transformers.training_args.

qgallouedec · 2025-03-14T13:52:40Z

It seems to work:

from transformers.training_args import _VALID_DICT_FIELDS
from trl import GRPOConfig

_VALID_DICT_FIELDS.append("model_init_kwargs")

args = GRPOConfig("output_dir", model_init_kwargs='{"num_labels": 2}')
print(args.model_init_kwargs)  # {"num_labels": 2}

qgallouedec · 2025-03-14T13:59:50Z

To do this properly, the first step would be to convert _VALID_DICT_FIELDS into a class attribute of TrainingArguments in transformers. Are you ready to open such a PR in Transformers?

Then we could do:

# in transformers
class TrainingArguments:
    _VALID_DICT_FIELDS = [...]

# in trl
class GRPOConfig(TrainingArguments):
    _VALID_DICT_FIELDS = TrainingArguments._VALID_DICT_FIELDS + ["model_init_kwargs"]

which eliminates the need to duplicate the post init
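
To make the inheritance point concrete, here is a self-contained sketch with stand-in classes. `BaseArgs` and `ChildConfig` are hypothetical names playing the roles of `TrainingArguments` and `GRPOConfig`, and the contents of the base list are illustrative. The parent's `__post_init__` reads the list through `self`, so the subclass only overrides the class attribute and inherits the parsing loop.

```python
import json
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class BaseArgs:
    # Unannotated, so this is a plain class attribute, not a dataclass field.
    _VALID_DICT_FIELDS = ["lr_scheduler_kwargs"]

    lr_scheduler_kwargs: Optional[Union[dict, str]] = None

    def __post_init__(self):
        # Looking the list up on self picks up a subclass override.
        for name in self._VALID_DICT_FIELDS:
            value = getattr(self, name)
            if isinstance(value, str) and value.startswith("{"):
                setattr(self, name, json.loads(value))


@dataclass
class ChildConfig(BaseArgs):
    # New list built from the parent's; the parent's list is not mutated.
    _VALID_DICT_FIELDS = BaseArgs._VALID_DICT_FIELDS + ["model_init_kwargs"]

    model_init_kwargs: Optional[Union[dict, str]] = None


args = ChildConfig(
    lr_scheduler_kwargs='{"num_cycles": 2}',
    model_init_kwargs='{"num_labels": 2}',
)
print(args.lr_scheduler_kwargs, args.model_init_kwargs)
```

Because `ChildConfig` defines no `__post_init__` of its own, nothing is duplicated between the two classes.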

Tavish9 · 2025-03-14T14:09:34Z

To do this properly, the first step would be to convert _VALID_DICT_FIELDS into a class attribute of TrainingArguments in transformers. Are you ready to open such a PR in Transformers?

Then we could do:
# in transformers
class TrainingArguments:
    _VALID_DICT_FIELDS = [...]

# in trl
class GRPOConfig(TrainingArguments):
    _VALID_DICT_FIELDS = TrainingArguments._VALID_DICT_FIELDS + ["model_init_kwargs"]
which eliminates the need to duplicate the post init

Yes, that was my initial thought as well. However, since transformers defines _VALID_DICT_FIELDS as semi-private, I decided against submitting a PR to their repository. If we follow the semi-private variable approach, each config should ideally have its own variable, even though this might lead to some code duplication in the __post_init__ logic. That said, I'm also open to changing the semi-private variable in transformers into a class attribute. However, I'm not sure whether the maintainers would be receptive to this change in philosophy.

What do you suggest?

qgallouedec · 2025-03-14T14:56:57Z

Yes, I think modifying transformers first is the way to go.

Tavish9 · 2025-03-14T16:42:24Z

Okay, I will notify you when the PR is merged. :)

Tavish9 · 2025-04-01T10:54:14Z

Hi, @qgallouedec, the PR in Transformers is merged. 🥳

qgallouedec · 2025-04-02T05:02:51Z

I just need to review it carefully and ensure backwards compatibility.
I'll do it asap.

HuggingFaceDocBuilderDev · 2025-04-05T05:06:49Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Tavish9 · 2025-04-07T04:02:34Z

Maybe you need to update the version of transformers and re-run the tests?

qgallouedec · 2025-04-07T04:06:59Z

So currently this change isn't backward compatible. We need to figure out how to make it backward compatible.

Tavish9 · 2025-04-07T04:41:17Z

okay, let me try with version checking
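
One possible shape for such a compatibility layer is sketched below. Note that it uses feature detection (`hasattr`) rather than a version comparison, so it is an alternative to the version check mentioned above and not the code that was merged. `NewParent`, `OldParent`, and `extended_dict_fields` are hypothetical names; the two stand-in classes model a parent that exposes `_VALID_DICT_FIELDS` as a class attribute and one that only has a module-level list.

```python
def extended_dict_fields(parent, legacy_fields, name):
    """Return the list of dict-parsable fields that includes `name`.

    parent: class that may or may not define `_VALID_DICT_FIELDS`.
    legacy_fields: module-level list used when the class attribute is absent.
    """
    if hasattr(parent, "_VALID_DICT_FIELDS"):
        # Newer layout: build a new list for the subclass, leave the parent alone.
        return parent._VALID_DICT_FIELDS + [name]
    # Older layout: the parent reads the module-level list, so extend it in place.
    if name not in legacy_fields:
        legacy_fields.append(name)
    return legacy_fields


class NewParent:
    """Stand-in for a parent that exposes the list as a class attribute."""
    _VALID_DICT_FIELDS = ["lr_scheduler_kwargs"]


class OldParent:
    """Stand-in for a parent that relies on a module-level list."""


legacy = ["lr_scheduler_kwargs"]
print(extended_dict_fields(NewParent, legacy, "model_init_kwargs"))
print(extended_dict_fields(OldParent, legacy, "model_init_kwargs"))
```

In the older layout the shared list has to be mutated, which is the same global side effect as the `_VALID_DICT_FIELDS.append(...)` experiment earlier in this thread.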

trl/trainer/grpo_config.py

qgallouedec

Nice!! Thanks

Tavish9 marked this pull request as draft March 14, 2025 16:42

Tavish9 mentioned this pull request Mar 15, 2025

Convert _VALID_DICT_FIELDS to class attribute for shared dict parsing in subclasses huggingface/transformers#36736

Merged

5 tasks

Tavish9 force-pushed the grpo_config_extend branch from 0743c5a to 3e44f00 Compare April 1, 2025 10:51

Tavish9 marked this pull request as ready for review April 1, 2025 10:52

Tavish9 added 2 commits April 7, 2025 12:50

add cli dict extend

49da86a

add backward compatible

34f148b

Tavish9 force-pushed the grpo_config_extend branch from 9510b36 to 34f148b Compare April 7, 2025 05:18

Tavish9 and others added 2 commits April 7, 2025 21:05

NIT: ruff formatting

bdae7f6

Merge branch 'main' into grpo_config_extend

0c97034

qgallouedec reviewed Apr 8, 2025

trl/trainer/grpo_config.py (outdated, resolved)

qgallouedec approved these changes Apr 8, 2025

Update trl/trainer/grpo_config.py

73809eb

qgallouedec changed the title ~~add cli dict parsing for grpo_config~~ ⛏️ Add cli dict parsing for grpo_config Apr 8, 2025

qgallouedec merged commit e03e7ac into huggingface:main Apr 8, 2025
9 checks passed

yxliu-TAMU pushed a commit to mincheolseong/ECEN743-GRPO-Project-Proposal that referenced this pull request Apr 20, 2025

⛏️ Add cli dict parsing for grpo_config (huggingface#3082)

fd040b1
