dataset format of pretraining stage #56

Open
annopackage opened this issue Jul 17, 2024 · 0 comments

annopackage commented Jul 17, 2024

How did you unify the format of the pretraining datasets? In the supervised fine-tuning stage, the training data are curated as question-answer pairs. Do the caption and detection datasets follow the same format as the SFT data? If so, how did you obtain questions for them, given that they originally contain only ground truth such as captions or boxes?
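
To make the question concrete, here is a rough sketch of the kind of conversion I have in mind. It is purely my guess and not taken from this repo: the templates, the field names (`image`, `question`, `answer`), and the box string format are all hypothetical.

```python
import random

# Hypothetical prompt templates; not taken from this repository.
CAPTION_QUESTIONS = [
    "Describe the image briefly.",
    "What is shown in this image?",
]
DETECTION_QUESTION = "Detect all {label} instances in the image."


def caption_to_qa(image, caption, rng):
    """Wrap a ground-truth caption as a question-answer pair."""
    return {
        "image": image,
        "question": rng.choice(CAPTION_QUESTIONS),  # sample one template
        "answer": caption,
    }


def boxes_to_qa(image, label, boxes):
    """Wrap ground-truth boxes (x1, y1, x2, y2) of one class as a question-answer pair."""
    answer = "; ".join(f"[{x1}, {y1}, {x2}, {y2}]" for x1, y1, x2, y2 in boxes)
    return {
        "image": image,
        "question": DETECTION_QUESTION.format(label=label),
        "answer": answer,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    print(caption_to_qa("0001.jpg", "a dog running on grass", rng))
    print(boxes_to_qa("0001.jpg", "dog", [(10, 20, 110, 220)]))
```

Is the pretraining data built in roughly this way, with templated questions, or is a different format used for that stage?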
