dataset format of pretraining stage #56

annopackage · 2024-07-17T05:16:17Z

How did you unify the format of pretraining dataset? During supervised fine tuning stage, the training data are curated as question and answer pairs. For caption or detection dataset, I want to know if they follow the same format as sft data, and how to collect questions for these data as they originally only contains ground truth like caption or boxes?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dataset format of pretraining stage #56

dataset format of pretraining stage #56

annopackage commented Jul 17, 2024 •

edited

Loading

dataset format of pretraining stage #56

dataset format of pretraining stage #56

Comments

annopackage commented Jul 17, 2024 • edited Loading

annopackage commented Jul 17, 2024 •

edited

Loading