[QUESTION] Sample idx, bin files in public domain for trying out pretrain_gpt.py? #1105
Unanswered
sambar1729 asked this question in Q&A
Replies: 2 comments
-
@ashors1 Can you please help with this question? Thank you!
-
Hi, we've made some changes to Megatron recently to remove the required dependency on Transformer Engine. You should no longer need to install Transformer Engine to run this script. The following works for me:
-
Your question
Can we have sample idx + bin files as required by pretrain_gpt.py?
Running tools/preprocess_data.py on some sample data needs transformer_engine, and on an A100 this takes a long time to build from source (the pip install also fails).
This is just too much work to get some training data to run pretrain_gpt.py with. Can some sample idx, bin files as required by the pretraining be provided in a public place? Thanks.
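For anyone who wants to build a tiny dataset locally instead, here is a minimal sketch, not taken from this thread. It writes a small JSONL corpus with one `{"text": ...}` object per line, which is the input layout tools/preprocess_data.py reads by default, and assembles a command line for the script without running it. The helper names, the sample sentences, and the file names (`sample.jsonl`, `gpt2-vocab.json`, `gpt2-merges.txt`, `sample_gpt2`) are placeholders I made up, and the flags are the ones I recall from the Megatron-LM README, so please check them against `python tools/preprocess_data.py --help` in your checkout.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_sample_corpus(path, texts):
    """Write one JSON object per line, each with a "text" field."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for text in texts:
            f.write(json.dumps({"text": text}) + "\n")
    return path


def build_preprocess_command(input_path, output_prefix, vocab_file, merge_file):
    """Assemble (but do not run) a preprocess_data.py invocation.

    Assumption: flag names follow the Megatron-LM README; verify with --help.
    """
    return [
        "python", "tools/preprocess_data.py",
        "--input", str(input_path),
        "--output-prefix", str(output_prefix),
        "--vocab-file", str(vocab_file),      # placeholder path
        "--merge-file", str(merge_file),      # placeholder path
        "--tokenizer-type", "GPT2BPETokenizer",
        "--append-eod",
        "--workers", "1",
    ]


if __name__ == "__main__":
    # Hypothetical sample sentences; any plain text works.
    samples = [
        "Megatron-LM trains large transformer models.",
        "This is a second, very short document.",
    ]
    workdir = Path(tempfile.mkdtemp())
    corpus = write_sample_corpus(workdir / "sample.jsonl", samples)
    cmd = build_preprocess_command(
        corpus, workdir / "sample_gpt2", "gpt2-vocab.json", "gpt2-merges.txt"
    )
    # Print the command so it can be copied and run from the repo root.
    print(shlex.join(cmd))
```

If the flags match your version, running the printed command from the Megatron-LM root should produce a .bin and .idx pair under the given output prefix (as far as I know, named `<prefix>_text_document.bin` and `<prefix>_text_document.idx` for the default `text` key), which can then be passed to pretrain_gpt.py as the data path.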