Why doesn't the model input include attention_mask? #58
Comments
Because GPT is a uni-directional language model, it does not need an attention mask.
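To make this concrete, here is a minimal sketch of the causal mask that a GPT-style decoder applies inside every attention layer; the function and tensor names are illustrative and not taken from this repository. Because each position can only attend to itself and earlier positions, no user-supplied attention_mask is needed to enforce the left-to-right constraint.

```python
import torch
import torch.nn.functional as F

def causal_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim) -- illustrative shapes
    seq_len = q.size(-2)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    # Lower-triangular mask: position i may only attend to positions <= i,
    # so future tokens are hidden without any external attention_mask.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device))
    scores = scores.masked_fill(~causal, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 2, 5, 8)   # (batch=1, heads=2, seq_len=5, head_dim=8)
out = causal_attention(q, k, v)        # -> (1, 2, 5, 8)
```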
Why is the response concatenated to the input_ids for both the train and validation datasets? Wouldn't this create over-fitted models? Would it be possible to somehow mask the response ids?
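One way to keep the full context-plus-response sequence in input_ids while controlling which tokens contribute to the loss is to mask positions in the labels rather than in the attention. The snippet below is a sketch of that idea using the -100 ignore index convention of torch.nn.CrossEntropyLoss and Hugging Face models; the tensors and the commented model call are hypothetical and do not reproduce the preprocessing in LSP_train.py. The same pattern applies whichever segment one wants excluded from the loss.

```python
import torch

context_ids  = torch.tensor([10, 11, 12])      # hypothetical token ids
response_ids = torch.tensor([20, 21, 22, 23])

# The model still sees both segments as input.
input_ids = torch.cat([context_ids, response_ids])

# Positions set to -100 are ignored by the loss; here the context is masked
# so the loss is computed only on the response tokens.
labels = torch.cat([torch.full_like(context_ids, -100), response_ids])

# Hypothetical Hugging Face-style call:
# loss = model(input_ids.unsqueeze(0), labels=labels.unsqueeze(0)).loss
```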
DialoGPT/LSP_train.py, line 281 (commit fa0c0c5)
Since it is an LMHeadModel, the 1st through n-th tokens are used to predict the (n+1)-th token during training, so why not introduce attention_mask to mask the (n+2)-th through (n+m)-th tokens? Without attention_mask, there may be an inconsistency between the training and testing settings. Is it possible to add attention_mask during training to make testing better?
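For reference, the place where attention_mask does matter for a uni-directional model is padded batches: padding tokens should be hidden from attention, while the causal mask already blocks attention to future tokens within each sequence. The sketch below uses the current Hugging Face transformers API (not the older library this repository pins) and illustrative inputs.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

batch = tokenizer(["hello there", "a longer example sentence"],
                  return_tensors="pt", padding=True)

# Labels at padded positions are set to -100 so they do not contribute to the loss.
labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)

out = model(input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],  # zeros over padding positions
            labels=labels)
print(out.loss)
```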