Not able to reproduce the scores using provided checkpoint on NLG tasks #138
Comments
Hello, I have the same problem. Have you solved it?
@1181000705 Yes, partially. The low NLG scores occur because the original script somehow did not load the pretrained backbone model. Make sure you load the backbone model weights correctly together with the LoRA weights, and you will get the published scores.
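The loading order described above can be sketched as follows. This is a hedged sketch, not the repo's actual code: the function name, file paths, and two-step load order are assumptions; only the idea (load the backbone first, then overlay the LoRA weights with `strict=False`) comes from the comment.

```python
import torch

def load_backbone_and_lora(model, backbone_path, lora_path):
    """Sketch: load pretrained backbone weights, then overlay LoRA weights."""
    # 1. Load the pretrained GPT-2 backbone weights.
    backbone_sd = torch.load(backbone_path, map_location="cpu")
    model.load_state_dict(backbone_sd, strict=False)

    # 2. Overlay the fine-tuned LoRA weights. strict=False because the
    #    LoRA checkpoint contains only the adapter parameters.
    lora_sd = torch.load(lora_path, map_location="cpu")
    model.load_state_dict(lora_sd, strict=False)
    return model
```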
I see. It would be great if you could make a PR to fix that!
Seems like a dependency issue.
The error is here:
The keys in the saved checkpoint do not match the names expected by the weight loading code in examples/NLG/src/model.py, line 448.
I suggest slightly modifying the weight loading function like this:
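The suggested modification itself is not shown above, so the following is only a hedged sketch of a typical fix for this kind of key mismatch; the `"module."` prefix and the use of `strict=False` are assumptions, not the commenter's exact code.

```python
def load_weight(model, state_dict):
    """Sketch: normalize checkpoint key names before loading them."""
    # Strip a DataParallel-style "module." prefix if present.
    cleaned = {}
    for key, value in state_dict.items():
        new_key = key[len("module."):] if key.startswith("module.") else key
        cleaned[new_key] = value
    # strict=False tolerates checkpoints that contain only a subset of
    # the model's parameters (e.g. LoRA-only checkpoints).
    model.load_state_dict(cleaned, strict=False)
    return model
```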
Same error with WebNLG. How can I fix it?
Hi, I was able to reproduce the GLUE benchmark results but not the NLG task.
For the NLG tasks, I downloaded the checkpoint for GPT-2 M and followed steps 2, 3, and 4 in the instructions:
https://github.com/microsoft/LoRA/tree/main/examples/NLG
However, the scores were either extremely low, or there were errors during evaluations.
For the E2E task I got:
SCORES:
BLEU: 0.0000
NIST: 0.0196
METEOR: 0.0034
ROUGE_L: 0.0072
CIDEr: 0.0000
For WebNLG and DART, I see errors:
Error: test and reference not same length
ERROR ON COMPUTING METEOR. MAKE SURE YOU HAVE JAVA INSTALLED GLOBALLY ON YOUR MACHINE.
I do have Java installed.
And the other scores were also low:
| BLEU | BLEU NLTK | METEOR | chrF++ | TER | BERT-SCORE P | BERT-SCORE R | BERT-SCORE F1 | BLEURT |
|------|-----------|--------|--------|-----|--------------|--------------|---------------|--------|
| 0    | 0         | -1     | 0.11   | 1.96 | 0           | 0            | 0             | -1     |
Any suggestions? Thank you.
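The "test and reference not same length" message indicates the generated-output file and the reference file contain different numbers of lines. A minimal check for that condition (the file paths are placeholders, not the repo's actual file names):

```python
def count_lines(path):
    """Count the number of lines in a text file."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)

def check_same_length(hyp_path, ref_path):
    """Report whether hypothesis and reference files have matching line counts."""
    n_hyp, n_ref = count_lines(hyp_path), count_lines(ref_path)
    if n_hyp != n_ref:
        print(f"Mismatch: {n_hyp} hypotheses vs {n_ref} references")
    return n_hyp == n_ref
```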