Weird Loss with LISA #806
Thanks for your interest in LMFlow! We have fixed several bugs in the LISA implementation in LMFlow; it would be great if you could check whether your implementation matches our latest version. If the implementation is correct, it is worth trying:
Hope this information is helpful. Please feel free to let us know if you encounter further problems 😄
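For anyone comparing their copy against upstream, here is a minimal sketch of the layer-switching idea behind LISA. The class name `DynamicLayerActivationCallback` matches LMFlow, but the constructor signature, attribute names, and the `model.model.layers` layout are my assumptions for a Mistral/LLaMA-style model, not necessarily the exact upstream code: every `interval_steps` optimizer steps, all decoder layers are frozen and a fresh random subset of `n_layers` is unfrozen.

```python
import numpy as np
from transformers import TrainerCallback


class DynamicLayerActivationCallback(TrainerCallback):
    """Sketch of LISA: freeze all decoder layers, periodically unfreeze a random subset."""

    def __init__(self, n_layers, interval_steps, model):
        self.n_layers = n_layers
        self.interval_steps = interval_steps
        # Assumes a Mistral/LLaMA-style layout; embeddings and lm_head stay trainable.
        self.layers = model.model.layers
        self.switch_active_layers()

    def freeze_all_layers(self):
        for layer in self.layers:
            for param in layer.parameters():
                param.requires_grad = False

    def switch_active_layers(self):
        self.freeze_all_layers()
        # Pick a fresh random subset of decoder layers to train.
        active = np.random.choice(len(self.layers), self.n_layers, replace=False)
        for idx in active:
            for param in self.layers[idx].parameters():
                param.requires_grad = True

    def on_step_begin(self, args, state, control, **kwargs):
        # Re-sample the active layers every `interval_steps` optimizer steps.
        if state.global_step % self.interval_steps == 0:
            self.switch_active_layers()
```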
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hi,
Some background:
I am using Mistral 7B with the HF Trainer for fine-tuning on domain-specific data, where the task is causal LM, i.e. next-word prediction.
I am using the data collator for causal LM for data prep, with a context size of 1,000 tokens per data point and about 9k data points in total, of which 5-10% is Wiki data mixed in with the domain data to avoid catastrophic forgetting.
The test data is a subset of the training data, to check that the model learns the specific data.
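As a concrete illustration of that setup, here is a rough sketch of how the data mixing and collation could be wired up with Hugging Face `datasets`/`transformers`. The 5% wiki ratio and the 1,000-token context come from the description above; the file paths and column names are hypothetical:

```python
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default

# Hypothetical paths/columns: mix ~5% wiki data into the domain data.
domain = load_dataset("json", data_files="domain.jsonl", split="train")
wiki = load_dataset("json", data_files="wiki.jsonl", split="train")
wiki = wiki.select(range(int(0.05 * len(domain))))
mixed = concatenate_datasets([domain, wiki]).shuffle(seed=42)

def tokenize(batch):
    # ~1,000-token context per data point, as described above.
    return tokenizer(batch["text"], truncation=True, max_length=1000)

tokenized = mixed.map(tokenize, batched=True, remove_columns=mixed.column_names)

# mlm=False -> labels are a shifted copy of input_ids (causal LM).
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
```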
I am using the DynamicLayerActivationCallback from LMFlow as a training callback in my trainer, roughly as wired up below.
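For context, this is roughly how the callback is attached to the HF Trainer. It reuses `tokenized`, `collator`, and the `DynamicLayerActivationCallback` sketch from above; the hyperparameter values (`n_layers=2`, `interval_steps=20`, `learning_rate=5e-5`) are purely illustrative, not recommendations from LMFlow:

```python
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=5e-5,  # illustrative value
    logging_steps=10,
)

lisa_callback = DynamicLayerActivationCallback(
    n_layers=2,         # illustrative: layers active at a time
    interval_steps=20,  # illustrative: re-sample layers every 20 steps
    model=model,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
    callbacks=[lisa_callback],
)
trainer.train()
```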
I tried multiple experiments with -
For both of the runs, the loss starts around 8 and drops to around 5-6, but then plateaus and doesn't come below 5.
I find it a little strange; maybe I need other experimentation on -
I would also like to know the ideal or recommended hyperparameters for this type of fine-tuning with around 10K data points.
Thanks in advance!