@dotXem Thank you for your question. Your configuration looks great, except for the prompt part:
I do not see the ending `"""` of the template. Moreover, even after the above suggestions are incorporated, you may still see some fluctuation in the loss values during training; as long as the overall loss is generally decreasing, the training is successful. Likewise, the predictions may not be good early on, but once training is complete, you should expect to see "good" performance. However, even that is not so simple: you may have an imbalanced dataset, unclean labels, and a host of other issues limiting model performance. So it is ongoing work, until you are satisfied. Thank you, and please let us know how it turns out.
-
The closing `"""` is located at the bottom. As for the placeholder for the "target", I believe I don't need one, as it should implicitly be at the end of the prompt, where the generation is supposed to begin. In the blog post you mention, they don't have a placeholder for the target.
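For concreteness, a template of the shape described -- with the closing triple quotes at the bottom and no placeholder for the target, since generation begins where the prompt ends -- might look like the sketch below. The wording and the `{instruction}` column placeholder are hypothetical, not taken from the actual config.

```python
# Hypothetical Ludwig-style prompt template: the closing triple quotes
# sit at the bottom, and the target gets no placeholder -- the model is
# expected to generate it right where the template ends.
prompt_template = """Below is an instruction that describes a task.

### Instruction: {instruction}
"""
```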
-
Thank you for both your answers!
The training step loss that I showed you is from the last epoch, which ended with a slight decrease in the global loss. So no, it's not coming from an average in which the first steps have a high loss compared to the last ones (unless something is going on with the scheduler?). That being said, is it possible that the global loss displayed at the end is the accumulation of the 16 steps (per the config) over the epoch, rather than their average? The global loss is roughly equal to 16 times the step loss.
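To make the two possibilities concrete -- an epoch value that averages the per-step losses versus one that sums the 16 accumulated micro-batch losses -- here is a small sketch with made-up numbers:

```python
# Made-up numbers: if each optimizer step accumulates gradients over 16
# micro-batches, a reporter that *sums* the micro-batch losses shows
# ~16x the value of one that *averages* them.
micro_batch_losses = [0.169] * 16  # constant per-micro-batch loss, for clarity

averaged = sum(micro_batch_losses) / len(micro_batch_losses)  # 0.169
summed = sum(micro_batch_losses)                              # 2.704 == 16 * 0.169

print(f"averaged: {averaged:.3f}  summed: {summed:.3f}")
```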
The outputs were all from the last epoch. The validation-step generations are roughly the same across all epochs, with no noticeable improvement (the loss improves, but the generations do not). My issue is: just before training stops (at the end of the last epoch), generation in the validation step is bad, but generation at inference time after training is decent.
-
@dotXem I see the closing quotations now -- thank you. What I mean by "placeholder" is a prompt hint, so that the target will follow it. Trying this may improve the results; otherwise, what you have is a normal situation. Thank you.
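For instance, a hint can be a fixed suffix after which the target is expected to appear. The `### Response:` wording below is illustrative, not taken from the actual config:

```python
# Without a hint: generation starts immediately after the instruction.
template_plain = """### Instruction: {instruction}
"""

# With a hint: the trailing "### Response:" cues the model that the
# target is what must follow.
template_with_hint = """### Instruction: {instruction}

### Response:"""
```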
-
Isn't that a prompt hint? Or maybe it's badly formatted?
-
@dotXem Those are examples. Let us analyze the following prompt as an illustration:
This prompt contains one Input/Output example. But then when it comes to the instruction, we have:
This is where the model is prompted with the actual input, whose output it is expected to generate. I hope that this helps. Thank you!
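The prompt screenshots are not preserved in this transcript; a prompt of the shape being described -- one worked Input/Output pair followed by the actual input awaiting its output -- might look like this (purely illustrative):

```python
# One worked Input/Output example, then the actual input: the model is
# expected to generate the text that follows the final "Output:".
one_shot_prompt = """Input: The movie was a delight from start to finish.
Output: positive

Input: {review}
Output:"""
```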
-
Sorry, I'm not sure I understand what you are saying, and I'm not sure I understand the role of the scheduler -- can you help clarify? It doesn't really matter if it's the first or the fifth epoch; the value reported in that cell of the table is the average over the entire epoch. Or are you implying that you believe the loss is accumulated rather than averaged?
The examples generated at the last epoch may be different from the ones you get after training if there has been early stopping, or if the validation metric deemed the model from a different epoch better. Not sure that is the case for you, though -- can you share a screenshot of the TensorBoard? Anyway, I think both are worth investigating. @alexsherstinsky @arnavgarg1 @justinxzhao
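(If TensorBoard isn't already running, it can be launched programmatically against the training output directory; the `results` path below is an assumption -- point it wherever Ludwig wrote its event files in your run.)

```python
# Launch TensorBoard against the directory containing the event files.
from tensorboard import program

tb = program.TensorBoard()
tb.configure(argv=[None, "--logdir", "results"])  # assumed output directory
print(f"TensorBoard running at {tb.launch()}")
```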
-
The scheduler is there because it was in the tutorial example I used. I'll try without it to see if there is any difference.
Not a strong belief, but that's what I was suggesting. My epoch loss is more than 10x the displayed step loss, which seems weird to me.
The last epoch has the best validation loss, so it is the one selected by early stopping.
Not sure how to do that, but I can share the full training logs I currently have, covering all 5 epochs.
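For reference, trying without the scheduler would amount to dropping the scheduler block from the trainer section. The key names below follow Ludwig's LLM fine-tuning docs but are assumptions, not the actual config from this thread:

```python
# Hypothetical trainer section; values are placeholders.
trainer_with_scheduler = {
    "type": "finetune",
    "epochs": 5,
    "gradient_accumulation_steps": 16,
    "learning_rate_scheduler": {  # the block under test
        "decay": "cosine",
        "warmup_fraction": 0.03,
    },
}

# The "try without it" variant: identical, minus the scheduler block.
trainer_without_scheduler = {
    k: v for k, v in trainer_with_scheduler.items()
    if k != "learning_rate_scheduler"
}
```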
-
Hello there, I am trying to use Ludwig for an LLM fine-tuning task, and I have two questions regarding the training logs I get:
Here is my config:
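(The config itself did not survive in this transcript; for orientation only, a representative Ludwig QLoRA fine-tuning config might look like the sketch below -- every key and value is an assumption based on Ludwig's LLM docs, not the poster's actual settings.)

```python
# Illustrative only -- the real qlora_fine_tuning_config is not shown here.
qlora_fine_tuning_config = {
    "model_type": "llm",
    "base_model": "meta-llama/Llama-2-7b-hf",  # assumed base model
    "input_features": [{"name": "instruction", "type": "text"}],
    "output_features": [{"name": "output", "type": "text"}],
    "prompt": {
        "template": """Below is an instruction that describes a task.

### Instruction: {instruction}
""",
    },
    "adapter": {"type": "lora"},
    "quantization": {"bits": 4},  # the "Q" in QLoRA
    "trainer": {
        "type": "finetune",
        "epochs": 5,
        "gradient_accumulation_steps": 16,
    },
}
```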
The code that runs the training and does the inference afterwards (where `qlora_fine_tuning_config` is the config above):
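(The original script is likewise missing from this transcript; a minimal sketch of the train-then-predict flow being described, using Ludwig's standard Python API, would be:)

```python
# Sketch of the flow described: fine-tune, then run inference afterwards.
import pandas as pd
from ludwig.api import LudwigModel

df = pd.read_csv("train.csv")  # assumed dataset file

model = LudwigModel(config=qlora_fine_tuning_config)
train_stats, _, output_dir = model.train(dataset=df)

# Inference after training -- the step reported as giving decent results,
# unlike the in-training validation samples.
predictions, _ = model.predict(dataset=df)
print(predictions.head())
```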
During training, the loss is displayed with a value of 0.169, but in the report it is a completely different value. What am I not understanding?
```
Training: 100%|█████████████████████████████████████| 4500/4500 [1:55:32<00:00, 1.56s/it, loss=0.169]
```
Here is one generated example from the evaluation step (all other examples look like this):
It's nowhere close to what I am trying to achieve, but when doing inference after training, I actually get decent results.