I observed in the training output logs that the captions did not always include the autocaption_prefix or autocaption_suffix. This is important because these affixes contain the unique trigger token.
After looking at caption.py I noticed this block:
if autocaption_prefix:
inp += f"\n\nYou must start the caption with '{autocaption_prefix}'. "
if autocaption_suffix:
inp += f"\n\nYou must end the caption with '{autocaption_suffix}'."
Instead of relying on the LLM to add these, which it clearly fails to do consistently, I suggest appending them to the decoded output manually, for example:
output = self.tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
if autocaption_prefix:
output = f"{autocaption_prefix} {output}"
if autocaption_suffix:
output = f"{output} {autocaption_suffix}"
print(f"Caption for {image_path}: {output}")
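One caveat with plain concatenation: when the LLM *does* follow the prompt instructions, the prefix/suffix would end up duplicated. A minimal sketch of a guard against that (the helper name `apply_affixes` is hypothetical, not part of caption.py):

```python
def apply_affixes(caption: str, prefix: str = "", suffix: str = "") -> str:
    """Deterministically wrap a caption with the trigger prefix/suffix,
    skipping each affix if the LLM already emitted it."""
    caption = caption.strip()
    if prefix and not caption.startswith(prefix):
        caption = f"{prefix} {caption}"
    if suffix and not caption.endswith(suffix):
        caption = f"{caption} {suffix}"
    return caption
```

This way the trigger token is guaranteed to appear exactly once at each end, regardless of whether the model honored the prompt.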