Closed
Description
The following invocation
./tortoise --message 'So these models are trained 5.4 billion annotations across 126 million images. The number of images is a lot less, but maybe there are more (or better) annotation across those images.
It might also be the case that with Flamingo there was only an alt-text for the whole image, while the FLD-5B dataset used in Florence-2 has multiple annotations per image (segment).
But look at the table on the HuggingFace page. These models are beating most of the multi-billion models on most of the benchmarks.' --voice "../models/mouse.bin" --seed 0 --output "based?.wav"
results in a crash:
[2] 342702 IOT instruction (core dumped) ./tortoise --message --voice "../models/mouse.bin" --seed 0 --output