
Segfault with "IOT instruction (core dumped)" #11

Closed
dur-randir opened this issue Jun 27, 2024 · 8 comments

@dur-randir

The following invocation

./tortoise --message 'So these models are trained 5.4 billion annotations across 126 million images. The number of images is a lot less, but maybe there are more (or better) annotation across those images.

It might also be the case that with Flamingo there was only an alt-text for the whole image, while the FLD-5B dataset used in Florence-2 has multiple annotations per image (segment).

But look at the table on the HuggingFace page. These models are beating most of the multi-billion models on most of the benchmarks.' --voice "../models/mouse.bin" --seed 0 --output "based?.wav"

results in

[2]    342702 IOT instruction (core dumped)  ./tortoise --message  --voice "../models/mouse.bin" --seed 0 --output
@balisujohn
Owner

Could be a phrase length issue; try "this is a test message"

@balisujohn
Owner

balisujohn commented Jun 27, 2024

Also, the message can only include lowercase letters, spaces, and punctuation (though this isn't why it core dumped).
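
For instance, a message can be lowercased before it's passed in. A minimal shell sketch, where RAW_MESSAGE is a hypothetical variable holding the original text; tr is standard, and this only fixes case, it doesn't strip any disallowed symbols:

# lowercase the message before handing it to tortoise.cpp
RAW_MESSAGE='So These Models Are Trained On Billions Of Annotations.'
msg=$(printf '%s' "$RAW_MESSAGE" | tr '[:upper:]' '[:lower:]')
./tortoise --message "$msg" --voice "../models/mouse.bin" --seed 0 --output "out.wav"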

@dur-randir
Author

Also, the message can only include lowercase letters, spaces, and punctuation (though this isn't why it core dumped).

Yes, it warns that it sees unknown tokens, but it shouldn't segfault on them anyway. I've retried with a lower-cased version without the unknown tokens, with the same result.

Could be a phrase length issue; try "this is a test message"

Maybe. This exact phrase works, and the phrase from the readme works. This works:

so these models are trained billion annotations across million images. the number of images is a lot less, but maybe there are more (or better) annotation across those images.

and this doesn't:

so these models are trained billion annotations across million images. the number of images is a lot less, but maybe there are more (or better) annotation across those images. it might also be the case that with flamingo there was only an alt-tex

@balisujohn
Owner

The second phrase works on a 1070ti for me.

./tortoise --message "so these models are trained billion annotations across million images. the number of images is a lot less, but maybe there are more (or better) annotation across those images. it might also be the case that with flamingo there was only an alt-tex"
gpt_vocab_init: loading vocab from '../models/tokenizer.json'
gpt_vocab_init: vocab size = 255
autoregressive_model_load: loading model from '../models/ggml-model.bin'
autoregressive_model_load: ggml tensor size    = 368 bytes
autoregressive_model_load: backend buffer size = 1889.29 MB
autoregressive_model_load: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1070 Ti, compute capability 6.1, VMM: yes
autoregressive_model_load: model size  =  1510.54 MB
diffusion_model_load: loading model from '../models/ggml-diffusion-model.bin'
diffusion_model_load: ggml tensor size    = 368 bytes
diffusion_model_load: backend buffer size = 689.28 MB
diffusion_model_load: using CUDA backend
diffusion_model_load: model size  =   688.49 MB
vocoder_model_load: loading model from '../models/ggml-vocoder-model.bin'
vocoder_model_load: ggml tensor size    = 368 bytes
vocoder_model_load: backend buffer size =  56.42 MB
vocoder_model_load: using CUDA backend
vocoder_model_load: model size  =    56.42 MB
vocoder: compute buffer size: 4123.97 MB
WAV file saved successfully. :^)

Make sure your submodule is up to date; I released a patch that allows long phrases on GPU very recently. You might want to do a fresh recursive clone of the repository to make sure the submodule is actually updated.
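
Concretely, either of these should do it (standard git commands; the repository URL is assumed here):

# update submodules in an existing checkout
git pull
git submodule update --init --recursive

# or start over with a fresh recursive clone
git clone --recursive https://github.com/balisujohn/tortoise.cpp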

@dur-randir
Author

I'm running it on CPU, not on GPU. Nevertheless, I've tried with updated sources, and the segfault is gone. The sound does deteriorate further into the sentence, though, but I couldn't run the original model to compare against.

@balisujohn
Owner

balisujohn commented Jul 3, 2024

Yeah, you might have to try a few times to get a good generation with multiple sentences; you're better off splitting your input text on "." or something to that effect and passing one sentence into tortoise.cpp at a time (see the sketch below). Generation quality is consistent with the "fast" preset of tortoise-tts, with the caveat that the original version generates multiple wav candidates and uses CLVP to pick the best one, whereas tortoise.cpp doesn't implement CLVP. So, at the cost of additional computation, tortoise-tts has a bit of an edge over tortoise.cpp in average final generation quality, but with enough tries you can get generations just as good from tortoise.cpp. Another caveat is that tortoise.cpp only re-implements the "fast" preset of tortoise-tts, while tortoise-tts also offers higher-quality settings with more diffusion steps.
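
A minimal shell sketch of that splitting, reusing the flags and the mouse voice from the invocations above; the split on "." is deliberately naive and will mangle abbreviations and decimal numbers:

# synthesize one sentence per tortoise.cpp call
text='so these models are trained on billions of annotations. the number of images is a lot less.'
i=0
echo "$text" | tr '.' '\n' | while read -r sentence; do
  [ -z "$sentence" ] && continue
  i=$((i + 1))
  ./tortoise --message "$sentence." --voice "../models/mouse.bin" --seed 0 --output "part_$i.wav"
done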

Please let me know if you have any further issues; feedback is super valuable at this stage of the project :^)

@dur-randir
Author

dur-randir commented Jul 3, 2024

passing one sentence into tortoise.cpp at a time

Yeah, I thought the same. Single sentences sound good, but I haven't yet tried to merge them; I was more interested in checking intonation and overall voice quality.
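
For reference, concatenating the per-sentence wavs can be done with sox (an assumption here, not something tested in this thread), using the part_N.wav names from the loop sketch above:

# sox concatenates its input files end to end
sox part_1.wav part_2.wav part_3.wav merged.wav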

@balisujohn
Owner

balisujohn commented Jul 3, 2024

It varies a lot by voice. I find that the premade voices provided with tortoise-tts with the name format train_<name> seem subjectively the most stable and humanlike. This includes mol and mouse, but I think there are others.
