Preload the model into Ollama to get faster response times #1162

Open
wintermeyer opened this issue Nov 20, 2024 · 0 comments

@wintermeyer (Collaborator) commented:

When Ollama first has to load the model, it takes longer to answer our requests. We can force it to preload the model with a simple empty request:

curl http://localhost:11434/api/generate -d '{"model": "llava:13b"}'

We already know which pages are likely to ask the LLM for help. When a user accesses one of those pages, we should preload the model from a spawned process, so the preload itself can't block or crash the request that triggered it (so we don't shoot ourselves in the foot).

There is little harm in doing this when the model is already loaded, but a big improvement when it isn't.
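
A minimal sketch of what such a preload could look like, assuming an Elixir codebase (the mention of spawn suggests one). The module name MyApp.OllamaPreloader is hypothetical, Jason is assumed to be available for JSON encoding (as in most Phoenix apps), and :httpc ships with OTP, so no extra HTTP client dependency is needed:

defmodule MyApp.OllamaPreloader do
  # Hypothetical helper: sends the empty /api/generate request from this
  # issue so that Ollama loads the model ahead of time.

  @url ~c"http://localhost:11434/api/generate"

  def preload(model \\ "llava:13b") do
    # Task.start/1 runs the request in an unlinked process, so a slow or
    # failing preload cannot block or crash the caller.
    Task.start(fn ->
      # :httpc needs :inets running; this is a no-op if it already is.
      {:ok, _} = Application.ensure_all_started(:inets)
      body = Jason.encode!(%{model: model})
      # A generate request without a prompt only loads the model.
      :httpc.request(:post, {@url, [], ~c"application/json", body}, [], [])
    end)
  end
end

Calling MyApp.OllamaPreloader.preload() from the controller action or LiveView mount of the pages in question returns immediately; the actual HTTP request runs in the background task.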

wintermeyer added the enhancement label on Nov 20, 2024
wintermeyer added this to the MVP milestone on Nov 20, 2024