LLM Runner is a Rust-powered library for running local AI models (such as TinyLlama and Phi-1.5) in Flutter apps. It handles model downloading, loading, and inference behind a simple API.
- Multiple Models: Support for TinyLlama, Phi-1.5, and more
- Automatic Downloads: Models are downloaded automatically when needed
- Local Execution: All processing happens on device
- Memory Efficient: Models are loaded/unloaded as needed
- Simple API: Just a few lines to get started
Add to your `pubspec.yaml`:

```yaml
dependencies:
  llm_runner:
    git: https://github.com/yourusername/rust_llm_runner.git
```

```dart
import 'package:llm_runner/llm_runner.dart';

// Use a pre-configured model
final response = await LlmRunner.generateText(
  model: Models.tinyllama, // Small, fast model
  prompt: "Tell me a joke",
);

// Switch to a more powerful model
final mathResponse = await LlmRunner.generateText(
  model: Models.mistral7b, // Better at complex tasks
  prompt: "Explain quantum computing",
);

// Use your own custom model
final customModel = Models.custom(
  name: 'deepseek-ai/deepseek-math-7b-instruct',
  minRamMb: 8192,
  description: 'Specialized for mathematics',
);
final mathResult = await LlmRunner.generateText(
  model: customModel,
  prompt: "Solve: ∫x²dx",
);
```

The following pre-configured models are available:

- `Models.tinyllama` - Fast, lightweight
- `Models.phi2` - Good at coding
- `Models.gemma2b` - Google's efficient model
- `Models.llama32_3b` - Latest Llama 3.2
- `Models.mistral7b` - Powerful open-source model
- `Models.qwen7b` - High-quality multilingual model
Use any compatible model:

```dart
final myModel = Models.custom(
  name: 'organization/model-name',
  minRamMb: 6144,
  description: 'My custom model',
  metadata: {
    'type': 'instruct',
    'language': 'multilingual',
  },
);
```

Models should be:
- GGUF format compatible
- Within device memory constraints (see the sizing sketch below)
- Properly structured (tokenizer, weights, etc.)
See MODELS.md for a full list of tested models.
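
To check the memory constraint up front, one option is to gate model selection on the same `minRamMb` figure used by `Models.custom`. This is a minimal sketch; `SpecModel`, `pickModelForDevice`, and the RAM figures below are illustrative, not part of the llm_runner API:

```dart
// Illustrative sketch: pick the first (most capable) candidate whose
// declared RAM floor fits the device's available memory.
class SpecModel {
  final String name;
  final int minRamMb;
  const SpecModel(this.name, this.minRamMb);
}

SpecModel? pickModelForDevice(int availableRamMb, List<SpecModel> candidates) {
  // Candidates should be ordered from most to least capable.
  for (final model in candidates) {
    if (model.minRamMb <= availableRamMb) return model;
  }
  return null; // Nothing fits the device's memory budget.
}

void main() {
  const candidates = [
    SpecModel('mistral7b', 8192), // Assumed RAM floors, for illustration.
    SpecModel('gemma2b', 4096),
    SpecModel('tinyllama', 2048),
  ];
  print(pickModelForDevice(3072, candidates)?.name); // tinyllama
}
```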
Models are automatically downloaded and loaded as needed:
```dart
// Use TinyLlama
var response = await LlmRunner.generateText(
  model: Models.tinyllama,
  prompt: "Tell me a story",
);

// Switch to Phi-1.5
response = await LlmRunner.generateText(
  model: Models.phi15,
  prompt: "Solve: x^2 = 16",
);
```

Wrap calls in a try/catch to handle download or inference failures:

```dart
try {
  final response = await LlmRunner.generateText(
    model: Models.tinyllama,
    prompt: "Hello!",
  );
  print(response);
} catch (e) {
  print('Error: $e');
}
```
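
One pattern this enables is falling back to a lighter model when a heavier one fails, for example on low-memory devices. A sketch built only from the calls shown above; the library's exception types aren't specified here, so a broad catch is used:

```dart
import 'package:llm_runner/llm_runner.dart';

Future<String> generateWithFallback(String prompt) async {
  try {
    // Prefer the more capable model when the device can run it.
    return await LlmRunner.generateText(
      model: Models.mistral7b,
      prompt: prompt,
    );
  } catch (e) {
    // Fall back to the lightweight model on download or load failure.
    return await LlmRunner.generateText(
      model: Models.tinyllama,
      prompt: prompt,
    );
  }
}
```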
How it works:

- Model Management: The library automatically handles:
  - Model downloading
  - Loading into memory
  - Efficient switching between models
  - Memory cleanup
- Performance (see the timing sketch below):
  - ~50ms per token generation
  - ~20 tokens per second
  - Automatic memory management
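
These numbers vary with device and model size. To measure throughput yourself, you can time a single generation; a rough sketch, where the four-characters-per-token ratio is a crude estimate rather than a library value:

```dart
import 'package:llm_runner/llm_runner.dart';

Future<void> benchmarkTinyllama() async {
  final stopwatch = Stopwatch()..start();
  final response = await LlmRunner.generateText(
    model: Models.tinyllama,
    prompt: "Write a short paragraph about Flutter.",
  );
  stopwatch.stop();

  // Estimate token count from text length (~4 characters per token).
  final estimatedTokens = response.length / 4;
  final seconds = stopwatch.elapsedMilliseconds / 1000;
  print('~${(estimatedTokens / seconds).toStringAsFixed(1)} tokens/sec');
}
```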
Requirements:

- Flutter 3.0 or higher
- iOS 11+ or Android API level 21+
- ~500MB of free storage per model
- ~1GB of RAM for model execution
Contributions welcome! See CONTRIBUTING.md for guidelines.
MIT License - see LICENSE for details