26 commits
8adaf2e
Add Qwen3 model support
nyo16 Oct 5, 2025
0499d71
Add last token pooling support for Qwen3-Embedding models
nyo16 Oct 5, 2025
1d92e9e
Add Qwen3 embedding architecture and instruction prompts support
nyo16 Oct 5, 2025
47c337d
Add .lexical/ to gitignore and IEx usage guide
nyo16 Oct 5, 2025
6f68d8f
mix format and rebuilding lock
nyo16 Oct 5, 2025
5641a4f
Add Qwen3-Reranker support and example
nyo16 Oct 5, 2025
fa592c3
Organize Qwen3 examples into dedicated folder
nyo16 Oct 5, 2025
8208efd
Address PR review feedback for Qwen3 support
Oct 6, 2025
1f24cc6
Fix Qwen3 layer naming for Layers.Transformer.blocks
Oct 6, 2025
cb181f3
Map qwen3 model type to :qwen2 tokenizer type
Oct 6, 2025
1651488
Add comprehensive Qwen3 notebook with examples
Oct 6, 2025
c02c295
Add instruction format to embeddings example in Qwen3 notebook
Oct 6, 2025
bd19c79
Add Qwen3 model tests with reference values
Oct 6, 2025
8d787ee
Fix Qwen3 embedding pooling to use attention mask instead of pad_toke…
nyo16 Oct 10, 2025
a1923e1
Add :for_reranker architecture for Qwen3
nyo16 Oct 10, 2025
0f271b5
Address PR #423 review comments: simple fixes
nyo16 Nov 7, 2025
66e2a1b
Update lib/bumblebee/text/pre_trained_tokenizer.ex
nyo16 Nov 7, 2025
81285e7
Merge branch 'qwen3-dense-support' of github.com:nyo16/bumblebee into…
nyo16 Nov 7, 2025
1e189b8
Merge branch 'main' into qwen3-dense-support
nyo16 Nov 7, 2025
cc92ccc
Rename text_reranking to text_reranking_qwen3
nyo16 Nov 7, 2025
660ef1b
Remove :for_reranker architecture, use :for_causal_language_modeling
nyo16 Nov 7, 2025
9fccfaa
Fix syntax error and document :last_token_pooling option
Nov 16, 2025
b289b75
Make query_norm and key_norm always functions
Nov 16, 2025
7604f42
Fix duplicate rotary_embedding key in transformer blocks
Nov 16, 2025
7a7eb93
Update Qwen3 tests to use bumblebee-testing models
Nov 16, 2025
bd4f915
run formatter
Nov 16, 2025
3 changes: 3 additions & 0 deletions .gitignore
@@ -24,3 +24,6 @@ bumblebee-*.tar

# Temporary files, for example, from tests.
/tmp/

# Lexical LSP
/.lexical/
58 changes: 58 additions & 0 deletions examples/README.md
@@ -0,0 +1,58 @@
# Bumblebee Examples

This directory contains example scripts demonstrating how to use Bumblebee models.

## Qwen3 Examples

See the `qwen3/` subdirectory for comprehensive Qwen3 model examples:

### Text Generation
```bash
elixir examples/qwen3/qwen3.exs
```

### Text Embeddings
```bash
elixir examples/qwen3/qwen3_embedding.exs
elixir examples/qwen3/qwen3_embedding_prompts.exs
```

### Document Reranking
```bash
elixir examples/qwen3/qwen3_reranker.exs
```

### Features Demonstrated

**Text Generation** (`qwen3.exs`):
- Text completion
- Question answering
- Chat format
- Code generation

**Embeddings** (`qwen3_embedding.exs`, `qwen3_embedding_prompts.exs`):
- 1024-dimensional text embeddings
- Semantic similarity computation
- Instruction-aware prompts (recommended by Qwen team)
- Multilingual support
- Code search

**Reranking** (`qwen3_reranker.exs`):
- Query-document relevance scoring
- Custom task instructions
- Top-k result selection
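
Once the reranker has produced query–document relevance scores, top-k selection is a plain `Enum` pipeline. A minimal sketch (the document/score pairs below are made up for illustration, not reranker output):

```elixir
scores = [
  {"Paris is the capital of France", 0.92},
  {"Berlin is the capital of Germany", 0.31},
  {"The Eiffel Tower is in Paris", 0.74}
]

# Sort by score, descending, and keep the top 2 results
top_k =
  scores
  |> Enum.sort_by(&elem(&1, 1), :desc)
  |> Enum.take(2)

IO.inspect(top_k)
# [{"Paris is the capital of France", 0.92}, {"The Eiffel Tower is in Paris", 0.74}]
```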

### Requirements

- **Text Generation**: ~8GB disk space, ~10GB RAM
- **Embeddings**: ~1.5GB disk space, ~4GB RAM (0.6B model)
- **Reranking**: ~1.5GB disk space, ~4GB RAM (0.6B model)
- **Backend**: EXLA (CPU or GPU)

### Documentation

See `examples/qwen3/QWEN3_IEX_GUIDE.md` for interactive IEx usage examples.

## Phoenix Examples

See the `phoenix/` subdirectory for LiveView-based examples.
206 changes: 206 additions & 0 deletions examples/qwen3/QWEN3_IEX_GUIDE.md
@@ -0,0 +1,206 @@
# Qwen3 IEx Usage Guide

## Text Generation (Qwen3-4B-Instruct)

```elixir
# Start IEx first with: iex -S mix

# Set backend
Nx.default_backend(EXLA.Backend)

# Load model components
{:ok, m} = Bumblebee.load_model({:hf, "Qwen/Qwen3-4B-Instruct-2507"})
{:ok, t} = Bumblebee.load_tokenizer({:hf, "Qwen/Qwen3-4B-Instruct-2507"})
{:ok, c} = Bumblebee.load_generation_config({:hf, "Qwen/Qwen3-4B-Instruct-2507"})

# Create serving
s = Bumblebee.Text.generation(m, t, c)

# Generate text
Nx.Serving.run(s, "The future of AI is")

# With chat format
prompt = "<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is Elixir?<|im_end|>
<|im_start|>assistant
"
Nx.Serving.run(s, prompt)
```

## Text Embeddings (Qwen3-Embedding-0.6B)

### Method 1: Using :for_embedding Architecture (Recommended)

```elixir
# Start IEx first with: iex -S mix

# Set backend
Nx.default_backend(EXLA.Backend)

# Load embedding model with :for_embedding architecture
{:ok, m} = Bumblebee.load_model({:hf, "Qwen/Qwen3-Embedding-0.6B"},
architecture: :for_embedding
)
{:ok, t} = Bumblebee.load_tokenizer({:hf, "Qwen/Qwen3-Embedding-0.6B"})

# Create serving
s = Bumblebee.Text.text_embedding(m, t,
output_attribute: :embedding,
embedding_processor: :l2_norm
)

# Generate embeddings
e1 = Nx.Serving.run(s, "The cat sat on the mat")
e2 = Nx.Serving.run(s, "A feline rested on the rug")
e3 = Nx.Serving.run(s, "Python is a programming language")

# Check dimension
Nx.shape(e1.embedding) # {1024}

# Compute similarity
Nx.dot(e1.embedding, e2.embedding) |> Nx.to_number() # ~0.73 (similar)
Nx.dot(e1.embedding, e3.embedding) |> Nx.to_number() # ~0.34 (different)
```

### Method 2: Direct Model Access (Advanced)

```elixir
# For more control over the pipeline
{:ok, m} = Bumblebee.load_model({:hf, "Qwen/Qwen3-Embedding-0.6B"},
architecture: :for_embedding
)
{:ok, t} = Bumblebee.load_tokenizer({:hf, "Qwen/Qwen3-Embedding-0.6B"})

{_init, predict} = Axon.build(m.model)

# Generate embedding
inputs = Bumblebee.apply_tokenizer(t, "test text")
output = predict.(m.params, inputs)
embedding = Bumblebee.Utils.Nx.normalize(output.embedding)
Nx.shape(embedding) # {1, 1024}
```

## Instruction-Aware Embeddings (Qwen Team Recommendation)

```elixir
# Setup
Nx.default_backend(EXLA.Backend)
{:ok, m} = Bumblebee.load_model({:hf, "Qwen/Qwen3-Embedding-0.6B"},
architecture: :for_embedding
)
{:ok, t} = Bumblebee.load_tokenizer({:hf, "Qwen/Qwen3-Embedding-0.6B"})
s = Bumblebee.Text.text_embedding(m, t,
output_attribute: :embedding,
embedding_processor: :l2_norm
)

# Without instruction
query = "What is the capital of France?"
q_plain = Nx.Serving.run(s, query)

# With instruction (recommended by Qwen team)
query_prompted = "Instruct: Given a web search query, retrieve relevant passages that answer the query
Query: What is the capital of France?"
q_with_prompt = Nx.Serving.run(s, query_prompted)

# Documents (no instruction needed)
doc = "Paris is the capital and largest city of France."
d = Nx.Serving.run(s, doc)

# Compare
Nx.dot(q_plain.embedding, d.embedding) |> Nx.to_number()
Nx.dot(q_with_prompt.embedding, d.embedding) |> Nx.to_number()
```

## Custom Task Instructions

```elixir
# Code search
code_query = "Instruct: Given a code search query, find relevant code snippets
Query: function to calculate factorial"

code_doc = "def factorial(n), do: if n <= 1, do: 1, else: n * factorial(n - 1)"

q = Nx.Serving.run(s, code_query)
d = Nx.Serving.run(s, code_doc)

Nx.dot(q.embedding, d.embedding) |> Nx.to_number() # High similarity
```

## Semantic Search Example

```elixir
# Index documents
documents = [
"Paris is the capital of France",
"Berlin is the capital of Germany",
"Machine learning uses neural networks",
"The Eiffel Tower is in Paris"
]

doc_embeddings = Enum.map(documents, fn doc ->
Nx.Serving.run(s, doc).embedding
end)

# Search
query = "Instruct: Given a web search query, retrieve relevant passages
Query: What is the French capital?"
q_emb = Nx.Serving.run(s, query).embedding

# Compute similarities
similarities = Enum.map(doc_embeddings, fn doc_emb ->
Nx.dot(q_emb, doc_emb) |> Nx.to_number()
end)

# Show results ranked by similarity
Enum.zip(documents, similarities)
|> Enum.sort_by(&elem(&1, 1), :desc)
|> Enum.each(fn {doc, score} ->
IO.puts("#{Float.round(score, 3)}: #{doc}")
end)
```

## Batch Processing

```elixir
# Process multiple texts at once
texts = [
"First document",
"Second document",
"Third document"
]

results = Nx.Serving.run(s, texts)

embeddings = Enum.map(results, & &1.embedding)
```

## Model Variants

```elixir
# Different sizes available
{:ok, m} = Bumblebee.load_model({:hf, "Qwen/Qwen3-Embedding-0.6B"}, architecture: :for_embedding)
{:ok, m} = Bumblebee.load_model({:hf, "Qwen/Qwen3-Embedding-4B"}, architecture: :for_embedding)
{:ok, m} = Bumblebee.load_model({:hf, "Qwen/Qwen3-Embedding-8B"}, architecture: :for_embedding)
```

## Common Similarity Metrics

```elixir
# Cosine similarity (recommended for normalized embeddings)
cosine_sim = fn e1, e2 -> Nx.dot(e1, e2) |> Nx.to_number() end

# Euclidean distance
euclidean = fn e1, e2 ->
Nx.subtract(e1, e2) |> Nx.pow(2) |> Nx.sum() |> Nx.sqrt() |> Nx.to_number()
end

# Manhattan distance
manhattan = fn e1, e2 ->
Nx.subtract(e1, e2) |> Nx.abs() |> Nx.sum() |> Nx.to_number()
end
```
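
As a quick sanity check, the metrics above agree on what "close" means for unit vectors: orthogonal vectors get cosine similarity 0 and Euclidean distance sqrt(2). A standalone sketch (uses `Mix.install` to pull in Nx; the version constraint is an assumption):

```elixir
Mix.install([{:nx, "~> 0.7"}])

cosine_sim = fn e1, e2 -> Nx.dot(e1, e2) |> Nx.to_number() end

euclidean = fn e1, e2 ->
  Nx.subtract(e1, e2) |> Nx.pow(2) |> Nx.sum() |> Nx.sqrt() |> Nx.to_number()
end

# Orthogonal unit vectors
e1 = Nx.tensor([1.0, 0.0])
e2 = Nx.tensor([0.0, 1.0])

IO.inspect(cosine_sim.(e1, e2))  # 0.0
IO.inspect(euclidean.(e1, e2))   # ~1.414 (sqrt(2))
```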
79 changes: 79 additions & 0 deletions examples/qwen3/qwen3.exs
@@ -0,0 +1,79 @@
#!/usr/bin/env elixir

# Qwen3-4B-Instruct Text Generation
#
# This example demonstrates using the Qwen3-4B-Instruct model for various
# text generation tasks including completion, chat, and code generation.
#
# Usage:
#   elixir examples/qwen3/qwen3.exs

Mix.install([
{:bumblebee, "~> 0.6.0"},
{:exla, ">= 0.0.0"}
])

Application.put_env(:nx, :default_backend, EXLA.Backend)

# Load model, tokenizer, and generation configuration
{:ok, model_info} = Bumblebee.load_model({:hf, "Qwen/Qwen3-4B-Instruct-2507"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "Qwen/Qwen3-4B-Instruct-2507"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "Qwen/Qwen3-4B-Instruct-2507"})

# Configure generation parameters
generation_config =
Bumblebee.configure(generation_config,
max_new_tokens: 100,
strategy: %{type: :multinomial_sampling, top_k: 20, top_p: 0.8},
temperature: 0.7
)

# Create text generation serving
serving = Bumblebee.Text.generation(model_info, tokenizer, generation_config)

# Example 1: Text Completion
IO.puts("\n=== Text Completion ===")
result = Nx.Serving.run(serving, "The future of artificial intelligence")
IO.puts(result.results |> hd() |> Map.get(:text))

# Example 2: Question Answering with Chat Format
IO.puts("\n=== Question Answering ===")

prompt = """
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What are the key features of the Elixir programming language?<|im_end|>
<|im_start|>assistant
"""

result = Nx.Serving.run(serving, prompt)
IO.puts(result.results |> hd() |> Map.get(:text))

# Example 3: Code Generation
IO.puts("\n=== Code Generation ===")

prompt = """
<|im_start|>system
You are an expert Elixir programmer.<|im_end|>
<|im_start|>user
Write a function to calculate the nth Fibonacci number using recursion.<|im_end|>
<|im_start|>assistant
"""

result = Nx.Serving.run(serving, prompt)
IO.puts(result.results |> hd() |> Map.get(:text))

# Example 4: Creative Writing
IO.puts("\n=== Creative Writing ===")

prompt = """
<|im_start|>system
You are a creative storyteller.<|im_end|>
<|im_start|>user
Write the opening paragraph of a science fiction story.<|im_end|>
<|im_start|>assistant
"""

result = Nx.Serving.run(serving, prompt)
IO.puts(result.results |> hd() |> Map.get(:text))