[CHAPTER] New chapter on supervised fine tuning based on smol course #777

Merged · 31 commits · Feb 17, 2025

Commits:
- `7bc134b`: initial copy from smol-course (burtenshaw, Jan 29, 2025)
- `995493b`: convert smol course material into nlp course style (burtenshaw, Jan 30, 2025)
- `beec8b5`: review text and read through (burtenshaw, Jan 30, 2025)
- `4cb5f93`: add links to colab (burtenshaw, Feb 5, 2025)
- `f7fc25d`: add quiz app (burtenshaw, Feb 5, 2025)
- `564d9ec`: add toc (burtenshaw, Feb 5, 2025)
- `edcf049`: format code blocks (burtenshaw, Feb 5, 2025)
- `267c171`: combine pages together and add extra guidance (burtenshaw, Feb 5, 2025)
- `a9847d0`: update toc and format snippets (burtenshaw, Feb 5, 2025)
- `82b1d4a`: update structure (burtenshaw, Feb 5, 2025)
- `549612b`: followinf readthrough: simplify and add more tips (burtenshaw, Feb 6, 2025)
- `881865e`: format code blocks (burtenshaw, Feb 6, 2025)
- `a386bbf`: suggestions in intro page (burtenshaw, Feb 11, 2025)
- `6cefbc9`: respond to suggestions on chat templates page (burtenshaw, Feb 11, 2025)
- `3b7cc5a`: Update chapters/en/chapter11/3.mdx (burtenshaw, Feb 11, 2025)
- `c6800f1`: Update chapters/en/chapter11/5.mdx (burtenshaw, Feb 11, 2025)
- `844d715`: Update chapters/en/chapter11/5.mdx (burtenshaw, Feb 11, 2025)
- `7d0519c`: Merge branch 'add-supervised-finetuning' of https://github.com/burten… (burtenshaw, Feb 11, 2025)
- `3f9815c`: respond to suggestions in SFT page (burtenshaw, Feb 11, 2025)
- `c47a5a5`: improve loss illustrations on sft page (burtenshaw, Feb 11, 2025)
- `d66fa86`: respond to feedback in chat template (burtenshaw, Feb 12, 2025)
- `21c8dd1`: respond to feedback on sft section (burtenshaw, Feb 12, 2025)
- `5d4025d`: respond to feedback on lora section (burtenshaw, Feb 12, 2025)
- `e0ecc8c`: respond to feedback in unit 5 (burtenshaw, Feb 12, 2025)
- `2c30171`: update toc with new tag and subtitle (burtenshaw, Feb 14, 2025)
- `cc7ddee`: improve intro congruency with previous chapters (burtenshaw, Feb 14, 2025)
- `f040b6c`: make chat templates more about structure (burtenshaw, Feb 14, 2025)
- `6d2a54c`: add packing and references to the sft section (burtenshaw, Feb 14, 2025)
- `3a2ee3c`: fix qlora mistake in lora page (burtenshaw, Feb 14, 2025)
- `a02b2d2`: add more benchmarks to evaluation section (burtenshaw, Feb 14, 2025)
- `c74ebd3`: add final quizzes to quiz section (burtenshaw, Feb 17, 2025)
chapters/en/chapter11/1.mdx (new file, 33 additions)
# Supervised Fine-Tuning

This chapter will introduce fine-tuning generative language models with supervised fine-tuning (SFT). SFT involves adapting pre-trained models to specific tasks by further training them on task-specific datasets. This process helps models improve their performance on targeted tasks. We will separate this chapter into four sections:

## 1️⃣ Chat Templates

Chat templates structure interactions between users and AI models, ensuring consistent and contextually appropriate responses. They include components like system prompts and role-based messages.

## 2️⃣ Supervised Fine-Tuning

Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks. It involves training the model on a task-specific dataset with labeled examples. The SFT section of this chapter provides a detailed guide, including key steps and best practices.

## 3️⃣ Low Rank Adaptation (LoRA)

Low Rank Adaptation (LoRA) is a technique for fine-tuning language models by adding low-rank matrices to the model's layers. This allows for efficient fine-tuning while preserving the model's pre-trained knowledge.
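The core idea can be sketched in a few lines of NumPy (illustrative sizes and scaling; real fine-tuning would use a library such as 🤗 PEFT rather than this hand-rolled version):

```python
import numpy as np

d, r = 768, 8  # hidden size and LoRA rank (illustrative values)
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))  # frozen pre-trained weight matrix
A = rng.normal(size=(r, d)) * 0.01  # trainable low-rank "down" projection
B = np.zeros((d, r))  # trainable "up" projection, zero-initialized

alpha = 16  # scaling factor
W_effective = W + (alpha / r) * (B @ A)  # weight used during fine-tuning

# B starts at zero, so the adapted model initially matches the base model,
# and only the 2 * r * d low-rank parameters are trained instead of d * d.
print(f"trainable fraction: {(A.size + B.size) / W.size:.2%}")  # → 2.08%
```

Since `B @ A` has rank at most `r`, the update is cheap to store and can be merged back into `W` after training, which is why LoRA preserves the model's pre-trained knowledge while staying efficient.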

## 4️⃣ Evaluation

Evaluation is a crucial step in the fine-tuning process. It allows us to measure the performance of the model on a task-specific dataset.

<Tip>
⚠️ In order to benefit from all features available with the Model Hub and 🤗 Transformers, we recommend <a href="https://huggingface.co/join">creating an account</a>.
</Tip>

## References

- [Transformers documentation on chat templates](https://huggingface.co/docs/transformers/main/en/chat_templating)
- [Script for Supervised Fine-Tuning in TRL](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)
- [`SFTTrainer` in TRL](https://huggingface.co/docs/trl/main/en/sft_trainer)
- [Direct Preference Optimization Paper](https://arxiv.org/abs/2305.18290)
- [Supervised Fine-Tuning with TRL](https://huggingface.co/docs/trl/main/en/tutorials/supervised_finetuning)
- [How to fine-tune Google Gemma with ChatML and Hugging Face TRL](https://www.philschmid.de/fine-tune-google-gemma)
- [Fine-tuning LLM to Generate Persian Product Catalogs in JSON Format](https://huggingface.co/learn/cookbook/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format)
chapters/en/chapter11/10.mdx (new file, 56 additions)
# Implementing Evaluation

In this section, we will implement evaluation for our finetuned model. We can use `lighteval` to evaluate it on standard benchmarks; the library has a wide range of tasks built in. We just need to define the tasks we want to evaluate and the parameters for the evaluation.

LightEval tasks are defined using a specific format:

```
{suite}|{task}|{num_few_shot}|{auto_reduce}
```

| Parameter | Description |
|-----------|-------------|
| `suite` | The benchmark suite (e.g., 'mmlu', 'truthfulqa') |
| `task` | Specific task within the suite (e.g., 'abstract_algebra') |
| `num_few_shot` | Number of examples to include in prompt (0 for zero-shot) |
| `auto_reduce` | Whether to automatically reduce few-shot examples if prompt is too long (0 or 1) |

Example: `"mmlu|abstract_algebra|0|0"` evaluates on MMLU's abstract algebra task with zero-shot inference.
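When evaluating on many tasks, these strings can be assembled programmatically. A small sketch (the helper name below is ours, not part of LightEval):

```python
def lighteval_task(suite, task, num_few_shot=0, auto_reduce=0):
    """Build a task string of the form {suite}|{task}|{num_few_shot}|{auto_reduce}."""
    return f"{suite}|{task}|{num_few_shot}|{auto_reduce}"

tasks = [lighteval_task("mmlu", t) for t in ["anatomy", "professional_medicine"]]
print(tasks)  # ['mmlu|anatomy|0|0', 'mmlu|professional_medicine|0|0']
```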

## Example Evaluation Pipeline

Let's set up an evaluation pipeline for our finetuned model. We will evaluate the model on a set of subtasks that relate to the domain of medicine.

Here's a complete example of evaluating on automatic benchmarks relevant to one specific domain using Lighteval with the VLLM backend:

```bash
lighteval vllm \
"pretrained=your-model-name" \
"mmlu|anatomy|0|0" \
"mmlu|high_school_biology|0|0" \
"mmlu|high_school_chemistry|0|0" \
"mmlu|professional_medicine|0|0" \
--max_samples 40 \
--batch_size 1 \
--output_path "./results" \
--save_generations true
```

Results are displayed in a tabular format showing:

```
| Task |Version|Metric|Value | |Stderr|
|----------------------------------------|------:|------|-----:|---|-----:|
|all | |acc |0.3333|± |0.1169|
|leaderboard:mmlu:_average:5 | |acc |0.3400|± |0.1121|
|leaderboard:mmlu:anatomy:5 | 0|acc |0.4500|± |0.1141|
|leaderboard:mmlu:high_school_biology:5 | 0|acc |0.1500|± |0.0819|
```

Lighteval also includes a Python API for more detailed evaluation tasks, which is useful for manipulating the results in a more flexible way. Check out the [Lighteval documentation](https://huggingface.co/docs/lighteval/using-the-python-api) for more information.

<Tip>

✏️ **Try it out!** Evaluate your finetuned model on a specific task in lighteval.

</Tip>
chapters/en/chapter11/11.mdx (new file, 13 additions)
# Conclusion

In this chapter, we explored the essential components of fine-tuning language models:

1. **Chat Templates** provide structure to model interactions, ensuring consistent and appropriate responses through standardized formatting.

2. **Supervised Fine-Tuning (SFT)** allows adaptation of pre-trained models to specific tasks while maintaining their foundational knowledge.

3. **LoRA** offers an efficient approach to fine-tuning by reducing trainable parameters while preserving model performance.

4. **Evaluation** helps measure and validate the effectiveness of fine-tuning through various metrics and benchmarks.

These techniques, when combined, enable the creation of specialized language models that can excel at specific tasks while remaining computationally efficient. Whether you're building a customer service bot or a domain-specific assistant, understanding these concepts is crucial for successful model adaptation.
chapters/en/chapter11/2.mdx (new file, 66 additions)
# Chat Templates

Chat templates are essential for structuring interactions between language models and users. They provide a consistent format for conversations, ensuring that models understand the context and role of each message while maintaining appropriate response patterns.

## Base Models vs Instruct Models

A base model is trained on raw text data to predict the next token, while an instruct model is fine-tuned specifically to follow instructions and engage in conversations. For example, `SmolLM2-135M` is a base model, while `SmolLM2-135M-Instruct` is its instruction-tuned variant.

To make a base model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand. This is where chat templates come in. ChatML is one such template format that structures conversations with clear role indicators (system, user, assistant).

It's important to note that a base model could be fine-tuned on different chat templates, so when we're using an instruct model we need to make sure we're using the correct chat template.

## Understanding Chat Templates

At their core, chat templates are structured string representations of conversations. They define how conversations should be formatted when communicating with a language model. They include system-level instructions, user messages, and assistant responses in a structured format that the model can understand. This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs. Below is an example of a chat template:

```sh
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
```
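To make the structure concrete, here is a minimal, hand-rolled ChatML formatter (a simplified sketch; in practice the tokenizer's chat template does this for you, as described next):

```python
def format_chatml(messages, add_generation_prompt=False):
    """Render a list of {role, content} messages as a ChatML string."""
    text = ""
    for message in messages:
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn to cue the model to respond next
        text += "<|im_start|>assistant\n"
    return text

print(format_chatml([{"role": "user", "content": "Hi there!"}], add_generation_prompt=True))
```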

The `transformers` library will take care of chat templates for you in relation to the model's tokenizer. Read more about how transformers builds chat templates [here](https://huggingface.co/docs/transformers/en/chat_templating#how-do-i-use-chat-templates). All we have to do is structure our messages in the correct way and the tokenizer will take care of the rest. Here's a basic example of a conversation:

```python
messages = [
{"role": "system", "content": "You are a helpful assistant focused on technical topics."},
{"role": "user", "content": "Can you explain what a chat template is?"},
{"role": "assistant", "content": "A chat template structures conversations between users and AI models..."}
]
```

Let's break down the above example, and see how it maps to the chat template format.

## System Messages

System messages set the foundation for how the model should behave. They act as persistent instructions that influence all subsequent interactions. For example:

```python
system_message = {
"role": "system",
"content": "You are a professional customer service agent. Always be polite, clear, and helpful."
}
```

## Conversations

Chat templates can maintain context through conversation history, storing previous exchanges between users and the assistant. This allows for more coherent multi-turn conversations:

```python
conversation = [
{"role": "user", "content": "I need help with my order"},
{"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
{"role": "user", "content": "It's ORDER-123"},
]
```

<Tip>

✏️ **Try it out!** Create a chat template for a conversation between a user and an assistant. Then, use the `transformers` library to tokenize the conversation and see how the model responds. You won't need to download the model to do this, as the tokenizer will handle the formatting.

</Tip>
chapters/en/chapter11/3.mdx (new file, 83 additions)
# Implementation with Transformers

Now that we understand how chat templates work, let's see how we can implement them using the `transformers` library. The library provides built-in support for chat templates; we just need to use the `apply_chat_template()` method to format our messages.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")

messages = [
{"role": "system", "content": "You are a helpful coding assistant."},
{"role": "user", "content": "Write a Python function to sort a list"},
]

# Apply the chat template
formatted_chat = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
```

This will return a formatted string that can be passed to the model. For the `SmolLM2-135M-Instruct` tokenizer loaded above, and because we passed `add_generation_prompt=True`, the string ends with an opening assistant tag that cues the model to respond:

```sh
<|im_start|>system
You are a helpful coding assistant.<|im_end|>
<|im_start|>user
Write a Python function to sort a list<|im_end|>
<|im_start|>assistant
```

Note that the `im_start` and `im_end` tokens are used to indicate the start and end of a message. The tokenizer will also have corresponding special tokens for the start and end of messages. For a refresher on how these tokens work, see the [Tokenizers](../chapter2/5.mdx) section.

Chat templates can handle multi-turn conversations while maintaining context:

```python
messages = [
{"role": "system", "content": "You are a math tutor."},
{"role": "user", "content": "What is calculus?"},
{"role": "assistant", "content": "Calculus is a branch of mathematics..."},
{"role": "user", "content": "Can you give me an example?"},
]
```

## Working with Chat Templates

When working with chat templates, you have several options for processing the conversation:

1. Apply the template without tokenization to return the raw formatted string
2. Apply the template with tokenization to return the token IDs
3. Add a generation prompt to prepare for model inference

The tokenizer's `apply_chat_template()` method handles all these cases through its parameters:

- `tokenize`: Whether to return token IDs (True) or the formatted string (False)
- `add_generation_prompt`: Whether to add a prompt for the model to generate a response

<Tip>

✏️ **Try it out!** Take a dataset from the Hugging Face hub and process it for Supervised Fine-Tuning (SFT). Convert the `HuggingFaceTB/smoltalk` dataset into chatml format and save it to a new file.

For this exercise, you'll need to:
1. Load the dataset using the Hugging Face datasets library
2. Create a processing function that converts the samples into the correct chat format
3. Apply the chat template using the tokenizer's methods

</Tip>
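One possible shape for step 2, sketched without downloading anything (we assume each sample carries a `messages` list of role/content dicts; in step 3 you would swap the hand-rolled formatting below for `tokenizer.apply_chat_template`):

```python
def to_chatml(sample):
    """Convert one dataset sample with a 'messages' list into a ChatML string."""
    text = ""
    for message in sample["messages"]:
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    return {"text": text}

# With 🤗 Datasets you would apply this with: dataset.map(to_chatml)
sample = {
    "messages": [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi! How can I help?"},
    ]
}
print(to_chatml(sample)["text"])
```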

## Conclusion

Chat templates are a crucial component for working with language models, especially when fine-tuning or deploying models for chat applications. They provide structure and consistency to conversations, making it easier for models to understand context and generate appropriate responses.

Understanding how to work with chat templates is essential for:
- Converting datasets for fine-tuning
- Preparing inputs for model inference
- Maintaining conversation context
- Ensuring consistent model behavior

## Resources

- [Hugging Face Chat Templating Guide](https://huggingface.co/docs/transformers/main/en/chat_templating)
- [Transformers Documentation](https://huggingface.co/docs/transformers)
- [Chat Templates Examples Repository](https://github.com/chujiezheng/chat_templates)
chapters/en/chapter11/4.mdx (new file, 125 additions)
# Supervised Fine-Tuning

Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on relevant datasets with human-validated examples.

Because of the supervised structure of the task, the model can learn to generate structured outputs, such as the chat templates we created in the previous sections.

## Understanding Supervised Fine-Tuning

Supervised fine-tuning is about teaching a pre-trained model to perform specific tasks, and use specific output structures, through examples of labeled tokens. The process involves showing the model many examples of the desired input-output behavior, allowing it to learn the patterns specific to your use case.

SFT is effective because it uses the foundational knowledge acquired during pre-training while adapting the model's behavior to match your specific needs.
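Concretely, "examples of labeled tokens" usually means that the loss is computed only on the tokens we want the model to learn to produce (the assistant's reply), with prompt positions masked out using the label value `-100`, which PyTorch's cross-entropy loss ignores. A toy illustration with made-up token ids:

```python
prompt_ids = [101, 2023, 2003]      # hypothetical tokenized user prompt
completion_ids = [1037, 2742, 102]  # hypothetical tokenized assistant reply

input_ids = prompt_ids + completion_ids
# Mask the prompt so the loss only rewards reproducing the completion:
labels = [-100] * len(prompt_ids) + completion_ids
print(labels)  # [-100, -100, -100, 1037, 2742, 102]
```

Libraries like TRL handle this masking for you, but keeping the mechanism in mind explains why the quality of the labeled completions matters so much.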

## When to Use Supervised Fine-Tuning

The decision to use SFT often comes down to the gap between your model's current capabilities and your specific requirements. SFT becomes particularly valuable when you need precise control over the model's outputs or when working in specialized domains.

Two core reasons to use SFT are:

1. **Template Control**: SFT allows you to control the output structure of the model, ensuring that it generates outputs in a specific format. For example, you might need the model to consistently follow a particular chat template when generating structured outputs.

2. **Domain-Specific Requirements**: SFT is effective when you need precise control over the model's outputs in specialized domains. For example, if you're developing a customer service application, you might want your model to consistently follow company guidelines and handle technical queries in a standardized way. SFT can help align the model's responses with professional standards and domain expertise.

## Quiz

### 1. What is the primary purpose of Supervised Fine-Tuning (SFT)?

<Question
choices={[
{
text: "To train a language model from scratch",
explain: "SFT builds upon pre-trained models rather than training from scratch."
},
{
text: "To adapt a pre-trained model to specific tasks or domains while maintaining its foundational knowledge",
explain: "Correct! SFT allows models to learn specific tasks while leveraging their pre-trained capabilities.",
correct: true
},
{
text: "To compress a large language model into a smaller one",
explain: "This is more related to model distillation, not SFT."
}
]}
/>

### 2. Which of the following are valid reasons to use SFT?

<Question
choices={[
{
text: "Template Control - ensuring the model generates outputs in a specific format",
explain: "Yes! SFT helps enforce specific output structures through training examples.",
correct: true
},
{
text: "Domain Adaptation - teaching the model domain-specific knowledge and terminology",
explain: "Correct! SFT is excellent for adapting models to specialized domains.",
correct: true
},
{
text: "Model Architecture Changes - modifying the underlying structure of the model",
explain: "SFT doesn't change the model architecture, it only updates the weights."
}
]}
/>

### 3. What is required for effective Supervised Fine-Tuning?

<Question
choices={[
{
text: "A pre-trained language model",
explain: "Yes! SFT starts with a pre-trained model as its foundation.",
correct: true
},
{
text: "Validated examples of desired input-output behavior",
explain: "Correct! Quality training data is crucial for successful SFT.",
correct: true
},
{
text: "A high performing reference model",
explain: "A reference model is needed for preference-optimization methods like DPO, not for SFT."
}
]}
/>

### 4. How does SFT relate to chat templates?

<Question
choices={[
{
text: "SFT can train models to consistently follow specific chat templates",
explain: "Correct! SFT helps models learn to generate responses in the desired template format.",
correct: true
},
{
text: "Chat templates are not compatible with SFT",
explain: "Incorrect! Chat templates are commonly used with SFT for structured outputs."
},
{
text: "SFT automatically creates chat templates",
explain: "SFT doesn't create templates, it trains models to use existing templates."
}
]}
/>

### 5. What distinguishes SFT from pre-training?

<Question
choices={[
{
text: "SFT uses labeled data for specific tasks",
explain: "Yes! SFT requires examples of desired behavior for specific tasks.",
correct: true
},
{
text: "SFT is faster than pre-training",
explain: "The speed difference isn't a defining characteristic; it depends on various factors."
},
{
text: "SFT requires more data than pre-training",
explain: "Actually, SFT typically uses less data than pre-training, focusing on task-specific examples."
}
]}
/>