
Use prompt template from HF tokenizer #131

@metascroy

Description

Currently we hard-code prompt templates in ExecuTorch LLM apps.

But HF tokenizers know how to apply the model's chat template, e.g.:

from transformers import AutoTokenizer

# Example model; the enable_thinking kwarg below is specific to Qwen3's template.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # switches between thinking and non-thinking modes; default is True
)

This makes it easy to use HF models from Python with the right template.

Can we have similar logic in our C++ runners?
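
For concreteness, here is a minimal sketch of what that could look like on the C++ side. Everything below is hypothetical, not existing ExecuTorch API: the Message struct and apply_chat_template function are illustrative names, and a hard-coded ChatML renderer stands in for the real step, which would be loading the Jinja2 chat_template string from tokenizer_config.json and rendering it with a small Jinja engine.

// Hypothetical sketch -- names are illustrative, not ExecuTorch API.
// A real implementation would parse the Jinja2 "chat_template" string
// from tokenizer_config.json and render it; the hard-coded ChatML
// formatting here (the format Qwen models use) stands in for that.
#include <iostream>
#include <string>
#include <vector>

struct Message {
  std::string role;     // "system", "user", or "assistant"
  std::string content;
};

// Render messages into a prompt string, mirroring the Python call
// tokenizer.apply_chat_template(messages, tokenize=False,
//                               add_generation_prompt=...).
std::string apply_chat_template(
    const std::vector<Message>& messages,
    bool add_generation_prompt = true) {
  std::string out;
  for (const auto& m : messages) {
    out += "<|im_start|>" + m.role + "\n" + m.content + "<|im_end|>\n";
  }
  if (add_generation_prompt) {
    out += "<|im_start|>assistant\n";  // cue the model to respond
  }
  return out;
}

int main() {
  std::vector<Message> messages = {
      {"user", "Give me a short introduction to large language models."},
  };
  std::cout << apply_chat_template(messages);
  return 0;
}

The point of the sketch is the interface: the runner derives the prompt format from the tokenizer artifacts shipped with the model, instead of baking one format into each app.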
