Describe the bug
Kimi-K2.6 (moonshotai/Kimi-K2.6) fails to start with a download error or AttributeError after Moonshotai pushed commit 81bcaaa to the HuggingFace repo on 2026-05-11, which removed tokenization_kimi.py (the slow tokenizer).
Exo's load_tokenizer_for_model_id in utils_mlx.py explicitly loads this file and requires the TikTokenTokenizer class with its .model tiktoken attribute for _patched_encode. The fast tokenizer (TikTokenTokenizerFast) cannot substitute — it uses a Rust backend, not tiktoken.
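The failure mode can be illustrated with a minimal sketch. The class and function names mirror the report, but the bodies are stand-ins, not exo's actual code: the point is only that code which dereferences a tiktoken `.model` attribute breaks when handed a fast tokenizer that never populates it.

```python
# Hedged sketch (stand-in classes, not exo's real implementation) of why
# the fast tokenizer cannot substitute for the slow one in _patched_encode.

class FakeEncoding:
    """Stand-in for tiktoken.Encoding."""
    def encode(self, text, allowed_special="all"):
        return [ord(c) for c in text]  # toy encoding for illustration

class TikTokenTokenizer:
    """Slow tokenizer: exposes the tiktoken object as .model."""
    def __init__(self):
        self.model = FakeEncoding()

class TikTokenTokenizerFast:
    """Fast tokenizer: Rust backend; no tiktoken .model is populated."""
    model = None

def patched_encode(hf_tokenizer, text):
    # exo's Kimi code path assumes .model is a tiktoken.Encoding
    return hf_tokenizer.model.encode(text, allowed_special="all")

print(patched_encode(TikTokenTokenizer(), "hi"))  # [104, 105]
try:
    patched_encode(TikTokenTokenizerFast(), "hi")
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'encode'
```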
To Reproduce
1. Start exo on any node (source or .app)
2. Place a Kimi-K2.6 instance
3. Exo syncs metadata from HuggingFace, tries to download tokenization_kimi.py
4. File returns 404
5. Instance enters start→fail→shutdown loop
Error messages
Download phase:
File not found: https://huggingface.co/moonshotai/Kimi-K2.6/resolve/main/tokenization_kimi.py
If download is bypassed, tokenizer load phase:
AttributeError: module 'tokenization_kimi' has no attribute 'TikTokenTokenizer'
Or if fast tokenizer is substituted:
AttributeError: 'NoneType' object has no attribute 'encode'
at utils_mlx.py line 392 in _patched_encode, because hf_tokenizer.model is None on the fast tokenizer.
Root cause
Moonshotai's commit 81bcaaa ("use-fast-tokenizer #38") removed tokenization_kimi.py from the K2.6 repo but:
The file still exists in moonshotai/Kimi-K2-Instruct (same tokenizer, same vocab)
tokenizer_config.json still has "auto_map": {"AutoTokenizer": [null, "tokenization_kimi_fast.TikTokenTokenizerFast"]} — the null slow tokenizer slot
Exo's Kimi-specific code path requires the slow tokenizer's tiktoken.Encoding object
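The null slot makes the breakage detectable before any download is attempted: the slow-tokenizer entry in auto_map is literally null. A minimal sketch of such a check (the JSON fragment is copied from the report; the surrounding exo integration is assumed):

```python
import json

# Hedged sketch: the null slot in auto_map signals that no slow tokenizer
# is registered for this repo.
tokenizer_config = json.loads(
    '{"auto_map": {"AutoTokenizer": '
    '[null, "tokenization_kimi_fast.TikTokenTokenizerFast"]}}'
)
slow_cls, fast_cls = tokenizer_config["auto_map"]["AutoTokenizer"]
if slow_cls is None:
    print("no slow tokenizer registered; only", fast_cls, "is available")
```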
Workaround
Copy the slow tokenizer from the K2-Instruct repo and lock it read-only:
MODEL_DIR="$HOME/.exo/models/moonshotai--Kimi-K2.6"
curl -sL "https://huggingface.co/moonshotai/Kimi-K2-Instruct/raw/main/tokenization_kimi.py" \
  -o "$MODEL_DIR/tokenization_kimi.py"
chmod 444 "$MODEL_DIR/tokenization_kimi.py"
chmod 444 "$MODEL_DIR/tokenizer_config.json"
The chmod 444 prevents exo from overwriting the files during metadata re-sync on restart.
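A quick sanity check that the workaround took effect. The sketch below uses a temp dir with a stub file so it is self-contained; against a real install, substitute the MODEL_DIR from the workaround above and skip the stub-creation lines:

```shell
# Self-contained sketch: stub MODEL_DIR stands in for ~/.exo/models/...
MODEL_DIR="$(mktemp -d)"
printf 'class TikTokenTokenizer:\n    pass\n' > "$MODEL_DIR/tokenization_kimi.py"
chmod 444 "$MODEL_DIR/tokenization_kimi.py"

# 1. The file must define the class exo looks up
grep -q "class TikTokenTokenizer" "$MODEL_DIR/tokenization_kimi.py" \
  && echo "slow tokenizer present"
# 2. Permissions should read -r--r--r-- so metadata re-sync cannot overwrite it
ls -l "$MODEL_DIR/tokenization_kimi.py" | cut -c1-10
```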
Suggested fixes
Don't hard-fail if tokenization_kimi.py is 404 during download — skip it gracefully
Fall back to fast tokenizer when slow tokenizer is missing — adapt _patched_encode to work with TikTokenTokenizerFast or use the fast tokenizer's native encode path
Pin model revision to avoid silent upstream breakage from HF repo changes
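The second fix could be as small as a guarded attribute lookup. A hedged sketch, with illustrative names rather than exo's real signatures:

```python
# Hedged sketch of the fallback: prefer the tiktoken .model when present,
# otherwise use the fast tokenizer's own encode. Stand-in names throughout.

def encode_with_fallback(hf_tokenizer, text):
    model = getattr(hf_tokenizer, "model", None)
    if model is not None:
        # slow path: tiktoken.Encoding, as _patched_encode uses today
        return model.encode(text, allowed_special="all")
    # fast path: the Rust-backed tokenizer already knows how to encode
    return hf_tokenizer.encode(text)

class FastStub:
    """Stand-in for TikTokenTokenizerFast: no tiktoken model attached."""
    model = None
    def encode(self, text):
        return [ord(c) for c in text]  # toy encoding for illustration

print(encode_with_fallback(FastStub(), "ok"))  # [111, 107]
```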
Environment
Exo: latest .app from exolabs.net + source at commit 87c72fc
Hardware: 4× Mac Studio M3 Ultra, RDMA/Thunderbolt, tensor sharding
macOS: 26.4.1
Model: moonshotai/Kimi-K2.6 (595GB, unquantized)
Worked before: 2026-05-10 (prior to HF commit 81bcaaa)