Pull requests: huggingface/tokenizers
Pull requests list
Fix bytelevel decode of added tokens + 27x faster deserialization
#1995
opened Mar 27, 2026 by
ArthurZucker
Fix node-release: all platforms, zig cross-compilation, universal macOS
#1970
opened Mar 19, 2026 by
MayCXC
feat: performance, adding pcre2 backend + regex-shards (5-15% speedup)
#1968
opened Mar 19, 2026 by
michaelfeil
feat: Optimize BPE tokenization: sharded cache, packed merge keys, FxHash (10-15% speedup)
#1967
opened Mar 19, 2026 by
michaelfeil
Fix type_ids not applied to overflow encodings
#1965
opened Mar 17, 2026 by
joaquinhuigomez
Add get_special_tokens and is_special_token methods
#1945
opened Feb 5, 2026 by
ArthurZucker
2 tasks done
Add post_process_tokens and post_process_ids methods
#1944
opened Feb 5, 2026 by
ArthurZucker
3 tasks done
feat: add unk_token property to Unigram model
#1943
opened Feb 5, 2026 by
ArthurZucker
4 tasks done
🚨 feat: add role_to_token field for special token metadata
#1942
opened Feb 5, 2026 by
ArthurZucker
Use unicode-normalization instead of unicode-normalization-alignments
#1912
opened Dec 14, 2025 by
IvanIsCoding
C and C++ bindings to Tokenizers
Labels: bindings, Feature Request
#1888
opened Nov 21, 2025 by
thammegowda