The library for normalizing text into Unix (\n
) or DOS (\r\n
) newline formats, using fast SIMD search and zero-copy when possible.
- Adds extension traits to
str
— call.to_unix_newlines()
and.to_dos_newlines()
directly. - Preserves input with
Cow<str>
— skips allocation if no changes are needed. - Converts
\r
and\r\n
into consistent Unix (\n
) or DOS (\r\n
) newlines. - Unicode-safe — preserves all characters without loss.
- Fast scanning with memchr and SIMD.
use newline_normalizer::{ToUnixNewlines, ToDosNewlines};
let unix = "line1\r\nline2\rline3".to_unix_newlines();
assert_eq!(unix, "line1\nline2\nline3");
let dos = "line1\nline2\nline3".to_dos_newlines();
assert_eq!(dos, "line1\r\nline2\r\nline3");
Benchmarks are in the /benches
folder.
Run them using:
cargo bench --bench to_unix
cargo bench --bench to_dos
All suggestions on how to improve the benchmarks are welcome.
Hardware: AMD Ryzen 9 9900X 12-Core Processor with 64 GB RAM.
Rust version: rustc 1.86.0 (05f9846f8 2025-03-31)
Benchmark framework: Criterion
Case | newline-converter |
This crate (newline_normalizer ) |
---|---|---|
Small Unicode paragraph | ~685.46 ns | ~88.789 ns 🚀 |
Small Unicode paragraph pre-normalized | ~151.39 ns | ~58.350 ns 🚀 |
The Adventures of Sherlock Holmes (608kb) | ~345.27 µs | ~138.26 µs 🚀 |
The Adventures of Sherlock Holmes (608kb) pre-normalized | ~342.91 µs | ~137.54 µs 🚀 |
Note: Pre-normalized means the input already has correct line endings and does not require changes.
Case | newline-converter |
regex replace all |
This crate (newline_normalizer ) |
---|---|---|---|
Small Unicode paragraph | ~1.0858 µs | ~101.42 ns | ~24.464 ns 🚀 |
Small Unicode paragraph pre-normalized | ~164.41 ns | ~20.744 ns | ~4.6608 ns 🚀 |
The Adventures of Sherlock Holmes (608kb) | ~680.12 µs | ~289.84 µs | ~89.150 µs 🚀 |
The Adventures of Sherlock Holmes (608kb) pre-normalized | ~318.83 µs | ~7.7864 µs | ~2.5146 µs 🚀 |
- Pre-normalized means the input text already uses the correct line endings.
- In such cases,
newline_normalizer
can skip allocations and return a borrowed reference. - Extremely low latency (e.g., ~4.66 ns) is achieved by using
Cow::Borrowed
, avoiding an allocation of a new string when the input does not change.
This crate does not alter Unicode content. It only rewrites newline boundaries.
All valid UTF-8 sequences are preserved, including:
- Combining characters stay attached
- Emoji and multi-codepoint sequences remain valid
- Right-to-left (RTL) markers are unaffected
This crate does not currently normalize U+2028 (LINE SEPARATOR) or U+2029 (PARA SEP). Only ASCII newline formats are converted.
This project is licensed under the MIT License.