qsv-sniffer provides methods to infer CSV file metadata (delimiter choice, quote character,
number of fields, field names, field data types, etc.). See the documentation for more details.
It's a detached fork of csv-sniffer with these additional capabilities, detecting:
- utf-8 encoding
- field names
- number of rows
- average record length
- additional data types - Date/DateTime and NULL
- smarter Boolean type detection - in addition to "true" and "false", it also detects 1/0, yes/no, y/n, and t/f as Boolean values, all case-insensitively
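As a rough illustration of the Boolean rule in the list above, the following sketch checks a single value against those spellings, ignoring case. The function name `looks_boolean` and the `BOOLEAN_TOKENS` table are hypothetical and are not part of the qsv-sniffer API; the crate's actual detection logic may differ.

```rust
/// Hypothetical helper, NOT part of the qsv-sniffer API: returns true when
/// `value` matches one of the Boolean spellings listed above, ignoring case.
fn looks_boolean(value: &str) -> bool {
    // Spellings taken from the capability list: 1/0, yes/no, y/n, true/false, t/f.
    const BOOLEAN_TOKENS: [&str; 10] = [
        "1", "0", "yes", "no", "y", "n", "true", "false", "t", "f",
    ];
    // Lowercase once so the comparison is case-insensitive.
    let normalized = value.to_ascii_lowercase();
    BOOLEAN_TOKENS.contains(&normalized.as_str())
}

fn main() {
    // Print the verdict for a few sample field values.
    for sample in ["TRUE", "y", "No", "0", "maybe", "2"] {
        println!("{sample:?} -> {}", looks_boolean(sample));
    }
}
```

A real sniffer would presumably apply such a check to every sampled value in a column before settling on a type; this sketch only covers the per-value test.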
ℹ️ NOTE: This fork is optimized to support qsv, and its development will be primarily dictated by qsv's requirements.
cargo install qsv-sniffer
This will install a binary named sniff.
Add this to your Cargo.toml:
[dependencies]
qsv-sniffer = "0.9"
and this to your crate root:
use qsv_sniffer;
Feature flags:
- cli - to build the sniff binary
- runtime-dispatch-simd - enables detection of SIMD capabilities at runtime, which allows using the SSE2 and AVX2 code paths (only works on Intel and AMD architectures; ignored on other architectures)
- generic-simd - enables architecture-agnostic SIMD capabilities, but only works with Rust nightly
The SIMD features are mutually exclusive and increase sampling performance.
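As a minimal sketch, one of the SIMD features could be enabled in Cargo.toml using Cargo's standard `features` key. The feature name comes from the description above; the version string matches the dependency line shown earlier. Whether any of these features are already enabled by default is not stated here, so treat this as an assumption-laden example rather than the recommended configuration.

```toml
[dependencies]
# Pick at most one SIMD feature, since the two are mutually exclusive.
qsv-sniffer = { version = "0.9", features = ["runtime-dispatch-simd"] }
```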
This example shows how to write a simple command-line tool for discovering the metadata of a CSV file:
use qsv_sniffer;
use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        eprintln!("Usage: {} <file>", args[0]);
        ::std::process::exit(1);
    }

    // sniff the path provided by the first argument
    match qsv_sniffer::Sniffer::new().sniff_path(&args[1]) {
        Ok(metadata) => {
            println!("{}", metadata);
        },
        Err(err) => {
            eprintln!("ERROR: {}", err);
        }
    }
}
This example is provided as the primary binary for this crate. In the source directory, this can be run as:
$ cargo run -- tests/data/library-visitors.csv