This repository was archived by the owner on Jan 11, 2021. It is now read-only.

sunchao/parquet-rs

parquet-rs

An Apache Parquet implementation in Rust.

NOTE: this project has been merged into Apache Arrow, and development will continue there. To report an issue or propose a change, please file a JIRA in the Arrow project.

Usage

Add this to your Cargo.toml:

[dependencies]
parquet = "0.4"

and this to your crate root:

extern crate parquet;

Example usage of reading data:

use std::fs::File;
use parquet::file::reader::{FileReader, SerializedFileReader};

fn main() {
  // Open the Parquet file; unwrap() panics if it cannot be opened or parsed.
  let file = File::open("/path/to/file").unwrap();
  let reader = SerializedFileReader::new(file).unwrap();
  // No projection is passed (None), so full records are read.
  let iter = reader.get_row_iter(None).unwrap();
  // Print each record in the file.
  for record in iter {
    println!("{}", record);
  }
}

See the crate documentation for the available API.

Supported Parquet Version

  • Parquet-format 2.4.0

To update to a newer Parquet format version, check whether a matching version of the parquet-format crate is available, then update the version of the parquet-format crate in Cargo.toml.

Features

  • All encodings supported
  • All compression codecs supported
  • Read support
    • Primitive column value readers
    • Row record reader
    • Arrow record reader
  • Statistics support
  • Write support
    • Primitive column value writers
    • Row record writer
    • Arrow record writer
  • Predicate pushdown
  • Parquet format 2.5 support
  • HDFS support

Requirements

  • Rust nightly

See Working with nightly Rust for how to install the nightly toolchain and set it as the default.

Build

Run cargo build for a debug build, or cargo build --release to build in release mode. Some features take advantage of SSE4.2 instructions, which can be enabled by setting RUSTFLAGS="-C target-feature=+sse4.2" before the cargo build command.
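To confirm that the SSE4.2 flag actually took effect, a small standalone check can help. The sketch below is not part of parquet-rs; the function names compiled_with_sse42 and cpu_supports_sse42 are hypothetical and used only for illustration. It relies solely on the standard library: cfg!(target_feature = "sse4.2") reports what the compiler was told to enable, and is_x86_feature_detected! reports what the running CPU supports.

```rust
/// Hypothetical helper: true if this binary was compiled with SSE4.2
/// enabled, for example via RUSTFLAGS="-C target-feature=+sse4.2".
fn compiled_with_sse42() -> bool {
    cfg!(target_feature = "sse4.2")
}

/// Hypothetical helper: true if the CPU running this program supports
/// SSE4.2 (runtime detection, x86 and x86_64 only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_supports_sse42() -> bool {
    is_x86_feature_detected!("sse4.2")
}

/// On other architectures SSE4.2 does not exist, so report false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_supports_sse42() -> bool {
    false
}

fn main() {
    println!("compiled with SSE4.2: {}", compiled_with_sse42());
    println!("CPU supports SSE4.2:  {}", cpu_supports_sse42());
}
```

Building this once with and once without the RUSTFLAGS setting should show the first line change, while the second line depends only on the machine.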

Test

Run cargo test for unit tests.

Binaries

The following binaries are provided (use cargo install to install them):

  • parquet-schema for printing the schema and metadata of a Parquet file. Usage: parquet-schema <file-path> [verbose], where file-path is the path to a Parquet file, and the optional verbose is a boolean flag that selects between printing full metadata and printing the schema only (when not specified, only the schema is printed).

  • parquet-read for reading records from a Parquet file. Usage: parquet-read <file-path> [num-records], where file-path is the path to a Parquet file, and the optional num-records is the number of records to read from the file (when not specified, all records are printed).
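The `<file-path> [num-records]` convention described above, with one required positional argument and one optional numeric argument, can be sketched with the standard library alone. This is only an illustration of the argument shape, not the actual parquet-read source; the ReadArgs type and parse_read_args function are hypothetical names.

```rust
use std::env;

/// Hypothetical container for the parsed arguments:
/// the file path and an optional limit on the number of records.
#[derive(Debug, PartialEq)]
struct ReadArgs {
    file_path: String,
    num_records: Option<usize>,
}

/// Parses `<file-path> [num-records]` from the arguments that follow
/// the program name. Returns a usage or parse error as a String.
fn parse_read_args(args: &[String]) -> Result<ReadArgs, String> {
    match args {
        // Only the path given: no limit, so all records would be read.
        [path] => Ok(ReadArgs { file_path: path.clone(), num_records: None }),
        // Path plus a count: the count must be a non-negative integer.
        [path, count] => {
            let n = count
                .parse::<usize>()
                .map_err(|e| format!("invalid num-records '{}': {}", count, e))?;
            Ok(ReadArgs { file_path: path.clone(), num_records: Some(n) })
        }
        _ => Err("usage: parquet-read <file-path> [num-records]".to_string()),
    }
}

fn main() {
    // Skip the program name and parse the remaining arguments.
    let args: Vec<String> = env::args().skip(1).collect();
    match parse_read_args(&args) {
        Ok(parsed) => println!("{:?}", parsed),
        Err(msg) => eprintln!("{}", msg),
    }
}
```

Modelling the optional argument as Option<usize> keeps "not specified" distinct from an explicit count of zero.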

If you see a Library not loaded error, make sure LD_LIBRARY_PATH is set properly:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(rustc --print sysroot)/lib

Benchmarks

Run cargo bench for benchmarks.

Docs

To build the documentation, run cargo doc --no-deps. To build it and open it in the browser, run cargo doc --no-deps --open.

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.