README.md: 21 additions & 16 deletions
@@ -1,8 +1,10 @@
 # parquet2json
 
-A command-line tool for converting [Parquet](https://parquet.apache.org) to [newline-delimited JSON](https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON).
+A command-line tool for streaming [Parquet](https://parquet.apache.org) as [line-delimited JSON](https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON).
 
-It uses the excellent [Apache Parquet Official Native Rust Implementation](https://github.com/apache/arrow-rs/tree/master/parquet).
+It reads only required ranges from file, HTTP or S3 locations, and supports offset/limit and column selection.
+
+It uses the [Apache Parquet Official Native Rust Implementation](https://github.com/apache/arrow-rs/tree/master/parquet) which has excellent support for compression formats and complex types.
 
 ## How to use it
 
@@ -13,18 +15,21 @@ $ cargo install parquet2json
 $ parquet2json --help
 
 USAGE:
-    parquet2json [OPTIONS] <FILE>
+    parquet2json [OPTIONS] <FILE> <SUBCOMMAND>
 
 ARGS:
     <FILE>    Location of Parquet input file (file path, HTTP or S3 URL)
 
 OPTIONS:
-    -o, --offset <OFFSET>                  Starts outputting from this row [default: 0]
-    -l, --limit <LIMIT>                    Maximum number of rows to output
-    -t, --timeout <TIMEOUT>                Request timeout in seconds [default: 60]
-    -s, --schema-output <SCHEMA_OUTPUT>    Outputs thrift schema only
-    -c, --columns <COLUMNS>                Select columns by name (comma,separated)
-    -h, --help                             Print help information
+    -t, --timeout <TIMEOUT>    Request timeout in seconds [default: 60]
+    -h, --help                 Print help information
+    -V, --version              Print version information
+
+SUBCOMMANDS:
+    cat         Outputs data as JSON lines
+    schema      Outputs the Thrift schema
+    rowcount    Outputs only the total row count
+    help        Print this message or the help of the given subcommand(s)
 ```
 
 ### S3 Settings
@@ -40,23 +45,23 @@ Use it to stream output to files and other tools such as `grep` and [jq](https:/