Split output files every 4 GB for BigQuery quota policy #6

Open
@sakama

BigQuery has the following quota policy for the size of files it loads.

So it would be better to split the output into files of at most 4 GB each.

File Type   Compressed   Uncompressed
CSV         4 GB         With new-lines in strings: 4 GB
                         Without new-lines in strings: 5 TB
JSON        4 GB         5 TB

Problems

  • Files have to be split at a line ending (CRLF/LF/CR), not at an arbitrary byte offset determined only by file size.
  • Splitting the records before they are written is better than splitting the output file afterwards, because Embulk runs multiple tasks across multiple CPU cores.
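
The two points above can be sketched as a size-aware grouping step that runs before anything is written. This is only an illustration, not Embulk's plugin API: `split_records` and `MAX_BYTES` are hypothetical names, the 4 GB limit is assumed to mean 4 * 1024^3 bytes, and each record is assumed to be already serialized with its own line terminator.

```python
from typing import Iterable, Iterator, List

# Assumption: "4 GB" is read as 4 * 1024**3 bytes.
MAX_BYTES = 4 * 1024 ** 3


def split_records(
    records: Iterable[bytes], max_bytes: int = MAX_BYTES
) -> Iterator[List[bytes]]:
    """Group serialized records into chunks of at most max_bytes each.

    Every record is expected to end with its own line terminator
    (CRLF, LF or CR), so a chunk boundary always falls on an end of line.
    """
    chunk: List[bytes] = []
    size = 0
    for record in records:
        if len(record) > max_bytes:
            raise ValueError("a single record exceeds the size limit")
        # Start a new chunk before the record that would exceed the limit.
        if chunk and size + len(record) > max_bytes:
            yield chunk
            chunk = []
            size = 0
        chunk.append(record)
        size += len(record)
    # Emit whatever is left after the last record.
    if chunk:
        yield chunk
```

Each yielded chunk could then be written to its own output file. Because the grouping works on whole records rather than raw bytes, a CSV value that contains a new-line inside a quoted string is never cut in half, provided the caller passes complete records.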
