Description
BigQuery has the following quota policy on file sizes.
So it is better to split the output into files of at most 4 GB each.
File Type | Compressed | Uncompressed |
---|---|---|
CSV | 4 GB | 4 GB with new-lines in strings; 5 TB without new-lines in strings |
JSON | 4 GB | 5 TB |
Problems
- The split has to fall on an end-of-line newline (CRLF/LF/CR); cutting purely by file size is not enough.
- Splitting the data before it is written is better than splitting the output file afterwards, because Embulk runs multiple tasks in parallel on multiple CPU cores.
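
To illustrate the two points above, here is a minimal sketch in Python of splitting before output: records that are already serialized (each ending in its own line ending) are grouped into chunks that stay within a byte limit, so a cut can only fall between records. The names `split_records` and `MAX_BYTES` are hypothetical and not part of Embulk or any plugin. Reading 4 GB as `4 * 1024**3` bytes is an assumption. The sketch counts uncompressed bytes, which is only a conservative stand-in when the limit applies to the compressed size.

```python
from typing import Iterable, Iterator, List

# Assumption: the 4 GB limit from the table, read as 4 * 1024**3 bytes.
MAX_BYTES = 4 * 1024 ** 3


def split_records(records: Iterable[bytes],
                  max_bytes: int = MAX_BYTES) -> Iterator[List[bytes]]:
    """Group whole records into chunks whose total size stays within max_bytes.

    Each record is expected to be one serialized row, including its
    trailing line ending (CRLF, LF or CR).
    """
    chunk: List[bytes] = []
    size = 0
    for record in records:
        if len(record) > max_bytes:
            raise ValueError("a single record exceeds the size limit")
        # Close the current chunk before the limit would be crossed, so the
        # cut always lands right after a record's line ending.
        if chunk and size + len(record) > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(record)
        size += len(record)
    if chunk:
        yield chunk
```

In an Embulk-like setup each parallel task could run its own splitter of this kind and write one file per chunk, so no post-processing of a finished output file would be needed. That wiring is an assumption here, not a description of existing plugin behaviour.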