When processing large streams of files, we need the ability to cut them at a specific point. Today, users have to rely on third-party tools for this, such as logrotate. There are two obvious ways to do this:
- Spatially: after a file reaches a given size
- Temporally: after a fixed time duration

(See the sketch after this list for how the two cut conditions could be combined.)
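To make the two cut conditions concrete, here is a minimal sketch of a rotating writer that cuts to a new file when either limit is hit first. The class name, file-naming scheme, and default limits are all hypothetical and not part of any existing tool:

```python
import time

class RotatingWriter:
    """Writes a byte stream to files, cutting to a new file when either
    a size limit (spatial) or a time limit (temporal) is reached."""

    def __init__(self, prefix, max_bytes=64 * 1024 * 1024, max_seconds=3600):
        self.prefix = prefix
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.index = 0
        self._open_next()

    def _open_next(self):
        # Hypothetical naming scheme: prefix.000000, prefix.000001, ...
        self.file = open(f"{self.prefix}.{self.index:06d}", "wb")
        self.opened_at = time.monotonic()
        self.written = 0
        self.index += 1

    def write(self, chunk: bytes):
        # Rotate before writing if either cut condition would be exceeded.
        if (self.written + len(chunk) > self.max_bytes
                or time.monotonic() - self.opened_at > self.max_seconds):
            self.file.close()
            self._open_next()
        self.file.write(chunk)
        self.written += len(chunk)

    def close(self):
        self.file.close()
```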
Users most likely won't even have to worry about this:

> By default, the system tries to infer whether the provided files are in a hive-partitioned hierarchy; if so, the `hive_partitioning` flag is enabled automatically. The autodetection looks at the names of the folders and searches for a `key=value` pattern. This behaviour can be overridden by setting the `hive_partitioning` flag manually.
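As an illustration of that autodetection (a sketch, not the engine's actual implementation), a check for whether file paths contain a `key=value` folder component could look like this:

```python
import re
from pathlib import Path

# One directory component of the form key=value, e.g. year=2023.
KEY_VALUE = re.compile(r"^[^=/]+=[^=/]+$")

def looks_hive_partitioned(paths):
    """Return True if every file sits under at least one
    'key=value' directory, e.g. data/year=2023/part.parquet."""
    for p in paths:
        parts = Path(p).parent.parts
        if not any(KEY_VALUE.match(part) for part in parts):
            return False
    return bool(paths)

# These paths would enable hive_partitioning automatically.
print(looks_hive_partitioned([
    "data/year=2023/month=01/file.parquet",
    "data/year=2023/month=02/file.parquet",
]))  # True
```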
Given that Hive partitioning is a quasi-standard across the data community, and that many data tools support it out of the box, we should start with this approach, as it maximizes interoperability and simplicity. For example, Arrow also supports reading partitioned datasets.
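For instance, Arrow's Python bindings (pyarrow) can read such a layout directly; the directory path below is hypothetical:

```python
import pyarrow.dataset as ds

# Read a hive-partitioned directory tree; the key=value folder names
# (e.g. year=2023/month=01) become regular columns in the result.
dataset = ds.dataset("data/", format="parquet", partitioning="hive")

# Partition columns can be filtered without scanning file contents.
table = dataset.to_table(filter=ds.field("year") == 2023)
```

Because the partition keys live in the directory names, readers can prune entire folders before opening a single file, which is a large part of why this layout is so widely supported.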