Does ordering of data matter for locality? #2616
I'm really liking DuckDB so far, but one thing I'm unsure about is whether the ordering of data matters at all. For example, imagine we have 3 columns: timestamp, id, value. There are, say, 1000 timestamps, and for each timestamp there are 1000 unique ids. Is the data on disk/in memory laid out in the order in which it is inserted? The reason I'm asking is that locality might matter. If data is ordered by (id, timestamp), are all timestamps for a particular id going to be close to each other? Conversely, if data is ordered by (timestamp, id), are all ids for the same timestamp going to be close to each other? This would be particularly important for very large datasets and for rolling windows over them.
Insertion order is preserved (i.e. rows are stored in the table in the order in which they were inserted). So if the data is inserted ordered by (id, timestamp), the timestamps for a given id would indeed be close to each other. The system does not yet take advantage of inherent sortedness, however, so window functions will always re-sort the data (even if it is already sorted). See #2548 for some more discussion on that.
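To make the locality point concrete, here is a small pure-Python sketch. It is not DuckDB code and says nothing about DuckDB's actual storage format; it just models a table as a list of rows kept in insertion order and shows where the rows for a single id end up under each of the two orderings from the question. The helper names `row_positions` and `span` and the scaled-down sizes (100 x 100 instead of 1000 x 1000) are assumptions made for illustration.

```python
from itertools import product

# Scaled down from the 1000 x 1000 example in the question (illustrative sizes).
N_TIMESTAMPS = 100
N_IDS = 100


def row_positions(rows, target_id):
    """Return the positions (list indices) of all rows with the given id."""
    return [pos for pos, (_ts, id_) in enumerate(rows) if id_ == target_id]


def span(positions):
    """Number of row slots between the first and last matching row, inclusive."""
    return positions[-1] - positions[0] + 1


# Inserted in (timestamp, id) order: all ids for timestamp 0, then timestamp 1, ...
by_ts_then_id = [(ts, id_) for ts, id_ in product(range(N_TIMESTAMPS), range(N_IDS))]

# Inserted in (id, timestamp) order: all timestamps for id 0, then id 1, ...
by_id_then_ts = [(ts, id_) for id_, ts in product(range(N_IDS), range(N_TIMESTAMPS))]

pos_a = row_positions(by_ts_then_id, target_id=7)
pos_b = row_positions(by_id_then_ts, target_id=7)

print(span(pos_a))  # 9901: rows for id 7 are strided across almost the whole table
print(span(pos_b))  # 100: rows for id 7 form one contiguous run
```

With (id, timestamp) insertion order, a scan over one id touches a single contiguous run of rows; with (timestamp, id) order, the same rows are spread one per timestamp block. As noted above, this only affects how close the rows are to each other, since window functions currently re-sort regardless.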