Optimize get list of possible values query for `types/Collection` columns #170

UnamedRus · 2023-06-13T13:52:35Z

Use approach with settings for such queries:

UnamedRus · 2023-06-13T13:54:25Z

Related #119

slvrtrn · 2023-06-19T16:12:06Z

Can you please clarify what needs to be done here cause I am a bit OOTL?

As I understood, it's better to apply these settings for types/Collection scans queries:

max_rows_to_group_by=1000
group_by_overflow_mode=break

Is it correct?

UnamedRus · 2023-06-19T16:16:44Z

Yeah, something like that will be simpler.

It will not return sorted list of values, but just first/random 1000 values.

Another possible problem, what if your column have less then 1000 values, so ClickHouse will scan it fully. (1 trillion rows)

So, it make sense to play with settings:

│ max_rows_to_read                                   │ 0       │       0 │ Limit on read rows from the most 'deep' sources. That is, only in the deepest subquery. When reading from a remote server, it is only checked on a remote server.                                                                                          │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │        0 │ UInt64 │ 0       │           │
│ max_rows_to_read_leaf                              │ 0       │       0 │ Limit on read rows on the leaf nodes for distributed queries. Limit is applied for local reads only excluding the final merge stage on the root node.                                                                                                      │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │        0 │ UInt64 │ 0       │

Or

│ max_execution_time                      │ 0     │       0 │ If query run time exceeded the specified number of seconds, the behavior will be determined by the 'timeout_overflow_mode' which by default is - throw an exception. Note that the timeout is checked and query can stop only in designated places during data processing. It currently cannot stop during merging of aggregation states or during query analysis, and the actual run time will be higher than the value of this setting. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │        0 │ Seconds │ 0       │           │
│
┌─name──────────────────┬─value─┬─changed─┬─description────────────────────────────┬─min──┬─max──┬─readonly─┬─type─────────┬─default─┬─alias_for─┐
│ timeout_overflow_mode │ throw │       0 │ What to do when the limit is exceeded. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │        0 │ OverflowMode │ throw   │           │
└───────────────────────┴───────┴─────────┴────────────────────────────────────────┴──────┴──────┴──────────┴──────────────┴─────────┴───────────┘

SET  timeout_overflow_mode = 'break', max_execution_time=5

UnamedRus changed the title ~~Optimize get list of possible values query~~ Optimize get list of possible values query for types/Collection columns Jun 13, 2023

slvrtrn added the enhancement New feature or request label Jun 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize get list of possible values query for `types/Collection` columns #170

Optimize get list of possible values query for `types/Collection` columns #170

UnamedRus commented Jun 13, 2023

UnamedRus commented Jun 13, 2023

slvrtrn commented Jun 19, 2023

UnamedRus commented Jun 19, 2023

Optimize get list of possible values query for types/Collection columns #170

Optimize get list of possible values query for types/Collection columns #170

Comments

UnamedRus commented Jun 13, 2023

UnamedRus commented Jun 13, 2023

slvrtrn commented Jun 19, 2023

UnamedRus commented Jun 19, 2023

Optimize get list of possible values query for `types/Collection` columns #170

Optimize get list of possible values query for `types/Collection` columns #170