Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize get list of possible values query for types/Collection columns #170

Open
UnamedRus opened this issue Jun 13, 2023 · 3 comments
Open
Labels
enhancement New feature or request

Comments

@UnamedRus
Copy link

Use approach with settings for such queries:

ClickHouse/ClickHouse#50919 (comment)

@UnamedRus UnamedRus changed the title Optimize get list of possible values query Optimize get list of possible values query for types/Collection columns Jun 13, 2023
@UnamedRus
Copy link
Author

Related #119

@slvrtrn
Copy link
Collaborator

slvrtrn commented Jun 19, 2023

Can you please clarify what needs to be done here cause I am a bit OOTL?

As I understood, it's better to apply these settings for types/Collection scans queries:

max_rows_to_group_by=1000
group_by_overflow_mode=break

Is it correct?

@UnamedRus
Copy link
Author

Yeah, something like that will be simpler.

It will not return sorted list of values, but just first/random 1000 values.

Another possible problem, what if your column have less then 1000 values, so ClickHouse will scan it fully. (1 trillion rows)

So, it make sense to play with settings:

│ max_rows_to_read                                   │ 0       │       0 │ Limit on read rows from the most 'deep' sources. That is, only in the deepest subquery. When reading from a remote server, it is only checked on a remote server.                                                                                          │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │        0 │ UInt64 │ 0       │           │
│ max_rows_to_read_leaf                              │ 0       │       0 │ Limit on read rows on the leaf nodes for distributed queries. Limit is applied for local reads only excluding the final merge stage on the root node.                                                                                                      │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │        0 │ UInt64 │ 0       │     

Or

│ max_execution_time                      │ 0     │       0 │ If query run time exceeded the specified number of seconds, the behavior will be determined by the 'timeout_overflow_mode' which by default is - throw an exception. Note that the timeout is checked and query can stop only in designated places during data processing. It currently cannot stop during merging of aggregation states or during query analysis, and the actual run time will be higher than the value of this setting. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │        0 │ Seconds │ 0       │           │
│
┌─name──────────────────┬─value─┬─changed─┬─description────────────────────────────┬─min──┬─max──┬─readonly─┬─type─────────┬─default─┬─alias_for─┐
│ timeout_overflow_mode │ throw │       0 │ What to do when the limit is exceeded. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │        0 │ OverflowMode │ throw   │           │
└───────────────────────┴───────┴─────────┴────────────────────────────────────────┴──────┴──────┴──────────┴──────────────┴─────────┴───────────┘
SET  timeout_overflow_mode = 'break', max_execution_time=5

@slvrtrn slvrtrn added the enhancement New feature or request label Jun 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants