Monolith OSS - Limit query consumes high CPU usage for look backs 12+hours #25573
MaduMitha-Ravi changed the title from "Monolith - Limit query consumes high CPU usage for look backs 12+hours" to "Monolith OSS - Limit query consumes high CPU usage for look backs 12+hours" on Nov 21, 2024.
Can you get some EXPLAIN plans from some of these queries? I'm curious if deduplication is running, which should be something that we generally don't need except for stuff in the WriteBuffer Chunks.
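For reference, a minimal sketch of how such a plan can be requested through the SQL interface: prefix the query with EXPLAIN. The table and filter values below are taken from the plans in the following comment; the projection and the 12-hour look-back bound are assumptions.

```sql
-- Sketch only: request logical and physical plans for the LIMIT query.
-- Table/column names come from the plans below; the exact time bound is an assumption.
EXPLAIN
SELECT *
FROM senec_data
WHERE device_id = '136'
  AND time > now() - INTERVAL '12 hours'
ORDER BY time DESC
LIMIT 5;
```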
@pauldix Here are the explains.

Explain for the Q5 query (LIMIT 5, reaching CPU usage of 93%):

```
plan_type: logical_plan
plan:
Sort: senec_data.time DESC NULLS FIRST, fetch=5
  Filter: senec_data.device_id = Dictionary(Int32, Utf8("136")) AND senec_data.time > TimestampNanosecond(1732094449936786175, None)
    TableScan: senec_data projection=[device_id, field1, field2, field3, field4, field5, field6, field7, field8, field9, time], partial_filters=[senec_data.device_id = Dictionary(Int32, Utf8("136")), senec_data.time > TimestampNanosecond(1732094449936786175, None)]

plan_type: physical_plan
plan:
SortPreservingMergeExec: [time@10 DESC], fetch=5
  SortExec: TopK(fetch=5), expr=[time@10 DESC], preserve_partitioning=[true]
    CoalesceBatchesExec: target_batch_size=8192
      FilterExec: device_id@0 = 136 AND time@10 > 1732094449936786175
        UnionExec
          CoalesceBatchesExec: target_batch_size=8192
            FilterExec: device_id@0 = 136 AND time@10 > 1732094449936786175
              ParquetExec: file_groups={2 groups: [[mravi-m-oss-1k-monolithoss-worker-00/dbs/thebucket-0/senec_data-0/2024-11-20/21-00/0000084566.parquet:0..740319, mravi-m-oss-1k-monolithoss-worker-00/dbs/thebucket-0/senec_data-0/2024-11-20/20-40/0000083390.parquet:0..739549, mravi-m-oss-1k-monolithoss-worker-00/dbs/thebucket-0/senec_data-0/2024-11-20/20-20/0000082213.parquet:0..739327, mravi-m-oss-1k-monolithoss-worker-00/dbs/thebucket-0/senec_data-0/2024-11-20/20-00/0000081037.parquet:0..739762, mravi-m-oss-1k-monolithoss-worker-00/dbs/thebucket-0/senec_data-0/2024-11-20/19-40/0000079860.parquet:0..739837, ...], [mravi-m-oss-1k-monolithoss-worker-00/dbs/thebucket-0/senec_data-0/2024-11-19/21-20/0000000900.parquet:481478..738991, mravi-m-oss-1k-monolithoss-worker-00/dbs/thebucket-0/senec_data-0/2024-11-20/20-50/0000083978.parquet:0..740778, mravi-m-oss-1k-monolithoss-worker-00/dbs/thebucket-0/senec_data-0/2024-11-20/20-30/0000082801.parquet:0..740188, mravi-m-oss-1k-monolithoss-worker-00/dbs/thebucket-0/senec_data-0/2024-11-20/20-10/0000081625.parquet:0..739828, mravi-m-oss-1k-monolithoss-worker-00/dbs/thebucket-0/senec_data-0/2024-11-20/19-50/0000080449.parquet:0..739798, ...]]}, projection=[device_id, field1, field2, field3, field4, field5, field6, field7, field8, field9, time], predicate=device_id@0 = 136 AND time@10 > 1732094449936786175, pruning_predicate=CASE WHEN device_id_null_count@2 = device_id_row_count@3 THEN false ELSE device_id_min@0 <= 136 AND 136 <= device_id_max@1 END AND CASE WHEN time_null_count@5 = time_row_count@6 THEN false ELSE time_max@4 > 1732094449936786175 END, required_guarantees=[device_id in (136)]
          ProjectionExec: expr=[device_id@0 as device_id, field1@1 as field1, field2@2 as field2, field3@3 as field3, field4@4 as field4, field5@5 as field5, field6@6 as field6, field7@7 as field7, field8@8 as field8, field9@9 as field9, time@10 as time]
            DeduplicateExec: [device_id@0 ASC,time@10 ASC]
              SortPreservingMergeExec: [time@10 ASC,__chunk_order@11 ASC]
                SortExec: expr=[time@10 ASC,__chunk_order@11 ASC], preserve_partitioning=[true]
                  CoalesceBatchesExec: target_batch_size=8192
                    FilterExec: device_id@0 = 136 AND time@10 > 1732094449936786175
                      RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1
                        RecordBatchesExec: chunks=1, projection=[device_id, field1, field2, field3, field4, field5, field6, field7, field8, field9, time, __chunk_order]
          ProjectionExec: expr=[device_id@0 as device_id, field1@1 as field1, field2@2 as field2, field3@3 as field3, field4@4 as field4, field5@5 as field5, field6@6 as field6, field7@7 as field7, field8@8 as field8, field9@9 as field9, time@10 as time]
            DeduplicateExec: [device_id@0 ASC,time@10 ASC]
              SortExec: expr=[time@10 ASC,__chunk_order@11 ASC], preserve_partitioning=[false]
                CoalesceBatchesExec: target_batch_size=8192
                  FilterExec: device_id@0 = 136 AND time@10 > 1732094449936786175
                    RecordBatchesExec: chunks=1, projection=[device_id, field1, field2, field3, field4, field5, field6, field7, field8, field9, time, __chunk_order]
```

Explain for the F1 query (LIMIT 1, reaching CPU usage of 74%):
Problem 1
The SQL LIMIT query consumes very high CPU utilization even for a single series. On an m6dg.large machine (2 cores, 8 GiB RAM), we observe that the query uses more than 75% CPU and in many cases nearly 95%.
Problem 2
As the LIMIT increases, a drastic increase in CPU usage is observed (LIMIT 1 -> LIMIT 5).
Note: This happens only in OSS. It affects latency as well, and these experiments were run without concurrency.
Configuration
The problem was found with the basic setup below:
Queries
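Based on the EXPLAIN output in the comments above, the Q5 and F1 queries appear to have roughly the following shape. This is a sketch only: the exact projection and the look-back window are assumptions reconstructed from the logical plan.

```sql
-- Q5: LIMIT 5 variant (reconstructed from the logical plan; time bound assumed)
SELECT device_id, field1, field2, field3, field4, field5,
       field6, field7, field8, field9, time
FROM senec_data
WHERE device_id = '136'
  AND time > now() - INTERVAL '12 hours'
ORDER BY time DESC
LIMIT 5;

-- F1: LIMIT 1 variant; identical except that LIMIT 5 is replaced by LIMIT 1.
```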
Metrics/Evidence
Q5 query - LIMIT 5 reaching 93.5%
F1 query - LIMIT 1 reaching 74.4%
Latency over 100 runs