(Alternative build) Possible performance regression in v24.3 #63500

Closed
canhld94 opened this issue May 8, 2024 · 6 comments · Fixed by #63730

Comments

@canhld94
Contributor

canhld94 commented May 8, 2024


Describe the situation
After upgrading our baseline ClickHouse version to v24.3, we observe a significant increase in P95 and P99 latency, possibly caused by the new userspace page cache feature:

image

This is not an official ClickHouse build. Nevertheless, we didn't add any new code in this release; we only upgraded the baseline from v24.2 to v24.3.2.1, and we observed the performance regression.

We checked the profiling data but could not find any concrete issue (the queries just run longer, with more traces recorded by the profiler). After testing some concrete queries, we suspect that under a highly concurrent workload, queries on v24.3 more often do not read from the OS page cache.

That led us to #53770. We tried reverting the PR, and query latency went back to normal.

Maybe someone on the core team can look at this PR if you have time.

@Algunenano
Member

cc @al13n321

@jorisgio
Contributor

jorisgio commented May 8, 2024

Didn't you mean v24.3?

canhld94 changed the title from "(Alternative build) Possible performance regression in v23.3" to "(Alternative build) Possible performance regression in v24.3" on May 8, 2024
@canhld94
Contributor Author

canhld94 commented May 8, 2024

Didn't you mean v24.3?

@jorisgio yes, thanks for pointing that out!

@nickitat
Member

nickitat commented May 8, 2024

Did you happen to collect any profiles, from the CH profiler or perf records, whatever?

@canhld94
Contributor Author

canhld94 commented May 10, 2024

Did you happen to collect any profiles, from the CH profiler or perf records, whatever?

@nickitat yes, we collected the top traces from the CH profiler:

top_cpu_trace_before_upgrade.txt
top_cpu_trace_after_upgrade.txt
top_real_trace_before_upgrade.txt
top_real_trace_after_upgrade.txt

I also collected perf data but didn't save it. I will see if I can roll back a server and collect it again.

@alexey-milovidov
Member

The page cache should not be enabled by default. We will disable it and backport this change.
