Build full-fledged plan for local replica #63550

Open
nickitat opened this issue May 8, 2024 · 3 comments · May be fixed by #64448
@nickitat
Member

nickitat commented May 8, 2024

Build a full-fledged query plan for the local replica, as we do for distributed queries, i.e. instead of

   ┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ Expression (Project names)                                                                                                                                                                                                                                 │
2. │   ReadFromRemoteParallelReplicas (Replica: 127.0.0.1, Replica: 127.0.0.2, Replica: 127.0.0.3, Replica: 127.0.0.4, Replica: 127.0.0.5, Replica: 127.0.0.6, Replica: 127.0.0.7, Replica: 127.0.0.8, Replica: 127.0.0.9, Replica: 127.0.0.10, Replica: 127.0.0.11) │
   └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

build something like this:

Expression ((Project names + Projection))
  MergingAggregated
    Union
      Aggregating
        Expression ((Before GROUP BY + Change column names to column identifiers))
          ReadFromSystemNumbers
      ReadFromRemote (Read from remote replica)

This would enable analysis and optimisation work based on the query plan on the initiator.
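To illustrate why the fuller plan matters, here is a minimal sketch that models both plans as plain trees and walks them. The `Step` class and the `walk`/`step_names` helpers are hypothetical and are not ClickHouse's planner API; only the step names are taken from the plans above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """Hypothetical stand-in for a query plan step (not a ClickHouse class)."""
    name: str
    children: List["Step"] = field(default_factory=list)


def walk(step):
    """Yield the step and all of its descendants, depth first."""
    yield step
    for child in step.children:
        yield from walk(child)


def step_names(plan):
    """Return the names of all steps visible in the plan."""
    return [s.name for s in walk(plan)]


# Current shape: the local replica's work is hidden inside one remote-read step.
opaque_plan = Step("Expression", [Step("ReadFromRemoteParallelReplicas")])

# Proposed shape: the local replica's steps are part of the initiator's plan.
full_plan = Step("Expression", [
    Step("MergingAggregated", [
        Step("Union", [
            Step("Aggregating", [
                Step("Expression", [Step("ReadFromSystemNumbers")]),
            ]),
            Step("ReadFromRemote"),
        ]),
    ]),
])

print(step_names(opaque_plan))
print(step_names(full_plan))
```

In the opaque plan, a plan-level pass on the initiator sees only the remote-read step; in the full plan, it can find and reason about `Aggregating` and the local read step.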

@nikitamikhaylov
Member

To me this looks similar to prefer_localhost_replica=1 for a Distributed table. I tried to support this previously and failed, because it turned out to be difficult to balance, from a scheduling standpoint, between executing the local part of the pipeline and constantly answering incoming requests from remote replicas. The pipeline executor has a lot of affinities and no fairness, so from what I observed ClickHouse either prefers executing only the local plan with all the threads it has, or prefers answering the incoming requests.

@nickitat
Member Author

nickitat commented May 9, 2024

interesting, thx for sharing

devcrafter self-assigned this May 13, 2024
@devcrafter
Member

> For me it looks similar to prefer_localhost_replica=1 for Distributed table.

There is an important difference from a Distributed table: with task-based parallel replicas, the data to read is assigned dynamically across replicas, whereas for a Distributed table, replicas read different data according to the sharding schema (so there is no dynamic scheduling among replicas).
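The contrast can be sketched with a toy model. This is only an illustration under assumed names: `static_assignment`, `dynamic_assignment`, and the per-replica cost numbers are hypothetical, and ClickHouse's actual coordinator logic is not shown in this thread.

```python
from collections import deque


def static_assignment(parts, replicas):
    """Fixed rule: part i always goes to replica i mod N, regardless of load."""
    plan = {r: [] for r in replicas}
    for i, part in enumerate(parts):
        plan[replicas[i % len(replicas)]].append(part)
    return plan


def dynamic_assignment(parts, cost_per_part):
    """Task-based rule: whichever replica becomes free first takes the next part.

    cost_per_part maps replica name -> time to process one part
    (a hypothetical cost model; ties are broken by replica name).
    """
    queue = deque(parts)
    free_at = {r: 0.0 for r in cost_per_part}
    plan = {r: [] for r in cost_per_part}
    while queue:
        replica = min(free_at, key=lambda name: (free_at[name], name))
        plan[replica].append(queue.popleft())
        free_at[replica] += cost_per_part[replica]
    return plan


parts = list(range(6))
print(static_assignment(parts, ["fast", "slow"]))
print(dynamic_assignment(parts, {"fast": 1.0, "slow": 3.0}))
```

With the static rule both replicas get three parts each; with the dynamic rule the faster replica ends up with more parts, which is why the local replica's share of work is not known when the plan is built.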

nickitat linked a pull request May 28, 2024 that will close this issue
29 tasks