Open connections to new nodes more lazily #108127

Open
DaveCTurner opened this issue May 1, 2024 · 1 comment
Labels: :Distributed/Network (Http and internode communication implementations), >enhancement, Team:Distributed (Meta label for distributed team), >tech debt

Comments

Contributor

DaveCTurner commented May 1, 2024

Today, before starting to apply a new cluster state, we block cluster state application while waiting to connect to any newly-added nodes:

    protected void connectToNodesAndWait(ClusterState newClusterState) {
        // can't wait for an ActionFuture on the cluster applier thread, but we do want to block the thread here, so use a CountDownLatch.
        final CountDownLatch countDownLatch = new CountDownLatch(1);
        connectToNodesAsync(newClusterState, countDownLatch::countDown);
        try {
            countDownLatch.await();
        } catch (InterruptedException e) {
            logger.debug("interrupted while connecting to nodes, continuing", e);
            Thread.currentThread().interrupt();
        }
    }

We do this because we expect to be able to send requests to every node in the cluster, and we don't want to report a failure if we attempt to send a request before the initial connection attempt has completed. However, we could achieve the same effect without this blocking wait by creating a placeholder connection which captures any requests destined for these new nodes and delays them until the initial connection attempt has completed (whether successfully or otherwise).
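
To make the placeholder idea concrete, here is a minimal sketch of what such a connection could look like. Everything in it is hypothetical: `PlaceholderConnection`, `Sender`, `onConnected` and `onConnectionFailed` are illustrative names, not existing Elasticsearch APIs, and a real implementation would need to plug into the transport layer's connection and listener types.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: captures requests destined for a new node and holds them
 * until the initial connection attempt has completed, successfully or otherwise.
 */
final class PlaceholderConnection<R> {

    /** Hypothetical stand-in for a real, established connection. */
    interface Sender<T> {
        void send(T request);
    }

    /** A captured request together with the callback to notify on failure. */
    private static final class Pending<T> {
        final T request;
        final Consumer<Exception> onFailure;

        Pending(T request, Consumer<Exception> onFailure) {
            this.request = request;
            this.onFailure = onFailure;
        }
    }

    private final Queue<Pending<R>> pending = new ArrayDeque<>();
    private boolean completed;    // true once the connection attempt has finished
    private Sender<R> delegate;   // non-null only if the attempt succeeded
    private Exception failure;    // non-null only if the attempt failed

    /** Sends immediately if the attempt has completed, otherwise queues the request. */
    void sendRequest(R request, Consumer<Exception> onFailure) {
        final Sender<R> sender;
        final Exception error;
        synchronized (this) {
            if (completed == false) {
                pending.add(new Pending<>(request, onFailure));
                return;
            }
            sender = delegate;
            error = failure;
        }
        dispatch(sender, error, request, onFailure);
    }

    void onConnected(Sender<R> connection) {
        complete(connection, null);
    }

    void onConnectionFailed(Exception e) {
        complete(null, e);
    }

    private void complete(Sender<R> connection, Exception e) {
        final List<Pending<R>> toDrain;
        synchronized (this) {
            if (completed) {
                return; // only the first completion counts
            }
            completed = true;
            delegate = connection;
            failure = e;
            toDrain = new ArrayList<>(pending);
            pending.clear();
        }
        // Drain outside the lock. Note: a request sent concurrently with the drain
        // may overtake queued ones; this sketch does not guarantee ordering.
        for (Pending<R> p : toDrain) {
            dispatch(connection, e, p.request, p.onFailure);
        }
    }

    private static <T> void dispatch(Sender<T> sender, Exception error, T request, Consumer<Exception> onFailure) {
        if (sender != null) {
            sender.send(request);
        } else {
            onFailure.accept(error);
        }
    }
}
```

With something like this in place, the applier thread would register the placeholder and move on, and the asynchronous connection attempt would call `onConnected` or `onConnectionFailed` when it finishes. Callers would see either a short delay or the same failure they would have seen today.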

Such delays generally wouldn't apply to performance-critical requests like searches or indexing because the new node would initially have no shards assigned to it. One possible problem is that if the new node is an ingest node, and the cluster contains some nodes without the ingest role, then those nodes might try to forward ingest traffic to the new node over this delayed connection, which would be visible as a blip in indexing latency. We'd probably want to make the routing logic for those requests aware of the potential delay.
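
One way the ingest routing could account for the delay is to prefer ingest nodes whose connection is already established, and only fall back to still-connecting nodes when nothing else is available. The sketch below is an assumption about how that might look, not the existing routing code: `IngestNodeSelector`, `Candidate` and `ConnectionState` are hypothetical names, and the round-robin counter is just one possible tie-breaker.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/** Hypothetical sketch of delay-aware selection of an ingest node. */
final class IngestNodeSelector {

    /** Hypothetical view of a node's connection: established, or still a placeholder. */
    enum ConnectionState { CONNECTED, CONNECTING }

    static final class Candidate {
        final String nodeId;
        final ConnectionState state;

        Candidate(String nodeId, ConnectionState state) {
            this.nodeId = nodeId;
            this.state = state;
        }
    }

    /**
     * Picks an ingest node round-robin, preferring nodes with an established connection.
     * Falls back to nodes that are still connecting only if no connected node exists.
     */
    static Optional<Candidate> select(List<Candidate> ingestNodes, int counter) {
        List<Candidate> connected = ingestNodes.stream()
            .filter(c -> c.state == ConnectionState.CONNECTED)
            .collect(Collectors.toList());
        List<Candidate> pool = connected.isEmpty() ? ingestNodes : connected;
        if (pool.isEmpty()) {
            return Optional.empty();
        }
        // floorMod keeps the index valid even if the counter overflows to a negative value
        return Optional.of(pool.get(Math.floorMod(counter, pool.size())));
    }
}
```

The fallback matters for a cluster in which every ingest node has only just joined: requests would then wait on the placeholder connection rather than being rejected outright.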

Relates #89821, since that's another thing that delays cluster state application unnecessarily.

@DaveCTurner DaveCTurner added >enhancement :Distributed/Network Http and internode communication implementations >tech debt labels May 1, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label May 1, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)
