Open connections to new nodes more lazily #108127

DaveCTurner · 2024-05-01T10:23:15Z

Today we block cluster state application until we have connected to any newly-added nodes in the cluster, before we even start to apply the state:

elasticsearch/server/src/main/java/org/elasticsearch/cluster/service/ClusterApplierService.java

Lines 509 to 519 in 90351ef

    protected void connectToNodesAndWait(ClusterState newClusterState) {
        // can't wait for an ActionFuture on the cluster applier thread, but we do want to block the thread here, so use a CountDownLatch.
        final CountDownLatch countDownLatch = new CountDownLatch(1);
        connectToNodesAsync(newClusterState, countDownLatch::countDown);
        try {
            countDownLatch.await();
        } catch (InterruptedException e) {
            logger.debug("interrupted while connecting to nodes, continuing", e);
            Thread.currentThread().interrupt();
        }
    }

We do this because we expect to be able to send requests to every node in the cluster, and we don't want to report a failure if we attempt to send a request before the initial connection attempt has completed. However, we could achieve the same effect without this blocking wait by creating a placeholder connection which captures any requests destined for these new nodes and delays them until the initial connection attempt has completed (whether successfully or otherwise).
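To make the placeholder idea concrete, here is a minimal sketch of one way it might look. It is not Elasticsearch code: `PlaceholderConnection`, `RequestListener`, and `onConnectionAttemptComplete` are hypothetical names invented for illustration, and requests are modelled as plain strings rather than real transport requests. It only shows the queue-then-drain behaviour described above.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not an Elasticsearch class): captures requests for a new node
 * until the initial connection attempt completes, then sends or fails each of them.
 */
final class PlaceholderConnection {

    /** Hypothetical stand-in for a real transport response handler. */
    interface RequestListener {
        void onSent(String request);

        void onFailure(String request, Exception cause);
    }

    private final Object mutex = new Object();
    // Non-null while the initial connection attempt is still in flight.
    private Queue<Consumer<Exception>> pending = new ArrayDeque<>();
    // Outcome of the attempt once it has completed; null means success.
    private Exception connectFailure;

    /** Queues the request while connecting; otherwise sends or fails it immediately. */
    void sendRequest(String request, RequestListener listener) {
        final Consumer<Exception> action = failure -> {
            if (failure == null) {
                listener.onSent(request);
            } else {
                listener.onFailure(request, failure);
            }
        };
        final Exception failure;
        synchronized (mutex) {
            if (pending != null) {
                pending.add(action);
                return;
            }
            failure = connectFailure;
        }
        action.accept(failure);
    }

    /** Call once when the initial connection attempt finishes; pass null on success. */
    void onConnectionAttemptComplete(Exception failure) {
        final Queue<Consumer<Exception>> toDrain;
        synchronized (mutex) {
            if (pending == null) {
                throw new IllegalStateException("connection attempt already completed");
            }
            toDrain = pending;
            pending = null;
            connectFailure = failure;
        }
        // Drain outside the lock so listeners cannot block other senders.
        for (Consumer<Exception> action : toDrain) {
            action.accept(failure);
        }
    }
}
```

In this sketch a request that arrives concurrently with the drain may overtake queued ones, so a real implementation would need to decide whether ordering matters. The cluster applier thread never waits: only callers that actually target the new node see the delay.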

Such delays generally wouldn't apply to performance-critical requests like searches or indexing because the new node would initially have no shards assigned to it. One possible problem is that if the new node is an ingest node, and the cluster contains some nodes without the ingest role, then those nodes might try to forward ingest traffic to the new node using this delayed connection, which would be visible as a blip in indexing latency. We'd probably want to make the routing logic for those requests aware of the potential delay.
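As a rough illustration of delay-aware routing, the sketch below prefers ingest nodes whose connection is already established and only falls back to a still-connecting node when nothing else is available. Everything here is an assumption for illustration: `IngestNodeSelector`, the `isConnected` predicate, and the round-robin counter are hypothetical and do not reflect how ingest forwarding is actually implemented.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical sketch (not an Elasticsearch class): pick an ingest node, avoiding delayed connections. */
final class IngestNodeSelector {

    private IngestNodeSelector() {}

    /**
     * @param ingestNodeIds     candidate ingest node IDs, in a stable order
     * @param isConnected       hypothetical check: true if the initial connection attempt has succeeded
     * @param roundRobinCounter caller-maintained counter used to spread load
     */
    static Optional<String> select(List<String> ingestNodeIds, Predicate<String> isConnected, int roundRobinCounter) {
        if (ingestNodeIds.isEmpty()) {
            return Optional.empty();
        }
        final int size = ingestNodeIds.size();
        final int start = Math.floorMod(roundRobinCounter, size);
        // Scan from the round-robin position and take the first node that is already connected.
        for (int i = 0; i < size; i++) {
            final String candidate = ingestNodeIds.get((start + i) % size);
            if (isConnected.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        // No connected ingest node: accept the delay rather than failing the request.
        return Optional.of(ingestNodeIds.get(start));
    }
}
```

With a selector along these lines, a newly-added ingest node would start receiving forwarded traffic only once its connection is up, unless it is the only ingest node available.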

Relates #89821, since that's another thing that delays cluster state application unnecessarily.

elasticsearchmachine · 2024-05-01T10:23:46Z

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner added the >enhancement, :Distributed/Network, and >tech debt labels on May 1, 2024

elasticsearchmachine added the Team:Distributed label on May 1, 2024
