Brainstorm: Expose keep-alive information to user #1965

mxinden · 2020-08-13T08:51:11Z

mxinden
Aug 13, 2020
Collaborator

Background

For paritytech/polkadot#1532 I am investigating why connections initiated through Kademlia are kept alive for more than the expected 10 second idle timeout. As far as I can tell there is no way to learn why connections are kept alive without intrusive libp2p source code changes. Ideally I would like to expose a Prometheus metric in Substrate.

substrate_sub_libp2p_connections_total{keep-alive-protocol="kademlia"} 10
substrate_sub_libp2p_connections_total{keep-alive-protocol="legacy"} 11

Hacky solution

To help debugging I extended KeepAlive with a protocol id indicating the protocol that would like to keep the connection alive.

diff --git a/swarm/src/protocols_handler.rs b/swarm/src/protocols_handler.rs
index 9721e9db..1fcc3fac 100644
--- a/swarm/src/protocols_handler.rs
+++ b/swarm/src/protocols_handler.rs
@@ -498,12 +498,12 @@ where T: ProtocolsHandler
 }
 
 /// How long the connection should be kept alive.
 pub enum KeepAlive {
     /// If nothing new happens, the connection should be closed at the given `Instant`.
-    Until(Instant),
+    Until(Instant, Cow<'static, [u8]>),
     /// Keep the connection alive.
-    Yes,
+    Yes(Cow<'static, [u8]>),
     /// Close the connection as soon as possible.
     No,
 }

ProtocolHandlers that delegate to other ProtocolHandlers aggregate the KeepAlive and pass the protocol id of the highest KeepAlive upwards.

Within node_handler.rs I could then log the id of the protocol that keeps the connection alive.

This lead to #1698 which triggered investigation for #1700.

Way forward

First of all: Do people feel the need to surface keep-alive information to the user? Or is the on-demand debugging through log lines good enough?

If we do want to expose that information we need to find a consistent way to do so. One suggestion from my side would be to bubble up a ~ KeepAlive event that records the id of the protocol keeping the protocol alive from the NodeHandler for each connection on a regular interval .

romanb · 2020-08-13T09:03:28Z

romanb
Aug 13, 2020

Relates to #1478.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Brainstorm: Expose keep-alive information to user #1965

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Brainstorm: Expose keep-alive information to user #1965

Uh oh!

mxinden Aug 13, 2020 Collaborator

Background

Hacky solution

Way forward

Replies: 1 comment

Uh oh!

romanb Aug 13, 2020

mxinden
Aug 13, 2020
Collaborator

romanb
Aug 13, 2020