-
-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: handle chain stall in peer manager #7508
base: unstable
Are you sure you want to change the base?
Conversation
Performance Report🚀🚀 Significant benchmark improvement detected
Full benchmark results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - we have tested this for a while on holesky-rescue branch and it was very useful to speed up our sync times + retrieving more details about our connected peers, see discord thread for more details
return StatusScore.FAR_AHEAD; | ||
} | ||
|
||
// It seems dangerous to downscore peers that are far behind. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we ever return FAR_BEHIND, then lodestar nodes will most likely disconnect peers that want to sync from us which is not great for the network. Should we remove FAR_BEHIND completely? or add more info for it
if we delete FAR_BEHIND enum then there is only 2 options: CLOSE_TO_US and FAR_AHEAD. At synced time, this function will always return CLOSE_TO_US. Hence the main usage for this function is for syncing time
please add all of these to the function description for later reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the feature is great, I dropped some comments/suggestions
@@ -111,6 +111,10 @@ export function createNetworkCoreMetrics(register: RegistryMetricCreator) { | |||
help: "Peer manager heartbeat function duration in seconds", | |||
buckets: [0.001, 0.01, 0.1, 1], | |||
}), | |||
starved: register.gauge({ | |||
name: "lodestar_peer_manager_starved_bool", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you have time would be good to add a panel somewhere that uses the bool
seems like we need to update tests here
|
Motivation
Description
starved
check in the peer manager heartbeatstarved
when its head hasn't updates since the last heartbeat and is behindSTARVATION_THRESHOLD_SLOTS
starved
and has at leasttargetPeers
, attempt to prune an additionaltargetPeers * STARVATION_PRUNE_RATIO
peersstarved
, disallow pruning of peers that areFAR_AHEAD
of usstarved
or not, adjust the sorting for pruning to disfavor pruning peers that areFAR_AHEAD
of us