
Investigate why Lighthouse/Lighthouse combo experiences issues with missing duties #3088

Open
6 tasks
boulder225 opened this issue May 17, 2024 · 1 comment
Labels
bug Something isn't working protocol Protocol Team tickets


@boulder225
Collaborator

🎯 Problem to be solved

The Lighthouse/Lighthouse combo (running with the distributed flag) is experiencing issues with missing duties, although the situation improves slightly over time. Specifically:

  • aggregate_attestation duties are being missed.
  • sync_committee_contribution duties are being missed.

This could be due to another misconfiguration, probably in Kurtosis. In addition, the beacon node score is low (97%), and the VC loads keys extremely slowly (15 minutes or longer).


VC log:
2024-05-17 11:41:02 May 17 08:41:02.002 INFO All validators active                   slot: 120, epoch: 3, total_validators: 600, active_validators: 600, current_epoch_proposers: 11, service: notifier
2024-05-17 11:41:04 May 17 08:41:04.002 DEBG Request to beacon node failed           error: HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out), node: http://node0:3600/
2024-05-17 11:41:04 May 17 08:41:04.002 CRIT Failed to produce sync contribution     error: Some endpoints failed, num_failed: 1 http://node0:3600/ => RequestFailed(HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out)), beacon_block_root: 0xf39e22a862b1b632a791c716fed6b5196b6dfac88c41ed69185b189f9fdc3a1c, slot: 119, service: sync_committee
2024-05-17 11:41:04 May 17 08:41:04.002 DEBG Request to beacon node failed           error: HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out), node: http://node0:3600/
2024-05-17 11:41:04 May 17 08:41:04.002 CRIT Failed to produce sync contribution     error: Some endpoints failed, num_failed: 1 http://node0:3600/ => RequestFailed(HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out)), beacon_block_root: 0xf39e22a862b1b632a791c716fed6b5196b6dfac88c41ed69185b189f9fdc3a1c, slot: 119, service: sync_committee
2024-05-17 11:41:04 May 17 08:41:04.002 DEBG Request to beacon node failed           error: HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out), node: http://node0:3600/
2024-05-17 11:41:04 May 17 08:41:04.002 CRIT Failed to produce sync contribution     error: Some endpoints failed, num_failed: 1 http://node0:3600/ => RequestFailed(HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out)), beacon_block_root: 0xf39e22a862b1b632a791c716fed6b5196b6dfac88c41ed69185b189f9fdc3a1c, slot: 119, service: sync_committee
2024-05-17 11:41:04 May 17 08:41:04.002 DEBG Request to beacon node failed           error: "Failed to produce an aggregate attestation: HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out)", node: http://node0:3600/
2024-05-17 11:41:04 May 17 08:41:04.002 CRIT Error during attestation routine        slot: 119, committee_index: 0, error: "Some endpoints failed, num_failed: 1 http://node0:3600/ => RequestFailed(\"Failed to produce an aggregate attestation: HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out)\")", service: attestation
2024-05-17 11:41:04 May 17 08:41:04.002 DEBG Request to beacon node failed           error: HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out), node: http://node0:3600/
2024-05-17 11:41:04 May 17 08:41:04.002 CRIT Failed to produce sync contribution     error: Some endpoints failed, num_failed: 1 http://node0:3600/ => RequestFailed(HttpClient(url: http://node0:3600/, kind: timeout, detail: operation timed out)), beacon_block_root: 0xf39e22a862b1b632a791c716fed6b5196b6dfac88c41ed69185b189f9fdc3a1c, slot: 119, service: sync_committee

charon log:

2024-05-17 11:41:40 08:41:40.003 DEBG vapi       Validator api 4xx response               {"status_code": 408, "message": "client cancelled request", "error": "api error[status=408,msg=client cancelled request]: context canceled", "duration": "12.00039563s", "vapi_endpoint": "sync_committee_contribution"}
2024-05-17 11:41:40 08:41:40.003 DEBG vapi       Validator api 4xx response               {"status_code": 408, "message": "client cancelled request", "error": "api error[status=408,msg=client cancelled request]: context canceled", "duration": "12.000778464s", "vapi_endpoint": "sync_committee_contribution"}
2024-05-17 11:41:40 08:41:40.003 DEBG vapi       Validator api 4xx response               {"status_code": 408, "message": "client cancelled request", "error": "api error[status=408,msg=client cancelled request]: context canceled", "duration": "12.000870464s", "vapi_endpoint": "aggregate_attestation"}
2024-05-17 11:41:40 08:41:40.003 DEBG vapi       Validator api 4xx response               {"status_code": 408, "message": "client cancelled request", "error": "api error[status=408,msg=client cancelled request]: context canceled", "duration": "12.001038714s", "vapi_endpoint": "sync_committee_contribution"}
2024-05-17 11:41:44 08:41:44.001 DEBG sched      Slot ticked                              {"slot": 124}
2024-05-17 11:41:44 08:41:44.003 DEBG fetcher    Timeout calling fetcher/fetch, duty expired {"duty": "119/aggregator"}
2024-05-17 11:41:44 08:41:44.004 DEBG fetcher    Timeout calling fetcher/fetch, duty expired {"duty": "119/sync_contribution"}
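To see which validator API endpoints are affected and how often, the charon vapi lines can be tallied by endpoint and status code. The helper below is a hypothetical sketch, not part of charon; it assumes each vapi line ends with a single-line JSON object carrying `vapi_endpoint` and `status_code`, as in the excerpt above.

```python
import json
import re
from collections import Counter
from typing import Iterable

# Assumption: charon appends one single-line JSON object of structured fields to each log line.
JSON_TAIL = re.compile(r"(\{.*\})\s*$")


def tally_vapi_errors(lines: Iterable[str]) -> Counter:
    """Count charon validator-API responses per (endpoint, status code).

    Hypothetical helper: relies on the log layout shown in this issue,
    where vapi lines carry "vapi_endpoint" and "status_code" fields.
    """
    counts: Counter = Counter()
    for line in lines:
        if " vapi " not in line:
            continue  # skip sched, fetcher and other components
        match = JSON_TAIL.search(line)
        if match is None:
            continue
        try:
            fields = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # malformed or truncated line
        endpoint = fields.get("vapi_endpoint")
        status = fields.get("status_code")
        if endpoint is not None and status is not None:
            counts[(endpoint, status)] += 1
    return counts
```

Passing an open log file, e.g. `tally_vapi_errors(open("charon.log"))`, returns a `Counter` whose `most_common()` output shows whether the 408s are concentrated on `aggregate_attestation` and `sync_committee_contribution` or spread across all endpoints.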

CL error:

2024-05-17 11:42:35 May 17 08:42:35.060 WARN Relay error when registering validator(s), error: ServerMessage(ErrorMessage { code: 502, message: "no successful relay response", stacktraces: [] }), num_registrations: 600

2024-05-17 11:43:08 May 17 08:43:08.014 ERRO No valid eth1_data votes, `votes_to_consider` empty, outcome: casting `state.eth1_data` as eth1 vote, genesis_time: 1715933816, earliest_block_timestamp: 1715933796, lowest_block_number: 0, service: deposit_contract_rpc
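The log timestamps can be mapped onto slots to check whether the failures line up with duty deadlines. This sketch assumes 12-second slots and 32 slots per epoch (the logged slot and epoch numbers are consistent with these values), takes `genesis_time` from the eth1_data line above, and treats the second timestamp on each log line as UTC. Aggregates and sync committee contributions are due two thirds of the way into a slot.

```python
from datetime import datetime, timezone

# Assumed chain parameters (mainnet-style); the logged slot/epoch numbers fit them.
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
GENESIS_TIME = 1715933816  # genesis_time from the eth1_data log line


def slot_at(unix_ts: int) -> int:
    """Slot in progress at the given Unix timestamp."""
    return (unix_ts - GENESIS_TIME) // SECONDS_PER_SLOT


def epoch_of(slot: int) -> int:
    return slot // SLOTS_PER_EPOCH


def slot_start(slot: int) -> int:
    return GENESIS_TIME + slot * SECONDS_PER_SLOT


def aggregation_time(slot: int) -> int:
    """Aggregates and sync contributions are produced two thirds into the slot."""
    return slot_start(slot) + 2 * SECONDS_PER_SLOT // 3


def utc(unix_ts: int) -> str:
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc).strftime("%H:%M:%S")


if __name__ == "__main__":
    # 08:41:02 UTC, the "All validators active" line in the VC log.
    print(slot_at(1715935262), epoch_of(slot_at(1715935262)))  # 120 3
    print(utc(aggregation_time(119)))       # 08:40:52
    print(utc(aggregation_time(119) + 12))  # 08:41:04
```

Under these assumptions, a slot 119 aggregate request sent at 08:40:52 and held for the roughly 12 s that charon reports before the client cancels would fail at 08:41:04, matching the CRIT lines in the VC log.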

🛠️ Proposed solution

  • Investigate the root cause of the missing aggregate_attestation duties and address any underlying issues.
  • Investigate the root cause of the missing sync_committee_contribution duties and address any underlying issues.
  • Review the validator client's logs and configuration to identify potential factors contributing to the missed duties.
  • Investigate the cause of the low beacon node score and address any underlying issues.
  • Use the Lighthouse validator manager guide (https://lighthouse-book.sigmaprime.io/validator-manager-create.html#1-create-the-validators) to generate a validators.json file, which may improve key loading performance.
  • Explore alternative methods or optimizations for loading keys more efficiently, instead of the current one-by-one approach in the run.sh script.
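One alternative to importing keys one at a time is to write Lighthouse's `validator_definitions.yml` directly, so the VC picks up every keystore on a single start. This is a different technique from the validator manager flow linked above. The sketch assumes charon's `keystore-N.json` / `keystore-N.txt` naming in a single directory and a VC that reads `validator_definitions.yml` from its validators directory; `build_validator_definitions` is a hypothetical helper, and the public key comes from each keystore's EIP-2335 `pubkey` field.

```python
import json
from pathlib import Path


def build_validator_definitions(keys_dir: Path) -> str:
    """Render a Lighthouse validator_definitions.yml for every keystore-*.json
    in keys_dir, each pointing at its matching keystore-*.txt password file.

    Hypothetical helper: assumes the keystore-N.json / keystore-N.txt layout;
    adjust the glob and paths to whatever run.sh actually mounts.
    """
    lines = ["---"]
    for keystore in sorted(keys_dir.glob("keystore-*.json")):
        password = keystore.with_suffix(".txt")
        if not password.exists():
            raise FileNotFoundError(f"missing password file for {keystore.name}")
        # EIP-2335 stores the pubkey as hex, normally without a 0x prefix.
        pubkey = str(json.loads(keystore.read_text())["pubkey"])
        if pubkey.startswith("0x"):
            pubkey = pubkey[2:]
        lines += [
            "- enabled: true",
            f'  voting_public_key: "0x{pubkey}"',
            "  type: local_keystore",
            # json.dumps yields a double-quoted scalar that YAML also accepts.
            f"  voting_keystore_path: {json.dumps(str(keystore.resolve()))}",
            f"  voting_keystore_password_path: {json.dumps(str(password.resolve()))}",
        ]
    return "\n".join(lines) + "\n"
```

The returned string would be written to `validator_definitions.yml` in the VC's validators directory before the VC starts. The VC still has to decrypt every keystore at startup, so if the 15-minute delay comes from key decryption rather than the import loop, this alone may not remove it.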
@boulder225 boulder225 added the bug Something isn't working label May 17, 2024
@github-actions github-actions bot added the protocol Protocol Team tickets label May 17, 2024
@boulder225
Collaborator Author

Hey team! Please add your planning poker estimate with Zenhub @gsora @KaloyanTanev @pinebit
