tower-abci: handle errors more gracefully #29

Open
erwanor opened this issue Jun 14, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@erwanor
Member

erwanor commented Jun 14, 2023

For each connection, we spawn a tokio task that is responsible for driving the state and handling I/O. A variety of failures can occur in this context, ranging from codec errors to connection failures. Right now, if such a failure occurs, the task simply crashes: the error is not propagated in any way, and nothing in the logs contextualizes the failure (besides a Rust backtrace).
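As a rough illustration of what "contextualizing the failure" could look like, here is a minimal sketch that uses only the standard library. It is not tower-abci code: `ConnectionError`, `drive_connection`, and `run_and_report` are hypothetical names, and the "codec" is a stand-in that rejects empty frames. The idea is that the per-connection worker returns a typed error instead of panicking, and a thin wrapper attaches the connection id before the error is logged or handed on.

```rust
use std::fmt;
use std::io;

/// Hypothetical error type covering the two failure classes named above.
#[derive(Debug)]
enum ConnectionError {
    /// The peer sent something the codec could not decode.
    Codec(String),
    /// The underlying connection failed.
    Io(io::Error),
}

impl fmt::Display for ConnectionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConnectionError::Codec(msg) => write!(f, "codec error: {}", msg),
            ConnectionError::Io(err) => write!(f, "connection error: {}", err),
        }
    }
}

impl std::error::Error for ConnectionError {}

impl From<io::Error> for ConnectionError {
    fn from(err: io::Error) -> Self {
        ConnectionError::Io(err)
    }
}

/// Stand-in for the per-connection loop: counts frames and fails on the
/// first empty one. A real worker would decode requests and call the service.
fn drive_connection(frames: &[Vec<u8>]) -> Result<usize, ConnectionError> {
    let mut handled = 0;
    for frame in frames {
        if frame.is_empty() {
            return Err(ConnectionError::Codec("empty frame".to_string()));
        }
        handled += 1;
    }
    Ok(handled)
}

/// Runs the worker and, on failure, returns a message that names the
/// connection so the log line is useful without a backtrace.
fn run_and_report(conn_id: u64, frames: &[Vec<u8>]) -> Result<usize, String> {
    drive_connection(frames)
        .map_err(|err| format!("connection {} failed: {}", conn_id, err))
}
```

In a tokio-based server the same wrapper would presumably sit inside the spawned task, with the message going to whatever logging facade the crate already uses.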

@erwanor erwanor added the enhancement New feature or request label Jun 14, 2023
@erwanor erwanor removed this from Testnets Jun 14, 2023
@erwanor erwanor added this to Testnets Jun 14, 2023
@erwanor erwanor moved this to Future in Testnets Jun 14, 2023
@erwanor erwanor changed the title Handle errors more gracefully tower-abci: handle errors more gracefully Jun 30, 2023
@erwanor
Member Author

erwanor commented Aug 1, 2023

Related to penumbra-zone/penumbra#689

@xla
Contributor

xla commented Oct 27, 2023

@erwanor I'm interested in tackling this issue. First, I want to clarify that there is no way to propagate non-application errors back to Comet. In fact, for any such error the ABCI app is expected to exit, and both processes are meant to restart to initiate Crash Recovery. Unless Comet has changed within the last couple of months, that is the expected behaviour.

@erwanor
Member Author

erwanor commented Oct 27, 2023

@xla Great point! I have amended the issue. Do you have something specific in mind to address it? If you have the appetite for it, we could track the connection handles and propagate the error to the application when a worker fails. Otherwise, logging would already be a good first step.
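To make the "track the connection handles" idea concrete, here is a minimal sketch under stated assumptions. It uses `std::thread` as a stand-in for tokio tasks so that it runs without extra crates, and `ConnectionTracker` and `WorkerFailure` are hypothetical names, not part of tower-abci. Each worker's handle is kept alongside its connection id, and joining the handles turns both returned errors and panics into a report the application can act on.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical report describing which connection worker failed and why.
#[derive(Debug, PartialEq)]
struct WorkerFailure {
    conn_id: u64,
    reason: String,
}

/// Hypothetical registry that keeps one handle per connection worker.
struct ConnectionTracker {
    handles: Vec<(u64, JoinHandle<Result<(), String>>)>,
}

impl ConnectionTracker {
    fn new() -> Self {
        Self { handles: Vec::new() }
    }

    /// Spawns a worker and records its handle instead of detaching it.
    fn spawn<F>(&mut self, conn_id: u64, work: F)
    where
        F: FnOnce() -> Result<(), String> + Send + 'static,
    {
        self.handles.push((conn_id, thread::spawn(work)));
    }

    /// Waits for every worker and collects the failures in spawn order.
    /// A panic in a worker is reported like any other failure.
    fn join_all(self) -> Vec<WorkerFailure> {
        let mut failures = Vec::new();
        for (conn_id, handle) in self.handles {
            match handle.join() {
                Ok(Ok(())) => {}
                Ok(Err(reason)) => failures.push(WorkerFailure { conn_id, reason }),
                Err(_) => failures.push(WorkerFailure {
                    conn_id,
                    reason: "worker panicked".to_string(),
                }),
            }
        }
        failures
    }
}
```

What the application does with a `WorkerFailure` is a separate decision. Given the point above about Crash Recovery, one option would be to log the failure and exit. With tokio, `tokio::task::JoinSet` or the `JoinHandle` returned by `tokio::spawn` could play the role that `std::thread::JoinHandle` plays here.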


2 participants