fix(core): continue backup workflow even after I/O errors & timeouts#6651
fix(core): continue backup workflow even after I/O errors & timeouts#6651romanz wants to merge 9 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR moves the unresponsive-warning timeout into the wire protocol handler, adds a module-level timeout and _waiting_screen(), converts ButtonRequestHandler.br_task() to an async coroutine and adjusts put()/join() signatures, updates ContinueOnErrors to retry on UnexpectedMessageException without failing, removes the UI-side delayed waiting coroutine and adapts Layout to await the handler directly, and adds/refactors tests plus updated UI fixtures and a changelog line. sequenceDiagram
participant Client
participant Layout
participant ButtonRequestHandler
participant _waiting_screen
Client->>Layout: initiates operation requiring button ack
Layout->>ButtonRequestHandler: put(button request)
Client->>ButtonRequestHandler: send response (or delayed)
Layout->>ButtonRequestHandler: join()
alt response before timeout
ButtonRequestHandler->>Layout: is_done -> join() returns
else timeout
ButtonRequestHandler->>_waiting_screen: start timeout and show comm warning
_waiting_screen->>ButtonRequestHandler: may raise cancellation -> abort flow
ButtonRequestHandler->>Layout: join() returns (possibly with Failure)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
4d11c88 to
b46944e
Compare
b46944e to
efc2321
Compare
efc2321 to
d419aa0
Compare
c754fc8 to
374d32f
Compare
d419aa0 to
7d316d2
Compare
80f484b to
1a197d9
Compare
7d316d2 to
59dfbe0
Compare
fdb4567 to
a000c84
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/src/trezor/wire/protocol_common.py`:
- Around line 267-272: The br_task method currently returns None when
self.ignore is true, which breaks callers (Layout.get_result) that always
schedule the returned task; change br_task in protocol_common.py so that when
self.ignore is set it returns a schedulable no-op generator/task instead of None
(i.e., a generator that yields nothing or immediately completes) so callers like
Layout.get_result and code paths that call br_handler.br_task(...) (e.g.,
layout.show_backup_success) can store and schedule it without special-casing “no
task.”
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 79e8abe1-55f8-4539-9230-1a1a6da2f595
📒 Files selected for processing (4)
core/.changelog.d/6348.fixedcore/src/apps/management/reset_device/__init__.pycore/src/trezor/ui/__init__.pycore/src/trezor/wire/protocol_common.py
a000c84 to
f71078d
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
core/src/trezor/wire/protocol_common.py (1)
285-317: 🏗️ Heavy liftPlease lock the ignore-host transition down with a focused regression test.
This path now depends on
successsurvivingUnexpectedMessageException, generic handler failures, and task cancellation, withself.ignorethen turning laterput()/join()/br_task()calls into no-ops. That behavior is subtle and central to the backup-safety story, so it deserves a dedicated test instead of relying only on fixture updates.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/trezor/wire/protocol_common.py` around lines 285 - 317, Add a focused regression test for the ignore-host transition in ContinueOnErrors: exercise the _handle flow to reproduce the three paths mentioned (Successful loop iteration, an UnexpectedMessageException during the inner loop, and task cancellation/GeneratorExit) and assert that self.ignore is set to False only when success=True and True otherwise; also assert downstream methods that rely on ignore (e.g., put(), join(), br_task() on the ButtonRequest handler) become no-ops when ignore is True. Use the ContinueOnErrors class, trigger UnexpectedMessageException and a generic Exception/cancellation, and validate the final value of the instance attribute ignore and the observable behavior of put()/join()/br_task() to lock this regression in a concise unit test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@core/src/trezor/wire/protocol_common.py`:
- Around line 285-317: Add a focused regression test for the ignore-host
transition in ContinueOnErrors: exercise the _handle flow to reproduce the three
paths mentioned (Successful loop iteration, an UnexpectedMessageException during
the inner loop, and task cancellation/GeneratorExit) and assert that self.ignore
is set to False only when success=True and True otherwise; also assert
downstream methods that rely on ignore (e.g., put(), join(), br_task() on the
ButtonRequest handler) become no-ops when ignore is True. Use the
ContinueOnErrors class, trigger UnexpectedMessageException and a generic
Exception/cancellation, and validate the final value of the instance attribute
ignore and the observable behavior of put()/join()/br_task() to lock this
regression in a concise unit test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 274b60e8-fb77-4754-bd8d-479d026fb110
📒 Files selected for processing (1)
core/src/trezor/wire/protocol_common.py
lapohoda
left a comment
There was a problem hiding this comment.
- I'd omit the use of icon-only header unless we need to.
- There's an update proposal from Chris regarding copy:
"Communication with your connected device failed. Cancel and try again."
Button: Cancel.
Design file updated: https://www.figma.com/design/XzPdVcLsDdvJrMfzyLho5Y/GitHub--6348?node-id=8028-49&t=Gqe4fyfXxmgBVyEu-11
|
Rebasing to update UI fixtures. |
a503508 to
bc6da12
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/device_tests/test_msg_delayed_ack.py (1)
42-50: ⚡ Quick winAdd clarifying comment for the second
press_yes()call.The second
press_yes()on line 46 is not immediately obvious—it's pressing the abort button on the "waiting" screen that appears after the 2.5-second timeout, not the initial button request confirmation (which was already done inping_without_ack). A brief comment would help readers understand the flow.📝 Suggested comment
`@pytest.mark.models`("core") def test_abort(session: Session): ping_without_ack(session) + # Abort the "waiting" screen that appears after the timeout session.debug.press_yes() assert session.read() == messages.Failure(🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/device_tests/test_msg_delayed_ack.py` around lines 42 - 50, Add a brief clarifying comment above the second session.debug.press_yes() in test_abort to explain that this press is dismissing the "waiting" screen shown after the 2.5s timeout (not the initial confirmation already handled by ping_without_ack); reference the test name test_abort, the helper ping_without_ack, and the press call session.debug.press_yes() so reviewers understand the flow before the subsequent assertions against messages.Failure and session.client.ping.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/device_tests/test_msg_delayed_ack.py`:
- Around line 42-50: Add a brief clarifying comment above the second
session.debug.press_yes() in test_abort to explain that this press is dismissing
the "waiting" screen shown after the 2.5s timeout (not the initial confirmation
already handled by ping_without_ack); reference the test name test_abort, the
helper ping_without_ack, and the press call session.debug.press_yes() so
reviewers understand the flow before the subsequent assertions against
messages.Failure and session.client.ping.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3790924d-6a4e-40cb-8ed8-c0fd37a8f325
📒 Files selected for processing (2)
tests/device_tests/test_msg_delayed_ack.pytests/ui_tests/fixtures.json
Sounds good, will update.
Will it be please possible to update the copy to allow also continuation (in case there is no response from the host during backup flow)? The current implementation changes the "verb" from EDIT: let's use "Communication with your connected device failed. It’s safe to continue." + |
|
Let's postpone this for |
cf8cdbf to
14dc12b
Compare
Use `None` as a special marker for explicitly hiding the header. [no changelog]
Implemented for Bolt and Eckhart. It was done for Delizia in a6090dc. [no changelog]
[no changelog]
No more button requests / errors will be sent later during the backup workflow after an I/O-related error/timeout. Host will be ignored until the backup workflow is over - sending the final "Success" response may fail. Other workflows can be aborted after user cancellation via `_waiting_screen()`.
14dc12b to
ad3a86c
Compare




























































































































































Related to #3686 (comment).
No more button requests / errors will be sent later during the backup workflow after an I/O-related error/timeout.
Host will be ignored until the backup workflow is over - sending the final "Success" response may fail.
Other workflows can be aborted after user cancellation on
_waiting_screen().Timeout handling
In case the button request is not ACKed for 2s, the "trouble communicating" layout will allow aborting (for regular workflows) or continuing (for backup workflow):
TODOs: