Describe the bug
Our THP implementation doesn't handle the case where the underlying transport is blocked for writes, when our peer is not reading it - while trying to send us a (large) message back.
Our current device-side implementation times out after 5 seconds, causing the write operation to fail (when using USB):
|
if isinstance(result, int): |
|
# Can happen when the USB peer is not reading. |
|
raise Timeout("THP write is blocked") |
For this to happen, the host needs to be "blocked" as well (causing a deadlock), e.g. while trying to send another message over the same transport.
In theory, this can also happen before #6135:
- the device send an message to the host, which takes some time before ACKing it
- the device re-transmits and gets blocked
- the host sends an ACK, and then tries to send the next message
- the host gets blocked as well :(
Firmware version and revision
59d0e66
To Reproduce
$ BOOTLOADER_DEVEL=1 TREZOR_MODEL=T3W1 QUIET_MODE=1 PYOPT=0 make -C core build_firmware upload
$ trezorctl -vvv get-features
The device times out during THP handshake:
[08:59:56.627] 11.325 trezor.loop DBG finish: <generator object '_wait_for_packet' at 0x201cccc0>
[08:59:56.630] 11.327 trezor.loop DBG finish: <generator object '_wait_for_packet' at 0x201cd4e0>
[08:59:56.635] 11.332 trezor.wire.thp.interface_context DBG [USBIF] New channel allocated with id: 061a
[08:59:56.638] 11.335 trezor.loop DBG finish: <generator object '_wait_for_packet' at 0x201ce130>
[08:59:56.641] 11.338 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) channel initialization
[08:59:56.643] 11.340 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) get_channel_state: TH1
[08:59:56.645] 11.343 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) reassembled valid message: 42 bytes
[08:59:56.648] 11.345 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) get_channel_state: TH1
[08:59:56.648] 11.346 trezor.wire.thp.received_message_handler DBG [USBIF] handle_state_handshake
[08:59:56.651] 11.348 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) THP ACK piggybacking = True
[08:59:56.652] 11.350 trezor.wire.thp.channel DBG [USBIF] Writing ACK message to a channel with cid: 061a, ack_bit: 0
[08:59:56.785] 11.482 trezor.wire.thp.crypto DBG th1 - enc (key: 167042b8ddc2710d0bcd1bbd7738a3b21966d63d15f390e4d73c25d98b874a7b, nonce: 0, handshake_hash 34d4e0bd7b20a4c269c1b39a5eb5c1494d28d84162a675dbf44393e20020e4c0)
[08:59:56.808] 11.505 trezor.wire.thp.received_message_handler DBG [USBIF] trezor ephemeral public key: 70971dcb2345d6ff1f493e304c4a30fd7d6e55e41d96bf2f96f03984f1234272
[08:59:56.809] 11.506 trezor.wire.thp.received_message_handler DBG [USBIF] encrypted trezor masked static public key: d117090c624b462e81d5b5cd171878d4ff18702eeec7415d9d4998253c96cbfd0d9bc9247933bff76fa9c2d276d63041
[08:59:56.811] 11.508 trezor.wire.thp.received_message_handler DBG [USBIF] tag: 4f3ed5becec95739c31c6a7c4b58bf2d
[08:59:56.813] 11.511 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) Sending 96 bytes, latency: 0 ms
[08:59:56.816] 11.513 trezor.loop DBG finish: <generator object '_write_packets' at 0x201d19a0>
[08:59:57.018] 11.716 trezor.loop ERR exception:
[08:59:57.018] Traceback (most recent call last):
[08:59:57.018] File "trezor/loop.py", in _step
[08:59:57.018] File "trezor/wire/thp/interface_context.py", in _timeout_after
[08:59:57.019] Timeout:
[08:59:57.020] 11.718 trezor.wire.thp.channel WRN Retransmit after 200 ms
[08:59:57.343] 12.040 trezor.loop DBG finish: <generator object '_wait_for_packet' at 0x201d1f00>
[09:00:02.022] 16.720 trezor.loop DBG finish: <generator object '__iter__' at 0x201d23b0>
[09:00:02.024] 16.722 trezor.wire DBG Finished THP session: (<USBIF>, <BLEIF>)
[09:00:02.024] 16.722 trezor.workflow DBG joining 0 workflows
[09:00:02.026] 16.723 trezor.loop DBG finish: <generator object '__iter__' at 0x201cbe20>
[09:00:02.027] 16.724 trezor.loop DBG finish: <generator object 'handle_session' at 0x201c9de0>
[09:00:02.027] 16.725 trezor.loop DBG finish: <generator object '__iter__' at 0x201d2a10>
[09:00:02.028] 16.725 trezor.loop ERR exception:
[09:00:02.028] Traceback (most recent call last):
[09:00:02.028] File "trezor/loop.py", in _step
[09:00:02.028] File "trezor/wire/__init__.py", in handle_session_thp
[09:00:02.028] File "trezor/wire/thp/received_message_handler.py", in handle_received_message
[09:00:02.029] File "trezor/wire/thp/received_message_handler.py", in _handle_state_handshake
[09:00:02.029] File "trezor/wire/thp/channel.py", in write_encrypted_payload
[09:00:02.029] File "trezor/wire/thp/channel.py", in _write_loop
[09:00:02.029] File "trezor/wire/thp/channel.py", in _write_payload_once
[09:00:02.029] Timeout: THP write is blocked
[09:00:02.029] 16.726 session DBG Restarting main loop
[09:00:02.335] 17.032 trezor.loop DBG spawn new task: <generator object 'handle_session' at 0x201c9b60>
[09:00:02.356] 17.053 trezor.workflow DBG setting a new default: <generator>
[09:00:02.356] 17.054 trezor.loop DBG spawn new task: <generator object 'homescreen' at 0x201caa80>
[09:00:02.357] 17.054 trezor.workflow DBG start default: <generator object 'homescreen' at 0x201caa80>
[09:00:02.387] 17.085 trezor.loop DBG finish: <generator object '_wait_for_packet' at 0x201cca40>
[09:00:02.390] 17.088 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) channel initialization
[09:00:02.392] 17.090 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) get_channel_state: TH1
[09:00:02.394] 17.091 trezor.loop DBG finish: <generator object '_wait_for_packet' at 0x201cd5a0>
[09:00:02.397] 17.094 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) get_channel_state: TH1
[09:00:02.399] 17.096 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) reassembled valid message: 73 bytes
[09:00:02.401] 17.099 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) get_channel_state: TH1
[09:00:02.402] 17.099 trezor.wire.thp.received_message_handler DBG [USBIF] handle_state_handshake
[09:00:02.404] 17.102 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) THP ACK piggybacking = True
[09:00:02.407] 17.105 trezor.wire.thp.channel WRN [USBIF] (cid: 061a) Unexpected control byte - ignoring 12061a0044ab22116f824ae297de71ef4a3d0e36d9deb34d901e072891c943698fc9ddc00ce17d78e4fd411d87de7b6d9ea4b93332a825ddcffdd5cf49bb72eec2afab7a17544bf6b7
[09:00:03.092] 17.789 trezor.loop DBG finish: <generator object '_wait_for_packet' at 0x201ceab0>
[09:00:03.095] 17.792 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) get_channel_state: TH1
[09:00:03.096] 17.794 trezor.loop DBG finish: <generator object '_wait_for_packet' at 0x201cf1c0>
[09:00:03.099] 17.797 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) get_channel_state: TH1
[09:00:03.101] 17.799 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) reassembled valid message: 73 bytes
[09:00:03.104] 17.801 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) THP ACK piggybacking = True
[09:00:03.107] 17.804 trezor.wire.thp.channel WRN [USBIF] (cid: 061a) Unexpected control byte - ignoring 12061a0044ab22116f824ae297de71ef4a3d0e36d9deb34d901e072891c943698fc9ddc00ce17d78e4fd411d87de7b6d9ea4b93332a825ddcffdd5cf49bb72eec2afab7a17544bf6b7
[09:00:03.896] 18.593 trezor.loop DBG finish: <generator object '_wait_for_packet' at 0x201cfee0>
[09:00:03.899] 18.596 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) get_channel_state: TH1
[09:00:03.900] 18.598 trezor.loop DBG finish: <generator object '_wait_for_packet' at 0x201d05c0>
[09:00:03.903] 18.601 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) get_channel_state: TH1
[09:00:03.905] 18.603 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) reassembled valid message: 73 bytes
[09:00:03.907] 18.605 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) THP ACK piggybacking = True
[09:00:03.911] 18.608 trezor.wire.thp.channel WRN [USBIF] (cid: 061a) Unexpected control byte - ignoring 12061a0044ab22116f824ae297de71ef4a3d0e36d9deb34d901e072891c943698fc9ddc00ce17d78e4fd411d87de7b6d9ea4b93332a825ddcffdd5cf49bb72eec2afab7a17544bf6b7
[09:00:04.899] 19.597 trezor.loop DBG finish: <generator object '_wait_for_packet' at 0x201d12e0>
[09:00:04.902] 19.600 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) get_channel_state: TH1
[09:00:04.904] 19.601 trezor.loop DBG finish: <generator object '_wait_for_packet' at 0x201d19c0>
[09:00:04.907] 19.604 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) get_channel_state: TH1
[09:00:04.909] 19.606 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) reassembled valid message: 73 bytes
[09:00:04.911] 19.609 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) THP ACK piggybacking = True
[09:00:04.914] 19.612 trezor.wire.thp.channel WRN [USBIF] (cid: 061a) Unexpected control byte - ignoring 12061a0044ab22116f824ae297de71ef4a3d0e36d9deb34d901e072891c943698fc9ddc00ce17d78e4fd411d87de7b6d9ea4b93332a825ddcffdd5cf49bb72eec2afab7a17544bf6b7
[09:00:06.303] 21.000 trezor.loop DBG finish: <generator object '_wait_for_packet' at 0x201d26e0>
[09:00:06.306] 21.003 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) get_channel_state: TH1
[09:00:06.307] 21.005 trezor.loop DBG finish: <generator object '_wait_for_packet' at 0x201d2dc0>
[09:00:06.310] 21.008 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) get_channel_state: TH1
[09:00:06.312] 21.010 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) reassembled valid message: 73 bytes
[09:00:06.314] 21.012 trezor.wire.thp.channel DBG [USBIF] (cid: 061a) THP ACK piggybacking = True
[09:00:06.317] 21.015 trezor.wire.thp.channel WRN [USBIF] (cid: 061a) Unexpected control byte - ignoring 12061a0044ab22116f824ae297de71ef4a3d0e36d9deb34d901e072891c943698fc9ddc00ce17d78e4fd411d87de7b6d9ea4b93332a825ddcffdd5cf49bb72eec2afab7a17544bf6b7
Describe the bug
Our THP implementation doesn't handle the case where the underlying transport is blocked for writes, when our peer is not reading it - while trying to send us a (large) message back.
Our current device-side implementation times out after 5 seconds, causing the write operation to fail (when using USB):
trezor-firmware/core/src/trezor/wire/thp/channel.py
Lines 492 to 494 in 59d0e66
For this to happen, the host needs to be "blocked" as well (causing a deadlock), e.g. while trying to send another message over the same transport.
In theory, this can also happen before #6135:
Firmware version and revision
59d0e66
To Reproduce
The device times out during THP handshake: