ethcore/client: fix deadlock caused by double-read lock and conflicting lock order #11766

BurtonQin · 2020-06-05T09:37:28Z

This PR fixes deadlock in ethcore/client caused by double read lock and conflicting lock order.

The first commit fixes double read lock:
As is explained in #11176, two read locks interleaved by a write lock with higher priority lead to deadlock.

Four such cases are caused by chain.read() in build_last_hashes():
e.g.

build_hashes	restore_db
chain.read()
build_last_hashes
	chain.write()//prior,wait for first read lock
chain.read()//deadlock!

https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L2132-L2134
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L933
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L1299-L1304

Two cases are caused by chain.read() in block_number_ref():

log	restore_db
chain.read()
block_number_ref
	chain.write()//prior,wait for first read lock
chain.read()//deadlock!

https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L1979-L1980
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L2000
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L1271-L1276
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L1299-L1304

The fix is to use lock propagating as suggested by @niklasad1. I appends &RwLockReadGuard to the parameter lists of build_last_hashes() and block_number_ref(). Note that the & is necessary because if we move it, then we cannot access the lockguard after calling build_last_hashes().

But after that, I find a callchain: check_and_lock_block->chain.read()->check_epoch_end_signal->build_last_hashes
So I simply drop(chain) before calling check_epoch_end_signal to avoid changing its function signature. This is in the second commit.

The second commit deals with conflicting lock order:
Most of them are related to the wrong order of locks involving state_db in restore_db:
e.g.

restore_db	check_and_lock_block
state_db.write()
	chain.read()
chain.write()//deadlock!
	state_db.read()//deadlock!

Some are related to import_lock:

e.g.

queue_ancient_block	restore_db
chain.read()
import_old_block
	import_lock.lock()
import_lock.lock()//deadlock!
	chain.write()//deadlock!

The fix is to pass the lockguard as a parameter to import_old_block() and enforces the order between import_lock and chain/db.
An alternative way is to add a comment requiring holding import_lock before import_old_block() and remove the lock inside.

rakita · 2020-08-27T10:14:03Z

Hello, just to put this into context. restore_db, that is in question, is a function called only once after warping is finished and all snapshot chucks are downloaded.

BurtonQin added 2 commits June 5, 2020 15:32

ethcore/client: fix deadlock caused by double read lock

a380494

ethcore/client: fix deadlock caused by conflicting lock order

e616bda

This was referenced Jun 5, 2020

ethcore/test_helpers: fix deadlock caused by double read lock #11767

Open

ethcore/light: fix deadlock caused by conflicting lock order #11768

Open

util/network-devp2p: fix possible deadlock caused by conflicting lock order #11769

Open

sorpaas approved these changes Jul 29, 2020

View reviewed changes

sorpaas added A8-looksgood 🦄 Pull request is reviewed well. M4-core ⛓ Core client code / Rust. labels Jul 29, 2020

ngotchac mentioned this pull request Aug 26, 2020

Client occasionally stops importing new blocks #11758

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ethcore/client: fix deadlock caused by double-read lock and conflicting lock order #11766

ethcore/client: fix deadlock caused by double-read lock and conflicting lock order #11766

BurtonQin commented Jun 5, 2020

rakita commented Aug 27, 2020

ethcore/client: fix deadlock caused by double-read lock and conflicting lock order #11766

Are you sure you want to change the base?

ethcore/client: fix deadlock caused by double-read lock and conflicting lock order #11766

Conversation

BurtonQin commented Jun 5, 2020

rakita commented Aug 27, 2020