Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

ethcore/client: fix deadlock caused by double-read lock and conflicting lock order #11766

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

BurtonQin
Copy link
Contributor

Fixes #11176

This PR fixes deadlock in ethcore/client caused by double read lock and conflicting lock order.

The first commit fixes double read lock:
As is explained in #11176, two read locks interleaved by a write lock with higher priority lead to deadlock.

Four such cases are caused by chain.read() in build_last_hashes():
e.g.

build_hashes restore_db
chain.read()
build_last_hashes
chain.write()//prior,wait for first read lock
chain.read()//deadlock!

https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L2132-L2134
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L933
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L1299-L1304

Two cases are caused by chain.read() in block_number_ref():

log restore_db
chain.read()
block_number_ref
chain.write()//prior,wait for first read lock
chain.read()//deadlock!

https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L1979-L1980
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L2000
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L1271-L1276
https://github.com/openethereum/openethereum/blob/1b0bbd50a4b75d25b52961f34dcccd5836a07c97/ethcore/src/client/client.rs#L1299-L1304

The fix is to use lock propagating as suggested by @niklasad1. I appends &RwLockReadGuard to the parameter lists of build_last_hashes() and block_number_ref(). Note that the & is necessary because if we move it, then we cannot access the lockguard after calling build_last_hashes().

But after that, I find a callchain: check_and_lock_block->chain.read()->check_epoch_end_signal->build_last_hashes
So I simply drop(chain) before calling check_epoch_end_signal to avoid changing its function signature. This is in the second commit.

The second commit deals with conflicting lock order:
Most of them are related to the wrong order of locks involving state_db in restore_db:
e.g.

restore_db check_and_lock_block
state_db.write()
chain.read()
chain.write()//deadlock!
state_db.read()//deadlock!

Some are related to import_lock:

e.g.

queue_ancient_block restore_db
chain.read()
import_old_block
import_lock.lock()
import_lock.lock()//deadlock!
chain.write()//deadlock!

The fix is to pass the lockguard as a parameter to import_old_block() and enforces the order between import_lock and chain/db.
An alternative way is to add a comment requiring holding import_lock before import_old_block() and remove the lock inside.

@sorpaas sorpaas added A8-looksgood 🦄 Pull request is reviewed well. M4-core ⛓ Core client code / Rust. labels Jul 29, 2020
@rakita
Copy link

rakita commented Aug 27, 2020

Hello, just to put this into context. restore_db, that is in question, is a function called only once after warping is finished and all snapshot chucks are downloaded.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A8-looksgood 🦄 Pull request is reviewed well. M4-core ⛓ Core client code / Rust.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible deadlock by recursive calling of parking_lot::read() in one thread in ethcore client
3 participants