Txn Commit can result in transaction guarantee violation #6
Thanks for the review, really appreciated. Yeah, I'm planning to do this, first by merging delete+update+insert together, acquiring the locks, and then releasing them at the end. Do you have any references on the protocols mentioned? I was thinking about sharding locks as well. Currently we have RWLocks per column, but it may be worth sharding them per row range (say, a lock per column for every 100K rows). Not quite sure yet what the best approach is.
Some on the net:
About the sharding, it seems like you could use ClickHouse's MergeTree approach, although ClickHouse doesn't guarantee any isolation. But it seems like a good tradeoff, optimizing for reads.
This behavior also seems to affect the per-column lock behavior. While reading a value, it only locks the column, but the txn code, especially for delete and insert, doesn't do that.
Great, thanks, I'll read through all of this over the weekend. And indeed, I'll write some unit tests that exercise different transaction isolation levels. However, I wonder if we should drop column-level locks and instead create a sharded lock. I'm leaning towards a simple … I think we'll easily be able to guarantee …
Separating read-only and read-write transactions will definitely make the API clearer about the guarantees; I totally support this. As for the per-page (row shard?) locks, how would that handle inserts, since an insert might create a new page? But this will definitely help with the update/delete case.
For isolation level, typical read-focused databases benefit from …
It could be of fixed size, something like a pre-allocated slice, and then use range reduction and modulo to figure out the shard index. For example, if you have a slice of 100 locks and we want a new lock every 1K items, we can do:
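A minimal sketch of the mapping described above. The constants and the `shardIndex` name are illustrative, not from the codebase; it just shows range reduction (divide by the range size) followed by modulo into a fixed pool of locks:

```go
package main

import "fmt"

// Hypothetical sizes for illustration: a fixed pool of 100 locks,
// with a new shard every 1,000 rows.
const (
	numShards = 100
	rangeSize = 1000
)

// shardIndex maps a row offset to a slot in the fixed-size lock pool
// using range reduction followed by modulo.
func shardIndex(row uint32) uint32 {
	return (row / rangeSize) % numShards
}

func main() {
	fmt.Println(shardIndex(500))    // rows 0..999 share shard 0
	fmt.Println(shardIndex(1500))   // shard 1
	fmt.Println(shardIndex(100500)) // wraps back to shard 0 after 100 shards
}
```

Note that with the wraparound, distant row ranges share a lock, so this caps memory at `numShards` locks in exchange for some false contention.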
Yeah, but I think first things first: implementing … properly.
Ah okay, with a predefined max length, locks will be enough.
Do I take it that this repo may support multiple isolation levels at once? Some databases choose to have only one isolation level, because not everyone understands isolation levels well enough.
Definitely true.
Some work in progress: #7
@kelindar Can this work be summarized in the readme to describe the current implementation? From the readme, the suggestion to use merge instead of reading and writing in a single transaction seems to imply that transactions are not atomic. But otherwise, the behaviour and performance of concurrent writes leave me guessing. |
I'll add this to the readme. TL;DR is that transactions are atomic, but the isolation level currently implemented is essentially read committed, which means that you have dirty/phantom reads in the system that you have to deal with. In order to update a particular row without reading it, and hence without being exposed to dirty/phantom reads, merge was introduced. |
Hi
Looking at the txn source, it seems that Commit does not hold the table lock for the entire apply duration, but instead acquires it per delete/update/insert. This means that another concurrent transaction might read some deleted values (because the delete is applied first) and not see the update. Even worse, a first transaction might delete a record while a second one tries to update it, resulting in a lost update. I haven't thought deeply about the per-column lock problems, but they probably have the same issues. For disclosure, I have not tried to reproduce this, but a full lock is needed to ensure only one transaction is updating until it finishes. This is what SQLite does. Databases with concurrent writers (e.g. Postgres/MySQL) have other protocols to handle this (for example, the xmin/xmax check in Postgres, or range/gap locks held until commit in MySQL).
The simplest solution is to just wrap the entire Commit call with a lock, at the cost of reduced concurrency.
What do you think?
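The simple fix suggested above can be sketched as follows. The `Txn` type and its per-phase operation queues are hypothetical stand-ins for the real types; the point is only that all three phases run under one mutex:

```go
package main

import (
	"fmt"
	"sync"
)

// Txn is a hypothetical transaction holding queued operations.
type Txn struct {
	deletes, updates, inserts []func()
}

// commitMu serializes commits: only one writer applies at a time,
// trading write concurrency for atomic visibility of the whole apply.
var commitMu sync.Mutex

func (txn *Txn) Commit() {
	commitMu.Lock()
	defer commitMu.Unlock()

	// All three phases apply under the same lock, so a concurrent
	// reader can never observe the delete without the matching update.
	for _, op := range txn.deletes {
		op()
	}
	for _, op := range txn.updates {
		op()
	}
	for _, op := range txn.inserts {
		op()
	}
}

func main() {
	n := 0
	txn := &Txn{updates: []func(){func() { n++ }}}
	txn.Commit()
	fmt.Println(n) // 1
}
```

This is essentially SQLite's single-writer model; the finer-grained protocols mentioned above (xmin/xmax checks, gap locks) exist precisely to avoid this global serialization.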