Description
Titan can now use delta compression.
Here is my code repository
According to the test results, the compression ratio for delta-compressed records can reach up to 77.88x.
However, the on-disk database size shrinks far less (1.08x to 1.17x in my tests).
You can see my test result below.
Delta Compression procedure
- Every call to Put generates a feature for the record using the Odess similarity detection method.
- The feature of the record is stored in the feature index table. Every column family has its own table.
- During GC, each valid record is looked up by feature to find similar records.
- Once similar records are found in the table, they are compressed into one base record plus multiple deltas.
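The four steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the code in the repository: `features` is a simplified, Odess-inspired feature (sampled sliding-window hashes passed through fixed linear transforms, keeping the maximum of each), and the delta encoder is zlib with the base record as a preset dictionary, standing in for kGDelta/kXDelta/kEDelta. All names (`DeltaStore`, `encode_delta`, `restore`) and constants are hypothetical.

```python
import hashlib
import zlib
from collections import defaultdict

WINDOW = 16          # sliding-window size in bytes (assumed)
SAMPLE_MASK = 0x7    # keep roughly 1 in 8 windows (assumed)
MASK32 = 0xFFFFFFFF
# Arbitrary (multiplier, offset) pairs; one linear transform per feature.
TRANSFORMS = [(0x9E3779B1, 0x7F4A7C15), (0x85EBCA6B, 0xC2B2AE35), (0x27D4EB2F, 0x165667B1)]


def features(value: bytes) -> tuple:
    """Simplified similarity features: similar values tend to share at least one."""
    feats = [0] * len(TRANSFORMS)
    for i in range(max(1, len(value) - WINDOW + 1)):
        digest = hashlib.blake2b(value[i:i + WINDOW], digest_size=4).digest()
        h = int.from_bytes(digest, "big")
        if h & SAMPLE_MASK:          # content-defined sampling: skip most windows
            continue
        for j, (mul, add) in enumerate(TRANSFORMS):
            feats[j] = max(feats[j], (mul * h + add) & MASK32)
    return tuple(feats)


def encode_delta(base: bytes, target: bytes) -> bytes:
    """Stand-in delta encoder: deflate `target` with `base` as preset dictionary.
    zlib only looks back 32 KiB, so a real delta codec handles large records better."""
    c = zlib.compressobj(level=6, zdict=base)
    return c.compress(target) + c.flush()


def decode_delta(base: bytes, delta: bytes) -> bytes:
    d = zlib.decompressobj(zdict=base)
    return d.decompress(delta) + d.flush()


class DeltaStore:
    """Toy model of a single column family."""

    def __init__(self):
        self.records = {}                        # key -> raw value
        self.feats = {}                          # key -> features computed at Put time
        self.feature_index = defaultdict(set)    # feature -> keys that carry it

    def put(self, key: bytes, value: bytes) -> None:
        for f in self.feats.get(key, ()):        # drop stale index entries on overwrite
            self.feature_index[f].discard(key)
        self.records[key] = value
        self.feats[key] = features(value)
        for f in self.feats[key]:
            if f:                                # 0 means "no window was sampled"
                self.feature_index[f].add(key)

    def gc(self) -> dict:
        """Lay out every record as ("base", value) or ("delta", base_key, delta)."""
        layout = {}
        for key, value in self.records.items():
            if key in layout:
                continue
            layout[key] = ("base", value)
            for f in self.feats[key]:
                for other in sorted(self.feature_index.get(f, ())):
                    if other not in layout:
                        delta = encode_delta(value, self.records[other])
                        layout[other] = ("delta", key, delta)
        return layout


def restore(layout: dict, key: bytes) -> bytes:
    entry = layout[key]
    if entry[0] == "base":
        return entry[1]
    _, base_key, delta = entry
    return decode_delta(layout[base_key][1], delta)
```

A record is only ever encoded against a `"base"` entry, so `restore` needs a single lookup and never follows a chain of deltas. Whether the real implementation allows delta chains is not stated in this issue.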
Question
I want to test the impact of delta compression on Titan.
However, I see two tools for testing:
Here are my questions:
- If I use the script in /tools, there are a lot of jobs to choose from. Which one should I use?
- If I use go-ycsb, are there any parameters I can use so that my results are comparable with the results in this article?
Test result
Here is the summary result of titan_delta_compression_test
Enron Email
517401 records were put into the Titan database.
Total size of keys and values: 1.40GB (1420666341)
Number of similar records that can be delta compressed: 59113 (11.42%)
| method | compress fail | compress success | size before delta compression | size after delta compression | delta compress ratio | compress time |
|---|---|---|---|---|---|---|
| kGDelta | 0 | 97799 | 978.90MB | 17.10MB | 57.48 | 1.05s |
| kXDelta | 0 | 97799 | 978.90MB | 12.60MB | 77.88 | 11.60s |
| kEDelta | 0 | 97799 | 978.90MB | 25.30MB | 38.83 | 1.40s |
| method | database size before | database size after | database compress ratio | blob files size before | blob files size after | blob file compress ratio |
|---|---|---|---|---|---|---|
| kGDelta | 1.20GB | 974.50MB | 1.17 | 386.90MB | 149.00MB | 2.60 |
| kXDelta | 1.20GB | 974.50MB | 1.17 | 386.90MB | 149.00MB | 2.60 |
| kEDelta | 1.20GB | 974.50MB | 1.17 | 386.90MB | 149.00MB | 2.60 |
Wikipedia
1367732 records were put into the Titan database.
Total size of keys and values: 19.10GB (20402694776)
Number of similar records that can be delta compressed: 731224 (53.46%)
| method | compress fail | compress success | size before delta compression | size after delta compression | delta compress ratio | compress time |
|---|---|---|---|---|---|---|
| kGDelta | 16 | 729411 | 9.70GB | 1.50GB | 6.61 | 69.51s |
| kXDelta | 0 | 729427 | 9.70GB | 741.70MB | 13.34 | 360.59s |
| kEDelta | 19 | 729408 | 9.70GB | 2.20GB | 4.58 | 81.94s |
| method | database size before | database size after | database compress ratio | blob files size before | blob files size after | blob file compress ratio |
|---|---|---|---|---|---|---|
| kGDelta | 7.80GB | 7.30GB | 1.08 | 7.50GB | 6.80GB | 1.10 |
| kXDelta | 7.80GB | 7.30GB | 1.08 | 7.50GB | 6.80GB | 1.10 |
| kEDelta | 7.80GB | 7.30GB | 1.08 | 7.50GB | 6.80GB | 1.10 |