New Feature Add! Delta compression ratio can reach up to 77.88x! #245

@apple-ouyang

Description

Titan can now use delta compression.
Here is my code repository.
According to the test results, the compression ratio for delta-compressed records can reach up to 77.88x!
However, the shrink ratio of the database's on-disk size is much smaller.
See my test results below.
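
The gap between the two ratios is partly arithmetic: a per-record ratio only applies to the bytes that were actually delta compressed. The sketch below is a simplified model, not a measurement. It assumes a fraction of the stored bytes shrinks by the delta ratio while the rest stays unchanged; the function name `overall_ratio` and the 20% fraction in the usage note are hypothetical and do not come from the tests.

```python
def overall_ratio(fraction_compressible: float, delta_ratio: float) -> float:
    """Whole-dataset ratio when only part of the bytes shrinks by delta_ratio.

    fraction_compressible: share of the original bytes that gets delta
        compressed (hypothetical input, between 0.0 and 1.0).
    delta_ratio: compression ratio achieved on that share.
    """
    if not 0.0 <= fraction_compressible <= 1.0:
        raise ValueError("fraction_compressible must be between 0 and 1")
    if delta_ratio <= 0.0:
        raise ValueError("delta_ratio must be positive")
    # Bytes left over, as a share of the original size:
    # the untouched part plus the shrunken compressible part.
    remaining = (1.0 - fraction_compressible) + fraction_compressible / delta_ratio
    return 1.0 / remaining
```

For example, `overall_ratio(0.2, 77.88)` comes out at roughly 1.25: even a 77.88x ratio on one fifth of the bytes shrinks the whole by only about a quarter. Any compression already applied to the stored data would change the real numbers further.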

Delta Compression procedure

  1. Every call to Put generates a feature for the record using the Odess similarity detection method.
  2. The feature is stored in a feature index table. Each column family has its own table.
  3. During GC, each valid record's feature is used to search the table for similar records.
  4. Once similar records are found in the table, they are compressed into one base record plus multiple deltas.
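
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from the repository: the feature is a plain min-hash over byte shingles standing in for Odess, the delta encoder is zlib with a preset dictionary standing in for GDelta/XDelta/EDelta, and every name here (`feature`, `FeatureIndex`, `gc_pass`, `read`, `WINDOW`) is hypothetical.

```python
import hashlib
import zlib
from collections import defaultdict

WINDOW = 16  # shingle length in bytes (hypothetical choice)


def feature(value):
    """Stand-in similarity feature: minimum hash over all byte shingles."""
    if len(value) <= WINDOW:
        shingles = [value]
    else:
        shingles = [value[i:i + WINDOW] for i in range(len(value) - WINDOW + 1)]
    return min(
        int.from_bytes(hashlib.blake2b(s, digest_size=8).digest(), "big")
        for s in shingles
    )


class FeatureIndex:
    """Maps a feature to the keys of the records that produced it."""

    def __init__(self):
        self._by_feature = defaultdict(list)

    def put(self, key, value):
        # Steps 1 and 2: compute the feature on Put and store it.
        self._by_feature[feature(value)].append(key)

    def groups(self):
        # Step 3: keys sharing a feature are treated as similar.
        return [keys for keys in self._by_feature.values() if len(keys) > 1]


def encode_delta(base, target):
    # Stand-in delta encoder: deflate the target with the base as dictionary.
    c = zlib.compressobj(level=9, zdict=base)
    return c.compress(target) + c.flush()


def decode_delta(base, delta):
    d = zlib.decompressobj(zdict=base)
    return d.decompress(delta) + d.flush()


def gc_pass(records, index):
    """Step 4: rewrite each similar group as one base record plus deltas.

    Returns {key: (base_key_or_None, payload)}.
    """
    store = {}
    for keys in index.groups():
        base_key = keys[0]
        store[base_key] = (None, records[base_key])
        for key in keys[1:]:
            store[key] = (base_key, encode_delta(records[base_key], records[key]))
    for key, value in records.items():
        store.setdefault(key, (None, value))  # records with no similar peer
    return store


def read(store, key):
    base_key, payload = store[key]
    if base_key is None:
        return payload
    return decode_delta(store[base_key][1], payload)
```

A short usage example, with made-up records:

```python
records = {
    b"k1": b"hello world " * 50,
    b"k2": b"hello world " * 50,
    b"k3": b"something else entirely",
}
index = FeatureIndex()
for key, value in records.items():
    index.put(key, value)
store = gc_pass(records, index)
```

Keeping one index per column family, as described above, would amount to holding one `FeatureIndex` instance per column family in this sketch.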

Question

I want to test the impact of delta compression on Titan.
But I see two tools for testing:

  1. the scripts in /tools
  2. go-ycsb, which is used in this article

Here are my questions:

  1. If I use the scripts in /tools, there are a lot of jobs. Which should I choose?
  2. If I use go-ycsb, which parameters should I use so that my results can be compared with those in this article?

Test result

Here is a summary of the results from titan_delta_compression_test:

Enron Email

517401 records have been put into the Titan database!

1.40GB (1420666341) is the total size of the keys and values

59113 (11.42%) is the number of similar records that can be delta compressed

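| Method  | Compress fail | Compress success | Size before delta | Size after delta | Delta compression ratio | Compress time |
|---------|---------------|------------------|-------------------|------------------|-------------------------|---------------|
| kGDelta | 0             | 97799            | 978.90MB          | 17.10MB          | 57.48                   | 1.05s         |
| kXDelta | 0             | 97799            | 978.90MB          | 12.60MB          | 77.88                   | 11.60s        |
| kEDelta | 0             | 97799            | 978.90MB          | 25.30MB          | 38.83                   | 1.40s         |

| Method  | Database size before | Database size after | Database compression ratio | Blob files size before | Blob files size after | Blob file compression ratio |
|---------|----------------------|---------------------|----------------------------|------------------------|-----------------------|-----------------------------|
| kGDelta | 1.20GB               | 974.50MB            | 1.17                       | 386.90MB               | 149.00MB              | 2.60                        |
| kXDelta | 1.20GB               | 974.50MB            | 1.17                       | 386.90MB               | 149.00MB              | 2.60                        |
| kEDelta | 1.20GB               | 974.50MB            | 1.17                       | 386.90MB               | 149.00MB              | 2.60                        |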

Wikipedia

1367732 records have been put into the Titan database!

19.10GB (20402694776) is the total size of the keys and values

731224 (53.46%) is the number of similar records that can be delta compressed

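| Method  | Compress fail | Compress success | Size before delta | Size after delta | Delta compression ratio | Compress time |
|---------|---------------|------------------|-------------------|------------------|-------------------------|---------------|
| kGDelta | 16            | 729411           | 9.70GB            | 1.50GB           | 6.61                    | 69.51s        |
| kXDelta | 0             | 729427           | 9.70GB            | 741.70MB         | 13.34                   | 360.59s       |
| kEDelta | 19            | 729408           | 9.70GB            | 2.20GB           | 4.58                    | 81.94s        |

| Method  | Database size before | Database size after | Database compression ratio | Blob files size before | Blob files size after | Blob file compression ratio |
|---------|----------------------|---------------------|----------------------------|------------------------|-----------------------|-----------------------------|
| kGDelta | 7.80GB               | 7.30GB              | 1.08                       | 7.50GB                 | 6.80GB                | 1.10                        |
| kXDelta | 7.80GB               | 7.30GB              | 1.08                       | 7.50GB                 | 6.80GB                | 1.10                        |
| kEDelta | 7.80GB               | 7.30GB              | 1.08                       | 7.50GB                 | 6.80GB                | 1.10                        |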
