This repository was archived by the owner on Oct 18, 2023. It is now read-only.

proposal: Add more storage services support for bottomless #711

Open
@Xuanwo


Summary

Add more storage services support for bottomless

Motivation

bottomless implements a virtual write-ahead log (WAL) which continuously backs up the data to S3-compatible storage and is able to restore it later. It's natural to consider extending this feature to other storage services such as GCS, AzBlob, HDFS, and more.

Guide-level explanation

Users can serve and replicate SQLite files stored on GCS or Azure Blob Storage (azblob) in the same way as they do on AWS S3:

LIBSQL_BOTTOMLESS_GCS_BUCKET=<bucket>

or

LIBSQL_BOTTOMLESS_AZBLOB_BUCKET=<bucket>
LIBSQL_BOTTOMLESS_AZBLOB_ACCOUNT_NAME=<account_name>
LIBSQL_BOTTOMLESS_AZBLOB_ACCOUNT_KEY=<account_key>
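
As a rough illustration of how these variables might be turned into a backend choice, here is a minimal sketch. The `Backend` enum and `backend_from_env` function are hypothetical names that do not exist in bottomless, and the precedence (GCS checked before azblob) is an assumption. Only the variable names come from this proposal.

```rust
use std::collections::HashMap;

/// Hypothetical per-backend settings; names are illustrative only.
#[derive(Debug, PartialEq)]
enum Backend {
    Gcs { bucket: String },
    Azblob { bucket: String, account_name: String, account_key: String },
}

/// Picks a backend using a variable lookup function.
/// Taking the lookup as a parameter keeps this testable without
/// touching the real process environment.
fn backend_from_env<F>(lookup: F) -> Result<Backend, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: GCS wins if both backends are configured.
    if let Some(bucket) = lookup("LIBSQL_BOTTOMLESS_GCS_BUCKET") {
        return Ok(Backend::Gcs { bucket });
    }
    if let Some(bucket) = lookup("LIBSQL_BOTTOMLESS_AZBLOB_BUCKET") {
        // azblob needs all three variables; report the first missing one.
        let require = |name: &str| lookup(name).ok_or_else(|| format!("{} is not set", name));
        return Ok(Backend::Azblob {
            bucket,
            account_name: require("LIBSQL_BOTTOMLESS_AZBLOB_ACCOUNT_NAME")?,
            account_key: require("LIBSQL_BOTTOMLESS_AZBLOB_ACCOUNT_KEY")?,
        });
    }
    Err("no bottomless storage backend configured".to_string())
}

fn main() {
    // A real process would call: backend_from_env(|k| std::env::var(k).ok())
    let vars: HashMap<&str, &str> =
        HashMap::from([("LIBSQL_BOTTOMLESS_GCS_BUCKET", "my-bucket")]);
    let backend = backend_from_env(|k| vars.get(k).map(|v| v.to_string()));
    println!("{:?}", backend);
}
```

Passing the lookup as a closure is only a convenience for testing; reading `std::env::var` directly would work just as well.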

Reference-level explanation

Introduce OpenDAL to handle the IO operations.

OpenDAL is a data access layer that allows users to easily and efficiently retrieve data from various storage services in a unified way. It natively supports s3, gcs, azblob, oss, hdfs, and over 20 other storage services. OpenDAL is used in many cloud-native databases such as Databend, RisingWave, and GreptimeDB.

I'm one of the maintainers of this project 💌

The general usage of OpenDAL will be like:

use opendal::{services, Operator};

// Init s3
let mut builder = services::S3::default();
builder.bucket("test");
let op = Operator::new(builder)?.finish();

// A reader implements AsyncRead & AsyncSeek.
let r = op.reader("path/to/file").await?;

// A writer implements AsyncWrite
let w = op.writer("path/to/file").await?;

// A lister implements Stream<Item = Result<Entry>>
let l = op.lister("path/to/dir").await?;

We can adopt OpenDAL in the following steps:

  • Move the S3-related config to a separate S3Options instead of one large Options.
  • Add gcs or azblob support as a PoC.
  • Migrate the S3 implementation to OpenDAL as well (it depends).

Drawbacks

This makes the code and testing more complex, because we need to ensure that bottomless works on all storage services, even though OpenDAL itself already tests against all of them.

Rationale and alternatives

Use storage vendors' SDKs

We can use the SDKs provided by storage vendors to implement the same features.

Good:

Access storage features directly instead of adding a unified abstraction like OpenDAL.

Bad:

  • More dependencies to add (OpenDAL implements all features without those SDKs)
  • Harder to alter their behavior (for example, adding logging/metrics/tracing for all services)
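
To illustrate why a unified abstraction makes cross-cutting behavior easier to add, here is a minimal sketch of a logging wrapper over a storage trait. Everything in it (`Storage`, `MemoryStorage`, `Logged`) is hypothetical and is not OpenDAL's API. It only shows the general pattern of wrapping one interface once, instead of instrumenting each vendor SDK separately.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical minimal storage interface; not the OpenDAL API.
trait Storage {
    fn write(&mut self, path: &str, data: &[u8]) -> Result<(), String>;
    fn read(&self, path: &str) -> Result<Vec<u8>, String>;
}

/// In-memory stand-in for a real storage service.
#[derive(Default)]
struct MemoryStorage {
    objects: HashMap<String, Vec<u8>>,
}

impl Storage for MemoryStorage {
    fn write(&mut self, path: &str, data: &[u8]) -> Result<(), String> {
        self.objects.insert(path.to_string(), data.to_vec());
        Ok(())
    }

    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        self.objects
            .get(path)
            .cloned()
            .ok_or_else(|| format!("{} not found", path))
    }
}

/// Wraps any Storage and records each call, regardless of the backend.
struct Logged<S: Storage> {
    inner: S,
    log: RefCell<Vec<String>>,
}

impl<S: Storage> Logged<S> {
    fn new(inner: S) -> Self {
        Logged { inner, log: RefCell::new(Vec::new()) }
    }
}

impl<S: Storage> Storage for Logged<S> {
    fn write(&mut self, path: &str, data: &[u8]) -> Result<(), String> {
        self.log
            .borrow_mut()
            .push(format!("write {} ({} bytes)", path, data.len()));
        self.inner.write(path, data)
    }

    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        self.log.borrow_mut().push(format!("read {}", path));
        self.inner.read(path)
    }
}

fn main() {
    let mut store = Logged::new(MemoryStorage::default());
    store.write("db/wal-0001", b"frame").unwrap();
    let data = store.read("db/wal-0001").unwrap();
    assert_eq!(data, b"frame".to_vec());
    for line in store.log.borrow().iter() {
        println!("{}", line);
    }
}
```

With vendor SDKs, an equivalent wrapper would have to be written once per SDK, since each exposes its own client types.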

Stick to S3-compatible storage

We can stick to S3-compatible storage, since most storage services provide an S3-compatible API.

Good:

Easy to maintain

Bad:

Users have to access the bucket with static keys, because they cannot use IAM, which is a native feature of the storage service.

For instance, OpenDAL users on GCP can utilize Application Default Credentials (ADC) without the need for manual configuration of credentials.
