proposal: Add more storage services support for bottomless #711
Description
Summary
Add more storage services support for bottomless
Motivation
bottomless implements a virtual write-ahead log (WAL) that continuously backs up data to S3-compatible storage and can restore it later. It's natural to consider extending this feature to other storage services such as GCS, AzBlob, HDFS, and more.
Guide-level explanation
Users can serve and replicate SQLite files stored on GCS or AzBlob in the same way as they do on AWS S3:
LIBSQL_BOTTOMLESS_GCS_BUCKET=<bucket>
or
LIBSQL_BOTTOMLESS_AZBLOB_BUCKET=<bucket>
LIBSQL_BOTTOMLESS_AZBLOB_ACCOUNT_NAME=<account_name>
LIBSQL_BOTTOMLESS_AZBLOB_ACCOUNT_KEY=<account_key>
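A minimal sketch of how these variables might be turned into a backend choice. The `Backend` enum and `backend_from_env` function are hypothetical names used for illustration only; they are not part of bottomless. The lookup is passed in as a closure so the logic can be exercised without touching the process environment, and falling back to S3 when neither bucket is set is an assumption.

```rust
/// Hypothetical backend selection; not part of bottomless today.
#[derive(Debug, PartialEq)]
pub enum Backend {
    Gcs { bucket: String },
    AzBlob {
        bucket: String,
        account_name: String,
        account_key: String,
    },
    /// Neither variable is set: keep the existing S3 configuration path.
    S3,
}

/// Picks a backend from environment-style lookups.
/// `get` stands in for `std::env::var(..).ok()`.
pub fn backend_from_env<F>(get: F) -> Result<Backend, String>
where
    F: Fn(&str) -> Option<String>,
{
    let gcs = get("LIBSQL_BOTTOMLESS_GCS_BUCKET");
    let azblob = get("LIBSQL_BOTTOMLESS_AZBLOB_BUCKET");
    match (gcs, azblob) {
        // Assumption: configuring both at once is treated as an error.
        (Some(_), Some(_)) => Err("both GCS and AzBlob buckets are set".to_string()),
        (Some(bucket), None) => Ok(Backend::Gcs { bucket }),
        (None, Some(bucket)) => {
            let account_name = get("LIBSQL_BOTTOMLESS_AZBLOB_ACCOUNT_NAME")
                .ok_or("LIBSQL_BOTTOMLESS_AZBLOB_ACCOUNT_NAME is not set")?;
            let account_key = get("LIBSQL_BOTTOMLESS_AZBLOB_ACCOUNT_KEY")
                .ok_or("LIBSQL_BOTTOMLESS_AZBLOB_ACCOUNT_KEY is not set")?;
            Ok(Backend::AzBlob {
                bucket,
                account_name,
                account_key,
            })
        }
        (None, None) => Ok(Backend::S3),
    }
}
```

In real use the closure would be something like `|k| std::env::var(k).ok()`.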
Reference-level explanation
Introduce OpenDAL to handle the I/O operations.
OpenDAL is a data access layer that allows users to easily and efficiently retrieve data from various storage services in a unified way. It now natively supports s3, gcs, azblob, oss, hdfs, and over 20 different storage services. OpenDAL is used in many cloud-native databases such as databend, risingwave, and greptime.
I'm one of the maintainers of this project 💌
General usage of OpenDAL looks like this:
use opendal::{services, Operator};

// Init s3.
let mut builder = services::S3::default();
builder.bucket("test");
let op = Operator::new(builder)?.finish();
// A reader implements AsyncRead & AsyncSeek.
let r = op.reader("path/to/file").await?;
// A writer implements AsyncWrite.
let w = op.writer("path/to/file").await?;
// A lister implements Stream<Item=Result<Entry>>.
let l = op.lister("path/to/dir").await?;
We can add OpenDAL in the following steps:
- Move S3-related config to a separate S3Options instead of one large Options.
- Add gcs or azblob support as a PoC.
- Migrate the S3 implementation to OpenDAL too (it depends).
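The first step above could look roughly like the following sketch. Only the `S3Options` and `Options` names come from the proposal; every field, the `GcsOptions` and `StorageOptions` types, and the helper methods are hypothetical and are not taken from the current bottomless code.

```rust
/// S3-specific settings, split out of the general options (step 1).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct S3Options {
    pub bucket: String,
    pub region: Option<String>,
    pub endpoint: Option<String>,
}

/// GCS settings for the proof of concept (step 2).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct GcsOptions {
    pub bucket: String,
}

/// Exactly one storage backend is active at a time.
#[derive(Debug, Clone, PartialEq)]
pub enum StorageOptions {
    S3(S3Options),
    Gcs(GcsOptions),
}

/// General options keep only backend-independent settings.
#[derive(Debug, Clone, PartialEq)]
pub struct Options {
    pub use_compression: bool, // hypothetical shared setting
    pub storage: StorageOptions,
}

impl Options {
    /// Short scheme name of the active backend.
    pub fn scheme(&self) -> &'static str {
        match &self.storage {
            StorageOptions::S3(_) => "s3",
            StorageOptions::Gcs(_) => "gcs",
        }
    }

    /// Bucket name of the active backend.
    pub fn bucket(&self) -> &str {
        match &self.storage {
            StorageOptions::S3(o) => &o.bucket,
            StorageOptions::Gcs(o) => &o.bucket,
        }
    }
}
```

Keeping the backend in an enum means adding azblob later is one new variant plus its match arms, and the compiler flags any place that forgets to handle it.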
Drawbacks
Supporting more services makes the code and tests more complex, since we need to ensure that bottomless works on all of them, even though OpenDAL itself already tests all those services.
Rationale and alternatives
Use storage vendors' SDKs
We can use the SDK provided by storage vendors to implement the same features.
Good:
Access storage features directly instead of adding a unified abstraction like OpenDAL.
Bad:
- More dependencies to add (OpenDAL implements all features without those SDKs)
- Harder to alter their behavior (for example, adding logging/metrics/tracing for all services)
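To illustrate the second point, here is a std-only sketch of why a unified abstraction makes cross-cutting behavior cheap: one wrapper adds logging to every backend that implements a shared trait. The `Storage` trait, `MemoryStorage`, and `Logged` are hypothetical stand-ins written for this example; they mimic the idea only and are not OpenDAL's API.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical minimal interface shared by every backend.
pub trait Storage {
    fn write(&mut self, path: &str, data: &[u8]) -> Result<(), String>;
    fn read(&self, path: &str) -> Result<Vec<u8>, String>;
}

/// In-memory stand-in for a real storage service.
#[derive(Default)]
pub struct MemoryStorage {
    objects: HashMap<String, Vec<u8>>,
}

impl Storage for MemoryStorage {
    fn write(&mut self, path: &str, data: &[u8]) -> Result<(), String> {
        self.objects.insert(path.to_string(), data.to_vec());
        Ok(())
    }

    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        self.objects
            .get(path)
            .cloned()
            .ok_or_else(|| format!("not found: {}", path))
    }
}

/// Wrapper that records each operation, then delegates to the inner backend.
pub struct Logged<S> {
    inner: S,
    log: RefCell<Vec<String>>,
}

impl<S: Storage> Logged<S> {
    pub fn new(inner: S) -> Self {
        Self {
            inner,
            log: RefCell::new(Vec::new()),
        }
    }

    /// Returns a copy of the recorded log lines.
    pub fn entries(&self) -> Vec<String> {
        self.log.borrow().clone()
    }
}

impl<S: Storage> Storage for Logged<S> {
    fn write(&mut self, path: &str, data: &[u8]) -> Result<(), String> {
        self.log
            .borrow_mut()
            .push(format!("write {} ({} bytes)", path, data.len()));
        self.inner.write(path, data)
    }

    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        self.log.borrow_mut().push(format!("read {}", path));
        self.inner.read(path)
    }
}
```

With vendor SDKs, the same logging would have to be written once per SDK, against each SDK's own client types.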
Stick to S3-compatible storage
We can stick to S3-compatible storage since most storage services provide S3 API.
Good:
Easy to maintain
Bad:
Users have to access the bucket with static keys because they cannot use IAM, which is a native feature of the storage service.
For instance, OpenDAL users on GCP can utilize Application Default Credentials (ADC) without the need for manual configuration of credentials.