
Support importing data from object storage using greptime cli #7756

@daviderli614


What problem does the new feature solve?

greptime --version
greptime
branch:
commit: 969a64d483a6593ebe3a95c4240e2136544f7aa1
clean: true
version: 1.0.0-rc.1

Exporting data to object storage is already supported (docs: https://docs.greptime.com/user-guide/deployments-administration/disaster-recovery/back-up-&-restore-data/#export-to-s3), for example:
greptime cli data export --s3 --s3-access-key xxx ...

greptime cli data export --help
Command for exporting data from the GreptimeDB

Usage: greptime cli data export [OPTIONS] --addr <ADDR>

Options:
      --addr <ADDR>
          Server address to connect

      --output-dir <OUTPUT_DIR>
          Directory to put the exported data. E.g.: /tmp/greptimedb-export for local export

      --database <DATABASE>
          The name of the catalog to export

          [default: greptime-*]

  -j, --db-parallelism <DB_PARALLELISM>
          The number of databases exported in parallel. For example, if there are 20 databases and `db_parallelism` is 4, 4 databases will be exported concurrently

          [default: 1]

      --table-parallelism <TABLE_PARALLELISM>
          The number of tables exported in parallel within a single database. For example, if a database has 30 tables and `parallelism` is 8, 8 tables will be exported concurrently

          [default: 4]

      --max-retry <MAX_RETRY>
          Max retry times for each job

          [default: 3]

      --log-dir <LOG_DIR>


  -t, --target <TARGET>
          Things to export

          [default: all]

          Possible values:
          - schema: Export all table schemas, corresponding to `SHOW CREATE TABLE`
          - data:   Export all table data, corresponding to `COPY DATABASE TO`
          - all:    Export all table schemas and data at once

      --log-level <LOG_LEVEL>


      --start-time <START_TIME>
          A half-open time range: [start_time, end_time). The start of the time range (time-index column) for data export

      --end-time <END_TIME>
          A half-open time range: [start_time, end_time). The end of the time range (time-index column) for data export

      --auth-basic <AUTH_BASIC>
          The basic authentication for connecting to the server

      --timeout <TIMEOUT>
          The timeout of invoking the database.

          It is used to override the server-side timeout setting. The default behavior will disable server-side default timeout(i.e. `0s`).

      --proxy <PROXY>
          The proxy server address to connect, if set, will override the system proxy.

          The default behavior will use the system proxy if neither `proxy` nor `no_proxy` is set.

      --no-proxy
          Disable proxy server, if set, will not use any proxy

      --s3
          if export data to s3

      --ddl-local-dir <DDL_LOCAL_DIR>
          if both `ddl_local_dir` and remote storage (s3/oss) are set, `ddl_local_dir` will be only used for exported SQL files, and the data will be exported to remote storage.

          Note that `ddl_local_dir` export sql files to **LOCAL** file system, this is useful if export client don't have direct access to remote storage.

          if remote storage is set but `ddl_local_dir` is not set, both SQL&data will be exported to remote storage.

      --s3-bucket <S3_BUCKET>
          The s3 bucket name if s3 is set, this is required

      --s3-root <S3_ROOT>
          if s3 is set, this is required

      --s3-endpoint <S3_ENDPOINT>
          The s3 endpoint if s3 is set, this is required

      --s3-access-key <S3_ACCESS_KEY>
          The s3 access key if s3 is set, this is required

      --s3-secret-key <S3_SECRET_KEY>
          The s3 secret key if s3 is set, this is required

      --s3-region <S3_REGION>
          The s3 region if s3 is set, this is required

      --oss
          if export data to oss

      --oss-bucket <OSS_BUCKET>
          The oss bucket name if oss is set, this is required

      --oss-endpoint <OSS_ENDPOINT>
          The oss endpoint if oss is set, this is required

      --oss-access-key-id <OSS_ACCESS_KEY_ID>
          The oss access key id if oss is set, this is required

      --oss-access-key-secret <OSS_ACCESS_KEY_SECRET>
          The oss access key secret if oss is set, this is required

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version
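For reference, the truncated command above can be expanded into a full export-to-S3 call using only flags listed in this help output. This is a sketch rather than an official example: the address `127.0.0.1:4000`, the bucket, root, endpoint, region, credentials, and the environment-variable names are all placeholders. By default it only prints the command (with the secret key masked); set `DRY_RUN=0` to actually run `greptime`.

```shell
#!/usr/bin/env bash
# Sketch: build (and optionally run) a `greptime cli data export` call that writes to S3.
# Only flags shown in `greptime cli data export --help` are used; every value is a placeholder.
set -euo pipefail

# Placeholder settings; the environment-variable names are hypothetical.
# Assumption: GREPTIME_ADDR is the server address the CLI should connect to.
GREPTIME_ADDR="${GREPTIME_ADDR:-127.0.0.1:4000}"
S3_BUCKET="${S3_BUCKET:-my-backup-bucket}"
S3_ROOT="${S3_ROOT:-greptimedb-backup}"
S3_ENDPOINT="${S3_ENDPOINT:-https://s3.us-east-1.amazonaws.com}"
S3_REGION="${S3_REGION:-us-east-1}"
S3_ACCESS_KEY="${S3_ACCESS_KEY:-YOUR_ACCESS_KEY}"
S3_SECRET_KEY="${S3_SECRET_KEY:-YOUR_SECRET_KEY}"

# `--target all` exports schemas and data; `--s3` sends output to S3 instead of --output-dir.
# The help text marks every --s3-* option below as required when --s3 is set.
export_args=(
  cli data export
  --addr "$GREPTIME_ADDR"
  --target all
  --s3
  --s3-bucket "$S3_BUCKET"
  --s3-root "$S3_ROOT"
  --s3-endpoint "$S3_ENDPOINT"
  --s3-region "$S3_REGION"
  --s3-access-key "$S3_ACCESS_KEY"
  --s3-secret-key "$S3_SECRET_KEY"
)

if [[ "${DRY_RUN:-1}" == "1" ]]; then
  # Print the command instead of running it, masking the value after --s3-secret-key.
  printf 'greptime'
  prev=""
  for arg in "${export_args[@]}"; do
    if [[ "$prev" == "--s3-secret-key" ]]; then arg="****"; fi
    printf ' %s' "$arg"
    prev="$arg"
  done
  printf '\n'
else
  greptime "${export_args[@]}"
fi
```

Keeping the arguments in a bash array preserves quoting for values that contain spaces and makes the dry-run output match what would be executed.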

However, `greptime cli data import` cannot restore from object storage: it requires a local `--input-dir` and has no `--s3` or `--oss` options (docs: https://docs.greptime.com/user-guide/deployments-administration/disaster-recovery/back-up-&-restore-data/#import-operations).

greptime cli data import --help
Command to import data from a directory into a GreptimeDB instance

Usage: greptime cli data import [OPTIONS] --addr <ADDR> --input-dir <INPUT_DIR>

Options:
      --addr <ADDR>
          Server address to connect

      --input-dir <INPUT_DIR>
          Directory of the data. E.g.: /tmp/greptimedb-backup

      --database <DATABASE>
          The name of the catalog to import

          [default: greptime-*]

  -j, --db-parallelism <DB_PARALLELISM>
          The number of databases imported in parallel. For example, if there are 20 databases and `db_parallelism` is 4, 4 databases will be imported concurrently

          [default: 1]

      --max-retry <MAX_RETRY>
          Max retry times for each job

          [default: 3]

  -t, --target <TARGET>
          Things to export

          [default: all]

          Possible values:
          - schema: Import all table schemas into the database
          - data:   Import all table data into the database
          - all:    Export all table schemas and data at once

      --auth-basic <AUTH_BASIC>
          The basic authentication for connecting to the server

      --log-dir <LOG_DIR>


      --log-level <LOG_LEVEL>


      --timeout <TIMEOUT>
          The timeout of invoking the database.

          It is used to override the server-side timeout setting. The default behavior will disable server-side default timeout(i.e. `0s`).

      --proxy <PROXY>
          The proxy server address to connect, if set, will override the system proxy.

          The default behavior will use the system proxy if neither `proxy` nor `no_proxy` is set.

      --no-proxy
          Disable proxy server, if set, will not use any proxy

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version
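Because `--input-dir` is the only source this command accepts, the round trip that works today has to go through the local file system. The sketch below assumes that `import` reads the same directory layout that `export --output-dir` writes; the addresses, the directory, and the environment-variable names are placeholders, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: the round trip the current CLI supports, a local export followed by an import
# from the same directory. Flags come from the two help outputs above; values are placeholders.
set -euo pipefail

SRC_ADDR="${SRC_ADDR:-127.0.0.1:4000}"              # hypothetical source instance
DST_ADDR="${DST_ADDR:-127.0.0.1:4000}"              # hypothetical target instance
BACKUP_DIR="${BACKUP_DIR:-/tmp/greptimedb-export}"  # must be a local path for import

export_args=(cli data export --addr "$SRC_ADDR" --output-dir "$BACKUP_DIR" --target all)
# `--input-dir` is required and local; import offers no --s3/--oss equivalent.
import_args=(cli data import --addr "$DST_ADDR" --input-dir "$BACKUP_DIR" --target all)

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf 'greptime %s\n' "$*"
  else
    greptime "$@"
  fi
}

run "${export_args[@]}"
run "${import_args[@]}"
```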

What does the feature do?

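Allow `greptime cli data import` to read backups directly from object storage (S3/OSS), so that data exported with `greptime cli data export --s3` or `--oss` can be restored from the same location.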

Implementation challenges

No response
