Numbers are measured against a local greptime/greptimedb:v1.0.0 container on Apple M4 Max (16 cores) / 48 GB, Node.js 22.17.
Schema: 22-column bench_logs with ~3072-series cardinality (details below).
| Bench | Setup | Concurrency | Throughput | p50 | p95 | p99 |
|---|---|---|---|---|---|---|
| unary | 1M rows, batch=1000 | 1 (request/resp) | 26k rows/s | 33ms | 40ms | 207ms |
| streaming | 1M rows, batch=1000 | 1 (single stream) | 31k rows/s | 27ms | 38ms | 207ms |
| bulk | 2M rows, batch=5000 | 8 | 137k rows/s | 48ms | 353ms | 395ms |
Batch sizes mirror the Rust SDK's published log benchmark.
- Rows are pre-generated before the timer starts.
- Timer is end-to-end — it includes every ack / `stream.finish()` / `bulk.finish()`.
- Unary and streaming are single-concurrency; real deployments run multiple clients for more throughput.
- Bulk uses fire-and-forget (`writeRowsAsync`); p50/p95/p99 measure submit-to-ack under the saturated pipeline (sketched below).
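For the bulk row, the latency numbers come from per-batch submit-to-ack bookkeeping rather than the overall timer. A minimal sketch of that pattern, assuming `writeRowsAsync` resolves on server ack; `bulkWriter` and `batches` are hypothetical stand-ins for the bench's setup:

```ts
// Hypothetical stand-ins for the bench's setup; writeRowsAsync is assumed
// to resolve when the server acks the batch (fire-and-forget submit).
declare const bulkWriter: {
  writeRowsAsync(rows: unknown[]): Promise<void>;
  finish(): Promise<void>;
};
declare const batches: unknown[][];

const latenciesMs: number[] = [];
const inFlight: Promise<void>[] = [];

for (const batch of batches) {
  const submittedAt = performance.now(); // submit timestamp
  inFlight.push(
    bulkWriter.writeRowsAsync(batch).then(() => {
      latenciesMs.push(performance.now() - submittedAt); // ack timestamp
    }),
  );
}
await Promise.all(inFlight); // the end-to-end timer stops only after every ack
await bulkWriter.finish();

latenciesMs.sort((a, b) => a - b);
const pct = (p: number) => latenciesMs[Math.floor((latenciesMs.length - 1) * p)];
console.log(`p50=${pct(0.5)}ms p95=${pct(0.95)}ms p99=${pct(0.99)}ms`);
```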
A separate `cpu-bulk-api` bench mirrors the schema and shape used in the Greptime ingestion protocol benchmark blog: 4 string tags + 5 Float64 fields + 1 ms timestamp, `numHosts × 5 × 10 × 20` series, round-robin distribution.
| Config | Go SDK (blog) | TS SDK (this repo) |
|---|---|---|
| 1M series, 10M rows, batch=1000 | 2.01M rows/s (p50=1.7ms) | ~700k rows/s (p50=11ms) |
Today's gap is Arrow JS single-thread encoding: `rowsToArrowTable` accounts for ~99% of client CPU (measured via `bench/encode-only.ts`). Encoder work is ongoing.
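To see that bottleneck in isolation, it's enough to time the columnar encode with no socket involved. A simplified stand-in for `bench/encode-only.ts` (illustrative three-column shape, not the bench's actual code):

```ts
import { tableFromArrays, tableToIPC } from 'apache-arrow';

// Pre-generate plain columns, then time only the Arrow encode + IPC
// serialization. On one thread this is where the client CPU goes.
const ROWS = 100_000;
const hosts = Array.from({ length: ROWS }, (_, i) => `host-${i % 32}`);
const values = Float64Array.from({ length: ROWS }, () => Math.random());
const ts = BigInt64Array.from({ length: ROWS }, (_, i) => BigInt(Date.now() + i));

const t0 = performance.now();
const table = tableFromArrays({ host: hosts, value: values, ts });
const ipc = tableToIPC(table); // Arrow IPC bytes, ready for Flight
const elapsedMs = performance.now() - t0;
console.log(`${ROWS} rows -> ${ipc.byteLength} bytes in ${elapsedMs.toFixed(1)}ms`);
```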
Three benches share the CPU schema above and feed identically-shaped pre-generated data (same series layout and ms-stepped timestamps; Float64 values are re-rolled per run via Math.random()) through their respective clients. GreptimeDB serves gRPC on 4001, InfluxDB v2 and OTLP over HTTP on 4000.
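For a concrete picture of that shared shape, here's a hedged sketch of the generator; tag names and helper structure are hypothetical stand-ins, the real provider lives in the bench sources:

```ts
// Hedged sketch of the shared data shape: numHosts × 5 × 10 × 20 series,
// rows dealt round-robin with ms-stepped timestamps. Tag names here are
// hypothetical stand-ins; the real provider lives in the bench sources.
interface CpuRow {
  host: string; region: string; service: string; instance: string; // 4 string tags
  usage_user: number; usage_system: number; usage_idle: number;    // 5 Float64 fields
  usage_iowait: number; usage_steal: number;
  ts: number;                                                      // ms timestamp
}

function* cpuRows(numHosts: number, totalRows: number): Generator<CpuRow> {
  const series: string[][] = [];
  for (let h = 0; h < numHosts; h++)
    for (let r = 0; r < 5; r++)
      for (let s = 0; s < 10; s++)
        for (let i = 0; i < 20; i++)
          series.push([`host-${h}`, `region-${r}`, `service-${s}`, `inst-${i}`]);

  const start = Date.now();
  for (let n = 0; n < totalRows; n++) {
    const [host, region, service, instance] = series[n % series.length]; // round-robin
    yield {
      host, region, service, instance,
      usage_user: Math.random(), usage_system: Math.random(), usage_idle: Math.random(),
      usage_iowait: Math.random(), usage_steal: Math.random(), // re-rolled per run
      ts: start + n, // ms-stepped
    };
  }
}
```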
- `cpu-bulk-api` — `@greptime/ingester`, Arrow Flight bulk path. Writes a proper time-series table (4-tag composite PK + 5 fields + ms ts).
- `cpu-influxdb` — `@influxdata/influxdb-client` v1.35, InfluxDB line protocol.
- `cpu-otel` — `@opentelemetry/exporter-logs-otlp-proto` v0.215, OTLP log records with `X-Greptime-Pipeline-Name: greptime_identity` so attributes map to real columns (without it they collapse into a JSON blob and the comparison is meaningless).
`parallelism=8`, `numHosts=100` (100k series). Median of 3 runs; each run starts with a fresh (just-dropped) table.
| Bench | 1M rows, batch=1000 | p50 | p95 | p99 | 1M rows, batch=5000 | p50 | p95 | p99 |
|---|---|---|---|---|---|---|---|---|
| `cpu-bulk-api` | 789k rows/s | 9ms | 11ms | 22ms | 758k rows/s | 47ms | 72ms | 99ms |
| `cpu-influxdb` | 494k rows/s | 14ms | 24ms | 26ms | 520k rows/s | 69ms | 119ms | 127ms |
| `cpu-otel` | 679k rows/s | 10ms | 14ms | 28ms | 638k rows/s | 53ms | 110ms | 156ms |
Arrow Flight bulk leads OTLP by ~1.2× and InfluxDB LP by ~1.6×. The advantage is server-side: rows arrive as a ready-made Arrow columnar batch, skipping text/proto parsing and per-attribute column mapping. The OTel table still carries log-model columns (`ScopeName`, `TraceId`, …) and has no TAG-marked primary key, so these numbers measure ingestion throughput only, not query-path parity.
Row counts were spot-checked out-of-band with `SELECT COUNT(*)`; the bench scripts themselves don't run the verification.
Both comparison SDKs run near-default. We override only what's needed for `--batch-size` and `--parallelism` to actually take effect (both overrides are sketched after this list):

- InfluxDB: one `WriteApi` per worker with `{ batchSize: --batch-size }`. The SDK's default of 1000 would silently chunk a 5000-point batch into 5 POSTs. Each worker calls `writePoints(batch)` then `flush(true)` — `true` drains the retry buffer, so a transient 429/5xx is only counted as written once its retry succeeds. `maxRetries`, `flushInterval`, etc. stay at defaults.
- OTLP: single `OTLPLogExporter` with `concurrencyLimit: max(30, --parallelism)`. The default 30 would reject `--parallelism > 30` outright. We call `exporter.export(records, cb)` directly instead of going through `LoggerProvider` + `BatchLogRecordProcessor`, which serialises exports internally and would cap in-flight requests at 1.
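A hedged sketch of those two overrides; the endpoint, org, and bucket values are illustrative, not the bench's exact config:

```ts
import { InfluxDB } from '@influxdata/influxdb-client';
import { OTLPLogExporter } from '@opentelemetry/exporter-logs-otlp-proto';

const batchSize = 5000;  // --batch-size
const parallelism = 8;   // --parallelism

// InfluxDB: one WriteApi per worker, with batchSize matching the CLI flag so
// a 5000-point batch leaves as one POST instead of five.
const influx = new InfluxDB({ url: 'http://localhost:4000', token: '' });
const writeApi = influx.getWriteApi('', 'benchmark_influxdb', 'ms', { batchSize });
// per batch: writeApi.writePoints(points); await writeApi.flush(true);
// flush(true) also drains the retry buffer, so a 429/5xx counts only once retried.

// OTLP: raise concurrencyLimit above the default 30 so --parallelism > 30
// isn't rejected; export() is called directly, bypassing BatchLogRecordProcessor.
const exporter = new OTLPLogExporter({
  url: 'http://localhost:4000/v1/otlp/v1/logs', // illustrative GreptimeDB OTLP path
  headers: { 'X-Greptime-Pipeline-Name': 'greptime_identity' },
  concurrencyLimit: Math.max(30, parallelism),
});
// per batch: exporter.export(records, (result) => { /* resolve on success */ });
```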
```sh
./scripts/run-greptimedb.sh   # or point at your own deployment

# gRPC (default endpoint localhost:4001)
pnpm bench bulk-api --rows=2000000 --batch-size=5000
pnpm bench cpu-bulk-api --rows=1000000 --batch-size=1000 --parallelism=8

# HTTP (default endpoint http://localhost:4000)
pnpm bench cpu-influxdb --rows=1000000 --batch-size=1000 --parallelism=8
pnpm bench cpu-otel --rows=1000000 --batch-size=1000 --parallelism=8
```

Available benches: `regular-api`, `stream-api`, `bulk-api`, `cpu-bulk-api`, `cpu-influxdb`, `cpu-otel`. Drop the target table (`benchmark_grpc_bulk`, `benchmark_influxdb`, `benchmark_otel`, or `bench_logs`) between runs for fresh-insert measurements. CLI flags take precedence over env vars: `GREPTIMEDB_ENDPOINT`, `GREPTIMEDB_HTTP_ENDPOINT`, `GREPTIMEDB_DATABASE`, `GREPTIMEDB_USER`, `GREPTIMEDB_PASSWORD`.
Shared flags:
- `--rows=N` — rounded down to a multiple of `--batch-size`
- `--batch-size=N`
- `--parallelism=N` — bulk / cpu-* only (default 8)
- `--num-hosts=N` — cpu-* only; cardinality = `N × 5 × 10 × 20` (default 100 → 100k series; 1000 matches the blog's 1M-series config)
- `--endpoint=host:port` — gRPC benches
- `--http-endpoint=URL`, `--database=NAME`, `--user=NAME`, `--password=VALUE` — HTTP benches
- Raise `parallelism` to 12–16 if your producer is CPU-light and request latency dominates (e.g. remote server).
- Enable `BulkCompression.Lz4` or `BulkCompression.Zstd` for bandwidth-constrained links (see the sketch after this list).
- Prefer bulk over unary for sustained ingest — Arrow columnar amortizes encoding cost across the batch.
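A hedged sketch of turning compression on. `BulkCompression` is the SDK enum named here, but the import path and the surrounding constructor/option names are assumptions for illustration; check the repo's examples for the exact shape:

```ts
// BulkCompression is the enum referenced in this doc; the import path and the
// createClient/bulkWrite/compression names are assumptions for illustration.
import { BulkCompression } from '@greptime/ingester';

declare function createClient(opts: { endpoint: string }): {
  bulkWrite(opts: { table: string; compression: BulkCompression }): {
    writeRowsAsync(rows: unknown[]): Promise<void>;
    finish(): Promise<void>;
  };
};

const client = createClient({ endpoint: 'localhost:4001' });
const writer = client.bulkWrite({
  table: 'bench_logs',
  compression: BulkCompression.Lz4, // or BulkCompression.Zstd for tighter links
});
```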
- One `Client` per process; channels pool internally.
- With body compression enabled, `compressBatchMessage` uses `Promise.all` to compress every buffer in a RecordBatch concurrently. `lz4-napi` and `@mongodb-js/zstd` are both native add-ons that dispatch to the libuv thread pool (`UV_THREADPOOL_SIZE`, default 4). Wide schemas (many columns → many buffers) combined with high `parallelism` can saturate the pool and cause head-of-line blocking for other async I/O. If you see unexplained latency cliffs under bulk + compression, set `UV_THREADPOOL_SIZE=8` (or higher) in the producer process, as shown below.
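One caveat when raising it: libuv reads the variable once, when the pool first spins up, so it has to be set before any pool work happens. For example:

```ts
// Safest: set it in the launching environment, e.g.
//   UV_THREADPOOL_SIZE=8 node producer.js
// Setting it at the very top of the entry module also works, because libuv
// reads the variable lazily on first thread-pool use; it just has to run
// before any fs/crypto/native-addon call touches the pool.
process.env.UV_THREADPOOL_SIZE = '8';
```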
22-column `bench_logs` table, source `bench/log-data-provider.ts`.
| Column | Kind | Type | Notes |
|---|---|---|---|
| `service` | tag | String | 6 values |
| `host` | tag | String | 32 values |
| `region` | tag | String | 4 values |
| `env` | tag | String | 4 values |
| `pod` | field | String | 64 values |
| `log_level` | field | String | 4 values |
| `message` | field | String | synthesized per row |
| `trace_id` | field | String | 32-hex, high cardinality |
| `span_id` | field | String | 16-hex, high cardinality |
| `http_method` | field | String | 5 values |
| `http_path` | field | String | 5 values |
| `http_status_class` | field | String | 4 values |
| `user_agent` | field | String | 4 values |
| `client_ip` | field | String | synthesized per row |
| `caller` | field | String | synthesized per row |
| `latency_ms` | field | Float64 | |
| `bytes_in` | field | Int64 | |
| `bytes_out` | field | Int64 | |
| `error_flag` | field | Bool | |
| `retry_flag` | field | Bool | |
| `log_ts` | timestamp | Timestamp (millisecond) | time index |
| `ingest_ts` | field | Int64 | client-side ingest time (kept as field for comparison) |
The 4 tag columns yield up to 6 × 32 × 4 × 4 = 3072 distinct series. High-cardinality identifiers (`trace_id`, `span_id`) are fields, not tags, so they don't inflate the series count.
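For a concrete sense of the tag/field split, a stand-in for how a row might be synthesized (column names match the table above; the logic is illustrative, not the provider's actual code):

```ts
// Illustrative synthesis only; see bench/log-data-provider.ts for the real logic.
const hex = (n: number) =>
  Array.from({ length: n }, () => Math.floor(Math.random() * 16).toString(16)).join('');

const row = {
  // tags: low cardinality by construction, at most 6 × 32 × 4 × 4 = 3072 series
  service: `svc-${Math.floor(Math.random() * 6)}`,
  host: `host-${Math.floor(Math.random() * 32)}`,
  region: `region-${Math.floor(Math.random() * 4)}`,
  env: `env-${Math.floor(Math.random() * 4)}`,
  // high-cardinality identifiers stay fields, so they never mint new series
  trace_id: hex(32), // 32-hex chars
  span_id: hex(16),  // 16-hex chars
  log_ts: Date.now(), // ms time index
};
```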