SPT supports an optional RDMA (Remote Direct Memory Access) data path for S3 workloads. When enabled, object transfers bypass the kernel networking stack for significantly lower latency and higher throughput on supported hardware.
RDMA acceleration applies to write and read workloads. Objects below the configured threshold are transparently sent over standard HTTP.
Hardware requirements: S3-RDMA needs RDMA-capable NICs, an RDMA-capable storage target (e.g., Dell ECS), and Linux. See Requirements for details.
```bash
# RDMA-accelerated write: 16 threads, 1MB objects, 5 minutes
spt run write \
  --endpoints https://ecs.example.com \
  --access-key "$S3_ACCESS_KEY" \
  --secret-key "$S3_SECRET_KEY" \
  --bucket benchmark-test \
  --threads 16 \
  --object-size 1MB \
  --duration 5m \
  --use-rdma \
  --rdma-local-ip 10.247.128.125
```
```bash
# RDMA-accelerated read
spt run read \
  --endpoints https://ecs.example.com \
  --access-key "$S3_ACCESS_KEY" \
  --secret-key "$S3_SECRET_KEY" \
  --bucket benchmark-test \
  --threads 16 \
  --object-size 1MB \
  --seed-objects 5000 \
  --duration 5m \
  --use-rdma \
  --rdma-local-ip 10.247.128.125
```

| Flag | Default | Description |
|---|---|---|
| `--use-rdma` | `false` | Enable the RDMA-accelerated S3 driver |
| `--rdma-local-ip` | `""` | Local RDMA interface IP address |
| `--rdma-threshold` | `1MB` | Minimum object size for RDMA transfer (e.g., `0`, `256KB`, `4MB`) |
| `--rdma-fallback` | `false` | Fall back to HTTP if RDMA initialization fails |
| `--rdma-device` | `auto` | RDMA device name, or `auto` for auto-detection |
| `--rdma-log-level` | `WARN` | RDMA native library log level |
| `--rdma-timeout-ms` | `30000` | RDMA operation timeout in milliseconds |
Environment variable overrides: `SPT_RDMA_ENABLED`, `RDMA_LOCAL_IP`, `RDMA_DEVICE`, `RDMA_LOG_LEVEL`, `RDMA_THRESHOLD_BYTES`, `RDMA_TIMEOUT_MS`, `RDMA_FALLBACK_ENABLED`
Not all objects benefit from RDMA. Small objects have higher per-operation overhead from memory registration and token generation, while large objects amortize this cost easily.
The `--rdma-threshold` flag controls the cutoff:
- Objects at or above the threshold are transferred via RDMA.
- Objects below the threshold use standard HTTP.
- Set to `0` to force all objects through RDMA.
The default of 1MB is a good starting point. At 1MB, RDMA memory registration overhead is approximately 2% of total operation time.
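Conceptually, the routing decision is a single size comparison. A minimal sketch, with hypothetical names (this is not SPT's actual driver code):

```java
// Illustrative sketch of the threshold routing rule; the class and
// method names are invented for this example.
public class RdmaRouting {
    /** True when an object of this size should be transferred via RDMA. */
    public static boolean useRdma(long objectSizeBytes, long thresholdBytes) {
        // A threshold of 0 forces every object through RDMA; otherwise
        // objects at or above the threshold use RDMA and smaller objects
        // go over standard HTTP.
        return objectSizeBytes >= thresholdBytes;
    }
}
```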
Use `spt verify --use-rdma` to pre-check RDMA readiness on your test nodes:

```bash
spt verify --test-hosts "rdma1,rdma2" --use-rdma
```

This adds RDMA-specific checks on top of the standard Docker/port verification: hardware presence, device accessibility, and driver availability.
- OS: Linux only
- NICs: RDMA-capable NICs (NVIDIA/Mellanox ConnectX-4 or newer). Bonded interfaces (`mlx5_bond_0`) are supported.
- Storage target: an RDMA-capable S3 endpoint (e.g., Dell ECS with RDMA enabled)
- System packages: `rdma-core` (provides `libibverbs`, `librdmacm`, `libmlx5`)
- Docker: device passthrough for RDMA hardware (`--device /dev/infiniband`)
If RDMA hardware is not available at runtime, the driver fails by default. Set `--rdma-fallback` to fall back to HTTP instead.
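The fail-or-fall-back behavior amounts to a guarded initialization. A hypothetical sketch (all names invented for illustration, not SPT internals):

```java
// Sketch of the documented fallback rule: RDMA if available, HTTP only
// when fallback is explicitly enabled, hard failure otherwise.
public class RdmaInitGuard {
    public static String selectTransport(boolean rdmaAvailable, boolean fallbackEnabled) {
        if (rdmaAvailable) {
            return "rdma";
        }
        if (fallbackEnabled) {
            // Operations proceed over standard HTTP.
            return "http";
        }
        throw new IllegalStateException(
            "RDMA initialization failed and --rdma-fallback is not set");
    }
}
```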
SPT implements S3-RDMA by extending the standard S3 storage driver. The `S3RdmaStorageDriver` overrides only the data transfer path — all S3 authentication, signing, and metadata operations continue to use the existing Netty HTTP engine.
The SPT client does not perform RDMA data transfers directly. The storage server initiates the transfer:
| Operation | What happens |
|---|---|
| PUT (write) | Client registers a memory buffer, generates an RDMA token, and sends it as an `x-amz-rdma-token` HTTP header. The server performs an RDMA READ from the client's buffer. |
| GET (read) | Same token flow. The server performs an RDMA WRITE into the client's buffer. |
This means the client-side implementation is lightweight: register memory, generate a token, send the HTTP request, and wait for the server to complete the transfer.
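The client-side steps above can be sketched as follows. The real token format is not documented here; this assumes a base64-encoded address/rkey/length triple, which is the minimum a server needs to issue an RDMA READ against the client's registered buffer. Every name in the sketch is hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of the client-side token handoff for a PUT.
public class RdmaTokenSketch {
    /**
     * Builds a candidate value for the x-amz-rdma-token header from the
     * registered buffer's address, its remote access key (rkey), and its
     * length. The actual SPT wire format may differ.
     */
    public static String tokenHeader(long bufferAddr, int rkey, int lengthBytes) {
        String raw = bufferAddr + ":" + rkey + ":" + lengthBytes;
        return Base64.getEncoder()
                     .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }
}
```

After the header is sent on an ordinary HTTP request, the client simply waits: the server drives the actual RDMA READ against the advertised buffer.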
```
SPT Engine (Java)
├── S3RdmaStorageDriver — routing: RDMA vs HTTP based on threshold
├── RdmaTransport — JNI bridge + memory registration lifecycle
└── libspt_rdma.so — ~675 lines of C using libibverbs/librdmacm
    └── rdma-core — system packages (libibverbs, libmlx5, librdmacm)
```
The native layer uses Mellanox DC (Dynamically Connected) transport for scalable connections and RoCE v2 for Ethernet-based RDMA. Memory registration is done on-demand per operation — the overhead (~88 microseconds at 1MB) is negligible relative to the transfer time.
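The two quoted figures (~88 µs of registration, ~2% overhead at 1MB) can be cross-checked with a quick back-of-the-envelope calculation:

```java
// If ~88 us of registration is ~2% of a 1MB operation, the implied
// total operation time is 88 / 0.02 = 4400 us (~4.4 ms per object).
public class RdmaOverheadMath {
    public static double impliedTotalMicros(double registrationMicros,
                                            double overheadFraction) {
        return registrationMicros / overheadFraction;
    }
}
```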
If SPT reports that RDMA is not available:
- Verify RDMA hardware: `ibv_devinfo` should list your device
- Check that `rdma-core` packages are installed
- Ensure `/dev/infiniband` is accessible inside the Docker container
- Run `spt verify --use-rdma` to diagnose
If `--rdma-fallback` is set and RDMA initialization fails, all operations silently use HTTP. Check engine logs for `RDMA initialization failed, falling back to HTTP` to confirm.
- Ensure `--rdma-local-ip` matches your RDMA interface (not a management NIC)
- Check that object sizes are at or above `--rdma-threshold`
- Verify RoCE v2 is properly configured on the network (PFC/ECN flow control)
- Use `--rdma-log-level DEBUG` for detailed native-layer diagnostics