Also available in: 简体中文
seekdb-rs is the official Rust SDK for SeekDB, currently focused on the Server mode and talking to SeekDB / OceanBase over the MySQL protocol.
The APIs are designed to closely mirror the Python SDK (pyseekdb), but this crate is still experimental / incomplete and may evolve.
- Installation
- Client Connection
- AdminClient and Database Management
- Collection Management
- DML Operations
- DQL Operations
- Embedding Functions
- Sync Client
- Testing
- Feature Matrix
seekdb-rs is currently developed as a standalone crate and may not yet be published on crates.io. The recommended way to use it today is via a local path dependency.
```toml
# Cargo.toml in your application / workspace crate
[dependencies]
seekdb-rs = { path = "/path/to/seekdb-rs" } # adjust to where you cloned this repo
```

Build:

```bash
cargo build
```

Features:

- `server` (enabled by default): async client for the remote SeekDB / OceanBase server.
- `embedding` (enabled by default): built-in ONNX-based embedding implementation (`DefaultEmbedding`), depends on `reqwest` / `tokenizers` / `ort` / `hf-hub`.
- `sync` (optional): blocking wrapper around the async client (`SyncServerClient`, `SyncCollection`), backed by an internal Tokio runtime.

Example enabling sync and embedding explicitly:
```toml
[dependencies]
seekdb-rs = { path = "/path/to/seekdb-rs", features = ["server", "embedding", "sync"] }
```

The Python SDK exposes a single `Client` factory that hides embedded vs remote server.
In Rust we currently only support the remote server client, represented by ServerClient.
Embedded mode (equivalent to Python’s embedded client) is not implemented in Rust yet.
```rust
use seekdb_rs::{ServerClient, SeekDbError};

#[tokio::main]
async fn main() -> Result<(), SeekDbError> {
    // Build a client using a fluent builder
    let client = ServerClient::builder()
        .host("127.0.0.1") // host
        .port(2881) // port
        .tenant("sys") // tenant
        .database("demo") // database
        .user("root") // user (without tenant suffix)
        .password("") // password
        .max_connections(5)
        .build()
        .await?;

    // Run an arbitrary SQL statement
    let _ = client.execute("SELECT 1").await?;
    Ok(())
}
```

The Rust `ServerClient` behaves similarly to Python’s `RemoteServerClient`:

- Uses `user@tenant` behind the scenes to connect.
- Talks to SeekDB / OceanBase via the MySQL protocol.

For parity with Python’s “read config from environment variables” pattern, Rust exposes ServerConfig::from_env() and helpers on ServerClient.
Environment variables:

- `SERVER_HOST`
- `SERVER_PORT` (default: `2881`)
- `SERVER_TENANT`
- `SERVER_DATABASE`
- `SERVER_USER`
- `SERVER_PASSWORD`
- `SERVER_MAX_CONNECTIONS` (default: `5`)

```bash
export SERVER_HOST=127.0.0.1
export SERVER_PORT=2881
export SERVER_TENANT=sys
export SERVER_DATABASE=demo
export SERVER_USER=root
export SERVER_PASSWORD=your_password
```

```rust
use seekdb_rs::{ServerClient, ServerConfig, SeekDbError};

#[tokio::main]
async fn main() -> Result<(), SeekDbError> {
    // Build config from env
    let config = ServerConfig::from_env()?;

    // Connect from config
    let client = ServerClient::from_config(config).await?;

    // Or in a single step:
    let client = ServerClient::from_env().await?;

    // Or mix env defaults with manual overrides
    let client = ServerClient::builder()
        .from_env()? // prefill from env
        .database("demo_override")
        .build()
        .await?;

    client.execute("SELECT 1").await?;
    Ok(())
}
```

There is no universal “mode-switching” `Client` type in Rust; use `ServerClient` directly.

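To make the environment-driven configuration more concrete, here is a minimal, standard-library-only sketch of how such a loader could resolve variables and defaults. `EnvServerConfig`, `from_lookup`, and `login_name` are hypothetical names for illustration, not the crate's API. Which variables are required, and that a missing password falls back to an empty string, are assumptions; only the defaults for `SERVER_PORT` and `SERVER_MAX_CONNECTIONS` come from the list of environment variables.

```rust
use std::env;
use std::str::FromStr;

/// Hypothetical stand-in for the crate's `ServerConfig`; field names are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct EnvServerConfig {
    pub host: String,
    pub port: u16,
    pub tenant: String,
    pub database: String,
    pub user: String,
    pub password: String,
    pub max_connections: u32,
}

/// Read a variable that has no default, naming it in the error when missing.
fn required<F>(get: &F, name: &str) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    get(name).ok_or_else(|| format!("missing environment variable {}", name))
}

/// Read a variable with a default, parsing it when it is set.
fn parsed_or<F, T>(get: &F, name: &str, default: T) -> Result<T, String>
where
    F: Fn(&str) -> Option<String>,
    T: FromStr,
{
    match get(name) {
        Some(raw) => raw
            .parse::<T>()
            .map_err(|_| format!("invalid value for {}: {}", name, raw)),
        None => Ok(default),
    }
}

impl EnvServerConfig {
    /// Build the config from any lookup function (handy for tests).
    pub fn from_lookup<F>(get: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        Ok(EnvServerConfig {
            host: required(&get, "SERVER_HOST")?,
            port: parsed_or(&get, "SERVER_PORT", 2881)?,
            tenant: required(&get, "SERVER_TENANT")?,
            database: required(&get, "SERVER_DATABASE")?,
            user: required(&get, "SERVER_USER")?,
            // Assumption: a missing password is treated as an empty one.
            password: get("SERVER_PASSWORD").unwrap_or_default(),
            max_connections: parsed_or(&get, "SERVER_MAX_CONNECTIONS", 5)?,
        })
    }

    /// Build the config from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|name| env::var(name).ok())
    }

    /// Login name in the `user@tenant` form used when connecting.
    pub fn login_name(&self) -> String {
        format!("{}@{}", self.user, self.tenant)
    }
}
```

Taking the lookup as a closure keeps the parsing logic testable without mutating the process environment; `from_env` is then a one-line wrapper.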
Key async APIs:

| Method / Property | Description |
|---|---|
| `ServerClient::builder()` | Fluent builder for creating a remote client |
| `ServerClient::from_config(ServerConfig)` | Connect from an explicit config |
| `ServerClient::from_env()` | Load config from env and connect |
| `ServerClient::pool()` | Access the underlying `MySqlPool` |
| `ServerClient::tenant()` / `database()` | Inspect current tenant / database |
| `ServerClient::execute(sql)` | Execute a statement that does not return rows (INSERT / UPDATE /…) |
| `ServerClient::fetch_all(sql)` | Execute a query and return all rows |
| `ServerClient::create_collection(...)` | Create a collection (see below) |
| `ServerClient::get_collection(...)` | Get a `Collection` handle for an existing collection |
| `ServerClient::get_or_create_collection(...)` | Get or create a collection |
| `ServerClient::delete_collection(name)` | Drop a collection |
| `ServerClient::list_collections()` | List all collection names in the current database |
| `ServerClient::has_collection(name)` | Check if a collection exists |
| `ServerClient::count_collection()` | Count collections in the current database |

Python exposes an AdminClient for managing databases. Rust provides:

- `AdminApi` trait: abstract admin interface.
- `AdminClient`: thin wrapper around `ServerClient`.
- `ServerClient` itself implements `AdminApi`, so you can call admin methods on it directly.

```rust
use seekdb_rs::{AdminClient, AdminApi, ServerClient, ServerConfig, SeekDbError};
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), SeekDbError> {
    let config = ServerConfig::from_env()?;
    let client = ServerClient::from_config(config).await?;

    // Use a dedicated AdminClient (holds Arc<ServerClient>)
    let admin = AdminClient::new(Arc::new(client));

    // Or call the same methods directly on ServerClient
    // let admin: &dyn AdminApi = &client;
    Ok(())
}
```

```rust
// Create a database
admin.create_database("my_db", Some("sys")).await?;

// Get metadata
let db = admin.get_database("my_db", Some("sys")).await?;
println!("name = {}, tenant = {:?}", db.name, db.tenant);

// List databases (with optional limit/offset/tenant)
let list = admin.list_databases(None, None, None).await?;

// Delete database
admin.delete_database("my_db", Some("sys")).await?;
```

`Database` corresponds to the Python struct:

```rust
pub struct Database {
    pub name: String,
    pub tenant: Option<String>,
    pub charset: Option<String>,
    pub collation: Option<String>,
}
```

`Collection<Ef>` in Rust mirrors Python’s `Collection` class, with a few important differences:

- Rust uses a generic parameter: `Collection<Ef = Box<dyn EmbeddingFunction>>`.
- All operations are async and return `Result<_, SeekDbError>`.
- With the `sync` feature enabled you also get `SyncCollection<Ef>` as a blocking wrapper.

When creating a collection you must provide HnswConfig (dimension + distance metric):
```rust
use seekdb_rs::{DistanceMetric, HnswConfig, ServerClient, SeekDbError};

#[tokio::main]
async fn main() -> Result<(), SeekDbError> {
    let config = seekdb_rs::ServerConfig::from_env()?;
    let client = ServerClient::from_config(config).await?;

    let hnsw = HnswConfig {
        dimension: 384,
        distance: DistanceMetric::Cosine,
    };

    // Create a collection without automatic embeddings
    let coll = client
        .create_collection::<Box<dyn seekdb_rs::EmbeddingFunction>>(
            "my_collection",
            Some(hnsw),
            None,
        )
        .await?;
    Ok(())
}
```

If `HnswConfig` is missing, collection creation fails with
`SeekDbError::Config("HnswConfig must be provided when creating a collection")`.

```rust
let coll = client
    .get_collection::<Box<dyn seekdb_rs::EmbeddingFunction>>("my_collection", None)
    .await?;

println!(
    "Collection name = {}, dim = {}, distance = {:?}",
    coll.name(),
    coll.dimension(),
    coll.distance()
);
```

Under the hood, SeekDB is introspected using `DESCRIBE` / `SHOW CREATE TABLE`:

- Vector dimension is parsed from the embedding column type (e.g. `vector(384)`).
- Distance metric is parsed from the vector index options (e.g. `distance=cosine`).

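The two parsing steps can be sketched with plain string handling. This is an illustration only: `parse_vector_dimension` and `parse_distance_metric` are hypothetical helpers, and the exact text SeekDB returns from `DESCRIBE` / `SHOW CREATE TABLE` is assumed to resemble the `vector(384)` and `distance=cosine` examples.

```rust
/// Extract the dimension from a column type such as `vector(384)`.
/// Matching is case-insensitive; anything else yields `None`.
pub fn parse_vector_dimension(column_type: &str) -> Option<u32> {
    let lower = column_type.trim().to_ascii_lowercase();
    let inner = lower.strip_prefix("vector(")?.strip_suffix(')')?;
    inner.trim().parse().ok()
}

/// Extract the metric name from index options such as `distance=cosine, type=hnsw`.
/// The option list is assumed to be `key=value` pairs separated by commas,
/// whitespace, or parentheses; the result is lowercased.
pub fn parse_distance_metric(index_options: &str) -> Option<String> {
    index_options
        .split(|c: char| c == ',' || c == '(' || c == ')' || c.is_whitespace())
        .filter_map(|token| {
            let mut parts = token.splitn(2, '=');
            let key = parts.next()?.trim();
            let value = parts.next()?.trim();
            if key.eq_ignore_ascii_case("distance") && !value.is_empty() {
                Some(value.to_ascii_lowercase())
            } else {
                None
            }
        })
        .next()
}
```

Both helpers return `Option` rather than panicking, so a caller can turn a missing or unexpected schema into a proper error.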
```rust
// List all collections
let names = client.list_collections().await?;

// Count collections
let count = client.count_collection().await?;

// Check existence
if client.has_collection("my_collection").await? {
    println!("collection exists");
}

// Drop collection
client.delete_collection("my_collection").await?;
```

`Collection<Ef>` exposes a few read-only accessors:

- `name() -> &str`
- `id() -> Option<&str>` (internal ID; table name still uses the `c$v1$` prefix)
- `dimension() -> u32`
- `distance() -> DistanceMetric`
- `metadata() -> Option<&serde_json::Value>`

Rust implements DML operations with semantics close to Python:
- Explicit `embeddings` are always supported.
- When a collection has an `embedding_function`, `add` / `update` / `upsert` can generate embeddings automatically from `documents`.

```rust
use seekdb_rs::{Embedding, Metadata};
use serde_json::json;

let ids = vec!["item1".to_string(), "item2".to_string()];
let embeddings: Vec<Embedding> = vec![vec![0.1, 0.2, 0.3], vec![0.4, 0.5, 0.6]];
let documents = vec!["Document 1".to_string(), "Document 2".to_string()];
let metadatas: Vec<Metadata> = vec![
    json!({"category": "AI", "score": 95}),
    json!({"category": "ML", "score": 88}),
];

coll.add(&ids, Some(&embeddings), Some(&metadatas), Some(&documents))
    .await?;
```

When the collection was created with an `EmbeddingFunction`, you can skip the
`embeddings` parameter and let the SDK embed the documents:

```rust
// `EmbeddingFunction` is imported so that `ef.dimension()` resolves as a trait method.
use seekdb_rs::{
    embedding::DefaultEmbedding, DistanceMetric, EmbeddingFunction, HnswConfig, ServerClient,
};

let config = seekdb_rs::ServerConfig::from_env()?;
let client = ServerClient::from_config(config).await?;

// Requires the `embedding` feature
let ef = DefaultEmbedding::new()?;
let hnsw = HnswConfig {
    dimension: ef.dimension() as u32,
    distance: DistanceMetric::Cosine,
};

let coll = client
    .create_collection::<DefaultEmbedding>("auto_emb", Some(hnsw), Some(ef))
    .await?;

let ids = vec!["auto1".to_string(), "auto2".to_string()];
let docs = vec!["hello rust".to_string(), "seekdb vector".to_string()];

// No explicit embeddings: documents are embedded automatically
coll.add(&ids, None, None, Some(&docs)).await?;
```

```rust
// Metadata-only update
coll.update(
    &["item1".to_string()],
    None,
    Some(&[serde_json::json!({"category": "AI", "score": 98})]),
    None,
)
.await?;

// Update embeddings + metadata + documents
coll.update(
    &["item1".to_string(), "item2".to_string()],
    Some(&[vec![0.9, 0.8, 0.7], vec![0.6, 0.5, 0.4]]),
    Some(&[
        serde_json::json!({"category": "AI"}),
        serde_json::json!({"category": "ML"}),
    ]),
    Some(&[
        "Updated document 1".to_string(),
        "Updated document 2".to_string(),
    ]),
)
.await?;
```

Validation rules:

- `embeddings` (if present) must match `ids` in length.
- Each embedding must have the same dimension as `Collection::dimension()`.
- `documents` / `metadatas` can be omitted; only provided fields are updated.
- If no embeddings are provided but documents are, and the collection has an `embedding_function`, the SDK generates embeddings automatically.

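The length and dimension rules can be expressed as a small check. This is a sketch under assumptions: `validate_embeddings` is a hypothetical helper, embeddings are assumed to be `Vec<f32>`, and errors are plain strings rather than `SeekDbError`.

```rust
/// Check the two embedding rules: one embedding per id, and every embedding
/// has exactly `dimension` components. `None` means "no embeddings supplied",
/// which is valid because embeddings are optional.
pub fn validate_embeddings(
    ids: &[String],
    embeddings: Option<&[Vec<f32>]>,
    dimension: u32,
) -> Result<(), String> {
    let embeddings = match embeddings {
        Some(e) => e,
        None => return Ok(()),
    };
    if embeddings.len() != ids.len() {
        return Err(format!(
            "got {} embeddings for {} ids",
            embeddings.len(),
            ids.len()
        ));
    }
    for (id, embedding) in ids.iter().zip(embeddings) {
        if embedding.len() != dimension as usize {
            return Err(format!(
                "embedding for id {} has dimension {}, expected {}",
                id,
                embedding.len(),
                dimension
            ));
        }
    }
    Ok(())
}
```

Running a check like this before any SQL is issued means a malformed batch is rejected as a whole instead of failing partway through.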
```rust
let id = "item1".to_string();

// 1) Insert
coll.upsert(
    &[id.clone()],
    Some(&[vec![1.0, 2.0, 3.0]]),
    Some(&[serde_json::json!({"tag": "init", "cnt": 1})]),
    Some(&["doc1".to_string()]),
)
.await?;

// 2) Metadata-only upsert: keep doc and embedding
coll.upsert(
    &[id.clone()],
    None,
    Some(&[serde_json::json!({"tag": "init", "cnt": 2})]),
    None,
)
.await?;

// 3) Document-only upsert
coll.upsert(
    &[id.clone()],
    None,
    None,
    Some(&["new_doc".to_string()]),
)
.await?;
```

Semantics:

- `ids` must be non-empty.
- If `embeddings` / `documents` / `metadatas` are all `None`, you get `SeekDbError::InvalidInput`.
- When the row exists:
  - Only fields provided in the call are updated.
  - Others keep their previous values.
- When the row does not exist:
  - A new row is inserted; missing fields use default values (`NULL` / empty).
  - If only `documents` are given and the collection has an `embedding_function`, embeddings are generated; otherwise only the document field is updated.

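The merge behaviour of upsert can be modelled in memory as follows. This is a sketch only: `Row` and `upsert_row` are hypothetical, metadata is kept as a raw JSON string to stay dependency-free, and automatic embedding generation is left out.

```rust
use std::collections::HashMap;

/// In-memory stand-in for one stored row; `None` plays the role of `NULL`.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Row {
    pub embedding: Option<Vec<f32>>,
    pub metadata: Option<String>,
    pub document: Option<String>,
}

/// Upsert a single row: fields passed as `Some` overwrite the stored value,
/// fields passed as `None` keep whatever is already there.
pub fn upsert_row(
    table: &mut HashMap<String, Row>,
    id: &str,
    embedding: Option<Vec<f32>>,
    metadata: Option<String>,
    document: Option<String>,
) -> Result<(), String> {
    if embedding.is_none() && metadata.is_none() && document.is_none() {
        return Err("at least one of embeddings, metadatas, documents is required".to_string());
    }
    // An existing row is reused; a missing row starts with every field empty.
    let row = table.entry(id.to_string()).or_default();
    if embedding.is_some() {
        row.embedding = embedding;
    }
    if metadata.is_some() {
        row.metadata = metadata;
    }
    if document.is_some() {
        row.document = document;
    }
    Ok(())
}
```

Replaying the three upsert calls from the example against this model leaves the embedding from step 1, the metadata from step 2, and the document from step 3.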
Rust mirrors Python’s collection.delete(ids=..., where=..., where_document=...) using strongly typed filters.
```rust
use seekdb_rs::{Filter, DocFilter};
use serde_json::json;

// Delete by IDs
coll.delete(Some(&vec!["id1".to_string(), "id2".to_string()]), None, None)
    .await?;

// Delete by metadata filter
let where_meta = Filter::Gte {
    field: "score".into(),
    value: json!(90),
};
coll.delete(None, Some(&where_meta), None).await?;

// Delete by document filter
let where_doc = DocFilter::Contains("machine learning".into());
coll.delete(None, None, Some(&where_doc)).await?;
```

If all of `ids`, `where_meta`, and `where_doc` are `None`,
`SeekDbError::InvalidInput` is returned.
On the Python side, collections expose query (vector search), get (filtered read),
and hybrid_search. Rust supports the same concepts:

- `query_embeddings` – search using explicit query embeddings.
- `query_texts` – search using raw text; embeddings are computed using the collection’s `EmbeddingFunction`.
- `get` – filter-only reads.
- `hybrid_search` / `hybrid_search_advanced` – hybrid vector + text + metadata search.

You will find the complete set of examples (including hybrid search and filter
operators) in the Simplified Chinese README: README_zh-CN.md.
Python defines an EmbeddingFunction protocol and ships a default ONNX model.
Rust mirrors this with an EmbeddingFunction trait and an optional DefaultEmbedding
implementation behind the embedding feature.
```rust
use seekdb_rs::{Embeddings, Result};

// Trait definition as exposed by the crate, shown here for reference.
#[async_trait::async_trait]
pub trait EmbeddingFunction: Send + Sync {
    async fn embed_documents(&self, docs: &[String]) -> Result<Embeddings>;
    fn dimension(&self) -> usize;
}
```

You can implement this trait for your own models (local, remote, or SaaS).
When attached to a collection, it is used for:

- `add` / `update` / `upsert` when you only pass `documents`.
- `query_texts` and text-based `hybrid_search`.

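As an illustration of what a custom implementation might look like, the sketch below uses a simplified, synchronous stand-in trait so that it compiles with the standard library alone. `SimpleEmbeddingFunction` and `BucketEmbedding` are hypothetical names. With the real trait, the same body would sit inside an `async fn embed_documents` under `#[async_trait::async_trait]` and return the crate's `Result<Embeddings>`. The "model" is a toy that only demonstrates the shape of the contract.

```rust
/// One embedding vector; the element type `f32` is an assumption.
pub type Embedding = Vec<f32>;

/// Simplified, synchronous stand-in for the crate's async `EmbeddingFunction`.
pub trait SimpleEmbeddingFunction {
    fn embed_documents(&self, docs: &[String]) -> Result<Vec<Embedding>, String>;
    fn dimension(&self) -> usize;
}

/// Toy model: counts each byte of the document into one of `dim` buckets,
/// then L2-normalizes the counts. Not semantically meaningful.
pub struct BucketEmbedding {
    dim: usize,
}

impl BucketEmbedding {
    pub fn new(dim: usize) -> Self {
        assert!(dim > 0, "dimension must be positive");
        BucketEmbedding { dim }
    }
}

impl SimpleEmbeddingFunction for BucketEmbedding {
    fn embed_documents(&self, docs: &[String]) -> Result<Vec<Embedding>, String> {
        let embeddings = docs
            .iter()
            .map(|doc| {
                let mut v = vec![0.0f32; self.dim];
                for byte in doc.bytes() {
                    v[byte as usize % self.dim] += 1.0;
                }
                let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
                if norm > 0.0 {
                    for x in v.iter_mut() {
                        *x /= norm;
                    }
                }
                v
            })
            .collect();
        Ok(embeddings)
    }

    fn dimension(&self) -> usize {
        self.dim
    }
}
```

The property that matters for a collection is that every returned vector has exactly `dimension()` components, since that value must match the `HnswConfig` dimension used at creation time.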
With the embedding feature enabled, seekdb-rs provides a built‑in
DefaultEmbedding based on an ONNX export of a sentence‑transformers model
(similar to all-MiniLM-L6-v2):
```rust
// `EmbeddingFunction` is imported so the trait methods below are in scope.
use seekdb_rs::{embedding::DefaultEmbedding, EmbeddingFunction};

let ef = DefaultEmbedding::new()?; // model is downloaded / loaded on demand
let dim = ef.dimension(); // e.g. 384
let embs = ef.embed_documents(&["hello".into(), "world".into()]).await?;
```

For codebases that cannot adopt `async`/`await` yet, the `sync` feature enables
blocking wrappers `SyncServerClient` and `SyncCollection`. They internally own
a Tokio runtime and simply `block_on` the async implementations.

```toml
[dependencies]
seekdb-rs = { path = "/path/to/seekdb-rs", features = ["sync"] }
```

```rust
use seekdb_rs::{ServerConfig, SyncServerClient, SyncCollection, SeekDbError};

fn main() -> Result<(), SeekDbError> {
    let config = ServerConfig::from_env()?;
    let client = SyncServerClient::from_config(config)?;

    let hnsw = seekdb_rs::HnswConfig {
        dimension: 3,
        distance: seekdb_rs::DistanceMetric::Cosine,
    };

    let coll: SyncCollection = client.create_collection::<seekdb_rs::DummyEmbedding>(
        "sync_demo",
        Some(hnsw),
        None::<seekdb_rs::DummyEmbedding>,
    )?;

    let ids = vec!["id1".to_string(), "id2".to_string()];
    let embs = vec![vec![1.0, 2.0, 3.0], vec![2.0, 3.0, 4.0]];
    coll.add(&ids, Some(&embs), None, Some(&["doc1".into(), "doc2".into()]))?;

    let cnt = coll.count()?;
    assert_eq!(cnt, 2);
    Ok(())
}
```

Do not call the blocking APIs (`SyncServerClient`, `SyncCollection`) from within an existing Tokio runtime; use them only in non-async contexts.

This crate ships both unit tests and async integration tests.
- Unit tests live alongside modules under `src/` and can be run with `cargo test`.
- Integration tests live under `tests/` and require a real SeekDB / OceanBase instance, controlled by env variables.

Run integration tests:
```bash
SEEKDB_INTEGRATION=1 \
SERVER_HOST=127.0.0.1 \
SERVER_PORT=2881 \
SERVER_TENANT=sys \
SERVER_DATABASE=test \
SERVER_USER=root \
SERVER_PASSWORD='' \
cargo test --tests
```

Integration tests cover:

- Database CRUD and admin APIs.
- Collection DML semantics and metadata handling.
- ONNX-based default embedding (with the `embedding` feature).
- Hybrid search behavior.
- Sync client wrappers (with the `sync` feature).

A high‑level comparison with the Python SDK:

| Area | Status | Notes |
|---|---|---|
| Error type `SeekDbError` | ✅ | Unified error type, aligned with the design docs |
| Config types `ServerConfig` / `HnswConfig` / `DistanceMetric` | ✅ | Includes `from_env` helpers |
| Common structs `QueryResult` / `GetResult` / `IncludeField` / `Database` | ✅ | Struct shapes match Python |
| Server client `ServerClient` | ✅ | `connect` / `from_config` / `from_env` / `execute` / `fetch_all` |
| Collection mgmt: create/get/get_or_create/delete/list/has/count | ✅ | Table naming, vector column/index follow Python conventions |
| Collection DML: `add` / `update` / `upsert` / `delete` (explicit embeddings) | ✅ | Length & dimension checks; semantics aligned with Python |
| Collection DQL: `query_embeddings` / `query_texts` / `get` / `count` / `peek` | ✅ | Supports metadata/document filters and include flags |
| Filter expressions `Filter` / `DocFilter` | ✅ | Typed equivalents of `$eq`/`$ne`/`$gt`/... and `$contains`/`$regex` |
| Integration tests (server mode) | ✅ | Require real SeekDB / OceanBase |
| `EmbeddingFunction` trait | ✅ | Custom implementations supported |
| Default embedding implementation `DefaultEmbedding` | ✅ | ONNX-based, behind the `embedding` feature |
| Auto-embedding for `add` / `update` / `upsert` | ✅ | When a collection has an `embedding_function` |
| Text queries: `Collection::query_texts` | ✅ | Uses attached `EmbeddingFunction` |
| Sync wrappers: `SyncServerClient` / `SyncCollection` | ✅ | Provided behind the `sync` feature |
| Hybrid search (`hybrid_search`, `hybrid_search_advanced`) | ✅ | Hybrid vector + text + metadata search |
| Embedded client (on-disk, non-server mode) | ❌ | Not implemented in Rust yet |
| RAG demo (end-to-end example) | ❌ | Only available in Python for now |

For more detailed, API‑by‑API explanations (currently in Simplified Chinese),
see README_zh-CN.md.