🐍📦 High-performance cosine similarity ranking for Retrieval-Augmented Generation (RAG) pipelines.

analyticsinmotion/symrank


Similarity ranking for Retrieval-Augmented Generation


✨ What is SymRank?

SymRank is a blazing-fast Python library for top-k cosine similarity ranking, designed for vector search, retrieval-augmented generation (RAG), and embedding-based matching.

Built with a Rust + SIMD backend, it offers the speed of native code with the ease of Python.


🚀 Why SymRank?

⚡ Fast: SIMD-accelerated cosine scoring with adaptive parallelism

🧠 Smart: Automatically selects serial or parallel mode based on workload

🔢 Top-K optimized: Efficient inlined heap selection (no full sort overhead)

🐍 Pythonic: Easy-to-use Python API

🦀 Powered by Rust: Safe, high-performance core engine

📉 Memory Efficient: Supports batching to improve speed and reduce memory footprint
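
To illustrate the "heap selection" idea from the list above, here is a minimal pure-Python sketch of top-k selection with a bounded min-heap. It is not SymRank's Rust implementation; the helper names `cosine` and `top_k_heap` are hypothetical and exist only for this illustration.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_heap(query, candidates, k):
    """Scan candidates once, keeping only the k best (score, id) pairs."""
    if k <= 0:
        return []
    heap = []  # min-heap: the weakest retained hit sits at heap[0]
    for doc_id, vector in candidates:
        item = (cosine(query, vector), doc_id)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # evict the current weakest hit
    # Only the k survivors are sorted, never the full candidate list.
    return sorted(heap, reverse=True)


candidates = [
    ("doc_1", [0.1, 0.2, 0.3, 0.5]),
    ("doc_2", [0.9, 0.1, 0.2, 0.1]),
    ("doc_3", [0.0, 0.0, 0.0, 1.0]),
]
top = top_k_heap([0.1, 0.2, 0.3, 0.4], candidates, k=2)
print([doc_id for _, doc_id in top])
```

With n candidates, this approach does O(n log k) heap work instead of the O(n log n) needed to sort every score, which is the saving the "no full sort overhead" bullet refers to.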


📦 Installation

You can install SymRank with uv (recommended) or with pip.

Recommended (with uv):

uv pip install symrank

Alternatively (using pip):

pip install symrank

🧪 Usage

Basic Example (using Python lists)

import symrank as sr

query = [0.1, 0.2, 0.3, 0.4]  
candidates = [
    ("doc_1", [0.1, 0.2, 0.3, 0.5]),
    ("doc_2", [0.9, 0.1, 0.2, 0.1]),
    ("doc_3", [0.0, 0.0, 0.0, 1.0]),
]

results = sr.cosine_similarity(query, candidates, k=2)
print(results)

Output

[{'id': 'doc_1', 'score': 0.9939991235733032}, {'id': 'doc_3', 'score': 0.7302967309951782}]
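
The scores above can be checked by hand with the standard definition cos(q, d) = (q · d) / (‖q‖ ‖d‖). The sketch below recomputes them in plain Python without SymRank; the `cosine` helper is hypothetical, and because Python floats are 64-bit the values agree with the output above only to about six decimal places.

```python
import math


def cosine(a, b):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


query = [0.1, 0.2, 0.3, 0.4]
candidates = [
    ("doc_1", [0.1, 0.2, 0.3, 0.5]),
    ("doc_2", [0.9, 0.1, 0.2, 0.1]),
    ("doc_3", [0.0, 0.0, 0.0, 1.0]),
]

# Score every candidate, then rank ids by descending similarity.
scores = {doc_id: cosine(query, vector) for doc_id, vector in candidates}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:2])
```

Here doc_2 scores lowest of the three, which is why it is absent from the k=2 result.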

Basic Example (using NumPy arrays)

import symrank as sr
import numpy as np

query = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32)
candidates = [
    ("doc_1", np.array([0.1, 0.2, 0.3, 0.5], dtype=np.float32)),
    ("doc_2", np.array([0.9, 0.1, 0.2, 0.1], dtype=np.float32)),
    ("doc_3", np.array([0.0, 0.0, 0.0, 1.0], dtype=np.float32)),
]

results = sr.cosine_similarity(query, candidates, k=2)
print(results)

Output

[{'id': 'doc_1', 'score': 0.9939991235733032}, {'id': 'doc_3', 'score': 0.7302967309951782}]

🧩 API: cosine_similarity(...)

cosine_similarity(
    query_vector,              # List[float] or np.ndarray
    candidate_vectors,         # List[Tuple[str, List[float] or np.ndarray]]
    k=5,                       # Number of top results to return
    batch_size=None            # Optional: set for memory-efficient batching
)

'cosine_similarity(...)' Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| query_vector | list[float] or np.ndarray | required | The query vector you want to compare against the candidate vectors. |
| candidate_vectors | list[tuple[str, list[float] or np.ndarray]] | required | List of (id, vector) pairs. Each vector can be a list or NumPy array. |
| k | int | 5 | Number of top results to return, sorted by descending similarity. |
| batch_size | int or None | None | Optional batch size to reduce memory usage. If None, uses SIMD directly. |
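
How SymRank batches internally is not described here. As a conceptual sketch only, the pure-Python code below shows why batching can lower peak memory without changing the answer: each batch is scored on its own, and only the running top k is carried forward. The names `cosine` and `top_k_batched` are hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_batched(query, candidates, k, batch_size):
    """Score candidates batch by batch, merging each batch into the running top k."""
    best = []  # at most k (score, id) pairs, best first
    for start in range(0, len(candidates), batch_size):
        batch = candidates[start:start + batch_size]
        scored = [(cosine(query, vector), doc_id) for doc_id, vector in batch]
        # Only k + len(batch) scores are held in memory at any time.
        best = heapq.nlargest(k, best + scored)
    return [{"id": doc_id, "score": score} for score, doc_id in best]


candidates = [
    ("doc_1", [0.1, 0.2, 0.3, 0.5]),
    ("doc_2", [0.9, 0.1, 0.2, 0.1]),
    ("doc_3", [0.0, 0.0, 0.0, 1.0]),
]
query = [0.1, 0.2, 0.3, 0.4]
small = top_k_batched(query, candidates, k=2, batch_size=1)
large = top_k_batched(query, candidates, k=2, batch_size=3)
print([hit["id"] for hit in small])
```

The ranking is the same for any batch size, so in this sketch the parameter trades memory against per-batch overhead rather than accuracy.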

Returns

List of dictionaries with id and score (cosine similarity), sorted by descending similarity:

[{"id": "doc_42", "score": 0.8763}, {"id": "doc_17", "score": 0.8451}, ...]
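
In a RAG pipeline, the returned ids are typically mapped back to the source passages. The sketch below assumes only the documented return shape (a list of dicts with id and score); the `documents` lookup, the `build_context` helper, and the `min_score` threshold are hypothetical names introduced for illustration.

```python
# Values copied from the basic example output; any list of this shape works.
results = [
    {"id": "doc_1", "score": 0.9939991235733032},
    {"id": "doc_3", "score": 0.7302967309951782},
]

# Hypothetical id -> text lookup standing in for a real document store.
documents = {
    "doc_1": "First passage.",
    "doc_2": "Second passage.",
    "doc_3": "Third passage.",
}


def build_context(results, documents, min_score=0.0):
    """Join the text of every hit scoring at least min_score, best first."""
    hits = [hit for hit in results if hit["score"] >= min_score]
    return "\n".join(documents[hit["id"]] for hit in hits)


print(build_context(results, documents, min_score=0.75))
```

Because the results are already sorted by descending similarity, the joined context keeps the most relevant passage first.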

📄 License

This project is licensed under the Apache License 2.0.
