Server data from the Official MCP Registry
Lightning-fast RAG for AI agents. 4-layer fusion, ONNX Runtime, sub-200ms search.
Valid MCP server (2 strong, 4 medium validity signals). 2 known CVEs in dependencies (0 critical, 1 high severity). Package registry verified. Imported from the Official MCP Registry.
5 files analyzed · 3 issues found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
This plugin requests these system permissions. Most are normal for its category.
Add this to your MCP configuration file:
{
"mcpServers": {
"io-github-haseebkhalid1507-velocirag": {
"args": [
"velocirag"
],
"command": "uvx"
}
}
}

From the project's GitHub README:
Lightning-fast RAG for AI agents.
Four-layer retrieval fusion powered by ONNX Runtime. No PyTorch. Sub-200ms warm search. Incremental graph updates. MCP-ready.
Most RAG solutions either drag in 2GB+ of PyTorch or limit you to single-layer vector search. VelociRAG gives you four retrieval methods (vector similarity, BM25 keyword matching, knowledge graph traversal, and metadata filtering) fused through reciprocal rank fusion with cross-encoder reranking. All running on ONNX Runtime, no GPU, no API keys. Comes with an MCP server for agent integration, a Unix socket daemon for warm queries, and a CLI that just works.
pip install "velocirag[mcp]"
velocirag index ./my-docs
velocirag mcp
Claude Code – add to .mcp.json in your project root:
{
"mcpServers": {
"velocirag": {
"command": "velocirag",
"args": ["mcp"],
"env": { "VELOCIRAG_DB": "/path/to/data" }
}
}
}
Then open /mcp in Claude Code and enable the velocirag server. If using a virtualenv, use the full path to the binary (e.g. .venv/bin/velocirag).
Claude Desktop – add to claude_desktop_config.json:
{
"mcpServers": {
"velocirag": {
"command": "velocirag",
"args": ["mcp", "--db", "/path/to/data"]
}
}
}
Cursor – add to .cursor/mcp.json:
{
"mcpServers": {
"velocirag": {
"command": "velocirag",
"args": ["mcp", "--db", "/path/to/data"]
}
}
}
from velocirag import Embedder, VectorStore, Searcher
embedder = Embedder()
store = VectorStore('./my-db', embedder)
store.add_directory('./my-docs')
searcher = Searcher(store, embedder)
results = searcher.search('query', limit=5)
pip install velocirag
velocirag index ./my-docs
velocirag search "your query here"
velocirag serve --db ./my-data # start daemon (background)
velocirag search "query" # auto-routes through daemon
velocirag status # check daemon health
velocirag stop # stop daemon
The daemon keeps the ONNX model + FAISS index warm over a Unix socket. First query loads the engine (~1s), subsequent queries return in ~180ms with full 4-layer fusion.
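Under the hood this is a plain Unix-domain socket round trip. As a rough sketch of what a client-side call can look like, assuming a newline-delimited JSON protocol (an assumption for illustration only; the real wire format is internal, and velocirag search already does this routing for you):

# Hypothetical client for the warm daemon. The request/response shape
# (newline-delimited JSON) is assumed for illustration only.
import json
import socket

SOCKET_PATH = "/tmp/velocirag-daemon.sock"  # default VELOCIRAG_SOCKET

def daemon_search(query: str, limit: int = 5) -> dict:
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(SOCKET_PATH)
        request = {"action": "search", "query": query, "limit": limit}
        sock.sendall((json.dumps(request) + "\n").encode())
        data = b""
        while not data.endswith(b"\n"):  # read one response line
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
        return json.loads(data)

print(daemon_search("error handling"))

The point of the socket hop is simply that the process on the other end never unloads the ONNX model or FAISS index between calls.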
The 4-layer pipeline:
Query → expand (acronyms, variants)
  → [Vector] FAISS cosine similarity (384d, MiniLM-L6-v2 via ONNX)
  → [Keyword] BM25 via SQLite FTS5
  → [Graph] Knowledge graph traversal
  → [Metadata] Structured SQL filters (tags, status, project)
  → RRF Fusion → Cross-encoder rerank → Results
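The fusion step is reciprocal rank fusion (Cormack et al., cited below): each layer votes 1/(k + rank) for every document it returns, and the summed votes decide the order before reranking. A minimal illustrative implementation, not VelociRAG's internal code:

# Reciprocal rank fusion over ranked lists of doc ids. Every appearance
# of a doc contributes 1 / (k + rank); sums are sorted descending.
from collections import defaultdict

def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

vector_hits  = ["d3", "d1", "d7"]
keyword_hits = ["d1", "d9"]
graph_hits   = ["d7", "d1"]
print(rrf([vector_hits, keyword_hits, graph_hits]))  # d1 first: ranked by all three

The constant k (60 in the original paper) damps the advantage of top ranks so that no single layer can dominate the fused list.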
What each layer catches:
| Query type | Vector | Keyword | Graph | Metadata |
|---|---|---|---|---|
| Conceptual ("improve error handling") | ✓ | ✗ | ✗ | ✗ |
| Exact match ("ERR_CONNECTION_REFUSED") | ✗ | ✓ | ✗ | ✗ |
| Connected concepts | ✗ | ✗ | ✓ | ✗ |
| Filtered ("#python status:active") | ✗ | ✗ | ✗ | ✓ |
| Combined ("React state management") | ✓ | ✓ | ✓ | ✓ |
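The keyword column comes almost for free: SQLite's FTS5 extension has BM25 ranking built in, the same building block the keyword layer uses. A self-contained sketch (the table and column names here are made up, not VelociRAG's schema):

# Standalone BM25 keyword search with SQLite FTS5.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE chunks USING fts5(content)")
con.executemany(
    "INSERT INTO chunks (content) VALUES (?)",
    [("connection refused by remote host",),
     ("retry loop with exponential backoff",),
     ("ERR_CONNECTION_REFUSED seen in browser logs",)],
)
# bm25() returns lower values for better matches, so sort ascending.
rows = con.execute(
    "SELECT content, bm25(chunks) AS score FROM chunks "
    "WHERE chunks MATCH ? ORDER BY score LIMIT 5",
    ('"ERR_CONNECTION_REFUSED"',),
).fetchall()
print(rows)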
VelociRAG exposes a Model Context Protocol server for seamless agent integration:
Available tools:
- search – 4-layer fusion search with reranking
- index – Add documents to the knowledge base
- add_document – Insert single document
- health – System diagnostics
- list_sources – Show indexed document sources

The MCP server process stays alive between queries, so models load once and every subsequent search is warm. Works with any MCP-compatible client.
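Tool calls arrive as ordinary MCP JSON-RPC messages over stdio. A sketch of the envelope a client sends to invoke search; the argument names are assumptions for illustration, since a real client discovers the exact schema via tools/list:

# Standard MCP tools/call envelope. The "arguments" keys are assumed.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search",
        "arguments": {"query": "React state management", "limit": 5},
    },
}
print(json.dumps(request))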
Full 4-layer unified search:
from velocirag import (
Embedder, VectorStore, Searcher,
GraphStore, MetadataStore, UnifiedSearch,
GraphPipeline
)
# Build the full stack
embedder = Embedder()
store = VectorStore('./search-db', embedder)
graph_store = GraphStore('./search-db/graph.db')
metadata_store = MetadataStore('./search-db/metadata.db')
# Index with graph + metadata
store.add_directory('./docs')
pipeline = GraphPipeline(graph_store, embedder, metadata_store)
pipeline.build('./docs', source_name='my-docs')
# Unified search across all layers
searcher = Searcher(store, embedder)
unified = UnifiedSearch(searcher, graph_store, metadata_store)
results = unified.search(
'machine learning algorithms',
limit=5,
enrich_graph=True,
filters={'tags': ['python'], 'status': 'active'}
)
Quick semantic search:
from velocirag import Embedder, VectorStore, Searcher
embedder = Embedder()
store = VectorStore('./db', embedder)
store.add_directory('./docs')
searcher = Searcher(store, embedder)
results = searcher.search('neural networks', limit=10)
Incremental graph updates:
from velocirag import Embedder, GraphStore, GraphPipeline
# First run β full build, populates provenance
gs = GraphStore('./db/graph.db')
pipeline = GraphPipeline(gs, embedder=Embedder())
pipeline.build('./docs', source_name='my-docs') # full build
# Subsequent runs β only changed files get reprocessed
pipeline.build('./docs', source_name='my-docs') # incremental (automatic)
# Force full rebuild
pipeline.build('./docs', source_name='my-docs', force_rebuild=True)
# Multi-source graphs
pipeline.build('./project-a', source_name='project-a')
pipeline.build('./project-b', source_name='project-b') # isolated provenance
# Deleted files automatically cascade across all stores
# (vector, FTS5, graph, metadata) on next build
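To make the graph layer concrete: entities matched in the query seed a traversal that pulls in connected concepts. A toy illustration built on networkx (a listed dependency), which is not VelociRAG's actual traversal code:

# Neighbor expansion from query-matched entities, hop by hop.
import networkx as nx

g = nx.Graph()
g.add_edges_from([
    ("React", "state management"),
    ("state management", "Redux"),
    ("Redux", "middleware"),
])

def expand(graph: nx.Graph, seeds: list[str], hops: int = 1) -> set[str]:
    found = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        frontier = {n for node in frontier for n in graph.neighbors(node)} - found
        found |= frontier
    return found

print(expand(g, ["React"], hops=2))  # {'React', 'state management', 'Redux'}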
# Index documents (graph + metadata built by default)
velocirag index <path> [--no-graph] [--no-metadata] [--gliner] [--full-graph] [--force]
[--source NAME] [--db PATH]
# Search across all layers (auto-routes through daemon if running)
velocirag search <query> [--limit N] [--threshold F] [--format text|json]
# Search daemon
velocirag serve [--db PATH] [-f] # start daemon (-f for foreground)
velocirag stop # stop daemon
velocirag status # check daemon health
# Metadata queries
velocirag query [--tags TAG] [--status S] [--project P] [--recent N]
# System health and status
velocirag health [--format text|json]
# Start MCP server
velocirag mcp [--db PATH] [--transport stdio|sse]
Options:
- --no-graph – Skip knowledge graph build
- --no-metadata – Skip metadata extraction
- --full-graph – Build graph WITH semantic similarity edges (~2GB extra RAM)
- --source NAME – Label for multi-source provenance isolation
- --force – Clear and rebuild from scratch
- --gliner – Use GLiNER for entity extraction (requires pip install "velocirag[ner]")

Real benchmarks on ByteByteGo/system-design-101 (418 files, 1,001 chunks):
| Metric | Value |
|---|---|
| Index (418 files) | 13.6s |
| Search (warm, 5 results) | 35–90ms |
| Graph build (light) | 2.1s → 2,397 nodes, 8,717 edges |
| Incremental update (1 file) | 1.3s |
| Reranker | Cross-encoder TinyBERT via ONNX |
| Install size | ~80MB (no PyTorch) |
| RAM usage | <1GB with all models loaded |
Production deployment (6,300+ chunks, 3 sources, 950 files):
| Metric | Value |
|---|---|
| Full search (warm) | 16ms avg, 2ms min |
| Full search (first run) | 22ms avg, 4ms min |
| Search P50 / P95 | 17ms / 55ms |
| Hit rate (100-query benchmark) | 99/100 |
| Graph | 3,125 nodes, 132,320 edges |
| Reranker | Cross-encoder TinyBERT via ONNX |
| RAM | <1GB with all models loaded |
| Environment Variable | Default | Description |
|---|---|---|
| VELOCIRAG_DB | ./.velocirag | Database directory |
| VELOCIRAG_SOCKET | /tmp/velocirag-daemon.sock | Daemon socket path |
| NO_COLOR | (unset) | Disable colored output |
Dependencies (all included in base install):
- onnxruntime – ONNX inference (embedder + reranker)
- tokenizers + huggingface-hub – model loading
- faiss-cpu – vector similarity search
- networkx + scikit-learn – knowledge graph + topic clustering
- numpy, click, pyyaml, python-frontmatter
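Those first three libraries are the whole embedding story. A generic sketch of running all-MiniLM-L6-v2 on this stack with no PyTorch; this shows the standard recipe, not VelociRAG's Embedder:

# Download the ONNX export, tokenize, run the session, mean-pool.
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

repo = "sentence-transformers/all-MiniLM-L6-v2"
model_path = hf_hub_download(repo, "onnx/model.onnx")
tokenizer = Tokenizer.from_pretrained(repo)
session = ort.InferenceSession(model_path)

def embed(text: str) -> np.ndarray:
    enc = tokenizer.encode(text)
    inputs = {
        "input_ids": np.array([enc.ids], dtype=np.int64),
        "attention_mask": np.array([enc.attention_mask], dtype=np.int64),
        "token_type_ids": np.array([enc.type_ids], dtype=np.int64),
    }
    hidden = session.run(None, inputs)[0]        # (1, seq_len, 384)
    mask = inputs["attention_mask"][..., None]   # (1, seq_len, 1)
    return (hidden * mask).sum(1) / mask.sum(1)  # mean pooling -> (1, 384)

print(embed("neural networks").shape)  # (1, 384)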
Optional extras:

- pip install "velocirag[mcp]" – MCP server (adds fastmcp)
- pip install "velocirag[ner]" – GLiNER entity extraction (adds gliner, requires PyTorch)

VelociRAG builds on these foundational works:
Core Fusion & Retrieval
Reciprocal Rank Fusion – Cormack, G. V., Clarke, C. L. A., & Büttcher, S. (2009). "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods." SIGIR '09.
Core fusion algorithm for merging results across retrieval layers.
BM25 – Robertson, S. E., Walker, S., Jones, S., Hancock-Beaulieu, M., & Gatford, M. (1994). "Okapi at TREC-3." TREC-3.
Keyword search foundation via SQLite FTS5.
Embeddings & Neural IR
Sentence-BERT – Reimers, N., & Gurevych, I. (2019). "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks." EMNLP 2019. paper
Dense embedding architecture using all-MiniLM-L6-v2.
MiniLM – Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., & Zhou, M. (2020). "MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers." NeurIPS 2020. paper
Efficient transformer distillation for production embedding models.
Reranking & Neural Models
Cross-Encoder Reranking – Nogueira, R., & Cho, K. (2019). "Passage Re-ranking with BERT." arXiv:1901.04085. paper
Cross-attention reranking with TinyBERT on MS MARCO.
TinyBERT – Jiao, X., et al. (2020). "TinyBERT: Distilling BERT for Natural Language Understanding." Findings of EMNLP 2020. paper
Compressed BERT for fast reranking inference.
Vector Search & Systems
FAISS – Johnson, J., Douze, M., & Jégou, H. (2019). "Billion-scale similarity search with GPUs." IEEE Transactions on Big Data. paper
High-performance vector similarity search engine.
GLiNER – Zaratiana, U., Nzeyimana, A., & Holat, P. (2023). "GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer." arXiv:2311.08526. paper
Generalist NER for knowledge graph entity extraction (optional dependency).
MIT – Use it anywhere, build anything.
Need agent integration help? Check AGENTS.md for machine-readable project context.
Built for agents who think fast and remember faster.