How do I install Haiku?

Haiku is a local plugin. Install the PyPI package haiku.rag, add the generated configuration snippet to your AI app's MCP config file, and then restart the app.

What AI apps work with Haiku?

Haiku uses the Model Context Protocol (MCP) and works with any MCP-compatible AI app, including Claude, ChatGPT / Codex, Gemini, Copilot, Cursor, and more.

Haiku MCP Server

by Ggozad

AI & ML · Moderate 6.7 · MCP Registry · Local

Free

Server data from the Official MCP Registry

Opinionated agentic RAG powered by LanceDB, Pydantic AI, and Docling

Security Report

6.7 · Moderate Risk

Haiku RAG is a well-structured RAG framework with comprehensive security practices. The codebase demonstrates proper async/await patterns, input validation, and careful handling of document sources. No malicious patterns or hardcoded credentials were detected. Minor code quality observations exist around broad exception handling and verbose test coverage, but these do not constitute security vulnerabilities. Permissions (file_system, network_http, env_vars) are appropriate for an ML/RAG framework that indexes documents and calls embedding APIs. Package verification found 1 issue.

4 files analyzed · 5 issues found

Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.

Permissions Required

This plugin requests these system permissions. Most are normal for its category.

File System Read

Reads files on your machine. Normal for tools that analyze or process local data.

File System Write

Writes or modifies files on your machine. Check that this is expected for the tool.

HTTP Network Access

Connects to external APIs or services over the internet.

Environment Variables

Reads environment variables. Check that this is expected for this type of plugin.

How to Install

Add this to your MCP configuration file:

{
  "mcpServers": {
    "io-github-ggozad-haiku-rag": {
      "args": [
        "haiku.rag"
      ],
      "command": "uvx"
    }
  }
}
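If the configuration file already lists other servers, the new entry has to be merged into the existing "mcpServers" object rather than pasted over it. The following is a minimal sketch of that merge, not part of Haiku itself: the helper name add_mcp_server and the "other-server" entry are hypothetical, and the location of the config file depends on your AI app.

```python
import json


def add_mcp_server(config, name, command, args):
    """Return a copy of config with one server entry added under "mcpServers"."""
    merged = dict(config)
    # Copy the existing server map so the caller's dict is left untouched.
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config that already contains another server.
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}

updated = add_mcp_server(
    existing, "io-github-ggozad-haiku-rag", "uvx", ["haiku.rag"]
)
print(json.dumps(updated, indent=2))
```

In practice you would load the dict from the app's config file with json.load and write the result back with json.dump, keeping a backup of the original file.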

Documentation

From the project's GitHub README.

Haiku RAG

Agentic RAG built on LanceDB, Pydantic AI, and Docling.

New: vision and multimodal search. Picture-aware ingestion captures embedded figure bytes; vision-capable QA models receive them alongside text. Multimodal embedders put picture vectors in the same space as text, enabling text-as-query → figure hits and image-as-query retrieval.

Features

Hybrid search — Vector + full-text with Reciprocal Rank Fusion
Multimodal & cross-modal search — Multimodal embedders (vLLM) put picture vectors in the same space as text; supports text-as-query → figure hits and image-as-query
Question answering — RAG skill with citations (page numbers, section headings)
Vision QA — Vision-capable models receive figure bytes alongside chunk text
Reranking — MxBAI, Cohere, Zero Entropy, or vLLM
Analysis skill — Complex analytical tasks via sandboxed Python code execution (aggregation, computation, multi-document analysis)
Conversational RAG — Chat TUI and web application for multi-turn conversations with session memory
Document structure — Stores full DoclingDocument, enabling structure-aware context expansion
Multiple providers — Embeddings: Ollama, OpenAI, VoyageAI, LM Studio, vLLM (multimodal). QA: any model supported by Pydantic AI
Local-first — Embedded LanceDB, no servers required. Also supports S3, GCS, Azure, and LanceDB Cloud
CLI & Python API — Full functionality from command line or code
MCP server — Expose as tools for AI assistants (Claude Desktop, etc.)
Visual grounding — View chunks highlighted on original page images
Production ingester — Long-lived haiku-ingester service with persistent SQLite queue, async worker pool with retries and a dead-letter queue, FS / HTTP / S3 / WebDAV source adapters, FastAPI control plane, and a browser dashboard for operators. See docs/ingester.md.
Time travel — Query the database at any historical point with --before
Inspector — TUI for browsing documents, chunks, and search results
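The hybrid search feature above combines vector and full-text results with Reciprocal Rank Fusion. As an illustration of the general technique only, and not of Haiku RAG's internal implementation, the sketch below fuses two ranked lists of chunk ids. The function name, the chunk ids, and the constant k=60 (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists, best match first.
vector_hits = ["c3", "c1", "c7"]
fulltext_hits = ["c1", "c9", "c3"]

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
print(fused)  # ['c1', 'c3', 'c9', 'c7']
```

Chunks that appear near the top of both lists (c1 and c3 here) outrank chunks that appear in only one list, which is the point of fusing by rank rather than by raw score.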

Installation

Requires Python 3.12 or newer.

Full Package (Recommended)

pip install haiku.rag

Includes all features: document processing, all embedding providers, and rerankers.

Using uv? uv pip install haiku.rag

Slim Package (Minimal Dependencies)

pip install haiku.rag-slim

Install only the extras you need. See the Installation documentation for available options.

Quick Start

Note: Requires an embedding provider (Ollama, OpenAI, etc.). See the Tutorial for setup instructions.

# Index a PDF
haiku-rag add-src paper.pdf

# Search
haiku-rag search "attention mechanism"

# Ask questions with citations
haiku-rag ask "What datasets were used for evaluation?"

# Analyze — complex analytical tasks via code execution
haiku-rag analyze "How many documents mention transformers?"

# Interactive chat — multi-turn conversations with memory
haiku-rag chat

# Continuously ingest from configured sources (FS, HTTP, S3, WebDAV)
haiku-ingester serve

See Configuration for customization options.

Python API

import asyncio

from haiku.rag.client import HaikuRAG


async def main():
    async with HaikuRAG("knowledge.lancedb", create=True) as rag:
        # Index documents
        await rag.create_document_from_source("paper.pdf")
        await rag.create_document_from_source("https://arxiv.org/pdf/1706.03762")

        # Search: returns chunks with provenance
        results = await rag.search("self-attention")
        for result in results:
            print(f"{result.score:.2f} | p.{result.page_numbers} | {result.content[:100]}")

        # QA with citations
        answer, citations = await rag.ask("What is the complexity of self-attention?")
        print(answer)
        for cite in citations:
            print(f"  [{cite.chunk_id}] p.{cite.page_numbers}: {cite.content[:80]}")


# The client is async, so run it inside an event loop
asyncio.run(main())

For details on the skills the client wraps, see the Skills docs.

MCP Server

Use with AI assistants like Claude Desktop:

haiku-rag mcp --stdio

Add to your Claude Desktop configuration:

{
  "mcpServers": {
    "haiku-rag": {
      "command": "haiku-rag",
      "args": ["mcp", "--stdio"]
    }
  }
}

Provides tools for document management, search, QA, and analysis directly in your AI assistant.
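With --stdio, the AI assistant talks to the server over standard input and output using JSON-RPC 2.0 messages, one JSON object per line. The sketch below only shows how a client might frame two such requests. It is a simplified illustration: a real session begins with an initialize handshake that is omitted here, and the tool name "search" and its "query" argument are hypothetical, so list the tools the server actually exposes before calling one.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids


def jsonrpc_request(method, params=None):
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    # json.dumps escapes newlines inside strings, so the message stays on one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


# Ask the server which tools it offers.
list_tools = jsonrpc_request("tools/list")

# Call a tool by name; "search" and "query" are assumptions for illustration.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "search", "arguments": {"query": "attention mechanism"}},
)

print(list_tools, end="")
print(call_tool, end="")
```

Most users never write these messages by hand, because the AI app or an MCP client library handles the framing.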

Examples

See the examples directory for working examples:

Docker Setup - Complete Docker deployment with continuous ingestion (haiku-ingester) and MCP server
Web Application - Full-stack conversational RAG with CopilotKit frontend

Documentation

Full documentation at: https://ggozad.github.io/haiku.rag/

Quickstart - Provider setup and first ingestion
Installation - Packages and extras
Configuration - YAML reference
CLI - Command reference
Python API - Complete API docs
Skills - The RAG and analysis skills the client wraps
Tuning - Retrieval and answer-quality tuning
Ingester - Production ingester for continuous indexing from FS, HTTP, S3, and WebDAV
MCP - Model Context Protocol integration
Remote processing - Offload conversion to docling-serve
Applications - Chat TUI, web app, and inspector
Benchmarks - Performance benchmarks
Changelog - Version history

License

This project is licensed under the MIT License.

mcp-name: io.github.ggozad/haiku-rag
