Server data from the Official MCP Registry
BM25 search + tree navigation over markdown docs for AI agents. No embeddings, no LLM calls.
Valid MCP server (1 strong, 0 medium validity signals). No known CVEs in dependencies. Package registry verified. Imported from the Official MCP Registry.
4 files analyzed · 1 issue found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
This plugin requests these system permissions. Most are normal for its category.
Set these up before or after installing:
Environment variable: DOCS_ROOT
Environment variable: DOCS_GLOB
Environment variable: DOCS_ROOTS
Environment variable: GLOSSARY_PATH
Add this to your MCP configuration file:
{
  "mcpServers": {
    "io-github-joesaby-doctree-mcp": {
      "env": {
        "DOCS_GLOB": "your-docs-glob-here",
        "DOCS_ROOT": "your-docs-root-here",
        "DOCS_ROOTS": "your-docs-roots-here",
        "GLOSSARY_PATH": "your-glossary-path-here"
      },
      "args": [
        "-y",
        "doctree-mcp"
      ],
      "command": "npx"
    }
  }
}
From the project's GitHub README.
Agentic document retrieval over markdown, CSV, and JSONL. BM25 + tree navigation via MCP — no vector DB, no embeddings, no LLM calls at index time.
The pitch: MCP provides the structural primitives (a navigable tree, BM25, glossary, row lookup). The bundled skills provide the procedural knowledge (how to walk that tree). Together the agent behaves like a trained research librarian — not a one-shot searcher. See The Skill + MCP Pattern.
Have docs already? Point a client at them:
# In your AI tool's MCP config — see docs/CLIENTS.md for per-tool snippets
{ "mcpServers": { "doctree": {
"command": "bunx", "args": ["doctree-mcp"],
"env": { "DOCS_ROOT": "./docs", "WIKI_WRITE": "1" }
} } }
Restart the tool → ask "search the docs for X" or invoke the doc-read prompt.
Starting fresh? Scaffold a Karpathy-style LLM wiki:
bunx doctree-mcp init # configure current tool
bunx doctree-mcp init --all # configure every supported client
bunx doctree-mcp init --dry-run
Creates docs/wiki/ (LLM-maintained) + docs/raw-sources/ (your inputs), writes the MCP config, installs a post-write lint hook, appends wiki conventions to CLAUDE.md / AGENTS.md / .cursor/rules/.
| Mode | Use when | Guide |
|---|---|---|
| stdio (default) | Local dev, agent on your machine | Client setup |
| HTTP (Streamable HTTP) | Teams, CI, hosted agents | Deployment — Railway · Fly · Render · Cloudflare Containers · Docker |
| CLI | init, lint, debug-index | Operation modes |
Full decision tree: Operation Modes.
Agent: "How does token refresh work?"
→ search_documents("token refresh")
#1 auth/middleware.md § Token Refresh Flow score: 12.4
#2 auth/oauth.md § Refresh Token Lifecycle score: 8.7
→ get_tree("docs:auth:middleware")
[n1] # Auth Middleware
[n4] ## Token Refresh Flow
[n5] ### Automatic Refresh
→ navigate_tree("docs:auth:middleware", "n4") ← n4 + descendants
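To make the "n4 + descendants" step concrete, here is a minimal sketch of how a section and its nested subsections can be collected from a flat heading outline. The `DocNode` shape, the `subtree` helper, and the extra `n6` heading are hypothetical illustrations, not doctree-mcp's actual schema or implementation.

```typescript
// Hypothetical node shape; the real doctree-mcp schema may differ.
interface DocNode {
  id: string;    // e.g. "n4"
  title: string; // heading text
  level: number; // 1 for "#", 2 for "##", ...
}

// Headings in document order, similar to what a table of contents lists.
// "n6" is invented here to show where the subtree stops.
const outline: DocNode[] = [
  { id: "n1", title: "Auth Middleware", level: 1 },
  { id: "n4", title: "Token Refresh Flow", level: 2 },
  { id: "n5", title: "Automatic Refresh", level: 3 },
  { id: "n6", title: "Error Handling", level: 2 },
];

// Return the node with the given id plus every following node nested deeper,
// stopping at the first heading of the same or a shallower level.
function subtree(nodes: DocNode[], id: string): DocNode[] {
  let start = -1;
  for (let i = 0; i < nodes.length; i++) {
    if (nodes[i].id === id) {
      start = i;
      break;
    }
  }
  if (start === -1) return [];
  const result: DocNode[] = [nodes[start]];
  for (let i = start + 1; i < nodes.length; i++) {
    if (nodes[i].level <= nodes[start].level) break;
    result.push(nodes[i]);
  }
  return result;
}

const ids = subtree(outline, "n4").map((n) => n.id);
console.log(ids.join(", ")); // n4, n5
```

Because the outline is kept in document order, a single forward scan is enough; no parent pointers are needed for this kind of lookup.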
Core read tools (always on):
| Tool | Purpose |
|---|---|
| search_documents | BM25 keyword search + facet filters + glossary expansion (markdown · CSV · JSONL) |
| get_tree | Table of contents: headings, word counts, summaries |
| get_node_content | Full text of a specific section by node ID |
| navigate_tree | A section plus all descendants in one call |
| lookup_row | O(1) exact-key lookup for structured data rows (e.g. PROJ-44) |
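For readers unfamiliar with BM25, the following is a minimal sketch of textbook Okapi BM25 scoring over a few in-memory strings. It is not doctree-mcp's code: the tokenizer, the `bm25Scores` helper, the sample sections, and the `k1`/`b` values (common defaults) are all assumptions made for illustration.

```typescript
// Common textbook defaults, not values taken from doctree-mcp.
const K1 = 1.2;
const B = 0.75;

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Score every document against the query with Okapi BM25.
function bm25Scores(docs: string[], query: string): number[] {
  const tokenized = docs.map(tokenize);
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / docs.length;
  const terms = tokenize(query);

  return tokenized.map((doc) => {
    let score = 0;
    for (const term of terms) {
      const tf = doc.filter((t) => t === term).length; // term frequency in this doc
      if (tf === 0) continue;
      const df = tokenized.filter((d) => d.indexOf(term) !== -1).length; // docs containing term
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      const lengthNorm = 1 - B + B * (doc.length / avgLen);
      score += idf * ((tf * (K1 + 1)) / (tf + K1 * lengthNorm));
    }
    return score;
  });
}

// Hypothetical section texts.
const sections = [
  "Token refresh flow: the middleware refreshes the token automatically",
  "Refresh token lifecycle and expiry",
  "Deploying the HTTP server with Docker",
];
const scores = bm25Scores(sections, "token refresh");
console.log(scores.map((s) => s.toFixed(2)).join(" | "));
```

The scoring needs only term counts and document lengths, which is why an index of this kind can be built without embeddings or model calls.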
Wiki write tools (opt-in with WIKI_WRITE=1):
| Tool | Purpose |
|---|---|
| find_similar | Duplicate detection with overlap ratios |
| draft_wiki_entry | Scaffold: suggested path, inferred frontmatter, glossary hits |
| write_wiki_entry | Validated write: path containment, schema, duplicate guards, dry-run |
Safety: path containment · frontmatter validation · duplicate detection · dry-run · overwrite protection.
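As an illustration of what a path containment check involves, here is a minimal sketch that rejects write targets resolving outside a wiki root. It assumes an absolute POSIX-style root, works purely on strings, and ignores symlinks; `segments`, `isContained`, and the sample paths are hypothetical and not taken from doctree-mcp.

```typescript
// Resolve "." and ".." segments of a POSIX-style path without touching the filesystem.
function segments(p: string): string[] {
  const out: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return out;
}

// True only when `candidate` (relative to root, or absolute) resolves strictly inside `root`.
// Comparing whole segments avoids the "/wiki" vs "/wiki-evil" string-prefix pitfall.
function isContained(root: string, candidate: string): boolean {
  const rootSegs = segments(root);
  const full = candidate.charAt(0) === "/" ? candidate : root + "/" + candidate;
  const fullSegs = segments(full);
  if (fullSegs.length <= rootSegs.length) return false; // must name something inside root
  return rootSegs.every((seg, i) => fullSegs[i] === seg);
}

const wikiRoot = "/repo/docs/wiki"; // hypothetical location
console.log(isContained(wikiRoot, "runbooks/auth.md"));          // true
console.log(isContained(wikiRoot, "../../etc/passwd"));          // false
console.log(isContained(wikiRoot, "/repo/docs/wiki-evil/x.md")); // false
```

A production check would also have to account for symlinks and platform-specific separators, which this sketch deliberately leaves out.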
Deprecated aliases (list_documents, find_files, find_symbol) are superseded by search_documents — still functional, no longer recommended.
Most retrieval tools hand the agent a search box and hope for the best. doctree-mcp hands it a tree, and the bundled skills teach it how to walk one.
search_documents, get_tree, navigate_tree, get_node_content, lookup_row return tree positions the agent reasons over, not finished answers.
/doc-read, /doc-write, /doc-lint encode breadcrumb drill-down: search → outline → navigate → retrieve. The agent learns the policy, not just the API.
That pairing doesn't exist cleanly elsewhere:
| Approach | Primitive | Skill teaches | Gap |
|---|---|---|---|
| Managed hybrid RAG (Cloudflare AI Search, Nia) | Flat chunks + similarity | — | Black-box score, no audit trail |
| Tool-returns-answer (Context7) | 2 tools returning answers | Query shape | Agent can't reason about skipped content |
| Skill-over-CLI (QMD) | CLI over flat search | Query expansion | No tree to navigate |
| doctree-mcp + /doc-read | Navigable tree | Breadcrumbs, multi-instance routing, wiki compilation | — |
Why iterative retrieval wins:
search_documents → get_tree → navigate_tree → get_node_content is a replayable trail. A cosine score is not. Regulated domains can ship the former.
Multi-instance = client-side federation. Register several doctree servers under different names; the /doc-read skill encodes the routing policy. Add or remove instances without touching the skill. See Client setup → Multi-instance routing.
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Raw Sources │ │ The Wiki │ │ The Schema │
│ (immutable) │ ──→ │ (LLM-maintained)│ ←── │ (you define) │
│ notes · logs │ │ runbooks · refs │ │ CLAUDE.md rules │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Inspired by Karpathy's LLM Wiki. Full walkthrough: docs/LLM-WIKI-GUIDE.md.
---
title: "Descriptive Title"
description: "One-line summary — boosts ranking"
tags: [relevant, terms]
type: runbook # runbook | guide | reference | tutorial | architecture | adr
category: auth
---
All non-reserved frontmatter fields become filter facets:
search_documents("auth", filters: { type: "runbook", tags: ["production"] })
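The filter call above can be read as a per-document predicate over frontmatter. The sketch below shows one plausible interpretation, in which a document matches when every filter key shares at least one value with the document's frontmatter. Whether doctree-mcp uses "any" or "all" semantics for multi-value filters is an assumption here, and `matchesFilters` is a hypothetical helper.

```typescript
type FacetValue = string | string[];
type Frontmatter = Record<string, FacetValue>;

// Normalize a scalar or list facet to a list; a missing facet becomes an empty list.
function toList(v: FacetValue | undefined): string[] {
  if (v === undefined) return [];
  return Array.isArray(v) ? v : [v];
}

// Every filter key must overlap with the document's value(s) for that key.
function matchesFilters(meta: Frontmatter, filters: Frontmatter): boolean {
  return Object.keys(filters).every((key) => {
    const wanted = toList(filters[key]);
    const actual = toList(meta[key]);
    return wanted.some((w) => actual.indexOf(w) !== -1);
  });
}

// Hypothetical frontmatter for two documents.
const runbook: Frontmatter = { type: "runbook", tags: ["production", "auth"], category: "auth" };
const guide: Frontmatter = { type: "guide", tags: ["production"] };
const filters: Frontmatter = { type: "runbook", tags: ["production"] };

console.log(matchesFilters(runbook, filters)); // true
console.log(matchesFilters(guide, filters));   // false
```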
Common env vars:
| Variable | Default | Description |
|---|---|---|
| DOCS_ROOT | ./docs | Docs folder |
| DOCS_GLOB | **/*.md | Comma-separated globs (**/*.md,**/*.csv,**/*.jsonl) |
| DOCS_ROOTS | — | Weighted multi-collection (./wiki:1.0,./rfcs:0.5) |
| PORT | 3100 | HTTP mode port |
| WIKI_WRITE | (unset) | 1 enables write tools |
| GLOSSARY_PATH | $DOCS_ROOT/glossary.json | Query-expansion glossary |
Full reference: docs/CONFIGURATION.md.
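The DOCS_ROOTS value is a comma-separated list of path:weight pairs. As a rough sketch of how such a string could be parsed, assuming the weight follows the last colon and that entries without a numeric weight default to 1.0 (both assumptions, as is the `parseDocsRoots` name):

```typescript
interface WeightedRoot {
  path: string;
  weight: number;
}

// Parse "path:weight,path:weight" into structured entries.
function parseDocsRoots(value: string): WeightedRoot[] {
  const roots: WeightedRoot[] = [];
  for (const raw of value.split(",")) {
    const entry = raw.trim();
    if (entry === "") continue;
    const colon = entry.lastIndexOf(":");
    const suffix = colon > 0 ? entry.slice(colon + 1) : "";
    const weight = suffix === "" ? NaN : Number(suffix);
    if (isNaN(weight)) {
      // No usable ":weight" suffix; defaulting to 1.0 is an assumption of this sketch.
      roots.push({ path: entry, weight: 1.0 });
    } else {
      roots.push({ path: entry.slice(0, colon), weight: weight });
    }
  }
  return roots;
}

const roots = parseDocsRoots("./wiki:1.0,./rfcs:0.5");
console.log(JSON.stringify(roots));
// [{"path":"./wiki","weight":1},{"path":"./rfcs","weight":0.5}]
```

Splitting on the last colon rather than the first keeps paths that themselves contain a colon intact.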
Glossary — place glossary.json in docs root for bidirectional query expansion:
{ "CLI": ["command line interface"], "K8s": ["kubernetes"] }
Acronym definitions like "TLS (Transport Layer Security)" are also auto-extracted.
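"Bidirectional" means a query containing either the short form or the long form can be expanded to the other. The sketch below illustrates that idea with the glossary shown above; `buildExpansions` and `expandQuery` are hypothetical helpers, and the whole-word matching strategy is an assumption rather than doctree-mcp's actual behavior.

```typescript
type Glossary = Record<string, string[]>;

// Map every term (lowercased) to its counterparts, in both directions.
function buildExpansions(glossary: Glossary): Record<string, string[]> {
  const table: Record<string, string[]> = {};
  const link = (from: string, to: string): void => {
    const key = from.toLowerCase();
    const value = to.toLowerCase();
    if (!table[key]) table[key] = [];
    if (table[key].indexOf(value) === -1) table[key].push(value);
  };
  for (const abbr of Object.keys(glossary)) {
    for (const full of glossary[abbr]) {
      link(abbr, full);
      link(full, abbr);
    }
  }
  return table;
}

// Return the original query plus one variant per glossary term it contains.
// Padding with spaces makes "cli" match as a whole word, not inside "client".
function expandQuery(query: string, table: Record<string, string[]>): string[] {
  const normalized = query.toLowerCase().trim().replace(/\s+/g, " ");
  const padded = " " + normalized + " ";
  const variants: string[] = [normalized];
  for (const term of Object.keys(table)) {
    const needle = " " + term + " ";
    if (padded.indexOf(needle) === -1) continue;
    for (const alt of table[term]) {
      variants.push(padded.split(needle).join(" " + alt + " ").trim());
    }
  }
  return variants;
}

const glossary: Glossary = { CLI: ["command line interface"], K8s: ["kubernetes"] };
const variants = expandQuery("kubernetes CLI flags", buildExpansions(glossary));
console.log(variants);
// [ "kubernetes cli flags",
//   "kubernetes command line interface flags",
//   "k8s cli flags" ]
```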
Structured data — CSV/JSONL files become documents where each row is a tree node. Column roles (id, title, description, facets, URL) are auto-detected from headers. See docs/STRUCTURED-DATA.md.
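To show why an exact-key row lookup can be O(1), here is a minimal sketch that parses a small CSV, guesses the id column from the headers, and builds a hash index keyed by that column. The CSV contents, the header-matching rule in `detectIdColumn`, and the helper names are assumptions for illustration; the real auto-detection is described in docs/STRUCTURED-DATA.md.

```typescript
type Row = Record<string, string>;

// Parse a simple CSV (no quoted fields) into rows keyed by header name.
function parseCsv(text: string): { headers: string[]; rows: Row[] } {
  const lines = text.trim().split(/\r?\n/);
  const headers = lines[0].split(",").map((h) => h.trim());
  const rows = lines.slice(1).map((line) => {
    const cells = line.split(",");
    const row: Row = {};
    headers.forEach((h, i) => {
      row[h] = (cells[i] || "").trim();
    });
    return row;
  });
  return { headers, rows };
}

// Guess the id column: a header named "id" or "key", else the first column.
// This rule is an assumption of the sketch.
function detectIdColumn(headers: string[]): string {
  for (const h of headers) {
    const lower = h.toLowerCase();
    if (lower === "id" || lower === "key") return h;
  }
  return headers[0];
}

// One pass to build the index; each later lookup is a single hash probe.
function buildRowIndex(rows: Row[], idColumn: string): Map<string, Row> {
  const index = new Map<string, Row>();
  for (const row of rows) index.set(row[idColumn], row);
  return index;
}

// Hypothetical ticket export.
const csv = "id,title,status\nPROJ-44,Fix token refresh,open\nPROJ-45,Rotate signing keys,closed";
const parsed = parseCsv(csv);
const rowIndex = buildRowIndex(parsed.rows, detectIdColumn(parsed.headers));
const hit = rowIndex.get("PROJ-44");
console.log(hit ? hit.title : "not found"); // Fix token refresh
```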
git clone https://github.com/joesaby/doctree-mcp.git
cd doctree-mcp && bun install
DOCS_ROOT=./docs bun run serve # stdio
DOCS_ROOT=./docs bun run serve:http # HTTP (port 3100)
DOCS_ROOT=./docs bun run index # CLI: inspect indexed output
bun test
| Operation | Time | Token cost |
|---|---|---|
| Full index (900 docs) | 2–5s | 0 |
| Incremental re-index | ~50ms | 0 |
| Search | 5–30ms | ~300–1K tokens |
| Tree outline | <1ms | ~200–800 tokens |
/doc-read · /doc-write · /doc-lint
License: MIT