Crawl and analyse websites for SEO errors using Crawlee with SQLite storage
Crawl and analyse your website for errors and issues that probably affect your site's SEO
Quick Navigation
Installation | CLI mode | How to use | What gets detected | Data storage | Performance | Tools reference | Available queries
I wanted to build on my experience working with the MCP protocol SDK to see just how far we can extend an AI assistant's capabilities. I decided that I'd quite like to build a crawler to check my site's "technical SEO" health and came across Crawlee - which seemed like the ideal library to base the crawl component of my MCP.
What's interesting is that "MCP" usually implies a server connection of some sort. Not so with SEO Crawler MCP - the protocol is more powerful than I'd realised. This is a self-contained application wrapped in the MCP SDK that handles everything locally.
Claude (or your AI assistant of choice) can orchestrate this entire stack through simple function calls. The crawl runs asynchronously, stores everything in SQLite, and then Claude can query that data through natural language - "analyse this crawl for seo opportunities" or "report on internal broken links" - and the MCP server translates that into sophisticated SQL analysis.
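To make that translation step concrete, here's a minimal sketch of how a request like "report on internal broken links" could map to SQL over the crawl database. The table names (`pages`, `links`) match the documented storage layout, but the column names and query are illustrative assumptions, not the project's actual code:

```python
import sqlite3

# Tiny in-memory stand-in for crawl-data.db.
# Table names match the documented layout; columns are assumptions.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE pages (url TEXT PRIMARY KEY, status_code INTEGER);
CREATE TABLE links (source_url TEXT, target_url TEXT, is_internal INTEGER);
INSERT INTO pages VALUES ('https://example.com/', 200),
                         ('https://example.com/gone', 404);
INSERT INTO links VALUES ('https://example.com/', 'https://example.com/gone', 1);
""")

# "Report on internal broken links" as SQL: internal links whose
# target page returned a 4xx/5xx status.
rows = db.execute("""
    SELECT l.source_url, l.target_url, p.status_code
    FROM links l JOIN pages p ON p.url = l.target_url
    WHERE l.is_internal = 1 AND p.status_code >= 400
""").fetchall()

for source, target, status in rows:
    print(f"{source} -> {target} ({status})")
# -> https://example.com/ -> https://example.com/gone (404)
```

The MCP server's job is essentially this mapping, wrapped in tool definitions that the AI assistant can call.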
You can also run crawls directly from the terminal - perfect for large sites or background processing. The CLI mode lets you run a crawl, get the output directory, and then hand that over to Claude for AI-powered analysis via the MCP tools.
The core crawling architecture is inspired by the logic and patterns from the LibreCrawl project. We've adapted their proven crawling methodology for use within the MCP protocol whilst adding comprehensive SEO analysis capabilities.
If you're new to MCP servers, I'd recommend reading these first:
I'd also suggest installing Desktop Commander first - it's useful for working with the crawl output files. See the Desktop Commander setup guide for details.
Add this to your Claude Desktop config file:
Windows: C:\Users\[YourName]\AppData\Roaming\Claude\claude_desktop_config.json
Mac: ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"seo-crawler-mcp": {
"command": "npx",
"args": ["-y", "@houtini/seo-crawler-mcp"],
"env": {
"OUTPUT_DIR": "C:\\seo-audits"
}
}
}
}
Restart Claude Desktop. Four tools will be available:
- `seo-crawler-mcp:run_seo_audit`
- `seo-crawler-mcp:analyze_seo`
- `seo-crawler-mcp:query_seo_data`
- `seo-crawler-mcp:list_seo_queries`

Claude Code uses a different registration mechanism - it doesn't read claude_desktop_config.json. Use `claude mcp add` instead:
claude mcp add -e OUTPUT_DIR=/path/to/seo-audits -s user seo-crawler-mcp -- npx -y @houtini/seo-crawler-mcp
Verify with:
claude mcp get seo-crawler-mcp
You should see Status: Connected.
cd C:\MCP\seo-crawler-mcp
npm install
npm run build
Then use the local path in your config:
{
"mcpServers": {
"seo-crawler-mcp": {
"command": "node",
"args": ["C:\\MCP\\seo-crawler-mcp\\build\\index.js"],
"env": {
"OUTPUT_DIR": "C:\\seo-audits",
"DEBUG": "false"
}
}
}
}
Environment Variables:
- `OUTPUT_DIR`: Directory where crawl results are saved (required)
- `DEBUG`: Set to `"true"` to enable verbose debug logging (optional, default: `"false"`)

CLI Usage for Local Development:
When running the CLI from a local build (not installed via npm), use node directly:
# Run crawl
node C:\MCP\seo-crawler-mcp\build\cli.js crawl https://example.com --max-pages=20
# Analyze results
node C:\MCP\seo-crawler-mcp\build\cli.js analyze C:\seo-audits\example.com_2026-02-02_abc123
# List queries
node C:\MCP\seo-crawler-mcp\build\cli.js queries --category=critical
Note: These examples use npx for globally installed packages. For local development, see the "Development Install" section above.
---
## CLI Mode (Terminal Usage)
For large crawls or background processing, you can run crawls directly from the terminal:
### Run a Crawl
```bash
# Basic crawl
npx @houtini/seo-crawler-mcp crawl https://example.com

# Large crawl with custom settings
npx @houtini/seo-crawler-mcp crawl https://example.com --max-pages=5000 --depth=5

# Using Googlebot user agent
npx @houtini/seo-crawler-mcp crawl https://example.com --user-agent=googlebot

# Show summary statistics
npx @houtini/seo-crawler-mcp analyze C:/seo-audits/example.com_2026-02-01_abc123

# Detailed JSON output
npx @houtini/seo-crawler-mcp analyze C:/seo-audits/example.com_2026-02-01_abc123 --format=detailed

# All queries
npx @houtini/seo-crawler-mcp queries

# Security queries only
npx @houtini/seo-crawler-mcp queries --category=security

# Critical priority queries
npx @houtini/seo-crawler-mcp queries --priority=CRITICAL
```
A typical hybrid workflow:

1. Run a large crawl from the terminal (it runs in the background, so you can close the terminal):
   `npx @houtini/seo-crawler-mcp crawl https://bigsite.com --max-pages=5000`
2. Note the output path from the crawl results:
   `Output Path: C:\seo-audits\bigsite.com_2026-02-02T10-15-30_abc123`
3. In Claude Desktop, analyze with AI:
   - "Analyze the crawl at C:\seo-audits\bigsite.com_2026-02-02T10-15-30_abc123"
   - "Show me the critical issues"
   - "What are the biggest SEO problems?"
   - "Give me a detailed report on broken internal links"
This workflow is well suited to large sites and long-running background crawls.

For everyday use inside Claude Desktop, the typical workflow goes like this:
1. Crawl the website:
   "Use seo-crawler-mcp to crawl https://example.com with maxPages=2000"
2. Run the analysis:
   "Analyse the crawl at C:/seo-audits/example.com_2026-02-01_abc123"
3. Investigate specific issues:
   "Show me the broken internal links from that crawl"

Claude handles the rest - calling the right tools, parsing the results, and presenting everything in a readable format.
If you're specifically worried about security headers:

1. List the available security queries:
   "What security checks can you run on an SEO crawl?"
2. Run a security-focused analysis:
   "Check the security issues in crawl C:/seo-audits/example.com_2026-02-01_abc123"
3. Deep-dive on specific problems:
   "Show me all pages with unsafe external links"
The analysis engine includes 25 comprehensive SEO checks across five categories: critical, content, technical, security, and opportunities.
The crawler stores everything in SQLite databases organised by domain and date:
C:/seo-audits/example.com_2026-02-01_abc123/
├── crawl-data.db # SQLite database
│ ├── pages # Every page crawled
│ ├── links # All link relationships
│ ├── errors # Crawl errors
│ └── crawl_metadata # Statistics
├── config.json # Crawl settings
└── crawl-export.csv # Optional CSV export
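Because the output is plain SQLite, you can inspect a crawl with any SQLite client. As a sketch, here's how you might list the tables with Python's standard library - the path is illustrative, and `list_tables` is a helper I've made up for this example:

```python
import sqlite3

def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of the user tables in a crawl database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

# Point this at any crawl's database (path is illustrative):
# conn = sqlite3.connect("C:/seo-audits/example.com_2026-02-01_abc123/crawl-data.db")
# Per the layout above, you'd expect: crawl_metadata, errors, links, pages.
```

Nothing leaves your machine: the same file Claude queries through the MCP tools is directly readable by your own scripts.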
Typical crawl performance metrics:
Crawl Speed:
Query Performance:
The SQLite approach works well here. Everything stays local, no API rate limits to worry about, and the query performance is more than adequate for SEO analysis.
There are 4 additional checks planned for v3.0:
The current 25 checks cover the most critical aspects of technical SEO that directly impact search engine crawling, indexing, and ranking.
Built with:
The code uses ES modules throughout, with proper Zod validation on inputs and comprehensive error handling. I've kept the architecture clean - separate modules for crawling, analysis, formatting, and tool definitions.
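The project itself validates tool inputs with Zod in TypeScript; as a rough sketch of what that validation layer does, here's the same idea in Python. The field names and defaults come from the tools reference below, but the function is hypothetical:

```python
def validate_audit_params(params: dict) -> dict:
    """Hypothetical validator mirroring a Zod-style schema for
    run_seo_audit inputs. Field names and defaults follow the
    documented tool reference."""
    url = params.get("url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("url is required and must be an http(s) URL")

    max_pages = params.get("maxPages", 1000)  # documented default: 1000
    if not isinstance(max_pages, int) or max_pages < 1:
        raise ValueError("maxPages must be a positive integer")

    user_agent = params.get("userAgent", "chrome")  # documented default
    if user_agent not in ("chrome", "googlebot"):
        raise ValueError('userAgent must be "chrome" or "googlebot"')

    return {"url": url, "maxPages": max_pages,
            "depth": params.get("depth", 3), "userAgent": user_agent}
```

Validating at the tool boundary like this means the crawler and analysis modules can assume well-formed input.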
Deployment:
Crawl a website and extract comprehensive SEO data into SQLite.
Parameters:
- `url` (required) - Website URL to crawl
- `maxPages` (optional) - Maximum pages to crawl (default: 1000)
- `depth` (optional) - Maximum crawl depth (default: 3)
- `userAgent` (optional) - `"chrome"` or `"googlebot"` (default: `"chrome"`)

Example:
run_seo_audit({
url: "https://example.com",
maxPages: 2000,
depth: 5,
userAgent: "chrome"
})
Returns: Crawl ID and output path
Run comprehensive SEO analysis on a completed crawl.
Parameters:
- `crawlPath` (required) - Path to crawl output directory
- `format` (optional) - `"structured"`, `"summary"`, or `"detailed"` (default: `"structured"`)
- `includeCategories` (optional) - Filter by categories: `"critical"`, `"content"`, `"technical"`, `"security"`, `"opportunities"`
- `maxExamplesPerIssue` (optional) - Maximum example URLs per issue (default: 10)

Example:
analyze_seo({
crawlPath: "C:/seo-audits/example.com_2026-02-01_abc123",
format: "structured",
includeCategories: ["critical", "security"],
maxExamplesPerIssue: 5
})
Returns: Structured report with issues, affected URLs, and fix recommendations
Execute specific SEO queries by name.
Parameters:
- `crawlPath` (required) - Path to crawl output directory
- `query` (required) - Query name (see `list_seo_queries`)
- `limit` (optional) - Maximum results (default: 100)

Example:
query_seo_data({
crawlPath: "C:/seo-audits/example.com_2026-02-01_abc123",
query: "broken-internal-links",
limit: 50
})
Returns: Query results with affected URLs and context
Discover available SEO analysis queries.
Parameters:
- `category` (optional) - Filter by category
- `priority` (optional) - Filter by priority level

Example:
list_seo_queries({
category: "security",
priority: "HIGH"
})
Returns: List of available queries with descriptions and priorities
The analysis engine includes 28 predefined SQL queries organised by category. Each query includes detailed impact analysis and fix recommendations.
missing-titles
broken-internal-links
server-errors
not-found-errors
duplicate-titles
duplicate-descriptions
duplicate-h1s
missing-descriptions
missing-h1s
multiple-h1s
thin-content
redirect-pages
redirect-chains
orphan-pages
canonical-issues
non-https-pages
missing-csp
missing-hsts
missing-x-frame-options
missing-referrer-policy
unsafe-external-links
protocol-relative-links
title-length-issues
description-length-issues
title-equals-h1
no-outbound-links
high-external-links
missing-images
In Claude Desktop:
List all available queries
Show me the critical queries only
Run the missing-titles query on my crawl
What does the orphan-pages query check for?
In CLI:
# List all queries
seo-crawler-mcp queries
# Filter by category
seo-crawler-mcp queries --category=security
# Filter by priority
seo-crawler-mcp queries --priority=CRITICAL
Each query returns:
# Build
npm run build
# Development mode
npm run dev
# Run tests
npm test
Apache License 2.0
Copyright 2026 Richard Baxter
This product includes software developed by Apify and the Crawlee project. See NOTICE file for details.
GitHub: https://github.com/houtini-ai/seo-crawler-mcp
Issues: https://github.com/houtini-ai/seo-crawler-mcp/issues
Author: Richard Baxter hello@houtini.com
Tags: seo, crawler, audit, technical-seo, mcp, crawlee, sqlite, web-scraping, site-analysis