Server data from the Official MCP Registry
A standardized testing harness for MCP servers and agent workflows
Valid MCP server (3 strong, 7 medium validity signals). 4 known CVEs in dependencies (0 critical, 3 high severity). Package registry verified. Imported from the Official MCP Registry.
4 files analyzed · 5 issues found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
This plugin requests these system permissions. Most are normal for its category.
Set these up before or after installing:
Environment variable: YOUR_API_KEY
Add this to your MCP configuration file:
{
  "mcpServers": {
    "io-github-dbsectrainer-mcp-eval-runner": {
      "env": {
        "YOUR_API_KEY": "your-api-key-here"
      },
      "args": [
        "-y",
        "mcp-eval-runner"
      ],
      "command": "npx"
    }
  }
}

From the project's GitHub README.
npm mcp-eval-runner package
A standardized testing harness for MCP servers and agent workflows. Define test cases as YAML fixtures (steps → expected tool calls → expected outputs), run regression suites directly from your MCP client, and get pass/fail results with diffs — without leaving Claude Code or Cursor.
Tool reference | Configuration | Fixture format | Contributing | Troubleshooting | Design principles
Key features: simulation mode runs fixtures via expected_output without a server block; per-step assertions include output_contains, output_not_contains, output_equals, output_matches, schema_match, tool_called, and latency_under; a step can reference a previous step's output via {{steps.<step_id>.output}}.

Add the following config to your MCP client:
{
  "mcpServers": {
    "eval-runner": {
      "command": "npx",
      "args": ["-y", "mcp-eval-runner@latest"]
    }
  }
}
By default, eval fixtures are loaded from ./evals/ in the current working directory. To use a different path:
{
  "mcpServers": {
    "eval-runner": {
      "command": "npx",
      "args": ["-y", "mcp-eval-runner@latest", "--fixtures=~/my-project/evals"]
    }
  }
}
Amp · Claude Code · Cline · Cursor · VS Code · Windsurf · Zed
Create a file at evals/smoke.yaml. Use live mode (recommended) by including a server block:
name: smoke
description: "Verify eval runner itself is working"
server:
  command: node
  args: ["dist/index.js"]
steps:
  - id: list_check
    description: "List available test cases"
    tool: list_cases
    input: {}
    expect:
      output_contains: "smoke"
Then enter the following in your MCP client:
Run the eval suite.
Your client should return a pass/fail result for the smoke test.
Fixtures are YAML (or JSON) files placed in the fixtures directory. Each file defines one test case.
| Field | Required | Description |
|---|---|---|
| name | Yes | Unique name for the test case |
| description | No | Human-readable description |
| server | No | Server config — if present, runs in live mode; if absent, runs in simulation mode |
| steps | Yes | Array of steps to execute |
server block (live mode):

server:
  command: node            # executable to spawn
  args: ["dist/index.js"]  # arguments
  env:                     # optional environment variables
    MY_VAR: "value"
When server is present, the eval runner spawns the server as a child process, connects via the MCP stdio transport, and calls each step's tool against the live server.
steps array

Each step has the following fields:
| Field | Required | Description |
|---|---|---|
| id | Yes | Unique identifier within the fixture (used for output piping) |
| tool | Yes | MCP tool name to call |
| description | No | Human-readable step description |
| input | No | Key-value map of arguments passed to the tool (default: {}) |
| expected_output | No | Literal string used as the output in simulation mode |
| expect | No | Assertions evaluated against the step output |
Live mode — fixture has a server block: each step's tool is called against the spawned server, and assertions run on the real output.

Simulation mode — no server block: the step output is the literal value of expected_output (or an empty string if absent). output_contains assertions will always fail if expected_output is not set.

All assertions go inside a step's expect block:
expect:
  output_contains: "substring"    # output includes this text
  output_not_contains: "error"    # output must NOT include this text
  output_equals: "exact string"   # output exactly matches
  output_matches: "regex pattern" # output matches a regular expression
  tool_called: "tool_name"        # verifies which tool was called
  latency_under: 500              # latency in ms must be below this threshold
  schema_match:                   # output (parsed as JSON) matches JSON Schema
    type: object
    required: [id]
    properties:
      id:
        type: number
Multiple assertions in one expect block are all evaluated; the step fails if any assertion fails.
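To make this concrete, here is a sketch of a simulation-mode step that combines several assertions; the tool name and output value are hypothetical:

```yaml
steps:
  - id: status_check
    tool: get_status              # hypothetical tool name
    input: {}
    expected_output: '{"id": 7}'  # serves as the step output in simulation mode
    expect:
      output_contains: "id"        # output includes "id"
      output_not_contains: "error" # no "error" in output
      schema_match:                # output parses as an object with a numeric id
        type: object
        required: [id]
        properties:
          id:
            type: number
```

If any one of these assertions fails, the whole step is reported as failed.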
Reference the output of a previous step in a downstream step's input using {{steps.<step_id>.output}}:
steps:
  - id: search_step
    tool: search
    input:
      query: "mcp eval runner"
    expected_output: "result: mcp-eval-runner v1.0"
    expect:
      output_contains: "mcp-eval-runner"
  - id: summarize_step
    tool: summarize
    input:
      text: "{{steps.search_step.output}}"
    expected_output: "Summary: mcp-eval-runner v1.0"
    expect:
      output_contains: "Summary"
Piping works in both live mode and simulation mode.
create_test_case

Fixtures created with the create_test_case tool do not include a server block. They always run in simulation mode. To use live mode, add a server block manually to the generated YAML file.
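For example, a generated fixture might look like the sketch below (field values are illustrative); appending a server block by hand switches it to live mode:

```yaml
name: generated-case
description: "Created by create_test_case (illustrative)"
steps:
  - id: step_1
    tool: list_cases
    input: {}
    expected_output: "smoke"
    expect:
      output_contains: "smoke"

# Added manually to switch this fixture to live mode
# (command/args are assumptions about your build output):
server:
  command: node
  args: ["dist/index.js"]
```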
run_suite — execute all fixtures in the fixtures directory; returns a pass/fail summary
run_case — run a single fixture by name
list_cases — enumerate available fixtures with step counts and descriptions
create_test_case — create a new YAML fixture file (simulation mode; no server block)
scaffold_fixture — generate a boilerplate fixture with placeholder steps and pre-filled assertion comments
regression_report — compare the current fixture state to the last run; surfaces regressions and fixes
compare_results — diff two specific runs by run ID
generate_html_report — generate a single-file HTML report for a completed run
evaluate_deployment_gate — CI gate; fails if the recent pass rate drops below a configurable threshold
discover_fixtures — discover fixture files across one or more directories (respects FIXTURE_LIBRARY_DIRS)

--fixtures / --fixtures-dir

Directory to load YAML/JSON eval fixture files from.
Type: string
Default: ./evals
--db / --db-path

Path to the SQLite database file used to store run history.
Type: string
Default: ~/.mcp/evals.db
--timeout

Maximum time in milliseconds to wait for a single step before marking it as failed.
Type: number
Default: 30000
--watch

Watch the fixtures directory and rerun the affected fixture automatically when files change.
Type: boolean
Default: false
--format

Output format for eval results.
Type: string
Choices: console, json, html
Default: console
--concurrency

Number of test cases to run in parallel.
Type: number
Default: 1
--http-port

Start an HTTP server on this port instead of the stdio transport.
Type: number
Default: disabled (uses stdio)
Pass flags via the args property in your JSON config:
{
  "mcpServers": {
    "eval-runner": {
      "command": "npx",
      "args": ["-y", "mcp-eval-runner@latest", "--watch", "--timeout=60000"]
    }
  }
}
Before publishing a new version, verify the server with MCP Inspector to confirm all tools are exposed correctly and the protocol handshake succeeds.
Interactive UI (opens browser):
npm run build && npm run inspect
CLI mode (scripted / CI-friendly):
# List all tools
npx @modelcontextprotocol/inspector --cli node dist/index.js --method tools/list
# List resources and prompts
npx @modelcontextprotocol/inspector --cli node dist/index.js --method resources/list
npx @modelcontextprotocol/inspector --cli node dist/index.js --method prompts/list
# Call a tool (example — replace with a relevant read-only tool for this plugin)
npx @modelcontextprotocol/inspector --cli node dist/index.js \
--method tools/call --tool-name list_cases
# Call a tool with arguments
npx @modelcontextprotocol/inspector --cli node dist/index.js \
--method tools/call --tool-name run_case --tool-arg name=smoke
Run before publishing to catch regressions in tool registration and runtime startup.
New assertion types go in src/assertions.ts — implement the Assertion interface and add a test. Tests live under tests/ (unit tests) and under evals/ (eval fixtures).
npm install && npm test