Server data from the Official MCP Registry
Composable Supertone TTS toolkit: synthesis, voice search/preview/clone, usage — 31 languages
A well-structured MCP server for Supertone TTS API with proper authentication via API key and reasonable permission scope. Input validation is comprehensive across all tool parameters, and error handling is robust. The autoplay feature has minor platform limitations (macOS-only), and some subprocess handling could be slightly more defensive, but these are low-severity quality issues that do not constitute security vulnerabilities. Permissions align appropriately with the server's purpose. Supply chain analysis found 3 known vulnerabilities in dependencies (0 critical, 3 high severity). Package verification found 1 issue.
3 files analyzed · 8 issues found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
Set these up before or after installing:
Environment variable: SUPERTONE_API_KEY
Environment variable: SUPERTONE_MCP_VOICE_ID
Environment variable: SUPERTONE_OUTPUT_DIR
Add this to your MCP configuration file:
{
  "mcpServers": {
    "io-github-supertone-inc-supertone-mcp": {
      "env": {
        "SUPERTONE_API_KEY": "your-supertone-api-key-here",
        "SUPERTONE_OUTPUT_DIR": "your-supertone-output-dir-here",
        "SUPERTONE_MCP_VOICE_ID": "your-supertone-mcp-voice-id-here"
      },
      "args": [
        "supertone-mcp"
      ],
      "command": "uvx"
    }
  }
}
From the project's GitHub README.
A composable MCP toolkit for the Supertone TTS API. Rather than a single "speak this text" command, it exposes Supertone's SDK as a set of building-block tools — synthesis, voice discovery, preview, duration/credit prediction, usage tracking, and full voice-cloning CRUD — that an LLM assembles to fulfill a request. Works in Claude Desktop, Cursor, or any MCP-compatible client.
Supports 31 languages in total, including Korean, English, and Japanese. Synthesis options include speed (0.5x–2.0x), pitch shift (-24 to +24 semitones), emotion styles, per-call output mode, streaming, and model selection.
Synthesis
- text_to_speech: Convert text to audio. Per-call control of output_mode (files / resources / both), autoplay, streaming, and model, plus include_phonemes / normalized_text. Long text is auto-chunked by the SDK.
- predict_duration: Estimate audio length (and credit cost) without synthesizing.

Voice discovery (preset)
- search_voice: Filter the catalog by language, gender, age, use_case, style, model, name, or description.
- get_voice: Full detail for one voice.
- preview_voice: Sample audio URLs for a voice (filterable by language/style/model).

Custom voice cloning
- clone_voice: Create a cloned voice from a local WAV/MP3 (≤3MB).
- search_custom_voice: List/filter cloned voices.
- get_custom_voice: Full detail for one cloned voice.
- edit_custom_voice: Update name and/or description.
- delete_custom_voice: Permanently delete (irreversible).

Audio assembly
- merge_audio_files: Concatenate two or more local audio files (mp3/wav) into one via a bundled ffmpeg. Supports plain concatenation, silence gaps between clips (gap_ms), or crossfade blending (crossfade_ms). The output format is auto-detected (mixed → mp3) or forced via output_format. No system ffmpeg required.

Usage & credits
- get_credit_balance: Remaining credits.
- get_usage_history: Usage over a time window.
- get_voice_usage: Usage for a specific voice.

Migration

0.2.0 moves behavior control out of environment variables and into per-call tool parameters, so the LLM decides per request instead of the server config deciding once.

| Before (env var) | After (per-call parameter) | Note |
|---|---|---|
| SUPERTONE_MCP_OUTPUT_MODE=files\|resources\|both | text_to_speech(output_mode=...) | Default is still files |
| SUPERTONE_MCP_AUTOPLAY=true | text_to_speech(autoplay=...) | Default changed true → false (playback is now explicit) |
| (always streamed) | text_to_speech(streaming=...) | New; default false (one-shot). streaming=true requires model="sona_speech_1" |

Other changes:
- Default model changed from sona_speech_1 to sona_speech_2_flash.
- list_voices was removed (since the discovery release) and replaced by search_voice; call search_voice with no arguments to reproduce the old "list everything" behavior.

If you previously set SUPERTONE_MCP_OUTPUT_MODE or SUPERTONE_MCP_AUTOPLAY, remove them from your client config and pass output_mode / autoplay per call instead. (The server prints a one-time stderr notice if it sees the removed variables.)
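
Before upgrading, a small check can list any removed variables that are still present in an environment. The sketch below is a hypothetical helper (find_removed_env_vars is not part of supertone-mcp); it only inspects the mapping it is given and does not reproduce the server's own stderr notice.

```python
import os
import sys
from typing import Dict, Mapping

# Env vars removed in 0.2.0, mapped to the text_to_speech parameter that replaces each.
REMOVED_ENV_VARS = {
    "SUPERTONE_MCP_OUTPUT_MODE": "output_mode",
    "SUPERTONE_MCP_AUTOPLAY": "autoplay",
}


def find_removed_env_vars(environ: Mapping[str, str]) -> Dict[str, str]:
    """Return {removed_var: replacement_parameter} for each removed var still set."""
    return {var: param for var, param in REMOVED_ENV_VARS.items() if var in environ}


if __name__ == "__main__":
    # Report against the current process environment.
    for var, param in find_removed_env_vars(os.environ).items():
        print(f"{var} is no longer used; pass {param}=... to text_to_speech instead",
              file=sys.stderr)
```

Passing the environment in as a mapping keeps the check testable; you can also feed it the "env" object parsed from a client config file.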
# Using uvx (recommended)
uvx supertone-mcp
# Using pip
pip install supertone-mcp
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "supertone-tts": {
      "command": "uvx",
      "args": ["supertone-mcp"],
      "env": {
        "SUPERTONE_API_KEY": "your-api-key-here"
      }
    }
  }
}
Add to your Cursor MCP settings (same JSON shape as above).
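
If you prefer to script this change rather than edit the file by hand, the following sketch merges a server entry into an existing config without touching other servers. merge_server_entry and update_config_file are hypothetical helpers, and the config file location depends on your client, so the path is left to the caller.

```python
import json
from pathlib import Path
from typing import Any, Dict


def merge_server_entry(config: Dict[str, Any], api_key: str,
                       name: str = "supertone-tts") -> Dict[str, Any]:
    """Return a copy of config with a Supertone server under mcpServers[name].

    Other servers are preserved; an existing entry with the same name is replaced.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {
        "command": "uvx",
        "args": ["supertone-mcp"],
        "env": {"SUPERTONE_API_KEY": api_key},
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, api_key: str) -> None:
    """Read the client config (if any), merge the entry, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_server_entry(config, api_key), indent=2) + "\n",
                    encoding="utf-8")
```

Keeping the merge pure makes it easy to test; only update_config_file touches the disk. Pointing it at the file your client actually reads (for example claude_desktop_config.json) and restarting the client are still up to you.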
Only authentication and stable defaults are configured via the environment — all behavior is controlled per call.
| Variable | Required | Default | Description |
|---|---|---|---|
| SUPERTONE_API_KEY | Yes | — | Your Supertone API key |
| SUPERTONE_MCP_VOICE_ID | No | Preset voice (Aiden, multilingual) | Default voice_id for text_to_speech / predict_duration (override per call) |
| SUPERTONE_OUTPUT_DIR | No | ~/supertone-tts-output/ | Directory where audio files are saved (used by output_mode=files/both) |


Removed in 0.2.0: SUPERTONE_MCP_OUTPUT_MODE and SUPERTONE_MCP_AUTOPLAY (see Migration).
Output modes (text_to_speech output_mode)

| Mode | Returns | Use when |
|---|---|---|
| files (default) | Plain text with the saved file path + metadata | You want the file on disk |
| resources | MCP AudioContent + TextContent (no file written) | The client renders audio inline (e.g., Claude.ai chat) |
| both | File on disk and AudioContent/TextContent | You want both: preview inline and keep the file |

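
One way to reason about the three modes from the client side is as two independent switches. plan_output and OutputPlan below are hypothetical names that restate the table above; they do not describe the server's internals.

```python
from typing import NamedTuple


class OutputPlan(NamedTuple):
    write_file: bool    # audio saved under SUPERTONE_OUTPUT_DIR, path reported as text
    inline_audio: bool  # MCP AudioContent + TextContent returned in the response


# Restates the Output modes table: files = disk only, resources = inline only, both = both.
_PLANS = {
    "files": OutputPlan(write_file=True, inline_audio=False),
    "resources": OutputPlan(write_file=False, inline_audio=True),
    "both": OutputPlan(write_file=True, inline_audio=True),
}


def plan_output(output_mode: str = "files") -> OutputPlan:
    """Map an output_mode value to what the call should produce."""
    if output_mode not in _PLANS:
        raise ValueError(f"output_mode must be one of {sorted(_PLANS)}, got {output_mode!r}")
    return _PLANS[output_mode]
```

For example, a client that cannot render audio inline gains nothing from resources, since plan_output("resources").write_file is False and nothing lands on disk.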
The MCP client routes natural-language requests across these tools — the value of the toolkit is composition: the LLM chains several tools to satisfy one request.
"Find a calm Korean female voice, let me hear a sample, check the cost, then make this announcement as an mp3."
The LLM assembles:
search_voice(language="ko", gender="female", style="neutral") # find candidates
→ preview_voice(voice_id) # sample URLs to confirm the voice
→ predict_duration(text, voice_id) + get_credit_balance() # gauge cost before spending
→ text_to_speech(text, voice_id, output_format="mp3",
output_mode="files") # synthesize
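
The same chain can be written as ordinary code when you drive the server yourself. This sketch assumes a synchronous call_tool(name, arguments) callable supplied by your own MCP client wrapper (a hypothetical adapter, not a supertone-mcp API), and it starts from a voice_id already picked from search_voice results, because the shape of those results is not specified here.

```python
from typing import Any, Callable, Dict

# Hypothetical adapter type: call_tool(tool_name, arguments) -> tool result.
CallTool = Callable[[str, Dict[str, Any]], Any]


def make_announcement(call_tool: CallTool, text: str, voice_id: str) -> Dict[str, Any]:
    """Preview a chosen voice, gauge cost, then synthesize an mp3 file."""
    results: Dict[str, Any] = {}
    # Sample URLs so the user can confirm the voice.
    results["preview"] = call_tool("preview_voice", {"voice_id": voice_id})
    # Cost check before spending credits: duration estimate plus remaining balance.
    results["duration"] = call_tool("predict_duration", {"text": text, "voice_id": voice_id})
    results["balance"] = call_tool("get_credit_balance", {})
    # Synthesize to a file on disk.
    results["audio"] = call_tool("text_to_speech", {
        "text": text,
        "voice_id": voice_id,
        "output_format": "mp3",
        "output_mode": "files",
    })
    return results
```

A real flow would pause for the user's confirmation after the preview and after the cost check; this sketch runs straight through. Injecting call_tool keeps the chain testable with a fake. The official MCP Python SDK exposes tool calls as a coroutine (ClientSession.call_tool), so with that client the function would need to be async and await each call.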
"Make a cloned voice from ~/recordings/sample.wav named MyVoice, then read this greeting with it and play it for me."
The LLM assembles:
clone_voice(name="MyVoice", audio_path="~/recordings/sample.wav") # create the cloned voice
→ get_custom_voice(voice_id) # confirm it was created
→ text_to_speech(text, voice_id=<cloned>, autoplay=true) # synthesize, then play immediately
autoplay is a per-call parameter (default false), so playback happens only when explicitly requested.
text_to_speech

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | — | Text to convert (long text is auto-chunked by the SDK) |
| voice_id | string | No | SUPERTONE_MCP_VOICE_ID or preset | Voice identifier (browse via search_voice) |
| language | string | No | ko | Language code, one of 31 (ko, en, ja, …) |
| output_format | string | No | mp3 | mp3 or wav |
| model | string | No | sona_speech_2_flash | sona_speech_1, sona_speech_2, sona_speech_2_flash, sona_speech_2t, sona_speech_3t, supertonic_api_1, supertonic_api_3 |
| speed | float | No | 1.0 | 0.5–2.0 |
| pitch_shift | int | No | 0 | -24 to +24 semitones |
| style | string | No | — | Emotion style (varies by voice) |
| output_mode | string | No | files | files, resources, or both (see Output modes) |
| autoplay | bool | No | false | Play the audio locally after synthesis (macOS afplay) |
| streaming | bool | No | false | Stream synthesis. Only supported by model="sona_speech_1" |
| include_phonemes | bool | No | false | Return phoneme timing data alongside the audio |
| normalized_text | string | No | — | Pre-normalized text (only used by sona_speech_2 / sona_speech_2_flash) |

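
Several constraints in this table only surface as server errors or silently ignored arguments, so a client-side pre-flight check can catch them earlier. check_tts_args is a hypothetical helper that encodes the ranges and model rules listed above; the server remains the authority and may accept or reject inputs this sketch does not anticipate.

```python
from typing import List, Optional

MODELS = {
    "sona_speech_1", "sona_speech_2", "sona_speech_2_flash", "sona_speech_2t",
    "sona_speech_3t", "supertonic_api_1", "supertonic_api_3",
}
NORMALIZED_TEXT_MODELS = {"sona_speech_2", "sona_speech_2_flash"}


def check_tts_args(
    text: str,
    model: str = "sona_speech_2_flash",
    speed: float = 1.0,
    pitch_shift: int = 0,
    output_format: str = "mp3",
    streaming: bool = False,
    normalized_text: Optional[str] = None,
) -> List[str]:
    """Return problems with a planned text_to_speech call (empty list if none found)."""
    problems: List[str] = []
    if not text:
        problems.append("text must not be empty")
    if model not in MODELS:
        problems.append(f"unknown model {model!r}")
    if not 0.5 <= speed <= 2.0:
        problems.append("speed must be between 0.5 and 2.0")
    if not -24 <= pitch_shift <= 24:
        problems.append("pitch_shift must be between -24 and +24 semitones")
    if output_format not in ("mp3", "wav"):
        problems.append("output_format must be mp3 or wav")
    if streaming and model != "sona_speech_1":
        problems.append('streaming=True requires model="sona_speech_1"')
    if normalized_text is not None and model not in NORMALIZED_TEXT_MODELS:
        problems.append("normalized_text is only used by sona_speech_2 / sona_speech_2_flash")
    return problems
```

Returning a list instead of raising on the first issue lets a caller report every problem at once, which is handy when an LLM is assembling the arguments.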

predict_duration

Takes the same core parameters as text_to_speech (long text is auto-chunked). Returns a message such as "Predicted duration: 2.34s (credit usage is proportional to duration)."
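
If a script needs the number rather than the sentence, a regular expression can extract it. This assumes the reply keeps the exact "Predicted duration: <seconds>s" shape shown above, which is an example rather than a documented contract; parse_predicted_seconds is a hypothetical helper.

```python
import re
from typing import Optional

# Matches e.g. "Predicted duration: 2.34s"; assumes this wording stays stable.
_DURATION_RE = re.compile(r"Predicted duration:\s*(\d+(?:\.\d+)?)s")


def parse_predicted_seconds(reply: str) -> Optional[float]:
    """Return the predicted duration in seconds, or None if the reply has no match."""
    match = _DURATION_RE.search(reply)
    return float(match.group(1)) if match else None
```

Because credit usage is proportional to duration, comparing these values across candidate texts or voices gives a relative cost ranking; converting seconds to credits would need a rate that is not given here.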

search_voice

All parameters are optional. With no filters, it returns the full catalog; with any filter, the first line of the response is "Filters applied: ...".
| Parameter | Type | Description |
|---|---|---|
| language | string | e.g., ko, en, ja |
| gender | string | e.g., male, female |
| age | string | e.g., young_adult, child |
| use_case | string | e.g., narration, advertisement |
| style | string | e.g., neutral, happy |
| model | string | e.g., sona_speech_2_flash |
| name | string | Partial match |
| description | string | Partial match |

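
Because every filter is optional and an empty call returns the whole catalog, it helps to drop unset filters before calling the tool so a blank value does not accidentally narrow or break the search. build_search_voice_args and filters_were_applied are hypothetical helpers; the second relies only on the "Filters applied:" first-line convention described above.

```python
from typing import Dict, Optional

SEARCH_VOICE_FILTERS = frozenset(
    {"language", "gender", "age", "use_case", "style", "model", "name", "description"}
)


def build_search_voice_args(**filters: Optional[str]) -> Dict[str, str]:
    """Drop unset filters; an empty result asks search_voice for the full catalog."""
    unknown = set(filters) - SEARCH_VOICE_FILTERS
    if unknown:
        raise TypeError(f"unknown search_voice filter(s): {sorted(unknown)}")
    return {key: value for key, value in filters.items() if value}


def filters_were_applied(reply: str) -> bool:
    """True when the reply's first line starts with 'Filters applied:'."""
    lines = reply.splitlines()
    return bool(lines) and lines[0].startswith("Filters applied:")
```

Rejecting unknown keys early catches typos such as usecase, which would otherwise be sent to the server as-is.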

get_voice / preview_voice

| Tool | Required | Optional |
|---|---|---|
| get_voice | voice_id | — |
| preview_voice | voice_id | language, style, model (filter samples) |


clone_voice

| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Display name (non-empty) |
| audio_path | string | Yes | Local WAV or MP3 path (≤3MB). Supports ~ expansion |
| description | string | No | Optional note |

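
A local pre-check avoids a round trip for a file the server will reject. check_clone_audio is a hypothetical helper; it judges format by file extension only (it does not inspect the audio), and it reads "3MB" as 3,000,000 bytes, the stricter of the decimal and binary interpretations, since the exact limit is not stated.

```python
from pathlib import Path

# Assumption: "3MB" read as 3,000,000 bytes (stricter than 3 MiB).
MAX_CLONE_BYTES = 3_000_000
CLONE_SUFFIXES = {".wav", ".mp3"}


def check_clone_audio(audio_path: str) -> Path:
    """Expand ~, then check existence, extension (WAV/MP3), and size before clone_voice."""
    path = Path(audio_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"no such file: {path}")
    if path.suffix.lower() not in CLONE_SUFFIXES:
        raise ValueError(f"expected .wav or .mp3, got {path.suffix!r}")
    size = path.stat().st_size
    if size > MAX_CLONE_BYTES:
        raise ValueError(f"{path.name} is {size} bytes; limit assumed to be {MAX_CLONE_BYTES}")
    return path
```

The returned path is already expanded, so it can be passed straight to clone_voice as audio_path.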

| Tool | Required | Optional |
|---|---|---|
| search_custom_voice | — | name, description (partial match) |
| get_custom_voice | voice_id | — |
| edit_custom_voice | voice_id | name, description (at least one required) |
| delete_custom_voice | voice_id | — (IRREVERSIBLE) |

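
edit_custom_voice needs at least one field to change, and delete_custom_voice cannot be undone. The sketch below builds argument dicts for both with those rules enforced; the function names and the retype-to-confirm guard are hypothetical conventions, not part of the server.

```python
from typing import Dict, Optional


def build_edit_custom_voice_args(voice_id: str, name: Optional[str] = None,
                                 description: Optional[str] = None) -> Dict[str, str]:
    """edit_custom_voice needs voice_id plus at least one of name / description."""
    if not voice_id:
        raise ValueError("voice_id is required")
    args = {"voice_id": voice_id}
    if name is not None:
        args["name"] = name
    if description is not None:
        args["description"] = description
    if len(args) == 1:
        raise ValueError("pass name, description, or both")
    return args


def build_delete_custom_voice_args(voice_id: str, confirm_voice_id: str) -> Dict[str, str]:
    """Hypothetical guard for the irreversible delete: the caller must retype voice_id."""
    if not voice_id or confirm_voice_id != voice_id:
        raise ValueError("confirmation does not match voice_id; refusing to delete")
    return {"voice_id": voice_id}
```

Requiring the voice_id twice is a cheap way to make an accidental delete (for example, an LLM picking the wrong ID from a list) less likely.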

| Tool | Required | Optional |
|---|---|---|
| get_credit_balance | — | — |
| get_usage_history | — | — (reports a recent default window) |
| get_voice_usage | voice_id | — |

merge_audio_files

| Parameter | Type | Required | Description |
|---|---|---|---|
| input_paths | string[] | Yes | Two or more local mp3/wav paths (~ expansion supported). A single file is returned as-is |
| gap_ms | int | No | Silence (ms) inserted at each junction. Default 0. Mutually exclusive with crossfade_ms |
| crossfade_ms | int | No | Crossfade blend (ms) at each junction. Default 0. Mutually exclusive with gap_ms |
| output_format | string | No | Force mp3 or wav. If omitted: all inputs share an extension → that extension; mixed → mp3 |

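
The output-format rule and the gap/crossfade exclusivity are easy to check before calling the tool. resolve_merge_format and check_merge_args are hypothetical helpers that restate the table; rejecting negative durations is an added assumption.

```python
from pathlib import Path
from typing import List, Optional

SUPPORTED = {"mp3", "wav"}


def resolve_merge_format(input_paths: List[str], output_format: Optional[str] = None) -> str:
    """Forced format wins; otherwise a shared extension is kept and mixed inputs give mp3."""
    if not input_paths:
        raise ValueError("input_paths must not be empty")
    exts = {Path(p).suffix.lower().lstrip(".") for p in input_paths}
    unsupported = exts - SUPPORTED
    if unsupported:
        raise ValueError(f"unsupported input extension(s): {sorted(unsupported)}")
    if output_format is not None:
        if output_format not in SUPPORTED:
            raise ValueError("output_format must be mp3 or wav")
        return output_format
    return next(iter(exts)) if len(exts) == 1 else "mp3"


def check_merge_args(input_paths: List[str], gap_ms: int = 0, crossfade_ms: int = 0) -> None:
    """Reject argument combinations the table documents as invalid."""
    if not input_paths:
        raise ValueError("input_paths must contain at least one path")
    if gap_ms < 0 or crossfade_ms < 0:  # assumption: negative durations are meaningless
        raise ValueError("gap_ms and crossfade_ms must be non-negative")
    if gap_ms and crossfade_ms:
        raise ValueError("gap_ms and crossfade_ms are mutually exclusive")
```

Extensions are compared case-insensitively, so take1.WAV and take2.wav still resolve to wav.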
# Clone and install
git clone https://github.com/supertone-inc/supertone-mcp.git
cd supertone-mcp
uv sync
# Run tests
uv run pytest -q
# Run with coverage
uv run pytest --cov=src --cov-report=term-missing
MIT