How do I install Supertone?

Supertone is a local plugin. Install the supertone-mcp package from PyPI, add the generated configuration snippet to your AI app's MCP config file, and then restart the app.

Is Supertone safe to use?

Supertone was flagged by MCP Marketplace's security scan, scoring 4.5/10 (high risk). It has 3 high or critical findings to review. Review the security report on this page carefully before installing it.

What credentials does Supertone need?

Supertone requires the following credentials or environment variables: SUPERTONE_API_KEY, SUPERTONE_MCP_VOICE_ID, SUPERTONE_OUTPUT_DIR. You can find setup instructions on the server detail page.

What AI apps work with Supertone?

Supertone uses the Model Context Protocol (MCP) and works with any MCP-compatible AI app, including Claude, ChatGPT / Codex, Gemini, Copilot, Cursor, and more.

Supertone MCP Server

by Supertone Inc

Developer Tools · Use Caution · 4.5 · MCP Registry · Local

Free

Server data from the Official MCP Registry

About

Composable Supertone TTS toolkit: synthesis, voice search/preview/clone, usage — 31 languages

Security Report

4.5 · Use Caution · High Risk

A well-structured MCP server for Supertone TTS API with proper authentication via API key and reasonable permission scope. Input validation is comprehensive across all tool parameters, and error handling is robust. The autoplay feature has minor platform limitations (macOS-only), and some subprocess handling could be slightly more defensive, but these are low-severity quality issues that do not constitute security vulnerabilities. Permissions align appropriately with the server's purpose. Supply chain analysis found 3 known vulnerabilities in dependencies (0 critical, 3 high severity). Package verification found 1 issue.

3 files analyzed · 8 issues found

Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.

Permissions Required

This plugin requests these system permissions. Most are normal for its category.

env_vars

Check that this permission is expected for this type of plugin.

HTTP Network Access

Connects to external APIs or services over the internet.

File System Read

Reads files on your machine. Normal for tools that analyze or process local data.

File System Write

Writes or modifies files on your machine. Check that this is expected for the tool.

process_spawn

Check that this permission is expected for this type of plugin.

What You'll Need

Set these up before or after installing:

Supertone API key (required)

Environment variable: SUPERTONE_API_KEY

Default voice_id for synthesis, falling back to a preset voice (optional)

Environment variable: SUPERTONE_MCP_VOICE_ID

Directory where audio files are saved, default ~/supertone-tts-output/ (optional)

Environment variable: SUPERTONE_OUTPUT_DIR

How to Install

Add this to your MCP configuration file:

{
  "mcpServers": {
    "io-github-supertone-inc-supertone-mcp": {
      "env": {
        "SUPERTONE_API_KEY": "your-supertone-api-key-here",
        "SUPERTONE_OUTPUT_DIR": "your-supertone-output-dir-here",
        "SUPERTONE_MCP_VOICE_ID": "your-supertone-mcp-voice-id-here"
      },
      "args": [
        "supertone-mcp"
      ],
      "command": "uvx"
    }
  }
}
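
The same entry can be produced programmatically, which helps when the optional variables should be left out instead of filled with placeholder strings. This is a minimal sketch, not part of the project: the helper names `build_mcp_entry` and `merge_into_config` are hypothetical, and only the keys shown in the snippet above are used.

```python
import json


def build_mcp_entry(api_key, output_dir=None, voice_id=None):
    """Return an mcpServers entry that launches supertone-mcp through uvx."""
    env = {"SUPERTONE_API_KEY": api_key}
    # Optional variables are omitted entirely when no value is supplied.
    if output_dir:
        env["SUPERTONE_OUTPUT_DIR"] = output_dir
    if voice_id:
        env["SUPERTONE_MCP_VOICE_ID"] = voice_id
    return {"command": "uvx", "args": ["supertone-mcp"], "env": env}


def merge_into_config(config, entry, name="io-github-supertone-inc-supertone-mcp"):
    """Return a copy of a client config with the server entry added under `name`."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    entry = build_mcp_entry("your-supertone-api-key-here")
    print(json.dumps(merge_into_config({}, entry), indent=2))
```

Existing servers in the config are preserved, and the input dictionary is not modified.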

Documentation

From the project's GitHub README.

supertone-mcp

A composable MCP toolkit for the Supertone TTS API. Rather than a single "speak this text" command, it exposes Supertone's SDK as a set of building-block tools — synthesis, voice discovery, preview, duration/credit prediction, usage tracking, and full voice-cloning CRUD — that an LLM assembles to fulfill a request. Works in Claude Desktop, Cursor, or any MCP-compatible client.

Covers 31 languages, including Korean, English, and Japanese. Controls include speed (0.5x–2.0x), pitch shift (-24 to +24 semitones), emotion styles, per-call output mode, streaming, and model selection.

Features

Synthesis

text_to_speech — Convert text to audio. Per-call control of output_mode (files / resources / both), autoplay, streaming, model, plus include_phonemes / normalized_text. Long text is auto-chunked by the SDK.
predict_duration — Estimate audio length (and credit cost) without synthesizing.

Voice discovery (preset)

search_voice — Filter the catalog by language, gender, age, use_case, style, model, name, or description.
get_voice — Full detail for one voice.
preview_voice — Sample audio URLs for a voice (filterable by language/style/model).

Custom voice cloning

clone_voice — Create a cloned voice from a local WAV/MP3 (≤3MB).
search_custom_voice — List/filter cloned voices.
get_custom_voice — Full detail for one cloned voice.
edit_custom_voice — Update name and/or description.
delete_custom_voice — Permanently delete (irreversible).

Audio assembly

merge_audio_files — Concatenate two or more local audio files (mp3/wav) into one via a bundled ffmpeg. Supports plain concat, silence gaps between clips (gap_ms), or crossfade blending (crossfade_ms). Output format auto-detected (mixed → mp3) or forced via output_format. No system ffmpeg required.

Usage & credits

get_credit_balance — Remaining credits.
get_usage_history — Usage over a time window.
get_voice_usage — Usage for a specific voice.

Breaking changes & migration (0.2.0)

0.2.0 moves behavior control out of environment variables and into per-call tool parameters — so the LLM decides per request, not the server config.

Before (env var)	After (per-call parameter)	Note
`SUPERTONE_MCP_OUTPUT_MODE=files|resources|both`	`text_to_speech(output_mode=...)`	Default still `files`
`SUPERTONE_MCP_AUTOPLAY=true`	`text_to_speech(autoplay=...)`	Default changed `true` → `false` (playback is now explicit)
(always streamed)	`text_to_speech(streaming=...)`	New, default `false` (one-shot). `streaming=true` requires `model="sona_speech_1"`

Other changes:

Default model changed sona_speech_1 → sona_speech_2_flash.
list_voices was removed (since the discovery release) and replaced by search_voice — call it with no arguments to reproduce the old "list everything" behavior.
No more hard 300-character limit — longer text is auto-chunked by the SDK (credit/latency scale with length).

If you previously set SUPERTONE_MCP_OUTPUT_MODE or SUPERTONE_MCP_AUTOPLAY, remove them from your client config and pass output_mode / autoplay per call instead. (The server prints a one-time stderr notice if it sees the removed vars.)
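
The cleanup described above can be scripted. The following is a rough sketch under the assumption that the client config uses the `mcpServers` / `env` layout shown elsewhere on this page; the function name `strip_removed_vars` is hypothetical and not part of supertone-mcp.

```python
import copy

# Environment variables removed in 0.2.0, per the migration notes.
REMOVED_VARS = ("SUPERTONE_MCP_OUTPUT_MODE", "SUPERTONE_MCP_AUTOPLAY")


def strip_removed_vars(config):
    """Return (cleaned_config, removed), leaving the input config untouched.

    `removed` maps each server name to the variables that were dropped, so the
    old values can be passed as output_mode / autoplay per call instead.
    """
    cleaned = copy.deepcopy(config)
    removed = {}
    for name, server in cleaned.get("mcpServers", {}).items():
        env = server.get("env", {})
        dropped = {var: env.pop(var) for var in REMOVED_VARS if var in env}
        if dropped:
            removed[name] = dropped
    return cleaned, removed
```

Returning the dropped values makes it easy to see which per-call parameters reproduce the previous behavior.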

Installation

# Using uvx (recommended)
uvx supertone-mcp

# Using pip
pip install supertone-mcp

Configuration

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "supertone-tts": {
      "command": "uvx",
      "args": ["supertone-mcp"],
      "env": {
        "SUPERTONE_API_KEY": "your-api-key-here"
      }
    }
  }
}

Cursor

Add to your Cursor MCP settings (same JSON shape as above).

Environment Variables

Only authentication and stable defaults are configured via the environment — all behavior is controlled per call.

Variable	Required	Default	Description
`SUPERTONE_API_KEY`	Yes	—	Your Supertone API key
`SUPERTONE_MCP_VOICE_ID`	No	preset voice (Aiden, multilingual)	Default `voice_id` for `text_to_speech` / `predict_duration` (override per call)
`SUPERTONE_OUTPUT_DIR`	No	`~/supertone-tts-output/`	Directory where audio files are saved (used by `output_mode=files`/`both`)

Removed in 0.2.0: SUPERTONE_MCP_OUTPUT_MODE and SUPERTONE_MCP_AUTOPLAY (see "Breaking changes & migration (0.2.0)").
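
To illustrate how the three variables and their documented defaults fit together, here is a minimal sketch of reading them. It is an assumption about how such settings could be loaded, not the server's actual code; `load_settings` and the returned dictionary keys are hypothetical.

```python
import os
from pathlib import Path

DEFAULT_OUTPUT_DIR = "~/supertone-tts-output/"  # default documented in the table above


def load_settings(environ=None):
    """Read the Supertone variables from a mapping (os.environ by default)."""
    env = os.environ if environ is None else environ
    api_key = env.get("SUPERTONE_API_KEY")
    if not api_key:
        # The API key is the only required variable.
        raise RuntimeError("SUPERTONE_API_KEY is not set")
    return {
        "api_key": api_key,
        # None means "let the server fall back to its preset voice".
        "voice_id": env.get("SUPERTONE_MCP_VOICE_ID"),
        "output_dir": Path(env.get("SUPERTONE_OUTPUT_DIR", DEFAULT_OUTPUT_DIR)).expanduser(),
    }
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test.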

Output modes (`text_to_speech` `output_mode`)

Mode	Returns	Use when
`files` (default)	Plain text with the saved file path + metadata	You want the file on disk
`resources`	MCP `AudioContent` + `TextContent` (no file written)	The client renders audio inline (e.g., Claude.ai chat)
`both`	File on disk and `AudioContent`/`TextContent`	You want both — preview inline, keep the file
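
The table can be read as two independent switches: whether a file is written and whether audio content is returned inline. A small sketch of that reading follows; the `OutputPlan` type and `plan_output` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputPlan:
    writes_file: bool            # audio is saved to the output directory
    returns_audio_content: bool  # audio is returned inline to the client


_PLANS = {
    "files": OutputPlan(writes_file=True, returns_audio_content=False),
    "resources": OutputPlan(writes_file=False, returns_audio_content=True),
    "both": OutputPlan(writes_file=True, returns_audio_content=True),
}


def plan_output(output_mode="files"):
    """Map an output_mode value to what the call is expected to produce."""
    try:
        return _PLANS[output_mode]
    except KeyError:
        raise ValueError(
            f"output_mode must be one of {sorted(_PLANS)}, got {output_mode!r}"
        ) from None
```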

Usage Examples

The MCP client routes natural-language requests across these tools — the value of the toolkit is composition: the LLM chains several tools to satisfy one request.

Example 1 — Discover → preview → estimate cost → synthesize

"Find a calm Korean female voice, let me hear a sample, check the cost, then make this announcement as an mp3."

The LLM assembles:

search_voice(language="ko", gender="female", style="neutral")   # find candidates
  → preview_voice(voice_id)                                       # sample URLs to confirm the voice
  → predict_duration(text, voice_id) + get_credit_balance()       # gauge cost before spending
  → text_to_speech(text, voice_id, output_format="mp3",
                   output_mode="files")                           # synthesize

Example 2 — Clone my voice → use it right away

"Make a cloned voice from ~/recordings/sample.wav named MyVoice, then read this greeting with it and play it for me."

The LLM assembles:

clone_voice(name="MyVoice", audio_path="~/recordings/sample.wav")   # create the cloned voice
  → get_custom_voice(voice_id)                                       # confirm it was created
  → text_to_speech(text, voice_id=<cloned>, autoplay=true)           # synthesize, then play immediately

autoplay is a per-call parameter (default false), so playback happens only when explicitly requested.

Tool Parameters

`text_to_speech`

Parameter	Type	Required	Default	Description
`text`	string	Yes	—	Text to convert (long text is auto-chunked by the SDK)
`voice_id`	string	No	env or preset	Voice identifier (browse via `search_voice`)
`language`	string	No	`ko`	Language code — one of 31 (`ko`, `en`, `ja`, …)
`output_format`	string	No	`mp3`	`mp3` or `wav`
`model`	string	No	`sona_speech_2_flash`	`sona_speech_1`, `sona_speech_2`, `sona_speech_2_flash`, `sona_speech_2t`, `sona_speech_3t`, `supertonic_api_1`, `supertonic_api_3`
`speed`	float	No	`1.0`	0.5–2.0
`pitch_shift`	int	No	`0`	-24 to +24 semitones
`style`	string	No	—	Emotion style (varies by voice)
`output_mode`	string	No	`files`	`files`, `resources`, or `both` (see Output modes)
`autoplay`	bool	No	`false`	Play the audio locally after synthesis (macOS `afplay`)
`streaming`	bool	No	`false`	Stream synthesis. Only supported by `model="sona_speech_1"`
`include_phonemes`	bool	No	`false`	Return phoneme timing data alongside the audio
`normalized_text`	string	No	—	Pre-normalized text (only used by `sona_speech_2` / `sona_speech_2_flash`)
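
The constraints in this table can be checked on the client side before a call is made. The sketch below encodes only the ranges and allowed values listed above; `validate_tts_args` is a hypothetical helper, and the server may apply additional or different checks.

```python
SUPPORTED_MODELS = {
    "sona_speech_1", "sona_speech_2", "sona_speech_2_flash",
    "sona_speech_2t", "sona_speech_3t", "supertonic_api_1", "supertonic_api_3",
}


def validate_tts_args(text, output_format="mp3", model="sona_speech_2_flash",
                      speed=1.0, pitch_shift=0, output_mode="files",
                      streaming=False):
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    if not text or not text.strip():
        problems.append("text must be a non-empty string")
    if output_format not in ("mp3", "wav"):
        problems.append("output_format must be 'mp3' or 'wav'")
    if model not in SUPPORTED_MODELS:
        problems.append(f"unknown model: {model!r}")
    if not 0.5 <= speed <= 2.0:
        problems.append("speed must be between 0.5 and 2.0")
    if not -24 <= pitch_shift <= 24:
        problems.append("pitch_shift must be between -24 and +24 semitones")
    if output_mode not in ("files", "resources", "both"):
        problems.append("output_mode must be 'files', 'resources', or 'both'")
    if streaming and model != "sona_speech_1":
        # Streaming is documented as supported only by sona_speech_1.
        problems.append("streaming=true requires model='sona_speech_1'")
    return problems
```

Note that the default model is `sona_speech_2_flash`, so enabling `streaming` without also setting `model` is reported as a problem.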

`predict_duration`

Same core parameter schema as text_to_speech (long text is auto-chunked). Returns a line such as "Predicted duration: 2.34s (credit usage is proportional to duration)."
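
If a script needs the numeric value, it can be pulled out of the response text. This sketch assumes the response keeps the "Predicted duration: <seconds>s" wording quoted above; the helper name `parse_predicted_duration` is hypothetical.

```python
import re

# Matches the "Predicted duration: 2.34s" prefix of the response line.
_DURATION = re.compile(r"Predicted duration:\s*([0-9]+(?:\.[0-9]+)?)s")


def parse_predicted_duration(response_text):
    """Return the predicted duration in seconds as a float."""
    match = _DURATION.search(response_text)
    if match is None:
        raise ValueError("no predicted duration found in response")
    return float(match.group(1))
```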

`search_voice`

All parameters are optional. With no filters, the full catalog is returned. With any filter, the first response line is Filters applied: ....

Parameter	Type	Description
`language`	string	e.g., `ko`, `en`, `ja`
`gender`	string	e.g., `male`, `female`
`age`	string	e.g., `young_adult`, `child`
`use_case`	string	e.g., `narration`, `advertisement`
`style`	string	e.g., `neutral`, `happy`
`model`	string	e.g., `sona_speech_2_flash`
`name`	string	partial match
`description`	string	partial match
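
The filter semantics can be pictured with an in-memory catalog. This is an illustrative sketch only: the record layout (plain dictionaries, with some fields holding lists) is an assumption, the partial match is assumed to be case-insensitive, and `filter_voices` is a hypothetical name, not the server's implementation.

```python
EXACT_FIELDS = ("language", "gender", "age", "use_case", "style", "model")
PARTIAL_FIELDS = ("name", "description")


def _as_set(value):
    """Treat scalar and list-valued fields uniformly."""
    if isinstance(value, (list, tuple, set)):
        return set(value)
    return {value}


def filter_voices(catalog, **filters):
    """Return voices matching every supplied filter; no filters returns all."""
    active = {key: value for key, value in filters.items() if value is not None}
    unknown = set(active) - set(EXACT_FIELDS) - set(PARTIAL_FIELDS)
    if unknown:
        raise ValueError(f"unknown filters: {sorted(unknown)}")
    matches = []
    for voice in catalog:
        exact_ok = all(
            active[field] in _as_set(voice.get(field))
            for field in EXACT_FIELDS if field in active
        )
        partial_ok = all(
            active[field].lower() in str(voice.get(field, "")).lower()
            for field in PARTIAL_FIELDS if field in active
        )
        if exact_ok and partial_ok:
            matches.append(voice)
    return matches
```

Calling `filter_voices(catalog)` with no keyword arguments mirrors the "full catalog" behavior described above.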

`get_voice` / `preview_voice`

Tool	Required	Optional
`get_voice`	`voice_id`	—
`preview_voice`	`voice_id`	`language`, `style`, `model` (filter samples)

`clone_voice`

Parameter	Type	Required	Description
`name`	string	Yes	Display name (non-empty)
`audio_path`	string	Yes	Local WAV or MP3 path (≤3MB). Supports `~` expansion
`description`	string	No	Optional note
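
A source file can be checked locally before calling clone_voice. The sketch below is a hedged example: it reads "3MB" as 3 × 1024 × 1024 bytes, which is an assumption (the server may count differently), and `check_clone_source` is a hypothetical helper.

```python
from pathlib import Path

MAX_BYTES = 3 * 1024 * 1024          # assumption: "3MB" interpreted as 3 MiB
ALLOWED_SUFFIXES = {".wav", ".mp3"}  # formats listed for audio_path


def check_clone_source(audio_path):
    """Expand `~`, then verify extension, existence, and size. Returns the Path."""
    path = Path(audio_path).expanduser()
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"expected a .wav or .mp3 file, got {path.suffix!r}")
    if not path.is_file():
        raise FileNotFoundError(f"no such file: {path}")
    size = path.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes; the limit is {MAX_BYTES}")
    return path
```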

Custom voice CRUD

Tool	Required	Optional
`search_custom_voice`	—	`name`, `description` (partial match)
`get_custom_voice`	`voice_id`	—
`edit_custom_voice`	`voice_id`	`name`, `description` (at least one required)
`delete_custom_voice`	`voice_id`	— (IRREVERSIBLE)

Usage & credits

Tool	Required	Optional
`get_credit_balance`	—	—
`get_usage_history`	—	— (reports a recent default window)
`get_voice_usage`	`voice_id`	—

`merge_audio_files`

Parameter	Type	Required	Description
`input_paths`	string[]	Yes	Two or more local mp3/wav paths (`~` expansion supported). A single file is returned as-is
`gap_ms`	int	No	Silence (ms) inserted at each junction. Default `0`. Mutually exclusive with `crossfade_ms`
`crossfade_ms`	int	No	Crossfade blend (ms) at each junction. Default `0`. Mutually exclusive with `gap_ms`
`output_format`	string	No	Force `mp3` or `wav`. If omitted: all-same-ext → that ext; mixed → `mp3`
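
The option rules in this table (mutual exclusivity of `gap_ms` and `crossfade_ms`, and output format auto-detection) can be sketched as a small resolver. This only models the documented rules and does not touch ffmpeg; `resolve_merge_format` is a hypothetical name, and rejecting negative values is an added assumption.

```python
from pathlib import Path

SUPPORTED = {"mp3", "wav"}


def resolve_merge_format(input_paths, gap_ms=0, crossfade_ms=0, output_format=None):
    """Validate merge options and return the output format that would be used."""
    if not input_paths:
        raise ValueError("input_paths must contain at least one file")
    if gap_ms < 0 or crossfade_ms < 0:
        raise ValueError("gap_ms and crossfade_ms must not be negative")
    if gap_ms and crossfade_ms:
        raise ValueError("gap_ms and crossfade_ms are mutually exclusive")
    extensions = {Path(p).suffix.lower().lstrip(".") for p in input_paths}
    unsupported = extensions - SUPPORTED
    if unsupported:
        raise ValueError(f"unsupported input formats: {sorted(unsupported)}")
    if output_format is not None:
        if output_format not in SUPPORTED:
            raise ValueError("output_format must be 'mp3' or 'wav'")
        return output_format  # an explicit format always wins
    # All inputs share one extension: keep it. Mixed inputs fall back to mp3.
    return extensions.pop() if len(extensions) == 1 else "mp3"
```

For example, two `.wav` inputs resolve to `wav`, while a `.wav` plus an `.mp3` resolves to `mp3` unless `output_format` is given.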

Development

# Clone and install
git clone https://github.com/supertone-inc/supertone-mcp.git
cd supertone-mcp
uv sync

# Run tests
uv run pytest -q

# Run with coverage
uv run pytest --cov=src --cov-report=term-missing

License

MIT
