Server data from the Official MCP Registry
Query Spark SQL clusters via Thrift/HiveServer2. Works with Spark, EMR, Hive, Impala.
Valid MCP server (1 strong, 5 medium validity signals). 1 code issue detected. 3 known CVEs in dependencies (0 critical, 3 high severity). Package registry verified. Imported from the Official MCP Registry. 1 finding downgraded by scanner intelligence.
12 files analyzed · 5 issues found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
This plugin requests these system permissions. Most are normal for its category.
Set these up before or after installing:
Environment variables: SPARK_HOST, SPARK_PORT, SPARK_DATABASE, SPARK_AUTH, SPARK_USERNAME, SPARK_PASSWORD, SPARK_KERBEROS_SERVICE_NAME
Add this to your MCP configuration file:
{
"mcpServers": {
"io-github-aidancorrell-spark-sql-mcp-server": {
"env": {
"SPARK_AUTH": "your-spark-auth-here",
"SPARK_HOST": "your-spark-host-here",
"SPARK_PORT": "your-spark-port-here",
"SPARK_DATABASE": "your-spark-database-here",
"SPARK_PASSWORD": "your-spark-password-here",
"SPARK_USERNAME": "your-spark-username-here",
"SPARK_KERBEROS_SERVICE_NAME": "your-spark-kerberos-service-name-here"
},
"args": [
"spark-sql-mcp-server"
],
"command": "uvx"
}
}
}

From the project's GitHub README.
An MCP server that enables AI assistants to query Spark SQL clusters via the Thrift/HiveServer2 protocol.
Works with any HiveServer2-compatible system: Apache Spark, AWS EMR, Hive, Impala, Presto.
pip install spark-sql-mcp-server
Or run directly with uvx:
uvx spark-sql-mcp-server
export SPARK_HOST="your-emr-master-node.amazonaws.com"
export SPARK_PORT="10000" # default
export SPARK_DATABASE="default" # default
export SPARK_AUTH="NONE" # NONE | LDAP | KERBEROS | CUSTOM | NOSASL
Global (all projects) — add to ~/.claude.json under your project's mcpServers:
{
"mcpServers": {
"spark-sql": {
"command": "uvx",
"args": ["spark-sql-mcp-server"],
"env": {
"SPARK_HOST": "your-emr-master-node.amazonaws.com",
"SPARK_PORT": "10000",
"SPARK_AUTH": "NONE"
}
}
}
}
Project-level — add to .claude/mcp.json in your repo:
{
"mcpServers": {
"spark-sql": {
"command": "uvx",
"args": ["spark-sql-mcp-server"],
"env": {
"SPARK_HOST": "your-emr-master-node.amazonaws.com",
"SPARK_PORT": "10000",
"SPARK_AUTH": "NONE"
}
}
}
}
Add to your claude_desktop_config.json:
{
"mcpServers": {
"spark-sql": {
"command": "uvx",
"args": ["spark-sql-mcp-server"],
"env": {
"SPARK_HOST": "your-emr-master-node.amazonaws.com",
"SPARK_PORT": "10000"
}
}
}
}
Ask Claude things like:

"… sales.transactions table"

| Tool | Description |
|---|---|
| list_databases | List all available databases |
| list_tables | List tables in a database |
| describe_table | Get table schema (columns, types) |
| execute_query | Run read-only SQL queries with formatted results |
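To exercise these tools outside of Claude, the official MCP Python SDK can drive the server over stdio. A sketch (the env values are placeholders; list_databases takes no arguments, which avoids guessing the other tools' parameter names):

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

params = StdioServerParameters(
    command="uvx",
    args=["spark-sql-mcp-server"],
    env={"SPARK_HOST": "localhost", "SPARK_PORT": "10000", "SPARK_AUTH": "NONE"},
)

async def main() -> None:
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # expect the four tools above
            result = await session.call_tool("list_databases", {})
            print(result.content)

asyncio.run(main())
```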
export SPARK_AUTH="NONE"
export SPARK_AUTH="LDAP"
export SPARK_USERNAME="your-username"
export SPARK_PASSWORD="your-password"
export SPARK_AUTH="KERBEROS"
export SPARK_KERBEROS_SERVICE_NAME="hive" # default
# Ensure you have a valid Kerberos ticket (kinit)
To reach an EMR master node that isn't directly accessible, tunnel the Thrift port over SSH:

ssh -i your-key.pem -L 10000:localhost:10000 hadoop@your-emr-master
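A quick way to confirm the tunnel is forwarding before pointing anything at it (a trivial standalone check, not part of the project):

```python
import socket

# Succeeds only if the local end of the SSH tunnel is accepting connections.
socket.create_connection(("localhost", 10000), timeout=5).close()
print("Thrift port reachable through the tunnel")
```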
Then set SPARK_HOST=localhost.

For development, clone the repo and install it in editable mode:

git clone https://github.com/aidancorrell/spark-sql-mcp-server.git
cd spark-sql-mcp-server
pip install -e ".[dev]"
pytest
ruff check .
A Docker Compose setup provides a local Spark Thrift Server with sample data for integration testing.
# Start the Spark Thrift Server
cd docker && docker compose up -d
# Wait for it to be ready (takes ~30s on first start)
docker logs -f spark-thrift-server # look for "Sample data loaded."
# Run integration tests
pytest -m integration -v
# Tear down
cd docker && docker compose down -v
The local server comes with sample tables: default.employees, default.orders, and test_db.metrics.
Unit tests run by default with pytest (integration tests are skipped unless -m integration is specified).
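One common way to get that default-skip behavior is a conftest.py hook like the sketch below; this illustrates the pattern, not necessarily how this repo implements it:

```python
# conftest.py
import pytest

def pytest_collection_modifyitems(config, items):
    # If the user passed an explicit -m expression (e.g. `-m integration`),
    # let pytest's own marker filtering decide what runs.
    if config.getoption("markexpr"):
        return
    skip = pytest.mark.skip(reason="integration test: run with `pytest -m integration`")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip)
```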
With the Docker Spark server running, add it to your MCP config to test the server interactively.
Global — add to ~/.claude.json under your project's mcpServers:
{
  "mcpServers": {
    "spark-sql": {
      "command": "uvx",
      "args": ["spark-sql-mcp-server"],
      "env": {
        "SPARK_HOST": "localhost",
        "SPARK_PORT": "10000",
        "SPARK_AUTH": "NONE"
      }
    }
  }
}
Project-level — add to .claude/mcp.json:
{
"mcpServers": {
"spark-sql": {
"command": "uvx",
"args": ["spark-sql-mcp-server"],
"env": {
"SPARK_HOST": "localhost",
"SPARK_PORT": "10000",
"SPARK_AUTH": "NONE"
}
}
}
}
Then start a new Claude Code session and ask it to query the sample data.
The execute_query tool only allows read-only SQL statements. Queries must start with one of: SELECT, SHOW, DESCRIBE, DESC, EXPLAIN, or WITH. All other statement types (DROP, INSERT, DELETE, CREATE, ALTER, SET, ADD JAR, etc.) are rejected before reaching the Spark cluster.
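The check described above amounts to allowlisting the first SQL keyword. A minimal sketch of the idea (illustrative; the server's actual validation may be stricter):

```python
READ_ONLY_PREFIXES = ("SELECT", "SHOW", "DESCRIBE", "DESC", "EXPLAIN", "WITH")

def is_read_only(sql: str) -> bool:
    # Compare the first whitespace-delimited token, case-insensitively.
    tokens = sql.strip().upper().split(None, 1)
    return bool(tokens) and tokens[0] in READ_ONLY_PREFIXES

assert is_read_only("select * from default.employees")
assert not is_read_only("DROP TABLE default.employees")
```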
Database errors are sanitized before being returned to the MCP client. Internal details such as server hostnames, file paths, and stack traces are not exposed. Connection failures report only the target host/port and error type.
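As an illustration of that contract (the names here are hypothetical, not the project's API):

```python
def sanitize_connection_error(exc: Exception, host: str, port: int) -> str:
    # Expose only the target endpoint and the exception class; raw driver
    # messages can leak hostnames, file paths, and stack traces.
    return f"Failed to connect to {host}:{port} ({type(exc).__name__})"
```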
- The SparkConfig object masks passwords in its string representation
- SPARK_PASSWORD is marked as a secret in the MCP registry schema
- Set SPARK_AUTH to LDAP or KERBEROS for authenticated environments

MIT