20 MCP tools + 47 skills for ENCODE Project genomics — search, download, pipelines
Valid MCP server (2 strong, 1 medium validity signals). No known CVEs in dependencies. Package registry verified. Imported from the Official MCP Registry.
4 files analyzed · 2 issues found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
This plugin requests these system permissions. Most are normal for its category.
Add this to your MCP configuration file:
{
"mcpServers": {
"io-github-ammawla-encode-toolkit": {
"args": [
"-y",
"encode-toolkit"
],
"command": "npx"
}
}
}
From the project's GitHub README.
Search ENCODE, cross-reference 14 databases, run 7 analysis pipelines, and generate publication-ready methods — all from natural language in Claude Code.
Start from ENCODE but go everywhere: discover histone peaks, cross-reference with GWAS variants, check ClinVar pathogenicity, pull GTEx expression, analyze TF binding motifs from JASPAR, run pipelines, and generate publication-ready methods with full provenance — in one conversation.
If you use ENCODE-Toolkit, please cite:
Alex M. Mawla. (2026). ENCODE-Toolkit: an MCP server, Claude plugin, and skills suite for ENCODE genomic data access and analysis. Zenodo. https://doi.org/10.5281/zenodo.18917511
@software{mawla_2026_encode_toolkit,
author = {Mawla, Alex M.},
title = {ENCODE-Toolkit: an MCP server, Claude plugin, and skills suite for ENCODE genomic data access and analysis},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.18917511},
url = {https://doi.org/10.5281/zenodo.18917511}
}
Start a new Claude Code session and enter:
/plugin marketplace add ammawla/encode-toolkit
/plugin install encode-toolkit
That's it. All 20 tools, 47 skills, and the MCP connector are now available.
If you only need the 20 MCP tools without the 47 workflow skills:
claude mcp add encode -- uvx encode-toolkit
npx encode-toolkit
Or in MCP client config: { "command": "npx", "args": ["encode-toolkit"] }
pip install encode-toolkit
Then use encode-toolkit as the command in any MCP client configuration:
{
"mcpServers": {
"encode": {
"command": "encode-toolkit"
}
}
}
Add to your claude_desktop_config.json:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"encode": {
"command": "uvx",
"args": ["encode-toolkit"]
}
}
}
No installation needed when using uvx. Just add the config and restart Claude.
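Editing the file by hand risks overwriting other servers that are already configured. A minimal sketch of merging the entry programmatically, assuming the file is plain JSON with a top-level `mcpServers` object as in the snippet above. `add_encode_server` is a hypothetical helper and is not part of ENCODE Toolkit.

```python
import json
from pathlib import Path


def add_encode_server(config_path: Path) -> dict:
    """Merge the "encode" entry into an MCP client config, keeping other servers."""
    # Start from the existing config if there is one, else an empty document.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same entry as the Claude Desktop snippet above.
    servers["encode"] = {"command": "uvx", "args": ["encode-toolkit"]}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

On macOS this could be called as `add_encode_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")`, followed by a restart of Claude.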
Add to .vscode/mcp.json in your workspace:
{
"mcp": {
"servers": {
"encode": {
"command": "uvx",
"args": ["encode-toolkit"]
}
}
}
}
Add to .cursor/mcp.json:
{
"mcpServers": {
"encode": {
"command": "uvx",
"args": ["encode-toolkit"]
}
}
}
Add to .windsurf/mcp.json:
{
"mcpServers": {
"encode": {
"command": "uvx",
"args": ["encode-toolkit"]
}
}
}
ENCODE Toolkit integrates 14 databases through live API tools and guided skills.
| Database | Access Method | Use Case |
|---|---|---|
| ENCODE | 20 MCP tools (live API) | ChIP-seq, ATAC-seq, RNA-seq, Hi-C, WGBS, CUT&RUN data |
| GTEx | REST API (skill) | Tissue-specific gene expression across 54 tissues |
| ClinVar | E-utilities (skill) | Variant clinical significance and pathogenicity |
| GWAS Catalog | REST API (skill) | Trait-variant associations from genome-wide studies |
| JASPAR | REST API (skill) | Transcription factor binding motif profiles |
| CellxGene | Census API (skill) | Single-cell expression atlas across tissues |
| gnomAD | GraphQL (skill) | Population allele frequencies and gene constraint |
| Ensembl | REST API (skill) | VEP annotation, Regulatory Build, coordinate liftover |
| UCSC Genome Browser | REST API (skill) | cCRE tracks, TF clusters, sequence retrieval |
| GEO | E-utilities (skill) | Complementary expression/epigenomic datasets |
| PubMed | MCP server | Literature search and citation |
| bioRxiv | MCP server | Preprint discovery |
| ClinicalTrials.gov | MCP server | Clinical trial cross-reference |
| Open Targets | MCP server | Drug target identification |
Using genomics databases today means:
With ENCODE Toolkit, just tell Claude what you need:
"Find all histone ChIP-seq data for human pancreas tissue"
Claude searches ENCODE, returns a structured table of 66 experiments with targets, replicates, and file counts. Downloads are organized by experiment with MD5 verification and full provenance tracking.
Five core tools are shown below. The remaining 15 are collapsed for readability.
`encode_search_experiments`: Search ENCODE experiments with 20+ filters.
| Parameter | Type | Description |
|---|---|---|
| `assay_title` | string | Assay type: "Histone ChIP-seq", "ATAC-seq", "RNA-seq", "Hi-C", etc. |
| `organism` | string | Species (default: "Homo sapiens") |
| `organ` | string | Organ: "pancreas", "brain", "liver", "heart", "kidney", etc. |
| `biosample_type` | string | "tissue", "cell line", "primary cell", "organoid" |
| `target` | string | ChIP target: "H3K27me3", "H3K4me3", "CTCF", etc. |
| `biosample_term_name` | string | Specific biosample: "GM12878", "HepG2", etc. |
| `limit` | int | Max results (default: 25) |
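MCP clients invoke tools through a JSON-RPC 2.0 `tools/call` request. The sketch below shows the message a client might send for the pancreas example, assuming the parameter names in the table above. `build_tool_call` is a hypothetical helper and the request `id` is arbitrary; in normal use Claude builds this message for you.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request (JSON-RPC 2.0)."""
    # Drop unset filters so the server falls back to its own defaults.
    args = {k: v for k, v in arguments.items() if v is not None}
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": args},
    })


request = build_tool_call(
    "encode_search_experiments",
    {"assay_title": "Histone ChIP-seq", "organ": "pancreas",
     "biosample_type": "tissue", "target": None, "limit": 25},
)
print(request)
```

Leaving `target` unset keeps the search broad across all histone marks; setting it to a value such as "H3K27me3" would narrow it.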
`encode_get_experiment`: Get full details for a single experiment including all files, quality metrics, and audit info.
| Parameter | Type | Description |
|---|---|---|
| `accession` | string | Experiment ID (e.g., "ENCSR133RZO") |
`encode_download_files`: Download specific files by accession to a local directory.
| Parameter | Type | Description |
|---|---|---|
| `file_accessions` | list[str] | File IDs to download (e.g., ["ENCFF635JIA"]) |
| `download_dir` | string | Local path to save files |
| `organize_by` | string | "flat", "experiment", "format", "experiment_format" |
| `verify_md5` | bool | Verify file integrity (default: true) |
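The integrity check behind `verify_md5` can be pictured as comparing a locally computed digest with the checksum published in the file's ENCODE metadata (the `md5sum` field). This is a sketch of that idea under stated assumptions, not the toolkit's actual implementation; `md5_matches` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def md5_matches(path: Path, expected_md5: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's MD5 digest equals the expected hex string."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in chunks so multi-gigabyte BAM/FASTQ files never sit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.strip().lower()
```

A mismatch usually means a truncated or corrupted transfer, so the sensible response is to delete the file and download it again.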
`encode_batch_download`: Search + download in one step. Runs in preview mode by default.
| Parameter | Type | Description |
|---|---|---|
| `download_dir` | string | Local path to save files |
| `file_format` | string | File format to download |
| `assay_title` | string | Assay type filter |
| `organ` | string | Organ filter |
| `dry_run` | bool | Preview only (default: true). Set false to download. |
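Defaulting `dry_run` to true means a broad filter cannot start a very large transfer by accident. The pattern can be sketched as below. `FileHit`, the local `batch_download` function, and the accession `ENCFF000AAA` are illustrative placeholders, and the actual transfer step is stubbed out.

```python
from dataclasses import dataclass


@dataclass
class FileHit:
    accession: str
    size_bytes: int


def batch_download(hits: list[FileHit], dry_run: bool = True) -> dict:
    """Summarize what would be fetched; only 'download' when dry_run is False."""
    summary = {
        "file_count": len(hits),
        "total_gb": round(sum(h.size_bytes for h in hits) / 1e9, 2),
        "downloaded": [],
    }
    if not dry_run:
        # Placeholder for the real transfer step.
        summary["downloaded"] = [h.accession for h in hits]
    return summary


hits = [FileHit("ENCFF635JIA", 1_500_000_000), FileHit("ENCFF000AAA", 500_000_000)]
preview = batch_download(hits)                 # nothing is fetched
confirmed = batch_download(hits, dry_run=False)
```

Reviewing the preview's file count and total size before rerunning with `dry_run` set to false is the intended workflow.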
`encode_track_experiment`: Track an experiment locally with its publications, methods, and pipeline info.
| Parameter | Type | Description |
|---|---|---|
| `accession` | string | Experiment ID to track |
| `fetch_publications` | bool | Fetch associated publications (default: true) |
| `fetch_pipelines` | bool | Fetch pipeline/analysis info (default: true) |
| `notes` | string | Optional notes to attach |
`encode_list_files`: List files for a specific experiment with format/type filters.
| Parameter | Type | Description |
|---|---|---|
| `experiment_accession` | string | Experiment ID |
| `file_format` | string | "fastq", "bam", "bed", "bigWig", "bigBed", etc. |
| `output_type` | string | "reads", "peaks", "signal", "alignments", etc. |
| `assembly` | string | "GRCh38", "mm10", etc. |
| `preferred_default` | bool | Only return recommended files |
`encode_search_files`: Search files across all experiments with combined experiment + file filters.
| Parameter | Type | Description |
|---|---|---|
| `file_format` | string | File format filter |
| `assay_title` | string | Assay type of parent experiment |
| `organ` | string | Organ of parent experiment |
| `target` | string | ChIP/CUT&RUN target |
| `output_type` | string | Output type filter |
| `assembly` | string | Genome assembly |
`encode_get_metadata`: List valid filter values for any parameter.
| Parameter | Type | Description |
|---|---|---|
| `metadata_type` | string | "assays", "organisms", "organs", "biosample_types", "file_formats", "output_types", "assemblies" |
`encode_get_facets`: Get live counts from ENCODE showing what data exists for given filters.
| Parameter | Type | Description |
|---|---|---|
| `assay_title` | string | Pre-filter by assay |
| `organism` | string | Pre-filter by organism |
| `organ` | string | Pre-filter by organ |
`encode_get_file_info`: Get detailed metadata for a single file.
| Parameter | Type | Description |
|---|---|---|
| `accession` | string | File ID (e.g., "ENCFF635JIA") |
`encode_manage_credentials`: Store, check, or clear ENCODE credentials for restricted data access.
| Parameter | Type | Description |
|---|---|---|
| `action` | string | "store", "check", or "clear" |
| `access_key` | string | ENCODE access key (for "store") |
| `secret_key` | string | ENCODE secret key (for "store") |
`encode_list_tracked`: List all experiments in your local tracker with metadata, publication counts, and derived file counts.
| Parameter | Type | Description |
|---|---|---|
| `assay_title` | string | Filter by assay type |
| `organism` | string | Filter by organism |
| `organ` | string | Filter by organ |
`encode_get_citations`: Get publications for tracked experiments. Export as BibTeX or RIS for reference managers.
| Parameter | Type | Description |
|---|---|---|
| `accession` | string | Specific experiment (or all if omitted) |
| `export_format` | string | "json" (default), "bibtex", or "ris" |
`encode_compare_experiments`: Analyze whether two experiments are compatible for combined analysis.
| Parameter | Type | Description |
|---|---|---|
| `accession1` | string | First experiment ID |
| `accession2` | string | Second experiment ID |
`encode_summarize_collection`: Get grouped statistics of your tracked experiment collection.
| Parameter | Type | Description |
|---|---|---|
| `assay_title` | string | Filter by assay type |
| `organism` | string | Filter by organism |
| `organ` | string | Filter by organ |
`encode_log_derived_file`: Log a file you created from ENCODE data for provenance tracking.
| Parameter | Type | Description |
|---|---|---|
| `file_path` | string | Path to your derived file |
| `source_accessions` | list[str] | ENCODE accessions this was derived from |
| `description` | string | What the file contains |
| `tool_used` | string | Tool/software used |
| `parameters` | string | Command or parameters used |
`encode_get_provenance`: View provenance chains from derived files back to source ENCODE data.
| Parameter | Type | Description |
|---|---|---|
| `file_path` | string | Get provenance for a specific file |
| `source_accession` | string | List all files derived from an accession |
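To make the provenance idea concrete, here is a sketch of what one logged entry could contain. The first five fields mirror the `encode_log_derived_file` parameters above. The `sha256` and `logged_at` fields, and the `provenance_record` helper itself, are assumptions added for illustration and may not match what the toolkit stores.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def provenance_record(file_path: Path, source_accessions: list[str],
                      description: str, tool_used: str, parameters: str) -> dict:
    """Build a provenance entry linking a derived file to its ENCODE sources."""
    return {
        "file_path": str(file_path),
        "source_accessions": sorted(source_accessions),
        "description": description,
        "tool_used": tool_used,
        "parameters": parameters,
        # Assumed extras: a content hash detects later edits to the derived file,
        # and a UTC timestamp records when the entry was logged.
        "sha256": hashlib.sha256(file_path.read_bytes()).hexdigest(),
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
```

Looking up a file then amounts to filtering such records by `file_path`, and listing everything derived from an accession amounts to filtering by membership in `source_accessions`.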
`encode_export_data`: Export tracked experiments as a table (CSV, TSV, or JSON) for Excel, R, pandas.
| Parameter | Type | Description |
|---|---|---|
| `format` | string | "csv" (default), "tsv", or "json" |
| `assay_title` | string | Filter by assay type |
`encode_link_reference`: Link external references (PubMed, bioRxiv, ClinicalTrials, GEO) to tracked experiments.
| Parameter | Type | Description |
|---|---|---|
| `experiment_accession` | string | ENCODE experiment accession |
| `reference_type` | string | "pmid", "doi", "nct_id", "preprint_doi", "geo_accession", "other" |
| `reference_id` | string | The identifier value |
`encode_get_references`: Get external references linked to tracked experiments for cross-server workflows.
| Parameter | Type | Description |
|---|---|---|
| `experiment_accession` | string | Filter by experiment (optional) |
| `reference_type` | string | Filter by type (optional) |
Most ENCODE data is public and requires no authentication. Just install and use.
For restricted/unreleased data, ask Claude: "Store my ENCODE credentials"
Credentials are encrypted using your OS keyring (macOS Keychain, Linux Secret Service, Windows Credential Locker) and never stored in plaintext. Get your access keys from your ENCODE profile.
When installed as a Claude Code plugin, ENCODE Toolkit includes 47 literature-backed workflow skills that guide Claude through complex genomics tasks. Each analysis skill includes evidence-based quality thresholds, assay-specific metrics, and citations to primary literature.
| Skill | Description |
|---|---|
| `setup` | Install and configure the ENCODE Toolkit server |
| `search-encode` | Search and explore ENCODE experiments and files |
| `download-encode` | Download files with organization and verification |
| `track-experiments` | Track experiments, citations, and provenance locally |
| `cross-reference` | Connect ENCODE data to PubMed, bioRxiv, ClinicalTrials.gov |
| Skill | Description |
|---|---|
| `quality-assessment` | Evaluate experiment quality using ENCODE metrics: assay-specific thresholds for ChIP-seq (FRiP, NSC, RSC, NRF, IDR), ATAC-seq (TSS enrichment, NFR ratio), RNA-seq (mapping rate, gene body coverage), WGBS (bisulfite conversion, CpG coverage), Hi-C (cis/trans ratio), and CUT&RUN/CUT&Tag. Backed by Landt 2012, Buenrostro 2013, ENCODE Phase 3 (2020), Li 2011 |
| `integrative-analysis` | Combine multiple experiments with batch effect awareness: integration strategies (peak overlap, signal correlation, DiffBind, DESeq2, ChromHMM, ABC model). Backed by Ernst & Kellis 2012, Ross-Innes 2012, Love 2014, Fulco 2019 |
| `regulatory-elements` | Discover enhancers, promoters, insulators from combinatorial histone marks: ENCODE cCRE classification (926,535 elements), ChromHMM state interpretation. Backed by ENCODE Phase 3 (2020), Roadmap Epigenomics (2015), Whyte 2013 |
| `epigenome-profiling` | Build comprehensive chromatin state profiles: three-tiered histone panels, ChromHMM 15-state model, bivalent chromatin analysis. References the chromatin biology catalog |
| `compare-biosamples` | Compare experiments across tissues and cell types: biosample hierarchy, tissue-specific regulation, batch effect detection. Backed by Roadmap Epigenomics (2015), Leek 2010 |
| `visualization-workflow` | Generate publication-quality visualizations: genome browser tracks, heatmaps, and signal profiles |
| `motif-analysis` | Discover and analyze TF binding motifs in regulatory regions using HOMER, MEME, and JASPAR |
| `peak-annotation` | Annotate genomic peaks with features (promoter/enhancer/intergenic), nearest genes, and functional categories |
| `batch-analysis` | Batch processing and QC screening across multiple ENCODE experiments with systematic quality filtering |
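Of the ChIP-seq metrics named under `quality-assessment`, FRiP (fraction of reads in peaks) is the simplest to illustrate. The toy sketch below assumes a single chromosome, reads reduced to one position each, and half-open peak intervals. Real workflows compute this from BAM and BED files, and `frip` is a hypothetical function name.

```python
def frip(read_positions: list[int], peaks: list[tuple[int, int]]) -> float:
    """Fraction of reads whose position falls inside any peak [start, end)."""
    if not read_positions:
        return 0.0
    in_peaks = sum(
        any(start <= pos < end for start, end in peaks) for pos in read_positions
    )
    return in_peaks / len(read_positions)


reads = [5, 15, 25, 35, 120, 150, 300, 410]
peaks = [(10, 40), (100, 200)]
print(frip(reads, peaks))  # 5 of 8 reads fall in peaks -> 0.625
```

The skill then compares such a value against the assay-specific threshold from the cited literature, which this sketch deliberately leaves out.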
| Skill | Description |
|---|---|
| `functional-screen-analysis` | Analyze CRISPR screens, MPRA, and STARR-seq data from ENCODE: MAGeCK, BAGEL2, MPRAflow integration |
| Skill | Description |
|---|---|
| `histone-aggregation` | Union merge of histone ChIP-seq peaks across studies: signalValue-based noise filtering, sample-of-origin tagging, ENCODE blacklist removal. Backed by ChIP-Atlas (Oki 2018), Amemiya 2019, Perna 2024 |
| `accessibility-aggregation` | Union merge of ATAC-seq and DNase-seq peaks: cross-platform integration, peak summit preservation. Backed by Corces 2017, Amemiya 2019, Zhao 2020 |
| `hic-aggregation` | Union catalog of Hi-C chromatin loops (BEDPE): resolution-aware anchor matching, loop caller concordance tracking. Backed by Loop Catalog (Reyna 2025), Mustache (Roayaei Ardakany 2020) |
| `methylation-aggregation` | Aggregate WGBS methylation profiles: per-CpG weighted averaging, HMR/UMR/PMD identification. Backed by Schultz 2015, DMRcate (Peters 2021), Zhou 2020 |
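The "union merge" with sample-of-origin tagging used by the peak aggregation skills can be illustrated with a plain interval merge. This is a sketch of the general technique only: it skips signalValue filtering and blacklist removal, merges book-ended intervals as well as overlapping ones, and uses hypothetical names (`union_merge`, `expA`, `expB`).

```python
Peak = tuple[str, int, int, str]           # chrom, start, end, sample
Merged = tuple[str, int, int, list[str]]   # chrom, start, end, samples of origin


def union_merge(peaks: list[Peak]) -> list[Merged]:
    """Merge overlapping peaks per chromosome and record contributing samples."""
    merged: list[Merged] = []
    # Sorting by (chrom, start, end) lets one pass catch every overlap.
    for chrom, start, end, sample in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end, samples = merged[-1]
            if sample not in samples:
                samples.append(sample)
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end), samples)
        else:
            merged.append((chrom, start, end, [sample]))
    return merged


peaks = [
    ("chr1", 100, 200, "expA"), ("chr1", 150, 300, "expB"),
    ("chr1", 500, 600, "expA"), ("chr2", 100, 200, "expB"),
]
print(union_merge(peaks))
```

Keeping the sample list on each merged region is what later lets you ask how many experiments support a given peak.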
| Skill | Description |
|---|---|
| `scrna-meta-analysis` | Cross-study meta-analysis of scRNA-seq data: reproducibility assessment, TIN-based quality filtering, ambient RNA quantification. Backed by Tran 2020, Luecken & Theis 2019, Stuart 2019, Korsunsky 2019 |
| `multi-omics-integration` | Integrate RNA-seq, ATAC-seq, Histone ChIP-seq, and TF ChIP-seq: ABC model regulatory predictions, signal correlation. Backed by Fulco 2019, Corces 2018, ENCODE Phase 3 (2020) |
| Skill | Description |
|---|---|
| `data-provenance` | Full reproducibility tracking: tool versions, reference files, scripts, exact commands, timestamps, source-to-derived provenance chains |
| `cite-encode` | Generate proper citations, BibTeX/RIS export, data availability statements |
| `variant-annotation` | Annotate GWAS/disease variants with ENCODE functional data: variant-to-gene mapping via cCREs. Backed by Finucane 2015, Maurano 2012 |
| `pipeline-guide` | Understand ENCODE uniform analysis pipelines and output types: pipeline specifications, Nextflow integration |
| `single-cell-encode` | Work with scRNA-seq and scATAC-seq data: platform comparison, cross-study integration, WNN multimodal analysis. Backed by Hao 2021, Stuart 2019 |
| `disease-research` | Disease-focused workflows: GWAS variant interpretation, disease-tissue mapping, heritability enrichment, drug target identification via Open Targets. Backed by Buniello 2019, Finucane 2015 |
| `publication-trust` | Publication integrity assessment: 5-level trust scoring, retraction/erratum detection, citation analysis. Integrates with PubMed, bioRxiv, and Consensus |
| `bioinformatics-installer` | Install all bioinformatics tools for ENCODE analyses: 7 conda environment YAMLs, 3 install scripts, 134+ tools across ChIP-seq, ATAC-seq, RNA-seq, WGBS, Hi-C, DNase-seq, CUT&RUN |
| `scientific-writing` | Generate publication-ready methods sections, figure legends, supplementary tables, and data availability statements with full tool citations |
| `liftover-coordinates` | Convert genomic coordinates between assembly versions (hg19/hg38, mm9/mm10) using UCSC liftOver, CrossMap, Ensembl REST API, and rtracklayer |
| Skill | Description |
|---|---|
| `gtex-expression` | Query GTEx tissue expression data via REST API for gene expression context across 54 tissues |
| `clinvar-annotation` | Annotate variants with ClinVar clinical significance, pathogenicity, and review status |
| `cellxgene-context` | Query CellxGene single-cell atlas for cell type expression context across tissues |
| `gwas-catalog` | Search NHGRI-EBI GWAS Catalog for trait associations, risk alleles, and study metadata |
| `jaspar-motifs` | Query JASPAR database for transcription factor binding motifs and matrix profiles |
| `ensembl-annotation` | Ensembl VEP variant annotation, Regulatory Build, coordinate liftover, gene lookup via REST API |
| `geo-connector` | Search NCBI GEO for complementary datasets, cross-reference with ENCODE, FTP downloads |
| `gnomad-variants` | gnomAD population allele frequencies, gene constraint (LOEUF/pLI), structural variants via GraphQL |
| `ucsc-browser` | UCSC Genome Browser REST API for cCRE tracks, TF binding clusters, and sequence retrieval |
| Pipeline | Assay | Aligner | Caller |
|---|---|---|---|
| `pipeline-chipseq` | ChIP-seq | BWA-MEM | MACS2 + IDR |
| `pipeline-atacseq` | ATAC-seq | Bowtie2 | MACS2 (Tn5-adjusted) |
| `pipeline-rnaseq` | RNA-seq | STAR | RSEM + Kallisto |
| `pipeline-wgbs` | WGBS | Bismark | MethylDackel |
| `pipeline-hic` | Hi-C | BWA | Juicer + HiCCUPS |
| `pipeline-dnaseseq` | DNase-seq | BWA | Hotspot2 |
| `pipeline-cutandrun` | CUT&RUN | Bowtie2 | SEACR |
Each pipeline includes a SKILL.md overview, 5-stage reference files (preprocessing through QC), a complete Nextflow DSL2 pipeline, a Dockerfile, and deployment configurations for local, SLURM, GCP, and AWS.
| File | Description |
|---|---|
| `skills/histone-aggregation/references/histone-marks-reference.md` | Comprehensive chromatin biology catalog (1,442 lines): 21 histone marks with writers/erasers/readers, 5 novel acylation marks, ChromHMM state models (5 to 51 states), TF co-binding patterns, chromatin remodeling complexes, DNA methylation-chromatin interplay, nucleosome dynamics, 3D genome organization, chromatin in disease. 74 primary references |
| `skills/*/references/literature.md` | 33 per-skill literature reference documents: ~250 papers cataloged with DOI, PMID, citation counts, and skill-relevant key findings |
Most genomics tools give you one thing. ENCODE Toolkit gives you the full research loop:
| Capability | ENCODE Toolkit | Typical MCP servers |
|---|---|---|
| Live database access | 20 ENCODE tools plus skills and connectors covering 14 databases | Single database, read-only |
| Executable pipelines | 7 Nextflow DSL2 pipelines with Docker and cloud configs | None |
| Provenance tracking | Full audit trail from source data to derived files | None |
| Publication output | BibTeX/RIS citations, auto-generated methods sections | None |
| Literature backing | 100+ primary references with assay-specific QC thresholds | None |
| Workflow skills | 47 guided skills covering search to publication | Static documentation |
| Category | Assays |
|---|---|
| Histone/Chromatin | Histone ChIP-seq, TF ChIP-seq, ATAC-seq, DNase-seq, CUT&RUN, CUT&Tag, MNase-seq |
| Transcription | RNA-seq, total RNA-seq, small RNA-seq, long read RNA-seq, CAGE, RAMPAGE, PRO-seq, GRO-seq |
| 3D Genome | Hi-C, intact Hi-C, Micro-C, ChIA-PET, HiChIP, PLAC-seq, 5C |
| DNA Methylation | WGBS, RRBS, MeDIP-seq, MRE-seq |
| Functional | STARR-seq, MPRA, CRISPR screen, eCLIP, iCLIP |
| Single Cell | scRNA-seq, snATAC-seq, 10x multiome, SHARE-seq, Parse SPLiT-seq |
| Perturbation | CRISPRi + RNA-seq, shRNA + RNA-seq, siRNA + RNA-seq |
Supported file formats: fastq, bam, bed, bigWig, bigBed, tsv, csv, hic, tagAlign, bedpe, pairs, fasta, vcf, tar
Step-by-step walkthroughs showing real Claude sessions, including actual API output and scientific interpretation.
| Vignette | Skills Demonstrated |
|---|---|
| 01 — Discovery & Search | Facets, search, metadata, quality-aware selection |
| 02 — Download & Track | File listing, download, tracking, citations, provenance |
| 03 — Epigenomics Workflow | Histone marks, ATAC-seq, aggregation skills |
| 04 — Variant & Disease Research | GWAS catalog, ClinVar, GTEx, JASPAR, gnomAD |
| 05 — Expression & Single-Cell | RNA-seq, scRNA-seq, GTEx, CellxGene, meta-analysis |
| 06 — Motif & Regulatory Analysis | TF ChIP-seq, chromatin states, HOMER/MEME |
| 07 — 3D Genome & Methylation | Hi-C loops, WGBS methylation, integrative analysis |
| 08 — Pipeline Execution | ChIP-seq/ATAC-seq/RNA-seq pipelines, Nextflow |
| 09 — Cross-Reference & Integration | GEO, PubMed, Ensembl, UCSC, multi-omics |
Every skill has a dedicated vignette in docs/skill-vignettes/ with a complete example session. Highlights:
| Skill | Vignette Scenario |
|---|---|
| data-provenance | Download, blacklist-filter, liftover, auto-generate methods section |
| histone-aggregation | Union merge of H3K27ac across 5 pancreas experiments |
| variant-annotation | rs7903146 in TCF7L2 with islet enhancer evidence scoring |
| pipeline-chipseq | Full Nextflow pipeline execution with ENCODE QC thresholds |
| gwas-catalog | T2D GWAS variants overlaid on islet H3K27ac enhancers |
| publication-trust | Trust assessment of artemisinin transdifferentiation claim |
| scrna-meta-analysis | 3-study islet integration following Mawla et al. 2019 framework |
See the full showcase for 15 detailed examples.
git clone https://github.com/ammawla/encode-toolkit.git
cd encode-toolkit
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
Run the server locally:
encode-toolkit
Run tests:
pytest
Check that uvx is installed: pip install uv or curl -LsSf https://astral.sh/uv/install.sh | sh
Use encode_get_facets to see what data actually exists for your filters
Use encode_get_metadata to check valid filter values
Dr. Alex M. Mawla, PhD
AGPL-3.0. See LICENSE for full terms.
For commercial licensing inquiries: ammawla@ucdavis.edu