Server data from the Official MCP Registry
Pseudonymizes sensitive data before it reaches cloud LLMs and restores it on the way back.
Valid MCP server (2 strong, 4 medium validity signals). No known CVEs in dependencies. Package registry verified. Imported from the Official MCP Registry. Trust signals: trusted author (3/3 approved).
5 files analyzed · 1 issue found
Security scores are indicators to help you make informed decisions, not guarantees. Always review permissions before connecting any MCP server.
Add this to your MCP configuration file:
{
  "mcpServers": {
    "io-github-woladi-pseudonym-mcp": {
      "args": [
        "-y",
        "pseudonym-mcp"
      ],
      "command": "npx"
    }
  }
}
From the project's GitHub README.
Local privacy proxy for LLMs — pseudonymizes sensitive data before it reaches the cloud, then restores it on the way back.
Sits between your application and any cloud LLM (Claude, GPT-4, Gemini…). Replaces PII with opaque tokens locally before the prompt ever leaves your machine, then seamlessly restores original values in the response — so users never see the tags.
- A language detector (detectLanguage()) infers the language from text content; --lang remains the authoritative override but is no longer the only input.
- Tokens such as [PERSON:1] map back to originals in an isolated, per-request session. Multiple round-trips preserve token coherence.
- Engine modes: regex only (no Ollama required), llm only, or hybrid (default).
❌ Without pseudonym-mcp:
"John Smith, SSN 123-45-6789, card 4111 1111 1111 1111" → sent verbatim to OpenAI / Anthropic servers
✅ With pseudonym-mcp:
"[PERSON:1], SSN [SSN:1], card [CREDIT_CARD:1]" before it leaves your machine
pseudonym-mcp directly addresses the regulatory challenges of using cloud AI in data-sensitive contexts.
The EU General Data Protection Regulation (GDPR) classifies names, national ID numbers (like SSN or PESEL), bank account numbers (IBAN), email addresses, credit card numbers, and phone numbers as personal data under Article 4(1). Sending this data to a cloud LLM provider constitutes processing under Article 4(2) and triggers a range of obligations:
| GDPR Article | Obligation | How pseudonym-mcp helps |
|---|---|---|
| Art. 5(1)(c) | Data minimisation — only necessary data should be processed | Strips PII before transmission; the LLM receives only what it needs to reason |
| Art. 25 | Privacy by design and by default | Pseudonymization layer is built into the MCP transport, not bolted on |
| Art. 32 | Security of processing — appropriate technical measures | Local token substitution is a recognized technical measure under Recital 83 |
| Art. 44 | Transfers to third countries — requires safeguards | If no personal data is transferred, Art. 44 restrictions do not apply |
| Art. 4(5) | Pseudonymisation — explicitly recognized as a protective measure | Tokens are opaque; re-identification requires access to the local mapping store |
Note: Pseudonymisation under GDPR (Art. 4(5)) does not equal anonymisation — the data is still personal data in your system. However, it substantially reduces risk and demonstrates compliance with the accountability principle (Art. 5(2)).
The EU AI Act (in force from 2024) places additional requirements on high-risk AI systems that process personal data. Using pseudonym-mcp as an intermediary layer:
While GDPR originates in the EU, pseudonym-mcp is equally relevant for:
| Sector | Relevant regulation | PII types commonly handled |
|---|---|---|
| Healthcare | GDPR + HIPAA + national health data laws | Patient names, SSN, diagnoses |
| Banking & Finance | GDPR + PCI DSS + PSD2 + DORA | Credit cards, IBAN, SSN, PESEL |
| HR & Recruitment | GDPR Art. 9 (special categories) | Names, national IDs, contact details |
| Legal | GDPR + attorney-client privilege | Names, case numbers, personal details |
| Insurance | GDPR + Solvency II | Personal identifiers, health data |
| Public Sector (US) | CCPA + state privacy laws | SSN, driver's license numbers |
| Public Sector (PL) | GDPR + UODO + KRI | PESEL, NIP, REGON |
Your App / Claude Desktop
             │
             │  prompt with PII
             ▼
┌─────────────────────────┐
│  pseudonym-mcp          │
│                         │
│  Phase 1: Regex NER     │  ← SSN, CREDIT_CARD, EMAIL, PHONE (en)
│                         │  ← PESEL, IBAN, EMAIL, PHONE (pl)
│  Phase 2: Ollama NER    │  ← PERSON, ORG (local LLM)
│  MappingStore (session) │  ← [TAG:N] ↔ original value
└────────────┬────────────┘
             │  sanitized prompt (no PII)
             ▼
       Cloud LLM API
  (Claude / GPT-4 / Gemini)
             │
             │  response with [TAG:N] tokens
             ▼
┌─────────────────────────┐
│  pseudonym-mcp          │
│  unmask_text / revert   │  ← tokens → originals
└────────────┬────────────┘
             │  restored response
             ▼
      Your App / User
English (--lang en, default):
[PERSON:1] John Smith
[SSN:1] 123-45-6789
[CREDIT_CARD:1] 4111 1111 1111 1111
[ORG:1] Acme Corp
[EMAIL:1] john@acme.com
[PHONE:1] (555) 123-4567
Polish (--lang pl):
[PERSON:1] Jan Kowalski
[PESEL:1] 90010112318
[ORG:1] Auto-Lux
[IBAN:1] PL27114020040000300201355387
[EMAIL:1] jan@example.pl
[PHONE:1] +48 123 456 789
The mapping is stored in a session-scoped in-memory store. Each mask_text call returns a session_id; pass it back to unmask_text to restore originals.
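The session-scoped mapping can be pictured with a small sketch. This is not the project's actual MappingStore; the class name `SessionMap` and its methods are hypothetical, and the only things taken from the text above are the `[TAG:N]` token format and the idea of an in-memory, per-session map.

```typescript
// Hypothetical sketch of a session-scoped token map (not the project's MappingStore).
class SessionMap {
  private toToken = new Map<string, string>();    // "TAG\0value" -> token
  private toOriginal = new Map<string, string>(); // token -> original value
  private counters = new Map<string, number>();   // per-tag running counter

  // Return the existing token for a value, or allocate the next [TAG:N].
  tokenFor(tag: string, value: string): string {
    const key = `${tag}\u0000${value}`;
    const existing = this.toToken.get(key);
    if (existing !== undefined) return existing;
    const n = (this.counters.get(tag) ?? 0) + 1;
    this.counters.set(tag, n);
    const token = `[${tag}:${n}]`;
    this.toToken.set(key, token);
    this.toOriginal.set(token, value);
    return token;
  }

  // Replace every known [TAG:N] token with its original; unknown tokens are left as-is.
  unmask(text: string): string {
    return text.replace(/\[[A-Z_]+:\d+\]/g, (tok) => this.toOriginal.get(tok) ?? tok);
  }
}

// Usage: the same value always yields the same token within one session.
const demo = new SessionMap();
const person = demo.tokenFor("PERSON", "Jan Kowalski"); // "[PERSON:1]"
const restored = demo.unmask(`Follow up with ${person}`); // "Follow up with Jan Kowalski"
```

Keeping the map in memory and keyed by session means nothing is written to disk in this sketch, and discarding the session discards the ability to re-identify.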
You have a note:
Meeting with Jan Kowalski (PESEL: 90010112318) from Acme sp. z o.o.
We discussed a contract for 45 000 zł. Contact: jan.kowalski@acme.pl
In Claude Code you type:
Use mask_text on this note, then summarize the key points of the meeting.
pseudonym-mcp replaces PII locally before sending to Claude:
Meeting with [PERSON:1] ([PESEL:1]) from [ORG:1].
We discussed a contract for 45 000 zł. Contact: [EMAIL:1]
Claude responds (sees tokens only):
Meeting with [PERSON:1] from [ORG:1] covered a contract
for 45 000 zł. Follow up via [EMAIL:1].
pseudonym-mcp restores originals locally:
Meeting with Jan Kowalski from Acme sp. z o.o. covered
a contract for 45 000 zł. Follow up via jan.kowalski@acme.pl
Anthropic / OpenAI never saw any real data. The entire swap happens on your machine.
Reusing a session_id across multiple prompts:
# mask the entire vault once and save the session_id
Use mask_text on my notes — remember the session_id
# ask Claude anything across multiple prompts
Summarize all meetings from Q1
# Claude replies with tokens; restore originals
Use unmask_text with session_id abc123 on the response
The session_id keeps the token map alive for the entire session — the same [PERSON:1] always refers to the same person, no matter how many times they appear across different notes.
pseudonym-mcp ships two built-in prompt templates that chain masking, an LLM task, and unmasking into a single workflow — no glue code needed.
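pseudonymize_task (inline text):
/pseudonymize_task text="Meeting with Jan Kowalski (PESEL: 90010112318). Contract: 45 000 zł." task="Extract action items"
What happens:
1. The text is masked locally; PII is replaced with tokens such as [PERSON:1], [PESEL:1].
2. The LLM performs the task on the masked text.
3. Tokens in the result are restored to the original values.
Optional lang argument: en (default) or pl.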
privacy_scan_file (file / PDF, macOS only):
Requires macos-vision-mcp, a separate MCP server that uses Apple's Vision framework to extract text from PDFs and images.
/privacy_scan_file filePath="/Users/me/contracts/nda.pdf" task="Summarize obligations and deadlines"
What happens: text is extracted from the file via macos-vision-mcp, then masked, processed by the LLM, and unmasked in the same chain as pseudonymize_task.
Optional arguments: task (default: summarize the key points), lang (en or pl).
Step 1 — Add to your MCP client (example for Claude Code — no install needed):
claude mcp add pseudonym-mcp -- npx -y pseudonym-mcp --engines hybrid
Step 2 — (Optional) Pull an Ollama model for full hybrid NER:
ollama pull llama3
Skip this step if you only need regex-based masking (--engines regex).
Global install: if you prefer npm install -g pseudonym-mcp, replace npx -y pseudonym-mcp with pseudonym-mcp in all snippets below.
Restart your client. The mask_text and unmask_text tools appear automatically.
| Tool | What it does | Example prompt |
|---|---|---|
| mask_text | Pseudonymize PII in text. Returns masked_text + session_id. | "Use mask_text on this customer letter before summarizing it" |
| unmask_text | Restore original values from a session. Pass the session_id returned by mask_text. | "Use unmask_text with session_id X to restore the response" |
mask_text input:
{
  "text": "John Smith (SSN: 123-45-6789) works at Acme Corp.",
  "session_id": "optional; omit to create a new session",
  "custom_literals": ["John Smith", "Acme Corp"]
}
mask_text output:
{
  "session_id": "3f2a1b...",
  "masked_text": "[PERSON:1] (SSN: [SSN:1]) works at [ORG:1].",
  "auto_unmask": false
}
unmask_text input:
{
  "text": "The case concerns [PERSON:1] at [ORG:1].",
  "session_id": "3f2a1b..."
}
mcp-config.json (project root):
{
  "lang": "en",
  "engines": "hybrid",
  "ollamaModel": "llama3",
  "ollamaBaseUrl": "http://localhost:11434",
  "autoUnmask": false,
  "strictValidation": true,
  "customLiterals": ["Jan Kowalski", "78091512345", "+48 123 456 789"]
}
| Key | Values | Default | Description |
|---|---|---|---|
| lang | en, pl | en | Language pack for regex rules |
| engines | regex, llm, hybrid | hybrid | Which NER engines to run |
| ollamaModel | any Ollama model name | llama3 | Local LLM for entity detection |
| ollamaBaseUrl | URL | http://localhost:11434 | Ollama API endpoint |
| autoUnmask | true, false | false | Auto-restore tokens in LLM responses |
| strictValidation | true, false | true | Enable checksum / format validation (SSN area check, Luhn for cards, PESEL checksum) |
| customLiterals | string[] | [] | Specific strings always redacted regardless of engine (names, IDs, phone numbers) |
All config keys can be overridden at startup (highest priority):
pseudonym-mcp --lang en --engines regex --ollama-model llama3 --auto-unmask
| Flag | Description |
|---|---|
| --lang | Language for regex rules: en or pl (default: en) |
| --engines | regex, llm, or hybrid (default: hybrid) |
| --ollama-model | Ollama model to use for NER |
| --ollama-base-url | Ollama base URL |
| --config | Path to a custom JSON config file |
| --auto-unmask | Enable automatic response de-tokenization |
| --custom-literals | Comma-separated strings to always redact, e.g. "Jan Kowalski,78091512345" |
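The precedence described above (built-in defaults, then the JSON config file, then CLI flags with highest priority) can be sketched as a layered merge. The `Config` shape and default values mirror the tables above; the function names `resolveConfig` and `dropUndefined` are hypothetical and are not taken from the project's source.

```typescript
interface Config {
  lang: "en" | "pl";
  engines: "regex" | "llm" | "hybrid";
  ollamaModel: string;
  ollamaBaseUrl: string;
  autoUnmask: boolean;
  strictValidation: boolean;
  customLiterals: string[];
}

// Defaults as listed in the config table.
const DEFAULTS: Config = {
  lang: "en",
  engines: "hybrid",
  ollamaModel: "llama3",
  ollamaBaseUrl: "http://localhost:11434",
  autoUnmask: false,
  strictValidation: true,
  customLiterals: [],
};

// Remove keys whose value is undefined so an unset flag never overrides a lower layer.
function dropUndefined<T extends object>(obj: T): Partial<T> {
  return Object.fromEntries(
    Object.entries(obj).filter(([, value]) => value !== undefined),
  ) as Partial<T>;
}

// Later spreads win: defaults < config file < CLI flags.
function resolveConfig(fileConfig: Partial<Config>, cliFlags: Partial<Config>): Config {
  return { ...DEFAULTS, ...dropUndefined(fileConfig), ...dropUndefined(cliFlags) };
}

// Example: the file selects Polish, the CLI forces regex-only mode.
const effective = resolveConfig({ lang: "pl", engines: "hybrid" }, { engines: "regex" });
```

In this example `effective.lang` comes from the file, `effective.engines` from the CLI, and everything else from the defaults.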
For Claude Code:
claude mcp add pseudonym-mcp -- npx -y pseudonym-mcp --engines hybrid
Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "pseudonym-mcp": {
      "command": "npx",
      "args": ["-y", "pseudonym-mcp", "--engines", "hybrid"]
    }
  }
}
Add to ~/.cursor/mcp.json:
{
  "mcpServers": {
    "pseudonym-mcp": {
      "command": "npx",
      "args": ["-y", "pseudonym-mcp", "--engines", "regex"]
    }
  }
}
| Tag | Detection | Match |
|---|---|---|
| CUSTOM | Exact match (case-insensitive) against customLiterals config or custom_literals tool param | Exact string |
Custom literals are applied after the regex phase and before LLM NER, regardless of engine mode. Longest literals are matched first to prevent partial substitution.
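A minimal sketch of longest-first, case-insensitive literal matching follows. It assumes tokens take the form `[CUSTOM:N]` (consistent with the tag table above) and that differently cased occurrences share one token; `maskCustomLiterals` and `escapeRegExp` are hypothetical helpers, not the project's code.

```typescript
// Escape regex metacharacters so a literal such as "+48 123 456 789" matches verbatim.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function maskCustomLiterals(
  text: string,
  literals: string[],
): { masked: string; map: Map<string, string> } {
  const map = new Map<string, string>();     // token -> first matched text
  const byLower = new Map<string, string>(); // lowercased literal -> token
  // Sort longest first so "Jan Kowalski" wins over "Jan" at the same position.
  const sorted = literals.filter((l) => l.length > 0).sort((a, b) => b.length - a.length);
  if (sorted.length === 0) return { masked: text, map };

  const pattern = new RegExp(sorted.map(escapeRegExp).join("|"), "gi");
  const masked = text.replace(pattern, (match) => {
    const key = match.toLowerCase();
    let token = byLower.get(key);
    if (token === undefined) {
      token = `[CUSTOM:${byLower.size + 1}]`;
      byLower.set(key, token);
      map.set(token, match);
    }
    return token;
  });
  return { masked, map };
}
```

Because regex alternation tries alternatives left to right, ordering the literals by descending length is what prevents a short literal from consuming part of a longer one.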
| Tag | Pattern | Validation |
|---|---|---|
| EMAIL | RFC 5321-compatible | Format match |
| IBAN | Generic IBAN (CC + 2 check + BBAN) | Format match |
| IP | IPv4 (all octets 0–255) | Format match |
| URL | http:// / https:// URLs | Format match |
| PHONE | International +CC prefix format | Format match |
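As an illustration of a format-match rule from this table, the IPv4 entry (all octets 0–255) could be expressed roughly as below. The pattern is one possible sketch and is not copied from the project's source; `OCTET`, `IPV4`, and `findIPv4` are hypothetical names.

```typescript
// One octet in the range 0-255, written as an alternation of digit shapes.
const OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";

// Four octets separated by dots, bounded so that digits are not clipped out of longer numbers.
const IPV4 = new RegExp(`\\b${OCTET}(?:\\.${OCTET}){3}\\b`, "g");

// Return every IPv4-looking substring whose octets are all within range.
function findIPv4(text: string): string[] {
  return text.match(IPV4) ?? [];
}
```

The word boundaries stop the pattern from matching the tail of an out-of-range value such as 256.1.1.1.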
English (--lang en, default):
| Tag | Pattern | Validation |
|---|---|---|
| SSN | XXX-XX-XXXX (US Social Security Number) | Area number check (rejects 000, 666, 900+) |
| CREDIT_CARD | 13–19 digits (Visa, Mastercard, Amex, Discover) | Luhn checksum |
| EMAIL | RFC 5321-compatible | Format match |
| PHONE | +1 (XXX) XXX-XXXX, XXX-XXX-XXXX, XXX.XXX.XXXX | Format match |
| ZIP_CODE | XXXXX or XXXXX-XXXX (paranoid mode only) | Format match |
| PERSON | Full names | Ollama NER (hybrid / llm engines) |
| ORG | Company / organization names | Ollama NER (hybrid / llm engines) |
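The two validations named in the English table can be sketched as follows. The Luhn algorithm is standard; the SSN check implements only the area-number rule stated above (reject 000, 666, 900+). The function names `luhnValid` and `ssnAreaValid` are hypothetical, and the project's own validators may differ in detail.

```typescript
// Luhn checksum over a 13-19 digit card number; spaces and dashes are ignored.
function luhnValid(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk from the rightmost digit; double every second digit.
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// SSN format XXX-XX-XXXX with the area-number rule from the table.
function ssnAreaValid(ssn: string): boolean {
  const m = /^(\d{3})-\d{2}-\d{4}$/.exec(ssn);
  if (m === null) return false;
  const area = Number(m[1]);
  return area !== 0 && area !== 666 && area < 900;
}
```

Checks like these reduce false positives: a 16-digit order number that fails Luhn is left alone when strictValidation is enabled.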
Polish (--lang pl):
| Tag | Pattern | Validation |
|---|---|---|
| PESEL | 11-digit national ID | Full checksum (weights [1,3,7,9,1,3,7,9,1,3]) |
| IBAN | PL + 26 digits, compact or spaced | Format match |
| EMAIL | RFC 5321-compatible | Format match |
| PHONE | +48 / 0048 prefix, 9-digit mobile, landline (XX) XXX-XX-XX | Format match |
| NIP | 10-digit tax ID (strict / paranoid modes) | Checksum (weights [6,5,7,2,3,4,5,6,7]) |
| POSTAL_CODE | XX-XXX (paranoid mode only) | Format match |
| PERSON | Full names | Ollama NER (hybrid / llm engines) |
| ORG | Company / organization names | Ollama NER (hybrid / llm engines) |
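The PESEL and NIP checksums use the weights listed in the table. A sketch of both is shown below, following the commonly documented rules (PESEL: check digit is 10 minus the weighted sum modulo 10, taken modulo 10; NIP: check digit is the weighted sum modulo 11, and a result of 10 is invalid). The function names are hypothetical.

```typescript
const PESEL_WEIGHTS = [1, 3, 7, 9, 1, 3, 7, 9, 1, 3];
const NIP_WEIGHTS = [6, 5, 7, 2, 3, 4, 5, 6, 7];

// Multiply the leading digits by their weights and add the products.
function weightedSum(digits: string, weights: number[]): number {
  return weights.reduce((sum, w, i) => sum + w * Number(digits[i]), 0);
}

function peselValid(pesel: string): boolean {
  if (!/^\d{11}$/.test(pesel)) return false;
  const check = (10 - (weightedSum(pesel, PESEL_WEIGHTS) % 10)) % 10;
  return check === Number(pesel[10]);
}

function nipValid(nip: string): boolean {
  if (!/^\d{10}$/.test(nip)) return false;
  const check = weightedSum(nip, NIP_WEIGHTS) % 11;
  return check !== 10 && check === Number(nip[9]);
}
```

The PESEL from the Polish example, 90010112318, passes this check. An 11-digit value that fails it, such as 78091512345 from the config example, would not be matched by a checksum-validated rule, which is the kind of case customLiterals can cover.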
pseudonym-mcp includes a lightweight heuristic language detector based on franc.
It infers the language from text content and returns a structured result:
detectLanguage('Umowa zostaje zawarta na czas nieokreślony')
// → { detected: 'pl', source: 'text', raw: 'pol', confidence: 0.94 }
detectLanguage('Hello')
// → { detected: 'unknown', source: 'fallback', raw: null, confidence: null }
| Field | Description |
|---|---|
| detected | 'pl', 'en', or 'unknown' |
| source | 'text': franc ran and mapped successfully; 'fallback': text too short or language undetermined |
| raw | Raw ISO 639-3 code from franc (e.g. 'pol'), or null |
| confidence | Score 0–1 from franc, or null when franc was not called |
Texts shorter than 20 characters or with low confidence return detected: 'unknown'.
The detector does not affect the current pseudonymization pipeline — --lang config remains authoritative.
It is a building block for future multi-language and auto-select modes.
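The mapping from a raw detector result to the structured result above might look like the sketch below. To stay self-contained it takes the detector as an injected function instead of importing franc; the `RawDetector` signature, the name `detectLanguageSketch`, and the 0.5 confidence threshold are assumptions (the README states only the 20-character minimum, not the threshold), as is the choice to keep `raw` and `confidence` populated in the low-confidence case.

```typescript
type Detected = "pl" | "en" | "unknown";

interface LanguageResult {
  detected: Detected;
  source: "text" | "fallback";
  raw: string | null;
  confidence: number | null;
}

// Stand-in for franc: returns an ISO 639-3 code and a 0..1 score (hypothetical signature).
type RawDetector = (text: string) => { code: string; score: number };

const ISO3_TO_LANG: Record<string, Detected> = { pol: "pl", eng: "en" };
const MIN_LENGTH = 20;      // stated in the README
const MIN_CONFIDENCE = 0.5; // assumed value; the README does not give one

function detectLanguageSketch(text: string, detector: RawDetector): LanguageResult {
  // Too short: do not call the detector at all.
  if (text.trim().length < MIN_LENGTH) {
    return { detected: "unknown", source: "fallback", raw: null, confidence: null };
  }
  const { code, score } = detector(text);
  const mapped = ISO3_TO_LANG[code];
  if (mapped === undefined || score < MIN_CONFIDENCE) {
    return { detected: "unknown", source: "fallback", raw: code, confidence: score };
  }
  return { detected: mapped, source: "text", raw: code, confidence: score };
}
```

Injecting the detector also makes the mapping logic testable offline, in the same spirit as the project's mocked Ollama calls.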
| Mode | Requires Ollama | Detects structured PII | Detects names / orgs |
|---|---|---|---|
| regex | No | Yes | No |
| llm | Yes | No | Yes |
| hybrid (default) | Yes (graceful fallback) | Yes | Yes |
In hybrid mode, Ollama runs after the regex pass so the LLM never sees already-tokenized values. If Ollama is unreachable, the server logs a warning to stderr and returns the regex-only masked text — no crash, no hang.
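The ordering and fallback behaviour described above can be sketched as a two-phase function. Both phases are injected as parameters here; `maskHybrid`, `regexMask`, and `llmMask` are hypothetical names, and a real implementation would also need a request timeout to honour the "no hang" guarantee, which this sketch omits.

```typescript
type RegexPhase = (text: string) => string;
type LlmPhase = (text: string) => Promise<string>;

async function maskHybrid(
  text: string,
  regexMask: RegexPhase,
  llmMask: LlmPhase,
): Promise<string> {
  // Phase 1: structured PII is tokenized first.
  const afterRegex = regexMask(text);
  try {
    // Phase 2: the local LLM only ever sees the already-tokenized text.
    return await llmMask(afterRegex);
  } catch (err) {
    // Ollama unreachable or failed: warn on stderr and return the regex-only result.
    console.error(`[warn] LLM NER unavailable, returning regex-only result: ${String(err)}`);
    return afterRegex;
  }
}
```

Catching the failure at this single point keeps the caller's contract simple: it always receives masked text, with or without the name and organization pass.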
- Token assignment is consistent within a session ([PERSON:1] will not become [PERSON:2] for the same name on a second occurrence), preserving semantic coherence in LLM reasoning.
pseudonym-mcp is a technical privacy control, not a legal guarantee of compliance.
- If text is sent to the cloud without first passing through mask_text, pseudonym-mcp cannot protect it.
Under GDPR Art. 4(5), pseudonymized data is still personal data in your system. pseudonym-mcp substantially reduces risk but does not eliminate your legal obligations.
git clone https://github.com/woladi/pseudonym-mcp
cd pseudonym-mcp
npm install
npm run build # tsc compile
npm test # vitest (134 tests, no Ollama required)
The test suite runs fully offline — Ollama calls are injected via constructor and mocked in all tests. No live LLM required.
To add a language pack:
1. Add pattern files under src/patterns/locale/<lang>/. Each file exports a PatternRule with id, entityType, pattern, locales, engines, and optional validate.
2. Register them in src/patterns/index.ts (add to allPatterns array).
3. Create src/languages/<lang>/rules.ts that composes from the new patterns using toPatternDef.
4. Add the language to LANGUAGE_MAP in src/core/engine.ts.
5. Update src/language/language-map.ts.
See src/patterns/locale/pl/ and src/languages/pl/rules.ts for a complete example.
Contributions are welcome. Please follow Conventional Commits for commit messages — this project uses release-it with @release-it/conventional-changelog to automate releases.
Language pack contributions are especially welcome — German (Personalausweis, Steuer-ID), French (NIR, SIRET), Spanish (DNI/NIE) and others would significantly expand the tool's usefulness.
MIT — Adrian Wolczuk