Every time a coding agent sends a prompt to a cloud LLM, the full content of that prompt — your code, your credentials, your customer names, your internal project codenames — lands on someone else’s server. It may be logged, retained for training, produced in response to subpoena, or exfiltrated in a breach. TLS protects the wire. Nothing protects the content.

We built LLM-Redactor to measure exactly how much leaks and what you can do about it. The paper evaluates eight techniques on a common benchmark. This post is the practitioner’s summary.

The eight options

We identified eight distinct approaches to privacy-preserving LLM requests, spanning “never leave the device” to “homomorphic encryption”:

| Option | What it does | Works today? |
|---|---|---|
| A Local-only | Run inference on a local model (Ollama). Nothing leaves. | Yes |
| B Redact | NER + regex finds PII/secrets, replaces with typed placeholders, restores on response. | Yes |
| C Rephrase | Local model rewrites the prompt to strip implicit identity. | Yes |
| D TEE | Forward to a Trusted Execution Environment (Nitro Enclave). Hardware attestation. | Partial |
| E Split inference | Run first layers locally, send only activations (not tokens). | Research |
| F FHE | Fully homomorphic encryption. Cloud computes on ciphertext. | Research |
| G MPC | Secret-share input across non-colluding servers. | Research |
| H DP noise | Calibrated word-level noise. Good for statistical workloads. | Yes (niche) |

What we measured

We built a benchmark of 1,300 synthetic prompts across four workload classes:

  • WL1 (PII): 500 samples with names, emails, SSNs, addresses
  • WL2 (Secrets): 300 samples with API keys, AWS creds, passwords, PEM keys
  • WL3 (Implicit identity): 200 samples like “the CFO whose wife works at the competitor”
  • WL4 (Code): 300 samples with internal function names, schemas, project codenames

Each sample has ground-truth annotations (4,014 total). We measure how many survive the pipeline — the combined leak rate (exact verbatim match + partial substring match).
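The combined leak rate is straightforward to sketch. A hypothetical reconstruction (not the paper's exact scoring code; the `min_len` substring threshold is an assumption): an annotation counts as an exact leak if it appears verbatim in the pipeline output, and as a partial leak if any sufficiently long substring of it does.

```python
def leak_rate(annotations, output, min_len=6):
    """Sketch of a combined leak metric: exact verbatim matches
    plus partial substring matches against the pipeline output."""
    exact = partial = 0
    for secret in annotations:
        if secret in output:
            exact += 1  # the full annotated value survived verbatim
        elif any(secret[i:i + min_len] in output
                 for i in range(len(secret) - min_len + 1)):
            partial += 1  # a fragment of it survived
    combined = (exact + partial) / len(annotations)
    return exact, partial, combined
```

A fully redacted output scores 0.0; a truncated API key that still shows its prefix scores as a partial leak.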

The headline numbers

| Option | PII | Secrets | Implicit | Code |
|---|---|---|---|---|
| Baseline (no protection) | 100% | 100% | 100% | 100% |
| B (NER + regex) | 15.3% | 31.8% | 95.0% | 58.5% |
| B+C (redact + rephrase) | 13.9% | 31.6% | 94.1% | 55.8% |
| A (local routing) | 6.3% | 24.2% | 46.8% | 59.9% |
| A+B+C (recommended) | 0.6% | 6.4% | 43.6% | 31.3% |

The combination A+B+C — route locally when possible, redact and rephrase the rest — achieves 0.6% combined leak on PII with zero exact leaks across 500 samples. That’s the best practical result in the entire matrix.
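A+B+C is a routing policy more than an algorithm. A minimal sketch of its shape, where every helper (`detect`, `redact`, `rephrase`, `local_llm`, `cloud_llm`) is a hypothetical stand-in for the real component, and the actual decision rule in the paper is richer:

```python
def route(prompt, detect, redact, rephrase, local_llm, cloud_llm):
    """Sketch of the A+B+C policy: answer locally when the prompt
    carries what redaction can't fix, otherwise redact and rephrase
    before sending to the cloud, then restore placeholders."""
    findings = detect(prompt)
    if any(f["kind"] in ("secret", "implicit_identity") for f in findings):
        return local_llm(prompt)           # Option A: never leaves the device
    scrubbed, mapping = redact(prompt)     # Option B: typed placeholders
    scrubbed = rephrase(scrubbed)          # Option C: strip implicit cues
    reply = cloud_llm(scrubbed)
    for placeholder, original in mapping.items():
        reply = reply.replace(placeholder, original)
    return reply
```

The local route handles the categories that content-level transforms cannot protect; everything else pays only the redaction cost.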

The implicit identity problem

The hardest finding: implicit identity survives everything we throw at it. Phrases like “the CFO of Acme Corp whose wife works at the competitor” have no PII token to redact. Even the rephrase model preserves the structural relationships because they are the content of the prompt.

We built a semantic leak metric — a local judge model that reads the redacted text and determines if the original person/org is still identifiable. Results:

| Option | Semantic leak rate |
|---|---|
| Baseline | 95% |
| B (NER) | 95% |
| B+C (rephrase) | 100% |

Content-level transformations can remove tokens but not meaning. For implicit identity, your options are: (a) never send it to the cloud (Option A), (b) use a TEE (Option D), or (c) accept the residual risk.

Using it

LLM-Redactor works as an HTTP proxy (transparent to the agent) or an MCP server (explicit tools). Four modes:

1. Transparent proxy

uv run llm-redactor serve --port 7789
export OPENAI_API_BASE=http://localhost:7789/v1
# Your agent now routes through the redactor automatically

2. MCP tools

Add to your MCP config and the agent gets redact.scrub, redact.restore, redact.detect, and llm.chat tools:

{
  "mcpServers": {
    "llm-redactor": {
      "command": "uv",
      "args": [
        "--directory", "/path/to/llm-redactor",
        "run", "llm-redactor", "mcp",
        "--config", "/path/to/llm-redactor/examples/max-privacy.yaml"
      ]
    }
  }
}

The llm.chat tool is a drop-in: the agent calls it instead of the LLM directly, and scrub/restore happens internally. The agent never sees placeholders.
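The round-trip inside llm.chat pairs a scrub step, which records a placeholder map, with a restore step applied to the model's reply. A minimal sketch using a single hypothetical email pattern (the real detector covers 35+ families, described below):

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrub(text):
    """Replace each distinct email with a typed placeholder; return
    the scrubbed text plus the map needed to restore the reply."""
    mapping = {}
    def sub(match):
        value = match.group(0)
        if value not in mapping.values():
            mapping[f"[EMAIL_{len(mapping) + 1}]"] = value
        # reuse the placeholder already assigned to this value
        return next(k for k, v in mapping.items() if v == value)
    return EMAIL.sub(sub, text), mapping

def restore(reply, mapping):
    """Swap placeholders in the model's reply back to real values."""
    for placeholder, value in mapping.items():
        reply = reply.replace(placeholder, value)
    return reply
```

Because the same value always maps to the same placeholder, the cloud model can still reason about repeated entities without ever seeing them.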

3. Pre-tool hook

A Claude Code hook that warns when sensitive content is about to leave through any tool call — a safety net alongside any of the other modes.

4. Belt and suspenders

Run the proxy AND load the MCP tools. The proxy catches everything silently, while the MCP tools let the agent inspect what was caught.

The detector

35+ regex pattern families covering:

  • PII: email, phone (US + intl), SSN, IPv4/v6, credit cards
  • Cloud keys: AWS (access/secret/session), GCP, Azure
  • Vendor API keys: OpenAI, Anthropic, GitHub, GitLab, Slack, Stripe, Twilio, SendGrid, npm, PyPI
  • Generic: passwords, JWTs, bearer/basic auth, PEM/SSH/PGP keys, connection strings

Plus Presidio NER for person/org/location names, with a false-positive suppression layer (drug names, abbreviations, generic words) and an optional LLM validation pass that sends each NER span to a local model for a KEEP/DROP verdict.
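A few of these families can be sketched as typed patterns. These regexes are illustrative, not the project's actual rule set (real key formats vary by vendor and over time):

```python
import re

# Illustrative pattern families; the real detector ships 35+ of these.
PATTERNS = {
    "EMAIL":   re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN":     re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),       # access key ID shape
    "JWT":     re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+\b"),  # base64url header
}

def detect(text):
    """Return (kind, matched_value) findings for every pattern hit."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            findings.append((kind, m.group(0)))
    return findings
```

The kind label is what makes typed placeholders like `[AWS_KEY_1]` possible, which in turn lets restoration be exact on the way back.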

What it costs

Redaction actually reduces token count by 4–12% (placeholders are shorter than emails and API keys). The latency overhead:

| Option | Median latency |
|---|---|
| B (NER) | ~20ms |
| B+H (DP noise) | ~20ms |
| B+C (rephrase) | ~2s |
| A (local routing) | ~2s |

For a utility evaluation, we ran a judge-model A/B comparison (n=50 per workload). The baseline response is preferred ~78% of the time: redaction does cost some answer quality, but for privacy-first workloads that cost buys the leak reductions above.

The paper

The full evaluation with all eight options, four workloads, semantic leak analysis, epsilon sensitivity for DP noise, cross-family judge bias controls, and a decision rule for practitioners:

arXiv:2604.12064

Code, benchmarks, and configs: github.com/jayluxferro/llm-redactor


This is part of the LLM Agents series. Previously: Resilient Write.