Cross-Session Context Leakage
Sensitive data leaks between MCP sessions, agents, and users through accumulated context windows, persistent memory features, and shared server state - with no standard mechanism for isolation.
Severity: 7.5/10 (High)
Context leakage is subtle, hard to detect, and gets worse as MCP deployments scale. There's no standard mechanism for isolating session state in MCP, meaning every multi-user or multi-agent deployment is implicitly at risk.
Summary
When you ask an AI agent a question through MCP, a lot of context gets generated: your prompt, the tool calls the agent makes, the data those tools return, any documents or resources pulled in, and the agent's reasoning about all of it. That context is supposed to stay within your session. In reality, it often doesn't.
There's no standard mechanism for isolating context between MCP sessions. How servers handle state, how tool outputs are scoped, whether anything gets cleaned up when a session ends -- all of that is left to the individual implementation. But the servers are only half the picture. The AI agents and clients connecting to those servers accumulate massive amounts of context on their own. Every tool call response, every document retrieved, every database query result -- it all lives in the agent's context window for the duration of the session. The agent becomes a central aggregation point for sensitive data from every MCP server it's connected to, with no boundaries between any of it.
This matters because MCP tool calls frequently involve sensitive data: database query results, internal documents, API responses containing PII, financial data, proprietary information. If any of that persists in server state, gets cached in a shared resource, or accumulates unchecked in an agent's context, you have a data breach that nobody noticed.
What Is the Issue?
Context leakage in MCP happens through several mechanisms. The most immediate risks are the ones happening on every developer and AI-native user's machine right now.
Context Window Accumulation
This is the most common and least discussed form of context leakage. During an MCP session, every tool call response gets added to the AI model's context window. That context persists for the entire conversation and influences everything the model does afterward.
Say you ask your agent to pull a customer record from your database MCP server. The full response (name, email, billing address, payment history) is now in the context. Ten minutes later you ask the agent to draft a Slack message using your Slack MCP server. The model still has that customer data in context. If the agent references it, summarizes it, or even subtly incorporates it into the Slack message, that data has crossed a boundary.
This isn't a bug in any specific MCP server. It's how context windows work. The agent is the aggregation point. It collects outputs from every MCP server it's connected to and holds all of it in one shared context. MCP makes this worse because tool call responses tend to return structured, detailed data: full database rows, complete API responses, entire documents. Not vague summaries. The longer a session runs and the more tools you use, the more sensitive data piles up inside the agent with zero boundaries between any of it.
Tool Output Persistence
Even without explicit caching, tool outputs can leak between sessions through side effects that most developers don't think about:
- Temp files: Tools that write intermediate results to shared temp directories
- Log files: Verbose logging that captures full tool inputs and outputs, accessible to anyone on the machine
- Database connections: Connection pools that carry session state between queries
- Subprocess state: Tools that shell out to processes that maintain their own state
A developer debugging an issue might not realize that their MCP server's debug logs contain the full text of every document they retrieved, every database query result, and every API response, sitting in a plaintext log file indefinitely.
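A low-effort way to close the log-file vector is to log tool call metadata and a payload fingerprint instead of the response body. A minimal sketch (the tool name and response shown are illustrative, not from any real server):

```python
import hashlib
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("mcp-server")

def fingerprint(payload: dict) -> str:
    """Stable, non-reversible digest for correlating responses across log lines."""
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

def log_tool_call(tool_name: str, args: dict, response: dict) -> None:
    """Log what was called and how big the response was -- never the payload itself."""
    log.info(
        "tool=%s args_keys=%s response_bytes=%d response_sha256=%s",
        tool_name,
        sorted(args),               # argument names only, not values
        len(json.dumps(response)),  # size, not content
        fingerprint(response),      # correlatable digest, not content
    )

# The customer record itself never reaches the log file.
log_tool_call("query_customer", {"account_id": "acct_8472"},
              {"name": "Sarah Chen", "email": "sarah.chen@example.com"})
```

The digest still lets operators correlate "same response seen twice" across sessions without retaining the data itself.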
Persistent Memory and Context Features
Some MCP servers and clients implement persistent memory, storing conversation history, user preferences, or retrieved documents across sessions. This is useful for continuity ("remember what we discussed last time") but creates leakage vectors even in single-user setups:
- Context carryover: Previous session data loaded into a new session may contain sensitive information from unrelated conversations
- Unbounded memory growth: Without cleanup, the memory store becomes a growing archive of everything you've ever accessed through MCP or that client
- Resource caching: Documents, files, or API responses cached by the server persist beyond the session that requested them
As Adoption Scales: Shared Server State
The risks above exist on a single developer's machine today. They get significantly worse when MCP servers are shared across users, which is where things are heading as teams centralize MCP infrastructure.
MCP servers are long-running processes. When multiple users connect to the same server, the server's internal state can leak between them. Consider a database MCP server that caches query results:
```python
# Common pattern in MCP servers: module-level cache
query_cache = {}

@mcp.tool()
def query_database(sql: str) -> dict:
    if sql in query_cache:
        return query_cache[sql]  # Returns cached result from ANY previous session
    result = db.execute(sql)
    query_cache[sql] = result
    return result
```

User A queries salary data. User B runs the same query. The cache returns User A's results. The MCP server has no concept of which session a cached result belongs to.
This also applies to multi-agent setups where multiple AI agents share MCP servers. Agent A calls a tool that returns sensitive customer data, the server stores it in working state, and Agent B's subsequent call is influenced by or includes fragments of Agent A's data. There are no guardrails here: if the MCP server doesn't explicitly isolate state per session, nothing will. And most don't.
Root Cause Analysis
The Agent as an Uncontrolled Data Aggregator
While server-side state gets most of the attention, the AI agent's context window is arguably the bigger risk. The agent sits at the center of every MCP interaction -- it's the one calling tools across multiple servers and accumulating all of their responses in a single context. That context is a data store that nothing manages or protects.
Once data enters the agent's context (through tool call responses, resource reads, or prompt templates), it stays there for the duration of the session and influences all subsequent interactions. This means:
- Tool outputs from early in a session are still "visible" to the agent when processing later requests
- Sensitive data returned by one server is available when the agent calls a completely different server
- In long-running sessions, the agent holds a comprehensive record of everything the user accessed across every connected MCP server
- If the agent is compromised or manipulated (via prompt injection, tool poisoning, etc.), it has access to all of that accumulated data at once
No Tenant Isolation Standard
For organizations running MCP servers as shared infrastructure (internal tool servers, centralized gateways), there's no standard for tenant isolation. The protocol doesn't define:
- How to scope tool access per user, team, or organization
- How to prevent cross-tenant data access through shared tools
- How to audit which tenant's data was accessed in each session
- How to enforce data residency or classification requirements per tenant
No Standard for Session Isolation
There's no true standard or clearly defined way for MCP servers to partition state between sessions. No mandated session identifiers to scope data. No requirements for isolating concurrent sessions. No cleanup obligations when a session ends. Servers that want to keep sessions separate have to build that entirely on their own, and most don't, because the single-user local setup doesn't demand it.
Stateful Servers in a Stateless-Assumed World
Many MCP servers are designed as if they handle one user at a time. This works for the common case (single developer, local machine), but breaks the moment the server is shared, whether through a team deployment, a gateway, or a multi-agent orchestrator.
The transition from "my local MCP server" to "our shared MCP server" is where leakage starts, and nothing in the protocol flags this as a risk.
Risk & Impact Analysis
Why It Matters
Context leakage is uniquely dangerous because it's silent and cumulative:
- No visible artifact: Unlike credential theft or data exfiltration, context leakage doesn't create obvious network traffic or access logs. Data moves within the normal flow of tool calls and responses.
- Regulatory exposure: If PII, financial data, or health records leak between sessions, you have a confidentiality and/or compliance violation under GDPR, HIPAA, SOX, or PCI-DSS, even if the data never leaves your infrastructure.
- Privilege escalation by context: A user with read-only access to one dataset might receive data from a higher-privilege user's previous session through cached tool outputs. The access controls on the upstream service were correct, but the MCP layer bypassed them.
- Compounding over time: Every session that runs through a stateful MCP server adds to the accumulated state. The longer the server runs, the more cross-session data it holds, and the higher the leakage risk.
- Multi-agent amplification: In agentic workflows where multiple agents collaborate, each agent's context becomes a potential source of leakage for every other agent in the chain.
Who Is Affected
- Teams sharing MCP server infrastructure (most common exposure)
- Organizations using multi-agent frameworks with shared tool servers
- Users of MCP servers with persistent memory or caching features
- Developers building MCP servers who haven't considered multi-session state isolation
- Enterprises evaluating MCP for production workflows with sensitive data
Proof of Concept: The Persistent Context
Scenario
A developer uses Claude Desktop (or any MCP client) with a few MCP servers connected: a database server for their company's customer data, a Slack server, and a GitHub server. Normal day, normal workflow. The problem is what the agent remembers between tasks, and what the MCP servers hold onto between sessions.
Session 1: Morning -- Customer Escalation
The developer is debugging a billing issue for an escalated customer. They ask the agent to pull the customer record.
User: "Pull up the account details for customer acct_8472"
Agent calls database MCP server → query_customer("acct_8472")
Response now in context:
- Name: Sarah Chen
- Email: sarah.chen@example.com
- Phone: (415) 555-0142
- Billing address: 742 Evergreen Terrace, San Francisco, CA
- Payment method: Visa ending 4829
- Account tier: Enterprise
- Monthly spend: $12,400
- Support history: 3 escalations in 90 days

The developer resolves the issue, writes some notes, and moves on. The full customer record (PII, payment info, and spending data) is still sitting in the agent's context.
Still Session 1: Afternoon -- Different Task Entirely
Later that day, still in the same session, the developer switches to a completely unrelated task.
User: "Draft a Slack message to #engineering about the new API rate limits"
Agent calls Slack MCP server → send_message(channel="#engineering", ...)
The agent still has Sarah Chen's full customer record in context. If the agent references any of that data while drafting the Slack message, even subtly, even accidentally, PII has just been posted to a channel with 200 engineers. The agent didn't mean to leak it. The data was just there, in context, with no boundary between the morning's customer lookup and the afternoon's Slack draft.
Session 2: Next Day -- The Memory Server
The developer's setup includes the MCP Memory server (https://github.com/modelcontextprotocol/servers/tree/main/src/memory) -- the official reference implementation from the MCP project that provides knowledge graph-based persistent memory. It's designed to store entities, relationships, and context across sessions so you can pick up where you left off. When they start a new session the next morning, the memory server loads previous context.
User: "What was I working on yesterday?"
Agent calls memory MCP server → recall_recent()
Response includes:
- "Debugged billing issue for customer acct_8472 (Sarah Chen, Enterprise tier, $12,400/mo)"
- "Drafted API rate limit announcement for #engineering"
Yesterday's customer PII is now in today's context. If the developer shares their screen in a meeting, if the agent references it in an unrelated conversation, if another tool call picks it up, the data has leaked across a session boundary.
Why This Is Realistic
- This is how people actually use MCP clients: long sessions, switching between tasks, multiple servers connected at once
- Memory/context persistence features are marketed as a benefit ("pick up where you left off")
- Nobody thinks to clear their context window between tasks the way they might close browser tabs
- The MCP servers did nothing wrong here. The database server returned what was asked for, the memory server stored what it was told to. The leakage happens because nothing separates the contexts
What Makes It Worse
- The agent has no concept of data sensitivity. It treats a customer's SSN the same as a README file. Both are just tokens in the context.
- There's no "forget" command. Once data is in the context window, there's no standard way to selectively remove it without ending the session.
- Memory features compound the problem. Without memory, the data at least dies when the session ends. With memory, it persists indefinitely and resurfaces in future sessions where it has no business being.
In this scenario, one developer's normal workflow resulted in customer PII crossing into a Slack channel, persisting into the next day's session, and surfacing in screen shares, unrelated conversations, or future tool calls. Nobody attacked anything. Nobody misconfigured anything. The developer just used their tools the way they're designed to be used, and sensitive data quietly spread to places it was never supposed to go.
MCP-Specific Considerations
Why MCP Amplifies Context Leakage
- Rich data in tool responses: MCP tool calls return structured data. Database rows, document contents, API responses, not just text summaries. The fidelity of leaked data is high.
- Long-running server processes: MCP servers persist between client connections, accumulating state over time. Traditional stateless API patterns don't apply.
- Agents as central aggregators: When multiple MCP servers feed into a single agent, the agent's context window becomes a mixing pot for data from all sources. The agent holds everything. Database records from one server, documents from another, API responses from a third, with no isolation between them.
- Resource subscriptions: MCP's resource feature allows servers to push updates to clients. If resource subscriptions aren't session-scoped, one user's subscriptions might trigger updates in another user's session.
- Prompt templates with embedded data: MCP prompt templates can include data from previous interactions. If templates aren't user-scoped, they can carry data across sessions.
Vulnerability Indicators
No Session Isolation Standard
No standard mechanism for partitioning server state between sessions
Caching Bypasses Access Controls
Upstream service ACLs are enforced on first access but not on cached results
Memory Features Create Cross-User Bleed
Persistent memory without user scoping merges everyone's data
Context Windows Have No Partitions
All tool outputs share the same context with no isolation
Severity Rating
| Factor | Score | Rationale |
|---|---|---|
| Exploitability | 6/10 | Often requires shared infrastructure; not trivially exploitable from outside |
| Impact | 8/10 | Can expose PII, financial data, and privileged information across trust boundaries |
| Detection Difficulty | 9/10 | Leakage occurs within normal data flows; no anomalous network activity |
| Prevalence | 7/10 | Any shared MCP server with state is potentially affected; growing with multi-agent adoption |
| Remediation Complexity | 7/10 | Requires protocol-level session isolation and server-side architectural changes |
Mitigations
For Organizations
Enforce Session-Scoped MCP Server State
Every MCP server in your infrastructure should partition its internal state by session. This includes:
- Caches (keyed by session ID, not just query)
- Memory stores (scoped to user or session)
- Temp files (created in session-specific directories)
- Database connections (using session-specific roles where possible)
- Session termination (purge state when a session completes)
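For the temp-file item above, one workable pattern is a per-session workspace directory that is deleted when the session closes. A sketch (the `SessionWorkspace` class is our own illustration, not an MCP API):

```python
import shutil
import tempfile
from pathlib import Path

class SessionWorkspace:
    """Per-session temp directory, wiped when the session ends."""

    def __init__(self, session_id: str):
        # Each session writes under its own directory; nothing is shared.
        self.root = Path(tempfile.mkdtemp(prefix=f"mcp-{session_id}-"))

    def write(self, name: str, data: bytes) -> Path:
        path = self.root / name
        path.write_bytes(data)
        return path

    def close(self) -> None:
        # Purge everything the session wrote, including intermediate results.
        shutil.rmtree(self.root, ignore_errors=True)

# Usage: intermediate results live and die with the session.
ws = SessionWorkspace("sess-123")
ws.write("result.json", b'{"rows": 3}')
ws.close()
```

Nothing lands in a shared `/tmp` path, and nothing survives the session that created it.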
Audit Server State Management
Before deploying any MCP server to a shared environment, review how it handles:
- Module-level or global variables
- Caching layers (in-memory, Redis, filesystem)
- Persistent storage (databases, files)
- Logging (what gets written, who can read it)
Implement Context Boundaries
Deploy infrastructure that enforces context isolation:
- Separate MCP server instances per user or team for sensitive workloads
- Use containerized server deployments that spin up fresh per session
- Route MCP traffic through gateways that enforce tenant isolation
- Implement session cleanup that purges server state when a session ends
- Prompt or force session termination when an agent conversation ends
Classify Data Flowing Through MCP
Apply your existing data classification policies to MCP tool outputs:
- Which tools return data classified as Confidential or above?
- Which tools access PII-containing systems?
- Are those tools deployed on shared or isolated infrastructure?
- Do tool outputs get logged, and if so, with what retention and access controls?
For MCP Server Developers
Design for Multi-Tenancy From Day One
Even if your server currently runs locally for one user, build with isolation in mind:
```python
# Bad: Global cache
cache = {}

@mcp.tool()
def search(query: str) -> list:
    if query in cache:
        return cache[query]
    result = upstream.search(query)
    cache[query] = result
    return result
```

```python
# Good: Session-scoped cache
from collections import defaultdict

session_caches = defaultdict(dict)

@mcp.tool()
def search(query: str, session_id: str) -> list:
    cache = session_caches[session_id]
    if query in cache:
        return cache[query]
    result = upstream.search(query)
    cache[query] = result
    return result
```

Implement Session Lifecycle Hooks
- On session start: Initialize isolated state
- On session end: Purge all session-specific data (cache, temp files, memory)
- On error: Ensure cleanup runs even if the session terminates abnormally
Minimize State Retention
- Prefer stateless tool implementations where possible
- If caching is needed, use short TTLs and session-scoped keys
- Don't build "memory" features unless you've solved multi-user isolation
- Log tool call metadata (what was called, when) but not full responses
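A session-scoped cache with a short TTL, as suggested above, can be sketched as follows (`fetch` stands in for the upstream call; the names are illustrative):

```python
import time

TTL_SECONDS = 60  # short TTL: entries expire quickly instead of living forever

# Keyed by (session_id, query): sessions never see each other's entries.
_cache: dict[tuple[str, str], tuple[float, object]] = {}

def cached_search(session_id: str, query: str, fetch):
    key = (session_id, query)
    entry = _cache.get(key)
    if entry is not None:
        stored_at, value = entry
        if time.monotonic() - stored_at < TTL_SECONDS:
            return value  # fresh enough, and scoped to this session
    value = fetch(query)  # otherwise make a fresh upstream call
    _cache[key] = (time.monotonic(), value)
    return value
```

Because the session ID is part of the key, an identical query from a different session is a cache miss by construction.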
Respect Upstream Access Controls
If the upstream service enforces access controls, your cache must too:
- Cache entries should include the user's identity and permissions
- Cache hits should verify the requesting user has the same access level as the user who populated the cache
- When in doubt, skip the cache and make a fresh upstream call
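One way to sketch a permission-aware cache: each entry records the permissions it was fetched under, and a cache hit is served only when the requester holds all of them (the names and the flat permission model are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CacheEntry:
    value: object
    owner: str              # identity of the user who populated the entry
    permissions: frozenset  # access level the data was fetched under

_cache: dict[str, CacheEntry] = {}

def get_or_fetch(query: str, user: str, perms: frozenset, fetch):
    entry = _cache.get(query)
    # Serve from cache only if the requester holds every permission the
    # cached data was fetched under; otherwise, when in doubt, go upstream.
    if entry is not None and entry.permissions <= perms:
        return entry.value
    value = fetch(query)  # fresh call under the caller's own identity
    _cache[query] = CacheEntry(value, user, perms)
    return value
```

A low-privilege user asking the same question as a high-privilege user triggers a fresh upstream call, so the upstream service's ACLs get a chance to say no.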
For MCP Client Developers
- Pass session identifiers to servers: Give servers the information they need to scope state
- Implement context window limits: Allow users to control how much tool output persists in context
- Support context clearing: Let users flush the context window of sensitive data mid-session
- Warn on sensitive data in context: Detect patterns (SSNs, credit card numbers, etc.) in tool outputs and flag them
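The last item can be approximated with simple pattern matching on tool outputs. A rough sketch: these regexes are illustrative and will produce false positives, and a real detector would add checksums (e.g. Luhn for card numbers) and surrounding context:

```python
import re

# Rough shapes only -- illustrative, not production-grade detection.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def flag_sensitive(tool_output: str) -> list[str]:
    """Return the categories of sensitive data detected in a tool output."""
    return [name for name, pat in PATTERNS.items() if pat.search(tool_output)]
```

A client could run this on every tool response before it enters the context window and warn the user, or offer to redact the match.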
Detection Methods
Static Analysis
- Review MCP server code for module-level mutable state (dictionaries, lists, sets)
- Identify caching implementations and check whether they're session-scoped
- Scan for temp file usage without session-specific paths
- Audit memory or persistence features for user isolation
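The first check, module-level mutable state, can be automated with a small AST pass. A rough heuristic sketch: it only flags top-level assignments of mutable literals, so it will miss constructed state like `defaultdict(...)` calls:

```python
import ast

MUTABLE_LITERALS = (ast.Dict, ast.List, ast.Set,
                    ast.DictComp, ast.ListComp, ast.SetComp)

def find_module_level_mutable_state(source: str) -> list[str]:
    """Flag top-level assignments of mutable literals -- a common sign of
    state shared across every session the server handles."""
    findings = []
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, ast.Assign) and isinstance(node.value, MUTABLE_LITERALS):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    findings.append(f"line {node.lineno}: {target.id}")
    return findings

# Usage against the cache pattern shown earlier in this report:
print(find_module_level_mutable_state("query_cache = {}\n"))
# → ['line 1: query_cache']
```

Running this over a server codebase in CI gives a cheap first-pass signal before any manual review.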
Runtime Monitoring
- Compare tool outputs across sessions for data that shouldn't be shared
- Monitor for access patterns where users receive data they didn't request
- Track cache hit rates, since unusually high rates in multi-user environments suggest cross-session hits
- Log session lifecycle events and correlate with state management operations
Behavioral Analysis
- Detect when tool outputs contain data that doesn't match the requesting user's access level
- Flag sessions where the agent references information that wasn't retrieved in the current session
- Monitor for data classification violations (e.g., Confidential data appearing in non-Confidential contexts)
- Track context window sizes, since abnormally large contexts may indicate accumulated cross-session data
Related Topics
- MCP Observability (monitoring tool call data flows)
- MCP Sprawl (more servers = more shared state = more leakage vectors)
- Shadow AI (unmanaged MCP deployments with no isolation controls)
- MCP Security Guardrails (policy enforcement for context boundaries)
- Tool Poisoning (attackers can deliberately trigger context leakage)
- Credential & Secrets Exposure (plaintext credentials in MCP configs)
References
- OWASP MCP Top 10 - Security Framework
- MCP Specification - Protocol Lifecycle - Protocol Specification
- Elastic Security Labs - MCP Attack Vectors - Research
- Simon Willison - MCP Prompt Injection - Analysis
- CyberArk - Full-Schema Poisoning in MCP - Research
- OWASP Top 10 for LLM & Generative AI Security - Security Framework
Report generated as part of the MCP Security Research Project