Technical Context
I took a close look at the OpenAI documentation regarding compaction in the Responses API and noticed a shift that is architecturally more significant than just another "larger context" update. After a call, the output contains a special item with type=compaction, containing opaque encrypted_content. According to OpenAI, this “preserves the model’s latent understanding.” To me, this means we are not getting a retelling of history, but a transfer of the internal state of understanding into a compact container.
The key practical difference from classic summarization is this: a summary is text forced to "distill" the past, almost always losing details, causal links, and subtle agreements. Compaction, however, attempts to preserve exactly what the model actually uses to continue reasoning, but does so in a form that the client cannot interpret or edit.
From an integration standpoint, there are two main paths:
- Auto-context management in /responses: you set context_management with a compact_threshold (e.g., 0.9). When the context window approaches the limit, the platform can emit a compaction item during the stream, "trim" the history, and continue generation with the compact representation.
- Manual execution via /responses/compact: you send the full input (which must fit within the model's limit at the time of compaction) and receive a "packed window" for the next /responses call.
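The two paths can be sketched as request builders. This is a sketch under assumptions: the field names context_management and compact_threshold and the /responses/compact endpoint are taken from the description above; verify the exact SDK surface against the current OpenAI API reference before relying on it.

```python
# Sketch of the two integration paths described above.
# Field names follow the prose; check them against the live API docs.

def auto_compaction_request(model: str, messages: list) -> dict:
    """Path 1: let the platform compact automatically near the limit."""
    return {
        "model": model,
        "input": messages,
        # Ask the platform to emit a compaction item when usage
        # reaches ~90% of the context window.
        "context_management": {"compact_threshold": 0.9},
    }

def manual_compaction_request(model: str, full_history: list) -> dict:
    """Path 2: explicit call to /responses/compact.

    full_history must still fit the model's window at this point;
    the response yields a compaction item (opaque encrypted_content)
    to feed into the next /responses call.
    """
    return {"model": model, "input": full_history}

req = auto_compaction_request("gpt-5", [{"role": "user", "content": "hi"}])
print(req["context_management"]["compact_threshold"])  # 0.9
```

The point of the split: path 1 trades control for convenience (the platform decides when to compact mid-stream), while path 2 lets you choose the compaction moment yourself, e.g. at a task boundary rather than mid-reasoning.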
What caught my eye as an architect: encrypted_content can be very large (documentation mentions values up to ~10M characters), yet it remains opaque. This isn't "saving down to a couple of lines"; it's a different format for storing the context trace. Another point: OpenAI emphasizes compatibility with ZDR (Zero Data Retention), which sounds like an attempt to minimize storage/leakage risks for corporate environments. However, without disclosing key management details and internal cryptography, I wouldn't promise security teams "magic privacy"—I would simply treat it as a provider-managed mechanism.
It is also important to understand that, in OpenAI's model, compaction doesn't necessarily replace all user messages: the documentation suggests user text may be preserved more "as is," while assistant/tool/reasoning layers get packed. For agentic systems, this is logical: the tool trace and reasoning chains bloat the window fastest and break most often under primitive summarization.
Business & Automation Impact
In practice, I almost always hit the same pain point: "the agent gets smarter for the first 10–20 minutes, then starts getting dumber." The reason is usually not the model itself, but how the team manages context: history was cut, a rough summary was made, constraints/goals/exceptions were lost—and the agent starts making different decisions. Compaction is the provider's attempt to solve exactly this class of degradation.
If I am designing AI automation for sales, support, or engineering teams, compaction changes the TCO mathematics: long sessions become more predictable in cost, and quality after "compression" potentially dips less than with text summarization. The winners are those who have:
- long-lived dialogues (support L2/L3, onboarding, procurement, tender correspondence);
- tool-heavy agents (CRM/ERP, catalogs, RPA orchestration, code-assist with many tool calls);
- processes where decision sequence is critical (compliance scripts, approvals, regulations).
The losers are those who counted on "universal history portability" between providers. The opaque blob is effectively vendor lock-in at the agent memory level. If tomorrow you want to move to another LLM stack, this context layer will not migrate. In my projects, I would establish a simple rule: original artifacts must be stored locally (user messages, tool inputs/outputs, critical facts, decisions, and their rationale)—otherwise, you lose manageability and auditability.
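That rule can be as simple as an append-only local log kept outside the provider's opaque memory. A minimal sketch (the ArtifactLog class and its record kinds are illustrative names, not part of any SDK):

```python
import json
import time
from pathlib import Path

class ArtifactLog:
    """Append-only local record of original artifacts: user messages,
    tool inputs/outputs, critical facts, decisions and their rationale.
    The opaque compaction blob never replaces this record."""

    def __init__(self, path: str):
        self.path = Path(path)

    def record(self, kind: str, payload: dict) -> dict:
        entry = {"ts": time.time(), "kind": kind, "payload": payload}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
        return entry

    def entries(self, kind=None) -> list:
        if not self.path.exists():
            return []
        rows = [json.loads(line) for line in self.path.open(encoding="utf-8")]
        return [r for r in rows if kind is None or r["kind"] == kind]

log = ArtifactLog("agent_audit.jsonl")
log.record("decision", {"what": "offered discount", "why": "retention rule R12"})
```

JSONL is deliberate here: append-only writes survive crashes mid-session, and each line stays independently parseable for audit queries later.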
Another business detail: compaction isn't just "fewer tokens and done." It affects observability. You can no longer read exactly what the agent "remembers" after compaction, which increases the role of tests, regression scenarios, and quality telemetry. In Nahornyi AI Lab, I would include this in the AI implementation plan as a mandatory layer: eval sets for key intents, checks for constraint preservation, and separate tests for the "post-compaction" state.
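A post-compaction constraint check can be a small harness run against the same session before and after compaction. Everything below is hypothetical scaffolding: the ask_agent callable and the eval-case shape are illustrative, not an SDK API.

```python
# Hypothetical eval harness for the "post-compaction" state:
# probe the agent with fixed questions and verify that key
# constraints survived the compression.

def check_constraints(ask_agent, eval_cases: list) -> list:
    """Return the names of constraints the agent no longer honors."""
    failures = []
    for case in eval_cases:
        answer = ask_agent(case["probe"]).lower()
        if case["must_contain"].lower() not in answer:
            failures.append(case["name"])
    return failures

cases = [
    {"name": "discount_cap",
     "probe": "What is the maximum discount you may offer?",
     "must_contain": "15%"},
]

# A stub standing in for a post-compaction agent session:
fails = check_constraints(lambda p: "Max discount is 15% per policy.", cases)
```

Running the identical case set before and after compaction turns "the agent got dumber" from an anecdote into a diffable regression report.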
From an architectural perspective, I see a practical pattern: compaction is the provider's "fast memory," while the client retains "long-term memory" in a structured form (facts/decisions/constraints) and raw storage for audit. In this setup, compaction provides speed and stability, while your environment provides control.
Strategic Vision & Deep Dive
My forecast: the agent platform market will move from discussing "which summary is better" to "whose memory is better." Compaction is a step toward managed memory on the model side, where the provider optimizes state transfer just as they optimize inference. And this is a strong move: it removes the need for teams to invent their own dialogue compression algorithms, which almost always end up fragile.
But I don't buy this as a universal solution. In Nahornyi AI Lab projects, I regularly see two classes of requirements that compaction does not cover:
- Explainability: Business often needs to understand why an agent reached a decision. An opaque blob won't help. Therefore, I still record critical decisions in readable logs: "what facts were used," "which rules triggered," "what the tool returned."
- Managed Forgetting: Sometimes you need to guarantee that part of the context is "erased" (PII, trade secrets, erroneous instructions). When memory is opaque, you have to build a policy: what is not allowed into context at all, what goes through redaction before sending, and how retention/deletion works on your side.
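The "not allowed into context at all" part of that policy can be enforced with a redaction pass applied before anything reaches the model. A minimal sketch; the regex patterns here are simplistic placeholders, not production-grade PII detection:

```python
import re

# Illustrative redaction pass applied before any text enters the
# model's context. Real deployments need proper PII detection;
# these patterns only demonstrate the placement of the control.

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact john.doe@acme.com about card 4111 1111 1111 1111"))
```

The key design point: redaction must happen before the provider boundary, because once a fact lands inside an opaque compaction blob, you have no way to surgically erase it.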
I would use compaction as a mechanism to stabilize agent reasoning, but not as the single source of truth. Real AI solution architecture in production is a layered cake: event logs, structured "facts," RAG over corporate data, safety rules, and only on top of that—the conversational layer which can be compacted.
And there is another trap that is easy to fall into: "since everything is saved latently, we don't need to think about context." You do. If you feed the model garbage (duplicates, contradictions, unnecessary tool traces), you risk preserving this garbage in a more resilient form. The benefit of compaction is revealed when you have already established proper message discipline, tool result typing, and noise minimization beforehand.
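That message discipline can start with something as mundane as a pre-send hygiene pass: drop exact duplicates and cap oversized tool outputs so they never enter the window in the first place. A sketch under assumptions (the message-dict shape and the 2000-character cap are illustrative choices):

```python
# Illustrative pre-compaction hygiene: remove exact-duplicate messages
# and truncate oversized tool outputs before they enter the context.

def clean_history(messages: list, max_tool_chars: int = 2000) -> list:
    seen = set()
    cleaned = []
    for msg in messages:
        key = (msg["role"], msg["content"])
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        if msg["role"] == "tool" and len(msg["content"]) > max_tool_chars:
            # Keep a truncated copy; the full output lives in local storage.
            msg = {**msg,
                   "content": msg["content"][:max_tool_chars] + " …[truncated]"}
        cleaned.append(msg)
    return cleaned
```

Note the comment in the truncation branch: trimming is only safe because the full tool output is assumed to be retained in your own raw storage for audit, per the rule above.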
Going forward, there will be less hype about "giant windows" and more engineering around memory, control, and cost. Compaction is a powerful tool, but those who implement it as part of a system, rather than just a checkbox in the SDK, will win.
If you are building a long-lived agent or want to implement AI automation without budget sprawl, I invite you to discuss the architecture and test plan for your process. Write to Nahornyi AI Lab—I, Vadym Nahornyi, will personally conduct the consultation and propose an implementation scheme considering quality, security, and future portability.