mirror of https://github.com/openai/codex.git synced 2026-05-18 18:22:39 +00:00

Files

Owen Lin 3516cb9751 fix(core): truncate large mcp tool outputs in rollouts (#20260)

## Why
Large MCP tool call outputs can make rollout JSONL files enormous. In
the session that motivated this change, the biggest JSONL records were:
- `event_msg/mcp_tool_call_end`
- `response_item/function_call_output`

Both contained the same unbounded MCP payloads: just 3 MCP tool calls,
each several hundred MB 😱

This PR truncates both of those JSONL records.

## How

#### For `response_item/function_call_output`
Unified exec already bounds tool output before it is injected into
model-facing history, which also keeps the corresponding rollout
`response_item/function_call_output` records small.

MCP should follow the same pattern: truncate the model-facing tool
output at the tool-output boundary, while leaving code-mode/raw hook
consumers alone.

#### For `event_msg/mcp_tool_call_end`
`McpToolCallEnd` also needs its own bounded event copy because it is the
app-server/replay/UI event shape that backs `ThreadItem::McpToolCall`.
Unfortunately this is _not_ downstream of the `ToolOutput` trait.
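
A minimal sketch of the two truncation points described above, assuming simplified stand-in types. `RawMcpResult`, `ToolCallEndEvent`, `bound_text`, and `bounded_event` are hypothetical names for illustration, not the actual codex-core definitions, and the character-based limit is an assumption.

```rust
/// Hypothetical stand-in for a raw MCP tool result (left untouched for
/// code-mode/raw hook consumers).
#[derive(Clone, Debug, PartialEq)]
struct RawMcpResult {
    call_id: String,
    text: String,
}

/// Hypothetical stand-in for the event written to the rollout and sent to UIs.
#[derive(Clone, Debug, PartialEq)]
struct ToolCallEndEvent {
    call_id: String,
    text: String,
}

/// Keep at most `max_chars` characters, so the cut never splits a code point.
fn bound_text(text: &str, max_chars: usize) -> String {
    text.chars().take(max_chars).collect()
}

/// Model-facing payload: bounded at the tool-output boundary.
fn response_payload(raw: &RawMcpResult, max_chars: usize) -> String {
    bound_text(&raw.text, max_chars)
}

/// Event copy: bounded separately, because in this sketch (as in the PR
/// description) the event does not flow through the tool-output path.
fn bounded_event(raw: &RawMcpResult, max_chars: usize) -> ToolCallEndEvent {
    ToolCallEndEvent {
        call_id: raw.call_id.clone(),
        text: bound_text(&raw.text, max_chars),
    }
}
```

Both functions borrow the raw result, so the untruncated payload stays available to any consumer that still needs it.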

## Model behavior 
This PR does not change model behavior.

Before this PR, MCP output was:
1. Converted to `FunctionCallOutput`.
2. Recorded into in-memory history.
3. Truncated by `ContextManager::record_items()` before later model
turns saw it.

After this PR, MCP output is truncated earlier, in
`McpToolOutput::response_payload()`, using the same helper. Then
`ContextManager::record_items()` sees an already-truncated output and
has little or no additional work to do.

So the model should still see the same kind of truncated function-call
output. The practical difference is where truncation happens: earlier,
before rollout persistence/app-server emission can see the giant
payload.
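
The "same helper, applied earlier" argument relies on truncation being effectively idempotent. The sketch below illustrates that property with a hypothetical `truncate_output` helper and marker string; it is not the helper codex-core actually uses, and the byte budget is an assumption.

```rust
/// Hypothetical marker appended when output is cut.
const MARKER: &str = "\n[output truncated]";

/// Bound `text` to `max_bytes`, reserving room for the marker inside the
/// budget so that a second pass over the result changes nothing.
fn truncate_output(text: &str, max_bytes: usize) -> String {
    if text.len() <= max_bytes {
        // Already within budget: nothing to do.
        return text.to_string();
    }
    // Leave space for the marker, then back up to a UTF-8 char boundary.
    let mut end = max_bytes.saturating_sub(MARKER.len());
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}{}", &text[..end], MARKER)
}
```

With this shape, the later pass in history recording receives text that already fits the budget and returns it unchanged, which is why the model-visible output stays the same.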

## Verification

- `cargo test -p codex-core mcp_tool_output`
- `cargo test -p codex-core mcp_tool_call::tests::truncate_mcp_tool_result_for_event`
- `cargo test -p codex-core mcp_post_tool_use_payload_uses_model_tool_name_args_and_result`
- `just fmt`
- `just fix -p codex-core`
- `git diff --check`

2026-04-30 16:30:43 +00:00

src

fix(core): truncate large mcp tool outputs in rollouts (#20260)

2026-04-30 16:30:43 +00:00

templates

[tool_suggest] Improve tool_suggest triggering conditions. (#20091)

2026-04-29 13:41:12 -07:00

tests

realtime: rename provider session ids (#20361)

2026-04-30 13:39:48 +03:00

BUILD.bazel

test: increase core-all-test shard count to 16 (#19727)

2026-04-26 23:10:26 +00:00

Cargo.toml

feat: split memories part 2 (#19860)

2026-04-28 13:03:28 +02:00

config.schema.json

Add persisted hook enablement state (#19840)

2026-04-30 04:46:32 +00:00

gpt_5_1_prompt.md

…

gpt_5_2_prompt.md

…

gpt_5_codex_prompt.md

…

gpt-5.1-codex-max_prompt.md

…

gpt-5.2-codex_prompt.md

…

hierarchical_agents_message.md

…

prompt_with_apply_patch_instructions.md

…

README.md

permissions: remove legacy read-only access modes (#19449)

2026-04-24 17:16:58 -07:00

review_prompt.md

…

README.md

codex-core

This crate implements the business logic for Codex. It is designed to be used by the various Codex UIs written in Rust.

Dependencies

Note that codex-core makes some assumptions about certain helper utilities being available in the environment. Currently, this support matrix is:

macOS

Expects /usr/bin/sandbox-exec to be present.

When using the workspace-write sandbox policy, the Seatbelt profile allows writes under the configured writable roots while keeping .git (directory or pointer file), the resolved gitdir: target, and .codex read-only.

Network access and filesystem read/write roots are controlled by SandboxPolicy. Seatbelt consumes the resolved policy and enforces it.

Seatbelt also keeps the legacy default preferences read access (user-preference-read) needed for cfprefs-backed macOS behavior.

Linux

Expects the binary containing codex-core to run the equivalent of codex sandbox linux (legacy alias: codex debug landlock) when arg0 is codex-linux-sandbox. See the codex-arg0 crate for details.

Legacy SandboxPolicy / sandbox_mode configs are still supported on Linux. They can continue to use the legacy Landlock path when the split filesystem policy is sandbox-equivalent to the legacy model after cwd resolution. Split filesystem policies that need direct FileSystemSandboxPolicy enforcement, such as read-only or denied carveouts under a broader writable root, automatically route through bubblewrap. The legacy Landlock path is used only when the split filesystem policy round-trips through the legacy SandboxPolicy model without changing semantics. That includes overlapping cases like /repo = write, /repo/a = none, /repo/a/b = write, where the more specific writable child must reopen under a denied parent.
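
To make the overlapping case concrete, here is a small sketch of most-specific-rule resolution. It assumes the rule whose path is the longest component-wise prefix of the target wins. The `Access` enum and `resolve_access` function are hypothetical illustrations, not the FileSystemSandboxPolicy API.

```rust
use std::path::Path;

/// Hypothetical access levels; `Denied` corresponds to `none` in the text.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Access {
    Denied,
    Read,
    Write,
}

/// Return the access of the most specific rule covering `target`, or
/// `None` when no rule applies. Matching is per path component, so
/// `/repo` does not cover `/repository`.
fn resolve_access(rules: &[(&str, Access)], target: &str) -> Option<Access> {
    let target = Path::new(target);
    rules
        .iter()
        .filter(|rule| target.starts_with(rule.0))
        .max_by_key(|rule| Path::new(rule.0).components().count())
        .map(|rule| rule.1)
}
```

Under the rules `/repo = write`, `/repo/a = none`, `/repo/a/b = write`, a file under `/repo/a/b` resolves to write even though its parent `/repo/a` is denied, which is the "reopen under a denied parent" behavior described above.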

The Linux sandbox helper prefers the first bwrap found on PATH outside the current working directory whenever it is available. If bwrap is present but too old to support --argv0, the helper keeps using system bubblewrap and switches to a compatibility path that omits --argv0 for the inner re-exec. If bwrap is missing, it falls back to the vendored bubblewrap path compiled into the binary and Codex surfaces a startup warning through its normal notification path instead of printing directly from the sandbox helper. Codex also surfaces a startup warning when bubblewrap cannot create user namespaces. WSL2 uses the normal Linux bubblewrap path. WSL1 is not supported for bubblewrap sandboxing because it cannot create the required user namespaces, so Codex rejects sandboxed shell commands that would enter the bubblewrap path before invoking bwrap.
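
The selection order can be sketched as follows. This is an illustration under assumptions, not the helper's actual code: `Bwrap`, `select_bwrap`, and the injected `is_executable` check are hypothetical, and "outside the current working directory" is interpreted here as "the PATH entry is not under cwd".

```rust
use std::path::{Path, PathBuf};

/// Which bubblewrap the sketch would run.
#[derive(Debug, PartialEq)]
enum Bwrap {
    /// A system bwrap found on PATH.
    System(PathBuf),
    /// The copy compiled into the binary (used when no system bwrap is found).
    Vendored,
}

/// Pick the first `bwrap` on `path_dirs` that is not under `cwd`;
/// fall back to the vendored copy when none qualifies.
fn select_bwrap(
    path_dirs: &[&str],
    cwd: &str,
    is_executable: impl Fn(&Path) -> bool,
) -> Bwrap {
    let cwd = Path::new(cwd);
    for dir in path_dirs {
        let dir = Path::new(dir);
        // Skip PATH entries located inside the current working directory.
        if dir.starts_with(cwd) {
            continue;
        }
        let candidate = dir.join("bwrap");
        if is_executable(&candidate) {
            return Bwrap::System(candidate);
        }
    }
    Bwrap::Vendored
}
```

Passing the executable check as a closure keeps the sketch testable without touching the real filesystem. The --argv0 capability probe and the startup warnings are omitted.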

Windows

Legacy SandboxPolicy / sandbox_mode configs are still supported on Windows. Legacy read-only and workspace-write policies imply full filesystem read access; exact readable roots are represented by split filesystem policies instead.

The elevated Windows sandbox also supports:

- legacy ReadOnly and WorkspaceWrite behavior
- split filesystem policies that need exact readable roots, exact writable roots, or extra read-only carveouts under writable roots
- backend-managed system read roots required for basic execution, such as C:\Windows, C:\Program Files, C:\Program Files (x86), and C:\ProgramData, when a split filesystem policy requests platform defaults

The unelevated restricted-token backend still supports the legacy full-read Windows model for legacy ReadOnly and WorkspaceWrite behavior. It also supports a narrow split-filesystem subset: full-read split policies whose writable roots still match the legacy WorkspaceWrite root set, but add extra read-only carveouts under those writable roots.

New [permissions] / split filesystem policies remain supported on Windows only when they can be enforced directly by the selected Windows backend or round-trip through the legacy SandboxPolicy model without changing semantics. Policies that would require direct explicit unreadable carveouts (none) or reopened writable descendants under read-only carveouts still fail closed instead of running with weaker enforcement.

All Platforms

Expects the binary containing codex-core to simulate the virtual apply_patch CLI when arg1 is --codex-run-as-apply-patch. See the codex-arg0 crate for details.
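
A rough sketch of the argument-based dispatch that the Linux and All Platforms notes describe. The `Mode` enum and `dispatch` function are hypothetical, and matching on the file name of arg0 is an assumption; the codex-arg0 crate is the authoritative reference for the real behavior.

```rust
use std::path::Path;

/// Hypothetical modes a multi-call binary could enter.
#[derive(Debug, PartialEq)]
enum Mode {
    LinuxSandbox,
    ApplyPatch,
    Normal,
}

/// Decide the mode from the raw argument list (`args[0]` is arg0).
fn dispatch(args: &[&str]) -> Mode {
    // Compare only the file name of arg0, so a full path also matches.
    let arg0_name = args
        .first()
        .and_then(|arg0| Path::new(arg0).file_name())
        .and_then(|name| name.to_str());
    if arg0_name == Some("codex-linux-sandbox") {
        return Mode::LinuxSandbox;
    }
    if args.get(1) == Some(&"--codex-run-as-apply-patch") {
        return Mode::ApplyPatch;
    }
    Mode::Normal
}
```

In a real binary the list would come from `std::env::args()`; taking a slice here keeps the sketch easy to exercise.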