codex

mirror of https://github.com/openai/codex.git synced 2026-04-30 17:36:40 +00:00

Author	SHA1	Message	Date
Michael Bolin	b4e9baaaff	sandboxing: plumb split sandbox policies through runtime	2026-03-06 13:03:45 -08:00
Michael Bolin	929eeaf2c9	config: add v3 filesystem permission profiles	2026-03-06 13:03:45 -08:00
sayan-oai	8a54d3caaa	feat: structured plugin parsing (#13711 ) #### What Add structured `@plugin` parsing and TUI support for plugin mentions. - Core: switch from plain-text `@display_name` parsing to structured `plugin://...` mentions via `UserInput::Mention` and `[$...](plugin://...)` links in text, same pattern as apps/skills. - TUI: add plugin mention popup, autocomplete, and chips when typing `$`. Load plugin capability summaries and feed them into the composer; plugin mentions appear alongside skills and apps. - Generalize mention parsing to a sigil parameter, still defaults to `$` <img width="797" height="119" alt="image" src="https://github.com/user-attachments/assets/f0fe2658-d908-4927-9139-73f850805ceb" /> Builds on #13510. Currently clients have to build their own `id` via `plugin@marketplace` and filter plugins to show by `enabled`, but we will add `id` and `available` as fields returned from `plugin/list` soon. ####Tests Added tests, verified locally.	2026-03-06 11:08:36 -08:00
Charley Cunningham	cb1a182bbe	Clarify sandbox permission override helper semantics (#13703 ) ## Summary Today `SandboxPermissions::requires_additional_permissions()` does not actually mean "is `WithAdditionalPermissions`". It returns `true` for any non-default sandbox override, including `RequireEscalated`. That broad behavior is relied on in multiple `main` callsites. The naming is security-sensitive because `SandboxPermissions` is used on shell-like tool calls to tell the executor how a single command should relate to the turn sandbox: - `UseDefault`: run with the turn sandbox unchanged - `RequireEscalated`: request execution outside the sandbox - `WithAdditionalPermissions`: stay sandboxed but widen permissions for that command only ## Problem The old helper name reads as if it only applies to the `WithAdditionalPermissions` variant. In practice it means "this command requested any explicit sandbox override." That ambiguity made it easy to read production checks incorrectly and made the guardian change look like a standalone `main` fix when it is not. On `main` today: - `shell` and `unified_exec` intentionally reject any explicit `sandbox_permissions` request unless approval policy is `OnRequest` - `exec_policy` intentionally treats any explicit sandbox override as prompt-worthy in restricted sandboxes - tests intentionally serialize both `RequireEscalated` and `WithAdditionalPermissions` as explicit sandbox override requests So changing those callsites from the broad helper to a narrow `WithAdditionalPermissions` check would be a behavior change, not a pure cleanup. ## What This PR Does - documents `SandboxPermissions` as a per-command sandbox override, not a generic permissions bag - adds `requests_sandbox_override()` for the broad meaning: anything except `UseDefault` - adds `uses_additional_permissions()` for the narrow meaning: only `WithAdditionalPermissions` - keeps `requires_additional_permissions()` as a compatibility alias to the broad meaning for now - updates the current broad callsites to use the accurately named broad helper - adds unit coverage that locks in the semantics of all three helpers ## What This PR Does Not Do This PR does not change runtime behavior. That is intentional. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-06 09:57:48 -08:00
Won Park	ee1a20258a	Enabling CWD Saving for Image-Gen (#13607 ) Codex now saves the generated image on to your current working directory.	2026-03-06 00:47:21 -08:00
Ahmed Ibrahim	629cb15bc6	Replay thread rollback from rollout history (#13615 ) - Replay thread rollback from the persisted rollout history instead of truncating in-memory state.\n- Add rollback coverage, including rollback-behind-compaction snapshot coverage.	2026-03-05 16:40:09 -08:00
Ahmed Ibrahim	6cf0ed4e79	Refine realtime startup context formatting (#13560 ) ## Summary - group recent work by git repo when available, otherwise by directory - render recent work as bounded user asks with per-thread cwd context - exclude hidden files and directories from workspace trees	2026-03-05 16:31:20 -08:00
sayan-oai	4e77ea0ec7	add @plugin mentions (#13510 ) ## Note-- added plugin mentions via @, but that conflicts with file mentions depends and builds upon #13433. - introduces explicit `@plugin` mentions. this injects the plugin's mcp servers, app names, and skill name format into turn context as a dev message. - we do not yet have UI for these mentions, so we currently parse raw text (as opposed to skills and apps which have UI chips, autocomplete, etc.) this depends on a `plugins/list` app-server endpoint we can feed the UI with, which is upcoming - also annotate mcp and app tool descriptions with the plugin(s) they come from. this gives the model a first class way of understanding what tools come from which plugins, which will help implicit invocation. ### Tests Added and updated tests, unit and integration. Also confirmed locally a raw `@plugin` injects the dev message, and the model knows about its apps, mcps, and skills.	2026-03-06 00:03:39 +00:00
Owen Lin	aa3fe8abf8	feat(core): persist trace_id for turns in RolloutItem::TurnContext (#13602 ) This PR adds a durable trace linkage for each turn by storing the active trace ID on the rollout TurnContext record stored in session rollout files. Before this change, we propagated trace context at runtime but didn’t persist a stable per-turn trace key in rollout history. That made after-the-fact debugging harder (for example, mapping a historical turn to the corresponding trace in datadog). This sets us up for much easier debugging in the future. ### What changed - Added an optional `trace_id` to TurnContextItem (rollout schema). - Added a small OTEL helper to read the current span trace ID. - Captured `trace_id` when creating `TurnContext` and included it in `to_turn_context_item()`. - Updated tests and fixtures that construct TurnContextItem so older/no-trace cases still work. ### Why this approach TurnContext is already the canonical durable per-turn metadata in rollout. This keeps ownership clean: trace linkage lives with other persisted turn metadata.	2026-03-05 13:26:48 -08:00
Curtis 'Fjord' Hawthorne	657841e7f5	Persist initialized js_repl bindings after failed cells (#13482 ) ## Summary - Change `js_repl` failed-cell persistence so later cells keep prior bindings plus only the current-cell bindings whose initialization definitely completed before the throw. - Preserve initialized lexical bindings across failed cells via module-namespace readability, including top-level destructuring that partially succeeds before a later throw. - Preserve hoisted `var` and `function` bindings only when execution clearly reached their declaration site, and preserve direct top-level pre-declaration `var` writes and updates through explicit write-site markers. - Preserve top-level `for...in` / `for...of` `var` bindings when the loop body executes at least once, using a first-iteration guard to avoid per-iteration bookkeeping overhead. - Keep prior module state intact across link-time failures and evaluation failures before the prelude runs, while still allowing failed cells that already recreated prior bindings to persist updates to those existing bindings. - Hide internal commit hooks from user `js_repl` code after the prelude aliases them, so snippets cannot spoof committed bindings by calling the raw `import.meta` hooks directly. - Add focused regression coverage for the supported failed-cell behaviors and the intentionally unsupported boundaries. - Update `js_repl` docs and generated instructions to describe the new, narrower failed-cell persistence model. ## Motivation We saw `js_repl` drop bindings that had already been initialized successfully when a later statement in the same cell threw, for example: const { context: liveContext, session } = await initializeGoogleSheetsLiveForTab(tab); // later statement throws That was surprising in practice because successful earlier work disappeared from the next cell. This change makes failed-cell persistence more useful without trying to model every possible partially executed JavaScript edge case. The resulting behavior is narrower and easier to reason about: - prior bindings are always preserved - lexical bindings persist when their initialization completed before the throw - hoisted `var` / `function` bindings persist only when execution clearly reached their declaration or a supported top-level `var` write site - failed cells that already recreated prior bindings can persist writes to those existing bindings even if they introduce no new bindings The detailed edge-case matrix stays in `docs/js_repl.md`. The model-facing `project_doc` guidance is intentionally shorter and focused on generation-relevant behavior. ## Supported Failed-Cell Behavior - Prior bindings remain available after a failed cell. - Initialized lexical bindings remain available after a failed cell. - Top-level destructuring like `const { a, b } = ...` preserves names whose initialization completed before a later throw. - Hoisted `function` bindings persist when execution reached the declaration statement before the throw. - Direct top-level pre-declaration `var` writes and updates persist, for example: - `x = 1` - `x += 1` - `x++` - short-circuiting logical assignments only persist when the write branch actually runs - Non-empty top-level `for...in` / `for...of` `var` loops persist their loop bindings. - Failed cells can persist updates to existing carried bindings after the prelude has run, even when the cell commits no new bindings. - Link failures and eval failures before the prelude do not poison `@prev`. ## Intentionally Unsupported Failed-Cell Cases - Hoisted function reads before the declaration, such as `foo(); ...; function foo() {}` - Aliasing or inference-based recovery from reads before declaration - Nested writes inside already-instrumented assignment RHS expressions - Destructuring-assignment recovery for hoisted `var` - Partial `var` destructuring recovery - Pre-declaration `undefined` reads for hoisted `var` - Empty top-level `for...in` / `for...of` loop vars - Nested or scope-sensitive pre-declaration `var` writes outside direct top-level expression statements	2026-03-05 11:01:46 -08:00
sayan-oai	03d55f0e6f	chore: add web_search_tool_type for image support (#13538 ) add `web_search_tool_type` on model_info that can be populated from backend. will be used to filter which models can use `web_search` with images and which cant. added small unit test.	2026-03-05 07:02:27 +00:00
sayan-oai	d44398905b	feat: track plugins mcps/apps and add plugin info to user_instructions (#13433 ) ### first half of changes, followed by #13510 Track plugin capabilities as derived summaries on `PluginLoadOutcome` for enabled plugins with at least one skill/app/mcp. Also add `Plugins` section to `user_instructions` injected on session start. These introduce the plugins concept and list enabled plugins, but do NOT currently include paths to enabled plugins or details on what apps/mcps the plugins contain (current plan is to inject this on @-mention). that can be adjusted in a follow up and based on evals. ### tests Added/updated tests, confirmed locally that new `Plugins` section + currently enabled plugins show up in `user_instructions`.	2026-03-04 19:46:13 -08:00
Won Park	229e6d0347	image-gen-event/client_processing (#13512 ) enabling client-side to process with image-generation capabilities (setting app-server)	2026-03-04 16:54:38 -08:00
Ahmed Ibrahim	294079b0b1	Prefix handoff messages with role (#13505 ) Format handoff context by prefixing each message with its role (for example "user:" and "assistant:") before forwarding to the agent.	2026-03-04 15:37:31 -08:00
Won Park	fa2306b303	image-gen-core (#13290 ) Core tool-calling for image-gen, handles requesting and receiving logic for images using response API	2026-03-03 23:11:28 -08:00
Val Kharitonov	4f6c4bb143	support 'flex' tier in app-server in addition to 'fast' (#13391 )	2026-03-03 22:46:05 -08:00
Michael Bolin	7134220f3c	core: box wrapper futures to reduce stack pressure (#13429 ) Follow-up to [#13388](https://github.com/openai/codex/pull/13388). This uses the same general fix pattern as [#12421](https://github.com/openai/codex/pull/12421), but in the `codex-core` compact/resume/fork path. ## Why `compact_resume_after_second_compaction_preserves_history` started overflowing the stack on Windows CI after `#13388`. The important part is that this was not a compaction-recursion bug. The test exercises a path with several thin `async fn` wrappers around much larger thread-spawn, resume, and fork futures. When one `async fn` awaits another inline, the outer future stores the callee future as part of its own state machine. In a long wrapper chain, that means a caller can accidentally inline a lot more state than the source code suggests. That is exactly what was happening here: - `ThreadManager` convenience methods such as `start_thread`, `resume_thread_from_rollout`, and `fork_thread` were inlining the larger spawn/resume futures beneath them. - `core_test_support::test_codex` added another wrapper layer on top of those same paths. - `compact_resume_fork` adds a few more helpers, and this particular test drives the resume/fork path multiple times. On Windows, that was enough to push both the libtest thread and Tokio worker threads over the edge. The previous 8 MiB test-thread workaround proved the failure was stack-related, but it did not address the underlying future size. ## How This Was Debugged The useful debugging pattern here was to turn the CI-only failure into a local low-stack repro. 1. First, remove the explicit large-stack harness so the test runs on the normal `#[tokio::test]` path. 2. Build the test binary normally. 3. Re-run the already-built `tests/all` binary directly with progressively smaller `RUST_MIN_STACK` values. Running the built binary directly matters: it keeps the reduced stack size focused on the test process instead of also applying it to `cargo` and `rustc`. That made it possible to answer two questions quickly: - Does the failure still reproduce without the workaround? Yes. - Does boxing the wrapper futures actually buy back stack headroom? Also yes. After this change, the built test binary passes with `RUST_MIN_STACK=917504` and still overflows at `786432`, which is enough evidence to justify removing the explicit 8 MiB override while keeping a deterministic low-stack repro for future debugging. If we hit a similar issue again, the first places to inspect are thin `async fn` wrappers that mostly forward into a much larger async implementation. ## `Box::pin()` Primer `async fn` compiles into a state machine. If a wrapper does this: ```rust async fn wrapper() { inner().await; } ``` then `wrapper()` stores the full `inner()` future inline as part of its own state. If the wrapper instead does this: ```rust async fn wrapper() { Box::pin(inner()).await; } ``` then the child future lives on the heap, and the outer future only stores a pinned pointer to it. That usually trades one allocation for a substantially smaller outer future, which is exactly the tradeoff we want when the problem is stack pressure rather than raw CPU time. Useful references: - [`Box::pin`](https://doc.rust-lang.org/std/boxed/struct.Box.html#method.pin) - [Async book: Pinning](https://rust-lang.github.io/async-book/04_pinning/01_chapter.html) ## What Changed - Boxed the wrapper futures in `core/src/thread_manager.rs` around `start_thread`, `resume_thread_from_rollout`, `fork_thread`, and the corresponding `ThreadManagerState` spawn helpers so callers no longer inline the full spawn/resume state machine through multiple layers. - Boxed the matching test-only wrapper futures in `core/tests/common/test_codex.rs` and `core/tests/suite/compact_resume_fork.rs`, which sit directly on top of the same path. - Restored `compact_resume_after_second_compaction_preserves_history` in `core/tests/suite/compact_resume_fork.rs` to a normal `#[tokio::test]` and removed the explicit `TEST_STACK_SIZE_BYTES` thread/runtime sizing. - Simplified a tiny helper in `compact_resume_fork` by making `fetch_conversation_path()` synchronous, which removes one more unnecessary future layer from the test path. ## Verification - `cargo test -p codex-core --test all suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history -- --exact --nocapture` - `cargo test -p codex-core --test all suite::compact_resume_fork -- --nocapture` - Re-ran the built `codex-core` `tests/all` binary directly with reduced stack sizes: - `RUST_MIN_STACK=917504` passes - `RUST_MIN_STACK=786432` still overflows - `cargo test -p codex-core` - Still fails locally in unrelated existing integration areas that expect the `codex` / `test_stdio_server` binaries or hit the existing `search_tool` wiremock mismatches.	2026-03-04 05:44:52 +00:00
Michael Bolin	bfff0c729f	config: enforce enterprise feature requirements (#13388 ) ## Why Enterprises can already constrain approvals, sandboxing, and web search through `requirements.toml` and MDM, but feature flags were still only configurable as managed defaults. That meant an enterprise could suggest feature values, but it could not actually pin them. This change closes that gap and makes enterprise feature requirements behave like the other constrained settings. The effective feature set now stays consistent with enterprise requirements during config load, when config writes are validated, and when runtime code mutates feature flags later in the session. It also tightens the runtime API for managed features. `ManagedFeatures` now follows the same constraint-oriented shape as `Constrained<T>` instead of exposing panic-prone mutation helpers, and production code can no longer construct it through an unconstrained `From<Features>` path. The PR also hardens the `compact_resume_fork` integration coverage on Windows. After the feature-management changes, `compact_resume_after_second_compaction_preserves_history` was overflowing the libtest/Tokio thread stacks on Windows, so the test now uses an explicit larger-stack harness as a pragmatic mitigation. That may not be the ideal root-cause fix, and it merits a parallel investigation into whether part of the async future chain should be boxed to reduce stack pressure instead. ## What Changed Enterprises can now pin feature values in `requirements.toml` with the requirements-side `features` table: ```toml [features] personality = true unified_exec = false ``` Only canonical feature keys are allowed in the requirements `features` table; omitted keys remain unconstrained. - Added a requirements-side pinned feature map to `ConfigRequirementsToml`, threaded it through source-preserving requirements merge and normalization in `codex-config`, and made the TOML surface use `[features]` (while still accepting legacy `[feature_requirements]` for compatibility). - Exposed `featureRequirements` from `configRequirements/read`, regenerated the JSON/TypeScript schema artifacts, and updated the app-server README. - Wrapped the effective feature set in `ManagedFeatures`, backed by `ConstrainedWithSource<Features>`, and changed its API to mirror `Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`, and result-returning `enable` / `disable` / `set_enabled` helpers. - Removed the legacy-usage and bulk-map passthroughs from `ManagedFeatures`; callers that need those behaviors now mutate a plain `Features` value and reapply it through `set(...)`, so the constrained wrapper remains the enforcement boundary. - Removed the production loophole for constructing unconstrained `ManagedFeatures`. Non-test code now creates it through the configured feature-loading path, and `impl From<Features> for ManagedFeatures` is restricted to `#[cfg(test)]`. - Rejected legacy feature aliases in enterprise feature requirements, and return a load error when a pinned combination cannot survive dependency normalization. - Validated config writes against enterprise feature requirements before persisting changes, including explicit conflicting writes and profile-specific feature states that normalize into invalid combinations. - Updated runtime and TUI feature-toggle paths to use the constrained setter API and to persist or apply the effective post-constraint value rather than the requested value. - Updated the `core_test_support` Bazel target to include the bundled core model-catalog fixtures in its runtime data, so helper code that resolves `core/models.json` through runfiles works in remote Bazel test environments. - Renamed the core config test coverage to emphasize that effective feature values are normalized at runtime, while conflicting persisted config writes are rejected. - Ran `compact_resume_after_second_compaction_preserves_history` inside an explicit 8 MiB test thread and Tokio runtime worker stack, following the existing larger-stack integration-test pattern, to keep the Windows `compact_resume_fork` test slice from aborting while a parallel investigation continues into whether some of the underlying async futures should be boxed. ## Verification - `cargo test -p codex-config` - `cargo test -p codex-core feature_requirements_ -- --nocapture` - `cargo test -p codex-core load_requirements_toml_produces_expected_constraints -- --nocapture` - `cargo test -p codex-core compact_resume_after_second_compaction_preserves_history -- --nocapture` - `cargo test -p codex-core compact_resume_fork -- --nocapture` - Re-ran the built `codex-core` `tests/all` binary with `RUST_MIN_STACK=262144` for `compact_resume_after_second_compaction_preserves_history` to confirm the explicit-stack harness fixes the deterministic low-stack repro. - `cargo test -p codex-core` - This still fails locally in unrelated integration areas that expect the `codex` / `test_stdio_server` binaries or hit existing `search_tool` wiremock mismatches. ## Docs `developers.openai.com/codex` should document the requirements-side `[features]` table for enterprise and MDM-managed configuration, including that it only accepts canonical feature keys and that conflicting config writes are rejected.	2026-03-04 04:40:22 +00:00
sayan-oai	082682a628	feat: load plugin apps (#13401 ) load plugin-apps from `.app.json`. make apps runtime-mentionable iff `codex_apps` MCP actually exposes tools for that `connector_id`. if the app isn't available, it's filtered out of runtime connector set, so no tools are added and no app-mentions resolve. right now we don't have a clean cli-side error for an app not being installed. can look at this after. ### Tests Added tests, tested locally that using a plugin that bundles an app picks up the app.	2026-03-03 16:29:15 -08:00
Curtis 'Fjord' Hawthorne	c4cb594e73	Make js_repl image output controllable (#13331 ) ## Summary Instead of always adding inner function call outputs to the model context, let js code decide which ones to return. - Stop auto-hoisting nested tool outputs from `codex.tool(...)` into the outer `js_repl` function output. - Keep `codex.tool(...)` return values unchanged as structured JS objects. - Add `codex.emitImage(...)` as the explicit path for attaching an image to the outer `js_repl` function output. - Support emitting from a direct image URL, a single `input_image` item, an explicit `{ bytes, mimeType }` object, or a raw tool response object containing exactly one image. - Preserve existing `view_image` original-resolution behavior when JS emits the raw `view_image` tool result. - Suppress the special `ViewImageToolCall` event for `js_repl`-sourced `view_image` calls so nested inspection stays side-effect free until JS explicitly emits. - Update the `js_repl` docs and generated project instructions with both recommended patterns: - `await codex.emitImage(codex.tool("view_image", { path }))` - `await codex.emitImage({ bytes: await page.screenshot({ type: "jpeg", quality: 85 }), mimeType: "image/jpeg" })` #### [git stack](https://github.com/magus/git-stack-cli) - ✅ `1` https://github.com/openai/codex/pull/13050 - 👉 `2` https://github.com/openai/codex/pull/13331 - ⏳ `3` https://github.com/openai/codex/pull/13049	2026-03-03 16:25:59 -08:00
Curtis 'Fjord' Hawthorne	b92146d48b	Add under-development original-resolution view_image support (#13050 ) ## Summary Add original-resolution support for `view_image` behind the under-development `view_image_original_resolution` feature flag. When the flag is enabled and the target model is `gpt-5.3-codex` or newer, `view_image` now preserves original PNG/JPEG/WebP bytes and sends `detail: "original"` to the Responses API instead of using the legacy resize/compress path. ## What changed - Added `view_image_original_resolution` as an under-development feature flag. - Added `ImageDetail` to the protocol models and support for serializing `detail: "original"` on tool-returned images. - Added `PromptImageMode::Original` to `codex-utils-image`. - Preserves original PNG/JPEG/WebP bytes. - Keeps legacy behavior for the resize path. - Updated `view_image` to: - use the shared `local_image_content_items_with_label_number(...)` helper in both code paths - select original-resolution mode only when: - the feature flag is enabled, and - the model slug parses as `gpt-5.3-codex` or newer - Kept local user image attachments on the existing resize path; this change is specific to `view_image`. - Updated history/image accounting so only `detail: "original"` images use the docs-based GPT-5 image cost calculation; legacy images still use the old fixed estimate. - Added JS REPL guidance, gated on the same feature flag, to prefer JPEG at 85% quality unless lossless is required, while still allowing other formats when explicitly requested. - Updated tests and helper code that construct `FunctionCallOutputContentItem::InputImage` to carry the new `detail` field. ## Behavior ### Feature off - `view_image` keeps the existing resize/re-encode behavior. - History estimation keeps the existing fixed-cost heuristic. ### Feature on + `gpt-5.3-codex+` - `view_image` sends original-resolution images with `detail: "original"`. - PNG/JPEG/WebP source bytes are preserved when possible. - History estimation uses the GPT-5 docs-based image-cost calculation for those `detail: "original"` images. #### [git stack](https://github.com/magus/git-stack-cli) - 👉 `1` https://github.com/openai/codex/pull/13050 - ⏳ `2` https://github.com/openai/codex/pull/13331 - ⏳ `3` https://github.com/openai/codex/pull/13049	2026-03-03 15:56:54 -08:00
xl-openai	9b004e2db1	Refactor plugin config and cache path (#13333 ) Update config.toml plugin entries to use <plugin_name>@<marketplace_name> as the key. Plugin now stays in [plugins/cache/marketplace-name/plugin-name/$version/] Clean up the plugin code structure. Add plugin install functionality (not used yet).	2026-03-03 15:00:18 -08:00
Ahmed Ibrahim	041c896509	Revert "Revert "realtime prompt changes"" (#13398 ) Reverts openai/codex#13385	2026-03-03 14:41:26 -08:00
Ahmed Ibrahim	6bee02a346	Build delegated realtime handoff text from all messages (#13395 ) ## Summary - Route delegated realtime handoff turns from all handoff message texts, preserving order - Fallback to input_transcript only when no messages are present - Add regression coverage for multi-message handoff requests	2026-03-03 14:07:51 -08:00
pakrym-oai	69df12efb3	Remove Responses V1 websocket implementation (#13364 ) V2 is the way to go!	2026-03-03 11:32:53 -07:00
pash-openai	2f5b01abd6	add fast mode toggle (#13212 ) - add a local Fast mode setting in codex-core (similar to how model id is currently stored on disk locally) - send `service_tier=priority` on requests when Fast is enabled - add `/fast` in the TUI and persist it locally - feature flag	2026-03-02 20:29:33 -08:00
Celia Chen	0bb152b01d	chore: remove SkillMetadata.permissions and derive skill sandboxing from permission_profile (#13061 ) ## Summary This change removes the compiled permissions field from skill metadata and keeps permission_profile as the single source of truth. Skill loading no longer compiles skill permissions eagerly. Instead, the zsh-fork skill escalation path compiles `skill.permission_profile` when it needs to determine the sandbox to apply for a skill script. ## Behavior change For skills that declare: ``` permissions: {} ``` we now treat that the same as having no skill permissions override, instead of creating and using a default readonly sandbox. This change makes the behavior more intuitive: - only non-empty skill permission profiles affect sandboxing - omitting permissions and writing permissions: {} now mean the same thing - skill metadata keeps a single permissions representation instead of storing derived state too Overall, this makes skill sandbox behavior easier to understand and more predictable.	2026-03-03 01:29:53 +00:00
Ahmed Ibrahim	b20b6aa46f	Update realtime websocket API (#13265 ) - migrate the realtime websocket transport to the new session and handoff flow - make the realtime model configurable in config.toml and use API-key auth for the websocket --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-02 16:05:40 -08:00
jif-oai	1905597017	feat: update memories config names (#13237 )	2026-03-02 15:25:39 +00:00
jif-oai	b649953845	feat: polluted memories (#13008 ) Add a feature flag to disable memory creation for "polluted"	2026-03-02 11:57:32 +00:00
Ahmed Ibrahim	0aeb55bf08	Record realtime close marker on replacement (#13058 ) ## Summary - record a realtime close developer message when a new realtime session replaces an active one - assert the replacement marker through the mocked responses request path --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Charles Cunningham <ccunningham@openai.com>	2026-03-01 13:54:12 -08:00
xl-openai	752402c4fe	feat: load from plugins (#12864 ) Support loading plugins. Plugins can now be enabled via [plugins.<name>] in config.toml. They are loaded as first-class entities through PluginsManager, and their default skills/ and .mcp.json contributions are integrated into the existing skills and MCP flows.	2026-03-01 10:50:56 -08:00
Michael Bolin	1a8d930267	core: adopt host_executable() rules in zsh-fork (#13046 ) ## Why [#12964](https://github.com/openai/codex/pull/12964) added `host_executable()` support to `codex-execpolicy`, but the zsh-fork interception path in `unix_escalation.rs` was still evaluating commands with the default exact-token matcher. That meant an intercepted absolute executable such as `/usr/bin/git status` could still miss basename rules like `prefix_rule(pattern = ["git", "status"])`, even when the policy also defined a matching `host_executable(name = "git", ...)` entry. This PR adopts the new matching behavior in the zsh-fork runtime only. That keeps the rollout intentionally narrow: zsh-fork already requires explicit user opt-in, so it is a safer first caller to exercise the new `host_executable()` scheme before expanding it to other execpolicy call sites. It also brings zsh-fork back in line with the current `prefix_rule()` execution model. Until prefix rules can carry their own permission profiles, a matched `prefix_rule()` is expected to rerun the intercepted command unsandboxed on `allow`, or after the user accepts `prompt`, instead of merely continuing inside the inherited shell sandbox. ## What Changed - added `evaluate_intercepted_exec_policy()` in `core/src/tools/runtimes/shell/unix_escalation.rs` to centralize execpolicy evaluation for intercepted commands - switched intercepted direct execs in the zsh-fork path to `check_multiple_with_options(...)` with `MatchOptions { resolve_host_executables: true }` - added `commands_for_intercepted_exec_policy()` so zsh-fork policy evaluation works from intercepted `(program, argv)` data instead of reconstructing a synthetic command before matching - left shell-wrapper parsing intentionally disabled by default behind `ENABLE_INTERCEPTED_EXEC_POLICY_SHELL_WRAPPER_PARSING`, so path-sensitive matching relies on later direct exec interception rather than shell-script parsing - made matched `prefix_rule()` decisions rerun intercepted commands with `EscalationExecution::Unsandboxed`, while unmatched-command fallback keeps the existing sandbox-preserving behavior - extracted the zsh-fork test harness into `core/tests/common/zsh_fork.rs` so both the skill-focused and approval-focused integration suites can exercise the same runtime setup - limited this change to the intercepted zsh-fork path rather than changing every execpolicy caller at once - added runtime coverage in `core/src/tools/runtimes/shell/unix_escalation_tests.rs` for allowed and disallowed `host_executable()` mappings and the wrapper-parsing modes - added integration coverage in `core/tests/suite/approvals.rs` to verify a saved `prefix_rule(pattern=["touch"], decision="allow")` reruns under zsh-fork outside a restrictive `WorkspaceWrite` sandbox --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13046). * #13065 * __->__ #13046	2026-02-28 01:41:23 +00:00
Charley Cunningham	695957a348	Unify rollout reconstruction with resume/fork TurnContext hydration (#12612 ) ## Summary This PR unifies rollout history reconstruction and resume/fork metadata hydration under a single `Session::reconstruct_history_from_rollout` implementation. The key change from main is that replay metadata now comes from the same reconstruction pass that rebuilds model-visible history, instead of doing a second bespoke rollout scan to recover `previous_model` / `reference_context_item`. ## What Changed ### Unified reconstruction output `reconstruct_history_from_rollout` now returns a single `RolloutReconstruction` bundle containing: - rebuilt `history` - `previous_model` - `reference_context_item` Resume and fork both consume that shared output directly. ### Reverse replay core The reconstruction logic moved into `codex-rs/core/src/codex/rollout_reconstruction.rs` and now scans rollout items newest-to-oldest. That reverse pass: - derives `previous_model` - derives whether `reference_context_item` is preserved or cleared - stops early once it has both resume metadata and a surviving `replacement_history` checkpoint History materialization is still bridged eagerly for now by replaying only the surviving suffix forward, which keeps the history result stable while moving the control flow toward the future lazy reverse loader design. ### Removed bespoke context lookup This deletes `last_rollout_regular_turn_context_lookup` and its separate compaction-aware scan. The previous model / baseline metadata is now computed from the same replay state that rebuilds history, so resume/fork cannot drift from the reconstructed transcript view. ### `TurnContextItem` persistence contract `TurnContextItem` is now treated as the replay source of truth for durable model-visible baselines. This PR keeps the following contract explicit: - persist `TurnContextItem` for the first real user turn so resume can recover `previous_model` - persist it for later turns that emit model-visible context updates - if mid-turn compaction reinjects full initial context into replacement history, persist a fresh `TurnContextItem` after `Compacted` so resume/fork can re-establish the baseline from the rewritten history - do not treat manual compaction or pre-sampling compaction as creating a new durable baseline on their own ## Behavior Preserved - rollback replay stays aligned with `drop_last_n_user_turns` - rollback skips only user turns - incomplete active user turns are dropped before older finalized turns when rollback applies - unmatched aborts do not consume the current active turn - missing abort IDs still conservatively clear stale compaction state - compaction clears `reference_context_item` until a later `TurnContextItem` re-establishes it - `previous_model` still comes from the newest surviving user turn that established one ## Tests Targeted validation run for the current branch shape: - `cd codex-rs && cargo test -p codex-core --lib codex::rollout_reconstruction_tests -- --nocapture` - `cd codex-rs && just fmt` The branch also extracts the rollout reconstruction tests into `codex-rs/core/src/codex/rollout_reconstruction_tests.rs` so this logic has a dedicated home instead of living inline in `codex.rs`.	2026-02-27 13:50:45 -08:00
Michael Bolin	d09a7535ed	fix: use AbsolutePathBuf for permission profile file roots (#12970 ) ## Why `PermissionProfile` should describe filesystem roots as absolute paths at the type level. Using `PathBuf` in `FileSystemPermissions` made the shared type too permissive and blurred together three different deserialization cases: - skill metadata in `agents/openai.yaml`, where relative paths should resolve against the skill directory - app-server API payloads, where callers should have to send absolute paths - local tool-call payloads for commands like `shell_command` and `exec_command`, where `additional_permissions.file_system` may legitimately be relative to the command `workdir` This change tightens the shared model without regressing the existing local command flow. ## What Changed - changed `protocol::models::FileSystemPermissions` and the app-server `AdditionalFileSystemPermissions` mirror to use `AbsolutePathBuf` - wrapped skill metadata deserialization in `AbsolutePathBufGuard`, so relative permission roots in `agents/openai.yaml` resolve against the containing skill directory - kept app-server/API deserialization strict, so relative `additionalPermissions.fileSystem.*` paths are rejected at the boundary - restored cwd/workdir-relative deserialization for local tool-call payloads by parsing `shell`, `shell_command`, and `exec_command` arguments under an `AbsolutePathBufGuard` rooted at the resolved command working directory - simplified runtime additional-permission normalization so it only canonicalizes and deduplicates absolute roots instead of trying to recover relative ones later - updated the app-server schema fixtures, `app-server/README.md`, and the affected transport/TUI tests to match the final behavior	2026-02-27 17:42:52 +00:00
jif-oai	8cf5b00aef	fix: more stable notify script (#13011 )	2026-02-27 16:05:44 +01:00
Ahmed Ibrahim	4d180ae428	Add model availability NUX metadata (#12972 ) - replace show_nux with structured availability_nux model metadata - expose availability NUX data through the app-server model API - update shared fixtures and tests for the new field	2026-02-26 22:02:57 -08:00
Eric Traut	cee009d117	Add oauth_resource handling for MCP login flows (#12866 ) Addresses bug https://github.com/openai/codex/issues/12589 Builds on community PR #12763. This adds `oauth_resource` support for MCP `streamable_http` servers and wires it through the relevant config and login paths. It fixes the bug where the configured OAuth resource was not reliably included in the authorization request, causing MCP login to omit the expected `resource` parameter.	2026-02-26 20:10:12 -08:00
Curtis 'Fjord' Hawthorne	7e980d7db6	Support multimodal custom tool outputs (#12948 ) ## Summary This changes `custom_tool_call_output` to use the same output payload shape as `function_call_output`, so freeform tools can return either plain text or structured content items. The main goal is to let `js_repl` return image content from nested `view_image` calls in its own `custom_tool_call_output`, instead of relying on a separate injected message. ## What changed - Changed `custom_tool_call_output.output` from `string` to `FunctionCallOutputPayload` - Updated freeform tool plumbing to preserve structured output bodies - Updated `js_repl` to aggregate nested tool content items and attach them to the outer `js_repl` result - Removed the old `js_repl` special case that injected `view_image` results as a separate pending user image message - Updated normalization/history/truncation paths to handle multimodal `custom_tool_call_output` - Regenerated app-server protocol schema artifacts ## Behavior Direct `view_image` calls still return a `function_call_output` with image content. When `view_image` is called inside `js_repl`, the outer `js_repl` `custom_tool_call_output` now carries: - an `input_text` item if the JS produced text output - one or more `input_image` items from nested tool results So the nested image result now stays inside the `js_repl` tool output instead of being injected as a separate message. ## Compatibility This is intended to be backward-compatible for resumed conversations. Older histories that stored `custom_tool_call_output.output` as a plain string still deserialize correctly, and older histories that used the previous injected-image-message flow also continue to resume. Added regression coverage for resuming a pre-change rollout containing: - string-valued `custom_tool_call_output` - legacy injected image message history #### [git stack](https://github.com/magus/git-stack-cli) - 👉 `1` https://github.com/openai/codex/pull/12948	2026-02-26 18:17:46 -08:00
Ahmed Ibrahim	a11da86b37	Make realtime audio test deterministic (#12959 ) ## Summary\n- add a websocket test-server request waiter so tests can synchronize on recorded client messages\n- use that waiter in the realtime delegation test instead of a fixed audio timeout\n- add temporary timing logs in the test and websocket mock to inspect where the flake stalls	2026-02-26 16:09:00 -08:00
Celia Chen	90cc4e79a2	feat: add local date/timezone to turn environment context (#12947 ) ## Summary This PR includes the session's local date and timezone in the model-visible environment context and persists that data in `TurnContextItem`. ## What changed - captures the current local date and IANA timezone when building a turn context, with a UTC fallback if the timezone lookup fails - includes current_date and timezone in the serialized <environment_context> payload - stores those fields on TurnContextItem so they survive rollout/history handling, subagent review threads, and resume flows - treats date/timezone changes as environment updates, so prompt caching and context refresh logic do not silently reuse stale time context - updates tests to validate the new environment fields without depending on a single hardcoded environment-context string ## test built a local build and saw it in the rollout file: ``` {"timestamp":"2026-02-26T21:39:50.737Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>\n <shell>zsh</shell>\n <current_date>2026-02-26</current_date>\n <timezone>America/Los_Angeles</timezone>\n</environment_context>"}]}} ```	2026-02-26 23:17:35 +00:00
pakrym-oai	951a389654	Allow clients not to send summary as an option (#12950 ) Summary is a required parameter on UserTurn. Ideally we'd like the core to decide the appropriate summary level. Make the summary optional and don't send it when not needed.	2026-02-26 14:37:38 -08:00
jif-oai	a6065d30f4	feat: add git info to memories (#12940 )	2026-02-26 20:14:13 +00:00
Michael Bolin	7fa9d9ae35	feat: include sandbox config with escalation request (#12839 ) ## Why Before this change, an escalation approval could say that a command should be rerun, but it could not carry the sandbox configuration that should still apply when the escalated command is actually spawned. That left an unsafe gap in the `zsh-fork` skill path: skill scripts under `scripts/` that did not declare permissions could be escalated without a sandbox, and scripts that did declare permissions could lose their bounded sandbox on rerun or cached session approval. This PR extends the escalation protocol so approvals can optionally carry sandbox configuration all the way through execution. That lets the shell runtime preserve the intended sandbox instead of silently widening access. We likely want a single permissions type for this codepath eventually, probably centered on `Permissions`. For now, the protocol needs to represent both the existing `PermissionProfile` form and the fuller `Permissions` form, so this introduces a temporary disjoint union, `EscalationPermissions`, to carry either one. Further, this means that today, a skill either: - does not declare any permissions, in which case it is run using the default sandbox for the turn - specifies permissions, in which case the skill is run using that exact sandbox, which might be more restrictive than the default sandbox for the turn We will likely change the skill's permissions to be additive to the existing permissions for the turn. ## What Changed - Added `EscalationPermissions` to `codex-protocol` so escalation requests can carry either a `PermissionProfile` or a full `Permissions` payload. - Added an explicit `EscalationExecution` mode to the shell escalation protocol so reruns distinguish between `Unsandboxed`, `TurnDefault`, and `Permissions(...)` instead of overloading `None`. - Updated `zsh-fork` shell reruns to resolve `TurnDefault` at execution time, which keeps ordinary `UseDefault` commands on the turn sandbox and preserves turn-level macOS seatbelt profile extensions. - Updated the `zsh-fork` skill path so a skill with no declared permissions inherits the conversation's effective sandbox instead of escalating unsandboxed. - Updated the `zsh-fork` skill path so a skill with declared permissions reruns with exactly those permissions, including when a cached session approval is reused. ## Testing - Added unit coverage in `core/src/tools/runtimes/shell/unix_escalation.rs` for the explicit `UseDefault` / `RequireEscalated` / `WithAdditionalPermissions` execution mapping. - Added unit coverage in `core/src/tools/runtimes/shell/unix_escalation.rs` for macOS seatbelt extension preservation in both the `TurnDefault` and explicit-permissions rerun paths. - Added integration coverage in `core/tests/suite/skill_approval.rs` for permissionless skills inheriting the turn sandbox and explicit skill permissions remaining bounded across cached approval reuse.	2026-02-26 12:00:18 -08:00
jif-oai	3404ecff15	feat: add post-compaction sub-agent infos (#12774 ) Co-authored-by: Codex <noreply@openai.com>	2026-02-26 18:55:34 +00:00
jif-oai	d3603ae5d3	feat: fork thread multi agent (#12499 )	2026-02-26 18:01:53 +00:00
pakrym-oai	ba41e84a50	Use model catalog default for reasoning summary fallback (#12873 ) ## Summary - make `Config.model_reasoning_summary` optional so unset means use model default - resolve the optional config value to a concrete summary when building `TurnContext` - add protocol support for `default_reasoning_summary` in model metadata ## Validation - `cargo test -p codex-core --lib client::tests -- --nocapture` --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-26 09:31:13 -08:00
jif-oai	c528f32acb	feat: use memory usage for selection (#12909 )	2026-02-26 16:44:02 +00:00
jif-oai	382fa338b3	feat: memories forgetting (#12900 ) Add diff based memory forgetting	2026-02-26 13:19:57 +00:00
Charley Cunningham	07aefffb1f	core: bundle settings diff updates into one dev/user envelope (#12417 ) ## Summary - bundle contextual prompt injection into at most one developer message plus one contextual user message in both: - per-turn settings updates - initial context insertion - preserve `<model_switch>` across compaction by rebuilding it through canonical initial-context injection, instead of relying on strip/reattach hacks - centralize contextual user fragment detection in one shared definition table and reuse it for parsing/compaction logic - keep `AGENTS.md` in its natural serialized format: - `# AGENTS.md instructions for {dirname}` - `<INSTRUCTIONS>...</INSTRUCTIONS>` - simplify related tests/helpers and accept the expected snapshot/layout updates from bundled multi-part messages ## Why The goal is to converge toward a simpler, more intentional prompt shape where contextual updates are consistently represented as one developer envelope plus one contextual user envelope, while keeping parsing and compaction behavior aligned with that representation. ## Notable details - the temporary `SettingsUpdateEnvelope` wrapper was removed; these paths now return `Vec<ResponseItem>` directly - local/remote compaction no longer rely on model-switch strip/restore helpers - contextual user detection is now driven by shared fragment definitions instead of ad hoc matcher assembly - AGENTS/user instructions are still the same logical context; only the synthetic `<user_instructions>` wrapper was replaced by the natural AGENTS text format ## Testing - `just fmt` - `cargo test -p codex-app-server codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages -- --exact` - `cargo test -p codex-core compact::tests::collect_user_messages_filters_session_prefix_entries --lib -- --exact` - `cargo test -p codex-core --test all 'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch' -- --exact` - `cargo test -p codex-core --test all 'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_apps_guidance_as_developer_message_when_enabled' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_developer_instructions_message_in_request' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_user_instructions_message_in_request' -- --exact` - `cargo test -p codex-core --test all 'suite::client::resume_includes_initial_messages_and_sends_prior_items' -- --exact` - `cargo test -p codex-core --test all 'suite::review::review_input_isolated_from_parent_history' -- --exact` - `cargo test -p codex-exec --test all 'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' -- --exact` - `cargo test -p core_test_support context_snapshot::tests::full_text_mode_preserves_unredacted_text -- --exact` ## Notes - I also ran several targeted `compact`, `compact_remote`, `prompt_caching`, `model_visible_layout`, and `event_mapping` tests while iterating on prompt-shape changes. - I have not claimed a clean full-workspace `cargo test` from this environment because local sandbox/resource conditions have previously produced unrelated failures in large workspace runs.	2026-02-26 00:12:08 -08:00

1 2 3 4 5 ...

675 Commits