Commit Graph

658 Commits

Author SHA1 Message Date
Michael Bolin
bfff0c729f config: enforce enterprise feature requirements (#13388)
## Why

Enterprises can already constrain approvals, sandboxing, and web search
through `requirements.toml` and MDM, but feature flags were still only
configurable as managed defaults. That meant an enterprise could suggest
feature values, but it could not actually pin them.

This change closes that gap and makes enterprise feature requirements
behave like the other constrained settings. The effective feature set
now stays consistent with enterprise requirements during config load,
when config writes are validated, and when runtime code mutates feature
flags later in the session.

It also tightens the runtime API for managed features. `ManagedFeatures`
now follows the same constraint-oriented shape as `Constrained<T>`
instead of exposing panic-prone mutation helpers, and production code
can no longer construct it through an unconstrained `From<Features>`
path.

The PR also hardens the `compact_resume_fork` integration coverage on
Windows. After the feature-management changes,
`compact_resume_after_second_compaction_preserves_history` was
overflowing the libtest/Tokio thread stacks on Windows, so the test now
uses an explicit larger-stack harness as a pragmatic mitigation. That
may not be the ideal root-cause fix, and it merits a parallel
investigation into whether part of the async future chain should be
boxed to reduce stack pressure instead.

## What Changed

Enterprises can now pin feature values in `requirements.toml` with the
requirements-side `features` table:

```toml
[features]
personality = true
unified_exec = false
```

Only canonical feature keys are allowed in the requirements `features`
table; omitted keys remain unconstrained.

- Added a requirements-side pinned feature map to
`ConfigRequirementsToml`, threaded it through source-preserving
requirements merge and normalization in `codex-config`, and made the
TOML surface use `[features]` (while still accepting legacy
`[feature_requirements]` for compatibility).
- Exposed `featureRequirements` from `configRequirements/read`,
regenerated the JSON/TypeScript schema artifacts, and updated the
app-server README.
- Wrapped the effective feature set in `ManagedFeatures`, backed by
`ConstrainedWithSource<Features>`, and changed its API to mirror
`Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
and result-returning `enable` / `disable` / `set_enabled` helpers.
- Removed the legacy-usage and bulk-map passthroughs from
`ManagedFeatures`; callers that need those behaviors now mutate a plain
`Features` value and reapply it through `set(...)`, so the constrained
wrapper remains the enforcement boundary.
- Removed the production loophole for constructing unconstrained
`ManagedFeatures`. Non-test code now creates it through the configured
feature-loading path, and `impl From<Features> for ManagedFeatures` is
restricted to `#[cfg(test)]`.
- Rejected legacy feature aliases in enterprise feature requirements,
and return a load error when a pinned combination cannot survive
dependency normalization.
- Validated config writes against enterprise feature requirements before
persisting changes, including explicit conflicting writes and
profile-specific feature states that normalize into invalid
combinations.
- Updated runtime and TUI feature-toggle paths to use the constrained
setter API and to persist or apply the effective post-constraint value
rather than the requested value.
- Updated the `core_test_support` Bazel target to include the bundled
core model-catalog fixtures in its runtime data, so helper code that
resolves `core/models.json` through runfiles works in remote Bazel test
environments.
- Renamed the core config test coverage to emphasize that effective
feature values are normalized at runtime, while conflicting persisted
config writes are rejected.
- Ran `compact_resume_after_second_compaction_preserves_history` inside
an explicit 8 MiB test thread and Tokio runtime worker stack, following
the existing larger-stack integration-test pattern, to keep the Windows
`compact_resume_fork` test slice from aborting while a parallel
investigation continues into whether some of the underlying async
futures should be boxed.

## Verification

- `cargo test -p codex-config`
- `cargo test -p codex-core feature_requirements_ -- --nocapture`
- `cargo test -p codex-core
load_requirements_toml_produces_expected_constraints -- --nocapture`
- `cargo test -p codex-core
compact_resume_after_second_compaction_preserves_history -- --nocapture`
- `cargo test -p codex-core compact_resume_fork -- --nocapture`
- Re-ran the built `codex-core` `tests/all` binary with
`RUST_MIN_STACK=262144` for
`compact_resume_after_second_compaction_preserves_history` to confirm
the explicit-stack harness fixes the deterministic low-stack repro.
- `cargo test -p codex-core`
- This still fails locally in unrelated integration areas that expect
the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
wiremock mismatches.

## Docs

`developers.openai.com/codex` should document the requirements-side
`[features]` table for enterprise and MDM-managed configuration,
including that it only accepts canonical feature keys and that
conflicting config writes are rejected.
2026-03-04 04:40:22 +00:00
sayan-oai
082682a628 feat: load plugin apps (#13401)
load plugin-apps from `.app.json`.

make apps runtime-mentionable iff `codex_apps` MCP actually exposes
tools for that `connector_id`.

if the app isn't available, it's filtered out of runtime connector set,
so no tools are added and no app-mentions resolve.

right now we don't have a clean cli-side error for an app not being
installed. can look at this after.

### Tests
Added tests, tested locally that using a plugin that bundles an app
picks up the app.
2026-03-03 16:29:15 -08:00
Curtis 'Fjord' Hawthorne
c4cb594e73 Make js_repl image output controllable (#13331)
## Summary

Instead of always adding inner function call outputs to the model
context, let js code decide which ones to return.

- Stop auto-hoisting nested tool outputs from `codex.tool(...)` into the
outer `js_repl` function output.
- Keep `codex.tool(...)` return values unchanged as structured JS
objects.
- Add `codex.emitImage(...)` as the explicit path for attaching an image
to the outer `js_repl` function output.
- Support emitting from a direct image URL, a single `input_image` item,
an explicit `{ bytes, mimeType }` object, or a raw tool response object
containing exactly one image.
- Preserve existing `view_image` original-resolution behavior when JS
emits the raw `view_image` tool result.
- Suppress the special `ViewImageToolCall` event for `js_repl`-sourced
`view_image` calls so nested inspection stays side-effect free until JS
explicitly emits.
- Update the `js_repl` docs and generated project instructions with both
recommended patterns:
  - `await codex.emitImage(codex.tool("view_image", { path }))`
- `await codex.emitImage({ bytes: await page.screenshot({ type: "jpeg",
quality: 85 }), mimeType: "image/jpeg" })`

#### [git stack](https://github.com/magus/git-stack-cli)
-  `1` https://github.com/openai/codex/pull/13050
- 👉 `2` https://github.com/openai/codex/pull/13331
-  `3` https://github.com/openai/codex/pull/13049
2026-03-03 16:25:59 -08:00
Curtis 'Fjord' Hawthorne
b92146d48b Add under-development original-resolution view_image support (#13050)
## Summary

Add original-resolution support for `view_image` behind the
under-development `view_image_original_resolution` feature flag.

When the flag is enabled and the target model is `gpt-5.3-codex` or
newer, `view_image` now preserves original PNG/JPEG/WebP bytes and sends
`detail: "original"` to the Responses API instead of using the legacy
resize/compress path.

## What changed

- Added `view_image_original_resolution` as an under-development feature
flag.
- Added `ImageDetail` to the protocol models and support for serializing
`detail: "original"` on tool-returned images.
- Added `PromptImageMode::Original` to `codex-utils-image`.
  - Preserves original PNG/JPEG/WebP bytes.
  - Keeps legacy behavior for the resize path.
- Updated `view_image` to:
- use the shared `local_image_content_items_with_label_number(...)`
helper in both code paths
  - select original-resolution mode only when:
    - the feature flag is enabled, and
    - the model slug parses as `gpt-5.3-codex` or newer
- Kept local user image attachments on the existing resize path; this
change is specific to `view_image`.
- Updated history/image accounting so only `detail: "original"` images
use the docs-based GPT-5 image cost calculation; legacy images still use
the old fixed estimate.
- Added JS REPL guidance, gated on the same feature flag, to prefer JPEG
at 85% quality unless lossless is required, while still allowing other
formats when explicitly requested.
- Updated tests and helper code that construct
`FunctionCallOutputContentItem::InputImage` to carry the new `detail`
field.

## Behavior

### Feature off
- `view_image` keeps the existing resize/re-encode behavior.
- History estimation keeps the existing fixed-cost heuristic.

### Feature on + `gpt-5.3-codex+`
- `view_image` sends original-resolution images with `detail:
"original"`.
- PNG/JPEG/WebP source bytes are preserved when possible.
- History estimation uses the GPT-5 docs-based image-cost calculation
for those `detail: "original"` images.


#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/13050
-  `2` https://github.com/openai/codex/pull/13331
-  `3` https://github.com/openai/codex/pull/13049
2026-03-03 15:56:54 -08:00
xl-openai
9b004e2db1 Refactor plugin config and cache path (#13333)
Update config.toml plugin entries to use
<plugin_name>@<marketplace_name> as the key.
Plugin now stays in
[plugins/cache/marketplace-name/plugin-name/$version/]
Clean up the plugin code structure.
Add plugin install functionality (not used yet).
2026-03-03 15:00:18 -08:00
Ahmed Ibrahim
041c896509 Revert "Revert "realtime prompt changes"" (#13398)
Reverts openai/codex#13385
2026-03-03 14:41:26 -08:00
Ahmed Ibrahim
6bee02a346 Build delegated realtime handoff text from all messages (#13395)
## Summary
- Route delegated realtime handoff turns from all handoff message texts,
preserving order
- Fallback to input_transcript only when no messages are present
- Add regression coverage for multi-message handoff requests
2026-03-03 14:07:51 -08:00
pakrym-oai
69df12efb3 Remove Responses V1 websocket implementation (#13364)
V2 is the way to go!
2026-03-03 11:32:53 -07:00
pash-openai
2f5b01abd6 add fast mode toggle (#13212)
- add a local Fast mode setting in codex-core (similar to how model id
is currently stored on disk locally)
- send `service_tier=priority` on requests when Fast is enabled
- add `/fast` in the TUI and persist it locally
- feature flag
2026-03-02 20:29:33 -08:00
Celia Chen
0bb152b01d chore: remove SkillMetadata.permissions and derive skill sandboxing from permission_profile (#13061)
## Summary

This change removes the compiled permissions field from skill metadata
and keeps permission_profile as the single source of truth.

Skill loading no longer compiles skill permissions eagerly. Instead, the
zsh-fork skill escalation path compiles `skill.permission_profile` when
it needs to determine the sandbox to apply for a skill script.

  ## Behavior change

  For skills that declare:
```
  permissions: {}
```
we now treat that the same as having no skill permissions override,
instead of creating and using a default readonly sandbox. This change
makes the behavior more intuitive:

  - only non-empty skill permission profiles affect sandboxing
- omitting permissions and writing permissions: {} now mean the same
thing
- skill metadata keeps a single permissions representation instead of
storing derived state too

Overall, this makes skill sandbox behavior easier to understand and more
predictable.
2026-03-03 01:29:53 +00:00
Ahmed Ibrahim
b20b6aa46f Update realtime websocket API (#13265)
- migrate the realtime websocket transport to the new session and
handoff flow
- make the realtime model configurable in config.toml and use API-key
auth for the websocket

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-02 16:05:40 -08:00
jif-oai
1905597017 feat: update memories config names (#13237) 2026-03-02 15:25:39 +00:00
jif-oai
b649953845 feat: polluted memories (#13008)
Add a feature flag to disable memory creation for "polluted"
2026-03-02 11:57:32 +00:00
Ahmed Ibrahim
0aeb55bf08 Record realtime close marker on replacement (#13058)
## Summary
- record a realtime close developer message when a new realtime session
replaces an active one
- assert the replacement marker through the mocked responses request
path

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Charles Cunningham <ccunningham@openai.com>
2026-03-01 13:54:12 -08:00
xl-openai
752402c4fe feat: load from plugins (#12864)
Support loading plugins.

Plugins can now be enabled via [plugins.<name>] in config.toml. They are
loaded as first-class entities through PluginsManager, and their default
skills/ and .mcp.json contributions are integrated into the existing
skills and MCP flows.
2026-03-01 10:50:56 -08:00
Michael Bolin
1a8d930267 core: adopt host_executable() rules in zsh-fork (#13046)
## Why

[#12964](https://github.com/openai/codex/pull/12964) added
`host_executable()` support to `codex-execpolicy`, but the zsh-fork
interception path in `unix_escalation.rs` was still evaluating commands
with the default exact-token matcher.

That meant an intercepted absolute executable such as `/usr/bin/git
status` could still miss basename rules like `prefix_rule(pattern =
["git", "status"])`, even when the policy also defined a matching
`host_executable(name = "git", ...)` entry.

This PR adopts the new matching behavior in the zsh-fork runtime only.
That keeps the rollout intentionally narrow: zsh-fork already requires
explicit user opt-in, so it is a safer first caller to exercise the new
`host_executable()` scheme before expanding it to other execpolicy call
sites.

It also brings zsh-fork back in line with the current `prefix_rule()`
execution model. Until prefix rules can carry their own permission
profiles, a matched `prefix_rule()` is expected to rerun the intercepted
command unsandboxed on `allow`, or after the user accepts `prompt`,
instead of merely continuing inside the inherited shell sandbox.

## What Changed

- added `evaluate_intercepted_exec_policy()` in
`core/src/tools/runtimes/shell/unix_escalation.rs` to centralize
execpolicy evaluation for intercepted commands
- switched intercepted direct execs in the zsh-fork path to
`check_multiple_with_options(...)` with `MatchOptions {
resolve_host_executables: true }`
- added `commands_for_intercepted_exec_policy()` so zsh-fork policy
evaluation works from intercepted `(program, argv)` data instead of
reconstructing a synthetic command before matching
- left shell-wrapper parsing intentionally disabled by default behind
`ENABLE_INTERCEPTED_EXEC_POLICY_SHELL_WRAPPER_PARSING`, so
path-sensitive matching relies on later direct exec interception rather
than shell-script parsing
- made matched `prefix_rule()` decisions rerun intercepted commands with
`EscalationExecution::Unsandboxed`, while unmatched-command fallback
keeps the existing sandbox-preserving behavior
- extracted the zsh-fork test harness into
`core/tests/common/zsh_fork.rs` so both the skill-focused and
approval-focused integration suites can exercise the same runtime setup
- limited this change to the intercepted zsh-fork path rather than
changing every execpolicy caller at once
- added runtime coverage in
`core/src/tools/runtimes/shell/unix_escalation_tests.rs` for allowed and
disallowed `host_executable()` mappings and the wrapper-parsing modes
- added integration coverage in `core/tests/suite/approvals.rs` to
verify a saved `prefix_rule(pattern=["touch"], decision="allow")` reruns
under zsh-fork outside a restrictive `WorkspaceWrite` sandbox

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13046).
* #13065
* __->__ #13046
2026-02-28 01:41:23 +00:00
Charley Cunningham
695957a348 Unify rollout reconstruction with resume/fork TurnContext hydration (#12612)
## Summary

This PR unifies rollout history reconstruction and resume/fork metadata
hydration under a single `Session::reconstruct_history_from_rollout`
implementation.

The key change from main is that replay metadata now comes from the same
reconstruction pass that rebuilds model-visible history, instead of
doing a second bespoke rollout scan to recover `previous_model` /
`reference_context_item`.

## What Changed

### Unified reconstruction output

`reconstruct_history_from_rollout` now returns a single
`RolloutReconstruction` bundle containing:

- rebuilt `history`
- `previous_model`
- `reference_context_item`

Resume and fork both consume that shared output directly.

### Reverse replay core

The reconstruction logic moved into
`codex-rs/core/src/codex/rollout_reconstruction.rs` and now scans
rollout items newest-to-oldest.

That reverse pass:

- derives `previous_model`
- derives whether `reference_context_item` is preserved or cleared
- stops early once it has both resume metadata and a surviving
`replacement_history` checkpoint

History materialization is still bridged eagerly for now by replaying
only the surviving suffix forward, which keeps the history result stable
while moving the control flow toward the future lazy reverse loader
design.

### Removed bespoke context lookup

This deletes `last_rollout_regular_turn_context_lookup` and its separate
compaction-aware scan.

The previous model / baseline metadata is now computed from the same
replay state that rebuilds history, so resume/fork cannot drift from the
reconstructed transcript view.

### `TurnContextItem` persistence contract

`TurnContextItem` is now treated as the replay source of truth for
durable model-visible baselines.

This PR keeps the following contract explicit:

- persist `TurnContextItem` for the first real user turn so resume can
recover `previous_model`
- persist it for later turns that emit model-visible context updates
- if mid-turn compaction reinjects full initial context into replacement
history, persist a fresh `TurnContextItem` after `Compacted` so
resume/fork can re-establish the baseline from the rewritten history
- do not treat manual compaction or pre-sampling compaction as creating
a new durable baseline on their own

## Behavior Preserved

- rollback replay stays aligned with `drop_last_n_user_turns`
- rollback skips only user turns
- incomplete active user turns are dropped before older finalized turns
when rollback applies
- unmatched aborts do not consume the current active turn
- missing abort IDs still conservatively clear stale compaction state
- compaction clears `reference_context_item` until a later
`TurnContextItem` re-establishes it
- `previous_model` still comes from the newest surviving user turn that
established one

## Tests

Targeted validation run for the current branch shape:

- `cd codex-rs && cargo test -p codex-core --lib
codex::rollout_reconstruction_tests -- --nocapture`
- `cd codex-rs && just fmt`

The branch also extracts the rollout reconstruction tests into
`codex-rs/core/src/codex/rollout_reconstruction_tests.rs` so this logic
has a dedicated home instead of living inline in `codex.rs`.
2026-02-27 13:50:45 -08:00
Michael Bolin
d09a7535ed fix: use AbsolutePathBuf for permission profile file roots (#12970)
## Why
`PermissionProfile` should describe filesystem roots as absolute paths
at the type level. Using `PathBuf` in `FileSystemPermissions` made the
shared type too permissive and blurred together three different
deserialization cases:

- skill metadata in `agents/openai.yaml`, where relative paths should
resolve against the skill directory
- app-server API payloads, where callers should have to send absolute
paths
- local tool-call payloads for commands like `shell_command` and
`exec_command`, where `additional_permissions.file_system` may
legitimately be relative to the command `workdir`

This change tightens the shared model without regressing the existing
local command flow.

## What Changed
- changed `protocol::models::FileSystemPermissions` and the app-server
`AdditionalFileSystemPermissions` mirror to use `AbsolutePathBuf`
- wrapped skill metadata deserialization in `AbsolutePathBufGuard`, so
relative permission roots in `agents/openai.yaml` resolve against the
containing skill directory
- kept app-server/API deserialization strict, so relative
`additionalPermissions.fileSystem.*` paths are rejected at the boundary
- restored cwd/workdir-relative deserialization for local tool-call
payloads by parsing `shell`, `shell_command`, and `exec_command`
arguments under an `AbsolutePathBufGuard` rooted at the resolved command
working directory
- simplified runtime additional-permission normalization so it only
canonicalizes and deduplicates absolute roots instead of trying to
recover relative ones later
- updated the app-server schema fixtures, `app-server/README.md`, and
the affected transport/TUI tests to match the final behavior
2026-02-27 17:42:52 +00:00
jif-oai
8cf5b00aef fix: more stable notify script (#13011) 2026-02-27 16:05:44 +01:00
Ahmed Ibrahim
4d180ae428 Add model availability NUX metadata (#12972)
- replace show_nux with structured availability_nux model metadata
- expose availability NUX data through the app-server model API
- update shared fixtures and tests for the new field
2026-02-26 22:02:57 -08:00
Eric Traut
cee009d117 Add oauth_resource handling for MCP login flows (#12866)
Addresses bug https://github.com/openai/codex/issues/12589

Builds on community PR #12763.

This adds `oauth_resource` support for MCP `streamable_http` servers and
wires it through the relevant config and login paths. It fixes the bug
where the configured OAuth resource was not reliably included in the
authorization request, causing MCP login to omit the expected
`resource` parameter.
2026-02-26 20:10:12 -08:00
Curtis 'Fjord' Hawthorne
7e980d7db6 Support multimodal custom tool outputs (#12948)
## Summary

This changes `custom_tool_call_output` to use the same output payload
shape as `function_call_output`, so freeform tools can return either
plain text or structured content items.

The main goal is to let `js_repl` return image content from nested
`view_image` calls in its own `custom_tool_call_output`, instead of
relying on a separate injected message.

## What changed

- Changed `custom_tool_call_output.output` from `string` to
`FunctionCallOutputPayload`
- Updated freeform tool plumbing to preserve structured output bodies
- Updated `js_repl` to aggregate nested tool content items and attach
them to the outer `js_repl` result
- Removed the old `js_repl` special case that injected `view_image`
results as a separate pending user image message
- Updated normalization/history/truncation paths to handle multimodal
`custom_tool_call_output`
- Regenerated app-server protocol schema artifacts

## Behavior

Direct `view_image` calls still return a `function_call_output` with
image content.

When `view_image` is called inside `js_repl`, the outer `js_repl`
`custom_tool_call_output` now carries:
- an `input_text` item if the JS produced text output
- one or more `input_image` items from nested tool results

So the nested image result now stays inside the `js_repl` tool output
instead of being injected as a separate message.

## Compatibility

This is intended to be backward-compatible for resumed conversations.

Older histories that stored `custom_tool_call_output.output` as a plain
string still deserialize correctly, and older histories that used the
previous injected-image-message flow also continue to resume.

Added regression coverage for resuming a pre-change rollout containing:
- string-valued `custom_tool_call_output`
- legacy injected image message history


#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/12948
2026-02-26 18:17:46 -08:00
Ahmed Ibrahim
a11da86b37 Make realtime audio test deterministic (#12959)
## Summary\n- add a websocket test-server request waiter so tests can
synchronize on recorded client messages\n- use that waiter in the
realtime delegation test instead of a fixed audio timeout\n- add
temporary timing logs in the test and websocket mock to inspect where
the flake stalls
2026-02-26 16:09:00 -08:00
Celia Chen
90cc4e79a2 feat: add local date/timezone to turn environment context (#12947)
## Summary

This PR includes the session's local date and timezone in the
model-visible environment context and persists that data in
`TurnContextItem`.

  ## What changed
- captures the current local date and IANA timezone when building a turn
context, with a UTC fallback if the timezone lookup fails
- includes current_date and timezone in the serialized
<environment_context> payload
- stores those fields on TurnContextItem so they survive rollout/history
handling, subagent review threads, and resume flows
- treats date/timezone changes as environment updates, so prompt caching
and context refresh logic do not silently reuse stale time context
- updates tests to validate the new environment fields without depending
on a single hardcoded environment-context string

## test

built a local build and saw it in the rollout file:
```
{"timestamp":"2026-02-26T21:39:50.737Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>\n  <shell>zsh</shell>\n  <current_date>2026-02-26</current_date>\n  <timezone>America/Los_Angeles</timezone>\n</environment_context>"}]}}
```
2026-02-26 23:17:35 +00:00
pakrym-oai
951a389654 Allow clients not to send summary as an option (#12950)
Summary is a required parameter on UserTurn. Ideally we'd like the core
to decide the appropriate summary level.

Make the summary optional and don't send it when not needed.
2026-02-26 14:37:38 -08:00
jif-oai
a6065d30f4 feat: add git info to memories (#12940) 2026-02-26 20:14:13 +00:00
Michael Bolin
7fa9d9ae35 feat: include sandbox config with escalation request (#12839)
## Why

Before this change, an escalation approval could say that a command
should be rerun, but it could not carry the sandbox configuration that
should still apply when the escalated command is actually spawned.

That left an unsafe gap in the `zsh-fork` skill path: skill scripts
under `scripts/` that did not declare permissions could be escalated
without a sandbox, and scripts that did declare permissions could lose
their bounded sandbox on rerun or cached session approval.

This PR extends the escalation protocol so approvals can optionally
carry sandbox configuration all the way through execution. That lets the
shell runtime preserve the intended sandbox instead of silently widening
access.

We likely want a single permissions type for this codepath eventually,
probably centered on `Permissions`. For now, the protocol needs to
represent both the existing `PermissionProfile` form and the fuller
`Permissions` form, so this introduces a temporary disjoint union,
`EscalationPermissions`, to carry either one.

Further, this means that today, a skill either:

- does not declare any permissions, in which case it is run using the
default sandbox for the turn
- specifies permissions, in which case the skill is run using that exact
sandbox, which might be more restrictive than the default sandbox for
the turn

We will likely change the skill's permissions to be additive to the
existing permissions for the turn.

## What Changed

- Added `EscalationPermissions` to `codex-protocol` so escalation
requests can carry either a `PermissionProfile` or a full `Permissions`
payload.
- Added an explicit `EscalationExecution` mode to the shell escalation
protocol so reruns distinguish between `Unsandboxed`, `TurnDefault`, and
`Permissions(...)` instead of overloading `None`.
- Updated `zsh-fork` shell reruns to resolve `TurnDefault` at execution
time, which keeps ordinary `UseDefault` commands on the turn sandbox and
preserves turn-level macOS seatbelt profile extensions.
- Updated the `zsh-fork` skill path so a skill with no declared
permissions inherits the conversation's effective sandbox instead of
escalating unsandboxed.
- Updated the `zsh-fork` skill path so a skill with declared permissions
reruns with exactly those permissions, including when a cached session
approval is reused.

## Testing

- Added unit coverage in
`core/src/tools/runtimes/shell/unix_escalation.rs` for the explicit
`UseDefault` / `RequireEscalated` / `WithAdditionalPermissions`
execution mapping.
- Added unit coverage in
`core/src/tools/runtimes/shell/unix_escalation.rs` for macOS seatbelt
extension preservation in both the `TurnDefault` and
explicit-permissions rerun paths.
- Added integration coverage in `core/tests/suite/skill_approval.rs` for
permissionless skills inheriting the turn sandbox and explicit skill
permissions remaining bounded across cached approval reuse.
2026-02-26 12:00:18 -08:00
jif-oai
3404ecff15 feat: add post-compaction sub-agent infos (#12774)
Co-authored-by: Codex <noreply@openai.com>
2026-02-26 18:55:34 +00:00
jif-oai
d3603ae5d3 feat: fork thread multi agent (#12499) 2026-02-26 18:01:53 +00:00
pakrym-oai
ba41e84a50 Use model catalog default for reasoning summary fallback (#12873)
## Summary
- make `Config.model_reasoning_summary` optional so unset means use
model default
- resolve the optional config value to a concrete summary when building
`TurnContext`
- add protocol support for `default_reasoning_summary` in model metadata

## Validation
- `cargo test -p codex-core --lib client::tests -- --nocapture`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-26 09:31:13 -08:00
jif-oai
c528f32acb feat: use memory usage for selection (#12909) 2026-02-26 16:44:02 +00:00
jif-oai
382fa338b3 feat: memories forgetting (#12900)
Add diff based memory forgetting
2026-02-26 13:19:57 +00:00
Charley Cunningham
07aefffb1f core: bundle settings diff updates into one dev/user envelope (#12417)
## Summary
- bundle contextual prompt injection into at most one developer message
plus one contextual user message in both:
  - per-turn settings updates
  - initial context insertion
- preserve `<model_switch>` across compaction by rebuilding it through
canonical initial-context injection, instead of relying on
strip/reattach hacks
- centralize contextual user fragment detection in one shared definition
table and reuse it for parsing/compaction logic
- keep `AGENTS.md` in its natural serialized format:
  - `# AGENTS.md instructions for {dirname}`
  - `<INSTRUCTIONS>...</INSTRUCTIONS>`
- simplify related tests/helpers and accept the expected snapshot/layout
updates from bundled multi-part messages

## Why
The goal is to converge toward a simpler, more intentional prompt shape
where contextual updates are consistently represented as one developer
envelope plus one contextual user envelope, while keeping parsing and
compaction behavior aligned with that representation.

## Notable details
- the temporary `SettingsUpdateEnvelope` wrapper was removed; these
paths now return `Vec<ResponseItem>` directly
- local/remote compaction no longer rely on model-switch strip/restore
helpers
- contextual user detection is now driven by shared fragment definitions
instead of ad hoc matcher assembly
- AGENTS/user instructions are still the same logical context; only the
synthetic `<user_instructions>` wrapper was replaced by the natural
AGENTS text format

## Testing
- `just fmt`
- `cargo test -p codex-app-server
codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages
-- --exact`
- `cargo test -p codex-core
compact::tests::collect_user_messages_filters_session_prefix_entries
--lib -- --exact`
- `cargo test -p codex-core --test all
'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch'
-- --exact`
- `cargo test -p codex-core --test all
'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch'
-- --exact`
- `cargo test -p codex-core --test all
'suite::client::includes_apps_guidance_as_developer_message_when_enabled'
-- --exact`
- `cargo test -p codex-core --test all
'suite::client::includes_developer_instructions_message_in_request' --
--exact`
- `cargo test -p codex-core --test all
'suite::client::includes_user_instructions_message_in_request' --
--exact`
- `cargo test -p codex-core --test all
'suite::client::resume_includes_initial_messages_and_sends_prior_items'
-- --exact`
- `cargo test -p codex-core --test all
'suite::review::review_input_isolated_from_parent_history' -- --exact`
- `cargo test -p codex-exec --test all
'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' --
--exact`
- `cargo test -p core_test_support
context_snapshot::tests::full_text_mode_preserves_unredacted_text --
--exact`

## Notes
- I also ran several targeted `compact`, `compact_remote`,
`prompt_caching`, `model_visible_layout`, and `event_mapping` tests
while iterating on prompt-shape changes.
- I have not claimed a clean full-workspace `cargo test` from this
environment because local sandbox/resource conditions have previously
produced unrelated failures in large workspace runs.
2026-02-26 00:12:08 -08:00
Curtis 'Fjord' Hawthorne
40ab71a985 Disable js_repl when Node is incompatible at startup (#12824)
## Summary
- validate `js_repl` Node compatibility during session startup when the
experiment is enabled
- if Node is missing or too old, disable `js_repl` and
`js_repl_tools_only` for the session before tools and instructions are
built
- surface that startup disablement to users through the existing startup
warning flow instead of only logging it
- reuse the same compatibility check in js_repl kernel startup so
startup gating and runtime behavior stay aligned
- add a regression test that verifies the warning is emitted and that
the first advertised tool list omits `js_repl` and `js_repl_reset` when
Node is incompatible

## Why
Today `js_repl` can be advertised based only on the feature flag, then
fail later when the kernel starts. That makes the available tool list
inaccurate at the start of a conversation, and users do not get a clear
explanation for why the tool is unavailable.

This change makes tool availability reflect real startup checks, keeps
the advertised tool set stable for the lifetime of the session, and
gives users a visible warning when `js_repl` is disabled.

## Testing
- `just fmt`
- `cargo test -p codex-core --test all
js_repl_is_not_advertised_when_startup_node_is_incompatible`
2026-02-26 01:14:51 +00:00
Michael Bolin
14116ade8d feat: include available decisions in command approval requests (#12758)
Command-approval clients currently infer which choices to show from
side-channel fields like `networkApprovalContext`,
`proposedExecpolicyAmendment`, and `additionalPermissions`. That makes
the request shape harder to evolve, and it forces each client to
replicate the server's heuristics instead of receiving the exact
decision list for the prompt.

This PR introduces a mapping between `CommandExecutionApprovalDecision`
and `codex_protocol::protocol::ReviewDecision`:

```rust
impl From<CoreReviewDecision> for CommandExecutionApprovalDecision {
    fn from(value: CoreReviewDecision) -> Self {
        match value {
            CoreReviewDecision::Approved => Self::Accept,
            CoreReviewDecision::ApprovedExecpolicyAmendment {
                proposed_execpolicy_amendment,
            } => Self::AcceptWithExecpolicyAmendment {
                execpolicy_amendment: proposed_execpolicy_amendment.into(),
            },
            CoreReviewDecision::ApprovedForSession => Self::AcceptForSession,
            CoreReviewDecision::NetworkPolicyAmendment {
                network_policy_amendment,
            } => Self::ApplyNetworkPolicyAmendment {
                network_policy_amendment: network_policy_amendment.into(),
            },
            CoreReviewDecision::Abort => Self::Cancel,
            CoreReviewDecision::Denied => Self::Decline,
        }
    }
}
```

And updates `CommandExecutionRequestApprovalParams` to have a new field:

```rust
available_decisions: Option<Vec<CommandExecutionApprovalDecision>>
```

when, if specified, should make it easier for clients to display an
appropriate list of options in the UI.

This makes it possible for `CoreShellActionProvider::prompt()` in
`unix_escalation.rs` to specify the `Vec<ReviewDecision>` directly,
adding support for `ApprovedForSession` when approving a skill script,
which was previously missing in the TUI.

Note this results in a significant change to `exec_options()` in
`approval_overlay.rs`, as the displayed options are now derived from
`available_decisions: &[ReviewDecision]`.

## What Changed

- Add `available_decisions` to
[`ExecApprovalRequestEvent`](de00e932dd/codex-rs/protocol/src/approvals.rs (L111-L175)),
including helpers to derive the legacy default choices when older
senders omit the field.
- Map `codex_protocol::protocol::ReviewDecision` to app-server
`CommandExecutionApprovalDecision` and expose the ordered list as
experimental `availableDecisions` in
[`CommandExecutionRequestApprovalParams`](de00e932dd/codex-rs/app-server-protocol/src/protocol/v2.rs (L3798-L3807)).
- Thread optional `available_decisions` through the core approval path
so Unix shell escalation can explicitly request `ApprovedForSession` for
session-scoped approvals instead of relying on client heuristics.
[`unix_escalation.rs`](de00e932dd/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs (L194-L214))
- Update the TUI approval overlay to build its buttons from the ordered
decision list, while preserving the legacy fallback when
`available_decisions` is missing.
- Update the app-server README, test client output, and generated schema
artifacts to document and surface the new field.

## Testing

- Add `approval_overlay.rs` coverage for explicit decision lists,
including the generic `ApprovedForSession` path and network approval
options.
- Update `chatwidget/tests.rs` and app-server protocol tests to populate
the new optional field and keep older event shapes working.

## Developers Docs

- If we document `item/commandExecution/requestApproval` on
[developers.openai.com/codex](https://developers.openai.com/codex), add
experimental `availableDecisions` as the preferred source of approval
choices and note that older servers may omit it.
2026-02-26 01:10:46 +00:00
Celia Chen
4f45668106 Revert "Add skill approval event/response (#12633)" (#12811)
This reverts commit https://github.com/openai/codex/pull/12633. We no
longer need this PR, because we favor sending normal exec command
approval server request with `additional_permissions` of skill
permissions instead
2026-02-26 01:02:42 +00:00
pakrym-oai
4fedef88e0 Use websocket v2 as model-preferred websocket protocol (#12838) 2026-02-25 16:35:53 -08:00
Charley Cunningham
2f4d6ded1d Enable request_user_input in Default mode (#12735)
## Summary
- allow `request_user_input` in Default collaboration mode as well as
Plan
- update the Default-mode instructions to prefer assumptions first and
use `request_user_input` only when a question is unavoidable
- update request_user_input and app-server tests to match the new
Default-mode behavior
- refactor collaboration-mode availability plumbing into
`CollaborationModesConfig` for future mode-related flags

## Codex author
`codex resume 019c9124-ed28-7c13-96c6-b916b1c97d49`
2026-02-25 15:20:46 -08:00
Celia Chen
b6d20748e0 Revert "Ensure shell command skills trigger approval (#12697)" (#12721)
This reverts commit daf0f03ac8.

# External (non-OpenAI) Pull Request Requirements

Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md

If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.

Include a link to a bug report or enhancement request.
2026-02-25 22:49:53 +00:00
sayan-oai
d45ffd5830 make 5.3-codex visible in cli for api users (#12808)
5.3-codex released in api, mark it visible for API users via bundled
`models.json`.
2026-02-25 13:01:40 -08:00
Rasmus Rygaard
73eaebbd1c Propagate session ID when compacting (#12802)
We propagate the session ID when sending requests for inference but we
don't do the same for compaction requests. This makes it hard to link
compaction requests to their session for debugging purposes
2026-02-25 19:17:38 +00:00
Michael Bolin
648a420cbf fix: enforce sandbox envelope for zsh fork execution (#12800)
## Why
Zsh fork execution was still able to bypass the `WorkspaceWrite` model
in edge cases because the fork path reconstructed command execution
without preserving sandbox wrappers, and command extraction only
accepted shell invocations in a narrow positional shape. This can allow
commands to run with broader filesystem access than expected, which
breaks the sandbox safety model.

## What changed
- Preserved the sandboxed `ExecRequest` produced by
`attempt.env_for(...)` when entering the zsh fork path in
[`unix_escalation.rs`](https://github.com/openai/codex/blob/main/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs).
- Updated `CoreShellCommandExecutor` to execute the sandboxed command
and working directory captured from `attempt.env_for(...)`, instead of
re-running a freshly reconstructed shell command.
- Made zsh-fork script extraction robust to wrapped invocations by
scanning command arguments for `-c`/`-lc` rather than only matching the
first positional form.
- Added unit tests in `unix_escalation.rs` to lock in wrapper-tolerant
parsing behavior and keep unsupported shell forms rejected.
- Tightened the regression in
[`skill_approval.rs`](https://github.com/openai/codex/blob/main/codex-rs/core/tests/suite/skill_approval.rs):
- `shell_zsh_fork_still_enforces_workspace_write_sandbox` now uses an
explicit `WorkspaceWrite` policy with `exclude_tmpdir_env_var: true` and
`exclude_slash_tmp: true`.
- The test attempts to write to `/tmp/...`, which is only reliably
outside writable roots with those explicit exclusions set.

## Verification
- Added and passed the new unit tests around `extract_shell_script`
parsing behavior with wrapped command shapes.
  - `extract_shell_script_supports_wrapped_command_prefixes`
  - `extract_shell_script_rejects_unsupported_shell_invocation`
- Verified the regression with the focused integration test:
`shell_zsh_fork_still_enforces_workspace_write_sandbox`.

## Manual Testing

Prior to this change, if I ran Codex via:

```
just codex --config zsh_path=/Users/mbolin/code/codex2/codex-rs/app-server/tests/suite/zsh --enable shell_zsh_fork
```

and asked:

```
what is the output of /bin/ps
```

it would run it, even though the default sandbox should prevent the
agent from running `/bin/ps` because it is setuid on MacOS.

But with this change, I now see the expected failure because it is
blocked by the sandbox:

```
/bin/ps exited with status 1 and produced no output in this environment.
```
2026-02-25 11:05:27 -08:00
pakrym-oai
9d7013eab0 Handle websocket timeout (#12791)
Sometimes websockets will timeout with 400 error, ensure we retry it.
2026-02-25 10:31:37 -08:00
jif-oai
5441130e0a feat: adding stream parser (#12666)
Add a stream parser to extract citations (and others) from a stream.
This support cases where markers are split in differen tokens.

Codex never manage to make this code work so everything was done
manually. Please review correctly and do not touch this part of the code
without a very clear understanding of it
2026-02-25 13:27:58 +00:00
jif-oai
5a9a5b51b2 feat: add large stack test macro (#12768)
This PR adds the macro `#[large_stack_test]`

This spawns the tests in a dedicated tokio runtime with a larger stack.
It is useful for tests that needs the full recursion on the harness
(which is now too deep for windows for example)
2026-02-25 13:19:21 +00:00
jif-oai
10c04e11b8 feat: add service name to app-server (#12319)
Add service name to the app-server so that the app can use it's own
service name

This is on thread level because later we might plan the app-server to
become a singleton on the computer
2026-02-25 09:51:42 +00:00
Celia Chen
6a3233da64 Surface skill permission profiles in zsh-fork exec approvals (#12753)
## Summary

- Preserve each skill’s raw permissions block as a permission_profile on
SkillMetadata during skill loading.
- Keep compiling that same metadata into the existing runtime
Permissions object, so current enforcement
    behavior stays intact.
- When zsh-fork intercepts execution of a script that belongs to a
skill, include the skill’s
    permission_profile in the exec approval request.
- This lets approval UIs show the extra filesystem access the skill
declared when prompting for approval.
2026-02-25 01:23:10 -08:00
Michael Bolin
59398125f6 feat: zsh-fork forces scripts/**/* for skills to trigger a prompt (#12730)
Direct skill-script matches force `Decision::Prompt`, so skill-backed
scripts require explicit approval before they run. (Note "allow for
session" is not supported in this PR, but will be done in a follow-up.)

In the process of implementing this, I fixed an important bug:
`ShellZshFork` is supposed to keep ordinary allowed execs on the
client-side `Run` path so later `execve()` calls are still intercepted
and reviewed. After the shell-escalation port, `Decision::Allow` still
mapped to `Escalate`, which moved `zsh` to server-side execution too
early. That broke the intended flow for skill-backed scripts and made
the approval prompt depend on the wrong execution path.

## What changed
- In `codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs`,
`Decision::Allow` now returns `Run` unless escalation is actually
required.
- Removed the zsh-specific `argv[0]` fallback. With the `Allow -> Run`
fix in place, zsh's later `execve()` of the script is intercepted
normally, so the skill match happens on the script path itself.
- Kept the skill-path handling in `determine_action()` focused on the
direct `program` match path.

## Verification
- Updated `shell_zsh_fork_prompts_for_skill_script_execution` in
`codex-rs/core/tests/suite/skill_approval.rs` (gated behind `cfg(unix)`)
to:
- run under `SandboxPolicy::new_workspace_write_policy()` instead of
`DangerFullAccess`
  - assert the approval command contains only the script path
- assert the approved run returns both stdout and stderr markers in the
shell output
- Ran `cargo test -p codex-core
shell_zsh_fork_prompts_for_skill_script_execution -- --nocapture`

## Manual Testing

Run the dev build:

```
just codex --config zsh_path=/Users/mbolin/code/codex2/codex-rs/app-server/tests/suite/zsh --enable shell_zsh_fork
```

I have created `/Users/mbolin/.agents/skills/mbolin-test-skill` with:

```
├── scripts
│   └── hello-mbolin.sh
└── SKILL.md
```

The skill:

```
---
name: mbolin-test-skill
description: Used to exercise various features of skills.
---

When this skill is invoked, run the `hello-mbolin.sh` script and report the output.
```

The script:

```
set -e

# Note this script will fail if run with network disabled.
curl --location openai.com
```

Use `$mbolin-test-skill` to invoke the skill manually and verify that I
get prompted to run `hello-mbolin.sh`.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12730).
* #12750
* __->__ #12730
2026-02-24 23:51:26 -08:00
viyatb-oai
c086b36b58 feat(ui): add network approval persistence plumbing (#12358)
## Summary
- add TUI approval options for persistent network host rules
- add app-server v2 approval payload plumbing for network approval
context + proposed network policy amendments
- add app-server handling to translate `applyNetworkPolicyAmendment`
decisions back into core review decisions
- update docs/test client output and generated app-server schemas/types
2026-02-25 07:06:19 +00:00
Curtis 'Fjord' Hawthorne
9501669a24 tests(js_repl): remove node-related skip paths from js_repl tests (#12185)
## Summary
Remove js_repl/node test-skip paths and make Node setup explicit in CI
so js_repl tests always run instead of silently skipping.

## Why
We had multiple “expediency” skip paths that let js_repl tests pass
without actually exercising Node-backed behavior. This reduced CI signal
and hid runtime/environment regressions.

## What changed

### CI
- Added Node setup using `codex-rs/node-version.txt` in:
  - `.github/workflows/rust-ci.yml`
  - `.github/workflows/bazel.yml`
- Added a Unix PATH copy step in Bazel workflow to expose the setup-node
binary in common paths.

### js_repl test harness
- Added explicit js_repl sandbox test configuration helpers in:
  - `codex-rs/core/src/tools/js_repl/mod.rs`
  - `codex-rs/core/src/tools/handlers/js_repl.rs`
- Added Linux arg0 dispatch glue for js_repl tests so sandbox subprocess
entrypoint behavior is correct under Linux test execution.

### Removed skip behavior
- Deleted runtime guard function and early-return skips in js_repl tests
(`can_run_js_repl_runtime_tests` and related per-test short-circuits).
- Removed view_image integration test skip behavior:
  - dropped `skip_if_no_network!(Ok(()))`
- removed “skip on Node missing/too old” branch after js_repl output
inspection.

## Impact
- js_repl/node tests now consistently execute and fail loudly when the
environment is not correctly provisioned.
- CI has stronger signal for js_repl regressions instead of false green
from conditional skips.

## Testing
- `cargo test -p codex-core` (locally) to validate js_repl
unit/integration behavior with skips removed.
- CI expected to surface any remaining environment/runtime gaps directly
(rather than masking them).


#### [git stack](https://github.com/magus/git-stack-cli)
-  `1` https://github.com/openai/codex/pull/12300
-  `2` https://github.com/openai/codex/pull/12275
-  `3` https://github.com/openai/codex/pull/12205
-  `4` https://github.com/openai/codex/pull/12407
-  `5` https://github.com/openai/codex/pull/12372
- 👉 `6` https://github.com/openai/codex/pull/12185
-  `7` https://github.com/openai/codex/pull/10673
2026-02-24 22:52:14 -08:00