Commit Graph

6939 Commits

Author SHA1 Message Date
Joe Florencio
86ae66fda7 Refine cloud config bundle runtime integration
Move CloudConfigBundleLoader into ConfigLoadOptions so lower-level config loading does not need a separate bundle parameter or a bundle-specific TOML helper.

Keep exec and TUI on the straightforward local helper path: bootstrap from local config, build the shared cloud bundle loader, and reload merged TOML with that loader only when OSS provider inference needs cloud-delivered config.

Preserve strict validation and path-base behavior for cloud config and requirements fragments, including diagnostics that name the cloud layer and requirements paths resolved against CODEX_HOME.

Validation: just fmt; just test -p codex-core -E 'test(load_config_layers_resolves_relative_cloud_requirements_paths_against_codex_home) | test(strict_config_rejects_unknown_cloud_config_key)'; just test -p codex-config; just test -p codex-exec -E 'test(top_cli_parses_resume_prompt_after_config_flag)'; just test -p codex-tui -E 'test(app_server_target_for_launch_prefers_explicit_remote_endpoint)'; just fix -p codex-config -p codex-core -p codex-exec -p codex-tui -p codex-app-server -p codex-core-plugins; git diff --check.
2026-05-28 18:24:49 -07:00
Joe Florencio
8c6213aaa3 Switch runtime to cloud config bundle
Replace the old cloud requirements runtime path with the unified cloud config bundle loader. Config construction now receives a CloudConfigBundleLoader and uses the shared bundle for both cloud-delivered requirements and cloud-delivered config layers, so config and requirements consumers do not race separate backend fetches.

Remove the codex-cloud-requirements crate and legacy CloudRequirementsLoader surface. The new path preserves the existing pull-based load behavior and routes managed requirements through the same composed requirements model while enterprise-managed config fragments are inserted below user/profile/project/session config and above system config. Legacy MDM-delivered managed_config.toml remains the highest-precedence compatibility layer while it is phased out.

Thread the bundle loader through app-server, app-server-client, TUI, core config construction, exec, hooks, network proxy loading, and related tests. Update config-manager error handling and tests to report cloudConfigBundle load failures consistently, and update the loader README to describe the new public argument and the effective config-layer precedence.

Refresh generated Python SDK protocol artifacts from the local app-server schema so downstream clients know about enterpriseManaged config layers and the cloudManagedConfig hook source. The normal SDK generate-types command still targets the pinned runtime package, so this checkpoint regenerated v2_all.py from the checked-in local schema for the branch-local protocol changes.

Add codex_config::test_support::CloudConfigBundleFixture to centralize cloud bundle test setup. The fixture supports quick single-layer enterprise requirement/config loaders, additive enterprise requirement/config layers for multi-layer tests, and conversion into either a bundle or loader, removing copied private helpers from core and app-server tests.

Verification: just fmt; just fix -p codex-config; cargo test -p codex-config test_support; cargo test -p codex-core cloud_config_bundle_take_precedence_over_mdm_requirements; cargo test -p codex-app-server write_value_rejects_feature_requirement_conflict; git diff --cached --check. Earlier checkpoint verification also covered codex-cloud-config, codex-backend-client, codex-hooks, codex-core-plugins, selected codex-core cloud config tests, selected codex-app-server cloud config tests, bazel lock update/check, and a Python SDK smoke test for enterpriseManaged and cloudManagedConfig.
2026-05-28 18:24:48 -07:00
Joe Florencio
8aa8462583 Fix cloud config argument comments
Add the explicit parameter comments required by argument-comment-lint for cloud config bundle metric calls and cache identity tests.
2026-05-28 17:37:32 -07:00
Joe Florencio
f064cad531 Add cloud config bundle transport
Introduce codex-cloud-config as the bundle-oriented replacement path for the existing cloud requirements transport. This deliberately adapts the cloud-requirements behavior rather than changing semantics: eligible ChatGPT Business/Enterprise accounts fetch from the backend, failures fail closed, auth recovery is attempted on unauthorized responses, transient request failures retry with backoff, and the cache is scoped to the ChatGPT user/account identity.

The new crate fetches the generated backend config bundle endpoint and converts it into the config-domain CloudConfigBundle contract. The bundle shape mirrors the backend API with config_toml.enterprise_managed and requirements_toml.enterprise_managed sections only; we are not modeling future source types yet. Fragment order from the backend is preserved so later config/requirements composition can apply the same deterministic precedence rules as the server-delivered bundle.

Add a separate signed cache file for the bundle instead of reusing the legacy requirements cache. The cache keeps the same broad goals as cloud requirements: local reuse for active sessions, a 30 minute TTL, a 5 minute background refresh cadence, HMAC tamper detection, cache versioning, and identity mismatch rejection. Metrics mirror the old fetch_attempt/fetch_final/load split while adding a bundle_shape tag that reports none, empty, or the sorted enterprise_config/enterprise_requirements sources present in the bundle.

This commit keeps the existing codex-cloud-requirements path intact. Follow-up PRs will wire config loading to the new bundle loader, migrate requirements consumption onto the bundle, and then delete the legacy cloud requirements crate once the cutover is complete.

Also add the config-domain CloudConfigBundle/CloudConfigBundleLoader types in codex-config, matching the existing CloudRequirementsLoader injection pattern. codex-cloud-config owns transport, auth, caching, refresh, and metrics; codex-config owns the domain contract that later config composition will consume.

Verification: just fmt; cargo test -p codex-cloud-config; cargo test -p codex-config; just fix -p codex-cloud-config; just bazel-lock-update; just bazel-lock-check; git diff --check.
2026-05-28 17:37:32 -07:00
Joe Florencio
a33262fced Add cloud managed config layer support
Introduce an explicit enterprise-managed config layer source and the client-side machinery to materialize cloud-delivered config TOML fragments into the normal config stack. The new ConfigLayerSource::EnterpriseManaged variant carries the backend layer id and display name so diagnostics and debug output can point admins at the exact cloud layer that needs fixing.

Add codex_config::cloud_config_layers to build config layers from delivered fragments. The composition keeps backend layer order deterministic, resolves relative path settings against a supplied base directory for consistency with existing MDM-delivered config semantics, and stores the raw TOML with that base directory on ConfigLayerEntry so typed diagnostics can reparse non-file layers without relying on a synthetic filesystem path.

Keep this v1 pull-based and snapshot-oriented. The bundle loader/cache work can feed these helpers, but this change does not introduce dynamic refresh or announce/push semantics. Consumers continue to read the config state they are already handed.

Tighten provenance and diagnostics for non-file layers: enterprise-managed layers render as enterprise-managed config values in debug output, syntax/type errors use the layer display name, and synthetic hook source paths include the enterprise layer name/id when a filesystem path is needed for existing hook metadata surfaces.

Split hook provenance semantically by adding HookSource::CloudManagedConfig. Hooks delivered through enterprise-managed config layers now report cloud_managed_config / cloudManagedConfig, while hooks delivered through requirements remain CloudRequirements. The TUI labels the new source as Cloud-managed config, and analytics/core metric mappings were updated to include the new source.

Regenerate app-server protocol JSON and TypeScript schema fixtures for the new ConfigLayerSource and HookSource wire values.

Verification: just write-app-server-schema; cargo test -p codex-app-server-protocol; cargo test -p codex-hooks hook_metadata_for_config_layer_source; cargo test -p codex-core hook_run_metric_tags; cargo test -p codex-analytics hook_run_metadata; just fmt; just fix -p codex-protocol -p codex-app-server-protocol -p codex-hooks -p codex-analytics -p codex-core -p codex-tui.
2026-05-28 17:04:39 -07:00
Joe Florencio
08256d290a Compose requirements layers
Replace the cloud-only requirements fragment composer with a generic requirements layer composer. The new composer accepts already-prioritized requirements layers from any managed source, parses each layer independently, applies remote_sandbox_config within the layer, strips fields that require domain-specific handling, and then performs config-style TOML merging for the regular fields.

The default semantics now match config TOML: lower-priority layers provide defaults, higher-priority layers win scalar/list conflicts, and tables recurse. The special cases are explicit: rules.prefix_rules append in priority order, hooks append event groups and fail closed on active managed directory conflicts, permissions.filesystem.deny_read unions with stable deduplication, and remote_sandbox_config materializes allowed_sandbox_modes before the regular TOML merge.

Track provenance with RequirementSource::Composite when a table field is assembled from multiple layers, while preserving exact RequirementSource::EnterpriseManaged metadata for single enterprise-managed layers. Keep the legacy CloudRequirements source for the existing cloud requirements loader until the runtime switch removes that path.

Derive Serialize for SandboxModeRequirement so TOML materialization uses the enum serde names instead of duplicating string mappings.
2026-05-28 16:53:25 -07:00
Joe Florencio
aefd558661 Add config bundle transport types
Add curated generated OpenAPI models for the codex-backend config bundle response and re-export them through codex-backend-client.

Expose Client::get_config_bundle() for both Codex API and ChatGPT /wham path styles. This is transport-only wiring for the first cloud config bundle checkpoint; it does not replace the existing requirements loader or change runtime config behavior.
2026-05-28 15:18:11 -07:00
Michael Bolin
e7dda8070e Surface filesystem permission profiles in prompt context (#23924)
## Summary
Some permission profiles can encode filesystem reads that should remain
unavailable to the agent. Before this change, the model-visible context
and automatic approval review prompt summarized the effective
permissions as a legacy sandbox mode, which can omit permission-profile
filesystem entries from escalation decisions.

For example, a profile can grant workspace access while denying a
private subtree across every workspace root:

```toml
default_permissions = "restricted-workspace"

[permissions.restricted-workspace.workspace_roots]
"/Users/alice/project" = true
"/Users/alice/other-project" = true

[permissions.restricted-workspace.filesystem]
":minimal" = "read"

[permissions.restricted-workspace.filesystem.":workspace_roots"]
"." = "write"
"private" = "deny"
"private/**" = "deny"
```

The context window now describes the workspace roots and effective
filesystem side of the `PermissionProfile` directly, with deny entries
marked as non-escalatable:

```xml
<environment_context>
  <cwd>/Users/alice/project</cwd>
  <shell>zsh</shell>
  <filesystem><workspace_roots><root>/Users/alice/project</root><root>/Users/alice/other-project</root></workspace_roots><permission_profile type="managed"><file_system type="restricted"><entry access="read"><special>:minimal</special></entry><entry access="write"><path>/Users/alice/project</path></entry><entry access="write"><path>/Users/alice/other-project</path></entry><entry access="deny" escalatable="false"><path>/Users/alice/project/private</path></entry><entry access="deny" escalatable="false"><path>/Users/alice/other-project/private</path></entry><entry access="deny" escalatable="false"><glob>/Users/alice/project/private/**</glob></entry><entry access="deny" escalatable="false"><glob>/Users/alice/other-project/private/**</glob></entry></file_system></permission_profile></filesystem>
</environment_context>
```

Managed requirements can impose the same kind of deny-read restriction:

```toml
[permissions.filesystem]
deny_read = [
  "/Users/alice/project/private",
  "/Users/alice/project/private/**",
]
```

The automatic approval review prompt also receives the parent turn's
denied-read context, so review decisions can account for the active
permission profile.

## What Changed
- Render the effective filesystem profile in `<environment_context>`,
including profile type, filesystem entries, workspace roots, and
non-escalatable deny entries.
- Persist effective `workspace_roots` in `TurnContextItem` so
resumed/replayed context does not have to bind `:workspace_roots`
through legacy `cwd` fallback.
- Add explicit permission instructions that denied reads are policy
restrictions, not escalation targets.
- Pass the parent turn's denied-read context into automatic approval
reviews.
- Add targeted coverage for prompt rendering, workspace-root
materialization, replay context, and review prompt context.
- Keep the prompt-context test expectations platform-aware so the same
filesystem rendering assertions pass on Unix and Windows paths.

## Testing
- `just test -p codex-core
context::environment_context::tests::serialize_environment_context_with_full_filesystem_profile`
- `just test -p codex-core
context::environment_context::tests::turn_context_item_filesystem_uses_workspace_roots_instead_of_cwd`
- `just test -p codex-core
context::permissions_instructions::permissions_instructions_tests::builds_permissions_from_profile_with_denied_reads`
- `just fix -p codex-core`

I also attempted `just test -p codex-core`; the changed prompt-context
tests passed, but the full local run did not complete cleanly in this
sandboxed macOS environment due unrelated user-shell `CODEX_SANDBOX*`
expectations and integration-test timeouts.
2026-05-28 14:56:53 -07:00
Alexi Christakis
e92c952b2e [codex] Add user input client ids (#24653)
## Summary

Adds an optional `clientId` field to app-server v2 `UserInput` and
carries it through the core `UserInput` model so clients can correlate
echoed user input items without relying on payload equality.

## Details

- Adds `client_id: Option<String>` to core `UserInput` variants.
- Exposes the v2 app-server field as `clientId` on the wire and in
generated TypeScript.
- Preserves the id when converting between app-server v2 and core
protocol types.
- Regenerates app-server schema fixtures.

## Validation

- `just fmt`
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-protocol`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-protocol`
- `git diff --check`
2026-05-28 14:54:39 -07:00
viyatb-oai
a027135bc6 fix(exec-server): reject websocket requests with Origin headers (#24947)
## Why

`codex exec-server` has a local WebSocket listener, but it did not apply
the same browser-origin request handling as the `app-server` WebSocket
transport. Requests that carry an `Origin` header should not be upgraded
by this local transport, keeping both local WebSocket servers consistent
and avoiding unexpected browser-initiated connections.

## What changed

- Added an Axum middleware guard in
`codex-rs/exec-server/src/server/transport.rs` that returns `403
Forbidden` for requests carrying an `Origin` header.
- Added an integration test in `codex-rs/exec-server/tests/websocket.rs`
that covers rejection of an `Origin`-bearing WebSocket handshake.
- Kept ordinary WebSocket clients unchanged: existing no-`Origin`
initialization and process behavior remains covered by the crate tests.

## Validation

- `just test -p codex-exec-server` test phase (`186 passed`; run outside
the parent macOS sandbox so nested sandbox tests can execute)
- `just clippy -p codex-exec-server`
2026-05-28 14:44:14 -07:00
viyatb-oai
3cf737e4e3 fix: cancel Windows sandbox on network denial (#19880)
## Why

When Guardian or the sandbox network proxy detects and denies a network
attempt, core cancels the associated execution through `ExecExpiration`.
The Windows sandbox capture path was only forwarding the timeout
component of that expiration state. As a result, a sandboxed Windows
command whose network attempt had already been denied could keep running
until its timeout elapsed rather than terminating promptly in response
to the denial.

This change closes that cancellation-propagation gap for Windows sandbox
execution.

## What changed

- Added `WindowsSandboxCancellationToken` as the cancellation hook
exposed to Windows capture backends.
- Extracted the cancellation token from `ExecExpiration` in core and
passed it to both the direct and elevated Windows sandbox capture paths
alongside the existing timeout.
- Updated direct capture to poll for either process exit, timeout, or
cancellation and to terminate cancelled processes without reporting them
as timed out.
- Updated elevated capture to watch for cancellation and send the
existing `Terminate` IPC frame to the elevated runner. The watcher parks
for 50 ms between checks to bound response latency without a tight busy
wait.
- Added Windows regression coverage for a long-running PowerShell
command: cancellation ends capture before its timeout and does not set
`timed_out`.
- Added a visible skip diagnostic when that PowerShell-dependent
regression test cannot execute, and consolidated the duplicated
expiration-policy branch identified in review.

## Security

This improves enforcement after a denied network attempt has been
attributed to a Windows sandboxed execution: the command no longer
remains alive simply because Windows capture lost the cancellation
signal.

This PR does not claim to make Windows offline mode an airtight
no-network or no-exfiltration boundary. It does not introduce
AppContainer or change how network denial is detected; it makes an
already-detected denial promptly stop the affected sandboxed command.

## Validation

### Commands run

- `just fmt`
- `cargo test -p codex-windows-sandbox`
- `cargo test -p codex-core network_denial`
- `cargo clippy -p codex-core -p codex-windows-sandbox --tests --no-deps
-- -D warnings`
- `just argument-comment-lint -p codex-windows-sandbox -p codex-core`

The new capture regression is `cfg(target_os = "windows")`, so Windows
CI is the execution coverage for that test path. The local macOS test
runs validate the host-runnable crate and core network-denial behavior.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-28 21:28:06 +00:00
Michael Bolin
bc10e5b390 runtime: prepend zsh fork bin dir to PATH (#23768)
## Why

#23756 makes packaged Codex builds include and default to the bundled
zsh fork. The important reason to put that fork's directory at the front
of `PATH` is to keep executable-level escalation working after a command
leaves the original shell and later re-enters zsh through `env`.

The expected chain is:

1. The zsh fork runs the top-level shell command.
2. That command launches another program, such as `python3`, while
inheriting the `EXEC_WRAPPER` environment and the escalation socket fd.
3. That program spawns a shell script whose shebang is `#!/usr/bin/env
zsh` rather than `#!/bin/zsh`, and it does not close the escalation fd.
4. `/usr/bin/env` resolves `zsh` through `PATH`, so it must find the
packaged zsh fork before the system zsh.
5. Commands inside that nested script are intercepted by the zsh fork
and can still request escalation from Codex.

If `PATH` resolves `zsh` to the system shell instead, the nested script
loses zsh-fork exec interception. Commands that should request
escalation can then run only in the original sandbox, or fail there,
without Codex ever receiving the approval request.

Shell snapshots make this slightly more subtle: a snapshot can restore
an older `PATH` after the child shell starts. This PR treats the zsh
fork `PATH` prepend as an explicit environment override so snapshot
wrapping preserves it.

## What Changed

- Added shared zsh-fork runtime helpers that prepend the configured zsh
executable parent directory to `PATH` without duplicate entries.
- Applied the zsh fork `PATH` prepend to both zsh-fork `shell_command`
launches and unified-exec zsh-fork launches before sandbox command
construction.
- Kept the shell-command zsh-fork backend API narrow: it derives the
configured zsh path from session services and rebuilds its sandbox
environment from `req.env`, rather than accepting a second, competing
environment map or a separately threaded bin dir.
- Kept Unix-only zsh-fork `PATH` mutation out of Windows clippy-visible
mutability.
- Added coverage for duplicate `PATH` entries, for preserving the zsh
fork prepend through shell snapshot wrapping, and for the nested
`python3` -> `#!/usr/bin/env zsh` escalation flow.

## Testing

- `just fmt`
- `just fix -p codex-core`

I left final test validation to CI after the latest review-comment
cleanup. Before that cleanup, `just test -p codex-core zsh_fork` passed
locally for the zsh-fork-focused tests.
2026-05-28 14:10:40 -07:00
Celia Chen
0a8c835845 [codex] Remove Bedrock OSS models from catalog (#24960)
Remove the GPT OSS 120B and 20B entries from the Amazon Bedrock static
model catalog, as they are no longer supported.
2026-05-28 14:10:26 -07:00
iceweasel-oai
d9f53128b7 [codex] Handle PowerShell UTF-8 setup failures (#24949)
Fixes #12496.

## Why

Windows sandboxed PowerShell commands can run under
`ConstrainedLanguage` on some machines, especially enterprise-managed
Windows environments. In that mode, our PowerShell command prelude could
fail before every command because it directly assigned
`[Console]::OutputEncoding` to UTF-8. The actual user command still ran,
but Codex surfaced noisy `Cannot set property. Property setting is
supported only on core types in this language mode.` output for every
shell call.

## What Changed

- Makes the PowerShell UTF-8 output encoding prelude best-effort by
wrapping the assignment in `try { ... } catch {}`.
- Keeps the existing UTF-8 behavior when PowerShell allows the
assignment.
- Adds focused tests for adding the prelude and avoiding duplicate
prelude insertion.

## Validation

- `cargo fmt -p codex-shell-command`
- `cargo check -p codex-shell-command`
- `git diff --check`
- Verified a local `ConstrainedLanguage` PowerShell probe prints only
the command output with no property-setting error.
- Verified `codex exec` from a temporary `chcp 437` context reports
`utf-8` / `65001` and preserves non-ASCII output (`café`, `漢字`).
2026-05-28 13:58:20 -07:00
Felipe Coury
2e0c4f4977 fix(tui): prevent repository-configured code execution in /diff (#24954)
## Why

`/diff` is intended to display working-tree changes, but its Git
invocations honored repository-selected executable helpers. A repository
could configure diff/text conversion helpers, clean/process filters,
`core.fsmonitor`, or `post-index-change` hooks that execute when a user
runs `/diff`.

Fixes
[PSEC-4395](https://linear.app/openai/issue/PSEC-4395/codex-cli-diff-executes-repository-selected-diff-helpers).

## What Changed

- Pass `--no-textconv` and `--no-ext-diff` for tracked and untracked
diff generation.
- Discover configured `filter.<driver>.clean` and `.process` entries,
then neutralize the selected drivers through structured
`GIT_CONFIG_KEY_*` / `GIT_CONFIG_VALUE_*` overrides, including driver
names containing `=`.
- Run all `/diff` Git probes with `core.fsmonitor=false` and a null
`core.hooksPath`.
- Use short submodule reporting while ignoring dirty submodule
worktrees, since inspecting a checked-out submodule for dirtiness can
execute filters from that child repository. This intentionally omits
dirty-only submodule markers in order to preserve the non-executing
security boundary.
- Add real-Git marker tests covering filters, fsmonitor, hooks, and
configured helpers inside checked-out submodules.

## How to Test

1. In a repository with ordinary tracked and untracked edits, run
`/diff`.
2. Confirm the normal working-tree diff is shown for top-level files.
3. Run the targeted tests below; they configure executable marker
helpers for repository filters, fsmonitor, hooks, and a checked-out
submodule, then verify `/diff` does not invoke them.
4. Confirm a dirty-only submodule does not cause Codex to enter the
submodule and execute its configured helper.

Targeted tests:
- `just test -p codex-tui get_git_diff_`

Validation note: `just test -p codex-tui` runs the new coverage, but
this worktree currently also has two unrelated failing guardian tests:
`app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
and
`app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`.
2026-05-28 16:53:59 -03:00
Adam Perry @ OpenAI
b90ec46387 Add codex app-server --stdio alias (#24940)
## Summary
- Add `--stdio` as a direct alias for `codex app-server --listen
stdio://`.
- Keep `--stdio` and `--listen` mutually exclusive.
- Update the app-server README to document both forms.
2026-05-28 12:43:30 -07:00
Adam Perry @ OpenAI
9dd39f334e Move Bazel Windows jobs onto codex-runners (#24952)
The codex-windows runner group should be much faster than the default
GHA runners. Since bazel jobs on windows are frequently the long pole
for PRs checks, this will hopefully get people landing a bit faster.
2026-05-28 12:43:04 -07:00
Won Park
ecb41fcb64 Add feature-gated standalone image generation extension (#24723)
## Why

Add a standalone image generation path that can be exercised
independently of hosted Responses image generation, while retaining the
hosted tool as fallback unless the extension is actually available to
the model.

## What changed

- Added the `codex-image-generation-extension` crate with standalone
generate/edit execution, prior-image selection for edits, model-visible
image output, and local generated-image persistence.
- Installed the extension in app-server behind the disabled-by-default
`imagegenext` feature and backend eligibility checks.
- Updated core tool planning so eligible `image_gen.imagegen` exposure
replaces hosted `image_generation`, while unavailable configurations
retain hosted fallback.
- Added coverage for extension behavior, edit history reuse, feature
gating, auth eligibility, and hosted-tool replacement.
- The extension is installed through app-server only in this PR; other
execution paths retain hosted image generation because hosted
replacement occurs only when the standalone executor is actually
registered and model-visible.
- The initial extension contract intentionally fixes the image model to
`gpt-image-2` and uses automatic image parameters.
- Native generated-image history/card parity and rollout persistence
cleanup are intentionally deferred follow-up work.

## Validation

- `just test -p codex-image-generation-extension`
- `just test -p codex-features`
- `just test -p codex-core
hosted_tools_follow_provider_auth_model_and_config_gates`
- `just test -p codex-app-server`
- `just fix -p codex-image-generation-extension -p codex-features -p
codex-core -p codex-app-server`
- `just fmt`
- `just bazel-lock-update`
- `just bazel-lock-check`

---------

Co-authored-by: jif-oai <jif@openai.com>
2026-05-28 11:44:55 -07:00
jif-oai
462deb0426 Wire task completion into thread-idle lifecycle (#24928)
## Why

#24744 introduced the thread idle lifecycle hook so idle continuation
can be owned by lifecycle contributors instead of hard-coded goal
runtime plumbing. Task completion still called
`goal_runtime_apply(GoalRuntimeEvent::MaybeContinueIfIdle)` directly, so
the post-turn idle transition remained goal-specific and did not notify
generic thread lifecycle contributors.

## What Changed

- Add `Session::emit_thread_idle_lifecycle_if_idle()` to gate idle
emission on both no active turn and no queued trigger-turn mailbox work.
- Call that helper when a task clears the active turn, replacing the
direct `GoalRuntimeEvent::MaybeContinueIfIdle` path.
- Cover the behavior with `codex-core` session tests for emitting after
task completion and suppressing idle emission while trigger-turn mailbox
work is pending.

## Verification

- New tests in `core/src/session/tests.rs` exercise the idle lifecycle
emission and trigger-turn mailbox guard.
2026-05-28 20:05:41 +02:00
Adam Perry @ OpenAI
c2508db60d Revert "Add app-server startup benchmark crate" (#24937)
Reverts openai/codex#24651, broke musl job
https://github.com/openai/codex/actions/runs/26585495205/job/78330166927
2026-05-28 17:49:41 +00:00
canvrno-oai
6c1215dac6 TUI: Unified mentions tweaks + polish mentions rendering (#23363)
This change keeps unified @mentions behind the mentions_v2 gate, moves
the flag to under-development, and polishes mention rendering/history
behavior.

It also adds a few small improvements to the mentions feature around
mention rendering and history round-tripping for plugin/tool mentions in
message edit scenarios. Plugin selections now insert `@` mentions with
better casing, and saved history preserves the visible sigil so recalled
messages look the same as what the user typed.

- Preserves `@` sigils when encoding/decoding mention history for
tool/plugin paths.
- Improves plugin mention insertion so display names/casing are
reflected more cleanly in the composer.
- Update composer to render user-entered plugin mentions in the same
color as the mentions menu. ALso applies to recalled/edited messages.
- Left/right arrows no longer switch unified-mention search modes after
an @mention has already been accepted (Ex: arrowing left through a
composed message that contains @mentions).
- Keeps bound mentions stable around punctuation, so accepted `@`
mentions do not reopen the popup and punctuated `$` mentions still
persist to cross-session history.

**Steps to test**
- Ensure mentions_v2 is enabled through configuration or `--enable
mentions_v2`
- Type `@` in the TUI composer and verify filesystem/plugin/skill
results are displayed in the unified mentions menu.
- Select a plugin mention from the `@` popup and confirm the inserted
text is an `@...` mention with casing, then recall/edit the message and
confirm it still renders as `@...`.
- Mention a skill and verify that skills still insert as `$skill`
mentions rather than `@` mentions.
- Verify punctuated mentions such as `@plugin.` and `($skill)` keep
their bound mention behavior across editing and history recall.
2026-05-28 10:30:15 -07:00
Celia Chen
489bf38658 chore: add GPT-5.5 to the Amazon Bedrock catalog (#24701)
## Summary

Amazon Bedrock should expose GPT-5.5 alongside GPT-5.4, and the Bedrock
GPT entries should stay aligned with the canonical bundled OpenAI model
metadata instead of carrying a separate hand-written copy that can drift
over time. This change will be merged when the model is online.

This change:

- Adds the Bedrock Mantle model id for `openai.gpt-5.5`.
- Builds the Bedrock GPT-5.5 and GPT-5.4 catalog entries from the
bundled OpenAI model catalog, then overrides the Bedrock-facing slug,
explicit priority, and Bedrock-specific context windows.
- Hardcodes both `context_window` and `max_context_window` to `272000`
for Bedrock GPT-5.5 and GPT-5.4.
- Keeps `openai.gpt-5.5` as the default Bedrock model ahead of
`openai.gpt-5.4` and the Bedrock OSS models.
2026-05-28 10:29:06 -07:00
Gabriel Peal
577ec03bf8 [codex] Support ui visibility meta for tools (#24700)
## Summary

Adds support for the same ui.visibility metadata as resources

[spec](https://github.com/modelcontextprotocol/ext-apps/blob/main/specification/draft/apps.mdx#resource-discovery)
2026-05-28 10:24:03 -07:00
Michael Bolin
2264fdd4a2 Fix extension turn item emitter test event ordering (#24936)
## Why

PR #24813 added extension `TurnItemEmitter` coverage and introduced a
test that records a conversation history item before asserting
extension-emitted turn item events.

`record_conversation_items()` also emits a `RawResponseItem` event to
observers. The test was reading from the same event receiver and
expected the next event to be `ItemStarted`, so the test failed reliably
once the setup history item was present.

## What Changed

Update
`passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call` to
consume and assert the expected setup `RawResponseItem` before checking
the extension `ItemStarted`, `WebSearchBegin`, `ItemCompleted`, and
`WebSearchEnd` events.

This is test-only and does not change extension runtime behavior.

## Verification

- `cargo nextest run --no-fail-fast -p codex-core
tools::handlers::extension_tools::tests::passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call`
2026-05-28 09:59:34 -07:00
jif-oai
e2551a5e36 Reap stale multi-agent slots (#24903)
## Summary

- Let `close_agent` clean up an agent that is still registered in
`AgentRegistry` even when its underlying thread is already missing.
- Preserve the explicit-close boundary: for known stale thread-spawn
agents, mark the persisted spawn edge `Closed`, then treat
`ThreadNotFound` / `InternalAgentDied` as a successful close so the
registry slot can be released.
- Add a regression for MultiAgentV2 task-name targets where
`close_agent("worker")` succeeds after the worker thread has already
disappeared.

## Motivation

A worker can disappear from `ThreadManager` while its metadata still
exists in the root `AgentRegistry`. Before this change, the close tool
failed while trying to subscribe to the missing thread status, so it
never reached the cleanup path that releases the registered agent slot.
With `agents.max_threads = 1`, an explicit close of that stale task-name
agent could fail and leave the session unable to spawn a replacement.

## Scope

This PR intentionally does not add automatic stale-agent reaping to
`spawn_agent`, `resume_agent`, or `list_agents`. A thread being missing
from `ThreadManager` is not the same as an explicit close: persisted
open spawn edges are still the durable source of truth for resume and
task-name ownership until `close_agent` is called.

## Validation

- `just test -p codex-core -E
'test(multi_agent_v2_close_agent_reaps_stale_task_name_target) |
test(resume_agent_from_rollout_reopens_open_descendants_after_manager_shutdown)'`
- `just fix -p codex-core`
2026-05-28 18:48:43 +02:00
Gabriel Peal
8a827d6426 Expose MCP server info as part of server status (#24698)
# Summary

Expose MCP server info via App Server (when available) so apps can
render a richer MCP experience
2026-05-28 09:38:34 -07:00
Brent Traut
2a1158b8e2 feat(app-server): include turns page on thread resume (#23534)
## Summary

The client currently calls `thread/resume` to establish live updates and
immediately follows it with `thread/turns/list` to hydrate recent turns.
This lets `thread/resume` return that page directly, eliminating a round
trip and the ordering/deduplication gap between the two calls.

Experimental clients opt in with `initialTurnsPage: { limit,
sortDirection, itemsView }`. The response returns `initialTurnsPage` as
a `TurnsPage`, including cursors for paging further back in history.
Keeping the controls in a nested opt-in object provides the useful
`thread/turns/list` knobs without spreading page-specific parameters
across `thread/resume`.

## Verification

- `just fmt`
- `just write-app-server-schema --experimental`
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server
thread_resume_initial_turns_page_matches_requested_turns_list_page
--tests`
- `cargo test -p codex-app-server
thread_resume_rejoins_running_thread_even_with_override_mismatch
--tests`
- `just fix -p codex-app-server-protocol -p codex-app-server`
2026-05-28 09:18:13 -07:00
sayan-oai
2066874415 extension-api: add TurnItemEmitter to tool calls (#24813)
## Why
Extension-contributed tools need to emit visible turn items through
Codex's normal event and persistence pipeline.

## What
- Add `TurnItemEmitter` to extension `ToolCall`s and route the core
implementation through `Session::emit_turn_item_*`.
- Hold weak session and turn references so retained tool calls cannot
keep host state alive.
- Provide a no-op emitter for extension test callers.

## Test Plan
- `just test -p codex-core -E
'test(passes_turn_fields_and_scoped_turn_item_emitter_to_extension_call)'`

---------

Co-authored-by: jif-oai <jif@openai.com>
2026-05-28 09:13:43 -07:00
Adam Perry @ OpenAI
a061befb46 Remove libubsan CI workaround (#24782)
It seems that this was added to allow rustc to load proc macros that had
been compiled with UBSan enabled, which zig does for debug and
`ReleaseSafe` builds. When zig drives the link of the final binary it
knows to include the ubsan runtime, but our zig-built artifacts are
being linked into a binary whose linking rustc drives. This removes the
libubsan workaround we have and replaces it with
`-fno-sanitize=undefined` passed to zig.

The new argument is passed at the end of zig's args so should take
precedence over any earlier arguments from the script's caller.
2026-05-28 15:49:01 +00:00
jif-oai
e426d48f6d Gate goal tools by thread eligibility (#24925)
## Why

Goal tools create and update goal state for a persistent thread. The
extension was only checking whether goals were enabled before
advertising those tools, which meant they could be surfaced in contexts
that should not receive thread goal controls: ephemeral threads without
persistent thread state and review subagents.

Those sessions can still run the goal extension lifecycle, but the
thread tools should only be visible when the current thread can safely
use them.

## What changed

- Adds a `GoalRuntimeConfig` that separates goal enablement from whether
goal tools are available for the current thread.
- Computes tool eligibility on thread start from
`persistent_thread_state_available` and `SessionSource`, hiding tools
for review subagents.
- Uses `GoalRuntimeHandle::tools_visible()` when contributing thread
tools so enabled runtime state does not automatically imply tool
exposure.
- Adds backend coverage for hiding goal tools on ephemeral threads and
review subagents.

## Testing

- Added `goal_tools_hidden_for_ephemeral_threads`.
- Added `goal_tools_hidden_for_review_subagents`.
2026-05-28 17:47:17 +02:00
Adam Perry @ OpenAI
bd2a732923 Add app-server startup benchmark crate (#24651)
## Summary
- Add a new `app-server-start-bench` crate to measure app-server startup
performance
- Wire the benchmark into the workspace and Bazel build so it can be run
consistently
- Update lockfiles and repo automation to account for the new package
2026-05-28 08:46:30 -07:00
Vaibhav Srivastav
a4ed6c5aa0 [codex] Update OpenAI Docs skill (#24914)
## Summary
- update the bundled `openai-docs` system skill to match the latest
`openai-docs-plus` content from `skills-internal`
- add the cached Codex manual fetch helper and expand the skill routing
for Codex self-knowledge
- keep the stable local skill identity and labels as `openai-docs`

## Why
The built-in OpenAI Docs skill needed to reflect the current upstream
guidance from `skills-internal` while preserving the local system-skill
name used by Codex.

## Impact
Codex now ships the newer OpenAI Docs skill behavior for Codex
self-knowledge and manual-first documentation lookups.

## Validation
- `just test -p codex-skills`
- exact directory diff against transformed `skills-internal`
`origin/main` was clean
2026-05-28 15:11:11 +00:00
pakrym-oai
1c7832ffa3 [codex] Store pending response items directly (#24865) 2026-05-28 07:23:08 -07:00
jif-oai
e7d156eb08 Add turn error lifecycle contributor (#24916)
Summary
- Add TurnErrorInput and TurnLifecycleContributor::on_turn_error to the
extension API.
- Emit the turn-error lifecycle from core turn error paths, including
usage limit failures.
- Add direct lifecycle coverage for the emitted error facts and stores.

Tests
- just fmt
- git diff --check
- Not run: full tests or clippy (per instructions)
2026-05-28 16:13:54 +02:00
jif-oai
ec803fe6c7 Add thread start contributor facts (#24915)
Summary: add session source and persistent-state availability to
ThreadStartInput; populate them from session init; update existing goal
test harness constructors. Tests: just fmt; git diff --check. No full
tests or clippy run per request.
2026-05-28 16:10:55 +02:00
cooper-oai
e5afe5bf8c [codex-cli] Refresh near-expiry ChatGPT access tokens before requests (#23546)
## Summary

- refresh managed ChatGPT auth during auth resolution when its access
token is inside ChatGPT web's five-minute near-expiry window
- cover refresh-window decisions while preserving the existing
expired-token refresh path

## Why

Codex already resolves managed ChatGPT auth before outbound requests and
refreshes expired access tokens there. This change adjusts the existing
predicate to refresh a still-valid access token once it is within the
same five-minute refresh window used by ChatGPT web, avoiding a request
with a token about to expire.

A cross-process serialization follow-up was explored in #24663 and
closed for now; we do not currently suspect cross-process refresh races
are a root cause of the refresh errors under investigation.

External-token, API-key, and Agent Identity auth modes remain unchanged.

## Validation

- `bazel test //codex-rs/login:login-all-test`
- `just fmt` runs Rust formatting successfully, then its Python SDK Ruff
step cannot install `openai-codex-cli-bin==0.131.0a4` on this Linux
environment because no compatible wheel is published.
2026-05-28 05:40:17 -07:00
jif-oai
3abf96739b Add Guardian review metrics (#24897)
## Why

Guardian reviews already emit analytics events, but we do not expose
aggregate OpenTelemetry metrics for review volume, latency, token usage,
or terminal outcomes. That makes it harder to monitor Guardian behavior
during rollouts and to compare review outcomes by source, action type,
session kind, model, and failure mode.

## What Changed

- Added Guardian review metric names for count, total duration, time to
first token, and token usage in `codex-rs/otel`.
- Added `core/src/guardian/metrics.rs` to convert
`GuardianReviewAnalyticsResult` into sanitized metric tags covering
decision, terminal status, failure reason, approval request source,
reviewed action, session kind, risk/outcome, model, reasoning effort,
and context/truncation state.
- Emitted the new metrics from `track_guardian_review` for each terminal
Guardian review result.

## Testing

- Added
`guardian_review_metrics_record_counts_durations_and_token_usage`, which
verifies the emitted count, duration, TTFT, token usage histograms, and
tag set through the in-memory metrics exporter.
2026-05-28 14:07:25 +02:00
jif-oai
ba6678ca9a Fix memories namespace for Responses API tools (#24898)
## Why

Dedicated memories tools are exposed through a Responses API namespace
tool. The namespace itself has to be a valid tool identifier, so
`memories/` can fail validation before the model ever gets a chance to
call the memory tools.

## What changed

- Changed `MEMORY_TOOLS_NAMESPACE` from `memories/` to `memories`.
- Added `memory_tool_namespace_matches_responses_api_identifier` so the
namespace stays non-empty and limited to Responses-safe identifier
characters.

## Verification

- Added unit coverage for the namespace identifier shape in
`codex-rs/ext/memories/src/tests.rs`.
2026-05-28 14:06:21 +02:00
jif-oai
120edad8ed [codex] Fix Guardian argument comment lint (#24902)
## Summary
- Add the required `/*parent_thread_id*/` argument comment at the
Guardian review session test callsite flagged by CI.

## Validation
- `just fmt`
- Not run: clippy/tests, per request; CI will cover them.
2026-05-28 13:38:48 +02:00
jif-oai
3307240195 Use stable Guardian prompt cache keys (#24803)
## Why

Guardian review sessions are reusable across forks when their
`GuardianReviewSessionReuseKey` is unchanged, but the underlying
Responses request was still using the child thread ID as
`prompt_cache_key`. That meant forked Guardian reviews that should share
cache context produced different cache keys, reducing prompt cache reuse
and weakening the reuse invariant.

## What Changed

- Adds a `ModelClient` prompt cache key override and uses it for
`ResponsesApiRequest.prompt_cache_key`.
- Computes Guardian review cache keys as
`guardian:<sha1(parent_thread_id:reuse_key)>`, scoped to the parent
thread plus the reuse-sensitive Guardian config.
- Wires session construction to apply that override only for Guardian
sub-agent sessions.

## Testing

- Added coverage that Guardian cache keys are stable for the same
parent/reuse key, change when either the parent thread or reuse key
changes, fit within the Responses API length limit, and are absent for
non-Guardian sessions.
- Extended the parallel review test to assert forked Guardian reviews
send the same `prompt_cache_key`.
2026-05-28 12:55:08 +02:00
jif-oai
4b9eda6ff6 Thread Guardian cache key through session (#24895)
Split from the Guardian prompt cache key change. This PR only updates
codex-rs/core/src/session/session.rs. Validation was not run per
request; this branch is expected to rely on the companion split PRs.
2026-05-28 12:36:40 +02:00
jif-oai
4e57239cac Assert Guardian prompt cache key reuse (#24894)
Split from the Guardian prompt cache key change. This PR only updates
codex-rs/core/src/guardian/tests.rs. Validation was not run per request;
this branch is expected to rely on the companion split PRs.
2026-05-28 12:36:36 +02:00
jif-oai
4ce563a873 Add Guardian review prompt cache key (#24893)
Split from the Guardian prompt cache key change. This PR only updates
codex-rs/core/src/guardian/review_session.rs. Validation was not run per
request; this branch is expected to rely on the companion split PRs.
2026-05-28 12:36:25 +02:00
jif-oai
bf4978a01f Export Guardian prompt cache key helper (#24892)
Split from the Guardian prompt cache key change. This PR only updates
codex-rs/core/src/guardian/mod.rs. Validation was not run per request;
this branch is expected to rely on the companion split PRs.
2026-05-28 12:36:17 +02:00
jif-oai
c95eb3d07b Stabilize Guardian client cache key handling (#24891)
Split from the Guardian prompt cache key change. This PR only updates
codex-rs/core/src/client.rs. Validation was not run per request; this
branch is expected to rely on the companion split PRs.
2026-05-28 12:36:01 +02:00
jif-oai
d5ec93f379 Move memories root setup out of core config (#24758)
## Why

Config loading should not create or write-authorize the memories root
just because memory support exists. Memory startup is the code path that
actually materializes that tree.

## What

- Stop creating the memories root during Config load and remove it from
legacy workspace-write projections.
- Grant the memories root read access only when the memories feature and
use_memories are enabled.
- Create the memories root inside memories startup before seeding
extension instructions.
- Update config and startup tests around the ownership boundary.

## Tests

- just fmt
- just fix -p codex-core
- just fix -p codex-memories-write
- just test -p codex-core
memory_tool_makes_memories_root_readable_without_creating_or_widening_writes
workspace_write_includes_configured_writable_root_once_without_memories_root
permission_profile_override_keeps_memories_root_out_of_legacy_projection
permissions_profiles_allow_direct_write_roots_outside_workspace_root
default_permissions_profile_populates_runtime_sandbox_policy
- just test -p codex-memories-write memories_startup_creates_memory_root

Note: a broader just test -p codex-core run is not clean in this
sandbox; it hit missing test_stdio_server plus seatbelt, realtime, and
environment-sensitive failures. The changed config tests above pass.
2026-05-28 11:51:24 +02:00
Ahmed Ibrahim
46946bb91c [codex] Stage Python SDK beta versions from release tags (#24872)
## Summary
- Treat `sdk/python` as a development template with source version
`0.0.0-dev`, matching the existing Python runtime packaging pattern.
- Have `python-v*` tags supply the published SDK beta version through
the existing `stage-sdk --sdk-version` path.
- Remove the workflow check requiring a source version bump for each
beta release and remove its now-unused host Python setup step.
- Keep the reviewed runtime dependency pin at
`openai-codex-cli-bin==0.132.0`.
- Remove beta-number-specific documentation so it does not need editing
for each publish.

## Why
The package staging script already writes the release version into the
artifact. Requiring the checked-in SDK template version to match every
tag adds release-only source churn without changing the package users
receive.

## Validation
- Not run locally; relying on online CI for this workflow and metadata
change.

## Release
After this PR lands, publish the next beta by pushing tag
`python-v0.1.0b2` from merged `main`.
python-v0.1.0b2
2026-05-27 23:24:42 -07:00
Ahmed Ibrahim
3d9f58ace8 [codex] Remove Python SDK beta warning note (#24870)
## Summary
- Remove the beta warning callout from the PyPI-facing Python SDK
README.
- Keep the existing Beta title and install/usage guidance unchanged.

## Validation
- Not run locally; relying on online CI for this documentation-only
change.

## Release
- Land this change before publishing the next Python SDK beta.
2026-05-27 23:13:48 -07:00
Ahmed Ibrahim
4a53b09eb7 [codex] Remove Python SDK language classifiers (#24868)
## Summary
- Remove the Python language classifiers from the Python SDK package
metadata.
- Keep `requires-python = ">=3.10"` as the package's interpreter
compatibility constraint.
- Avoid presenting a curated version-support list in PyPI metadata.

## Validation
- Not run locally; relying on online CI for this metadata-only change.

## Release
- Land this change before publishing the next Python SDK beta.
2026-05-27 23:09:34 -07:00
Ahmed Ibrahim
d0663f8004 [codex] Simplify Python SDK install guidance (#24866)
## Summary
- Remove the exact-version install snippet from the PyPI-facing Python
SDK README.
- Remove the release-selection explanation so the install section
presents the standard `pip install openai-codex` path directly.

## Validation
- Not run locally; relying on online CI for this documentation-only
change.
2026-05-27 23:01:42 -07:00