codex

mirror of https://github.com/openai/codex.git synced 2026-05-22 03:54:18 +00:00

Author	SHA1	Message	Date
anp-oai	f198ca115b	feat: Add btw alias for side slash command (#23592 )	2026-05-20 15:49:35 +00:00
Michael Bolin	e9f59e30d9	release: publish Codex package archive checksums (#23635 ) ## Summary Standalone installers and other downstream package consumers need a stable checksum source for the canonical package archives. Relying on per-asset metadata makes that harder to consume uniformly, especially when several package archives are produced in the same release. This keeps the `codex-package-.tar.gz` and `codex-app-server-package-.tar.gz` assets in the GitHub Release upload set and adds `codex-package_SHA256SUMS` to `dist/` before the release is created. The manifest contains one SHA-256 line per package archive and fails the release job if no package archives are present. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23635). * #23638 * #23637 * #23636 * __->__ #23635	2026-05-20 08:48:04 -07:00
Michael Bolin	b0b383bea3	runtime: use install context for bundled bwrap (#23634 ) ## Summary The Linux sandbox should find bundled `bwrap` through the same package-layout abstraction as the rest of the runtime, instead of maintaining a separate standalone-specific lookup path. This adds an `InstallContext` helper for bundled resources and updates `codex-linux-sandbox` to ask the current install context for `codex-resources/bwrap` before falling back to the old executable-relative probes. The tests cover npm-style, standalone, and canonical package layouts so `bwrap` lookup follows the package structure introduced earlier in the stack. ## Test plan - `cargo test -p codex-install-context` - `cargo test -p codex-linux-sandbox --lib` - `just fix -p codex-install-context -p codex-linux-sandbox` - `just bazel-lock-check` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23634). * #23638 * #23637 * #23636 * #23635 * __->__ #23634	2026-05-20 08:24:43 -07:00
pakrym-oai	a52c91d8b5	[codex] Hide deferred tools from code mode prompt (#23605 ) ## Why `code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools` was failing because code-mode prompt generation used the same nested tool spec list for both the model-visible `exec` guide and the runtime `ALL_TOOLS` surface. That allowed deferred MCP/app tools, such as `calendar_timezone_option_99`, to leak into the `exec` description even though they should only be discoverable through `ALL_TOOLS` at runtime. ## What changed Split code-mode nested tool planning into two sets in `core/src/tools/spec_plan.rs`: - runtime nested tool specs still include deferred tools, so `tools[...]` and `ALL_TOOLS` can call them - `exec` prompt docs only render non-deferred tools, so deferred app tools stay out of the model-visible guide ## Validation - `cargo test -p codex-core --test all code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools -- --nocapture` - looped the same focused test 5 additional times with `cargo test -q -p codex-core --test all code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools`	2026-05-20 08:09:45 -07:00
jif-oai	59507b8491	feat: expose turn-start metadata to extensions (#23688 ) ## Why The goal extension needs more context when a turn starts than `turn_store` alone provides. In particular, goal accounting needs the stable turn id, the effective collaboration mode, and the cumulative token-usage baseline captured at turn start so it can: - suppress goal accounting for plan-mode turns - compute exact per-turn deltas from cumulative `total_token_usage` snapshots instead of relying on the most recent usage event alone - keep the extension-owned accounting path aligned with the host turn lifecycle ## What - extend `codex_extension_api::TurnStartInput` to expose `turn_id`, `collaboration_mode`, and `token_usage_at_turn_start` - pass the full `TurnContext` plus the captured token-usage baseline through the turn-start lifecycle emission path - initialize goal turn accounting from the turn-start baseline and collaboration mode - switch goal token accounting to compute deltas from cumulative `total_token_usage` snapshots - add coverage for the new turn-start lifecycle fields and for goal-accounting baseline behavior ## Testing - added `turn_start_lifecycle_exposes_turn_metadata_and_token_baseline` in `codex-rs/core/src/session/tests.rs` - added `ext/goal/tests/accounting.rs` coverage for baseline-aware goal accounting and plan-mode suppression	2026-05-20 15:54:29 +02:00
jif-oai	1392a2a770	feat: async turn item process (#23692 ) Mechanical change	2026-05-20 15:30:01 +02:00
jif-oai	f64fce61b3	feat: async approval contrib (#23690 )	2026-05-20 15:13:54 +02:00
jif-oai	b555dd5d1d	feat: wire goal extension tools to the dedicated goal store (#23685 ) ## Why `ext/goal` already had the tool specs and contributor wiring for `/goal`, but the installed tools still depended on a placeholder backend that always errored. That meant the extension could not actually own goal persistence even though the dedicated `thread_goals` store already exists. This change wires the extension tools directly to the dedicated goal store so the extension can create, read, and complete goals against real state instead of falling back to host-side placeholders. ## What changed - make `install_with_backend(...)` require `Arc<codex_state::StateRuntime>` so goal storage is always available when the extension is installed - remove the unused no-backend/public backend abstraction from `ext/goal` and have the tool executors talk directly to `StateRuntime` - map `thread_goals` rows into the existing protocol response shape for `get_goal`, `create_goal`, and `update_goal` - preserve current thread-list behavior by filling an empty thread preview from the goal objective when a goal is created through the extension path - add integration coverage for the installed tool surface, including successful goal creation and duplicate-create rejection ## Testing - `cargo test -p codex-goal-extension`	2026-05-20 14:44:17 +02:00
jif-oai	51d6616431	fix: main (#23675 ) Fix main due to conflicting merges This is only fixing some imports and mechanics	2026-05-20 12:27:39 +02:00
jif-oai	9483b09ea4	feat: rename 2 (#23668 ) Just a mechanical renaming	2026-05-20 12:11:44 +02:00
jif-oai	66d5edf825	feat: rename 3 (#23669 ) Just a mechanical renaming	2026-05-20 12:07:06 +02:00
jif-oai	93456320ef	feat: rename 1 (#23667 ) Just a mechanical renaming	2026-05-20 12:05:58 +02:00
jif-oai	18cefba922	Add timeout for remote compaction requests (#23451 ) ## Why Remote compaction currently sends a unary `POST /responses/compact` and waits for the full response before replacing history or emitting the completed `ContextCompaction` item. Unlike normal `/responses` streaming requests, this unary compact request had no timeout boundary. If the backend accepts the request and then stalls before returning a body, the existing request retry policy never sees a transport error, so the compact turn can remain stuck after the started item with no completion or actionable error. That matches the reported hang shape in issues such as #18363, where logs show `responses/compact` was posted but no corresponding compact completion followed. A bounded request timeout gives the existing retry policy a concrete timeout error to retry instead of letting the user sit indefinitely on automatic context compaction. ## What - Add a request timeout to legacy `/responses/compact` calls. - Size that timeout from the provider stream idle timeout with a conservative multiplier, so the default compact attempt gets 20 minutes rather than the 5 minute stream idle window. - Map API transport timeouts to a request timeout error instead of the child-process timeout message. ## Testing - Not run (per request; CI will cover).	2026-05-20 11:56:00 +02:00
richardopenai	000bf5ce6d	Migrate exec-server remote registration to environments (#23633 ) ## Summary - migrate exec-server remote registration naming from executor to environment - align CLI, public Rust exports, registry error messages, and relay test fixtures with the environment registry contract - keep the live registration path and response model consistent with `/cloud/environment/{environment_id}/register` ## Verification - `cargo test -p codex-exec-server remote::tests::register_environment_posts_with_auth_provider_headers --manifest-path /Users/richardlee/code/codex/codex-rs/Cargo.toml` - `cargo test -p codex-exec-server --test relay multiplexed_remote_environment_routes_independent_virtual_streams --manifest-path /Users/richardlee/code/codex/codex-rs/Cargo.toml` - `cargo check -p codex-cli --manifest-path /Users/richardlee/code/codex/codex-rs/Cargo.toml` (still running when PR opened; will update after completion if needed)	2026-05-20 00:25:04 -07:00
sayan-oai	34aad43684	add encryptedcontent to functioncalloutput (#23500 ) add new `EncryptedContent` variant to `FunctionCallOutputContentItem` ahead of standalone websearch. we need to be able to receive and pass encrypted function call output from the new web search endpoint back to responsesapi, as we cannot expose direct search results.	2026-05-19 23:47:48 -07:00
Michael Bolin	cfa16fcc2e	runtime: detect Codex package layout (#23596 ) ## Why The package-builder stack now creates a canonical Codex package directory where the entrypoint lives under `bin/`, bundled helper resources live under `codex-resources/`, and bundled PATH-style tools live under `codex-path/`. That layout is not specific to the standalone installer: npm, brew, install scripts, and manually unpacked artifacts should all be able to use the same package shape. The Rust runtime still only knew about the legacy standalone release layout, where resources sit next to the executable. A packaged binary therefore would not identify its package root or prefer the bundled `rg` from `codex-path/`. ## What changed - Adds `CodexPackageLayout` to `codex-install-context` and detects it from an executable path shaped like `<package>/bin/<entrypoint>` when `<package>/codex-package.json` is present. - Splits `InstallContext` into an install `method` plus an optional package layout so the layout is shared across npm, bun, brew, standalone, and other launch contexts. - Stores package-layout paths as `AbsolutePathBuf` values. - Keeps `codex-resources/` and `codex-path/` optional so Codex can still run with degraded behavior if sidecar directories are missing. - Updates `InstallContext::rg_command()` to prefer bundled `codex-path/rg` or `rg.exe`, then fall back to the legacy standalone resources location, then system `rg`. - Updates `codex doctor` reporting so package installs show package, bin, resources, and path directories, and so bundled search detection recognizes `codex-path/` for any install method. ## Test plan - `cargo test -p codex-install-context` - `cargo test -p codex-cli` - `cargo test -p codex-tui update_action::tests::maps_install_context_to_update_action` - `just bazel-lock-check`	2026-05-19 23:13:49 -07:00
Michael Bolin	57a68fb9e3	ci: build Codex package archives in release workflow (#23582 ) ## Why Release CI already builds the Codex entrypoints before staging artifacts, and the package builder can now package those prebuilt binaries directly. The workflow should produce package-shaped sidecar archives from the same staged entrypoints that downstream distribution channels will eventually consume, without rebuilding `codex` or `codex-app-server` inside the packaging step. This intentionally does not publish the new package archives as GitHub Release assets yet. The archives are kept with workflow artifacts until npm, Homebrew, `install.sh`, winget, and related consumers are ready to switch over. ## What changed - Adds a `Build Codex package archive` step to `.github/workflows/rust-release.yml` after target artifacts are staged. - Runs `scripts/build_codex_package.py` for both release bundles: - `primary` builds `codex-package-${TARGET}.tar.gz` with `--variant codex`. - `app-server` builds `codex-app-server-package-${TARGET}.tar.gz` with `--variant codex-app-server`. - Passes `--entrypoint-bin target/${TARGET}/release/<entrypoint>` so packages contain the entrypoint already built by the workflow. - Deletes both package archive names before the final GitHub Release upload so they remain workflow artifacts only for now. ## Verification - Parsed `.github/workflows/rust-release.yml` with Ruby's YAML loader. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23582). * #23596 * __->__ #23582	2026-05-20 05:43:53 +00:00
Michael Bolin	343a74076f	build: package prebuilt Codex entrypoints (#23586 ) ## Why The package builder should describe the binaries it is actually packaging, not require callers to restate release metadata out of band. A caller-provided `--version` flag can drift from the workspace version, but running the target entrypoint to discover its version breaks cross-target packages when the produced binary cannot execute on the build host. This PR keeps package metadata tied to the repository source of truth by reading `[workspace.package].version` from `codex-rs/Cargo.toml`. It also prepares the package layout for `codex-app-server` packages: the same package structure can now represent either the CLI entrypoint or the app-server entrypoint while keeping shared sidecars such as `rg`, `bwrap`, and Windows sandbox helpers in the existing package directories. ## What changed - Removes the `--version` CLI flag from `scripts/build_codex_package.py`. - Adds Cargo.toml version discovery for `codex-package.json.version` via `codex-rs/Cargo.toml`. - Adds `--entrypoint-bin` so callers can package a prebuilt entrypoint instead of rebuilding it with Cargo. - Makes `--variant` an explicit choice between `codex` and `codex-app-server`, and uses it to select the cargo binary and packaged `bin/` entrypoint name. - Updates `scripts/codex_package/README.md` to document variants, prebuilt entrypoints, and Cargo.toml version detection. ## Verification - Compiled `scripts/build_codex_package.py` and `scripts/codex_package/*.py` with `PYTHONDONTWRITEBYTECODE=1`. - Ran `scripts/build_codex_package.py --help` and verified `--version` is gone while `--variant` and `--entrypoint-bin` are present. - Verified the package builder reads version `0.0.0` from `codex-rs/Cargo.toml`. - Built a fake cross-target `codex-app-server` package using a non-executable `--entrypoint-bin`; verified metadata records version `0.0.0`, variant `codex-app-server`, and `bin/codex-app-server` as the entrypoint.	2026-05-19 22:10:03 -07:00
xl-openai	dc255b0d8a	feat: Add vertical remote plugin collection support (#23584 ) - Adds an explicit vertical marketplace kind for plugin/list that fail-open fetches collection=vertical only when full remote plugins are disabled. - Renames the global remote marketplace/cache identity to openai-curated-remote and materializes remote installs with backend release versions and app manifests.	2026-05-19 22:03:08 -07:00
Eric Traut	9dda71dbae	Warn on invalid UTF-8 in AGENTS.md files (#23232 ) Fixes #23223. ## Why Malformed AGENTS instructions should not fail silently. The reported issue had invalid UTF-8 in a global `AGENTS.md`; before this change, Codex treated that decode failure like a missing file, so the personal instructions disappeared without a user-visible explanation and the rollout had no `# AGENTS.md instructions` block. Project-level AGENTS files already used lossy decoding, so their instructions still appeared, but invalid bytes were replaced without telling the user. Global and project AGENTS files should behave consistently: keep usable instruction text when possible, and surface a diagnostic when bytes had to be replaced. ## What changed Global `AGENTS.override.md` and `AGENTS.md` loading now reads bytes and decodes with replacement characters on invalid UTF-8, matching project-level AGENTS behavior. Both global and project AGENTS loading now emit a startup warning when invalid UTF-8 is found, and both keep the instruction text with invalid byte sequences replaced. Missing files, non-file candidates, empty files, and the existing `AGENTS.override.md` before `AGENTS.md` precedence keep their current behavior. ## How users see it The warnings flow through the existing startup warning surface. App-server clients receive config-time startup warnings as `configWarning` notifications during initialization, and thread startup emits startup warnings as thread-scoped `warning` notifications. Global AGENTS invalid UTF-8 warnings can appear on both surfaces. Project-level AGENTS invalid UTF-8 warnings are discovered while building thread instructions, so they appear as thread-scoped `warning` notifications. Clients that render warning notifications in the conversation surface show the message as a visible diagnostic instead of silently hiding or altering instructions.	2026-05-19 21:56:46 -07:00
Ahmed Ibrahim	5a4202ad90	[codex] Preserve raw code-mode exec output by default (#23564 ) ## Why Code mode can use nested unified exec calls as data sources. When those calls omit `max_output_tokens`, code mode should receive raw command output so the script can parse or summarize it itself. When code mode does provide `max_output_tokens`, that explicit nested budget should be respected, including values above the default unified exec limit, rather than being capped before code mode sees the result. ## What - Preserve direct unified exec truncation behavior, while letting code-mode exec/write_stdin keep `max_output_tokens` as `None` unless explicitly supplied. - Make code-mode tool results use raw output when no explicit limit is present, and use the explicit nested limit directly when one is specified. - Refactor unified exec output formatting so `truncated_output` takes the caller-selected token budget. - Add e2e integration coverage for explicit nested exec limits, omitted nested exec limits, outer exec limit propagation, omitted-limit outputs that exceed both the default and a small truncation policy, explicit nested limits above those caps, and high explicit limits that still compact larger command output. - Reuse the code-mode turn setup helper while directly asserting the exact exec output item in each test. ## Testing - `just fmt` - `git diff --check` - Not run locally per repo guidance; CI should validate the e2e integration tests.	2026-05-20 04:02:14 +00:00
Eric Traut	e43a2e297f	Fix stale background terminal poll events (#23231 ) ## Why Issue #23214 reports `/ps` showing no background terminals while the status line still says it is waiting for a background terminal. The race is in core: `write_stdin` can poll a process that exits before the response returns. The process manager correctly returns `process_id: None`, but the handler still emitted a `TerminalInteraction` event using the requested session id, causing clients to believe a dead process was still being polled. Fixes #23214. ## What changed - Suppress `TerminalInteraction` events for empty `write_stdin` polls once `response.process_id` is `None`. - Continue emitting interactions for non-empty stdin, even if that input causes the process to exit before the response returns. - Extend the unified exec integration test to assert completed empty polls do not emit terminal interactions. ## Verification - `cargo test -p codex-core --test all unified_exec_emits_one_begin_and_one_end_event` - `cargo test -p codex-core --test all unified_exec_emits_terminal_interaction_for_write_stdin` `cargo test -p codex-core` currently aborts in unrelated `agent::control::tests::resume_agent_from_rollout_uses_edge_data_when_descendant_metadata_source_is_stale` with a reproducible stack overflow.	2026-05-19 20:48:37 -07:00
Ahmed Ibrahim	532b9c83ae	Move plugin and skill warmup into session startup (#23535 ) ## Why Plugin and skill loading is useful as warmup and early validation, but session startup does not need to wait for that work before it can continue building the session. Keeping it on the serial startup path adds avoidable latency to every fresh thread start. We still want invalid skill configurations to show up quickly, and we want the warmup to exercise the same plugin and skill manager caches that the normal turn path uses. ## What changed - moved plugin and skill warmup into the session startup async path instead of eagerly awaiting it on the serial setup path - kept the warmup using the session's resolved filesystem/environment context so skill loading still sees the right roots - preserved early skill-load error logging so broken skill configurations still surface during startup - left the per-turn plugin and skill loading path unchanged, so turns still use the normal cached managers ## Testing - Not run locally; relying on CI for validation.	2026-05-19 20:05:52 -07:00
viyatb-oai	c3faea0b09	feat: add permission profile list api (#23412 ) ## Why Clients need a typed permission-profile catalog instead of reconstructing that state from config internals. ## What changed - Added `permissionProfile/list` to the app-server v2 protocol with cursor pagination and optional `cwd`. - The list response includes built-in permission profiles plus config-defined `[permissions.<id>]` profiles from the effective config for the request context. - Permission profiles keep optional `description` metadata for display purposes. - App-server docs and schema fixtures are updated for the new RPC.	2026-05-20 02:42:56 +00:00
Michael Bolin	1495302347	feat: expose codex-app-server version flag (#23593 ) ## Why `codex-app-server` is published as a standalone release binary, so it should support the same basic version inspection behavior users expect from command-line tools. This is independent of package assembly: package metadata now comes from `codex-rs/Cargo.toml`, but the standalone app-server binary should still answer `--version` directly. ## What changed - Enables Clap's generated `--version` flag for the `codex-app-server` binary by adding `#[command(version)]` to its top-level parser. ## Verification - Ran `cargo run -p codex-app-server --bin codex-app-server -- --version` and verified it prints `codex-app-server 0.0.0`.	2026-05-19 19:01:05 -07:00
starr-openai	64ef6cd1e4	Fan out rust-ci-full nextest by platform (#23358 ) ## Why `rust-ci-full` was paying the full Cargo nextest build-and-run cost once per platform, with Windows ARM64 as the long pole. This change moves the heavy work into one reusable per-platform flow: build a nextest archive once, then replay it across four shards so the platform lane spends less time running tests serially. For Windows ARM64, the archive is cross-compiled on Windows x64 and replayed on native Windows ARM64 shards so the slow ARM64 machine is used for execution rather than compilation. ## What changed - split the `rust-ci-full` nextest matrix into five explicit per-platform reusable-workflow calls - add `.github/workflows/rust-ci-full-nextest-platform.yml` to build one archive, upload timings/helpers, replay four nextest shards, upload per-shard JUnit, and roll the shard status back up per platform - add Windows CI helpers for Dev Drive setup and MSVC ARM64 linker environment export so the Windows ARM64 archive can be produced on Windows x64 - keep the existing Cargo git CLI fetch hardening inside the reusable workflow, since caller workflow-level `env` does not flow through `workflow_call` - document the archive-backed shard shape in `.github/workflows/README.md` - raise the default nextest slow timeout to 30s so the sharded full-CI path does not treat every >15s test as stuck ## Verification - validated the archive/shard flow with live GitHub Actions runs on this PR branch - Windows ARM64 cross-compile latency on completed runs: - https://github.com/openai/codex/actions/runs/26118759651: `34m30s` lane e2e, `17m16s` archive build, `9m55s` shard phase - https://github.com/openai/codex/actions/runs/26120777976: `30m36s` lane e2e, `17m21s` archive build, `6m50s` shard phase - comparable pre-cross-compile sharded Windows ARM64 runs were `55m01s`, `50m21s`, and `46m42s`, so the completed cross-compile runs improved the lane by roughly `12m` to `24m` versus the prior range - latest corrected cross-compile run: https://github.com/openai/codex/actions/runs/26120777976 - Windows ARM64 archive built successfully on Windows x64 - native Windows ARM64 shards started immediately after the archive upload - 3/4 Windows ARM64 shards passed; the failing shard hit the same existing `code_mode` test failure seen outside this lane - downloaded failed-shard JUnit XML from the validation runs and confirmed the remaining red is from known test failures, not archive/shard wiring - no local Codex tests run per repo guidance ## Notes - this PR does not change developers.openai.com documentation	2026-05-19 17:54:41 -07:00
Michael Bolin	79f044ed34	build: default Codex package target and output (#23541 ) ## Why The package builder should be easy to run during local iteration. Requiring callers to provide both a target triple and an output directory every time makes the common host-package case more awkward than necessary. This PR keeps explicit overrides available, but makes the default invocation useful: build for the current host platform and place the package in a fresh temporary directory. Because a temp output path is otherwise easy to lose, the builder continues to print the final package directory path when it completes. ## What changed - Makes `--target` optional and maps the host OS/architecture to supported Codex package target triples. - Uses GNU Linux target triples for Linux host defaults, while keeping the musl targets available for release jobs that pass `--target` explicitly. - Makes `--package-dir` optional and creates a new `codex-package-` temp directory when omitted. - Documents the new defaults in `scripts/codex_package/README.md`. ## Verification - Compiled `scripts/build_codex_package.py` and `scripts/codex_package/*.py` with `PYTHONDONTWRITEBYTECODE=1`. - Ran `scripts/build_codex_package.py --help` from outside the repo. - Verified Linux host detection maps `x86_64` and `aarch64` to GNU target triples. - Ran a fake-Cargo package build while omitting both `--target` and `--package-dir`; verified the generated metadata target, expected package files, and printed temp package path. - Ran a fake-Cargo package build for `x86_64-unknown-linux-gnu` and verified `codex`, `bwrap`, and `rg` are assembled into the package.	2026-05-20 00:05:43 +00:00
Michael Bolin	c58c84d6ee	test: fix multi-agent service tier assertion (#23576 ) ## Why `openai/codex#22169` added a regression test that expects an invalid child `service_tier` to be rejected, but the test used `Result::expect_err` on `SpawnAgentHandler::handle`. That requires the `Ok` type to implement `Debug`, and this handler returns `Box<dyn ToolOutput>`, so Bazel failed while compiling `codex-core` tests before it could run them. ## What changed - Capture the handler result and assert on `result.err()` instead of calling `expect_err`. - Keep the same `FunctionCallError::RespondToModel` assertion for the rejected service tier. ## Verification - `cargo test -p codex-core spawn_agent_role_service_tier_does_not_hide_invalid_spawn_request`	2026-05-19 16:47:20 -07:00
Matthew Zeng	b019a678d8	Remove unused ARC monitor path (#23573 ) ## Summary - remove the unreachable ARC monitor path from MCP tool approval handling - delete the unused ARC monitor module/tests and trim the orphaned safety-monitor decision plumbing - keep `always allow` approvals on the existing auto-approval short-circuit without a dead monitor hop ## Testing - `cargo test -p codex-core mcp_tool_call` - `just fmt` - `just fix -p codex-core` - `git diff --check` ## Additional validation - Attempted `cargo test -p codex-core`; the library test target passed, then the integration target failed in this local environment. - The narrower MCP-focused rerun passed its unit coverage and only hit missing local `test_stdio_server` binaries in filtered integration cases.	2026-05-19 16:23:25 -07:00
Michael Bolin	59f262a2b4	build: fetch rg for Codex packages (#23526 ) ## Why The Codex package builder should produce a complete package without requiring callers to pre-populate `rg` under `codex-cli/vendor` or have `dotslash` installed on `PATH`. The repo already tracks the authoritative DotSlash manifest in `codex-cli/bin/rg`, so the builder can read that metadata directly and fetch the correct ripgrep archive for the target it is packaging. ## What changed - Added `scripts/codex_package/ripgrep.py` to parse `codex-cli/bin/rg` after stripping the shebang, select the target platform entry, download the configured artifact, and verify the recorded size and SHA-256 digest. - Added a cache under `$TMPDIR/codex-package/<target>-rg` so verified archives can be reused without fetching again. - Extracted `rg`/`rg.exe` from `tar.gz` and `zip` artifacts into the package-builder cache, then copied that into `codex-path` through the existing package layout flow. - Kept `--rg-bin` as an explicit local override for offline tests and unusual local workflows. - Documented the default `rg` fetch/cache behavior in `scripts/codex_package/README.md`. ## Verification - Ran wrapper/module syntax compilation. - Ran `scripts/build_codex_package.py --help` from `/private/tmp`. - Ran a local manifest fetch test covering shebang-stripped manifest parsing, `tar.gz` extraction, `zip` extraction, size/SHA-256 verification, and cache reuse after deleting the original source archives. - Ran fake-cargo package/archive builds for macOS, Linux, and Windows target layouts with `--rg-bin`, including an assertion that generated tar archives contain no duplicate member names. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23526). * #23541 * __->__ #23526	2026-05-19 15:52:17 -07:00
canvrno-oai	27c4c67b15	Fix: TUI starting in wrong CWD (#23538 ) This fixes a regression where codex could start in the wrong directory when a live local app-server socket was present. The issue was that implicit local socket reuse was being treated like an explicit remote workspace session, which dropped the invoking cwd unless --cd was passed. The change separates local socket transport from true remote workspace semantics. - Plain local startup keeps local cwd, trust, resume, picker, and config-refresh behavior. - Explicit --remote keeps the existing remote cwd behavior. - Added coverage for launch target selection and local-session filtering/cwd behavior. Steps to test: - Start a local app-server from a different directory than the repo you want to use. - Launch codex from a project/worktree without --cd. - Confirm the session starts in the invoking directory, not the app-server process directory. - Confirm explicit codex --remote ... still preserves existing remote behavior.	2026-05-19 15:48:40 -07:00
adams-oai	d86352d520	Add CUA requirements subsection for locked computer use (#23555 ) Adds a new top-level section for "CUA" requirements that can allow for disablement of specific features as needed for enterprises.	2026-05-19 15:41:44 -07:00
Ahmed Ibrahim	c53da029bc	[codex] Honor role-defined spawn service tiers (#22169 ) ## Why Custom agent roles are ordinary config layers, so a role file can already express `service_tier` just like other config values. The spawned-agent tier path needs to preserve that effective role config and follow the same precedence pattern as model/reasoning. ## What changed - Apply an explicit spawn-time `service_tier` onto the child config before role application, so a role config layer can override it just like role-defined model/reasoning settings do. - Validate the final effective child tier after the final child model is known, while still falling back to the parent tier when no child tier survives. - Add focused integration coverage for both v1 and v2 proving role TOML loads a service tier, spawned children keep that role-configured tier, and a role tier wins over a conflicting spawn-time tier. ## Validation - `just fmt` - `git diff --check` - Local Rust tests not run, per repo guidance; CI should exercise the new coverage.	2026-05-19 22:40:41 +00:00
efrazer-oai	c2141c7ce0	fix: serialize unix app-server startup (#23516 ) # Summary Unix-socket app-server startup can currently race when multiple launch attempts target the same `CODEX_HOME`. Those processes can overlap before the control socket exists, which lets them enter SQLite state initialization concurrently and reproduce the startup corruption pattern seen in SSH mode. This change makes the app-server own that singleton startup guarantee. Unix-socket startup now takes a `CODEX_HOME`-scoped advisory lock before SQLite initialization, runs the existing control-socket preparation check while holding that lock, returns the established `AddrInUse` error when another live listener already owns the socket, and releases the lock once the new listener has bound its socket. # Design decisions - The singleton rule lives in `app-server --listen unix://`, not in a desktop-only caller path, so every Unix-socket launch gets the same race protection. - A duplicate raw app-server launch returns an error instead of silently succeeding. The attach operation remains `app-server proxy`, which continues to connect to an already-running listener. - The lock is held only across the dangerous startup window: socket preparation, SQLite initialization, and socket bind. It is not held for the app-server lifetime. - Listener detection stays in `prepare_control_socket_path(...)`, so the preexisting live-listener and stale-socket behavior remains the single source of truth. # Testing Tests: targeted Unix-socket transport tests on the branch checkout, full `codex-cli` build on `efrazer-db10`, and an SSH-style smoke on `efrazer-db10` covering concurrent app-server starts, explicit duplicate-start errors, and absence of SQLite startup-error matches in launch logs.	2026-05-19 14:57:11 -07:00
Matthew Zeng	8335b56c33	Split plugin install discovery into list and request tools (#23372 ) ## Summary - Add `list_available_plugins_to_install` as the inventory step for plugin and connector install suggestions. - Slim `request_plugin_install` so it only handles the actual elicitation, instead of carrying the full discoverable list in its prompt. - Emit send-time telemetry when an install elicitation is dispatched, including requested tool identity in the event payload. - Emit install-result telemetry through `SessionTelemetry`, including tool type, user response action, and completion status. - Update registration and tests to cover the new two-step flow while keeping the existing `tool_suggest` feature gate unchanged. ## Testing - `just fmt` - `cargo test -p codex-tools` - `cargo test -p codex-core request_plugin_install` - `cargo test -p codex-core list_available_plugins_to_install` - `cargo test -p codex-core install_suggestion_tools_can_be_registered_without_search_tool` - `cargo test -p codex-otel manager_records_plugin_install_suggestion_metric` - `cargo test -p codex-otel manager_records_plugin_install_elicitation_sent_metric` - `just fix -p codex-core` - `just fix -p codex-tools` - `just fix -p codex-otel` - `cargo check -p codex-core`	2026-05-19 14:45:37 -07:00
starr-openai	1509ae6d8d	Route local-only app-server gating through processors (#23551 ) ## Summary - move local-only app-server gating out of `MessageProcessor` - let `fs/`, `command/exec`, and `process/spawn` resolve local availability inside their owning processors - keep `fs/` mounted for the future environment-param path while preserving current no-local error behavior ## Validation - not run locally per Codex repo guidance	2026-05-19 14:38:03 -07:00
Tom	954a9c8579	Fix empty rollout path app-server handling (#23400 ) ## Summary - Coerce `path: ""` to `None` at the v2 protocol params deserialization boundary for `thread/resume` and `thread/fork`. - Restore the pre-ThreadStore running-thread resume behavior: if `threadId` is already running, rejoin it by id and treat a non-empty `path` only as a consistency check; otherwise cold resume keeps `history > path > threadId` precedence. - Add protocol, resume, and fork regression coverage for empty path payloads; refresh app-server schema fixtures for the clarified params docs. ## Tests - `just fmt` - `just write-app-server-schema` - `cargo test -p codex-app-server-protocol thread_path_params_deserialize_empty_path_as_none` - `cargo test -p codex-app-server-protocol --test schema_fixtures` - `cargo test -p codex-app-server empty_path` - `RUST_MIN_STACK=8388608 cargo test -p codex-app-server --test all thread_resume_rejects_mismatched_path_for_running_thread_id` - `RUST_MIN_STACK=8388608 cargo test -p codex-app-server --test all thread_resume_uses_path_over_non_running_thread_id`	2026-05-19 21:19:38 +00:00
Felipe Coury	40be41763c	fix(tui): preserve modified enter in plan questions (#23536 ) ## Why Plan mode questionnaires reuse the shared composer for free-form answers, but the surrounding `request_user_input` overlay still treated every `KeyCode::Enter` as “advance to the next question.” That made `Shift+Enter` insert a newline in the composer and then immediately advance the questionnaire anyway. Fixes #23448. ## What Changed - pass the live `RuntimeKeymap` into `RequestUserInputOverlay` so its embedded composer honors existing `/keymap` composer/editor remaps - advance free-form questions only on the configured composer submit binding, instead of any Enter-shaped key event - add regressions for `Shift+Enter` newline behavior and configured composer submit bindings inside the questionnaire UI ## How to Test 1. Start Codex in Plan mode and trigger a `request_user_input` questionnaire with a free-form answer field. 2. Focus the free-form field, type a line, then press `Shift+Enter`. 3. Confirm the answer gains a newline and the questionnaire stays on the same question. 4. Press the configured submit binding, or plain `Enter` with the default keymap, and confirm the questionnaire advances as before. Targeted tests: - `cargo test -p codex-tui bottom_pane::request_user_input::tests::freeform_ -- --nocapture` ## Notes - `cargo test -p codex-tui` still reaches an unrelated existing stack overflow in `app::tests::discard_side_thread_removes_agent_navigation_entry` on this checkout. - `just argument-comment-lint` is locally blocked by Bazel analysis failing in external `compiler-rt` before the lint runs.	2026-05-19 18:01:38 -03:00
starr-openai	83af3abc68	Refactor exec-server websocket pump (#23327 ) ## Why Exec-server websocket handling had separate reader and writer tasks for the same socket. That made websocket control-frame handling asymmetric: the task reading frames could observe `Ping`, but the task allowed to write frames was elsewhere. This PR moves each physical websocket onto one always-running pump so the socket owner can handle application frames and websocket control frames together. ## What changed - Refactored direct exec-server websocket connections in `connection.rs` to use one task that owns the websocket for outbound JSON-RPC, inbound JSON-RPC, periodic keepalive pings, and `Ping` -> `Pong` replies. - Refactored relay websocket handling in `relay.rs` the same way for both the harness-side logical connection and the multiplexed executor physical socket. - Preserved the existing keepalive ownership policy: outbound direct websocket clients still send periodic pings, inbound Axum accepts only reply with pongs, and relay physical websocket endpoints keep their existing periodic pings. - Added focused websocket pump tests for ping/pong, binary JSON-RPC, relay data, malformed relay text frames, and close/disconnect behavior. - Reconnect behavior is intentionally left for a follow-up. ## Validation - Devbox Bazel focused unit target: - `//codex-rs/exec-server:exec-server-unit-tests --test_filter='websocket_connection_\|harness_connection_\|multiplexed_executor_'`	2026-05-19 13:31:57 -07:00
starr-openai	5c43a64e2b	Make local environment optional in EnvironmentManager (#23369 ) ## Summary - make `EnvironmentManager` local environment/runtime paths optional - simplify constructor surface around snapshot materialization - rename local env accessors to `require_local_environment` / `try_local_environment` ## Validation - devbox Bazel build for touched crate surfaces - `//codex-rs/exec-server:exec-server-unit-tests` - `//codex-rs/app-server-client:app-server-client-unit-tests` - filtered touched `//codex-rs/core:core-unit-tests` cases	2026-05-19 12:55:34 -07:00
Michael Bolin	7f4d7ae3a4	build: add Codex package builder (#23513 ) ## Why Codex CLI packaging is currently split across npm staging, standalone installers, and release bundle creation, which makes it hard to define and validate a single valid package directory. This adds the first standalone package builder so later release paths can converge on the same canonical layout. ## What changed - Added `scripts/build_codex_package.py` as the stable executable wrapper around `scripts/codex_package`. - Added modules for CLI parsing, target metadata, grouped cargo builds, package layout validation, and archive writing. - The builder creates a package directory with `codex-package.json`, `bin/`, `codex-resources/`, and `codex-path`, and can serialize it as `.tar.gz`, `.tar.zst`, or `.zip`. - Source-built artifacts are built by one grouped `cargo build`: `codex` for all targets, `bwrap` for Linux, and the Windows sandbox helpers for Windows. `rg` remains an input because it is vendored from upstream rather than built from this repo. - Added `scripts/codex_package/README.md` to document the package layout, source-built artifacts, and cargo profile behavior. ## Verification - Ran wrapper/module syntax compilation. - Ran `scripts/build_codex_package.py --help` from `/private/tmp`. - Ran fake-cargo package/archive builds for macOS, Linux, and Windows target layouts, including an assertion that generated tar archives contain no duplicate member names. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23513). * #23526 * __->__ #23513	2026-05-19 19:54:03 +00:00
Abhinav	d661ab70ed	Add SubagentStart hook (#22782 ) # What `SubagentStart` runs once when Codex creates a thread-spawned subagent, before that child sends its first model request. Thread-spawned subagents use `SubagentStart` instead of the normal root-agent `SessionStart` hook. Configured handlers match on the subagent `agent_type`, using the same value passed to `spawn_agent`. When no agent type is specified, Codex uses the default agent type. Hook input includes the normal session-start fields plus: - `agent_id`: the child thread id. - `agent_type`: the resolved subagent type. `SubagentStart` may return `hookSpecificOutput.additionalContext`. That context is added to the child conversation before the first model request. # Lifecycle Scope Only thread-spawned subagents run `SubagentStart`. Internal/system subagents such as Review, Compact, MemoryConsolidation, and Other do not run normal `SessionStart` hooks and do not run `SubagentStart`. This avoids exposing synthetic matcher labels for internal implementation paths. Also, the `SessionStart` hook no longer fires for subagents, which matches the behavior of other coding agents' implementations. # Stack 1. This PR: add `SubagentStart`. 2. #22873: add `SubagentStop`. 3. #22882: add subagent identity to normal hook inputs.	2026-05-19 12:45:08 -07:00
Arun Eswara	d269aa2af9	Harden CLI rate limit window labels (#22929 ) ## Context The CLI rate-limit surfaces previously described usage windows as fixed 5-hour and weekly limits. We want the CLI to display whatever supported rate-limit period the server returns instead of assuming a 5-hour/1-week pair. This supports generalized Codex rate-limit periods. ## Summary - Formats CLI rate-limit warning/status labels only for the supported returned window durations: approximate 5h, daily, weekly, monthly, and annual. - Uses generic fallback copy when a primary or secondary window has no duration, so missing secondary protection data does not produce stale weekly copy. - Uses generic fallback copy for unsupported window durations instead of adding arbitrary hourly, multi-day, multi-week, or multi-year labels. - Updates status line and terminal title setup descriptions/previews to talk about primary/secondary usage limits rather than fixed 5h/weekly limits. - Adds rendered insta snapshot coverage for the updated rate-limit status surfaces and `/status` fallback labels. ## Tests Tested locally: - one primary window - one secondary window - primary and secondary window	2026-05-19 11:22:00 -07:00
viyatb-oai	3c76081876	Make `deny` canonical for filesystem permission entries (#23493 ) ## Why Filesystem permission profiles used `none` for deny-read entries, which is less direct than the action the entry actually represents. This change makes `deny` the canonical filesystem permission spelling while preserving compatibility for older configs that still send `none`. ## What changed - rename `FileSystemAccessMode::None` to `Deny` - serialize and generate schemas with `deny` as the canonical value - retain `none` only as a legacy input alias for temporary config compatibility - update filesystem glob diagnostics and regression coverage to use the canonical spelling - refresh config and app-server schema fixtures to match the new wire shape ## Validation - `cargo test -p codex-protocol` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-core config_toml_deserializes_permission_profiles --lib` - `cargo test -p codex-core read_write_glob_patterns_still_reject_non_subpath_globs --lib` Earlier in the session, a broad `cargo test -p codex-core` run reached unrelated pre-existing failures in timing/snapshot/git-info tests under this environment; the targeted surfaces touched by this PR passed cleanly.	2026-05-19 11:03:47 -07:00
jif-oai	05b8ce4354	chore: namespace v1 sub-agent tools (#23475 ) ## Why The v1 sub-agent tools are a single tool family, but they were exposed as separate flat function tools. This makes the model-visible surface less clearly grouped and leaves the legacy names in the same flat namespace as newer agent tooling. ## What - Wraps the v1 `spawn_agent`, `send_input`, `resume_agent`, `wait_agent`, and `close_agent` specs in the `multi_agent_v1` namespace. - Registers the corresponding handlers with namespaced runtime tool names. - Updates tool-planning, deferred tool search, and sub-agent notification tests to assert the namespace shape and child `spawn_agent` lookup. ## Verification - Updated `codex-core` coverage for the v1 multi-agent tool plan, deferred tool search output, and sub-agent tool descriptions.	2026-05-19 19:46:17 +02:00
pakrym-oai	ccbf0137db	[codex] Make contextual user fragments dyn-renderable (#23397 ) ## Why `ContextualUserFragment` needs to be usable behind `dyn` for render-only paths, but associated constants made the trait non-object-safe. ## What changed - Replaced associated constants with trait methods so `dyn ContextualUserFragment` can render fragments. - Preserved the existing typed `T::matches_text(text)` registration pattern via `type_markers()`. - Kept default `render()` on the main trait so implementations only provide role, markers, and body. - Added unit coverage for rendering a `Box<dyn ContextualUserFragment>`. ## Verification - `cargo test -p codex-core contextual_user_fragment_is_dyn_compatible` - `just fix -p codex-core`	2026-05-19 10:42:54 -07:00
Eric Traut	ae10708ae0	[2 of 4] tui: route app and skill enablement through app server (#22914 ) ## Why App and skill toggles are user config mutations too. When the TUI is attached to a remote app server, writing those toggles into the local `config.toml` makes the UI report success without updating the server that actually owns the session. This is [2 of 4] in a stacked series that moves TUI-owned config mutations onto app-server APIs. ## What changed - Routed app enable/disable persistence through app-server config batch writes. - Routed skill enable/disable persistence through `skills/config/write`. - Avoided refreshing local config from disk after these writes when the TUI is connected to a remote app server. ## Config keys affected - `apps.<app_id>.enabled` - `apps.<app_id>.disabled_reason` - `[[skills.config]]` entries keyed by `path`, with `enabled = false` used for persisted disables ## Suggested manual validation - Connect the TUI to a remote app server, disable an app, reconnect, and confirm the app remains disabled from remote config rather than local disk state. - Re-enable the same app and confirm both `apps.<app_id>.enabled` and `apps.<app_id>.disabled_reason` are cleared remotely. - Disable a skill in the manage-skills UI and confirm a remote `[[skills.config]]` disable entry appears. - Re-enable that skill and confirm the disable entry is removed and the effective enabled state updates without relying on local config reloads. ## Stack 1. [#22913](https://github.com/openai/codex/pull/22913) `[1 of 4]` primary settings writes 2. [#22914](https://github.com/openai/codex/pull/22914) `[2 of 4]` app and skill enablement 3. [#22915](https://github.com/openai/codex/pull/22915) `[3 of 4]` feature and memory toggles 4. [#22916](https://github.com/openai/codex/pull/22916) `[4 of 4]` startup and onboarding bookkeeping	2026-05-19 10:21:07 -07:00
pakrym-oai	f0663fd4fd	[codex] Preserve steer input as user input (#23405 ) ## Why Steered input was queued as a `ResponseInputItem`, then parsed back into a user message before recording. That path loses information that only exists on `UserInput`, such as UI text elements. This change keeps turn-local pending input typed as either original `UserInput` or existing response items, so steered user input reaches user-message recording without being reconstructed from a response item. ## What changed - Add `TurnInput` for active-turn pending input. - Queue `Session::steer_input` as `TurnInput::UserInput`. - Run pending-input hook inspection only for `TurnInput::UserInput`. - Process drained pending input item by item: accepted items are recorded, blocked items append hook context and are skipped. - Remove the pending-input prepend/requeue path. ## Validation - `just fmt` - `just fix -p codex-core` - `RUST_MIN_STACK=16777216 cargo test -p codex-core --lib session::tests::task_finish_emits_turn_item_lifecycle_for_leftover_pending_user_input -- --nocapture` - `RUST_MIN_STACK=16777216 cargo test -p codex-core --lib steer_input` - `RUST_MIN_STACK=16777216 cargo test -p codex-core --lib pending_input` - `RUST_MIN_STACK=16777216 cargo test -p codex-core --test all pending_input` - `RUST_MIN_STACK=16777216 cargo test -p codex-core` (unit tests passed: 1835 passed, 0 failed, 4 ignored; integration `all` target failed due to missing helper binaries such as `codex`/`test_stdio_server` plus unrelated MCP/search/code-mode expectations)	2026-05-19 09:47:43 -07:00
pakrym-oai	9289b7cea8	[codex] Move hook request plumbing into hook runtime (#23388 ) ## Why `run_turn` was still hand-building hook payloads and lifecycle events for a couple of hook paths. Most hook call sites already delegate request construction and event emission to `hook_runtime`, which keeps turn orchestration focused on model-flow decisions rather than hook plumbing. This also keeps the legacy `after_agent` message extraction next to the legacy hook dispatch instead of leaving response-item walking in `run_turn`. ## What changed - Added `run_stop_hooks` in `hook_runtime` to build `StopRequest`, emit preview start events, run the hook, and emit completion events. - Added `run_legacy_after_agent_hook` in `hook_runtime` to build and dispatch the legacy `AfterAgent` hook payload, including extracting input messages from response items. - Updated `run_turn` to call the hook runtime helpers and keep only the resulting continuation/block/stop decisions inline. - Removed the repeated pending session-start hook check from the run loop. ## Validation - `cargo test -p codex-core hook_runtime`	2026-05-19 08:41:26 -07:00
pakrym-oai	ef24ef127f	[codex] Allow empty turn/start requests (#23409 ) ## Why `turn/start` already accepts an input array on the wire, including an empty array, but core treated empty input as a no-op before the turn could reach the model. App-server clients need to be able to start a real turn even when there is no new user message, for example to let the model proceed from existing thread context. ## What changed - Removed the `run_turn` early return that skipped empty-input turns when there was no pending input. - Kept empty active-turn steering rejected by moving the `steer_input` empty-input check until after core has determined whether there is an active regular turn. - Empty regular turns now refresh `previous_turn_settings` like other regular turns, so follow-up context injection state advances consistently. - Added an app-server v2 integration test proving `turn/start` with `input: []` emits started/completed notifications, sends one Responses request, and does not synthesize an empty user message. ## Validation - `cargo test -p codex-app-server --test all turn_start_with_empty_input_runs_model_request`	2026-05-19 08:39:45 -07:00

6689 Commits