Compare commits

..

572 Commits

Author SHA1 Message Date
Eva Wong
4ddc4c47ad Add Windows metadata adapter target type 2026-05-06 15:06:01 -07:00
Brian Henzelmann
8f5d68f9d2 Document Codex git commit attribution config (#21379)
## Summary
- document that commit attribution for generated git commit messages is
gated by the `codex_git_commit` feature flag
- add an example `config.toml` snippet showing `commit_attribution` with
`[features].codex_git_commit = true`
- update the config schema description so the reference docs explain
that `commit_attribution` only takes effect when the feature is enabled

Fixes #19799.

## Validation
- `cargo run -p codex-core --bin codex-write-config-schema`
- `cargo test -p codex-config`
- `cargo test -p codex-features`
- `cargo fmt --check`
- `git diff --check`

## Notes
- `cargo test -p codex-core config_schema_matches_fixture` currently
fails before reaching the schema test because `core_test_support`
imports `similar` without a linked crate in this checkout. The narrower
package checks above avoid that unrelated test-support build failure.
2026-05-06 16:14:50 -05:00
iceweasel-oai
123e78b97b [codex] Fix Windows sandbox git safe.directory for worktrees (#21409)
## Why

Windows sandboxed commands run as a sandbox user, while workspace
repositories are usually owned by the real user. The sandbox compensates
by injecting a temporary Git `safe.directory` entry into the child
environment.

That injection was still broken for linked worktrees because the helper
followed the `.git` file's `gitdir:` pointer and injected the internal
`.git/worktrees/...` location. Git's dubious-ownership check expects the
worktree root instead, so sandboxed Git commands still failed in
worktree-based Codex checkouts.

## What changed

- Treat any `.git` marker, directory or file, as the worktree root for
`safe.directory` injection.
- Keep the safe-directory logic in
`windows-sandbox-rs/src/sandbox_utils.rs` and have the one-shot elevated
path reuse it.
- Add regression coverage for both normal `.git` directories and
gitfile-based worktrees.

## Validation

- `cargo test -p codex-windows-sandbox sandbox_utils::tests`
- `cargo test -p codex-windows-sandbox` built and ran; the new
`sandbox_utils` tests passed, while two pre-existing legacy sandbox
tests failed locally with `Access is denied`:
`session::tests::legacy_non_tty_cmd_emits_output` and
`spawn_prep::tests::legacy_spawn_env_applies_offline_network_rewrite`.
2026-05-06 14:08:45 -07:00
rhan-oai
fbdbc6b2fe [codex-analytics] emit tool item events from item lifecycle (#17090)
## Why

After the tool-item schemas are in place, analytics needs to emit them
from the app-server item lifecycle rather than requiring bespoke
tracking at each callsite. The reducer should also reuse the shared
thread analytics context introduced below it in the stack so later event
families do not repeat the same reducer joins or missing-state ladder.

## What changed

- Tracks tool-item completion notifications and emits the matching tool
analytics event when a terminal item arrives.
- Derives event-specific payload details for command execution, file
changes, MCP calls, dynamic tools, collaboration tools, web search, and
image generation.
- Denormalizes thread, app-server client, runtime, and subagent
provenance metadata through the shared thread analytics context.
- Adds reducer coverage for item lifecycle emission and subagent
metadata inheritance.

## Duration semantics

`duration_ms` is computed from the app-server item lifecycle timestamps:
`completed_at_ms - started_at_ms`. That makes it the duration of the
lifecycle Codex observed locally, not necessarily the upstream
provider's full execution time.

- Web search usually has a meaningful observed lifecycle because
Responses can send `response.output_item.added` before
`response.output_item.done`; in that case `started_at_ms` comes from the
added event and `completed_at_ms` comes from the done event.
- Image generation can be much less precise. In the current observed
stream, image generation often arrives only as a completed
`response.output_item.done`; when there is no earlier added event, Codex
synthesizes the started item immediately before completion, so
`duration_ms` can be `0` even though upstream image generation took
longer.
- Standalone web search and standalone image generation work is expected
to land after this stack. Those paths may introduce more direct
lifecycle events or timing points, so the current
web-search/image-generation duration semantics should be treated as the
best available item-lifecycle approximation, not the final latency
contract for those tool families.
- `execution_duration_ms` is populated only where the completed item
already carries a native execution duration; otherwise it remains `null`
while `duration_ms` still reflects the local lifecycle interval.

## Currently placeholder / partial fields

Some fields are included in the schema for the intended steady-state
contract, but this PR does not yet populate them from real
approval/review state:

- `review_count`, `guardian_review_count`, and `user_review_count`
currently default to `0`.
- `final_approval_outcome` currently defaults to `unknown`.
- `requested_additional_permissions` and `requested_network_access`
currently default to `false`.

## Verification

- `cargo test -p codex-analytics`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17090).
* #18748
* #18747
* __->__ #17090
* #17089
* #20514
2026-05-06 20:27:41 +00:00
rhan-oai
21295f47e2 [codex-tui] pass thread source for tui threads (#21401)
## Summary
- mark TUI-created thread starts and forks with explicit `thread_source
= user`
- add focused coverage for embedded and remote lifecycle request
builders

## Why
Thread analytics now consume an explicit thread-level source
classification instead of inferring it from `session_source`. The TUI
still omitted that field, so TUI-created interactive threads would
continue to land as `null` even after the new analytics plumbing
shipped.

## Validation
- `cargo test -p codex-tui app_server_session --lib`
2026-05-06 13:18:41 -07:00
pakrym-oai
b9c50a53d7 [codex] Split tool handlers into separate files (#21395)
## Why

Several tool handler modules still bundled multiple `ToolHandler`
implementations in one file. That made the handler directory harder to
navigate and made otherwise local handler edits land in large shared
modules.

## What

- Split grouped tool handlers into one handler file each for agent jobs,
goals, MCP resources, shell tools, and unified exec.
- Kept shared parsing, payload, and runtime helpers in the existing
parent modules, with re-exports preserving the existing handler import
paths.
- Updated the shell handler tests to construct `ShellCommandHandler`
through the existing `ShellCommandBackendConfig` conversion now that the
backend detail lives with the shell-command handler.

## Validation

- `cargo check -p codex-core`
- `cargo clippy -p codex-core --lib -- -D warnings`
- `git diff --check -- codex-rs/core/src/tools/handlers`

Targeted `codex-core` handler tests did not run locally because
`core_test_support` currently fails to compile before reaching these
tests due to an unresolved `similar` import.
2026-05-06 13:12:24 -07:00
canvrno-oai
d5f0b6d63a [codex] Dedupe fallback model metadata warnings (#21090)
Fixes #21070.

This is a small cleanup around model metadata handling for
gateway/provider model names. It follows the report and proposed
direction from @dkbush by keeping the fallback metadata warning useful
without repeating it every turn, and by tightening the existing
provider-prefix lookup path.

- Track fallback metadata warning slugs in session state so each
unresolved model warns once per session.
- Keep warning emission outside the session-state lock and preserve the
existing warning text.
- Allow one-segment provider prefixes with hyphenated provider IDs,
while preserving the multi-segment rejection behavior.
- Add focused coverage for warning dedupe and hyphenated provider-prefix
metadata matching.

Testing:

- Ran `just fmt`.
- Ran `git diff --check`.
- Added tests for the new warning dedupe and provider-prefix lookup
behavior.
2026-05-06 13:11:44 -07:00
starr-openai
63a27ad6c6 Avoid hard-coded environment context shell (#21390)
## Summary
- make resolved turn environment shell metadata optional instead of
hard-coding bash
- render environment context shells from explicit environment metadata
when present, falling back to the existing session shell
- update environment context tests for inherited PowerShell-style
fallback and explicit per-environment shell override

## Testing
- Not run (not requested; formatted with `just fmt`).

Co-authored-by: Codex <noreply@openai.com>
2026-05-06 19:54:26 +00:00
Christoph Paasch (OpenAI)
f9063045e1 Avoid noisy OTEL diagnostics in codex exec (#21107)
`codex exec` should not print OpenTelemetry exporter self-diagnostics to
stderr by default. Suppress the SDK and OTLP exporter targets unless
callers
explicitly opt in with `RUST_LOG`.

Also stop defaulting the trace exporter to the log exporter, since OTLP
HTTP
endpoints are signal-specific and a logs endpoint is not valid for
spans.

Co-authored-by: Codex <noreply@openai.com>

Co-authored-by: Codex <noreply@openai.com>
2026-05-06 12:49:13 -07:00
Clark DuVall
346070a424 Route opted-in MCP elicitations through Guardian (#19431)
# Motivation

Browser Use origin-access prompts are MCP elicitations, not direct
tool-call approval prompts, so they were bypassing the Guardian approval
path. We need a generic opt-in that lets eligible MCP elicitations use
Guardian when the current turn already routes approvals there.

# Description

Add a generic elicitation reviewer hook in codex-mcp and wire codex-core
to pass a Guardian reviewer callback when creating the MCP connection
manager. The reviewer validates explicit mcp_tool_call opt-in metadata,
builds a Guardian MCP tool-call review request from
server/tool/connector metadata and tool params, and maps Guardian
approval, denial, timeout, and cancellation decisions back to MCP
elicitation responses.

The new option to trigger this in the `_meta` object is:
```
"codex_request_type": "approval_request",
```

# Testing

- RUST_MIN_STACK=8388608 NEXTEST_STATUS_LEVEL=leak cargo nextest run
--no-fail-fast --cargo-profile ci-test --test-threads 2
- cargo clippy --tests -- -D warnings
- cargo fmt -- --config imports_granularity=Item --check
- cargo shear
- pnpm run format
- python3 .github/scripts/verify_cargo_workspace_manifests.py
- python3 .github/scripts/verify_tui_core_boundary.py
- python3 .github/scripts/verify_bazel_clippy_lints.py
- git diff --check
2026-05-06 19:42:45 +00:00
Felipe Coury
6b7d6cafa0 fix(tui): persist ctrl-c draft via app event (#21397)
## Why

The main branch started failing after #21351 merged because the merge
commit kept calling `AppCommand::add_to_history` from
`BottomPane::clear_composer_for_ctrl_c`, but main had already removed
that helper as part of the history persistence refactor. The PR head
passed because it was based on an older main commit where the helper
still existed.

This restores the Ctrl+C draft-stashing behavior using the current
app-event path instead of the removed command helper.

## What Changed

- Store the active `ThreadId` in `BottomPane` when history metadata is
provided.
- Emit `AppEvent::AppendMessageHistoryEntry` for Ctrl+C-cleared drafts.
- Update the slash-clear regression test to assert the current history
event shape.

## How to Test

Targeted tests:
- `cargo test -p codex-tui
slash_clear_after_ctrl_c_keeps_stashed_draft_recallable`

Broader local checks:
- `just fix -p codex-tui`
- `just argument-comment-lint -p codex-tui`
- `git diff --check origin/main...HEAD`
- `cargo test -p codex-tui` reached completion; the fixed test passed,
and the only local failures were
`status::tests::status_permissions_full_disk_managed_*`, blocked by this
machine config rejecting `DangerFullAccess` via
`/etc/codex/requirements.toml`.
2026-05-06 19:03:11 +00:00
iceweasel-oai
f32c496144 [codex] Handle git pagination flags by position (#21381)
## Why

This is a follow-up to the Windows Git safe-command bypass fix for
BUGB-15601. Git's global `--paginate` / `-p` flags can route output
through a configured pager, so they should not be auto-approved as safe
before the subcommand. At the same time, `-p` after read-only
subcommands like `log`, `diff`, and `show` is the common patch-output
flag, so treating every `-p` as unsafe would make ordinary read-only
inspection commands prompt unnecessarily.

## What Changed

- Split Git option safety matching into explicit global-option and
subcommand-option lists.
- Treat global `git --paginate ...` and `git -p ...` as unsafe.
- Keep post-subcommand patch usage such as `git log -p`, `git diff -p`,
and `git show -p HEAD` safe.
- Keep the pagination coverage with the shared Git safe-command
implementation rather than the Windows wrapper tests.
- Remove the stale `git_global_option_requires_prompt` helper now that
safe-command Git option matching owns the prompt-required lists.

## Testing

- `cargo test -p codex-shell-command`
2026-05-06 11:53:26 -07:00
pakrym-oai
712305be47 Remove core MCP list tools op (#21281)
## Why

The core `Op::ListMcpTools` request path is no longer needed. Keeping it
around left a dead request/response surface alongside the app-server MCP
inventory APIs that own current server status listing.

## What Changed

- Removed `Op::ListMcpTools`, `EventMsg::McpListToolsResponse`, and the
core handler that built the MCP snapshot response.
- Removed the now-unused `codex-mcp` snapshot wrapper/export and passive
event handling arms in rollout and MCP-server consumers.
- Updated tests that used the old op as a synchronization hook to wait
on existing startup/skills events, and deleted the plugin test that only
exercised the removed listing op.

## Validation

- `cargo test -p codex-protocol`
- `cargo test -p codex-mcp`
- `cargo test -p codex-rollout -p codex-rollout-trace -p
codex-mcp-server`
- `cargo test -p codex-core --test all
pending_input::queued_inter_agent_mail`
- `cargo test -p codex-core --test all
rmcp_client::stdio_mcp_tool_call_includes_sandbox_state_meta`
- `cargo test -p codex-core --test all
rmcp_client::stdio_image_responses`
- `just fix -p codex-core -p codex-protocol -p codex-mcp -p
codex-rollout -p codex-rollout-trace -p codex-mcp-server`
2026-05-06 11:20:34 -07:00
Michael Bolin
123ec8b035 vendor: update bubblewrap to 0.11.2 (#21389)
## Why

`codex-rs/vendor/bubblewrap` had fallen behind upstream, and upstream
`v0.11.2` is the current Bubblewrap release. The release is a security
update for `CVE-2026-41163`, affecting setuid Bubblewrap builds, and
deprecates setuid support in favor of the default non-setuid build mode.

## What changed

- Refreshed the vendored Bubblewrap sources under
`codex-rs/vendor/bubblewrap` to upstream `v0.11.2`.
- Brought in the upstream `-Dsupport_setuid` build option, which
defaults setuid support off.
- Updated vendored release notes and documentation files included with
Bubblewrap.

## Verification

Not run locally; this PR only refreshes the vendored upstream Bubblewrap
source snapshot.

Upstream release:
https://github.com/containers/bubblewrap/releases/tag/v0.11.2
2026-05-06 18:10:30 +00:00
Felipe Coury
e97610cf3b fix(tui): keep Ctrl-C stashed drafts after /clear (#21351)
## Why

When a user stashes a draft with Ctrl+C, then runs `/clear`, the fresh
chat session loses the in-memory composer history that held the stashed
draft. Pressing Up after `/clear` can then recall an older submitted
prompt instead of the draft the user explicitly saved for later.

## What Changed

- Record Ctrl+C-cleared composer text through the existing message
history path, so it survives the fresh session created by `/clear`.
- Keep `/clear` itself out of local slash-command recall so it does not
sit ahead of the stashed draft.
- Add regression coverage for the full flow: submit a prompt, stash a
later draft with Ctrl+C, run `/clear`, then recall the stashed draft
before the older prompt.

## How to Test

1. Start Codex with `just c`.
2. Submit a short prompt such as `ok` and wait for the turn to complete.
3. Type a new draft, press Ctrl+C, then run `/clear`.
4. Press Up and confirm the stashed draft is restored.
5. Press Up again and confirm the older submitted prompt is still
reachable after the stashed draft.

Targeted tests:

- `cargo test -p codex-tui
slash_clear_after_ctrl_c_keeps_stashed_draft_recallable`

Manual verification:

- Reproduced the issue in tmux with `RUST_LOG=trace just c -c
log_dir=...`: before the fix, Up after `/clear` recalled the older
submitted prompt.
- Re-tested the same tmux flow after the fix: Up after `/clear` restored
the Ctrl+C-stashed draft.
2026-05-06 14:46:18 -03:00
mifan-oai
f2f5d6f6c7 [codex] Coordinate OpenAI docs sample with API key setup (#21263)
## Summary
- Add the same API key setup coordination guidance to the embedded
OpenAI Docs sample skill in `codex-rs/skills`.
- Keep the skill description/frontmatter unchanged; the coordination
lives only in the body.
- Preserve direct OpenAI Docs routing for docs-only questions,
citations, model/API guidance, conceptual explanations, and non-building
examples.

## Why
The Codex repo carries its own OpenAI Docs skill variant under
`codex-rs/skills/src/assets/samples`. This keeps that embedded sample
aligned with the other OpenAI Docs variants patched in the related PRs.

## Validation
- `cargo test -p codex-skills`
- `git diff --check`
2026-05-06 13:46:15 -04:00
jif-oai
ab43db44a2 feat: move auto vaccum (#21378)
The initial vaccum is not needed anymore. We can consider all the DBs
have been reclaimed by now
2026-05-06 19:32:28 +02:00
jif-oai
0e821b380a rollout: coalesce thread updated_at touches (#21367)
## Why

Metadata-irrelevant rollout events currently refresh
`threads.updated_at` on every flush. That keeps thread recency accurate,
but it also turns high-frequency agent output into unnecessary SQLite
writes. Recency only needs to advance periodically during an active
session, while the final suppressed touch still needs to be persisted
before shutdown.

## What changed

- coalesce touch-only `updated_at` writes in the rollout writer, with a
short production interval between persisted touches
- retain the latest suppressed touch and flush it during shutdown so the
thread is not left stale
- extend rollout recorder coverage for coalesced touches, delayed
refresh, shutdown flushing, and the existing missing-thread fallback
path

## Verification

- Added regression coverage in `rollout/src/recorder_tests.rs` for
coalescing and shutdown flushing behavior.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-06 19:32:24 +02:00
pakrym-oai
2070d5bfd3 [codex] Add response.processed websocket request (#21284)
## Summary

- Add a `response.processed` websocket request payload and sender for
Responses API websockets.
- Send `response.processed` from `try_run_sampling_request` after a
response completes, local turn processing succeeds, and the
session-owned feature flag is enabled.
- Add websocket coverage for both enabled and disabled feature-flag
behavior.

## Validation

- `just fmt`
- `cargo test -p codex-core response_processed`
- `cargo test -p codex-api responses_websocket`
- `cargo test -p codex-features
responses_websocket_response_processed_is_under_development`
- `git diff --check`
- `just fix -p codex-api -p codex-core -p codex-features`
- `git diff --check origin/main...HEAD`
2026-05-06 09:58:46 -07:00
pakrym-oai
2004173cd7 Move message history out of core (#21278)
## Why

Message history was implemented inside `codex-core` and surfaced through
core protocol ops and `SessionConfiguredEvent` fields even though the
current consumer is TUI-local prompt recall. That made core own UI
history persistence and exposed `history_log_id` / `history_entry_count`
through surfaces that app-server and other clients do not need.

This change moves message history persistence out of core and keeps the
recall plumbing local to the TUI.

## What changed

- Added a new `codex-message-history` crate for appending, looking up,
trimming, and reading metadata from `history.jsonl`.
- Removed core protocol history ops/events: `AddToHistory`,
`GetHistoryEntryRequest`, and `GetHistoryEntryResponse`.
- Removed `history_log_id` and `history_entry_count` from
`SessionConfiguredEvent` and updated exec/MCP/test fixtures accordingly.
- Updated the TUI to dispatch local app events for message-history
append/lookup and keep its persistent-history metadata in TUI session
state.

## Validation

- `cargo test -p codex-message-history -p codex-protocol`
- `cargo test -p codex-exec event_processor_with_json_output`
- `cargo test -p codex-mcp-server outgoing_message`
- `cargo test -p codex-tui`
- `just fix -p codex-message-history -p codex-protocol -p codex-core -p
codex-tui -p codex-exec -p codex-mcp-server`
2026-05-06 08:35:42 -07:00
Ahmed Ibrahim
be1d3cff93 2- Use string service tiers in session protocol (#20971)
## Summary
- break service tier session/op/app-server protocol fields from the
closed enum to string tier ids
- send the service tier string directly through model requests, prewarm,
compaction, memories, and TUI/app-server turn starts
- regenerate app-server protocol JSON/TypeScript schemas, removing the
standalone ServiceTier TS enum

## Verification
- just fmt
- cargo check -p codex-core -p codex-app-server -p codex-tui
- just write-app-server-schema

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-06 18:00:21 +03:00
jif-oai
ebd9ec05b4 [codex] fix builtin MCP Windows path test (#21350)
## Summary
- make the builtin MCP config test derive the expected `--codex-home`
argument from `AbsolutePathBuf`

## Why
`AbsolutePathBuf::try_from("/tmp/codex-home")` is rendered as
`D:\\tmp\\codex-home` on Windows, but the test asserted the Unix literal
`"/tmp/codex-home"`. That made the Windows Bazel job fail even though
the production code was behaving correctly.

## Impact
This keeps the test cross-platform while preserving the same transport
assertion on Unix and Windows.

## Validation
- `cargo test -p codex-builtin-mcps`

Co-authored-by: Codex <noreply@openai.com>
2026-05-06 16:06:21 +02:00
jif-oai
5ecff05196 feat(app-server): move v2 sessionId onto Thread (#21336)
## Why

`session_id` and `thread_id` are separate identities after #20437, but
app-server only surfaced `sessionId` on the `thread/start`,
`thread/resume`, and `thread/fork` response envelopes. Other
thread-bearing surfaces such as `thread/list`, `thread/read`,
`thread/started`, `thread/rollback`, `thread/metadata/update`, and
`thread/unarchive` either lacked the grouping key or forced clients to
special-case those three responses.

Making `sessionId` part of the reusable `Thread` payload gives every v2
API surface one place to expose session-tree identity.

## Mental model
  1. thread.sessionId lives on `Thread`
2. It is a view/runtime identity for the current live session tree, not
durable stored lineage metadata
3. When app-server has a live loaded thread, it copies the real value
from core’s session_configured.session_id
4. When it only has stored/unloaded data, it falls back to
thread.sessionId = thread.id

## What changed

- Added `sessionId` to the v2
[`Thread`](8fc9e9b4cf/codex-rs/app-server-protocol/src/protocol/v2/thread_data.rs (L105-L109)).
- Removed the duplicate top-level `sessionId` fields from
`thread/start`, `thread/resume`, and `thread/fork`; clients should now
read `response.thread.sessionId`.
- Populated `thread.sessionId` when building live thread responses,
replaying loaded threads, and returning stored-thread summaries so the
field is present across start, resume, fork, list, read, rollback,
metadata-update, unarchive, and `thread/started` paths. See
[`load_thread_from_resume_source_or_send_internal`](8fc9e9b4cf/codex-rs/app-server/src/request_processors/thread_processor.rs (L2824-L2918))
and
[`thread_from_stored_thread`](8fc9e9b4cf/codex-rs/app-server/src/request_processors/thread_processor.rs (L3671-L3719)).
- Preserved the stored-thread fallback: if a thread has not been loaded
into a live session tree yet, `thread.sessionId` falls back to
`thread.id`; once the thread is live again, the field reports the active
session tree root.
- Regenerated the JSON/TypeScript schemas and updated the app-server
README examples to show
[`thread.sessionId`](8fc9e9b4cf/codex-rs/app-server/README.md (L306-L310))
on the thread object.
2026-05-06 15:23:25 +02:00
jif-oai
ca257b6ce5 chore: spawn MCP for memories (#21214)
Co-authored-by: Codex <noreply@openai.com>
2026-05-06 15:05:54 +02:00
jif-oai
8f3bb355f4 Move installation ID resolution out of core startup (#21182)
## Summary

- resolve or inject the installation ID before core startup and pass it
through `ThreadManager`, `CodexSpawnArgs`, and `Session` as a plain
`String`
- keep child sessions on the parent installation ID instead of
rediscovering it inside core
- propagate installation ID startup failures in `mcp-server` instead of
panicking

## Why

Core was still touching the filesystem on the session startup path to
discover `installation_id`. This moves that work to the outer host
boundary so core no longer depends on `codex_home` reads during session
construction.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-06 10:48:54 +00:00
Ahmed Ibrahim
5d6f23a27b Propagate cache key and service tiers in compact (#21249)
## Why

`/responses/compact` should preserve the request-affinity fields that
apply to the active auth mode. ChatGPT-auth compact requests need the
effective `service_tier`, and compact requests for every auth mode need
the stable `prompt_cache_key`, so compaction does not quietly lose
routing or cache behavior that normal sampling already has.

This follows the request-parity direction from #20719, but keeps the net
change focused on the compact payload fields needed here.

## What changed

- Add `service_tier` and `prompt_cache_key` to the compact endpoint
input payload.
- Build the remote compact payload from the existing responses request
builder output so `Fast` still maps to `priority` when compact sends a
service tier.
- Pass the turn service tier into remote compaction, but only include it
in compact payloads for ChatGPT-backed auth.
- Keep `prompt_cache_key` on compact payloads for all auth modes.
- Add request-body diff snapshot coverage in
`core/tests/suite/compact_remote.rs` for:
- API-key auth reusing `prompt_cache_key` while omitting `service_tier`
even when `Fast` is configured.
  - ChatGPT auth reusing both `service_tier` and `prompt_cache_key`.
- Drive the snapshot coverage through five varied turns: plain text,
multi-part text, tool-call continuation, image+text input, local-shell
continuation, and final-turn reasoning output.

## Verification

- Added insta snapshots for compact request-body parity against the last
normal `/responses` request after five varied turns.
- Not run locally per repo guidance; relying on GitHub CI for test
execution.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-06 13:38:43 +03:00
jif-oai
cc84e6bc6d Revert "feat: support template interpolation in multi-agent usage hints" (#21337)
Reverts openai/codex#20973
2026-05-06 12:33:37 +02:00
jif-oai
06e5dfa4dd feat: return session ID from thread/fork (#21332)
## Why

`thread/start` and `thread/resume` already return `sessionId`, but
`thread/fork` only returned the new thread. That left clients to infer
the forked thread's session identity from `thread.id`, which kept the
new `session_id` / `thread_id` split implicit at one lifecycle boundary.
Follow-up to #20437.

## What changed

- Add `sessionId` to `ThreadForkResponse`.
- Populate it from the forked session configuration.
- Regenerate the v2 JSON/TypeScript schema fixtures and update the
app-server docs/example.
- Extend the fork integration test to assert the returned `sessionId`.

## Verification

- Added coverage in `thread_fork_creates_new_thread_and_emits_started`
for the new response field.
2026-05-06 12:04:27 +02:00
jif-oai
fe24a180ab feat: include thread ID in MCP turn metadata (#21329)
## Why

MCP tool calls already include `session_id` in `x-codex-turn-metadata`,
but descendant threads intentionally share that value with the root
thread. Consumers that need to correlate work at the concrete thread
level also need the current `thread_id`.

## What changed

- add `thread_id` to `x-codex-turn-metadata` while preserving
`session_id` as the shared session identity
- thread the two identities separately through normal turns and spawned
review threads
- add regression coverage for resumed sessions, reserved metadata
fields, and deferred MCP tool calls

## Verification

- added focused coverage in `core/src/session/tests.rs`,
`core/src/turn_metadata_tests.rs`, and `core/tests/suite/search_tool.rs`
2026-05-06 11:36:15 +02:00
jif-oai
b5e965e1d7 test: isolate app-server-client in-process test state (#21328)
## Why

The in-process `app-server-client` tests were still building their
configs from the ambient `codex_home` and letting the embedded app
server create its own state DB when `state_db` was absent. That matters
because in-process startup falls back to
`init_state_db_from_config(...)` in that case, so tests can otherwise
share persisted state instead of getting isolated fixtures:
[`app-server/src/in_process.rs`](a98623511b/codex-rs/app-server/src/in_process.rs (L368-L373)).

## What changed

- Give each in-process test client its own temporary `codex_home`.
- Initialize the matching state DB from that per-client config and pass
it into the client explicitly.
- Keep the temp directory alive for the lifetime of the test client
through a small `TestClient` wrapper.
- Add `tempfile` as a dev dependency for the new harness.

The updated setup lives in
[`app-server-client/src/lib.rs`](35c1133d45/codex-rs/app-server-client/src/lib.rs (L982-L1055)).

## Testing

- Existing `codex-app-server-client` tests continue to exercise the
updated in-process client path through the isolated helper.
2026-05-06 09:21:22 +00:00
jif-oai
a98623511b feat: add session_id (#20437)
## Summary

Related to
https://openai.slack.com/archives/C095U48JNL9/p1777537279707449
TLDR:
We update the meaning of session ids and thread ids:
* thread_id stays as now
* session_id become a shared id between every thread under a /root
thread (i.e. every sub-agent share the same session id)

This PR introduces an explicit `SessionId` and threads it through the
protocol/client boundary so `session_id` and `thread_id` can diverge
when they need to, while preserving compatibility for older serialized
`session_configured` events.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-06 10:48:37 +02:00
Matthew Zeng
f9a907aebe Support Codex Apps auth elicitations (#19193)
## Summary

- request URL-mode MCP elicitations when Codex Apps tool calls fail with
connector auth metadata
- route Codex Apps auth URL elicitations into the TUI app-link flow

## Test plan

- `just fmt`
- `cargo test -p codex-core mcp_tool_call::tests`
- `cargo test -p codex-mcp`
- `cargo test -p codex-tui bottom_pane::app_link_view::tests`
- `just fix -p codex-core`
- `just fix -p codex-mcp`
- `just fix -p codex-tui`

Also attempted broader local runs:

- `cargo test -p codex-core` fails in unrelated
config/request-permission/proxy-sensitive tests under the current Codex
Desktop environment.
- `cargo test -p codex-tui` fails in unrelated status
snapshots/trust-default tests because the ambient environment renders
workspace-write/network permission defaults.
2026-05-06 07:18:00 +00:00
Michael Bolin
22326e263c release: bundle bwrap with Linux codex DotSlash artifact (#21312)
## Why

#21255 changed the Linux sandbox fallback so Codex can use a bundled
`codex-resources/bwrap` executable when no suitable system `bwrap` is
available. That lookup is relative to the native Codex executable
returned by
`std::env::current_exe()`, as implemented in
[`bundled_bwrap.rs`](9766d3d51c/codex-rs/linux-sandbox/src/bundled_bwrap.rs (L83-L93)).

The release already publishes a separate `bwrap` DotSlash output, but
the Linux `codex` DotSlash output still pointed at a single-binary
`.zst` payload. Running the `codex` DotSlash manifest only materializes
the native `codex` executable; it does not also create sibling files
from the separate `bwrap` manifest. The fallback path therefore needs
the Linux `codex` DotSlash artifact itself to include the real `bwrap`
executable at `codex-resources/bwrap`.

## What changed

- stage a Linux primary `codex-<target>-bundle.tar.zst` release artifact
containing `codex` and `codex-resources/bwrap`
- point the Linux `codex` DotSlash outputs at that bundle tarball
- leave the standalone `bwrap` DotSlash output in place for consumers
that want to fetch `bwrap` directly

## Verification

- `jq . .github/dotslash-config.json`
- Ruby YAML parse of `.github/workflows/rust-release.yml`
2026-05-05 23:33:13 -07:00
viyatb-oai
9766d3d51c fix(bwrap): emit libcap after standalone archive (#21285)
## Why

#21255 added the standalone `codex-bwrap` binary. In the Cargo build,
[`pkg_config::probe("libcap")`](a736cb55a2/codex-rs/bwrap/build.rs (L37-L39))
emits `-lcap` before
[`cc::Build::compile("standalone_bwrap")`](a736cb55a2/codex-rs/bwrap/build.rs (L50-L67))
adds the static bwrap archive. The Linux musl link then sees `-lcap
-lstandalone_bwrap`; because static archives are resolved left-to-right,
`cap_from_name` is still undefined once `standalone_bwrap` introduces
that reference.

The musl setup already builds `libcap.a` and exposes it through
[`libcap.pc`](a736cb55a2/.github/scripts/install-musl-build-tools.sh (L78-L88)),
so the failure is link ordering rather than a missing dependency.

## What changed

- probe `libcap` with `cargo_metadata(false)` so `pkg-config` does not
emit its link flags early
- emit the discovered `libcap` search paths and libraries after
`standalone_bwrap` is compiled, preserving the needed static-link order

## Verification

- `cargo test -p codex-bwrap`
- `cargo clippy -p codex-bwrap --all-targets`

The affected Linux musl release link is exercised by CI, which is the
path this fix targets.
2026-05-05 22:22:01 -07:00
Matthew Zeng
41505bcea2 [mcp] Return Accept early per feedback. (#21277)
- [x] Return Accept early when auto_deny is enabled per feedback.
2026-05-05 21:23:42 -07:00
aaronl-openai
9f06d171e2 Preserve session MCP config on refresh (#21055)
# Overview
MCP refreshes were rebuilding active threads from fresh disk-backed
config only, which dropped thread-start session overlays such as
app-injected MCP servers. This keeps refreshes current with disk config
while preserving the thread-local config that only the active thread
knows about.

# Changes
- Rebuild refreshed config per active thread using that thread's current
`cwd`, rather than fanning out one app-server config to every thread.
- Preserve each thread's `SessionFlags` layer while replacing reloadable
config layers with freshly loaded config, then derive the MCP refresh
payload from the rebuilt result.
- Move MCP refresh orchestration into app-server so manual refreshes
fail loudly while background refreshes remain best-effort, and route
plugin-triggered refreshes through the same per-thread reload path.
- Add regression coverage for session overlays, fresh project config,
plugin-derived MCP config, current requirements, and strict vs
best-effort refresh behavior.

# Verification
- Passed focused Rust coverage for the thread-config rebuild behavior
and deferred MCP refresh flow, plus `cargo test -p codex-app-server
--lib`.
- Verified end to end in the Codex dev app against the locally built
CLI: registered an MCP via thread config, verified that it could be used
successfully before refresh, manually triggered MCP refresh, and
verified that it continued to be available afterward.
2026-05-05 21:09:28 -07:00
Andrei Eternal
8ef31894dc app-server: align dynamic tool identifiers with Responses API (#20724)
## Why

Codex currently accepts dynamic tool names and namespaces that the
upstream Responses function-tool path does not actually support. In
practice, that means app-server can register a dynamic tool successfully
and only discover later that the LLM-facing tool contract will reject or
mishandle it.

This PR tightens the app-server-side dynamic tool contract to match the
Responses API before we stack dynamic tool hook support on top of it.

## What changed

- validate dynamic tool `name` against the Responses function-tool
identifier contract: `^[a-zA-Z0-9_-]+$`, length `1..128`
- validate dynamic tool `namespace` the same way, with the Responses
namespace length limit `1..64`
- reject namespaces that collide with the always-reserved Responses
runtime namespaces such as `functions`, `multi_tool_use`, `file_search`,
`web`, `browser`, `image_gen`, `computer`, `container`, `terminal`,
`python`, `python_user_visible`, `api_tool`, `tool_search`, and
`submodel_delegator`
- escape invalid identifiers in error messages so control characters do
not spill raw into logs or client-visible error text
- document the tightened dynamic tool identifier contract in
`codex-rs/app-server/README.md`
- add both unit coverage for the validator and an app-server integration
test that rejects a `thread/start` request with Responses-incompatible
dynamic tool identifiers

## Verification

- `cargo test -p codex-app-server validate_dynamic_tools_`
- `cargo test -p codex-app-server --test all
thread_start_rejects_dynamic_tools_not_supported_by_responses`
2026-05-05 21:05:00 -07:00
xl-openai
5119680f85 feat: Add plugin share access controls (#21124)
Extends `plugin/share/save` to accept optional discoverability and
shareTargets while uploading plugin contents, and adds
`plugin/share/updateTargets` for share-only target updates without
re-uploading.
2026-05-05 20:14:18 -07:00
rhan-oai
b3d4f1a9f0 [codex-analytics] rework thread_source for thread analytics (#20949)
## Summary
- make `thread_source` an explicit optional thread-level field on
`thread/start`, `thread/fork`, and returned thread payloads
- persist `thread_source` in rollout/session metadata so resumed live
threads retain the original value
- replace the old best-effort `session_source` -> `thread_source`
mapping with an explicit caller-supplied analytics classification

## Why
Before this change, analytics `thread_source` was populated by a
best-effort mapping from `session_source`. `session_source` describes
the runtime/client surface, not the actual thread-level origin, so that
projection was not accurate enough to distinguish cases such as `user`,
`subagent`, `memory_consolidation`, and future thread origins reliably.

Making `thread_source` explicit keeps one thread-level analytics field
while letting callers provide the real classification directly instead
of recovering it indirectly from `session_source`.

## Impact
For new analytics events, `thread_source` now reflects the explicit
thread-level classification supplied by the caller rather than an
inferred value derived from `session_source`. Existing protocol fields
remain optional; callers that omit `threadSource` now produce `null`
instead of a best-effort inferred value.

## Validation
- `just write-app-server-schema`
- `cargo test -p codex-analytics -p codex-core -p
codex-app-server-protocol --no-run`
- `cargo test -p codex-app-server-protocol
generated_ts_optional_nullable_fields_only_in_params`
- `cargo test -p codex-analytics
thread_initialized_event_serializes_expected_shape`
- `cargo test -p codex-core
resume_stopped_thread_from_rollout_preserves_thread_source`
2026-05-06 02:12:31 +00:00
Abdulrahman Alfozan
94db03d5af Expose plugin manifest keywords in app server (#21271)
## Summary
- Add plugin manifest keywords to core plugin marketplace/detail models
- Expose keywords on app-server v2 PluginSummary and generated
schema/types
- Populate keywords in plugin/list and plugin/read responses for local
plugins

Depends on https://github.com/openai/openai/pull/891087

## Validation
- just fmt
- just write-app-server-schema
- cargo test -p codex-app-server-protocol
- cargo test -p codex-core-plugins
- cargo test -p codex-app-server
plugin_list_keeps_valid_marketplaces_when_another_marketplace_fails_to_load
- cargo test -p codex-app-server
plugin_read_returns_plugin_details_with_bundle_contents
2026-05-06 02:09:05 +00:00
pakrym-oai
136e442e95 [codex] Remove legacy ListSkills op (#21282)
## Why

`skills/list` is already exposed through app-server v2 and covered by
the app-server test suite. Keeping the separate core `Op::ListSkills`
path leaves a duplicate legacy protocol surface that no longer needs to
be maintained.

## What Changed

- Removed `Op::ListSkills` and `EventMsg::ListSkillsResponse` from the
core protocol.
- Deleted the corresponding core session handler and stale core
integration tests.
- Removed rollout/MCP ignore branches and protocol v1 docs references
for the deleted event/op.
- Left app-server `skills/list` and its existing coverage intact.

## Validation

- `cargo test -p codex-protocol`
- `cargo test -p codex-core --test all suite::skills`
- `cargo check -p codex-mcp-server -p codex-rollout -p
codex-rollout-trace`
- `just fix -p codex-core`
2026-05-05 18:58:18 -07:00
pakrym-oai
024118625e [codex] Remove unused ListModels op (#21276)
## Why

The core protocol still exposed a `ListModels` submission op even though
no client sends it and the core submission loop treated it as an ignored
unknown op. Keeping the dead variant made the protocol surface look
supported while the active model listing API is the app-server
`model/list` JSON-RPC request.

## What Changed

- Removed the unused `Op::ListModels` variant from `codex-rs/protocol`.
- Removed its `Op::kind()` mapping.

The existing app-server `model/list` endpoint is unchanged.

## Verification

- `cargo test -p codex-protocol`
2026-05-06 01:57:17 +00:00
Michael Bolin
a736cb55a2 release/npm: bundle standalone bwrap on Linux (#21257) 2026-05-05 18:21:52 -07:00
iceweasel-oai
db22c91e61 Share Git safe-command logic on Windows (#21275)
## Why

BUGB-15601 showed that the Windows safe-command path had drifted from
the generic Git classifier. The Windows-specific Git parser could
classify a PowerShell-wrapped `git` command as safe as soon as it found
a safelisted subcommand, without applying the generic checks for unsafe
subcommand options such as `--output`, `--ext-diff`, `--textconv`,
`--paginate`, or `cat-file --filters`.

The generic classifier already models the Git command boundary and the
read-only argument checks more carefully, so Windows should reuse that
logic instead of maintaining a smaller parallel parser.

## What Changed

- Extracted the existing generic Git classification logic into
`is_safe_git_command`.
- Updated `windows_safe_commands.rs` to call that shared helper for
parsed PowerShell `git` commands.
- Removed the Windows-only Git subcommand safelist, including the
`cat-file` allowance that was part of the reported bypass.
- Added a Windows regression test that keeps PowerShell-wrapped Git
commands with side-effecting options classified unsafe.
- Made the full-path PowerShell test discover the installed PowerShell
executable instead of depending on one hard-coded `pwsh.exe` path.

## Verification

- `cargo test -p codex-shell-command
rejects_git_subcommand_options_with_side_effects`
- `cargo test -p codex-shell-command
git_global_override_flags_are_not_safe`
- `cargo test -p codex-shell-command
windows_powershell_full_path_is_safe -- --nocapture`

Co-authored-by: Codex <codex@openai.com>
2026-05-05 17:49:42 -07:00
mchen-oai
794c240f25 Add model and reasoning effort to MCP turn metadata (#21219)
## Why
- Similar change as https://github.com/openai/codex/pull/19473.
- Without change: MCP tool calls receive
`_meta["x-codex-turn-metadata"]` with `session_id`, `turn_id`, and
`turn_started_at_unix_ms`.
- Issue: MCP servers may want the model and reasoning effort to better
understand tool-call behavior and latency relative to turn start.

## What Changed
- With change: MCP turn metadata now includes `model` and
`reasoning_effort`, propagated in `_meta["x-codex-turn-metadata"]`.
- Normal `/responses` turn metadata headers are unchanged.

## Verification
- `codex-rs/core/src/mcp_tool_call_tests.rs`
- `codex-rs/core/src/turn_metadata_tests.rs`
- `codex-rs/core/tests/suite/search_tool.rs`
2026-05-05 17:37:48 -07:00
pakrym-oai
2c1a361a2e [codex] Move thread naming to app server (#21260)
## Why

Thread names are app-server metadata now, backed by the thread store and
sqlite state database. Keeping a core `SetThreadName` op plus a rollout
`thread_name_updated` event made rename persistence live in the wrong
layer and required historical replay support for an event that new
app-server flows should not write.

## What changed

- Removed `Op::SetThreadName` and `EventMsg::ThreadNameUpdated` from the
core protocol and deleted the core handler path that appended rename
events to rollouts.
- Updated app-server `thread/name/set` so both loaded and unloaded
threads write through thread-store metadata and app-server emits
`thread/name/updated` notifications.
- Updated local thread-store name metadata updates to write sqlite title
metadata and the legacy thread-name index without appending rollout
events.
- Removed state extraction and rollout handling for the deleted
thread-name event.

## Validation

- `cargo test -p codex-app-server thread_name_updated_broadcasts`
- `cargo test -p codex-app-server
thread_name_set_is_reflected_in_read_list_and_resume`
- `cargo test -p codex-thread-store
update_thread_metadata_sets_name_on_active_rollout_and_indexes_name`
- `cargo test -p codex-state`
- `cargo check -p codex-mcp-server -p codex-rollout-trace`
- `just fix -p codex-app-server -p codex-thread-store -p codex-state -p
codex-mcp-server -p codex-rollout-trace`

## Docs

No external documentation update is expected for this internal ownership
change.
2026-05-05 17:16:06 -07:00
Michael Bolin
3ec18a2c0a release: publish standalone bwrap artifacts (#21256)
**Summary**
- Build Linux `bwrap` before the main release binaries.
- Export the release `bwrap` SHA-256 as `CODEX_BWRAP_SHA256` so the
Codex binary can verify the bundled fallback.
- Sign, stage, and upload `bwrap` alongside the primary Linux release
artifacts.

**Verification**
- YAML parse check for `.github/workflows/rust-release.yml`











---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/21256).
* #21257
* __->__ #21256
2026-05-05 17:15:46 -07:00
Michael Bolin
26f355b67b linux-sandbox: use standalone bundled bwrap (#21255)
**Summary**
- Add `codex-bwrap`, a standalone `bwrap` binary built from the existing
vendored bubblewrap sources.
- Remove the linked vendored bwrap path from `codex-linux-sandbox`;
runtime now prefers system `bwrap` and falls back to bundled
`codex-resources/bwrap`.
- Add bundled SHA-256 verification with missing/all-zero digest as the
dev-mode skip value, then exec the verified file through
`/proc/self/fd`.
- Keep `launcher.rs` focused on choosing and dispatching the preferred
launcher. Bundled lookup, digest verification, and bundled exec now live
in `linux-sandbox/src/bundled_bwrap.rs`; Bazel runfiles lookup lives in
`linux-sandbox/src/bazel_bwrap.rs`; shared argv/fd exec helpers live in
`linux-sandbox/src/exec_util.rs`.
- Teach Bazel tests to surface the Bazel-built `//codex-rs/bwrap:bwrap`
through `CARGO_BIN_EXE_bwrap`; `codex-linux-sandbox` only honors that
fallback in debug Bazel runfiles environments so release/user runtime
lookup stays tied to `codex-resources/bwrap`.
- Allow `codex-exec-server` filesystem helpers to preserve just the
Bazel bwrap/runfiles variables they need in debug Bazel builds, since
those helpers intentionally rebuild a small environment before spawning
`codex-linux-sandbox`.
- Verify the Bazel bwrap target in Linux release CI with a build-only
check. Running `bwrap --version` is too strong for GitHub runners
because bubblewrap still attempts namespace setup there.

**Verification**
- Latest update: `cargo test -p codex-linux-sandbox`
- Latest update: `just fix -p codex-linux-sandbox`
- `cargo check --target x86_64-unknown-linux-gnu -p codex-linux-sandbox`
could not run locally because this macOS machine does not have
`x86_64-linux-gnu-gcc`; GitHub Linux Bazel CI is expected to cover the
Linux-only modules.
- Earlier in this PR: `cargo test -p codex-bwrap`
- Earlier in this PR: `cargo test -p codex-exec-server`
- Earlier in this PR: `cargo check --release -p codex-exec-server`
- Earlier in this PR: `just fix -p codex-linux-sandbox -p
codex-exec-server`
- Earlier in this PR: `bazel test --nobuild
//codex-rs/linux-sandbox:linux-sandbox-all-test
//codex-rs/core:core-all-test
//codex-rs/exec-server:exec-server-file_system-test
//codex-rs/app-server:app-server-all-test` (analysis completed; Bazel
then refuses to run tests under `--nobuild`)
- Earlier in this PR: `bazel build --nobuild //codex-rs/bwrap:bwrap`
- Prior to this update: `just bazel-lock-update`, `just
bazel-lock-check`, and YAML parse check for
`.github/workflows/bazel.yml`


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/21255).
* #21257
* #21256
* __->__ #21255
2026-05-05 17:14:29 -07:00
Channing Conger
03d3403a41 ci: trigger rusty-v8 releases from tags (#21259)
Swap to tag based releasing and allow tags of type `rusty-v8-v*.*.*`
2026-05-05 16:56:43 -07:00
Owen Lin
d7de4dd3ac chore(app-server-protocol): split v2 API definitions into modules (#21251)
## Why

`codex-rs/app-server-protocol/src/protocol/v2.rs` had grown into a
single ~12k-line definition file for the entire app-server v2 API.

This is purely a mechanical refactor to break up the monolithic `v2.rs`
file that contains all app-server API v2 types into more modular files,
grouped by resource (e.g. account, thread, turn, etc.).

`just write-app-server-schema` shows no real changes, so we can be sure
that this is purely an internal organizational change.

## What changed

- Replaced the monolithic `protocol/v2.rs` with a `protocol/v2/` module
tree and a small `mod.rs` that only declares and reexports modules.
- Grouped v2 API definitions by conceptual owner, including `account`,
`apps`, `collaboration_mode`, `command_exec`, `config`, `device_key`,
`experimental_feature`, `feedback`, `fs`, `hook`, `item`, `mcp`,
`model`, `notification`, `permissions`, `plugin`, `process`, `realtime`,
`review`, `thread`, `thread_data`, `turn`, and `windows_sandbox`.
- Moved v2 tests into `protocol/v2/tests.rs` so `mod.rs` stays small.
- Kept shared protocol helpers in `protocol/v2/shared.rs`, including the
enum mirroring macro and common cross-resource types.
- Co-located resource-specific notifications and server-request payloads
with the modules that own those resources.
- Regenerated app-server protocol schema fixtures. The schema diffs are
non-semantic newline-only changes after the refactor.

## Verification

- `cargo check -p codex-app-server-protocol`
- `cargo test -p codex-app-server-protocol`
- `just write-app-server-schema`
2026-05-05 16:46:51 -07:00
Michael Bolin
332b8b2c74 fix build (#21261)
I believe a merge race in https://github.com/openai/codex/pull/20689
broke the build, so this is a quick fix.

`cargo check --tests` passed locally.
2026-05-05 16:02:06 -07:00
Tom
ee02cf26d6 codex: use ThreadStore history for core review forks (#20577)
- fork loaded parent threads from `ThreadStore` history in core agent
control paths
- migrate guardian review fork history to loaded session history instead
of rereading rollout files

## Verification
- `cargo test -p codex-core spawn_agent_fork`
2026-05-05 15:25:19 -07:00
Michael Zeng
d0f9d5eba2 Add cloud executor registration to exec-server (#19575)
## Summary
This PR adds the first `codex-rs` milestone for remote-exec e2e: a local
`codex exec-server` can now register itself with
`codex-cloud-environments` and attach to the returned rendezvous
websocket.

At a high level, `codex exec-server --cloud ...` now:
- loads ChatGPT auth from normal Codex config
- registers an executor with `codex-cloud-environments`
- receives a signed rendezvous websocket URL
- serves the existing exec-server JSON-RPC protocol over that websocket

## What Changed
- Added `--cloud`, `--cloud-base-url`, `--cloud-environment-id`, and
`--cloud-name` to `codex exec-server`
- Added a new `exec-server/src/cloud.rs` module that handles:
  - registration requests
  - auth/header setup
  - bounded auth retry on `401/403`
  - reconnect/backoff after websocket disconnects
- Reused the existing `ConnectionProcessor` / `ExecServerHandler` path
so cloud mode serves the same exec/filesystem RPC surface as local
websocket mode
- Added cloud-specific error variants and minimal docs for the new mode

## Testing
Manual e2e test that fully goes through exec server flow with our codex
cloud agent as orchestrator
2026-05-05 22:01:48 +00:00
Rasmus Rygaard
7e310bc7f3 Inject state DB, agent graph store (#20689)
## Why

We want the agent graph store to be passed down the stack as a real
dependency, the same way we already treat the thread store.

This will let us inject the agent graph store as a real dependency and
support implementations other than the local SQLite-backed one. Right
now most code instantiates a state DB and an agent graph store
just-in-time. Ideally, we would not depend on the state DB directly but
only read through the higher-level interfaces.

This change makes the dependency boundaries explicit and moves state DB
initialization to process bootstrap instead of hiding it inside local
store implementations.

## What changed

- `ThreadManager` now requires a `StateDbHandle` and an
`AgentGraphStore` at construction time instead of treating them as
optional internals.
- The local store constructors no longer lazily initialize SQLite.
Callers now initialize the state DB once per process and use that shared
handle to build:
  - `LocalThreadStore`
  - `LocalAgentGraphStore`
- App bootstraps (`app-server`, `mcp-server`, `prompt_debug`, and the
thread-manager sample) now initialize the state DB up front and inject
the resulting handle down the stack.
- `app-server` now consistently uses its process-scoped state DB handle
instead of reopening SQLite or trying to recover it from loaded threads.
- Device-key storage now reuses the shared state DB handle instead of
maintaining its own lazy opener.
- The thread archive / descendant traversal paths now use the injected
`AgentGraphStore` instead of reaching through local
thread-store-specific state.

## Verification

- `cargo check -p codex-core -p codex-thread-store -p codex-app-server
-p codex-mcp-server -p codex-thread-manager-sample --tests`
- `cargo test -p codex-thread-store`
- `cargo test -p codex-core
thread_manager_accepts_separate_agent_graph_store_and_thread_store --
--nocapture`
- `cargo test -p codex-app-server
thread_archive_archives_spawned_descendants -- --nocapture`
2026-05-05 21:45:29 +00:00
Channing Conger
36460387ec Enable V8 sandboxing for source-built builds (#21146)
## Summary

This is the first PR in the V8 in-process sandboxing rollout.

It adds the build-system and Rust feature plumbing needed to support
sandboxed V8 builds, then enables sandboxing by default for the
source-built Bazel V8 path that we control directly. It deliberately
keeps the published `rusty_v8` artifact workflows on their current
non-sandboxed contract so this PR can land and ship independently before
we change any released artifacts.

## Rollout plan

- [x] **PR 1: land sandbox plumbing and default source-built Bazel V8 to
sandboxed mode**

- [ ] **PR 2: publish sandbox-enabled release artifacts and add
compatibility validation**
- Produce sandboxed artifact pairs for every released Cargo target that
does not already use the source-built Bazel path.
- Add CI coverage that consumes those sandboxed artifacts and verifies:
    - `codex-v8-poc` reports sandbox enabled
    - `codex-code-mode` builds/tests against the sandboxed path

- [ ] **PR 3: switch release consumers to sandboxed artifacts by
default**
  - Update released artifact selectors/checksums.
- Enable the Rust `v8_enable_sandbox` feature in the default release
path.
- Make the sandboxed artifact family the normal path for published
builds.

- [ ] **PR 4: remove rollout-only compatibility paths**
- Remove the temporary non-sandbox release compatibility config once the
new default has shipped and baked.
  - Keep the invariant tests permanently.
2026-05-05 14:36:37 -07:00
Felipe Coury
bb2257e3f5 [codex] fix TUI turn items view fixtures (#21243)
## Summary

Adds the required `items_view` field to the three session picker `Turn`
test fixtures that populate full turn item lists.

## Root Cause

`#21063` added `Turn.items_view` to the app-server protocol type. The
later session picker merge added three test-only
`codex_app_server_protocol::Turn` literals without the new field, which
broke Bazel compilation on `main` with `E0063: missing field
items_view`.

## Validation

- `just fmt`
- `cargo test -p codex-tui resume_picker --no-fail-fast`
- `just argument-comment-lint`

I also ran `cargo test -p codex-tui`; it compiled and ran the suite, but
this local machine failed two pre-existing status permission-profile
tests because `/etc/codex/requirements.toml` disallows
`DangerFullAccess`.
2026-05-05 14:24:28 -07:00
Eric Traut
8c88f9a304 Auto-deny MCP elicitations for Xcode 26.4 clients (#21113)
## Summary

Xcode 26.4 was built against app-server behavior from before MCP
elicitation requests became client-visible in CLI 0.120.0 via #17043.
That client line does not expect the new events/messages, so this PR
restores the old behavior for exactly that client/version combination.

The compatibility handling stays in the app-server layer: when the
initialized client is `Xcode` and its version starts with `26.4`, the
app server marks the live Codex thread so MCP elicitations are
auto-denied. The flag is applied on thread start/resume/fork/turn
attachment, carried through `Codex`/`CodexThread`, and stored on
`McpConnectionManager` so refreshed MCP managers preserve the behavior.

## Notes

This is intentionally narrow and includes a TODO to remove the
compatibility path once Xcode 26.4 ages out.
2026-05-05 14:05:42 -07:00
pakrym-oai
f593323ef1 [codex] Split tool handlers by tool name (#20687)
## Why

Tool registration used to bind a tool name to a handler externally,
which left ownership split between the registry plan and the handler
implementation. Some built-in handlers also multiplexed multiple in-core
tools by switching on the invoked tool name internally.

This moves the registry identity onto the handler itself and makes
built-in multi-tool areas use separate concrete handlers, so each
registered handler instance owns exactly one tool name and one dispatch
path.

## What Changed

- Added `ToolHandler::tool_name()` and changed
`ToolRegistryBuilder::register_handler` to derive the registry key from
the handler.
- Split built-in multiplexed handlers into concrete per-tool handlers
for unified exec, shell/local shell/container exec, MCP resources, goal
tools, and agent job tools.
- Kept name-carrying handler instances only where the runtime target is
inherently external or dynamic, such as MCP tools, dynamic tools, and
unavailable placeholders.
- Updated `ToolHandlerKind` and registry-plan construction so plan
entries map directly to concrete handler registrations.

## Verification

- `cargo test -p codex-tools tool_registry_plan`
- `cargo test -p codex-core --lib tools::registry_tests`
- `just fix -p codex-tools`
- `just fix -p codex-core`
2026-05-05 13:46:45 -07:00
viyatb-oai
9cbef243b5 fix(linux-sandbox): isolate Linux sandbox synthetic mount registry per user for shared codex use case (#21234)
## Summary
- make the Linux sandbox synthetic mount registry path unique per
effective UID
- keep same-user coordination intact while avoiding collisions between
users sharing `/tmp`
- add a regression test for the registry path contract

## Why
Issue #21192 reports that the Linux sandbox currently uses one global
temp path at `/tmp/codex-bwrap-synthetic-mount-targets`. If another user
creates that directory first, later users can fail to open the shared
lock file with `Permission denied`.

## Validation
- `just fmt`
- `cargo test -p codex-linux-sandbox`
- `cargo clippy -p codex-linux-sandbox --all-targets`

Fixes #21192
2026-05-05 20:43:37 +00:00
viyatb-oai
8b95d5467e fix(linux-sandbox): avoid panic on bwrap build failures (#21127)
## Summary

- Propagate Linux bubblewrap argument-construction failures instead of
panicking in the helper
- Keep mutable-symlink carveouts fail-closed while reporting them as
ordinary sandbox build failures
- Add regression coverage for a protected `.codex` symlink inside a
writable workspace root

## Root cause

Linux bubblewrap intentionally rejects read-only carveouts that cross a
symlink the sandboxed process can still rewrite. That is the correct
security behavior for protected metadata paths such as `.codex`.

The bug was one layer higher: `linux_run_main` treated the expected
build failure as impossible and panicked while constructing the
bubblewrap argv. For issue #20716, that turned a normal fail-closed
sandbox outcome into a noisy panic in the transcript.

## User impact

Users with a project-local `.codex` symlink inside a writable workspace
still get the conservative sandbox decision, but they no longer see a
Rust panic for that condition. The helper now exits with the concise
sandbox-build error so the normal denial / escalation path can handle
it.


Fixes #20716
2026-05-05 13:34:08 -07:00
Felipe Coury
3b2ebb368e feat(tui): redesign session picker (#20065)
## Why

The resume/fork picker is becoming the main way users recover previous
work, but the old fixed table made sessions hard to scan once thread
names, branches, working directories, and timestamps all mattered. This
redesign makes the picker denser by default, easier to search, and safer
to inspect before resuming or forking.

<table>
<tr>
<td>
<img width="1660" height="1103" alt="CleanShot 2026-05-03 at 12 34 10"
src="https://github.com/user-attachments/assets/313ede1d-1da4-4863-acd2-56b3e27e9703"
/>
</td>
<td>
<img width="1662" height="1100" alt="CleanShot 2026-05-03 at 12 34 15"
src="https://github.com/user-attachments/assets/cfde7d5c-bab0-4994-a807-254e53f344ea"
/>
</td>
</tr>
<tr>
<td>
<img width="1664" height="1107" alt="CleanShot 2026-05-03 at 12 39 22"
src="https://github.com/user-attachments/assets/e1ee58ca-4dc5-4a35-ae0f-47562da3974c"
/>
</td>
<td>
<img width="1662" height="1100" alt="CleanShot 2026-05-03 at 12 35 09"
src="https://github.com/user-attachments/assets/9c888072-eedf-4f45-985c-0c14df28bcc7"
/>
</td>
</tr>
</table>

## What Changed

- Replaces the old session table with responsive session rows that
prioritize the session name or preview, then show timestamp, cwd, and
branch metadata.
- Makes dense view the default while keeping comfortable view available
through `Ctrl+O`.
- Persists the picker view preference in `[tui].session_picker_view`,
including active profile-scoped config.
- Adds sort/filter controls for updated time, created time, cwd, and all
sessions.
- Expands search matching across session name, preview, thread id,
branch, and cwd.
- Makes `Esc` safer in search mode: it clears an active query before
starting a new session.
- Adds lazy transcript inspection:
  - `Space` expands recent transcript context inline.
  - `Ctrl+T` opens a transcript overlay.
  - raw reasoning visibility follows `show_raw_agent_reasoning`.
- Keeps remote cwd filtering server-side for remote app-server sessions
so local path normalization does not incorrectly hide remote results.
- Updates snapshots and config schema for the new picker states and
config option.

## How to Test

1. Start Codex in a repo with several saved sessions.
2. Press `Ctrl+R` / resume picker entry point.
3. Confirm the picker opens in dense mode and shows session name or
preview, timestamp, cwd, and branch metadata.
4. Press `Ctrl+O` and confirm it switches between dense and comfortable
views.
5. Restart Codex and confirm the selected view persists.
6. Type a query that matches a branch, cwd, thread id, or session name;
confirm matching sessions appear.
7. Press `Esc` while the query is non-empty and confirm it clears search
instead of starting a new session.
8. Select a session and press `Space`; confirm recent transcript context
expands inline.
9. Press `Ctrl+T`; confirm the transcript overlay opens and respects
raw-reasoning visibility settings.

Targeted tests:
- `cargo test -p codex-tui resume_picker --no-fail-fast`
- `cargo test -p codex-core
runtime_config_resolves_session_picker_view_default_and_override`
- `cargo test -p codex-core profile_tui_rejects_unsupported_settings`
- `cargo check -p codex-thread-manager-sample`
- `cargo insta pending-snapshots`
2026-05-05 13:32:54 -07:00
Felipe Coury
52fbbe7cdd feat(tui): route /diff through workspace commands (#21001)
Stacked on #20892.

## Why

#20892 adds the TUI workspace command abstraction so branch status
metadata can run through app-server instead of assuming the CLI process
has the active workspace locally. `/diff` still used direct local
process execution, which means remote app-server sessions could compute
the diff against the wrong machine or fail to see the active workspace
at all.

This PR moves `/diff` onto that same app-server-backed command path so
Git runs wherever the active workspace lives.

## What Changed

- Route `/diff` through the TUI `WorkspaceCommandExecutor` using the
active chat cwd.
- Replace direct `tokio::process::Command` usage in `get_git_diff` with
argv-based workspace command requests.
- Preserve the existing `/diff` behavior: tracked diff output, untracked
file diffs, treating Git diff exit code `1` as success, and showing the
existing non-git-repository message.
- Extend `WorkspaceCommand` with caller-set timeouts and an explicit
uncapped-output opt-out. Metadata probes remain capped by default;
`/diff` opts out because its full output is the user-visible payload.

## How to Test

Manual reviewer path:

1. Start the Codex TUI from a Git worktree with one tracked file change
and one untracked file.
2. Run `/diff`.
3. Confirm the rendered diff includes both the tracked diff and the
untracked file diff.
4. Start the TUI outside a Git worktree, or switch to a non-git cwd,
then run `/diff`.
5. Confirm it shows the existing `/diff` not-inside-a-git-repository
message.

Targeted tests run:

- `cargo test -p codex-tui get_git_diff -- --nocapture`
- `cargo test -p codex-tui branch_summary -- --nocapture`
- `cargo test -p codex-tui`
2026-05-05 17:09:25 -03:00
rhan-oai
9e0c191c13 add turn items view to app-server turns (#21063)
## Why

`Turn.items` currently overloads an empty array to mean either that no
items exist or that the server intentionally did not load them for this
response. That ambiguity blocks future lazy-loading work where clients
need to distinguish unloaded, summary, and fully hydrated turn payloads.

## What changed

- add a new `TurnItemsView` enum with `notLoaded`, `summary`, and `full`
variants
- add required `itemsView` metadata to app-server `Turn` payloads
- mark reconstructed persisted history as `full` and live shell-style
turn payloads as `notLoaded`
- keep current `thread/turns/list` behavior unchanged and document that
it still returns `full` turns today
- regenerate the JSON and TypeScript protocol fixtures

## Verification

- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server thread_read_can_include_turns`
- `cargo test -p codex-app-server
thread_turns_list_can_page_backward_and_forward`
- `cargo test -p codex-app-server
thread_resume_rejects_history_when_thread_is_running`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fmt`
2026-05-05 19:17:16 +00:00
pakrym-oai
b6d4c4ea6b [codex] Use shared app-server JSON-RPC error helpers (#21221)
## Why

App-server had repeated hand-built JSON-RPC error objects for standard
error shapes. Using the shared helpers keeps the common
`invalid_request`, `invalid_params`, and `internal_error` construction
in one place and reduces the chance of new call sites drifting from the
common error payload shape.

## What changed

- Replaced manual standard JSON-RPC error object creation with
`internal_error(...)`, `invalid_request(...)`, and `invalid_params(...)`
across app-server request processors and runtime paths.
- Removed local duplicate helper definitions from search and review
request handling.
- Preserved existing structured `data` payloads by creating the shared
helper error first and then attaching the existing metadata.
- Left custom non-standard errors and raw error-code assertions intact.

## Validation

- `cargo test -p codex-app-server`
2026-05-05 12:13:59 -07:00
Abhinav
0452dca986 hook trust metadata and enforcement (#20321)
# Why

We want shared hook trust that both the app and the TUI can build on,
but the metadata is only useful if runtime behavior agrees with it. This
PR adds a single backend trust model for hooks so unmanaged hooks cannot
run until the current definition has been reviewed, while managed hooks
remain runnable and non-configurable.

# What

- persist `trusted_hash` alongside hook state in `config.toml`
- expose `currentHash` and derived `trustStatus` through `hooks/list`
- derive trust from normalized hook definitions so equivalent hooks from
`config.toml` and `hooks.json` share the same trust identity
- gate unmanaged hooks on trust before they enter the runnable handler
set

# Reviewer Notes

- key file to review is `codex-rs/hooks/src/engine/discovery.rs`
- the only **core** change is schema related
2026-05-05 19:13:55 +00:00
starr-openai
78421face0 Route process tools to selected environments (#20647)
## Why
When a turn exposes multiple selected environments, shell-style tools
need a model-facing way to identify the intended target environment and
handlers need to resolve that target before parsing cwd-relative
permission fields or launching processes.

This PR scopes that rollout to process tools. Filesystem-oriented tools
such as `apply_patch`, `view_image`, and `list_dir` are intentionally
left for follow-up slices.

## What Changed
- Adds an `include_environment_id` option to shell-style tool schema
builders.
- Exposes optional `environment_id` on `shell`, `shell_command`, and
`exec_command` only when `ToolEnvironmentMode::Multiple` is active.
- Adds a shared handler helper that parses `environment_id` and
`workdir` from JSON function-call arguments and returns the selected
`Environment` plus effective absolute cwd.
- Uses that helper in `shell`, `shell_command`, and `exec_command`
handling so process execution uses the selected environment filesystem
and cwd.
- Changes `ExecCommandRequest` to carry a required resolved `cwd`,
removing the process-manager fallback to the primary turn cwd for new
exec commands.
- Leaves `write_stdin` unchanged because it targets an existing process
id, not a new environment.

## Testing
- Added unit coverage for process-tool schema exposure, selected
environment resolution, primary fallback, no-environment handling,
unknown environment ids, and resolving cwd-relative permission paths
against the selected environment cwd.
- Added a remote-suite e2e coverage case for `exec_command` routing
across explicit zero environments, one local environment, and
local+remote environments.
- Ran `just fmt` and `git diff --check`.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-05 12:12:03 -07:00
rhan-oai
fb7e1eb6fc [codex-analytics] add tool item event schemas (#17089)
## Why

Tool analytics need stable, typed payloads before the later lifecycle
reducer starts emitting them. Keeping the event schema definitions
isolated in their own PR makes the emitted surface reviewable separately
from the reducer logic that produces those events.

## What changed

- Adds the common tool-item analytics event base plus event payload
types for command execution, file changes, MCP calls, dynamic tools,
collaboration tools, web search, and image generation.
- Extends `TrackEventRequest` with the corresponding tool-item variants.
- Adds serialization coverage for the command-execution event shape.

## Verification

- `cargo test -p codex-analytics`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17089).
* #18748
* #18747
* #17090
* __->__ #17089
* #20514
2026-05-05 11:49:30 -07:00
Owen Lin
6075b77001 app-server: ignore persist_extended_history param (#21225)
## Why

Taking a step to removing the `persistExtendedHistory` field. It's not
scalable to be persisting so much data in the rollout file and returning
it in the thread history.

When a client explicitly sends `true`, the server now tells that client
the parameter is deprecated and ignored so the caller has a clear
migration signal via the `deprecationNotice` notification.

## What changed

- Keep the `persist_extended_history` / `persistExtendedHistory` field
in the v2 protocol for compatibility, but document it as deprecated and
ignored.
- Ignore the parameter in app-server `thread/start`, `thread/resume`,
and `thread/fork`; those paths always use limited history persistence
now.
- Stop treating `persistExtendedHistory` as a running-thread resume
override mismatch.
- Emit a connection-scoped `deprecationNotice` when a request explicitly
sets `persist_extended_history: true`.

## Verification

- Added `thread_start_deprecates_persist_extended_history_true` to cover
the deprecation notice.
- `cargo test -p codex-app-server`
- `cargo test -p codex-app-server-protocol`
2026-05-05 18:36:13 +00:00
Felipe Coury
5e0a4adbe5 feat(tui): add raw scrollback mode (#20819)
## Why

Granular copy is particularly difficult with the current output. Part of
it was solved with the introduction of the `/copy` command but when you
only need to copy parts of a response, you still encounter some issues:

- When you copy a paragraph, the result is a sequence of separate lines
instead of one correctly joined paragraph.
- When a word wraps, part of it stays on the original line and the rest
appears at the start of the next line.
- When you copy a long command, extra line breaks are often inserted,
and command arguments can be split across multiple lines.


https://github.com/user-attachments/assets/0ef85c84-9363-4aad-b43a-15fce062a443

## Solution

Now that we own the scrollback and we re-create it when we resize, we
have the opportunity of toggling between the raw text and the rich text
we see today.

- Add TUI raw scrollback mode with `tui.raw_output_mode`, `/raw
[on|off]`, and the configurable `tui.keymap.global.toggle_raw_output`
action.
- Render transcript cells through rich/raw-aware paths so raw mode
preserves source text and lets the terminal soft-wrap selection-friendly
output.
- Bind raw-mode toggle to `alt-r` by default, with the keybinding path
toggling silently while `/raw` continues to emit confirmation messages.

## Related Issues

Likely addressed by raw mode:

- #12200: clean copy for multiline and soft-wrapped output. Raw mode
removes Codex-inserted wrapping/indentation and lets the terminal
soft-wrap logical lines.
- #9252: command suggestions gain unwanted leading spaces when copied.
Raw mode renders transcript text without the rich-mode left
padding/gutter.
- #8258: prompt output is hard to copy because of leading indentation.
Raw mode renders user/source-backed transcript text without that
decorative indentation.

Partially or conditionally addressed:

- #2880: copy/export message as Markdown. Raw mode exposes raw Markdown
for terminal selection, but this PR does not add a dedicated
export/copy-message command.
- #19820: mouse drag selection + copy in the TUI. Raw mode improves
terminal-native selection of output/history text, but this PR does not
implement in-TUI mouse selection, highlighting, auto-copy, or composer
selection.
- #18979: copied content is divided into two parts. This should improve
cases caused by Codex-inserted wraps/padding in rendered output; if the
report is about pasting into the composer/input path, that remains
outside this PR.

## Validation

- `just write-config-schema`
- `just fmt`
- `cargo test -p codex-config`
- `cargo test -p codex-tui`
- `just fix -p codex-tui`
- `just argument-comment-lint`
- `cargo test -p codex-tui
raw_output_mode_can_change_without_inserting_notice -- --nocapture`
- `cargo test -p codex-tui
raw_slash_command_toggles_and_accepts_on_off_args -- --nocapture`
- `cargo test -p codex-tui raw_output_toggle -- --nocapture`
- `git diff --check`
- `cargo insta pending-snapshots`
2026-05-05 11:17:47 -07:00
viyatb-oai
172303bbfa chore: add minimal proxy egress diagnostics (#21220)
## Why
Recent Auto Review reports show Git traffic hanging through the local
proxy on both SSH and HTTPS paths. Today the support bundle does not
make it obvious whether a request is stuck before upstream dialing,
during the proxy hop, or after the upstream response begins, which slows
down root-cause triage.

This adds a small amount of runtime visibility at the existing proxy
boundaries without changing routing or policy behavior.

## What changed
- log whether HTTP and CONNECT traffic take the direct or upstream-proxy
route
- log start / success / failure timings for CONNECT, HTTP, and SOCKS5
upstream dials
- log CONNECT forwarding lifecycle events
- describe HTTP success at the response-header boundary that is actually
observed, rather than implying the full body finished

## Verification
- `cargo test -p codex-network-proxy`
- `cargo clippy -p codex-network-proxy --all-targets -- -D warnings`
2026-05-05 17:50:59 +00:00
viyatb-oai
ed6082c9f9 fix(sandboxing): Bound advisory system bwrap startup probe (#20111)
## Why

Linux startup runs an advisory system `bwrap` warning probe on each
launch. On hosts with NFS or autofs mounts, its `--ro-bind / /` probe
can take tens of seconds before Codex prints anything, matching #19828.
Because this probe only decides whether to surface a warning, it should
not be allowed to stall startup.

Relevant pre-change path:
[`codex-rs/sandboxing/src/bwrap.rs`](de2ccf9473/codex-rs/sandboxing/src/bwrap.rs (L64-L80))

## What changed

- Bound the advisory system `bwrap` probe to 500 ms.
- Preserve the existing warning behavior when `bwrap` promptly reports a
known user-namespace failure.
- Kill and reap the probe child on timeout, then suppress the advisory
warning instead of blocking startup.
- Read probe stderr with a bounded nonblocking drain so descendants that
inherit the pipe cannot extend startup after the probe child exits.
- Add regression coverage for both a deliberately slow fake `bwrap`
process and a fake probe whose descendant keeps stderr open.

## Security

This only bounds the advisory startup probe. It does not change the
command execution path or add a fail-open sandbox fallback. The related
command-side hang in #20017 remains separate from this PR.

## Verification

- Added `system_bwrap_probe_times_out_without_reporting_a_warning`.
- Added
`system_bwrap_probe_does_not_wait_for_descendants_holding_stderr_open`.
- `cargo test -p codex-sandboxing`
- `cargo clippy -p codex-sandboxing --all-targets -- -D warnings`

Fixes #19828
Related: #20017
2026-05-05 10:45:35 -07:00
Felipe Coury
a3a09dfc9b fix(tui): external editor expansion for same-size large pastes (#21190)
## Why

We found this while reviewing #21091, but confirmed it is not introduced
by that PR: the order-sensitive `current_text_with_pending()`
replacement loop already existed, and `main` already allowed active
same-size large pastes to use prefix-overlapping labels such as `[Pasted
Content N chars]` and `[Pasted Content N chars] #2`.

#21091 fixes placeholder numbering after a draft is cleared, so a fresh
same-size paste can reuse the base label. This PR fixes a different
path: when a draft already contains multiple active same-size large
pastes, the placeholders can overlap by prefix, for example `[Pasted
Content N chars]` and `[Pasted Content N chars] #2`.

That overlap breaks `current_text_with_pending()` when the composer
materializes the draft text for the external editor. Replacing the base
placeholder first can partially rewrite the `#2` placeholder, leaving
the external editor seeded with corrupted text instead of both paste
payloads.

| Before | After |
|---|---|
| <img width="1230" height="1008" alt="CleanShot 2026-05-05 at 10 18 09"
src="https://github.com/user-attachments/assets/88a2936c-cf00-4adc-8567-8fd8f398b4a8"
/> | <img width="1230" height="1008" alt="CleanShot 2026-05-05 at 10 20
31"
src="https://github.com/user-attachments/assets/119cff52-43c8-432a-9367-418d82f4ed82"
/> |
| <img width="1230" height="1008" alt="CleanShot 2026-05-05 at 10 18 57"
src="https://github.com/user-attachments/assets/026031bb-839b-4252-a0fd-9ba9616435fe"
/> | <img width="1230" height="1008" alt="CleanShot 2026-05-05 at 10 21
31"
src="https://github.com/user-attachments/assets/8cb6f2c8-3a5d-411b-8623-dca666ee3c08"
/> |

## What Changed

- Changed `current_text_with_pending()` to expand pending pastes through
the existing element-range based `expand_pending_pastes()` helper
instead of global string replacement.
- Added a regression test with two different same-length large pastes to
ensure both overlapping placeholders expand to their original payloads.

## How to Test

1. Start Codex TUI.
2. Paste a large string, for example 1004 `A` characters.
```shell
perl -e 'print "A" x 1004' | pbcopy
```
3. Paste a second large string with the same length, for example 1004
`B` characters.
```shell
perl -e 'print "B" x 1004' | pbcopy
```
4. Open the external editor from the composer.
5. Confirm the editor is seeded with the full `A...` payload followed by
the full `B...` payload, with no literal `#2` left behind.

Targeted tests:
- `cargo test -p codex-tui
current_text_with_pending_expands_overlapping_placeholders`
- `just argument-comment-lint-from-source -p codex-tui`

I also ran `cargo test -p codex-tui`; it reached the full crate suite
but failed two unrelated local status tests because this machine's
`/etc/codex/requirements.toml` rejects `DangerFullAccess`.
2026-05-05 14:41:43 -03:00
Abhinav
13be504063 revert legacy notify deprecation (#21152)
# Why

Revert #20524 for now because the computer use plugin has not migrated
off legacy `notify` yet. Keeping the deprecation in place today would
show users a warning before the plugin path is ready to move, so this
rolls the change back until that migration is complete.

# What

- revert the legacy `notify` deprecation change from #20524
- restore the prior `notify` behavior and remove the temporary
deprecation metrics/docs from that change

Once the computer use plugin has migrated, we can land the same
deprecation again.
2026-05-05 10:34:44 -07:00
canvrno-oai
394242e95b [codex] Fix fork --last cwd filtering (#21089)
Fixes #20945.

This keeps `codex fork --last` aligned with the neighboring
latest-session lookup flows. The local fork path now uses the same
cwd-scope helper as `resume --last`, which is also a small code cleanup
around how this selection logic is shared.

Credit to @chanwooyang1 for the report and for pointing out the narrow
fix direction.

What changed:
- Route `fork --last` through the shared latest-session cwd filter.
- Preserve `--all` as the explicit opt-in for global latest-session
selection.
- Keep remote cwd override behavior unchanged.
- Add focused coverage for local default, `--all`, and remote override
filter semantics.

Validation:
- Ran `just fmt`.
- Ran `git diff --check`.
- Reviewed the `fork --last`, `resume --last`, and fork picker selection
paths against the issue report.
2026-05-05 10:33:40 -07:00
canvrno-oai
1feaa7d85b [codex] Fix TUI large paste placeholder numbering after Ctrl+C (#21091)
Fixes #19940.

Large-paste placeholder numbering was backed by a per-size counter, so
clearing a draft with `Ctrl+C` left numbering state behind even though
the active pending paste state was gone. This updates the composer to
derive the next placeholder suffix from active pending pastes instead,
which keeps simultaneous same-size pastes distinct while letting fresh
drafts reuse the base label. This is also a small code cleanup: pending
paste state is now the source of truth instead of maintaining a separate
counter.

Credit to @Sungyoun-Kim for the issue report, root-cause notes, and fork
with the proposed fix, and to @charley-oai for the earlier related
#10032 proposal.

Changes:
- Remove the monotonic large-paste counter from the composer.
- Compute suffixes from currently active pending paste placeholders.
- Document large-paste placeholder behavior in the composer module docs.
- Add regression coverage for `Ctrl+C` clearing and deletion/reset
behavior.

Testing:
- `just fmt`
- `git diff --check`
2026-05-05 10:33:37 -07:00
Abhinav
af86be529c Support PreToolUse additionalContext (#20692)
# Why

`PreToolUse` already exposes `hookSpecificOutput.additionalContext` in
the generated hook schema, but the runtime still rejected it as
unsupported. That leaves `PreToolUse` out of step with the other
context-injecting hooks and prevents hook authors from attaching
model-visible guidance to a pending tool call before it runs.

# What

- Parse `PreToolUse.additionalContext` and carry it through the hook
event pipeline.
- Record `PreToolUse` context at the hook boundary so successful context
is preserved for both allowed and blocked calls without widening the
tool registry surface.
- Preserve existing deny behavior when context is combined with either
`permissionDecision: "deny"` or the legacy `decision: "block"` shape.
2026-05-05 10:29:30 -07:00
iceweasel-oai
f35285dc78 Add Windows sandbox readiness RPC (#20708)
## Why

The desktop app on Windows needs a read-only way to tell, before the
next tool call, whether the local Windows sandbox setup is in a state
that should block the user and ask for setup again.

The main case we want to cover is the elevated sandbox setup version
bump. Today, if the app is configured for elevated Windows sandboxing
and the installed setup is stale, the next sandboxed shell/exec path can
end up triggering the elevated setup flow directly. That means the user
can see an unexpected UAC prompt with no UI explanation.

This change adds a small app-server preflight so the desktop app can ask
“is Windows sandbox ready, not configured, or update-required?” during
startup and show the appropriate blocking UI before the user hits a tool
call.

## What changed

- Added a new read-only app-server RPC: `windowsSandbox/readiness`
- Added a new protocol enum and response type:
  - `WindowsSandboxReadiness`
  - `WindowsSandboxReadinessResponse`
- Added core readiness logic in `core/src/windows_sandbox.rs`:
  - `ready`
  - `notConfigured`
  - `updateRequired`
- Wired the new request through `codex_message_processor`
- Regenerated the vendored app-server schema fixtures

## Readiness semantics

This is intentionally a coarse startup/version-bump readiness check, not
a full predictor of every runtime repair case.

For now, readiness is determined from:
- the configured Windows sandbox level
- `sandbox_setup_is_complete()` for elevated mode

That means:
- `disabled` maps to `notConfigured`
- `restricted token` maps to `ready`
- `elevated` maps to `ready` or `updateRequired` depending on
`sandbox_setup_is_complete()`

This is deliberate for the first UI integration because the common case
we want to catch is “the app updated, the elevated setup version bumped,
and the user should see an update-required blocker instead of a surprise
UAC prompt”.

It does not attempt to model every case where the deeper runtime path
might decide to repair or re-run setup.

## Testing

- Ran `cargo fmt --all -- app-server-protocol/src/protocol/common.rs
app-server-protocol/src/protocol/v2.rs
app-server/src/codex_message_processor.rs core/src/windows_sandbox.rs
core/src/windows_sandbox_tests.rs`
- Added unit tests for the pure readiness mapping in
`core/src/windows_sandbox_tests.rs`
- Regenerated vendored schema fixtures with `cargo run -p
codex-app-server-protocol --bin write_schema_fixtures -- --schema-root
app-server-protocol/schema`
- Did not run the full cargo test suite
2026-05-05 09:58:23 -07:00
Eric Traut
f09e1936e0 Validate /goal objective length in TUI (#20746)
## Why

Long `/goal` definitions currently reach lower-level goal validation and
can produce an opaque failure. This bug was reported by a user. Pasted
instruction blocks are especially confusing because the composer can
still contain a paste placeholder before expansion, which may otherwise
fall into the generic prompt-size error path.

There was also a related paste edge case where `/goal ` followed by a
multiline block whose first pasted line was blank looked like a bare
`/goal` command. That showed the goal usage/summary instead of setting
the pasted objective.

## What Changed

This adds TUI-side preflight validation for `/goal <objective>` using
the shared `MAX_THREAD_GOAL_OBJECTIVE_CHARS` limit. Oversized typed,
queued, and pasted goal objectives now fail locally with a goal-specific
message that recommends putting longer instructions in a file and
referencing that file from the goal.

The TUI now also lets inline-argument slash commands consume later-line
arguments before treating the first line as a bare command, so `/goal `
followed by blank lines and then objective text sets the goal instead of
opening the bare `/goal` flow.

## Manual Testing

1. Start the TUI with goals enabled and an active session.
2. Submit `/goal ` followed by exactly 4,000 objective characters. It
should continue through the normal goal-setting path.
3. Submit `/goal ` followed by 4,001 objective characters. It should not
set a goal, and should show `Goal objective is too long: 4,001
characters. Limit: 4,000 characters.` followed by the guidance to put
longer instructions in a file and reference that file from the goal.
4. Type `/goal `, paste a large block that becomes a `[Pasted Content
... chars]` placeholder, then submit. It should validate the expanded
pasted text and show the goal-specific file guidance rather than the
generic prompt-size error.
5. Type `/goal `, paste a multiline block whose first line is blank,
then submit. It should set the objective from the non-blank pasted
content instead of showing `Usage: /goal <objective>` or the bare goal
summary.
6. While a turn is running, queue an oversized `/goal` command. When the
queue drains, it should show the same goal-specific error and should not
emit a goal-setting request.
2026-05-05 09:55:07 -07:00
Eric Traut
91b7350187 Add goal lifecycle metrics (#20799)
## Why

Adding goal metrics makes it possible to track how often goals are
created, completed, and stopped by budget limits, plus the final token
and wall-clock usage for terminal outcomes.

## What Changed

- Added OpenTelemetry metric constants for goal lifecycle tracking:
- `codex.goal.created`: increments each time a new persisted goal is
created or an existing goal is replaced with a new objective.
- `codex.goal.completed`: increments when a goal transitions to
`complete`.
- `codex.goal.budget_limited`: increments when a goal transitions to
`budget_limited` because its token budget has been reached.
- `codex.goal.token_count`: records the final persisted token count when
a goal transitions to `complete` or `budget_limited`.
- `codex.goal.duration_s`: records the final persisted elapsed
wall-clock time, in seconds, when a goal transitions to `complete` or
`budget_limited`.
- Emitted creation metrics when a goal is created or replaced.
- Emitted terminal outcome counters and final usage histograms when a
goal transitions to `complete` or `budget_limited`, avoiding
double-counting later in-flight accounting for already budget-limited
goals.
- Added focused `codex-core` tests for create/complete metrics and
one-time budget-limit metrics.
2026-05-05 09:21:54 -07:00
Felipe Coury
69283aa1c0 fix(tui): make /copy work inside tmux without passthrough (#20207)
## Summary
- prefer tmux's native clipboard integration for `/copy` when running
inside tmux
- fall back to OSC 52 when tmux clipboard copy is unavailable
- add coverage for tmux-preferred, fallback, and combined-failure paths

## Why
Inside tmux, `/copy` previously relied on DCS-wrapped OSC 52 when `TMUX`
was set. That only reaches the outer terminal when tmux passthrough is
enabled, so Codex could report success even though the system clipboard
never changed.

## User impact
`/copy` now works inside tmux even when `allow-passthrough` is off, as
long as tmux clipboard integration is available. If tmux cannot handle
the copy, Codex still keeps the existing OSC 52 fallback path.

## Validation
- `cargo test -p codex-tui`
- `just fmt`
- `just fix -p codex-tui`
- `just argument-comment-lint`
- manually verified `/copy` inside tmux with `allow-passthrough off`

Fixes #19926
2026-05-05 16:18:02 +00:00
jif-oai
be12a80ad1 feat: add normalized matching to memory search (#21205)
## Why

Memory search currently treats separators literally, so callers need to
know whether a stored term uses spaces, hyphens, or no separators at
all. That makes recall brittle for terms such as `MultiAgentV2` vs.
`multi agent v2` and `cold-resume` vs. `cold resume`.

## What changed

- Add an opt-in `normalized` mode to memory search that removes
non-alphanumeric separators after any requested case folding.
- Thread the new flag through the MCP `search` tool into the local
backend while keeping existing literal matching as the default.
- Reject queries that normalize to an empty string, and add regression
coverage for both normalized matching and that validation path.

## Testing

- `cargo test -p codex-memories-mcp`
2026-05-05 17:33:07 +02:00
jif-oai
f75c600872 feat: support windowed multi-query memory search (#21204)
## Why

Memory search currently supports either independent substring matches or
requiring every query to appear on the same line. That is too
restrictive for memory files where related terms often land on nearby
lines in the same note or bullet block.

## What changed

- Replace the old `all` match mode with explicit tagged modes:
`all_on_same_line` and `all_within_lines { line_count }`.
- Add windowed matching in `codex-rs/memories/mcp/src/local.rs` so
callers can require every query to appear within a bounded line range
while returning only the minimal qualifying windows.
- Reject invalid zero-width windows and update the MCP tool description
plus argument parsing to expose the new mode.
- Add coverage for same-line matching, windowed matching, and invalid
`line_count` input.

## Verification

- Added targeted coverage in `codex-rs/memories/mcp/src/local_tests.rs`
for `search_supports_all_within_lines_match_mode` and
`search_rejects_zero_line_window`.
- Added server-side parsing coverage in
`codex-rs/memories/mcp/src/server.rs` for
`search_args_accept_windowed_all_match_mode`.
2026-05-05 17:15:21 +02:00
jif-oai
de924af134 memories-mcp: hide dot paths from list, read, and search (#21201)
## Why

The local memories root can contain implementation details such as
`.git` plus incidental OS metadata like `.DS_Store`. Those entries are
not authored memory content, so the memories MCP should keep them
invisible instead of exposing them through normal discovery or direct
lookup.

Only for local implementation ofc

## What changed

- Return `NotFound` for scoped `list`, `read`, and `search` requests
that include a hidden path component.
- Skip hidden files and directories while listing a directory or
recursively searching the memories tree.
- Add regression coverage for hidden files, hidden directories, and
hidden scoped requests across `list`, `read`, and `search`.

## Testing

- Added focused regression tests in `memories/mcp/src/local_tests.rs`
covering hidden-path behavior across the affected APIs.
2026-05-05 16:59:05 +02:00
jif-oai
70807730f5 tools: remove unused experimental list_dir tool (#21170)
## Why
`list_dir` still carries a full spec/handler/test path, but nothing in
the current model catalog advertises it via
`experimental_supported_tools`. That leaves us maintaining an
environment-backed tool surface that is effectively unused.

## What changed
- delete the `list_dir` handler and its tests from `codex-core`
- remove the `list_dir` spec builder, handler kind, and registry wiring
from `codex-tools`
- clean up the remaining internal README and registry tests so they no
longer mention the removed tool
2026-05-05 13:11:07 +02:00
Ahmed Ibrahim
9d579813bb 1- Add model service tiers metadata (#20969)
## Why

The model list needs to carry display-ready service tier metadata so
clients can render tier choices with stable IDs, names, and
descriptions. A raw speed-tier string list is not enough for richer UI
copy or future tier labels.

## What changed

- Added `ModelServiceTier` to shared model metadata with string `id`,
`name`, and `description` fields.
- Added `service_tiers` to `ModelInfo` and `ModelPreset`, preserving
empty defaults for older cached model payloads.
- Exposed `serviceTiers` on app-server v2 `Model` responses and threaded
it through TUI app-server model conversion.
- Marked legacy `additional_speed_tiers` / `additionalSpeedTiers`
metadata as deprecated in source and generated schema output.
- Regenerated app-server protocol JSON schema and TypeScript fixtures,
including `ModelServiceTier.ts`.

## Verification

- Ran `just write-app-server-schema`.
- Did not run local tests per repo instruction; relying on PR CI.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-05 09:51:18 +03:00
Abhinav
dca105cf99 Spill large hook outputs from context (#21069)
## Why

Large hook outputs can enter model-visible context through hook-specific
paths such as `additionalContext` and `Stop` continuation prompts.
Without a dedicated cap, one hook can inject a large blob directly into
conversation history instead of leaving a bounded preview for the model
and preserving the full text elsewhere.

## What

- spill hook text once it exceeds a fixed `2_500`-token budget,
preserving the full output on disk and leaving a head/tail preview plus
saved path in context
- add shared hook-output spilling under
`CODEX_HOME/hook_outputs/<thread_id>/<uuid>.txt`
- apply the cap to both `additionalContext`, `feedback_message`, and
`Stop` continuation fragments
2026-05-05 05:03:18 +00:00
Tom
33d24b0df5 codex: migrate (more) app-server thread history reads to ThreadStore (#20575)
Migrate token usage replay, rollback responses, and detached review
setup (a special case of forking) to be served from ThreadStore reads
rather direct rollout files.

- replay restored token usage from already-loaded `RolloutItem` history
instead of reopening `Thread.path`
- rebuild rollback responses from loaded `ThreadStore` snapshots and
history
- start detached reviews from store-backed parent history and stored
review-thread metadata
- remove obsolete app-server rollout-summary helper code that became
dead after the store-backed migration
- preserve response/notification ordering for resume, fork, rollback,
and detached review flows
- add integration test coverage for the affected paths
2026-05-04 21:16:50 -07:00
edwardysun3
7e71d02610 Add turn_id to Codex skill invocation analytics (#21122) 2026-05-05 00:11:06 -04:00
alexsong-oai
3ad7cf0993 Add plugin ID to skill analytics (#20923)
## Summary
- thread plugin skill roots through the skills loader with their plugin
ID
- store plugin ID on loaded skill metadata for plugin-provided skills
- include plugin ID on skill invocation analytics events

## Test plan
- cargo check -p codex-core-skills
- cargo check -p codex-core -p codex-core-plugins -p codex-analytics
- cargo check -p codex-tui
- cargo check -p codex-plugin -p codex-core -p codex-core-plugins -p
codex-analytics
- cargo check -p codex-app-server
- cargo test -p codex-analytics
- HOME=/private/tmp/codex-empty-home cargo test -p codex-core-skills
- just fix -p codex-core-skills
- just fix -p codex-analytics
- just fix -p codex-core-plugins
- just fix -p codex-core
- just fmt
- git diff --check
2026-05-04 20:36:29 -07:00
Tom
707e51bd8b codex: route metadata updates through ThreadStore (#20576)
- Route `thread/metadata/update` through
`ThreadStore::update_thread_metadata`.
- Add `LocalThreadStore` git metadata patch support for set, partial
update, and clear semantics.
- Add some unit tests for the new thread store code
- Remove a lot of dead code/tests!
2026-05-04 20:09:41 -07:00
Shijie Rao
0d418f478d Rename agent identity login surface to access token (#21059)
## Why
The external startup/login surface for this auth path should talk about
an access token instead of exposing the internal Agent Identity
terminology. Users should pass `CODEX_ACCESS_TOKEN` or pipe a token into
`codex login --with-access-token`; the old external env/flag spellings
are removed so there is only one supported user-facing path.

## What Changed
- Added `CODEX_ACCESS_TOKEN` as the supported environment variable for
this auth path.
- Added `codex login --with-access-token` as the supported stdin-based
login command.
- Removed the legacy `CODEX_AGENT_IDENTITY` env-var fallback and hidden
`--with-agent-identity` CLI alias.
- Updated CLI error, status, and stdin prompts to use access-token
language.
- Added coverage for access-token env loading, CLI login failure
behavior, and renamed login status text.

## Validation
- `cargo test -p codex-login`
- `cargo test -p codex-cli`
- `just fix -p codex-login`
- `just fix -p codex-cli`
2026-05-04 19:43:48 -07:00
evawong-oai
d85783901c [network-proxy] Cover DNS timeout blocking (#21105)
## Summary
- Add a testable DNS lookup helper for the local or private host
precheck while preserving production `lookup_host` behavior.
- Add deterministic coverage for DNS timeout, lookup error, private
resolution, and public resolution decisions.
- Keep BUGB 15982 guarded without relying on ambient DNS timing or
resolver behavior.

## Why
BUGB 15982 was fixed by failing closed on DNS lookup errors and
timeouts. The existing regression covered lookup failure through real
DNS, but did not deterministically exercise the timeout branch. This PR
adds a small injection point so CI can cover that branch without
standing up slow authoritative DNS.

## Validation
- `cargo test -p codex-network-proxy host_resolves_to_non_public_ip --
--nocapture`
- `cargo test -p codex-network-proxy
host_blocked_rejects_allowlisted_hostname_when_dns_lookup_fails --
--nocapture`
- `cargo test -p codex-network-proxy`
- `just fmt`
- `just fix -p codex-network-proxy`
- `git diff --check`

## Tickets
- BUGB 15982
-
https://linear.app/openai/issue/BUGB-15982/codex-dns-timeout-fail-open-in-codex-network-proxy-bypasses
- Bugcrowd:
https://tracker.bugcrowd.com/openai/submissions/b2bf131d-db04-478f-85aa-cdd17ca8f604
2026-05-04 19:03:56 -07:00
Ruslan Nigmatullin
4950e7d8a6 [codex] Add unsandboxed process exec API (#19040)
## Why

App-server clients sometimes need argv-based local process execution
while sandbox policy is controlled outside Codex. Those environments can
reject sandbox-disabling paths before a command ever starts, even when
the caller intentionally wants unsandboxed execution.

This PR adds a distinct `process/*` API for that use case instead of
extending `command/exec` with another sandbox-disabling shape. Keeping
the new surface separate also makes the future removal of `command/exec`
simpler: clients that need explicit process lifecycle control can move
to the newer handle-based API without depending on `command/exec`
business logic.

## What changed

- Added v2 process lifecycle methods: `process/spawn`,
`process/writeStdin`, `process/resizePty`, and `process/kill`.
- Added process notifications: `process/outputDelta` for streamed
stdout/stderr chunks and `process/exited` for final exit status and
buffered output.
- Made `process/spawn` intentionally unsandboxed and omitted
sandbox-selection fields such as `sandboxPolicy` and
`permissionProfile`.
- Added client-supplied, connection-scoped `processHandle` values for
follow-up control requests and notification routing.
- Supported cwd, environment overrides, PTY mode and size, stdin
streaming, stdout/stderr streaming, per-stream output caps, and timeout
controls.
- Killed active process sessions when the originating app-server
connection closes.
- Wired the implementation through the modular `request_processors/`
app-server layout, with process-handle request serialization for
follow-up control calls.
- Updated generated JSON/TypeScript schema fixtures and documented the
new API in `codex-rs/app-server/README.md`.
- Added v2 app-server integration coverage in
`codex-rs/app-server/tests/suite/v2/process_exec.rs` for spawn
acknowledgement before exit, buffered output caps, and process
termination.

## Verification

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server`

---------

Co-authored-by: Owen Lin <owen@openai.com>
2026-05-04 16:43:58 -07:00
xli-oai
a8db4af5c3 Remove remote plugin uninstall prefix gate (#20722)
## Summary

Remove the hardcoded remote plugin ID prefix allow-list from app-server
uninstall routing. IDs that do not parse as local `plugin@marketplace`
IDs now flow through the remote uninstall path, where the existing
remote ID safety validation still rejects empty IDs, spaces, slashes,
and other unsafe characters before URL/cache use.

## Why

Plugin-service owns the backend remote plugin ID contract. Codex should
not require remote IDs to start with the local hardcoded prefixes
`plugins~`, `plugins_`, `app_`, `asdk_app_`, or `connector_`, because
newer backend ID families could otherwise be rejected before
plugin-service sees the request.

## Validation

- `just fmt`
- `cargo test -p codex-app-server plugin_uninstall`
- `just fix -p codex-app-server`
- `git diff --check`
2026-05-04 16:28:13 -07:00
rhan-oai
aee1fe2659 [codex-analytics] add item lifecycle timing (#20514)
## Why

Tool families already disagree on what their existing `duration` fields
mean, so lifecycle latency should live on the shared item envelope
instead of being inferred from per-tool execution fields. Carrying that
envelope through app-server notifications gives downstream consumers one
reusable timing signal without pretending every tool has the same
execution semantics.

## What changed

- Adds `started_at_ms` to core `ItemStartedEvent` values and
`completed_at_ms` to core `ItemCompletedEvent` values.
- Populates those timestamps in the shared session lifecycle emitters,
so protocol-native items get timing without each producer tracking its
own clock state.
- Exposes `startedAtMs` on app-server `item/started` notifications and
`completedAtMs` on `item/completed` notifications.
- Maps the lifecycle timestamps through the app-server boundary while
leaving legacy-converted notifications nullable when no lifecycle
timestamp exists.
- Regenerates the app-server JSON schema and TypeScript fixtures for the
notification-envelope change and updates downstream fixtures that
construct those notifications directly.
- Extends the existing web-search and image-generation integration flows
to assert the new lifecycle timestamps on the native item events.

## Verification

- `cargo check -p codex-protocol -p codex-core -p
codex-app-server-protocol -p codex-app-server -p codex-tui -p codex-exec
-p codex-app-server-client`
- `cargo test -p codex-core --test all web_search_item_is_emitted`
- `cargo test -p codex-core --test all
image_generation_call_event_is_emitted`
- `cargo test -p codex-app-server-protocol`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/20514).
* #18748
* #18747
* #17090
* #17089
* __->__ #20514
2026-05-04 22:33:20 +00:00
kmeelu-oai
e7e6267ab3 Make realtime sideband startup async (#20715)
## Summary

Moves the WebRTC realtime sideband websocket join out of the voice start
critical path. Call creation still posts the SDP offer and session
config synchronously so the client gets the SDP answer, but the sideband
websocket now connects in the input task async and doesn't block
conversation state installation.

This lets the normal realtime input channels buffer text, handoff
output, and audio while the WebRTC sideband websocket is connecting. If
the sideband join fails while the conversation is still active, the task
sends a RealtimeEvent::Error through the existing events_tx / fanout
path.

To rephrase this:
* No longer blocked on sideband: the client can receive the SDP answer
earlier, set up the WebRTC peer connection, and let the media leg
progress while the sideband websocket joins.
* Still blocked on sideband: queued text, handoff output, and sideband
server events cannot flow until connect_webrtc_sideband(...).await
finishes and then run_realtime_input_task(...) starts

## Validation

- `env CODEX_SKIP_VENDORED_BWRAP=1 cargo test --manifest-path
codex-rs/Cargo.toml -p codex-core --test all
conversation_webrtc_start_posts_generated_session`

`CODEX_SKIP_VENDORED_BWRAP=1` is needed in this local environment
because `libcap.pc` is not installed for the vendored bubblewrap build.

## Testing
I tested this locally by running `cargo run -p codex-cli --bin codex --
--enable realtime_conversation` and invoking `/realtime`. Then, we get
logs emitted in `~/.codex/log/codex-tui.log`.

### Before the Change
Logging commit
(c0299e6edf)
```
2026-05-04T16:06:09.251956Z  INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: starting realtime conversation
2026-05-04T16:06:09.251980Z  INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: creating realtime call transport="webrtc"
2026-05-04T16:06:10.365722Z  INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: realtime call created; sdp answer ready transport="webrtc" call_id=rtc_u0_Dbq65nhak5eLjQZ73yhAy elapsed_ms=1113 total_elapsed_ms=1113
2026-05-04T16:06:10.365843Z  INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: connecting realtime sideband websocket call_id=rtc_u0_Dbq65nhak5eLjQZ73yhAy
2026-05-04T16:06:10.784528Z  INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: connected realtime sideband websocket call_id=rtc_u0_Dbq65nhak5eLjQZ73yhAy elapsed_ms=418 total_elapsed_ms=1532
2026-05-04T16:06:10.784665Z  INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: realtime conversation started
```

### After the Change
Logging commit
(c8b00ac21a)
```
2026-05-04T15:41:24.080363Z  INFO ... codex_core::realtime_conversation: starting realtime conversation
2026-05-04T15:41:24.080434Z  INFO ... codex_core::realtime_conversation: creating realtime call transport="webrtc"
2026-05-04T15:41:25.106906Z  INFO ... codex_core::realtime_conversation: realtime call created; sdp answer ready transport="webrtc" call_id=rtc_u0_Dbpi8nhak5eLjQZ73yhAy elapsed_ms=1026 total_elapsed_ms=1026
2026-05-04T15:41:25.107067Z  INFO ... codex_core::realtime_conversation: spawned realtime sideband connection task transport="webrtc" total_elapsed_ms=1026
2026-05-04T15:41:25.107160Z  INFO ... codex_core::realtime_conversation: realtime conversation started
2026-05-04T15:41:25.107185Z  INFO codex_core::realtime_conversation: connecting realtime sideband websocket call_id=rtc_u0_Dbpi8nhak5eLjQZ73yhAy
2026-05-04T15:41:25.107352Z  INFO ... codex_core::realtime_conversation: sent realtime sdp answer to client
2026-05-04T15:41:26.076685Z  INFO codex_core::realtime_conversation: connected realtime sideband websocket call_id=rtc_u0_Dbpi8nhak5eLjQZ73yhAy elapsed_ms=969 total_elapsed_ms=1996
2026-05-04T15:41:26.573893Z  INFO codex_core::realtime_conversation: realtime session updated realtime_session_id=sess_u0_Dbpi8nhak5eLjQZ73yhAy
2026-05-04T15:41:26.573970Z  INFO codex_core::realtime_conversation: received realtime conversation event event=SessionUpdated { ... }
```

### Conclusion
Here we see that we saved about a half a second in conversation startup
(1532ms -> 969ms). This also checks out with my sanity tests; I was
seeing at most a second of saving.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-04 22:28:14 +00:00
Felipe Coury
36912ce3de fix(tui): use shared paste burst interval on Windows (#18914)
## Summary

Fixes #11678 by removing the Windows-specific
`PASTE_BURST_CHAR_INTERVAL` override. Windows now uses the same `8ms`
paste-burst character interval as macOS and Linux, which removes the
extra per-character hold that made fast typing and key repeat feel
delayed on Windows.

The paste-burst heuristic itself is unchanged, and the Windows-specific
active idle timeout remains in place. This PR only restores the shared
character-to-character burst threshold that decides whether adjacent
plain character events are part of a paste.

## Motivation

PR #9348 raised the Windows character interval from `8ms` to `30ms` to
protect the multiline paste behavior tracked in #2137, where pasted
newlines could be interpreted as submits in Windows terminals. That
fixed the paste failure, but it also made ordinary typing visibly laggy
because the TUI waits briefly before flushing a single typed character
while it checks whether a paste burst is forming.

The deployed behavior here is to remove that Windows-only delay and
return to the cross-platform threshold. Manual Windows validation of the
critical VS Code integrated terminal path shows multiline paste still
works with the final `8ms` value, including testing on VS Code
`1.107.0`.

## Testing

- `cargo test -p codex-tui`
- Manual Windows validation in VS Code integrated PowerShell with the
final `8ms` interval
2026-05-04 20:39:11 +00:00
Michael Bolin
30de54da36 bazel: run sharded rust integration tests (#21057)
## Why

Bazel CI was not actually exercising some sharded Rust integration-test
targets on macOS. The `rules_rust` sharding wrapper expects a symlink
runfiles tree, but this repo runs Bazel with `--noenable_runfiles`. In
that configuration the wrapper could fail to find the generated test
binary, produce an empty test list, and exit successfully. That made
targets such as `//codex-rs/core:core-all-test` look green even when
Cargo CI could still catch failures in the same Rust tests.

The coverage gap appears to have been introduced by
[#18082](https://github.com/openai/codex/pull/18082), which enabled
rules_rust native sharding on `//codex-rs/core:core-all-test` and the
other large Rust test labels. The manifest-runfiles setup itself
predates that change in
[#10098](https://github.com/openai/codex/pull/10098), but #18082 is
where the affected integration tests started running through the
incompatible rules_rust sharding wrapper.
[#18913](https://github.com/openai/codex/pull/18913) fixed the same
class of issue for wrapped unit-test shards, but integration-test shards
were still going through the rules_rust wrapper until this PR.

We still do not have the V8/code-mode pieces stable under the Bazel CI
cross-compile setup, so this keeps those tests out of Bazel while
restoring coverage for the rest of the sharded Rust integration suites.
Cargo CI remains responsible for V8/code-mode coverage for now.

This change did uncover a real failing core test on `main`:
`approved_folder_write_request_permissions_unblocks_later_apply_patch`.
That fix is split into
[#21060](https://github.com/openai/codex/pull/21060), which enables the
`apply_patch` tool in the test, teaches the aggregate core test binary
to dispatch the sandboxed filesystem helper, canonicalizes the macOS
temp patch target, and isolates the core test harness from managed
local/enterprise config. Keeping that fix separate lets this PR stay
focused on restoring Bazel coverage while documenting the first failure
it exposed.

## What changed

- Build sharded Rust integration tests as manual `*-bin` binaries and
run them through the existing manifest-aware `workspace_root_test`
launcher.
- Keep Bazel sharding on the launcher target so Rust test cases are
still distributed by stable test-name hashing.
- Configure Bazel CI to skip Rust tests whose names contain
`suite::code_mode::`.
- Exclude the standalone `codex-rs/code-mode` and `codex-rs/v8-poc`
unit-test targets from `bazel.yml`.

## Verification

- `bazel query --output=build //codex-rs/core:core-all-test` now shows
`workspace_root_test` wrapping `//codex-rs/core:core-all-test-bin`.
- `bazel test --test_output=all --nocache_test_results
--test_sharding_strategy=disabled //codex-rs/core:core-all-test
--test_filter=suite::request_permissions_tool::approved_folder_write_request_permissions_unblocks_later_apply_patch`
runs the actual Rust test body and passes.
- `bazel test --test_output=errors --nocache_test_results
--test_env=CODEX_BAZEL_TEST_SKIP_FILTERS=suite::code_mode::
//codex-rs/core:core-all-test` runs the sharded target with code-mode
skipped and passes overall locally, with one flaky attempt retried by
the existing `flaky = True` setting.
2026-05-04 13:33:14 -07:00
Felipe Coury
87d2235b54 fix(tui): support modified backspace/delete keys (#21058)
## Why

Fixes #21046.

Codex TUI 0.128.0 can show Backspace/Delete-related editor shortcuts in
`/keymap`, but Windows-style modified Backspace/Delete events were still
dropped by the composer because the default editor keymap did not
include those modified special-key variants. On Windows/CMD this meant
`Shift+Backspace` and `Shift+Delete` did not fall through to normal
character deletion, and `Ctrl+Backspace` / `Ctrl+Delete` did not perform
the word deletion users expect from Windows text inputs.

## What Changed

- Added default editor bindings for `shift-backspace` and `shift-delete`
so shifted delete keys keep normal grapheme deletion behavior.
- Added default editor bindings for `ctrl-backspace`,
`ctrl-shift-backspace`, `ctrl-delete`, and `ctrl-shift-delete` so
Windows-style word deletion works when terminals preserve those
modifiers.
- Added regression coverage for the resolved default keymap and textarea
behavior.

## How to Test

1. Start Codex in the TUI on Windows CMD or another terminal that
reports modified Backspace/Delete keys distinctly.
2. Type `hello world` in the composer.
3. Press `Ctrl+Backspace`; confirm `world` is removed and `hello `
remains.
4. Type `world` again, move the cursor before it, then press
`Ctrl+Delete`; confirm the next word is removed.
5. Type a few characters and press `Shift+Backspace` and `Shift+Delete`;
confirm they delete one character in the expected direction instead of
doing nothing.
6. Open `/keymap`, inspect the Editor deletion actions, and confirm the
modified Backspace/Delete aliases are visible as configurable defaults.

Targeted tests:
- `cargo test -p codex-tui keymap::tests`
- `cargo test -p codex-tui bottom_pane::textarea::tests`
- `cargo test -p codex-tui keymap_setup::tests`
2026-05-04 17:16:41 -03:00
charley-openai
a6599b8202 Add reasoning effort to turn tracing spans (#20060)
Why
#19432 added token usage to the turn and response spans. This follow-up
adds the configured reasoning effort so performance traces can be
filtered by model effort.

[example
trace](https://openai.datadoghq.com/apm/trace/1ff708a87159ff4898bdc8bd6091ec18?graphType=waterfall&shouldShowLegend=true&spanID=6596351544047485652&traceQuery=)
<img width="533" height="434" alt="Screenshot 2026-04-28 at 3 52 12 PM"
src="https://github.com/user-attachments/assets/77ef32fc-d7cd-4eec-87b4-26c6798f1af8"
/>


What Changed
- Adds `codex.turn.reasoning_effort` to the turn span.
- Adds `codex.request.reasoning_effort` to `handle_responses`.
- Extends the span test to cover explicit `high` effort with token
usage.

Testing
- `cargo test -p codex-core
turn_and_completed_response_spans_record_token_usage`
- `cargo test -p codex-otel`
- `just fmt`
- `just fix -p codex-core`
- `just fix -p codex-otel`
2026-05-04 12:57:05 -07:00
Michael Bolin
229b40aa21 core: fix apply_patch request permissions test (#21060)
## Why

The Bazel test coverage change exposed
`approved_folder_write_request_permissions_unblocks_later_apply_patch`,
and `rust-ci-full.yml` showed the same test failing on `main` on macOS.
There were two separate classes of problems here.

### Clean CI failure

The test emits an `apply_patch` tool call, but its config did not enable
the `apply_patch` tool, so the mocked response completed without an
`apply-patch-call` output. After enabling the tool, the same path also
needs the aggregate `codex-core` test binary to dispatch
`--codex-run-as-fs-helper`; sandboxed `apply_patch` uses that helper
under macOS Seatbelt.

The test now also canonicalizes the temporary patch target before
building the patch payload so the path matches normalized grants on
macOS, where `/var` paths often normalize to `/private/var`.

### Local/enterprise config isolation

The core test harness now builds its default test config with managed
config disabled, so host-managed enterprise config cannot alter these
tests. The request-permissions turns in this test also explicitly use
the user reviewer path, keeping the assertions focused on
`request_permissions` behavior rather than reviewer defaults from the
host.

## What Changed

- Enable `apply_patch` in
`approved_folder_write_request_permissions_unblocks_later_apply_patch`.
- Teach the core integration test binary to dispatch
`CODEX_FS_HELPER_ARG1`, matching the existing apply-patch and
linux-sandbox dispatch paths.
- Canonicalize the tempdir-backed patch target before creating the
patch.
- Ignore managed config in default core test configs and explicitly pin
this test to `ApprovalsReviewer::User`.

## Verification

Run outside the Codex app sandbox because these macOS tests
intentionally spawn Seatbelt:

- `cargo test -p codex-core
approved_folder_write_request_permissions_unblocks_later_apply_patch`
- `cargo test -p codex-core
approved_folder_write_request_permissions_unblocks_later_exec_without_sandbox_args`
2026-05-04 12:48:59 -07:00
sayan-oai
8126af3879 core: preserve last model ids in feedback tags (#21026)
## Why

Feedback reports do not currently surface a direct pointer to the last
model call, so investigations may require searching through many
requests in a session to find the bad response. Preserve the last
model-side IDs at response-stream time so immediate feedback reports
carry that breadcrumb.

## What changed

- Record `last_model_request_id` when a Responses stream exposes an
upstream request ID.
- Record `last_model_response_id` when the model response completes.
- Add unit coverage for the emitted feedback tags.

## Verification

- `cargo test -p codex-core
client::tests::response_stream_records_last_model_feedback_ids`
2026-05-04 12:46:08 -07:00
sayan-oai
b9e8df47da Use MCP server instructions in deferred namespace descriptions (#21053)
## Why

MCP servers can provide `instructions` that explain what their tools are
for. Directly exposed MCP namespaces already use those instructions when
a connector description is not available, but deferred `tool_search`
results did not preserve that fallback. The direct path falls back from
connector metadata to server instructions, while the deferred path only
carried `connector_description` and otherwise fell back to generic
namespace text.

That meant a plain MCP server could provide useful model-facing guidance
and still appear as `Tools in the X namespace.` whenever it was
discovered lazily through `tool_search`.

## What changed

- Store one model-facing `namespace_description` on `ToolInfo`, using
connector descriptions for connector-backed tools and server
instructions for plain MCP servers.
- Thread that namespace description through the `tool_search` source
list, search indexing, and returned namespace metadata.
- Add an end-to-end regression test for deferred non-app MCP search
results exposing server instructions as the namespace description.

## Verification

- `cargo test -p codex-tools
search_tool_description_lists_each_mcp_source_once --lib`
- `cargo test -p codex-core --test all
tool_search_uses_non_app_mcp_server_instructions_as_namespace_description`
2026-05-04 19:36:07 +00:00
Felipe Coury
48402be6fa feat(tui): improve TUI keymap coverage (#20798)
## Summary
- normalize terminal-emitted C0 control characters through configurable
editor keymaps, covering raw control-key fallbacks like
Shift+Enter-as-LF in terminals from #20555 and #20898, plus part of the
modified-Enter behavior in #20580
- add default-unbound keymap actions for toggling Fast mode and killing
the current composer line, giving #20698 users a configurable zsh-style
Ctrl+U option without changing the existing default Ctrl+U behavior
- wire the new actions through gated /keymap picker entries, schema
generation, and snapshot coverage

Fixes #20555.
Fixes #20898.

## Testing
- just write-config-schema
- just fmt
- cargo test -p codex-config
- cargo test -p codex-tui keymap::tests
- cargo test -p codex-tui bottom_pane::textarea::tests
- cargo test -p codex-tui keymap_setup::tests
- cargo insta pending-snapshots
- just fix -p codex-tui
- git diff --check
- just argument-comment-lint
2026-05-04 19:18:56 +00:00
Felipe Coury
cc16995cc6 feat(tui): add PR summary statusline items (#20892)
## Why?

The Codex App already exposes branch and PR context in its
branch-details UI. This brings the same context into the CLI footer as
opt-in statusline items, so users can choose the extra signal without
making the default footer busier.

## What?

Add optional `pull-request-number` and `branch-changes` items to the
configurable TUI status line.

- `pull-request-number` shows the open PR for the current checkout and
renders as a clickable terminal hyperlink when OSC 8 links are
supported.
- `branch-changes` shows committed additions/deletions against the
repository default branch, or `No changes` when the branch has no
committed diff.

<img width="1257" height="261" alt="CleanShot 2026-05-03 at 20 44 15"
src="https://github.com/user-attachments/assets/10b4380b-c3e9-4729-9ee1-3f742068fa47"
/>

## Architecture

This follows the same client/app-server split as the Codex App: the TUI
owns presentation, caching, and optional rendering, while
workspace-sensitive `git` and `gh` discovery runs through app-server.

The new TUI-local `workspace_command` layer sends bounded,
non-interactive `command/exec` requests to the active app-server. That
makes the implementation remote-friendly: the TUI does not decide
whether commands run in an embedded local workspace or a remote
workspace, and it does not bypass app-server sandbox or permission
policy.

The branch summary logic stays internal to `codex-tui` because this PR
only needs TUI statusline behavior. The command boundary is still
isolated behind `WorkspaceCommandExecutor`, so the lookup code can be
lifted or reused later without changing statusline rendering.

## How?

- Add a TUI `WorkspaceCommandExecutor` abstraction backed by app-server
`command/exec`.
- Add branch summary probes for:
  - current branch name,
  - open PR metadata,
  - committed branch diff stats against the default branch.
- Prefer remote-tracking default branch refs for diff stats, avoiding
stale or absent local `main` branches.
- Resolve PRs with `gh pr view` first, then fall back to
commit-associated PR lookup across parent/fork repos.
- Add `/statusline` picker entries, preview values, rendering, and OSC 8
clickable PR links.
- Keep all probes best-effort so missing `git`, missing `gh`, auth
failures, or non-git directories hide optional items instead of
surfacing footer errors.

## Validation

- `cargo test -p codex-tui branch_summary -- --nocapture`
- Snapshot coverage for the `/statusline` preview/setup rendering paths
- Hyperlink rendering coverage for clickable PR statusline cells
2026-05-04 16:11:15 -03:00
Owen Lin
c2fed01550 rollout: store web search and mcp tool calls (#21054)
Codex App would like these.
2026-05-04 18:54:20 +00:00
Ruslan Nigmatullin
4d201e340e state: pass state db handles through consumers (#20561)
## Why

SQLite state was still being opened from consumer paths, including lazy
`OnceCell`-backed thread-store call sites. That let one process
construct multiple state DB connections for the same Codex home, which
makes SQLite lock contention and `database is locked` failures much
easier to hit.

State DB lifetime should be chosen by main-like entrypoints and tests,
then passed through explicitly. Consumers should use the supplied
`Option<StateDbHandle>` or `StateDbHandle` and keep their existing
filesystem fallback or error behavior when no handle is available.

The startup path also needs to keep the rollout crate in charge of
SQLite state initialization. Opening `codex_state::StateRuntime`
directly bypasses rollout metadata backfill, so entrypoints should
initialize through `codex_rollout::state_db` and receive a handle only
after required rollout backfills have completed.

## What Changed

- Initialize the state DB in main-like entrypoints for CLI, TUI,
app-server, exec, MCP server, and the thread-manager sample.
- Pass `Option<StateDbHandle>` through `ThreadManager`,
`LocalThreadStore`, app-server processors, TUI app wiring, rollout
listing/recording, personality migration, shell snapshot cleanup,
session-name lookup, and memory/device-key consumers.
- Remove the lazy local state DB wrapper from the thread store so
non-test consumers use only the supplied handle or their existing
fallback path.
- Make `codex_rollout::state_db::init` the local state startup path: it
opens/migrates SQLite, runs rollout metadata backfill when needed, waits
for concurrent backfill workers up to a bounded timeout, verifies
completion, and then returns the initialized handle.
- Keep optional/non-owning SQLite helpers, such as remote TUI local
reads, as open-only paths that do not run startup backfill.
- Switch app-server startup from direct
`codex_state::StateRuntime::init` to the rollout state initializer so
app-server cannot skip rollout backfill.
- Collapse split rollout lookup/list APIs so callers use the normal
methods with an optional state handle instead of `_with_state_db`
variants.
- Restore `getConversationSummary(ThreadId)` to delegate through
`ThreadStore::read_thread` instead of a LocalThreadStore-specific
rollout path special case.
- Keep DB-backed rollout path lookup keyed on the DB row and file
existence, without imposing the filesystem filename convention on
existing DB rows.
- Verify readable DB-backed rollout paths against `session_meta.id`
before returning them, so a stale SQLite row that points at another
thread's JSONL falls back to filesystem search and read-repairs the DB
row.
- Keep `debug prompt-input` filesystem-only so a one-off debug command
does not initialize or backfill SQLite state just to print prompt input.
- Keep goal-session test Codex homes alive only in the goal-specific
helper, rather than leaking tempdirs from the shared session test
helper.
- Update tests and call sites to pass explicit state handles where DB
behavior is expected and explicit `None` where filesystem-only behavior
is intended.

## Validation

- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p
codex-rollout -p codex-thread-store -p codex-app-server -p codex-core -p
codex-tui -p codex-exec -p codex-cli --tests`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-rollout state_db_`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-rollout find_thread_path`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-rollout find_thread_path -- --nocapture`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-rollout try_init_ -- --nocapture`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-rollout`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo clippy -p
codex-rollout --lib -- -D warnings`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-thread-store
read_thread_falls_back_when_sqlite_path_points_to_another_thread --
--nocapture`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-thread-store`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
shell_snapshot`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
--test all personality_migration`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
--test all rollout_list_find`
- `RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
--test all rollout_list_find::find_prefers_sqlite_path_by_id --
--nocapture`
- `RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
--test all rollout_list_find -- --nocapture`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
interrupt_accounts_active_goal_before_pausing`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-app-server get_auth_status -- --test-threads=1`
- `CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-app-server --lib`
- `CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p codex-rollout
-p codex-app-server --tests`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout
-p codex-thread-store -p codex-core -p codex-app-server -p codex-tui -p
codex-exec -p codex-cli`
- `CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout -p
codex-app-server`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p
codex-rollout`
- `CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-core`
- `just argument-comment-lint -p codex-core`
- `just argument-comment-lint -p codex-rollout`

Focused coverage added in `codex-rollout`:

- `recorder::tests::state_db_init_backfills_before_returning` verifies
the rollout metadata row exists before startup init returns.
- `state_db::tests::try_init_waits_for_concurrent_startup_backfill`
verifies startup waits for another worker to finish backfill instead of
disabling the handle for the process.
-
`state_db::tests::try_init_times_out_waiting_for_stuck_startup_backfill`
verifies startup does not hang indefinitely on a stuck backfill lease.
-
`tests::find_thread_path_accepts_existing_state_db_path_without_canonical_filename`
verifies DB-backed lookup accepts valid existing rollout paths even when
the filename does not include the thread UUID.
-
`tests::find_thread_path_falls_back_when_db_path_points_to_another_thread`
verifies DB-backed lookup ignores a stale row whose existing path
belongs to another thread and read-repairs the row after filesystem
fallback.

Focused coverage updated in `codex-core`:

- `rollout_list_find::find_prefers_sqlite_path_by_id` now uses a
DB-preferred rollout file with matching `session_meta.id`, so it still
verifies that valid SQLite paths win without depending on stale/empty
rollout contents.

`cargo test -p codex-app-server thread_list_respects_search_term_filter
-- --test-threads=1 --nocapture` was attempted locally but timed out
waiting for the app-server test harness `initialize` response before
reaching the changed thread-list code path.

`bazel test //codex-rs/thread-store:thread-store-unit-tests
--test_output=errors` was attempted locally after the thread-store fix,
but this container failed before target analysis while fetching `v8+`
through BuildBuddy/direct GitHub. The equivalent local crate coverage,
including `cargo test -p codex-thread-store`, passes.

A plain local `cargo check -p codex-rollout -p codex-app-server --tests`
also requires system `libcap.pc` for `codex-linux-sandbox`; the
follow-up app-server check above used `CODEX_SKIP_VENDORED_BWRAP=1` in
this container.
2026-05-04 11:46:03 -07:00
starr-openai
0035d7bd18 Add stdio exec-server listener (#20663)
## Why

This stack adds configured exec-server environments, including
environments reached over stdio. Before client-side stdio transports or
config can use that path, the exec-server binary itself needs a
first-class stdio listen mode so it can speak the same JSON-RPC protocol
over stdin/stdout that it already speaks over websockets.

**Stack position:** this is PR 1 of 5. It is the server-side transport
foundation for the stack.

## What Changed

- Accept `stdio` and `stdio://` for `codex exec-server --listen`.
- Promote the existing stdio `JsonRpcConnection` helper from test-only
code into normal exec-server transport code.
- Add parse coverage for stdio listen URLs while preserving the existing
websocket default.

## Stack

- **1. This PR:** https://github.com/openai/codex/pull/20663 - Add stdio
exec-server listener
- 2. https://github.com/openai/codex/pull/20664 - Add stdio exec-server
client transport
- 3. https://github.com/openai/codex/pull/20665 - Make environment
providers own default selection
- 4. https://github.com/openai/codex/pull/20666 - Add CODEX_HOME
environments TOML provider
- 5. https://github.com/openai/codex/pull/20667 - Load configured
environments from CODEX_HOME

Split from original draft: https://github.com/openai/codex/pull/20508

## Validation

Not run locally; this was split out of the original draft stack.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-04 11:40:03 -07:00
iceweasel-oai
5d5500650b Fix Windows PTY teardown by preserving ConPTY ownership (#20685)
## Why

On Windows, background terminals could stay visible after their shell
process had already exited. The elevated runner waits for the PTY output
reader to reach EOF before it sends the final exit message, but the
ConPTY helper was reducing ownership down to raw handles too early. That
left the pseudoconsole's borrowed pipe handles alive past teardown, so
EOF never propagated and the session stayed `running`.

## What changed

- change `utils/pty/src/win/conpty.rs` to hand off owned ConPTY
resources instead of leaking only raw handles
- make `windows-sandbox-rs/src/conpty/mod.rs` keep the pseudoconsole
owner and the backing pipe handles together until teardown
- update the elevated runner and the legacy unified-exec backend to keep
that `ConptyInstance` alive, take only the specific pipe handles they
need, and drop the owner at teardown instead of trying to close a
detached pseudoconsole handle later

## Testing

- desktop app in `Auto-review`: 11 x `cmd /c "ping -n 3 google.com"` all
exited cleanly and did not accumulate in the UI
- desktop app in `Auto-review`: 5 x `cmd /c "ping -n 30 google.com"`
appeared in the UI and drained back out on their own
2026-05-04 18:40:00 +00:00
starr-openai
905987c08f Prepare selected environment plumbing (#20669)
## Why
This is a prep PR in the multi-environment process-tool stack. It
separates ownership/config cleanup from the behavior change that teaches
process tools to route by selected environment, so the follow-up PR can
focus on model-facing `environment_id` behavior.

## Stack
1. https://github.com/openai/codex/pull/20646 - `EnvironmentContext`
rendering for selected environments
2. https://github.com/openai/codex/pull/20669 - selected-environment
ownership and tool config prep (this PR)
3. https://github.com/openai/codex/pull/20647 - process-tool
`environment_id` routing

## What Changed
- keep the resolved turn environment list wrapped in
`ResolvedTurnEnvironments` through `TurnContext` instead of unwrapping
it back to a raw `Vec`
- add `TurnContext::resolve_path_against` so cwd-relative path
resolution has one shared helper
- replace the old tool config boolean with `ToolEnvironmentMode::{None,
Single, Multiple}`

## Testing
- Tests not run locally; this prep refactor is covered by GitHub CI for
the stack.

Co-authored-by: Codex <noreply@openai.com>
2026-05-04 17:55:49 +00:00
Won Park
5c1ec8f4fd tui: retire /approvals and rename /autoreview to /approve (#21034)
## Why

The TUI currently exposes overlapping command names for the same
permissions flow: `/permissions` and the older `/approvals` alias. It
also uses `/autoreview` for the manual retry flow, even though the
action users take there is approving one denied auto-review request.

This change makes the command surface consistent with the hard rebrand:
- `/permissions` is the only command for permission settings.
- `/approve` is the command for approving a recent auto-review denial.

## What changed

- Removed the legacy `/approvals` slash command and its dispatch path.
- Kept `/permissions` as the single permissions command shown and
accepted by the TUI.
- Renamed the auto-review denial command from `/autoreview` to
`/approve`.
- Updated nearby comments so they refer to `/permissions` rather than
the retired `/approvals` name.

## Verification

- Updated the slash-command unit test to assert that `AutoReview` now
renders and parses as `approve`.
2026-05-04 17:50:34 +00:00
Felipe Coury
94800ecbbf feat(tui): add keymap debug inspector (#20794)
## Why

We constantly get bug reports about keys not being recognized by Codex
when the terminal is not handling the key press. Running `/keymap debug`
or `/keymap` and going to the Debug tab, we can allow the user to either
understand that the key being pressed is not being recognized or to
check what it's being recognized as and report or reassign that key.

| Menu | Inspector | Hint |
|---|---|---|
| <img width="1369" height="796" alt="CleanShot 2026-05-02 at 12 57 12"
src="https://github.com/user-attachments/assets/512b6faa-344e-4aee-9c00-b4bdc633a662"
/> | <img width="1261" height="754" alt="CleanShot 2026-05-02 at 12 56
36"
src="https://github.com/user-attachments/assets/a6ddae7d-e174-4ee4-893f-e6bec4fff4ab"
/> | <img width="1369" height="796" alt="CleanShot 2026-05-02 at 12 57
30"
src="https://github.com/user-attachments/assets/db507784-f40a-4cff-ac23-a61d9703769b"
/> |
## Summary
- add a Debug tab to `/keymap` and support `/keymap debug` for direct
access
- show what key Codex receives, the config key representation, raw event
details, and matching actions
- add a progressive missing-key hint that escalates after a few seconds
with no detected keypress

## Validation
- `just fmt`
- `cargo test -p codex-tui keymap_setup::tests::debug_view`
- `cargo test -p codex-tui keymap_setup::tests`
- `cargo test -p codex-tui slash_keymap`
- `cargo test -p codex-tui` (unit tests passed; integration test
`suite::model_availability_nux::resume_startup_does_not_consume_model_availability_nux_count`
failed locally by itself with `codex resume` exiting 1 and terminal
probe escape output)
- `just fix -p codex-tui`
- `just argument-comment-lint`
- `cargo insta pending-snapshots`
- `git diff --check`
2026-05-04 14:40:50 -03:00
viyatb-oai
5b80f87c97 fix(linux-sandbox): fall back when system bwrap lacks perms (#20628)
## Why

Codex `0.128` started using `--perms` in more routine Linux sandbox
construction when protected workspace metadata mounts landed in #19852.
Upstream bubblewrap added `--perms` in `v0.5.0`, so system `bwrap`
versions older than that, including the `v0.4.0` and `v0.4.1` family, do
not support the flag. The launcher still selected those binaries as long
as they existed on `PATH`.

That means affected hosts can fail every sandboxed command up front
with:

```text
bwrap: Unknown option --perms
```

The reports in #20590 and duplicate #20623 match that compatibility gap;
#20623 explicitly shows system bubblewrap `0.4.0`.

## What changed

- Replace the single `--argv0` probe with a small system-bwrap
capability probe in `codex-rs/linux-sandbox/src/launcher.rs`.
- Continue using the old-system `--argv0` compatibility path when
needed, but only select a system `bwrap` if it also advertises
`--perms`.
- Fall back to the vendored `bwrap` when the system binary is too old
for the flags Codex now requires.
- Add regression coverage for the old-system-bwrap case so binaries
without `--perms` stay on the vendored path.

## Verification

- Added `falls_back_to_vendored_when_system_bwrap_lacks_perms` to cover
the reported compatibility gap.
- Ran `cargo test -p codex-linux-sandbox` and `cargo clippy -p
codex-linux-sandbox --tests` locally. On macOS, the crate builds but its
Linux-only tests are cfg-gated out, so the new regression test still
needs Linux CI or a Linux devbox run for real execution coverage.

## Related issues

- Fixes #20590
- Duplicate report: #20623
2026-05-04 10:38:31 -07:00
Owen Lin
541e99cf09 feat(app-server): always return limited thread history (#20682)
## Why

Whenever we return a thread's history (turns and items) over app-server,
always return the limited form as specified by the rollout policy
`EventPersistenceMode::Limited`, even if the thread was previously
started with `EventPersistenceMode::Extended`.

We're finding it is quite unscalable to be returning the extended
history, so let's apply the same filtering logic of the rollout policy
when we load and return the thread's history.

## What Changed

- Reuse the rollout persistence policy when reconstructing app-server
`ThreadItem` history so only `EventPersistenceMode::Limited` rollout
items are replayed into API turns.
- Route `thread/read`, `thread/resume`, `thread/fork`,
`thread/turns/list`, and rollback responses through the same filtered
app-server history projection.
- Keep live active turns intact when composing a response for a
currently running thread.
- Update command execution coverage so persisted extended command events
are excluded from returned history for `thread/read`, `thread/fork`, and
`thread/turns/list`.

## Test Plan

- `cargo test -p codex-app-server limited`
- `cargo test -p codex-app-server thread_shell_command`
- `cargo test -p codex-app-server thread_read`
- `cargo test -p codex-app-server thread_rollback`
- `cargo test -p codex-app-server thread_fork`
- `cargo test -p codex-app-server-protocol`
2026-05-04 10:37:35 -07:00
Matthew Zeng
1b900bee8a Unify skip-review handling for approval_mode = "approve" (#20750)
## Summary
- Treat `approval_mode = "approve"` as skip-review across all permission
modes.
- Remove the mode-specific split in the MCP auto-approval gate so
approved tools bypass review consistently.
- Expand regression coverage in the shared MCP helper and the core
tool-call flow.

## Testing
- `just fmt`
- `cargo test -p codex-mcp`
- `cargo test -p codex-core
approve_mode_skips_arc_and_guardian_in_every_permission_mode`
- `git diff --check`
- Full `cargo test -p codex-core` was also attempted, but the suite hit
an unrelated pre-existing stack overflow in an existing multi-agent test
2026-05-04 10:30:47 -07:00
Matthew Zeng
83a4e3b66b [mcp-apps] Persist MCP Apps specific tool call end event. (#20853)
- [x] Persist a special type of MCP tool calls for triggering MCP App,
this type of mcp tool calls has 'mcpAppResourceUri` set. These events
are needed so that the Codex App can correctly render the MCP App after
resume.
2026-05-04 10:20:58 -07:00
jif-oai
e3451ce6be core: share responses request builder with compact requests (#20989)
## Why

`ModelClientSession` and `compact_conversation_history()` were still
rebuilding the same `ResponsesApiRequest` fields separately. That
duplication makes it easy for normal `/responses` turns and compact
requests to drift when request-shape changes land later, which is
exactly the kind of cache-affecting divergence we want to avoid.

This follow-up keeps the scope small by extracting the shared
request-construction logic into one helper and using it from both paths.

## What changed

- move `ResponsesApiRequest` construction into a shared
`ModelClient::build_responses_request(...)` helper in
`core/src/client.rs`
- update the normal `/responses` streaming path to call that helper
instead of the old `ModelClientSession`-local implementation
- update `compact_conversation_history()` to derive its compact payload
from the same helper so `model`, `instructions`, `input`, `tools`,
`parallel_tool_calls`, `reasoning`, and `text` stay aligned with normal
request building
- add a unit test covering the shared helper's prompt cache key,
installation metadata, and `service_tier` behavior

## Verification

- `cargo test -p codex-core
build_responses_request_sets_shared_cache_and_metadata_fields`
- `cargo test -p codex-core --test all
remote_compact_v2_reuses_context_compaction_for_followups`

## Docs

No docs update needed.
2026-05-04 17:18:38 +00:00
jif-oai
4fd7dfe223 memories-mcp: reject symlink traversal in local backend (#21010)
## Why

The local memories MCP backend only rejected symlinks after resolving
the final path. That left room for scoped requests like
`skills/secret.md` to walk through a symlinked ancestor directory and
escape the configured memories root.

This change also makes missing scoped paths fail explicitly instead of
looking like an empty `list` / `search` result or a `NotFile` read
error.

## What Changed

- walk each scoped path component in
`LocalMemoriesBackend::resolve_scoped_path` and reject symlinked
ancestors before accessing the target
- reject scoped paths that traverse through a non-directory intermediate
component
- add a `NotFound` backend error for missing `read`, `list`, and
`search` paths and map it through the MCP server error conversion
- add coverage for missing paths and symlinked ancestor directories in
`codex-rs/memories/mcp/src/local_tests.rs`

## Testing

- added unit coverage in `codex-rs/memories/mcp/src/local_tests.rs` for
missing paths and symlinked ancestor directories across `read`, `list`,
and `search`
2026-05-04 18:40:28 +02:00
jif-oai
f20f8a719e memories/mcp: generate tool schemas with schemars (#21012)
## Why

The memories MCP server currently keeps handwritten JSON Schema beside
the Rust types that actually serialize and deserialize the tool
payloads:
[`schema.rs`](2f5c06a29c/codex-rs/memories/mcp/src/schema.rs (L4-L133)),
[`server.rs`](2f5c06a29c/codex-rs/memories/mcp/src/server.rs (L44-L75)),
and
[`backend.rs`](2f5c06a29c/codex-rs/memories/mcp/src/backend.rs (L41-L117)).
That duplicates the tool contract and makes schema drift easier as the
API evolves.

## What changed

- derive `JsonSchema` for the memories tool arguments, responses, and
nested response types
- replace the handwritten schema builders with shared `schemars`
generation
- preserve the existing wire shape while generating schemas, including
nullable output `Option` fields and non-nullable optional input fields
- wire the `list`, `read`, and `search` tools to the generated schemas

## Verification

- CI pending
2026-05-04 18:40:17 +02:00
jif-oai
161541310f typo (#21023) 2026-05-04 18:39:46 +02:00
pakrym-oai
33b19bcfde [codex] Split app-server request processors (#20940)
## Why

The app-server request path had grown around a large
`CodexMessageProcessor` plus separate API wrapper/helper modules. That
made the dependency graph hard to see and forced unrelated request
families to share broad processor state.

This PR makes the split mechanical and command-prefix oriented so
request families own only the dependencies they use.

## What changed

- Replaced `CodexMessageProcessor` with command-prefix request
processors under `app-server/src/request_processors/`.
- Removed the old config, device-key, external-agent-config, and fs API
wrapper files by moving their API handling into processors.
- Split apps, plugins, marketplace, catalog, account, MCP, command exec,
fs, git, feedback, thread, turn, thread goals, and Windows sandbox
handling into dedicated processors.
- Kept shared lifecycle, summary conversion, token usage replay, and
shared error mapping only where multiple processors use them; single-use
helpers were inlined into their owning processor.
- Removed the fallback processor path and moved processor tests to
`_tests` files.

## Validation

- `cargo test -p codex-app-server`
- `cargo check -p codex-app-server`
- `just fix -p codex-app-server`
2026-05-04 09:34:11 -07:00
Eric Traut
12a729f2b2 Keep paused goals paused on thread resume (#20790)
## Summary

Early adopters of the `/goal` feature have provided feedback that they
expect a goal they explicitly paused to remain paused when they resume a
thread. Previously, resuming a thread would reactivate a paused goal.

This PR keeps persisted goal status unchanged during thread resume. This
honors the user feedback while also simplifying the core goal logic.

Rather than have the core logic automatically resume a paused goal, that
responsibility is transferred to the client. The TUI now detects a
resumed thread with a paused goal and asks the user whether to `Resume
goal` or `Leave paused`. The prompt appears only for quiet resume flows,
so users who resume with an immediate prompt are not interrupted.

<img width="544" height="111" alt="image"
src="https://github.com/user-attachments/assets/0ac9de1c-6ee6-47ba-b223-c03c8eb4c192"
/>
2026-05-04 09:04:30 -07:00
Eric Traut
f072119ccf Speed up /side parent restore replay (#20815)
## Why

Returning from a `/side` conversation restores the parent thread by
replaying its snapshot into the TUI. For very long parent threads,
replaying every transcript row can take noticeable time even though most
rows immediately scroll out of terminal history.

## What Changed

- Buffer thread-switch replay for parent restores when terminal resize
reflow is enabled.
- Reuse the existing resize-reflow tail renderer so only the retained
transcript tail is written back to scrollback when a row cap is
configured.
2026-05-04 09:00:30 -07:00
Eric Traut
3c2dcbef85 Keep paused goals paused on thread resume (#20790)
## Summary

Early adopters of the `/goal` feature have provided feedback that they
expect a goal they explicitly paused to remain paused when they resume a
thread. Previously, resuming a thread would reactivate a paused goal.

This PR keeps persisted goal status unchanged during thread resume. This
honors the user feedback while also simplifying the core goal logic.

Rather than have the core logic automatically resume a paused goal, that
responsibility is transferred to the client. The TUI now detects a
resumed thread with a paused goal and asks the user whether to `Resume
goal` or `Leave paused`. The prompt appears only for quiet resume flows,
so users who resume with an immediate prompt are not interrupted.

<img width="544" height="111" alt="image"
src="https://github.com/user-attachments/assets/0ac9de1c-6ee6-47ba-b223-c03c8eb4c192"
/>
2026-05-04 08:58:07 -07:00
jif-oai
2f5c06a29c nit: legacy (#21006) 2026-05-04 16:04:29 +02:00
jif-oai
8ba294ea13 feat: support multi-query memories search (#21004)
## Why
The memories MCP `search` tool only accepts a single substring today,
which makes it hard for clients to express combined queries or explain
why a line matched. This change adds the richer search shape needed for
the next client iteration while keeping the legacy single-`query` call
working.

## What changed
- accept either the legacy `query` field or a new `queries` array, plus
`match_mode: any|all`
- teach the local memories backend to evaluate multi-query line matches
and return `matched_queries` on each hit
- update the MCP input/output schema and add coverage for parser
behavior, ordering, pagination, case sensitivity, and match modes

## Testing
- added unit coverage in `memories/mcp/src/local_tests.rs` and
`memories/mcp/src/server.rs`
2026-05-04 15:55:06 +02:00
jif-oai
5512b23c95 nit: renaming (#20998) 2026-05-04 15:43:58 +02:00
jif-oai
0269a46ab1 feat: add context lines to memories MCP search (#20997)
## Why

The paginated memories MCP `search` tool still returned only the
matching line text, which made it harder for clients to present useful
search results or decide whether they needed to follow up with a
separate `read` call. Adding a small amount of surrounding context makes
individual hits much more usable while keeping the search response
deterministic and line-addressable.

## What changed

- add an optional `context_lines` search argument and thread it through
the MCP server into the local memories backend
- change search matches to return the matched `line_number` plus a
`start_line_number` and multi-line `content` block for the requested
context window
- update the search tool schema and description to document the new
request/response shape
- extend the local backend tests to cover zero-context matches,
contextual results, pagination, and invalid cursors that point past the
end of the result set

## Testing

- Added targeted unit coverage in `memories/mcp/src/local_tests.rs`
- GitHub Actions are running for the branch

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-04 15:32:57 +02:00
jif-oai
554223ab80 feat: paginate memories MCP search results (#20996)
## Why

The memories MCP `search` tool previously stopped once it hit
`max_results`, so callers could tell there were more matches via
`truncated` but had no way to fetch the rest of the result set. That
made large searches awkward for clients that need to keep paging through
a stable, deterministic view of the matches.

## What changed

- add an optional `cursor` field to `SearchMemoriesRequest` / tool input
and return `next_cursor` in `SearchMemoriesResponse`
- update the MCP schemas and tool wiring so clients can request
subsequent pages explicitly
- change the local memories backend to collect and sort the full scoped
match list, then slice the requested page and reject invalid cursors
- add unit coverage for paginated search results and invalid cursor
handling in `memories/mcp/src/local_tests.rs`

## Testing

- Added targeted unit coverage in `memories/mcp/src/local_tests.rs`
- GitHub Actions are running for the branch
2026-05-04 15:23:10 +02:00
jif-oai
29352569b3 feat: make memories MCP list shallow (#20994)
## Why
The memories MCP `list` tool should behave like a directory listing, not
a recursive tree walk. Recursive results make pagination harder to
reason about, return unexpectedly deep paths for scoped requests, and no
longer match the intended tool contract.

## What Changed
- Changed the local memories backend so `list` returns only the
immediate children of the requested path.
- Preserved file-scoped requests by returning the file itself, and
missing paths by returning an empty result.
- Updated cursor handling to paginate over the shallow sibling set and
reject cursors past the available results.
- Updated the MCP tool description to say it lists immediate files and
directories under a path.
- Reworked the local backend tests to cover shallow top-level listing,
shallow scoped listing, sibling ordering, and pagination.

## Testing
- `cargo test -p codex-memories-mcp`
2026-05-04 15:08:34 +02:00
jif-oai
5730615e75 feat: paginate MCP memories list (#20993)
## Why

Large memories trees do not fit well into a single MCP `list` response.
This change makes the memories MCP server page `list` results so callers
can continue walking the tree without overfetching or relying on
ambiguous truncation.

## What changed

- add an optional `cursor` input to the memories MCP `list` API and
return `next_cursor` alongside `truncated` in the response
- paginate recursive local-memory traversal while preserving
lexicographic path order across directories
- reject malformed and out-of-range cursors as invalid MCP requests
- update the server/schema wiring and add coverage for pagination,
ordering, and cursor validation in `memories/mcp/src/local_tests.rs`

## Testing

- `cargo test -p codex-memories-mcp`
2026-05-04 14:59:56 +02:00
jif-oai
6b6581ac59 feat: add max_lines to memories MCP read (#20991)
## Why

The memories MCP `read` tool already supports `line_offset`, but it
cannot return a bounded line range. That makes it awkward to page
through large memory files or request a small slice without relying on
token truncation.

## What changed

- add an optional `max_lines` parameter to the memories MCP `read` tool
schema and request parsing
- cap local backend reads to the requested number of lines before token
truncation
- treat `max_lines = 0` as an invalid request and surface it as
`invalid_params`
- add backend tests for bounded reads and invalid line request
validation

## Testing

- added coverage in `memories/mcp/src/local_tests.rs` for `max_lines`
reads and invalid `max_lines` / `line_offset` requests
2026-05-04 14:45:38 +02:00
jif-oai
019755d570 feat: add line offsets to memory read MCP (#20986)
## Why

Memory clients sometimes need to continue reading a file from a known
line instead of starting over from the top. Adding a line offset to the
`read` MCP keeps that resume logic simple and avoids re-reading
already-consumed content.

## What changed

- Added an optional `line_offset` argument to the memory `read` tool,
defaulting to `1`.
- Read content starting at the requested 1-indexed line before token
truncation, and return `start_line_number` in the response.
- Treat invalid offsets as invalid params errors and cover the new
behavior in `codex-rs/memories/mcp/src/local_tests.rs`.

## Testing

- Added unit tests for reading from a non-default starting line.
- Added unit tests for rejecting `0` and past-end line offsets.
2026-05-04 14:26:37 +02:00
jif-oai
d927f61208 feat: add remote compaction v2 Responses client path (#20773)
## Why

This adds the `remote_compaction_v2` client path so remote compaction
can run through the normal Responses stream and install a
`context_compaction` item that trigger a compaction.

The goal is to migrate some of the compaction logic on the client side

We keeps the v2 transport behind a feature flag while letting follow-up
requests reuse the compacted context instead of falling back to the
legacy compaction item shape.

## What changed

- add `ResponseItem::ContextCompaction` and refresh the generated
app-server / schema / TypeScript fixtures that expose response items on
the wire
- add `core/src/compact_remote_v2.rs` to send compaction through the
standard streamed Responses client, require exactly one
`context_compaction` output item, and install that item into compacted
history
- route manual compact and auto-compaction through the v2 path when
`remote_compaction_v2` is enabled, while keeping the existing remote
compaction path as the fallback
- preserve the new item type across history retention, follow-up request
construction, telemetry, rollout persistence, and rollout-trace
normalization
- add targeted coverage for the feature flag, `context_compaction`
serialization, rollout-trace normalization, and remote-compaction
follow-up behavior

## Verification

- added protocol tests for `context_compaction`
serialization/deserialization in `protocol/src/models.rs`
- added rollout-trace coverage for `context_compaction` normalization in
`rollout-trace/src/reducer/conversation_tests.rs`
- added remote compaction integration coverage for v2 follow-up reuse
and mixed compaction output streams in
`core/tests/suite/compact_remote.rs`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-04 14:15:01 +02:00
jif-oai
d013155f40 feat: memories mcp v1 (#20622)
Add an experimental MCP on memories
This must never be used and is only here for testing purpose

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-04 13:51:03 +02:00
jif-oai
f48b777717 feat: support template interpolation in multi-agent usage hints (#20973)
## Why

`multi_agent_v2` usage hints sometimes need to reference resolved config
values such as the effective thread limit. Those values only exist after
config layering, defaulting, and feature materialization, so the raw
TOML alone was not enough to render them.

## What changed

- allow
`features.multi_agent_v2.{usage_hint_text,root_agent_usage_hint_text,subagent_usage_hint_text}`
to use `{{ ... }}` placeholders backed by the materialized effective
config
- fail config loading with a targeted error when a referenced
placeholder does not exist or does not resolve to a scalar value
- move resolved-config materialization into a shared helper so config
interpolation and config-lock export/replay both serialize the same
resolved feature, memory, and agent settings

## Example
```
[features.multi_agent_v2]
enabled = true
usage_hint_text = "lorem {{ features.multi_agent_v2.max_concurrent_threads_per_session }} ipsum"
```
gets rendered as 
```
        "description": String("... \lorem 4 ipsum"),
```
2026-05-04 11:50:01 +02:00
pakrym-oai
c8c30d9d75 [codex] Emit MCP tool calls as turn items (#20677)
## Why

`McpToolCall` was still an app-server item synthesized from deprecated
legacy begin/end events. Recent item migrations moved this ownership
into core `TurnItem`s, so MCP tool calls now follow the same canonical
lifecycle and leave legacy events as compatibility fanout.

Keeping the core item close to the v2 `ThreadItem::McpToolCall` shape
also avoids spreading MCP result semantics across app-server conversion
code. Core now owns whether a completed call is `completed` or `failed`,
and whether the payload is a tool result or an error.

## What changed

- Added core `TurnItem::McpToolCall` with flattened `server`, `tool`,
`arguments`, `status`, `result`, and `error` fields.
- Updated MCP tool call emitters, including MCP resource tools, to emit
`ItemStarted`/`ItemCompleted` around directly constructed core MCP
items.
- Updated app-server v2 conversion to project the core MCP item into
`ThreadItem::McpToolCall` without deriving status or splitting `Result`
locally.
- Ignored live deprecated MCP legacy fanout in app-server v2 to avoid
duplicate item notifications, while keeping thread history replay on the
legacy event path.

## Verification

- `cargo test -p codex-protocol`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core --lib mcp_tool_call`
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server
mcp_tool_call_completion_notification_contains_truncated_large_result`
2026-05-03 22:50:13 -07:00
pakrym-oai
9ddfda9db7 [codex] Refactor app-server dispatch result flow (#20897)
## Why

App-server request handling had response sending spread across many
individual handlers, which made it harder to see which requests return
payloads, which methods send their own delayed response, and which
branches emit notifications after a response.

## What changed

- Centralized normal `ClientResponsePayload` sending in the dispatch
path.
- Kept explicit-response methods explicit where they need custom
ordering or delayed delivery.
- Removed forward-only handler wrappers and immediate `async { ...
}.await` bodies where they were not needed.
- Moved branch-specific post-response notifications into the branches
that own the response ordering.
- Replaced unreachable delegated request-family error arms with explicit
`unreachable!` cases.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server thread_goal`
- `just fix -p codex-app-server`
2026-05-03 18:57:46 -07:00
Eric Traut
67849d950d Remove local docs and specs (#20896)
## Summary

We should not check local-only docs or planning specs into this
repository. Keeping those files here duplicates the canonical Codex
documentation surface and makes transient implementation notes look like
supported docs.

This PR removes the local-only docs/spec files from `docs/` and trims
`docs/config.md` back to links for the maintained configuration
documentation on developers.openai.com.
2026-05-03 10:23:09 -07:00
Eric Traut
39555036a3 [codex] Add issue labeler area labels (#20893)
## Why

The automated issue labeler needs more precise area labels for newly
opened GitHub issues so triage can distinguish new Codex app and agent
feature surfaces without falling back to broad labels.

## What Changed

- Added labeler prompt entries for `computer-use`, `browser`, `memory`,
`imagen`, `remote`, `performance`, `automations`, and `pets` in
`.github/workflows/issue-labeler.yml`.
- Updated the agent-area guidance so `memory` is used for agentic memory
storage/retrieval and `performance` is used for slow behavior, high
memory utilization, and leaks.
- Expanded the fallback `agent` guidance so Codex prefers the new
specific labels when applicable.

## Verification

- Parsed `.github/workflows/issue-labeler.yml` with `yq e '.'`.
- Ran `git diff --check` for the workflow change.
2026-05-03 09:25:42 -07:00
pakrym-oai
35aaa5d9fc Bound websocket request sends with idle timeout (#20751)
## Why

We saw Responses websocket sessions recover only after a long quiet
period when the server had already logged the websocket as disconnected.
The normal connect path is already bounded by
`websocket_connect_timeout_ms`, but the first request send on an
established websocket reused only the receive-side idle timeout after
the write completed. If the socket write/pump stalls, the client can sit
in `ws_stream.send(...)` without reaching the existing receive timeout.
2026-05-01 23:33:32 -07:00
Matthew Zeng
f88701f5c8 [tool_suggest] More prompt polishes. (#20566)
Tool suggest still misfires when model needs tool_search, updating the
prompts to further disambiguate it:

- [x] rename it from `tool_suggest` to `request_plugin_install`
- [x] rephrase "suggestion" to "install" in the tool descriptions.
- [x] disambiguate "the tool" vs "the plugin/connector". 

Tested with the Codex App and verified it still works.
2026-05-02 04:22:12 +00:00
Felipe Coury
127434cd8b fix(tui): bound startup terminal probes (#20654)
## Summary

Bound TUI startup terminal response probes so unsupported terminals
cannot stall startup for multiple seconds.

This replaces the Unix startup uses of crossterm's blocking response
probes with short `/dev/tty` probes that use nonblocking reads and
`poll` with a 100ms timeout. It covers the initial cursor-position
query, keyboard enhancement support detection, and OSC 10/11
default-color detection. The default-color probe uses one shared
deadline for foreground and background instead of allowing two
independent full waits.

The diagnostic mode/trace env vars from the investigation branch are
intentionally not included. The shipped behavior is simply bounded
probing by default, while non-Unix keeps the existing crossterm fallback
path.

## Details

- Add a private `terminal_probe` module for bounded Unix terminal probes
and response parsers.
- Let `custom_terminal::Terminal` accept a caller-provided initial
cursor position so startup can compute it before constructing the
terminal.
- Use bounded cursor, keyboard enhancement, and default-color probes on
Unix startup.
- Preserve default-color cache behavior so a failed attempted query does
not retry forever.

## Validation

- `cd codex-rs && just fmt`
- `cd codex-rs && cargo test -p codex-tui terminal_probe`
- `cd codex-rs && just fix -p codex-tui`
- `cd codex-rs && just argument-comment-lint`
- `git diff --check`
- `git diff --cached --check`

`cd codex-rs && cargo test -p codex-tui` still aborts on the
pre-existing local stack overflow in
`app::tests::discard_side_thread_keeps_local_state_when_server_close_fails`;
I reproduced that same focused failure on `main` before this PR work, so
it is not introduced by this change.

Manual validation in the VM showed the original crossterm path taking
about 2s per unanswered probe, while bounded probing returned in about
100ms per probe.
2026-05-02 01:20:57 +00:00
jgershen-oai
9e905528bb Fix custom CA login behind TLS-inspecting proxies (#20676)
Refs:
https://linear.app/openai/issue/SE-6311/login-fails-for-experian-users-behind-tls-inspecting-proxy

## Summary
- When a custom CA bundle is configured, force the shared `codex-client`
reqwest builder onto rustls before registering custom roots.
- Add the `rustls-tls-native-roots` reqwest feature so the rustls client
preserves native roots plus the enterprise CA bundle.
- Add subprocess TLS coverage for both a direct local TLS 1.3 server and
a hermetic local CONNECT TLS-intercepting proxy that forwards a
token-exchange-shaped POST to a local origin.

## Plain-language explanation
Experian users are behind a TLS-inspecting proxy, so the login token
exchange needs to trust the enterprise CA bundle from
`CODEX_CA_CERTIFICATE` or `SSL_CERT_FILE`. Before this change, that
custom-CA branch still used reqwest default TLS selection, which could
fail in the proxy environment. Now, only when a custom CA is configured,
Codex selects rustls first and then adds the custom CA roots, matching
the validated behavior from the Experian test build while leaving normal
system-root clients unchanged.

The new regression test recreates the enterprise-proxy shape locally:
the probe client sends an HTTPS `POST /oauth/token` through an explicit
HTTP CONNECT proxy, the proxy presents a leaf certificate signed by a
runtime-generated test CA, decrypts the request, forwards it to a local
origin, and relays the `ok` response back.

## Scope note
- The actual production fix is the first commit: `8368119282 Fix custom
CA reqwest clients to use rustls`.
- The second commit is integration-test coverage only. It generates all
test CA and localhost certificate material at runtime.

## Validation
- `cd codex-rs && cargo test -p codex-client --test ca_env
posts_to_token_origin_through_tls_intercepting_proxy_with_custom_ca_bundle
-- --nocapture`
- `cd codex-rs && cargo test -p codex-client`
- `cd codex-rs && cargo test -p codex-login`
- `cd codex-rs && just fmt`
- `cd codex-rs && just bazel-lock-update`
- `cd codex-rs && just bazel-lock-check`
- `cd codex-rs && just fix -p codex-client`
2026-05-01 17:51:49 -07:00
Michael Bolin
cd2760fc08 ci: cross-compile Windows Bazel clippy (#20701)
## Why

#20585 moved the Windows Bazel test job to the cross-compile path, but
the Windows Bazel clippy and verify-release-build jobs were still using
the native Windows/MSVC-host fallback. Those two jobs became the slowest
Windows PR legs, even though both are build-only signal and do not need
to execute the resulting binaries.

## What Changed

- Switches the Windows Bazel clippy job from
`--windows-msvc-host-platform` to `--windows-cross-compile`, so clippy
build actions use Linux RBE while still targeting
`x86_64-pc-windows-gnullvm`.
- Switches the Windows Bazel verify-release-build job to
`--windows-cross-compile` as well. This job only compiles
`cfg(not(debug_assertions))` Rust code under `fastbuild`, so it does not
need a native Windows build host.
- Keeps the old `--skip_incompatible_explicit_targets` behavior only for
fork/community PRs without `BUILDBUDDY_API_KEY`, where `run-bazel-ci.sh`
falls back to the local Windows MSVC-host shape.
- Adds `--windows-cross-compile` support to
`.github/scripts/run-bazel-query-ci.sh`, so target-discovery queries
select the same `ci-windows-cross` config as the subsequent build.
- Threads that option through `scripts/list-bazel-clippy-targets.sh` so
the Windows clippy job discovers targets under the same platform shape
as the subsequent clippy build.

## Verification

Local checks:

```shell
bash -n .github/scripts/run-bazel-query-ci.sh
bash -n scripts/list-bazel-clippy-targets.sh
ruby -e 'require "yaml"; YAML.load_file(".github/workflows/bazel.yml"); puts "ok"'
RUNNER_OS=Linux ./scripts/list-bazel-clippy-targets.sh | grep -c -- '-windows-cross-bin$'
RUNNER_OS=Windows ./scripts/list-bazel-clippy-targets.sh --windows-cross-compile | grep -c -- '-windows-cross-bin$'
```

The Linux target-list check reported `0` Windows-cross internal test
binaries, while the Windows cross target-list check reported `47`,
preserving the test-code clippy coverage shape from the existing Windows
job.
2026-05-01 16:40:29 -07:00
Michael Bolin
466798aa83 ci: cross-compile Windows Bazel tests (#20585)
## Status

This is the Bazel PR-CI cross-compilation follow-up to #20485. It is
intentionally split from the Cargo/cargo-xwin release-build PoC so
#20485 can stay as the historical release-build exploration. The
unrelated async-utils test cleanup has been moved to #20686, so this PR
is focused on the Windows Bazel CI path.

The intended tradeoff is now explicit in `.github/workflows/bazel.yml`:
pull requests get the fast Windows cross-compiled Bazel test leg, while
post-merge pushes to `main` run both that fast cross leg and a fully
native Windows Bazel test leg. The native main-only job keeps full
V8/code-mode coverage and gets a 40-minute timeout because it is less
latency-sensitive than PR CI. All other Bazel jobs remain at 30 minutes.

## Why

Windows Bazel PR CI currently does the expensive part of the build on
Windows. A native Windows Bazel test job on `main` completed in about
28m12s, leaving very little headroom under the 30-minute job timeout and
making Windows the slowest PR signal.

#20485 showed that Windows cross-compilation can be materially faster
for Cargo release builds, but PR CI needs Bazel because Bazel owns our
test sharding, flaky-test retries, and integration-test layout. This PR
applies the same high-level shape we already use for macOS Bazel CI:
compile with remote Linux execution, then run platform-specific tests on
the platform runner.

The compromise is deliberately signal-aware: code-mode/V8 changes are
rare enough that PR CI can accept losing the direct V8/code-mode
smoke-test signal temporarily, while `main` still runs the native
Windows job post-merge to catch that class of regression. A follow-up PR
should investigate making the cross-built Windows gnullvm V8 archive
pass the direct V8/code-mode tests so this tradeoff can eventually go
away.

## What Changed

- Adds a `ci-windows-cross` Bazel config that targets
`x86_64-pc-windows-gnullvm`, uses Linux RBE for build actions, and keeps
`TestRunner` actions local on the Windows runner.
- Adds explicit Windows platform definitions for
`windows_x86_64_gnullvm`, `windows_x86_64_msvc`, and a bridge toolchain
that lets gnullvm test targets execute under the Windows MSVC host
platform.
- Updates the Windows Bazel PR test leg to opt into the cross-compile
path via `--windows-cross-compile` and `--remote-download-toplevel`.
- Adds a `test-windows-native-main` job that runs only for `push` events
on `refs/heads/main`, uses the native Windows Bazel path, includes
V8/code-mode smoke tests, and has `timeout-minutes: 40`.
- Keeps fork/community PRs without `BUILDBUDDY_API_KEY` on the previous
local Windows MSVC-host fallback, including
`--host_platform=//:local_windows_msvc` and `--jobs=8`.
- Preserves the existing integration-test shape on non-gnullvm
platforms, while generating Windows-cross wrapper targets only for
`windows_gnullvm`.
- Resolves `CARGO_BIN_EXE_*` values from runfiles at test runtime,
avoiding hard-coded Cargo paths and duplicate test runfiles.
- Extends the V8 Bazel patches enough for the
`x86_64-pc-windows-gnullvm` target and Linux remote execution path.
- Makes the Windows sandbox test cwd derive from `INSTA_WORKSPACE_ROOT`
at runtime when Bazel provides it, because cross-compiled binaries may
contain Linux compile-time paths.
- Keeps the direct V8/code-mode unit smoke tests out of the Windows
cross PR path for now while native Windows CI continues to cover them
post-merge.

## Command Shape

The fast Windows PR test leg invokes the normal Bazel CI wrapper like
this:

```shell
./.github/scripts/run-bazel-ci.sh \
  --print-failed-action-summary \
  --print-failed-test-logs \
  --windows-cross-compile \
  --remote-download-toplevel \
  -- \
  test \
  --test_tag_filters=-argument-comment-lint \
  --test_verbose_timeout_warnings \
  --build_metadata=COMMIT_SHA=${GITHUB_SHA} \
  -- \
  //... \
  -//third_party/v8:all \
  -//codex-rs/code-mode:code-mode-unit-tests \
  -//codex-rs/v8-poc:v8-poc-unit-tests
```

With the BuildBuddy secret available on Windows, the wrapper selects
`--config=ci-windows-cross` and appends the important Windows-cross
overrides after rc expansion:

```shell
--host_platform=//:rbe
--shell_executable=/bin/bash
--action_env=PATH=/usr/bin:/bin
--host_action_env=PATH=/usr/bin:/bin
--test_env=PATH=${CODEX_BAZEL_WINDOWS_PATH}
```

The native post-merge Windows job intentionally omits
`--windows-cross-compile` and does not exclude the V8/code-mode unit
targets:

```shell
./.github/scripts/run-bazel-ci.sh \
  --print-failed-action-summary \
  --print-failed-test-logs \
  -- \
  test \
  --test_tag_filters=-argument-comment-lint \
  --test_verbose_timeout_warnings \
  --build_metadata=COMMIT_SHA=${GITHUB_SHA} \
  --build_metadata=TAG_windows_native_main=true \
  -- \
  //... \
  -//third_party/v8:all
```

## Research Notes

The existing macOS Bazel CI config already uses the model we want here:
build actions run remotely with `--strategy=remote`, but `TestRunner`
actions execute on the macOS runner. This PR mirrors that pattern for
Windows with `--strategy=TestRunner=local`.

The important Bazel detail is that `rules_rs` is already targeting
`x86_64-pc-windows-gnullvm` for Windows Bazel PR tests. This PR changes
where the build actions execute; it does not switch the Bazel PR test
target to Cargo, `cargo-nextest`, or the MSVC release target.

Cargo release builds differ from this Bazel path for V8: the normal
Windows Cargo release target is MSVC, and `rusty_v8` publishes prebuilt
Windows MSVC `.lib.gz` archives. The Bazel PR path targets
`windows-gnullvm`; `rusty_v8` does not publish a prebuilt Windows
GNU/gnullvm archive, so this PR builds that archive in-tree. That
Linux-RBE-built gnullvm archive currently crashes in direct V8/code-mode
smoke tests, which is why the workflow keeps native Windows coverage on
`main`.

The less obvious Bazel detail is test wrapper selection. Bazel chooses
the Windows test wrapper (`tw.exe`) from the test action execution
platform, not merely from the Rust target triple. The outer
`workspace_root_test` therefore declares the default test toolchain and
uses the bridge toolchain above so the test action executes on Windows
while its inner Rust binary is built for gnullvm.

The V8 investigation exposed a Windows-client gotcha: even when an
action execution platform is Linux RBE, Bazel can still derive the
genrule shell path from the Windows client. That produced remote
commands trying to run `C:\Program Files\Git\usr\bin\bash.exe` on Linux
workers. The wrapper now passes `--shell_executable=/bin/bash` with
`--host_platform=//:rbe` for the Windows cross path.

The same Windows-client/Linux-RBE boundary also affected
`third_party/v8:binding_cc`: a multiline genrule command can carry CRLF
line endings into Linux remote bash, which failed as `$'\r'`. That
genrule now keeps the `sed` command on one physical shell line while
using an explicit Starlark join so the shell arguments stay readable.

## Verification

Local checks included:

```shell
bash -n .github/scripts/run-bazel-ci.sh
bash -n workspace_root_test_launcher.sh.tpl
ruby -e "require %q{yaml}; YAML.load_file(%q{.github/workflows/bazel.yml}); puts %q{ok}"
RUNNER_OS=Linux ./scripts/list-bazel-clippy-targets.sh
RUNNER_OS=Windows ./scripts/list-bazel-clippy-targets.sh
RUNNER_OS=Linux ./tools/argument-comment-lint/list-bazel-targets.sh
RUNNER_OS=Windows ./tools/argument-comment-lint/list-bazel-targets.sh
```

The Linux clippy and argument-comment target lists contain zero
`*-windows-cross-bin` labels, while the Windows lists still include 47
Windows-cross internal test binaries.

CI evidence:

- Baseline native Windows Bazel test on `main`: success in about 28m12s,
https://github.com/openai/codex/actions/runs/25206257208/job/73907325959
- Green Windows-cross Bazel run on the split PR before adding the
main-only native leg: Windows test 9m16s, Windows release verify 5m10s,
Windows clippy 4m43s,
https://github.com/openai/codex/actions/runs/25231890068
- The latest SHA adds the explicit PR-vs-main tradeoff in `bazel.yml`;
CI is rerunning on that focused diff.

## Follow-Up

A subsequent PR should investigate making a cross-built Windows binary
work with V8/code-mode enabled. Likely options are either making the
Linux-RBE-built `windows-gnullvm` V8 archive correct at runtime, or
evaluating whether a Bazel MSVC target/toolchain can reuse the same
prebuilt MSVC `rusty_v8` archive shape that Cargo release builds already
use.
2026-05-01 15:55:28 -07:00
Channing Conger
a5fbcf1ab4 Prune unused code-mode globals (#20542)
Hide Atomics, SharedArrayBuffer, and WebAssembly from the code-mode
runtime since the harness does not expose worker support or need those
APIs.
2026-05-01 15:11:22 -07:00
starr-openai
2952beb009 Surface multi-environment choices in environment context (#20646)
## Why
The model needs a way to see which environments are available during a
multi-environment turn without changing the legacy single-environment
prompt surface or pulling replay/persistence changes into the same
review.

## Stack
1. https://github.com/openai/codex/pull/20646 - `EnvironmentContext`
rendering for selected environments (this PR)
2. https://github.com/openai/codex/pull/20669 - selected-environment
ownership and tool config prep
3. https://github.com/openai/codex/pull/20647 - process-tool
`environment_id` routing

## What Changed
- extend `environment_context` so multi-environment turns render an
`<environments>` block with the selected environment ids and cwd values
- keep zero- and single-environment turns on the existing cwd-only
render path
- keep replay and persistence paths on the legacy surface for now so
this PR stays scoped to live prompt rendering
- add focused coverage in
`codex-rs/core/src/context/environment_context_tests.rs`

## Testing
- CI

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-01 22:11:06 +00:00
Abhinav
d55479488e Clear live hook rows when turns finalize (#20674)
# Why

When a user interrupts a turn while a hook is still running, the normal
turn status is cleared but the separate live hook row can remain visible
as `Running` because the TUI may never receive a matching
`HookCompleted` event before cancellation. Once the turn itself is
finalized, that turn-scoped live state should not remain on screen.

# What

- clear any still-live `active_hook_cell` during turn finalization
- add a regression snapshot covering an interrupted turn with a visible
`PreToolUse` hook row

# Testing

- `cargo test -p codex-tui interrupted_turn_clears_visible_running_hook`
- attempted `cargo test -p codex-tui` (currently aborts on unrelated
existing stack overflow in
`app::tests::discard_side_thread_removes_agent_navigation_entry`)
2026-05-01 14:48:22 -07:00
Abhinav
443f6b831e Use the 2025-06-18 elicitation capability shape (#20562)
# Why

Codex currently negotiates MCP `2025-06-18`, where the client
elicitation capability is represented as an empty object. We were still
serializing `capabilities.elicitation.form`, which belongs to the later
capability shape and can cause strict `2025-06-18` servers to reject
`initialize` with an unrecognized-field error.

This keeps the handshake aligned with the protocol version Codex
actually negotiates and fixes the compatibility regression tracked in
#17492.

# What

- Serialize the client elicitation capability as `elicitation: {}` for
`2025-06-18`.
- Keep elicitation advertised for both Codex Apps and custom MCP
servers.
- Tighten regression coverage so the unit test asserts both the Rust
value and the serialized wire shape.
- Add an app-server integration test that round-trips a form elicitation
from a custom MCP server; the existing connector round-trip continues to
cover the connector path.

# Verification

- `cargo test -p codex-mcp`
- `cargo test -p codex-app-server mcp_server_elicitation_round_trip`
- `cargo test -p codex-app-server
mcp_server_tool_call_round_trips_elicitation`

# Next steps

- Decide whether `tool_call_mcp_elicitation=false` should also suppress
capability advertisement during `initialize`.
- Revisit `form` / `url` capability advertisement when Codex is ready to
negotiate MCP `2025-11-25`, which defines that newer shape.
2026-05-01 14:16:22 -07:00
pakrym-oai
aed74e5ee4 [codex] Emit image view as core item (#20512)
## Why

Image-view results should be represented as a core-produced turn item
instead of being reconstructed by app-server. At the same time, existing
rollout/history paths still understand the legacy `ViewImageToolCall`
event, so this keeps that event as compatibility output generated from
the new item lifecycle.

## What changed

- Added `TurnItem::ImageView` to `codex-protocol`.
- Emitted image-view item start/completion directly from the core
`view_image` handler.
- Kept `ViewImageToolCall` as a legacy event and generate it from
completed `TurnItem::ImageView` items.
- Kept `thread_history.rs` on the legacy `ViewImageToolCall` replay
path, with `ImageView` item lifecycle events ignored there.
- Updated app-server protocol conversion, rollout persistence, and
affected exhaustive event matches for the new item plus legacy fan-out
shape.

## Verification

- `cargo test -p codex-protocol -p codex-app-server-protocol -p
codex-rollout -p codex-rollout-trace -p codex-mcp-server -p
codex-app-server --lib`
- `cargo test -p codex-core --test all
view_image_tool_attaches_local_image`
- `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol
-p codex-app-server -p codex-rollout -p codex-rollout-trace -p
codex-mcp-server`
- `git diff --check`
2026-05-01 11:28:30 -07:00
canvrno-oai
610eefb86b /plugins: add marketplace upgrade flow (#20478)
This PR adds marketplace upgrade to the `/plugins` menu so users can
update configured marketplaces. It adds a `Ctrl+U` shortcut on eligible
marketplace tabs, a loading state, and the app-server request flow
needed to perform `marketplace/upgrade`. After a successful upgrade, the
TUI refreshes plugin data, plugin mentions, and user config so updated
marketplace contents show up across the menu and other plugin surfaces.
It also preserves the current marketplace tab on no-op and failure paths
and surfaces backend error details directly in the TUI.

- Add a `Ctrl+U` upgrade option for user-configured marketplace tabs in
`/plugins`
- Show the upgrade footer hint only on upgradeable marketplace tabs
- Show a loading state during `marketplace/upgrade`
- Surface already-up-to-date and per-marketplace failure results from
the backend
- Refresh plugin data, plugin mentions, and user config after successful
upgrades
- Add tests and snapshot updates for the shortcut flow, loading state,
and failure messaging

Steps to test:
1. Add a `/plugin` marketplace to Codex TUI.
2. Open `/plugins`, move to that marketplace tab, and confirm the footer
shows `Ctrl+U` to upgrade.
3. Press `Ctrl+U` and confirm the popup switches into an upgrade loading
state.
4. When the request finishes, confirm you see the expected result:
updated marketplace contents on success, an already-up-to-date message
on no-op, or backend error details on failure. On no-op or failure,
confirm the popup stays on the same marketplace tab.
2026-05-01 11:26:29 -07:00
jif-oai
2817866a32 fix: reduce ConfigBuilder::build stack usage (#20650)
## Why

`ConfigBuilder::build` performs a large amount of async config loading.
Leaving that entire future on the caller stack makes config startup more
fragile on small runtime worker stacks.

## What changed

- keep `ConfigBuilder::build` as a thin wrapper that boxes the
config-loading future before awaiting it
- move the existing implementation into a private `build_inner` method
so the large async state machine lives on the heap instead of the
runtime thread stack

## Testing

- Not run locally
2026-05-01 20:24:17 +02:00
Felipe Coury
ff66b3c7eb fix(tui): restore alt-enter newline alias (#20535)
Fixes https://github.com/openai/codex/issues/20501

## Summary
- add Alt+Enter to the built-in editor newline aliases
- update keymap tests that used Alt+Enter as a custom submit binding now
that it conflicts with newline
- refresh the keymap action-menu snapshot fixture

## Test Plan
- `just fmt`
- `cargo test -p codex-tui keymap::tests`
- `cargo test -p codex-tui bottom_pane::textarea::tests`
- `cargo test -p codex-tui keymap_setup::tests`
- `cargo test -p codex-tui`
- `cargo insta pending-snapshots`
- `git diff --check`
- `just argument-comment-lint`
2026-05-01 15:22:02 -03:00
starr-openai
be71b6fcd1 Use selected turn environments for runtime context (#20281)
## Summary
- make selected turn environments the source of truth for session
runtime cwd and MCP runtime environment selection
- keep local/no-selection fallback behavior intact
- add coverage for duplicate selected environments, cwd resolution, and
MCP runtime environment selection

## Validation
- git diff --check
- rustfmt was run on touched Rust files during the implementation
workflow

CI should provide the full Bazel/test signal.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-01 11:00:14 -07:00
Tom
e4d6675632 [codex] Migrate loaded thread/read history to ThreadStore (#20486)
## Summary

- Route loaded `thread/read` + `includeTurns` through
`CodexThread::load_history` / ThreadStore history instead of direct
rollout JSONL reads.
- Add an in-memory ThreadStore regression test covering loaded
`thread/read includeTurns` without a local rollout path.
2026-05-01 10:55:04 -07:00
Abhinav
78baa20780 deprecate legacy notify (#20524)
# Why

`notify` is the remaining compatibility surface from the legacy hook
implementation. The newer lifecycle hook engine now owns the active hook
system, so we should start steering users away from adding new `notify`
configs before removing the old path entirely. This also adds a
lightweight watchpoint for the deprecation so we can see how much legacy
usage remains before the clean drop.

# What

- emit a startup deprecation notice when a non-empty `notify` command is
configured
- emit `codex.notify.configured` when a session starts with legacy
`notify` configured
- emit `codex.notify.run` when the legacy notify path fires after a
completed turn
- mark `notify` as deprecated in the config schema and repo docs
- remove the orphaned `codex-rs/hooks/src/user_notification.rs` file
that is no longer compiled
- add regression coverage for the new deprecation notice

# Next steps

A follow-up PR can remove the legacy notify path entirely once we are
ready for the clean drop. Before then, we can watch
`codex.notify.configured` and `codex.notify.run` to understand the
deprecation impact and remaining active usage. The cleanup PR should
then delete the `notify` config field, the `legacy_notify`
implementation, the old compatibility dispatch types and callsites that
only exist for the legacy path, and the remaining compatibility
docs/tests.

# Testing

- `cargo test -p codex-hooks`
- `cargo test -p codex-config`
- `cargo test -p codex-core emits_deprecation_notice_for_notify`
2026-05-01 17:35:21 +00:00
pakrym-oai
9b8d585075 [codex] Add Codex environment config (#20630)
## Why

This adds a checked-in Codex environment configuration so the repo
exposes a ready-to-run Codex action from the app environment metadata.

## What changed

- Added `.codex/environments/environment.toml` with a generated `Run`
action.
- The action runs the `codex` binary from `codex-rs/Cargo.toml` with
`mcp_oauth_credentials_store=file`.

## Verification

- Not run; configuration-only change.
2026-05-01 10:01:45 -07:00
Eric Traut
6784db51c0 Add /ide context support to the TUI (#20294)
## Why

Users have asked for a `/ide` command in the TUI so Codex can use the
active IDE session for live context such as the current file, open tabs,
and selected ranges. We already support a similar feature in the Codex
desktop app, so bringing it to the TUI makes sense.

One subtle compatibility constraint is that the injected prompt wrapper
and transcript stripping should match the desktop app and IDE extension.
By using the same `## My request for Codex:` delimiter and hiding the
injected context from transcript rendering the same way, threads created
in the TUI render correctly in desktop and IDE surfaces, and threads
created there replay correctly in the TUI, even when IDE context was
included.

Addresses https://github.com/openai/codex/issues/13834.

## What changed
### Summary
This PR consists of four four pieces:
1. An IPC client that uses a socket (Mac/Linux) or named pipe (Windows)
to talk to the IDE Extension
2. Logic that establishes the IPC connection and requests IDE context
(open files, selection) on demand
3. Logic that injects this context into the user prompt (using the same
technique as the desktop app) and hides the added context when rendering
the prompt in the TUI transcript
4. A new slash command for enabling/disabling this mode and text within
the footer to indicate when it's enabled

### Details
- Added `/ide [on|off|status]` to the TUI, with bare `/ide` toggling IDE
context on or off.
- Added a Rust IDE context client that connects to the local Codex IDE
IPC route as a client and requests context from the IDE extension flow.
- Injected IDE context using the same prompt delimiter and
transcript-stripping convention as the desktop app and IDE extension so
shared threads render consistently across surfaces.
- Added an `IDE context` status-line indicator while the feature is
active and cleared it when enabling or fetching context fails.
- Added handling for multiple selection ranges, oversized selections,
interleaved IPC messages, and transient reconnect timing after quick
toggles.

## Verification

Did extensive manual testing in addition to running automated unit and
regression tests.

To test:

- Launch VS Code (or Cursor) with the IDE extension.
- Open one or more files in the IDE and select a range of text within
one of them.
- Start the TUI.
- Ask the agent which files you have open in your IDE, and it should say
that it does not know.
- Enable `/ide` mode; note that `IDE context` appears in the lower
right.
- Ask the agent what files you have open in your IDE and what text is
selected.
2026-05-01 09:39:48 -07:00
Ruslan Nigmatullin
41e171fcf2 app-server: move transport into dedicated crate (#20545)
## Why

`codex-app-server` currently owns both request-processing code and
transport implementation details. Splitting the transport layer into its
own crate makes that boundary explicit, reduces the amount of
transport-specific dependency surface carried by `codex-app-server`, and
gives future transport work a narrower place to evolve.

## What changed

- Added `codex-app-server-transport` and moved the existing transport
tree into it, including stdio, unix socket, websocket, remote-control
transport, and websocket auth.
- Moved shared transport-facing message types into the new crate so both
the transport implementation and `codex-app-server` use the same
definitions.
- Kept processor-facing connection state and outbound routing in
`codex-app-server`, with the routing tests moved next to that local
wrapper.
- Updated workspace metadata, Bazel crate metadata, and
`codex-app-server` dependencies for the new crate boundary.

## Validation

- `cargo metadata --locked --no-deps`
- `git diff --check`
- Attempted `cargo test -p codex-app-server-transport`, `cargo test -p
codex-app-server`, `just fix -p codex-app-server-transport`, and `just
fix -p codex-app-server`; all were blocked before compilation by the
existing `packageproxy` resolution failure for locked `rustls-webpki =
0.103.13`.
- Attempted Bazel build / lockfile validation; those were blocked by
external fetch failures against BuildBuddy / GitHub while resolving
`v8`.
2026-05-01 09:23:47 -07:00
jif-oai
5744b85b9a fix: cargo deny (#20627)
Fix cargo deny by ack the `RUSTSEC` while a fix land
```
  RUSTSEC-2026-0118
  NSEC3 closest-encloser proof validation enters unbounded loop on cross-zone responses

  RUSTSEC-2026-0119
  CPU exhaustion during message encoding due to O(n²) name compression

  Dependency path:

  hickory-proto 0.25.2
  └── hickory-resolver 0.25.2
      └── rama-dns 0.3.0-alpha.4
          └── rama-tcp 0.3.0-alpha.4
              └── codex-network-proxy
```

Also upgrade some workers version to prevent this:
```
warning[license-not-encountered]: license was not encountered
    ┌─ ./codex-rs/deny.toml:131:6
    │
131 │     "OpenSSL",
    │      ━━━━━━━ unmatched license allowance

warning[duplicate]: found 2 duplicate entries for crate 'base64'
   ┌─ /github/workspace/codex-rs/Cargo.lock:79:1
   │
79 │ ╭ base64 0.21.7 registry+https://github.com/rust-lang/crates.io-index
80 │ │ base64 0.22.1 registry+https://github.com/rust-lang/crates.io-index
   │ ╰───────────────────────────────────────────────────────────────────┘ lock entries
```
2026-05-01 18:15:38 +02:00
Eric Traut
3d1d164aee Remove no-tool goal continuation suppression (#20523)
## Why

`/goal` is supposed to keep Codex working until the goal is actually
done. The previous continuation logic had two ways to stop early: the
continuation prompt told the model to wait for new input when it felt
blocked, and the runtime suppressed another continuation turn after a
continuation finished without any tool calls.

That made goals stop short even when the agent could still keep making
progress (I received a few reports of this from users). It also relied
on a brittle heuristic that treated "no registry tool calls" as
equivalent to "should stop."

## What changed

- removed the continuation prompt sentence that told the model to stop
and wait for new input when it could not continue productively
- removed the goal runtime suppression heuristic that stopped
auto-continuation after a no-tool continuation turn
- deleted the continuation-activity bookkeeping and left `tool_calls` as
telemetry only
- added focused regressions for the two intended behaviors: completed
no-tool continuation turns still continue, while `request_user_input`
keeps the existing turn open instead of spawning a new continuation
2026-05-01 09:09:55 -07:00
Eric Traut
227bee0445 Enforce animations = false for screen readers (#20564)
## Why

Issue #20489 calls out that animated TUI affordances can be noisy for
screen-reader users. Codex already has `tui.animations = false` as a
reduced-motion setting, but some live activity rows render spinner-style
prefixes in that mode. These were relatively recent regressions.

We have also regressed this pattern more than once by adding new
spinner/shimmer callsites that do not think through the reduced-motion
path, so this PR adds a small guardrail while fixing the current
surfaces.

## What changed

- Omit the live status-row spinner when animations are disabled, so the
row starts with stable text like `Working (...)`.
- Render running hook headers without the spinner prefix when animations
are disabled, while preserving shimmer/spinner behavior when animations
are enabled.
- Centralize TUI activity indicators in `tui/src/motion.rs`, with
explicit reduced-motion choices for hidden prefixes, static bullets, and
plain shimmer-text fallbacks.
- Route existing spinner/shimmer callsites through the central motion
helper, including exec rows, MCP/web-search/loading rows, hook rows,
plugin loading, and onboarding loading text.
- Add a source-scan regression test that rejects direct `spinner(...)`
or `shimmer_spans(...)` usage outside the central module and primitive
definition.
- Add focused coverage that reduced-motion active exec rows are stable,
status rows start without a spinner, running hooks omit the spinner, and
MCP inventory loading stays stable.
- Update the one affected status-indicator snapshot; the existing detail
tree prefix remains unchanged.

## Verification

- `cargo test -p codex-tui`
2026-05-01 09:07:56 -07:00
pakrym-oai
f476338f93 Move apply-patch file changes into turn items (#20540)
## Why

Apply-patch file changes are now part of the core turn item stream, so
v2 clients can consume the same first-class item lifecycle path used by
other turn items instead of relying on app-server-specific remapping
from legacy patch events.

## What changed

- Added a core `TurnItem::FileChange` carrying apply-patch changes and
completion metadata.
- Updated the apply-patch tool emitter to send `ItemStarted` /
`ItemCompleted` with the new `FileChange` item while preserving legacy
`PatchApplyBegin` / `PatchApplyEnd` fan-out.
- Updated app-server v2 conversion to render the new core item directly
and stopped `event_mapping` from remapping old patch begin/end events
into item notifications.
- Kept thread history reconstruction based on the existing old
apply-patch events for rollout compatibility.

## Verification

- `cargo test -p codex-protocol -p codex-app-server-protocol`
- `cargo test -p codex-core --test all
apply_patch_tool_executes_and_emits_patch_events`
- `cargo test -p codex-app-server bespoke_event_handling`
2026-05-01 08:47:18 -07:00
jif-oai
0b04d1b3cc feat: export and replay effective config locks (#20405)
## Why

For reproducibility. A hand-written `config.toml` is not enough to
recreate what a Codex session actually ran with because layered config,
CLI overrides, defaults, feature aliases, resolved feature config,
prompt setup, and model-catalog/session values can all affect the final
runtime behavior.

This PR adds an effective config lockfile path: one run can export the
resolved session config, and a later run can replay that lockfile and
fail early if the regenerated effective config drifts.

## What Changed

- Add a dedicated `ConfigLockfileToml` wrapper with top-level lockfile
metadata plus the replayable config:

  ```toml
  version = 1
  codex_version = "..."

  [config]
  # effective ConfigToml fields
  ```

- Keep lockfile metadata out of regular `ConfigToml`; replay loads
`ConfigLockfileToml` and then uses its nested `config` as the
authoritative config layer.
- Add `debug.config_lockfile.export_dir` to write
`<thread_id>.config.lock.toml` when a root session starts.
- Add `debug.config_lockfile.load_path` to replay a saved lockfile and
validate the regenerated session lockfile against it.
- Add `debug.config_lockfile.allow_codex_version_mismatch` to optionally
tolerate Codex binary version drift while still comparing the rest of
the lockfile.
- Add `debug.config_lockfile.save_fields_resolved_from_model_catalog` so
lock creation can either save model-catalog/session-resolved fields or
intentionally leave those fields dynamic.
- Build lockfiles from the effective config plus resolved runtime values
such as model selection, reasoning settings, prompts, service tier, web
search mode, feature states/config, memories config, skill instructions,
and agent limits.
- Materialize feature aliases and custom feature config into the
lockfile so replay compares canonical resolved behavior instead of
user-authored alias shape.
- Strip profile/debug/file-include/environment-specific inputs from
generated lockfiles so they contain replayable values rather than the
inputs that produced those values.
- Surface JSON-RPC server error code/data in app-server client and TUI
bootstrap errors so config-lock replay failures include the actual TOML
diff.
- Regenerate the config schema for the new debug config keys.

## Review Notes

The main flow is split across these files:

- `config/src/config_toml.rs`: lockfile/debug TOML shapes.
- `core/src/config/mod.rs`: loading `debug.config_lockfile.*`, replaying
a lockfile as a config layer, and preserving the expected lockfile for
validation.
- `core/src/session/config_lock.rs`: exporting the current session
lockfile and materializing resolved session/config values.
- `core/src/config_lock.rs`: lockfile parsing, metadata/version checks,
replay comparison, and diff formatting.

## Usage

Export a lockfile from a normal session:

```sh
codex -c 'debug.config_lockfile.export_dir="/tmp/codex-locks"'
```

Export a lockfile without saving model-catalog/session-resolved fields:

```sh
codex -c 'debug.config_lockfile.export_dir="/tmp/codex-locks"' \
  -c 'debug.config_lockfile.save_fields_resolved_from_model_catalog=false'
```

Replay a saved lockfile in a later session:

```sh
codex -c 'debug.config_lockfile.load_path="/tmp/codex-locks/<thread_id>.config.lock.toml"'
```

If replay resolves to a different effective config, startup fails with a
TOML diff.

To tolerate Codex binary version drift during replay:

```sh
codex -c 'debug.config_lockfile.load_path="/tmp/codex-locks/<thread_id>.config.lock.toml"' \
  -c 'debug.config_lockfile.allow_codex_version_mismatch=true'
```

## Limitations

This does not support custom rules/network policies.

## Verification

- `cargo test -p codex-core config_lock`
- `cargo test -p codex-config`
- `cargo test -p codex-thread-manager-sample`
2026-05-01 17:46:02 +02:00
jif-oai
ff27d01676 feat: seed ad-hoc memory extension instructions (#20606)
## Summary

Ad-hoc memory notes are written under `memories/extensions/ad_hoc/`, but
the consolidation agent only knows how to interpret an extension when
the extension folder has an `instructions.md`. Seed those instructions
from the memories write pipeline so an enabled memories startup creates
the expected ad-hoc extension layout automatically.

This also moves extension-specific write behavior behind a dedicated
`memories/write/src/extensions/` module. `ad_hoc` owns the seeded
instructions template, while the existing resource-retention cleanup
lives in its own `prune` module so future memory extensions can add
their own write-side setup without growing a flat helper file.

## Changes

- Seed `memories/extensions/ad_hoc/instructions.md` during eligible
memory startup without overwriting an existing file.
- Store the ad-hoc instructions template under
`memories/write/templates/extensions/ad_hoc/`, keeping ownership in
`codex-memories-write`.
- Split memory extension support into `extensions::ad_hoc` and
`extensions::prune`.
- Keep the existing old-resource pruning behavior unchanged.

## Verification

- `cargo test -p codex-memories-write`
- `bazel build //codex-rs/memories/write:write`

---------

Co-authored-by: chatgpt-codex-connector[bot] <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
2026-05-01 14:43:58 +02:00
jif-oai
70fc55b8f3 chore: improve remember prompt (#20610) 2026-05-01 14:38:07 +02:00
jif-oai
97aae46800 feat: ad-hoc instructions (#20602) 2026-05-01 13:42:54 +02:00
jif-oai
ad404c8400 chore: allow memories edition (#20600) 2026-05-01 13:27:37 +02:00
xl-openai
48791920a8 feat: Track local paths for shared plugins (#20560)
When a local plugin is shared, Codex now records the local plugin path
by remote plugin id under CODEX_HOME/.tmp.

plugin/share/list includes the remote share URL and the matching local
plugin path when available, and plugin/share/delete
clears the local mapping after deleting the remote share.

Also add sharedURL to plugin/share/list.
2026-05-01 00:50:12 -07:00
xli-oai
96d2ea9058 Add remote plugin skill read API (#20150)
## Summary

Adds an app-server `plugin/skill/read` method for remote plugin skill
markdown. The new method calls the plugin-service skill detail endpoint
and returns `skill_md_contents`, so clients can preview skills for
remote plugins before the bundle is installed locally.

## Why

Uninstalled remote plugin skills do not have local `SKILL.md` files.
Without an on-demand remote read, the desktop plugin details UI cannot
render the skill details modal for those skills.

## Validation

- `just write-app-server-schema`
- `just fmt`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server --test all --
suite::v2::plugin_read::plugin_skill_read_reads_remote_skill_contents_when_remote_plugin_enabled
--exact`
- `just fix -p codex-app-server-protocol -p codex-core-plugins -p
codex-app-server`
2026-05-01 00:16:25 -07:00
xli-oai
a62b52f826 Refresh remote plugin cache on auth changes (#20265)
## Summary
- Refresh the remote installed-plugin cache after login/logout instead
of keying it by account or eagerly clearing it.
- Reuse the existing single-flight remote installed refresh loop so
newer queued auth refreshes replace older pending requests and the API
result eventually overwrites or clears the cache.
- Keep derived plugin/skills cache and MCP refresh side effects behind
the existing effective-plugin-changed task when the refreshed installed
state changes.
- Leave `clear_plugin_related_caches` scoped to derived plugin/skills
caches so share mutations do not drop remote installed plugins.

## Tests
- `cargo fmt --all --manifest-path codex-rs/Cargo.toml` (passes; stable
rustfmt warns that `imports_granularity = Item` is nightly-only)
- `cargo test -p codex-core-plugins remote_installed_cache`
- `cargo test -p codex-app-server
skills_list_loads_remote_installed_plugin_skills_from_cache`
2026-04-30 23:11:14 -07:00
Eric Traut
a93c89f497 Color TUI statusline from active theme (#19631)
## Why

Users have shared that the TUI can feel too visually flat because themes
mostly show up in code syntax highlighting. The configurable statusline
is a natural place to make the active theme more visible, while still
letting users keep the existing monotone statusline if they prefer it.

## What Changed

- Added a statusline styling helper that builds the rendered statusline
from `(StatusLineItem, text)` segments, preserving item identity while
keeping the plain text output unchanged.
- Derived foreground accent colors from the active syntax theme by
looking up TextMate scopes through the existing syntax highlighter, with
conservative ANSI fallbacks when a scope does not provide a foreground.
- Tuned theme-derived colors to keep the accents visible without making
the statusline feel overly bright.
- Added `[tui].status_line_use_colors`, defaulting to `true`, plus a
separated `/statusline` toggle so users can enable or disable
theme-derived statusline colors from the setup UI.
- Updated the live statusline and `/statusline` preview to use the same
styled builder, while keeping terminal-title preview text plain.
- Kept statusline separators and active-agent add-ons subdued while
removing blanket dimming from the whole passive statusline.

## Verification

- `cargo test -p codex-tui status_line`
- `cargo test -p codex-tui theme_picker`
- `cargo test -p codex-tui foreground_style_for_scopes`
- `cargo test -p codex-tui`
- `cargo test -p codex-config`
- `cargo test -p codex-core status_line_use_colors`
- `cargo insta pending-snapshots --manifest-path tui/Cargo.toml`

## Visual

<img width="369" height="23" alt="Screenshot 2026-04-30 at 6 16 08 PM"
src="https://github.com/user-attachments/assets/11d03efb-8e4f-4450-8f4d-00a9659ef4cd"
/>

<img width="385" height="23" alt="Screenshot 2026-04-30 at 6 16 02 PM"
src="https://github.com/user-attachments/assets/a3d89f36-bdc1-42e8-8e84-61350e3999e2"
/>
2026-04-30 22:42:48 -07:00
Eric Traut
d898cc8f3f Format multi-day goal durations in the TUI (#20558)
## Why

Goal mode shows elapsed time in compact hour/minute form. That is easy
to scan for shorter runs, but once a goal runs past 24 hours, large hour
counts become harder to read at a glance.

## What changed

Updated `codex-rs/tui/src/goal_display.rs` so unbudgeted goal elapsed
time keeps the existing compact format below one day, then switches to a
day-aware format once the elapsed time reaches 24 hours:

- `23h 59m`
- `1d 0h 0m`
- `2d 23h 42m`

The formatter now covers the 24-hour boundary in unit tests, and the TUI
status-line snapshot for a completed elapsed goal now exercises the
multi-day display.

## Verification

- `cargo test -p codex-tui`

Here's my longest-running test task:

<img width="186" height="23" alt="image"
src="https://github.com/user-attachments/assets/cedfcdab-7f6e-44e6-8495-8a39f63973fb"
/>
2026-04-30 22:42:07 -07:00
Tom
fe05acad23 Make thread store process-scoped (#19474)
- Build one app-server process ThreadStore from startup config and share
it with ThreadManager and CodexMessageProcessor.
- Remove per-thread/fork store reconstruction so effective thread config
cannot switch the persistence backend.
- Add params to ThreadStore create/resume for specifying thread
metadata, since otherwise the metadata from store creation would be used
(incorrectly).
2026-04-30 21:24:59 -07:00
pakrym-oai
f50c02d7bc [codex] Remove unused event messages (#20511)
## Why

Several legacy `EventMsg` variants were still emitted or mapped even
though clients either ignored them or had moved to item/lifecycle
events. `Op::Undo` had also degraded to an unavailable shim, so this
removes that dead task path instead of preserving a command that cannot
do useful work.

`McpStartupComplete`, `WebSearchBegin`, and `ImageGenerationBegin` are
intentionally kept because useful consumers still depend on them: MCP
startup completion drives readiness behavior, and the begin events let
app-server/core consumers surface in-progress web-search and
image-generation items before the final payload arrives.

## What Changed

- Removed weak legacy event variants and payloads from `codex-protocol`,
including legacy agent deltas, background events, and undo lifecycle
events.
- Kept/restored `EventMsg::McpStartupComplete`,
`EventMsg::WebSearchBegin`, and `EventMsg::ImageGenerationBegin` with
serializer and emission coverage.
- Updated core, rollout, MCP server, app-server thread history,
review/delegate filtering, and tests to rely on the useful replacement
events that remain.
- Removed `Op::Undo`, `UndoTask`, the undo test module, and stale TUI
slash-command comments.
- Stopped agent job/background progress and compaction retry notices
from emitting `BackgroundEvent` payloads.

## Verification

- `cargo check -p codex-protocol -p codex-app-server-protocol -p
codex-core -p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- `cargo test -p codex-protocol -p codex-app-server-protocol -p
codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- `cargo test -p codex-core --test all suite::items`
- `just fix -p codex-protocol -p codex-app-server-protocol -p codex-core
-p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- Earlier coverage on this PR also included `codex-mcp`, `codex-tui`,
core library tests, MCP/plugin/delegate/review/agent job tests, and MCP
startup TUI tests.
2026-04-30 20:03:26 -07:00
xli-oai
bb60b78c46 Surface admin-disabled remote plugin status (#20298)
## Summary

Remote plugin-service returns plugin availability separately from a
user's installed/enabled state. This adds `PluginAvailabilityStatus` to
the app-server protocol, propagates remote catalog `status` into
`PluginSummary`, and rejects install attempts for remote plugins marked
`DISABLED_BY_ADMIN` before downloading or caching the bundle.

This is the `openai/codex` half of the change. The companion
`openai/openai` webview PR is
https://github.com/openai/openai/pull/873269.

## Validation

- `cargo run -p codex-app-server-protocol --bin write_schema_fixtures`
- `cargo test -p codex-app-server --test all
plugin_list_marks_remote_plugin_disabled_by_admin`
- `cargo test -p codex-app-server --test all
plugin_list_includes_remote_marketplaces_when_remote_plugin_enabled`
- `cargo test -p codex-app-server --test all
plugin_install_rejects_remote_plugin_disabled_by_admin_before_download`
- `cargo test -p codex-app-server-protocol schema_fixtures`
2026-04-30 20:00:07 -07:00
Tom
c39824c2fd [codex] Improve PR babysitter CI diagnostics and guardrails (#20484)
## Summary

- Surface failed GitHub Actions jobs in the PR babysitter watcher so
Codex can fetch job logs as soon as a job fails, instead of waiting for
the overall workflow run to complete.
- Update babysit-pr skill instructions, GitHub API notes, and heuristics
to prefer direct job log archives before falling back to `gh run view
--log-failed`.
- Add guardrails requiring explicit user confirmation before posting
replies to human-authored review comments.
- Add guardrails preventing Codex from patching unrelated flaky tests,
CI infrastructure, runner issues, dependency outages, or other failures
not caused by the PR branch.

## Validation

- `python3 -m pytest
.codex/skills/babysit-pr/scripts/test_gh_pr_watch.py`
2026-04-30 19:58:19 -07:00
rhan-oai
6b1b227804 [codex-analytics] centralize thread analytics state (#20300)
## Why

Several analytics event families need the same per-thread attribution
state: the app-server client/runtime associated with a thread and, for
lifecycle-oriented events, the thread metadata captured during
initialization. Keeping connection ids and lifecycle metadata in
separate maps made each consumer rebuild the same thread context and
made subagent attribution harder to resolve consistently.

## What changed

- Replaces the separate thread connection and metadata maps with one
reducer-owned `threads` map.
- Routes guardian, compaction, turn-steer, and turn analytics through
shared thread-state lookups while preserving turn-origin attribution for
turn events and request-origin attribution for steer events.
- Lets newly observed spawned subagent threads inherit their parent
thread connection so later thread-scoped analytics can resolve through
the same state model.
- Adds regression coverage for standalone `SubAgentThreadStarted`
publication plus the `SubAgentSource::ThreadSpawn` parent fallback
through a thread-scoped consumer that depends on inherited connection
state.

## Verification

- `cargo test -p codex-analytics`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/20300).
* #18748
* #18747
* #17090
* #17089
* #20239
* #20515
* #20514
* __->__ #20300
2026-04-30 18:58:50 -07:00
Ruslan Nigmatullin
972b819213 app-server: switch remote control to protocol v3 segmentation (#20341)
## Why

Remote-control protocol v3 makes segmentation an explicit wire-level
feature. The app-server transport needs to support that protocol
directly so large messages can be chunked, acknowledged, replayed, and
reassembled consistently.

## What changed

- Bump the remote-control websocket protocol version from `2` to `3`.
- Add explicit client/server chunk envelope variants plus chunk-aware
acknowledgements.
- Split oversized outbound server messages into bounded transport
chunks.
- Reassemble ordered inbound client chunks with bounded memory usage and
stream/client invalidation handling.
- Track inbound chunk cursors and outbound ack cursors as `(seq_id,
segment_id)` so duplicate chunks and partial replays behave correctly.
- Add focused coverage for chunk splitting, reassembly, duplicate
suppression, and stream replacement behavior.

## Validation

- Added targeted unit coverage for segmented message handling in
`remote_control`.
- Local validation is currently blocked before compilation because
`packageproxy` does not serve the locked `rustls-webpki 0.103.13`
dependency required by the workspace.
2026-04-30 18:27:16 -07:00
Dylan Hurd
af089fb21d fix(exec_policy) heredoc parsing file_redirect (#20113)
## Summary
Fixes a regression introduced in #10941 so that heredocs do not permit
file redirects to be approved by rules, and adds scenario tests to cover
this behavior.


Previously, heredoc command parsing would allow redirects and
environment variables:
```bash
# commands_for_exec_policy() would parse this via parse_shell_lc_single_command_prefix
PATH=/tmp/bad:$PATH cat <<'EOF' > /tmp/bad/hello.txt
hello
EOF
```
This conflicts with the Codex Rules documentation; heredoc parsing logic
should abide by the same strictness of parsing.


## Tests
- [x] Updated unit tests accordingly
- [x] Added scenario tests for these cases

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-01 01:05:02 +00:00
iceweasel-oai
4f96001fa7 execpolicy: unwrap PowerShell -Command wrappers on Windows (#20336)
## Why
On Windows, Codex runs shell commands through a top-level
`powershell.exe -NoProfile -Command ...` wrapper. `execpolicy` was
matching that wrapper instead of the inner command, so prefix rules like
`["git", "push"]` did not fire for PowerShell-wrapped commands even
though the same normalization already happens for `bash -lc` on Unix.

This change makes the Windows shell wrapper transparent to rule matching
while preserving the existing Windows unmatched-command safelist and
dangerous-command heuristics.

## What changed
- add `parse_powershell_command_plain_commands()` in
`shell-command/src/powershell.rs` to unwrap the top-level PowerShell
`-Command` body with `extract_powershell_command()` and parse it with
the existing PowerShell AST parser
- update `core/src/exec_policy.rs` so `commands_for_exec_policy()`
treats top-level PowerShell wrappers like `bash -lc` and evaluates rules
against the parsed inner commands
- carry a small `ExecPolicyCommandOrigin` through unmatched-command
evaluation and expose `is_safe_powershell_words()` /
`is_dangerous_powershell_words()` so Windows safelist and
dangerous-command checks still work after unwrap
- add Windows-focused tests for wrapped PowerShell prompt/allow matches,
wrapper parsing, and unmatched safe/dangerous inner commands, and
re-enable the end-to-end `execpolicy_blocks_shell_invocation` test on
Windows

## Testing
- `cargo test -p codex-shell-command`
2026-05-01 00:56:20 +00:00
Abhinav
0d9a5d20ec Alias codex_hooks feature as hooks (#20522)
# Why

The hooks feature flag should use the concise canonical name `hooks`,
while existing configs that still use `codex_hooks` continue to work
during the rename.

# What

- change the canonical `Feature::CodexHooks` key from `codex_hooks` to
`hooks`
- register `codex_hooks` through the existing legacy-alias path
- update the config schema and canonical config fixtures to prefer
`hooks`
- add regression coverage that both `hooks` and `codex_hooks` resolve to
`Feature::CodexHooks`

# Verification

- `cargo test -p codex-features`
- `cargo test -p codex-core config::schema_tests`
- `cargo test -p codex-core
pre_tool_use_blocks_shell_when_defined_in_config_toml`
- `cargo test -p codex-app-server
hooks_list_uses_each_cwds_effective_feature_enablement`
2026-05-01 00:46:33 +00:00
Owen Lin
5affb7f9d5 fix(app-server): mark thread/turns/list and exclude_turns as experime… (#20499)
…ntal

We have some bugs to work out and it is not quite ready to consume as a
public API.
2026-04-30 17:39:08 -07:00
xli-oai
acdf908268 Emit analytics for remote plugin installs (#20267)
## Summary

- emit `codex_plugin_installed` after a remote plugin install succeeds
- keep local installs unchanged, but let remote installs override the
analytics `plugin_id` with the backend remote plugin id
(`plugins~Plugin_...`)
- preserve the local/display identity in `plugin_name` and
`marketplace_name`, plus capability metadata from the installed bundle
- add regression coverage for local install analytics, remote install
analytics, and analytics id override serialization

## Testing

- `just fmt`
- `cargo test -p codex-analytics`
- `cargo test -p codex-app-server`
2026-04-30 17:27:16 -07:00
Felipe Coury
b6f81257f8 feat(tui): add vim composer mode (#18595)
## Why

Codex now has configurable TUI keymaps, but the composer still behaves
like a plain text field. Users who prefer modal editing need a way to
keep Vim muscle memory while drafting prompts, and the keymap picker
needs to expose Vim-specific actions if those bindings are configurable
instead of hardcoded.

## What Changed

- Adds composer Vim mode with insert/normal state, common normal-mode
movement and editing commands, `d`/`y` operator-pending flows, and
mode-aware footer and cursor indicators.
- Adds `/vim`, an optional global `toggle_vim_mode` binding, and
`tui.vim_mode_default` so Vim mode can be toggled per session or enabled
as the default composer state.
- Extends runtime and config keymaps with `vim_normal` and
`vim_operator` contexts, exposes those contexts in `/keymap`, refreshes
the config schema, and validates Vim bindings separately.
- Integrates Vim normal mode with existing composer behavior: `/` opens
slash command entry, `!` enters shell mode, `j`/`k` navigate history at
history boundaries, successful submissions reset back to normal mode,
and paste burst handling remains insert-mode only.
- Teaches the TUI render path to apply and restore cursor style so Vim
insert mode can use a bar cursor without leaving the terminal in that
state after exit.

## Validation

- `cargo test -p codex-tui keymap -- --nocapture` on the keymap/Vim
coverage
- `cargo insta pending-snapshots`

## Docs

This introduces user-facing `/vim`, `tui.vim_mode_default`, and Vim
keymap contexts under `tui.keymap`, so the public CLI configuration and
slash-command docs should be updated before the feature ships.
2026-04-30 17:20:51 -07:00
maja-openai
a5ebedef67 Bypass review for always-allow MCP tools in auto-review (#20069)
## Why

When an MCP or app tool is configured with approval mode `approve`
(always allow), users expect that decision to be authoritative. In
guardian auto-review mode, ARC could still return `ask-user`, which then
routed the approval question into guardian with the ARC reason as
context. That meant a tool explicitly configured as always allowed still
went through both safety monitors before running.

This change keeps the existing ARC behavior for non-auto-review
sessions, but avoids the ARC-to-guardian sequence when
`approvals_reviewer = auto_review` and the tool approval mode is
`approve`.

## What changed

- Short-circuit MCP tool approval handling when `approval_mode ==
approve` and `approvals_reviewer == auto_review`.
- Updated the MCP approval regression test so the auto-review case
asserts neither ARC nor guardian is called.
- Preserved existing tests that verify ARC can still block always-allow
MCP tools outside guardian auto-review mode.

## Verification

- `cargo test -p codex-core --lib mcp_tool_call`
2026-04-30 16:44:09 -07:00
Owen Lin
5de7992ee5 fix(tui): set persist_extended_history: false (#20502)
Large rollouts are no good. This updates the TUI to behave the same as
the Codex App, which is also turning it off.
2026-04-30 23:31:31 +00:00
xli-oai
2686873e77 Sync remote installed plugin bundles (#20268)
## Summary
- Download missing remote installed plugin bundles during app-server
startup and plugin/list refresh.
- Upgrade cached remote installed bundles when the backend installed
version changes.
- Remove stale remote installed bundle caches without writing remote
plugin state into config.toml.

## Review note
This is a clean PR branch cut from the current diff on top of latest
`origin/main`. The diff intentionally has no `codex-rs/core/**` files,
so CODEOWNERS should not request the core-directory owner review from
stale PR history.

## Validation
Already run on the source branch before creating this clean PR:
- `just fmt`
- `cargo test -p codex-core-plugins`
- `cargo test -p codex-app-server --test all
app_server_startup_sync_downloads_remote_installed_plugin_bundles --
--nocapture`
- `cargo test -p codex-app-server --test all
plugin_list_sync_upgrades_and_removes_remote_installed_plugin_bundles --
--nocapture`
- `cargo test -p codex-app-server --test all
app_server_startup_remote_plugin_sync_runs_once -- --nocapture`
- `just fix -p codex-core-plugins`
- `just fix -p codex-app-server`
- `git diff --check`
2026-04-30 16:05:14 -07:00
Owen Lin
9ddb267e9c fix: ignore dangerous project-level config keys (#20098)
## Description
Ignore these top-level config keys when loading project-scoped
config.toml files:
```
    "openai_base_url",
    "chatgpt_base_url",
    "model_provider",
    "model_providers",
    "profile",
    "profiles",
    "experimental_realtime_ws_base_url",
```

## What changed

- Add a project-local config denylist for credential-routing fields such
as `openai_base_url`, `chatgpt_base_url`, `model_provider`,
`model_providers`, `profile`, `profiles`, and
`experimental_realtime_ws_base_url`.
- Strip those fields from project config layers before they participate
in effective config merging, while leaving safe project-local settings
intact.
- Track ignored project-local keys on config layers and surface a
startup warning telling users to move those settings to user-level
`config.toml` if they intentionally need them.
- Update profile behavior coverage so project-local `profile` /
`profiles` entries are ignored instead of overriding user-level profile
selection.

## Verification

- `cargo test -p codex-config`
- `cargo test -p codex-core
project_layer_ignores_unsupported_config_keys`
- `cargo test -p codex-core project_profiles_are_ignored`
- `cargo test -p codex-core config::config_loader_tests`
2026-04-30 23:03:01 +00:00
Owen Lin
6014b6679f fix flaky test falls_back_to_registered_fallback_port_when_default_po… (#20504)
…rt_is_in_use
2026-04-30 22:06:04 +00:00
Akshay Nathan
8426edf71e Stateful streaming apply_patch parser 2026-04-30 21:41:15 +00:00
xl-openai
7b3de63041 Move plugin out of core. (#20348) 2026-04-30 14:26:14 -07:00
Tom
127be0612c [codex] Migrate thread turns list to thread store (#19280)
- migrate `thread/turns/list` to ThreadStore. Uses ThreadStore for most
data now but merges in the in-memory state from thread manager
- keep v2 `thread/list` pathless-store friendly by converting
`StoredThread` directly to API `Thread`
- add regression coverage for pathless store history/listing
2026-04-30 14:16:42 -07:00
alexsong-oai
9121132c8f Send external import completion for sync imports (#20379) 2026-04-30 13:03:21 -07:00
Matthew Zeng
70090c9ff7 [plugin] Add Canva to suggesteable list. (#20474)
- [x] Add Canva to suggesteable list.
2026-04-30 12:39:52 -07:00
iceweasel-oai
8121710ffe install WFP filters for Windows sandbox setup (#20101)
## Summary

This PR installs a first wave of WFP (Windows Filtering Platform)
filters that reduce the surface area of network egress vulnerabilities
for the Windows Sandbox.

- Add persistent Windows Filtering Platform provider, sublayer, and
filters for the Windows sandbox offline account.
- Install WFP filters during elevated full setup, log failures
non-fatally, and emit setup metrics when analytics are enabled.
- Bump the Windows sandbox setup version so existing users rerun full
setup and receive the new filters.

## What WFP is
Windows Filtering Platform (WFP) is the low-level Windows networking
policy engine underneath things like Windows Firewall. It lets
privileged code install persistent filtering rules at specific network
stack layers, with conditions like "only traffic from this Windows
account" or "only this remote port," and an action like block.

In this change, we create a Codex-owned persistent WFP provider and
sublayer, then install block filters scoped to the Windows sandbox's
offline user account via `ALE_USER_ID`. That means the filters are
targeted at sandboxed processes running as that account, rather than
globally affecting the host.

## Initial filter set
We are starting with 12 concrete WFP filters across a few high-value
bypass surfaces. The table below describes the filter families rather
than one filter per row:

| Area | Concrete filters | Purpose |
| --- | --- | --- |
| ICMP | 4 filters: ICMP v4/v6 on `ALE_AUTH_CONNECT` and
`ALE_RESOURCE_ASSIGNMENT` | Block direct ping-style network reachability
checks from the offline account. |
| DNS | 2 filters: remote port `53` on `ALE_AUTH_CONNECT_V4/V6` | Block
direct DNS queries that bypass our intended proxy/offline path. |
| DNS-over-TLS | 2 filters: remote port `853` on
`ALE_AUTH_CONNECT_V4/V6` | Block encrypted DNS attempts that could
bypass ordinary DNS interception. |
| SMB / NetBIOS | 4 filters: remote ports `445` and `139` on
`ALE_AUTH_CONNECT_V4/V6` | Block Windows file-sharing/network share
traffic from sandboxed processes. |

For IPv4/IPv6 coverage, the port-based filters are installed on both
`ALE_AUTH_CONNECT_V4` and `ALE_AUTH_CONNECT_V6`. ICMP also gets both
connect-layer and resource-assignment-layer coverage because ICMP
traffic is shaped differently from ordinary TCP/UDP port traffic.

## Validation
- `cargo fmt -p codex-windows-sandbox` (completed with existing
stable-rustfmt warnings about `imports_granularity = Item`)
- `cargo test -p codex-windows-sandbox wfp::tests`
- `cargo test -p codex-windows-sandbox` (fails in existing legacy
PowerShell sandbox tests because `Microsoft.PowerShell.Utility` could
not be loaded; WFP tests passed before that failure)
2026-04-30 12:39:01 -07:00
Owen Lin
7dd08e304c feat(rollouts): store EventMsg::ApplyPatchEnd in limited history mode (#20463)
The Codex App treats apply patch tool calls quite load-bearing in the UI
(always shown on a completed turn), so we'd like to persist
`EventMsg::ApplyPatchEnd` to guarantee that when a client reconnects to
app-server mid-turn, we always have the full diff to display at the end
of that turn.
2026-04-30 12:11:02 -07:00
iceweasel-oai
06f3b4836a [codex] Fix elevated Windows sandbox named-pipe access (#20270)
## Summary
- add elevated-only token constructors that include the current token
user SID in the restricted SID list
- switch the elevated Windows command runner to use those constructors
- leave the unelevated restricted-token path unchanged

## Why
Windows named pipes created by tools like Ninja use the platform's
default named-pipe ACL when no explicit security descriptor is provided.
In the elevated sandbox, the pipe owner has access, but the
write-restricted token can still fail its restricted-SID access check
because the sandbox user SID was not in the restricting SID set. That
causes child processes to exit successfully while Ninja never receives
the expected pipe completion/close behavior and hangs.

Including the elevated sandbox user's SID in the restricting SID list
lets the restricted check succeed for these owner-scoped pipe objects
without broadening the unelevated sandbox to the real signed-in user.

## Impact
- fixes the minimal Ninja hang repro in the elevated Windows sandbox
- preserves the existing unelevated sandbox behavior and write
protections
- keeps the change scoped to the elevated runner rather than changing
shared token semantics
- this does not affect file-writes for the sandbox because the sandbox
users themselves do not receive any additional permissions over what the
capability SIDs already have. In fact we don't even explicitly grant the
sandbox user ACLs anywhere.

## Validation
- `cargo build -p codex-windows-sandbox --quiet`
- verified the stock `ninja.exe` minimal repro exits normally on host
and in the elevated sandbox
- verified the same repro still hangs in the unelevated sandbox, which
is the intended scope of this change
2026-04-30 12:06:11 -07:00
Celia Chen
31f8813e3e fix: show correct Bedrock runtime endpoint in /status (#20275)
## Why

`/status` was showing the configured `ModelProviderInfo.base_url` for
Amazon Bedrock, which can be stale or misleading because the actual
Bedrock Mantle endpoint is derived at runtime from the resolved AWS
region. This made sessions report the wrong provider endpoint even
though requests used the correct runtime URL.

## What changed

- Added `ModelProvider::runtime_base_url()` so provider implementations
can expose the request-time base URL through the shared runtime provider
abstraction.
- Moved Bedrock region-to-Mantle URL resolution into
`amazon_bedrock::mantle::runtime_base_url()`, keeping region resolution
private to the Mantle module.
- Overrode `runtime_base_url()` for Amazon Bedrock so it returns the
resolved Mantle endpoint instead of the configured default.
- Resolved and cached the runtime provider base URL during TUI startup,
then used that cached value when rendering `/status`.
- Added status coverage that verifies Bedrock displays the runtime URL
and ignores the configured Bedrock `base_url` when they differ.

## Verification
model provider is resolved correctly in local build:
<img width="696" height="245" alt="Screenshot 2026-04-29 at 5 01 36 PM"
src="https://github.com/user-attachments/assets/a13c10a5-3720-41ab-8ace-3c4bc573f971"
/>
2026-04-30 19:02:34 +00:00
Abhinav
93d53f655b Add /hooks browser for lifecycle hooks (#19882)
## Why

`hooks/list` and `hooks/config/write` give us read/write access to hooks
and their state. This hooks up the TUI as a client so users can inspect
and manage that state directly.

## What

- add a two-page `/hooks` browser in the TUI: an event overview with
installed/active counts, followed by a per-event handler page with
toggle controls and detail rendering
- thread managed-state metadata through hook discovery and `hooks/list`
so the UI can label admin-managed hooks and suppress toggles for them
- persist hook toggles through the existing config-write path and add
snapshot coverage for the event list, handler list, managed-hook, and
empty states

## Stack

1. openai/codex#19705
2. openai/codex#19778
3. openai/codex#19840
4. This PR - openai/codex#19882

## Reviewer Notes

- Main UI logic is in
`codex-rs/tui/src/bottom_pane/hooks_browser_view.rs`; most of the diff
is the new view plus its snapshot coverage
- Request / write plumbing for opening the browser and persisting
toggles is in `codex-rs/tui/src/app/background_requests.rs` and
`codex-rs/tui/src/chatwidget/hooks.rs`
- Outside the TUI, the only behavioral change in this PR is threading
`is_managed` through hook discovery and `hooks/list` so managed hooks
render as non-toggleable
- The `codex-rs/tui/src/status/snapshots/` churn is unrelated merge
fallout from the stacked base branch's newer permission-label rendering

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-30 11:58:27 -07:00
khoi
719431da6e [Codex] Add browser use external feature flag (#20245)
## Summary

- Adds a separate feature control for external-browser Browser Use
integrations.
- Registers `browser_use_external` as a stable, default-enabled
requirements-owned feature key.
- Updates feature registry tests and regenerates the config schema.

Codex validation:
- `cargo fmt -- --config imports_granularity=Item`
- `cargo run -p codex-core --bin codex-write-config-schema`
- `cargo test -p codex-features`

## Addendum

This gives enterprise policy a coarse control for Browser Use outside
the Codex-managed in-app browser. The existing `browser_use` feature is
the Browser Use control, while `browser_use_external` can gate
extension/native integrations for external browsers as that surface
grows
2026-04-30 11:53:19 -07:00
pakrym-oai
b52083146c Stop emitting item/fileChange/outputDelta output delta notifications (#20471)
## Why

`item/fileChange/outputDelta` text output was only the tool's summary or
error text and not used by client surfaces.

We keep `item/fileChange/outputDelta` in the app-server protocol as a
deprecated compatibility entry, but the server no longer emits it.

## What changed

- stop the `apply_patch` runtime from emitting `ExecCommandOutputDelta`
events
- simplify `item_event_to_server_notification` so command output deltas
always map to `item/commandExecution/outputDelta`
- remove the app-server bookkeeping that tried to detect whether an
output delta belonged to a file change
- mark `item/fileChange/outputDelta` as a deprecated legacy protocol
entry in the v2 types, schema, and README
- simplify the file-change approval tests so they only wait for
completion instead of expecting output-delta notifications

## Testing

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-thread-manager-sample`
- `cargo test -p codex-app-server-protocol
protocol::event_mapping::tests::exec_command_output_delta_maps_to_command_execution_output_delta
-- --exact`
- `cargo test -p codex-app-server
turn_start_file_change_approval_accept_for_session_persists_v2 --
--exact` *(failed before the test assertions because the wiremock
`/responses` mock received 0 requests in setup)*
2026-04-30 11:42:07 -07:00
Eric Traut
f2bc2f26a9 Remove core protocol dependency [2/2] (#20325)
## Why

With the local model layer and app-server routing in place from PR1,
this PR moves the active TUI runtime onto app-server notifications. The
affected pieces share the same event flow, so the command surface,
session state, bottom-pane prompts, chat rendering, history/status
views, and tests move together to keep the stacked branch buildable.

This PR also removes the obsolete compatibility surface that is no
longer used after the migration. The proposed protocol-boundary verifier
layer was dropped from the stack; enforcing that final boundary will be
simpler once `codex-tui` no longer needs any `codex_protocol`
references.

This PR is part 2 of a 2-PR stack:

1. Add TUI-owned replacement models and extract app-server event
routing.
2. Move the active TUI flow to app-server notifications and delete
obsolete adapter code.

## What changed

- Rewired app command and session handling to use app-server request and
notification shapes.
- Moved approval overlays, request-user-input flows, MCP elicitation,
realtime events, and review commands onto the app-server-facing model
surface.
- Updated chat rendering, history cells, status views, multi-agent UI,
replay state, and TUI tests to use app-server notifications plus the
local models introduced in PR1.
- Deleted `codex-rs/tui/src/app/app_server_adapter.rs` and the
superseded `chatwidget/tests/background_events.rs` fixture path.

## Verification

- `cargo check -p codex-tui --tests`
- Top of stack: `cargo test -p codex-tui`
2026-04-30 11:34:34 -07:00
pakrym-oai
5cc5f12efc Move item event mapping into app-server-protocol (#20299)
## Why

Follow-up to #20291.

The v2 item-event-to-notification translation had been embedded in
`app-server/src/bespoke_event_handling.rs`, which made it hard to reuse
anywhere else. This PR moves that stateless mapping into shared protocol
code so other entry points can produce the same `ServerNotification`
payloads without copying app-server logic.

That also lets `thread-manager-sample` demonstrate the same notification
surface that the app server exposes, instead of only printing the final
assistant message.

## What changed

- move `item_event_to_server_notification` into
`codex-app-server-protocol::protocol::event_mapping`
- keep the mapper tests next to the shared implementation in
`codex-app-server-protocol`
- re-export the mapper from `codex-core-api` so lightweight consumers
can use it without reaching into `app-server-protocol` directly
- simplify `app-server/src/bespoke_event_handling.rs` so it delegates
the stateless event-to-notification projection to the shared helper
- update `thread-manager-sample` to:
  - print mapped notifications as newline-delimited JSON
  - use the shared mapper through `codex-core-api`
- enable the default feature set so the sample exposes the normal tool
surface
- use a `read_only` permission profile so shell commands can run in the
sample without widening permissions

## Testing

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core-api`
- `cargo test -p codex-app-server bespoke_event_handling::tests`
- `cargo test -p codex-thread-manager-sample`
- `cargo run -p codex-thread-manager-sample -- "briefly explore the repo
with pwd and ls, then summarize it"`
2026-04-30 11:02:13 -07:00
Eric Traut
c70cdc108f Remove core protocol dependency [1/2] (#20324)
## Why

This stack moves `codex-tui` away from the core protocol event surface
and toward app-server API shapes plus TUI-owned local models. This first
PR sets up the lower-risk foundation: it introduces the local model
surface and extracts app-server event routing into focused TUI modules
while preserving the existing behavior for the larger migration in PR2.

This PR is part 1 of a 2-PR stack:

1. Add TUI-owned replacement models and extract app-server event
routing.
2. Move the active TUI flow to app-server notifications and delete
obsolete adapter code.

## What changed

- Added TUI-owned approval, diff, session state, session resume, token
usage, and user-message models.
- Added `app/app_server_event_targets.rs` and `app/app_server_events.rs`
to hold app-server event targeting and dispatch logic outside `app.rs`.
- Updated app/status tests to use the local model layer and added
focused routing coverage.
- Boxed a few large async TUI test futures so this base layer remains
checkable without overflowing the default test stack.

## Verification

- `cargo check -p codex-tui --tests`
2026-04-30 10:52:19 -07:00
teddywyly-oai
487716ae74 [Extension] Allowlist Chrome Extension in the tool_suggest tool (#20458)
### Summary
Allowlist chrome extension in tool_suggest tool

### Screenshot
Allowlist chrome extension in tool_suggest tool
<img width="808" height="309" alt="chrome_internal"
src="https://github.com/user-attachments/assets/ed769d77-b635-4a40-a0c5-fbff05af3036"
/>
2026-04-30 10:29:03 -07:00
canvrno-oai
a85d265097 /plugins: remove marketplace (#19843)
This PR adds marketplace removal to the /plugins menu, giving users a
way to remove user-configured plugin marketplaces. It adds a `Ctrl+R`
shortcut to remove selected marketplace tabs, a confirmation prompt,
loading and error states, and the app-server request flow needed to
perform marketplace/remove. After a successful removal, the TUI
refreshes config, plugin mentions, user config, and plugin data so the
removed marketplace disappears from the menu and other surfaces in the
TUI.

- Add `Ctrl+R` removal option for user-configured marketplace tabs
- Show marketplace removal confirmation, loading, and error states
- Route `marketplace/remove` through the TUI background request flow
- Refresh config, plugin mentions, and plugin data after successful
removal
- Adds reusable per-tab footer hints so removal guidance only appears on
applicable tabs
- Add test coverage for `Ctrl+R` behavior while plugin search is active

Steps to test:
- Add a marketplace using the TUI /plugins menu
- Use Ctrl+R to remove the marketplace
- Accept the confirmation prompt
- Confirm the marketplace is removed when the process completes.
2026-04-30 10:25:07 -07:00
Eric Traut
c02814c106 Mark goals feature as experimental (#20083)
## Why

The `goals` feature flag is ready to move out of the hidden
under-development bucket and into the user-facing experimental surface.
Marking it experimental lets users discover it through the experimental
features UI while still making clear that it is opt-in.

## What changed

- Changed `goals` from `Stage::UnderDevelopment` to
`Stage::Experimental` in `codex-rs/features/src/lib.rs`.
- Added experimental menu metadata for the feature with the description
`Set a persistent goal Codex can continue over time`.

## Verification

- `cargo test -p codex-features`
2026-04-30 10:06:44 -07:00
Owen Lin
3516cb9751 fix(core): truncate large mcp tool outputs in rollouts (#20260)
## Why
Large MCP tool call outputs can make rollout JSONL files enormous. In
the session that motivated this change, the biggest JSONL records were:
- `event_msg/mcp_tool_call_end`
- `response_item/function_call_output`

both containing the same unbounded MCP payloads - just 3 MCP tool calls
that each were multi-hundred MBs 😱

This PR truncates both of those JSONL records.

## How

#### For `response_item/function_call_output`
Unified exec already bounds tool output before it is injected into
model-facing history, which also keeps the corresponding rollout
`response_item/function_call_output` records small.

MCP should follow the same pattern: truncate the model-facing tool
output at the tool-output boundary, while leaving code-mode/raw hook
consumers alone.

#### For `event_msg/mcp_tool_call_end`
`McpToolCallEnd` also needs its own bounded event copy because it is the
app-server/replay/UI event shape that backs `ThreadItem::McpToolCall`.
Unfortunately this is _not_ downstream of the `ToolOutput` trait.

## Model behavior 
Model behavior is actually unchanged as a result of this PR. 

Before this PR, MCP output was:
1. Converted to `FunctionCallOutput`.
2. Recorded into in-memory history.
3. Truncated by `ContextManager::record_items()` before later model
turns saw it.

After this branch, MCP output is truncated earlier, in
`McpToolOutput::response_payload()`, using the same helper. Then
`ContextManager::record_items()` sees an already-truncated output and
effectively has little/no additional work to do.

So the model should still see the same kind of truncated function-call
output. The practical difference is where truncation happens: earlier,
before rollout persistence/app-server emission can see the giant
payload.

## Verification

- `cargo test -p codex-core mcp_tool_output`
- `cargo test -p codex-core
mcp_tool_call::tests::truncate_mcp_tool_result_for_event`
- `cargo test -p codex-core
mcp_post_tool_use_payload_uses_model_tool_name_args_and_result`
- `just fmt`
- `just fix -p codex-core`
- `git diff --check`
2026-04-30 16:30:43 +00:00
Ahmed Ibrahim
8a97f3cf03 realtime: rename provider session ids (#20361)
## Summary

Codex is repurposing `session` to mean a thread group, so the realtime
provider session id should no longer use `session_id` / `sessionId` in
Codex-facing protocol payloads. This PR renames that provider-specific
field to `realtime_session_id` / `realtimeSessionId` and intentionally
breaks clients that still send the old field names.

## What Changed

- Renamed realtime provider session fields in `ConversationStartParams`,
`RealtimeConversationStartedEvent`, and `RealtimeEvent::SessionUpdated`.
- Renamed app-server v2 realtime request and notification fields to
`realtimeSessionId`.
- Removed legacy serde aliases for `session_id` / `sessionId`; clients
must send the new names.
- Propagated the rename through core realtime startup, app-server
adapters, codex-api websocket handling, and TUI realtime state.
- Regenerated app-server protocol schema/TypeScript outputs and updated
app-server README examples.
- Kept upstream Realtime API concepts unchanged: provider `session.id`
parsing and `x-session-id` headers still use the upstream wire names.

## Testing

- CI is running on the latest pushed commit.
- Earlier local verification on this PR:
  - `cargo test -p codex-protocol`
- `CODEX_SKIP_VENDORED_BWRAP=1 cargo test -p codex-core
realtime_conversation`
  - `cargo test -p codex-app-server-protocol`
- `CODEX_SKIP_VENDORED_BWRAP=1 cargo test -p codex-app-server
realtime_conversation`
- attempted `CODEX_SKIP_VENDORED_BWRAP=1 cargo test -p codex-tui` (local
linker bus error while linking the test binary)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-30 13:39:48 +03:00
jif-oai
c37f7434ba Gate multi-agent v2 tools independently of collab (#20246)
## Why

`multi_agents_v2` is meant to be independently gated from the older
`collab` feature. The tool registry still treated the
collaboration-style agent tools as `collab`-only, so enabling
`multi_agents_v2` without `collab` omitted the v2 agent tools. Review
and guardian sub-sessions also need to keep agent spawning disabled even
when the outer session has `multi_agents_v2` enabled.

## What changed

- Include the collab-backed agent tools when either `multi_agents_v2` or
`collab` is enabled.
- Explicitly disable `multi_agents_v2` for review and guardian review
sub-sessions, matching the existing `spawn_csv` and `collab`
restrictions.
- Add a registry test that enables `multi_agents_v2`, disables `collab`,
and verifies the v2 agent tools are present while legacy `send_input`
and `resume_agent` remain hidden.

## Testing

- Added
`test_build_specs_multi_agent_v2_does_not_require_collab_feature`.
2026-04-30 10:23:31 +02:00
Eric Traut
a73403a890 Make missing config clears no-ops (#20334)
## Why

Fixes #20145.

`config/value/write` treats a JSON `null` value as a request to clear
the config key. Clearing a key that is already absent should be
idempotent, but clearing a nested key such as `features.personality`
from an empty `config.toml` returned `configPathNotFound` because
`clear_path` treated the missing `features` parent table as an error.

That makes app-server reset flows brittle because clients have to read
first and avoid sending a clear request unless the parent path already
exists.

## What Changed

- Updated app-server config clearing so missing intermediate tables, or
non-table parents, are treated as an unchanged no-op.
- Removed the now-unreachable `MergeError::PathNotFound` path from
config write merging.
- Added a regression test covering `features.personality = null` against
an empty user config.

## Verification

- `cargo test -p codex-app-server clear_missing_nested_config_is_noop`
- `cargo test -p codex-app-server` was run; the config manager unit
suite passed, but one unrelated integration test failed because
`turn_start_emits_thread_scoped_warning_notification_for_trimmed_skills`
expected `7` trimmed skills and observed `8`.
- `just fix -p codex-app-server`
2026-04-30 10:13:33 +02:00
xl-openai
87d0cf1a62 feat: Add workspace plugin sharing APIs (#20278)
1. Adds v2 plugin/share/save, plugin/share/list, and plugin/share/delete
RPCs.
2. Implements save by archiving a local plugin root, enforcing a size
limit, uploading through the workspace upload flow, and supporting
updates via remotePluginId.
3. Lists created workspace plugins
4. Deletes a previously uploaded/shared plugin.
2026-04-29 23:49:20 -07:00
Michael Bolin
ae863e72a2 ci: increase Windows release workflow timeouts (#20343)
## Why

#20271 increased the `90`-minute timeout in `rust-release.yml`, but it
did not update the reusable Windows workflow in
`rust-release-windows.yml`. As a result, the Windows release compile
jobs were still capped at `60` minutes and the `windows-x64` primary
build could continue timing out.

We are keeping the existing `90`-minute timeout in `rust-release.yml`.
That increase was still directionally correct because the top-level
release build benefits from extra headroom; the mistake was assuming it
also covered the reusable Windows jobs.

## What Changed
- increase the reusable Windows release workflow timeouts in
`rust-release-windows.yml` from `60` minutes to `90` minutes
- update the comment in `rust-release.yml` so it no longer implies that
the top-level timeout covers the Windows reusable jobs
2026-04-29 23:27:04 -07:00
Abhinav
8f3c06cc97 Add persisted hook enablement state (#19840)
## Why

After `hooks/list` exposes the hook inventory, clients need a way to
persist user hook preferences, make those changes effective in
already-open sessions, and distinguish user-controllable hooks from
managed requirements without adding another bespoke app-server write
API.

## What

- Extends `hooks/list` entries with effective `enabled` state.
- Persists user-level hook state under `hooks.state.<hook-id>` so the
model can grow beyond a single boolean over time.
- Uses the existing `config/batchWrite` path for hook state updates
instead of introducing a dedicated hook write RPC.
- Refreshes live session hook engines after config writes so
already-open threads observe updated enablement without a restart.

## Stack

1. openai/codex#19705
2. openai/codex#19778
3. This PR - openai/codex#19840
4. openai/codex#19882

## Reviewer Notes

The generated schema files account for much of the raw diff. The core
behavior is in:

- `hooks/src/config_rules.rs`, which resolves per-hook user state from
the config layer stack.
- `hooks/src/engine/discovery.rs`, which projects effective enablement
into `hooks/list` from source-derived managedness.
- `config/src/hook_config.rs`, which defines the new `hooks.state`
representation.
- `core/src/session/mod.rs`, which rebuilds live hook state after user
config reloads.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-30 04:46:32 +00:00
Michael Bolin
ac4332c05b permissions: expose active profile metadata (#20095) 2026-04-29 20:54:59 -07:00
Matthew Zeng
ebe602d005 [plugins] Allow MSFT curated plugins in tool_suggest (#20304)
## Summary
- [x] Move the allowlist out of core crate
- [x] Add Teams, SharePoint, Outlook Email, and Outlook Calendar to the
tool_suggest discoverable plugin allowlist
- [x] Add focused coverage for Microsoft curated plugin discovery

## Testing
- just fmt
- cargo test -p codex-core-plugins
- cargo test -p codex-core
list_tool_suggest_discoverable_plugins_returns_
2026-04-29 19:45:52 -07:00
pakrym-oai
4e677d62da app-server: remove dead api version handling from bespoke events (#20291)
Remove ApiVersion::V1
2026-04-30 01:55:44 +00:00
rhan-oai
bb536d65bd [codex-analytics] prevent stale guardian events from satisfying reused reviews (#20080)
## Why

Reused Guardian review trunks can still have older child-turn events
queued when a later review starts. The review waiter currently accepts
the first terminal event it sees from the shared child session, so a
stale `TurnComplete` can be attributed to the new review. That produces
impossible analytics combinations such as non-null TTFT with sub-10 ms
completion latency and zero token deltas on `trunk_reused` reviews.

## What changed

- Preserve the child turn id returned by the Guardian review
`Op::UserTurn` submission.
- Restrict Guardian review waiting to events correlated with that
submitted child turn.
- Restrict timeout/abort draining to terminal events for the same child
turn.
- Add regression coverage for stale prior-turn completions, stale
prior-turn errors, and interrupt draining in
`codex-rs/core/src/guardian/review_session.rs`.

## Verification

- `cargo test -p codex-core guardian::review_session::tests::`
- `cargo clippy -p codex-core --tests -- -D warnings`
2026-04-29 18:26:39 -07:00
Alex Zamoshchin
8b07132e09 update codex_plugins_beta_setting (from workspace settings) (#20250)
update the name after rename internally

see https://github.com/openai/openai/pull/871006
2026-04-30 00:40:25 +00:00
Eric Traut
515aa9a4fb tui: return from side chat on Ctrl-D (#20282)
## Why

Fixes #20264.

Side conversations are an ephemeral layer on top of the main chat.
Pressing `Ctrl+D` from an empty side-chat composer should unwind back to
the parent thread, matching the existing side-return behavior, instead
of falling through to the global quit shortcut and exiting Codex.

## What changed

The side-return shortcut matcher now treats `Ctrl+D` the same way it
already treats `Esc` and `Ctrl+C`. Because app-level side-return
handling runs before the chat widget's global quit handling, this
returns from `/side` while preserving normal `Ctrl+D` quit behavior
outside side conversations.

The existing shortcut coverage was updated to include lowercase and
uppercase `Ctrl+D` key events.

## Verification

- `cargo test -p codex-tui
side_return_shortcuts_match_esc_ctrl_c_and_ctrl_d`
- `cargo test -p codex-tui` starts successfully and the new shortcut
test passes, but the broader suite later aborts in the unrelated
existing test
`app::tests::attach_live_thread_for_selection_rejects_unmaterialized_fallback_threads`
with a stack overflow.
2026-04-29 17:26:11 -07:00
pakrym-oai
fedcefe9da Reduce the surface of collaboration modes (#20149)
Collaboration modes were slightly invasive both into ThreadManager
construction and ModelProvider
2026-04-29 17:22:41 -07:00
stefanstokic-oai
c8abcbf925 Import external agent sessions in background (#20284)
Summary:
- Return from external agent import before session history import
finishes
- Run session import work in the background and emit the existing
completion notification when it is done
- Serialize session imports so duplicate requests do not create
duplicate imported threads

Verification:
- cargo test -p codex-app-server external_agent_config_
- cargo test -p codex-external-agent-sessions
- just fix -p codex-app-server
- just fix -p codex-external-agent-sessions
- git diff --check
2026-04-30 00:00:41 +00:00
alexsong-oai
7bcd4626c4 Consume ai-title from external sessions and add end marker (#20261)
## Summary
- Support Claude Code `ai-title` / `aiTitle` records when detecting and
importing external agent sessions.
- Preserve existing `custom-title` / `customTitle` precedence; only fall
back to `aiTitle` when no custom title is present.
- Add coverage for both detection and import title selection, including
the custom-title-over-ai-title case.

## Testing
- `cargo test -p codex-external-agent-sessions`
- `just fix -p codex-external-agent-sessions`
2026-04-30 00:00:13 +00:00
Abhinav
8774229a89 Add hooks/list app-server RPC (#19778)
## Why

We need a way to list the available hooks to expose via the TUI and App
so users can view and manage their hooks

## What

- Adds `hooks/list` for one or more `cwd` values that returns discovered
hook metadata

## Stack

1. openai/codex#19705
2. This PR - openai/codex#19778
3. openai/codex#19840
4. openai/codex#19882

## Review Notes

The generated schema files account for most of the raw diff, these files
have the core change:

- `hooks/src/engine/discovery.rs` builds the inventory entries during
hook discovery while leaving runtime handlers focused on execution.
- `app-server/src/codex_message_processor.rs` wires `hooks/list` into
the app-server flow for each requested `cwd`.
- `app-server-protocol/src/protocol/v2.rs` defines the new v2
request/response payloads exposed on the wire.

### Core Changes

`core/src/plugins/manager.rs` adds `plugins_for_layer_stack(...)` so
`skills/list` and `hooks/list`can resolve plugin state for each
requested `cwd`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-29 23:39:57 +00:00
Michael Bolin
6eab7519b4 chore: increase release build timeout from 60 min to 90 (#20271)
Build times are creeping up, so increase the timeout as a precaution.
2026-04-29 16:19:59 -07:00
rafael-jac
98f67b15d3 Update Codex login success page UX (#20136)
## Summary

update the local login success page to match the Codex desktop auth UX
use theme-aware colors and an inline 20px Codex mark
keep the actual localhost success page aligned with the browser auth UX
PR

## Tests

<img width="1728" height="1117" alt="Screenshot 2026-04-29 at 12 00
34 PM"
src="https://github.com/user-attachments/assets/76a40c3f-07c3-452c-97da-e7c43717cd2c"
/>
2026-04-29 19:14:53 -04:00
evawong-oai
74f06dcdfb Enforce workspace metadata protections in Linux sandbox (#19852)
## Summary

Enforce FileSystemSandboxPolicy protected metadata names in the Linux
bubblewrap adapter so `.git`, `.agents`, and `.codex` remain read only
inside writable workspace roots unless the policy grants an explicit
write carveout.

## Scope

1. Translate protected metadata names from FileSystemSandboxPolicy into
bubblewrap masks for existing metadata paths.
2. Represent missing protected metadata paths as guarded mount targets
so agents cannot create `.git`, `.agents`, or `.codex` under writable
roots.
3. Preserve normal git discovery for existing repos, worktrees, and
parent repos.
4. Keep explicit user write grants working when policy allows a
protected metadata path directly.

## Not in scope

1. No shell preflight UX.
2. No TUI runtime profile propagation.
3. No macOS Seatbelt changes in this PR.

## Reviewer focus

1. This should be reviewed as the Linux enforcement adapter for the
policy primitive from PR 19846.
2. macOS enforcement already landed in PR 19847.
3. The important invariant is that `FileSystemSandboxPolicy` is the
source of truth for `.git`, `.agents`, and `.codex`.

## Validation

1. `git diff` whitespace check passed.
2. `cargo fmt` check passed with the existing stable rustfmt warning
about `imports_granularity`.
3. Full Linux sandbox Cargo test suite passed on the devbox.
4. Devbox forty six case suite passed at head
`012accb703c13bd28df5b40079a9bf183036336a`.
5. Devbox summary: pass 46, fail 0.
6. The devbox suite was run through `just c sandbox linux`.
7. Focused repo test for Viyat parent repo case passed on the devbox.
2026-04-29 16:14:14 -07:00
iceweasel-oai
13dbcda28f stop blocking unified_exec on Windows (#19435)
## Summary
- remove the Windows-specific unified-exec environment block from tool
selection
- keep `unified_exec` default-off on Windows unless the feature is
explicitly enabled
- normalize model-provided `shell_type = unified_exec` to
`shell_command` when the feature is disabled
- drop obsolete tests tied to the removed environment gate and keep the
feature-flag regression coverage

## Why
Now that the session/long-lived process backend is implemented for the
Windows sandbox, we don't need to hard disable it anymore. We will be
rolling out slowly using a feature gate.

## Impact
This allows manual Windows opt-in in CLI and app-backed flows while
preserving the existing default-off behavior for Windows users.

---------

Co-authored-by: canvrno-oai <kbond@openai.com>
Co-authored-by: Codex <noreply@openai.com>
2026-04-29 16:06:33 -07:00
pakrym-oai
8de2a7a16d Add codex-core public API listing (#20243)
Summary:
- Add a checked-in codex-core public API listing generated by
cargo-public-api.
- Add scripts/regen-public-api.sh with an embedded crate list,
auto-install for cargo-public-api 0.51.0, pinned nightly, and --check
mode.
- Add Rust CI jobs on the codex Linux x64 runner pool to verify the
listing stays up to date.

Testing:
- bash -n scripts/regen-public-api.sh
- just regen-public-api --check
- yq '.' .github/workflows/rust-ci.yml
.github/workflows/rust-ci-full.yml
- git diff --check
2026-04-29 22:58:08 +00:00
Rasmus Rygaard
782191547c Add agent graph store interface (#19229)
## Summary

Persisted subagent parent/child topology currently leaks through
`StateRuntime`'s SQLite-specific thread-spawn helpers. This PR
introduces a narrow `AgentGraphStore` boundary so follow-up work can
route graph operations through a local or remote store without coupling
orchestration code directly to the state DB graph API.

## Changes

- Adds the new `codex-agent-graph-store` crate.
- Defines a flat `AgentGraphStore` trait for the v1 graph surface:
upsert edge, set edge status, list direct children, and list
descendants.
- Adds public graph types for `ThreadSpawnEdgeStatus`,
`AgentGraphStoreError`, and `AgentGraphStoreResult`.
- Implements `LocalAgentGraphStore` on top of an existing
`codex_state::StateRuntime`, preserving today's SQLite-backed
`thread_spawn_edges` behavior.
- Registers the crate in Cargo/Bazel metadata.

This PR only adds the local contract and implementation; call-site
migration and the remote gRPC store are left to the follow-up PRs in the
stack.

## Testing

- `cargo test -p codex-agent-graph-store`

The new unit tests cover local parity with the existing `StateRuntime`
graph methods, `Open`/`Closed` filtering, status updates, and stable
breadth-first descendant ordering.
2026-04-29 22:48:26 +00:00
Matthew Zeng
e20391e567 [mcp] Fix plugin MCP approval policy. (#19537)
Plugin MCP servers are loaded from plugin manifests rather than
top-level `[mcp_servers]`, so their tool approval preferences need to be
stored and applied through the owning plugin config. Without this,
choosing "Always allow" for a plugin MCP tool could write a preference
that was not reliably used on later tool calls.

## Summary
- Add plugin-scoped MCP policy config under
`plugins.<plugin>.mcp_servers`, including server enablement, tool
allow/deny lists, server defaults, and per-tool approval modes.
- Overlay plugin MCP policy onto manifest-provided server configs when
plugins are loaded.
- Route persistent "Always allow" writes for plugin MCP tools back to
the owning `plugins.<plugin>.mcp_servers.<server>.tools.<tool>` config
entry.
- Reload user config after persisting an approval and make the plugin
load cache config-aware so stale plugin MCP policy is not reused after
`config.toml` changes.
- Regenerate the config schema and add coverage for plugin MCP policy
loading, approval lookup, persistence, and stale-cache prevention.

## Testing
- `cargo test -p codex-config`
- `cargo test -p codex-core-plugins`
- `cargo test -p codex-core --lib plugin_mcp`
2026-04-29 15:40:03 -07:00
Eric Traut
4241df4d79 Escape turn metadata headers as ASCII JSON (#19620)
## Why

`x-codex-turn-metadata` is sent as an HTTP/WebSocket header, but Codex
was serializing the metadata JSON with raw UTF-8 string contents. When a
workspace path contains non-ASCII characters, common HTTP stacks can
reject or corrupt that header before the request reaches the provider.

Fixes #17468. Also addresses the duplicate WebSocket report in #19581.

## What changed

- Added `codex_utils_string::to_ascii_json_string`, a shared helper that
serializes JSON normally while escaping non-ASCII string content as
`\uXXXX`.
- Switched turn metadata header serialization, including merged
Responses API client metadata, to use the ASCII-safe JSON helper.
- Added coverage for non-ASCII workspace paths and non-ASCII client
metadata while preserving the same parsed JSON values.

## Verification

- `cargo test -p codex-utils-string`
- `cargo test -p codex-core turn_metadata`
- `just bazel-lock-check`
2026-04-29 15:35:33 -07:00
Michael Bolin
b1546008fc docs: discourage #[async_trait] and #[allow(async_fn_in_trait)] (#20242)
## Why

We have run into two avoidable problems when introducing async trait
APIs in Rust:

- `#[async_trait]` has caused materially worse build times in this
repository.
- `#[allow(async_fn_in_trait)]` makes it too easy to ship a public trait
without spelling out whether the returned future is `Send`, which hides
an important part of the trait contract.

We already have a good example of the preferred alternative in
[#16630](https://github.com/openai/codex/pull/16630) /
[`3c7f013f9735`](https://github.com/openai/codex/commit/3c7f013f9735),
but that guidance currently lives only as prior art in the codebase.
This PR documents the rule in `AGENTS.md` so contributors are more
likely to follow the native RPITIT pattern before these two shortcuts
spread further.

## What Changed

- added Rust guidance in `AGENTS.md` discouraging both `#[async_trait]`
and `#[allow(async_fn_in_trait)]`
- pointed contributors to the native RPITIT pattern with explicit `Send`
bounds on the returned future
- clarified that implementations may still use `async fn` when they
satisfy that trait contract

## Verification

- docs-only change; no tests run
2026-04-29 15:29:29 -07:00
Alex Daley
f63b19bedd [apps] Add apps MCP path override (#20231)
Summary

- Add `[features.apps_mcp_path_override]` config with a `path` field for
overriding only the built-in apps MCP path.
- Keep existing host/base URL derivation unchanged and append the
configured path after that base.
- Regenerate the config schema with the custom feature-config case.

Test Plan

- Not run for latest revision; only `just fmt` and `just
write-config-schema` were run.
- Earlier revision: `cargo test -p codex-features`
- Earlier revision: `cargo test -p codex-mcp`
2026-04-29 18:08:06 -04:00
xli-oai
8d5da3ffe5 Fallback login callback port when default is busy (#19334)
## Summary
- Keep the preferred ChatGPT login callback port `1455` first.
- Preserve the existing `/cancel` recovery for stale Codex login
servers.
- Fall back to the registered localhost callback port `1457` when `1455`
remains unavailable.

## Why
Cursor and Codex Desktop both use the ChatGPT account login callback
server. On Windows, Cursor can already be listening on `127.0.0.1:1455`
/ `[::1]:1455`, causing Codex Desktop sign-in to fail with:

`Local callback port 1455 is already in use on this machine.`

Codex already attempted to cancel a stale Codex login server on that
port, but if the listener does not release the port, the old behavior
was to fail. The new behavior falls back to `1457`, which matches the
fixed redirect URI being registered server-side in
`openai/openai#863817`. This keeps the OAuth `redirect_uri` inside
Hydra's exact allow-list instead of choosing an arbitrary ephemeral
port.

## Validation
- `just fmt`
- `cargo test -p codex-login`
- `git diff --check HEAD~1..HEAD`
2026-04-29 14:45:27 -07:00
rhan-oai
72a39e3a96 [app-server] centralize client response analytics (#20059)
## Why

The precursor PR keeps successful client responses typed until
app-server's outgoing response seam. This follow-up uses that seam to
move successful client-response analytics out of individual handlers and
into the shared sender path, while keeping filtering decisions inside
`codex-analytics`.

## What changed

- Emit successful client-response analytics centrally from
`OutgoingMessageSender::send_response`.
- Remove duplicate handler-local response tracking for the current
thread/turn lifecycle responses.
- Keep analytics ingestion selective inside `AnalyticsEventsClient`, so
unrelated client traffic is ignored before cloning or boxing.
- Collapse client-response analytics facts onto one typed path and
normalize payloads in the reducer.
- Add direct client-filter coverage plus sender-level coverage for the
centralized forwarding path.

## Verification

- `cargo test -p codex-analytics`
- `cargo test -p codex-app-server outgoing_message::tests --lib`
2026-04-29 21:22:39 +00:00
xli-oai
afbddabc8b Require remote plugin detail before uninstall (#19966)
## Summary
- Fetch remote plugin detail before sending the uninstall request.
- Use the detail response to derive the marketplace namespace and plugin
name for cache cleanup.
- Stop the uninstall before the backend POST if detail lookup fails, so
backend state and local cache state do not diverge.

## Testing
- `just fmt`
- `cargo test -p codex-app-server plugin_uninstall`
- `cargo test -p codex-core-plugins`
- `git diff --check`
2026-04-29 14:01:11 -07:00
rhan-oai
973c5c823e [app-server] type client response payloads (#20050)
## Why

`pr17088` adds typed server-originated request/response plumbing, but
successful client responses are still erased into bare JSON-RPC `result`
values before app-server can make any typed decision about them.

This precursor PR keeps successful client responses typed until the
outgoing response seam. It is intentionally limited to
protocol/app-server plumbing so the analytics behavior change can review
separately on top.

## What changed

- Add `ClientResponsePayload` as the pre-serialization client response
body type.
- Route app-server successful response paths through the typed payload
seam while preserving existing handler-local analytics behavior.
- Keep `InterruptConversation` JSON-RPC-only because it has no
`ClientResponse` variant.
- Move the new payload conversion tests into a dedicated protocol test
module.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server-protocol`
2026-04-29 20:50:47 +00:00
sayan-oai
b15074d0a4 app-server: fix outgoing sender test setup (#20258)
## Why

[#17088](https://github.com/openai/codex/pull/17088) changed
`OutgoingMessageSender::new` to require an `AnalyticsEventsClient`, but
one `command_exec` test added earlier on `main` still called the old
one-argument constructor. That leaves current `main` failing to compile
in Bazel and argument-comment-lint jobs.

## What changed

- Pass `AnalyticsEventsClient::disabled()` to the missed
`OutgoingMessageSender::new` test call site in `command_exec.rs`.

## Verification

- `cargo test -p codex-app-server
timeout_or_cancellation_reports_cancellation_without_timeout_exit_code`
2026-04-29 20:47:20 +00:00
Matthew Zeng
8ce48f9968 [tool_suggest] Improve tool_suggest triggering conditions. (#20091)
## Summary
- Tighten `tool_suggest` guidance so it prefers explicit plugin install
requests, while still allowing a connector install when the relevant
plugin is already installed and a needed connector from that plugin is
missing.
- Tell the model not to call `tool_suggest` in parallel with other
tools.

## Testing
- `cargo test -p codex-tools tool_suggest`
- `cargo test -p codex-core tool_suggest`
2026-04-29 13:41:12 -07:00
rhan-oai
0690ab0842 [codex-analytics] ingest server requests and responses (#17088)
## Why

Codex analytics needs a typed seam for app-server-originated
request/response traffic so future tool-approval analytics can consume
those facts without adding bespoke callsite tracking each time. Server
responses arrive as JSON-RPC `id + result` payloads, so analytics has to
reconstruct the matching typed response from the original typed request
while that request context still exists in app-server.

This also puts analytics on the app-server outbound path, which needs to
avoid keeping the runtime alive during shutdown. The final ownership fix
keeps the normal strong auth-manager retention in analytics and makes
the external-auth refresh bridge hold a weak back-reference to
`OutgoingMessageSender`, breaking the runtime cycle at the bridge
boundary instead of exposing retention policy through the analytics
client API.

## What changed

- Adds typed `ServerRequest` and `ServerResponse` analytics facts, plus
`AnalyticsEventsClient::track_server_request` and
`track_server_response`.
- Renames the existing client-side facts to `ClientRequest` and
`ClientResponse` so reducers can distinguish client-to-server traffic
from server-to-client traffic.
- Adds `ServerRequest::response_from_result`, allowing a stored typed
request to decode the matching typed server response from a raw JSON-RPC
result payload.
- Threads `AnalyticsEventsClient` through `OutgoingMessageSender` and
records targeted server requests, replayed targeted requests, and
matching targeted responses with the responding connection id needed for
correlation.
- Intentionally leaves broadcast server requests/responses out of
analytics for now because the current model is per connection, while
broadcasts fan one logical request out across multiple connections.
- Breaks the app-server shutdown cycle by storing
`Weak<OutgoingMessageSender>` in `ExternalAuthRefreshBridge` and
upgrading it only when an external-auth refresh is actually requested.
- Keeps reducer ingestion of the new server-side facts as no-ops for
now; this PR is plumbing for later tool-approval analytics work.

## Verification

- `cargo test -p codex-analytics`
- `cargo test -p codex-app-server outgoing_message::tests::`
- Covers typed-response reconstruction plus the targeted, replayed,
broadcast-exclusion, and response-attribution analytics paths.

## Follow-up

This PR intentionally stops at ingestion plumbing, so `ServerRequest`
and `ServerResponse` facts are still reducer no-ops. Once a follow-up PR
adds real downstream analytics output for those facts:

- replace the temporary pre-reducer observation seam with reducer tests
for the emitted event shape;
- add end-to-end coverage in `app-server/tests/suite/v2/analytics.rs`
for the real app-server workflow and captured analytics payload;
- remove the temporary sender-level observer tests added here in favor
of the real-output coverage above.

---

[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17088).
* #18748
* #18747
* #17090
* #17089
* #20241
* #20239
* __->__ #17088
2026-04-29 19:56:41 +00:00
iceweasel-oai
9d1e5df4b2 expand the set of core shell env vars for Windows. (#20089)
https://github.com/openai/codex/issues/13917 and
https://github.com/openai/codex/issues/18248 correctly identify that

```
[shell_environment_policy]
inherit = "core"
```
is not functional on Windows because it carries an insufficient set of
env vars.
This PR expands that to match the more functional set from the MCP
client
2026-04-29 19:23:46 +00:00
viyatb-oai
07c8b8c77c fix: handle deferred network proxy denials (#19184)
## Why

This bug is exposed by Guardian/auto-review approvals. With the managed
network proxy enabled, a blocked network request can be reported back
through the network approval service as an approval denial after the
command has already started. Before this change, the shell and unified
exec runtimes registered those network approval calls, but did not have
a way to observe an async proxy denial as a cancellation/failure signal
for the running process.

The result was confusing: Guardian/auto-review could correctly deny
network access, but the command path could keep running or unregister
the approval without surfacing the denial as the command failure.

## What Changed

- `NetworkApprovalService` now attaches a cancellation token to active
and deferred network approvals.
- Proxy-denial outcomes are recorded only for active registrations,
cancel the owning token, and are consumed when the approval is
finalized.
- The shell runtime combines the normal command timeout with the
network-denial cancellation token.
- Unified exec stores the deferred network approval object, terminates
tracked processes when the proxy denial arrives, and returns the denial
as a process failure while polling or completing the process.
- Tool orchestration passes the active network approval cancellation
token into the sandbox attempt and preserves deferred approval errors
instead of silently unregistering them.
- App-server `command/exec` now handles the combined
timeout-or-cancellation expiration variant used by the runtime.

## Verification

- `cargo test -p codex-core network_approval --lib`
- `cargo clippy -p codex-app-server --all-targets -- -D warnings`
- `cargo clippy -p codex-core --all-targets -- -D warnings`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-29 19:13:57 +00:00
xl-openai
73cd831952 feat: Use remote installed plugin cache for skills and MCP (#20096)
- Fetches and caches remote /installed plugin state
- Lets skills/list load skills from remote-installed cached plugins
without requiring a local marketplace entry
- Routes plugin list/startup/install/uninstall changes through async
plugin cache invalidation and MCP refresh
2026-04-29 12:09:49 -07:00
Won Park
5cf0adba93 Include auto-review rollout in feedback uploads (#20064)
## Summary

- include the live auto-review trunk rollout when `/feedback` uploads
logs
- upload that attachment as
`auto-review-rollout-<parent-thread-id>.jsonl` so it is distinguishable
from the parent rollout
- show the same auto-review attachment name in the TUI consent popup

## Scope

- this only covers the live cached auto-review trunk for the current
parent thread
- it does not add durable historical parent->auto-review lookup
- it does not add persisted rollout support for ephemeral parallel
review forks

## UI 

<img width="599" height="185" alt="Screenshot 2026-04-28 at 1 17 18 PM"
src="https://github.com/user-attachments/assets/6a0e79c2-5d21-4702-8a89-f765778bc9e9"
/>

## Validation

- `cargo test -p codex-core
cached_guardian_subagent_exposes_its_rollout_path`
- `cargo test -p codex-feedback`
- `cargo test -p codex-app-server`
- `cargo test -p codex-tui feedback_upload_consent_popup_snapshot`
- `cargo test -p codex-tui
feedback_good_result_consent_popup_includes_connectivity_diagnostics_filename`

## Known unrelated local failures

- `cargo test -p codex-core` currently fails in the pre-existing proxy
env snapshot test
`tools::runtimes::tests::maybe_wrap_shell_lc_with_snapshot_keeps_user_proxy_env_when_proxy_inactive`
- `cargo test -p codex-tui` currently hits pre-existing `status::*`
snapshot drift unrelated to this change

## Follow-Up 
- persist parallel auto-review fork sessions so /feedback can include
their rollout history too
- attach each persisted fork as its own clearly named file, for example
auto-review-rollout-<parent-thread-id>-fork <n>.jsonl, instead of
merging multiple Guardian sessions into one attachment
- keep the same live-session-only scope initially; durable historical
parent -> auto-review lookup can remain a separate decision if we later
need feedback from resumed sessions
2026-04-29 11:44:55 -07:00
friel-openai
05fd904572 test protocol: lock inter-agent commentary phase (#20046)
## Summary
- add a regression test for
`InterAgentCommunication::to_response_input_item`
- assert replayed inter-agent messages keep `phase:
Some(MessagePhase::Commentary)`

## Test plan
- `cargo test -p codex-protocol`
- `just argument-comment-lint`
2026-04-29 11:24:17 -07:00
pakrym-oai
8356806fc9 Add ThreadManager sample crate (#20141)
Summary:
- Add codex-thread-manager-sample, a one-shot binary that starts a
ThreadManager thread, submits a prompt, and prints the final assistant
output.
- Pass ThreadStore into ThreadManager::new and expose
thread_store_from_config for existing callsites.
- Build the sample Config directly with only --model and prompt inputs.

Verification:
- just fmt
- cargo check -p codex-thread-manager-sample -p codex-app-server -p
codex-mcp-server
- git diff --check

Tests: Not run per request.
2026-04-29 11:21:06 -07:00
joeytrasatti-openai
47fba5df4a [codex-backend] Prefer sqlite git info for rollout-path reads (#20228)
### Summary

- Path-based local thread reads currently return rollout/session git
metadata directly, so `thread/resume` can disagree with persisted SQLite
metadata for the same thread.
- Merge non-null SQLite git fields over rollout-path reads while keeping
rollout values as fallbacks for fields SQLite does not know.
- Add focused regression coverage for rollout-path reads so persisted
branch updates are preserved during resume.

### Testing

- `cargo test -p codex-thread-store`
2026-04-29 17:54:37 +00:00
Eric Traut
d0204c3dcc TUI: Remove core protocol dependency [3/7] (#20174)
## Why

This is part 3 of a 7-PR stack to remove direct
`codex_protocol::protocol` usage from `codex-tui` while keeping each
layer reviewable and shippable.

With `AppCommand` now explicit, the internal app event bus can carry TUI
commands directly instead of bouncing through core `Op` values.

## What changed

- Changed `AppEvent::CodexOp` and `AppEvent::SubmitThreadOp` to carry
`AppCommand`.
- Updated app-event senders and direct emitters to submit `AppCommand`
values.
- Adjusted tests to match `AppCommand` or convert back through
`into_core()` where they intentionally assert legacy payload equality.

## Verification

- `cargo test -p codex-tui --no-run`
2026-04-29 10:52:10 -07:00
Eric Traut
445629815c TUI: Remove core protocol dependency [2/7] (#20173)
## Why

This is part 2 of a 7-PR stack to remove direct
`codex_protocol::protocol` usage from `codex-tui` while keeping each
layer reviewable and shippable.

Before the TUI event bus can stop carrying core `Op` values,
`AppCommand` needs to be an owned TUI command shape rather than a thin
wrapper around `Op`.

## What changed

- Replaced the opaque `AppCommand(Op)` wrapper with explicit owned
variants for the commands the TUI submits.
- Preserved `into_core()` so this layer does not yet change the
app/thread submission boundary.
- Kept existing core leaf types for now so this remains a mechanical
command-shape refactor.

## Verification

- `cargo check -p codex-tui`
2026-04-29 10:28:04 -07:00
cassirer-openai
df966996a7 [rollout-tracer] Match analysis messages on encrypted id. (#20123)
In some setups the summary or raw content can be dropped between
requests. This triggers a check in the reducer which expects that the
messages should remain identical between requests.

This PR relaxes the checks to only focus on the encrypted ID instead. It
also changes the reducer to keep the most rich version of the message
observed during the rollout (this ensures that we don't accidentally
lose the CoT nor summary when available).
2026-04-29 17:22:24 +00:00
iceweasel-oai
cecca5ae06 Improve Windows process management edge cases (#19211)
## Summary

Some improvements to Windows process-management issues from
https://github.com/openai/codex/pull/15578

- bound the elevated runner pipe-connect handshake instead of waiting
forever on blocking pipe connects
- terminate the spawned runner if that handshake fails, so timeout/error
paths do not leave a stray `codex-command-runner.exe`
- loop on partial `WriteFile` results when forwarding stdin in the
elevated runner, so input is not silently truncated
- fix the concrete HANDLE/SID cleanup paths in the runner setup code
- keep draining driver-backed stdout/stderr after exit until the backend
closes, instead of dropping the tail after a fixed 200ms grace period
- reuse `LocalSid` for SID ownership and add more explanatory comments
around the ownership/concurrency-sensitive code paths

## Why

The original PR fixed a lot of Windows session plumbing, but there were
still a few sharp process-lifecycle edges:

- some elevated runner handshakes could block forever
- the new timeout path could still orphan the spawned runner process
- stdin forwarding still assumed a single `WriteFile` consumed the whole
buffer
- a few raw HANDLE/SID error paths still leaked
- driver-backed output could still lose the last chunk of stdout/stderr
on slower backends

## Validation

- `cargo fmt -p codex-windows-sandbox -p codex-utils-pty`
- `cargo test -p codex-utils-pty`
- `cargo test -p codex-windows-sandbox finish_driver_spawn`
- `cargo test -p codex-windows-sandbox runner_`

Ran a local test matrix of unified-exec and shell_tool tests, all
passing
2026-04-29 10:00:01 -07:00
Eric Traut
1c420a90cd TUI: Remove core protocol dependency [1/7] (#20172)
## Why

This is part 1 of a 7-PR stack to remove direct
`codex_protocol::protocol` usage from `codex-tui` while keeping each
layer reviewable and shippable.

This first layer reduces the size of the later `chatwidget` diff by
mechanically moving MCP startup bookkeeping out of the central widget
file without changing the event shapes or behavior.

## What changed

- Extracted MCP startup status handling into
`tui/src/chatwidget/mcp_startup.rs`.
- Kept the existing core event types in place for this purely mechanical
move.
- Updated the MCP startup tests to import the moved test-only event
types directly.

## Verification

- `cargo test -p codex-tui chatwidget::tests::mcp_startup`
2026-04-29 09:10:22 -07:00
Eric Traut
91ca551df8 Use /goal resume for paused goals (#20082)
## Why

The paused goal statusline currently points users at `/goal` to unpause
a goal, but bare `/goal` is the summary command and does not change the
goal state. Instead of making `/goal` mutate state only when a goal is
paused, this gives the action an explicit command that reads naturally
in the UI.

## What Changed

- Replace `/goal unpause` with `/goal resume` for reactivating a paused
goal.
- Update the paused goal statusline and `/goal` summary copy to point at
`/goal resume`.
2026-04-29 08:56:02 -07:00
jif-oai
70ac0f123c Make multi-agent v2 ignore agents.max_depth (#20180)
## Why

`agents.max_depth` is a legacy multi-agent v1 guard. Multi-agent v2 uses
task-path routing and its own session/thread limits, so v2 should not
reject nested `spawn_agent` calls just because the thread-spawn depth
has reached the v1 maximum.

Keeping the v1 depth guard active in v2 prevents deeper task trees even
though the v2 path still needs the depth value only for lineage and
task-path metadata.

## What Changed

- Removed the depth-limit rejection from the multi-agent v2
`spawn_agent` handler while still computing child depth for lineage/path
metadata.
- Made the depth-based disabling of legacy `SpawnCsv`/`Collab` tools
apply only when `Feature::MultiAgentV2` is disabled.
- Added `multi_agent_v2_spawn_agent_ignores_configured_max_depth` to
cover a v2 child spawning another agent when `agent_max_depth = 1`,
while the existing v1 depth-limit tests continue to enforce the legacy
behavior.

## Verification

- `cargo test -p codex-core
multi_agent_v2_spawn_agent_ignores_configured_max_depth -- --nocapture`
- `cargo test -p codex-core depth_limit -- --nocapture`
- `cargo test -p codex-core tools::handlers::multi_agents::tests --
--nocapture`
2026-04-29 12:23:00 +02:00
jif-oai
c41b74c453 nit: drop old memories things (#20186)
Drop legacy code
2026-04-29 12:19:50 +02:00
iceweasel-oai
5cac3f896d Fix Windows pseudoconsole attribute handling for sandboxed PTY sessions (#20042)
## Summary
Fix the Windows sandbox PTY spawn path to pass the pseudoconsole handle
value directly into `UpdateProcThreadAttribute`.

## Why
Sandboxed `unified_exec` PTY sessions on Windows were failing during
child process startup with `0xc0000142` (`STATUS_DLL_INIT_FAILED`). In
practice this showed up as PowerShell DLL init popups when the sandboxed
background-terminal path tried to launch an interactive shell.

The root cause was that we were passing a pointer to a local `isize`
variable instead of the pseudoconsole handle value in the form Windows
expects for `PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE`.

## Validation
- `cargo build -p codex-windows-sandbox --bins`
- Reproduced the real sandboxed `codex exec` flow with
`windows.sandbox_private_desktop=true`
- Verified a `tty=true` interactive session launched through the normal
PowerShell wrapper, printed `READY`, accepted follow-up stdin, and
exited cleanly
- Confirmed no new `0xc0000142` / `Application Popup` events appeared
after the successful repro
2026-04-29 11:59:45 +02:00
alexsong-oai
d92c909ee4 Fix migrated hook path rewriting (#20144)
## Summary
- Rewrite migrated external-agent hook commands by replacing the full
hook script path token instead of only the `.claude/hooks/` segment.
- Preserve quoting around the full rewritten target path so script names
with spaces, absolute paths, and shell operators/redirection continue to
work.
- Apply `.claude/settings.local.json` over `.claude/settings.json` for
config, MCP, and plugin migration so local scope matches Claude settings
precedence.
- Skip legacy command markdown without `description` frontmatter,
including README-style docs under `.claude/commands`.

## Root Cause
The previous hook rewrite handled `.claude/hooks/` as a substring
replacement. For absolute source commands, that left the original
project-root prefix before the newly quoted `.codex/hooks` directory,
producing invalid commands like
`project/'project/.codex/hooks'/script.sh`.

The migration also only used project `settings.json` for
config/MCP/plugin decisions, so local settings such as
`disabledMcpjsonServers` could be ignored even though Claude gives local
settings higher precedence than project settings.

## Validation
- `just fmt`
- `cargo test -p codex-external-agent-migration`
- `cargo test -p codex-app-server external_agent_config`
- `just fix -p codex-external-agent-migration`
- `just fix -p codex-app-server`
- `git diff --check`
2026-04-29 00:46:11 -07:00
viyatb-oai
5597925155 feat(cli): add sandbox profile config controls (#20118)
## Why

The explicit profile path from #20117 is meant for standalone testing,
but it still inherited the
shell cwd and all managed requirements implicitly. The pre-existing
launcher path even called out
that it did not support a separate cwd yet in

[`debug_sandbox.rs`](509453f688/codex-rs/cli/src/debug_sandbox.rs (L174-L179)).

For a standalone command, the useful default is to let the caller choose
the project directory being
tested and to avoid administrator-provided constraints unless the caller
explicitly wants to test
those too.

## What changed

- Add explicit-profile-only `-C/--cd DIR`, and use that cwd for both
profile resolution and command
  execution.
- Add explicit-profile-only `--include-managed-config`.
- Make explicit profile mode skip managed requirement sources by
default, including cloud
requirements, MDM requirements, `/etc/codex/requirements.toml`, and the
legacy managed-config
  requirements projection.
- Preserve all existing invocations outside the explicit-profile path.

## Stack

1. #20117 `sandbox-ui-profile`
2. #20118 `sandbox-ui-config` --> this PR

Both PRs are additive. Replay JSON is intentionally deferred to a
follow-up design pass.

## Tests ran

- `cargo test -p codex-cli debug_sandbox`
- `cargo test -p codex-cli sandbox_macos_`
- `cargo test -p codex-core
load_config_layers_can_ignore_managed_requirements`
- `cargo test -p codex-core
load_config_layers_includes_cloud_requirements`
- macOS branch-binary smoke on the rebased top of stack: `-C` changed
execution cwd, explicit
profile mode omitted managed proxy env under `env -i`, and
`--include-managed-config` restored it.
- Linux devbox branch-binary smoke on the rebased top of stack: `-C`
changed execution cwd for
  built-in and user-defined explicit profiles.
2026-04-29 06:55:51 +00:00
Andrey Mishchenko
857146b328 Delete multi_agent_v2 followup_task interrupt parameter (#20139)
Messages sent with `followup_task` already arrive at their target
recipient promptly (at message boundaries while sampling, or after the
pending tool call completes) -- having `interrupt` is not worth the
added complexity.
2026-04-28 23:19:48 -07:00
viyatb-oai
6ed0440611 feat(cli): add explicit sandbox permission profiles (#20117)
## Why

`codex sandbox` is useful for exercising sandbox behavior directly, but
before this stack the CLI
only picked up permission profiles indirectly from the active config.
The existing debug-sandbox path
already compiled `[permissions]` profiles through normal config loading,
as covered by the existing
profile tests in
[`debug_sandbox.rs`](de2ccf9473/codex-rs/cli/src/debug_sandbox.rs (L715-L760)).

This adds the smallest stable entry point first: an explicit profile
selector that reuses the same
config machinery as normal Codex config, so standalone testing becomes
possible without changing
current no-selector behavior.

## What changed

- Add additive `--permissions-profile NAME` support to `codex sandbox
macos|linux|windows`.
- Resolve built-in and user-defined profile names by feeding
`default_permissions` through the
existing config compilation path instead of inventing a sandbox-only
parser.
- Make an explicit selector win over an ambient active profile's legacy
`sandbox_mode`.
- Keep the existing no-selector behavior unchanged.

## Stack

1. #20117 `sandbox-ui-profile` --> this PR
2. #20118 `sandbox-ui-config`

Both PRs are additive. Replay JSON is intentionally deferred to a
follow-up design pass.

## Tests ran

- `cargo test -p codex-cli debug_sandbox`
- `cargo test -p codex-cli sandbox_macos_parses_permissions_profile`
- `cargo test -p codex-core
cli_override_takes_precedence_over_profile_sandbox_mode`
- macOS branch-binary smoke on the rebased top of stack: built-in
`:workspace` and user-defined
  profiles both executed successfully through `--permissions-profile`.
- Linux devbox branch-binary smoke on the rebased top of stack: built-in
`:workspace` and
user-defined profiles both executed successfully through
`--permissions-profile`.
2026-04-29 06:18:16 +00:00
Dylan Hurd
3d10ba9f36 chore(cli) deprecate --full-auto (#20133)
## Summary
Starts the process of getting rid of `--full-auto`, with some
concessions:
1. Fully removes the command from the tui, since it just resolves to the
default permissions there, and encourages users to use the one-time
trust flow if they're not in a trusted repo.
2. Marks the command as deprecated in `codex exec`, in case users are
actively relying on this. We'll remove in an upcoming n+X release.
3. Cleans up some of the `codex sandbox` cli logic, to keep supporting
legacy sandbox policies for now.

This isn't the cleanest setup, but I think it is worthwhile to warn
users for one release before hard-removing it.

## Testing 
- [x] Updated unit tests
2026-04-29 04:41:30 +00:00
starr-openai
e1ec9e63a0 Add environment provider snapshot (#20058)
## Summary
- Change `EnvironmentProvider` to return concrete `Environment`
instances instead of `EnvironmentConfigurations`.
- Make `DefaultEnvironmentProvider` provide the provider-visible `local`
environment plus optional `remote` environment from
`CODEX_EXEC_SERVER_URL`.
- Keep `EnvironmentManager` as the concrete cache while exposing its own
explicit local environment for `local_environment()` fallback paths.

## Validation
- `just fmt`
- `git diff --check`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 20:05:18 -07:00
xl-openai
6f328d5e02 Soften skill description budget warnings (#20112)
Updates skill description budget messaging to be less alarming
2026-04-28 19:56:25 -07:00
Michael Bolin
e6db1a9442 linux-sandbox: switch helper plumbing to PermissionProfile (#20106)
## Why

`PermissionProfile` is the canonical runtime permission model in the
Rust workspace, but the Linux sandbox helper still accepted a legacy
`SandboxPolicy` plus separate filesystem and network policy flags. That
translation layer made the helper interface harder to reason about and
left `linux-sandbox`-specific callers and tests coupled to the legacy
policy representation.

This change moves the helper onto `PermissionProfile` directly so the
Linux sandbox plumbing matches the rest of the permission stack.

## What changed

- changed `codex-linux-sandbox` to accept `--permission-profile` and
derive the runtime filesystem and network policies internally
- updated the in-process seccomp and legacy Landlock path in
`codex-rs/linux-sandbox` to operate on `PermissionProfile`
- updated Linux sandbox argv construction in `codex-rs/sandboxing`,
`codex-rs/core`, and the CLI debug sandbox path to pass the canonical
profile instead of serializing compatibility policy projections
- simplified the Linux sandbox tests to build the exact permission
profile under test, including the managed-proxy path and
direct-runtime-enforcement carveout coverage
- removed helper-local `SandboxPolicy` usage from `bwrap` tests where
`FileSystemSandboxPolicy` is already the value being exercised

## Testing

- `cargo test -p codex-sandboxing`
- `cargo test -p codex-linux-sandbox` (on this macOS host, the crate
compiled cleanly and its Linux-only tests were cfg-gated)
- `cargo test -p codex-core --no-run`
- `cargo test -p codex-cli --no-run`
2026-04-28 19:43:44 -07:00
Celia Chen
80fb0704ee feat: update Bedrock Mantle endpoint and GPT-5.4 model ID (#20109)
## Summary

Amazon Bedrock Mantle's OpenAI-compatible endpoint now lives under
`/openai/v1`, and the GPT-5.4 Mantle model ID no longer uses the `-cmb`
suffix. This updates Codex's built-in Bedrock provider configuration so
generated providers and the static Bedrock catalog use the current
endpoint and model ID.

## Changes

- Update the Bedrock Mantle base URL from
`https://bedrock-mantle.{region}.api.aws/v1` to
`https://bedrock-mantle.{region}.api.aws/openai/v1`.
- Update the Amazon Bedrock default base URL in
`codex-model-provider-info`.
- Change the Bedrock GPT-5.4 catalog slug from `openai.gpt-5.4-cmb` to
`openai.gpt-5.4`.
- Align provider and catalog tests with the new URL and model ID.

## Test Plan

- Manual smoke test:

  ```shell
  target/debug/codex \
      -m openai.gpt-5.4 \
      -c 'model_provider="amazon-bedrock"' \
      -c 'model_providers.amazon-bedrock.aws.region="us-west-2"'
  ```
2026-04-29 01:37:21 +00:00
Celia Chen
8c47e36504 feat: expose provider capability bounds to app server clients (#20049)
follow up of #19442. The app server now exposes provider-derived bounds
through a new v2 `modelProvider/read` method. The response reports the
configured provider map key as `modelProvider` and returns the effective
capability booleans so clients can align their UI with the same
provider-owned limits used by core.
2026-04-29 01:36:19 +00:00
canvrno-oai
4c39ad33cb Fix plugin list workspace settings test isolation (#20086)
Fixes test that often fails locally when running `cargo test`
- Add an app-server test helper that combines managed-config isolation
with custom env overrides.
- Isolate `HOME` / `USERPROFILE` in plugin-list workspace settings tests
so host home marketplaces do not affect results.
2026-04-28 18:34:38 -07:00
canvrno-oai
24be9ac0a4 Restore TUI working status after steer message is set (#19939)
Fix for #19925

Restore the `Working` indicator after a streamed final answer finishes
when a user steer message is sent.
Add regression coverage for long output plus a mid-stream steer:
`cargo test -p codex-tui
final_answer_completion_restores_status_indicator_for_pending_steer`

Duplication/testing steps:
1. Start a new thread and ask for a long response.
2. While the response is streaming, submit a steer message.
3. When the first response finishes, observe whether `Working...` is
shown while waiting for the steer message response.
2026-04-28 18:10:40 -07:00
Michael Bolin
c9f7c88f3d fix: restore live event submit path for apply patch tests (#20108)
## Summary

This fixes the CI regression introduced by
[#20040](https://github.com/openai/codex/pull/20040).

That PR migrated several `apply_patch_cli` tests from direct
`codex.submit(Op::UserTurn { ... })` calls to `harness.submit(...)`.
`harness.submit()` waits for `TurnComplete` before returning, which
drains the same event stream that these tests use to assert `TurnDiff`,
`PatchApplyUpdated`, and related live events. The regressed tests then
timed out waiting for events that had already been consumed.

This change restores a no-wait submit path for the event-observing
`apply_patch_cli` tests so they can watch the turn stream directly
again.

## What Changed

- added a local `submit_without_wait(...)` helper in
`codex-rs/core/tests/suite/apply_patch_cli.rs`
- switched the `apply_patch_cli` tests that assert live turn events back
to that helper
- left the profile-backed `harness.submit(...)` migration in place for
tests that only care about final filesystem or tool output state

## Why macOS Looked Green

In the failing run
[25084487331](https://github.com/openai/codex/actions/runs/25084487331),
`//codex-rs/core:core-all-test` was cached on macOS, so the regressed
tests were not rerun there. The Linux GNU, Linux MUSL, and Windows Bazel
jobs reran the target and exposed the failure.

## Verification

- `cargo test -p codex-core apply_patch_ -- --nocapture`
- previously failing local cases now pass again:
  - `apply_patch_cli_move_without_content_change_has_no_turn_diff`
  - `apply_patch_turn_diff_for_rename_with_content_change`
  - `apply_patch_aggregates_diff_across_multiple_tool_calls`
2026-04-28 18:09:20 -07:00
Celia Chen
f8fe96d548 feat: disable capabilities by model provider (#19442)
## Why

Unsupported features must fail closed and Codex must not expose
OpenAI-hosted fallback paths when the active provider cannot support
them. In practice, Bedrock should not surface app connectors, MCP
servers, tool search/suggestions, image generation, web search, or JS
REPL until those paths are explicitly supported for that provider.

This PR moves that decision into provider-owned capability metadata
instead of scattering Bedrock-specific checks across callers.

## What changed

- Adds `ProviderCapabilities` to `codex-model-provider`, with default
support for existing providers and a Bedrock override that disables
unsupported launch surfaces.
- Adds `ToolCapabilityBounds` to `codex-tools` so provider capability
limits can clamp otherwise-enabled tool config.
- Applies capability bounds when building session and review-thread tool
config.
- Routes MCP/app connector configuration through
`McpManager::mcp_config`, which filters configured MCP servers and app
connectors based on the active provider.
- Updates app-server MCP list/read paths to use the filtered MCP config.
- Adds coverage for default provider capabilities, Bedrock disabled
capabilities, and optional tool-surface clamping.

## Testing

built locally and verified that bedrock responses api now return without
errors calling unsupported tools.
2026-04-28 17:51:30 -07:00
alexsong-oai
cb8b1bbcd6 Support detect and import MCP, Subagents, hooks, commands from external (#19949)
## Why
This PR expands the migration path so Codex can detect and import MCP
server config, hooks, commands, and subagents configs in a Codex-native
shape.

## What changed

- Added a `codex-external-agent-migration` crate that owns conversion
logic for external-agent MCP servers, hooks, commands, and subagents.
- Extended the app-server external-agent config detection/import API
with migration item types for MCP server config, hooks, commands, and
subagents.

## Migration strategy

The migration is intentionally conservative: Codex only imports
external-agent config that can be represented safely in Codex today.
Unsupported or ambiguous config is skipped instead of being partially
translated into behavior that may not match the source system.

- **MCP servers**: import supported stdio and HTTP MCP server
definitions into `mcp_servers`. Disabled servers and servers filtered
out by source `enabledMcpjsonServers` / `disabledMcpjsonServers` are
skipped. Project-scoped MCP entries from `.claude.json` are included
when they match the repo path.
- **Hooks**: import only supported command hooks into
`.codex/hooks.json`. Unsupported hook features such as conditional
groups, async handlers, prompt/http hooks, or unknown fields are
skipped. Referenced hook scripts are copied into `.codex/hooks/`,
preserving any existing target scripts.
- **Commands**: import supported external commands as Codex skills under
`.agents/skills/source-command-*`. Commands that rely on source runtime
expansion such as `$ARGUMENTS`, `$1`, `@file` references, shell
interpolation, or colliding generated names are skipped.
- **Subagents**: import valid subagent Markdown files into
`.codex/agents/*.toml` when they have the minimum Codex agent fields.
Source model names are not migrated, so imported agents keep the user’s
Codex default model; compatible reasoning effort and sandbox mode are
migrated when present.
- **Skills and project guidance**: copy missing skill directories into
`.agents/skills` and migrate `CLAUDE.md` guidance into `AGENTS.md`,
rewriting source-agent terminology to Codex terminology where
appropriate.
- **Detection details**: detected migration items include lightweight
details for UI preview, such as MCP server names, hook event names,
generated command skill names, and subagent names. Import still
recomputes from disk instead of trusting details as the source of truth.

- Adds focused coverage for the new migration behavior and app-server
import flow.

## Verification

- `cargo test -p codex-external-agent-migration`
- `cargo test -p codex-hooks`
- `cargo test -p codex-app-server external_agent_config`
- `just bazel-lock-check`
2026-04-29 00:45:24 +00:00
Matthew Zeng
ebdf3a878c Support disabling tool suggest for specific tools. (#20072)
## Summary
- Add `disable_tool_suggest` to app and plugin config, schema, and
TypeScript output
- Exclude disabled connectors and plugins from tool suggestion discovery
- Persist "never show again" tool-suggestion choices back into
`config.toml`
- Update config docs and add coverage for connector and plugin
suppression

## Testing
- Added and updated unit tests for config persistence and tool-suggest
filtering
- Not run (not requested)
2026-04-29 00:19:34 +00:00
Michael Bolin
1211a90a35 core tests: migrate hook turns to profiles (#20041)
## Summary
- Removes `SandboxPolicy` from the hooks test suite.
- Submits hook-related turns with explicit `PermissionProfile` values
for disabled, read-only, and workspace-write cases.
- Preserves the managed-network hook test by configuring and submitting
a workspace-write profile with enabled network, allowing the existing
requirements-backed proxy path to remain covered.

## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
2026-04-28 17:18:45 -07:00
Michael Bolin
1fed948c66 core tests: migrate apply patch turns to profiles (#20040)
## Summary
- Removes `SandboxPolicy` from the apply-patch CLI test suite.
- Uses the harness' profile-backed submit helper for danger/no-sandbox
turns instead of constructing `Op::UserTurn` manually with legacy
fields.
- Converts the workspace-write traversal cases to submit
`PermissionProfile::workspace_write_with(...)` directly.

## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
2026-04-28 17:18:19 -07:00
Michael Bolin
1dae5788e1 core tests: migrate rmcp turns to profiles (#20037)
## Summary
- Removes `SandboxPolicy` from the RMCP client test suite.
- Adds shared read-only user-turn helpers that submit
`PermissionProfile::read_only()` plus the legacy compatibility
projection required by the current `Op::UserTurn` shape.
- Keeps sandbox metadata assertions intact by deriving the expected
legacy `sandboxPolicy` value from the same read-only profile used for
the turn.

## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
2026-04-28 17:17:47 -07:00
Michael Bolin
6662c0f312 core tests: migrate compact turns to profiles (#20035)
## Summary
- Removes the remaining `SandboxPolicy` usage from the compaction test
suite.
- Adds a small local helper for direct `Op::UserTurn` construction so
these tests send `PermissionProfile::Disabled` plus the legacy
compatibility projection required by the protocol field.
- Keeps the existing danger/full-access behavior while exercising the
canonical permission profile path.

## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
2026-04-28 17:17:12 -07:00
Michael Bolin
026df712cc core tests: migrate zsh-fork permissions to profiles (#20034)
## Summary
- Updates the zsh-fork test helper to configure `PermissionProfile`
directly instead of constructing a legacy `SandboxPolicy`.
- Sends permission-profile-backed turns from the skill approval zsh-fork
tests so the runtime and request path exercise the canonical permissions
model.
- Leaves the broader approvals suite on legacy policies for now, except
for the zsh-fork test that shares this helper.

## Verification
- `cargo check -p codex-core --tests`
- `just fmt`
2026-04-28 17:15:58 -07:00
Michael Bolin
1ea90410e1 core tests: migrate request permissions tool turns to profiles (#20033)
## Summary

This migrates the macOS request-permissions tool tests from legacy
`SandboxPolicy` setup to `PermissionProfile` setup. The tests still
exercise the same workspace-write baseline and request-permission
grants, but the canonical permissions value is now the profile.

## Changes

- Replaces the `workspace_write_excluding_tmp()` helper with a
`PermissionProfile::workspace_write_with()` helper.
- Applies test config through `Permissions::set_permission_profile()`.
- Uses `turn_permission_fields()` for `Op::UserTurn` compatibility
fields.
- Removes the `SandboxPolicy` import from `request_permissions_tool.rs`.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:15:13 -07:00
Michael Bolin
af39e488bc core tests: migrate prompt caching turns to profiles (#20032)
## Summary

This removes the explicit `SandboxPolicy` constructors from
`core/tests/suite/prompt_caching.rs`. The tests still exercise the same
prompt-cache invariants across permission and turn-context changes, but
the permission source is now `PermissionProfile`.

## Changes

- Uses `PermissionProfile::workspace_write_with()` for workspace-write
override scenarios.
- Uses `PermissionProfile::Disabled` for the no-sandbox per-turn
override.
- Projects profiles through `turn_permission_fields()` or
`to_legacy_sandbox_policy()` only to populate compatibility fields on
existing ops.
- Removes the `SandboxPolicy` import from `prompt_caching.rs`.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:13:53 -07:00
Michael Bolin
5d08315c00 core tests: migrate exec policy turns to profiles (#20030)
## Summary

This migrates `core/tests/suite/exec_policy.rs` away from legacy
`SandboxPolicy` turn construction. These tests all use no-sandbox turns
to exercise exec-policy behavior, so `PermissionProfile::Disabled` is
the canonical representation.

## Changes

- Replaces direct `SandboxPolicy::DangerFullAccess` turn fields with
`PermissionProfile::Disabled`.
- Uses `turn_permission_fields()` to populate the compatibility
`sandbox_policy` field required by `Op::UserTurn`.
- Removes the `SandboxPolicy` import from `exec_policy.rs`.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:12:48 -07:00
Michael Bolin
b599849d86 core tests: migrate permissions message tests to profiles (#20028)
## Summary

This removes another test-only `SandboxPolicy` dependency by configuring
`permissions_messages.rs` with a `PermissionProfile` directly. The test
still verifies the rendered compatibility permissions text, but now
obtains the legacy projection from the loaded `Config` rather than using
`SandboxPolicy` as the source of truth.

## Changes

- Builds the workspace-write test setup with
`PermissionProfile::workspace_write_with()`.
- Applies that profile through `Permissions::set_permission_profile()`.
- Uses `Config::legacy_sandbox_policy()` only for the expected
`PermissionsInstructions` compatibility rendering.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:12:10 -07:00
Michael Bolin
3ef09c71d3 core tests: migrate tools tests to permission profiles (#20027)
## Summary

This continues the test-side migration away from `SandboxPolicy` by
removing the remaining legacy policy setup in
`core/tests/suite/tools.rs`. The affected test was already modeling a
profile-backed filesystem policy with a deny-read glob, so configuring
the test through `Permissions::set_permission_profile()` is a better
match for the behavior being exercised.

## Changes

- Drops the `SandboxPolicy` import from `core/tests/suite/tools.rs`.
- Configures the glob deny-read shell test directly with a
`PermissionProfile` instead of creating a legacy read-only policy first.
- Submits the test turn with the session permission profile so the
deny-read glob remains active for the command under test.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:11:43 -07:00
Michael Bolin
8d3992d830 core tests: migrate plan item turns to profiles (#20026)
## Why

The core item tests still had a cluster of plan-mode `Op::UserTurn`
literals that used `SandboxPolicy::DangerFullAccess` and omitted
`permission_profile`. These tests are validating emitted item lifecycle
events, so keeping them on the legacy sandbox-only turn shape adds noise
to the broader permissions migration without testing legacy behavior.

## What Changed

- Adds a local `disabled_plan_turn()` helper that preserves the existing
`std::env::current_dir()` turn cwd behavior.
- Uses `turn_permission_fields(PermissionProfile::Disabled, cwd)` to
populate both the compatibility `sandbox_policy` and canonical
`permission_profile` fields.
- Replaces the plan-mode hand-built turns in
`codex-rs/core/tests/suite/items.rs`, removing all `SandboxPolicy`
references from that file and reducing remaining `codex-rs/core/tests`
`SandboxPolicy` files from 16 to 15.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:11:17 -07:00
Michael Bolin
162f4e3183 core tests: migrate safety check turns to profiles (#20024)
## Why

This stack is retiring direct `SandboxPolicy` construction from tests so
core coverage exercises the same `PermissionProfile` turn path used by
runtime code. `safety_check_downgrade.rs` still submitted each test turn
as `SandboxPolicy::DangerFullAccess` with no permission profile, even
though the tests are about model verification/reroute behavior rather
than legacy sandbox conversion.

## What Changed

- Adds a local `disabled_text_turn()` helper that derives both the
compatibility `sandbox_policy` and canonical `permission_profile` from
`PermissionProfile::Disabled`.
- Replaces repeated hand-built `Op::UserTurn` literals in
`codex-rs/core/tests/suite/safety_check_downgrade.rs` with that helper.
- Removes all `SandboxPolicy` references from the safety-check suite,
reducing the remaining `codex-rs/core/tests` files that mention
`SandboxPolicy` from 17 to 16.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:10:42 -07:00
Michael Bolin
2a8ce9b319 core tests: migrate view image turns to profiles (#20021)
## Why

This stack is removing direct `SandboxPolicy` usage from test code so
new tests exercise the same `PermissionProfile` path that runtime code
now treats as canonical. `view_image.rs` still built `Op::UserTurn`
requests with `SandboxPolicy::DangerFullAccess` and no permission
profile, which kept another core test module on the legacy turn shape.

## What Changed

- Adds a small `disabled_user_turn()` helper for the view-image suite
that derives the compatibility `sandbox_policy` and canonical
`permission_profile` from `PermissionProfile::Disabled`.
- Replaces repeated direct `Op::UserTurn` literals in
`codex-rs/core/tests/suite/view_image.rs` with that helper.
- Removes all `SandboxPolicy` references from `view_image.rs`, reducing
the remaining `codex-rs/core/tests` files that mention `SandboxPolicy`
from 18 to 17.

## Verification

- `cargo check -p codex-core --tests`
2026-04-28 17:09:48 -07:00
Michael Bolin
d77d23da2e core tests: migrate model/personality turns to profiles (#20018)
## Summary

- Migrates `model_switching.rs` and `personality.rs` direct
`Op::UserTurn` construction from legacy `SandboxPolicy` literals to
`PermissionProfile`-backed turn fields.
- Adds small local helpers in each file so tests keep asserting
model/personality behavior without repeating permission plumbing.
- Reduces `rg -l '\bSandboxPolicy\b' codex-rs/core/tests` from 20 files
to 18; `codex-rs/tui` remains at zero `SandboxPolicy` references.

## Testing

- `cargo check -p codex-core --tests`
- `just fmt`
2026-04-28 17:09:12 -07:00
Abhinav
5b0d9df1d0 Increase plugin hook env test timeout (#20100)
# Why

`plugin_hook_sources_run_with_plugin_env_and_plugin_source` can still
fail on Windows after the earlier file-based assertion cleanup because
the hook process itself occasionally exceeds the old 5s timeout under CI
load. When that happens, the hook run ends as `Failed` before the test
can inspect its structured output.

The Windows Bazel failure showed the hook run itself failing after
nearly 8 seconds:

```text
---- engine::tests::plugin_hook_sources_run_with_plugin_env_and_plugin_source stdout ----
thread 'engine::tests::plugin_hook_sources_run_with_plugin_env_and_plugin_source' panicked at hooks/src\engine\mod_tests.rs:428:5:
assertion failed: `(left == right)`
Diff < left / right > :
<Failed
>Completed
...
test result: FAILED. 78 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 7.96s
```

# What

- raise the flaky plugin hook env test timeout from 5s to 10s so it
matches the other executed hook tests in this module

# Validation

- `cargo test -p codex-hooks`
2026-04-28 17:08:12 -07:00
Michael Bolin
d6d79ffcc7 core tests: send model turns with permission profiles (#20016)
## Summary
- Migrate direct `Op::UserTurn` construction in remote-model tests from
legacy `SandboxPolicy::DangerFullAccess` to
`PermissionProfile::Disabled` via `turn_permission_fields()`.
- Migrate the Responses API proxy header helper from an inline
workspace-write `SandboxPolicy` to
`PermissionProfile::workspace_write()`.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 22
files after #20015 to 20 files.

## Testing
- `cargo check -p codex-core --tests`
- `just fmt`





























---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/20016).
* #20041
* #20040
* #20037
* #20035
* #20034
* #20033
* #20032
* #20030
* #20028
* #20027
* #20026
* #20024
* #20021
* #20018
* __->__ #20016
2026-04-28 17:08:04 -07:00
Michael Bolin
158b2a4201 core tests: configure profiles directly (#20015)
## Summary
- Replace legacy sandbox config setup in delegate and telemetry tests
with direct `PermissionProfile` configuration.
- Move no-sandbox and read-only test turns in `tools.rs`,
`code_mode.rs`, `user_shell_cmd.rs`, and `model_visible_layout.rs` from
legacy `SandboxPolicy` values to `PermissionProfile` helpers, while
leaving the deny-glob read-only compatibility case for a later targeted
cleanup.
- Use `PermissionProfile::read_only()` where tests need managed
read-only behavior and `PermissionProfile::Disabled` where they
intentionally need no sandbox.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 27
files after #20013 to 22 files.

## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
2026-04-28 17:06:59 -07:00
Michael Bolin
52e79ee49a core tests: migrate more turns to permission profiles (#20013)
## Summary
- Migrate another batch of direct `Op::UserTurn` test construction from
legacy `SandboxPolicy` values to `PermissionProfile` inputs via
`turn_permission_fields()`.
- Replace a one-off read-only `SandboxPolicy` bridge in the macOS exec
test with `PermissionProfile::read_only()`.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 32
files at the start of the cleanup stack to 27 files.

## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
- `just fix -p codex-core`
2026-04-28 17:05:53 -07:00
Michael Bolin
7d15936e69 core tests: build user turns from permission profiles (#20011)
## Summary
- Add `turn_permission_fields()` so tests that construct `Op::UserTurn`
directly can provide a canonical `PermissionProfile` while still filling
the required legacy `sandbox_policy` compatibility field.
- Migrate direct user-turn construction in core integration tests from
`SandboxPolicy::DangerFullAccess` to `PermissionProfile::Disabled`.
- Continue reducing direct `SandboxPolicy` usage in
`codex-rs/core/tests`, from 41 files after #20010 to 32 files in this
PR.

## Testing
- `cargo check -p codex-core --tests`
- `just fmt`
- `just fix -p core_test_support`
- `just fix -p codex-core`
2026-04-28 17:03:20 -07:00
Eric Traut
2223b31c06 Refine Codex issue digest summaries (#20097)
## Why

The `codex-issue-digest` skill was producing more detail than the daily
digest needed, and broad all-area digests could miss active issues. In
particular, issue #16088 had substantial recent comments and reactions
but did not appear in the weekly all-areas output because GitHub search
was using default relevance ranking and the collector could exhaust its
candidate cap before later search queries got a fair sample.

That made the digest look quieter than the underlying user activity and
made threshold tuning misleading.

## What changed

- Make the digest summary headline-first and summary-only by default.
- Add an explicit opt-in flow for `## Details`, so the issue table is
shown only when requested or when the prompt asks for details upfront.
- Update the collector to request GitHub issue search results with
`sort=updated` and `order=desc`.
- Apply the search candidate cap per query instead of globally across
all queries.
- Bump the collector script version to `3`.
- Add tests that cover updated sorting and per-query candidate limits.

## Verification

- `pytest
.codex/skills/codex-issue-digest/scripts/test_collect_issue_digest.py`
- `ruff check
.codex/skills/codex-issue-digest/scripts/collect_issue_digest.py
.codex/skills/codex-issue-digest/scripts/test_collect_issue_digest.py`
- `git diff --check`
- Reran the all-areas weekly collector and confirmed #16088 is now
included with `55` interactions.
2026-04-28 16:53:59 -07:00
Ruslan Nigmatullin
c6465c1ec2 app-server: notify clients of remote-control status changes (#19919)
## Why

Remote-control app-server enrollments have both an internal server id
and the environment id exposed to remote-control clients. App-server
clients need one current status snapshot that says whether remote
control is usable and which environment id, if any, is exposed.

A temporary websocket disconnect is not itself an identity change.
Account changes, stale enrollment invalidation, successful
re-enrollment, and missing ChatGPT auth are meaningful status changes.
Disabled remote control remains `disabled` regardless of auth or SQLite
state. SQLite startup failure disablement and enrollment persistence
failures are handled in #20068; this PR reports the resulting effective
status to clients.

## What changed

- Adds v2 `remoteControl/status/changed` carrying `state` and
`environmentId`.
- Adds `RemoteControlConnectionState` values: `disabled`, `connecting`,
`connected`, and `errored`.
- Exposes remote-control status updates through `RemoteControlHandle`
using a Tokio watch channel.
- Always sends the current remote-control status snapshot to newly
initialized app-server clients.
- Broadcasts status changes to initialized app-server clients when state
or environment id changes.
- Treats missing ChatGPT auth as an `errored` status while leaving it
retryable because auth can change at runtime.
- Clears `environmentId` when enrollment is cleared for account changes,
auth loss, stale backend invalidation, or disabled remote control.
- Updates app-server protocol schema fixtures, generated TypeScript,
app-server README, remote-control tests, and TUI exhaustive notification
matches.

## Stack

- Builds on #20068.

## Verification

- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server transport::remote_control --lib`
- `cargo check -p codex-tui`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fix -p codex-tui`
2026-04-28 23:52:14 +00:00
Gabriel Peal
5e6cbbadf7 Return None when auth refresh fails (#20092)
Right now, if Codex winds up in a state with auth but it can't refresh
the token, the user is left with an unhelpful message that says to log
out and log back in again.

Ultimately, we should prevent that from happening but if it does,
returning None will allow the caller to redirect the user back to the
login page
2026-04-28 16:15:47 -07:00
Michael Bolin
891722849d core tests: submit turns with permission profiles (#20010)
## Summary

- Add `PermissionProfile`-based turn submission helpers to
`core_test_support`, while keeping the legacy `SandboxPolicy` helper for
tests that intentionally exercise legacy fallback behavior.
- Switch the default `TestCodex::submit_turn()` path to send a real
`PermissionProfile` plus the required legacy compatibility projection in
`Op::UserTurn`.
- Migrate straightforward app/search/shell/truncation tests from
`SandboxPolicy::{DangerFullAccess, ReadOnly}` to
`PermissionProfile::{Disabled, read_only}`.
- Add a TUI compatibility projection helper for legacy app-server fields
so non-legacy writable roots are preserved instead of being downgraded
to read-only.
- Fix remote start/resume/fork sandbox-mode projection to classify any
managed profile with writable roots as workspace-write, not only
profiles that can write `cwd`.
- Reduce `SandboxPolicy` references in `codex-rs/core/tests` from 47
files to 41 files without changing production behavior.

## Testing

- `cargo check -p codex-core --tests`
- `cargo test -p codex-tui
compatibility_profile_preserves_unbridgeable_write_roots`
- `cargo test -p codex-tui
sandbox_mode_preserves_non_cwd_write_roots_for_remote_sessions`
- `just fmt`
- `just fix -p core_test_support`
- `just fix -p codex-core`
2026-04-28 23:01:40 +00:00
viyatb-oai
2dbde94aa9 fix(network-proxy): normalize network proxy host matching (#19995)
## Why
The proxy matches allow and deny rules against normalized host strings.
Scoped IPv6 literals can arrive in equivalent forms, such as
`fd00::1%eth0`, `[fd00::1%eth0]`, or `[fd00::1%25eth0]`. Policy should
canonicalize those spellings without erasing scope granularity: an
unscoped rule like `fd00::1` should still cover scoped requests for that
address, while a scoped rule like `fd00::1%eth0` should remain exact to
that scope.

## What changed
- preserve IPv6 scope IDs during host normalization and canonicalize
`%25scope` to `%scope`
- match policy against the exact normalized host plus the unscoped IP
base for scoped literals
- keep local-address explicit allow checks aligned with the same
scoped/unscoped semantics
- add focused coverage for scoped IPv6 normalization, scoped allow
rules, and scoped deny rules in `network-proxy`

## Security impact
A request cannot bypass a broad deny rule by adding an IPv6 scope
suffix. At the same time, scoped policy remains precise:
`deny=fd00::1%eth0` affects that scoped spelling without collapsing
`fd00::1%eth1` onto the same key, and `allow=fe80::1%eth0` does not
implicitly allow other scopes.

## Verification
- `just fmt`
- `cargo test -p codex-network-proxy`
- `just fix -p codex-network-proxy`
- `git diff --check`

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: evawong-oai <evawong@openai.com>
2026-04-28 15:50:00 -07:00
Abhinav
3291463ff1 Fix flaky plugin hook env test (#20088)
The test was flaky because it was checking the right thing in a
roundabout way.

What it wanted to prove:
- plugin hooks receive the right environment variables.

What it actually did:
1. Run a plugin hook.
2. Have that hook write those env vars into a temporary `env.json` file.
3. After the hook finished, read `env.json` back from disk.

On Windows, that last file was sometimes not there when the test tried
to read it, so the test failed with `read env log: file not found`. The
hook system itself was not what the test failure was directly proving;
the test was failing on the extra filesystem side effect it introduced.

The fix is to stop using a temp file as the proof mechanism. The hook
now prints the env values in its normal structured output, and the test
asserts on the output that the hook engine already captures. So we still
verify the same behavior, but without depending on a separate file being
created and read back correctly on Windows.
2026-04-28 15:45:26 -07:00
Owen Lin
2e598df6fc fix: don't auto approve git -C ... (#20085)
It's safer to make sure these commands go through approval flows.
2026-04-28 22:06:55 +00:00
canvrno-oai
66b0781502 /plugins: add marketplace install flow (#18704)
This PR adds a new feature to the `/plugins` menu that gives users the
ability to add new plugin marketplaces. It introduces an Add Marketplace
tab to the right of installed marketplaces, a source prompt, loading and
error states, and the app-server request flow needed to perform the
install. After a successful `marketplace/add`, the popup refreshes back
into the newly added marketplace tab so the new plugins are immediately
visible.

- Add an Add Marketplace tab to the `/plugins` menu
- Prompt for marketplace source input from git repo, URL, or local path
- Show loading and error states during `marketplace/add`
- Refresh plugin data after success and switch into the newly added
marketplace tab
- Add tests and snapshot updates
2026-04-28 14:22:39 -07:00
Abhinav
c6e7d564c3 Discover hooks bundled with plugins (#19705)
## Why

Plugins can bundle lifecycle hooks, but Codex previously only discovered
hooks from user, project, and managed config layers. This adds the
plugin discovery and runtime plumbing needed for plugin-bundled hooks
while keeping execution behind the `plugin_hooks` feature flag.

## What

- Discovers plugin hook sources from each plugin's default
`hooks/hooks.json`.
- Supports `plugin.json` manifest `hooks` entries as either relative
paths or inline hook objects.
- Plumbs discovered plugin hook sources through plugin loading into the
hook runtime when `plugin_hooks` is enabled.
- Marks plugin-originated hook runs as `HookSource::Plugin`.
- Injects `PLUGIN_ROOT` and `CLAUDE_PLUGIN_ROOT` into plugin hook
command environments.
- Updates generated schemas and hook source metadata for the plugin hook
source.

## Stack

1. This PR - openai/codex#19705
2. openai/codex#19778
3. openai/codex#19840
4. openai/codex#19882

## Reviewer Notes

- Core logic is in `codex-rs/core-plugins/src/loader.rs` and
`codex-rs/hooks/src/engine/discovery.rs`
- Moved existing / adding new tests to
`codex-rs/core-plugins/src/loader_tests.rs` hence the large diff there
- Otherwise mostly plumbing and minor schema updates

### Core Changes

The `codex-rs/core` changes are limited to wiring plugin hook support
into existing core flows:

- `core/src/session/session.rs` conditionally pulls effective plugin
hook sources and plugin hook load warnings from `PluginsManager` when
`plugin_hooks` is enabled, then passes them into `HooksConfig`.
- `core/src/hook_runtime.rs` adds the `plugin` metric tag for
`HookSource::Plugin`.
- `core/config.schema.json` picks up the new `plugin_hooks` feature
flag, and `core/src/plugins/manager_tests.rs` updates fixtures for the
added plugin hook fields.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 14:17:18 -07:00
cassirer-openai
89698ad1c3 [rollout-trace] Include x-request-id in rollout trace. (#20066)
## Why

Rollout traces need an identifier that can be used to correlate a Codex
inference with upstream Responses API, proxy, and engine logs. The
reduced trace model already exposed `upstream_request_id`, but it was
being populated from the Responses API `response.id`. That value is
useful for `previous_response_id` chaining, but it is not the transport
request id that upstream systems key on.

This PR separates those concepts so trace consumers can reliably answer
both questions:

- which Responses API response did this inference produce?
- which upstream request handled it?

## Structure

The change keeps the upstream request id at the same lifecycle level as
the provider stream:

- `codex-api` captures the `x-request-id` HTTP response header when the
SSE stream is created and exposes it on `ResponseStream`. Fixture and
websocket streams set the field to `None` because they do not have that
HTTP response header.
- `codex-core` carries that stream-level id into `InferenceTraceAttempt`
when recording terminal stream outcomes. Completed, failed, cancelled,
dropped-stream, and pre-response error paths all record the id when it
is available.
- `rollout-trace` now records both identifiers in raw terminal inference
events and response payloads: `response_id` for the Responses API
`response.id`, and `upstream_request_id` for `x-request-id`.
- The reducer stores both fields on `InferenceCall`. It also uses
`response_id` for `previous_response_id` conversation linking, which
removes the old accidental dependency on the misnamed
`upstream_request_id` field.
- Terminal inference reduction now consumes the full terminal payload
(`InferenceCompleted`, `InferenceFailed`, or `InferenceCancelled`) in
one place. That keeps status, partial payloads, response ids, and
upstream request ids consistent across success, failure, cancellation,
and late stream-mapper events.

## Why This Shape

`x-request-id` is a property of the HTTP/provider response envelope, not
an SSE event. Capturing it once in `codex-api` and plumbing it through
terminal trace recording avoids trying to infer the value from stream
contents, and it preserves the id even when the stream fails or is
cancelled after only partial output.

Keeping `response_id` separate from `upstream_request_id` also makes the
reduced trace model less surprising: `response_id` remains the
conversation-continuation id, while `upstream_request_id` is the
operational correlation id for upstream debugging.

## Validation

The PR updates trace and reducer coverage for:

- reading `x-request-id` from SSE response headers;
- storing the true upstream request id on completed inference calls;
- preserving upstream request ids for cancelled and late-cancelled
inference streams;
- keeping `previous_response_id` reconstruction tied to `response_id`
rather than transport request ids.
2026-04-28 21:11:17 +00:00
Ruslan Nigmatullin
10e2a73b3c app-server: disable remote control without sqlite (#20068)
## Why

Remote control depends on the app-server SQLite state DB for persisted
enrollment identity. If the state DB cannot be opened at startup,
continuing with remote control enabled leaves the process in a
misleading state where enrollment identity cannot be read or persisted.

Feature-disabled remote control remains disabled regardless of SQLite
state. This only changes the case where remote control is requested but
the SQLite state DB is unavailable.

## What changed

- Logs SQLite state DB initialization failures instead of dropping the
error silently.
- Treats remote control as effectively disabled when the SQLite state DB
is unavailable.
- Prevents `RemoteControlHandle::set_enabled(true)` from enabling remote
control later in the same process if the state DB was unavailable at
startup.
- Keeps the existing behavior that disabled remote control does not
validate or connect to the remote-control URL.
- Makes persisted enrollment load/update failures propagate as
remote-control errors instead of silently falling back to in-memory
state.
- Makes the direct websocket connection path fail when called without a
SQLite state DB.
- Adds coverage for startup without a state DB, later handle enablement
with no state DB, and direct websocket connection without a state DB.

## Verification

- `cargo test -p codex-app-server transport::remote_control --lib`
- `just fix -p codex-app-server`
2026-04-28 13:49:00 -07:00
Michael Bolin
3b74a4d3b1 tui: use permission profiles for sandbox state (#20008)
## Summary
- Move TUI permission state from legacy `SandboxPolicy` values to
canonical `PermissionProfile` values across presets, app events, chat
widget state, app commands, thread routing, and cached thread session
state.
- Keep app-server compatibility boundaries explicit: embedded sessions
send `permissionProfile`, while remote sessions send only a legacy
`sandbox` projection and fall back to read-only when a custom profile
cannot be projected.
- Update status/add-dir UI summaries and snapshots to render the active
permission profile, including workspace profiles selected by the new
built-in defaults.

## Verification
- `rg '\bSandboxPolicy\b' codex-rs/tui -n` returns no matches.
- `cargo test -p codex-tui`
- `cargo check -p codex-tui --tests`
- `cargo test -p codex-tui additional_dirs`
- `just fmt`
- `just fix -p codex-tui`




































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/20008).
* #20041
* #20040
* #20037
* #20035
* #20034
* #20033
* #20032
* #20030
* #20028
* #20027
* #20026
* #20024
* #20021
* #20018
* #20016
* #20015
* #20013
* #20011
* #20010
* __->__ #20008
2026-04-28 20:36:48 +00:00
jif-oai
34d71d43eb Make MultiAgentV2 wait minimum configurable (#20052)
## Why

MultiAgentV2 `wait_agent` currently clamps short waits to a fixed 10
second minimum. That default is still useful for preventing tight
polling loops, but it is too rigid for environments that need faster
mailbox wake-up checks or a larger minimum to discourage frequent
polling.

This PR makes the minimum wait timeout configurable from the existing
MultiAgentV2 feature config section, so operators can tune the behavior
without changing the legacy multi-agent tool surface.

## What Changed

- Added `features.multi_agent_v2.min_wait_timeout_ms`.
- Defaulted the new setting to the existing 10 second floor.
- Validated the configured value as `1..=3600000`, matching the existing
one hour maximum wait bound.
- Applied the configured minimum to MultiAgentV2 `wait_agent` runtime
clamping.
- Plumbed the configured minimum into the `wait_agent` tool schema,
including the effective default when the minimum is above the normal 30
second default.
- Regenerated `core/config.schema.json`.

## Verification

- `cargo test -p codex-features`
- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib multi_agent_v2`
- `just fix -p codex-core`
2026-04-28 22:36:44 +02:00
Ruslan Nigmatullin
1de7a9bf69 app-server: allow remote_control runtime feature override (#20047) 2026-04-28 13:36:12 -07:00
viyatb-oai
e1ba87ccb2 fix(network-proxy): recheck network proxy connect targets (#19999)
## Why
The proxy checks the requested host before opening the upstream
connection, but DNS can resolve an allowed hostname to a loopback,
private, or other non-public address after that first decision. Without
a final check on the actual socket target, a request that looks
acceptable at the hostname layer can still connect to a local service
once resolution completes.

## What changed
- add a shared TCP connector check for direct proxy egress
- use that path for HTTP, `CONNECT`, SOCKS5, and MITM upstream
connections
- keep configured upstream proxy hops on the existing proxy path
- add direct-connector coverage for allowed and rejected local targets

## Security impact
Direct proxy egress now rechecks the resolved socket address before
connecting, closing the gap between hostname policy evaluation and the
final network target.

## Verification
- `cargo test -p codex-network-proxy`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 12:51:43 -07:00
Shijie Rao
25ac0e4527 Load cloud requirements for agent identity (#19708)
## Why

Agent Identity sessions can represent Business and Enterprise ChatGPT
workspaces, but cloud requirements were skipped before fetch. That meant
workspace-managed requirements were not loaded for Agent Identity even
when the JWT carried the same account identity and plan information that
normal ChatGPT token auth exposes.

This PR now sits on top of the Agent Identity stack through
[#19764](https://github.com/openai/codex/pull/19764). Because
[#19763](https://github.com/openai/codex/pull/19763) moved task
registration into Agent Identity auth loading, cloud requirements no
longer needs a separate runtime-initialization step before building the
backend client.

## What changed

- Stop skipping `CodexAuth::AgentIdentity` in the cloud requirements
loader.
- Share the cloud requirements eligibility check between startup load
and background cache refresh.
- Rely on eagerly loaded Agent Identity auth so backend requests can
attach task-scoped `AgentAssertion` headers.
- Decode Agent Identity JWT `plan_type` as the auth-layer plan type,
then convert it through a shared `auth::PlanType` -> `account::PlanType`
mapping.
- Add the missing serde alias for the `education` plan string and add
coverage for raw Agent Identity plan aliases such as `hc` and
`education`.

## Testing

- `cargo test -p codex-agent-identity -p codex-login -p
codex-cloud-requirements -p codex-protocol`
2026-04-28 12:35:00 -07:00
Ruslan Nigmatullin
0700f979ba app-server: run initialized rpcs with keyed serialization (#17373)
## Why

Initialized app-server RPCs no longer need to bottleneck behind one
request processor path. Running them concurrently improves
responsiveness, but several request families still mutate shared state
or depend on ordered side effects. Those stateful families need an
auditable serialization contract so concurrency does not reorder thread,
config, auth, command, watcher, MCP, or similar state transitions.

This PR keeps that boundary explicit: stateful work is serialized by the
smallest useful key, while intentionally read-only or externally
concurrent work remains unkeyed. In particular, `thread/list` and
`thread/turns/list` explicitly have no serialization because they
primarily read append-only rollout storage and should continue to be
served concurrently.

## What changed

- Adds `ClientRequest::serialization_scope()` in `app-server-protocol`
and requires every client request definition to declare its
serialization behavior.
- Introduces keyed request scopes for thread, thread path, command exec
process, fuzzy search session, fs watch, MCP OAuth, and global state
buckets such as config, account auth, memory, and device keys.
- Routes initialized app-server RPCs through per-key FIFO serialization
while allowing unkeyed initialized requests to run concurrently.
- Cancels in-flight initialized RPC work when the connection disconnects
or the app-server exits so spawned request tasks do not outlive their
session.
- Adds focused coverage for representative keyed and unkeyed
serialization scopes, including explicitly concurrent
`thread/turns/list` behavior.

## Validation

- Added protocol tests for representative keyed serialization scopes and
intentionally unkeyed request families.
- Added app-server request serialization tests covering per-key FIFO
behavior, concurrent unkeyed execution, disconnect shutdown, and config
read-after-write ordering.
- Local focused protocol validation after the latest rebase is currently
blocked by packageproxy failing to resolve locked `rustls-webpki
0.103.13`; CI is expected to provide the full validation signal.
2026-04-28 12:23:34 -07:00
Dylan Hurd
7f7c7c2c07 Fix log db batch flush flake (#19959)
## Why

The log DB writer batches tracing events before inserting them into
SQLite, but `tokio::time::interval` produces an immediate first tick.
That meant the inserter could flush the first accepted log entry before
`batch_size` was reached, making
`configured_batch_size_flushes_without_explicit_flush` timing-sensitive
in CI.

## What Changed

- Consume the interval's startup tick before entering the inserter loop,
so interval flushing starts after the configured delay.
- Remove the test's startup sleep, which was masking the race instead of
proving the batch-size behavior.

## Validation

- `cargo test -p codex-state`
- `cargo test -p codex-state
configured_batch_size_flushes_without_explicit_flush` passed 3
consecutive focused runs
- PR checks passed across `rust-ci`, Bazel, `ci`, `sdk`, `cargo-deny`,
Codespell, blob-size policy, and CLA
2026-04-28 12:08:41 -07:00
viyatb-oai
3377afd84a fix(network-proxy): harden linux proxy bridge helpers (#20001)
## Why
The Linux managed-proxy bridge helpers are long-lived child processes in
the sandbox networking path. Before this change they stayed dumpable and
the network seccomp profile did not block cross-process memory syscalls,
so another same-user process could potentially inspect or modify bridge
memory instead of interacting only through the intended proxy interface.

## What changed
- reuse the shared `codex-process-hardening` helper to mark bridge
helper children non-dumpable before they begin serving
- deny `process_vm_readv` and `process_vm_writev` in the existing
network seccomp filter

## Security impact
Bridge helpers are less exposed to same-user cross-process inspection or
memory writes, which reduces the chance that sandboxed code can
interfere with proxy support processes outside the intended IPC path.

## Verification
- `cargo test -p codex-process-hardening`
- `cargo test -p codex-linux-sandbox`
- attempted `cargo check -p codex-linux-sandbox --target
x86_64-unknown-linux-gnu`; blocked on missing `x86_64-linux-gnu-gcc` on
this macOS host

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 11:52:50 -07:00
charley-openai
de2ccf9473 [codex] Add token usage to turn tracing spans (#19432)
## Why

Slow Codex turns are easier to debug when token usage is visible in the
trace itself, without joining against separate analytics. This adds
token usage to existing turn-handling spans for regular user turns only.

[Example
turn](https://openai.datadoghq.com/apm/trace/9d353efa2cb5de1f4c5b93dc33c3df04?colorBy=service&graphType=flamegraph&shouldShowLegend=true&sort=time&spanID=3555541504891512675&spanViewType=metadata&traceQuery=)
<img width="1447" height="967" alt="Screenshot 2026-04-24 at 3 03 07 PM"
src="https://github.com/user-attachments/assets/ab7bb187-e7fc-41f0-a366-6c44610b2b2c"
/>

## What Changed

Added response-level token fields on completed handle_responses spans:

gen_ai.usage.input_tokens
gen_ai.usage.cache_read.input_tokens
gen_ai.usage.output_tokens
codex.usage.reasoning_output_tokens
codex.usage.total_tokens
Added aggregate token fields on regular turn spans:

codex.turn.token_usage.*
Added an explicit regular-turn opt-in via
SessionTask::records_turn_token_usage_on_span() so this is not coupled
to span-name strings.

## Testing

- `cargo test -p codex-otel`
- `cargo test -p codex-core
turn_and_completed_response_spans_record_token_usage`
- `just fmt`
- `just fix -p codex-core`
- `just fix -p codex-otel`
- Manual local Electron/app-server smoke test: regular user turn emits
the new span fields

Known status: `cargo test -p codex-core` was attempted and failed in
unrelated existing areas: config approvals, request-permissions,
git-info ordering, and subagent metadata persistence.
2026-04-28 11:41:32 -07:00
canvrno-oai
640a1b23ea Fix plan mode nudge test after task completion signature change (#20045)
Updates the plan mode nudge test to pass the new `duration_ms` argument
to task completion.

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 11:24:22 -07:00
Michael Bolin
9e26613657 permissions: add built-in default profiles (#19900)
## Why

The migration away from `SandboxPolicy` needs new configs to start from
permissions profiles instead of deriving profiles from legacy sandbox
modes. Existing users can have empty `config.toml` files, and we should
not rewrite user-owned config files that may live in shared
repositories.

This PR introduces built-in profile names so an empty config can resolve
to a canonical `PermissionProfile`, while explicit named `[permissions]`
profiles still behave predictably.

## What changed

- Adds built-in `default_permissions` profile names:
  - `:read-only` maps to `PermissionProfile::read_only()`.
- `:workspace` maps to the workspace-write profile, including
project-root metadata carveouts.
- `:danger-no-sandbox` maps to `PermissionProfile::Disabled`, preserving
the distinction between no sandbox and a broad managed sandbox.
- Reserves the `:` prefix for built-in profiles so user-defined
`[permissions]` profiles cannot collide with future built-ins.
- Allows `default_permissions` to reference a built-in profile without
requiring a `[permissions]` table.
- Makes an otherwise empty config choose a built-in profile by
trust/platform context: trusted or untrusted project roots use
`:workspace` when the platform supports that sandbox, while roots
without a trust decision use `:read-only`.
- Keeps legacy `sandbox_mode` configs on the legacy path, and still
rejects user-defined `[permissions]` profiles that omit
`default_permissions` so we do not silently guess among custom profiles.
- Preserves compatibility behavior for implicit defaults: bare
`network.enabled = true` allows runtime network without starting the
managed proxy, explicit profile proxy policy still starts the proxy, and
implicit workspace/add-dir roots keep legacy metadata carveouts.

## Verification

- `cargo test -p codex-core builtin --lib`
- `cargo test -p codex-core profile_network_proxy_config`
- `cargo test -p codex-core
implicit_builtin_workspace_profile_preserves_add_dir_metadata_carveouts`
- `cargo test -p codex-core
permissions_profiles_network_enabled_allows_runtime_network_without_proxy`
- `cargo test -p codex-core
permissions_profiles_proxy_policy_starts_managed_network_proxy`

## Documentation

Public Codex config docs should mention these built-in names when the
`[permissions]` config format is ready to document as stable.









---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19900).
* #20041
* #20040
* #20037
* #20035
* #20034
* #20033
* #20032
* #20030
* #20028
* #20027
* #20026
* #20024
* #20021
* #20018
* #20016
* #20015
* #20013
* #20011
* #20010
* #20008
* __->__ #19900
2026-04-28 11:21:39 -07:00
viyatb-oai
3afb185a4f fix(network-proxy): tighten network proxy bypass defaults (#20002)
## Why
Managed sessions use `NO_PROXY` to keep a small set of destinations on
the direct path by default. The old default also bypassed all IPv4
link-local addresses in `169.254.0.0/16`, which includes metadata
endpoints such as `169.254.169.254`. Because `NO_PROXY` is evaluated by
the client before the request reaches the managed proxy, requests to
that range could skip proxy-side allowlist and local-binding checks
entirely. On hosts where a link-local metadata service is reachable,
that creates a path to sensitive environment metadata or credentials
outside the intended enforcement point.

## What changed
- remove the default IPv4 link-local `169.254.0.0/16` bypass from the
managed proxy environment
- keep the existing loopback and private-network defaults unchanged
- update the regression assertion to lock in the narrower default

## Security impact
Link-local requests now stay on the managed-proxy path by default, so
the proxy can apply configured policy before they reach metadata-style
endpoints or other link-local services.

## Verification
- `cargo test -p codex-network-proxy`

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 10:51:43 -07:00
stefanstokic-oai
4c68bd728f External agent session support (#19895)
## Summary

This extends external agent detection/import beyond config artifacts so
Codex can detect recent sessions files from the external agent home and
import them into Codex rollout history.

## What changed

- Added a focused `external_agent_sessions` module for:
  - session discovery
  - source-record parsing
  - rollout construction
  - import ledger tracking
- Wired session detection/import into the app-server external agent
config API.
- Added compaction handling so large imported sessions can be resumed
safely before the first follow-up turn.

## Testing

Added coverage for:
- recent-session detection
- custom-title handling
- recency filtering
- dedupe and re-detect-after-source-change behavior
- visible imported turn construction
- backward-compatible import payload deserialization
- end-to-end RPC import flow
- rejection of undetected session paths
- repeat-import behavior
- large-session compaction before first follow-up

Ran:
- `cargo test -p codex-app-server external_agent_config_import_ --test
all`
2026-04-28 17:42:36 +00:00
Felipe Coury
a036584104 fix(tui): let esc exit empty shell mode (#19986)
## Summary

- exit shell mode when `Esc` is pressed while the absorbed `!` is the
only input
- add direct regression coverage plus a composer snapshot for the
restored normal prompt state

## Root cause

Shell mode stores the leading `!` outside the editable textarea. After
typing only `!`, the textarea is empty but the composer is still in bash
mode, so the existing empty-composer `Esc` handling never runs.

## Validation

- `just fmt`
- `cargo test -p codex-tui
bottom_pane::chat_composer::tests::esc_exits_empty_shell_mode`
- `cargo test -p codex-tui
bottom_pane::chat_composer::tests::footer_mode_snapshots`
- `cargo insta pending-snapshots`

`cargo test -p codex-tui` still reports unrelated existing `/status`
snapshot drift in this local environment because the rendered
permissions text is `workspace-write with network access` instead of the
older `read-only` fixture text.
2026-04-28 14:35:24 -03:00
canvrno-oai
bc5a1b961e Move local /resume cwd filtering into thread/list (#19931)
Move local resume and fork cwd filtering to `thread/list` instead of
filtering in the TUI. This makes the `/resume` menu feel slightly faster
to load when working in repos with many historical threads, and
centralizes the cwd filtering in app-server.

**Affected:**
- /resume from inside the TUI.
- codex resume with no session ID and without --last
- codex resume --all
- codex fork with no session ID and without --last
- codex fork --all

**Not affected:**
- codex resume <id>
- codex fork <id>
- codex resume --last
- codex fork --last

Steps to test performance improvement in a real Codex environment:
- Launch `codex resume` using compiled binary in a directory that has
seen many threads.
- Launch `codex resume` using release binary in same directory.
- Observe difference in time-to-full-page as threads load.
2026-04-28 10:35:10 -07:00
Felipe Coury
c6bcd27832 feat(tui): suggest plan mode from composer drafts (#19901)
## Summary

- suggest Plan mode when the current composer draft contains the
standalone word `plan`
- shares the Codex App heuristics for detection
- excludes things line `/plan` and the word plan in shell mode
- reuse the existing `Shift+Tab` mode cycle and add thread-scoped
dismissal with `Esc`
- replace the normal footer hint while the reminder is visible so the
statusline stays anchored


https://github.com/user-attachments/assets/01123ae8-cee6-4e95-b563-44655c071cde

## Why

The desktop app already nudges users toward Plan mode when their draft
clearly signals planning intent. The TUI had the underlying `/plan` and
`Shift+Tab` flows, but no equivalent reminder at the moment the user was
most likely to benefit from them.

## Details

The reminder is shown only when Plan mode is available, the draft
contains standalone `plan`, the user is not already in Plan mode, the
composer is actionable, and the current thread has not dismissed the
reminder. Slash-command and shell-command drafts are excluded.

The first implementation used an extra composer row, but that moved the
statusline whenever the heuristic fired. This version keeps the layout
stable by rendering the reminder in the existing footer row instead.

## Validation

- `INSTA_UPDATE=always cargo test -p codex-tui
chatwidget::tests::plan_mode::plan_mode_nudge -- --nocapture`
- `just fmt`
- `just fix -p codex-tui`
- `./tools/argument-comment-lint/run.py -p codex-tui`
- `cargo insta pending-snapshots`
- `git diff --check`
2026-04-28 14:34:10 -03:00
maja-openai
273c2e21a9 Clarify network approval auto-review prompts (#19907)
## Why

Network access approval prompts were showing the generic retry reason,
which made auto-review focus on the blocked connection instead of the
command that caused it. This makes network approvals easier to assess by
telling the reviewer to evaluate whether the triggering command was
authorised by the user and within policy, and to treat the network call
as acceptable when it is a reasonable consequence of that command.

## What changed

- Split guardian approval request prompt rendering so `NetworkAccess`
has a dedicated branch.
- For network requests, show `Network approval context` and `Network
access JSON` instead of `Retry reason` / `Planned action JSON`.
- Added regression coverage for the network approval prompt wording and
for omitting retry reason in this case.

## Verification

- `cargo test -p codex-core
guardian::tests::build_guardian_prompt_items_explains_network_access_review_scope`
2026-04-28 10:25:37 -07:00
mchen-oai
01de13b7e6 Record MCP result telemetry on mcp.tools.call spans (#19509)
## Why
- Without change: MCP tool call spans include request-side details such
as server, tool, call ID, connector, session, and turn.
- Issue: Some useful telemetry is only known by the MCP server after it
handles the tool call, such as target identity or whether the call
triggered a user-facing flow.

## What Changed
- With change: Codex reads allowlisted telemetry from
`_meta["codex/telemetry"]["span"]` and records it on the
`mcp.tools.call` span.
- Adds span fields for `codex.mcp.target.id` and
`codex.mcp.user_flow.triggered`, with strict type checks and bounded
target ID length.


## Verification
`codex-rs/core/src/mcp_tool_call_tests.rs`
2026-04-28 17:20:38 +00:00
evawong-oai
0670d8971a Enforce workspace metadata protections in Seatbelt (#19847)
## Summary

Translate FileSystemSandboxPolicy project root metadata carveouts into
macOS Seatbelt rules.

## Scope

1. Thread protected metadata names into Seatbelt access roots.
2. Ask FileSystemSandboxPolicy whether each metadata carveout is
writable.
3. Emit Seatbelt deny rules that block creating or replacing protected
metadata names under writable roots.
4. Add coverage for first time metadata creation and read only
carveouts.

## Reviewer Focus

1. This PR only covers the macOS sandbox adapter.
2. The policy decision comes from FileSystemSandboxPolicy.
3. Read only subpath carveouts and metadata protection checks should
compose cleanly.

## Stack

1. Policy primitive: #19846
2. macOS Seatbelt adapter: this PR
3. Shell preflight UX: #19848
4. Runtime profile propagation: #19849
5. Linux bubblewrap adapter: #19852

## Validation

1. formatting for codex sandboxing
2. codex sandboxing package tests
2026-04-28 10:13:00 -07:00
efrazer-oai
f6797c3ac6 feat: verify agent identity JWTs with JWKS (#19764) 2026-04-28 09:56:20 -07:00
colby-oai
6138063656 Strip connector provenance metadata from custom MCP tools (#19875)
# Summary 
This prevents non-codex_apps MCP servers from spoofing connector
provenance metadata.
2026-04-28 12:43:26 -04:00
mchen-oai
ccec84b148 Add turn start timestamp to turn metadata (#19473)
## Why
- Without change: MCP tool calls receive
`_meta["x-codex-turn-metadata"]` with `session_id` and `turn_id`.
- Issue: MCP servers may want the turn start timestamp to measure
internal latency relative to turn start.

## What Changed
- With change: turn metadata now includes `turn_started_at_unix_ms`,
which is propagated to MCP tool calls in
`_meta["x-codex-turn-metadata"]`.

## Verification
- `codex-rs/core/src/mcp_tool_call_tests.rs`
- `codex-rs/core/src/turn_metadata_tests.rs`
- `codex-rs/core/src/turn_timing_tests.rs`
- `codex-rs/core/tests/responses_headers.rs`
- `codex-rs/core/tests/suite/search_tool.rs`
2026-04-28 16:36:59 +00:00
Eric Traut
4e0cf945b7 Terminate stdio MCP servers on shutdown to avoid process leaks (#19753)
## Why

Several bug reports describe thread shutdown (including subagent
threads) leaving stdio MCP server processes behind. These reports all
point at the same lifecycle gap: Codex launches stdio MCP servers, but
the session-level shutdown path does not explicitly close MCP clients or
terminate the server process tree.

Fixes #12491
Fixes #12976
Fixes #18881
Fixes #19469

## History

This is best understood as a regression/coverage gap in MCP session
lifecycle management, not as stdio MCP cleanup being absent all along.
#10710 added process-group cleanup for stdio MCP servers, but that
cleanup only runs when the `RmcpClient`/transport is dropped. The older
reports (#12491 and #12976) came after that cleanup existed, which
suggests the remaining problem was that some higher-level shutdown paths
kept the MCP manager alive or replaced it without explicitly draining
clients. The newer reports (#18881 and #19469) exposed the same family
around manager replacement and shutdown.

## What changed

- Added an explicit stdio MCP process handle in `codex-rmcp-client` so
local MCP servers terminate their process group and executor-backed MCP
servers call the executor process terminator.
- Added `RmcpClient::shutdown()` and manager-level MCP shutdown draining
so session shutdown, channel-close fallback, MCP refresh, and connector
probing stop owned MCP clients.
- Added regression coverage that starts a stdio MCP server, begins an
in-flight blocking tool call, shuts down the client, and asserts the
server process exits.

## Verification

- `cargo test -p codex-rmcp-client`
- `cargo test -p codex-mcp`
- `just fix -p codex-rmcp-client`
- `just fix -p codex-mcp`
- `just fix -p codex-core`

- Manual before/after validation with a temporary repro script:
- Pre-fix binary from `HEAD^` (`fed0a8f4fa`): reproduced the leak with
surviving MCP server and child PIDs, `survivors=[77583, 77592]`,
`leaked=true`.
- Post-fix binary from this branch (`67e318148b`): verified both MCP
processes were gone after interrupting `codex exec`, `survivors=[]`,
`leaked=false`.
2026-04-28 09:29:57 -07:00
Eric Traut
087c9c1f1f TUI: use cumulative turn duration for worked-for separator (#19929)
## Why

Fixes #19814.

The TUI's current `Worked for ...` timing behavior is a leftover from
#9599. At that point, models could emit multiple assistant messages in
one turn for preambles/commentary, but the TUI did not yet have a
reliable signal that an assistant message was the final answer when it
started streaming. To avoid showing an ever-growing elapsed time on each
preamble separator, #9599 made the separator timer incremental by
tracking elapsed time since the previous separator.

That workaround is no longer the right model for the final
completed-turn display. Since then, #16638 added protocol-native turn
timing, including `duration_ms` on turn completion. With that cumulative
duration available at the point where the TUI renders the completed-turn
separator, the UI can show the actual turn duration directly instead of
carrying per-separator timing state.

## What Changed

- Thread `duration_ms` into `ChatWidget::on_task_complete` from both
legacy `TurnCompleteEvent` handling and app-server `TurnCompleted`
notifications.
- Use `duration_ms` for the final `Worked for ...` separator, falling
back to the status indicator timer only when the protocol duration is
unavailable.
- Keep mid-turn separators before later assistant text as plain visual
dividers instead of clocked `Worked for ...` separators.
- Remove the old incremental separator timer state and helper
(`last_separator_elapsed_secs` / `worked_elapsed_from`).
- Add a snapshot regression test for a turn that runs a command and then
completes with a final answer, verifying the final separator uses the
cumulative turn duration.

## Verification

- `cargo test -p codex-tui
final_worked_for_uses_cumulative_turn_duration_snapshot`
- `just fix -p codex-tui`

Manual repro prompt:

```text
Manual timing repro. First send a short preamble/commentary sentence before using tools. Then run exactly this shell command: sleep 75; echo MANUAL_TIMING_DONE. After the command finishes, give a final answer that says "done". Do not skip the preamble.
```

After this change, the mid-turn break before the final answer should be
a plain divider, and the final completed-turn separator should show
`Worked for ...` using the cumulative turn duration.

Before:
<img width="414" height="102" alt="Screenshot 2026-04-27 at 10 09 01 PM"
src="https://github.com/user-attachments/assets/b9e2ce01-2460-40e4-a5c4-c9ba8add2557"
/>


After:
<img width="485" height="149" alt="Screenshot 2026-04-27 at 10 09 07 PM"
src="https://github.com/user-attachments/assets/d24089ae-d4e2-41b6-b966-07c98706ead4"
/>
2026-04-28 09:24:29 -07:00
jif-oai
5b7d6f5c4f feat: house-keeping memories 3 (#20005)
Move stuff in memories, no behavioural change expected
2026-04-28 18:13:35 +02:00
evawong-oai
0156b1e61f [sandbox] Enforce protected workspace metadata paths (#19846)
## Summary

Make FileSystemSandboxPolicy the semantic source of truth for project
root metadata protection. Under writable roots, `.git`, `.codex`, and
`.agents` stay protected unless user policy grants an explicit write
rule for that metadata path.

## Scope

1. Add `protected_metadata_names` to `WritableRoot`.
2. Teach `FileSystemSandboxPolicy::can_write_path_with_cwd` to reject
protected metadata writes under writable roots unless explicitly
allowed.
3. Default workspace write profiles to protect `.git`, `.codex`, and
`.agents`.
4. Add the Linux fallback setup needed before Linux enforcement lands
later in the stack.

## Reviewer Focus

1. The policy decision belongs in FileSystemSandboxPolicy, not shell
command parsing.
2. Legacy SandboxPolicy remains a compatibility projection, not the
source of the new rule.
3. Explicit user write rules can still opt into these metadata paths.

## Stack

1. Policy primitive: this PR
2. macOS Seatbelt adapter: #19847
3. Shell preflight UX: #19848
4. Runtime profile propagation: #19849
5. Linux bubblewrap adapter: #19852

## Validation

1. codex protocol permissions tests
2. formatting for codex protocol and codex linux sandbox
3. diff whitespace check
2026-04-28 09:10:41 -07:00
Felipe Coury
5e737372ee feat(tui): add configurable keymap support (#18593)
## Why

The TUI currently handles keyboard shortcuts as hard-coded event matches
spread across app, composer, pager, list, approval, and navigation code.
That makes shortcuts hard to customize, makes displayed hints easy to
drift from actual behavior, and makes future keymap work riskier because
there is no central action inventory.

This PR adds the foundation for configurable, action-based keymaps
without adding the interactive remapping UI yet. Onboarding
intentionally stays on fixed startup shortcuts because users cannot
reasonably configure keymaps before completing onboarding.

This is PR1 in the keymap stack:

- PR1: #18593: configurable keymap foundation
- PR2: #18594: `/keymap` picker and guided remapping UI
- PR3: #18595: Vim composer mode and the remap option

## Design Notes

The new model resolves named actions into concrete runtime bindings once
from config, then passes those bindings to the UI surfaces that handle
input or render shortcut hints.

The main concepts are:

- **Context**: a scope where an action is active, such as `global`,
`chat`, `composer`, `editor`, `pager`, `list`, or `approval`.
- **Action**: a named operation inside a context, such as
`global.open_transcript`, `composer.submit`, or `pager.close`.
- **Binding**: one or more single-key shortcuts assigned to an action,
written as config strings such as `ctrl-t`, `alt-backspace`, or
`page-down`. Multi-step sequences such as `ctrl-x ctrl-s`, `g g`, or
leader-key flows are not part of this PR.
- **Resolution order**: context-specific config wins first, supported
global fallbacks come next, and built-in defaults fill in anything
unset.
- **Explicit unbinding**: an empty array removes an action binding in
that scope and does not fall through to a fallback binding.
- **Conflict validation**: a resolved keymap rejects duplicate active
bindings inside the same scope so one keypress cannot dispatch two
actions.

## What Changed

- Added `TuiKeymap` config support under `[tui.keymap]`, including typed
contexts/actions, key alias normalization, generated schema coverage,
and user-facing config errors.
- Added `RuntimeKeymap` resolution in `codex-rs/tui/src/keymap.rs`,
including fallback precedence, built-in defaults, explicit unbinding,
and per-context conflict validation.
- Rewired existing TUI handlers to consume resolved keymap actions
instead of directly matching hard-coded keys in each component.
- Updated key hint rendering and footer/pager/list surfaces so displayed
shortcuts follow the resolved keymap.
- Kept onboarding shortcuts fixed in
`codex-rs/tui/src/onboarding/keys.rs` instead of exposing them through
`[tui.keymap]`.

## Validation

The branch includes focused coverage for config parsing, key
normalization, runtime fallback resolution, explicit unbinding,
duplicate-key conflict validation, default keymap consistency,
onboarding startup key behavior, and UI hint snapshots affected by
resolved key bindings.
2026-04-28 12:52:25 -03:00
Eric Traut
a61c785040 Reset TUI keyboard reporting on exit (#19625)
## Why

Codex enables enhanced keyboard reporting while the TUI owns the
terminal. In iTerm2, exiting the TUI with Ctrl+C can intermittently
leave the parent shell receiving raw CSI-u / `modifyOtherKeys` fragments
instead of normal key input.

Final terminal cleanup should put the parent shell back into normal
keyboard reporting even if the terminal misses the usual stack pop.

Fixes #19553.

## What Changed

- Move TUI keyboard enhancement setup and detection into
`tui/src/tui/keyboard_modes.rs`.
- Add an exit-only `restore_after_exit()` path that performs the normal
keyboard enhancement pop plus unconditional keyboard enhancement and
`modifyOtherKeys` resets.
- Keep temporary restore paths, such as external-editor handoff, using
the balanced stack pop behavior.

## Confidence

Medium. This is a speculative fix: I was not able to reproduce the
reported iTerm2 behavior manually, but the symptoms line up with
terminal keyboard reporting state surviving Codex exit. The added reset
sequences are scoped to final TUI shutdown and should be harmless when
the terminal is already clean.
2026-04-28 08:51:44 -07:00
friel-openai
598bbcdb58 Preserve assistant phase for replayed messages (#19832) 2026-04-28 08:46:13 -07:00
jif-oai
21e19912e0 feat: house-keeping memories 2 (#20000)
Just move metrics in a dedicated file
2026-04-28 17:26:44 +02:00
jif-oai
5a79dfab7c feat: house-keeping memories 1 (#19998)
Just move metrics in a dedicated file
2026-04-28 17:11:49 +02:00
jif-oai
1b74360365 feat: skip memory startup when Codex rate limits are low (#19990)
## Why

Memory startup runs in the background after an eligible turn, but it can
consume Codex backend quota at exactly the wrong time: when the user is
already near a rate-limit boundary. This PR adds a guard so the memory
pipeline backs off when the Codex rate-limit snapshot says the remaining
budget is too low.

## What Changed

- Added `memories.min_rate_limit_remaining_percent` with a default of
`25`, clamped to `0..=100`, and regenerated `core/config.schema.json`.
- Added `codex-rs/memories/write/src/guard.rs`, which fetches Codex
backend rate limits before memory startup and skips phase 1 / phase 2
when the Codex limit is reached or either tracked window is above the
configured usage ceiling.
- Keeps startup best-effort: non-Codex auth or rate-limit fetch/client
failures preserve the existing memory startup behavior.
- Records a `codex.memory.startup` counter with
`status=skipped_rate_limit` when startup is skipped.
- Added config parsing/clamping coverage and guard unit tests.

## Verification

- Added `codex-rs/memories/write/src/guard_tests.rs` for threshold,
primary/secondary window, and reached-limit behavior.
- Added config tests for TOML parsing and clamping.
2026-04-28 17:07:16 +02:00
efrazer-oai
0e8d6b8765 fix: configure AgentIdentity AuthAPI base URL (#19904)
## Summary

AgentIdentity runtime loading currently registers tasks against a single
hardcoded AuthAPI base URL. That works for production, but local and
staging validation may need registration to target a different
authapi-login-provider without baking internal staging service URLs into
the OSS binary.

This PR adds a small config surface for
`agent_identity_authapi_base_url` and threads it through the existing
auth-loading path as a direct argument. Explicit config wins. Without
config, task registration keeps using the production AuthAPI URL,
matching the current default behavior.

## Stack

1. openai/codex#19762 - `refactor: make auth loading async` (merged)
2. openai/codex#19763 - `refactor: load agent identity runtime eagerly`
3. This PR - `fix: configure AgentIdentity AuthAPI base URL`
4. openai/codex#19764 - `feat: verify agent identity JWTs with JWKS`

## Design decisions

- Keep the existing auth-loading shape and pass the new value as an
argument. This avoids another wrapper loader and keeps the call path
readable.
- Add config instead of embedding internal staging URLs. Environments
that need a non-production AuthAPI can configure it explicitly.
- Keep the default AuthAPI registration URL as production.
`chatgpt_base_url` remains separate and is used by the follow-up JWKS
verification PR for fetching public keys from the ChatGPT backend route.
- Resolve the AuthAPI base URL inside AgentIdentity loading, because
task registration is the only consumer of this value.

## Testing

Tests: targeted Rust checks, AgentIdentity auth tests, config schema
regeneration, formatter/fix pass, and whitespace diff check.
2026-04-28 08:06:45 -07:00
jif-oai
a9e5c34083 feat: trigger memories from user turns with cooldown (#19970)
## Why

Memory startup was tied to thread lifecycle events such as create, load,
and fork. That can run memory work before a thread receives real user
input, and it makes startup cost scale with thread management instead of
actual turns. Moving the trigger to `thread/sendInput` keeps memory
startup aligned with the first real user turn and lets it use the
current thread config at turn time.

The idea is to prevent ghost cost due to pre-warm triggered by the app

Turn-based startup can also make global phase-2 consolidation easier to
request repeatedly, so this adds a success cooldown and tightens the
default startup scan window.

## What Changed

- Start `codex_memories_write::start_memories_startup_task` after a
non-empty `thread/sendInput` turn is submitted, instead of from thread
create/load/fork paths:
d4a6885b78/codex-rs/app-server/src/codex_message_processor.rs (L6477-L6487)
- Expose `CodexThread::config()` so app-server can pass the live config
into memory startup at turn time.
- Add a six-hour successful-run cooldown for global phase-2
consolidation via `SkippedCooldown`:
d4a6885b78/codex-rs/state/src/runtime/memories.rs (L963-L966)
- Reduce memory startup defaults to at most 2 rollouts over 10 days:
d4a6885b78/codex-rs/config/src/types.rs (L31-L34)

## Verification

Updated the memory runtime coverage around phase-2 reclaim behavior,
including `phase2_global_lock_respects_success_cooldown`.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 16:23:13 +02:00
jif-oai
fa127be25f Stabilize memory Phase 2 input ordering (#19967)
## Why

Phase 2 still needs to choose the most relevant stage-1 memory outputs
by usage and recency, but exposing that ranking as the rendered
`raw_memories.md` order creates unnecessary large diff. Usage-count or
timestamp changes can reshuffle otherwise unchanged memories, making the
workspace diff noisy and giving the consolidation prompt a misleading
recency signal from file position.
This fix will reduce token consumption

## What Changed

- Keep the existing top-N Phase 2 selection ranking by `usage_count`,
`last_usage`, `source_updated_at`, and `thread_id`.
- Return the selected rows in stable ascending `thread_id` order before
syncing Phase 2 filesystem inputs.
- Update the memory README, raw memories header, and consolidation
prompt so they describe the stable order and tell the prompt to use
metadata and workspace diffs instead of file order as the recency
signal.
- Adjust the memory runtime tests to use deterministic thread IDs and
assert the stable return order separately from the ranked selection
semantics.

## Test Coverage

- Existing memory runtime tests in
`codex-rs/state/src/runtime/memories.rs` now cover the stable returned
ordering for Phase 2 inputs.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 13:32:05 +02:00
jif-oai
54d1401170 feat: fix hinting 3 (#19963)
Fix https://github.com/openai/codex/pull/19805#discussion_r3153265562
2026-04-28 13:12:51 +02:00
jif-oai
b7c0f26910 feat: fix hinting 2 (#19961)
Fix this:
https://github.com/openai/codex/pull/19805#discussion_r3153265562
2026-04-28 13:06:41 +02:00
jif-oai
431ebeaef7 feat: split memories part 2 (#19860)
Keep extracting memories out of core and moving the write trigger in the
app-server
This is temporary and it should move at the client level as a follow-up
This makes core fully independant from `codex-memories-write`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 13:03:28 +02:00
jif-oai
fd36838cf3 Add MultiAgentV2 root and subagent context hints (#19805)
## Why

MultiAgentV2 sessions need startup guidance that matches the role of the
thread that is actually being created. Root agents and subagents have
different responsibilities, and forked subagents can inherit parent
rollout history. If the parent hint is carried into the child context,
the child can see stale or conflicting developer guidance before its own
session-specific context is added.

## What changed

- Added `features.multi_agent_v2.root_agent_usage_hint_text` and
`features.multi_agent_v2.subagent_usage_hint_text` config fields,
including schema/config parsing support.
- Injected the matching root or subagent hint into the initial context
as its own developer message when `multi_agent_v2` is enabled.
- Filtered configured MultiAgentV2 usage-hint developer messages out of
forked parent history so a child thread receives fresh guidance for its
own session source/config.
- Added targeted coverage for config parsing, initial-context rendering,
feature-config deserialization, and forked-history filtering.

## Context examples

With this config:

```toml
[features.multi_agent_v2]
enabled = true
root_agent_usage_hint_text = "Root guidance."
subagent_usage_hint_text = "Subagent guidance."
```

A root thread initial context renders the root hint as a standalone
developer message:

```text
[developer]
<existing developer context, when present>

[developer]
Root guidance.
```

A subagent thread initial context renders the subagent hint instead:

```text
[developer]
<existing developer context, when present>

[developer]
Subagent guidance.
```

When a subagent forks parent history, any parent developer message whose
text exactly matches the configured MultiAgentV2 root or subagent hint
is omitted from the forked history before the child receives its fresh
subagent hint.
2026-04-28 12:31:45 +02:00
xli-oai
803705f795 Add remote plugin uninstall API (#19456)
## Summary
- Adds the remote `plugin/uninstall` request form using required
`pluginId` plus optional `remoteMarketplaceName`, while preserving local
`pluginId` uninstall.
- Adds `codex_core_plugins::remote::uninstall_remote_plugin` for the
deployed ChatGPT plugin backend uninstall path and validates the backend
returns the same id with `enabled: false`.
- Routes app-server remote uninstall through feature checks, remote
plugin id validation, backend mutation, local downloaded cache deletion,
cache clearing, docs, and regenerated protocol schemas.

## Tests
- `just write-app-server-schema`
- `just fmt`
- `cargo test -p codex-app-server-protocol
plugin_uninstall_params_serialization_omits_force_remote_sync`
- `cargo test -p codex-app-server plugin_uninstall --test all`
- `cargo test -p codex-app-server plugin_uninstall`
- `cargo build -p codex-cli`
- `CODEX_BIN=/Users/xli/code/codex/codex-rs/target/debug/codex python3
/Users/xli/.codex/skills/xli-test-marketplace-api/scripts/run_marketplace_api_matrix.py`
(44 pass / 0 fail)
- `just fix -p codex-app-server-protocol -p codex-app-server -p
codex-tui`
- `just fix -p codex-app-server`
2026-04-28 03:27:53 -07:00
xl-openai
7d72fc8f53 feat: Cache remote plugin bundles on install (#19914)
Remote installs now fetch, validate, download, and cache the plugin
bundle locally
2026-04-28 00:53:27 -07:00
Eric Traut
b985768dc1 Add codex update command (#19933)
## Why

Addresses #9274

Running `codex update` currently starts an interactive Codex session
with `update` as the prompt. That is a rough edge for users who expect a
direct self-update command after seeing the existing update notice, and
it forces them to copy the suggested package-manager command manually.

## What changed

- Added a top-level `codex update` subcommand.
- Reused the existing install-channel detection and update command
runner that the TUI already uses for update prompts.
- Exposed the update-action lookup from `codex-tui` so the CLI can
invoke the same behavior.
- Added CLI coverage to ensure `codex update` is parsed as a subcommand
instead of becoming an interactive prompt.

## Verification

- `cargo test -p codex-cli`
- `cargo test -p codex-tui update_action::tests`
2026-04-27 23:33:59 -07:00
Michael Bolin
0a32c8b396 app-server-protocol: mark permission profiles experimental (#19899)
## Why

`PermissionProfile` is now the canonical internal permissions
representation, but the app-server wire shape is still intentionally
unstable while the migration continues. Stable app-server clients should
not see or generate code for these fields until the wire format settles.

## What changed

- Marks every app-server v2 field that sends `PermissionProfile` as
experimental, including `command/exec`, `thread/start`, `thread/resume`,
`thread/fork`, and `turn/start` request/response payloads.
- Enables per-field experimental inspection for `command/exec`, so
`permissionProfile` is gated without making the entire method
experimental.
- Fixes the generated TypeScript schema filter to be comment-aware. The
previous scanner treated apostrophes inside doc comments as string
delimiters, so some experimental fields leaked into stable TypeScript
even though stable JSON was filtered correctly.

## Verification

- `cargo test -p codex-app-server-protocol`










---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19899).
* #19900
* __->__ #19899
2026-04-28 06:08:34 +00:00
Michael Bolin
341550c275 permissions: store thread sessions as profiles (#19776)
## Why

After thread sessions have a required `PermissionProfile`, the TUI no
longer needs to cache a separate legacy `SandboxPolicy` in
`ThreadSessionState`. Keeping the legacy field would reintroduce two
permission authorities in the session cache and make later
replay/switching logic easier to get wrong.

This PR keeps legacy app-server compatibility at the ingestion boundary:
old `sandbox` response values are still accepted, but they are
immediately converted to a cwd-anchored profile.

## What Changed

- Removes `ThreadSessionState.sandbox_policy`.
- Updates active-session permission syncing to write only the current
`PermissionProfile`.
- Updates thread-read/replay/test fixtures to use profiles as the cached
session permission source.
- Leaves legacy `sandbox` fields in app-server request/response protocol
paths unchanged; those are compatibility boundaries and are converted
before entering cached TUI state.

## Verification

- `cargo test -p codex-tui thread_session_state::tests --lib`
- `cargo test -p codex-tui
inactive_thread_started_notification_initializes_replay_session --lib`
- `cargo test -p codex-tui thread_events --lib`
- `just fix -p codex-tui`




































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19776).
* #19900
* #19899
* __->__ #19776
2026-04-28 05:49:58 +00:00
Eric Traut
92fb848065 Allow large remote app-server resume responses (#19920)
## Why

Remote TUI resume uses the app-server websocket client. That client
inherited tungstenite's default `16 MiB` frame limit, so a large saved
session could make `thread/resume` return a single JSON-RPC response
frame that the client rejected before the TUI could deserialize or
render it.

Fixes #19837

## What Changed

- Configure the remote app-server websocket client with a bounded `128
MiB` max frame/message size.
- Preserve the concrete remote worker exit reason when completing
pending requests after a transport/read failure instead of replacing it
with a generic channel-closed error.
- Add a regression test that sends a single `>16 MiB` JSON-RPC response
frame and verifies the typed request succeeds.

Note: This isn't a perfect fix. It really just moves the limit to a much
larger value. I looked at a bunch of other potential fixes (both
server-side and client-side), and they all involved significant
complexity, had backward-compatibility impact, or impacted performance
of common use cases. This simple fix should address the vast majority of
remote use cases.

## Verification

I reproed the problem locally using a long rollout. Verified that fix
addresses connection drop.
2026-04-27 22:44:10 -07:00
Michael Bolin
fc2a69107c permissions: derive snapshot sandbox projections (#19775)
## Why

`ThreadConfigSnapshot` is used by app-server and thread metadata code as
a stable view of active runtime settings. Keeping both `sandbox_policy`
and `permission_profile` in the snapshot duplicates permission state and
makes it possible for the legacy projection to drift from the canonical
profile.

The legacy `sandbox` value is still needed at app-server compatibility
boundaries, so this PR derives it on demand from the snapshot profile
and cwd instead of storing it.

## What Changed

- Removes `ThreadConfigSnapshot.sandbox_policy`.
- Adds `ThreadConfigSnapshot::sandbox_policy()` as a compatibility
projection from `permission_profile` plus `cwd`.
- Updates app-server response/metadata code and tests to call the
projection only where legacy fields still exist.
- Keeps snapshot construction profile-only so split filesystem rules,
disabled enforcement, and external enforcement remain represented by the
canonical profile.

## Verification

- `cargo test -p codex-app-server
thread_response_permission_profile_preserves_enforcement --lib`
- `cargo test -p codex-core
dispatch_reclaims_stale_global_lock_and_starts_consolidation --lib`



































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19775).
* #19900
* #19899
* #19776
* __->__ #19775
2026-04-27 22:30:47 -07:00
Michael Bolin
bf38def44e permissions: make SessionConfigured profile-only (#19774)
## Why

`SessionConfiguredEvent` is the internal event that tells clients what
permissions are active for a session. Emitting both `sandbox_policy` and
`permission_profile` leaves two possible authorities and forces every
consumer to decide which one to honor. At this point in the migration,
the profile is expressive enough to represent managed, disabled, and
external sandbox enforcement, so the internal event can be profile-only.

The wire compatibility concern is older serialized events or rollout
data that only contain `sandbox_policy`; those still need to
deserialize.

## What Changed

- Removes `sandbox_policy` from `SessionConfiguredEvent` and makes
`permission_profile` required.
- Adds custom deserialization so old payloads with only `sandbox_policy`
are upgraded to a cwd-anchored `PermissionProfile`.
- Updates core event emission and TUI session handling to sync
permissions from the profile directly.
- Updates app-server response construction to derive the legacy
`sandbox` response field from the active thread snapshot instead of from
`SessionConfiguredEvent`.
- Updates yolo-mode display logic to treat both
`PermissionProfile::Disabled` and managed unrestricted filesystem plus
enabled network as full-access, while still preserving the distinction
between no sandbox and external sandboxing.

## Verification

- `cargo test -p codex-protocol session_configured_event --lib`
- `cargo test -p codex-protocol serialize_event --lib`
- `cargo test -p codex-exec session_configured --lib`
- `cargo test -p codex-app-server
thread_response_permission_profile_preserves_enforcement --lib`
- `cargo test -p codex-core
session_configured_reports_permission_profile_for_external_sandbox
--lib`
- `cargo test -p codex-tui session_configured --lib`
- `cargo test -p codex-tui
yolo_mode_includes_managed_full_access_profiles --lib`


































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19774).
* #19900
* #19899
* #19776
* #19775
* __->__ #19774
2026-04-27 22:06:47 -07:00
Eric Traut
5ba908d179 Avoid persisting ShutdownComplete after thread shutdown (#19630)
## Why

Fixes #19475.

`codex exec` can finish successfully and then emit an `ERROR` on stderr:

```text
failed to record rollout items: thread <id> not found
```

That happens because shutdown closes the live thread writer before
emitting `ShutdownComplete`. The terminal event was still using the
normal `send_event_raw` path, so it tried to append rollout items
through a recorder that had already been removed. The answer is correct,
but wrappers that treat stderr as failure can retry completed exec runs.

This looks like a likely recent regression from
[#18882](https://github.com/openai/codex/pull/18882), which routed live
thread writes through `ThreadStore` and added the shutdown-time live
writer close. I have not bisected this, so the PR treats #18882 as the
likely source based on the affected shutdown code path rather than a
proven first-bad commit.

## What Changed

`ShutdownComplete` now bypasses rollout persistence after thread
shutdown and is delivered directly to clients. The shutdown path still
records the protocol event in the rollout trace before delivery,
preserving trace visibility without attempting a post-shutdown
thread-store append.

The change also adds a regression test with the in-memory thread store
to assert that shutdown creates and shuts down the live thread without
appending another item after shutdown.
2026-04-27 22:02:08 -07:00
Eric Traut
b7e5588d18 Clarify PR template invitation requirement (#19912)
Addresses #19856

## Summary
- Clarifies that external code contributions are invitation only.
- Points contributors to `docs/contributing.md` for the full policy
instead of using the previous warning phrasing.
2026-04-27 21:45:15 -07:00
marksteinbrick-oai
6a8df2b61d [codex-analytics] include user agent in default headers (#17689)
## Summary
Adds the standard Codex `User-Agent` to shared default headers so the
responses-api WS handshake carries the same client OS and version
context as HTTP requests.

## Testing
- `cargo test -p codex-core
build_ws_client_metadata_includes_window_lineage_and_turn_metadata`
- `cargo test -p codex-core --test all responses_websocket`
2026-04-27 21:32:10 -07:00
efrazer-oai
c08177f7d0 refactor: load agent identity runtime eagerly (#19763)
## Summary

AgentIdentity auth previously registered the process task lazily behind
a `OnceCell`. That meant the auth object could be constructed before its
runtime task binding was known.

This PR makes AgentIdentity auth load the runtime task at auth load time
and stores the resulting process task id directly on the auth object.
The model-provider call path can then read a concrete task id instead of
handling a missing lazy value.

## Stack

1. [refactor: make auth loading
async](https://github.com/openai/codex/pull/19762) (merged)
2. **This PR:** [refactor: load AgentIdentity runtime
eagerly](https://github.com/openai/codex/pull/19763)
3. [fix: configure AgentIdentity AuthAPI base
URL](https://github.com/openai/codex/pull/19904)
4. [feat: verify AgentIdentity JWTs with
JWKS](https://github.com/openai/codex/pull/19764)

## Important call sites

| Area | Change |
| --- | --- |
| `AgentIdentityAuth::load` | Registers the process task during auth
loading and stores `process_task_id`. |
| `CodexAuth::from_agent_identity_jwt` | Awaits AgentIdentity auth
loading. |
| model-provider auth | Reads a concrete `process_task_id` instead of an
optional lazy value. |
| AgentIdentity auth tests | Mock task registration now covers eager
runtime allocation. |

## Design decisions

AgentIdentity auth now treats task registration as part of constructing
a usable auth object. That matches how callers use the value: once auth
is present, the model-provider path expects the task-scoped assertion
data to be ready.

## Testing

Tests: targeted Rust auth test compilation, formatter, scoped Clippy
fix, and Bazel lock check.
2026-04-27 21:09:26 -07:00
canvrno-oai
2307aa8d98 Allow /statusline and /title slash commands during active turns (#19917)
- Marks `/title` and `/statusline` as available during active tasks.
- Extends the existing slash-command availability test coverage to
include these commands alongside `/goal`.
2026-04-27 20:57:20 -07:00
Michael Bolin
af95662a70 permissions: require profiles in TUI thread state (#19773)
## Why

`ThreadSessionState` is the TUI's cached view of an app-server session.
To make `PermissionProfile` the canonical runtime permissions model,
cached thread sessions need to always have a profile instead of treating
the profile as an optional supplement to a legacy `sandbox` response
field.

The main compatibility concern is older app-server v2 lifecycle
responses that only include `sandbox` and omit `permissionProfile`:

- `thread/start` -> `ThreadStartResponse.sandbox`
- `thread/resume` -> `ThreadResumeResponse.sandbox`
- `thread/fork` -> `ThreadForkResponse.sandbox`

Those responses must still hydrate correctly when the TUI is pointed at
an older app-server. This PR converts the legacy `sandbox` value into a
`PermissionProfile` immediately at response ingestion time, using the
response `cwd`, so cached sessions do not carry an optional profile that
can later reinterpret cwd-bound grants against a different thread cwd.

This fallback is intentionally boundary compatibility. The follow-up PRs
in this stack continue the cleanup by making `SessionConfiguredEvent`
profile-only, deriving sandbox projections from snapshots only when an
API still needs them, and then removing `sandbox_policy` from
`ThreadSessionState`.

## What Changed

- Makes `ThreadSessionState.permission_profile` required.
- Converts legacy app-server response `sandbox` values into a
`PermissionProfile` at ingestion time using the response cwd.
- Ensures `thread/read` hydration does not reuse a primary session
profile that may be anchored to a different cwd; it uses the active
widget permission settings for the read thread fallback instead of
reusing cached primary-session permissions.
- Keeps the app-server request path unchanged: embedded sessions send
profiles, while remote sessions continue using legacy sandbox overrides
for compatibility.

## Verification

- `cargo test -p codex-tui thread_read --lib`
- `cargo test -p codex-tui
permission_settings_sync_preserves_active_profile_only_rules --lib`
- `cargo test -p codex-tui
resume_response_restores_turns_from_thread_items --lib`
- `cargo test -p codex-tui thread_session_state::tests --lib`




























---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19773).
* #19900
* #19899
* #19776
* #19775
* #19774
* __->__ #19773
2026-04-27 20:39:06 -07:00
pakrym-oai
4e05f3053c Remove ghost snapshots (#19481)
## Summary
- Remove `ghost_snapshot` / `GhostCommit` from the Responses API surface
and generated SDK/schema artifacts.
- Keep legacy config loading compatible, but make undo a no-op that
reports the feature is unavailable.
- Clean up core history, compaction, telemetry, rollout, and tests to
stop carrying ghost snapshot items.

## Testing
- Unit tests passed for `codex-protocol`, `codex-core` targeted undo and
compaction flows, `codex-rollout`, and `codex-app-server-protocol`.
- Regenerated config and app-server schemas plus Python SDK artifacts
and verified they match the checked-in outputs.
2026-04-27 18:48:57 -07:00
Dylan Hurd
7e8594fc19 Stabilize plugin MCP fixture tests (#19452)
## Why

Recent `main` CI had repeated flakes in the plugin fixture tests:

- `codex-core::all
suite::plugins::explicit_plugin_mentions_inject_plugin_guidance` failed
in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).
- `codex-core::all suite::plugins::plugin_mcp_tools_are_listed` failed
in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).

The failures were in the same plugin/MCP fixture family: assertions
expected sample plugin guidance or tool inventory, but the test could
observe the session before the sample MCP server had finished startup.

## Root Cause

`explicit_plugin_mentions_inject_plugin_guidance` submitted the user
turn immediately after constructing the session. MCP startup is
asynchronous, so on a slower or busier CI runner the prompt could be
built before the sample plugin MCP server had reported its tools. That
made the test depend on scheduler timing rather than the fixture being
ready.

`plugin_mcp_tools_are_listed` already needed the same readiness
condition, but its wait logic was local to that test.

## What Changed

- Added a shared `wait_for_sample_mcp_ready` helper for the plugin
fixture tests.
- Wait for `McpStartupComplete` before submitting the explicit plugin
mention turn.
- Reuse the same readiness helper in the MCP tool-listing test.

## Why This Should Be Reliable

The tests now wait for the explicit readiness signal from the sample MCP
server before asserting guidance or tools derived from that server. This
removes the startup race while still exercising the real fixture path,
so the assertions should only run after the plugin inventory is
deterministic.

## Verification

- `cargo test -p codex-core --test all plugins::`
- GitHub CI for this PR is passing.
2026-04-28 01:14:44 +00:00
Michael Zeng
a3350de855 Refactor exec-server filesystem API into codex-file-system (#19892)
## Summary
- Extracted the shared filesystem types and `ExecutorFileSystem` trait
into a new `codex-file-system` crate
- Switched `codex-config` and `codex-git-utils` to depend on that crate
instead of `codex-exec-server`
- Kept `codex-exec-server` re-exporting the same API for existing
callers

## Testing
- Ran `cargo test -p codex-file-system`
- Ran `cargo test -p codex-git-utils`
- Ran `cargo test -p codex-config`
- Ran `cargo test -p codex-exec-server`
- Ran `just fix -p codex-file-system`, `just fix -p codex-git-utils`,
`just fix -p codex-config`, `just fix -p codex-exec-server`
- Ran `just fmt`
- Updated and verified the Bazel module lockfile
2026-04-27 17:43:15 -07:00
colby-oai
2f3b5ed81a disallow fileparams metadata for custom mcps (#19836)
## Summary
Disallow fileParams metadata for custom MCPs 

Restricts Codex openai/fileParams handling to the first-party codex_apps
MCP server. Custom MCP servers may still advertise the metadata, but
Codex now ignores it for upload rewriting, preventing non-Apps tools
from receiving signed OpenAI file refs for local paths. Added a
regression test for the allowed and denied cases.
2026-04-27 20:42:10 -04:00
Michael Bolin
755880ef9c permissions: derive config defaults as profiles (#19772)
## Why

This continues the permissions migration by making legacy config default
resolution produce the canonical `PermissionProfile` first. The legacy
`SandboxPolicy` projection should stay available at compatibility
boundaries, but config loading should not create a legacy policy just to
immediately convert it back into a profile.

Specifically, when `default_permissions` is not specified in
`config.toml`, instead of creating a `SandboxPolicy` in
`codex-rs/core/src/config/mod.rs` and then trying to derive a
`PermissionProfile` from it, we use `derive_permission_profile()` to
create a more faithful `PermissionProfile` using the values of
`ConfigToml` directly.

This also keeps the existing behavior of `sandbox_workspace_write` and
extra writable roots after #19841 replaced `:cwd` with `:project_roots`.
Legacy workspace-write defaults are represented as symbolic
`:project_roots` write access plus symbolic project-root metadata
carveouts. Extra absolute writable roots are still added directly and
continue to get concrete metadata protections for paths that exist under
those roots.

The platform sandboxes differ when a symbolic project-root subpath does
not exist yet.

* **Seatbelt** can encode literal/subpath exclusions directly, so macOS
emits project-root metadata subpath policies even if `.git`, `.agents`,
or `.codex` do not exist.
* **bwrap** has to materialize bind-mount targets. Binding `/dev/null`
to a missing `.git` can create a host-visible placeholder that changes
Git repo discovery. Binding missing `.agents` would not affect Git
discovery, but it would still create a host-visible project metadata
placeholder from an automatic compatibility carveout. Linux therefore
skips only missing automatic `.git` and `.agents` read-only metadata
masks; missing `.codex` remains protected so first-time project config
creation goes through the protected-path approval flow. User-authored
`read` and `none` subpath rules keep normal bwrap behavior, and `none`
can still mask the first missing component to prevent creation under
writable roots.

## What Changed

- Adds profile-native helpers for legacy workspace-write semantics,
including `PermissionProfile::workspace_write_with()`,
`FileSystemSandboxPolicy::workspace_write()`, and
`FileSystemSandboxPolicy::with_additional_legacy_workspace_writable_roots()`.
- Makes `FileSystemSandboxPolicy::workspace_write()` the single legacy
workspace-write constructor so both `from_legacy_sandbox_policy()` and
`From<&SandboxPolicy>` include the project-root metadata carveouts.
- Removes the no-carveout `legacy_workspace_write_base_policy()` path
and the `prune_read_entries_under_writable_roots()` cleanup that was
only needed by that split construction.
- Adds `ConfigToml::derive_permission_profile()` for legacy sandbox-mode
fallback resolution; named `default_permissions` profiles continue
through the permissions profile pipeline instead of being reconstructed
from `sandbox_mode`.
- Updates `Config::load()` to start from the derived profile, validate
that it still has a legacy compatibility projection, and apply
additional writable roots directly to managed workspace-write filesystem
policies.
- Updates Linux bwrap argument construction so missing automatic
`.git`/`.agents` symbolic project-root read-only carveouts are skipped
before emitting bind args; missing `.codex`, user-authored `read`/`none`
subpath rules, and existing missing writable-root behavior are
preserved.
- Adds coverage that legacy workspace-write config produces symbolic
project-root metadata carveouts, extra legacy workspace writable roots
still protect existing metadata paths such as `.git`, and bwrap skips
missing `.git`/`.agents` project-root carveouts while preserving missing
`.codex` and user-authored missing subpath rules.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19772).
* #19776
* #19775
* #19774
* #19773
* __->__ #19772
2026-04-27 16:50:10 -07:00
pakrym-oai
c5a495c2cd Streamline review and feedback handlers (#19498)
## Why

The remaining review, interrupt, fuzzy search, feedback, and git-diff
handlers still had local send-error branches that obscured otherwise
simple request handling. This final slice flattens those handlers
without changing the public protocol behavior.

## What Changed

- Streamlined review start, turn interrupt, fuzzy search session,
feedback upload, and git diff handlers in
`codex-rs/app-server/src/codex_message_processor.rs`.
- Converted validation and upload failures into returned JSON-RPC errors
where that avoids nested `send_error`/`return` blocks.
- Left unrelated sandbox setup and notification code untouched.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::review --
--test-threads=1`
2026-04-27 16:36:04 -07:00
Matthew Zeng
dcd139b7c4 Add MCP app feature flag (#19884)
## Summary
- Add the `enable_mcp_apps` feature flag to the `codex-features`
registry
- Keep it under development and disabled by default

## Testing
- Unit tests for `codex-features` passed
- Formatting passed
2026-04-27 16:28:49 -07:00
canvrno-oai
e64c765673 Show action required in terminal title (#18372)
Implements #18162

This updates the TUI terminal title to show an explicit action-required
state when Codex is blocked on user approval or input. The terminal
title now uses the activity title item to cover both active work and
blocked-on-user states, while still accepting the legacy spinner config
value.

Changes
- Rename the terminal title item from `spinner` to `activity` while
preserving legacy config compatibility
- Show `[ ! ] Action Required `while approval or input overlays are
active, with a blinking `[ . ]` alternate state
- Suppress the normal working spinner while Codex is blocked on user
action
- Add targeted coverage for action-required title behavior and legacy
title-item parsing

Testing
- Trigger an approval or input modal and confirm the tab title
alternates between `[ ! ] Action Required` and `[ . ] Action Required`
- Disable the activity title item and confirm the action-required title
does not appear
- Resolve the prompt and confirm the title returns to the normal
spinning/idel state


https://github.com/user-attachments/assets/e9ecc530-a6be-4fd7-b9a6-d550a790eb2c
2026-04-27 15:27:11 -07:00
pakrym-oai
e903d000b0 Streamline turn and realtime handlers (#19497)
## Why

Turn and realtime handlers had nested validation and send-error branches
that made the request path longer than the behavior warranted. This
slice keeps the same request semantics while letting the handlers return
errors from the failing step.

## What Changed

- Streamlined turn start, injected item, and turn steer request handling
in `codex-rs/app-server/src/codex_message_processor.rs`.
- Applied the same result-returning shape to realtime session response
handlers.
- Preserved existing request validation and thread-manager interactions.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::turn_start --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::turn_steer --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::thread_inject_items --
--test-threads=1`
2026-04-27 15:21:59 -07:00
pakrym-oai
739ab6bc51 Streamline thread resume and fork handlers (#19495)
## Why

Thread resume and fork had some of the deepest error-handling
indentation in this area because helpers emitted request errors
directly. Returning those failures gives the handlers a single request
boundary while preserving the async pending-resume behavior.

## What Changed

- Converted thread resume helpers in
`codex-rs/app-server/src/codex_message_processor.rs` to return `Result`
values for validation and view loading failures.
- Applied the same pattern to thread fork request handling.
- Simplified pending resume error construction by using the shared
JSON-RPC error helpers.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::thread_resume --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::thread_fork --
--test-threads=1`
2026-04-27 15:04:53 -07:00
cassirer-openai
30c5c768de [codex] Trace cancelled inference streams (#19839)
Records cancelled inference streams when Codex stops consuming a
provider response before `response.completed`, preserving complete
output items observed before cancellation.

Also closes still-running inference calls when the owning turn ends, so
reduced rollout traces do not leave stale `Running` inference nodes.

Covered by focused reducer coverage and a core stream-drop test for
partial output preservation.
2026-04-27 21:58:29 +00:00
pakrym-oai
2be9fd5a93 Streamline thread read handlers (#19494)
## Why

The thread read/list handlers mostly assemble views, but their error
handling was interleaved with response emission. Returning view-building
errors from the helper path keeps those handlers focused on data
assembly.

## What Changed

- Added a small mapper for `ThreadReadViewError` to JSON-RPC errors in
`codex-rs/app-server/src/codex_message_processor.rs`.
- Streamlined thread list, loaded-thread, read, turn-list, and summary
handlers to produce result values for the request boundary.
- Kept the existing invalid-request vs internal-error distinctions for
missing or unreadable thread data.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all conversation_summary --
--test-threads=1`
2026-04-27 14:30:24 -07:00
Steve Coffey
0f40261e86 Publish Python SDK with Codex-pinned versioning (#18996)
**note**: a large chunk of this diff comes from regenerating Python
types after app-server schema changes on `main`.

This is PR 3 of 3 for the Python SDK PyPI publishing split. PR #18862
refreshed the generated SDK surface, and PR #18865 made the runtime
package publishable as `openai-codex-cli-bin`; this final PR makes the
SDK package publishable as `openai-codex-app-server-sdk` and pins both
packages to the same Codex runtime version.

The key idea is that the published SDK version is the Codex runtime
version. That one version now drives the SDK package version, the exact
runtime dependency, the client version reported by the SDK, and the
bootstrap runtime pin. This keeps release-time versioning in one lane
instead of scattering checked-in literals through the package.

## What changed

- Rename the SDK distribution from `codex-app-server-sdk` to
`openai-codex-app-server-sdk` for conflict-free PyPI publishing.
- Use `stage-sdk --codex-version ...` with one Codex version for both
the SDK package version and exact `openai-codex-cli-bin` dependency.
- Preserve hidden legacy `--runtime-version` / `--sdk-version` args only
to reject mismatched versions during staging.
- Map PEP 440 package versions back to Codex release tags for runtime
setup downloads, e.g. `0.116.0a1` -> `rust-v0.116.0-alpha.1`.
- Derive `codex_app_server.__version__`, the default
`AppServerConfig.client_version`, and
`_runtime_setup.pinned_runtime_version()` from the SDK package/project
version instead of hardcoding duplicate version strings.
- Carry the current generated SDK refresh from `main` so
`generate-types` stays clean after recent app-server schema changes.
- Update `sdk/python/uv.lock` for the renamed editable package.

## Validation

- `uv run --extra dev pytest` in `sdk/python` -> 59 passed, 37 skipped.
- Targeted `uv run ruff check` for the touched SDK files.
- `git diff --check`.
- Staged runtime with `--codex-version rust-v0.116.0-alpha.1
--platform-tag macosx_11_0_arm64`.
- Staged SDK with `--codex-version rust-v0.116.0-alpha.1`.
- Built runtime wheel, SDK wheel, and SDK sdist.
- `twine check /tmp/codex-python-pr3-build/dist/*` -> passed.
- Clean venv smoke installed `openai-codex-app-server-sdk==0.116.0a1`
from local dist and pulled `openai-codex-cli-bin==0.116.0a1`.
- Smoke imports passed for `Codex` and `bundled_codex_path()`.
2026-04-27 14:28:46 -07:00
starr-openai
4ded800374 [codex] Shard exec Bazel integration test (#19862)
## Summary

- shard `//codex-rs/exec:exec-all-test` into 8 Bazel shards
- keep the existing `no-sandbox` test tag unchanged

## Why

The Windows Bazel lane has been timing out this aggregated integration
test target at the default 300s test timeout. The target runs the
combined `codex-rs/exec/tests/all.rs` integration binary; sharding lets
Bazel split the Rust test cases across parallel test actions instead of
running the whole integration suite as one long action.

## Validation

Not run locally, per the Codex repo workflow for development-phase
changes.

Co-authored-by: Codex <noreply@openai.com>
2026-04-27 21:22:53 +00:00
pakrym-oai
5c30d79afb Streamline thread mutation handlers (#19493)
## Why

Thread mutation handlers had many short error branches whose only job
was to emit a JSON-RPC error and stop. This slice keeps those errors
visible, but lets each handler build a result and return early from
validation helpers instead of nesting the main path.

## What Changed

- Streamlined thread archive/unarchive, rename, memory, metadata,
rollback, compact, background terminal, shell, and guardian handlers in
`codex-rs/app-server/src/codex_message_processor.rs`.
- Reused shared JSON-RPC error constructors in
`codex-rs/app-server/src/bespoke_event_handling.rs` for rollback-related
request failures.
- Preserved direct `send_error` calls where they remain the simplest
boundary for pending async event responses.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::thread_rollback --
--test-threads=1`
2026-04-27 14:18:55 -07:00
joeytrasatti-openai
798de22637 [codex-backend] Prefer state git metadata in filtered thread lists (#19874)
### Summary

- `thread/list` filtered filesystem results already overlay state DB
metadata, but the existing merge only filled missing git fields.
- Prefer non-null SQLite git metadata over stale non-null rollout values
so persisted branch/SHA/origin updates are reflected in filtered thread
lists.
- Update the focused merge test to cover stale filesystem git metadata
being replaced by state-backed values.

### Testing

now getting expected icons
<img width="426" height="913" alt="Screenshot 2026-04-27 at 1 45 45 PM"
src="https://github.com/user-attachments/assets/027fb7e7-f54d-4353-8423-cb76f3c8f5ac"
/>
2026-04-27 21:02:40 +00:00
pakrym-oai
c5e2921e1d Streamline thread start handler (#19492)
## Why

The thread start handler mixed request validation, thread construction,
dynamic-tool validation, and JSON-RPC error emission in one nested flow.
Returning request errors from the helper path makes the successful setup
path easier to follow.

## What Changed

- Reworked `thread/start` handling in
`codex-rs/app-server/src/codex_message_processor.rs` so helper methods
return `Result` and the handler emits one result.
- Moved dynamic-tool validation failures into returned JSON-RPC errors
instead of local `send_error` branches.
- Preserved the existing thread creation and task-spawning behavior.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::dynamic_tools --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::turn_start --
--test-threads=1`
2026-04-27 13:56:20 -07:00
Michael Bolin
4b55979755 permissions: remove cwd special path (#19841)
## Why

The experimental `PermissionProfile` API had both `:cwd` and
`:project_roots` special filesystem paths, which made the permission
root ambiguous. This PR removes the unstable `current_working_directory`
special path before the permissions API is stabilized, so callers use
`:project_roots` for symbolic project-root access.

## What changed

- Removes `FileSystemSpecialPath::CurrentWorkingDirectory` from protocol
and app-server protocol models, plus regenerated app-server
JSON/TypeScript schemas.
- Replaces internal `:cwd` permission entries with `:project_roots`
entries.
- Keeps the existing cwd-update behavior for legacy-shaped
workspace-write profiles, while removing the deleted
`CurrentWorkingDirectory` case from that compatibility path.
- Keeps `PermissionProfile::workspace_write()` as the reusable symbolic
workspace-write helper, with docs noting that `:project_roots` entries
resolve at enforcement time.
- Updates app-server docs/examples and approval UI labeling to stop
advertising `:cwd` as a permission token.

## Compatibility

Persisted rollout items may contain the old
`{"kind":"current_working_directory"}` tag from earlier experimental
`permissionProfile` snapshots. This PR keeps that tag as a
deserialize-only alias for `ProjectRoots { subpath: None }`, while
continuing to serialize only the new `project_roots` tag.

## Follow-up

This PR intentionally does not introduce an explicit project-root set on
`SessionConfiguration` or runtime sandbox resolution. Today, the
resolver still uses the active cwd as the single implicit project root.
A follow-up should model project roots separately from tool cwd so
`:project_roots` entries can resolve against the configured project
roots, and resolve to no entries when there are no project roots.

## Verification

- `cargo test -p codex-protocol permissions:: --lib`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-sandboxing -p codex-exec-server --lib`
- `cargo test -p codex-core session_configuration_apply_ --lib`
- `cargo test -p codex-app-server
command_exec_permission_profile_project_roots_use_command_cwd --test
all`
- `cargo test -p codex-tui
thread_read_session_state_does_not_reuse_primary_permission_profile
--lib`
- `cargo test -p codex-tui
preset_matching_accepts_workspace_write_with_extra_roots --lib`
- `cargo test -p codex-config --lib`
2026-04-27 13:41:27 -07:00
Eric Traut
52c06b8759 Preserve TUI markdown list spacing after code blocks (#19706)
## Why

Fixes #19702.

The TUI markdown renderer could visually attach the next list marker to
a fenced code block inside the previous list item, even when the source
markdown included a blank line before the next item. That made
block-heavy loose lists harder to read, while the desired behavior is
still to keep simple lists compact.

## What changed

- Track whether the current rendered list item contains a code block.
- Preserve one blank separator before the following list marker only
when the previous item contained a code block.
- Add regression coverage for both paths: code-block list items keep the
separator, and simple loose list items stay compact.

## Verification

- `cargo test -p codex-tui markdown_render`

I also manually verified that the bug exists before and is fixed after.

## Before
<img width="437" height="240" alt="Screenshot 2026-04-26 at 1 19 01 PM"
src="https://github.com/user-attachments/assets/3bc9d64d-2dba-40d9-9d6b-a1d0b3c0f728"
/>

## After
<img width="410" height="269" alt="Screenshot 2026-04-26 at 1 18 54 PM"
src="https://github.com/user-attachments/assets/19c15bee-da32-455e-a7cb-e05eb85f4ea0"
/>
2026-04-27 13:40:46 -07:00
Eric Traut
0bd25ab374 Delay approval prompts while typing (#19513)
## Why

Fixes #7744. Approval modals can currently appear while the user is
typing ahead in the TUI composer, which lets plain letters like `y` or
`a` get consumed as approval shortcuts instead of staying in the draft
input.

## What changed

- Track recent composer typing activity in `bottom_pane/mod.rs`.
- Delay new approval overlays for 1 second while the composer is active,
keeping delayed requests queued until the user is idle.
- Preserve the existing active-overlay behavior so approvals that arrive
while an approval modal is already open are still queued into that
overlay.
- Prune delayed approvals when app-server resolution says the request
has already been handled.

## Verification

Added unit coverage for immediate approvals, delayed approvals, idle
deadline reset, typed shortcut letters staying in the composer, shortcut
handling after the delay, and resolved delayed-request pruning.

Focused `codex-tui` test groups pass locally. The full `cargo test -p
codex-tui` run currently aborts in
`app::tests::attach_live_thread_for_selection_rejects_unmaterialized_fallback_threads`;
that same test also fails when run alone with the same stack overflow.

Manual reviewer check:

1. Start the TUI from the repo root:

   ```bash
   RUST_LOG=trace just codex \
     -c log_dir=<temp-log-dir> \
     --ask-for-approval untrusted \
     --sandbox workspace-write
   ```

2. Submit this prompt:

   ```text
   create a file text.txt on my desktop
   ```

3. While the agent is preparing the approval request, immediately type
text such as `ya this should stay in the composer`.
4. Confirm the typed-ahead `y`/`a` remains in the composer instead of
approving the request.
5. Stop typing for about 1 second; the approval modal should then
appear.
6. Once the modal is visible, press `y` and confirm the approval
shortcut works normally.
2026-04-27 13:20:55 -07:00
Eric Traut
850f035b8c Fix filtered thread-list resume regression in TUI (#19591)
## Why

`codex resume` regressed after
[#18502](https://github.com/openai/codex/pull/18502) changed the default
`thread/list` scan-and-repair path for metadata-filtered listings. The
TUI resume picker uses `thread/list` with source/provider/cwd filters
and `useStateDbOnly: false`, which is the intended
correctness-preserving mode: it should still consult the filesystem so
healthy, missing, or stale SQLite state can be repaired.

The regression was that #18502 made that filtered, filesystem-backed
path call `reconcile_rollout` for every filesystem hit, and then call it
again for each SQLite hit. When `reconcile_rollout` does not already
have extracted rollout items, it falls back to loading the full JSONL
rollout. That changed the resume picker’s first page from a cheap
rollout-head scan plus SQLite read-repair into full-file reads for large
sessions, so a few long threads could dominate TUI startup/resume
latency.

This change addresses the regression by keeping `useStateDbOnly: false`
on the correctness-preserving path while avoiding unnecessary full JSONL
reads for rows the filesystem scan has already validated.
Source/provider/cwd filters can be decided from rollout-head metadata,
so non-search resume listings only need the lightweight read-repair path
for filesystem hits. Full reconciliation is still used for DB-only
filtered rows because those can be stale false positives, and for search
listings because search can depend on title metadata that may require
scanning the full rollout.

This fixes #19483.

## What changed

- For non-search filtered listings, repair filesystem hits with the
lightweight `read_repair_rollout_path` path instead of full
`reconcile_rollout`.
- Track thread IDs proven by the filesystem scan and only fully
reconcile SQLite-filtered hits that the filesystem scan did not return,
preserving stale-DB false-positive cleanup without full-reading every
healthy rollout.
- Leave search listings on full reconciliation, since search depends on
full title metadata rather than only source/provider/cwd metadata from
the rollout head.

## Verification

- `cargo test -p codex-rollout list_threads`
- `cargo test -p codex-app-server thread_list`
2026-04-27 13:02:39 -07:00
Curtis 'Fjord' Hawthorne
277186ec85 Cap original-detail image token estimates (#19865)
Clamp original-detail image patch estimates to the current 10k patch
budget so large images cannot inflate local context accounting without
bound. Add regression coverage for an over-budget image.

Fixes openai/codex#19806.
2026-04-27 12:39:24 -07:00
rhan-oai
215d5a8f7c [codex-analytics] remove ga flag (#19863) 2026-04-27 19:29:19 +00:00
sayan-oai
85c1500569 fix: filter dynamic deferred tools from model_visible_specs (#19771)
fixes #19486

### Problem
Right now dynamic deferred tools are filtered at normal-turn prompt
building time, rather than upstream while building the `ToolRouter`
itself. This causes issues because dynamic deferred tools are then
wrongly included in the router's `model_visible_specs`, which is what
the compaction request-building flow relies on.

### Fix
Move the dynamic deferred tool filtering to `ToolRouter` creation time
to solve this problem for every request that relies on `ToolRouter` for
`model_visible_specs`, which solves the issue generically.

### Tests
Added unit + integration tests to ensure dynamic deferred tools are
omitted from `model_visible_specs` and compaction request respectively.

Tested against live `/compact` endpoint; raw deferred dynamic tools
without `tool_search` returned `400` (current bug), while the filtered
payload (this fix) returns `200`.
2026-04-27 19:09:02 +00:00
pakrym-oai
e5709db6dc Streamline account and command handlers (#19491)
## Why

Account login/logout and command exec handlers were doing local error
sends in the middle of each handler. That made these request flows
branch heavily even though most of the logic is validate, perform the
operation, and return the response.

## What Changed

- Converted ChatGPT/API-key login, login cancel, logout, rate-limit, and
add-credit handlers in
`codex-rs/app-server/src/codex_message_processor.rs` to compute `Result`
values and send them once at the request boundary.
- Applied the same shape to command exec start/write/resize/terminate
handlers.
- Kept side-effect notifications in the same places after successful
request handling.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::account --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::command_exec --
--test-threads=1`
2026-04-27 12:03:49 -07:00
Michael Bolin
cafe717dca ci: migrate Bazel setup away from archived setup-bazelisk (#19851)
## Why

All Bazel CI jobs are currently blocked in the `setup-bazelisk` step
while trying to download Bazelisk.
[`bazelbuild/setup-bazelisk`](https://github.com/bazelbuild/setup-bazelisk)
is archived, and its README now recommends migrating to
[`bazel-contrib/setup-bazel`](https://github.com/bazel-contrib/setup-bazel),
so leaving our workflows on the archived action leaves CI exposed to
exactly this sort of outage.

Because `v8-canary` now consumes the shared local `setup-bazel-ci`
action, that workflow also needs to trigger when the action changes.
Without that follow-up, Bazel bootstrap regressions specific to the V8
canary path could be skipped by the workflow path filters.

## What Changed

- Switched `.github/actions/setup-bazel-ci/action.yml` from
`bazelbuild/setup-bazelisk` to `bazel-contrib/setup-bazel`, pinned to
`0.19.0`.
- Left `bazelisk-version` unset so GitHub-hosted runners can use their
preinstalled Bazelisk instead of downloading `1.x` at job start.
- Updated `.github/workflows/rusty-v8-release.yml` and
`.github/workflows/v8-canary.yml` to use the shared `setup-bazel-ci`
action instead of referencing `setup-bazelisk` directly.
- Added `.github/actions/setup-bazel-ci/**` to the `pull_request` and
`push` path filters in `.github/workflows/v8-canary.yml` so changes to
the shared Bazel setup action still run the canary workflow.
- Kept the existing repository-cache and Windows-specific Bazel setup
logic intact.

This keeps Bazel version selection anchored by `.bazelversion` while
removing the failing dependency on the archived setup action.

## Verification

- Searched `.github/` to confirm there are no remaining `setup-bazelisk`
references.
- Parsed the updated workflow and action YAML locally with Ruby's
`YAML.load_file`.
2026-04-27 11:37:30 -07:00
Michael Bolin
c2084552d9 ci: pin npm staging smoke test to a recent rust-release run (#19854)
## Why

The `build-test` workflow stages a representative `codex` npm tarball by
asking `scripts/stage_npm_packages.py` to look up a past `rust-release`
run for a hardcoded release version. That started failing in CI because
the representative version in `.github/workflows/ci.yml` was stale:

- the workflow was still using `0.115.0`
- `stage_npm_packages.py` resolves native artifacts by looking for a
`rust-release` run on the `rust-v<version>` branch
- that lookup no longer found a matching run for `rust-v0.115.0`, so the
smoke test failed before it could stage the package

This PR makes that smoke test depend on a known-good recent release run
instead of an older branch lookup that is no longer reliable.

## What Changed

- Updated the representative release version in
`.github/workflows/ci.yml` from `0.115.0` to `0.125.0`.
- Added an explicit `WORKFLOW_URL` pointing at a recent successful
`rust-release` run:
`https://github.com/openai/codex/actions/runs/24901475298`.
- Passed that URL to `scripts/stage_npm_packages.py` via
`--workflow-url` so the job can reuse the expected native artifacts
directly instead of relying on `gh run list --branch rust-v<version>` to
discover them.

That keeps the npm staging smoke test representative while making it
less sensitive to older release branch history disappearing from the
GitHub Actions lookup path.

## Verification

- Inspected the failing CI log from `build-test` and confirmed the
failure came from `scripts/stage_npm_packages.py` being unable to
resolve `rust-v0.115.0`.
- Confirmed that
`https://github.com/openai/codex/actions/runs/24901475298` is a
successful `rust-release` run for `rust-v0.125.0`.
2026-04-27 11:32:48 -07:00
efrazer-oai
2009f6e894 refactor: make auth loading async (#19762)
## Summary

Auth loading used to expose synchronous construction helpers in several
places even though some auth sources now need async work. This PR makes
the auth-loading surface async and updates the callers to await it.

This is intentionally only plumbing. It does not change how
AgentIdentity tokens are decoded, how task runtime ids are allocated, or
how JWT signatures are verified.

## Stack

1. **This PR:** [refactor: make auth loading
async](https://github.com/openai/codex/pull/19762)
2. [refactor: load AgentIdentity runtime
eagerly](https://github.com/openai/codex/pull/19763)
3. [feat: verify AgentIdentity JWTs with
JWKS](https://github.com/openai/codex/pull/19764)

## Important call sites

| Area | Change |
| --- | --- |
| `codex-login` auth loading | `CodexAuth` and `AuthManager`
construction paths now await auth loading. |
| app-server startup | Auth manager construction is awaited during
initialization. |
| CLI/TUI/exec/MCP/chatgpt callers | Existing auth-loading calls now
await the same behavior. |
| cloud requirements storage loader | The loader becomes async so it can
share the same auth construction path. |
| auth tests | Tests that load auth now run in async contexts. |

## Testing

Tests: targeted Rust auth test compilation, formatter, scoped Clippy
fix, and Bazel lock check.
2026-04-27 11:00:27 -07:00
pakrym-oai
4ed22fc7d2 Streamline plugin, apps, and skills handlers (#19490)
## Why

The plugin, app, and skills handlers had a lot of repeated
`send_error`/`return` branches that made the success path hard to scan.
This slice keeps behavior the same while moving fallible steps into
local response-producing helpers, so the request boundary can send one
result.

## What Changed

- Converted plugin list/install/uninstall handlers in
`codex-rs/app-server/src/codex_message_processor/plugins.rs` to return
`Result<*Response, JSONRPCErrorError>` from helper methods and call
`send_result` once.
- Added local error-mapping helpers for plugin install/uninstall and
marketplace failures.
- Applied the same mechanical shape to app list, skills list/config, and
marketplace add/remove/upgrade handlers in
`codex-rs/app-server/src/codex_message_processor.rs`.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::app_list --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::plugin_ --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::skills_list --
--test-threads=1`
2026-04-27 10:18:25 -07:00
Eric Traut
48dd7b58f0 Render delegated patch approval details (#19709)
## Why

Fixes #19632.

When a delegated agent requests approval for an in-progress file change,
the parent TUI handles that request from an inactive thread. The app
server already sent the `FileChange` item with the proposed diff, but
the inactive-thread approval path was not recovering and rendering it
the same way as the active-thread path.

The result was an inconsistent approval prompt: main-thread edits show a
normal patch preview history item before the approval modal, while
delegated edits did not show that preview in the transcript flow.

## What Changed

- Recover buffered or historical `FileChange` item changes when building
inactive-thread file-change approval requests.
- Reuse the app-server file-change conversion helper for both live
transcript rendering and inactive-thread approvals.
- Render recovered delegated patches as a normal patch preview history
cell before the approval modal.
- Keep apply-patch approval modals focused on the decision prompt and
optional metadata; they do not render a synthetic command line or embed
the diff body.

## Manual Repro And Verification

I manually reproduced the issue using a file under `~/Desktop` so the
write would require approval.

Before the fix:

1. Ask the main thread: `Use apply_patch, not shell redirection or
Python, to create ~/Desktop/bug1.txt with three short lines.`
2. Observe the expected TUI shape: the transcript shows a normal patch
preview such as `• Added ~/Desktop/bug1.txt (+N -0)` above the approval
modal, and the modal contains only the approval prompt/options without a
synthetic command line.
3. Ask for the delegated path: `Spawn a worker. Have it use apply_patch,
not shell redirection or Python, to create ~/Desktop/bug1.txt with four
short lines.`
4. Observe the delegated approval is inconsistent: the parent view does
not render the proposed patch as the normal transcript preview before
the modal, so the diff context is missing from the stream or appears
inside the modal instead of in the history flow.

After the fix:

1. Repeat the delegated worker prompt with `apply_patch`.
2. Confirm the parent view renders the same normal patch preview history
cell (`• Added ~/Desktop/bug1.txt (+N -0)` plus the diff) immediately
before the approval modal.
3. Confirm the approval modal remains focused on the decision prompt.
For delegated approvals it may show the worker thread label, but it
should not show a `$ apply_patch` command line or embed the diff body in
the modal.
2026-04-27 10:07:15 -07:00
Eric Traut
0e2300c02c Persist shell mode commands in prompt history (#19618)
## Why

`!` shell commands are currently surfaced as "Bash mode", which is
misleading for users running shells such as PowerShell or zsh. Those
commands also bypass the persistent prompt history path, so they cannot
be recalled after starting a new session.

Fixes #19613.

## What changed

- Rename the TUI footer label and related test wording from "Bash mode"
to "Shell mode".
- Persist accepted `!` shell commands to prompt history with the leading
`!`, so recall restores the composer into shell mode across sessions.
- Add coverage for immediate and queued shell-command submissions
emitting the prompt-history update.

## Verification

- `cargo test -p codex-tui bang_shell`
- `cargo test -p codex-tui shell_command_uses_shell_accent_style`
- `cargo test -p codex-tui footer_mode_snapshots`
- `cargo insta pending-snapshots --manifest-path tui/Cargo.toml`

Manually verified fix after confirming presence of bug prior to fix.
2026-04-27 09:54:25 -07:00
Eric Traut
6c51bf0c7c Hide rewind preview when no user message exists (#19510)
## Why

Fixes #19508.

In a fresh TUI session, pressing `Esc` twice entered the rewind
transcript overlay even though there was no user message to rewind to.
That produced an empty header-only transcript view and exposed a rewind
flow that could not select a valid target.

## What changed

The backtrack flow now checks whether a user-message rewind target
exists before opening the transcript preview. If no target exists, Codex
stays in the main TUI and shows `No previous message to edit.` instead
of opening an empty overlay.

The same guard applies when starting rewind preview from the transcript
overlay, and the first `Esc` no longer advertises the “edit previous
message” hint when there is no previous message available.

Snapshot coverage was added for the unavailable rewind info message,
along with a small target-detection test.
2026-04-27 09:51:12 -07:00
jif-oai
bb83eec825 chore: split memories part 1 (#19818)
Extract memories into 2 different crates
2026-04-27 16:01:05 +02:00
jif-oai
f431ec12c9 nit: one more fix (#19813)
Fix this:
https://github.com/openai/codex/pull/19812#discussion_r3147529230
2026-04-27 15:32:31 +02:00
jif-oai
79b4f691a6 Avoid rewriting Phase 2 selection on clean workspace (#19812)
## Why

Phase 2 can now claim the global consolidation lock on startup even when
the git-backed memory workspace is already clean. The clean-workspace
path still finalized through the normal Phase 2 success path, which
clears and re-marks `selected_for_phase2` rows. That made no-op startups
perform avoidable writes to `stage1_outputs`, creating unnecessary DB
I/O and contention when no memory files changed.

## What Changed

- Added a preserving-selection Phase 2 finalizer in `codex-state` that
only marks the global job row as succeeded.
- Kept the existing `mark_global_phase2_job_succeeded` behavior for real
consolidation runs, where the selected Phase 2 snapshot must be
rewritten.
- Switched the `succeeded_no_workspace_changes` branch in
`core/src/memories/phase2.rs` to use the preserving-selection finalizer.
- Added a regression test that installs a SQLite trigger on
`stage1_outputs` and verifies the clean finalizer performs zero updates
there.

## Testing

- `cargo test -p codex-state`
- `cargo test -p codex-core memories::tests::phase2`
2026-04-27 15:14:16 +02:00
jif-oai
5d314f324c Allow Phase 2 memory claims after retry exhaustion (#19809)
## Why

The Phase 2 memories job row is only the global lock for the git-backed
memory workspace. Manual memory edits do not enqueue new Stage 1 work,
so a Phase 2 row with `retry_remaining = 0` could be skipped before the
worker ever claimed the lock and generated `phase2_workspace_diff.md`.

That left workspace-only changes unconsolidated after repeated failures,
even when retry backoff had elapsed and the filesystem had real diffable
work.

## What Changed

- Allow `try_claim_global_phase2_job` to claim the Phase 2 lock after
the retry budget is exhausted, while still respecting active `retry_at`
backoff and fresh running leases.
- Treat `SkippedRetryUnavailable` for Phase 2 as backoff-only, and
update the outcome docs to match.
- Clamp Phase 2 retry bookkeeping at zero when failed attempts are
recorded.

## Verification

- Added
`phase2_global_lock_can_be_claimed_after_retry_budget_is_exhausted` to
cover the exhausted-budget lock claim path.
- Ran `cargo test -p codex-state`.
2026-04-27 14:58:11 +02:00
jif-oai
01ab25dbb5 feat: use git-backed workspace diffs for memory consolidation (#18982)
## Why

This PR make the `morpheus` agent (memory phase 2) use a git diff to
start it's consolidation. The workflow is the following:
1. The agent acquire a lock
2. If `.codex/memories` does not exist or is not a git root, initialize
everything (and make a first empty commit)
3. Update `raw_memories.md` and `rollout_summaries/` as before.
Basically we select max N phase 1 memories based on a given policy
4. We use git (`gix`) to get a diff between the current state of
`.codex/memories` and the last commit.
5. Dump the diff in `phase2_workspace_diff.md`
6. Spawn `morpheus` and point it to `phase2_workspace_diff.md`
7. Wait for `morpheus` to be done
8. Re-create a new `.git` and make one single commit on it. We do this
because we don't want to preserve history through `.git` and this is
cheap anyway
9. We release the lock
On top of this, we keep the retry policies etc etc

The goals of this new workflow are:
* Better support of any memory extensions such as `chronicle`
* Allow the user to manually edit memories and this will be considered
by the phase 2 agent
 
As a follow-up we will need to add support for user's edition while
`morpheus` is running

## What Changed

- Added memory workspace helpers that prepare the git baseline, compute
the diff, write `phase2_workspace_diff.md`, and reset the baseline after
successful consolidation.
- Updated Phase 2 to sync current inputs into `raw_memories.md` and
`rollout_summaries/`, prune old extension resources, skip clean
workspaces, and run the consolidation subagent only when the workspace
has changes.
- Tightened Phase 2 job ownership around long-running consolidation with
heartbeats and an ownership check before resetting the baseline.
- Simplified the prompt and state APIs so DB watermarks are bookkeeping,
while workspace dirtiness decides whether consolidation work exists.
- Updated the memory pipeline README and tests for workspace diffs,
extension-resource cleanup, pollution-driven forgetting, selection
ranking, and baseline persistence.

## Verification

- Added/updated coverage in `core/src/memories/tests.rs`,
`core/src/memories/workspace_tests.rs`, `state/src/runtime/memories.rs`,
and `core/tests/suite/memories.rs`.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-27 14:32:44 +02:00
jif-oai
f8c527e529 multi_agent_v2: move thread cap into feature config (#19792)
## Why

`features.multi_agent_v2.max_concurrent_threads_per_session` is meant to
be the MultiAgentV2-specific session thread cap: it counts the root
thread and all open subagent threads. The previous implementation kept
this surface tied to `agents.max_threads`, which made it a global
subagent-only cap and allowed the legacy setting to coexist with
MultiAgentV2.

## What Changed

- Added `max_concurrent_threads_per_session` to
`[features.multi_agent_v2]` with default `4`.
- Removed the `[agents] max_concurrent_threads_per_session` alias to
`agents.max_threads`.
- When MultiAgentV2 is enabled, reject `agents.max_threads` and derive
the existing internal subagent slot limit as
`max_concurrent_threads_per_session - 1`.
- Regenerated `core/config.schema.json` and added coverage for the new
config semantics.

## Result
```
➜  codex git:(jif/clean-multi-agent-v2-config) codex -c features.multi_agent_v2.enabled=true -c features.multi_agent_v2.max_concurrent_threads_per_session=3
╭────────────────────────────────────────────────────╮
│ >_ OpenAI Codex (v0.0.0)                           │
│                                                    │
│ model:     gpt-5.5 xhigh   fast   /model to change │
│ directory: ~/code/codex                            │
╰────────────────────────────────────────────────────╯

  Tip: Update Required - This version will no longer be supported starting May 8th. Please upgrade to the latest version (https://github.com/openai/codex/releases/latest) using your preferred package manager.

› Can you try to spawn 4 agents


• I’ll try to start four lightweight agents at once and report exactly what the runtime accepts.

• Spawned Russell [no-apps] (gpt-5.5 xhigh)
  └ Spawn probe 1: reply briefly that you started, then wait for further instructions. Do not do any repo work.

• Spawned Descartes [no-apps] (gpt-5.5 xhigh)
  └ Spawn probe 2: reply briefly that you started, then wait for further instructions. Do not do any repo work.

• Agent spawn failed
  └ Spawn probe 3: reply briefly that you started, then wait for further instructions. Do not do any repo work.

• Agent spawn failed
  └ Spawn probe 4: reply briefly that you started, then wait for further instructions. Do not do any repo work.

• The runtime accepted the first two and rejected the next two with agent thread limit reached. I’m checking whether the two accepted probes have returned cleanly, then I’ll close them if needed.
```

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-27 13:31:56 +02:00
Eric Traut
4f1d5f00f0 Add Codex issue digest skill (#19779)
Problem: Maintainers need a shared way to run Codex GitHub issue digests
without copying large prompts or relying on manual GitHub page
summaries.

Solution: Add a reusable codex-issue-digest skill with a deterministic
GitHub collector, owner/all-label windows, reaction-aware activity
metrics, scaled attention markers, and focused tests.
2026-04-26 23:16:43 -07:00
Michael Bolin
a6ca39c630 permissions: derive legacy exec policies at boundaries (#19737)
## Why

After config and requirements store canonical profiles, exec requests
should not cache a derived `SandboxPolicy`. The cached legacy value can
drift from the richer profile state, and most execution paths already
have the filesystem and network runtime policies they need.

## What Changed

- Removes `sandbox_policy` from `codex_sandboxing::SandboxExecRequest`
and `codex_core::sandboxing::ExecRequest`.
- Adds an on-demand `ExecRequest::compatibility_sandbox_policy()` helper
for the Windows and legacy call sites that still need a `SandboxPolicy`
projection.
- Updates Windows filesystem override setup and unified exec policy
serialization to derive that compatibility policy at the boundary.
- Updates Unix escalation reruns and direct shell requests to
reconstruct exec requests from `PermissionProfile` plus runtime
filesystem/network policy, without carrying a cached legacy policy.
- Adjusts sandboxing manager tests to assert the effective profile
rather than the removed legacy field.

## Verification

- `cargo check -p codex-config -p codex-core -p codex-sandboxing -p
codex-app-server -p codex-cli -p codex-tui`
- `cargo test -p codex-sandboxing manager`
- `cargo test -p codex-core
exec_server_params_use_env_policy_overlay_contract`
- `cargo test -p codex-core unix_escalation`
- `cargo test -p codex-core exec::tests`
- `cargo test -p codex-core sandboxing::tests`
2026-04-26 22:11:49 -07:00
Michael Bolin
523e4aa8e3 permissions: constrain requirements as profiles (#19736) 2026-04-26 21:49:30 -07:00
Michael Bolin
0ccd659b4b permissions: store only constrained permission profiles (#19735) 2026-04-26 20:59:58 -07:00
Won Park
8033b6a449 Add /auto-review-denials retry approval flow (#19058)
## Why

Auto-review can deny an action that the user later decides they want to
retry. Today there is no TUI surface for selecting a recent denial and
sending explicit approval context back into the session, so users have
to restate intent manually and the retry can be reviewed without the
original denied action context.

This adds a narrow TUI-driven path for approving a recent denied action
while still keeping the retry inside the normal auto-review flow.

## What Changed

- Added `/auto-review-denials` to open a picker of recent denied
auto-review actions.
- Added a small in-memory TUI store for the 10 most recent denied
auto-review events.
- Selecting a denial sends the structured denied event back through the
existing core/app-server op path.
- Core now injects a developer message containing the approved action
JSON rather than the full assessment event.
- Auto-review transcript collection now preserves this specific approval
developer message so follow-up review sessions can see the user approval
context.
- Added TUI snapshot/unit coverage for the picker and approval dispatch
path.
- Added core coverage for retaining the approval developer message in
the auto-review transcript.

## Verification

- `cargo test -p codex-core
collect_guardian_transcript_entries_keeps_manual_approval_developer_message`
- `cargo test -p codex-tui auto_review_denials`
- `cargo test -p codex-tui
approving_recent_denial_emits_structured_core_op_once`

## Notes

This intentionally keeps retries going through auto-review. The approval
signal is context for the exact previously denied action, not a blanket
bypass for similar future actions.
2026-04-27 03:43:53 +00:00
Michael Bolin
0d8cdc0510 permissions: centralize legacy sandbox projection (#19734)
## Why

The remaining migration work still needs `SandboxPolicy` at a few
compatibility boundaries, but those projections should come from one
canonical path. Keeping ad hoc legacy projections scattered through
app-server, CLI, and config code makes it easy for behavior to drift as
`PermissionProfile` gains fidelity that the legacy enum cannot
represent.

## What Changed

- Adds `Permissions::legacy_sandbox_policy(cwd)` and
`Config::legacy_sandbox_policy()` as the compatibility projection from
the canonical `PermissionProfile`.
- Adds `Permissions::can_set_legacy_sandbox_policy()` so legacy inputs
are checked after they are converted into profile semantics.
- Updates app-server command handling, Windows sandbox setup, session
configuration, and sandbox summaries to use the centralized projection
helper.
- Leaves `SandboxPolicy` in place only for boundary inputs/outputs that
still speak the legacy abstraction.

## Verification

- `cargo check -p codex-config -p codex-core -p codex-sandboxing -p
codex-app-server -p codex-cli -p codex-tui`
- `cargo test -p codex-tui
permissions_selection_history_snapshot_full_access_to_default --
--nocapture`
- `cargo test -p codex-tui
permissions_selection_sends_approvals_reviewer_in_override_turn_context
-- --nocapture`
- `bazel test //codex-rs/tui:tui-unit-tests-bin
--test_arg=permissions_selection_history_snapshot_full_access_to_default
--test_output=errors`
- `bazel test //codex-rs/tui:tui-unit-tests-bin
--test_arg=permissions_selection_sends_approvals_reviewer_in_override_turn_context
--test_output=errors`


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19734).
* #19737
* #19736
* #19735
* __->__ #19734
2026-04-26 20:31:23 -07:00
Abhinav
c3e60849e5 inline hostname resolution for remote sandbox config (#19739)
# Why

Requirements support host-specific
`remote_sandbox_config.hostname_patterns`, but config loading previously
resolved and passed the system hostname through every config-loading
path even when no requirements layer used `remote_sandbox_config`. On
machines where hostname lookup is slow, startup and app-server config
reads paid for a feature that was not active.

We only need the hostname when a requirements layer actually declares
`remote_sandbox_config`, so this moves hostname resolution to the single
requirements merge point and keeps all other config callers unaware of
hostname matching.

# What

- Removed the eager `host_name` plumbing from
`load_config_layers_state`, `load_requirements_toml`, `ConfigBuilder`,
app-server `ConfigManager`, network proxy loading, and related call
sites.
- Resolve the hostname inside
`merge_requirements_with_remote_sandbox_config` only when the incoming
requirements contain `remote_sandbox_config`.
2026-04-27 03:18:57 +00:00
Michael Bolin
ad57a3fee2 permissions: finish profile-backed app surfaces (#19395) 2026-04-26 19:42:39 -07:00
Andrey Mishchenko
1f304dd1f2 Allow agents.max_threads to work with multi_agent_v2 (#19733) 2026-04-26 17:56:05 -07:00
Michael Bolin
2cb8746457 permissions: remove core legacy policy round trips (#19394)
## Why

Several execution paths still converted profile-backed permissions into
`SandboxPolicy` and then rebuilt runtime permissions from that legacy
shape. Those round trips are unnecessary after the preceding PRs and can
lose split filesystem semantics. Core approval and escalation should
carry the resolved profile directly.

## What Changed

- Removes `sandbox_policy` from `ResolvedPermissionProfile`; the
resolved permission object now carries the canonical `PermissionProfile`
directly.
- Updates exec-policy fallback, shell/unified-exec interception,
escalation reruns, and related tests to pass profiles instead of legacy
policies.
- Removes legacy additional-permission merge helpers that built an
effective `SandboxPolicy` before rebuilding runtime permissions.
- Keeps legacy projections only at compatibility boundaries that still
require `SandboxPolicy`, not in core permission computation.

## Verification

- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`







































































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19394).
* #19737
* #19736
* #19735
* #19734
* #19395
* __->__ #19394
2026-04-26 17:43:32 -07:00
Andrey Mishchenko
35bc6e3d01 Delete unused ResponseItem::Message.end_turn (#19605)
This field is unused. Delete it.
2026-04-26 17:18:09 -07:00
Ahmed Ibrahim
0bda8161a2 Split MCP connection modules (#19725)
## Why

The MCP connection manager module had grown to mix orchestration, RMCP
client startup, elicitation handling, Codex Apps cache and naming
behavior, tool qualification and filtering, and runtime data. The
previous stacked PRs split these responsibilities incrementally; this PR
collapses that work into one self-contained refactor on latest main.

## What changed

- Move McpConnectionManager into connection_manager.rs.
- Move RMCP client lifecycle, startup, and uncached tool listing into
rmcp_client.rs.
- Move elicitation request tracking and policy handling into
elicitation.rs.
- Move Codex Apps cache, key, filtering, and naming helpers into
codex_apps.rs.
- Rename the tool-name helper module to tools.rs and move ToolInfo, tool
filtering, schema masking, and qualification there.
- Move runtime and sandbox shared types into runtime.rs.
- Preserve latest main PermissionProfile-based MCP elicitation
auto-approval behavior.

## Verification

- just fmt
- cargo check -p codex-mcp
- cargo check -p codex-mcp --tests
- cargo check -p codex-core

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-26 23:23:34 +00:00
Michael Bolin
4c58e64f08 test: increase core-all-test shard count to 16 (#19727)
## Summary

Increase `core-all-test`'s Bazel shard count from `8` to `16`.

## Why

[#19609](https://github.com/openai/codex/pull/19609) restored
`bazel.yml` to a 30-minute timeout and increased `app-server-all-test`'s
shard count because the bigger timeout risk was not just a cold Windows
build. The more common problem was a long `rust_test()` shard failing
and getting retried multiple times.

Recent `main` runs show that `//codex-rs/core:core-all-test` still has
the same shape of problem on Windows:

- [Run
24943931330](https://github.com/openai/codex/actions/runs/24943931330)
reported `//codex-rs/core:core-all-test` as flaky after first-attempt
failures in shard `5/8` and shard `8/8`.
- Those retries were driven by
`suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override`
and
`suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request`.
- The failed shard attempts in that run took `272.61s` and `259.27s`
before retrying, which is exactly the sort of wall-clock cost that burns
through the 30-minute budget.
- [Run
24966332583](https://github.com/openai/codex/actions/runs/24966332583)
also retried `//codex-rs/tui:tui-unit-tests` after
`app::tests::update_memory_settings_updates_current_thread_memory_mode`
failed once on Windows.
- [Run
24965527138](https://github.com/openai/codex/actions/runs/24965527138)
and its linked [BuildBuddy
invocation](https://app.buildbuddy.io/invocation/ac1a8265-06fa-4da5-9552-4715b7965bce)
show the other half of the problem: when Windows cache reuse is weak,
the `bazel test //...` step can already consume `24m11s` on its own,
leaving very little headroom for flaky retries.

Increasing `core-all-test` to `16` shards does not fix the flaky tests,
but it does reduce the wall-clock cost when a single shard has to be
retried. That matches the mitigation we already applied to
`app-server-all-test` in `#19609`.

## What Changed

- Update `codex-rs/core/BUILD.bazel` so `core-all-test` uses `16` shards
instead of `8`.
- Leave `core-unit-tests` unchanged.

## Follow-up Work

This change is meant to buy back CI headroom while we fix the flaky
tests themselves in subsequent commits. The recent Windows retries that
look worth addressing directly include:

-
`suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override`
-
`suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request`
-
`app::tests::update_memory_settings_updates_current_thread_memory_mode`

## Verification

- Compared `core-all-test`'s current sharding against the
`app-server-all-test` precedent in
[#19609](https://github.com/openai/codex/pull/19609).
- Inspected recent `main` Bazel workflow logs and the linked BuildBuddy
invocation to confirm that Windows retries on long shards are still
consuming a meaningful fraction of the 30-minute timeout budget.
- Did not run local tests for this change because it only adjusts Bazel
sharding metadata.
2026-04-26 23:10:26 +00:00
pakrym-oai
ba159cbc79 Fix codex-core config test type paths (#19726)
Summary:
- Update config tests to reference config requirement types from
codex_config after the loader split.

Tests:
- just fmt
- cargo build -p codex-core --tests
- cargo clippy -p codex-core --tests -- -D warnings
2026-04-26 15:58:17 -07:00
Michael Bolin
dda8199b73 permissions: migrate approval and sandbox consumers to profiles (#19393)
## Why

Runtime decisions should not infer permissions from the lossy legacy
sandbox projection once `PermissionProfile` is available. In particular,
`Disabled` and `External` need to remain distinct, and managed profiles
with split filesystem or deny-read rules should not be collapsed before
approval, network, safety, or analytics code makes decisions.

## What Changed

- Changes managed network proxy setup and network approval logic to use
`PermissionProfile` when deciding whether a managed sandbox is active.
- Migrates patch safety, Guardian/user-shell approval paths, Landlock
helper setup, analytics sandbox classification, and selected
turn/session code to profile-backed permissions.
- Validates command-level profile overrides against the constrained
`PermissionProfile` rather than a strict `SandboxPolicy` round trip.
- Preserves configured deny-read restrictions when command profiles are
narrowed.
- Adds coverage for profile-backed trust, network proxy/approval
behavior, patch safety, analytics classification, and command-profile
narrowing.

## Verification

- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`




































































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19393).
* #19395
* #19394
* __->__ #19393
2026-04-26 15:30:40 -07:00
pakrym-oai
9c3abcd46c [codex] Move config loading into codex-config (#19487)
## Why

Config loading had become split across crates: `codex-config` owned the
config types and merge logic, while `codex-core` still owned the loader
that assembled the layer stack. This change consolidates that
responsibility in `codex-config`, so the crate that defines config
behavior also owns how configs are discovered and loaded.

To make that move possible without reintroducing the old dependency
cycle, the shell-environment policy types and helpers that
`codex-exec-server` needs now live in `codex-protocol` instead of
flowing through `codex-config`.

This also makes the migrated loader tests more deterministic on machines
that already have managed or system Codex config installed by letting
tests override the system config and requirements paths instead of
reading the host's `/etc/codex`.

## What Changed

- moved the config loader implementation from `codex-core` into
`codex-config::loader` and deleted the old `core::config_loader` module
instead of leaving a compatibility shim
- moved shell-environment policy types and helpers into
`codex-protocol`, then updated `codex-exec-server` and other downstream
crates to import them from their new home
- updated downstream callers to use loader/config APIs from
`codex-config`
- added test-only loader overrides for system config and requirements
paths so loader-focused tests do not depend on host-managed config state
- cleaned up now-unused dependency entries and platform-specific cfgs
that were surfaced by post-push CI

## Testing

- `cargo test -p codex-config`
- `cargo test -p codex-core config_loader_tests::`
- `cargo test -p codex-protocol -p codex-exec-server -p
codex-cloud-requirements -p codex-rmcp-client --lib`
- `cargo test --lib -p codex-app-server-client -p codex-exec`
- `cargo test --no-run --lib -p codex-app-server`
- `cargo test -p codex-linux-sandbox --lib`
- `cargo shear`
- `just bazel-lock-check`

## Notes

- I did not chase unrelated full-suite failures outside the migrated
loader surface.
- `cargo test -p codex-core --lib` still hits unrelated proxy-sensitive
failures on this machine, and Windows CI still shows unrelated
long-running/timeouting test noise outside the loader migration itself.
2026-04-26 15:10:53 -07:00
pakrym-oai
2a020f1a0a Lift app-server JSON-RPC error handling to request boundary (#19484)
## Why

App-server request handling had a lot of repeated JSON-RPC error
construction and one-off `send_error`/`return` branches. This made small
handlers noisy and pushed error response details into leaf code that
otherwise only needed to validate input or call the underlying API.

## What Changed

- Added shared JSON-RPC error constructors in
`codex-rs/app-server/src/error_code.rs`.
- Lifted straightforward request result emission into
`codex-rs/app-server/src/message_processor.rs` so response/error
dispatch happens at the request boundary.
- Reused the result helpers across command exec, config, filesystem,
device-key, external-agent config, fs-watch, and outgoing-message paths.
- Removed leaf wrapper handlers where the method body was only
forwarding to a response helper.
- Returned request validation errors upward in the simple cases instead
of sending an error locally and immediately returning.

## Verification

- `cargo test -p codex-app-server --lib command_exec::tests`
- `cargo test -p codex-app-server --lib outgoing_message::tests`
- `cargo test -p codex-app-server --lib in_process::tests`
- `cargo test -p codex-app-server --test all v2::fs`
- `cargo test -p codex-app-server --test all v2::config_rpc`
- `cargo test -p codex-app-server --test all v2::external_agent_config`
- `cargo test -p codex-app-server --test all v2::initialize`
- `just fix -p codex-app-server`
- `git diff --check`

Note: full `cargo test -p codex-app-server` was attempted and stopped in
`message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
with a stack overflow after unrelated tests had already passed.
2026-04-26 15:10:35 -07:00
Michael Bolin
deaa307fb2 permissions: derive compatibility policies from profiles (#19392)
## Why

After #19391, `PermissionProfile` and the split filesystem/network
policies could still be stored in parallel. That creates drift risk: a
profile can preserve deny globs, external enforcement, or split
filesystem entries while a cached projection silently loses those
details. This PR makes the profile the runtime source and derives
compatibility views from it.

## What Changed

- Removes stored filesystem/network sandbox projections from
`Permissions` and `SessionConfiguration`; their accessors now derive
from the canonical `PermissionProfile`.
- Derives legacy `SandboxPolicy` snapshots from profiles only where an
older API still needs that field.
- Updates MCP connection and elicitation state to track
`PermissionProfile` instead of `SandboxPolicy` for auto-approval
decisions.
- Adds semantic filesystem-policy comparison so cwd changes can preserve
richer profiles while still recognizing equivalent legacy projections
independent of entry ordering.
- Updates config/session tests to assert profile-derived projections
instead of parallel stored fields.

## Verification

- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`



































































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19392).
* #19395
* #19394
* #19393
* __->__ #19392
2026-04-26 15:06:42 -07:00
Michael Bolin
4d7ce3447d permissions: make runtime config profile-backed (#19606)
## Why

This supersedes #19391. During stack repair, GitHub marked #19391 as
merged into a temporary stack branch rather than into `main`, so the
runtime-config change needed a fresh PR.

`PermissionProfile` is now the canonical permissions shape after #19231
because it can distinguish `Managed`, `Disabled`, and `External`
enforcement while also carrying filesystem rules that legacy
`SandboxPolicy` cannot represent cleanly. Core config and session state
still needed to accept profile-backed permissions without forcing every
profile through the strict legacy bridge, which rejected valid runtime
profiles such as direct write roots.

The unrelated CI/test hardening that previously rode along with this PR
has been split into #19683 so this PR stays focused on the permissions
model migration.

## What Changed

- Adds `Permissions.permission_profile` and
`SessionConfiguration.permission_profile` as constrained runtime state,
while keeping `sandbox_policy` as a legacy compatibility projection.
- Introduces profile setters that keep `PermissionProfile`, split
filesystem/network policies, and legacy `SandboxPolicy` projections
synchronized.
- Uses a compatibility projection for requirement checks and legacy
consumers instead of rejecting profiles that cannot round-trip through
`SandboxPolicy` exactly.
- Updates config loading, config overrides, session updates, turn
context plumbing, prompt permission text, sandbox tags, and exec request
construction to carry profile-backed runtime permissions.
- Preserves configured deny-read entries and `glob_scan_max_depth` when
command/session profiles are narrowed.
- Adds `PermissionProfile::read_only()` and
`PermissionProfile::workspace_write()` presets that match legacy
defaults.

## Verification

- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19606).
* #19395
* #19394
* #19393
* #19392
* __->__ #19606
2026-04-26 13:29:54 -07:00
efrazer-oai
fed0a8f4fa feat: load AgentIdentity from JWT login/env (#18904)
## Summary

This PR lets programmatic AgentIdentity users provide one token through
either stdin login or environment auth.

`codex login --with-agent-identity` reads an Agent Identity JWT from
stdin, validates that it has the required claims, and stores that token
as the `agent_identity` value in `auth.json`. The file format is
token-only; the decoded account and key fields are runtime state, not
hand-authored auth.json fields.

The Agent Identity JWT claim shape and decoder live in
`codex-agent-identity`; `codex-login` only owns env/storage precedence
and conversion into `CodexAuth::AgentIdentity`.

When env auth is enabled, `CODEX_AGENT_IDENTITY` can provide the same
JWT without writing auth state to disk. `CODEX_API_KEY` still wins if
both env vars are set.

Reference old stack: https://github.com/openai/codex/pull/17387/changes
Reference JWT/env stack: https://github.com/openai/codex/pull/18176

## Stack

1. https://github.com/openai/codex/pull/18757: full revert
2. https://github.com/openai/codex/pull/18871: isolated Agent Identity
crate
3. https://github.com/openai/codex/pull/18785: explicit AgentIdentity
auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate Codex backend
auth callsites through AuthProvider
5. This PR: accept AgentIdentity JWTs through login/env

## Testing

Tests: targeted login and Agent Identity crate tests, CLI checks, scoped
formatter/linter cleanup, and CI.

---------

Co-authored-by: Shijie Rao <shijie.rao@openai.com>
2026-04-26 19:49:54 +00:00
Michael Bolin
ac2bffa443 test: harden app-server integration tests (#19683)
## Why

Windows Bazel runs in the permissions stack exposed that app-server
integration tests were launching normal plugin startup warmups in every
subprocess. Those warmups can call
`https://chatgpt.com/backend-api/plugins/featured` when a test is not
specifically exercising plugin startup, which adds slow background work,
noisy stderr, and dependence on external network state. The relevant
startup/featured-plugin behavior was introduced across #15042 and
#15264.

A few app-server tests also had long optional waits or unbounded cleanup
paths, making failures expensive to diagnose and contributing to slow
Windows shards. One external-agent config test from #18246 used a
GitHub-style marketplace source, which was enough to exercise the
pending remote-import path but also meant the background completion task
could attempt a real clone.

## What Changed

- Adds explicit `AppServerRuntimeOptions` / `PluginStartupTasks`
plumbing and a hidden debug-only
`--disable-plugin-startup-tasks-for-tests` app-server flag, so
integration tests can suppress startup plugin warmups without adding a
production env-var gate.
- Has the app-server test harness pass that hidden flag by default,
while opting plugin-startup coverage back in for tests that
intentionally exercise startup sync and featured-plugin warmup behavior.
- Lowers normal app-server subprocess logging from `info`/`debug` to
`warn` to avoid multi-megabyte stderr output in Bazel logs.
- Prevents the external-agent config test from attempting a real
marketplace clone by using an invalid non-local source while still
exercising the pending-import completion path.
- Bounds optional filesystem/realtime waits and fake WebSocket
test-server shutdown so failures produce targeted timeouts instead of
hanging a shard.
- Fixes the Unix script-resolution test in `rmcp-client` to exercise
PATH resolution directly and include the actual spawn error in failures.

## Verification

- `cargo check -p codex-app-server`
- `cargo clippy -p codex-app-server --tests -- -D warnings`
- `cargo test -p codex-rmcp-client
program_resolver::tests::test_unix_executes_script_without_extension`
- `cargo test -p codex-app-server --test all
external_agent_config_import_sends_completion_notification_after_pending_plugins_finish
-- --nocapture`
- `cargo test -p codex-app-server --test all
plugin_list_uses_warmed_featured_plugin_ids_cache_on_first_request --
--nocapture`
- Windows Local Bazel passed with this test-hardening bundle before it
was extracted from #19606.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19683).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19683
2026-04-26 12:43:16 -07:00
Thibault Sottiaux
87bc72408c [codex] remove responses command (#19640)
This removes the hidden `codex responses` CLI subcommand after
confirming no downstream callers rely on it, deleting the raw Responses
passthrough implementation, unregistering the subcommand, and dropping
the now-unused CLI dependencies on `codex-api` and
`codex-model-provider`.
2026-04-25 23:10:38 -07:00
Andrey Mishchenko
355c40ad7e Support end_turn in response.completed (#19610)
Some providers of Responses API forward a model-defined `end_turn`
boolean indicating explicitly the model's indication of whether it would
like to end the turn or to be inferenced again. In this PR, we update
the sampling loop to use this field correctly if it's set. If the field
is not set by the provider, we fall back to the existing sampling logic.
2026-04-25 21:57:42 -07:00
Felipe Coury
5591912f0b fix(tui): reflow scrollback on terminal resize (#18575)
Fixes multiple scrollback and terminal resize issues: #5538, #5576,
#8352, #12223, #16165, and #15380.

## Why

Codex writes finalized transcript output into terminal scrollback after
wrapping it for the current viewport width. A later terminal resize
could leave that scrollback shaped for the old width, so wider windows
kept narrow output and narrower windows could show stale wrapping
artifacts until enough new output replaced the visible area.

This is also the foundation PR for responsive markdown tables. Table
rendering needs finalized transcript content to be width-sensitive after
insertion, not only while content is first streaming. Markdown table
rendering itself stays in #18576.

## Stack

- PR1: resize backlog reflow and interrupt cleanup
- #18576: markdown table support

## What Changed

- Rebuild source-backed transcript history when the terminal width
changes. `terminal_resize_reflow` is introduced through the experimental
feature system, but is enabled by default for this rollout so we can
validate behavior across real terminals.
- Preserve assistant and plan stream source so finalized streaming
output can participate in resize reflow after consolidation.
- Debounce resize work, but force a final source-backed reflow when a
resize happened during active or unconsolidated streaming output.
- Clear stale pending history lines on resize so old-width wrapped
output is not emitted just before rebuilt scrollback.
- Bound replay work with `[tui.terminal_resize_reflow].max_rows`:
omitted uses terminal-specific defaults, `0` keeps all rendered rows,
and a positive value sets an explicit cap. The cap applies both while
initially replaying a resumed transcript into scrollback and when
rebuilding scrollback after terminal resize.
- Consolidate interrupted assistant streams before cleanup, then clear
pending stream output and active-tail state consistently.
- Move resize reflow and thread event buffering helpers out of `app.rs`
into dedicated TUI modules.
- Add focused coverage for resize reflow, feature-gated behavior,
streaming source preservation, interrupted output cleanup,
unicode-neutral text, terminal-specific row caps, and composer/layout
stability.

## Runtime Bounds

Resize reflow keeps only the most recent rendered rows when a row cap is
active. The default is `auto`, which maps to the detected terminal's
default scrollback size where Codex can identify it: VS Code `1000`,
Windows Terminal `9001`, WezTerm `3500`, and Alacritty `10000`.
Terminals without a dedicated mapping use the conservative fallback of
`1000` rows. Users can override this with `[tui.terminal_resize_reflow]
max_rows = N`, or set `max_rows = 0` to disable row limiting.

## Validation

- `just fmt`
- `git diff --check`
- `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui reflow`
- `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui
transcript_reflow`
- `just fix -p codex-tui`
- PR CI in progress on the squashed branch
2026-04-25 22:00:32 -03:00
Shijie Rao
4e30281a13 Guard npm update readiness (#19389)
## Why
For npm/Bun-managed installs, the update prompt was treating the latest
GitHub release as ready to install. During the `0.124.0` release, GitHub
and npm visibility were not atomic: the root npm wrapper could become
visible before the npm registry marked that version as the package
`latest`. That left a window where users could be prompted to upgrade
before npm was ready for the release.

## What changed
- Keep GitHub Releases as the candidate latest-version source for
npm/Bun installs, but only write the existing `version.json` cache after
npm registry metadata proves that same root version is ready.
- Add `codex-rs/tui/src/npm_registry.rs` to validate npm readiness by
checking `dist-tags.latest` and root package `dist` metadata for the
GitHub candidate version.
- Move version parsing helpers into
`codex-rs/tui/src/update_versions.rs` so that logic can be tested
without compiling the release-only `updates.rs` module under tests.
- Update `.github/workflows/rust-release.yml` so the six known platform
tarballs publish before the root `@openai/codex` wrapper. Other npm
tarballs publish before the root wrapper, and the SDK publishes after
the root package it depends on.
2026-04-25 17:09:29 -07:00
Michael Bolin
9881dc7306 fix: restore 30-minute timeout for Bazel builds (#19609)
I think raising it to 45 minutes in
https://github.com/openai/codex/pull/19578 was a mistake for the reasons
explained in the comments in the code. Instead, we attempt to defend
against timeouts by increasing the number of shards in
`app-server-all-test` so that a "true failure" that gets run 3x should
not take as much wall clock time.
2026-04-25 16:34:06 -07:00
Michael Bolin
d54493ba1c test: stabilize app-server path assertions on Windows (#19604)
## Why

Windows can represent the same canonical local path with either a normal
drive path or a verbatim device path prefix. The failure pattern that
motivated this PR was an assertion diff like `C:\...` versus
`\\?\C:\...`: different spellings, same file.

That became visible while validating the permissions stack above this
PR. The stack increasingly routes paths through `AbsolutePathBuf`, which
normalizes supported Windows device prefixes, while several existing
tests still built expected values directly with
`std::fs::canonicalize()` or compared `AbsolutePathBuf::as_path()` to a
raw `PathBuf`. On Windows, that can make tests fail because the two
sides choose different textual forms for an otherwise equivalent
canonical path.

This PR is intentionally split out as the bottom PR below #19606. The
runtime permissions migration should not carry unrelated Windows test
stabilization, and reviewers should be able to verify this as a
test-only change before looking at the larger permissions changes.

## Failure Modes Covered

- `conversation_summary` expected rollout paths were built from raw
canonicalized `PathBuf`s, while app-server responses could carry
`AbsolutePathBuf`-normalized paths.
- `thread_resume` compared returned thread paths directly to previously
stored or fixture paths, so a verbatim-prefix spelling could fail an
otherwise correct resume.
- `marketplace_add` compared plugin install roots through `as_path()`
against raw canonicalized paths, reproducing the same `C:\...` versus
`\\?\C:\...` mismatch in both app-server and core-plugin coverage.

## What Changed

- In `app-server/tests/suite/conversation_summary.rs`, normalize both
expected rollout paths and received `ConversationSummary.path` values
through `AbsolutePathBuf` before comparing the full summary object.
- In `app-server/tests/suite/v2/thread_resume.rs`, normalize both sides
of thread path comparisons before asserting equality. This keeps the
tests focused on whether resume returned the same existing path, not
whether Windows used the same string spelling.
- In `app-server/tests/suite/v2/marketplace_add.rs` and
`core-plugins/src/marketplace_add.rs`, compare install roots as
`AbsolutePathBuf` values instead of comparing an absolute-path wrapper
to a raw canonicalized `PathBuf`.

## Behavior

This PR does not change production app-server or marketplace behavior.
It only changes tests to assert semantic path identity across Windows
path spelling variants. It also leaves API response values untouched;
the normalization happens inside assertions only.

## Verification

Targeted local checks run while extracting this fix:

- `cargo test -p codex-app-server
get_conversation_summary_by_thread_id_reads_rollout`
- `cargo test -p codex-app-server
get_conversation_summary_by_relative_rollout_path_resolves_from_codex_home`
- `cargo test -p codex-app-server
thread_resume_prefers_path_over_thread_id`

Windows-specific confidence comes from the Bazel Windows CI job for this
PR, since the failure is platform-specific.

## Docs

No docs update is needed because this is test-only infrastructure
stabilization.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19604).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19604
2026-04-25 16:25:28 -07:00
viyatb-oai
9aaa5d9358 [codex] Bypass managed network for escalated exec (#19595)
## Why

`sandbox_permissions = "require_escalated"` is treated as an explicit
request to approve the command and run it outside the
filesystem/platform sandbox. Before this change, shell and unified exec
still registered managed network approval context and could inject
Codex-managed proxy state into the child process, which meant an
approved escalated command could still hit a second network approval
path.

This PR makes that escalation boundary consistent: once a command is
explicitly approved to run outside the sandbox, Codex does not also
route that process through the managed network proxy.

## Security impact

Command/filesystem sandbox approval now implies network approval for
that command. If an untrusted command or script is allowed to run with
`require_escalated`, its network calls are unsandboxed: Codex-managed
network allowlists and denylists are not respected for that process, so
the command can exfiltrate any data it can read.

## What changed

- Skip managed network approval specs for
`SandboxPermissions::RequireEscalated`.
- Pass `network: None` into shell, zsh-fork shell, and unified exec
sandbox preparation for explicitly escalated requests.
- Strip Codex-managed proxy environment variables when
`CODEX_NETWORK_PROXY_ACTIVE` is present, while preserving user proxy env
when the Codex marker is absent.
- Add regression coverage for the prepared exec request so the old
behavior cannot silently reappear.

## Verification

- `cargo test -p codex-core explicit_escalation`
- `cargo clippy -p codex-core --all-targets -- -D warnings`
2026-04-25 23:23:58 +00:00
Eric Traut
0c785598b3 Keep slash command popup columns stable while scrolling (#19511)
## Why

Fixes #19499.

The slash-command popup recalculated the command-name column from only
the rows visible in the current viewport. That made the description
column shift horizontally while scrolling through `/` commands whenever
longer command names entered or left the visible window.

## What Changed

`codex-rs/tui/src/bottom_pane/command_popup.rs` now uses the shared
selection-popup `AutoAllRows` column-width mode for both height
measurement and rendering. This keeps the command description column
based on the full filtered slash-command list instead of the current
viewport.

## Verification

- `cargo test -p codex-tui bottom_pane::command_popup`
2026-04-25 14:25:58 -07:00
Michael Bolin
f41306b4f3 test: isolate remote thread store regression from plugin warmups (#19593)
Follow-up to #19266.

## Why


`thread_start_with_non_local_thread_store_does_not_create_local_persistence`
is meant to catch accidental local thread persistence when a non-local
thread store is configured. The Windows flake reported in [this
BuildBuddy
invocation](https://app.buildbuddy.io/invocation/0b75dde4-6828-4e7b-a35b-e45b73fb005d)
showed that the assertion was tripping on an unexpected top-level `.tmp`
entry:

```diff
 {
+    ".tmp",
     "config.toml",
     "installation_id",
     "memories",
     "skills",
 }
```

That `.tmp` does not appear to come from `tempfile::TempDir`; it comes
from unrelated plugin startup work that can legitimately materialize
`codex_home/.tmp`, including the startup remote plugin sync marker in
[`core/src/plugins/startup_sync.rs`](bce74c70ce/codex-rs/core/src/plugins/startup_sync.rs (L13-L15))
and the curated plugin snapshot under
[`.tmp/plugins`](bce74c70ce/codex-rs/core-plugins/src/startup_sync.rs (L25-L26)).

That makes the regression race unrelated background startup tasks
instead of validating the thread-store invariant it was added to cover.
Rather than weakening the assertion to allow arbitrary `.tmp` entries,
this change isolates the test from plugin warmups so it can stay strict
about unexpected local thread persistence artifacts.

## What changed

- disable plugins in the generated config used by
`app-server/tests/suite/v2/remote_thread_store.rs`
- keep the existing `codex_home` assertions unchanged so the test still
fails if local session or sqlite persistence is introduced

## Verification

- `cargo test -p codex-app-server
suite::v2::remote_thread_store::thread_start_with_non_local_thread_store_does_not_create_local_persistence
-- --exact`
2026-04-25 20:45:31 +00:00
Eric Traut
bce74c70ce Restore persisted model provider on thread resume (#19287)
Fixes #15219.

## Why

`thread/resume` should continue a persisted thread with the same model
provider that created the thread. The app server already restores the
persisted model and reasoning effort before resuming, but it was leaving
`model_provider` unset. If a user created a thread with one provider and
later switched their active profile to another provider, resumed
encrypted history could be sent to the wrong endpoint and fail with
`invalid_encrypted_content`.

The thread metadata already records the original provider, so resume
should apply it when the caller has not explicitly requested a different
model/provider/reasoning configuration.

## What changed

This updates `merge_persisted_resume_metadata` in
`app-server/src/codex_message_processor.rs` to copy
`ThreadMetadata::model_provider` into `ConfigOverrides::model_provider`
alongside the persisted model.

The existing resume metadata tests now also assert that:

- the persisted provider is restored for normal resume
- explicit model, provider, or reasoning-effort overrides still prevent
persisted resume metadata from being applied
- a thread with no persisted model or reasoning effort still resumes
with its persisted provider

## Verification

- `cargo test -p codex-app-server` passed the app-server unit tests,
including the updated resume metadata coverage. The broader integration
portion of that command failed in an unrelated environment-sensitive
skills-budget warning assertion, where this run saw 8 omitted skills
instead of the expected 7.
- `just fix -p codex-app-server` completed successfully.
2026-04-25 12:40:00 -07:00
Michael Bolin
88f300d74d fix: increase Bazel timeout to 45 minutes (#19578)
Unfortunately, if most of the build graph is invalidated such that there
are few cache hits, the Windows Bazel build for all the tests often
takes more than `30` minutes, so this PR increases the timeout to `45`
minutes until we set up distributed builds.
2026-04-25 10:03:01 -07:00
Ahmed Ibrahim
022f81df1f [codex] Order codex-mcp items by visibility (#19526)
## Why

The visibility cleanup in the base PR reduced what `codex-mcp` exposes,
but several files still made reviewers read private support machinery
before the public or crate-facing entry points. This ordering pass makes
each file easier to scan: exported API first, crate-visible MCP
internals next, then private helpers in breadth-first order from the
higher-level MCP flows to leaf utilities.

## What Changed

- Reordered `codex-mcp` exports so the runtime, configuration, snapshot,
auth, and helper surfaces are grouped by visibility and reader
importance.
- Moved public and crate-visible MCP items ahead of private helpers in
the auth, MCP planning/snapshot, connection manager, and tool-name
modules.
- Kept the change mechanical, with no behavior changes intended.

## Verification

- `cargo check -p codex-mcp`
2026-04-25 07:17:30 -07:00
Ahmed Ibrahim
706490ab1b [codex] Prune unused codex-mcp API and duplicate helpers (#19524)
## Why

`codex-mcp` currently exposes more API than the rest of the workspace
uses. Some of that surface is simply visibility that can be tightened,
and some of it is public helper code that remains compiler-valid because
it is exported even though no workspace caller uses it.

That distinction matters: Rust does not warn on exported API just
because the current workspace does not call it. This PR intentionally
treats those exported-but-workspace-unreferenced paths as stale
`codex-mcp` surface. The main example is MCP skill dependency
collection, where the active implementation now lives in
`codex-rs/core/src/mcp_skill_dependencies.rs`; keeping the older
`codex-mcp` copy makes it unclear which implementation owns skill MCP
installation.

## What Changed

- Pruned unused `codex-mcp` re-exports from `codex-mcp/src/lib.rs`.
- Removed non-runtime helper methods from `McpConnectionManager` so it
stays focused on live MCP clients.
- Made `ToolPluginProvenance` lookup methods crate-private.
- Removed workspace-unreferenced snapshot wrapper APIs and
qualified-tool grouping helpers.
- Deleted the duplicate `codex-mcp` skill dependency module and tests
now that skill MCP dependency handling is owned by `core`.

## Verification

- `cargo check -p codex-mcp`
2026-04-25 06:36:07 -07:00
Matthew Zeng
6e838a19fa Enable unavailable dummy tools by default (#19459)
## Summary
- Mark `unavailable_dummy_tools` as a stable feature and enable it by
default
- Update the feature registry test to match the new default state

## Testing
- `just fmt`
- `cargo test -p codex-features`
2026-04-25 08:46:57 +00:00
Eric Traut
a2db6f97fb Fix codex-rs README grammar (#19514)
## Why

Issue #19418 points out a small grammar issue in `codex-rs/README.md`
under "Code Organization." The current sentence says "we hope this to
be," which reads awkwardly.

Fixes #19418.

## What changed

Updated the `core/` crate description so the sentence reads "we hope
this becomes a library crate."

## Verification

Documentation-only change. Reviewed the Markdown diff.
2026-04-24 23:31:47 -07:00
Dylan Hurd
f5497f4d65 Split approval matrix test groups (#19454)
## Why

Recent `main` CI repeatedly timed out in:

- `codex-core::all suite::approvals::approval_matrix_covers_all_modes`

It failed in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
[24905823212](https://github.com/openai/codex/actions/runs/24905823212),
[24903439629](https://github.com/openai/codex/actions/runs/24903439629),
[24903336028](https://github.com/openai/codex/actions/runs/24903336028),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).

The failure pattern was a 60s Linux remote timeout. Logs showed many
approval scenarios completing before the single matrix test timed out.

## Root Cause

`approval_matrix_covers_all_modes` packed every approval/sandbox/tool
scenario into one test case. That made the test vulnerable to normal CI
variance: one slow scenario or a slow process startup could push the
whole monolithic case past the 60s per-test timeout. It also hid which
part of the matrix was slow because the runner only reported the one
large matrix test.

## What Changed

- Keep the shared `scenarios()` table as the single source of approval
matrix coverage.
- Use one `#[test_case]` per `ScenarioGroup` to generate five async
Tokio tests: danger/full-access, read-only, workspace-write,
apply-patch, and unified-exec.
- Keep the group runner small and add per-scenario error context so a
failure still reports the specific scenario name.

## Why This Should Be Reliable

Each scenario group now has its own test harness timeout instead of
sharing one timeout window with the full matrix. That removes the long
sequential loop from a single test while keeping the implementation
compact and easy to scan.

The tests still run through the same scenario definitions and runner, so
this preserves coverage. `test-case` already composes with
`#[tokio::test]` in this crate and is already available for test code.

## Verification

- `cargo test -p codex-core --test all approval_matrix_ -- --list`
- `cargo test -p codex-core --test all approval_matrix_`
2026-04-24 21:38:27 -07:00
Eric Traut
f1c963d77e Add goal TUI UX (5 / 5) (#18077)
Adds the TUI user experience for goals on top of the core runtime from
PR 4.

## Why

Users need a direct TUI control surface for long-running goals. The UI
should make the current goal visible, support common goal actions
without waiting for a model turn, and avoid confusing end-of-turn
notifications while an active goal is immediately continuing.

## What changed

- Added `/goal` summary rendering for the current goal, including
active, paused, budget-limited, and complete states.
- Added `/goal <objective>` creation/replacement through the app-server
goal API rather than a model prompt.
- Added `/goal clear`, `/goal pause`, and `/goal unpause` command
variants.
- Added a confirmation menu when the user enters a new goal while
another goal already exists.
- Updated `/goal` help and summary tip text so it reflects the supported
command variants without advertising slash-command token budgets.
- Added footer/statusline goal indicators, including elapsed time and
token budget display when a budget exists from API/tool-created goals.
- Consumes goal updated/cleared notifications so the TUI stays in sync
with external app-server changes.
- Suppresses end-of-turn desktop notifications only when a goal is still
active and follow-up work is expected.
- Preserves slash-command history behavior and avoids leaking queued
`/goal` state into unrelated submissions.

## Verification

- Added TUI unit and snapshot coverage for goal command availability,
summary rendering, control commands, replacement menu behavior,
status/footer display, notification handling, and command history.
2026-04-24 21:16:45 -07:00
Eric Traut
4167628622 Add goal core runtime (4 / 5) (#18076)
Adds the core runtime behavior for active goals on top of the model
tools from PR 3.

## Why

A long-running goal should be a core runtime concern, not something
every client has to implement. Core owns the turn lifecycle, tool
completion boundaries, interruptions, resume behavior, and token usage,
so it is the right place to account progress, enforce budgets, and
decide when to continue work.

## What changed

- Centralized goal lifecycle side effects behind
`Session::goal_runtime_apply(GoalRuntimeEvent::...)`.
- Starts goal continuation turns only when the session is idle; pending
user input and mailbox work take priority.
- Accounts token and wall-clock usage at turn, tool, mutation,
interrupt, and resume boundaries; `get_thread_goal` remains read-only.
- Preserves sub-second wall-clock remainder across accounting boundaries
so long-running goals do not drift downward over time.
- Treats token budget exhaustion as a soft stop by marking the goal
`budget_limited` and injecting wrap-up steering instead of aborting the
active turn.
- Suppresses budget steering when `update_goal` marks a goal complete.
- Pauses active goals on interrupt and auto-reactivates paused goals
when a thread resumes outside plan mode.
- Suppresses repeated automatic continuation when a continuation turn
makes no tool calls.
- Added continuation and budget-limit prompt templates.

## Verification

- Added focused core coverage for continuation scheduling, accounting
boundaries, budget-limit steering, completion accounting, interrupt
pause behavior, resume auto-activation, and wall-clock remainder
accounting.
2026-04-24 21:16:00 -07:00
Eric Traut
32ace07ac5 Add goal model tools (3 / 5) (#18075)
Adds the model-facing goal tools on top of the app-server API from PR 2.

## Why

Once goals are persisted and exposed to clients, the model needs a
small, constrained tool surface for goal workflows. The tool contract
should let the model inspect goals, create them only when explicitly
requested, and mark them complete without giving it broad control over
user/runtime-owned state.

## What changed

- Added `get_goal`, `create_goal`, and `update_goal` tool specs behind
the `goals` feature flag.
- Added core goal tool handlers that validate objectives and token
budgets before mutating persisted state.
- Constrained `create_goal` to create only when no goal exists, with
optional `token_budget` only when a budget is explicitly provided.
- Tightened the `create_goal` instructions so the model does not infer
goals from ordinary task requests.
- Constrained `update_goal` to expose only goal completion; pause,
resume, clear, and budget-limited transitions remain user- or
runtime-controlled.
- Registered the goal tools in the tool registry and kept them out of
review contexts where they should not appear.

## Verification

- Added tool-registry coverage for feature gating and tool availability.
- Added core session tests for create/get/update behavior, duplicate
goal rejection, budget validation, and completion-only updates.
2026-04-24 20:54:40 -07:00
Eric Traut
6c874f9b34 Add goal app-server API (2 / 5) (#18074)
Adds the app-server v2 goal API on top of the persisted goal state from
PR 1.

## Why

Clients need a stable app-server surface for reading and controlling
materialized thread goals before the model tools and TUI can use them.
Goal changes also need to be observable by app-server clients, including
clients that resume an existing thread.

## What changed

- Added v2 `thread/goal/get`, `thread/goal/set`, and `thread/goal/clear`
RPCs for materialized threads.
- Added `thread/goal/updated` and `thread/goal/cleared` notifications so
clients can keep local goal state in sync.
- Added resume/snapshot wiring so reconnecting clients see the current
goal state for a thread.
- Added app-server handlers that reconcile persisted rollout state
before direct goal mutations.
- Updated the app-server README plus generated JSON and TypeScript
schema fixtures for the new API surface.

## Verification

- Added app-server v2 coverage for goal get/set/clear behavior,
notification emission, resume snapshots, and non-local thread-store
interactions.
2026-04-24 20:53:41 -07:00
Eric Traut
0ee737cea6 Add goal persistence foundation (1 / 5) (#18073)
Adds the persisted goal foundation for the rest of the stack. This PR is
intentionally limited to feature flag and state-layer behavior;
app-server APIs, model tools, runtime continuation, and TUI UX are
layered in later PRs.

## Why

Goal mode needs durable thread-level state before clients or model tools
can safely build on it. The state layer needs to know whether a goal
exists, what objective it tracks, whether it is active, paused,
budget-limited, or complete, and how much time/token usage has already
been accounted.

## What changed

- Added the `goals` feature flag and generated config schema entry.
- Added the `thread_goals` state table and Rust model for persisted
thread goals.
- Added state runtime APIs for creating, replacing, updating, deleting,
and accounting goal usage.
- Added `goal_id`-based stale update protection so an old goal update
cannot overwrite a replacement.
- Kept this PR scoped to persistence and state runtime behavior, with no
app-server, model-facing, continuation, or TUI behavior yet.

## Verification

- Added state runtime coverage for goal creation, replacement, stale
update protection, status transitions, token-budget behavior, and usage
accounting.
2026-04-24 20:51:38 -07:00
Curtis 'Fjord' Hawthorne
8a559e7938 Remove js_repl feature (#19410) 2026-04-24 17:49:29 -07:00
Curtis 'Fjord' Hawthorne
cf02e9c052 Fix Bazel cargo_bin runfiles paths (#19468)
## Summary

Fix a Bazel-only path resolution bug in
`codex_utils_cargo_bin::cargo_bin`.

Under Bazel runfiles, `rlocation` can return a relative `bazel-out/...`
path even though `cargo_bin()` documents that it returns an absolute
path. That can break callers that store the returned binary path and
later spawn it after changing cwd, because the relative path is resolved
from the wrong directory.

This patch absolutizes the runfiles-resolved path before returning it.
2026-04-24 17:47:31 -07:00
viyatb-oai
1c3287125f ci: pin codex-action v1.7 (#19472)
## Summary
- update Codex issue automation to pin `openai/codex-action` to
`5c3f4ccdb2b8790f73d6b21751ac00e602aa0c02`, the commit for `v1.7`
- keep the release intent visible with `# v1.7` comments beside the hash
pins

## Test plan
- `git diff --check`
- `yq e '.' .github/workflows/issue-labeler.yml`
- `yq e '.' .github/workflows/issue-deduplicator.yml`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-25 00:44:04 +00:00
Michael Bolin
789f387982 permissions: remove legacy read-only access modes (#19449)
## Why

`ReadOnlyAccess` was a transitional legacy shape on `SandboxPolicy`:
`FullAccess` meant the historical read-only/workspace-write modes could
read the full filesystem, while `Restricted` tried to carry partial
readable roots. The partial-read model now belongs in
`FileSystemSandboxPolicy` and `PermissionProfile`, so keeping it on
`SandboxPolicy` makes every legacy projection reintroduce lossy
read-root bookkeeping and creates unnecessary noise in the rest of the
permissions migration.

This PR makes the legacy policy model narrower and explicit:
`SandboxPolicy::ReadOnly` and `SandboxPolicy::WorkspaceWrite` represent
the old full-read sandbox modes only. Split readable roots, deny-read
globs, and platform-default/minimal read behavior stay in the runtime
permissions model.

## What changed

- Removes `ReadOnlyAccess` from
`codex_protocol::protocol::SandboxPolicy`, including the generated
`access` and `readOnlyAccess` API fields.
- Updates legacy policy/profile conversions so restricted filesystem
reads are represented only by `FileSystemSandboxPolicy` /
`PermissionProfile` entries.
- Keeps app-server v2 compatible with legacy `fullAccess` read-access
payloads by accepting and ignoring that no-op shape, while rejecting
legacy `restricted` read-access payloads instead of silently widening
them to full-read legacy policies.
- Carries Windows sandbox platform-default read behavior with an
explicit override flag instead of depending on
`ReadOnlyAccess::Restricted`.
- Refreshes generated app-server schema/types and updates tests/docs for
the simplified legacy policy shape.

## Verification

- `cargo check -p codex-app-server-protocol --tests`
- `cargo check -p codex-windows-sandbox --tests`
- `cargo test -p codex-app-server-protocol sandbox_policy_`


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19449).
* #19395
* #19394
* #19393
* #19392
* #19391
* __->__ #19449
2026-04-24 17:16:58 -07:00
Celia Chen
d19de6d150 fix: Bedrock GPT-5.4 reasoning levels (#19461)
## Why

When using the Amazon Bedrock provider with `openai.gpt-5.4-cmb`, the
model picker allowed `xhigh` because the CMB catalog entry was derived
from the bundled `gpt-5.4` reasoning metadata. Bedrock rejects that
effort level, causing the request to fail before the turn can run:

```text
{"error":{"code":"validation_error","message":"Failed to deserialize the JSON body into the target type: Invalid 'reasoning': Invalid 'effort': unknown variant `xhigh`, expected one of `high`, `low`, `medium`, `minimal` at line 1 column 77239","param":null,"type":"invalid_request_error"}}
```

## What Changed

- Replace the runtime lookup of bundled `gpt-5.4` metadata for
`openai.gpt-5.4-cmb` with an explicit Bedrock CMB `ModelInfo` entry.
- Advertise only the Bedrock-supported CMB reasoning levels: `minimal`,
`low`, `medium`, and `high`.
- Keep the existing GPT OSS Bedrock model metadata and reasoning levels
unchanged.
- Add catalog coverage for the hardcoded CMB metadata and
Bedrock-compatible reasoning level list.
2026-04-25 00:05:22 +00:00
Rasmus Rygaard
5378cccd8a Refactor log DB into LogWriter interface (#19234)
## Why

This prepares feedback log capture for a future remote app-server hook
sink without changing the current local SQLite upload path. The
important boundary is now intentionally small: a log sink is a tracing
`Layer` that can also flush entries it has accepted.

That keeps the existing SQLite implementation simple while giving the
upcoming gRPC sink a place to fit beside it. SQLite and gRPC have
different worker/write semantics, so this PR avoids introducing a shared
buffered-sink abstraction and instead lets each `LogWriter` own the
buffering mechanics it needs.

## What Changed

- Added `LogSinkQueueConfig` with the existing local defaults: queue
capacity `512`, batch size `128`, and flush interval `2s`.
- Added `LogDbLayer::start_with_config(...)` while preserving
`LogDbLayer::start(...)` and `log_db::start(...)` defaults.
- Introduced the `LogWriter` trait as the minimal shared interface:
`tracing_subscriber::Layer` plus `flush()`.
- Made `LogDbLayer` implement `LogWriter`.
- Kept tracing event formatting inside `LogDbLayer`; it still creates
one `LogEntry` per tracing event before queueing it for SQLite.
- Kept normal event capture best-effort and non-blocking via bounded
`try_send`.

## Behavior Notes

This does not change the SQLite schema, retention behavior,
`/feedback/upload`, or Sentry upload behavior. Normal log events still
drop when the queue is full; explicit `flush()` still waits for queue
capacity and receiver processing before returning.

## Verification

- `cargo test -p codex-state log_db`
- `cargo test -p codex-state`
- `just fix -p codex-state`

The added tests cover configured batch-size flushing, configured
interval flushing, queue-full drops, and the flush barrier semantics.
2026-04-24 16:27:39 -07:00
Dylan Hurd
32aad7bd13 Serialize legacy Windows PowerShell sandbox tests (#19453)
## Why

Recent `main` CI had repeated Windows timeouts in the legacy sandbox
process tests:

- `codex-windows-sandbox
session::tests::legacy_capture_powershell_emits_output` failed in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
[24905411571](https://github.com/openai/codex/actions/runs/24905411571),
[24903336028](https://github.com/openai/codex/actions/runs/24903336028),
and
[24898949647](https://github.com/openai/codex/actions/runs/24898949647).
- `legacy_tty_powershell_emits_output_and_accepts_input` failed in the
same set of runs.
- `legacy_non_tty_cmd_emits_output` failed in runs
[24909500958](https://github.com/openai/codex/actions/runs/24909500958),
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
and
[24903336028](https://github.com/openai/codex/actions/runs/24903336028).
- `legacy_non_tty_powershell_emits_output` failed in runs
[24908076251](https://github.com/openai/codex/actions/runs/24908076251),
[24906197645](https://github.com/openai/codex/actions/runs/24906197645),
and
[24903336028](https://github.com/openai/codex/actions/runs/24903336028).

These failures were 30s timeouts on Windows x64 and/or arm64 rather than
assertion failures.

## Root Cause

The active legacy Windows sandbox process tests all exercise host-level
resources: sandbox setup, ACL/user state, private desktop process
launch, stdio capture, and PowerShell/cmd child cleanup. Running several
of these tests concurrently can leave them competing for the same
Windows sandbox setup path and process/session resources, which makes
command startup or output collection hang under CI load.

## What Changed

- Added a shared in-process mutex for the active legacy Windows sandbox
process tests.
- Held that guard across each legacy cmd/PowerShell process test so
those host-resource-heavy cases run one at a time.
- Kept the skipped legacy cmd TTY tests unchanged.

## Why This Should Be Reliable

The tests still use unique homes and run the real legacy sandbox process
path, but they no longer overlap the fragile host-level setup and
process/session lifecycle. Serializing just this small group removes the
concurrency race without reducing the behavioral coverage of each test.

## Verification

- `cargo test -p codex-windows-sandbox`
- GitHub Windows CI is the primary validation signal for the affected
tests; on this PR, Windows clippy, Windows release, and Windows local
Bazel passed after the serialization fix.
2026-04-24 16:18:30 -07:00
rreichel3-oai
219c65dc2f [codex] Forward Codex Apps tool call IDs to backend metadata (#19207)
## Summary
- include the outer tool `call_id` in Codex Apps MCP request metadata
under `_meta._codex_apps.call_id`
- preserve existing Codex Apps metadata like `resource_uri` and
`contains_mcp_source`
- add request metadata coverage for both the existing-metadata and
no-existing-metadata cases

## Why
The paired backend change in
[openai/openai#850796](https://github.com/openai/openai/pull/850796)
updates MCP compliance logging to prefer `_meta._codex_apps.call_id`
instead of the JSON-RPC request id. This client change sends that outer
tool call id so the backend can record the model/tool call identifier
when it is available.

This is wire-compatible with older backends because `_meta._codex_apps`
is already reserved backend-only metadata. Backends that do not read
`call_id` will ignore the extra field.

## Testing
- `cargo test -p codex-core request_meta`
- `just fmt`
- `just fix -p codex-core`
2026-04-24 18:49:34 -04:00
xl-openai
1e560f33e1 feat: Compress skill paths with root aliases (#19098)
Add skill root tracking so model-visible skill lists can use short path
aliases when absolute paths would exceed the metadata budget.
2026-04-24 15:49:07 -07:00
Tom
588f7a9fc4 [codex] add non-local thread store regression harness (#19266)
- Add an integration test that guarantees nothing gets written to codex
home dir or sqlite when running a rollout with a non-local ThreadStore
- Add an in-memory "spy" ThreadStore for tests like this

Note I could not find a good way to also ensure there were no filesystem
_reads_ that didn't go through threadstore. I explored a more elaborate
sandboxed-subprocess approach but it isn't platform portable and felt
like it wasn't (yet) worth it.
2026-04-24 15:45:44 -07:00
Konstantine Kahadze
3c6e2638ac Clarify bundled OpenAI Docs upgrade guide wording (#19422)
## Summary
- Mirrors the OpenAI Docs skill cleanup in the bundled Codex skill copy
- Clarifies reasoning-effort recommendation wording
- Replaces internal snake_case prompt block names with natural-language
guidance aligned to the prompting guide

## Test plan
- `git diff --check`
- Verified the old snake_case prompt block names no longer appear in the
bundled upgrade guide
2026-04-24 22:35:52 +00:00
Michael Bolin
9b8a1fbefc ci: publish codex-app-server release artifacts (#19447)
## Why
The VS Code extension and desktop app do not need the full TUI binary,
and `codex-app-server` is materially smaller than standalone `codex`. We
still want to publish it as an official release artifact, but building
it by tacking another `--bin` onto the existing release `cargo build`
invocations would lengthen those jobs.

This change keeps `codex-app-server` on its own release bundle so it can
build in parallel with the existing `codex` and helper bundles.

## What changed
- Made `.github/workflows/rust-release.yml` bundle-aware so each macOS
and Linux MUSL target now builds either the existing `primary` bundle
(`codex` and `codex-responses-api-proxy`) or a standalone `app-server`
bundle (`codex-app-server`).
- Preserved the historical artifact names for the primary macOS/Linux
bundles so `scripts/stage_npm_packages.py` and
`codex-cli/scripts/install_native_deps.py` continue to find release
assets under the paths they already expect, while giving the new
app-server artifacts distinct names.
- Added a matching `app-server` bundle to
`.github/workflows/rust-release-windows.yml`, and updated the final
Windows packaging job to download, sign, stage, and archive
`codex-app-server.exe` alongside the existing release binaries.
- Generalized the shared signing actions in
`.github/actions/linux-code-sign/action.yml`,
`.github/actions/macos-code-sign/action.yml`, and
`.github/actions/windows-code-sign/action.yml` so each workflow row
declares its binaries once and reuses that list for build, signing, and
staging.
- Added `codex-app-server` to `.github/dotslash-config.json` so releases
also publish a generated DotSlash manifest for the standalone app-server
binary.
- Kept the macOS DMG focused on the existing `primary` bundle;
`codex-app-server` ships as the regular standalone archives and DotSlash
manifest.

## Verification
- Parsed the modified workflow and action YAML files locally with
`python3` + `yaml.safe_load(...)`.
- Parsed `.github/dotslash-config.json` locally with `python3` +
`json.loads(...)`.
- Reviewed the resulting release matrices, artifact names, and packaging
paths to confirm that `codex-app-server` is built separately on macOS,
Linux MUSL, and Windows, while the existing npm staging and Windows
`codex` zip bundling contracts remain intact.
2026-04-24 15:29:37 -07:00
Ahmed Ibrahim
6de6eaa0c1 [4/4] Honor Streamable HTTP MCP placement (#18584) 2026-04-24 15:03:55 -07:00
Konstantine Kahadze
c43e2fcfbf Add gpt-image-2 to bundled OpenAI Docs skill (#19443)
## Summary
- Mirrors openai/skills#374 in the Codex bundled OpenAI Docs skill
- Adds `gpt-image-2` as the best image generation/edit model
- Updates `gpt-image-1.5` to less expensive image generation/edit
quality

## Test plan
- `git diff --check`
2026-04-24 21:48:45 +00:00
Michael Bolin
db94b1657b ci: stop publishing GNU Linux release artifacts (#19445)
## Why
We already prefer shipping the MUSL Linux builds, and the in-repo
release consumers resolve Linux release assets through the MUSL targets.
Keeping the GNU release jobs around adds release time and extra assets
without serving the paths we actually publish and consume.

This is also easier to reason about as a standalone change: future work
can point back to this PR as the intentional decision to stop publishing
`x86_64-unknown-linux-gnu` and `aarch64-unknown-linux-gnu` release
artifacts.

## What changed
- Removed the `x86_64-unknown-linux-gnu` and `aarch64-unknown-linux-gnu`
entries from the `build` matrix in `.github/workflows/rust-release.yml`.
- Added a short comment in that matrix documenting that Linux release
artifacts intentionally ship MUSL-linked binaries.

## Verification
- Reviewed `.github/workflows/rust-release.yml` to confirm that the
release workflow now only builds Linux release artifacts for
`x86_64-unknown-linux-musl` and `aarch64-unknown-linux-musl`.
2026-04-24 21:29:45 +00:00
Tom
0a9b559c0b Migrate fork and resume reads to thread store (#18900)
- Route cold thread/resume and thread/fork source loading through
ThreadStore reads instead of direct rollout path operations
- Keep lookups that explicitly specify a rollout-path using the local
thread store methods but return an invalid-request error for remote
ThreadStore configurations
- Add some additional unit tests for code path coverage
2026-04-24 13:51:37 -07:00
Michael Bolin
13e0ec1614 permissions: make legacy profile conversion cwd-free (#19414)
## Why

The profile conversion path still required a `cwd` even when it was only
translating a legacy `SandboxPolicy` into a `PermissionProfile`. That
made profile producers invent an ambient `cwd`, which is exactly the
anchoring we are trying to remove from permission-profile data. A legacy
workspace-write policy can be represented symbolically instead: `:cwd =
write` plus read-only `:project_roots` metadata subpaths.

This PR creates that cwd-free base so the rest of the stack can stop
threading cwd through profile construction. Callers that actually need a
concrete runtime filesystem policy for a specific cwd still have an
explicitly named cwd-bound conversion.

## What Changed

- `PermissionProfile::from_legacy_sandbox_policy` now takes only
`&SandboxPolicy`.
- `FileSystemSandboxPolicy::from_legacy_sandbox_policy` is now the
symbolic, cwd-free projection for profiles.
- The old concrete projection is retained as
`FileSystemSandboxPolicy::from_legacy_sandbox_policy_for_cwd` for
runtime/boundary code that must materialize legacy cwd behavior.
- Workspace-write profiles preserve `CurrentWorkingDirectory` and
`ProjectRoots` special entries instead of materializing cwd into
absolute paths.

## Verification

- `cargo check -p codex-protocol -p codex-core -p
codex-app-server-protocol -p codex-app-server -p codex-exec -p
codex-exec-server -p codex-tui -p codex-sandboxing -p
codex-linux-sandbox -p codex-analytics --tests`
- `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol
-p codex-app-server -p codex-exec -p codex-exec-server -p codex-tui -p
codex-sandboxing -p codex-linux-sandbox -p codex-analytics`




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19414).
* #19395
* #19394
* #19393
* #19392
* #19391
* __->__ #19414
2026-04-24 13:42:05 -07:00
canvrno-oai
7262c0c450 Skip disabled rows in selection menu numbering and default focus (#19170)
Selection menus in the TUI currently let disabled rows interfere with
numbering and default focus. This makes mixed menus harder to read and
can land selection on rows that are not actionable. This change updates
the shared selection-menu behavior in list_selection_view so disabled
rows are not selected when these views open, and prevents them from
being numbered like selectable rows.

- Disabled rows no longer receive numeric labels
- Digit shortcuts map to enabled rows only
- Default selection moves to the first enabled row in mixed menus
- Updated affected snapshot
- Added snapshot coverage for a plugin detail error popup
- Added a focused unit test for shared selection-view behavior

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-24 13:21:43 -07:00
willwang-openai
687c5d9081 Update unix socket transport to use WebSocket upgrade (#19244)
## Summary
- Switch Unix socket app-server connections to perform the standard
WebSocket HTTP Upgrade handshake
- Update the Unix socket test to exercise a real upgrade over the Unix
stream
- Refresh the app-server README to describe the new Unix socket behavior

## Testing
- `cargo test -p codex-app-server transport::unix_socket_tests`
- `just fmt`
- `git diff --check`
2026-04-24 13:06:51 -07:00
Ruslan Nigmatullin
a3cccbd8ed [codex] Omit fork turns from thread started notifications (#19093)
## Why

`thread/fork` responses intentionally include copied history so the
caller can render the fork immediately, but `thread/started` is a
lifecycle notification. The v2 `Thread` contract says notifications
should return `turns: []`, and the fork path was reusing the response
thread directly, causing copied turns to be emitted through
`thread/started` as well.

## What Changed

- Route app-server `thread/started` notification construction through a
helper that clears `thread.turns` before sending.
- Keep `thread/fork` responses unchanged so callers still receive copied
history.
- Add persistent and ephemeral fork coverage that asserts
`thread/started` emits an empty `turns` array while the response retains
fork history.

## Testing

- `just fmt`
- `cargo test -p codex-app-server`
2026-04-24 12:31:13 -07:00
Celia Chen
0db6811b7c Fix: use function apply_patch tool for Bedrock model (#19416)
## Why

`openai.gpt-5.4-cmb` is served through the Amazon Bedrock provider,
whose request validator currently accepts `function` and `mcp` tool
specs but rejects Responses `custom` tools. The CMB catalog entry reuses
the bundled `gpt-5.4` metadata, which marks `apply_patch_tool_type` as
`freeform`. That causes Codex to include an `apply_patch` tool with
`type: "custom"`, so even heavily disabled sessions can fail before the
model runs with:

```text
Invalid tools: unknown variant `custom`, expected `function` or `mcp`
```

This is provider-specific: the model should still expose `apply_patch`,
but for Bedrock it needs to use the JSON/function tool shape instead of
the freeform/custom shape.

## What Changed

- Override the `openai.gpt-5.4-cmb` static catalog entry to set
`apply_patch_tool_type` to `function` after inheriting the rest of the
`gpt-5.4` model metadata.
- Update the catalog test expectation so the CMB entry continues to
track `gpt-5.4` metadata except for this Bedrock-specific tool shape
override.

## Verification

- `cargo test -p codex-model-provider`
- `just fix -p codex-model-provider`
2026-04-24 18:45:09 +00:00
mcgrew-oai
dee5f5ea38 Harden package-manager install policy (#19163)
## Summary

This PR hardens package-manager usage across the repo to reduce
dependency supply-chain risk. It also removes the stale `codex-cli`
Docker path, which was already broken on `main`, instead of keeping a
bitrotted container workflow alive.

## What changed

- Updated pnpm package manager pins and workspace install settings.
- Removed stale `codex-cli` Docker assets instead of trying to keep a
broken local container path alive.
- Added uv settings and lockfiles for the Python SDK packages.
- Updated Python SDK setup docs to use `uv sync`.

## Why

This is primarily a security hardening change. It reduces
package-install and supply-chain risk by ensuring dependency installs go
through pinned package managers, committed lockfiles, release-age
settings, and reviewed build-script controls.

For `codex-cli`, the right follow-up was to remove the local Docker path
rather than keep patching it:

- `codex-cli/Dockerfile` installed `codex.tgz` with `npm install -g`,
which bypassed the repo lockfile and age-gated pnpm settings.
- The local `codex-cli/scripts/build_container.sh` helper was already
broken on `main`: it called `pnpm run build`, but
`codex-cli/package.json` does not define a `build` script.
- The container path itself had bitrotted enough that keeping it would
require extra packaging-specific behavior that was not otherwise needed
by the repo.

## Gaps addressed

- Global npm installs bypassed the repo lockfile in Docker and CLI
reinstall paths, including `codex-cli/Dockerfile` and
`codex-cli/bin/codex.js`.
- CI and Docker pnpm installs used `--frozen-lockfile`, but the repo was
missing stricter pnpm workspace settings for dependency build scripts.
- Python SDK projects had `pyproject.toml` metadata but no committed
`uv.lock` coverage or uv age/index settings in `sdk/python` and
`sdk/python-runtime`.
- The secure devcontainer install path used npm/global install behavior
without a local locked package-manager boundary.
- The local `codex-cli` Docker helper was already broken on `main`, so
this PR removes that stale Docker path instead of preserving a broken
surface.
- pnpm was already pinned, but not to the current repo-wide pnpm version
target.

## Verification

- `pnpm install --frozen-lockfile`
- `.devcontainer/codex-install`: `pnpm install --prod --frozen-lockfile`
- `.devcontainer/codex-install`: `./node_modules/.bin/codex --version`
- `sdk/python`: `uv lock --check`, `uv sync --locked --all-extras
--dry-run`, `uv build`
- `sdk/python-runtime`: `uv lock --check`, `uv sync --locked --dry-run`,
`uv build --wheel`
- `pnpm -r --filter ./sdk/typescript run build`
- `pnpm -r --filter ./sdk/typescript run lint`
- `pnpm -r --filter ./sdk/typescript run test`
- `node --check codex-cli/bin/codex.js`
- `docker build -f .devcontainer/Dockerfile.secure -t codex-secure-test
.`
- `cargo build -p codex-cli`
- repo-wide package-manager audit
2026-04-24 14:36:19 -04:00
Konstantine Kahadze
6bb2fa3fd4 Update bundled OpenAI Docs skill for GPT-5.5 (#19407)
## Summary
Updates the bundled OpenAI Docs system skill for GPT-5.5.

## Changes
- Updates the bundled latest-model fallback
- Replaces bundled upgrade guidance with GPT-5.5 migration guidance
- Replaces bundled prompting guidance with GPT-5.5 prompting guidance

## Test plan
- Ran `node scripts/resolve-latest-model-info.js`
- Verified bundled files match the OpenAI Docs skill fallback content
2026-04-24 18:26:47 +00:00
iceweasel-oai
e787358f70 check PID of named pipe consumer (#19283)
## Why
The elevated Windows command runner currently trusts the first process
that connects to its parent-created named pipes. Tightening the pipe ACL
already narrows who can reach that boundary, but verifying the connected
client PID gives the parent one more fail-closed check: it only accepts
the exact runner process it just spawned.

## What changed
- validate `GetNamedPipeClientProcessId` after `ConnectNamedPipe` and
reject clients whose PID does not match the spawned runner
- also did some code de-duplication to route the one-shot elevated
capture flow in `windows-sandbox-rs/src/elevated_impl.rs` through
`spawn_runner_transport()` so both elevated codepaths use the same pipe
bootstrap and PID validation

Using the transport unification here also reduces duplication in the
elevated Windows IPC bootstrap, so future hardening to the runner
handshake only needs to land in one place.

## Validation
- `cargo test -p codex-windows-sandbox`
- manual testing: one-shot elevated path via `target/debug/codex.exe
exec` running a randomized shell command and confirming captured output
- manual testing: elevated session path via `target/debug/codex.exe -c
'windows.sandbox="elevated"' sandbox windows -- python -u -c ...` with
stdin/stdout round-trips (`READY`, then `GOT:...` for two input lines)

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
2026-04-24 17:41:08 +00:00
Alex Zamoshchin
bcc1caa920 respect workspace option for disabling plugins (#18907)
Respects the workspace setting for plugins in Codex

Plugins menu disappears
Plugins do not load
Plugins do not load in composer

no plugins loaded
<img width="809" height="226" alt="Screenshot 2026-04-23 at 3 20 45 PM"
src="https://github.com/user-attachments/assets/3a4dba8e-69c3-4046-a77e-f13ab77f84b4"
/>


no plugins in menu
<img width="293" height="204" alt="Screenshot 2026-04-23 at 3 20 35 PM"
src="https://github.com/user-attachments/assets/5cb9bf52-ad72-488f-b90c-5eb457da09a3"
/>
2026-04-24 17:38:45 +00:00
jif-oai
f802f0a391 chore: drop MCP Plugins and App from Morpheus (#19380)
Quick fix of https://github.com/openai/codex/issues/18333
2026-04-24 17:57:48 +02:00
danwang-oai
11806faf71 Fix hang on turn/interrupt (#18392)
Fix a bug where the `turn/interrupt` RPC hangs when interrupting a turn
that has already completed.

Before this change, `turn/interrupt` requests were queued in app-server
and only answered when a later TurnAborted event arrived. If the target
turn was already complete, core treated Op::Interrupt as a no-op, so no
abort event was emitted and the RPC could hang indefinitely.

This change fixes that in two places:

* Reject turn/interrupt immediately with `INVALID_REQUEST` when the
requested turn is no longer the active turn.
* Resolve any already-accepted pending interrupt requests when the turn
reaches TurnComplete, covering the case where a turn finishes naturally
after the interrupt request is accepted but before it aborts.

I tested this by adding a failing test in
707487c063. You may view the results here:
https://github.com/openai/codex/actions/runs/24585182419/

<img width="1512" height="310" alt="CleanShot 2026-04-17 at 16 33 30@2x"
src="https://github.com/user-attachments/assets/f4a88228-b2a4-41f4-9aaa-ec82814096af"
/>
2026-04-24 10:47:50 -04:00
jif-oai
28742866c7 Add agents.interrupt_message for interruption markers (#19351)
## Why

Agent interruptions currently always persist a model-visible
interrupted-turn marker before emitting `TurnAborted`. That marker is
useful by default because it gives the next model turn context about a
deliberately interrupted task, but some deployments need to suppress
that history injection entirely while still keeping the client-visible
interruption event.

## What changed

- Add `[agents] interrupt_message = false` to disable the model-visible
interrupted-turn marker.
- Resolve the setting into `Config::agent_interrupt_message_enabled`,
defaulting to `true` so existing behavior is unchanged.
- Apply the setting to both live interrupted turns and interrupted fork
snapshots.
- Keep emitting `TurnAborted` even when the history marker is disabled.
- Regenerate `core/config.schema.json` for the new
`agents.interrupt_message` field.

## Testing

- `cargo test -p codex-core load_config_resolves_agent_interrupt_message
-- --nocapture`
- `cargo test -p codex-core
disabled_interrupted_fork_snapshot_appends_only_interrupt_event --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_interrupted_marker_uses_developer_input_message --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_followup_task_can_disable_interrupted_marker --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message
-- --nocapture`
- `cargo check -p codex-core`
2026-04-24 16:02:45 +02:00
jif-oai
deb4509302 feat: surface multi-agent thread limit in spawn description (#19360)
## Summary
- Thread `agent_max_threads` into `ToolsConfig` and
`SpawnAgentToolOptions`.
- Render the configured `max_concurrent_threads_per_session` value in
the MultiAgentV2 `spawn_agent` description.
- Cover the description text in `codex-tools` unit tests and
`codex-core` tool spec tests.

## Validation
- `just fmt`
- `cargo test -p codex-tools`
- `cargo test -p codex-core spawn_agent_description`
- `git diff --check`

## Notes
- `cargo test -p codex-core` was also attempted, but unrelated
environment-sensitive tests failed with the active local environment.
Examples: approvals reviewer defaults observed `AutoReview` instead of
`User`, request-permissions event tests did not emit events, and
proxy-env tests saw `http://127.0.0.1:50604` from the active proxy
environment.

Co-authored-by: Codex <noreply@openai.com>
2026-04-24 15:13:54 +02:00
jif-oai
9eadff9713 chore: alias max_concurrent_threads_per_session (#19354) 2026-04-24 14:33:03 +02:00
jif-oai
120aa07d81 Make MultiAgentV2 interruption markers assistant-authored (#19124)
## Why

`MultiAgentV2` follow-up messages are delivered to agents as
assistant-authored `InterAgentCommunication` envelopes. When
`followup_task` used `interrupt: true`, the interrupted-turn guidance
was still persisted as a contextual user message, so model-visible
history made a system-generated interruption boundary look
user-authored.

This keeps interruption guidance consistent with the rest of the v2
inter-agent message stream while preserving the legacy marker shape for
non-v2 sessions.

## What changed

- Make `interrupted_turn_history_marker` feature-aware.
- Record the interrupted-turn marker as an assistant `OutputText`
message when `Feature::MultiAgentV2` is enabled.
- Keep the existing user contextual fragment for non-v2 sessions.
- Apply the same feature-aware marker to interrupted fork snapshots.
- Add coverage for the live `followup_task` interrupt path and the
helper-level v2 marker shape.

## Testing

- `cargo test -p codex-core
multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message
-- --nocapture`
- `cargo test -p codex-core
multi_agent_v2_interrupted_marker_uses_assistant_output_message --
--nocapture`
- `cargo test -p codex-core interrupted_fork_snapshot -- --nocapture`
2026-04-24 13:39:26 +02:00
jif-oai
21463a5074 fix alpha build (#19350) 2026-04-24 13:36:05 +02:00
sayan-oai
c10f95ddac Update models.json and related fixtures (#19323)
Supersedes #18735.

The scheduled rust-release-prepare workflow force-pushed
`bot/update-models-json` back to the generated models.json-only diff,
which dropped the test and snapshot updates needed for CI.

This PR keeps the latest generated `models.json` from #18735 and adds
the corresponding fixture updates:
- preserve model availability NUX in the app-server model cache fixture
- update core/TUI expectations for the new `gpt-5.4` `xhigh` default
reasoning
- refresh affected TUI chatwidget snapshots for the `gpt-5.5`
default/model copy changes

Validation run locally while preparing the fix:
- `just fmt`
- `cargo test -p codex-app-server model_list`
- `cargo test -p codex-core includes_no_effort_in_request`
- `cargo test -p codex-core
includes_default_reasoning_effort_in_request_when_defined_by_model_info`
- `cargo test -p codex-tui --lib chatwidget::tests`
- `cargo insta pending-snapshots`

---------

Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>
2026-04-24 11:14:13 +02:00
Eric Traut
ddfa691752 Surface reasoning tokens in exec JSON usage (#19308)
## Summary

Fixes #19022.

`codex exec --json` currently emits `turn.completed.usage` with input,
cached input, and output token counts, but drops the reasoning-token
split that Codex already receives through thread token usage updates.
Programmatic consumers that rely on the JSON stream, especially
ephemeral runs that do not write rollout files, need this field to
accurately display reasoning-model usage.

This PR adds `reasoning_output_tokens` to the public exec JSON `Usage`
payload and maps it from the existing `ThreadTokenUsageUpdated` total
token usage data.

## Verification

- Added coverage to
`event_processor_with_json_output::token_usage_update_is_emitted_on_turn_completion`
so `turn.completed.usage.reasoning_output_tokens` is asserted.
- Updated SDK expectations for `run()` and `runStreamed()` so TypeScript
consumers see the new usage field.
- Ran `cargo test -p codex-exec`.
- Ran `pnpm --filter ./sdk/typescript run build`.
- Ran `pnpm --filter ./sdk/typescript run lint`.
- Ran `pnpm --filter ./sdk/typescript exec jest --runInBand
--testTimeout=30000`.
2026-04-24 01:54:11 -07:00
Eric Traut
6f87eb0479 Hide unsupported MCP bearer_token from config schema (#19294)
## Summary

Fixes #19275.

Codex runtime rejects inline MCP `bearer_token` config entries and asks
users to configure `bearer_token_env_var` instead, but the generated
config schema still advertised `mcp_servers.<name>.bearer_token` as a
supported field. That made editor/schema validation disagree with
runtime validation.

This keeps `bearer_token` in `RawMcpServerConfig` so Codex can continue
producing the targeted runtime error for recent or existing configs, but
skips the field during schemars generation. The checked-in
`core/config.schema.json` fixture now exposes `bearer_token_env_var`
without exposing unsupported inline `bearer_token`.

## Verification

- Added `config_schema_hides_unsupported_inline_mcp_bearer_token` to
assert the generated schema hides `bearer_token` while preserving
`bearer_token_env_var`.
- Ran `cargo test -p codex-config`.
- Ran `cargo test -p codex-core config_schema`.
2026-04-24 00:17:43 -07:00
sayan-oai
e083b6c757 chore: apply truncation policy to unified_exec (#19247)
we were not respecting turn's `truncation_policy` to clamp output tokens
for `unified_exec` and `write_stdin`.

this meant truncation was only being applied by `ContextManager` before
the output was stored in-memory (so it _was_ being truncated from
model-visible context), but the full output was persisted to rollout on
disk.

now we respect that `truncation_policy` and `ContextManager`-level
truncation remains a backup.

### Tests
added tests, tested locally.
2026-04-24 00:17:39 -07:00
Eric Traut
ac8c9fc49c Reject unsupported js_repl image MIME types (#19292)
## Summary

`codex.emitImage` accepted arbitrary image MIME types for byte payloads
and data URLs. That allowed a value like `image/rgba` to be wrapped as
an `input_image`, even though it is not a supported encoded image
format, so the invalid image could reach the model-input path and
trigger output sanitization.

This results in a panic in debug builds because the output sanitization
is meant as a final safety net, not a primary means of rejecting invalid
image types. I've hit this case multiple times when executing certain
long-running tasks.

This PR rejects unsupported image MIME types before they are emitted
from `js_repl`.

## Changes

- Validate `codex.emitImage({ bytes, mimeType })` in the JS kernel so
only encoded PNG, JPEG, WebP, or GIF payloads are accepted.
- Apply the same MIME allowlist to direct image data URLs, including the
Rust host-side validation path.
- Clarify the JS REPL instructions so agents know byte payloads must
already be encoded as PNG/JPEG/WebP/GIF.
2026-04-24 00:14:51 -07:00
Michael Bolin
b68366718b ci: reuse Bazel CI startup for target-discovery queries (#19232)
## Why

A rerun of the Windows Bazel clippy job after
[#19161](https://github.com/openai/codex/pull/19161) had exactly the
cache behavior we wanted in BuildBuddy: zero action-cache misses. Even
so, the GitHub job still took a little over five minutes.

The problem was that the job was paying for two separate Bazel startup
paths:

1. a `bazel query` to discover extra lint targets
2. the real `bazel build --config=clippy ...` invocation

On Windows, that query was bypassing the CI Bazel wrapper, so it did not
reuse the same `--output_user_root`, CI config, or remote-cache setup as
the real build. In practice that meant the rerun could still cold-start
a separate Bazel server before the actual clippy build even began.

## What

- add `.github/scripts/run-bazel-query-ci.sh` to run CI-side Bazel
queries with the same startup and cache-related flags as the main Bazel
command
- switch `scripts/list-bazel-clippy-targets.sh` to use that helper for
manual `rust_test` target discovery
- switch `tools/argument-comment-lint/list-bazel-targets.sh` to use the
same helper
- simplify `.github/scripts/run-argument-comment-lint-bazel.sh` so its
Windows-only query path also goes through the shared helper

This keeps the target-discovery queries aligned with the later
build/test invocation instead of treating them as a separate cold Bazel
session.

## Verification

- `bash -n .github/scripts/run-bazel-query-ci.sh`
- `bash -n scripts/list-bazel-clippy-targets.sh`
- `bash -n tools/argument-comment-lint/list-bazel-targets.sh`
- `bash -n .github/scripts/run-argument-comment-lint-bazel.sh`
- mocked a Windows invocation of `run-bazel-query-ci.sh` and verified it
forwards `--output_user_root`, `--config=ci-windows`, the BuildBuddy
auth header, and the repository cache flags

## Docs

No documentation updates are needed.
2026-04-23 23:26:17 -07:00
Eric Traut
d87d918716 Resolve relative agent role config paths from layers (#19261)
Fixes #19257.

## Summary

Agent roles declared in config layers can set `config_file` to a
relative path, but deserializing the layer-local `[agents.*]` table
happened without an `AbsolutePathBuf` base path. That caused configs
like `config_file = "agents/my-role.toml"` to fail with `AbsolutePathBuf
deserialized without a base path`.

This updates agent role layer loading to deserialize `[agents.*]` while
the layer config folder is active as the path base, matching the
behavior documented for `AgentRoleToml.config_file`. It also adds
coverage for a user config layer with a relative agent role
`config_file`.
2026-04-23 23:23:11 -07:00
Michael Bolin
4816b89204 permissions: make profiles represent enforcement (#19231)
## Why

`PermissionProfile` is becoming the canonical permissions abstraction,
but the old shape only carried optional filesystem and network fields.
It could describe allowed access, but not who is responsible for
enforcing it. That made `DangerFullAccess` and `ExternalSandbox` lossy
when profiles were exported, cached, or round-tripped through app-server
APIs.

The important model change is that active permissions are now a disjoint
union over the enforcement mode. Conceptually:

```rust
pub enum PermissionProfile {
    Managed {
        file_system: FileSystemSandboxPolicy,
        network: NetworkSandboxPolicy,
    },
    Disabled,
    External {
        network: NetworkSandboxPolicy,
    },
}
```

This distinction matters because `Disabled` means Codex should apply no
outer sandbox at all, while `External` means filesystem isolation is
owned by an outside caller. Those are not equivalent to a broad managed
sandbox. For example, macOS cannot nest Seatbelt inside Seatbelt, so an
inner sandbox may require the outer Codex layer to use no sandbox rather
than a permissive one.

## How Existing Modeling Maps

Legacy `SandboxPolicy` remains a boundary projection, but it now maps
into the higher-fidelity profile model:

- `ReadOnly` and `WorkspaceWrite` map to `PermissionProfile::Managed`
with restricted filesystem entries plus the corresponding network
policy.
- `DangerFullAccess` maps to `PermissionProfile::Disabled`, preserving
the “no outer sandbox” intent instead of treating it as a lax managed
sandbox.
- `ExternalSandbox { network_access }` maps to
`PermissionProfile::External { network }`, preserving external
filesystem enforcement while still carrying the active network policy.
- Split runtime policies that legacy `SandboxPolicy` cannot faithfully
express, such as managed unrestricted filesystem plus restricted
network, stay `Managed` instead of being collapsed into
`ExternalSandbox`.
- Per-command/session/turn grants remain partial overlays via
`AdditionalPermissionProfile`; full `PermissionProfile` is reserved for
complete active runtime permissions.

## What Changed

- Change active `PermissionProfile` into a tagged union: `managed`,
`disabled`, and `external`.
- Keep partial permission grants separate with
`AdditionalPermissionProfile` for command/session/turn overlays.
- Represent managed filesystem permissions as either `restricted`
entries or `unrestricted`; `glob_scan_max_depth` is non-zero when
present.
- Preserve old rollout compatibility by accepting the pre-tagged `{
network, file_system }` profile shape during deserialization.
- Preserve fidelity for important edge cases: `DangerFullAccess`
round-trips as `disabled`, `ExternalSandbox` round-trips as `external`,
and managed unrestricted filesystem + restricted network stays managed
instead of being mistaken for external enforcement.
- Preserve configured deny-read entries and bounded glob scan depth when
full profiles are projected back into runtime policies, including
unrestricted replacements that now become `:root = write` plus deny
entries.
- Regenerate the experimental app-server v2 JSON/TypeScript schema and
update the `command/exec` README example for the tagged
`permissionProfile` shape.

## Compatibility

Legacy `SandboxPolicy` remains available at config/API boundaries as the
compatibility projection. Existing rollout lines with the old
`PermissionProfile` shape continue to load. The app-server
`permissionProfile` field is experimental, so its v2 wire shape is
intentionally updated to match the higher-fidelity model.

## Verification

- `just write-app-server-schema`
- `cargo check --tests`
- `cargo test -p codex-protocol permission_profile`
- `cargo test -p codex-protocol
preserving_deny_entries_keeps_unrestricted_policy_enforceable`
- `cargo test -p codex-app-server-protocol
permission_profile_file_system_permissions`
- `cargo test -p codex-app-server-protocol serialize_client_response`
- `cargo test -p codex-core
session_configured_reports_permission_profile_for_external_sandbox`
- `just fix`
- `just fix -p codex-protocol`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-core`
- `just fix -p codex-app-server`
2026-04-23 23:02:18 -07:00
xli-oai
33cc135cc3 [codex] Support remote plugin install writes (#18917)
## Summary
- Add a remote plugin install write call that POSTs the selected remote
plugin to the ChatGPT cloud plugin API.
- Align remote install with the latest remote read contract:
`pluginName` carries the backend remote plugin id directly, for example
`plugins~Plugin_linear`, and install no longer synthesizes
`<name>@<marketplace>` ids.
- Validate remote install ids with the same character rules as remote
read, return the same install response shape as local installs, and
include mocked app-server coverage for the write path.

## Validation
- `just fmt`
- `cargo test -p codex-app-server --test all plugin_install`
- `cargo test -p codex-core-plugins`
- `just fix -p codex-app-server`
- `just fix -p codex-core-plugins`
2026-04-23 22:10:15 -07:00
Ruslan Nigmatullin
19badb0be2 app-server: persist device key bindings in sqlite (#19206)
## Why

Device-key providers should only own platform key material. The
account/client binding used to authorize a signing payload is app-server
state, and keeping that state in provider-specific metadata makes the
same check harder to audit and harder to share across platform
implementations.

Persisting the binding in the shared state database gives the device-key
crate a platform-neutral source of truth before it asks a provider to
sign. It also lets app-server move potentially blocking key operations
off the main message processor path, which matters once providers may
wait for OS authentication prompts.

## What changed

- Add a `device_key_bindings` state migration plus `StateRuntime`
helpers keyed by `key_id`.
- Add an async `DeviceKeyBindingStore` abstraction to `codex-device-key`
and use it from `DeviceKeyStore::create` and `DeviceKeyStore::sign`.
- Keep provider calls behind async store methods and run the synchronous
provider work through `spawn_blocking`.
- Wire app-server device-key RPC handling to the SQLite-backed binding
store and spawn response/error delivery tasks for device-key requests.
- Run the turn-start tracing test on the existing larger current-thread
test harness after the larger async surface made the default test stack
too small locally.

## Validation

- `cargo test -p codex-device-key`
- `cargo test -p codex-state device_key`
- `cargo test -p codex-state`
- `cargo test -p codex-app-server device_key`
- `cargo test -p codex-app-server
message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
- `cargo test -p codex-app-server`
- `just fix -p codex-device-key`
- `just fix -p codex-state`
- `just fix -p codex-app-server`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `git diff --check`
2026-04-23 21:55:56 -07:00
Celia Chen
e8d8080818 feat: let model providers own model discovery (#18950)
## Why

`codex-models-manager` had grown to own provider-specific concerns:
constructing OpenAI-compatible `/models` requests, resolving provider
auth, emitting request telemetry, and deciding how provider catalogs
should be sourced. That made the manager harder to reuse for providers
whose model catalog is not fetched from the OpenAI `/models` endpoint,
such as Amazon Bedrock.

This change moves provider-specific model discovery behind
provider-owned implementations, so the models manager can focus on
refresh policy, cache behavior, picker ordering, and model metadata
merging.

## What Changed

- Introduced a `ModelsManager` trait with separate `OpenAiModelsManager`
and `StaticModelsManager` implementations.
- Added `ModelsEndpointClient` so OpenAI-compatible HTTP fetching lives
outside `codex-models-manager`.
- Moved `/models` request construction, provider auth resolution,
timeout handling, and request telemetry into `codex-model-provider` via
`OpenAiModelsEndpoint`.
- Added provider-owned `models_manager(...)` construction so configured
OpenAI-compatible providers use `OpenAiModelsManager`, while
static/catalog-backed providers can return `StaticModelsManager`.
- Added an Amazon Bedrock static model catalog for the GPT OSS Bedrock
model IDs.
- Updated core/session/thread manager code and tests to depend on
`Arc<dyn ModelsManager>`.
- Moved offline model test helpers into
`codex_models_manager::test_support`.
## Metadata References

The Bedrock catalog metadata is based on the official Amazon Bedrock
OpenAI model documentation:

- [Amazon Bedrock OpenAI
models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-openai.html)
lists the Bedrock model IDs, text input/output modalities, and `128,000`
token context window for `gpt-oss-20b` and `gpt-oss-120b`.
- [Amazon Bedrock `gpt-oss-120b` model
card](https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-openai-gpt-oss-120b.html)
lists the `bedrock-runtime` model ID `openai.gpt-oss-120b-1:0`, the
`bedrock-mantle` model ID `openai.gpt-oss-120b`, text-only modalities,
and `128K` context window.
- [OpenAI `gpt-oss-120b` model
docs](https://developers.openai.com/api/docs/models/gpt-oss-120b)
document configurable reasoning effort with `low`, `medium`, and `high`,
plus text input/output modality.

The display names, default reasoning effort, and priority ordering are
Codex-local catalog choices.

## Test Plan
- Manually verified app-server model listing with an AWS profile:

```shell
CODEX_HOME="$(mktemp -d)" cargo run -p codex-app-server-test-client -- \
  --codex-bin ./target/debug/codex \
  -c 'model_provider="amazon-bedrock"' \
  -c 'model_providers.amazon-bedrock.aws.profile="codex-bedrock"' \
  -c 'model_providers.amazon-bedrock.aws.region="us-west-2"' \
  model-list
```

The response returned the Bedrock catalog with `openai.gpt-oss-120b-1:0`
as the default model and `openai.gpt-oss-20b-1:0` as the second listed
model, both text-only and supporting low/medium/high reasoning effort.
2026-04-24 04:28:25 +00:00
xl-openai
53be451673 feat: Use short SHA versions for curated plugin cache entries (#19095)
Curated plugin cache entries now use an 8-character SHA prefix, instead
of the full SHA, as the cache folder version number.
2026-04-23 21:15:03 -07:00
cassirer-openai
a9c111da54 [rollout_trace] Trace sessions and multi-agent edges (#18879)
## Summary

Adds the remaining session and multi-agent edge wiring needed to
reconstruct rollout relationships across spawned agents, resumed
sessions, and parent/child message delivery.

## Stack

This is PR 4/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This is the stack layer that makes traces useful for multi-threaded
agent workflows. The main invariant is that reconstructed relationships
should come from durable rollout data rather than transient in-memory
manager state wherever possible.

The PR is intentionally small relative to the preceding layers: it uses
the recorder and reducer contracts already established by the stack and
only adds the session/agent relationship events needed by later debug
reduction.
2026-04-24 02:29:45 +00:00
starr-openai
49fb25997f Add sticky environment API and thread state (#18897)
## Summary
- add sticky environment selections to app-server v2 thread/start and
turn/start request flow
- carry thread-level selections through core session/thread state
- add app-server coverage for sticky selections and turn overrides

## Stack
1. This PR: API and thread persistence
2. #18898: config.toml named environment loading
3. #18899: downstream tool/runtime consumers

## Validation
- Not run locally; split only.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-23 18:57:13 -07:00
cassirer-openai
e3c8720a99 [rollout_trace] Add debug trace reduction command (#18880)
## Summary

Adds the debug CLI entry point for reducing recorded rollout traces.
This gives developers a direct way to inspect whether the emitted trace
stream reduces into the expected conversation/runtime model.

## Stack

This is PR 5/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This PR is intentionally last: it depends on the trace crate, core
recorder, runtime/tool events, and session/agent edge data all existing.
The command should remain a debug/developer tool and avoid adding new
runtime behavior.

The useful review question is whether the CLI exposes the reducer in the
smallest practical way for local inspection without turning the debug
command into a supported user-facing workflow.
2026-04-24 01:56:48 +00:00
Celia Chen
432771c5fd feat: expose AWS account state from account/read (#19048)
## Why

AWS/Bedrock mode currently reports `account: null` with
`requiresOpenaiAuth: false` from `account/read`. That suppresses the
OpenAI-auth requirement, but it does not let app clients distinguish AWS
auth from any other non-OpenAI custom provider. For the prototype AWS
provider UX, clients need a simple provider-derived signal so they can
suppress ChatGPT/API-key login and token-refresh paths without
hardcoding Bedrock checks.

## What changed

- Adds an `aws` variant to the v2 `Account` protocol union.
- Adds `ProviderAccountKind` to `codex-model-provider` so the runtime
provider owns the app-visible account classification.
- Makes Amazon Bedrock return `ProviderAccountKind::Aws` from the
model-provider layer.
- Updates app-server `account/read` to map `ProviderAccountKind` to the
existing `GetAccountResponse` wire shape.
- Preserves the existing `account: null, requiresOpenaiAuth: false`
behavior for other non-OpenAI providers.
- Regenerates the app-server protocol schema fixtures.
- Adds coverage for provider account classification and for the Amazon
Bedrock `account/read` response.

## Testing

- `cargo test -p codex-model-provider`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server get_account_with_aws_provider`

## Notes

I attempted `just bazel-lock-update` and `just bazel-lock-check`, but
both are blocked in my local environment because `bazel` is not
installed.
2026-04-24 01:53:13 +00:00
Eric Traut
72f757d144 Increase app-server WebSocket outbound buffer (#19246)
Fixes #18203.

## Why

Remote TUI clients connected through `codex app-server --listen
ws://...` can receive short bursts of outbound turn and tool-output
notifications. The WebSocket transport previously used the shared
128-message channel capacity for its outbound writer queue, so a healthy
client that briefly lagged during normal output streaming could fill the
queue and be disconnected immediately.

This is a smaller mitigation than #18265: instead of adding a new
overflow/backpressure pipeline, keep the existing non-blocking router
behavior and give WebSocket clients enough bounded headroom for
realistic bursts.

## What Changed

- Added a WebSocket-only outbound writer capacity of `64 * 1024`
messages.
- Used that larger capacity only for the WebSocket data writer queue in
`codex-rs/app-server/src/transport/websocket.rs`.
- Left the shared `CHANNEL_CAPACITY` and the existing disconnect-on-full
behavior unchanged for internal/control channels and genuinely stuck
clients.

## Verification

- `cargo test -p codex-app-server
transport::tests::broadcast_does_not_block_on_slow_connection`
- Manually retried the #18203 repro prompt against the remote TUI and
confirmed it stayed connected.
2026-04-23 18:47:28 -07:00
efrazer-oai
5882f3f95e refactor: route Codex auth through AuthProvider (#18811)
## Summary

This PR moves Codex backend request authentication from direct
bearer-token handling to `AuthProvider`.

The new `codex-auth-provider` crate defines the shared request-auth
trait. `CodexAuth::provider()` returns a provider that can apply all
headers needed for the selected auth mode.

This lets ChatGPT token auth and AgentIdentity auth share the same
callsite path:
- ChatGPT token auth applies bearer auth plus account/FedRAMP headers
where needed.
- AgentIdentity auth applies AgentAssertion plus account/FedRAMP headers
where needed.

Reference old stack: https://github.com/openai/codex/pull/17387/changes

## Callsite Migration

| Area | Change |
| --- | --- |
| backend-client | accepts an `AuthProvider` instead of a raw
token/header |
| chatgpt client/connectors | applies auth through
`CodexAuth::provider()` |
| cloud tasks | keeps Codex-backend gating, applies auth through
provider |
| cloud requirements | uses Codex-backend auth checks and provider
headers |
| app-server remote control | applies provider headers for backend calls
|
| MCP Apps/connectors | gates on `uses_codex_backend()` and keys caches
from generic account getters |
| model refresh | treats AgentIdentity as Codex-backend auth |
| OpenAI file upload path | rejects non-Codex-backend auth before
applying headers |
| core client setup | keeps model-provider auth flow and allows
AgentIdentity through provider-backed OpenAI auth |

## Stack

1. https://github.com/openai/codex/pull/18757: full revert
2. https://github.com/openai/codex/pull/18871: isolated Agent Identity
crate
3. https://github.com/openai/codex/pull/18785: explicit AgentIdentity
auth mode and startup task allocation
4. This PR: migrate Codex backend auth callsites through AuthProvider
5. https://github.com/openai/codex/pull/18904: accept AgentIdentity JWTs
and load `CODEX_AGENT_IDENTITY`

## Testing

Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
2026-04-23 17:14:02 -07:00
Michael Bolin
a9f75e5cda ci: derive cache-stable Windows Bazel PATH (#19161)
## Why

The BuildBuddy runs for PR #19086 and the later `main` build had the
same source tree, but their Windows Bazel action and test cache keys did
not line up. Comparing the downloaded execution logs showed the full
GitHub-hosted Windows runner `PATH` had changed from
`apache-maven-3.9.14` to `apache-maven-3.9.15`.

This repo is not using Maven; the Maven entry was just ambient
hosted-runner state. The problem was that Windows Bazel CI was still
forwarding the whole runner `PATH` into Bazel via `--action_env=PATH`,
`--host_action_env=PATH`, and `--test_env=PATH`, which made otherwise
reusable cache entries sensitive to unrelated runner image churn.

After discussion with the Bazel and BuildBuddy folks, the better shape
for this change was to stop asking Bazel to inherit the ambient Windows
`PATH` and instead compute one explicit cache-stable `PATH` in the
Windows setup action that already prepares the CI toolchain environment.

## What

- remove Windows `PATH` passthrough from `.bazelrc`
- export `CODEX_BAZEL_WINDOWS_PATH` from
`.github/actions/setup-bazel-ci/action.yml`
- move the PATH derivation logic into
`.github/scripts/compute-bazel-windows-path.ps1` so the allow-list is
easier to review and document
- keep only the Windows tool locations these Bazel jobs actually need:
MSVC and SDK paths, Git, PowerShell, Node, DotSlash, and the standard
Windows system directories
- update `.github/scripts/run-bazel-ci.sh` to require that explicit
value and forward it to Bazel action, host action, and test environments
- log the derived `CODEX_BAZEL_WINDOWS_PATH` in the setup step to
simplify cache-key debugging

## Verification

- `bash -n .github/scripts/run-bazel-ci.sh`
- `ruby -e 'require "yaml"; YAML.load_file(ARGV[0])'
.github/actions/setup-bazel-ci/action.yml`
- PowerShell parse check for
`.github/scripts/compute-bazel-windows-path.ps1`
- simulated a representative Windows `PATH` in PowerShell; the
allow-list retained MSVC, Git, PowerShell, Node, Windows, and DotSlash
entries while dropping Maven
2026-04-23 22:28:00 +00:00
iceweasel-oai
867820ac7e do not attempt ACLs on installed codex dir (#19214)
We used to attempt a read-ACL on the same dir as `codex.exe` to grant
the sandbox user the ability to invoke `codex-command-runner.exe`. That
worked for the CLI case but it always fails for the installed desktop
app.

We have another solution already in place that copies
`codex-command-runner.exe` to `CODEX_HOME/.sandbox-bin` so we don't even
need this anymore. It causes a scary looking error in the logs that is a
non-issue and is therefore confusing
2026-04-23 22:21:48 +00:00
iceweasel-oai
2e228969be guide Windows to use -WindowStyle Hidden for Start-Process calls (#19044)
Sometimes codex runs `Start-Process` to start up a service or something
similar, which launches a user-visible powershell window that probably
doesn't get cleaned up. This instruction change encourages it to do so
using a hidden window.

This was reported in
https://openai.slack.com/archives/C09K6H5DZC4/p1776741272870519

One caveat is that this change won't do anything to cleanup these
processes, but it will stop them from polluting the user's visible
workspace

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-23 21:36:17 +00:00
Michael Bolin
040976b218 tests: isolate approval fixtures from host rules (#18288)
## Why

Several approval-focused tests were unintentionally sensitive to
host-level rule files. On machines with broader allowed command
prefixes, commonly allowed commands such as `/bin/date` could bypass the
approval path these tests were meant to exercise, making the fixtures
depend on the developer or CI host configuration.

## What changed

- Pins the approval matrix fixture to the explicit user reviewer so it
does not inherit a host reviewer.
- Changes OTel approval fixtures to request `/usr/bin/touch
codex-otel-approval-test`, avoiding a command that may be pre-approved
by local rules.
- Clears the config layer stack for the permissions-message assertion
that needs to compare only the permissions text under test.

## Verification

- `env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core --test
all approval_matrix_covers_all_modes -- --nocapture`
- `env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core --test
all permissions_messages -- --nocapture`
2026-04-23 14:12:09 -07:00
Abhinav
dc5cf1ff78 Mark hooks schema fixtures as generated (#19194)
## Summary
- mark generated hooks schema fixture JSON as linguist-generated
- keep the app-server protocol generated schema marking unchanged

## Validation
- `git check-attr linguist-generated --
codex-rs/hooks/schema/generated/post-tool-use.command.output.schema.json`

Co-authored-by: Codex <noreply@openai.com>
2026-04-23 14:11:16 -07:00
Eric Traut
a50cb205b7 Stabilize plugin MCP tools test (#19191)
## Summary

The plugin MCP tool-listing test could hide MCP startup failures by
polling `ListMcpTools` until its own 30s deadline. If the plugin MCP
server startup had already failed or timed out, the session-owned MCP
manager would keep returning an empty tool list, so CI only reported
`discovered tools: []` instead of the startup state that mattered.

This makes the test synchronize on `McpStartupComplete` for the sample
plugin MCP server before asserting listed tools, and gives the
Bazel-launched test server a larger startup window.

## Notes

Confidence is about 80%. The source path strongly supports the RCA: a
failed MCP startup is represented as an empty tool list through
`ListMcpTools`, so the old polling contract could not distinguish "not
ready yet" from "startup already failed." I could not retrieve the CI
execution-log artifact to confirm the exact hidden startup error, but
the observed Ubuntu Bazel failure matches this path: repeated
`ListMcpTools` responses with no tools until the test-local timeout
fired.

I think this is the right solution because it keeps plugin behavior
unchanged and fixes only the test contract. Future startup failures
should now report the `McpStartupComplete` failure/cancellation instead
of timing out on an empty tool snapshot.

This test was introduced in https://github.com/openai/codex/pull/12864.
2026-04-23 14:08:40 -07:00
Eric Traut
3f8c06e457 Fix /review interrupt and TUI exit wedges (#18921)
Addresses #11267

## Summary
`/review` can be interrupted while it is still spawning the review
sub-agent. That spawn path lives in `codex-core` and did not observe the
task cancellation token until after `Codex::spawn` returned, so an
interrupted review could keep building a child session and leave the TUI
in a wedged state.

The TUI exit path also waited indefinitely for app-server
`thread/unsubscribe`, which made Ctrl+C look broken if the app-server
was already stuck. This makes interactive delegate startup
cancellation-aware and bounds the TUI shutdown-first unsubscribe wait
with a short UI escape-hatch timeout.

## Testing
I reproed the hang using the steps in the bug report. Confirmed hang no
longer exists after fix.
2026-04-23 13:28:12 -07:00
Eric Traut
cccc1b618e Stabilize approvals popup disabled-row test (#19178)
## Summary

The Windows Bazel job has been failing in
`chatwidget::tests::permissions::approvals_popup_navigation_skips_disabled`
because the test assumed a fixed approvals popup row order and shortcut
for the disabled permissions option. The approvals popup can include
platform-specific rows, so those assumptions made the test brittle.

This updates the test to derive the disabled row shortcut from the
rendered popup and assert navigation continues to skip disabled rows
before checking that disabled numeric shortcuts do not close or accept
the popup.
2026-04-23 13:21:35 -07:00
iceweasel-oai
d169bb541e use a version-specific suffix for command runner binary in .sandbox-bin (#19180)
we copy `codex-command-runner.exe` into `CODEX_HOME/.sandbox-bin/` so
that it can be executed by the sandbox user.
We also detect if that version is stale and copy a new one in if so.

This can fail when you are running multiple versions of the app - the
file in `.sandbox-bin` can look stale because it comes from another app
build.

This change allows us to have multiple versions in there for different
CLI versions, and it fallsback to a `size+mtime` hash in the filename
for dev builds that don't report a real CLI version.
2026-04-23 13:16:26 -07:00
xli-oai
0d6a90cd6b Add app-server marketplace upgrade RPC (#19074)
## Summary
- add a v2 `marketplace/upgrade` app-server RPC that mirrors the
existing configured Git marketplace upgrade path
- expose typed request/response/error payloads and regenerate
JSON/TypeScript schema fixtures
- add app-server integration coverage for all, named, already
up-to-date, and invalid marketplace upgrade requests

## Tests
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server marketplace_upgrade`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-app-server`
- `just fmt`
2026-04-23 13:00:46 -07:00
Michael Bolin
491a3058f6 fix(exec-server): retain output until streams close (#18946)
## Why

A Mac Bazel run hit a flake in
`server::handler::tests::output_and_exit_are_retained_after_notification_receiver_closes`
where the read path observed process exit but lost the expected buffered
stdout (`first\nsecond\n`). See the [GitHub Actions
job](https://github.com/openai/codex/actions/runs/24758468552/job/72436716505)
and [BuildBuddy
invocation](https://app.buildbuddy.io/invocation/37475a12-4ef2-45fb-ab8a-e49a2aba1d59).

The underlying race is that process exit is not the same thing as
stdout/stderr closure. If a child or grandchild inherits the pipe write
end, or a process duplicates it with `dup2`, the watched process can
exit while the stream is still open and more output can still arrive.
The exec-server was starting exited-process retention cleanup from the
exit event, so the process entry could be removed before the output
streams had actually closed.

While stress-testing the exec-server unit suite,
`server::handler::tests::long_poll_read_fails_after_session_resume`
exposed a separate test race: it started a short-lived command that
could exit and wake the pending long-poll read before the session-resume
assertion observed the resumed-session error. That test is intended to
cover resume eviction, not process-exit delivery, so this change keeps
the process alive and quiet while the second connection resumes the
session.

## What changed

- Keep exec-server process entries retained until stdout/stderr streams
close, then start the post-exit retention timer from the closed event.
- Wake long-poll readers when the closed event is emitted.
- Add focused `local_process` unit coverage that proves late output is
still retained after the short test retention interval has elapsed, and
that closed process entries are eventually evicted.
- Add a local and remote regression test where a parent exits while a
child keeps inherited stdout open. The child waits on an explicit
release file, so the test deterministically observes exit first,
releases the child, then requires a nonzero-wait read from the exit
sequence to receive the late output.
- In `codex-rs/exec-server/src/server/handler/tests.rs`, make
`long_poll_read_fails_after_session_resume` run a long-lived silent
command instead of a short command that prints and exits. This isolates
the test to session-resume behavior and prevents a normal process exit
from satisfying the pending long-poll read first.

## Testing

- `cargo test -p codex-exec-server
exec_process_retains_output_after_exit_until_streams_close`
- `cargo test -p codex-exec-server local_process::tests`
- `cargo test -p codex-exec-server`
- `just fix -p codex-exec-server`
- `bazel test //codex-rs/exec-server:exec-server-unit-tests
//codex-rs/exec-server:exec-server-exec_process-test
//codex-rs/exec-server:exec-server-file_system-test
//codex-rs/exec-server:exec-server-http_client-test
//codex-rs/exec-server:exec-server-initialize-test
//codex-rs/exec-server:exec-server-process-test
//codex-rs/exec-server:exec-server-websocket-test`
- `bazel test --runs_per_test=25
//codex-rs/exec-server:exec-server-unit-tests`

## Documentation

No docs update needed; this is an internal exec-server correctness fix.
2026-04-23 19:49:58 +00:00
Michael Bolin
9c0eced391 shell-escalation: carry resolved permission profiles (#18287)
## Why

Shell escalation still has adapter code that expects a legacy sandbox
policy, but command approvals should carry the resolved
`PermissionProfile` so callers can reason about the granted permissions
canonically.

## What changed

This introduces profile-shaped resolved escalation permissions while
retaining the derived legacy sandbox policy for the Unix escalation
adapter. It updates approval types, the escalation server protocol, and
tests that inspect escalated command permissions.

## Verification

- `cargo test -p codex-core --test all handle_container_exec_ --
--nocapture`
- `cargo test -p codex-core --test all handle_sandbox_ -- --nocapture`

























































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18287).
* #18288
* __->__ #18287
2026-04-23 12:46:19 -07:00
cassirer-openai
6d09b6752d [rollout_trace] Trace tool and code-mode boundaries (#18878)
## Summary

Extends rollout tracing across tool dispatch and code-mode runtime
boundaries. This records canonical tool-call lifecycle events and links
code-mode execution/wait operations back to the model-visible calls that
caused them.

## Stack

This is PR 3/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This PR is about attribution. Reviewers should focus on whether direct
tool calls, code-mode-originated tool calls, waits, outputs, and
cancellation boundaries are recorded with enough source information for
deterministic reduction without coupling the reducer to live runtime
internals.

The stack remains valid after this layer: tool and code-mode traces
reduce through the existing crate model, while the broader session and
multi-agent relationships are added in the next PR.
2026-04-23 12:22:11 -07:00
Michael Bolin
ff22982d75 mcp: include permission profiles in sandbox state (#18286)
## Why

MCP tool calls can receive a serialized `SandboxState` when a server
declares the sandbox-state capability. That state is one of the places
MCP runtimes learn what permissions Codex is operating under. As the
permissions migration makes `PermissionProfile` the canonical
representation, MCP consumers should be able to read that profile
directly instead of reconstructing permissions from the legacy
`SandboxPolicy`.

## What changed

- Adds optional `permissionProfile` to `codex_mcp::SandboxState`, while
keeping `sandboxPolicy` for existing MCP consumers.
- Populates `permissionProfile` from the current `TurnContext` when
serializing sandbox state for MCP tool calls.

## Verification

- Current GitHub Actions for this PR are passing.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18286).
* #18288
* #18287
* __->__ #18286
2026-04-23 12:21:26 -07:00
Michael Bolin
f90cc0ee64 tui: carry permission profiles on user turns (#18285)
## Why

Per-turn permission overrides should use the same canonical profile
abstraction as session configuration. That lets TUI submissions preserve
exact configured permissions without round-tripping through legacy
sandbox fields.

## What changed

This adds `permission_profile` to user-turn operations, threads it
through TUI/app-server submission paths, fills the new field in existing
test fixtures, and adds coverage that composer submission includes the
configured profile.

## Verification

- `cargo test -p codex-tui permissions -- --nocapture`
- `cargo test -p codex-core --test all permissions_messages --
--nocapture`























































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18285).
* #18288
* #18287
* #18286
* __->__ #18285
2026-04-23 11:54:17 -07:00
Rasmus Rygaard
f11583b8f6 Add remote thread config endpoint (#18908)
## Why

App-server needs a way to fetch thread-scoped config from the remote
thread config service when the user config opts into that behavior. This
mirrors the existing experimental remote thread store endpoint while
keeping local/noop behavior as the default.

Startup paths also need to avoid silently dropping the remote config
endpoint after the first config load. The stdio app-server path
discovers the endpoint from the initial config and installs the real
thread config loader for later config builds, while in-process clients
used by TUI/exec now select the same remote loader directly from their
provided config.

## What changed

- Added `experimental_thread_config_endpoint` to `ConfigToml`, `Config`,
and `core/config.schema.json`.
- Added config parsing coverage for the new setting.
- Updated app-server startup to select `RemoteThreadConfigLoader` from
the initially loaded config, falling back to `NoopThreadConfigLoader`
when unset.
- Let `ConfigManager` replace its thread config loader after startup
discovery so later config loads use the selected loader.
- Updated in-process app-server client startup to pass
`RemoteThreadConfigLoader` when its config has
`experimental_thread_config_endpoint` set.

## Verification

- Added `experimental_thread_config_endpoint_loads_from_config_toml`.
- Added
`runtime_start_args_use_remote_thread_config_loader_when_configured`.
- Ran `cargo check -p codex-app-server --lib`.
- Ran `cargo test -p codex-app-server-client`.
2026-04-23 11:46:06 -07:00
maja-openai
cff337e4e3 Use Auto-review wording for fallback rationale (#19168)
## Why

PR #18797 currently surfaces fallback rationale text that names Guardian
directly.

## What changed

- Updated the bare allow and bare deny fallback rationales in
`codex-rs/core/src/guardian/prompt.rs` from Guardian to Auto-review.
- Updated the existing bare allow parser test and added explicit bare
deny parser coverage.

## Verification

- `cargo test -p codex-core parse_guardian_assessment_treats_bare`
2026-04-23 11:42:43 -07:00
xl-openai
198eddd25d Move marketplace add/remove and startup sync out of core. (#19099)
Move more things to core-plugins.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-23 11:27:17 -07:00
Ruslan Nigmatullin
e9165b9f40 ci: add macOS keychain entitlements (#19167)
## Summary

- add macOS application and team identifiers to the release signing
entitlements
- add a Codex keychain access group for release-signed macOS binaries
- keep the existing JIT entitlement unchanged

## Why

Codex release binaries are signed with the OpenAI Developer ID team, but
the current entitlements plist only grants JIT. macOS Keychain and
Secure Enclave operations that create persistent keys can require the
process to carry an application identifier and keychain access group.
Adding these entitlements gives release-signed binaries a stable
Keychain namespace for Codex-owned device keys.

## Validation

- `plutil -lint
.github/actions/macos-code-sign/codex.entitlements.plist`
2026-04-23 11:20:58 -07:00
Ruslan Nigmatullin
8a0ab3fc13 app-server: add Unix socket transport (#18255)
## Summary
- add unix:// app-server transport backed by the shared codex-uds crate
- reuse the websocket connection loop for axum and tungstenite-backed
streams
- add codex app-server proxy to bridge stdio clients to the control
socket
- tolerate Windows UDS backends that report a missing rendezvous path as
connection refused before binding

## Tests
- cargo test -p codex-app-server
control_socket_acceptor_forwards_websocket_text_messages_and_pings
- cargo test -p codex-app-server
- just fmt
- just fix -p codex-app-server
- git -c core.fsmonitor=false diff --check
2026-04-23 11:09:25 -07:00
Eric Traut
c2423f42d1 Respect explicit untrusted project config (#18626)
## Why

Fixes #18475. A `-c` override such as `projects.<cwd>.trust_level =
"untrusted"` is meant to be a runtime config override, but app-server
thread startup treated any non-trusted project as eligible for automatic
trust persistence when a permissive sandbox/cwd was requested. That
meant an explicit `untrusted` session override could still cause
`config.toml` to be updated with `trusted`.

## What changed

The app-server auto-trust path now runs only when the active project
trust level is unknown. Explicit `trusted` and explicit `untrusted`
values are both respected, regardless of whether they came from
persisted config or session flags.

A focused `thread/start` test now covers the explicit `untrusted` case
with a permissive sandbox request.

## Verification

- `cargo test -p codex-app-server`
- `just fix -p codex-app-server`
2026-04-23 10:51:17 -07:00
Tom
f1061d9d07 [codex] Implement remote thread store methods (#19008) 2026-04-23 17:49:28 +00:00
Tom
f1923a38b1 [codex] Route live thread writes through ThreadStore (#18882)
Begin migrating the thread write codepaths to ThreadStore.

This starts using ThreadStore inside of core session code, not only in
the app server code.

Rework the interfaces around thread recording/persistence. We're left
with the following:

* `ThreadManager`: owns the process-level registry of loaded threads and
handles cross-thread orchestration: start, resume, fork, lookup, remove,
and route ops to running CodexThreads.
* `CodexThread`: represents one loaded/running thread from the outside.
It is the handle app-server and callers use to submit ops, inspect
session metadata, and shut the thread down.
* `LiveThread`: session-owned persistence lifecycle handle for one
active thread. Core session code uses it to append rollout items,
materialize lazy persistence, flush, shutdown, discard init-failed
writers, and load that thread’s persisted history.
* `ThreadStore`: storage backend abstraction. It answers “how are
threads persisted, read, listed, updated, archived?” Local and remote
implementations live behind this trait.
* `LocalThreadStore`: local ThreadStore implementation. It owns the
file/sqlite-specific details and keeps RolloutRecorder as a local
implementation detail.

This is a few too many Thread abstractions for my liking, but they do
all represent different concepts / needs / layers.

Migration note: in places where the core code explicitly requires a
path, rather than a thread ID, throw an error if we're running with a
remote store.

Cover the new local live-writer lifecycle with focused tests and
preserve app-server thread-start behavior, including ephemeral pathless
sessions.
2026-04-23 10:17:09 -07:00
David de Regt
3d3028a5a9 Add excludeTurns parameter to thread/resume and thread/fork (#19014)
For callers who expect to be paginating the results for the UI, they can
now call thread/resume or thread/fork with excludeturns:true so it will
not fetch any pages of turns, and instead only set up the subscription.
That call can be immediately followed by pagination requests to
thread/turns/list to fetch pages of turns according to the UI's current
interactions.
2026-04-23 10:07:59 -07:00
Rasmus Rygaard
0b4f694347 Add remote thread config loader protos (#18892)
## Why

Thread-scoped config needs a stable boundary between the app/session
owner and the config stack. Instead of having call sites manually copy
thread config fields into individual overrides, this adds the proto and
Rust plumbing needed for a `ThreadConfigLoader` implementation to return
typed sources that can be translated into ordinary config layer entries.

Keeping the remote payload typed also makes precedence easier to reason
about: session-owned thread config maps back to the existing session
config source, while user-owned thread config is represented separately
without introducing a new config-layer source until it has TOML-backed
fields.

## What changed

- Added the `codex.thread_config.v1` protobuf service and generated Rust
module for loading thread config sources.
- Added `RemoteThreadConfigLoader`, which calls the gRPC service, parses
`SessionThreadConfig` / `UserThreadConfig`, and validates provider
fields such as `wire_api`, auth timeout, and absolute auth cwd.
- Added proto generation tooling under
`config/scripts/generate-proto.sh` and
`config/examples/generate-proto.rs`.
- Added `ThreadConfigLoader::load_config_layers`, plus static/no-op
loader helpers, so tests and callers can use the same typed loader
interface while config-layer translation stays centralized.

## Verification

- `cargo test -p codex-config thread_config`
2026-04-23 10:06:05 -07:00
jif-oai
a2f868c9d6 feat: drop spawned-agent context instructions (#19127)
## Why

MultiAgentV2 children should not receive an extra model-visible
developer fragment just because they were spawned. The parent/configured
developer instructions should carry through normally, but the dedicated
`<spawned_agent_context>` block is no longer desired.

## What changed

- Removed the `SpawnAgentInstructions` context fragment and its
`<spawned_agent_context>` wrapper.
- Stopped appending spawned-agent instructions in
`codex-rs/core/src/tools/handlers/multi_agents_v2/spawn.rs`.
- Updated subagent notification coverage to assert inherited parent
developer instructions without expecting the spawned-agent wrapper.

## Verification

- `cargo test -p codex-core --test all
spawned_multi_agent_v2_child_inherits_parent_developer_context --
--nocapture`
- `cargo test -p codex-core --test all
skills_toggle_skips_instructions_for_parent_and_spawned_child --
--nocapture`
- `cargo test -p codex-core --test all subagent_notifications --
--nocapture`
2026-04-23 18:54:45 +02:00
xli-oai
e18bfeec91 [codex] Fix plugin marketplace help usage (#18710)
## Summary
- Updates generated CLI help for plugin marketplace commands to show the
full `codex plugin marketplace ...` namespace.
- Adds a regression test covering the marketplace command and its `add`,
`upgrade`, and `remove` help pages.

## Root Cause
The marketplace parser already lived under `codex plugin marketplace`,
but Clap generated usage text from the child parser's standalone command
name. That made help output show stale `codex marketplace ...`
instructions even though the top-level `codex marketplace` command no
longer parses.

## Validation
- `just fmt`
- `cargo test -p codex-cli`
- `./target/debug/codex plugin marketplace --help`
2026-04-23 09:48:37 -07:00
Michael Bolin
5c239ad748 tui: sync session permission profiles (#18284)
## Why

Once `SessionConfigured` carries the active `PermissionProfile`, the TUI
must treat that as authoritative session state. Otherwise the widget can
keep stale local permission details after a session is configured or
resumed.

The TUI also keeps a local `Config` copy used for later operations, so
session-sourced profiles and subsequent local sandbox changes need to
keep the derived split runtime permissions in sync. Because this PR may
land before the follow-up user-turn profile plumbing, embedded
app-server turns also need a standalone path for carrying local runtime
sandbox overrides.

## What changed

- Sync the chat widget runtime filesystem/network permissions from
`SessionConfigured.permission_profile`, with the legacy `sandbox_policy`
as the fallback.
- Recompute split runtime permissions whenever the TUI applies or
carries forward a local sandbox-policy override.
- Mark feature-driven Auto-review sandbox changes as runtime sandbox
overrides so the standalone embedded turn-start profile path is used
even without the follow-up user-turn profile PR.
- Send a turn-start `permissionProfile` for embedded,
non-ExternalSandbox turns when the TUI has a runtime sandbox override;
remote and ExternalSandbox turns keep using the legacy sandbox field.
- Extend coverage for profile sync, local sandbox changes,
ExternalSandbox fallback, feature-driven sandbox overrides, and
turn-start permission override selection.

## Verification

- `cargo test -p codex-tui
update_feature_flags_enabling_guardian_selects_auto_review`
- `cargo test -p codex-tui
turn_start_permission_overrides_send_profiles_only_for_embedded_runtime_overrides`
- `cargo test -p codex-tui permission_settings_sync`
- `cargo test -p codex-tui
session_configured_external_sandbox_keeps_external_runtime_policy`
- `cargo test -p codex-tui
session_configured_syncs_widget_config_permissions_and_cwd`
- `just fix -p codex-tui`



---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18284).
* #18288
* #18287
* #18286
* #18285
* __->__ #18284
2026-04-23 09:47:53 -07:00
Eric Traut
1fda843fbc Update safety check wording (#19149)
Updates wording of cyber safety check.
2026-04-23 08:53:25 -07:00
jif-oai
45e1742030 exec-server: wait for close after observed exit (#19130)
## Why

Windows CI can flake in
`server::handler::tests::output_and_exit_are_retained_after_notification_receiver_closes`
after a process has exited but before both output streams have closed.
`exec/read` returned immediately whenever `exited` was true, so callers
that had already observed the exit event could spin instead of
long-polling for the later `closed` state.

## What Changed

- Keep returning immediately when a terminal exit event is newly
observable.
- Allow later reads, after the caller has advanced past that event, to
wait for `closed` or new output until `wait_ms` expires.

## Verification

- CI pending.
2026-04-23 16:50:17 +02:00
jif-oai
d3b044938d Reject agents.max_threads with multi_agent_v2 (#19129)
## Why

`multi_agent_v2` uses the v2 agent lifecycle, so accepting the legacy
`agents.max_threads` limit alongside it creates conflicting
configuration semantics. Config load should fail early with a clear
error instead of allowing both knobs to be set.

## What Changed

- During config load, detect when the effective `multi_agent_v2` feature
is enabled and `agents.max_threads` is explicitly set.
- Return an `InvalidInput` error: `agents.max_threads cannot be set when
multi_agent_v2 is enabled`.

## Verification

- `cargo test -p codex-core multi_agent_v2_rejects_agents_max_threads`
passed locally with a temporary focused test for this behavior.
- `cargo test -p codex-core` was also run; the new focused path passed,
but the crate suite has unrelated pre-existing failures in managed
config/proxy/request-permissions tests.
2026-04-23 13:31:54 +02:00
Won Park
17ae906048 Fix auto-review config compatibility across protocol and SDK (#19113)
## Why

This keeps the partial Guardian subagent -> Auto-review rename
forward-compatible across mixed Codex installations. Newer binaries need
to understand the new `auto_review` spelling, but they cannot write it
to shared `~/.codex/config.toml` yet because older CLI/app-server
bundles only know `user` and `guardian_subagent` and can fail during
config load before recovering.

The Python SDK had the opposite compatibility gap: app-server responses
can contain `approvalsReviewer: "auto_review"`, but the checked-in
generated SDK enum did not accept that value.

## What Changed

- Keep `ApprovalsReviewer::AutoReview` readable from both
`guardian_subagent` and `auto_review`, while serializing it as
`guardian_subagent` in both protocol crates.
- Update TUI Auto-review persistence tests so enabling Auto-review
writes `approvals_reviewer = "guardian_subagent"` while UI copy still
says Auto-review.
- Map managed/cloud `feature_requirements.auto_review` to the existing
`Feature::GuardianApproval` gate without adding a broad local
`[features].auto_review` key or changing config writes.
- Add `auto_review` to the Python SDK `ApprovalsReviewer` enum and cover
`ThreadResumeResponse` validation.

## Testing

- `cargo test -p codex-protocol approvals_reviewer`
- `cargo test -p codex-app-server-protocol approvals_reviewer`
- `cargo test -p codex-tui
update_feature_flags_enabling_guardian_selects_auto_review`
- `cargo test -p codex-tui
update_feature_flags_enabling_guardian_in_profile_sets_profile_auto_review_policy`
- `cargo test -p codex-core
feature_requirements_auto_review_disables_guardian_approval`
- `pytest
sdk/python/tests/test_client_rpc_methods.py::test_thread_resume_response_accepts_auto_review_reviewer`
- `git diff --check`
2026-04-23 03:12:56 -07:00
Abhinav
305825abd9 Support MCP tools in hooks (#18385)
## Summary

Lifecycle hooks currently treat `PreToolUse`, `PostToolUse`, and
`PermissionRequest` as Bash-only flows
- hook schema constrains `tool_name` to `Bash`
- hook input assumes a command-shaped `tool_input`
- core hook dispatch path passes only shell command strings

That means hooks cannot target MCP tools even though MCP tool names are
model-visible and stable

This change generalizes those hook paths so they can match and receive
payloads for MCP tools while preserving the existing Bash behavior.

## Reviewer Notes

I think these are the key files
- `codex-rs/core/src/tools/handlers/mcp.rs`
- `codex-rs/core/src/mcp_tool_call.rs`

Otherwise the changes across apply_patch, shell, and unified_exec are
mainly to rewire everything to be `tool_input` based instead of just
`command` so that it'll make sense for MCP tools.

## Changes

- Allow `PreToolUse`, `PostToolUse`, and `PermissionRequest` hook inputs
to carry arbitrary `tool_name` and `tool_input` values instead of
hard-coding `Bash` and command-only payloads.
- Add MCP hook payload support through `McpHandler`, using the
model-visible tool name from `ToolInvocation` and the raw MCP arguments
as `tool_input`.
- Include MCP tool responses in `PostToolUse` by serializing
`McpToolOutput` into the hook response payload.
- Run `PermissionRequest` hooks for MCP approval requests after
remembered approval checks and before falling back to user-facing MCP
elicitation.
- Preserve exact matching for literal hook matchers like `Bash` and
`mcp__memory__create_entities`, while keeping regex matcher support for
patterns like `mcp__memory__.*` and `mcp__.*__write.*`.

---------

Co-authored-by: Andrei Eternal <eternal@openai.com>
Co-authored-by: Codex <noreply@openai.com>
2026-04-23 07:33:57 +00:00
Michael Bolin
8bc667b07b app-server: include filesystem entries in permission requests (#19086)
## Why

`item/permissions/requestApproval` sends a requested permission profile
to app-server clients. The core profile already stores filesystem
permissions as `entries`, but the v2 compatibility conversion used the
legacy `read`/`write` projection whenever possible and left `entries`
unset.

That made the request ambiguous for clients that consume the canonical
v2 shape: `permissions.fileSystem.entries` was missing even though
filesystem access was being requested. A client that rendered or echoed
grants from `entries` could treat the request as having no filesystem
permission entries, then return an empty or incomplete grant. The
app-server intersects responses with the original request, so omitted
filesystem permissions are denied.

## What Changed

- Populate `AdditionalFileSystemPermissions.entries` when converting
legacy read/write roots for request permission payloads, while
preserving `read` and `write` for compatibility.
- Mark `read` and `write` as transitional schema fields in the generated
app-server schema.
- Add regression coverage for the v2 conversion, the app-server
`item/permissions/requestApproval` round trip, and TUI app-server
approval conversion expectations.
- Refresh generated JSON and TypeScript schema fixtures.

## Verification

- `just fmt`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server request_permissions_round_trip`
- `cargo test -p codex-tui
converts_request_permissions_into_granted_permissions`
- `cargo test -p codex-tui
resolves_permissions_and_user_input_through_app_server_request_id`
2026-04-23 00:21:59 -07:00
Shijie Rao
993e3f407e Persist target default reasoning on model upgrade (#19085)
## Why

When the TUI upgrade flow moves a user to a newer model, the accepted
migration should also persist the target model's default reasoning
effort. That keeps the upgraded model and reasoning setting aligned
instead of carrying forward a stale previously saved effort from the old
model.

## What changed

- The accepted model migration path now updates in-memory config, TUI
state, and persisted model selection with the target preset's
`default_reasoning_effort`.
- The upgrade destructuring keeps `reasoning_effort_mapping` explicitly
unused because mappings are no longer consulted on accepted migrations.
- Added a catalog test that starts with a pre-existing saved reasoning
effort and verifies the accepted upgrade overwrites it with the target
model default and emits the expected persistence events.
- Rebasing onto current `main` also updates a TUI thread-session test
helper for the latest `permission_profile` field and
`ApprovalsReviewer::AutoReview` rename so CI compiles on the new base.

## Verification

- `cargo test -p codex-tui model_catalog`
- `cargo test -p codex-tui
permission_settings_sync_updates_active_snapshot_without_rewriting_side_thread`
2026-04-22 23:36:15 -07:00
Gav Verma
2ef2d675d6 Clarify cloud requirements error messages (#19078)
## Why
The current cloud-requirements failures say `workspace-managed config`,
which is ambiguous and can read like it refers to local managed config
such as `managed_config.toml`.

This code path only applies to cloud requirements, so the user-facing
message should name that source directly.

## What changed
- Updated the load failure in
[`codex-rs/cloud-requirements/src/lib.rs`](46e704d1f9/codex-rs/cloud-requirements/src/lib.rs)
to say `failed to load cloud requirements (workspace-managed policies)`.
- Updated the parse failure in the same file to use the same `cloud
requirements (workspace-managed policies)` terminology.
- Kept `workspace-managed` hyphenated because it is used as a compound
modifier.
- Updated the matching assertion in
[`codex-rs/app-server/src/codex_message_processor.rs`](46e704d1f9/codex-rs/app-server/src/codex_message_processor.rs).
- Reused `CLOUD_REQUIREMENTS_LOAD_FAILED_MESSAGE` in the
`codex-cloud-requirements` test where the test is asserting that
crate-local contract directly.

## Testing
`cargo test -p codex-cloud-requirements`
2026-04-22 23:07:08 -07:00
xl-openai
951be1a8a1 feat: Warn and continue on unknown feature requirements (#19038)
Requirements feature flags now fail open like config feature flags, but
with a startup warning.

<img width="443" height="68" alt="image"
src="https://github.com/user-attachments/assets/76767fa7-8ce8-4fc7-8a09-902fcdda6298"
/>
2026-04-22 22:50:44 -07:00
xl-openai
fb6308cf64 Use remote plugin IDs for detail reads and enlarge list pages (#19079)
1. For remote plugin use plugin id (plugin name) directly for read
plugin details;
2. Request up to 200 remote plugins per directory list page.
2026-04-22 22:50:20 -07:00
Leo Shimonaka
7730fb3ab8 Add computer_use feature requirement key (#19071)
## Summary
- add the `computer_use` requirements-only feature key
- include it in generated config schema output
- cover the new key in feature metadata tests

## Testing
- `cargo test -p codex-features`
- `just write-config-schema`
- `just fmt`
- `just fix -p codex-features`

cc @xl-openai

---------

Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
2026-04-22 22:49:26 -07:00
Eric Traut
08b5e96678 TUI: preserve permission state after side conversations (#18924)
Addresses #18854

## Why

The `/permissions` selector updates the active TUI session state, but
the cached session snapshot used when replaying a thread could still
contain the old approval or sandbox settings. After opening and leaving
`/side`, the main thread replay could restore those stale settings into
the `ChatWidget`, so the UI and the next submitted turn could fall back
to the old permission mode.

## What

- Sync the active thread's cached `ThreadSessionState` whenever approval
policy, sandbox policy, or approval reviewer changes.

## Verification

Confirmed bug prior to fix and correct behavior after fix.
2026-04-22 22:40:35 -07:00
Abhinav
23afa173f4 Mark codex_hooks stable (#19012)
# Why

Hooks are ready to graduate to GA in the next release!

# What

- Moves `Feature::CodexHooks` into the stable feature group.
- Marks the `codex_hooks` feature spec as `Stage::Stable` and
default-enabled.
2026-04-23 05:34:05 +00:00
Michael Bolin
9d824cf4b4 app-server: accept command permission profiles (#18283)
## Why

`command/exec` is another app-server entry point that can run under
caller-provided permissions. It needs to accept `PermissionProfile`
directly so command execution is not left behind on `SandboxPolicy`
while thread APIs move forward.

Command-level profiles also need to preserve the semantics clients
expect from profile-relative paths. `:cwd` and cwd-relative deny globs
should be anchored to the resolved command cwd for a command-specific
profile, while configured deny-read restrictions such as `**/*.env =
none` still need to be enforced because they can come from config or
requirements rather than the command override itself.

## What Changed

This adds `permissionProfile` to `CommandExecParams`, rejects requests
that combine it with `sandboxPolicy`, and converts accepted profiles
into the runtime filesystem/network permissions used for command
execution.

When a command supplies a profile, the app-server resolves that profile
against the command cwd instead of the thread/server cwd. It also
preserves configured deny-read entries and `globScanMaxDepth` on the
effective filesystem policy so one-off command overrides cannot drop
those read protections. The PR also updates app-server docs/schema
fixtures and adds command-exec coverage for accepted, rejected,
cwd-scoped, and deny-read-preserving profile paths.

## Verification

- `cargo test -p codex-app-server
command_exec_permission_profile_cwd_uses_command_cwd`
- `cargo test -p codex-app-server
command_profile_preserves_configured_deny_read_restrictions`
- `cargo test -p codex-app-server
command_exec_accepts_permission_profile`
- `cargo test -p codex-app-server
command_exec_rejects_sandbox_policy_with_permission_profile`
- `just fix -p codex-app-server`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18283).
* #18288
* #18287
* #18286
* #18285
* #18284
* __->__ #18283
2026-04-22 22:33:16 -07:00
Eric Traut
bbff4ee61a Add safety check notification and error handling (#19055)
Adds a new app-server notification that fires when a user account has
been flagged for potential safety reasons.
2026-04-22 22:24:12 -07:00
Shijie Rao
02170996e6 Default Fast service tier for eligible ChatGPT plans (#19053)
## Why

Enterprise and business-like ChatGPT plans should get Codex's Fast
service tier by default when the user or caller has not made an explicit
service-tier choice. At the same time, callers need a durable way to
choose standard routing without adding a new persisted `standard`
service tier value. This keeps existing config compatibility while
letting core own the managed default policy.

## What changed

- Resolve the effective service tier in core at session creation:
explicit `fast` or `flex` wins, explicit null/clear or
`[notice].fast_default_opt_out = true` resolves to standard routing, and
otherwise eligible ChatGPT plans resolve to Fast when FastMode is
enabled.
- Add `[notice].fast_default_opt_out` as the persisted opt-out marker
for managed Fast defaults.
- Treat app-server/TUI `service_tier: null` as an explicit
standard/clear choice by preserving that intent through config loading.
- Update TUI rendering to use core's effective service tier for startup
and status surfaces while still keeping `config.service_tier` as the
explicit configured choice.
- Update `/fast off` to clear `service_tier`, persist the opt-out
marker, and send explicit standard for subsequent turns.

## Verification

- Added unit coverage for config override/notice handling, service-tier
resolution, runtime null clearing, and `/fast off` turn propagation.
- `cargo build -p codex-cli`

Full test suite was not run locally per author request.
2026-04-22 21:54:44 -07:00
Michael Bolin
082fc4f632 protocol: report session permission profiles (#18282)
## Why

Clients that observe `SessionConfigured` need the same canonical
permission view that app-server thread responses provide. Reporting the
profile in protocol events lets clients keep their local state
synchronized without reinterpreting legacy sandbox fields.

## What changed

This adds `permission_profile` to `SessionConfigured` and propagates it
through core, exec JSON output, MCP server messages, and TUI
history/widget handling.

## Verification

- `cargo test -p codex-tui permissions -- --nocapture`
- `cargo test -p codex-core --test all permissions_messages --
--nocapture`















































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18282).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* __->__ #18282
2026-04-22 21:29:32 -07:00
Andrei Eternal
2b2de3f38b codex: support hooks in config.toml and requirements.toml (#18893)
## Summary

Support the existing hooks schema in inline TOML so hooks can be
configured from both `config.toml` and enterprise-managed
`requirements.toml` without requiring a separate `hooks.json` payload.

This gives enterprise admins a way to ship managed hook policy through
the existing requirements channel while still leaving script delivery to
MDM or other device-management tooling, and it keeps `hooks.json`
working unchanged for existing users.

This also lays the groundwork for follow-on managed filtering work such
as #15937, while continuing to respect project trust gating from #14718.
It does **not** implement `allow_managed_hooks_only` itself.

NOTE: yes, it's a bit unfortunate that the toml isn't formatted as
closely as normal to our default styling. This is because we're trying
to stay compatible with the spec for plugins/hooks that we'll need to
support & the main usecase here is embedding into requirements.toml

## What changed

- moved the shared hook serde model out of `codex-rs/hooks` into
`codex-rs/config` so the same schema can power `hooks.json`, inline
`config.toml` hooks, and managed `requirements.toml` hooks
- added `hooks` support to both `ConfigToml` and
`ConfigRequirementsToml`, including requirements-side `managed_dir` /
`windows_managed_dir`
- treated requirements-managed hooks as one constrained value via
`Constrained`, so managed hook policy is merged atomically and cannot
drift across requirement sources
- updated hook discovery to load requirements-managed hooks first, then
per-layer `hooks.json`, then per-layer inline TOML hooks, with a warning
when a single layer defines both representations
- threaded managed hook metadata through discovered handlers and exposed
requirements hooks in app-server responses, generated schemas, and
`/debug-config`
- added hook/config coverage in `codex-rs/config`, `codex-rs/hooks`,
`codex-rs/core/src/config_loader/tests.rs`, and
`codex-rs/core/tests/suite/hooks.rs`

## Testing

- `cargo test -p codex-config`
- `cargo test -p codex-hooks`
- `cargo test -p codex-app-server config_api`

## Documentation

Companion updates are needed in the developers website repo for:

- the hooks guide
- the config reference, sample, basic, and advanced pages
- the enterprise managed configuration guide

---------

Co-authored-by: Michael Bolin <mbolin@openai.com>
2026-04-22 21:20:09 -07:00
Michael Bolin
9955eacd22 tui: fix approvals popup disabled shortcut test (#19072)
## Why

This regressed in #19063, which made `GuardianApproval` stable and
enabled by default. That adds an enabled `Auto-review` row to the
permissions popup, but `approvals_popup_navigation_skips_disabled` still
assumed the disabled `Full Access` row lived behind a hard-coded numeric
shortcut, so the test started selecting a different row and closing the
popup instead of verifying disabled-row behavior.

## What

- disable `GuardianApproval` in
`approvals_popup_navigation_skips_disabled` so the popup layout matches
the scenario the test is exercising
- choose the hidden numeric shortcut for the disabled `Full Access` row
by platform (`2` on non-Windows, `3` on Windows where `Read Only` is
shown) before asserting that selecting the disabled row leaves the popup
open

## Testing

- `cargo test -p codex-tui --lib
chatwidget::tests::permissions::approvals_popup_navigation_skips_disabled
-- --exact --nocapture`
- `cargo test -p codex-tui --lib chatwidget::tests::permissions --
--nocapture`
- `cargo test -p codex-tui`
2026-04-22 21:01:26 -07:00
Michael Bolin
e8ba912fcc test: set Rust test thread stack size (#19067)
## Summary

Set `RUST_MIN_STACK=8388608` for Rust test entry points so
libtest-spawned test threads get an 8 MiB stack.

The Windows BuildBuddy failure on #18893 showed
`//codex-rs/tui:tui-unit-tests` exiting with a stack overflow in a
`#[tokio::test]` even though later test binaries in the shard printed
successful summaries. Default `#[tokio::test]` uses a current-thread
Tokio runtime, which means the async test body is driven on libtest's
std-spawned test thread. Increasing the test thread stack addresses that
failure mode directly.

To date, we have been fixing these stack-pressure problems with
localized future-size reductions, such as #13429, and by adding
`Box::pin()` in specific async wrapper chains. This gives us a baseline
test-runner stack size instead of continuing to patch individual tests
only after CI finds another large async future.

## What changed

- Added `common --test_env=RUST_MIN_STACK=8388608` in `.bazelrc` so
Bazel test actions receive the env var through Bazel's cache-keyed test
environment path.
- Set the same `RUST_MIN_STACK` value for Cargo/nextest CI entry points
and `just test`.
- Annotated the existing Windows Bazel linker stack reserve as 8 MiB so
it stays aligned with the libtest thread stack size.

## Testing

- `just --list`
- parsed `.github/workflows/rust-ci.yml` and
`.github/workflows/rust-ci-full.yml` with Ruby's YAML loader
- compared `bazel aquery` `TestRunner` action keys before/after explicit
`--test_env=RUST_MIN_STACK=...` and after moving the Bazel env to
`.bazelrc`
- `bazel test //codex-rs/tui:tui-unit-tests --test_output=errors`
- failed locally on the existing sandbox-specific status snapshot
permission mismatch, but loaded the Starlark changes and ran the TUI
test shards
2026-04-22 19:51:49 -07:00
Dylan Hurd
5e71da1424 feat(request-permissions) approve with strict review (#19050)
## Summary
Allow the user to approve a request_permissions_tool request with the
condition that all commands in the rest of the turn are reviewed by
guardian, regardless of sandbox status.

## Testing
- [x] Added unit tests
- [x] Ran locally
2026-04-23 01:56:32 +00:00
Dylan Hurd
c6ab601824 chore(auto-review) feature => stable (#19063)
## Summary
Turn on Auto Review

## Testing
- [x] Update unit tests
2026-04-22 18:51:39 -07:00
Matthew Zeng
8f0a92c1e5 Fix relative stdio MCP cwd fallback (#19031) 2026-04-22 17:52:17 -07:00
Michael Bolin
3cc3763e6c core: box multi-agent wrapper futures (#19059)
## Why

While debugging the Windows stack overflows we saw in
[#13429](https://github.com/openai/codex/pull/13429) and then again in
[#18893](https://github.com/openai/codex/pull/18893), I hit another
overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.

That test drives the legacy multi-agent spawn / close / resume path. The
behavior was fine, but several thin async wrappers were still inlining
much larger `AgentControl` futures into their callers, which was enough
to overflow the default Windows stack.

## What

- Box the thin `AgentControl` wrappers around `spawn_agent_internal`,
`resume_single_agent_from_rollout`, and `shutdown_agent_tree`.
- Box the corresponding legacy `multi_agents` handler calls in `spawn`,
`resume_agent`, and `close_agent`.
- Keep behavior unchanged while reducing future size on this call path
so the Windows test no longer overflows its stack.

## Testing

- `cargo test -p codex-core --lib
tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed
-- --exact --nocapture`
- `cargo test -p codex-core` (this still hit unrelated local
integration-test failures because `codex.exe` / `test_stdio_server.exe`
were not present in this shell; the relevant unit tests passed)
2026-04-22 17:48:13 -07:00
Ahmed Ibrahim
0e78ce80ee [3/4] Add executor-backed RMCP HTTP client (#18583)
### Why
The RMCP layer needs a Streamable HTTP client that can talk either
directly over `reqwest` or through the executor HTTP runner without
duplicating MCP session logic higher in the stack. This PR adds that
client-side transport boundary so remote Streamable HTTP MCP can reuse
the same RMCP flow as the local path.

### What
- Add a shared `rmcp-client/src/streamable_http/` module with:
  - `transport_client.rs` for the local-or-remote transport enum
  - `local_client.rs` for the direct `reqwest` implementation
  - `remote_client.rs` for the executor-backed implementation
  - `common.rs` for the small shared Streamable HTTP helpers
- Teach `RmcpClient` to build Streamable HTTP transports in either local
or remote mode while keeping the existing OAuth ownership in RMCP.
- Translate remote POST, GET, and DELETE session operations into
executor `http/request` calls.
- Preserve RMCP session expiry handling and reconnect behavior for the
remote transport.
- Add remote transport coverage in
`rmcp-client/tests/streamable_http_remote.rs` and keep the shared test
support in `rmcp-client/tests/streamable_http_test_support.rs`.

### Verification
- `cargo check -p codex-rmcp-client`
- online CI

### Stack
1. #18581 protocol
2. #18582 runner
3. #18583 RMCP client
4. #18584 manager wiring and local/remote coverage

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-22 17:38:04 -07:00
Won Park
83ec1eb5d6 Rename approvals reviewer variant to auto-review (#19056)
## Why

`approvals_reviewer` now uses `auto_review` as the canonical config/API
value after #18504, but the Rust enum variant and nearby helper/test
names still used `GuardianSubagent` / guardian approval wording. That
made follow-up code and reviews confusing even though the external value
had already moved to Auto-review.

## What changed

- Renamed `ApprovalsReviewer::GuardianSubagent` to
`ApprovalsReviewer::AutoReview`.
- Updated protocol, app-server, config, core, TUI, exec, and analytics
test callsites.
- Renamed nearby helper/test names from guardian approval wording to
Auto-review wording where they refer to the approvals reviewer mode.
- Preserved wire compatibility:
  - `auto_review` remains the canonical serialized value.
  - `guardian_subagent` remains accepted as a legacy alias.

This intentionally does not rename the `[features].guardian_approval`
key, `Feature::GuardianApproval`, `core/src/guardian`, analytics event
names, or app-server Guardian review event types.

## Verification

- `cargo test -p codex-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent`
- `cargo test -p codex-app-server-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent`
- `cargo test -p codex-config approvals_reviewer`
- `cargo test -p codex-tui update_feature_flags`
- `cargo test -p codex-core permissions_instructions`
- `cargo test -p codex-tui permissions_selection`
2026-04-22 17:22:35 -07:00
Andrei Eternal
eed0e07825 hooks: emit Bash PostToolUse when exec_command completes via write_stdin (#18888)
Fixes #16246.

## Why

`exec_command` already emits `PreToolUse`, but long-running unified exec
commands that finish on a later `write_stdin` poll could miss the
matching `PostToolUse`. That left the Bash hook lifecycle inconsistent,
broke expectations around `tool_use_id` and `tool_input.command`, and
meant `PostToolUse` block/replacement feedback could fail to replace the
final session output before it reached model context.

This keeps the fix scoped to the `exec_command` / `write_stdin`
lifecycle. Broader non-Bash hook expansion is still out of scope here
and remains tracked separately in #16732.

## What changed

- Compute and store `PostToolUsePayload` while handlers still have
access to their concrete output type, and carry `tool_use_id` through
that payload.
- Preserve the original hook-facing `exec_command` string through
unified exec state (`ExecCommandRequest`, `ProcessEntry`,
`PreparedProcessHandles`, and `ExecCommandToolOutput`) via
`hook_command`, and remove the now-unused `session_command` output
metadata.
- Emit exactly one Bash `PostToolUse` for long-running `exec_command`
sessions when a later `write_stdin` poll observes final completion,
using the original `exec_command` call id and hook-facing command.
- Keep one-shot `exec_command` behavior aligned with the same payload
construction, including interactive completions that return a final
result directly.
- Apply `PostToolUse` block/replacement feedback before the final
`write_stdin` completion output is sent back to the model.
- Keep `write_stdin` itself out of `PreToolUse` matching so it continues
to act as transport/polling for the original Bash tool call.
- Restore plain matcher behavior for tool-name matchers such as `Bash`
and `Edit|Write`, while still treating patterns with regex characters
(for example `mcp__.*`) as regexes.
- Add unit coverage for unified exec payload construction and parallel
session separation, plus a core integration regression that verifies a
blocked `PostToolUse` replaces the final `write_stdin` output in model
context.

## Testing

- `cargo test -p codex-hooks`
- `cargo test -p codex-core post_tool_use_payload`
- `cargo test -p codex-core
post_tool_use_blocks_when_exec_session_completes_via_write_stdin`
2026-04-22 17:14:22 -07:00
Michael Bolin
6ca038bbd1 rollout: persist turn permission profiles (#18281)
## Why

Resume and reconstruction need to preserve the permissions that were
active for each user turn. If rollouts only keep legacy sandbox fields,
replay cannot faithfully represent profile-shaped overrides introduced
earlier in the stack.

## What changed

This records `permission_profile` on user-turn rollout events,
reconstructs it through history/state extraction, and updates rollout
reconstruction and related fixtures to keep the field explicit.

## Verification

- `cargo test -p codex-core --test all permissions_messages --
--nocapture`
- `cargo test -p codex-core --test all request_permissions --
--nocapture`











































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18281).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* __->__ #18281
2026-04-22 17:00:29 -07:00
Michael Bolin
bc083e4713 clients: send permission profiles to app-server (#18280)
## Why

After app-server can accept `PermissionProfile`, first-party clients
should stop preferring legacy sandbox fields when canonical permission
information is available. This keeps the migration moving without
removing legacy compatibility yet.

The client side still has mixed surfaces during the stack: embedded
thread start/resume/fork and exec initial turns can derive a profile
directly from local config, while TUI remote sessions and some
turn-start paths only have a legacy/server-context-safe sandbox
projection. Those paths keep sending legacy sandbox fields rather than
synthesizing or sending lossy/local-only profiles.

## What changed

- Sends `permissionProfile` from exec and embedded TUI thread
start/resume/fork requests when config has a representable profile.
- Keeps legacy sandbox fallback for external sandbox policies, TUI
remote thread lifecycle requests, and TUI turn-start requests that do
not yet carry the active profile.
- Sends the actual config-derived `permissionProfile` for exec initial
turns instead of rebuilding one from the legacy sandbox projection.
- Stores response `permissionProfile` as optional in TUI session state
so external sandbox responses and compatibility payloads preserve
`null`.
- Updates tests for request construction and response mapping.

## Verification

- `cargo check --tests -p codex-tui -p codex-exec`
- `cargo test -p codex-tui app_server_session -- --nocapture`
- `cargo test -p codex-exec thread_start_params -- --nocapture`
- `cargo test -p codex-tui
app_server_session::tests::thread_lifecycle_params -- --nocapture`
- `just fix -p codex-tui -p codex-exec`
- `just fix -p codex-tui`












---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18280).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* __->__ #18280
2026-04-22 16:34:13 -07:00
Michael Bolin
44dbd9e48a exec-server: require explicit filesystem sandbox cwd (#19046)
## Why

This is a cleanup PR for the `PermissionProfile` migration stack. #19016
fixed remote exec-server sandbox contexts so Docker-backed filesystem
requests use a request/container `cwd` instead of leaking the local test
runner `cwd`. That exposed the broader API problem:
`FileSystemSandboxContext::new(SandboxPolicy)` could still reconstruct
filesystem permissions by reading the exec-server process cwd with
`AbsolutePathBuf::current_dir()`.

That made `cwd`-dependent legacy entries, such as `:cwd`,
`:project_roots`, and relative deny globs, depend on ambient process
state instead of the request sandbox `cwd`. As later PRs make
`PermissionProfile` the primary permissions abstraction, sandbox
contexts should be explicit about whether they carry a request `cwd` or
are profile-only. Removing the implicit constructor prevents new call
sites from accidentally rebuilding permissions against the wrong `cwd`.

## What changed

- Removed `FileSystemSandboxContext::new(SandboxPolicy)`.
- Kept production callers on explicit constructors:
`from_legacy_sandbox_policy(..., cwd)`, `from_permission_profile(...)`,
and `from_permission_profile_with_cwd(...)`.
- Updated exec-server test helpers to construct `PermissionProfile`
values directly instead of routing through legacy `SandboxPolicy`
projections.
- Updated the environment regression test to use an explicit restricted
profile with no synthetic `cwd`.

## Verification

- `cargo test -p codex-exec-server`
- `just fix -p codex-exec-server`


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19046).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* __->__ #19046
2026-04-22 23:05:12 +00:00
Won Park
46142c3cb0 Rebrand approvals reviewer config to auto-review (#18504)
### Why

Auto-review is the user-facing name for the approvals reviewer, but the
config/API value still exposed the old `guardian_subagent` name. That
made new configs and generated schemas point users at Guardian
terminology even though the intended product surface is Auto-review.

This PR updates the external `approvals_reviewer` value while preserving
compatibility for existing configs and clients.

### What changed

- Makes `auto_review` the canonical serialized value for
`approvals_reviewer`.
- Keeps `guardian_subagent` accepted as a legacy alias.
- Keeps `user` accepted and serialized as `user`.
- Updates generated config and app-server schemas so
`approvals_reviewer` includes:
  - `user`
  - `auto_review`
  - `guardian_subagent`
- Updates app-server README docs for the reviewer value.
- Updates analytics and config requirements tests for the canonical
auto_review value.


### Compatibility

Existing configs and API payloads using:

```toml
approvals_reviewer = "guardian_subagent"
```

continue to load and map to the Auto-review reviewer behavior. 

New serialization emits: 
```toml
approvals_reviewer = "auto_review" 
```

This PR intentionally does not rename the [features].guardian_approval
key or broad internal Guardian symbols. Those are split out for a
follow-up PR to keep this migration small and avoid touching large
TUI/internal surfaces.

**Verification**
cargo test -p codex-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent
cargo test -p codex-app-server-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent
2026-04-22 15:45:35 -07:00
Konstantine Kahadze
0e25c5ff42 Update bundled OpenAI Docs skill freshness check (#19043)
## Summary

Sync the bundled `openai-docs` system skill with the already-merged
`openai/skills` update from https://github.com/openai/skills/pull/360.

Codex bundles system skills from `codex-rs/skills/src/assets/samples`,
so this PR copies the same GPT-5.4 OpenAI Docs skill update into the
Codex app/CLI bundle path.

## Changes

- Add the latest-model resolver script to the bundled `openai-docs`
skill.
- Route model upgrade and prompt-upgrade requests through remote
latest-model metadata when current guidance is needed.
- Rename bundled fallback references to `upgrade-guide.md` and
`prompting-guide.md`.
- Keep the bundled fallback guidance GPT-5.4-only.

## Validation

- Verified this bundled skill is byte-for-byte identical to
`openai/skills@origin/main` `skills/.system/openai-docs`.
- Ran the resolver locally and confirmed it returns `gpt-5.4` /
`gpt-5p4`.
2026-04-22 22:31:04 +00:00
khoi
568cdacc7e [Codex] Register browser requirements feature keys (#18956)
## Summary
- register `in_app_browser` and `browser_use` as stable feature keys
- allow requirements/MDM feature requirements to pin those desktop
browser controls
- add coverage for browser requirements being accepted by config loading

## Testing
- `cargo fmt --all` (`just fmt` unavailable locally; rustfmt warned
about nightly-only `imports_granularity` config)
- `cargo test -p codex-features`
- `cargo test -p codex-core browser_feature_requirements_are_valid`
- Tested manually by setting in `requirements.toml` and seeing after app
restart state to reflect the setting was correct (at the time hiding the
`Browser Use` setting when the enterprise setting was set to false
2026-04-22 15:27:15 -07:00
joeytrasatti-openai
ee70b365ab Overlay state DB git metadata for filtered thread lists (#19036)
## Summary
- Factor the state DB `ThreadMetadata` to rollout `ThreadItem` mapping
into a shared helper used by both DB pages and filesystem overlays
- Generalize filtered filesystem list overlays to fill missing thread
list metadata from the state-derived `ThreadItem`, while preserving
filesystem `path` and `thread_id`
- Add coverage for the merge behavior so existing filesystem values are
not overwritten and future `ThreadItem` fields require an explicit
decision

## Testing
- `just fmt` from `codex-rs`
- `git diff --check -- codex-rs/rollout/src/recorder.rs
codex-rs/rollout/src/recorder_tests.rs`
- Attempted `cargo test -p codex-rollout thread_item_metadata` from
`codex-rs`; blocked in dependency fetch/setup after updating crates.io
and git submodules `https://github.com/livekit/protocol` and
`https://chromium.googlesource.com/libyuv/libyuv`, so the focused tests
did not run
2026-04-22 14:59:20 -07:00
Michael Bolin
d3dd0d759b exec-server: expose arg0 alias root to fs sandbox (#19016)
## Why

The post-merge `rust-ci-full` run for #18999 still failed the Ubuntu
remote `suite::remote_env` sandboxed filesystem tests. That run checked
out merge commit `ddde50c611e4800cb805f243ed3c50bbafe7d011`, so the arg0
guard lifetime fix was present.

The Docker-backed failure had two remaining pieces:

- The sandboxed filesystem helper needs to execute Codex through the
`codex-linux-sandbox` arg0 alias path. The helper sandbox was only
granting read access to the real Codex executable parent, so the alias
parent also has to be visible inside the helper sandbox.
- The remote-env tests were building sandbox contexts with
`FileSystemSandboxContext::new()`, which captures the local test runner
cwd. In the Docker remote exec-server, that host checkout path does not
exist, so spawning the filesystem helper failed with `No such file or
directory` before the helper could process the request.

## What Changed

- Track all helper runtime read roots instead of a single root.
- Add both the real Codex executable parent and the
`codex-linux-sandbox` alias parent to sandbox readable roots.
- Avoid sending an unused local cwd in remote filesystem sandbox
contexts when the permission profile has no cwd-dependent entries.
- Build the Docker remote-env test sandbox contexts with a cwd path that
exists inside the container.
- Add unit coverage for the alias-parent root and remote sandbox cwd
handling.

## Verification

- `cargo test -p codex-exec-server`
- `cargo test -p codex-core
remote_test_env_sandboxed_read_allows_readable_root`
- `just fix -p codex-exec-server`
- `just fix -p codex-core`
2026-04-22 21:34:22 +00:00
Leo Shimonaka
16eeeb534a Fix MCP permission policy sync (#19033)
###### Why/Context/Summary

Repro: start a session outside Full Access, switch permissions to Full
Access, then submit a new turn that triggers MCP/CUA permission
handling.

The turn used the live Full Access `SessionConfiguration`, but the MCP
coordinator was still synced from the stale `original_config_do_not_use`
/ per-turn config copy. That left the coordinator with an old sandbox
policy, so empty MCP permission elicitations could be denied instead of
auto-accepted.

Fix: update/rebuild the MCP connection manager from the live
turn/session approval and sandbox policy fields.

###### Test plan

```sh
just fmt
cargo test -p codex-core --lib
cargo test -p codex-core --lib mcp_tool_call::tests
```
2026-04-22 14:30:29 -07:00
viyatb-oai
2d73bac45f feat: add guardian network approval trigger context (#18197)
## Summary

Give guardian network-access reviews the command context that triggered
a managed-network approval. The prompt JSON now includes the originating
tool call id, tool name, command argv, cwd, sandbox permissions,
additional permissions, justification, and tty state when a single
active tool call can be attributed.

The implementation keeps the trigger shape canonical by serializing
`GuardianNetworkAccessTrigger` directly and lets each runtime build that
trigger from its `ToolCtx`. Non-guardian approval prompts avoid cloning
the full trigger payload.

## UX changes

Guardian network-access reviews now include a `trigger` object that
explains what command caused the network approval. Instead of seeing
only the requested host, the guardian reviewer can also see the
originating tool call, argv, working directory, sandbox mode,
justification, and tty state.

Example payload the guardian reviewer can see:

```json
{
  "tool": "network_access",
  "target": "https://api.github.com:443",
  "host": "api.github.com",
  "protocol": "https",
  "port": 443,
  "trigger": {
    "callId": "call_abc123",
    "toolName": "shell",
    "command": ["gh", "api", "/repos/openai/codex/pulls/18197"],
    "cwd": "/workspace/codex",
    "sandboxPermissions": "require_escalated",
    "justification": "Fetch PR metadata from GitHub.",
    "tty": false
  }
}
```

The network review itself remains scoped to the network decision:
`target_item_id` stays `null`. `trigger.callId` is attribution context
only, so clients can still distinguish network reviews from
item-targeted command reviews.

## Verification

- Added coverage for serializing network trigger context in guardian
approval JSON.
- Added regression coverage that network guardian reviews do not reuse
`trigger.callId` as `target_item_id`.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-22 14:00:53 -07:00
Ahmed Ibrahim
9360f267f3 [2/4] Implement executor HTTP request runner (#18582)
### Why
Remote streamable HTTP MCP needs the executor to perform ordinary HTTP
requests on the executor side. This keeps network placement aligned with
`experimental_environment = "remote"` without adding MCP-specific
executor APIs.

### What
- Add an executor-side `http/request` runner backed by `reqwest`.
- Validate request method and URL scheme, preserving the transport
boundary at plain HTTP.
- Return buffered responses for ordinary calls and emit ordered
`http/request/bodyDelta` notifications for streaming responses.
- Register the request handler in the exec-server router.
- Document the runner entrypoint, conversion helpers, body-stream
bridge, notification sender, timeout behavior, and new integration-test
helpers.
- Add exec-server integration tests with the existing websocket harness
and a local TCP HTTP peer for buffered and streamed responses, with
comments spelling out what each test proves and its
setup/exercise/assert phases.

### Stack
1. #18581 protocol
2. #18582 runner
3. #18583 RMCP client
4. #18584 manager wiring and local/remote coverage

### Verification
- `just fmt`
- `cargo check -p codex-exec-server -p codex-rmcp-client --tests`
- `cargo check -p codex-core --test all` compile-only
- `git diff --check`
- Online full CI is running from the `full-ci` branch, including the
remote Rust test job.

Co-authored-by: Codex <noreply@openai.com>

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-22 20:36:34 +00:00
Michael Bolin
18a26d7bbc app-server: accept permission profile overrides (#18279)
## Why

`PermissionProfile` is becoming the canonical permissions shape shared
by core and app-server. After app-server responses expose the active
profile, clients need to be able to send that same shape back when
starting, resuming, forking, or overriding a turn instead of translating
through the legacy `sandbox`/`sandboxPolicy` shorthands.

This still needs to preserve the existing requirements/platform
enforcement model. A profile-shaped request can be downgraded or
rejected by constraints, but the server should keep the user's
elevated-access intent for project trust decisions. Turn-level profile
overrides also need to retain existing read protections, including
deny-read entries and bounded glob-scan metadata, so a permission
override cannot accidentally drop configured protections such as
`**/*.env = deny`.

## What changed

- Adds optional `permissionProfile` request fields to `thread/start`,
`thread/resume`, `thread/fork`, and `turn/start`.
- Rejects ambiguous requests that specify both `permissionProfile` and
the legacy `sandbox`/`sandboxPolicy` fields, including running-thread
resume requests.
- Converts profile-shaped overrides into core runtime filesystem/network
permissions while continuing to derive the constrained legacy sandbox
projection used by existing execution paths.
- Preserves project-trust intent for profile overrides that are
equivalent to workspace-write or full-access sandbox requests.
- Preserves existing deny-read entries and `globScanMaxDepth` when
applying turn-level `permissionProfile` overrides.
- Updates app-server docs plus generated JSON/TypeScript schema fixtures
and regression coverage.

## Verification

- `cargo test -p codex-app-server-protocol schema_fixtures`
- `cargo test -p codex-core
session_configuration_apply_permission_profile_preserves_existing_deny_read_entries`







---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18279).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* __->__ #18279
2026-04-22 13:34:33 -07:00
Dylan Hurd
ed4def8286 feat(auto-review) short-circuit (#18890)
## Summary
Short circuit the convo if auto-review hits too many denials

## Testing
- [x] Added unit tests

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-22 20:34:15 +00:00
xl-openai
b77791c228 feat: Fairly trim skill descriptions within context budget (#18925)
Preserve skill name/path entries whenever possible and trim descriptions
first, using round-robin character allocation so short descriptions do
not waste budget.
2026-04-22 12:33:29 -07:00
Michael Bolin
ddde50c611 arg0: keep dispatch aliases alive during async main (#18999)
## Why

The Ubuntu GNU remote Cargo run has been regularly failing sandboxed
`suite::remote_env` filesystem tests with `No such file or directory`,
while the same cases pass under Bazel. The Cargo remote-env setup starts
`target/debug/codex exec-server` inside Docker via
`scripts/test-remote-env.sh`. That CLI builds `codex-linux-sandbox` and
other arg0 helper aliases in a temporary directory, then passes those
alias paths into the exec-server runtime.

`arg0_dispatch_or_else` constructed `Arg0DispatchPaths` from that
temporary alias guard, but then awaited the async CLI entry point
without otherwise keeping the guard live. That allowed the guard to be
dropped while the exec-server was still running, removing the helper
alias directory. Later sandboxed filesystem calls tried to spawn the
now-deleted `codex-linux-sandbox` path and surfaced as `ENOENT`.

The relevant distinction I found is that `core/tests/common` stores the
result of `arg0_dispatch()` in a process-lifetime
`OnceLock<Option<Arg0PathEntryGuard>>` for test binaries. The Cargo
remote-env setup exercises a real `codex exec-server` process instead,
so it depends on the normal CLI lifetime behavior fixed here.

## What Changed

- Keep the arg0 tempdir guard alive until `main_fn(paths).await`
completes.
- Keep the helper on the real `arg0_dispatch()` shape, where alias setup
can fail and return `None` in production.
- Add a regression test that uses an explicit guard, yields once, and
verifies the generated helper alias path still exists while the async
entry point is running.

## Verification

- `cargo test -p codex-arg0`
- `just argument-comment-lint -p codex-arg0`
- `just fix -p codex-arg0`
2026-04-22 11:06:34 -07:00
Won Park
11e5af53c4 Add plumbing to approve stored Auto-Review denials (#18955)
## Summary

This adds the structural plumbing needed for an app-server client to
approve a previously denied Guardian review and carry that approval
context into the next model turn.

This PR does not add the actual `/auto-review-denials` tool 

## What Changed

- Added app-server v2 RPC `thread/approveGuardianDeniedAction`.
- Added generated JSON schema and TypeScript fixtures for
`ThreadApproveGuardianDeniedAction*`.
- Added core `Op::ApproveGuardianDeniedAction`.
- Added a core handler that validates the event is a denied Guardian
assessment and injects a developer message containing the stored denial
event JSON.
- Queues the approval context for the next turn if there is no active
turn yet.
- Added the TUI app-server bridge so `Op::ApproveGuardianDeniedAction {
event }` is routed to the app-server request.

## What This Does Not Do

- Does not add `/auto-review-denials`.
- Does not add chat widget recent-denial state.
- Does not add popup/list UI.
- Does not add a product-facing denial lookup/store.
- Does not change where Guardian denials are originally emitted or
persisted.

## Verification

- `cargo test -p codex-tui thread_approve_guardian_denied_action`
2026-04-22 10:38:19 -07:00
Dylan Hurd
78593d72ea feat(auto-review) policy config (#18959)
## Summary
Allow users to customize their own auto-review policy config.

## Testing
- [x] added config_tests
2026-04-22 10:33:02 -07:00
cassirer-openai
f67383bcba [rollout_trace] Record core session rollout traces (#18877)
## Summary

Wires rollout trace recording into `codex-core` session and turn
execution. This records the core model request/response, compaction, and
session lifecycle boundaries needed for replay without yet tracing every
nested runtime/tool boundary.

## Stack

This is PR 2/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This layer is the first live integration point. The important review
question is whether trace recording is isolated from normal session
behavior: trace failures should not become user-visible execution
failures, and recording should preserve the existing turn/session
lifecycle semantics.

The PR depends on the reducer/data model from the first stack entry and
only introduces the core recorder surface that later PRs use for richer
runtime and relationship events.
2026-04-22 17:00:48 +00:00
Eric Traut
79ea577156 TUI: Keep remote app-server events draining (#18932)
Addresses #18860

Problem: Remote app-server clients could stop draining websocket events
when their bounded local event channel filled, leaving clients stuck on
stale in-progress turns after a disconnect.

Solution: Use an unbounded local event channel for the remote client so
the websocket reader can keep forwarding disconnect and progress events
instead of blocking or dropping them.

Why this is reasonable: This does not make the remote websocket itself
unbounded. The changed queue lives inside the remote client, between the
task that reads the remote websocket and the API consumer in the same
client process. Once an event has been received from the remote server,
preserving it is preferable to blocking websocket reads or dropping
disconnect/lifecycle events; network-level backpressure still happens at
the websocket boundary if the remote side outpaces the client.
2026-04-22 09:29:34 -07:00
Steve Coffey
0127cef5db Stage publishable Python runtime wheels (#18865)
This is PR 2 of the Python SDK PyPI publishing split. [PR
1](https://github.com/openai/codex/pull/18862) refreshed the generated
SDK bindings; this PR makes the runtime package itself publishable, and
PR 3 will wire the SDK package/version pinning to this runtime package.

## Summary
- Rename the runtime distribution to `openai-codex-cli-bin` while
keeping the import package as `codex_cli_bin`.
- Make the runtime package wheel-only and build `py3-none-<platform>`
wheels instead of interpreter-specific wheels.
- Add `stage-runtime --codex-version` and `--platform-tag` so release
staging can produce the platform wheel matrix from Codex release tags.
- Add focused artifact workflow tests for version normalization,
platform tag injection, and runtime wheel metadata.

## Why Rename
There is already an unofficial PyPI package,
[`codex-bin`](https://pypi.org/project/codex-bin/), distributing OpenAI
Codex binaries. Publishing the official SDK runtime dependency as
`openai-codex-cli-bin` makes the ownership clear, avoids confusing the
SDK-pinned runtime wheel with that unowned wrapper, and keeps the import
package unchanged as `codex_cli_bin`.

## Tests
- `uv run --extra dev pytest
tests/test_artifact_workflow_and_binaries.py` -> 21 passed
- `uv run --extra dev python scripts/update_sdk_artifacts.py
stage-runtime /tmp/codex-python-pr2-rebased/runtime-stage
/tmp/codex-python-pr2-rebased/codex --codex-version
rust-v0.116.0-alpha.1 --platform-tag macosx_11_0_arm64`
- `uv run --with build --extra dev python -m build --wheel
/tmp/codex-python-pr2-rebased/runtime-stage`
- `uv run --with twine --extra dev twine check
/tmp/codex-python-pr2-rebased/runtime-stage/dist/openai_codex_cli_bin-0.116.0a1-py3-none-macosx_11_0_arm64.whl`

## Note
- Full `uv run --extra dev pytest` currently fails because regenerating
from schemas already on `main` adds new DeviceKey Python types. I left
that generated catch-up out of this runtime-only PR.
2026-04-22 08:14:48 -07:00
Vaibhav Srivastav
0ebe69a8c3 [codex] Update imagegen system skill (#18852)
## Summary

This updates the embedded `imagegen` system skill in `codex-rs/skills`
with the ImageGen 2 skill changes from `openai/skills-internal#87`.

The bundled skill now keeps normal image generation/editing on the
built-in `image_gen` path, updates the CLI fallback defaults to
`gpt-image-2`, and routes explicit transparent-output requests through
`gpt-image-1.5` with clear guidance that `gpt-image-2` does not support
transparent backgrounds.

## Details

- Update `SKILL.md` routing guidance for built-in vs CLI fallback
behavior.
- Update CLI/API references for `gpt-image-2` size constraints, quality
options, near-4K sizes, and unsupported options.
- Update `scripts/image_gen.py` defaults and validation:
  - default model `gpt-image-2`
  - default size `auto`
  - default quality `medium`
  - reject transparent backgrounds on `gpt-image-2`
  - reject `input_fidelity` on `gpt-image-2`
- validate flexible `gpt-image-2` sizes and suggest `3824x2160` /
`2160x3824` for near-4K requests
- Update prompt/reference docs with the new model and routing guidance.

## Validation

- `cargo test -p codex-skills`
- `git diff --check`
- Manual CLI dry-runs for:
  - default `gpt-image-2` payload
  - `3824x2160` near-4K size acceptance
  - `3840x2160` rejection with near-4K guidance
  - transparent background rejection on `gpt-image-2`
  - transparent background acceptance on `gpt-image-1.5`
  - `input_fidelity` rejection on `gpt-image-2`

Bazel target check was not run locally because `bazel` is not installed
in this environment.
2026-04-22 15:08:10 +00:00
jif-oai
65420737e8 chore: prep memories for AB (#18973) 2026-04-22 11:46:15 +01:00
jif-oai
ddf65c9647 fix: cargo deny (#18971) 2026-04-22 11:46:11 +01:00
jif-oai
639382609f fix: wait_agent timeout for queued mailbox mail (#18968)
## Why

`wait_agent` can be called while mailbox mail is already pending. The
previous implementation subscribed for future mailbox sequence changes
and then waited for the next notification. If the mail was queued before
that wait started, no new notification arrived, so the tool could sit
until `timeout_ms` even though mail was ready to deliver.

## What Changed

- Added `Session::has_pending_mailbox_items()` for checking pending
mailbox mail through the session API.
- Updated `multi_agents_v2::wait` to return immediately when pending
mailbox mail already exists before sleeping on a new mailbox sequence
update.
- Reworked the regression coverage in `multi_agents_tests.rs` so already
queued mailbox mail must wake `wait_agent` promptly.

Relevant code:
- [`wait_agent` pending-mail
check](aa8ca06e83/codex-rs/core/src/tools/handlers/multi_agents_v2/wait.rs (L55-L60))
-
[`Session::has_pending_mailbox_items`](aa8ca06e83/codex-rs/core/src/session/mod.rs (L2979-L2981))
-
[`multi_agent_v2_wait_agent_returns_for_already_queued_mail`](aa8ca06e83/codex-rs/core/src/tools/handlers/multi_agents_tests.rs (L2854))

## Verification

- `cargo test -p codex-core
multi_agent_v2_wait_agent_returns_for_already_queued_mail`
2026-04-22 11:16:17 +01:00
acrognale-oai
4f8c58f737 Support multiple cwd filters for thread list (#18502)
## Summary

- Teach app-server `thread/list` to accept either a single `cwd` or an
array of cwd filters, returning threads whose recorded session cwd
matches any requested path
- Add `useStateDbOnly` as an explicit opt-in fast path for callers that
want to answer `thread/list` from SQLite without scanning JSONL rollout
files
- Preserve backwards compatibility: by default, `thread/list` still
scans JSONL rollouts and repairs SQLite state
- Wire the new cwd array and SQLite-only options through app-server,
local/remote thread-store, rollout listing, generated TypeScript/schema
fixtures, proto output, and docs

## Test Plan

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-rollout`
- `cargo test -p codex-thread-store`
- `cargo test -p codex-app-server thread_list`
- `just fmt`
- `just fix -p codex-app-server-protocol -p codex-rollout -p
codex-thread-store -p codex-app-server`
- `cargo build -p codex-cli --bin codex`
2026-04-22 06:10:09 -04:00
jif-oai
b04ffeee4c nit: expose lib (#18962)
As a follow-up
2026-04-22 10:06:53 +01:00
rhan-oai
213b17b7a3 [codex-analytics] guardian review TTFT plumbing and emission (#17696)
## Why

Guardian analytics includes time-to-first-token, but the Guardian
reviewer runs as a normal Codex session and `TurnCompleteEvent` did not
expose TTFT. The timing needs to flow through the standard
turn-completion protocol so Guardian review analytics can consume the
same value as the rest of the session machinery.

## What changed

Adds optional `time_to_first_token_ms` to `TurnCompleteEvent` and
populates it from `TurnTiming`. The value is carried through app-server
thread history, rollout reconstruction, TUI/app-server adapters, and
Guardian review session handling.

Guardian review analytics now captures TTFT from the reviewer
turn-complete event when available. Existing tests and fixtures are
updated to set the new optional field to `None` where TTFT is not
relevant.

## Verification

- `cargo clippy -p codex-tui --tests -- -D warnings`
- `cargo clippy -p codex-core --lib --tests -- -D warnings`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17696).
* __->__ #17696
* #17695
* #17693
* #18278
* #18953
2026-04-22 01:52:48 -07:00
rhan-oai
37aadeaa13 [codex-analytics] guardian review truncation (#17695)
## Why

The Guardian review event needs to report whether the action shown to
Guardian was truncated. That field should come from the same truncation
path used to build the Guardian prompt, rather than being inferred after
the fact.

## What changed

Plumbs truncation metadata through Guardian action formatting, prompt
construction, review session execution, and analytics emission.
`guardian_truncate_text` now reports both the rendered text and whether
it inserted the truncation marker, and `reviewed_action_truncated` is
set from that prompt-building result.

This keeps the analytics field aligned with the model-visible reviewed
action while preserving the existing Guardian prompt behavior.

## Verification

- Guardian truncation tests cover both truncated and non-truncated
action payloads.
- Guardian review tests assert the review session metadata and
truncation field are propagated.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17695).
* #17696
* __->__ #17695
* #17693
* #18278
* #18953
2026-04-22 08:35:29 +00:00
rhan-oai
4e7399c6b9 [codex-analytics] guardian review analytics events emission (#17693)
## Why

Guardian approvals now run as review sessions, but Codex analytics did
not have a terminal event for those reviews. That made it hard to
measure approval outcomes, failure modes, Guardian session reuse, model
metadata, token usage, and timing separately from the parent turn.

## What changed

Adds `codex_guardian_review` analytics emission for Guardian approval
reviews. The event is emitted from the Guardian review path with review
identity, target item id, approval request source, a PII-minimized
reviewed-action shape, terminal decision/status, failure reason,
Guardian assessment fields, Guardian session metadata, token usage, and
timing metadata.

The reviewed-action payload intentionally omits high-risk fields such as
shell commands, working directories, argv, file paths, network
targets/hosts, rationale, retry reason, and permission justifications.
It also classifies prompt-build failures separately from Guardian
session/runtime failures so fail-closed cases are distinguishable in
analytics.

## Verification

- Guardian review analytics tests cover terminal success,
timeout/cancel/fail-closed paths, session metadata, and token usage
plumbing.
- `cargo clippy -p codex-core --lib --tests -- -D warnings`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17693).
* #17696
* #17695
* __->__ #17693
2026-04-22 01:02:47 -07:00
1749 changed files with 214163 additions and 91015 deletions

View File

@@ -29,10 +29,13 @@ common:linux --test_env=PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
common:macos --test_env=PATH=/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
# Pass through some env vars Windows needs to use powershell?
common:windows --test_env=PATH
common:windows --test_env=SYSTEMROOT
common:windows --test_env=COMSPEC
common:windows --test_env=WINDIR
# Rust's libtest harness runs test bodies on std-spawned threads. The default
# 2 MiB stack can be too small for large async test futures on Windows CI; see
# https://github.com/openai/codex/pull/19067 for the motivating failure.
common --test_env=RUST_MIN_STACK=8388608 # 8 MiB
common --test_output=errors
common --bes_results_url=https://app.buildbuddy.io/invocation/
@@ -76,6 +79,10 @@ common:ci --disk_cache=
# Shared config for the main Bazel CI workflow.
common:ci-bazel --config=ci
common:ci-bazel --build_metadata=TAG_workflow=bazel
# Bazel CI cross-compiles in several legs, and the V8-backed code-mode tests
# are not stable in that setup yet. Keep running the rest of the Rust
# integration suites through the workspace-root launcher.
common:ci-bazel --test_env=CODEX_BAZEL_TEST_SKIP_FILTERS=suite::code_mode::
# Shared config for Bazel-backed Rust linting.
build:clippy --aspects=@rules_rust//rust:defs.bzl%rust_clippy_aspect
@@ -150,6 +157,25 @@ common:ci-macos --config=remote
common:ci-macos --strategy=remote
common:ci-macos --strategy=TestRunner=darwin-sandbox,local
# On Windows, use Linux remote execution for build actions but keep test actions
# on the Windows runner so Bazel's normal test sharding and flaky-test retries
# still run against Windows binaries.
common:ci-windows-cross --config=ci-windows
common:ci-windows-cross --build_metadata=TAG_windows_cross_compile=true
common:ci-windows-cross --config=remote
common:ci-windows-cross --host_platform=//:rbe
common:ci-windows-cross --strategy=remote
common:ci-windows-cross --strategy=TestRunner=local
common:ci-windows-cross --local_test_jobs=4
common:ci-windows-cross --test_env=RUST_TEST_THREADS=1
# Native Windows CI still covers the PowerShell tests. The cross-built gnullvm
# binaries currently hang in PowerShell AST parser tests when those binaries are
# run on the Windows runner.
common:ci-windows-cross --test_env=CODEX_BAZEL_TEST_SKIP_FILTERS=suite::code_mode::,powershell
common:ci-windows-cross --platforms=//:windows_x86_64_gnullvm
common:ci-windows-cross --extra_execution_platforms=//:rbe,//:windows_x86_64_msvc
common:ci-windows-cross --extra_toolchains=//:windows_gnullvm_tests_on_msvc_host_toolchain
# Linux-only V8 CI config.
common:ci-v8 --config=ci
common:ci-v8 --build_metadata=TAG_workflow=v8
@@ -157,5 +183,15 @@ common:ci-v8 --build_metadata=TAG_os=linux
common:ci-v8 --config=remote
common:ci-v8 --strategy=remote
# Source-built Bazel V8 artifacts use the in-process sandbox by default. This
# does not affect Cargo's default prebuilt rusty_v8 path.
common --@v8//:v8_enable_pointer_compression=True
common --@v8//:v8_enable_sandbox=True
# Keep currently published rusty_v8 release artifacts non-sandboxed until the
# artifact migration ships matching Rust feature selection for Cargo consumers.
common:v8-release-compat --@v8//:v8_enable_pointer_compression=False
common:v8-release-compat --@v8//:v8_enable_sandbox=False
# Optional per-user local overrides.
try-import %workspace%/user.bazelrc

View File

@@ -1,5 +1,6 @@
iTerm
iTerm2
psuedo
SOM
te
TE

View File

@@ -1,6 +1,6 @@
[codespell]
# Ref: https://github.com/codespell-project/codespell#using-a-config-file
skip = .git*,vendor,*-lock.yaml,*.lock,.codespellrc,*test.ts,*.jsonl,frame*.txt,*.snap,*.snap.new,*meriyah.umd.min.js
skip = .git*,vendor,*-lock.yaml,*.lock,.codespellrc,*test.ts,*.jsonl,frame*.txt,*.snap,*.snap.new
check-hidden = true
ignore-regex = ^\s*"image/\S+": ".*|\b(afterAll)\b
ignore-words-list = ratatui,ser,iTerm,iterm2,iterm,te,TE,PASE,SEH

View File

@@ -0,0 +1,11 @@
# THIS IS AUTOGENERATED. DO NOT EDIT MANUALLY
version = 1
name = "codex"
[setup]
script = ""
[[actions]]
name = "Run"
icon = "run"
command = "cargo +1.93.0 run --manifest-path=codex-rs/Cargo.toml --bin codex -- -c mcp_oauth_credentials_store=file"

View File

@@ -27,10 +27,10 @@ Accept any of the following:
2. Run the watcher script to snapshot PR/review/CI state (or consume each streamed snapshot from `--watch`).
3. Inspect the `actions` list in the JSON response.
4. If `diagnose_ci_failure` is present, inspect failed run logs and classify the failure.
5. If the failure is likely caused by the current branch, patch code locally, commit, and push.
5. If the failure is likely caused by the current branch, patch code locally, commit, and push. Do not patch random flaky tests, CI infrastructure, dependency outages, runner issues, or other failures that are unrelated to the branch.
6. If `process_review_comment` is present, inspect surfaced review items and decide whether to address them.
7. If a review item is actionable and correct, patch code locally, commit, push, and then mark the associated review thread/comment as resolved once the fix is on GitHub.
8. If a review item from another author is non-actionable, already addressed, or not valid, post one reply on the comment/thread explaining that decision (for example answering the question or explaining why no change is needed). Prefix the GitHub reply body with `[codex]` so it is clear the response is automated. If the watcher later surfaces your own reply, treat that self-authored item as already handled and do not reply again.
8. Do not post replies to human-authored review comments/threads unless the user explicitly confirms the exact response. If a human review item is non-actionable, already addressed, or not valid, surface the item and recommended response to the user instead of replying on GitHub.
9. If the failure is likely flaky/unrelated and `retry_failed_checks` is present, rerun failed jobs with `--retry-failed-now`.
10. If both actionable review feedback and `retry_failed_checks` are present, prioritize review feedback first; a new commit will retrigger CI, so avoid rerunning flaky checks on the old SHA unless you intentionally defer the review change.
11. On every loop, look for newly surfaced review feedback before acting on CI failures or mergeability state, then verify mergeability / merge-conflict status (for example via `gh pr view`) alongside CI.
@@ -69,12 +69,18 @@ python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr <number-or-url> --o
Use `gh` commands to inspect failed runs before deciding to rerun.
- `gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha`
- `gh run view <run-id> --log-failed`
- `gh api repos/<owner>/<repo>/actions/runs/<run-id>/jobs -X GET -f per_page=100`
- `gh api repos/<owner>/<repo>/actions/jobs/<job-id>/logs > /tmp/codex-gh-job-<job-id>-logs.zip`
- `gh run view <run-id> --log-failed` as a fallback after the overall workflow run is complete
Prefer treating failures as branch-related when logs point to changed code (compile/test/lint/typecheck/snapshots/static analysis in touched areas).
`gh run view --log-failed` is workflow-run scoped and may not expose failed-job logs until the overall run finishes. For faster diagnosis, poll the run's jobs first and, as soon as a specific job has failed, fetch that job's logs directly from the Actions job logs endpoint. The watcher includes a `failed_jobs` list with each failed job's `job_id` and `logs_endpoint` when GitHub exposes one.
Prefer treating failures as branch-related when failed-job logs point to changed code (compile/test/lint/typecheck/snapshots/static analysis in touched areas).
Prefer treating failures as flaky/unrelated when logs show transient infra/external issues (timeouts, runner provisioning failures, registry/network outages, GitHub Actions infra errors).
Do not attempt to fix flaky/unrelated failures by changing tests, build scripts, CI configuration, dependency pins, or infrastructure-adjacent code unless the logs clearly connect the failure to the PR branch. For flaky/unrelated failures, rerun only when the watcher recommends `retry_failed_checks`; otherwise wait or stop for user help.
If classification is ambiguous, perform one manual diagnosis attempt before choosing rerun.
Read `.codex/skills/babysit-pr/references/heuristics.md` for a concise checklist.
@@ -99,7 +105,8 @@ When you agree with a comment and it is actionable:
5. Resume watching on the new SHA immediately (do not stop after reporting the push).
6. If monitoring was running in `--watch` mode, restart `--watch` immediately after the push in the same turn; do not wait for the user to ask again.
If you disagree or the comment is non-actionable/already addressed, reply once directly on the GitHub comment/thread so the reviewer gets an explicit answer, then continue the watcher loop. Prefix any GitHub reply to a code review comment/thread with `[codex]` so it is clear the response is automated and not from the human user. If the watcher later surfaces your own reply because the authenticated operator is treated as a trusted review author, treat that self-authored item as already handled and do not reply again.
Do not post replies to human-authored GitHub review comments/threads automatically. If you disagree with a human comment, believe it is non-actionable/already addressed, or need to answer a question, report the item to the user with a suggested response and wait for explicit confirmation before posting anything on GitHub. If the user approves a response, prefix it with `[codex]` so it is clear the response is automated and not from the human user.
If the watcher later surfaces your own approved reply because the authenticated operator is treated as a trusted review author, treat that self-authored item as already handled and do not reply again.
If a code review comment/thread is already marked as resolved in GitHub, treat it as non-actionable and safely ignore it unless new unresolved follow-up feedback appears.
## Git Safety Rules
@@ -125,11 +132,11 @@ Use this loop in a live Codex session:
2. Read `actions`.
3. First check whether the PR is now merged or otherwise closed; if so, report that terminal state and stop polling immediately.
4. Check CI summary, new review items, and mergeability/conflict status.
5. Diagnose CI failures and classify branch-related vs flaky/unrelated.
6. For each surfaced review item from another author, either reply once with an explanation if it is non-actionable or patch/commit/push and then resolve it if it is actionable. If a later snapshot surfaces your own reply, treat it as informational and continue without responding again.
5. Diagnose CI failures and classify branch-related vs flaky/unrelated. If the overall run is still pending but `failed_jobs` already includes a failed job, fetch that job's logs and diagnose immediately instead of waiting for the whole workflow run to finish. Patch only when the failure is branch-related.
6. For each surfaced review item from another author, patch/commit/push and then resolve it if it is actionable. If it is non-actionable, already addressed, or requires a written answer, surface it to the user with a suggested response instead of posting automatically. If a later snapshot surfaces your own approved reply, treat it as informational and continue without responding again.
7. Process actionable review comments before flaky reruns when both are present; if a review fix requires a commit, push it and skip rerunning failed checks on the old SHA.
8. Retry failed checks only when `retry_failed_checks` is present and you are not about to replace the current SHA with a review/CI fix commit.
9. If you pushed a commit, resolved a review thread, replied to a review comment, or triggered a rerun, report the action briefly and continue polling (do not stop).
8. Retry failed checks only when `retry_failed_checks` is present and you are not about to replace the current SHA with a review/CI fix commit. Do not make code changes for unrelated flakes or infrastructure failures just to get CI green.
9. If you pushed a commit, resolved a review thread, or triggered a rerun, report the action briefly and continue polling (do not stop). If a human review comment needs a written GitHub response, stop and ask for confirmation before posting.
10. After a review-fix push, proactively restart continuous monitoring (`--watch`) in the same turn unless a strict stop condition has already been reached.
11. If everything is passing, mergeable, not blocked on required review approval, and there are no unaddressed review items, report that the PR is currently ready to merge but keep the watcher running so new review comments are surfaced quickly while the PR remains open.
12. If blocked on a user-help-required issue (infra outage, exhausted flaky retries, unclear reviewer request, permissions), report the blocker and stop.

View File

@@ -1,4 +1,4 @@
interface:
display_name: "PR Babysitter"
short_description: "Watch PR review comments, CI, and merge conflicts"
default_prompt: "Babysit the current PR: monitor reviewer comments, CI, and merge-conflict status (prefer the watchers --watch mode for live monitoring); surface new review feedback before acting on CI or mergeability work, fix valid issues, push updates, and rerun flaky failures up to 3 times. Keep exactly one watcher session active for the PR (do not leave duplicate --watch terminals running). If you pause monitoring to patch review/CI feedback, restart --watch yourself immediately after the push in the same turn. If a watcher is still running and no strict stop condition has been reached, the task is still in progress: keep consuming watcher output and sending progress updates instead of ending the turn. Do not treat a green + mergeable PR as a terminal stop while it is still open; continue polling autonomously after any push/rerun so newly posted review comments are surfaced until a strict terminal stop condition is reached or the user interrupts."
default_prompt: "Babysit the current PR: monitor reviewer comments, CI, and merge-conflict status (prefer the watchers --watch mode for live monitoring); surface new review feedback before acting on CI or mergeability work, fix valid issues, push updates, and rerun flaky failures up to 3 times. Do not post replies to human-authored review comments unless the user explicitly confirms the exact response. Do not patch unrelated flaky tests, CI infrastructure, dependency outages, runner issues, or other failures that are not caused by the branch. Keep exactly one watcher session active for the PR (do not leave duplicate --watch terminals running). If you pause monitoring to patch review/CI feedback, restart --watch yourself immediately after the push in the same turn. If a watcher is still running and no strict stop condition has been reached, the task is still in progress: keep consuming watcher output and sending progress updates instead of ending the turn. Do not treat a green + mergeable PR as a terminal stop while it is still open; continue polling autonomously after any push/rerun so newly posted review comments are surfaced until a strict terminal stop condition is reached or the user interrupts."

View File

@@ -23,9 +23,11 @@ Used to discover failed workflow runs and rerunnable run IDs.
### Failed log inspection
- `gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha`
- `gh api repos/{owner}/{repo}/actions/runs/{run_id}/jobs -X GET -f per_page=100`
- `gh api repos/{owner}/{repo}/actions/jobs/{job_id}/logs > /tmp/codex-gh-job-{job_id}-logs.zip`
- `gh run view <run-id> --log-failed`
Used by Codex to classify branch-related vs flaky/unrelated failures.
Used by Codex to classify branch-related vs flaky/unrelated failures. Prefer the direct job log endpoint as soon as a job has failed because `gh run view --log-failed` may not produce failed-job logs until the overall workflow run completes.
### Retry failed jobs only
@@ -70,3 +72,11 @@ Reruns only failed jobs (and dependencies) for a workflow run.
- `conclusion`
- `html_url`
- `head_sha`
### Actions run jobs API (`jobs[]`)
- `id`
- `name`
- `status`
- `conclusion`
- `html_url`

View File

@@ -18,6 +18,8 @@ Treat as **likely flaky or unrelated** when evidence points to transient or exte
- Cloud/service rate limits or transient API outages
- Non-deterministic failures in unrelated integration tests with known flake patterns
Do not patch likely flaky/unrelated failures. Use the retry budget for rerunnable failures, wait for pending jobs, or stop and report the blocker when the failure is persistent or infrastructure-owned.
If uncertain, inspect failed logs once before choosing rerun.
## Decision tree (fix vs rerun vs stop)
@@ -25,9 +27,11 @@ If uncertain, inspect failed logs once before choosing rerun.
1. If PR is merged/closed: stop.
2. If there are failed checks:
- Diagnose first.
- If checks are still pending but an individual job has already failed: fetch that job's logs and diagnose now.
- If branch-related: fix locally, commit, push.
- If likely flaky/unrelated and all checks for the current SHA are terminal: rerun failed jobs.
- If checks are still pending: wait.
- If likely flaky/unrelated and not safely rerunnable: stop and report the blocker; do not edit unrelated tests, build scripts, CI configuration, dependency pins, or infrastructure code.
- If checks are still pending and no failed job is available yet: wait.
3. If flaky reruns for the same SHA reach the configured limit (default 3): stop and report persistent failure.
4. Independently, process any new human review comments.
@@ -40,12 +44,15 @@ Address the comment when:
- The requested change does not conflict with the users intent or recent guidance.
- The change can be made safely without unrelated refactors.
Fix valid human review feedback in code when possible, but do not post a GitHub reply to a human-authored comment/thread unless the user explicitly confirms the exact response.
Do not auto-fix when:
- The comment is ambiguous and needs clarification.
- The request conflicts with explicit user instructions.
- The proposed change requires product/design decisions the user has not made.
- The codebase is in a dirty/unrelated state that makes safe editing uncertain.
- The comment only needs a written answer or disagreement response; propose the reply to the user instead of posting it automatically.
## Stop-and-ask conditions
@@ -56,3 +63,4 @@ Stop and ask the user instead of continuing automatically when:
- The PR branch cannot be pushed.
- CI failures persist after the flaky retry budget.
- Reviewer feedback requires a product decision or cross-team coordination.
- A human review comment requires a written GitHub reply instead of a code change.

View File

@@ -338,6 +338,66 @@ def failed_runs_from_workflow_runs(runs, head_sha):
return failed_runs
def get_jobs_for_run(repo, run_id):
endpoint = f"repos/{repo}/actions/runs/{run_id}/jobs"
data = gh_json(["api", endpoint, "-X", "GET", "-f", "per_page=100"], repo=repo)
if not isinstance(data, dict):
raise GhCommandError("Unexpected payload from actions run jobs API")
jobs = data.get("jobs") or []
if not isinstance(jobs, list):
raise GhCommandError("Expected `jobs` to be a list")
return jobs
def failed_jobs_from_workflow_runs(repo, runs, head_sha):
failed_jobs = []
for run in runs:
if not isinstance(run, dict):
continue
if str(run.get("head_sha") or "") != head_sha:
continue
run_id = run.get("id")
if run_id in (None, ""):
continue
run_status = str(run.get("status") or "")
run_conclusion = str(run.get("conclusion") or "")
if run_status.lower() == "completed" and run_conclusion not in FAILED_RUN_CONCLUSIONS:
continue
jobs = get_jobs_for_run(repo, run_id)
for job in jobs:
if not isinstance(job, dict):
continue
conclusion = str(job.get("conclusion") or "")
if conclusion not in FAILED_RUN_CONCLUSIONS:
continue
job_id = job.get("id")
logs_endpoint = None
if job_id not in (None, ""):
logs_endpoint = f"repos/{repo}/actions/jobs/{job_id}/logs"
failed_jobs.append(
{
"run_id": run_id,
"workflow_name": run.get("name") or run.get("display_title") or "",
"run_status": run_status,
"run_conclusion": run_conclusion,
"job_id": job_id,
"job_name": str(job.get("name") or ""),
"status": str(job.get("status") or ""),
"conclusion": conclusion,
"html_url": str(job.get("html_url") or ""),
"logs_endpoint": logs_endpoint,
}
)
failed_jobs.sort(
key=lambda item: (
str(item.get("workflow_name") or ""),
str(item.get("job_name") or ""),
str(item.get("job_id") or ""),
)
)
return failed_jobs
def get_authenticated_login():
data = gh_json(["api", "user"])
if not isinstance(data, dict) or not data.get("login"):
@@ -568,7 +628,7 @@ def is_pr_ready_to_merge(pr, checks_summary, new_review_items):
return True
def recommend_actions(pr, checks_summary, failed_runs, new_review_items, retries_used, max_retries):
def recommend_actions(pr, checks_summary, failed_runs, failed_jobs, new_review_items, retries_used, max_retries):
actions = []
if pr["closed"] or pr["merged"]:
if new_review_items:
@@ -583,7 +643,7 @@ def recommend_actions(pr, checks_summary, failed_runs, new_review_items, retries
if new_review_items:
actions.append("process_review_comment")
has_failed_pr_checks = checks_summary["failed_count"] > 0
has_failed_pr_checks = checks_summary["failed_count"] > 0 or bool(failed_jobs)
if has_failed_pr_checks:
if checks_summary["all_terminal"] and retries_used >= max_retries:
actions.append("stop_exhausted_retries")
@@ -621,12 +681,14 @@ def collect_snapshot(args):
checks_summary = summarize_checks(checks)
workflow_runs = get_workflow_runs_for_sha(pr["repo"], pr["head_sha"])
failed_runs = failed_runs_from_workflow_runs(workflow_runs, pr["head_sha"])
failed_jobs = failed_jobs_from_workflow_runs(pr["repo"], workflow_runs, pr["head_sha"])
retries_used = current_retry_count(state, pr["head_sha"])
actions = recommend_actions(
pr,
checks_summary,
failed_runs,
failed_jobs,
new_review_items,
retries_used,
args.max_flaky_retries,
@@ -641,6 +703,7 @@ def collect_snapshot(args):
"pr": pr,
"checks": checks_summary,
"failed_runs": failed_runs,
"failed_jobs": failed_jobs,
"new_review_items": new_review_items,
"actions": actions,
"retry_state": {

View File

@@ -75,6 +75,11 @@ def test_collect_snapshot_fetches_review_items_before_ci(monkeypatch, tmp_path):
"failed_runs_from_workflow_runs",
lambda *args, **kwargs: call_order.append("failed_runs") or [],
)
monkeypatch.setattr(
gh_pr_watch,
"failed_jobs_from_workflow_runs",
lambda *args, **kwargs: call_order.append("failed_jobs") or [],
)
monkeypatch.setattr(
gh_pr_watch,
"recommend_actions",
@@ -100,6 +105,7 @@ def test_recommend_actions_prioritizes_review_comments():
sample_pr(),
sample_checks(failed_count=1),
[{"run_id": 99}],
[],
[{"kind": "review_comment", "id": "1"}],
0,
3,
@@ -119,6 +125,7 @@ def test_run_watch_keeps_polling_open_ready_to_merge_pr(monkeypatch):
"pr": sample_pr(),
"checks": sample_checks(),
"failed_runs": [],
"failed_jobs": [],
"new_review_items": [],
"actions": ["ready_to_merge"],
"retry_state": {
@@ -153,3 +160,58 @@ def test_run_watch_keeps_polling_open_ready_to_merge_pr(monkeypatch):
assert sleeps == [30, 30]
assert [event for event, _ in events] == ["snapshot", "snapshot"]
def test_failed_jobs_include_direct_logs_endpoint(monkeypatch):
jobs_by_run = {
99: [
{
"id": 555,
"name": "unit tests",
"status": "completed",
"conclusion": "failure",
"html_url": "https://github.com/openai/codex/actions/runs/99/job/555",
},
{
"id": 556,
"name": "lint",
"status": "completed",
"conclusion": "success",
},
]
}
monkeypatch.setattr(
gh_pr_watch,
"get_jobs_for_run",
lambda repo, run_id: jobs_by_run[run_id],
)
failed_jobs = gh_pr_watch.failed_jobs_from_workflow_runs(
"openai/codex",
[
{
"id": 99,
"name": "CI",
"status": "in_progress",
"conclusion": "",
"head_sha": "abc123",
}
],
"abc123",
)
assert failed_jobs == [
{
"run_id": 99,
"workflow_name": "CI",
"run_status": "in_progress",
"run_conclusion": "",
"job_id": 555,
"job_name": "unit tests",
"status": "completed",
"conclusion": "failure",
"html_url": "https://github.com/openai/codex/actions/runs/99/job/555",
"logs_endpoint": "repos/openai/codex/actions/jobs/555/logs",
}
]

View File

@@ -0,0 +1,127 @@
---
name: codex-issue-digest
description: Run a GitHub issue digest for openai/codex by feature-area labels, all areas, and configurable time windows. Use when asked to summarize recent Codex bug reports or enhancement requests, especially for owner-specific labels such as tui, exec, app, or similar areas.
---
# Codex Issue Digest
## Objective
Produce a headline-first, insight-oriented digest of `openai/codex` issues for the requested feature-area labels over the previous 24 hours by default. Honor a different duration when the user asks for one, for example "past week" or "48 hours". Default to a summary-only response; include details only when requested.
Include only issues that currently have `bug` or `enhancement` plus at least one requested owner label. If the user asks for all areas or all labels, collect `bug`/`enhancement` issues across all labels.
## Inputs
- Feature-area labels, for example `tui exec`
- `all areas` / `all labels` to scan all current feature labels
- Optional repo override, default `openai/codex`
- Optional time window, default previous 24 hours; examples: `48h`, `7d`, `1w`, `past week`
## Workflow
1. Run the collector from a current Codex repo checkout:
```bash
python3 .codex/skills/codex-issue-digest/scripts/collect_issue_digest.py --labels tui exec --window-hours 24
```
Use `--window "past week"` or `--window-hours 168` when the user asks for a non-default duration. Use `--all-labels` when the user says all areas or all labels.
2. Use the JSON as the source of truth. It includes new issues, new issue comments, new reactions/upvotes, current labels, current reaction counts, model-ready `summary_inputs`, and detailed `digest_rows`.
3. Choose the output mode from the user's request:
- Default mode: start the report with `## Summary` and do not emit `## Details`.
- Details-upfront mode: if the user asks for details, a table, a full digest, "include details", or similar, start with `## Summary`, then include `## Details`.
- Follow-up details mode: if the user asks for more detail after a summary-only digest, produce `## Details` from the existing collector JSON when it is still available; otherwise rerun the collector.
4. In `## Summary`, write a headline-first executive summary:
- The first nonblank line under `## Summary` must be a single-line headline or judgment, not a bullet. It should be useful even if the reader stops there.
- On quiet days, prefer exactly: `No major issues reported by users.` Use this when there are no elevated rows, no newly repeated theme, and nothing that needs owner action.
- When users are surfacing notable issues, make the headline name the count or theme, for example `Two issues are being surfaced by users:`.
- Immediately under an active headline, list only the issues or themes driving attention, ordered by importance. Start each line with the row's `attention_marker` when present, then a concise owner-readable description and inline issue refs.
- Treat `🔥🔥` as headline-worthy and `🔥` as elevated. Do not add fire emoji yourself; only copy the row's `attention_marker`.
- Keep any extra summary detail after the headline to 1-3 terse lines, only when it adds a decision-relevant caveat, repeated theme, or owner action.
- Do not include routine counts, broad stats, or low-signal table summaries in `## Summary` unless they change the headline. Put metadata and optional counts in `## Details` or the footer.
- In default mode, end the report with a concise prompt such as `Want details? I can expand this into the issue table.` Keep this separate from the summary headline so the headline stays clean.
- Cluster and name themes yourself from `summary_inputs`; the collector intentionally does not hard-code issue categories.
- Use a cluster only when the issues genuinely share the same product problem. If several issues merely share a broad platform or label, describe them individually.
- Do not omit a repeated theme just because its individual issues fall below the details table cutoff. Several similar reports should be called out as a repeated customer concern.
- For single-issue rows, summarize the concern directly instead of calling it a cluster.
- Use inline numbered issue links from each relevant row's `ref_markdown`.
- Example quiet summary:
```markdown
## Summary
No major issues reported by users.
Source: collector v4, git `abc123def456`, window `2026-04-27T00:00:00Z` to `2026-04-28T00:00:00Z`.
Want details? I can expand this into the issue table.
```
- Example active summary:
```markdown
## Summary
Two issues are being surfaced by users:
🔥🔥 Terminal launch hangs on startup [1](https://github.com/openai/codex/issues/123)
🔥 Resume switches model providers unexpectedly [2](https://github.com/openai/codex/issues/456)
Source: collector v4, git `abc123def456`, window `2026-04-27T00:00:00Z` to `2026-04-28T00:00:00Z`.
Want details? I can expand this into the issue table.
```
5. In `## Details`, when details are requested, include a compact table only when useful:
- Prefer rows from `digest_rows`; include a `Refs` column using each row's `ref_markdown`.
- Keep the table short; omit low-signal rows when the summary already covers them.
- Use compact columns such as marker, area, type, description, interactions, and refs.
- The `Description` cell should be a short owner-readable phrase. Use row `description`, title, body excerpts, and recent comments, but do not mechanically copy the raw GitHub issue title when it contains incidental details.
- A clear quiet/no-concern sentence when there is no meaningful signal.
6. Use the JSON `attention_marker` exactly. It is empty for normal rows, `🔥` for elevated rows, and `🔥🔥` for very high-attention rows. The actual cutoffs are in `attention_thresholds`.
7. Use inline numbered references where a row or bullet points to issues, for example `Compaction bugs [1](https://github.com/openai/codex/issues/123), [2](https://github.com/openai/codex/issues/456)`. Do not add a separate footnotes section.
8. Label `interactions` as `Interactions`; it counts posts/comments/reactions during the requested window, not unique people.
9. Mention the collector `script_version`, repo checkout `git_head`, and time window in one compact source line. In default mode, put this before the details prompt so the final line still asks whether the user wants details. In details-upfront mode, it can be the footer.
## Reaction Handling
The collector uses GitHub reactions endpoints, which include `created_at`, to count reactions created during the digest window for hydrated issues. It reports both in-window reaction counts and current reaction totals. Treat current reaction totals as standing engagement, and treat `new_reactions` / `new_upvotes` as windowed activity.
By default, the collector fetches issue comments with `since=<window start>` and caps the number of comment pages per issue. This keeps very long historical threads from dominating a digest run and focuses the report on recent posts. Use `--fetch-all-comments` only when exhaustive comment history is more important than runtime.
GitHub issue search is still seeded by issue `updated_at`, so a purely reaction-only issue may be missed if reactions do not bump `updated_at`. Covering every reaction-only case would require either a persisted snapshot store or a broader scan of labeled issues.
## Attention Markers
The collector scales attention markers by the requested time window. The baseline is 5 human user interactions for `🔥` and 10 for `🔥🔥` over 24 hours; longer or shorter windows scale those cutoffs linearly and round up. For example, a one-week report uses 35 and 70 interactions. Human user interactions are human-authored new issue posts, human-authored new comments, and human reactions created during the window, including upvotes. Bot posts and bot reactions are excluded. In prose, explain this as high user interaction rather than naming the emoji.
## Freshness
The automation should run from a repo checkout that contains this skill. For shared daily use, prefer one of these patterns:
- Run the automation in a checkout that is refreshed before the automation starts, for example with `git pull --ff-only`.
- If the automation cannot safely mutate the checkout, have it report the current `git_head` from the collector output so readers know which skill/script version produced the digest.
## Sample Owner Prompt
```text
Use $codex-issue-digest to run the Codex issue digest for labels tui and exec over the previous 24 hours.
```
```text
Use $codex-issue-digest to run the Codex issue digest for all areas over the past week.
```
## Validation
Dry run the collector against recent issues:
```bash
python3 .codex/skills/codex-issue-digest/scripts/collect_issue_digest.py --labels tui exec --window-hours 24
```
```bash
python3 .codex/skills/codex-issue-digest/scripts/collect_issue_digest.py --all-labels --window "past week" --limit-issues 10
```
Run the focused script tests:
```bash
pytest .codex/skills/codex-issue-digest/scripts/test_collect_issue_digest.py
```

View File

@@ -0,0 +1,4 @@
interface:
display_name: "Codex Issue Digest"
short_description: "Summarize Codex issues by labels or all areas"
default_prompt: "Use $codex-issue-digest to run the Codex issue digest for labels tui and exec over the previous 24 hours."

View File

@@ -0,0 +1,994 @@
#!/usr/bin/env python3
"""Collect recent openai/codex issue activity for owner-focused digests."""
import argparse
import json
import math
import re
import subprocess
import sys
from datetime import datetime, timedelta, timezone
from pathlib import Path
from urllib.parse import quote
SCRIPT_VERSION = 4
QUALIFYING_KIND_LABELS = ("bug", "enhancement")
REACTION_KEYS = ("+1", "-1", "laugh", "hooray", "confused", "heart", "rocket", "eyes")
BASE_ATTENTION_WINDOW_HOURS = 24.0
ONE_ATTENTION_INTERACTION_THRESHOLD = 5
TWO_ATTENTION_INTERACTION_THRESHOLD = 10
ALL_LABEL_PHRASES = {"all", "all areas", "all labels", "all-areas", "all-labels", "*"}
class GhCommandError(RuntimeError):
pass
def parse_args():
parser = argparse.ArgumentParser(
description="Collect recent GitHub issue activity for a Codex owner digest."
)
parser.add_argument(
"--repo", default="openai/codex", help="OWNER/REPO, default openai/codex"
)
parser.add_argument(
"--labels",
nargs="+",
default=[],
help="Feature-area labels owned by the digest recipient, for example: tui exec",
)
parser.add_argument(
"--all-labels",
action="store_true",
help="Collect bug/enhancement issues across all feature-area labels",
)
parser.add_argument(
"--window",
help='Lookback duration such as "24h", "7d", "1w", or "past week"',
)
parser.add_argument(
"--window-hours", type=float, default=24.0, help="Lookback window"
)
parser.add_argument(
"--since", help="UTC ISO timestamp override for the window start"
)
parser.add_argument("--until", help="UTC ISO timestamp override for the window end")
parser.add_argument(
"--limit-issues",
type=int,
default=200,
help="Maximum candidate issues to hydrate after search",
)
parser.add_argument(
"--body-chars", type=int, default=1200, help="Issue body excerpt length"
)
parser.add_argument(
"--comment-chars", type=int, default=900, help="Comment excerpt length"
)
parser.add_argument(
"--max-comment-pages",
type=int,
default=3,
help=(
"Maximum pages of issue comments to hydrate per issue after applying the "
"window filter. Use 0 with --fetch-all-comments for no page cap."
),
)
parser.add_argument(
"--fetch-all-comments",
action="store_true",
help="Hydrate complete issue comment histories instead of only window-updated comments.",
)
return parser.parse_args()
def parse_timestamp(value, arg_name):
if value is None:
return None
normalized = value.strip()
if not normalized:
return None
if normalized.endswith("Z"):
normalized = f"{normalized[:-1]}+00:00"
try:
parsed = datetime.fromisoformat(normalized)
except ValueError as err:
raise ValueError(f"{arg_name} must be an ISO timestamp") from err
if parsed.tzinfo is None:
parsed = parsed.replace(tzinfo=timezone.utc)
return parsed.astimezone(timezone.utc)
def format_timestamp(value):
return (
value.astimezone(timezone.utc)
.replace(microsecond=0)
.isoformat()
.replace("+00:00", "Z")
)
def resolve_window(args):
until = parse_timestamp(args.until, "--until") or datetime.now(timezone.utc)
since = parse_timestamp(args.since, "--since")
if since is None:
hours = parse_duration_hours(getattr(args, "window", None))
if hours is None:
hours = getattr(args, "window_hours", 24.0)
if hours <= 0:
raise ValueError("window duration must be > 0")
since = until - timedelta(hours=hours)
if since >= until:
raise ValueError("--since must be before --until")
return since, until
def parse_duration_hours(value):
if value is None:
return None
text = value.strip().casefold().replace("_", " ")
if not text:
return None
text = re.sub(r"^(past|last)\s+", "", text)
aliases = {
"day": 24.0,
"24h": 24.0,
"week": 168.0,
"7d": 168.0,
}
if text in aliases:
return aliases[text]
match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(h|hr|hrs|hour|hours)", text)
if match:
return float(match.group(1))
match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(d|day|days)", text)
if match:
return float(match.group(1)) * 24.0
match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(w|week|weeks)", text)
if match:
return float(match.group(1)) * 168.0
raise ValueError(f"Unsupported duration: {value}")
def normalize_requested_labels(labels, all_labels=False):
out = []
seen = set()
for raw in labels:
for piece in raw.split(","):
label = piece.strip()
if not label:
continue
key = label.casefold()
if key not in seen:
out.append(label)
seen.add(key)
phrase = " ".join(label.casefold() for label in out)
if all_labels or phrase in ALL_LABEL_PHRASES:
return [], True
if not out:
raise ValueError(
"At least one feature-area label is required, or use --all-labels"
)
return out, False
def quote_label(label):
if re.fullmatch(r"[A-Za-z0-9_.:-]+", label):
return f"label:{label}"
escaped = label.replace('"', '\\"')
return f'label:"{escaped}"'
def build_search_queries(
repo, owner_labels, since, kind_labels=QUALIFYING_KIND_LABELS, all_labels=False
):
since_date = since.date().isoformat()
queries = []
if all_labels:
for kind_label in kind_labels:
queries.append(
" ".join(
[
f"repo:{repo}",
"is:issue",
f"updated:>={since_date}",
quote_label(kind_label),
]
)
)
return queries
for owner_label in owner_labels:
for kind_label in kind_labels:
queries.append(
" ".join(
[
f"repo:{repo}",
"is:issue",
f"updated:>={since_date}",
quote_label(owner_label),
quote_label(kind_label),
]
)
)
return queries
def _format_gh_error(cmd, err):
stdout = (err.stdout or "").strip()
stderr = (err.stderr or "").strip()
parts = [f"GitHub CLI command failed: {' '.join(cmd)}"]
if stdout:
parts.append(f"stdout: {stdout}")
if stderr:
parts.append(f"stderr: {stderr}")
return "\n".join(parts)
def gh_json(args):
cmd = ["gh", *args]
try:
proc = subprocess.run(cmd, check=True, capture_output=True, text=True)
except FileNotFoundError as err:
raise GhCommandError("`gh` command not found") from err
except subprocess.CalledProcessError as err:
raise GhCommandError(_format_gh_error(cmd, err)) from err
raw = proc.stdout.strip()
if not raw:
return None
try:
return json.loads(raw)
except json.JSONDecodeError as err:
raise GhCommandError(
f"Failed to parse JSON from gh output for {' '.join(args)}"
) from err
def gh_text(args):
cmd = ["gh", *args]
try:
proc = subprocess.run(cmd, check=True, capture_output=True, text=True)
except (FileNotFoundError, subprocess.CalledProcessError):
return ""
return proc.stdout.strip()
def git_head():
try:
proc = subprocess.run(
["git", "rev-parse", "--short=12", "HEAD"],
check=True,
capture_output=True,
text=True,
)
except (FileNotFoundError, subprocess.CalledProcessError):
return None
return proc.stdout.strip() or None
def skill_relative_path():
try:
return str(Path(__file__).resolve().relative_to(Path.cwd().resolve()))
except ValueError:
return str(Path(__file__).resolve())
def gh_api_list_paginated(endpoint, per_page=100, max_pages=None, with_metadata=False):
items = []
page = 1
truncated = False
while True:
sep = "&" if "?" in endpoint else "?"
page_endpoint = f"{endpoint}{sep}per_page={per_page}&page={page}"
payload = gh_json(["api", page_endpoint])
if payload is None:
break
if not isinstance(payload, list):
raise GhCommandError(f"Unexpected paginated payload from gh api {endpoint}")
items.extend(payload)
if len(payload) < per_page:
break
if max_pages is not None and page >= max_pages:
truncated = True
break
page += 1
if with_metadata:
return {
"items": items,
"truncated": truncated,
"pages": page,
"max_pages": max_pages,
}
return items
def search_issue_numbers(queries, limit):
numbers = {}
for query in queries:
page = 1
seen_for_query = 0
while True:
payload = gh_json(
[
"api",
"search/issues",
"-X",
"GET",
"-f",
f"q={query}",
"-f",
"sort=updated",
"-f",
"order=desc",
"-f",
"per_page=100",
"-f",
f"page={page}",
]
)
if not isinstance(payload, dict):
raise GhCommandError("Unexpected payload from GitHub issue search")
items = payload.get("items") or []
if not isinstance(items, list):
raise GhCommandError("Expected search `items` to be a list")
for item in items:
if not isinstance(item, dict):
continue
number = item.get("number")
if isinstance(number, int):
numbers[number] = str(item.get("updated_at") or "")
seen_for_query += 1
if len(items) < 100 or seen_for_query >= limit:
break
page += 1
ordered = sorted(
numbers, key=lambda number: (numbers[number], number), reverse=True
)
return ordered[:limit]
def fetch_issue(repo, number):
payload = gh_json(["api", f"repos/{repo}/issues/{number}"])
if not isinstance(payload, dict):
raise GhCommandError(f"Unexpected issue payload for #{number}")
return payload
def fetch_comments(repo, number, since=None, max_pages=None):
endpoint = f"repos/{repo}/issues/{number}/comments"
if since is not None:
endpoint = f"{endpoint}?since={quote(format_timestamp(since), safe='')}"
return gh_api_list_paginated(
endpoint,
max_pages=max_pages,
with_metadata=True,
)
def fetch_reactions_for_item(endpoint, item):
if reaction_summary(item)["total"] <= 0:
return []
return gh_api_list_paginated(endpoint)
def fetch_comment_reactions(repo, comments):
reactions_by_comment_id = {}
for comment in comments:
comment_id = comment.get("id")
if comment_id in (None, ""):
continue
endpoint = f"repos/{repo}/issues/comments/{comment_id}/reactions"
reactions_by_comment_id[comment_id] = fetch_reactions_for_item(
endpoint, comment
)
return reactions_by_comment_id
def extract_login(user_obj):
if isinstance(user_obj, dict):
return str(user_obj.get("login") or "")
return ""
def is_bot_login(login):
return bool(login) and login.lower().endswith("[bot]")
def is_human_user(user_obj):
login = extract_login(user_obj)
return bool(login) and not is_bot_login(login)
def label_names(issue):
labels = []
for label in issue.get("labels") or []:
if isinstance(label, dict) and label.get("name"):
labels.append(str(label["name"]))
return sorted(labels, key=str.casefold)
def matching_labels(labels, requested):
labels_by_key = {label.casefold(): label for label in labels}
return [label for label in requested if label.casefold() in labels_by_key]
def area_labels(labels):
kind_keys = {label.casefold() for label in QUALIFYING_KIND_LABELS}
return [label for label in labels if label.casefold() not in kind_keys]
def attention_thresholds_for_window(window_hours):
if window_hours <= 0:
raise ValueError("window_hours must be > 0")
window_hours = round(window_hours, 6)
scale = window_hours / BASE_ATTENTION_WINDOW_HOURS
elevated = max(1, math.ceil(ONE_ATTENTION_INTERACTION_THRESHOLD * scale))
very_high = max(
elevated + 1, math.ceil(TWO_ATTENTION_INTERACTION_THRESHOLD * scale)
)
return {
"base_window_hours": BASE_ATTENTION_WINDOW_HOURS,
"window_hours": round(window_hours, 3),
"scale": round(scale, 3),
"elevated": elevated,
"very_high": very_high,
}
def attention_level_for(user_interactions, attention_thresholds=None):
thresholds = attention_thresholds or attention_thresholds_for_window(
BASE_ATTENTION_WINDOW_HOURS
)
if user_interactions >= thresholds["very_high"]:
return 2
if user_interactions >= thresholds["elevated"]:
return 1
return 0
def attention_marker_for(user_interactions, attention_thresholds=None):
return "🔥" * attention_level_for(user_interactions, attention_thresholds)
def reaction_summary(item):
reactions = item.get("reactions")
if not isinstance(reactions, dict):
return {"total": 0, "counts": {}}
counts = {}
for key in REACTION_KEYS:
value = reactions.get(key, 0)
if isinstance(value, int) and value:
counts[key] = value
total = reactions.get("total_count")
if not isinstance(total, int):
total = sum(counts.values())
return {"total": total, "counts": counts}
def reaction_event_summary(reactions, since, until):
counts = {}
total = 0
for reaction in reactions or []:
if not isinstance(reaction, dict):
continue
if not is_in_window(str(reaction.get("created_at") or ""), since, until):
continue
if not is_human_user(reaction.get("user")):
continue
content = str(reaction.get("content") or "")
if not content:
continue
counts[content] = counts.get(content, 0) + 1
total += 1
return {
"total": total,
"counts": counts,
"upvotes": counts.get("+1", 0),
}
def compact_text(value, limit):
text = re.sub(r"\s+", " ", str(value or "")).strip()
if limit <= 0:
return ""
if len(text) <= limit:
return text
return f"{text[: max(limit - 1, 0)].rstrip()}..."
def clean_title_for_description(title):
cleaned = re.sub(r"\s+", " ", str(title or "")).strip()
cleaned = re.sub(
r"^(codex(?: desktop| app|\.app| cli)?|desktop|windows codex app)\s*[:,-]\s*",
"",
cleaned,
flags=re.IGNORECASE,
)
cleaned = re.sub(r"^on windows,\s*", "Windows: ", cleaned, flags=re.IGNORECASE)
cleaned = cleaned.strip(" -:;")
return compact_text(cleaned, 80) or "Issue needs owner review"
def issue_description(issue):
return clean_title_for_description(issue.get("title"))
def is_in_window(timestamp, since, until):
parsed = parse_timestamp(timestamp, "timestamp")
if parsed is None:
return False
return since <= parsed < until
def summarize_comment(
comment, comment_chars, reaction_events=None, since=None, until=None
):
reactions = reaction_summary(comment)
new_reactions = (
reaction_event_summary(reaction_events, since, until)
if since is not None and until is not None
else {"total": 0, "counts": {}, "upvotes": 0}
)
human_user_interaction = is_human_user(comment.get("user"))
return {
"id": comment.get("id"),
"author": extract_login(comment.get("user")),
"author_association": str(comment.get("author_association") or ""),
"created_at": str(comment.get("created_at") or ""),
"updated_at": str(comment.get("updated_at") or ""),
"url": str(comment.get("html_url") or ""),
"human_user_interaction": human_user_interaction,
"reactions": reactions["counts"],
"reaction_total": reactions["total"],
"new_reactions": new_reactions["total"],
"new_upvotes": new_reactions["upvotes"],
"new_reaction_counts": new_reactions["counts"],
"body_excerpt": compact_text(comment.get("body"), comment_chars),
}
def summarize_issue(
issue,
comments,
requested_labels,
since,
until,
body_chars,
comment_chars,
issue_reaction_events=None,
comment_reactions_by_id=None,
all_labels=False,
comments_hydration=None,
attention_thresholds=None,
):
labels = label_names(issue)
labels_by_key = {label.casefold() for label in labels}
kind_labels = [
label for label in QUALIFYING_KIND_LABELS if label.casefold() in labels_by_key
]
if all_labels:
owner_labels = area_labels(labels) or ["unlabeled"]
else:
owner_labels = matching_labels(labels, requested_labels)
if not kind_labels or not owner_labels:
return None
updated_at = str(issue.get("updated_at") or "")
if not is_in_window(updated_at, since, until):
return None
new_issue = is_in_window(str(issue.get("created_at") or ""), since, until)
comment_reactions_by_id = comment_reactions_by_id or {}
new_comments = [
summarize_comment(
comment,
comment_chars,
reaction_events=comment_reactions_by_id.get(comment.get("id")),
since=since,
until=until,
)
for comment in comments
if is_in_window(str(comment.get("created_at") or ""), since, until)
]
new_comments.sort(key=lambda item: (item["created_at"], str(item["id"])))
issue_reactions = reaction_summary(issue)
issue_reaction_events_summary = reaction_event_summary(
issue_reaction_events, since, until
)
comment_reaction_events_summary = reaction_event_summary(
[
reaction
for reactions in comment_reactions_by_id.values()
for reaction in reactions
],
since,
until,
)
new_reactions = (
issue_reaction_events_summary["total"]
+ comment_reaction_events_summary["total"]
)
new_upvotes = (
issue_reaction_events_summary["upvotes"]
+ comment_reaction_events_summary["upvotes"]
)
all_comment_reaction_total = sum(
reaction_summary(comment)["total"] for comment in comments
)
new_comment_reaction_total = sum(
comment["reaction_total"] for comment in new_comments
)
new_issue_user_interaction = new_issue and is_human_user(issue.get("user"))
new_comment_user_interactions = sum(
1 for comment in new_comments if comment["human_user_interaction"]
)
user_interactions = (
int(new_issue_user_interaction) + new_comment_user_interactions + new_reactions
)
attention_level = attention_level_for(user_interactions, attention_thresholds)
attention_marker = attention_marker_for(user_interactions, attention_thresholds)
updated_without_visible_new_post = (
not new_issue and not new_comments and new_reactions == 0
)
engagement_score = (
len(new_comments) * 3
+ new_reactions
+ issue_reactions["total"]
+ new_comment_reaction_total
+ min(int(issue.get("comments") or len(comments) or 0), 10)
)
return {
"number": issue.get("number"),
"title": str(issue.get("title") or ""),
"description": issue_description(issue),
"url": str(issue.get("html_url") or ""),
"state": str(issue.get("state") or ""),
"author": extract_login(issue.get("user")),
"author_association": str(issue.get("author_association") or ""),
"created_at": str(issue.get("created_at") or ""),
"updated_at": updated_at,
"labels": labels,
"kind_labels": kind_labels,
"owner_labels": owner_labels,
"comments_total": int(issue.get("comments") or len(comments) or 0),
"comments_hydration": comments_hydration
or {
"fetched": len(comments),
"since": None,
"truncated": False,
"max_pages": None,
},
"issue_reactions": issue_reactions["counts"],
"issue_reaction_total": issue_reactions["total"],
"comment_reaction_total": all_comment_reaction_total,
"new_comment_reaction_total": new_comment_reaction_total,
"new_issue_reactions": issue_reaction_events_summary["total"],
"new_issue_upvotes": issue_reaction_events_summary["upvotes"],
"new_comment_reactions": comment_reaction_events_summary["total"],
"new_comment_upvotes": comment_reaction_events_summary["upvotes"],
"new_reactions": new_reactions,
"new_upvotes": new_upvotes,
"user_interactions": user_interactions,
"attention": attention_level > 0,
"attention_level": attention_level,
"attention_marker": attention_marker,
"engagement_score": engagement_score,
"activity": {
"new_issue": new_issue,
"new_comments": len(new_comments),
"new_human_comments": new_comment_user_interactions,
"new_reactions": new_reactions,
"new_upvotes": new_upvotes,
"updated_without_visible_new_post": updated_without_visible_new_post,
},
"body_excerpt": compact_text(issue.get("body"), body_chars),
"new_comments": new_comments,
}
def count_by_label(issues, labels):
out = {}
for label in labels:
matching = [issue for issue in issues if label in issue["owner_labels"]]
out[label] = {
"issues": len(matching),
"new_issues": sum(
1 for issue in matching if issue["activity"]["new_issue"]
),
"new_comments": sum(
issue["activity"]["new_comments"] for issue in matching
),
}
return out
def count_by_kind(issues):
out = {}
for kind in QUALIFYING_KIND_LABELS:
matching = [issue for issue in issues if kind in issue["kind_labels"]]
out[kind] = {
"issues": len(matching),
"new_issues": sum(
1 for issue in matching if issue["activity"]["new_issue"]
),
"new_comments": sum(
issue["activity"]["new_comments"] for issue in matching
),
}
return out
def hot_items(issues, limit=8):
ranked = sorted(
issues,
key=lambda issue: (
issue["attention"],
issue["attention_level"],
issue["user_interactions"],
issue["engagement_score"],
issue["activity"]["new_comments"],
issue["issue_reaction_total"] + issue["comment_reaction_total"],
issue["updated_at"],
),
reverse=True,
)
return [
{
"number": issue["number"],
"title": issue["title"],
"url": issue["url"],
"owner_labels": issue["owner_labels"],
"kind_labels": issue["kind_labels"],
"attention": issue["attention"],
"attention_level": issue["attention_level"],
"attention_marker": issue["attention_marker"],
"user_interactions": issue["user_interactions"],
"new_reactions": issue["new_reactions"],
"new_upvotes": issue["new_upvotes"],
"engagement_score": issue["engagement_score"],
"new_comments": issue["activity"]["new_comments"],
"reaction_total": issue["issue_reaction_total"]
+ issue["comment_reaction_total"],
}
for issue in ranked[:limit]
if issue["engagement_score"] > 0
]
def ranked_digest_issues(issues):
return sorted(
issues,
key=lambda issue: (
issue["attention"],
issue["attention_level"],
issue["user_interactions"],
issue["engagement_score"],
issue["activity"]["new_comments"],
issue["updated_at"],
),
reverse=True,
)
def digest_rows(issues, limit=10, ref_map=None):
ranked = ranked_digest_issues(issues)
if ref_map is None:
ref_map = {issue["number"]: ref for ref, issue in enumerate(ranked, start=1)}
rows = []
for issue in ranked[:limit]:
ref = ref_map[issue["number"]]
reaction_total = issue["issue_reaction_total"] + issue["comment_reaction_total"]
rows.append(
{
"ref": ref,
"ref_markdown": f"[{ref}]({issue['url']})",
"marker": issue["attention_marker"],
"attention_marker": issue["attention_marker"],
"number": issue["number"],
"description": issue["description"],
"title": issue["title"],
"url": issue["url"],
"area": ", ".join(issue["owner_labels"]),
"kind": ", ".join(issue["kind_labels"]),
"state": issue["state"],
"interactions": issue["user_interactions"],
"user_interactions": issue["user_interactions"],
"new_reactions": issue["new_reactions"],
"new_upvotes": issue["new_upvotes"],
"current_reactions": reaction_total,
}
)
return rows
def issue_ref_markdown(issue, ref_map):
ref = ref_map[issue["number"]]
return f"[{ref}]({issue['url']})"
def summary_inputs(issues, limit=80, ref_map=None):
ranked = ranked_digest_issues(issues)
if ref_map is None:
ref_map = {issue["number"]: ref for ref, issue in enumerate(ranked, start=1)}
rows = []
for issue in ranked[:limit]:
rows.append(
{
"ref": ref_map[issue["number"]],
"ref_markdown": issue_ref_markdown(issue, ref_map),
"number": issue["number"],
"title": issue["title"],
"description": issue["description"],
"url": issue["url"],
"labels": issue["labels"],
"owner_labels": issue["owner_labels"],
"kind_labels": issue["kind_labels"],
"state": issue.get("state", ""),
"attention_marker": issue.get("attention_marker", ""),
"interactions": issue["user_interactions"],
"new_comments": issue["activity"].get("new_comments", 0),
"new_reactions": issue.get("new_reactions", 0),
"new_upvotes": issue.get("new_upvotes", 0),
"current_reactions": issue.get("issue_reaction_total", 0)
+ issue.get("comment_reaction_total", 0),
}
)
return rows
def collect_digest(args):
since, until = resolve_window(args)
window_hours = (until - since).total_seconds() / 3600
attention_thresholds = attention_thresholds_for_window(window_hours)
requested_labels, all_labels = normalize_requested_labels(
args.labels, all_labels=args.all_labels
)
queries = build_search_queries(
args.repo, requested_labels, since, all_labels=all_labels
)
numbers = search_issue_numbers(queries, args.limit_issues)
gh_version_output = gh_text(["--version"])
issues = []
max_comment_pages = None if args.max_comment_pages <= 0 else args.max_comment_pages
for number in numbers:
issue = fetch_issue(args.repo, number)
comments_since = None if args.fetch_all_comments else since
comments_payload = fetch_comments(
args.repo,
number,
since=comments_since,
max_pages=max_comment_pages,
)
comments = comments_payload["items"]
issue_reaction_events = fetch_reactions_for_item(
f"repos/{args.repo}/issues/{number}/reactions", issue
)
comment_reactions_by_id = fetch_comment_reactions(args.repo, comments)
comments_hydration = {
"fetched": len(comments),
"total": int(issue.get("comments") or len(comments) or 0),
"since": format_timestamp(comments_since) if comments_since else None,
"truncated": comments_payload["truncated"],
"max_pages": comments_payload["max_pages"],
"fetch_all_comments": args.fetch_all_comments,
}
summary = summarize_issue(
issue,
comments,
requested_labels,
since,
until,
args.body_chars,
args.comment_chars,
issue_reaction_events=issue_reaction_events,
comment_reactions_by_id=comment_reactions_by_id,
all_labels=all_labels,
comments_hydration=comments_hydration,
attention_thresholds=attention_thresholds,
)
if summary is not None:
issues.append(summary)
issues.sort(
key=lambda issue: (issue["updated_at"], int(issue["number"] or 0)), reverse=True
)
totals = {
"candidate_issues": len(numbers),
"included_issues": len(issues),
"new_issues": sum(1 for issue in issues if issue["activity"]["new_issue"]),
"issues_with_new_comments": sum(
1 for issue in issues if issue["activity"]["new_comments"] > 0
),
"new_comments": sum(issue["activity"]["new_comments"] for issue in issues),
"comments_fetched": sum(
issue["comments_hydration"]["fetched"] for issue in issues
),
"issues_with_truncated_comment_hydration": sum(
1 for issue in issues if issue["comments_hydration"]["truncated"]
),
"updated_without_visible_new_post": sum(
1
for issue in issues
if issue["activity"]["updated_without_visible_new_post"]
),
"issue_reactions_current_total": sum(
issue["issue_reaction_total"] for issue in issues
),
"comment_reactions_current_total": sum(
issue["comment_reaction_total"] for issue in issues
),
"new_reactions": sum(issue["new_reactions"] for issue in issues),
"new_upvotes": sum(issue["new_upvotes"] for issue in issues),
"user_interactions": sum(issue["user_interactions"] for issue in issues),
}
ranked = ranked_digest_issues(issues)
ref_map = {issue["number"]: ref for ref, issue in enumerate(ranked, start=1)}
filter_label = "all" if all_labels else requested_labels
return {
"generated_at": format_timestamp(datetime.now(timezone.utc)),
"source": {
"repo": args.repo,
"skill": "codex-issue-digest",
"collector": skill_relative_path(),
"script_version": SCRIPT_VERSION,
"git_head": git_head(),
"gh_version": gh_version_output.splitlines()[0]
if gh_version_output
else None,
},
"window": {
"since": format_timestamp(since),
"until": format_timestamp(until),
"hours": round(window_hours, 3),
},
"attention_thresholds": attention_thresholds,
"filters": {
"owner_labels": filter_label,
"all_labels": all_labels,
"kind_labels": list(QUALIFYING_KIND_LABELS),
},
"collection_notes": [
"Issues are selected when they currently have bug or enhancement plus at least one requested owner label and were updated during the window.",
"By default, issue comments are fetched with since=window_start and a max page cap to avoid long historical threads; use --fetch-all-comments when exhaustive comment history is needed.",
"New issue comments are filtered by comment creation time within the window from the fetched comment set.",
"Reaction events are counted by GitHub reaction created_at timestamps for hydrated issues and fetched comments.",
"Current reaction totals are standing engagement signals; new_reactions and new_upvotes are windowed activity.",
"The collector does not assign semantic clusters; use summary_inputs as model-ready evidence for report-time clustering.",
"Pure reaction-only issues may be missed if GitHub issue search does not surface them via updated_at.",
"Issues updated during the window without a new issue body or new comment are retained because label/status edits can still be useful owner signals.",
],
"totals": totals,
"by_owner_label": count_by_label(
issues,
sorted(
{area for issue in issues for area in issue["owner_labels"]},
key=str.casefold,
)
if all_labels
else requested_labels,
),
"by_kind_label": count_by_kind(issues),
"hot_items": hot_items(issues),
"summary_inputs": summary_inputs(issues, ref_map=ref_map),
"digest_rows": digest_rows(issues, ref_map=ref_map),
"issues": issues,
}
def main():
args = parse_args()
try:
digest = collect_digest(args)
except (GhCommandError, RuntimeError, ValueError) as err:
sys.stderr.write(f"collect_issue_digest.py error: {err}\n")
return 1
sys.stdout.write(json.dumps(digest, indent=2, sort_keys=True) + "\n")
return 0
if __name__ == "__main__":
raise SystemExit(main())

View File

@@ -0,0 +1,685 @@
import importlib.util
from datetime import timezone
from pathlib import Path
MODULE_PATH = Path(__file__).with_name("collect_issue_digest.py")
MODULE_SPEC = importlib.util.spec_from_file_location(
"collect_issue_digest", MODULE_PATH
)
collect_issue_digest = importlib.util.module_from_spec(MODULE_SPEC)
assert MODULE_SPEC.loader is not None
MODULE_SPEC.loader.exec_module(collect_issue_digest)
def test_build_search_queries_uses_each_owner_and_kind_label():
since = collect_issue_digest.parse_timestamp("2026-04-25T12:34:56Z", "--since")
queries = collect_issue_digest.build_search_queries(
"openai/codex", ["tui", "exec"], since
)
assert queries == [
"repo:openai/codex is:issue updated:>=2026-04-25 label:tui label:bug",
"repo:openai/codex is:issue updated:>=2026-04-25 label:tui label:enhancement",
"repo:openai/codex is:issue updated:>=2026-04-25 label:exec label:bug",
"repo:openai/codex is:issue updated:>=2026-04-25 label:exec label:enhancement",
]
def test_build_search_queries_can_scan_all_labels():
since = collect_issue_digest.parse_timestamp("2026-04-25T12:34:56Z", "--since")
queries = collect_issue_digest.build_search_queries(
"openai/codex", [], since, all_labels=True
)
assert queries == [
"repo:openai/codex is:issue updated:>=2026-04-25 label:bug",
"repo:openai/codex is:issue updated:>=2026-04-25 label:enhancement",
]
def test_normalize_requested_labels_accepts_all_area_phrases():
assert collect_issue_digest.normalize_requested_labels(["all", "areas"]) == (
[],
True,
)
assert collect_issue_digest.normalize_requested_labels(["all-labels"]) == (
[],
True,
)
def test_search_issue_numbers_requests_updated_sort(monkeypatch):
calls = []
def fake_gh_json(args):
calls.append(args)
return {
"items": [
{"number": 1, "updated_at": "2026-04-25T00:00:00Z"},
]
}
monkeypatch.setattr(collect_issue_digest, "gh_json", fake_gh_json)
assert collect_issue_digest.search_issue_numbers(["query"], limit=10) == [1]
assert "-f" in calls[0]
assert "sort=updated" in calls[0]
assert "order=desc" in calls[0]
def test_search_issue_numbers_applies_limit_per_query(monkeypatch):
calls = []
def fake_gh_json(args):
calls.append(args)
query = next(
value.removeprefix("q=") for value in args if value.startswith("q=")
)
page = int(
next(
value.removeprefix("page=")
for value in args
if value.startswith("page=")
)
)
base = 10_000 if query == "first" else 20_000
offset = (page - 1) * 100
return {
"items": [
{
"number": base + offset + idx,
"updated_at": f"2026-04-25T00:{idx:02d}:00Z",
}
for idx in range(100)
]
}
monkeypatch.setattr(collect_issue_digest, "gh_json", fake_gh_json)
collect_issue_digest.search_issue_numbers(["first", "second"], limit=150)
queried_pages = [
(
next(
value.removeprefix("q=") for value in args if value.startswith("q=")
),
next(
value.removeprefix("page=")
for value in args
if value.startswith("page=")
),
)
for args in calls
]
assert queried_pages == [
("first", "1"),
("first", "2"),
("second", "1"),
("second", "2"),
]
def test_summarize_issue_keeps_new_comments_and_reaction_signals():
since = collect_issue_digest.parse_timestamp("2026-04-25T00:00:00Z", "--since")
until = collect_issue_digest.parse_timestamp("2026-04-26T00:00:00Z", "--until")
issue = {
"number": 123,
"title": "TUI does not redraw",
"html_url": "https://github.com/openai/codex/issues/123",
"state": "open",
"created_at": "2026-04-24T20:00:00Z",
"updated_at": "2026-04-25T10:00:00Z",
"user": {"login": "alice"},
"author_association": "NONE",
"comments": 2,
"body": "The terminal freezes after resize.",
"labels": [{"name": "bug"}, {"name": "tui"}],
"reactions": {"total_count": 3, "+1": 2, "rocket": 1},
}
comments = [
{
"id": 1,
"created_at": "2026-04-25T11:00:00Z",
"updated_at": "2026-04-25T11:00:00Z",
"html_url": "https://github.com/openai/codex/issues/123#issuecomment-1",
"user": {"login": "bob"},
"author_association": "MEMBER",
"body": "I can reproduce this on main.",
"reactions": {"total_count": 4, "heart": 1, "+1": 3},
},
{
"id": 2,
"created_at": "2026-04-24T11:00:00Z",
"updated_at": "2026-04-24T11:00:00Z",
"html_url": "https://github.com/openai/codex/issues/123#issuecomment-2",
"user": {"login": "carol"},
"author_association": "NONE",
"body": "Older comment.",
"reactions": {"total_count": 1, "eyes": 1},
},
]
summary = collect_issue_digest.summarize_issue(
issue,
comments,
["tui", "exec"],
since,
until,
body_chars=200,
comment_chars=200,
)
assert summary == {
"number": 123,
"title": "TUI does not redraw",
"description": "TUI does not redraw",
"url": "https://github.com/openai/codex/issues/123",
"state": "open",
"author": "alice",
"author_association": "NONE",
"created_at": "2026-04-24T20:00:00Z",
"updated_at": "2026-04-25T10:00:00Z",
"labels": ["bug", "tui"],
"kind_labels": ["bug"],
"owner_labels": ["tui"],
"comments_total": 2,
"comments_hydration": {
"fetched": 2,
"since": None,
"truncated": False,
"max_pages": None,
},
"issue_reactions": {"+1": 2, "rocket": 1},
"issue_reaction_total": 3,
"comment_reaction_total": 5,
"new_comment_reaction_total": 4,
"new_issue_reactions": 0,
"new_issue_upvotes": 0,
"new_comment_reactions": 0,
"new_comment_upvotes": 0,
"new_reactions": 0,
"new_upvotes": 0,
"user_interactions": 1,
"attention": False,
"attention_level": 0,
"attention_marker": "",
"engagement_score": 12,
"activity": {
"new_issue": False,
"new_comments": 1,
"new_human_comments": 1,
"new_reactions": 0,
"new_upvotes": 0,
"updated_without_visible_new_post": False,
},
"body_excerpt": "The terminal freezes after resize.",
"new_comments": [
{
"id": 1,
"author": "bob",
"author_association": "MEMBER",
"created_at": "2026-04-25T11:00:00Z",
"updated_at": "2026-04-25T11:00:00Z",
"url": "https://github.com/openai/codex/issues/123#issuecomment-1",
"human_user_interaction": True,
"reactions": {"+1": 3, "heart": 1},
"reaction_total": 4,
"new_reactions": 0,
"new_upvotes": 0,
"new_reaction_counts": {},
"body_excerpt": "I can reproduce this on main.",
}
],
}
def test_summarize_issue_filters_non_owner_or_non_kind_labels():
since = collect_issue_digest.parse_timestamp("2026-04-25T00:00:00Z", "--since")
until = collect_issue_digest.parse_timestamp("2026-04-26T00:00:00Z", "--until")
base_issue = {
"number": 1,
"title": "Question",
"created_at": "2026-04-25T01:00:00Z",
"updated_at": "2026-04-25T01:00:00Z",
"labels": [{"name": "question"}, {"name": "tui"}],
}
assert (
collect_issue_digest.summarize_issue(
base_issue,
[],
["tui"],
since,
until,
body_chars=100,
comment_chars=100,
)
is None
)
issue_without_owner = dict(base_issue)
issue_without_owner["labels"] = [{"name": "bug"}, {"name": "app"}]
assert (
collect_issue_digest.summarize_issue(
issue_without_owner,
[],
["tui"],
since,
until,
body_chars=100,
comment_chars=100,
)
is None
)
def test_resolve_window_defaults_to_previous_hours():
class Args:
since = None
until = "2026-04-26T12:00:00Z"
window_hours = 24
since, until = collect_issue_digest.resolve_window(Args())
assert since.isoformat() == "2026-04-25T12:00:00+00:00"
assert until.tzinfo == timezone.utc
def test_parse_duration_hours_accepts_common_phrases():
assert collect_issue_digest.parse_duration_hours("past week") == 168
assert collect_issue_digest.parse_duration_hours("48h") == 48
assert collect_issue_digest.parse_duration_hours("2 days") == 48
assert collect_issue_digest.parse_duration_hours("1w") == 168
def test_attention_thresholds_scale_by_window_length():
one_day = collect_issue_digest.attention_thresholds_for_window(24)
assert one_day["elevated"] == 5
assert one_day["very_high"] == 10
half_day = collect_issue_digest.attention_thresholds_for_window(12)
assert half_day["elevated"] == 3
assert half_day["very_high"] == 5
week = collect_issue_digest.attention_thresholds_for_window(168)
assert week["elevated"] == 35
assert week["very_high"] == 70
assert collect_issue_digest.attention_marker_for(34, week) == ""
assert collect_issue_digest.attention_marker_for(35, week) == "🔥"
assert collect_issue_digest.attention_marker_for(70, week) == "🔥🔥"
def test_fetch_comments_uses_since_filter_and_page_cap(monkeypatch):
calls = []
def fake_gh_json(args):
calls.append(args)
return [{"id": idx} for idx in range(100)]
monkeypatch.setattr(collect_issue_digest, "gh_json", fake_gh_json)
since = collect_issue_digest.parse_timestamp("2026-04-25T00:00:00Z", "--since")
payload = collect_issue_digest.fetch_comments(
"openai/codex", 123, since=since, max_pages=1
)
assert len(payload["items"]) == 100
assert payload["truncated"] is True
assert payload["max_pages"] == 1
assert calls == [
[
"api",
"repos/openai/codex/issues/123/comments?since=2026-04-25T00%3A00%3A00Z&per_page=100&page=1",
]
]
def test_issue_description_prefers_title_over_body_noise():
issue = {
"title": "Codex.app GUI: MCP child processes not reaped after task completion",
"body": "A later crash mention should not override the title-level symptom.",
"labels": [{"name": "app"}, {"name": "bug"}],
}
description = collect_issue_digest.issue_description(issue)
assert "MCP child processes" in description
assert "crash" not in description.casefold()
def test_attention_markers_count_human_user_interactions():
since = collect_issue_digest.parse_timestamp("2026-04-25T00:00:00Z", "--since")
until = collect_issue_digest.parse_timestamp("2026-04-26T00:00:00Z", "--until")
issue = {
"number": 456,
"title": "Agent context is exploding",
"html_url": "https://github.com/openai/codex/issues/456",
"state": "open",
"created_at": "2026-04-25T01:00:00Z",
"updated_at": "2026-04-25T12:00:00Z",
"user": {"login": "alice"},
"labels": [{"name": "bug"}, {"name": "agent"}],
}
comments = [
{
"id": idx,
"created_at": "2026-04-25T02:00:00Z",
"updated_at": "2026-04-25T02:00:00Z",
"user": {"login": f"user-{idx}"},
"body": "same here",
}
for idx in range(4)
]
comments.append(
{
"id": 99,
"created_at": "2026-04-25T02:00:00Z",
"updated_at": "2026-04-25T02:00:00Z",
"user": {"login": "github-actions[bot]"},
"body": "duplicate bot note",
}
)
summary = collect_issue_digest.summarize_issue(
issue,
comments,
["agent"],
since,
until,
body_chars=100,
comment_chars=100,
)
assert summary["user_interactions"] == 5
assert summary["activity"]["new_human_comments"] == 4
assert summary["attention"] is True
assert summary["attention_level"] == 1
assert summary["attention_marker"] == "🔥"
issue["created_at"] = "2026-04-24T01:00:00Z"
comments.extend(
{
"id": idx,
"created_at": "2026-04-25T03:00:00Z",
"updated_at": "2026-04-25T03:00:00Z",
"user": {"login": f"extra-user-{idx}"},
"body": "also seeing this",
}
for idx in range(100, 106)
)
summary = collect_issue_digest.summarize_issue(
issue,
comments,
["agent"],
since,
until,
body_chars=100,
comment_chars=100,
)
assert summary["user_interactions"] == 10
assert summary["attention_level"] == 2
assert summary["attention_marker"] == "🔥🔥"
def test_reactions_count_toward_attention_markers():
since = collect_issue_digest.parse_timestamp("2026-04-25T00:00:00Z", "--since")
until = collect_issue_digest.parse_timestamp("2026-04-26T00:00:00Z", "--until")
issue = {
"number": 789,
"title": "Support 1M token context",
"html_url": "https://github.com/openai/codex/issues/789",
"state": "open",
"created_at": "2026-04-24T01:00:00Z",
"updated_at": "2026-04-25T12:00:00Z",
"user": {"login": "alice"},
"labels": [{"name": "enhancement"}, {"name": "context"}],
"reactions": {"total_count": 20, "+1": 20},
}
comments = [
{
"id": 1,
"created_at": "2026-04-25T02:00:00Z",
"updated_at": "2026-04-25T02:00:00Z",
"user": {"login": "commenter"},
"body": "please",
"reactions": {"total_count": 2, "+1": 2},
}
]
issue_reactions = [
{
"content": "+1",
"created_at": "2026-04-25T03:00:00Z",
"user": {"login": f"reactor-{idx}"},
}
for idx in range(18)
]
comment_reactions_by_id = {
1: [
{
"content": "heart",
"created_at": "2026-04-25T04:00:00Z",
"user": {"login": "human-reactor"},
},
{
"content": "+1",
"created_at": "2026-04-25T04:00:00Z",
"user": {"login": "github-actions[bot]"},
},
]
}
summary = collect_issue_digest.summarize_issue(
issue,
comments,
["context"],
since,
until,
body_chars=100,
comment_chars=100,
issue_reaction_events=issue_reactions,
comment_reactions_by_id=comment_reactions_by_id,
)
assert summary["new_reactions"] == 19
assert summary["new_upvotes"] == 18
assert summary["user_interactions"] == 20
assert summary["attention_level"] == 2
assert summary["attention_marker"] == "🔥🔥"
assert summary["new_comments"][0]["new_reactions"] == 1
assert summary["new_comments"][0]["new_upvotes"] == 0
def test_digest_rows_are_table_ready_with_concise_descriptions():
rows = collect_issue_digest.digest_rows(
[
{
"number": 1,
"title": "Quiet bug",
"description": "Quiet bug",
"url": "https://github.com/openai/codex/issues/1",
"owner_labels": ["context"],
"kind_labels": ["bug"],
"state": "open",
"attention": False,
"attention_level": 0,
"attention_marker": "",
"user_interactions": 1,
"new_reactions": 0,
"new_upvotes": 0,
"engagement_score": 3,
"issue_reaction_total": 0,
"comment_reaction_total": 0,
"updated_at": "2026-04-25T01:00:00Z",
"activity": {
"new_issue": True,
"new_comments": 0,
"new_reactions": 0,
"updated_without_visible_new_post": False,
},
},
{
"number": 2,
"title": "Busy bug",
"description": "High-volume bug report",
"url": "https://github.com/openai/codex/issues/2",
"owner_labels": ["agent"],
"kind_labels": ["bug"],
"state": "open",
"attention": True,
"attention_level": 1,
"attention_marker": "🔥",
"user_interactions": 17,
"new_reactions": 3,
"new_upvotes": 2,
"engagement_score": 20,
"issue_reaction_total": 5,
"comment_reaction_total": 2,
"updated_at": "2026-04-25T02:00:00Z",
"activity": {
"new_issue": False,
"new_comments": 16,
"new_reactions": 3,
"updated_without_visible_new_post": False,
},
},
]
)
assert rows[0] == {
"ref": 1,
"ref_markdown": "[1](https://github.com/openai/codex/issues/2)",
"marker": "🔥",
"attention_marker": "🔥",
"number": 2,
"description": "High-volume bug report",
"title": "Busy bug",
"url": "https://github.com/openai/codex/issues/2",
"area": "agent",
"kind": "bug",
"state": "open",
"interactions": 17,
"user_interactions": 17,
"new_reactions": 3,
"new_upvotes": 2,
"current_reactions": 7,
}
def test_summary_inputs_are_model_ready_without_preclustering():
issues = [
{
"number": 20,
"title": "Windows app Browser Use external navigation fails",
"description": "Browser Use navigation or app-server failure",
"url": "https://github.com/openai/codex/issues/20",
"labels": ["app", "bug"],
"owner_labels": ["app"],
"kind_labels": ["bug"],
"attention": False,
"attention_level": 0,
"attention_marker": "",
"user_interactions": 3,
"new_reactions": 1,
"engagement_score": 8,
"updated_at": "2026-04-25T04:00:00Z",
"activity": {"new_comments": 2},
},
{
"number": 21,
"title": "On Windows, cmake output waits until timeout",
"description": "Windows command timeout/capture problem",
"url": "https://github.com/openai/codex/issues/21",
"labels": ["app", "bug"],
"owner_labels": ["app"],
"kind_labels": ["bug"],
"attention": False,
"attention_level": 0,
"attention_marker": "",
"user_interactions": 3,
"new_reactions": 0,
"engagement_score": 7,
"updated_at": "2026-04-25T03:00:00Z",
"activity": {"new_comments": 3},
},
{
"number": 22,
"title": "Windows computer use tool fails to click buttons",
"description": "Computer-use workflow failure",
"url": "https://github.com/openai/codex/issues/22",
"labels": ["app", "bug"],
"owner_labels": ["app"],
"kind_labels": ["bug"],
"attention": False,
"attention_level": 0,
"attention_marker": "",
"user_interactions": 3,
"new_reactions": 0,
"engagement_score": 6,
"updated_at": "2026-04-25T02:00:00Z",
"activity": {"new_comments": 3},
},
]
rows = collect_issue_digest.summary_inputs(issues, ref_map={20: 1, 21: 2, 22: 3})
assert rows == [
{
"ref": 1,
"ref_markdown": "[1](https://github.com/openai/codex/issues/20)",
"number": 20,
"title": "Windows app Browser Use external navigation fails",
"description": "Browser Use navigation or app-server failure",
"url": "https://github.com/openai/codex/issues/20",
"labels": ["app", "bug"],
"owner_labels": ["app"],
"kind_labels": ["bug"],
"state": "",
"attention_marker": "",
"interactions": 3,
"new_comments": 2,
"new_reactions": 1,
"new_upvotes": 0,
"current_reactions": 0,
},
{
"ref": 2,
"ref_markdown": "[2](https://github.com/openai/codex/issues/21)",
"number": 21,
"title": "On Windows, cmake output waits until timeout",
"description": "Windows command timeout/capture problem",
"url": "https://github.com/openai/codex/issues/21",
"labels": ["app", "bug"],
"owner_labels": ["app"],
"kind_labels": ["bug"],
"state": "",
"attention_marker": "",
"interactions": 3,
"new_comments": 3,
"new_reactions": 0,
"new_upvotes": 0,
"current_reactions": 0,
},
{
"ref": 3,
"ref_markdown": "[3](https://github.com/openai/codex/issues/22)",
"number": 22,
"title": "Windows computer use tool fails to click buttons",
"description": "Computer-use workflow failure",
"url": "https://github.com/openai/codex/issues/22",
"labels": ["app", "bug"],
"owner_labels": ["app"],
"kind_labels": ["bug"],
"state": "",
"attention_marker": "",
"interactions": 3,
"new_comments": 3,
"new_reactions": 0,
"new_upvotes": 0,
"current_reactions": 0,
},
]

View File

@@ -4,9 +4,11 @@ ARG TZ
ARG DEBIAN_FRONTEND=noninteractive
ARG NODE_MAJOR=22
ARG RUST_TOOLCHAIN=1.92.0
ARG CODEX_NPM_VERSION=latest
# Keep this in sync with .devcontainer/codex-install/package.json and pnpm-lock.yaml.
ARG CODEX_NPM_VERSION=0.121.0
ENV TZ="$TZ"
ENV COREPACK_ENABLE_DOWNLOAD_PROMPT=0
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
@@ -43,12 +45,18 @@ RUN apt-get update \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
COPY .devcontainer/codex-install/package.json \
.devcontainer/codex-install/pnpm-lock.yaml \
.devcontainer/codex-install/pnpm-workspace.yaml \
/opt/codex-install/
RUN curl -fsSL "https://deb.nodesource.com/setup_${NODE_MAJOR}.x" | bash - \
&& apt-get update \
&& apt-get install -y --no-install-recommends nodejs \
&& npm install -g corepack@latest "@openai/codex@${CODEX_NPM_VERSION}" \
&& corepack enable \
&& corepack prepare pnpm@10.28.2 --activate \
&& test "$(node -p "require('/opt/codex-install/package.json').dependencies['@openai/codex']")" = "${CODEX_NPM_VERSION}" \
&& cd /opt/codex-install \
&& corepack pnpm install --prod --frozen-lockfile \
&& ln -s /opt/codex-install/node_modules/.bin/codex /usr/local/bin/codex \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

View File

@@ -0,0 +1,13 @@
{
"name": "codex-devcontainer-install",
"private": true,
"description": "Locked Codex CLI install boundary for the secure devcontainer.",
"dependencies": {
"@openai/codex": "0.121.0"
},
"engines": {
"node": ">=22",
"pnpm": ">=10.33.0"
},
"packageManager": "pnpm@10.33.0+sha512.10568bb4a6afb58c9eb3630da90cc9516417abebd3fabbe6739f0ae795728da1491e9db5a544c76ad8eb7570f5c4bb3d6c637b2cb41bfdcdb47fa823c8649319"
}

View File

@@ -0,0 +1,85 @@
lockfileVersion: '9.0'
settings:
autoInstallPeers: true
excludeLinksFromLockfile: false
importers:
.:
dependencies:
'@openai/codex':
specifier: 0.121.0
version: 0.121.0
packages:
'@openai/codex@0.121.0':
resolution: {integrity: sha512-kCJ2NeATd4QBQRmqV04ymdN1ZU3MSwnJQDm/KzjpuzGvCuUVEn7no/T2mRyxQ2x77AACqriNOyPPoM/yufyvNg==}
engines: {node: '>=16'}
hasBin: true
'@openai/codex@0.121.0-darwin-arm64':
resolution: {integrity: sha512-ZyBqIB6Fb4I0hGb/h65Vu7ePYjHSmGiqqfm+/1djEuxDPkqjfi4wkxYxNYNY+6najyNGN4UijOSTTf19eDCrqw==}
engines: {node: '>=16'}
cpu: [arm64]
os: [darwin]
'@openai/codex@0.121.0-darwin-x64':
resolution: {integrity: sha512-1/OAtdkAZ5yPI3xqaEFlHuPziS1yCqL2gOZdswE7HTmmwpIxi6Z3FCo60JWDPluIp89z4tftdjq73/OCN0YVcw==}
engines: {node: '>=16'}
cpu: [x64]
os: [darwin]
'@openai/codex@0.121.0-linux-arm64':
resolution: {integrity: sha512-2UgMmdo237o7SCMsfb529cOSEM2HFUgN6OBkv5SBLwfNY1NO2Ex6JnUjlppEXlX6/4cXfZ5qjDghVz5j/+B9zw==}
engines: {node: '>=16'}
cpu: [arm64]
os: [linux]
'@openai/codex@0.121.0-linux-x64':
resolution: {integrity: sha512-vlpNJXIqss800J+32Vy7TUZzv31n61b45OLxmsVQGFkTNLJcjFrj9jDUC7I62eC4F16gLioilefNfv4CdJQOEw==}
engines: {node: '>=16'}
cpu: [x64]
os: [linux]
'@openai/codex@0.121.0-win32-arm64':
resolution: {integrity: sha512-m88q4f3XI5npn1t6OG0nWGHWWAjO5FgjRwxh4hdujbLO6t9CiCNfhfPZIOSsoATbrCNwLC+6S77m3cjbNToPNg==}
engines: {node: '>=16'}
cpu: [arm64]
os: [win32]
'@openai/codex@0.121.0-win32-x64':
resolution: {integrity: sha512-Fp0ecVOyM+VcBi/y4HVvRzhifO9YqRiHzhV3rhtAppC7flh22WPguLC4kmvXYAR0p3RPzbo35M2CedWnkOT+cw==}
engines: {node: '>=16'}
cpu: [x64]
os: [win32]
snapshots:
'@openai/codex@0.121.0':
optionalDependencies:
'@openai/codex-darwin-arm64': '@openai/codex@0.121.0-darwin-arm64'
'@openai/codex-darwin-x64': '@openai/codex@0.121.0-darwin-x64'
'@openai/codex-linux-arm64': '@openai/codex@0.121.0-linux-arm64'
'@openai/codex-linux-x64': '@openai/codex@0.121.0-linux-x64'
'@openai/codex-win32-arm64': '@openai/codex@0.121.0-win32-arm64'
'@openai/codex-win32-x64': '@openai/codex@0.121.0-win32-x64'
'@openai/codex@0.121.0-darwin-arm64':
optional: true
'@openai/codex@0.121.0-darwin-x64':
optional: true
'@openai/codex@0.121.0-linux-arm64':
optional: true
'@openai/codex@0.121.0-linux-x64':
optional: true
'@openai/codex@0.121.0-win32-arm64':
optional: true
'@openai/codex@0.121.0-win32-x64':
optional: true

View File

@@ -0,0 +1,12 @@
packages:
- "."
minimumReleaseAge: 10080
minimumReleaseAgeExclude: []
blockExoticSubdeps: true
strictDepBuilds: true
trustPolicy: no-downgrade
trustPolicyIgnoreAfter: 10080
trustPolicyExclude: []
allowBuilds: {}

View File

@@ -8,7 +8,7 @@
"TZ": "${localEnv:TZ:UTC}",
"NODE_MAJOR": "22",
"RUST_TOOLCHAIN": "1.92.0",
"CODEX_NPM_VERSION": "latest"
"CODEX_NPM_VERSION": "0.121.0"
}
},
"runArgs": [

1
.gitattributes vendored
View File

@@ -1 +1,2 @@
codex-rs/app-server-protocol/schema/** linguist-generated
codex-rs/hooks/schema/generated/** linguist-generated

View File

@@ -7,6 +7,9 @@ inputs:
artifacts-dir:
description: Absolute path to the directory containing built binaries to sign.
required: true
binaries:
description: Space-delimited binary basenames to sign.
default: "codex codex-responses-api-proxy"
runs:
using: composite
@@ -18,6 +21,7 @@ runs:
shell: bash
env:
ARTIFACTS_DIR: ${{ inputs.artifacts-dir }}
BINARIES: ${{ inputs.binaries }}
COSIGN_EXPERIMENTAL: "1"
COSIGN_YES: "true"
COSIGN_OIDC_CLIENT_ID: "sigstore"
@@ -31,7 +35,7 @@ runs:
exit 1
fi
for binary in codex codex-responses-api-proxy; do
for binary in ${BINARIES}; do
artifact="${dest}/${binary}"
if [[ ! -f "$artifact" ]]; then
echo "Binary $artifact not found"

View File

@@ -4,6 +4,9 @@ inputs:
target:
description: Rust compilation target triple (e.g. aarch64-apple-darwin).
required: true
binaries:
description: Space-delimited binary basenames to sign and notarize.
default: "codex codex-responses-api-proxy"
sign-binaries:
description: Whether to sign and notarize the macOS binaries.
required: false
@@ -119,6 +122,7 @@ runs:
shell: bash
env:
TARGET: ${{ inputs.target }}
BINARIES: ${{ inputs.binaries }}
run: |
set -euo pipefail
@@ -134,7 +138,7 @@ runs:
entitlements_path="$GITHUB_ACTION_PATH/codex.entitlements.plist"
for binary in codex codex-responses-api-proxy; do
for binary in ${BINARIES}; do
path="codex-rs/target/${TARGET}/release/${binary}"
codesign --force --options runtime --timestamp --entitlements "$entitlements_path" --sign "$APPLE_CODESIGN_IDENTITY" "${keychain_args[@]}" "$path"
done
@@ -144,6 +148,7 @@ runs:
shell: bash
env:
TARGET: ${{ inputs.target }}
BINARIES: ${{ inputs.binaries }}
APPLE_NOTARIZATION_KEY_P8: ${{ inputs.apple-notarization-key-p8 }}
APPLE_NOTARIZATION_KEY_ID: ${{ inputs.apple-notarization-key-id }}
APPLE_NOTARIZATION_ISSUER_ID: ${{ inputs.apple-notarization-issuer-id }}
@@ -182,8 +187,9 @@ runs:
notarize_submission "$binary" "$archive_path" "$notary_key_path"
}
notarize_binary "codex"
notarize_binary "codex-responses-api-proxy"
for binary in ${BINARIES}; do
notarize_binary "${binary}"
done
- name: Sign and notarize macOS dmg
if: ${{ inputs.sign-dmg == 'true' }}

View File

@@ -8,7 +8,7 @@ inputs:
description: Logical namespace used to keep concurrent Bazel jobs from reserving the same repository cache key.
required: true
install-test-prereqs:
description: Install Node.js and DotSlash for Bazel-backed test jobs.
description: Install DotSlash for Bazel-backed test jobs.
required: false
default: "false"
outputs:

View File

@@ -5,7 +5,7 @@ inputs:
description: Target triple used for cache namespacing.
required: true
install-test-prereqs:
description: Install Node.js and DotSlash for Bazel-backed test jobs.
description: Install DotSlash for Bazel-backed test jobs.
required: false
default: "false"
outputs:
@@ -16,12 +16,6 @@ outputs:
runs:
using: composite
steps:
- name: Set up Node.js for js_repl tests
if: inputs.install-test-prereqs == 'true'
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6
with:
node-version-file: codex-rs/node-version.txt
# Some integration tests rely on DotSlash being installed.
# See https://github.com/openai/codex/pull/7617.
- name: Install DotSlash
@@ -39,7 +33,7 @@ runs:
run: Copy-Item (Get-Command dotslash).Source -Destination "$env:LOCALAPPDATA\Microsoft\WindowsApps\dotslash.exe"
- name: Set up Bazel
uses: bazelbuild/setup-bazelisk@b39c379c82683a5f25d34f0d062761f62693e0b2 # v3
uses: bazel-contrib/setup-bazel@c5acdfb288317d0b5c0bbd7a396a3dc868bb0f86 # 0.19.0
- name: Configure Bazel repository cache
id: configure_bazel_repository_cache
@@ -122,6 +116,11 @@ runs:
}
}
- name: Compute cache-stable Windows Bazel PATH
if: runner.os == 'Windows'
shell: pwsh
run: ./.github/scripts/compute-bazel-windows-path.ps1
- name: Enable Git long paths (Windows)
if: runner.os == 'Windows'
shell: pwsh

View File

@@ -4,6 +4,9 @@ inputs:
target:
description: Target triple for the artifacts to sign.
required: true
binaries:
description: Space-delimited binary basenames to sign.
default: "codex codex-responses-api-proxy codex-windows-sandbox-setup codex-command-runner"
client-id:
description: Azure Trusted Signing client ID.
required: true
@@ -33,6 +36,23 @@ runs:
tenant-id: ${{ inputs.tenant-id }}
subscription-id: ${{ inputs.subscription-id }}
- name: Prepare file list
id: prepare
shell: bash
env:
TARGET: ${{ inputs.target }}
BINARIES: ${{ inputs.binaries }}
run: |
set -euo pipefail
{
echo "files<<EOF"
for binary in ${BINARIES}; do
echo "${GITHUB_WORKSPACE}/codex-rs/target/${TARGET}/release/${binary}.exe"
done
echo "EOF"
} >> "$GITHUB_OUTPUT"
- name: Sign Windows binaries with Azure Trusted Signing
uses: azure/trusted-signing-action@1d365fec12862c4aa68fcac418143d73f0cea293 # v0
with:
@@ -50,8 +70,4 @@ runs:
exclude-azure-developer-cli-credential: true
exclude-interactive-browser-credential: true
cache-dependencies: false
files: |
${{ github.workspace }}/codex-rs/target/${{ inputs.target }}/release/codex.exe
${{ github.workspace }}/codex-rs/target/${{ inputs.target }}/release/codex-responses-api-proxy.exe
${{ github.workspace }}/codex-rs/target/${{ inputs.target }}/release/codex-windows-sandbox-setup.exe
${{ github.workspace }}/codex-rs/target/${{ inputs.target }}/release/codex-command-runner.exe
files: ${{ steps.prepare.outputs.files }}

View File

@@ -11,11 +11,11 @@
"path": "codex"
},
"linux-x86_64": {
"regex": "^codex-x86_64-unknown-linux-musl\\.zst$",
"regex": "^codex-x86_64-unknown-linux-musl-bundle\\.tar\\.zst$",
"path": "codex"
},
"linux-aarch64": {
"regex": "^codex-aarch64-unknown-linux-musl\\.zst$",
"regex": "^codex-aarch64-unknown-linux-musl-bundle\\.tar\\.zst$",
"path": "codex"
},
"windows-x86_64": {
@@ -28,6 +28,34 @@
}
}
},
"codex-app-server": {
"platforms": {
"macos-aarch64": {
"regex": "^codex-app-server-aarch64-apple-darwin\\.zst$",
"path": "codex-app-server"
},
"macos-x86_64": {
"regex": "^codex-app-server-x86_64-apple-darwin\\.zst$",
"path": "codex-app-server"
},
"linux-x86_64": {
"regex": "^codex-app-server-x86_64-unknown-linux-musl\\.zst$",
"path": "codex-app-server"
},
"linux-aarch64": {
"regex": "^codex-app-server-aarch64-unknown-linux-musl\\.zst$",
"path": "codex-app-server"
},
"windows-x86_64": {
"regex": "^codex-app-server-x86_64-pc-windows-msvc\\.exe\\.zst$",
"path": "codex-app-server.exe"
},
"windows-aarch64": {
"regex": "^codex-app-server-aarch64-pc-windows-msvc\\.exe\\.zst$",
"path": "codex-app-server.exe"
}
}
},
"codex-responses-api-proxy": {
"platforms": {
"macos-aarch64": {
@@ -56,6 +84,18 @@
}
}
},
"bwrap": {
"platforms": {
"linux-x86_64": {
"regex": "^bwrap-x86_64-unknown-linux-musl\\.zst$",
"path": "bwrap"
},
"linux-aarch64": {
"regex": "^bwrap-aarch64-unknown-linux-musl\\.zst$",
"path": "bwrap"
}
}
},
"codex-command-runner": {
"platforms": {
"windows-x86_64": {

View File

@@ -1,6 +1,6 @@
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated "Contributing" markdown file or your PR may be closed:
External code contributions are by invitation only. Please read the dedicated "Contributing" markdown file for details:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text with a detailed and high quality description of your changes.

View File

@@ -0,0 +1,113 @@
<#
BuildBuddy cache keys include the action and test environment, so Bazel should
not inherit the full hosted-runner PATH on Windows. That PATH includes volatile
tool entries, such as Maven, that can change independently of this repo and
cause avoidable cache misses.
This script derives a smaller, cache-stable PATH that keeps the Windows
toolchain entries Bazel-backed CI tasks need: MSVC and Windows SDK paths,
MinGW runtime DLL paths for gnullvm-built tests, Git, PowerShell, Node, Python,
DotSlash, and the standard Windows system directories.
`setup-bazel-ci` runs this after exporting the MSVC environment, and the script
publishes the result via `GITHUB_ENV` as `CODEX_BAZEL_WINDOWS_PATH` so later
steps can pass that explicit PATH to Bazel.
#>
$stablePathEntries = New-Object System.Collections.Generic.List[string]
$seenEntries = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)
$windowsAppsPath = if ([string]::IsNullOrWhiteSpace($env:LOCALAPPDATA)) {
$null
} else {
"$($env:LOCALAPPDATA)\Microsoft\WindowsApps"
}
$windowsDir = if ($env:WINDIR) {
$env:WINDIR
} elseif ($env:SystemRoot) {
$env:SystemRoot
} else {
$null
}
function Add-StablePathEntry {
param([string]$PathEntry)
if ([string]::IsNullOrWhiteSpace($PathEntry)) {
return
}
if ($seenEntries.Add($PathEntry)) {
[void]$stablePathEntries.Add($PathEntry)
}
}
foreach ($pathEntry in ($env:PATH -split ';')) {
if ([string]::IsNullOrWhiteSpace($pathEntry)) {
continue
}
if (
$pathEntry -like '*Microsoft Visual Studio*' -or
$pathEntry -like '*Windows Kits*' -or
$pathEntry -like '*Microsoft SDKs*' -or
$pathEntry -eq 'C:\mingw64\bin' -or
$pathEntry -like 'C:\msys64\*\bin' -or
$pathEntry -like 'C:\Program Files\Git\*' -or
$pathEntry -like 'C:\Program Files\PowerShell\*' -or
$pathEntry -like 'C:\hostedtoolcache\windows\node\*' -or
$pathEntry -like 'C:\hostedtoolcache\windows\Python\*' -or
$pathEntry -eq 'D:\a\_temp\install-dotslash\bin' -or
($windowsDir -and ($pathEntry -eq $windowsDir -or $pathEntry -like "${windowsDir}\*"))
) {
Add-StablePathEntry $pathEntry
}
}
$gitCommand = Get-Command git -ErrorAction SilentlyContinue
if ($gitCommand) {
Add-StablePathEntry (Split-Path $gitCommand.Source -Parent)
}
$nodeCommand = Get-Command node -ErrorAction SilentlyContinue
if ($nodeCommand) {
Add-StablePathEntry (Split-Path $nodeCommand.Source -Parent)
}
$python3Command = Get-Command python3 -ErrorAction SilentlyContinue
if ($python3Command) {
Add-StablePathEntry (Split-Path $python3Command.Source -Parent)
}
$pythonCommand = Get-Command python -ErrorAction SilentlyContinue
if ($pythonCommand) {
Add-StablePathEntry (Split-Path $pythonCommand.Source -Parent)
}
$pwshCommand = Get-Command pwsh -ErrorAction SilentlyContinue
if ($pwshCommand) {
Add-StablePathEntry (Split-Path $pwshCommand.Source -Parent)
}
foreach ($mingwPath in @('C:\mingw64\bin', 'C:\msys64\mingw64\bin', 'C:\msys64\ucrt64\bin')) {
if (Test-Path $mingwPath) {
Add-StablePathEntry $mingwPath
}
}
if ($windowsAppsPath) {
Add-StablePathEntry $windowsAppsPath
}
if ($stablePathEntries.Count -eq 0) {
throw 'Failed to derive cache-stable Windows PATH.'
}
if ([string]::IsNullOrWhiteSpace($env:GITHUB_ENV)) {
throw 'GITHUB_ENV must be set.'
}
$stablePath = $stablePathEntries -join ';'
Write-Host 'Derived CODEX_BAZEL_WINDOWS_PATH entries:'
foreach ($pathEntry in $stablePathEntries) {
Write-Host " $pathEntry"
}
"CODEX_BAZEL_WINDOWS_PATH=$stablePath" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append

View File

@@ -2,16 +2,6 @@
set -euo pipefail
ci_config=ci-linux
case "${RUNNER_OS:-}" in
macOS)
ci_config=ci-macos
;;
Windows)
ci_config=ci-windows
;;
esac
bazel_lint_args=("$@")
if [[ "${RUNNER_OS:-}" == "Windows" ]]; then
has_host_platform_override=0
@@ -44,29 +34,6 @@ if [[ "${RUNNER_OS:-}" == "Windows" ]]; then
bazel_lint_args+=("--skip_incompatible_explicit_targets")
fi
bazel_startup_args=()
if [[ -n "${BAZEL_OUTPUT_USER_ROOT:-}" ]]; then
bazel_startup_args+=("--output_user_root=${BAZEL_OUTPUT_USER_ROOT}")
fi
run_bazel() {
if [[ "${RUNNER_OS:-}" == "Windows" ]]; then
MSYS2_ARG_CONV_EXCL='*' bazel "$@"
return
fi
bazel "$@"
}
run_bazel_with_startup_args() {
if [[ ${#bazel_startup_args[@]} -gt 0 ]]; then
run_bazel "${bazel_startup_args[@]}" "$@"
return
fi
run_bazel "$@"
}
read_query_labels() {
local query="$1"
local query_stdout
@@ -74,12 +41,10 @@ read_query_labels() {
query_stdout="$(mktemp)"
query_stderr="$(mktemp)"
if ! run_bazel_with_startup_args \
--noexperimental_remote_repo_contents_cache \
query \
if ! ./.github/scripts/run-bazel-query-ci.sh \
--keep_going \
--output=label \
"$query" >"$query_stdout" 2>"$query_stderr"; then
-- "$query" >"$query_stdout" 2>"$query_stderr"; then
cat "$query_stderr" >&2
rm -f "$query_stdout" "$query_stderr"
exit 1

View File

@@ -4,9 +4,9 @@ set -euo pipefail
print_failed_bazel_test_logs=0
print_failed_bazel_action_summary=0
use_node_test_env=0
remote_download_toplevel=0
windows_msvc_host_platform=0
windows_cross_compile=0
while [[ $# -gt 0 ]]; do
case "$1" in
@@ -18,10 +18,6 @@ while [[ $# -gt 0 ]]; do
print_failed_bazel_action_summary=1
shift
;;
--use-node-test-env)
use_node_test_env=1
shift
;;
--remote-download-toplevel)
remote_download_toplevel=1
shift
@@ -30,6 +26,10 @@ while [[ $# -gt 0 ]]; do
windows_msvc_host_platform=1
shift
;;
--windows-cross-compile)
windows_cross_compile=1
shift
;;
--)
shift
break
@@ -42,7 +42,7 @@ while [[ $# -gt 0 ]]; do
done
if [[ $# -eq 0 ]]; then
echo "Usage: $0 [--print-failed-test-logs] [--print-failed-action-summary] [--use-node-test-env] [--remote-download-toplevel] [--windows-msvc-host-platform] -- <bazel args> -- <targets>" >&2
echo "Usage: $0 [--print-failed-test-logs] [--print-failed-action-summary] [--remote-download-toplevel] [--windows-msvc-host-platform] [--windows-cross-compile] -- <bazel args> -- <targets>" >&2
exit 1
fi
@@ -66,7 +66,11 @@ case "${RUNNER_OS:-}" in
ci_config=ci-macos
;;
Windows)
ci_config=ci-windows
if [[ $windows_cross_compile -eq 1 ]]; then
ci_config=ci-windows-cross
else
ci_config=ci-windows
fi
;;
esac
@@ -110,8 +114,8 @@ print_bazel_test_log_tails() {
while IFS= read -r target; do
failed_targets+=("$target")
done < <(
grep -E '^FAIL: //' "$console_log" \
| sed -E 's#^FAIL: (//[^ ]+).*#\1#' \
grep -E '^(FAIL: //|ERROR: .* Testing //)' "$console_log" \
| sed -E 's#^FAIL: (//[^ ]+).*#\1#; s#^ERROR: .* Testing (//[^ ]+) failed:.*#\1#' \
| sort -u
)
@@ -249,14 +253,10 @@ if [[ ${#bazel_args[@]} -eq 0 || ${#bazel_targets[@]} -eq 0 ]]; then
exit 1
fi
if [[ $use_node_test_env -eq 1 ]]; then
# Bazel test sandboxes on macOS may resolve an older Homebrew `node`
# before the `actions/setup-node` runtime on PATH.
node_bin="$(which node)"
if [[ "${RUNNER_OS:-}" == "Windows" ]]; then
node_bin="$(cygpath -w "${node_bin}")"
fi
bazel_args+=("--test_env=CODEX_JS_REPL_NODE_PATH=${node_bin}")
if [[ "${RUNNER_OS:-}" == "Windows" && $windows_cross_compile -eq 1 && -z "${BUILDBUDDY_API_KEY:-}" ]]; then
# Fork PRs do not receive the BuildBuddy secret needed for the remote
# cross-compile config. Preserve the previous local Windows build shape.
windows_msvc_host_platform=1
fi
post_config_bazel_args=()
@@ -284,6 +284,25 @@ if [[ $remote_download_toplevel -eq 1 ]]; then
post_config_bazel_args+=(--remote_download_toplevel)
fi
if [[ "${RUNNER_OS:-}" == "Windows" && $windows_cross_compile -eq 1 && -n "${BUILDBUDDY_API_KEY:-}" ]]; then
# `--enable_platform_specific_config` expands `common:windows` on Windows
# hosts after ordinary rc configs, which can override `ci-windows-cross`'s
# RBE host platform. Repeat the host platform on the command line so V8 and
# other genrules execute on Linux RBE workers instead of Git Bash locally.
#
# Bazel also derives the default genrule shell from the client host. Without
# an explicit shell executable, remote Linux actions can be asked to run
# `C:\Program Files\Git\usr\bin\bash.exe`.
post_config_bazel_args+=(--host_platform=//:rbe --shell_executable=/bin/bash)
fi
if [[ "${RUNNER_OS:-}" == "Windows" && $windows_cross_compile -eq 1 && -z "${BUILDBUDDY_API_KEY:-}" ]]; then
# The Windows cross-compile config depends on remote execution. Fork PRs do
# not receive the BuildBuddy secret, so fall back to the existing local build
# shape and keep its lower concurrency cap.
post_config_bazel_args+=(--jobs=8)
fi
if [[ -n "${BAZEL_REPO_CONTENTS_CACHE:-}" ]]; then
# Windows self-hosted runners can run multiple Bazel jobs concurrently. Give
# each job its own repo contents cache so they do not fight over the shared
@@ -302,27 +321,57 @@ if [[ -n "${CODEX_BAZEL_EXECUTION_LOG_COMPACT_DIR:-}" ]]; then
fi
if [[ "${RUNNER_OS:-}" == "Windows" ]]; then
windows_action_env_vars=(
INCLUDE
LIB
LIBPATH
PATH
UCRTVersion
UniversalCRTSdkDir
VCINSTALLDIR
VCToolsInstallDir
WindowsLibPath
WindowsSdkBinPath
WindowsSdkDir
WindowsSDKLibVersion
WindowsSDKVersion
)
pass_windows_build_env=1
if [[ $windows_cross_compile -eq 1 && -n "${BUILDBUDDY_API_KEY:-}" ]]; then
# Remote build actions execute on Linux RBE workers. Passing the Windows
# runner's build environment there makes Bazel genrules try to execute
# C:\Program Files\Git\usr\bin\bash.exe on Linux.
pass_windows_build_env=0
fi
for env_var in "${windows_action_env_vars[@]}"; do
if [[ -n "${!env_var:-}" ]]; then
post_config_bazel_args+=("--action_env=${env_var}" "--host_action_env=${env_var}")
fi
done
if [[ $pass_windows_build_env -eq 1 ]]; then
windows_action_env_vars=(
INCLUDE
LIB
LIBPATH
UCRTVersion
UniversalCRTSdkDir
VCINSTALLDIR
VCToolsInstallDir
WindowsLibPath
WindowsSdkBinPath
WindowsSdkDir
WindowsSDKLibVersion
WindowsSDKVersion
)
for env_var in "${windows_action_env_vars[@]}"; do
if [[ -n "${!env_var:-}" ]]; then
post_config_bazel_args+=("--action_env=${env_var}" "--host_action_env=${env_var}")
fi
done
fi
if [[ -z "${CODEX_BAZEL_WINDOWS_PATH:-}" ]]; then
echo "CODEX_BAZEL_WINDOWS_PATH must be set for Windows Bazel CI." >&2
exit 1
fi
if [[ $pass_windows_build_env -eq 1 ]]; then
post_config_bazel_args+=(
"--action_env=PATH=${CODEX_BAZEL_WINDOWS_PATH}"
"--host_action_env=PATH=${CODEX_BAZEL_WINDOWS_PATH}"
)
elif [[ $windows_cross_compile -eq 1 ]]; then
# Remote build actions run on Linux RBE workers. Give their shell snippets
# a Linux PATH while preserving CODEX_BAZEL_WINDOWS_PATH below for local
# Windows test execution.
post_config_bazel_args+=(
"--action_env=PATH=/usr/bin:/bin"
"--host_action_env=PATH=/usr/bin:/bin"
)
fi
post_config_bazel_args+=("--test_env=PATH=${CODEX_BAZEL_WINDOWS_PATH}")
fi
bazel_console_log="$(mktemp)"

84
.github/scripts/run-bazel-query-ci.sh vendored Executable file
View File

@@ -0,0 +1,84 @@
#!/usr/bin/env bash
set -euo pipefail
# Run Bazel queries with the same CI startup settings as the main build/test
# invocation so target-discovery queries can reuse the same Bazel server.
query_args=()
windows_cross_compile=0
while [[ $# -gt 0 ]]; do
case "$1" in
--windows-cross-compile)
windows_cross_compile=1
shift
;;
--)
shift
break
;;
*)
query_args+=("$1")
shift
;;
esac
done
if [[ $# -ne 1 ]]; then
echo "Usage: $0 [--windows-cross-compile] [<bazel query args>...] -- <query expression>" >&2
exit 1
fi
query_expression="$1"
ci_config=ci-linux
case "${RUNNER_OS:-}" in
macOS)
ci_config=ci-macos
;;
Windows)
if [[ $windows_cross_compile -eq 1 ]]; then
ci_config=ci-windows-cross
else
ci_config=ci-windows
fi
;;
esac
bazel_startup_args=()
if [[ -n "${BAZEL_OUTPUT_USER_ROOT:-}" ]]; then
bazel_startup_args+=("--output_user_root=${BAZEL_OUTPUT_USER_ROOT}")
fi
run_bazel() {
if [[ "${RUNNER_OS:-}" == "Windows" ]]; then
MSYS2_ARG_CONV_EXCL='*' bazel "$@"
return
fi
bazel "$@"
}
bazel_query_args=(--noexperimental_remote_repo_contents_cache query)
if [[ -n "${BUILDBUDDY_API_KEY:-}" ]]; then
bazel_query_args+=(
"--config=${ci_config}"
"--remote_header=x-buildbuddy-api-key=${BUILDBUDDY_API_KEY}"
)
fi
if [[ -n "${BAZEL_REPO_CONTENTS_CACHE:-}" ]]; then
bazel_query_args+=("--repo_contents_cache=${BAZEL_REPO_CONTENTS_CACHE}")
fi
if [[ -n "${BAZEL_REPOSITORY_CACHE:-}" ]]; then
bazel_query_args+=("--repository_cache=${BAZEL_REPOSITORY_CACHE}")
fi
bazel_query_args+=("${query_args[@]}" "$query_expression")
if (( ${#bazel_startup_args[@]} > 0 )); then
run_bazel "${bazel_startup_args[@]}" "${bazel_query_args[@]}"
else
run_bazel "${bazel_query_args[@]}"
fi

View File

@@ -63,8 +63,10 @@ def bazel_output_files(
platform: str,
labels: list[str],
compilation_mode: str = "fastbuild",
bazel_configs: list[str] | None = None,
) -> list[Path]:
expression = "set(" + " ".join(labels) + ")"
bazel_configs = bazel_configs or []
result = subprocess.run(
[
"bazel",
@@ -72,6 +74,7 @@ def bazel_output_files(
"-c",
compilation_mode,
f"--platforms=@llvm//platforms:{platform}",
*[f"--config={config}" for config in bazel_configs],
"--output=files",
expression,
],
@@ -87,7 +90,9 @@ def bazel_build(
platform: str,
labels: list[str],
compilation_mode: str = "fastbuild",
bazel_configs: list[str] | None = None,
) -> None:
bazel_configs = bazel_configs or []
subprocess.run(
[
"bazel",
@@ -95,6 +100,7 @@ def bazel_build(
"-c",
compilation_mode,
f"--platforms=@llvm//platforms:{platform}",
*[f"--config={config}" for config in bazel_configs],
*labels,
],
cwd=ROOT,
@@ -106,13 +112,14 @@ def ensure_bazel_output_files(
platform: str,
labels: list[str],
compilation_mode: str = "fastbuild",
bazel_configs: list[str] | None = None,
) -> list[Path]:
outputs = bazel_output_files(platform, labels, compilation_mode)
outputs = bazel_output_files(platform, labels, compilation_mode, bazel_configs)
if all(path.exists() for path in outputs):
return outputs
bazel_build(platform, labels, compilation_mode)
outputs = bazel_output_files(platform, labels, compilation_mode)
bazel_build(platform, labels, compilation_mode, bazel_configs)
outputs = bazel_output_files(platform, labels, compilation_mode, bazel_configs)
missing = [str(path) for path in outputs if not path.exists()]
if missing:
raise SystemExit(f"missing built outputs for {labels}: {missing}")
@@ -187,8 +194,9 @@ def single_bazel_output_file(
platform: str,
label: str,
compilation_mode: str = "fastbuild",
bazel_configs: list[str] | None = None,
) -> Path:
outputs = ensure_bazel_output_files(platform, [label], compilation_mode)
outputs = ensure_bazel_output_files(platform, [label], compilation_mode, bazel_configs)
if len(outputs) != 1:
raise SystemExit(f"expected exactly one output for {label}, found {outputs}")
return outputs[0]
@@ -198,11 +206,17 @@ def merged_musl_archive(
platform: str,
lib_path: Path,
compilation_mode: str = "fastbuild",
bazel_configs: list[str] | None = None,
) -> Path:
llvm_ar = single_bazel_output_file(platform, LLVM_AR_LABEL, compilation_mode)
llvm_ranlib = single_bazel_output_file(platform, LLVM_RANLIB_LABEL, compilation_mode)
llvm_ar = single_bazel_output_file(platform, LLVM_AR_LABEL, compilation_mode, bazel_configs)
llvm_ranlib = single_bazel_output_file(
platform,
LLVM_RANLIB_LABEL,
compilation_mode,
bazel_configs,
)
runtime_archives = [
single_bazel_output_file(platform, label, compilation_mode)
single_bazel_output_file(platform, label, compilation_mode, bazel_configs)
for label in MUSL_RUNTIME_ARCHIVE_LABELS
]
@@ -233,11 +247,13 @@ def stage_release_pair(
target: str,
output_dir: Path,
compilation_mode: str = "fastbuild",
bazel_configs: list[str] | None = None,
) -> None:
outputs = ensure_bazel_output_files(
platform,
[release_pair_label(target)],
compilation_mode,
bazel_configs,
)
try:
@@ -254,7 +270,7 @@ def stage_release_pair(
staged_library = output_dir / staged_archive_name(target, lib_path)
staged_binding = output_dir / f"src_binding_release_{target}.rs"
source_archive = (
merged_musl_archive(platform, lib_path, compilation_mode)
merged_musl_archive(platform, lib_path, compilation_mode, bazel_configs)
if is_musl_archive_target(target, lib_path)
else lib_path
)
@@ -293,6 +309,12 @@ def parse_args() -> argparse.Namespace:
stage_release_pair_parser.add_argument("--platform", required=True)
stage_release_pair_parser.add_argument("--target", required=True)
stage_release_pair_parser.add_argument("--output-dir", required=True)
stage_release_pair_parser.add_argument(
"--bazel-config",
action="append",
default=[],
dest="bazel_configs",
)
stage_release_pair_parser.add_argument(
"--compilation-mode",
default="fastbuild",
@@ -330,6 +352,7 @@ def main() -> int:
target=args.target,
output_dir=Path(args.output_dir),
compilation_mode=args.compilation_mode,
bazel_configs=args.bazel_configs,
)
return 0
if args.command == "resolved-v8-crate-version":

View File

@@ -25,7 +25,10 @@ TOP_LEVEL_NAME_EXCEPTIONS = {
UTILITY_NAME_EXCEPTIONS = {
"path-utils": "codex-utils-path",
}
MANIFEST_FEATURE_EXCEPTIONS = {}
MANIFEST_FEATURE_EXCEPTIONS = {
"codex-rs/code-mode/Cargo.toml": {"sandbox": ("v8/v8_enable_sandbox",)},
"codex-rs/v8-poc/Cargo.toml": {"sandbox": ("v8/v8_enable_sandbox",)},
}
OPTIONAL_DEPENDENCY_EXCEPTIONS = set()
INTERNAL_DEPENDENCY_FEATURE_EXCEPTIONS = {}

View File

@@ -8,25 +8,9 @@ FROM ubuntu:24.04
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl git python3 ca-certificates xz-utils && \
curl git python3 ca-certificates && \
rm -rf /var/lib/apt/lists/*
COPY codex-rs/node-version.txt /tmp/node-version.txt
RUN set -eux; \
node_arch="$(dpkg --print-architecture)"; \
case "${node_arch}" in \
amd64) node_dist_arch="x64" ;; \
arm64) node_dist_arch="arm64" ;; \
*) echo "unsupported architecture: ${node_arch}"; exit 1 ;; \
esac; \
node_version="$(tr -d '[:space:]' </tmp/node-version.txt)"; \
curl -fsSLO "https://nodejs.org/dist/v${node_version}/node-v${node_version}-linux-${node_dist_arch}.tar.xz"; \
tar -xJf "node-v${node_version}-linux-${node_dist_arch}.tar.xz" -C /usr/local --strip-components=1; \
rm "node-v${node_version}-linux-${node_dist_arch}.tar.xz" /tmp/node-version.txt; \
node --version; \
npm --version
# Install dotslash.
RUN curl -LSfs "https://github.com/facebook/dotslash/releases/download/v0.5.8/dotslash-ubuntu-22.04.$(uname -m).tar.gz" | tar fxz - -C /usr/local/bin

View File

@@ -17,6 +17,10 @@ concurrency:
cancel-in-progress: ${{ github.ref_name != 'main' }}
jobs:
test:
# PRs use a fast Windows cross-compiled test leg for pre-merge signal.
# Post-merge pushes to main also run the native Windows test job below for
# broader Windows signal without putting PR latency back on the critical
# path. Cargo CI owns V8/code-mode test coverage for now.
timeout-minutes: 30
strategy:
fail-fast: false
@@ -40,13 +44,16 @@ jobs:
# - os: ubuntu-24.04-arm
# target: aarch64-unknown-linux-gnu
# Windows
# Windows fast path: build the windows-gnullvm binaries with Linux
# RBE, then run the resulting Windows tests on the Windows runner.
# Cargo CI preserves V8/code-mode coverage while Bazel CI keeps broad
# non-code-mode signal.
- os: windows-latest
target: x86_64-pc-windows-gnullvm
runs-on: ${{ matrix.os }}
# Configure a human readable name for each job
name: Local Bazel build on ${{ matrix.os }} for ${{ matrix.target }}
name: Bazel test on ${{ matrix.os }} for ${{ matrix.target }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
@@ -81,11 +88,16 @@ jobs:
# path. V8 consumers under `//codex-rs/...` still participate
# transitively through `//...`.
-//third_party/v8:all
# V8-backed code-mode tests are covered by Cargo CI. Bazel CI
# cross-compiles in several legs, and those tests are not stable in
# that setup yet.
-//codex-rs/code-mode:code-mode-unit-tests
-//codex-rs/v8-poc:v8-poc-unit-tests
)
bazel_wrapper_args=(
--print-failed-action-summary
--print-failed-test-logs
--use-node-test-env
)
bazel_test_args=(
test
@@ -94,8 +106,10 @@ jobs:
--build_metadata=COMMIT_SHA=${GITHUB_SHA}
)
if [[ "${RUNNER_OS}" == "Windows" ]]; then
bazel_wrapper_args+=(--windows-msvc-host-platform)
bazel_test_args+=(--jobs=8)
bazel_wrapper_args+=(
--windows-cross-compile
--remote-download-toplevel
)
fi
./.github/scripts/run-bazel-ci.sh \
@@ -124,6 +138,79 @@ jobs:
path: ${{ steps.prepare_bazel.outputs.repository-cache-path }}
key: ${{ steps.prepare_bazel.outputs.repository-cache-key }}
test-windows-native-main:
# Native Windows Bazel tests are slower and frequently approach the
# 30-minute PR budget. Run this only for post-merge commits to main and give
# it a larger timeout.
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
timeout-minutes: 40
runs-on: windows-latest
name: Bazel test on windows-latest for x86_64-pc-windows-gnullvm (native main)
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- name: Prepare Bazel CI
id: prepare_bazel
uses: ./.github/actions/prepare-bazel-ci
with:
target: x86_64-pc-windows-gnullvm
cache-scope: bazel-${{ github.job }}
install-test-prereqs: "true"
- name: bazel test //...
env:
BUILDBUDDY_API_KEY: ${{ secrets.BUILDBUDDY_API_KEY }}
shell: bash
run: |
bazel_targets=(
//...
# Keep standalone V8 library targets out of the ordinary Bazel CI
# path. V8 consumers under `//codex-rs/...` still participate
# transitively through `//...`.
-//third_party/v8:all
# Keep this aligned with the main Bazel job. The native Windows
# job preserves broad post-merge coverage, but code-mode/V8 tests
# are covered by Cargo CI rather than Bazel for now.
-//codex-rs/code-mode:code-mode-unit-tests
-//codex-rs/v8-poc:v8-poc-unit-tests
)
bazel_test_args=(
test
--test_tag_filters=-argument-comment-lint
--test_verbose_timeout_warnings
--build_metadata=COMMIT_SHA=${GITHUB_SHA}
--build_metadata=TAG_windows_native_main=true
)
./.github/scripts/run-bazel-ci.sh \
--print-failed-action-summary \
--print-failed-test-logs \
-- \
"${bazel_test_args[@]}" \
-- \
"${bazel_targets[@]}"
- name: Upload Bazel execution logs
if: always() && !cancelled()
continue-on-error: true
uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7
with:
name: bazel-execution-logs-test-windows-native-x86_64-pc-windows-gnullvm
path: ${{ runner.temp }}/bazel-execution-logs
if-no-files-found: ignore
# Save the job-scoped Bazel repository cache after cache misses. Keep the
# upload non-fatal so cache service issues never fail the job itself.
- name: Save bazel repository cache
if: always() && !cancelled() && steps.prepare_bazel.outputs.repository-cache-hit != 'true'
continue-on-error: true
uses: actions/cache/save@668228422ae6a00e4ad889ee87cd7109ec5666a7 # v5
with:
path: ${{ steps.prepare_bazel.outputs.repository-cache-path }}
key: ${{ steps.prepare_bazel.outputs.repository-cache-key }}
clippy:
timeout-minutes: 30
strategy:
@@ -164,17 +251,24 @@ jobs:
--build_metadata=TAG_job=clippy
)
bazel_wrapper_args=()
bazel_target_list_args=()
if [[ "${RUNNER_OS}" == "Windows" ]]; then
# Keep this aligned with the Windows Bazel test job. With the
# default `//:local_windows` host platform, Windows `rust_test`
# targets such as `//codex-rs/core:core-all-test` can be skipped
# by `--skip_incompatible_explicit_targets`, which hides clippy
# diagnostics from integration-test modules.
bazel_wrapper_args+=(--windows-msvc-host-platform)
bazel_clippy_args+=(--skip_incompatible_explicit_targets)
# Keep this aligned with the fast Windows Bazel test job: use
# Linux RBE for clippy build actions while targeting Windows
# gnullvm. Fork/community PRs without the BuildBuddy secret fall
# back inside `run-bazel-ci.sh` to the previous local Windows MSVC
# host-platform shape.
bazel_wrapper_args+=(--windows-cross-compile)
bazel_target_list_args+=(--windows-cross-compile)
if [[ -z "${BUILDBUDDY_API_KEY:-}" ]]; then
# The fork fallback can see incompatible explicit Windows-cross
# internal test binaries in the generated target list. Preserve
# the old local-fallback behavior there.
bazel_clippy_args+=(--skip_incompatible_explicit_targets)
fi
fi
bazel_target_lines="$(./scripts/list-bazel-clippy-targets.sh)"
bazel_target_lines="$(./scripts/list-bazel-clippy-targets.sh "${bazel_target_list_args[@]}")"
bazel_targets=()
while IFS= read -r target; do
bazel_targets+=("${target}")
@@ -246,7 +340,12 @@ jobs:
# Rust debug assertions explicitly.
bazel_wrapper_args=()
if [[ "${RUNNER_OS}" == "Windows" ]]; then
bazel_wrapper_args+=(--windows-msvc-host-platform)
# This is build-only signal, so use the same Linux-RBE
# cross-compile path as the fast Windows test and clippy jobs.
# Fork/community PRs without the BuildBuddy secret fall back
# inside `run-bazel-ci.sh` to the previous local Windows MSVC
# host-platform shape.
bazel_wrapper_args+=(--windows-cross-compile)
fi
bazel_build_args=(
@@ -272,6 +371,22 @@ jobs:
-- \
"${bazel_targets[@]}"
- name: Verify Bazel builds bwrap
if: runner.os == 'Linux'
env:
BUILDBUDDY_API_KEY: ${{ secrets.BUILDBUDDY_API_KEY }}
shell: bash
run: |
./.github/scripts/run-bazel-ci.sh \
--remote-download-toplevel \
--print-failed-action-summary \
-- \
build \
--build_metadata=COMMIT_SHA=${GITHUB_SHA} \
--build_metadata=TAG_job=verify-bwrap \
-- \
//codex-rs/bwrap:bwrap
- name: Upload Bazel execution logs
if: always() && !cancelled()
continue-on-error: true

View File

@@ -17,10 +17,10 @@ jobs:
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- name: Install Rust toolchain
uses: dtolnay/rust-toolchain@631a55b12751854ce901bb631d5902ceb48146f7 # stable
uses: dtolnay/rust-toolchain@a0b273b48ed29de4470960879e8381ff45632f26 # 1.93.0
- name: Run cargo-deny
uses: EmbarkStudios/cargo-deny-action@82eb9f621fbc699dd0918f3ea06864c14cc84246 # v2
with:
rust-version: stable
rust-version: 1.93.0
manifest-path: ./codex-rs/Cargo.toml

View File

@@ -45,12 +45,19 @@ jobs:
GH_TOKEN: ${{ github.token }}
run: |
set -euo pipefail
# Use a rust-release version that includes all native binaries.
CODEX_VERSION=0.115.0
# Use a recent successful rust-release run that published the full
# cross-platform native payload required by the npm package layout.
# Passing the workflow URL directly avoids relying on old rust-v*
# branches remaining discoverable via `gh run list --branch ...`.
CODEX_VERSION=0.125.0
WORKFLOW_URL="https://github.com/openai/codex/actions/runs/24901475298"
OUTPUT_DIR="${RUNNER_TEMP}"
# This reused workflow predates the standalone bwrap artifact.
python3 ./scripts/stage_npm_packages.py \
--release-version "$CODEX_VERSION" \
--workflow-url "$WORKFLOW_URL" \
--package codex \
--allow-missing-native-component bwrap \
--output-dir "$OUTPUT_DIR"
PACK_OUTPUT="${OUTPUT_DIR}/codex-npm-${CODEX_VERSION}.tgz"
echo "pack_output=$PACK_OUTPUT" >> "$GITHUB_OUTPUT"

View File

@@ -61,7 +61,7 @@ jobs:
# .github/prompts/issue-deduplicator.txt file is obsolete and removed.
- id: codex-all
name: Find duplicates (pass 1, all issues)
uses: openai/codex-action@0b91f4a2703c23df3102c3f0967d3c6db34eedef # v1
uses: openai/codex-action@5c3f4ccdb2b8790f73d6b21751ac00e602aa0c02 # v1.7
with:
openai-api-key: ${{ secrets.CODEX_OPENAI_API_KEY }}
allow-users: "*"
@@ -195,7 +195,7 @@ jobs:
- id: codex-open
name: Find duplicates (pass 2, open issues)
uses: openai/codex-action@0b91f4a2703c23df3102c3f0967d3c6db34eedef # v1
uses: openai/codex-action@5c3f4ccdb2b8790f73d6b21751ac00e602aa0c02 # v1.7
with:
openai-api-key: ${{ secrets.CODEX_OPENAI_API_KEY }}
allow-users: "*"

View File

@@ -20,7 +20,7 @@ jobs:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- id: codex
uses: openai/codex-action@0b91f4a2703c23df3102c3f0967d3c6db34eedef # v1
uses: openai/codex-action@5c3f4ccdb2b8790f73d6b21751ac00e602aa0c02 # v1.7
with:
openai-api-key: ${{ secrets.CODEX_OPENAI_API_KEY }}
allow-users: "*"
@@ -44,7 +44,7 @@ jobs:
6. iOS — Issues with the Codex iOS app.
- Additionally add zero or more of the following labels that are relevant to the issue content. Prefer a small set of precise labels over many broad ones.
- For agent-area issues, prefer the most specific applicable label. Use "agent" only as a fallback for agent-related issues that do not fit a more specific agent-area label. Prefer "app-server" over "session" or "config" when the issue is about app-server protocol, API, RPC, schema, launch, or bridge behavior.
- For agent-area issues, prefer the most specific applicable label. Use "agent" only as a fallback for agent-related issues that do not fit a more specific agent-area label. Prefer "app-server" over "session" or "config" when the issue is about app-server protocol, API, RPC, schema, launch, or bridge behavior. Use "memory" for agentic memory storage/retrieval and "performance" for high process memory utilization or memory leaks.
1. windows-os — Bugs or friction specific to Windows environments (always when PowerShell is mentioned, path handling, copy/paste, OS-specific auth or tooling failures).
2. mcp — Topics involving Model Context Protocol servers/clients.
3. mcp-server — Problems related to the codex mcp-server command, where codex runs as an MCP server.
@@ -68,7 +68,15 @@ jobs:
21. session - Issues involving session or thread management, including resume, fork, archive, rename/title, thread history, rollout persistence, compaction, checkpoints, retention, and cross-session state.
22. config - Issues involving config.toml, config keys, config key merging, config updates, profiles, hooks config, project config, agent role TOMLs, instruction/personality config, and config schema behavior.
23. plan - Issues involving plan mode, planning workflows, or plan-specific tools/behavior.
24. agent - Fallback only for core agent loop or agent-related issues that do not fit app-server, connectivity, subagent, session, config, or plan.
24. computer-use - Issues involving agentic computer use or SkyComputerUseService.
25. browser - Issues involving agentic browser use, IAB, or the built-in browser within the Codex app.
26. memory - Issues involving agentic memory storage and retrieval.
27. imagen - Issues involving image generation.
28. remote - Issues involving remote access, remote control, or SSH.
29. performance - Issues involving slow, laggy performance, high memory utilization, or memory leaks.
30. automations - Issues involving scheduled automation tasks or heartbeats.
31. pets - Issues involving pets avatars and animations.
32. agent - Fallback only for core agent loop or agent-related issues that do not fit app-server, connectivity, subagent, session, config, plan, computer-use, browser, memory, imagen, remote, performance, automations, or pets.
Issue number: ${{ github.event.issue.number }}

View File

@@ -76,6 +76,8 @@ jobs:
- name: Test argument comment lint package
working-directory: tools/argument-comment-lint
run: cargo test
env:
RUST_MIN_STACK: "8388608" # 8 MiB
argument_comment_lint_prebuilt:
name: Argument comment lint - ${{ matrix.name }}
@@ -558,10 +560,6 @@ jobs:
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- name: Set up Node.js for js_repl tests
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6
with:
node-version-file: codex-rs/node-version.txt
- name: Install Linux build dependencies
if: ${{ runner.os == 'Linux' }}
shell: bash
@@ -671,6 +669,7 @@ jobs:
run: cargo nextest run --no-fail-fast --target ${{ matrix.target }} --cargo-profile ci-test --timings
env:
RUST_BACKTRACE: 1
RUST_MIN_STACK: "8388608" # 8 MiB
NEXTEST_STATUS_LEVEL: leak
- name: Upload Cargo timings (nextest)

View File

@@ -131,6 +131,8 @@ jobs:
- name: Test argument comment lint package
working-directory: tools/argument-comment-lint
run: cargo test
env:
RUST_MIN_STACK: "8388608" # 8 MiB
argument_comment_lint_prebuilt:
name: Argument comment lint - ${{ matrix.name }}

View File

@@ -24,7 +24,9 @@ jobs:
build-windows-binaries:
name: Build Windows binaries - ${{ matrix.runner }} - ${{ matrix.target }} - ${{ matrix.bundle }}
runs-on: ${{ matrix.runs_on }}
timeout-minutes: 60
# Windows release builds can exceed an hour on fat-LTO mainline releases,
# so keep the timeout aligned with the top-level release build headroom.
timeout-minutes: 90
permissions:
contents: read
defaults:
@@ -40,28 +42,42 @@ jobs:
- runner: windows-x64
target: x86_64-pc-windows-msvc
bundle: primary
build_args: --bin codex --bin codex-responses-api-proxy
binaries: "codex codex-responses-api-proxy"
runs_on:
group: codex-runners
labels: codex-windows-x64
- runner: windows-arm64
target: aarch64-pc-windows-msvc
bundle: primary
build_args: --bin codex --bin codex-responses-api-proxy
binaries: "codex codex-responses-api-proxy"
runs_on:
group: codex-runners
labels: codex-windows-arm64
- runner: windows-x64
target: x86_64-pc-windows-msvc
bundle: helpers
build_args: --bin codex-windows-sandbox-setup --bin codex-command-runner
binaries: "codex-windows-sandbox-setup codex-command-runner"
runs_on:
group: codex-runners
labels: codex-windows-x64
- runner: windows-arm64
target: aarch64-pc-windows-msvc
bundle: helpers
build_args: --bin codex-windows-sandbox-setup --bin codex-command-runner
binaries: "codex-windows-sandbox-setup codex-command-runner"
runs_on:
group: codex-runners
labels: codex-windows-arm64
- runner: windows-x64
target: x86_64-pc-windows-msvc
bundle: app-server
binaries: "codex-app-server"
runs_on:
group: codex-runners
labels: codex-windows-x64
- runner: windows-arm64
target: aarch64-pc-windows-msvc
bundle: app-server
binaries: "codex-app-server"
runs_on:
group: codex-runners
labels: codex-windows-arm64
@@ -89,7 +105,11 @@ jobs:
- name: Cargo build (Windows binaries)
shell: bash
run: |
cargo build --target ${{ matrix.target }} --release --timings ${{ matrix.build_args }}
build_args=()
for binary in ${{ matrix.binaries }}; do
build_args+=(--bin "$binary")
done
cargo build --target ${{ matrix.target }} --release --timings "${build_args[@]}"
- name: Upload Cargo timings
uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7
@@ -103,13 +123,9 @@ jobs:
run: |
output_dir="target/${{ matrix.target }}/release/staged-${{ matrix.bundle }}"
mkdir -p "$output_dir"
if [[ "${{ matrix.bundle }}" == "primary" ]]; then
cp target/${{ matrix.target }}/release/codex.exe "$output_dir/codex.exe"
cp target/${{ matrix.target }}/release/codex-responses-api-proxy.exe "$output_dir/codex-responses-api-proxy.exe"
else
cp target/${{ matrix.target }}/release/codex-windows-sandbox-setup.exe "$output_dir/codex-windows-sandbox-setup.exe"
cp target/${{ matrix.target }}/release/codex-command-runner.exe "$output_dir/codex-command-runner.exe"
fi
for binary in ${{ matrix.binaries }}; do
cp "target/${{ matrix.target }}/release/${binary}.exe" "$output_dir/${binary}.exe"
done
- name: Upload Windows binaries
uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7
@@ -123,13 +139,15 @@ jobs:
- build-windows-binaries
name: Build - ${{ matrix.runner }} - ${{ matrix.target }}
runs-on: ${{ matrix.runs_on }}
timeout-minutes: 60
timeout-minutes: 90
permissions:
contents: read
id-token: write
defaults:
run:
working-directory: codex-rs
env:
WINDOWS_BINARIES: "codex codex-responses-api-proxy codex-windows-sandbox-setup codex-command-runner codex-app-server"
strategy:
fail-fast: false
@@ -161,19 +179,25 @@ jobs:
name: windows-binaries-${{ matrix.target }}-helpers
path: codex-rs/target/${{ matrix.target }}/release
- name: Download prebuilt Windows app-server binary
uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8
with:
name: windows-binaries-${{ matrix.target }}-app-server
path: codex-rs/target/${{ matrix.target }}/release
- name: Verify binaries
shell: bash
run: |
set -euo pipefail
ls -lh target/${{ matrix.target }}/release/codex.exe
ls -lh target/${{ matrix.target }}/release/codex-responses-api-proxy.exe
ls -lh target/${{ matrix.target }}/release/codex-windows-sandbox-setup.exe
ls -lh target/${{ matrix.target }}/release/codex-command-runner.exe
for binary in ${WINDOWS_BINARIES}; do
ls -lh "target/${{ matrix.target }}/release/${binary}.exe"
done
- name: Sign Windows binaries with Azure Trusted Signing
uses: ./.github/actions/windows-code-sign
with:
target: ${{ matrix.target }}
binaries: ${{ env.WINDOWS_BINARIES }}
client-id: ${{ secrets.AZURE_TRUSTED_SIGNING_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TRUSTED_SIGNING_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_TRUSTED_SIGNING_SUBSCRIPTION_ID }}
@@ -187,10 +211,10 @@ jobs:
dest="dist/${{ matrix.target }}"
mkdir -p "$dest"
cp target/${{ matrix.target }}/release/codex.exe "$dest/codex-${{ matrix.target }}.exe"
cp target/${{ matrix.target }}/release/codex-responses-api-proxy.exe "$dest/codex-responses-api-proxy-${{ matrix.target }}.exe"
cp target/${{ matrix.target }}/release/codex-windows-sandbox-setup.exe "$dest/codex-windows-sandbox-setup-${{ matrix.target }}.exe"
cp target/${{ matrix.target }}/release/codex-command-runner.exe "$dest/codex-command-runner-${{ matrix.target }}.exe"
for binary in ${WINDOWS_BINARIES}; do
cp "target/${{ matrix.target }}/release/${binary}.exe" \
"$dest/${binary}-${{ matrix.target }}.exe"
done
- name: Install DotSlash
uses: facebook/install-dotslash@1e4e7b3e07eaca387acb98f1d4720e0bee8dbb6a # v2

View File

@@ -20,7 +20,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- uses: dtolnay/rust-toolchain@c2b55edffaf41a251c410bb32bed22afefa800f1 # 1.92
- uses: dtolnay/rust-toolchain@a0b273b48ed29de4470960879e8381ff45632f26 # 1.93.0
- name: Validate tag matches Cargo.toml version
shell: bash
run: |
@@ -47,9 +47,11 @@ jobs:
build:
needs: tag-check
name: Build - ${{ matrix.runner }} - ${{ matrix.target }}
name: Build - ${{ matrix.runner }} - ${{ matrix.target }} - ${{ matrix.bundle }}
runs-on: ${{ matrix.runs_on || matrix.runner }}
timeout-minutes: 60
# Release builds can take a long time, so leave some headroom to avoid
# having to restart the full workflow due to a timeout.
timeout-minutes: 90
permissions:
contents: read
id-token: write
@@ -67,16 +69,53 @@ jobs:
include:
- runner: macos-15-xlarge
target: aarch64-apple-darwin
bundle: primary
artifact_name: aarch64-apple-darwin
binaries: "codex codex-responses-api-proxy"
build_dmg: "true"
- runner: macos-15-xlarge
target: aarch64-apple-darwin
bundle: app-server
artifact_name: aarch64-apple-darwin-app-server
binaries: "codex-app-server"
build_dmg: "false"
- runner: macos-15-xlarge
target: x86_64-apple-darwin
bundle: primary
artifact_name: x86_64-apple-darwin
binaries: "codex codex-responses-api-proxy"
build_dmg: "true"
- runner: macos-15-xlarge
target: x86_64-apple-darwin
bundle: app-server
artifact_name: x86_64-apple-darwin-app-server
binaries: "codex-app-server"
build_dmg: "false"
# Release artifacts intentionally ship MUSL-linked Linux binaries.
- runner: ubuntu-24.04
target: x86_64-unknown-linux-musl
bundle: primary
artifact_name: x86_64-unknown-linux-musl
binaries: "codex codex-responses-api-proxy bwrap"
build_dmg: "false"
- runner: ubuntu-24.04
target: x86_64-unknown-linux-gnu
target: x86_64-unknown-linux-musl
bundle: app-server
artifact_name: x86_64-unknown-linux-musl-app-server
binaries: "codex-app-server"
build_dmg: "false"
- runner: ubuntu-24.04-arm
target: aarch64-unknown-linux-musl
bundle: primary
artifact_name: aarch64-unknown-linux-musl
binaries: "codex codex-responses-api-proxy bwrap"
build_dmg: "false"
- runner: ubuntu-24.04-arm
target: aarch64-unknown-linux-gnu
target: aarch64-unknown-linux-musl
bundle: app-server
artifact_name: aarch64-unknown-linux-musl-app-server
binaries: "codex-app-server"
build_dmg: "false"
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
@@ -216,16 +255,38 @@ jobs:
with:
target: ${{ matrix.target }}
- if: ${{ contains(matrix.target, 'linux') && matrix.bundle == 'primary' }}
name: Build bwrap and export digest
shell: bash
run: |
set -euo pipefail
target="${{ matrix.target }}"
cargo build --target "$target" --release --timings --bin bwrap
bwrap_path="target/${target}/release/bwrap"
if [[ ! -f "$bwrap_path" ]]; then
echo "bwrap binary ${bwrap_path} not found"
exit 1
fi
digest="$(sha256sum "$bwrap_path" | awk '{print $1}')"
echo "CODEX_BWRAP_SHA256=${digest}" >> "$GITHUB_ENV"
echo "Built bwrap ${bwrap_path} with sha256:${digest}"
- name: Cargo build
shell: bash
run: |
build_args=()
for binary in ${{ matrix.binaries }}; do
build_args+=(--bin "$binary")
done
echo "CARGO_PROFILE_RELEASE_LTO: ${CARGO_PROFILE_RELEASE_LTO}"
cargo build --target ${{ matrix.target }} --release --timings --bin codex --bin codex-responses-api-proxy
cargo build --target ${{ matrix.target }} --release --timings "${build_args[@]}"
- name: Upload Cargo timings
uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7
with:
name: cargo-timings-rust-release-${{ matrix.target }}
name: cargo-timings-rust-release-${{ matrix.target }}-${{ matrix.bundle }}
path: codex-rs/target/**/cargo-timings/cargo-timing.html
if-no-files-found: warn
@@ -235,12 +296,14 @@ jobs:
with:
target: ${{ matrix.target }}
artifacts-dir: ${{ github.workspace }}/codex-rs/target/${{ matrix.target }}/release
binaries: ${{ matrix.binaries }}
- if: ${{ runner.os == 'macOS' }}
name: MacOS code signing (binaries)
uses: ./.github/actions/macos-code-sign
with:
target: ${{ matrix.target }}
binaries: ${{ matrix.binaries }}
sign-binaries: "true"
sign-dmg: "false"
apple-certificate: ${{ secrets.APPLE_CERTIFICATE_P12 }}
@@ -249,7 +312,7 @@ jobs:
apple-notarization-key-id: ${{ secrets.APPLE_NOTARIZATION_KEY_ID }}
apple-notarization-issuer-id: ${{ secrets.APPLE_NOTARIZATION_ISSUER_ID }}
- if: ${{ runner.os == 'macOS' }}
- if: ${{ runner.os == 'macOS' && matrix.build_dmg == 'true' }}
name: Build macOS dmg
shell: bash
run: |
@@ -264,23 +327,17 @@ jobs:
# The previous "MacOS code signing (binaries)" step signs + notarizes the
# built artifacts in `${release_dir}`. This step packages *those same*
# signed binaries into a dmg.
codex_binary_path="${release_dir}/codex"
proxy_binary_path="${release_dir}/codex-responses-api-proxy"
rm -rf "$dmg_root"
mkdir -p "$dmg_root"
if [[ ! -f "$codex_binary_path" ]]; then
echo "Binary $codex_binary_path not found"
exit 1
fi
if [[ ! -f "$proxy_binary_path" ]]; then
echo "Binary $proxy_binary_path not found"
exit 1
fi
ditto "$codex_binary_path" "${dmg_root}/codex"
ditto "$proxy_binary_path" "${dmg_root}/codex-responses-api-proxy"
for binary in ${{ matrix.binaries }}; do
binary_path="${release_dir}/${binary}"
if [[ ! -f "${binary_path}" ]]; then
echo "Binary ${binary_path} not found"
exit 1
fi
ditto "${binary_path}" "${dmg_root}/${binary}"
done
rm -f "$dmg_path"
hdiutil create \
@@ -295,7 +352,7 @@ jobs:
exit 1
fi
- if: ${{ runner.os == 'macOS' }}
- if: ${{ runner.os == 'macOS' && matrix.build_dmg == 'true' }}
name: MacOS code signing (dmg)
uses: ./.github/actions/macos-code-sign
with:
@@ -314,15 +371,26 @@ jobs:
dest="dist/${{ matrix.target }}"
mkdir -p "$dest"
cp target/${{ matrix.target }}/release/codex "$dest/codex-${{ matrix.target }}"
cp target/${{ matrix.target }}/release/codex-responses-api-proxy "$dest/codex-responses-api-proxy-${{ matrix.target }}"
for binary in ${{ matrix.binaries }}; do
cp "target/${{ matrix.target }}/release/${binary}" "$dest/${binary}-${{ matrix.target }}"
if [[ "${{ matrix.target }}" == *linux* ]]; then
cp "target/${{ matrix.target }}/release/${binary}.sigstore" \
"$dest/${binary}-${{ matrix.target }}.sigstore"
fi
done
if [[ "${{ matrix.target }}" == *linux* ]]; then
cp target/${{ matrix.target }}/release/codex.sigstore "$dest/codex-${{ matrix.target }}.sigstore"
cp target/${{ matrix.target }}/release/codex-responses-api-proxy.sigstore "$dest/codex-responses-api-proxy-${{ matrix.target }}.sigstore"
if [[ "${{ matrix.target }}" == *linux* && "${{ matrix.bundle }}" == "primary" ]]; then
bundle_root="${RUNNER_TEMP}/codex-${{ matrix.target }}-bundle"
rm -rf "$bundle_root"
mkdir -p "$bundle_root/codex-resources"
cp "$dest/codex-${{ matrix.target }}" "$bundle_root/codex"
cp "$dest/bwrap-${{ matrix.target }}" "$bundle_root/codex-resources/bwrap"
chmod 0755 "$bundle_root/codex" "$bundle_root/codex-resources/bwrap"
tar -C "$bundle_root" -cf - codex codex-resources/bwrap |
zstd -T0 -19 -o "$dest/codex-${{ matrix.target }}-bundle.tar.zst"
fi
if [[ "${{ matrix.target }}" == *apple-darwin ]]; then
if [[ "${{ matrix.build_dmg }}" == "true" ]]; then
cp target/${{ matrix.target }}/release/codex-${{ matrix.target }}.dmg "$dest/codex-${{ matrix.target }}.dmg"
fi
@@ -345,7 +413,7 @@ jobs:
base="$(basename "$f")"
# Skip files that are already archives (shouldn't happen, but be
# safe).
if [[ "$base" == *.tar.gz || "$base" == *.zip || "$base" == *.dmg ]]; then
if [[ "$base" == *.tar.gz || "$base" == *.tar.zst || "$base" == *.zip || "$base" == *.dmg ]]; then
continue
fi
@@ -364,9 +432,9 @@ jobs:
- uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7
with:
name: ${{ matrix.target }}
# Upload the per-binary .zst files as well as the new .tar.gz
# equivalents we generated in the previous step.
name: ${{ matrix.artifact_name }}
# Upload the per-binary .zst files, .tar.gz equivalents, and any
# prebuilt archives staged above.
path: |
codex-rs/dist/${{ matrix.target }}/*
@@ -614,11 +682,59 @@ jobs:
prefix="${NPM_TAG}-"
fi
root_tarball="dist/npm/codex-npm-${VERSION}.tgz"
sdk_tarball="dist/npm/codex-sdk-npm-${VERSION}.tgz"
# Keep this list in sync with CODEX_PLATFORM_PACKAGES in
# codex-cli/scripts/build_npm_package.py. The root wrapper advances
# @openai/codex@latest as soon as it publishes, so every platform
# package it aliases must already exist in the registry first.
platform_tarballs=(
"dist/npm/codex-npm-linux-x64-${VERSION}.tgz"
"dist/npm/codex-npm-linux-arm64-${VERSION}.tgz"
"dist/npm/codex-npm-darwin-x64-${VERSION}.tgz"
"dist/npm/codex-npm-darwin-arm64-${VERSION}.tgz"
"dist/npm/codex-npm-win32-x64-${VERSION}.tgz"
"dist/npm/codex-npm-win32-arm64-${VERSION}.tgz"
)
for required_tarball in "${platform_tarballs[@]}" "${root_tarball}"; do
if [[ ! -f "${required_tarball}" ]]; then
echo "Missing npm tarball: ${required_tarball}"
exit 1
fi
done
shopt -s nullglob
tarballs=(dist/npm/*-"${VERSION}".tgz)
if [[ ${#tarballs[@]} -eq 0 ]]; then
echo "No npm tarballs found in dist/npm for version ${VERSION}"
exit 1
other_tarballs=()
for tarball in dist/npm/*-"${VERSION}".tgz; do
if [[ "${tarball}" == "${root_tarball}" || "${tarball}" == "${sdk_tarball}" ]]; then
continue
fi
is_platform_tarball=false
for platform_tarball in "${platform_tarballs[@]}"; do
if [[ "${tarball}" == "${platform_tarball}" ]]; then
is_platform_tarball=true
break
fi
done
if [[ "${is_platform_tarball}" == true ]]; then
continue
fi
other_tarballs+=("${tarball}")
done
# Publish the platform packages before the root CLI wrapper. The root
# wrapper advances @openai/codex@latest, so it should only publish
# after the optional dependency versions it references exist.
tarballs=(
"${platform_tarballs[@]}"
"${other_tarballs[@]}"
"${root_tarball}"
)
if [[ -f "${sdk_tarball}" ]]; then
tarballs+=("${sdk_tarball}")
fi
for tarball in "${tarballs[@]}"; do

View File

@@ -1,20 +1,12 @@
name: rusty-v8-release
on:
workflow_dispatch:
inputs:
release_tag:
description: Optional release tag. Defaults to rusty-v8-v<resolved_v8_version>.
required: false
type: string
publish:
description: Publish the staged musl artifacts to a GitHub release.
required: false
default: true
type: boolean
push:
tags:
- "rusty-v8-v*.*.*"
concurrency:
group: ${{ github.workflow }}::${{ inputs.release_tag || github.run_id }}
group: ${{ github.workflow }}::${{ github.ref_name }}
cancel-in-progress: false
jobs:
@@ -43,15 +35,17 @@ jobs:
- name: Resolve release tag
id: release_tag
env:
RELEASE_TAG_INPUT: ${{ inputs.release_tag }}
GITHUB_REF_NAME: ${{ github.ref_name }}
V8_VERSION: ${{ steps.v8_version.outputs.version }}
shell: bash
run: |
set -euo pipefail
release_tag="${RELEASE_TAG_INPUT}"
if [[ -z "${release_tag}" ]]; then
release_tag="rusty-v8-v${V8_VERSION}"
expected_release_tag="rusty-v8-v${V8_VERSION}"
release_tag="${GITHUB_REF_NAME}"
if [[ "${release_tag}" != "${expected_release_tag}" ]]; then
echo "Tag ${release_tag} does not match resolved v8 crate version ${V8_VERSION}." >&2
exit 1
fi
echo "release_tag=${release_tag}" >> "$GITHUB_OUTPUT"
@@ -78,7 +72,9 @@ jobs:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- name: Set up Bazel
uses: bazelbuild/setup-bazelisk@b39c379c82683a5f25d34f0d062761f62693e0b2 # v3
uses: ./.github/actions/setup-bazel-ci
with:
target: ${{ matrix.target }}
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6
@@ -109,6 +105,7 @@ jobs:
-c
opt
"--platforms=@llvm//platforms:${PLATFORM}"
--config=v8-release-compat
"${pair_target}"
"${extra_targets[@]}"
--build_metadata=COMMIT_SHA=$(git rev-parse HEAD)
@@ -132,6 +129,7 @@ jobs:
--platform "${PLATFORM}" \
--target "${TARGET}" \
--compilation-mode opt \
--bazel-config v8-release-compat \
--output-dir "dist/${TARGET}"
- name: Upload staged musl artifacts
@@ -141,7 +139,6 @@ jobs:
path: dist/${{ matrix.target }}/*
publish-release:
if: ${{ inputs.publish }}
needs:
- metadata
- build
@@ -151,16 +148,6 @@ jobs:
actions: read
steps:
- name: Ensure publishing from default branch
if: ${{ github.ref_name != github.event.repository.default_branch }}
env:
DEFAULT_BRANCH: ${{ github.event.repository.default_branch }}
shell: bash
run: |
set -euo pipefail
echo "Publishing is only allowed from ${DEFAULT_BRANCH}; current ref is ${GITHUB_REF_NAME}." >&2
exit 1
- name: Ensure release tag is new
env:
GH_TOKEN: ${{ github.token }}

View File

@@ -3,6 +3,7 @@ name: v8-canary
on:
pull_request:
paths:
- ".github/actions/setup-bazel-ci/**"
- ".github/scripts/rusty_v8_bazel.py"
- ".github/workflows/rusty-v8-release.yml"
- ".github/workflows/v8-canary.yml"
@@ -16,6 +17,7 @@ on:
branches:
- main
paths:
- ".github/actions/setup-bazel-ci/**"
- ".github/scripts/rusty_v8_bazel.py"
- ".github/workflows/rusty-v8-release.yml"
- ".github/workflows/v8-canary.yml"
@@ -75,7 +77,9 @@ jobs:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- name: Set up Bazel
uses: bazelbuild/setup-bazelisk@b39c379c82683a5f25d34f0d062761f62693e0b2 # v3
uses: ./.github/actions/setup-bazel-ci
with:
target: ${{ matrix.target }}
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6
@@ -101,6 +105,7 @@ jobs:
bazel_args=(
build
"--platforms=@llvm//platforms:${PLATFORM}"
--config=v8-release-compat
"${pair_target}"
"${extra_targets[@]}"
--build_metadata=COMMIT_SHA=$(git rev-parse HEAD)
@@ -123,6 +128,7 @@ jobs:
python3 .github/scripts/rusty_v8_bazel.py stage-release-pair \
--platform "${PLATFORM}" \
--target "${TARGET}" \
--bazel-config v8-release-compat \
--output-dir "dist/${TARGET}"
- name: Upload staged musl artifacts

2
.gitignore vendored
View File

@@ -52,6 +52,7 @@ yarn-error.log*
# env
.env*
!.env.example
.venv/
# package
*.tgz
@@ -91,4 +92,3 @@ CHANGELOG.ignore.md
# Python bytecode files
__pycache__/
*.pyc

View File

@@ -19,6 +19,12 @@ In the codex-rs folder where the rust code lives:
- You can run `just argument-comment-lint` to run the lint check locally. This is powered by Bazel, so running it the first time can be slow if Bazel is not warmed up, though incremental invocations should take <15s. Most of the time, it is best to update the PR and let CI take responsibility for checking this (or run it asynchronously in the background after submitting the PR). Note CI checks all three platforms, which the local run does not.
- When possible, make `match` statements exhaustive and avoid wildcard arms.
- Newly added traits should include doc comments that explain their role and how implementations are expected to use them.
- Discourage both `#[async_trait]` and `#[allow(async_fn_in_trait)]` in Rust traits.
- Prefer native RPITIT trait methods with explicit `Send` bounds on the returned future, as in `3c7f013f9735` / `#16630`.
- Preferred trait shape:
`fn foo(&self, ...) -> impl std::future::Future<Output = T> + Send;`
- Implementations may still use `async fn foo(&self, ...) -> T` when they satisfy that contract.
- Do not use `#[allow(async_fn_in_trait)]` as a shortcut around spelling the future contract explicitly.
- When writing tests, prefer comparing the equality of entire objects over fields one by one.
- When making a change that adds or changes an API, ensure that the documentation in the `docs/` folder is up to date if applicable.
- Prefer private modules and explicitly exported public crate API.

View File

@@ -30,6 +30,40 @@ platform(
parents = ["@platforms//host"],
)
platform(
name = "windows_x86_64_gnullvm",
constraint_values = [
"@platforms//cpu:x86_64",
"@platforms//os:windows",
"@rules_rs//rs/experimental/platforms/constraints:windows_gnullvm",
],
)
platform(
name = "windows_x86_64_msvc",
constraint_values = [
"@platforms//cpu:x86_64",
"@platforms//os:windows",
"@rules_rs//rs/experimental/platforms/constraints:windows_msvc",
],
)
toolchain(
name = "windows_gnullvm_tests_on_msvc_host_toolchain",
exec_compatible_with = [
"@platforms//cpu:x86_64",
"@platforms//os:windows",
"@rules_rs//rs/experimental/platforms/constraints:windows_msvc",
],
target_compatible_with = [
"@platforms//cpu:x86_64",
"@platforms//os:windows",
"@rules_rs//rs/experimental/platforms/constraints:windows_gnullvm",
],
toolchain = "@bazel_tools//tools/test:empty_toolchain",
toolchain_type = "@bazel_tools//tools/test:default_test_toolchain_type",
)
alias(
name = "rbe",
actual = "@rbe_platform",

View File

@@ -327,6 +327,18 @@ crate.annotation(
"RUSTY_V8_SRC_BINDING_PATH": "$(execpath @v8_targets//:rusty_v8_binding_for_target)",
},
crate = "v8",
# Keep the Rust feature aligned with the source-built Bazel artifacts.
# Windows MSVC still consumes upstream non-sandboxed prebuilts.
crate_features_select = {
"aarch64-apple-darwin": ["v8_enable_sandbox"],
"aarch64-pc-windows-gnullvm": ["v8_enable_sandbox"],
"aarch64-unknown-linux-gnu": ["v8_enable_sandbox"],
"aarch64-unknown-linux-musl": ["v8_enable_sandbox"],
"x86_64-apple-darwin": ["v8_enable_sandbox"],
"x86_64-pc-windows-gnullvm": ["v8_enable_sandbox"],
"x86_64-unknown-linux-gnu": ["v8_enable_sandbox"],
"x86_64-unknown-linux-musl": ["v8_enable_sandbox"],
},
gen_build_script = "on",
patch_args = ["-p1"],
patches = [

14
MODULE.bazel.lock generated

File diff suppressed because one or more lines are too long

3
NOTICE
View File

@@ -4,6 +4,3 @@ Copyright 2025 OpenAI
This project includes code derived from [Ratatui](https://github.com/ratatui/ratatui), licensed under the MIT license.
Copyright (c) 2016-2022 Florian Dehau
Copyright (c) 2023-2025 The Ratatui Developers
This project includes Meriyah parser assets from [meriyah](https://github.com/meriyah/meriyah), licensed under the ISC license.
Copyright (c) 2019 and later, KFlash and others.

View File

@@ -1 +0,0 @@
node_modules/

View File

@@ -1,59 +0,0 @@
FROM node:24-slim
ARG TZ
ENV TZ="$TZ"
# Install basic development tools, ca-certificates, and iptables/ipset, then clean up apt cache to reduce image size
RUN apt-get update && apt-get install -y --no-install-recommends \
aggregate \
ca-certificates \
curl \
dnsutils \
fzf \
gh \
git \
gnupg2 \
iproute2 \
ipset \
iptables \
jq \
less \
man-db \
procps \
unzip \
ripgrep \
zsh \
&& rm -rf /var/lib/apt/lists/*
# Ensure default node user has access to /usr/local/share
RUN mkdir -p /usr/local/share/npm-global && \
chown -R node:node /usr/local/share
ARG USERNAME=node
# Set up non-root user
USER node
# Install global packages
ENV NPM_CONFIG_PREFIX=/usr/local/share/npm-global
ENV PATH=$PATH:/usr/local/share/npm-global/bin
# Install codex
COPY dist/codex.tgz codex.tgz
RUN npm install -g codex.tgz \
&& npm cache clean --force \
&& rm -rf /usr/local/share/npm-global/lib/node_modules/codex-cli/node_modules/.cache \
&& rm -rf /usr/local/share/npm-global/lib/node_modules/codex-cli/tests \
&& rm -rf /usr/local/share/npm-global/lib/node_modules/codex-cli/docs
# Inside the container we consider the environment already sufficiently locked
# down, therefore instruct Codex CLI to allow running without sandboxing.
ENV CODEX_UNSAFE_ALLOW_NO_SANDBOX=1
# Copy and set up firewall script as root.
USER root
COPY scripts/init_firewall.sh /usr/local/bin/
RUN chmod 500 /usr/local/bin/init_firewall.sh
# Drop back to non-root.
USER node

View File

@@ -18,5 +18,5 @@
"url": "git+https://github.com/openai/codex.git",
"directory": "codex-cli"
},
"packageManager": "pnpm@10.29.3+sha512.498e1fb4cca5aa06c1dcf2611e6fafc50972ffe7189998c409e90de74566444298ffe43e6cd2acdc775ba1aa7cc5e092a8b7054c811ba8c5770f84693d33d2dc"
"packageManager": "pnpm@10.33.0+sha512.10568bb4a6afb58c9eb3630da90cc9516417abebd3fabbe6739f0ae795728da1491e9db5a544c76ad8eb7570f5c4bb3d6c637b2cb41bfdcdb47fa823c8649319"
}

View File

@@ -1,16 +0,0 @@
#!/bin/bash
set -euo pipefail
SCRIPT_DIR=$(realpath "$(dirname "$0")")
trap "popd >> /dev/null" EXIT
pushd "$SCRIPT_DIR/.." >> /dev/null || {
echo "Error: Failed to change directory to $SCRIPT_DIR/.."
exit 1
}
pnpm install
pnpm run build
rm -rf ./dist/openai-codex-*.tgz
pnpm pack --pack-destination ./dist
mv ./dist/openai-codex-*.tgz ./dist/codex.tgz
docker build -t codex -f "./Dockerfile" .

View File

@@ -69,8 +69,8 @@ PACKAGE_EXPANSIONS: dict[str, list[str]] = {
PACKAGE_NATIVE_COMPONENTS: dict[str, list[str]] = {
"codex": [],
"codex-linux-x64": ["codex", "rg"],
"codex-linux-arm64": ["codex", "rg"],
"codex-linux-x64": ["bwrap", "codex", "rg"],
"codex-linux-arm64": ["bwrap", "codex", "rg"],
"codex-darwin-x64": ["codex", "rg"],
"codex-darwin-arm64": ["codex", "rg"],
"codex-win32-x64": ["codex", "rg", "codex-windows-sandbox-setup", "codex-command-runner"],
@@ -87,6 +87,7 @@ PACKAGE_TARGET_FILTERS: dict[str, str] = {
PACKAGE_CHOICES = tuple(PACKAGE_NATIVE_COMPONENTS)
COMPONENT_DEST_DIR: dict[str, str] = {
"bwrap": "codex-resources",
"codex": "codex",
"codex-responses-api-proxy": "codex-responses-api-proxy",
"codex-windows-sandbox-setup": "codex",
@@ -137,6 +138,16 @@ def parse_args() -> argparse.Namespace:
type=Path,
help="Directory containing pre-installed native binaries to bundle (vendor root).",
)
parser.add_argument(
"--allow-missing-native-component",
dest="allow_missing_native_components",
action="append",
default=[],
help=(
"Native component that may be absent from --vendor-src. Intended for CI "
"compatibility with older artifact workflows; releases should not use this."
),
)
return parser.parse_args()
@@ -177,6 +188,7 @@ def main() -> int:
staging_dir,
native_components,
target_filter={target_filter} if target_filter else None,
allow_missing_components=set(args.allow_missing_native_components),
)
if release_version:
@@ -365,12 +377,14 @@ def copy_native_binaries(
staging_dir: Path,
components: list[str],
target_filter: set[str] | None = None,
allow_missing_components: set[str] | None = None,
) -> None:
vendor_src = vendor_src.resolve()
if not vendor_src.exists():
raise RuntimeError(f"Vendor source directory not found: {vendor_src}")
components_set = {component for component in components if component in COMPONENT_DEST_DIR}
allow_missing_components = allow_missing_components or set()
if not components_set:
return
@@ -399,6 +413,8 @@ def copy_native_binaries(
src_component_dir = target_dir / dest_dir_name
if not src_component_dir.exists():
if component in allow_missing_components:
continue
raise RuntimeError(
f"Missing native component '{component}' in vendor source: {src_component_dir}"
)

View File

@@ -1,5 +1,5 @@
#!/usr/bin/env python3
"""Install Codex native binaries (Rust CLI plus ripgrep helpers)."""
"""Install Codex native binaries (Rust CLI, bwrap, and ripgrep helpers)."""
import argparse
from contextlib import contextmanager
@@ -42,8 +42,15 @@ class BinaryComponent:
WINDOWS_TARGETS = tuple(target for target in BINARY_TARGETS if "windows" in target)
LINUX_TARGETS = tuple(target for target in BINARY_TARGETS if "linux" in target)
BINARY_COMPONENTS = {
"bwrap": BinaryComponent(
artifact_prefix="bwrap",
dest_dir="codex-resources",
binary_basename="bwrap",
targets=LINUX_TARGETS,
),
"codex": BinaryComponent(
artifact_prefix="codex",
dest_dir="codex",
@@ -135,7 +142,7 @@ def parse_args() -> argparse.Namespace:
choices=tuple(list(BINARY_COMPONENTS) + ["rg"]),
help=(
"Limit installation to the specified components."
" May be repeated. Defaults to codex, codex-windows-sandbox-setup,"
" May be repeated. Defaults to bwrap, codex, codex-windows-sandbox-setup,"
" codex-command-runner, and rg."
),
)
@@ -159,6 +166,7 @@ def main() -> int:
vendor_dir.mkdir(parents=True, exist_ok=True)
components = args.components or [
"bwrap",
"codex",
"codex-windows-sandbox-setup",
"codex-command-runner",

View File

@@ -92,4 +92,4 @@ quoted_args=""
for arg in "$@"; do
quoted_args+=" $(printf '%q' "$arg")"
done
docker exec -it "$CONTAINER_NAME" bash -c "cd \"/app$WORK_DIR\" && codex --full-auto ${quoted_args}"
docker exec -it "$CONTAINER_NAME" bash -c "cd \"/app$WORK_DIR\" && codex --sandbox workspace-write --ask-for-approval on-request ${quoted_args}"

View File

@@ -6,5 +6,6 @@ ignore = [
"RUSTSEC-2024-0436", # paste 1.0.15 via starlark/ratatui; upstream crate is unmaintained
"RUSTSEC-2024-0320", # yaml-rust via syntect; remove when syntect drops or updates it
"RUSTSEC-2025-0141", # bincode via syntect; remove when syntect drops or updates it
"RUSTSEC-2026-0097", # rand 0.8.5 via age/codex-secrets and zbus/keyring; remove when transitive deps move to rand >=0.9.3
"RUSTSEC-2026-0118", # hickory-proto via rama-dns/rama-tcp; remove when rama updates to hickory 0.26.1 or hickory-net
"RUSTSEC-2026-0119", # hickory-proto via rama-dns/rama-tcp; remove when rama updates to hickory 0.26.1 or hickory-net
]

View File

@@ -17,7 +17,7 @@ jobs:
working-directory: codex-rs
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- uses: dtolnay/rust-toolchain@a0b273b48ed29de4470960879e8381ff45632f26 # 1.93.0
- name: Install cargo-audit
uses: taiki-e/install-action@v2
with:

View File

@@ -1,6 +1,5 @@
exports_files([
"clippy.toml",
"node-version.txt",
])
filegroup(

401
codex-rs/Cargo.lock generated
View File

@@ -929,9 +929,9 @@ dependencies = [
[[package]]
name = "aws-smithy-async"
version = "1.2.7"
version = "1.2.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9ee19095c7c4dda59f1697d028ce704c24b2d33c6718790c7f1d5a3015b4107c"
checksum = "2ffcaf626bdda484571968400c326a244598634dc75fd451325a54ad1a59acfc"
dependencies = [
"futures-util",
"pin-project-lite",
@@ -961,9 +961,9 @@ dependencies = [
[[package]]
name = "aws-smithy-http-client"
version = "1.1.5"
version = "1.1.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "59e62db736db19c488966c8d787f52e6270be565727236fd5579eaa301e7bc4a"
checksum = "6a2f165a7feee6f263028b899d0a181987f4fa7179a6411a32a439fba7c5f769"
dependencies = [
"aws-smithy-async",
"aws-smithy-runtime-api",
@@ -1013,9 +1013,9 @@ dependencies = [
[[package]]
name = "aws-smithy-runtime"
version = "1.9.6"
version = "1.9.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "65fda37911905ea4d3141a01364bc5509a0f32ae3f3b22d6e330c0abfb62d247"
checksum = "a392db6c583ea4a912538afb86b7be7c5d8887d91604f50eb55c262ee1b4a5f5"
dependencies = [
"aws-smithy-async",
"aws-smithy-http",
@@ -1037,11 +1037,12 @@ dependencies = [
[[package]]
name = "aws-smithy-runtime-api"
version = "1.9.3"
version = "1.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ab0d43d899f9e508300e587bf582ba54c27a452dd0a9ea294690669138ae14a2"
checksum = "b71a13df6ada0aafbf21a73bdfcdf9324cfa9df77d96b8446045be3cde61b42e"
dependencies = [
"aws-smithy-async",
"aws-smithy-runtime-api-macros",
"aws-smithy-types",
"bytes",
"http 0.2.12",
@@ -1053,10 +1054,21 @@ dependencies = [
]
[[package]]
name = "aws-smithy-types"
version = "1.3.5"
name = "aws-smithy-runtime-api-macros"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "905cb13a9895626d49cf2ced759b062d913834c7482c38e49557eac4e6193f01"
checksum = "8d7396fd9500589e62e460e987ecb671bad374934e55ec3b5f498cc7a8a8a7b7"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.114",
]
[[package]]
name = "aws-smithy-types"
version = "1.4.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9d73dbfbaa8e4bc57b9045137680b958d274823509a360abfd8e1d514d40c95c"
dependencies = [
"base64-simd",
"bytes",
@@ -1736,6 +1748,21 @@ dependencies = [
"unicode-width 0.2.1",
]
[[package]]
name = "codex-agent-graph-store"
version = "0.0.0"
dependencies = [
"async-trait",
"codex-protocol",
"codex-state",
"pretty_assertions",
"serde",
"serde_json",
"tempfile",
"thiserror 2.0.18",
"tokio",
]
[[package]]
name = "codex-agent-identity"
version = "0.0.0"
@@ -1746,6 +1773,7 @@ dependencies = [
"codex-protocol",
"crypto_box",
"ed25519-dalek",
"jsonwebtoken",
"pretty_assertions",
"rand 0.9.3",
"reqwest",
@@ -1761,6 +1789,7 @@ dependencies = [
"codex-app-server-protocol",
"codex-git-utils",
"codex-login",
"codex-model-provider",
"codex-plugin",
"codex-protocol",
"codex-utils-absolute-path",
@@ -1829,6 +1858,7 @@ dependencies = [
"clap",
"codex-analytics",
"codex-app-server-protocol",
"codex-app-server-transport",
"codex-arg0",
"codex-backend-client",
"codex-chatgpt",
@@ -1838,12 +1868,17 @@ dependencies = [
"codex-core-plugins",
"codex-device-key",
"codex-exec-server",
"codex-external-agent-migration",
"codex-external-agent-sessions",
"codex-features",
"codex-feedback",
"codex-file-search",
"codex-git-utils",
"codex-hooks",
"codex-login",
"codex-mcp",
"codex-memories-write",
"codex-model-provider",
"codex-model-provider-info",
"codex-models-manager",
"codex-otel",
@@ -1861,16 +1896,12 @@ dependencies = [
"codex-utils-cli",
"codex-utils-json-to-toml",
"codex-utils-pty",
"codex-utils-rustls-provider",
"constant_time_eq 0.3.1",
"core_test_support",
"flate2",
"futures",
"gethostname",
"hmac",
"jsonwebtoken",
"opentelemetry",
"opentelemetry_sdk",
"owo-colors",
"pretty_assertions",
"reqwest",
"rmcp",
@@ -1879,6 +1910,7 @@ dependencies = [
"serial_test",
"sha2",
"shlex",
"tar",
"tempfile",
"thiserror 2.0.18",
"time",
@@ -1912,6 +1944,7 @@ dependencies = [
"pretty_assertions",
"serde",
"serde_json",
"tempfile",
"tokio",
"tokio-tungstenite",
"toml 0.9.11+spec-1.1.0",
@@ -1967,6 +2000,45 @@ dependencies = [
"uuid",
]
[[package]]
name = "codex-app-server-transport"
version = "0.0.0"
dependencies = [
"anyhow",
"axum",
"base64 0.22.1",
"chrono",
"clap",
"codex-api",
"codex-app-server-protocol",
"codex-config",
"codex-core",
"codex-login",
"codex-model-provider",
"codex-state",
"codex-uds",
"codex-utils-absolute-path",
"codex-utils-rustls-provider",
"constant_time_eq 0.3.1",
"futures",
"gethostname",
"hmac",
"jsonwebtoken",
"owo-colors",
"pretty_assertions",
"serde",
"serde_json",
"sha2",
"tempfile",
"time",
"tokio",
"tokio-tungstenite",
"tokio-util",
"tracing",
"url",
"uuid",
]
[[package]]
name = "codex-apply-patch"
version = "0.0.0"
@@ -2033,9 +2105,11 @@ name = "codex-backend-client"
version = "0.0.0"
dependencies = [
"anyhow",
"codex-api",
"codex-backend-openapi-models",
"codex-client",
"codex-login",
"codex-model-provider",
"codex-protocol",
"pretty_assertions",
"reqwest",
@@ -2052,6 +2126,26 @@ dependencies = [
"serde_with",
]
[[package]]
name = "codex-builtin-mcps"
version = "0.0.0"
dependencies = [
"anyhow",
"codex-config",
"codex-memories-mcp",
"codex-utils-absolute-path",
"pretty_assertions",
]
[[package]]
name = "codex-bwrap"
version = "0.0.0"
dependencies = [
"cc",
"libc",
"pkg-config",
]
[[package]]
name = "codex-chatgpt"
version = "0.0.0"
@@ -2059,11 +2153,13 @@ dependencies = [
"anyhow",
"clap",
"codex-app-server-protocol",
"codex-config",
"codex-connectors",
"codex-core",
"codex-core-plugins",
"codex-git-utils",
"codex-login",
"codex-model-provider",
"codex-plugin",
"codex-utils-cargo-bin",
"codex-utils-cli",
"pretty_assertions",
@@ -2082,15 +2178,16 @@ dependencies = [
"assert_matches",
"clap",
"clap_complete",
"codex-api",
"codex-app-server",
"codex-app-server-protocol",
"codex-app-server-test-client",
"codex-arg0",
"codex-builtin-mcps",
"codex-chatgpt",
"codex-cloud-tasks",
"codex-config",
"codex-core",
"codex-core-plugins",
"codex-exec",
"codex-exec-server",
"codex-execpolicy",
@@ -2098,11 +2195,12 @@ dependencies = [
"codex-login",
"codex-mcp",
"codex-mcp-server",
"codex-model-provider",
"codex-memories-write",
"codex-models-manager",
"codex-protocol",
"codex-responses-api-proxy",
"codex-rmcp-client",
"codex-rollout-trace",
"codex-sandboxing",
"codex-state",
"codex-stdio-to-uds",
@@ -2144,6 +2242,7 @@ dependencies = [
"opentelemetry_sdk",
"pretty_assertions",
"rand 0.9.3",
"rcgen",
"reqwest",
"rustls",
"rustls-native-certs",
@@ -2190,7 +2289,6 @@ version = "0.0.0"
dependencies = [
"anyhow",
"async-trait",
"base64 0.22.1",
"chrono",
"clap",
"codex-client",
@@ -2199,6 +2297,7 @@ dependencies = [
"codex-core",
"codex-git-utils",
"codex-login",
"codex-model-provider",
"codex-tui",
"codex-utils-cli",
"crossterm",
@@ -2223,6 +2322,7 @@ dependencies = [
"anyhow",
"async-trait",
"chrono",
"codex-api",
"codex-backend-client",
"codex-git-utils",
"serde",
@@ -2267,20 +2367,26 @@ version = "0.0.0"
dependencies = [
"anyhow",
"async-trait",
"base64 0.22.1",
"codex-app-server-protocol",
"codex-execpolicy",
"codex-features",
"codex-file-system",
"codex-git-utils",
"codex-model-provider-info",
"codex-network-proxy",
"codex-protocol",
"codex-utils-absolute-path",
"codex-utils-path",
"core-foundation 0.9.4",
"dns-lookup",
"dunce",
"futures",
"gethostname",
"libc",
"multimap",
"pretty_assertions",
"prost 0.14.3",
"schemars 0.8.22",
"serde",
"serde_json",
@@ -2289,11 +2395,16 @@ dependencies = [
"tempfile",
"thiserror 2.0.18",
"tokio",
"tokio-stream",
"toml 0.9.11+spec-1.1.0",
"toml_edit 0.24.0+spec-1.1.0",
"tonic",
"tonic-prost",
"tonic-prost-build",
"tracing",
"wildmatch",
"winapi-util",
"windows-sys 0.52.0",
]
[[package]]
@@ -2322,6 +2433,7 @@ dependencies = [
"bm25",
"chrono",
"clap",
"codex-agent-graph-store",
"codex-analytics",
"codex-api",
"codex-app-server-protocol",
@@ -2340,6 +2452,7 @@ dependencies = [
"codex-hooks",
"codex-login",
"codex-mcp",
"codex-memories-read",
"codex-model-provider",
"codex-model-provider-info",
"codex-models-manager",
@@ -2350,8 +2463,8 @@ dependencies = [
"codex-response-debug-context",
"codex-rmcp-client",
"codex-rollout",
"codex-rollout-trace",
"codex-sandboxing",
"codex-secrets",
"codex-shell-command",
"codex-shell-escalation",
"codex-state",
@@ -2373,7 +2486,6 @@ dependencies = [
"codex-utils-string",
"codex-utils-template",
"codex-windows-sandbox",
"core-foundation 0.9.4",
"core_test_support",
"csv",
"ctor 0.6.3",
@@ -2424,38 +2536,63 @@ dependencies = [
"walkdir",
"which 8.0.0",
"whoami",
"windows-sys 0.52.0",
"wiremock",
"zip 2.4.2",
"zstd 0.13.3",
]
[[package]]
name = "codex-core-api"
version = "0.0.0"
dependencies = [
"codex-analytics",
"codex-app-server-protocol",
"codex-arg0",
"codex-config",
"codex-core",
"codex-exec-server",
"codex-features",
"codex-login",
"codex-model-provider-info",
"codex-models-manager",
"codex-protocol",
"codex-utils-absolute-path",
]
[[package]]
name = "codex-core-plugins"
version = "0.0.0"
dependencies = [
"anyhow",
"chrono",
"codex-analytics",
"codex-app-server-protocol",
"codex-config",
"codex-core-skills",
"codex-exec-server",
"codex-git-utils",
"codex-login",
"codex-model-provider",
"codex-otel",
"codex-plugin",
"codex-protocol",
"codex-utils-absolute-path",
"codex-utils-plugins",
"dirs",
"flate2",
"libc",
"pretty_assertions",
"reqwest",
"serde",
"serde_json",
"tar",
"tempfile",
"thiserror 2.0.18",
"tokio",
"toml 0.9.11+spec-1.1.0",
"tracing",
"url",
"wiremock",
"zip 2.4.2",
]
[[package]]
@@ -2468,6 +2605,7 @@ dependencies = [
"codex-config",
"codex-exec-server",
"codex-login",
"codex-model-provider",
"codex-otel",
"codex-protocol",
"codex-skills",
@@ -2504,6 +2642,7 @@ dependencies = [
name = "codex-device-key"
version = "0.0.0"
dependencies = [
"async-trait",
"base64 0.22.1",
"p256",
"pretty_assertions",
@@ -2511,6 +2650,7 @@ dependencies = [
"serde",
"serde_json",
"thiserror 2.0.18",
"tokio",
"url",
]
@@ -2526,6 +2666,7 @@ dependencies = [
"codex-apply-patch",
"codex-arg0",
"codex-cloud-requirements",
"codex-config",
"codex-core",
"codex-feedback",
"codex-git-utils",
@@ -2566,8 +2707,10 @@ dependencies = [
"arc-swap",
"async-trait",
"base64 0.22.1",
"bytes",
"codex-app-server-protocol",
"codex-config",
"codex-client",
"codex-file-system",
"codex-protocol",
"codex-sandboxing",
"codex-test-binary-support",
@@ -2576,16 +2719,20 @@ dependencies = [
"ctor 0.6.3",
"futures",
"pretty_assertions",
"reqwest",
"serde",
"serde_json",
"serial_test",
"sha2",
"tempfile",
"test-case",
"thiserror 2.0.18",
"tokio",
"tokio-tungstenite",
"tokio-util",
"tracing",
"uuid",
"wiremock",
]
[[package]]
@@ -2634,6 +2781,32 @@ dependencies = [
"syn 2.0.114",
]
[[package]]
name = "codex-external-agent-migration"
version = "0.0.0"
dependencies = [
"codex-hooks",
"pretty_assertions",
"serde_json",
"serde_yaml",
"tempfile",
"toml 0.9.11+spec-1.1.0",
]
[[package]]
name = "codex-external-agent-sessions"
version = "0.0.0"
dependencies = [
"chrono",
"codex-app-server-protocol",
"codex-protocol",
"codex-utils-output-truncation",
"serde",
"serde_json",
"sha2",
"tempfile",
]
[[package]]
name = "codex-features"
version = "0.0.0"
@@ -2676,14 +2849,23 @@ dependencies = [
"tokio",
]
[[package]]
name = "codex-file-system"
version = "0.0.0"
dependencies = [
"async-trait",
"codex-protocol",
"codex-utils-absolute-path",
"serde",
]
[[package]]
name = "codex-git-utils"
version = "0.0.0"
dependencies = [
"anyhow",
"assert_matches",
"chrono",
"codex-exec-server",
"codex-file-system",
"codex-protocol",
"codex-utils-absolute-path",
"futures",
@@ -2708,8 +2890,10 @@ dependencies = [
"anyhow",
"chrono",
"codex-config",
"codex-plugin",
"codex-protocol",
"codex-utils-absolute-path",
"codex-utils-output-truncation",
"futures",
"pretty_assertions",
"regex",
@@ -2718,6 +2902,8 @@ dependencies = [
"serde_json",
"tempfile",
"tokio",
"tracing",
"uuid",
]
[[package]]
@@ -2741,21 +2927,20 @@ dependencies = [
name = "codex-linux-sandbox"
version = "0.0.0"
dependencies = [
"cc",
"clap",
"codex-config",
"codex-core",
"codex-process-hardening",
"codex-protocol",
"codex-sandboxing",
"codex-utils-absolute-path",
"globset",
"landlock",
"libc",
"pkg-config",
"pretty_assertions",
"seccompiler",
"serde",
"serde_json",
"sha2",
"tempfile",
"tokio",
"url",
@@ -2794,6 +2979,7 @@ dependencies = [
"codex-terminal-detection",
"codex-utils-template",
"core_test_support",
"jsonwebtoken",
"keyring",
"once_cell",
"os_info",
@@ -2822,15 +3008,17 @@ version = "0.0.0"
dependencies = [
"anyhow",
"async-channel",
"codex-api",
"codex-async-utils",
"codex-builtin-mcps",
"codex-config",
"codex-exec-server",
"codex-login",
"codex-model-provider",
"codex-otel",
"codex-plugin",
"codex-protocol",
"codex-rmcp-client",
"codex-utils-absolute-path",
"codex-utils-plugins",
"futures",
"pretty_assertions",
@@ -2856,9 +3044,7 @@ dependencies = [
"codex-config",
"codex-core",
"codex-exec-server",
"codex-features",
"codex-login",
"codex-models-manager",
"codex-protocol",
"codex-shell-command",
"codex-utils-absolute-path",
@@ -2880,19 +3066,107 @@ dependencies = [
"wiremock",
]
[[package]]
name = "codex-memories-mcp"
version = "0.0.0"
dependencies = [
"anyhow",
"codex-utils-absolute-path",
"codex-utils-output-truncation",
"pretty_assertions",
"rmcp",
"schemars 0.8.22",
"serde",
"serde_json",
"tempfile",
"thiserror 2.0.18",
"tokio",
]
[[package]]
name = "codex-memories-read"
version = "0.0.0"
dependencies = [
"codex-protocol",
"codex-shell-command",
"codex-utils-absolute-path",
"codex-utils-output-truncation",
"codex-utils-template",
"pretty_assertions",
"tempfile",
"tokio",
]
[[package]]
name = "codex-memories-write"
version = "0.0.0"
dependencies = [
"anyhow",
"chrono",
"codex-backend-client",
"codex-config",
"codex-core",
"codex-features",
"codex-git-utils",
"codex-login",
"codex-models-manager",
"codex-otel",
"codex-protocol",
"codex-rollout",
"codex-rollout-trace",
"codex-secrets",
"codex-state",
"codex-terminal-detection",
"codex-utils-absolute-path",
"codex-utils-output-truncation",
"codex-utils-template",
"core_test_support",
"futures",
"pretty_assertions",
"serde",
"serde_json",
"tempfile",
"tokio",
"tracing",
"uuid",
"wiremock",
]
[[package]]
name = "codex-message-history"
version = "0.0.0"
dependencies = [
"codex-config",
"pretty_assertions",
"serde",
"serde_json",
"tempfile",
"tokio",
"tracing",
]
[[package]]
name = "codex-model-provider"
version = "0.0.0"
dependencies = [
"async-trait",
"codex-agent-identity",
"codex-api",
"codex-aws-auth",
"codex-client",
"codex-feedback",
"codex-login",
"codex-model-provider-info",
"codex-models-manager",
"codex-otel",
"codex-protocol",
"codex-response-debug-context",
"http 1.4.0",
"pretty_assertions",
"serde_json",
"tokio",
"tracing",
"wiremock",
]
[[package]]
@@ -2916,32 +3190,21 @@ dependencies = [
name = "codex-models-manager"
version = "0.0.0"
dependencies = [
"base64 0.22.1",
"async-trait",
"chrono",
"codex-api",
"codex-app-server-protocol",
"codex-collaboration-mode-templates",
"codex-config",
"codex-feedback",
"codex-login",
"codex-model-provider",
"codex-model-provider-info",
"codex-otel",
"codex-protocol",
"codex-response-debug-context",
"codex-utils-absolute-path",
"codex-utils-output-truncation",
"codex-utils-template",
"core_test_support",
"http 1.4.0",
"pretty_assertions",
"serde",
"serde_json",
"tempfile",
"tokio",
"tracing",
"tracing-subscriber",
"wiremock",
]
[[package]]
@@ -3030,6 +3293,7 @@ dependencies = [
name = "codex-plugin"
version = "0.0.0"
dependencies = [
"codex-config",
"codex-utils-absolute-path",
"codex-utils-plugins",
"thiserror 2.0.18",
@@ -3081,6 +3345,7 @@ dependencies = [
"tracing",
"ts-rs",
"uuid",
"wildmatch",
]
[[package]]
@@ -3126,6 +3391,8 @@ version = "0.0.0"
dependencies = [
"anyhow",
"axum",
"bytes",
"codex-api",
"codex-client",
"codex-config",
"codex-exec-server",
@@ -3185,11 +3452,14 @@ name = "codex-rollout-trace"
version = "0.0.0"
dependencies = [
"anyhow",
"codex-code-mode",
"codex-protocol",
"pretty_assertions",
"serde",
"serde_json",
"tempfile",
"tracing",
"uuid",
]
[[package]]
@@ -3335,6 +3605,17 @@ dependencies = [
"tempfile",
]
[[package]]
name = "codex-thread-manager-sample"
version = "0.0.0"
dependencies = [
"anyhow",
"clap",
"codex-core-api",
"serde_json",
"tracing",
]
[[package]]
name = "codex-thread-store"
version = "0.0.0"
@@ -3356,6 +3637,7 @@ dependencies = [
"tonic",
"tonic-prost",
"tonic-prost-build",
"tracing",
"uuid",
]
@@ -3395,6 +3677,7 @@ dependencies = [
"codex-cloud-requirements",
"codex-config",
"codex-connectors",
"codex-core-plugins",
"codex-core-skills",
"codex-exec-server",
"codex-features",
@@ -3404,6 +3687,8 @@ dependencies = [
"codex-install-context",
"codex-login",
"codex-mcp",
"codex-message-history",
"codex-model-provider",
"codex-model-provider-info",
"codex-models-manager",
"codex-otel",
@@ -3691,6 +3976,8 @@ version = "0.0.0"
dependencies = [
"pretty_assertions",
"regex-lite",
"serde",
"serde_json",
]
[[package]]
@@ -3715,6 +4002,7 @@ dependencies = [
"anyhow",
"base64 0.22.1",
"chrono",
"codex-otel",
"codex-protocol",
"codex-utils-absolute-path",
"codex-utils-pty",
@@ -3966,9 +4254,11 @@ dependencies = [
"assert_cmd",
"base64 0.22.1",
"codex-arg0",
"codex-config",
"codex-core",
"codex-exec-server",
"codex-features",
"codex-hooks",
"codex-login",
"codex-model-provider-info",
"codex-models-manager",
@@ -3985,6 +4275,7 @@ dependencies = [
"reqwest",
"serde_json",
"shlex",
"similar",
"tempfile",
"tokio",
"tokio-tungstenite",
@@ -8624,7 +8915,7 @@ version = "5.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "51e219e79014df21a225b1860a479e2dcd7cbd9130f4defd4bd0e191ea31d67d"
dependencies = [
"base64 0.22.1",
"base64 0.21.7",
"chrono",
"getrandom 0.2.17",
"http 1.4.0",
@@ -9092,7 +9383,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7d8fae84b431384b68627d0f9b3b1245fcf9f46f6c0e3dc902e9dce64edd1967"
dependencies = [
"libc",
"windows-sys 0.61.2",
"windows-sys 0.45.0",
]
[[package]]
@@ -10761,9 +11052,9 @@ dependencies = [
[[package]]
name = "rustls-webpki"
version = "0.103.12"
version = "0.103.13"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8279bb85272c9f10811ae6a6c547ff594d6a7f3c6c6b02ee9726d1d0dcfcdd06"
checksum = "61c429a8649f110dddef65e2a5ad240f747e85f7758a6bccc7e5777bd33f756e"
dependencies = [
"aws-lc-rs",
"ring",
@@ -12185,6 +12476,16 @@ version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7b2093cf4c8eb1e67749a6762251bc9cd836b6fc171623bd0a9d324d37af2417"
[[package]]
name = "tar"
version = "0.4.45"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "22692a6476a21fa75fdfc11d452fda482af402c008cdbaf3476414e122040973"
dependencies = [
"filetime",
"libc",
]
[[package]]
name = "target-lexicon"
version = "0.13.3"
@@ -13726,7 +14027,7 @@ version = "0.1.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c2a7b1c03c876122aa43f3020e6c3c3ee5c05081c9a00739faf7503aeba10d22"
dependencies = [
"windows-sys 0.61.2",
"windows-sys 0.48.0",
]
[[package]]

View File

@@ -2,11 +2,15 @@
members = [
"aws-auth",
"analytics",
"agent-graph-store",
"agent-identity",
"backend-client",
"builtin-mcps",
"bwrap",
"ansi-escape",
"async-utils",
"app-server",
"app-server-transport",
"app-server-client",
"app-server-protocol",
"app-server-test-client",
@@ -31,14 +35,18 @@ members = [
"shell-escalation",
"skills",
"core",
"core-api",
"core-plugins",
"core-skills",
"hooks",
"secrets",
"exec",
"file-system",
"exec-server",
"execpolicy",
"execpolicy-legacy",
"external-agent-migration",
"external-agent-sessions",
"keyring-store",
"file-search",
"linux-sandbox",
@@ -46,6 +54,9 @@ members = [
"login",
"codex-mcp",
"mcp-server",
"memories/mcp",
"memories/read",
"memories/write",
"model-provider-info",
"models-manager",
"network-proxy",
@@ -92,6 +103,7 @@ members = [
"state",
"terminal-detection",
"test-binary-support",
"thread-manager-sample",
"thread-store",
"uds",
"codex-experimental-api-macros",
@@ -113,11 +125,13 @@ license = "Apache-2.0"
# Internal
app_test_support = { path = "app-server/tests/common" }
codex-analytics = { path = "analytics" }
codex-agent-graph-store = { path = "agent-graph-store" }
codex-agent-identity = { path = "agent-identity" }
codex-ansi-escape = { path = "ansi-escape" }
codex-api = { path = "codex-api" }
codex-aws-auth = { path = "aws-auth" }
codex-app-server = { path = "app-server" }
codex-app-server-transport = { path = "app-server-transport" }
codex-app-server-client = { path = "app-server-client" }
codex-app-server-protocol = { path = "app-server-protocol" }
codex-app-server-test-client = { path = "app-server-test-client" }
@@ -125,6 +139,7 @@ codex-apply-patch = { path = "apply-patch" }
codex-arg0 = { path = "arg0" }
codex-async-utils = { path = "async-utils" }
codex-backend-client = { path = "backend-client" }
codex-builtin-mcps = { path = "builtin-mcps" }
codex-chatgpt = { path = "chatgpt" }
codex-cli = { path = "cli" }
codex-client = { path = "codex-client" }
@@ -136,12 +151,16 @@ codex-code-mode = { path = "code-mode" }
codex-config = { path = "config" }
codex-connectors = { path = "connectors" }
codex-core = { path = "core" }
codex-core-api = { path = "core-api" }
codex-core-plugins = { path = "core-plugins" }
codex-core-skills = { path = "core-skills" }
codex-device-key = { path = "device-key" }
codex-exec = { path = "exec" }
codex-file-system = { path = "file-system" }
codex-exec-server = { path = "exec-server" }
codex-execpolicy = { path = "execpolicy" }
codex-external-agent-migration = { path = "external-agent-migration" }
codex-external-agent-sessions = { path = "external-agent-sessions" }
codex-experimental-api-macros = { path = "codex-experimental-api-macros" }
codex-features = { path = "features" }
codex-feedback = { path = "feedback" }
@@ -153,6 +172,10 @@ codex-keyring-store = { path = "keyring-store" }
codex-linux-sandbox = { path = "linux-sandbox" }
codex-lmstudio = { path = "lmstudio" }
codex-login = { path = "login" }
codex-message-history = { path = "message-history" }
codex-memories-mcp = { path = "memories/mcp" }
codex-memories-read = { path = "memories/read" }
codex-memories-write = { path = "memories/write" }
codex-mcp = { path = "codex-mcp" }
codex-mcp-server = { path = "mcp-server" }
codex-model-provider-info = { path = "model-provider-info" }
@@ -169,6 +192,7 @@ codex-responses-api-proxy = { path = "responses-api-proxy" }
codex-response-debug-context = { path = "response-debug-context" }
codex-rmcp-client = { path = "rmcp-client" }
codex-rollout = { path = "rollout" }
codex-rollout-trace = { path = "rollout-trace" }
codex-sandboxing = { path = "sandboxing" }
codex-secrets = { path = "secrets" }
codex-shell-command = { path = "shell-command" }
@@ -253,6 +277,7 @@ encoding_rs = "0.8.35"
env-flags = "0.1.1"
env_logger = "0.11.9"
eventsource-stream = "0.2.3"
flate2 = "1.1.8"
futures = { version = "0.3", default-features = false }
gethostname = "1.1.0"
gix = { version = "0.81.0", default-features = false, features = ["sha1"] }
@@ -303,6 +328,10 @@ quick-xml = "0.38.4"
rand = "0.9"
ratatui = "0.29.0"
ratatui-macros = "0.6.0"
rcgen = { version = "0.14.7", default-features = false, features = [
"aws_lc_rs",
"pem",
] }
regex = "1.12.3"
regex-lite = "0.1.8"
reqwest = { version = "0.12", features = ["cookies"] }
@@ -345,6 +374,7 @@ strum_macros = "0.28.0"
supports-color = "3.0.2"
syntect = "5"
sys-locale = "0.3.2"
tar = { version = "=0.4.45", default-features = false }
tempfile = "3.23.0"
test-log = "0.2.19"
textwrap = "0.16.2"
@@ -436,6 +466,8 @@ unwrap_used = "deny"
# silence the false positive here instead of deleting a real dependency.
[workspace.metadata.cargo-shear]
ignored = [
"codex-agent-graph-store",
"codex-memories-mcp",
"icu_provider",
"openssl-sys",
"codex-utils-readiness",

View File

@@ -59,19 +59,22 @@ To test to see what happens when a command is run under the sandbox provided by
```
# macOS
codex sandbox macos [--full-auto] [--log-denials] [COMMAND]...
codex sandbox macos [--log-denials] [COMMAND]...
# Linux
codex sandbox linux [--full-auto] [COMMAND]...
codex sandbox linux [COMMAND]...
# Windows
codex sandbox windows [--full-auto] [COMMAND]...
codex sandbox windows [COMMAND]...
# Legacy aliases
codex debug seatbelt [--full-auto] [--log-denials] [COMMAND]...
codex debug landlock [--full-auto] [COMMAND]...
codex debug seatbelt [--log-denials] [COMMAND]...
codex debug landlock [COMMAND]...
```
To try a writable legacy sandbox mode with these commands, pass an explicit config override such
as `-c 'sandbox_mode="workspace-write"'`.
### Selecting a sandbox policy via `--sandbox`
The Rust CLI exposes a dedicated `--sandbox` (`-s`) flag that lets you pick the sandbox policy **without** having to reach for the generic `-c/--config` option:
@@ -94,7 +97,7 @@ In `workspace-write`, Codex also includes `~/.codex/memories` in its writable ro
This folder is the root of a Cargo workspace. It contains quite a bit of experimental code, but here are the key crates:
- [`core/`](./core) contains the business logic for Codex. Ultimately, we hope this to be a library crate that is generally useful for building other Rust/native applications that use Codex.
- [`core/`](./core) contains the business logic for Codex. Ultimately, we hope this becomes a library crate that is generally useful for building other Rust/native applications that use Codex.
- [`exec/`](./exec) "headless" CLI for use in automation.
- [`tui/`](./tui) CLI that launches a fullscreen TUI built with [Ratatui](https://ratatui.rs/).
- [`cli/`](./cli) CLI multitool that provides the aforementioned CLIs via subcommands.

View File

@@ -0,0 +1,6 @@
load("//:defs.bzl", "codex_rust_crate")
codex_rust_crate(
name = "agent-graph-store",
crate_name = "codex_agent_graph_store",
)

View File

@@ -0,0 +1,25 @@
[package]
edition.workspace = true
license.workspace = true
name = "codex-agent-graph-store"
version.workspace = true
[lib]
name = "codex_agent_graph_store"
path = "src/lib.rs"
[lints]
workspace = true
[dependencies]
async-trait = { workspace = true }
codex-protocol = { workspace = true }
codex-state = { workspace = true }
serde = { workspace = true, features = ["derive"] }
thiserror = { workspace = true }
[dev-dependencies]
pretty_assertions = { workspace = true }
serde_json = { workspace = true }
tempfile = { workspace = true }
tokio = { workspace = true, features = ["macros", "rt-multi-thread"] }

View File

@@ -0,0 +1,20 @@
/// Result type returned by agent graph store operations.
pub type AgentGraphStoreResult<T> = Result<T, AgentGraphStoreError>;
/// Error type shared by agent graph store implementations.
#[derive(Debug, thiserror::Error)]
pub enum AgentGraphStoreError {
/// The caller supplied invalid request data.
#[error("invalid agent graph store request: {message}")]
InvalidRequest {
/// User-facing explanation of the invalid request.
message: String,
},
/// Catch-all for implementation failures that do not fit a more specific category.
#[error("agent graph store internal error: {message}")]
Internal {
/// User-facing explanation of the implementation failure.
message: String,
},
}

View File

@@ -0,0 +1,12 @@
//! Storage-neutral parent/child topology for thread-spawned agents.
mod error;
mod local;
mod store;
mod types;
pub use error::AgentGraphStoreError;
pub use error::AgentGraphStoreResult;
pub use local::LocalAgentGraphStore;
pub use store::AgentGraphStore;
pub use types::ThreadSpawnEdgeStatus;

View File

@@ -0,0 +1,325 @@
use async_trait::async_trait;
use codex_protocol::ThreadId;
use codex_state::StateRuntime;
use std::sync::Arc;
use crate::AgentGraphStore;
use crate::AgentGraphStoreError;
use crate::AgentGraphStoreResult;
use crate::ThreadSpawnEdgeStatus;
/// SQLite-backed implementation of [`AgentGraphStore`] using an existing state runtime.
#[derive(Clone)]
pub struct LocalAgentGraphStore {
state_db: Arc<StateRuntime>,
}
impl std::fmt::Debug for LocalAgentGraphStore {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("LocalAgentGraphStore")
.field("codex_home", &self.state_db.codex_home())
.finish_non_exhaustive()
}
}
impl LocalAgentGraphStore {
/// Create a local graph store from an already-initialized state runtime.
pub fn new(state_db: Arc<StateRuntime>) -> Self {
Self { state_db }
}
}
#[async_trait]
impl AgentGraphStore for LocalAgentGraphStore {
async fn upsert_thread_spawn_edge(
&self,
parent_thread_id: ThreadId,
child_thread_id: ThreadId,
status: ThreadSpawnEdgeStatus,
) -> AgentGraphStoreResult<()> {
self.state_db
.upsert_thread_spawn_edge(parent_thread_id, child_thread_id, to_state_status(status))
.await
.map_err(internal_error)
}
async fn set_thread_spawn_edge_status(
&self,
child_thread_id: ThreadId,
status: ThreadSpawnEdgeStatus,
) -> AgentGraphStoreResult<()> {
self.state_db
.set_thread_spawn_edge_status(child_thread_id, to_state_status(status))
.await
.map_err(internal_error)
}
async fn list_thread_spawn_children(
&self,
parent_thread_id: ThreadId,
status_filter: Option<ThreadSpawnEdgeStatus>,
) -> AgentGraphStoreResult<Vec<ThreadId>> {
if let Some(status) = status_filter {
return self
.state_db
.list_thread_spawn_children_with_status(parent_thread_id, to_state_status(status))
.await
.map_err(internal_error);
}
self.state_db
.list_thread_spawn_children(parent_thread_id)
.await
.map_err(internal_error)
}
async fn list_thread_spawn_descendants(
&self,
root_thread_id: ThreadId,
status_filter: Option<ThreadSpawnEdgeStatus>,
) -> AgentGraphStoreResult<Vec<ThreadId>> {
match status_filter {
Some(status) => self
.state_db
.list_thread_spawn_descendants_with_status(root_thread_id, to_state_status(status))
.await
.map_err(internal_error),
None => self
.state_db
.list_thread_spawn_descendants(root_thread_id)
.await
.map_err(internal_error),
}
}
}
fn to_state_status(status: ThreadSpawnEdgeStatus) -> codex_state::DirectionalThreadSpawnEdgeStatus {
match status {
ThreadSpawnEdgeStatus::Open => codex_state::DirectionalThreadSpawnEdgeStatus::Open,
ThreadSpawnEdgeStatus::Closed => codex_state::DirectionalThreadSpawnEdgeStatus::Closed,
}
}
fn internal_error(err: impl std::fmt::Display) -> AgentGraphStoreError {
AgentGraphStoreError::Internal {
message: err.to_string(),
}
}
#[cfg(test)]
mod tests {
use super::*;
use codex_state::DirectionalThreadSpawnEdgeStatus;
use pretty_assertions::assert_eq;
use tempfile::TempDir;
struct TestRuntime {
state_db: Arc<StateRuntime>,
_codex_home: TempDir,
}
fn thread_id(suffix: u128) -> ThreadId {
ThreadId::from_string(&format!("00000000-0000-0000-0000-{suffix:012}"))
.expect("valid thread id")
}
async fn state_runtime() -> TestRuntime {
let codex_home = TempDir::new().expect("tempdir should be created");
let state_db =
StateRuntime::init(codex_home.path().to_path_buf(), "test-provider".to_string())
.await
.expect("state db should initialize");
TestRuntime {
state_db,
_codex_home: codex_home,
}
}
#[tokio::test]
async fn local_store_upserts_and_lists_direct_children_with_status_filters() {
let fixture = state_runtime().await;
let state_db = fixture.state_db;
let store = LocalAgentGraphStore::new(state_db.clone());
let parent_thread_id = thread_id(/*suffix*/ 1);
let first_child_thread_id = thread_id(/*suffix*/ 2);
let second_child_thread_id = thread_id(/*suffix*/ 3);
store
.upsert_thread_spawn_edge(
parent_thread_id,
second_child_thread_id,
ThreadSpawnEdgeStatus::Closed,
)
.await
.expect("closed child edge should insert");
store
.upsert_thread_spawn_edge(
parent_thread_id,
first_child_thread_id,
ThreadSpawnEdgeStatus::Open,
)
.await
.expect("open child edge should insert");
let all_children = store
.list_thread_spawn_children(parent_thread_id, /*status_filter*/ None)
.await
.expect("all children should load");
assert_eq!(
all_children,
vec![first_child_thread_id, second_child_thread_id]
);
let open_children = store
.list_thread_spawn_children(parent_thread_id, Some(ThreadSpawnEdgeStatus::Open))
.await
.expect("open children should load");
let state_open_children = state_db
.list_thread_spawn_children_with_status(
parent_thread_id,
DirectionalThreadSpawnEdgeStatus::Open,
)
.await
.expect("state open children should load");
assert_eq!(open_children, state_open_children);
assert_eq!(open_children, vec![first_child_thread_id]);
let closed_children = store
.list_thread_spawn_children(parent_thread_id, Some(ThreadSpawnEdgeStatus::Closed))
.await
.expect("closed children should load");
assert_eq!(closed_children, vec![second_child_thread_id]);
}
#[tokio::test]
async fn local_store_updates_edge_status() {
let fixture = state_runtime().await;
let state_db = fixture.state_db;
let store = LocalAgentGraphStore::new(state_db);
let parent_thread_id = thread_id(/*suffix*/ 10);
let child_thread_id = thread_id(/*suffix*/ 11);
store
.upsert_thread_spawn_edge(
parent_thread_id,
child_thread_id,
ThreadSpawnEdgeStatus::Open,
)
.await
.expect("child edge should insert");
store
.set_thread_spawn_edge_status(child_thread_id, ThreadSpawnEdgeStatus::Closed)
.await
.expect("child edge should close");
let open_children = store
.list_thread_spawn_children(parent_thread_id, Some(ThreadSpawnEdgeStatus::Open))
.await
.expect("open children should load");
assert_eq!(open_children, Vec::<ThreadId>::new());
let closed_children = store
.list_thread_spawn_children(parent_thread_id, Some(ThreadSpawnEdgeStatus::Closed))
.await
.expect("closed children should load");
assert_eq!(closed_children, vec![child_thread_id]);
}
#[tokio::test]
async fn local_store_lists_descendants_breadth_first_with_status_filters() {
let fixture = state_runtime().await;
let state_db = fixture.state_db;
let store = LocalAgentGraphStore::new(state_db.clone());
let root_thread_id = thread_id(/*suffix*/ 20);
let later_child_thread_id = thread_id(/*suffix*/ 22);
let earlier_child_thread_id = thread_id(/*suffix*/ 21);
let closed_grandchild_thread_id = thread_id(/*suffix*/ 23);
let open_grandchild_thread_id = thread_id(/*suffix*/ 24);
let closed_child_thread_id = thread_id(/*suffix*/ 25);
let closed_great_grandchild_thread_id = thread_id(/*suffix*/ 26);
for (parent_thread_id, child_thread_id, status) in [
(
root_thread_id,
later_child_thread_id,
ThreadSpawnEdgeStatus::Open,
),
(
root_thread_id,
earlier_child_thread_id,
ThreadSpawnEdgeStatus::Open,
),
(
earlier_child_thread_id,
open_grandchild_thread_id,
ThreadSpawnEdgeStatus::Open,
),
(
later_child_thread_id,
closed_grandchild_thread_id,
ThreadSpawnEdgeStatus::Closed,
),
(
root_thread_id,
closed_child_thread_id,
ThreadSpawnEdgeStatus::Closed,
),
(
closed_child_thread_id,
closed_great_grandchild_thread_id,
ThreadSpawnEdgeStatus::Closed,
),
] {
store
.upsert_thread_spawn_edge(parent_thread_id, child_thread_id, status)
.await
.expect("edge should insert");
}
let all_descendants = store
.list_thread_spawn_descendants(root_thread_id, /*status_filter*/ None)
.await
.expect("all descendants should load");
assert_eq!(
all_descendants,
vec![
earlier_child_thread_id,
later_child_thread_id,
closed_child_thread_id,
closed_grandchild_thread_id,
open_grandchild_thread_id,
closed_great_grandchild_thread_id,
]
);
let open_descendants = store
.list_thread_spawn_descendants(root_thread_id, Some(ThreadSpawnEdgeStatus::Open))
.await
.expect("open descendants should load");
let state_open_descendants = state_db
.list_thread_spawn_descendants_with_status(
root_thread_id,
DirectionalThreadSpawnEdgeStatus::Open,
)
.await
.expect("state open descendants should load");
assert_eq!(open_descendants, state_open_descendants);
assert_eq!(
open_descendants,
vec![
earlier_child_thread_id,
later_child_thread_id,
open_grandchild_thread_id,
]
);
let closed_descendants = store
.list_thread_spawn_descendants(root_thread_id, Some(ThreadSpawnEdgeStatus::Closed))
.await
.expect("closed descendants should load");
assert_eq!(
closed_descendants,
vec![closed_child_thread_id, closed_great_grandchild_thread_id]
);
}
}

View File

@@ -0,0 +1,55 @@
use async_trait::async_trait;
use codex_protocol::ThreadId;
use crate::AgentGraphStoreResult;
use crate::ThreadSpawnEdgeStatus;
/// Storage-neutral boundary for persisted thread-spawn parent/child topology.
///
/// Implementations are expected to return stable ordering for list methods so callers can merge
/// persisted graph state with live in-memory state without introducing nondeterministic output.
#[async_trait]
pub trait AgentGraphStore: Send + Sync {
/// Insert or replace the directional parent/child edge for a spawned thread.
///
/// `child_thread_id` has at most one persisted parent. Re-inserting the same child should
/// update both the parent and status to match the supplied values.
async fn upsert_thread_spawn_edge(
&self,
parent_thread_id: ThreadId,
child_thread_id: ThreadId,
status: ThreadSpawnEdgeStatus,
) -> AgentGraphStoreResult<()>;
/// Update the persisted lifecycle status of a spawned thread's incoming edge.
///
/// Implementations should treat missing children as a successful no-op.
async fn set_thread_spawn_edge_status(
&self,
child_thread_id: ThreadId,
status: ThreadSpawnEdgeStatus,
) -> AgentGraphStoreResult<()>;
/// List direct spawned children of a parent thread.
///
/// When `status_filter` is `Some`, only child edges with that exact status are returned. When
/// it is `None`, all direct child edges are returned regardless of status, including statuses
/// that may be added by a future store implementation.
async fn list_thread_spawn_children(
&self,
parent_thread_id: ThreadId,
status_filter: Option<ThreadSpawnEdgeStatus>,
) -> AgentGraphStoreResult<Vec<ThreadId>>;
/// List spawned descendants breadth-first by depth, then by thread id.
///
/// `status_filter` is applied to every traversed edge, not just to the returned descendants.
/// For example, `Some(Open)` walks only open edges, so descendants under a closed edge are not
/// included even if their own incoming edge is open. `None` walks and returns every persisted
/// edge regardless of status.
async fn list_thread_spawn_descendants(
&self,
root_thread_id: ThreadId,
status_filter: Option<ThreadSpawnEdgeStatus>,
) -> AgentGraphStoreResult<Vec<ThreadId>>;
}

View File

@@ -0,0 +1,42 @@
use serde::Deserialize;
use serde::Serialize;
/// Lifecycle status attached to a directional thread-spawn edge.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum ThreadSpawnEdgeStatus {
/// The child thread is still live or resumable as an open spawned agent.
Open,
/// The child thread has been closed from the parent/child graph's perspective.
Closed,
}
#[cfg(test)]
mod tests {
use super::*;
use pretty_assertions::assert_eq;
#[test]
fn thread_spawn_edge_status_serializes_as_snake_case() {
assert_eq!(
serde_json::to_string(&ThreadSpawnEdgeStatus::Open)
.expect("open status should serialize"),
"\"open\""
);
assert_eq!(
serde_json::to_string(&ThreadSpawnEdgeStatus::Closed)
.expect("closed status should serialize"),
"\"closed\""
);
assert_eq!(
serde_json::from_str::<ThreadSpawnEdgeStatus>("\"open\"")
.expect("open status should deserialize"),
ThreadSpawnEdgeStatus::Open
);
assert_eq!(
serde_json::from_str::<ThreadSpawnEdgeStatus>("\"closed\"")
.expect("closed status should deserialize"),
ThreadSpawnEdgeStatus::Closed
);
}
}

View File

@@ -19,6 +19,7 @@ chrono = { workspace = true }
codex-protocol = { workspace = true }
crypto_box = { workspace = true }
ed25519-dalek = { workspace = true }
jsonwebtoken = { workspace = true }
rand = { workspace = true }
reqwest = { workspace = true, features = ["json"] }
serde = { workspace = true, features = ["derive"] }

View File

@@ -8,6 +8,7 @@ use base64::engine::general_purpose::STANDARD as BASE64_STANDARD;
use base64::engine::general_purpose::URL_SAFE_NO_PAD;
use chrono::SecondsFormat;
use chrono::Utc;
use codex_protocol::auth::PlanType as AuthPlanType;
use codex_protocol::protocol::SessionSource;
use crypto_box::SecretKey as Curve25519SecretKey;
use ed25519_dalek::Signer as _;
@@ -15,14 +16,24 @@ use ed25519_dalek::SigningKey;
use ed25519_dalek::VerifyingKey;
use ed25519_dalek::pkcs8::DecodePrivateKey;
use ed25519_dalek::pkcs8::EncodePrivateKey;
use jsonwebtoken::Algorithm;
use jsonwebtoken::DecodingKey;
use jsonwebtoken::Validation;
use jsonwebtoken::decode;
use jsonwebtoken::decode_header;
use jsonwebtoken::jwk::JwkSet;
use rand::TryRngCore;
use rand::rngs::OsRng;
use serde::Deserialize;
use serde::Serialize;
use serde::de::DeserializeOwned;
use sha2::Digest as _;
use sha2::Sha512;
const AGENT_TASK_REGISTRATION_TIMEOUT: Duration = Duration::from_secs(30);
const AGENT_IDENTITY_JWKS_TIMEOUT: Duration = Duration::from_secs(10);
const AGENT_IDENTITY_JWT_AUDIENCE: &str = "codex-app-server";
const AGENT_IDENTITY_JWT_ISSUER: &str = "https://chatgpt.com/codex-backend/agent-identity";
/// Stored key material for a registered agent identity.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
@@ -50,6 +61,22 @@ pub struct GeneratedAgentKeyMaterial {
pub public_key_ssh: String,
}
/// Claims carried by an Agent Identity JWT.
#[derive(Clone, Debug, Deserialize, PartialEq, Eq)]
pub struct AgentIdentityJwtClaims {
pub iss: String,
pub aud: String,
pub iat: usize,
pub exp: usize,
pub agent_runtime_id: String,
pub agent_private_key: String,
pub account_id: String,
pub chatgpt_user_id: String,
pub email: String,
pub plan_type: AuthPlanType,
pub chatgpt_account_is_fedramp: bool,
}
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq, Eq)]
struct AgentAssertionEnvelope {
agent_runtime_id: String,
@@ -98,6 +125,65 @@ pub fn authorization_header_for_agent_task(
Ok(format!("AgentAssertion {serialized_assertion}"))
}
pub async fn fetch_agent_identity_jwks(
client: &reqwest::Client,
chatgpt_base_url: &str,
) -> Result<JwkSet> {
let response = client
.get(agent_identity_jwks_url(chatgpt_base_url))
.timeout(AGENT_IDENTITY_JWKS_TIMEOUT)
.send()
.await
.context("failed to request agent identity JWKS")?
.error_for_status()
.context("agent identity JWKS endpoint returned an error")?;
response
.json()
.await
.context("failed to decode agent identity JWKS")
}
pub fn decode_agent_identity_jwt(
jwt: &str,
jwks: Option<&JwkSet>,
) -> Result<AgentIdentityJwtClaims> {
let Some(jwks) = jwks else {
return decode_agent_identity_jwt_payload(jwt);
};
let header = decode_header(jwt).context("failed to decode agent identity JWT header")?;
let kid = header
.kid
.context("agent identity JWT header does not include a kid")?;
let jwk = jwks
.find(&kid)
.with_context(|| format!("agent identity JWT kid {kid} is not trusted"))?;
let decoding_key = DecodingKey::from_jwk(jwk).context("failed to build JWT decoding key")?;
let mut validation = Validation::new(Algorithm::RS256);
validation.set_audience(&[AGENT_IDENTITY_JWT_AUDIENCE]);
validation.set_issuer(&[AGENT_IDENTITY_JWT_ISSUER]);
validation.required_spec_claims.insert("iss".to_string());
validation.required_spec_claims.insert("aud".to_string());
decode::<AgentIdentityJwtClaims>(jwt, &decoding_key, &validation)
.map(|data| data.claims)
.context("failed to verify agent identity JWT")
}
fn decode_agent_identity_jwt_payload<T: DeserializeOwned>(jwt: &str) -> Result<T> {
let mut parts = jwt.split('.');
let (_header_b64, payload_b64, _sig_b64) = match (parts.next(), parts.next(), parts.next()) {
(Some(h), Some(p), Some(s)) if !h.is_empty() && !p.is_empty() && !s.is_empty() => (h, p, s),
_ => anyhow::bail!("invalid agent identity JWT format"),
};
anyhow::ensure!(parts.next().is_none(), "invalid agent identity JWT format");
let payload_bytes = URL_SAFE_NO_PAD
.decode(payload_b64)
.context("agent identity JWT payload is not valid base64url")?;
serde_json::from_slice(&payload_bytes).context("agent identity JWT payload is not valid JSON")
}
pub fn sign_task_registration_payload(
key: AgentIdentityKey<'_>,
timestamp: &str,
@@ -117,19 +203,27 @@ pub async fn register_agent_task(
signature: sign_task_registration_payload(key, &timestamp)?,
timestamp,
};
let url = agent_task_registration_url(chatgpt_base_url, key.agent_runtime_id);
let response = client
.post(agent_task_registration_url(
chatgpt_base_url,
key.agent_runtime_id,
))
.post(url)
.timeout(AGENT_TASK_REGISTRATION_TIMEOUT)
.json(&request)
.send()
.await
.context("failed to register agent task")?
.error_for_status()
.context("failed to register agent task")?
.context("failed to register agent task")?;
if !response.status().is_success() {
let status = response.status();
let body = response.text().await.unwrap_or_default();
let body = if body.len() > 512 {
format!("{}...", body.chars().take(512).collect::<String>())
} else {
body
};
anyhow::bail!("failed to register agent task with status {status}: {body}");
}
let response = response
.json()
.await
.context("failed to decode agent task registration response")?;
@@ -217,6 +311,15 @@ pub fn agent_identity_biscuit_url(chatgpt_base_url: &str) -> String {
format!("{trimmed}/authenticate_app_v2")
}
pub fn agent_identity_jwks_url(chatgpt_base_url: &str) -> String {
let trimmed = chatgpt_base_url.trim_end_matches('/');
if trimmed.contains("/backend-api") {
format!("{trimmed}/wham/agent-identities/jwks")
} else {
format!("{trimmed}/agent-identities/jwks")
}
}
pub fn agent_identity_request_id() -> Result<String> {
let mut request_id_bytes = [0u8; 16];
OsRng
@@ -228,29 +331,6 @@ pub fn agent_identity_request_id() -> Result<String> {
))
}
pub fn normalize_chatgpt_base_url(chatgpt_base_url: &str) -> String {
let mut base_url = chatgpt_base_url.trim_end_matches('/').to_string();
for suffix in [
"/wham/remote/control/server/enroll",
"/wham/remote/control/server",
] {
if let Some(stripped) = base_url.strip_suffix(suffix) {
base_url = stripped.to_string();
break;
}
}
if let Some(stripped) = base_url.strip_suffix("/codex") {
base_url = stripped.to_string();
}
if (base_url.starts_with("https://chatgpt.com")
|| base_url.starts_with("https://chat.openai.com"))
&& !base_url.contains("/backend-api")
{
base_url = format!("{base_url}/backend-api");
}
base_url
}
pub fn build_abom(session_source: SessionSource) -> AgentBillOfMaterials {
AgentBillOfMaterials {
agent_version: env!("CARGO_PKG_VERSION").to_string(),
@@ -260,6 +340,7 @@ pub fn build_abom(session_source: SessionSource) -> AgentBillOfMaterials {
| SessionSource::Exec
| SessionSource::Mcp
| SessionSource::Custom(_)
| SessionSource::Internal(_)
| SessionSource::SubAgent(_)
| SessionSource::Unknown => "codex-cli".to_string(),
},
@@ -323,8 +404,12 @@ mod tests {
use base64::Engine as _;
use ed25519_dalek::Signature;
use ed25519_dalek::Verifier as _;
use jsonwebtoken::EncodingKey;
use jsonwebtoken::Header;
use pretty_assertions::assert_eq;
use codex_protocol::auth::KnownPlan;
use super::*;
#[test]
@@ -405,10 +490,248 @@ mod tests {
}
#[test]
fn normalize_chatgpt_base_url_strips_codex_before_backend_api() {
fn decode_agent_identity_jwt_reads_claims() {
let jwt = jwt_with_payload(serde_json::json!({
"iss": AGENT_IDENTITY_JWT_ISSUER,
"aud": AGENT_IDENTITY_JWT_AUDIENCE,
"iat": 1_700_000_000usize,
"exp": 4_000_000_000usize,
"agent_runtime_id": "agent-runtime-id",
"agent_private_key": "private-key",
"account_id": "account-id",
"chatgpt_user_id": "user-id",
"email": "user@example.com",
"plan_type": "pro",
"chatgpt_account_is_fedramp": false,
}));
let claims = decode_agent_identity_jwt(&jwt, /*jwks*/ None).expect("JWT should decode");
assert_eq!(
normalize_chatgpt_base_url("https://chatgpt.com/codex"),
"https://chatgpt.com/backend-api"
claims,
AgentIdentityJwtClaims {
iss: AGENT_IDENTITY_JWT_ISSUER.to_string(),
aud: AGENT_IDENTITY_JWT_AUDIENCE.to_string(),
iat: 1_700_000_000,
exp: 4_000_000_000,
agent_runtime_id: "agent-runtime-id".to_string(),
agent_private_key: "private-key".to_string(),
account_id: "account-id".to_string(),
chatgpt_user_id: "user-id".to_string(),
email: "user@example.com".to_string(),
plan_type: AuthPlanType::Known(KnownPlan::Pro),
chatgpt_account_is_fedramp: false,
}
);
}
#[test]
fn decode_agent_identity_jwt_maps_raw_plan_aliases() {
let jwt = jwt_with_payload(serde_json::json!({
"iss": AGENT_IDENTITY_JWT_ISSUER,
"aud": AGENT_IDENTITY_JWT_AUDIENCE,
"iat": 1_700_000_000usize,
"exp": 4_000_000_000usize,
"agent_runtime_id": "agent-runtime-id",
"agent_private_key": "private-key",
"account_id": "account-id",
"chatgpt_user_id": "user-id",
"email": "user@example.com",
"plan_type": "hc",
"chatgpt_account_is_fedramp": false,
}));
let claims = decode_agent_identity_jwt(&jwt, /*jwks*/ None).expect("JWT should decode");
assert_eq!(claims.plan_type, AuthPlanType::Known(KnownPlan::Enterprise));
}
#[test]
fn decode_agent_identity_jwt_verifies_when_jwks_is_present() {
let jwks = test_jwks("test-key");
let claims = AgentIdentityJwtClaims {
iss: AGENT_IDENTITY_JWT_ISSUER.to_string(),
aud: AGENT_IDENTITY_JWT_AUDIENCE.to_string(),
iat: 1_700_000_000,
exp: 4_000_000_000,
agent_runtime_id: "agent-runtime-id".to_string(),
agent_private_key: "private-key".to_string(),
account_id: "account-id".to_string(),
chatgpt_user_id: "user-id".to_string(),
email: "user@example.com".to_string(),
plan_type: AuthPlanType::Known(KnownPlan::Pro),
chatgpt_account_is_fedramp: false,
};
let jwt = jsonwebtoken::encode(
&test_jwt_header("test-key"),
&serde_json::json!({
"iss": claims.iss,
"aud": claims.aud,
"iat": claims.iat,
"exp": claims.exp,
"agent_runtime_id": claims.agent_runtime_id,
"agent_private_key": claims.agent_private_key,
"account_id": claims.account_id,
"chatgpt_user_id": claims.chatgpt_user_id,
"email": claims.email,
"plan_type": "pro",
"chatgpt_account_is_fedramp": claims.chatgpt_account_is_fedramp,
}),
&test_rsa_encoding_key(),
)
.expect("JWT should encode");
let expected_claims = AgentIdentityJwtClaims {
iss: AGENT_IDENTITY_JWT_ISSUER.to_string(),
aud: AGENT_IDENTITY_JWT_AUDIENCE.to_string(),
iat: 1_700_000_000,
exp: 4_000_000_000,
agent_runtime_id: "agent-runtime-id".to_string(),
agent_private_key: "private-key".to_string(),
account_id: "account-id".to_string(),
chatgpt_user_id: "user-id".to_string(),
email: "user@example.com".to_string(),
plan_type: AuthPlanType::Known(KnownPlan::Pro),
chatgpt_account_is_fedramp: false,
};
assert_eq!(
decode_agent_identity_jwt(&jwt, Some(&jwks)).expect("JWT should verify"),
expected_claims
);
}
#[test]
fn decode_agent_identity_jwt_rejects_untrusted_kid() {
let jwks = test_jwks("other-key");
let jwt = jsonwebtoken::encode(
&test_jwt_header("test-key"),
&serde_json::json!({
"iss": AGENT_IDENTITY_JWT_ISSUER,
"aud": AGENT_IDENTITY_JWT_AUDIENCE,
"iat": 1_700_000_000,
"exp": 4_000_000_000usize,
"agent_runtime_id": "agent-runtime-id",
"agent_private_key": "private-key",
"account_id": "account-id",
"chatgpt_user_id": "user-id",
"email": "user@example.com",
"plan_type": "pro",
"chatgpt_account_is_fedramp": false,
}),
&test_rsa_encoding_key(),
)
.expect("JWT should encode");
decode_agent_identity_jwt(&jwt, Some(&jwks)).expect_err("JWT should not verify");
}
#[test]
fn decode_agent_identity_jwt_requires_issuer_and_audience() {
let jwks = test_jwks("test-key");
let jwt = jsonwebtoken::encode(
&test_jwt_header("test-key"),
&serde_json::json!({
"iat": 1_700_000_000,
"exp": 4_000_000_000usize,
"agent_runtime_id": "agent-runtime-id",
"agent_private_key": "private-key",
"account_id": "account-id",
"chatgpt_user_id": "user-id",
"email": "user@example.com",
"plan_type": "pro",
"chatgpt_account_is_fedramp": false,
}),
&test_rsa_encoding_key(),
)
.expect("JWT should encode");
decode_agent_identity_jwt(&jwt, Some(&jwks)).expect_err("JWT should not verify");
}
fn test_jwt_header(kid: &str) -> Header {
let mut header = Header::new(Algorithm::RS256);
header.kid = Some(kid.to_string());
header
}
fn test_rsa_encoding_key() -> EncodingKey {
EncodingKey::from_rsa_pem(
br#"-----BEGIN PRIVATE KEY-----
MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQDWpAXYypOsYAwO
bvBduMk/mxaoYDze0AZSzaSzLuIlcsl2EKDgC3AabhIWXh/qTGEJLOU3VB1e5mO9
FPbBlmIZSL3FQTbyt/hYutPFKfCou5PLmScw/TzILS3/RhT8UY9kxxZvXiEbTki9
mvxRuZFpVqDFJHwfitIjKZGhXDCYVKurPTrxetYZJg0h8sQBLKjkZ0BqqaTUkAsg
0eBgZAlXEzG3By8PGhUqYLt6W1Q3KYw0FmGy/gTyzH1g0ukGgSJvOd8SkNT8MbOs
zl5kKxDNqpuEE6UZ3jbuJ+5382d31w+rOAJRzbf7QVdI9+luCSwJcDACYPQ4WNBa
uCpV0ovpAgMBAAECggEAVu84LwZdqYN9XpswX8VoPYrjMm9IODapWQBRpQFoNyK2
1ksF3bjEPvA2Azk8U/l7k+vLKw22l6lY3EyRZPcz5GnB8xLm3ogE3mtNOp4yCyVu
RxhQ91aaN7mU17/a4BdorLi2LYVCg3zBmYociD1Q2AluNGsCmwPu+K7tfR2J0Sg8
NjqiTbDG1XDpR/icwgC9t6vh8lZpCHDhF4tbQfLLVLeA/OdcuzXDyMCXbmdVIdBQ
rm4aIFmr2e1/2ctTbCg85S6AGFTH+pSLjrwTzyvf+F6NW5uNjLQAQLFj+EznBDxj
Xdx90cySrjsKK6PVWQF4RiTvkSW8eWL7R6B2FZbGwQKBgQDuVQRj72hWloR7mbEL
aUEEv3pIXTMXWEsoMBNczos/1L1RnAN1AI44TurznasPZAWvQj+kVbLDR+TAeZrL
iA8HIWswQUI18hFmgKzSkwIXGtubcKVrgsKeS4lMDKCM/Ef6WAYdeq6ronoY5lCN
YrJFmGp81W5zcV7lyiycgbSiGwKBgQDmjWYf6pZjrK7Z+OJ3X1AZfi2vss15SCvL
3fPgzIDbViztpGyQhc3DQZIsBNIu0xZp/veGce9TEeTds2ro9NfdJFeou8+fC7Pq
sOsM3amGFFi+ZW/9BWyjZEM88bgWWAjqLHbpfHDxjAf5CSxddqxgHlbP0Ytyb1Vg
gmPDn9YKSwKBgQDbTi3hC35WFuDHn0/zcSHcDZmnFuOZeqyFyV83yfMGhGrEuqvP
sPgtRikajJ3IZsB4WZyYSidZXEFY/0z6NjOl2xF38MTNQPbT/FmK1q1Yt2UWrlv5
BvSwlk87RG9D7C0LZo4R+D7cPoDdgqjiwMvMEIkEX5zn641oI1ZTmWKuuwKBgQCD
KF+3unnRvHRAVoFnTZbA2fJdqMeRvogD04GhGlYX8V9f1hFY6nXTJaNlXVzA/J8c
r8ra9kgjJuPfZ+ljG58OFFW2DRohLcQtuHYPfK6rMzoFHqnl9EcIcMp7ijuionR3
29HOJFgQYgxLFXfit9d6WugiE+BTupiEbckZif13HwKBgE/lAlkVHP6YahOO2Ljc
J1bwkqKZTB5dHolX9A58e/xXnfZ5P8f3Z83+Izap3FwqQulk7b1WO1MQcHuVg2NN
5da0D4h2rYOXnbYIg0BVu4spQbaM6ewsp66b8+MzLOBvj8SzWdt1Oyw0q/MRyQAR
8U4M2TSWCKUY/A6sT4W8+mT9
-----END PRIVATE KEY-----"#,
)
.expect("test RSA key should parse")
}
fn test_jwks(kid: &str) -> jsonwebtoken::jwk::JwkSet {
serde_json::from_value(serde_json::json!({
"keys": [{
"kty": "RSA",
"kid": kid,
"use": "sig",
"alg": "RS256",
"n": "1qQF2MqTrGAMDm7wXbjJP5sWqGA83tAGUs2ksy7iJXLJdhCg4AtwGm4SFl4f6kxhCSzlN1QdXuZjvRT2wZZiGUi9xUE28rf4WLrTxSnwqLuTy5knMP08yC0t_0YU_FGPZMcWb14hG05IvZr8UbmRaVagxSR8H4rSIymRoVwwmFSrqz068XrWGSYNIfLEASyo5GdAaqmk1JALINHgYGQJVxMxtwcvDxoVKmC7eltUNymMNBZhsv4E8sx9YNLpBoEibznfEpDU_DGzrM5eZCsQzaqbhBOlGd427ifud_Nnd9cPqzgCUc23-0FXSPfpbgksCXAwAmD0OFjQWrgqVdKL6Q",
"e": "AQAB",
}]
}))
.expect("test JWKS should parse")
}
#[test]
fn agent_identity_jwks_url_uses_backend_api_base_url() {
assert_eq!(
agent_identity_jwks_url("https://chatgpt.com/backend-api"),
"https://chatgpt.com/backend-api/wham/agent-identities/jwks"
);
assert_eq!(
agent_identity_jwks_url("https://chatgpt.com/backend-api/"),
"https://chatgpt.com/backend-api/wham/agent-identities/jwks"
);
}
#[test]
fn agent_identity_jwks_url_uses_codex_api_base_url() {
assert_eq!(
agent_identity_jwks_url("http://localhost:8080/api/codex"),
"http://localhost:8080/api/codex/agent-identities/jwks"
);
assert_eq!(
agent_identity_jwks_url("http://localhost:8080/api/codex/"),
"http://localhost:8080/api/codex/agent-identities/jwks"
);
}
fn jwt_with_payload(payload: serde_json::Value) -> String {
let encode = |bytes: &[u8]| URL_SAFE_NO_PAD.encode(bytes);
let header_b64 = encode(br#"{"alg":"none","typ":"JWT"}"#);
let payload_b64 = encode(&serde_json::to_vec(&payload).expect("payload should serialize"));
let signature_b64 = encode(b"sig");
format!("{header_b64}.{payload_b64}.{signature_b64}")
}
}

View File

@@ -16,6 +16,7 @@ workspace = true
codex-app-server-protocol = { workspace = true }
codex-git-utils = { workspace = true }
codex-login = { workspace = true }
codex-model-provider = { workspace = true }
codex-plugin = { workspace = true }
codex-protocol = { workspace = true }
os_info = { workspace = true }

File diff suppressed because it is too large Load Diff

View File

@@ -1,5 +1,6 @@
use crate::events::AppServerRpcTransport;
use crate::events::GuardianReviewEventParams;
use crate::events::GuardianReviewAnalyticsResult;
use crate::events::GuardianReviewTrackContext;
use crate::events::TrackEventRequest;
use crate::events::TrackEventsRequest;
use crate::events::current_runtime_metadata;
@@ -21,11 +22,13 @@ use crate::facts::TurnResolvedConfigFact;
use crate::facts::TurnTokenUsageFact;
use crate::reducer::AnalyticsReducer;
use codex_app_server_protocol::ClientRequest;
use codex_app_server_protocol::ClientResponse;
use codex_app_server_protocol::ClientResponsePayload;
use codex_app_server_protocol::InitializeParams;
use codex_app_server_protocol::JSONRPCErrorError;
use codex_app_server_protocol::RequestId;
use codex_app_server_protocol::ServerNotification;
use codex_app_server_protocol::ServerRequest;
use codex_app_server_protocol::ServerResponse;
use codex_login::AuthManager;
use codex_login::default_client::create_client;
use codex_plugin::PluginTelemetryMetadata;
@@ -48,8 +51,7 @@ pub(crate) struct AnalyticsEventsQueue {
#[derive(Clone)]
pub struct AnalyticsEventsClient {
queue: AnalyticsEventsQueue,
analytics_enabled: Option<bool>,
queue: Option<AnalyticsEventsQueue>,
}
impl AnalyticsEventsQueue {
@@ -118,11 +120,15 @@ impl AnalyticsEventsClient {
analytics_enabled: Option<bool>,
) -> Self {
Self {
queue: AnalyticsEventsQueue::new(Arc::clone(&auth_manager), base_url),
analytics_enabled,
queue: (analytics_enabled != Some(false))
.then(|| AnalyticsEventsQueue::new(Arc::clone(&auth_manager), base_url)),
}
}
pub fn disabled() -> Self {
Self { queue: None }
}
pub fn track_skill_invocations(
&self,
tracking: TrackEventsContext,
@@ -161,9 +167,13 @@ impl AnalyticsEventsClient {
));
}
pub fn track_guardian_review(&self, input: GuardianReviewEventParams) {
pub fn track_guardian_review(
&self,
tracking: &GuardianReviewTrackContext,
result: GuardianReviewAnalyticsResult,
) {
self.record_fact(AnalyticsFact::Custom(CustomAnalyticsFact::GuardianReview(
Box::new(input),
Box::new(tracking.event_params(result)),
)));
}
@@ -176,16 +186,30 @@ impl AnalyticsEventsClient {
)));
}
pub fn track_request(&self, connection_id: u64, request_id: RequestId, request: ClientRequest) {
self.record_fact(AnalyticsFact::Request {
pub fn track_request(
&self,
connection_id: u64,
request_id: RequestId,
request: &ClientRequest,
) {
if !matches!(
request,
ClientRequest::TurnStart { .. } | ClientRequest::TurnSteer { .. }
) {
return;
}
self.record_fact(AnalyticsFact::ClientRequest {
connection_id,
request_id,
request: Box::new(request),
request: Box::new(request.clone()),
});
}
pub fn track_app_used(&self, tracking: TrackEventsContext, app: AppInvocation) {
if !self.queue.should_enqueue_app_used(&tracking, &app) {
let Some(queue) = self.queue.as_ref() else {
return;
};
if !queue.should_enqueue_app_used(&tracking, &app) {
return;
}
self.record_fact(AnalyticsFact::Custom(CustomAnalyticsFact::AppUsed(
@@ -200,7 +224,10 @@ impl AnalyticsEventsClient {
}
pub fn track_plugin_used(&self, tracking: TrackEventsContext, plugin: PluginTelemetryMetadata) {
if !self.queue.should_enqueue_plugin_used(&tracking, &plugin) {
let Some(queue) = self.queue.as_ref() else {
return;
};
if !queue.should_enqueue_plugin_used(&tracking, &plugin) {
return;
}
self.record_fact(AnalyticsFact::Custom(CustomAnalyticsFact::PluginUsed(
@@ -263,15 +290,30 @@ impl AnalyticsEventsClient {
}
pub(crate) fn record_fact(&self, input: AnalyticsFact) {
if self.analytics_enabled == Some(false) {
return;
if let Some(queue) = self.queue.as_ref() {
queue.try_send(input);
}
self.queue.try_send(input);
}
pub fn track_response(&self, connection_id: u64, response: ClientResponse) {
self.record_fact(AnalyticsFact::Response {
pub fn track_response(
&self,
connection_id: u64,
request_id: RequestId,
response: ClientResponsePayload,
) {
if !matches!(
response,
ClientResponsePayload::ThreadStart(_)
| ClientResponsePayload::ThreadResume(_)
| ClientResponsePayload::ThreadFork(_)
| ClientResponsePayload::TurnStart(_)
| ClientResponsePayload::TurnSteer(_)
) {
return;
}
self.record_fact(AnalyticsFact::ClientResponse {
connection_id,
request_id,
response: Box::new(response),
});
}
@@ -291,7 +333,31 @@ impl AnalyticsEventsClient {
});
}
pub fn track_server_request(&self, connection_id: u64, request: ServerRequest) {
self.record_fact(AnalyticsFact::ServerRequest {
connection_id,
request: Box::new(request),
});
}
pub fn track_server_response(&self, response: ServerResponse) {
self.record_fact(AnalyticsFact::ServerResponse {
response: Box::new(response),
});
}
pub fn track_notification(&self, notification: ServerNotification) {
if !matches!(
notification,
ServerNotification::TurnStarted(_)
| ServerNotification::TurnCompleted(_)
| ServerNotification::ItemStarted(_)
| ServerNotification::ItemCompleted(_)
| ServerNotification::ItemGuardianApprovalReviewStarted(_)
| ServerNotification::ItemGuardianApprovalReviewCompleted(_)
) {
return;
}
self.record_fact(AnalyticsFact::Notification(Box::new(notification)));
}
}
@@ -307,16 +373,9 @@ async fn send_track_events(
let Some(auth) = auth_manager.auth().await else {
return;
};
if !auth.is_chatgpt_auth() {
if !auth.uses_codex_backend() {
return;
}
let access_token = match auth.get_token() {
Ok(token) => token,
Err(_) => return,
};
let Some(account_id) = auth.get_account_id() else {
return;
};
let base_url = base_url.trim_end_matches('/');
let url = format!("{base_url}/codex/analytics-events/events");
@@ -325,8 +384,7 @@ async fn send_track_events(
let response = create_client()
.post(&url)
.timeout(ANALYTICS_EVENTS_TIMEOUT)
.bearer_auth(&access_token)
.header("chatgpt-account-id", &account_id)
.headers(codex_model_provider::auth_provider_from_auth(&auth).to_auth_headers())
.header("Content-Type", "application/json")
.json(&payload)
.send()
@@ -344,3 +402,7 @@ async fn send_track_events(
}
}
}
#[cfg(test)]
#[path = "client_tests.rs"]
mod tests;

View File

@@ -0,0 +1,224 @@
use super::AnalyticsEventsClient;
use super::AnalyticsEventsQueue;
use crate::facts::AnalyticsFact;
use codex_app_server_protocol::ApprovalsReviewer as AppServerApprovalsReviewer;
use codex_app_server_protocol::AskForApproval as AppServerAskForApproval;
use codex_app_server_protocol::ClientRequest;
use codex_app_server_protocol::ClientResponsePayload;
use codex_app_server_protocol::PermissionProfile as AppServerPermissionProfile;
use codex_app_server_protocol::RequestId;
use codex_app_server_protocol::SandboxPolicy as AppServerSandboxPolicy;
use codex_app_server_protocol::SessionSource as AppServerSessionSource;
use codex_app_server_protocol::Thread;
use codex_app_server_protocol::ThreadArchiveParams;
use codex_app_server_protocol::ThreadArchiveResponse;
use codex_app_server_protocol::ThreadForkResponse;
use codex_app_server_protocol::ThreadResumeResponse;
use codex_app_server_protocol::ThreadStartResponse;
use codex_app_server_protocol::ThreadStatus as AppServerThreadStatus;
use codex_app_server_protocol::Turn;
use codex_app_server_protocol::TurnStartParams;
use codex_app_server_protocol::TurnStartResponse;
use codex_app_server_protocol::TurnStatus as AppServerTurnStatus;
use codex_app_server_protocol::TurnSteerParams;
use codex_app_server_protocol::TurnSteerResponse;
use codex_protocol::models::PermissionProfile as CorePermissionProfile;
use codex_utils_absolute_path::test_support::PathBufExt;
use codex_utils_absolute_path::test_support::test_path_buf;
use std::collections::HashSet;
use std::sync::Arc;
use std::sync::Mutex;
use tokio::sync::mpsc;
use tokio::sync::mpsc::error::TryRecvError;
fn client_with_receiver() -> (AnalyticsEventsClient, mpsc::Receiver<AnalyticsFact>) {
let (sender, receiver) = mpsc::channel(8);
let queue = AnalyticsEventsQueue {
sender,
app_used_emitted_keys: Arc::new(Mutex::new(HashSet::new())),
plugin_used_emitted_keys: Arc::new(Mutex::new(HashSet::new())),
};
(AnalyticsEventsClient { queue: Some(queue) }, receiver)
}
fn sample_turn_start_request() -> ClientRequest {
ClientRequest::TurnStart {
request_id: RequestId::Integer(1),
params: TurnStartParams {
thread_id: "thread-1".to_string(),
input: Vec::new(),
..Default::default()
},
}
}
fn sample_turn_steer_request() -> ClientRequest {
ClientRequest::TurnSteer {
request_id: RequestId::Integer(2),
params: TurnSteerParams {
thread_id: "thread-1".to_string(),
expected_turn_id: "turn-1".to_string(),
input: Vec::new(),
responsesapi_client_metadata: None,
},
}
}
fn sample_thread_archive_request() -> ClientRequest {
ClientRequest::ThreadArchive {
request_id: RequestId::Integer(3),
params: ThreadArchiveParams {
thread_id: "thread-1".to_string(),
},
}
}
fn sample_thread(thread_id: &str) -> Thread {
Thread {
id: thread_id.to_string(),
session_id: format!("session-{thread_id}"),
forked_from_id: None,
preview: "first prompt".to_string(),
ephemeral: false,
model_provider: "openai".to_string(),
created_at: 1,
updated_at: 2,
status: AppServerThreadStatus::Idle,
path: None,
cwd: test_path_buf("/tmp").abs(),
cli_version: "0.0.0".to_string(),
source: AppServerSessionSource::Exec,
thread_source: None,
agent_nickname: None,
agent_role: None,
git_info: None,
name: None,
turns: Vec::new(),
}
}
fn sample_permission_profile() -> AppServerPermissionProfile {
CorePermissionProfile::Disabled.into()
}
fn sample_thread_start_response() -> ClientResponsePayload {
ClientResponsePayload::ThreadStart(ThreadStartResponse {
thread: sample_thread("thread-1"),
model: "gpt-5".to_string(),
model_provider: "openai".to_string(),
service_tier: None,
cwd: test_path_buf("/tmp").abs(),
instruction_sources: Vec::new(),
approval_policy: AppServerAskForApproval::OnFailure,
approvals_reviewer: AppServerApprovalsReviewer::User,
sandbox: AppServerSandboxPolicy::DangerFullAccess,
permission_profile: Some(sample_permission_profile()),
active_permission_profile: None,
reasoning_effort: None,
})
}
fn sample_thread_resume_response() -> ClientResponsePayload {
ClientResponsePayload::ThreadResume(ThreadResumeResponse {
thread: sample_thread("thread-2"),
model: "gpt-5".to_string(),
model_provider: "openai".to_string(),
service_tier: None,
cwd: test_path_buf("/tmp").abs(),
instruction_sources: Vec::new(),
approval_policy: AppServerAskForApproval::OnFailure,
approvals_reviewer: AppServerApprovalsReviewer::User,
sandbox: AppServerSandboxPolicy::DangerFullAccess,
permission_profile: Some(sample_permission_profile()),
active_permission_profile: None,
reasoning_effort: None,
})
}
fn sample_thread_fork_response() -> ClientResponsePayload {
ClientResponsePayload::ThreadFork(ThreadForkResponse {
thread: sample_thread("thread-3"),
model: "gpt-5".to_string(),
model_provider: "openai".to_string(),
service_tier: None,
cwd: test_path_buf("/tmp").abs(),
instruction_sources: Vec::new(),
approval_policy: AppServerAskForApproval::OnFailure,
approvals_reviewer: AppServerApprovalsReviewer::User,
sandbox: AppServerSandboxPolicy::DangerFullAccess,
permission_profile: Some(sample_permission_profile()),
active_permission_profile: None,
reasoning_effort: None,
})
}
fn sample_turn_start_response() -> ClientResponsePayload {
ClientResponsePayload::TurnStart(TurnStartResponse {
turn: Turn {
id: "turn-1".to_string(),
items_view: codex_app_server_protocol::TurnItemsView::Full,
items: Vec::new(),
status: AppServerTurnStatus::InProgress,
error: None,
started_at: None,
completed_at: None,
duration_ms: None,
},
})
}
fn sample_turn_steer_response() -> ClientResponsePayload {
ClientResponsePayload::TurnSteer(TurnSteerResponse {
turn_id: "turn-2".to_string(),
})
}
#[test]
fn track_request_only_enqueues_analytics_relevant_requests() {
let (client, mut receiver) = client_with_receiver();
for (request_id, request) in [
(RequestId::Integer(1), sample_turn_start_request()),
(RequestId::Integer(2), sample_turn_steer_request()),
] {
client.track_request(/*connection_id*/ 7, request_id, &request);
assert!(matches!(
receiver.try_recv(),
Ok(AnalyticsFact::ClientRequest { .. })
));
}
let ignored_request = sample_thread_archive_request();
client.track_request(
/*connection_id*/ 7,
RequestId::Integer(3),
&ignored_request,
);
assert!(matches!(receiver.try_recv(), Err(TryRecvError::Empty)));
}
#[test]
fn track_response_only_enqueues_analytics_relevant_responses() {
let (client, mut receiver) = client_with_receiver();
for (request_id, response) in [
(RequestId::Integer(1), sample_thread_start_response()),
(RequestId::Integer(2), sample_thread_resume_response()),
(RequestId::Integer(3), sample_thread_fork_response()),
(RequestId::Integer(4), sample_turn_start_response()),
(RequestId::Integer(5), sample_turn_steer_response()),
] {
client.track_response(/*connection_id*/ 7, request_id, response);
assert!(matches!(
receiver.try_recv(),
Ok(AnalyticsFact::ClientResponse { .. })
));
}
client.track_response(
/*connection_id*/ 7,
RequestId::Integer(6),
ClientResponsePayload::ThreadArchive(ThreadArchiveResponse {}),
);
assert!(matches!(receiver.try_recv(), Err(TryRecvError::Empty)));
}

View File

@@ -1,3 +1,5 @@
use std::time::Instant;
use crate::facts::AppInvocation;
use crate::facts::CodexCompactionEvent;
use crate::facts::CompactionImplementation;
@@ -16,11 +18,13 @@ use crate::facts::TurnStatus;
use crate::facts::TurnSteerRejectionReason;
use crate::facts::TurnSteerResult;
use crate::facts::TurnSubmissionType;
use crate::now_unix_seconds;
use codex_app_server_protocol::CodexErrorInfo;
use codex_app_server_protocol::CommandExecutionSource;
use codex_login::default_client::originator;
use codex_plugin::PluginTelemetryMetadata;
use codex_protocol::approvals::NetworkApprovalProtocol;
use codex_protocol::models::PermissionProfile;
use codex_protocol::models::AdditionalPermissionProfile;
use codex_protocol::models::SandboxPermissions;
use codex_protocol::protocol::GuardianAssessmentOutcome;
use codex_protocol::protocol::GuardianCommandSource;
@@ -30,6 +34,8 @@ use codex_protocol::protocol::HookEventName;
use codex_protocol::protocol::HookRunStatus;
use codex_protocol::protocol::HookSource;
use codex_protocol::protocol::SubAgentSource;
use codex_protocol::protocol::ThreadSource;
use codex_protocol::protocol::TokenUsage;
use serde::Serialize;
#[derive(Clone, Copy, Debug, Serialize)]
@@ -57,6 +63,13 @@ pub(crate) enum TrackEventRequest {
Compaction(Box<CodexCompactionEventRequest>),
TurnEvent(Box<CodexTurnEventRequest>),
TurnSteer(CodexTurnSteerEventRequest),
CommandExecution(CodexCommandExecutionEventRequest),
FileChange(CodexFileChangeEventRequest),
McpToolCall(CodexMcpToolCallEventRequest),
DynamicToolCall(CodexDynamicToolCallEventRequest),
CollabAgentToolCall(CodexCollabAgentToolCallEventRequest),
WebSearch(CodexWebSearchEventRequest),
ImageGeneration(CodexImageGenerationEventRequest),
PluginUsed(CodexPluginUsedEventRequest),
PluginInstalled(CodexPluginEventRequest),
PluginUninstalled(CodexPluginEventRequest),
@@ -76,8 +89,10 @@ pub(crate) struct SkillInvocationEventRequest {
pub(crate) struct SkillInvocationEventParams {
pub(crate) product_client_id: Option<String>,
pub(crate) skill_scope: Option<String>,
pub(crate) plugin_id: Option<String>,
pub(crate) repo_url: Option<String>,
pub(crate) thread_id: Option<String>,
pub(crate) turn_id: Option<String>,
pub(crate) invoke_type: Option<InvocationType>,
pub(crate) model_slug: Option<String>,
}
@@ -106,7 +121,7 @@ pub(crate) struct ThreadInitializedEventParams {
pub(crate) runtime: CodexRuntimeMetadata,
pub(crate) model: String,
pub(crate) ephemeral: bool,
pub(crate) thread_source: Option<&'static str>,
pub(crate) thread_source: Option<ThreadSource>,
pub(crate) initialization_mode: ThreadInitializationMode,
pub(crate) subagent_source: Option<String>,
pub(crate) parent_thread_id: Option<String>,
@@ -176,17 +191,17 @@ pub enum GuardianApprovalRequestSource {
pub enum GuardianReviewedAction {
Shell {
sandbox_permissions: SandboxPermissions,
additional_permissions: Option<PermissionProfile>,
additional_permissions: Option<AdditionalPermissionProfile>,
},
UnifiedExec {
sandbox_permissions: SandboxPermissions,
additional_permissions: Option<PermissionProfile>,
additional_permissions: Option<AdditionalPermissionProfile>,
tty: bool,
},
Execve {
source: GuardianCommandSource,
program: String,
additional_permissions: Option<PermissionProfile>,
additional_permissions: Option<AdditionalPermissionProfile>,
},
ApplyPatch {},
NetworkAccess {
@@ -200,6 +215,7 @@ pub enum GuardianReviewedAction {
connector_name: Option<String>,
tool_title: Option<String>,
},
RequestPermissions {},
}
#[derive(Clone, Serialize)]
@@ -235,6 +251,142 @@ pub struct GuardianReviewEventParams {
pub total_tokens: Option<i64>,
}
pub struct GuardianReviewTrackContext {
thread_id: String,
turn_id: String,
review_id: String,
target_item_id: Option<String>,
approval_request_source: GuardianApprovalRequestSource,
reviewed_action: GuardianReviewedAction,
review_timeout_ms: u64,
started_at: u64,
started_instant: Instant,
}
impl GuardianReviewTrackContext {
pub fn new(
thread_id: String,
turn_id: String,
review_id: String,
target_item_id: Option<String>,
approval_request_source: GuardianApprovalRequestSource,
reviewed_action: GuardianReviewedAction,
review_timeout_ms: u64,
) -> Self {
Self {
thread_id,
turn_id,
review_id,
target_item_id,
approval_request_source,
reviewed_action,
review_timeout_ms,
started_at: now_unix_seconds(),
started_instant: Instant::now(),
}
}
pub(crate) fn event_params(
&self,
result: GuardianReviewAnalyticsResult,
) -> GuardianReviewEventParams {
GuardianReviewEventParams {
thread_id: self.thread_id.clone(),
turn_id: self.turn_id.clone(),
review_id: self.review_id.clone(),
target_item_id: self.target_item_id.clone(),
approval_request_source: self.approval_request_source,
reviewed_action: self.reviewed_action.clone(),
reviewed_action_truncated: result.reviewed_action_truncated,
decision: result.decision,
terminal_status: result.terminal_status,
failure_reason: result.failure_reason,
risk_level: result.risk_level,
user_authorization: result.user_authorization,
outcome: result.outcome,
guardian_thread_id: result.guardian_thread_id,
guardian_session_kind: result.guardian_session_kind,
guardian_model: result.guardian_model,
guardian_reasoning_effort: result.guardian_reasoning_effort,
had_prior_review_context: result.had_prior_review_context,
review_timeout_ms: self.review_timeout_ms,
// TODO(rhan-oai): plumb nested Guardian review session tool-call counts.
tool_call_count: None,
time_to_first_token_ms: result.time_to_first_token_ms,
completion_latency_ms: Some(self.started_instant.elapsed().as_millis() as u64),
started_at: self.started_at,
completed_at: Some(now_unix_seconds()),
input_tokens: result.token_usage.as_ref().map(|usage| usage.input_tokens),
cached_input_tokens: result
.token_usage
.as_ref()
.map(|usage| usage.cached_input_tokens),
output_tokens: result.token_usage.as_ref().map(|usage| usage.output_tokens),
reasoning_output_tokens: result
.token_usage
.as_ref()
.map(|usage| usage.reasoning_output_tokens),
total_tokens: result.token_usage.as_ref().map(|usage| usage.total_tokens),
}
}
}
#[derive(Debug)]
pub struct GuardianReviewAnalyticsResult {
pub decision: GuardianReviewDecision,
pub terminal_status: GuardianReviewTerminalStatus,
pub failure_reason: Option<GuardianReviewFailureReason>,
pub risk_level: Option<GuardianRiskLevel>,
pub user_authorization: Option<GuardianUserAuthorization>,
pub outcome: Option<GuardianAssessmentOutcome>,
pub guardian_thread_id: Option<String>,
pub guardian_session_kind: Option<GuardianReviewSessionKind>,
pub guardian_model: Option<String>,
pub guardian_reasoning_effort: Option<String>,
pub had_prior_review_context: Option<bool>,
pub reviewed_action_truncated: bool,
pub token_usage: Option<TokenUsage>,
pub time_to_first_token_ms: Option<u64>,
}
impl GuardianReviewAnalyticsResult {
pub fn without_session() -> Self {
Self {
decision: GuardianReviewDecision::Denied,
terminal_status: GuardianReviewTerminalStatus::FailedClosed,
failure_reason: None,
risk_level: None,
user_authorization: None,
outcome: None,
guardian_thread_id: None,
guardian_session_kind: None,
guardian_model: None,
guardian_reasoning_effort: None,
had_prior_review_context: None,
reviewed_action_truncated: false,
token_usage: None,
time_to_first_token_ms: None,
}
}
pub fn from_session(
guardian_thread_id: String,
guardian_session_kind: GuardianReviewSessionKind,
guardian_model: String,
guardian_reasoning_effort: Option<String>,
had_prior_review_context: bool,
) -> Self {
Self {
guardian_thread_id: Some(guardian_thread_id),
guardian_session_kind: Some(guardian_session_kind),
guardian_model: Some(guardian_model),
guardian_reasoning_effort,
had_prior_review_context: Some(had_prior_review_context),
..Self::without_session()
}
}
}
#[derive(Serialize)]
pub(crate) struct GuardianReviewEventPayload {
pub(crate) app_server_client: CodexAppServerClientMetadata,
@@ -243,6 +395,199 @@ pub(crate) struct GuardianReviewEventPayload {
pub(crate) guardian_review: GuardianReviewEventParams,
}
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, Serialize)]
#[serde(rename_all = "snake_case")]
pub(crate) enum ToolItemFinalApprovalOutcome {
Unknown,
NotNeeded,
ConfigAllowed,
PolicyForbidden,
GuardianApproved,
GuardianDenied,
GuardianAborted,
UserApproved,
UserApprovedForSession,
UserDenied,
UserAborted,
}
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, Serialize)]
#[serde(rename_all = "snake_case")]
pub(crate) enum ToolItemTerminalStatus {
Completed,
Failed,
Rejected,
Interrupted,
}
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, Serialize)]
#[serde(rename_all = "snake_case")]
pub(crate) enum ToolItemFailureKind {
ToolError,
ApprovalDenied,
ApprovalAborted,
SandboxDenied,
PolicyForbidden,
}
#[derive(Serialize)]
pub(crate) struct CodexToolItemEventBase {
pub(crate) thread_id: String,
pub(crate) turn_id: String,
/// App-server ThreadItem.id. For tool-originated items this generally
/// corresponds to the originating core call_id.
pub(crate) item_id: String,
pub(crate) app_server_client: CodexAppServerClientMetadata,
pub(crate) runtime: CodexRuntimeMetadata,
pub(crate) thread_source: Option<&'static str>,
pub(crate) subagent_source: Option<String>,
pub(crate) parent_thread_id: Option<String>,
pub(crate) tool_name: String,
pub(crate) started_at_ms: u64,
pub(crate) completed_at_ms: u64,
// Observed item lifecycle duration. This may undercount end-to-end execution
// for tools where app-server only sees part of the upstream flow.
pub(crate) duration_ms: Option<u64>,
pub(crate) execution_duration_ms: Option<u64>,
pub(crate) review_count: u64,
pub(crate) guardian_review_count: u64,
pub(crate) user_review_count: u64,
pub(crate) final_approval_outcome: ToolItemFinalApprovalOutcome,
pub(crate) terminal_status: ToolItemTerminalStatus,
pub(crate) failure_kind: Option<ToolItemFailureKind>,
pub(crate) requested_additional_permissions: bool,
pub(crate) requested_network_access: bool,
}
#[derive(Clone, Copy, Debug, Serialize)]
#[serde(rename_all = "snake_case")]
pub(crate) enum WebSearchActionKind {
Search,
OpenPage,
FindInPage,
Other,
}
#[derive(Serialize)]
pub(crate) struct CodexCommandExecutionEventParams {
#[serde(flatten)]
pub(crate) base: CodexToolItemEventBase,
pub(crate) command_execution_source: CommandExecutionSource,
pub(crate) exit_code: Option<i32>,
pub(crate) command_total_action_count: u64,
pub(crate) command_read_action_count: u64,
pub(crate) command_list_files_action_count: u64,
pub(crate) command_search_action_count: u64,
pub(crate) command_unknown_action_count: u64,
}
#[derive(Serialize)]
pub(crate) struct CodexCommandExecutionEventRequest {
pub(crate) event_type: &'static str,
pub(crate) event_params: CodexCommandExecutionEventParams,
}
#[derive(Serialize)]
pub(crate) struct CodexFileChangeEventParams {
#[serde(flatten)]
pub(crate) base: CodexToolItemEventBase,
pub(crate) file_change_count: u64,
pub(crate) file_add_count: u64,
pub(crate) file_update_count: u64,
pub(crate) file_delete_count: u64,
pub(crate) file_move_count: u64,
}
#[derive(Serialize)]
pub(crate) struct CodexFileChangeEventRequest {
pub(crate) event_type: &'static str,
pub(crate) event_params: CodexFileChangeEventParams,
}
#[derive(Serialize)]
pub(crate) struct CodexMcpToolCallEventParams {
#[serde(flatten)]
pub(crate) base: CodexToolItemEventBase,
pub(crate) mcp_server_name: String,
pub(crate) mcp_tool_name: String,
pub(crate) mcp_error_present: bool,
}
#[derive(Serialize)]
pub(crate) struct CodexMcpToolCallEventRequest {
pub(crate) event_type: &'static str,
pub(crate) event_params: CodexMcpToolCallEventParams,
}
#[derive(Serialize)]
pub(crate) struct CodexDynamicToolCallEventParams {
#[serde(flatten)]
pub(crate) base: CodexToolItemEventBase,
pub(crate) dynamic_tool_name: String,
pub(crate) success: Option<bool>,
pub(crate) output_content_item_count: Option<u64>,
pub(crate) output_text_item_count: Option<u64>,
pub(crate) output_image_item_count: Option<u64>,
}
#[derive(Serialize)]
pub(crate) struct CodexDynamicToolCallEventRequest {
pub(crate) event_type: &'static str,
pub(crate) event_params: CodexDynamicToolCallEventParams,
}
#[derive(Serialize)]
pub(crate) struct CodexCollabAgentToolCallEventParams {
#[serde(flatten)]
pub(crate) base: CodexToolItemEventBase,
pub(crate) sender_thread_id: String,
pub(crate) receiver_thread_count: u64,
pub(crate) receiver_thread_ids: Option<Vec<String>>,
pub(crate) requested_model: Option<String>,
pub(crate) requested_reasoning_effort: Option<String>,
pub(crate) agent_state_count: Option<u64>,
pub(crate) completed_agent_count: Option<u64>,
pub(crate) failed_agent_count: Option<u64>,
}
#[derive(Serialize)]
pub(crate) struct CodexCollabAgentToolCallEventRequest {
pub(crate) event_type: &'static str,
pub(crate) event_params: CodexCollabAgentToolCallEventParams,
}
#[derive(Serialize)]
pub(crate) struct CodexWebSearchEventParams {
#[serde(flatten)]
pub(crate) base: CodexToolItemEventBase,
pub(crate) web_search_action: Option<WebSearchActionKind>,
pub(crate) query_present: bool,
pub(crate) query_count: Option<u64>,
}
#[derive(Serialize)]
pub(crate) struct CodexWebSearchEventRequest {
pub(crate) event_type: &'static str,
pub(crate) event_params: CodexWebSearchEventParams,
}
#[derive(Serialize)]
pub(crate) struct CodexImageGenerationEventParams {
#[serde(flatten)]
pub(crate) base: CodexToolItemEventBase,
pub(crate) revised_prompt_present: bool,
pub(crate) saved_path_present: bool,
}
#[derive(Serialize)]
pub(crate) struct CodexImageGenerationEventRequest {
pub(crate) event_type: &'static str,
pub(crate) event_params: CodexImageGenerationEventParams,
}
#[derive(Serialize)]
pub(crate) struct CodexAppMetadata {
pub(crate) connector_id: Option<String>,
@@ -288,7 +633,7 @@ pub(crate) struct CodexCompactionEventParams {
pub(crate) turn_id: String,
pub(crate) app_server_client: CodexAppServerClientMetadata,
pub(crate) runtime: CodexRuntimeMetadata,
pub(crate) thread_source: Option<&'static str>,
pub(crate) thread_source: Option<ThreadSource>,
pub(crate) subagent_source: Option<String>,
pub(crate) parent_thread_id: Option<String>,
pub(crate) trigger: CompactionTrigger,
@@ -321,7 +666,7 @@ pub(crate) struct CodexTurnEventParams {
pub(crate) app_server_client: CodexAppServerClientMetadata,
pub(crate) runtime: CodexRuntimeMetadata,
pub(crate) ephemeral: bool,
pub(crate) thread_source: Option<String>,
pub(crate) thread_source: Option<ThreadSource>,
pub(crate) initialization_mode: ThreadInitializationMode,
pub(crate) subagent_source: Option<String>,
pub(crate) parent_thread_id: Option<String>,
@@ -374,7 +719,7 @@ pub(crate) struct CodexTurnSteerEventParams {
pub(crate) accepted_turn_id: Option<String>,
pub(crate) app_server_client: CodexAppServerClientMetadata,
pub(crate) runtime: CodexRuntimeMetadata,
pub(crate) thread_source: Option<String>,
pub(crate) thread_source: Option<ThreadSource>,
pub(crate) subagent_source: Option<String>,
pub(crate) parent_thread_id: Option<String>,
pub(crate) num_input_images: usize,
@@ -446,11 +791,16 @@ pub(crate) fn codex_app_metadata(
}
pub(crate) fn codex_plugin_metadata(plugin: PluginTelemetryMetadata) -> CodexPluginMetadata {
let capability_summary = plugin.capability_summary;
let PluginTelemetryMetadata {
plugin_id,
remote_plugin_id,
capability_summary,
} = plugin;
let event_plugin_id = remote_plugin_id.unwrap_or_else(|| plugin_id.as_key());
CodexPluginMetadata {
plugin_id: Some(plugin.plugin_id.as_key()),
plugin_name: Some(plugin.plugin_id.plugin_name),
marketplace_name: Some(plugin.plugin_id.marketplace_name),
plugin_id: Some(event_plugin_id),
plugin_name: Some(plugin_id.plugin_name),
marketplace_name: Some(plugin_id.marketplace_name),
has_skills: capability_summary
.as_ref()
.map(|summary| summary.has_skills),
@@ -472,7 +822,7 @@ pub(crate) fn codex_compaction_event_params(
input: CodexCompactionEvent,
app_server_client: CodexAppServerClientMetadata,
runtime: CodexRuntimeMetadata,
thread_source: Option<&'static str>,
thread_source: Option<ThreadSource>,
subagent_source: Option<String>,
parent_thread_id: Option<String>,
) -> CodexCompactionEventParams {
@@ -543,6 +893,8 @@ fn analytics_hook_source(source: HookSource) -> &'static str {
HookSource::Project => "project",
HookSource::Mdm => "mdm",
HookSource::SessionFlags => "session_flags",
HookSource::Plugin => "plugin",
HookSource::CloudRequirements => "cloud_requirements",
HookSource::LegacyManagedConfigFile => "legacy_managed_config_file",
HookSource::LegacyManagedConfigMdm => "legacy_managed_config_mdm",
HookSource::Unknown => "unknown",
@@ -574,7 +926,7 @@ pub(crate) fn subagent_thread_started_event_request(
runtime: current_runtime_metadata(),
model: input.model,
ephemeral: input.ephemeral,
thread_source: Some("subagent"),
thread_source: Some(ThreadSource::Subagent),
initialization_mode: ThreadInitializationMode::New,
subagent_source: Some(subagent_source_name(&input.subagent_source)),
parent_thread_id: input

View File

@@ -2,23 +2,25 @@ use crate::events::AppServerRpcTransport;
use crate::events::CodexRuntimeMetadata;
use crate::events::GuardianReviewEventParams;
use codex_app_server_protocol::ClientRequest;
use codex_app_server_protocol::ClientResponse;
use codex_app_server_protocol::ClientResponsePayload;
use codex_app_server_protocol::InitializeParams;
use codex_app_server_protocol::JSONRPCErrorError;
use codex_app_server_protocol::RequestId;
use codex_app_server_protocol::ServerNotification;
use codex_app_server_protocol::ServerRequest;
use codex_app_server_protocol::ServerResponse;
use codex_plugin::PluginTelemetryMetadata;
use codex_protocol::config_types::ApprovalsReviewer;
use codex_protocol::config_types::ModeKind;
use codex_protocol::config_types::Personality;
use codex_protocol::config_types::ReasoningSummary;
use codex_protocol::config_types::ServiceTier;
use codex_protocol::models::PermissionProfile;
use codex_protocol::openai_models::ReasoningEffort;
use codex_protocol::protocol::AskForApproval;
use codex_protocol::protocol::HookEventName;
use codex_protocol::protocol::HookRunStatus;
use codex_protocol::protocol::HookSource;
use codex_protocol::protocol::SandboxPolicy;
use codex_protocol::protocol::SessionSource;
use codex_protocol::protocol::SkillScope;
use codex_protocol::protocol::SubAgentSource;
@@ -62,7 +64,8 @@ pub struct TurnResolvedConfigFact {
pub session_source: SessionSource,
pub model: String,
pub model_provider: String,
pub sandbox_policy: SandboxPolicy,
pub permission_profile: PermissionProfile,
pub permission_profile_cwd: PathBuf,
pub reasoning_effort: Option<ReasoningEffort>,
pub reasoning_summary: Option<ReasoningSummary>,
pub service_tier: Option<ServiceTier>,
@@ -170,6 +173,7 @@ pub struct SkillInvocation {
pub skill_name: String,
pub skill_scope: SkillScope,
pub skill_path: PathBuf,
pub plugin_id: Option<String>,
pub invocation_type: InvocationType,
}
@@ -271,14 +275,15 @@ pub(crate) enum AnalyticsFact {
runtime: CodexRuntimeMetadata,
rpc_transport: AppServerRpcTransport,
},
Request {
ClientRequest {
connection_id: u64,
request_id: RequestId,
request: Box<ClientRequest>,
},
Response {
ClientResponse {
connection_id: u64,
response: Box<ClientResponse>,
request_id: RequestId,
response: Box<ClientResponsePayload>,
},
ErrorResponse {
connection_id: u64,
@@ -286,6 +291,13 @@ pub(crate) enum AnalyticsFact {
error: JSONRPCErrorError,
error_type: Option<AnalyticsJsonRpcError>,
},
ServerRequest {
connection_id: u64,
request: Box<ServerRequest>,
},
ServerResponse {
response: Box<ServerResponse>,
},
Notification(Box<ServerNotification>),
// Facts that do not naturally exist on the app-server protocol surface, or
// would require non-trivial protocol reshaping on this branch.

View File

@@ -9,11 +9,13 @@ use std::time::UNIX_EPOCH;
pub use client::AnalyticsEventsClient;
pub use events::AppServerRpcTransport;
pub use events::GuardianApprovalRequestSource;
pub use events::GuardianReviewAnalyticsResult;
pub use events::GuardianReviewDecision;
pub use events::GuardianReviewEventParams;
pub use events::GuardianReviewFailureReason;
pub use events::GuardianReviewSessionKind;
pub use events::GuardianReviewTerminalStatus;
pub use events::GuardianReviewTrackContext;
pub use events::GuardianReviewedAction;
pub use facts::AnalyticsJsonRpcError;
pub use facts::AppInvocation;
@@ -49,3 +51,27 @@ pub fn now_unix_seconds() -> u64 {
.unwrap_or_default()
.as_secs()
}
pub fn now_unix_millis() -> u64 {
u64::try_from(
SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap_or_default()
.as_millis(),
)
.unwrap_or(u64::MAX)
}
pub(crate) fn serialize_enum_as_string<T: serde::Serialize>(value: &T) -> Option<String> {
serde_json::to_value(value)
.ok()
.and_then(|value| value.as_str().map(str::to_string))
}
pub(crate) fn usize_to_u64(value: usize) -> u64 {
u64::try_from(value).unwrap_or(u64::MAX)
}
pub(crate) fn option_i64_to_u64(value: Option<i64>) -> Option<u64> {
value.and_then(|value| u64::try_from(value).ok())
}

File diff suppressed because it is too large Load Diff

View File

@@ -33,4 +33,5 @@ url = { workspace = true }
[dev-dependencies]
pretty_assertions = { workspace = true }
serde_json = { workspace = true }
tempfile = { workspace = true }
tokio = { workspace = true, features = ["macros", "rt-multi-thread"] }

View File

@@ -41,10 +41,13 @@ use codex_app_server_protocol::Result as JsonRpcResult;
use codex_app_server_protocol::ServerNotification;
use codex_app_server_protocol::ServerRequest;
use codex_arg0::Arg0DispatchPaths;
use codex_config::CloudRequirementsLoader;
use codex_config::LoaderOverrides;
use codex_config::NoopThreadConfigLoader;
use codex_config::RemoteThreadConfigLoader;
use codex_config::ThreadConfigLoader;
pub use codex_core::StateDbHandle;
use codex_core::config::Config;
use codex_core::config_loader::CloudRequirementsLoader;
use codex_core::config_loader::LoaderOverrides;
pub use codex_exec_server::EnvironmentManager;
pub use codex_exec_server::EnvironmentManagerArgs;
pub use codex_exec_server::ExecServerRuntimePaths;
@@ -69,12 +72,9 @@ pub mod legacy_core {
pub use codex_core::DEFAULT_AGENTS_MD_FILENAME;
pub use codex_core::LOCAL_AGENTS_MD_FILENAME;
pub use codex_core::McpManager;
pub use codex_core::append_message_history_entry;
pub use codex_core::check_execpolicy_for_warnings;
pub use codex_core::format_exec_policy_error_with_source;
pub use codex_core::grant_read_root_non_elevated;
pub use codex_core::lookup_message_history_entry;
pub use codex_core::message_history_metadata;
pub use codex_core::web_search_detail;
pub mod config {
@@ -97,10 +97,6 @@ pub mod legacy_core {
pub use codex_core::personality_migration::*;
}
pub mod plugins {
pub use codex_core::plugins::*;
}
pub mod review_format {
pub use codex_core::review_format::*;
}
@@ -302,7 +298,15 @@ impl fmt::Display for TypedRequestError {
write!(f, "{method} transport error: {source}")
}
Self::Server { method, source } => {
write!(f, "{method} failed: {}", source.message)
write!(
f,
"{method} failed: {} (code {})",
source.message, source.code
)?;
if let Some(data) = source.data.as_ref() {
write!(f, ", data: {data}")?;
}
Ok(())
}
Self::Deserialize { method, source } => {
write!(f, "{method} response decode error: {source}")
@@ -337,6 +341,8 @@ pub struct InProcessClientStartArgs {
pub feedback: CodexFeedback,
/// SQLite tracing layer used to flush recently emitted logs before feedback upload.
pub log_db: Option<LogDbLayer>,
/// Process-wide SQLite state handle shared with the embedded app-server.
pub state_db: Option<StateDbHandle>,
/// Environment manager used by core execution and filesystem operations.
pub environment_manager: Arc<EnvironmentManager>,
/// Startup warnings emitted after initialize succeeds.
@@ -357,6 +363,13 @@ pub struct InProcessClientStartArgs {
pub channel_capacity: usize,
}
fn configured_thread_config_loader(config: &Config) -> Arc<dyn ThreadConfigLoader> {
match config.experimental_thread_config_endpoint.as_deref() {
Some(endpoint) => Arc::new(RemoteThreadConfigLoader::new(endpoint)),
None => Arc::new(NoopThreadConfigLoader),
}
}
impl InProcessClientStartArgs {
/// Builds initialize params from caller-provided metadata.
pub fn initialize_params(&self) -> InitializeParams {
@@ -381,15 +394,17 @@ impl InProcessClientStartArgs {
fn into_runtime_start_args(self) -> InProcessStartArgs {
let initialize = self.initialize_params();
let thread_config_loader = configured_thread_config_loader(&self.config);
InProcessStartArgs {
arg0_paths: self.arg0_paths,
config: self.config,
cli_overrides: self.cli_overrides,
loader_overrides: self.loader_overrides,
cloud_requirements: self.cloud_requirements,
thread_config_loader: Arc::new(NoopThreadConfigLoader),
thread_config_loader,
feedback: self.feedback,
log_db: self.log_db,
state_db: self.state_db,
environment_manager: self.environment_manager,
config_warnings: self.config_warnings,
session_source: self.session_source,
@@ -936,9 +951,13 @@ mod tests {
use codex_app_server_protocol::ToolRequestUserInputParams;
use codex_app_server_protocol::ToolRequestUserInputQuestion;
use codex_core::config::ConfigBuilder;
use codex_core::init_state_db_from_config;
use futures::SinkExt;
use futures::StreamExt;
use pretty_assertions::assert_eq;
use std::ops::Deref;
use std::path::Path;
use tempfile::TempDir;
use tokio::net::TcpListener;
use tokio::time::Duration;
use tokio::time::timeout;
@@ -957,18 +976,59 @@ mod tests {
}
}
async fn build_test_config_for_codex_home(codex_home: &Path) -> Config {
match ConfigBuilder::default()
.codex_home(codex_home.to_path_buf())
.build()
.await
{
Ok(config) => config,
Err(_) => Config::load_default_with_cli_overrides_for_codex_home(
codex_home.to_path_buf(),
Vec::new(),
)
.await
.expect("default config should load"),
}
}
struct TestClient {
_codex_home: TempDir,
client: InProcessAppServerClient,
}
impl Deref for TestClient {
type Target = InProcessAppServerClient;
fn deref(&self) -> &Self::Target {
&self.client
}
}
impl TestClient {
async fn shutdown(self) -> IoResult<()> {
self.client.shutdown().await
}
}
async fn start_test_client_with_capacity(
session_source: SessionSource,
channel_capacity: usize,
) -> InProcessAppServerClient {
InProcessAppServerClient::start(InProcessClientStartArgs {
) -> TestClient {
let codex_home = TempDir::new().expect("temp dir");
let config = Arc::new(build_test_config_for_codex_home(codex_home.path()).await);
let state_db = init_state_db_from_config(config.as_ref())
.await
.expect("state db should initialize for in-process test");
let client = InProcessAppServerClient::start(InProcessClientStartArgs {
arg0_paths: Arg0DispatchPaths::default(),
config: Arc::new(build_test_config().await),
config,
cli_overrides: Vec::new(),
loader_overrides: LoaderOverrides::default(),
cloud_requirements: CloudRequirementsLoader::default(),
feedback: CodexFeedback::new(),
log_db: None,
state_db: Some(state_db),
environment_manager: Arc::new(EnvironmentManager::default_for_tests()),
config_warnings: Vec::new(),
session_source,
@@ -980,10 +1040,15 @@ mod tests {
channel_capacity,
})
.await
.expect("in-process app-server client should start")
.expect("in-process app-server client should start");
TestClient {
_codex_home: codex_home,
client,
}
}
async fn start_test_client(session_source: SessionSource) -> InProcessAppServerClient {
async fn start_test_client(session_source: SessionSource) -> TestClient {
start_test_client_with_capacity(session_source, DEFAULT_IN_PROCESS_CHANNEL_CAPACITY).await
}
@@ -1120,6 +1185,7 @@ mod tests {
ServerNotification::ItemCompleted(codex_app_server_protocol::ItemCompletedNotification {
thread_id: "thread".to_string(),
turn_id: "turn".to_string(),
completed_at_ms: 0,
item: codex_app_server_protocol::ThreadItem::AgentMessage {
id: "item".to_string(),
text: text.to_string(),
@@ -1134,6 +1200,7 @@ mod tests {
thread_id: "thread".to_string(),
turn: codex_app_server_protocol::Turn {
id: "turn".to_string(),
items_view: codex_app_server_protocol::TurnItemsView::Full,
items: Vec::new(),
status: codex_app_server_protocol::TurnStatus::Completed,
error: None,
@@ -1386,6 +1453,55 @@ mod tests {
client.shutdown().await.expect("shutdown should complete");
}
#[tokio::test]
async fn remote_typed_request_accepts_large_single_frame_response() {
let padding = "x".repeat((17 << 20) + 1024);
let websocket_url = start_test_remote_server(move |mut websocket| async move {
expect_remote_initialize(&mut websocket).await;
let JSONRPCMessage::Request(request) = read_websocket_message(&mut websocket).await
else {
panic!("expected account/read request");
};
assert_eq!(request.method, "account/read");
write_websocket_message(
&mut websocket,
JSONRPCMessage::Response(JSONRPCResponse {
id: request.id,
result: serde_json::json!({
"account": null,
"requiresOpenaiAuth": false,
"padding": padding,
}),
}),
)
.await;
websocket.close(None).await.expect("close should succeed");
})
.await;
let client = RemoteAppServerClient::connect(test_remote_connect_args(websocket_url))
.await
.expect("remote client should connect");
let response: GetAccountResponse = client
.request_typed(ClientRequest::GetAccount {
request_id: RequestId::Integer(1),
params: codex_app_server_protocol::GetAccountParams {
refresh_token: false,
},
})
.await
.expect("large typed request should succeed");
assert_eq!(
response,
GetAccountResponse {
account: None,
requires_openai_auth: false,
}
);
client.shutdown().await.expect("shutdown should complete");
}
#[tokio::test]
async fn remote_connect_includes_auth_header_when_configured() {
let auth_token = "remote-bearer-token".to_string();
@@ -1860,11 +1976,15 @@ mod tests {
method: "thread/read".to_string(),
source: JSONRPCErrorError {
code: -32603,
data: None,
data: Some(serde_json::json!({"detail": "config lock mismatch"})),
message: "internal".to_string(),
},
};
assert_eq!(std::error::Error::source(&server).is_some(), false);
assert_eq!(
server.to_string(),
"thread/read failed: internal (code -32603), data: {\"detail\":\"config lock mismatch\"}"
);
let deserialize = TypedRequestError::Deserialize {
method: "thread/start".to_string(),
@@ -1911,6 +2031,7 @@ mod tests {
thread_id: "thread".to_string(),
turn: codex_app_server_protocol::Turn {
id: "turn".to_string(),
items_view: codex_app_server_protocol::TurnItemsView::Full,
items: Vec::new(),
status: codex_app_server_protocol::TurnStatus::Completed,
error: None,
@@ -1940,6 +2061,7 @@ mod tests {
codex_app_server_protocol::ItemCompletedNotification {
thread_id: "thread".to_string(),
turn_id: "turn".to_string(),
completed_at_ms: 0,
item: codex_app_server_protocol::ThreadItem::AgentMessage {
id: "item".to_string(),
text: "hello".to_string(),
@@ -1970,14 +2092,17 @@ mod tests {
#[tokio::test]
async fn runtime_start_args_forward_environment_manager() {
let config = Arc::new(build_test_config().await);
let environment_manager = Arc::new(EnvironmentManager::new(EnvironmentManagerArgs {
exec_server_url: Some("ws://127.0.0.1:8765".to_string()),
local_runtime_paths: ExecServerRuntimePaths::new(
std::env::current_exe().expect("current exe"),
/*codex_linux_sandbox_exe*/ None,
let environment_manager = Arc::new(
EnvironmentManager::create_for_tests(
Some("ws://127.0.0.1:8765".to_string()),
ExecServerRuntimePaths::new(
std::env::current_exe().expect("current exe"),
/*codex_linux_sandbox_exe*/ None,
)
.expect("runtime paths"),
)
.expect("runtime paths"),
}));
.await,
);
let runtime_args = InProcessClientStartArgs {
arg0_paths: Arg0DispatchPaths::default(),
@@ -1987,6 +2112,7 @@ mod tests {
cloud_requirements: CloudRequirementsLoader::default(),
feedback: CodexFeedback::new(),
log_db: None,
state_db: None,
environment_manager: environment_manager.clone(),
config_warnings: Vec::new(),
session_source: SessionSource::Exec,
@@ -2013,6 +2139,43 @@ mod tests {
);
}
#[tokio::test]
async fn runtime_start_args_use_remote_thread_config_loader_when_configured() {
let mut config = build_test_config().await;
config.experimental_thread_config_endpoint = Some("not-a-valid-endpoint".to_string());
let runtime_args = InProcessClientStartArgs {
arg0_paths: Arg0DispatchPaths::default(),
config: Arc::new(config),
cli_overrides: Vec::new(),
loader_overrides: LoaderOverrides::default(),
cloud_requirements: CloudRequirementsLoader::default(),
feedback: CodexFeedback::new(),
log_db: None,
state_db: None,
environment_manager: Arc::new(EnvironmentManager::default_for_tests()),
config_warnings: Vec::new(),
session_source: SessionSource::Exec,
enable_codex_api_key_env: false,
client_name: "codex-app-server-client-test".to_string(),
client_version: "0.0.0-test".to_string(),
experimental_api: true,
opt_out_notification_methods: Vec::new(),
channel_capacity: DEFAULT_IN_PROCESS_CHANNEL_CAPACITY,
}
.into_runtime_start_args();
let err = runtime_args
.thread_config_loader
.load(Default::default())
.await
.expect_err("configured remote loader should try to connect");
assert_eq!(
err.code(),
codex_config::ThreadConfigLoadErrorCode::RequestFailed
);
}
#[tokio::test]
async fn shutdown_completes_promptly_without_retained_managers() {
let client = start_test_client(SessionSource::Cli).await;

View File

@@ -20,7 +20,6 @@ use crate::RequestResult;
use crate::SHUTDOWN_TIMEOUT;
use crate::TypedRequestError;
use crate::request_method_name;
use crate::server_notification_requires_delivery;
use codex_app_server_protocol::ClientInfo;
use codex_app_server_protocol::ClientNotification;
use codex_app_server_protocol::ClientRequest;
@@ -46,16 +45,18 @@ use tokio::sync::oneshot;
use tokio::time::timeout;
use tokio_tungstenite::MaybeTlsStream;
use tokio_tungstenite::WebSocketStream;
use tokio_tungstenite::connect_async;
use tokio_tungstenite::connect_async_with_config;
use tokio_tungstenite::tungstenite::Message;
use tokio_tungstenite::tungstenite::client::IntoClientRequest;
use tokio_tungstenite::tungstenite::http::HeaderValue;
use tokio_tungstenite::tungstenite::http::header::AUTHORIZATION;
use tokio_tungstenite::tungstenite::protocol::WebSocketConfig;
use tracing::warn;
use url::Url;
const CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
const INITIALIZE_TIMEOUT: Duration = Duration::from_secs(10);
const REMOTE_APP_SERVER_MAX_WEBSOCKET_MESSAGE_SIZE: usize = 128 << 20;
#[derive(Debug, Clone)]
pub struct RemoteAppServerConnectArgs {
@@ -126,7 +127,7 @@ enum RemoteClientCommand {
pub struct RemoteAppServerClient {
command_tx: mpsc::Sender<RemoteClientCommand>,
event_rx: mpsc::Receiver<AppServerEvent>,
event_rx: mpsc::UnboundedReceiver<AppServerEvent>,
pending_events: VecDeque<AppServerEvent>,
worker_handle: tokio::task::JoinHandle<()>,
}
@@ -171,20 +172,32 @@ impl RemoteAppServerClient {
request.headers_mut().insert(AUTHORIZATION, header_value);
}
ensure_rustls_crypto_provider();
let stream = timeout(CONNECT_TIMEOUT, connect_async(request))
.await
.map_err(|_| {
IoError::new(
ErrorKind::TimedOut,
format!("timed out connecting to remote app server at `{websocket_url}`"),
)
})?
.map(|(stream, _response)| stream)
.map_err(|err| {
IoError::other(format!(
"failed to connect to remote app server at `{websocket_url}`: {err}"
))
})?;
// Remote resume responses can legitimately carry large thread histories.
// Keep a bounded cap, but raise it above tungstenite's 16 MiB frame default.
let websocket_config = WebSocketConfig::default()
.max_frame_size(Some(REMOTE_APP_SERVER_MAX_WEBSOCKET_MESSAGE_SIZE))
.max_message_size(Some(REMOTE_APP_SERVER_MAX_WEBSOCKET_MESSAGE_SIZE));
let stream = timeout(
CONNECT_TIMEOUT,
connect_async_with_config(
request,
Some(websocket_config),
/*disable_nagle*/ false,
),
)
.await
.map_err(|_| {
IoError::new(
ErrorKind::TimedOut,
format!("timed out connecting to remote app server at `{websocket_url}`"),
)
})?
.map(|(stream, _response)| stream)
.map_err(|err| {
IoError::other(format!(
"failed to connect to remote app server at `{websocket_url}`: {err}"
))
})?;
let mut stream = stream;
let pending_events = initialize_remote_connection(
&mut stream,
@@ -195,11 +208,11 @@ impl RemoteAppServerClient {
.await?;
let (command_tx, mut command_rx) = mpsc::channel::<RemoteClientCommand>(channel_capacity);
let (event_tx, event_rx) = mpsc::channel::<AppServerEvent>(channel_capacity);
let (event_tx, event_rx) = mpsc::unbounded_channel::<AppServerEvent>();
let worker_handle = tokio::spawn(async move {
let mut pending_requests =
HashMap::<RequestId, oneshot::Sender<IoResult<RequestResult>>>::new();
let mut skipped_events = 0usize;
let mut worker_exit_error: Option<(ErrorKind, String)> = None;
loop {
tokio::select! {
command = command_rx.recv() => {
@@ -226,20 +239,19 @@ impl RemoteAppServerClient {
.await
{
let err_message = err.to_string();
let message = format!(
"remote app server at `{websocket_url}` write failed: {err_message}"
);
if let Some(response_tx) = pending_requests.remove(&request_id) {
let _ = response_tx.send(Err(err));
}
let _ = deliver_event(
&event_tx,
&mut skipped_events,
AppServerEvent::Disconnected {
message: format!(
"remote app server at `{websocket_url}` write failed: {err_message}"
),
message: message.clone(),
},
&mut stream,
)
.await;
);
worker_exit_error = Some((ErrorKind::BrokenPipe, message));
break;
}
}
@@ -316,11 +328,8 @@ impl RemoteAppServerClient {
app_server_event_from_notification(notification)
&& let Err(err) = deliver_event(
&event_tx,
&mut skipped_events,
event,
&mut stream,
)
.await
{
warn!(%err, "failed to deliver remote app-server event");
break;
@@ -333,11 +342,8 @@ impl RemoteAppServerClient {
Ok(request) => {
if let Err(err) = deliver_event(
&event_tx,
&mut skipped_events,
AppServerEvent::ServerRequest(request),
&mut stream,
)
.await
{
warn!(%err, "failed to deliver remote app-server server request");
break;
@@ -362,34 +368,34 @@ impl RemoteAppServerClient {
.await
{
let err_message = reject_err.to_string();
let message = format!(
"remote app server at `{websocket_url}` write failed: {err_message}"
);
let _ = deliver_event(
&event_tx,
&mut skipped_events,
AppServerEvent::Disconnected {
message: format!(
"remote app server at `{websocket_url}` write failed: {err_message}"
),
message: message.clone(),
},
&mut stream,
)
.await;
);
worker_exit_error =
Some((ErrorKind::BrokenPipe, message));
break;
}
}
}
}
Err(err) => {
let message = format!(
"remote app server at `{websocket_url}` sent invalid JSON-RPC: {err}"
);
let _ = deliver_event(
&event_tx,
&mut skipped_events,
AppServerEvent::Disconnected {
message: format!(
"remote app server at `{websocket_url}` sent invalid JSON-RPC: {err}"
),
message: message.clone(),
},
&mut stream,
)
.await;
);
worker_exit_error =
Some((ErrorKind::InvalidData, message));
break;
}
}
@@ -400,17 +406,19 @@ impl RemoteAppServerClient {
.map(|frame| frame.reason.to_string())
.filter(|reason| !reason.is_empty())
.unwrap_or_else(|| "connection closed".to_string());
let message = format!(
"remote app server at `{websocket_url}` disconnected: {reason}"
);
let _ = deliver_event(
&event_tx,
&mut skipped_events,
AppServerEvent::Disconnected {
message: format!(
"remote app server at `{websocket_url}` disconnected: {reason}"
),
message: message.clone(),
},
&mut stream,
)
.await;
);
worker_exit_error = Some((
ErrorKind::ConnectionAborted,
message,
));
break;
}
Some(Ok(Message::Binary(_)))
@@ -418,31 +426,29 @@ impl RemoteAppServerClient {
| Some(Ok(Message::Pong(_)))
| Some(Ok(Message::Frame(_))) => {}
Some(Err(err)) => {
let message = format!(
"remote app server at `{websocket_url}` transport failed: {err}"
);
let _ = deliver_event(
&event_tx,
&mut skipped_events,
AppServerEvent::Disconnected {
message: format!(
"remote app server at `{websocket_url}` transport failed: {err}"
),
message: message.clone(),
},
&mut stream,
)
.await;
);
worker_exit_error = Some((ErrorKind::InvalidData, message));
break;
}
None => {
let message = format!(
"remote app server at `{websocket_url}` closed the connection"
);
let _ = deliver_event(
&event_tx,
&mut skipped_events,
AppServerEvent::Disconnected {
message: format!(
"remote app server at `{websocket_url}` closed the connection"
),
message: message.clone(),
},
&mut stream,
)
.await;
);
worker_exit_error = Some((ErrorKind::UnexpectedEof, message));
break;
}
}
@@ -450,12 +456,14 @@ impl RemoteAppServerClient {
}
}
let err = IoError::new(
ErrorKind::BrokenPipe,
"remote app-server worker channel is closed",
);
let (err_kind, err_message) = worker_exit_error.unwrap_or_else(|| {
(
ErrorKind::BrokenPipe,
"remote app-server worker channel is closed".to_string(),
)
});
for (_, response_tx) in pending_requests {
let _ = response_tx.send(Err(IoError::new(err.kind(), err.to_string())));
let _ = response_tx.send(Err(IoError::new(err_kind, err_message.clone())));
}
});
@@ -801,100 +809,16 @@ fn app_server_event_from_notification(notification: JSONRPCNotification) -> Opti
}
}
async fn deliver_event(
event_tx: &mpsc::Sender<AppServerEvent>,
skipped_events: &mut usize,
fn deliver_event(
event_tx: &mpsc::UnboundedSender<AppServerEvent>,
event: AppServerEvent,
stream: &mut WebSocketStream<MaybeTlsStream<TcpStream>>,
) -> IoResult<()> {
if *skipped_events > 0 {
if event_requires_delivery(&event) {
if event_tx
.send(AppServerEvent::Lagged {
skipped: *skipped_events,
})
.await
.is_err()
{
return Err(IoError::new(
ErrorKind::BrokenPipe,
"remote app-server event consumer channel is closed",
));
}
*skipped_events = 0;
} else {
match event_tx.try_send(AppServerEvent::Lagged {
skipped: *skipped_events,
}) {
Ok(()) => *skipped_events = 0,
Err(mpsc::error::TrySendError::Full(_)) => {
*skipped_events = (*skipped_events).saturating_add(1);
reject_if_server_request_dropped(stream, &event).await?;
return Ok(());
}
Err(mpsc::error::TrySendError::Closed(_)) => {
return Err(IoError::new(
ErrorKind::BrokenPipe,
"remote app-server event consumer channel is closed",
));
}
}
}
}
if event_requires_delivery(&event) {
event_tx.send(event).await.map_err(|_| {
IoError::new(
ErrorKind::BrokenPipe,
"remote app-server event consumer channel is closed",
)
})?;
return Ok(());
}
match event_tx.try_send(event) {
Ok(()) => Ok(()),
Err(mpsc::error::TrySendError::Full(event)) => {
*skipped_events = (*skipped_events).saturating_add(1);
reject_if_server_request_dropped(stream, &event).await
}
Err(mpsc::error::TrySendError::Closed(_)) => Err(IoError::new(
event_tx.send(event).map_err(|_| {
IoError::new(
ErrorKind::BrokenPipe,
"remote app-server event consumer channel is closed",
)),
}
}
async fn reject_if_server_request_dropped(
stream: &mut WebSocketStream<MaybeTlsStream<TcpStream>>,
event: &AppServerEvent,
) -> IoResult<()> {
let AppServerEvent::ServerRequest(request) = event else {
return Ok(());
};
write_jsonrpc_message(
stream,
JSONRPCMessage::Error(JSONRPCError {
error: JSONRPCErrorError {
code: -32001,
message: "remote app-server event queue is full".to_string(),
data: None,
},
id: request.id().clone(),
}),
"<remote-app-server>",
)
.await
}
fn event_requires_delivery(event: &AppServerEvent) -> bool {
match event {
AppServerEvent::ServerNotification(notification) => {
server_notification_requires_delivery(notification)
}
AppServerEvent::Disconnected { .. } => true,
AppServerEvent::Lagged { .. } | AppServerEvent::ServerRequest(_) => false,
}
)
})
}
fn request_id_from_client_request(request: &ClientRequest) -> RequestId {
@@ -940,47 +864,14 @@ async fn write_jsonrpc_message(
))
})
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn event_requires_delivery_marks_transcript_and_disconnect_events() {
assert!(event_requires_delivery(
&AppServerEvent::ServerNotification(ServerNotification::AgentMessageDelta(
codex_app_server_protocol::AgentMessageDeltaNotification {
thread_id: "thread".to_string(),
turn_id: "turn".to_string(),
item_id: "item".to_string(),
delta: "hello".to_string(),
},
),)
));
assert!(event_requires_delivery(
&AppServerEvent::ServerNotification(ServerNotification::ItemCompleted(
codex_app_server_protocol::ItemCompletedNotification {
thread_id: "thread".to_string(),
turn_id: "turn".to_string(),
item: codex_app_server_protocol::ThreadItem::Plan {
id: "item".to_string(),
text: "step".to_string(),
},
}
),)
));
assert!(event_requires_delivery(&AppServerEvent::Disconnected {
message: "closed".to_string(),
}));
assert!(!event_requires_delivery(&AppServerEvent::Lagged {
skipped: 1
}));
}
#[tokio::test]
async fn shutdown_tolerates_worker_exit_after_command_is_queued() {
let (command_tx, mut command_rx) = mpsc::channel(1);
let (_event_tx, event_rx) = mpsc::channel(1);
let (_event_tx, event_rx) = mpsc::unbounded_channel::<AppServerEvent>();
let worker_handle = tokio::spawn(async move {
let _ = command_rx.recv().await;
});

File diff suppressed because it is too large Load Diff

View File

@@ -25,6 +25,7 @@
]
},
"read": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -34,6 +35,7 @@
]
},
"write": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -76,7 +78,8 @@
{
"type": "null"
}
]
],
"description": "Partial overlay used for per-command permission requests."
}
},
"type": "object"
@@ -389,21 +392,6 @@
"title": "MinimalFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"current_working_directory"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "CurrentWorkingDirectoryFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {

View File

@@ -25,6 +25,7 @@
]
},
"read": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -34,6 +35,7 @@
]
},
"write": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -175,21 +177,6 @@
"title": "MinimalFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"current_working_directory"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "CurrentWorkingDirectoryFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {

View File

@@ -25,6 +25,7 @@
]
},
"read": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -34,6 +35,7 @@
]
},
"write": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -175,21 +177,6 @@
"title": "MinimalFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"current_working_directory"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "CurrentWorkingDirectoryFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
@@ -311,6 +298,13 @@
}
],
"default": "turn"
},
"strictAutoReview": {
"description": "Review every subsequent command in this turn before normal sandboxed execution.",
"type": [
"boolean",
"null"
]
}
},
"required": [

View File

@@ -84,6 +84,7 @@
]
},
"read": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -93,6 +94,7 @@
]
},
"write": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -480,6 +482,7 @@
"contextWindowExceeded",
"usageLimitExceeded",
"serverOverloaded",
"cyberPolicy",
"internalServerError",
"unauthorized",
"badRequest",
@@ -1029,6 +1032,7 @@
"type": "object"
},
"FileChangeOutputDeltaNotification": {
"description": "Deprecated legacy notification for `apply_patch` textual output.\n\nThe server no longer emits this notification.",
"properties": {
"delta": {
"type": "string"
@@ -1196,21 +1200,6 @@
"title": "MinimalFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"current_working_directory"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "CurrentWorkingDirectoryFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
@@ -1704,6 +1693,23 @@
],
"type": "string"
},
"GuardianWarningNotification": {
"properties": {
"message": {
"description": "Concise guardian warning message for the user.",
"type": "string"
},
"threadId": {
"description": "Thread target for the guardian warning.",
"type": "string"
}
},
"required": [
"message",
"threadId"
],
"type": "object"
},
"HookCompletedNotification": {
"properties": {
"run": {
@@ -1895,6 +1901,8 @@
"project",
"mdm",
"sessionFlags",
"plugin",
"cloudRequirements",
"legacyManagedConfigFile",
"legacyManagedConfigMdm",
"unknown"
@@ -1924,6 +1932,11 @@
},
"ItemCompletedNotification": {
"properties": {
"completedAtMs": {
"description": "Unix timestamp (in milliseconds) when this item lifecycle completed.",
"format": "int64",
"type": "integer"
},
"item": {
"$ref": "#/definitions/ThreadItem"
},
@@ -1935,6 +1948,7 @@
}
},
"required": [
"completedAtMs",
"item",
"threadId",
"turnId"
@@ -2022,6 +2036,11 @@
"item": {
"$ref": "#/definitions/ThreadItem"
},
"startedAtMs": {
"description": "Unix timestamp (in milliseconds) when this item lifecycle started.",
"format": "int64",
"type": "integer"
},
"threadId": {
"type": "string"
},
@@ -2031,6 +2050,7 @@
},
"required": [
"item",
"startedAtMs",
"threadId",
"turnId"
],
@@ -2243,6 +2263,34 @@
],
"type": "object"
},
"ModelVerification": {
"enum": [
"trustedAccessForCyber"
],
"type": "string"
},
"ModelVerificationNotification": {
"properties": {
"threadId": {
"type": "string"
},
"turnId": {
"type": "string"
},
"verifications": {
"items": {
"$ref": "#/definitions/ModelVerification"
},
"type": "array"
}
},
"required": [
"threadId",
"turnId",
"verifications"
],
"type": "object"
},
"NetworkApprovalProtocol": {
"enum": [
"http",
@@ -2367,6 +2415,96 @@
],
"type": "string"
},
"ProcessExitedNotification": {
"description": "Final process exit notification for `process/spawn`.",
"properties": {
"exitCode": {
"description": "Process exit code.",
"format": "int32",
"type": "integer"
},
"processHandle": {
"description": "Client-supplied, connection-scoped `processHandle` from `process/spawn`.",
"type": "string"
},
"stderr": {
"description": "Buffered stderr capture.\n\nEmpty when stderr was streamed via `process/outputDelta`.",
"type": "string"
},
"stderrCapReached": {
"description": "Whether stderr reached `outputBytesCap`.\n\nIn streaming mode, stderr is empty and cap state is also reported on the final stderr `process/outputDelta` notification.",
"type": "boolean"
},
"stdout": {
"description": "Buffered stdout capture.\n\nEmpty when stdout was streamed via `process/outputDelta`.",
"type": "string"
},
"stdoutCapReached": {
"description": "Whether stdout reached `outputBytesCap`.\n\nIn streaming mode, stdout is empty and cap state is also reported on the final stdout `process/outputDelta` notification.",
"type": "boolean"
}
},
"required": [
"exitCode",
"processHandle",
"stderr",
"stderrCapReached",
"stdout",
"stdoutCapReached"
],
"type": "object"
},
"ProcessOutputDeltaNotification": {
"description": "Base64-encoded output chunk emitted for a streaming `process/spawn` request.",
"properties": {
"capReached": {
"description": "True on the final streamed chunk for this stream when output was truncated by `outputBytesCap`.",
"type": "boolean"
},
"deltaBase64": {
"description": "Base64-encoded output bytes.",
"type": "string"
},
"processHandle": {
"description": "Client-supplied, connection-scoped `processHandle` from `process/spawn`.",
"type": "string"
},
"stream": {
"allOf": [
{
"$ref": "#/definitions/ProcessOutputStream"
}
],
"description": "Output stream this chunk belongs to."
}
},
"required": [
"capReached",
"deltaBase64",
"processHandle",
"stream"
],
"type": "object"
},
"ProcessOutputStream": {
"description": "Stream label for `process/outputDelta` notifications.",
"oneOf": [
{
"description": "stdout stream. PTY mode multiplexes terminal output here.",
"enum": [
"stdout"
],
"type": "string"
},
{
"description": "stderr stream.",
"enum": [
"stderr"
],
"type": "string"
}
]
},
"RateLimitReachedType": {
"enum": [
"rate_limit_reached",
@@ -2569,6 +2707,33 @@
],
"type": "object"
},
"RemoteControlConnectionStatus": {
"enum": [
"disabled",
"connecting",
"connected",
"errored"
],
"type": "string"
},
"RemoteControlStatusChangedNotification": {
"description": "Current remote-control connection status and environment id exposed to clients.",
"properties": {
"environmentId": {
"type": [
"string",
"null"
]
},
"status": {
"$ref": "#/definitions/RemoteControlConnectionStatus"
}
},
"required": [
"status"
],
"type": "object"
},
"RequestId": {
"anyOf": [
{
@@ -2907,6 +3072,10 @@
"description": "Usually the first user message in the thread, if available.",
"type": "string"
},
"sessionId": {
"description": "Session id shared by threads that belong to the same session tree.",
"type": "string"
},
"source": {
"allOf": [
{
@@ -2923,6 +3092,17 @@
],
"description": "Current runtime status for the thread."
},
"threadSource": {
"anyOf": [
{
"$ref": "#/definitions/ThreadSource"
},
{
"type": "null"
}
],
"description": "Optional analytics source classification for this thread."
},
"turns": {
"description": "Only populated on `thread/resume`, `thread/rollback`, `thread/fork`, and `thread/read` (when `includeTurns` is true) responses. For all other responses and notifications returning a Thread, the turns field will be an empty list.",
"items": {
@@ -2944,6 +3124,7 @@
"id",
"modelProvider",
"preview",
"sessionId",
"source",
"status",
"turns",
@@ -2980,6 +3161,93 @@
],
"type": "object"
},
"ThreadGoal": {
"properties": {
"createdAt": {
"format": "int64",
"type": "integer"
},
"objective": {
"type": "string"
},
"status": {
"$ref": "#/definitions/ThreadGoalStatus"
},
"threadId": {
"type": "string"
},
"timeUsedSeconds": {
"format": "int64",
"type": "integer"
},
"tokenBudget": {
"format": "int64",
"type": [
"integer",
"null"
]
},
"tokensUsed": {
"format": "int64",
"type": "integer"
},
"updatedAt": {
"format": "int64",
"type": "integer"
}
},
"required": [
"createdAt",
"objective",
"status",
"threadId",
"timeUsedSeconds",
"tokensUsed",
"updatedAt"
],
"type": "object"
},
"ThreadGoalClearedNotification": {
"properties": {
"threadId": {
"type": "string"
}
},
"required": [
"threadId"
],
"type": "object"
},
"ThreadGoalStatus": {
"enum": [
"active",
"paused",
"budgetLimited",
"complete"
],
"type": "string"
},
"ThreadGoalUpdatedNotification": {
"properties": {
"goal": {
"$ref": "#/definitions/ThreadGoal"
},
"threadId": {
"type": "string"
},
"turnId": {
"type": [
"string",
"null"
]
}
},
"required": [
"goal",
"threadId"
],
"type": "object"
},
"ThreadId": {
"type": "string"
},
@@ -3781,7 +4049,7 @@
"ThreadRealtimeStartedNotification": {
"description": "EXPERIMENTAL - emitted when thread realtime startup is accepted.",
"properties": {
"sessionId": {
"realtimeSessionId": {
"type": [
"string",
"null"
@@ -3842,6 +4110,14 @@
],
"type": "object"
},
"ThreadSource": {
"enum": [
"user",
"subagent",
"memory_consolidation"
],
"type": "string"
},
"ThreadStartedNotification": {
"properties": {
"thread": {
@@ -4060,12 +4336,21 @@
"type": "string"
},
"items": {
"description": "Only populated on a `thread/resume` or `thread/fork` response. For all other responses and notifications returning a Turn, the items field will be an empty list.",
"description": "Thread items currently included in this turn payload.",
"items": {
"$ref": "#/definitions/ThreadItem"
},
"type": "array"
},
"itemsView": {
"allOf": [
{
"$ref": "#/definitions/TurnItemsView"
}
],
"default": "full",
"description": "Describes how much of `items` has been loaded for this turn."
},
"startedAt": {
"description": "Unix timestamp (in seconds) when the turn started.",
"format": "int64",
@@ -4148,6 +4433,31 @@
],
"type": "object"
},
"TurnItemsView": {
"oneOf": [
{
"description": "`items` was not loaded for this turn. The field is intentionally empty.",
"enum": [
"notLoaded"
],
"type": "string"
},
{
"description": "`items` contains only a display summary for this turn.",
"enum": [
"summary"
],
"type": "string"
},
{
"description": "`items` contains every ThreadItem available from persisted app-server history for this turn.",
"enum": [
"full"
],
"type": "string"
}
]
},
"TurnPlanStep": {
"properties": {
"status": {
@@ -4679,6 +4989,46 @@
"title": "Thread/name/updatedNotification",
"type": "object"
},
{
"properties": {
"method": {
"enum": [
"thread/goal/updated"
],
"title": "Thread/goal/updatedNotificationMethod",
"type": "string"
},
"params": {
"$ref": "#/definitions/ThreadGoalUpdatedNotification"
}
},
"required": [
"method",
"params"
],
"title": "Thread/goal/updatedNotification",
"type": "object"
},
{
"properties": {
"method": {
"enum": [
"thread/goal/cleared"
],
"title": "Thread/goal/clearedNotificationMethod",
"type": "string"
},
"params": {
"$ref": "#/definitions/ThreadGoalClearedNotification"
}
},
"required": [
"method",
"params"
],
"title": "Thread/goal/clearedNotification",
"type": "object"
},
{
"properties": {
"method": {
@@ -4961,6 +5311,48 @@
"title": "Command/exec/outputDeltaNotification",
"type": "object"
},
{
"description": "Stream base64-encoded stdout/stderr chunks for a running `process/spawn` session.",
"properties": {
"method": {
"enum": [
"process/outputDelta"
],
"title": "Process/outputDeltaNotificationMethod",
"type": "string"
},
"params": {
"$ref": "#/definitions/ProcessOutputDeltaNotification"
}
},
"required": [
"method",
"params"
],
"title": "Process/outputDeltaNotification",
"type": "object"
},
{
"description": "Final exit notification for a `process/spawn` session.",
"properties": {
"method": {
"enum": [
"process/exited"
],
"title": "Process/exitedNotificationMethod",
"type": "string"
},
"params": {
"$ref": "#/definitions/ProcessExitedNotification"
}
},
"required": [
"method",
"params"
],
"title": "Process/exitedNotification",
"type": "object"
},
{
"properties": {
"method": {
@@ -5002,6 +5394,7 @@
"type": "object"
},
{
"description": "Deprecated legacy apply_patch output stream notification.",
"properties": {
"method": {
"enum": [
@@ -5181,6 +5574,26 @@
"title": "App/list/updatedNotification",
"type": "object"
},
{
"properties": {
"method": {
"enum": [
"remoteControl/status/changed"
],
"title": "RemoteControl/status/changedNotificationMethod",
"type": "string"
},
"params": {
"$ref": "#/definitions/RemoteControlStatusChangedNotification"
}
},
"required": [
"method",
"params"
],
"title": "RemoteControl/status/changedNotification",
"type": "object"
},
{
"properties": {
"method": {
@@ -5322,6 +5735,26 @@
"title": "Model/reroutedNotification",
"type": "object"
},
{
"properties": {
"method": {
"enum": [
"model/verification"
],
"title": "Model/verificationNotificationMethod",
"type": "string"
},
"params": {
"$ref": "#/definitions/ModelVerificationNotification"
}
},
"required": [
"method",
"params"
],
"title": "Model/verificationNotification",
"type": "object"
},
{
"properties": {
"method": {
@@ -5342,6 +5775,26 @@
"title": "WarningNotification",
"type": "object"
},
{
"properties": {
"method": {
"enum": [
"guardianWarning"
],
"title": "GuardianWarningNotificationMethod",
"type": "string"
},
"params": {
"$ref": "#/definitions/GuardianWarningNotification"
}
},
"required": [
"method",
"params"
],
"title": "GuardianWarningNotification",
"type": "object"
},
{
"properties": {
"method": {

View File

@@ -25,6 +25,7 @@
]
},
"read": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -34,6 +35,7 @@
]
},
"write": {
"description": "This will be removed in favor of `entries`.",
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
@@ -76,7 +78,8 @@
{
"type": "null"
}
]
],
"description": "Partial overlay used for per-command permission requests."
}
},
"type": "object"
@@ -728,21 +731,6 @@
"title": "MinimalFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"current_working_directory"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "CurrentWorkingDirectoryFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {

View File

@@ -27,6 +27,202 @@
],
"type": "object"
},
"FileSystemAccessMode": {
"enum": [
"read",
"write",
"none"
],
"type": "string"
},
"FileSystemPath": {
"oneOf": [
{
"properties": {
"path": {
"$ref": "#/definitions/AbsolutePathBuf"
},
"type": {
"enum": [
"path"
],
"title": "PathFileSystemPathType",
"type": "string"
}
},
"required": [
"path",
"type"
],
"title": "PathFileSystemPath",
"type": "object"
},
{
"properties": {
"pattern": {
"type": "string"
},
"type": {
"enum": [
"glob_pattern"
],
"title": "GlobPatternFileSystemPathType",
"type": "string"
}
},
"required": [
"pattern",
"type"
],
"title": "GlobPatternFileSystemPath",
"type": "object"
},
{
"properties": {
"type": {
"enum": [
"special"
],
"title": "SpecialFileSystemPathType",
"type": "string"
},
"value": {
"$ref": "#/definitions/FileSystemSpecialPath"
}
},
"required": [
"type",
"value"
],
"title": "SpecialFileSystemPath",
"type": "object"
}
]
},
"FileSystemSandboxEntry": {
"properties": {
"access": {
"$ref": "#/definitions/FileSystemAccessMode"
},
"path": {
"$ref": "#/definitions/FileSystemPath"
}
},
"required": [
"access",
"path"
],
"type": "object"
},
"FileSystemSpecialPath": {
"oneOf": [
{
"properties": {
"kind": {
"enum": [
"root"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "RootFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"minimal"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "MinimalFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"project_roots"
],
"type": "string"
},
"subpath": {
"type": [
"string",
"null"
]
}
},
"required": [
"kind"
],
"title": "KindFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"tmpdir"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "TmpdirFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"slash_tmp"
],
"type": "string"
}
},
"required": [
"kind"
],
"title": "SlashTmpFileSystemSpecialPath",
"type": "object"
},
{
"properties": {
"kind": {
"enum": [
"unknown"
],
"type": "string"
},
"path": {
"type": "string"
},
"subpath": {
"type": [
"string",
"null"
]
}
},
"required": [
"kind",
"path"
],
"type": "object"
}
]
},
"NetworkAccess": {
"enum": [
"restricted",
@@ -34,53 +230,135 @@
],
"type": "string"
},
"ReadOnlyAccess": {
"PermissionProfile": {
"oneOf": [
{
"description": "Codex owns sandbox construction for this profile.",
"properties": {
"includePlatformDefaults": {
"default": true,
"type": "boolean"
"fileSystem": {
"$ref": "#/definitions/PermissionProfileFileSystemPermissions"
},
"readableRoots": {
"default": [],
"items": {
"$ref": "#/definitions/AbsolutePathBuf"
},
"type": "array"
"network": {
"$ref": "#/definitions/PermissionProfileNetworkPermissions"
},
"type": {
"enum": [
"restricted"
"managed"
],
"title": "RestrictedReadOnlyAccessType",
"title": "ManagedPermissionProfileType",
"type": "string"
}
},
"required": [
"fileSystem",
"network",
"type"
],
"title": "ManagedPermissionProfile",
"type": "object"
},
{
"description": "Do not apply an outer sandbox.",
"properties": {
"type": {
"enum": [
"disabled"
],
"title": "DisabledPermissionProfileType",
"type": "string"
}
},
"required": [
"type"
],
"title": "RestrictedReadOnlyAccess",
"title": "DisabledPermissionProfile",
"type": "object"
},
{
"description": "Filesystem isolation is enforced by an external caller.",
"properties": {
"network": {
"$ref": "#/definitions/PermissionProfileNetworkPermissions"
},
"type": {
"enum": [
"external"
],
"title": "ExternalPermissionProfileType",
"type": "string"
}
},
"required": [
"network",
"type"
],
"title": "ExternalPermissionProfile",
"type": "object"
}
]
},
"PermissionProfileFileSystemPermissions": {
"oneOf": [
{
"properties": {
"entries": {
"items": {
"$ref": "#/definitions/FileSystemSandboxEntry"
},
"type": "array"
},
"globScanMaxDepth": {
"format": "uint",
"minimum": 1.0,
"type": [
"integer",
"null"
]
},
"type": {
"enum": [
"restricted"
],
"title": "RestrictedPermissionProfileFileSystemPermissionsType",
"type": "string"
}
},
"required": [
"entries",
"type"
],
"title": "RestrictedPermissionProfileFileSystemPermissions",
"type": "object"
},
{
"properties": {
"type": {
"enum": [
"fullAccess"
"unrestricted"
],
"title": "FullAccessReadOnlyAccessType",
"title": "UnrestrictedPermissionProfileFileSystemPermissionsType",
"type": "string"
}
},
"required": [
"type"
],
"title": "FullAccessReadOnlyAccess",
"title": "UnrestrictedPermissionProfileFileSystemPermissions",
"type": "object"
}
]
},
"PermissionProfileNetworkPermissions": {
"properties": {
"enabled": {
"type": "boolean"
}
},
"required": [
"enabled"
],
"type": "object"
},
"SandboxPolicy": {
"oneOf": [
{
@@ -101,16 +379,6 @@
},
{
"properties": {
"access": {
"allOf": [
{
"$ref": "#/definitions/ReadOnlyAccess"
}
],
"default": {
"type": "fullAccess"
}
},
"networkAccess": {
"default": false,
"type": "boolean"
@@ -167,16 +435,6 @@
"default": false,
"type": "boolean"
},
"readOnlyAccess": {
"allOf": [
{
"$ref": "#/definitions/ReadOnlyAccess"
}
],
"default": {
"type": "fullAccess"
}
},
"type": {
"enum": [
"workspaceWrite"
@@ -263,7 +521,7 @@
"type": "null"
}
],
"description": "Optional sandbox policy for this command.\n\nUses the same shape as thread/turn execution sandbox configuration and defaults to the user's configured policy when omitted."
"description": "Optional sandbox policy for this command.\n\nUses the same shape as thread/turn execution sandbox configuration and defaults to the user's configured policy when omitted. Cannot be combined with `permissionProfile`."
},
"size": {
"anyOf": [

View File

@@ -97,9 +97,10 @@
"type": "object"
},
"ApprovalsReviewer": {
"description": "Configures who approval requests are routed to for review. Examples include sandbox escapes, blocked network access, MCP approval prompts, and ARC escalations. Defaults to `user`. `guardian_subagent` uses a carefully prompted subagent to gather relevant context and apply a risk-based decision framework before approving or denying the request.",
"description": "Configures who approval requests are routed to for review. Examples include sandbox escapes, blocked network access, MCP approval prompts, and ARC escalations. Defaults to `user`. `auto_review` uses a carefully prompted subagent to gather relevant context and apply a risk-based decision framework before approving or denying the request. The legacy value `guardian_subagent` is accepted for compatibility.",
"enum": [
"user",
"auto_review",
"guardian_subagent"
],
"type": "string"
@@ -351,13 +352,9 @@
]
},
"service_tier": {
"anyOf": [
{
"$ref": "#/definitions/ServiceTier"
},
{
"type": "null"
}
"type": [
"string",
"null"
]
},
"tools": {
@@ -657,13 +654,9 @@
]
},
"service_tier": {
"anyOf": [
{
"$ref": "#/definitions/ServiceTier"
},
{
"type": "null"
}
"type": [
"string",
"null"
]
},
"tools": {
@@ -753,13 +746,6 @@
},
"type": "object"
},
"ServiceTier": {
"enum": [
"fast",
"flex"
],
"type": "string"
},
"ToolsV2": {
"properties": {
"view_image": {

View File

@@ -2,9 +2,10 @@
"$schema": "http://json-schema.org/draft-07/schema#",
"definitions": {
"ApprovalsReviewer": {
"description": "Configures who approval requests are routed to for review. Examples include sandbox escapes, blocked network access, MCP approval prompts, and ARC escalations. Defaults to `user`. `guardian_subagent` uses a carefully prompted subagent to gather relevant context and apply a risk-based decision framework before approving or denying the request.",
"description": "Configures who approval requests are routed to for review. Examples include sandbox escapes, blocked network access, MCP approval prompts, and ARC escalations. Defaults to `user`. `auto_review` uses a carefully prompted subagent to gather relevant context and apply a risk-based decision framework before approving or denying the request. The legacy value `guardian_subagent` is accepted for compatibility.",
"enum": [
"user",
"auto_review",
"guardian_subagent"
],
"type": "string"
@@ -110,6 +111,161 @@
},
"type": "object"
},
"ConfiguredHookHandler": {
"oneOf": [
{
"properties": {
"async": {
"type": "boolean"
},
"command": {
"type": "string"
},
"statusMessage": {
"type": [
"string",
"null"
]
},
"timeoutSec": {
"format": "uint64",
"minimum": 0.0,
"type": [
"integer",
"null"
]
},
"type": {
"enum": [
"command"
],
"title": "CommandConfiguredHookHandlerType",
"type": "string"
}
},
"required": [
"async",
"command",
"type"
],
"title": "CommandConfiguredHookHandler",
"type": "object"
},
{
"properties": {
"type": {
"enum": [
"prompt"
],
"title": "PromptConfiguredHookHandlerType",
"type": "string"
}
},
"required": [
"type"
],
"title": "PromptConfiguredHookHandler",
"type": "object"
},
{
"properties": {
"type": {
"enum": [
"agent"
],
"title": "AgentConfiguredHookHandlerType",
"type": "string"
}
},
"required": [
"type"
],
"title": "AgentConfiguredHookHandler",
"type": "object"
}
]
},
"ConfiguredHookMatcherGroup": {
"properties": {
"hooks": {
"items": {
"$ref": "#/definitions/ConfiguredHookHandler"
},
"type": "array"
},
"matcher": {
"type": [
"string",
"null"
]
}
},
"required": [
"hooks"
],
"type": "object"
},
"ManagedHooksRequirements": {
"properties": {
"PermissionRequest": {
"items": {
"$ref": "#/definitions/ConfiguredHookMatcherGroup"
},
"type": "array"
},
"PostToolUse": {
"items": {
"$ref": "#/definitions/ConfiguredHookMatcherGroup"
},
"type": "array"
},
"PreToolUse": {
"items": {
"$ref": "#/definitions/ConfiguredHookMatcherGroup"
},
"type": "array"
},
"SessionStart": {
"items": {
"$ref": "#/definitions/ConfiguredHookMatcherGroup"
},
"type": "array"
},
"Stop": {
"items": {
"$ref": "#/definitions/ConfiguredHookMatcherGroup"
},
"type": "array"
},
"UserPromptSubmit": {
"items": {
"$ref": "#/definitions/ConfiguredHookMatcherGroup"
},
"type": "array"
},
"managedDir": {
"type": [
"string",
"null"
]
},
"windowsManagedDir": {
"type": [
"string",
"null"
]
}
},
"required": [
"PermissionRequest",
"PostToolUse",
"PreToolUse",
"SessionStart",
"Stop",
"UserPromptSubmit"
],
"type": "object"
},
"NetworkDomainPermission": {
"enum": [
"allow",

View File

@@ -9,6 +9,7 @@
"contextWindowExceeded",
"usageLimitExceeded",
"serverOverloaded",
"cyberPolicy",
"internalServerError",
"unauthorized",
"badRequest",

View File

@@ -1,6 +1,17 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"definitions": {
"CommandMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
},
"ExternalAgentConfigMigrationItem": {
"properties": {
"cwd": {
@@ -39,22 +50,81 @@
"CONFIG",
"SKILLS",
"PLUGINS",
"MCP_SERVER_CONFIG"
"MCP_SERVER_CONFIG",
"SUBAGENTS",
"HOOKS",
"COMMANDS",
"SESSIONS"
],
"type": "string"
},
"HookMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
},
"McpServerMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
},
"MigrationDetails": {
"properties": {
"commands": {
"default": [],
"items": {
"$ref": "#/definitions/CommandMigration"
},
"type": "array"
},
"hooks": {
"default": [],
"items": {
"$ref": "#/definitions/HookMigration"
},
"type": "array"
},
"mcpServers": {
"default": [],
"items": {
"$ref": "#/definitions/McpServerMigration"
},
"type": "array"
},
"plugins": {
"default": [],
"items": {
"$ref": "#/definitions/PluginsMigration"
},
"type": "array"
},
"sessions": {
"default": [],
"items": {
"$ref": "#/definitions/SessionMigration"
},
"type": "array"
},
"subagents": {
"default": [],
"items": {
"$ref": "#/definitions/SubagentMigration"
},
"type": "array"
}
},
"required": [
"plugins"
],
"type": "object"
},
"PluginsMigration": {
@@ -74,6 +144,38 @@
"pluginNames"
],
"type": "object"
},
"SessionMigration": {
"properties": {
"cwd": {
"type": "string"
},
"path": {
"type": "string"
},
"title": {
"type": [
"string",
"null"
]
}
},
"required": [
"cwd",
"path"
],
"type": "object"
},
"SubagentMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
}
},
"properties": {

View File

@@ -1,6 +1,17 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"definitions": {
"CommandMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
},
"ExternalAgentConfigMigrationItem": {
"properties": {
"cwd": {
@@ -39,22 +50,81 @@
"CONFIG",
"SKILLS",
"PLUGINS",
"MCP_SERVER_CONFIG"
"MCP_SERVER_CONFIG",
"SUBAGENTS",
"HOOKS",
"COMMANDS",
"SESSIONS"
],
"type": "string"
},
"HookMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
},
"McpServerMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
},
"MigrationDetails": {
"properties": {
"commands": {
"default": [],
"items": {
"$ref": "#/definitions/CommandMigration"
},
"type": "array"
},
"hooks": {
"default": [],
"items": {
"$ref": "#/definitions/HookMigration"
},
"type": "array"
},
"mcpServers": {
"default": [],
"items": {
"$ref": "#/definitions/McpServerMigration"
},
"type": "array"
},
"plugins": {
"default": [],
"items": {
"$ref": "#/definitions/PluginsMigration"
},
"type": "array"
},
"sessions": {
"default": [],
"items": {
"$ref": "#/definitions/SessionMigration"
},
"type": "array"
},
"subagents": {
"default": [],
"items": {
"$ref": "#/definitions/SubagentMigration"
},
"type": "array"
}
},
"required": [
"plugins"
],
"type": "object"
},
"PluginsMigration": {
@@ -74,6 +144,38 @@
"pluginNames"
],
"type": "object"
},
"SessionMigration": {
"properties": {
"cwd": {
"type": "string"
},
"path": {
"type": "string"
},
"title": {
"type": [
"string",
"null"
]
}
},
"required": [
"cwd",
"path"
],
"type": "object"
},
"SubagentMigration": {
"properties": {
"name": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
}
},
"properties": {

View File

@@ -1,5 +1,6 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"description": "Deprecated legacy notification for `apply_patch` textual output.\n\nThe server no longer emits this notification.",
"properties": {
"delta": {
"type": "string"

View File

@@ -42,6 +42,22 @@
],
"title": "ChatgptAccount",
"type": "object"
},
{
"properties": {
"type": {
"enum": [
"amazonBedrock"
],
"title": "AmazonBedrockAccountType",
"type": "string"
}
},
"required": [
"type"
],
"title": "AmazonBedrockAccount",
"type": "object"
}
]
},

Some files were not shown because too many files have changed in this diff Show More