Commit Graph

5486 Commits

Author SHA1 Message Date
canvrno-oai
17e2ecd641 avoid parenthesized rendered link urls 2026-04-17 11:57:00 -07:00
Won Park
af7b8d551c Guardian -> Auto-Review (#18021)
This PR is a user-facing change for our rebranding of guardian to
auto-review.
2026-04-17 09:56:24 -07:00
Michael Bolin
d0eff70383 Fix config-loader tests after filesystem abstraction race (#18351)
## Why

`origin/main` picked up two changes that crossed in flight:

- #18209 refactored config loading to read through `ExecutorFileSystem`,
changing `load_requirements_toml` to take a filesystem handle and an
`AbsolutePathBuf`.
- #17740 added managed `deny_read` requirements tests that still called
`load_requirements_toml` with the previous two-argument signature.

Once both landed, `just clippy` failed because the new tests no longer
matched the current helper API.

## What

- Updates the two managed `deny_read` requirements tests to convert the
fixture path to `AbsolutePathBuf` before loading.
- Passes `LOCAL_FS.as_ref()` into `load_requirements_toml` so these
tests follow the filesystem abstraction introduced by #18209.

## Verification

- `just clippy`
- `cargo test -p codex-core load_requirements_toml_resolves_deny_read`
- `cargo test -p codex-core --test all
unified_exec_enforces_glob_deny_read_policy`
2026-04-17 09:20:39 -07:00
pakrym-oai
71e4c6fa17 Move codex module under session (#18249)
## Summary
- rename the core codex module root to session/mod.rs without using
#[path]
- move the codex module directory and tests under core/src/session
- remove session/mod.rs reexports so call sites use explicit child
module paths

## Testing
- cargo test -p codex-core --lib
- cargo check -p codex-core --tests
- just fmt
- just fix -p codex-core
- git diff --check
2026-04-17 16:18:53 +00:00
viyatb-oai
dae0608c06 feat(config): support managed deny-read requirements (#17740)
## Summary
- adds managed requirements support for deny-read filesystem entries
- constrains config layers so managed deny-read requirements cannot be
widened by user-controlled config
- surfaces managed deny-read requirements through debug/config plumbing

This PR lets managed requirements inject deny-read filesystem
constraints into the effective filesystem sandbox policy.
User-controlled config can still choose the surrounding permission
profile, but it cannot remove or weaken the managed deny-read entries.

## Managed deny-read shape
A managed requirements file can declare exact paths and glob patterns
under `[permissions.filesystem]`:

```toml
# /etc/codex/requirements.toml
[permissions.filesystem]
deny_read = [
  "/Users/alice/.gitconfig",
  "/Users/alice/.ssh",
  "./managed-private/**/*.env",
]
```

Those entries are compiled into the effective filesystem policy as
`access = none` rules, equivalent in shape to filesystem permission
entries like:

```toml
[permissions.workspace.filesystem]
"/Users/alice/.gitconfig" = "none"
"/Users/alice/.ssh" = "none"
"/absolute/path/to/managed-private/**/*.env" = "none"
```

The important difference is that the managed entries come from
requirements, so lower-precedence user config cannot remove them or make
those paths readable again.

Relative managed `deny_read` entries are resolved relative to the
directory containing the managed requirements file. Glob entries keep
their glob suffix after the non-glob prefix is normalized.

## Runtime behavior
- Managed `deny_read` entries are appended to the effective
`FileSystemSandboxPolicy` after the selected permission profile is
resolved.
- Exact paths become `FileSystemPath::Path { access: None }`; glob
patterns become `FileSystemPath::GlobPattern { access: None }`.
- When managed deny-read entries are present, `sandbox_mode` is
constrained to `read-only` or `workspace-write`; `danger-full-access`
and `external-sandbox` cannot silently bypass the managed read-deny
policy.
- On Windows, the managed deny-read policy is enforced for direct file
tools, but shell subprocess reads are not sandboxed yet, so startup
emits a warning for that platform.
- `/debug-config` shows the effective managed requirement as
`permissions.filesystem.deny_read` with its source.

## Stack
1. #15979 - glob deny-read policy/config/direct-tool support
2. #18096 - macOS and Linux sandbox enforcement
3. This PR - managed deny-read requirements

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-17 08:40:09 -07:00
Eric Traut
2dd6734dd3 fix(tui): use BEL for terminal title updates (#18261)
## Summary

Fixes #18160.

iTerm2 can append the current foreground process to tab titles, and
Codex's terminal-title updates were causing that decoration to appear as
`(codex")` with a stray trailing quote. Codex was writing OSC 0 title
sequences terminated with ST (`ESC \`). Some terminal title integrations
appear to accept that title update but still expose the ST terminator in
their own process/title decoration.

## Changes

- Update `codex-rs/tui/src/terminal_title.rs` to terminate OSC 0 title
updates with BEL instead of ST.
- Update the focused terminal-title encoding test to assert the
BEL-terminated sequence.

## Compatibility

This should be low risk: the title payload and update timing are
unchanged, and BEL is the form already emitted by
`crossterm::terminal::SetTitle` in the crossterm version used by this
repository. BEL is also the widely supported xterm-family title
terminator used by common terminals and multiplexers. The main
theoretical risk would be a very old or unusual terminal that accepted
only ST and not BEL for OSC title termination, but that is unlikely
compared with the observed iTerm2 issue.

## Verification

- `cargo test -p codex-tui terminal_title`
- `cargo test -p codex-tui`
2026-04-17 08:39:37 -07:00
Eric Traut
c3ecb557d3 Support Ctrl+P/Ctrl+N in resume picker (#18267)
Fixes #18179.

## Why
The fullscreen `/resume` picker accepted Up/Down navigation but ignored
Ctrl+P/Ctrl+N, which made it inconsistent with other TUI selection flows
such as `ListSelectionView`-backed pickers and composer navigation.

## What Changed
Updated `codex-rs/tui/src/resume_picker.rs` so the resume picker treats
Ctrl+P/Ctrl+N as aliases for Up/Down, including the raw `^P`/`^N`
control-character events some terminals emit without a CONTROL modifier.
2026-04-17 08:38:47 -07:00
jif-oai
3421a107e0 nit: phase 2 ephemeral (#18338) 2026-04-17 16:10:58 +01:00
Abhinav
8494e5bd7b Add PermissionRequest hooks support (#17563)
## Why

We need `PermissionRequest` hook support!

Also addresses:
- https://github.com/openai/codex/issues/16301
- run a script on Hook to do things like play a sound to draw attention
but actually no-op so user can still approve
- can omit the `decision` object from output or just have the script
exit 0 and print nothing
- https://github.com/openai/codex/issues/15311
  - let the script approve/deny on its own
  - external UI what will run on Hook and relay decision back to codex


## Reviewer Note

There's a lot of plumbing for the new hook, key files to review are:
- New hook added in `codex-rs/hooks/src/events/permission_request.rs`
- Wiring for network approvals
`codex-rs/core/src/tools/network_approval.rs`
- Wiring for tool orchestrator `codex-rs/core/src/tools/orchestrator.rs`
- Wiring for execve
`codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs`

## What

- Wires shell, unified exec, and network approval prompts into the
`PermissionRequest` hook flow.
- Lets hooks allow or deny approval prompts; quiet or invalid hooks fall
back to the normal approval path.
- Uses `tool_input.description` for user-facing context when it helps:
  - shell / `exec_command`: the request justification, when present
  - network approvals: `network-access <domain>`
- Uses `tool_name: Bash` for shell, unified exec, and network approval
permission-request hooks.
- For network approvals, passes the originating command in
`tool_input.command` when there is a single owning call; otherwise falls
back to the synthetic `network-access ...` command.

<details>
<summary>Example `PermissionRequest` hook input for a shell
approval</summary>

```json
{
  "session_id": "<session-id>",
  "turn_id": "<turn-id>",
  "transcript_path": "/path/to/transcript.jsonl",
  "cwd": "/path/to/cwd",
  "hook_event_name": "PermissionRequest",
  "model": "gpt-5",
  "permission_mode": "default",
  "tool_name": "Bash",
  "tool_input": {
    "command": "rm -f /tmp/example"
  }
}
```

</details>

<details>
<summary>Example `PermissionRequest` hook input for an escalated
`exec_command` request</summary>

```json
{
  "session_id": "<session-id>",
  "turn_id": "<turn-id>",
  "transcript_path": "/path/to/transcript.jsonl",
  "cwd": "/path/to/cwd",
  "hook_event_name": "PermissionRequest",
  "model": "gpt-5",
  "permission_mode": "default",
  "tool_name": "Bash",
  "tool_input": {
    "command": "cp /tmp/source.json /Users/alice/export/source.json",
    "description": "Need to copy a generated file outside the workspace"
  }
}
```

</details>

<details>
<summary>Example `PermissionRequest` hook input for a network
approval</summary>

```json
{
  "session_id": "<session-id>",
  "turn_id": "<turn-id>",
  "transcript_path": "/path/to/transcript.jsonl",
  "cwd": "/path/to/cwd",
  "hook_event_name": "PermissionRequest",
  "model": "gpt-5",
  "permission_mode": "default",
  "tool_name": "Bash",
  "tool_input": {
    "command": "curl http://codex-network-test.invalid",
    "description": "network-access http://codex-network-test.invalid"
  }
}
```

</details>

## Follow-ups

- Implement the `PermissionRequest` semantics for `updatedInput`,
`updatedPermissions`, `interrupt`, and suggestions /
`permission_suggestions`
- Add `PermissionRequest` support for the `request_permissions` tool
path

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-17 14:45:47 +00:00
sayan-oai
d0047de7cb add token-based tool deferral behind feature flag (#18097)
add new `tool_search_always_defer_mcp_tools` feature flag that always
defers all mcp tools rather than deferring once > 100 deferrable tools.

add new tests, also move `mcp_exposure` tests into dedicated file rather
than polluting `codex_tests`.
2026-04-17 18:34:06 +08:00
alexsong-oai
20b4b80426 Sync local plugin imports, async remote imports, refresh caches after… (#18246)
… import

## Why

`externalAgentConfig/import` used to spawn plugin imports in the
background and return immediately. That meant local marketplace imports
could still be in flight when the caller refreshed plugin state, so
newly imported plugins would not show up right away.

This change makes local marketplace imports complete before the RPC
returns, while keeping remote marketplace imports asynchronous so we do
not block on remote fetches.

## What changed

- split plugin migration details into local and remote marketplace
imports based on the external config source
- import local marketplaces synchronously during
`externalAgentConfig/import`
- return pending remote plugin imports to the app-server so it can
finish them in the background
- clear the plugin and skills caches before responding to plugin
imports, and again after background remote imports complete, so the next
`plugin/list` reloads fresh state
- keep marketplace source parsing encapsulated behind
`is_local_marketplace_source(...)` instead of re-exporting the internal
enum
- add core and app-server coverage for the synchronous local import path
and the pending remote import path

## Verification

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core` (currently fails an existing unrelated
test:
`config_loader::tests::cli_override_can_update_project_local_mcp_server_when_project_is_trusted`)
- `cargo test` (currently fails existing `codex-app-server` integration
tests in MCP/skills/thread-start areas, plus the unrelated `codex-core`
failure above)
2026-04-17 09:34:55 +00:00
jif-oai
64177aaa22 fix: reduce writable root (#17947) 2026-04-17 09:33:12 +01:00
Eric Traut
2e038e6d38 Fix Windows exec policy test flake (#18304)
## Summary

This fixes a Windows-only failure in the exec policy multi-segment shell
test. The test was meant to verify that a compound shell command only
bypasses sandboxing when every parsed segment has an explicit exec
policy allow rule.

On Windows, the read-only sandbox setup is intentionally treated as
lacking sandbox protection, so the old fixture could take the approval
path before reaching the intended bypass assertion. The test now uses
the workspace-write sandbox policy, keeping the focus on the per-segment
bypass rule while preserving the expected bypass_sandbox false result
when only cat is explicitly allowed.
2026-04-17 00:43:49 -07:00
sashank-oai
22f7ef1cb7 [codex] Revoke ChatGPT tokens on logout (#17825)
## Summary

This changes Codex logout so managed ChatGPT auth is revoked against
AuthAPI before local auth state is removed. CLI logout, TUI `/logout`,
and the app-server account logout path now use the token-revoking logout
flow instead of only deleting `auth.json` / credential store state.

## Root Cause

Logout previously cleared only local auth storage. That removed Codex's
local credentials but did not ask the backend to invalidate the
refresh/access token state associated with a managed ChatGPT login.

## Behavior

For managed ChatGPT auth, logout sends the stored refresh token to
`https://auth.openai.com/oauth/revoke` with `token_type_hint:
refresh_token` and the Codex OAuth client id, then deletes all local
auth stores after revocation succeeds. If only an access token is
available, it falls back to revoking that access token. API key auth and
externally supplied `chatgptAuthTokens` are still only cleared locally
because Codex does not own a refresh token for those modes.

Revocation failures are fail-closed: if Codex cannot load stored auth or
the backend revoke call fails, logout returns an error and leaves local
auth in place so the user can retry instead of silently clearing local
state while backend tokens remain valid.

## Validation
ran local version of `codex-cli` with staging overrides/harness for auth

ran `codex login` then `codex logout`:

saw auth.json clear and  backend revocation endpoints were called

```
POST /oauth/revoke
status: 200

revoking access token
should clear auth session
clearing auth session due to token revocation
successfully revoked session and access token
CANONICAL-API-LINE Response: status='200' method='POST' path='/oauth/revoke
```

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-16 22:51:21 -07:00
Dylan Hurd
fe7c959e90 fix(exec-policy) rules parsing (#18126)
## Summary
See scenarios - rules must always be enforced on all commands in the
string

## Testing
- [x] Added ExecApprovalRequirementScenario tests
2026-04-16 21:18:39 -07:00
Tom
9d6f4f2e2e codex: split thread/read view loading (#18231)
Summary
- refactor thread/read into explicit persisted-load, live-load, and
merge steps
- preserve existing SQLite/filesystem/live-thread behavior exactly
- keep ThreadStore migration out of this PR so the next PR is easier to
review

Validation
- this one's a pure reorganization that relies on existing test coverage
2026-04-16 21:06:03 -07:00
Leo Shimonaka
dd00efe781 Move Computer Use tool suggestion to core (#18219)
## Summary

Move the Computer Use tool suggestion into core Codex plugin discovery.

Also search `openai-bundled` when listing suggested plugins, with test
coverage for overlap between baked-in suggestions and
`tool_suggest.discoverables`.

## Test plan

Tested locally:

- `cargo test -p codex-core list_tool_suggest_discoverable_plugins`
2026-04-16 19:55:23 -07:00
xl-openai
37161bc76e feat: Handle alternate plugin manifest paths (#18182)
Load plugin manifests through a shared discoverable-path helper so
manifest reads, installs, and skill names all see the same alternate
manifest location.
2026-04-16 19:43:19 -07:00
Celia Chen
a803790a10 feat: add opt-in provider runtime abstraction (#17713)
## Summary

- Add `codex-model-provider` as the runtime home for model-provider
behavior that does not belong in `codex-core`, `codex-login`, or
`codex-api`.
- The new crate wraps configured `ModelProviderInfo` in a
`ModelProvider` trait object that can resolve the API provider config,
provider-scoped auth manager, and request auth provider for each call.
- This centralizes provider auth behavior in one place today, and gives
us an extension point for future provider-specific auth, model listing,
request setup, and related runtime behavior.

## Tests
Ran tests manually to make sure that provider auth under different
configs still work as expected.

---------

Co-authored-by: pakrym-oai <pakrym@openai.com>
2026-04-17 02:27:45 +00:00
pakrym-oai
91e8eebd03 Split codex session modules (#18244)
## Summary
- split `codex.rs` session definitions and constructor into
`codex/session.rs`
- move MCP session methods into `codex/mcp.rs`
- move turn-context types/helpers into `codex/turn_context.rs`
- move review thread spawning into `codex/review.rs`

## Testing
- `cargo check -p codex-core`
- `just fmt`
- `just fix -p codex-core`
- `cargo test -p codex-core` (unit tests passed; integration run failed
locally with 45 failures, including missing helper binaries such as
`test_stdio_server`/`codex` plus approval/web-search/MCP-related cases)
2026-04-16 18:15:19 -07:00
Akshay Nathan
7995c66032 Stream apply_patch changes (#17862)
Adds new events for streaming apply_patch changes from responses api.
This is to enable clients to show progress during file writes.

Caveat: This does not work with apply_patch in function call mode, since
that required adding streaming json parsing.
2026-04-16 18:12:19 -07:00
pakrym-oai
9effa0509f Refactor config loading to use filesystem abstraction (#18209)
Initial pass propagating FileSystem through config loading.
2026-04-17 00:51:21 +00:00
viyatb-oai
2967900d81 fix: deprecate use_legacy_landlock feature flag (#17971)
## Summary
- mark `features.use_legacy_landlock` as a deprecated feature flag
- emit a startup deprecation notice when the flag is configured
- add feature- and core-level regression coverage for the notice


<img width="1288" height="93" alt="Screenshot 2026-04-15 at 11 14 00 PM"
src="https://github.com/user-attachments/assets/fffc628b-614c-4521-9374-64e50a269252"
/>

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-16 17:37:15 -07:00
viyatb-oai
0d0abe839a feat(sandbox): add glob deny-read platform enforcement (#18096)
## Summary
- adds macOS Seatbelt deny rules for unreadable glob patterns
- expands unreadable glob matches on Linux and masks them in bwrap,
including canonical symlink targets
- keeps Linux glob expansion robust when `rg` is unavailable in minimal
or Bazel test environments
- adds sandbox integration coverage that runs `shell` and `exec_command`
with a `**/*.env = none` policy and verifies the secret contents do not
reach the model

## Linux glob expansion

```text
Prefer:   rg --files --hidden --no-ignore --glob <pattern> -- <search-root>
Fallback: internal globset walker when rg is not installed
Failure:  any other rg failure aborts sandbox construction
```

```
[permissions.workspace.filesystem]
glob_scan_max_depth = 2

[permissions.workspace.filesystem.":project_roots"]
"**/*.env" = "none"
```


This keeps the common path fast without making sandbox construction
depend on an ambient `rg` binary. If `rg` is present but fails for
another reason, the sandbox setup fails closed instead of silently
omitting deny-read masks.

## Platform support
- macOS: subprocess sandbox enforcement is handled by Seatbelt regex
deny rules
- Linux: subprocess sandbox enforcement is handled by expanding existing
glob matches and masking them in bwrap
- Windows: policy/config/direct-tool glob support is already on `main`
from #15979; Windows subprocess sandbox paths continue to fail closed
when unreadable split filesystem carveouts require runtime enforcement,
rather than silently running unsandboxed

## Stack
1. #15979 - merged: cross-platform glob deny-read
policy/config/direct-tool support for macOS, Linux, and Windows
2. This PR - macOS/Linux subprocess sandbox enforcement plus Windows
fail-closed clarification
3. #17740 - managed deny-read requirements

## Verification
- Added integration coverage for `shell` and `exec_command` glob
deny-read enforcement
- `cargo check -p codex-sandboxing -p codex-linux-sandbox --tests`
- `cargo check -p codex-core --test all`
- `cargo clippy -p codex-linux-sandbox -p codex-sandboxing --tests`
- `just bazel-lock-check`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-16 17:35:16 -07:00
xli-oai
5818ed6660 Move marketplace add under plugin command (#18116)
## Summary
- move the marketplace add CLI from `codex marketplace add` to `codex
plugin marketplace add`
- keep marketplace config overrides working through the nested plugin
command
- reject `--sparse` for local marketplace directory sources before the
local-source install path bypasses git-source validation

## Validation
- `just fmt`
- `git diff --check`
- `cargo test -p codex-cli`
- `cargo test -p codex-core marketplace_add -- --nocapture`
- `cargo test -p codex-core
install_plugin_updates_config_with_relative_path_and_plugin_key --
--nocapture`
- `xli-test-marketplace-cli` local isolated matrix: `T1`, `L1`-`L10`
2026-04-16 17:06:34 -07:00
Matthew Zeng
bf6e7e12aa Use in-process app-server for unknown-thread MCP read test (#18196)
## Summary
- Switch the unknown-thread MCP resource read test from the stdio
subprocess to the in-process app-server path.
- Keep the assertion focused on the returned error message while
avoiding child-process teardown timing issues in nextest.

## Testing
- Not run (not requested)
2026-04-16 23:46:15 +00:00
Jeff Harris
65cc12d72e Use codex-auto-review for guardian reviews (#18169)
## Summary

This is the minimal client-side follow-up for the Codex Auto Review
model slug rollout. It updates the guardian reviewer preferred model
from `gpt-5.4` to `codex-auto-review`, so the client can rely on the
backend catalog + Statsig mapping instead of hardcoding the GPT-5.4
slug.

Context:
https://openai.slack.com/archives/C0AF9328RL0/p1775777479388369?thread_ts=1775773094.071629&cid=C0AF9328RL0

## Testing

- `cargo fmt --package codex-core --check`
- `cargo test -p codex-core guardian::`
- `bazel test --experimental_remote_downloader= --test_output=errors
//codex-rs/core:core-unit-tests --test_arg=guardian`
2026-04-16 15:43:51 -07:00
pakrym-oai
a1736fcd20 [codex] Split codex turn logic (#18206)
## Summary
- Move Codex turn execution logic from `codex.rs` into `codex/turn.rs`.
- Keep the existing crate-visible `run_turn`, `build_prompt`,
`built_tools`, and `get_last_assistant_message_from_turn` surface
re-exported from `codex.rs`.
- Preserve test access for moved turn helpers while reducing the main
`codex.rs` orchestration footprint.

## Stack
- Base: #18200 (`pakrym/split-codex-handlers`)

## Testing
- `CARGO_INCREMENTAL=0 cargo test -p codex-core --lib`
- `just fix -p codex-core`
- `just fmt`
- `git diff --check`
2026-04-16 15:28:59 -07:00
canvrno-oai
fa5d14e276 Add tabbed lists, single line rendering, col width changes (#18188)
This PR adds shared bottom-pane selection-list for future `/plugins`
menu work and wires the existing `/plugins` menu into the new
list-rendering path without changing it to tabs yet. The main
user-visible effect is that the current plugin list now renders as a
denser single-line list with shared name-column sizing, while the tabbed
selection support remains available for follow-up PRs but is currently
unused in production menus.

- Add generic tabbed selection-list support to the bottom pane,
including per-tab headers/items and tab-aware list state
- Add single-line row rendering with ellipsis truncation for dense list
UIs
- Add shared name-column width support so descriptions align
consistently across rows
- Wire the current /plugins menu to the new single-line and shared
column-width behavior only
- Keep tabbed menu adoption deferred; no existing menu is switched to
tabs in this PR

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-16 15:27:59 -07:00
bxie-openai
6a1ddfc366 [codex] Update realtime V2 VAD silence delay and 1.5 prompt (#18092)
## Summary

- set the realtime v2 server VAD silence delay to 500ms
- update the default realtime 1.5 backend prompt to the v4 text
- keep the session payload and prompt rendering tests aligned with those
changes

## Why

- the VAD change gives the voice path a longer pause before ending the
user's turn
- the prompt change makes the default bundled realtime prompt match the
current v4 content

## Validation

- `cargo +1.93.0 test -p codex-core realtime_prompt --manifest-path
/tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml`
- `CARGO_TARGET_DIR=/tmp/codex-pr-v4-target cargo +1.93.0 test -p
codex-api
realtime_v2_session_update_includes_background_agent_tool_and_handoff_output_item
--manifest-path
/tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml`
- `CARGO_TARGET_DIR=/tmp/codex-pr-v4-target cargo +1.93.0 test -p
codex-app-server --test all
'suite::v2::realtime_conversation::realtime_webrtc_start_emits_sdp_notification'
--manifest-path /tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml
-- --exact`
2026-04-16 14:30:57 -07:00
Abhinav
d9c71d41a9 Add OTEL metrics for hook runs (#18026)
# Why
We already emit analytics for completed hook runs, but we don't have
matching OTEL metrics to track hook volume and latency.

# What
- add `codex.hooks.run` and `codex.hooks.run.duration_ms`
- tag both metrics with `hook_name`, `source`, and `status`
- emit the metrics from the completed hook path

Verified locally against a dummy OTLP collector

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-16 21:30:38 +00:00
Adrian
55c3de75cb Register agent tasks behind use_agent_identity (#17387)
## Summary

Stack PR3 for feature-gated agent identity support.

This PR adds per-thread agent task registration behind
`features.use_agent_identity`. Tasks are minted on the first real user
turn and cached in thread runtime state for later turns.

## Stack

- PR1: https://github.com/openai/codex/pull/17385 - add
`features.use_agent_identity`
- PR2: https://github.com/openai/codex/pull/17386 - register agent
identities when enabled
- PR3: https://github.com/openai/codex/pull/17387 - this PR, original
task registration slice
- PR3.1: https://github.com/openai/codex/pull/17978 - persist and
prewarm registered tasks per thread
- PR4: https://github.com/openai/codex/pull/17980 - use `AgentAssertion`
downstream when enabled

## Validation

Covered as part of the local stack validation pass:

- `just fmt`
- `cargo test -p codex-core --lib agent_identity`
- `cargo test -p codex-core --lib agent_assertion`
- `cargo test -p codex-core --lib websocket_agent_task`
- `cargo test -p codex-api api_bridge`
- `cargo build -p codex-cli --bin codex`

## Notes

The full local app-server E2E path is still being debugged after PR
creation. The current branch stack is directionally ready for review
while that follow-up continues.
2026-04-16 14:30:02 -07:00
pakrym-oai
0708cc78cb [codex] Split codex op handlers (#18200)
Start splitting the codex.rs
2026-04-16 14:21:29 -07:00
starr-openai
3905f72891 Throttle Windows Bazel test concurrency (#18192)
## Summary
- cap the Windows Bazel test lane at `--jobs=8` to reduce local runner
pressure
- keep Linux and macOS Bazel test concurrency unchanged
- make failed-test log tailing resolve `bazel-testlogs` with the same CI
config and Windows host-platform context as the failed invocation
- prefer Bazel-reported `test.log` paths and normalize Windows path
separators before tailing

## Context
The Windows Bazel workflow currently uses `ci-windows`, which does not
inherit the remote executor config. This means the lane runs the `//...`
test suite locally and otherwise falls back to the repo-wide `common
--jobs=30`. The new Windows-only override is intended to reduce local
executor pressure without changing coverage.

## Validation
Not run locally; this is a CI workflow change and the draft PR is
intended to exercise the GitHub Actions lane directly.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-16 14:16:15 -07:00
bxie-openai
37bf42d5d5 [codex] Make realtime startup context truncation deterministic (#18172)
## Summary

- remove the final whole-blob truncation pass from realtime
startup-context assembly
- enforce fixed per-section budgets, including each section heading
- keep the existing per-section caps and raise the overall realtime
startup-context budget to `5300`, matching the sum of those section
budgets
- add focused tests for the new wrapping and section-budget behavior

## Why

The previous flow truncated each section and then middle-truncated the
final combined startup-context blob again. Small input changes could
shift that combined cut point, which made retained context unstable and
caused nondeterministic tests.

## Impact

Startup context now preserves section boundaries and ordering
deterministically. Each section is still budgeted independently, but the
final assembled blob is no longer truncated again as a single opaque
string. To match that design, the overall startup-context token budget
is updated to the sum of the existing section budgets rather than
lowering the section caps.

## Validation

- `cargo +1.93.0 test -p codex-core realtime_context`
- `cargo +1.93.0 test -p codex-core --test all
suite::realtime_conversation::conversation_start_injects_startup_context_from_thread_history
-- --exact`
- `cargo +1.93.0 test -p codex-core --test all
suite::realtime_conversation::conversation_startup_context_current_thread_selects_many_turns_by_budget
-- --exact`
- `cargo +1.93.0 test -p codex-core --test all
suite::realtime_conversation::conversation_startup_context_falls_back_to_workspace_map
-- --exact`
- `cargo +1.93.0 test -p codex-core --test all
suite::realtime_conversation::conversation_startup_context_is_truncated_and_sent_once_per_start
-- --exact`
2026-04-16 13:51:43 -07:00
Felipe Coury
ec8d4bfc77 fix(app-server): replay token usage after resume and fork (#18023)
## Problem

When a user resumed or forked a session, the TUI could render the
restored thread history immediately, but it did not receive token usage
until a later model turn emitted a fresh usage event. That left the
context/status UI blank or stale during the exact window where the user
expects resumed state to look complete. Core already reconstructed token
usage from the rollout; the missing behavior was app-server lifecycle
replay to the client that just attached.

## Mental model

Token usage has two representations. The rollout is the durable source
of historical `TokenCount` events, and the core session cache is the
in-memory snapshot reconstructed from that rollout on resume or fork.
App-server v2 clients do not read core state directly; they learn about
usage through `thread/tokenUsage/updated`. The fix keeps those roles
separate: core exposes the restored `TokenUsageInfo`, and app-server
sends one targeted notification after a successful `thread/resume` or
`thread/fork` response when that restored snapshot exists.

This notification is not a new model event. It is a replay of
already-persisted state for the client that just attached. That
distinction matters because using the normal core event path here would
risk duplicating `TokenCount` entries in the rollout and making future
resumes count historical usage twice.

## Non-goals

This change does not add a new protocol method or payload shape. It
reuses the existing v2 `thread/tokenUsage/updated` notification and the
TUI’s existing handler for that notification.

This change does not alter how token usage is computed, accumulated,
compacted, or written during turns. It only exposes the token usage that
resume and fork reconstruction already restored.

This change does not broadcast historical usage replay to every
subscribed client. The replay is intentionally scoped to the connection
that requested resume or fork so already-attached clients are not
surprised by an old usage update while they may be rendering live
activity.

## Tradeoffs

Sending the usage notification after the JSON-RPC response preserves a
clear lifecycle order: the client first receives the thread object, then
receives restored usage for that thread. The tradeoff is that usage is
still a notification rather than part of the `thread/resume` or
`thread/fork` response. That keeps the protocol shape stable and avoids
duplicating usage fields across response types, but clients must
continue listening for notifications after receiving the response.

The helper selects the latest non-in-progress turn id for the replayed
usage notification. This is conservative because restored usage belongs
to completed persisted accounting, not to newly attached in-flight work.
The fallback to the last turn preserves a stable wire payload for
unusual histories, but histories with no meaningful completed turn still
have a weak attribution story.

## Architecture

Core already seeds `Session` token state from the last persisted rollout
`TokenCount` during `InitialHistory::Resumed` and
`InitialHistory::Forked`. The new core accessor exposes the complete
`TokenUsageInfo` through `CodexThread` without giving app-server direct
session mutation authority.

App-server calls that accessor from three lifecycle paths: cold
`thread/resume`, running-thread resume/rejoin, and `thread/fork`. In
each path, the server sends the normal response first, then calls a
shared helper that converts core usage into
`ThreadTokenUsageUpdatedNotification` and sends it only to the
requesting connection.

The tests build fake rollouts with a user turn plus a persisted token
usage event. They then exercise `thread/resume` and `thread/fork`
without starting another model turn, proving that restored usage arrives
before any next-turn token event could be produced.

## Observability

The primary debug path is the app-server JSON-RPC stream. After
`thread/resume` or `thread/fork`, a client should see the response
followed by `thread/tokenUsage/updated` when the source rollout includes
token usage. If the notification is absent, check whether the rollout
contains an `event_msg` payload of type `token_count`, whether core
reconstruction seeded `Session::token_usage_info`, and whether the
connection stayed attached long enough to receive the targeted
notification.

The notification is sent through the existing
`OutgoingMessageSender::send_server_notification_to_connections` path,
so existing app-server tracing around server notifications still
applies. Because this is a replay, not a model turn event, debugging
should start at the resume/fork handlers rather than the turn event
translation in `bespoke_event_handling`.

## Tests

The focused regression coverage is `cargo test -p codex-app-server
emits_restored_token_usage`, which covers both resume and fork. The core
reconstruction guard is `cargo test -p codex-core
record_initial_history_seeds_token_info_from_rollout`.

Formatting and lint/fix passes were run with `just fmt`, `just fix -p
codex-core`, and `just fix -p codex-app-server`. Full crate test runs
surfaced pre-existing unrelated failures in command execution and plugin
marketplace tests; the new token usage tests passed in focused runs and
within the app-server suite before the unrelated command execution
failure.
2026-04-16 17:29:34 -03:00
Michael Bolin
ea34c6ed8d fix: fix clippy issue in examples/ folder (#18184)
I believe this use of `expect()` was introduced in
https://github.com/openai/codex/pull/17826, but was not flagged by CI.
Though I did see it in the diagnostics panel in VS Code, so it's worth
cleaning up.

I guess our current CI does include `examples/` when running Clippy?
2026-04-16 12:48:31 -07:00
Abhinav
8720b7bdce Add codex_hook_run analytics event (#17996)
# Why
Add product analytics for hook handler executions so we can understand
which hooks are running, where they came from, and whether they
completed, failed, stopped, or blocked work.

# What
- add the new `codex_hook_run` analytics event and payload plumbing in
`codex-rs/analytics`
- emit hook-run analytics from the shared hook completion path in
`codex-rs/core`
- classify hook source from the loaded hook path as `system`, `user`,
`project`, or `unknown`

```
{
  "event_type": "codex_hook_run",
  "event_params": {
    "thread_id": "string",
    "turn_id": "string",
    "model_slug": "string",
    "hook_name": "string, // any HookEventName
    "hook_source": "system | user | project | unknown",
    "status": "completed | failed | stopped | blocked"
  }
}
```

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-16 19:43:16 +00:00
starr-openai
62847e7554 Make thread unsubscribe test deterministic (#18000)
## Summary
- replace the unsubscribe-during-turn test's sleep/polling flow with a
gated streaming SSE response
- add request-count notification support to the streaming SSE test
server so the test can wait for the in-flight Responses request
deterministically

## Scope
- codex-rs/app-server/tests/suite/v2/thread_unsubscribe.rs
- codex-rs/core/tests/common/streaming_sse.rs

## Validation
- Not run locally; this is a narrow extraction from the prior CI-green
branch.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-16 19:34:04 +00:00
Michael Bolin
dfff8a7d03 fix: drop lock earlier; was held across send_event().await unnecessarily (#18178)
This was flagged by the Codex Security tool: the `state` lock was held
longer than necessary, which included being held across an `async` call,
increasing the potential for deadlock.

While this was flagged by the Codex Security tool, I will look into
enabling
https://rust-lang.github.io/rust-clippy/stable/index.html#await_holding_lock
in a follow-up PR (though unfortunately, that Clippy rule claims it
reports false positives when `drop()` is used to drop a guard instead of
using the end of block scope to drop). Though I can't seem to find a
Clippy rule that checks for opportunities to drop a guard as soon as it
is no longer referenced, in general.
2026-04-16 19:29:25 +00:00
Matthew Zeng
71174574ad Add server-level approval defaults for custom MCP servers (#17843)
## Summary
- Add `default_tools_approval_mode` support for custom MCP server
configs, matching the existing `codex_apps` behavior
- Apply approval precedence as per-tool override, then server default,
then `auto`
- Update config serialization, CLI display, schema generation, docs, and
tests

## Testing
- `cargo check -p codex-config`
- `cargo check -p codex-core`
- `just write-config-schema`
- `just fmt`
- `cargo test -p codex-config`
- Targeted `codex-core` tests for config parsing, config writes, and MCP
approval precedence
- `just fix -p codex-config -p codex-core`
2026-04-16 18:18:07 +00:00
pakrym-oai
206dd13c32 Move more connector logic into connectors crate (#18158)
Reduce the size of core
2026-04-16 11:16:44 -07:00
pakrym-oai
ab97c9aaad Refactor AGENTS.md discovery into AgentsMdManager (#18035)
Encapsulate Agents MD processing a bit and drop user_instructions_path
from config.
2026-04-16 10:51:33 -07:00
xli-oai
faf48489f3 Auto-upgrade configured marketplaces (#17425)
## Summary
- Add best-effort auto-upgrade for user-configured Git marketplaces
recorded in `config.toml`.
- Track the last activated Git revision with `last_revision` so
unchanged marketplace sources skip clone work.
- Trigger the upgrade from plugin startup and `plugin/list`, while
preserving existing fail-open plugin behavior with warning logs rather
than new user-visible errors.

## Details
- Remote configured marketplaces use `git ls-remote` to compare the
source/ref against the recorded revision.
- Upgrades clone into a staging directory, validate that
`.agents/plugins/marketplace.json` exists and that the manifest name
matches the configured marketplace key, then atomically activate the new
root.
- Local `.agents/plugins/marketplace.json` marketplaces remain live
filesystem state and are not auto-pulled.
- Existing non-curated plugin cache refresh is kicked after successful
marketplace root upgrades.

## Validation
- `just write-config-schema`
- `cargo test -p codex-core marketplace_upgrade`
- `cargo check -p codex-cli -p codex-app-server`
- `just fix -p codex-core`

Did not run the complete `cargo test` suite because the repo
instructions require asking before a full core workspace run.
2026-04-16 10:36:34 -07:00
alexsong-oai
109b22a8d0 Improve external agent plugin migration for configured marketplaces (#18055) 2026-04-16 17:34:38 +00:00
viyatb-oai
6862b9c745 feat(permissions): add glob deny-read policy support (#15979)
## Summary
- adds first-class filesystem policy entries for deny-read glob patterns
- parses config such as :project_roots { "**/*.env" = "none" } into
pattern entries
- enforces deny-read patterns in direct read/list helpers
- fails closed for sandbox execution until platform backends enforce
glob patterns in #18096
- preserves split filesystem policy in turn context only when it cannot
be reconstructed from legacy sandbox policy

## Stack
1. This PR - glob deny-read policy/config/direct-tool support
2. #18096 - macOS and Linux sandbox enforcement
3. #17740 - managed deny-read requirements

## Verification
- just fmt
- cargo check -p codex-core -p codex-sandboxing --tests

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-16 10:31:51 -07:00
Eric Traut
ff9744fd66 Avoid fatal TUI errors on skills list failure (#18061)
Addresses #17951

Problem: The TUI treated skills/list failures as fatal during refresh,
so proxy/firewall responses that break plugin discovery could crash the
session.

Solution: Route startup and refresh skills/list responses through shared
graceful handling that logs a warning and keeps the TUI running.
2026-04-16 10:30:28 -07:00
Ahmed Ibrahim
2ca270d08d [2/8] Support piped stdin in exec process API (#18086)
## Summary
- Add an explicit stdin mode to process/start.
- Keep normal non-interactive exec stdin closed while allowing
pipe-backed processes.

## Stack
```text
o  #18027 [8/8] Fail exec client operations after disconnect
│
o  #18025 [7/8] Cover MCP stdio tests with executor placement
│
o  #18089 [6/8] Wire remote MCP stdio through executor
│
o  #18088 [5/8] Add executor process transport for MCP stdio
│
o  #18087 [4/8] Abstract MCP stdio server launching
│
o  #18020 [3/8] Add pushed exec process events
│
@  #18086 [2/8] Support piped stdin in exec process API
│
o  #18085 [1/8] Add MCP server environment config
│
o  main
```

Co-authored-by: Codex <noreply@openai.com>
2026-04-16 10:30:10 -07:00
Tom
6e72f0dbfd [codex] Add remote thread store implementation (#17826)
- Add a "remote" thread store implementation
- Implement the remote thread store as a thin wrapper that makes grpc
calls to a configurable service endpoint
- Implement only the thread/list method to start
- Encode the grpc method/param shape as protobufs in the remote
implementation

A wart: the proto generation script is an "example" binary target. This
is an example target only because Cargo lets examples use
dev-dependencies, which keeps tonic-prost-build out of the normal
codex-thread-store dependency surface. A regular bin would either need
to add proto generation deps as normal runtime deps, or use a
feature-gated optional dep, which this repo’s manifest checks explicitly
reject.
2026-04-16 10:15:31 -07:00
jif-oai
baaf42b2e4 fix: model menu pop (#18154)
Fix the `/model` menu looping on itself
2026-04-16 18:02:02 +01:00