6403 Commits

Author SHA1 Message Date
Michael Bolin
1d595488ff Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads no longer accept arbitrary PermissionProfile or SandboxPolicy replacements; permissions requests select a server-known profile id and apply the resolved server-owned profile together with active profile metadata. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 23:15:10 -07:00
viyatb-oai
46f30d0282 feat(sandbox): add Windows deny-read parity (#18202)
## Why

The split filesystem policy stack already supports exact and glob
`access = none` read restrictions on macOS and Linux. Windows still
needed subprocess handling for those deny-read policies without claiming
enforcement from a backend that cannot provide it.

## Key finding

The unelevated restricted-token backend cannot safely enforce deny-read
overlays. Its `WRITE_RESTRICTED` token model is authoritative for write
checks, not read denials, so this PR intentionally fails that backend
closed when deny-read overrides are present instead of claiming
unsupported enforcement.

## What changed

This PR adds the Windows deny-read enforcement layer and makes the
backend split explicit:

- Resolves Windows deny-read filesystem policy entries into concrete ACL
targets.
- Preserves exact missing paths so they can be materialized and denied
before an enforceable sandboxed process starts.
- Snapshot-expands existing glob matches into ACL targets for Windows
subprocess enforcement.
- Honors `glob_scan_max_depth` when expanding Windows deny-read globs.
- Plans both the configured lexical path and the canonical target for
existing paths so reparse-point aliases are covered.
- Threads deny-read overrides through the elevated/logon-user Windows
sandbox backend and unified exec.
- Applies elevated deny-read ACLs synchronously before command launch
rather than delegating them to the background read-grant helper.
- Reconciles persistent deny-read ACEs per sandbox principal so policy
changes do not leave stale deny-read ACLs behind.
- Fails closed on the unelevated restricted-token backend when deny-read
overrides are present, because its `WRITE_RESTRICTED` token model is not
authoritative for read denials.

## Landed prerequisites

These prerequisite PRs are already on `main`:

1. #15979 `feat(permissions): add glob deny-read policy support`
2. #18096 `feat(sandbox): add glob deny-read platform enforcement`
3. #17740 `feat(config): support managed deny-read requirements`

This PR targets `main` directly and contains only the Windows deny-read
enforcement layer.

## Implementation notes

- Exact deny-read paths remain enforceable on the elevated path even
when they do not exist yet: Windows materializes the missing path before
applying the deny ACE, so the sandboxed command cannot create and read
it during the same run.
- Existing exact deny paths are preserved lexically until the ACL
planner, which then adds the canonical target as a second ACL target
when needed. That keeps both the configured alias and the resolved
object covered.
- Windows ACLs do not consume Codex glob syntax directly, so glob
deny-read entries are expanded to the concrete matches that exist before
process launch.
- Glob traversal deduplicates directory visits within each pattern walk
to avoid cycles, without collapsing distinct lexical roots that happen
to resolve to the same target.
- Persistent deny-read ACL state is keyed by sandbox principal SID, so
cleanup only removes ACEs owned by the same backend principal.
- Deny-read ACEs are fail-closed on the elevated path: setup aborts if
mandatory deny-read ACL application fails.
- Unelevated restricted-token sessions reject deny-read overrides early
instead of running with a silently unenforceable read policy.

## Verification

- `cargo test -p codex-core
windows_restricted_token_rejects_unreadable_split_carveouts`
- `just fmt`
- `just fix -p codex-core`
- `just fix -p codex-windows-sandbox`
- GitHub Actions rerun is in progress on the pushed head.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-11 23:04:28 -07:00
pakrym-oai
c9e46ed639 [codex] Make handlers own parallel tool support (#22254)
## Why

`ToolRouter::tool_supports_parallel()` was still consulting configured
specs when a handler lookup missed, even though parallel schedulability
is really a property of the executable handler. Keeping that metadata on
`ConfiguredToolSpec` duplicated state between the model-visible spec
layer and the runtime handler layer.

This change makes handlers the sole source of truth for parallel tool
support and removes the extra spec wrapper that only existed to carry
duplicated metadata.

## What changed

- removed `ConfiguredToolSpec` and store plain `ToolSpec` values in the
registry/router builder path
- changed `ToolRouter::tool_supports_parallel()` to consult only the
handler registry and fall back to `false`
- simplified spec collection and test helpers to operate directly on
`ToolSpec`
- updated router/spec tests to cover handler-owned parallel behavior and
the no-handler fallback

## Validation

- `cargo test -p codex-tools`
- `cargo test -p codex-core mcp_parallel_support_uses_handler_data`
- `cargo test -p codex-core
deferred_responses_api_tool_serializes_with_defer_loading`
- `cargo test -p codex-core
tools_without_handlers_do_not_support_parallel`
- `cargo test -p codex-core
request_plugin_install_can_be_registered_without_search_tool`

## Docs

No documentation updates needed.
2026-05-11 22:26:33 -07:00
pakrym-oai
79c65f816c [codex] Filter legacy warning messages during compaction (#22243)
## Why

Older sessions can contain model-warning records persisted as `user`
messages, including the unified exec process-limit warning, the
`apply_patch`-via-`exec_command` warning, and the model-mismatch
high-risk cyber fallback warning. Those warnings are no longer produced
as conversation history items, but when old sessions compact they should
still be recognized as injected context rather than preserved as real
user turns.

## What changed

- Removed `record_model_warning` and the production paths that emitted
these warning messages into conversation history.
- Added `LegacyUnifiedExecProcessLimitWarning`,
`LegacyApplyPatchExecCommandWarning`, and `LegacyModelMismatchWarning`
contextual fragments that are used only for matching old persisted
messages.
- Registered the legacy fragments with contextual user message detection
so compaction filters them through the existing fragment path.
- Added focused compaction coverage for old warning messages being
dropped during compacted-history processing.

## Testing

- `cargo test -p codex-core warning`
- `just fix -p codex-core`
2026-05-11 19:51:51 -07:00
Abhinav
d08906a944 Support PreToolUse updatedInput rewrites (#20527)
## Why

`PreToolUse` already exposes `updatedInput` in its hook output schema,
but Codex currently rejects it instead of applying the rewrite. That
leaves hook authors unable to make the documented pre-execution
adjustment to a tool call before it runs.

## What

- Accept `updatedInput` from `PreToolUse` hooks when paired with
`permissionDecision: "allow"`.
- Apply the rewritten input before dispatch so the tool executes the
updated payload, not the original one.
- Preserve the stable hook-facing compatibility shapes that
participating tool handlers expose:
- Bash-like tools (`shell`, `container.exec`, `local_shell`,
`shell_command`, `exec_command`) use `{ "command": ... }`.
- `apply_patch` exposes its patch body through the same command-shaped
hook contract.
  - MCP tools expose their JSON argument object directly.
- Keep each participating tool handler responsible for translating
hook-facing `updatedInput` back into its concrete invocation shape.

## Verification

Direct Bash-like rewrite coverage:

- `pre_tool_use_rewrites_shell_before_execution`
- `pre_tool_use_rewrites_container_exec_before_execution`
- `pre_tool_use_rewrites_local_shell_before_execution`
- `pre_tool_use_rewrites_shell_command_before_execution`
- `pre_tool_use_rewrites_exec_command_before_execution`

These cases assert that each supported Bash-like surface runs only the
rewritten command while the hook still observes the original `{
"command": ... }` input.

`pre_tool_use_rewrites_apply_patch_before_execution`

- Model emits one patch.
- Hook swaps in a different patch.
- Asserts only the rewritten file is created, and the hook saw the
original patch.

`pre_tool_use_rewrites_code_mode_nested_exec_command_before_execution`

- Model runs one nested shell command from code mode.
- Hook rewrites it.
- Asserts only the rewritten command runs, and the hook saw the original
nested input.

`pre_tool_use_rewrites_mcp_tool_before_execution`

- Model calls the RMCP echo tool.
- Hook rewrites the MCP arguments.
- Asserts the MCP server receives and returns the rewritten message, not
the original one.
2026-05-11 22:27:24 -04:00
starr-openai
17ed5ad0b0 Apply sandbox context to local view_image reads (#21861)
## Summary
- create a selected-cwd filesystem sandbox context for view_image
metadata and file reads in both local and remote environments
- add a local restricted-profile regression test for the previously
unsandboxed read path

## Validation
- just fmt
- bazel test --bes_backend= --bes_results_url= --test_output=errors
--test_filter=view_image::tests::handle_passes_sandbox_context_for_local_filesystem_reads
//codex-rs/core:core-unit-tests

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-11 18:48:43 -07:00
efrazer-oai
fd24c00b0b feat(skills): default plugin creator to personal share flow (#22221)
## Summary
Plugin creation now defaults to the personal marketplace path and ends
with a readable handoff back into Codex after a marketplace-backed
scaffold.

Before this change, `plugin-creator` centered repo-local marketplace
updates and did not clearly guide the agent to return the user to the
created plugin afterward. This PR updates the bundled system skill so
marketplace-backed scaffolds default to `~/plugins/<plugin-name>` plus
`~/.agents/plugins/marketplace.json`, ask for user intent only when an
existing repo marketplace makes personal vs team scope ambiguous, and
end with named Markdown deeplinks labeled `View <plugin-name>` and
`Share <plugin-name>`.

## What changed
- default marketplace-backed creation to the personal plugin location
- document the explicit repo/team override path for codebases that
should own the plugin entry
- ask personal vs team only when the current Git repo already has
`.agents/plugins/marketplace.json` and the user has not stated scope
- require named Markdown deeplinks after marketplace-backed creation so
the final response returns the user to the exact plugin cleanly
- keep the deeplink targets precise with real absolute `marketplacePath`
and normalized `pluginName` values
- align the bundled prompt, scaffold help text, and marketplace
reference spec with the new default

## Testing
Tests: targeted skill validation, Python compile checks,
personal-default scaffold smoke, repo-override scaffold smoke, and
whitespace checks.
2026-05-11 17:58:48 -07:00
pakrym-oai
ed5944ba1d Simplify MCP tool handler plumbing (#21595)
## Why
The MCP tool path had accumulated a few core-owned special cases: a
dedicated payload variant, resolver plumbing, a legacy `AfterToolUse`
translation path, and a side channel for parallel-call metadata. That
made `ToolRegistry` and the spec builder know more about MCP than they
needed to.

This change moves MCP-specific execution details back onto `ToolInfo`
and `McpHandler` so `codex-core` can treat MCP calls like normal
function calls while still preserving MCP-specific dispatch and
telemetry behavior where it belongs.

## What changed
- removed `resolve_mcp_tool_info`, `ToolPayload::Mcp`, `ToolKind`, and
the remaining registry-side MCP resolver path
- stored MCP routing metadata directly on `McpHandler` and `ToolInfo`,
including `supports_parallel_tool_calls`
- deleted the legacy `AfterToolUse` consumer in `core`, which removes
the need for handler-specific `after_tool_use_payload` implementations
- switched tool-result telemetry to handler-provided tags and kept
MCP-specific dispatch payload construction inside the handler
- simplified tool spec planning/building by passing `ToolInfo` directly
and dropping the direct/deferred MCP wrapper structs and the
parallel-server side table

## Testing
- `cargo check -p codex-core -p codex-mcp -p codex-otel`
- `cargo test -p codex-core
mcp_parallel_support_uses_exact_payload_server`
- `cargo test -p codex-core
direct_mcp_tools_register_namespaced_handlers`
- `cargo test -p codex-core
search_tool_description_lists_each_mcp_source_once`
- `cargo test -p codex-mcp
list_all_tools_uses_startup_snapshot_while_client_is_pending`
- `just fix -p codex-core -p codex-mcp -p codex-otel`
2026-05-12 00:11:31 +00:00
Felipe Coury
e16b4e46d4 fix(tui): handle hidden app git directives (#21946) 2026-05-11 21:08:40 -03:00
Ruslan Nigmatullin
95d8669ab2 [exec-server] serve websocket listener via HTTP upgrade (#21963)
## Why

`codex exec-server` should keep the existing public `ws://IP:PORT` URL
shape while serving that websocket connection through an HTTP upgrade
path internally. That keeps the client-facing configuration simple and
allows the listener to work through intermediate HTTP-aware
infrastructure.

## What changed

- keep the emitted and configured exec-server URL as `ws://IP:PORT`
- serve that websocket endpoint through Axum HTTP upgrade handling on
`/`
- expose `GET /readyz` from the same listener for readiness checks
- route upgraded Axum websocket streams through the shared JSON-RPC
connection machinery
- initialize the rustls crypto provider before websocket client
connections
- preserve inbound binary websocket JSON-RPC parsing for compatibility
with the prior transport behavior

## Verification

- `cargo test -p codex-exec-server --test health --test process --test
websocket --test initialize --test exec_process`
2026-05-11 17:04:21 -07:00
Matthew Zeng
e15ecc9c35 Add production startup and TTFT telemetry (#22198)
## Why

While investigating `codex exec hi` startup latency, the useful
questions were not "is startup slow?" but "which durable bucket is slow
in production?"

The path we observed has a few distinct stages:

1. `thread/start` creates the session
2. startup prewarm builds the turn context, tools, and prompt
3. startup prewarm warms the websocket
4. the first real turn resolves the prewarm
5. the model produces the first token

Before this PR, production telemetry had some of the raw measurements
already:

- aggregate startup-prewarm duration / age-at-first-turn metrics
- TTFT as a metric
- websocket request telemetry

But there was no coherent production event stream for the startup
breakdown itself, and TTFT was metric-only. That made it hard to answer
the same latency questions from OpenTelemetry-backed logs without adding
one-off local instrumentation.

## What changed

Add durable production telemetry on the existing `SessionTelemetry`
path:

- new `codex.startup_phase` OTel log/trace events plus
`codex.startup.phase.duration_ms`
- new `codex.turn_ttft` OTel log/trace events while preserving the
existing TTFT metric

The startup phase event is emitted for the coarse buckets we actually
observed while running `exec hi`:

- `thread_start_create_thread`
- `startup_prewarm_total`
- `startup_prewarm_create_turn_context`
- `startup_prewarm_build_tools`
- `startup_prewarm_build_prompt`
- `startup_prewarm_websocket_warmup`
- `startup_prewarm_resolve`

These phases are intentionally low-cardinality so they remain safe as
production telemetry tags.

## Why this shape

This keeps the instrumentation on the same production path as the rest
of the session telemetry instead of adding a local debug-only trace
mode. It also avoids changing startup behavior:

- prewarm still runs
- no control flow changes
- no extra remote calls
- no user-visible behavior changes

One boundary is intentional: very early process bootstrap that happens
before a session exists is not included here, because this PR uses
session-scoped production telemetry. The expensive buckets we were
trying to understand after `thread/start` are now covered durably.

## Verification

- `cargo test -p codex-otel`
- `cargo test -p codex-core turn_timing`
- `cargo test -p codex-core
regular_turn_emits_turn_started_without_waiting_for_startup_prewarm`
- `cargo test -p codex-core
interrupting_regular_turn_waiting_on_startup_prewarm_emits_turn_aborted`
- `cargo test -p codex-app-server thread_start`
- `just fix -p codex-otel -p codex-core -p codex-app-server`

I also ran `cargo test -p codex-core`; it built successfully and then
hit an existing unrelated stack overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.
2026-05-11 23:58:36 +00:00
starr-openai
22e84c49d0 Support multi-environment apply_patch selection (#21617)
## Summary
- add multi-environment apply_patch routing for both freeform and
function-call tool flows
- parse and reconcile the optional environment selector in the main
apply_patch parser, then verify against the selected environment in the
handler
- carry environment_id through runtime and approval surfaces so
remote-targeted patches stay explicit end to end

## Testing
- just fmt
- remote exec-server e2e: `cargo test -p codex-core --test all
apply_patch_multi_environment_uses_remote_executor -- --nocapture` on
dev via `scripts/test-remote-env.sh`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-11 16:33:44 -07:00
alexsong-oai
bb6134c028 Stop uploading accepted line fingerprints (#22180)
## Summary
- keep accepted-line diff parsing and fingerprint hashing logic locally
- stop uploading path/line hash fingerprints in the accepted-line
analytics event payload
- keep aggregate accepted added/deleted line counts in the event

## Testing
- just fmt
- cargo test -p codex-analytics
- just fix -p codex-analytics
2026-05-11 15:41:38 -07:00
Owen Lin
4859d80ffe Update codex remote-control to start the daemon (#22218)
## Why
Update `codex remote-control` to use the new app server daemon commands
instead.
- if the updater loop is not running, bootstrap the daemon with remote
control enabled (`codex app-server daemon bootstrap --remote-control`)
- otherwise, enable the persisted remote-control setting and start the
daemon normally
2026-05-11 15:38:30 -07:00
Abhinav
9ab7f4e6ac Add Windows hook command overrides (#22159)
# Why

Managed hook configs need a shared cross-platform shape without making
the existing `command` field polymorphic. The common case is still one
command string, with Windows needing a different entrypoint only when
the runtime is actually Windows.

Keeping `command` as the portable/default path and adding an optional
Windows override keeps the config easier to read, preserves the existing
scalar shape for non-Windows users, and avoids forcing every caller into
a `{ unix, windows }` object when only one platform needs special
handling.

# What

- Add optional `command_windows` / `commandWindows` alongside the
existing hook `command` field.
- Resolve `command_windows` only on Windows during hook discovery; other
platforms continue to use `command` unchanged.
- Keep trust hashing aligned to the effective command selected for the
current runtime.

# Docs

The Codex hooks/config reference should document `command_windows` as
the Windows-only override for command hooks.
2026-05-11 22:22:29 +00:00
rhan-oai
a175ddacc0 [codex-analytics] emit terminal review events (#18748)
## Why

Review telemetry should describe reviews as first-class events, not only
as counters denormalized onto terminal tool-item events. That lets us
analyze guardian and user reviews consistently across command execution,
file changes, permissions, and network access, while still preserving
the terminal item summaries that existing tool analytics need.

To make those review events accurate, analytics also needs the observed
completion time for each review and enough command metadata to
distinguish `shell` from `unified_exec` reviews.

## What changed

- emit generic `codex_review_event` rows for completed user and guardian
reviews, with review subjects, reviewer, trigger, terminal status,
resolution, and observed duration
- reduce approval request / response / abort facts into review events
for command execution, file change, and permissions flows
- keep denormalized review counts, final approval outcome, and
permission-request flags on terminal tool-item events for
item-associated reviews
- plumb review completion timing so user-review responses and aborts use
app-server-observed completion times, while guardian analytics reuse the
same terminal timestamps emitted on guardian assessment events
- carry command approval `source` through the protocol and app-server
layers so review analytics can distinguish `shell` from `unified_exec`
- add analytics coverage for user-review emission, guardian-review
emission, permission reviews that should not denormalize onto tool
items, item-summary isolation across threads, and the serialized
review-event shape

## Verification

- `cargo test -p codex-analytics`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18748).
* __->__ #18748
* #21434
* #18747
* #17090
* #17089
* #20514
2026-05-11 22:13:32 +00:00
Ahmed Ibrahim
aa9e8f0262 [8/8] Add Python SDK Ruff formatting (#22021)
## Why

The Python SDK needs the same tight formatter/lint loop as the rest of
the repo: a safe Ruff autofix pass, Ruff formatting, editor save
behavior, and CI checks that catch drift. Without that loop, SDK changes
can land with formatting or import ordering that differs from what
reviewers and CI expect.

## What

- Add Ruff configuration to `sdk/python/pyproject.toml`, excluding
generated protocol code and notebooks from the normal lint/format pass.
- Update `just fmt` so it still formats Rust and also runs Python SDK
Ruff autofix and formatting.
- Add Python SDK CI steps for `ruff check` and `ruff format --check`
before pytest.
- Recommend the Ruff VS Code extension and enable Python
format/fix/organize-on-save so Cmd+S uses the same tooling.
- Apply the resulting Ruff formatting to SDK Python files, examples, and
the checked-in generated `v2_all.py` output emitted by the pinned
generator.
- Add a guard test for the `just fmt` recipe so it keeps working from
both Rust and Python SDK working directories.

## Stack

1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. This PR `[8/8]` Add Python SDK Ruff formatting

## Verification

- Added `test_root_fmt_recipe_formats_rust_and_python_sdk` for the
shared format recipe.
- Ran `just fmt` after the recipe update.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-12 01:10:29 +03:00
Ahmed Ibrahim
3e10e09e24 [7/8] Add Python SDK app-server integration harness (#22014)
## Why

The SDK had behavioral tests that replaced SDK client internals. Those
tests could catch wrapper mistakes, but they did not prove the pinned
app-server runtime, generated notification models, request routing, and
sync/async public clients worked together.

This PR adds deterministic integration coverage that starts the pinned
`codex app-server` process and mocks only the upstream Responses HTTP
boundary.

## What

- Add `AppServerHarness` and `MockResponsesServer` helpers for isolated
`CODEX_HOME`, mock-provider config, queued SSE responses, and captured
`/v1/responses` requests.
- Add shared helpers for SSE construction, stream assertions,
approval-policy inspection, and image fixtures.
- Split integration coverage into focused modules for run behavior,
inputs, streaming, turn controls, approvals, and thread lifecycle.
- Cover sync and async `Thread.run`, `TurnHandle.stream`, interleaved
streams, approval-mode persistence, lifecycle helpers, final-answer
phase handling, image inputs, loaded skill input injection, steering,
interruption, listing, history reads, run overrides, and token usage
mapping.
- Replace public-wrapper tests that duplicated integration-test behavior
with lower-level client tests only where direct client behavior is the
thing under test.

## Stack

1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. This PR `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting

## Verification

- Added pinned app-server integration tests under
`sdk/python/tests/test_app_server_*.py` and
`test_real_app_server_integration.py`.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-12 01:06:41 +03:00
Ahmed Ibrahim
2b90c37069 [6/8] Add high-level Python SDK approval mode (#21910)
## Why

The high-level SDK should expose the approval behavior it actually
supports instead of leaking generated app-server routing fields. New
work should have two clear choices: default auto review, or explicitly
deny escalated permission requests. Existing threads and subsequent
turns should preserve their current approval behavior unless the caller
passes an override.

## What

- Add the public `ApprovalMode` enum with `auto_review` and `deny_all`.
- Default new thread creation to `ApprovalMode.auto_review`.
- Preserve existing approval settings by default for resume, fork, run,
and turn helpers.
- Remove raw `approval_policy` / `approvals_reviewer` kwargs from
high-level SDK wrappers.
- Update generated wrapper output, docs, examples, notebooks, and tests
for the high-level approval mode API.

## Stack

1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. This PR `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting

## Verification

- Added approval-mode mapping/default tests for new threads, existing
threads, forks, resumes, and subsequent turns.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-12 01:02:43 +03:00
Ahmed Ibrahim
f1b84fac63 [5/8] Rename Python SDK package to openai-codex (#21905)
## Why

The SDK should publish under the reserved public distribution name
`openai-codex`, and its import module should match that name in the
Python style. Since package names can contain hyphens but import modules
cannot, the public import path becomes `openai_codex`.

Keeping the rename separate from the public API surface change makes the
naming change easy to review and avoids mixing it with API curation.

## What

- Rename the SDK distribution from `openai-codex-app-server-sdk` to
`openai-codex`.
- Rename the import package from `codex_app_server` to `openai_codex`.
- Keep the runtime wheel as the separate `openai-codex-cli-bin`
dependency.
- Update docs, examples, notebooks, artifact scripts, lockfile metadata,
and tests for the new distribution/module names.

## Stack

1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. This PR `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting

## Verification

- Updated package metadata and public API tests to assert the
distribution and import names.

Co-authored-by: Codex <noreply@openai.com>
2026-05-12 00:59:25 +03:00
Ahmed Ibrahim
b4bc02439f [4/8] Define Python SDK public API surface (#21896)
## Why

The SDK package root should be the ergonomic public client API, not a
dump of every generated app-server schema type. Generated models still
need a supported import path, but callers should be able to tell which
names are high-level SDK entrypoints and which names are protocol value
models.

## What

- Define a curated root `__all__` for clients, handles, input helpers,
retry helpers, config, and public errors.
- Add a `types` module as the supported home for generated app-server
response, event, enum, and helper models.
- Update docs and examples to import protocol/value models from the type
module.
- Add tests that lock root exports, type-module exports, star-import
behavior, and example import hygiene.

## Stack

1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. This PR `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting

## Verification

- Added public API signature tests for root exports, `types` exports,
and example imports.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-12 00:57:44 +03:00
Ahmed Ibrahim
3e2936dd0e [3/8] Run Python SDK tests in CI (#21895)
## Why

The Python SDK stack now depends on packaging metadata, pinned runtime
wheels, generated artifacts, async behavior, and stream interleaving.
Those checks need to run in CI so future changes cannot bypass the SDK
test suite.

## What

- Add a dedicated `python-sdk` job to `.github/workflows/sdk.yml`.
- Run the job in `python:3.12-alpine` so dependency resolution exercises
the pinned musl runtime wheel.
- Keep the Python SDK test job parallel to the existing SDK job instead
of serializing the full workflow.

## Stack

1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. This PR `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting

## Verification

- The added workflow job installs the SDK with `uv sync --extra dev
--frozen` and runs the Python SDK pytest suite.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-12 00:53:36 +03:00
Ahmed Ibrahim
6a4653efc8 [2/8] Generate Python SDK types from pinned runtime (#21893)
## Why

Once the SDK declares its runtime package, generated Python artifacts
should come from that pinned runtime rather than whatever app-server
schema happens to be in the current checkout. That keeps the generated
API and model surface aligned with the runtime users install.

## What

- Teach `scripts/update_sdk_artifacts.py generate-types` to invoke the
pinned runtime package for schema generation.
- Regenerate `v2_all.py`, `notification_registry.py`, and generated
public wrapper methods from that schema.
- Add freshness coverage so regenerating from the pinned runtime must
leave checked-in artifacts unchanged.

## Stack

1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. This PR `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting

## Verification

- Added `test_generated_files_are_up_to_date` for pinned-runtime
generation drift.
- Added generator-structure tests for schema annotation and notification
metadata generation.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-12 00:53:21 +03:00
Ahmed Ibrahim
5fe33443b0 [1/8] Pin Python SDK runtime dependency (#21891)
## Why

The Python SDK depends on the app-server runtime package for the bundled
`codex` binary and schema source of truth. That relationship should be
explicit in package metadata instead of inferred from matching version
numbers, so installers, lockfiles, and reviewers can see exactly which
runtime the SDK expects.

## What

- Declare `openai-codex-cli-bin==0.131.0a4` as a Python SDK dependency.
- Update runtime setup helpers to resolve the runtime version from the
declared dependency pin.
- Refresh the SDK lockfile for the pinned runtime wheel.
- Update package/runtime tests and docs that describe where the runtime
version comes from.

## Stack

1. This PR `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting

## Verification

- Added coverage for the SDK runtime dependency pin and runtime
distribution naming.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-12 00:42:26 +03:00
viyatb-oai
c7b55cdc46 feat: add network proxy feature flag (#20147)
## Why

The permissions migration is making
`permissions.<profile>.network.enabled` the canonical sandbox network
bit, while proxy startup is a separate concern. Enabling network access
should not implicitly start the proxy, and users who are still on legacy
sandbox modes need a separate place to opt into proxy startup and
provide proxy-specific settings.

This follow-up to #19900 gives the network proxy its own feature surface
instead of overloading permission-profile network semantics.

## What changed

- Add an experimental `network_proxy` feature with a configurable
`[features.network_proxy]` table.
- Overlay `features.network_proxy` settings onto the configured proxy
state after permission-profile selection, so the proxy only starts when
the active `NetworkSandboxPolicy` already allows network access.
- Preserve `[experimental_network]` startup behavior independently of
the new feature flag.

## Behavior and examples

There are now three related knobs:

- `permissions.<profile>.network.enabled` controls whether the active
permission profile has network access at all.
- `features.network_proxy` enables proxy restrictions for an
already-network-enabled profile.
- Legacy `sandbox_mode` plus `[sandbox_workspace_write].network_access`
still control whether legacy `workspace-write` has network access at
all.

The rule is:

- network off + proxy flag on -> network stays off, proxy is a no-op
- network on + proxy flag off -> unrestricted direct network
- network on + proxy flag on -> network stays on, with proxy
restrictions applied

For permission profiles, the feature toggle adds proxy restrictions only
when network access is already enabled:

```toml
default_permissions = "workspace"

[permissions.workspace.filesystem]
":minimal" = "read"

[permissions.workspace.network]
enabled = true

[features]
network_proxy = true
```

If `network.enabled = false`, the same feature flag is a no-op: network
remains off and the proxy does not start.

For legacy sandbox config, `network_access` remains the master switch:

```toml
sandbox_mode = "workspace-write"

[sandbox_workspace_write]
network_access = true

[features]
network_proxy = true
```

That keeps legacy `workspace-write` network access on, but routes it
through the proxy policy. If `network_access = false`, the proxy feature
is a no-op and legacy `workspace-write` remains offline.

The same proxy opt-in can be supplied from the CLI:

```bash
codex -c 'features.network_proxy=true'
```

Additional proxy settings can be supplied when a table is needed:

```bash
codex \
  -c 'features.network_proxy.enabled=true' \
  -c 'features.network_proxy.enable_socks5=false'
```

The intended behavior matrix is:

| Config surface | Network setting | `features.network_proxy` | Direct
sandbox network | Proxy |
| --- | --- | --- | --- | --- |
| Permission profile | `network.enabled = false` | off | restricted |
off |
| Permission profile | `network.enabled = false` | on | restricted | off
|
| Permission profile | `network.enabled = true` | off | enabled | off |
| Permission profile | `network.enabled = true` | on | enabled | on |
| Legacy `workspace-write` | `network_access = false` | off | restricted
| off |
| Legacy `workspace-write` | `network_access = false` | on | restricted
| off |
| Legacy `workspace-write` | `network_access = true` | off | enabled |
off |
| Legacy `workspace-write` | `network_access = true` | on | enabled | on
|

`[experimental_network]` requirements remain separate from the user
feature toggle and still start the proxy on their own.

Relevant code:
-
[`features/src/feature_configs.rs`](https://github.com/openai/codex/blob/43785aff47/codex-rs/features/src/feature_configs.rs#L58-L117)
defines the feature-specific proxy config.
-
[`core/src/config/mod.rs`](https://github.com/openai/codex/blob/43785aff47/codex-rs/core/src/config/mod.rs#L1959-L1964)
reads the feature table, and [later applies it only when network access
is already
enabled](https://github.com/openai/codex/blob/43785aff47/codex-rs/core/src/config/mod.rs#L2448-L2458).

## Verification

Added focused coverage for:
- keeping the proxy off when `features.network_proxy` is enabled but
sandbox network access is disabled
- the full permission-profile and legacy `workspace-write` matrix above
- preserving `[experimental_network]` startup without the feature
- reusing profile-supplied proxy settings when the feature is enabled

Ran:
- `cargo test -p codex-features`
- `cargo test -p codex-core network_proxy_feature`
- `cargo test -p codex-core
experimental_network_requirements_enable_proxy_without_feature`
2026-05-11 14:12:00 -07:00
cooper-oai
54ec99cb54 [login] revoke superseded auth tokens on relogin (#21747)
## Summary
- revoke previously stored managed ChatGPT tokens after a successful
re-login
- keep the new login successful even when revocation is unavailable or
fails
- cover the shared persistence path used by browser and device-code
login flows

## Why
A new `codex login` currently overwrites existing managed ChatGPT
credentials without attempting to revoke the superseded tokens, leaving
old credentials valid longer than necessary.

## Validation
- `just fmt`
- `CARGO_HOME=/tmp/cargo-home cargo test -p codex-login`

## Notes
- Initial local Cargo validation hit a corrupt existing crate cache in
the default `CARGO_HOME`; rerunning with a clean temporary `CARGO_HOME`
passed.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-11 13:36:46 -07:00
Ruslan Nigmatullin
e3f481da98 daemon: refresh updater after validated binary rollout (#21853)
## Why

`bootstrap` starts a detached pid-backed updater loop, but before this
change that updater could keep running an old executable image even
after `install.sh` replaced the managed standalone binary under
`CODEX_HOME`. That left the updater itself behind the binary it had just
rolled out, especially when the app-server was stopped or when the
managed binary changed without a version-string change.

## What changed

- Track updater identity from the executable contents rather than only
the reported CLI version.
- Force the managed app-server restart path when the managed binary
contents differ from the running updater image, then re-exec the updater
from the managed binary once the rollout is in a safe state.
- Distinguish a genuinely absent managed app-server from a managed
process that exists but is not yet probeable, so self-refresh does not
skip a required restart.
- Keep the restart/re-exec decision under the daemon operation lock so
`bootstrap` cannot race the handoff.
- Update `app-server-daemon/README.md` to document the resulting
standalone and out-of-band update behavior.

## Verification

- `cargo test -p codex-app-server-daemon`
- `just fix -p codex-app-server-daemon`

Added focused unit coverage for:
- content-based updater refresh decisions
- safe updater re-exec outcomes across restart states
2026-05-11 12:37:10 -07:00
Felipe Coury
99b98aece6 config: accept minus in TUI keymap config (#22192)
## Summary

Fixes #22128.

The `/keymap` flow already persists the `-` key as `minus`, and the
runtime keymap parser already accepts that spelling. `codex-config` was
the missing leg: it rejected `minus` during config deserialization, so a
binding saved by Codex could fail on the next startup or config reload.

## What Changed

- Accept `minus` as a valid canonical key name in `tui.keymap` config
normalization.
- Update the config validation message so its supported-key list
includes `minus`.
- Add regression coverage that deserializes both `minus` and `alt-minus`
under `[tui.keymap.global]` and verifies the normalized config shape.

## How to Test

1. Start Codex TUI.
2. Run `/keymap`.
3. Assign the `-` key to an action and save the change.
4. Restart Codex or reload the config.
5. Confirm the config loads normally and the saved binding remains
usable instead of failing on `minus`.
6. As a focused regression check, repeat with a modifier form such as
`alt--` captured through `/keymap`, which persists as `alt-minus` and
should also reload successfully.

Targeted tests:
- `cargo test -p codex-config`
2026-05-11 16:34:33 -03:00
Matthew Zeng
192481d1a1 [elicitation] Advertise new url elicitation capability when auth_elicitation is enabled. (#22188)
## Why

We've added support for auth elicitation behind the auth_elicitation
flag, but servers need to explicitly check the capability before it
decides to send elicitations in order to be backward compatible. This PR
adds the capability advertising conditioned on the flag.

## What changed

- Build `client_elicitation_capability` from the `AuthElicitation`
feature state.
- Thread that capability through MCP config, session startup, and
`McpConnectionManager` so RMCP initialization advertises the correct
elicitation support.
- Advertise both `form` and `url` elicitation when the feature is
enabled, and preserve the empty default capability when it is disabled.
- Add coverage for the feature-derived config shape and the advertised
initialization payload.

## Testing

- `cargo test -p codex-mcp`
- `cargo test -p codex-core
to_mcp_config_preserves_auth_elicitation_feature_from_config`
- `cargo test -p codex-core` *(currently fails outside this change in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`
with a stack overflow after unrelated tests have started running)*
2026-05-11 12:23:55 -07:00
viyatb-oai
d0fa2d81d8 feat(connectors): support managed app tool approval requirements (#21061)
## Why

Managed requirements can already centrally disable apps, but they could
not express the per-tool app approval rules that normal config already
supports. That left admins without a way to enforce connector tool
approvals through `/etc/codex/requirements.toml` or cloud requirements.

## What changed

- Extend app requirements with per-tool `approval_mode` entries.
- Merge managed app tool requirements across managed sources while
preserving higher-precedence exact tool settings.
- Apply managed tool approvals separately from user app config so
managed policy is matched only on raw MCP `tool.name`, while user config
keeps the existing raw-name-then-title convenience fallback.
- Add coverage for local requirements, cloud requirements parsing,
managed-over-user precedence, and a title-collision case that must not
widen managed auto-approval.

## Configuration shape

Local `/etc/codex/requirements.toml` and cloud requirements use the same
TOML shape:

```toml
[apps.connector_123123.tools."calendar/list_events"]
approval_mode = "approve"
```

This is a per-tool approval rule keyed by app ID and raw MCP tool name,
not an app-level boolean such as `apps.connector_123123.approve = true`.
2026-05-11 19:08:26 +00:00
viyatb-oai
6506765168 fix(permissions): preserve managed deny-read during escalation (#15977)
## Why

Managed filesystem `deny_read` requirements are administrator-enforced
restrictions on specific paths. Once those requirements are active,
Codex should not drop them just because an execution path would
otherwise leave the sandbox.

Before this change, an explicit escalation, a prefix-rule allow, a
sandbox-denial retry, or an app-server legacy sandbox override could
rebuild the runtime policy without those managed read-deny entries and
expose a path the administrator had marked unreadable.

This is narrower than general sandbox-mode constraints. If an enterprise
only sets `allowed_sandbox_modes`, a trusted `prefix_rule(..., decision
= "allow")` can still run its matching command unsandboxed; this PR only
preserves managed filesystem `deny_read` restrictions across those
paths.

## What Changed

- Mark filesystem policies built from managed `deny_read` requirements
so callers can tell when those deny entries must survive escalation.
- Preserve managed deny-read entries when runtime permission profiles
are rebuilt through protocol, app-server, or legacy sandbox-policy
compatibility paths.
- Keep managed deny-read attempts inside the selected sandbox on the
first attempt and after sandbox-denial retries.
- Preserve the same behavior in the zsh-fork escalation path, including
prefix-rule-driven escalation.
- Add a regression test showing the opposite case too: without managed
deny-read, a prefix-rule allow still chooses unsandboxed execution.

## Verification

Targeted automated verification:

```shell
cargo test -p codex-core shell_request_escalation_execution_is_explicit -- --nocapture
cargo test -p codex-core prefix_rule_uses_unsandboxed_execution_without_managed_deny_read -- --nocapture
cargo test -p codex-core prefix_rule_preserves_managed_deny_read_escalation -- --nocapture
cargo test -p codex-protocol permission_profile_round_trip_preserves_filesystem_policy_metadata -- --nocapture
cargo test -p codex-protocol preserving_deny_entries_keeps_unrestricted_policy_enforceable -- --nocapture
cargo test -p codex-app-server-protocol permission_profile_file_system_permissions_preserves_policy_metadata -- --nocapture
cargo check -p codex-app-server -p codex-tui
```

Smoke-test invocations:

```shell
# macOS exact deny + allowed control
codex exec --skip-git-repo-check -C "$ROOT" \
  -c 'default_permissions="deny_read_smoke"' \
  -c 'permissions.deny_read_smoke.filesystem={":minimal"="read",":project_roots"={"."="write","secrets"="none","future-secret"="none","**/*.env"="none"}}' \
  'Run shell commands only. Print the contents of allowed.txt. Then test whether reading secrets/exact-secret.txt succeeds without printing that file if it does. End with exactly two lines: allowed=<contents> and exact_secret=<BLOCKED or READABLE>.'

# Linux exact deny + allowed control
codex exec --skip-git-repo-check -C "$ROOT" \
  -c 'default_permissions="deny_read_smoke"' \
  -c 'permissions.deny_read_smoke.filesystem={":minimal"="read",glob_scan_max_depth=3,":project_roots"={"."="write","secrets"="none","future-secret"="none","**/*.env"="none"}}' \
  'Run shell commands only. Print the contents of allowed.txt. Then test whether reading secrets/exact-secret.txt succeeds without printing that file if it does. End with exactly two lines: allowed=<contents> and exact_secret=<BLOCKED or READABLE>.'
```

Observed manual smoke matrix:

| Case | macOS Seatbelt | Linux bubblewrap |
| --- | --- | --- |
| `cat allowed.txt` | Pass | Pass |
| `cat secrets/exact-secret.txt` | Blocked | Blocked |
| `cat envs/root.env` | Blocked | Blocked |
| `cat envs/nested/one.env` | Blocked | Blocked |
| `cat envs/nested/two.env` | Blocked | Blocked |
| `cat alias-to-secrets/exact-secret.txt` | Blocked | Blocked |
| Missing denied path | A file created after sandbox setup remained
unreadable | Creation was blocked by the reserved missing-path
placeholder, and the placeholder was cleaned up after exit |
| Real `codex exec` shell turn | Pass | Pass |

Notes:

- The Linux smoke run used the fallback glob walker because the devbox
did not have `rg` installed.
- The smoke matrix verifies the end-to-end filesystem behavior on macOS
and Linux; the escalation-specific behavior is covered by the focused
tests above.

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Charlie Marsh <charliemarsh@openai.com>
2026-05-11 11:49:44 -07:00
Owen Lin
7bddb3083d fix(app-server): thread history redaction for remote clients (#22178)
## Summary

Remote clients can still receive large `thread/resume` histories when
prior turns include MCP tool call payloads or image-generation results.
This adds a temporary response-only redaction path for the known remote
client names.

Longer term we will move towards fully paginated APIs backed by SQLite.

## Changes

- Redact MCP tool call payload-bearing fields in `thread/resume`
responses for `codex_chatgpt_android_remote` and
`codex_chatgpt_ios_remote`.
- Drop `imageGeneration` items from those `thread/resume` responses.
- Keep redaction out of persisted rollout files, `thread/read`,
`thread/turns/list`, live notifications, and token usage replay.
- Cover the behavior with app-server helper tests and a v2 resume
integration test that checks both remote clients plus a non-target
control client.

## Testing

- `cargo test -p codex-app-server thread_resume_redaction`
- `cargo test -p codex-app-server
thread_resume_redacts_payloads_for_chatgpt_remote_clients`
2026-05-11 11:45:25 -07:00
Felipe Coury
90bd445e7f fix(exec-server): suppress Windows taskkill output (#22058)
## Summary

This is the `exec-server` follow-up to #21759.

#21759 fixed the Windows `taskkill` output leak for the `rmcp-client`
MCP teardown path, but #22050 showed that `exec-server` still had a
parallel `taskkill /T /F` cleanup path in
`exec-server/src/connection.rs`. Because that command inherited the
parent stdio handles, Windows could still print `SUCCESS:` lines into
the user's terminal during stdio child cleanup.

This change silences that remaining `exec-server` callsite by
redirecting `taskkill` stdin, stdout, and stderr to `Stdio::null()`.

## What Changed

- add a Windows-only `Stdio` import in `exec-server/src/connection.rs`
- redirect the `taskkill` command in `kill_windows_process_tree` to
`Stdio::null()` for stdin, stdout, and stderr
- keep the existing kill semantics unchanged by still checking
`.status()` and preserving the existing fallback/logging behavior

## How to Test

Manual validation is Windows-only, so I did not run the UI repro path
locally here.

1. On Windows, use a Codex build from this branch.
2. Exercise an `exec-server` stdio flow that spawns a child process tree
and then triggers transport cleanup.
3. Confirm the child process tree is still torn down.
4. Confirm the terminal no longer shows `SUCCESS: The process with PID
... has been terminated.` lines during cleanup.

Targeted tests:
- `cargo test -p codex-exec-server
client::tests::dropping_stdio_client_terminates_spawned_process --
--exact`
- `cargo test -p codex-exec-server
client::tests::malformed_stdio_message_terminates_spawned_process --
--exact`

Notes:
- `cargo test -p codex-exec-server` still hits unrelated local macOS
`sandbox-exec: sandbox_apply: Operation not permitted` failures in
`tests/file_system.rs`.

## References

- Fixes the remaining callsite discussed in #22050
- Related earlier fix: #21759
2026-05-11 15:40:56 -03:00
Dylan Hurd
e783dab44c fix(exec-policy) use is_known_safe_command less (#20305)
## Summary
Restricts behavior of `is_known_safe_command` only to modes where it is
explicitly part of the documented behavior:
- when `environment_lacks_sandbox_protections`
- in `AskForApproval::UnlessTrusted`

Notably, as a result of this, escalations for commands that pass
`is_known_safe_commands` are no longer auto-approved in
AskForApproval::OnRequest or AskForApproval::Granular.

## Testing
- [x] Updated unit tests
- [x] Updated approvals scenario tests.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-11 11:37:53 -07:00
canvrno-oai
eaf05c9002 Unified mentions in TUI (#19068)
This PR replaces the TUI’s file-only `@mention` popup with a unified
mentions experience. Typing `@...` now searches across filesystem
matches, installed plugins, and skills in one popup, with result types
clearly labeled and selectable from the same flow.

- Adds a unified `@mentions` popup that returns:
  - plugins
  - skills
  - files
  - directories

- Adds search modes so users can narrow the popup without changing their
query:
  - All Results _(default/same as Codex App)_
  - Filesystem Only
  - Plugins _(...and skills)_

- Preserves existing insertion behavior:
  - selected file paths are inserted into the prompt
  - paths with spaces are quoted
  - image file selections still attach as images when possible
  - selecting a plugin or skill inserts the corresponding `$name`
- the composer records the canonical mention binding, such as
`plugin://...` or the skill path

- Expanded `@mentions` rendering:
  - type tags for Plugin, Skill, File, and Dir
  - distinct plugin/filesystem colors
  - stable fixed-height layout (8 rows)
  - truncation behavior for narrow terminals

Note:
- The unified mentions popup does not display app connectors under
`@mention` results for Codex App parity. Connector mentions remain
available through the existing `$mention` path.


https://github.com/user-attachments/assets/f93781ed-57d3-4cb5-9972-675bc5f3ef3f
2026-05-11 11:34:52 -07:00
jif-oai
b401666ca5 Add process-scoped SQLite telemetry (#22154)
## Summary
- add SQLite init, backfill-gate, and fallback telemetry without
introducing a cross-cutting state-db access wrapper
- install one process-scoped telemetry sink after OTEL startup and let
low-level state/rollout paths emit through it directly
- add process-start metrics for the process owners that initialize
SQLite

---------

Co-authored-by: Owen Lin <owen@openai.com>
2026-05-11 11:32:40 -07:00
rhan-oai
cf6342b75b [codex-analytics] add turn tool counts to turn events (#21431)
## Summary
- accumulate completed tool-item counts per turn from the item lifecycle
- populate the reserved count fields on `codex_turn_event`
- add reducer coverage for zero-count turns and mixed completed tool
items

## Why
PR #17090 moved tool-item analytics onto the item lifecycle, so the turn
reducer can now derive the per-turn tool counts from the same completed
items instead of leaving the reserved fields null.

## Validation
- `just fmt`
- `cargo test -p codex-analytics`
2026-05-11 18:18:02 +00:00
Won Park
0dbad2a348 Make auto-review denial short-circuit use a rolling review window (#22110)
## Why

Long-running turns can accumulate enough denied auto-review decisions to
trip the global short-circuit even when those denials are spread far
apart. The breaker should still stop genuinely bad loops, but it should
judge recent behavior instead of lifetime turn history.

## What changed

- Replaced the lifetime `10 total denials` threshold with `10 denials in
the last 50 reviews`.
- Kept the existing `3 consecutive denials` interrupt behavior
unchanged.
- Tracked recent auto-review outcomes in the circuit breaker and updated
the warning copy to report the rolling-window count.
- Renamed the new rolling-window coverage to `auto_review_*` test names.
- Added coverage that confirms older denials fall out of the 50-review
window and no longer trigger the breaker.

## Validation

- `just fmt`
- `cargo test -p codex-core guardian_rejection_circuit_breaker --lib`
- `cargo test -p codex-core auto_review_rejection_circuit_breaker --lib`
2026-05-11 11:03:11 -07:00
Eric Traut
1e65b3e0af Fix goal update and add /goal edit command in TUI (#21954)
## Why

Users have requested the ability to edit a goal's objective after a goal
has been created. This PR exposes a new `/goal edit` command in the TUI
to address this request.

In the process of implementing this, I also noticed an existing bug in
the goal runtime. When a goal's objective is updated through the
`thread/goal/set` app server API, the goal runtime didn't emit a new
steering prompt to tell the agent about the new objective. This PR also
fixes this hole.

## What Changed

- Adds `/goal edit` in the TUI, opening an edit box prefilled with the
current goal objective.
- Keeps active and paused goals in their current state, resets completed
goals to active, keeps budget-limited goals budget-limited, and
preserves the existing token budget.
- Changes the existing `thread/goal/set` behavior so editing an
objective preserves goal accounting instead of resetting it. The older
reset-on-new-objective behavior was left over from before
`thread/goal/clear`; clients that need to reset accounting can now clear
the existing goal and create a new one.
- Reuses the existing goal set API path; this does not add or change
app-server protocol surface area.
- Adds a dedicated goal runtime steering prompt when an externally
persisted goal mutation changes the objective, so active turns receive
the updated objective.

## Validation

- Make sure `/goal edit` returns an error if no goal currently exists
- Make sure `/goal edit` displays an edit box that can be optionally
canceled with no side effects
- Make sure that an edited goal results in a steer so the agent starts
pursuing the new objective
- Make sure the new objective is reflected in the goal if you use
`/goal` to display the goal summary
- Make sure that `/goal edit` doesn't reset the token budget, time/token
accounting on the updated goal
2026-05-11 10:49:19 -07:00
jif-oai
32b1ae7099 chore: drop built-in MCPs (#22173)
Drop something that was never used
2026-05-11 19:45:08 +02:00
Ruslan Nigmatullin
a124ddb854 app-server: remove TCP websocket listener (#21843)
## Why

The app-server no longer needs to expose a TCP websocket listener.
Keeping that transport also kept around a separate listener/auth surface
that is unnecessary now that local clients can use stdio or the
Unix-domain control socket, while remote connectivity is handled by
`remote_control`.

## What Changed

- Removed `ws://IP:PORT` parsing and the `AppServerTransport::WebSocket`
startup path.
- Deleted the app-server websocket listener auth module and removed
related CLI flags/dependencies.
- Kept websocket framing only where it is still needed: over the
Unix-domain control socket and in the outbound `remote_control`
connection.
- Updated app-server CLI/help text and `app-server/README.md` to
document only `stdio://`, `unix://`, `unix://PATH`, and `off` for local
transports.
- Converted affected app-server integration coverage from TCP websocket
listeners to UDS-backed websocket connections, and added a parse test
that rejects `ws://` listen URLs.
- Removed the now-unused workspace `constant_time_eq` dependency and
refreshed `Cargo.lock` after `cargo shear` caught the drift.
- Moved test app-server UDS socket paths to short Unix temp paths so
macOS Bazel test sandboxes do not exceed Unix socket path limits.

## Verification

- Added/updated tests around UDS websocket transport behavior and
`ws://` listen URL rejection.
- `cargo shear`
- `cargo metadata --no-deps --format-version 1`
- `cargo test -p codex-app-server unix_socket_transport`
- `cargo test -p codex-app-server unix_socket_disconnect`
- `just fix -p codex-app-server`
- `git diff --check`

Local full Rust test execution was blocked before compilation by an
external fetch failure for the pinned `nornagon/crossterm` git
dependency. `just bazel-lock-update` and `just bazel-lock-check` were
retried after the manifest cleanup but remain blocked by external
BuildBuddy/V8 fetch timeouts.
2026-05-11 10:17:26 -07:00
Eric Traut
f10ddc3f13 Use goal preview metadata for goal-first threads (#21981)
Fixes #20792

## Why

`/goal`-first threads are valid resumable threads, but they can be
missing from `codex resume` and app recents because discovery depends on
metadata derived from a normal first user message.

PR #21489 attempted to fix this by using the goal objective as
`first_user_message`. Review feedback pointed out that
`first_user_message` does more than provide visible text today: it gates
listing, supplies preview text, and participates in deciding whether a
later title should surface as a distinct thread name. Reusing it for the
goal objective could leave a `/goal`-first thread with
`first_user_message=<goal>` and `title=<later prompt>`, even though the
goal should only provide the initial visible preview.

This PR follows that feedback by and keeps the `first_user_message` as
is but introduces a new `preview` field to separate concerns. The
`preview` field is populated from the first user message or the goal
objective. We can extend it in the future to include other sources.

## What Changed

- Added internal thread `preview` metadata in `codex-state`, including a
SQLite migration that backfills from `first_user_message` and from
existing `thread_goals` objectives when needed.
- Treated `ThreadGoalUpdated` as preview-bearing metadata so goal-first
threads can be listed and searched without mutating
`first_user_message`.
- Updated rollout listing, state queries, thread-store conversion, and
app-server mapping to use preview metadata while continuing to expose
the existing public `preview` field.
- Preserved title/name distinctness behavior around literal
`first_user_message`, so a later normal prompt after `/goal` does not
surface as a separate name just because the goal supplied the initial
preview.
- Preserved compatibility for older/internal metadata writes by deriving
preview from `first_user_message` when explicit preview metadata is
absent.

## Verification

- Manually verified that a thread that starts with a `/goal <objective>`
shows up in the resume picker.
2026-05-11 10:12:46 -07:00
Eric Traut
96836e15ed Improve goal continuation based on feedback (#22045)
## Summary

This PR updates the goal continuation prompt to address feedback from
early adopters. There are two primary changes:

1. Goal continuation and budget-limit steering prompts now use hidden
user-context messages instead of hidden developer messages.
2. The goal continuation prompt is refined to improve the model's
ability to fully complete the active goal rather than stop at a smaller
or merely passing subset.

The user-message transition is important for two reasons. First, it
eliminates an issue where older steering messages could be responded to
again after a new turn. Second, it works better with compaction because
user messages are treated differently from developer messages during
compaction.

The prompt refinements make persistence explicit, ground work in current
evidence, encourage `update_plan` for multi-step progress visibility,
and require stronger completion audits before calling `update_goal`. It
also removes the elapsed-time reporting in the prompt; I saw evidence
that this was causing the model to shortcut work as it became nervous
about time.

These changes were tested with evals. Chriss4123 has also been running
independent evals in
[#19910](https://github.com/openai/codex/issues/19910), and many of the
improvements in this PR were suggested by him.

## Verification

- Tested with evals.
- Added and updated focused `codex-core` coverage for hidden goal user
context, continuation and budget-limit request shape, prompt rendering,
and objective delimiter escaping.
2026-05-11 09:51:21 -07:00
Eric Traut
c03eb20d8d Fix side conversation config inheritance (#22106)
Addresses #22101

## Why

Side conversations are ephemeral forks of the active thread, but `/side`
was building its fork config from the app-level config after refreshing
it from disk. If the parent thread had runtime settings that differed
from the current persisted defaults, such as a changed model, reasoning
effort, permissions, reviewer, or fast-mode selection, the side
conversation could start with different behavior than its parent.

## What changed

- Build side fork config from the active parent `ChatWidget` config,
then overlay the parent thread's effective model, reasoning effort,
service tier, and fast-mode opt-out state.
- Forward model reasoning summary, verbosity, personality, web search
mode, and service-tier overrides through TUI app-server
start/resume/fork lifecycle params.
- Add focused tests for parent runtime inheritance, side developer
guardrail preservation, and lifecycle param forwarding.
2026-05-11 09:47:51 -07:00
Ahmed Ibrahim
69f3183a8e Revert "[codex] Harden overflow auto-compaction recovery" (#22170)
Reverts openai/codex#22141
2026-05-11 19:33:15 +03:00
Ahmed Ibrahim
15e79f3c26 [codex] Harden overflow auto-compaction recovery (#22141)
## Why
Dogfooder feedback exposed two correctness gaps in normal-loop overflow
recovery:

1. a sampling request that hit `ContextWindowExceeded` could keep
re-entering auto-compaction indefinitely if the compacted retry still
did not fit, and
2. local compact-history rebuilds flattened user messages down to text,
so an overflowing `[image, "what is this?"]` turn could be retried
without the image after compaction.

That means recovery could either fail to terminate cleanly or proceed
with a materially weakened version of the user request.

## What changed
- Move normal-loop `ContextWindowExceeded` handling into the sampling
retry loop, so successful rescue compaction consumes the provider retry
budget instead of creating an unbounded outer-turn loop.
- Keep compacted user-history rebuilds structured:
`collect_user_messages` now carries user `UserInput` content rather than
flattened strings, and `build_compacted_history` reconstructs full user
messages from that structured representation.
- Preserve image inputs while retaining the existing text-budget
truncation behavior for compacted user history.
- Preserve existing compaction-task failure handling and client-session
reset behavior while bounding repeated overflow retries.
- Add focused regression coverage for:
  - recovery after a normal-loop overflow,
  - retry-budget exhaustion after repeated overflow,
  - local recovery preserving image + text input,
  - remote recovery preserving image + text input,
  - remote compaction v2 preserving image + text input, and
  - compaction failure still terminating cleanly.

The main behavior changes are in `codex-rs/core/src/session/turn.rs` and
`codex-rs/core/src/compact.rs`.

## Verification
- Not run locally; relying on PR CI for this update.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-11 16:16:49 +00:00
Eric Traut
2229c8daf2 Persist /goal commands in history (#21860)
## Summary

A user reported that `/goal` was not saved to the TUI command history,
which made it unavailable for later recall even though other accepted
input paths persist history entries.

This updates the TUI goal slash-command dispatch so successful `/goal`
invocations append the command text to message history. The change
covers the bare `/goal` menu command, goal control commands such as
`/goal pause`, and objective-setting commands such as `/goal improve
benchmark coverage`.

## Verification

- `cargo test -p codex-tui goal_slash_command -- --nocapture`
2026-05-11 08:43:55 -07:00
Andrey Mishchenko
704ad620f6 Add x-codex-ws-stream-request-start-ms (#22113)
For capturing client-side timing information.
2026-05-11 08:15:52 -07:00
jif-oai
8e12c12a07 feat: move extensions tool (#22163)
This PR is just moving stuff around
2026-05-11 17:14:43 +02:00
jif-oai
672cc1f669 feat: wire extension tool bundles into core (#22147)
## Why

This is the next narrow step toward moving concrete tool families out of
core. After #22138 introduced `codex-tool-api`, we still needed a real
end-to-end seam that lets an extension own an executable tool definition
once and have core install it without the temporary `extension-api`
wrapper or a dependency on `codex-tools`.

`codex-tool-api` is the small extension-facing execution contract, while
`codex-tools` still has a different job: host-side shared tool metadata
and planning logic that is not “run this contributed tool”, like spec
shaping, namespaces, discovery, code-mode augmentation, and
MCP/dynamic-to-Responses API conversion

## What changed

- Moved the shared leaf tool-spec and JSON Schema types into
`codex-tool-api`, so the executable contract now lives with
[`ToolBundle`](c538758095/codex-rs/tool-api/src/bundle.rs (L19-L70)).
- Replaced the temporary extension-side tool wrapper with direct
`ToolBundle` use in `codex-extension-api`.
- Taught core to collect contributed bundles, include them in spec
planning, register them through
[`ToolRegistryBuilder::register_tool_bundle`](c538758095/codex-rs/core/src/tools/registry.rs (L653-L667)),
and dispatch them through the existing router/runtime path.
- Added focused coverage for contributed tools becoming model-visible
and dispatchable, plus spec-planning coverage for contributed function
and freeform tools.

## Verification

- Added `extension_tool_bundles_are_model_visible_and_dispatchable` in
`core/src/tools/router_tests.rs`.
- Added spec-plan coverage in `core/src/tools/spec_plan_tests.rs` for
contributed extension bundles.

## Related

- Follow-up to #22138
2026-05-11 16:42:29 +02:00