Commit Graph

30 Commits

Author SHA1 Message Date
rhan-oai
5f1e7bf6cf [codex-analytics] ingest server requests and responses 2026-04-29 11:54:06 -07:00
efrazer-oai
2009f6e894 refactor: make auth loading async (#19762)
## Summary

Auth loading used to expose synchronous construction helpers in several
places even though some auth sources now need async work. This PR makes
the auth-loading surface async and updates the callers to await it.

This is intentionally only plumbing. It does not change how
AgentIdentity tokens are decoded, how task runtime ids are allocated, or
how JWT signatures are verified.

## Stack

1. **This PR:** [refactor: make auth loading
async](https://github.com/openai/codex/pull/19762)
2. [refactor: load AgentIdentity runtime
eagerly](https://github.com/openai/codex/pull/19763)
3. [feat: verify AgentIdentity JWTs with
JWKS](https://github.com/openai/codex/pull/19764)

## Important call sites

| Area | Change |
| --- | --- |
| `codex-login` auth loading | `CodexAuth` and `AuthManager`
construction paths now await auth loading. |
| app-server startup | Auth manager construction is awaited during
initialization. |
| CLI/TUI/exec/MCP/chatgpt callers | Existing auth-loading calls now
await the same behavior. |
| cloud requirements storage loader | The loader becomes async so it can
share the same auth construction path. |
| auth tests | Tests that load auth now run in async contexts. |

## Testing

Tests: targeted Rust auth test compilation, formatter, scoped Clippy
fix, and Bazel lock check.
2026-04-27 11:00:27 -07:00
pakrym-oai
9c3abcd46c [codex] Move config loading into codex-config (#19487)
## Why

Config loading had become split across crates: `codex-config` owned the
config types and merge logic, while `codex-core` still owned the loader
that assembled the layer stack. This change consolidates that
responsibility in `codex-config`, so the crate that defines config
behavior also owns how configs are discovered and loaded.

To make that move possible without reintroducing the old dependency
cycle, the shell-environment policy types and helpers that
`codex-exec-server` needs now live in `codex-protocol` instead of
flowing through `codex-config`.

This also makes the migrated loader tests more deterministic on machines
that already have managed or system Codex config installed by letting
tests override the system config and requirements paths instead of
reading the host's `/etc/codex`.

## What Changed

- moved the config loader implementation from `codex-core` into
`codex-config::loader` and deleted the old `core::config_loader` module
instead of leaving a compatibility shim
- moved shell-environment policy types and helpers into
`codex-protocol`, then updated `codex-exec-server` and other downstream
crates to import them from their new home
- updated downstream callers to use loader/config APIs from
`codex-config`
- added test-only loader overrides for system config and requirements
paths so loader-focused tests do not depend on host-managed config state
- cleaned up now-unused dependency entries and platform-specific cfgs
that were surfaced by post-push CI

## Testing

- `cargo test -p codex-config`
- `cargo test -p codex-core config_loader_tests::`
- `cargo test -p codex-protocol -p codex-exec-server -p
codex-cloud-requirements -p codex-rmcp-client --lib`
- `cargo test --lib -p codex-app-server-client -p codex-exec`
- `cargo test --no-run --lib -p codex-app-server`
- `cargo test -p codex-linux-sandbox --lib`
- `cargo shear`
- `just bazel-lock-check`

## Notes

- I did not chase unrelated full-suite failures outside the migrated
loader surface.
- `cargo test -p codex-core --lib` still hits unrelated proxy-sensitive
failures on this machine, and Windows CI still shows unrelated
long-running/timeouting test noise outside the loader migration itself.
2026-04-26 15:10:53 -07:00
Michael Bolin
ac2bffa443 test: harden app-server integration tests (#19683)
## Why

Windows Bazel runs in the permissions stack exposed that app-server
integration tests were launching normal plugin startup warmups in every
subprocess. Those warmups can call
`https://chatgpt.com/backend-api/plugins/featured` when a test is not
specifically exercising plugin startup, which adds slow background work,
noisy stderr, and dependence on external network state. The relevant
startup/featured-plugin behavior was introduced across #15042 and
#15264.

A few app-server tests also had long optional waits or unbounded cleanup
paths, making failures expensive to diagnose and contributing to slow
Windows shards. One external-agent config test from #18246 used a
GitHub-style marketplace source, which was enough to exercise the
pending remote-import path but also meant the background completion task
could attempt a real clone.

## What Changed

- Adds explicit `AppServerRuntimeOptions` / `PluginStartupTasks`
plumbing and a hidden debug-only
`--disable-plugin-startup-tasks-for-tests` app-server flag, so
integration tests can suppress startup plugin warmups without adding a
production env-var gate.
- Has the app-server test harness pass that hidden flag by default,
while opting plugin-startup coverage back in for tests that
intentionally exercise startup sync and featured-plugin warmup behavior.
- Lowers normal app-server subprocess logging from `info`/`debug` to
`warn` to avoid multi-megabyte stderr output in Bazel logs.
- Prevents the external-agent config test from attempting a real
marketplace clone by using an invalid non-local source while still
exercising the pending-import completion path.
- Bounds optional filesystem/realtime waits and fake WebSocket
test-server shutdown so failures produce targeted timeouts instead of
hanging a shard.
- Fixes the Unix script-resolution test in `rmcp-client` to exercise
PATH resolution directly and include the actual spawn error in failures.

## Verification

- `cargo check -p codex-app-server`
- `cargo clippy -p codex-app-server --tests -- -D warnings`
- `cargo test -p codex-rmcp-client
program_resolver::tests::test_unix_executes_script_without_extension`
- `cargo test -p codex-app-server --test all
external_agent_config_import_sends_completion_notification_after_pending_plugins_finish
-- --nocapture`
- `cargo test -p codex-app-server --test all
plugin_list_uses_warmed_featured_plugin_ids_cache_on_first_request --
--nocapture`
- Windows Local Bazel passed with this test-hardening bundle before it
was extracted from #19606.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19683).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19683
2026-04-26 12:43:16 -07:00
Ruslan Nigmatullin
8a0ab3fc13 app-server: add Unix socket transport (#18255)
## Summary
- add unix:// app-server transport backed by the shared codex-uds crate
- reuse the websocket connection loop for axum and tungstenite-backed
streams
- add codex app-server proxy to bridge stdio clients to the control
socket
- tolerate Windows UDS backends that report a missing rendezvous path as
connection refused before binding

## Tests
- cargo test -p codex-app-server
control_socket_acceptor_forwards_websocket_text_messages_and_pings
- cargo test -p codex-app-server
- just fmt
- just fix -p codex-app-server
- git -c core.fsmonitor=false diff --check
2026-04-23 11:09:25 -07:00
Andrei Eternal
2b2de3f38b codex: support hooks in config.toml and requirements.toml (#18893)
## Summary

Support the existing hooks schema in inline TOML so hooks can be
configured from both `config.toml` and enterprise-managed
`requirements.toml` without requiring a separate `hooks.json` payload.

This gives enterprise admins a way to ship managed hook policy through
the existing requirements channel while still leaving script delivery to
MDM or other device-management tooling, and it keeps `hooks.json`
working unchanged for existing users.

This also lays the groundwork for follow-on managed filtering work such
as #15937, while continuing to respect project trust gating from #14718.
It does **not** implement `allow_managed_hooks_only` itself.

NOTE: yes, it's a bit unfortunate that the toml isn't formatted as
closely as normal to our default styling. This is because we're trying
to stay compatible with the spec for plugins/hooks that we'll need to
support & the main usecase here is embedding into requirements.toml

## What changed

- moved the shared hook serde model out of `codex-rs/hooks` into
`codex-rs/config` so the same schema can power `hooks.json`, inline
`config.toml` hooks, and managed `requirements.toml` hooks
- added `hooks` support to both `ConfigToml` and
`ConfigRequirementsToml`, including requirements-side `managed_dir` /
`windows_managed_dir`
- treated requirements-managed hooks as one constrained value via
`Constrained`, so managed hook policy is merged atomically and cannot
drift across requirement sources
- updated hook discovery to load requirements-managed hooks first, then
per-layer `hooks.json`, then per-layer inline TOML hooks, with a warning
when a single layer defines both representations
- threaded managed hook metadata through discovered handlers and exposed
requirements hooks in app-server responses, generated schemas, and
`/debug-config`
- added hook/config coverage in `codex-rs/config`, `codex-rs/hooks`,
`codex-rs/core/src/config_loader/tests.rs`, and
`codex-rs/core/tests/suite/hooks.rs`

## Testing

- `cargo test -p codex-config`
- `cargo test -p codex-hooks`
- `cargo test -p codex-app-server config_api`

## Documentation

Companion updates are needed in the developers website repo for:

- the hooks guide
- the config reference, sample, basic, and advanced pages
- the enterprise managed configuration guide

---------

Co-authored-by: Michael Bolin <mbolin@openai.com>
2026-04-22 21:20:09 -07:00
Michael Bolin
18a26d7bbc app-server: accept permission profile overrides (#18279)
## Why

`PermissionProfile` is becoming the canonical permissions shape shared
by core and app-server. After app-server responses expose the active
profile, clients need to be able to send that same shape back when
starting, resuming, forking, or overriding a turn instead of translating
through the legacy `sandbox`/`sandboxPolicy` shorthands.

This still needs to preserve the existing requirements/platform
enforcement model. A profile-shaped request can be downgraded or
rejected by constraints, but the server should keep the user's
elevated-access intent for project trust decisions. Turn-level profile
overrides also need to retain existing read protections, including
deny-read entries and bounded glob-scan metadata, so a permission
override cannot accidentally drop configured protections such as
`**/*.env = deny`.

## What changed

- Adds optional `permissionProfile` request fields to `thread/start`,
`thread/resume`, `thread/fork`, and `turn/start`.
- Rejects ambiguous requests that specify both `permissionProfile` and
the legacy `sandbox`/`sandboxPolicy` fields, including running-thread
resume requests.
- Converts profile-shaped overrides into core runtime filesystem/network
permissions while continuing to derive the constrained legacy sandbox
projection used by existing execution paths.
- Preserves project-trust intent for profile overrides that are
equivalent to workspace-write or full-access sandbox requests.
- Preserves existing deny-read entries and `globScanMaxDepth` when
applying turn-level `permissionProfile` overrides.
- Updates app-server docs plus generated JSON/TypeScript schema fixtures
and regression coverage.

## Verification

- `cargo test -p codex-app-server-protocol schema_fixtures`
- `cargo test -p codex-core
session_configuration_apply_permission_profile_preserves_existing_deny_read_entries`







---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18279).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* __->__ #18279
2026-04-22 13:34:33 -07:00
starr-openai
1d4cc494c9 Add turn-scoped environment selections (#18416)
## Summary
- add experimental turn/start.environments params for per-turn
environment id + cwd selections
- pass selections through core protocol ops and resolve them with
EnvironmentManager before TurnContext creation
- treat omitted selections as default behavior, empty selections as no
environment, and non-empty selections as first environment/cwd as the
turn primary

## Testing
- ran `just fmt`
- ran `just write-app-server-schema`
- not run: unit tests for this stacked PR

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-21 17:48:33 -07:00
starr-openai
ddbe2536be Support multiple managed environments (#18401)
## Summary
- refactor EnvironmentManager to own keyed environments with
default/local lookup helpers
- keep remote exec-server client creation lazy until exec/fs use
- preserve disabled agent environment access separately from internal
local environment access

## Validation
- not run (per Codex worktree instruction to avoid tests/builds unless
requested)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-21 15:29:35 -07:00
Ruslan Nigmatullin
69c3d12274 app-server: implement device key v2 methods (#18430)
## Why

The device-key protocol needs an app-server implementation that keeps
local key operations behind the same request-processing boundary as
other v2 APIs.

app-server owns request dispatch, transport policy, documentation, and
JSON-RPC error shaping. `codex-device-key` owns key binding, validation,
platform provider selection, and signing mechanics. Keeping the adapter
thin makes the boundary easier to review and avoids moving local
key-management details into thread orchestration code.

## What changed

- Added `DeviceKeyApi` as the app-server adapter around
`DeviceKeyStore`.
- Converted protocol protection policies, payload variants, algorithms,
and protection classes to and from the device-key crate types.
- Encoded SPKI public keys and DER signatures as base64 protocol fields.
- Routed `device/key/create`, `device/key/public`, and `device/key/sign`
through `MessageProcessor`.
- Rejected remote transports before provider access while allowing local
`stdio` and in-process callers to reach the device-key API.
- Added stdio, in-process, and websocket tests for device-key validation
and transport policy.
- Documented the device-key methods in the app-server v2 method list.

## Test coverage

- `device_key_create_rejects_empty_account_user_id`
- `in_process_allows_device_key_requests_to_reach_device_key_api`
- `device_key_methods_are_rejected_over_websocket`

## Stack

This is PR 3 of 4 in the device-key app-server stack. It is stacked on
#18429.

## Validation

- `cargo test -p codex-app-server device_key`
- `just fix -p codex-app-server`
2026-04-21 14:07:08 -07:00
pakrym-oai
ffa6944587 Load app-server config through ConfigManager (#18870)
## Summary
- Load app-server startup config through `ConfigManager` instead of
direct `ConfigBuilder` calls.
- Move `ConfigManager` constructor-owned state (`cli_overrides`, runtime
feature map, cloud requirements loader) behind internal manager fields.
- Pass `ConfigManager` into `MessageProcessor` directly instead of
reconstructing it from raw args.

## Tests
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server`
- `just fix -p codex-app-server`
- `just fmt`
2026-04-21 14:01:02 -07:00
Ruslan Nigmatullin
56375712e3 app-server: fix Bazel clippy in tracing tests (#18872)
## Why

PR #18431 exposed a Bazel clippy failure in the app-server unit-test
target across Linux, macOS, and Windows. The failing lint was
`clippy::await_holding_invalid_type`: two tracing tests serialized
access to global tracing state by holding a `tokio::sync::MutexGuard`
across awaited test work.

That serialization is still needed because the tests share
process-global tracing setup and exporter state, but it should not
require holding an async mutex guard through the whole test body.

## What changed

- Replaced the bespoke async `tracing_test_guard` helper with
`serial_test` on the two tracing tests that need global tracing
serialization.
- Removed the `#[expect(clippy::await_holding_invalid_type)]`
annotations and the lock guard callsites that Bazel clippy rejected.

## Validation

- `cargo test -p codex-app-server jsonrpc_span`
- `just fix -p codex-app-server`
- `git diff --check`

I also attempted the exact failing Bazel clippy target locally with
BuildBuddy disabled: `bazel --noexperimental_remote_repo_contents_cache
build --config=clippy --bes_backend= --remote_cache=
--experimental_remote_downloader= --
//codex-rs/app-server:app-server-unit-tests-bin`. That run did not reach
clippy because Bazel timed out downloading `libcap-2.27.tar.gz` from
`kernel.org`.
2026-04-21 13:10:36 -07:00
Michael Bolin
d62421d322 chore: document intentional await-holding cases (#18423)
## Why

This PR prepares the stack to enable Clippy await-holding lints that
were left disabled in #18178. The mechanical lock-scope cleanup is
handled separately; this PR is the documentation/configuration layer for
the remaining await-across-guard sites.

Without explicit annotations, reviewers and future maintainers cannot
tell whether an await-holding warning is a real concurrency smell or an
intentional serialization boundary.

## What changed

- Configures `clippy.toml` so `await_holding_invalid_type` also covers
`tokio::sync::{MutexGuard,RwLockReadGuard,RwLockWriteGuard}`.
- Adds targeted `#[expect(clippy::await_holding_invalid_type, reason =
...)]` annotations for intentional async guard lifetimes.
- Documents the main categories of intentional cases: active-turn state
transitions that must remain atomic, session-owned MCP manager accesses,
remote-control websocket serialization, JS REPL kernel/process
serialization, OAuth persistence, external bearer token refresh
serialization, and tests that intentionally serialize shared global or
session-owned state.
- For external bearer token refresh, documents the existing
serialization boundary: holding `cached_token` across the provider
command prevents concurrent cache misses from starting duplicate refresh
commands, and the current behavior is small enough that an explicit
expectation is easier to maintain than adding another synchronization
primitive.

## Verification

- `cargo clippy -p codex-login --all-targets`
- `cargo clippy -p codex-connectors --all-targets`
- `cargo clippy -p codex-core --all-targets`
- The follow-up PR #18698 enables `await_holding_invalid_type` and
`await_holding_lock` as workspace `deny` lints, so any undocumented
remaining offender will fail Clippy.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18423).
* #18698
* __->__ #18423
2026-04-20 22:41:54 -07:00
Rasmus Rygaard
7b994100b3 Add session config loader interface (#18208)
## Why

Cloud-hosted sessions need a way for the service that starts or manages
a thread to provide session-owned config without treating all config as
if it came from the same user/project/workspace TOML stack.

The important boundary is ownership: some values should be controlled by
the session/orchestrator, some by the authenticated user, and later some
may come from the executor. The earlier broad config-store shape made
that boundary too fuzzy and overlapped heavily with the existing
filesystem-backed config loader. This PR starts with the smaller piece
we need now: a typed session config loader that can feed the existing
config layer stack while preserving the normal precedence and merge
behavior.

## What Changed

- Added `ThreadConfigLoader` and related typed payloads in
`codex-config`.
- `SessionThreadConfig` currently supports `model_provider`,
`model_providers`, and feature flags.
- `UserThreadConfig` is present as an ownership boundary, but does not
yet add TOML-backed fields.
- `NoopThreadConfigLoader` preserves existing behavior when no external
loader is configured.
  - `StaticThreadConfigLoader` supports tests and simple callers.

- Taught thread config sources to produce ordinary `ConfigLayerEntry`
values so the existing `ConfigLayerStack` remains the place where
precedence and merging happen.

- Wired the loader through `ConfigBuilder`, the config loader, and
app-server startup paths so app-server can provide session-owned config
before deriving a thread config.

- Added coverage for:
  - translating typed thread config into config layers,
- inserting thread config layers into the stack at the right precedence,
- applying session-provided model provider and feature settings when
app-server derives config from thread params.

## Follow-Ups

This intentionally stops short of adding the remote/service transport.
The next pieces are expected to be:

1. Define the proto/API shape for this interface.
2. Add a client implementation that can source session config from the
service side.

## Verification

- Added unit coverage in `codex-config` for the loader and layer
conversion.
- Added `codex-core` config loader coverage for thread config layer
precedence.
- Added app-server coverage that verifies session thread config wins
over request-provided config for model provider and feature settings.
2026-04-20 23:05:49 +00:00
Ruslan Nigmatullin
23d4098c0f app-server: prepare to run initialized rpcs concurrently (#17372)
## Summary

- Refactors `MessageProcessor` and per-connection session state so
initialized service RPC handling can be moved into spawned tasks in a
follow-up PR.
- Shares the processor and initialized session data with
`Arc`/`OnceLock` instead of mutable borrowed connection state.
- Keeps initialized request handling synchronous in this PR; it does
**not** call `tokio::spawn` for service RPCs yet.

## Testing

- `just fmt`
- `cargo test -p codex-app-server` *(fails on existing hardening gaps
covered by #17375, #17376, and #17377; the pipelined config regression
passed before the unrelated failures)*
- `just fix -p codex-app-server`
2026-04-14 11:24:34 -07:00
neil-oai
a92a5085bd Forward app-server turn clientMetadata to Responses (#16009)
## Summary
App-server v2 already receives turn-scoped `clientMetadata`, but the
Rust app-server was dropping it before the outbound Responses request.
This change keeps the fix lightweight by threading that metadata through
the existing turn-metadata path rather than inventing a new transport.

## What we're trying to do and why
We want turn-scoped metadata from the app-server protocol layer,
especially fields like Hermes/GAAS run IDs, to survive all the way to
the actual Responses API request so it is visible in downstream
websocket request logging and analytics.

The specific bug was:
- app-server protocol uses camelCase `clientMetadata`
- Responses transport already has an existing turn metadata carrier:
`x-codex-turn-metadata`
- websocket transport already rewrites that header into
`request.request_body.client_metadata["x-codex-turn-metadata"]`
- but the Rust app-server never parsed or stored `clientMetadata`, so
nothing from the app-server request was making it into that existing
path

This PR fixes that without adding a new header or a second metadata
channel.

## How we did it
### Protocol surface
- Add optional `clientMetadata` to v2 `TurnStartParams` and
`TurnSteerParams`
- Regenerate the JSON schema / TypeScript fixtures
- Update app-server docs to describe the field and its behavior

### Runtime plumbing
- Add a dedicated core op for app-server user input carrying turn-scoped
metadata: `Op::UserInputWithClientMetadata`
- Wire `turn/start` and `turn/steer` through that op / signature path
instead of dropping the metadata at the message-processor boundary
- Store the metadata in `TurnMetadataState`

### Transport behavior
- Reuse the existing serialized `x-codex-turn-metadata` payload
- Merge the new app-server `clientMetadata` into that JSON additively
- Do **not** replace built-in reserved fields already present in the
turn metadata payload
- Keep websocket behavior unchanged at the outer shape level: it still
sends only `client_metadata["x-codex-turn-metadata"]`, but that JSON
string now contains the merged fields
- Keep HTTP fallback behavior unchanged except that the existing
`x-codex-turn-metadata` header now includes the merged fields too

### Request shape before / after
Before, a websocket `response.create` looked like:
```json
{
  "type": "response.create",
  "client_metadata": {
    "x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\"}"
  }
}
```
Even if the app-server caller supplied `clientMetadata`, it was not
represented there.

After, the same request shape is preserved, but the serialized payload
now includes the new turn-scoped fields:
```json
{
  "type": "response.create",
  "client_metadata": {
    "x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\",\"fiber_run_id\":\"fiber-start-123\",\"origin\":\"gaas\"}"
  }
}
```

## Validation
### Targeted tests added / updated
- protocol round-trip coverage for `clientMetadata` on `turn/start` and
`turn/steer`
- protocol round-trip coverage for `Op::UserInputWithClientMetadata`
- `TurnMetadataState` merge test proving client metadata is added
without overwriting reserved built-in fields
- websocket request-shape test proving outbound `response.create`
contains merged metadata inside
`client_metadata["x-codex-turn-metadata"]`
- app-server integration tests proving:
- `turn/start` forwards `clientMetadata` into the outbound Responses
request path
  - websocket warmup + real turn request both behave correctly
  - `turn/steer` updates the follow-up request metadata

### Commands run
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-protocol`
- `cargo test -p codex-core
turn_metadata_state_merges_client_metadata_without_replacing_reserved_fields
--lib`
- `cargo test -p codex-core --test all
responses_websocket_preserves_custom_turn_metadata_fields`
- `cargo test -p codex-app-server --test all client_metadata`
- `cargo test -p codex-app-server --test all
turn_start_forwards_client_metadata_to_responses_websocket_request_body_v2
-- --nocapture`
- `just fmt`
- `just fix -p codex-core -p codex-protocol -p codex-app-server-protocol
-p codex-app-server`
- `just fix -p codex-exec -p codex-tui-app-server`
- `just argument-comment-lint`

### Full suite note
`cargo test` in `codex-rs` still fails in:
-
`suite::v2::turn_interrupt::turn_interrupt_resolves_pending_command_approval_request`

I verified that same failure on a clean detached `HEAD` worktree with an
isolated `CARGO_TARGET_DIR`, so it is not caused by this patch.
2026-04-09 11:52:37 -07:00
Ruslan Nigmatullin
59af4a730c app-server: Allow enabling remote control in runtime (#16973)
Refresh the feature flag on writes to the config.
2026-04-07 11:36:17 -07:00
Ruslan Nigmatullin
1525bbdb9a app-server: centralize AuthManager initialization (#16764)
Extract a shared helper that builds AuthManager from Config and applies
the forced ChatGPT workspace override in one place.

Create the shared AuthManager at MessageProcessor call sites so that
upcoming new transport's initialization can reuse the same handle, and
keep only external auth refresher wiring inside `MessageProcessor`.

Remove the now-unused `AuthManager::shared_with_external_auth` helper.
2026-04-06 12:46:55 -07:00
Michael Bolin
aa2403e2eb core: remove cross-crate re-exports from lib.rs (#16512)
## Why

`codex-core` was re-exporting APIs owned by sibling `codex-*` crates,
which made downstream crates depend on `codex-core` as a proxy module
instead of the actual owner crate.

Removing those forwards makes crate boundaries explicit and lets leaf
crates drop unnecessary `codex-core` dependencies. In this PR, this
reduces the dependency on `codex-core` to `codex-login` in the following
files:

```
codex-rs/backend-client/Cargo.toml
codex-rs/mcp-server/tests/common/Cargo.toml
```

## What

- Remove `codex-rs/core/src/lib.rs` re-exports for symbols owned by
`codex-login`, `codex-mcp`, `codex-rollout`, `codex-analytics`,
`codex-protocol`, `codex-shell-command`, `codex-sandboxing`,
`codex-tools`, and `codex-utils-path`.
- Delete the `default_client` forwarding shim in `codex-rs/core`.
- Update in-crate and downstream callsites to import directly from the
owning `codex-*` crate.
- Add direct Cargo dependencies where callsites now target the owner
crate, and remove `codex-core` from `codex-rs/backend-client`.
2026-04-01 23:06:24 -07:00
rhan-oai
e8de4ea953 [codex-analytics] thread events (#15690)
- add event for thread initialization
- thread/start, thread/fork, thread/resume
- feature flagged behind `FeatureFlag::GeneralAnalytics`
- does not yet support threads started by subagents

PR stack:
- --> [[telemetry] thread events
#15690](https://github.com/openai/codex/pull/15690)
- [[telemetry] subagent events
#15915](https://github.com/openai/codex/pull/15915)
- [[telemetry] turn events
#15591](https://github.com/openai/codex/pull/15591)
- [[telemetry] steer events
#15697](https://github.com/openai/codex/pull/15697)
- [[telemetry] queued prompt data
#15804](https://github.com/openai/codex/pull/15804)


Sample extracted logs in Codex-backend
```
INFO     | 2026-03-29 16:39:37 | codex_backend.routers.analytics_events | analytics_events.track_analytics_events:398 | Tracked analytics event codex_thread_initialized thread_id=019d3bf7-9f5f-7f82-9877-6d48d1052531 product_surface=codex product_client_id=CODEX_CLI client_name=codex-tui client_version=0.0.0 rpc_transport=in_process experimental_api_enabled=True codex_rs_version=0.0.0 runtime_os=macos runtime_os_version=26.4.0 runtime_arch=aarch64 model=gpt-5.3-codex ephemeral=False thread_source=user initialization_mode=new subagent_source=None parent_thread_id=None created_at=1774827577 | 
INFO     | 2026-03-29 16:45:46 | codex_backend.routers.analytics_events | analytics_events.track_analytics_events:398 | Tracked analytics event codex_thread_initialized thread_id=019d3b84-5731-79d0-9b3b-9c6efe5f5066 product_surface=codex product_client_id=CODEX_CLI client_name=codex-tui client_version=0.0.0 rpc_transport=in_process experimental_api_enabled=True codex_rs_version=0.0.0 runtime_os=macos runtime_os_version=26.4.0 runtime_arch=aarch64 model=gpt-5.3-codex ephemeral=False thread_source=user initialization_mode=resumed subagent_source=None parent_thread_id=None created_at=1774820022 | 
INFO     | 2026-03-29 16:45:49 | codex_backend.routers.analytics_events | analytics_events.track_analytics_events:398 | Tracked analytics event codex_thread_initialized thread_id=019d3bfd-4cd6-7c12-a13e-48cef02e8c4d product_surface=codex product_client_id=CODEX_CLI client_name=codex-tui client_version=0.0.0 rpc_transport=in_process experimental_api_enabled=True codex_rs_version=0.0.0 runtime_os=macos runtime_os_version=26.4.0 runtime_arch=aarch64 model=gpt-5.3-codex ephemeral=False thread_source=user initialization_mode=forked subagent_source=None parent_thread_id=None created_at=1774827949 | 
INFO     | 2026-03-29 17:20:29 | codex_backend.routers.analytics_events | analytics_events.track_analytics_events:398 | Tracked analytics event codex_thread_initialized thread_id=019d3c1d-0412-7ed2-ad24-c9c0881a36b0 product_surface=codex product_client_id=CODEX_SERVICE_EXEC client_name=codex_exec client_version=0.0.0 rpc_transport=in_process experimental_api_enabled=True codex_rs_version=0.0.0 runtime_os=macos runtime_os_version=26.4.0 runtime_arch=aarch64 model=gpt-5.3-codex ephemeral=False thread_source=user initialization_mode=new subagent_source=None parent_thread_id=None created_at=1774830027 | 
```

Notes
- `product_client_id` gets canonicalized in codex-backend
- subagent threads are addressed in a following pr
2026-03-31 12:16:44 -07:00
Michael Bolin
61dfe0b86c chore: clean up argument-comment lint and roll out all-target CI on macOS (#16054)
## Why

`argument-comment-lint` was green in CI even though the repo still had
many uncommented literal arguments. The main gap was target coverage:
the repo wrapper did not force Cargo to inspect test-only call sites, so
examples like the `latest_session_lookup_params(true, ...)` tests in
`codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.

This change cleans up the existing backlog, makes the default repo lint
path cover all Cargo targets, and starts rolling that stricter CI
enforcement out on the platform where it is currently validated.

## What changed

- mechanically fixed existing `argument-comment-lint` violations across
the `codex-rs` workspace, including tests, examples, and benches
- updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
`tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
`--all-targets` unless the caller explicitly narrows the target set
- fixed both wrappers so forwarded cargo arguments after `--` are
preserved with a single separator
- documented the new default behavior in
`tools/argument-comment-lint/README.md`
- updated `rust-ci` so the macOS lint lane keeps the plain wrapper
invocation and therefore enforces `--all-targets`, while Linux and
Windows temporarily pass `-- --lib --bins`

That temporary CI split keeps the stricter all-targets check where it is
already cleaned up, while leaving room to finish the remaining Linux-
and Windows-specific target-gated cleanup before enabling
`--all-targets` on those runners. The Linux and Windows failures on the
intermediate revision were caused by the wrapper forwarding bug, not by
additional lint findings in those lanes.

## Validation

- `bash -n tools/argument-comment-lint/run.sh`
- `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
- shell-level wrapper forwarding check for `-- --lib --bins`
- shell-level wrapper forwarding check for `-- --tests`
- `just argument-comment-lint`
- `cargo test` in `tools/argument-comment-lint`
- `cargo test -p codex-terminal-detection`

## Follow-up

- Clean up remaining Linux-only target-gated callsites, then switch the
Linux lint lane back to the plain wrapper invocation.
- Clean up remaining Windows-only target-gated callsites, then switch
the Windows lint lane back to the plain wrapper invocation.
2026-03-27 19:00:44 -07:00
pakrym-oai
8fa88fa8ca Add cached environment manager for exec server URL (#15785)
Add environment manager that is a singleton and is created early in
app-server (before skill manager, before config loading).

Use an environment variable to point to a running exec server.
2026-03-25 16:14:36 -07:00
Ruslan Nigmatullin
d61c03ca08 app-server: Add back pressure and batching to command/exec (#15547)
* Add
`OutgoingMessageSender::send_server_notification_to_connection_and_wait`
which returns only once message is written to websocket (or failed to do
so)
* Use this mechanism to apply back pressure to stdout/stderr streams of
processes spawned by `command/exec`, to limit them to at most one
message in-memory at a time
* Use back pressure signal to also batch smaller chunks into ≈64KiB ones

This should make commands execution more robust over
high-latency/low-throughput networks
2026-03-24 11:35:51 -07:00
Eric Traut
45f68843b8 Finish moving codex exec to app-server (#15424)
This PR completes the conversion of non-interactive `codex exec` to use
app server rather than directly using core events and methods.

### Summary
- move `codex-exec` off exec-owned `AuthManager` and `ThreadManager`
state
- route exec bootstrap, resume, and auth refresh through existing
app-server paths
- replace legacy `codex/event/*` decoding in exec with typed app-server
notification handling
- update human and JSONL exec output adapters to translate existing
app-server notifications only
- clean up "app server client" layer by eliminating support for legacy
notifications; this is no longer needed
- remove exposure of `authManager` and `threadManager` from "app server
client" layer

### Testing
- `exec` has pretty extensive unit and integration tests already, and
these all pass
- In addition, I asked Codex to put together a comprehensive manual set
of tests to cover all of the `codex exec` functionality (including
command-line options), and it successfully generated and ran these tests
2026-03-24 08:51:32 -06:00
Owen Lin
668330acc1 feat(tracing): tag app-server turn spans with turn_id (#15206)
So we can find and filter spans by `turn.id`.

We do this for the `turn/start`, `turn/steer`, and `turn/interrupt`
APIs.
2026-03-19 13:07:19 -07:00
Charley Cunningham
bc24017d64 Add Smart Approvals guardian review across core, app-server, and TUI (#13860)
## Summary
- add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
control for who reviews approval requests
- route Smart Approvals guardian review through core for command
execution, file changes, managed-network approvals, MCP approvals, and
delegated/subagent approval flows
- expose guardian review in app-server with temporary unstable
`item/autoApprovalReview/{started,completed}` notifications carrying
`targetItemId`, `review`, and `action`
- update the TUI so Smart Approvals can be enabled from `/experimental`,
aligned with the matching `/approvals` mode, and surfaced clearly while
reviews are pending or resolved

## Runtime model
This PR does not introduce a new `approval_policy`.

Instead:
- `approval_policy` still controls when approval is needed
- `approvals_reviewer` controls who reviewable approval requests are
routed to:
  - `user`
  - `guardian_subagent`

`guardian_subagent` is a carefully prompted reviewer subagent that
gathers relevant context and applies a risk-based decision framework
before approving or denying the request.

The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
behavior keys off `approvals_reviewer`.

When Smart Approvals is enabled from the TUI, it also switches the
current `/approvals` settings to the matching Smart Approvals mode so
users immediately see guardian review in the active thread:
- `approval_policy = on-request`
- `approvals_reviewer = guardian_subagent`
- `sandbox_mode = workspace-write`

Users can still change `/approvals` afterward.

Config-load behavior stays intentionally narrow:
- plain `smart_approvals = true` in `config.toml` remains just the
rollout/UI gate and does not auto-set `approvals_reviewer`
- the deprecated `guardian_approval = true` alias migration does
backfill `approvals_reviewer = "guardian_subagent"` in the same scope
when that reviewer is not already configured there, so old configs
preserve their original guardian-enabled behavior

ARC remains a separate safety check. For MCP tool approvals, ARC
escalations now flow into the configured reviewer instead of always
bypassing guardian and forcing manual review.

## Config stability
The runtime reviewer override is stable, but the config-backed
app-server protocol shape is still settling.

- `thread/start`, `thread/resume`, and `turn/start` keep stable
`approvalsReviewer` overrides
- the config-backed `approvals_reviewer` exposure returned via
`config/read` (including profile-level config) is now marked
`[UNSTABLE]` / experimental in the app-server protocol until we are more
confident in that config surface

## App-server surface
This PR intentionally keeps the guardian app-server shape narrow and
temporary.

It adds generic unstable lifecycle notifications:
- `item/autoApprovalReview/started`
- `item/autoApprovalReview/completed`

with payloads of the form:
- `{ threadId, turnId, targetItemId, review, action? }`

`review` is currently:
- `{ status, riskScore?, riskLevel?, rationale? }`
- where `status` is one of `inProgress`, `approved`, `denied`, or
`aborted`

`action` carries the guardian action summary payload from core when
available. This lets clients render temporary standalone pending-review
UI, including parallel reviews, even when the underlying tool item has
not been emitted yet.

These notifications are explicitly documented as `[UNSTABLE]` and
expected to change soon.

This PR does **not** persist guardian review state onto `thread/read`
tool items. The intended follow-up is to attach guardian review state to
the reviewed tool item lifecycle instead, which would improve
consistency with manual approvals and allow thread history / reconnect
flows to replay guardian review state directly.

## TUI behavior
- `/experimental` exposes the rollout gate as `Smart Approvals`
- enabling it in the TUI enables the feature and switches the current
session to the matching Smart Approvals `/approvals` mode
- disabling it in the TUI clears the persisted `approvals_reviewer`
override when appropriate and returns the session to default manual
review when the effective reviewer changes
- `/approvals` still exposes the reviewer choice directly
- the TUI renders:
- pending guardian review state in the live status footer, including
parallel review aggregation
  - resolved approval/denial state in history

## Scope notes
This PR includes the supporting core/runtime work needed to make Smart
Approvals usable end-to-end:
- shell / unified-exec / apply_patch / managed-network / MCP guardian
review
- delegated/subagent approval routing into guardian review
- guardian review risk metadata and action summaries for app-server/TUI
- config/profile/TUI handling for `smart_approvals`, `guardian_approval`
alias migration, and `approvals_reviewer`
- a small internal cleanup of delegated approval forwarding to dedupe
fallback paths and simplify guardian-vs-parent approval waiting (no
intended behavior change)

Out of scope for this PR:
- redesigning the existing manual approval protocol shapes
- persisting guardian review state onto app-server `ThreadItem`s
- delegated MCP elicitation auto-review (the current delegated MCP
guardian shim only covers the legacy `RequestUserInput` path)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-13 15:27:00 -07:00
Owen Lin
014e19510d feat(app-server, core): add more spans (#14479)
## Description

This PR expands tracing coverage across app-server thread startup, core
session initialization, and the Responses transport layer. It also gives
core dispatch spans stable operation-specific names so traces are easier
to follow than the old generic `submission_dispatch` spans.

Also use `fmt::Display` for types that we serialize in traces so we send
strings instead of rust types
2026-03-13 13:16:33 -07:00
Eric Traut
9dba7337f2 Start TUI on embedded app server (#14512)
This PR is part of the effort to move the TUI on top of the app server.
In a previous PR, we introduced an in-process app server and moved
`exec` on top of it.

For the TUI, we want to do the migration in stages. The app server
doesn't currently expose all of the functionality required by the TUI,
so we're going to need to support a hybrid approach as we make the
transition.

This PR changes the TUI initialization to instantiate an in-process app
server and access its `AuthManager` and `ThreadManager` rather than
constructing its own copies. It also adds a placeholder TUI event
handler that will eventually translate app server events into TUI
events. App server notifications are accepted but ignored for now. It
also adds proper shutdown of the app server when the TUI terminates.
2026-03-13 12:04:41 -06:00
Owen Lin
d3e6680531 fix turn_start_jsonrpc_span_parents_core_turn_spans flakiness (#14490)
This makes the test less flaky by checking the core invariant instead of
the full span chain.

Before, the test waited for several specific internal spans
(`submission_dispatch`, `session_task.turn`, `run_turn`) and asserted
their exact relationships. That was brittle because those spans are
exported asynchronously and are more of an implementation detail than
the thing we actually care about.

Now, the test only checks that:
- `turn/start` is on the expected remote trace with the expected remote
parent
- at least one representative core turn span on that same trace descends
from it

That keeps the sanity-check we want while making the test less sensitive
to timing and internal refactors.
2026-03-12 12:16:56 -07:00
Owen Lin
5bc82c5b93 feat(app-server): propagate traces across tasks and core ops (#14387)
## Summary

This PR keeps app-server RPC request trace context alive for the full
lifetime of the work that request kicks off (e.g. for `thread/start`,
this is `app-server rpc handler -> tokio background task -> core op
submissions`). Previously we lose trace lineage once the request handler
returns or hands work off to background tasks.

This approach is especially relevant for `thread/start` and other RPC
handlers that run in a non-blocking way. In the near future we'll most
likely want to make all app-server handlers run in a non-blocking way by
default, and only queue operations that must operate in order (e.g.
thread RPCs per thread?), so we want to make sure tracing in app-server
just generally works.

Depends on https://github.com/openai/codex/pull/14300

**Before**
<img width="155" height="207" alt="image"
src="https://github.com/user-attachments/assets/c9487459-36f1-436c-beb7-fafeb40737af"
/>


**After**
<img width="299" height="337" alt="image"
src="https://github.com/user-attachments/assets/727392b2-d072-4427-9dc4-0502d8652dea"
/>

## What changed

- Keep request-scoped trace context around until we send the final
response or error, or the connection closes.
- Thread that trace context through detached `thread/start` work so
background startup stays attached to the originating request.
- Pass request trace context through to downstream core operations,
including:
  - thread creation
  - resume/fork flows
  - turn submission
  - review
  - interrupt
  - realtime conversation operations
- Add tracing tests that verify:
  - remote W3C trace context is preserved for `thread/start`
  - remote W3C trace context is preserved for `turn/start`
  - downstream core spans stay under the originating request span
  - request-scoped tracing state is cleaned up correctly
- Clean up shutdown behavior so detached background tasks and spawned
threads are drained before process exit.
2026-03-11 20:18:31 -07:00