codex

mirror of https://github.com/openai/codex.git synced 2026-06-01 19:02:59 +00:00

Author	SHA1	Message	Date
Tom	0a9b559c0b	Migrate fork and resume reads to thread store (#18900 ) - Route cold thread/resume and thread/fork source loading through ThreadStore reads instead of direct rollout path operations - Keep lookups that explicitly specify a rollout-path using the local thread store methods but return an invalid-request error for remote ThreadStore configurations - Add some additional unit tests for code path coverage	2026-04-24 13:51:37 -07:00
Michael Bolin	13e0ec1614	permissions: make legacy profile conversion cwd-free (#19414 ) ## Why The profile conversion path still required a `cwd` even when it was only translating a legacy `SandboxPolicy` into a `PermissionProfile`. That made profile producers invent an ambient `cwd`, which is exactly the anchoring we are trying to remove from permission-profile data. A legacy workspace-write policy can be represented symbolically instead: `:cwd = write` plus read-only `:project_roots` metadata subpaths. This PR creates that cwd-free base so the rest of the stack can stop threading cwd through profile construction. Callers that actually need a concrete runtime filesystem policy for a specific cwd still have an explicitly named cwd-bound conversion. ## What Changed - `PermissionProfile::from_legacy_sandbox_policy` now takes only `&SandboxPolicy`. - `FileSystemSandboxPolicy::from_legacy_sandbox_policy` is now the symbolic, cwd-free projection for profiles. - The old concrete projection is retained as `FileSystemSandboxPolicy::from_legacy_sandbox_policy_for_cwd` for runtime/boundary code that must materialize legacy cwd behavior. - Workspace-write profiles preserve `CurrentWorkingDirectory` and `ProjectRoots` special entries instead of materializing cwd into absolute paths. ## Verification - `cargo check -p codex-protocol -p codex-core -p codex-app-server-protocol -p codex-app-server -p codex-exec -p codex-exec-server -p codex-tui -p codex-sandboxing -p codex-linux-sandbox -p codex-analytics --tests` - `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol -p codex-app-server -p codex-exec -p codex-exec-server -p codex-tui -p codex-sandboxing -p codex-linux-sandbox -p codex-analytics` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19414). * #19395 * #19394 * #19393 * #19392 * #19391 * __->__ #19414	2026-04-24 13:42:05 -07:00
jif-oai	deb4509302	feat: surface multi-agent thread limit in spawn description (#19360 ) ## Summary - Thread `agent_max_threads` into `ToolsConfig` and `SpawnAgentToolOptions`. - Render the configured `max_concurrent_threads_per_session` value in the MultiAgentV2 `spawn_agent` description. - Cover the description text in `codex-tools` unit tests and `codex-core` tool spec tests. ## Validation - `just fmt` - `cargo test -p codex-tools` - `cargo test -p codex-core spawn_agent_description` - `git diff --check` ## Notes - `cargo test -p codex-core` was also attempted, but unrelated environment-sensitive tests failed with the active local environment. Examples: approvals reviewer defaults observed `AutoReview` instead of `User`, request-permissions event tests did not emit events, and proxy-env tests saw `http://127.0.0.1:50604` from the active proxy environment. Co-authored-by: Codex <noreply@openai.com>	2026-04-24 15:13:54 +02:00
Michael Bolin	4816b89204	permissions: make profiles represent enforcement (#19231 ) ## Why `PermissionProfile` is becoming the canonical permissions abstraction, but the old shape only carried optional filesystem and network fields. It could describe allowed access, but not who is responsible for enforcing it. That made `DangerFullAccess` and `ExternalSandbox` lossy when profiles were exported, cached, or round-tripped through app-server APIs. The important model change is that active permissions are now a disjoint union over the enforcement mode. Conceptually: ```rust pub enum PermissionProfile { Managed { file_system: FileSystemSandboxPolicy, network: NetworkSandboxPolicy, }, Disabled, External { network: NetworkSandboxPolicy, }, } ``` This distinction matters because `Disabled` means Codex should apply no outer sandbox at all, while `External` means filesystem isolation is owned by an outside caller. Those are not equivalent to a broad managed sandbox. For example, macOS cannot nest Seatbelt inside Seatbelt, so an inner sandbox may require the outer Codex layer to use no sandbox rather than a permissive one. ## How Existing Modeling Maps Legacy `SandboxPolicy` remains a boundary projection, but it now maps into the higher-fidelity profile model: - `ReadOnly` and `WorkspaceWrite` map to `PermissionProfile::Managed` with restricted filesystem entries plus the corresponding network policy. - `DangerFullAccess` maps to `PermissionProfile::Disabled`, preserving the “no outer sandbox” intent instead of treating it as a lax managed sandbox. - `ExternalSandbox { network_access }` maps to `PermissionProfile::External { network }`, preserving external filesystem enforcement while still carrying the active network policy. - Split runtime policies that legacy `SandboxPolicy` cannot faithfully express, such as managed unrestricted filesystem plus restricted network, stay `Managed` instead of being collapsed into `ExternalSandbox`. - Per-command/session/turn grants remain partial overlays via `AdditionalPermissionProfile`; full `PermissionProfile` is reserved for complete active runtime permissions. ## What Changed - Change active `PermissionProfile` into a tagged union: `managed`, `disabled`, and `external`. - Keep partial permission grants separate with `AdditionalPermissionProfile` for command/session/turn overlays. - Represent managed filesystem permissions as either `restricted` entries or `unrestricted`; `glob_scan_max_depth` is non-zero when present. - Preserve old rollout compatibility by accepting the pre-tagged `{ network, file_system }` profile shape during deserialization. - Preserve fidelity for important edge cases: `DangerFullAccess` round-trips as `disabled`, `ExternalSandbox` round-trips as `external`, and managed unrestricted filesystem + restricted network stays managed instead of being mistaken for external enforcement. - Preserve configured deny-read entries and bounded glob scan depth when full profiles are projected back into runtime policies, including unrestricted replacements that now become `:root = write` plus deny entries. - Regenerate the experimental app-server v2 JSON/TypeScript schema and update the `command/exec` README example for the tagged `permissionProfile` shape. ## Compatibility Legacy `SandboxPolicy` remains available at config/API boundaries as the compatibility projection. Existing rollout lines with the old `PermissionProfile` shape continue to load. The app-server `permissionProfile` field is experimental, so its v2 wire shape is intentionally updated to match the higher-fidelity model. ## Verification - `just write-app-server-schema` - `cargo check --tests` - `cargo test -p codex-protocol permission_profile` - `cargo test -p codex-protocol preserving_deny_entries_keeps_unrestricted_policy_enforceable` - `cargo test -p codex-app-server-protocol permission_profile_file_system_permissions` - `cargo test -p codex-app-server-protocol serialize_client_response` - `cargo test -p codex-core session_configured_reports_permission_profile_for_external_sandbox` - `just fix` - `just fix -p codex-protocol` - `just fix -p codex-app-server-protocol` - `just fix -p codex-core` - `just fix -p codex-app-server`	2026-04-23 23:02:18 -07:00
Celia Chen	e8d8080818	feat: let model providers own model discovery (#18950 ) ## Why `codex-models-manager` had grown to own provider-specific concerns: constructing OpenAI-compatible `/models` requests, resolving provider auth, emitting request telemetry, and deciding how provider catalogs should be sourced. That made the manager harder to reuse for providers whose model catalog is not fetched from the OpenAI `/models` endpoint, such as Amazon Bedrock. This change moves provider-specific model discovery behind provider-owned implementations, so the models manager can focus on refresh policy, cache behavior, picker ordering, and model metadata merging. ## What Changed - Introduced a `ModelsManager` trait with separate `OpenAiModelsManager` and `StaticModelsManager` implementations. - Added `ModelsEndpointClient` so OpenAI-compatible HTTP fetching lives outside `codex-models-manager`. - Moved `/models` request construction, provider auth resolution, timeout handling, and request telemetry into `codex-model-provider` via `OpenAiModelsEndpoint`. - Added provider-owned `models_manager(...)` construction so configured OpenAI-compatible providers use `OpenAiModelsManager`, while static/catalog-backed providers can return `StaticModelsManager`. - Added an Amazon Bedrock static model catalog for the GPT OSS Bedrock model IDs. - Updated core/session/thread manager code and tests to depend on `Arc<dyn ModelsManager>`. - Moved offline model test helpers into `codex_models_manager::test_support`. ## Metadata References The Bedrock catalog metadata is based on the official Amazon Bedrock OpenAI model documentation: - [Amazon Bedrock OpenAI models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-openai.html) lists the Bedrock model IDs, text input/output modalities, and `128,000` token context window for `gpt-oss-20b` and `gpt-oss-120b`. - [Amazon Bedrock `gpt-oss-120b` model card](https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-openai-gpt-oss-120b.html) lists the `bedrock-runtime` model ID `openai.gpt-oss-120b-1:0`, the `bedrock-mantle` model ID `openai.gpt-oss-120b`, text-only modalities, and `128K` context window. - [OpenAI `gpt-oss-120b` model docs](https://developers.openai.com/api/docs/models/gpt-oss-120b) document configurable reasoning effort with `low`, `medium`, and `high`, plus text input/output modality. The display names, default reasoning effort, and priority ordering are Codex-local catalog choices. ## Test Plan - Manually verified app-server model listing with an AWS profile: ```shell CODEX_HOME="$(mktemp -d)" cargo run -p codex-app-server-test-client -- \ --codex-bin ./target/debug/codex \ -c 'model_provider="amazon-bedrock"' \ -c 'model_providers.amazon-bedrock.aws.profile="codex-bedrock"' \ -c 'model_providers.amazon-bedrock.aws.region="us-west-2"' \ model-list ``` The response returned the Bedrock catalog with `openai.gpt-oss-120b-1:0` as the default model and `openai.gpt-oss-20b-1:0` as the second listed model, both text-only and supporting low/medium/high reasoning effort.	2026-04-24 04:28:25 +00:00
starr-openai	49fb25997f	Add sticky environment API and thread state (#18897 ) ## Summary - add sticky environment selections to app-server v2 thread/start and turn/start request flow - carry thread-level selections through core session/thread state - add app-server coverage for sticky selections and turn overrides ## Stack 1. This PR: API and thread persistence 2. #18898: config.toml named environment loading 3. #18899: downstream tool/runtime consumers ## Validation - Not run locally; split only. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-23 18:57:13 -07:00
cassirer-openai	e3c8720a99	[rollout_trace] Add debug trace reduction command (#18880 ) ## Summary Adds the debug CLI entry point for reducing recorded rollout traces. This gives developers a direct way to inspect whether the emitted trace stream reduces into the expected conversation/runtime model. ## Stack This is PR 5/5 in the rollout trace stack. - [#18876](https://github.com/openai/codex/pull/18876): Add rollout trace crate - [#18877](https://github.com/openai/codex/pull/18877): Record core session rollout traces - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and code-mode boundaries - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions and multi-agent edges - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace reduction command ## Review Notes This PR is intentionally last: it depends on the trace crate, core recorder, runtime/tool events, and session/agent edge data all existing. The command should remain a debug/developer tool and avoid adding new runtime behavior. The useful review question is whether the CLI exposes the reducer in the smallest practical way for local inspection without turning the debug command into a supported user-facing workflow.	2026-04-24 01:56:48 +00:00
efrazer-oai	5882f3f95e	refactor: route Codex auth through AuthProvider (#18811 ) ## Summary This PR moves Codex backend request authentication from direct bearer-token handling to `AuthProvider`. The new `codex-auth-provider` crate defines the shared request-auth trait. `CodexAuth::provider()` returns a provider that can apply all headers needed for the selected auth mode. This lets ChatGPT token auth and AgentIdentity auth share the same callsite path: - ChatGPT token auth applies bearer auth plus account/FedRAMP headers where needed. - AgentIdentity auth applies AgentAssertion plus account/FedRAMP headers where needed. Reference old stack: https://github.com/openai/codex/pull/17387/changes ## Callsite Migration \| Area \| Change \| \| --- \| --- \| \| backend-client \| accepts an `AuthProvider` instead of a raw token/header \| \| chatgpt client/connectors \| applies auth through `CodexAuth::provider()` \| \| cloud tasks \| keeps Codex-backend gating, applies auth through provider \| \| cloud requirements \| uses Codex-backend auth checks and provider headers \| \| app-server remote control \| applies provider headers for backend calls \| \| MCP Apps/connectors \| gates on `uses_codex_backend()` and keys caches from generic account getters \| \| model refresh \| treats AgentIdentity as Codex-backend auth \| \| OpenAI file upload path \| rejects non-Codex-backend auth before applying headers \| \| core client setup \| keeps model-provider auth flow and allows AgentIdentity through provider-backed OpenAI auth \| ## Stack 1. https://github.com/openai/codex/pull/18757: full revert 2. https://github.com/openai/codex/pull/18871: isolated Agent Identity crate 3. https://github.com/openai/codex/pull/18785: explicit AgentIdentity auth mode and startup task allocation 4. This PR: migrate Codex backend auth callsites through AuthProvider 5. https://github.com/openai/codex/pull/18904: accept AgentIdentity JWTs and load `CODEX_AGENT_IDENTITY` ## Testing Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.	2026-04-23 17:14:02 -07:00
cassirer-openai	6d09b6752d	[rollout_trace] Trace tool and code-mode boundaries (#18878 ) ## Summary Extends rollout tracing across tool dispatch and code-mode runtime boundaries. This records canonical tool-call lifecycle events and links code-mode execution/wait operations back to the model-visible calls that caused them. ## Stack This is PR 3/5 in the rollout trace stack. - [#18876](https://github.com/openai/codex/pull/18876): Add rollout trace crate - [#18877](https://github.com/openai/codex/pull/18877): Record core session rollout traces - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and code-mode boundaries - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions and multi-agent edges - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace reduction command ## Review Notes This PR is about attribution. Reviewers should focus on whether direct tool calls, code-mode-originated tool calls, waits, outputs, and cancellation boundaries are recorded with enough source information for deterministic reduction without coupling the reducer to live runtime internals. The stack remains valid after this layer: tool and code-mode traces reduce through the existing crate model, while the broader session and multi-agent relationships are added in the next PR.	2026-04-23 12:22:11 -07:00
Michael Bolin	f90cc0ee64	tui: carry permission profiles on user turns (#18285 ) ## Why Per-turn permission overrides should use the same canonical profile abstraction as session configuration. That lets TUI submissions preserve exact configured permissions without round-tripping through legacy sandbox fields. ## What changed This adds `permission_profile` to user-turn operations, threads it through TUI/app-server submission paths, fills the new field in existing test fixtures, and adds coverage that composer submission includes the configured profile. ## Verification - `cargo test -p codex-tui permissions -- --nocapture` - `cargo test -p codex-core --test all permissions_messages -- --nocapture` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18285). * #18288 * #18287 * #18286 * __->__ #18285	2026-04-23 11:54:17 -07:00
Tom	f1923a38b1	[codex] Route live thread writes through ThreadStore (#18882 ) Begin migrating the thread write codepaths to ThreadStore. This starts using ThreadStore inside of core session code, not only in the app server code. Rework the interfaces around thread recording/persistence. We're left with the following: * `ThreadManager`: owns the process-level registry of loaded threads and handles cross-thread orchestration: start, resume, fork, lookup, remove, and route ops to running CodexThreads. * `CodexThread`: represents one loaded/running thread from the outside. It is the handle app-server and callers use to submit ops, inspect session metadata, and shut the thread down. * `LiveThread`: session-owned persistence lifecycle handle for one active thread. Core session code uses it to append rollout items, materialize lazy persistence, flush, shutdown, discard init-failed writers, and load that thread’s persisted history. * `ThreadStore`: storage backend abstraction. It answers “how are threads persisted, read, listed, updated, archived?” Local and remote implementations live behind this trait. * `LocalThreadStore`: local ThreadStore implementation. It owns the file/sqlite-specific details and keeps RolloutRecorder as a local implementation detail. This is a few too many Thread abstractions for my liking, but they do all represent different concepts / needs / layers. Migration note: in places where the core code explicitly requires a path, rather than a thread ID, throw an error if we're running with a remote store. Cover the new local live-writer lifecycle with focused tests and preserve app-server thread-start behavior, including ephemeral pathless sessions.	2026-04-23 10:17:09 -07:00
Eric Traut	bbff4ee61a	Add safety check notification and error handling (#19055 ) Adds a new app-server notification that fires when a user account has been flagged for potential safety reasons.	2026-04-22 22:24:12 -07:00
Shijie Rao	02170996e6	Default Fast service tier for eligible ChatGPT plans (#19053 ) ## Why Enterprise and business-like ChatGPT plans should get Codex's Fast service tier by default when the user or caller has not made an explicit service-tier choice. At the same time, callers need a durable way to choose standard routing without adding a new persisted `standard` service tier value. This keeps existing config compatibility while letting core own the managed default policy. ## What changed - Resolve the effective service tier in core at session creation: explicit `fast` or `flex` wins, explicit null/clear or `[notice].fast_default_opt_out = true` resolves to standard routing, and otherwise eligible ChatGPT plans resolve to Fast when FastMode is enabled. - Add `[notice].fast_default_opt_out` as the persisted opt-out marker for managed Fast defaults. - Treat app-server/TUI `service_tier: null` as an explicit standard/clear choice by preserving that intent through config loading. - Update TUI rendering to use core's effective service tier for startup and status surfaces while still keeping `config.service_tier` as the explicit configured choice. - Update `/fast off` to clear `service_tier`, persist the opt-out marker, and send explicit standard for subsequent turns. ## Verification - Added unit coverage for config override/notice handling, service-tier resolution, runtime null clearing, and `/fast off` turn propagation. - `cargo build -p codex-cli` Full test suite was not run locally per author request.	2026-04-22 21:54:44 -07:00
Michael Bolin	082fc4f632	protocol: report session permission profiles (#18282 ) ## Why Clients that observe `SessionConfigured` need the same canonical permission view that app-server thread responses provide. Reporting the profile in protocol events lets clients keep their local state synchronized without reinterpreting legacy sandbox fields. ## What changed This adds `permission_profile` to `SessionConfigured` and propagates it through core, exec JSON output, MCP server messages, and TUI history/widget handling. ## Verification - `cargo test -p codex-tui permissions -- --nocapture` - `cargo test -p codex-core --test all permissions_messages -- --nocapture` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18282). * #18288 * #18287 * #18286 * #18285 * #18284 * #18283 * __->__ #18282	2026-04-22 21:29:32 -07:00
Dylan Hurd	5e71da1424	feat(request-permissions) approve with strict review (#19050 ) ## Summary Allow the user to approve a request_permissions_tool request with the condition that all commands in the rest of the turn are reviewed by guardian, regardless of sandbox status. ## Testing - [x] Added unit tests - [x] Ran locally	2026-04-23 01:56:32 +00:00
Won Park	83ec1eb5d6	Rename approvals reviewer variant to auto-review (#19056 ) ## Why `approvals_reviewer` now uses `auto_review` as the canonical config/API value after #18504, but the Rust enum variant and nearby helper/test names still used `GuardianSubagent` / guardian approval wording. That made follow-up code and reviews confusing even though the external value had already moved to Auto-review. ## What changed - Renamed `ApprovalsReviewer::GuardianSubagent` to `ApprovalsReviewer::AutoReview`. - Updated protocol, app-server, config, core, TUI, exec, and analytics test callsites. - Renamed nearby helper/test names from guardian approval wording to Auto-review wording where they refer to the approvals reviewer mode. - Preserved wire compatibility: - `auto_review` remains the canonical serialized value. - `guardian_subagent` remains accepted as a legacy alias. This intentionally does not rename the `[features].guardian_approval` key, `Feature::GuardianApproval`, `core/src/guardian`, analytics event names, or app-server Guardian review event types. ## Verification - `cargo test -p codex-protocol approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent` - `cargo test -p codex-app-server-protocol approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent` - `cargo test -p codex-config approvals_reviewer` - `cargo test -p codex-tui update_feature_flags` - `cargo test -p codex-core permissions_instructions` - `cargo test -p codex-tui permissions_selection`	2026-04-22 17:22:35 -07:00
Michael Bolin	6ca038bbd1	rollout: persist turn permission profiles (#18281 ) ## Why Resume and reconstruction need to preserve the permissions that were active for each user turn. If rollouts only keep legacy sandbox fields, replay cannot faithfully represent profile-shaped overrides introduced earlier in the stack. ## What changed This records `permission_profile` on user-turn rollout events, reconstructs it through history/state extraction, and updates rollout reconstruction and related fixtures to keep the field explicit. ## Verification - `cargo test -p codex-core --test all permissions_messages -- --nocapture` - `cargo test -p codex-core --test all request_permissions -- --nocapture` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18281). * #18288 * #18287 * #18286 * #18285 * #18284 * #18283 * #18282 * __->__ #18281	2026-04-22 17:00:29 -07:00
Leo Shimonaka	16eeeb534a	Fix MCP permission policy sync (#19033 ) ###### Why/Context/Summary Repro: start a session outside Full Access, switch permissions to Full Access, then submit a new turn that triggers MCP/CUA permission handling. The turn used the live Full Access `SessionConfiguration`, but the MCP coordinator was still synced from the stale `original_config_do_not_use` / per-turn config copy. That left the coordinator with an old sandbox policy, so empty MCP permission elicitations could be denied instead of auto-accepted. Fix: update/rebuild the MCP connection manager from the live turn/session approval and sandbox policy fields. ###### Test plan ```sh just fmt cargo test -p codex-core --lib cargo test -p codex-core --lib mcp_tool_call::tests ```	2026-04-22 14:30:29 -07:00
Michael Bolin	18a26d7bbc	app-server: accept permission profile overrides (#18279 ) ## Why `PermissionProfile` is becoming the canonical permissions shape shared by core and app-server. After app-server responses expose the active profile, clients need to be able to send that same shape back when starting, resuming, forking, or overriding a turn instead of translating through the legacy `sandbox`/`sandboxPolicy` shorthands. This still needs to preserve the existing requirements/platform enforcement model. A profile-shaped request can be downgraded or rejected by constraints, but the server should keep the user's elevated-access intent for project trust decisions. Turn-level profile overrides also need to retain existing read protections, including deny-read entries and bounded glob-scan metadata, so a permission override cannot accidentally drop configured protections such as `*/.env = deny`. ## What changed - Adds optional `permissionProfile` request fields to `thread/start`, `thread/resume`, `thread/fork`, and `turn/start`. - Rejects ambiguous requests that specify both `permissionProfile` and the legacy `sandbox`/`sandboxPolicy` fields, including running-thread resume requests. - Converts profile-shaped overrides into core runtime filesystem/network permissions while continuing to derive the constrained legacy sandbox projection used by existing execution paths. - Preserves project-trust intent for profile overrides that are equivalent to workspace-write or full-access sandbox requests. - Preserves existing deny-read entries and `globScanMaxDepth` when applying turn-level `permissionProfile` overrides. - Updates app-server docs plus generated JSON/TypeScript schema fixtures and regression coverage. ## Verification - `cargo test -p codex-app-server-protocol schema_fixtures` - `cargo test -p codex-core session_configuration_apply_permission_profile_preserves_existing_deny_read_entries` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18279). * #18288 * #18287 * #18286 * #18285 * #18284 * #18283 * #18282 * #18281 * #18280 * __->__ #18279	2026-04-22 13:34:33 -07:00
Dylan Hurd	ed4def8286	feat(auto-review) short-circuit (#18890 ) ## Summary Short circuit the convo if auto-review hits too many denials ## Testing - [x] Added unit tests --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-22 20:34:15 +00:00
xl-openai	b77791c228	feat: Fairly trim skill descriptions within context budget (#18925 ) Preserve skill name/path entries whenever possible and trim descriptions first, using round-robin character allocation so short descriptions do not waste budget.	2026-04-22 12:33:29 -07:00
Won Park	11e5af53c4	Add plumbing to approve stored Auto-Review denials (#18955 ) ## Summary This adds the structural plumbing needed for an app-server client to approve a previously denied Guardian review and carry that approval context into the next model turn. This PR does not add the actual `/auto-review-denials` tool ## What Changed - Added app-server v2 RPC `thread/approveGuardianDeniedAction`. - Added generated JSON schema and TypeScript fixtures for `ThreadApproveGuardianDeniedAction*`. - Added core `Op::ApproveGuardianDeniedAction`. - Added a core handler that validates the event is a denied Guardian assessment and injects a developer message containing the stored denial event JSON. - Queues the approval context for the next turn if there is no active turn yet. - Added the TUI app-server bridge so `Op::ApproveGuardianDeniedAction { event }` is routed to the app-server request. ## What This Does Not Do - Does not add `/auto-review-denials`. - Does not add chat widget recent-denial state. - Does not add popup/list UI. - Does not add a product-facing denial lookup/store. - Does not change where Guardian denials are originally emitted or persisted. ## Verification - `cargo test -p codex-tui thread_approve_guardian_denied_action`	2026-04-22 10:38:19 -07:00
cassirer-openai	f67383bcba	[rollout_trace] Record core session rollout traces (#18877 ) ## Summary Wires rollout trace recording into `codex-core` session and turn execution. This records the core model request/response, compaction, and session lifecycle boundaries needed for replay without yet tracing every nested runtime/tool boundary. ## Stack This is PR 2/5 in the rollout trace stack. - [#18876](https://github.com/openai/codex/pull/18876): Add rollout trace crate - [#18877](https://github.com/openai/codex/pull/18877): Record core session rollout traces - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and code-mode boundaries - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions and multi-agent edges - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace reduction command ## Review Notes This layer is the first live integration point. The important review question is whether trace recording is isolated from normal session behavior: trace failures should not become user-visible execution failures, and recording should preserve the existing turn/session lifecycle semantics. The PR depends on the reducer/data model from the first stack entry and only introduces the core recorder surface that later PRs use for richer runtime and relationship events.	2026-04-22 17:00:48 +00:00
jif-oai	639382609f	fix: wait_agent timeout for queued mailbox mail (#18968 ) ## Why `wait_agent` can be called while mailbox mail is already pending. The previous implementation subscribed for future mailbox sequence changes and then waited for the next notification. If the mail was queued before that wait started, no new notification arrived, so the tool could sit until `timeout_ms` even though mail was ready to deliver. ## What Changed - Added `Session::has_pending_mailbox_items()` for checking pending mailbox mail through the session API. - Updated `multi_agents_v2::wait` to return immediately when pending mailbox mail already exists before sleeping on a new mailbox sequence update. - Reworked the regression coverage in `multi_agents_tests.rs` so already queued mailbox mail must wake `wait_agent` promptly. Relevant code: - [`wait_agent` pending-mail check](`aa8ca06e83/codex-rs/core/src/tools/handlers/multi_agents_v2/wait.rs (L55-L60)`) - [`Session::has_pending_mailbox_items`](`aa8ca06e83/codex-rs/core/src/session/mod.rs (L2979-L2981)`) - [`multi_agent_v2_wait_agent_returns_for_already_queued_mail`](`aa8ca06e83/codex-rs/core/src/tools/handlers/multi_agents_tests.rs (L2854)`) ## Verification - `cargo test -p codex-core multi_agent_v2_wait_agent_returns_for_already_queued_mail`	2026-04-22 11:16:17 +01:00
rhan-oai	213b17b7a3	[codex-analytics] guardian review TTFT plumbing and emission (#17696 ) ## Why Guardian analytics includes time-to-first-token, but the Guardian reviewer runs as a normal Codex session and `TurnCompleteEvent` did not expose TTFT. The timing needs to flow through the standard turn-completion protocol so Guardian review analytics can consume the same value as the rest of the session machinery. ## What changed Adds optional `time_to_first_token_ms` to `TurnCompleteEvent` and populates it from `TurnTiming`. The value is carried through app-server thread history, rollout reconstruction, TUI/app-server adapters, and Guardian review session handling. Guardian review analytics now captures TTFT from the reviewer turn-complete event when available. Existing tests and fixtures are updated to set the new optional field to `None` where TTFT is not relevant. ## Verification - `cargo clippy -p codex-tui --tests -- -D warnings` - `cargo clippy -p codex-core --lib --tests -- -D warnings` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17696). * __->__ #17696 * #17695 * #17693 * #18278 * #18953	2026-04-22 01:52:48 -07:00
rhan-oai	4e7399c6b9	[codex-analytics] guardian review analytics events emission (#17693 ) ## Why Guardian approvals now run as review sessions, but Codex analytics did not have a terminal event for those reviews. That made it hard to measure approval outcomes, failure modes, Guardian session reuse, model metadata, token usage, and timing separately from the parent turn. ## What changed Adds `codex_guardian_review` analytics emission for Guardian approval reviews. The event is emitted from the Guardian review path with review identity, target item id, approval request source, a PII-minimized reviewed-action shape, terminal decision/status, failure reason, Guardian assessment fields, Guardian session metadata, token usage, and timing metadata. The reviewed-action payload intentionally omits high-risk fields such as shell commands, working directories, argv, file paths, network targets/hosts, rationale, retry reason, and permission justifications. It also classifies prompt-build failures separately from Guardian session/runtime failures so fail-closed cases are distinguishable in analytics. ## Verification - Guardian review analytics tests cover terminal success, timeout/cancel/fail-closed paths, session metadata, and token usage plumbing. - `cargo clippy -p codex-core --lib --tests -- -D warnings` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17693). * #17696 * #17695 * __->__ #17693	2026-04-22 01:02:47 -07:00
Michael Bolin	0fef35dc3a	core: derive active permission profiles (#18277 ) ## Why `Permissions` should not store a separate `PermissionProfile` that can drift from the constrained `SandboxPolicy` and network settings. The active profile needs to be derived from the same constrained values that already honor `requirements.toml`. ## What changed This adds derivation of the active `PermissionProfile` from the constrained runtime permission settings and exposes that derived value through config snapshots and thread state. The app-server can then report the active profile without introducing a second source of truth. ## Verification - `cargo test -p codex-core --test all permissions_messages -- --nocapture` - `cargo test -p codex-core --test all request_permissions -- --nocapture` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18277). * #18288 * #18287 * #18286 * #18285 * #18284 * #18283 * #18282 * #18281 * #18280 * #18279 * #18278 * __->__ #18277	2026-04-21 22:11:40 -07:00
Michael Bolin	36f8bb4ffa	exec-server: carry filesystem sandbox profiles (#18276 ) ## Why The exec-server still needs platform sandbox inputs, but the migration should preserve the `PermissionProfile` that produced them. Keeping only the derived legacy sandbox map would keep `SandboxPolicy` as the effective abstraction and would make full-disk vs. restricted profiles harder to preserve as the permissions stack starts round-tripping profiles. `PermissionProfile` entries can also be cwd-sensitive (`:cwd`, `:project_roots`, relative globs), so the exec-server must carry the request sandbox cwd instead of resolving those entries against the long-lived exec-server process cwd. ## What changed `FileSystemSandboxContext` now carries `permissions: PermissionProfile` plus an optional `cwd`: - removed `sandboxPolicy`, `sandboxPolicyCwd`, `fileSystemSandboxPolicy`, and `additionalPermissions` - added `permissions` and `cwd` - kept the platform knobs `windowsSandboxLevel`, `windowsSandboxPrivateDesktop`, and `useLegacyLandlock` Core turn and apply-patch paths populate the context from the active runtime permissions and request cwd. Exec-server derives platform `SandboxPolicy`/`FileSystemSandboxPolicy` at the filesystem boundary, adds helper runtime reads there, and rejects cwd-dependent profiles that arrive without a cwd. The legacy `FileSystemSandboxContext::new(SandboxPolicy)` constructor now preserves the old workspace-write conversion semantics for compatibility tests/callers. ## Verification - `cargo test -p codex-exec-server` - `cargo test -p codex-exec-server sandbox_cwd -- --nocapture` - `cargo test -p codex-exec-server sandbox_context_new_preserves_legacy_workspace_write_read_only_subpaths -- --nocapture` - `cargo test -p codex-core --lib file_system_sandbox_context_uses_active_attempt -- --nocapture`	2026-04-21 20:22:28 -07:00
starr-openai	1d4cc494c9	Add turn-scoped environment selections (#18416 ) ## Summary - add experimental turn/start.environments params for per-turn environment id + cwd selections - pass selections through core protocol ops and resolve them with EnvironmentManager before TurnContext creation - treat omitted selections as default behavior, empty selections as no environment, and non-empty selections as first environment/cwd as the turn primary ## Testing - ran `just fmt` - ran `just write-app-server-schema` - not run: unit tests for this stacked PR --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-21 17:48:33 -07:00
maja-openai	ef00014a46	Allow guardian bare allow output (#18797 ) ## Summary Allow guardian to skip other fields and output only `{"outcome":"allow"}` when the command is low risk. This change lets guardian reviews use a non-strict text format while keeping the JSON schema itself as plain user-visible schema data, so transport strictness is carried out-of-band instead of through a schema marker key. ## What changed - Add an explicit `output_schema_strict` flag to model prompts and pass it into `codex-api` text formatting. - Set guardian reviewer prompts to non-strict schema validation while preserving strict-by-default behavior for normal callers. - Update the guardian output contract so definitely-low-risk decisions may return only `{"outcome":"allow"}`. - Treat bare allow responses as low-risk approvals in the guardian parser. - Add tests and snapshots covering the non-strict guardian request and optional guardian output fields. ## Verification - `cargo test -p codex-core guardian::tests::guardian` - `cargo test -p codex-core guardian::tests::` - `cargo test -p codex-core client_common::tests::` - `cargo test -p codex-protocol user_input_serialization_includes_final_output_json_schema` - `cargo test -p codex-api` - `git diff --check` Note: `cargo test -p codex-core` was also attempted, but this desktop environment injects ambient config/proxy state that causes unrelated config/session tests expecting pristine defaults to fail. --------- Co-authored-by: Dylan Hurd <dylan.hurd@openai.com> Co-authored-by: Codex <noreply@openai.com>	2026-04-21 15:37:12 -07:00
starr-openai	ddbe2536be	Support multiple managed environments (#18401 ) ## Summary - refactor EnvironmentManager to own keyed environments with default/local lookup helpers - keep remote exec-server client creation lazy until exec/fs use - preserve disabled agent environment access separately from internal local environment access ## Validation - not run (per Codex worktree instruction to avoid tests/builds unless requested) --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-21 15:29:35 -07:00
efrazer-oai	be75785504	fix: fully revert agent identity runtime wiring (#18757 ) ## Summary This PR fully reverts the previously merged Agent Identity runtime integration from the old stack: https://github.com/openai/codex/pull/17387/changes It removes the Codex-side task lifecycle wiring, rollout/session persistence, feature flag plumbing, lazy `auth.json` mutation, background task auth paths, and request callsite changes introduced by that stack. This leaves the repo in a clean pre-AgentIdentity integration state so the follow-up PRs can reintroduce the pieces in smaller reviewable layers. ## Stack 1. This PR: full revert 2. https://github.com/openai/codex/pull/18871: move Agent Identity business logic into a crate 3. https://github.com/openai/codex/pull/18785: add explicit AgentIdentity auth mode and startup task allocation 4. https://github.com/openai/codex/pull/18811: migrate auth callsites through AuthProvider ## Testing Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.	2026-04-21 14:30:55 -07:00
Michael Bolin	f8562bd47b	sandboxing: intersect permission profiles semantically (#18275 ) ## Why Permission approval responses must not be able to grant more access than the tool requested. Moving this flow to `PermissionProfile` means the comparison must be profile-shaped instead of `SandboxPolicy`-shaped, and cwd-relative special paths such as `:cwd` and `:project_roots` must stay anchored to the turn that produced the request. ## What changed This implements semantic `PermissionProfile` intersection in `codex-sandboxing` for file-system and network permissions. The intersection accepts narrower path grants, rejects broader grants, preserves deny-read carve-outs and glob scan depth, and materializes cwd-dependent special-path grants to absolute paths before they can be recorded for reuse. The request-permissions response paths now use that intersection consistently. App-server captures the request turn cwd before waiting for the client response, includes that cwd in the v2 approval params, and core stores the requested profile plus cwd for direct TUI/client responses and Guardian decisions before recording turn- or session-scoped grants. The TUI app-server bridge now preserves the app-server request cwd when converting permission approval params into core events. ## Verification - `cargo test -p codex-sandboxing intersect_permission_profiles -- --nocapture` - `cargo test -p codex-app-server request_permissions_response -- --nocapture` - `cargo test -p codex-core request_permissions_response_materializes_session_cwd_grants_before_recording -- --nocapture` - `cargo check -p codex-tui --tests` - `cargo check --tests` - `cargo test -p codex-tui app_server_request_permissions_preserves_file_system_permissions`	2026-04-21 10:23:01 -07:00
pakrym-oai	2a226096f6	Split DeveloperInstructions into individual fragments. (#18813 ) Split DeveloperInstructions into individual fragments.	2026-04-21 10:22:36 -07:00
pash-openai	dc1a8f2190	[tool search] support namespaced deferred dynamic tools (#18413 ) Deferred dynamic tools need to round-trip a namespace so a tool returned by `tool_search` can be called through the same registry key that core uses for dispatch. This change adds namespace support for dynamic tool specs/calls, persists it through app-server thread state, and routes dynamic tool calls by full `ToolName` while still sending the app the leaf tool name. Deferred dynamic tools must provide a namespace; non-deferred dynamic tools may remain top-level. It also introduces `LoadableToolSpec` as the shared function-or-namespace Responses shape used by both `tool_search` output and dynamic tool registration, so dynamic tools use the same wrapping logic in both paths. Validation: - `cargo test -p codex-tools` - `cargo test -p codex-core tool_search` --------- Co-authored-by: Sayan Sisodiya <sayan@openai.com>	2026-04-21 14:13:08 +08:00
Michael Bolin	d62421d322	chore: document intentional await-holding cases (#18423 ) ## Why This PR prepares the stack to enable Clippy await-holding lints that were left disabled in #18178. The mechanical lock-scope cleanup is handled separately; this PR is the documentation/configuration layer for the remaining await-across-guard sites. Without explicit annotations, reviewers and future maintainers cannot tell whether an await-holding warning is a real concurrency smell or an intentional serialization boundary. ## What changed - Configures `clippy.toml` so `await_holding_invalid_type` also covers `tokio::sync::{MutexGuard,RwLockReadGuard,RwLockWriteGuard}`. - Adds targeted `#[expect(clippy::await_holding_invalid_type, reason = ...)]` annotations for intentional async guard lifetimes. - Documents the main categories of intentional cases: active-turn state transitions that must remain atomic, session-owned MCP manager accesses, remote-control websocket serialization, JS REPL kernel/process serialization, OAuth persistence, external bearer token refresh serialization, and tests that intentionally serialize shared global or session-owned state. - For external bearer token refresh, documents the existing serialization boundary: holding `cached_token` across the provider command prevents concurrent cache misses from starting duplicate refresh commands, and the current behavior is small enough that an explicit expectation is easier to maintain than adding another synchronization primitive. ## Verification - `cargo clippy -p codex-login --all-targets` - `cargo clippy -p codex-connectors --all-targets` - `cargo clippy -p codex-core --all-targets` - The follow-up PR #18698 enables `await_holding_invalid_type` and `await_holding_lock` as workspace `deny` lints, so any undocumented remaining offender will fail Clippy. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18423). * #18698 * __->__ #18423	2026-04-20 22:41:54 -07:00
pakrym-oai	4c2e730488	Organize context fragments (#18794 ) Organize context fragments under `core/context`. Implement same trait on all of them.	2026-04-20 22:39:17 -07:00
Abhinav	ab26554a3a	Add remote_sandbox_config to our config requirements (#18763 ) ## Why Customers need finer-grained control over allowed sandbox modes based on the host Codex is running on. For example, they may want stricter sandbox limits on devboxes while keeping a different default elsewhere. Our current cloud requirements can target user/account groups, but they cannot vary sandbox requirements by host. That makes remote development environments awkward because the same top-level `allowed_sandbox_modes` has to apply everywhere. ## What Adds a new `remote_sandbox_config` section to `requirements.toml`: ```toml allowed_sandbox_modes = ["read-only"] [[remote_sandbox_config]] hostname_patterns = [".org"] allowed_sandbox_modes = ["read-only", "workspace-write"] [[remote_sandbox_config]] hostname_patterns = [".sh", "runner-*.ci"] allowed_sandbox_modes = ["read-only", "danger-full-access"] ``` During requirements resolution, Codex resolves the local host name once, preferring the machine FQDN when available and falling back to the cleaned kernel hostname. This host classification is best effort rather than authenticated device proof. Each requirements source applies its first matching `remote_sandbox_config` entry before it is merged with other sources. The shared merge helper keeps that `apply_remote_sandbox_config` step paired with requirements merging so new requirements sources do not have to remember the extra call. That preserves source precedence: a lower-precedence requirements file with a matching `remote_sandbox_config` cannot override a higher-precedence source that already set `allowed_sandbox_modes`. This also wires the hostname-aware resolution through app-server, CLI/TUI config loading, config API reads, and config layer metadata so they all evaluate remote sandbox requirements consistently. ## Verification - `cargo test -p codex-config remote_sandbox_config` - `cargo test -p codex-config host_name` - `cargo test -p codex-core load_config_layers_applies_matching_remote_sandbox_config` - `cargo test -p codex-core system_remote_sandbox_config_keeps_cloud_sandbox_modes` - `cargo test -p codex-config` - `cargo test -p codex-core` unit tests passed; `tests/all.rs` integration matrix was intentionally stopped after the relevant focused tests passed - `just fix -p codex-config` - `just fix -p codex-core` - `cargo check -p codex-app-server`	2026-04-21 05:05:02 +00:00
Dylan Hurd	86535c9901	feat(auto-review) Handle request_permissions calls (#18393 ) ## Summary When auto-review is enabled, it should handle request_permissions tool. We'll need to clean up the UX but I'm planning to do that in a separate pass ## Testing - [x] Ran locally <img width="893" height="396" alt="Screenshot 2026-04-17 at 1 16 13 PM" src="https://github.com/user-attachments/assets/4c045c5f-1138-4c6c-ac6e-2cb6be4514d8" /> --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-20 21:48:57 -07:00
Rasmus Rygaard	7b994100b3	Add session config loader interface (#18208 ) ## Why Cloud-hosted sessions need a way for the service that starts or manages a thread to provide session-owned config without treating all config as if it came from the same user/project/workspace TOML stack. The important boundary is ownership: some values should be controlled by the session/orchestrator, some by the authenticated user, and later some may come from the executor. The earlier broad config-store shape made that boundary too fuzzy and overlapped heavily with the existing filesystem-backed config loader. This PR starts with the smaller piece we need now: a typed session config loader that can feed the existing config layer stack while preserving the normal precedence and merge behavior. ## What Changed - Added `ThreadConfigLoader` and related typed payloads in `codex-config`. - `SessionThreadConfig` currently supports `model_provider`, `model_providers`, and feature flags. - `UserThreadConfig` is present as an ownership boundary, but does not yet add TOML-backed fields. - `NoopThreadConfigLoader` preserves existing behavior when no external loader is configured. - `StaticThreadConfigLoader` supports tests and simple callers. - Taught thread config sources to produce ordinary `ConfigLayerEntry` values so the existing `ConfigLayerStack` remains the place where precedence and merging happen. - Wired the loader through `ConfigBuilder`, the config loader, and app-server startup paths so app-server can provide session-owned config before deriving a thread config. - Added coverage for: - translating typed thread config into config layers, - inserting thread config layers into the stack at the right precedence, - applying session-provided model provider and feature settings when app-server derives config from thread params. ## Follow-Ups This intentionally stops short of adding the remote/service transport. The next pieces are expected to be: 1. Define the proto/API shape for this interface. 2. Add a client implementation that can source session config from the service side. ## Verification - Added unit coverage in `codex-config` for the loader and layer conversion. - Added `codex-core` config loader coverage for thread config layer precedence. - Added app-server coverage that verifies session thread config wins over request-provided config for model provider and feature settings.	2026-04-20 23:05:49 +00:00
Akshay Nathan	34a3e85fcd	Wire the PatchUpdated events through app_server (#18289 ) Wires patch_updated events through app_server. These events are parsed and streamed while apply_patch is being written by the model. Also adds 500ms of buffering to the patch_updated events in the diff_consumer. The eventual goal is to use this to display better progress indicators in the codex app.	2026-04-20 10:44:03 -07:00
Ahmed Ibrahim	316cf0e90b	Update models.json (#18586 ) - Replace the active models-manager catalog with the deleted core catalog contents. - Replace stale hardcoded test model slugs with current bundled model slugs. - Keep this as a stacked change on top of the cleanup PR.	2026-04-20 10:27:01 -07:00
Michael Bolin	5d5d610740	refactor: use semaphores for async serialization gates (#18403 ) This is the second cleanup in the await-holding lint stack. The higher-level goal, following https://github.com/openai/codex/pull/18178 and https://github.com/openai/codex/pull/18398, is to enable Clippy coverage for guards held across `.await` points without carrying broad suppressions. The stack is working toward enabling Clippy's [`await_holding_lock`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_lock) lint and the configurable [`await_holding_invalid_type`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_invalid_type) lint for Tokio guard types. Several existing fields used `tokio::sync::Mutex<()>` only as one-at-a-time async gates. Those guards intentionally lived across `.await` while an operation was serialized. A mutex over `()` suggests protected data and trips the await-holding lint shape; a single-permit `tokio::sync::Semaphore` expresses the intended serialization directly. ## What changed - Replace `Mutex<()>` serialization gates with `Semaphore::new(1)` for agent identity ensure, exec policy updates, guardian review session reuse, plugin remote sync, managed network proxy refresh, auth token refresh, and RMCP session recovery. - Update call sites from `lock().await` / `try_lock()` to `acquire().await` / `try_acquire()`. - Map closed-semaphore errors into the existing local error types, even though these semaphores are owned for the lifetime of their managers. - Update session test builders for the new `managed_network_proxy_refresh_lock` type. ## Verification - The split stack was verified at the final lint-enabling head with `just clippy`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18403). * #18698 * #18423 * #18418 * __->__ #18403	2026-04-20 17:21:29 +00:00
Adrian	904c751a40	[codex] Use background agent task auth for backend calls (#18094 ) ## Summary Introduces a single background/control-plane agent task for ChatGPT backend requests that do not have a thread-scoped task, with `AuthManager` owning the default ChatGPT backend authorization decision. Callers now ask `AuthManager` for the default ChatGPT backend authorization header. `AuthManager` decides whether that is bearer or background AgentAssertion based on config/internal state, while low-level bootstrap paths can explicitly request bearer-only auth. This PR is stacked on PR4 and focuses on the shared background task auth plumbing plus the first tranche of backend/control-plane consumers. The remaining callsite wiring is split into PR4.2 to keep review size down. ## Stack - PR1: https://github.com/openai/codex/pull/17385 - add `features.use_agent_identity` - PR2: https://github.com/openai/codex/pull/17386 - register agent identities when enabled - PR3: https://github.com/openai/codex/pull/17387 - register agent tasks when enabled - PR3.1: https://github.com/openai/codex/pull/17978 - persist and prewarm registered tasks per thread - PR4: https://github.com/openai/codex/pull/17980 - use task-scoped `AgentAssertion` for downstream calls - PR4.1: this PR - introduce AuthManager-owned background/control-plane `AgentAssertion` auth - PR4.2: https://github.com/openai/codex/pull/18260 - use background task auth for additional backend/control-plane calls ## What Changed - add background task registration and assertion minting inside `codex-login` - persist `agent_identity.background_task_id` separately from per-session task state - make `BackgroundAgentTaskManager` private to `codex-login`; call sites do not instantiate or pass it around - teach `AuthManager` the ChatGPT backend base URL and feature-derived background auth mode from resolved config - expose bearer-only helpers for bootstrap/registration/refresh-style paths that must not use AgentAssertion - wire `AuthManager` default ChatGPT authorization through app listing, connector directory listing, remote plugins, MCP status/listing, analytics, and core-skills remote calls - preserve bearer fallback when the feature is disabled, the backend host is unsupported, or background task registration is not available ## Validation - `just fmt` - `cargo check -p codex-core -p codex-login -p codex-analytics -p codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p codex-models-manager -p codex-chatgpt -p codex-model-provider -p codex-mcp -p codex-core-skills` - `cargo test -p codex-login agent_identity` - `cargo test -p codex-model-provider bearer_auth_provider` - `cargo test -p codex-core agent_assertion` - `cargo test -p codex-app-server remote_control` - `cargo test -p codex-cloud-requirements fetch_cloud_requirements` - `cargo test -p codex-models-manager manager::tests` - `cargo test -p codex-chatgpt` - `cargo test -p codex-cloud-tasks` - `just fix -p codex-core -p codex-login -p codex-analytics -p codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p codex-models-manager -p codex-chatgpt -p codex-model-provider -p codex-mcp -p codex-core-skills` - `just fix -p codex-app-server` - `git diff --check`	2026-04-20 06:50:28 -07:00
jif-oai	2c59806fe0	feat: add metric to track the number of turns with memory usage (#18662 ) Add a metric `codex.turn.memory` to know if a turn used memories or not. This is not part of the other turn metrics as a label to limit cardinality	2026-04-20 14:31:22 +01:00
Adrian	b44d2851cf	[codex] Use AgentAssertion downstream behind use_agent_identity (#17980 ) ## Summary This is the AgentAssertion downstream slice for feature-gated agent identity support, replacing the oversized AgentAssertion slice from PR #17807. It isolates task-scoped downstream AgentAssertion wiring on top of the merged PR3.1 work without re-carrying the earlier agent registration, task registration, or task-state history. This PR includes the task-scoped bug-fix call sites from the review: generic file upload auth, MCP OpenAI file upload auth, and ARC monitor auth. Broader user/control-plane calls move to PR4.1 and PR4.2. ## Stack - PR1: https://github.com/openai/codex/pull/17385 - add `features.use_agent_identity` - PR2: https://github.com/openai/codex/pull/17386 - register agent identities when enabled - PR3: https://github.com/openai/codex/pull/17387 - register agent tasks when enabled - PR3.1: https://github.com/openai/codex/pull/17978 - persist and prewarm registered tasks per thread - PR4: this PR - use task-scoped `AgentAssertion` downstream when enabled - PR4.1: https://github.com/openai/codex/pull/18094 - introduce AuthManager-owned background/control-plane `AgentAssertion` auth - PR4.2: https://github.com/openai/codex/pull/18260 - use background task auth for additional backend/control-plane calls ## What Changed - add AgentAssertion envelope generation in `codex-core` - route downstream HTTP and websocket auth through AgentAssertion when an agent task is present - extend the model-provider auth provider so non-bearer authorization schemes can be passed through cleanly - make generic file uploads attach the full authorization header value - make MCP OpenAI file uploads use the cached thread agent task assertion when present - make ARC monitor calls use the cached thread agent task assertion when present ## Why The original PR had drifted ancestry and showed a much larger diff than the semantic change actually required. Restacking it onto PR3.1 keeps the reviewable surface down to the downstream assertion slice. ## Validation - `just fmt` - `cargo check -p codex-core -p codex-login -p codex-analytics -p codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p codex-models-manager -p codex-chatgpt -p codex-model-provider -p codex-mcp -p codex-core-skills` - `cargo test -p codex-model-provider bearer_auth_provider` - `cargo test -p codex-core agent_assertion` - `cargo test -p codex-app-server remote_control` - `cargo test -p codex-cloud-requirements fetch_cloud_requirements` - `cargo test -p codex-models-manager manager::tests` - `cargo test -p codex-chatgpt` - `cargo test -p codex-cloud-tasks` - `cargo test -p codex-login agent_identity` - `just fix -p codex-core -p codex-login -p codex-analytics -p codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p codex-models-manager -p codex-chatgpt -p codex-model-provider -p codex-mcp -p codex-core-skills` - `just fix -p codex-app-server` - `git diff --check`	2026-04-19 23:16:43 -07:00
Dylan Hurd	49403e3676	chore(multiagent) skills instructions toggle (#18596 ) ## Summary Support toggling the skills message off. ## Test Plan - [x] Updated unit tests	2026-04-19 21:11:52 -07:00
Adrian	e5b52a3caa	Persist and prewarm agent tasks per thread (#17978 ) ## Summary - persist registered agent tasks in the session state update stream so the thread can reuse them - prewarm task registration once identity registration succeeds, while keeping startup failures best-effort - isolate the session-side task lifecycle into a dedicated module so AgentIdentityManager and RegisteredAgentTask do not leak across as many core layers ## Testing - cargo test -p codex-core startup_agent_task_prewarm - cargo test -p codex-core cached_agent_task_for_current_identity_clears_stale_task - cargo test -p codex-core record_initial_history_	2026-04-19 15:45:28 -07:00
Ahmed Ibrahim	996aa23e4c	[5/6] Wire executor-backed MCP stdio (#18212 ) ## Summary - Add the executor-backed RMCP stdio transport. - Wire MCP stdio placement through the executor environment config. - Cover local and executor-backed stdio paths with the existing MCP test helpers. ## Stack ```text o #18027 [6/6] Fail exec client operations after disconnect │ @ #18212 [5/6] Wire executor-backed MCP stdio │ o #18087 [4/6] Abstract MCP stdio server launching │ o #18020 [3/6] Add pushed exec process events │ o #18086 [2/6] Support piped stdin in exec process API │ o #18085 [1/6] Add MCP server environment config │ o main ``` --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-18 21:47:43 -07:00
jif-oai	e3c2acb9cd	Revert "[codex] drain mailbox only at request boundaries" (#18325 ) ## Summary - Reverts PR #17749 so queued inter-agent mail can again preempt after reasoning/commentary output item boundaries. - Applies the revert to the current `codex/turn.rs` module layout and restores the prior pending-input test expectations/snapshots. ## Testing - `just fmt` - `cargo test -p codex-core --test all pending_input` - `cargo test -p codex-core` failed in unrelated `tools::js_repl::tests::js_repl_imported_local_files_can_access_repl_globals`: dotslash download hit `mktemp: mkdtemp failed ... Operation not permitted` in the sandbox temp dir. Co-authored-by: Codex <noreply@openai.com>	2026-04-18 09:53:48 -07:00

1 2 3 4 5

207 Commits