codex

mirror of https://github.com/openai/codex.git synced 2026-04-30 17:36:40 +00:00

Author	SHA1	Message	Date
Dylan Hurd	34800d717e	[codex] Clean guardian instructions (#18934 ) ## Summary - Keep the guardian policy installed as guardian base instructions. - Clear inherited parent `developer_instructions` for guardian review sessions. - Update guardian config tests to assert developer instructions are cleared and policy text is sourced from base instructions. ## Why Guardian review sessions are intended to run under an isolated guardian policy. Because the guardian config is cloned from the parent config, inherited custom or managed developer instructions could otherwise remain active and conflict with guardian review behavior. ## Validation - `just fmt` - `cargo test -p codex-core guardian_review_session_config` Co-authored-by: Codex <noreply@openai.com>	2026-04-21 21:47:58 -07:00
Dylan Hurd	0e39614d87	chore(tui) debug-config guardian_policy_config (#18923) ## Summary List `guardian_policy_config_source` in `/debug-config` output ## Testing - [x] Ran locally	2026-04-21 21:00:23 -07:00
Michael Bolin	36f8bb4ffa	exec-server: carry filesystem sandbox profiles (#18276 ) ## Why The exec-server still needs platform sandbox inputs, but the migration should preserve the `PermissionProfile` that produced them. Keeping only the derived legacy sandbox map would keep `SandboxPolicy` as the effective abstraction and would make full-disk vs. restricted profiles harder to preserve as the permissions stack starts round-tripping profiles. `PermissionProfile` entries can also be cwd-sensitive (`:cwd`, `:project_roots`, relative globs), so the exec-server must carry the request sandbox cwd instead of resolving those entries against the long-lived exec-server process cwd. ## What changed `FileSystemSandboxContext` now carries `permissions: PermissionProfile` plus an optional `cwd`: - removed `sandboxPolicy`, `sandboxPolicyCwd`, `fileSystemSandboxPolicy`, and `additionalPermissions` - added `permissions` and `cwd` - kept the platform knobs `windowsSandboxLevel`, `windowsSandboxPrivateDesktop`, and `useLegacyLandlock` Core turn and apply-patch paths populate the context from the active runtime permissions and request cwd. Exec-server derives platform `SandboxPolicy`/`FileSystemSandboxPolicy` at the filesystem boundary, adds helper runtime reads there, and rejects cwd-dependent profiles that arrive without a cwd. The legacy `FileSystemSandboxContext::new(SandboxPolicy)` constructor now preserves the old workspace-write conversion semantics for compatibility tests/callers. ## Verification - `cargo test -p codex-exec-server` - `cargo test -p codex-exec-server sandbox_cwd -- --nocapture` - `cargo test -p codex-exec-server sandbox_context_new_preserves_legacy_workspace_write_read_only_subpaths -- --nocapture` - `cargo test -p codex-core --lib file_system_sandbox_context_uses_active_attempt -- --nocapture`	2026-04-21 20:22:28 -07:00
xl-openai	a978e411f6	feat: Support remote plugin list/read. (#18452 ) Add a temporary internal remote_plugin feature flag that merges remote marketplaces into plugin/list and routes plugin/read through the remote APIs when needed, while keeping pure local marketplaces working as before. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-21 18:39:07 -07:00
Celia Chen	1cd3ad1f49	feat: add AWS SigV4 auth for OpenAI-compatible model providers (#17820 ) ## Summary Add first-class Amazon Bedrock Mantle provider support so Codex can keep using its existing Responses API transport with OpenAI-compatible AWS-hosted endpoints such as AOA/Mantle. This is needed for the AWS launch path, where provider traffic should authenticate with AWS credentials instead of OpenAI bearer credentials. Requests are authenticated immediately before transport send, so SigV4 signs the final method, URL, headers, and body bytes that `reqwest` will send. ## What Changed - Added a new `codex-aws-auth` crate for loading AWS SDK config, resolving credentials, and signing finalized HTTP requests with AWS SigV4. - Added a built-in `amazon-bedrock` provider that targets Bedrock Mantle Responses endpoints, defaults to `us-east-1`, supports region/profile overrides, disables WebSockets, and does not require OpenAI auth. - Added Amazon Bedrock auth resolution in `codex-model-provider`: prefer `AWS_BEARER_TOKEN_BEDROCK` when set, otherwise use AWS SDK credentials and SigV4 signing. - Added `AuthProvider::apply_auth` and `Request::prepare_body_for_send` so request-signing providers can sign the exact outbound request after JSON serialization/compression. - Determine the region by taking the `aws.region` config first (required for bearer token codepath), and fallback to SDK default region. ## Testing Amazon Bedrock Mantle Responses paths: - Built the local Codex binary with `cargo build`. - Verified the custom proxy-backed `aws` provider using `env_key = "AWS_BEARER_TOKEN_BEDROCK"` streamed raw `responses` output with `response.output_text.delta`, `response.completed`, and `mantle-env-ok`. - Verified a full `codex exec --profile aws` turn returned `mantle-env-ok`. 
- Confirmed the custom provider used the bearer env var, not AWS profile auth: bogus `AWS_PROFILE` still passed, empty env var failed locally, and malformed env var reached Mantle and failed with `401 invalid_api_key`. - Verified built-in `amazon-bedrock` with `AWS_BEARER_TOKEN_BEDROCK` set passed despite bogus AWS profiles, returning `amazon-bedrock-env-ok`. - Verified built-in `amazon-bedrock` SDK/SigV4 auth passed with `AWS_BEARER_TOKEN_BEDROCK` unset and temporary AWS session env credentials, returning `amazon-bedrock-sdk-env-ok`.	2026-04-22 01:11:17 +00:00
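The auth and region resolution order described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the `codex-model-provider` code: `BedrockAuth`, `resolve_auth`, and `resolve_region` are hypothetical names, and the environment and config values are passed in as plain arguments instead of being read from the process or the AWS SDK.

```rust
/// Hypothetical auth modes, for illustration only.
#[derive(Debug, PartialEq)]
enum BedrockAuth {
    /// Value of `AWS_BEARER_TOKEN_BEDROCK`, when that variable is set.
    BearerToken(String),
    /// AWS SDK credentials with SigV4 request signing.
    SigV4,
}

/// Prefer the bearer token when it is set, otherwise use SDK credentials.
fn resolve_auth(bearer_env: Option<String>) -> BedrockAuth {
    match bearer_env {
        Some(token) => BedrockAuth::BearerToken(token),
        None => BedrockAuth::SigV4,
    }
}

/// Take the configured `aws.region` first, then fall back to the SDK default.
fn resolve_region(config_region: Option<&str>, sdk_default_region: Option<&str>) -> Option<String> {
    config_region.or(sdk_default_region).map(str::to_owned)
}
```

With a bearer token there is no SDK config to consult, which is presumably why the commit calls the `aws.region` setting required on that code path.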
Michael Bolin	e18fe7a07f	test(core): move prompt debug coverage to integration suite (#18916 ) ## Why `build_prompt_input` now initializes `ExecServerRuntimePaths`, which requires a configured Codex executable path. The previous inline unit test in `core/src/prompt_debug.rs` built a bare `test_config()` and then failed before it could assert anything useful: ```text Codex executable path is not configured ``` This coverage is also integration-shaped: it drives the public `build_prompt_input` entry point through config, thread, and session setup rather than testing a small internal helper in isolation. Bazel CI did not catch this earlier because the affected test was behind the same wrapped Rust unit-test path fixed by #18913. Before that launcher/sharding fix, the outer `workspace_root_test` changed the working directory for Insta compatibility while the inner `rules_rust` sharding wrapper still expected its runfiles working directory. In practice, Bazel could report success without executing the Rust test cases in that shard. Once #18913 makes the wrapper run the Rust test binary directly and shard with libtest arguments, this stale unit test actually runs and exposes the missing `codex_self_exe` setup. ## What Changed - Moved `build_prompt_input_includes_context_and_user_message` out of `core/src/prompt_debug.rs`. - Added `core/tests/suite/prompt_debug_tests.rs` and registered it from `core/tests/suite/mod.rs`. - Builds the test config with `ConfigBuilder` and provides `codex_self_exe` using the current test executable, matching the runtime-path invariant required by prompt debug setup. - Preserves the existing assertions that the generated prompt input includes both the debug user message and project-specific user instructions. 
## Verification - `cargo test -p codex-core --test all prompt_debug_tests::build_prompt_input_includes_context_and_user_message` - `bazel test //codex-rs/core:core-all-test --test_arg=prompt_debug_tests::build_prompt_input_includes_context_and_user_message --test_output=errors` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18916). * #18913 * __->__ #18916	2026-04-22 01:08:25 +00:00
Felipe Coury	09ebc34f17	fix(core): emit hooks for apply_patch edits (#18391 ) Fixes https://github.com/openai/codex/issues/16732. ## Why `apply_patch` is Codex's primary file edit path, but it was not emitting `PreToolUse` or `PostToolUse` hook events. That meant hook-based policy, auditing, and write coordination could observe shell commands while missing the actual file mutation performed by `apply_patch`. The issue also exposed that the hook runtime serialized command hook payloads with `tool_name: "Bash"` unconditionally. Even if `apply_patch` supplied hook payloads, hooks would either fail to match it directly or receive misleading stdin that identified the edit as a Bash tool call. ## What Changed - Added `PreToolUse` and `PostToolUse` payload support to `ApplyPatchHandler`. - Exposed the raw patch body as `tool_input.command` for both JSON/function and freeform `apply_patch` calls. - Taught tool hook payloads to carry a handler-supplied hook-facing `tool_name`. - Preserved existing shell compatibility by continuing to emit `Bash` for shell-like tools. - Serialized the selected hook `tool_name` into hook stdin instead of hardcoding `Bash`. - Relaxed the generated hook command input schema so `tool_name` can represent tools other than `Bash`. ## Verification Added focused handler coverage for: - JSON/function `apply_patch` calls producing a `PreToolUse` payload. - Freeform `apply_patch` calls producing a `PreToolUse` payload. - Successful `apply_patch` output producing a `PostToolUse` payload. - Shell and `exec_command` handlers continuing to expose `Bash`. Added end-to-end hook coverage for: - A `PreToolUse` hook matching `^apply_patch$` blocking the patch before the target file is created. - A `PostToolUse` hook matching `^apply_patch$` receiving the patch input and tool response, then adding context to the follow-up model request. - Non-participating tools such as the plan tool continuing not to emit `PreToolUse`/`PostToolUse` hook events. 
Also validated manually with a live `codex exec` smoke test using an isolated temp workspace and temp `CODEX_HOME`. The smoke test confirmed that a real `apply_patch` edit emits `PreToolUse`/`PostToolUse` with `tool_name: "apply_patch"`, a shell command still emits `tool_name: "Bash"`, and a denying `PreToolUse` hook prevents the blocked patch file from being created.	2026-04-21 22:00:40 -03:00
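As a rough sketch of the `tool_name` selection described in the commit above: a handler may supply its own hook-facing name, and shell-like tools keep reporting `Bash`. The `HookInput` struct and `build_hook_input` function are hypothetical, and modeling `Bash` as a fallback default (instead of a name each shell handler supplies explicitly) is an assumption of this sketch.

```rust
/// Name reported to hooks by shell-like tools.
const SHELL_HOOK_TOOL_NAME: &str = "Bash";

/// Hypothetical, simplified view of what a hook receives on stdin.
#[derive(Debug)]
struct HookInput {
    tool_name: String,
    /// Corresponds to `tool_input.command`; for `apply_patch` this is the raw patch body.
    command: String,
}

/// Use the handler-supplied name when present, otherwise fall back to `Bash`.
fn build_hook_input(handler_tool_name: Option<&str>, command: &str) -> HookInput {
    HookInput {
        tool_name: handler_tool_name.unwrap_or(SHELL_HOOK_TOOL_NAME).to_string(),
        command: command.to_string(),
    }
}
```

A hook matcher such as `^apply_patch$` can then distinguish file edits from shell commands by comparing against `tool_name`.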
starr-openai	1d4cc494c9	Add turn-scoped environment selections (#18416) ## Summary - add experimental `turn/start.environments` params for per-turn environment id + cwd selections - pass selections through core protocol ops and resolve them with `EnvironmentManager` before `TurnContext` creation - treat omitted selections as default behavior, empty selections as no environment, and, for non-empty selections, use the first environment/cwd as the turn primary ## Testing - ran `just fmt` - ran `just write-app-server-schema` - not run: unit tests for this stacked PR --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-21 17:48:33 -07:00
Michael Bolin	799e50412e	sandboxing: materialize cwd-relative permission globs (#18867 ) ## Why #18275 anchors session-scoped `:cwd` and `:project_roots` grants to the request cwd before recording them for reuse. Relative deny glob entries need the same treatment. Without anchoring, a stored session permission can keep a pattern such as `*/.env` relative, then reinterpret that deny against a later turn cwd. That makes the persisted profile depend on the cwd at reuse time instead of the cwd that was reviewed and approved. ## What changed `intersect_permission_profiles` now materializes retained `FileSystemPath::GlobPattern` entries against the request cwd, matching the existing materialization for cwd-sensitive special paths. Materialized accepted grants are now deduplicated before deny retention runs. This keeps the sticky-grant preapproval shape stable when a repeated request is merged with the stored grant and both `:cwd = write` and the materialized absolute cwd write are present. The preapproval check compares against the same materialized form, so a later request for the same cwd-relative deny glob still matches the stored anchored grant instead of re-prompting or rejecting. Tests cover both the storage path and the preapproval path: a session-scoped `:cwd = write` grant with `*/.env = none` is stored with both the cwd write and deny glob anchored to the original request cwd, cannot be reused from a later cwd, and remains preapproved when re-requested from the original cwd after merging with the stored grant. ## Verification - `cargo test -p codex-sandboxing policy_transforms` - `cargo test -p codex-core --lib relative_deny_glob_grants_remain_preapproved_after_materialization` - `cargo clippy -p codex-sandboxing --tests -- -D clippy::redundant_clone` - `cargo clippy -p codex-core --lib -- -D clippy::redundant_clone` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). 
Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18867). * #18288 * #18287 * #18286 * #18285 * #18284 * #18283 * #18282 * #18281 * #18280 * #18279 * #18278 * #18277 * #18276 * __->__ #18867	2026-04-21 17:28:58 -07:00
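The anchoring idea in the commit above can be illustrated with a small sketch. It is not the `intersect_permission_profiles` implementation: `materialize_glob` and `dedup_preserving_order` are hypothetical helpers, and glob patterns are treated as plain path strings here.

```rust
use std::path::{Path, PathBuf};

/// Anchor a relative glob pattern to the cwd of the request that was reviewed.
/// Patterns that are already absolute are kept unchanged.
fn materialize_glob(request_cwd: &Path, pattern: &str) -> PathBuf {
    let pattern = Path::new(pattern);
    if pattern.is_absolute() {
        pattern.to_path_buf()
    } else {
        request_cwd.join(pattern)
    }
}

/// Drop repeated entries while keeping first-seen order.
fn dedup_preserving_order(paths: Vec<PathBuf>) -> Vec<PathBuf> {
    let mut unique: Vec<PathBuf> = Vec::new();
    for path in paths {
        if !unique.contains(&path) {
            unique.push(path);
        }
    }
    unique
}
```

Because the stored form is absolute, the same `*/.env` pattern requested from a different cwd produces a different path and does not match the stored grant.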
maja-openai	ef00014a46	Allow guardian bare allow output (#18797 ) ## Summary Allow guardian to skip other fields and output only `{"outcome":"allow"}` when the command is low risk. This change lets guardian reviews use a non-strict text format while keeping the JSON schema itself as plain user-visible schema data, so transport strictness is carried out-of-band instead of through a schema marker key. ## What changed - Add an explicit `output_schema_strict` flag to model prompts and pass it into `codex-api` text formatting. - Set guardian reviewer prompts to non-strict schema validation while preserving strict-by-default behavior for normal callers. - Update the guardian output contract so definitely-low-risk decisions may return only `{"outcome":"allow"}`. - Treat bare allow responses as low-risk approvals in the guardian parser. - Add tests and snapshots covering the non-strict guardian request and optional guardian output fields. ## Verification - `cargo test -p codex-core guardian::tests::guardian` - `cargo test -p codex-core guardian::tests::` - `cargo test -p codex-core client_common::tests::` - `cargo test -p codex-protocol user_input_serialization_includes_final_output_json_schema` - `cargo test -p codex-api` - `git diff --check` Note: `cargo test -p codex-core` was also attempted, but this desktop environment injects ambient config/proxy state that causes unrelated config/session tests expecting pristine defaults to fail. --------- Co-authored-by: Dylan Hurd <dylan.hurd@openai.com> Co-authored-by: Codex <noreply@openai.com>	2026-04-21 15:37:12 -07:00
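A minimal sketch of how a parser might treat the bare `{"outcome":"allow"}` response described in the commit above. Only `outcome` and `allow` come from the commit text; the `Risk` enum, the `risk` field, and `effective_risk` are hypothetical stand-ins for the optional guardian output fields.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Allow,
    Deny,
}

/// Hypothetical risk levels; the real guardian schema may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Risk {
    Low,
    High,
}

/// Already-parsed guardian output where every field except `outcome` is optional.
struct GuardianOutput {
    outcome: Outcome,
    risk: Option<Risk>,
}

/// A bare allow (no risk reported) counts as a low-risk approval.
/// Any explicitly reported risk is passed through unchanged.
fn effective_risk(output: &GuardianOutput) -> Option<Risk> {
    match (output.outcome, output.risk) {
        (Outcome::Allow, None) => Some(Risk::Low),
        (_, risk) => risk,
    }
}
```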
starr-openai	ddbe2536be	Support multiple managed environments (#18401 ) ## Summary - refactor EnvironmentManager to own keyed environments with default/local lookup helpers - keep remote exec-server client creation lazy until exec/fs use - preserve disabled agent environment access separately from internal local environment access ## Validation - not run (per Codex worktree instruction to avoid tests/builds unless requested) --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-21 15:29:35 -07:00
efrazer-oai	be75785504	fix: fully revert agent identity runtime wiring (#18757 ) ## Summary This PR fully reverts the previously merged Agent Identity runtime integration from the old stack: https://github.com/openai/codex/pull/17387/changes It removes the Codex-side task lifecycle wiring, rollout/session persistence, feature flag plumbing, lazy `auth.json` mutation, background task auth paths, and request callsite changes introduced by that stack. This leaves the repo in a clean pre-AgentIdentity integration state so the follow-up PRs can reintroduce the pieces in smaller reviewable layers. ## Stack 1. This PR: full revert 2. https://github.com/openai/codex/pull/18871: move Agent Identity business logic into a crate 3. https://github.com/openai/codex/pull/18785: add explicit AgentIdentity auth mode and startup task allocation 4. https://github.com/openai/codex/pull/18811: migrate auth callsites through AuthProvider ## Testing Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.	2026-04-21 14:30:55 -07:00
jif-oai	15b8cde2a4	chore: default multi-agent v2 fork to all (#18873) Default sub-agents v2 to `all` for the fork mode.	2026-04-21 21:54:58 +01:00
iceweasel-oai	8612714aa6	Add Windows sandbox unified exec runtime support (#15578 ) ## Summary This is the runtime/foundation half of the Windows sandbox unified-exec work. - add Windows sandbox `unified_exec` session support in `windows-sandbox-rs` for both: - the legacy restricted-token backend - the elevated runner backend - extend the PTY/process runtime so driver-backed sessions can support: - stdin streaming - stdout/stderr separation - exit propagation - PTY resize hooks - add Windows sandbox runtime coverage in `codex-windows-sandbox` / `codex-utils-pty` This PR does not enable Windows sandbox `UnifiedExec` for product callers yet because hooking this up to app-server comes in the next PR. Windows sandbox advertising is intentionally kept aligned with `main`, so sandboxed Windows callers still fall back to `ShellCommand`. This PR isolates the runtime/session layer so it can be reviewed independently from product-surface enablement. --------- Co-authored-by: jif-oai <jif@openai.com> Co-authored-by: Codex <noreply@openai.com>	2026-04-21 10:44:49 -07:00
Michael Bolin	f8562bd47b	sandboxing: intersect permission profiles semantically (#18275 ) ## Why Permission approval responses must not be able to grant more access than the tool requested. Moving this flow to `PermissionProfile` means the comparison must be profile-shaped instead of `SandboxPolicy`-shaped, and cwd-relative special paths such as `:cwd` and `:project_roots` must stay anchored to the turn that produced the request. ## What changed This implements semantic `PermissionProfile` intersection in `codex-sandboxing` for file-system and network permissions. The intersection accepts narrower path grants, rejects broader grants, preserves deny-read carve-outs and glob scan depth, and materializes cwd-dependent special-path grants to absolute paths before they can be recorded for reuse. The request-permissions response paths now use that intersection consistently. App-server captures the request turn cwd before waiting for the client response, includes that cwd in the v2 approval params, and core stores the requested profile plus cwd for direct TUI/client responses and Guardian decisions before recording turn- or session-scoped grants. The TUI app-server bridge now preserves the app-server request cwd when converting permission approval params into core events. ## Verification - `cargo test -p codex-sandboxing intersect_permission_profiles -- --nocapture` - `cargo test -p codex-app-server request_permissions_response -- --nocapture` - `cargo test -p codex-core request_permissions_response_materializes_session_cwd_grants_before_recording -- --nocapture` - `cargo check -p codex-tui --tests` - `cargo check --tests` - `cargo test -p codex-tui app_server_request_permissions_preserves_file_system_permissions`	2026-04-21 10:23:01 -07:00
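The "accept narrower, reject broader" rule from the commit above can be sketched for write paths alone. This is a simplified illustration under stated assumptions: `intersect_write_grants` is a hypothetical function, the real `PermissionProfile` intersection also covers network permissions, deny-read carve-outs, and glob scan depth, and returning the offending path as an error is a choice made for this sketch.

```rust
use std::path::PathBuf;

/// Keep a granted path only if it lies at or below a requested root.
/// Returns the first grant that is broader than (or unrelated to) the request.
fn intersect_write_grants(
    requested: &[PathBuf],
    granted: &[PathBuf],
) -> Result<Vec<PathBuf>, PathBuf> {
    let mut accepted = Vec::new();
    for grant in granted {
        // `Path::starts_with` compares whole components, so `/repository`
        // is not treated as being inside `/repo`.
        let within_request = requested.iter().any(|root| grant.starts_with(root));
        if !within_request {
            return Err(grant.clone());
        }
        accepted.push(grant.clone());
    }
    Ok(accepted)
}
```

A response that grants `/repo/src` for a request of `/repo` is accepted, while a response that grants `/` is rejected because it exceeds what the tool asked for.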
pakrym-oai	2a226096f6	Split DeveloperInstructions into individual fragments. (#18813 ) Split DeveloperInstructions into individual fragments.	2026-04-21 10:22:36 -07:00
pakrym-oai	5fe767e8e1	Refactor app-server config loading into ConfigManager (#18442 ) Localize app-server configuration loading in one place.	2026-04-21 10:22:26 -07:00
Rennie	3a9df58d06	Propagate thread id in MCP tool metadata (#18093) ## Summary - attach the authoritative Codex thread id to MCP tool request `_meta.threadId` for model-initiated tool calls - attach the same thread id for manual `mcpServer/tool/call` requests before invoking the MCP server - cover both metadata helper behavior and the manual app-server MCP path in tests This is needed because the Rust app-server is the last place that still has authoritative knowledge of “this model-generated MCP tool call belongs to conversation/thread X” before the request leaves Codex and reaches Hoopa. The change adds `threadId` to MCP request metadata in the model-generated tool-call path, using `sess.conversation_id`, and does the same for the manual `mcpServer/tool/call` path. ## Test plan - `cargo test -p codex-core mcp_tool_call_thread_id_meta_is_added_to_request_meta --lib` - `cargo test -p codex-app-server mcp_server_tool_call_returns_tool_result` Paired Hoopa consumer PR: https://github.com/openai/openai/pull/833263	2026-04-21 10:09:46 -07:00
Michael Bolin	b06fc8bd0d	core: make test-log a dev dependency (#18846 ) The `test-log` crate is only used by `codex-core` tests, so it does not need to be part of the normal `codex-core` dependency graph. Keeping `test-log` in `dev-dependencies` removes it from normal `codex-core` builds and keeps the production dependency set a little smaller. Verification: - `cargo tree -p codex-core --edges normal --invert test-log` - `cargo check -p codex-core --lib` - `cargo test -p codex-core --lib`	2026-04-21 09:48:31 -07:00
pakrym-oai	833212115e	Move external agent config out of core (#18850 ) ## Summary - Move external agent config migration logic and tests from `codex-core` into `app-server/src/config`. - Keep the migration service crate-private to app-server and update the API adapter imports. - Remove stale core re-exports and expose only the needed marketplace source helper. ## Testing - `cargo test -p codex-app-server config::external_agent_config` - `just fmt` - `just fix -p codex-app-server` - `just fix -p codex-core` - `git diff --check`	2026-04-21 08:33:58 -07:00
pash-openai	dc1a8f2190	[tool search] support namespaced deferred dynamic tools (#18413 ) Deferred dynamic tools need to round-trip a namespace so a tool returned by `tool_search` can be called through the same registry key that core uses for dispatch. This change adds namespace support for dynamic tool specs/calls, persists it through app-server thread state, and routes dynamic tool calls by full `ToolName` while still sending the app the leaf tool name. Deferred dynamic tools must provide a namespace; non-deferred dynamic tools may remain top-level. It also introduces `LoadableToolSpec` as the shared function-or-namespace Responses shape used by both `tool_search` output and dynamic tool registration, so dynamic tools use the same wrapping logic in both paths. Validation: - `cargo test -p codex-tools` - `cargo test -p codex-core tool_search` --------- Co-authored-by: Sayan Sisodiya <sayan@openai.com>	2026-04-21 14:13:08 +08:00
Michael Bolin	d62421d322	chore: document intentional await-holding cases (#18423 ) ## Why This PR prepares the stack to enable Clippy await-holding lints that were left disabled in #18178. The mechanical lock-scope cleanup is handled separately; this PR is the documentation/configuration layer for the remaining await-across-guard sites. Without explicit annotations, reviewers and future maintainers cannot tell whether an await-holding warning is a real concurrency smell or an intentional serialization boundary. ## What changed - Configures `clippy.toml` so `await_holding_invalid_type` also covers `tokio::sync::{MutexGuard,RwLockReadGuard,RwLockWriteGuard}`. - Adds targeted `#[expect(clippy::await_holding_invalid_type, reason = ...)]` annotations for intentional async guard lifetimes. - Documents the main categories of intentional cases: active-turn state transitions that must remain atomic, session-owned MCP manager accesses, remote-control websocket serialization, JS REPL kernel/process serialization, OAuth persistence, external bearer token refresh serialization, and tests that intentionally serialize shared global or session-owned state. - For external bearer token refresh, documents the existing serialization boundary: holding `cached_token` across the provider command prevents concurrent cache misses from starting duplicate refresh commands, and the current behavior is small enough that an explicit expectation is easier to maintain than adding another synchronization primitive. ## Verification - `cargo clippy -p codex-login --all-targets` - `cargo clippy -p codex-connectors --all-targets` - `cargo clippy -p codex-core --all-targets` - The follow-up PR #18698 enables `await_holding_invalid_type` and `await_holding_lock` as workspace `deny` lints, so any undocumented remaining offender will fail Clippy. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). 
Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18423). * #18698 * __->__ #18423	2026-04-20 22:41:54 -07:00
pakrym-oai	4c2e730488	Organize context fragments (#18794 ) Organize context fragments under `core/context`. Implement same trait on all of them.	2026-04-20 22:39:17 -07:00
Abhinav	ab26554a3a	Add remote_sandbox_config to our config requirements (#18763 ) ## Why Customers need finer-grained control over allowed sandbox modes based on the host Codex is running on. For example, they may want stricter sandbox limits on devboxes while keeping a different default elsewhere. Our current cloud requirements can target user/account groups, but they cannot vary sandbox requirements by host. That makes remote development environments awkward because the same top-level `allowed_sandbox_modes` has to apply everywhere. ## What Adds a new `remote_sandbox_config` section to `requirements.toml`: ```toml allowed_sandbox_modes = ["read-only"] [[remote_sandbox_config]] hostname_patterns = [".org"] allowed_sandbox_modes = ["read-only", "workspace-write"] [[remote_sandbox_config]] hostname_patterns = [".sh", "runner-*.ci"] allowed_sandbox_modes = ["read-only", "danger-full-access"] ``` During requirements resolution, Codex resolves the local host name once, preferring the machine FQDN when available and falling back to the cleaned kernel hostname. This host classification is best effort rather than authenticated device proof. Each requirements source applies its first matching `remote_sandbox_config` entry before it is merged with other sources. The shared merge helper keeps that `apply_remote_sandbox_config` step paired with requirements merging so new requirements sources do not have to remember the extra call. That preserves source precedence: a lower-precedence requirements file with a matching `remote_sandbox_config` cannot override a higher-precedence source that already set `allowed_sandbox_modes`. This also wires the hostname-aware resolution through app-server, CLI/TUI config loading, config API reads, and config layer metadata so they all evaluate remote sandbox requirements consistently. 
## Verification - `cargo test -p codex-config remote_sandbox_config` - `cargo test -p codex-config host_name` - `cargo test -p codex-core load_config_layers_applies_matching_remote_sandbox_config` - `cargo test -p codex-core system_remote_sandbox_config_keeps_cloud_sandbox_modes` - `cargo test -p codex-config` - `cargo test -p codex-core` unit tests passed; `tests/all.rs` integration matrix was intentionally stopped after the relevant focused tests passed - `just fix -p codex-config` - `just fix -p codex-core` - `cargo check -p codex-app-server`	2026-04-21 05:05:02 +00:00
Dylan Hurd	86535c9901	feat(auto-review) Handle request_permissions calls (#18393) ## Summary When auto-review is enabled, it should handle the `request_permissions` tool. We'll need to clean up the UX, but I'm planning to do that in a separate pass. ## Testing - [x] Ran locally --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-20 21:48:57 -07:00
Dylan Hurd	58e7605efc	fix(guardian) Don't hard error on feature disable (#18795) ## Summary This shouldn't error for now ## Test plan - [x] Updated unit test	2026-04-20 19:54:39 -07:00
Celia Chen	cefcfe43b9	feat: add a built-in Amazon Bedrock model provider (#18744 ) ## Why Codex needs a first-class `amazon-bedrock` model provider so users can select Bedrock without copying a full provider definition into `config.toml`. The provider has Codex-owned defaults for the pieces that should stay consistent across users: the display `name`, Bedrock `base_url`, and `wire_api`. At the same time, users still need a way to choose the AWS credential profile used by their local environment. This change makes `amazon-bedrock` a partially modifiable built-in provider: code owns the provider identity and endpoint defaults, while user config can set `model_providers.amazon-bedrock.aws.profile`. For example: ```toml model_provider = "amazon-bedrock" [model_providers.amazon-bedrock.aws] profile = "codex-bedrock" ``` ## What Changed - Added `amazon-bedrock` to the built-in model provider map with: - `name = "Amazon Bedrock"` - `base_url = "https://bedrock-mantle.us-east-1.api.aws/v1"` - `wire_api = "responses"` - Added AWS provider auth config with a profile-only shape: `model_providers.<id>.aws.profile`. - Kept AWS auth config restricted to `amazon-bedrock`; custom providers that set `aws` are rejected. - Allowed `model_providers.amazon-bedrock` through reserved-provider validation so it can act as a partial override. - During config loading, only `aws.profile` is copied from the user-provided `amazon-bedrock` entry onto the built-in provider. Other Bedrock provider fields remain hard-coded by the built-in definition. - Updated the generated config schema for the new provider AWS profile config.	2026-04-21 00:54:05 +00:00
guinness-oai	ca3246f77a	[codex] Send realtime transcript deltas on handoff (#18761 ) ## Summary - Track how many realtime transcript entries have already been attached to a background-agent handoff. - Attach only entries added since the previous handoff as `<transcript_delta>` instead of resending the accumulated transcript snapshot. - Update the realtime integration test so the second delegation carries only the second transcript delta. ## Validation - `just fmt` - `cargo test -p codex-api` - `cargo test -p codex-core inbound_handoff_request_sends_transcript_delta_after_each_handoff` - `cargo build -p codex-cli -p codex-app-server` ## Manual testing Built local debug binaries at: - `codex-rs/target/debug/codex` - `codex-rs/target/debug/codex-app-server`	2026-04-20 16:46:15 -07:00
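The delta tracking described in the commit above amounts to remembering how many entries have already been attached. A minimal sketch, assuming transcript entries are plain strings; `HandoffTranscript`, `push`, and `take_delta` are hypothetical names, not the types used in `codex-core`.

```rust
/// Transcript plus a cursor marking how many entries earlier handoffs already carried.
#[derive(Default)]
struct HandoffTranscript {
    entries: Vec<String>,
    attached: usize,
}

impl HandoffTranscript {
    /// Record a new realtime transcript entry.
    fn push(&mut self, entry: impl Into<String>) {
        self.entries.push(entry.into());
    }

    /// Return only the entries added since the previous handoff and advance the cursor.
    fn take_delta(&mut self) -> Vec<String> {
        let delta = self.entries[self.attached..].to_vec();
        self.attached = self.entries.len();
        delta
    }
}
```

Each handoff then carries only the new entries, so a second delegation does not resend what the first one already included.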
viyatb-oai	33fa952426	fix: fix stale proxy env restoration after shell snapshots (#17271 ) ## Summary This fixes a stale-environment path in shell snapshot restoration. A sandboxed command can source a shell snapshot that was captured while an older proxy process was running. If that proxy has died and come back on a different port, the snapshot can otherwise put old proxy values back into the command environment, which is how tools like `pip` end up talking to a dead proxy. The wrapper now captures the live process environment before sourcing the snapshot and then restores or clears every proxy env var from the proxy crate's canonical list. That makes proxy state after shell snapshot restoration match the current command environment, rather than whatever proxy values happened to be present in the snapshot. On macOS, the Codex-generated `GIT_SSH_COMMAND` is refreshed when the SOCKS listener changes, while custom SSH wrappers are still left alone. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-20 16:39:17 -07:00
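The restore-or-clear step from the commit above can be modeled with two maps. This is a sketch under stated assumptions: the environment is represented as a `HashMap` instead of the real shell wrapper, `restore_proxy_env` is a hypothetical function, and `PROXY_ENV_VARS` is a short illustrative list, not the proxy crate's canonical one.

```rust
use std::collections::HashMap;

/// Illustrative subset of proxy variables (assumption, not the canonical list).
const PROXY_ENV_VARS: &[&str] = &["HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY"];

/// `live` is the environment captured before the snapshot was sourced;
/// `after_snapshot` is the environment once the snapshot has been applied.
/// Every proxy variable is reset to its live value, or removed if it was not set live.
fn restore_proxy_env(live: &HashMap<String, String>, after_snapshot: &mut HashMap<String, String>) {
    for name in PROXY_ENV_VARS {
        match live.get(*name) {
            Some(value) => {
                after_snapshot.insert((*name).to_string(), value.clone());
            }
            None => {
                after_snapshot.remove(*name);
            }
        }
    }
}
```

Clearing variables that are absent from the live environment matters as much as restoring the present ones, since a stale snapshot value would otherwise point tools at a proxy that is no longer running.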
Rasmus Rygaard	7b994100b3	Add session config loader interface (#18208 ) ## Why Cloud-hosted sessions need a way for the service that starts or manages a thread to provide session-owned config without treating all config as if it came from the same user/project/workspace TOML stack. The important boundary is ownership: some values should be controlled by the session/orchestrator, some by the authenticated user, and later some may come from the executor. The earlier broad config-store shape made that boundary too fuzzy and overlapped heavily with the existing filesystem-backed config loader. This PR starts with the smaller piece we need now: a typed session config loader that can feed the existing config layer stack while preserving the normal precedence and merge behavior. ## What Changed - Added `ThreadConfigLoader` and related typed payloads in `codex-config`. - `SessionThreadConfig` currently supports `model_provider`, `model_providers`, and feature flags. - `UserThreadConfig` is present as an ownership boundary, but does not yet add TOML-backed fields. - `NoopThreadConfigLoader` preserves existing behavior when no external loader is configured. - `StaticThreadConfigLoader` supports tests and simple callers. - Taught thread config sources to produce ordinary `ConfigLayerEntry` values so the existing `ConfigLayerStack` remains the place where precedence and merging happen. - Wired the loader through `ConfigBuilder`, the config loader, and app-server startup paths so app-server can provide session-owned config before deriving a thread config. - Added coverage for: - translating typed thread config into config layers, - inserting thread config layers into the stack at the right precedence, - applying session-provided model provider and feature settings when app-server derives config from thread params. ## Follow-Ups This intentionally stops short of adding the remote/service transport. The next pieces are expected to be: 1. Define the proto/API shape for this interface. 2. Add a client implementation that can source session config from the service side. ## Verification - Added unit coverage in `codex-config` for the loader and layer conversion. - Added `codex-core` config loader coverage for thread config layer precedence. - Added app-server coverage that verifies session thread config wins over request-provided config for model provider and feature settings.	2026-04-20 23:05:49 +00:00
guinness-oai	1029742cf7	Add realtime silence tool (#18635 ) ## Summary Adds a second realtime v2 function tool, `remain_silent`, so the realtime model has an explicit non-speaking action when the collaboration mode or latest context says it should not answer aloud. This is stacked on #18597. ## Design - Advertise `remain_silent` alongside `background_agent` in realtime v2 conversational sessions. - Parse `remain_silent` function calls into a typed `RealtimeEvent::NoopRequested` event. - Have core answer that function call with an empty `function_call_output` and deliberately avoid `response.create`, so no follow-up realtime response is requested. - Keep the event hidden from app-server/TUI surfaces; it is operational plumbing, not user-visible conversation content.	2026-04-20 15:43:20 -07:00
Thibault Sottiaux	54bd07d28c	[codex] prefer inherited spawn agent model (#18701 ) This updates the spawn-agent tool contract so subagents are presented as inheriting the parent model by default. The visible model list is now framed as optional overrides, the model parameter tells callers to leave it unset and the delegation guidance no longer nudges models toward picking a smaller/mini override. Fixes reports that 5.4 would occasionally pick 5.2 or lower as sub-agents.	2026-04-20 22:34:08 +00:00
Tom	46e5814f77	Add experimental remote thread store config (#18714 ) Add experimental config to use remote thread store rather than local thread store implementation in app server	2026-04-20 22:20:39 +00:00
Ahmed Ibrahim	cc96a03f10	Fix stale model test fixtures (#18719 ) Fixes stale test fixtures left after the active bundled model catalog updates in #18586 and #18388. Those changes made `gpt-5.4` the current default and removed several older hardcoded slugs, which left Windows Bazel shards failing TUI and config tests. What changed: - Refresh TUI model migration, availability NUX, plan-mode, status, and snapshot fixtures to use active bundled model slugs. - Update the config edit test expectation for the TOML-quoted `"gpt-5.2"` migration key. - Move the model catalog tests into `codex-rs/tui/src/app/tests/model_catalog.rs` so touching them does not trip the blob-size policy for `app.rs`. Verification: - CI Bazel/lint checks are expected to cover the affected test shards.	2026-04-20 21:52:30 +00:00
guinness-oai	126bd6e7a8	Update realtime handoff transcript handling (#18597 ) ## Summary This PR aims to improve integration between the realtime model and the codex agent by sharing more context with each other. In particular, we now share full realtime conversation transcript deltas in addition to the delegation message. realtime_conversation.rs now turns a handoff into: ``` <realtime_delegation> <input>...</input> <transcript_delta>...</transcript_delta> </realtime_delegation> ``` ## Implementation notes The transcript is accumulated in the realtime websocket layer as parsed realtime events arrive. When a background-agent handoff is requested, the current transcript snapshot is copied onto the handoff event and then serialized by `realtime_conversation.rs` into the hidden realtime delegation envelope that Codex receives as user-turn context. For Realtime V2, the session now explicitly enables input audio transcription, and the parser handles the relevant input/output transcript completion events so the snapshot includes both user speech and realtime model responses. The delegation `<input>` remains the actual handoff request, while `<transcript_delta>` carries the surrounding conversation history for context. Reviewers should note that the transcript payload is intended for Codex context sharing, not UI rendering. The realtime delegation envelope should stay hidden from the user-facing transcript surface, while still being included in the background-agent turn so Codex can answer with the same conversational context the realtime model had.	2026-04-20 14:04:09 -07:00
Dylan Hurd	14ebfbced9	chore(guardian) disable mcps and plugins (#18722 ) ## Summary Disables apps, plugins, mcps for the guardian subagent thread ## Testing - [x] Added unit tests	2026-04-20 13:43:50 -07:00
rhan-oai	7f53e47250	[codex-analytics] guardian review analytics schema polishing (#17692 ) ## Why Guardian review analytics needs a Rust event shape that matches the backend schema while avoiding unnecessary PII exposure from reviewed tool calls. This PR narrows the analytics payload to the fields we intend to emit and keeps shared Guardian assessment enums in protocol instead of duplicating equivalent analytics-only enums. ## What changed - Uses protocol Guardian enums directly for `risk_level`, `user_authorization`, `outcome`, and command source values. - Removes high-risk reviewed-action fields from the analytics payload, including raw commands, display strings, working directories, file paths, network targets/hosts, justification text, retry reason, and rationale text. - Makes `target_item_id` and `tool_call_count` nullable so the Codex event can represent cases where the app-server protocol or producer does not have those values. - Keeps lower-risk structured reviewed-action metadata such as sandbox permissions, permission profile, `tty`, `execve` source/program, network protocol/port, and MCP connector/tool labels. - Adds an analytics reducer/client test covering `codex_guardian_review` serialization with an optional `target_item_id` and absent removed fields. ## Verification - `cargo test -p codex-analytics guardian_review_event_ingests_custom_fact_with_optional_target_item` - `cargo fmt --check` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17692). * #17696 * #17695 * #17693 * __->__ #17692	2026-04-20 13:08:17 -07:00
Akshay Nathan	34a3e85fcd	Wire the PatchUpdated events through app_server (#18289 ) Wires patch_updated events through app_server. These events are parsed and streamed while apply_patch is being written by the model. Also adds 500ms of buffering to the patch_updated events in the diff_consumer. The eventual goal is to use this to display better progress indicators in the codex app.	2026-04-20 10:44:03 -07:00
Ahmed Ibrahim	316cf0e90b	Update models.json (#18586 ) - Replace the active models-manager catalog with the deleted core catalog contents. - Replace stale hardcoded test model slugs with current bundled model slugs. - Keep this as a stacked change on top of the cleanup PR.	2026-04-20 10:27:01 -07:00
Michael Bolin	5d5d610740	refactor: use semaphores for async serialization gates (#18403 ) This is the second cleanup in the await-holding lint stack. The higher-level goal, following https://github.com/openai/codex/pull/18178 and https://github.com/openai/codex/pull/18398, is to enable Clippy coverage for guards held across `.await` points without carrying broad suppressions. The stack is working toward enabling Clippy's [`await_holding_lock`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_lock) lint and the configurable [`await_holding_invalid_type`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_invalid_type) lint for Tokio guard types. Several existing fields used `tokio::sync::Mutex<()>` only as one-at-a-time async gates. Those guards intentionally lived across `.await` while an operation was serialized. A mutex over `()` suggests protected data and trips the await-holding lint shape; a single-permit `tokio::sync::Semaphore` expresses the intended serialization directly. ## What changed - Replace `Mutex<()>` serialization gates with `Semaphore::new(1)` for agent identity ensure, exec policy updates, guardian review session reuse, plugin remote sync, managed network proxy refresh, auth token refresh, and RMCP session recovery. - Update call sites from `lock().await` / `try_lock()` to `acquire().await` / `try_acquire()`. - Map closed-semaphore errors into the existing local error types, even though these semaphores are owned for the lifetime of their managers. - Update session test builders for the new `managed_network_proxy_refresh_lock` type. ## Verification - The split stack was verified at the final lint-enabling head with `just clippy`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18403). * #18698 * #18423 * #18418 * __->__ #18403	2026-04-20 17:21:29 +00:00
Michael Bolin	dcec516313	protocol: canonicalize file system permissions (#18274 ) ## Why `PermissionProfile` needs stable, canonical file-system semantics before it can become the primary runtime permissions abstraction. Without a canonical form, callers have to keep re-deriving legacy sandbox maps and profile comparisons remain lossy or order-dependent. ## What changed This adds canonicalization helpers for `FileSystemPermissions` and `PermissionProfile`, expands special paths into explicit sandbox entries, and updates permission request/conversion paths to consume those canonical entries. It also tightens the legacy bridge so root-wide write profiles with narrower carveouts are not silently projected as full-disk legacy access. ## Verification - `cargo test -p codex-protocol root_write_with_read_only_child_is_not_full_disk_write -- --nocapture` - `cargo test -p codex-sandboxing permission -- --nocapture` - `cargo test -p codex-tui permissions -- --nocapture`	2026-04-20 09:57:03 -07:00
Eric Traut	fa0e2ba87c	Avoid false shell snapshot cleanup warnings (#18441 ) ## Why Fresh app-server thread startup can create a shell snapshot through a temp file and then promote it to the final snapshot path. The previous implementation briefly wrapped the temp path in `ShellSnapshot`, so after a successful rename its `Drop` attempted to delete the old temp path and could log a false `ENOENT` warning. Fixes #17549. ## What changed - Validate the temp snapshot path directly before promotion. - Rename the temp path directly to the final snapshot path. - Keep explicit cleanup of the temp path on validation or finalization failures.	2026-04-20 15:15:05 +01:00
Adrian	904c751a40	[codex] Use background agent task auth for backend calls (#18094 ) ## Summary Introduces a single background/control-plane agent task for ChatGPT backend requests that do not have a thread-scoped task, with `AuthManager` owning the default ChatGPT backend authorization decision. Callers now ask `AuthManager` for the default ChatGPT backend authorization header. `AuthManager` decides whether that is bearer or background AgentAssertion based on config/internal state, while low-level bootstrap paths can explicitly request bearer-only auth. This PR is stacked on PR4 and focuses on the shared background task auth plumbing plus the first tranche of backend/control-plane consumers. The remaining callsite wiring is split into PR4.2 to keep review size down. ## Stack - PR1: https://github.com/openai/codex/pull/17385 - add `features.use_agent_identity` - PR2: https://github.com/openai/codex/pull/17386 - register agent identities when enabled - PR3: https://github.com/openai/codex/pull/17387 - register agent tasks when enabled - PR3.1: https://github.com/openai/codex/pull/17978 - persist and prewarm registered tasks per thread - PR4: https://github.com/openai/codex/pull/17980 - use task-scoped `AgentAssertion` for downstream calls - PR4.1: this PR - introduce AuthManager-owned background/control-plane `AgentAssertion` auth - PR4.2: https://github.com/openai/codex/pull/18260 - use background task auth for additional backend/control-plane calls ## What Changed - add background task registration and assertion minting inside `codex-login` - persist `agent_identity.background_task_id` separately from per-session task state - make `BackgroundAgentTaskManager` private to `codex-login`; call sites do not instantiate or pass it around - teach `AuthManager` the ChatGPT backend base URL and feature-derived background auth mode from resolved config - expose bearer-only helpers for bootstrap/registration/refresh-style paths that must not use AgentAssertion - wire `AuthManager` default ChatGPT authorization through app listing, connector directory listing, remote plugins, MCP status/listing, analytics, and core-skills remote calls - preserve bearer fallback when the feature is disabled, the backend host is unsupported, or background task registration is not available ## Validation - `just fmt` - `cargo check -p codex-core -p codex-login -p codex-analytics -p codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p codex-models-manager -p codex-chatgpt -p codex-model-provider -p codex-mcp -p codex-core-skills` - `cargo test -p codex-login agent_identity` - `cargo test -p codex-model-provider bearer_auth_provider` - `cargo test -p codex-core agent_assertion` - `cargo test -p codex-app-server remote_control` - `cargo test -p codex-cloud-requirements fetch_cloud_requirements` - `cargo test -p codex-models-manager manager::tests` - `cargo test -p codex-chatgpt` - `cargo test -p codex-cloud-tasks` - `just fix -p codex-core -p codex-login -p codex-analytics -p codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p codex-models-manager -p codex-chatgpt -p codex-model-provider -p codex-mcp -p codex-core-skills` - `just fix -p codex-app-server` - `git diff --check`	2026-04-20 06:50:28 -07:00
jif-oai	2c59806fe0	feat: add metric to track the number of turns with memory usage (#18662 ) Add a metric `codex.turn.memory` to know if a turn used memories or not. This is not part of the other turn metrics as a label to limit cardinality	2026-04-20 14:31:22 +01:00
jif-oai	1c24347772	feat: chronicle alias (#18651 ) Rename Telepathy to Chronicle and add an alias for backward compatibility	2026-04-20 11:52:21 +01:00
jif-oai	fc758af9eb	fix: exec policy loading for sub-agents (#18654 )	2026-04-20 11:51:58 +01:00
jif-oai	ff6a5804d2	nit: telepathy to chronicle in tests (#18652 )	2026-04-20 11:51:55 +01:00
jif-oai	be4fe9f9b2	feat: add `--ignore-user-config` and `--ignore-rules` (#18646 ) Add those 2 flags to be able to fully isolate a run of `codex exec` from any rules or tools. This will be used by Chronicle	2026-04-20 11:27:47 +01:00
jif-oai	7d8bd69283	fix: FS watcher when file does not exist yet (#18492 ) The initial goal of this PR was to stabilise the test `fs_watch_allows_missing_file_targets`. After further investigation, it turns out that this test was always failing and the instability mostly came from a race between timeouts. The goal of the test was to check what happens if a notifier gets subscribed while a file does not exist yet. But the main code was actually broken: when the file did not exist yet, the notifier never notified anything (even if the file ended up being created). This PR fixes the main code (and the test). To do so, we watch the parent directory when a file does not exist and refresh when the file gets created.	2026-04-20 11:23:00 +01:00
jif-oai	7171b25b30	fix: main 2 (#18649 )	2026-04-20 10:53:54 +01:00
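
The proxy handling described in 33fa952426 (#17271) can be sketched as a pure function over two environment maps. This is a minimal illustration under stated assumptions, not the code from that commit: `restore_proxy_env` and `PROXY_ENV_VARS` are hypothetical names, and the variable list here is a stand-in for the proxy crate's canonical list, which is not shown in the log.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the proxy crate's canonical list of proxy variables.
const PROXY_ENV_VARS: &[&str] = &["HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY"];

/// Computes the environment a command sees after a shell snapshot is sourced.
/// `live` is the process environment captured before sourcing; `snapshot` holds
/// the variables the snapshot sets. Snapshot values win for ordinary variables,
/// but every proxy variable is reset to its live value, or removed when the
/// live environment does not define it.
fn restore_proxy_env(
    live: &HashMap<String, String>,
    snapshot: &HashMap<String, String>,
) -> HashMap<String, String> {
    let mut merged = live.clone();
    // Sourcing the snapshot overlays its variables onto the live environment.
    merged.extend(snapshot.iter().map(|(k, v)| (k.clone(), v.clone())));
    for name in PROXY_ENV_VARS {
        match live.get(*name) {
            // Restore: the live value replaces whatever the snapshot carried.
            Some(value) => {
                merged.insert((*name).to_string(), value.clone());
            }
            // Clear: a proxy value present only in the snapshot is stale.
            None => {
                merged.remove(*name);
            }
        }
    }
    merged
}
```

With a live `HTTPS_PROXY` on a new port and a snapshot that still carries the old port plus an `HTTP_PROXY` the live environment no longer sets, the result keeps the live `HTTPS_PROXY`, drops `HTTP_PROXY`, and leaves non-proxy snapshot variables untouched.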

2879 Commits