codex

mirror of https://github.com/openai/codex.git synced 2026-05-03 02:46:39 +00:00

Author	SHA1	Message	Date
Won Park	722e8f08e1	unifying all image saves to /tmp to bug-proof (#14149 ) image-gen feature will have the model saving to /tmp by default + at all times	2026-03-11 12:33:08 -07:00
Ahmed Ibrahim	91ca20c7c3	Add spawn_agent model overrides (#14160 ) - add `model` and `reasoning_effort` to the `spawn_agent` schema so the values pass through - validate requested models against `model.model` and only check that the selected model supports the requested reasoning effort --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-11 12:33:08 -07:00
xl-openai	d751e68f44	feat: Allow sync with remote plugin status. (#14176 ) Add forceRemoteSync to plugin/list. When it is set to True, we will sync the local plugin status with the remote one (backend-api/plugins/list).	2026-03-11 12:33:08 -07:00
Matthew Zeng	f2d66fadd8	add(core): arc_monitor (#13936 ) ## Summary - add ARC monitor support for MCP tool calls by serializing MCP approval requests into the ARC action shape and sending the relevant conversation/policy context to the `/api/codex/safety/arc` endpoint - route ARC outcomes back into MCP approval flow so `ask-user` falls back to a user prompt and `steer-model` blocks the tool call, with guardian/ARC tests covering the new request shape - update the TUI approval copy from “Approve Once” to “Allow” / “Allow for this session” and refresh the related snapshots --------- Co-authored-by: Fouad Matin <fouad@openai.com> Co-authored-by: Fouad Matin <169186268+fouad-openai@users.noreply.github.com>	2026-03-11 12:33:08 -07:00
pakrym-oai	c4d35084f5	Reuse McpToolOutput in McpHandler (#14229 ) We already have a type to represent the MCP tool output, reuse it instead of the custom McpHandlerOutput	2026-03-11 12:33:07 -07:00
pakrym-oai	00ea8aa7ee	Expose strongly-typed result for exec_command (#14183 ) Summary - document output types for the various tool handlers and registry so the API exposes richer descriptions - update unified execution helpers and client tests to align with the new output metadata - clean up unused helpers across tool dispatch paths Testing - Not run (not requested)	2026-03-11 12:33:07 -07:00
Eric Traut	f9cba5cb16	Log ChatGPT user ID for feedback tags (#13901 ) There are some bug investigations that currently require us to ask users for their user ID even though they've already uploaded logs and session details via `/feedback`. This frustrates users and increases the time for diagnosis. This PR includes the ChatGPT user ID in the metadata uploaded for `/feedback` (both the TUI and app-server).	2026-03-11 12:33:07 -07:00
Eric Traut	026cfde023	Fix Linux tmux segfault in user shell lookup (#13900 ) Replace the Unix shell lookup path in `codex-rs/core/src/shell.rs` to use `libc::getpwuid_r()` instead of `libc::getpwuid()` when resolving the current user's shell. Why: - `getpwuid()` can return pointers into libc-managed shared storage - on the musl static Linux build, concurrent callers can race on that storage - this matches the crash pattern reported in tmux/Linux sessions with parallel shell activity Refs: - Fixes #13842	2026-03-11 12:33:07 -07:00
Eric Traut	7144f84c69	Fix release-mode integration test compiler failure (#13603 ) Addresses #13586 This doesn't affect our CI scripts. It was user-reported. Summary - add `wiremock::ResponseTemplate` and `body_string_contains` imports behind `#[cfg(not(debug_assertions))]` in `codex-rs/core/tests/suite/view_image.rs` so release builds only pull the helpers they actually use	2026-03-11 12:33:07 -07:00
Ahmed Ibrahim	6b7253b123	Fix unified exec test output assertion (#14184 ) ## Summary - update the unified exec test to use truncated_output() instead of the removed output field - fix the compile failure on latest main after ExecCommandToolOutput changed shape	2026-03-09 23:12:36 -07:00
Ahmed Ibrahim	aa6a57dfa2	Stabilize incomplete SSE retry test (#13879 ) ## What changed - The retry test now uses the same streaming SSE test server used by production-style tests instead of a wiremock sequence. - The fixture is resolved via `find_resource!`, and the test asserts that exactly two outbound requests were sent. ## Why this fixes the flake - The old wiremock sequence approximated early-close behavior, but it did not reproduce the same streaming semantics the real client sees. - That meant the retry path depended on mock implementation details instead of on the actual transport behavior we care about. - Switching to the streaming SSE helper makes the test exercise the real early-close/retry contract, and counting requests directly verifies that we retried exactly once rather than merely hoping the sequence aligned. ## Scope - Test-only change.	2026-03-09 22:34:44 -07:00
Ahmed Ibrahim	2e24be2134	Use realtime transcript for handoff context (#14132 ) - collect input/output transcript deltas into active handoff transcript state - attach and clear that transcript on each handoff, and regenerate schema/tests	2026-03-09 22:30:03 -07:00
Channing Conger	c6343e0649	Implemented thread-level atomic elicitation counter for stopwatch pausing (#12296 ) ### Purpose While trying to build out CLI-Tools for the agent to use under skills we have found that those tools sometimes need to invoke a user elicitation. These elicitations are handled out of band of the codex app-server but need to indicate to the exec manager that the command running is not going to progress on the usual timeout horizon. ### Example Model calls universal exec: `$ download-credit-card-history --start-date 2026-01-19 --end-date 2026-02-19 > credit_history.jsonl` download-cred-card-history might hit a hosted/preauthenticated service to fetch data. That service might decide that the request requires an end user approval the access to the personal data. It should be able to signal to the running thread that the command in question is blocked on user elicitation. In that case we want the exec to continue, but the timeout to not expire on the tool call, essentially freezing time until the user approves or rejects the command at which point the tool would signal the app-server to decrement the outstanding elicitation count. Now timeouts would proceed as normal. ### What's Added - New v2 RPC methods: - thread/increment_elicitation - thread/decrement_elicitation - Protocol updates in: - codex-rs/app-server-protocol/src/protocol/common.rs - codex-rs/app-server-protocol/src/protocol/v2.rs - App-server handlers wired in: - codex-rs/app-server/src/codex_message_processor.rs ### Behavior - Counter starts at 0 per thread. - increment atomically increases the counter. - decrement atomically decreases the counter; decrement at 0 returns invalid request. - Transition rules: - 0 -> 1: broadcast pause state, pausing all active stopwatches immediately. - \>0 -> >0: remain paused. - 1 -> 0: broadcast unpause state, resuming stopwatches. - Core thread/session logic: - codex-rs/core/src/codex_thread.rs - codex-rs/core/src/codex.rs - codex-rs/core/src/mcp_connection_manager.rs ### Exec-server stopwatch integration - Added centralized stopwatch tracking/controller: - codex-rs/exec-server/src/posix/stopwatch_controller.rs - Hooked pause/unpause broadcast handling + stopwatch registration: - codex-rs/exec-server/src/posix/mcp.rs - codex-rs/exec-server/src/posix/stopwatch.rs - codex-rs/exec-server/src/posix.rs	2026-03-09 22:29:26 -07:00
Matthew Zeng	566e4cee4b	[apps] Fix apps enablement condition. (#14011 ) - [x] Fix apps enablement condition to check both the feature flag and that the user is not an API key user.	2026-03-09 22:25:43 -07:00
pakrym-oai	a9ae43621b	Move exec command truncation into ExecCommandToolOutput (#14169 ) Summary - relocate truncation logic for exec command output into the new `ExecCommandToolOutput` response helper instead of centralized handler code - update all affected tools and unified exec handling to use the new response item structure and eliminate `Function(FunctionToolOutput)` responses - adjust context, registry, and handler interfaces to align with the new response semantics and error fields Testing - Not run (not requested)	2026-03-09 22:13:48 -07:00
xl-openai	0c33af7746	feat: support disabling bundled system skills (#13792 ) Support disable bundled system skills with a config: [skills.bundled] enabled = false	2026-03-09 22:02:53 -07:00
pakrym-oai	710682598d	Export tools module into code mode runner (#14167 ) Summary - allow `code_mode` to pass enabled tools metadata to the runner and expose them via `tools.js` - import tools inside JavaScript rather than relying only on globals or proxies for nested tool calls - update specs, docs, and tests to exercise the new bridge and explain the tooling changes Testing - Not run (not requested)	2026-03-09 21:59:09 -07:00
Dylan Hurd	772259b01f	fix(core) default RejectConfig.request_permissions (#14165 ) ## Summary Adds a default here so existing config deserializes ## Testing - [x] Added a unit test	2026-03-10 04:56:23 +00:00
pakrym-oai	d71e042694	Enforce single tool output type in codex handlers (#14157 ) We'll need to associate output schema with each tool. Each tool can only have on output type.	2026-03-09 21:49:44 -07:00
Andrei Eternal	244b2d53f4	start of hooks engine (#13276 ) (Experimental) This PR adds a first MVP for hooks, with SessionStart and Stop The core design is: - hooks live in a dedicated engine under codex-rs/hooks - each hook type has its own event-specific file - hook execution is synchronous and blocks normal turn progression while running - matching hooks run in parallel, then their results are aggregated into a normalized HookRunSummary On the AppServer side, hooks are exposed as operational metadata rather than transcript-native items: - new live notifications: hook/started, hook/completed - persisted/replayed hook results live on Turn.hookRuns - we intentionally did not add hook-specific ThreadItem variants Hooks messages are not persisted, they remain ephemeral. The context changes they add are (they get appended to the user's prompt)	2026-03-10 04:11:31 +00:00
pakrym-oai	da616136cc	Add code_mode experimental feature (#13418 ) A much narrower and more isolated (no node features) version of js_repl	2026-03-09 20:56:27 -07:00
pakrym-oai	aa04ea6bd7	Refactor tool output into trait implementations (#14152 ) First state to making tool outputs strongly typed (and `renderable`).	2026-03-09 19:38:32 -07:00
viyatb-oai	1165a16e6f	fix: keep permissions profiles forward compatible (#14107 ) ## Summary - preserve unknown `:special_path` tokens, including nested entries, so older Codex builds warn and ignore instead of failing config load - fail closed with a startup warning when a permissions profile has missing or empty filesystem entries instead of aborting profile compilation - normalize Windows verbatim paths like `\?\C:\...` before absolute-path validation while keeping explicit errors for truly invalid paths ## Testing - just fmt - cargo test -p codex-core permissions_profiles_allow - cargo test -p codex-core normalize_absolute_path_for_platform_simplifies_windows_verbatim_paths - cargo test -p codex-protocol unknown_special_paths_are_ignored_by_legacy_bridge - cargo clippy -p codex-core -p codex-protocol --all-targets -- -D warnings - cargo clean	2026-03-09 18:43:38 -07:00
viyatb-oai	b0cbc25a48	fix(protocol): preserve legacy workspace-write semantics (#13957 ) ## Summary This is a fast follow to the initial `[permissions]` structure. - keep the new split-policy carveout behavior for narrower non-write entries under broader writable roots - preserve legacy `WorkspaceWrite` semantics by using a cwd-aware bridge that drops only redundant nested readable roots when projecting from `SandboxPolicy` - route the legacy macOS seatbelt adapter through that same legacy bridge so redundant nested readable roots do not become read-only carveouts on macOS - derive the legacy bridge for `command_exec` using the sandbox root cwd rather than the request cwd so policy derivation matches later sandbox enforcement - add regression coverage for the legacy macOS nested-readable-root case ## Examples ### Legacy `workspace-write` on macOS A legacy `workspace-write` policy can redundantly list a nested readable root under an already-writable workspace root. For example, legacy config can effectively mean: - workspace root (`.` / `cwd`) is writable - `docs/` is also listed in `readable_roots` The new shared split-policy helper intentionally treats a narrower non-write entry under a broader writable root as a carveout for real `[permissions]` configs. Without this fast follow, the unchanged macOS seatbelt legacy adapter could project that legacy shape into a `FileSystemSandboxPolicy` that treated `docs/` like a read-only carveout under the writable workspace root. In practice, legacy callers on macOS could unexpectedly lose write access inside `docs/`, even though that path was writable before the `[permissions]` migration work. This change fixes that by routing the legacy seatbelt path through the cwd-aware legacy bridge, so: - legacy `workspace-write` keeps `docs/` writable when `docs/` was only a redundant readable root - explicit `[permissions]` entries like `'.' = 'write'` and `'docs' = 'read'` still make `docs/` read-only, which is the new intended split-policy behavior ### Legacy `command_exec` with a subdirectory cwd `command_exec` can run a command from a request cwd that is narrower than the sandbox root cwd. For example: - sandbox root cwd is `/repo` - request cwd is `/repo/subdir` - legacy policy is still `workspace-write` rooted at `/repo` Before this fast follow, `command_exec` derived the legacy bridge using the request cwd, but the sandbox was later built using the sandbox root cwd. That mismatch could miss redundant legacy readable roots during projection and accidentally reintroduce read-only carveouts for paths that should still be writable under the legacy model. This change fixes that by deriving the legacy bridge with the same sandbox root cwd that sandbox enforcement later uses. ## Verification - `just fmt` - `cargo test -p codex-core seatbelt_legacy_workspace_write_nested_readable_root_stays_writable` - `cargo test -p codex-core test_sandbox_config_parsing` - `cargo clippy -p codex-core -p codex-app-server --all-targets -- -D warnings` - `cargo clean`	2026-03-09 18:43:27 -07:00
Dylan Hurd	6da84efed8	feat(approvals) RejectConfig for request_permissions (#14118 ) ## Summary We need to support allowing request_permissions calls when using `Reject` policy <img width="1133" height="588" alt="Screenshot 2026-03-09 at 12 06 40 PM" src="https://github.com/user-attachments/assets/a8df987f-c225-4866-b8ab-5590960daec5" /> Note that this is a backwards-incompatible change for Reject policy. I'm not sure if we need to add a default based on our current use/setup ## Testing - [x] Added tests - [x] Tested locally	2026-03-09 18:16:54 -07:00
Dylan Hurd	c1defcc98c	fix(core) RequestPermissions + ApplyPatch (#14055 ) ## Summary The apply_patch tool should also respect AdditionalPermissions ## Testing - [x] Added unit tests --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-09 16:11:19 -07:00
Owen Lin	d309c102ef	fix(core): use dedicated types for responsesapi web search tool config (#14136 ) This changes the web_search tool spec in codex-core to use dedicated Responses-API payload structs instead of shared config types and custom serializers. Previously, `ToolSpec::WebSearch` stored `WebSearchFilters` and `WebSearchUserLocation` directly and relied on hand-written serializers to shape the outgoing JSON. This worked, but it mixed config/schema types with the OpenAI Responses payload contract and created an easy place for drift if those shared types changed later. ### Why This keeps the boundary clearer: - app-server/config/schema types stay focused on config - Responses tool payload types stay focused on the OpenAI wire format It also makes the serialization behavior obvious from the structs themselves, instead of hiding it in custom serializer functions.	2026-03-09 14:58:33 -07:00
Dylan Hurd	d241dc598c	feat(core) Persist request_permission data across turns (#14009 ) ## Summary request_permissions flows should support persisting results for the session. Open Question: Still deciding if we need within-turn approvals - this adds complexity but I could see it being useful ## Testing - [x] Updated unit tests --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-09 14:36:38 -07:00
Won Park	42f20a6845	pass on save info to model + ui tweaks (#14123 ) Passing on more information to the model for context purposes, to streamline image-identification.	2026-03-09 20:10:15 +00:00
Ahmed Ibrahim	44ecc527cb	Stabilize RMCP streamable HTTP readiness tests (#13880 ) ## What changed - The RMCP streamable HTTP tests now wait for metadata and tool readiness before issuing tool calls. - OAuth state is isolated per test home. - The helper server startup path now uses bounded bind retries so transient `AddrInUse` collisions do not fail the test immediately. ## Why this fixes the flake - The old tests could begin issuing tool requests before the helper server had finished advertising its metadata and tools, so the first request sometimes raced the server startup sequence. - On top of that, shared OAuth state and occasional bind collisions on CI runners introduced cross-test environmental noise unrelated to the functionality under test. - Readiness polling makes the client wait for an observable “server is ready” signal, while isolated state and bounded bind retries remove external contention that was causing intermittent failures. ## Scope - Test-only change.	2026-03-09 19:52:55 +00:00
Owen Lin	da991bdf3a	feat(otel): Centralize OTEL metric names and shared tag builders (#14117 ) This cleans up a bunch of metric plumbing that had started to drift. The main change is making `codex-otel` the canonical home for shared metric definitions and metric tag helpers. I moved the `turn/thread` metric names that were still duplicated into the OTEL metric registry, added a shared `metrics::tags` module for common tag keys and session tag construction, and updated `SessionTelemetry` to build its metadata tags through that shared path. On the codex-core side, TTFT/TTFM now use the shared metric-name constants instead of local string definitions. I also switched the obvious remaining turn/thread metric callsites over to the shared constants, and added a small helper so TTFT/TTFM can attach an optional sanitized client.name tag from TurnContext. This should make follow-on telemetry work less ad hoc: - one canonical place for metric names - one canonical place for common metric tag keys/builders - less duplication between `codex-core` and `codex-otel`	2026-03-09 12:46:42 -07:00
sayan-oai	6ad448b658	chore: plugin/uninstall endpoint (#14111 ) add `plugin/uninstall` app-server endpoint to fully rm plugin from plugins cache dir and rm entry from user config file. plugin-enablement is session-scoped, so uninstalls are only picked up in new sessions (like installs). added tests.	2026-03-09 12:40:25 -07:00
Dylan Hurd	0334ddeccb	fix(ci) Faster shell_command::unicode_output test (#14114 ) ## Summary Alternative to #14061 - we need to use a child process on windows to correctly validate Powershell behavior. ## Testing - [x] These are tests	2026-03-09 19:09:56 +00:00
Ahmed Ibrahim	fefd01b9e0	Stabilize resumed rollout messages (#14060 ) ## What changed - add a bounded `resume_until_initial_messages` helper in `core/tests/suite/resume.rs` - retry the resume call until `initial_messages` contains the fully persisted final turn shape before asserting ## Why this fixes flakiness The old test resumed once immediately after `TurnComplete` and sometimes read rollout state before the final turn had been persisted. That made the assertion race persistence timing instead of checking the resumed message shape. The new helper polls for up to two seconds in 10ms steps and only asserts once the expected message sequence is actually present, so the test waits for the real readiness condition instead of depending on a lucky timing window. ## Scope - test-only - no production logic change	2026-03-09 11:48:13 -07:00
Ahmed Ibrahim	e03e9b63ea	Stabilize guardian approval coverage (#14103 ) ## Summary - align the guardian permission test with the actual sandbox policy it widens and use a slightly larger Windows-only timeout budget - expose the additional-permissions normalization helper to the guardian test module - replace the guardian popup snapshot assertion with targeted string assertions ## Why this fixes the flake This group was carrying two separate sources of drift. The guardian core test widened derived sandbox policies without updating the source sandbox policy, and it used a Windows command/timeout combination that was too tight on slower runners. Separately, the TUI test was snapshotting the full popup even though unrelated feature text changes were the only thing moving. The new assertions keep coverage on the guardian entry itself while removing unrelated snapshot churn.	2026-03-09 11:23:20 -07:00
Ahmed Ibrahim	ad57505ef5	Stabilize interrupted task approval cleanup (#14102 ) ## Summary - drain the active turn tasks before clearing pending approvals during interruption - keep the turn in hand long enough for interrupted tasks to observe cancellation first ## Why this fixes the flake Interrupted turns could clear pending approvals too early, which let an in-flight approval wait surface as a model-visible rejection before the turn emitted `TurnAborted`. Reordering the cleanup removes that race without changing the steady-state task model.	2026-03-09 11:22:51 -07:00
Ahmed Ibrahim	75e608343c	Stabilize realtime startup context tests (#13876 ) ## What changed - The realtime startup-context tests no longer assume the interesting websocket payload is always `connection 1 / request 0`. - Instead, they now wait for the first outbound websocket request that actually carries `session.instructions`, regardless of which websocket connection won the accept-order race on the runner. - The env-key fallback test stays serialized because it mutates process environment. ## Why this fixes the flake - The old test synchronized on the mirrored `session.updated` client event and then inspected a fixed websocket slot. - On CI, the response websocket and the realtime websocket can race each other during startup. When the response websocket wins that race, the fixed slot can contain `response.create` instead of the startup-context-bearing `session.update` request the test actually cares about. - That made the test fail nondeterministically by inspecting the wrong request, or by timing out waiting on a secondary event even though the real outbound request path was correct. - Waiting directly on the first request whose payload includes `session.instructions` removes both ordering assumptions and makes the assertion line up with the actual contract under test. - Separately, serializing the environment-mutating fallback case prevents unrelated tests from seeing partially updated auth state. ## Scope - Test-only change.	2026-03-09 10:57:43 -07:00
Ahmed Ibrahim	4a0e6dc916	Serialize shell snapshot stdin test (#13878 ) ## What changed - `snapshot_shell_does_not_inherit_stdin` now runs under its own serial key. - The change isolates it from other Unix shell-snapshot tests that also interact with stdin. ## Why this fixes the flake - The failure was not a shell-snapshot logic bug. It was shared-stdin interference between concurrently executing tests. - When multiple tests compete for inherited stdin at the same time, one test can observe EOF or consumed input that actually belongs to a different test. - Running this specific test in a dedicated serial bucket guarantees exclusive ownership of stdin, which makes the assertion deterministic without weakening coverage. ## Scope - Test-only change.	2026-03-09 10:44:13 -07:00
Charley Cunningham	f23fcd6ced	guardian initial feedback / tweaks (#13897 ) ## Summary - remove the remaining model-visible guardian-specific `on-request` prompt additions so enabling the feature does not change the main approval-policy instructions - neutralize user-facing guardian wording to talk about automatic approval review / approval requests rather than a second reviewer or only sandbox escalations - tighten guardian retry-context handling so agent-authored `justification` stays in the structured action JSON and is not also injected as raw retry context - simplify guardian review plumbing in core by deleting dead prompt-append paths and trimming some request/transcript setup code ## Notable Changes - delete the dead `permissions/approval_policy/guardian.md` append path and stop threading `guardian_approval_enabled` through model-facing developer-instruction builders - rename the experimental feature copy to `Automatic approval review` and update the `/experimental` snapshot text accordingly - make approval-review status strings generic across shell, patch, network, and MCP review types - forward real sandbox/network retry reasons for shell and unified-exec guardian review, but do not pass agent-authored justification as raw retry context - simplify `guardian.rs` by removing the one-field request wrapper, deduping reasoning-effort selection, and cleaning up transcript entry collection ## Testing - `just fmt` - full validation left to CI --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-09 09:25:24 -07:00
Jack Mousseau	e6b93841c5	Add request permissions tool (#13092 ) Adds a built-in `request_permissions` tool and wires it through the Codex core, protocol, and app-server layers so a running turn can ask the client for additional permissions instead of relying on a static session policy. The new flow emits a `RequestPermissions` event from core, tracks the pending request by call ID, forwards it through app-server v2 as an `item/permissions/requestApproval` request, and resumes the tool call once the client returns an approved subset of the requested permission profile.	2026-03-08 20:23:06 -07:00
Dylan Hurd	f41b1638c9	fix(core) patch otel test (#14014 ) ## Summary This test was missing the turn completion event in the responses stream, so it was hanging. This PR fixes the issue ## Testing - [x] This does update the test	2026-03-08 19:06:30 -07:00
Celia Chen	340f9c9ecb	app-server: include experimental skill metadata in exec approval requests (#13929 ) ## Summary This change surfaces skill metadata on command approval requests so app-server clients can tell when an approval came from a skill script and identify the originating `SKILL.md`. - add `skill_metadata` to exec approval events in the shared protocol - thread skill metadata through core shell escalation and delegated approval handling for skill-triggered approvals - expose the field in app-server v2 as experimental `skillMetadata` - regenerate the JSON/TypeScript schemas and cover the new field in protocol, transport, core, and TUI tests ## Why Skill-triggered approvals already carry skill context inside core, but app-server clients could not see which skill caused the prompt. Sending the skill metadata with the approval request makes it possible for clients to present better approval UX and connect the prompt back to the relevant skill definition. ## example event in app-server-v2 verified that we see this event when experimental api is on: ``` < { < "id": 11, < "method": "item/commandExecution/requestApproval", < "params": { < "additionalPermissions": { < "fileSystem": null, < "macos": { < "accessibility": false, < "automations": { < "bundle_ids": [ < "com.apple.Notes" < ] < }, < "calendar": false, < "preferences": "read_only" < }, < "network": null < }, < "approvalId": "25d600ee-5a3c-4746-8d17-e2e61fb4c563", < "availableDecisions": [ < "accept", < "acceptForSession", < "cancel" < ], < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "commandActions": [ < { < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "type": "unknown" < } < ], < "cwd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes", < "itemId": "call_jZp3xFpNg4D8iKAD49cvEvZy", < "skillMetadata": { < "pathToSkillsMd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/SKILL.md" < }, < "threadId": "019ccc10-b7d3-7ff2-84fe-3a75e7681e69", < "turnId": "019ccc10-b848-76f1-81b3-4a1fa225493f" < } < }` ``` & verified that this is the event when experimental api is off: ``` < { < "id": 13, < "method": "item/commandExecution/requestApproval", < "params": { < "approvalId": "5fbbf776-261b-4cf8-899b-c125b547f2c0", < "availableDecisions": [ < "accept", < "acceptForSession", < "cancel" < ], < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "commandActions": [ < { < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "type": "unknown" < } < ], < "cwd": "/Users/celia/code/codex/codex-rs", < "itemId": "call_OV2DHzTgYcbYtWaTTBWlocOt", < "threadId": "019ccc16-2a2b-7be1-8500-e00d45b892d4", < "turnId": "019ccc16-2a8e-7961-98ec-649600e7d06a" < } < } ```	2026-03-08 18:07:46 -07:00
Ahmed Ibrahim	1f150eda8b	Stabilize shell serialization tests (#13877 ) ## What changed - The duration-recording fixture sleep was reduced from a large artificial delay to `0.2s`, and the assertion floor was lowered to `0.1s`. - The shell tool fixtures now force `login = false` so they do not invoke login-shell startup paths. ## Why this fixes the flake - The old tests were paying for two kinds of noise that had nothing to do with the feature being validated: oversized sleep time and variable shell initialization cost. - Login shells can pick up runner-specific startup files and incur inconsistent startup latency. - The test only needs to prove that we record a nontrivial duration and preserve shell output. A shorter fixture delay plus a non-login shell keeps that coverage while removing runner-dependent wall-clock variance. ## Scope - Test-only change.	2026-03-08 13:37:41 -07:00
Charley Cunningham	7ba1fccfc1	fix(ci): restore guardian coverage and bazel unit tests (#13912 ) ## Summary - restore the guardian review request snapshot test and its tracked snapshot after it was dropped from `main` - make Bazel Rust unit-test wrappers resolve runfiles correctly on manifest-only platforms like macOS and point Insta at the real workspace root - harden the shell-escalation socket-closure assertion so the musl Bazel test no longer depends on fd reuse behavior ## Verification - cargo test -p codex-core guardian_review_request_layout_matches_model_visible_request_snapshot - cargo test -p codex-shell-escalation - bazel test //codex-rs/exec:exec-unit-tests //codex-rs/shell-escalation:shell-escalation-unit-tests Supersedes #13894. --------- Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com> Co-authored-by: viyatb-oai <viyatb@openai.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-08 12:05:19 -07:00
Ahmed Ibrahim	dc19e78962	Stabilize abort task follow-up handling (#13874 ) - production logic plus tests; cancel running tasks before clearing pending turn state - suppress follow-up model requests after cancellation and assert on stabilized request counts instead of fixed sleeps	2026-03-07 22:56:00 -08:00
Michael Bolin	3b5fe5ca35	protocol: keep root carveouts sandboxed (#13452 ) ## Why A restricted filesystem policy that grants `:root` read or write access but also carries explicit deny entries should still behave like scoped access with carveouts, not like unrestricted disk access. Without that distinction, later platform backends cannot preserve blocked subpaths under root-level permissions because the protocol layer reports the policy as fully unrestricted. ## What changed - taught `FileSystemSandboxPolicy` to treat root access plus explicit deny entries as scoped access rather than full-disk access - derived readable and writable roots from the filesystem root when root access is combined with carveouts, while preserving the denied paths as read-only subpaths - added protocol coverage for root-write policies with carveouts and a core sandboxing regression so those policies still require platform sandboxing ## Verification - added protocol coverage in `protocol/src/permissions.rs` and `protocol/src/protocol.rs` for root access with explicit carveouts - added platform-sandbox regression coverage in `core/src/sandboxing/mod.rs` - verified the current PR state with `just clippy` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13452). * #13453 * __->__ #13452 * #13451 * #13449 * #13448 * #13445 * #13440 * #13439 --------- Co-authored-by: viyatb-oai <viyatb@openai.com>	2026-03-07 21:15:47 -08:00
Michael Bolin	46b8d127cf	sandboxing: preserve denied paths when widening permissions (#13451 ) ## Why After the split-policy plumbing landed, additional-permissions widening still rebuilt filesystem access through the legacy projection in a few places. That can erase explicit deny entries and make the runtime treat a policy as fully writable even when it still has blocked subpaths, which in turn can skip the platform sandbox when it is still needed. ## What changed - preserved explicit deny entries when merging additional read and write permissions into `FileSystemSandboxPolicy` - switched platform-sandbox selection to rely on `FileSystemSandboxPolicy::has_full_disk_write_access()` instead of ad hoc root-write checks - kept the widened policy path in `core/src/exec.rs` and `core/src/sandboxing/mod.rs` aligned so denied subpaths survive both policy merging and sandbox selection - added regression coverage for root-write policies that still carry carveouts ## Verification - added regression coverage in `core/src/sandboxing/mod.rs` showing that root write plus carveouts still requires the platform sandbox - verified the current PR state with `just clippy` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13451). * #13453 * #13452 * __->__ #13451 * #13449 * #13448 * #13445 * #13440 * #13439 --------- Co-authored-by: viyatb-oai <viyatb@openai.com>	2026-03-08 04:29:35 +00:00
Michael Bolin	07a30da3fb	linux-sandbox: plumb split sandbox policies through helper (#13449 ) ## Why The Linux sandbox helper still only accepted the legacy `SandboxPolicy` payload. That meant the runtime could compute split filesystem and network policies, but the helper would immediately collapse them back to the compatibility projection before applying seccomp or staging the bubblewrap inner command. ## What changed - added hidden `--file-system-sandbox-policy` and `--network-sandbox-policy` flags alongside the legacy `--sandbox-policy` flag so the helper can migrate incrementally - updated the core-side Landlock wrapper to pass the split policies explicitly when launching `codex-linux-sandbox` - added helper-side resolution logic that accepts either the legacy policy alone or a complete split-policy pair and normalizes that into one effective configuration - switched Linux helper network decisions to use `NetworkSandboxPolicy` directly - added `FromStr` support for the split policy types so the helper can parse them from CLI JSON ## Verification - added helper coverage in `linux-sandbox/src/linux_run_main_tests.rs` for split-policy flags and policy resolution - added CLI argument coverage in `core/src/landlock.rs` - verified the current PR state with `just clippy` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13449). * #13453 * #13452 * #13451 * __->__ #13449 * #13448 * #13445 * #13440 * #13439 --------- Co-authored-by: viyatb-oai <viyatb@openai.com>	2026-03-07 19:40:10 -08:00
Matthew Zeng	a4a9536fd7	[elicitations] Support always allow option for mcp tool calls. (#13807 ) - [x] Support always allow option for mcp tool calls, writes to config.toml. - [x] Fix config hot-reload after starting a new thread for TUI.	2026-03-08 01:46:40 +00:00
sayan-oai	590cfa6176	chore: use @plugin instead of $plugin for plaintext mentions (#13921 ) change plaintext plugin-mentions from `$plugin` to `@plugin`, ensure TUI can correctly decode these from history. tested locally, added/updated tests.	2026-03-08 01:36:39 +00:00

... 10 11 12 13 14 ...

2695 Commits