codex

mirror of https://github.com/openai/codex.git synced 2026-04-26 15:45:02 +00:00

Author	SHA1	Message	Date
Michael Bolin	abbd74e2be	feat: make sandbox read access configurable with `ReadOnlyAccess` (#11387 ) `SandboxPolicy::ReadOnly` previously implied broad read access and could not express a narrower read surface. This change introduces an explicit read-access model so we can support user-configurable read restrictions in follow-up work, while preserving current behavior today. It also ensures unsupported backends fail closed for restricted-read policies instead of silently granting broader access than intended. ## What - Added `ReadOnlyAccess` in protocol with: - `Restricted { include_platform_defaults, readable_roots }` - `FullAccess` - Updated `SandboxPolicy` to carry read-access configuration: - `ReadOnly { access: ReadOnlyAccess }` - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }` - Preserved existing behavior by defaulting current construction paths to `ReadOnlyAccess::FullAccess`. - Threaded the new fields through sandbox policy consumers and call sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and related tests. - Updated Seatbelt policy generation to honor restricted read roots by emitting scoped read rules when full read access is not granted. - Added fail-closed behavior on Linux and Windows backends when restricted read access is requested but not yet implemented there (`UnsupportedOperation`). - Regenerated app-server protocol schema and TypeScript artifacts, including `ReadOnlyAccess`. ## Compatibility / rollout - Runtime behavior remains unchanged by default (`FullAccess`). - API/schema changes are in place so future config wiring can enable restricted read access without another policy-shape migration.	2026-02-11 18:31:14 -08:00
jif-oai	c67120f4a0	fix: flaky landlock (#10689 ) https://openai.slack.com/archives/C095U48JNL9/p1770243347893959	2026-02-05 10:30:18 +00:00
gt-oai	7c6d21a414	Fix test_shell_command_interruption flake (#10649 ) ## Human summary Sandboxing (specifically `LandlockRestrict`) means that e.g. `sleep 10` fails immediately. Therefore it cannot be interrupted. In suite::interrupt::test_shell_command_interruption, sleep 10 is issued at 17:28:16.554 (ToolCall: shell_command {"command":"sleep 10"...}), then fails at 17:28:16.589 with duration_ms=34, success=false, exit_code=101, and Sandbox(LandlockRestrict). ## Codex summary - set `sandbox_mode = "danger-full-access"` in `interrupt` and `v2/turn_interrupt` integration tests - set `sandbox: Some(SandboxMode::DangerFullAccess)` in `test_codex_jsonrpc_conversation_flow` - set `sandbox_policy: Some(SandboxPolicy::DangerFullAccess)` in `command_execution_notifications_include_process_id` ## Why On some Linux CI environments, command execution fails immediately with `LandlockRestrict` when sandboxed. These tests are intended to validate JSON-RPC/task lifecycle behavior (interrupt semantics, command notification shape/process id, request flow), but early sandbox startup failure changes turn flow and can trigger extra follow-up requests, causing flakes. This change removes environment-specific sandbox startup dependency from these tests while preserving their primary intent. ## Testing - not run in this environment (per request)	2026-02-04 22:19:06 +00:00
gt-oai	48aeb67f7a	Fix flakey conversation flow test (#9784 ) I've seen this test fail with: ``` - Mock #1. Expected range of matching incoming requests: == 2 Number of matched incoming requests: 1 ``` This is because we pop the wrong task_complete events and then the test exits. I think this is because the MCP events are now buffered after https://github.com/openai/codex/pull/8874. So: 1. clear the buffer before we do any user message sending 2. additionally listen for task start before task complete 3. use the ID from task start to find the correct task complete event.	2026-01-26 15:58:14 +00:00
charley-oai	1fa8350ae7	Add text element metadata to protocol, app server, and core (#9331 ) The second part of breaking up PR https://github.com/openai/codex/pull/9116 Summary: - Add `TextElement` / `ByteRange` to protocol user inputs and user message events with defaults. - Thread `text_elements` through app-server v1/v2 request handling and history rebuild. - Preserve UI metadata only in user input/events (not `ContentItem`) while keeping local image attachments in user events for rehydration. Details: - Protocol: `UserInput::Text` carries `text_elements`; `UserMessageEvent` carries `text_elements` + `local_images`. Serialization includes empty vectors for backward compatibility. - app-server-protocol: v1 defines `V1TextElement` / `V1ByteRange` in camelCase with conversions; v2 uses its own camelCase wrapper. - app-server: v1/v2 input mapping includes `text_elements`; thread history rebuilds include them. - Core: user event emission preserves UI metadata while model history stays clean; history replay round-trips the metadata.	2026-01-15 17:26:41 -08:00
jif-oai	1aed01e99f	renaming: task to turn (#8963 )	2026-01-09 17:31:17 +00:00
Celia Chen	be4364bb80	[chore] move app server tests from chat completion to responses (#8939 ) We are deprecating chat completions. Move all app server tests from chat completion to responses.	2026-01-08 22:27:55 +00:00
Celia Chen	051bf81df9	[fix] app server flaky send_messages test (#8874 ) Fix flakiness of CI test: https://github.com/openai/codex/actions/runs/20350530276/job/58473691434?pr=8282 This PR does two things: 1. move the flakiness test to use responses API instead of chat completion API 2. make mcp_process agnostic to the order of responses/notifications/requests that come in, by buffering messages not read	2026-01-08 20:41:21 +00:00
jif-oai	116059c3a0	chore: unify conversation with thread name (#8830 ) Done and verified by Codex + refactor feature of RustRover	2026-01-07 17:04:53 +00:00
Anton Panasenko	807f8a43c2	feat: expose outputSchema to user_turn/turn_start app_server API (#8377 ) What changed - Added `outputSchema` support to the app-server APIs, mirroring `codex exec --output-schema` behavior. - V1 `sendUserTurn` now accepts `outputSchema` and constrains the final assistant message for that turn. - V2 `turn/start` now accepts `outputSchema` and constrains the final assistant message for that turn (explicitly per-turn only). Core behavior - `Op::UserTurn` already supported `final_output_json_schema`; now V1 `sendUserTurn` forwards `outputSchema` into that field. - `Op::UserInput` now carries `final_output_json_schema` for per-turn settings updates; core maps it into `SessionSettingsUpdate.final_output_json_schema` so it applies to the created turn context. - V2 `turn/start` does NOT persist the schema via `OverrideTurnContext` (it’s applied only for the current turn). Other overrides (cwd/model/etc) keep their existing persistent behavior. API / docs - `codex-rs/app-server-protocol/src/protocol/v1.rs`: add `output_schema: Option<serde_json::Value>` to `SendUserTurnParams` (serialized as `outputSchema`). - `codex-rs/app-server-protocol/src/protocol/v2.rs`: add `output_schema: Option<JsonValue>` to `TurnStartParams` (serialized as `outputSchema`). - `codex-rs/app-server/README.md`: document `outputSchema` for `turn/start` and clarify it applies only to the current turn. - `codex-rs/docs/codex_mcp_interface.md`: document `outputSchema` for v1 `sendUserTurn` and v2 `turn/start`. 
Tests added/updated - New app-server integration tests asserting `outputSchema` is forwarded into outbound `/responses` requests as `text.format`: - `codex-rs/app-server/tests/suite/output_schema.rs` - `codex-rs/app-server/tests/suite/v2/output_schema.rs` - Added per-turn semantics tests (schema does not leak to the next turn): - `send_user_turn_output_schema_is_per_turn_v1` - `turn_start_output_schema_is_per_turn_v2` - Added protocol wire-compat tests for the merged op: - serialize omits `final_output_json_schema` when `None` - deserialize works when field is missing - serialize includes `final_output_json_schema` when `Some(schema)` Call site updates (high level) - Updated all `Op::UserInput { .. }` constructions to include `final_output_json_schema`: - `codex-rs/app-server/src/codex_message_processor.rs` - `codex-rs/core/src/codex_delegate.rs` - `codex-rs/mcp-server/src/codex_tool_runner.rs` - `codex-rs/tui/src/chatwidget.rs` - `codex-rs/tui2/src/chatwidget.rs` - plus impacted core tests. Validation - `just fmt` - `cargo test -p codex-core` - `cargo test -p codex-app-server` - `cargo test -p codex-mcp-server` - `cargo test -p codex-tui` - `cargo test -p codex-tui2` - `cargo test -p codex-protocol` - `cargo clippy --all-features --tests --profile dev --fix -- -D warnings`	2026-01-05 10:27:00 -08:00
Michael Bolin	642b7566df	fix: introduce AbsolutePathBuf as part of sandbox config (#7856 ) Changes the `writable_roots` field of the `WorkspaceWrite` variant of the `SandboxPolicy` enum from `Vec<PathBuf>` to `Vec<AbsolutePathBuf>`. This is helpful because now callers can be sure the value is an absolute path rather than a relative one. (Though when using an absolute path in a Seatbelt config policy, we still have to _canonicalize_ it first.) Because `writable_roots` can be read from a config file, it is important that we are able to resolve relative paths properly using the parent folder of the config file as the base path.	2025-12-12 15:25:22 -08:00
Eric Traut	c4af707e09	Removed experimental "command risk assessment" feature (#7799 ) This experimental feature received lukewarm reception during internal testing. Removing from the code base.	2025-12-10 09:48:11 -08:00
Ahmed Ibrahim	71504325d3	Migrate model preset (#7542 ) - Introduce `openai_models` in `/core` - Move `PRESETS` under it - Move `ModelPreset`, `ModelUpgrade`, `ReasoningEffortPreset`, `ReasoningEffortPreset`, and `ReasoningEffortPreset` to `protocol` - Introduce `Op::ListModels` and `EventMsg::AvailableModels` Next steps: - migrate `app-server` and `tui` to use the introduced Operation	2025-12-03 20:30:43 +00:00
pakrym-oai	767b66f407	Migrate coverage to shell_command (#7042 )	2025-11-21 03:44:00 +00:00
Owen Lin	266419217e	chore: use anyhow::Result for all app-server integration tests (#5836 ) There's a lot of visual noise in app-server's integration tests due to the number of `.expect("<some_msg>")` lines which are largely redundant / not very useful. Clean them up by using `anyhow::Result` + `?` consistently. Replaces the existing pattern of: ``` let codex_home = TempDir::new().expect("create temp dir"); create_config_toml(codex_home.path()).expect("write config.toml"); let mut mcp = McpProcess::new(codex_home.path()) .await .expect("spawn mcp process"); timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()) .await .expect("initialize timeout") .expect("initialize request"); ``` With: ``` let codex_home = TempDir::new()?; create_config_toml(codex_home.path())?; let mut mcp = McpProcess::new(codex_home.path()).await?; timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??; ```	2025-10-28 08:10:23 -07:00
Anton Panasenko	6af83d86ff	[codex][app-server] introduce codex/event/raw_item events (#5578 )	2025-10-24 22:41:52 +00:00
Eric Traut	f8af4f5c8d	Added model summary and risk assessment for commands that violate sandbox policy (#5536 ) This PR adds support for a model-based summary and risk assessment for commands that violate the sandbox policy and require user approval. This aids the user in evaluating whether the command should be approved. The feature works by taking a failed command and passing it back to the model and asking it to summarize the command, give it a risk level (low, medium, high) and a risk category (e.g. "data deletion" or "data exfiltration"). It uses a new conversation thread so the context in the existing thread doesn't influence the answer. If the call to the model fails or takes longer than 5 seconds, it falls back to the current behavior. For now, this is an experimental feature and is gated by a config key `experimental_sandbox_command_assessment`. Here is a screen shot of the approval prompt showing the risk assessment and summary. <img width="723" height="282" alt="image" src="https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b" />	2025-10-24 15:23:44 -07:00
pakrym-oai	3c90728a29	Add new thread items and rewire event parsing to use them (#5418 ) 1. Adds AgentMessage, Reasoning, WebSearch items. 2. Switches the ResponseItem parsing to use new items and then also emit 3. Removes user-item kind and filters out "special" (environment) user items when returning to clients.	2025-10-22 10:14:50 -07:00
Michael Bolin	995f5c3614	feat: add Vec<ParsedCommand> to ExecApprovalRequestEvent (#5222 ) This adds `parsed_cmd: Vec<ParsedCommand>` to `ExecApprovalRequestEvent` in the core protocol (`protocol/src/protocol.rs`), which is also what this field is named on `ExecCommandBeginEvent`. Honestly, I don't love the name (it sounds like a single command, but it is actually a list of them), but I don't want to get distracted by a naming discussion right now. This also adds `parsed_cmd` to `ExecCommandApprovalParams` in `codex-rs/app-server-protocol/src/protocol.rs`, so it will be available via `codex app-server`, as well. For consistency, I also updated `ExecApprovalElicitRequestParams` in `codex-rs/mcp-server/src/exec_approval.rs` to include this field under the name `codex_parsed_cmd`, as that struct already has a number of special `codex_*` fields. Note this is the code for when Codex is used as an MCP _server_ and therefore has to conform to the official spec for an MCP elicitation type.	2025-10-15 13:58:40 -07:00
jif-oai	69cb72f842	chore: sandbox refactor 2 (#4653 ) Revert the revert and fix the UI issue	2025-10-03 11:17:39 +01:00
Ahmed Ibrahim	ed5d656fa8	Revert "chore: sanbox extraction" (#4626 ) Reverts openai/codex#4286	2025-10-02 21:09:21 +00:00
jif-oai	b8195a17e5	chore: sanbox extraction (#4286 ) # Extract and Centralize Sandboxing - Goal: Improve safety and clarity by centralizing sandbox planning and execution. - Approach: - Add planner (ExecPlan) and backend registry (Direct/Seatbelt/Linux) with run_with_plan. - Refactor codex.rs to plan-then-execute; handle failures/escalation via the plan. - Delegate apply_patch to the codex binary and run it with an empty env for determinism.	2025-10-01 12:05:12 +01:00
Michael Bolin	5881c0d6d4	fix: remove mcp-types from app server protocol (#4537 ) We continue the separation between `codex app-server` and `codex mcp-server`. In particular, we introduce a new crate, `codex-app-server-protocol`, and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it `codex-rs/app-server-protocol/src/protocol.rs`. Because `ConversationId` was defined in `mcp_protocol.rs`, we move it into its own file, `codex-rs/protocol/src/conversation_id.rs`, and because it is referenced in a ton of places, we have to touch a lot of files as part of this PR. We also decide to get away from proper JSON-RPC 2.0 semantics, so we also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which is basically the same `JSONRPCMessage` type defined in `mcp-types` except with all of the `"jsonrpc": "2.0"` removed. Getting rid of `"jsonrpc": "2.0"` makes our serialization logic considerably simpler, as we can lean heavier on serde to serialize directly into the wire format that we use now.	2025-10-01 02:16:26 +00:00
Michael Bolin	32853ecbc5	fix: use macros to ensure request/response symmetry (#4529 ) Manually curating `protocol-ts/src/lib.rs` was error-prone, as expected. I finally asked Codex to write some Rust macros so we can ensure that: - For every variant of `ClientRequest` and `ServerRequest`, there is an associated `params` and `response` type. - All response types are included automatically in the output of `codex generate-ts`.	2025-09-30 18:06:05 -07:00
Michael Bolin	d9dbf48828	fix: separate `codex mcp` into `codex mcp-server` and `codex app-server` (#4471 ) This is a very large PR with some non-backwards-compatible changes. Historically, `codex mcp` (or `codex mcp serve`) started a JSON-RPC-ish server that had two overlapping responsibilities: - Running an MCP server, providing some basic tool calls. - Running the app server used to power experiences such as the VS Code extension. This PR aims to separate these into distinct concepts: - `codex mcp-server` for the MCP server - `codex app-server` for the "application server" Note `codex mcp` still exists because it already has its own subcommands for MCP management (`list`, `add`, etc.) The MCP logic continues to live in `codex-rs/mcp-server` whereas the refactored app server logic is in the new `codex-rs/app-server` folder. Note that most of the existing integration tests in `codex-rs/mcp-server/tests/suite` were actually for the app server, so all the tests have been moved with the exception of `codex-rs/mcp-server/tests/suite/mod.rs`. Because this is already a large diff, I tried not to change more than I had to, so `codex-rs/app-server/tests/common/mcp_process.rs` still uses the name `McpProcess` for now, but I will do some mechanical renamings to things like `AppServer` in subsequent PRs. While `mcp-server` and `app-server` share some overlapping functionality (like reading streams of JSONL and dispatching based on message types) and some differences (completely different message types), I ended up doing a bit of copypasta between the two crates, as both have somewhat similar `message_processor.rs` and `outgoing_message.rs` files for now, though I expect them to diverge more in the near future. One material change is that of the initialize handshake for `codex app-server`, as we no longer use the MCP types for that handshake. 
Instead, we update `codex-rs/protocol/src/mcp_protocol.rs` to add an `Initialize` variant to `ClientRequest`, which takes the `ClientInfo` object we need to update the `USER_AGENT_SUFFIX` in `codex-rs/app-server/src/message_processor.rs`. One other material change is in `codex-rs/app-server/src/codex_message_processor.rs` where I eliminated a use of the `send_event_as_notification()` method I am generally trying to deprecate (because it blindly maps an `EventMsg` into a `JSONNotification`) in favor of `send_server_notification()`, which takes a `ServerNotification`, as that is intended to be a custom enum of all notification types supported by the app server. So to make this update, I had to introduce a new variant of `ServerNotification`, `SessionConfigured`, which is a non-backwards compatible change with the old `codex mcp`, and clients will have to be updated after the next release that contains this PR. Note that `codex-rs/app-server/tests/suite/list_resume.rs` also had to be updated to reflect this change. I introduced `codex-rs/utils/json-to-toml/src/lib.rs` as a small utility crate to avoid some of the copying between `mcp-server` and `app-server`.	2025-09-30 07:06:18 +00:00

25 Commits
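
The read-access model described in commit abbd74e2be (#11387) can be sketched roughly as follows. This is an illustrative sketch only, not the code from the repository: the variant and field names `ReadOnlyAccess`, `Restricted { include_platform_defaults, readable_roots }`, `FullAccess`, `ReadOnly { access }`, `WorkspaceWrite { ..., read_only_access }`, and `UnsupportedOperation` come from the commit message, while the field types, the `Backend` enum, the `SandboxError` type, and the helpers `read_access`, `ensure_supported`, and `can_read` are assumptions made up for this example. The other fields of `WorkspaceWrite` are omitted.

```rust
use std::path::{Path, PathBuf};

/// Read-access model; names follow the commit message, field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum ReadOnlyAccess {
    Restricted {
        include_platform_defaults: bool,
        readable_roots: Vec<PathBuf>,
    },
    FullAccess,
}

/// Simplified policy; the real `WorkspaceWrite` carries more fields.
#[derive(Debug, Clone, PartialEq)]
pub enum SandboxPolicy {
    ReadOnly { access: ReadOnlyAccess },
    WorkspaceWrite {
        writable_roots: Vec<PathBuf>,
        read_only_access: ReadOnlyAccess,
    },
}

/// Hypothetical backend identifiers, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Backend {
    Seatbelt,
    Linux,
    Windows,
}

/// Hypothetical error type; only the variant name appears in the commit message.
#[derive(Debug, PartialEq)]
pub enum SandboxError {
    UnsupportedOperation(&'static str),
}

impl SandboxPolicy {
    /// Returns the read-access configuration carried by either variant.
    pub fn read_access(&self) -> &ReadOnlyAccess {
        match self {
            SandboxPolicy::ReadOnly { access } => access,
            SandboxPolicy::WorkspaceWrite { read_only_access, .. } => read_only_access,
        }
    }
}

/// Fail closed: a backend that cannot enforce restricted reads returns an
/// error instead of silently granting broader access than requested.
pub fn ensure_supported(backend: Backend, policy: &SandboxPolicy) -> Result<(), SandboxError> {
    match (backend, policy.read_access()) {
        (_, ReadOnlyAccess::FullAccess) => Ok(()),
        (Backend::Seatbelt, ReadOnlyAccess::Restricted { .. }) => Ok(()),
        (Backend::Linux | Backend::Windows, ReadOnlyAccess::Restricted { .. }) => Err(
            SandboxError::UnsupportedOperation("restricted read access is not implemented"),
        ),
    }
}

/// Checks a path against the readable roots, component by component.
/// Platform defaults are ignored in this sketch.
pub fn can_read(access: &ReadOnlyAccess, path: &Path) -> bool {
    match access {
        ReadOnlyAccess::FullAccess => true,
        ReadOnlyAccess::Restricted { readable_roots, .. } => {
            readable_roots.iter().any(|root| path.starts_with(root))
        }
    }
}
```

Under these assumptions, a policy built with `ReadOnlyAccess::FullAccess` passes `ensure_supported` on every backend, which mirrors the "runtime behavior remains unchanged by default" note in the commit message. A `Restricted` policy is accepted only for the Seatbelt backend in this sketch.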