codex

mirror of https://github.com/openai/codex.git synced 2026-04-26 07:35:29 +00:00

Author	SHA1	Message	Date
Dylan Hurd	4f6db60821	fix(core) exec_policy parsing fixes	2026-02-16 14:39:36 -08:00
jif-oai	af434b4f71	feat: drop MCP managing tools if no MCP servers (#11900 ) Drop MCP tools if no MCP servers to save context For this https://github.com/openai/codex/issues/11049	2026-02-16 18:40:45 +00:00
jif-oai	825a4af42f	feat: use shell policy in shell snapshot (#11759 ) Honor `shell_environment_policy.set` even after a shell snapshot	2026-02-16 09:11:00 +00:00
Anton Panasenko	02abd9a8ea	feat: persist and restore codex app's tools after search (#11780 ) ### What changed 1. Removed per-turn MCP selection reset in `core/src/tasks/mod.rs`. 2. Added `SessionState::set_mcp_tool_selection(Vec<String>)` in `core/src/state/session.rs` for authoritative restore behavior (deduped, order-preserving, empty clears). 3. Added rollout parsing in `core/src/codex.rs` to recover `active_selected_tools` from prior `search_tool_bm25` outputs: - tracks matching `call_id`s - parses function output text JSON - extracts `active_selected_tools` - latest valid payload wins - malformed/non-matching payloads are ignored 4. Applied restore logic to resumed and forked startup paths in `core/src/codex.rs`. 5. Updated instruction text to session/thread scope in `core/templates/search_tool/tool_description.md`. 6. Expanded tests in `core/tests/suite/search_tool.rs`, plus unit coverage in: - `core/src/codex.rs` - `core/src/state/session.rs` ### Behavior after change 1. Search activates matched tools. 2. Additional searches union into active selection. 3. Selection survives new turns in the same thread. 4. Resume/fork restores selection from rollout history. 5. Separate threads do not inherit selection unless forked.	2026-02-15 19:18:41 -08:00
sayan-oai	060a320e7d	fix: show user warning when using default fallback metadata (#11690 ) ### What It's currently unclear when the harness falls back to the default, generic `ModelInfo`. This happens when the `remote_models` feature is disabled or the model is truly unknown, and can lead to bad performance and issues in the harness. Add a user-facing warning when this happens so they are aware when their setup is broken. ### Tests Added tests, tested locally.	2026-02-15 18:46:05 -08:00
Charley Cunningham	85034b189e	core: snapshot tests for compaction requests, post-compaction layout, some additional compaction tests (#11487 ) This PR keeps compaction context-layout test coverage separate from runtime compaction behavior changes, so runtime logic review can stay focused. ## Included - Adds reusable context snapshot helpers in `core/tests/common/context_snapshot.rs` for rendering model-visible request/history shapes. - Standardizes helper naming for readability: - `format_request_input_snapshot` - `format_response_items_snapshot` - `format_labeled_requests_snapshot` - `format_labeled_items_snapshot` - Expands snapshot coverage for both local and remote compaction flows: - pre-turn auto-compaction - pre-turn failure/context-window-exceeded paths - mid-turn continuation compaction - manual `/compact` with and without prior user turns - Captures both sides where relevant: - compaction request shape - post-compaction history layout shape - Adds/uses shared request-inspection helpers so assertions target structured request content instead of ad-hoc JSON string parsing. - Aligns snapshots/assertions to current behavior and leaves explicit `TODO(ccunningham)` notes where behavior is known and intentionally deferred. ## Not Included - No runtime compaction logic changes. - No model-visible context/state behavior changes.	2026-02-14 19:57:10 -08:00
viyatb-oai	b527ee2890	feat(core): add structured network approval plumbing and policy decision model (#11672 ) ### Description #### Summary Introduces the core plumbing required for structured network approvals #### What changed - Added structured network policy decision modeling in core. - Added approval payload/context types needed for network approval semantics. - Wired shell/unified-exec runtime plumbing to consume structured decisions. - Updated related core error/event surfaces for structured handling. - Updated protocol plumbing used by core approval flow. - Included small CLI debug sandbox compatibility updates needed by this layer. #### Why establishes the minimal backend foundation for network approvals without yet changing high-level orchestration or TUI behavior. #### Notes - Behavior remains constrained by existing requirements/config gating. - Follow-up PRs in the stack handle orchestration, UX, and app-server integration. --------- Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>	2026-02-14 04:18:12 +00:00
Charley Cunningham	67e577da53	Handle model-switch base instructions after compaction (#11659 ) Strip trailing <model_switch> during model-switch compaction request, and append <model_switch> after model switch compaction	2026-02-13 19:02:53 -08:00
pash-openai	6c0a924203	turn metadata: per-turn non-blocking (#11677 )	2026-02-13 12:48:29 -08:00
Michael Bolin	2383978a2c	fix: reduce flakiness of compact_resume_after_second_compaction_preserves_history (#11663 ) ## Why `compact_resume_after_second_compaction_preserves_history` has been intermittently flaky in Windows CI. The test had two one-shot request matchers in the second compact/resume phase that could overlap, and it waited for the first `Warning` event after compaction. In practice, that made the test sensitive to platform/config-specific prompt shape and unrelated warning timing. ## What Changed - Hardened the second compaction matcher in `codex-rs/core/tests/suite/compact_resume_fork.rs` so it accepts expected compact-request variants while explicitly excluding the `AFTER_SECOND_RESUME` payload. - Updated `compact_conversation()` to wait for the specific compaction warning (`COMPACT_WARNING_MESSAGE`) rather than any `Warning` event. - Added an inline comment explaining why the matcher is intentionally broad but disjoint from the follow-up resume matcher. ## Test Plan - `cargo test -p codex-core --test all suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history -- --exact` - Repeated the same test in a loop (40 runs) to check for local nondeterminism.	2026-02-13 09:51:22 -08:00
Anton Panasenko	38c442ca7f	core: limit search_tool_bm25 to Apps and clarify discovery guidance (#11669 ) ## Summary - Limit `search_tool_bm25` indexing to `codex_apps` tools only, so non-Apps MCP servers are no longer discoverable through this search path. - Move search-tool discovery guidance into the `search_tool_bm25` tool description (via template include) instead of injecting it as a separate developer message. - Update Apps discovery guidance wording to clarify when to use `search_tool_bm25` for Apps-backed systems (for example Slack, Google Drive, Jira, Notion) and when to call tools directly. - Remove dead `core` helper code (`filter_codex_apps_mcp_tools` and `codex_apps_connector_id`) that is no longer used after the tool-selection refactor. - Update `core` search-tool tests to assert codex-apps-only behavior and to validate guidance from the tool description. ## Validation - ✅ `just fmt` - ✅ `cargo test -p codex-core search_tool` - ⚠️ `cargo test -p codex-core` was attempted, but the run repeatedly stalled on `tools::js_repl::tests::js_repl_can_attach_image_via_view_image_tool`. ## Tickets - None	2026-02-13 09:32:46 -08:00
Dylan Hurd	35692e99c1	chore(approvals) More approvals scenarios (#11660 ) ## Summary Add some additional tests to approvals flow ## Testing - [x] these are tests	2026-02-12 19:54:54 -08:00
Charley Cunningham	f24669d444	Persist complete TurnContextItem state via canonical conversion (#11656 ) ## Summary This PR delivers the first small, shippable step toward model-visible state diffing by making `TurnContextItem` more complete and standardizing how it is built. Specifically, it: - Adds persisted network context to `TurnContextItem`. - Introduces a single canonical `TurnContext -> TurnContextItem` conversion path. - Routes existing rollout write sites through that canonical conversion helper. No context injection/diff behavior changes are included in this PR. ## Why this change The design goal is to make `TurnContextItem` the canonical source of truth for context-diff decisions. Before this PR: - `TurnContextItem` did not include all TurnContext-derived environment inputs needed for v1 completeness. - Construction was duplicated at multiple write sites. This PR addresses both with a minimal, reviewable change. ## Changes ### 1) Extend `TurnContextItem` with network state - Added `TurnContextNetworkItem { allowed_domains, denied_domains }`. - Added `network: Option<TurnContextNetworkItem>` to `TurnContextItem`. - Kept backward compatibility by making the new field optional and skipped when absent. Files: - `codex-rs/protocol/src/protocol.rs` ### 2) Canonical conversion helper - Added `TurnContext::to_turn_context_item(collaboration_mode)` in core. - Added internal helper to derive network fields from `config_layer_stack.requirements().network`. Files: - `codex-rs/core/src/codex.rs` ### 3) Use canonical conversion at rollout write sites - Replaced ad hoc `TurnContextItem { ... }` construction with `to_turn_context_item(...)` in: - sampling request path - compaction path Files: - `codex-rs/core/src/codex.rs` - `codex-rs/core/src/compact.rs` ### 4) Update fixtures/tests for new optional field - Updated existing `TurnContextItem` literals in tests to include `network: None`. - Added protocol tests for: - deserializing old payloads with no `network` - serializing when `network` is present Files: - `codex-rs/core/tests/suite/resume_warning.rs` - No replay/diff logic changes. - Persisted rollout `TurnContextItem` now carries additional network context when available. - Older rollout lines without `network` remain readable.	2026-02-12 17:22:44 -08:00
Michael Bolin	a4cc1a4a85	feat: introduce Permissions (#11633 ) ## Why We currently carry multiple permission-related concepts directly on `Config` for shell/unified-exec behavior (`approval_policy`, `sandbox_policy`, `network`, `shell_environment_policy`, `windows_sandbox_mode`). Consolidating these into one in-memory struct makes permission handling easier to reason about and sets up the next step: supporting named permission profiles (`[permissions.PROFILE_NAME]`) without changing behavior now. This change is mostly mechanical: it updates existing callsites to go through `config.permissions`, but it does not yet refactor those callsites to take a single `Permissions` value in places where multiple permission fields are still threaded separately. This PR intentionally does not change the on-disk `config.toml` format yet and keeps compatibility with legacy config keys. ## What Changed - Introduced `Permissions` in `core/src/config/mod.rs`. - Added `Config::permissions` and moved effective runtime permission fields under it: - `approval_policy` - `sandbox_policy` - `network` - `shell_environment_policy` - `windows_sandbox_mode` - Updated config loading/building so these effective values are still derived from the same existing config inputs and constraints. - Updated Windows sandbox helpers/resolution to read/write via `permissions`. - Threaded the new field through all permission consumers across core runtime, app-server, CLI/exec, TUI, and sandbox summary code. - Updated affected tests to reference `config.permissions.*`. - Renamed the struct/field from `EffectivePermissions`/`effective_permissions` to `Permissions`/`permissions` and aligned variable naming accordingly. ## Verification - `just fix -p codex-core -p codex-tui -p codex-cli -p codex-app-server -p codex-exec -p codex-utils-sandbox-summary` - `cargo build -p codex-core -p codex-tui -p codex-cli -p codex-app-server -p codex-exec -p codex-utils-sandbox-summary`	2026-02-12 14:42:54 -08:00
Curtis 'Fjord' Hawthorne	466be55abc	Add js_repl host helpers and exec end events (#10672 ) ## Summary This PR adds host-integrated helper APIs for `js_repl` and updates model guidance so the agent can use them reliably. ### What’s included - Add `codex.tool(name, args?)` in the JS kernel so `js_repl` can call normal Codex tools. - Keep persistent JS state and scratch-path helpers available: - `codex.state` - `codex.tmpDir` - Wire `js_repl` tool calls through the standard tool router path. - Add/align `js_repl` execution completion/end event behavior with existing tool logging patterns. - Update dynamic prompt injection (`project_doc`) to document: - how to call `codex.tool(...)` - raw output behavior - image flow via `view_image` (`codex.tmpDir` + `codex.tool("view_image", ...)`) - stdio safety guidance (`console.log` / `codex.tool`, avoid direct `process.std*`) ## Why - Standardize JS-side tool usage on `codex.tool(...)` - Make `js_repl` behavior more consistent with existing tool execution and event/logging patterns. - Give the model enough runtime guidance to use `js_repl` safely and effectively. ## Testing - Added/updated unit and runtime tests for: - `codex.tool` calls from `js_repl` (including shell/MCP paths) - image handoff flow via `view_image` - prompt-injection text for `js_repl` guidance - execution/end event behavior and related regression coverage #### [git stack](https://github.com/magus/git-stack-cli) - ✅ `1` https://github.com/openai/codex/pull/10674 - 👉 `2` https://github.com/openai/codex/pull/10672 - ⏳ `3` https://github.com/openai/codex/pull/10671 - ⏳ `4` https://github.com/openai/codex/pull/10673 - ⏳ `5` https://github.com/openai/codex/pull/10670	2026-02-12 12:10:25 -08:00
Owen Lin	efc8d45750	feat(app-server): experimental flag to persist extended history (#11227 ) This PR adds an experimental `persist_extended_history` bool flag to app-server thread APIs so rollout logs can retain a richer set of EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e. on `thread/resume`). ### Motivation Today, our rollout recorder only persists a small subset (e.g. user message, reasoning, assistant message) of `EventMsg` types, dropping a good number (like command exec, file change, etc.) that are important for reconstructing full item history for `thread/resume`, `thread/read`, and `thread/fork`. Some clients want to be able to resume a thread without lossiness. This lossiness is primarily a UI thing, since what the model sees are `ResponseItem` and not `EventMsg`. ### Approach This change introduces an opt-in `persist_full_history` flag to preserve those events when you start/resume/fork a thread (defaults to `false`). This is done by adding an `EventPersistenceMode` to the rollout recorder: - `Limited` (existing behavior, default) - `Extended` (new opt-in behavior) In `Extended` mode, persist additional `EventMsg` variants needed for non-lossy app-server `ThreadItem` reconstruction. We now store the following ThreadItems that we didn't before: - web search - command execution - patch/file changes - MCP tool calls - image view calls - collab tool outcomes - context compaction - review mode enter/exit For command executions in particular, we truncate the output using the existing `truncate_text` from core to store an upper bound of 10,000 bytes, which is also the default value for truncating tool outputs shown to the model. This keeps the size of the rollout file and command execution items returned over the wire reasonable. And we also persist `EventMsg::Error` which we can now map back to the Turn's status and populates the Turn's error metadata. #### Updates to EventMsgs To truly make `thread/resume` non-lossy, we also needed to persist the `status` on `EventMsg::CommandExecutionEndEvent` and `EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a command failed or was declined (similar for apply_patch). These EventMsgs were never persisted before so I made it a required field.	2026-02-12 19:34:22 +00:00
jif-oai	545b266839	fix: fmt (#11619 )	2026-02-12 18:13:00 +00:00
Dylan Hurd	f39f506700	fix(core) model_info preserves slug (#11602 ) ## Summary Preserve the specified model slug when we get a prefix-based match ## Testing - [x] added unit test --------- Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>	2026-02-12 09:43:32 -08:00
gt-oai	4027f1f1a4	Fix test flake (#11448 ) Flaking with ``` Nextest run ID 6b7ff5f7-57f6-4c9c-8026-67f08fa2f81f with nextest profile: default Starting 3282 tests across 118 binaries (21 tests skipped) FAIL [ 14.548s] (1367/3282) codex-core::all suite::apply_patch_cli::apply_patch_cli_can_use_shell_command_output_as_patch_input stdout ─── running 1 test test suite::apply_patch_cli::apply_patch_cli_can_use_shell_command_output_as_patch_input ... FAILED failures: failures: suite::apply_patch_cli::apply_patch_cli_can_use_shell_command_output_as_patch_input test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 522 filtered out; finished in 14.41s stderr ─── thread 'suite::apply_patch_cli::apply_patch_cli_can_use_shell_command_output_as_patch_input' (15632) panicked at C:\a\codex\codex\codex-rs\core\tests\common\lib.rs:186:14: timeout waiting for event: Elapsed(()) stack backtrace: read_output: Exit code: 0 Wall time: 8.5 seconds Output: line1 naïve café line3 stdout: line1 naïve café line3 patch: * Begin Patch * Add File: target.txt +line1 +naïve café +line3 *** End Patch note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace. ```	2026-02-12 09:37:24 +00:00
pakrym-oai	fd7f2aedc7	Handle response.incomplete (#11558 ) Treat it same as error.	2026-02-12 00:11:38 -08:00
pakrym-oai	d391f3e2f9	Hide the first websocket retry (#11548 ) Sometimes connection needs to be quickly reestablished, don't produce an error for that.	2026-02-11 22:48:13 -08:00
sayan-oai	d1a97ed852	fix compilation (#11532 ) fix broken main	2026-02-11 19:31:13 -08:00
Michael Bolin	abbd74e2be	feat: make sandbox read access configurable with `ReadOnlyAccess` (#11387 ) `SandboxPolicy::ReadOnly` previously implied broad read access and could not express a narrower read surface. This change introduces an explicit read-access model so we can support user-configurable read restrictions in follow-up work, while preserving current behavior today. It also ensures unsupported backends fail closed for restricted-read policies instead of silently granting broader access than intended. ## What - Added `ReadOnlyAccess` in protocol with: - `Restricted { include_platform_defaults, readable_roots }` - `FullAccess` - Updated `SandboxPolicy` to carry read-access configuration: - `ReadOnly { access: ReadOnlyAccess }` - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }` - Preserved existing behavior by defaulting current construction paths to `ReadOnlyAccess::FullAccess`. - Threaded the new fields through sandbox policy consumers and call sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and related tests. - Updated Seatbelt policy generation to honor restricted read roots by emitting scoped read rules when full read access is not granted. - Added fail-closed behavior on Linux and Windows backends when restricted read access is requested but not yet implemented there (`UnsupportedOperation`). - Regenerated app-server protocol schema and TypeScript artifacts, including `ReadOnlyAccess`. ## Compatibility / rollout - Runtime behavior remains unchanged by default (`FullAccess`). - API/schema changes are in place so future config wiring can enable restricted read access without another policy-shape migration.	2026-02-11 18:31:14 -08:00
Ahmed Ibrahim	95fb86810f	Update context window after model switch (#11520 ) - Update token usage aggregation to refresh model context window after a model change. - Add protocol/core tests, including an e2e model-switch test that validates switching to a smaller model updates telemetry.	2026-02-11 17:41:23 -08:00
Ahmed Ibrahim	40de788c4d	Clamp auto-compact limit to context window (#11516 ) - Clamp auto-compaction to the minimum of configured limit and 90% of context window - Add an e2e compact test for clamped behavior - Update remote compact tests to account for earlier auto-compaction in setup turns	2026-02-11 17:41:08 -08:00
Ahmed Ibrahim	6938150c5e	Pre-sampling compact with previous model context (#11504 ) - Run pre-sampling compact through a single helper that builds previous-model turn context and compacts before the follow-up request when switching to a smaller context window. - Keep compaction events on the parent turn id and add compact suite coverage for switch-in-session and resume+switch flows.	2026-02-11 17:24:06 -08:00
Anton Panasenko	d3b078c282	Consolidate search_tool feature into apps (#11509 ) ## Summary - Remove `Feature::SearchTool` and the `search_tool` config key from the feature registry/schema. - Gate `search_tool_bm25` exposure via `Feature::Apps` in `core/src/tools/spec.rs`. - Update MCP selection logic in `core/src/codex.rs` to use `Feature::Apps` for search-tool behavior. - Update `core/tests/suite/search_tool.rs` to enable `Feature::Apps`. - Regenerate `core/config.schema.json` via `just write-config-schema`. ## Testing - `just fmt` - `cargo test -p codex-core --test all suite::search_tool::` ## Tickets - None	2026-02-11 16:52:42 -08:00
jif-oai	2fac9cc8cd	chore: sub-agent never ask for approval (#11464 )	2026-02-11 19:19:37 +00:00
Max Johnson	7053aa5457	Reapply "Add app-server transport layer with websocket support" (#11370 ) Reapply "Add app-server transport layer with websocket support" with additional fixes from https://github.com/openai/codex/pull/11313/changes to avoid deadlocking. This reverts commit `47356ff83c`. ## Summary To avoid deadlocking when queues are full, we maintain separate tokio tasks dedicated to incoming vs outgoing event handling - split the app-server main loop into two tasks in `run_main_with_transport` - inbound handling (`transport_event_rx`) - outbound handling (`outgoing_rx` + `thread_created_rx`) - separate incoming and outgoing websocket tasks ## Validation Integration tests, testing thoroughly e2e in codex app w/ >10 concurrent requests <img width="1365" height="979" alt="Screenshot 2026-02-10 at 2 54 22 PM" src="https://github.com/user-attachments/assets/47ca2c13-f322-4e5c-bedd-25859cbdc45f" /> --------- Co-authored-by: jif-oai <jif@openai.com>	2026-02-11 18:13:39 +00:00
pakrym-oai	eac5473114	Do not attempt to append after response.completed (#11402 ) Completed responses are fully done, and new response must be created.	2026-02-11 07:45:17 -08:00
Michael Bolin	476c1a7160	Remove `test-support` feature from `codex-core` and replace it with explicit test toggles (#11405 ) ## Why `codex-core` was being built in multiple feature-resolved permutations because test-only behavior was modeled as crate features. For a large crate, those permutations increase compile cost and reduce cache reuse. ## Net Change - Removed the `test-support` crate feature and related feature wiring so `codex-core` no longer needs separate feature shapes for test consumers. - Standardized cross-crate test-only access behind `codex_core::test_support`. - External test code now imports helpers from `codex_core::test_support`. - Underlying implementation hooks are kept internal (`pub(crate)`) instead of broadly public. ## Outcome - Fewer `codex-core` build permutations. - Better incremental cache reuse across test targets. - No intended production behavior change.	2026-02-10 22:44:02 -08:00
xl-openai	fdd0cd1de9	feat: support multiple rate limits (#11260 ) Added multi-limit support end-to-end by carrying limit_name in rate-limit snapshots and handling multiple buckets instead of only codex. Extended /usage client parsing to consume additional_rate_limits Updated TUI /status and in-memory state to store/render per-limit snapshots Extended app-server rate-limit read response: kept rate_limits and added rate_limits_by_name. Adjusted usage-limit error messaging for non-default codex limit buckets	2026-02-10 20:09:31 -08:00
Celia Chen	641d5268fa	chore: persist turn_id in rollout session and make turn_id uuid based (#11246 ) Problem: 1. turn id is constructed in-memory; 2. on resuming threads, turn_id might not be unique; 3. client cannot no the boundary of a turn from rollout files easily. This PR does three things: 1. persist `task_started` and `task_complete` events; 1. persist `turn_id` in rollout turn events; 5. generate turn_id as unique uuids instead of incrementing it in memory. This helps us resolve the issue of clients wanting to have unique turn ids for resuming a thread, and knowing the boundry of each turn in rollout files. example debug logs ``` 2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None } 2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None } 2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None } ``` backward compatibility: if you try to resume an old session without task_started and task_complete event populated, the following happens: - If you resume and do nothing: those reconstructed historical IDs can differ next time you resume. - If you resume and send a new turn: the new turn gets a fresh UUID from live submission flow and is persisted, so that new turn’s ID is stable on later resumes. I think this behavior is fine, because we only care about deterministic turn id once a turn is triggered.	2026-02-11 03:56:01 +00:00
pakrym-oai	4473147985	Do not resend output items in incremental websockets connections (#11383 ) In the incremental websocket output items are already part of the context, no need to send them again and duplicate.	2026-02-10 19:38:08 -08:00
Dylan Hurd	cc8c293378	fix(exec-policy) No empty command lists (#11397 ) ## Summary This should rarely, if ever, happen in practice. But regardless, we should never provide an empty list of `commands` to ExecPolicy. This PR is almost entirely adding test around these cases. ## Testing - [x] Adds a bunch of unit tests for this	2026-02-10 19:22:23 -08:00
pakrym-oai	c68999ee6d	Prefer websocket transport when model opts in (#11386 ) Summary - add a `prefer_websockets` field to `ModelInfo`, defaulting to `false` in all fixtures and constructors - wire the new flag into websocket selection so models that opt in always use websocket transport even when the feature gate is off Testing - Not run (not requested)	2026-02-10 18:50:48 -08:00
pakrym-oai	bfd4e2112c	Disable very flaky tests (#11394 ) Collected from last 20 builds of main in https://github.com/openai/codex/commits/main/.	2026-02-10 18:50:11 -08:00
github-actions[bot]	3626399811	Update models.json (#11274 ) Automated update of models.json. --------- Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com> Co-authored-by: Sayan Sisodiya <sayan@openai.com>	2026-02-10 14:28:18 -08:00
Ahmed Ibrahim	5e01450963	Strip unsupported images from prompt history to guard against model switch (#11349 ) - Make `ContextManager::for_prompt` modality-aware and strip input_image content when the active model is text-only. - Added a test for multi-model -> text-only model switch	2026-02-10 11:58:00 -08:00
Ahmed Ibrahim	9c4656000f	Sanitize MCP image output for text-only models (#11346 ) - Replace image blocks in MCP tool results with a text placeholder when the active model does not accept image input. - Add an e2e rmcp test to verify sanitized tool output is what gets sent back to the model.	2026-02-10 11:25:32 -08:00
Ahmed Ibrahim	6e96e4837e	Always expose view_image and return unsupported image-input error (#11336 ) - Keep `view_image` in the advertised tool list for all models. - Return a clear error when the current model does not support image inputs, and cover it with a unit test.	2026-02-10 11:25:12 -08:00
jif-oai	847a6092e6	fix: reduce usage of `open_if_present` (#11344 )	2026-02-10 19:25:07 +00:00
pakrym-oai	0639c33892	Compare full request for websockets incrementality (#11343 ) Tools can dynamically change mid-turn now. We need to be more thorough about reusing incremental connections.	2026-02-10 19:14:36 +00:00
Dylan Hurd	f3bbcc987d	test(core): stabilize ARM bazel remote-model and parallelism tests (#11330 ) ## Summary - keep wiremock MockServer handles alive through async assertions in remote model suite tests - assert /models request count in remote_models_hide_picker_only_models - use a slightly higher parallel timing threshold on aarch64 while keeping existing x86 threshold ## Validation - just fmt - targeted tests: - cargo test -p codex-core --test all suite::remote_models::remote_models_merge_replaces_overlapping_model -- --exact - cargo test -p codex-core --test all suite::remote_models::remote_models_hide_picker_only_models -- --exact - cargo test -p codex-core --test all suite::tool_parallelism::shell_tools_run_in_parallel -- --exact - soak loop: 40 iterations of all three targeted tests ## Notes - cargo test -p codex-core has one unrelated local-env failure in shell_snapshot::tests::try_new_creates_and_deletes_snapshot_file from exported certificate env content in this workspace. - local bazel test //codex-rs/core:core-all-test failed to build due missing rust-objcopy in this host toolchain.	2026-02-10 10:57:50 -08:00
Shijie Rao	c4b771a16f	Fix: update parallel tool call exec approval to approve on request id (#11162 ) ### Summary In parallel tool call, exec command approvals were not approved at request level but at a turn level. i.e. when a single request is approved, the system currently treats all requests in turn as approved. ### Before https://github.com/user-attachments/assets/d50ed129-b3d2-4b2f-97fa-8601eb11f6a8 ### After https://github.com/user-attachments/assets/36528a43-a4aa-4775-9e12-f13287ef19fc	2026-02-10 09:38:00 -08:00
Max Johnson	47356ff83c	Revert "Add app-server transport layer with websocket support (#10693 )" (#11323 ) Suspected cause of deadlocking bug	2026-02-10 17:37:49 +00:00
Fouad Matin	693bac1851	fix(protocol): approval policy never prompt (#11288 ) This removes overly directed language about how the model should behave when it's in `approval_policy=never` mode. --------- Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>	2026-02-10 09:27:46 -08:00
jif-oai	59c625458b	Fix pending input test waiting logic (#11322 ) ## Summary - remove redundant user message wait that could time out and cause flakiness - rely on the existing turn-complete wait to ensure the follow-up request is observed ## Testing - Not run (not requested)	2026-02-10 15:40:53 +00:00
viyatb-oai	3391e5ea86	feat(sandbox): enforce proxy-aware network routing in sandbox (#11113 ) ## Summary - expand proxy env injection to cover common tool env vars (`HTTP_PROXY`/`HTTPS_PROXY`/`ALL_PROXY`/`NO_PROXY` families + tool-specific variants) - harden macOS Seatbelt network policy generation to route through inferred loopback proxy endpoints and fail closed when proxy env is malformed - thread proxy-aware Linux sandbox flags and add minimal bwrap netns isolation hook for restricted non-proxy runs - add/refresh tests for proxy env wiring, Seatbelt policy generation, and Linux sandbox argument wiring	2026-02-10 07:44:21 +00:00
Dylan Hurd	168c359b71	Adjust shell command timeouts for Windows (#11247 ) Summary - add platform-aware defaults for shell command timeouts so Windows tests get longer waits - keep medium timeout longer on Windows to ensure flakiness is reduced Testing - Not run (not requested)	2026-02-09 20:03:32 -08:00

1 2 3 4 5 ...

558 Commits