codex

mirror of https://github.com/openai/codex.git synced 2026-04-27 08:05:51 +00:00

Author	SHA1	Message	Date
pakrym-oai	69df12efb3	Remove Responses V1 websocket implementation (#13364 ) V2 is the way to go!	2026-03-03 11:32:53 -07:00
Ahmed Ibrahim	b20b6aa46f	Update realtime websocket API (#13265 ) - migrate the realtime websocket transport to the new session and handoff flow - make the realtime model configurable in config.toml and use API-key auth for the websocket --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-02 16:05:40 -08:00
Curtis 'Fjord' Hawthorne	7e980d7db6	Support multimodal custom tool outputs (#12948 ) ## Summary This changes `custom_tool_call_output` to use the same output payload shape as `function_call_output`, so freeform tools can return either plain text or structured content items. The main goal is to let `js_repl` return image content from nested `view_image` calls in its own `custom_tool_call_output`, instead of relying on a separate injected message. ## What changed - Changed `custom_tool_call_output.output` from `string` to `FunctionCallOutputPayload` - Updated freeform tool plumbing to preserve structured output bodies - Updated `js_repl` to aggregate nested tool content items and attach them to the outer `js_repl` result - Removed the old `js_repl` special case that injected `view_image` results as a separate pending user image message - Updated normalization/history/truncation paths to handle multimodal `custom_tool_call_output` - Regenerated app-server protocol schema artifacts ## Behavior Direct `view_image` calls still return a `function_call_output` with image content. When `view_image` is called inside `js_repl`, the outer `js_repl` `custom_tool_call_output` now carries: - an `input_text` item if the JS produced text output - one or more `input_image` items from nested tool results So the nested image result now stays inside the `js_repl` tool output instead of being injected as a separate message. ## Compatibility This is intended to be backward-compatible for resumed conversations. Older histories that stored `custom_tool_call_output.output` as a plain string still deserialize correctly, and older histories that used the previous injected-image-message flow also continue to resume. Added regression coverage for resuming a pre-change rollout containing: - string-valued `custom_tool_call_output` - legacy injected image message history #### [git stack](https://github.com/magus/git-stack-cli) - 👉 `1` https://github.com/openai/codex/pull/12948	2026-02-26 18:17:46 -08:00
Ahmed Ibrahim	a11da86b37	Make realtime audio test deterministic (#12959 ) ## Summary\n- add a websocket test-server request waiter so tests can synchronize on recorded client messages\n- use that waiter in the realtime delegation test instead of a fixed audio timeout\n- add temporary timing logs in the test and websocket mock to inspect where the flake stalls	2026-02-26 16:09:00 -08:00
Charley Cunningham	07aefffb1f	core: bundle settings diff updates into one dev/user envelope (#12417 ) ## Summary - bundle contextual prompt injection into at most one developer message plus one contextual user message in both: - per-turn settings updates - initial context insertion - preserve `<model_switch>` across compaction by rebuilding it through canonical initial-context injection, instead of relying on strip/reattach hacks - centralize contextual user fragment detection in one shared definition table and reuse it for parsing/compaction logic - keep `AGENTS.md` in its natural serialized format: - `# AGENTS.md instructions for {dirname}` - `<INSTRUCTIONS>...</INSTRUCTIONS>` - simplify related tests/helpers and accept the expected snapshot/layout updates from bundled multi-part messages ## Why The goal is to converge toward a simpler, more intentional prompt shape where contextual updates are consistently represented as one developer envelope plus one contextual user envelope, while keeping parsing and compaction behavior aligned with that representation. ## Notable details - the temporary `SettingsUpdateEnvelope` wrapper was removed; these paths now return `Vec<ResponseItem>` directly - local/remote compaction no longer rely on model-switch strip/restore helpers - contextual user detection is now driven by shared fragment definitions instead of ad hoc matcher assembly - AGENTS/user instructions are still the same logical context; only the synthetic `<user_instructions>` wrapper was replaced by the natural AGENTS text format ## Testing - `just fmt` - `cargo test -p codex-app-server codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages -- --exact` - `cargo test -p codex-core compact::tests::collect_user_messages_filters_session_prefix_entries --lib -- --exact` - `cargo test -p codex-core --test all 'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch' -- --exact` - `cargo test -p codex-core --test all 'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_apps_guidance_as_developer_message_when_enabled' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_developer_instructions_message_in_request' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_user_instructions_message_in_request' -- --exact` - `cargo test -p codex-core --test all 'suite::client::resume_includes_initial_messages_and_sends_prior_items' -- --exact` - `cargo test -p codex-core --test all 'suite::review::review_input_isolated_from_parent_history' -- --exact` - `cargo test -p codex-exec --test all 'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' -- --exact` - `cargo test -p core_test_support context_snapshot::tests::full_text_mode_preserves_unredacted_text -- --exact` ## Notes - I also ran several targeted `compact`, `compact_remote`, `prompt_caching`, `model_visible_layout`, and `event_mapping` tests while iterating on prompt-shape changes. - I have not claimed a clean full-workspace `cargo test` from this environment because local sandbox/resource conditions have previously produced unrelated failures in large workspace runs.	2026-02-26 00:12:08 -08:00
pakrym-oai	97d0068658	Send warmup request (#11258 ) Send a request with `generate: falls` but a full set of tools and instructions to pre-warm inference. --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-24 08:15:47 -08:00
Charley Cunningham	bb0ac5be70	Fix compaction context reinjection and model baselines (#12252 ) ## Summary - move regular-turn context diff/full-context persistence into `run_turn` so pre-turn compaction runs before incoming context updates are recorded - after successful pre-turn compaction, rely on a cleared `reference_context_item` to trigger full context reinjection on the follow-up regular turn (manual `/compact` keeps replacement history summary-only and also clears the baseline) - preserve `<model_switch>` when full context is reinjected, and inject it before the rest of the full-context items - scope `reference_context_item` and `previous_model` to regular user turns only so standalone tasks (`/compact`, shell, review, undo) cannot suppress future reinjection or `<model_switch>` behavior - make context-diff persistence + `reference_context_item` updates explicit in the regular-turn path, with clearer docs/comments around the invariant - stop persisting local `/compact` `RolloutItem::TurnContext` snapshots (only regular turns persist `TurnContextItem` now) - simplify resume/fork previous-model/reference-baseline hydration by looking up the last surviving turn context from rollout lifecycle events, including rollback and compaction-crossing handling - remove the legacy fallback that guessed from bare `TurnContext` rollouts without lifecycle events - update compaction/remote-compaction/model-visible snapshots and compact test assertions (including remote compaction mock response shape) ## Why We were persisting incoming context items before spawning the regular turn task, which let pre-turn compaction requests accidentally include incoming context diffs without the new user message. Fixing that exposed follow-on baseline issues around `/compact`, resume/fork, and standalone tasks that could cause duplicate context injection or suppress `<model_switch>` instructions. This PR re-centers the invariants around regular turns: - regular turns persist model-visible context diffs/full reinjection and update the `reference_context_item` - standalone tasks do not advance those regular-turn baselines - compaction clears the baseline when replacement history may have stripped the referenced context diffs ## Follow-ups (TODOs left in code) - `TODO(ccunningham)`: fix rollback/backtracking baseline handling more comprehensively - `TODO(ccunningham)`: include pending incoming context items in pre-turn compaction threshold estimation - `TODO(ccunningham)`: inject updated personality spec alongside `<model_switch>` so some model-switch paths can avoid forced full reinjection - `TODO(ccunningham)`: review task turn lifecycle (`TurnStarted`/`TurnComplete`) behavior and emit task-start context diffs for task types that should have them (excluding `/compact`) ## Validation - `just fmt` - CI should cover the updated compaction/resume/model-visible snapshot expectations and rollout-hydration behavior - I did not rerun the full local test suite after the latest resume-lookup / rollout-persistence simplifications	2026-02-20 23:13:08 -08:00
Charley Cunningham	eb68767f2f	Unify remote compaction snapshot mocks around default endpoint behavior (#12050 ) ## Summary - standardize remote compaction test mocking around one default behavior in shared helpers - make default remote compact mocks mirror production shape: keep `message/user` + `message/developer`, drop assistant/tool artifacts, then append a summary user message - switch non-special `compact_remote` tests to the shared default mock instead of ad-hoc JSON payloads ## Special-case tests that still use explicit mocks - remote compaction error payload / HTTP failure behavior - summary-only compact output behavior - manual `/compact` with no prior user messages - stale developer-instruction injection coverage ## Why This removes inconsistent manual remote compaction fixtures and gives us one source of truth for normal remote compact behavior, while preserving explicit mocks only where tests intentionally cover non-default behavior.	2026-02-17 18:18:47 -08:00
Charley Cunningham	85034b189e	core: snapshot tests for compaction requests, post-compaction layout, some additional compaction tests (#11487 ) This PR keeps compaction context-layout test coverage separate from runtime compaction behavior changes, so runtime logic review can stay focused. ## Included - Adds reusable context snapshot helpers in `core/tests/common/context_snapshot.rs` for rendering model-visible request/history shapes. - Standardizes helper naming for readability: - `format_request_input_snapshot` - `format_response_items_snapshot` - `format_labeled_requests_snapshot` - `format_labeled_items_snapshot` - Expands snapshot coverage for both local and remote compaction flows: - pre-turn auto-compaction - pre-turn failure/context-window-exceeded paths - mid-turn continuation compaction - manual `/compact` with and without prior user turns - Captures both sides where relevant: - compaction request shape - post-compaction history layout shape - Adds/uses shared request-inspection helpers so assertions target structured request content instead of ad-hoc JSON string parsing. - Aligns snapshots/assertions to current behavior and leaves explicit `TODO(ccunningham)` notes where behavior is known and intentionally deferred. ## Not Included - No runtime compaction logic changes. - No model-visible context/state behavior changes.	2026-02-14 19:57:10 -08:00
pakrym-oai	eac5473114	Do not attempt to append after response.completed (#11402 ) Completed responses are fully done, and new response must be created.	2026-02-11 07:45:17 -08:00
Anton Panasenko	a94505a92a	feat: enable premessage-deflate for websockets (#10966 ) note: unfortunately, tokio-tungstenite / tungstenite upgrade triggers some problems with linker of rama-tls-boring with openssl: ``` error: linking with `/Users/apanasenko/Library/Caches/cargo-zigbuild/0.20.1/zigcc-x86_64-unknown-linux-musl-ff6a.sh` failed: exit status: 1 \| = note: "/Users/apanasenko/Library/Caches/cargo-zigbuild/0.20.1/zigcc-x86_64-unknown-linux-musl-ff6a.sh" "-m64" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained/rcrt1.o" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained/crti.o" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained/crtbeginS.o" "<1 object files omitted>" "-Wl,--as-needed" "-Wl,-Bstatic" "/var/folders/kt/52y_g75x3ng8ktvk3rfwm6400000gp/T/rustcyGQdYm/{liblzma_sys-662a82316f96ec30,libbzip2_sys-bf78a2d58d5cbce6,liblibsqlite3_sys-6c004987fd67a36a,libtree_sitter_bash-220b99a97d331ab7,libtree_sitter-858f0a1dbfea58bd,libzstd_sys-6eb237deec748c5b,libring-2a87376483bf916f,libopenssl_sys-7c189e68b37fe2bb,liblibz_sys-4344eef4345520b1,librama_boring_sys-0414e98115015ee0}.rlib" "-lc++" "-lc++abi" "-lunwind" "-lc" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib/libcompiler_builtins-*.rlib" "-L" "/var/folders/kt/52y_g75x3ng8ktvk3rfwm6400000gp/T/rustcyGQdYm/raw-dylibs" "-Wl,-Bdynamic" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-nostartfiles" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/libz-sys-ff5ea50d88c28ffb/out/lib" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/ring-bdec3dddc19f5a5e/out" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/openssl-sys-96e0870de3ca22bc/out/openssl-build/install/lib" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/zstd-sys-0cc37a5da1481740/out" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/tree-sitter-72d2418073317c0f/out" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/tree-sitter-bash-bfd293a9f333ce6a/out" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/libsqlite3-sys-b78b2cfb81a330fc/out" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/bzip2-sys-69a145cc859ef275/out/lib" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/lzma-sys-07e92d0b6baa6fd4/out" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/rama-boring-sys-0bc2dfbf669addc4/out/build/crypto/" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/rama-boring-sys-0bc2dfbf669addc4/out/build/ssl/" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/rama-boring-sys-0bc2dfbf669addc4/out/build/" "-L" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/rama-boring-sys-0bc2dfbf669addc4/out/build" "-L" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained" "-L" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib" "-o" "/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/deps/codex_network_proxy-d08268b863517761" "-Wl,--gc-sections" "-static-pie" "-Wl,-z,relro,-z,now" "-Wl,-O1" "-Wl,--strip-all" "-nodefaultlibs" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained/crtendS.o" "<sysroot>/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained/crtn.o" = note: some arguments are omitted. use `--verbose` to show all linker arguments = note: warning: ignoring deprecated linker optimization setting '1' warning: unable to open library directory '/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/rama-boring-sys-0bc2dfbf669addc4/out/build/crypto/': FileNotFound ld.lld: error: duplicate symbol: SSL_export_keying_material >>> defined at ssl_lib.c:3816 (ssl/ssl_lib.c:3816) >>> libssl-lib-ssl_lib.o:(SSL_export_keying_material) in archive /var/folders/kt/52y_g75x3ng8ktvk3rfwm6400000gp/T/rustcyGQdYm/libopenssl_sys-7c189e68b37fe2bb.rlib >>> defined at t1_enc.cc:205 (/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/rama-boring-sys-0bc2dfbf669addc4/out/boringssl/ssl/t1_enc.cc:205) >>> t1_enc.cc.o:(.text.SSL_export_keying_material+0x0) in archive /var/folders/kt/52y_g75x3ng8ktvk3rfwm6400000gp/T/rustcyGQdYm/librama_boring_sys-0414e98115015ee0.rlib ld.lld: error: duplicate symbol: d2i_ASN1_TIME >>> defined at a_time.c:27 (crypto/asn1/a_time.c:27) >>> libcrypto-lib-a_time.o:(d2i_ASN1_TIME) in archive /var/folders/kt/52y_g75x3ng8ktvk3rfwm6400000gp/T/rustcyGQdYm/libopenssl_sys-7c189e68b37fe2bb.rlib >>> defined at a_time.cc:34 (/Users/apanasenko/code/codex/codex-rs/target/x86_64-unknown-linux-musl/release/build/rama-boring-sys-0bc2dfbf669addc4/out/boringssl/crypto/asn1/a_time.cc:34) >>> a_time.cc.o:(.text.d2i_ASN1_TIME+0x0) in archive /var/folders/kt/52y_g75x3ng8ktvk3rfwm6400000gp/T/rustcyGQdYm/librama_boring_sys-0414e98115015ee0.rlib ``` that force me to migrate away from rama-tls-boring to rama-tls-rustls and pin `ring` for rustls.	2026-02-07 17:59:34 -08:00
Josh McKinney	e416e578bb	core: preconnect Responses websocket for first turn (#10698 ) ## Problem The first user turn can pay websocket handshake latency even when a session has already started. We want to reduce that initial delay while preserving turn semantics and avoiding any prompt send during startup. Reviewer feedback also called out duplicated connect/setup paths and unnecessary preconnect state complexity. ## Mental model `ModelClient` owns session-scoped transport state. During session startup, it can opportunistically warm one websocket handshake slot. A turn-scoped `ModelClientSession` adopts that slot once if available, restores captured sticky turn-state, and otherwise opens a websocket through the same shared connect path. If startup preconnect is still in flight, first turn setup awaits that task and treats it as the first connection attempt for the turn. Preconnect is handshake-only. The first `response.create` is still sent only when a turn starts. ## Non-goals This change does not make preconnect required for correctness and does not change prompt/turn payload semantics. It also does not expand fallback behavior beyond clearing preconnect state when fallback activates. ## Tradeoffs The implementation prioritizes simpler ownership and shared connection code over header-match gating for reuse. The single-slot cache keeps lifecycle straightforward but only benefits the immediate next turn. Awaiting in-flight preconnect has the same app-level connect-timeout semantics as existing websocket connect behavior (no new timeout class introduced by this PR). ## Architecture `core/src/client.rs`: - Added session-level preconnect lifecycle state (`Idle` / `InFlight` / `Ready`) carrying one warmed websocket plus optional captured turn-state. - Added `pre_establish_connection()` startup warmup and `preconnect()` handshake-only setup. - Deduped auth/provider resolution into `current_client_setup()` and websocket handshake wiring into `connect_websocket()` / `build_websocket_headers()`. - Updated turn websocket path to adopt preconnect first, await in-flight preconnect when present, then create a new websocket only when needed. - Ensured fallback activation clears warmed preconnect state. - Added documentation for lifecycle, ownership, sticky-routing invariants, and timeout semantics. `core/src/codex.rs`: - Session startup invokes `model_client.pre_establish_connection(...)`. - Turn metadata resolution uses the shared timeout helper. `core/src/turn_metadata.rs`: - Centralized shared timeout helper used by both turn-time metadata resolution and startup preconnect metadata building. `core/tests/common/responses.rs` + websocket test suites: - Added deterministic handshake waiting helper (`wait_for_handshakes`) with bounded polling. - Added startup preconnect and in-flight preconnect reuse coverage. - Fallback expectations now assert exactly two websocket attempts in covered scenarios (startup preconnect + turn attempt before fallback sticks). ## Observability Preconnect remains best-effort and non-fatal. Existing websocket/fallback telemetry remains in place, and debug logs now make preconnect-await behavior and preconnect failures easier to reason about. ## Tests Validated with: 1. `just fmt` 2. `cargo test -p codex-core websocket_preconnect -- --nocapture` 3. `cargo test -p codex-core websocket_fallback -- --nocapture` 4. `cargo test -p codex-core websocket_first_turn_waits_for_inflight_preconnect -- --nocapture`	2026-02-06 19:08:24 +00:00
Ahmed Ibrahim	f9c38f531c	add none personality option (#10688 ) - add none personality enum value and empty placeholder behavior\n- add docs/schema updates and e2e coverage	2026-02-04 15:40:33 -08:00
sayan-oai	ff9fa56368	default enable compression, update test helpers (#10102 ) set `enable_request_compression` flag to default-enabled. update integration test helpers to decompress `zstd` if flag set.	2026-01-28 12:25:40 -08:00
sayan-oai	86adf53235	fix: handle all web_search actions and in progress invocations (#9960 ) ### Summary - Parse all `web_search` tool actions (`search`, `find_in_page`, `open_page`). - Previously we only parsed + displayed `search`, which made the TUI appear to pause when the other actions were being used. - Show in progress `web_search` calls as `Searching the web` - Previously we only showed completed tool calls <img width="308" height="149" alt="image" src="https://github.com/user-attachments/assets/90a4e8ff-b06a-48ff-a282-b57b31121845" /> ### Tests Added + updated tests, tested locally ### Follow ups Update VSCode extension to display these as well	2026-01-27 03:33:48 +00:00
Anton Panasenko	e117a3ff33	feat: support proxy for ws connection (#9719 ) reapply websocket changes without changing tls lib.	2026-01-22 15:23:15 -08:00
Dylan Hurd	8b3521ee77	feat(core) update Personality on turn (#9644 ) ## Summary Support updating Personality mid-Thread via UserTurn/OverwriteTurn. This is explicitly unused by the clients so far, to simplify PRs - app-server and tui implementations will be follow-ups. ## Testing - [x] added integration tests	2026-01-22 12:04:23 -08:00
pakrym-oai	4d48d4e0c2	Revert "feat: support proxy for ws connection" (#9693 ) Reverts openai/codex#9409	2026-01-22 15:57:18 +00:00
Anton Panasenko	7b27aa7707	feat: support proxy for ws connection (#9409 ) unfortunately tokio-tungstenite doesn't support proxy configuration outbox, while https://github.com/snapview/tokio-tungstenite/pull/370 is in review, we can depend on source code for now.	2026-01-20 09:36:30 -08:00
Ahmed Ibrahim	ebdd8795e9	Turn-state sticky routing per turn (#9332 ) - capture the header from SSE/WS handshakes, store it per ModelClientSession using `Oncelock`, echo it on turn-scoped requests, and add SSE+WS integration tests for within-turn persistence + cross-turn reset. - keep `x-codex-turn-state` sticky within a user turn to maintain routing continuity for retries/tool follow-ups.	2026-01-16 09:30:11 -08:00
pakrym-oai	9f8d3c14ce	Fix flakiness in WebSocket tests (#9169 ) The connection was being added to the list after the WebSocket response was sent. So the test can sometimes race and observe connections before the list was updated. After this change, connection and request is added to the list before the response is sent.	2026-01-13 15:09:59 -08:00
pakrym-oai	2d56519ecd	Support response.done and add integration tests (#9129 ) The agent loop using a persistent incremental web socket connection.	2026-01-13 16:12:30 +00:00
pakrym-oai	490c1c1fdd	Add model client sessions (#9102 ) Maintain a long-running session.	2026-01-13 01:15:56 +00:00
Ahmed Ibrahim	81caee3400	Add 5s timeout to models list call + integration test (#8942 ) - Enforce a 5s timeout around the remote models refresh to avoid hanging /models calls.	2026-01-08 18:06:10 -08:00
Ahmed Ibrahim	0d3e673019	remove `get_responses_requests` and `get_responses_request_bodies` to use in-place matcher (#8858 )	2026-01-08 13:57:48 -08:00
Ahmed Ibrahim	66b7c673e9	Refresh on models etag mismatch (#8491 ) - Send models etag - Refresh models on 412 - This wires `ModelsManager` to `ModelFamily` so we don't mutate it mid-turn	2026-01-01 11:41:16 -08:00
Ahmed Ibrahim	cb9a189857	make `model` optional in config (#7769 ) - Make Config.model optional and centralize default-selection logic in ModelsManager, including a default_model helper (with codex-auto-balanced when available) so sessions now carry an explicit chosen model separate from the base config. - Resolve `model` once in `core` and `tui` from config. Then store the state of it on other structs. - Move refreshing models to be before resolving the default model	2025-12-10 11:19:00 -08:00
Ahmed Ibrahim	222a491570	load models from disk and set a ttl and etag (#7722 ) # External (non-OpenAI) Pull Request Requirements Before opening this Pull Request, please read the dedicated "Contributing" markdown file or your PR may be closed: https://github.com/openai/codex/blob/main/docs/contributing.md If your PR conforms to our contribution guidelines, replace this text with a detailed and high quality description of your changes. Include a link to a bug report or enhancement request.	2025-12-08 13:43:04 -08:00
Ahmed Ibrahim	903b7774bc	Add models endpoint (#7603 ) - Use the codex-api crate to introduce models endpoint. - Add `models` to codex core tests helpers - Add `ModelsInfo` for the endpoint return type	2025-12-04 12:57:54 -08:00
jif-oai	51307eaf07	feat: retroactive image placeholder to prevent poisoning (#6774 ) If an image can't be read by the API, it will poison the entire history, preventing any new turn on the conversation. This detect such cases and replace the image by a placeholder	2025-12-03 11:35:56 +00:00
Dylan Hurd	5b25915d7e	fix(apply_patch) tests for shell_command (#7307 ) ## Summary Adds test coverage for invocations of apply_patch via shell_command with heredoc, to validate behavior. ## Testing - [x] These are tests	2025-12-01 15:09:22 -08:00
Ahmed Ibrahim	b519267d05	Account for encrypted reasoning for auto compaction (#7113 ) - The total token used returned from the api doesn't account for the reasoning items before the assistant message - Account for those for auto compaction - Add the encrypted reasoning effort in the common tests utils - Add a test to make sure it works as expected	2025-11-22 03:06:45 +00:00
pakrym-oai	767b66f407	Migrate coverage to shell_command (#7042 )	2025-11-21 03:44:00 +00:00
pakrym-oai	75f38f16dd	Run remote auto compaction (#6879 )	2025-11-19 00:43:58 -08:00
jif-oai	838531d3e4	feat: remote compaction (#6795 ) Co-authored-by: pakrym-oai <pakrym@openai.com>	2025-11-18 16:51:16 +00:00
pakrym-oai	6c384eb9c6	tests: replace mount_sse_once_match with mount_sse_once for SSE mocking (#6640 )	2025-11-13 18:04:05 -08:00
Ahmed Ibrahim	2a6e9b20df	Promote shared helpers for suite tests (#6460 ) ## Summary - add `TestCodex::submit_turn_with_policies` and extend the response helpers with reusable tool-call utilities - update the grep_files, read_file, list_dir, shell_serialization, and tools suites to rely on the shared helpers instead of local copies - make the list_dir helper return `anyhow::Result` so clippy no longer warns about `expect` ## Testing - `just fix -p codex-core` - `cargo test -p codex-core --test all suite::grep_files::grep_files_tool_collects_matches` - `cargo test -p codex-core suite::grep_files::grep_files_tool_collects_matches -- --ignored` (filter requests ignored tests so nothing runs, but the build stays clean) ------ [Codex Task](https://chatgpt.com/codex/tasks/task_i_69112d53abac83219813cab4d7cb6446)	2025-11-13 17:12:10 -08:00
Celia Chen	b8ec97c0ef	[App-server] add new v2 events:`item/reasoning/delta`, `item/agentMessage/delta` & `item/reasoning/summaryPartAdded` (#6559 ) core event to app server event mapping: 1. `codex/event/reasoning_content_delta` -> `item/reasoning/summaryTextDelta`. 2. `codex/event/reasoning_raw_content_delta` -> `item/reasoning/textDelta` 3. `codex/event/agent_message_content_delta` → `item/agentMessage/delta`. 4. `codex/event/agent_reasoning_section_break` -> `item/reasoning/summaryPartAdded`. Also added a change in core to pass down content index, summary index and item id from events. Tested with the `git checkout owen/app_server_test_client && cargo run -p codex-app-server-test-client -- send-message-v2 "hello"` and verified that new events are emitted correctly.	2025-11-14 00:25:01 +00:00
Dylan Hurd	2c1b693da4	chore(core) Consolidate apply_patch tests (#6545 ) ## Summary Consolidates our apply_patch tests into one suite, and ensures each test case tests the various ways the harness supports apply_patch: 1. Freeform custom tool call 2. JSON function tool 3. Simple shell call 4. Heredoc shell call There are a few test cases that are specific to a particular variant, I've left those alone. ## Testing - [x] This adds a significant number of tests	2025-11-13 15:52:39 -08:00
pakrym-oai	7d9ad3effd	Fix otel tests (#6541 ) Mount responses only once, remove unneeded retries and add a final assistant messages to complete the turn.	2025-11-12 16:35:34 +00:00
zhao-oai	980886498c	Add user command event types (#6246 ) adding new user command event, logic in TUI to render user command events	2025-11-10 19:18:45 +00:00
jif-oai	0508823075	test: undo (#6034 )	2025-10-31 14:46:24 +00:00
pakrym-oai	3429e82e45	Add item streaming events (#5546 ) Adds AgentMessageContentDelta, ReasoningContentDelta, ReasoningRawContentDelta item streaming events while maintaining compatibility for old events. --------- Co-authored-by: Owen Lin <owen@openai.com>	2025-10-29 22:33:57 +00:00
pakrym-oai	1b8f2543ac	Filter out reasoning items from previous turns (#5857 ) Reduces request size and prevents 400 errors when switching between API orgs. Based on Responses API behavior described in https://cookbook.openai.com/examples/responses_api/reasoning_items#caching	2025-10-28 11:39:34 -07:00
Ahmed Ibrahim	f59978ed3d	Handle cancelling/aborting while processing a turn (#5543 ) Currently we collect all all turn items in a vector, then we add it to the history on success. This result in losing those items on errors including aborting `ctrl+c`. This PR: - Adds the ability for the tool call to handle cancellation - bubble the turn items up to where we are recording this info Admittedly, this logic is an ad-hoc logic that doesn't handle a lot of error edge cases. The right thing to do is recording to the history on the spot as `items`/`tool calls output` come. However, this isn't possible because of having different `task_kind` that has different `conversation_histories`. The `try_run_turn` has no idea what thread are we using. We cannot also pass an `arc` to the `conversation_histories` because it's a private element of `state`. That's said, `abort` is the most common case and we should cover it until we remove `task kind`	2025-10-23 08:47:10 -07:00
Ahmed Ibrahim	273819aaae	Move changing turn input functionalities to `ConversationHistory` (#5473 ) We are doing some ad-hoc logic while dealing with conversation history. Ideally, we shouldn't mutate `vec[responseitem]` manually at all and should depend on `ConversationHistory` for those changes. Those changes are: - Adding input to the history - Removing items from the history - Correcting history I am also adding some `error` logs for cases we shouldn't ideally face. For example, we shouldn't be missing `toolcalls` or `outputs`. We shouldn't hit `ContextWindowExceeded` while performing `compact` This refactor will give us granular control over our context management.	2025-10-22 13:08:46 -07:00
pakrym-oai	3c90728a29	Add new thread items and rewire event parsing to use them (#5418 ) 1. Adds AgentMessage, Reasoning, WebSearch items. 2. Switches the ResponseItem parsing to use new items and then also emit 3. Removes user-item kind and filters out "special" (environment) user items when returning to clients.	2025-10-22 10:14:50 -07:00
Ahmed Ibrahim	049a61bcfc	Auto compact at ~90% (#5292 ) Users now hit a window exceeded limit and they usually don't know what to do. This starts auto compact at ~90% of the window.	2025-10-20 11:29:49 -07:00
pakrym-oai	35a770e871	Simplify request body assertions (#4845 ) We'll have a lot more test like these	2025-10-07 09:56:39 +01:00
pakrym-oai	aecbe0f333	Add helper for response created SSE events in tests (#4758 ) ## Summary - add a reusable `ev_response_created` helper that builds `response.created` SSE events for integration tests - update the exec and core integration suites to use the new helper instead of repeating manual JSON literals - keep the streaming fixtures consistent by relying on the shared helper in every touched test ## Testing - `just fmt` ------ https://chatgpt.com/codex/tasks/task_i_68e1fe885bb883208aafffb94218da61	2025-10-05 21:11:43 +00:00

1 2

56 Commits