codex

mirror of https://github.com/openai/codex.git synced 2026-04-28 00:25:56 +00:00

Author	SHA1	Message	Date
bxie-openai	c2bdb7812c	Clarify realtime v2 context and handoff messages (#17896 ) ## Summary - wrap realtime startup context in `<startup_context>...</startup_context>` tags - prefix V2 mirrored user text and relayed backend text with `[USER]` / `[BACKEND]` - remove the V2 progress suffix and replace the final V2 handoff output with a short completion acknowledgement while preserving the existing V1 wrapper ## Testing - cargo test -p codex-api realtime_v2_session_update_includes_background_agent_tool_and_handoff_output_item -- --exact - cargo test -p codex-app-server webrtc_v2_background_agent_ - cargo test -p codex-app-server webrtc_v2_text_input_is_ - cargo test -p codex-core conversation_user_text_turn_is_	2026-04-15 16:26:20 -07:00
Thibault Sottiaux	05c5829923	[codex] drain mailbox only at request boundaries (#17749 ) This changes multi-agent v2 mailbox handling so incoming inter-agent messages no longer preempt an in-flight sampling stream at reasoning or commentary output-item boundaries.	2026-04-13 22:09:51 -07:00
Ahmed Ibrahim	ec0133f5f8	Cap realtime mirrored user turns (#17685 ) Cap mirrored user text sent to realtime with the existing 300-token turn budget while preserving the full model turn. Adds integration coverage for capped realtime mirror payloads. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-13 14:31:18 -07:00
Ahmed Ibrahim	d840b247d7	Mirror user text into realtime (#17520 ) - Let typed user messages submit while realtime is active and mirror accepted text into the realtime text stream. - Add integration coverage and snapshot for outbound realtime text.	2026-04-12 15:03:14 -07:00
Ahmed Ibrahim	4db60d5d8b	Budget realtime current thread context (#17519 ) Select Current Thread startup context by budget from newest turns, cap each rendered turn at 300 approximate tokens, and add formatter plus integration snapshot coverage.	2026-04-12 11:59:09 -07:00
Ahmed Ibrahim	e4f1b3a65e	Preempt mailbox mail after reasoning/commentary items (#16725 ) Send pending mailbox mail after completed reasoning or commentary items so follow-up requests can pick it up mid-turn. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-03 18:29:05 -07:00
Charley Cunningham	2d61357c76	Trim pre-turn context updates during rollback (#15577 ) ## Summary - trim contiguous developer/contextual-user pre-turn updates when rollback cuts back to a user turn - add a focused history regression test for the trim behavior - update the rollback request-boundary snapshots to show the fixed non-duplicating context shape --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-24 12:43:53 -07:00
Charley Cunningham	0f34b14b41	[codex] Add rollback context duplication snapshot (#15562 ) ## What changed - adds a targeted snapshot test for rollback with contextual diffs in `codex_tests.rs` - snapshots the exact model-visible request input before the rolled-back turn and on the follow-up request after rollback - shows the duplicate developer and environment context pair appearing again before the follow-up user message ## Why Rollback currently rewinds the reference context baseline without rewinding the live session overrides. On the next turn, the same contextual diff is emitted again and duplicated in the request sent to the model. ## Impact - makes the regression visible in a canonical snapshot test - keeps the snapshot on the shared `context_snapshot` path without adding new formatting helpers - gives a direct repro for future fixes to rollback/context reconstruction --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-23 15:36:23 -07:00
sayan-oai	d272f45058	move plugin/skill instructions into dev msg and reorder (#14609 ) Move the general `Apps`, `Skills` and `Plugins` instructions blocks out of `user_instructions` and into the developer message, with new `Apps -> Skills -> Plugins` order for better clarity. Also wrap those sections in stable XML-style instruction tags (like other sections) and update prompt-layout tests/snapshots. This makes the tests less brittle in snapshot output (we can parse the sections), and it consolidates the capability instructions in one place. #### Tests Updated snapshots, added tests. `<AGENTS_MD>` disappearing in snapshots is expected: before this change, the wrapped user-instructions message was kept alive by `Skills` content. Now that `Skills` and `Plugins` are in the developer message, that wrapper only appears when there is real project-doc/user-instructions content. --------- Co-authored-by: Charley Cunningham <ccunningham@openai.com>	2026-03-13 20:51:01 -07:00
Charley Cunningham	f5bb338fdb	Defer initial context insertion until the first turn (#14313 ) ## Summary - defer fresh-session `build_initial_context()` until the first real turn instead of seeding model-visible context during startup - rely on the existing `reference_context_item == None` turn-start path to inject full initial context on that first real turn (and again after baseline resets such as compaction) - add a regression test for `InitialHistory::New` and update affected deterministic tests / snapshots around developer-message layout, collaboration instructions, personality updates, and compact request shapes ## Notes - this PR does not add any special empty-thread `/compact` behavior - most of the snapshot churn is the direct result of moving the initial model-visible context from startup to the first real turn, so first-turn request layouts no longer contain a pre-user startup copy of permissions / environment / other developer-visible context - remote manual `/compact` with no prior user still skips the remote compact request; local first-turn `/compact` still issues a compact request, but that request now reflects the lack of startup-seeded context --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-11 12:33:10 -07:00
Ahmed Ibrahim	629cb15bc6	Replay thread rollback from rollout history (#13615 ) - Replay thread rollback from the persisted rollout history instead of truncating in-memory state.\n- Add rollback coverage, including rollback-behind-compaction snapshot coverage.	2026-03-05 16:40:09 -08:00
Ahmed Ibrahim	041c896509	Revert "Revert "realtime prompt changes"" (#13398 ) Reverts openai/codex#13385	2026-03-03 14:41:26 -08:00
Ahmed Ibrahim	0aeb55bf08	Record realtime close marker on replacement (#13058 ) ## Summary - record a realtime close developer message when a new realtime session replaces an active one - assert the replacement marker through the mocked responses request path --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Charles Cunningham <ccunningham@openai.com>	2026-03-01 13:54:12 -08:00
jif-oai	3404ecff15	feat: add post-compaction sub-agent infos (#12774 ) Co-authored-by: Codex <noreply@openai.com>	2026-02-26 18:55:34 +00:00
Charley Cunningham	07aefffb1f	core: bundle settings diff updates into one dev/user envelope (#12417 ) ## Summary - bundle contextual prompt injection into at most one developer message plus one contextual user message in both: - per-turn settings updates - initial context insertion - preserve `<model_switch>` across compaction by rebuilding it through canonical initial-context injection, instead of relying on strip/reattach hacks - centralize contextual user fragment detection in one shared definition table and reuse it for parsing/compaction logic - keep `AGENTS.md` in its natural serialized format: - `# AGENTS.md instructions for {dirname}` - `<INSTRUCTIONS>...</INSTRUCTIONS>` - simplify related tests/helpers and accept the expected snapshot/layout updates from bundled multi-part messages ## Why The goal is to converge toward a simpler, more intentional prompt shape where contextual updates are consistently represented as one developer envelope plus one contextual user envelope, while keeping parsing and compaction behavior aligned with that representation. ## Notable details - the temporary `SettingsUpdateEnvelope` wrapper was removed; these paths now return `Vec<ResponseItem>` directly - local/remote compaction no longer rely on model-switch strip/restore helpers - contextual user detection is now driven by shared fragment definitions instead of ad hoc matcher assembly - AGENTS/user instructions are still the same logical context; only the synthetic `<user_instructions>` wrapper was replaced by the natural AGENTS text format ## Testing - `just fmt` - `cargo test -p codex-app-server codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages -- --exact` - `cargo test -p codex-core compact::tests::collect_user_messages_filters_session_prefix_entries --lib -- --exact` - `cargo test -p codex-core --test all 'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch' -- --exact` - `cargo test -p codex-core --test all 'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_apps_guidance_as_developer_message_when_enabled' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_developer_instructions_message_in_request' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_user_instructions_message_in_request' -- --exact` - `cargo test -p codex-core --test all 'suite::client::resume_includes_initial_messages_and_sends_prior_items' -- --exact` - `cargo test -p codex-core --test all 'suite::review::review_input_isolated_from_parent_history' -- --exact` - `cargo test -p codex-exec --test all 'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' -- --exact` - `cargo test -p core_test_support context_snapshot::tests::full_text_mode_preserves_unredacted_text -- --exact` ## Notes - I also ran several targeted `compact`, `compact_remote`, `prompt_caching`, `model_visible_layout`, and `event_mapping` tests while iterating on prompt-shape changes. - I have not claimed a clean full-workspace `cargo test` from this environment because local sandbox/resource conditions have previously produced unrelated failures in large workspace runs.	2026-02-26 00:12:08 -08:00
Charley Cunningham	bb0ac5be70	Fix compaction context reinjection and model baselines (#12252 ) ## Summary - move regular-turn context diff/full-context persistence into `run_turn` so pre-turn compaction runs before incoming context updates are recorded - after successful pre-turn compaction, rely on a cleared `reference_context_item` to trigger full context reinjection on the follow-up regular turn (manual `/compact` keeps replacement history summary-only and also clears the baseline) - preserve `<model_switch>` when full context is reinjected, and inject it before the rest of the full-context items - scope `reference_context_item` and `previous_model` to regular user turns only so standalone tasks (`/compact`, shell, review, undo) cannot suppress future reinjection or `<model_switch>` behavior - make context-diff persistence + `reference_context_item` updates explicit in the regular-turn path, with clearer docs/comments around the invariant - stop persisting local `/compact` `RolloutItem::TurnContext` snapshots (only regular turns persist `TurnContextItem` now) - simplify resume/fork previous-model/reference-baseline hydration by looking up the last surviving turn context from rollout lifecycle events, including rollback and compaction-crossing handling - remove the legacy fallback that guessed from bare `TurnContext` rollouts without lifecycle events - update compaction/remote-compaction/model-visible snapshots and compact test assertions (including remote compaction mock response shape) ## Why We were persisting incoming context items before spawning the regular turn task, which let pre-turn compaction requests accidentally include incoming context diffs without the new user message. Fixing that exposed follow-on baseline issues around `/compact`, resume/fork, and standalone tasks that could cause duplicate context injection or suppress `<model_switch>` instructions. This PR re-centers the invariants around regular turns: - regular turns persist model-visible context diffs/full reinjection and update the `reference_context_item` - standalone tasks do not advance those regular-turn baselines - compaction clears the baseline when replacement history may have stripped the referenced context diffs ## Follow-ups (TODOs left in code) - `TODO(ccunningham)`: fix rollback/backtracking baseline handling more comprehensively - `TODO(ccunningham)`: include pending incoming context items in pre-turn compaction threshold estimation - `TODO(ccunningham)`: inject updated personality spec alongside `<model_switch>` so some model-switch paths can avoid forced full reinjection - `TODO(ccunningham)`: review task turn lifecycle (`TurnStarted`/`TurnComplete`) behavior and emit task-start context diffs for task types that should have them (excluding `/compact`) ## Validation - `just fmt` - CI should cover the updated compaction/resume/model-visible snapshot expectations and rollout-hydration behavior - I did not rerun the full local test suite after the latest resume-lookup / rollout-persistence simplifications	2026-02-20 23:13:08 -08:00
Charley Cunningham	c16f9daaaf	Add model-visible context layout snapshot tests (#12073 ) ## Summary - add a dedicated `core/tests/suite/model_visible_layout.rs` snapshot suite to materialize model-visible request layout in high-value scenarios - add three reviewer-focused snapshot scenarios: - turn-level context updates (cwd / permissions / personality) - first post-resume turn with model hydration + personality change - first post-resume turn where pre-turn model override matches rollout model - wire the new suite into `core/tests/suite/mod.rs` - commit generated `insta` snapshots under `core/tests/suite/snapshots/` ## Why This creates a stable, reviewable baseline of model-visible context layout against `main` before follow-on context-management refactors. It lets subsequent PRs show focused snapshot diffs for behavior changes instead of introducing the test surface and behavior changes at once. ## Testing - `just fmt` - `INSTA_UPDATE=always cargo test -p codex-core model_visible_layout`	2026-02-17 22:30:29 -08:00
Charley Cunningham	eb68767f2f	Unify remote compaction snapshot mocks around default endpoint behavior (#12050 ) ## Summary - standardize remote compaction test mocking around one default behavior in shared helpers - make default remote compact mocks mirror production shape: keep `message/user` + `message/developer`, drop assistant/tool artifacts, then append a summary user message - switch non-special `compact_remote` tests to the shared default mock instead of ad-hoc JSON payloads ## Special-case tests that still use explicit mocks - remote compaction error payload / HTTP failure behavior - summary-only compact output behavior - manual `/compact` with no prior user messages - stale developer-instruction injection coverage ## Why This removes inconsistent manual remote compaction fixtures and gives us one source of truth for normal remote compact behavior, while preserving explicit mocks only where tests intentionally cover non-default behavior.	2026-02-17 18:18:47 -08:00
Charley Cunningham	85034b189e	core: snapshot tests for compaction requests, post-compaction layout, some additional compaction tests (#11487 ) This PR keeps compaction context-layout test coverage separate from runtime compaction behavior changes, so runtime logic review can stay focused. ## Included - Adds reusable context snapshot helpers in `core/tests/common/context_snapshot.rs` for rendering model-visible request/history shapes. - Standardizes helper naming for readability: - `format_request_input_snapshot` - `format_response_items_snapshot` - `format_labeled_requests_snapshot` - `format_labeled_items_snapshot` - Expands snapshot coverage for both local and remote compaction flows: - pre-turn auto-compaction - pre-turn failure/context-window-exceeded paths - mid-turn continuation compaction - manual `/compact` with and without prior user turns - Captures both sides where relevant: - compaction request shape - post-compaction history layout shape - Adds/uses shared request-inspection helpers so assertions target structured request content instead of ad-hoc JSON string parsing. - Aligns snapshots/assertions to current behavior and leaves explicit `TODO(ccunningham)` notes where behavior is known and intentionally deferred. ## Not Included - No runtime compaction logic changes. - No model-visible context/state behavior changes.	2026-02-14 19:57:10 -08:00

19 Commits