codex

mirror of https://github.com/openai/codex.git synced 2026-04-28 16:45:54 +00:00

Author	SHA1	Message	Date
sayan-oai	d272f45058	move plugin/skill instructions into dev msg and reorder (#14609 ) Move the general `Apps`, `Skills` and `Plugins` instructions blocks out of `user_instructions` and into the developer message, with new `Apps -> Skills -> Plugins` order for better clarity. Also wrap those sections in stable XML-style instruction tags (like other sections) and update prompt-layout tests/snapshots. This makes the tests less brittle in snapshot output (we can parse the sections), and it consolidates the capability instructions in one place. #### Tests Updated snapshots, added tests. `<AGENTS_MD>` disappearing in snapshots is expected: before this change, the wrapped user-instructions message was kept alive by `Skills` content. Now that `Skills` and `Plugins` are in the developer message, that wrapper only appears when there is real project-doc/user-instructions content. --------- Co-authored-by: Charley Cunningham <ccunningham@openai.com>	2026-03-13 20:51:01 -07:00
Charley Cunningham	bc24017d64	Add Smart Approvals guardian review across core, app-server, and TUI (#13860 ) ## Summary - add `approvals_reviewer = "user" \| "guardian_subagent"` as the runtime control for who reviews approval requests - route Smart Approvals guardian review through core for command execution, file changes, managed-network approvals, MCP approvals, and delegated/subagent approval flows - expose guardian review in app-server with temporary unstable `item/autoApprovalReview/{started,completed}` notifications carrying `targetItemId`, `review`, and `action` - update the TUI so Smart Approvals can be enabled from `/experimental`, aligned with the matching `/approvals` mode, and surfaced clearly while reviews are pending or resolved ## Runtime model This PR does not introduce a new `approval_policy`. Instead: - `approval_policy` still controls when approval is needed - `approvals_reviewer` controls who reviewable approval requests are routed to: - `user` - `guardian_subagent` `guardian_subagent` is a carefully prompted reviewer subagent that gathers relevant context and applies a risk-based decision framework before approving or denying the request. The `smart_approvals` feature flag is a rollout/UI gate. Core runtime behavior keys off `approvals_reviewer`. When Smart Approvals is enabled from the TUI, it also switches the current `/approvals` settings to the matching Smart Approvals mode so users immediately see guardian review in the active thread: - `approval_policy = on-request` - `approvals_reviewer = guardian_subagent` - `sandbox_mode = workspace-write` Users can still change `/approvals` afterward. Config-load behavior stays intentionally narrow: - plain `smart_approvals = true` in `config.toml` remains just the rollout/UI gate and does not auto-set `approvals_reviewer` - the deprecated `guardian_approval = true` alias migration does backfill `approvals_reviewer = "guardian_subagent"` in the same scope when that reviewer is not already configured there, so old configs preserve their original guardian-enabled behavior ARC remains a separate safety check. For MCP tool approvals, ARC escalations now flow into the configured reviewer instead of always bypassing guardian and forcing manual review. ## Config stability The runtime reviewer override is stable, but the config-backed app-server protocol shape is still settling. - `thread/start`, `thread/resume`, and `turn/start` keep stable `approvalsReviewer` overrides - the config-backed `approvals_reviewer` exposure returned via `config/read` (including profile-level config) is now marked `[UNSTABLE]` / experimental in the app-server protocol until we are more confident in that config surface ## App-server surface This PR intentionally keeps the guardian app-server shape narrow and temporary. It adds generic unstable lifecycle notifications: - `item/autoApprovalReview/started` - `item/autoApprovalReview/completed` with payloads of the form: - `{ threadId, turnId, targetItemId, review, action? }` `review` is currently: - `{ status, riskScore?, riskLevel?, rationale? }` - where `status` is one of `inProgress`, `approved`, `denied`, or `aborted` `action` carries the guardian action summary payload from core when available. This lets clients render temporary standalone pending-review UI, including parallel reviews, even when the underlying tool item has not been emitted yet. These notifications are explicitly documented as `[UNSTABLE]` and expected to change soon. This PR does not persist guardian review state onto `thread/read` tool items. The intended follow-up is to attach guardian review state to the reviewed tool item lifecycle instead, which would improve consistency with manual approvals and allow thread history / reconnect flows to replay guardian review state directly. ## TUI behavior - `/experimental` exposes the rollout gate as `Smart Approvals` - enabling it in the TUI enables the feature and switches the current session to the matching Smart Approvals `/approvals` mode - disabling it in the TUI clears the persisted `approvals_reviewer` override when appropriate and returns the session to default manual review when the effective reviewer changes - `/approvals` still exposes the reviewer choice directly - the TUI renders: - pending guardian review state in the live status footer, including parallel review aggregation - resolved approval/denial state in history ## Scope notes This PR includes the supporting core/runtime work needed to make Smart Approvals usable end-to-end: - shell / unified-exec / apply_patch / managed-network / MCP guardian review - delegated/subagent approval routing into guardian review - guardian review risk metadata and action summaries for app-server/TUI - config/profile/TUI handling for `smart_approvals`, `guardian_approval` alias migration, and `approvals_reviewer` - a small internal cleanup of delegated approval forwarding to dedupe fallback paths and simplify guardian-vs-parent approval waiting (no intended behavior change) Out of scope for this PR: - redesigning the existing manual approval protocol shapes - persisting guardian review state onto app-server `ThreadItem`s - delegated MCP elicitation auto-review (the current delegated MCP guardian shim only covers the legacy `RequestUserInput` path) --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-13 15:27:00 -07:00
Rasmus Rygaard	53d5972226	Reapply "Pass more params to compaction" (#14298 ) (#14521 ) This reverts commit `8af97ce4b0`. Confirmed that this runs locally without the previous issues with tool use	2026-03-12 23:27:21 +00:00
Charley Cunningham	f5bb338fdb	Defer initial context insertion until the first turn (#14313 ) ## Summary - defer fresh-session `build_initial_context()` until the first real turn instead of seeding model-visible context during startup - rely on the existing `reference_context_item == None` turn-start path to inject full initial context on that first real turn (and again after baseline resets such as compaction) - add a regression test for `InitialHistory::New` and update affected deterministic tests / snapshots around developer-message layout, collaboration instructions, personality updates, and compact request shapes ## Notes - this PR does not add any special empty-thread `/compact` behavior - most of the snapshot churn is the direct result of moving the initial model-visible context from startup to the first real turn, so first-turn request layouts no longer contain a pre-user startup copy of permissions / environment / other developer-visible context - remote manual `/compact` with no prior user still skips the remote compact request; local first-turn `/compact` still issues a compact request, but that request now reflects the lack of startup-seeded context --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-11 12:33:10 -07:00
Rasmus Rygaard	7f22329389	Revert "Pass more params to compaction" (#14298 )	2026-03-11 12:33:10 -07:00
Ahmed Ibrahim	39c1bc1c68	Add realtime start instructions config override (#14270 ) - add `realtime_start_instructions` config support - thread it into realtime context updates, schema, docs, and tests	2026-03-11 12:33:09 -07:00
Rasmus Rygaard	2621ba17e3	Pass more params to compaction (#14247 ) Pass more params to /compact. This should give us parity with the /responses endpoint to improve caching. I'm torn about the MCP await. Blocking will give us parity but it seems like we explicitly don't block on MCPs. Happy either way	2026-03-11 12:33:09 -07:00
pash-openai	2f5b01abd6	add fast mode toggle (#13212 ) - add a local Fast mode setting in codex-core (similar to how model id is currently stored on disk locally) - send `service_tier=priority` on requests when Fast is enabled - add `/fast` in the TUI and persist it locally - feature flag	2026-03-02 20:29:33 -08:00
Ahmed Ibrahim	b20b6aa46f	Update realtime websocket API (#13265 ) - migrate the realtime websocket transport to the new session and handoff flow - make the realtime model configurable in config.toml and use API-key auth for the websocket --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-02 16:05:40 -08:00
Ahmed Ibrahim	0aeb55bf08	Record realtime close marker on replacement (#13058 ) ## Summary - record a realtime close developer message when a new realtime session replaces an active one - assert the replacement marker through the mocked responses request path --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Charles Cunningham <ccunningham@openai.com>	2026-03-01 13:54:12 -08:00
Rasmus Rygaard	73eaebbd1c	Propagate session ID when compacting (#12802 ) We propagate the session ID when sending requests for inference but we don't do the same for compaction requests. This makes it hard to link compaction requests to their session for debugging purposes	2026-02-25 19:17:38 +00:00
Michael Bolin	1af2a37ada	chore: remove codex-core public protocol/shell re-exports (#12432 ) ## Why `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules from `codex-protocol` and `codex-shell-command`. That made it easy for workspace crates to import those APIs through `codex-core`, which in turn hides dependency edges and makes it harder to reduce compile-time coupling over time. This change removes those public re-exports so call sites must import from the source crates directly. Even when a crate still depends on `codex-core` today, this makes dependency boundaries explicit and unblocks future work to drop `codex-core` dependencies where possible. ## What Changed - Removed public re-exports from `codex-rs/core/src/lib.rs` for: - `codex_protocol::protocol` and related protocol/model types (including `InitialHistory`) - `codex_protocol::config_types` (`protocol_config_types`) - `codex_shell_command::{bash, is_dangerous_command, is_safe_command, parse_command, powershell}` - Migrated workspace Rust call sites to import directly from: - `codex_protocol::protocol` - `codex_protocol::config_types` - `codex_protocol::models` - `codex_shell_command` - Added explicit `Cargo.toml` dependencies (`codex-protocol` / `codex-shell-command`) in crates that now import those crates directly. - Kept `codex-core` internal modules compiling by using `pub(crate)` aliases in `core/src/lib.rs` (internal-only, not part of the public API). - Updated the two utility crates that can already drop a `codex-core` dependency edge entirely: - `codex-utils-approval-presets` - `codex-utils-cli` ## Verification - `cargo test -p codex-utils-approval-presets` - `cargo test -p codex-utils-cli` - `cargo check --workspace --all-targets` - `just clippy`	2026-02-20 23:45:35 -08:00
Charley Cunningham	bb0ac5be70	Fix compaction context reinjection and model baselines (#12252 ) ## Summary - move regular-turn context diff/full-context persistence into `run_turn` so pre-turn compaction runs before incoming context updates are recorded - after successful pre-turn compaction, rely on a cleared `reference_context_item` to trigger full context reinjection on the follow-up regular turn (manual `/compact` keeps replacement history summary-only and also clears the baseline) - preserve `<model_switch>` when full context is reinjected, and inject it before the rest of the full-context items - scope `reference_context_item` and `previous_model` to regular user turns only so standalone tasks (`/compact`, shell, review, undo) cannot suppress future reinjection or `<model_switch>` behavior - make context-diff persistence + `reference_context_item` updates explicit in the regular-turn path, with clearer docs/comments around the invariant - stop persisting local `/compact` `RolloutItem::TurnContext` snapshots (only regular turns persist `TurnContextItem` now) - simplify resume/fork previous-model/reference-baseline hydration by looking up the last surviving turn context from rollout lifecycle events, including rollback and compaction-crossing handling - remove the legacy fallback that guessed from bare `TurnContext` rollouts without lifecycle events - update compaction/remote-compaction/model-visible snapshots and compact test assertions (including remote compaction mock response shape) ## Why We were persisting incoming context items before spawning the regular turn task, which let pre-turn compaction requests accidentally include incoming context diffs without the new user message. Fixing that exposed follow-on baseline issues around `/compact`, resume/fork, and standalone tasks that could cause duplicate context injection or suppress `<model_switch>` instructions. This PR re-centers the invariants around regular turns: - regular turns persist model-visible context diffs/full reinjection and update the `reference_context_item` - standalone tasks do not advance those regular-turn baselines - compaction clears the baseline when replacement history may have stripped the referenced context diffs ## Follow-ups (TODOs left in code) - `TODO(ccunningham)`: fix rollback/backtracking baseline handling more comprehensively - `TODO(ccunningham)`: include pending incoming context items in pre-turn compaction threshold estimation - `TODO(ccunningham)`: inject updated personality spec alongside `<model_switch>` so some model-switch paths can avoid forced full reinjection - `TODO(ccunningham)`: review task turn lifecycle (`TurnStarted`/`TurnComplete`) behavior and emit task-start context diffs for task types that should have them (excluding `/compact`) ## Validation - `just fmt` - CI should cover the updated compaction/resume/model-visible snapshot expectations and rollout-hydration behavior - I did not rerun the full local test suite after the latest resume-lookup / rollout-persistence simplifications	2026-02-20 23:13:08 -08:00
Charley Cunningham	eb68767f2f	Unify remote compaction snapshot mocks around default endpoint behavior (#12050 ) ## Summary - standardize remote compaction test mocking around one default behavior in shared helpers - make default remote compact mocks mirror production shape: keep `message/user` + `message/developer`, drop assistant/tool artifacts, then append a summary user message - switch non-special `compact_remote` tests to the shared default mock instead of ad-hoc JSON payloads ## Special-case tests that still use explicit mocks - remote compaction error payload / HTTP failure behavior - summary-only compact output behavior - manual `/compact` with no prior user messages - stale developer-instruction injection coverage ## Why This removes inconsistent manual remote compaction fixtures and gives us one source of truth for normal remote compact behavior, while preserving explicit mocks only where tests intentionally cover non-default behavior.	2026-02-17 18:18:47 -08:00
Charley Cunningham	85034b189e	core: snapshot tests for compaction requests, post-compaction layout, some additional compaction tests (#11487 ) This PR keeps compaction context-layout test coverage separate from runtime compaction behavior changes, so runtime logic review can stay focused. ## Included - Adds reusable context snapshot helpers in `core/tests/common/context_snapshot.rs` for rendering model-visible request/history shapes. - Standardizes helper naming for readability: - `format_request_input_snapshot` - `format_response_items_snapshot` - `format_labeled_requests_snapshot` - `format_labeled_items_snapshot` - Expands snapshot coverage for both local and remote compaction flows: - pre-turn auto-compaction - pre-turn failure/context-window-exceeded paths - mid-turn continuation compaction - manual `/compact` with and without prior user turns - Captures both sides where relevant: - compaction request shape - post-compaction history layout shape - Adds/uses shared request-inspection helpers so assertions target structured request content instead of ad-hoc JSON string parsing. - Aligns snapshots/assertions to current behavior and leaves explicit `TODO(ccunningham)` notes where behavior is known and intentionally deferred. ## Not Included - No runtime compaction logic changes. - No model-visible context/state behavior changes.	2026-02-14 19:57:10 -08:00
Ahmed Ibrahim	40de788c4d	Clamp auto-compact limit to context window (#11516 ) - Clamp auto-compaction to the minimum of configured limit and 90% of context window - Add an e2e compact test for clamped behavior - Update remote compact tests to account for earlier auto-compaction in setup turns	2026-02-11 17:41:08 -08:00
Ahmed Ibrahim	ba8b5d9018	Treat compaction failure as failure state (#10927 ) - Return compaction errors from local and remote compaction flows.\n- Stop turns/tasks when auto-compaction fails instead of continuing execution.	2026-02-06 13:51:46 -08:00
Charley Cunningham	143daadb31	core: refresh developer instructions after compaction replacement history (#10574 ) ## Summary When replaying compacted history (especially `replacement_history` from remote compaction), we should not keep stale developer messages from older session state. This PR trims developer- role messages from compacted replacement history and reinjects fresh developer instructions derived from current turn/session state. This aligns compaction replay behavior with the intended "fresh instructions after summary" model. ## Problem Compaction replay had two paths: - `Compacted { replacement_history: None }`: rebuilt with fresh initial context - `Compacted { replacement_history: Some(...) }`: previously used raw replacement history as-is The second path could carry stale developer instructions (permissions/personality/collab-mode guidance) across session changes. ## What Changed ### 1) Added helper to refresh compacted developer instructions - File: `codex-rs/core/src/compact.rs` - Function: `refresh_compacted_developer_instructions(...)` Behavior: - remove all `ResponseItem::Message { role: "developer", .. }` from compacted history - append fresh developer messages from current `build_initial_context(...)` ### 2) Applied helper in remote compaction flow - File: `codex-rs/core/src/compact_remote.rs` - After receiving compact endpoint output, refresh developer instructions before replacing history and persisting `replacement_history`. ### 3) Applied helper while reconstructing history from rollout - File: `codex-rs/core/src/codex.rs` - In `reconstruct_history_from_rollout(...)`, when processing `Compacted` entries with `replacement_history`, refresh developer instructions instead of directly replacing with raw history. ## Non-Goals / Follow-up This PR does not address the existing first-turn-after-resume double-injection behavior. A follow-up PR will handle resume-time dedup/idempotence separately. If you want, I can also give you a shorter “squash-merge friendly” version of the description. ## Codex author `codex fork 019c25e6-706e-75d1-9198-688ec00a8256`	2026-02-06 12:25:08 -08:00
Eric Traut	dd80e332c4	Removed the "remote_compaction" feature flag (#10840 ) This feature is always on now	2026-02-05 23:54:57 -08:00
Charley Cunningham	dc7007beaa	Fix remote compaction estimator/payload instruction small mismatch (#10692 ) ## Summary This PR fixes a deterministic mismatch in remote compaction where pre-trim estimation and the `/v1/responses/compact` payload could use different base instructions. Before this change: - pre-trim estimation used model-derived instructions (`model_info.get_model_instructions(...)`) - compact payload used session base instructions (`sess.get_base_instructions()`) After this change: - remote pre-trim estimation and compact payload both use the same `BaseInstructions` instance from session state. ## Changes - Added a shared estimator entry point in `ContextManager`: - `estimate_token_count_with_base_instructions(&self, base_instructions: &BaseInstructions) -> Option<i64>` - Kept `estimate_token_count(&TurnContext)` as a thin wrapper that resolves model/personality instructions and delegates to the new helper. - Updated remote compaction flow to fetch base instructions once and reuse it for both: - trim preflight estimation - compact request payload construction - Added regression coverage for parity and behavior: - unit test verifying explicit-base estimator behavior - integration test proving remote compaction uses session override instructions and trims accordingly ## Why this matters This removes a deterministic divergence source where pre-trim could think the request fits while the actual compact request exceeded context because its instructions were longer/different. ## Scope In scope: - estimator/payload base-instructions parity in remote compaction Out of scope: - retry-on-`context_length_exceeded` - compaction threshold/headroom policy changes - broader trimming policy changes ## Codex author: `codex fork 019c2b24-c2df-7b31-a482-fb8cf7a28559`	2026-02-04 23:24:06 -08:00
pakrym-oai	53d8474061	Ignore remote_compact_trims_function_call_history_to_fit_context_window on windows (#10474 )	2026-02-02 22:47:38 -08:00
pakrym-oai	cbfd2a37cc	Trim compaction input (#10374 ) Two fixes: 1. Include trailing tool output in the total context size calculation. Otherwise when checking whether compaction should run we ignore newly added outputs. 2. Trim trailing tool output/tool calls until we can fit the request into the model context size. Otherwise the compaction endpoint will fail to compact. We only trim items that can be reproduced again by the model (tool calls, tool call outputs).	2026-02-02 19:03:11 -08:00
sayan-oai	fc05374344	chore: add phase to message responseitem (#10455 ) ### What add wiring for `phase` field on `ResponseItem::Message` to lay groundwork for differentiating model preambles and final messages. currently optional. follows pattern in #9698. updated schemas with `just write-app-server-schema` so we can see type changes. ### Tests Updated existing tests for SSE parsing and hydrating from history	2026-02-03 02:52:26 +00:00
Ahmed Ibrahim	b7edeee8ca	compaction (#10034 ) # External (non-OpenAI) Pull Request Requirements Before opening this Pull Request, please read the dedicated "Contributing" markdown file or your PR may be closed: https://github.com/openai/codex/blob/main/docs/contributing.md If your PR conforms to our contribution guidelines, replace this text with a detailed and high quality description of your changes. Include a link to a bug report or enhancement request.	2026-01-28 11:36:11 -08:00
jif-oai	83775f4df1	feat: ephemeral threads (#9765 ) Add ephemeral threads capabilities. Only exposed through the `app-server` v2 The idea is to disable the rollout recorder for those threads.	2026-01-24 14:57:40 +00:00
pakrym-oai	b511c38ddb	Support end_turn flag (#9698 ) Experimental flag that signals the end of the turn.	2026-01-22 17:27:48 +00:00
charley-oai	4a9c2bcc5a	Add text element metadata to types (#9235 ) Initial type tweaking PR to make the diff of https://github.com/openai/codex/pull/9116 smaller This should not change any behavior, just adds some fields to types	2026-01-14 16:41:50 -08:00
jif-oai	1aed01e99f	renaming: task to turn (#8963 )	2026-01-09 17:31:17 +00:00
Anton Panasenko	807f8a43c2	feat: expose outputSchema to user_turn/turn_start app_server API (#8377 ) What changed - Added `outputSchema` support to the app-server APIs, mirroring `codex exec --output-schema` behavior. - V1 `sendUserTurn` now accepts `outputSchema` and constrains the final assistant message for that turn. - V2 `turn/start` now accepts `outputSchema` and constrains the final assistant message for that turn (explicitly per-turn only). Core behavior - `Op::UserTurn` already supported `final_output_json_schema`; now V1 `sendUserTurn` forwards `outputSchema` into that field. - `Op::UserInput` now carries `final_output_json_schema` for per-turn settings updates; core maps it into `SessionSettingsUpdate.final_output_json_schema` so it applies to the created turn context. - V2 `turn/start` does NOT persist the schema via `OverrideTurnContext` (it’s applied only for the current turn). Other overrides (cwd/model/etc) keep their existing persistent behavior. API / docs - `codex-rs/app-server-protocol/src/protocol/v1.rs`: add `output_schema: Option<serde_json::Value>` to `SendUserTurnParams` (serialized as `outputSchema`). - `codex-rs/app-server-protocol/src/protocol/v2.rs`: add `output_schema: Option<JsonValue>` to `TurnStartParams` (serialized as `outputSchema`). - `codex-rs/app-server/README.md`: document `outputSchema` for `turn/start` and clarify it applies only to the current turn. - `codex-rs/docs/codex_mcp_interface.md`: document `outputSchema` for v1 `sendUserTurn` and v2 `turn/start`. Tests added/updated - New app-server integration tests asserting `outputSchema` is forwarded into outbound `/responses` requests as `text.format`: - `codex-rs/app-server/tests/suite/output_schema.rs` - `codex-rs/app-server/tests/suite/v2/output_schema.rs` - Added per-turn semantics tests (schema does not leak to the next turn): - `send_user_turn_output_schema_is_per_turn_v1` - `turn_start_output_schema_is_per_turn_v2` - Added protocol wire-compat tests for the merged op: - serialize omits `final_output_json_schema` when `None` - deserialize works when field is missing - serialize includes `final_output_json_schema` when `Some(schema)` Call site updates (high level) - Updated all `Op::UserInput { .. }` constructions to include `final_output_json_schema`: - `codex-rs/app-server/src/codex_message_processor.rs` - `codex-rs/core/src/codex_delegate.rs` - `codex-rs/mcp-server/src/codex_tool_runner.rs` - `codex-rs/tui/src/chatwidget.rs` - `codex-rs/tui2/src/chatwidget.rs` - plus impacted core tests. Validation - `just fmt` - `cargo test -p codex-core` - `cargo test -p codex-app-server` - `cargo test -p codex-mcp-server` - `cargo test -p codex-tui` - `cargo test -p codex-tui2` - `cargo test -p codex-protocol` - `cargo clippy --all-features --tests --profile dev --fix -- -D warnings`	2026-01-05 10:27:00 -08:00
pakrym-oai	b3ddd50eee	Remote compact for API-key users (#7835 )	2025-12-12 10:05:02 -08:00
jif-oai	9ba27cfa0a	feat: add compaction event (#7289 )	2025-11-25 16:12:14 +00:00
pakrym-oai	75f38f16dd	Run remote auto compaction (#6879 )	2025-11-19 00:43:58 -08:00
jif-oai	838531d3e4	feat: remote compaction (#6795 ) Co-authored-by: pakrym-oai <pakrym@openai.com>	2025-11-18 16:51:16 +00:00

33 Commits