Commit Graph

652 Commits

Author SHA1 Message Date
Charles Cunningham
be08067aca Stabilize metadata spawn test with local git repo
Co-authored-by: Codex <noreply@openai.com>
2026-03-05 17:28:04 -08:00
Charles Cunningham
c727a66bb9 Carry trace_id in refreshed turn contexts
Co-authored-by: Codex <noreply@openai.com>
2026-03-05 17:10:03 -08:00
Charles Cunningham
4b6c5880fb Prevent stale same-turn context refresh overwrites
Co-authored-by: Codex <noreply@openai.com>
2026-03-05 15:19:20 -08:00
Charles Cunningham
03119b822c Serialize session config updates with MCP side effects
Co-authored-by: Codex <noreply@openai.com>
2026-03-05 15:19:20 -08:00
Charles Cunningham
23e54a9593 Guard refreshed turn-context installs by sub_id
Co-authored-by: Codex <noreply@openai.com>
2026-03-05 15:19:20 -08:00
Charles Cunningham
ec5f273255 Install refreshed turn metadata tasks atomically
Co-authored-by: Codex <noreply@openai.com>
2026-03-05 15:19:20 -08:00
Charles Cunningham
f06e7c9bb9 Cancel refreshed turn metadata tasks on teardown
Co-authored-by: Codex <noreply@openai.com>
2026-03-05 15:19:20 -08:00
Charles Cunningham
1f50631422 Use refreshed turn context for network approvals
Co-authored-by: Codex <noreply@openai.com>
2026-03-05 15:19:19 -08:00
Charles Cunningham
82e08ddece Persist refreshed context before first sampling request
Co-authored-by: Codex <noreply@openai.com>
2026-03-05 15:19:19 -08:00
Charles Cunningham
d88b47d010 Advance previous-turn baseline on mid-turn compaction
Co-authored-by: Codex <noreply@openai.com>
2026-03-05 15:19:19 -08:00
Charles Cunningham
068272f32f Support mid-turn realtime context refresh
Update active turn contexts when realtime starts or stops, keep previous_turn_settings aligned with committed same-turn context updates, and teach replay plus remote compact snapshots to use the latest TurnContextItem within a turn.

Co-authored-by: Codex <noreply@openai.com>
2026-03-05 15:19:19 -08:00
Charles Cunningham
d8f4093818 Fix rebased turn-context refresh regressions
Restore cross-turn model-switch diffing to use previous-turn settings while keeping same-turn updates keyed to the persisted reference context. Also gate pre-request mid-turn compaction on the immediately preceding follow-up response so continuation requests do not compact repeatedly, and drop the synthetic compaction prompt from rebuilt local compact history.

Co-authored-by: Codex <noreply@openai.com>
2026-03-05 15:19:19 -08:00
Charles Cunningham
2327519d34 Allow turn context refresh between sampling requests
Reload the active turn context before each sampling request so nudges and other mid-turn setting changes can take effect without mutating TurnContext in place. Move mid-turn compaction to the start of follow-up requests, skip separate diff injection when compaction re-establishes full context, and persist TurnContextItem baselines only when the active context actually changes.

Also refresh the active turn context after override_turn_context, reset the reused model client session on mid-turn model changes, and key model-switch diffs off the persisted reference context so same-turn switch-backs still re-inject model instructions.

Co-authored-by: Codex <noreply@openai.com>
2026-03-05 15:19:19 -08:00
Owen Lin
aa3fe8abf8 feat(core): persist trace_id for turns in RolloutItem::TurnContext (#13602)
This PR adds a durable trace linkage for each turn by storing the active
trace ID on the rollout TurnContext record stored in session rollout
files.

Before this change, we propagated trace context at runtime but didn’t
persist a stable per-turn trace key in rollout history. That made
after-the-fact debugging harder (for example, mapping a historical turn
to the corresponding trace in datadog). This sets us up for much easier
debugging in the future.

### What changed
- Added an optional `trace_id` to TurnContextItem (rollout schema).
- Added a small OTEL helper to read the current span trace ID.
- Captured `trace_id` when creating `TurnContext` and included it in
`to_turn_context_item()`.
- Updated tests and fixtures that construct TurnContextItem so
older/no-trace cases still work.

### Why this approach
TurnContext is already the canonical durable per-turn metadata in
rollout. This keeps ownership clean: trace linkage lives with other
persisted turn metadata.
2026-03-05 13:26:48 -08:00
Owen Lin
926b2f19e8 feat(app-server): support mcp elicitations in v2 api (#13425)
This adds a first-class server request for MCP server elicitations:
`mcpServer/elicitation/request`.

Until now, MCP elicitation requests only showed up as a raw
`codex/event/elicitation_request` event from core. That made it hard for
v2 clients to handle elicitations using the same request/response flow
as other server-driven interactions (like shell and `apply_patch`
tools).

This also updates the underlying MCP elicitation request handling in
core to pass through the full MCP request (including URL and form data)
so we can expose it properly in app-server.

### Why not `item/mcpToolCall/elicitationRequest`?
This is because MCP elicitations are related to MCP servers first, and
only optionally to a specific MCP tool call.

In the MCP protocol, elicitation is a server-to-client capability: the
server sends `elicitation/create`, and the client replies with an
elicitation result. RMCP models it that way as well.

In practice an elicitation is often triggered by an MCP tool call, but
not always.

### What changed
- add `mcpServer/elicitation/request` to the v2 app-server API
- translate core `codex/event/elicitation_request` events into the new
v2 server request
- map client responses back into `Op::ResolveElicitation` so the MCP
server can continue
- update app-server docs and generated protocol schema
- add an end-to-end app-server test that covers the full round trip
through a real RMCP elicitation flow
- The new test exercises a realistic case where an MCP tool call
triggers an elicitation, the app-server emits
mcpServer/elicitation/request, the client accepts it, and the tool call
resumes and completes successfully.

### app-server API flow
- Client starts a thread with `thread/start`.
- Client starts a turn with `turn/start`.
- App-server sends `item/started` for the `mcpToolCall`.
- While that tool call is in progress, app-server sends
`mcpServer/elicitation/request`.
- Client responds to that request with `{ action: "accept" | "decline" |
"cancel" }`.
- App-server sends `serverRequest/resolved`.
- App-server sends `item/completed` for the mcpToolCall.
- App-server sends `turn/completed`.
- If the turn is interrupted while the elicitation is pending,
app-server still sends `serverRequest/resolved` before the turn
finishes.
2026-03-05 07:20:20 -08:00
jif-oai
0cc6835416 feat: ultra polish package manager (#13573)
See the readme
2026-03-05 13:02:30 +00:00
jif-oai
f304b2ef62 feat: bind package manager (#13571) 2026-03-05 11:57:13 +00:00
Ahmed Ibrahim
8f828f8a43 Reduce realtime audio submission log noise (#13539)
- lower `submission_dispatch` span logging to debug for realtime audio
submissions only
- keep other submission spans at info and add a targeted test for the
level selection

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-04 22:44:14 -08:00
sayan-oai
d44398905b feat: track plugins mcps/apps and add plugin info to user_instructions (#13433)
### first half of changes, followed by #13510

Track plugin capabilities as derived summaries on `PluginLoadOutcome`
for enabled plugins with at least one skill/app/mcp.

Also add `Plugins` section to `user_instructions` injected on session
start. These introduce the plugins concept and list enabled plugins, but
do NOT currently include paths to enabled plugins or details on what
apps/mcps the plugins contain (current plan is to inject this on
@-mention). that can be adjusted in a follow up and based on evals.

### tests
Added/updated tests, confirmed locally that new `Plugins` section +
currently enabled plugins show up in `user_instructions`.
2026-03-04 19:46:13 -08:00
Won Park
229e6d0347 image-gen-event/client_processing (#13512)
enabling client-side to process with image-generation capabilities
(setting app-server)
2026-03-04 16:54:38 -08:00
jif-oai
2322e49549 feat: external artifacts builder (#13485)
This PR reverts the built-in artifact render while a decision is being
reached. No impact expected on any features
2026-03-04 20:22:34 +00:00
Owen Lin
27724f6ead feat(core, tracing): add a span representing a turn (#13424)
This is PR 3 of the app-server tracing rollout.

PRs https://github.com/openai/codex/pull/13285 and
https://github.com/openai/codex/pull/13368 gave us inbound request spans
in app-server and propagated trace context through Submission. This
change finishes the next piece in core: when a request actually starts a
turn, we now create a core-owned long-lived span that stays open for the
real lifetime of the turn.

What changed:
- `Session::spawn_task` can now optionally create a long-lived turn span
and run the spawned task inside it
- `turn/start` uses that path, so normal turn execution stays under a
single core-owned span after the async handoff
- `review/start` uses the same pattern
- added a unit test that verifies the spawned turn task inherits the
submission dispatch trace ancestry

**Why**
The app-server request span is intentionally short-lived. Once work
crosses into core, we still want one span that covers the actual
execution window until completion or interruption. This keeps that
ownership where it belongs: in the layer that owns the runtime
lifecycle.
2026-03-04 11:09:17 -08:00
jif-oai
f72ab43fd1 feat: memories in workspace write (#13467) 2026-03-04 13:00:26 +00:00
jif-oai
49634b7f9c add metric for per-turn token usage (#13454) 2026-03-04 10:17:25 +00:00
Michael Bolin
bfff0c729f config: enforce enterprise feature requirements (#13388)
## Why

Enterprises can already constrain approvals, sandboxing, and web search
through `requirements.toml` and MDM, but feature flags were still only
configurable as managed defaults. That meant an enterprise could suggest
feature values, but it could not actually pin them.

This change closes that gap and makes enterprise feature requirements
behave like the other constrained settings. The effective feature set
now stays consistent with enterprise requirements during config load,
when config writes are validated, and when runtime code mutates feature
flags later in the session.

It also tightens the runtime API for managed features. `ManagedFeatures`
now follows the same constraint-oriented shape as `Constrained<T>`
instead of exposing panic-prone mutation helpers, and production code
can no longer construct it through an unconstrained `From<Features>`
path.

The PR also hardens the `compact_resume_fork` integration coverage on
Windows. After the feature-management changes,
`compact_resume_after_second_compaction_preserves_history` was
overflowing the libtest/Tokio thread stacks on Windows, so the test now
uses an explicit larger-stack harness as a pragmatic mitigation. That
may not be the ideal root-cause fix, and it merits a parallel
investigation into whether part of the async future chain should be
boxed to reduce stack pressure instead.

## What Changed

Enterprises can now pin feature values in `requirements.toml` with the
requirements-side `features` table:

```toml
[features]
personality = true
unified_exec = false
```

Only canonical feature keys are allowed in the requirements `features`
table; omitted keys remain unconstrained.

- Added a requirements-side pinned feature map to
`ConfigRequirementsToml`, threaded it through source-preserving
requirements merge and normalization in `codex-config`, and made the
TOML surface use `[features]` (while still accepting legacy
`[feature_requirements]` for compatibility).
- Exposed `featureRequirements` from `configRequirements/read`,
regenerated the JSON/TypeScript schema artifacts, and updated the
app-server README.
- Wrapped the effective feature set in `ManagedFeatures`, backed by
`ConstrainedWithSource<Features>`, and changed its API to mirror
`Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
and result-returning `enable` / `disable` / `set_enabled` helpers.
- Removed the legacy-usage and bulk-map passthroughs from
`ManagedFeatures`; callers that need those behaviors now mutate a plain
`Features` value and reapply it through `set(...)`, so the constrained
wrapper remains the enforcement boundary.
- Removed the production loophole for constructing unconstrained
`ManagedFeatures`. Non-test code now creates it through the configured
feature-loading path, and `impl From<Features> for ManagedFeatures` is
restricted to `#[cfg(test)]`.
- Rejected legacy feature aliases in enterprise feature requirements,
and return a load error when a pinned combination cannot survive
dependency normalization.
- Validated config writes against enterprise feature requirements before
persisting changes, including explicit conflicting writes and
profile-specific feature states that normalize into invalid
combinations.
- Updated runtime and TUI feature-toggle paths to use the constrained
setter API and to persist or apply the effective post-constraint value
rather than the requested value.
- Updated the `core_test_support` Bazel target to include the bundled
core model-catalog fixtures in its runtime data, so helper code that
resolves `core/models.json` through runfiles works in remote Bazel test
environments.
- Renamed the core config test coverage to emphasize that effective
feature values are normalized at runtime, while conflicting persisted
config writes are rejected.
- Ran `compact_resume_after_second_compaction_preserves_history` inside
an explicit 8 MiB test thread and Tokio runtime worker stack, following
the existing larger-stack integration-test pattern, to keep the Windows
`compact_resume_fork` test slice from aborting while a parallel
investigation continues into whether some of the underlying async
futures should be boxed.

## Verification

- `cargo test -p codex-config`
- `cargo test -p codex-core feature_requirements_ -- --nocapture`
- `cargo test -p codex-core
load_requirements_toml_produces_expected_constraints -- --nocapture`
- `cargo test -p codex-core
compact_resume_after_second_compaction_preserves_history -- --nocapture`
- `cargo test -p codex-core compact_resume_fork -- --nocapture`
- Re-ran the built `codex-core` `tests/all` binary with
`RUST_MIN_STACK=262144` for
`compact_resume_after_second_compaction_preserves_history` to confirm
the explicit-stack harness fixes the deterministic low-stack repro.
- `cargo test -p codex-core`
- This still fails locally in unrelated integration areas that expect
the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
wiremock mismatches.

## Docs

`developers.openai.com/codex` should document the requirements-side
`[features]` table for enterprise and MDM-managed configuration,
including that it only accepts canonical feature keys and that
conflicting config writes are rejected.
2026-03-04 04:40:22 +00:00
Owen Lin
52521a5e40 feat(app-server): propagate app-server trace context into core (#13368)
### Summary
Propagate trace context originating at app-server RPC method handlers ->
codex core submission loop (so this includes spans such as `run_turn`!).
This implements PR 2 of the app-server tracing rollout.

This also removes the old lower-level env-based reparenting in core so
explicit request/submission ancestry wins instead of being overridden by
ambient `TRACEPARENT` state.

### What changed
- Added `trace: Option<W3cTraceContext>` to codex_protocol::Submission
- Taught `Codex::submit()` / `submit_with_id()` to automatically capture
the current span context when constructing or forwarding a submission
- Wrapped the core submission loop in a submission_dispatch span
parented from Submission.trace
- Warn on invalid submission trace carriers and ignore them cleanly
- Removed the old env-based downstream reparenting path in core task
execution
- Stopped OTEL provider init from implicitly attaching env trace context
process-wide
- Updated mcp-server Submission call sites for the new field

Added focused unit tests for:
- capturing trace context into Submission
- preferring `Submission.trace` when building the core dispatch span

### Why
PR 1 gave us consistent inbound request spans in app-server, but that
only covered the transport boundary. For long-running work like turns
and reviews, the important missing piece was preserving ancestry after
the request handler returns and core continues work on a different async
path.

This change makes that handoff explicit and keeps the parentage rules
simple:
- app-server request span sets the current context
- `Submission.trace` snapshots that context
- core restores it once, at the submission boundary
- deeper core spans inherit naturally

That also lets us stop relying on env-based reparenting for this path,
which was too ambient and could override explicit ancestry.
2026-03-04 01:03:45 +00:00
sayan-oai
082682a628 feat: load plugin apps (#13401)
load plugin-apps from `.app.json`.

make apps runtime-mentionable iff `codex_apps` MCP actually exposes
tools for that `connector_id`.

if the app isn't available, it's filtered out of runtime connector set,
so no tools are added and no app-mentions resolve.

right now we don't have a clean cli-side error for an app not being
installed. can look at this after.

### Tests
Added tests, tested locally that using a plugin that bundles an app
picks up the app.
2026-03-03 16:29:15 -08:00
Charley Cunningham
299b8ac445 tui: align pending steers with core acceptance (#12868)
## Summary
- submit `Enter` steers immediately while a turn is already running
instead of routing them through `queued_user_messages`
- keep those submitted steers visible in the footer as `pending_steers`
until core records them as a user message or aborts the turn
- reconcile pending steers on `ItemCompleted(UserMessage)`, not
`RawResponseItem`
- emit user-message item lifecycle for leftover pending input at task
finish, then remove the TUI `TurnComplete` fallback
- keep `queued_user_messages` for actual queued drafts, rendered below
pending steers

## Problem
While the assistant was generating, pressing `Enter` could send the
input into `queued_user_messages`. That queue only drains after the turn
ends, so ordinary steers behaved like queued drafts instead of landing
at the next core sampling boundary.

The first version of this fix also used `RawResponseItem` to decide when
a steer had landed. Review feedback was that this is the wrong
abstraction for client behavior.

There was also a late edge case in core: if pending steer input was
accepted after the final sampling decision but before `TurnComplete`,
core would record that user message into history at task finish without
emitting `ItemStarted(UserMessage)` / `ItemCompleted(UserMessage)`. TUI
had a fallback to paper over that gap locally.

## Approach
- `Enter` during an active turn now submits a normal `Op::UserTurn`
immediately
- TUI keeps a local pending-steer preview instead of rendering that user
message into history immediately
- when core records the steer as `ItemCompleted(UserMessage)`, TUI
matches and removes the corresponding pending preview, then renders the
committed user message
- core now emits the same user-message lifecycle when
`on_task_finished(...)` drains leftover pending user input, before
`TurnComplete`
- with that lifecycle gap closed in core, TUI no longer needs to flush
pending steers into history on `TurnComplete`
- if the turn is interrupted, pending steers and queued drafts are both
restored into the composer, with pending steers first

## Notes
- `Tab` still uses the real queued-message path
- `queued_user_messages` and `pending_steers` are separate state with
separate semantics
- the pending-steer matching key is built directly from `UserInput`
- this removes the new TUI dependency on `RawResponseItem`

## Validation
- `just fmt`
- `cargo test -p codex-core
task_finish_emits_turn_item_lifecycle_for_leftover_pending_user_input --
--nocapture`
- `cargo test -p codex-tui`
2026-03-03 15:31:52 -08:00
pakrym-oai
69df12efb3 Remove Responses V1 websocket implementation (#13364)
V2 is the way to go!
2026-03-03 11:32:53 -07:00
jif-oai
8159f05dfd feat: wire spreadsheet artifact (#13362) 2026-03-03 15:27:37 +00:00
jif-oai
1df040e62b feat: add multi-actions to presentation tool (#13357) 2026-03-03 14:37:26 +00:00
jif-oai
4874b9291a feat: presentation artifact p1 (#13341)
Part 1 of presentation tool artifact
2026-03-03 11:38:03 +00:00
pash-openai
07e532dcb9 app-server service tier plumbing (plus some cleanup) (#13334)
followup to https://github.com/openai/codex/pull/13212 to expose fast
tier controls to app server
(majority of this PR is generated schema jsons - actual code is +69 /
-35 and +24 tests )

- add service tier fields to the app-server protocol surfaces used by
thread lifecycle, turn start, config, and session configured events
- thread service tier through the app-server message processor and core
thread config snapshots
- allow runtime config overrides to carry service tier for app-server
callers

cleanup:
- Removing useless "legacy" code supporting "standard" - we moved to
None | "fast", so "standard" is not needed.
2026-03-03 02:35:09 -08:00
pash-openai
2f5b01abd6 add fast mode toggle (#13212)
- add a local Fast mode setting in codex-core (similar to how model id
is currently stored on disk locally)
- send `service_tier=priority` on requests when Fast is enabled
- add `/fast` in the TUI and persist it locally
- feature flag
2026-03-02 20:29:33 -08:00
Ahmed Ibrahim
b20b6aa46f Update realtime websocket API (#13265)
- migrate the realtime websocket transport to the new session and
handoff flow
- make the realtime model configurable in config.toml and use API-key
auth for the websocket

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-02 16:05:40 -08:00
daveaitel-openai
c2e126f92a core: reuse parent shell snapshot for thread-spawn subagents (#13052)
## Summary
- reuse the parent shell snapshot when spawning/forking/resuming
`SessionSource::SubAgent(SubAgentSource::ThreadSpawn { .. })` sessions
- plumb inherited snapshot through `AgentControl -> ThreadManager ->
Codex::spawn -> SessionConfiguration`
- skip shell snapshot refresh on cwd updates for thread-spawn subagents
so inherited snapshots are not replaced

## Why
- avoids per-subagent shell snapshot creation and cleanup work
- keeps thread-spawn subagents on the parent snapshot path, matching the
intended parent/child snapshot model

## Validation
- `just fmt` (in `codex-rs`)
- `cargo test -p codex-core --no-run`
- `cargo test -p codex-core spawn_agent -- --nocapture`
- `cargo test -p codex-core --test all
suite::agent_jobs::spawn_agents_on_csv_runs_and_exports`

## Notes
- full `cargo test -p codex-core --test all` was left running separately
for broader verification

Co-authored-by: Codex <noreply@openai.com>
2026-03-02 15:53:15 +00:00
Ahmed Ibrahim
0aeb55bf08 Record realtime close marker on replacement (#13058)
## Summary
- record a realtime close developer message when a new realtime session
replaces an active one
- assert the replacement marker through the mocked responses request
path

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Charles Cunningham <ccunningham@openai.com>
2026-03-01 13:54:12 -08:00
xl-openai
752402c4fe feat: load from plugins (#12864)
Support loading plugins.

Plugins can now be enabled via [plugins.<name>] in config.toml. They are
loaded as first-class entities through PluginsManager, and their default
skills/ and .mcp.json contributions are integrated into the existing
skills and MCP flows.
2026-03-01 10:50:56 -08:00
daveaitel-openai
eec3b1e235 Speed up subagent startup (#12935)
## Summary
- skip online model refresh for subagent sessions
- avoid rollout flushes during subagent startup
- keep /models refresh for non-subagent sessions

## Testing
- cargo test -p codex-core --test all
suite::models_etag_responses::refresh_models_on_models_etag_mismatch_and_avoid_duplicate_models_fetch
- cargo test -p codex-core --test all
suite::remote_models::remote_models_long_model_slug_is_sent_with_high_reasoning
- cargo test -p codex-core --test all
suite::model_switching::model_switch_to_smaller_model_updates_token_context_window
- cargo test -p codex-core --test all
suite::compact::pre_sampling_compact_runs_on_switch_to_smaller_context_model
- cargo test -p codex-core --test all
suite::compact::pre_sampling_compact_runs_after_resume_and_switch_to_smaller_model
- cargo test -p codex-core --test all
suite::personality::remote_model_friendly_personality_instructions_with_feature

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-28 14:54:08 +01:00
Ruslan Nigmatullin
8c1e3f3e64 app-server: Add ephemeral field to Thread object (#13084)
Currently there is no alternative way to know that thread is ephemeral,
only client which did create it has the knowledge.
2026-02-27 17:42:25 -08:00
Charley Cunningham
695957a348 Unify rollout reconstruction with resume/fork TurnContext hydration (#12612)
## Summary

This PR unifies rollout history reconstruction and resume/fork metadata
hydration under a single `Session::reconstruct_history_from_rollout`
implementation.

The key change from main is that replay metadata now comes from the same
reconstruction pass that rebuilds model-visible history, instead of
doing a second bespoke rollout scan to recover `previous_model` /
`reference_context_item`.

## What Changed

### Unified reconstruction output

`reconstruct_history_from_rollout` now returns a single
`RolloutReconstruction` bundle containing:

- rebuilt `history`
- `previous_model`
- `reference_context_item`

Resume and fork both consume that shared output directly.

### Reverse replay core

The reconstruction logic moved into
`codex-rs/core/src/codex/rollout_reconstruction.rs` and now scans
rollout items newest-to-oldest.

That reverse pass:

- derives `previous_model`
- derives whether `reference_context_item` is preserved or cleared
- stops early once it has both resume metadata and a surviving
`replacement_history` checkpoint

History materialization is still bridged eagerly for now by replaying
only the surviving suffix forward, which keeps the history result stable
while moving the control flow toward the future lazy reverse loader
design.

### Removed bespoke context lookup

This deletes `last_rollout_regular_turn_context_lookup` and its separate
compaction-aware scan.

The previous model / baseline metadata is now computed from the same
replay state that rebuilds history, so resume/fork cannot drift from the
reconstructed transcript view.

### `TurnContextItem` persistence contract

`TurnContextItem` is now treated as the replay source of truth for
durable model-visible baselines.

This PR keeps the following contract explicit:

- persist `TurnContextItem` for the first real user turn so resume can
recover `previous_model`
- persist it for later turns that emit model-visible context updates
- if mid-turn compaction reinjects full initial context into replacement
history, persist a fresh `TurnContextItem` after `Compacted` so
resume/fork can re-establish the baseline from the rewritten history
- do not treat manual compaction or pre-sampling compaction as creating
a new durable baseline on their own

## Behavior Preserved

- rollback replay stays aligned with `drop_last_n_user_turns`
- rollback skips only user turns
- incomplete active user turns are dropped before older finalized turns
when rollback applies
- unmatched aborts do not consume the current active turn
- missing abort IDs still conservatively clear stale compaction state
- compaction clears `reference_context_item` until a later
`TurnContextItem` re-establishes it
- `previous_model` still comes from the newest surviving user turn that
established one

## Tests

Targeted validation run for the current branch shape:

- `cd codex-rs && cargo test -p codex-core --lib
codex::rollout_reconstruction_tests -- --nocapture`
- `cd codex-rs && just fmt`

The branch also extracts the rollout reconstruction tests into
`codex-rs/core/src/codex/rollout_reconstruction_tests.rs` so this logic
has a dedicated home instead of living inline in `codex.rs`.
2026-02-27 13:50:45 -08:00
jif-oai
a63d8bd569 feat: add use memories config (#12997) 2026-02-27 11:40:54 +01:00
Michael Bolin
e6cd75a684 notify: include client in legacy hook payload (#12968)
## Why

The `notify` hook payload did not identify which Codex client started
the turn. That meant downstream notification hooks could not distinguish
between completions coming from the TUI and completions coming from
app-server clients such as VS Code or Xcode. Now that the Codex App
provides its own desktop notifications, it would be nice to be able to
filter those out.

This change adds that context without changing the existing payload
shape for callers that do not know the client name, and keeps the new
end-to-end test cross-platform.

## What changed

- added an optional top-level `client` field to the legacy `notify` JSON
payload
- threaded that value through `core` and `hooks`; the internal session
and turn state now carries it as `app_server_client_name`
- set the field to `codex-tui` for TUI turns
- captured `initialize.clientInfo.name` in the app server and applied it
to subsequent turns before dispatching hooks
- replaced the notify integration test hook with a `python3` script so
the test does not rely on Unix shell permissions or `bash`
- documented the new field in `docs/config.md`

## Testing

- `cargo test -p codex-hooks`
- `cargo test -p codex-tui`
- `cargo test -p codex-app-server
suite::v2::initialize::turn_start_notify_payload_includes_initialize_client_name
-- --exact --nocapture`
- `cargo test -p codex-core` (`src/lib.rs` passed; `core/tests/all.rs`
still has unrelated existing failures in this environment)

## Docs

The public config reference on `developers.openai.com/codex` should
mention that the legacy `notify` payload may include a top-level
`client` field. The TUI reports `codex-tui`, and the app server reports
`initialize.clientInfo.name` when it is available.
2026-02-26 22:27:34 -08:00
Celia Chen
90cc4e79a2 feat: add local date/timezone to turn environment context (#12947)
## Summary

This PR includes the session's local date and timezone in the
model-visible environment context and persists that data in
`TurnContextItem`.

  ## What changed
- captures the current local date and IANA timezone when building a turn
context, with a UTC fallback if the timezone lookup fails
- includes current_date and timezone in the serialized
<environment_context> payload
- stores those fields on TurnContextItem so they survive rollout/history
handling, subagent review threads, and resume flows
- treats date/timezone changes as environment updates, so prompt caching
and context refresh logic do not silently reuse stale time context
- updates tests to validate the new environment fields without depending
on a single hardcoded environment-context string

## test

built a local build and saw it in the rollout file:
```
{"timestamp":"2026-02-26T21:39:50.737Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>\n  <shell>zsh</shell>\n  <current_date>2026-02-26</current_date>\n  <timezone>America/Los_Angeles</timezone>\n</environment_context>"}]}}
```
2026-02-26 23:17:35 +00:00
pakrym-oai
951a389654 Allow clients not to send summary as an option (#12950)
Summary is a required parameter on UserTurn. Ideally we'd like the core
to decide the appropriate summary level.

Make the summary optional and don't send it when not needed.
2026-02-26 14:37:38 -08:00
jif-oai
3404ecff15 feat: add post-compaction sub-agent infos (#12774)
Co-authored-by: Codex <noreply@openai.com>
2026-02-26 18:55:34 +00:00
pakrym-oai
ba41e84a50 Use model catalog default for reasoning summary fallback (#12873)
## Summary
- make `Config.model_reasoning_summary` optional so unset means use
model default
- resolve the optional config value to a concrete summary when building
`TurnContext`
- add protocol support for `default_reasoning_summary` in model metadata

## Validation
- `cargo test -p codex-core --lib client::tests -- --nocapture`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-26 09:31:13 -08:00
daveaitel-openai
79cbca324a Skip history metadata scan for subagents (#12918)
Summary
- Skip `history_metadata` scanning when spawning subagents to avoid
expensive per-spawn history scans.
- Keeps behavior unchanged for normal sessions.

Testing
  - `cd codex-rs && cargo test -p codex-core` 
- Failing in this environment (pre-existing and I don't think something
I did?):
- `suite::cli_stream::responses_mode_stream_cli` (SIGKILL + OTEL export
error to http://localhost:14318/v1/logs)
- `suite::grep_files::grep_files_tool_collects_matches` (unsupported
call: grep_files)
- `suite::grep_files::grep_files_tool_reports_empty_results`
(unsupported call: grep_files)

Co-authored-by: Codex <noreply@openai.com>
2026-02-26 16:21:26 +00:00
Charley Cunningham
07aefffb1f core: bundle settings diff updates into one dev/user envelope (#12417)
## Summary
- bundle contextual prompt injection into at most one developer message
plus one contextual user message in both:
  - per-turn settings updates
  - initial context insertion
- preserve `<model_switch>` across compaction by rebuilding it through
canonical initial-context injection, instead of relying on
strip/reattach hacks
- centralize contextual user fragment detection in one shared definition
table and reuse it for parsing/compaction logic
- keep `AGENTS.md` in its natural serialized format:
  - `# AGENTS.md instructions for {dirname}`
  - `<INSTRUCTIONS>...</INSTRUCTIONS>`
- simplify related tests/helpers and accept the expected snapshot/layout
updates from bundled multi-part messages

## Why
The goal is to converge toward a simpler, more intentional prompt shape
where contextual updates are consistently represented as one developer
envelope plus one contextual user envelope, while keeping parsing and
compaction behavior aligned with that representation.

## Notable details
- the temporary `SettingsUpdateEnvelope` wrapper was removed; these
paths now return `Vec<ResponseItem>` directly
- local/remote compaction no longer rely on model-switch strip/restore
helpers
- contextual user detection is now driven by shared fragment definitions
instead of ad hoc matcher assembly
- AGENTS/user instructions are still the same logical context; only the
synthetic `<user_instructions>` wrapper was replaced by the natural
AGENTS text format

## Testing
- `just fmt`
- `cargo test -p codex-app-server
codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages
-- --exact`
- `cargo test -p codex-core
compact::tests::collect_user_messages_filters_session_prefix_entries
--lib -- --exact`
- `cargo test -p codex-core --test all
'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch'
-- --exact`
- `cargo test -p codex-core --test all
'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch'
-- --exact`
- `cargo test -p codex-core --test all
'suite::client::includes_apps_guidance_as_developer_message_when_enabled'
-- --exact`
- `cargo test -p codex-core --test all
'suite::client::includes_developer_instructions_message_in_request' --
--exact`
- `cargo test -p codex-core --test all
'suite::client::includes_user_instructions_message_in_request' --
--exact`
- `cargo test -p codex-core --test all
'suite::client::resume_includes_initial_messages_and_sends_prior_items'
-- --exact`
- `cargo test -p codex-core --test all
'suite::review::review_input_isolated_from_parent_history' -- --exact`
- `cargo test -p codex-exec --test all
'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' --
--exact`
- `cargo test -p core_test_support
context_snapshot::tests::full_text_mode_preserves_unredacted_text --
--exact`

## Notes
- I also ran several targeted `compact`, `compact_remote`,
`prompt_caching`, `model_visible_layout`, and `event_mapping` tests
while iterating on prompt-shape changes.
- I have not claimed a clean full-workspace `cargo test` from this
environment because local sandbox/resource conditions have previously
produced unrelated failures in large workspace runs.
2026-02-26 00:12:08 -08:00
Curtis 'Fjord' Hawthorne
40ab71a985 Disable js_repl when Node is incompatible at startup (#12824)
## Summary
- validate `js_repl` Node compatibility during session startup when the
experiment is enabled
- if Node is missing or too old, disable `js_repl` and
`js_repl_tools_only` for the session before tools and instructions are
built
- surface that startup disablement to users through the existing startup
warning flow instead of only logging it
- reuse the same compatibility check in js_repl kernel startup so
startup gating and runtime behavior stay aligned
- add a regression test that verifies the warning is emitted and that
the first advertised tool list omits `js_repl` and `js_repl_reset` when
Node is incompatible

## Why
Today `js_repl` can be advertised based only on the feature flag, then
fail later when the kernel starts. That makes the available tool list
inaccurate at the start of a conversation, and users do not get a clear
explanation for why the tool is unavailable.

This change makes tool availability reflect real startup checks, keeps
the advertised tool set stable for the lifetime of the session, and
gives users a visible warning when `js_repl` is disabled.

## Testing
- `just fmt`
- `cargo test -p codex-core --test all
js_repl_is_not_advertised_when_startup_node_is_incompatible`
2026-02-26 01:14:51 +00:00