Commit Graph

82 Commits

Author SHA1 Message Date
starr-openai
be71b6fcd1 Use selected turn environments for runtime context (#20281)
## Summary
- make selected turn environments the source of truth for session
runtime cwd and MCP runtime environment selection
- keep local/no-selection fallback behavior intact
- add coverage for duplicate selected environments, cwd resolution, and
MCP runtime environment selection

## Validation
- git diff --check
- rustfmt was run on touched Rust files during the implementation
workflow

CI should provide the full Bazel/test signal.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-01 11:00:14 -07:00
Tom
fe05acad23 Make thread store process-scoped (#19474)
- Build one app-server process ThreadStore from startup config and share
it with ThreadManager and CodexMessageProcessor.
- Remove per-thread/fork store reconstruction so effective thread config
cannot switch the persistence backend.
- Add params to ThreadStore create/resume for specifying thread
metadata, since otherwise the metadata from store creation would be used
(incorrectly).
2026-04-30 21:24:59 -07:00
xl-openai
7b3de63041 Move plugin out of core. (#20348) 2026-04-30 14:26:14 -07:00
Tom
127be0612c [codex] Migrate thread turns list to thread store (#19280)
- migrate `thread/turns/list` to ThreadStore. Uses ThreadStore for most
data now but merges in the in-memory state from thread manager
- keep v2 `thread/list` pathless-store friendly by converting
`StoredThread` directly to API `Thread`
- add regression coverage for pathless store history/listing
2026-04-30 14:16:42 -07:00
pakrym-oai
fedcefe9da Reduce the surface of collaboration modes (#20149)
Collaboration modes were slightly invasive both into ThreadManager
construction and ModelProvider
2026-04-29 17:22:41 -07:00
pakrym-oai
8356806fc9 Add ThreadManager sample crate (#20141)
Summary:
- Add codex-thread-manager-sample, a one-shot binary that starts a
ThreadManager thread, submits a prompt, and prints the final assistant
output.
- Pass ThreadStore into ThreadManager::new and expose
thread_store_from_config for existing callsites.
- Build the sample Config directly with only --model and prompt inputs.

Verification:
- just fmt
- cargo check -p codex-thread-manager-sample -p codex-app-server -p
codex-mcp-server
- git diff --check

Tests: Not run per request.
2026-04-29 11:21:06 -07:00
jif-oai
431ebeaef7 feat: split memories part 2 (#19860)
Keep extracting memories out of core and moving the write trigger in the
app-server
This is temporary and it should move at the client level as a follow-up
This makes core fully independant from `codex-memories-write`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 13:03:28 +02:00
Eric Traut
4167628622 Add goal core runtime (4 / 5) (#18076)
Adds the core runtime behavior for active goals on top of the model
tools from PR 3.

## Why

A long-running goal should be a core runtime concern, not something
every client has to implement. Core owns the turn lifecycle, tool
completion boundaries, interruptions, resume behavior, and token usage,
so it is the right place to account progress, enforce budgets, and
decide when to continue work.

## What changed

- Centralized goal lifecycle side effects behind
`Session::goal_runtime_apply(GoalRuntimeEvent::...)`.
- Starts goal continuation turns only when the session is idle; pending
user input and mailbox work take priority.
- Accounts token and wall-clock usage at turn, tool, mutation,
interrupt, and resume boundaries; `get_thread_goal` remains read-only.
- Preserves sub-second wall-clock remainder across accounting boundaries
so long-running goals do not drift downward over time.
- Treats token budget exhaustion as a soft stop by marking the goal
`budget_limited` and injecting wrap-up steering instead of aborting the
active turn.
- Suppresses budget steering when `update_goal` marks a goal complete.
- Pauses active goals on interrupt and auto-reactivates paused goals
when a thread resumes outside plan mode.
- Suppresses repeated automatic continuation when a continuation turn
makes no tool calls.
- Added continuation and budget-limit prompt templates.

## Verification

- Added focused core coverage for continuation scheduling, accounting
boundaries, budget-limit steering, completion accounting, interrupt
pause behavior, resume auto-activation, and wall-clock remainder
accounting.
2026-04-24 21:16:00 -07:00
Tom
588f7a9fc4 [codex] add non-local thread store regression harness (#19266)
- Add an integration test that guarantees nothing gets written to codex
home dir or sqlite when running a rollout with a non-local ThreadStore
- Add an in-memory "spy" ThreadStore for tests like this

Note I could not find a good way to also ensure there were no filesystem
_reads_ that didn't go through threadstore. I explored a more elaborate
sandboxed-subprocess approach but it isn't platform portable and felt
like it wasn't (yet) worth it.
2026-04-24 15:45:44 -07:00
Tom
0a9b559c0b Migrate fork and resume reads to thread store (#18900)
- Route cold thread/resume and thread/fork source loading through
ThreadStore reads instead of direct rollout path operations
- Keep lookups that explicitly specify a rollout-path using the local
thread store methods but return an invalid-request error for remote
ThreadStore configurations
- Add some additional unit tests for code path coverage
2026-04-24 13:51:37 -07:00
jif-oai
28742866c7 Add agents.interrupt_message for interruption markers (#19351)
## Why

Agent interruptions currently always persist a model-visible
interrupted-turn marker before emitting `TurnAborted`. That marker is
useful by default because it gives the next model turn context about a
deliberately interrupted task, but some deployments need to suppress
that history injection entirely while still keeping the client-visible
interruption event.

## What changed

- Add `[agents] interrupt_message = false` to disable the model-visible
interrupted-turn marker.
- Resolve the setting into `Config::agent_interrupt_message_enabled`,
defaulting to `true` so existing behavior is unchanged.
- Apply the setting to both live interrupted turns and interrupted fork
snapshots.
- Keep emitting `TurnAborted` even when the history marker is disabled.
- Regenerate `core/config.schema.json` for the new
`agents.interrupt_message` field.

## Testing

- `cargo test -p codex-core load_config_resolves_agent_interrupt_message
-- --nocapture`
- `cargo test -p codex-core
disabled_interrupted_fork_snapshot_appends_only_interrupt_event --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_interrupted_marker_uses_developer_input_message --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_followup_task_can_disable_interrupted_marker --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message
-- --nocapture`
- `cargo check -p codex-core`
2026-04-24 16:02:45 +02:00
jif-oai
120aa07d81 Make MultiAgentV2 interruption markers assistant-authored (#19124)
## Why

`MultiAgentV2` follow-up messages are delivered to agents as
assistant-authored `InterAgentCommunication` envelopes. When
`followup_task` used `interrupt: true`, the interrupted-turn guidance
was still persisted as a contextual user message, so model-visible
history made a system-generated interruption boundary look
user-authored.

This keeps interruption guidance consistent with the rest of the v2
inter-agent message stream while preserving the legacy marker shape for
non-v2 sessions.

## What changed

- Make `interrupted_turn_history_marker` feature-aware.
- Record the interrupted-turn marker as an assistant `OutputText`
message when `Feature::MultiAgentV2` is enabled.
- Keep the existing user contextual fragment for non-v2 sessions.
- Apply the same feature-aware marker to interrupted fork snapshots.
- Add coverage for the live `followup_task` interrupt path and the
helper-level v2 marker shape.

## Testing

- `cargo test -p codex-core
multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message
-- --nocapture`
- `cargo test -p codex-core
multi_agent_v2_interrupted_marker_uses_assistant_output_message --
--nocapture`
- `cargo test -p codex-core interrupted_fork_snapshot -- --nocapture`
2026-04-24 13:39:26 +02:00
Celia Chen
e8d8080818 feat: let model providers own model discovery (#18950)
## Why

`codex-models-manager` had grown to own provider-specific concerns:
constructing OpenAI-compatible `/models` requests, resolving provider
auth, emitting request telemetry, and deciding how provider catalogs
should be sourced. That made the manager harder to reuse for providers
whose model catalog is not fetched from the OpenAI `/models` endpoint,
such as Amazon Bedrock.

This change moves provider-specific model discovery behind
provider-owned implementations, so the models manager can focus on
refresh policy, cache behavior, picker ordering, and model metadata
merging.

## What Changed

- Introduced a `ModelsManager` trait with separate `OpenAiModelsManager`
and `StaticModelsManager` implementations.
- Added `ModelsEndpointClient` so OpenAI-compatible HTTP fetching lives
outside `codex-models-manager`.
- Moved `/models` request construction, provider auth resolution,
timeout handling, and request telemetry into `codex-model-provider` via
`OpenAiModelsEndpoint`.
- Added provider-owned `models_manager(...)` construction so configured
OpenAI-compatible providers use `OpenAiModelsManager`, while
static/catalog-backed providers can return `StaticModelsManager`.
- Added an Amazon Bedrock static model catalog for the GPT OSS Bedrock
model IDs.
- Updated core/session/thread manager code and tests to depend on
`Arc<dyn ModelsManager>`.
- Moved offline model test helpers into
`codex_models_manager::test_support`.
## Metadata References

The Bedrock catalog metadata is based on the official Amazon Bedrock
OpenAI model documentation:

- [Amazon Bedrock OpenAI
models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-openai.html)
lists the Bedrock model IDs, text input/output modalities, and `128,000`
token context window for `gpt-oss-20b` and `gpt-oss-120b`.
- [Amazon Bedrock `gpt-oss-120b` model
card](https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-openai-gpt-oss-120b.html)
lists the `bedrock-runtime` model ID `openai.gpt-oss-120b-1:0`, the
`bedrock-mantle` model ID `openai.gpt-oss-120b`, text-only modalities,
and `128K` context window.
- [OpenAI `gpt-oss-120b` model
docs](https://developers.openai.com/api/docs/models/gpt-oss-120b)
document configurable reasoning effort with `low`, `medium`, and `high`,
plus text input/output modality.

The display names, default reasoning effort, and priority ordering are
Codex-local catalog choices.

## Test Plan
- Manually verified app-server model listing with an AWS profile:

```shell
CODEX_HOME="$(mktemp -d)" cargo run -p codex-app-server-test-client -- \
  --codex-bin ./target/debug/codex \
  -c 'model_provider="amazon-bedrock"' \
  -c 'model_providers.amazon-bedrock.aws.profile="codex-bedrock"' \
  -c 'model_providers.amazon-bedrock.aws.region="us-west-2"' \
  model-list
```

The response returned the Bedrock catalog with `openai.gpt-oss-120b-1:0`
as the default model and `openai.gpt-oss-20b-1:0` as the second listed
model, both text-only and supporting low/medium/high reasoning effort.
2026-04-24 04:28:25 +00:00
starr-openai
49fb25997f Add sticky environment API and thread state (#18897)
## Summary
- add sticky environment selections to app-server v2 thread/start and
turn/start request flow
- carry thread-level selections through core session/thread state
- add app-server coverage for sticky selections and turn overrides

## Stack
1. This PR: API and thread persistence
2. #18898: config.toml named environment loading
3. #18899: downstream tool/runtime consumers

## Validation
- Not run locally; split only.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-23 18:57:13 -07:00
cassirer-openai
e3c8720a99 [rollout_trace] Add debug trace reduction command (#18880)
## Summary

Adds the debug CLI entry point for reducing recorded rollout traces.
This gives developers a direct way to inspect whether the emitted trace
stream reduces into the expected conversation/runtime model.

## Stack

This is PR 5/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This PR is intentionally last: it depends on the trace crate, core
recorder, runtime/tool events, and session/agent edge data all existing.
The command should remain a debug/developer tool and avoid adding new
runtime behavior.

The useful review question is whether the CLI exposes the reducer in the
smallest practical way for local inspection without turning the debug
command into a supported user-facing workflow.
2026-04-24 01:56:48 +00:00
Tom
f1061d9d07 [codex] Implement remote thread store methods (#19008) 2026-04-23 17:49:28 +00:00
Tom
f1923a38b1 [codex] Route live thread writes through ThreadStore (#18882)
Begin migrating the thread write codepaths to ThreadStore.

This starts using ThreadStore inside of core session code, not only in
the app server code.

Rework the interfaces around thread recording/persistence. We're left
with the following:

* `ThreadManager`: owns the process-level registry of loaded threads and
handles cross-thread orchestration: start, resume, fork, lookup, remove,
and route ops to running CodexThreads.
* `CodexThread`: represents one loaded/running thread from the outside.
It is the handle app-server and callers use to submit ops, inspect
session metadata, and shut the thread down.
* `LiveThread`: session-owned persistence lifecycle handle for one
active thread. Core session code uses it to append rollout items,
materialize lazy persistence, flush, shutdown, discard init-failed
writers, and load that thread’s persisted history.
* `ThreadStore`: storage backend abstraction. It answers “how are
threads persisted, read, listed, updated, archived?” Local and remote
implementations live behind this trait.
* `LocalThreadStore`: local ThreadStore implementation. It owns the
file/sqlite-specific details and keeps RolloutRecorder as a local
implementation detail.

This is a few too many Thread abstractions for my liking, but they do
all represent different concepts / needs / layers.

Migration note: in places where the core code explicitly requires a
path, rather than a thread ID, throw an error if we're running with a
remote store.

Cover the new local live-writer lifecycle with focused tests and
preserve app-server thread-start behavior, including ephemeral pathless
sessions.
2026-04-23 10:17:09 -07:00
cassirer-openai
f67383bcba [rollout_trace] Record core session rollout traces (#18877)
## Summary

Wires rollout trace recording into `codex-core` session and turn
execution. This records the core model request/response, compaction, and
session lifecycle boundaries needed for replay without yet tracing every
nested runtime/tool boundary.

## Stack

This is PR 2/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This layer is the first live integration point. The important review
question is whether trace recording is isolated from normal session
behavior: trace failures should not become user-visible execution
failures, and recording should preserve the existing turn/session
lifecycle semantics.

The PR depends on the reducer/data model from the first stack entry and
only introduces the core recorder surface that later PRs use for richer
runtime and relationship events.
2026-04-22 17:00:48 +00:00
starr-openai
ddbe2536be Support multiple managed environments (#18401)
## Summary
- refactor EnvironmentManager to own keyed environments with
default/local lookup helpers
- keep remote exec-server client creation lazy until exec/fs use
- preserve disabled agent environment access separately from internal
local environment access

## Validation
- not run (per Codex worktree instruction to avoid tests/builds unless
requested)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-21 15:29:35 -07:00
Andrey Mishchenko
ab65fbbdd6 Add codex debug models to show model catalog (#18625) 2026-04-20 05:42:22 +00:00
pakrym-oai
71e4c6fa17 Move codex module under session (#18249)
## Summary
- rename the core codex module root to session/mod.rs without using
#[path]
- move the codex module directory and tests under core/src/session
- remove session/mod.rs reexports so call sites use explicit child
module paths

## Testing
- cargo test -p codex-core --lib
- cargo check -p codex-core --tests
- just fmt
- just fix -p codex-core
- git diff --check
2026-04-17 16:18:53 +00:00
pakrym-oai
96254a763a Make skill loading filesystem-aware (#17720)
Migrates skill loading to support reading repo skills from the remote
environment.
2026-04-14 15:40:40 -07:00
pakrym-oai
3b24a9a532 Refactor plugin loading to async (#17747)
Simplifies skills migration.
2026-04-13 21:52:56 -07:00
pakrym-oai
ac82443d07 Use AbsolutePathBuf in skill loading and codex_home (#17407)
Helps with FS migration later
2026-04-13 10:26:51 -07:00
Abhinav
7999b0f60f Support clear SessionStart source (#17073)
## Motivation

The `SessionStart` hook already receives `startup` and `resume` sources,
but sessions created from `/clear` previously looked like normal startup
sessions. This makes it impossible for hook authors to distinguish
between these with the matcher.

## Summary

- Add `InitialHistory::Cleared` so `/clear`-created sessions can be
distinguished from ordinary startup sessions.
- Add `SessionStartSource::Clear` and wire it through core, app-server
thread start params, and TUI clear-session flow.
- Update app-server protocol schemas, generated TypeScript, docs, and
related tests.


https://github.com/user-attachments/assets/9cae3cb4-41c7-4d06-b34f-966252442e5c
2026-04-10 16:05:21 -07:00
rhan-oai
5779be314a [codex-analytics] add compaction analytics event (#17155)
- event for compaction analytics
- introduces thread-connection and thread metadata caches for data
denormalization, expected to be useful for denormalization onto core
emitted events in general
- threads analytics event client into core (mirrors approved
implementation in #16640)
- denormalizes key thread metadata: thread_source, subagent_source,
parent_thread_id, as well as app-server client and runtime metadata)
- compaction strategy defaults to memento, forward compatible with
expected prefill_compaction strategy

1. Manual standalone compact, local
`INFO | 2026-04-09 17:35:50 | codex_backend.routers.analytics_events |
analytics_events.track_analytics_events:526 | Tracked
codex_compaction_event event params={'thread_id':
'019d74d0-5cfb-70c0-bef9-165c3bf9b2df', 'turn_id':
'019d74d0-d7f6-7c81-acc6-aae2030243d6', 'product_surface': 'codex',
'app_server_client': {'product_client_id': 'CODEX_CLI', 'client_name':
'codex-tui', 'client_version': '0.0.0', 'rpc_transport': 'in_process',
'experimental_api_enabled': True}, 'runtime': {'codex_rs_version':
'0.0.0', 'runtime_os': 'macos', 'runtime_os_version': '26.4.0',
'runtime_arch': 'aarch64'}, 'trigger': 'manual', 'reason':
'user_requested', 'implementation': 'responses', 'phase':
'standalone_turn', 'strategy': 'memento', 'status': 'completed',
'active_context_tokens_before': 20170, 'active_context_tokens_after':
4830, 'started_at': 1775781337, 'completed_at': 1775781350,
'thread_source': 'user', 'subagent_source': None, 'parent_thread_id':
None, 'error': None, 'duration_ms': 13524} | `

2. Auto pre-turn compact, local
`INFO | 2026-04-09 17:37:30 | codex_backend.routers.analytics_events |
analytics_events.track_analytics_events:526 | Tracked
codex_compaction_event event params={'thread_id':
'019d74d2-45ef-71d1-9c93-23cc0c13d988', 'turn_id':
'019d74d2-7b42-7372-9f0e-c0da3f352328', 'product_surface': 'codex',
'app_server_client': {'product_client_id': 'CODEX_CLI', 'client_name':
'codex-tui', 'client_version': '0.0.0', 'rpc_transport': 'in_process',
'experimental_api_enabled': True}, 'runtime': {'codex_rs_version':
'0.0.0', 'runtime_os': 'macos', 'runtime_os_version': '26.4.0',
'runtime_arch': 'aarch64'}, 'trigger': 'auto', 'reason':
'context_limit', 'implementation': 'responses', 'phase': 'pre_turn',
'strategy': 'memento', 'status': 'completed',
'active_context_tokens_before': 20063, 'active_context_tokens_after':
4822, 'started_at': 1775781444, 'completed_at': 1775781449,
'thread_source': 'user', 'subagent_source': None, 'parent_thread_id':
None, 'error': None, 'duration_ms': 5497} | `

3. Auto mid-turn compact, local
`INFO | 2026-04-09 17:38:28 | codex_backend.routers.analytics_events |
analytics_events.track_analytics_events:526 | Tracked
codex_compaction_event event params={'thread_id':
'019d74d3-212f-7a20-8c0a-4816a978675e', 'turn_id':
'019d74d3-3ee1-7462-89f6-2ffbeefcd5e3', 'product_surface': 'codex',
'app_server_client': {'product_client_id': 'CODEX_CLI', 'client_name':
'codex-tui', 'client_version': '0.0.0', 'rpc_transport': 'in_process',
'experimental_api_enabled': True}, 'runtime': {'codex_rs_version':
'0.0.0', 'runtime_os': 'macos', 'runtime_os_version': '26.4.0',
'runtime_arch': 'aarch64'}, 'trigger': 'auto', 'reason':
'context_limit', 'implementation': 'responses', 'phase': 'mid_turn',
'strategy': 'memento', 'status': 'completed',
'active_context_tokens_before': 20325, 'active_context_tokens_after':
14641, 'started_at': 1775781500, 'completed_at': 1775781508,
'thread_source': 'user', 'subagent_source': None, 'parent_thread_id':
None, 'error': None, 'duration_ms': 7507} | `

4. Remote /responses/compact, manual standalone
`INFO | 2026-04-09 17:40:20 | codex_backend.routers.analytics_events |
analytics_events.track_analytics_events:526 | Tracked
codex_compaction_event event params={'thread_id':
'019d74d4-7a11-78a1-89f7-0535a1149416', 'turn_id':
'019d74d4-e087-7183-9c20-b1e40b7578c0', 'product_surface': 'codex',
'app_server_client': {'product_client_id': 'CODEX_CLI', 'client_name':
'codex-tui', 'client_version': '0.0.0', 'rpc_transport': 'in_process',
'experimental_api_enabled': True}, 'runtime': {'codex_rs_version':
'0.0.0', 'runtime_os': 'macos', 'runtime_os_version': '26.4.0',
'runtime_arch': 'aarch64'}, 'trigger': 'manual', 'reason':
'user_requested', 'implementation': 'responses_compact', 'phase':
'standalone_turn', 'strategy': 'memento', 'status': 'completed',
'active_context_tokens_before': 23461, 'active_context_tokens_after':
6171, 'started_at': 1775781601, 'completed_at': 1775781620,
'thread_source': 'user', 'subagent_source': None, 'parent_thread_id':
None, 'error': None, 'duration_ms': 18971} | `
2026-04-10 13:03:54 -07:00
jif-oai
89f1a44afa feat: /feedback cascade (#16442)
Example here:
https://openai.sentry.io/issues/7380240430/?project=4510195390611458&query=019d498f-bec4-7ba2-96d2-612b1e4507df&referrer=issue-stream
2026-04-07 12:47:37 +01:00
rhan-oai
756c45ec61 [codex-analytics] add protocol-native turn timestamps (#16638)
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/16638).
* #16870
* #16706
* #16659
* #16641
* #16640
* __->__ #16638
2026-04-06 16:22:59 -07:00
Ahmed Ibrahim
af8a9d2d2b remove temporary ownership re-exports (#16626)
Stacked on #16508.

This removes the temporary `codex-core` / `codex-login` re-export shims
from the ownership split and rewrites callsites to import directly from
`codex-model-provider-info`, `codex-models-manager`, `codex-api`,
`codex-protocol`, `codex-feedback`, and `codex-response-debug-context`.

No behavior change intended; this is the mechanical import cleanup layer
split out from the ownership move.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-03 00:33:34 -07:00
Michael Bolin
aa2403e2eb core: remove cross-crate re-exports from lib.rs (#16512)
## Why

`codex-core` was re-exporting APIs owned by sibling `codex-*` crates,
which made downstream crates depend on `codex-core` as a proxy module
instead of the actual owner crate.

Removing those forwards makes crate boundaries explicit and lets leaf
crates drop unnecessary `codex-core` dependencies. In this PR, this
reduces the dependency on `codex-core` to `codex-login` in the following
files:

```
codex-rs/backend-client/Cargo.toml
codex-rs/mcp-server/tests/common/Cargo.toml
```

## What

- Remove `codex-rs/core/src/lib.rs` re-exports for symbols owned by
`codex-login`, `codex-mcp`, `codex-rollout`, `codex-analytics`,
`codex-protocol`, `codex-shell-command`, `codex-sandboxing`,
`codex-tools`, and `codex-utils-path`.
- Delete the `default_client` forwarding shim in `codex-rs/core`.
- Update in-crate and downstream callsites to import directly from the
owning `codex-*` crate.
- Add direct Cargo dependencies where callsites now target the owner
crate, and remove `codex-core` from `codex-rs/backend-client`.
2026-04-01 23:06:24 -07:00
pakrym-oai
8fa88fa8ca Add cached environment manager for exec server URL (#15785)
Add environment manager that is a singleton and is created early in
app-server (before skill manager, before config loading).

Use an environment variable to point to a running exec server.
2026-03-25 16:14:36 -07:00
Ahmed Ibrahim
9dbe098349 Extract codex-core-skills crate (#15749)
## Summary
- move skill loading and management into codex-core-skills
- leave codex-core with the thin integration layer and shared wiring

## Testing
- CI

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-25 12:57:42 -07:00
Ruslan Nigmatullin
daf5e584c2 core: Make FileWatcher reusable (#15093)
### Summary
Make `FileWatcher` a reusable core component which can be built upon.
Extract skills-related logic into a separate `SkillWatcher`.
Introduce a composable `ThrottledWatchReceiver` to throttle filesystem
events, coalescing affected paths among them.

### Testing
Updated existing unit tests.
2026-03-24 11:04:47 -07:00
Charley Cunningham
f547b79bd0 Add fork snapshot modes (#15239)
## Summary
- add `ForkSnapshotMode` to `ThreadManager::fork_thread` so callers can
request either a committed snapshot or an interrupted snapshot
- share the model-visible `<turn_aborted>` history marker between the
live interrupt path and interrupted forks
- update the small set of direct fork callsites to pass
`ForkSnapshotMode::Committed`

Note: this enables /btw to work similarly as Esc to interrupt (hopefully
somewhat in distribution)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-23 19:05:42 -07:00
jif-oai
18f1a08bc9 feat: new op type for sub-agents communication (#15556)
Add `InterAgentCommunication` for v2 agent communication
2026-03-23 21:09:00 +00:00
jif-oai
37ac0c093c feat: structured multi-agent output (#15515)
Send input now sends messages as assistant message and with this format:

```
author: /root/worker_a
recipient: /root/worker_a/tester
other_recipients: []
Content: bla bla bla. Actual content. Only text for now
```
2026-03-23 18:53:54 +00:00
xl-openai
db5781a088 feat: support product-scoped plugins. (#15041)
1. Added SessionSource::Custom(String) and --session-source.
  2. Enforced plugin and skill products by session_source.
  3. Applied the same filtering to curated background refresh.
2026-03-19 00:46:15 -07:00
alexsong-oai
825d09373d Support featured plugins (#15042) 2026-03-18 17:45:30 -07:00
Dylan Hurd
84f4e7b39d fix(subagents) share execpolicy by default (#13702)
## Summary
If a subagent requests approval, and the user persists that approval to
the execpolicy, it should (by default) propagate. We'll need to rethink
this a bit in light of coming Permissions changes, though I think this
is closer to the end state that we'd want, which is that execpolicy
changes to one permissions profile should be synced across threads.

## Testing
- [x] Added integration test

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-18 06:42:26 +00:00
Ahmed Ibrahim
b02388672f Stabilize Windows cmd-based shell test harnesses (#14958)
## What is flaky
The Windows shell-driven integration tests in `codex-rs/core` were
intermittently unstable, especially:

- `apply_patch_cli_can_use_shell_command_output_as_patch_input`
- `websocket_test_codex_shell_chain`
- `websocket_v2_test_codex_shell_chain`

## Why it was flaky
These tests were exercising real shell-tool flows through whichever
shell Codex selected on Windows, and the `apply_patch` test also nested
a PowerShell read inside `cmd /c`.

There were multiple independent sources of nondeterminism in that setup:

- The test harness depended on the model-selected Windows shell instead
of pinning the shell it actually meant to exercise.
- `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI
that could leave the read command wrapped as a literal string instead of
executing it.
- Even after getting the quoting right, PowerShell could emit CLIXML
progress records like module-initialization output onto stdout.
- The `apply_patch` test was building a patch directly from shell
stdout, so any quoting artifact or progress noise corrupted the patch
input.

So the failures were driven by shell startup and output-shape variance,
not by the `apply_patch` or websocket logic themselves.

## How this PR fixes it
- Add a test-only `user_shell_override` path so Windows integration
tests can pin `cmd.exe` explicitly.
- Use that override in the websocket shell-chain tests and in the
`apply_patch` harness.
- Change the nested Windows file read in
`apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8
PowerShell `-EncodedCommand` script.
- Run that nested PowerShell process with `-NonInteractive`, set
`$ProgressPreference = 'SilentlyContinue'`, and read the file with
`[System.IO.File]::ReadAllText(...)`.

## Why this fix fixes the flakiness
The outer harness now runs under a deterministic shell, and the inner
PowerShell read no longer depends on fragile `cmd` quoting or on
progress output staying quiet by accident. The shell tool returns only
the file contents, so patch construction and websocket assertions depend
on stable test inputs instead of on runner-specific shell behavior.

---------

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-17 20:21:46 +00:00
Michael Bolin
b77fe8fefe Apply argument comment lint across codex-rs (#14652)
## Why

Once the repo-local lint exists, `codex-rs` needs to follow the
checked-in convention and CI needs to keep it from drifting. This commit
applies the fallback `/*param*/` style consistently across existing
positional literal call sites without changing those APIs.

The longer-term preference is still to avoid APIs that require comments
by choosing clearer parameter types and call shapes. This PR is
intentionally the mechanical follow-through for the places where the
existing signatures stay in place.

After rebasing onto newer `main`, the rollout also had to cover newly
introduced `tui_app_server` call sites. That made it clear the first cut
of the CI job was too expensive for the common path: it was spending
almost as much time installing `cargo-dylint` and re-testing the lint
crate as a representative test job spends running product tests. The CI
update keeps the full workspace enforcement but trims that extra
overhead from ordinary `codex-rs` PRs.

## What changed

- keep a dedicated `argument_comment_lint` job in `rust-ci`
- mechanically annotate remaining opaque positional literals across
`codex-rs` with exact `/*param*/` comments, including the rebased
`tui_app_server` call sites that now fall under the lint
- keep the checked-in style aligned with the lint policy by using
`/*param*/` and leaving string and char literals uncommented
- cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
registry/git metadata in the lint job
- split changed-path detection so the lint crate's own `cargo test` step
runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
- continue to run the repo wrapper over the `codex-rs` workspace, so
product-code enforcement is unchanged

Most of the code changes in this commit are intentionally mechanical
comment rewrites or insertions driven by the lint itself.

## Verification

- `./tools/argument-comment-lint/run.sh --workspace`
- `cargo test -p codex-tui-app-server -p codex-tui`
- parsed `.github/workflows/rust-ci.yml` locally with PyYAML

---

* -> #14652
* #14651
2026-03-16 16:48:15 -07:00
Eric Traut
4b9d5c8c1b Add openai_base_url config override for built-in provider (#12031)
We regularly get bug reports from users who mistakenly have the
`OPENAI_BASE_URL` environment variable set. This PR deprecates this
environment variable in favor of a top-level config key
`openai_base_url` that is used for the same purpose. By making it a
config key, it will be more visible to users. It will also participate
in all of the infrastructure we've added for layered and managed
configs.

Summary
- introduce the `openai_base_url` top-level config key, update
schema/tests, and route the built-in openai provider through it while
- fall back to deprecated `OPENAI_BASE_URL` env var but warn user of
deprecation when no `openai_base_url` config key is present
- update CLI, SDK, and TUI code to prefer the new config path (with a
deprecated env-var fallback) and document the SDK behavior change
2026-03-13 20:12:25 -06:00
Michael Bolin
0c8a36676a fix: move inline codex-rs/core unit tests into sibling files (#14444)
## Why
PR #13783 moved the `codex.rs` unit tests into `codex_tests.rs`. This
applies the same extraction pattern across the rest of `codex-rs/core`
so the production modules stay focused on runtime code instead of large
inline test blocks.

Keeping the tests in sibling files also makes follow-up edits easier to
review because product changes no longer have to share a file with
hundreds or thousands of lines of test scaffolding.

## What changed
- replaced each inline `mod tests { ... }` in `codex-rs/core/src/**`
with a path-based module declaration
- moved each extracted unit test module into a sibling `*_tests.rs`
file, using `mod_tests.rs` for `mod.rs` modules
- preserved the existing `cfg(...)` guards and module-local structure so
the refactor remains structural rather than behavioral

## Testing
- `cargo test -p codex-core --lib` (`1653 passed; 0 failed; 5 ignored`)
- `just fix -p codex-core`
- `cargo fmt --check`
- `cargo shear`
2026-03-12 08:16:36 -07:00
Owen Lin
5bc82c5b93 feat(app-server): propagate traces across tasks and core ops (#14387)
## Summary

This PR keeps app-server RPC request trace context alive for the full
lifetime of the work that request kicks off (e.g. for `thread/start`,
this is `app-server rpc handler -> tokio background task -> core op
submissions`). Previously we lose trace lineage once the request handler
returns or hands work off to background tasks.

This approach is especially relevant for `thread/start` and other RPC
handlers that run in a non-blocking way. In the near future we'll most
likely want to make all app-server handlers run in a non-blocking way by
default, and only queue operations that must operate in order (e.g.
thread RPCs per thread?), so we want to make sure tracing in app-server
just generally works.

Depends on https://github.com/openai/codex/pull/14300

**Before**
<img width="155" height="207" alt="image"
src="https://github.com/user-attachments/assets/c9487459-36f1-436c-beb7-fafeb40737af"
/>


**After**
<img width="299" height="337" alt="image"
src="https://github.com/user-attachments/assets/727392b2-d072-4427-9dc4-0502d8652dea"
/>

## What changed

- Keep request-scoped trace context around until we send the final
response or error, or the connection closes.
- Thread that trace context through detached `thread/start` work so
background startup stays attached to the originating request.
- Pass request trace context through to downstream core operations,
including:
  - thread creation
  - resume/fork flows
  - turn submission
  - review
  - interrupt
  - realtime conversation operations
- Add tracing tests that verify:
  - remote W3C trace context is preserved for `thread/start`
  - remote W3C trace context is preserved for `turn/start`
  - downstream core spans stay under the originating request span
  - request-scoped tracing state is cleaned up correctly
- Clean up shutdown behavior so detached background tasks and spawned
threads are drained before process exit.
2026-03-11 20:18:31 -07:00
Anton Panasenko
77b0c75267 feat: search_tool migrate to bring you own tool of Responses API (#14274)
## Why

to support a new bring your own search tool in Responses
API(https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search)
we migrating our bm25 search tool to use official way to execute search
on client and communicate additional tools to the model.

## What
- replace the legacy `search_tool_bm25` flow with client-executed
`tool_search`
- add protocol, SSE, history, and normalization support for
`tool_search_call` and `tool_search_output`
- return namespaced Codex Apps search results and wire namespaced
follow-up tool calls back into MCP dispatch
2026-03-11 17:51:51 -07:00
xl-openai
0c33af7746 feat: support disabling bundled system skills (#13792)
Support disable bundled system skills with a config:

[skills.bundled]
enabled = false
2026-03-09 22:02:53 -07:00
Michael Bolin
7134220f3c core: box wrapper futures to reduce stack pressure (#13429)
Follow-up to [#13388](https://github.com/openai/codex/pull/13388). This
uses the same general fix pattern as
[#12421](https://github.com/openai/codex/pull/12421), but in the
`codex-core` compact/resume/fork path.

## Why

`compact_resume_after_second_compaction_preserves_history` started
overflowing the stack on Windows CI after `#13388`.

The important part is that this was not a compaction-recursion bug. The
test exercises a path with several thin `async fn` wrappers around much
larger thread-spawn, resume, and fork futures. When one `async fn`
awaits another inline, the outer future stores the callee future as part
of its own state machine. In a long wrapper chain, that means a caller
can accidentally inline a lot more state than the source code suggests.

That is exactly what was happening here:

- `ThreadManager` convenience methods such as `start_thread`,
`resume_thread_from_rollout`, and `fork_thread` were inlining the larger
spawn/resume futures beneath them.
- `core_test_support::test_codex` added another wrapper layer on top of
those same paths.
- `compact_resume_fork` adds a few more helpers, and this particular
test drives the resume/fork path multiple times.

On Windows, that was enough to push both the libtest thread and Tokio
worker threads over the edge. The previous 8 MiB test-thread workaround
proved the failure was stack-related, but it did not address the
underlying future size.

## How This Was Debugged

The useful debugging pattern here was to turn the CI-only failure into a
local low-stack repro.

1. First, remove the explicit large-stack harness so the test runs on
the normal `#[tokio::test]` path.
2. Build the test binary normally.
3. Re-run the already-built `tests/all` binary directly with
progressively smaller `RUST_MIN_STACK` values.

Running the built binary directly matters: it keeps the reduced stack
size focused on the test process instead of also applying it to `cargo`
and `rustc`.

That made it possible to answer two questions quickly:

- Does the failure still reproduce without the workaround? Yes.
- Does boxing the wrapper futures actually buy back stack headroom? Also
yes.

After this change, the built test binary passes with
`RUST_MIN_STACK=917504` and still overflows at `786432`, which is enough
evidence to justify removing the explicit 8 MiB override while keeping a
deterministic low-stack repro for future debugging.

If we hit a similar issue again, the first places to inspect are thin
`async fn` wrappers that mostly forward into a much larger async
implementation.

## `Box::pin()` Primer

`async fn` compiles into a state machine. If a wrapper does this:

```rust
async fn wrapper() {
    inner().await;
}
```

then `wrapper()` stores the full `inner()` future inline as part of its
own state.

If the wrapper instead does this:

```rust
async fn wrapper() {
    Box::pin(inner()).await;
}
```

then the child future lives on the heap, and the outer future only
stores a pinned pointer to it. That usually trades one allocation for a
substantially smaller outer future, which is exactly the tradeoff we
want when the problem is stack pressure rather than raw CPU time.

Useful references:

-
[`Box::pin`](https://doc.rust-lang.org/std/boxed/struct.Box.html#method.pin)
- [Async book:
Pinning](https://rust-lang.github.io/async-book/04_pinning/01_chapter.html)

## What Changed

- Boxed the wrapper futures in `core/src/thread_manager.rs` around
`start_thread`, `resume_thread_from_rollout`, `fork_thread`, and the
corresponding `ThreadManagerState` spawn helpers so callers no longer
inline the full spawn/resume state machine through multiple layers.
- Boxed the matching test-only wrapper futures in
`core/tests/common/test_codex.rs` and
`core/tests/suite/compact_resume_fork.rs`, which sit directly on top of
the same path.
- Restored `compact_resume_after_second_compaction_preserves_history` in
`core/tests/suite/compact_resume_fork.rs` to a normal `#[tokio::test]`
and removed the explicit `TEST_STACK_SIZE_BYTES` thread/runtime sizing.
- Simplified a tiny helper in `compact_resume_fork` by making
`fetch_conversation_path()` synchronous, which removes one more
unnecessary future layer from the test path.

## Verification

- `cargo test -p codex-core --test all
suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
-- --exact --nocapture`
- `cargo test -p codex-core --test all suite::compact_resume_fork --
--nocapture`
- Re-ran the built `codex-core` `tests/all` binary directly with reduced
stack sizes:
  - `RUST_MIN_STACK=917504` passes
  - `RUST_MIN_STACK=786432` still overflows
- `cargo test -p codex-core`
- Still fails locally in unrelated existing integration areas that
expect the `codex` / `test_stdio_server` binaries or hit the existing
`search_tool` wiremock mismatches.
2026-03-04 05:44:52 +00:00
daveaitel-openai
c2e126f92a core: reuse parent shell snapshot for thread-spawn subagents (#13052)
## Summary
- reuse the parent shell snapshot when spawning/forking/resuming
`SessionSource::SubAgent(SubAgentSource::ThreadSpawn { .. })` sessions
- plumb inherited snapshot through `AgentControl -> ThreadManager ->
Codex::spawn -> SessionConfiguration`
- skip shell snapshot refresh on cwd updates for thread-spawn subagents
so inherited snapshots are not replaced

## Why
- avoids per-subagent shell snapshot creation and cleanup work
- keeps thread-spawn subagents on the parent snapshot path, matching the
intended parent/child snapshot model

## Validation
- `just fmt` (in `codex-rs`)
- `cargo test -p codex-core --no-run`
- `cargo test -p codex-core spawn_agent -- --nocapture`
- `cargo test -p codex-core --test all
suite::agent_jobs::spawn_agents_on_csv_runs_and_exports`

## Notes
- full `cargo test -p codex-core --test all` was left running separately
for broader verification

Co-authored-by: Codex <noreply@openai.com>
2026-03-02 15:53:15 +00:00
Ahmed Ibrahim
0aeb55bf08 Record realtime close marker on replacement (#13058)
## Summary
- record a realtime close developer message when a new realtime session
replaces an active one
- assert the replacement marker through the mocked responses request
path

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Charles Cunningham <ccunningham@openai.com>
2026-03-01 13:54:12 -08:00
xl-openai
752402c4fe feat: load from plugins (#12864)
Support loading plugins.

Plugins can now be enabled via [plugins.<name>] in config.toml. They are
loaded as first-class entities through PluginsManager, and their default
skills/ and .mcp.json contributions are integrated into the existing
skills and MCP flows.
2026-03-01 10:50:56 -08:00