codex

mirror of https://github.com/openai/codex.git synced 2026-05-20 03:05:02 +00:00

Author	SHA1	Message	Date
starr-openai	be71b6fcd1	Use selected turn environments for runtime context (#20281 ) ## Summary - make selected turn environments the source of truth for session runtime cwd and MCP runtime environment selection - keep local/no-selection fallback behavior intact - add coverage for duplicate selected environments, cwd resolution, and MCP runtime environment selection ## Validation - git diff --check - rustfmt was run on touched Rust files during the implementation workflow CI should provide the full Bazel/test signal. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-01 11:00:14 -07:00
Tom	fe05acad23	Make thread store process-scoped (#19474 ) - Build one app-server process ThreadStore from startup config and share it with ThreadManager and CodexMessageProcessor. - Remove per-thread/fork store reconstruction so effective thread config cannot switch the persistence backend. - Add params to ThreadStore create/resume for specifying thread metadata, since otherwise the metadata from store creation would be used (incorrectly).	2026-04-30 21:24:59 -07:00
xl-openai	7b3de63041	Move plugin out of core. (#20348 )	2026-04-30 14:26:14 -07:00
Tom	127be0612c	[codex] Migrate thread turns list to thread store (#19280 ) - migrate `thread/turns/list` to ThreadStore. Uses ThreadStore for most data now but merges in the in-memory state from thread manager - keep v2 `thread/list` pathless-store friendly by converting `StoredThread` directly to API `Thread` - add regression coverage for pathless store history/listing	2026-04-30 14:16:42 -07:00
pakrym-oai	fedcefe9da	Reduce the surface of collaboration modes (#20149 ) Collaboration modes were slightly invasive both into ThreadManager construction and ModelProvider	2026-04-29 17:22:41 -07:00
pakrym-oai	8356806fc9	Add ThreadManager sample crate (#20141 ) Summary: - Add codex-thread-manager-sample, a one-shot binary that starts a ThreadManager thread, submits a prompt, and prints the final assistant output. - Pass ThreadStore into ThreadManager::new and expose thread_store_from_config for existing callsites. - Build the sample Config directly with only --model and prompt inputs. Verification: - just fmt - cargo check -p codex-thread-manager-sample -p codex-app-server -p codex-mcp-server - git diff --check Tests: Not run per request.	2026-04-29 11:21:06 -07:00
jif-oai	431ebeaef7	feat: split memories part 2 (#19860 ) Keep extracting memories out of core and moving the write trigger in the app-server This is temporary and it should move at the client level as a follow-up This makes core fully independant from `codex-memories-write` --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-28 13:03:28 +02:00
Eric Traut	4167628622	Add goal core runtime (4 / 5) (#18076 ) Adds the core runtime behavior for active goals on top of the model tools from PR 3. ## Why A long-running goal should be a core runtime concern, not something every client has to implement. Core owns the turn lifecycle, tool completion boundaries, interruptions, resume behavior, and token usage, so it is the right place to account progress, enforce budgets, and decide when to continue work. ## What changed - Centralized goal lifecycle side effects behind `Session::goal_runtime_apply(GoalRuntimeEvent::...)`. - Starts goal continuation turns only when the session is idle; pending user input and mailbox work take priority. - Accounts token and wall-clock usage at turn, tool, mutation, interrupt, and resume boundaries; `get_thread_goal` remains read-only. - Preserves sub-second wall-clock remainder across accounting boundaries so long-running goals do not drift downward over time. - Treats token budget exhaustion as a soft stop by marking the goal `budget_limited` and injecting wrap-up steering instead of aborting the active turn. - Suppresses budget steering when `update_goal` marks a goal complete. - Pauses active goals on interrupt and auto-reactivates paused goals when a thread resumes outside plan mode. - Suppresses repeated automatic continuation when a continuation turn makes no tool calls. - Added continuation and budget-limit prompt templates. ## Verification - Added focused core coverage for continuation scheduling, accounting boundaries, budget-limit steering, completion accounting, interrupt pause behavior, resume auto-activation, and wall-clock remainder accounting.	2026-04-24 21:16:00 -07:00
Tom	588f7a9fc4	[codex] add non-local thread store regression harness (#19266 ) - Add an integration test that guarantees nothing gets written to codex home dir or sqlite when running a rollout with a non-local ThreadStore - Add an in-memory "spy" ThreadStore for tests like this Note I could not find a good way to also ensure there were no filesystem _reads_ that didn't go through threadstore. I explored a more elaborate sandboxed-subprocess approach but it isn't platform portable and felt like it wasn't (yet) worth it.	2026-04-24 15:45:44 -07:00
Tom	0a9b559c0b	Migrate fork and resume reads to thread store (#18900 ) - Route cold thread/resume and thread/fork source loading through ThreadStore reads instead of direct rollout path operations - Keep lookups that explicitly specify a rollout-path using the local thread store methods but return an invalid-request error for remote ThreadStore configurations - Add some additional unit tests for code path coverage	2026-04-24 13:51:37 -07:00
jif-oai	28742866c7	Add agents.interrupt_message for interruption markers (#19351 ) ## Why Agent interruptions currently always persist a model-visible interrupted-turn marker before emitting `TurnAborted`. That marker is useful by default because it gives the next model turn context about a deliberately interrupted task, but some deployments need to suppress that history injection entirely while still keeping the client-visible interruption event. ## What changed - Add `[agents] interrupt_message = false` to disable the model-visible interrupted-turn marker. - Resolve the setting into `Config::agent_interrupt_message_enabled`, defaulting to `true` so existing behavior is unchanged. - Apply the setting to both live interrupted turns and interrupted fork snapshots. - Keep emitting `TurnAborted` even when the history marker is disabled. - Regenerate `core/config.schema.json` for the new `agents.interrupt_message` field. ## Testing - `cargo test -p codex-core load_config_resolves_agent_interrupt_message -- --nocapture` - `cargo test -p codex-core disabled_interrupted_fork_snapshot_appends_only_interrupt_event -- --nocapture` - `cargo test -p codex-core multi_agent_v2_interrupted_marker_uses_developer_input_message -- --nocapture` - `cargo test -p codex-core multi_agent_v2_followup_task_can_disable_interrupted_marker -- --nocapture` - `cargo test -p codex-core multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message -- --nocapture` - `cargo check -p codex-core`	2026-04-24 16:02:45 +02:00
jif-oai	120aa07d81	Make MultiAgentV2 interruption markers assistant-authored (#19124 ) ## Why `MultiAgentV2` follow-up messages are delivered to agents as assistant-authored `InterAgentCommunication` envelopes. When `followup_task` used `interrupt: true`, the interrupted-turn guidance was still persisted as a contextual user message, so model-visible history made a system-generated interruption boundary look user-authored. This keeps interruption guidance consistent with the rest of the v2 inter-agent message stream while preserving the legacy marker shape for non-v2 sessions. ## What changed - Make `interrupted_turn_history_marker` feature-aware. - Record the interrupted-turn marker as an assistant `OutputText` message when `Feature::MultiAgentV2` is enabled. - Keep the existing user contextual fragment for non-v2 sessions. - Apply the same feature-aware marker to interrupted fork snapshots. - Add coverage for the live `followup_task` interrupt path and the helper-level v2 marker shape. ## Testing - `cargo test -p codex-core multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message -- --nocapture` - `cargo test -p codex-core multi_agent_v2_interrupted_marker_uses_assistant_output_message -- --nocapture` - `cargo test -p codex-core interrupted_fork_snapshot -- --nocapture`	2026-04-24 13:39:26 +02:00
Celia Chen	e8d8080818	feat: let model providers own model discovery (#18950 ) ## Why `codex-models-manager` had grown to own provider-specific concerns: constructing OpenAI-compatible `/models` requests, resolving provider auth, emitting request telemetry, and deciding how provider catalogs should be sourced. That made the manager harder to reuse for providers whose model catalog is not fetched from the OpenAI `/models` endpoint, such as Amazon Bedrock. This change moves provider-specific model discovery behind provider-owned implementations, so the models manager can focus on refresh policy, cache behavior, picker ordering, and model metadata merging. ## What Changed - Introduced a `ModelsManager` trait with separate `OpenAiModelsManager` and `StaticModelsManager` implementations. - Added `ModelsEndpointClient` so OpenAI-compatible HTTP fetching lives outside `codex-models-manager`. - Moved `/models` request construction, provider auth resolution, timeout handling, and request telemetry into `codex-model-provider` via `OpenAiModelsEndpoint`. - Added provider-owned `models_manager(...)` construction so configured OpenAI-compatible providers use `OpenAiModelsManager`, while static/catalog-backed providers can return `StaticModelsManager`. - Added an Amazon Bedrock static model catalog for the GPT OSS Bedrock model IDs. - Updated core/session/thread manager code and tests to depend on `Arc<dyn ModelsManager>`. - Moved offline model test helpers into `codex_models_manager::test_support`. ## Metadata References The Bedrock catalog metadata is based on the official Amazon Bedrock OpenAI model documentation: - [Amazon Bedrock OpenAI models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-openai.html) lists the Bedrock model IDs, text input/output modalities, and `128,000` token context window for `gpt-oss-20b` and `gpt-oss-120b`. - [Amazon Bedrock `gpt-oss-120b` model card](https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-openai-gpt-oss-120b.html) lists the `bedrock-runtime` model ID `openai.gpt-oss-120b-1:0`, the `bedrock-mantle` model ID `openai.gpt-oss-120b`, text-only modalities, and `128K` context window. - [OpenAI `gpt-oss-120b` model docs](https://developers.openai.com/api/docs/models/gpt-oss-120b) document configurable reasoning effort with `low`, `medium`, and `high`, plus text input/output modality. The display names, default reasoning effort, and priority ordering are Codex-local catalog choices. ## Test Plan - Manually verified app-server model listing with an AWS profile: ```shell CODEX_HOME="$(mktemp -d)" cargo run -p codex-app-server-test-client -- \ --codex-bin ./target/debug/codex \ -c 'model_provider="amazon-bedrock"' \ -c 'model_providers.amazon-bedrock.aws.profile="codex-bedrock"' \ -c 'model_providers.amazon-bedrock.aws.region="us-west-2"' \ model-list ``` The response returned the Bedrock catalog with `openai.gpt-oss-120b-1:0` as the default model and `openai.gpt-oss-20b-1:0` as the second listed model, both text-only and supporting low/medium/high reasoning effort.	2026-04-24 04:28:25 +00:00
starr-openai	49fb25997f	Add sticky environment API and thread state (#18897 ) ## Summary - add sticky environment selections to app-server v2 thread/start and turn/start request flow - carry thread-level selections through core session/thread state - add app-server coverage for sticky selections and turn overrides ## Stack 1. This PR: API and thread persistence 2. #18898: config.toml named environment loading 3. #18899: downstream tool/runtime consumers ## Validation - Not run locally; split only. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-23 18:57:13 -07:00
cassirer-openai	e3c8720a99	[rollout_trace] Add debug trace reduction command (#18880 ) ## Summary Adds the debug CLI entry point for reducing recorded rollout traces. This gives developers a direct way to inspect whether the emitted trace stream reduces into the expected conversation/runtime model. ## Stack This is PR 5/5 in the rollout trace stack. - [#18876](https://github.com/openai/codex/pull/18876): Add rollout trace crate - [#18877](https://github.com/openai/codex/pull/18877): Record core session rollout traces - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and code-mode boundaries - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions and multi-agent edges - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace reduction command ## Review Notes This PR is intentionally last: it depends on the trace crate, core recorder, runtime/tool events, and session/agent edge data all existing. The command should remain a debug/developer tool and avoid adding new runtime behavior. The useful review question is whether the CLI exposes the reducer in the smallest practical way for local inspection without turning the debug command into a supported user-facing workflow.	2026-04-24 01:56:48 +00:00
Tom	f1061d9d07	[codex] Implement remote thread store methods (#19008 )	2026-04-23 17:49:28 +00:00
Tom	f1923a38b1	[codex] Route live thread writes through ThreadStore (#18882 ) Begin migrating the thread write codepaths to ThreadStore. This starts using ThreadStore inside of core session code, not only in the app server code. Rework the interfaces around thread recording/persistence. We're left with the following: * `ThreadManager`: owns the process-level registry of loaded threads and handles cross-thread orchestration: start, resume, fork, lookup, remove, and route ops to running CodexThreads. * `CodexThread`: represents one loaded/running thread from the outside. It is the handle app-server and callers use to submit ops, inspect session metadata, and shut the thread down. * `LiveThread`: session-owned persistence lifecycle handle for one active thread. Core session code uses it to append rollout items, materialize lazy persistence, flush, shutdown, discard init-failed writers, and load that thread’s persisted history. * `ThreadStore`: storage backend abstraction. It answers “how are threads persisted, read, listed, updated, archived?” Local and remote implementations live behind this trait. * `LocalThreadStore`: local ThreadStore implementation. It owns the file/sqlite-specific details and keeps RolloutRecorder as a local implementation detail. This is a few too many Thread abstractions for my liking, but they do all represent different concepts / needs / layers. Migration note: in places where the core code explicitly requires a path, rather than a thread ID, throw an error if we're running with a remote store. Cover the new local live-writer lifecycle with focused tests and preserve app-server thread-start behavior, including ephemeral pathless sessions.	2026-04-23 10:17:09 -07:00
cassirer-openai	f67383bcba	[rollout_trace] Record core session rollout traces (#18877 ) ## Summary Wires rollout trace recording into `codex-core` session and turn execution. This records the core model request/response, compaction, and session lifecycle boundaries needed for replay without yet tracing every nested runtime/tool boundary. ## Stack This is PR 2/5 in the rollout trace stack. - [#18876](https://github.com/openai/codex/pull/18876): Add rollout trace crate - [#18877](https://github.com/openai/codex/pull/18877): Record core session rollout traces - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and code-mode boundaries - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions and multi-agent edges - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace reduction command ## Review Notes This layer is the first live integration point. The important review question is whether trace recording is isolated from normal session behavior: trace failures should not become user-visible execution failures, and recording should preserve the existing turn/session lifecycle semantics. The PR depends on the reducer/data model from the first stack entry and only introduces the core recorder surface that later PRs use for richer runtime and relationship events.	2026-04-22 17:00:48 +00:00
starr-openai	ddbe2536be	Support multiple managed environments (#18401 ) ## Summary - refactor EnvironmentManager to own keyed environments with default/local lookup helpers - keep remote exec-server client creation lazy until exec/fs use - preserve disabled agent environment access separately from internal local environment access ## Validation - not run (per Codex worktree instruction to avoid tests/builds unless requested) --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-21 15:29:35 -07:00
Andrey Mishchenko	ab65fbbdd6	Add `codex debug models` to show model catalog (#18625 )	2026-04-20 05:42:22 +00:00
pakrym-oai	71e4c6fa17	Move codex module under session (#18249 ) ## Summary - rename the core codex module root to session/mod.rs without using #[path] - move the codex module directory and tests under core/src/session - remove session/mod.rs reexports so call sites use explicit child module paths ## Testing - cargo test -p codex-core --lib - cargo check -p codex-core --tests - just fmt - just fix -p codex-core - git diff --check	2026-04-17 16:18:53 +00:00
pakrym-oai	96254a763a	Make skill loading filesystem-aware (#17720 ) Migrates skill loading to support reading repo skills from the remote environment.	2026-04-14 15:40:40 -07:00
pakrym-oai	3b24a9a532	Refactor plugin loading to async (#17747 ) Simplifies skills migration.	2026-04-13 21:52:56 -07:00
pakrym-oai	ac82443d07	Use AbsolutePathBuf in skill loading and codex_home (#17407 ) Helps with FS migration later	2026-04-13 10:26:51 -07:00
Abhinav	7999b0f60f	Support clear SessionStart source (#17073 ) ## Motivation The `SessionStart` hook already receives `startup` and `resume` sources, but sessions created from `/clear` previously looked like normal startup sessions. This makes it impossible for hook authors to distinguish between these with the matcher. ## Summary - Add `InitialHistory::Cleared` so `/clear`-created sessions can be distinguished from ordinary startup sessions. - Add `SessionStartSource::Clear` and wire it through core, app-server thread start params, and TUI clear-session flow. - Update app-server protocol schemas, generated TypeScript, docs, and related tests. https://github.com/user-attachments/assets/9cae3cb4-41c7-4d06-b34f-966252442e5c	2026-04-10 16:05:21 -07:00
rhan-oai	5779be314a	[codex-analytics] add compaction analytics event (#17155 ) - event for compaction analytics - introduces thread-connection and thread metadata caches for data denormalization, expected to be useful for denormalization onto core emitted events in general - threads analytics event client into core (mirrors approved implementation in #16640) - denormalizes key thread metadata: thread_source, subagent_source, parent_thread_id, as well as app-server client and runtime metadata) - compaction strategy defaults to memento, forward compatible with expected prefill_compaction strategy 1. Manual standalone compact, local `INFO \| 2026-04-09 17:35:50 \| codex_backend.routers.analytics_events \| analytics_events.track_analytics_events:526 \| Tracked codex_compaction_event event params={'thread_id': '019d74d0-5cfb-70c0-bef9-165c3bf9b2df', 'turn_id': '019d74d0-d7f6-7c81-acc6-aae2030243d6', 'product_surface': 'codex', 'app_server_client': {'product_client_id': 'CODEX_CLI', 'client_name': 'codex-tui', 'client_version': '0.0.0', 'rpc_transport': 'in_process', 'experimental_api_enabled': True}, 'runtime': {'codex_rs_version': '0.0.0', 'runtime_os': 'macos', 'runtime_os_version': '26.4.0', 'runtime_arch': 'aarch64'}, 'trigger': 'manual', 'reason': 'user_requested', 'implementation': 'responses', 'phase': 'standalone_turn', 'strategy': 'memento', 'status': 'completed', 'active_context_tokens_before': 20170, 'active_context_tokens_after': 4830, 'started_at': 1775781337, 'completed_at': 1775781350, 'thread_source': 'user', 'subagent_source': None, 'parent_thread_id': None, 'error': None, 'duration_ms': 13524} \| ` 2. Auto pre-turn compact, local `INFO \| 2026-04-09 17:37:30 \| codex_backend.routers.analytics_events \| analytics_events.track_analytics_events:526 \| Tracked codex_compaction_event event params={'thread_id': '019d74d2-45ef-71d1-9c93-23cc0c13d988', 'turn_id': '019d74d2-7b42-7372-9f0e-c0da3f352328', 'product_surface': 'codex', 'app_server_client': {'product_client_id': 'CODEX_CLI', 'client_name': 'codex-tui', 'client_version': '0.0.0', 'rpc_transport': 'in_process', 'experimental_api_enabled': True}, 'runtime': {'codex_rs_version': '0.0.0', 'runtime_os': 'macos', 'runtime_os_version': '26.4.0', 'runtime_arch': 'aarch64'}, 'trigger': 'auto', 'reason': 'context_limit', 'implementation': 'responses', 'phase': 'pre_turn', 'strategy': 'memento', 'status': 'completed', 'active_context_tokens_before': 20063, 'active_context_tokens_after': 4822, 'started_at': 1775781444, 'completed_at': 1775781449, 'thread_source': 'user', 'subagent_source': None, 'parent_thread_id': None, 'error': None, 'duration_ms': 5497} \| ` 3. Auto mid-turn compact, local `INFO \| 2026-04-09 17:38:28 \| codex_backend.routers.analytics_events \| analytics_events.track_analytics_events:526 \| Tracked codex_compaction_event event params={'thread_id': '019d74d3-212f-7a20-8c0a-4816a978675e', 'turn_id': '019d74d3-3ee1-7462-89f6-2ffbeefcd5e3', 'product_surface': 'codex', 'app_server_client': {'product_client_id': 'CODEX_CLI', 'client_name': 'codex-tui', 'client_version': '0.0.0', 'rpc_transport': 'in_process', 'experimental_api_enabled': True}, 'runtime': {'codex_rs_version': '0.0.0', 'runtime_os': 'macos', 'runtime_os_version': '26.4.0', 'runtime_arch': 'aarch64'}, 'trigger': 'auto', 'reason': 'context_limit', 'implementation': 'responses', 'phase': 'mid_turn', 'strategy': 'memento', 'status': 'completed', 'active_context_tokens_before': 20325, 'active_context_tokens_after': 14641, 'started_at': 1775781500, 'completed_at': 1775781508, 'thread_source': 'user', 'subagent_source': None, 'parent_thread_id': None, 'error': None, 'duration_ms': 7507} \| ` 4. Remote /responses/compact, manual standalone `INFO \| 2026-04-09 17:40:20 \| codex_backend.routers.analytics_events \| analytics_events.track_analytics_events:526 \| Tracked codex_compaction_event event params={'thread_id': '019d74d4-7a11-78a1-89f7-0535a1149416', 'turn_id': '019d74d4-e087-7183-9c20-b1e40b7578c0', 'product_surface': 'codex', 'app_server_client': {'product_client_id': 'CODEX_CLI', 'client_name': 'codex-tui', 'client_version': '0.0.0', 'rpc_transport': 'in_process', 'experimental_api_enabled': True}, 'runtime': {'codex_rs_version': '0.0.0', 'runtime_os': 'macos', 'runtime_os_version': '26.4.0', 'runtime_arch': 'aarch64'}, 'trigger': 'manual', 'reason': 'user_requested', 'implementation': 'responses_compact', 'phase': 'standalone_turn', 'strategy': 'memento', 'status': 'completed', 'active_context_tokens_before': 23461, 'active_context_tokens_after': 6171, 'started_at': 1775781601, 'completed_at': 1775781620, 'thread_source': 'user', 'subagent_source': None, 'parent_thread_id': None, 'error': None, 'duration_ms': 18971} \| `	2026-04-10 13:03:54 -07:00
jif-oai	89f1a44afa	feat: /feedback cascade (#16442 ) Example here: https://openai.sentry.io/issues/7380240430/?project=4510195390611458&query=019d498f-bec4-7ba2-96d2-612b1e4507df&referrer=issue-stream	2026-04-07 12:47:37 +01:00
rhan-oai	756c45ec61	[codex-analytics] add protocol-native turn timestamps (#16638 ) --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/16638). * #16870 * #16706 * #16659 * #16641 * #16640 * __->__ #16638	2026-04-06 16:22:59 -07:00
Ahmed Ibrahim	af8a9d2d2b	remove temporary ownership re-exports (#16626 ) Stacked on #16508. This removes the temporary `codex-core` / `codex-login` re-export shims from the ownership split and rewrites callsites to import directly from `codex-model-provider-info`, `codex-models-manager`, `codex-api`, `codex-protocol`, `codex-feedback`, and `codex-response-debug-context`. No behavior change intended; this is the mechanical import cleanup layer split out from the ownership move. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-03 00:33:34 -07:00
Michael Bolin	aa2403e2eb	core: remove cross-crate re-exports from lib.rs (#16512 ) ## Why `codex-core` was re-exporting APIs owned by sibling `codex-` crates, which made downstream crates depend on `codex-core` as a proxy module instead of the actual owner crate. Removing those forwards makes crate boundaries explicit and lets leaf crates drop unnecessary `codex-core` dependencies. In this PR, this reduces the dependency on `codex-core` to `codex-login` in the following files: ``` codex-rs/backend-client/Cargo.toml codex-rs/mcp-server/tests/common/Cargo.toml ``` ## What - Remove `codex-rs/core/src/lib.rs` re-exports for symbols owned by `codex-login`, `codex-mcp`, `codex-rollout`, `codex-analytics`, `codex-protocol`, `codex-shell-command`, `codex-sandboxing`, `codex-tools`, and `codex-utils-path`. - Delete the `default_client` forwarding shim in `codex-rs/core`. - Update in-crate and downstream callsites to import directly from the owning `codex-` crate. - Add direct Cargo dependencies where callsites now target the owner crate, and remove `codex-core` from `codex-rs/backend-client`.	2026-04-01 23:06:24 -07:00
pakrym-oai	8fa88fa8ca	Add cached environment manager for exec server URL (#15785 ) Add environment manager that is a singleton and is created early in app-server (before skill manager, before config loading). Use an environment variable to point to a running exec server.	2026-03-25 16:14:36 -07:00
Ahmed Ibrahim	9dbe098349	Extract codex-core-skills crate (#15749 ) ## Summary - move skill loading and management into codex-core-skills - leave codex-core with the thin integration layer and shared wiring ## Testing - CI --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-25 12:57:42 -07:00
Ruslan Nigmatullin	daf5e584c2	core: Make FileWatcher reusable (#15093 ) ### Summary Make `FileWatcher` a reusable core component which can be built upon. Extract skills-related logic into a separate `SkillWatcher`. Introduce a composable `ThrottledWatchReceiver` to throttle filesystem events, coalescing affected paths among them. ### Testing Updated existing unit tests.	2026-03-24 11:04:47 -07:00
Charley Cunningham	f547b79bd0	Add fork snapshot modes (#15239 ) ## Summary - add `ForkSnapshotMode` to `ThreadManager::fork_thread` so callers can request either a committed snapshot or an interrupted snapshot - share the model-visible `<turn_aborted>` history marker between the live interrupt path and interrupted forks - update the small set of direct fork callsites to pass `ForkSnapshotMode::Committed` Note: this enables /btw to work similarly as Esc to interrupt (hopefully somewhat in distribution) --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-23 19:05:42 -07:00
jif-oai	18f1a08bc9	feat: new op type for sub-agents communication (#15556 ) Add `InterAgentCommunication` for v2 agent communication	2026-03-23 21:09:00 +00:00
jif-oai	37ac0c093c	feat: structured multi-agent output (#15515 ) Send input now sends messages as assistant message and with this format: ``` author: /root/worker_a recipient: /root/worker_a/tester other_recipients: [] Content: bla bla bla. Actual content. Only text for now ```	2026-03-23 18:53:54 +00:00
xl-openai	db5781a088	feat: support product-scoped plugins. (#15041 ) 1. Added SessionSource::Custom(String) and --session-source. 2. Enforced plugin and skill products by session_source. 3. Applied the same filtering to curated background refresh.	2026-03-19 00:46:15 -07:00
alexsong-oai	825d09373d	Support featured plugins (#15042 )	2026-03-18 17:45:30 -07:00
Dylan Hurd	84f4e7b39d	fix(subagents) share execpolicy by default (#13702 ) ## Summary If a subagent requests approval, and the user persists that approval to the execpolicy, it should (by default) propagate. We'll need to rethink this a bit in light of coming Permissions changes, though I think this is closer to the end state that we'd want, which is that execpolicy changes to one permissions profile should be synced across threads. ## Testing - [x] Added integration test --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-18 06:42:26 +00:00
Ahmed Ibrahim	b02388672f	Stabilize Windows cmd-based shell test harnesses (#14958 ) ## What is flaky The Windows shell-driven integration tests in `codex-rs/core` were intermittently unstable, especially: - `apply_patch_cli_can_use_shell_command_output_as_patch_input` - `websocket_test_codex_shell_chain` - `websocket_v2_test_codex_shell_chain` ## Why it was flaky These tests were exercising real shell-tool flows through whichever shell Codex selected on Windows, and the `apply_patch` test also nested a PowerShell read inside `cmd /c`. There were multiple independent sources of nondeterminism in that setup: - The test harness depended on the model-selected Windows shell instead of pinning the shell it actually meant to exercise. - `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI that could leave the read command wrapped as a literal string instead of executing it. - Even after getting the quoting right, PowerShell could emit CLIXML progress records like module-initialization output onto stdout. - The `apply_patch` test was building a patch directly from shell stdout, so any quoting artifact or progress noise corrupted the patch input. So the failures were driven by shell startup and output-shape variance, not by the `apply_patch` or websocket logic themselves. ## How this PR fixes it - Add a test-only `user_shell_override` path so Windows integration tests can pin `cmd.exe` explicitly. - Use that override in the websocket shell-chain tests and in the `apply_patch` harness. - Change the nested Windows file read in `apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8 PowerShell `-EncodedCommand` script. - Run that nested PowerShell process with `-NonInteractive`, set `$ProgressPreference = 'SilentlyContinue'`, and read the file with `[System.IO.File]::ReadAllText(...)`. ## Why this fix fixes the flakiness The outer harness now runs under a deterministic shell, and the inner PowerShell read no longer depends on fragile `cmd` quoting or on progress output staying quiet by accident. The shell tool returns only the file contents, so patch construction and websocket assertions depend on stable test inputs instead of on runner-specific shell behavior. --------- Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 20:21:46 +00:00
Michael Bolin	b77fe8fefe	Apply argument comment lint across codex-rs (#14652 ) ## Why Once the repo-local lint exists, `codex-rs` needs to follow the checked-in convention and CI needs to keep it from drifting. This commit applies the fallback `/param/` style consistently across existing positional literal call sites without changing those APIs. The longer-term preference is still to avoid APIs that require comments by choosing clearer parameter types and call shapes. This PR is intentionally the mechanical follow-through for the places where the existing signatures stay in place. After rebasing onto newer `main`, the rollout also had to cover newly introduced `tui_app_server` call sites. That made it clear the first cut of the CI job was too expensive for the common path: it was spending almost as much time installing `cargo-dylint` and re-testing the lint crate as a representative test job spends running product tests. The CI update keeps the full workspace enforcement but trims that extra overhead from ordinary `codex-rs` PRs. ## What changed - keep a dedicated `argument_comment_lint` job in `rust-ci` - mechanically annotate remaining opaque positional literals across `codex-rs` with exact `/param/` comments, including the rebased `tui_app_server` call sites that now fall under the lint - keep the checked-in style aligned with the lint policy by using `/param/` and leaving string and char literals uncommented - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo registry/git metadata in the lint job - split changed-path detection so the lint crate's own `cargo test` step runs only when `tools/argument-comment-lint/` or `rust-ci.yml` changes - continue to run the repo wrapper over the `codex-rs` workspace, so product-code enforcement is unchanged Most of the code changes in this commit are intentionally mechanical comment rewrites or insertions driven by the lint itself. ## Verification - `./tools/argument-comment-lint/run.sh --workspace` - `cargo test -p codex-tui-app-server -p codex-tui` - parsed `.github/workflows/rust-ci.yml` locally with PyYAML --- -> #14652 * #14651	2026-03-16 16:48:15 -07:00
Eric Traut	4b9d5c8c1b	Add openai_base_url config override for built-in provider (#12031 ) We regularly get bug reports from users who mistakenly have the `OPENAI_BASE_URL` environment variable set. This PR deprecates this environment variable in favor of a top-level config key `openai_base_url` that is used for the same purpose. By making it a config key, it will be more visible to users. It will also participate in all of the infrastructure we've added for layered and managed configs. Summary - introduce the `openai_base_url` top-level config key, update schema/tests, and route the built-in openai provider through it while - fall back to deprecated `OPENAI_BASE_URL` env var but warn user of deprecation when no `openai_base_url` config key is present - update CLI, SDK, and TUI code to prefer the new config path (with a deprecated env-var fallback) and document the SDK behavior change	2026-03-13 20:12:25 -06:00
Michael Bolin	0c8a36676a	fix: move inline codex-rs/core unit tests into sibling files (#14444 ) ## Why PR #13783 moved the `codex.rs` unit tests into `codex_tests.rs`. This applies the same extraction pattern across the rest of `codex-rs/core` so the production modules stay focused on runtime code instead of large inline test blocks. Keeping the tests in sibling files also makes follow-up edits easier to review because product changes no longer have to share a file with hundreds or thousands of lines of test scaffolding. ## What changed - replaced each inline `mod tests { ... }` in `codex-rs/core/src/*` with a path-based module declaration - moved each extracted unit test module into a sibling `_tests.rs` file, using `mod_tests.rs` for `mod.rs` modules - preserved the existing `cfg(...)` guards and module-local structure so the refactor remains structural rather than behavioral ## Testing - `cargo test -p codex-core --lib` (`1653 passed; 0 failed; 5 ignored`) - `just fix -p codex-core` - `cargo fmt --check` - `cargo shear`	2026-03-12 08:16:36 -07:00
Owen Lin	5bc82c5b93	feat(app-server): propagate traces across tasks and core ops (#14387 ) ## Summary This PR keeps app-server RPC request trace context alive for the full lifetime of the work that request kicks off (e.g. for `thread/start`, this is `app-server rpc handler -> tokio background task -> core op submissions`). Previously we lose trace lineage once the request handler returns or hands work off to background tasks. This approach is especially relevant for `thread/start` and other RPC handlers that run in a non-blocking way. In the near future we'll most likely want to make all app-server handlers run in a non-blocking way by default, and only queue operations that must operate in order (e.g. thread RPCs per thread?), so we want to make sure tracing in app-server just generally works. Depends on https://github.com/openai/codex/pull/14300 Before <img width="155" height="207" alt="image" src="https://github.com/user-attachments/assets/c9487459-36f1-436c-beb7-fafeb40737af" /> After <img width="299" height="337" alt="image" src="https://github.com/user-attachments/assets/727392b2-d072-4427-9dc4-0502d8652dea" /> ## What changed - Keep request-scoped trace context around until we send the final response or error, or the connection closes. - Thread that trace context through detached `thread/start` work so background startup stays attached to the originating request. - Pass request trace context through to downstream core operations, including: - thread creation - resume/fork flows - turn submission - review - interrupt - realtime conversation operations - Add tracing tests that verify: - remote W3C trace context is preserved for `thread/start` - remote W3C trace context is preserved for `turn/start` - downstream core spans stay under the originating request span - request-scoped tracing state is cleaned up correctly - Clean up shutdown behavior so detached background tasks and spawned threads are drained before process exit.	2026-03-11 20:18:31 -07:00
Anton Panasenko	77b0c75267	feat: search_tool migrate to bring you own tool of Responses API (#14274 ) ## Why to support a new bring your own search tool in Responses API(https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search) we migrating our bm25 search tool to use official way to execute search on client and communicate additional tools to the model. ## What - replace the legacy `search_tool_bm25` flow with client-executed `tool_search` - add protocol, SSE, history, and normalization support for `tool_search_call` and `tool_search_output` - return namespaced Codex Apps search results and wire namespaced follow-up tool calls back into MCP dispatch	2026-03-11 17:51:51 -07:00
xl-openai	0c33af7746	feat: support disabling bundled system skills (#13792 ) Support disable bundled system skills with a config: [skills.bundled] enabled = false	2026-03-09 22:02:53 -07:00
Michael Bolin	7134220f3c	core: box wrapper futures to reduce stack pressure (#13429 ) Follow-up to [#13388](https://github.com/openai/codex/pull/13388). This uses the same general fix pattern as [#12421](https://github.com/openai/codex/pull/12421), but in the `codex-core` compact/resume/fork path. ## Why `compact_resume_after_second_compaction_preserves_history` started overflowing the stack on Windows CI after `#13388`. The important part is that this was not a compaction-recursion bug. The test exercises a path with several thin `async fn` wrappers around much larger thread-spawn, resume, and fork futures. When one `async fn` awaits another inline, the outer future stores the callee future as part of its own state machine. In a long wrapper chain, that means a caller can accidentally inline a lot more state than the source code suggests. That is exactly what was happening here: - `ThreadManager` convenience methods such as `start_thread`, `resume_thread_from_rollout`, and `fork_thread` were inlining the larger spawn/resume futures beneath them. - `core_test_support::test_codex` added another wrapper layer on top of those same paths. - `compact_resume_fork` adds a few more helpers, and this particular test drives the resume/fork path multiple times. On Windows, that was enough to push both the libtest thread and Tokio worker threads over the edge. The previous 8 MiB test-thread workaround proved the failure was stack-related, but it did not address the underlying future size. ## How This Was Debugged The useful debugging pattern here was to turn the CI-only failure into a local low-stack repro. 1. First, remove the explicit large-stack harness so the test runs on the normal `#[tokio::test]` path. 2. Build the test binary normally. 3. Re-run the already-built `tests/all` binary directly with progressively smaller `RUST_MIN_STACK` values. Running the built binary directly matters: it keeps the reduced stack size focused on the test process instead of also applying it to `cargo` and `rustc`. That made it possible to answer two questions quickly: - Does the failure still reproduce without the workaround? Yes. - Does boxing the wrapper futures actually buy back stack headroom? Also yes. After this change, the built test binary passes with `RUST_MIN_STACK=917504` and still overflows at `786432`, which is enough evidence to justify removing the explicit 8 MiB override while keeping a deterministic low-stack repro for future debugging. If we hit a similar issue again, the first places to inspect are thin `async fn` wrappers that mostly forward into a much larger async implementation. ## `Box::pin()` Primer `async fn` compiles into a state machine. If a wrapper does this: ```rust async fn wrapper() { inner().await; } ``` then `wrapper()` stores the full `inner()` future inline as part of its own state. If the wrapper instead does this: ```rust async fn wrapper() { Box::pin(inner()).await; } ``` then the child future lives on the heap, and the outer future only stores a pinned pointer to it. That usually trades one allocation for a substantially smaller outer future, which is exactly the tradeoff we want when the problem is stack pressure rather than raw CPU time. Useful references: - [`Box::pin`](https://doc.rust-lang.org/std/boxed/struct.Box.html#method.pin) - [Async book: Pinning](https://rust-lang.github.io/async-book/04_pinning/01_chapter.html) ## What Changed - Boxed the wrapper futures in `core/src/thread_manager.rs` around `start_thread`, `resume_thread_from_rollout`, `fork_thread`, and the corresponding `ThreadManagerState` spawn helpers so callers no longer inline the full spawn/resume state machine through multiple layers. - Boxed the matching test-only wrapper futures in `core/tests/common/test_codex.rs` and `core/tests/suite/compact_resume_fork.rs`, which sit directly on top of the same path. - Restored `compact_resume_after_second_compaction_preserves_history` in `core/tests/suite/compact_resume_fork.rs` to a normal `#[tokio::test]` and removed the explicit `TEST_STACK_SIZE_BYTES` thread/runtime sizing. - Simplified a tiny helper in `compact_resume_fork` by making `fetch_conversation_path()` synchronous, which removes one more unnecessary future layer from the test path. ## Verification - `cargo test -p codex-core --test all suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history -- --exact --nocapture` - `cargo test -p codex-core --test all suite::compact_resume_fork -- --nocapture` - Re-ran the built `codex-core` `tests/all` binary directly with reduced stack sizes: - `RUST_MIN_STACK=917504` passes - `RUST_MIN_STACK=786432` still overflows - `cargo test -p codex-core` - Still fails locally in unrelated existing integration areas that expect the `codex` / `test_stdio_server` binaries or hit the existing `search_tool` wiremock mismatches.	2026-03-04 05:44:52 +00:00
daveaitel-openai	c2e126f92a	core: reuse parent shell snapshot for thread-spawn subagents (#13052 ) ## Summary - reuse the parent shell snapshot when spawning/forking/resuming `SessionSource::SubAgent(SubAgentSource::ThreadSpawn { .. })` sessions - plumb inherited snapshot through `AgentControl -> ThreadManager -> Codex::spawn -> SessionConfiguration` - skip shell snapshot refresh on cwd updates for thread-spawn subagents so inherited snapshots are not replaced ## Why - avoids per-subagent shell snapshot creation and cleanup work - keeps thread-spawn subagents on the parent snapshot path, matching the intended parent/child snapshot model ## Validation - `just fmt` (in `codex-rs`) - `cargo test -p codex-core --no-run` - `cargo test -p codex-core spawn_agent -- --nocapture` - `cargo test -p codex-core --test all suite::agent_jobs::spawn_agents_on_csv_runs_and_exports` ## Notes - full `cargo test -p codex-core --test all` was left running separately for broader verification Co-authored-by: Codex <noreply@openai.com>	2026-03-02 15:53:15 +00:00
Ahmed Ibrahim	0aeb55bf08	Record realtime close marker on replacement (#13058 ) ## Summary - record a realtime close developer message when a new realtime session replaces an active one - assert the replacement marker through the mocked responses request path --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Charles Cunningham <ccunningham@openai.com>	2026-03-01 13:54:12 -08:00
xl-openai	752402c4fe	feat: load from plugins (#12864 ) Support loading plugins. Plugins can now be enabled via [plugins.<name>] in config.toml. They are loaded as first-class entities through PluginsManager, and their default skills/ and .mcp.json contributions are integrated into the existing skills and MCP flows.	2026-03-01 10:50:56 -08:00

1 2

82 Commits