js_repl_reset previously raced with in-flight/new js_repl executions
because reset() could clear exec_tool_calls without synchronizing with
execute(). In that window, a running exec could lose its per-exec
tool-call context, and subsequent kernel RunTool messages would fail
with js_repl exec context not found. The fix serializes reset and
execute on the same exec_lock, so reset cannot run concurrently with
exec setup/teardown. We also keep the timeout path safe by performing
reset steps inline while execute() already holds the lock, avoiding
re-entrant lock acquisition. A regression test now verifies that reset
waits for the exec lock and does not clear tool-call state early.
Remove the waiting loop in `reset` so it no longer blocks on potentially
hanging exec tool calls + add `clear_all_exec_tool_calls_map` to drain
the map and notify waiters so `reset` completes immediately
## Summary
This feature is now reasonably stable, let's remove it so we can
simplify our upcoming iterations here.
## Testing
- [x] Existing tests pass
Summary
- rename the `collab` handlers and UI files to `multi_agents` to match
the new naming
- update module references and specs so the handlers and TUI widgets
consistently use the renamed files
- keep the existing functionality while aligning file and module names
with the multi-agent terminology
The idea is to have 2 family of agents.
1. Built-in that we packaged directly with Codex
2. User defined that are defined using the `agents_config.toml` file. It
can reference config files that will override the agent config. This
looks like this:
```
version = 1
[agents.explorer]
description = """Use `explorer` for all codebase questions.
Explorers are fast and authoritative.
Always prefer them over manual search or file reading.
Rules:
- Ask explorers first and precisely.
- Do not re-read or re-search code they cover.
- Trust explorer results without verification.
- Run explorers in parallel when useful.
- Reuse existing explorers for related questions."""
config_file = "explorer.toml"
```
### What changed
1. Removed per-turn MCP selection reset in `core/src/tasks/mod.rs`.
2. Added `SessionState::set_mcp_tool_selection(Vec<String>)` in
`core/src/state/session.rs` for authoritative restore behavior (deduped,
order-preserving, empty clears).
3. Added rollout parsing in `core/src/codex.rs` to recover
`active_selected_tools` from prior `search_tool_bm25` outputs:
- tracks matching `call_id`s
- parses function output text JSON
- extracts `active_selected_tools`
- latest valid payload wins
- malformed/non-matching payloads are ignored
4. Applied restore logic to resumed and forked startup paths in
`core/src/codex.rs`.
5. Updated instruction text to session/thread scope in
`core/templates/search_tool/tool_description.md`.
6. Expanded tests in `core/tests/suite/search_tool.rs`, plus unit
coverage in:
- `core/src/codex.rs`
- `core/src/state/session.rs`
### Behavior after change
1. Search activates matched tools.
2. Additional searches union into active selection.
3. Selection survives new turns in the same thread.
4. Resume/fork restores selection from rollout history.
5. Separate threads do not inherit selection unless forked.
## Summary
- add a distinct `linux_bubblewrap` sandbox tag when the Linux
bubblewrap pipeline feature is enabled
- thread the bubblewrap feature flag into sandbox tag generation for:
- turn metadata header emission
- tool telemetry metric tags and after-tool-use hooks
- add focused unit tests for `sandbox_tag` precedence and Linux
bubblewrap behavior
## Validation
- `just fmt`
- `cargo clippy -p codex-core --all-targets`
- `cargo test -p codex-core sandbox_tags::tests`
- started `cargo test -p codex-core` and stopped it per request
Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
### Description
#### Summary
Introduces the core plumbing required for structured network approvals
#### What changed
- Added structured network policy decision modeling in core.
- Added approval payload/context types needed for network approval
semantics.
- Wired shell/unified-exec runtime plumbing to consume structured
decisions.
- Updated related core error/event surfaces for structured handling.
- Updated protocol plumbing used by core approval flow.
- Included small CLI debug sandbox compatibility updates needed by this
layer.
#### Why
establishes the minimal backend foundation for network approvals without
yet changing high-level orchestration or TUI behavior.
#### Notes
- Behavior remains constrained by existing requirements/config gating.
- Follow-up PRs in the stack handle orchestration, UX, and app-server
integration.
---------
Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
## Summary
This PR fixes a race in `js_repl` tool-call draining that could leave an
exec waiting indefinitely for in-flight tool calls to finish.
The fix is in:
-
`/Users/fjord/code/codex-jsrepl-seq/codex-rs/core/src/tools/js_repl/mod.rs`
## Problem
`js_repl` tracks in-flight tool calls per exec and waits for them to
drain on completion/timeout/cancel paths.
The previous wait logic used a check-then-wait pattern with `Notify`
that could miss a wakeup:
1. Observe `in_flight > 0`
2. Drop lock
3. Register wait (`notified().await`)
If `notify_waiters()` happened between (2) and (3), the waiter could
sleep until another notification that never comes.
## What changed
- Updated all exec-tool-call wait loops to create an owned notification
future while holding the lock:
- use `Arc<Notify>::notified_owned()` instead of cloning notify and
awaiting later.
- Applied this consistently to:
- `wait_for_exec_tool_calls`
- `wait_for_all_exec_tool_calls`
- `wait_for_exec_tool_calls_map`
This preserves existing behavior while eliminating the lost-wakeup
window.
## Test coverage
Added a regression test:
- `wait_for_exec_tool_calls_map_drains_inflight_calls_without_hanging`
The test repeatedly races waiter/finisher tasks and asserts bounded
completion to catch hangs.
## Impact
- No API changes.
- No user-facing behavior changes intended.
- Improves reliability of exec lifecycle boundaries when tool calls are
still in flight.
#### [git stack](https://github.com/magus/git-stack-cli)
- ✅ `1` https://github.com/openai/codex/pull/11796
- 👉 `2` https://github.com/openai/codex/pull/11800
- ⏳ `3` https://github.com/openai/codex/pull/10673
- ⏳ `4` https://github.com/openai/codex/pull/10670
## Summary
Fixes a flaky/panicking `js_repl` image-path test by running it on a
multi-thread Tokio runtime and tightening assertions to focus on real
behavior.
## Problem
`js_repl_can_attach_image_via_view_image_tool` in
`/Users/fjord/code/codex-jsrepl-seq/codex-rs/core/src/tools/js_repl/mod.rs`
can panic under single-thread test runtime with:
`can call blocking only when running on the multi-threaded runtime`
It also asserted a brittle user-facing text string.
## Changes
1. Updated the test runtime to:
`#[tokio::test(flavor = "multi_thread", worker_threads = 2)]`
2. Removed the brittle `"attached local image path"` string assertion.
3. Kept the concrete side-effect assertions:
- tool call succeeds
- image is actually injected into pending input (`InputImage` with
`data:image/png;base64,...`)
## Why this is safe
This is test-only behavior. No production runtime code paths are
changed.
## Validation
- Ran:
`cargo test -p codex-core
tools::js_repl::tests::js_repl_can_attach_image_via_view_image_tool --
--nocapture`
- Result: pass
#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/11796
- ⏳ `2` https://github.com/openai/codex/pull/11800
- ⏳ `3` https://github.com/openai/codex/pull/10673
- ⏳ `4` https://github.com/openai/codex/pull/10670
## Summary
- Limit `search_tool_bm25` indexing to `codex_apps` tools only, so
non-Apps MCP servers are no longer discoverable through this search
path.
- Move search-tool discovery guidance into the `search_tool_bm25` tool
description (via template include) instead of injecting it as a separate
developer message.
- Update Apps discovery guidance wording to clarify when to use
`search_tool_bm25` for Apps-backed systems (for example Slack, Google
Drive, Jira, Notion) and when to call tools directly.
- Remove dead `core` helper code (`filter_codex_apps_mcp_tools` and
`codex_apps_connector_id`) that is no longer used after the
tool-selection refactor.
- Update `core` search-tool tests to assert codex-apps-only behavior and
to validate guidance from the tool description.
## Validation
- ✅ `just fmt`
- ✅ `cargo test -p codex-core search_tool`
- ⚠️ `cargo test -p codex-core` was attempted, but the run repeatedly
stalled on
`tools::js_repl::tests::js_repl_can_attach_image_via_view_image_tool`.
## Tickets
- None
## Why
We currently carry multiple permission-related concepts directly on
`Config` for shell/unified-exec behavior (`approval_policy`,
`sandbox_policy`, `network`, `shell_environment_policy`,
`windows_sandbox_mode`).
Consolidating these into one in-memory struct makes permission handling
easier to reason about and sets up the next step: supporting named
permission profiles (`[permissions.PROFILE_NAME]`) without changing
behavior now.
This change is mostly mechanical: it updates existing callsites to go
through `config.permissions`, but it does not yet refactor those
callsites to take a single `Permissions` value in places where multiple
permission fields are still threaded separately.
This PR intentionally **does not** change the on-disk `config.toml`
format yet and keeps compatibility with legacy config keys.
## What Changed
- Introduced `Permissions` in `core/src/config/mod.rs`.
- Added `Config::permissions` and moved effective runtime permission
fields under it:
- `approval_policy`
- `sandbox_policy`
- `network`
- `shell_environment_policy`
- `windows_sandbox_mode`
- Updated config loading/building so these effective values are still
derived from the same existing config inputs and constraints.
- Updated Windows sandbox helpers/resolution to read/write via
`permissions`.
- Threaded the new field through all permission consumers across core
runtime, app-server, CLI/exec, TUI, and sandbox summary code.
- Updated affected tests to reference `config.permissions.*`.
- Renamed the struct/field from
`EffectivePermissions`/`effective_permissions` to
`Permissions`/`permissions` and aligned variable naming accordingly.
## Verification
- `just fix -p codex-core -p codex-tui -p codex-cli -p codex-app-server
-p codex-exec -p codex-utils-sandbox-summary`
- `cargo build -p codex-core -p codex-tui -p codex-cli -p
codex-app-server -p codex-exec -p codex-utils-sandbox-summary`
## Summary
This PR adds host-integrated helper APIs for `js_repl` and updates model
guidance so the agent can use them reliably.
### What’s included
- Add `codex.tool(name, args?)` in the JS kernel so `js_repl` can call
normal Codex tools.
- Keep persistent JS state and scratch-path helpers available:
- `codex.state`
- `codex.tmpDir`
- Wire `js_repl` tool calls through the standard tool router path.
- Add/align `js_repl` execution completion/end event behavior with
existing tool logging patterns.
- Update dynamic prompt injection (`project_doc`) to document:
- how to call `codex.tool(...)`
- raw output behavior
- image flow via `view_image` (`codex.tmpDir` +
`codex.tool("view_image", ...)`)
- stdio safety guidance (`console.log` / `codex.tool`, avoid direct
`process.std*`)
## Why
- Standardize JS-side tool usage on `codex.tool(...)`
- Make `js_repl` behavior more consistent with existing tool execution
and event/logging patterns.
- Give the model enough runtime guidance to use `js_repl` safely and
effectively.
## Testing
- Added/updated unit and runtime tests for:
- `codex.tool` calls from `js_repl` (including shell/MCP paths)
- image handoff flow via `view_image`
- prompt-injection text for `js_repl` guidance
- execution/end event behavior and related regression coverage
#### [git stack](https://github.com/magus/git-stack-cli)
- ✅ `1` https://github.com/openai/codex/pull/10674
- 👉 `2` https://github.com/openai/codex/pull/10672
- ⏳ `3` https://github.com/openai/codex/pull/10671
- ⏳ `4` https://github.com/openai/codex/pull/10673
- ⏳ `5` https://github.com/openai/codex/pull/10670
This PR adds an experimental `persist_extended_history` bool flag to
app-server thread APIs so rollout logs can retain a richer set of
EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e.
on `thread/resume`).
### Motivation
Today, our rollout recorder only persists a small subset (e.g. user
message, reasoning, assistant message) of `EventMsg` types, dropping a
good number (like command exec, file change, etc.) that are important
for reconstructing full item history for `thread/resume`, `thread/read`,
and `thread/fork`.
Some clients want to be able to resume a thread without lossiness. This
lossiness is primarily a UI thing, since what the model sees are
`ResponseItem` and not `EventMsg`.
### Approach
This change introduces an opt-in `persist_full_history` flag to preserve
those events when you start/resume/fork a thread (defaults to `false`).
This is done by adding an `EventPersistenceMode` to the rollout
recorder:
- `Limited` (existing behavior, default)
- `Extended` (new opt-in behavior)
In `Extended` mode, persist additional `EventMsg` variants needed for
non-lossy app-server `ThreadItem` reconstruction. We now store the
following ThreadItems that we didn't before:
- web search
- command execution
- patch/file changes
- MCP tool calls
- image view calls
- collab tool outcomes
- context compaction
- review mode enter/exit
For **command executions** in particular, we truncate the output using
the existing `truncate_text` from core to store an upper bound of 10,000
bytes, which is also the default value for truncating tool outputs shown
to the model. This keeps the size of the rollout file and command
execution items returned over the wire reasonable.
And we also persist `EventMsg::Error` which we can now map back to the
Turn's status and populates the Turn's error metadata.
#### Updates to EventMsgs
To truly make `thread/resume` non-lossy, we also needed to persist the
`status` on `EventMsg::CommandExecutionEndEvent` and
`EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
command failed or was declined (similar for apply_patch). These
EventMsgs were never persisted before so I made it a required field.
`SandboxPolicy::ReadOnly` previously implied broad read access and could
not express a narrower read surface.
This change introduces an explicit read-access model so we can support
user-configurable read restrictions in follow-up work, while preserving
current behavior today.
It also ensures unsupported backends fail closed for restricted-read
policies instead of silently granting broader access than intended.
## What
- Added `ReadOnlyAccess` in protocol with:
- `Restricted { include_platform_defaults, readable_roots }`
- `FullAccess`
- Updated `SandboxPolicy` to carry read-access configuration:
- `ReadOnly { access: ReadOnlyAccess }`
- `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
- Preserved existing behavior by defaulting current construction paths
to `ReadOnlyAccess::FullAccess`.
- Threaded the new fields through sandbox policy consumers and call
sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
related tests.
- Updated Seatbelt policy generation to honor restricted read roots by
emitting scoped read rules when full read access is not granted.
- Added fail-closed behavior on Linux and Windows backends when
restricted read access is requested but not yet implemented there
(`UnsupportedOperation`).
- Regenerated app-server protocol schema and TypeScript artifacts,
including `ReadOnlyAccess`.
## Compatibility / rollout
- Runtime behavior remains unchanged by default (`FullAccess`).
- API/schema changes are in place so future config wiring can enable
restricted read access without another policy-shape migration.
## Summary
- Remove `Feature::SearchTool` and the `search_tool` config key from the
feature registry/schema.
- Gate `search_tool_bm25` exposure via `Feature::Apps` in
`core/src/tools/spec.rs`.
- Update MCP selection logic in `core/src/codex.rs` to use
`Feature::Apps` for search-tool behavior.
- Update `core/tests/suite/search_tool.rs` to enable `Feature::Apps`.
- Regenerate `core/config.schema.json` via `just write-config-schema`.
## Testing
- `just fmt`
- `cargo test -p codex-core --test all suite::search_tool::`
## Tickets
- None
## Why
`codex-core` was being built in multiple feature-resolved permutations
because test-only behavior was modeled as crate features. For a large
crate, those permutations increase compile cost and reduce cache reuse.
## Net Change
- Removed the `test-support` crate feature and related feature wiring so
`codex-core` no longer needs separate feature shapes for test consumers.
- Standardized cross-crate test-only access behind
`codex_core::test_support`.
- External test code now imports helpers from
`codex_core::test_support`.
- Underlying implementation hooks are kept internal (`pub(crate)`)
instead of broadly public.
## Outcome
- Fewer `codex-core` build permutations.
- Better incremental cache reuse across test targets.
- No intended production behavior change.
## Summary
- Reduced repeated approvals for equivalent wrapper commands and fixed
execpolicy matching for heredoc-style shell invocations, with minimal
behavior change and fail-closed defaults.
## Fixes
1. Canonicalized approval matching for wrappers so equivalent commands
map to the same approval intent.
2. Added heredoc-aware prefix extraction for execpolicy so commands like
`python3 <<'PY' ... PY` match rules such as `prefix_rule(["python3"],
...)`.
3. Kept fallback behavior conservative: if parsing is ambiguous,
existing prompt behavior is preserved.
## Edge Cases Covered
- Wrapper path/name differences: `/bin/bash` vs `bash`, `/bin/zsh` vs
`zsh`.
- Shell modes: `-c` and `-lc`.
- Heredoc forms: quoted delimiter (`<<'PY'`) and unquoted delimiter (`<<
PY`).
- Multi-command heredoc scripts are rejected by the fallback
- Non-heredoc redirections (`>`, etc.) are not treated as heredoc prefix
matches.
- Complex scripts still fall back to prior behavior rather than
expanding permissions.
---------
Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
- Keep `view_image` in the advertised tool list for all models.
- Return a clear error when the current model does not support image
inputs, and cover it with a unit test.
As of this PR, `SessionServices` retains a
`Option<StartedNetworkProxy>`, if appropriate.
Now the `network` field on `Config` is `Option<NetworkProxySpec>`
instead of `Option<NetworkProxy>`.
Over in `Session::new()`, we invoke `NetworkProxySpec::start_proxy()` to
create the `StartedNetworkProxy`, which is a new struct that retains the
`NetworkProxy` as well as the `NetworkProxyHandle`. (Note that `Drop` is
implemented for `NetworkProxyHandle` to ensure the proxies are shutdown
when it is dropped.)
The `NetworkProxy` from the `StartedNetworkProxy` is threaded through to
the appropriate places.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/11207).
* #11285
* __->__ #11207
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
**Why We Did This**
- The goal is to reduce MCP tool context pollution by not exposing the
full MCP tool list up front
- It forces an explicit discovery step (`search_tool_bm25`) so the model
narrows tool scope before making MCP calls, which helps relevance and
lowers prompt/tool clutter.
**What It Changed**
- Added a new experimental feature flag `search_tool` in
`core/src/features.rs:90` and `core/src/features.rs:430`.
- Added config/schema support for that flag in
`core/config.schema.json:214` and `core/config.schema.json:1235`.
- Added BM25 dependency (`bm25`) in `Cargo.toml:129` and
`core/Cargo.toml:23`.
- Added new tool handler `search_tool_bm25` in
`core/src/tools/handlers/search_tool_bm25.rs:18`.
- Registered the handler and tool spec in
`core/src/tools/handlers/mod.rs:11` and `core/src/tools/spec.rs:780` and
`core/src/tools/spec.rs:1344`.
- Extended `ToolsConfig` to carry `search_tool` enablement in
`core/src/tools/spec.rs:32` and `core/src/tools/spec.rs:56`.
- Injected dedicated developer instructions for tool-discovery workflow
in `core/src/codex.rs:483` and `core/src/codex.rs:1976`, using
`core/templates/search_tool/developer_instructions.md:1`.
- Added session state to store one-shot selected MCP tools in
`core/src/state/session.rs:27` and `core/src/state/session.rs:131`.
- Added filtering so when feature is enabled, only selected MCP tools
are exposed on the next request (then consumed) in
`core/src/codex.rs:3800` and `core/src/codex.rs:3843`.
- Added E2E suite coverage for
enablement/instructions/hide-until-search/one-turn-selection in
`core/tests/suite/search_tool.rs:72`,
`core/tests/suite/search_tool.rs:109`,
`core/tests/suite/search_tool.rs:147`, and
`core/tests/suite/search_tool.rs:218`.
- Refactored test helper utilities to support config-driven tool
collection in `core/tests/suite/tools.rs:281`.
**Net Behavioral Effect**
- With `search_tool` **off**: existing MCP behavior (tools exposed
normally).
- With `search_tool` **on**: MCP tools start hidden, model must call
`search_tool_bm25`, and only returned `selected_tools` are available for
the next model call.
This PR adds the following field to `Config`:
```rust
pub network: Option<NetworkProxy>,
```
Though for the moment, it will always be initialized as `None` (this
will be addressed in a subsequent PR).
This PR does the work to thread `network` through to `execute_exec_env()`, `process_exec_tool_call()`, and `UnifiedExecRuntime.run()` to ensure it is available whenever we span a process.
- Plumb input modalities from model catalog through the openai model
protocol. Default to text and image.
- Conditionally add the view_image tool only if input modalities support
image.
- Defer rollout persistence for fresh threads (`InitialHistory::New`):
keep rollout events in memory and only materialize rollout file + state
DB row on first `EventMsg::UserMessage`.
- Keep precomputed rollout path available before materialization.
- Change `thread/start` to build thread response from live config
snapshot and optional precomputed path.
- Improve pre-materialization behavior in app-server/TUI: clearer
invalid-request errors for file-backed ops and a friendlier `/fork` “not
ready yet” UX.
- Update tests to match deferred semantics across
start/read/archive/unarchive/fork/resume/review flows.
- Improved resilience of user_shell test, which should be unrelated to
this change but must be affected by timing changes
For Reviewers:
* The primary change is in recorder.rs
* Most of the other changes were to fix up broken assumptions in
existing tests
Testing:
* Manually tested CLI
* Exercised app server paths by manually running IDE Extension with
rebuilt CLI binary
* Only user-visible change is that `/fork` in TUI generates visible
error if used prior to first turn
Summary
- wrap `shell -lc` executions that use a snapshot with the session shell
so the saved environment is sourced before delegating to the original
shell
- escape single quotes in the generated wrapper and add tests covering
Bash/Zsh/sh session bootstrapping
Testing
- Not run (not requested)
Summary
- add the new resume_agent collab tool path through core, protocol, and
the app server API, including the resume events
- update the schema/TypeScript definitions plus docs so resume_agent
appears in generated artifacts and README
- note that resumed agents rehydrate rollout history without overwriting
their base instructions
Testing
- Not run (not requested)
<img width="1019" height="284" alt="Screenshot 2026-02-05 at 23 34 08"
src="https://github.com/user-attachments/assets/19ec3ce1-3c3b-40f5-b251-a31d964bf3bb"
/>
Currently, if a config value is set that fails the requirements, we exit
Codex.
Now, instead of this, we print a warning and default to a
requirements-permitting value.
## Summary
- add shared `ModeKind` helpers for display names, TUI visibility, and
`request_user_input` availability
- derive TUI mode filtering/labels from shared `ModeKind` metadata
instead of local hardcoded matches
- derive `request_user_input` availability text and unavailable error
mode names from shared mode metadata
- replace hardcoded known mode names in the Default collaboration-mode
template with `{{KNOWN_MODE_NAMES}}` and fill it from
`TUI_VISIBLE_COLLABORATION_MODES`
- add regression tests for mode metadata sync and placeholder
replacement
## Notes
- `cargo test -p codex-core` integration target (`tests/all`) still
shows pre-existing env-specific failures in this environment due missing
`test_stdio_server` binary resolution; core unit tests are green.
## Codex author
`codex resume 019c26ff-dfe7-7173-bc04-c9e1fff1e447`