## Summary
- add `previous_context_item: Option<TurnContextItem>` to
`ContextManager`
- expose session/state accessors for reading and updating the stored
previous context item
- switch settings diffing to use `TurnContextItem` instead of
`TurnContext`
- remove submission-loop local `previous_context` and persist the
previous context item in history
## Testing
- `just fmt`
- `just fix -p codex-core`
- `cargo test -p codex-core --test all model_switching::`
- `cargo test -p codex-core --test all collaboration_instructions::`
- `cargo test -p codex-core --test all personality::`
- `cargo test -p codex-core --test all
permissions_messages::permissions_message_not_added_when_no_change`
Summary
This PR expands MCP client-side approval behavior beyond codex_apps and
tightens elicitation capability signaling.
- Removed the codex_apps-only gate in MCP tool approval checks, so
local/custom MCP servers are now eligible for the same client-side
approval prompt flow when tool annotations indicate side effects.
- Updated approval memory keying to support tools without a connector ID
(connector_id: Option<String>), allowing “Approve this Session” to be
remembered even when connector metadata is missing.
- Updated prompt text for non-codex_apps tools to identify origin as The
<server> MCP server instead of This app.
- Added MCP initialization capability policy so only codex_apps
advertises MCP elicitation capability; other servers advertise no
elicitation support.
- Added regression tests for:
server-specific prompt copy behavior
codex-apps-only elicitation capability advertisement
Testing
- Not run (not requested)
- Summary
- raise `DEFAULT_MEMORIES_MAX_ROLLOUTS_PER_STARTUP` to 16 so more
rollouts are allowed per startup
- lower `DEFAULT_MEMORIES_MIN_ROLLOUT_IDLE_HOURS` to 6 to make rollouts
eligible sooner
- Testing
- Not run (not requested)
Summary
- replace the stale `docs/config.md#feature-flags` reference in the
legacy feature notice with the canonical published URL
- align the deprecation notice test to expect the new link
This addresses #12123
app-server support for initiating Windows sandbox setup.
server responds quickly to setup request and makes a future RPC call
back to client when the setup finishes.
The TUI implementation is unaffected but in a future PR I'll update the
TUI to use the shared setup helper
(`windows_sandbox.run_windows_sandbox_setup`)
## Summary
Fix `js_repl` package-resolution boundary checks for macOS temp
directory path aliasing (`/var` vs `/private/var`).
## Problem
`js_repl` verifies that resolved bare-package imports stay inside a
configured `node_modules` root.
On macOS, temp directories are commonly exposed as `/var/...` but
canonicalize to `/private/var/...`.
Because the boundary check compared raw paths with `path.relative(...)`,
valid resolutions under temp dirs could be misclassified as escaping the
allowed base, causing false `Module not found` errors.
## Changes
- Add `fs` import in the JS kernel.
- Add `canonicalizePath()` using `fs.realpathSync.native(...)` (with
safe fallback).
- Canonicalize both `base` and `resolvedPath` before running the
`node_modules` containment check.
## Impact
- Fixes false-negative boundary checks for valid package resolutions in
macOS temp-dir scenarios.
- Keeps the existing security boundary behavior intact.
- Scope is limited to `js_repl` kernel module path validation logic.
#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/12177
- ⏳ `2` https://github.com/openai/codex/pull/10673
## Summary
Increase the rollout summary filename slug cap from 20 to 60 characters
in memory storage.
## What changed
- Updated `ROLLOUT_SLUG_MAX_LEN` from `20` to `60` in:
- `codex-rs/core/src/memories/storage.rs`
- Updated slug truncation test to verify 60-char behavior.
## Why
This preserves more semantic context in rollout summary filenames while
keeping existing normalization behavior unchanged.
## Testing
- `just fmt`
- `cargo test -p codex-core
memories::storage::tests::rollout_summary_file_stem_sanitizes_and_truncates_slug
-- --exact`
The issue was that the file_watcher never unsubscribe a file watch. All
of them leave in the owning of the ThreadManager. As a result, for each
newly created thread we create a new file watcher but this one never get
deleted even if we close the thread. On Unix system, a file watcher uses
an `inotify` and after some time we end up having consumed all of them.
This PR adds a mechanism to unsubscribe a file watcher when a thread is
dropped
We've continued to receive reports from users that they're seeing the
error message "Your access token could not be refreshed because your
refresh token was already used. Please log out and sign in again." This
PR fixes two holes in the token refresh logic that lead to this
condition.
Background: A previous change in token refresh introduced the
`UnauthorizedRecovery` object. It implements a state machine in the core
agent loop that first performs a load of the on-disk auth information
guarded by a check for matching account ID. If it finds that the on-disk
version has been updated by another instance of codex, it uses the
reloaded auth tokens. If the on-disk version hasn't been updated, it
issues a refresh request from the token authority.
There are two problems that this PR addresses:
Problem 1: We weren't doing the same thing for the code path used by the
app server interface. This PR effectively replicates the
`UnauthorizedRecovery` logic for that code path.
Problem 2: The `UnauthorizedRecovery` logic contained a hole in the
`ReloadOutcome::Skipped` case. Here's the scenario. A user starts two
instances of the CLI. Instance 1 is active (working on a task), instance
2 is idle. Both instances have the same in-memory cached tokens. The
user then runs `codex logout` or `codex login` to log in to a separate
account, which overwrites the `auth.json` file. Instance 1 receives a
401 and refreshes its token, but it doesn't write the new token to the
`auth.json` file because the account ID doesn't match. Instance 2 is
later activated and presented with a new task. It immediately hits a 401
and attempts to refresh its token but fails because its cached refresh
token is now invalid. To avoid this situation, I've changed the logic to
immediately fail a token refresh if the user has since logged out or
logged in to another account. This will still be seen as an error by the
user, but the cause will be clearer.
I also took this opportunity to clean up the names of existing functions
to make their roles clearer.
* `try_refresh_token` is renamed `request_chatgpt_token_refresh`
* the existing `refresh_token` is renamed `refresh_token_from_authority`
(there's a new higher-level function named `refresh_token` now)
* `refresh_tokens` is renamed `refresh_and_persist_chatgpt_token`, and
it now implicitly reloads
* `update_tokens` is renamed `persist_tokens`
Summary
- prevent delegated review agents from re-enabling blocked tools by
explicitly disabling the Collab feature alongside web search and view
image controls
Testing
- Not run (not requested)
## Summary
This change removes tool-list filtering in `js_repl_tools_only` mode and
relies on the normal model tool descriptions, while still enforcing that
tool execution must go through `js_repl` + `codex.tool(...)`.
## Motivation
The previous `js_repl_tools_only` filtering hid most tools from the
model request, which diverged from standard tool-list behavior and made
signatures less discoverable. I tested that this filtering is not
needed, and the model can follow the prompt to only call tools via
`js_repl`.
## What Changed
- `filter_tools_for_model(...)` in `core/src/tools/spec.rs` is now a
pass-through (no filtering when `js_repl_tools_only` is enabled).
- Updated tests to assert that model tools are not filtered in
`js_repl_tools_only` mode.
- Updated dynamic-tool test to assert dynamic tools remain visible in
model tool specs.
- Removed obsolete test helper used only by the old filtering
assertions.
## Safety / Behavior
- This commit does **not** relax execution policy.
- Direct model tool calls remain blocked in `js_repl_tools_only` mode
(except internal `js_repl` tools), and callers are instructed to use
`js_repl` + `codex.tool(...)`.
## Testing
- `cargo test -p codex-core js_repl_tools_only`
- Manual rollout validation showed the model can follow the `js_repl`
routing instructions without needing filtered tool lists.
#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/12069
- ⏳ `2` https://github.com/openai/codex/pull/10673
- ⏳ `3` https://github.com/openai/codex/pull/10670
…fault
Update the list of platform defaults included for `ReadOnlyAccess`.
When `ReadOnlyAccess::Restricted::include_platform_defaults` is `true`,
the policy defined in
`codex-rs/core/src/seatbelt_platform_defaults.sbpl` is appended to
enable macOS programs to function properly.
# External (non-OpenAI) Pull Request Requirements
In `js_repl` mode, module resolution currently starts from
`js_repl_kernel.js`, which is written to a per-kernel temp dir. This
effectively means that bare imports will not resolve.
This PR adds a new config option, `js_repl_node_module_dirs`, which is a
list of dirs that are used (in order) to resolve a bare import. If none
of those work, the current working directory of the thread is used.
For example:
```toml
js_repl_node_module_dirs = [
"/path/to/node_modules/",
"/other/path/to/node_modules/",
]
```
## Summary
- add a dedicated `core/tests/suite/model_visible_layout.rs` snapshot
suite to materialize model-visible request layout in high-value
scenarios
- add three reviewer-focused snapshot scenarios:
- turn-level context updates (cwd / permissions / personality)
- first post-resume turn with model hydration + personality change
- first post-resume turn where pre-turn model override matches rollout
model
- wire the new suite into `core/tests/suite/mod.rs`
- commit generated `insta` snapshots under `core/tests/suite/snapshots/`
## Why
This creates a stable, reviewable baseline of model-visible context
layout against `main` before follow-on context-management refactors. It
lets subsequent PRs show focused snapshot diffs for behavior changes
instead of introducing the test surface and behavior changes at once.
## Testing
- `just fmt`
- `INSTA_UPDATE=always cargo test -p codex-core model_visible_layout`
zsh fork PR stack:
- https://github.com/openai/codex/pull/12051
- https://github.com/openai/codex/pull/12052👈
### Summary
This PR introduces a feature-gated native shell runtime path that routes
shell execution through a patched zsh exec bridge, removing MCP-specific
behavior from the shell hot path while preserving existing
CommandExecution lifecycle semantics.
When shell_zsh_fork is enabled, shell commands run via patched zsh with
per-`execve` interception through EXEC_WRAPPER. Core receives wrapper
IPC requests over a Unix socket, applies existing approval policy, and
returns allow/deny before the subcommand executes.
### What’s included
**1) New zsh exec bridge runtime in core**
- Wrapper-mode entrypoint (maybe_run_zsh_exec_wrapper_mode) for
EXEC_WRAPPER invocations.
- Per-execution Unix-socket IPC handling for wrapper requests/responses.
- Approval callback integration using existing core approval
orchestration.
- Streaming stdout/stderr deltas to existing command output event
pipeline.
- Error handling for malformed IPC, denial/abort, and execution
failures.
**2) Session lifecycle integration**
SessionServices now owns a `ZshExecBridge`.
Session startup initializes bridge state; shutdown tears it down
cleanly.
**3) Shell runtime routing (feature-gated)**
When `shell_zsh_fork` is enabled:
- Build execution env/spec as usual.
- Add wrapper socket env wiring.
- Execute via `zsh_exec_bridge.execute_shell_request(...)` instead of
the regular shell path.
- Non-zsh-fork behavior remains unchanged.
**4) Config + feature wiring**
- Added `Feature::ShellZshFork` (under development).
- Added config support for `zsh_path` (optional absolute path to patched
zsh):
- `Config`, `ConfigToml`, `ConfigProfile`, overrides, and schema.
- Session startup validates that `zsh_path` exists/usable when zsh-fork
is enabled.
- Added startup test for missing `zsh_path` failure mode.
**5) Seatbelt/sandbox updates for wrapper IPC**
- Extended seatbelt policy generation to optionally allow outbound
connection to explicitly permitted Unix sockets.
- Wired sandboxing path to pass wrapper socket path through to seatbelt
policy generation.
- Added/updated seatbelt tests for explicit socket allow rule and
argument emission.
**6) Runtime entrypoint hooks**
- This allows the same binary to act as the zsh wrapper subprocess when
invoked via `EXEC_WRAPPER`.
**7) Tool selection behavior**
- ToolsConfig now prefers ShellCommand type when shell_zsh_fork is
enabled.
- Added test coverage for precedence with unified-exec enabled.
## Summary
- standardize remote compaction test mocking around one default behavior
in shared helpers
- make default remote compact mocks mirror production shape: keep
`message/user` + `message/developer`, drop assistant/tool artifacts,
then append a summary user message
- switch non-special `compact_remote` tests to the shared default mock
instead of ad-hoc JSON payloads
## Special-case tests that still use explicit mocks
- remote compaction error payload / HTTP failure behavior
- summary-only compact output behavior
- manual `/compact` with no prior user messages
- stale developer-instruction injection coverage
## Why
This removes inconsistent manual remote compaction fixtures and gives us
one source of truth for normal remote compact behavior, while preserving
explicit mocks only where tests intentionally cover non-default
behavior.
zsh fork PR stack:
- https://github.com/openai/codex/pull/12051👈
- https://github.com/openai/codex/pull/12052
With upcoming support for a fork of zsh that allows us to intercept
`execve` and run execpolicy checks for each subcommand as part of a
`CommandExecution`, it will be possible for there to be multiple
approval requests for a shell command like `/path/to/zsh -lc 'git status
&& rg \"TODO\" src && make test'`.
To support that, this PR introduces a new `approval_id` field across
core, protocol, and app-server so that we can associate approvals
properly for subcommands.
### Summary
Ensure that we use the model value from the response header only so that
we are guaranteed with the correct slug name. We are no longer checking
against the model value from response so that we are less likely to have
false positive.
There are two different treatments - for SSE we use the header from the
response and for websocket we check top-level events.
Bumps [arc-swap](https://github.com/vorner/arc-swap) from 1.8.0 to
1.8.2.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/vorner/arc-swap/blob/master/CHANGELOG.md">arc-swap's
changelog</a>.</em></p>
<blockquote>
<h1>1.8.2</h1>
<ul>
<li>Proper gate of <code>Pin</code> (since 1.39 - we are not using only
<code>Pin</code>, but also
<code>Pin::into_inner</code>, <a
href="https://redirect.github.com/vorner/arc-swap/issues/197">#197</a>).</li>
</ul>
<h1>1.8.1</h1>
<ul>
<li>Some more careful orderings (<a
href="https://redirect.github.com/vorner/arc-swap/issues/195">#195</a>).</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="19f0d661a2"><code>19f0d66</code></a>
Version 1.8.2</li>
<li><a
href="c222a22864"><code>c222a22</code></a>
Release 1.8.1</li>
<li><a
href="cccf3548a8"><code>cccf354</code></a>
Upgrade the other ordering too, for transitivity</li>
<li><a
href="e94df5511a"><code>e94df55</code></a>
Merge pull request <a
href="https://redirect.github.com/vorner/arc-swap/issues/195">#195</a>
from 0xfMel/master</li>
<li><a
href="bd5d3276e4"><code>bd5d327</code></a>
Fix Debt::pay failure ordering</li>
<li><a
href="22431daf64"><code>22431da</code></a>
Merge pull request <a
href="https://redirect.github.com/vorner/arc-swap/issues/189">#189</a>
from atouchet/rdm</li>
<li><a
href="b142bd81da"><code>b142bd8</code></a>
Update Readme</li>
<li>See full diff in <a
href="https://github.com/vorner/arc-swap/compare/v1.8.0...v1.8.2">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
rm `remote_models` feature flag.
We see issues like #11527 when a user has `remote_models` disabled, as
we always use the default fallback `ModelInfo`. This causes issues with
model performance.
Builds on #11690, which helps by warning the user when they are using
the default fallback. This PR will make that happen much less frequently
as an accidental consequence of disabling `remote_models`.
### Summary
Builiding off
5c75aa7b89 (diff-058ae8f109a8b84b4b79bbfa45f522c2233b9d9e139696044ae374d50b6196e0),
we have created a `model/rerouted` notification that captures the event
so that consumers can render as expected. Keep the `EventMsg::Warning`
path in core so that this does not affect TUI rendering.
`model/rerouted` is meant to be generic to account for future usage
including capacity planning etc.
## Summary
This PR centralizes model-visible state diffing for turn context updates
into one module, while keeping existing behavior and call sites stable.
### What changed
- Added `core/src/context_updates.rs` with the consolidated diffing
logic for:
- environment context updates
- permissions/policy updates
- collaboration mode updates
- model-instruction switch updates
- personality updates
- Added `BuildSettingsUpdateItemsParams` so required dependencies are
passed explicitly.
- Updated `Session::build_settings_update_items` in `core/src/codex.rs`
to delegate to the centralized module.
- Reused the same centralized `personality_message_for` helper from
initial-context assembly to avoid duplicated logic.
- Registered the new module in `core/src/lib.rs`.
## Why
This is a minimal, shippable step toward the model-visible-state design:
all state diff decisions for turn-context update items now live in one
place, improving reviewability and reducing drift risk without expanding
scope.
## Behavior
- Intended to be behavior-preserving.
- No protocol/schema changes.
- No call-site behavior changes beyond routing through the new
centralized logic.
## Testing
Ran targeted tests in this worktree:
- `cargo test -p codex-core
build_settings_update_items_emits_environment_item_for_network_changes`
- `cargo test -p codex-core collaboration_instructions --test all`
Both passed.
## Codex author
`codex resume 019c540f-3951-7352-a3fa-6f07b834d4ce`
The `model_supports_reasoning_summaries` config option was originally
added so users could enable reasoning for custom models (models that
codex doesn't know about). This is how it was documented in the source,
but its implementation didn't match. It was implemented such that it can
also be used to disable reasoning for models that otherwise support
reasoning. This leads to bad behavior for some reasoning models like
`gpt-5.3-codex`. Diagnosing this is difficult, and it has led to many
support issues.
This PR changes the handling of `model_supports_reasoning_summaries` so
it matches its original documented behavior. If it is set to false, it
is a no-op. That is, it never disables reasoning for models that are
known to support reasoning. It can still be used for its intended
purpose -- to enable reasoning for unknown models.
js_repl_reset previously raced with in-flight/new js_repl executions
because reset() could clear exec_tool_calls without synchronizing with
execute(). In that window, a running exec could lose its per-exec
tool-call context, and subsequent kernel RunTool messages would fail
with js_repl exec context not found. The fix serializes reset and
execute on the same exec_lock, so reset cannot run concurrently with
exec setup/teardown. We also keep the timeout path safe by performing
reset steps inline while execute() already holds the lock, avoiding
re-entrant lock acquisition. A regression test now verifies that reset
waits for the exec lock and does not clear tool-call state early.
Remove the waiting loop in `reset` so it no longer blocks on potentially
hanging exec tool calls + add `clear_all_exec_tool_calls_map` to drain
the map and notify waiters so `reset` completes immediately
## Summary
Fixes a few things in our exec_policy handling of prefix_rules:
1. Correctly match redirects specifically for exec_policy parsing. i.e.
if you have `prefix_rule(["echo"], decision="allow")` then `echo hello >
output.txt` should match - this should fix#10321
2. If there already exists any rule that would match our prefix rule
(not just a prompt), then drop it, since it won't do anything.
## Testing
- [x] Updated unit tests, added approvals ScenarioSpecs
Add per-turn notice when a request is downgraded to a fallback model due
to cyber safety checks.
**Changes**
- codex-api: Emit a ServerModel event based on the openai-model response
header and/or response payload (SSE + WebSocket), including when the
model changes mid-stream.
- core: When the server-reported model differs from the requested model,
emit a single per-turn warning explaining the reroute to gpt-5.2 and
directing users to Trusted
Access verification and the cyber safety explainer.
- app-server (v2): Surface these cyber model-routing warnings as
synthetic userMessage items with text prefixed by Warning: (and document
this behavior).
## Summary
This feature is now reasonably stable, let's remove it so we can
simplify our upcoming iterations here.
## Testing
- [x] Existing tests pass
Summary
- rename the `collab` handlers and UI files to `multi_agents` to match
the new naming
- update module references and specs so the handlers and TUI widgets
consistently use the renamed files
- keep the existing functionality while aligning file and module names
with the multi-agent terminology