## Summary
Fixes a flaky/panicking `js_repl` image-path test by running it on a
multi-thread Tokio runtime and tightening assertions to focus on real
behavior.
## Problem
`js_repl_can_attach_image_via_view_image_tool` in
`codex-rs/core/src/tools/js_repl/mod.rs`
can panic under the single-threaded test runtime with:
`can call blocking only when running on the multi-threaded runtime`
It also asserted a brittle user-facing text string.
## Changes
1. Updated the test runtime to:
`#[tokio::test(flavor = "multi_thread", worker_threads = 2)]`
2. Removed the brittle `"attached local image path"` string assertion.
3. Kept the concrete side-effect assertions:
- tool call succeeds
- image is actually injected into pending input (`InputImage` with
`data:image/png;base64,...`)
## Why this is safe
This is test-only behavior. No production runtime code paths are
changed.
## Validation
- Ran:
`cargo test -p codex-core
tools::js_repl::tests::js_repl_can_attach_image_via_view_image_tool --
--nocapture`
- Result: pass
#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/11796
- ⏳ `2` https://github.com/openai/codex/pull/11800
- ⏳ `3` https://github.com/openai/codex/pull/10673
- ⏳ `4` https://github.com/openai/codex/pull/10670
Currently, if there are syntax errors detected in the Starlark rules
file, the entire policy is silently ignored by the CLI. The app server
correctly emits a message that can be displayed in a GUI.
This PR changes the CLI (both the TUI and non-interactive exec) to fail
when the rules file can't be parsed. It then prints out an error message
and exits with a non-zero exit code. This is consistent with the
handling of errors in the config file.
This addresses #11603.
## Summary
- add a shared `codex-core` sleep inhibitor that uses native macOS IOKit
assertions (`IOPMAssertionCreateWithName` / `IOPMAssertionRelease`)
instead of spawning `caffeinate`
- wire sleep inhibition to turn lifecycle in `tui` (`TurnStarted`
enables; `TurnComplete` and abort/error finalization disable)
- gate this behavior behind a `/experimental` feature toggle
(`[features].prevent_idle_sleep`) instead of a dedicated `[tui]` config
flag
- expose the toggle in `/experimental` on macOS; keep it under
development on other platforms
- keep behavior no-op on non-macOS targets
<img width="1326" height="577" alt="image"
src="https://github.com/user-attachments/assets/73fac06b-97ae-46a2-800a-30f9516cf8a3"
/>
## Testing
- `cargo check -p codex-core -p codex-tui`
- `cargo test -p codex-core sleep_inhibitor::tests -- --nocapture`
- `cargo test -p codex-core
tui_config_missing_notifications_field_defaults_to_enabled --
--nocapture`
- `cargo test -p codex-core prevent_idle_sleep_is_ -- --nocapture`
## Semantics and API references
- This PR targets `caffeinate -i` semantics: prevent *idle system sleep*
while allowing display idle sleep.
- `caffeinate -i` mapping in Apple open source (`assertionMap`):
- `kIdleAssertionFlag -> kIOPMAssertionTypePreventUserIdleSystemSleep`
- Source:
https://github.com/apple-oss-distributions/PowerManagement/blob/PowerManagement-1846.60.12/caffeinate/caffeinate.c#L52-L54
- Apple IOKit docs for assertion types and API:
-
https://developer.apple.com/documentation/iokit/iopmlib_h/iopmassertiontypes
-
https://developer.apple.com/documentation/iokit/1557092-iopmassertioncreatewithname
- https://developer.apple.com/library/archive/qa/qa1340/_index.html
## Codex Electron vs this PR (full stack path)
- Codex Electron app requests sleep blocking with
`powerSaveBlocker.start("prevent-app-suspension")`:
-
https://github.com/openai/codex/blob/main/codex/codex-vscode/electron/src/electron-message-handler.ts
- Electron maps that string to Chromium wake lock type
`kPreventAppSuspension`:
-
https://github.com/electron/electron/blob/main/shell/browser/api/electron_api_power_save_blocker.cc
- Chromium macOS backend maps wake lock types to IOKit assertion
constants and calls IOKit:
- `kPreventAppSuspension -> kIOPMAssertionTypeNoIdleSleep`
- `kPreventDisplaySleep / kPreventDisplaySleepAllowDimming ->
kIOPMAssertionTypeNoDisplaySleep`
-
https://github.com/chromium/chromium/blob/main/services/device/wake_lock/power_save_blocker/power_save_blocker_mac.cc
## Why this PR uses a different macOS constant name
- This PR uses `"PreventUserIdleSystemSleep"` directly, via
`IOPMAssertionCreateWithName`, in
`codex-rs/core/src/sleep_inhibitor.rs`.
- Apple’s IOKit header documents `kIOPMAssertionTypeNoIdleSleep` as
deprecated and recommends `kIOPMAssertPreventUserIdleSystemSleep` /
`kIOPMAssertionTypePreventUserIdleSystemSleep`:
-
https://github.com/apple-oss-distributions/IOKitUser/blob/IOKitUser-100222.60.2/pwr_mgt.subproj/IOPMLib.h#L1000-L1030
- So Chromium and this PR use different constant names but get
semantically equivalent idle-system-sleep prevention behavior.
## Future platform support
The architecture is intentionally set up for multi-platform extensions:
- UI code (`tui`) only calls `SleepInhibitor::set_turn_running(...)` on
turn lifecycle boundaries.
- Platform-specific behavior is isolated in
`codex-rs/core/src/sleep_inhibitor.rs` behind `cfg(...)` blocks.
- Feature exposure is centralized in `core/src/features.rs` and surfaced
via `/experimental`.
- Adding new OS backends should not require additional TUI wiring; only
the backend internals and feature stage metadata need to change.
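A minimal sketch of that layering, assuming a trait-based backend in place of the real `cfg(...)` blocks. `SleepBackend`, `NoopBackend`, and `CountingBackend` are hypothetical names used only for illustration; only `SleepInhibitor::set_turn_running` comes from this PR, and its real signature may differ.

```rust
/// Hypothetical platform backend: acquires or releases an OS-level
/// "prevent idle sleep" assertion. The real module selects its backend
/// with `cfg(...)` blocks; a trait stands in for that here.
trait SleepBackend {
    fn acquire(&mut self);
    fn release(&mut self);
}

/// Fallback for targets without an implementation: does nothing.
#[derive(Default)]
struct NoopBackend;

impl SleepBackend for NoopBackend {
    fn acquire(&mut self) {}
    fn release(&mut self) {}
}

/// Demo backend that counts calls instead of touching the OS.
#[derive(Default)]
struct CountingBackend {
    acquired: u32,
    released: u32,
}

impl SleepBackend for CountingBackend {
    fn acquire(&mut self) {
        self.acquired += 1;
    }
    fn release(&mut self) {
        self.released += 1;
    }
}

struct SleepInhibitor<B: SleepBackend> {
    enabled: bool, // mirrors the feature toggle
    active: bool,  // whether an assertion is currently held
    backend: B,
}

impl<B: SleepBackend> SleepInhibitor<B> {
    fn new(enabled: bool, backend: B) -> Self {
        Self { enabled, active: false, backend }
    }

    /// The only entry point the UI needs: called on turn start and turn end.
    fn set_turn_running(&mut self, running: bool) {
        if !self.enabled {
            return;
        }
        match (running, self.active) {
            (true, false) => {
                self.backend.acquire();
                self.active = true;
            }
            (false, true) => {
                self.backend.release();
                self.active = false;
            }
            _ => {} // already in the requested state
        }
    }
}
```

With this shape, repeated `set_turn_running(true)` calls hold a single assertion, and a new OS only needs another `SleepBackend` implementation.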
Potential follow-up implementations:
- Windows:
- Add a backend using Win32 power APIs
(`SetThreadExecutionState(ES_CONTINUOUS | ES_SYSTEM_REQUIRED)` as
baseline).
- Optionally move to `PowerCreateRequest` / `PowerSetRequest` /
`PowerClearRequest` for richer assertion semantics.
- Linux:
- Add a backend using logind inhibitors over D-Bus
(`org.freedesktop.login1.Manager.Inhibit` with `what="sleep"`).
- Keep a no-op fallback where logind/D-Bus is unavailable.
This PR keeps the cross-platform API surface minimal so future PRs can
add Windows/Linux support incrementally with low churn.
---------
Co-authored-by: jif-oai <jif@openai.com>
## Summary
- Limit `search_tool_bm25` indexing to `codex_apps` tools only, so
non-Apps MCP servers are no longer discoverable through this search
path.
- Move search-tool discovery guidance into the `search_tool_bm25` tool
description (via template include) instead of injecting it as a separate
developer message.
- Update Apps discovery guidance wording to clarify when to use
`search_tool_bm25` for Apps-backed systems (for example Slack, Google
Drive, Jira, Notion) and when to call tools directly.
- Remove dead `core` helper code (`filter_codex_apps_mcp_tools` and
`codex_apps_connector_id`) that is no longer used after the
tool-selection refactor.
- Update `core` search-tool tests to assert codex-apps-only behavior and
to validate guidance from the tool description.
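As a rough illustration of the first bullet, the indexing filter could look like the sketch below. `ToolEntry` and `tools_to_index` are hypothetical names, and the assumption that Apps tools are identified by a server name equal to `codex_apps` is inferred from the wording above, not taken from the code.

```rust
/// Hypothetical, simplified view of one MCP tool.
#[derive(Debug, Clone, PartialEq)]
struct ToolEntry {
    server: String,
    name: String,
}

/// Assumed server name for Apps-backed tools.
const CODEX_APPS_SERVER: &str = "codex_apps";

/// Keep only tools served by the Apps MCP server, so tools from other
/// MCP servers never enter the search index.
fn tools_to_index(all_tools: &[ToolEntry]) -> Vec<ToolEntry> {
    all_tools
        .iter()
        .filter(|tool| tool.server == CODEX_APPS_SERVER)
        .cloned()
        .collect()
}
```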
## Validation
- ✅ `just fmt`
- ✅ `cargo test -p codex-core search_tool`
- ⚠️ `cargo test -p codex-core` was attempted, but the run repeatedly
stalled on
`tools::js_repl::tests::js_repl_can_attach_image_via_view_image_tool`.
## Tickets
- None
Summary
- make the phase1 memories schema require `rollout_slug` while still
allowing it to be `null`
- update the corresponding test to check the required fields and
nullable type list
Testing
- Not run (not requested)
## Summary
If the model suggests a bad rule, don't show it to the user. This does
not impact the parsing of existing rules, just the ones we show.
## Testing
- [x] Added unit tests
- [x] Ran locally
When `app/list` is called with `force_refetch=True`, we should seed the
results with what is already cached instead of starting from an empty
list. Otherwise when we send app/list/updated events, the client will
first see an empty list of accessible apps and then get the updated one.
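A minimal sketch of the seeding idea, assuming apps are tracked by ID. `AppListRefresh` and its methods are hypothetical and only illustrate the ordering of what a client would observe.

```rust
/// Hypothetical in-progress state for one forced `app/list` refresh.
struct AppListRefresh {
    accessible: Vec<String>,
}

impl AppListRefresh {
    /// Start from whatever is cached (if anything) instead of an empty list.
    fn start_seeded(cached: Option<&[String]>) -> Self {
        Self {
            accessible: cached.map(|apps| apps.to_vec()).unwrap_or_default(),
        }
    }

    /// Snapshot that would be sent in an `app/list/updated` notification.
    fn snapshot(&self) -> Vec<String> {
        self.accessible.clone()
    }

    /// Replace the seed once the freshly fetched list arrives.
    fn apply_fetched(&mut self, fetched: Vec<String>) {
        self.accessible = fetched;
    }
}
```

Any notification sent before the fetch completes then carries the cached apps, so the client never sees a transient empty list.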
We've had a few cases recently where someone enabled a feature flag for
a feature that's still under development or experimental. This test
should prevent this.
### Motivation
- Git subcommands were being classified as "dangerous", which caused
benign developer workflows (for example `git push --force-with-lease`)
to be blocked by the preflight policy.
- The change aligns behavior with the intent to reserve the dangerous
checklist for truly destructive shell ops (e.g. `rm -rf`) and avoid
surprising developer-facing blocks.
### Description
- Remove git-specific subcommand checks from
`is_dangerous_to_call_with_exec` in
`codex-rs/shell-command/src/command_safety/is_dangerous_command.rs`,
leaving only explicit `rm` and `sudo` passthrough checks.
- Deleted the git-specific helper logic that classified `reset`,
`branch`-delete, `push` (force/delete/refspec) and `clean --force` as
dangerous.
- Updated unit tests in the same file to assert that various `git
reset`/`git branch`/`git push`/`git clean` variants are no longer
classified as dangerous.
- Kept `find_git_subcommand` (used by safe-command classification)
intact so safe/unsafe parsing elsewhere remains functional.
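For reviewers, the remaining check is roughly of the following shape. This is a sketch, not the code in `is_dangerous_command.rs`: the function name is changed to make that clear, and the exact `rm` flags matched are an assumption.

```rust
/// Sketch of a "dangerous command" check with only `rm` and `sudo` handling.
fn is_dangerous_sketch(command: &[String]) -> bool {
    match command.first().map(String::as_str) {
        // Forced removal; the flag set matched here is an assumption.
        Some("rm") => matches!(command.get(1).map(String::as_str), Some("-f" | "-rf")),
        // `sudo <cmd>` is treated exactly like `<cmd>`.
        Some("sudo") => is_dangerous_sketch(&command[1..]),
        // Everything else, including all git subcommands, is not flagged here.
        _ => false,
    }
}

/// Small helper to build an argv from string literals.
fn argv(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|part| part.to_string()).collect()
}
```

Under this sketch, `git push --force-with-lease` and `git reset --hard` both fall through to the final arm.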
### Testing
- Ran formatter with `just fmt` successfully.
- Ran unit tests with `cargo test -p codex-shell-command` and all tests
passed (`144 passed; 0 failed`).
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_698d19dedb4883299c3ceb5bbc6a0dcf)
## Summary
This PR delivers the first small, shippable step toward model-visible
state diffing by making
`TurnContextItem` more complete and standardizing how it is built.
Specifically, it:
- Adds persisted network context to `TurnContextItem`.
- Introduces a single canonical `TurnContext -> TurnContextItem`
conversion path.
- Routes existing rollout write sites through that canonical conversion
helper.
No context injection/diff behavior changes are included in this PR.
## Why this change
The design goal is to make `TurnContextItem` the canonical source of
truth for context-diff
decisions.
Before this PR:
- `TurnContextItem` did not include all TurnContext-derived environment
inputs needed for v1
completeness.
- Construction was duplicated at multiple write sites.
This PR addresses both with a minimal, reviewable change.
## Changes
### 1) Extend `TurnContextItem` with network state
- Added `TurnContextNetworkItem { allowed_domains, denied_domains }`.
- Added `network: Option<TurnContextNetworkItem>` to `TurnContextItem`.
- Kept backward compatibility by making the new field optional and
skipped when absent.
Files:
- `codex-rs/protocol/src/protocol.rs`
### 2) Canonical conversion helper
- Added `TurnContext::to_turn_context_item(collaboration_mode)` in core.
- Added internal helper to derive network fields from
`config_layer_stack.requirements().network`.
Files:
- `codex-rs/core/src/codex.rs`
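A simplified sketch of the conversion, with assumptions called out: the structs are trimmed to a few fields, `NetworkRequirements` is a hypothetical stand-in for what `config_layer_stack.requirements().network` yields, the domain fields are assumed to be string lists, and the `collaboration_mode` parameter is omitted.

```rust
#[derive(Debug, Clone, PartialEq)]
struct TurnContextNetworkItem {
    allowed_domains: Vec<String>,
    denied_domains: Vec<String>,
}

/// Trimmed-down persisted item; the real struct has more fields.
#[derive(Debug, Clone, PartialEq)]
struct TurnContextItem {
    cwd: String,
    model: String,
    network: Option<TurnContextNetworkItem>,
}

/// Hypothetical stand-in for the network requirements from the config layers.
struct NetworkRequirements {
    allowed_domains: Option<Vec<String>>,
    denied_domains: Option<Vec<String>>,
}

/// Trimmed-down in-memory turn context.
struct TurnContext {
    cwd: String,
    model: String,
    network_requirements: Option<NetworkRequirements>,
}

impl TurnContext {
    /// Single place where the persisted item is built from the turn context.
    fn to_turn_context_item(&self) -> TurnContextItem {
        TurnContextItem {
            cwd: self.cwd.clone(),
            model: self.model.clone(),
            // `None` when no network requirements are configured.
            network: self.network_requirements.as_ref().map(|req| TurnContextNetworkItem {
                allowed_domains: req.allowed_domains.clone().unwrap_or_default(),
                denied_domains: req.denied_domains.clone().unwrap_or_default(),
            }),
        }
    }
}
```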
### 3) Use canonical conversion at rollout write sites
- Replaced ad hoc `TurnContextItem { ... }` construction with
`to_turn_context_item(...)` in:
- sampling request path
- compaction path
Files:
- `codex-rs/core/src/codex.rs`
- `codex-rs/core/src/compact.rs`
### 4) Update fixtures/tests for new optional field
- Updated existing `TurnContextItem` literals in tests to include
`network: None`.
- Added protocol tests for:
- deserializing old payloads with no `network`
- serializing when `network` is present
Files:
- `codex-rs/core/tests/suite/resume_warning.rs`
## Behavior and compatibility
- No replay/diff logic changes.
- Persisted rollout `TurnContextItem` now carries additional network
context when available.
- Older rollout lines without `network` remain readable.
Adds a new `apps_mcp_gateway` flag to route Apps MCP calls through
https://api.openai.com/v1/connectors/mcp/ when enabled, while keeping
legacy MCP routing as default.
## Why
We currently carry multiple permission-related concepts directly on
`Config` for shell/unified-exec behavior (`approval_policy`,
`sandbox_policy`, `network`, `shell_environment_policy`,
`windows_sandbox_mode`).
Consolidating these into one in-memory struct makes permission handling
easier to reason about and sets up the next step: supporting named
permission profiles (`[permissions.PROFILE_NAME]`) without changing
behavior now.
This change is mostly mechanical: it updates existing callsites to go
through `config.permissions`, but it does not yet refactor those
callsites to take a single `Permissions` value in places where multiple
permission fields are still threaded separately.
This PR intentionally **does not** change the on-disk `config.toml`
format yet and keeps compatibility with legacy config keys.
## What Changed
- Introduced `Permissions` in `core/src/config/mod.rs`.
- Added `Config::permissions` and moved effective runtime permission
fields under it:
- `approval_policy`
- `sandbox_policy`
- `network`
- `shell_environment_policy`
- `windows_sandbox_mode`
- Updated config loading/building so these effective values are still
derived from the same existing config inputs and constraints.
- Updated Windows sandbox helpers/resolution to read/write via
`permissions`.
- Threaded the new field through all permission consumers across core
runtime, app-server, CLI/exec, TUI, and sandbox summary code.
- Updated affected tests to reference `config.permissions.*`.
- Renamed the struct/field from
`EffectivePermissions`/`effective_permissions` to
`Permissions`/`permissions` and aligned variable naming accordingly.
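To make the new shape concrete, here is a sketch with placeholder types. The five field names come from the list above; the newtype wrappers, the `Option`s, and `summarize_permissions` are assumptions for illustration only.

```rust
// Placeholder types: the real ones are richer enums/structs.
#[derive(Debug, Clone, PartialEq)]
struct ApprovalPolicy(String);
#[derive(Debug, Clone, PartialEq)]
struct SandboxPolicy(String);
#[derive(Debug, Clone, PartialEq)]
struct NetworkConfig(String);
#[derive(Debug, Clone, PartialEq)]
struct ShellEnvironmentPolicy(String);
#[derive(Debug, Clone, PartialEq)]
struct WindowsSandboxMode(String);

/// Effective runtime permissions, grouped in one in-memory struct.
#[derive(Debug, Clone, PartialEq)]
struct Permissions {
    approval_policy: ApprovalPolicy,
    sandbox_policy: SandboxPolicy,
    network: Option<NetworkConfig>,
    shell_environment_policy: ShellEnvironmentPolicy,
    windows_sandbox_mode: Option<WindowsSandboxMode>,
}

/// Trimmed-down config: everything permission-related hangs off `permissions`.
struct Config {
    model: String,
    permissions: Permissions,
}

/// Example call site: previously `config.approval_policy`,
/// now `config.permissions.approval_policy`.
fn summarize_permissions(config: &Config) -> String {
    format!(
        "approval={} sandbox={}",
        config.permissions.approval_policy.0, config.permissions.sandbox_policy.0
    )
}
```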
## Verification
- `just fix -p codex-core -p codex-tui -p codex-cli -p codex-app-server
-p codex-exec -p codex-utils-sandbox-summary`
- `cargo build -p codex-core -p codex-tui -p codex-cli -p
codex-app-server -p codex-exec -p codex-utils-sandbox-summary`
## Summary
In an effort to start simplifying our sandbox setup, we're marking
this `approval_policy` as deprecated. In general, it performs worse than
`on-request`, and we're focusing on making fewer sandbox configurations
perform much better.
## Testing
- [x] Tested locally
- [x] Existing tests pass
There is an edge case where a directory is not readable by the sandbox.
In practice, we've seen very little of it, but it can happen, so this
slash command unblocks users when it does.
A future idea is to make this a tool that the agent knows about so it
can be more integrated.
## Summary
This PR adds host-integrated helper APIs for `js_repl` and updates model
guidance so the agent can use them reliably.
### What’s included
- Add `codex.tool(name, args?)` in the JS kernel so `js_repl` can call
normal Codex tools.
- Keep persistent JS state and scratch-path helpers available:
- `codex.state`
- `codex.tmpDir`
- Wire `js_repl` tool calls through the standard tool router path.
- Add/align `js_repl` execution completion/end event behavior with
existing tool logging patterns.
- Update dynamic prompt injection (`project_doc`) to document:
- how to call `codex.tool(...)`
- raw output behavior
- image flow via `view_image` (`codex.tmpDir` +
`codex.tool("view_image", ...)`)
- stdio safety guidance (`console.log` / `codex.tool`, avoid direct
`process.std*`)
## Why
- Standardize JS-side tool usage on `codex.tool(...)`
- Make `js_repl` behavior more consistent with existing tool execution
and event/logging patterns.
- Give the model enough runtime guidance to use `js_repl` safely and
effectively.
## Testing
- Added/updated unit and runtime tests for:
- `codex.tool` calls from `js_repl` (including shell/MCP paths)
- image handoff flow via `view_image`
- prompt-injection text for `js_repl` guidance
- execution/end event behavior and related regression coverage
#### [git stack](https://github.com/magus/git-stack-cli)
- ✅ `1` https://github.com/openai/codex/pull/10674
- 👉 `2` https://github.com/openai/codex/pull/10672
- ⏳ `3` https://github.com/openai/codex/pull/10671
- ⏳ `4` https://github.com/openai/codex/pull/10673
- ⏳ `5` https://github.com/openai/codex/pull/10670
This PR adds an experimental `persist_extended_history` bool flag to
app-server thread APIs so rollout logs can retain a richer set of
EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (e.g.
on `thread/resume`).
### Motivation
Today, our rollout recorder only persists a small subset (e.g. user
message, reasoning, assistant message) of `EventMsg` types, dropping a
good number (like command exec, file change, etc.) that are important
for reconstructing full item history for `thread/resume`, `thread/read`,
and `thread/fork`.
Some clients want to be able to resume a thread without lossiness. This
lossiness is primarily a UI thing, since what the model sees are
`ResponseItem` and not `EventMsg`.
### Approach
This change introduces an opt-in `persist_extended_history` flag to
preserve those events when you start/resume/fork a thread (defaults to
`false`).
This is done by adding an `EventPersistenceMode` to the rollout
recorder:
- `Limited` (existing behavior, default)
- `Extended` (new opt-in behavior)
In `Extended` mode, persist additional `EventMsg` variants needed for
non-lossy app-server `ThreadItem` reconstruction. We now store the
following ThreadItems that we didn't before:
- web search
- command execution
- patch/file changes
- MCP tool calls
- image view calls
- collab tool outcomes
- context compaction
- review mode enter/exit
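The mode split can be pictured with the sketch below. `EventKind` is a hypothetical, flattened stand-in for the real `EventMsg` variants, and `should_persist` is an illustrative name, not the recorder's actual function.

```rust
/// Persistence mode from this PR: `Limited` is the existing default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum EventPersistenceMode {
    #[default]
    Limited,
    Extended,
}

/// Hypothetical, flattened event categories for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventKind {
    UserMessage,
    AgentMessage,
    AgentReasoning,
    WebSearch,
    CommandExecution,
    PatchApply,
    McpToolCall,
    ImageView,
    CollabToolOutcome,
    ContextCompaction,
    ReviewMode,
}

/// Decide whether an event of this kind is written to the rollout file.
fn should_persist(kind: EventKind, mode: EventPersistenceMode) -> bool {
    use EventKind::*;
    match kind {
        // Always persisted, regardless of mode.
        UserMessage | AgentMessage | AgentReasoning => true,
        // Only persisted when the caller opted in to extended history.
        WebSearch | CommandExecution | PatchApply | McpToolCall | ImageView
        | CollabToolOutcome | ContextCompaction | ReviewMode => {
            mode == EventPersistenceMode::Extended
        }
    }
}
```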
For **command executions** in particular, we truncate the output using
the existing `truncate_text` from core to store an upper bound of 10,000
bytes, which is also the default value for truncating tool outputs shown
to the model. This keeps the size of the rollout file and command
execution items returned over the wire reasonable.
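To illustrate the byte bound, here is a minimal UTF-8-safe prefix truncation. This is not the `truncate_text` helper from core (whose exact behavior is not described here); `truncate_to_byte_limit` and `MAX_OUTPUT_BYTES` are hypothetical names.

```rust
/// Upper bound quoted in the PR description.
const MAX_OUTPUT_BYTES: usize = 10_000;

/// Return the longest prefix of `text` that fits in `max_bytes`
/// without splitting a multi-byte UTF-8 character.
fn truncate_to_byte_limit(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}
```

Backing off to a char boundary matters because slicing a `&str` in the middle of a multi-byte character panics.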
We also persist `EventMsg::Error`, which we can now map back to the
Turn's status and use to populate the Turn's error metadata.
#### Updates to EventMsgs
To truly make `thread/resume` non-lossy, we also needed to persist the
`status` on `EventMsg::CommandExecutionEndEvent` and
`EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
command failed or was declined (and similarly for `apply_patch`). These
EventMsgs were never persisted before, so I made `status` a required
field.
This PR introduces a skill-expansion mechanism for mentions so that
nested skill or connector mentions are expanded if present in skills
invoked by the user. This keeps behavior aligned with existing mention
handling
while extending coverage to deeper scenarios. With these changes, users
can create skills that invoke connectors, and skills that invoke other
skills.
Replaces #10863, which is not needed with the addition of
[search_tool_bm25](https://github.com/openai/codex/issues/10657)
## Summary
Preserve the specified model slug when we get a prefix-based match
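A sketch of the intended behavior, under stated assumptions: `ModelInfo`, `resolve_model`, and the longest-prefix rule are hypothetical, and the slugs in the usage note are made up.

```rust
/// Hypothetical model metadata record.
#[derive(Debug, Clone, PartialEq)]
struct ModelInfo {
    slug: String,
    context_window: u64,
}

/// Look up metadata for `requested`. On a prefix-based match, reuse the
/// matched entry's metadata but keep the slug the caller asked for.
fn resolve_model(requested: &str, known: &[ModelInfo]) -> Option<ModelInfo> {
    // An exact match wins outright.
    if let Some(exact) = known.iter().find(|m| m.slug == requested) {
        return Some(exact.clone());
    }
    // Otherwise take the longest known slug that prefixes the request
    // (the longest-prefix tie-break is an assumption of this sketch).
    known
        .iter()
        .filter(|m| requested.starts_with(m.slug.as_str()))
        .max_by_key(|m| m.slug.len())
        .map(|m| ModelInfo {
            slug: requested.to_string(),
            ..m.clone()
        })
}
```

For example, resolving `example-model-2025-01-01` against a known `example-model` entry would return that entry's metadata with the slug left as `example-model-2025-01-01`.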
## Testing
- [x] added unit test
---------
Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
Summary
- trim `state_db::list_threads_db` results to entries whose rollout
files still exist, logging and recording a discrepancy for dropped rows
- delete stale metadata rows from the SQLite store so future calls don’t
surface invalid paths
- add regression coverage in `recorder.rs` to verify stale DB paths are
dropped when the file is missing
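The filtering step amounts to something like the sketch below. `ThreadRow` and `split_by_rollout_presence` are hypothetical names; the real code also logs and records a discrepancy, and deletes the stale rows from SQLite, which is left out here.

```rust
use std::path::PathBuf;

/// Hypothetical, simplified metadata row for one thread.
#[derive(Debug, Clone, PartialEq)]
struct ThreadRow {
    id: String,
    rollout_path: PathBuf,
}

/// Split rows into (live, stale) depending on whether the rollout file
/// still exists on disk. Callers would return `live` and delete `stale`.
fn split_by_rollout_presence(rows: Vec<ThreadRow>) -> (Vec<ThreadRow>, Vec<ThreadRow>) {
    rows.into_iter().partition(|row| row.rollout_path.exists())
}
```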
Summary
- address the nondeterministic behavior observed in
`pre_sampling_compact_runs_on_switch_to_smaller_context_model` so it no
longer fails intermittently during model switches
- ensure the surrounding sampling logic consistently handles the
smaller-context case that the test exercises
Testing
- Not run (not requested)
## Why
`project_doc::tests::skills_are_appended_to_project_doc` and
`project_doc::tests::skills_render_without_project_doc` were assuming a
single synthetic skill in test setup, but they called
`load_skills(&cfg)`, which loads from repo/user/system roots.
That made the assertions environment-dependent. After
[#11531](https://github.com/openai/codex/pull/11531) added
`.codex/skills/test-tui/SKILL.md`, the repo-scoped `test-tui` skill
began appearing in these test outputs and exposed the flake.
## What Changed
- Added a test-only helper in `codex-rs/core/src/project_doc.rs` that
loads skills from an explicit root via `load_skills_from_roots`.
- Scoped that root to `codex_home/skills` with `SkillScope::User`.
- Updated both affected tests to use this helper instead of
`load_skills(&cfg)`:
- `skills_are_appended_to_project_doc`
- `skills_render_without_project_doc`
This keeps the tests focused on the fixture skills they create,
independent of ambient repo/home skills.
## Verification
- `cargo test -p codex-core
project_doc::tests::skills_render_without_project_doc -- --exact`
- `cargo test -p codex-core
project_doc::tests::skills_are_appended_to_project_doc -- --exact`