Seed the fork test with a real turn so the pinned app-server has a persisted rollout before thread/fork runs.
Co-authored-by: Codex <noreply@openai.com>
Break the large integration test module into focused run, input, stream, turn-control, approval-mode, and lifecycle files with shared helpers for the mock Responses boundary.
Co-authored-by: Codex <noreply@openai.com>
Assert the prompt text is present alongside app-server image wrapper text while keeping the request image checks on the real Responses payload.
Co-authored-by: Codex <noreply@openai.com>
Assert the latest user multimodal payload after history replay and seed a rollout before exercising archive lifecycle helpers.
Co-authored-by: Codex <noreply@openai.com>
Add new harness coverage for multimodal inputs, active turn controls, and archive lifecycle behavior through the pinned app-server.
Co-authored-by: Codex <noreply@openai.com>
Seed approval inheritance coverage with a real persisted turn and align compaction coverage with the pinned runtime's model request path.
Co-authored-by: Codex <noreply@openai.com>
Move result extraction, stream_text, approval inheritance, model list, and compact coverage onto the pinned app-server integration harness so the remaining unit tests stay focused on generated models and transport internals.
Co-authored-by: Codex <noreply@openai.com>
Assert the stable parts of the pinned app-server behavior: the user prompt appears as the final user input, approval overrides update the stored policy, and thread lifecycle coverage does not depend on thread/list indexing.
Co-authored-by: Codex <noreply@openai.com>
Make the new Python SDK integration tests assert stable app-server behavior: filter run result items to agent messages, accept either ordering for concurrent mock Responses requests, and avoid lifecycle operations that require a persisted rollout before one exists.
Co-authored-by: Codex <noreply@openai.com>
Build deterministic Python SDK integration coverage around the pinned app-server runtime and a local mock Responses server. Port behavioral coverage off direct SDK monkeypatches where the real app-server boundary is more useful.
Co-authored-by: Codex <noreply@openai.com>
Replace fake sync and async client approval tests with direct serialization checks using generated TurnStartParams, while keeping existing run-path coverage for the default behavior.
Co-authored-by: Codex <noreply@openai.com>
Use an explicit match over ApprovalMode values while keeping a separate runtime validation error for non-enum inputs.
Co-authored-by: Codex <noreply@openai.com>
Default high-level thread and turn starts to auto-review, keep deny_all as the explicit opt-out, and remove the generated AskForApproval alias customization.
Co-authored-by: Codex <noreply@openai.com>
Cover the exact public ApprovalMode values and ensure unsupported modes fail before sync or async high-level APIs can issue client requests.
Co-authored-by: Codex <noreply@openai.com>
Expose approval_mode with deny_all and auto_review options on the high-level Python SDK, and map those choices to generated app-server approval params internally.
Update examples, docs, notebooks, and public API tests to use the new mode instead of raw generated approval fields.
Co-authored-by: Codex <noreply@openai.com>
Add a separate Python SDK runner that installs the pinned musl runtime wheel in an Alpine Python container and runs the SDK pytest suite in parallel with existing SDK checks.
Co-authored-by: Codex <noreply@openai.com>
Make the SDK artifact generator fetch schema from the pinned runtime package, regenerate the checked-in Python types from that schema, and assert generated artifacts stay up to date.
Co-authored-by: Codex <noreply@openai.com>
Make the Python SDK declare its published runtime package dependency directly and resolve the runtime version from that pin instead of inferring it from the SDK package version.
Co-authored-by: Codex <noreply@openai.com>
Build the stage-runtime command as a single non-empty Bash array and append Linux resource binaries conditionally so macOS runners do not expand an empty optional array under set -u.
Co-authored-by: Codex <noreply@openai.com>
Use the v1.13.0 commit for the PyPI publish action so the pinned action reference has a clear release version.
Co-authored-by: Codex <noreply@openai.com>
Pass the release bwrap binary into Linux runtime wheel staging so PyPI installs preserve sandbox fallback behavior.
Co-authored-by: Codex <noreply@openai.com>
Build platform-specific openai-codex-cli-bin wheels from signed release binaries and publish them to PyPI using trusted publishing.
Co-authored-by: Codex <noreply@openai.com>
## Why
The Python SDK previously protected the stdio transport with a single
active turn-consumer guard. That avoided competing reads from stdout,
but it also meant one `Codex`/`AsyncCodex` client could not stream
multiple active turns at the same time. Notifications could also arrive
before the caller received a `TurnHandle` and registered for streaming,
so the SDK needed an explicit routing layer instead of letting
individual API calls read directly from the shared transport.
## What Changed
- Added a private `MessageRouter` that owns per-request response queues,
per-turn notification queues, pending turn-notification replay, and
global notification delivery behind a single stdout reader thread.
- Generated typed notification routing metadata so turn IDs come from
known payload shapes instead of router-side attribute guessing, with
explicit fallback handling for unknown notification payloads.
- Updated sync and async turn streaming so `TurnHandle.stream()`/`run()`
and `stream_text()` consume only notifications for their own turn ID,
while `AsyncAppServerClient` no longer serializes all transport calls
behind one async lock.
- Cleared pending turn-notification buffers when unregistered turns
complete so never-consumed turn handles do not leave stale queues
behind.
- Removed the internal stream-until helper now that turn completion
waiting can register directly with routed turn notifications.
- Updated Python SDK docs and focused tests for concurrent transport
calls, interleaved turn routing, buffered early notifications, unknown
notification routing, async delegation, and routed turn completion
behavior.
## Validation
- `uv run --extra dev ruff format scripts/update_sdk_artifacts.py
src/codex_app_server/_message_router.py src/codex_app_server/client.py
src/codex_app_server/generated/notification_registry.py
tests/test_client_rpc_methods.py
tests/test_public_api_runtime_behavior.py
tests/test_async_client_behavior.py`
- `uv run --extra dev ruff check scripts/update_sdk_artifacts.py
src/codex_app_server/_message_router.py src/codex_app_server/client.py
src/codex_app_server/generated/notification_registry.py
tests/test_client_rpc_methods.py
tests/test_public_api_runtime_behavior.py
tests/test_async_client_behavior.py`
- `uv run --extra dev pytest tests/test_client_rpc_methods.py
tests/test_public_api_runtime_behavior.py
tests/test_async_client_behavior.py`
- `git diff --check`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
The model-visible `<network>` context currently repeats indentation and
a pair of XML tags for every allowed or denied domain. Large domain sets
spend a surprising amount of prompt budget on that scaffolding instead
of the actual policy values.
## What changed
- Render allowed domains as one comma-separated `<allowed>` value
instead of one element per domain.
- Render denied domains the same way.
- Keep the full allow/deny domain sets model-visible while updating the
serialization and settings-update coverage for the denser shape.
## Example
Before:
```xml
<network enabled="true">
<allowed>api.example.test</allowed>
<allowed>cdn.example.test</allowed>
<denied>blocked.example.test</denied>
</network>
```
After:
```xml
<network enabled="true"><allowed>api.example.test,cdn.example.test</allowed><denied>blocked.example.test</denied></network>
```
## Validation
- `cargo test -p codex-core environment_context`
- `cargo test -p codex-core
build_settings_update_items_emits_environment_item_for_network_changes`
- Ran a local `codex` session with a real network context containing 121
allowed domains and 42 denied domains, then inspected the raw prompt
with `raw_token_viewer_cli.py`. With the same domain set, the rendered
`<network>` section shrank from 7,175 characters across 161 lines to
3,666 characters on one line, and the containing environment-context
block fell from 6,428 tokens to 5,379 tokens.
Expose discoverability and full share principals in share context, carry
roles through save/updateTargets, hydrate local shared plugin reads, and
keep share URLs only under plugin.shareContext.
## Why
The app-server watcher relocation leaves the generic filesystem watcher
as the last watcher-specific implementation still living inside
`codex-core`. Moving that code to a small crate keeps `codex-core`
focused on thread execution and lets app-server depend on the watcher
without reaching back into core for filesystem watching primitives.
This PR is stacked on #21287.
## What changed
- Added a new `codex-file-watcher` crate containing the existing watcher
implementation and its unit tests.
- Updated app-server `fs_watch`, `skills_watcher`, and listener state to
import watcher types from `codex-file-watcher`.
- Removed the `file_watcher` module and `notify` dependency from
`codex-core`.
- Updated Cargo workspace metadata and `Cargo.lock` for the new internal
crate.
## Validation
- `cargo check -p codex-file-watcher -p codex-core -p codex-app-server`
- `cargo test -p codex-file-watcher`
- `cargo test -p codex-app-server
skills_changed_notification_is_emitted_after_skill_change`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `just fix -p codex-file-watcher`
- `just fix -p codex-core`
- `just fix -p codex-app-server`
## Why
PR #21460 reverted the earlier move of skills change watching from
`codex-core` into app-server. This reapplies that boundary change so
app-server owns client-facing `skills/changed` notifications and core no
longer carries the watcher.
## What
- Restore the app-server `SkillsWatcher` and register it from thread
listener setup.
- Remove the core-owned skills watcher and its core live-reload
integration surface.
- Restore app-server coverage for `skills/changed` notifications after a
watched skill file changes.
## Validation
- `cargo test -p codex-app-server --test all
suite::v2::skills_list::skills_changed_notification_is_emitted_after_skill_change
-- --exact --nocapture`
- `cargo test -p codex-core --lib --no-run`
## Why
We'd like SQLite state to become required and load-bearing. As a first
step, let's remove the mechanism that allows us to blow away the SQLite
DB on a version bump, and instead rely on graceful migrations.
The original motivation
([PR](https://github.com/openai/codex/pull/10623)) behind this mechanism
was to care less about backwards compatibility while SQLite was being
landed, but I'd say it's quite important now to keep the data in it.
## What changed
- Make `STATE_DB_FILENAME` and `LOGS_DB_FILENAME` the full canonical
filenames: `state_5.sqlite` and `logs_2.sqlite`.
- Remove `STATE_DB_VERSION` / `LOGS_DB_VERSION` and the helper that
constructed filenames from versions.
- Stop `StateRuntime::init` from scanning for or deleting older SQLite
DB filenames at startup.
- Delete the tests that encoded legacy state/logs DB deletion behavior.
## Verification
- `cargo test -p codex-state`
## Why
Amazon Bedrock Mantle needs a stable client-agent header so requests
from the built-in Bedrock provider can be identified as coming from
Codex for safety stack.
## What changed
- Added `x-amzn-mantle-client-agent: codex` to the built-in Amazon
Bedrock provider default HTTP headers.
## Why
Desktop and mobile Codex clients need a machine-readable way to
bootstrap and manage `codex app-server` on remote machines reached over
SSH. The same flow is also useful for bringing up app-server with
`remote_control` enabled on a fresh developer machine and keeping that
managed install current without requiring a human session.
## What changed
- add the new experimental `codex-app-server-daemon` crate and wire it
into `codex app-server daemon` lifecycle commands: `start`, `restart`,
`stop`, `version`, and `bootstrap`
- add explicit `enable-remote-control` and `disable-remote-control`
commands that persist the launch setting and restart a running managed
daemon so the change takes effect immediately
- emit JSON success responses for daemon commands so remote callers can
consume them directly
- support a Unix-only pidfile-backed detached backend for lifecycle
management
- assume the standalone `install.sh` layout for daemon-managed binaries
and always launch `CODEX_HOME/packages/standalone/current/codex`
- add bootstrap support for the standalone managed install plus a
detached hourly updater loop
- harden lifecycle management around concurrent operations, pidfile
ownership, stale state cleanup, updater ownership, managed-binary
preflight, Unix-only rejection, forced shutdown after the graceful
window, and updater process-group tracking/cleanup
- document the experimental Unix-only support boundary plus the
standalone bootstrap/update flow in
`codex-rs/app-server-daemon/README.md`
## Verification
- `cargo test -p codex-app-server-daemon -p codex-cli`
- live pid validation on `cb4`: `bootstrap --remote-control`, `restart`,
`version`, `stop`
## Follow-up
- Add updater self-refresh so the long-lived `pid-update-loop` can
replace its own executable image after installing a newer managed Codex
binary.
## Why
The environment-backed exec-server transport currently hardcodes 5
second connect and initialize timeouts in `client_transport.rs`. That is
short for SSH-backed stdio environments and remote websocket
environments, and there is currently no way to raise those values from
`CODEX_HOME/environments.toml`.
This stacked follow-up raises the default environment transport timeouts
and lets each configured environment override them in
`environments.toml`.
## What Changed
- raise the default environment transport connect and initialize
timeouts from 5s to 10s
- store concrete timeout values on `ExecServerTransportParams` instead
of hardcoding them in `connect_for_transport(...)`
- add `connect_timeout_sec` and `initialize_timeout_sec` to
`[[environments]]` entries in `environments.toml`
- apply parse-time defaults so runtime transport code receives fully
resolved timeout values
- reject `connect_timeout_sec` on stdio environments because it only
applies to websocket transports
- extend parser tests to cover the new fields and defaults
## Stack
- base: https://github.com/openai/codex/pull/21794
- this PR: configurable environment transport timeouts
## Validation
- `cd
/Users/starr/code/codex-worktrees/exec-env-timeouts-config-20260508/codex-rs
&& just fmt`
- not run: tests
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Support registry-backed remote executors end to end so downstream
services can resolve an executor id into an exec-server URL and make
that environment available to Codex without relying on the legacy cloud
environments flow.
## What changed
- switch remote executor registration to the executor registry bootstrap
contract
- allow named remote environments to be inserted into
`EnvironmentManager` at runtime
- add the experimental app-server RPC `environment/add` so initialized
experimental clients can register those remote environments for later
`thread/start` and `turn/start` selection
## Validation
Ran focused validation locally:
- `cargo test -p codex-exec-server environment_manager_`
- `cargo test -p codex-exec-server
register_executor_posts_with_bearer_token_header`
- `cargo test -p codex-app-server-protocol`
## Why
The app-server daemon work needs two app-server behaviors to be safe
when lifecycle management is driven by a helper process:
- a readiness probe must not become the process-wide client identity
just because it connects first
- a graceful reload signal needs to keep draining active turns even if
it is delivered more than once
## What changed
- Treat `codex_app_server_daemon` initialization as a probe-only client
for process-global originator and user-agent suffix state.
- Distinguish forceable shutdown signals from graceful-only ones, and
treat Unix `SIGHUP` as graceful-only while leaving `SIGTERM` and Ctrl-C
forceable.
- Add regression coverage for daemon probe initialization and repeated
`SIGHUP` delivery while a turn is still running.
## Testing
- `cargo test -p codex-app-server`
- The new daemon-probe and repeated-`SIGHUP` coverage passed.
- The run still failed in the existing
`suite::conversation_summary::get_conversation_summary_by_relative_rollout_path_resolves_from_codex_home`
and
`suite::conversation_summary::get_conversation_summary_by_thread_id_reads_rollout`
tests because their initialize handshake timed out.
- `cargo test -p codex-app-server --test all
suite::conversation_summary::`
- Reproduced the same two existing initialize-timeout failures in
isolation.