Commit Graph

6375 Commits

Author SHA1 Message Date
Ahmed Ibrahim
c3e22fe134 Tighten SDK integration assertions
Assert skill inputs as persisted structured history and keep run override coverage to the model request plus token usage, matching the public SDK behavior exercised by the harness.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 15:17:20 +03:00
Ahmed Ibrahim
d77f543654 Cover SDK app-server integration gaps
Add focused integration coverage for thread listing, persisted history reads, async lifecycle wrappers, skill input injection, and run override/usage behavior through the pinned app-server test harness.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 15:14:45 +03:00
Ahmed Ibrahim
b9cd273f8d Shorten app-server integration test names
Rename the split Python SDK app-server integration files and helper module to concise group names.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 14:53:02 +03:00
Ahmed Ibrahim
280d690137 Materialize fork lifecycle integration test
Seed the fork test with a real turn so the pinned app-server has a persisted rollout before thread/fork runs.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 14:49:46 +03:00
Ahmed Ibrahim
57edbbf4eb Split pinned app-server integration tests by behavior
Break the large integration test module into focused run, input, stream, turn-control, approval-mode, and lifecycle files with shared helpers for the mock Responses boundary.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 14:47:06 +03:00
Ahmed Ibrahim
daf4694e4a Align multimodal integration assertion
Assert the prompt text is present alongside app-server image wrapper text while keeping the request image checks on the real Responses payload.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 14:31:25 +03:00
Ahmed Ibrahim
4e9b97865f Fix new SDK integration assertions
Assert the latest user multimodal payload after history replay and seed a rollout before exercising archive lifecycle helpers.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 14:29:25 +03:00
Ahmed Ibrahim
ad23385e39 Add more SDK app-server integration coverage
Add new harness coverage for multimodal inputs, active turn controls, and archive lifecycle behavior through the pinned app-server.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 14:26:40 +03:00
Ahmed Ibrahim
feffa481bd Fix app-server integration expectations
Seed approval inheritance coverage with a real persisted turn and align compaction coverage with the pinned runtime's model request path.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 14:08:19 +03:00
Ahmed Ibrahim
5d0ba5a0be Port SDK behavior tests to app-server harness
Move result extraction, stream_text, approval inheritance, model list, and compact coverage onto the pinned app-server integration harness so the remaining unit tests stay focused on generated models and transport internals.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 14:04:55 +03:00
Ahmed Ibrahim
a248323b7d Relax mock integration assertions
Assert the stable parts of the pinned app-server behavior: the user prompt appears as the final user input, approval overrides update the stored policy, and thread lifecycle coverage does not depend on thread/list indexing.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 13:40:17 +03:00
Ahmed Ibrahim
ebf625c43d Stabilize app-server integration expectations
Make the new Python SDK integration tests assert stable app-server behavior: filter run result items to agent messages, accept either ordering for concurrent mock Responses requests, and avoid lifecycle operations that require a persisted rollout before one exists.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 13:37:25 +03:00
Ahmed Ibrahim
cad4bbdd64 Add Python SDK mock app-server integration tests
Build deterministic Python SDK integration coverage around the pinned app-server runtime and a local mock Responses server. Port behavioral coverage off direct SDK monkeypatches where the real app-server boundary is more useful.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 13:32:31 +03:00
Ahmed Ibrahim
f3e16de572 Update approval mode run expectations
Co-authored-by: Codex <noreply@openai.com>
2026-05-10 13:06:56 +03:00
Ahmed Ibrahim
70053fbe42 Preserve approval settings by default
Co-authored-by: Codex <noreply@openai.com>
2026-05-10 13:05:01 +03:00
Ahmed Ibrahim
3b0b5a58e1 Reduce approval mode test mocking
Replace fake sync and async client approval tests with direct serialization checks using generated TurnStartParams, while keeping existing run-path coverage for the default behavior.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 12:45:11 +03:00
Ahmed Ibrahim
934eda61c3 Make approval mode mapping exhaustive
Use an explicit match over ApprovalMode values while keeping a separate runtime validation error for non-enum inputs.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 12:36:52 +03:00
Ahmed Ibrahim
bfd11aa1fc Focus Python SDK approval mode
Default high-level thread and turn starts to auto-review, keep deny_all as the explicit opt-out, and remove the generated AskForApproval alias customization.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 12:06:14 +03:00
Ahmed Ibrahim
800aa1d6ba Add approval mode contract tests
Cover the exact public ApprovalMode values and ensure unsupported modes fail before sync or async high-level APIs can issue client requests.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 11:58:13 +03:00
Ahmed Ibrahim
ffe6e44a03 Add high-level Python SDK approval mode
Expose approval_mode with deny_all and auto_review options on the high-level Python SDK, and map those choices to generated app-server approval params internally.

Update examples, docs, notebooks, and public API tests to use the new mode instead of raw generated approval fields.

Co-authored-by: Codex <noreply@openai.com>
2026-05-10 11:44:34 +03:00
Ahmed Ibrahim
7edbdc555c Add approval callback TODO
Co-authored-by: Codex <noreply@openai.com>
2026-05-09 13:49:35 +03:00
Ahmed Ibrahim
d80a43263f Default Python SDK approval policy to never
Co-authored-by: Codex <noreply@openai.com>
2026-05-09 12:03:09 +03:00
Ahmed Ibrahim
78c0d5ca3d Rename Python SDK package to openai-codex
Co-authored-by: Codex <noreply@openai.com>
2026-05-09 11:38:22 +03:00
Ahmed Ibrahim
9306e60848 Define Python SDK public type surface
Co-authored-by: Codex <noreply@openai.com>
2026-05-09 11:34:46 +03:00
Ahmed Ibrahim
8d7a5c27c1 Keep Python SDK type exports
Co-authored-by: Codex <noreply@openai.com>
2026-05-09 10:39:52 +03:00
Ahmed Ibrahim
692c08faf9 Narrow Python SDK root exports
Co-authored-by: Codex <noreply@openai.com>
2026-05-09 10:35:44 +03:00
Ahmed Ibrahim
8b8e868140 Document Python SDK CI job
Co-authored-by: Codex <noreply@openai.com>
2026-05-09 10:24:29 +03:00
Ahmed Ibrahim
2654cc299e Run Python SDK tests in CI
Add a separate Python SDK runner that installs the pinned musl runtime wheel in an Alpine Python container and runs the SDK pytest suite in parallel with existing SDK checks.

Co-authored-by: Codex <noreply@openai.com>
2026-05-09 10:24:17 +03:00
Ahmed Ibrahim
242ca6d8fd Document pinned schema generation helpers
Co-authored-by: Codex <noreply@openai.com>
2026-05-09 10:24:00 +03:00
Ahmed Ibrahim
b7635f4d77 Generate Python SDK types from pinned runtime
Make the SDK artifact generator fetch schema from the pinned runtime package, regenerate the checked-in Python types from that schema, and assert generated artifacts stay up to date.

Co-authored-by: Codex <noreply@openai.com>
2026-05-09 10:23:41 +03:00
Ahmed Ibrahim
c24694bdb0 Document Python runtime pinning helpers
Co-authored-by: Codex <noreply@openai.com>
2026-05-09 10:23:11 +03:00
Ahmed Ibrahim
6e10973c78 Pin Python SDK runtime dependency
Make the Python SDK declare its published runtime package dependency directly and resolve the runtime version from that pin instead of inferring it from the SDK package version.

Co-authored-by: Codex <noreply@openai.com>
2026-05-09 10:23:11 +03:00
Ahmed Ibrahim
becbd2a127 Document SDK turn routing helpers
Co-authored-by: Codex <noreply@openai.com>
2026-05-09 10:23:06 +03:00
Ahmed Ibrahim
11e31d7d38 Fix Python runtime wheel release args
Build the stage-runtime command as a single non-empty Bash array and append Linux resource binaries conditionally so macOS runners do not expand an empty optional array under set -u.

Co-authored-by: Codex <noreply@openai.com>
2026-05-09 09:24:03 +03:00
Ahmed Ibrahim
1d0023776f Build Python runtime wheels in virtualenvs
Avoid installing build into runner-managed Python environments when release jobs build runtime wheels.

Co-authored-by: Codex <noreply@openai.com>
2026-05-09 09:24:03 +03:00
Ahmed Ibrahim
9b54951688 Make Python runtime publish non-blocking
Allow the Rust release workflow to finish even if the new Python runtime PyPI publish job needs follow-up.

Co-authored-by: Codex <noreply@openai.com>
2026-05-09 09:24:02 +03:00
Ahmed Ibrahim
3a3e1b477c Pin PyPI publish action to release tag commit
Use the v1.13.0 commit for the PyPI publish action so the pinned action reference has a clear release version.

Co-authored-by: Codex <noreply@openai.com>
2026-05-09 09:24:02 +03:00
Ahmed Ibrahim
356c6797b8 Use PyPI environment for runtime publishing
Set the Python runtime publish job environment to match the PyPI trusted publisher configuration.

Co-authored-by: Codex <noreply@openai.com>
2026-05-09 09:24:02 +03:00
Ahmed Ibrahim
bd14ac4758 Bundle Linux bwrap in Python runtime wheels
Pass the release bwrap binary into Linux runtime wheel staging so PyPI installs preserve sandbox fallback behavior.

Co-authored-by: Codex <noreply@openai.com>
2026-05-09 09:24:02 +03:00
Ahmed Ibrahim
d764740e6f Explain Windows runtime wheel helper packaging
Document why the release workflow includes sandbox helper executables in Windows Python runtime wheels.

Co-authored-by: Codex <noreply@openai.com>
2026-05-09 09:24:02 +03:00
Ahmed Ibrahim
29e1c96f72 Publish Python runtime wheels on release
Build platform-specific openai-codex-cli-bin wheels from signed release binaries and publish them to PyPI using trusted publishing.

Co-authored-by: Codex <noreply@openai.com>
2026-05-09 09:24:02 +03:00
Ahmed Ibrahim
ebe75bb683 Route Python SDK turn notifications by ID (#21778)
## Why

The Python SDK previously protected the stdio transport with a single
active turn-consumer guard. That avoided competing reads from stdout,
but it also meant one `Codex`/`AsyncCodex` client could not stream
multiple active turns at the same time. Notifications could also arrive
before the caller received a `TurnHandle` and registered for streaming,
so the SDK needed an explicit routing layer instead of letting
individual API calls read directly from the shared transport.

## What Changed

- Added a private `MessageRouter` that owns per-request response queues,
per-turn notification queues, pending turn-notification replay, and
global notification delivery behind a single stdout reader thread.
- Generated typed notification routing metadata so turn IDs come from
known payload shapes instead of router-side attribute guessing, with
explicit fallback handling for unknown notification payloads.
- Updated sync and async turn streaming so `TurnHandle.stream()`/`run()`
and `stream_text()` consume only notifications for their own turn ID,
while `AsyncAppServerClient` no longer serializes all transport calls
behind one async lock.
- Cleared pending turn-notification buffers when unregistered turns
complete so never-consumed turn handles do not leave stale queues
behind.
- Removed the internal stream-until helper now that turn completion
waiting can register directly with routed turn notifications.
- Updated Python SDK docs and focused tests for concurrent transport
calls, interleaved turn routing, buffered early notifications, unknown
notification routing, async delegation, and routed turn completion
behavior.

## Validation

- `uv run --extra dev ruff format scripts/update_sdk_artifacts.py
src/codex_app_server/_message_router.py src/codex_app_server/client.py
src/codex_app_server/generated/notification_registry.py
tests/test_client_rpc_methods.py
tests/test_public_api_runtime_behavior.py
tests/test_async_client_behavior.py`
- `uv run --extra dev ruff check scripts/update_sdk_artifacts.py
src/codex_app_server/_message_router.py src/codex_app_server/client.py
src/codex_app_server/generated/notification_registry.py
tests/test_client_rpc_methods.py
tests/test_public_api_runtime_behavior.py
tests/test_async_client_behavior.py`
- `uv run --extra dev pytest tests/test_client_rpc_methods.py
tests/test_public_api_runtime_behavior.py
tests/test_async_client_behavior.py`
- `git diff --check`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-09 04:16:23 +00:00
sayan-oai
77d9223e9f [codex] compact network context rendering (#21875)
## Why

The model-visible `<network>` context currently repeats indentation and
a pair of XML tags for every allowed or denied domain. Large domain sets
spend a surprising amount of prompt budget on that scaffolding instead
of the actual policy values.

## What changed

- Render allowed domains as one comma-separated `<allowed>` value
instead of one element per domain.
- Render denied domains the same way.
- Keep the full allow/deny domain sets model-visible while updating the
serialization and settings-update coverage for the denser shape.

## Example

Before:
```xml
<network enabled="true">
  <allowed>api.example.test</allowed>
  <allowed>cdn.example.test</allowed>
  <denied>blocked.example.test</denied>
</network>
```

After:
```xml
<network enabled="true"><allowed>api.example.test,cdn.example.test</allowed><denied>blocked.example.test</denied></network>
```

## Validation

- `cargo test -p codex-core environment_context`
- `cargo test -p codex-core
build_settings_update_items_emits_environment_item_for_network_changes`
- Ran a local `codex` session with a real network context containing 121
allowed domains and 42 denied domains, then inspected the raw prompt
with `raw_token_viewer_cli.py`. With the same domain set, the rendered
`<network>` section shrank from 7,175 characters across 161 lines to
3,666 characters on one line, and the containing environment-context
block fell from 6,428 tokens to 5,379 tokens.
2026-05-09 03:52:48 +00:00
xl-openai
479491ed89 feat: Add role-aware plugin share context APIs (#21867)
Expose discoverability and full share principals in share context, carry
roles through save/updateTargets, hydrate local shared plugin reads, and
keep share URLs only under plugin.shareContext.
2026-05-08 20:46:39 -07:00
pakrym-oai
c579da41b1 Move file watcher out of core (#21290)
## Why

The app-server watcher relocation leaves the generic filesystem watcher
as the last watcher-specific implementation still living inside
`codex-core`. Moving that code to a small crate keeps `codex-core`
focused on thread execution and lets app-server depend on the watcher
without reaching back into core for filesystem watching primitives.

This PR is stacked on #21287.

## What changed

- Added a new `codex-file-watcher` crate containing the existing watcher
implementation and its unit tests.
- Updated app-server `fs_watch`, `skills_watcher`, and listener state to
import watcher types from `codex-file-watcher`.
- Removed the `file_watcher` module and `notify` dependency from
`codex-core`.
- Updated Cargo workspace metadata and `Cargo.lock` for the new internal
crate.

## Validation

- `cargo check -p codex-file-watcher -p codex-core -p codex-app-server`
- `cargo test -p codex-file-watcher`
- `cargo test -p codex-app-server
skills_changed_notification_is_emitted_after_skill_change`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `just fix -p codex-file-watcher`
- `just fix -p codex-core`
- `just fix -p codex-app-server`
2026-05-08 18:19:23 -07:00
pakrym-oai
408e6218ab Reapply "Move skills watcher to app-server" (#21652)
## Why

PR #21460 reverted the earlier move of skills change watching from
`codex-core` into app-server. This reapplies that boundary change so
app-server owns client-facing `skills/changed` notifications and core no
longer carries the watcher.

## What

- Restore the app-server `SkillsWatcher` and register it from thread
listener setup.
- Remove the core-owned skills watcher and its core live-reload
integration surface.
- Restore app-server coverage for `skills/changed` notifications after a
watched skill file changes.

## Validation

- `cargo test -p codex-app-server --test all
suite::v2::skills_list::skills_changed_notification_is_emitted_after_skill_change
-- --exact --nocapture`
- `cargo test -p codex-core --lib --no-run`
2026-05-08 17:41:15 -07:00
Owen Lin
95ca276373 sqlite: no more destructive version bumps (#21847)
## Why

We'd like SQLite state to become required and load-bearing. As a first
step, let's remove the mechanism that allows us to blow away the SQLite
DB on a version bump, and instead rely on graceful migrations.

The original motivation
([PR](https://github.com/openai/codex/pull/10623)) behind this mechanism
was to care less about backwards compatibility while SQLite was being
landed, but I'd say it's quite important now to keep the data in it.

## What changed

- Make `STATE_DB_FILENAME` and `LOGS_DB_FILENAME` the full canonical
filenames: `state_5.sqlite` and `logs_2.sqlite`.
- Remove `STATE_DB_VERSION` / `LOGS_DB_VERSION` and the helper that
constructed filenames from versions.
- Stop `StateRuntime::init` from scanning for or deleting older SQLite
DB filenames at startup.
- Delete the tests that encoded legacy state/logs DB deletion behavior.

## Verification

- `cargo test -p codex-state`
2026-05-08 17:29:44 -07:00
Celia Chen
bd42660cb4 feat: add Bedrock Mantle client agent header (#21840)
## Why

Amazon Bedrock Mantle needs a stable client-agent header so requests
from the built-in Bedrock provider can be identified as coming from
Codex for safety stack.

## What changed

- Added `x-amzn-mantle-client-agent: codex` to the built-in Amazon
Bedrock provider default HTTP headers.
2026-05-08 23:58:41 +00:00
Ruslan Nigmatullin
0c8d42525e [daemon] Add app-server daemon lifecycle management (#20718)
## Why

Desktop and mobile Codex clients need a machine-readable way to
bootstrap and manage `codex app-server` on remote machines reached over
SSH. The same flow is also useful for bringing up app-server with
`remote_control` enabled on a fresh developer machine and keeping that
managed install current without requiring a human session.

## What changed

- add the new experimental `codex-app-server-daemon` crate and wire it
into `codex app-server daemon` lifecycle commands: `start`, `restart`,
`stop`, `version`, and `bootstrap`
- add explicit `enable-remote-control` and `disable-remote-control`
commands that persist the launch setting and restart a running managed
daemon so the change takes effect immediately
- emit JSON success responses for daemon commands so remote callers can
consume them directly
- support a Unix-only pidfile-backed detached backend for lifecycle
management
- assume the standalone `install.sh` layout for daemon-managed binaries
and always launch `CODEX_HOME/packages/standalone/current/codex`
- add bootstrap support for the standalone managed install plus a
detached hourly updater loop
- harden lifecycle management around concurrent operations, pidfile
ownership, stale state cleanup, updater ownership, managed-binary
preflight, Unix-only rejection, forced shutdown after the graceful
window, and updater process-group tracking/cleanup
- document the experimental Unix-only support boundary plus the
standalone bootstrap/update flow in
`codex-rs/app-server-daemon/README.md`

## Verification

- `cargo test -p codex-app-server-daemon -p codex-cli`
- live pid validation on `cb4`: `bootstrap --remote-control`, `restart`,
`version`, `stop`

## Follow-up

- Add updater self-refresh so the long-lived `pid-update-loop` can
replace its own executable image after installing a newer managed Codex
binary.
2026-05-08 16:51:16 -07:00
starr-openai
faa5d4a5e2 Increase exec-server environment transport timeouts (#21825)
## Why

The environment-backed exec-server transport currently hardcodes 5
second connect and initialize timeouts in `client_transport.rs`. That is
short for SSH-backed stdio environments and remote websocket
environments, and there is currently no way to raise those values from
`CODEX_HOME/environments.toml`.

This stacked follow-up raises the default environment transport timeouts
and lets each configured environment override them in
`environments.toml`.

## What Changed

- raise the default environment transport connect and initialize
timeouts from 5s to 10s
- store concrete timeout values on `ExecServerTransportParams` instead
of hardcoding them in `connect_for_transport(...)`
- add `connect_timeout_sec` and `initialize_timeout_sec` to
`[[environments]]` entries in `environments.toml`
- apply parse-time defaults so runtime transport code receives fully
resolved timeout values
- reject `connect_timeout_sec` on stdio environments because it only
applies to websocket transports
- extend parser tests to cover the new fields and defaults

## Stack

- base: https://github.com/openai/codex/pull/21794
- this PR: configurable environment transport timeouts

## Validation

- `cd
/Users/starr/code/codex-worktrees/exec-env-timeouts-config-20260508/codex-rs
&& just fmt`
- not run: tests

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-08 16:33:29 -07:00