Commit Graph

15364 Commits

Author SHA1 Message Date
Michael Bolin
4ca6efdba1 merge commit for archive created by Sapling 2026-05-11 20:08:33 -07:00
Michael Bolin
8210503007 Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 20:08:16 -07:00
Michael Bolin
c8ba58b46a Merge 5d0c7dea61 into sapling-pr-archive-bolinfest 2026-05-11 19:54:59 -07:00
Michael Bolin
5d0c7dea61 Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 19:54:51 -07:00
pakrym-oai
79c65f816c [codex] Filter legacy warning messages during compaction (#22243)
## Why

Older sessions can contain model-warning records persisted as `user`
messages, including the unified exec process-limit warning, the
`apply_patch`-via-`exec_command` warning, and the model-mismatch
high-risk cyber fallback warning. Those warnings are no longer produced
as conversation history items, but when old sessions compact they should
still be recognized as injected context rather than preserved as real
user turns.

## What changed

- Removed `record_model_warning` and the production paths that emitted
these warning messages into conversation history.
- Added `LegacyUnifiedExecProcessLimitWarning`,
`LegacyApplyPatchExecCommandWarning`, and `LegacyModelMismatchWarning`
contextual fragments that are used only for matching old persisted
messages.
- Registered the legacy fragments with contextual user message detection
so compaction filters them through the existing fragment path.
- Added focused compaction coverage for old warning messages being
dropped during compacted-history processing.

## Testing

- `cargo test -p codex-core warning`
- `just fix -p codex-core`
2026-05-11 19:51:51 -07:00
Michael Bolin
58af6a52c4 merge commit for archive created by Sapling 2026-05-11 19:50:03 -07:00
Michael Bolin
5801edb3eb Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 19:49:51 -07:00
Abhinav
d08906a944 Support PreToolUse updatedInput rewrites (#20527)
## Why

`PreToolUse` already exposes `updatedInput` in its hook output schema,
but Codex currently rejects it instead of applying the rewrite. That
leaves hook authors unable to make the documented pre-execution
adjustment to a tool call before it runs.

## What

- Accept `updatedInput` from `PreToolUse` hooks when paired with
`permissionDecision: "allow"`.
- Apply the rewritten input before dispatch so the tool executes the
updated payload, not the original one.
- Preserve the stable hook-facing compatibility shapes that
participating tool handlers expose:
- Bash-like tools (`shell`, `container.exec`, `local_shell`,
`shell_command`, `exec_command`) use `{ "command": ... }`.
- `apply_patch` exposes its patch body through the same command-shaped
hook contract.
  - MCP tools expose their JSON argument object directly.
- Keep each participating tool handler responsible for translating
hook-facing `updatedInput` back into its concrete invocation shape.

## Verification

Direct Bash-like rewrite coverage:

- `pre_tool_use_rewrites_shell_before_execution`
- `pre_tool_use_rewrites_container_exec_before_execution`
- `pre_tool_use_rewrites_local_shell_before_execution`
- `pre_tool_use_rewrites_shell_command_before_execution`
- `pre_tool_use_rewrites_exec_command_before_execution`

These cases assert that each supported Bash-like surface runs only the
rewritten command while the hook still observes the original `{
"command": ... }` input.

`pre_tool_use_rewrites_apply_patch_before_execution`

- Model emits one patch.
- Hook swaps in a different patch.
- Asserts only the rewritten file is created, and the hook saw the
original patch.

`pre_tool_use_rewrites_code_mode_nested_exec_command_before_execution`

- Model runs one nested shell command from code mode.
- Hook rewrites it.
- Asserts only the rewritten command runs, and the hook saw the original
nested input.

`pre_tool_use_rewrites_mcp_tool_before_execution`

- Model calls the RMCP echo tool.
- Hook rewrites the MCP arguments.
- Asserts the MCP server receives and returns the rewritten message, not
the original one.
2026-05-11 22:27:24 -04:00
Michael Bolin
1bd15bc24a merge commit for archive created by Sapling 2026-05-11 19:06:38 -07:00
Michael Bolin
d824faf0dc Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 19:06:28 -07:00
starr-openai
17ed5ad0b0 Apply sandbox context to local view_image reads (#21861)
## Summary
- create a selected-cwd filesystem sandbox context for view_image
metadata and file reads in both local and remote environments
- add a local restricted-profile regression test for the previously
unsandboxed read path

## Validation
- just fmt
- bazel test --bes_backend= --bes_results_url= --test_output=errors
--test_filter=view_image::tests::handle_passes_sandbox_context_for_local_filesystem_reads
//codex-rs/core:core-unit-tests

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-11 18:48:43 -07:00
Michael Bolin
c0a3e2bc63 merge commit for archive created by Sapling 2026-05-11 18:40:33 -07:00
Michael Bolin
448ea1b930 Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 18:40:24 -07:00
Michael Bolin
583117ffa1 merge commit for archive created by Sapling 2026-05-11 18:13:13 -07:00
Michael Bolin
bb9aa31ee5 Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 18:13:04 -07:00
efrazer-oai
fd24c00b0b feat(skills): default plugin creator to personal share flow (#22221)
## Summary
Plugin creation now defaults to the personal marketplace path and ends
with a readable handoff back into Codex after a marketplace-backed
scaffold.

Before this change, `plugin-creator` centered repo-local marketplace
updates and did not clearly guide the agent to return the user to the
created plugin afterward. This PR updates the bundled system skill so
marketplace-backed scaffolds default to `~/plugins/<plugin-name>` plus
`~/.agents/plugins/marketplace.json`, ask for user intent only when an
existing repo marketplace makes personal vs team scope ambiguous, and
end with named Markdown deeplinks labeled `View <plugin-name>` and
`Share <plugin-name>`.

## What changed
- default marketplace-backed creation to the personal plugin location
- document the explicit repo/team override path for codebases that
should own the plugin entry
- ask personal vs team only when the current Git repo already has
`.agents/plugins/marketplace.json` and the user has not stated scope
- require named Markdown deeplinks after marketplace-backed creation so
the final response returns the user to the exact plugin cleanly
- keep the deeplink targets precise with real absolute `marketplacePath`
and normalized `pluginName` values
- align the bundled prompt, scaffold help text, and marketplace
reference spec with the new default

## Testing
Tests: targeted skill validation, Python compile checks,
personal-default scaffold smoke, repo-override scaffold smoke, and
whitespace checks.
2026-05-11 17:58:48 -07:00
Michael Bolin
3bb2466299 merge commit for archive created by Sapling 2026-05-11 17:58:06 -07:00
Michael Bolin
4c0a41a53d Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 17:57:58 -07:00
Michael Bolin
9077a2d7dd merge commit for archive created by Sapling 2026-05-11 17:38:35 -07:00
Michael Bolin
b191b5e546 Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 17:38:18 -07:00
Michael Bolin
eae233a57b merge commit for archive created by Sapling 2026-05-11 17:20:21 -07:00
Michael Bolin
0e2d80b644 Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 17:20:13 -07:00
pakrym-oai
ed5944ba1d Simplify MCP tool handler plumbing (#21595)
## Why
The MCP tool path had accumulated a few core-owned special cases: a
dedicated payload variant, resolver plumbing, a legacy `AfterToolUse`
translation path, and a side channel for parallel-call metadata. That
made `ToolRegistry` and the spec builder know more about MCP than they
needed to.

This change moves MCP-specific execution details back onto `ToolInfo`
and `McpHandler` so `codex-core` can treat MCP calls like normal
function calls while still preserving MCP-specific dispatch and
telemetry behavior where it belongs.

## What changed
- removed `resolve_mcp_tool_info`, `ToolPayload::Mcp`, `ToolKind`, and
the remaining registry-side MCP resolver path
- stored MCP routing metadata directly on `McpHandler` and `ToolInfo`,
including `supports_parallel_tool_calls`
- deleted the legacy `AfterToolUse` consumer in `core`, which removes
the need for handler-specific `after_tool_use_payload` implementations
- switched tool-result telemetry to handler-provided tags and kept
MCP-specific dispatch payload construction inside the handler
- simplified tool spec planning/building by passing `ToolInfo` directly
and dropping the direct/deferred MCP wrapper structs and the
parallel-server side table

## Testing
- `cargo check -p codex-core -p codex-mcp -p codex-otel`
- `cargo test -p codex-core
mcp_parallel_support_uses_exact_payload_server`
- `cargo test -p codex-core
direct_mcp_tools_register_namespaced_handlers`
- `cargo test -p codex-core
search_tool_description_lists_each_mcp_source_once`
- `cargo test -p codex-mcp
list_all_tools_uses_startup_snapshot_while_client_is_pending`
- `just fix -p codex-core -p codex-mcp -p codex-otel`
2026-05-12 00:11:31 +00:00
Felipe Coury
e16b4e46d4 fix(tui): handle hidden app git directives (#21946) 2026-05-11 21:08:40 -03:00
Ruslan Nigmatullin
95d8669ab2 [exec-server] serve websocket listener via HTTP upgrade (#21963)
## Why

`codex exec-server` should keep the existing public `ws://IP:PORT` URL
shape while serving that websocket connection through an HTTP upgrade
path internally. That keeps the client-facing configuration simple and
allows the listener to work through intermediate HTTP-aware
infrastructure.

## What changed

- keep the emitted and configured exec-server URL as `ws://IP:PORT`
- serve that websocket endpoint through Axum HTTP upgrade handling on
`/`
- expose `GET /readyz` from the same listener for readiness checks
- route upgraded Axum websocket streams through the shared JSON-RPC
connection machinery
- initialize the rustls crypto provider before websocket client
connections
- preserve inbound binary websocket JSON-RPC parsing for compatibility
with the prior transport behavior

## Verification

- `cargo test -p codex-exec-server --test health --test process --test
websocket --test initialize --test exec_process`
2026-05-11 17:04:21 -07:00
Matthew Zeng
e15ecc9c35 Add production startup and TTFT telemetry (#22198)
## Why

While investigating `codex exec hi` startup latency, the useful
questions were not "is startup slow?" but "which durable bucket is slow
in production?"

The path we observed has a few distinct stages:

1. `thread/start` creates the session
2. startup prewarm builds the turn context, tools, and prompt
3. startup prewarm warms the websocket
4. the first real turn resolves the prewarm
5. the model produces the first token

Before this PR, production telemetry had some of the raw measurements
already:

- aggregate startup-prewarm duration / age-at-first-turn metrics
- TTFT as a metric
- websocket request telemetry

But there was no coherent production event stream for the startup
breakdown itself, and TTFT was metric-only. That made it hard to answer
the same latency questions from OpenTelemetry-backed logs without adding
one-off local instrumentation.

## What changed

Add durable production telemetry on the existing `SessionTelemetry`
path:

- new `codex.startup_phase` OTel log/trace events plus
`codex.startup.phase.duration_ms`
- new `codex.turn_ttft` OTel log/trace events while preserving the
existing TTFT metric

The startup phase event is emitted for the coarse buckets we actually
observed while running `exec hi`:

- `thread_start_create_thread`
- `startup_prewarm_total`
- `startup_prewarm_create_turn_context`
- `startup_prewarm_build_tools`
- `startup_prewarm_build_prompt`
- `startup_prewarm_websocket_warmup`
- `startup_prewarm_resolve`

These phases are intentionally low-cardinality so they remain safe as
production telemetry tags.

## Why this shape

This keeps the instrumentation on the same production path as the rest
of the session telemetry instead of adding a local debug-only trace
mode. It also avoids changing startup behavior:

- prewarm still runs
- no control flow changes
- no extra remote calls
- no user-visible behavior changes

One boundary is intentional: very early process bootstrap that happens
before a session exists is not included here, because this PR uses
session-scoped production telemetry. The expensive buckets we were
trying to understand after `thread/start` are now covered durably.

## Verification

- `cargo test -p codex-otel`
- `cargo test -p codex-core turn_timing`
- `cargo test -p codex-core
regular_turn_emits_turn_started_without_waiting_for_startup_prewarm`
- `cargo test -p codex-core
interrupting_regular_turn_waiting_on_startup_prewarm_emits_turn_aborted`
- `cargo test -p codex-app-server thread_start`
- `just fix -p codex-otel -p codex-core -p codex-app-server`

I also ran `cargo test -p codex-core`; it built successfully and then
hit an existing unrelated stack overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.
2026-05-11 23:58:36 +00:00
Michael Bolin
a9dc65e802 merge commit for archive created by Sapling 2026-05-11 16:50:06 -07:00
Michael Bolin
014c5898ce Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 16:49:49 -07:00
starr-openai
22e84c49d0 Support multi-environment apply_patch selection (#21617)
## Summary
- add multi-environment apply_patch routing for both freeform and
function-call tool flows
- parse and reconcile the optional environment selector in the main
apply_patch parser, then verify against the selected environment in the
handler
- carry environment_id through runtime and approval surfaces so
remote-targeted patches stay explicit end to end

## Testing
- just fmt
- remote exec-server e2e: `cargo test -p codex-core --test all
apply_patch_multi_environment_uses_remote_executor -- --nocapture` on
dev via `scripts/test-remote-env.sh`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-11 16:33:44 -07:00
Michael Bolin
53d3023da1 merge commit for archive created by Sapling 2026-05-11 16:29:03 -07:00
Michael Bolin
ba3a40bc3b Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 16:28:52 -07:00
Michael Bolin
1c2f0d38d3 merge commit for archive created by Sapling 2026-05-11 16:05:29 -07:00
Michael Bolin
4081253747 Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 16:05:17 -07:00
Michael Bolin
f921703092 merge commit for archive created by Sapling 2026-05-11 15:44:54 -07:00
Michael Bolin
435b7ab8c5 Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 15:44:41 -07:00
alexsong-oai
bb6134c028 Stop uploading accepted line fingerprints (#22180)
## Summary
- keep accepted-line diff parsing and fingerprint hashing logic locally
- stop uploading path/line hash fingerprints in the accepted-line
analytics event payload
- keep aggregate accepted added/deleted line counts in the event

## Testing
- just fmt
- cargo test -p codex-analytics
- just fix -p codex-analytics
2026-05-11 15:41:38 -07:00
Owen Lin
4859d80ffe Update codex remote-control to start the daemon (#22218)
## Why
Update `codex remote-control` to use the new app server daemon commands
instead.
- if the updater loop is not running, bootstrap the daemon with remote
control enabled (`codex app-server daemon bootstrap --remote-control`)
- otherwise, enable the persisted remote-control setting and start the
daemon normally
2026-05-11 15:38:30 -07:00
Michael Bolin
5fa4a8c994 merge commit for archive created by Sapling 2026-05-11 15:29:31 -07:00
Michael Bolin
2d23f3ad7b Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 15:24:36 -07:00
Michael Bolin
c50d5fdbb7 Merge 6579ec2f9d into sapling-pr-archive-bolinfest 2026-05-11 15:23:23 -07:00
Michael Bolin
6579ec2f9d Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 15:23:16 -07:00
Abhinav
9ab7f4e6ac Add Windows hook command overrides (#22159)
# Why

Managed hook configs need a shared cross-platform shape without making
the existing `command` field polymorphic. The common case is still one
command string, with Windows needing a different entrypoint only when
the runtime is actually Windows.

Keeping `command` as the portable/default path and adding an optional
Windows override keeps the config easier to read, preserves the existing
scalar shape for non-Windows users, and avoids forcing every caller into
a `{ unix, windows }` object when only one platform needs special
handling.

# What

- Add optional `command_windows` / `commandWindows` alongside the
existing hook `command` field.
- Resolve `command_windows` only on Windows during hook discovery; other
platforms continue to use `command` unchanged.
- Keep trust hashing aligned to the effective command selected for the
current runtime.

# Docs

The Codex hooks/config reference should document `command_windows` as
the Windows-only override for command hooks.
2026-05-11 22:22:29 +00:00
Michael Bolin
84307c03ee merge commit for archive created by Sapling 2026-05-11 15:19:58 -07:00
Michael Bolin
162da66557 Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 15:19:34 -07:00
rhan-oai
a175ddacc0 [codex-analytics] emit terminal review events (#18748)
## Why

Review telemetry should describe reviews as first-class events, not only
as counters denormalized onto terminal tool-item events. That lets us
analyze guardian and user reviews consistently across command execution,
file changes, permissions, and network access, while still preserving
the terminal item summaries that existing tool analytics need.

To make those review events accurate, analytics also needs the observed
completion time for each review and enough command metadata to
distinguish `shell` from `unified_exec` reviews.

## What changed

- emit generic `codex_review_event` rows for completed user and guardian
reviews, with review subjects, reviewer, trigger, terminal status,
resolution, and observed duration
- reduce approval request / response / abort facts into review events
for command execution, file change, and permissions flows
- keep denormalized review counts, final approval outcome, and
permission-request flags on terminal tool-item events for
item-associated reviews
- plumb review completion timing so user-review responses and aborts use
app-server-observed completion times, while guardian analytics reuse the
same terminal timestamps emitted on guardian assessment events
- carry command approval `source` through the protocol and app-server
layers so review analytics can distinguish `shell` from `unified_exec`
- add analytics coverage for user-review emission, guardian-review
emission, permission reviews that should not denormalize onto tool
items, item-summary isolation across threads, and the serialized
review-event shape

## Verification

- `cargo test -p codex-analytics`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18748).
* __->__ #18748
* #21434
* #18747
* #17090
* #17089
* #20514
2026-05-11 22:13:32 +00:00
Ahmed Ibrahim
aa9e8f0262 [8/8] Add Python SDK Ruff formatting (#22021)
## Why

The Python SDK needs the same tight formatter/lint loop as the rest of
the repo: a safe Ruff autofix pass, Ruff formatting, editor save
behavior, and CI checks that catch drift. Without that loop, SDK changes
can land with formatting or import ordering that differs from what
reviewers and CI expect.

## What

- Add Ruff configuration to `sdk/python/pyproject.toml`, excluding
generated protocol code and notebooks from the normal lint/format pass.
- Update `just fmt` so it still formats Rust and also runs Python SDK
Ruff autofix and formatting.
- Add Python SDK CI steps for `ruff check` and `ruff format --check`
before pytest.
- Recommend the Ruff VS Code extension and enable Python
format/fix/organize-on-save so Cmd+S uses the same tooling.
- Apply the resulting Ruff formatting to SDK Python files, examples, and
the checked-in generated `v2_all.py` output emitted by the pinned
generator.
- Add a guard test for the `just fmt` recipe so it keeps working from
both Rust and Python SDK working directories.

## Stack

1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. This PR `[8/8]` Add Python SDK Ruff formatting

## Verification

- Added `test_root_fmt_recipe_formats_rust_and_python_sdk` for the
shared format recipe.
- Ran `just fmt` after the recipe update.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-12 01:10:29 +03:00
Michael Bolin
f7a99e5a29 Merge ea13af2fa5 into sapling-pr-archive-bolinfest 2026-05-11 15:06:46 -07:00
Ahmed Ibrahim
3e10e09e24 [7/8] Add Python SDK app-server integration harness (#22014)
## Why

The SDK had behavioral tests that replaced SDK client internals. Those
tests could catch wrapper mistakes, but they did not prove the pinned
app-server runtime, generated notification models, request routing, and
sync/async public clients worked together.

This PR adds deterministic integration coverage that starts the pinned
`codex app-server` process and mocks only the upstream Responses HTTP
boundary.

## What

- Add `AppServerHarness` and `MockResponsesServer` helpers for isolated
`CODEX_HOME`, mock-provider config, queued SSE responses, and captured
`/v1/responses` requests.
- Add shared helpers for SSE construction, stream assertions,
approval-policy inspection, and image fixtures.
- Split integration coverage into focused modules for run behavior,
inputs, streaming, turn controls, approvals, and thread lifecycle.
- Cover sync and async `Thread.run`, `TurnHandle.stream`, interleaved
streams, approval-mode persistence, lifecycle helpers, final-answer
phase handling, image inputs, loaded skill input injection, steering,
interruption, listing, history reads, run overrides, and token usage
mapping.
- Replace public-wrapper tests that duplicated integration-test behavior
with lower-level client tests only where direct client behavior is the
thing under test.

## Stack

1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. #21910 `[6/8]` Add high-level Python SDK approval mode
7. This PR `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting

## Verification

- Added pinned app-server integration tests under
`sdk/python/tests/test_app_server_*.py` and
`test_real_app_server_integration.py`.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-12 01:06:41 +03:00
Michael Bolin
ea13af2fa5 Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots. 2026-05-11 15:06:35 -07:00
Ahmed Ibrahim
2b90c37069 [6/8] Add high-level Python SDK approval mode (#21910)
## Why

The high-level SDK should expose the approval behavior it actually
supports instead of leaking generated app-server routing fields. New
work should have two clear choices: default auto review, or explicitly
deny escalated permission requests. Existing threads and subsequent
turns should preserve their current approval behavior unless the caller
passes an override.

## What

- Add the public `ApprovalMode` enum with `auto_review` and `deny_all`.
- Default new thread creation to `ApprovalMode.auto_review`.
- Preserve existing approval settings by default for resume, fork, run,
and turn helpers.
- Remove raw `approval_policy` / `approvals_reviewer` kwargs from
high-level SDK wrappers.
- Update generated wrapper output, docs, examples, notebooks, and tests
for the high-level approval mode API.

## Stack

1. #21891 `[1/8]` Pin Python SDK runtime dependency
2. #21893 `[2/8]` Generate Python SDK types from pinned runtime
3. #21895 `[3/8]` Run Python SDK tests in CI
4. #21896 `[4/8]` Define Python SDK public API surface
5. #21905 `[5/8]` Rename Python SDK package to `openai-codex`
6. This PR `[6/8]` Add high-level Python SDK approval mode
7. #22014 `[7/8]` Add Python SDK app-server integration harness
8. #22021 `[8/8]` Add Python SDK Ruff formatting

## Verification

- Added approval-mode mapping/default tests for new threads, existing
threads, forks, resumes, and subsequent turns.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-12 01:02:43 +03:00