codex

mirror of https://github.com/openai/codex.git synced 2026-06-01 19:02:59 +00:00

Author	SHA1	Message	Date
Michael Bolin	4ca6efdba1	merge commit for archive created by Sapling	2026-05-11 20:08:33 -07:00
Michael Bolin	8210503007	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 20:08:16 -07:00
Michael Bolin	c8ba58b46a	Merge `5d0c7dea61` into sapling-pr-archive-bolinfest	2026-05-11 19:54:59 -07:00
Michael Bolin	5d0c7dea61	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 19:54:51 -07:00
pakrym-oai	79c65f816c	[codex] Filter legacy warning messages during compaction (#22243 ) ## Why Older sessions can contain model-warning records persisted as `user` messages, including the unified exec process-limit warning, the `apply_patch`-via-`exec_command` warning, and the model-mismatch high-risk cyber fallback warning. Those warnings are no longer produced as conversation history items, but when old sessions compact they should still be recognized as injected context rather than preserved as real user turns. ## What changed - Removed `record_model_warning` and the production paths that emitted these warning messages into conversation history. - Added `LegacyUnifiedExecProcessLimitWarning`, `LegacyApplyPatchExecCommandWarning`, and `LegacyModelMismatchWarning` contextual fragments that are used only for matching old persisted messages. - Registered the legacy fragments with contextual user message detection so compaction filters them through the existing fragment path. - Added focused compaction coverage for old warning messages being dropped during compacted-history processing. ## Testing - `cargo test -p codex-core warning` - `just fix -p codex-core`	2026-05-11 19:51:51 -07:00
Michael Bolin	58af6a52c4	merge commit for archive created by Sapling	2026-05-11 19:50:03 -07:00
Michael Bolin	5801edb3eb	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 19:49:51 -07:00
Abhinav	d08906a944	Support PreToolUse updatedInput rewrites (#20527 ) ## Why `PreToolUse` already exposes `updatedInput` in its hook output schema, but Codex currently rejects it instead of applying the rewrite. That leaves hook authors unable to make the documented pre-execution adjustment to a tool call before it runs. ## What - Accept `updatedInput` from `PreToolUse` hooks when paired with `permissionDecision: "allow"`. - Apply the rewritten input before dispatch so the tool executes the updated payload, not the original one. - Preserve the stable hook-facing compatibility shapes that participating tool handlers expose: - Bash-like tools (`shell`, `container.exec`, `local_shell`, `shell_command`, `exec_command`) use `{ "command": ... }`. - `apply_patch` exposes its patch body through the same command-shaped hook contract. - MCP tools expose their JSON argument object directly. - Keep each participating tool handler responsible for translating hook-facing `updatedInput` back into its concrete invocation shape. ## Verification Direct Bash-like rewrite coverage: - `pre_tool_use_rewrites_shell_before_execution` - `pre_tool_use_rewrites_container_exec_before_execution` - `pre_tool_use_rewrites_local_shell_before_execution` - `pre_tool_use_rewrites_shell_command_before_execution` - `pre_tool_use_rewrites_exec_command_before_execution` These cases assert that each supported Bash-like surface runs only the rewritten command while the hook still observes the original `{ "command": ... }` input. `pre_tool_use_rewrites_apply_patch_before_execution` - Model emits one patch. - Hook swaps in a different patch. - Asserts only the rewritten file is created, and the hook saw the original patch. `pre_tool_use_rewrites_code_mode_nested_exec_command_before_execution` - Model runs one nested shell command from code mode. - Hook rewrites it. - Asserts only the rewritten command runs, and the hook saw the original nested input. `pre_tool_use_rewrites_mcp_tool_before_execution` - Model calls the RMCP echo tool. - Hook rewrites the MCP arguments. - Asserts the MCP server receives and returns the rewritten message, not the original one.	2026-05-11 22:27:24 -04:00
Michael Bolin	1bd15bc24a	merge commit for archive created by Sapling	2026-05-11 19:06:38 -07:00
Michael Bolin	d824faf0dc	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 19:06:28 -07:00
starr-openai	17ed5ad0b0	Apply sandbox context to local view_image reads (#21861 ) ## Summary - create a selected-cwd filesystem sandbox context for view_image metadata and file reads in both local and remote environments - add a local restricted-profile regression test for the previously unsandboxed read path ## Validation - just fmt - bazel test --bes_backend= --bes_results_url= --test_output=errors --test_filter=view_image::tests::handle_passes_sandbox_context_for_local_filesystem_reads //codex-rs/core:core-unit-tests --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-11 18:48:43 -07:00
Michael Bolin	c0a3e2bc63	merge commit for archive created by Sapling	2026-05-11 18:40:33 -07:00
Michael Bolin	448ea1b930	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 18:40:24 -07:00
Michael Bolin	583117ffa1	merge commit for archive created by Sapling	2026-05-11 18:13:13 -07:00
Michael Bolin	bb9aa31ee5	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 18:13:04 -07:00
efrazer-oai	fd24c00b0b	feat(skills): default plugin creator to personal share flow (#22221 ) ## Summary Plugin creation now defaults to the personal marketplace path and ends with a readable handoff back into Codex after a marketplace-backed scaffold. Before this change, `plugin-creator` centered repo-local marketplace updates and did not clearly guide the agent to return the user to the created plugin afterward. This PR updates the bundled system skill so marketplace-backed scaffolds default to `~/plugins/<plugin-name>` plus `~/.agents/plugins/marketplace.json`, ask for user intent only when an existing repo marketplace makes personal vs team scope ambiguous, and end with named Markdown deeplinks labeled `View <plugin-name>` and `Share <plugin-name>`. ## What changed - default marketplace-backed creation to the personal plugin location - document the explicit repo/team override path for codebases that should own the plugin entry - ask personal vs team only when the current Git repo already has `.agents/plugins/marketplace.json` and the user has not stated scope - require named Markdown deeplinks after marketplace-backed creation so the final response returns the user to the exact plugin cleanly - keep the deeplink targets precise with real absolute `marketplacePath` and normalized `pluginName` values - align the bundled prompt, scaffold help text, and marketplace reference spec with the new default ## Testing Tests: targeted skill validation, Python compile checks, personal-default scaffold smoke, repo-override scaffold smoke, and whitespace checks.	2026-05-11 17:58:48 -07:00
Michael Bolin	3bb2466299	merge commit for archive created by Sapling	2026-05-11 17:58:06 -07:00
Michael Bolin	4c0a41a53d	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 17:57:58 -07:00
Michael Bolin	9077a2d7dd	merge commit for archive created by Sapling	2026-05-11 17:38:35 -07:00
Michael Bolin	b191b5e546	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 17:38:18 -07:00
Michael Bolin	eae233a57b	merge commit for archive created by Sapling	2026-05-11 17:20:21 -07:00
Michael Bolin	0e2d80b644	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 17:20:13 -07:00
pakrym-oai	ed5944ba1d	Simplify MCP tool handler plumbing (#21595 ) ## Why The MCP tool path had accumulated a few core-owned special cases: a dedicated payload variant, resolver plumbing, a legacy `AfterToolUse` translation path, and a side channel for parallel-call metadata. That made `ToolRegistry` and the spec builder know more about MCP than they needed to. This change moves MCP-specific execution details back onto `ToolInfo` and `McpHandler` so `codex-core` can treat MCP calls like normal function calls while still preserving MCP-specific dispatch and telemetry behavior where it belongs. ## What changed - removed `resolve_mcp_tool_info`, `ToolPayload::Mcp`, `ToolKind`, and the remaining registry-side MCP resolver path - stored MCP routing metadata directly on `McpHandler` and `ToolInfo`, including `supports_parallel_tool_calls` - deleted the legacy `AfterToolUse` consumer in `core`, which removes the need for handler-specific `after_tool_use_payload` implementations - switched tool-result telemetry to handler-provided tags and kept MCP-specific dispatch payload construction inside the handler - simplified tool spec planning/building by passing `ToolInfo` directly and dropping the direct/deferred MCP wrapper structs and the parallel-server side table ## Testing - `cargo check -p codex-core -p codex-mcp -p codex-otel` - `cargo test -p codex-core mcp_parallel_support_uses_exact_payload_server` - `cargo test -p codex-core direct_mcp_tools_register_namespaced_handlers` - `cargo test -p codex-core search_tool_description_lists_each_mcp_source_once` - `cargo test -p codex-mcp list_all_tools_uses_startup_snapshot_while_client_is_pending` - `just fix -p codex-core -p codex-mcp -p codex-otel`	2026-05-12 00:11:31 +00:00
Felipe Coury	e16b4e46d4	fix(tui): handle hidden app git directives (#21946 )	2026-05-11 21:08:40 -03:00
Ruslan Nigmatullin	95d8669ab2	[exec-server] serve websocket listener via HTTP upgrade (#21963 ) ## Why `codex exec-server` should keep the existing public `ws://IP:PORT` URL shape while serving that websocket connection through an HTTP upgrade path internally. That keeps the client-facing configuration simple and allows the listener to work through intermediate HTTP-aware infrastructure. ## What changed - keep the emitted and configured exec-server URL as `ws://IP:PORT` - serve that websocket endpoint through Axum HTTP upgrade handling on `/` - expose `GET /readyz` from the same listener for readiness checks - route upgraded Axum websocket streams through the shared JSON-RPC connection machinery - initialize the rustls crypto provider before websocket client connections - preserve inbound binary websocket JSON-RPC parsing for compatibility with the prior transport behavior ## Verification - `cargo test -p codex-exec-server --test health --test process --test websocket --test initialize --test exec_process`	2026-05-11 17:04:21 -07:00
Matthew Zeng	e15ecc9c35	Add production startup and TTFT telemetry (#22198 ) ## Why While investigating `codex exec hi` startup latency, the useful questions were not "is startup slow?" but "which durable bucket is slow in production?" The path we observed has a few distinct stages: 1. `thread/start` creates the session 2. startup prewarm builds the turn context, tools, and prompt 3. startup prewarm warms the websocket 4. the first real turn resolves the prewarm 5. the model produces the first token Before this PR, production telemetry had some of the raw measurements already: - aggregate startup-prewarm duration / age-at-first-turn metrics - TTFT as a metric - websocket request telemetry But there was no coherent production event stream for the startup breakdown itself, and TTFT was metric-only. That made it hard to answer the same latency questions from OpenTelemetry-backed logs without adding one-off local instrumentation. ## What changed Add durable production telemetry on the existing `SessionTelemetry` path: - new `codex.startup_phase` OTel log/trace events plus `codex.startup.phase.duration_ms` - new `codex.turn_ttft` OTel log/trace events while preserving the existing TTFT metric The startup phase event is emitted for the coarse buckets we actually observed while running `exec hi`: - `thread_start_create_thread` - `startup_prewarm_total` - `startup_prewarm_create_turn_context` - `startup_prewarm_build_tools` - `startup_prewarm_build_prompt` - `startup_prewarm_websocket_warmup` - `startup_prewarm_resolve` These phases are intentionally low-cardinality so they remain safe as production telemetry tags. ## Why this shape This keeps the instrumentation on the same production path as the rest of the session telemetry instead of adding a local debug-only trace mode. It also avoids changing startup behavior: - prewarm still runs - no control flow changes - no extra remote calls - no user-visible behavior changes One boundary is intentional: very early process bootstrap that happens before a session exists is not included here, because this PR uses session-scoped production telemetry. The expensive buckets we were trying to understand after `thread/start` are now covered durably. ## Verification - `cargo test -p codex-otel` - `cargo test -p codex-core turn_timing` - `cargo test -p codex-core regular_turn_emits_turn_started_without_waiting_for_startup_prewarm` - `cargo test -p codex-core interrupting_regular_turn_waiting_on_startup_prewarm_emits_turn_aborted` - `cargo test -p codex-app-server thread_start` - `just fix -p codex-otel -p codex-core -p codex-app-server` I also ran `cargo test -p codex-core`; it built successfully and then hit an existing unrelated stack overflow in `tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.	2026-05-11 23:58:36 +00:00
Michael Bolin	a9dc65e802	merge commit for archive created by Sapling	2026-05-11 16:50:06 -07:00
Michael Bolin	014c5898ce	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 16:49:49 -07:00
starr-openai	22e84c49d0	Support multi-environment apply_patch selection (#21617 ) ## Summary - add multi-environment apply_patch routing for both freeform and function-call tool flows - parse and reconcile the optional environment selector in the main apply_patch parser, then verify against the selected environment in the handler - carry environment_id through runtime and approval surfaces so remote-targeted patches stay explicit end to end ## Testing - just fmt - remote exec-server e2e: `cargo test -p codex-core --test all apply_patch_multi_environment_uses_remote_executor -- --nocapture` on dev via `scripts/test-remote-env.sh` --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-11 16:33:44 -07:00
Michael Bolin	53d3023da1	merge commit for archive created by Sapling	2026-05-11 16:29:03 -07:00
Michael Bolin	ba3a40bc3b	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 16:28:52 -07:00
Michael Bolin	1c2f0d38d3	merge commit for archive created by Sapling	2026-05-11 16:05:29 -07:00
Michael Bolin	4081253747	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 16:05:17 -07:00
Michael Bolin	f921703092	merge commit for archive created by Sapling	2026-05-11 15:44:54 -07:00
Michael Bolin	435b7ab8c5	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 15:44:41 -07:00
alexsong-oai	bb6134c028	Stop uploading accepted line fingerprints (#22180 ) ## Summary - keep accepted-line diff parsing and fingerprint hashing logic locally - stop uploading path/line hash fingerprints in the accepted-line analytics event payload - keep aggregate accepted added/deleted line counts in the event ## Testing - just fmt - cargo test -p codex-analytics - just fix -p codex-analytics	2026-05-11 15:41:38 -07:00
Owen Lin	4859d80ffe	Update codex remote-control to start the daemon (#22218 ) ## Why Update `codex remote-control` to use the new app server daemon commands instead. - if the updater loop is not running, bootstrap the daemon with remote control enabled (`codex app-server daemon bootstrap --remote-control`) - otherwise, enable the persisted remote-control setting and start the daemon normally	2026-05-11 15:38:30 -07:00
Michael Bolin	5fa4a8c994	merge commit for archive created by Sapling	2026-05-11 15:29:31 -07:00
Michael Bolin	2d23f3ad7b	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 15:24:36 -07:00
Michael Bolin	c50d5fdbb7	Merge `6579ec2f9d` into sapling-pr-archive-bolinfest	2026-05-11 15:23:23 -07:00
Michael Bolin	6579ec2f9d	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 15:23:16 -07:00
Abhinav	9ab7f4e6ac	Add Windows hook command overrides (#22159 ) # Why Managed hook configs need a shared cross-platform shape without making the existing `command` field polymorphic. The common case is still one command string, with Windows needing a different entrypoint only when the runtime is actually Windows. Keeping `command` as the portable/default path and adding an optional Windows override keeps the config easier to read, preserves the existing scalar shape for non-Windows users, and avoids forcing every caller into a `{ unix, windows }` object when only one platform needs special handling. # What - Add optional `command_windows` / `commandWindows` alongside the existing hook `command` field. - Resolve `command_windows` only on Windows during hook discovery; other platforms continue to use `command` unchanged. - Keep trust hashing aligned to the effective command selected for the current runtime. # Docs The Codex hooks/config reference should document `command_windows` as the Windows-only override for command hooks.	2026-05-11 22:22:29 +00:00
Michael Bolin	84307c03ee	merge commit for archive created by Sapling	2026-05-11 15:19:58 -07:00
Michael Bolin	162da66557	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 15:19:34 -07:00
rhan-oai	a175ddacc0	[codex-analytics] emit terminal review events (#18748 ) ## Why Review telemetry should describe reviews as first-class events, not only as counters denormalized onto terminal tool-item events. That lets us analyze guardian and user reviews consistently across command execution, file changes, permissions, and network access, while still preserving the terminal item summaries that existing tool analytics need. To make those review events accurate, analytics also needs the observed completion time for each review and enough command metadata to distinguish `shell` from `unified_exec` reviews. ## What changed - emit generic `codex_review_event` rows for completed user and guardian reviews, with review subjects, reviewer, trigger, terminal status, resolution, and observed duration - reduce approval request / response / abort facts into review events for command execution, file change, and permissions flows - keep denormalized review counts, final approval outcome, and permission-request flags on terminal tool-item events for item-associated reviews - plumb review completion timing so user-review responses and aborts use app-server-observed completion times, while guardian analytics reuse the same terminal timestamps emitted on guardian assessment events - carry command approval `source` through the protocol and app-server layers so review analytics can distinguish `shell` from `unified_exec` - add analytics coverage for user-review emission, guardian-review emission, permission reviews that should not denormalize onto tool items, item-summary isolation across threads, and the serialized review-event shape ## Verification - `cargo test -p codex-analytics` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18748). * __->__ #18748 * #21434 * #18747 * #17090 * #17089 * #20514	2026-05-11 22:13:32 +00:00
Ahmed Ibrahim	aa9e8f0262	[8/8] Add Python SDK Ruff formatting (#22021 ) ## Why The Python SDK needs the same tight formatter/lint loop as the rest of the repo: a safe Ruff autofix pass, Ruff formatting, editor save behavior, and CI checks that catch drift. Without that loop, SDK changes can land with formatting or import ordering that differs from what reviewers and CI expect. ## What - Add Ruff configuration to `sdk/python/pyproject.toml`, excluding generated protocol code and notebooks from the normal lint/format pass. - Update `just fmt` so it still formats Rust and also runs Python SDK Ruff autofix and formatting. - Add Python SDK CI steps for `ruff check` and `ruff format --check` before pytest. - Recommend the Ruff VS Code extension and enable Python format/fix/organize-on-save so Cmd+S uses the same tooling. - Apply the resulting Ruff formatting to SDK Python files, examples, and the checked-in generated `v2_all.py` output emitted by the pinned generator. - Add a guard test for the `just fmt` recipe so it keeps working from both Rust and Python SDK working directories. ## Stack 1. #21891 `[1/8]` Pin Python SDK runtime dependency 2. #21893 `[2/8]` Generate Python SDK types from pinned runtime 3. #21895 `[3/8]` Run Python SDK tests in CI 4. #21896 `[4/8]` Define Python SDK public API surface 5. #21905 `[5/8]` Rename Python SDK package to `openai-codex` 6. #21910 `[6/8]` Add high-level Python SDK approval mode 7. #22014 `[7/8]` Add Python SDK app-server integration harness 8. This PR `[8/8]` Add Python SDK Ruff formatting ## Verification - Added `test_root_fmt_recipe_formats_rust_and_python_sdk` for the shared format recipe. - Ran `just fmt` after the recipe update. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-12 01:10:29 +03:00
Michael Bolin	f7a99e5a29	Merge `ea13af2fa5` into sapling-pr-archive-bolinfest	2026-05-11 15:06:46 -07:00
Ahmed Ibrahim	3e10e09e24	[7/8] Add Python SDK app-server integration harness (#22014 ) ## Why The SDK had behavioral tests that replaced SDK client internals. Those tests could catch wrapper mistakes, but they did not prove the pinned app-server runtime, generated notification models, request routing, and sync/async public clients worked together. This PR adds deterministic integration coverage that starts the pinned `codex app-server` process and mocks only the upstream Responses HTTP boundary. ## What - Add `AppServerHarness` and `MockResponsesServer` helpers for isolated `CODEX_HOME`, mock-provider config, queued SSE responses, and captured `/v1/responses` requests. - Add shared helpers for SSE construction, stream assertions, approval-policy inspection, and image fixtures. - Split integration coverage into focused modules for run behavior, inputs, streaming, turn controls, approvals, and thread lifecycle. - Cover sync and async `Thread.run`, `TurnHandle.stream`, interleaved streams, approval-mode persistence, lifecycle helpers, final-answer phase handling, image inputs, loaded skill input injection, steering, interruption, listing, history reads, run overrides, and token usage mapping. - Replace public-wrapper tests that duplicated integration-test behavior with lower-level client tests only where direct client behavior is the thing under test. ## Stack 1. #21891 `[1/8]` Pin Python SDK runtime dependency 2. #21893 `[2/8]` Generate Python SDK types from pinned runtime 3. #21895 `[3/8]` Run Python SDK tests in CI 4. #21896 `[4/8]` Define Python SDK public API surface 5. #21905 `[5/8]` Rename Python SDK package to `openai-codex` 6. #21910 `[6/8]` Add high-level Python SDK approval mode 7. This PR `[7/8]` Add Python SDK app-server integration harness 8. #22021 `[8/8]` Add Python SDK Ruff formatting ## Verification - Added pinned app-server integration tests under `sdk/python/tests/test_app_server_*.py` and `test_real_app_server_integration.py`. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-12 01:06:41 +03:00
Michael Bolin	ea13af2fa5	Move workspace roots onto thread/session state and stop using active permission profile modifications as an overlay for writable roots. Existing app-server threads now preserve their persisted PermissionProfile value across resume, fork, and turn updates; permissions requests on existing threads only update the active named profile after validating it exists. Workspace roots can be updated independently, and SandboxPolicy::WorkspaceWrite no longer stores its own writable_roots.	2026-05-11 15:06:35 -07:00
Ahmed Ibrahim	2b90c37069	[6/8] Add high-level Python SDK approval mode (#21910 ) ## Why The high-level SDK should expose the approval behavior it actually supports instead of leaking generated app-server routing fields. New work should have two clear choices: default auto review, or explicitly deny escalated permission requests. Existing threads and subsequent turns should preserve their current approval behavior unless the caller passes an override. ## What - Add the public `ApprovalMode` enum with `auto_review` and `deny_all`. - Default new thread creation to `ApprovalMode.auto_review`. - Preserve existing approval settings by default for resume, fork, run, and turn helpers. - Remove raw `approval_policy` / `approvals_reviewer` kwargs from high-level SDK wrappers. - Update generated wrapper output, docs, examples, notebooks, and tests for the high-level approval mode API. ## Stack 1. #21891 `[1/8]` Pin Python SDK runtime dependency 2. #21893 `[2/8]` Generate Python SDK types from pinned runtime 3. #21895 `[3/8]` Run Python SDK tests in CI 4. #21896 `[4/8]` Define Python SDK public API surface 5. #21905 `[5/8]` Rename Python SDK package to `openai-codex` 6. This PR `[6/8]` Add high-level Python SDK approval mode 7. #22014 `[7/8]` Add Python SDK app-server integration harness 8. #22021 `[8/8]` Add Python SDK Ruff formatting ## Verification - Added approval-mode mapping/default tests for new threads, existing threads, forks, resumes, and subsequent turns. --------- Co-authored-by: Codex <noreply@openai.com>	2026-05-12 01:02:43 +03:00

1 2 3 4 5 ...

15364 Commits