Commit Graph

6305 Commits

Author SHA1 Message Date
starr-openai
2bfcc88340 Compile archive lld shim with Roslyn 2026-05-13 17:10:35 -07:00
starr-openai
134893adf3 Fix MSVC lld wrapper compilation 2026-05-13 17:06:07 -07:00
starr-openai
bb20435949 Fix archive lld wrapper compilation 2026-05-13 17:05:58 -07:00
starr-openai
c974d3f5cd Wrap lld for ARM64 MSVC setup action 2026-05-13 17:03:50 -07:00
starr-openai
6d357d4fe3 Filter unsupported ARM64 lld flag in archive probe 2026-05-13 17:03:43 -07:00
starr-openai
11a10df438 Prefer rust-lld in MSVC setup action 2026-05-13 16:56:26 -07:00
starr-openai
2116527ae0 Probe ARM64 archive with rust-lld 2026-05-13 16:56:19 -07:00
starr-openai
96b02724a0 Prefer lld-link in MSVC setup action 2026-05-13 16:47:23 -07:00
starr-openai
04132fdfbf Probe ARM64 archive with lld-link 2026-05-13 16:47:22 -07:00
starr-openai
fe3230ad4a Add archive cargo process probe 2026-05-13 16:34:37 -07:00
starr-openai
ff0141d713 Set explicit ARM64 Cargo linker for archive probe 2026-05-13 16:14:45 -07:00
starr-openai
d5ebb31383 Set explicit Cargo linker in MSVC setup action 2026-05-13 16:14:35 -07:00
starr-openai
5440bbfaaa Normalize MSVC PATH export 2026-05-13 16:06:56 -07:00
starr-openai
9c6ce80d08 Normalize MSVC PATH export for archive probe 2026-05-13 16:06:53 -07:00
starr-openai
202487bd63 Export ARM64 MSVC env for archive probe 2026-05-13 16:00:38 -07:00
starr-openai
858c744081 Add MSVC env helper for ARM64 archive build 2026-05-13 15:59:13 -07:00
starr-openai
7f3f228a60 Try Windows arm64 nextest archive
Add an opt-in rust-ci-full path that builds the Windows arm64 nextest archive on Windows x64, uploads it, and runs Windows arm64 shard jobs from that archive instead of recompiling in every shard.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 18:11:30 -07:00
starr-openai
755d128add Shard Windows arm64 nextest runs
Add a dynamic rust-ci-full test matrix so workflow_dispatch or shard-specific full-ci branch names can split the Windows arm64 nextest lane across 2 or 4 hosts while leaving the default push behavior unchanged.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 17:11:16 -07:00
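Splitting one nextest lane across 2 or 4 hosts requires every test to land on exactly one shard. Below is a minimal sketch of a deterministic round-robin split, similar in spirit to nextest's count-based partitioning. The `shard_tests` helper is hypothetical, and the workflow may simply rely on nextest's own `--partition` option.

```rust
/// Hypothetical helper: returns the tests that shard `shard` (1-based) of
/// `total` should run, assigning the sorted test names round-robin.
fn shard_tests(tests: &[&str], shard: usize, total: usize) -> Vec<String> {
    assert!(total > 0 && (1..=total).contains(&shard), "invalid shard spec");
    let mut sorted: Vec<&str> = tests.to_vec();
    // Sort first so every host computes the same assignment independently.
    sorted.sort_unstable();
    sorted
        .into_iter()
        .enumerate()
        .filter(|(i, _)| i % total == shard - 1)
        .map(|(_, name)| name.to_string())
        .collect()
}
```

Sorting before assignment keeps the split stable regardless of the order in which each host discovers tests.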
starr-openai
cd8ea2f36b Keep sccache stats alive through CI jobs
Disable the sccache daemon idle timeout in rust-ci-full so long test phases can still report the compile-cache stats collected during the build phase.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 15:59:45 -07:00
starr-openai
fcb1fb8ec6 Re-enable Windows sccache in Rust CI
Let Windows rust-ci-full jobs use sccache again, store the fallback cache on the configured work drive, and set Cargo's rustc wrapper to an absolute sccache path so Windows subprocesses resolve it consistently.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 15:20:40 -07:00
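Pointing Cargo's rustc wrapper (for example via `RUSTC_WRAPPER`) at an absolute path means Windows subprocesses do not each have to repeat their own `PATH` search. Below is a minimal sketch of resolving a bare tool name to an absolute path by scanning a `PATH`-style list. `resolve_on_path` and its `.exe` handling are assumptions, not the CI script's actual code.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Hypothetical helper: find `name` in the directories of `path_var`
/// (a PATH-style list) and return its absolute location, if any.
fn resolve_on_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    // On Windows, executables usually carry an `.exe` suffix.
    let candidates = [name.to_string(), format!("{name}.exe")];
    for dir in env::split_paths(path_var) {
        for candidate in &candidates {
            let full = dir.join(candidate);
            if full.is_file() {
                // Relative PATH entries are resolved against the current directory.
                return if full.is_absolute() {
                    Some(full)
                } else {
                    env::current_dir().ok().map(|cwd| cwd.join(full))
                };
            }
        }
    }
    None
}
```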
starr-openai
077a3970d7 Use Dev Drive for Windows CI
Configure Windows Rust CI jobs and the shared Bazel CI setup to put temp, repository-cache, and output-root paths on the runner's fast work drive when available. Fall back to C: if no secondary drive or Dev Drive provisioning path is available.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 15:20:40 -07:00
starr-openai
5815dd6a4b Give Windows arm64 tests enough CI time
Let the Windows arm64 test matrix use a longer timeout after CI showed the lane spending most of the default 45 minutes compiling before nextest could finish.

Also pin nextest through taiki-e/install-action's supported tool version syntax so the requested version is not ignored.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 15:20:39 -07:00
starr-openai
296fa6df0c Serialize Windows process-heavy nextest cases
Windows rust-ci-full repeatedly times out in subprocess-heavy tests even when the global nextest thread count is capped. Isolate the recurring Windows-only families with nextest overrides so the rest of the suite can keep normal parallelism.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 15:20:39 -07:00
starr-openai
64c684bd57 Add Windows nextest thread override for rust-ci-full
Co-authored-by: Codex <noreply@openai.com>
2026-05-07 15:20:39 -07:00
starr-openai
ce5d84e43a Make pending sideband close test deterministic
Replace the realtime websocket accept-delay race with an explicit test-server gate so close is issued while the sideband connection is pending, then prove the closed conversation does not emit stale events or send sideband websocket requests.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 15:20:35 -07:00
starr-openai
926b8d77cd Tolerate transient Windows metadata denial in memory startup test
Keep polling when Windows temporarily denies metadata reads while the phase 2 memory workspace is being cleaned up, so the test still verifies the file is removed and the baseline becomes clean.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 14:48:09 -07:00
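The polling pattern in this commit can be sketched as a loop with three rules:

- Treat `PermissionDenied` from a metadata read as transient.
- Stop on `NotFound`, which means the file is gone.
- Stop at a deadline.

`wait_until_removed`, the 50 ms interval, and the timeout handling are hypothetical choices for illustration.

```rust
use std::io::ErrorKind;
use std::path::Path;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical test helper: wait until `path` no longer exists, treating a
/// transient `PermissionDenied` from a metadata read as "keep polling".
fn wait_until_removed(path: &Path, timeout: Duration) -> Result<(), String> {
    let deadline = Instant::now() + timeout;
    loop {
        match std::fs::metadata(path) {
            // The file is gone: cleanup finished.
            Err(err) if err.kind() == ErrorKind::NotFound => return Ok(()),
            // Windows may deny access while the file is pending deletion; retry.
            Err(err) if err.kind() == ErrorKind::PermissionDenied => {}
            Err(err) => return Err(format!("unexpected metadata error: {err}")),
            // Still present; retry until the deadline.
            Ok(_) => {}
        }
        if Instant::now() >= deadline {
            return Err(format!("{} still present after {timeout:?}", path.display()));
        }
        sleep(Duration::from_millis(50));
    }
}
```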
starr-openai
7cd5127421 Wait for agent shutdown before resume tests reopen IDs
Subscribe before test shutdown and close operations, then wait for the Shutdown status before resuming the same thread IDs. This removes the Windows live-writer race exposed by the full nextest run.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 14:48:09 -07:00
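The subscribe-then-wait ordering matters because a status sent before subscribing would be missed. Below is a minimal sketch that uses a standard-library channel as the subscription. `AgentStatus` and `wait_for_shutdown` are simplified, hypothetical stand-ins for the real agent status types.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical, simplified agent status stream.
#[derive(Debug, Clone, PartialEq)]
enum AgentStatus {
    Running,
    Shutdown,
}

/// Block until `Shutdown` arrives on a receiver that was created (subscribed)
/// *before* shutdown was requested, so the event cannot be missed.
fn wait_for_shutdown(rx: &Receiver<AgentStatus>, timeout: Duration) -> Result<(), String> {
    let deadline = Instant::now() + timeout;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(AgentStatus::Shutdown) => return Ok(()),
            Ok(_) => continue, // ignore intermediate statuses
            Err(RecvTimeoutError::Timeout) => return Err("timed out waiting for Shutdown".into()),
            Err(RecvTimeoutError::Disconnected) => {
                return Err("status stream closed before Shutdown".into())
            }
        }
    }
}
```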
starr-openai
6a2ce743f1 Make Windows realtime shell test use successful cmd echo
Use a Windows command form that exits successfully in constrained CI shells and trim the expected newline in the delegated realtime shell-tool assertion.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 14:48:08 -07:00
starr-openai
32deb67fc6 Harden Windows realtime and agent resume tests
Avoid PowerShell command forms that depend on method invocation for the delegated realtime shell-tool test, and wait for a shutdown status before resuming the same subagent thread in the nickname/role restore test.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 14:48:08 -07:00
starr-openai
59d9e96d66 Use PowerShell literal output in sandbox tests
The legacy sandbox runs PowerShell in constrained language mode, so method calls fail and module-backed cmdlets may not autoload. Use literal string expressions for the PowerShell I/O smoke tests so they exercise process output without depending on cmdlets or method invocation.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 14:48:08 -07:00
starr-openai
097e3ef949 Avoid PowerShell module autoload in sandbox tests
Windows arm64 can launch pwsh in the legacy sandbox while still failing Write-Output because Microsoft.PowerShell.Utility cannot autoload. Use Console output in the legacy PowerShell smoke tests so they continue to verify sandbox process I/O without depending on module autoload.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 14:48:07 -07:00
starr-openai
f3afa1132d Fix rollout cwd fixture import
Import the Windows-aware test_path_buf helper from core_test_support where it is defined.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 14:48:07 -07:00
starr-openai
a666109389 Make rollout cwd fixtures drive-stable on Windows
Dev Drive setup can put temporary Codex homes on D:, which exposed test fixtures that wrote root-relative '/' rollout cwd values while assertions expected the Windows-aware C:\ root helper. Use the same test_path_buf helper when creating and expecting fake rollout cwd values so the tests remain independent of the process temp drive.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 14:48:07 -07:00
starr-openai
16648c8d1c Make realtime sideband failure test deterministic
Use the existing mock server as the sideband failure endpoint instead of relying on an OS-level connection refusal from 127.0.0.1:1. Disable retries in this failure-path test so Windows CI does not spend the default retry budget before emitting the expected error/close events.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 14:48:06 -07:00
starr-openai
7d2c8dbec4 Fix agent job worker assignment race
Claim job items before spawning workers and allow reports to complete unassigned running items, so fast workers cannot lose stop=true reports before the parent records their thread id.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 14:48:06 -07:00
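Below is a minimal sketch of the claim-before-spawn idea as an in-memory state machine:

- An item becomes `Running` before its worker exists.
- The thread id is attached later.
- A report is accepted for any running item, whether or not it has been assigned.

The types and method names are hypothetical; this commit does not show the real job store.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified job-item state.
#[derive(Debug, Clone, PartialEq)]
enum ItemState {
    Pending,
    /// Claimed by the runner; `thread_id` is filled in once the worker is known.
    Running { thread_id: Option<u64> },
    Completed,
}

struct Job {
    items: HashMap<u32, ItemState>,
}

impl Job {
    /// Claim a pending item *before* spawning its worker.
    fn claim(&mut self, item: u32) -> bool {
        match self.items.get_mut(&item) {
            Some(state) if *state == ItemState::Pending => {
                *state = ItemState::Running { thread_id: None };
                true
            }
            _ => false,
        }
    }

    /// Record the worker's thread id after spawn (this can race with `report`).
    fn assign(&mut self, item: u32, thread: u64) {
        if let Some(ItemState::Running { thread_id }) = self.items.get_mut(&item) {
            *thread_id = Some(thread);
        }
    }

    /// Accept a report for any running item, assigned or not, so a fast
    /// worker's report is not dropped before `assign` runs.
    fn report(&mut self, item: u32) -> bool {
        match self.items.get_mut(&item) {
            Some(state) if matches!(*state, ItemState::Running { .. }) => {
                *state = ItemState::Completed;
                true
            }
            _ => false,
        }
    }
}
```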
starr-openai
bfe33e5a7a Make agent job stop cancellation atomic
A worker stop request used to record the item result and job cancellation in separate updates, so the job runner could observe the item completion first and continue spawning pending work. Commit both state updates together and prevent completion from overwriting a final cancellation.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 14:48:05 -07:00
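The fix commits both updates as one atomic step. In this sketch, a single `Mutex` stands in for whatever transaction mechanism the real job store uses. Completion also refuses to overwrite a final cancellation. All names are hypothetical.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq)]
enum JobStatus {
    Running,
    Cancelled,
    Completed,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ItemResult {
    Pending,
    Done,
}

/// Hypothetical job state guarded by one lock so related fields change together.
struct JobState {
    status: JobStatus,
    items: Vec<ItemResult>,
}

struct JobRunner {
    state: Mutex<JobState>,
}

impl JobRunner {
    /// A worker's stop request: record the item result *and* cancel the job in
    /// one critical section, so no observer sees one update without the other.
    fn stop_from_worker(&self, item: usize) {
        let mut state = self.state.lock().unwrap();
        state.items[item] = ItemResult::Done;
        state.status = JobStatus::Cancelled;
    }

    /// Normal completion must not overwrite a final cancellation.
    fn mark_completed(&self) {
        let mut state = self.state.lock().unwrap();
        if state.status == JobStatus::Running {
            state.status = JobStatus::Completed;
        }
    }

    /// The runner checks this before spawning more pending work.
    fn should_spawn_more(&self) -> bool {
        self.state.lock().unwrap().status == JobStatus::Running
    }
}
```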
William Woodruff
8abcc5357d [codex] Fully qualify hash-pins in GitHub Actions (#21436)
This builds on top of https://github.com/openai/codex/pull/15828 by
ensuring that hash-pinned actions with version comments are fully
qualified, rather than referencing floating/mutable comments like "v7".
This makes actions management tools behave more consistently.

This shouldn't break anything, since it's comment-only. But if it does,
ping ww@ 🙂
2026-05-07 14:31:20 -07:00
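Below is a hypothetical lint that illustrates the rule. A compliant `uses:` line pins a 40-character commit SHA, and its version comment names a full `vMAJOR.MINOR.PATCH` tag rather than a floating one such as `v7`. This sketches the check only; the PR does not use such a tool, and the sketch ignores tag shapes such as pre-releases.

```rust
/// Hypothetical lint: true when a workflow `uses:` line pins a 40-character
/// commit SHA *and* its comment names a full `vMAJOR.MINOR.PATCH` tag.
fn is_fully_qualified_pin(line: &str) -> bool {
    // Accept both `uses:` and `- uses:` forms.
    let line = line.trim().trim_start_matches('-').trim();
    let Some(rest) = line.strip_prefix("uses:") else { return false };
    let Some((action, comment)) = rest.split_once('#') else { return false };
    let Some((_, sha)) = action.trim().rsplit_once('@') else { return false };
    let sha_ok = sha.len() == 40 && sha.chars().all(|c| c.is_ascii_hexdigit());
    let version = comment.trim().trim_start_matches('v');
    let parts: Vec<&str> = version.split('.').collect();
    let version_ok = parts.len() == 3
        && parts.iter().all(|p| !p.is_empty() && p.chars().all(|c| c.is_ascii_digit()));
    sha_ok && version_ok
}
```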
Zanie Blue
27ec488ad5 Add a Cargo build profile for benchmarking (#21574)
A clean release build takes ~18m and an incremental build takes ~12m.
This is far too slow to iterate on performance-related changes, and the
build time is dominated by LTO.

This pull request adds a `profiling` profile for Cargo, which takes ~13m
clean and ~6m incremental. The primary change is that LTO is disabled.
This matches a profile used in uv and follows the great work at
https://github.com/astral-sh/uv/pull/5955 — there's a bit of commentary
there about the trade-offs this implies.

We've found that this does not inhibit the ability to accurately
benchmark as measurements with LTO disabled are generally consistent
with the results with LTO enabled and it makes it much faster (~2x) to
rebuild after making a change.

This is motivated by my interest in improving Codex TUI performance,
which is currently blocked by tragically slow builds.

I tested incremental build times by making a no-op change to the
`codex-cli` crate.
2026-05-07 14:30:35 -07:00
Zanie Blue
8367ef4522 Use descriptive names for Cargo profile options (#21582)
These are equivalent and their intent is clearer, e.g., I was confused
if `debug = 1` meant the same thing as `debug = true` (it does not).
2026-05-07 14:19:32 -07:00
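Cargo documents numeric and boolean `debug` values as aliases of named levels, and `1` and `true` are not the same level. The small sketch below encodes those equivalences. Taking the raw TOML text as input, so that quoted strings keep their quotes, is a simplification specific to this example.

```rust
/// Map a Cargo profile `debug` value (raw TOML text) to its descriptive name,
/// following the equivalences in Cargo's profile documentation.
fn descriptive_debug(value: &str) -> Option<&'static str> {
    match value.trim() {
        "0" | "false" | "\"none\"" => Some("none"),
        "\"line-directives-only\"" => Some("line-directives-only"),
        "\"line-tables-only\"" => Some("line-tables-only"),
        "1" | "\"limited\"" => Some("limited"),
        "2" | "true" | "\"full\"" => Some("full"),
        _ => None,
    }
}
```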
iceweasel-oai
163eac9306 Grant sandbox users access to desktop runtime bin (#21564)
## Why

Codex desktop copies bundled Windows binaries out of `WindowsApps` into
a LocalAppData runtime cache before launching `codex.exe`. Sandboxed
commands can then need to execute helpers from that cache, but the
sandbox user group may not have read/execute access to the runtime bin
directory.

This makes the Windows sandbox refresh path repair that access directly
so the packaged desktop runtime remains usable from sandboxed sessions.

## What changed

- Added `setup_runtime_bin` to locate `%LOCALAPPDATA%\OpenAI\Codex\bin`,
matching the desktop bundled-binaries destination path, with the same
`USERPROFILE\AppData\Local` fallback shape.
- During refresh setup, check whether `CodexSandboxUsers` already has
read/execute access to the runtime bin directory.
- If access is missing, grant `CodexSandboxUsers` `OI/CI/RX` inheritance
on that directory.
- If the runtime bin directory does not exist, no-op cleanly.
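Below is a minimal sketch of the directory lookup and the no-op case. The environment values are passed in explicitly so the sketch can be tested off Windows. `runtime_bin_dir` is a hypothetical name, since the actual `setup_runtime_bin` signature is not shown here. The ACL check and grant are omitted.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical lookup: prefer `%LOCALAPPDATA%\OpenAI\Codex\bin`, fall back to
/// `%USERPROFILE%\AppData\Local\OpenAI\Codex\bin`, and return `None` (a clean
/// no-op) when the directory does not exist. It never creates the directory.
fn runtime_bin_dir(local_app_data: Option<&Path>, user_profile: Option<&Path>) -> Option<PathBuf> {
    let local = match (local_app_data, user_profile) {
        (Some(lad), _) => lad.to_path_buf(),
        (None, Some(profile)) => profile.join("AppData").join("Local"),
        (None, None) => return None,
    };
    let bin = local.join("OpenAI").join("Codex").join("bin");
    bin.is_dir().then_some(bin)
}
```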

## Verification

- `cargo build -p codex-windows-sandbox --bin codex-windows-sandbox-setup`
- `cargo test -p codex-windows-sandbox --bin codex-windows-sandbox-setup`
- Manual Windows ACL exercise against the installed packaged runtime
  bin:
  - existing inherited `CodexSandboxUsers:(I)(OI)(CI)(RX)` no-ops without
    changing SDDL
  - after disabling inheritance and removing the group ACE, setup adds
    `CodexSandboxUsers:(OI)(CI)(RX)`
  - with `LOCALAPPDATA` pointed at a fake location without
    `OpenAI\Codex\bin`, setup exits successfully and does not create the
    directory
  - restored the real runtime bin with inherited ACLs and confirmed the
    final SDDL matched the baseline exactly
2026-05-07 11:38:10 -07:00
Tom
4242bba2eb Route ThreadManager rollout path reads through thread store (#21265)
- Route ThreadManager rollout-path resume/fork through ThreadStore
history reads.
- Add in-memory store coverage proving path-addressed reads are used.

This isn't strictly necessary for the ThreadStore migration, since these
ThreadManager methods _only_ work for path-based lookups, but I'm trying
to migrate all the rollout recorder callsites to use the ThreadStore
where possible, for consistency.
2026-05-07 11:25:25 -07:00
Tom
0274398901 [codex] Fix pathless thread summaries (#21266)
## Summary

Fix `getConversationSummary` so thread-id summaries work for stored
threads that do not have a local rollout path, such as remote thread
stores.

The root cause was that `summary_from_stored_thread` returned `None`
when `StoredThread.rollout_path` was absent, and
`get_thread_summary_response_inner` treated that as an internal error.
This made conversation-id lookups depend on a local-only field even
though the thread store can address the thread by id.
2026-05-07 11:18:16 -07:00
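Below is one way to express the fix, assuming the summary can carry an optional path. The summary is built from any stored thread, and a missing `rollout_path` flows through as `None` instead of failing the lookup. These struct definitions are trimmed, hypothetical stand-ins for the real types.

```rust
use std::path::PathBuf;

/// Hypothetical, trimmed-down stored thread.
struct StoredThread {
    id: String,
    rollout_path: Option<PathBuf>,
    preview: String,
}

/// Hypothetical summary type whose path is optional.
#[derive(Debug, PartialEq)]
struct ConversationSummary {
    conversation_id: String,
    path: Option<PathBuf>,
    preview: String,
}

/// Build a summary from any stored thread; a missing local rollout path is
/// carried through as `None`, because the store already addresses the thread by id.
fn summary_from_stored_thread(thread: &StoredThread) -> ConversationSummary {
    ConversationSummary {
        conversation_id: thread.id.clone(),
        path: thread.rollout_path.clone(),
        preview: thread.preview.clone(),
    }
}
```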
Tom
56823ec46b Move thread name edits to ThreadStore (#21264)
- Route live thread renames through `ThreadStore` metadata updates.
- Read resumed thread names from store metadata with legacy local
fallback preserved in the store.
2026-05-07 11:12:22 -07:00
Charlie Marsh
0dc1885a5c Upgrade cargo-shear to 1.11.2 (#21547)
## Summary

Catches a few additional dependencies (`sha2`, `url`) that should be in
`dev-dependencies`.
2026-05-07 11:07:18 -07:00
pakrym-oai
566f2cb612 [codex] Move tool specs onto handlers (#21461)
## Why

This is the next stacked step after deleting the tool-handler kind
indirection. Specs should come from the registered handlers themselves
so registry construction has a single source of truth for handler
behavior and exposed tool definitions.

## What changed

- Added `ToolHandler::spec()` plus handler-provided parallel/code-mode
metadata, and made `ToolRegistryBuilder::register_handler` automatically
collect specs from registered handlers.
- Moved builtin tool spec construction into the corresponding handlers
and their adjacent `_spec` modules, including shell, unified exec, apply
patch, view image, request plugin install, tool search, MCP resource,
goals, planning, permissions, agent jobs, and multi-agent tools.
- Reworked configurable handlers to receive their tool-building options
through constructors, with non-optional handler options where the
handler is always spec-backed. Shell fallback handlers keep an explicit
no-spec mode because they are also registered as hidden dispatch
aliases.
- Kept `CodeModeExecuteHandler` on the explicit configured wrapper so
the code-mode exec spec can still be built from the nested registry.
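Below is a minimal sketch of handler-provided specs. Registering a handler also collects its spec. A handler in no-spec mode is still registered for dispatch but contributes nothing to the exposed definitions. The trait and builder are simplified, hypothetical versions, and the real `ToolHandler::spec()` and `ToolRegistryBuilder::register_handler` signatures may differ.

```rust
/// Hypothetical, simplified tool spec.
#[derive(Debug, Clone, PartialEq)]
struct ToolSpec {
    name: String,
    description: String,
}

trait ToolHandler {
    /// The name the handler is dispatched under.
    fn name(&self) -> &str;
    /// The model-visible spec; `None` for hidden dispatch aliases.
    fn spec(&self) -> Option<ToolSpec>;
}

#[derive(Default)]
struct ToolRegistryBuilder {
    handlers: Vec<Box<dyn ToolHandler>>,
    specs: Vec<ToolSpec>,
}

impl ToolRegistryBuilder {
    /// Registering a handler also collects its spec, so the handler is the single
    /// source of truth for both dispatch and the exposed definition.
    fn register_handler(&mut self, handler: Box<dyn ToolHandler>) {
        if let Some(spec) = handler.spec() {
            self.specs.push(spec);
        }
        self.handlers.push(handler);
    }
}

/// Hypothetical shell handler with an explicit no-spec mode.
struct ShellHandler {
    expose_spec: bool,
}

impl ToolHandler for ShellHandler {
    fn name(&self) -> &str {
        "shell"
    }
    fn spec(&self) -> Option<ToolSpec> {
        // In no-spec mode the same handler serves only as a hidden alias.
        self.expose_spec.then(|| ToolSpec {
            name: "shell".into(),
            description: "Run a shell command".into(),
        })
    }
}
```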

## Verification

- `cargo check -p codex-core`
- `cargo test -p codex-core tools::spec_plan::tests`
- `cargo test -p codex-core tools::spec::tests`
- `cargo test -p codex-core tools::handlers::multi_agents_spec::tests`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core tools::handlers::multi_agents::tests`
- `cargo test -p codex-core tools::handlers::apply_patch::tests`
- `cargo test -p codex-core tools::handlers::unified_exec::tests`
- `just fix -p codex-core`
- `git diff --check`
2026-05-07 10:48:36 -07:00
jif-oai
eb0462f2af app-server: refresh live threads from latest config snapshot (#21187)
## Why

App-server config writes were leaving existing threads partially stale.
After a config mutation, the app-server told each live thread to run
`Op::ReloadUserConfig`, but that path only re-read the user
`config.toml` layer. Settings that came from the app-server's
materialized config snapshot did not propagate to existing threads until
restart.

This change also avoids a filesystem access from `core` for CCA.

## What changed

- add `CodexThread::refresh_runtime_config()` and
`Session::refresh_runtime_config()` so the app-server can push a freshly
rebuilt config snapshot into a live thread
- rebuild the latest config with each thread's `cwd` after config
mutations, then refresh the thread from that snapshot instead of asking
it to reload only `config.toml`
- keep session-static settings unchanged during refresh, while updating
runtime-refreshable state such as the config layer stack,
`tool_suggest`, and derived hook/plugin/skill state
- keep `reload_user_config_layer()` as the file-backed fallback for
legacy local reload flows, but route the shared refresh logic through
the new runtime refresh path
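Below is a minimal sketch of the split between session-static and runtime-refreshable settings. The field types (for example, `tool_suggest` as a `bool`) and the struct layout are assumptions. The real `Session::refresh_runtime_config()` also rebuilds the config layer stack and hook/plugin/skill state, which this sketch omits.

```rust
/// Hypothetical config snapshot mixing session-static fields (`model`,
/// `notify`) with a runtime-refreshable one (`tool_suggest`).
#[derive(Debug, Clone, PartialEq)]
struct ConfigSnapshot {
    model: String,
    notify: Option<Vec<String>>,
    tool_suggest: bool,
}

struct Session {
    config: ConfigSnapshot,
}

impl Session {
    /// Apply a freshly rebuilt snapshot, updating only runtime-refreshable
    /// state and keeping session-static settings as they were at startup.
    fn refresh_runtime_config(&mut self, latest: &ConfigSnapshot) {
        self.config.tool_suggest = latest.tool_suggest;
        // `model` and `notify` are session-static: intentionally not copied.
    }
}
```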

## Testing

- add a session test that verifies `refresh_runtime_config()` rebuilds
hooks from refreshed config
- add a session test that verifies runtime-refreshable fields update
while session-static settings like `model` and `notify` stay unchanged

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 19:22:04 +02:00
Owen Lin
129401df43 add top-level remote-control command (#21424)
## Summary

`codex --enable remote_control app-server --listen off` is the current
way to start a headless, remote-controllable app-server, but it is hard
to remember and exposes implementation details.

This adds `codex remote-control` as a friendly top-level wrapper for
that flow. The command starts a foreground app-server with local
transports disabled and enables `remote_control` only for that
invocation.

## Changes

- Add a visible `codex remote-control` CLI subcommand.
- Launch app-server with `AppServerTransport::Off`.
- Append `features.remote_control=true` after root feature toggles so
the explicit command wins over `--disable remote_control`.
- Reject root `--remote` / `--remote-auth-token-env`, matching other
non-TUI subcommands.
- Add tests for parsing, launch defaults, override ordering, and remote
flag rejection.
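The ordering rule works because later overrides win. The minimal sketch below assumes root feature toggles are represented as `key=value` overrides, so that `--disable remote_control` yields `features.remote_control=false`. Both helpers are hypothetical.

```rust
/// Hypothetical resolver: among `key=value` overrides, the last matching entry wins.
fn resolve_feature(overrides: &[String], key: &str) -> Option<bool> {
    overrides
        .iter()
        .filter_map(|o| o.split_once('='))
        .filter(|(k, _)| *k == key)
        .filter_map(|(_, v)| v.parse::<bool>().ok())
        .last()
}

/// Hypothetical: build the override list for `codex remote-control` by
/// appending its own toggle *after* the user's root toggles.
fn remote_control_overrides(root_toggles: &[String]) -> Vec<String> {
    let mut overrides = root_toggles.to_vec();
    overrides.push("features.remote_control=true".to_string());
    overrides
}
```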

## Verification

- `cargo test -p codex-cli`
- `just fix -p codex-cli`
2026-05-07 10:17:07 -07:00
pakrym-oai
857e731478 [codex] Remove string-keyed MCP tool maps (#21454)
## Summary

This PR removes the synthetic `HashMap<String, ToolInfo>` keys from MCP
tool discovery. `McpConnectionManager::list_all_tools()` now returns
normalized `Vec<ToolInfo>`, and downstream code derives identity from
`ToolInfo::canonical_tool_name()`.

The motivation is to keep model-visible tool identity on
`ToolName`/`ToolInfo` instead of parallel string map keys, so future
namespace changes do not have to preserve otherwise-unused lookup keys.

## Changes

- Rename the MCP normalization path from `qualify_tools` to
`normalize_tools_for_model` and return tool values directly.
- Flow MCP tool lists through connectors, plugin injection, router/spec
building, code mode, and tool search as vectors/slices.
- Keep direct/deferred subtraction local to `mcp_tool_exposure`, using
`ToolName` values.
- Update tests to compare `ToolName` instances where MCP identity
matters.
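Below is a minimal sketch of deriving identity from the tool value itself. Tools are kept in a `Vec`, and the direct set is computed by subtracting deferred `ToolName` values rather than string keys. `ToolName` and `ToolInfo` here are simplified, hypothetical shapes, not the crate's real definitions.

```rust
use std::collections::HashSet;

/// Hypothetical structured tool identity.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ToolName {
    namespace: Option<String>,
    name: String,
}

/// Hypothetical, trimmed-down MCP tool description.
#[derive(Debug, Clone)]
struct ToolInfo {
    server: String,
    tool: String,
}

impl ToolInfo {
    /// Identity comes from the tool itself, not from a parallel map key.
    fn canonical_tool_name(&self) -> ToolName {
        ToolName { namespace: Some(self.server.clone()), name: self.tool.clone() }
    }
}

/// Directly exposed tools = all tools minus deferred ones, compared by `ToolName`.
fn direct_tools(all: &[ToolInfo], deferred: &[ToolName]) -> Vec<ToolInfo> {
    let deferred: HashSet<&ToolName> = deferred.iter().collect();
    all.iter()
        .filter(|t| !deferred.contains(&t.canonical_tool_name()))
        .cloned()
        .collect()
}
```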

## Validation

- `cargo test -p codex-mcp test_normalize_tools`
- `cargo test -p codex-core mcp_tool_exposure`
- `cargo test -p codex-core direct_mcp_tools_register_namespaced_handlers`
- `cargo test -p codex-core search_tool_registers_namespaced_mcp_tool_aliases`
- `just fix -p codex-mcp`
- `just fix -p codex-core`
2026-05-07 10:16:10 -07:00
xl-openai
114bac1409 feat: Expose plugin share metadata in shareContext (#21495)
Extends PluginSummary.shareContext with shareUrl and reader shareTargets
2026-05-07 10:07:03 -07:00
rhan-oai
3444b0d60a [codex-analytics] add tool review event schema (#18747)
## Why

We want to emit terminal review analytics for tool-related approval
flows, but the event contract needs to exist before the reducer can
publish anything.

This PR is the schema-only slice for the Codex review event family.

## What changed

- add the `ReviewEvent` analytics envelope in
`codex-rs/analytics/src/events.rs`
- define the review subject kind, reviewer, trigger, terminal status,
and post-review resolution enums
- define the review event payload with thread, turn, item, lineage,
tool, and timing fields that the emitter stack will populate

## Verification

- stacked verification in dependent PRs: `cargo test -p codex-analytics analytics_client_tests --manifest-path codex-rs/Cargo.toml`

---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18747).
* #18748
* #21434
* __->__ #18747
* #17090
* #17089
* #20514
2026-05-07 09:46:46 -07:00