Compare commits

...

19 Commits

Author SHA1 Message Date
starr-openai
ff0c9ce47d Use absolute sccache wrapper path in Rust CI
Set both RUSTC_WRAPPER and CARGO_BUILD_RUSTC_WRAPPER from the resolved sccache binary path so Cargo and nextest do not rely on shell PATH lookup on Windows.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:51 -07:00
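The approach in this commit can be sketched in isolation: resolve the wrapper binary to an absolute path once, check that it really is absolute, then export it for child processes. This is a hedged sketch, not the workflow step itself; the hypothetical `resolve_wrapper` helper and the use of `ls` as a stand-in for `sccache` are illustrative only.

```shell
# Resolve a tool name to an absolute path once, instead of letting every
# child process repeat the PATH lookup. `ls` stands in for sccache here.
resolve_wrapper() {
  command -v "$1" || return 1
}
wrapper="$(resolve_wrapper ls)" || { echo "tool not found" >&2; exit 1; }
case "$wrapper" in
  /*) ;;  # already absolute, nothing to do
  *)  echo "unexpected relative path: $wrapper" >&2; exit 1 ;;
esac
export RUSTC_WRAPPER="$wrapper"
export CARGO_BUILD_RUSTC_WRAPPER="$wrapper"
```

On Windows runners the real step additionally converts the result with `cygpath -w` so Cargo sees a native path.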
starr-openai
f361fc8438 Tolerate transient Windows metadata denial in memory startup test
Keep polling when Windows temporarily denies metadata reads while the phase 2 memory workspace is being cleaned up, so the test still verifies the file is removed and the baseline becomes clean.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:51 -07:00
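The polling pattern behind this fix is: treat a transient metadata failure as "file still present" and keep looping until a deadline, rather than aborting the test. A minimal POSIX sketch with a hypothetical `wait_for_removed` helper (the real Rust test maps only `PermissionDenied` this way):

```shell
# Poll until a path disappears; a transient stat failure would be treated
# as "still present" rather than a test failure (hypothetical helper).
wait_for_removed() {
  path="$1"
  deadline=$(( $(date +%s) + 10 ))
  while [ "$(date +%s)" -le "$deadline" ]; do
    if [ ! -e "$path" ]; then
      return 0  # gone: success
    fi
    sleep 1
  done
  return 1      # still present when the deadline passed
}
```

A nonexistent path returns immediately: `wait_for_removed /tmp/no-such-file && echo removed`.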
starr-openai
ce151d1911 Fix agent job worker assignment race
Claim job items before spawning workers and allow reports to complete unassigned running items, so fast workers cannot lose stop=true reports before the parent records their thread id.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:51 -07:00
starr-openai
df6b08791b Wait for agent shutdown before resume tests reopen IDs
Subscribe before test shutdown and close operations, then wait for the Shutdown status before resuming the same thread IDs. This removes the Windows live-writer race exposed by the full nextest run.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:51 -07:00
starr-openai
6c94e20284 Make Windows realtime shell test use successful cmd echo
Use a Windows command form that exits successfully in constrained CI shells and trim the expected newline in the delegated realtime shell-tool assertion.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:51 -07:00
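Why trim? `cmd.exe /c echo` emits a trailing CRLF, so the assertion compares trimmed output instead of encoding platform line endings into the expected value. A POSIX sketch of the same comparison, with the CRLF simulated via `printf`:

```shell
# Simulate Windows echo output, which carries a trailing CRLF; command
# substitution strips the final newline but keeps the carriage return.
raw="$(printf 'realtime-tool-ok\r\n')"
trimmed="$(printf '%s' "$raw" | tr -d '\r\n')"
test "$trimmed" = "realtime-tool-ok" && echo "match"
```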
starr-openai
c36e625d57 Harden Windows realtime and agent resume tests
Avoid PowerShell command forms that depend on method invocation for the delegated realtime shell-tool test, and wait for a shutdown status before resuming the same subagent thread in the nickname/role restore test.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:50 -07:00
starr-openai
e22744c86e Use PowerShell literal output in sandbox tests
The legacy sandbox runs PowerShell in constrained language mode, so method calls fail and module-backed cmdlets may not autoload. Use literal string expressions for the PowerShell I/O smoke tests so they exercise process output without depending on cmdlets or method invocation.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:50 -07:00
starr-openai
7ca9eaff79 Avoid PowerShell module autoload in sandbox tests
Windows arm64 can launch pwsh in the legacy sandbox while still failing Write-Output because Microsoft.PowerShell.Utility cannot autoload. Use Console output in the legacy PowerShell smoke tests so they continue to verify sandbox process I/O without depending on module autoload.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:50 -07:00
starr-openai
f807263fee Make agent job stop cancellation atomic
A worker stop request used to record the item result and job cancellation in separate updates, so the job runner could observe the item completion first and continue spawning pending work. Commit both state updates together and prevent completion from overwriting a final cancellation.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:50 -07:00
starr-openai
cbb91954c3 Fix rollout cwd fixture import
Import the Windows-aware test_path_buf helper from core_test_support where it is defined.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:50 -07:00
starr-openai
02fde39450 Make rollout cwd fixtures drive-stable on Windows
Dev Drive setup can put temporary Codex homes on D:, which exposed test fixtures that wrote root-relative '/' rollout cwd values while assertions expected the Windows-aware C:\ root helper. Use the same test_path_buf helper when creating and expecting fake rollout cwd values so the tests remain independent of the process temp drive.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:50 -07:00
starr-openai
ef15d395b1 Give Windows arm64 tests enough CI time
Let the Windows arm64 test matrix use a longer timeout after CI showed the lane spending most of the default 45 minutes compiling before nextest could finish. Restore a guarded Dev Drive VHD provisioning attempt for GitHub-hosted runners that no longer expose D:, while preserving the C: fallback when provisioning is unavailable.

Also pin nextest through taiki-e/install-action's supported tool version syntax so the requested version is not ignored.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:50 -07:00
starr-openai
c86d50a290 Fix Windows sccache startup in rust-ci-full
Pin sccache through taiki-e/install-action's supported tool spec and explicitly start the server on Windows before Cargo/nextest invokes rustc wrappers in parallel. This avoids the Windows socket bind race observed after enabling sccache in rust-ci-full.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:49 -07:00
starr-openai
5baab2aade Make realtime sideband failure test deterministic
Use the existing mock server as the sideband failure endpoint instead of relying on an OS-level connection refusal from 127.0.0.1:1. Disable retries in this failure-path test so Windows CI does not spend the default retry budget before emitting the expected error/close events.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:49 -07:00
starr-openai
377d163d0a Serialize Windows process-heavy nextest cases
Windows rust-ci-full repeatedly times out in subprocess-heavy tests even when the global nextest thread count is capped. Isolate the recurring Windows-only families with nextest overrides so the rest of the suite can keep normal parallelism.

Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:49 -07:00
starr-openai
2b9f05a0f0 Add Windows nextest thread override for rust-ci-full
Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:49 -07:00
starr-openai
b9ba5d92c6 Fix Windows Dev Drive step path in rust-ci-full
Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:49 -07:00
starr-openai
42e099aff6 Make Windows Dev Drive setup use existing volumes
Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:49 -07:00
starr-openai
fb4c0f24cc Enable Windows Dev Drive and sccache in CI
Co-authored-by: Codex <noreply@openai.com>
2026-05-07 12:16:49 -07:00
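Several of these commits hand values between workflow steps through the file named by `GITHUB_ENV`, which GitHub Actions reads back as environment variables for all later steps in the job. The mechanism, sketched against a temp file standing in for the real one:

```shell
# GitHub Actions exports any KEY=VALUE lines appended to $GITHUB_ENV to
# subsequent steps; a temp file stands in for the runner-provided one.
GITHUB_ENV="$(mktemp)"
{
  printf '%s\n' 'DEV_DRIVE=D:'
  printf '%s\n' 'TMP=D:\codex-tmp'
} >> "$GITHUB_ENV"
grep -c '=' "$GITHUB_ENV"  # → 2
```

This is why the Dev Drive script can run in one step while the sccache and Bazel steps consume `DEV_DRIVE` later.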
14 changed files with 517 additions and 94 deletions

View File

@@ -35,6 +35,11 @@ runs:
- name: Set up Bazel
uses: bazel-contrib/setup-bazel@c5acdfb288317d0b5c0bbd7a396a3dc868bb0f86 # 0.19.0
- name: Configure Dev Drive (Windows)
if: runner.os == 'Windows'
shell: pwsh
run: ./.github/scripts/setup-dev-drive.ps1
- name: Configure Bazel repository cache
id: configure_bazel_repository_cache
shell: pwsh
@@ -42,7 +47,12 @@ runs:
# Keep the repository cache under HOME on all runners. Windows `D:\a`
# cache paths match `.bazelrc`, but `actions/cache/restore` currently
# returns HTTP 400 for that path in the Windows clippy job.
$repositoryCachePath = Join-Path $HOME '.cache/bazel-repo-cache'
$cacheRoot = if ($env:RUNNER_OS -eq 'Windows' -and $env:DEV_DRIVE) {
$env:DEV_DRIVE
} else {
$HOME
}
$repositoryCachePath = Join-Path $cacheRoot '.cache/bazel-repo-cache'
"repository-cache-path=$repositoryCachePath" | Out-File -FilePath $env:GITHUB_OUTPUT -Encoding utf8 -Append
"BAZEL_REPOSITORY_CACHE=$repositoryCachePath" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
@@ -50,11 +60,10 @@ runs:
if: runner.os == 'Windows'
shell: pwsh
run: |
# Use the shortest available drive to reduce argv/path length issues,
# but avoid the drive root because some Windows test launchers mis-handle
# MANIFEST paths there.
$hasDDrive = Test-Path 'D:\'
$bazelOutputUserRoot = if ($hasDDrive) { 'D:\b' } else { 'C:\b' }
# Keep Bazel on the fast Windows work drive, but avoid the drive root
# because some Windows test launchers mis-handle MANIFEST paths there.
$driveRoot = if ($env:DEV_DRIVE) { $env:DEV_DRIVE } elseif (Test-Path 'D:\') { 'D:' } else { 'C:' }
$bazelOutputUserRoot = Join-Path $driveRoot 'b'
$repoContentsCache = Join-Path $env:RUNNER_TEMP "bazel-repo-contents-cache-$env:GITHUB_RUN_ID-$env:GITHUB_JOB"
"BAZEL_OUTPUT_USER_ROOT=$bazelOutputUserRoot" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
"BAZEL_REPO_CONTENTS_CACHE=$repoContentsCache" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append

.github/scripts/setup-dev-drive.ps1 (new file, 62 lines)
View File

@@ -0,0 +1,62 @@
# Configure a fast drive for Windows CI jobs.
#
# GitHub-hosted Windows runners do not always expose a secondary D: volume. When
# they do not, try to create a Dev Drive VHD and fall back to C: if the runner
# image does not allow that provisioning path.
function Use-FallbackDrive {
param([string]$Reason)
Write-Warning "$Reason Falling back to C:"
return "C:"
}
function Invoke-BestEffort {
param([scriptblock]$Script, [string]$Description)
try {
& $Script
} catch {
Write-Warning "$Description failed: $($_.Exception.Message)"
}
}
if (Test-Path "D:\") {
Write-Output "Using existing drive at D:"
$Drive = "D:"
} else {
try {
$VhdPath = Join-Path $env:RUNNER_TEMP "codex-dev-drive.vhdx"
$SizeBytes = 64GB
if (Test-Path $VhdPath) {
Remove-Item -Path $VhdPath -Force
}
New-VHD -Path $VhdPath -SizeBytes $SizeBytes -Dynamic -ErrorAction Stop | Out-Null
$Mounted = Mount-VHD -Path $VhdPath -Passthru -ErrorAction Stop
$Disk = $Mounted | Get-Disk -ErrorAction Stop
$Disk | Initialize-Disk -PartitionStyle GPT -ErrorAction Stop
$Partition = $Disk | New-Partition -AssignDriveLetter -UseMaximumSize -ErrorAction Stop
$Volume = $Partition | Format-Volume -FileSystem ReFS -NewFileSystemLabel "CodexDevDrive" -DevDrive -Confirm:$false -Force -ErrorAction Stop
$Drive = "$($Volume.DriveLetter):"
Invoke-BestEffort { fsutil devdrv trust $Drive } "Trusting Dev Drive $Drive"
Invoke-BestEffort { fsutil devdrv enable /disallowAv } "Disabling AV filter attachment for Dev Drives"
Invoke-BestEffort { fsutil devdrv query $Drive } "Querying Dev Drive $Drive"
Write-Output "Using Dev Drive at $Drive"
} catch {
$Drive = Use-FallbackDrive "Failed to create Dev Drive: $($_.Exception.Message)"
}
}
$Tmp = "$Drive\codex-tmp"
New-Item -Path $Tmp -ItemType Directory -Force | Out-Null
@(
"DEV_DRIVE=$Drive"
"TMP=$Tmp"
"TEMP=$Tmp"
) | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append

View File

@@ -1,10 +1,21 @@
name: rust-ci-full
run-name: >-
rust-ci-full${{
github.event_name == 'workflow_dispatch' &&
format(' windows-nextest-{0}', inputs.windows_nextest_threads) ||
''
}}
on:
push:
branches:
- main
- "**full-ci**"
workflow_dispatch:
inputs:
windows_nextest_threads:
description: "Optional nextest --test-threads override for Windows test jobs"
required: false
type: string
# CI builds in debug (dev) for faster signal.
@@ -147,7 +158,7 @@ jobs:
# Speed up repeated builds across CI runs by caching compiled objects, except on
# arm64 macOS runners cross-targeting x86_64 where ring/cc-rs can produce
# mixed-architecture archives under sccache.
USE_SCCACHE: ${{ (startsWith(matrix.runner, 'windows') || (matrix.runner == 'macos-15-xlarge' && matrix.target == 'x86_64-apple-darwin')) && 'false' || 'true' }}
USE_SCCACHE: ${{ (matrix.runner == 'macos-15-xlarge' && matrix.target == 'x86_64-apple-darwin') && 'false' || 'true' }}
CARGO_INCREMENTAL: "0"
SCCACHE_CACHE_SIZE: 10G
# In rust-ci, representative release-profile checks use thin LTO for faster feedback.
@@ -234,6 +245,10 @@ jobs:
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- name: Configure Dev Drive (Windows)
if: ${{ runner.os == 'Windows' }}
shell: pwsh
run: ../.github/scripts/setup-dev-drive.ps1
- name: Install Linux build dependencies
if: ${{ runner.os == 'Linux' }}
shell: bash
@@ -296,8 +311,7 @@ jobs:
if: ${{ env.USE_SCCACHE == 'true' }}
uses: taiki-e/install-action@44c6d64aa62cd779e873306675c7a58e86d6d532 # v2
with:
tool: sccache
version: 0.7.5
tool: sccache@0.14.0
- name: Configure sccache backend
if: ${{ env.USE_SCCACHE == 'true' }}
@@ -309,32 +323,52 @@ jobs:
echo "Using sccache GitHub backend"
else
echo "SCCACHE_GHA_ENABLED=false" >> "$GITHUB_ENV"
echo "SCCACHE_DIR=${{ github.workspace }}/.sccache" >> "$GITHUB_ENV"
if [[ -n "${DEV_DRIVE:-}" ]]; then
echo "SCCACHE_DIR=${DEV_DRIVE}\\.sccache" >> "$GITHUB_ENV"
else
echo "SCCACHE_DIR=${{ github.workspace }}/.sccache" >> "$GITHUB_ENV"
fi
echo "Using sccache local disk + actions/cache fallback"
fi
- name: Enable sccache wrapper
if: ${{ env.USE_SCCACHE == 'true' }}
shell: bash
run: echo "RUSTC_WRAPPER=sccache" >> "$GITHUB_ENV"
run: |
set -euo pipefail
wrapper="$(command -v sccache)"
if [[ "${RUNNER_OS}" == "Windows" ]] && command -v cygpath >/dev/null 2>&1; then
wrapper="$(cygpath -w "${wrapper}")"
fi
echo "RUSTC_WRAPPER=${wrapper}" >> "$GITHUB_ENV"
echo "CARGO_BUILD_RUSTC_WRAPPER=${wrapper}" >> "$GITHUB_ENV"
- name: Restore sccache cache (fallback)
if: ${{ env.USE_SCCACHE == 'true' && env.SCCACHE_GHA_ENABLED != 'true' }}
id: cache_sccache_restore
uses: actions/cache/restore@668228422ae6a00e4ad889ee87cd7109ec5666a7 # v5
with:
path: ${{ github.workspace }}/.sccache/
path: ${{ env.SCCACHE_DIR }}
key: sccache-${{ matrix.runner }}-${{ matrix.target }}-${{ matrix.profile }}-${{ steps.lockhash.outputs.hash }}-${{ github.run_id }}
restore-keys: |
sccache-${{ matrix.runner }}-${{ matrix.target }}-${{ matrix.profile }}-${{ steps.lockhash.outputs.hash }}-
sccache-${{ matrix.runner }}-${{ matrix.target }}-${{ matrix.profile }}-
- name: Start sccache server (Windows)
if: ${{ env.USE_SCCACHE == 'true' && runner.os == 'Windows' }}
shell: bash
run: |
set -euo pipefail
sccache --start-server
sccache --show-stats
- if: ${{ matrix.target == 'x86_64-unknown-linux-musl' || matrix.target == 'aarch64-unknown-linux-musl'}}
name: Disable sccache wrapper (musl)
shell: bash
run: |
set -euo pipefail
echo "RUSTC_WRAPPER=" >> "$GITHUB_ENV"
echo "CARGO_BUILD_RUSTC_WRAPPER=" >> "$GITHUB_ENV"
echo "RUSTC_WORKSPACE_WRAPPER=" >> "$GITHUB_ENV"
- if: ${{ matrix.target == 'x86_64-unknown-linux-musl' || matrix.target == 'aarch64-unknown-linux-musl'}}
@@ -478,7 +512,7 @@ jobs:
continue-on-error: true
uses: actions/cache/save@668228422ae6a00e4ad889ee87cd7109ec5666a7 # v5
with:
path: ${{ github.workspace }}/.sccache/
path: ${{ env.SCCACHE_DIR }}
key: sccache-${{ matrix.runner }}-${{ matrix.target }}-${{ matrix.profile }}-${{ steps.lockhash.outputs.hash }}-${{ github.run_id }}
- name: sccache stats
@@ -510,10 +544,10 @@ jobs:
tests:
name: Tests — ${{ matrix.runner }} - ${{ matrix.target }}${{ matrix.remote_env == 'true' && ' (remote)' || '' }}
runs-on: ${{ matrix.runs_on || matrix.runner }}
# Perhaps we can bring this back down to 30m once we finish the cutover
# from tui_app_server/ to tui/. Incidentally, windows-arm64 was the main
# offender for exceeding the timeout.
timeout-minutes: 45
# Perhaps we can bring this back down once we finish the cutover from
# tui_app_server/ to tui/. Incidentally, windows-arm64 was the main offender
# for exceeding the timeout.
timeout-minutes: ${{ matrix.timeout_minutes || 45 }}
defaults:
run:
working-directory: codex-rs
@@ -521,9 +555,10 @@ jobs:
# Speed up repeated builds across CI runs by caching compiled objects, except on
# arm64 macOS runners cross-targeting x86_64 where ring/cc-rs can produce
# mixed-architecture archives under sccache.
USE_SCCACHE: ${{ (startsWith(matrix.runner, 'windows') || (matrix.runner == 'macos-15-xlarge' && matrix.target == 'x86_64-apple-darwin')) && 'false' || 'true' }}
USE_SCCACHE: ${{ (matrix.runner == 'macos-15-xlarge' && matrix.target == 'x86_64-apple-darwin') && 'false' || 'true' }}
CARGO_INCREMENTAL: "0"
SCCACHE_CACHE_SIZE: 10G
WINDOWS_NEXTEST_THREADS: ${{ github.event_name == 'workflow_dispatch' && inputs.windows_nextest_threads || '' }}
strategy:
fail-fast: false
@@ -554,12 +589,17 @@ jobs:
- runner: windows-arm64
target: aarch64-pc-windows-msvc
profile: dev
timeout_minutes: 75
runs_on:
group: codex-runners
labels: codex-windows-arm64
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- name: Configure Dev Drive (Windows)
if: ${{ runner.os == 'Windows' }}
shell: pwsh
run: ../.github/scripts/setup-dev-drive.ps1
- name: Install Linux build dependencies
if: ${{ runner.os == 'Linux' }}
shell: bash
@@ -605,8 +645,7 @@ jobs:
if: ${{ env.USE_SCCACHE == 'true' }}
uses: taiki-e/install-action@44c6d64aa62cd779e873306675c7a58e86d6d532 # v2
with:
tool: sccache
version: 0.7.5
tool: sccache@0.14.0
- name: Configure sccache backend
if: ${{ env.USE_SCCACHE == 'true' }}
@@ -618,30 +657,48 @@ jobs:
echo "Using sccache GitHub backend"
else
echo "SCCACHE_GHA_ENABLED=false" >> "$GITHUB_ENV"
echo "SCCACHE_DIR=${{ github.workspace }}/.sccache" >> "$GITHUB_ENV"
if [[ -n "${DEV_DRIVE:-}" ]]; then
echo "SCCACHE_DIR=${DEV_DRIVE}\\.sccache" >> "$GITHUB_ENV"
else
echo "SCCACHE_DIR=${{ github.workspace }}/.sccache" >> "$GITHUB_ENV"
fi
echo "Using sccache local disk + actions/cache fallback"
fi
- name: Enable sccache wrapper
if: ${{ env.USE_SCCACHE == 'true' }}
shell: bash
run: echo "RUSTC_WRAPPER=sccache" >> "$GITHUB_ENV"
run: |
set -euo pipefail
wrapper="$(command -v sccache)"
if [[ "${RUNNER_OS}" == "Windows" ]] && command -v cygpath >/dev/null 2>&1; then
wrapper="$(cygpath -w "${wrapper}")"
fi
echo "RUSTC_WRAPPER=${wrapper}" >> "$GITHUB_ENV"
echo "CARGO_BUILD_RUSTC_WRAPPER=${wrapper}" >> "$GITHUB_ENV"
- name: Restore sccache cache (fallback)
if: ${{ env.USE_SCCACHE == 'true' && env.SCCACHE_GHA_ENABLED != 'true' }}
id: cache_sccache_restore
uses: actions/cache/restore@668228422ae6a00e4ad889ee87cd7109ec5666a7 # v5
with:
path: ${{ github.workspace }}/.sccache/
path: ${{ env.SCCACHE_DIR }}
key: sccache-${{ matrix.runner }}-${{ matrix.target }}-${{ matrix.profile }}-${{ steps.lockhash.outputs.hash }}-${{ github.run_id }}
restore-keys: |
sccache-${{ matrix.runner }}-${{ matrix.target }}-${{ matrix.profile }}-${{ steps.lockhash.outputs.hash }}-
sccache-${{ matrix.runner }}-${{ matrix.target }}-${{ matrix.profile }}-
- name: Start sccache server (Windows)
if: ${{ env.USE_SCCACHE == 'true' && runner.os == 'Windows' }}
shell: bash
run: |
set -euo pipefail
sccache --start-server
sccache --show-stats
- uses: taiki-e/install-action@44c6d64aa62cd779e873306675c7a58e86d6d532 # v2
with:
tool: nextest
version: 0.9.103
tool: nextest@0.9.103
- name: Enable unprivileged user namespaces (Linux)
if: runner.os == 'Linux'
@@ -666,7 +723,19 @@ jobs:
- name: tests
id: test
run: cargo nextest run --no-fail-fast --target ${{ matrix.target }} --cargo-profile ci-test --timings
shell: bash
run: |
set -euo pipefail
nextest_args=(
--no-fail-fast
--target "${{ matrix.target }}"
--cargo-profile ci-test
--timings
)
if [[ "${{ runner.os }}" == "Windows" && -n "${WINDOWS_NEXTEST_THREADS}" ]]; then
nextest_args+=(--test-threads "${WINDOWS_NEXTEST_THREADS}")
fi
cargo nextest run "${nextest_args[@]}"
env:
RUST_BACKTRACE: 1
RUST_MIN_STACK: "8388608" # 8 MiB
@@ -697,7 +766,7 @@ jobs:
continue-on-error: true
uses: actions/cache/save@668228422ae6a00e4ad889ee87cd7109ec5666a7 # v5
with:
path: ${{ github.workspace }}/.sccache/
path: ${{ env.SCCACHE_DIR }}
key: sccache-${{ matrix.runner }}-${{ matrix.target }}-${{ matrix.profile }}-${{ steps.lockhash.outputs.hash }}-${{ github.run_id }}
- name: sccache stats

View File

@@ -14,6 +14,9 @@ max-threads = 1
[test-groups.windows_sandbox_legacy_sessions]
max-threads = 1
[test-groups.windows_process_heavy]
max-threads = 1
[[profile.default.overrides]]
# Do not add new tests here
filter = 'test(rmcp_client) | test(humanlike_typing_1000_chars_appears_live_no_placeholder)'
@@ -27,6 +30,41 @@ slow-timeout = { period = "30s", terminate-after = 2 }
filter = 'package(codex-app-server-protocol) & (test(typescript_schema_fixtures_match_generated) | test(json_schema_fixtures_match_generated) | test(generate_ts_with_experimental_api_retains_experimental_entries) | test(generated_ts_optional_nullable_fields_only_in_params) | test(generate_json_filters_experimental_fields_and_methods))'
test-group = 'app_server_protocol_codegen'
[[profile.default.overrides]]
# These Windows CI tests launch full Codex/app-server process trees. They have
# repeatedly timed out when nextest schedules them alongside similar tests.
platform = 'cfg(windows)'
filter = 'package(codex-core) & kind(test) & (test(cli_stream) | test(realtime_conversation))'
test-group = 'windows_process_heavy'
threads-required = "num-test-threads"
slow-timeout = { period = "1m", terminate-after = 4 }
[[profile.default.overrides]]
# The exec resume tests spawn the CLI and touch shared session state on disk.
platform = 'cfg(windows)'
filter = 'package(codex-exec) & kind(test) & test(exec_resume)'
test-group = 'windows_process_heavy'
threads-required = "num-test-threads"
slow-timeout = { period = "1m", terminate-after = 4 }
[[profile.default.overrides]]
# Keep the specific app-server subprocess-heavy cases isolated on Windows. This
# must stay before the broader codex-app-server override below.
platform = 'cfg(windows)'
filter = 'package(codex-app-server) & kind(test) & (test(thread_fork_can_exclude_turns_and_skip_restored_token_usage) | test(turn_start_resolves_sticky_thread_environments_and_turn_overrides) | test(message_processor_tracing_tests))'
test-group = 'windows_process_heavy'
threads-required = "num-test-threads"
slow-timeout = { period = "1m", terminate-after = 4 }
[[profile.default.overrides]]
# These tests create restricted-token Windows child processes and private
# desktops. Running them alone avoids contention with other subprocess tests.
platform = 'cfg(windows)'
filter = 'package(codex-windows-sandbox) & kind(test) & test(legacy_)'
test-group = 'windows_process_heavy'
threads-required = "num-test-threads"
slow-timeout = { period = "1m", terminate-after = 4 }
[[profile.default.overrides]]
# These integration tests spawn a fresh app-server subprocess per case.
# Keep the library unit tests parallel.

View File

@@ -8,6 +8,7 @@ use codex_protocol::protocol::SessionSource;
use codex_protocol::protocol::TokenCountEvent;
use codex_protocol::protocol::TokenUsage;
use codex_protocol::protocol::TokenUsageInfo;
use core_test_support::test_path_buf;
use serde_json::json;
use std::fs;
use std::fs::FileTimes;
@@ -134,7 +135,7 @@ pub fn create_fake_rollout_with_source(
id: conversation_id,
forked_from_id: None,
timestamp: meta_rfc3339.to_string(),
cwd: PathBuf::from("/"),
cwd: test_path_buf("/"),
originator: "codex".to_string(),
cli_version: "0.0.0".to_string(),
source,
@@ -218,7 +219,7 @@ pub fn create_fake_rollout_with_text_elements(
id: conversation_id,
forked_from_id: None,
timestamp: meta_rfc3339.to_string(),
cwd: PathBuf::from("/"),
cwd: test_path_buf("/"),
originator: "codex".to_string(),
cli_version: "0.0.0".to_string(),
source: SessionSource::Cli,

View File

@@ -12,6 +12,7 @@ use codex_app_server_protocol::RequestId;
use codex_protocol::ThreadId;
use codex_protocol::protocol::SessionSource;
use codex_utils_absolute_path::AbsolutePathBuf;
use core_test_support::test_path_buf;
use pretty_assertions::assert_eq;
use std::path::Path;
use std::path::PathBuf;
@@ -35,7 +36,7 @@ fn expected_summary(conversation_id: ThreadId, path: PathBuf) -> ConversationSum
timestamp: Some(CREATED_AT_RFC3339.to_string()),
updated_at: Some(UPDATED_AT_RFC3339.to_string()),
model_provider: MODEL_PROVIDER.to_string(),
cwd: PathBuf::from("/"),
cwd: test_path_buf("/"),
cli_version: "0.0.0".to_string(),
source: SessionSource::Cli,
git_info: None,

View File

@@ -1836,7 +1836,10 @@ async fn webrtc_v2_tool_call_delegated_turn_can_execute_shell_tool() -> Result<(
};
assert_eq!(id.as_str(), "shell_call");
assert_eq!(status, CommandExecutionStatus::Completed);
assert_eq!(aggregated_output.as_deref(), Some("realtime-tool-ok"));
assert_eq!(
aggregated_output.as_deref().map(str::trim),
Some("realtime-tool-ok")
);
// Phase 3: verify the shell output reached Responses and the final delegated answer returned
// to realtime as a single function-call-output item.
@@ -2154,10 +2157,10 @@ fn realtime_tool_ok_command() -> Vec<String> {
#[cfg(windows)]
{
vec![
"powershell.exe".to_string(),
"-NoProfile".to_string(),
"-Command".to_string(),
"[Console]::Write('realtime-tool-ok')".to_string(),
"cmd.exe".to_string(),
"/D".to_string(),
"/C".to_string(),
"echo realtime-tool-ok".to_string(),
]
}

View File

@@ -239,6 +239,65 @@ async fn wait_for_live_thread_spawn_children(
.expect("expected persisted child tree");
}
async fn wait_for_agent_shutdown(
thread_id: ThreadId,
mut status_rx: tokio::sync::watch::Receiver<AgentStatus>,
) {
if matches!(status_rx.borrow().clone(), AgentStatus::Shutdown) {
return;
}
timeout(Duration::from_secs(5), async {
loop {
status_rx
.changed()
.await
.unwrap_or_else(|_| panic!("thread {thread_id} status should reach shutdown"));
if matches!(status_rx.borrow().clone(), AgentStatus::Shutdown) {
break;
}
}
})
.await
.unwrap_or_else(|_| panic!("thread {thread_id} should shut down before resume"));
}
async fn shutdown_live_agent_and_wait(control: &AgentControl, thread_id: ThreadId) {
let status_rx = control
.subscribe_status(thread_id)
.await
.expect("status subscription should succeed before shutdown");
let _ = control
.shutdown_live_agent(thread_id)
.await
.expect("thread shutdown should submit");
wait_for_agent_shutdown(thread_id, status_rx).await;
}
async fn close_agent_and_wait(
control: &AgentControl,
agent_id: ThreadId,
shutdown_ids: &[ThreadId],
) {
let mut status_rxs = Vec::with_capacity(shutdown_ids.len());
for thread_id in shutdown_ids {
status_rxs.push((
*thread_id,
control
.subscribe_status(*thread_id)
.await
.expect("status subscription should succeed before close"),
));
}
let _ = control
.close_agent(agent_id)
.await
.expect("agent close should succeed");
for (thread_id, status_rx) in status_rxs {
wait_for_agent_shutdown(thread_id, status_rx).await;
}
}
#[tokio::test]
async fn send_input_errors_when_manager_dropped() {
let control = AgentControl::default();
@@ -1626,11 +1685,9 @@ async fn resume_thread_subagent_restores_stored_nickname_and_role() {
.await
.expect("child thread metadata should be persisted to sqlite before shutdown");
let _ = harness
.control
.shutdown_live_agent(child_thread_id)
.await
.expect("child shutdown should submit");
drop(status_rx);
shutdown_live_agent_and_wait(&harness.control, child_thread_id).await;
drop(child_thread);
let resumed_thread_id = harness
.control
@@ -1699,11 +1756,8 @@ async fn resume_agent_from_rollout_reads_archived_rollout_path() {
.await
.expect("child thread should exist");
persist_thread_for_tree_resume(&child_thread, "persist before archiving").await;
let _ = harness
.control
.shutdown_live_agent(child_thread_id)
.await
.expect("child shutdown should succeed");
shutdown_live_agent_and_wait(&harness.control, child_thread_id).await;
drop(child_thread);
let store = LocalThreadStore::new(
LocalThreadStoreConfig::from_config(&harness.config),
harness.state_db.clone(),
@@ -1993,11 +2047,12 @@ async fn shutdown_agent_tree_closes_descendants_when_started_at_child() {
wait_for_live_thread_spawn_children(&harness.control, child_thread_id, &[grandchild_thread_id])
.await;
let _ = harness
.control
.close_agent(child_thread_id)
.await
.expect("child close should succeed");
close_agent_and_wait(
&harness.control,
child_thread_id,
&[child_thread_id, grandchild_thread_id],
)
.await;
let _ = harness
.control
@@ -2085,16 +2140,14 @@ async fn resume_agent_from_rollout_does_not_reopen_closed_descendants() {
wait_for_live_thread_spawn_children(&harness.control, child_thread_id, &[grandchild_thread_id])
.await;
let _ = harness
.control
.close_agent(child_thread_id)
.await
.expect("child close should succeed");
let _ = harness
.control
.shutdown_live_agent(parent_thread_id)
.await
.expect("parent shutdown should succeed");
close_agent_and_wait(
&harness.control,
child_thread_id,
&[child_thread_id, grandchild_thread_id],
)
.await;
shutdown_live_agent_and_wait(&harness.control, parent_thread_id).await;
drop(parent_thread);
let resumed_parent_thread_id = harness
.control
@@ -2180,11 +2233,12 @@ async fn resume_closed_child_reopens_open_descendants() {
wait_for_live_thread_spawn_children(&harness.control, child_thread_id, &[grandchild_thread_id])
.await;
let _ = harness
.control
.close_agent(child_thread_id)
.await
.expect("child close should succeed");
close_agent_and_wait(
&harness.control,
child_thread_id,
&[child_thread_id, grandchild_thread_id],
)
.await;
let resumed_child_thread_id = harness
.control

View File

@@ -196,6 +196,12 @@ async fn run_agent_job_loop(
)
.await?;
for item in pending_items {
let claimed = db
.mark_agent_job_item_running(job_id.as_str(), item.item_id.as_str())
.await?;
if !claimed {
continue;
}
let prompt = build_worker_prompt(&job, &item)?;
let items = vec![UserInput::Text {
text: prompt,
@@ -240,7 +246,7 @@ async fn run_agent_job_loop(
}
};
let assigned = db
.mark_agent_job_item_running_with_thread(
.set_agent_job_item_thread(
job_id.as_str(),
item.item_id.as_str(),
thread_id.to_string().as_str(),

View File

@@ -55,27 +55,31 @@ pub async fn handle(
}
let db = required_state_db(&session)?;
let reporting_thread_id = session.conversation_id.to_string();
let accepted = db
.report_agent_job_item_result(
let accepted = if args.stop.unwrap_or(false) {
db.report_agent_job_item_result_and_cancel_job(
args.job_id.as_str(),
args.item_id.as_str(),
reporting_thread_id.as_str(),
&args.result,
"cancelled by worker request",
)
.await
} else {
db.report_agent_job_item_result(
args.job_id.as_str(),
args.item_id.as_str(),
reporting_thread_id.as_str(),
&args.result,
)
.await
.map_err(|err| {
let job_id = args.job_id.as_str();
let item_id = args.item_id.as_str();
FunctionCallError::RespondToModel(format!(
"failed to record agent job result for {job_id} / {item_id}: {err}"
))
})?;
if accepted && args.stop.unwrap_or(false) {
let message = "cancelled by worker request";
let _ = db
.mark_agent_job_cancelled(args.job_id.as_str(), message)
.await;
}
.map_err(|err| {
let job_id = args.job_id.as_str();
let item_id = args.item_id.as_str();
FunctionCallError::RespondToModel(format!(
"failed to record agent job result for {job_id} / {item_id}: {err}"
))
})?;
let content =
serde_json::to_string(&ReportAgentJobResultToolResult { accepted }).map_err(|err| {
FunctionCallError::Fatal(format!(

View File

@@ -749,11 +749,13 @@ async fn conversation_webrtc_sideband_connect_failure_closes_with_error() -> Res
)
.mount(&server)
.await;
let mut builder = test_codex().with_config(|config| {
let realtime_base_url = server.uri();
let mut builder = test_codex().with_config(move |config| {
config.experimental_realtime_ws_backend_prompt = Some("backend prompt".to_string());
config.experimental_realtime_ws_model = Some("realtime-test-model".to_string());
config.experimental_realtime_ws_startup_context = Some(String::new());
config.experimental_realtime_ws_base_url = Some("http://127.0.0.1:1".to_string());
config.experimental_realtime_ws_base_url = Some(realtime_base_url);
config.model_provider.request_max_retries = Some(0);
config.realtime.version = RealtimeWsVersion::V1;
});
let test = builder.build(&server).await?;

View File

@@ -372,7 +372,16 @@ async fn wait_for_single_request(mock: &ResponseMock) -> ResponsesRequest {
async fn wait_for_file_removed(path: &Path) -> anyhow::Result<()> {
let deadline = Instant::now() + Duration::from_secs(10);
loop {
if !tokio::fs::try_exists(path).await? {
let exists = match tokio::fs::try_exists(path).await {
Ok(exists) => exists,
Err(err) if err.kind() == std::io::ErrorKind::PermissionDenied => {
// Windows can transiently deny metadata reads while another task
// is removing or resetting files in this workspace.
true
}
Err(err) => return Err(err.into()),
};
if !exists {
return Ok(());
}
assert!(

View File

@@ -227,22 +227,23 @@ WHERE id = ?
Ok(())
}
pub async fn mark_agent_job_completed(&self, job_id: &str) -> anyhow::Result<()> {
pub async fn mark_agent_job_completed(&self, job_id: &str) -> anyhow::Result<bool> {
let now = Utc::now().timestamp();
sqlx::query(
let result = sqlx::query(
r#"
UPDATE agent_jobs
SET status = ?, updated_at = ?, completed_at = ?, last_error = NULL
-WHERE id = ?
+WHERE id = ? AND status = ?
"#,
)
.bind(AgentJobStatus::Completed.as_str())
.bind(now)
.bind(now)
.bind(job_id)
+.bind(AgentJobStatus::Running.as_str())
.execute(self.pool.as_ref())
.await?;
-Ok(())
+Ok(result.rows_affected() > 0)
}
pub async fn mark_agent_job_failed(
@@ -428,9 +429,46 @@ WHERE job_id = ? AND item_id = ? AND status = ?
item_id: &str,
reporting_thread_id: &str,
result_json: &Value,
) -> anyhow::Result<bool> {
self.report_agent_job_item_result_inner(
job_id,
item_id,
reporting_thread_id,
result_json,
/*cancel_job_reason*/ None,
)
.await
}
pub async fn report_agent_job_item_result_and_cancel_job(
&self,
job_id: &str,
item_id: &str,
reporting_thread_id: &str,
result_json: &Value,
cancel_job_reason: &str,
) -> anyhow::Result<bool> {
self.report_agent_job_item_result_inner(
job_id,
item_id,
reporting_thread_id,
result_json,
Some(cancel_job_reason),
)
.await
}
async fn report_agent_job_item_result_inner(
&self,
job_id: &str,
item_id: &str,
reporting_thread_id: &str,
result_json: &Value,
cancel_job_reason: Option<&str>,
) -> anyhow::Result<bool> {
let now = Utc::now().timestamp();
let serialized = serde_json::to_string(result_json)?;
let mut tx = self.pool.begin().await?;
let result = sqlx::query(
r#"
UPDATE agent_job_items
@@ -446,7 +484,7 @@ WHERE
job_id = ?
AND item_id = ?
AND status = ?
-AND assigned_thread_id = ?
+AND (assigned_thread_id = ? OR assigned_thread_id IS NULL)
"#,
)
.bind(AgentJobItemStatus::Completed.as_str())
@@ -458,9 +496,29 @@ WHERE
.bind(item_id)
.bind(AgentJobItemStatus::Running.as_str())
.bind(reporting_thread_id)
-.execute(self.pool.as_ref())
+.execute(&mut *tx)
.await?;
-Ok(result.rows_affected() > 0)
+let accepted = result.rows_affected() > 0;
+if accepted && let Some(reason) = cancel_job_reason {
+sqlx::query(
+r#"
+UPDATE agent_jobs
+SET status = ?, updated_at = ?, completed_at = ?, last_error = ?
+WHERE id = ? AND status IN (?, ?)
+"#,
+)
+.bind(AgentJobStatus::Cancelled.as_str())
+.bind(now)
+.bind(now)
+.bind(reason)
+.bind(job_id)
+.bind(AgentJobStatus::Pending.as_str())
+.bind(AgentJobStatus::Running.as_str())
+.execute(&mut *tx)
+.await?;
+}
+tx.commit().await?;
+Ok(accepted)
}
pub async fn mark_agent_job_item_completed(
@@ -652,6 +710,113 @@ mod tests {
Ok(())
}
#[tokio::test]
async fn report_agent_job_item_result_can_cancel_job_atomically() -> anyhow::Result<()> {
let codex_home = unique_temp_dir();
let runtime = StateRuntime::init(codex_home, "test-provider".to_string()).await?;
let (job_id, item_id, thread_id) = create_running_single_item_job(runtime.as_ref()).await?;
let accepted = runtime
.report_agent_job_item_result_and_cancel_job(
job_id.as_str(),
item_id.as_str(),
thread_id.as_str(),
&json!({"ok": true}),
"cancelled by worker request",
)
.await?;
assert!(accepted);
let job = runtime
.get_agent_job(job_id.as_str())
.await?
.expect("job should exist");
assert_eq!(job.status, AgentJobStatus::Cancelled);
assert_eq!(
job.last_error,
Some("cancelled by worker request".to_string())
);
let item = runtime
.get_agent_job_item(job_id.as_str(), item_id.as_str())
.await?
.expect("job item should exist");
assert_eq!(item.status, AgentJobItemStatus::Completed);
assert_eq!(item.result_json, Some(json!({"ok": true})));
assert_eq!(item.assigned_thread_id, None);
let completed = runtime.mark_agent_job_completed(job_id.as_str()).await?;
assert!(!completed);
let job = runtime
.get_agent_job(job_id.as_str())
.await?
.expect("job should exist");
assert_eq!(job.status, AgentJobStatus::Cancelled);
Ok(())
}
#[tokio::test]
async fn report_agent_job_item_result_accepts_unassigned_running_item() -> anyhow::Result<()> {
let codex_home = unique_temp_dir();
let runtime = StateRuntime::init(codex_home, "test-provider".to_string()).await?;
let job_id = "job-1".to_string();
let item_id = "item-1".to_string();
let thread_id = "thread-1".to_string();
runtime
.create_agent_job(
&AgentJobCreateParams {
id: job_id.clone(),
name: "test-job".to_string(),
instruction: "Return a result".to_string(),
auto_export: true,
max_runtime_seconds: None,
output_schema_json: None,
input_headers: vec!["path".to_string()],
input_csv_path: "/tmp/in.csv".to_string(),
output_csv_path: "/tmp/out.csv".to_string(),
},
&[AgentJobItemCreateParams {
item_id: item_id.clone(),
row_index: 0,
source_id: None,
row_json: json!({"path":"file-1"}),
}],
)
.await?;
runtime.mark_agent_job_running(job_id.as_str()).await?;
let marked_running = runtime
.mark_agent_job_item_running(job_id.as_str(), item_id.as_str())
.await?;
assert!(marked_running);
let accepted = runtime
.report_agent_job_item_result_and_cancel_job(
job_id.as_str(),
item_id.as_str(),
thread_id.as_str(),
&json!({"ok": true}),
"cancelled by worker request",
)
.await?;
assert!(accepted);
let job = runtime
.get_agent_job(job_id.as_str())
.await?
.expect("job should exist");
assert_eq!(job.status, AgentJobStatus::Cancelled);
let item = runtime
.get_agent_job_item(job_id.as_str(), item_id.as_str())
.await?
.expect("job item should exist");
assert_eq!(item.status, AgentJobItemStatus::Completed);
assert_eq!(item.result_json, Some(json!({"ok": true})));
assert_eq!(item.assigned_thread_id, None);
Ok(())
}
#[tokio::test]
async fn report_agent_job_item_result_rejects_late_reports() -> anyhow::Result<()> {
let codex_home = unique_temp_dir();

View File

@@ -195,7 +195,7 @@ fn legacy_non_tty_powershell_emits_output() {
pwsh.display().to_string(),
"-NoProfile".to_string(),
"-Command".to_string(),
-"Write-Output LEGACY-NONTTY-DIRECT".to_string(),
+"'LEGACY-NONTTY-DIRECT'".to_string(),
],
cwd.as_path(),
HashMap::new(),
@@ -378,7 +378,7 @@ fn legacy_capture_powershell_emits_output() {
pwsh.display().to_string(),
"-NoProfile".to_string(),
"-Command".to_string(),
-"Write-Output LEGACY-CAPTURE-DIRECT".to_string(),
+"'LEGACY-CAPTURE-DIRECT'".to_string(),
],
cwd.as_path(),
HashMap::new(),
@@ -419,7 +419,7 @@ fn legacy_tty_powershell_emits_output_and_accepts_input() {
"-NoProfile".to_string(),
"-NoExit".to_string(),
"-Command".to_string(),
-"$PID; Write-Output ready".to_string(),
+"$PID; 'ready'".to_string(),
],
cwd.as_path(),
HashMap::new(),
@@ -434,7 +434,7 @@ fn legacy_tty_powershell_emits_output_and_accepts_input() {
let writer = spawned.session.writer_sender();
writer
-.send(b"Write-Output second\n".to_vec())
+.send(b"'second'\n".to_vec())
.await
.expect("send second command");
writer