Compare commits

...

9 Commits

Author SHA1 Message Date
starr-openai
37ed4ddc98 docs: update devbox run helper for main 2026-04-03 11:30:24 -07:00
starr-openai
2e009ba30a docs: add codex applied devbox skill 2026-04-02 21:26:56 -07:00
starr-openai
6db6de031a build: fix Bazel lzma-sys wiring (#16634)
This seems to be required to fix Bazel builds on an applied devbox.

## Summary
- add the Bazel `xz` module
- wire `lzma-sys` directly to `@xz//:lzma` and disable its build script
- refresh `MODULE.bazel.lock`

## Validation
- `just bazel-lock-update`
- `just bazel-lock-check`
- `bazel run //codex-rs/cli:codex --run_under="cd $PWD &&" -- --version`
- `just bazel-codex --version`

Co-authored-by: Codex <noreply@openai.com>
2026-04-02 17:33:42 -07:00
Michael Bolin
beb3978a3b test: use cmd.exe for ProviderAuthScript on Windows (#16629)
## Why

The Windows `ProviderAuthScript` test helpers do not need PowerShell.
Running them through `cmd.exe` is enough to emit the next fixture token
and rotate `tokens.txt`, and it avoids a PowerShell-specific dependency
in these tests.

## What changed

- Replaced the Windows `print-token.ps1` fixtures with `print-token.cmd`
in `codex-rs/core/src/models_manager/manager_tests.rs` and
`codex-rs/login/src/auth/auth_tests.rs`.
- Switched the failing external-auth helper in
`codex-rs/login/src/auth/auth_tests.rs` from `powershell.exe -Command
'exit 1'` to `cmd.exe /d /s /c 'exit /b 1'`.
- Updated Windows timeout comments so they no longer call out PowerShell
specifically.

## Verification

- `cargo test -p codex-login`
- `cargo test -p codex-core` (fails in unrelated
`core/src/config/config_tests.rs` assertions in this checkout)
2026-04-02 17:33:07 -07:00
Michael Bolin
862158b9e9 app-server: make thread/shellCommand tests shell-aware (#16635)
## Why
`thread/shellCommand` executes the raw command string through the
current user shell, which is PowerShell on Windows. The two v2
app-server tests in `app-server/tests/suite/v2/thread_shell_command.rs`
used POSIX `printf`, so Bazel CI on Windows failed with `printf` not
being recognized as a PowerShell command.

For reference, the user-shell task wraps commands with the active shell
before execution:
[`core/src/tasks/user_shell.rs`](7a3eec6fdb/codex-rs/core/src/tasks/user_shell.rs (L120-L126)).

## What Changed
Added a test-local helper that builds a shell-appropriate output command
and expected newline sequence from `default_user_shell()`:

- PowerShell: `Write-Output '...'` with `\r\n`
- Cmd: `echo ...` with `\r\n`
- POSIX shells: `printf '%s\n' ...` with `\n`

Both `thread_shell_command_runs_as_standalone_turn_and_persists_history`
and `thread_shell_command_uses_existing_active_turn` now use that
helper.

## Verification
- `cargo test -p codex-app-server thread_shell_command`
2026-04-02 17:28:47 -07:00
Michael Bolin
cb9fb562a4 fix: address unused variable on windows (#16633)
This slipped in during https://github.com/openai/codex/pull/16578. I am
still working on getting Windows working properly with Bazel on PRs.
2026-04-02 17:05:45 -07:00
Ahmed Ibrahim
95e809c135 Auto-trust cwd on thread start (#16492)
- Persist trusted cwd state during thread/start when the resolved
sandbox is elevated.
- Add app-server coverage for trusted root resolution and confirm
turn/start does not mutate trust.
2026-04-03 00:02:56 +00:00
Michael Bolin
7a3eec6fdb core: cut codex-core compile time 48% with native async SessionTask (#16631)
## Why

This continues the compile-time cleanup from #16630. `SessionTask`
implementations are monomorphized, but `Session` stores the task behind
a `dyn` boundary so it can drive and abort heterogeneous turn tasks
uniformly. That means we can move the `#[async_trait]` expansion off the
implementation trait, keep a small boxed adapter only at the storage
boundary, and preserve the existing task lifecycle semantics while
reducing the amount of generated async-trait glue in `codex-core`.

One measurement caveat showed up while exploring this: a warm
incremental benchmark based on `touch core/src/tasks/mod.rs && cargo
check -p codex-core --lib` was basically flat, but that was the wrong
benchmark for this change. Using package-clean `codex-core` rebuilds,
like #16630, shows the real win.

Relevant pre-change code:

- [`SessionTask` with
`#[async_trait]`](3c7f013f97/codex-rs/core/src/tasks/mod.rs (L129-L182))
- [`RunningTask` storing `Arc<dyn
SessionTask>`](3c7f013f97/codex-rs/core/src/state/turn.rs (L69-L77))

## What changed

- Switched `SessionTask::{run, abort}` to native RPITIT futures with
explicit `Send` bounds.
- Added a private `AnySessionTask` adapter that boxes those futures only
at the `Arc<dyn ...>` storage boundary.
- Updated `RunningTask` to store `Arc<dyn AnySessionTask>` and removed
`#[async_trait]` from the concrete task impls plus test-only
`SessionTask` impls.
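
The shape described above can be sketched with a toy, self-contained version of the pattern. All names here (`EchoTask`, the `String` output, the hand-rolled `block_on`) are simplified stand-ins for illustration, not the real codex-core API:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Implementation trait: a native RPITIT future with an explicit `Send`
// bound, so concrete impls generate no #[async_trait] glue.
trait SessionTask: Send + Sync + 'static {
    fn run(&self) -> impl Future<Output = String> + Send;
}

// Private object-safe adapter: boxing happens only at this storage boundary.
trait AnySessionTask: Send + Sync {
    fn run_boxed(&self) -> Pin<Box<dyn Future<Output = String> + Send + '_>>;
}

impl<T: SessionTask> AnySessionTask for T {
    fn run_boxed(&self) -> Pin<Box<dyn Future<Output = String> + Send + '_>> {
        Box::pin(self.run())
    }
}

// A concrete task keeps a plain RPITIT impl and is never boxed itself.
struct EchoTask(String);

impl SessionTask for EchoTask {
    fn run(&self) -> impl Future<Output = String> + Send {
        let msg = self.0.clone();
        async move { msg }
    }
}

// Minimal executor for already-ready futures, so the sketch runs
// without pulling in an async runtime.
fn block_on<F: Future>(mut fut: F) -> F::Output {
    fn raw() -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    static VTABLE: RawWakerVTable =
        RawWakerVTable::new(|_| raw(), |_| {}, |_| {}, |_| {});
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    // Safety: `fut` is shadowed and never moved after being pinned here.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    // `RunningTask`-style storage: heterogeneous tasks behind one
    // erased trait object, driven uniformly.
    let task: Arc<dyn AnySessionTask> = Arc::new(EchoTask("done".into()));
    println!("{}", block_on(task.run_boxed()));
}
```

The point of the adapter is that monomorphization only ever sees the small `run_boxed` shim per task type, while each `run` body stays a plain `async` future.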

## Timing

Benchmarked package-clean `codex-core` rebuilds with dependencies left
warm:

```shell
cargo check -p codex-core --lib >/dev/null
cargo clean -p codex-core >/dev/null
/usr/bin/time -p cargo +nightly rustc -p codex-core --lib -- \
  -Z time-passes \
  -Z time-passes-format=json >/dev/null
```

| revision | rustc `total` | process `real` | `generate_crate_metadata` | `MIR_borrow_checking` | `monomorphization_collector_graph_walk` |
| --- | ---: | ---: | ---: | ---: | ---: |
| parent `3c7f013f9735` | 67.21s | 67.71s | 24.61s | 23.43s | 22.43s |
| this PR `2cafd783ac22` | 35.08s | 35.60s | 8.01s | 7.25s | 7.15s |
| delta | -47.8% | -47.4% | -67.5% | -69.1% | -68.1% |

For completeness, the warm touched-file benchmark stayed flat (`1.96s`
parent vs `1.97s` this PR), which is why that benchmark should not be
used to evaluate this refactor.

## Verification

- Ran `cargo test -p codex-core`; this change compiled, and task-related
tests passed before hitting the same 5 unrelated
`config::tests::*guardian*` failures already present on the parent
stack.
2026-04-02 23:39:56 +00:00
Michael Bolin
3c7f013f97 core: cut codex-core compile time 63% with native async ToolHandler (#16630)
## Why

`ToolHandler` was still paying a large compile-time tax from
`#[async_trait]` on every concrete handler impl, even though the only
object-safe boundary the registry actually stores is the internal
`AnyToolHandler` adapter.

This PR removes that macro-generated async wrapper layer from concrete
`ToolHandler` impls while keeping the existing object-safe shim in
`AnyToolHandler`. In practice, that gets essentially the same
compile-time win as the larger type-erasure refactor in #16627, but with
a much smaller diff and without changing the public shape of
`ToolHandler<Output = T>`.

That tradeoff matters here because this is a broad `codex-core` hotspot
and reviewers should be able to judge the compile-time impact from hard
numbers, not vibes.

## Headline result

On a clean `codex-core` package rebuild (`cargo clean -p codex-core`
before each command), rustc `total` dropped from **187.15s to 68.98s**
versus the shared `0bd31dc382bd` baseline: **-63.1%**.

The biggest hot passes dropped by roughly **71-72%**:

| Metric | Baseline `0bd31dc382bd` | This PR `41f7ac0adeac` | Delta |
|---|---:|---:|---:|
| `total` | 187.15s | 68.98s | **-63.1%** |
| `generate_crate_metadata` | 84.53s | 24.49s | **-71.0%** |
| `MIR_borrow_checking` | 84.13s | 24.58s | **-70.8%** |
| `monomorphization_collector_graph_walk` | 79.74s | 22.19s | **-72.2%** |
| `evaluate_obligation` self-time | 180.62s | 46.91s | **-74.0%** |

Important caveat: `-Z time-passes` timings are nested, so
`generate_crate_metadata` and `monomorphization_collector_graph_walk`
are mostly overlapping, not additive.

## Why this PR over #16627

#16627 already proved that the `ToolHandler` stack was the right
hotspot, but it got there by making `ToolHandler` object-safe and
changing every handler to return `BoxFuture<Result<AnyToolResult, _>>`
directly.

This PR keeps the lower-churn shape:

- `ToolHandler` remains generic over `type Output`.
- Concrete handlers use native RPITIT futures with explicit `Send`
bounds.
- `AnyToolHandler` remains the only object-safe adapter and still does
the boxing at the registry boundary, as before.
- The implementation diff is only **33 files, +28/-77**.
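
That lower-churn shape can be illustrated with a self-contained toy version. The names (`LenTool`, a `String`-erased output, the hand-rolled `block_on`) are hypothetical simplifications, not the real `ToolHandler` API, which returns richer result types:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// `ToolHandler` stays generic over `type Output`; handlers return
// native RPITIT futures with an explicit `Send` bound, no #[async_trait].
trait ToolHandler: Send + Sync + 'static {
    type Output: ToString;
    fn handle(&self, input: &str) -> impl Future<Output = Self::Output> + Send;
}

// The only object-safe adapter: it boxes the future *and* erases the
// typed output, but only at the registry boundary.
trait AnyToolHandler: Send + Sync {
    fn handle_boxed<'a>(
        &'a self,
        input: &'a str,
    ) -> Pin<Box<dyn Future<Output = String> + Send + 'a>>;
}

impl<T: ToolHandler> AnyToolHandler for T {
    fn handle_boxed<'a>(
        &'a self,
        input: &'a str,
    ) -> Pin<Box<dyn Future<Output = String> + Send + 'a>> {
        Box::pin(async move { self.handle(input).await.to_string() })
    }
}

// A concrete handler keeps its own typed output and no macro glue.
struct LenTool;

impl ToolHandler for LenTool {
    type Output = usize;
    fn handle(&self, input: &str) -> impl Future<Output = usize> + Send {
        let len = input.len();
        async move { len }
    }
}

// Minimal executor for already-ready futures, so the sketch runs
// without pulling in an async runtime.
fn block_on<F: Future>(mut fut: F) -> F::Output {
    fn raw() -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    static VTABLE: RawWakerVTable =
        RawWakerVTable::new(|_| raw(), |_| {}, |_| {}, |_| {});
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    // Safety: `fut` is shadowed and never moved after being pinned here.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    // Registry-style storage: one erased trait object per registered tool.
    let registry: Vec<Arc<dyn AnyToolHandler>> = vec![Arc::new(LenTool)];
    println!("{}", block_on(registry[0].handle_boxed("hello")));
}
```

Compared with the #16627 shape, only the blanket `AnyToolHandler` impl pays the boxing cost; every concrete `handle` body stays an unboxed async future.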

The measurements are at least comparable, and in this run this PR is
slightly faster than #16627 on the pass-level total:

| Metric | #16627 | This PR | Delta |
|---|---:|---:|---:|
| `total` | 79.90s | 68.98s | **-13.7%** |
| `generate_crate_metadata` | 25.88s | 24.49s | **-5.4%** |
| `monomorphization_collector_graph_walk` | 23.54s | 22.19s | **-5.7%** |
| `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% |

## Profile data

### Crate-level timings

`cargo +nightly build -p codex-core --lib -Z unstable-options
--timings=json` after `cargo clean -p codex-core`.

Baseline data below is reused from the shared parent `0bd31dc382bd`
profile because this PR and #16627 are both one commit on top of that
same parent.

| Crate | Baseline `duration` | This PR `duration` | Delta | Baseline `rmeta_time` | This PR `rmeta_time` | Delta |
|---|---:|---:|---:|---:|---:|---:|
| `codex_core` | 187.380776583s | 69.171113833s | **-63.1%** | 174.474507208s | 55.873015583s | **-68.0%** |
| `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a |

### Pass-level timings

`cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z
time-passes-format=json` after `cargo clean -p codex-core`.

| Pass | Baseline | This PR | Delta |
|---|---:|---:|---:|
| `total` | 187.150662083s | 68.978770375s | **-63.1%** |
| `generate_crate_metadata` | 84.531864625s | 24.487462958s | **-71.0%** |
| `MIR_borrow_checking` | 84.131389375s | 24.575553875s | **-70.8%** |
| `monomorphization_collector_graph_walk` | 79.737515042s | 22.190207417s | **-72.2%** |
| `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% |
| `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% |
| `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% |
| process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s / 85.90s / 2.92s | n/a |

### Self-profile query summary

`cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z
self-profile-events=default,query-keys,args,llvm,artifact-sizes` after
`cargo clean -p codex-core`, summarized with `measureme summarize -p
0.5`.

| Query / phase | Baseline self time | This PR self time | Delta | Baseline total time | This PR total time | Baseline item count | This PR item count | Baseline cache hits | This PR cache hits |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| `evaluate_obligation` | 180.62s | 46.91s | **-74.0%** | 182.08s | 48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 |
| `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184 | n/a | 15,298 |
| `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a | 79,247 |
| `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s | n/a | 256 | n/a | 0 |
| `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a | 0 |
| `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 | n/a | 0 |
| `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a | 99,990 | n/a | 0 |
| `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a | 78,762 | n/a | 235,877 |

| Summary | Baseline | This PR |
|---|---:|---:|
| `evaluate_obligation` % of total CPU | 70.821% | 38.880% |
| self-profile total CPU time | 255.042999997s | 120.661175956s |
| process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s / 103.66s / 3.54s |

### Artifact sizes

From the same `measureme summarize` output:

| Artifact | Baseline | This PR | Delta |
|---|---:|---:|---:|
| `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 |
| `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 |
| `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448 |
| `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 |
| `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 |
| `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 |
| `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes | -6,073 |
| `work_product_index` | 19,565 bytes | 19,565 bytes | 0 |

### Baseline hotspots before this change

These are the top normalized obligation buckets from the shared baseline
profile:

| Obligation bucket | Samples | Duration |
|---|---:|---:|
| `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s |
| `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 | 5.63s |
| `trait:T as tools::registry::ToolHandler` | 876 | 5.45s |
| `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s |
| `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s |
| `outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider` | 637 | 3.73s |
| `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s |
| `outlives:tasks::regular::RegularTask` | 726 | 3.57s |

Top `items_of_instance` entries before this change were mostly concrete
async handler/task impls:

| Instance | Duration |
|---|---:|
| `tasks::regular::{impl#2}::run` | 3.79s |
| `tools::handlers::mcp::{impl#0}::handle` | 3.27s |
| `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action` | 3.09s |
| `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s |
| `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s |
| `tasks::review::{impl#4}::run` | 2.82s |
| `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s |
| `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` | 2.73s |
| `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s |
| `tasks::compact::{impl#4}::run` | 2.45s |

## What changed

Relevant pre-change registry shape:
[`codex-rs/core/src/tools/registry.rs`](0bd31dc382/codex-rs/core/src/tools/registry.rs (L38-L219))

Current registry shape in this PR:
[`codex-rs/core/src/tools/registry.rs`](41f7ac0ade/codex-rs/core/src/tools/registry.rs (L38-L203))

- `ToolHandler::{is_mutating, handle}` now return native `impl Future +
Send` futures instead of using `#[async_trait]`.
- `AnyToolHandler` remains the object-safe adapter and boxes those
futures at the registry boundary with explicit lifetimes.
- Concrete handlers and the registry test handler drop `#[async_trait]`
but otherwise keep their async method bodies intact.
- Representative examples:
[`codex-rs/core/src/tools/handlers/shell.rs`](41f7ac0ade/codex-rs/core/src/tools/handlers/shell.rs (L223-L379)),
[`codex-rs/core/src/tools/handlers/unified_exec.rs`](41f7ac0ade/codex-rs/core/src/tools/handlers/unified_exec.rs),
[`codex-rs/core/src/tools/registry_tests.rs`](41f7ac0ade/codex-rs/core/src/tools/registry_tests.rs)

## Tradeoff

This is intentionally less invasive than #16627: it does **not** move
result boxing into every concrete handler and does **not** change
`ToolHandler` into an object-safe trait.

Instead, it keeps the existing registry-level type-erasure boundary and
only removes the macro-generated async wrapper layer from concrete
impls. So the runtime boxing story stays basically the same as before,
while the compile-time savings are still large.

## Verification

Existing verification for this branch still applies:

- Ran `cargo test -p codex-core`; this change compiled and the suite
reached the known unrelated `config::tests::*guardian*` failures, with
no local diff under `codex-rs/core/src/config/`.

Profiling commands used for the tables above:

- `cargo clean -p codex-core`
- `cargo +nightly build -p codex-core --lib -Z unstable-options
--timings=json`
- `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z
time-passes-format=json`
- `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z
self-profile-events=default,query-keys,args,llvm,artifact-sizes`
- `measureme summarize -p 0.5`
2026-04-02 16:03:52 -07:00
55 changed files with 1007 additions and 166 deletions

View File

@@ -234,14 +234,18 @@ crate.annotation(
inject_repo(crate, "zlib")
# TODO(zbarsky): Enable annotation after fixing windows arm64 builds.
bazel_dep(name = "xz", version = "5.4.5.bcr.8")
crate.annotation(
crate = "lzma-sys",
gen_build_script = "on",
gen_build_script = "off",
deps = ["@xz//:lzma"],
)
bazel_dep(name = "openssl", version = "3.5.4.bcr.0")
inject_repo(crate, "xz")
crate.annotation(
build_script_data = [
"@openssl//:gen_dir",

2
MODULE.bazel.lock generated
View File

@@ -228,6 +228,8 @@
"https://bcr.bazel.build/modules/upb/0.0.0-20220923-a547704/MODULE.bazel": "7298990c00040a0e2f121f6c32544bab27d4452f80d9ce51349b1a28f3005c43",
"https://bcr.bazel.build/modules/with_cfg.bzl/0.12.0/MODULE.bazel": "b573395fe63aef4299ba095173e2f62ccfee5ad9bbf7acaa95dba73af9fc2b38",
"https://bcr.bazel.build/modules/with_cfg.bzl/0.12.0/source.json": "3f3fbaeafecaf629877ad152a2c9def21f8d330d91aa94c5dc75bbb98c10b8b8",
"https://bcr.bazel.build/modules/xz/5.4.5.bcr.8/MODULE.bazel": "e48a69bd54053c2ec5fffc2a29fb70122afd3e83ab6c07068f63bc6553fa57cc",
"https://bcr.bazel.build/modules/xz/5.4.5.bcr.8/source.json": "bd7e928ccd63505b44f4784f7bbf12cc11f9ff23bf3ca12ff2c91cd74846099e",
"https://bcr.bazel.build/modules/zlib/1.2.11/MODULE.bazel": "07b389abc85fdbca459b69e2ec656ae5622873af3f845e1c9d80fe179f3effa0",
"https://bcr.bazel.build/modules/zlib/1.3.1.bcr.5/MODULE.bazel": "eec517b5bbe5492629466e11dae908d043364302283de25581e3eb944326c4ca",
"https://bcr.bazel.build/modules/zlib/1.3.1.bcr.8/MODULE.bazel": "772c674bb78a0342b8caf32ab5c25085c493ca4ff08398208dcbe4375fe9f776",

View File

@@ -132,7 +132,7 @@ Example with notification opt-out:
## API Overview
- `thread/start` — create a new thread; emits `thread/started` (including the current `thread.status`) and auto-subscribes you to turn/item events for that thread.
- `thread/start` — create a new thread; emits `thread/started` (including the current `thread.status`) and auto-subscribes you to turn/item events for that thread. When the request includes a `cwd` and the resolved sandbox is `workspace-write` or full access, app-server also marks that project as trusted in the user `config.toml`.
- `thread/resume` — reopen an existing thread by id so subsequent `turn/start` calls append to it.
- `thread/fork` — fork an existing thread into a new thread id by copying the stored history; if the source thread is currently mid-turn, the fork records the same interruption marker as `turn/interrupt` instead of inheriting an unmarked partial turn suffix. The returned `thread.forkedFromId` points at the source thread when known. Accepts `ephemeral: true` for an in-memory temporary fork, emits `thread/started` (including the current `thread.status`), and auto-subscribes you to turn/item events for the new thread.
- `thread/list` — page through stored rollouts; supports cursor-based pagination and optional `modelProviders`, `sourceKinds`, `archived`, `cwd`, and `searchTerm` filters. Each returned `thread` includes `status` (`ThreadStatus`), defaulting to `notLoaded` when the thread is not currently loaded.

View File

@@ -235,6 +235,7 @@ use codex_features::Feature;
use codex_features::Stage;
use codex_feedback::CodexFeedback;
use codex_git_utils::git_diff_to_remote;
use codex_git_utils::resolve_root_git_project_for_trust;
use codex_login::AuthManager;
use codex_login::AuthMode as CoreAuthMode;
use codex_login::CLIENT_ID;
@@ -255,6 +256,7 @@ use codex_protocol::ThreadId;
use codex_protocol::config_types::CollaborationMode;
use codex_protocol::config_types::ForcedLoginMethod;
use codex_protocol::config_types::Personality;
use codex_protocol::config_types::TrustLevel;
use codex_protocol::config_types::WindowsSandboxLevel;
use codex_protocol::dynamic_tools::DynamicToolSpec as CoreDynamicToolSpec;
use codex_protocol::items::TurnItem;
@@ -2190,10 +2192,11 @@ impl CodexMessageProcessor {
experimental_raw_events: bool,
request_trace: Option<W3cTraceContext>,
) {
let config = match derive_config_from_params(
let requested_cwd = typesafe_overrides.cwd.clone();
let mut config = match derive_config_from_params(
&cli_overrides,
config_overrides,
typesafe_overrides,
config_overrides.clone(),
typesafe_overrides.clone(),
&cloud_requirements,
&listener_task_context.codex_home,
&runtime_feature_enablement,
@@ -2211,6 +2214,56 @@ impl CodexMessageProcessor {
}
};
if requested_cwd.is_some()
&& !config.active_project.is_trusted()
&& matches!(
config.permissions.sandbox_policy.get(),
codex_protocol::protocol::SandboxPolicy::WorkspaceWrite { .. }
| codex_protocol::protocol::SandboxPolicy::DangerFullAccess
| codex_protocol::protocol::SandboxPolicy::ExternalSandbox { .. }
)
{
let trust_target = resolve_root_git_project_for_trust(config.cwd.as_path())
.unwrap_or_else(|| config.cwd.to_path_buf());
if let Err(err) = codex_core::config::set_project_trust_level(
&listener_task_context.codex_home,
trust_target.as_path(),
TrustLevel::Trusted,
) {
let error = JSONRPCErrorError {
code: INTERNAL_ERROR_CODE,
message: format!("failed to persist trusted project state: {err}"),
data: None,
};
listener_task_context
.outgoing
.send_error(request_id, error)
.await;
return;
}
config = match derive_config_from_params(
&cli_overrides,
config_overrides,
typesafe_overrides,
&cloud_requirements,
&listener_task_context.codex_home,
&runtime_feature_enablement,
)
.await
{
Ok(config) => config,
Err(err) => {
let error = config_load_error(&err);
listener_task_context
.outgoing
.send_error(request_id, error)
.await;
return;
}
};
}
let dynamic_tools = dynamic_tools.unwrap_or_default();
let core_dynamic_tools = if dynamic_tools.is_empty() {
Vec::new()

View File

@@ -26,6 +26,7 @@ use codex_app_server_protocol::TurnCompletedNotification;
use codex_app_server_protocol::TurnStartParams;
use codex_app_server_protocol::TurnStartResponse;
use codex_app_server_protocol::UserInput as V2UserInput;
use codex_core::shell::default_user_shell;
use codex_features::FEATURES;
use codex_features::Feature;
use pretty_assertions::assert_eq;
@@ -67,11 +68,12 @@ async fn thread_shell_command_runs_as_standalone_turn_and_persists_history() ->
)
.await??;
let ThreadStartResponse { thread, .. } = to_response::<ThreadStartResponse>(start_resp)?;
let (shell_command, expected_output) = current_shell_output_command("hello from bang")?;
let shell_id = mcp
.send_thread_shell_command_request(ThreadShellCommandParams {
thread_id: thread.id.clone(),
command: "printf 'hello from bang\\n'".to_string(),
command: shell_command,
})
.await?;
let shell_resp: JSONRPCResponse = timeout(
@@ -93,7 +95,7 @@ async fn thread_shell_command_runs_as_standalone_turn_and_persists_history() ->
assert_eq!(status, &CommandExecutionStatus::InProgress);
let delta = wait_for_command_execution_output_delta(&mut mcp, &command_id).await?;
assert_eq!(delta.delta, "hello from bang\n");
assert_eq!(delta.delta, expected_output);
let completed = wait_for_command_execution_completed(&mut mcp, Some(&command_id)).await?;
let ThreadItem::CommandExecution {
@@ -110,7 +112,7 @@ async fn thread_shell_command_runs_as_standalone_turn_and_persists_history() ->
assert_eq!(id, &command_id);
assert_eq!(source, &CommandExecutionSource::UserShell);
assert_eq!(status, &CommandExecutionStatus::Completed);
assert_eq!(aggregated_output.as_deref(), Some("hello from bang\n"));
assert_eq!(aggregated_output.as_deref(), Some(expected_output.as_str()));
assert_eq!(*exit_code, Some(0));
timeout(
@@ -147,7 +149,7 @@ async fn thread_shell_command_runs_as_standalone_turn_and_persists_history() ->
};
assert_eq!(source, &CommandExecutionSource::UserShell);
assert_eq!(status, &CommandExecutionStatus::Completed);
assert_eq!(aggregated_output.as_deref(), Some("hello from bang\n"));
assert_eq!(aggregated_output.as_deref(), Some(expected_output.as_str()));
Ok(())
}
@@ -196,6 +198,7 @@ async fn thread_shell_command_uses_existing_active_turn() -> Result<()> {
)
.await??;
let ThreadStartResponse { thread, .. } = to_response::<ThreadStartResponse>(start_resp)?;
let (shell_command, expected_output) = current_shell_output_command("active turn bang")?;
let turn_id = mcp
.send_turn_start_request(TurnStartParams {
@@ -240,7 +243,7 @@ async fn thread_shell_command_uses_existing_active_turn() -> Result<()> {
let shell_id = mcp
.send_thread_shell_command_request(ThreadShellCommandParams {
thread_id: thread.id.clone(),
command: "printf 'active turn bang\\n'".to_string(),
command: shell_command,
})
.await?;
let shell_resp: JSONRPCResponse = timeout(
@@ -269,7 +272,7 @@ async fn thread_shell_command_uses_existing_active_turn() -> Result<()> {
unreachable!("helper returns command execution item");
};
assert_eq!(source, &CommandExecutionSource::UserShell);
assert_eq!(aggregated_output.as_deref(), Some("active turn bang\n"));
assert_eq!(aggregated_output.as_deref(), Some(expected_output.as_str()));
mcp.send_response(
request_id,
@@ -309,7 +312,7 @@ async fn thread_shell_command_uses_existing_active_turn() -> Result<()> {
source: CommandExecutionSource::UserShell,
aggregated_output,
..
} if aggregated_output.as_deref() == Some("active turn bang\n")
} if aggregated_output.as_deref() == Some(expected_output.as_str())
)
}),
"expected active-turn shell command to be persisted on the existing turn"
@@ -318,6 +321,24 @@ async fn thread_shell_command_uses_existing_active_turn() -> Result<()> {
Ok(())
}
fn current_shell_output_command(text: &str) -> Result<(String, String)> {
let command_and_output = match default_user_shell().name() {
"powershell" => {
let escaped_text = text.replace('\'', "''");
(
format!("Write-Output '{escaped_text}'"),
format!("{text}\r\n"),
)
}
"cmd" => (format!("echo {text}"), format!("{text}\r\n")),
_ => {
let quoted_text = shlex::try_quote(text)?;
(format!("printf '%s\\n' {quoted_text}"), format!("{text}\n"))
}
};
Ok(command_and_output)
}
async fn wait_for_command_execution_started(
mcp: &mut McpProcess,
expected_id: Option<&str>,

View File

@@ -4,12 +4,14 @@ use app_test_support::McpProcess;
use app_test_support::create_mock_responses_server_repeating_assistant;
use app_test_support::to_response;
use app_test_support::write_chatgpt_auth;
use codex_app_server_protocol::AskForApproval;
use codex_app_server_protocol::JSONRPCError;
use codex_app_server_protocol::JSONRPCMessage;
use codex_app_server_protocol::JSONRPCResponse;
use codex_app_server_protocol::McpServerStartupState;
use codex_app_server_protocol::McpServerStatusUpdatedNotification;
use codex_app_server_protocol::RequestId;
use codex_app_server_protocol::SandboxMode;
use codex_app_server_protocol::ServerNotification;
use codex_app_server_protocol::ThreadStartParams;
use codex_app_server_protocol::ThreadStartResponse;
@@ -17,6 +19,7 @@ use codex_app_server_protocol::ThreadStartedNotification;
use codex_app_server_protocol::ThreadStatus;
use codex_app_server_protocol::ThreadStatusChangedNotification;
use codex_core::config::set_project_trust_level;
use codex_git_utils::resolve_root_git_project_for_trust;
use codex_login::AuthCredentialsStoreMode;
use codex_login::REFRESH_TOKEN_URL_OVERRIDE_ENV_VAR;
use codex_protocol::config_types::ServiceTier;
@@ -48,7 +51,7 @@ async fn thread_start_creates_thread_and_emits_started() -> Result<()> {
let server = create_mock_responses_server_repeating_assistant("Done").await;
let codex_home = TempDir::new()?;
create_config_toml(codex_home.path(), &server.uri())?;
create_config_toml_without_approval_policy(codex_home.path(), &server.uri())?;
// Start server and initialize.
let mut mcp = McpProcess::new(codex_home.path()).await?;
@@ -231,7 +234,7 @@ async fn thread_start_respects_project_config_from_cwd() -> Result<()> {
let server = create_mock_responses_server_repeating_assistant("Done").await;
let codex_home = TempDir::new()?;
create_config_toml(codex_home.path(), &server.uri())?;
create_config_toml_without_approval_policy(codex_home.path(), &server.uri())?;
let workspace = TempDir::new()?;
let project_config_dir = workspace.path().join(".codex");
@@ -272,7 +275,7 @@ async fn thread_start_accepts_flex_service_tier() -> Result<()> {
let server = create_mock_responses_server_repeating_assistant("Done").await;
let codex_home = TempDir::new()?;
create_config_toml(codex_home.path(), &server.uri())?;
create_config_toml_without_approval_policy(codex_home.path(), &server.uri())?;
let mut mcp = McpProcess::new(codex_home.path()).await?;
timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;
@@ -300,7 +303,7 @@ async fn thread_start_accepts_metrics_service_name() -> Result<()> {
let server = create_mock_responses_server_repeating_assistant("Done").await;
let codex_home = TempDir::new()?;
create_config_toml(codex_home.path(), &server.uri())?;
create_config_toml_without_approval_policy(codex_home.path(), &server.uri())?;
let mut mcp = McpProcess::new(codex_home.path()).await?;
timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;
@@ -327,7 +330,7 @@ async fn thread_start_accepts_metrics_service_name() -> Result<()> {
async fn thread_start_ephemeral_remains_pathless() -> Result<()> {
let server = create_mock_responses_server_repeating_assistant("Done").await;
let codex_home = TempDir::new()?;
create_config_toml(codex_home.path(), &server.uri())?;
create_config_toml_without_approval_policy(codex_home.path(), &server.uri())?;
let mut mcp = McpProcess::new(codex_home.path()).await?;
timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;
@@ -584,16 +587,210 @@ async fn thread_start_surfaces_cloud_requirements_load_errors() -> Result<()> {
Ok(())
}
// Helper to create a config.toml pointing at the mock model server.
fn create_config_toml(codex_home: &Path, server_uri: &str) -> std::io::Result<()> {
#[tokio::test]
async fn thread_start_with_elevated_sandbox_trusts_project_and_followup_loads_project_config()
-> Result<()> {
let server = create_mock_responses_server_repeating_assistant("Done").await;
let codex_home = TempDir::new()?;
create_config_toml_without_approval_policy(codex_home.path(), &server.uri())?;
let workspace = TempDir::new()?;
let project_config_dir = workspace.path().join(".codex");
std::fs::create_dir_all(&project_config_dir)?;
std::fs::write(
project_config_dir.join("config.toml"),
r#"
model_reasoning_effort = "high"
"#,
)?;
let mut mcp = McpProcess::new(codex_home.path()).await?;
timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;
let first_request = mcp
.send_thread_start_request(ThreadStartParams {
cwd: Some(workspace.path().display().to_string()),
sandbox: Some(SandboxMode::WorkspaceWrite),
..Default::default()
})
.await?;
timeout(
DEFAULT_READ_TIMEOUT,
mcp.read_stream_until_response_message(RequestId::Integer(first_request)),
)
.await??;
let second_request = mcp
.send_thread_start_request(ThreadStartParams {
cwd: Some(workspace.path().display().to_string()),
..Default::default()
})
.await?;
let second_response: JSONRPCResponse = timeout(
DEFAULT_READ_TIMEOUT,
mcp.read_stream_until_response_message(RequestId::Integer(second_request)),
)
.await??;
let ThreadStartResponse {
approval_policy,
reasoning_effort,
..
} = to_response::<ThreadStartResponse>(second_response)?;
assert_eq!(approval_policy, AskForApproval::OnRequest);
assert_eq!(reasoning_effort, Some(ReasoningEffort::High));
let config_toml = std::fs::read_to_string(codex_home.path().join("config.toml"))?;
let trusted_root = resolve_root_git_project_for_trust(workspace.path())
.unwrap_or_else(|| workspace.path().to_path_buf());
assert!(config_toml.contains(&trusted_root.display().to_string()));
assert!(config_toml.contains("trust_level = \"trusted\""));
Ok(())
}
#[tokio::test]
async fn thread_start_with_nested_git_cwd_trusts_repo_root() -> Result<()> {
let server = create_mock_responses_server_repeating_assistant("Done").await;
let codex_home = TempDir::new()?;
create_config_toml_without_approval_policy(codex_home.path(), &server.uri())?;
let repo_root = TempDir::new()?;
std::fs::create_dir(repo_root.path().join(".git"))?;
let nested = repo_root.path().join("nested/project");
std::fs::create_dir_all(&nested)?;
let mut mcp = McpProcess::new(codex_home.path()).await?;
timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;
let request_id = mcp
.send_thread_start_request(ThreadStartParams {
cwd: Some(nested.display().to_string()),
sandbox: Some(SandboxMode::WorkspaceWrite),
..Default::default()
})
.await?;
timeout(
DEFAULT_READ_TIMEOUT,
mcp.read_stream_until_response_message(RequestId::Integer(request_id)),
)
.await??;
let config_toml = std::fs::read_to_string(codex_home.path().join("config.toml"))?;
let trusted_root =
resolve_root_git_project_for_trust(&nested).expect("git root should resolve");
assert!(config_toml.contains(&trusted_root.display().to_string()));
assert!(!config_toml.contains(&nested.display().to_string()));
Ok(())
}
#[tokio::test]
async fn thread_start_with_read_only_sandbox_does_not_persist_project_trust() -> Result<()> {
let server = create_mock_responses_server_repeating_assistant("Done").await;
let codex_home = TempDir::new()?;
create_config_toml_without_approval_policy(codex_home.path(), &server.uri())?;
let workspace = TempDir::new()?;
let mut mcp = McpProcess::new(codex_home.path()).await?;
timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;
let request_id = mcp
.send_thread_start_request(ThreadStartParams {
cwd: Some(workspace.path().display().to_string()),
..Default::default()
})
.await?;
timeout(
DEFAULT_READ_TIMEOUT,
mcp.read_stream_until_response_message(RequestId::Integer(request_id)),
)
.await??;
let config_toml = std::fs::read_to_string(codex_home.path().join("config.toml"))?;
assert!(!config_toml.contains("trust_level = \"trusted\""));
assert!(!config_toml.contains(&workspace.path().display().to_string()));
Ok(())
}
#[tokio::test]
async fn thread_start_skips_trust_write_when_project_is_already_trusted() -> Result<()> {
let server = create_mock_responses_server_repeating_assistant("Done").await;
let codex_home = TempDir::new()?;
create_config_toml_without_approval_policy(codex_home.path(), &server.uri())?;
let workspace = TempDir::new()?;
let project_config_dir = workspace.path().join(".codex");
std::fs::create_dir_all(&project_config_dir)?;
std::fs::write(
project_config_dir.join("config.toml"),
r#"
model_reasoning_effort = "high"
"#,
)?;
set_project_trust_level(codex_home.path(), workspace.path(), TrustLevel::Trusted)?;
let config_before = std::fs::read_to_string(codex_home.path().join("config.toml"))?;
let mut mcp = McpProcess::new(codex_home.path()).await?;
timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;
let request_id = mcp
.send_thread_start_request(ThreadStartParams {
cwd: Some(workspace.path().display().to_string()),
sandbox: Some(SandboxMode::WorkspaceWrite),
..Default::default()
})
.await?;
let response: JSONRPCResponse = timeout(
DEFAULT_READ_TIMEOUT,
mcp.read_stream_until_response_message(RequestId::Integer(request_id)),
)
.await??;
let ThreadStartResponse {
approval_policy,
reasoning_effort,
..
} = to_response::<ThreadStartResponse>(response)?;
assert_eq!(approval_policy, AskForApproval::OnRequest);
assert_eq!(reasoning_effort, Some(ReasoningEffort::High));
let config_after = std::fs::read_to_string(codex_home.path().join("config.toml"))?;
assert_eq!(config_after, config_before);
Ok(())
}
fn create_config_toml_without_approval_policy(
codex_home: &Path,
server_uri: &str,
) -> std::io::Result<()> {
create_config_toml_with_optional_approval_policy(
codex_home, server_uri, /*approval_policy*/ None,
)
}
fn create_config_toml_with_optional_approval_policy(
codex_home: &Path,
server_uri: &str,
approval_policy: Option<&str>,
) -> std::io::Result<()> {
let config_toml = codex_home.join("config.toml");
let approval_policy = approval_policy
.map(|policy| format!("approval_policy = \"{policy}\"\n"))
.unwrap_or_default();
std::fs::write(
config_toml,
format!(
r#"
model = "mock-model"
approval_policy = "never"
sandbox_mode = "read-only"
{approval_policy}sandbox_mode = "read-only"
model_provider = "mock_provider"

View File

@@ -2523,6 +2523,67 @@ async fn command_execution_notifications_include_process_id() -> Result<()> {
Ok(())
}
#[tokio::test]
async fn turn_start_with_elevated_override_does_not_persist_project_trust() -> Result<()> {
let responses = vec![create_final_assistant_message_sse_response("Done")?];
let server = create_mock_responses_server_sequence_unchecked(responses).await;
let codex_home = TempDir::new()?;
create_config_toml(
codex_home.path(),
&server.uri(),
"never",
&BTreeMap::from([(Feature::Personality, true)]),
)?;
let workspace = TempDir::new()?;
let mut mcp = McpProcess::new(codex_home.path()).await?;
timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;
let thread_request = mcp
.send_thread_start_request(ThreadStartParams {
cwd: Some(workspace.path().display().to_string()),
..Default::default()
})
.await?;
let thread_response: JSONRPCResponse = timeout(
DEFAULT_READ_TIMEOUT,
mcp.read_stream_until_response_message(RequestId::Integer(thread_request)),
)
.await??;
let ThreadStartResponse { thread, .. } = to_response::<ThreadStartResponse>(thread_response)?;
let turn_request = mcp
.send_turn_start_request(TurnStartParams {
thread_id: thread.id,
cwd: Some(workspace.path().to_path_buf()),
sandbox_policy: Some(codex_app_server_protocol::SandboxPolicy::DangerFullAccess),
input: vec![V2UserInput::Text {
text: "Hello".to_string(),
text_elements: Vec::new(),
}],
..Default::default()
})
.await?;
timeout(
DEFAULT_READ_TIMEOUT,
mcp.read_stream_until_response_message(RequestId::Integer(turn_request)),
)
.await??;
timeout(
DEFAULT_READ_TIMEOUT,
mcp.read_stream_until_notification_message("turn/completed"),
)
.await??;
let config_toml = std::fs::read_to_string(codex_home.path().join("config.toml"))?;
assert!(!config_toml.contains("trust_level = \"trusted\""));
assert!(!config_toml.contains(&workspace.path().display().to_string()));
Ok(())
}
// Helper to create a config.toml pointing at the mock model server.
fn create_config_toml(
codex_home: &Path,

View File

@@ -3119,7 +3119,6 @@ async fn spawn_task_turn_span_inherits_dispatch_trace_context() {
captured_trace: Arc<std::sync::Mutex<Option<W3cTraceContext>>>,
}
#[async_trait::async_trait]
impl SessionTask for TraceCaptureTask {
fn kind(&self) -> TaskKind {
TaskKind::Regular
@@ -4375,7 +4374,6 @@ struct NeverEndingTask {
listen_to_cancellation_token: bool,
}
#[async_trait::async_trait]
impl SessionTask for NeverEndingTask {
fn kind(&self) -> TaskKind {
self.kind

View File

@@ -112,10 +112,12 @@ impl ProviderAuthScript {
fn new(tokens: &[&str]) -> std::io::Result<Self> {
let tempdir = tempfile::tempdir()?;
let tokens_file = tempdir.path().join("tokens.txt");
// `cmd.exe`'s `set /p` treats LF-only input as one line, so use CRLF on Windows.
let token_line_ending = if cfg!(windows) { "\r\n" } else { "\n" };
let mut token_file_contents = String::new();
for token in tokens {
token_file_contents.push_str(token);
token_file_contents.push('\n');
token_file_contents.push_str(token_line_ending);
}
std::fs::write(&tokens_file, token_file_contents)?;
@@ -142,23 +144,28 @@ mv tokens.next tokens.txt
#[cfg(windows)]
let (command, args) = {
let script_path = tempdir.path().join("print-token.ps1");
let script_path = tempdir.path().join("print-token.cmd");
std::fs::write(
&script_path,
r#"$lines = @(Get-Content -Path tokens.txt)
if ($lines.Count -eq 0) { exit 1 }
Write-Output $lines[0]
$lines | Select-Object -Skip 1 | Set-Content -Path tokens.txt
r#"@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "first_line="
<tokens.txt set /p "first_line="
if not defined first_line exit /b 1
setlocal EnableDelayedExpansion
echo(!first_line!
endlocal
more +1 tokens.txt > tokens.next
move /y tokens.next tokens.txt >nul
"#,
)?;
(
"powershell".to_string(),
"cmd.exe".to_string(),
vec![
"-NoProfile".to_string(),
"-ExecutionPolicy".to_string(),
"Bypass".to_string(),
"-File".to_string(),
".\\print-token.ps1".to_string(),
"/d".to_string(),
"/s".to_string(),
"/c".to_string(),
".\\print-token.cmd".to_string(),
],
)
};
@@ -172,7 +179,7 @@ $lines | Select-Object -Skip 1 | Set-Content -Path tokens.txt
fn auth_config(&self) -> ModelProviderAuthInfo {
let timeout_ms = if cfg!(windows) {
// `powershell.exe` startup can be slow on loaded Windows CI workers
// Process startup can be slow on loaded Windows CI workers.
10_000
} else {
2_000

View File

@@ -18,7 +18,7 @@ use rmcp::model::RequestId;
use tokio::sync::oneshot;
use crate::codex::TurnContext;
use crate::tasks::SessionTask;
use crate::tasks::AnySessionTask;
use codex_protocol::models::PermissionProfile;
use codex_protocol::protocol::ReviewDecision;
use codex_protocol::protocol::TokenUsage;
@@ -69,7 +69,7 @@ pub(crate) enum TaskKind {
pub(crate) struct RunningTask {
pub(crate) done: Arc<Notify>,
pub(crate) kind: TaskKind,
pub(crate) task: Arc<dyn SessionTask>,
pub(crate) task: Arc<dyn AnySessionTask>,
pub(crate) cancellation_token: CancellationToken,
pub(crate) handle: Arc<AbortOnDropHandle<()>>,
pub(crate) turn_context: Arc<TurnContext>,

View File

@@ -4,14 +4,12 @@ use super::SessionTask;
use super::SessionTaskContext;
use crate::codex::TurnContext;
use crate::state::TaskKind;
use async_trait::async_trait;
use codex_protocol::user_input::UserInput;
use tokio_util::sync::CancellationToken;
#[derive(Clone, Copy, Default)]
pub(crate) struct CompactTask;
#[async_trait]
impl SessionTask for CompactTask {
fn kind(&self) -> TaskKind {
TaskKind::Compact
@@ -30,14 +28,14 @@ impl SessionTask for CompactTask {
) -> Option<String> {
let session = session.clone_session();
let _ = if crate::compact::should_use_remote_compact_task(&ctx.provider) {
let _ = session.services.session_telemetry.counter(
session.services.session_telemetry.counter(
"codex.task.compact",
/*inc*/ 1,
&[("type", "remote")],
);
crate::compact_remote::run_remote_compact_task(session.clone(), ctx).await
} else {
let _ = session.services.session_telemetry.counter(
session.services.session_telemetry.counter(
"codex.task.compact",
/*inc*/ 1,
&[("type", "local")],

View File

@@ -2,7 +2,6 @@ use crate::codex::TurnContext;
use crate::state::TaskKind;
use crate::tasks::SessionTask;
use crate::tasks::SessionTaskContext;
use async_trait::async_trait;
use codex_git_utils::CreateGhostCommitOptions;
use codex_git_utils::GhostSnapshotReport;
use codex_git_utils::GitToolingError;
@@ -26,7 +25,6 @@ pub(crate) struct GhostSnapshotTask {
const SNAPSHOT_WARNING_THRESHOLD: Duration = Duration::from_secs(240);
#[async_trait]
impl SessionTask for GhostSnapshotTask {
fn kind(&self) -> TaskKind {
TaskKind::Regular

View File

@@ -9,7 +9,7 @@ use std::sync::Arc;
use std::time::Duration;
use std::time::Instant;
use async_trait::async_trait;
use futures::future::BoxFuture;
use tokio::select;
use tokio::sync::Notify;
use tokio_util::sync::CancellationToken;
@@ -126,7 +126,6 @@ impl SessionTaskContext {
/// intentionally small: implementers identify themselves via
/// [`SessionTask::kind`], perform their work in [`SessionTask::run`], and may
/// release resources in [`SessionTask::abort`].
#[async_trait]
pub(crate) trait SessionTask: Send + Sync + 'static {
/// Describes the type of work the task performs so the session can
/// surface it in telemetry and UI.
@@ -143,21 +142,84 @@ pub(crate) trait SessionTask: Send + Sync + 'static {
/// abort; implementers should watch for it and terminate quickly once it
/// fires. Returning [`Some`] yields a final message that
/// [`Session::on_task_finished`] will emit to the client.
async fn run(
fn run(
self: Arc<Self>,
session: Arc<SessionTaskContext>,
ctx: Arc<TurnContext>,
input: Vec<UserInput>,
cancellation_token: CancellationToken,
) -> Option<String>;
) -> impl std::future::Future<Output = Option<String>> + Send;
/// Gives the task a chance to perform cleanup after an abort.
///
/// The default implementation is a no-op; override this if additional
/// teardown or notifications are required once
/// [`Session::abort_all_tasks`] cancels the task.
async fn abort(&self, session: Arc<SessionTaskContext>, ctx: Arc<TurnContext>) {
let _ = (session, ctx);
fn abort(
&self,
session: Arc<SessionTaskContext>,
ctx: Arc<TurnContext>,
) -> impl std::future::Future<Output = ()> + Send {
async move {
let _ = (session, ctx);
}
}
}
pub(crate) trait AnySessionTask: Send + Sync + 'static {
fn kind(&self) -> TaskKind;
fn span_name(&self) -> &'static str;
fn run(
self: Arc<Self>,
session: Arc<SessionTaskContext>,
ctx: Arc<TurnContext>,
input: Vec<UserInput>,
cancellation_token: CancellationToken,
) -> BoxFuture<'static, Option<String>>;
fn abort<'a>(
&'a self,
session: Arc<SessionTaskContext>,
ctx: Arc<TurnContext>,
) -> BoxFuture<'a, ()>;
}
impl<T> AnySessionTask for T
where
T: SessionTask,
{
fn kind(&self) -> TaskKind {
SessionTask::kind(self)
}
fn span_name(&self) -> &'static str {
SessionTask::span_name(self)
}
fn run(
self: Arc<Self>,
session: Arc<SessionTaskContext>,
ctx: Arc<TurnContext>,
input: Vec<UserInput>,
cancellation_token: CancellationToken,
) -> BoxFuture<'static, Option<String>> {
Box::pin(SessionTask::run(
self,
session,
ctx,
input,
cancellation_token,
))
}
fn abort<'a>(
&'a self,
session: Arc<SessionTaskContext>,
ctx: Arc<TurnContext>,
) -> BoxFuture<'a, ()> {
Box::pin(SessionTask::abort(self, session, ctx))
}
}
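The `SessionTask`/`AnySessionTask` split above is the standard way to keep an ergonomic trait whose methods return `impl Future` while still allowing `Arc<dyn ...>` trait objects, which cannot dispatch methods that return `impl Trait`. A minimal self-contained sketch of the same pattern, using illustrative names (`Task`, `AnyTask`, `Compact`) rather than the real API:

```rust
use std::future::Future;
use std::pin::{Pin, pin};
use std::sync::Arc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

// Ergonomic trait: `run` returns `impl Future + Send`, so implementers
// can write plain `async fn` bodies without the `async_trait` macro.
trait Task: Send + Sync + 'static {
    fn kind(&self) -> &'static str;
    fn run(self: Arc<Self>) -> impl Future<Output = Option<String>> + Send;
}

// Object-safe mirror: boxes the returned future so callers can hold
// `Arc<dyn AnyTask>` trait objects.
trait AnyTask: Send + Sync + 'static {
    fn kind(&self) -> &'static str;
    fn run(self: Arc<Self>) -> BoxFuture<'static, Option<String>>;
}

// Blanket impl: every `Task` is automatically an `AnyTask`.
impl<T: Task> AnyTask for T {
    fn kind(&self) -> &'static str {
        Task::kind(self)
    }
    fn run(self: Arc<Self>) -> BoxFuture<'static, Option<String>> {
        Box::pin(Task::run(self))
    }
}

struct Compact;
impl Task for Compact {
    fn kind(&self) -> &'static str {
        "compact"
    }
    async fn run(self: Arc<Self>) -> Option<String> {
        Some("compacted".to_string())
    }
}

// One-shot poll with a no-op waker; the future above is always ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    const RAW: RawWaker = RawWaker::new(std::ptr::null(), &VTABLE);
    const VTABLE: RawWakerVTable = RawWakerVTable::new(|_| RAW, |_| {}, |_| {}, |_| {});
    let waker = unsafe { Waker::from_raw(RAW) };
    let mut cx = Context::from_waker(&waker);
    match pin!(fut).poll(&mut cx) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("future was expected to be ready"),
    }
}

fn main() {
    // Dynamic dispatch through the erased trait still works.
    let task: Arc<dyn AnyTask> = Arc::new(Compact);
    assert_eq!(task.kind(), "compact");
    assert_eq!(block_on(task.run()), Some("compacted".to_string()));
}
```

The blanket impl means implementers keep writing plain `async fn` bodies, while storage such as `RunningTask` holds the erased `Arc<dyn AnySessionTask>`.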
@@ -179,7 +241,7 @@ impl Session {
input: Vec<UserInput>,
task: T,
) {
let task: Arc<dyn SessionTask> = Arc::new(task);
let task: Arc<dyn AnySessionTask> = Arc::new(task);
let task_kind = task.kind();
let span_name = task.span_name();
let started_at = Instant::now();

View File

@@ -1,6 +1,5 @@
use std::sync::Arc;
use async_trait::async_trait;
use tokio_util::sync::CancellationToken;
use crate::codex::TurnContext;
@@ -25,7 +24,6 @@ impl RegularTask {
}
}
#[async_trait]
impl SessionTask for RegularTask {
fn kind(&self) -> TaskKind {
TaskKind::Regular

View File

@@ -1,7 +1,6 @@
use std::borrow::Cow;
use std::sync::Arc;
use async_trait::async_trait;
use codex_protocol::config_types::WebSearchMode;
use codex_protocol::items::TurnItem;
use codex_protocol::models::ContentItem;
@@ -48,7 +47,6 @@ impl ReviewTask {
}
}
#[async_trait]
impl SessionTask for ReviewTask {
fn kind(&self) -> TaskKind {
TaskKind::Review
@@ -65,7 +63,7 @@ impl SessionTask for ReviewTask {
input: Vec<UserInput>,
cancellation_token: CancellationToken,
) -> Option<String> {
let _ = session.session.services.session_telemetry.counter(
session.session.services.session_telemetry.counter(
"codex.task.review",
/*inc*/ 1,
&[],

View File

@@ -4,7 +4,6 @@ use crate::codex::TurnContext;
use crate::state::TaskKind;
use crate::tasks::SessionTask;
use crate::tasks::SessionTaskContext;
use async_trait::async_trait;
use codex_git_utils::RestoreGhostCommitOptions;
use codex_git_utils::restore_ghost_commit_with_options;
use codex_protocol::models::ResponseItem;
@@ -25,7 +24,6 @@ impl UndoTask {
}
}
#[async_trait]
impl SessionTask for UndoTask {
fn kind(&self) -> TaskKind {
TaskKind::Regular
@@ -42,11 +40,11 @@ impl SessionTask for UndoTask {
_input: Vec<UserInput>,
cancellation_token: CancellationToken,
) -> Option<String> {
let _ = session.session.services.session_telemetry.counter(
"codex.task.undo",
/*inc*/ 1,
&[],
);
session
.session
.services
.session_telemetry
.counter("codex.task.undo", /*inc*/ 1, &[]);
let sess = session.clone_session();
sess.send_event(
ctx.as_ref(),

View File

@@ -1,7 +1,6 @@
use std::sync::Arc;
use std::time::Duration;
use async_trait::async_trait;
use codex_async_utils::CancelErr;
use codex_async_utils::OrCancelExt;
use codex_protocol::user_input::UserInput;
@@ -62,7 +61,6 @@ impl UserShellCommandTask {
}
}
#[async_trait]
impl SessionTask for UserShellCommandTask {
fn kind(&self) -> TaskKind {
TaskKind::Regular

View File

@@ -1,5 +1,3 @@
use async_trait::async_trait;
use crate::function_tool::FunctionCallError;
use crate::tools::context::FunctionToolOutput;
use crate::tools::context::ToolInvocation;
@@ -53,7 +51,6 @@ impl CodeModeExecuteHandler {
}
}
#[async_trait]
impl ToolHandler for CodeModeExecuteHandler {
type Output = FunctionToolOutput;

View File

@@ -1,4 +1,3 @@
use async_trait::async_trait;
use serde::Deserialize;
use crate::function_tool::FunctionCallError;
@@ -39,7 +38,6 @@ where
})
}
#[async_trait]
impl ToolHandler for CodeModeWaitHandler {
type Output = FunctionToolOutput;

View File

@@ -13,7 +13,6 @@ use crate::tools::handlers::multi_agents::build_agent_spawn_config;
use crate::tools::handlers::parse_arguments;
use crate::tools::registry::ToolHandler;
use crate::tools::registry::ToolKind;
use async_trait::async_trait;
use codex_protocol::ThreadId;
use codex_protocol::protocol::AgentStatus;
use codex_protocol::protocol::SessionSource;
@@ -178,7 +177,6 @@ impl JobProgressEmitter {
}
}
#[async_trait]
impl ToolHandler for BatchJobHandler {
type Output = FunctionToolOutput;

View File

@@ -21,7 +21,6 @@ use crate::tools::registry::ToolKind;
use crate::tools::runtimes::apply_patch::ApplyPatchRequest;
use crate::tools::runtimes::apply_patch::ApplyPatchRuntime;
use crate::tools::sandboxing::ToolCtx;
use async_trait::async_trait;
use codex_apply_patch::ApplyPatchAction;
use codex_apply_patch::ApplyPatchFileChange;
use codex_protocol::models::FileSystemPermissions;
@@ -122,7 +121,6 @@ async fn effective_patch_permissions(
)
}
#[async_trait]
impl ToolHandler for ApplyPatchHandler {
type Output = ApplyPatchToolOutput;

View File

@@ -7,7 +7,6 @@ use crate::tools::context::ToolPayload;
use crate::tools::handlers::parse_arguments;
use crate::tools::registry::ToolHandler;
use crate::tools::registry::ToolKind;
use async_trait::async_trait;
use codex_protocol::dynamic_tools::DynamicToolCallRequest;
use codex_protocol::dynamic_tools::DynamicToolResponse;
use codex_protocol::models::FunctionCallOutputContentItem;
@@ -20,7 +19,6 @@ use tracing::warn;
pub struct DynamicToolHandler;
#[async_trait]
impl ToolHandler for DynamicToolHandler {
type Output = FunctionToolOutput;

View File

@@ -1,4 +1,3 @@
use async_trait::async_trait;
use serde_json::Value as JsonValue;
use std::sync::Arc;
use std::time::Duration;
@@ -92,7 +91,6 @@ async fn emit_js_repl_exec_end(
};
emitter.emit(ctx, stage).await;
}
#[async_trait]
impl ToolHandler for JsReplHandler {
type Output = FunctionToolOutput;
@@ -182,7 +180,6 @@ impl ToolHandler for JsReplHandler {
}
}
#[async_trait]
impl ToolHandler for JsReplResetHandler {
type Output = FunctionToolOutput;

View File

@@ -4,7 +4,6 @@ use std::fs::FileType;
use std::path::Path;
use std::path::PathBuf;
use async_trait::async_trait;
use codex_utils_string::take_bytes_at_char_boundary;
use serde::Deserialize;
use tokio::fs;
@@ -45,7 +44,6 @@ struct ListDirArgs {
depth: usize,
}
#[async_trait]
impl ToolHandler for ListDirHandler {
type Output = FunctionToolOutput;

View File

@@ -1,4 +1,3 @@
use async_trait::async_trait;
use std::sync::Arc;
use crate::function_tool::FunctionCallError;
@@ -10,7 +9,6 @@ use crate::tools::registry::ToolKind;
use codex_protocol::mcp::CallToolResult;
pub struct McpHandler;
#[async_trait]
impl ToolHandler for McpHandler {
type Output = CallToolResult;

View File

@@ -3,7 +3,6 @@ use std::sync::Arc;
use std::time::Duration;
use std::time::Instant;
use async_trait::async_trait;
use codex_protocol::mcp::CallToolResult;
use codex_protocol::models::function_call_output_content_items_to_text;
use rmcp::model::ListResourceTemplatesResult;
@@ -178,7 +177,6 @@ struct ReadResourcePayload {
result: ReadResourceResult,
}
#[async_trait]
impl ToolHandler for McpResourceHandler {
type Output = FunctionToolOutput;

View File

@@ -17,7 +17,6 @@ pub(crate) use crate::tools::handlers::multi_agents_common::*;
use crate::tools::handlers::parse_arguments;
use crate::tools::registry::ToolHandler;
use crate::tools::registry::ToolKind;
use async_trait::async_trait;
use codex_protocol::ThreadId;
use codex_protocol::models::ResponseInputItem;
use codex_protocol::openai_models::ReasoningEffort;

View File

@@ -2,7 +2,6 @@ use super::*;
pub(crate) struct Handler;
#[async_trait]
impl ToolHandler for Handler {
type Output = CloseAgentResult;

View File

@@ -4,7 +4,6 @@ use std::sync::Arc;
pub(crate) struct Handler;
#[async_trait]
impl ToolHandler for Handler {
type Output = ResumeAgentResult;

View File

@@ -3,7 +3,6 @@ use crate::agent::control::render_input_preview;
pub(crate) struct Handler;
#[async_trait]
impl ToolHandler for Handler {
type Output = SendInputResult;

View File

@@ -10,7 +10,6 @@ use crate::agent::next_thread_spawn_depth;
pub(crate) struct Handler;
#[async_trait]
impl ToolHandler for Handler {
type Output = SpawnAgentResult;

View File

@@ -14,7 +14,6 @@ use tokio::time::timeout_at;
pub(crate) struct Handler;
#[async_trait]
impl ToolHandler for Handler {
type Output = WaitAgentResult;

View File

@@ -115,7 +115,6 @@ fn history_contains_inter_agent_communication(
#[derive(Clone, Copy)]
struct NeverEndingTask;
#[async_trait::async_trait]
impl SessionTask for NeverEndingTask {
fn kind(&self) -> TaskKind {
TaskKind::Regular

View File

@@ -11,7 +11,6 @@ use crate::tools::handlers::multi_agents_common::*;
use crate::tools::handlers::parse_arguments;
use crate::tools::registry::ToolHandler;
use crate::tools::registry::ToolKind;
use async_trait::async_trait;
use codex_protocol::AgentPath;
use codex_protocol::models::ResponseInputItem;
use codex_protocol::openai_models::ReasoningEffort;

View File

@@ -2,7 +2,6 @@ use super::*;
pub(crate) struct Handler;
#[async_trait]
impl ToolHandler for Handler {
type Output = CloseAgentResult;

View File

@@ -6,7 +6,6 @@ use super::*;
pub(crate) struct Handler;
#[async_trait]
impl ToolHandler for Handler {
type Output = MessageToolResult;

View File

@@ -3,7 +3,6 @@ use crate::agent::control::ListedAgent;
pub(crate) struct Handler;
#[async_trait]
impl ToolHandler for Handler {
type Output = ListAgentsResult;

View File

@@ -6,7 +6,6 @@ use super::*;
pub(crate) struct Handler;
#[async_trait]
impl ToolHandler for Handler {
type Output = MessageToolResult;

View File

@@ -11,7 +11,6 @@ use codex_protocol::protocol::Op;
pub(crate) struct Handler;
#[async_trait]
impl ToolHandler for Handler {
type Output = SpawnAgentResult;

View File

@@ -6,7 +6,6 @@ use tokio::time::timeout_at;
pub(crate) struct Handler;
#[async_trait]
impl ToolHandler for Handler {
type Output = WaitAgentResult;

View File

@@ -6,7 +6,6 @@ use crate::tools::context::ToolOutput;
use crate::tools::context::ToolPayload;
use crate::tools::registry::ToolHandler;
use crate::tools::registry::ToolKind;
use async_trait::async_trait;
use codex_protocol::config_types::ModeKind;
use codex_protocol::models::FunctionCallOutputPayload;
use codex_protocol::models::ResponseInputItem;
@@ -44,7 +43,6 @@ impl ToolOutput for PlanToolOutput {
}
}
#[async_trait]
impl ToolHandler for PlanHandler {
type Output = PlanToolOutput;

View File

@@ -1,4 +1,3 @@
use async_trait::async_trait;
use codex_protocol::request_permissions::RequestPermissionsArgs;
use codex_sandboxing::policy_transforms::normalize_additional_permissions;
@@ -12,7 +11,6 @@ use crate::tools::registry::ToolKind;
pub struct RequestPermissionsHandler;
#[async_trait]
impl ToolHandler for RequestPermissionsHandler {
type Output = FunctionToolOutput;

View File

@@ -5,7 +5,6 @@ use crate::tools::context::ToolPayload;
use crate::tools::handlers::parse_arguments;
use crate::tools::registry::ToolHandler;
use crate::tools::registry::ToolKind;
use async_trait::async_trait;
use codex_protocol::request_user_input::RequestUserInputArgs;
use codex_tools::REQUEST_USER_INPUT_TOOL_NAME;
use codex_tools::normalize_request_user_input_args;
@@ -15,7 +14,6 @@ pub struct RequestUserInputHandler {
pub default_mode_request_user_input: bool,
}
#[async_trait]
impl ToolHandler for RequestUserInputHandler {
type Output = FunctionToolOutput;

View File

@@ -1,4 +1,3 @@
use async_trait::async_trait;
use codex_protocol::ThreadId;
use codex_protocol::models::ShellCommandToolCallParams;
use codex_protocol::models::ShellToolCallParams;
@@ -178,7 +177,6 @@ impl From<ShellCommandBackendConfig> for ShellCommandHandler {
}
}
#[async_trait]
impl ToolHandler for ShellHandler {
type Output = FunctionToolOutput;
@@ -279,7 +277,6 @@ impl ToolHandler for ShellHandler {
}
}
#[async_trait]
impl ToolHandler for ShellCommandHandler {
type Output = FunctionToolOutput;

View File

@@ -4,7 +4,6 @@ use std::sync::Arc;
use std::sync::OnceLock;
use std::time::Duration;
use async_trait::async_trait;
use serde::Deserialize;
use tokio::sync::Barrier;
use tokio::time::sleep;
@@ -54,7 +53,6 @@ fn barrier_map() -> &'static tokio::sync::Mutex<HashMap<String, BarrierState>> {
BARRIERS.get_or_init(|| tokio::sync::Mutex::new(HashMap::new()))
}
#[async_trait]
impl ToolHandler for TestSyncHandler {
type Output = FunctionToolOutput;

View File

@@ -4,7 +4,6 @@ use crate::tools::context::ToolPayload;
use crate::tools::context::ToolSearchOutput;
use crate::tools::registry::ToolHandler;
use crate::tools::registry::ToolKind;
use async_trait::async_trait;
use bm25::Document;
use bm25::Language;
use bm25::SearchEngineBuilder;
@@ -25,7 +24,6 @@ impl ToolSearchHandler {
}
}
#[async_trait]
impl ToolHandler for ToolSearchHandler {
type Output = ToolSearchOutput;

View File

@@ -1,6 +1,5 @@
use std::collections::HashSet;
use async_trait::async_trait;
use codex_app_server_protocol::AppInfo;
use codex_mcp::mcp::CODEX_APPS_MCP_SERVER_NAME;
use codex_rmcp_client::ElicitationAction;
@@ -28,7 +27,6 @@ use crate::tools::registry::ToolKind;
pub struct ToolSuggestHandler;
#[async_trait]
impl ToolHandler for ToolSuggestHandler {
type Output = FunctionToolOutput;

View File

@@ -22,7 +22,6 @@ use crate::unified_exec::ExecCommandRequest;
use crate::unified_exec::UnifiedExecContext;
use crate::unified_exec::UnifiedExecProcessManager;
use crate::unified_exec::WriteStdinRequest;
use async_trait::async_trait;
use codex_features::Feature;
use codex_otel::SessionTelemetry;
use codex_otel::metrics::names::TOOL_CALL_UNIFIED_EXEC_METRIC;
@@ -86,7 +85,6 @@ fn default_tty() -> bool {
false
}
#[async_trait]
impl ToolHandler for UnifiedExecHandler {
type Output = ExecCommandToolOutput;

View File

@@ -1,4 +1,3 @@
use async_trait::async_trait;
use codex_protocol::models::FunctionCallOutputBody;
use codex_protocol::models::FunctionCallOutputContentItem;
use codex_protocol::models::FunctionCallOutputPayload;
@@ -37,7 +36,6 @@ enum ViewImageDetail {
Original,
}
#[async_trait]
impl ToolHandler for ViewImageHandler {
type Output = ViewImageOutput;

View File

@@ -13,7 +13,6 @@ use crate::tools::context::FunctionToolOutput;
use crate::tools::context::ToolInvocation;
use crate::tools::context::ToolOutput;
use crate::tools::context::ToolPayload;
use async_trait::async_trait;
use codex_hooks::HookEvent;
use codex_hooks::HookEventAfterToolUse;
use codex_hooks::HookPayload;
@@ -26,6 +25,7 @@ use codex_protocol::protocol::SandboxPolicy;
use codex_tools::ConfiguredToolSpec;
use codex_tools::ToolSpec;
use codex_utils_readiness::Readiness;
use futures::future::BoxFuture;
use serde_json::Value;
use tracing::warn;
@@ -35,7 +35,6 @@ pub enum ToolKind {
Mcp,
}
#[async_trait]
pub trait ToolHandler: Send + Sync {
type Output: ToolOutput + 'static;
@@ -54,8 +53,11 @@ pub trait ToolHandler: Send + Sync {
/// user (through file system, OS operations, ...).
/// This function must remain defensive and return `true` if any doubt exists
/// about the exact effect of a ToolInvocation.
async fn is_mutating(&self, _invocation: &ToolInvocation) -> bool {
false
fn is_mutating(
&self,
_invocation: &ToolInvocation,
) -> impl std::future::Future<Output = bool> + Send {
async { false }
}
fn pre_tool_use_payload(&self, _invocation: &ToolInvocation) -> Option<PreToolUsePayload> {
@@ -73,7 +75,10 @@ pub trait ToolHandler: Send + Sync {
/// Perform the actual [ToolInvocation] and returns a [ToolOutput] containing
/// the final output to return to the model.
async fn handle(&self, invocation: ToolInvocation) -> Result<Self::Output, FunctionCallError>;
fn handle(
&self,
invocation: ToolInvocation,
) -> impl std::future::Future<Output = Result<Self::Output, FunctionCallError>> + Send;
}
pub(crate) struct AnyToolResult {
@@ -112,11 +117,10 @@ pub(crate) struct PostToolUsePayload {
pub(crate) tool_response: Value,
}
#[async_trait]
trait AnyToolHandler: Send + Sync {
fn matches_kind(&self, payload: &ToolPayload) -> bool;
async fn is_mutating(&self, invocation: &ToolInvocation) -> bool;
fn is_mutating<'a>(&'a self, invocation: &'a ToolInvocation) -> BoxFuture<'a, bool>;
fn pre_tool_use_payload(&self, invocation: &ToolInvocation) -> Option<PreToolUsePayload>;
@@ -127,13 +131,12 @@ trait AnyToolHandler: Send + Sync {
result: &dyn ToolOutput,
) -> Option<PostToolUsePayload>;
async fn handle_any(
&self,
fn handle_any<'a>(
&'a self,
invocation: ToolInvocation,
) -> Result<AnyToolResult, FunctionCallError>;
) -> BoxFuture<'a, Result<AnyToolResult, FunctionCallError>>;
}
#[async_trait]
impl<T> AnyToolHandler for T
where
T: ToolHandler,
@@ -142,8 +145,8 @@ where
ToolHandler::matches_kind(self, payload)
}
async fn is_mutating(&self, invocation: &ToolInvocation) -> bool {
ToolHandler::is_mutating(self, invocation).await
fn is_mutating<'a>(&'a self, invocation: &'a ToolInvocation) -> BoxFuture<'a, bool> {
Box::pin(ToolHandler::is_mutating(self, invocation))
}
fn pre_tool_use_payload(&self, invocation: &ToolInvocation) -> Option<PreToolUsePayload> {
@@ -159,17 +162,19 @@ where
ToolHandler::post_tool_use_payload(self, call_id, payload, result)
}
async fn handle_any(
&self,
fn handle_any<'a>(
&'a self,
invocation: ToolInvocation,
) -> Result<AnyToolResult, FunctionCallError> {
let call_id = invocation.call_id.clone();
let payload = invocation.payload.clone();
let output = self.handle(invocation).await?;
Ok(AnyToolResult {
call_id,
payload,
result: Box::new(output),
) -> BoxFuture<'a, Result<AnyToolResult, FunctionCallError>> {
Box::pin(async move {
let call_id = invocation.call_id.clone();
let payload = invocation.payload.clone();
let output = self.handle(invocation).await?;
Ok(AnyToolResult {
call_id,
payload,
result: Box::new(output),
})
})
}
}
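The `ToolHandler` changes apply the same idea to provided methods: `is_mutating` is declared as a plain `fn` returning `impl Future<Output = bool> + Send` with an `async` block as its default body, because an `async fn` in a trait cannot attach a `Send` bound to the future it returns. A minimal sketch with illustrative names (not the real `ToolHandler` API):

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

trait Handler: Send + Sync {
    // Conservative default: assume read-only. The body is an `async`
    // block because the signature spells out the `+ Send` bound that
    // an `async fn` in a trait could not express.
    fn is_mutating(&self) -> impl Future<Output = bool> + Send {
        async { false }
    }
}

struct ReadOnly;
impl Handler for ReadOnly {} // inherits the default

struct Writer;
impl Handler for Writer {
    // Overrides can still be plain `async fn`; the compiler checks the
    // resulting future against the trait's `impl Future + Send` bound.
    async fn is_mutating(&self) -> bool {
        true
    }
}

// One-shot poll with a no-op waker; the futures above are always ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    const RAW: RawWaker = RawWaker::new(std::ptr::null(), &VTABLE);
    const VTABLE: RawWakerVTable = RawWakerVTable::new(|_| RAW, |_| {}, |_| {}, |_| {});
    let waker = unsafe { Waker::from_raw(RAW) };
    let mut cx = Context::from_waker(&waker);
    match pin!(fut).poll(&mut cx) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("future was expected to be ready"),
    }
}

fn main() {
    assert!(!block_on(ReadOnly.is_mutating()));
    assert!(block_on(Writer.is_mutating()));
}
```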

View File

@@ -1,10 +1,8 @@
use super::*;
use async_trait::async_trait;
use pretty_assertions::assert_eq;
struct TestHandler;
#[async_trait]
impl ToolHandler for TestHandler {
type Output = crate::tools::context::FunctionToolOutput;

View File

@@ -355,10 +355,12 @@ impl ProviderAuthScript {
fn new(tokens: &[&str]) -> std::io::Result<Self> {
let tempdir = tempfile::tempdir()?;
let token_file = tempdir.path().join("tokens.txt");
// `cmd.exe`'s `set /p` treats LF-only input as one line, so use CRLF on Windows.
let token_line_ending = if cfg!(windows) { "\r\n" } else { "\n" };
let mut token_file_contents = String::new();
for token in tokens {
token_file_contents.push_str(token);
token_file_contents.push('\n');
token_file_contents.push_str(token_line_ending);
}
std::fs::write(&token_file, token_file_contents)?;
@@ -385,23 +387,28 @@ mv tokens.next tokens.txt
#[cfg(windows)]
let (command, args) = {
let script_path = tempdir.path().join("print-token.ps1");
let script_path = tempdir.path().join("print-token.cmd");
std::fs::write(
&script_path,
r#"$lines = @(Get-Content -Path tokens.txt)
if ($lines.Count -eq 0) { exit 1 }
Write-Output $lines[0]
$lines | Select-Object -Skip 1 | Set-Content -Path tokens.txt
r#"@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "first_line="
<tokens.txt set /p "first_line="
if not defined first_line exit /b 1
setlocal EnableDelayedExpansion
echo(!first_line!
endlocal
more +1 tokens.txt > tokens.next
move /y tokens.next tokens.txt >nul
"#,
)?;
(
"powershell.exe".to_string(),
"cmd.exe".to_string(),
vec![
"-NoProfile".to_string(),
"-ExecutionPolicy".to_string(),
"Bypass".to_string(),
"-File".to_string(),
".\\print-token.ps1".to_string(),
"/d".to_string(),
"/s".to_string(),
"/c".to_string(),
".\\print-token.cmd".to_string(),
],
)
};
@@ -436,13 +443,12 @@ exit 1
#[cfg(windows)]
let (command, args) = (
"powershell.exe".to_string(),
"cmd.exe".to_string(),
vec![
"-NoProfile".to_string(),
"-ExecutionPolicy".to_string(),
"Bypass".to_string(),
"-Command".to_string(),
"exit 1".to_string(),
"/d".to_string(),
"/s".to_string(),
"/c".to_string(),
"exit /b 1".to_string(),
],
);
@@ -457,8 +463,8 @@ exit 1
 serde_json::from_value(json!({
 "command": self.command,
 "args": self.args,
-// `powershell.exe` startup can be slow on loaded Windows CI workers, so leave enough
-// slack to avoid turning these auth-cache assertions into a process-launch timing test.
+// Process startup can be slow on loaded Windows CI workers, so leave enough slack to
+// avoid turning these auth-cache assertions into a process-launch timing test.
 "timeout_ms": 10_000,
 "refresh_interval_ms": 60000,
 "cwd": self.tempdir.path(),


@@ -569,11 +569,10 @@ impl Tui {
 terminal.invalidate_viewport();
 }
+let area = terminal.viewport_area;
 // Update the y position for suspending so Ctrl-Z can place the cursor correctly.
 #[cfg(unix)]
 {
-let area = terminal.viewport_area;
 let inline_area_bottom = if self.alt_screen_active.load(Ordering::Relaxed) {
 self.alt_saved_viewport
 .map(|r| r.bottom().saturating_sub(1))


@@ -0,0 +1,328 @@
---
name: codex-applied-devbox
description: Sync a local Codex worktree from `~/code/codex-worktrees/` to a mirrored path on a remote host, then run a reproducible remote build or exec command there.
---
# Codex Applied Devbox
Use this skill when you want local file editing/search on your laptop, but want
the actual build or execution to happen on a remote host such as `dev`.
This skill assumes:
- remote host alias: `dev`
- local Codex worktree root: `~/code/codex-worktrees`
- remote mirror root: `/tmp/codex-worktrees`
If the box itself needs to be created, resumed, suspended, or inspected, use
the `applied-devbox` skill first.
## Objective
1. Create or reuse a local worktree under `~/code/codex-worktrees/`.
2. Mirror that worktree to the remote host under `/tmp/codex-worktrees/`.
3. Run one configurable remote Bazel command against the mirrored copy.
4. Keep the flow reproducible by excluding build artifacts and local repo state.
## Operator Defaults
When using this skill interactively, the operator should bias toward immediate
execution over setup-heavy preflights.
Default posture:
- If the user asks for a specific PR or branch, create a fresh worktree first.
- Do not spend time checking whether an equivalent worktree already exists
unless the user explicitly asked to reuse one.
- Assume `dev` is reachable and run the sync directly; only debug SSH or remote
prereqs after the real command fails.
- Avoid separate "can I reach the host?" or "does rsync exist remotely?"
checks unless there is a known problem pattern.
- Prefer one end-to-end attempt over multiple speculative probes.
In practice, that means the operator should usually do this:
1. Fetch the requested PR or ref.
2. Create a new local worktree under `~/code/codex-worktrees/`.
3. Run `sync-worktree-and-run` immediately.
4. Only inspect host reachability, missing tools, or conflicting paths if that
end-to-end run fails.
### PR Fast Path
For a request like "build PR 16620 on devbox", prefer this shape:
```bash
mkdir -p ~/code/codex-worktrees
git -C ~/code/codex fetch origin pull/16620/head
git -C ~/code/codex worktree add -b pr-16620 \
~/code/codex-worktrees/pr-16620 FETCH_HEAD
skills/codex-applied-devbox/scripts/sync-worktree-and-run \
~/code/codex-worktrees/pr-16620
```
This is intentionally direct. It skips separate validation steps and lets the
real sync/build path prove whether the environment is healthy.
## Key rule for concurrent builds
- Keep each worktree as its own Bazel workspace path.
- Let Bazel derive a separate `output_base` per worktree automatically.
- Reuse the shared caches from `.bazelrc`:
- `~/.cache/bazel-disk-cache`
- `~/.cache/bazel-repo-cache`
- `~/.cache/bazel-repo-contents-cache`
- Do not force a shared `--output_base` across two live worktrees.
On `dev`, this has already been validated with two mirrored worktrees:
- both builds started at the same second
- each worktree got its own Bazel server and `output_base`
- both builds reused shared cache state and completed successfully
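The isolation comes from how Bazel names its default output base: per the Bazel docs it is `<output_root>/_bazel_$USER/<md5 of the workspace path>`, so two mirrored worktrees can never collide. A quick local sketch of that derivation (the worktree paths are examples):

```bash
# Bazel's default output_base is <output_root>/_bazel_$USER/<md5(workspace path)>,
# so distinct worktree paths always map to distinct output bases.
hash_a=$(printf '%s' /tmp/codex-worktrees/pr-16620 | md5sum | awk '{print $1}')
hash_b=$(printf '%s' /tmp/codex-worktrees/my-feature | md5sum | awk '{print $1}')
echo "pr-16620   -> ~/.cache/bazel/_bazel_$USER/$hash_a"
echo "my-feature -> ~/.cache/bazel/_bazel_$USER/$hash_b"
[ "$hash_a" != "$hash_b" ] && echo "no output_base collision"
```

Running `bazel info output_base` inside each mirrored worktree prints the real directory if you want to confirm on the box.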
## Script
The script lives at:
`skills/codex-applied-devbox/scripts/sync-worktree-and-run`
Default behavior:
- host: `dev`
- local worktree root: `~/code/codex-worktrees`
- remote mirror root: `/tmp/codex-worktrees`
- remote command:
`cd codex-rs && export PATH=$HOME/code/openai/project/dotslash-gen/bin:$HOME/.local/bin:$PATH && bazel build --bes_backend= --bes_results_url= //codex-rs/cli:cli`
- prints the exact copy-paste SSH command for running Codex via Bazel in the mirrored checkout
- the printed helper command intentionally stays on the mirrored-worktree Bazel path and uses `//codex-rs/cli:codex`
Example:
```bash
skills/codex-applied-devbox/scripts/sync-worktree-and-run \
~/code/codex-worktrees/my-feature
```
This will mirror:
- local: `~/code/codex-worktrees/my-feature`
- remote: `/tmp/codex-worktrees/my-feature`
It will print:
```bash
ssh -t dev 'cd /tmp/codex-worktrees/my-feature/codex-rs && export PATH=$HOME/code/openai/project/dotslash-gen/bin:$HOME/.local/bin:$PATH && bazel run --bes_backend= --bes_results_url= //codex-rs/cli:codex --'
```
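The whole remote command is wrapped in single quotes by the script's `shell_single_quote` helper, so `$PATH` and the trailing `--` survive the trip through SSH untouched. The escaping rule it uses (close the quote, insert an escaped quote, reopen) can be sketched standalone:

```bash
# Standalone copy of the script's shell_single_quote helper: every embedded
# single quote becomes '"'"' and the whole string is wrapped in single quotes.
shell_single_quote() {
  local value="$1"
  value=${value//\'/\'\"\'\"\'}
  printf "'%s'" "$value"
}
quoted=$(shell_single_quote 'echo $PATH && bazel run //codex-rs/cli:codex --')
echo "ssh -t dev $quoted"
```

Re-parsing the quoted string on the remote side yields the original command byte for byte, even when it contains single quotes of its own.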
Custom host, remote root, and command:
```bash
skills/codex-applied-devbox/scripts/sync-worktree-and-run \
--host dev \
--remote-root /tmp/codex-worktrees \
--command 'cd codex-rs && export PATH=$HOME/code/openai/project/dotslash-gen/bin:$HOME/.local/bin:$PATH && bazel build --bes_backend= --bes_results_url= //codex-rs/tui:tui' \
~/code/codex-worktrees/my-feature
```
## Recommended setup
1. Create the local worktree from your main Codex checkout.
```bash
mkdir -p ~/code/codex-worktrees
git -C ~/code/codex worktree add -b my-feature \
~/code/codex-worktrees/my-feature origin/main
```
2. Edit locally or fetch the PR/ref you want to test.
3. Sync and build remotely immediately:
```bash
skills/codex-applied-devbox/scripts/sync-worktree-and-run \
~/code/codex-worktrees/my-feature
```
4. Repeat sync/build as needed after local edits.
## Retrospective Notes
The main friction in a real run was not rsync itself. It was operator delay
before the first real attempt:
- checking whether a matching worktree already existed before simply creating
the one needed for the task
- verifying host reachability before letting the real sync prove it
- remembering the mirrored remote path after the sync
- hand-writing the SSH command needed to run Codex in that mirrored checkout
- waiting through a cold Bazel build with no simple "jump in here yourself"
command printed by the script
The current script update addresses the third and fourth issues by printing the
exact `ssh -t ...` command for running Codex in the mirrored checkout.
This skill update addresses the first two issues by telling the operator to
start the end-to-end flow sooner and only investigate after an actual failure.
The next improvements worth making, if you want this flow to feel faster and
more automatic, are:
- add `--pr <number>` so the script can fetch `pull/<n>/head` and create or
reuse `~/code/codex-worktrees/pr-<n>` itself
- add `--tmux-window <name>` support so the remote command can start in a named
tmux session/window and print the exact follow/log command
- add an optional "sync only changed files" mode driven by git status or
`git diff --name-only` for large worktrees
- add an optional `--bazel-target <label>` shortcut so users do not have to
remember the common labels
## Validated run paths on `dev`
What has been verified:
- `sync-worktree-and-run` can mirror the local worktree and complete a remote
Bazel build with:
`bazel build --bes_backend= --bes_results_url= //codex-rs/cli:cli`
- on current `main`, `bazel run --bes_backend= --bes_results_url=
//codex-rs/cli:codex --` builds successfully on `dev`
Practical note:
- older pre-`#16634` checkouts could fail on `dev` when launching
`//codex-rs/cli:codex`; treat current `main` as the baseline before carrying
that older caveat forward
## Bazel defaults on the devbox
Use this decision rule:
- Default to Bazel for remote builds in mirrored worktrees.
- Keep the existing `.bazelrc` cache settings; they already share the useful
cache layers across worktrees.
- On `dev`, clear the BES flags for routine builds:
`--bes_backend= --bes_results_url=`
- Prepend both common Bazel locations to `PATH`:
`export PATH=$HOME/code/openai/project/dotslash-gen/bin:$HOME/.local/bin:$PATH`
- Prefer labels that have already been validated on the host:
- `//codex-rs/cli:cli`
- `//codex-rs/tui:tui`
- `//codex-rs/utils/absolute-path:absolute-path`
Current practical note:
- older pre-`#16634` checkouts could fail on `dev` when launching
`//codex-rs/cli:codex`; re-test current `main` before treating that older
caveat as still active
What is shared versus isolated:
- Shared across worktrees:
- `~/.cache/bazel-disk-cache`
- `~/.cache/bazel-repo-cache`
- `~/.cache/bazel-repo-contents-cache`
- the Bazel install base under `~/.cache/bazel/_bazel_dev-user/install`
- Still per worktree:
- each `output_base`
- each Bazel server
- mutable workspace-specific state under
`~/.cache/bazel/_bazel_dev-user/<hash>`
That means this setup saves disk space compared with giving every worktree its
own completely separate Bazel root, but it does not eliminate the large
per-worktree `output_base` directories.
## Fresh default devbox bootstrap
This was validated against a fresh box created with a temporary minimal config
override, not your personal `~/.config/applied-devbox/config.toml`.
Validated sequence:
1. Create a minimal config file locally and point `APPLIED_DEVBOX_CONFIG` at it.
An empty file is enough if you want the CLI's built-in defaults without your
personal apt/git/custom-setup additions.
2. Create the box:
```bash
APPLIED_DEVBOX_CONFIG=/tmp/applied-devbox-default-config.toml \
a devbox new codex-bazel-0402-1800 \
--sku cpu64 \
--home-size 2Ti \
--skip-secret-setup \
--skip-tool-setup
```
If you expect large Bazel output trees or long-lived mirrored worktrees, prefer
`--sku cpu64 --home-size 2Ti` over the smaller defaults.
3. If the first `a devbox ssh` fails on websocket transport, establish
connectivity with:
```bash
APPLIED_DEVBOX_CONFIG=/tmp/applied-devbox-default-config.toml \
a devbox ssh codex-bazel-0402-1800 --no-ws --no-tmux -- bash -lc 'hostname && whoami'
```
After that, direct `ssh codex-bazel-0402-1800` was available on this machine.
4. Install `rsync` once on the new box:
```bash
ssh codex-bazel-0402-1800 'sudo apt-get update && sudo apt-get install -y rsync'
```
5. Run the mirrored Bazel build:
```bash
skills/codex-applied-devbox/scripts/sync-worktree-and-run \
--host codex-bazel-0402-1800 \
~/code/codex-worktrees/my-feature
```
What was validated on the fresh box:
- the box came up from a default-style config override
- the first websocket-based SSH attempt failed, but `--no-ws` succeeded
- plain `ssh <box>` worked after the first successful `--no-ws` connection
- `rsync` was the only package that had to be installed manually
- Bazel was already available from the default OpenAI clone at
`~/code/openai/project/dotslash-gen/bin`
- the first mirrored `//codex-rs/cli:cli` build completed successfully in
`68.24s`
## Sync exclusions
The script excludes:
- `.git`
- `.sl`
- `.jj`
- `target`
- `node_modules`
- `.venv`, `venv`
- `dist`, `build`, `.next`
- `.pytest_cache`, `.mypy_cache`, `__pycache__`, `.ruff_cache`
- `.DS_Store`
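If you need the same exclusions for an ad-hoc rsync outside the script, an `--exclude-from` file keeps the list in one place (the file path here is an example):

```bash
# Write the script's exclusion list to a reusable rsync filter file.
cat > /tmp/codex-sync-excludes <<'EOF'
.git
.sl
.jj
target
node_modules
.venv
venv
dist
build
.next
.pytest_cache
.mypy_cache
__pycache__
.ruff_cache
.DS_Store
EOF
# Then: rsync -a --delete --exclude-from=/tmp/codex-sync-excludes -e ssh SRC/ dev:DST/
wc -l < /tmp/codex-sync-excludes   # 15 patterns
```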
## Cleanup
Remove a stale remote mirror:
```bash
ssh dev 'rm -rf /tmp/codex-worktrees/my-feature'
```
Remove the local worktree when finished:
```bash
git -C ~/code/codex worktree remove ~/code/codex-worktrees/my-feature
git -C ~/code/codex branch -D my-feature
```
## Guardrails
- Treat the local worktree as the editing source of truth.
- Treat the mirrored remote copy as disposable build state.
- Do not sync `.git` or build outputs.
- Keep the local worktree under `~/code/codex-worktrees/` so the mirror path is
stable and easy to clean up.
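The last guardrail is exactly what the sync script's prefix check enforces before computing the mirror path; a minimal standalone version of that check (the function name is illustrative):

```bash
# Mirror of the sync script's case-statement check: a worktree qualifies only
# if its absolute path sits under the expected local root.
under_root() {
  local root="$1" path="$2"
  case "$path/" in
    "$root"/*) return 0 ;;
    *) return 1 ;;
  esac
}
under_root "$HOME/code/codex-worktrees" "$HOME/code/codex-worktrees/my-feature" \
  && echo "ok: will mirror to /tmp/codex-worktrees/my-feature"
```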


@@ -0,0 +1,165 @@
#!/usr/bin/env bash
set -euo pipefail
usage() {
cat <<'EOF'
Usage:
sync-worktree-and-run [options] <local-worktree>
Sync a local Codex worktree to a mirrored path on a remote host, then run a
command there.
Options:
--host <host> Remote SSH host. Default: dev
--local-root <path> Expected local worktree root.
Default: ~/code/codex-worktrees
--remote-root <path> Remote mirror root.
Default: /tmp/codex-worktrees
--command <command> Command to run on the remote copy.
Default: cd codex-rs &&
export PATH=$HOME/code/openai/project/dotslash-gen/bin:
$HOME/.local/bin:$PATH &&
bazel build --bes_backend= --bes_results_url=
//codex-rs/cli:cli
Prints the exact Bazel-backed Codex SSH run command for the mirrored
checkout on every run.
-h, --help Show this help text.
EOF
}
shell_single_quote() {
local value="$1"
value=${value//\'/\'\"\'\"\'}
printf "'%s'" "$value"
}
host="dev"
local_root="$HOME/code/codex-worktrees"
remote_root="/tmp/codex-worktrees"
command_to_run='cd codex-rs && export PATH=$HOME/code/openai/project/dotslash-gen/bin:$HOME/.local/bin:$PATH && bazel build --bes_backend= --bes_results_url= //codex-rs/cli:cli'
local_worktree=""
while [[ $# -gt 0 ]]; do
case "$1" in
--host)
host="$2"
shift 2
;;
--local-root)
local_root="$2"
shift 2
;;
--remote-root)
remote_root="$2"
shift 2
;;
--command)
command_to_run="$2"
shift 2
;;
-h|--help)
usage
exit 0
;;
-*)
echo "unknown option: $1" >&2
usage >&2
exit 2
;;
*)
if [[ -n "$local_worktree" ]]; then
echo "expected exactly one local worktree path" >&2
usage >&2
exit 2
fi
local_worktree="$1"
shift
;;
esac
done
if [[ -z "$local_worktree" ]]; then
echo "missing local worktree path" >&2
usage >&2
exit 2
fi
if [[ ! -d "$local_worktree" ]]; then
echo "local worktree does not exist: $local_worktree" >&2
exit 1
fi
if [[ ! -d "$local_root" ]]; then
echo "local root does not exist: $local_root" >&2
exit 1
fi
local_root_abs="$(cd "$local_root" && pwd -P)"
local_worktree_abs="$(cd "$local_worktree" && pwd -P)"
case "$local_worktree_abs/" in
"$local_root_abs"/*)
relative_path="${local_worktree_abs#"$local_root_abs"/}"
;;
*)
echo "local worktree must live under local root: $local_root_abs" >&2
exit 1
;;
esac
remote_worktree="${remote_root%/}/$relative_path"
remote_codex_dir="${remote_worktree%/}/codex-rs"
remote_codex_run_command="cd $remote_codex_dir && export PATH=\$HOME/code/openai/project/dotslash-gen/bin:\$HOME/.local/bin:\$PATH && bazel run --bes_backend= --bes_results_url= //codex-rs/cli:codex --"
echo "# Mirrored-worktree Bazel Codex run command:"
echo "ssh -t $host $(shell_single_quote "$remote_codex_run_command")"
if ! command -v rsync >/dev/null 2>&1; then
echo "local rsync is not installed or not on PATH" >&2
exit 1
fi
if ! ssh "$host" 'command -v rsync >/dev/null 2>&1'; then
echo "remote rsync is not installed on $host" >&2
echo "try: ssh $host 'sudo apt-get update && sudo apt-get install -y rsync'" >&2
exit 1
fi
printf -v remote_worktree_q '%q' "$remote_worktree"
ssh "$host" "mkdir -p $remote_worktree_q"
rsync -a --delete \
--exclude='.git' \
--exclude='.sl' \
--exclude='.jj' \
--exclude='target' \
--exclude='node_modules' \
--exclude='.venv' \
--exclude='venv' \
--exclude='dist' \
--exclude='build' \
--exclude='.next' \
--exclude='.pytest_cache' \
--exclude='.mypy_cache' \
--exclude='__pycache__' \
--exclude='.ruff_cache' \
--exclude='.DS_Store' \
-e ssh \
"$local_worktree_abs/" \
"$host:$remote_worktree/"
printf -v remote_worktree_q '%q' "$remote_worktree"
printf -v command_to_run_q '%q' "$command_to_run"
ssh "$host" "bash -s" <<EOF
set -euo pipefail
remote_worktree=$remote_worktree_q
command_to_run=$command_to_run_q
cd "\$remote_worktree"
echo "REMOTE_PWD=\$PWD"
eval "\$command_to_run"
EOF