## Why
We were seeing failures in the following tests as part of trying to get
all the tests running under Bazel on Windows in CI
(https://github.com/openai/codex/pull/16528):
```
suite::shell_command::unicode_output::with_login
suite::shell_command::unicode_output::without_login
```
Certainly `PATHEXT` should have been included in the extra `CORE_VARS`
list, so we fix that up here, but also take things a step further for
now by forcibly ensuring it is set on Windows in the return value of
`create_env()`. Once we get the Windows Bazel build working reliably
(i.e., after #16528 is merged), we should come back to this and confirm
we can remove the special case in `create_env()`.
## What
- Split core env inheritance into `COMMON_CORE_VARS` plus
platform-specific allowlists for Windows and Unix in
[`exec_env.rs`](1b55c88fbf/codex-rs/core/src/exec_env.rs (L45-L81)).
- Preserve `PATHEXT`, `USERNAME`, and `USERPROFILE` on Windows, and
`HOME` / locale vars on Unix.
- Backfill a default `PATHEXT` in `create_env()` on Windows if the
parent env does not provide one, so child process launch still works in
stripped-down Bazel environments.
- Extend the Windows exec-env test to assert mixed-case `PathExt`
survives case-insensitive core filtering, and document why the
shell-command Unicode test goes through a child process.
## Verification
- `cargo test -p codex-core exec_env::tests`
Addresses #16124
Problem: `codex --remote --cd <path>` canonicalized the path locally and
then omitted it from remote thread lifecycle requests, so remote-only
working directories failed or were ignored.
Solution: Keep remote startup on the local cwd, forward explicit `--cd`
values verbatim to `thread/start`, `thread/resume`, and `thread/fork`,
and cover the behavior with `codex-tui` tests.
Testing: I manually tested `--remote --cd` with both absolute and
relative paths and validated correct behavior.
---
Update based on code review feedback:
Problem: Remote `--cd` was forwarded to `thread/resume` and
`thread/fork`, but not to `thread/list` lookups, so `--resume --last`
and picker flows could select a session from the wrong cwd; relative cwd
filters also failed against stored absolute paths.
Solution: Apply explicit remote `--cd` to `thread/list` lookups for
`--last` and picker flows, normalize relative cwd filters on the
app-server before exact matching, and document/test the behavior.
Addresses #11555
Problem: macOS malloc stack-logging diagnostics could leak into the TUI
composer and get misclassified as pasted user input.
Solution: Strip `MallocStackLogging*` and `MallocLogFile*` during macOS
pre-main hardening and document the additional env cleanup.
Addresses #15640
Problem: `codex exec` panicked on macOS when sandboxed proxy discovery
hit a NULL `SCDynamicStore` handle in `system-configuration`.
Solution: Bump `hyper-util` and `system-configuration` to versions that
handle denied `configd` lookups safely, and refresh the Bazel lockfile.
Testing: Verified using the manual `printf '(version 1) (allow default)
(deny mach-lookup (global-name
"com.apple.SystemConfiguration.configd"))' > /tmp/deny-configd.sb
sandbox-exec -f /tmp/deny-configd.sb codex exec -s danger-full-access
"echo test"`. Prior to the fix, this caused a panic.
Addresses #15282
Problem: Codex warned about missing system bubblewrap even when
sandboxing was disabled.
Solution: Gate the bwrap warning on the active sandbox policy and skip
it for danger-full-access and external-sandbox modes.
Addresses #16671 and #14927
Problem: `mcpServerStatus/list` rebuilt MCP tool groups from sanitized
tool prefixes but looked them up by unsanitized server names, so
hyphenated servers rendered as having no tools in `/mcp`. This was
reported as a regression when the TUI switched to use the app server.
Solution: Build each server's tool map using the original server name's
sanitized prefix, include effective runtime MCP servers in the status
response, and add a regression test for hyphenated server names.
Addresses #16454
Problem: `/copy` could keep stale output after a turn with
commentary-only assistant text.
Solution: Cache the latest non-empty agent message during a turn and
promote it on turn completion.
Stacked on #16508.
This removes the temporary `codex-core` / `codex-login` re-export shims
from the ownership split and rewrites callsites to import directly from
`codex-model-provider-info`, `codex-models-manager`, `codex-api`,
`codex-protocol`, `codex-feedback`, and `codex-response-debug-context`.
No behavior change intended; this is the mechanical import cleanup layer
split out from the ownership move.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
This is a follow-up to #16665. The Windows `unicode_output` test should
still exercise a child process so it verifies PowerShell's UTF-8 output
configuration, but `$env:COMSPEC` depends on that environment variable
surviving the curated Bazel test environment.
Using `cmd.exe` keeps the child-process coverage while avoiding both
bare `cmd` + `PATHEXT` lookup and `$env:COMSPEC` env passthrough
assumptions.
## What
- Run `cmd.exe /c echo naïve_café` in the Windows branch of
`unicode_output`.
## Verification
- `cargo test -p codex-core unicode_output`
## Why
Windows Bazel shell tests launch PowerShell with a curated environment,
so `PATHEXT` may be absent. The existing `unicode_output` test invokes
bare `cmd`, which can fail before the test exercises UTF-8 child-process
output.
## What
- Use `$env:COMSPEC /c echo naïve_café` in the Windows branch of
`unicode_output`.
- Preserve the external child-process path instead of switching the test
to a PowerShell builtin.
## Verification
- `cargo test -p codex-core unicode_output`
https://github.com/openai/codex/pull/16460 was a large PR created by
Codex to try to get the tests to pass under Bazel on Windows. Indeed, it
successfully ran all of the tests under `//codex-rs/core:` with its
changes to `codex-rs/core/`, though the full set of changes seems to be
too broad.
This PR tries to port the key changes, which are:
- Under Bazel, the `USERNAME` environment variable is not guaranteed to
be set on Windows, so for tests that need a non-empty env var as a
convenient substitute for an env var containing an API key, just use
`PATH`. Note that `PATH` is unlikely to contain characters that are not
allowed in an HTTP header value.
- Specify `"powershell.exe"` instead of just `"powershell"` in case the
`PATHEXT` env var gets lost in the shuffle.
## Summary
- split `models-manager` out of `core` and add `ModelsManagerConfig`
plus `Config::to_models_manager_config()` so model metadata paths stop
depending on `core::Config`
- move login-owned/auth-owned code out of `core` into `codex-login`,
move model provider config into `codex-model-provider-info`, move API
bridge mapping into `codex-api`, move protocol-owned types/impls into
`codex-protocol`, and move response debug helpers into a dedicated
`response-debug-context` crate
- move feedback tag emission into `codex-feedback`, relocate tests to
the crates that now own the code, and keep broad temporary re-exports so
this PR avoids a giant import-only rewrite
## Major moves and decisions
- created `codex-models-manager` as the owner for model
cache/catalog/config/model info logic, including the new
`ModelsManagerConfig` struct
- created `codex-model-provider-info` as the owner for provider config
parsing/defaults and kept temporary `codex-login`/`codex-core`
re-exports for old import paths
- moved `api_bridge` error mapping + `CoreAuthProvider` into
`codex-api`, while `codex-login::api_bridge` temporarily re-exports
those symbols and keeps the `auth_provider_from_auth` wrapper
- moved `auth_env_telemetry` and `provider_auth` ownership to
`codex-login`
- moved `CodexErr` ownership to `codex-protocol::error`, plus
`StreamOutput`, `bytes_to_string_smart`, and network policy helpers to
protocol-owned modules
- created `codex-response-debug-context` for
`extract_response_debug_context`, `telemetry_transport_error_message`,
and related response-debug plumbing instead of leaving that behavior in
`core`
- moved `FeedbackRequestTags`, `emit_feedback_request_tags`, and
`emit_feedback_request_tags_with_auth_env` to `codex-feedback`
- deferred removal of temporary re-exports and the mechanical import
rewrites to a stacked follow-up PR so this PR stays reviewable
## Test moves
- moved auth refresh coverage from `core/tests/suite/auth_refresh.rs` to
`login/tests/suite/auth_refresh.rs`
- moved text encoding coverage from
`core/tests/suite/text_encoding_fix.rs` to
`protocol/src/exec_output_tests.rs`
- moved model info override coverage from
`core/tests/suite/model_info_overrides.rs` to
`models-manager/src/model_info_overrides_tests.rs`
---------
Co-authored-by: Codex <noreply@openai.com>
Addresses #16655
Problem: `codex login --api-key` failed in Clap before Codex could show
the deprecation guidance.
Solution: Allow the hidden `--api-key` flag to parse with zero or one
values so both forms reach the `--with-api-key` message.
## Why
The Windows `ProviderAuthScript` test helpers do not need PowerShell.
Running them through `cmd.exe` is enough to emit the next fixture token
and rotate `tokens.txt`, and it avoids a PowerShell-specific dependency
in these tests.
## What changed
- Replaced the Windows `print-token.ps1` fixtures with `print-token.cmd`
in `codex-rs/core/src/models_manager/manager_tests.rs` and
`codex-rs/login/src/auth/auth_tests.rs`.
- Switched the failing external-auth helper in
`codex-rs/login/src/auth/auth_tests.rs` from `powershell.exe -Command
'exit 1'` to `cmd.exe /d /s /c 'exit /b 1'`.
- Updated Windows timeout comments so they no longer call out PowerShell
specifically.
## Verification
- `cargo test -p codex-login`
- `cargo test -p codex-core` (fails in unrelated
`core/src/config/config_tests.rs` assertions in this checkout)
## Why
`thread/shellCommand` executes the raw command string through the
current user shell, which is PowerShell on Windows. The two v2
app-server tests in `app-server/tests/suite/v2/thread_shell_command.rs`
used POSIX `printf`, so Bazel CI on Windows failed with `printf` not
being recognized as a PowerShell command.
For reference, the user-shell task wraps commands with the active shell
before execution:
[`core/src/tasks/user_shell.rs`](7a3eec6fdb/codex-rs/core/src/tasks/user_shell.rs (L120-L126)).
## What Changed
Added a test-local helper that builds a shell-appropriate output command
and expected newline sequence from `default_user_shell()`:
- PowerShell: `Write-Output '...'` with `\r\n`
- Cmd: `echo ...` with `\r\n`
- POSIX shells: `printf '%s\n' ...` with `\n`
Both `thread_shell_command_runs_as_standalone_turn_and_persists_history`
and `thread_shell_command_uses_existing_active_turn` now use that
helper.
## Verification
- `cargo test -p codex-app-server thread_shell_command`
- Persist trusted cwd state during thread/start when the resolved
sandbox is elevated.
- Add app-server coverage for trusted root resolution and confirm
turn/start does not mutate trust.
## Why
This continues the compile-time cleanup from #16630. `SessionTask`
implementations are monomorphized, but `Session` stores the task behind
a `dyn` boundary so it can drive and abort heterogenous turn tasks
uniformly. That means we can move the `#[async_trait]` expansion off the
implementation trait, keep a small boxed adapter only at the storage
boundary, and preserve the existing task lifecycle semantics while
reducing the amount of generated async-trait glue in `codex-core`.
One measurement caveat showed up while exploring this: a warm
incremental benchmark based on `touch core/src/tasks/mod.rs && cargo
check -p codex-core --lib` was basically flat, but that was the wrong
benchmark for this change. Using package-clean `codex-core` rebuilds,
like #16630, shows the real win.
Relevant pre-change code:
- [`SessionTask` with
`#[async_trait]`](3c7f013f97/codex-rs/core/src/tasks/mod.rs (L129-L182))
- [`RunningTask` storing `Arc<dyn
SessionTask>`](3c7f013f97/codex-rs/core/src/state/turn.rs (L69-L77))
## What changed
- Switched `SessionTask::{run, abort}` to native RPITIT futures with
explicit `Send` bounds.
- Added a private `AnySessionTask` adapter that boxes those futures only
at the `Arc<dyn ...>` storage boundary.
- Updated `RunningTask` to store `Arc<dyn AnySessionTask>` and removed
`#[async_trait]` from the concrete task impls plus test-only
`SessionTask` impls.
## Timing
Benchmarked package-clean `codex-core` rebuilds with dependencies left
warm:
```shell
cargo check -p codex-core --lib >/dev/null
cargo clean -p codex-core >/dev/null
/usr/bin/time -p cargo +nightly rustc -p codex-core --lib -- \
-Z time-passes \
-Z time-passes-format=json >/dev/null
```
| revision | rustc `total` | process `real` | `generate_crate_metadata`
| `MIR_borrow_checking` | `monomorphization_collector_graph_walk` |
| --- | ---: | ---: | ---: | ---: | ---: |
| parent `3c7f013f9735` | 67.21s | 67.71s | 24.61s | 23.43s | 22.43s |
| this PR `2cafd783ac22` | 35.08s | 35.60s | 8.01s | 7.25s | 7.15s |
| delta | -47.8% | -47.4% | -67.5% | -69.1% | -68.1% |
For completeness, the warm touched-file benchmark stayed flat (`1.96s`
parent vs `1.97s` this PR), which is why that benchmark should not be
used to evaluate this refactor.
## Verification
- Ran `cargo test -p codex-core`; this change compiled and task-related
tests passed before hitting the same unrelated 5
`config::tests::*guardian*` failures already present on the parent
stack.