From d53e68954acee2eb50303970498ffebddff393ed Mon Sep 17 00:00:00 2001
From: anp-oai <anp@openai.com>
Date: Fri, 22 May 2026 09:58:14 -0700
Subject: [PATCH] Prefer `just test` over `cargo test` in docs (#23910)

`cargo test` for the core crate and others fails on a fresh macOS
checkout unless `RUST_MIN_STACK` is set. This change steers the docs
toward `just test`, which sets that variable before invoking nextest.

As a bonus, this should encourage agents to get more benefit out of
nextest's parallel execution.
---
 AGENTS.md                             | 11 ++++++-----
 codex-rs/app-server/README.md         |  2 +-
 codex-rs/core/tests/suite/live_cli.rs |  3 ++-
 codex-rs/utils/pty/README.md          |  2 +-
 docs/contributing.md                  |  2 +-
 docs/install.md                       |  8 +++-----
 justfile                              |  7 +++----
 scripts/test-remote-env.sh            |  2 +-
 8 files changed, 18 insertions(+), 19 deletions(-)
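
Reviewer note: the `justfile` change turns `test` into a recipe that accepts
extra arguments and forwards them to nextest. A minimal shell sketch of that
forwarding is below. The function name `just_test_sketch` and the stack size
value are hypothetical placeholders; the real value comes from the justfile's
`rust_min_stack` variable, which this patch does not show. The sketch prints
the command instead of running cargo, so it is safe to run anywhere.

```shell
#!/bin/sh
# Assumed placeholder value; not taken from the repository.
RUST_MIN_STACK_VALUE=8388608

# Hypothetical stand-in for the `test *args` recipe: prints the command
# line it would run rather than invoking cargo.
just_test_sketch() {
    printf 'RUST_MIN_STACK=%s cargo nextest run --no-fail-fast' "$RUST_MIN_STACK_VALUE"
    # "$@" expands to every argument passed to the function, one word each.
    if [ "$#" -gt 0 ]; then
        printf ' %s' "$@"
    fi
    printf '\n'
}

# Scoped run, as in `just test -p codex-tui`.
just_test_sketch -p codex-tui
# Full sweep, as in a bare `just test`.
just_test_sketch
```

With no arguments the sketch reduces to the old recipe body, which is why a
bare `just test` should behave as before.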
diff --git a/AGENTS.md b/AGENTS.md
index c13fdea641..9906d3039a 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -52,12 +52,13 @@ In the codex-rs folder where the rust code lives:
     the new implementation so the invariants stay close to the code that owns them.
   - Avoid adding new standalone methods to `codex-rs/tui/src/chatwidget.rs` unless the change is
     trivial; prefer new modules/files and keep `chatwidget.rs` focused on orchestration.
-- When running Rust commands (e.g. `just fix` or `cargo test`) be patient with the command and never try to kill them using the PID. Rust lock can make the execution slow, this is expected.
+- When running Rust commands (e.g. `just fix` or `just test`) be patient with the command and never try to kill it using the PID. Rust lock can make the execution slow; this is expected.
 
 Run `just fmt` (in `codex-rs` directory) automatically after you have finished making Rust code changes; do not ask for approval to run it. Additionally, run the tests:
 
-1. Run the test for the specific project that was changed. For example, if changes were made in `codex-rs/tui`, run `cargo test -p codex-tui`.
-2. Once those pass, if any changes were made in common, core, or protocol, run the complete test suite with `cargo test` (or `just test` if `cargo-nextest` is installed). Avoid `--all-features` for routine local runs because it expands the build matrix and can significantly increase `target/` disk usage; use it only when you specifically need full feature coverage. project-specific or individual tests can be run without asking the user, but do ask the user before running the complete test suite.
+1. Do not run `cargo test` directly. Use `just test` so test execution follows the repo defaults.
+2. Run the test for the specific project that was changed. For example, if changes were made in `codex-rs/tui`, run `just test -p codex-tui`.
+3. Once those pass, if any changes were made in common, core, or protocol, run the complete test suite with `just test`. Avoid `--all-features` for routine local runs because it expands the build matrix and can significantly increase `target/` disk usage; use it only when you specifically need full feature coverage. Project-specific or individual tests can be run without asking the user, but do ask the user before running the complete test suite.
 
 Before finalizing a large change to `codex-rs`, run `just fix -p <project>` (in `codex-rs` directory) to fix any linter issues in the code. Prefer scoping with `-p` to avoid slow workspace‑wide Clippy builds; only run `just fix` without `-p` if you changed shared crates. Do not re-run tests after running `fix` or `fmt`.
 
@@ -120,7 +121,7 @@ is easy to review and future diffs stay visual.
 When UI or text output changes intentionally, update the snapshots as follows:
 
 - Run tests to generate any updated snapshots:
-  - `cargo test -p codex-tui`
+  - `just test -p codex-tui`
 - Check what’s pending:
   - `cargo insta pending-snapshots -p codex-tui`
 - Review changes by reading the generated `*.snap.new` files directly in the repo, or preview a specific file:
@@ -214,6 +215,6 @@ These guidelines apply to app-server protocol work in `codex-rs`, especially:
 - Regenerate schema fixtures when API shapes change:
   `just write-app-server-schema`
   (and `just write-app-server-schema --experimental` when experimental API fixtures are affected).
-- Validate with `cargo test -p codex-app-server-protocol`.
+- Validate with `just test -p codex-app-server-protocol`.
 - Avoid boilerplate tests that only assert experimental field markers for individual
   request fields in `common.rs`; rely on schema generation/tests and behavioral coverage instead.
diff --git a/codex-rs/app-server/README.md b/codex-rs/app-server/README.md
index 2ceffc86fe..71b068c93f 100644
--- a/codex-rs/app-server/README.md
+++ b/codex-rs/app-server/README.md
@@ -1950,5 +1950,5 @@ For server-initiated request payloads, annotate the field the same way so schema
 5. Verify the protocol crate:
 
    ```bash
-   cargo test -p codex-app-server-protocol
+   just test -p codex-app-server-protocol
    ```
diff --git a/codex-rs/core/tests/suite/live_cli.rs b/codex-rs/core/tests/suite/live_cli.rs
index 5e2c0415ea..6273cd15e4 100644
--- a/codex-rs/core/tests/suite/live_cli.rs
+++ b/codex-rs/core/tests/suite/live_cli.rs
@@ -2,7 +2,8 @@
 
 //! Optional smoke tests that hit the real OpenAI /v1/responses endpoint. They are `#[ignore]` by
 //! default so CI stays deterministic and free. Developers can run them locally with
-//! `cargo test --test live_cli -- --ignored` provided they set a valid `OPENAI_API_KEY`.
+//! `just test -p codex-core --test all --run-ignored only live_cli` provided they set a valid
+//! `OPENAI_API_KEY`.
 
 use assert_cmd::prelude::*;
 use predicates::prelude::*;
diff --git a/codex-rs/utils/pty/README.md b/codex-rs/utils/pty/README.md
index e70d7bc6af..7b9df30d0a 100644
--- a/codex-rs/utils/pty/README.md
+++ b/codex-rs/utils/pty/README.md
@@ -60,5 +60,5 @@ Use `spawn_pipe_process_no_stdin` to force stdin closed (commands that read stdi
 Unit tests live in `src/lib.rs` and cover both backends (PTY Python REPL and pipe-based stdin roundtrip). Run with:
 
 ```
-cargo test -p codex-utils-pty -- --nocapture
+just test -p codex-utils-pty --no-capture
 ```
diff --git a/docs/contributing.md b/docs/contributing.md
index 19b31073e9..aeae1f10d3 100644
--- a/docs/contributing.md
+++ b/docs/contributing.md
@@ -54,7 +54,7 @@ When a change updates model catalogs or model metadata (`/models` payloads, pres
 
 - Fill in the PR template (or include similar information) - **What? Why? How?**
 - Include a link to a bug report or enhancement request in the issue tracker
-- Run **all** checks locally. Use the root `just` helpers so you stay consistent with the rest of the workspace: `just fmt`, `just fix -p <crate>` for the crate you touched, and the relevant tests (e.g., `cargo test -p codex-tui` or `just test` if you need a full sweep). CI failures that could have been caught locally slow down the process.
+- Run **all** checks locally. Use the root `just` helpers so you stay consistent with the rest of the workspace: `just fmt`, `just fix -p <crate>` for the crate you touched, and the relevant tests (e.g., `just test -p codex-tui` or `just test` if you need a full sweep). CI failures that could have been caught locally slow down the process.
 - Make sure your branch is up-to-date with `main` and that you have resolved merge conflicts.
 - Mark the PR as **Ready for review** only when you believe it is in a merge-able state.
 
diff --git a/docs/install.md b/docs/install.md
index 0991e7d16c..7c762c4c50 100644
--- a/docs/install.md
+++ b/docs/install.md
@@ -26,7 +26,7 @@ rustup component add rustfmt
 rustup component add clippy
 # Install helper tools used by the workspace justfile:
 cargo install --locked just
-# Optional: install nextest for the `just test` helper
+# Install nextest for the `just test` helper.
 cargo install --locked cargo-nextest
 
 # Build Codex.
@@ -40,13 +40,11 @@ just fmt
 just fix -p <crate-you-touched>
 
 # Run the relevant tests (project-specific is fastest), for example:
-cargo test -p codex-tui
-# If you have cargo-nextest installed, `just test` runs the test suite via nextest:
+just test -p codex-tui
+# `just test` runs the test suite via nextest:
 just test
 # Avoid `--all-features` for routine local runs because it increases build
 # time and `target/` disk usage by compiling additional feature combinations.
-# If you specifically want full feature coverage, use:
-cargo test --all-features
 ```
 
 ## Tracing / verbose logging
diff --git a/justfile b/justfile
index ab2fbc6362..907cd71f6d 100644
--- a/justfile
+++ b/justfile
@@ -46,14 +46,13 @@ install:
     rustup show active-toolchain
     cargo fetch
 
-# Run `cargo nextest` since it's faster than `cargo test`, though including
-# --no-fail-fast is important to ensure all tests are run.
+# Run nextest with --no-fail-fast so all tests are run.
 #
 # Run `cargo install --locked cargo-nextest` if you don't have it installed.
 # Prefer this for routine local runs. Workspace crate features are banned, so
 # there should be no need to add `--all-features`.
-test:
-    RUST_MIN_STACK={{ rust_min_stack }} cargo nextest run --no-fail-fast
+test *args:
+    RUST_MIN_STACK={{ rust_min_stack }} cargo nextest run --no-fail-fast "$@"
 
 # Build and run Codex from source using Bazel.
 # Note we have to use the combination of `[no-cd]` and `--run_under="cd $PWD &&"`
diff --git a/scripts/test-remote-env.sh b/scripts/test-remote-env.sh
index 96743616a2..584a0f6f29 100755
--- a/scripts/test-remote-env.sh
+++ b/scripts/test-remote-env.sh
@@ -5,7 +5,7 @@
 # Usage (source-only):
 #   source scripts/test-remote-env.sh
 #   cd codex-rs
-#   cargo test -p codex-core --test all remote_env_connects_creates_temp_dir_and_runs_sample_script
+#   just test -p codex-core --test all remote_test_env_can_connect_and_use_filesystem
 #   codex_remote_env_cleanup
 
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"