core: customize collab config deprecation warning

test: vendor zsh fork via DotSlash and stabilize zsh-fork tests (#12518)
## Why

The zsh integration tests were still brittle in two ways:

- they relied on `CODEX_TEST_ZSH_PATH` / environment-specific setup, so they often did not exercise the patched zsh fork that `shell-tool-mcp` ships
- once the tests consistently used the vendored zsh fork, they exposed real Linux-specific zsh-fork issues in CI

In particular, the Linux failures were not just test noise:

- the zsh-fork launch path was dropping `ExecRequest.arg0`, so Linux `codex-linux-sandbox` arg0 dispatch did not run and zsh wrapper-mode could receive malformed arguments
- the `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2` test uses the zsh exec bridge (which talks to the parent over a Unix socket), but Linux restricted sandbox seccomp denies `connect(2)`, causing timeouts on `ubuntu-24.04` x86/arm

This PR makes the zsh tests consistently run against the intended vendored zsh fork and fixes/hardens the zsh-fork path so the Linux CI signal is meaningful.

## What Changed

- Added a single shared test-only DotSlash file for the patched zsh fork at `codex-rs/exec-server/tests/suite/zsh` (analogous to the existing `bash` test resource).
- Updated both app-server and exec-server zsh tests to use that shared DotSlash zsh (no duplicate zsh DotSlash file, no `CODEX_TEST_ZSH_PATH` dependency).
- Updated the app-server zsh-fork test helper to resolve the shared DotSlash zsh and avoid silently falling back to host zsh.
- Kept the app-server zsh-fork tests configured via `config.toml`, using a test wrapper path where needed to force `zsh -df` (and rewrite `-lc` to `-c`) for the subcommand-decline test.
- Hardened the app-server subcommand-decline zsh-fork test for CI variability:
  - tolerate an extra `/responses` POST with a no-op mock response
  - tolerate non-target approval ordering while remaining strict on the two `/usr/bin/true` approvals and decline behavior
  - use `DangerFullAccess` on Linux for this one test because it validates zsh approval flow, not Linux sandbox socket restrictions
- Fixed zsh-fork process launching on Linux by preserving `req.arg0` in `ZshExecBridge::execute_shell_request(...)` so `codex-linux-sandbox` arg0 dispatch continues to work.
- Moved `maybe_run_zsh_exec_wrapper_mode()` under `arg0_dispatch_or_else(...)` in `app-server` and `cli` so wrapper-mode handling coexists correctly with arg0-dispatched helper modes.
- Consolidated duplicated `dotslash -- fetch` resolution logic into shared test support (`core/tests/common/lib.rs`).
- Updated `codex-rs/exec-server/tests/suite/accept_elicitation.rs` to use DotSlash zsh and hardened the zsh elicitation test for Bazel/zsh differences by:
  - resolving an absolute `git` path
  - running `git init --quiet .`
  - asserting success / `.git` creation instead of relying on banner text

## Verification

- `cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture`
- `cargo test -p codex-exec-server accept_elicitation -- --nocapture`
- `bazel test //codex-rs/exec-server:exec-server-all-test --test_output=streamed --test_arg=--nocapture --test_arg=accept_elicitation_for_prompt_rule_with_zsh`
- CI (`rust-ci`) on the final cleaned commit: `Tests — ubuntu-24.04 - x86_64-unknown-linux-gnu` and `Tests — ubuntu-24.04-arm - aarch64-unknown-linux-gnu` passed in [run 22291424358](https://github.com/openai/codex/actions/runs/22291424358)
2026-02-25 02:03:48 +00:00 · 2026-02-22 23:14:30 -05:00 · 2026-02-22 19:39:56 -08:00 · 2026-02-22 15:36:28 -08:00
15 changed files with 1460 additions and 104 deletions
--- a/.codex/skills/babysit-pr/SKILL.md
+++ b/.codex/skills/babysit-pr/SKILL.md
@@ -0,0 +1,185 @@
+---
+name: babysit-pr
+description: Babysit a GitHub pull request after creation by continuously polling CI checks/workflow runs, new review comments, and mergeability state until the PR is ready to merge (or merged/closed). Diagnose failures, retry likely flaky failures up to 3 times, auto-fix/push branch-related issues when appropriate, and stop only when user help is required (for example CI infrastructure issues, exhausted flaky retries, or ambiguous/blocking situations). Use when the user asks Codex to monitor a PR, watch CI, handle review comments, or keep an eye on failures and feedback on an open PR.
+---
+
+# PR Babysitter
+
+## Objective
+Babysit a PR persistently until one of these terminal outcomes occurs:
+
+- The PR is merged or closed.
+- CI is successful, there are no unaddressed review comments surfaced by the watcher, required review approval is not blocking merge, and there are no potential merge conflicts (PR is mergeable / not reporting conflict risk).
+- A situation requires user help (for example CI infrastructure issues, repeated flaky failures after retry budget is exhausted, permission problems, or ambiguity that cannot be resolved safely).
+
+Do not stop merely because a single snapshot returns `idle` while checks are still pending.
+
+## Inputs
+Accept any of the following:
+
+- No PR argument: infer the PR from the current branch (`--pr auto`)
+- PR number
+- PR URL
+
+## Core Workflow
+
+1. When the user asks to "monitor"/"watch"/"babysit" a PR, start with the watcher's continuous mode (`--watch`) unless you are intentionally doing a one-shot diagnostic snapshot.
+2. Run the watcher script to snapshot PR/CI/review state (or consume each streamed snapshot from `--watch`).
+3. Inspect the `actions` list in the JSON response.
+4. If `diagnose_ci_failure` is present, inspect failed run logs and classify the failure.
+5. If the failure is likely caused by the current branch, patch code locally, commit, and push.
+6. If `process_review_comment` is present, inspect surfaced review items and decide whether to address them.
+7. If a review item is actionable and correct, patch code locally, commit, and push.
+8. If the failure is likely flaky/unrelated and `retry_failed_checks` is present, rerun failed jobs with `--retry-failed-now`.
+9. If both actionable review feedback and `retry_failed_checks` are present, prioritize review feedback first; a new commit will retrigger CI, so avoid rerunning flaky checks on the old SHA unless you intentionally defer the review change.
+10. On every loop, verify mergeability / merge-conflict status (for example via `gh pr view`) in addition to CI and review state.
+11. After any push or rerun action, immediately return to step 1 and continue polling on the updated SHA/state.
+12. If you had been using `--watch` before pausing to patch/commit/push, relaunch `--watch` yourself in the same turn immediately after the push (do not wait for the user to re-invoke the skill).
+13. Repeat polling until the PR is green + review-clean + mergeable, `stop_pr_closed` appears, or a user-help-required blocker is reached.
+14. Maintain terminal/session ownership: while babysitting is active, keep consuming watcher output in the same turn; do not leave a detached `--watch` process running and then end the turn as if monitoring were complete.
+
+## Commands
+
+### One-shot snapshot
+
+```bash
+python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --once
+```
+
+### Continuous watch (JSONL)
+
+```bash
+python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --watch
+```
+
+### Trigger flaky retry cycle (only when watcher indicates)
+
+```bash
+python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --retry-failed-now
+```
+
+### Explicit PR target
+
+```bash
+python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr <number-or-url> --once
+```
+
+## CI Failure Classification
+Use `gh` commands to inspect failed runs before deciding to rerun.
+
+- `gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha`
+- `gh run view <run-id> --log-failed`
+
+Prefer treating failures as branch-related when logs point to changed code (compile/test/lint/typecheck/snapshots/static analysis in touched areas).
+
+Prefer treating failures as flaky/unrelated when logs show transient infra/external issues (timeouts, runner provisioning failures, registry/network outages, GitHub Actions infra errors).
+
+If classification is ambiguous, perform one manual diagnosis attempt before choosing rerun.
+
+Read `.codex/skills/babysit-pr/references/heuristics.md` for a concise checklist.
+
+## Review Comment Handling
+The watcher surfaces review items from:
+
+- PR issue comments
+- Inline review comments
+- Review submissions (COMMENT / APPROVED / CHANGES_REQUESTED)
+
+It intentionally surfaces Codex reviewer bot feedback (for example comments/reviews from `chatgpt-codex-connector[bot]`) in addition to human reviewer feedback. Most unrelated bot noise should still be ignored.
+For safety, the watcher only auto-surfaces trusted human review authors (for example repo OWNER/MEMBER/COLLABORATOR, plus the authenticated operator) and approved review bots such as Codex.
+On a fresh watcher state file, existing pending review feedback may be surfaced immediately (not only comments that arrive after monitoring starts). This is intentional so already-open review comments are not missed.
+
+When you agree with a comment and it is actionable:
+
+1. Patch code locally.
+2. Commit with `codex: address PR review feedback (#<n>)`.
+3. Push to the PR head branch.
+4. Resume watching on the new SHA immediately (do not stop after reporting the push).
+5. If monitoring was running in `--watch` mode, restart `--watch` immediately after the push in the same turn; do not wait for the user to ask again.
+
+If you disagree or the comment is non-actionable/already addressed, record it as handled by continuing the watcher loop (the script de-duplicates surfaced items via state after surfacing them).
+If a code review comment/thread is already marked as resolved in GitHub, treat it as non-actionable and safely ignore it unless new unresolved follow-up feedback appears.
+
+## Git Safety Rules
+
+- Work only on the PR head branch.
+- Avoid destructive git commands.
+- Do not switch branches unless necessary to recover context.
+- Before editing, check for unrelated uncommitted changes. If present, stop and ask the user.
+- After each successful fix, commit and `git push`, then re-run the watcher.
+- If you interrupted a live `--watch` session to make the fix, restart `--watch` immediately after the push in the same turn.
+- Do not run multiple concurrent `--watch` processes for the same PR/state file; keep one watcher session active and reuse it until it stops or you intentionally restart it.
+- A push is not a terminal outcome; continue the monitoring loop unless a strict stop condition is met.
+
+Commit message defaults:
+
+- `codex: fix CI failure on PR #<n>`
+- `codex: address PR review feedback (#<n>)`
+
+## Monitoring Loop Pattern
+Use this loop in a live Codex session:
+
+1. Run `--once`.
+2. Read `actions`.
+3. First check whether the PR is now merged or otherwise closed; if so, report that terminal state and stop polling immediately.
+4. Check CI summary, new review items, and mergeability/conflict status.
+5. Diagnose CI failures and classify branch-related vs flaky/unrelated.
+6. Process actionable review comments before flaky reruns when both are present; if a review fix requires a commit, push it and skip rerunning failed checks on the old SHA.
+7. Retry failed checks only when `retry_failed_checks` is present and you are not about to replace the current SHA with a review/CI fix commit.
+8. If you pushed a commit or triggered a rerun, report the action briefly and continue polling (do not stop).
+9. After a review-fix push, proactively restart continuous monitoring (`--watch`) in the same turn unless a strict stop condition has already been reached.
+10. If everything is passing, mergeable, not blocked on required review approval, and there are no unaddressed review items, report success and stop.
+11. If blocked on a user-help-required issue (infra outage, exhausted flaky retries, unclear reviewer request, permissions), report the blocker and stop.
+12. Otherwise sleep according to the polling cadence below and repeat.
+
+When the user explicitly asks to monitor/watch/babysit a PR, prefer `--watch` so polling continues autonomously in one command. Use repeated `--once` snapshots only for debugging, local testing, or when the user explicitly asks for a one-shot check.
+Do not stop to ask the user whether to continue polling; continue autonomously until a strict stop condition is met or the user explicitly interrupts.
+Do not hand control back to the user after a review-fix push just because a new SHA was created; restarting the watcher and re-entering the poll loop is part of the same babysitting task.
+If a `--watch` process is still running and no strict stop condition has been reached, the babysitting task is still in progress; keep streaming/consuming watcher output instead of ending the turn.
+
+## Polling Cadence
+Use adaptive polling and continue monitoring even after CI turns green:
+
+- While CI is not green (pending/running/queued or failing): poll every 1 minute.
+- After CI turns green: start polling every 1 minute, then back off exponentially while nothing changes (for example 1m, 2m, 4m, 8m, 16m, 32m), capped at 1 hour.
+- Reset the green-state polling interval back to 1 minute whenever anything changes (new commit/SHA, check status changes, new review comments, mergeability changes, review decision changes).
+- If CI stops being green again (new commit, rerun, or regression): return to 1-minute polling.
+- If any poll shows the PR is merged or otherwise closed: stop polling immediately and report the terminal state.
+
+## Stop Conditions (Strict)
+Stop only when one of the following is true:
+
+- PR merged or closed (stop as soon as a poll/snapshot confirms this).
+- PR is ready to merge: CI succeeded, no surfaced unaddressed review comments, not blocked on required review approval, and no merge conflict risk.
+- User intervention is required and Codex cannot safely proceed alone.
+
+Keep polling when:
+
+- `actions` contains only `idle` but checks are still pending.
+- CI is still running/queued.
+- Review state is quiet but CI is not terminal.
+- CI is green but mergeability is unknown/pending.
+- CI is green and mergeable, but the PR is still open and you are waiting for possible new review comments or merge-conflict changes per the green-state cadence.
+- The PR is green but blocked on review approval (`REVIEW_REQUIRED` / similar); continue polling on the green-state cadence and surface any new review comments without asking for confirmation to keep watching.
+
+## Output Expectations
+Provide concise progress updates while monitoring:
+
+- During long unchanged monitoring periods, avoid emitting a full update on every poll; summarize only status changes plus occasional heartbeat updates.
+- Treat push confirmations, intermediate CI snapshots, and review-action updates as progress updates only; do not emit the final summary or end the babysitting session unless a strict stop condition is met. This also applies while a watcher terminal is still running: send the final summary only after the watcher has emitted or confirmed a strict stop condition.
+- A user request to "monitor" is not satisfied by a couple of sample polls; remain in the loop until a strict stop condition or an explicit user interruption.
+- A review-fix commit + push is not a completion event; immediately resume live monitoring (`--watch`) in the same turn and continue reporting progress updates.
+- When CI first transitions to all green for the current SHA, emit a one-time celebratory progress update (do not repeat it on every green poll). Preferred style: `🚀 CI is all green! 33/33 passed. Still on watch for review approval.`
+
+Once a strict stop condition is met, send a final summary that includes:
+- Final PR SHA
+- CI status summary
+- Mergeability / conflict status
+- Fixes pushed
+- Flaky retry cycles used
+- Remaining unresolved failures or review comments
+
+## References
+
+- Heuristics and decision tree: `.codex/skills/babysit-pr/references/heuristics.md`
+- GitHub CLI/API details used by the watcher: `.codex/skills/babysit-pr/references/github-api-notes.md`
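The Polling Cadence rules in SKILL.md reduce to a small pure function. The sketch below is a hedged illustration, not the watcher's code: `next_poll_seconds` and `BASE_POLL_SECONDS` are hypothetical names, while the one-hour cap reuses the value of `GREEN_STATE_MAX_POLL_SECONDS` defined in `gh_pr_watch.py`. It assumes the caller already knows whether CI is green and whether anything changed since the previous poll.

```python
# BASE_POLL_SECONDS and next_poll_seconds are hypothetical names; the cap
# reuses the value of GREEN_STATE_MAX_POLL_SECONDS from gh_pr_watch.py.
BASE_POLL_SECONDS = 60
GREEN_STATE_MAX_POLL_SECONDS = 60 * 60


def next_poll_seconds(current: int, ci_green: bool, changed: bool) -> int:
    """Return the delay before the next poll under the SKILL.md cadence."""
    if not ci_green or changed:
        # Non-green CI, or any change (new SHA, check status, review,
        # mergeability), resets polling to 1 minute.
        return BASE_POLL_SECONDS
    # Green and unchanged: double the previous interval, capped at one hour.
    return min(max(current, BASE_POLL_SECONDS) * 2, GREEN_STATE_MAX_POLL_SECONDS)
```

Treating "CI just turned green" as a change is what makes the green-state sequence start at 1 minute and then run 2m, 4m, ... up to the 1-hour cap; any later change drops it back to 1 minute.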
--- a/.codex/skills/babysit-pr/agents/openai.yaml
+++ b/.codex/skills/babysit-pr/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "PR Babysitter"
+  short_description: "Watch PR CI, reviews, and merge conflicts"
+  default_prompt: "Babysit the current PR: monitor CI, reviewer comments, and merge-conflict status (prefer the watcher’s --watch mode for live monitoring); fix valid issues, push updates, and rerun flaky failures up to 3 times. Keep exactly one watcher session active for the PR (do not leave duplicate --watch terminals running). If you pause monitoring to patch review/CI feedback, restart --watch yourself immediately after the push in the same turn. If a watcher is still running and no strict stop condition has been reached, the task is still in progress: keep consuming watcher output and sending progress updates instead of ending the turn. Continue polling autonomously after any push/rerun until a strict terminal stop condition is reached or the user interrupts."
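The skill asks Codex to act on each snapshot's `actions` list and to handle review feedback before flaky reruns. A minimal sketch of that prioritization over `--watch` JSONL output follows; the action names come from SKILL.md, but `iter_snapshots`, `prioritize_actions`, and the exact ranking are assumptions, and it assumes each JSONL line is one JSON object with an `actions` array.

```python
import json
from typing import Iterable, Iterator, List

# Action names are the ones SKILL.md documents; the ranking is an assumption
# encoding "stop first, review feedback before CI work, reruns last".
ACTION_PRIORITY = [
    "stop_pr_closed",
    "process_review_comment",
    "diagnose_ci_failure",
    "retry_failed_checks",
    "idle",
]


def iter_snapshots(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL watcher output (one JSON object per line), skipping blanks."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def prioritize_actions(actions: Iterable[str], pushing_review_fix: bool = False) -> List[str]:
    """Order one snapshot's actions; drop reruns when a review fix will replace the SHA."""
    rank = {name: i for i, name in enumerate(ACTION_PRIORITY)}
    # Unknown names sort after known ones, alphabetically among themselves.
    ordered = sorted(set(actions), key=lambda name: (rank.get(name, len(rank)), name))
    if pushing_review_fix:
        # A new commit retriggers CI, so rerunning failed checks on the old SHA is wasted.
        ordered = [name for name in ordered if name != "retry_failed_checks"]
    return ordered
```

Passing `pushing_review_fix=True` models step 6 of the monitoring loop: once a review fix is going to create a new SHA, rerunning failed checks on the old SHA is skipped.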
--- a/.codex/skills/babysit-pr/references/github-api-notes.md
+++ b/.codex/skills/babysit-pr/references/github-api-notes.md
@@ -0,0 +1,72 @@
+# GitHub CLI / API Notes For `babysit-pr`
+
+## Primary commands used
+
+### PR metadata
+
+- `gh pr view --json number,url,state,mergedAt,closedAt,headRefName,headRefOid,headRepository,headRepositoryOwner,mergeable,mergeStateStatus,reviewDecision`
+
+Used to resolve PR number, URL, branch, head SHA, closed/merged state, mergeability, merge state status, and review decision.
+
+### PR checks summary
+
+- `gh pr checks --json name,state,bucket,link,workflow,event,startedAt,completedAt`
+
+Used to compute pending/failed/passed counts and whether the current CI round is terminal.
+
+### Workflow runs for head SHA
+
+- `gh api repos/{owner}/{repo}/actions/runs -X GET -f head_sha=<sha> -f per_page=100`
+
+Used to discover failed workflow runs and rerunnable run IDs.
+
+### Failed log inspection
+
+- `gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha`
+- `gh run view <run-id> --log-failed`
+
+Used by Codex to classify branch-related vs flaky/unrelated failures.
+
+### Retry failed jobs only
+
+- `gh run rerun <run-id> --failed`
+
+Reruns only failed jobs (and dependencies) for a workflow run.
+
+## Review-related endpoints
+
+- Issue comments on PR:
+  - `gh api repos/{owner}/{repo}/issues/<pr_number>/comments?per_page=100`
+- Inline PR review comments:
+  - `gh api repos/{owner}/{repo}/pulls/<pr_number>/comments?per_page=100`
+- Review submissions:
+  - `gh api repos/{owner}/{repo}/pulls/<pr_number>/reviews?per_page=100`
+
+## JSON fields consumed by the watcher
+
+### `gh pr view`
+
+- `number`
+- `url`
+- `state`
+- `mergedAt`
+- `closedAt`
+- `headRefName`
+- `headRefOid`
+
+### `gh pr checks`
+
+- `bucket` (`pass`, `fail`, `pending`, `skipping`, `cancel`)
+- `state`
+- `name`
+- `workflow`
+- `link`
+
+### Actions runs API (`workflow_runs[]`)
+
+- `id`
+- `name`
+- `status`
+- `conclusion`
+- `html_url`
+- `head_sha`
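Items from all three review endpoints carry `id`, `user.login`, and `author_association`, which is enough to express the trust rule described in SKILL.md. The sketch below assumes that rule (trusted author associations, the authenticated operator, and bot logins matching a keyword such as `codex`). The watcher's own normalization code is cut off in this view, so `is_trusted_review_item` and `unseen_trusted` are hypothetical helpers, not its implementation.

```python
from typing import Iterable, List, Set

# Mirrors the constants of the same names in gh_pr_watch.py.
TRUSTED_AUTHOR_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}
REVIEW_BOT_LOGIN_KEYWORDS = {"codex"}


def is_trusted_review_item(item: dict, operator_login: str) -> bool:
    """Apply the assumed trust rule to one issue comment, review comment, or review."""
    login = str((item.get("user") or {}).get("login") or "")
    if not login:
        return False
    if login.lower() == operator_login.lower():
        return True  # the authenticated operator is always trusted
    if login.lower().endswith("[bot]"):
        # Bots are surfaced only when their login matches an approved keyword.
        return any(keyword in login.lower() for keyword in REVIEW_BOT_LOGIN_KEYWORDS)
    association = str(item.get("author_association") or "").upper()
    return association in TRUSTED_AUTHOR_ASSOCIATIONS


def unseen_trusted(items: Iterable[dict], seen_ids: Set[int], operator_login: str) -> List[dict]:
    """Return trusted items not seen before, recording their ids in seen_ids."""
    fresh = []
    for item in items:
        item_id = item.get("id")
        if item_id in seen_ids or not is_trusted_review_item(item, operator_login):
            continue
        seen_ids.add(item_id)
        fresh.append(item)
    return fresh
```

Keeping the seen-id set across polls is what lets existing feedback surface once on a fresh state file and then stay quiet on later snapshots.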
--- a/.codex/skills/babysit-pr/references/heuristics.md
+++ b/.codex/skills/babysit-pr/references/heuristics.md
@@ -0,0 +1,58 @@
+# CI / Review Heuristics
+
+## CI classification checklist
+
+Treat as **branch-related** when logs clearly indicate a regression caused by the PR branch:
+
+- Compile/typecheck/lint failures in files or modules touched by the branch
+- Deterministic unit/integration test failures in changed areas
+- Snapshot output changes caused by UI/text changes in the branch
+- Static analysis violations introduced by the latest push
+- Build script/config changes in the PR causing a deterministic failure
+
+Treat as **likely flaky or unrelated** when evidence points to transient or external issues:
+
+- DNS/network/registry timeout errors while fetching dependencies
+- Runner image provisioning or startup failures
+- GitHub Actions infrastructure/service outages
+- Cloud/service rate limits or transient API outages
+- Non-deterministic failures in unrelated integration tests with known flake patterns
+
+If uncertain, inspect failed logs once before choosing rerun.
+
+## Decision tree (fix vs rerun vs stop)
+
+1. If PR is merged/closed: stop.
+2. If there are failed checks:
+   - Diagnose first.
+   - If branch-related: fix locally, commit, push.
+   - If likely flaky/unrelated and all checks for the current SHA are terminal: rerun failed jobs.
+   - If checks are still pending: wait.
+3. If flaky reruns for the same SHA reach the configured limit (default 3): stop and report persistent failure.
+4. Independently, process any new review comments the watcher surfaces (trusted human reviewers and approved review bots such as Codex).
+
+## Review comment agreement criteria
+
+Address the comment when:
+
+- The comment is technically correct.
+- The change is actionable in the current branch.
+- The requested change does not conflict with the user’s intent or recent guidance.
+- The change can be made safely without unrelated refactors.
+
+Do not auto-fix when:
+
+- The comment is ambiguous and needs clarification.
+- The request conflicts with explicit user instructions.
+- The proposed change requires product/design decisions the user has not made.
+- The codebase is in a dirty/unrelated state that makes safe editing uncertain.
+
+## Stop-and-ask conditions
+
+Stop and ask the user instead of continuing automatically when:
+
+- The local worktree has unrelated uncommitted changes.
+- `gh` auth/permissions fail.
+- The PR branch cannot be pushed.
+- CI failures persist after the flaky retry budget.
+- Reviewer feedback requires a product decision or cross-team coordination.
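The fix/rerun/stop decision tree in `heuristics.md` maps onto one function once failure classification has been done separately. This is a sketch under assumptions: `CiState`, `decide`, and the returned labels are hypothetical; classification is passed in as `"branch"`, `"flaky"`, or `"unknown"`; and review-comment processing stays independent, as the checklist says.

```python
from dataclasses import dataclass


@dataclass
class CiState:
    pr_closed: bool
    failed_checks: int
    pending_checks: int
    classification: str  # "branch", "flaky", or "unknown", decided from the logs
    retries_used: int  # flaky rerun cycles already spent on this head SHA
    max_retries: int = 3  # matches the watcher's --max-flaky-retries default


def decide(state: CiState) -> str:
    """Return the next CI action per the checklist's decision tree."""
    if state.pr_closed:
        return "stop"
    if state.failed_checks == 0:
        return "wait" if state.pending_checks else "green"
    if state.classification == "branch":
        return "fix"  # fix locally, commit, push
    if state.classification != "flaky":
        return "diagnose"  # inspect failed logs once before choosing rerun
    if state.pending_checks:
        return "wait"  # rerun only once every check for the SHA is terminal
    if state.retries_used >= state.max_retries:
        return "stop_and_report"  # flaky retry budget exhausted
    return "rerun"
```

Checking pending checks before the retry budget means the outcome of the last allowed rerun is still observed before stopping and reporting a persistent failure.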
--- a/.codex/skills/babysit-pr/scripts/gh_pr_watch.py
+++ b/.codex/skills/babysit-pr/scripts/gh_pr_watch.py
@@ -0,0 +1,805 @@
+#!/usr/bin/env python3
+"""Watch GitHub PR CI and review activity for Codex PR babysitting workflows."""
+
+import argparse
+import json
+import os
+import re
+import subprocess
+import sys
+import tempfile
+import time
+from pathlib import Path
+from urllib.parse import urlparse
+
+FAILED_RUN_CONCLUSIONS = {
+    "failure",
+    "timed_out",
+    "cancelled",
+    "action_required",
+    "startup_failure",
+    "stale",
+}
+PENDING_CHECK_STATES = {
+    "QUEUED",
+    "IN_PROGRESS",
+    "PENDING",
+    "WAITING",
+    "REQUESTED",
+}
+REVIEW_BOT_LOGIN_KEYWORDS = {
+    "codex",
+}
+TRUSTED_AUTHOR_ASSOCIATIONS = {
+    "OWNER",
+    "MEMBER",
+    "COLLABORATOR",
+}
+MERGE_BLOCKING_REVIEW_DECISIONS = {
+    "REVIEW_REQUIRED",
+    "CHANGES_REQUESTED",
+}
+MERGE_CONFLICT_OR_BLOCKING_STATES = {
+    "BLOCKED",
+    "DIRTY",
+    "DRAFT",
+    "UNKNOWN",
+}
+GREEN_STATE_MAX_POLL_SECONDS = 60 * 60
+
+
+class GhCommandError(RuntimeError):
+    pass
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description=(
+            "Normalize PR/CI/review state for Codex PR babysitting and optionally "
+            "trigger flaky reruns."
+        )
+    )
+    parser.add_argument("--pr", default="auto", help="auto, PR number, or PR URL")
+    parser.add_argument("--repo", help="Optional OWNER/REPO override")
+    parser.add_argument("--poll-seconds", type=int, default=30, help="Watch poll interval")
+    parser.add_argument(
+        "--max-flaky-retries",
+        type=int,
+        default=3,
+        help="Max rerun cycles per head SHA before stop recommendation",
+    )
+    parser.add_argument("--state-file", help="Path to state JSON file")
+    parser.add_argument("--once", action="store_true", help="Emit one snapshot and exit")
+    parser.add_argument("--watch", action="store_true", help="Continuously emit JSONL snapshots")
+    parser.add_argument(
+        "--retry-failed-now",
+        action="store_true",
+        help="Rerun failed jobs for current failed workflow runs when policy allows",
+    )
+    parser.add_argument(
+        "--json",
+        action="store_true",
+        help="Emit machine-readable output (default behavior for --once and --retry-failed-now)",
+    )
+    args = parser.parse_args()
+
+    if args.poll_seconds <= 0:
+        parser.error("--poll-seconds must be > 0")
+    if args.max_flaky_retries < 0:
+        parser.error("--max-flaky-retries must be >= 0")
+    if args.watch and args.retry_failed_now:
+        parser.error("--watch cannot be combined with --retry-failed-now")
+    if not args.once and not args.watch and not args.retry_failed_now:
+        args.once = True
+    return args
+
+
+def _format_gh_error(cmd, err):
+    stdout = (err.stdout or "").strip()
+    stderr = (err.stderr or "").strip()
+    parts = [f"GitHub CLI command failed: {' '.join(cmd)}"]
+    if stdout:
+        parts.append(f"stdout: {stdout}")
+    if stderr:
+        parts.append(f"stderr: {stderr}")
+    return "\n".join(parts)
+
+
+def gh_text(args, repo=None):
+    cmd = ["gh"]
+    # `gh api` does not accept `-R/--repo` on all gh versions. The watcher's
+    # API calls use explicit endpoints (e.g. repos/{owner}/{repo}/...), so the
+    # repo flag is unnecessary there.
+    if repo and (not args or args[0] != "api"):
+        cmd.extend(["-R", repo])
+    cmd.extend(args)
+    try:
+        proc = subprocess.run(cmd, check=True, capture_output=True, text=True)
+    except FileNotFoundError as err:
+        raise GhCommandError("`gh` command not found") from err
+    except subprocess.CalledProcessError as err:
+        raise GhCommandError(_format_gh_error(cmd, err)) from err
+    return proc.stdout
+
+
+def gh_json(args, repo=None):
+    raw = gh_text(args, repo=repo).strip()
+    if not raw:
+        return None
+    try:
+        return json.loads(raw)
+    except json.JSONDecodeError as err:
+        raise GhCommandError(f"Failed to parse JSON from gh output for {' '.join(args)}") from err
+
+
+def parse_pr_spec(pr_spec):
+    if pr_spec == "auto":
+        return {"mode": "auto", "value": None}
+    if re.fullmatch(r"\d+", pr_spec):
+        return {"mode": "number", "value": pr_spec}
+    parsed = urlparse(pr_spec)
+    if parsed.scheme and parsed.netloc and "/pull/" in parsed.path:
+        return {"mode": "url", "value": pr_spec}
+    raise ValueError("--pr must be 'auto', a PR number, or a PR URL")
+
+
+def pr_view_fields():
+    return (
+        "number,url,state,mergedAt,closedAt,headRefName,headRefOid,"
+        "headRepository,headRepositoryOwner,mergeable,mergeStateStatus,reviewDecision"
+    )
+
+
+def checks_fields():
+    return "name,state,bucket,link,workflow,event,startedAt,completedAt"
+
+
+def resolve_pr(pr_spec, repo_override=None):
+    parsed = parse_pr_spec(pr_spec)
+    cmd = ["pr", "view"]
+    if parsed["value"] is not None:
+        cmd.append(parsed["value"])
+    cmd.extend(["--json", pr_view_fields()])
+    data = gh_json(cmd, repo=repo_override)
+    if not isinstance(data, dict):
+        raise GhCommandError("Unexpected PR payload from `gh pr view`")
+
+    pr_url = str(data.get("url") or "")
+    repo = (
+        repo_override
+        or extract_repo_from_pr_url(pr_url)
+        or extract_repo_from_pr_view(data)
+    )
+    if not repo:
+        raise GhCommandError("Unable to determine OWNER/REPO for the PR")
+
+    state = str(data.get("state") or "")
+    merged = bool(data.get("mergedAt"))
+    closed = bool(data.get("closedAt")) or state.upper() == "CLOSED"
+
+    return {
+        "number": int(data["number"]),
+        "url": pr_url,
+        "repo": repo,
+        "head_sha": str(data.get("headRefOid") or ""),
+        "head_branch": str(data.get("headRefName") or ""),
+        "state": state,
+        "merged": merged,
+        "closed": closed,
+        "mergeable": str(data.get("mergeable") or ""),
+        "merge_state_status": str(data.get("mergeStateStatus") or ""),
+        "review_decision": str(data.get("reviewDecision") or ""),
+    }
+
+
+def extract_repo_from_pr_view(data):
+    head_repo = data.get("headRepository")
+    head_owner = data.get("headRepositoryOwner")
+    owner = None
+    name = None
+    if isinstance(head_owner, dict):
+        owner = head_owner.get("login") or head_owner.get("name")
+    elif isinstance(head_owner, str):
+        owner = head_owner
+    if isinstance(head_repo, dict):
+        name = head_repo.get("name")
+        repo_owner = head_repo.get("owner")
+        if not owner and isinstance(repo_owner, dict):
+            owner = repo_owner.get("login") or repo_owner.get("name")
+    elif isinstance(head_repo, str):
+        name = head_repo
+    if owner and name:
+        return f"{owner}/{name}"
+    return None
+
+
+def extract_repo_from_pr_url(pr_url):
+    parsed = urlparse(pr_url)
+    parts = [p for p in parsed.path.split("/") if p]
+    if len(parts) >= 4 and parts[2] == "pull":
+        return f"{parts[0]}/{parts[1]}"
+    return None
+
+
+def load_state(path):
+    if path.exists():
+        try:
+            data = json.loads(path.read_text())
+        except json.JSONDecodeError as err:
+            raise RuntimeError(f"State file is not valid JSON: {path}") from err
+        if not isinstance(data, dict):
+            raise RuntimeError(f"State file must contain an object: {path}")
+        return data, False
+    return {
+        "pr": {},
+        "started_at": None,
+        "last_seen_head_sha": None,
+        "retries_by_sha": {},
+        "seen_issue_comment_ids": [],
+        "seen_review_comment_ids": [],
+        "seen_review_ids": [],
+        "last_snapshot_at": None,
+    }, True
+
+
+def save_state(path, state):
+    path.parent.mkdir(parents=True, exist_ok=True)
+    payload = json.dumps(state, indent=2, sort_keys=True) + "\n"
+    fd, tmp_name = tempfile.mkstemp(prefix=f"{path.name}.", suffix=".tmp", dir=path.parent)
+    tmp_path = Path(tmp_name)
+    try:
+        with os.fdopen(fd, "w", encoding="utf-8") as tmp_file:
+            tmp_file.write(payload)
+        os.replace(tmp_path, path)
+    except Exception:
+        try:
+            tmp_path.unlink(missing_ok=True)
+        except OSError:
+            pass
+        raise
+
+
+def default_state_file_for(pr):
+    repo_slug = pr["repo"].replace("/", "-")
+    return Path(f"/tmp/codex-babysit-pr-{repo_slug}-pr{pr['number']}.json")
+
+
+def get_pr_checks(pr_spec, repo):
+    parsed = parse_pr_spec(pr_spec)
+    cmd = ["pr", "checks"]
+    if parsed["value"] is not None:
+        cmd.append(parsed["value"])
+    cmd.extend(["--json", checks_fields()])
+    data = gh_json(cmd, repo=repo)
+    if data is None:
+        return []
+    if not isinstance(data, list):
+        raise GhCommandError("Unexpected payload from `gh pr checks`")
+    return data
+
+
+def is_pending_check(check):
+    bucket = str(check.get("bucket") or "").lower()
+    state = str(check.get("state") or "").upper()
+    return bucket == "pending" or state in PENDING_CHECK_STATES
+
+
+def summarize_checks(checks):
+    pending_count = 0
+    failed_count = 0
+    passed_count = 0
+    for check in checks:
+        bucket = str(check.get("bucket") or "").lower()
+        if is_pending_check(check):
+            pending_count += 1
+        if bucket == "fail":
+            failed_count += 1
+        if bucket == "pass":
+            passed_count += 1
+    return {
+        "pending_count": pending_count,
+        "failed_count": failed_count,
+        "passed_count": passed_count,
+        "all_terminal": pending_count == 0,
+    }
+
+
+def get_workflow_runs_for_sha(repo, head_sha):
+    endpoint = f"repos/{repo}/actions/runs"
+    data = gh_json(
+        ["api", endpoint, "-X", "GET", "-f", f"head_sha={head_sha}", "-f", "per_page=100"],
+        repo=repo,
+    )
+    if not isinstance(data, dict):
+        raise GhCommandError("Unexpected payload from actions runs API")
+    runs = data.get("workflow_runs") or []
+    if not isinstance(runs, list):
+        raise GhCommandError("Expected `workflow_runs` to be a list")
+    return runs
+
+
+def failed_runs_from_workflow_runs(runs, head_sha):
+    failed_runs = []
+    for run in runs:
+        if not isinstance(run, dict):
+            continue
+        if str(run.get("head_sha") or "") != head_sha:
+            continue
+        conclusion = str(run.get("conclusion") or "")
+        if conclusion not in FAILED_RUN_CONCLUSIONS:
+            continue
+        failed_runs.append(
+            {
+                "run_id": run.get("id"),
+                "workflow_name": run.get("name") or run.get("display_title") or "",
+                "status": str(run.get("status") or ""),
+                "conclusion": conclusion,
+                "html_url": str(run.get("html_url") or ""),
+            }
+        )
+    failed_runs.sort(key=lambda item: (str(item.get("workflow_name") or ""), str(item.get("run_id") or "")))
+    return failed_runs
+
+
+def get_authenticated_login():
+    data = gh_json(["api", "user"])
+    if not isinstance(data, dict) or not data.get("login"):
+        raise GhCommandError("Unable to determine authenticated GitHub login from `gh api user`")
+    return str(data["login"])
+
+
+def comment_endpoints(repo, pr_number):
+    return {
+        "issue_comment": f"repos/{repo}/issues/{pr_number}/comments",
+        "review_comment": f"repos/{repo}/pulls/{pr_number}/comments",
+        "review": f"repos/{repo}/pulls/{pr_number}/reviews",
+    }
+
+
+def gh_api_list_paginated(endpoint, repo=None, per_page=100):
+    items = []
+    page = 1
+    while True:
+        sep = "&" if "?" in endpoint else "?"
+        page_endpoint = f"{endpoint}{sep}per_page={per_page}&page={page}"
+        payload = gh_json(["api", page_endpoint], repo=repo)
+        if payload is None:
+            break
+        if not isinstance(payload, list):
+            raise GhCommandError(f"Unexpected paginated payload from gh api {endpoint}")
+        items.extend(payload)
+        if len(payload) < per_page:
+            break
+        page += 1
+    return items
+
+
+def normalize_issue_comments(items):
+    out = []
+    for item in items:
+        if not isinstance(item, dict):
+            continue
+        out.append(
+            {
+                "kind": "issue_comment",
+                "id": str(item.get("id") or ""),
+                "author": extract_login(item.get("user")),
+                "author_association": str(item.get("author_association") or ""),
+                "created_at": str(item.get("created_at") or ""),
+                "body": str(item.get("body") or ""),
+                "path": None,
+                "line": None,
+                "url": str(item.get("html_url") or ""),
+            }
+        )
+    return out
+
+
+def normalize_review_comments(items):
+    out = []
+    for item in items:
+        if not isinstance(item, dict):
+            continue
+        line = item.get("line")
+        if line is None:
+            line = item.get("original_line")
+        out.append(
+            {
+                "kind": "review_comment",
+                "id": str(item.get("id") or ""),
+                "author": extract_login(item.get("user")),
+                "author_association": str(item.get("author_association") or ""),
+                "created_at": str(item.get("created_at") or ""),
+                "body": str(item.get("body") or ""),
+                "path": item.get("path"),
+                "line": line,
+                "url": str(item.get("html_url") or ""),
+            }
+        )
+    return out
+
+
+def normalize_reviews(items):
+    out = []
+    for item in items:
+        if not isinstance(item, dict):
+            continue
+        out.append(
+            {
+                "kind": "review",
+                "id": str(item.get("id") or ""),
+                "author": extract_login(item.get("user")),
+                "author_association": str(item.get("author_association") or ""),
+                "created_at": str(item.get("submitted_at") or item.get("created_at") or ""),
+                "body": str(item.get("body") or ""),
+                "path": None,
+                "line": None,
+                "url": str(item.get("html_url") or ""),
+            }
+        )
+    return out
+
+
+def extract_login(user_obj):
+    if isinstance(user_obj, dict):
+        return str(user_obj.get("login") or "")
+    return ""
+
+
+def is_bot_login(login):
+    return bool(login) and login.endswith("[bot]")
+
+
+def is_actionable_review_bot_login(login):
+    if not is_bot_login(login):
+        return False
+    lower_login = login.lower()
+    return any(keyword in lower_login for keyword in REVIEW_BOT_LOGIN_KEYWORDS)
+
+
+def is_trusted_human_review_author(item, authenticated_login):
+    author = str(item.get("author") or "")
+    if not author:
+        return False
+    if authenticated_login and author == authenticated_login:
+        return True
+    association = str(item.get("author_association") or "").upper()
+    return association in TRUSTED_AUTHOR_ASSOCIATIONS
+
+
+def fetch_new_review_items(pr, state, fresh_state, authenticated_login=None):
+    repo = pr["repo"]
+    pr_number = pr["number"]
+    endpoints = comment_endpoints(repo, pr_number)
+
+    issue_payload = gh_api_list_paginated(endpoints["issue_comment"], repo=repo)
+    review_comment_payload = gh_api_list_paginated(endpoints["review_comment"], repo=repo)
+    review_payload = gh_api_list_paginated(endpoints["review"], repo=repo)
+
+    issue_items = normalize_issue_comments(issue_payload)
+    review_comment_items = normalize_review_comments(review_comment_payload)
+    review_items = normalize_reviews(review_payload)
+    all_items = issue_items + review_comment_items + review_items
+
+    seen_issue = {str(x) for x in state.get("seen_issue_comment_ids") or []}
+    seen_review_comment = {str(x) for x in state.get("seen_review_comment_ids") or []}
+    seen_review = {str(x) for x in state.get("seen_review_ids") or []}
+
+    # Do not pre-seed the seen sets when `fresh_state` is true: on a brand-new
+    # state file, surface existing review activity instead of marking it seen,
+    # so feedback posted before monitoring started is not missed.
+
+    new_items = []
+    for item in all_items:
+        item_id = item.get("id")
+        if not item_id:
+            continue
+        author = item.get("author") or ""
+        if not author:
+            continue
+        if is_bot_login(author):
+            if not is_actionable_review_bot_login(author):
+                continue
+        elif not is_trusted_human_review_author(item, authenticated_login):
+            continue
+
+        kind = item["kind"]
+        if kind == "issue_comment" and item_id in seen_issue:
+            continue
+        if kind == "review_comment" and item_id in seen_review_comment:
+            continue
+        if kind == "review" and item_id in seen_review:
+            continue
+
+        new_items.append(item)
+        if kind == "issue_comment":
+            seen_issue.add(item_id)
+        elif kind == "review_comment":
+            seen_review_comment.add(item_id)
+        elif kind == "review":
+            seen_review.add(item_id)
+
+    new_items.sort(key=lambda item: (item.get("created_at") or "", item.get("kind") or "", item.get("id") or ""))
+    state["seen_issue_comment_ids"] = sorted(seen_issue)
+    state["seen_review_comment_ids"] = sorted(seen_review_comment)
+    state["seen_review_ids"] = sorted(seen_review)
+    return new_items
+
+
+def current_retry_count(state, head_sha):
+    retries = state.get("retries_by_sha") or {}
+    value = retries.get(head_sha, 0)
+    try:
+        return int(value)
+    except (TypeError, ValueError):
+        return 0
+
+
+def set_retry_count(state, head_sha, count):
+    retries = state.get("retries_by_sha")
+    if not isinstance(retries, dict):
+        retries = {}
+    retries[head_sha] = int(count)
+    state["retries_by_sha"] = retries
+
+
+def unique_actions(actions):
+    out = []
+    seen = set()
+    for action in actions:
+        if action not in seen:
+            out.append(action)
+            seen.add(action)
+    return out
+
+
+def is_pr_ready_to_merge(pr, checks_summary, new_review_items):
+    if pr["closed"] or pr["merged"]:
+        return False
+    if not checks_summary["all_terminal"]:
+        return False
+    if checks_summary["failed_count"] > 0 or checks_summary["pending_count"] > 0:
+        return False
+    if new_review_items:
+        return False
+    if str(pr.get("mergeable") or "") != "MERGEABLE":
+        return False
+    if str(pr.get("merge_state_status") or "") in MERGE_CONFLICT_OR_BLOCKING_STATES:
+        return False
+    if str(pr.get("review_decision") or "") in MERGE_BLOCKING_REVIEW_DECISIONS:
+        return False
+    return True
+
+
+def recommend_actions(pr, checks_summary, failed_runs, new_review_items, retries_used, max_retries):
+    actions = []
+    if pr["closed"] or pr["merged"]:
+        if new_review_items:
+            actions.append("process_review_comment")
+        actions.append("stop_pr_closed")
+        return unique_actions(actions)
+
+    if is_pr_ready_to_merge(pr, checks_summary, new_review_items):
+        actions.append("stop_ready_to_merge")
+        return unique_actions(actions)
+
+    if new_review_items:
+        actions.append("process_review_comment")
+
+    has_failed_pr_checks = checks_summary["failed_count"] > 0
+    if has_failed_pr_checks:
+        if checks_summary["all_terminal"] and retries_used >= max_retries:
+            actions.append("stop_exhausted_retries")
+        else:
+            actions.append("diagnose_ci_failure")
+            if checks_summary["all_terminal"] and failed_runs and retries_used < max_retries:
+                actions.append("retry_failed_checks")
+
+    if not actions:
+        actions.append("idle")
+    return unique_actions(actions)
+
+
+def collect_snapshot(args):
+    pr = resolve_pr(args.pr, repo_override=args.repo)
+    state_path = Path(args.state_file) if args.state_file else default_state_file_for(pr)
+    state, fresh_state = load_state(state_path)
+
+    if not state.get("started_at"):
+        state["started_at"] = int(time.time())
+
+    # `gh pr checks -R <repo>` requires an explicit PR/branch/url argument.
+    # After resolving `--pr auto`, reuse the concrete PR number.
+    checks = get_pr_checks(str(pr["number"]), repo=pr["repo"])
+    checks_summary = summarize_checks(checks)
+    workflow_runs = get_workflow_runs_for_sha(pr["repo"], pr["head_sha"])
+    failed_runs = failed_runs_from_workflow_runs(workflow_runs, pr["head_sha"])
+    authenticated_login = get_authenticated_login()
+    new_review_items = fetch_new_review_items(
+        pr,
+        state,
+        fresh_state=fresh_state,
+        authenticated_login=authenticated_login,
+    )
+
+    retries_used = current_retry_count(state, pr["head_sha"])
+    actions = recommend_actions(
+        pr,
+        checks_summary,
+        failed_runs,
+        new_review_items,
+        retries_used,
+        args.max_flaky_retries,
+    )
+
+    state["pr"] = {"repo": pr["repo"], "number": pr["number"]}
+    state["last_seen_head_sha"] = pr["head_sha"]
+    state["last_snapshot_at"] = int(time.time())
+    save_state(state_path, state)
+
+    snapshot = {
+        "pr": pr,
+        "checks": checks_summary,
+        "failed_runs": failed_runs,
+        "new_review_items": new_review_items,
+        "actions": actions,
+        "retry_state": {
+            "current_sha_retries_used": retries_used,
+            "max_flaky_retries": args.max_flaky_retries,
+        },
+    }
+    return snapshot, state_path
+
+
+def retry_failed_now(args):
+    snapshot, state_path = collect_snapshot(args)
+    pr = snapshot["pr"]
+    checks_summary = snapshot["checks"]
+    failed_runs = snapshot["failed_runs"]
+    retries_used = snapshot["retry_state"]["current_sha_retries_used"]
+    max_retries = snapshot["retry_state"]["max_flaky_retries"]
+
+    result = {
+        "snapshot": snapshot,
+        "state_file": str(state_path),
+        "rerun_attempted": False,
+        "rerun_count": 0,
+        "rerun_run_ids": [],
+        "reason": None,
+    }
+
+    if pr["closed"] or pr["merged"]:
+        result["reason"] = "pr_closed"
+        return result
+    if checks_summary["failed_count"] <= 0:
+        result["reason"] = "no_failed_pr_checks"
+        return result
+    if not failed_runs:
+        result["reason"] = "no_failed_runs"
+        return result
+    if not checks_summary["all_terminal"]:
+        result["reason"] = "checks_still_pending"
+        return result
+    if retries_used >= max_retries:
+        result["reason"] = "retry_budget_exhausted"
+        return result
+
+    for run in failed_runs:
+        run_id = run.get("run_id")
+        if run_id in (None, ""):
+            continue
+        gh_text(["run", "rerun", str(run_id), "--failed"], repo=pr["repo"])
+        result["rerun_run_ids"].append(run_id)
+
+    if result["rerun_run_ids"]:
+        state, _ = load_state(state_path)
+        new_count = current_retry_count(state, pr["head_sha"]) + 1
+        set_retry_count(state, pr["head_sha"], new_count)
+        state["last_snapshot_at"] = int(time.time())
+        save_state(state_path, state)
+        result["rerun_attempted"] = True
+        result["rerun_count"] = len(result["rerun_run_ids"])
+        result["reason"] = "rerun_triggered"
+    else:
+        result["reason"] = "failed_runs_missing_ids"
+
+    return result
+
+
+def print_json(obj):
+    sys.stdout.write(json.dumps(obj, sort_keys=True) + "\n")
+    sys.stdout.flush()
+
+
+def print_event(event, payload):
+    print_json({"event": event, "payload": payload})
+
+
+def is_ci_green(snapshot):
+    checks = snapshot.get("checks") or {}
+    return (
+        bool(checks.get("all_terminal"))
+        and int(checks.get("failed_count") or 0) == 0
+        and int(checks.get("pending_count") or 0) == 0
+    )
+
+
+def snapshot_change_key(snapshot):
+    pr = snapshot.get("pr") or {}
+    checks = snapshot.get("checks") or {}
+    review_items = snapshot.get("new_review_items") or []
+    return (
+        str(pr.get("head_sha") or ""),
+        str(pr.get("state") or ""),
+        str(pr.get("mergeable") or ""),
+        str(pr.get("merge_state_status") or ""),
+        str(pr.get("review_decision") or ""),
+        int(checks.get("passed_count") or 0),
+        int(checks.get("failed_count") or 0),
+        int(checks.get("pending_count") or 0),
+        tuple(
+            (str(item.get("kind") or ""), str(item.get("id") or ""))
+            for item in review_items
+            if isinstance(item, dict)
+        ),
+        tuple(snapshot.get("actions") or []),
+    )
+
+
+def run_watch(args):
+    poll_seconds = args.poll_seconds
+    last_change_key = None
+    while True:
+        snapshot, state_path = collect_snapshot(args)
+        print_event(
+            "snapshot",
+            {
+                "snapshot": snapshot,
+                "state_file": str(state_path),
+                "next_poll_seconds": poll_seconds,
+            },
+        )
+        actions = set(snapshot.get("actions") or [])
+        if (
+            "stop_pr_closed" in actions
+            or "stop_exhausted_retries" in actions
+            or "stop_ready_to_merge" in actions
+        ):
+            print_event("stop", {"actions": snapshot.get("actions"), "pr": snapshot.get("pr")})
+            return 0
+
+        current_change_key = snapshot_change_key(snapshot)
+        changed = current_change_key != last_change_key
+        green = is_ci_green(snapshot)
+
+        if not green or changed:
+            # Reset to the base interval whenever CI is not green or anything
+            # observable changed since the previous snapshot (always true on
+            # the first iteration, when `last_change_key` is None).
+            poll_seconds = args.poll_seconds
+        else:
+
+        last_change_key = current_change_key
+        time.sleep(poll_seconds)
+
+
+def main():
+    args = parse_args()
+    try:
+        if args.retry_failed_now:
+            print_json(retry_failed_now(args))
+            return 0
+        if args.watch:
+            return run_watch(args)
+        snapshot, state_path = collect_snapshot(args)
+        snapshot["state_file"] = str(state_path)
+        print_json(snapshot)
+        return 0
+    except (GhCommandError, RuntimeError, ValueError) as err:
+        sys.stderr.write(f"gh_pr_watch.py error: {err}\n")
+        return 1
+    except KeyboardInterrupt:
+        sys.stderr.write("gh_pr_watch.py interrupted\n")
+        return 130
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
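The green-state backoff in `run_watch` reduces to a small pure rule: reset to `--poll-seconds` when CI is not green or the snapshot changed, otherwise double the previous interval up to a cap. The sketch below models that rule in isolation. `next_poll_seconds` and `poll_schedule` are hypothetical helpers (not part of `gh_pr_watch.py`); the `cap` argument stands in for `GREEN_STATE_MAX_POLL_SECONDS`, whose value is not shown in this diff, and the 30/120 figures are illustrative only.

```python
def next_poll_seconds(base, current, green, changed, cap):
    # Reset to the base interval whenever CI is not green or the snapshot
    # changed; otherwise double the previous interval, capped at `cap`.
    if not green or changed:
        return base
    return min(current * 2, cap)


def poll_schedule(base, cap, observations):
    # Replay a sequence of (green, changed) observations and record the
    # sleep interval chosen after each one.
    schedule = []
    current = base
    for green, changed in observations:
        current = next_poll_seconds(base, current, green, changed, cap)
        schedule.append(current)
    return schedule


if __name__ == "__main__":
    # The first snapshot always counts as changed. Quiet green polls back off
    # to the cap; a new push (changed) resets to the base interval.
    observations = [(True, True), (True, False), (True, False), (True, False), (True, True)]
    print(poll_schedule(30, 120, observations))  # prints [30, 60, 120, 120, 30]
```

Keeping the rule side-effect free makes the cadence easy to reason about: a red or pending CI never backs off, so failures are noticed at the base rate.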
--- a/codex-rs/Cargo.lock
+++ b/codex-rs/Cargo.lock
@@ -1787,6 +1787,7 @@ dependencies = [
 "codex-protocol",
 "codex-shell-command",
 "codex-utils-cargo-bin",
+ "core_test_support",
 "exec_server_test_support",
 "libc",
 "maplit",
--- a/codex-rs/app-server/src/main.rs
+++ b/codex-rs/app-server/src/main.rs
@@ -23,10 +23,12 @@ struct AppServerArgs {
 }

 fn main() -> anyhow::Result<()> {
-    if codex_core::maybe_run_zsh_exec_wrapper_mode()? {
-        return Ok(());
-    }
    arg0_dispatch_or_else(|codex_linux_sandbox_exe| async move {
+        // Run wrapper mode only after arg0 dispatch so `codex-linux-sandbox`
+        // invocations don't get misclassified as zsh exec-wrapper calls.
+        if codex_core::maybe_run_zsh_exec_wrapper_mode()? {
+            return Ok(());
+        }
        let args = AppServerArgs::parse();
        let managed_config_path = managed_config_path_from_debug_env();
        let loader_overrides = LoaderOverrides {
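Both `main` functions now call `maybe_run_zsh_exec_wrapper_mode()` inside `arg0_dispatch_or_else(...)`, so a process launched with `argv[0]` set to `codex-linux-sandbox` (which is why `req.arg0` must be preserved) reaches the sandbox entry point before any wrapper-mode check. The sketch below models only the ordering. The wrapper predicate is hypothetical: it uses an invented inherited environment variable, `HYPOTHETICAL_ZSH_WRAPPER`, because the real detection logic in `codex_core` is not shown here.

```python
import os

SANDBOX_ARG0 = "codex-linux-sandbox"


def dispatch(argv, env, is_wrapper_call):
    # New ordering: route on the basename of argv[0] first, and only fall
    # through to zsh exec-wrapper detection when no arg0 helper matched.
    if os.path.basename(argv[0]) == SANDBOX_ARG0:
        return "linux_sandbox"
    if is_wrapper_call(argv, env):
        return "zsh_exec_wrapper"
    return "normal_main"


def dispatch_wrapper_first(argv, env, is_wrapper_call):
    # Previous ordering: wrapper detection runs before arg0 dispatch, so a
    # sandbox helper that matches the wrapper predicate never reaches it.
    if is_wrapper_call(argv, env):
        return "zsh_exec_wrapper"
    if os.path.basename(argv[0]) == SANDBOX_ARG0:
        return "linux_sandbox"
    return "normal_main"


def inherited_env_predicate(argv, env):
    # Hypothetical wrapper signal: an environment variable a child process
    # could inherit from its parent. Not the real codex_core check.
    return "HYPOTHETICAL_ZSH_WRAPPER" in env
```

With an inherited wrapper signal, `dispatch(["/tmp/bin/codex-linux-sandbox"], env, inherited_env_predicate)` returns `"linux_sandbox"`, while `dispatch_wrapper_first` with the same inputs misroutes the call to `"zsh_exec_wrapper"`.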
--- a/codex-rs/app-server/tests/suite/v2/turn_start_zsh_fork.rs
+++ b/codex-rs/app-server/tests/suite/v2/turn_start_zsh_fork.rs
@@ -2,18 +2,15 @@
 //
 // Running these tests with the patched zsh fork:
 //
-// The suite uses `CODEX_TEST_ZSH_PATH` when set. Example:
-//   CODEX_TEST_ZSH_PATH="$HOME/.local/codex-zsh-77045ef/bin/zsh" \
-//   cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture
-//
-// For a single test:
-//   CODEX_TEST_ZSH_PATH="$HOME/.local/codex-zsh-77045ef/bin/zsh" \
-//   cargo test -p codex-app-server turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2 -- --nocapture
+// The suite fetches the patched zsh described by the shared test-only DotSlash
+// file at `exec-server/tests/suite/zsh`, so `dotslash` and network access are
+// required the first time the artifact is downloaded.

 use anyhow::Result;
 use app_test_support::McpProcess;
 use app_test_support::create_final_assistant_message_sse_response;
 use app_test_support::create_mock_responses_server_sequence;
+use app_test_support::create_mock_responses_server_sequence_unchecked;
 use app_test_support::create_shell_command_sse_response;
 use app_test_support::to_response;
 use codex_app_server_protocol::CommandExecutionApprovalDecision;
@@ -38,6 +35,7 @@ use core_test_support::responses;
 use core_test_support::skip_if_no_network;
 use pretty_assertions::assert_eq;
 use std::collections::BTreeMap;
+use std::os::unix::fs::PermissionsExt;
 use std::path::Path;
 use tempfile::TempDir;
 use tokio::time::timeout;
@@ -57,7 +55,7 @@ async fn turn_start_shell_zsh_fork_executes_command_v2() -> Result<()> {
    let workspace = tmp.path().join("workspace");
    std::fs::create_dir(&workspace)?;

-    let Some(zsh_path) = find_test_zsh_path() else {
+    let Some(zsh_path) = find_test_zsh_path()? else {
        eprintln!("skipping zsh fork test: no zsh executable found");
        return Ok(());
    };
@@ -82,7 +80,7 @@ async fn turn_start_shell_zsh_fork_executes_command_v2() -> Result<()> {
        &zsh_path,
    )?;

-    let mut mcp = McpProcess::new(&codex_home).await?;
+    let mut mcp = create_zsh_test_mcp_process(&codex_home, &workspace).await?;
    timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;

    let start_id = mcp
@@ -167,7 +165,7 @@ async fn turn_start_shell_zsh_fork_exec_approval_decline_v2() -> Result<()> {
    let workspace = tmp.path().join("workspace");
    std::fs::create_dir(&workspace)?;

-    let Some(zsh_path) = find_test_zsh_path() else {
+    let Some(zsh_path) = find_test_zsh_path()? else {
        eprintln!("skipping zsh fork decline test: no zsh executable found");
        return Ok(());
    };
@@ -199,7 +197,7 @@ async fn turn_start_shell_zsh_fork_exec_approval_decline_v2() -> Result<()> {
        &zsh_path,
    )?;

-    let mut mcp = McpProcess::new(&codex_home).await?;
+    let mut mcp = create_zsh_test_mcp_process(&codex_home, &workspace).await?;
    timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;

    let start_id = mcp
@@ -303,7 +301,7 @@ async fn turn_start_shell_zsh_fork_exec_approval_cancel_v2() -> Result<()> {
    let workspace = tmp.path().join("workspace");
    std::fs::create_dir(&workspace)?;

-    let Some(zsh_path) = find_test_zsh_path() else {
+    let Some(zsh_path) = find_test_zsh_path()? else {
        eprintln!("skipping zsh fork cancel test: no zsh executable found");
        return Ok(());
    };
@@ -332,7 +330,7 @@ async fn turn_start_shell_zsh_fork_exec_approval_cancel_v2() -> Result<()> {
        &zsh_path,
    )?;

-    let mut mcp = McpProcess::new(&codex_home).await?;
+    let mut mcp = create_zsh_test_mcp_process(&codex_home, &workspace).await?;
    timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;

    let start_id = mcp
@@ -434,7 +432,7 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
    let workspace = tmp.path().join("workspace");
    std::fs::create_dir(&workspace)?;

-    let Some(zsh_path) = find_test_zsh_path() else {
+    let Some(zsh_path) = find_test_zsh_path()? else {
        eprintln!("skipping zsh fork subcommand decline test: no zsh executable found");
        return Ok(());
    };
@@ -446,6 +444,29 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
        return Ok(());
    }
    eprintln!("using zsh path for zsh-fork test: {}", zsh_path.display());
+    let zsh_path_for_config = {
+        // App-server config accepts only a zsh path, not extra argv. Use a
+        // wrapper so this test can force `-df` and downgrade `-lc` to `-c`
+        // to avoid rc/login-shell startup noise.
+        let path = workspace.join("zsh-no-rc");
+        std::fs::write(
+            &path,
+            format!(
+                r#"#!/bin/sh
+if [ "$1" = "-lc" ]; then
+  shift
+  set -- -c "$@"
+fi
+exec "{}" -df "$@"
+"#,
+                zsh_path.display()
+            ),
+        )?;
+        let mut permissions = std::fs::metadata(&path)?.permissions();
+        permissions.set_mode(0o755);
+        std::fs::set_permissions(&path, permissions)?;
+        path
+    };

    let tool_call_arguments = serde_json::to_string(&serde_json::json!({
        "command": "/usr/bin/true && /usr/bin/true",
@@ -461,7 +482,16 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
        ),
        responses::ev_completed("resp-1"),
    ]);
-    let server = create_mock_responses_server_sequence(vec![response]).await;
+    let no_op_response = responses::sse(vec![
+        responses::ev_response_created("resp-2"),
+        responses::ev_completed("resp-2"),
+    ]);
+    // Linux CI has occasionally issued a second `/responses` POST after the
+    // subcommand-decline flow. This test is about approval/decline behavior in
+    // the zsh fork, not exact model request count, so allow an extra request
+    // and return a harmless no-op response if it arrives.
+    let server =
+        create_mock_responses_server_sequence_unchecked(vec![response, no_op_response]).await;
    create_config_toml(
        &codex_home,
        &server.uri(),
@@ -471,10 +501,10 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
            (Feature::UnifiedExec, false),
            (Feature::ShellSnapshot, false),
        ]),
-        &zsh_path,
+        &zsh_path_for_config,
    )?;

-    let mut mcp = McpProcess::new(&codex_home).await?;
+    let mut mcp = create_zsh_test_mcp_process(&codex_home, &workspace).await?;
    timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;

    let start_id = mcp
@@ -500,8 +530,16 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
            }],
            cwd: Some(workspace.clone()),
            approval_policy: Some(codex_app_server_protocol::AskForApproval::OnRequest),
-            sandbox_policy: Some(codex_app_server_protocol::SandboxPolicy::ReadOnly {
-                access: codex_app_server_protocol::ReadOnlyAccess::FullAccess,
+            sandbox_policy: Some(if cfg!(target_os = "linux") {
+                // The zsh exec-bridge wrapper uses a Unix socket back to the parent
+                // process. Linux restricted sandbox seccomp denies connect(2), so use
+                // full access here; this test is validating zsh approval/decline
+                // behavior, not Linux sandboxing.
+                codex_app_server_protocol::SandboxPolicy::DangerFullAccess
+            } else {
+                codex_app_server_protocol::SandboxPolicy::ReadOnly {
+                    access: codex_app_server_protocol::ReadOnlyAccess::FullAccess,
+                }
            }),
            model: Some("mock-model".to_string()),
            effort: Some(codex_protocol::openai_models::ReasoningEffort::Medium),
@@ -517,10 +555,13 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
    let TurnStartResponse { turn } = to_response::<TurnStartResponse>(turn_resp)?;

    let mut approval_ids = Vec::new();
-    for decision in [
+    let mut saw_parent_approval = false;
+    let target_decisions = [
        CommandExecutionApprovalDecision::Accept,
        CommandExecutionApprovalDecision::Cancel,
-    ] {
+    ];
+    let mut target_decision_index = 0;
+    while target_decision_index < target_decisions.len() {
        let server_req = timeout(
            DEFAULT_READ_TIMEOUT,
            mcp.read_stream_until_request_message(),
@@ -531,13 +572,40 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
            panic!("expected CommandExecutionRequestApproval request");
        };
        assert_eq!(params.item_id, "call-zsh-fork-subcommand-decline");
-        approval_ids.push(
-            params
-                .approval_id
-                .clone()
-                .expect("approval_id must be present for zsh subcommand approvals"),
-        );
        assert_eq!(params.thread_id, thread.id);
+        let is_target_subcommand = params.command.as_deref() == Some("/usr/bin/true");
+        if is_target_subcommand {
+            approval_ids.push(
+                params
+                    .approval_id
+                    .clone()
+                    .expect("approval_id must be present for zsh subcommand approvals"),
+            );
+        }
+        let decision = if is_target_subcommand {
+            let decision = target_decisions[target_decision_index].clone();
+            target_decision_index += 1;
+            decision
+        } else {
+            let command = params
+                .command
+                .as_deref()
+                .expect("approval command should be present");
+            assert!(
+                !saw_parent_approval,
+                "unexpected extra non-target approval: {command}"
+            );
+            assert!(
+                command.contains("zsh-no-rc"),
+                "expected parent zsh wrapper approval, got: {command}"
+            );
+            assert!(
+                command.contains("/usr/bin/true && /usr/bin/true"),
+                "expected tool command in parent approval, got: {command}"
+            );
+            saw_parent_approval = true;
+            CommandExecutionApprovalDecision::Accept
+        };
        mcp.send_response(
            request_id,
            serde_json::to_value(CommandExecutionRequestApprovalResponse { decision })?,
@@ -545,6 +613,8 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
        .await?;
    }

+    assert_eq!(approval_ids.len(), 2);
+    assert_ne!(approval_ids[0], approval_ids[1]);
    let parent_completed_command_execution = timeout(DEFAULT_READ_TIMEOUT, async {
        loop {
            let completed_notif = mcp
@@ -563,32 +633,61 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
            }
        }
    })
-    .await??;
+    .await;

-    let ThreadItem::CommandExecution {
-        id,
-        status,
-        aggregated_output,
-        ..
-    } = parent_completed_command_execution
-    else {
-        unreachable!("loop ensures we break on parent command execution item");
-    };
-    assert_eq!(id, "call-zsh-fork-subcommand-decline");
-    assert_eq!(status, CommandExecutionStatus::Declined);
-    assert!(
-        aggregated_output.is_none()
-            || aggregated_output == Some("exec command rejected by user".to_string())
-    );
-    assert_eq!(approval_ids.len(), 2);
-    assert_ne!(approval_ids[0], approval_ids[1]);
+    match parent_completed_command_execution {
+        Ok(Ok(parent_completed_command_execution)) => {
+            let ThreadItem::CommandExecution {
+                id,
+                status,
+                aggregated_output,
+                ..
+            } = parent_completed_command_execution
+            else {
+                unreachable!("loop ensures we break on parent command execution item");
+            };
+            assert_eq!(id, "call-zsh-fork-subcommand-decline");
+            assert_eq!(status, CommandExecutionStatus::Declined);
+            assert!(
+                aggregated_output.is_none()
+                    || aggregated_output == Some("exec command rejected by user".to_string())
+            );

-    mcp.interrupt_turn_and_wait_for_aborted(thread.id, turn.id, DEFAULT_READ_TIMEOUT)
-        .await?;
+            mcp.interrupt_turn_and_wait_for_aborted(
+                thread.id.clone(),
+                turn.id.clone(),
+                DEFAULT_READ_TIMEOUT,
+            )
+            .await?;
+        }
+        Ok(Err(error)) => return Err(error),
+        Err(_) => {
+            // Some zsh builds abort the turn immediately after the rejected
+            // subcommand without emitting a parent `item/completed`.
+            let completed_notif = timeout(
+                DEFAULT_READ_TIMEOUT,
+                mcp.read_stream_until_notification_message("turn/completed"),
+            )
+            .await??;
+            let completed: TurnCompletedNotification = serde_json::from_value(
+                completed_notif
+                    .params
+                    .expect("turn/completed params must be present"),
+            )?;
+            assert_eq!(completed.thread_id, thread.id);
+            assert_eq!(completed.turn.id, turn.id);
+            assert_eq!(completed.turn.status, TurnStatus::Interrupted);
+        }
+    }

    Ok(())
 }

+async fn create_zsh_test_mcp_process(codex_home: &Path, zdotdir: &Path) -> Result<McpProcess> {
+    let zdotdir = zdotdir.to_string_lossy().into_owned();
+    McpProcess::new_with_env(codex_home, &[("ZDOTDIR", Some(zdotdir.as_str()))]).await
+}
+
 fn create_config_toml(
    codex_home: &Path,
    server_uri: &str,
@@ -640,36 +739,24 @@ stream_max_retries = 0
    )
 }

-fn find_test_zsh_path() -> Option<std::path::PathBuf> {
-    if let Some(path) = std::env::var_os("CODEX_TEST_ZSH_PATH") {
-        let path = std::path::PathBuf::from(path);
-        if path.is_file() {
-            return Some(path);
-        }
-        panic!(
-            "CODEX_TEST_ZSH_PATH is set but is not a file: {}",
-            path.display()
+fn find_test_zsh_path() -> Result<Option<std::path::PathBuf>> {
+    let repo_root = codex_utils_cargo_bin::repo_root()?;
+    let dotslash_zsh = repo_root.join("codex-rs/exec-server/tests/suite/zsh");
+    if !dotslash_zsh.is_file() {
+        eprintln!(
+            "skipping zsh fork test: shared zsh DotSlash file not found at {}",
+            dotslash_zsh.display()
        );
+        return Ok(None);
    }
-
-    for candidate in ["/bin/zsh", "/usr/bin/zsh"] {
-        let path = Path::new(candidate);
-        if path.is_file() {
-            return Some(path.to_path_buf());
+    match core_test_support::fetch_dotslash_file(&dotslash_zsh, None) {
+        Ok(path) => return Ok(Some(path)),
+        Err(error) => {
+            eprintln!("failed to fetch vendored zsh via dotslash: {error:#}");
        }
    }

-    let shell = std::env::var_os("SHELL")?;
-    let shell_path = std::path::PathBuf::from(shell);
-    if shell_path
-        .file_name()
-        .is_some_and(|file_name| file_name == "zsh")
-        && shell_path.is_file()
-    {
-        return Some(shell_path);
-    }
-
-    None
+    Ok(None)
 }

 fn supports_exec_wrapper_intercept(zsh_path: &Path) -> bool {
--- a/codex-rs/cli/src/main.rs
+++ b/codex-rs/cli/src/main.rs
@@ -543,10 +543,12 @@ fn stage_str(stage: codex_core::features::Stage) -> &'static str {
 }

 fn main() -> anyhow::Result<()> {
-    if codex_core::maybe_run_zsh_exec_wrapper_mode()? {
-        return Ok(());
-    }
    arg0_dispatch_or_else(|codex_linux_sandbox_exe| async move {
+        // Run wrapper mode only after arg0 dispatch so `codex-linux-sandbox`
+        // invocations don't get misclassified as zsh exec-wrapper calls.
+        if codex_core::maybe_run_zsh_exec_wrapper_mode()? {
+            return Ok(());
+        }
        cli_main(codex_linux_sandbox_exe).await?;
        Ok(())
    })
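The reordering in `main` means the `codex-linux-sandbox` arg0 dispatch is checked before zsh wrapper mode. The sketch below shows that precedence in isolation. It assumes a hypothetical `Mode` enum and a hypothetical `wrapper_requested` flag; this diff does not show how `maybe_run_zsh_exec_wrapper_mode()` actually detects wrapper invocations.

```rust
use std::path::Path;

/// Hypothetical process modes; the real codex entry points are structured differently.
#[derive(Debug, PartialEq)]
enum Mode {
    LinuxSandbox,
    ZshExecWrapper,
    Cli,
}

/// Return the file-name component of argv[0], e.g. "codex-linux-sandbox".
fn argv0_name(argv0: &str) -> &str {
    Path::new(argv0)
        .file_name()
        .and_then(|name| name.to_str())
        .unwrap_or(argv0)
}

/// Check argv[0] dispatch first, so a helper invoked under its own name is
/// never misclassified as a zsh exec-wrapper call. `wrapper_requested` is a
/// hypothetical stand-in for whatever signal wrapper mode really uses.
fn select_mode(argv0: &str, wrapper_requested: bool) -> Mode {
    if argv0_name(argv0) == "codex-linux-sandbox" {
        Mode::LinuxSandbox
    } else if wrapper_requested {
        Mode::ZshExecWrapper
    } else {
        Mode::Cli
    }
}

fn main() {
    println!("{:?}", select_mode("/tmp/bin/codex-linux-sandbox", true));
    println!("{:?}", select_mode("/usr/local/bin/codex", true));
    println!("{:?}", select_mode("/usr/local/bin/codex", false));
}
```

With the old ordering, the wrapper check would have run first and could claim a sandbox invocation before arg0 dispatch ever saw it.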
--- a/codex-rs/core/src/features.rs
+++ b/codex-rs/core/src/features.rs
@@ -368,13 +368,25 @@ fn legacy_usage_notice(alias: &str, feature: Feature) -> (String, Option<String>
            (summary, Some(web_search_details().to_string()))
        }
        _ => {
-            let summary = format!("`{alias}` is deprecated. Use `[features].{canonical}` instead.");
-            let details = if alias == canonical {
-                None
+            let (summary, details) = if alias == "collab" && feature == Feature::Collab {
+                (
+                    "Your configuration file has an error.".to_string(),
+                    Some(
+                        "Change collab=true to multi_agent=true in your config.toml or enable it by running codex --enable multi_agent".to_string(),
+                    ),
+                )
+            } else if alias == canonical {
+                (
+                    format!("`{alias}` is deprecated. Use `[features].{canonical}` instead."),
+                    None,
+                )
            } else {
-                Some(format!(
-                    "Enable it with `--enable {canonical}` or `[features].{canonical}` in config.toml. See https://developers.openai.com/codex/config-basic#feature-flags for details."
-                ))
+                (
+                    format!("`{alias}` is deprecated. Use `[features].{canonical}` instead."),
+                    Some(format!(
+                        "Enable it with `--enable {canonical}` or `[features].{canonical}` in config.toml. See https://developers.openai.com/codex/config-basic#feature-flags for details."
+                    )),
+                )
            };
            (summary, details)
        }
@@ -764,4 +776,17 @@ mod tests {
        assert_eq!(feature_for_key("multi_agent"), Some(Feature::Collab));
        assert_eq!(feature_for_key("collab"), Some(Feature::Collab));
    }
+
+    #[test]
+    fn collab_legacy_notice_uses_config_error_text() {
+        let (summary, details) = legacy_usage_notice("collab", Feature::Collab);
+
+        assert_eq!(summary, "Your configuration file has an error.".to_string());
+        assert_eq!(
+            details,
+            Some(
+                "Change collab=true to multi_agent=true in your config.toml or enable it by running codex --enable multi_agent".to_string()
+            )
+        );
+    }
 }
--- a/codex-rs/core/src/zsh_exec_bridge/mod.rs
+++ b/codex-rs/core/src/zsh_exec_bridge/mod.rs
@@ -166,6 +166,10 @@ impl ZshExecBridge {
        })?;

        let mut cmd = tokio::process::Command::new(&command[0]);
+        #[cfg(unix)]
+        if let Some(arg0) = &req.arg0 {
+            cmd.arg0(arg0);
+        }
        if command.len() > 1 {
            cmd.args(&command[1..]);
        }
@@ -459,7 +463,6 @@ fn run_exec_wrapper_mode() -> anyhow::Result<()> {
            argv: argv.clone(),
            cwd,
        };
-
        let mut stream = StdUnixStream::connect(&socket_path)
            .with_context(|| format!("connect to wrapper socket at {socket_path}"))?;
        let encoded = serde_json::to_string(&request).context("serialize wrapper request")?;
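The `ZshExecBridge` fix forwards `req.arg0` to the spawned process so that binaries that dispatch on argv[0] still see the expected name. The sketch below demonstrates the same Unix-only mechanism with `std::os::unix::process::CommandExt::arg0`, which plays the same role as the `tokio::process::Command::arg0` call in the diff. It assumes `/bin/sh` exists and relies on POSIX `sh -c` semantics, where `$0` is the shell's own argv[0] when no extra operand follows the command string. `shell_argv0` is a hypothetical helper.

```rust
use std::os::unix::process::CommandExt;
use std::process::Command;

/// Hypothetical helper: run `/bin/sh -c 'printf %s "$0"'` with an optional
/// argv[0] override and return what the shell reports as `$0`.
fn shell_argv0(arg0: Option<&str>) -> std::io::Result<String> {
    let mut cmd = Command::new("/bin/sh");
    // Mirror the bridge fix: only override argv[0] when a value is provided.
    if let Some(arg0) = arg0 {
        cmd.arg0(arg0);
    }
    // With `sh -c` and no trailing operand, `$0` is the shell's argv[0].
    let output = cmd.arg("-c").arg("printf %s \"$0\"").output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() -> std::io::Result<()> {
    println!("{}", shell_argv0(Some("codex-linux-sandbox"))?);
    println!("{}", shell_argv0(None)?);
    Ok(())
}
```

Without the override, the child sees the program path as argv[0], which is the behavior that broke `codex-linux-sandbox` dispatch.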
--- a/codex-rs/core/tests/common/lib.rs
+++ b/codex-rs/core/tests/common/lib.rs
@@ -1,5 +1,7 @@
 #![expect(clippy::expect_used)]

+use anyhow::Context as _;
+use anyhow::ensure;
 use codex_utils_cargo_bin::CargoBinError;
 use ctor::ctor;
 use tempfile::TempDir;
@@ -99,6 +101,42 @@ pub fn test_tmp_path_buf() -> PathBuf {
    test_tmp_path().into_path_buf()
 }

+/// Fetch a DotSlash resource and return the resolved executable/file path.
+pub fn fetch_dotslash_file(
+    dotslash_file: &std::path::Path,
+    dotslash_cache: Option<&std::path::Path>,
+) -> anyhow::Result<PathBuf> {
+    let mut command = std::process::Command::new("dotslash");
+    command.arg("--").arg("fetch").arg(dotslash_file);
+    if let Some(dotslash_cache) = dotslash_cache {
+        command.env("DOTSLASH_CACHE", dotslash_cache);
+    }
+    let output = command.output().with_context(|| {
+        format!(
+            "failed to run dotslash to fetch resource {}",
+            dotslash_file.display()
+        )
+    })?;
+    ensure!(
+        output.status.success(),
+        "dotslash fetch failed for {}: {}",
+        dotslash_file.display(),
+        String::from_utf8_lossy(&output.stderr).trim()
+    );
+    let fetched_path = String::from_utf8(output.stdout)
+        .context("dotslash fetch output was not utf8")?
+        .trim()
+        .to_string();
+    ensure!(!fetched_path.is_empty(), "dotslash fetch output was empty");
+    let fetched_path = PathBuf::from(fetched_path);
+    ensure!(
+        fetched_path.is_file(),
+        "dotslash returned non-file path: {}",
+        fetched_path.display()
+    );
+    Ok(fetched_path)
+}
+
 /// Returns a default `Config` whose on-disk state is confined to the provided
 /// temporary directory. Using a per-test directory keeps tests hermetic and
 /// avoids clobbering a developer’s real `~/.codex`.
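`fetch_dotslash_file` treats the trimmed stdout of `dotslash -- fetch` as the fetched path and rejects non-UTF-8 or empty output before checking that the path is a file. The sketch below isolates those parsing checks so they can be exercised without DotSlash installed. `parse_fetched_path` is a hypothetical helper, not part of `core_test_support`, and it omits the `is_file()` check because that requires a real file on disk.

```rust
use std::path::PathBuf;

/// Hypothetical helper mirroring the stdout validation in `fetch_dotslash_file`.
fn parse_fetched_path(stdout: Vec<u8>) -> Result<PathBuf, String> {
    // Reject output that is not valid UTF-8.
    let text = String::from_utf8(stdout)
        .map_err(|_| "dotslash fetch output was not utf8".to_string())?;
    // Strip the trailing newline (and any surrounding whitespace).
    let trimmed = text.trim();
    if trimmed.is_empty() {
        return Err("dotslash fetch output was empty".to_string());
    }
    Ok(PathBuf::from(trimmed))
}

fn main() {
    println!("{:?}", parse_fetched_path(b"/tmp/dotslash-cache/zsh\n".to_vec()));
    println!("{:?}", parse_fetched_path(b"   \n".to_vec()));
}
```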
--- a/codex-rs/exec-server/Cargo.toml
+++ b/codex-rs/exec-server/Cargo.toml
@@ -58,6 +58,7 @@ tracing = { workspace = true }
 tracing-subscriber = { workspace = true, features = ["env-filter", "fmt"] }

 [dev-dependencies]
+core_test_support = { workspace = true }
 codex-utils-cargo-bin = { workspace = true }
 codex-protocol = { workspace = true }
 exec_server_test_support = { workspace = true }
--- a/codex-rs/exec-server/tests/suite/accept_elicitation.rs
+++ b/codex-rs/exec-server/tests/suite/accept_elicitation.rs
@@ -61,15 +61,9 @@ prefix_rule(
 /// Verify the same prompt/escalation flow works when the server is launched
 /// with a patched zsh binary.
 ///
-/// Set CODEX_TEST_ZSH_PATH to enable this test locally or in CI.
+/// The test resolves `tests/suite/zsh` via DotSlash into a per-test cache.
 #[tokio::test(flavor = "current_thread")]
 async fn accept_elicitation_for_prompt_rule_with_zsh() -> Result<()> {
-    let Some(zsh_path) = std::env::var_os("CODEX_TEST_ZSH_PATH") else {
-        eprintln!("skipping zsh test: CODEX_TEST_ZSH_PATH is not set");
-        return Ok(());
-    };
-    let zsh_path = PathBuf::from(zsh_path);
-
    let codex_home = TempDir::new()?;
    write_default_execpolicy(
        r#"
@@ -87,6 +81,11 @@ prefix_rule(
    .await?;
    let dotslash_cache_temp_dir = TempDir::new()?;
    let dotslash_cache = dotslash_cache_temp_dir.path();
+    let zsh_path = resolve_test_zsh_path(dotslash_cache).await?;
+    eprintln!(
+        "using zsh path for exec-server test: {}",
+        zsh_path.display()
+    );
    let transport =
        create_transport_with_shell_path(codex_home.as_ref(), dotslash_cache, &zsh_path).await?;
    run_accept_elicitation_for_prompt_rule_with_transport(transport).await
@@ -95,13 +94,13 @@ prefix_rule(
 async fn run_accept_elicitation_for_prompt_rule_with_transport(
    transport: rmcp::transport::TokioChildProcess,
 ) -> Result<()> {
-    // Create an MCP client that approves expected elicitation messages.
+    // Create an MCP client that approves the expected elicitation message.
    let project_root = TempDir::new()?;
    let project_root_path = project_root.path().canonicalize().unwrap();
    let git_path = resolve_git_path(USE_LOGIN_SHELL).await?;
+    let git_init_command = format!("{git_path} init --quiet .");
    let expected_elicitation_message = format!(
-        "Allow agent to run `{} init .` in `{}`?",
-        git_path,
+        "Allow agent to run `{git_path} init --quiet .` in `{}`?",
        project_root_path.display()
    );
    let elicitation_requests: Arc<Mutex<Vec<CreateElicitationRequestParams>>> = Default::default();
@@ -142,7 +141,7 @@ async fn run_accept_elicitation_for_prompt_rule_with_transport(
            arguments: Some(object(json!(
                {
                    "login": USE_LOGIN_SHELL,
-                    "command": "git init .",
+                    "command": git_init_command,
                    "workdir": project_root_path.to_string_lossy(),
                }
            ))),
@@ -157,15 +156,11 @@ async fn run_accept_elicitation_for_prompt_rule_with_transport(
    let ExecResult {
        exit_code, output, ..
    } = serde_json::from_str::<ExecResult>(&tool_call_content.text)?;
-    let git_init_succeeded = format!(
-        "Initialized empty Git repository in {}/.git/\n",
-        project_root_path.display()
-    );
-    // Normally, this would be an exact match, but it might include extra output
-    // if `git config set advice.defaultBranchName false` has not been set.
+    // `git init --quiet` suppresses the initialization banner, so expect empty
+    // output and check the exit status and filesystem effects, not banner text.
    assert!(
-        output.contains(&git_init_succeeded),
-        "expected output `{output}` to contain `{git_init_succeeded}`"
+        output.is_empty(),
+        "expected no output from `git init --quiet .`, got `{output}`"
    );
    assert_eq!(exit_code, 0, "command should succeed");
    assert_eq!(is_error, Some(false), "command should succeed");
@@ -192,6 +187,12 @@ async fn run_accept_elicitation_for_prompt_rule_with_transport(
    Ok(())
 }

+async fn resolve_test_zsh_path(dotslash_cache: &std::path::Path) -> Result<PathBuf> {
+    let dotslash_zsh = codex_utils_cargo_bin::find_resource!("tests/suite/zsh")?;
+    core_test_support::fetch_dotslash_file(&dotslash_zsh, Some(dotslash_cache))
+        .with_context(|| format!("failed to fetch test zsh from {}", dotslash_zsh.display()))
+}
+
 fn ensure_codex_cli() -> Result<PathBuf> {
    let codex_cli = codex_utils_cargo_bin::cargo_bin("codex")?;

--- /dev/null
+++ b/codex-rs/exec-server/tests/suite/zsh
@@ -0,0 +1,72 @@
+#!/usr/bin/env dotslash
+
+// This is the patched zsh fork built by
+// `.github/workflows/shell-tool-mcp.yml` for the shell-tool-mcp package.
+// Fetching the prebuilt version via DotSlash makes it easier to write
+// integration tests that exercise the zsh fork in exec-server and app-server.
+//
+// TODO(mbolin): Currently, we use a .tgz artifact that includes binaries for
+// multiple platforms, but we could save a bit of space by making arch-specific
+// artifacts available in the GitHub releases and referencing those here.
+{
+  "name": "codex-zsh",
+  "platforms": {
+    // macOS 13 builds (and therefore x86_64) were dropped in
+    // https://github.com/openai/codex/pull/7295, so we only provide an
+    // Apple Silicon build for now.
+    "macos-aarch64": {
+      "size": 53771483,
+      "hash": "blake3",
+      "digest": "ff664f63f5e1fa62762c9aff0aafa66cf196faf9b157f98ec98f59c152fc7bd3",
+      "format": "tar.gz",
+      "path": "package/vendor/aarch64-apple-darwin/zsh/macos-15/zsh",
+      "providers": [
+        {
+          "url": "https://github.com/openai/codex/releases/download/rust-v0.104.0/codex-shell-tool-mcp-npm-0.104.0.tgz"
+        },
+        {
+          "type": "github-release",
+          "repo": "openai/codex",
+          "tag": "rust-v0.104.0",
+          "name": "codex-shell-tool-mcp-npm-0.104.0.tgz"
+        }
+      ]
+    },
+    "linux-x86_64": {
+      "size": 53771483,
+      "hash": "blake3",
+      "digest": "ff664f63f5e1fa62762c9aff0aafa66cf196faf9b157f98ec98f59c152fc7bd3",
+      "format": "tar.gz",
+      "path": "package/vendor/x86_64-unknown-linux-musl/zsh/ubuntu-24.04/zsh",
+      "providers": [
+        {
+          "url": "https://github.com/openai/codex/releases/download/rust-v0.104.0/codex-shell-tool-mcp-npm-0.104.0.tgz"
+        },
+        {
+          "type": "github-release",
+          "repo": "openai/codex",
+          "tag": "rust-v0.104.0",
+          "name": "codex-shell-tool-mcp-npm-0.104.0.tgz"
+        }
+      ]
+    },
+    "linux-aarch64": {
+      "size": 53771483,
+      "hash": "blake3",
+      "digest": "ff664f63f5e1fa62762c9aff0aafa66cf196faf9b157f98ec98f59c152fc7bd3",
+      "format": "tar.gz",
+      "path": "package/vendor/aarch64-unknown-linux-musl/zsh/ubuntu-24.04/zsh",
+      "providers": [
+        {
+          "url": "https://github.com/openai/codex/releases/download/rust-v0.104.0/codex-shell-tool-mcp-npm-0.104.0.tgz"
+        },
+        {
+          "type": "github-release",
+          "repo": "openai/codex",
+          "tag": "rust-v0.104.0",
+          "name": "codex-shell-tool-mcp-npm-0.104.0.tgz"
+        }
+      ]
+    },
+  }
+}
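The DotSlash file declares only three platform keys, so a host without a matching entry has no vendored zsh to fetch. The sketch below checks a host against those keys. It assumes that the key is formed as `<os>-<arch>` from the values that `std::env::consts::OS` and `std::env::consts::ARCH` report (for example `linux-x86_64` or `macos-aarch64`). That mapping is an assumption based on the key names in this file, not a statement about DotSlash internals.

```rust
/// Platform keys declared in the vendored `tests/suite/zsh` DotSlash file.
const VENDORED_ZSH_PLATFORMS: &[&str] = &["macos-aarch64", "linux-x86_64", "linux-aarch64"];

/// Build an "<os>-<arch>" key (assumed to match the DotSlash key format).
fn platform_key(os: &str, arch: &str) -> String {
    format!("{os}-{arch}")
}

/// Return true if the vendored zsh file lists an artifact for this platform.
fn has_vendored_zsh(os: &str, arch: &str) -> bool {
    let key = platform_key(os, arch);
    VENDORED_ZSH_PLATFORMS.contains(&key.as_str())
}

fn main() {
    let (os, arch) = (std::env::consts::OS, std::env::consts::ARCH);
    println!("{}: {}", platform_key(os, arch), has_vendored_zsh(os, arch));
}
```

Consistent with the comment in the file, `macos-x86_64` is absent, so Intel Macs have no vendored zsh entry.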