mirror of https://github.com/openai/codex.git
synced 2026-02-25 02:03:48 +00:00
Compare commits: remove/ste ... eb/collab- (3 Commits)
| Author | SHA1 | Date | |
|---|---|---|---|
| | f4c61633e6 | | |
| | e8949f4507 | | |
| | 7e569f1162 | | |
185
.codex/skills/babysit-pr/SKILL.md
Normal file
@@ -0,0 +1,185 @@
|
||||
---
|
||||
name: babysit-pr
|
||||
description: Babysit a GitHub pull request after creation by continuously polling CI checks/workflow runs, new review comments, and mergeability state until the PR is ready to merge (or merged/closed). Diagnose failures, retry likely flaky failures up to 3 times, auto-fix/push branch-related issues when appropriate, and stop only when user help is required (for example CI infrastructure issues, exhausted flaky retries, or ambiguous/blocking situations). Use when the user asks Codex to monitor a PR, watch CI, handle review comments, or keep an eye on failures and feedback on an open PR.
|
||||
---
|
||||
|
||||
# PR Babysitter
|
||||
|
||||
## Objective
|
||||
Babysit a PR persistently until one of these terminal outcomes occurs:
|
||||
|
||||
- The PR is merged or closed.
|
||||
- CI is successful, there are no unaddressed review comments surfaced by the watcher, required review approval is not blocking merge, and there are no potential merge conflicts (PR is mergeable / not reporting conflict risk).
|
||||
- A situation requires user help (for example CI infrastructure issues, repeated flaky failures after retry budget is exhausted, permission problems, or ambiguity that cannot be resolved safely).
|
||||
|
||||
Do not stop merely because a single snapshot returns `idle` while checks are still pending.
|
||||
|
||||
## Inputs
|
||||
Accept any of the following:
|
||||
|
||||
- No PR argument: infer the PR from the current branch (`--pr auto`)
|
||||
- PR number
|
||||
- PR URL
|
||||
|
||||
## Core Workflow
|
||||
|
||||
1. When the user asks to "monitor"/"watch"/"babysit" a PR, start with the watcher's continuous mode (`--watch`) unless you are intentionally doing a one-shot diagnostic snapshot.
|
||||
2. Run the watcher script to snapshot PR/CI/review state (or consume each streamed snapshot from `--watch`).
|
||||
3. Inspect the `actions` list in the JSON response.
|
||||
4. If `diagnose_ci_failure` is present, inspect failed run logs and classify the failure.
|
||||
5. If the failure is likely caused by the current branch, patch code locally, commit, and push.
|
||||
6. If `process_review_comment` is present, inspect surfaced review items and decide whether to address them.
|
||||
7. If a review item is actionable and correct, patch code locally, commit, and push.
|
||||
8. If the failure is likely flaky/unrelated and `retry_failed_checks` is present, rerun failed jobs with `--retry-failed-now`.
|
||||
9. If both actionable review feedback and `retry_failed_checks` are present, prioritize review feedback first; a new commit will retrigger CI, so avoid rerunning flaky checks on the old SHA unless you intentionally defer the review change.
|
||||
10. On every loop, verify mergeability / merge-conflict status (for example via `gh pr view`) in addition to CI and review state.
|
||||
11. After any push or rerun action, immediately return to step 1 and continue polling on the updated SHA/state.
|
||||
12. If you had been using `--watch` before pausing to patch/commit/push, relaunch `--watch` yourself in the same turn immediately after the push (do not wait for the user to re-invoke the skill).
|
||||
13. Repeat polling until the PR is green + review-clean + mergeable, `stop_pr_closed` appears, or a user-help-required blocker is reached.
|
||||
14. Maintain terminal/session ownership: while babysitting is active, keep consuming watcher output in the same turn; do not leave a detached `--watch` process running and then end the turn as if monitoring were complete.
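
The loop above can be sketched as a small reader for the watcher's JSONL stream. This is a minimal sketch, assuming each streamed line is a JSON object carrying the `actions` list described in step 3. The helper names `iter_snapshots` and `pick_next_action` are hypothetical, and the priority order is only one reading of steps 4 to 9, not the watcher's actual behavior.

```python
import json

# Hypothetical priority order, loosely following steps 4-9 above:
# a closed PR wins, review feedback comes before flaky reruns.
ACTION_PRIORITY = [
    "stop_pr_closed",
    "process_review_comment",
    "diagnose_ci_failure",
    "retry_failed_checks",
    "idle",
]


def iter_snapshots(lines):
    """Yield one dict per non-empty JSONL line, skipping non-object payloads."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        snapshot = json.loads(line)
        if isinstance(snapshot, dict):
            yield snapshot


def pick_next_action(snapshot):
    """Return the highest-priority known action in a snapshot, or None."""
    actions = snapshot.get("actions") or []
    for name in ACTION_PRIORITY:
        if name in actions:
            return name
    return None
```

In a live session the `lines` argument would be the standard output of the single running `--watch` process, so the same turn keeps consuming snapshots until a stop condition appears.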
|
||||
|
||||
## Commands
|
||||
|
||||
### One-shot snapshot
|
||||
|
||||
```bash
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --once
```
|
||||
|
||||
### Continuous watch (JSONL)
|
||||
|
||||
```bash
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --watch
```
|
||||
|
||||
### Trigger flaky retry cycle (only when watcher indicates)
|
||||
|
||||
```bash
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --retry-failed-now
```
|
||||
|
||||
### Explicit PR target
|
||||
|
||||
```bash
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr <number-or-url> --once
```
|
||||
|
||||
## CI Failure Classification
|
||||
Use `gh` commands to inspect failed runs before deciding to rerun.
|
||||
|
||||
- `gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha`
|
||||
- `gh run view <run-id> --log-failed`
|
||||
|
||||
Prefer treating failures as branch-related when logs point to changed code (compile/test/lint/typecheck/snapshots/static analysis in touched areas).
|
||||
|
||||
Prefer treating failures as flaky/unrelated when logs show transient infra/external issues (timeouts, runner provisioning failures, registry/network outages, GitHub Actions infra errors).
|
||||
|
||||
If classification is ambiguous, perform one manual diagnosis attempt before choosing rerun.
|
||||
|
||||
Read `.codex/skills/babysit-pr/references/heuristics.md` for a concise checklist.
|
||||
|
||||
## Review Comment Handling
|
||||
The watcher surfaces review items from:
|
||||
|
||||
- PR issue comments
|
||||
- Inline review comments
|
||||
- Review submissions (COMMENT / APPROVED / CHANGES_REQUESTED)
|
||||
|
||||
It intentionally surfaces Codex reviewer bot feedback (for example comments/reviews from `chatgpt-codex-connector[bot]`) in addition to human reviewer feedback. Most unrelated bot noise should still be ignored.
|
||||
For safety, the watcher only auto-surfaces trusted human review authors (for example repo OWNER/MEMBER/COLLABORATOR, plus the authenticated operator) and approved review bots such as Codex.
|
||||
On a fresh watcher state file, existing pending review feedback may be surfaced immediately (not only comments that arrive after monitoring starts). This is intentional so already-open review comments are not missed.
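
The trust filter described above might look roughly like the following. This is a sketch under stated assumptions: the function name `is_surfaced_author` is hypothetical, bot accounts are assumed to be recognizable by a `[bot]` login suffix, and the rule that bots are judged only by keyword is an illustrative choice rather than the watcher's confirmed logic.

```python
# Association values and the "codex" keyword come from the text above;
# everything else in this sketch is an assumption.
TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}
APPROVED_BOT_KEYWORDS = {"codex"}


def is_surfaced_author(login, association, operator_login):
    """Decide whether feedback from this author should be surfaced."""
    login_lower = (login or "").lower()
    if login_lower.endswith("[bot]"):
        # Only approved review bots pass; other bot noise is dropped.
        return any(keyword in login_lower for keyword in APPROVED_BOT_KEYWORDS)
    if operator_login and login_lower == operator_login.lower():
        # The authenticated operator is always trusted.
        return True
    return (association or "").upper() in TRUSTED_ASSOCIATIONS
```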
|
||||
|
||||
When you agree with a comment and it is actionable:
|
||||
|
||||
1. Patch code locally.
|
||||
2. Commit with `codex: address PR review feedback (#<n>)`.
|
||||
3. Push to the PR head branch.
|
||||
4. Resume watching on the new SHA immediately (do not stop after reporting the push).
|
||||
5. If monitoring was running in `--watch` mode, restart `--watch` immediately after the push in the same turn; do not wait for the user to ask again.
|
||||
|
||||
If you disagree or the comment is non-actionable/already addressed, record it as handled by continuing the watcher loop (the script de-duplicates surfaced items via state after surfacing them).
|
||||
If a code review comment/thread is already marked as resolved in GitHub, treat it as non-actionable and safely ignore it unless new unresolved follow-up feedback appears.
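
De-duplication via state can be illustrated with a short sketch. It assumes the state keeps a plain list of already-seen IDs per item kind; `surface_new_items` is a hypothetical helper name, not a function confirmed to exist in the watcher.

```python
def surface_new_items(items, seen_ids):
    """Return items not yet seen and record their IDs in seen_ids (a list)."""
    fresh = []
    for item in items:
        item_id = str(item.get("id") or "")
        if not item_id or item_id in seen_ids:
            continue
        # Recording the ID at surfacing time is what marks the item as
        # handled, even if no code change follows.
        seen_ids.append(item_id)
        fresh.append(item)
    return fresh
```

Because the ID is stored as soon as an item is surfaced, simply continuing the loop is enough to keep a declined comment from reappearing.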
|
||||
|
||||
## Git Safety Rules
|
||||
|
||||
- Work only on the PR head branch.
|
||||
- Avoid destructive git commands.
|
||||
- Do not switch branches unless necessary to recover context.
|
||||
- Before editing, check for unrelated uncommitted changes. If present, stop and ask the user.
|
||||
- After each successful fix, commit and `git push`, then re-run the watcher.
|
||||
- If you interrupted a live `--watch` session to make the fix, restart `--watch` immediately after the push in the same turn.
|
||||
- Do not run multiple concurrent `--watch` processes for the same PR/state file; keep one watcher session active and reuse it until it stops or you intentionally restart it.
|
||||
- A push is not a terminal outcome; continue the monitoring loop unless a strict stop condition is met.
|
||||
|
||||
Commit message defaults:
|
||||
|
||||
- `codex: fix CI failure on PR #<n>`
|
||||
- `codex: address PR review feedback (#<n>)`
|
||||
|
||||
## Monitoring Loop Pattern
|
||||
Use this loop in a live Codex session:
|
||||
|
||||
1. Run `--once`.
|
||||
2. Read `actions`.
|
||||
3. First check whether the PR is now merged or otherwise closed; if so, report that terminal state and stop polling immediately.
|
||||
4. Check CI summary, new review items, and mergeability/conflict status.
|
||||
5. Diagnose CI failures and classify branch-related vs flaky/unrelated.
|
||||
6. Process actionable review comments before flaky reruns when both are present; if a review fix requires a commit, push it and skip rerunning failed checks on the old SHA.
|
||||
7. Retry failed checks only when `retry_failed_checks` is present and you are not about to replace the current SHA with a review/CI fix commit.
|
||||
8. If you pushed a commit or triggered a rerun, report the action briefly and continue polling (do not stop).
|
||||
9. After a review-fix push, proactively restart continuous monitoring (`--watch`) in the same turn unless a strict stop condition has already been reached.
|
||||
10. If everything is passing, mergeable, not blocked on required review approval, and there are no unaddressed review items, report success and stop.
|
||||
11. If blocked on a user-help-required issue (infra outage, exhausted flaky retries, unclear reviewer request, permissions), report the blocker and stop.
|
||||
12. Otherwise sleep according to the polling cadence below and repeat.
|
||||
|
||||
When the user explicitly asks to monitor/watch/babysit a PR, prefer `--watch` so polling continues autonomously in one command. Use repeated `--once` snapshots only for debugging, local testing, or when the user explicitly asks for a one-shot check.
|
||||
Do not stop to ask the user whether to continue polling; continue autonomously until a strict stop condition is met or the user explicitly interrupts.
|
||||
Do not hand control back to the user after a review-fix push just because a new SHA was created; restarting the watcher and re-entering the poll loop is part of the same babysitting task.
|
||||
If a `--watch` process is still running and no strict stop condition has been reached, the babysitting task is still in progress; keep streaming/consuming watcher output instead of ending the turn.
|
||||
|
||||
## Polling Cadence
|
||||
Use adaptive polling and continue monitoring even after CI turns green:
|
||||
|
||||
- While CI is not green (pending/running/queued or failing): poll every 1 minute.
|
||||
- After CI turns green: start at every 1 minute, then back off exponentially when there is no change (for example 1m, 2m, 4m, 8m, 16m, 32m), capping at every 1 hour.
|
||||
- Reset the green-state polling interval back to 1 minute whenever anything changes (new commit/SHA, check status changes, new review comments, mergeability changes, review decision changes).
|
||||
- If CI stops being green again (new commit, rerun, or regression): return to 1-minute polling.
|
||||
- If any poll shows the PR is merged or otherwise closed: stop polling immediately and report the terminal state.
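
The cadence above amounts to a capped exponential backoff that only applies while CI is green. A minimal sketch, assuming the caller tracks how many consecutive green polls showed no change and resets that counter to zero on any change; `next_poll_seconds` is a hypothetical name.

```python
BASE_POLL_SECONDS = 60            # 1 minute
MAX_GREEN_POLL_SECONDS = 60 * 60  # cap at 1 hour


def next_poll_seconds(ci_green, unchanged_green_polls):
    """Return the delay before the next poll.

    unchanged_green_polls counts consecutive green polls with no change;
    the caller resets it to 0 whenever anything changes.
    """
    if not ci_green:
        return BASE_POLL_SECONDS
    delay = BASE_POLL_SECONDS * (2 ** unchanged_green_polls)
    return min(delay, MAX_GREEN_POLL_SECONDS)
```

With these constants the green-state delays run 60, 120, 240, 480, 960, 1920 seconds and then stay at 3600.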
|
||||
|
||||
## Stop Conditions (Strict)
|
||||
Stop only when one of the following is true:
|
||||
|
||||
- PR merged or closed (stop as soon as a poll/snapshot confirms this).
|
||||
- PR is ready to merge: CI succeeded, no surfaced unaddressed review comments, not blocked on required review approval, and no merge conflict risk.
|
||||
- User intervention is required and Codex cannot safely proceed alone.
|
||||
|
||||
Keep polling when:
|
||||
|
||||
- `actions` contains only `idle` but checks are still pending.
|
||||
- CI is still running/queued.
|
||||
- Review state is quiet but CI is not terminal.
|
||||
- CI is green but mergeability is unknown/pending.
|
||||
- CI is green and mergeable, but the PR is still open and you are waiting for possible new review comments or merge-conflict changes per the green-state cadence.
|
||||
- The PR is green but blocked on review approval (`REVIEW_REQUIRED` / similar); continue polling on the green-state cadence and surface any new review comments without asking for confirmation to keep watching.
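
The stop and keep-polling rules can be condensed into one predicate. This is a sketch, assuming the caller already has the counts and the `mergeable`, `mergeStateStatus`, and `reviewDecision` values from `gh pr view`; the function name `stop_reason`, its parameters, and its return labels are hypothetical.

```python
MERGE_BLOCKING_REVIEW_DECISIONS = {"REVIEW_REQUIRED", "CHANGES_REQUESTED"}
MERGE_CONFLICT_OR_BLOCKING_STATES = {"BLOCKED", "DIRTY", "DRAFT", "UNKNOWN"}


def stop_reason(*, merged, closed, pending_count, failed_count, mergeable,
                merge_state_status, review_decision, unaddressed_reviews,
                needs_user_help=False):
    """Return a label for the strict stop condition that applies, else None."""
    if merged or closed:
        return "pr_merged_or_closed"
    if needs_user_help:
        return "user_help_required"
    # Assumption: "green" means nothing pending and nothing failed.
    ci_green = pending_count == 0 and failed_count == 0
    merge_ok = (
        mergeable == "MERGEABLE"
        and merge_state_status not in MERGE_CONFLICT_OR_BLOCKING_STATES
    )
    review_ok = (
        review_decision not in MERGE_BLOCKING_REVIEW_DECISIONS
        and unaddressed_reviews == 0
    )
    if ci_green and merge_ok and review_ok:
        return "ready_to_merge"
    # Anything else (pending CI, unknown mergeability, review required)
    # means keep polling.
    return None
```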
|
||||
|
||||
## Output Expectations
|
||||
Provide concise progress updates while monitoring, following the rules in the first list below, and a final summary that includes the items in the second list:
|
||||
|
||||
- During long unchanged monitoring periods, avoid emitting a full update on every poll; summarize only status changes plus occasional heartbeat updates.
|
||||
- Treat push confirmations, intermediate CI snapshots, and review-action updates as progress updates only; do not emit the final summary or end the babysitting session unless a strict stop condition is met.
|
||||
- A user request to "monitor" is not satisfied by a couple of sample polls; remain in the loop until a strict stop condition or an explicit user interruption.
|
||||
- A review-fix commit + push is not a completion event; immediately resume live monitoring (`--watch`) in the same turn and continue reporting progress updates.
|
||||
- When CI first transitions to all green for the current SHA, emit a one-time celebratory progress update (do not repeat it on every green poll). Preferred style: `🚀 CI is all green! 33/33 passed. Still on watch for review approval.`
|
||||
- Do not send the final summary while a watcher terminal is still running unless the watcher has emitted/confirmed a strict stop condition; otherwise continue with progress updates.
|
||||
|
||||
- Final PR SHA
|
||||
- CI status summary
|
||||
- Mergeability / conflict status
|
||||
- Fixes pushed
|
||||
- Flaky retry cycles used
|
||||
- Remaining unresolved failures or review comments
|
||||
|
||||
## References
|
||||
|
||||
- Heuristics and decision tree: `.codex/skills/babysit-pr/references/heuristics.md`
|
||||
- GitHub CLI/API details used by the watcher: `.codex/skills/babysit-pr/references/github-api-notes.md`
|
||||
4
.codex/skills/babysit-pr/agents/openai.yaml
Normal file
@@ -0,0 +1,4 @@
|
||||
interface:
  display_name: "PR Babysitter"
  short_description: "Watch PR CI, reviews, and merge conflicts"
  default_prompt: "Babysit the current PR: monitor CI, reviewer comments, and merge-conflict status (prefer the watcher’s --watch mode for live monitoring); fix valid issues, push updates, and rerun flaky failures up to 3 times. Keep exactly one watcher session active for the PR (do not leave duplicate --watch terminals running). If you pause monitoring to patch review/CI feedback, restart --watch yourself immediately after the push in the same turn. If a watcher is still running and no strict stop condition has been reached, the task is still in progress: keep consuming watcher output and sending progress updates instead of ending the turn. Continue polling autonomously after any push/rerun until a strict terminal stop condition is reached or the user interrupts."
|
||||
72
.codex/skills/babysit-pr/references/github-api-notes.md
Normal file
@@ -0,0 +1,72 @@
|
||||
# GitHub CLI / API Notes For `babysit-pr`
|
||||
|
||||
## Primary commands used
|
||||
|
||||
### PR metadata
|
||||
|
||||
- `gh pr view --json number,url,state,mergedAt,closedAt,headRefName,headRefOid,headRepository,headRepositoryOwner,mergeable,mergeStateStatus,reviewDecision`

Used to resolve PR number, URL, branch, head SHA, closed/merged state, mergeability, and review decision.
|
||||
|
||||
### PR checks summary
|
||||
|
||||
- `gh pr checks --json name,state,bucket,link,workflow,event,startedAt,completedAt`
|
||||
|
||||
Used to compute pending/failed/passed counts and whether the current CI round is terminal.
|
||||
|
||||
### Workflow runs for head SHA
|
||||
|
||||
- `gh api repos/{owner}/{repo}/actions/runs -X GET -f head_sha=<sha> -f per_page=100`
|
||||
|
||||
Used to discover failed workflow runs and rerunnable run IDs.
|
||||
|
||||
### Failed log inspection
|
||||
|
||||
- `gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha`
|
||||
- `gh run view <run-id> --log-failed`
|
||||
|
||||
Used by Codex to classify branch-related vs flaky/unrelated failures.
|
||||
|
||||
### Retry failed jobs only
|
||||
|
||||
- `gh run rerun <run-id> --failed`
|
||||
|
||||
Reruns only failed jobs (and dependencies) for a workflow run.
|
||||
|
||||
## Review-related endpoints
|
||||
|
||||
- Issue comments on PR:
  - `gh api repos/{owner}/{repo}/issues/<pr_number>/comments?per_page=100`
- Inline PR review comments:
  - `gh api repos/{owner}/{repo}/pulls/<pr_number>/comments?per_page=100`
- Review submissions:
  - `gh api repos/{owner}/{repo}/pulls/<pr_number>/reviews?per_page=100`
|
||||
|
||||
## JSON fields consumed by the watcher
|
||||
|
||||
### `gh pr view`
|
||||
|
||||
- `number`
|
||||
- `url`
|
||||
- `state`
|
||||
- `mergedAt`
|
||||
- `closedAt`
|
||||
- `headRefName`
|
||||
- `headRefOid`
|
||||
|
||||
### `gh pr checks`
|
||||
|
||||
- `bucket` (`pass`, `fail`, `pending`, `skipping`, `cancel`)
|
||||
- `state`
|
||||
- `name`
|
||||
- `workflow`
|
||||
- `link`
|
||||
|
||||
### Actions runs API (`workflow_runs[]`)
|
||||
|
||||
- `id`
|
||||
- `name`
|
||||
- `status`
|
||||
- `conclusion`
|
||||
- `html_url`
|
||||
- `head_sha`
|
||||
58
.codex/skills/babysit-pr/references/heuristics.md
Normal file
@@ -0,0 +1,58 @@
|
||||
# CI / Review Heuristics
|
||||
|
||||
## CI classification checklist
|
||||
|
||||
Treat as **branch-related** when logs clearly indicate a regression caused by the PR branch:
|
||||
|
||||
- Compile/typecheck/lint failures in files or modules touched by the branch
|
||||
- Deterministic unit/integration test failures in changed areas
|
||||
- Snapshot output changes caused by UI/text changes in the branch
|
||||
- Static analysis violations introduced by the latest push
|
||||
- Build script/config changes in the PR causing a deterministic failure
|
||||
|
||||
Treat as **likely flaky or unrelated** when evidence points to transient or external issues:
|
||||
|
||||
- DNS/network/registry timeout errors while fetching dependencies
|
||||
- Runner image provisioning or startup failures
|
||||
- GitHub Actions infrastructure/service outages
|
||||
- Cloud/service rate limits or transient API outages
|
||||
- Non-deterministic failures in unrelated integration tests with known flake patterns
|
||||
|
||||
If uncertain, inspect failed logs once before choosing rerun.
|
||||
|
||||
## Decision tree (fix vs rerun vs stop)
|
||||
|
||||
1. If PR is merged/closed: stop.
|
||||
2. If there are failed checks:
   - Diagnose first.
   - If branch-related: fix locally, commit, push.
   - If likely flaky/unrelated and all checks for the current SHA are terminal: rerun failed jobs.
   - If checks are still pending: wait.
|
||||
3. If flaky reruns for the same SHA reach the configured limit (default 3): stop and report persistent failure.
|
||||
4. Independently, process any new human review comments.
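
The CI half of the decision tree can be written as a single function. This is a sketch only: `decide_ci_action`, its parameters, and its return labels are hypothetical, and the `branch_related` flag stands in for the manual diagnosis step, which this code does not perform.

```python
def decide_ci_action(*, merged_or_closed, failed_count, pending_count,
                     branch_related, retries_used, max_retries=3):
    """Map the current CI state to one of the decision-tree outcomes."""
    if merged_or_closed:
        return "stop"
    if failed_count == 0:
        # Nothing failed: either wait for pending checks or do nothing.
        return "wait" if pending_count else "none"
    if branch_related:
        return "fix_and_push"
    if pending_count > 0:
        # Likely flaky, but the current round is not terminal yet.
        return "wait"
    if retries_used >= max_retries:
        return "stop_and_report"
    return "rerun_failed"
```

Review comments are handled separately from this function, matching item 4 of the tree.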
|
||||
|
||||
## Review comment agreement criteria
|
||||
|
||||
Address the comment when:
|
||||
|
||||
- The comment is technically correct.
|
||||
- The change is actionable in the current branch.
|
||||
- The requested change does not conflict with the user’s intent or recent guidance.
|
||||
- The change can be made safely without unrelated refactors.
|
||||
|
||||
Do not auto-fix when:
|
||||
|
||||
- The comment is ambiguous and needs clarification.
|
||||
- The request conflicts with explicit user instructions.
|
||||
- The proposed change requires product/design decisions the user has not made.
|
||||
- The codebase is in a dirty/unrelated state that makes safe editing uncertain.
|
||||
|
||||
## Stop-and-ask conditions
|
||||
|
||||
Stop and ask the user instead of continuing automatically when:
|
||||
|
||||
- The local worktree has unrelated uncommitted changes.
|
||||
- `gh` auth/permissions fail.
|
||||
- The PR branch cannot be pushed.
|
||||
- CI failures persist after the flaky retry budget.
|
||||
- Reviewer feedback requires a product decision or cross-team coordination.
|
||||
805
.codex/skills/babysit-pr/scripts/gh_pr_watch.py
Executable file
@@ -0,0 +1,805 @@
|
||||
#!/usr/bin/env python3
"""Watch GitHub PR CI and review activity for Codex PR babysitting workflows."""

import argparse
import json
import os
import re
import subprocess
import sys
import tempfile
import time
from pathlib import Path
from urllib.parse import urlparse
|
||||
|
||||
FAILED_RUN_CONCLUSIONS = {
    "failure",
    "timed_out",
    "cancelled",
    "action_required",
    "startup_failure",
    "stale",
}
PENDING_CHECK_STATES = {
    "QUEUED",
    "IN_PROGRESS",
    "PENDING",
    "WAITING",
    "REQUESTED",
}
REVIEW_BOT_LOGIN_KEYWORDS = {
    "codex",
}
TRUSTED_AUTHOR_ASSOCIATIONS = {
    "OWNER",
    "MEMBER",
    "COLLABORATOR",
}
MERGE_BLOCKING_REVIEW_DECISIONS = {
    "REVIEW_REQUIRED",
    "CHANGES_REQUESTED",
}
MERGE_CONFLICT_OR_BLOCKING_STATES = {
    "BLOCKED",
    "DIRTY",
    "DRAFT",
    "UNKNOWN",
}
GREEN_STATE_MAX_POLL_SECONDS = 60 * 60
|
||||
|
||||
|
||||
class GhCommandError(RuntimeError):
    pass
|
||||
|
||||
|
||||
def parse_args():
    parser = argparse.ArgumentParser(
        description=(
            "Normalize PR/CI/review state for Codex PR babysitting and optionally "
            "trigger flaky reruns."
        )
    )
    parser.add_argument("--pr", default="auto", help="auto, PR number, or PR URL")
    parser.add_argument("--repo", help="Optional OWNER/REPO override")
    parser.add_argument("--poll-seconds", type=int, default=30, help="Watch poll interval")
    parser.add_argument(
        "--max-flaky-retries",
        type=int,
        default=3,
        help="Max rerun cycles per head SHA before stop recommendation",
    )
    parser.add_argument("--state-file", help="Path to state JSON file")
    parser.add_argument("--once", action="store_true", help="Emit one snapshot and exit")
    parser.add_argument("--watch", action="store_true", help="Continuously emit JSONL snapshots")
    parser.add_argument(
        "--retry-failed-now",
        action="store_true",
        help="Rerun failed jobs for current failed workflow runs when policy allows",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        help="Emit machine-readable output (default behavior for --once and --retry-failed-now)",
    )
    args = parser.parse_args()

    if args.poll_seconds <= 0:
        parser.error("--poll-seconds must be > 0")
    if args.max_flaky_retries < 0:
        parser.error("--max-flaky-retries must be >= 0")
    if args.watch and args.retry_failed_now:
        parser.error("--watch cannot be combined with --retry-failed-now")
    if not args.once and not args.watch and not args.retry_failed_now:
        args.once = True
    return args
|
||||
|
||||
|
||||
def _format_gh_error(cmd, err):
    stdout = (err.stdout or "").strip()
    stderr = (err.stderr or "").strip()
    parts = [f"GitHub CLI command failed: {' '.join(cmd)}"]
    if stdout:
        parts.append(f"stdout: {stdout}")
    if stderr:
        parts.append(f"stderr: {stderr}")
    return "\n".join(parts)
|
||||
|
||||
|
||||
def gh_text(args, repo=None):
    cmd = ["gh"]
    # `gh api` does not accept `-R/--repo` on all gh versions. The watcher's
    # API calls use explicit endpoints (e.g. repos/{owner}/{repo}/...), so the
    # repo flag is unnecessary there.
    if repo and (not args or args[0] != "api"):
        cmd.extend(["-R", repo])
    cmd.extend(args)
    try:
        proc = subprocess.run(cmd, check=True, capture_output=True, text=True)
    except FileNotFoundError as err:
        raise GhCommandError("`gh` command not found") from err
    except subprocess.CalledProcessError as err:
        raise GhCommandError(_format_gh_error(cmd, err)) from err
    return proc.stdout
|
||||
|
||||
|
||||
def gh_json(args, repo=None):
    raw = gh_text(args, repo=repo).strip()
    if not raw:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        raise GhCommandError(f"Failed to parse JSON from gh output for {' '.join(args)}") from err
|
||||
|
||||
|
||||
def parse_pr_spec(pr_spec):
    if pr_spec == "auto":
        return {"mode": "auto", "value": None}
    if re.fullmatch(r"\d+", pr_spec):
        return {"mode": "number", "value": pr_spec}
    parsed = urlparse(pr_spec)
    if parsed.scheme and parsed.netloc and "/pull/" in parsed.path:
        return {"mode": "url", "value": pr_spec}
    raise ValueError("--pr must be 'auto', a PR number, or a PR URL")
|
||||
|
||||
|
||||
def pr_view_fields():
    return (
        "number,url,state,mergedAt,closedAt,headRefName,headRefOid,"
        "headRepository,headRepositoryOwner,mergeable,mergeStateStatus,reviewDecision"
    )
|
||||
|
||||
|
||||
def checks_fields():
    return "name,state,bucket,link,workflow,event,startedAt,completedAt"
|
||||
|
||||
|
||||
def resolve_pr(pr_spec, repo_override=None):
    parsed = parse_pr_spec(pr_spec)
    cmd = ["pr", "view"]
    if parsed["value"] is not None:
        cmd.append(parsed["value"])
    cmd.extend(["--json", pr_view_fields()])
    data = gh_json(cmd, repo=repo_override)
    if not isinstance(data, dict):
        raise GhCommandError("Unexpected PR payload from `gh pr view`")

    pr_url = str(data.get("url") or "")
    repo = (
        repo_override
        or extract_repo_from_pr_url(pr_url)
        or extract_repo_from_pr_view(data)
    )
    if not repo:
        raise GhCommandError("Unable to determine OWNER/REPO for the PR")

    state = str(data.get("state") or "")
    merged = bool(data.get("mergedAt"))
    closed = bool(data.get("closedAt")) or state.upper() == "CLOSED"

    return {
        "number": int(data["number"]),
        "url": pr_url,
        "repo": repo,
        "head_sha": str(data.get("headRefOid") or ""),
        "head_branch": str(data.get("headRefName") or ""),
        "state": state,
        "merged": merged,
        "closed": closed,
        "mergeable": str(data.get("mergeable") or ""),
        "merge_state_status": str(data.get("mergeStateStatus") or ""),
        "review_decision": str(data.get("reviewDecision") or ""),
    }
|
||||
|
||||
|
||||
def extract_repo_from_pr_view(data):
    head_repo = data.get("headRepository")
    head_owner = data.get("headRepositoryOwner")
    owner = None
    name = None
    if isinstance(head_owner, dict):
        owner = head_owner.get("login") or head_owner.get("name")
    elif isinstance(head_owner, str):
        owner = head_owner
    if isinstance(head_repo, dict):
        name = head_repo.get("name")
        repo_owner = head_repo.get("owner")
        if not owner and isinstance(repo_owner, dict):
            owner = repo_owner.get("login") or repo_owner.get("name")
    elif isinstance(head_repo, str):
        name = head_repo
    if owner and name:
        return f"{owner}/{name}"
    return None


def extract_repo_from_pr_url(pr_url):
    parsed = urlparse(pr_url)
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) >= 4 and parts[2] == "pull":
        return f"{parts[0]}/{parts[1]}"
    return None
|
||||
|
||||
|
||||
def load_state(path):
    if path.exists():
        try:
            data = json.loads(path.read_text())
        except json.JSONDecodeError as err:
            raise RuntimeError(f"State file is not valid JSON: {path}") from err
        if not isinstance(data, dict):
            raise RuntimeError(f"State file must contain an object: {path}")
        return data, False
    return {
        "pr": {},
        "started_at": None,
        "last_seen_head_sha": None,
        "retries_by_sha": {},
        "seen_issue_comment_ids": [],
        "seen_review_comment_ids": [],
        "seen_review_ids": [],
        "last_snapshot_at": None,
    }, True
|
||||
|
||||
|
||||
def save_state(path, state):
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(state, indent=2, sort_keys=True) + "\n"
    fd, tmp_name = tempfile.mkstemp(prefix=f"{path.name}.", suffix=".tmp", dir=path.parent)
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp_file:
            tmp_file.write(payload)
        os.replace(tmp_path, path)
    except Exception:
        try:
            tmp_path.unlink(missing_ok=True)
        except OSError:
            pass
        raise
|
||||
|
||||
|
||||
def default_state_file_for(pr):
    repo_slug = pr["repo"].replace("/", "-")
    return Path(f"/tmp/codex-babysit-pr-{repo_slug}-pr{pr['number']}.json")
|
||||
|
||||
|
||||
def get_pr_checks(pr_spec, repo):
    parsed = parse_pr_spec(pr_spec)
    cmd = ["pr", "checks"]
    if parsed["value"] is not None:
        cmd.append(parsed["value"])
    cmd.extend(["--json", checks_fields()])
    data = gh_json(cmd, repo=repo)
    if data is None:
        return []
    if not isinstance(data, list):
        raise GhCommandError("Unexpected payload from `gh pr checks`")
    return data
|
||||
|
||||
|
||||
def is_pending_check(check):
    bucket = str(check.get("bucket") or "").lower()
    state = str(check.get("state") or "").upper()
    return bucket == "pending" or state in PENDING_CHECK_STATES
|
||||
|
||||
|
||||
def summarize_checks(checks):
    pending_count = 0
    failed_count = 0
    passed_count = 0
    for check in checks:
        bucket = str(check.get("bucket") or "").lower()
        if is_pending_check(check):
            pending_count += 1
        if bucket == "fail":
            failed_count += 1
        if bucket == "pass":
            passed_count += 1
    return {
        "pending_count": pending_count,
        "failed_count": failed_count,
        "passed_count": passed_count,
        "all_terminal": pending_count == 0,
    }
|
||||
|
||||
|
||||
def get_workflow_runs_for_sha(repo, head_sha):
    endpoint = f"repos/{repo}/actions/runs"
    data = gh_json(
        ["api", endpoint, "-X", "GET", "-f", f"head_sha={head_sha}", "-f", "per_page=100"],
        repo=repo,
    )
    if not isinstance(data, dict):
        raise GhCommandError("Unexpected payload from actions runs API")
    runs = data.get("workflow_runs") or []
    if not isinstance(runs, list):
        raise GhCommandError("Expected `workflow_runs` to be a list")
    return runs
|
||||
|
||||
|
||||
def failed_runs_from_workflow_runs(runs, head_sha):
    failed_runs = []
    for run in runs:
        if not isinstance(run, dict):
            continue
        if str(run.get("head_sha") or "") != head_sha:
            continue
        conclusion = str(run.get("conclusion") or "")
        if conclusion not in FAILED_RUN_CONCLUSIONS:
            continue
        failed_runs.append(
            {
                "run_id": run.get("id"),
                "workflow_name": run.get("name") or run.get("display_title") or "",
                "status": str(run.get("status") or ""),
                "conclusion": conclusion,
                "html_url": str(run.get("html_url") or ""),
            }
        )
    failed_runs.sort(key=lambda item: (str(item.get("workflow_name") or ""), str(item.get("run_id") or "")))
    return failed_runs
|
||||
|
||||
|
||||
def get_authenticated_login():
    data = gh_json(["api", "user"])
    if not isinstance(data, dict) or not data.get("login"):
        raise GhCommandError("Unable to determine authenticated GitHub login from `gh api user`")
    return str(data["login"])
|
||||
|
||||
|
||||
def comment_endpoints(repo, pr_number):
|
||||
return {
|
||||
"issue_comment": f"repos/{repo}/issues/{pr_number}/comments",
|
||||
"review_comment": f"repos/{repo}/pulls/{pr_number}/comments",
|
||||
"review": f"repos/{repo}/pulls/{pr_number}/reviews",
|
||||
}
|
||||
|
||||
|
||||
def gh_api_list_paginated(endpoint, repo=None, per_page=100):
    items = []
    page = 1
    while True:
        sep = "&" if "?" in endpoint else "?"
        page_endpoint = f"{endpoint}{sep}per_page={per_page}&page={page}"
        payload = gh_json(["api", page_endpoint], repo=repo)
        if payload is None:
            break
        if not isinstance(payload, list):
            raise GhCommandError(f"Unexpected paginated payload from gh api {endpoint}")
        items.extend(payload)
        if len(payload) < per_page:
            break
        page += 1
    return items


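The pagination loop in `gh_api_list_paginated` stops at the first short or empty page. A minimal sketch of that termination rule follows, assuming a hypothetical in-memory `make_fake_fetch` helper in place of `gh_json`; both `paginate` and `make_fake_fetch` are illustrative names and are not part of the script.

```python
def paginate(fetch_page, per_page=100):
    """Collect items page by page until a short or empty page is returned."""
    items = []
    page = 1
    while True:
        payload = fetch_page(page, per_page)
        if not payload:
            break
        items.extend(payload)
        # A page shorter than per_page means there is nothing left to fetch.
        if len(payload) < per_page:
            break
        page += 1
    return items


def make_fake_fetch(total):
    """Hypothetical stand-in for the API: serves `total` integers and records calls."""
    data = list(range(total))
    calls = []

    def fetch_page(page, per_page):
        calls.append(page)
        start = (page - 1) * per_page
        return data[start:start + per_page]

    return fetch_page, calls


fetch, calls = make_fake_fetch(250)
result = paginate(fetch, per_page=100)
print(len(result), calls)
```

When the total is an exact multiple of `per_page`, this rule costs one extra request, because only the following empty page signals the end.
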
def normalize_issue_comments(items):
    out = []
    for item in items:
        if not isinstance(item, dict):
            continue
        out.append(
            {
                "kind": "issue_comment",
                "id": str(item.get("id") or ""),
                "author": extract_login(item.get("user")),
                "author_association": str(item.get("author_association") or ""),
                "created_at": str(item.get("created_at") or ""),
                "body": str(item.get("body") or ""),
                "path": None,
                "line": None,
                "url": str(item.get("html_url") or ""),
            }
        )
    return out


def normalize_review_comments(items):
    out = []
    for item in items:
        if not isinstance(item, dict):
            continue
        line = item.get("line")
        if line is None:
            line = item.get("original_line")
        out.append(
            {
                "kind": "review_comment",
                "id": str(item.get("id") or ""),
                "author": extract_login(item.get("user")),
                "author_association": str(item.get("author_association") or ""),
                "created_at": str(item.get("created_at") or ""),
                "body": str(item.get("body") or ""),
                "path": item.get("path"),
                "line": line,
                "url": str(item.get("html_url") or ""),
            }
        )
    return out


def normalize_reviews(items):
    out = []
    for item in items:
        if not isinstance(item, dict):
            continue
        out.append(
            {
                "kind": "review",
                "id": str(item.get("id") or ""),
                "author": extract_login(item.get("user")),
                "author_association": str(item.get("author_association") or ""),
                "created_at": str(item.get("submitted_at") or item.get("created_at") or ""),
                "body": str(item.get("body") or ""),
                "path": None,
                "line": None,
                "url": str(item.get("html_url") or ""),
            }
        )
    return out


def extract_login(user_obj):
    if isinstance(user_obj, dict):
        return str(user_obj.get("login") or "")
    return ""


def is_bot_login(login):
    return bool(login) and login.endswith("[bot]")


def is_actionable_review_bot_login(login):
    if not is_bot_login(login):
        return False
    lower_login = login.lower()
    return any(keyword in lower_login for keyword in REVIEW_BOT_LOGIN_KEYWORDS)


def is_trusted_human_review_author(item, authenticated_login):
    author = str(item.get("author") or "")
    if not author:
        return False
    if authenticated_login and author == authenticated_login:
        return True
    association = str(item.get("author_association") or "").upper()
    return association in TRUSTED_AUTHOR_ASSOCIATIONS


def fetch_new_review_items(pr, state, fresh_state, authenticated_login=None):
    repo = pr["repo"]
    pr_number = pr["number"]
    endpoints = comment_endpoints(repo, pr_number)

    issue_payload = gh_api_list_paginated(endpoints["issue_comment"], repo=repo)
    review_comment_payload = gh_api_list_paginated(endpoints["review_comment"], repo=repo)
    review_payload = gh_api_list_paginated(endpoints["review"], repo=repo)

    issue_items = normalize_issue_comments(issue_payload)
    review_comment_items = normalize_review_comments(review_comment_payload)
    review_items = normalize_reviews(review_payload)
    all_items = issue_items + review_comment_items + review_items

    seen_issue = {str(x) for x in state.get("seen_issue_comment_ids") or []}
    seen_review_comment = {str(x) for x in state.get("seen_review_comment_ids") or []}
    seen_review = {str(x) for x in state.get("seen_review_ids") or []}

    # On a brand-new state file, surface existing review activity instead of
    # silently treating it as seen. This avoids missing already-pending review
    # feedback when monitoring starts after comments were posted.

    new_items = []
    for item in all_items:
        item_id = item.get("id")
        if not item_id:
            continue
        author = item.get("author") or ""
        if not author:
            continue
        if is_bot_login(author):
            if not is_actionable_review_bot_login(author):
                continue
        elif not is_trusted_human_review_author(item, authenticated_login):
            continue

        kind = item["kind"]
        if kind == "issue_comment" and item_id in seen_issue:
            continue
        if kind == "review_comment" and item_id in seen_review_comment:
            continue
        if kind == "review" and item_id in seen_review:
            continue

        new_items.append(item)
        if kind == "issue_comment":
            seen_issue.add(item_id)
        elif kind == "review_comment":
            seen_review_comment.add(item_id)
        elif kind == "review":
            seen_review.add(item_id)

    new_items.sort(key=lambda item: (item.get("created_at") or "", item.get("kind") or "", item.get("id") or ""))
    state["seen_issue_comment_ids"] = sorted(seen_issue)
    state["seen_review_comment_ids"] = sorted(seen_review_comment)
    state["seen_review_ids"] = sorted(seen_review)
    return new_items


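`fetch_new_review_items` surfaces an item only if its author passes the bot or trusted-human filter and its ID has not been seen before. A minimal sketch of that combined rule follows. The two constants are defined elsewhere in the script, so the values below are hypothetical placeholders, and `should_surface` and `filter_new` are illustrative names. The sketch uses one seen set instead of the script's three per-kind sets.

```python
# Hypothetical values for illustration only; the script defines its own constants.
REVIEW_BOT_LOGIN_KEYWORDS = ("review",)
TRUSTED_AUTHOR_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}


def should_surface(item, authenticated_login=None):
    """Return True when the item's author is an actionable bot or a trusted human."""
    author = item.get("author") or ""
    if not author:
        return False
    if author.endswith("[bot]"):
        lowered = author.lower()
        return any(keyword in lowered for keyword in REVIEW_BOT_LOGIN_KEYWORDS)
    if authenticated_login and author == authenticated_login:
        return True
    association = str(item.get("author_association") or "").upper()
    return association in TRUSTED_AUTHOR_ASSOCIATIONS


def filter_new(items, seen_ids, authenticated_login=None):
    """Keep unseen items from accepted authors and record their IDs as seen."""
    new_items = []
    for item in items:
        item_id = item.get("id")
        if not item_id or item_id in seen_ids:
            continue
        if not should_surface(item, authenticated_login):
            continue
        new_items.append(item)
        seen_ids.add(item_id)
    return new_items


items = [
    {"id": "1", "author": "alice", "author_association": "MEMBER"},
    {"id": "2", "author": "drive-by", "author_association": "NONE"},
    {"id": "3", "author": "some-review-helper[bot]"},
    {"id": "4", "author": "dependabot[bot]"},
]
seen = set()
first = filter_new(items, seen)
second = filter_new(items, seen)
print([item["id"] for item in first], second)
```

Items from untrusted authors are never added to the seen set. If their association later changes to a trusted one, they would be surfaced on the next poll.
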
def current_retry_count(state, head_sha):
    retries = state.get("retries_by_sha") or {}
    value = retries.get(head_sha, 0)
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def set_retry_count(state, head_sha, count):
    retries = state.get("retries_by_sha")
    if not isinstance(retries, dict):
        retries = {}
    retries[head_sha] = int(count)
    state["retries_by_sha"] = retries


def unique_actions(actions):
    out = []
    seen = set()
    for action in actions:
        if action not in seen:
            out.append(action)
            seen.add(action)
    return out


def is_pr_ready_to_merge(pr, checks_summary, new_review_items):
    if pr["closed"] or pr["merged"]:
        return False
    if not checks_summary["all_terminal"]:
        return False
    if checks_summary["failed_count"] > 0 or checks_summary["pending_count"] > 0:
        return False
    if new_review_items:
        return False
    if str(pr.get("mergeable") or "") != "MERGEABLE":
        return False
    if str(pr.get("merge_state_status") or "") in MERGE_CONFLICT_OR_BLOCKING_STATES:
        return False
    if str(pr.get("review_decision") or "") in MERGE_BLOCKING_REVIEW_DECISIONS:
        return False
    return True


def recommend_actions(pr, checks_summary, failed_runs, new_review_items, retries_used, max_retries):
    actions = []
    if pr["closed"] or pr["merged"]:
        if new_review_items:
            actions.append("process_review_comment")
        actions.append("stop_pr_closed")
        return unique_actions(actions)

    if is_pr_ready_to_merge(pr, checks_summary, new_review_items):
        actions.append("stop_ready_to_merge")
        return unique_actions(actions)

    if new_review_items:
        actions.append("process_review_comment")

    has_failed_pr_checks = checks_summary["failed_count"] > 0
    if has_failed_pr_checks:
        if checks_summary["all_terminal"] and retries_used >= max_retries:
            actions.append("stop_exhausted_retries")
        else:
            actions.append("diagnose_ci_failure")
            if checks_summary["all_terminal"] and failed_runs and retries_used < max_retries:
                actions.append("retry_failed_checks")

    if not actions:
        actions.append("idle")
    return unique_actions(actions)


def collect_snapshot(args):
    pr = resolve_pr(args.pr, repo_override=args.repo)
    state_path = Path(args.state_file) if args.state_file else default_state_file_for(pr)
    state, fresh_state = load_state(state_path)

    if not state.get("started_at"):
        state["started_at"] = int(time.time())

    # `gh pr checks -R <repo>` requires an explicit PR/branch/url argument.
    # After resolving `--pr auto`, reuse the concrete PR number.
    checks = get_pr_checks(str(pr["number"]), repo=pr["repo"])
    checks_summary = summarize_checks(checks)
    workflow_runs = get_workflow_runs_for_sha(pr["repo"], pr["head_sha"])
    failed_runs = failed_runs_from_workflow_runs(workflow_runs, pr["head_sha"])
    authenticated_login = get_authenticated_login()
    new_review_items = fetch_new_review_items(
        pr,
        state,
        fresh_state=fresh_state,
        authenticated_login=authenticated_login,
    )

    retries_used = current_retry_count(state, pr["head_sha"])
    actions = recommend_actions(
        pr,
        checks_summary,
        failed_runs,
        new_review_items,
        retries_used,
        args.max_flaky_retries,
    )

    state["pr"] = {"repo": pr["repo"], "number": pr["number"]}
    state["last_seen_head_sha"] = pr["head_sha"]
    state["last_snapshot_at"] = int(time.time())
    save_state(state_path, state)

    snapshot = {
        "pr": pr,
        "checks": checks_summary,
        "failed_runs": failed_runs,
        "new_review_items": new_review_items,
        "actions": actions,
        "retry_state": {
            "current_sha_retries_used": retries_used,
            "max_flaky_retries": args.max_flaky_retries,
        },
    }
    return snapshot, state_path


def retry_failed_now(args):
    snapshot, state_path = collect_snapshot(args)
    pr = snapshot["pr"]
    checks_summary = snapshot["checks"]
    failed_runs = snapshot["failed_runs"]
    retries_used = snapshot["retry_state"]["current_sha_retries_used"]
    max_retries = snapshot["retry_state"]["max_flaky_retries"]

    result = {
        "snapshot": snapshot,
        "state_file": str(state_path),
        "rerun_attempted": False,
        "rerun_count": 0,
        "rerun_run_ids": [],
        "reason": None,
    }

    if pr["closed"] or pr["merged"]:
        result["reason"] = "pr_closed"
        return result
    if checks_summary["failed_count"] <= 0:
        result["reason"] = "no_failed_pr_checks"
        return result
    if not failed_runs:
        result["reason"] = "no_failed_runs"
        return result
    if not checks_summary["all_terminal"]:
        result["reason"] = "checks_still_pending"
        return result
    if retries_used >= max_retries:
        result["reason"] = "retry_budget_exhausted"
        return result

    for run in failed_runs:
        run_id = run.get("run_id")
        if run_id in (None, ""):
            continue
        gh_text(["run", "rerun", str(run_id), "--failed"], repo=pr["repo"])
        result["rerun_run_ids"].append(run_id)

    if result["rerun_run_ids"]:
        state, _ = load_state(state_path)
        new_count = current_retry_count(state, pr["head_sha"]) + 1
        set_retry_count(state, pr["head_sha"], new_count)
        state["last_snapshot_at"] = int(time.time())
        save_state(state_path, state)
        result["rerun_attempted"] = True
        result["rerun_count"] = len(result["rerun_run_ids"])
        result["reason"] = "rerun_triggered"
    else:
        result["reason"] = "failed_runs_missing_ids"

    return result


def print_json(obj):
    sys.stdout.write(json.dumps(obj, sort_keys=True) + "\n")
    sys.stdout.flush()


def print_event(event, payload):
    print_json({"event": event, "payload": payload})


def is_ci_green(snapshot):
    checks = snapshot.get("checks") or {}
    return (
        bool(checks.get("all_terminal"))
        and int(checks.get("failed_count") or 0) == 0
        and int(checks.get("pending_count") or 0) == 0
    )


def snapshot_change_key(snapshot):
    pr = snapshot.get("pr") or {}
    checks = snapshot.get("checks") or {}
    review_items = snapshot.get("new_review_items") or []
    return (
        str(pr.get("head_sha") or ""),
        str(pr.get("state") or ""),
        str(pr.get("mergeable") or ""),
        str(pr.get("merge_state_status") or ""),
        str(pr.get("review_decision") or ""),
        int(checks.get("passed_count") or 0),
        int(checks.get("failed_count") or 0),
        int(checks.get("pending_count") or 0),
        tuple(
            (str(item.get("kind") or ""), str(item.get("id") or ""))
            for item in review_items
            if isinstance(item, dict)
        ),
        tuple(snapshot.get("actions") or []),
    )


def run_watch(args):
    poll_seconds = args.poll_seconds
    last_change_key = None
    while True:
        snapshot, state_path = collect_snapshot(args)
        print_event(
            "snapshot",
            {
                "snapshot": snapshot,
                "state_file": str(state_path),
                "next_poll_seconds": poll_seconds,
            },
        )
        actions = set(snapshot.get("actions") or [])
        if (
            "stop_pr_closed" in actions
            or "stop_exhausted_retries" in actions
            or "stop_ready_to_merge" in actions
        ):
            print_event("stop", {"actions": snapshot.get("actions"), "pr": snapshot.get("pr")})
            return 0

        current_change_key = snapshot_change_key(snapshot)
        changed = current_change_key != last_change_key
        green = is_ci_green(snapshot)

        if not green:
            poll_seconds = args.poll_seconds
        elif changed or last_change_key is None:
            poll_seconds = args.poll_seconds
        else:
            poll_seconds = min(poll_seconds * 2, GREEN_STATE_MAX_POLL_SECONDS)

        last_change_key = current_change_key
        time.sleep(poll_seconds)


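`run_watch` polls at the base interval while CI is not green or the snapshot changed, and doubles the interval up to a cap while a green snapshot stays unchanged. A minimal sketch of that interval rule in isolation follows. The helper name `next_poll_seconds`, the base of 30 seconds, and the cap of 300 seconds are assumptions for illustration. The real cap is `GREEN_STATE_MAX_POLL_SECONDS`, which is defined elsewhere in the script.

```python
def next_poll_seconds(current, base, green, changed, cap):
    """Reset to the base interval unless CI is green and nothing changed."""
    if not green or changed:
        return base
    # Green and unchanged: back off exponentially, bounded by the cap.
    return min(current * 2, cap)


# (green, changed) observations over successive polls.
observations = [
    (False, True),
    (True, True),
    (True, False),
    (True, False),
    (True, False),
    (True, False),
    (False, True),
]

intervals = []
current = 30
for green, changed in observations:
    current = next_poll_seconds(current, base=30, green=green, changed=changed, cap=300)
    intervals.append(current)
print(intervals)  # [30, 30, 60, 120, 240, 300, 30]
```

Resetting on any change keeps the watcher responsive to new commits or review activity even after a long quiet period.
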
def main():
    args = parse_args()
    try:
        if args.retry_failed_now:
            print_json(retry_failed_now(args))
            return 0
        if args.watch:
            return run_watch(args)
        snapshot, state_path = collect_snapshot(args)
        snapshot["state_file"] = str(state_path)
        print_json(snapshot)
        return 0
    except (GhCommandError, RuntimeError, ValueError) as err:
        sys.stderr.write(f"gh_pr_watch.py error: {err}\n")
        return 1
    except KeyboardInterrupt:
        sys.stderr.write("gh_pr_watch.py interrupted\n")
        return 130


if __name__ == "__main__":
    raise SystemExit(main())

1
codex-rs/Cargo.lock
generated
@@ -1787,6 +1787,7 @@ dependencies = [
 "codex-protocol",
 "codex-shell-command",
 "codex-utils-cargo-bin",
 "core_test_support",
 "exec_server_test_support",
 "libc",
 "maplit",

@@ -23,10 +23,12 @@ struct AppServerArgs {
 }
 
 fn main() -> anyhow::Result<()> {
-    if codex_core::maybe_run_zsh_exec_wrapper_mode()? {
-        return Ok(());
-    }
     arg0_dispatch_or_else(|codex_linux_sandbox_exe| async move {
+        // Run wrapper mode only after arg0 dispatch so `codex-linux-sandbox`
+        // invocations don't get misclassified as zsh exec-wrapper calls.
+        if codex_core::maybe_run_zsh_exec_wrapper_mode()? {
+            return Ok(());
+        }
         let args = AppServerArgs::parse();
         let managed_config_path = managed_config_path_from_debug_env();
         let loader_overrides = LoaderOverrides {

@@ -2,18 +2,15 @@
|
||||
//
|
||||
// Running these tests with the patched zsh fork:
|
||||
//
|
||||
// The suite uses `CODEX_TEST_ZSH_PATH` when set. Example:
|
||||
// CODEX_TEST_ZSH_PATH="$HOME/.local/codex-zsh-77045ef/bin/zsh" \
|
||||
// cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture
|
||||
//
|
||||
// For a single test:
|
||||
// CODEX_TEST_ZSH_PATH="$HOME/.local/codex-zsh-77045ef/bin/zsh" \
|
||||
// cargo test -p codex-app-server turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2 -- --nocapture
|
||||
// The suite resolves the shared test-only zsh DotSlash file at
|
||||
// `exec-server/tests/suite/zsh` via DotSlash on first use, so `dotslash` and
|
||||
// network access are required the first time the artifact is fetched.
|
||||
|
||||
use anyhow::Result;
|
||||
use app_test_support::McpProcess;
|
||||
use app_test_support::create_final_assistant_message_sse_response;
|
||||
use app_test_support::create_mock_responses_server_sequence;
|
||||
use app_test_support::create_mock_responses_server_sequence_unchecked;
|
||||
use app_test_support::create_shell_command_sse_response;
|
||||
use app_test_support::to_response;
|
||||
use codex_app_server_protocol::CommandExecutionApprovalDecision;
|
||||
@@ -38,6 +35,7 @@ use core_test_support::responses;
|
||||
use core_test_support::skip_if_no_network;
|
||||
use pretty_assertions::assert_eq;
|
||||
use std::collections::BTreeMap;
|
||||
use std::os::unix::fs::PermissionsExt;
|
||||
use std::path::Path;
|
||||
use tempfile::TempDir;
|
||||
use tokio::time::timeout;
|
||||
@@ -57,7 +55,7 @@ async fn turn_start_shell_zsh_fork_executes_command_v2() -> Result<()> {
|
||||
let workspace = tmp.path().join("workspace");
|
||||
std::fs::create_dir(&workspace)?;
|
||||
|
||||
let Some(zsh_path) = find_test_zsh_path() else {
|
||||
let Some(zsh_path) = find_test_zsh_path()? else {
|
||||
eprintln!("skipping zsh fork test: no zsh executable found");
|
||||
return Ok(());
|
||||
};
|
||||
@@ -82,7 +80,7 @@ async fn turn_start_shell_zsh_fork_executes_command_v2() -> Result<()> {
|
||||
&zsh_path,
|
||||
)?;
|
||||
|
||||
let mut mcp = McpProcess::new(&codex_home).await?;
|
||||
let mut mcp = create_zsh_test_mcp_process(&codex_home, &workspace).await?;
|
||||
timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;
|
||||
|
||||
let start_id = mcp
|
||||
@@ -167,7 +165,7 @@ async fn turn_start_shell_zsh_fork_exec_approval_decline_v2() -> Result<()> {
|
||||
let workspace = tmp.path().join("workspace");
|
||||
std::fs::create_dir(&workspace)?;
|
||||
|
||||
let Some(zsh_path) = find_test_zsh_path() else {
|
||||
let Some(zsh_path) = find_test_zsh_path()? else {
|
||||
eprintln!("skipping zsh fork decline test: no zsh executable found");
|
||||
return Ok(());
|
||||
};
|
||||
@@ -199,7 +197,7 @@ async fn turn_start_shell_zsh_fork_exec_approval_decline_v2() -> Result<()> {
|
||||
&zsh_path,
|
||||
)?;
|
||||
|
||||
let mut mcp = McpProcess::new(&codex_home).await?;
|
||||
let mut mcp = create_zsh_test_mcp_process(&codex_home, &workspace).await?;
|
||||
timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;
|
||||
|
||||
let start_id = mcp
|
||||
@@ -303,7 +301,7 @@ async fn turn_start_shell_zsh_fork_exec_approval_cancel_v2() -> Result<()> {
|
||||
let workspace = tmp.path().join("workspace");
|
||||
std::fs::create_dir(&workspace)?;
|
||||
|
||||
let Some(zsh_path) = find_test_zsh_path() else {
|
||||
let Some(zsh_path) = find_test_zsh_path()? else {
|
||||
eprintln!("skipping zsh fork cancel test: no zsh executable found");
|
||||
return Ok(());
|
||||
};
|
||||
@@ -332,7 +330,7 @@ async fn turn_start_shell_zsh_fork_exec_approval_cancel_v2() -> Result<()> {
|
||||
&zsh_path,
|
||||
)?;
|
||||
|
||||
let mut mcp = McpProcess::new(&codex_home).await?;
|
||||
let mut mcp = create_zsh_test_mcp_process(&codex_home, &workspace).await?;
|
||||
timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;
|
||||
|
||||
let start_id = mcp
|
||||
@@ -434,7 +432,7 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
|
||||
let workspace = tmp.path().join("workspace");
|
||||
std::fs::create_dir(&workspace)?;
|
||||
|
||||
let Some(zsh_path) = find_test_zsh_path() else {
|
||||
let Some(zsh_path) = find_test_zsh_path()? else {
|
||||
eprintln!("skipping zsh fork subcommand decline test: no zsh executable found");
|
||||
return Ok(());
|
||||
};
|
||||
@@ -446,6 +444,29 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
|
||||
return Ok(());
|
||||
}
|
||||
eprintln!("using zsh path for zsh-fork test: {}", zsh_path.display());
|
||||
let zsh_path_for_config = {
|
||||
// App-server config accepts only a zsh path, not extra argv. Use a
|
||||
// wrapper so this test can force `-df` and downgrade `-lc` to `-c`
|
||||
// to avoid rc/login-shell startup noise.
|
||||
let path = workspace.join("zsh-no-rc");
|
||||
std::fs::write(
|
||||
&path,
|
||||
format!(
|
||||
r#"#!/bin/sh
|
||||
if [ "$1" = "-lc" ]; then
|
||||
shift
|
||||
set -- -c "$@"
|
||||
fi
|
||||
exec "{}" -df "$@"
|
||||
"#,
|
||||
zsh_path.display()
|
||||
),
|
||||
)?;
|
||||
let mut permissions = std::fs::metadata(&path)?.permissions();
|
||||
permissions.set_mode(0o755);
|
||||
std::fs::set_permissions(&path, permissions)?;
|
||||
path
|
||||
};
|
||||
|
||||
let tool_call_arguments = serde_json::to_string(&serde_json::json!({
|
||||
"command": "/usr/bin/true && /usr/bin/true",
|
||||
@@ -461,7 +482,16 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
|
||||
),
|
||||
responses::ev_completed("resp-1"),
|
||||
]);
|
||||
let server = create_mock_responses_server_sequence(vec![response]).await;
|
||||
let no_op_response = responses::sse(vec![
|
||||
responses::ev_response_created("resp-2"),
|
||||
responses::ev_completed("resp-2"),
|
||||
]);
|
||||
// Linux CI has occasionally issued a second `/responses` POST after the
|
||||
// subcommand-decline flow. This test is about approval/decline behavior in
|
||||
// the zsh fork, not exact model request count, so allow an extra request
|
||||
// and return a harmless no-op response if it arrives.
|
||||
let server =
|
||||
create_mock_responses_server_sequence_unchecked(vec![response, no_op_response]).await;
|
||||
create_config_toml(
|
||||
&codex_home,
|
||||
&server.uri(),
|
||||
@@ -471,10 +501,10 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
|
||||
(Feature::UnifiedExec, false),
|
||||
(Feature::ShellSnapshot, false),
|
||||
]),
|
||||
&zsh_path,
|
||||
&zsh_path_for_config,
|
||||
)?;
|
||||
|
||||
let mut mcp = McpProcess::new(&codex_home).await?;
|
||||
let mut mcp = create_zsh_test_mcp_process(&codex_home, &workspace).await?;
|
||||
timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;
|
||||
|
||||
let start_id = mcp
|
||||
@@ -500,8 +530,16 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
|
||||
}],
|
||||
cwd: Some(workspace.clone()),
|
||||
approval_policy: Some(codex_app_server_protocol::AskForApproval::OnRequest),
|
||||
sandbox_policy: Some(codex_app_server_protocol::SandboxPolicy::ReadOnly {
|
||||
access: codex_app_server_protocol::ReadOnlyAccess::FullAccess,
|
||||
sandbox_policy: Some(if cfg!(target_os = "linux") {
|
||||
// The zsh exec-bridge wrapper uses a Unix socket back to the parent
|
||||
// process. Linux restricted sandbox seccomp denies connect(2), so use
|
||||
// full access here; this test is validating zsh approval/decline
|
||||
// behavior, not Linux sandboxing.
|
||||
codex_app_server_protocol::SandboxPolicy::DangerFullAccess
|
||||
} else {
|
||||
codex_app_server_protocol::SandboxPolicy::ReadOnly {
|
||||
access: codex_app_server_protocol::ReadOnlyAccess::FullAccess,
|
||||
}
|
||||
}),
|
||||
model: Some("mock-model".to_string()),
|
||||
effort: Some(codex_protocol::openai_models::ReasoningEffort::Medium),
|
||||
@@ -517,10 +555,13 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
|
||||
let TurnStartResponse { turn } = to_response::<TurnStartResponse>(turn_resp)?;
|
||||
|
||||
let mut approval_ids = Vec::new();
|
||||
for decision in [
|
||||
let mut saw_parent_approval = false;
|
||||
let target_decisions = [
|
||||
CommandExecutionApprovalDecision::Accept,
|
||||
CommandExecutionApprovalDecision::Cancel,
|
||||
] {
|
||||
];
|
||||
let mut target_decision_index = 0;
|
||||
while target_decision_index < target_decisions.len() {
|
||||
let server_req = timeout(
|
||||
DEFAULT_READ_TIMEOUT,
|
||||
mcp.read_stream_until_request_message(),
|
||||
@@ -531,13 +572,40 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
|
||||
panic!("expected CommandExecutionRequestApproval request");
|
||||
};
|
||||
assert_eq!(params.item_id, "call-zsh-fork-subcommand-decline");
|
||||
approval_ids.push(
|
||||
params
|
||||
.approval_id
|
||||
.clone()
|
||||
.expect("approval_id must be present for zsh subcommand approvals"),
|
||||
);
|
||||
assert_eq!(params.thread_id, thread.id);
|
||||
let is_target_subcommand = params.command.as_deref() == Some("/usr/bin/true");
|
||||
if is_target_subcommand {
|
||||
approval_ids.push(
|
||||
params
|
||||
.approval_id
|
||||
.clone()
|
||||
.expect("approval_id must be present for zsh subcommand approvals"),
|
||||
);
|
||||
}
|
||||
let decision = if is_target_subcommand {
|
||||
let decision = target_decisions[target_decision_index].clone();
|
||||
target_decision_index += 1;
|
||||
decision
|
||||
} else {
|
||||
let command = params
|
||||
.command
|
||||
.as_deref()
|
||||
.expect("approval command should be present");
|
||||
assert!(
|
||||
!saw_parent_approval,
|
||||
"unexpected extra non-target approval: {command}"
|
||||
);
|
||||
assert!(
|
||||
command.contains("zsh-no-rc"),
|
||||
"expected parent zsh wrapper approval, got: {command}"
|
||||
);
|
||||
assert!(
|
||||
command.contains("/usr/bin/true && /usr/bin/true"),
|
||||
"expected tool command in parent approval, got: {command}"
|
||||
);
|
||||
saw_parent_approval = true;
|
||||
CommandExecutionApprovalDecision::Accept
|
||||
};
|
||||
mcp.send_response(
|
||||
request_id,
|
||||
serde_json::to_value(CommandExecutionRequestApprovalResponse { decision })?,
|
||||
@@ -545,6 +613,8 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
|
||||
.await?;
|
||||
}
|
||||
|
||||
assert_eq!(approval_ids.len(), 2);
|
||||
assert_ne!(approval_ids[0], approval_ids[1]);
|
||||
let parent_completed_command_execution = timeout(DEFAULT_READ_TIMEOUT, async {
|
||||
loop {
|
||||
let completed_notif = mcp
|
||||
@@ -563,32 +633,61 @@ async fn turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2()
|
||||
}
|
||||
}
|
||||
})
|
||||
.await??;
|
||||
.await;
|
||||
|
||||
let ThreadItem::CommandExecution {
|
||||
id,
|
||||
status,
|
||||
aggregated_output,
|
||||
..
|
||||
} = parent_completed_command_execution
|
||||
else {
|
||||
unreachable!("loop ensures we break on parent command execution item");
|
||||
};
|
||||
assert_eq!(id, "call-zsh-fork-subcommand-decline");
|
||||
assert_eq!(status, CommandExecutionStatus::Declined);
|
||||
assert!(
|
||||
aggregated_output.is_none()
|
||||
|| aggregated_output == Some("exec command rejected by user".to_string())
|
||||
);
|
||||
assert_eq!(approval_ids.len(), 2);
|
||||
assert_ne!(approval_ids[0], approval_ids[1]);
|
||||
match parent_completed_command_execution {
|
||||
Ok(Ok(parent_completed_command_execution)) => {
|
||||
let ThreadItem::CommandExecution {
|
||||
id,
|
||||
status,
|
||||
aggregated_output,
|
||||
..
|
||||
} = parent_completed_command_execution
|
||||
else {
|
||||
unreachable!("loop ensures we break on parent command execution item");
|
||||
};
|
||||
assert_eq!(id, "call-zsh-fork-subcommand-decline");
|
||||
assert_eq!(status, CommandExecutionStatus::Declined);
|
||||
assert!(
|
||||
aggregated_output.is_none()
|
||||
|| aggregated_output == Some("exec command rejected by user".to_string())
|
||||
);
|
||||
|
||||
mcp.interrupt_turn_and_wait_for_aborted(thread.id, turn.id, DEFAULT_READ_TIMEOUT)
|
||||
.await?;
|
||||
mcp.interrupt_turn_and_wait_for_aborted(
|
||||
thread.id.clone(),
|
||||
turn.id.clone(),
|
||||
DEFAULT_READ_TIMEOUT,
|
||||
)
|
||||
.await?;
|
||||
}
|
||||
Ok(Err(error)) => return Err(error),
|
||||
Err(_) => {
|
||||
// Some zsh builds abort the turn immediately after the rejected
|
||||
// subcommand without emitting a parent `item/completed`.
|
||||
let completed_notif = timeout(
|
||||
DEFAULT_READ_TIMEOUT,
|
||||
mcp.read_stream_until_notification_message("turn/completed"),
|
||||
)
|
||||
.await??;
|
||||
let completed: TurnCompletedNotification = serde_json::from_value(
|
||||
completed_notif
|
||||
.params
|
||||
.expect("turn/completed params must be present"),
|
||||
)?;
|
||||
assert_eq!(completed.thread_id, thread.id);
|
||||
assert_eq!(completed.turn.id, turn.id);
|
||||
assert_eq!(completed.turn.status, TurnStatus::Interrupted);
|
||||
}
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn create_zsh_test_mcp_process(codex_home: &Path, zdotdir: &Path) -> Result<McpProcess> {
|
||||
let zdotdir = zdotdir.to_string_lossy().into_owned();
|
||||
McpProcess::new_with_env(codex_home, &[("ZDOTDIR", Some(zdotdir.as_str()))]).await
|
||||
}
|
||||
|
||||
fn create_config_toml(
|
||||
codex_home: &Path,
|
||||
server_uri: &str,
|
||||
@@ -640,36 +739,24 @@ stream_max_retries = 0
|
||||
)
|
||||
}
|
||||
|
||||
fn find_test_zsh_path() -> Option<std::path::PathBuf> {
|
||||
if let Some(path) = std::env::var_os("CODEX_TEST_ZSH_PATH") {
|
||||
let path = std::path::PathBuf::from(path);
|
||||
if path.is_file() {
|
||||
return Some(path);
|
||||
}
|
||||
panic!(
|
||||
"CODEX_TEST_ZSH_PATH is set but is not a file: {}",
|
||||
path.display()
|
||||
fn find_test_zsh_path() -> Result<Option<std::path::PathBuf>> {
|
||||
let repo_root = codex_utils_cargo_bin::repo_root()?;
|
||||
let dotslash_zsh = repo_root.join("codex-rs/exec-server/tests/suite/zsh");
|
||||
if !dotslash_zsh.is_file() {
|
||||
eprintln!(
|
||||
"skipping zsh fork test: shared zsh DotSlash file not found at {}",
|
||||
dotslash_zsh.display()
|
||||
);
|
||||
return Ok(None);
|
||||
}
|
||||
|
||||
for candidate in ["/bin/zsh", "/usr/bin/zsh"] {
|
||||
let path = Path::new(candidate);
|
||||
if path.is_file() {
|
||||
return Some(path.to_path_buf());
|
||||
match core_test_support::fetch_dotslash_file(&dotslash_zsh, None) {
|
||||
Ok(path) => return Ok(Some(path)),
|
||||
Err(error) => {
|
||||
eprintln!("failed to fetch vendored zsh via dotslash: {error:#}");
|
||||
}
|
||||
}
|
||||
|
||||
let shell = std::env::var_os("SHELL")?;
|
||||
let shell_path = std::path::PathBuf::from(shell);
|
||||
if shell_path
|
||||
.file_name()
|
||||
.is_some_and(|file_name| file_name == "zsh")
|
||||
&& shell_path.is_file()
|
||||
{
|
||||
return Some(shell_path);
|
||||
}
|
||||
|
||||
None
|
||||
Ok(None)
|
||||
}
|
||||
|
||||
fn supports_exec_wrapper_intercept(zsh_path: &Path) -> bool {
|
||||
|
||||
@@ -543,10 +543,12 @@ fn stage_str(stage: codex_core::features::Stage) -> &'static str {
 }
 
 fn main() -> anyhow::Result<()> {
-    if codex_core::maybe_run_zsh_exec_wrapper_mode()? {
-        return Ok(());
-    }
     arg0_dispatch_or_else(|codex_linux_sandbox_exe| async move {
+        // Run wrapper mode only after arg0 dispatch so `codex-linux-sandbox`
+        // invocations don't get misclassified as zsh exec-wrapper calls.
+        if codex_core::maybe_run_zsh_exec_wrapper_mode()? {
+            return Ok(());
+        }
         cli_main(codex_linux_sandbox_exe).await?;
         Ok(())
     })

@@ -368,13 +368,25 @@ fn legacy_usage_notice(alias: &str, feature: Feature) -> (String, Option<String>
             (summary, Some(web_search_details().to_string()))
         }
         _ => {
-            let summary = format!("`{alias}` is deprecated. Use `[features].{canonical}` instead.");
-            let details = if alias == canonical {
-                None
+            let (summary, details) = if alias == "collab" && feature == Feature::Collab {
+                (
+                    "Your configuration file has an error.".to_string(),
+                    Some(
+                        "Change collab=true to multi_agent=true in your config.toml or enable it by running codex --enable multi_agent".to_string(),
+                    ),
+                )
+            } else if alias == canonical {
+                (
+                    format!("`{alias}` is deprecated. Use `[features].{canonical}` instead."),
+                    None,
+                )
             } else {
-                Some(format!(
-                    "Enable it with `--enable {canonical}` or `[features].{canonical}` in config.toml. See https://developers.openai.com/codex/config-basic#feature-flags for details."
-                ))
+                (
+                    format!("`{alias}` is deprecated. Use `[features].{canonical}` instead."),
+                    Some(format!(
+                        "Enable it with `--enable {canonical}` or `[features].{canonical}` in config.toml. See https://developers.openai.com/codex/config-basic#feature-flags for details."
+                    )),
+                )
             };
             (summary, details)
         }
@@ -764,4 +776,17 @@ mod tests {
         assert_eq!(feature_for_key("multi_agent"), Some(Feature::Collab));
         assert_eq!(feature_for_key("collab"), Some(Feature::Collab));
     }
+
+    #[test]
+    fn collab_legacy_notice_uses_config_error_text() {
+        let (summary, details) = legacy_usage_notice("collab", Feature::Collab);
+
+        assert_eq!(summary, "Your configuration file has an error.".to_string());
+        assert_eq!(
+            details,
+            Some(
+                "Change collab=true to multi_agent=true in your config.toml or enable it by running codex --enable multi_agent".to_string()
+            )
+        );
+    }
 }
@@ -166,6 +166,10 @@ impl ZshExecBridge {
         })?;

         let mut cmd = tokio::process::Command::new(&command[0]);
+        #[cfg(unix)]
+        if let Some(arg0) = &req.arg0 {
+            cmd.arg0(arg0);
+        }
         if command.len() > 1 {
             cmd.args(&command[1..]);
         }
@@ -459,7 +463,6 @@ fn run_exec_wrapper_mode() -> anyhow::Result<()> {
         argv: argv.clone(),
         cwd,
     };
-
     let mut stream = StdUnixStream::connect(&socket_path)
         .with_context(|| format!("connect to wrapper socket at {socket_path}"))?;
     let encoded = serde_json::to_string(&request).context("serialize wrapper request")?;
@@ -1,5 +1,7 @@
 #![expect(clippy::expect_used)]

+use anyhow::Context as _;
+use anyhow::ensure;
 use codex_utils_cargo_bin::CargoBinError;
 use ctor::ctor;
 use tempfile::TempDir;
@@ -99,6 +101,42 @@ pub fn test_tmp_path_buf() -> PathBuf {
     test_tmp_path().into_path_buf()
 }

+/// Fetch a DotSlash resource and return the resolved executable/file path.
+pub fn fetch_dotslash_file(
+    dotslash_file: &std::path::Path,
+    dotslash_cache: Option<&std::path::Path>,
+) -> anyhow::Result<PathBuf> {
+    let mut command = std::process::Command::new("dotslash");
+    command.arg("--").arg("fetch").arg(dotslash_file);
+    if let Some(dotslash_cache) = dotslash_cache {
+        command.env("DOTSLASH_CACHE", dotslash_cache);
+    }
+    let output = command.output().with_context(|| {
+        format!(
+            "failed to run dotslash to fetch resource {}",
+            dotslash_file.display()
+        )
+    })?;
+    ensure!(
+        output.status.success(),
+        "dotslash fetch failed for {}: {}",
+        dotslash_file.display(),
+        String::from_utf8_lossy(&output.stderr).trim()
+    );
+    let fetched_path = String::from_utf8(output.stdout)
+        .context("dotslash fetch output was not utf8")?
+        .trim()
+        .to_string();
+    ensure!(!fetched_path.is_empty(), "dotslash fetch output was empty");
+    let fetched_path = PathBuf::from(fetched_path);
+    ensure!(
+        fetched_path.is_file(),
+        "dotslash returned non-file path: {}",
+        fetched_path.display()
+    );
+    Ok(fetched_path)
+}
+
 /// Returns a default `Config` whose on-disk state is confined to the provided
 /// temporary directory. Using a per-test directory keeps tests hermetic and
 /// avoids clobbering a developer’s real `~/.codex`.
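Reviewer note on the `fetch_dotslash_file` hunk above: the post-processing of the `dotslash` output (UTF-8 decoding, trimming, and the non-empty check) could be exercised without the `dotslash` binary if it were split out as a pure function. The following is only a minimal sketch under that assumption. `parse_fetched_path` and its `String` error type are hypothetical and not part of this change, and the `is_file()` check is deliberately left to the caller so the function needs no filesystem access.

```rust
use std::path::PathBuf;

/// Hypothetical helper (not in the diff): turn the raw stdout of a
/// `dotslash -- fetch` call into a path, mirroring the UTF-8, trim, and
/// non-empty checks performed inside `fetch_dotslash_file`.
fn parse_fetched_path(stdout: Vec<u8>) -> Result<PathBuf, String> {
    // Reject output that is not valid UTF-8, as the original does via `.context(...)`.
    let text = String::from_utf8(stdout)
        .map_err(|_| "dotslash fetch output was not utf8".to_string())?;
    // The tool's output ends with a newline, so trim before interpreting it as a path.
    let trimmed = text.trim();
    if trimmed.is_empty() {
        return Err("dotslash fetch output was empty".to_string());
    }
    Ok(PathBuf::from(trimmed))
}
```

With this shape, the empty-output and invalid-UTF-8 error paths can be covered by plain unit tests, while the process spawn and the `is_file()` check stay in the wrapper.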
@@ -58,6 +58,7 @@ tracing = { workspace = true }
 tracing-subscriber = { workspace = true, features = ["env-filter", "fmt"] }

 [dev-dependencies]
+core_test_support = { workspace = true }
 codex-utils-cargo-bin = { workspace = true }
 codex-protocol = { workspace = true }
 exec_server_test_support = { workspace = true }
@@ -61,15 +61,9 @@ prefix_rule(
 /// Verify the same prompt/escalation flow works when the server is launched
 /// with a patched zsh binary.
 ///
-/// Set CODEX_TEST_ZSH_PATH to enable this test locally or in CI.
+/// The suite resolves `tests/suite/zsh` via DotSlash on first use.
 #[tokio::test(flavor = "current_thread")]
 async fn accept_elicitation_for_prompt_rule_with_zsh() -> Result<()> {
-    let Some(zsh_path) = std::env::var_os("CODEX_TEST_ZSH_PATH") else {
-        eprintln!("skipping zsh test: CODEX_TEST_ZSH_PATH is not set");
-        return Ok(());
-    };
-    let zsh_path = PathBuf::from(zsh_path);
-
     let codex_home = TempDir::new()?;
     write_default_execpolicy(
         r#"
@@ -87,6 +81,11 @@ prefix_rule(
     .await?;
     let dotslash_cache_temp_dir = TempDir::new()?;
     let dotslash_cache = dotslash_cache_temp_dir.path();
+    let zsh_path = resolve_test_zsh_path(dotslash_cache).await?;
+    eprintln!(
+        "using zsh path for exec-server test: {}",
+        zsh_path.display()
+    );
     let transport =
         create_transport_with_shell_path(codex_home.as_ref(), dotslash_cache, &zsh_path).await?;
     run_accept_elicitation_for_prompt_rule_with_transport(transport).await
@@ -95,13 +94,13 @@ prefix_rule(
 async fn run_accept_elicitation_for_prompt_rule_with_transport(
     transport: rmcp::transport::TokioChildProcess,
 ) -> Result<()> {
-    // Create an MCP client that approves expected elicitation messages.
+    // Create an MCP client that approves the expected elicitation message.
     let project_root = TempDir::new()?;
     let project_root_path = project_root.path().canonicalize().unwrap();
     let git_path = resolve_git_path(USE_LOGIN_SHELL).await?;
+    let git_init_command = format!("{git_path} init --quiet .");
     let expected_elicitation_message = format!(
-        "Allow agent to run `{} init .` in `{}`?",
-        git_path,
+        "Allow agent to run `{git_path} init --quiet .` in `{}`?",
         project_root_path.display()
     );
     let elicitation_requests: Arc<Mutex<Vec<CreateElicitationRequestParams>>> = Default::default();
@@ -142,7 +141,7 @@ async fn run_accept_elicitation_for_prompt_rule_with_transport(
             arguments: Some(object(json!(
                 {
                     "login": USE_LOGIN_SHELL,
-                    "command": "git init .",
+                    "command": git_init_command,
                     "workdir": project_root_path.to_string_lossy(),
                 }
             ))),
@@ -157,15 +156,11 @@ async fn run_accept_elicitation_for_prompt_rule_with_transport(
     let ExecResult {
         exit_code, output, ..
     } = serde_json::from_str::<ExecResult>(&tool_call_content.text)?;
-    let git_init_succeeded = format!(
-        "Initialized empty Git repository in {}/.git/\n",
-        project_root_path.display()
-    );
-    // Normally, this would be an exact match, but it might include extra output
-    // if `git config set advice.defaultBranchName false` has not been set.
+    // `git init --quiet` is expected to suppress the usual initialization
+    // banner, so assert on success and filesystem effects instead of output.
     assert!(
-        output.contains(&git_init_succeeded),
-        "expected output `{output}` to contain `{git_init_succeeded}`"
+        output.is_empty(),
+        "expected no output from `git init --quiet .`, got `{output}`"
     );
     assert_eq!(exit_code, 0, "command should succeed");
     assert_eq!(is_error, Some(false), "command should succeed");
@@ -192,6 +187,12 @@ async fn run_accept_elicitation_for_prompt_rule_with_transport(
     Ok(())
 }

+async fn resolve_test_zsh_path(dotslash_cache: &std::path::Path) -> Result<PathBuf> {
+    let dotslash_zsh = codex_utils_cargo_bin::find_resource!("tests/suite/zsh")?;
+    core_test_support::fetch_dotslash_file(&dotslash_zsh, Some(dotslash_cache))
+        .with_context(|| format!("failed to fetch test zsh from {}", dotslash_zsh.display()))
+}
+
 fn ensure_codex_cli() -> Result<PathBuf> {
     let codex_cli = codex_utils_cargo_bin::cargo_bin("codex")?;

72
codex-rs/exec-server/tests/suite/zsh
Executable file
@@ -0,0 +1,72 @@
+#!/usr/bin/env dotslash
+
+// This is the patched zsh fork built by
+// `.github/workflows/shell-tool-mcp.yml` for the shell-tool-mcp package.
+// Fetching the prebuilt version via DotSlash makes it easier to write
+// integration tests that exercise the zsh fork behavior in exec-server tests.
+//
+// TODO(mbolin): Currently, we use a .tgz artifact that includes binaries for
+// multiple platforms, but we could save a bit of space by making arch-specific
+// artifacts available in the GitHub releases and referencing those here.
+{
+  "name": "codex-zsh",
+  "platforms": {
+    // macOS 13 builds (and therefore x86_64) were dropped in
+    // https://github.com/openai/codex/pull/7295, so we only provide an
+    // Apple Silicon build for now.
+    "macos-aarch64": {
+      "size": 53771483,
+      "hash": "blake3",
+      "digest": "ff664f63f5e1fa62762c9aff0aafa66cf196faf9b157f98ec98f59c152fc7bd3",
+      "format": "tar.gz",
+      "path": "package/vendor/aarch64-apple-darwin/zsh/macos-15/zsh",
+      "providers": [
+        {
+          "url": "https://github.com/openai/codex/releases/download/rust-v0.104.0/codex-shell-tool-mcp-npm-0.104.0.tgz"
+        },
+        {
+          "type": "github-release",
+          "repo": "openai/codex",
+          "tag": "rust-v0.104.0",
+          "name": "codex-shell-tool-mcp-npm-0.104.0.tgz"
+        }
+      ]
+    },
+    "linux-x86_64": {
+      "size": 53771483,
+      "hash": "blake3",
+      "digest": "ff664f63f5e1fa62762c9aff0aafa66cf196faf9b157f98ec98f59c152fc7bd3",
+      "format": "tar.gz",
+      "path": "package/vendor/x86_64-unknown-linux-musl/zsh/ubuntu-24.04/zsh",
+      "providers": [
+        {
+          "url": "https://github.com/openai/codex/releases/download/rust-v0.104.0/codex-shell-tool-mcp-npm-0.104.0.tgz"
+        },
+        {
+          "type": "github-release",
+          "repo": "openai/codex",
+          "tag": "rust-v0.104.0",
+          "name": "codex-shell-tool-mcp-npm-0.104.0.tgz"
+        }
+      ]
+    },
+    "linux-aarch64": {
+      "size": 53771483,
+      "hash": "blake3",
+      "digest": "ff664f63f5e1fa62762c9aff0aafa66cf196faf9b157f98ec98f59c152fc7bd3",
+      "format": "tar.gz",
+      "path": "package/vendor/aarch64-unknown-linux-musl/zsh/ubuntu-24.04/zsh",
+      "providers": [
+        {
+          "url": "https://github.com/openai/codex/releases/download/rust-v0.104.0/codex-shell-tool-mcp-npm-0.104.0.tgz"
+        },
+        {
+          "type": "github-release",
+          "repo": "openai/codex",
+          "tag": "rust-v0.104.0",
+          "name": "codex-shell-tool-mcp-npm-0.104.0.tgz"
+        }
+      ]
+    },
+  }
+}