[codex] Improve PR babysitter CI diagnostics and guardrails (#20484)

## Summary

- Surface failed GitHub Actions jobs in the PR babysitter watcher so
Codex can fetch job logs as soon as a job fails, instead of waiting for
the overall workflow run to complete.
- Update babysit-pr skill instructions, GitHub API notes, and heuristics
to prefer direct job log archives before falling back to `gh run view
--log-failed`.
- Add guardrails requiring explicit user confirmation before posting
replies to human-authored review comments.
- Add guardrails preventing Codex from patching unrelated flaky tests,
CI infrastructure, runner issues, dependency outages, or other failures
not caused by the PR branch.

## Validation

- `python3 -m pytest
.codex/skills/babysit-pr/scripts/test_gh_pr_watch.py`
This commit is contained in:
Tom
2026-04-30 19:58:19 -07:00
committed by GitHub
parent 6b1b227804
commit c39824c2fd
6 changed files with 164 additions and 14 deletions

View File

@@ -23,9 +23,11 @@ Used to discover failed workflow runs and rerunnable run IDs.
### Failed log inspection
- `gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha`
- `gh api repos/{owner}/{repo}/actions/runs/{run_id}/jobs -X GET -f per_page=100`
- `gh api repos/{owner}/{repo}/actions/jobs/{job_id}/logs > /tmp/codex-gh-job-{job_id}-logs.zip`
- `gh run view <run-id> --log-failed`
Used by Codex to classify branch-related vs flaky/unrelated failures.
Used by Codex to classify branch-related vs flaky/unrelated failures. Prefer the direct job log endpoint as soon as a job has failed because `gh run view --log-failed` may not produce failed-job logs until the overall workflow run completes.
### Retry failed jobs only
@@ -70,3 +72,11 @@ Reruns only failed jobs (and dependencies) for a workflow run.
- `conclusion`
- `html_url`
- `head_sha`
### Actions run jobs API (`jobs[]`)
- `id`
- `name`
- `status`
- `conclusion`
- `html_url`