tui: add /validate subcommand with high‑risk web validation

- Add /validate slash command - Plan + execute validations (Playwright MCP preference, curl, python) - Pre‑validation account setup: auto‑register or manual fallback (opens login) - Persist credentials in context/validation/credentials.json (usernames logged) - Update bugs.md and report with validation results + transcript Also adds Playwright tool support, inline python execution, and UI logs.
2026-05-04 03:16:31 +00:00 · 2025-11-01 11:19:40 -07:00
parent ae5150c37a
commit 883a108624
6 changed files with 716 additions and 4 deletions
--- a/codex-rs/tui/src/security_prompts.rs
+++ b/codex-rs/tui/src/security_prompts.rs
@@ -87,6 +87,47 @@ Produce newline-delimited JSON (NDJSON), one object per classified data type wit
 # Output
 Emit only NDJSON lines. Each JSON object must contain exactly the keys listed above (no arrays, extra keys, or prose).
 "#;
+
+// Validation plan prompts
+pub(crate) const VALIDATION_PLAN_SYSTEM_PROMPT: &str = "You are an application security engineer planning minimal, safe validations for high-risk findings. Respond ONLY with JSON Lines as requested; do not include markdown or prose.";
+pub(crate) const VALIDATION_PLAN_PROMPT_TEMPLATE: &str = r#"
+Before any checks, create two test accounts if the app requires login. Prefer a short Python script that calls a signup endpoint or automates the registration form headlessly. If this is not feasible, return a `manual` instruction with a `login_url`.
+
+Then select ONLY high-risk findings to validate. For each, choose the minimal tool and target:
+- Use the Playwright MCP tool for web_browser checks (supply a reachable URL in `target`).
+- Use tool "curl" for network_api checks (supply full URL in `target`).
+- Use tool "python" only if a short, non-destructive PoC is essential (include inline script text in `script`).
+
+Rules:
+- Keep requests minimal and non-destructive; no state-changing actions.
+- Prefer headless checks (e.g., page loads, HTTP status, presence of a marker string).
+- Max 5 requests total; prioritize Critical/High severity or lowest risk_rank.
+
+Context (findings):
+{findings}
+
+Output format (one JSON object per line, no fences):
+- For account setup (emit at most one line): {"id_kind":"setup","action":"register|manual","login_url":"<string, optional>","tool":"python|manual","script":"<string, optional>"}
+- For validations: {"id_kind":"risk_rank|summary_id","id_value":<int>,"tool":"playwright|curl|python","target":"<string, optional>","script":"<string, optional>"}
+"#;
+
+// Account setup planning (standalone, used when needed)
+pub(crate) const VALIDATION_ACCOUNTS_SYSTEM_PROMPT: &str = "You plan how to create two test accounts for a typical web app. Respond ONLY with JSON Lines; no prose.";
+pub(crate) const VALIDATION_ACCOUNTS_PROMPT_TEMPLATE: &str = r#"
+Goal: ensure two test accounts exist prior to validation. Prefer a short Python script that registers accounts via HTTP or a headless flow; otherwise return a manual login URL.
+
+Constraints:
+- The script must be non-destructive and idempotent.
+- Print credentials to stdout as JSON: {"accounts":[{"username":"...","password":"..."},{"username":"...","password":"..."}]}.
+- If you cannot identify a safe automated path, return a single JSON line: {"action":"manual","login_url":"https://..."}.
+
+Context (findings):
+{findings}
+
+Output format (one JSON object per line, no fences):
+- Automated: {"action":"register","tool":"python","login_url":"<string, optional>","script":"<python script>"}
+- Manual: {"action":"manual","login_url":"<string>"}
+"#;
 pub(crate) const MARKDOWN_OUTPUT_GUARD: &str = "\n# Output Guard (strict)\n    - Output only the final markdown content requested.\n    - Do not include goal, analysis, planning, chain-of-thought, or step lists.\n    - Do not echo prompt sections like \"Task\", \"Steps\", \"Output\", or \"Important\".\n    - Do not include any XML/angle-bracket blocks (e.g., <...> inputs) in the output.\n    - Do not wrap the entire response in code fences; use code fences only for code snippets.\n    - Do not include apologies, disclaimers, or references to being an AI model.\n";
 pub(crate) const MARKDOWN_FIX_SYSTEM_PROMPT: &str = "You are a meticulous technical editor. Polish markdown formatting while preserving the original security analysis content. Focus on fixing numbering, bullet spacing, code fences, and diagram syntax without adding or removing information.";
 pub(crate) const SPEC_COMBINE_PROMPT_TEMPLATE: &str = "You previously generated specification drafts for the following code locations:\n{project_locations}\n\nDraft content (each draft may include an \"API Entry Points\" section summarizing externally exposed interfaces):\n{spec_drafts}\n\nTask: merge these drafts into one comprehensive specification that describes the entire project. Remove duplication, keep terminology consistent, and ensure the final document reads as a single report that preserves API coverage. Follow the template exactly and return only markdown.\n\nNon-negotiable requirements:\n- Carry forward every concrete security-relevant fact, list, table, code block, and data classification entry from the drafts unless it is an exact duplicate.\n- When multiple drafts contribute to the same template section, include the union of their paragraphs and bullet points. If details differ, keep both and attribute them with inline labels such as `(from {location_label})` rather than dropping information.\n- Preserve API entry points verbatim (including tables) and incorporate them into the appropriate section without shortening columns.\n- Keep all identifiers (component names, queue names, environment variables, secrets, external services, metric names) exactly as written; do not rename or generalize.\n- Follow the template's structure exactly: populate every section, create the requested subsections, and include the explicit `Sources:` lines and bullet styles. Do not leave the instructional text in place or drop mandatory sections.\n- Populate the \"Relevant Source Files\" section with bullet points that reference each draft's location label and any concrete file paths mentioned in the drafts.\n- Ensure the \"Data Classification\" section exists even when the drafts were sparse; aggregate and preserve every classification detail there.\n- If multiple drafts contain tabular data (APIs, components, data classification), merge rows from all drafts and maintain duplicates when the sources disagree so the consumer can reconcile manually.\n- Do not introduce new speculation or remove nuance from mitigations, caveats, or risk descriptions provided in the drafts. Err on the side of length; the final document should be at least as detailed as the most verbose draft.\n\n# Available tools\n- READ: respond with `READ: <relative path>#Lstart-Lend` (range optional) to open code or draft files. Use paths relative to the repository root.\n- GREP_FILES: respond with `GREP_FILES: {\"pattern\": \"...\", \"include\": \"*.rs\", \"path\": \"subdir\", \"limit\": 200}` to list files whose contents match.\nEmit at most one tool command in a single message and wait for the tool output before continuing. Prefer READ for prose context; SEARCH is not available during this step.\n\nTemplate:\n{combined_template}\n";
@@ -97,7 +138,7 @@ You triage directories for a security review specification. Only choose director
 - Limit the selection to the most critical directories (ideally 3-8).
 Respond with a newline-separated list containing only the directory paths chosen from the provided list. Respond with `ALL` if every directory should be included. Do not add quotes or extra commentary.
 "#;
-pub(crate) const SPEC_MARKDOWN_TEMPLATE: &str = "# Project Specification\n- Location: {target_label}\n- Prepared by: {model_name}\n- Date: {date}\n- In-scope paths:\n```\n{project_locations}\n```\n\n## Overview\nSummarize the product or service, primary users, and the business problem it solves. Highlight the most security relevant entry points.\n\n## Architecture Summary\nDescribe the high-level system architecture, major services, data stores, and external integrations. Include a concise mermaid flowchart when it improves clarity.\n\n## Components\nList 5-8 major components. For each, note the role, responsibilities, key dependencies, and security-critical behavior.\n\n## Business Flows\nDocument up to 5 important flows (CRUD, external integrations, workflow orchestration). For each flow capture triggers, main steps, data touched, and security notes. Include a short mermaid sequence diagram if helpful.\n\n## Tech Stack\nCapture languages, frameworks, and infrastructure used by each major component. Tabulate runtimes, key libraries, storage technologies, and deployment targets.\n\n## Authentication\nExplain how principals authenticate, token lifecycles, libraries used, and how secrets are managed.\n\n## Authorization\nDescribe the authorization model, enforcement points, privileged roles, and escalation paths.\n\n## Data Classification\nIdentify sensitive data types handled by the project and where they are stored or transmitted.\n\n## Infrastructure and Deployment\nSummarize infrastructure-as-code, runtime platforms, and configuration or secret handling that affects security posture.\n\n## API Entry Points\nList externally reachable interfaces (HTTP/gRPC endpoints, message queues, CLIs, SDK methods) and how they handle security.\n\n### Server APIs\nProvide a markdown table with the exact columns:\n- endpoint path\n- authN method\n- authZ type\n- request parameters\n- example request (params, body, or method)\n- code location\n- parsing/validation logic\nIf the project exposes no server APIs, write `- None identified.` instead of a table.\n\n### Client APIs (optional)\nInclude a markdown table when the project ships an SDK, CLI, or other callable client surface. Columns:\n- api name (module.func or Class.method)\n- module/package\n- summary\n- parameters (omit if noisy)\n- returns (omit if noisy)\n- stability (public/official/internal)\n- code location\nIf there is no public client surface, state `- None.`\n";
+pub(crate) const SPEC_MARKDOWN_TEMPLATE: &str = "# Project Specification\n- Location: {target_label}\n- Prepared by: {model_name}\n- Date: {date}\n- In-scope paths:\n```\n{project_locations}\n```\n\n## Overview\nSummarize the product or service, primary users, and the business problem it solves. Highlight the most security relevant entry points.\n\n## Architecture Summary\nDescribe the high-level system architecture, major services, data stores, and external integrations. Include a concise mermaid flowchart when it improves clarity. If the specification uses more than one mermaid diagram, add a `title Component request flow` line (with a descriptive label) inside each diagram so the rendered report shows distinct titles.\n\n## Components\nList 5-8 major components. For each, note the role, responsibilities, key dependencies, and security-critical behavior.\n\n## Business Flows\nDocument up to 5 important flows (CRUD, external integrations, workflow orchestration). For each flow capture triggers, main steps, data touched, and security notes. Include a short mermaid sequence diagram if helpful.\n\n## Tech Stack\nCapture languages, frameworks, and infrastructure used by each major component. Tabulate runtimes, key libraries, storage technologies, and deployment targets.\n\n## Authentication\nExplain how principals authenticate, token lifecycles, libraries used, and how secrets are managed.\n\n## Authorization\nDescribe the authorization model, enforcement points, privileged roles, and escalation paths.\n\n## Data Classification\nIdentify sensitive data types handled by the project and where they are stored or transmitted.\n\n## Infrastructure and Deployment\nSummarize infrastructure-as-code, runtime platforms, and configuration or secret handling that affects security posture.\n\n## API Entry Points\nList externally reachable interfaces (HTTP/gRPC endpoints, message queues, CLIs, SDK methods) and how they handle security.\n\n### Server APIs\nProvide a markdown table with the exact columns:\n- endpoint path\n- authN method\n- authZ type\n- request parameters\n- example request (params, body, or method)\n- code location\n- parsing/validation logic\nIf the project exposes no server APIs, write `- None identified.` instead of a table.\n\n### Client APIs (optional)\nInclude a markdown table when the project ships an SDK, CLI, or other callable client surface. Columns:\n- api name (module.func or Class.method)\n- module/package\n- summary\n- parameters (omit if noisy)\n- returns (omit if noisy)\n- stability (public/official/internal)\n- code location\nIf there is no public client surface, state `- None.`\n";
 pub(crate) const SPEC_COMBINED_MARKDOWN_TEMPLATE: &str = r#"# Project Specification
 Provide a 2–3 sentence executive overview summarizing the system's purpose, primary users, and the highest-value assets or flows that matter for security.

@@ -108,6 +149,7 @@ List bullet points for the key files and directories covered by the drafts. Use
 Provide a concise overview of how control and data move through the system, highlighting major services, external dependencies, and trust boundaries.
 Include exactly one overarching mermaid diagram here that captures the end-to-end flow (no per-component or sequence diagrams in this section).
 Move any detailed or per-component diagrams to the relevant component subsections below.
+If the specification contains additional mermaid diagrams, add a `title Component request flow` line (with a descriptive label) inside each diagram so the rendered report labels them distinctly.
 End with a `Sources:` line enumerating the files or modules that support this description.

 ## Core Components