Files
codex/codex-rs/app-server/tests/suite/v2/skills_list.rs
Charley Cunningham bc24017d64 Add Smart Approvals guardian review across core, app-server, and TUI (#13860)
## Summary
- add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
control for who reviews approval requests
- route Smart Approvals guardian review through core for command
execution, file changes, managed-network approvals, MCP approvals, and
delegated/subagent approval flows
- expose guardian review in app-server with temporary unstable
`item/autoApprovalReview/{started,completed}` notifications carrying
`targetItemId`, `review`, and `action`
- update the TUI so Smart Approvals can be enabled from `/experimental`,
aligned with the matching `/approvals` mode, and surfaced clearly while
reviews are pending or resolved

## Runtime model
This PR does not introduce a new `approval_policy`.

Instead:
- `approval_policy` still controls when approval is needed
- `approvals_reviewer` controls who reviewable approval requests are
routed to:
  - `user`
  - `guardian_subagent`

`guardian_subagent` is a carefully prompted reviewer subagent that
gathers relevant context and applies a risk-based decision framework
before approving or denying the request.

The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
behavior keys off `approvals_reviewer`.

When Smart Approvals is enabled from the TUI, it also switches the
current `/approvals` settings to the matching Smart Approvals mode so
users immediately see guardian review in the active thread:
- `approval_policy = on-request`
- `approvals_reviewer = guardian_subagent`
- `sandbox_mode = workspace-write`

Users can still change `/approvals` afterward.

Config-load behavior stays intentionally narrow:
- plain `smart_approvals = true` in `config.toml` remains just the
rollout/UI gate and does not auto-set `approvals_reviewer`
- the deprecated `guardian_approval = true` alias migration does
backfill `approvals_reviewer = "guardian_subagent"` in the same scope
when that reviewer is not already configured there, so old configs
preserve their original guardian-enabled behavior

ARC remains a separate safety check. For MCP tool approvals, ARC
escalations now flow into the configured reviewer instead of always
bypassing guardian and forcing manual review.

## Config stability
The runtime reviewer override is stable, but the config-backed
app-server protocol shape is still settling.

- `thread/start`, `thread/resume`, and `turn/start` keep stable
`approvalsReviewer` overrides
- the config-backed `approvals_reviewer` exposure returned via
`config/read` (including profile-level config) is now marked
`[UNSTABLE]` / experimental in the app-server protocol until we are more
confident in that config surface

## App-server surface
This PR intentionally keeps the guardian app-server shape narrow and
temporary.

It adds generic unstable lifecycle notifications:
- `item/autoApprovalReview/started`
- `item/autoApprovalReview/completed`

with payloads of the form:
- `{ threadId, turnId, targetItemId, review, action? }`

`review` is currently:
- `{ status, riskScore?, riskLevel?, rationale? }`
- where `status` is one of `inProgress`, `approved`, `denied`, or
`aborted`

`action` carries the guardian action summary payload from core when
available. This lets clients render temporary standalone pending-review
UI, including parallel reviews, even when the underlying tool item has
not been emitted yet.

These notifications are explicitly documented as `[UNSTABLE]` and
expected to change soon.

This PR does **not** persist guardian review state onto `thread/read`
tool items. The intended follow-up is to attach guardian review state to
the reviewed tool item lifecycle instead, which would improve
consistency with manual approvals and allow thread history / reconnect
flows to replay guardian review state directly.

## TUI behavior
- `/experimental` exposes the rollout gate as `Smart Approvals`
- enabling it in the TUI enables the feature and switches the current
session to the matching Smart Approvals `/approvals` mode
- disabling it in the TUI clears the persisted `approvals_reviewer`
override when appropriate and returns the session to default manual
review when the effective reviewer changes
- `/approvals` still exposes the reviewer choice directly
- the TUI renders:
- pending guardian review state in the live status footer, including
parallel review aggregation
  - resolved approval/denial state in history

## Scope notes
This PR includes the supporting core/runtime work needed to make Smart
Approvals usable end-to-end:
- shell / unified-exec / apply_patch / managed-network / MCP guardian
review
- delegated/subagent approval routing into guardian review
- guardian review risk metadata and action summaries for app-server/TUI
- config/profile/TUI handling for `smart_approvals`, `guardian_approval`
alias migration, and `approvals_reviewer`
- a small internal cleanup of delegated approval forwarding to dedupe
fallback paths and simplify guardian-vs-parent approval waiting (no
intended behavior change)

Out of scope for this PR:
- redesigning the existing manual approval protocol shapes
- persisting guardian review state onto app-server `ThreadItem`s
- delegated MCP elicitation auto-review (the current delegated MCP
guardian shim only covers the legacy `RequestUserInput` path)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-13 15:27:00 -07:00

279 lines
9.1 KiB
Rust

use std::time::Duration;
use anyhow::Context;
use anyhow::Result;
use app_test_support::McpProcess;
use app_test_support::to_response;
use codex_app_server_protocol::JSONRPCResponse;
use codex_app_server_protocol::RequestId;
use codex_app_server_protocol::SkillsChangedNotification;
use codex_app_server_protocol::SkillsListExtraRootsForCwd;
use codex_app_server_protocol::SkillsListParams;
use codex_app_server_protocol::SkillsListResponse;
use codex_app_server_protocol::ThreadStartParams;
use pretty_assertions::assert_eq;
use tempfile::TempDir;
use tokio::time::timeout;
const DEFAULT_TIMEOUT: Duration = Duration::from_secs(10);
const WATCHER_TIMEOUT: Duration = Duration::from_secs(20);
fn write_skill(root: &TempDir, name: &str) -> Result<()> {
let skill_dir = root.path().join("skills").join(name);
std::fs::create_dir_all(&skill_dir)?;
let content = format!("---\nname: {name}\ndescription: {name} description\n---\n\n# Body\n");
std::fs::write(skill_dir.join("SKILL.md"), content)?;
Ok(())
}
#[tokio::test]
async fn skills_list_includes_skills_from_per_cwd_extra_user_roots() -> Result<()> {
let codex_home = TempDir::new()?;
let cwd = TempDir::new()?;
let extra_root = TempDir::new()?;
write_skill(&extra_root, "extra-skill")?;
let mut mcp = McpProcess::new(codex_home.path()).await?;
timeout(DEFAULT_TIMEOUT, mcp.initialize()).await??;
let request_id = mcp
.send_skills_list_request(SkillsListParams {
cwds: vec![cwd.path().to_path_buf()],
force_reload: true,
per_cwd_extra_user_roots: Some(vec![SkillsListExtraRootsForCwd {
cwd: cwd.path().to_path_buf(),
extra_user_roots: vec![extra_root.path().to_path_buf()],
}]),
})
.await?;
let response: JSONRPCResponse = timeout(
DEFAULT_TIMEOUT,
mcp.read_stream_until_response_message(RequestId::Integer(request_id)),
)
.await??;
let SkillsListResponse { data } = to_response(response)?;
assert_eq!(data.len(), 1);
assert_eq!(data[0].cwd, cwd.path().to_path_buf());
assert!(
data[0]
.skills
.iter()
.any(|skill| skill.name == "extra-skill")
);
Ok(())
}
#[tokio::test]
async fn skills_list_rejects_relative_extra_user_roots() -> Result<()> {
let codex_home = TempDir::new()?;
let cwd = TempDir::new()?;
let mut mcp = McpProcess::new(codex_home.path()).await?;
timeout(DEFAULT_TIMEOUT, mcp.initialize()).await??;
let request_id = mcp
.send_skills_list_request(SkillsListParams {
cwds: vec![cwd.path().to_path_buf()],
force_reload: true,
per_cwd_extra_user_roots: Some(vec![SkillsListExtraRootsForCwd {
cwd: cwd.path().to_path_buf(),
extra_user_roots: vec![std::path::PathBuf::from("relative/skills")],
}]),
})
.await?;
let err = timeout(
DEFAULT_TIMEOUT,
mcp.read_stream_until_error_message(RequestId::Integer(request_id)),
)
.await??;
assert!(
err.error
.message
.contains("perCwdExtraUserRoots extraUserRoots paths must be absolute"),
"unexpected error: {}",
err.error.message
);
Ok(())
}
#[tokio::test]
async fn skills_list_ignores_per_cwd_extra_roots_for_unknown_cwd() -> Result<()> {
let codex_home = TempDir::new()?;
let requested_cwd = TempDir::new()?;
let unknown_cwd = TempDir::new()?;
let extra_root = TempDir::new()?;
write_skill(&extra_root, "ignored-extra-skill")?;
let mut mcp = McpProcess::new(codex_home.path()).await?;
timeout(DEFAULT_TIMEOUT, mcp.initialize()).await??;
let request_id = mcp
.send_skills_list_request(SkillsListParams {
cwds: vec![requested_cwd.path().to_path_buf()],
force_reload: true,
per_cwd_extra_user_roots: Some(vec![SkillsListExtraRootsForCwd {
cwd: unknown_cwd.path().to_path_buf(),
extra_user_roots: vec![extra_root.path().to_path_buf()],
}]),
})
.await?;
let response: JSONRPCResponse = timeout(
DEFAULT_TIMEOUT,
mcp.read_stream_until_response_message(RequestId::Integer(request_id)),
)
.await??;
let SkillsListResponse { data } = to_response(response)?;
assert_eq!(data.len(), 1);
assert_eq!(data[0].cwd, requested_cwd.path().to_path_buf());
assert!(
data[0]
.skills
.iter()
.all(|skill| skill.name != "ignored-extra-skill")
);
Ok(())
}
#[tokio::test]
async fn skills_list_uses_cached_result_until_force_reload() -> Result<()> {
let codex_home = TempDir::new()?;
let cwd = TempDir::new()?;
let extra_root = TempDir::new()?;
write_skill(&extra_root, "late-extra-skill")?;
let mut mcp = McpProcess::new(codex_home.path()).await?;
timeout(DEFAULT_TIMEOUT, mcp.initialize()).await??;
// Seed the cwd cache first without extra roots.
let first_request_id = mcp
.send_skills_list_request(SkillsListParams {
cwds: vec![cwd.path().to_path_buf()],
force_reload: false,
per_cwd_extra_user_roots: None,
})
.await?;
let first_response: JSONRPCResponse = timeout(
DEFAULT_TIMEOUT,
mcp.read_stream_until_response_message(RequestId::Integer(first_request_id)),
)
.await??;
let SkillsListResponse { data: first_data } = to_response(first_response)?;
assert_eq!(first_data.len(), 1);
assert!(
first_data[0]
.skills
.iter()
.all(|skill| skill.name != "late-extra-skill")
);
let second_request_id = mcp
.send_skills_list_request(SkillsListParams {
cwds: vec![cwd.path().to_path_buf()],
force_reload: false,
per_cwd_extra_user_roots: Some(vec![SkillsListExtraRootsForCwd {
cwd: cwd.path().to_path_buf(),
extra_user_roots: vec![extra_root.path().to_path_buf()],
}]),
})
.await?;
let second_response: JSONRPCResponse = timeout(
DEFAULT_TIMEOUT,
mcp.read_stream_until_response_message(RequestId::Integer(second_request_id)),
)
.await??;
let SkillsListResponse { data: second_data } = to_response(second_response)?;
assert_eq!(second_data.len(), 1);
assert!(
second_data[0]
.skills
.iter()
.all(|skill| skill.name != "late-extra-skill")
);
let third_request_id = mcp
.send_skills_list_request(SkillsListParams {
cwds: vec![cwd.path().to_path_buf()],
force_reload: true,
per_cwd_extra_user_roots: Some(vec![SkillsListExtraRootsForCwd {
cwd: cwd.path().to_path_buf(),
extra_user_roots: vec![extra_root.path().to_path_buf()],
}]),
})
.await?;
let third_response: JSONRPCResponse = timeout(
DEFAULT_TIMEOUT,
mcp.read_stream_until_response_message(RequestId::Integer(third_request_id)),
)
.await??;
let SkillsListResponse { data: third_data } = to_response(third_response)?;
assert_eq!(third_data.len(), 1);
assert!(
third_data[0]
.skills
.iter()
.any(|skill| skill.name == "late-extra-skill")
);
Ok(())
}
#[tokio::test]
async fn skills_changed_notification_is_emitted_after_skill_change() -> Result<()> {
let codex_home = TempDir::new()?;
write_skill(&codex_home, "demo")?;
let mut mcp = McpProcess::new(codex_home.path()).await?;
timeout(DEFAULT_TIMEOUT, mcp.initialize()).await??;
let thread_start_request_id = mcp
.send_thread_start_request(ThreadStartParams {
model: None,
model_provider: None,
service_tier: None,
cwd: None,
approval_policy: None,
approvals_reviewer: None,
sandbox: None,
config: None,
service_name: None,
base_instructions: None,
developer_instructions: None,
personality: None,
ephemeral: None,
dynamic_tools: None,
mock_experimental_field: None,
experimental_raw_events: false,
persist_extended_history: false,
})
.await?;
let _: JSONRPCResponse = timeout(
DEFAULT_TIMEOUT,
mcp.read_stream_until_response_message(RequestId::Integer(thread_start_request_id)),
)
.await??;
let skill_path = codex_home
.path()
.join("skills")
.join("demo")
.join("SKILL.md");
std::fs::write(
&skill_path,
"---\nname: demo\ndescription: updated\n---\n\n# Updated\n",
)?;
let notification = timeout(
WATCHER_TIMEOUT,
mcp.read_stream_until_notification_message("skills/changed"),
)
.await??;
let params = notification
.params
.context("skills/changed params must be present")?;
let notification: SkillsChangedNotification = serde_json::from_value(params)?;
assert_eq!(notification, SkillsChangedNotification {});
Ok(())
}