Commit Graph

15 Commits

Author SHA1 Message Date
rhan-oai
99016ec732 [codex-analytics] plumb protocol-native review timing (#21434)
## Why

We want terminal tool review analytics, but the reducer should not stamp
review timing from its own wall clock.

This PR plumbs review timing through the real protocol and app-server
seams so downstream analytics can consume the emitter's timestamps
directly. Guardian reviews keep their enriched `started_at` /
`completed_at` analytics fields by deriving those legacy second-based
values from the same protocol-native millisecond lifecycle timestamps,
rather than sampling a separate analytics clock.

## What changed

- add `started_at_ms` to user approval request payloads
- add `started_at_ms` / `completed_at_ms` to guardian review
notifications
- preserve Guardian review `started_at` / `completed_at` enrichment from
the protocol-native timing source
- stamp typed `ServerResponse` analytics facts with app-server-observed
`completed_at_ms`
- thread the new timing fields through core, protocol, app-server, TUI,
and analytics fixtures

## Verification

- `cargo test -p codex-app-server outgoing_message --manifest-path
codex-rs/Cargo.toml`
- `cargo test -p codex-app-server-protocol guardian --manifest-path
codex-rs/Cargo.toml`
- `cargo test -p codex-tui guardian --manifest-path codex-rs/Cargo.toml`
- `cargo test -p codex-analytics analytics_client_tests --manifest-path
codex-rs/Cargo.toml`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/21434).
* #18748
* __->__ #21434
* #18747
* #17090
* #17089
* #20514
2026-05-07 20:31:41 -07:00
Eric Traut
3f8c06e457 Fix /review interrupt and TUI exit wedges (#18921)
Addresses #11267

## Summary
`/review` can be interrupted while it is still spawning the review
sub-agent. That spawn path lives in `codex-core` and did not observe the
task cancellation token until after `Codex::spawn` returned, so an
interrupted review could keep building a child session and leave the TUI
in a wedged state.

The TUI exit path also waited indefinitely for app-server
`thread/unsubscribe`, which made Ctrl+C look broken if the app-server
was already stuck. This makes interactive delegate startup
cancellation-aware and bounds the TUI shutdown-first unsubscribe wait
with a short UI escape-hatch timeout.

## Testing
I reproed the hang using the steps in the bug report. Confirmed hang no
longer exists after fix.
2026-04-23 13:28:12 -07:00
Dylan Hurd
5e71da1424 feat(request-permissions) approve with strict review (#19050)
## Summary
Allow the user to approve a request_permissions_tool request with the
condition that all commands in the rest of the turn are reviewed by
guardian, regardless of sandbox status.

## Testing
- [x] Added unit tests
- [x] Ran locally
2026-04-23 01:56:32 +00:00
Won Park
83ec1eb5d6 Rename approvals reviewer variant to auto-review (#19056)
## Why

`approvals_reviewer` now uses `auto_review` as the canonical config/API
value after #18504, but the Rust enum variant and nearby helper/test
names still used `GuardianSubagent` / guardian approval wording. That
made follow-up code and reviews confusing even though the external value
had already moved to Auto-review.

## What changed

- Renamed `ApprovalsReviewer::GuardianSubagent` to
`ApprovalsReviewer::AutoReview`.
- Updated protocol, app-server, config, core, TUI, exec, and analytics
test callsites.
- Renamed nearby helper/test names from guardian approval wording to
Auto-review wording where they refer to the approvals reviewer mode.
- Preserved wire compatibility:
  - `auto_review` remains the canonical serialized value.
  - `guardian_subagent` remains accepted as a legacy alias.

This intentionally does not rename the `[features].guardian_approval`
key, `Feature::GuardianApproval`, `core/src/guardian`, analytics event
names, or app-server Guardian review event types.

## Verification

- `cargo test -p codex-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent`
- `cargo test -p codex-app-server-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent`
- `cargo test -p codex-config approvals_reviewer`
- `cargo test -p codex-tui update_feature_flags`
- `cargo test -p codex-core permissions_instructions`
- `cargo test -p codex-tui permissions_selection`
2026-04-22 17:22:35 -07:00
Michael Bolin
f8562bd47b sandboxing: intersect permission profiles semantically (#18275)
## Why

Permission approval responses must not be able to grant more access than
the tool requested. Moving this flow to `PermissionProfile` means the
comparison must be profile-shaped instead of `SandboxPolicy`-shaped, and
cwd-relative special paths such as `:cwd` and `:project_roots` must stay
anchored to the turn that produced the request.

## What changed

This implements semantic `PermissionProfile` intersection in
`codex-sandboxing` for file-system and network permissions. The
intersection accepts narrower path grants, rejects broader grants,
preserves deny-read carve-outs and glob scan depth, and materializes
cwd-dependent special-path grants to absolute paths before they can be
recorded for reuse.

The request-permissions response paths now use that intersection
consistently. App-server captures the request turn cwd before waiting
for the client response, includes that cwd in the v2 approval params,
and core stores the requested profile plus cwd for direct TUI/client
responses and Guardian decisions before recording turn- or
session-scoped grants. The TUI app-server bridge now preserves the
app-server request cwd when converting permission approval params into
core events.

## Verification

- `cargo test -p codex-sandboxing intersect_permission_profiles --
--nocapture`
- `cargo test -p codex-app-server request_permissions_response --
--nocapture`
- `cargo test -p codex-core
request_permissions_response_materializes_session_cwd_grants_before_recording
-- --nocapture`
- `cargo check -p codex-tui --tests`
- `cargo check --tests`
- `cargo test -p codex-tui
app_server_request_permissions_preserves_file_system_permissions`
2026-04-21 10:23:01 -07:00
pakrym-oai
71e4c6fa17 Move codex module under session (#18249)
## Summary
- rename the core codex module root to session/mod.rs without using
#[path]
- move the codex module directory and tests under core/src/session
- remove session/mod.rs reexports so call sites use explicit child
module paths

## Testing
- cargo test -p codex-core --lib
- cargo check -p codex-core --tests
- just fmt
- just fix -p codex-core
- git diff --check
2026-04-17 16:18:53 +00:00
pakrym-oai
dd1321d11b Spread AbsolutePathBuf (#17792)
Mechanical change to promote absolute paths through code.
2026-04-14 14:26:10 -07:00
Owen Lin
a3be74143a fix(guardian, app-server): introduce guardian review ids (#17298)
## Description

This PR introduces `review_id` as the stable identifier for guardian
reviews and exposes it in app-server `item/autoApprovalReview/started`
and `item/autoApprovalReview/completed` events.

Internally, guardian rejection state is now keyed by `review_id` instead
of the reviewed tool item ID. `target_item_id` is still included when a
review maps to a concrete thread item, but it is no longer overloaded as
the review lifecycle identifier.

## Motivation

We'd like to give users the ability to preempt a guardian review while
it's running (approve or decline).

However, we can't implement the API that allows the user to override a
running guardian review because we didn't have a unique `review_id` per
guardian review. Using `target_item_id` is not correct since:
- with execve reviews, there can be multiple execve calls (and therefore
guardian reviews) per shell command
- with network policy reviews, there is no target item ID

The PR that actually implements user overrides will use `review_id` as
the stable identifier.
2026-04-10 16:21:02 -07:00
maja-openai
dcbc91fd39 Update guardian output schema (#17061)
## Summary
- Update guardian output schema to separate risk, authorization,
outcome, and rationale.
- Feed guardian rationale into rejection messages.
- Split the guardian policy into template and tenant-config sections.

## Validation
- `cargo test -p codex-core mcp_tool_call`
- `env -u CODEX_SANDBOX_NETWORK_DISABLED INSTA_UPDATE=always cargo test
-p codex-core guardian::`

---------

Co-authored-by: Owen Lin <owen@openai.com>
2026-04-08 15:47:29 -07:00
rhan-oai
756c45ec61 [codex-analytics] add protocol-native turn timestamps (#16638)
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/16638).
* #16870
* #16706
* #16659
* #16641
* #16640
* __->__ #16638
2026-04-06 16:22:59 -07:00
Owen Lin
30f6786d62 fix(guardian): make GuardianAssessmentEvent.action strongly typed (#16448)
## Description

Previously the `action` field on `EventMsg::GuardianAssessment`, which
describes what Guardian is reviewing, was typed as an arbitrary JSON
blob. This PR cleans it up and defines a sum type representing all the
various actions that Guardian can review.

This is a breaking change (on purpose), which is fine because:
- the Codex app / VSCE does not actually use `action` at the moment
- the TUI code that consumes `action` is updated in this PR as well
- rollout files that serialized old `EventMsg::GuardianAssessment` will
just silently drop these guardian events
- the contract is defined as unstable, so other clients have a fair
warning :)

This will make things much easier for followup Guardian work.

## Why

The old guardian review payloads worked, but they pushed too much shape
knowledge into downstream consumers. The TUI had custom JSON parsing
logic for commands, patches, network requests, and MCP calls, and the
app-server protocol was effectively just passing through an opaque blob.

Typing this at the protocol boundary makes the contract clearer.
2026-04-01 15:42:18 -07:00
Michael Bolin
5906c6a658 chore: remove skill metadata from command approval payloads (#15906)
## Why

This is effectively a follow-up to
[#15812](https://github.com/openai/codex/pull/15812). That change
removed the special skill-script exec path, but `skill_metadata` was
still being threaded through command-approval payloads even though the
approval flow no longer uses it to render prompts or resolve decisions.

Keeping it around added extra protocol, schema, and client surface area
without changing behavior.

Removing it keeps the command-approval contract smaller and avoids
carrying a dead field through app-server, TUI, and MCP boundaries.

## What changed

- removed `ExecApprovalRequestSkillMetadata` and the corresponding
`skillMetadata` field from core approval events and the v2 app-server
protocol
- removed the generated JSON and TypeScript schema output for that field
- updated app-server, MCP server, TUI, and TUI app-server approval
plumbing to stop forwarding the field
- cleaned up tests that previously constructed or asserted
`skillMetadata`

## Testing

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-protocol`
- `cargo test -p codex-app-server-test-client`
- `cargo test -p codex-mcp-server`
- `just argument-comment-lint`
2026-03-26 15:32:03 -07:00
Charley Cunningham
bc24017d64 Add Smart Approvals guardian review across core, app-server, and TUI (#13860)
## Summary
- add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
control for who reviews approval requests
- route Smart Approvals guardian review through core for command
execution, file changes, managed-network approvals, MCP approvals, and
delegated/subagent approval flows
- expose guardian review in app-server with temporary unstable
`item/autoApprovalReview/{started,completed}` notifications carrying
`targetItemId`, `review`, and `action`
- update the TUI so Smart Approvals can be enabled from `/experimental`,
aligned with the matching `/approvals` mode, and surfaced clearly while
reviews are pending or resolved

## Runtime model
This PR does not introduce a new `approval_policy`.

Instead:
- `approval_policy` still controls when approval is needed
- `approvals_reviewer` controls who reviewable approval requests are
routed to:
  - `user`
  - `guardian_subagent`

`guardian_subagent` is a carefully prompted reviewer subagent that
gathers relevant context and applies a risk-based decision framework
before approving or denying the request.

The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
behavior keys off `approvals_reviewer`.

When Smart Approvals is enabled from the TUI, it also switches the
current `/approvals` settings to the matching Smart Approvals mode so
users immediately see guardian review in the active thread:
- `approval_policy = on-request`
- `approvals_reviewer = guardian_subagent`
- `sandbox_mode = workspace-write`

Users can still change `/approvals` afterward.

Config-load behavior stays intentionally narrow:
- plain `smart_approvals = true` in `config.toml` remains just the
rollout/UI gate and does not auto-set `approvals_reviewer`
- the deprecated `guardian_approval = true` alias migration does
backfill `approvals_reviewer = "guardian_subagent"` in the same scope
when that reviewer is not already configured there, so old configs
preserve their original guardian-enabled behavior

ARC remains a separate safety check. For MCP tool approvals, ARC
escalations now flow into the configured reviewer instead of always
bypassing guardian and forcing manual review.

## Config stability
The runtime reviewer override is stable, but the config-backed
app-server protocol shape is still settling.

- `thread/start`, `thread/resume`, and `turn/start` keep stable
`approvalsReviewer` overrides
- the config-backed `approvals_reviewer` exposure returned via
`config/read` (including profile-level config) is now marked
`[UNSTABLE]` / experimental in the app-server protocol until we are more
confident in that config surface

## App-server surface
This PR intentionally keeps the guardian app-server shape narrow and
temporary.

It adds generic unstable lifecycle notifications:
- `item/autoApprovalReview/started`
- `item/autoApprovalReview/completed`

with payloads of the form:
- `{ threadId, turnId, targetItemId, review, action? }`

`review` is currently:
- `{ status, riskScore?, riskLevel?, rationale? }`
- where `status` is one of `inProgress`, `approved`, `denied`, or
`aborted`

`action` carries the guardian action summary payload from core when
available. This lets clients render temporary standalone pending-review
UI, including parallel reviews, even when the underlying tool item has
not been emitted yet.

These notifications are explicitly documented as `[UNSTABLE]` and
expected to change soon.

This PR does **not** persist guardian review state onto `thread/read`
tool items. The intended follow-up is to attach guardian review state to
the reviewed tool item lifecycle instead, which would improve
consistency with manual approvals and allow thread history / reconnect
flows to replay guardian review state directly.

## TUI behavior
- `/experimental` exposes the rollout gate as `Smart Approvals`
- enabling it in the TUI enables the feature and switches the current
session to the matching Smart Approvals `/approvals` mode
- disabling it in the TUI clears the persisted `approvals_reviewer`
override when appropriate and returns the session to default manual
review when the effective reviewer changes
- `/approvals` still exposes the reviewer choice directly
- the TUI renders:
- pending guardian review state in the live status footer, including
parallel review aggregation
  - resolved approval/denial state in history

## Scope notes
This PR includes the supporting core/runtime work needed to make Smart
Approvals usable end-to-end:
- shell / unified-exec / apply_patch / managed-network / MCP guardian
review
- delegated/subagent approval routing into guardian review
- guardian review risk metadata and action summaries for app-server/TUI
- config/profile/TUI handling for `smart_approvals`, `guardian_approval`
alias migration, and `approvals_reviewer`
- a small internal cleanup of delegated approval forwarding to dedupe
fallback paths and simplify guardian-vs-parent approval waiting (no
intended behavior change)

Out of scope for this PR:
- redesigning the existing manual approval protocol shapes
- persisting guardian review state onto app-server `ThreadItem`s
- delegated MCP elicitation auto-review (the current delegated MCP
guardian shim only covers the legacy `RequestUserInput` path)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-13 15:27:00 -07:00
Jack Mousseau
7c7e267501 Simplify permissions available in request permissions tool (#14529) 2026-03-12 21:13:17 -07:00
Michael Bolin
0c8a36676a fix: move inline codex-rs/core unit tests into sibling files (#14444)
## Why
PR #13783 moved the `codex.rs` unit tests into `codex_tests.rs`. This
applies the same extraction pattern across the rest of `codex-rs/core`
so the production modules stay focused on runtime code instead of large
inline test blocks.

Keeping the tests in sibling files also makes follow-up edits easier to
review because product changes no longer have to share a file with
hundreds or thousands of lines of test scaffolding.

## What changed
- replaced each inline `mod tests { ... }` in `codex-rs/core/src/**`
with a path-based module declaration
- moved each extracted unit test module into a sibling `*_tests.rs`
file, using `mod_tests.rs` for `mod.rs` modules
- preserved the existing `cfg(...)` guards and module-local structure so
the refactor remains structural rather than behavioral

## Testing
- `cargo test -p codex-core --lib` (`1653 passed; 0 failed; 5 ignored`)
- `just fix -p codex-core`
- `cargo fmt --check`
- `cargo shear`
2026-03-12 08:16:36 -07:00