Guardian events were emitted a bit out of order for CommandExecution
items. This would make it hard for the frontend to render a guardian
auto-review, which has this payload:
```
pub struct ItemGuardianApprovalReviewStartedNotification {
pub thread_id: String,
pub turn_id: String,
pub target_item_id: String,
pub review: GuardianApprovalReview,
// FYI this is no longer a json blob
pub action: Option<JsonValue>,
}
```
There is a `target_item_id` the auto-approval review is referring to,
but the actual item had not been emitted yet.
Before this PR:
- `item/autoApprovalReview/started`
- `item/autoApprovalReview/completed`, and if approved...
- `item/started`
- `item/completed`
After this PR:
- `item/started`
- `item/autoApprovalReview/started`
- `item/autoApprovalReview/completed`
- `item/completed`
This lines up much better with existing patterns (i.e. human review in
`Default mode`, where app-server would send a server request to prompt
for user approval after `item/started`), and makes it easier for clients
to render what guardian is actually reviewing.
We do this following a similar pattern as `FileChange` (aka apply patch)
items, where we create a FileChange item and emit `item/started` if we
see the apply patch approval request, before the actual apply patch call
runs.
- create `codex-git-utils` and move the shared git helpers into it with
file moves preserved for diff readability
- move the `GitInfo` helpers out of `core` so stacked rollout work can
depend on the shared crate without carrying its own git info module
---------
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
### Why
i'm working on something that parses and analyzes codex rollout logs,
and i'd like to have a schema for generating a parser/validator.
`codex app-server generate-internal-json-schema` writes an
`RolloutLine.json` file
while doing this, i noticed we have a writer <> reader mismatch issue on
`FunctionCallOutputPayload` and reasoning item ID -- added some schemars
annotations to fix those
### Test
```
$ just codex app-server generate-internal-json-schema --out ./foo
```
generates an `RolloutLine.json` file, which i validated against jsonl
files on disk
`just codex app-server --help` doesn't expose the
`generate-internal-json-schema` option by default, but you can do `just
codex app-server generate-internal-json-schema --help` if you know the
command
everything else still works
---------
Co-authored-by: Codex <noreply@openai.com>
## Description
This PR trims `app-server-protocol`'s v1 surface down to the small set
of legacy types we still actually use.
Unfortunately, we can't delete all of them yet because:
- a few one-off v1 RPCs are still used by the Codex app
- a few of these app-server-protocol v1 types are actually imported by
core crates
This change deletes that unused RPC surface, keeps the remaining
compatibility types in place, and makes the crate root re-export only
the v1 structs that downstream crates still depend on.
## Why
The main goal here is to make the legacy protocol surface match reality.
Leaving a large pile of dead v1 structs in place makes it harder to tell
which compatibility paths are still intentional, and it keeps old
schema/types around even though nothing should be building against them
anymore.
This also gives us a cleaner boundary for future cleanup. Instead of
re-exporting all of `protocol::v1::*`, the crate now explicitly exposes
only the v1 types that are still live, which makes it much easier to see
what remains and delete more safely later.
## What changed
- Deleted the unused v1 RPC/request/response structs from
`app-server-protocol/src/protocol/v1.rs`.
- Kept the small set of v1 compatibility types that are still live,
including:
- `initialize`
- `getConversationSummary`
- `getAuthStatus`
- `gitDiffToRemote`
- legacy approval payloads
- config-related structs still used by downstream crates
- Replaced the blanket `pub use protocol::v1::*` export in
`app-server-protocol/src/lib.rs` with an explicit list of the remaining
supported v1 types.
- Regenerated the schema/type artifacts, which also updated the
`InitializeCapabilities` opt-out example to use `thread/started` instead
of the old `codex/event/session_configured` example.
## Validation
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
## Follow-up
The next cleanup is to keep shrinking the remaining v1 compatibility
surface as callers migrate off it. Once the remaining consumers stop
importing these legacy types, we should be able to remove more of the v1
module and eventually stop exporting it from the crate root entirely.
## What changed
- TypeScript schema fixture generation now goes through in-memory tree
helpers rather than a heavier on-disk generation path.
- The comparison logic normalizes generated banner and path differences
that are not semantically relevant to the exported schema.
- TypeScript and JSON fixture coverage are split into separate tests,
and the expensive schema-export tests are serialized in `nextest`.
## Why this fixes the flake
- The original fixture coverage mixed several heavy codegen paths into
one monolithic test and then compared generated output that included
incidental banner/path differences.
- On Windows CI, that combination created both runtime pressure and
output variance unrelated to the schema shapes we actually care about.
- Splitting the coverage isolates failures by format, in-memory
generation reduces filesystem churn, normalization strips generator
noise, and serializing the heavy tests removes parallel resource
contention.
## Scope
- Production helper change plus test changes.
Adding a `--experimental` flag to the `generate-ts` fct in the
app-sever.
It can be called through one of those 2 command
```
just write-app-server-schema --experimental
codex app-server generate-ts --experimental
```
## Problem being solved
- We need a single, reliable way to mark app-server API surface as
experimental so that:
1. the runtime can reject experimental usage unless the client opts in
2. generated TS/JSON schemas can exclude experimental methods/fields for
stable clients.
Right now that’s easy to drift or miss when done ad-hoc.
## How to declare experimental methods and fields
- **Experimental method**: add `#[experimental("method/name")]` to the
`ClientRequest` variant in `client_request_definitions!`.
- **Experimental field**: on the params struct, derive `ExperimentalApi`
and annotate the field with `#[experimental("method/name.field")]` + set
`inspect_params: true` for the method variant so
`ClientRequest::experimental_reason()` inspects params for experimental
fields.
## How the macro solves it
- The new derive macro lives in
`codex-rs/codex-experimental-api-macros/src/lib.rs` and is used via
`#[derive(ExperimentalApi)]` plus `#[experimental("reason")]`
attributes.
- **Structs**:
- Generates `ExperimentalApi::experimental_reason(&self)` that checks
only annotated fields.
- The “presence” check is type-aware:
- `Option<T>`: `is_some_and(...)` recursively checks inner.
- `Vec`/`HashMap`/`BTreeMap`: must be non-empty.
- `bool`: must be `true`.
- Other types: considered present (returns `true`).
- Registers each experimental field in an `inventory` with `(type_name,
serialized field name, reason)` and exposes `EXPERIMENTAL_FIELDS` for
that type. Field names are converted from `snake_case` to `camelCase`
for schema/TS filtering.
- **Enums**:
- Generates an exhaustive `match` returning `Some(reason)` for annotated
variants and `None` otherwise (no wildcard arm).
- **Wiring**:
- Runtime gating uses `ExperimentalApi::experimental_reason()` in
`codex-rs/app-server/src/message_processor.rs` to reject requests unless
`InitializeParams.capabilities.experimental_api == true`.
- Schema/TS export filters use the inventory list and
`EXPERIMENTAL_CLIENT_METHODS` from `client_request_definitions!` to
strip experimental methods/fields when `experimental_api` is false.
Similar to what @sayan-oai did in openai/codex#8956 for
`config.schema.json`, this PR updates the repo so that it includes the
output of `codex app-server generate-json-schema` and `codex app-server
generate-ts` and adds a test to verify it is in sync with the current
code.
Motivation:
- This makes any schema changes introduced by a PR transparent during
code review.
- In particular, this should help us catch PRs that would introduce a
non-backwards-compatible change to the app schema (eventually, this
should also be enforced by tooling).
- Once https://github.com/openai/codex/pull/10231 is in to formalize the
notion of "experimental" fields, we can work on ensuring the
non-experimental bits are backwards-compatible.
`codex-rs/app-server-protocol/tests/schema_fixtures.rs` was added as the
test and `just write-app-server-schema` can be use to generate the
vendored schema files.
Incidentally, when I run:
```
rg _ codex-rs/app-server-protocol/schema/typescript/v2
```
I see a number of `snake_case` names that should be `camelCase`.
This PR allows clients to render historical messages when resuming a
thread via `thread/resume` by reading from the list of `EventMsg`
payloads loaded from the rollout, and then transforming them into Turns
and ThreadItems to be returned on the `Thread` object.
This is implemented by leveraging `SessionConfiguredNotification` which
returns this list of `EventMsg` objects when resuming a conversation,
and then applying a stateful `ThreadHistoryBuilder` that parses from
this EventMsg log and transforms it into Turns and ThreadItems.
Note that we only persist a subset of `EventMsg`s in a rollout as
defined in `policy.rs`, so we lose fidelity whenever we resume a thread
compared to when we streamed the thread's turns originally. However,
this behavior is at parity with the legacy API.
Add annotations and an export script that let us generate app-server
protocol types as typescript and JSONSchema.
The script itself is a bit hacky because we need to manually label some
of the types. Unfortunately it seems that enum variants don't get good
names by default and end up with something like `EventMsg1`,
`EventMsg2`, etc. I'm not an expert in this by any means, but since this
is only run manually and we already need to enumerate the types required
to describe the protocol, it didn't seem that much worse. An ideal
solution here would be to have some kind of root that we could generate
schemas for in one go, but I'm not sure if that's compatible with how we
generate the protocol today.
We continue the separation between `codex app-server` and `codex
mcp-server`.
In particular, we introduce a new crate, `codex-app-server-protocol`,
and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it
`codex-rs/app-server-protocol/src/protocol.rs`.
Because `ConversationId` was defined in `mcp_protocol.rs`, we move it
into its own file, `codex-rs/protocol/src/conversation_id.rs`, and
because it is referenced in a ton of places, we have to touch a lot of
files as part of this PR.
We also decide to get away from proper JSON-RPC 2.0 semantics, so we
also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which
is basically the same `JSONRPCMessage` type defined in `mcp-types`
except with all of the `"jsonrpc": "2.0"` removed.
Getting rid of `"jsonrpc": "2.0"` makes our serialization logic
considerably simpler, as we can lean heavier on serde to serialize
directly into the wire format that we use now.