mirror of
https://github.com/openai/codex.git
synced 2026-05-23 20:44:50 +00:00
2ac71d4e0b158d8b6b9e9eb119070dba936e6fa4
35 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
2ac71d4e0b | Reject local-only app-server APIs without local env | ||
|
|
3fd79b7986 |
app-server: use profile ids in v2 permission params (#23360)
## Why The v2 app-server permission profile fields are experimental, but the previous migration kept a legacy object payload for profile selection. That made clients aware of server-owned `activePermissionProfile` metadata such as `extends`, and it kept a `legacy_additional_writable_roots` path even though `runtimeWorkspaceRoots` now owns runtime workspace-root selection. This PR makes the client contract match the intended model: clients select a permission profile by id, and the server resolves and reports active profile provenance in response payloads. Follow-up to #22611. ## What Changed - Changed `thread/start`, `thread/resume`, `thread/fork`, and `turn/start` permission profile selection to plain profile id strings. - Changed `command/exec.permissionProfile` to a plain profile id string for the same client/server ownership split. - Removed `PermissionProfileSelectionParams` and the legacy `{ type: "profile", modifications: [...] }` compatibility deserializer. - Updated app-server, TUI, and `codex exec` call sites to send only ids, while keeping `activePermissionProfile` as server response metadata. - Updated app-server docs and schema fixtures for the revised `command/exec.permissionProfile` shape. ## Verification - `cargo test -p codex-app-server-protocol` - `RUST_MIN_STACK=8388608 cargo test -p codex-app-server` - `cargo test -p codex-exec` - `RUST_MIN_STACK=8388608 cargo test -p codex-tui` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23360). * #23368 * __->__ #23360 |
||
|
|
83bbb4f326 |
app-server: stop returning thread permission profiles (#22792)
## Why The app-server thread lifecycle API should no longer expose the full `PermissionProfile` value. After the permissions-profile migration, clients should round-trip only the active profile identity through `activePermissionProfile` and `permissions` when that identity is known. The full profile is server-side config. Treating a response-derived legacy sandbox projection as a new local profile can lose named-profile restrictions and accidentally widen permissions on the next turn. The legacy `sandbox` response field remains only as the compatibility/display fallback. ## What Changed - Removed `permissionProfile` from `ThreadStartResponse`, `ThreadResumeResponse`, and `ThreadForkResponse`. - Stopped populating that field in app-server thread start/resume/fork responses. - Updated embedded exec/TUI response mapping to derive display permission state from local config or the legacy sandbox fallback instead of a response profile value. - Added a TUI turn override shape that distinguishes preserving server permissions, selecting an active profile id, and sending a legacy sandbox for an explicit local override. - Preserved remote app-server permissions across turns by sending `permissions` only when an `activePermissionProfile` id is known, and otherwise sending no sandbox override unless the user selected a local override. - Kept embedded `thread/resume` hydration server-authored when `activePermissionProfile` is absent, which matches the live-thread attach path where the server ignores requested overrides. - Updated the app-server README to remove the obsolete lifecycle response `permissionProfile` reference. The remaining `permissionProfile` README references are request-side permission overrides. - Regenerated app-server JSON schema and TypeScript fixtures. - Kept the generated typed response enum exempt from `large_enum_variant`, matching the existing payload enum exemption after the lifecycle response variants shrank. ## How To Review Start with `codex-rs/app-server-protocol/src/protocol/v2/thread.rs` to confirm the response shape, then check the response construction in `codex-rs/app-server/src/request_processors`. The generated schema and TypeScript fixture changes are mechanical follow-through from the protocol removal. The TUI behavior is the delicate part: review `codex-rs/tui/src/app_server_session.rs` for response hydration and turn-start override projection, then `codex-rs/tui/src/app/thread_routing.rs` for the decision about whether the next turn should preserve the server snapshot, send an active profile id, or send a legacy sandbox for an explicit local override. ## Verification - `just write-app-server-schema` - `cargo test -p codex-app-server-protocol thread_lifecycle_responses_default_missing_optional_fields` - `cargo test -p codex-exec session_configured_from_thread_response_uses_permission_profile_from_config` - `cargo test -p codex-tui --lib thread_response` - `cargo test -p codex-tui turn_permissions_` - `cargo test -p codex-tui resume_response_restores_turns_from_thread_items` - `cargo test -p codex-analytics track_response_only_enqueues_analytics_relevant_responses` - `just fix -p codex-analytics` - `just fix -p codex-app-server-protocol` - `just fix -p codex-tui` - `just argument-comment-lint` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22792). * #22795 * __->__ #22792 |
||
|
|
8a5306ff88 |
app-server: use permission ids and runtime workspace roots (#22611)
## Why This PR builds on [#22610](https://github.com/openai/codex/pull/22610) and is the app-server side of the migration from mutable per-turn `SandboxPolicy` replacement toward selecting immutable permission profiles by id plus mutable runtime workspace roots. Once permission profiles can carry their own immutable `workspace_roots`, app-server no longer needs to mutate the selected `PermissionProfile` just to represent thread-specific filesystem context. The mutable part now lives on the thread as explicit `runtimeWorkspaceRoots`, while `:workspace_roots` remains symbolic until the sandbox is realized for a turn. ## What Changed - Replaced the v2 permission-selection wrapper surface with plain profile ids for `thread/start`, `thread/resume`, `thread/fork`, and `turn/start`. - Removed the API surface for profile modifications (`PermissionProfileSelectionParams`, `PermissionProfileModificationParams`, `ActivePermissionProfileModification`). - Added experimental `runtimeWorkspaceRoots` fields to the thread lifecycle and turn-start APIs. - Threaded runtime workspace roots through core session/thread snapshots, turn overrides, app-server request handling, and command execution permission resolution. - Kept session permission state symbolic so later runtime root updates and cwd-only implicit-root retargeting rebind `:workspace_roots` correctly. - Updated the embedded clients just enough to send and restore the new thread state. - Refreshed the generated schema/TypeScript artifacts and the app-server README to match the new contract. ## Verification Targeted coverage for this layer lives in: - `codex-rs/app-server-protocol/src/protocol/v2/tests.rs` - `codex-rs/app-server/tests/suite/v2/thread_start.rs` - `codex-rs/app-server/tests/suite/v2/thread_resume.rs` - `codex-rs/app-server/tests/suite/v2/turn_start.rs` - `codex-rs/core/src/session/tests.rs` The key regression checks exercise that: - `runtimeWorkspaceRoots` resolve against the effective cwd on thread start. - Profile-declared workspace roots are excluded from the runtime workspace roots returned by app-server. - A turn-level runtime workspace-root update persists onto the thread and is returned by `thread/resume`. - A named permission profile selected on one turn remains symbolic so a later runtime-root-only turn update changes the actual sandbox writes. - A cwd-only turn update retargets the implicit runtime cwd root while preserving additional runtime roots. - The protocol fixtures and generated client artifacts stay in sync with the string-based permission selection contract. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22611). * #22612 * __->__ #22611 |
||
|
|
c25d905f61 |
permissions: support workspace roots in profiles (#22610)
## Why This is the configuration/model half of the alternative permissions migration we discussed as a comparison point for [#22401](https://github.com/openai/codex/pull/22401) and [#22402](https://github.com/openai/codex/pull/22402). The old `workspace-write` model mixes three concerns that we want to keep separate: - reusable profile rules that should stay immutable once selected - user/runtime workspace roots from `cwd`, `--add-dir`, and legacy workspace-write config - internal Codex writable roots such as memories, which should not be shown as user workspace roots This PR gives permission profiles first-class `workspace_roots` so users can opt multiple repositories into the same `:workspace_roots` rules without using broad absolute-path write grants. It also starts separating the raw selected profile from the effective runtime profile by making `Permissions` expose explicit accessors instead of public mutable fields. A representative `config.toml` looks like this: ```toml default_permissions = "dev" [permissions.dev.workspace_roots] "~/code/openai" = true "~/code/developers-website" = true [permissions.dev.filesystem.":workspace_roots"] "." = "write" ".codex" = "read" ".git" = "read" ".vscode" = "read" ``` If Codex starts in `~/code/codex` with that profile selected, the effective workspace-root set becomes: - `~/code/codex` from the runtime `cwd` - `~/code/openai` from the profile - `~/code/developers-website` from the profile The `:workspace_roots` rules are materialized across each root, so `.git`, `.codex`, and `.vscode` stay scoped the same way everywhere. Runtime additions such as `--add-dir` can still layer on later stack entries without mutating the selected profile. ## Stack Shape This PR intentionally stops before the profile-identity cleanup in [#22683](https://github.com/openai/codex/pull/22683) so the base review stays focused on config loading, workspace-root materialization, and compatibility with legacy `workspace-write`. The representation in this PR is therefore transitional: `Permissions` carries enough state to distinguish the raw constrained profile from the effective runtime profile, and there are still call sites that must keep the active profile identity and constrained profile value in sync. The follow-up PR replaces that with a single resolved profile state (`ResolvedPermissionProfile` / `PermissionProfileState`) that keeps the profile id, immutable `PermissionProfile`, and profile-declared workspace roots together. That follow-up removes APIs such as `set_constrained_permission_profile_with_active_profile()` where separate arguments could drift out of sync. Downstream PRs then build on this base to switch app-server turn updates to profile ids plus runtime workspace roots and to finish the user-visible summary behavior. Reviewers should judge this PR as the workspace-roots foundation, not as the final in-memory shape of selected permission profiles. ## Review Guide Suggested review order: 1. Start with `codex-rs/core/src/config/mod.rs`. This is the main shape change in the base slice. `Permissions` now stores a private raw `Constrained<PermissionProfile>` plus runtime `workspace_roots`. Callers use `permission_profile()` when they need the raw constrained value and `effective_permission_profile()` when they need a materialized runtime profile. As noted above, [#22683](https://github.com/openai/codex/pull/22683) replaces this transitional shape with a resolved profile state that keeps identity and profile data together. 2. Review `codex-rs/config/src/permissions_toml.rs` and `codex-rs/core/src/config/permissions.rs`. These add `[permissions.<id>.workspace_roots]`, resolve enabled entries relative to the policy cwd, and keep `:workspace_roots` deny-read glob patterns symbolic until the actual roots are known. 3. Review `codex-rs/protocol/src/permissions.rs` and `codex-rs/protocol/src/models.rs`. These add the policy/profile materialization helpers that expand exact `:workspace_roots` entries and scoped deny-read globs over every workspace root. This is also where `ActivePermissionProfileModification` is removed from the core model. 4. Review the legacy bridge in `Config::load_from_base_config_with_overrides` and `Config::set_legacy_sandbox_policy`. This is where legacy `workspace-write` roots become runtime workspace roots, while Codex internal writable roots stay internal and do not appear as user-facing workspace roots. 5. Then skim downstream call sites. The interesting pattern is raw-vs-effective access: state/proxy/bwrap paths keep the raw constrained profile, while execution, summaries, and user-visible status use the effective profile and workspace-root list. ## What Changed - added `[permissions.<id>.workspace_roots]` to the config model and schema - added runtime `workspace_roots` state to `Config`/`Permissions` and `ConfigOverrides` - made `Permissions` profile fields private and replaced direct mutation with accessors/setters - added `PermissionProfile` and `FileSystemSandboxPolicy` helpers for materializing `:workspace_roots` exact paths and deny-read globs across all roots - moved legacy additional writable roots into runtime workspace-root state instead of active profile modifications - removed `ActivePermissionProfileModification` and its app-server protocol/schema export - updated sandbox/status summary paths so internal writable roots are not reported as user workspace roots ## Verification Strategy The targeted tests cover the behavior at the layers where regressions are most likely: - `codex-rs/core/src/config/config_tests.rs` verifies config loading, legacy workspace-root seeding, effective profile materialization, and memory-root handling. - `codex-rs/core/src/config/permissions_tests.rs` verifies profile `workspace_roots` parsing and `:workspace_roots` scoped/glob compilation. - `codex-rs/protocol/src/permissions.rs` unit tests verify exact and glob materialization over multiple workspace roots. - `codex-rs/tui/src/status/tests.rs` and `codex-rs/utils/sandbox-summary/src/sandbox_summary.rs` verify the user-facing summaries show effective workspace roots and hide internal writes. I also ran `cargo check --tests` locally after the latest stack refresh to catch cross-crate API breakage from the private-field/accessor changes. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22610). * #22612 * #22611 * #22683 * __->__ #22610 |
||
|
|
01d93fd9fc |
permissions: canonicalize workspace_roots and danger-full-access names (#22624)
## Why This is a small precursor to the larger permissions-migration work. Both the comparison stack in [#22401](https://github.com/openai/codex/pull/22401) / [#22402](https://github.com/openai/codex/pull/22402) and the alternate stack in [#22610](https://github.com/openai/codex/pull/22610) / [#22611](https://github.com/openai/codex/pull/22611) / [#22612](https://github.com/openai/codex/pull/22612) are easier to review if the terminology is already settled underneath them. Because `:project_roots` and `:danger-no-sandbox` have not shipped as stable user-facing surface area, carrying them forward as aliases would just add more migration logic to the later stacks. This PR removes that ambiguity now so the follow-on work can rely on one spelling for each built-in concept. ## What Changed - renamed the config-facing special filesystem key from `:project_roots` to `:workspace_roots` - dropped unpublished `:project_roots` parsing support in `core/src/config/permissions.rs`, so new config only recognizes `:workspace_roots` - renamed the built-in full-access permission profile id from `:danger-no-sandbox` to `:danger-full-access` - dropped unpublished `:danger-no-sandbox` support entirely, including the old active-profile canonicalization path, and added explicit rejection coverage for the legacy id - introduced shared built-in permission-profile id constants in `codex-rs/protocol/src/models.rs` - updated `core`, `app-server`, and `tui` call sites that special-case built-in profiles to use the shared constants and canonical ids - updated tests and the Linux sandbox README to use `:workspace_roots` / `:danger-full-access` ## Verification I focused verification on the three places this rename can regress: config parsing, active-profile identity surfaced back out of `core`, and user/server call sites that special-case built-in profiles. Targeted checks: - `config::tests::default_permissions_can_select_builtin_profile_without_permissions_table` - `config::tests::default_permissions_read_only_applies_additional_writable_roots_as_modifications` - `config::tests::default_permissions_can_select_builtin_full_access_profile` - `config::tests::legacy_danger_no_sandbox_is_rejected` - `workspace_root` filtered `codex-core` tests - `request_processors::thread_processor::thread_processor_tests::thread_processor_behavior_tests::requested_permissions_trust_project_uses_permission_profile_intent` - `suite::v2::turn_start::turn_start_rejects_invalid_permission_selection_before_starting_turn` - `status::tests::status_snapshot_shows_auto_review_permissions` - `status::tests::status_permissions_full_disk_managed_with_network_is_danger_full_access` - `app_server_session::tests::embedded_turn_permissions_use_active_profile_selection` |
||
|
|
c51c65ad09 |
Unify thread metadata updates above store (#22236)
- make ThreadStore::update_thread_metadata accept a broad range of metadata patches - keep ThreadStore::append_items as raw canonical history append (no metadata side effects) - in the local store, write these metadata updates to a combination of sqlite and rollout jsonl files for backwards-compat. It special cases which fields need to go into jsonl vs sqlite vs whatever, confining the awkwardness to just this implementation - in remote stores we can simply persist the metadata directly to a database, no special casing required. - move the "implicit metadata updates triggered by appending rollout items" from the RolloutRecorder (which is local-threadstore-specific) to the LiveThread layer above the ThreadStore, inside of a private helper utility called ThreadMetadataSync. LiveThread calls ThreadStore append_items and update_metadata separately. - Add a generic update metadata method to ThreadManager that works on both live threads and "cold" threads - Call that ThreadManager method from app server code, so app server doesn't need to worry about whether the thread is live or not |
||
|
|
e15ecc9c35 |
Add production startup and TTFT telemetry (#22198)
## Why While investigating `codex exec hi` startup latency, the useful questions were not "is startup slow?" but "which durable bucket is slow in production?" The path we observed has a few distinct stages: 1. `thread/start` creates the session 2. startup prewarm builds the turn context, tools, and prompt 3. startup prewarm warms the websocket 4. the first real turn resolves the prewarm 5. the model produces the first token Before this PR, production telemetry had some of the raw measurements already: - aggregate startup-prewarm duration / age-at-first-turn metrics - TTFT as a metric - websocket request telemetry But there was no coherent production event stream for the startup breakdown itself, and TTFT was metric-only. That made it hard to answer the same latency questions from OpenTelemetry-backed logs without adding one-off local instrumentation. ## What changed Add durable production telemetry on the existing `SessionTelemetry` path: - new `codex.startup_phase` OTel log/trace events plus `codex.startup.phase.duration_ms` - new `codex.turn_ttft` OTel log/trace events while preserving the existing TTFT metric The startup phase event is emitted for the coarse buckets we actually observed while running `exec hi`: - `thread_start_create_thread` - `startup_prewarm_total` - `startup_prewarm_create_turn_context` - `startup_prewarm_build_tools` - `startup_prewarm_build_prompt` - `startup_prewarm_websocket_warmup` - `startup_prewarm_resolve` These phases are intentionally low-cardinality so they remain safe as production telemetry tags. ## Why this shape This keeps the instrumentation on the same production path as the rest of the session telemetry instead of adding a local debug-only trace mode. It also avoids changing startup behavior: - prewarm still runs - no control flow changes - no extra remote calls - no user-visible behavior changes One boundary is intentional: very early process bootstrap that happens before a session exists is not included here, because this PR uses session-scoped production telemetry. The expensive buckets we were trying to understand after `thread/start` are now covered durably. ## Verification - `cargo test -p codex-otel` - `cargo test -p codex-core turn_timing` - `cargo test -p codex-core regular_turn_emits_turn_started_without_waiting_for_startup_prewarm` - `cargo test -p codex-core interrupting_regular_turn_waiting_on_startup_prewarm_emits_turn_aborted` - `cargo test -p codex-app-server thread_start` - `just fix -p codex-otel -p codex-core -p codex-app-server` I also ran `cargo test -p codex-core`; it built successfully and then hit an existing unrelated stack overflow in `tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`. |
||
|
|
7bddb3083d |
fix(app-server): thread history redaction for remote clients (#22178)
## Summary Remote clients can still receive large `thread/resume` histories when prior turns include MCP tool call payloads or image-generation results. This adds a temporary response-only redaction path for the known remote client names. Longer term we will move towards fully paginated APIs backed by SQLite. ## Changes - Redact MCP tool call payload-bearing fields in `thread/resume` responses for `codex_chatgpt_android_remote` and `codex_chatgpt_ios_remote`. - Drop `imageGeneration` items from those `thread/resume` responses. - Keep redaction out of persisted rollout files, `thread/read`, `thread/turns/list`, live notifications, and token usage replay. - Cover the behavior with app-server helper tests and a v2 resume integration test that checks both remote clients plus a non-target control client. ## Testing - `cargo test -p codex-app-server thread_resume_redaction` - `cargo test -p codex-app-server thread_resume_redacts_payloads_for_chatgpt_remote_clients` |
||
|
|
f10ddc3f13 |
Use goal preview metadata for goal-first threads (#21981)
Fixes #20792 ## Why `/goal`-first threads are valid resumable threads, but they can be missing from `codex resume` and app recents because discovery depends on metadata derived from a normal first user message. PR #21489 attempted to fix this by using the goal objective as `first_user_message`. Review feedback pointed out that `first_user_message` does more than provide visible text today: it gates listing, supplies preview text, and participates in deciding whether a later title should surface as a distinct thread name. Reusing it for the goal objective could leave a `/goal`-first thread with `first_user_message=<goal>` and `title=<later prompt>`, even though the goal should only provide the initial visible preview. This PR follows that feedback by and keeps the `first_user_message` as is but introduces a new `preview` field to separate concerns. The `preview` field is populated from the first user message or the goal objective. We can extend it in the future to include other sources. ## What Changed - Added internal thread `preview` metadata in `codex-state`, including a SQLite migration that backfills from `first_user_message` and from existing `thread_goals` objectives when needed. - Treated `ThreadGoalUpdated` as preview-bearing metadata so goal-first threads can be listed and searched without mutating `first_user_message`. - Updated rollout listing, state queries, thread-store conversion, and app-server mapping to use preview metadata while continuing to expose the existing public `preview` field. - Preserved title/name distinctness behavior around literal `first_user_message`, so a later normal prompt after `/goal` does not surface as a separate name just because the goal supplied the initial preview. - Preserved compatibility for older/internal metadata writes by deriving preview from `first_user_message` when explicit preview metadata is absent. ## Verification - Manually verified that a thread that starts with a `/goal <objective>` shows up in the resume picker. |
||
|
|
408e6218ab |
Reapply "Move skills watcher to app-server" (#21652)
## Why PR #21460 reverted the earlier move of skills change watching from `codex-core` into app-server. This reapplies that boundary change so app-server owns client-facing `skills/changed` notifications and core no longer carries the watcher. ## What - Restore the app-server `SkillsWatcher` and register it from thread listener setup. - Remove the core-owned skills watcher and its core live-reload integration surface. - Restore app-server coverage for `skills/changed` notifications after a watched skill file changes. ## Validation - `cargo test -p codex-app-server --test all suite::v2::skills_list::skills_changed_notification_is_emitted_after_skill_change -- --exact --nocapture` - `cargo test -p codex-core --lib --no-run` |
||
|
|
5f4d0ec343 |
[codex] request desktop attestation from app (#20619)
## Summary TL;DR: teaches `codex-rs` / app-server to request a desktop-provided attestation token and attach it as `x-oai-attestation` on the scoped ChatGPT Codex request paths.  ## Details This PR teaches the Codex app-server runtime how to request and attach an attestation token. It does not generate DeviceCheck tokens directly; instead, it relies on the connected desktop app to advertise that it can generate attestation and then asks that app for a fresh header value when needed. The flow is: 1. The Codex desktop app connects to app-server. 2. During `initialize`, the app can advertise that it supports `requestAttestation`. 3. Before app-server calls selected ChatGPT Codex endpoints, it sends the internal server request `attestation/generate` to the app. 4. app-server receives a pre-encoded header value back. 5. app-server forwards that value as `x-oai-attestation` on the scoped outbound requests. The code in this repo is mostly protocol and runtime plumbing: it adds the app-server request/response shape, introduces an attestation provider in core, wires that provider into Responses / compaction / realtime setup paths, and covers the intended scoping with tests. The signed macOS DeviceCheck generation remains owned by the desktop app PR. ## Related PR - Codex desktop app implementation: https://github.com/openai/openai/pull/878649 ## Validation <details> <summary>Tests run</summary> ```sh cargo test -p codex-app-server-protocol cargo test -p codex-core attestation --lib cargo test -p codex-app-server --lib attestation ``` Also ran: ```sh just fix -p codex-core just fix -p codex-app-server just fix -p codex-app-server-protocol just fmt just write-app-server-schema ``` </details> <details> <summary>E2E DeviceCheck validation</summary> First validated the signed desktop app boundary directly: launched a packaged signed `Codex.app`, sent `attestation/generate`, decoded the returned `v1.` attestation header, and validated the extracted DeviceCheck token with `personal/jm/verify_devicecheck_token.py` using bundle ID `com.openai.codex`. Apple returned `status_code: 200` and `is_ok: true`. Then ran the fuller app + app-server flow. The packaged `Codex.app` launched a current-branch app-server via `CODEX_CLI_PATH`, and a local MITM proxy intercepted outbound `chatgpt.com` traffic. The app-server requested `attestation/generate` from the real Electron app process, and the intercepted `/backend-api/codex/responses` traffic included `x-oai-attestation` on both routes: ```text GET /backend-api/codex/responses Upgrade: websocket x-oai-attestation: present POST /backend-api/codex/responses Upgrade: none x-oai-attestation: present ``` The captured header decoded to a DeviceCheck token that also validated with Apple for `com.openai.codex` (`status_code: 200`, `is_ok: true`, team `2DC432GLL2`). </details> --------- Co-authored-by: Codex <noreply@openai.com> |
||
|
|
0d0835dd53 |
feat(app-server, threadstore): Thread pagination APIs and ThreadStore contract (#21566)
## Why The goal of this PR is to align on app-server and `ThreadStore` API updates for paginating through large threads. #### app-server ##### `thread/turns/list` - Updates `thread/turns/list` to support `itemsView?: "notLoaded" | "summary" | "full" | null`, defaulting to `summary`. - Implements the current `thread/turns/list` behavior over the existing persisted rollout-history fallback: - `notLoaded` returns turn envelopes with empty `items`. - `summary` returns the first user message and final assistant message when available. - `full` preserves the existing full item behavior. Note that this method still uses the naive approach of loading the entire rollout file, and returns just the filtered slice of the data. Real pagination will come later by leveraging SQLite. ##### `thread/turns/items/list` - Adds the experimental `thread/turns/items/list` protocol, schema, dispatcher, and processor stub. The app-server currently returns JSON-RPC `-32601` with `thread/turns/items/list is not supported yet`. #### ThreadStore - Adds the experimental `thread/turns/items/list` protocol, schema, dispatcher, and processor stub. The app-server currently returns JSON-RPC `-32601` with `thread/turns/items/list is not supported yet`. - Adds `ThreadStore` contract types and stubbed methods for listing thread turns and listing items within a turn. - Adds a typed `StoredTurnStatus` and `StoredTurnError` to avoid baking app-server API enums or lossy string status values into the store-facing turn contract. - Adds a typed `StoredTurnStatus` and `StoredTurnError` to avoid baking app-server API enums or lossy string status values into the store-facing turn contract. This also sketches the storage abstraction we expect to need once turns are indexed/stored. In particular, `notLoaded` is useful only if ThreadStore can eventually list turn metadata without loading every persisted item for each turn. ## Validation - Added/updated protocol serialization coverage for the new request and response shapes. - Added app-server integration coverage for `thread/turns/list` default summary behavior and all three `itemsView` modes. - Added app-server integration coverage that `thread/turns/items/list` returns the expected unsupported JSON-RPC error when experimental APIs are enabled. - Added thread-store coverage that the default trait methods return `ThreadStoreError::Unsupported`. No developers.openai.com documentation update is needed for this internal experimental app-server API surface. |
||
|
|
0274398901 |
[codex] Fix pathless thread summaries (#21266)
## Summary Fix `getConversationSummary` so thread-id summaries work for stored threads that do not have a local rollout path, such as remote thread stores. The root cause was that `summary_from_stored_thread` returned `None` when `StoredThread.rollout_path` was absent, and `get_thread_summary_response_inner` treated that as an internal error. This made conversation-id lookups depend on a local-only field even though the thread store can address the thread by id. |
||
|
|
56823ec46b |
Move thread name edits to ThreadStore (#21264)
- Route live thread renames through `ThreadStore` metadata updates. - Read resumed thread names from store metadata with legacy local fallback preserved in the store. |
||
|
|
a8488fec5e |
Revert state DB injection and agent graph store (#21481)
## Why Reverts #20689 to restore the previous optional state DB plumbing. The conflict resolution keeps the newer installation ID and session/thread identity changes that landed after #20689, while removing the mandatory state DB and agent graph store dependency from ThreadManager construction. ## What changed - Restored `Option<StateDbHandle>` through app-server, MCP server, prompt debug, and test entry points. - Removed the `codex-core` dependency on `codex-agent-graph-store` and reverted descendant lookup back to the existing state DB path when available. - Kept newer `installation_id` forwarding by passing it beside the optional DB handle. - Kept local thread-name updates working when the optional state DB handle is absent. ## Validation - `git diff --check` - `cargo test -p codex-thread-store` - `cargo test -p codex-state -p codex-rollout -p codex-app-server-protocol` - Attempted `env CARGO_INCREMENTAL=0 cargo test -p codex-core -p codex-app-server -p codex-app-server-client -p codex-mcp-server -p codex-thread-manager-sample -p codex-tui`; blocked locally by a rustc ICE while compiling `v8 v146.4.0` with `rustc 1.93.0 (254b59607 2026-01-19)` on `aarch64-apple-darwin`. |
||
|
|
103dc2b6ae |
Revert "Move skills watcher to app-server" (#21460)
Reverts openai/codex#21287 |
||
|
|
d5eea229cc |
Move skills watcher to app-server (#21287)
## Why Skills update notifications are app-server API behavior, but the watcher lived in `codex-core` and surfaced through `EventMsg::SkillsUpdateAvailable`. Moving the watcher out keeps core focused on thread execution and lets app-server own both cache invalidation and the `skills/changed` notification. ## What changed - Added an app-server-owned skills watcher that watches local skill roots, clears the shared skills cache, and emits `skills/changed` directly. - Registers skill watches from the common app-server thread listener attach path, including direct starts, resumes, and app-server-observed child or forked threads. - Stores the `WatchRegistration` on `ThreadState`, so listener replacement, thread teardown, idle unload, and app-server shutdown deregister by dropping the RAII guard. - Removed `EventMsg::SkillsUpdateAvailable`, the core watcher, and the old core live-reload test. - Extended the app-server skills change test to verify a cached skills list is refreshed after a filesystem change without forcing reload. ## Validation - `cargo check -p codex-core -p codex-app-server -p codex-mcp-server -p codex-rollout -p codex-rollout-trace` - `cargo test -p codex-app-server skills_changed_notification_is_emitted_after_skill_change` |
||
|
|
fbdbc6b2fe |
[codex-analytics] emit tool item events from item lifecycle (#17090)
## Why After the tool-item schemas are in place, analytics needs to emit them from the app-server item lifecycle rather than requiring bespoke tracking at each callsite. The reducer should also reuse the shared thread analytics context introduced below it in the stack so later event families do not repeat the same reducer joins or missing-state ladder. ## What changed - Tracks tool-item completion notifications and emits the matching tool analytics event when a terminal item arrives. - Derives event-specific payload details for command execution, file changes, MCP calls, dynamic tools, collaboration tools, web search, and image generation. - Denormalizes thread, app-server client, runtime, and subagent provenance metadata through the shared thread analytics context. - Adds reducer coverage for item lifecycle emission and subagent metadata inheritance. ## Duration semantics `duration_ms` is computed from the app-server item lifecycle timestamps: `completed_at_ms - started_at_ms`. That makes it the duration of the lifecycle Codex observed locally, not necessarily the upstream provider's full execution time. - Web search usually has a meaningful observed lifecycle because Responses can send `response.output_item.added` before `response.output_item.done`; in that case `started_at_ms` comes from the added event and `completed_at_ms` comes from the done event. - Image generation can be much less precise. In the current observed stream, image generation often arrives only as a completed `response.output_item.done`; when there is no earlier added event, Codex synthesizes the started item immediately before completion, so `duration_ms` can be `0` even though upstream image generation took longer. - Standalone web search and standalone image generation work is expected to land after this stack. Those paths may introduce more direct lifecycle events or timing points, so the current web-search/image-generation duration semantics should be treated as the best available item-lifecycle approximation, not the final latency contract for those tool families. - `execution_duration_ms` is populated only where the completed item already carries a native execution duration; otherwise it remains `null` while `duration_ms` still reflects the local lifecycle interval. ## Currently placeholder / partial fields Some fields are included in the schema for the intended steady-state contract, but this PR does not yet populate them from real approval/review state: - `review_count`, `guardian_review_count`, and `user_review_count` currently default to `0`. - `final_approval_outcome` currently defaults to `unknown`. - `requested_additional_permissions` and `requested_network_access` currently default to `false`. ## Verification - `cargo test -p codex-analytics` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17090). * #18748 * #18747 * __->__ #17090 * #17089 * #20514 |
||
|
|
be1d3cff93 |
2- Use string service tiers in session protocol (#20971)
## Summary - break service tier session/op/app-server protocol fields from the closed enum to string tier ids - send the service tier string directly through model requests, prewarm, compaction, memories, and TUI/app-server turn starts - regenerate app-server protocol JSON/TypeScript schemas, removing the standalone ServiceTier TS enum ## Verification - just fmt - cargo check -p codex-core -p codex-app-server -p codex-tui - just write-app-server-schema --------- Co-authored-by: Codex <noreply@openai.com> |
||
|
|
5ecff05196 |
feat(app-server): move v2 sessionId onto Thread (#21336)
## Why `session_id` and `thread_id` are separate identities after #20437, but app-server only surfaced `sessionId` on the `thread/start`, `thread/resume`, and `thread/fork` response envelopes. Other thread-bearing surfaces such as `thread/list`, `thread/read`, `thread/started`, `thread/rollback`, `thread/metadata/update`, and `thread/unarchive` either lacked the grouping key or forced clients to special-case those three responses. Making `sessionId` part of the reusable `Thread` payload gives every v2 API surface one place to expose session-tree identity. ## Mental model 1. thread.sessionId lives on `Thread` 2. It is a view/runtime identity for the current live session tree, not durable stored lineage metadata 3. When app-server has a live loaded thread, it copies the real value from core’s session_configured.session_id 4. When it only has stored/unloaded data, it falls back to thread.sessionId = thread.id ## What changed - Added `sessionId` to the v2 [`Thread`]( |
||
|
|
06e5dfa4dd |
feat: return session ID from thread/fork (#21332)
## Why `thread/start` and `thread/resume` already return `sessionId`, but `thread/fork` only returned the new thread. That left clients to infer the forked thread's session identity from `thread.id`, which kept the new `session_id` / `thread_id` split implicit at one lifecycle boundary. Follow-up to #20437. ## What changed - Add `sessionId` to `ThreadForkResponse`. - Populate it from the forked session configuration. - Regenerate the v2 JSON/TypeScript schema fixtures and update the app-server docs/example. - Extend the fork integration test to assert the returned `sessionId`. ## Verification - Added coverage in `thread_fork_creates_new_thread_and_emits_started` for the new response field. |
||
|
|
a98623511b |
feat: add session_id (#20437)
## Summary Related to https://openai.slack.com/archives/C095U48JNL9/p1777537279707449 TLDR: We update the meaning of session ids and thread ids: * thread_id stays as now * session_id become a shared id between every thread under a /root thread (i.e. every sub-agent share the same session id) This PR introduces an explicit `SessionId` and threads it through the protocol/client boundary so `session_id` and `thread_id` can diverge when they need to, while preserving compatibility for older serialized `session_configured` events. --------- Co-authored-by: Codex <noreply@openai.com> |
||
|
|
8ef31894dc |
app-server: align dynamic tool identifiers with Responses API (#20724)
## Why Codex currently accepts dynamic tool names and namespaces that the upstream Responses function-tool path does not actually support. In practice, that means app-server can register a dynamic tool successfully and only discover later that the LLM-facing tool contract will reject or mishandle it. This PR tightens the app-server-side dynamic tool contract to match the Responses API before we stack dynamic tool hook support on top of it. ## What changed - validate dynamic tool `name` against the Responses function-tool identifier contract: `^[a-zA-Z0-9_-]+$`, length `1..128` - validate dynamic tool `namespace` the same way, with the Responses namespace length limit `1..64` - reject namespaces that collide with the always-reserved Responses runtime namespaces such as `functions`, `multi_tool_use`, `file_search`, `web`, `browser`, `image_gen`, `computer`, `container`, `terminal`, `python`, `python_user_visible`, `api_tool`, `tool_search`, and `submodel_delegator` - escape invalid identifiers in error messages so control characters do not spill raw into logs or client-visible error text - document the tightened dynamic tool identifier contract in `codex-rs/app-server/README.md` - add both unit coverage for the validator and an app-server integration test that rejects a `thread/start` request with Responses-incompatible dynamic tool identifiers ## Verification - `cargo test -p codex-app-server validate_dynamic_tools_` - `cargo test -p codex-app-server --test all thread_start_rejects_dynamic_tools_not_supported_by_responses` |
||
|
|
b3d4f1a9f0 |
[codex-analytics] rework thread_source for thread analytics (#20949)
## Summary - make `thread_source` an explicit optional thread-level field on `thread/start`, `thread/fork`, and returned thread payloads - persist `thread_source` in rollout/session metadata so resumed live threads retain the original value - replace the old best-effort `session_source` -> `thread_source` mapping with an explicit caller-supplied analytics classification ## Why Before this change, analytics `thread_source` was populated by a best-effort mapping from `session_source`. `session_source` describes the runtime/client surface, not the actual thread-level origin, so that projection was not accurate enough to distinguish cases such as `user`, `subagent`, `memory_consolidation`, and future thread origins reliably. Making `thread_source` explicit keeps one thread-level analytics field while letting callers provide the real classification directly instead of recovering it indirectly from `session_source`. ## Impact For new analytics events, `thread_source` now reflects the explicit thread-level classification supplied by the caller rather than an inferred value derived from `session_source`. Existing protocol fields remain optional; callers that omit `threadSource` now produce `null` instead of a best-effort inferred value. ## Validation - `just write-app-server-schema` - `cargo test -p codex-analytics -p codex-core -p codex-app-server-protocol --no-run` - `cargo test -p codex-app-server-protocol generated_ts_optional_nullable_fields_only_in_params` - `cargo test -p codex-analytics thread_initialized_event_serializes_expected_shape` - `cargo test -p codex-core resume_stopped_thread_from_rollout_preserves_thread_source` |
||
|
|
2c1a361a2e |
[codex] Move thread naming to app server (#21260)
## Why Thread names are app-server metadata now, backed by the thread store and sqlite state database. Keeping a core `SetThreadName` op plus a rollout `thread_name_updated` event made rename persistence live in the wrong layer and required historical replay support for an event that new app-server flows should not write. ## What changed - Removed `Op::SetThreadName` and `EventMsg::ThreadNameUpdated` from the core protocol and deleted the core handler path that appended rename events to rollouts. - Updated app-server `thread/name/set` so both loaded and unloaded threads write through thread-store metadata and app-server emits `thread/name/updated` notifications. - Updated local thread-store name metadata updates to write sqlite title metadata and the legacy thread-name index without appending rollout events. - Removed state extraction and rollout handling for the deleted thread-name event. ## Validation - `cargo test -p codex-app-server thread_name_updated_broadcasts` - `cargo test -p codex-app-server thread_name_set_is_reflected_in_read_list_and_resume` - `cargo test -p codex-thread-store update_thread_metadata_sets_name_on_active_rollout_and_indexes_name` - `cargo test -p codex-state` - `cargo check -p codex-mcp-server -p codex-rollout-trace` - `just fix -p codex-app-server -p codex-thread-store -p codex-state -p codex-mcp-server -p codex-rollout-trace` ## Docs No external documentation update is expected for this internal ownership change. |
||
|
|
7e310bc7f3 |
Inject state DB, agent graph store (#20689)
## Why We want the agent graph store to be passed down the stack as a real dependency, the same way we already treat the thread store. This will let us inject the agent graph store as a real dependency and support implementations other than the local SQLite-backed one. Right now most code instantiates a state DB and an agent graph store just-in-time. Ideally, we would not depend on the state DB directly but only read through the higher-level interfaces. This change makes the dependency boundaries explicit and moves state DB initialization to process bootstrap instead of hiding it inside local store implementations. ## What changed - `ThreadManager` now requires a `StateDbHandle` and an `AgentGraphStore` at construction time instead of treating them as optional internals. - The local store constructors no longer lazily initialize SQLite. Callers now initialize the state DB once per process and use that shared handle to build: - `LocalThreadStore` - `LocalAgentGraphStore` - App bootstraps (`app-server`, `mcp-server`, `prompt_debug`, and the thread-manager sample) now initialize the state DB up front and inject the resulting handle down the stack. - `app-server` now consistently uses its process-scoped state DB handle instead of reopening SQLite or trying to recover it from loaded threads. - Device-key storage now reuses the shared state DB handle instead of maintaining its own lazy opener. - The thread archive / descendant traversal paths now use the injected `AgentGraphStore` instead of reaching through local thread-store-specific state. ## Verification - `cargo check -p codex-core -p codex-thread-store -p codex-app-server -p codex-mcp-server -p codex-thread-manager-sample --tests` - `cargo test -p codex-thread-store` - `cargo test -p codex-core thread_manager_accepts_separate_agent_graph_store_and_thread_store -- --nocapture` - `cargo test -p codex-app-server thread_archive_archives_spawned_descendants -- --nocapture` |
||
|
|
8c88f9a304 |
Auto-deny MCP elicitations for Xcode 26.4 clients (#21113)
## Summary Xcode 26.4 was built against app-server behavior from before MCP elicitation requests became client-visible in CLI 0.120.0 via #17043. That client line does not expect the new events/messages, so this PR restores the old behavior for exactly that client/version combination. The compatibility handling stays in the app-server layer: when the initialized client is `Xcode` and its version starts with `26.4`, the app server marks the live Codex thread so MCP elicitations are auto-denied. The flag is applied on thread start/resume/fork/turn attachment, carried through `Codex`/`CodexThread`, and stored on `McpConnectionManager` so refreshed MCP managers preserve the behavior. ## Notes This is intentionally narrow and includes a TODO to remove the compatibility path once Xcode 26.4 ages out. |
||
|
|
b6d4c4ea6b |
[codex] Use shared app-server JSON-RPC error helpers (#21221)
## Why App-server had repeated hand-built JSON-RPC error objects for standard error shapes. Using the shared helpers keeps the common `invalid_request`, `invalid_params`, and `internal_error` construction in one place and reduces the chance of new call sites drifting from the common error payload shape. ## What changed - Replaced manual standard JSON-RPC error object creation with `internal_error(...)`, `invalid_request(...)`, and `invalid_params(...)` across app-server request processors and runtime paths. - Removed local duplicate helper definitions from search and review request handling. - Preserved existing structured `data` payloads by creating the shared helper error first and then attaching the existing metadata. - Left custom non-standard errors and raw error-code assertions intact. ## Validation - `cargo test -p codex-app-server` |
||
|
|
6075b77001 |
app-server: ignore persist_extended_history param (#21225)
## Why Taking a step to removing the `persistExtendedHistory` field. It's not scalable to be persisting so much data in the rollout file and returning it in the thread history. When a client explicitly sends `true`, the server now tells that client the parameter is deprecated and ignored so the caller has a clear migration signal via the `deprecationNotice` notification. ## What changed - Keep the `persist_extended_history` / `persistExtendedHistory` field in the v2 protocol for compatibility, but document it as deprecated and ignored. - Ignore the parameter in app-server `thread/start`, `thread/resume`, and `thread/fork`; those paths always use limited history persistence now. - Stop treating `persistExtendedHistory` as a running-thread resume override mismatch. - Emit a connection-scoped `deprecationNotice` when a request explicitly sets `persist_extended_history: true`. ## Verification - Added `thread_start_deprecates_persist_extended_history_true` to cover the deprecation notice. - `cargo test -p codex-app-server` - `cargo test -p codex-app-server-protocol` |
||
|
|
33d24b0df5 |
codex: migrate (more) app-server thread history reads to ThreadStore (#20575)
Migrate token usage replay, rollback responses, and detached review setup (a special case of forking) to be served from ThreadStore reads rather direct rollout files. - replay restored token usage from already-loaded `RolloutItem` history instead of reopening `Thread.path` - rebuild rollback responses from loaded `ThreadStore` snapshots and history - start detached reviews from store-backed parent history and stored review-thread metadata - remove obsolete app-server rollout-summary helper code that became dead after the store-backed migration - preserve response/notification ordering for resume, fork, rollback, and detached review flows - add integration test coverage for the affected paths |
||
|
|
707e51bd8b |
codex: route metadata updates through ThreadStore (#20576)
- Route `thread/metadata/update` through `ThreadStore::update_thread_metadata`. - Add `LocalThreadStore` git metadata patch support for set, partial update, and clear semantics. - Add some unit tests for the new thread store code - Remove a lot of dead code/tests! |
||
|
|
4d201e340e |
state: pass state db handles through consumers (#20561)
## Why SQLite state was still being opened from consumer paths, including lazy `OnceCell`-backed thread-store call sites. That let one process construct multiple state DB connections for the same Codex home, which makes SQLite lock contention and `database is locked` failures much easier to hit. State DB lifetime should be chosen by main-like entrypoints and tests, then passed through explicitly. Consumers should use the supplied `Option<StateDbHandle>` or `StateDbHandle` and keep their existing filesystem fallback or error behavior when no handle is available. The startup path also needs to keep the rollout crate in charge of SQLite state initialization. Opening `codex_state::StateRuntime` directly bypasses rollout metadata backfill, so entrypoints should initialize through `codex_rollout::state_db` and receive a handle only after required rollout backfills have completed. ## What Changed - Initialize the state DB in main-like entrypoints for CLI, TUI, app-server, exec, MCP server, and the thread-manager sample. - Pass `Option<StateDbHandle>` through `ThreadManager`, `LocalThreadStore`, app-server processors, TUI app wiring, rollout listing/recording, personality migration, shell snapshot cleanup, session-name lookup, and memory/device-key consumers. - Remove the lazy local state DB wrapper from the thread store so non-test consumers use only the supplied handle or their existing fallback path. - Make `codex_rollout::state_db::init` the local state startup path: it opens/migrates SQLite, runs rollout metadata backfill when needed, waits for concurrent backfill workers up to a bounded timeout, verifies completion, and then returns the initialized handle. - Keep optional/non-owning SQLite helpers, such as remote TUI local reads, as open-only paths that do not run startup backfill. - Switch app-server startup from direct `codex_state::StateRuntime::init` to the rollout state initializer so app-server cannot skip rollout backfill. - Collapse split rollout lookup/list APIs so callers use the normal methods with an optional state handle instead of `_with_state_db` variants. - Restore `getConversationSummary(ThreadId)` to delegate through `ThreadStore::read_thread` instead of a LocalThreadStore-specific rollout path special case. - Keep DB-backed rollout path lookup keyed on the DB row and file existence, without imposing the filesystem filename convention on existing DB rows. - Verify readable DB-backed rollout paths against `session_meta.id` before returning them, so a stale SQLite row that points at another thread's JSONL falls back to filesystem search and read-repairs the DB row. - Keep `debug prompt-input` filesystem-only so a one-off debug command does not initialize or backfill SQLite state just to print prompt input. - Keep goal-session test Codex homes alive only in the goal-specific helper, rather than leaking tempdirs from the shared session test helper. - Update tests and call sites to pass explicit state handles where DB behavior is expected and explicit `None` where filesystem-only behavior is intended. ## Validation - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p codex-rollout -p codex-thread-store -p codex-app-server -p codex-core -p codex-tui -p codex-exec -p codex-cli --tests` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout state_db_` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout find_thread_path` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout find_thread_path -- --nocapture` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout try_init_ -- --nocapture` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo clippy -p codex-rollout --lib -- -D warnings` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-thread-store read_thread_falls_back_when_sqlite_path_points_to_another_thread -- --nocapture` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-thread-store` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core shell_snapshot` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all personality_migration` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all rollout_list_find` - `RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all rollout_list_find::find_prefers_sqlite_path_by_id -- --nocapture` - `RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all rollout_list_find -- --nocapture` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core interrupt_accounts_active_goal_before_pausing` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-app-server get_auth_status -- --test-threads=1` - `CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-app-server --lib` - `CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p codex-rollout -p codex-app-server --tests` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout -p codex-thread-store -p codex-core -p codex-app-server -p codex-tui -p codex-exec -p codex-cli` - `CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout -p codex-app-server` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout` - `CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-core` - `just argument-comment-lint -p codex-core` - `just argument-comment-lint -p codex-rollout` Focused coverage added in `codex-rollout`: - `recorder::tests::state_db_init_backfills_before_returning` verifies the rollout metadata row exists before startup init returns. - `state_db::tests::try_init_waits_for_concurrent_startup_backfill` verifies startup waits for another worker to finish backfill instead of disabling the handle for the process. - `state_db::tests::try_init_times_out_waiting_for_stuck_startup_backfill` verifies startup does not hang indefinitely on a stuck backfill lease. - `tests::find_thread_path_accepts_existing_state_db_path_without_canonical_filename` verifies DB-backed lookup accepts valid existing rollout paths even when the filename does not include the thread UUID. - `tests::find_thread_path_falls_back_when_db_path_points_to_another_thread` verifies DB-backed lookup ignores a stale row whose existing path belongs to another thread and read-repairs the row after filesystem fallback. Focused coverage updated in `codex-core`: - `rollout_list_find::find_prefers_sqlite_path_by_id` now uses a DB-preferred rollout file with matching `session_meta.id`, so it still verifies that valid SQLite paths win without depending on stale/empty rollout contents. `cargo test -p codex-app-server thread_list_respects_search_term_filter -- --test-threads=1 --nocapture` was attempted locally but timed out waiting for the app-server test harness `initialize` response before reaching the changed thread-list code path. `bazel test //codex-rs/thread-store:thread-store-unit-tests --test_output=errors` was attempted locally after the thread-store fix, but this container failed before target analysis while fetching `v8+` through BuildBuddy/direct GitHub. The equivalent local crate coverage, including `cargo test -p codex-thread-store`, passes. A plain local `cargo check -p codex-rollout -p codex-app-server --tests` also requires system `libcap.pc` for `codex-linux-sandbox`; the follow-up app-server check above used `CODEX_SKIP_VENDORED_BWRAP=1` in this container. |
||
|
|
541e99cf09 |
feat(app-server): always return limited thread history (#20682)
## Why Whenever we return a thread's history (turns and items) over app-server, always return the limited form as specified by the rollout policy `EventPersistenceMode::Limited`, even if the thread was previously started with `EventPersistenceMode::Extended`. We're finding it is quite unscalable to be returning the extended history, so let's apply the same filtering logic of the rollout policy when we load and return the thread's history. ## What Changed - Reuse the rollout persistence policy when reconstructing app-server `ThreadItem` history so only `EventPersistenceMode::Limited` rollout items are replayed into API turns. - Route `thread/read`, `thread/resume`, `thread/fork`, `thread/turns/list`, and rollback responses through the same filtered app-server history projection. - Keep live active turns intact when composing a response for a currently running thread. - Update command execution coverage so persisted extended command events are excluded from returned history for `thread/read`, `thread/fork`, and `thread/turns/list`. ## Test Plan - `cargo test -p codex-app-server limited` - `cargo test -p codex-app-server thread_shell_command` - `cargo test -p codex-app-server thread_read` - `cargo test -p codex-app-server thread_rollback` - `cargo test -p codex-app-server thread_fork` - `cargo test -p codex-app-server-protocol` |
||
|
|
33b19bcfde |
[codex] Split app-server request processors (#20940)
## Why The app-server request path had grown around a large `CodexMessageProcessor` plus separate API wrapper/helper modules. That made the dependency graph hard to see and forced unrelated request families to share broad processor state. This PR makes the split mechanical and command-prefix oriented so request families own only the dependencies they use. ## What changed - Replaced `CodexMessageProcessor` with command-prefix request processors under `app-server/src/request_processors/`. - Removed the old config, device-key, external-agent-config, and fs API wrapper files by moving their API handling into processors. - Split apps, plugins, marketplace, catalog, account, MCP, command exec, fs, git, feedback, thread, turn, thread goals, and Windows sandbox handling into dedicated processors. - Kept shared lifecycle, summary conversion, token usage replay, and shared error mapping only where multiple processors use them; single-use helpers were inlined into their owning processor. - Removed the fallback processor path and moved processor tests to `_tests` files. ## Validation - `cargo test -p codex-app-server` - `cargo check -p codex-app-server` - `just fix -p codex-app-server` |