codex/codex-rs/core/src/turn_metadata_tests.rs
ningyi-oai bee78806a9 [codex] add compaction metadata to turn headers (#24368)
## Summary
- Add `request_kind` values for foreground turn, startup prewarm,
compaction, and detached memory model requests.
- Attach compaction dispatch metadata to local Responses, legacy
`/v1/responses/compact`, and remote v2 compact requests.
- Add the existing logical context-window identifier as `window_id` on
turn-owned model request metadata.
- Keep identity fields optional for detached memory requests, while
still emitting `request_kind="memory"` in non-git/no-sandbox workspaces.

## Root Cause
`x-codex-turn-metadata` has more than one producer. Foreground turns and
compaction requests own a real turn and should carry that turn identity.
Detached memory stage-one requests do not own a foreground turn, so
absent identity fields are valid rather than missing data. Startup
websocket prewarm is also a model request, but it has `generate=false`
and must not be counted as a foreground turn.

`thread_source` or session source identifies where a thread came from
(for example review, guardian, or another subagent). `request_kind`
identifies what the current outbound model request is doing (`turn`,
`prewarm`, `compaction`, or `memory`). A review or guardian thread can
issue either a normal turn request or a compaction request, so source
cannot replace request kind.
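
The distinction above can be sketched as two independent enums. This is a minimal illustration, not the crate's actual types: `ThreadOrigin`, `RequestKind`, `counts_as_foreground_turn`, and `describe` are hypothetical names, and only the four wire strings are taken from the description above.

```rust
/// Hypothetical stand-in for where a thread came from
/// (not the real `ThreadSource` type).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum ThreadOrigin {
    User,
    Review,
    Guardian,
}

/// Hypothetical enum for what the current outbound model request is doing.
#[derive(Clone, Copy, Debug, PartialEq)]
enum RequestKind {
    Turn,
    Prewarm,
    Compaction,
    Memory,
}

impl RequestKind {
    /// Wire values as listed in the description above.
    fn as_str(self) -> &'static str {
        match self {
            RequestKind::Turn => "turn",
            RequestKind::Prewarm => "prewarm",
            RequestKind::Compaction => "compaction",
            RequestKind::Memory => "memory",
        }
    }

    /// Only ordinary turn requests count as foreground turns in this sketch;
    /// prewarm, compaction, and memory requests do not.
    fn counts_as_foreground_turn(self) -> bool {
        matches!(self, RequestKind::Turn)
    }
}

/// The two dimensions vary independently, so both are needed to
/// describe a request.
fn describe(origin: ThreadOrigin, kind: RequestKind) -> String {
    format!("{:?} thread issuing a {} request", origin, kind.as_str())
}

fn main() {
    // The same review thread can issue either kind of request.
    println!("{}", describe(ThreadOrigin::Review, RequestKind::Turn));
    println!("{}", describe(ThreadOrigin::Review, RequestKind::Compaction));
    // A detached memory request is never a foreground turn.
    println!("{}", RequestKind::Memory.counts_as_foreground_turn());
}
```

Keeping the origin and the kind as separate values means a consumer can filter on either one without inferring it from the other.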

## Behavior / Impact
- Ordinary foreground requests send `request_kind="turn"`, their real
identity fields, and `window_id="<thread_id>:<window_generation>"`.
- Startup websocket warmup requests send `request_kind="prewarm"` so
they are not counted as foreground turns.
- Compaction requests send `request_kind="compaction"`, their real
owning turn identity, the existing `window_id`, and
`compaction.{trigger,reason,implementation,phase,strategy}`.
- Detached memory stage-one requests send `request_kind="memory"`
without `session_id`, `thread_id`, `turn_id`, or `window_id`; when no
workspace metadata exists, the kind-only header is still emitted.
- `session_id`, `thread_id`, `turn_id`, and `window_id` remain optional
in the header schema because detached memory requests do not own a
foreground turn or context window.
- `window_id` is not a new ID system: it is copied from the already-sent
`x-codex-window-id` / WS client metadata value at model-request dispatch
time.
- Existing `x-codex-window-id` HTTP/WS emission, value format,
generation advancement, resume behavior, and fork reset behavior are
unchanged.
- `request_kind`, `window_id`, and upstream turn-owned identity fields
remain schema-owned; input `responsesapi_client_metadata` cannot replace
their canonical values.
- No table, DAG, export, app-server API, or MCP `_meta` schema changes
are included.
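
The rules in the list above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in this PR: `TurnIdentity`, `RESERVED_KEYS`, and `build_header_fields` are hypothetical names, the header is modelled as a flat string map rather than JSON, and the nested `compaction` object is omitted.

```rust
use std::collections::BTreeMap;

/// Keys a client may not override in this sketch (a subset of the
/// schema-owned fields named in the list above).
const RESERVED_KEYS: [&str; 5] = [
    "request_kind",
    "window_id",
    "session_id",
    "thread_id",
    "turn_id",
];

/// Hypothetical identity of the turn that owns a request.
struct TurnIdentity<'a> {
    session_id: &'a str,
    thread_id: &'a str,
    turn_id: &'a str,
    window_generation: u64,
}

/// Build a flat key/value view of the header fields.
/// `identity` is `None` for detached requests that own no turn.
fn build_header_fields(
    request_kind: &str,
    identity: Option<&TurnIdentity>,
    client_metadata: &[(&str, &str)],
) -> BTreeMap<String, String> {
    let mut fields = BTreeMap::new();

    // Client-supplied entries are merged first, skipping reserved keys.
    for (key, value) in client_metadata {
        if !RESERVED_KEYS.contains(key) {
            fields.insert(key.to_string(), value.to_string());
        }
    }

    // Canonical values are written last and always come from the caller.
    fields.insert("request_kind".to_string(), request_kind.to_string());
    if let Some(id) = identity {
        fields.insert("session_id".to_string(), id.session_id.to_string());
        fields.insert("thread_id".to_string(), id.thread_id.to_string());
        fields.insert("turn_id".to_string(), id.turn_id.to_string());
        // Format from the description: "<thread_id>:<window_generation>".
        fields.insert(
            "window_id".to_string(),
            format!("{}:{}", id.thread_id, id.window_generation),
        );
    }
    fields
}

fn main() {
    let identity = TurnIdentity {
        session_id: "session-a",
        thread_id: "thread-a",
        turn_id: "turn-a",
        window_generation: 1,
    };
    let turn = build_header_fields(
        "turn",
        Some(&identity),
        &[("request_kind", "client-supplied"), ("origin", "cli")],
    );
    println!("{:?}", turn);

    // A detached memory request yields a kind-only field set.
    let memory = build_header_fields("memory", None, &[]);
    println!("{:?}", memory);
}
```

Modelling the identity as an `Option` mirrors the point that absent identity fields on a memory request are valid rather than missing data.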

A compaction attempt stopped by a pre-compact hook issues no model
request and therefore has no request header; its outcome remains in
analytics events. Status, error, duration, and token deltas also remain
analytics fields rather than request-header fields.

Future detached-memory attribution using a real initiating turn ID as
`trigger_turn_id` is intentionally not part of this PR.

## Sync With Main
- Final pushed head `716342e79` is rebased onto `origin/main@0d37db4b2`.
- The metadata conflict came from upstream `#24160`, which added
`forked_from_thread_id` on the same `turn_metadata` surface. Resolution
preserves that field and its protection from client metadata override
alongside this PR's request-kind, compaction, and window-id fields.
- While resolving the overlapping commits, I removed an accidental
recursive model-request overlay and a duplicate detached-memory header
builder before completing the rebase.

## Latency / User Experience Boundary
- Foreground turns perform no new filesystem, git, or network work. New
fields are inserted into metadata already serialized for outgoing
requests.
- Compaction issues the same model/HTTP requests with the same prompt,
model, service tier, and sampling settings; only metadata bytes change.
- Startup prewarm already sent metadata; it is now correctly classified
as `prewarm`.
- Non-git detached memory now sends a small kind-only metadata header
rather than no header.
- This client diff adds no user-visible latency mechanism beyond
negligible serialization and header bytes on already-existing requests.

## Validation
On conflict-resolved head `1d35c2cfb` based on `origin/main@487521733`:
- `just fmt` (passed)
- `just fix -p codex-core` (passed)
- `git diff --check origin/main...HEAD` (passed)
- `just test -p codex-core -E 'test(turn_metadata) |
test(websocket_first_turn_uses_startup_prewarm_and_create) |
test(responses_stream_includes_turn_metadata_header_for_git_workspace_e2e)
|
test(responses_websocket_forwards_turn_metadata_on_initial_and_incremental_create)
| test(remote_compact_v2_retries_failures_with_stream_retry_budget) |
test(window_id_advances_after_compact_persists_on_resume_and_resets_on_fork)'`
(`23 passed`; `bench-smoke` passed)
- `just test -p codex-app-server -E
'test(turn_start_forwards_client_metadata_to_responses_request_v2) |
test(turn_start_forwards_client_metadata_to_responses_websocket_request_body_v2)
| test(auto_compaction_remote_emits_started_and_completed_items)'` (`3
passed`; `bench-smoke` passed)
- `just test -p codex-memories-write` (`29 passed`; `bench-smoke`
passed)
2026-05-27 11:09:33 -07:00


use super::*;
use crate::sandbox_tags::permission_profile_sandbox_tag;
use codex_protocol::models::PermissionProfile;
use codex_protocol::openai_models::ReasoningEffort as ReasoningEffortConfig;
use codex_protocol::protocol::ThreadSource;
use core_test_support::PathBufExt;
use core_test_support::PathExt;
use pretty_assertions::assert_eq;
use serde_json::Value;
use std::collections::HashMap;
use tempfile::TempDir;
use tokio::process::Command;

// Shared MCP request context used by the `_meta` assertions below.
fn test_mcp_turn_metadata_context() -> McpTurnMetadataContext<'static> {
    McpTurnMetadataContext {
        model: "gpt-5.4",
        reasoning_effort: Some(ReasoningEffortConfig::High),
    }
}

// Detached memory requests carry `request_kind = "memory"` and workspace
// metadata, but no session, thread, fork, turn, or window identity.
#[tokio::test]
async fn build_turn_metadata_header_marks_detached_memory_without_turn_identity() {
    let temp_dir = TempDir::new().expect("temp dir");
    let repo_path = temp_dir.path().join("repo-東京").abs();
    std::fs::create_dir_all(&repo_path).expect("create repo");

    // Create a clean git repository with a single commit.
    Command::new("git")
        .args(["init"])
        .current_dir(&repo_path)
        .output()
        .await
        .expect("git init");
    Command::new("git")
        .args(["config", "user.name", "Test User"])
        .current_dir(&repo_path)
        .output()
        .await
        .expect("git config user.name");
    Command::new("git")
        .args(["config", "user.email", "test@example.com"])
        .current_dir(&repo_path)
        .output()
        .await
        .expect("git config user.email");
    std::fs::write(repo_path.join("README.md"), "hello").expect("write file");
    Command::new("git")
        .args(["add", "."])
        .current_dir(&repo_path)
        .output()
        .await
        .expect("git add");
    Command::new("git")
        .args(["commit", "-m", "initial"])
        .current_dir(&repo_path)
        .output()
        .await
        .expect("git commit");

    let header = build_turn_metadata_header(&repo_path, Some("none"))
        .await
        .expect("header");

    // The raw header must be ASCII even though the path is not.
    assert!(header.is_ascii());
    assert!(!header.contains("東京"));

    let parsed: Value = serde_json::from_str(&header).expect("valid json");
    assert_eq!(parsed["request_kind"].as_str(), Some("memory"));
    assert!(parsed.get("session_id").is_none());
    assert!(parsed.get("thread_id").is_none());
    assert!(parsed.get("forked_from_thread_id").is_none());
    assert!(parsed.get("turn_id").is_none());
    assert!(parsed.get(WINDOW_ID_KEY).is_none());

    // Parsing the JSON restores the original non-ASCII workspace path.
    let expected_repo_path = repo_path.to_string_lossy().into_owned();
    let actual_repo_path = parsed
        .get("workspaces")
        .and_then(Value::as_object)
        .and_then(|workspaces| workspaces.keys().next())
        .expect("workspace path");
    assert_eq!(actual_repo_path, &expected_repo_path);

    let workspace = parsed
        .get("workspaces")
        .and_then(Value::as_object)
        .and_then(|workspaces| workspaces.values().next())
        .cloned()
        .expect("workspace");
    assert_eq!(
        workspace.get("has_changes").and_then(Value::as_bool),
        Some(false)
    );
}

// Outside a git workspace and without a sandbox tag, the header still
// carries the request kind on its own.
#[tokio::test]
async fn build_turn_metadata_header_marks_memory_without_workspace_metadata() {
    let temp_dir = TempDir::new().expect("temp dir");
    let cwd = temp_dir.path().abs();

    let header = build_turn_metadata_header(&cwd, /*sandbox*/ None)
        .await
        .expect("detached memory should emit its request kind");
    let parsed: Value = serde_json::from_str(&header).expect("valid json");

    assert_eq!(parsed, serde_json::json!({"request_kind": "memory"}));
}

// The turn-level header reports the sandbox tag derived from the permission
// profile and carries no `request_kind` of its own.
#[test]
fn turn_metadata_state_uses_platform_sandbox_tag() {
    let temp_dir = TempDir::new().expect("temp dir");
    let cwd = temp_dir.path().abs();
    let permission_profile = PermissionProfile::read_only();

    let state = TurnMetadataState::new(
        "session-a".to_string(),
        "thread-a".to_string(),
        /*forked_from_thread_id*/ None,
        Some(ThreadSource::User),
        "turn-a".to_string(),
        cwd,
        &permission_profile,
        WindowsSandboxLevel::Disabled,
        /*enforce_managed_network*/ false,
    );

    let header = state.current_header_value().expect("header");
    let json: Value = serde_json::from_str(&header).expect("json");
    let sandbox_name = json.get("sandbox").and_then(Value::as_str);
    let session_id = json.get("session_id").and_then(Value::as_str);
    let thread_id = json.get("thread_id").and_then(Value::as_str);
    let thread_source = json.get("thread_source").and_then(Value::as_str);
    assert!(json.get("request_kind").is_none());

    let expected_sandbox = permission_profile_sandbox_tag(
        &permission_profile,
        WindowsSandboxLevel::Disabled,
        /*enforce_managed_network*/ false,
    );
    assert_eq!(sandbox_name, Some(expected_sandbox));
    assert_eq!(session_id, Some("session-a"));
    assert_eq!(thread_id, Some("thread-a"));
    assert_eq!(thread_source, Some("user"));
    assert!(json.get("session_source").is_none());
}

// An explicit subagent thread source is serialized as `thread_source`.
#[test]
fn turn_metadata_state_uses_explicit_subagent_thread_source() {
    let temp_dir = TempDir::new().expect("temp dir");
    let cwd = temp_dir.path().abs();
    let permission_profile = PermissionProfile::read_only();

    let state = TurnMetadataState::new(
        "session-a".to_string(),
        "thread-a".to_string(),
        /*forked_from_thread_id*/ None,
        Some(ThreadSource::Subagent),
        "turn-a".to_string(),
        cwd,
        &permission_profile,
        WindowsSandboxLevel::Disabled,
        /*enforce_managed_network*/ false,
    );

    let header = state.current_header_value().expect("header");
    let json: Value = serde_json::from_str(&header).expect("json");

    assert_eq!(json["thread_source"].as_str(), Some("subagent"));
    assert!(json.get("session_source").is_none());
}

// A forked thread reports the thread it was forked from.
#[test]
fn turn_metadata_state_includes_root_fork_lineage() {
    let temp_dir = TempDir::new().expect("temp dir");
    let cwd = temp_dir.path().abs();
    let permission_profile = PermissionProfile::read_only();
    let source_thread_id =
        ThreadId::from_string("11111111-1111-4111-8111-111111111111").expect("thread id");

    let state = TurnMetadataState::new(
        "session-a".to_string(),
        "thread-a".to_string(),
        Some(source_thread_id),
        Some(ThreadSource::User),
        "turn-a".to_string(),
        cwd,
        &permission_profile,
        WindowsSandboxLevel::Disabled,
        /*enforce_managed_network*/ false,
    );

    let header = state.current_header_value().expect("header");
    let json: Value = serde_json::from_str(&header).expect("json");

    assert_eq!(
        json["forked_from_thread_id"].as_str(),
        Some("11111111-1111-4111-8111-111111111111")
    );
}

// Once the turn start time is recorded, it appears in the header.
#[test]
fn turn_metadata_state_includes_turn_started_at_unix_ms_after_start() {
    let temp_dir = TempDir::new().expect("temp dir");
    let cwd = temp_dir.path().abs();
    let permission_profile = PermissionProfile::read_only();

    let state = TurnMetadataState::new(
        "session-a".to_string(),
        "thread-a".to_string(),
        /*forked_from_thread_id*/ None,
        Some(ThreadSource::User),
        "turn-a".to_string(),
        cwd,
        &permission_profile,
        WindowsSandboxLevel::Disabled,
        /*enforce_managed_network*/ false,
    );
    state.set_turn_started_at_unix_ms(/*turn_started_at_unix_ms*/ 1_700_000_000_123);

    let header = state.current_header_value().expect("header");
    let json: Value = serde_json::from_str(&header).expect("json");

    assert_eq!(
        json["turn_started_at_unix_ms"].as_i64(),
        Some(1_700_000_000_123)
    );
}

// `model` and `reasoning_effort` belong to the MCP request `_meta` value,
// not to the turn header; `request_kind` is absent from the MCP value.
#[test]
fn turn_metadata_state_includes_model_and_reasoning_effort_only_in_request_meta() {
    let temp_dir = TempDir::new().expect("temp dir");
    let cwd = temp_dir.path().abs();
    let permission_profile = PermissionProfile::read_only();

    let state = TurnMetadataState::new(
        "session-a".to_string(),
        "thread-a".to_string(),
        /*forked_from_thread_id*/ None,
        /*thread_source*/ None,
        "turn-a".to_string(),
        cwd,
        &permission_profile,
        WindowsSandboxLevel::Disabled,
        /*enforce_managed_network*/ false,
    );

    let header = state.current_header_value().expect("header");
    let header_json: Value = serde_json::from_str(&header).expect("json");
    assert!(header_json.get("model").is_none());
    assert!(header_json.get("reasoning_effort").is_none());

    let meta = state
        .current_meta_value_for_mcp_request(test_mcp_turn_metadata_context())
        .expect("turn metadata should be present");
    assert!(meta.get("request_kind").is_none());
    assert_eq!(meta["model"].as_str(), Some("gpt-5.4"));
    assert_eq!(meta["reasoning_effort"].as_str(), Some("high"));

    // Without a reasoning effort, only the model is reported.
    let meta_without_reasoning_effort = state
        .current_meta_value_for_mcp_request(McpTurnMetadataContext {
            model: "gpt-5.4",
            reasoning_effort: None,
        })
        .expect("turn metadata should be present");
    assert_eq!(
        meta_without_reasoning_effort["model"].as_str(),
        Some("gpt-5.4")
    );
    assert!(
        meta_without_reasoning_effort
            .get("reasoning_effort")
            .is_none()
    );
}

// The user-input flag is never written to the header; it appears in the MCP
// request `_meta` value only after it has been marked.
#[test]
fn turn_metadata_state_marks_user_input_requested_during_turn_only_for_mcp_request_meta() {
    let temp_dir = TempDir::new().expect("temp dir");
    let cwd = temp_dir.path().abs();
    let permission_profile = PermissionProfile::read_only();

    let state = TurnMetadataState::new(
        "session-a".to_string(),
        "thread-a".to_string(),
        /*forked_from_thread_id*/ None,
        /*thread_source*/ None,
        "turn-a".to_string(),
        cwd,
        &permission_profile,
        WindowsSandboxLevel::Disabled,
        /*enforce_managed_network*/ false,
    );

    // Before marking: absent from both the header and the MCP value.
    let header = state.current_header_value().expect("header");
    let header_json: Value = serde_json::from_str(&header).expect("json");
    assert!(
        header_json
            .get(USER_INPUT_REQUESTED_DURING_TURN_KEY)
            .is_none()
    );
    let meta = state
        .current_meta_value_for_mcp_request(test_mcp_turn_metadata_context())
        .expect("turn metadata should be present");
    assert!(meta.get(USER_INPUT_REQUESTED_DURING_TURN_KEY).is_none());

    state.mark_user_input_requested_during_turn();

    // After marking: still absent from the header, present in the MCP value.
    let header = state.current_header_value().expect("header");
    let header_json: Value = serde_json::from_str(&header).expect("json");
    assert!(
        header_json
            .get(USER_INPUT_REQUESTED_DURING_TURN_KEY)
            .is_none()
    );
    let meta = state
        .current_meta_value_for_mcp_request(test_mcp_turn_metadata_context())
        .expect("turn metadata should be present");
    assert_eq!(
        meta.get(USER_INPUT_REQUESTED_DURING_TURN_KEY)
            .and_then(Value::as_bool),
        Some(true)
    );
}

// Client metadata cannot introduce reserved fields: both keys stay absent
// when the state itself never set them.
#[test]
fn turn_metadata_state_ignores_client_reserved_metadata_before_start() {
    let temp_dir = TempDir::new().expect("temp dir");
    let cwd = temp_dir.path().abs();
    let permission_profile = PermissionProfile::read_only();

    let state = TurnMetadataState::new(
        "session-a".to_string(),
        "thread-a".to_string(),
        /*forked_from_thread_id*/ None,
        Some(ThreadSource::User),
        "turn-a".to_string(),
        cwd,
        &permission_profile,
        WindowsSandboxLevel::Disabled,
        /*enforce_managed_network*/ false,
    );
    state.set_responsesapi_client_metadata(HashMap::from([
        (
            "turn_started_at_unix_ms".to_string(),
            "client-supplied".to_string(),
        ),
        (
            "forked_from_thread_id".to_string(),
            "client-supplied".to_string(),
        ),
    ]));

    let header = state.current_header_value().expect("header");
    let json: Value = serde_json::from_str(&header).expect("json");

    assert!(json.get("turn_started_at_unix_ms").is_none());
    assert!(json.get("forked_from_thread_id").is_none());
}

// Unreserved client metadata is merged into the header, while identity,
// lineage, request kind, window ID, and start time keep canonical values.
#[test]
fn turn_metadata_state_merges_client_metadata_without_replacing_reserved_fields() {
    let temp_dir = TempDir::new().expect("temp dir");
    let cwd = temp_dir.path().abs();
    let permission_profile = PermissionProfile::read_only();
    let source_thread_id =
        ThreadId::from_string("44444444-4444-4444-8444-444444444444").expect("thread id");

    let state = TurnMetadataState::new(
        "session-a".to_string(),
        "thread-a".to_string(),
        Some(source_thread_id),
        Some(ThreadSource::User),
        "turn-a".to_string(),
        cwd,
        &permission_profile,
        WindowsSandboxLevel::Disabled,
        /*enforce_managed_network*/ false,
    );
    state.set_responsesapi_client_metadata(HashMap::from([
        ("fiber_run_id".to_string(), "fiber-123".to_string()),
        ("origin".to_string(), "東京".to_string()),
        ("model".to_string(), "client-supplied".to_string()),
        (
            "reasoning_effort".to_string(),
            "client-supplied".to_string(),
        ),
        ("session_id".to_string(), "client-supplied".to_string()),
        ("thread_id".to_string(), "client-supplied".to_string()),
        (
            "forked_from_thread_id".to_string(),
            "client-supplied".to_string(),
        ),
        ("turn_id".to_string(), "client-supplied".to_string()),
        (WINDOW_ID_KEY.to_string(), "client-supplied".to_string()),
        ("thread_source".to_string(), "client-supplied".to_string()),
        ("request_kind".to_string(), "client-supplied".to_string()),
        (
            "turn_started_at_unix_ms".to_string(),
            "client-supplied".to_string(),
        ),
    ]));
    state.set_turn_started_at_unix_ms(/*turn_started_at_unix_ms*/ 1_700_000_000_123);

    // The raw header is ASCII; non-ASCII values survive a JSON round trip.
    let header = state.current_header_value().expect("header");
    assert!(header.is_ascii());
    assert!(!header.contains("東京"));
    let json: Value = serde_json::from_str(&header).expect("json");
    assert_eq!(json["fiber_run_id"].as_str(), Some("fiber-123"));
    assert_eq!(json["origin"].as_str(), Some("東京"));
    assert_eq!(json["model"].as_str(), Some("client-supplied"));
    assert_eq!(json["reasoning_effort"].as_str(), Some("client-supplied"));
    assert_eq!(json["session_id"].as_str(), Some("session-a"));
    assert_eq!(json["thread_id"].as_str(), Some("thread-a"));
    assert_eq!(
        json["forked_from_thread_id"].as_str(),
        Some("44444444-4444-4444-8444-444444444444")
    );
    assert_eq!(json["thread_source"].as_str(), Some("user"));
    assert_eq!(json["turn_id"].as_str(), Some("turn-a"));
    assert!(json.get("request_kind").is_none());
    assert!(json.get(WINDOW_ID_KEY).is_none());
    assert_eq!(
        json["turn_started_at_unix_ms"].as_i64(),
        Some(1_700_000_000_123)
    );

    // The model-request header adds the canonical kind and window ID.
    let model_request_header = state
        .current_header_value_for_model_request("thread-a:1")
        .expect("model request header");
    let model_request_json: Value =
        serde_json::from_str(&model_request_header).expect("model request json");
    assert_eq!(model_request_json["request_kind"].as_str(), Some("turn"));
    assert_eq!(
        model_request_json[WINDOW_ID_KEY].as_str(),
        Some("thread-a:1")
    );

    // The MCP value reports the real model and effort and has no window ID.
    let meta = state
        .current_meta_value_for_mcp_request(test_mcp_turn_metadata_context())
        .expect("turn metadata should be present");
    assert_eq!(meta["model"].as_str(), Some("gpt-5.4"));
    assert_eq!(meta["reasoning_effort"].as_str(), Some("high"));
    assert!(meta.get(WINDOW_ID_KEY).is_none());
}

// The `compaction` object and `request_kind = "compaction"` appear only on
// compaction requests; a client-supplied `compaction` key is not forwarded.
#[test]
fn turn_metadata_state_overlays_compaction_only_on_compaction_requests() {
    let temp_dir = TempDir::new().expect("temp dir");
    let cwd = temp_dir.path().abs();
    let permission_profile = PermissionProfile::read_only();

    let state = TurnMetadataState::new(
        "session-a".to_string(),
        "thread-a".to_string(),
        /*forked_from_thread_id*/ None,
        Some(ThreadSource::User),
        "turn-a".to_string(),
        cwd,
        &permission_profile,
        WindowsSandboxLevel::Disabled,
        /*enforce_managed_network*/ false,
    );
    state.set_responsesapi_client_metadata(HashMap::from([(
        "compaction".to_string(),
        "client-supplied".to_string(),
    )]));

    let compact_header = state
        .current_header_value_for_compaction(
            "thread-a:2",
            CompactionTurnMetadata::new(
                CompactionTrigger::Auto,
                CompactionReason::ContextLimit,
                CompactionImplementation::ResponsesCompactionV2,
                CompactionPhase::MidTurn,
            ),
        )
        .expect("compact header");
    let compact_json: Value = serde_json::from_str(&compact_header).expect("json");
    assert_eq!(compact_json["request_kind"].as_str(), Some("compaction"));
    assert_eq!(compact_json["turn_id"].as_str(), Some("turn-a"));
    assert_eq!(compact_json[WINDOW_ID_KEY].as_str(), Some("thread-a:2"));
    assert_eq!(
        compact_json["compaction"],
        serde_json::json!({
            "trigger": "auto",
            "reason": "context_limit",
            "implementation": "responses_compaction_v2",
            "phase": "mid_turn",
            "strategy": "memento",
        })
    );

    // A regular model request from the same state has no compaction overlay.
    let regular_header = state
        .current_header_value_for_model_request("thread-a:3")
        .expect("regular header");
    let regular_json: Value = serde_json::from_str(&regular_header).expect("json");
    assert_eq!(regular_json["request_kind"].as_str(), Some("turn"));
    assert_eq!(regular_json[WINDOW_ID_KEY].as_str(), Some("thread-a:3"));
    assert!(regular_json.get("compaction").is_none());
}