Fix compaction context reinjection and model baselines (#12252)

## Summary
- move regular-turn context diff/full-context persistence into
`run_turn` so pre-turn compaction runs before incoming context updates
are recorded
- after successful pre-turn compaction, rely on a cleared
`reference_context_item` to trigger full context reinjection on the
follow-up regular turn (manual `/compact` keeps replacement history
summary-only and also clears the baseline)
- preserve `<model_switch>` when full context is reinjected, and inject
it *before* the rest of the full-context items
- scope `reference_context_item` and `previous_model` to regular user
turns only so standalone tasks (`/compact`, shell, review, undo) cannot
suppress future reinjection or `<model_switch>` behavior
- make context-diff persistence + `reference_context_item` updates
explicit in the regular-turn path, with clearer docs/comments around the
invariant
- stop persisting local `/compact` `RolloutItem::TurnContext` snapshots
(only regular turns persist `TurnContextItem` now)
- simplify resume/fork previous-model/reference-baseline hydration by
looking up the last surviving turn context from rollout lifecycle
events, including rollback and compaction-crossing handling
- remove the legacy fallback that guessed from bare `TurnContext`
rollouts without lifecycle events
- update compaction/remote-compaction/model-visible snapshots and
compact test assertions (including remote compaction mock response
shape)

## Why
We were persisting incoming context items before spawning the regular
turn task, which let pre-turn compaction requests accidentally include
incoming context diffs without the new user message. Fixing that exposed
follow-on baseline issues around `/compact`, resume/fork, and standalone
tasks that could cause duplicate context injection or suppress
`<model_switch>` instructions.

This PR re-centers the invariants around regular turns:
- regular turns persist model-visible context diffs/full reinjection and
update the `reference_context_item`
- standalone tasks do not advance those regular-turn baselines
- compaction clears the baseline when replacement history may have
stripped the referenced context diffs

## Follow-ups (TODOs left in code)
- `TODO(ccunningham)`: fix rollback/backtracking baseline handling more
comprehensively
- `TODO(ccunningham)`: include pending incoming context items in
pre-turn compaction threshold estimation
- `TODO(ccunningham)`: inject updated personality spec alongside
`<model_switch>` so some model-switch paths can avoid forced full
reinjection
- `TODO(ccunningham)`: review task turn lifecycle
(`TurnStarted`/`TurnComplete`) behavior and emit task-start context
diffs for task types that should have them (excluding `/compact`)

## Validation
- `just fmt`
- CI should cover the updated compaction/resume/model-visible snapshot
expectations and rollout-hydration behavior
- I did **not** rerun the full local test suite after the latest
resume-lookup / rollout-persistence simplifications
This commit is contained in:
Charley Cunningham
2026-02-20 23:13:08 -08:00
committed by GitHub
parent 264fc444b6
commit bb0ac5be70
31 changed files with 1289 additions and 1206 deletions

View File

@@ -363,7 +363,7 @@ async fn summarize_context_three_requests_and_instructions() {
codex.submit(Op::Shutdown).await.unwrap();
wait_for_event(&codex, |ev| matches!(ev, EventMsg::ShutdownComplete)).await;
// Verify rollout contains APITurn entries for each API call and a Compacted entry.
// Verify rollout contains regular sampling TurnContext entries and a Compacted entry.
println!("rollout path: {}", rollout_path.display());
let text = std::fs::read_to_string(&rollout_path).unwrap_or_else(|e| {
panic!(
@@ -371,7 +371,7 @@ async fn summarize_context_three_requests_and_instructions() {
rollout_path.display()
)
});
let mut api_turn_count = 0usize;
let mut regular_turn_context_count = 0usize;
let mut saw_compacted_summary = false;
for line in text.lines() {
let trimmed = line.trim();
@@ -383,7 +383,7 @@ async fn summarize_context_three_requests_and_instructions() {
};
match entry.item {
RolloutItem::TurnContext(_) => {
api_turn_count += 1;
regular_turn_context_count += 1;
}
RolloutItem::Compacted(ci) => {
if ci.message == expected_summary_message {
@@ -395,8 +395,8 @@ async fn summarize_context_three_requests_and_instructions() {
}
assert!(
api_turn_count == 3,
"expected three APITurn entries in rollout"
regular_turn_context_count == 2,
"expected two regular sampling TurnContext entries in rollout"
);
assert!(
saw_compacted_summary,
@@ -2414,26 +2414,36 @@ async fn manual_compact_twice_preserves_latest_user_messages() {
first_request_user_texts[first_turn_user_index], first_user_message,
"first turn request should end with the submitted user message"
);
let seeded_user_prefix = &first_request_user_texts[..first_turn_user_index];
let initial_seeded_user_prefix = &first_request_user_texts[..first_turn_user_index];
let final_request_user_texts = requests
.last()
.unwrap_or_else(|| panic!("final turn request missing for {final_user_message}"))
.message_input_texts("user");
assert!(
final_request_user_texts
.as_slice()
.starts_with(seeded_user_prefix),
"final request should start with seeded user prefix from first request: {seeded_user_prefix:?}"
!initial_seeded_user_prefix.is_empty(),
"first turn should include seeded user prefix before the submitted user message"
);
let final_output = &final_request_user_texts[seeded_user_prefix.len()..];
let expected = vec![
let (final_request_last_user_text, final_request_before_last_user) = final_request_user_texts
.split_last()
.unwrap_or_else(|| panic!("final turn request missing user messages"));
assert_eq!(
final_request_last_user_text, final_user_message,
"final turn request should end with the submitted user message"
);
let history_before_seeded_prefix = final_request_before_last_user
.strip_suffix(initial_seeded_user_prefix)
.unwrap_or_else(|| {
panic!(
"final request should end with the seeded user prefix from the first request: {initial_seeded_user_prefix:?}"
)
});
let expected_history = vec![
first_user_message.to_string(),
second_user_message.to_string(),
expected_second_summary,
final_user_message.to_string(),
];
assert_eq!(final_output, expected.as_slice());
assert_eq!(history_before_seeded_prefix, expected_history.as_slice());
}
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]