Fix compaction context reinjection and model baselines (#12252)

## Summary
- move regular-turn context diff/full-context persistence into
`run_turn` so pre-turn compaction runs before incoming context updates
are recorded
- after successful pre-turn compaction, rely on a cleared
`reference_context_item` to trigger full context reinjection on the
follow-up regular turn (manual `/compact` keeps replacement history
summary-only and also clears the baseline)
- preserve `<model_switch>` when full context is reinjected, and inject
it *before* the rest of the full-context items
- scope `reference_context_item` and `previous_model` to regular user
turns only so standalone tasks (`/compact`, shell, review, undo) cannot
suppress future reinjection or `<model_switch>` behavior
- make context-diff persistence + `reference_context_item` updates
explicit in the regular-turn path, with clearer docs/comments around the
invariant
- stop persisting local `/compact` `RolloutItem::TurnContext` snapshots
(only regular turns persist `TurnContextItem` now)
- simplify resume/fork previous-model/reference-baseline hydration by
looking up the last surviving turn context from rollout lifecycle
events, including rollback and compaction-crossing handling
- remove the legacy fallback that guessed from bare `TurnContext`
rollouts without lifecycle events
- update compaction/remote-compaction/model-visible snapshots and
compact test assertions (including remote compaction mock response
shape)
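
A minimal sketch of the reinjection rule described in the bullets above, not the actual implementation. `ContextSnapshot`, `context_items_for_turn`, and the placeholder item strings are hypothetical names chosen for illustration. The assumption is that a cleared baseline (`None`) forces full reinjection, and that `<model_switch>` is emitted before the other context items.

```rust
/// Hypothetical stand-in for the per-turn context that the baseline tracks.
#[derive(Clone, Debug, PartialEq)]
struct ContextSnapshot {
    cwd: String,
    model: String,
}

/// Builds the context items recorded at the start of a regular turn.
/// `reference` plays the role of `reference_context_item`; `None` means the
/// baseline was cleared (for example by compaction).
fn context_items_for_turn(
    reference: Option<&ContextSnapshot>,
    previous_model: Option<&str>,
    current: &ContextSnapshot,
) -> Vec<String> {
    let mut items = Vec::new();

    // The model-switch notice goes first so it precedes the reinjected context.
    if let Some(prev) = previous_model {
        if prev != current.model {
            items.push(format!("<model_switch> {} -> {}", prev, current.model));
        }
    }

    match reference {
        // No baseline: reinject the full context.
        None => {
            items.push("<PERMISSIONS_INSTRUCTIONS>".to_string());
            items.push("<AGENTS_MD>".to_string());
            items.push(format!("<ENVIRONMENT_CONTEXT:cwd={}>", current.cwd));
        }
        // Baseline present: emit only what changed (just `cwd` in this sketch).
        Some(base) => {
            if base.cwd != current.cwd {
                items.push(format!("<ENVIRONMENT_CONTEXT:cwd={}>", current.cwd));
            }
        }
    }
    items
}
```

With a cleared baseline and a changed model, the sketch yields four items with the model-switch notice at index 0. With an unchanged baseline and model it yields none.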

## Why
We were persisting incoming context items before spawning the regular
turn task, which let pre-turn compaction requests accidentally include
incoming context diffs without the new user message. Fixing that exposed
follow-on baseline issues around `/compact`, resume/fork, and standalone
tasks that could cause duplicate context injection or suppress
`<model_switch>` instructions.
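
The ordering problem can be illustrated with a toy model. This is a sketch under simplifying assumptions: `Session`, its fields, and the length-based threshold are hypothetical, history items are plain strings, and the replacement history is reduced to a single summary item. The point is only that compaction runs before the incoming context update is recorded.

```rust
/// Hypothetical, simplified session state.
struct Session {
    history: Vec<String>,
    /// Every compaction request that was built, kept for inspection.
    compaction_requests: Vec<Vec<String>>,
    /// Compaction triggers once history reaches this many items.
    compact_threshold: usize,
}

impl Session {
    fn new(compact_threshold: usize) -> Self {
        Session {
            history: Vec::new(),
            compaction_requests: Vec::new(),
            compact_threshold,
        }
    }

    /// One regular turn: compact first, then record context updates,
    /// then record the new user message.
    fn run_turn(&mut self, context_updates: Vec<String>, user_message: &str) {
        if self.history.len() >= self.compact_threshold {
            // The compaction request sees prior-turn history only.
            let mut request = self.history.clone();
            request.push("<SUMMARIZATION_PROMPT>".to_string());
            self.compaction_requests.push(request);
            // Simplification: replace history with a single summary item.
            self.history = vec!["<COMPACTION_SUMMARY>".to_string()];
        }
        // Incoming context updates are persisted only after compaction.
        self.history.extend(context_updates);
        self.history.push(user_message.to_string());
    }
}
```

Recording `context_updates` before the threshold check would reproduce the bug: the compaction request would carry the incoming diff without the user message it belongs to.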

This PR re-centers the invariants around regular turns:
- regular turns persist model-visible context diffs/full reinjection and
update the `reference_context_item`
- standalone tasks do not advance those regular-turn baselines
- compaction clears the baseline when replacement history may have
stripped the referenced context diffs
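
The three invariants above can be sketched as a small state holder. `TaskKind`, `Baselines`, and the method names are hypothetical and exist only for this illustration; the sketch also assumes compaction clears the context baseline but leaves `previous_model` untouched, so a later `<model_switch>` can still be emitted.

```rust
/// Hypothetical task kinds; only `Regular` advances the baselines.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum TaskKind {
    Regular,
    Compact,
    Shell,
    Review,
    Undo,
}

/// Hypothetical holder for the two regular-turn baselines.
#[derive(Default)]
struct Baselines {
    reference_context_item: Option<String>,
    previous_model: Option<String>,
}

impl Baselines {
    /// Called when a task has recorded its context; standalone tasks are ignored.
    fn on_task_recorded(&mut self, kind: TaskKind, context: &str, model: &str) {
        if kind == TaskKind::Regular {
            self.reference_context_item = Some(context.to_string());
            self.previous_model = Some(model.to_string());
        }
    }

    /// Compaction may have stripped the referenced diffs, so drop the baseline.
    fn on_compaction(&mut self) {
        self.reference_context_item = None;
    }

    /// A missing baseline means the next regular turn reinjects full context.
    fn needs_full_reinjection(&self) -> bool {
        self.reference_context_item.is_none()
    }
}
```

In this sketch a shell task that runs between two regular turns leaves both baselines at the values set by the first regular turn.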

## Follow-ups (TODOs left in code)
- `TODO(ccunningham)`: fix rollback/backtracking baseline handling more
comprehensively
- `TODO(ccunningham)`: include pending incoming context items in
pre-turn compaction threshold estimation
- `TODO(ccunningham)`: inject updated personality spec alongside
`<model_switch>` so some model-switch paths can avoid forced full
reinjection
- `TODO(ccunningham)`: review task turn lifecycle
(`TurnStarted`/`TurnComplete`) behavior and emit task-start context
diffs for task types that should have them (excluding `/compact`)

## Validation
- `just fmt`
- CI should cover the updated compaction/resume/model-visible snapshot
expectations and rollout-hydration behavior
- I did **not** rerun the full local test suite after the latest
resume-lookup / rollout-persistence simplifications
Charley Cunningham
2026-02-20 23:13:08 -08:00
committed by GitHub
parent 264fc444b6
commit bb0ac5be70
31 changed files with 1289 additions and 1206 deletions

@@ -13,9 +13,9 @@ Scenario: Manual /compact with prior user history compacts existing history and
05:message/user:<SUMMARIZATION_PROMPT>
## Local Post-Compaction History Layout
-00:message/developer:<PERMISSIONS_INSTRUCTIONS>
-01:message/user:<AGENTS_MD>
-02:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
-03:message/user:first manual turn
-04:message/user:<COMPACTION_SUMMARY>\nFIRST_MANUAL_SUMMARY
+00:message/user:first manual turn
+01:message/user:<COMPACTION_SUMMARY>\nFIRST_MANUAL_SUMMARY
+02:message/developer:<PERMISSIONS_INSTRUCTIONS>
+03:message/user:<AGENTS_MD>
+04:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
05:message/user:second manual turn

@@ -11,8 +11,8 @@ Scenario: Manual /compact with no prior user turn currently still issues a compa
03:message/user:<SUMMARIZATION_PROMPT>
## Local Post-Compaction History Layout
-00:message/developer:<PERMISSIONS_INSTRUCTIONS>
-01:message/user:<AGENTS_MD>
-02:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
-03:message/user:<COMPACTION_SUMMARY>\nMANUAL_EMPTY_SUMMARY
+00:message/user:<COMPACTION_SUMMARY>\nMANUAL_EMPTY_SUMMARY
+01:message/developer:<PERMISSIONS_INSTRUCTIONS>
+02:message/user:<AGENTS_MD>
+03:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
04:message/user:AFTER_MANUAL_EMPTY_COMPACT

@@ -1,6 +1,5 @@
---
source: core/tests/suite/compact.rs
-assertion_line: 1773
expression: "format_labeled_requests_snapshot(\"Pre-sampling compaction on model switch to a smaller context window: current behavior compacts using prior-turn history only (incoming user message excluded), and the follow-up request carries compacted history plus the new user message.\",\n&[(\"Initial Request (Previous Model)\", &requests[0]),\n(\"Pre-sampling Compaction Request\", &requests[1]),\n(\"Post-Compaction Follow-up Request (Next Model)\", &requests[2]),])"
---
Scenario: Pre-sampling compaction on model switch to a smaller context window: current behavior compacts using prior-turn history only (incoming user message excluded), and the follow-up request carries compacted history plus the new user message.
@@ -22,10 +21,10 @@ Scenario: Pre-sampling compaction on model switch to a smaller context window: c
06:message/user:<SUMMARIZATION_PROMPT>
## Post-Compaction Follow-up Request (Next Model)
-00:message/developer:<PERMISSIONS_INSTRUCTIONS>
-01:message/user:<AGENTS_MD>
-02:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
-03:message/user:before switch
-04:message/user:<COMPACTION_SUMMARY>\nPRE_SAMPLING_SUMMARY
-05:message/developer:<model_switch>\nThe user was previously using a different model....
+00:message/user:before switch
+01:message/user:<COMPACTION_SUMMARY>\nPRE_SAMPLING_SUMMARY
+02:message/developer:<model_switch>\nThe user was previously using a different model....
+03:message/developer:<PERMISSIONS_INSTRUCTIONS>
+04:message/user:<AGENTS_MD>
+05:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
06:message/user:after switch

@@ -12,14 +12,13 @@ Scenario: Pre-turn auto-compaction with a context override emits the context dif
04:message/assistant:FIRST_REPLY
05:message/user:USER_TWO
06:message/assistant:SECOND_REPLY
-07:message/user:<ENVIRONMENT_CONTEXT:cwd=PRETURN_CONTEXT_DIFF_CWD>
-08:message/user:<SUMMARIZATION_PROMPT>
+07:message/user:<SUMMARIZATION_PROMPT>
## Local Post-Compaction History Layout
-00:message/developer:<PERMISSIONS_INSTRUCTIONS>
-01:message/user:<AGENTS_MD>
-02:message/user:<ENVIRONMENT_CONTEXT:cwd=PRETURN_CONTEXT_DIFF_CWD>
-03:message/user:USER_ONE
-04:message/user:USER_TWO
-05:message/user:<COMPACTION_SUMMARY>\nPRE_TURN_SUMMARY
+00:message/user:USER_ONE
+01:message/user:USER_TWO
+02:message/user:<COMPACTION_SUMMARY>\nPRE_TURN_SUMMARY
+03:message/developer:<PERMISSIONS_INSTRUCTIONS>
+04:message/user:<AGENTS_MD>
+05:message/user:<ENVIRONMENT_CONTEXT:cwd=PRETURN_CONTEXT_DIFF_CWD>
06:message/user:<image> | <input_image:image_url> | </image> | USER_THREE

@@ -1,6 +1,5 @@
---
source: core/tests/suite/compact.rs
-assertion_line: 3152
expression: "format_labeled_requests_snapshot(\"Pre-turn compaction during model switch (without pre-sampling model-switch compaction): current behavior strips incoming <model_switch> from the compact request and restores it in the post-compaction follow-up request.\",\n&[(\"Initial Request (Previous Model)\", &requests[0]),\n(\"Local Compaction Request\", &requests[1]),\n(\"Local Post-Compaction History Layout\", &requests[2]),])"
---
Scenario: Pre-turn compaction during model switch (without pre-sampling model-switch compaction): current behavior strips incoming <model_switch> from the compact request and restores it in the post-compaction follow-up request.
@@ -22,11 +21,11 @@ Scenario: Pre-turn compaction during model switch (without pre-sampling model-sw
06:message/user:<SUMMARIZATION_PROMPT>
## Local Post-Compaction History Layout
-00:message/developer:<PERMISSIONS_INSTRUCTIONS>
-01:message/developer:<personality_spec> The user has requested a new communication st...
-02:message/user:<AGENTS_MD>
-03:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
-04:message/user:BEFORE_SWITCH_USER
-05:message/user:<COMPACTION_SUMMARY>\nPRETURN_SWITCH_SUMMARY
-06:message/developer:<model_switch>\nThe user was previously using a different model....
+00:message/user:BEFORE_SWITCH_USER
+01:message/user:<COMPACTION_SUMMARY>\nPRETURN_SWITCH_SUMMARY
+02:message/developer:<model_switch>\nThe user was previously using a different model....
+03:message/developer:<PERMISSIONS_INSTRUCTIONS>
+04:message/developer:<personality_spec> The user has requested a new communication st...
+05:message/user:<AGENTS_MD>
+06:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
07:message/user:AFTER_SWITCH_USER

@@ -1,9 +1,8 @@
---
source: core/tests/suite/compact_remote.rs
-assertion_line: 178
-expression: "format_labeled_requests_snapshot(\"Remote manual /compact where remote compact output is summary-only: follow-up layout uses returned summary plus new user message.\",\n&[(\"Remote Compaction Request\", &compact_request),\n(\"Remote Post-Compaction History Layout\", follow_up_request),])"
+expression: "format_labeled_requests_snapshot(\"Remote manual /compact where remote compact output is compaction-only: follow-up layout uses the returned compaction item plus new user message.\",\n&[(\"Remote Compaction Request\", &compact_request),\n(\"Remote Post-Compaction History Layout\", follow_up_request),])"
---
-Scenario: Remote manual /compact where remote compact output is summary-only: follow-up layout uses returned summary plus new user message.
+Scenario: Remote manual /compact where remote compact output is compaction-only: follow-up layout uses the returned compaction item plus new user message.
## Remote Compaction Request
00:message/developer:<PERMISSIONS_INSTRUCTIONS>
@@ -13,9 +12,8 @@ Scenario: Remote manual /compact where remote compact output is summary-only: fo
04:message/assistant:FIRST_REMOTE_REPLY
## Remote Post-Compaction History Layout
-00:message/developer:<PERMISSIONS_INSTRUCTIONS>
-01:message/user:<AGENTS_MD>
-02:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
-03:message/user:REMOTE_COMPACTED_SUMMARY
-04:compaction:encrypted=true
-05:message/user:after compact
+00:compaction:encrypted=true
+01:message/developer:<PERMISSIONS_INSTRUCTIONS>
+02:message/user:<AGENTS_MD>
+03:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
+04:message/user:after compact

@@ -1,21 +1,18 @@
---
source: core/tests/suite/compact_remote.rs
-expression: "format_labeled_requests_snapshot(\"Remote mid-turn compaction after an earlier summary compaction: the older summary remains in model-visible history and round-trips into the next compact request.\",\n&[(\"Second Turn Request (Before Mid-Turn Compaction)\", &requests[1]),\n(\"Remote Compaction Request\", &compact_request),])"
+assertion_line: 1876
+expression: "format_labeled_requests_snapshot(\"After a prior manual /compact produced an older remote compaction item, the next turn hits remote auto-compaction before the next sampling request. The compact request carries forward that earlier compaction item, and the next sampling request shows the latest compaction item with context reinjected before USER_TWO.\",\n&[(\"Remote Compaction Request\", &compact_request),\n(\"Second Turn Request (After Compaction)\", &second_turn_request),])"
---
-Scenario: Remote mid-turn compaction after an earlier summary compaction: the older summary remains in model-visible history and round-trips into the next compact request.
-## Second Turn Request (Before Mid-Turn Compaction)
-00:message/user:USER_ONE
-01:message/user:<COMPACTION_SUMMARY>\nREMOTE_OLDER_SUMMARY
-02:message/developer:<PERMISSIONS_INSTRUCTIONS>
-03:message/user:<AGENTS_MD>
-04:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
-05:message/user:<COMPACTION_SUMMARY>\nREMOTE_LATEST_SUMMARY
-06:message/user:USER_TWO
+Scenario: After a prior manual /compact produced an older remote compaction item, the next turn hits remote auto-compaction before the next sampling request. The compact request carries forward that earlier compaction item, and the next sampling request shows the latest compaction item with context reinjected before USER_TWO.
## Remote Compaction Request
00:message/user:USER_ONE
-01:message/developer:<PERMISSIONS_INSTRUCTIONS>
-02:message/user:<AGENTS_MD>
-03:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
-04:message/user:<COMPACTION_SUMMARY>\nREMOTE_OLDER_SUMMARY
+01:compaction:encrypted=true
+## Second Turn Request (After Compaction)
+00:message/user:USER_ONE
+01:compaction:encrypted=true
+02:message/developer:<PERMISSIONS_INSTRUCTIONS>
+03:message/user:<AGENTS_MD>
+04:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
+05:message/user:USER_TWO

@@ -1,8 +1,8 @@
---
source: core/tests/suite/compact_remote.rs
-expression: "format_labeled_requests_snapshot(\"Remote mid-turn continuation compaction after tool output: compact request includes tool artifacts and follow-up request includes the summary.\",\n&[(\"Remote Compaction Request\", &compact_request),\n(\"Remote Post-Compaction History Layout\", &requests[1]),])"
+expression: "format_labeled_requests_snapshot(\"Remote mid-turn continuation compaction after tool output: compact request includes tool artifacts and the follow-up request includes the returned compaction item.\",\n&[(\"Remote Compaction Request\", &compact_request),\n(\"Remote Post-Compaction History Layout\", &requests[1]),])"
---
-Scenario: Remote mid-turn continuation compaction after tool output: compact request includes tool artifacts and follow-up request includes the summary.
+Scenario: Remote mid-turn continuation compaction after tool output: compact request includes tool artifacts and the follow-up request includes the returned compaction item.
## Remote Compaction Request
00:message/developer:<PERMISSIONS_INSTRUCTIONS>
@@ -13,8 +13,8 @@ Scenario: Remote mid-turn continuation compaction after tool output: compact req
05:function_call_output:unsupported call: test_tool
## Remote Post-Compaction History Layout
-00:message/user:USER_ONE
-01:message/developer:<PERMISSIONS_INSTRUCTIONS>
-02:message/user:<AGENTS_MD>
-03:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
-04:message/user:<COMPACTION_SUMMARY>\nREMOTE_MID_TURN_SUMMARY
+00:message/developer:<PERMISSIONS_INSTRUCTIONS>
+01:message/user:<AGENTS_MD>
+02:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
+03:message/user:USER_ONE
+04:compaction:encrypted=true

@@ -1,8 +1,8 @@
---
source: core/tests/suite/compact_remote.rs
-expression: "format_labeled_requests_snapshot(\"Remote mid-turn compaction where compact output has only summary user content: continuation layout reinjects canonical context before that summary.\",\n&[(\"Remote Compaction Request\", &compact_request),\n(\"Remote Post-Compaction History Layout\", &requests[1]),])"
+expression: "format_labeled_requests_snapshot(\"Remote mid-turn compaction where compact output has only a compaction item: continuation layout reinjects context before that compaction item.\",\n&[(\"Remote Compaction Request\", &compact_request),\n(\"Remote Post-Compaction History Layout\", &requests[1]),])"
---
-Scenario: Remote mid-turn compaction where compact output has only summary user content: continuation layout reinjects canonical context before that summary.
+Scenario: Remote mid-turn compaction where compact output has only a compaction item: continuation layout reinjects context before that compaction item.
## Remote Compaction Request
00:message/developer:<PERMISSIONS_INSTRUCTIONS>
@@ -16,4 +16,4 @@ Scenario: Remote mid-turn compaction where compact output has only summary user
00:message/developer:<PERMISSIONS_INSTRUCTIONS>
01:message/user:<AGENTS_MD>
02:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
-03:message/user:<COMPACTION_SUMMARY>\nREMOTE_SUMMARY_ONLY
+03:compaction:encrypted=true

@@ -12,13 +12,12 @@ Scenario: Remote pre-turn auto-compaction with a context override emits the cont
04:message/assistant:REMOTE_FIRST_REPLY
05:message/user:USER_TWO
06:message/assistant:REMOTE_SECOND_REPLY
-07:message/user:<ENVIRONMENT_CONTEXT:cwd=PRETURN_CONTEXT_DIFF_CWD>
## Remote Post-Compaction History Layout
00:message/user:USER_ONE
01:message/user:USER_TWO
-02:message/developer:<PERMISSIONS_INSTRUCTIONS>
-03:message/user:<AGENTS_MD>
-04:message/user:<ENVIRONMENT_CONTEXT:cwd=PRETURN_CONTEXT_DIFF_CWD>
-05:message/user:<COMPACTION_SUMMARY>\nREMOTE_PRE_TURN_SUMMARY
+02:compaction:encrypted=true
+03:message/developer:<PERMISSIONS_INSTRUCTIONS>
+04:message/user:<AGENTS_MD>
+05:message/user:<ENVIRONMENT_CONTEXT:cwd=PRETURN_CONTEXT_DIFF_CWD>
06:message/user:USER_THREE

@@ -1,6 +1,6 @@
---
source: core/tests/suite/compact_remote.rs
-expression: "format_labeled_requests_snapshot(\"Remote pre-turn compaction during model switch currently excludes incoming user input, strips incoming <model_switch> from the compact request payload, and restores it in the post-compaction follow-up request.\",\n&[(\"Initial Request (Previous Model)\", &requests[0]),\n(\"Remote Compaction Request\", &compact_request),\n(\"Remote Post-Compaction History Layout\", &requests[1]),])"
+expression: "format_labeled_requests_snapshot(\"Remote pre-turn compaction during model switch currently excludes incoming user input, strips incoming <model_switch> from the compact request payload, and restores it in the post-compaction follow-up request.\",\n&[(\"Initial Request (Previous Model)\", &initial_turn_request),\n(\"Remote Compaction Request\", &compact_request),\n(\"Remote Post-Compaction History Layout\", &post_compact_turn_request),])"
---
Scenario: Remote pre-turn compaction during model switch currently excludes incoming user input, strips incoming <model_switch> from the compact request payload, and restores it in the post-compaction follow-up request.
@@ -19,10 +19,10 @@ Scenario: Remote pre-turn compaction during model switch currently excludes inco
## Remote Post-Compaction History Layout
00:message/user:BEFORE_SWITCH_USER
-01:message/developer:<PERMISSIONS_INSTRUCTIONS>
-02:message/developer:<personality_spec> The user has requested a new communication st...
-03:message/user:<AGENTS_MD>
-04:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
-05:message/user:<COMPACTION_SUMMARY>\nREMOTE_SWITCH_SUMMARY
-06:message/developer:<model_switch>\nThe user was previously using a different model....
+01:compaction:encrypted=true
+02:message/developer:<model_switch>\nThe user was previously using a different model....
+03:message/developer:<PERMISSIONS_INSTRUCTIONS>
+04:message/developer:<personality_spec> The user has requested a new communication st...
+05:message/user:<AGENTS_MD>
+06:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
07:message/user:AFTER_SWITCH_USER

@@ -1,5 +1,6 @@
---
source: core/tests/suite/model_visible_layout.rs
+assertion_line: 435
expression: "format_labeled_requests_snapshot(\"First post-resume turn where pre-turn override sets model to rollout model; no model-switch update should appear.\",\n&[(\"Last Request Before Resume\", &initial_request),\n(\"First Request After Resume + Override\", &resumed_request),])"
---
Scenario: First post-resume turn where pre-turn override sets model to rollout model; no model-switch update should appear.
@@ -16,7 +17,5 @@ Scenario: First post-resume turn where pre-turn override sets model to rollout m
02:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
03:message/user:seed resume history
04:message/assistant:recorded before resume
-05:message/developer:<PERMISSIONS_INSTRUCTIONS>
-06:message/user:<AGENTS_MD>
-07:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
-08:message/user:first resumed turn after model override
+05:message/user:<ENVIRONMENT_CONTEXT:cwd=PRETURN_CONTEXT_DIFF_CWD>
+06:message/user:first resumed turn after model override

@@ -1,5 +1,6 @@
---
source: core/tests/suite/model_visible_layout.rs
+assertion_line: 337
expression: "format_labeled_requests_snapshot(\"First post-resume turn where resumed config model differs from rollout and personality changes.\",\n&[(\"Last Request Before Resume\", &initial_request),\n(\"First Request After Resume\", &resumed_request),])"
---
Scenario: First post-resume turn where resumed config model differs from rollout and personality changes.
@@ -16,11 +17,7 @@ Scenario: First post-resume turn where resumed config model differs from rollout
02:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
03:message/user:seed resume history
04:message/assistant:recorded before resume
-05:message/developer:<PERMISSIONS_INSTRUCTIONS>
-06:message/developer:<personality_spec> The user has requested a new communication style. Future messages should adhe...
-07:message/user:<AGENTS_MD>
-08:message/user:<ENVIRONMENT_CONTEXT:cwd=<CWD>>
-09:message/developer:<PERMISSIONS_INSTRUCTIONS>
-10:message/developer:<model_switch>\nThe user was previously using a different model. Please continue the conversatio...
-11:message/developer:<personality_spec> The user has requested a new communication style. Future messages should adhe...
-12:message/user:resume and change personality
+05:message/developer:<model_switch>\nThe user was previously using a different model. Please continue the conversatio...
+06:message/user:<ENVIRONMENT_CONTEXT:cwd=PRETURN_CONTEXT_DIFF_CWD>
+07:message/developer:<PERMISSIONS_INSTRUCTIONS>
+08:message/user:resume and change personality