Files
codex/docs/tui2/performance-testing.md
Josh McKinney 90f37e8549 perf(tui2): cache transcript view rendering (#8693)
The transcript viewport draws every frame. Ratatui's Line::render_ref
does grapheme segmentation and span layout, so repeated redraws can burn
CPU during streaming even when the visible transcript hasn't changed.

Introduce TranscriptViewCache to reduce per-frame work:
- WrappedTranscriptCache memoizes flattened+wrapped transcript lines per
width, appends incrementally as new cells arrive, and rebuilds on width
change, truncation (backtrack), or transcript replacement.
- TranscriptRasterCache caches rasterized rows (Vec<Cell>) per line
index and user-row styling; redraws copy cells instead of rerendering
spans.

The caches are width-scoped and store base transcript content only;
selection highlighting and copy affordances are applied after drawing.
User rows include the row-wide base style in the cached raster.

Refactor transcript_render to expose append_wrapped_transcript_cell for
incremental building and add a test that incremental append matches the
full build.

Add docs/tui2/performance-testing.md as a playbook for macOS sample
profiles and hotspot greps.

Expand transcript_view_cache tests to cover rebuild conditions, raster
equivalence vs direct rendering, user-row caching, and eviction.

Test: cargo test -p codex-tui2
2026-01-03 11:44:27 -08:00

98 lines
3.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Performance testing (`codex-tui2`)
This doc captures a repeatable workflow for investigating `codex-tui2` performance issues
(especially high idle CPU and high CPU while streaming) and validating optimizations to the draw
hot path.
## Scope (this round)
The current focus is the transcript draw hot path, specifically the cost of repeatedly rendering
the same visible transcript lines via Ratatuis `Line::render_ref` (notably grapheme segmentation
and span layout).
The intended mitigation is a **rasterization cache**: render a wrapped transcript `Line` into a
row of `Cell`s once, cache it, and on subsequent redraws copy cached cells into the frame buffer.
Key invariants:
- The cache is width-scoped (invalidate on terminal width changes).
- The cache stores **base content** only; selection highlight and copy affordances are applied
after rendering, so they dont pollute cached rows.
## Roles
- Human: runs `codex-tui2` in an interactive terminal (e.g. Ghostty), triggers “idle” and
“streaming” scenarios, and captures profiles.
- Assistant (or a script): reads profile output and extracts hotspots and deltas.
## Baseline setup
Build from a clean checkout:
```sh
cd codex-rs
cargo build -p codex-tui2
```
Run `codex-tui2` in a terminal and get a PID (macOS):
```sh
pgrep -n codex-tui2
```
Track CPU quickly while reproducing:
```sh
top -pid "$(pgrep -n codex-tui2)"
```
## Capture profiles (macOS)
Capture both an “idle” and a “streaming” profile so hotspots are not conflated:
```sh
sample "$(pgrep -n codex-tui2)" 1 -file /tmp/tui2.idle.sample.txt
sample "$(pgrep -n codex-tui2)" 1 -file /tmp/tui2.streaming.sample.txt
```
For the streaming sample, trigger a response that emits many deltas (e.g. “Tell me a story”) so
the stream runs long enough to sample.
## Quick hotspot extraction
These `rg` patterns keep the investigation grounded in the data:
```sh
# Buffer diff hot path (idle)
rg -n "custom_terminal::diff_buffers|diff_buffers" /tmp/tui2.*.sample.txt | head -n 80
# Transcript rendering hot path (streaming)
rg -n "App::render_transcript_cells|Line::render|render_spans|styled_graphemes|GraphemeCursor::next_boundary" /tmp/tui2.*.sample.txt | head -n 120
```
## Rasterization-cache validation checklist
After implementing a transcript rasterization cache, re-run the same scenarios and confirm:
- Streaming sample shifts away from `unicode_segmentation::grapheme::GraphemeCursor::next_boundary`
stacks dominating the main thread.
- CPU during streaming drops materially vs baseline for the same streaming load.
- Idle CPU does not regress (redraw gating changes can mask rendering improvements; always measure
both idle and streaming).
## Notes to record per run
- Terminal size: width × height
- Scenario: idle vs streaming (prompt + approximate response length)
- CPU snapshot: `top` (directional)
- Profile excerpt: 2050 relevant lines for the dominant stacks
## Code pointers
- `codex-rs/tui2/src/transcript_view_cache.rs`: wrapped transcript memoization + per-line
rasterization cache (cached `Cell` rows).
- `codex-rs/tui2/src/transcript_render.rs`: incremental helper used by the wrapped-line cache
(`append_wrapped_transcript_cell`).
- `codex-rs/tui2/src/app.rs`: wiring in `App::render_transcript_cells` (uses cached rows instead of
calling `Line::render_ref` every frame).