The transcript viewport draws every frame. Ratatui's `Line::render_ref` does grapheme segmentation and span layout, so repeated redraws can burn CPU during streaming even when the visible transcript hasn't changed. Introduce `TranscriptViewCache` to reduce per-frame work:

- `WrappedTranscriptCache` memoizes flattened+wrapped transcript lines per width, appends incrementally as new cells arrive, and rebuilds on width change, truncation (backtrack), or transcript replacement.
- `TranscriptRasterCache` caches rasterized rows (`Vec<Cell>`) per line index and user-row styling; redraws copy cells instead of rerendering spans.

The caches are width-scoped and store base transcript content only; selection highlighting and copy affordances are applied after drawing. User rows include the row-wide base style in the cached raster.

Refactor `transcript_render` to expose `append_wrapped_transcript_cell` for incremental building, and add a test that incremental append matches the full build. Add `docs/tui2/performance-testing.md` as a playbook for macOS sample profiles and hotspot greps. Expand `transcript_view_cache` tests to cover rebuild conditions, raster equivalence vs direct rendering, user-row caching, and eviction.

Test: `cargo test -p codex-tui2`
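A minimal sketch of the append-vs-rebuild decision the wrapped-line cache has to make. The types and names here (`TranscriptCell` as a plain string, a caller-supplied `generation` counter to signal transcript replacement, `sync`, `wrap_cell`) are illustrative assumptions, not the actual codex-tui2 API:

```rust
/// Hypothetical, simplified stand-in: real transcript cells are richer than strings.
type TranscriptCell = String;

/// Illustrative sketch of the wrapped-line cache, not the real codex-tui2 type.
struct WrappedTranscriptCache {
    wrap_width: u16,
    generation: u64,       // assumed: bumped by the caller when the transcript is replaced
    cells_consumed: usize, // how many transcript cells have been flattened so far
    lines: Vec<String>,    // wrapped lines produced from those cells
}

impl WrappedTranscriptCache {
    fn sync(&mut self, cells: &[TranscriptCell], width: u16, generation: u64) {
        // Rebuild from scratch on a width change, truncation (backtrack), or replacement.
        let rebuild = width != self.wrap_width
            || generation != self.generation
            || cells.len() < self.cells_consumed;
        if rebuild {
            self.wrap_width = width;
            self.generation = generation;
            self.cells_consumed = 0;
            self.lines.clear();
        }
        // Otherwise append only the cells that arrived since the last call; this is
        // the role `append_wrapped_transcript_cell` plays in transcript_render.
        for cell in &cells[self.cells_consumed..] {
            self.lines.extend(wrap_cell(cell, width));
        }
        self.cells_consumed = cells.len();
    }
}

/// Placeholder for the real flatten-and-wrap logic (naive fixed-width chunking).
fn wrap_cell(cell: &TranscriptCell, width: u16) -> Vec<String> {
    cell.chars()
        .collect::<Vec<char>>()
        .chunks(width.max(1) as usize)
        .map(|chunk| chunk.iter().collect())
        .collect()
}
```

Under these assumptions, the equivalence test amounts to syncing one cache cell by cell and another over the full transcript in one shot, then asserting the wrapped lines match.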
# Performance testing (codex-tui2)
This doc captures a repeatable workflow for investigating codex-tui2 performance issues
(especially high idle CPU and high CPU while streaming) and validating optimizations to the draw
hot path.
## Scope (this round)
The current focus is the transcript draw hot path, specifically the cost of repeatedly rendering
the same visible transcript lines via Ratatui’s `Line::render_ref` (notably grapheme segmentation
and span layout).
The intended mitigation is a rasterization cache: render a wrapped transcript `Line` into a
row of `Cell`s once, cache it, and on subsequent redraws copy the cached cells into the frame buffer.
Key invariants:
- The cache is width-scoped (invalidate on terminal width changes).
- The cache stores base content only; selection highlight and copy affordances are applied after rendering, so they don’t pollute cached rows.
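Below is a minimal sketch of the shape such a cache might take, assuming illustrative names (`RasterCache`, `ensure_width`, `row_or_render`); the real `TranscriptRasterCache` in codex-tui2 may be organized differently:

```rust
use std::collections::HashMap;

use ratatui::buffer::Cell;

/// Illustrative width-scoped raster cache: one row of rasterized cells per
/// wrapped-line index, valid only for the width it was rendered at.
struct RasterCache {
    width: u16,
    rows: HashMap<usize, Vec<Cell>>,
}

impl RasterCache {
    fn new(width: u16) -> Self {
        Self { width, rows: HashMap::new() }
    }

    /// Cached rows were rasterized for a specific wrap width, so a width
    /// change invalidates everything.
    fn ensure_width(&mut self, width: u16) {
        if width != self.width {
            self.width = width;
            self.rows.clear();
        }
    }

    /// Return the cached row for `index`, rasterizing it on first use. Only
    /// base content goes in; selection highlight and copy affordances are
    /// applied to the frame after the copy, never to the cached cells.
    fn row_or_render(
        &mut self,
        index: usize,
        render: impl FnOnce() -> Vec<Cell>,
    ) -> &[Cell] {
        self.rows.entry(index).or_insert_with(render).as_slice()
    }
}
```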
## Roles
- Human: runs `codex-tui2` in an interactive terminal (e.g. Ghostty), triggers “idle” and “streaming” scenarios, and captures profiles.
- Assistant (or a script): reads profile output and extracts hotspots and deltas.
## Baseline setup
Build from a clean checkout:
```sh
cd codex-rs
cargo build -p codex-tui2
```
Run `codex-tui2` in a terminal and get a PID (macOS):
```sh
pgrep -n codex-tui2
```
Track CPU quickly while reproducing:
```sh
top -pid "$(pgrep -n codex-tui2)"
```
## Capture profiles (macOS)
Capture both an “idle” and a “streaming” profile so hotspots are not conflated:
```sh
sample "$(pgrep -n codex-tui2)" 1 -file /tmp/tui2.idle.sample.txt
sample "$(pgrep -n codex-tui2)" 1 -file /tmp/tui2.streaming.sample.txt
```
For the streaming sample, trigger a response that emits many deltas (e.g. “Tell me a story”) so the stream runs long enough to sample.
## Quick hotspot extraction
These `rg` patterns keep the investigation grounded in the data:
```sh
# Buffer diff hot path (idle)
rg -n "custom_terminal::diff_buffers|diff_buffers" /tmp/tui2.*.sample.txt | head -n 80

# Transcript rendering hot path (streaming)
rg -n "App::render_transcript_cells|Line::render|render_spans|styled_graphemes|GraphemeCursor::next_boundary" /tmp/tui2.*.sample.txt | head -n 120
```
## Rasterization-cache validation checklist
After implementing a transcript rasterization cache, re-run the same scenarios and confirm:
- Streaming sample shifts away from `unicode_segmentation::grapheme::GraphemeCursor::next_boundary` stacks dominating the main thread.
- CPU during streaming drops materially vs baseline for the same streaming load.
- Idle CPU does not regress (redraw gating changes can mask rendering improvements; always measure both idle and streaming).
## Notes to record per run
- Terminal size: width × height
- Scenario: idle vs streaming (prompt + approximate response length)
- CPU snapshot: `top` (directional)
- Profile excerpt: 20–50 relevant lines for the dominant stacks
## Code pointers
- `codex-rs/tui2/src/transcript_view_cache.rs`: wrapped transcript memoization + per-line rasterization cache (cached `Cell` rows).
- `codex-rs/tui2/src/transcript_render.rs`: incremental helper used by the wrapped-line cache (`append_wrapped_transcript_cell`).
- `codex-rs/tui2/src/app.rs`: wiring in `App::render_transcript_cells` (uses cached rows instead of calling `Line::render_ref` every frame).
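For orientation, a minimal sketch of the copy-instead-of-render step. It assumes a recent ratatui (`Buffer::cell_mut` with a `Position`), and `blit_cached_row` is an illustrative name rather than the actual wiring in `App::render_transcript_cells`:

```rust
use ratatui::buffer::{Buffer, Cell};
use ratatui::layout::{Position, Rect};

/// Copy one cached, pre-rasterized row into the frame buffer instead of
/// re-rendering spans. `cells` is a cached row holding base content only;
/// selection highlight and copy affordances are applied to `buf` afterwards.
fn blit_cached_row(buf: &mut Buffer, area: Rect, y: u16, cells: &[Cell]) {
    for (x, cell) in cells.iter().take(area.width as usize).enumerate() {
        // Bounds-checked accessor; out-of-range positions are simply skipped.
        if let Some(dst) = buf.cell_mut(Position::new(area.x + x as u16, area.y + y)) {
            *dst = cell.clone();
        }
    }
}
```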