codex/docs/tui2/performance-testing.md
Josh McKinney 90f37e8549 perf(tui2): cache transcript view rendering (#8693)
The transcript viewport draws every frame. Ratatui's Line::render_ref
does grapheme segmentation and span layout, so repeated redraws can burn
CPU during streaming even when the visible transcript hasn't changed.

Introduce TranscriptViewCache to reduce per-frame work:
- WrappedTranscriptCache memoizes flattened+wrapped transcript lines per
width, appends incrementally as new cells arrive, and rebuilds on width
change, truncation (backtrack), or transcript replacement.
- TranscriptRasterCache caches rasterized rows (Vec<Cell>) per line
index and user-row styling; redraws copy cells instead of rerendering
spans.

The caches are width-scoped and store base transcript content only;
selection highlighting and copy affordances are applied after drawing.
User rows include the row-wide base style in the cached raster.

Refactor transcript_render to expose append_wrapped_transcript_cell for
incremental building and add a test that incremental append matches the
full build.

Add docs/tui2/performance-testing.md as a playbook for macOS sample
profiles and hotspot greps.

Expand transcript_view_cache tests to cover rebuild conditions, raster
equivalence vs direct rendering, user-row caching, and eviction.

Test: cargo test -p codex-tui2
2026-01-03 11:44:27 -08:00


Performance testing (codex-tui2)

This doc captures a repeatable workflow for investigating codex-tui2 performance issues (especially high idle CPU and high CPU while streaming) and validating optimizations to the draw hot path.

Scope (this round)

The current focus is the transcript draw hot path, specifically the cost of repeatedly rendering the same visible transcript lines via Ratatui's Line::render_ref (notably grapheme segmentation and span layout).

The intended mitigation is a rasterization cache: render a wrapped transcript Line into a row of Cells once, cache it, and on subsequent redraws copy cached cells into the frame buffer.

Key invariants:

  • The cache is width-scoped (invalidate on terminal width changes).
  • The cache stores base content only; selection highlight and copy affordances are applied after rendering, so they don't pollute cached rows.
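
A minimal sketch of the rasterization-cache idea under the invariants above. It is not the code in transcript_view_cache.rs: the Cell struct is a stand-in for Ratatui's Cell, and RasterCache, set_width, row, and rasterize are hypothetical names chosen for illustration. The point it shows is that a row is rasterized once per (width, line index) and later redraws reuse the stored cells.

```rust
use std::collections::HashMap;

/// Stand-in for a terminal cell (assumption: the real cache stores Ratatui `Cell`s).
#[derive(Clone, Debug, PartialEq)]
pub struct Cell {
    pub symbol: String,
}

/// Hypothetical width-scoped cache of rasterized rows, keyed by wrapped-line index.
pub struct RasterCache {
    width: u16,
    rows: HashMap<usize, Vec<Cell>>,
    /// How many times the expensive render closure actually ran.
    pub renders: usize,
}

impl RasterCache {
    pub fn new(width: u16) -> Self {
        Self { width, rows: HashMap::new(), renders: 0 }
    }

    /// Invalidate every cached row when the terminal width changes.
    pub fn set_width(&mut self, width: u16) {
        if width != self.width {
            self.width = width;
            self.rows.clear();
        }
    }

    /// Return the cached row for `line_index`, rasterizing it once on a miss.
    pub fn row(&mut self, line_index: usize, render: impl FnOnce(u16) -> Vec<Cell>) -> &[Cell] {
        let width = self.width;
        let renders = &mut self.renders;
        let row = self.rows.entry(line_index).or_insert_with(|| {
            *renders += 1;
            render(width)
        });
        row.as_slice()
    }
}

/// Toy rasterizer: one cell per char, truncated or space-padded to `width`.
/// It stands in for the costly `Line::render_ref` call.
pub fn rasterize(text: &str, width: u16) -> Vec<Cell> {
    let mut cells: Vec<Cell> = text
        .chars()
        .take(width as usize)
        .map(|c| Cell { symbol: c.to_string() })
        .collect();
    cells.resize(width as usize, Cell { symbol: " ".to_string() });
    cells
}
```

In a draw loop built on this sketch, the caller would copy the slice returned by row into the frame buffer and only then apply selection highlight or copy affordances to the frame, so the cached rows keep base content only.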

Roles

  • Human: runs codex-tui2 in an interactive terminal (e.g. Ghostty), triggers “idle” and “streaming” scenarios, and captures profiles.
  • Assistant (or a script): reads profile output and extracts hotspots and deltas.

Baseline setup

Build from a clean checkout:

cd codex-rs
cargo build -p codex-tui2

Run codex-tui2 in a terminal and get a PID (macOS):

pgrep -n codex-tui2

Track CPU quickly while reproducing:

top -pid "$(pgrep -n codex-tui2)"

Capture profiles (macOS)

Capture both an “idle” and a “streaming” profile so hotspots are not conflated:

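# Run while the TUI is idle (the "1" is the sample duration in seconds)
sample "$(pgrep -n codex-tui2)" 1 -file /tmp/tui2.idle.sample.txt
# Run while a response is actively streaming
sample "$(pgrep -n codex-tui2)" 1 -file /tmp/tui2.streaming.sample.txt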

For the streaming sample, trigger a response that emits many deltas (e.g. “Tell me a story”) so the stream runs long enough to sample.

Quick hotspot extraction

These rg patterns keep the investigation grounded in the data:

# Buffer diff hot path (idle)
rg -n "custom_terminal::diff_buffers|diff_buffers" /tmp/tui2.*.sample.txt | head -n 80

# Transcript rendering hot path (streaming)
rg -n "App::render_transcript_cells|Line::render|render_spans|styled_graphemes|GraphemeCursor::next_boundary" /tmp/tui2.*.sample.txt | head -n 120
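
For the scripted side of the workflow, the same extraction can be sketched in Rust. This is an illustrative helper, not part of codex-tui2: count_hotspots is a hypothetical name, it uses plain substring matching rather than regexes, and it counts matching lines rather than weighting by sample counts, so treat the numbers as directional.

```rust
use std::collections::BTreeMap;

/// Count how many lines of a `sample` report mention each pattern.
/// Every pattern gets an entry, so patterns with no hits report 0.
pub fn count_hotspots<'a>(report: &str, patterns: &[&'a str]) -> BTreeMap<&'a str, usize> {
    let mut counts: BTreeMap<&'a str, usize> =
        patterns.iter().map(|&p| (p, 0)).collect();
    for line in report.lines() {
        for &pattern in patterns {
            if line.contains(pattern) {
                *counts.entry(pattern).or_insert(0) += 1;
            }
        }
    }
    counts
}
```

A caller would read /tmp/tui2.idle.sample.txt and /tmp/tui2.streaming.sample.txt with std::fs::read_to_string and pass each report in with the pattern list from the rg commands above, then compare the two maps.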

Rasterization-cache validation checklist

After implementing a transcript rasterization cache, re-run the same scenarios and confirm:

  • The streaming sample is no longer dominated by unicode_segmentation::grapheme::GraphemeCursor::next_boundary stacks on the main thread.
  • CPU during streaming drops materially vs baseline for the same streaming load.
  • Idle CPU does not regress (redraw gating changes can mask rendering improvements; always measure both idle and streaming).
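
The checklist can be turned into a small comparison between a baseline run and a post-change run. The sketch below is an assumption-laden illustration: RunSummary, percent_change, and passes_checklist are hypothetical names, and the thresholds (how large a drop counts as "material", how much idle drift is tolerated) are parameters the caller must choose, since this doc does not fix them.

```rust
/// Hypothetical per-run summary, filled in by hand or by a script.
pub struct RunSummary {
    /// CPU percentage observed in `top` while streaming.
    pub streaming_cpu: f64,
    /// CPU percentage observed in `top` while idle.
    pub idle_cpu: f64,
    /// Lines in the streaming sample that mention GraphemeCursor::next_boundary.
    pub grapheme_hits: usize,
}

/// Relative change in percent from `baseline` to `after`; `None` if the baseline is zero.
pub fn percent_change(baseline: f64, after: f64) -> Option<f64> {
    if baseline == 0.0 {
        None
    } else {
        Some((after - baseline) / baseline * 100.0)
    }
}

/// True when all three checklist items hold under the caller-chosen thresholds.
pub fn passes_checklist(
    baseline: &RunSummary,
    after: &RunSummary,
    min_streaming_drop_pct: f64, // assumption: what counts as a "material" drop
    idle_tolerance: f64,         // assumption: allowed idle CPU drift, in points
) -> bool {
    let streaming_dropped = match percent_change(baseline.streaming_cpu, after.streaming_cpu) {
        Some(change) => change <= -min_streaming_drop_pct,
        None => false,
    };
    let hotspot_reduced = after.grapheme_hits < baseline.grapheme_hits;
    let idle_ok = after.idle_cpu <= baseline.idle_cpu + idle_tolerance;
    streaming_dropped && hotspot_reduced && idle_ok
}
```

Both runs should use the same terminal size and the same streaming prompt; otherwise the comparison mixes load differences with rendering differences.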

Notes to record per run

  • Terminal size: width × height
  • Scenario: idle vs streaming (prompt + approximate response length)
  • CPU snapshot: reading from top (directional only)
  • Profile excerpt: 20–50 relevant lines for the dominant stacks

Code pointers

  • codex-rs/tui2/src/transcript_view_cache.rs: wrapped transcript memoization + per-line rasterization cache (cached Cell rows).
  • codex-rs/tui2/src/transcript_render.rs: incremental helper used by the wrapped-line cache (append_wrapped_transcript_cell).
  • codex-rs/tui2/src/app.rs: wiring in App::render_transcript_cells (uses cached rows instead of calling Line::render_ref every frame).