mirror of
https://github.com/openai/codex.git
synced 2026-04-24 22:54:54 +00:00
perf(tui2): cache transcript view rendering (#8693)
The transcript viewport draws every frame. Ratatui's `Line::render_ref` does grapheme segmentation and span layout, so repeated redraws can burn CPU during streaming even when the visible transcript hasn't changed. Introduce `TranscriptViewCache` to reduce per-frame work:

- `WrappedTranscriptCache` memoizes flattened+wrapped transcript lines per width, appends incrementally as new cells arrive, and rebuilds on width change, truncation (backtrack), or transcript replacement.
- `TranscriptRasterCache` caches rasterized rows (`Vec<Cell>`) per line index and user-row styling; redraws copy cells instead of rerendering spans.

The caches are width-scoped and store base transcript content only; selection highlighting and copy affordances are applied after drawing. User rows include the row-wide base style in the cached raster.

Refactor `transcript_render` to expose `append_wrapped_transcript_cell` for incremental building and add a test that incremental append matches the full build. Add `docs/tui2/performance-testing.md` as a playbook for macOS sample profiles and hotspot greps. Expand `transcript_view_cache` tests to cover rebuild conditions, raster equivalence vs direct rendering, user-row caching, and eviction.

Test: `cargo test -p codex-tui2`
docs/tui2/performance-testing.md (new file, 97 additions)
@@ -0,0 +1,97 @@
# Performance testing (`codex-tui2`)

This doc captures a repeatable workflow for investigating `codex-tui2` performance issues
(especially high idle CPU and high CPU while streaming) and validating optimizations to the
draw hot path.

## Scope (this round)

The current focus is the transcript draw hot path, specifically the cost of repeatedly rendering
the same visible transcript lines via Ratatui’s `Line::render_ref` (notably grapheme segmentation
and span layout).

The intended mitigation is a **rasterization cache**: render a wrapped transcript `Line` into a
row of `Cell`s once, cache it, and on subsequent redraws copy cached cells into the frame buffer.

Key invariants:

- The cache is width-scoped (invalidate on terminal width changes).
- The cache stores **base content** only; selection highlight and copy affordances are applied
  after rendering, so they don’t pollute cached rows.

## Roles

- Human: runs `codex-tui2` in an interactive terminal (e.g. Ghostty), triggers “idle” and
  “streaming” scenarios, and captures profiles.
- Assistant (or a script): reads profile output and extracts hotspots and deltas.

## Baseline setup

Build from a clean checkout:

```sh
cd codex-rs
cargo build -p codex-tui2
```

Run `codex-tui2` in a terminal and get a PID (macOS):

```sh
pgrep -n codex-tui2
```

Track CPU quickly while reproducing:

```sh
top -pid "$(pgrep -n codex-tui2)"
```

## Capture profiles (macOS)

Capture both an “idle” and a “streaming” profile so hotspots are not conflated:

```sh
sample "$(pgrep -n codex-tui2)" 1 -file /tmp/tui2.idle.sample.txt
sample "$(pgrep -n codex-tui2)" 1 -file /tmp/tui2.streaming.sample.txt
```

For the streaming sample, trigger a response that emits many deltas (e.g. “Tell me a story”) so
the stream runs long enough to sample.
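
Fixed output paths overwrite earlier captures, which makes before/after comparison awkward. When keeping several runs around, a timestamped path helps; a minimal sketch (the `profile_path` helper is hypothetical, not part of the repo):

```sh
# Build a timestamped output path so repeated captures never overwrite
# each other (helper name and /tmp location are illustrative).
profile_path() {
  scenario="$1"
  printf '/tmp/tui2.%s.%s.sample.txt' "$scenario" "$(date +%Y%m%d-%H%M%S)"
}

# Example: sample "$(pgrep -n codex-tui2)" 1 -file "$(profile_path streaming)"
```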

## Quick hotspot extraction

These `rg` patterns keep the investigation grounded in the data:

```sh
# Buffer diff hot path (idle)
rg -n "custom_terminal::diff_buffers|diff_buffers" /tmp/tui2.*.sample.txt | head -n 80

# Transcript rendering hot path (streaming)
rg -n "App::render_transcript_cells|Line::render|render_spans|styled_graphemes|GraphemeCursor::next_boundary" /tmp/tui2.*.sample.txt | head -n 120
```

## Rasterization-cache validation checklist

After implementing a transcript rasterization cache, re-run the same scenarios and confirm:

- Streaming sample shifts away from `unicode_segmentation::grapheme::GraphemeCursor::next_boundary`
  stacks dominating the main thread.
- CPU during streaming drops materially vs baseline for the same streaming load.
- Idle CPU does not regress (redraw gating changes can mask rendering improvements; always measure
  both idle and streaming).
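
The first checklist item can be made semi-quantitative by counting how often the grapheme hot path appears in the baseline vs post-change streaming samples. A sketch; the `grapheme_delta` helper and the second file path are hypothetical:

```sh
# Compare grapheme-segmentation hits between two sample files
# (helper name, file paths, and output format are illustrative).
grapheme_delta() {
  before=$(grep -c 'GraphemeCursor::next_boundary' "$1" || true)
  after=$(grep -c 'GraphemeCursor::next_boundary' "$2" || true)
  echo "baseline=$before optimized=$after delta=$((after - before))"
}

# Example:
# grapheme_delta /tmp/tui2.streaming.sample.txt /tmp/tui2.streaming.cached.sample.txt
```

A strongly negative delta is the expected signature of the rasterization cache taking effect.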

## Notes to record per run

- Terminal size: width × height
- Scenario: idle vs streaming (prompt + approximate response length)
- CPU snapshot: `top` (directional)
- Profile excerpt: 20–50 relevant lines for the dominant stacks
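
To keep runs comparable, the fields above can be stamped into a skeleton file before each session; a minimal sketch (the `/tmp/tui2-run-notes.md` path and field labels are illustrative):

```sh
# Write a per-run notes skeleton to fill in while profiling
# (output path and field labels are illustrative).
cat > /tmp/tui2-run-notes.md <<'EOF'
- Terminal size:
- Scenario:
- CPU snapshot (top):
- Profile excerpt:
EOF
```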

## Code pointers

- `codex-rs/tui2/src/transcript_view_cache.rs`: wrapped transcript memoization + per-line
  rasterization cache (cached `Cell` rows).
- `codex-rs/tui2/src/transcript_render.rs`: incremental helper used by the wrapped-line cache
  (`append_wrapped_transcript_cell`).
- `codex-rs/tui2/src/app.rs`: wiring in `App::render_transcript_cells` (uses cached rows instead
  of calling `Line::render_ref` every frame).