mirror of https://github.com/openai/codex.git synced 2026-04-24 22:54:54 +00:00

Josh McKinney 3fbf379e02 docs: refine tui2 viewport roadmap (#8122)

Update the tui2 viewport/history design doc with current status and a
prioritized roadmap (scroll feel, selection/copy correctness, streaming
wrap polish, terminal integration, and longer-term per-cell
interactivity ideas).

2025-12-16 22:16:50 +00:00


TUI2 Viewport, Transcript, and History – Design Notes

This document describes the viewport and history model we are implementing in the new codex-rs/tui2 crate. It builds on lessons from the legacy TUI and explains why we moved away from directly writing history into terminal scrollback.

The target audience is Codex developers and curious contributors who want to understand or critique how TUI2 owns its viewport, scrollback, and suspend behavior.

Unless stated otherwise, references to “the TUI” in this document mean the TUI2 implementation; when we mean the legacy TUI specifically, we call it out explicitly.

1. Problem Overview

Historically, the legacy TUI tried to “cooperate” with the terminal’s own scrollback:

The inline viewport sat somewhere above the bottom of the screen.
When new history arrived, we tried to insert it directly into the terminal scrollback above the viewport.
On certain transitions (e.g. switching sessions, overlays), we cleared and rewrote portions of the screen from scratch.

This had several failure modes:

Terminal‑dependent behavior.
- Different terminals handle scroll regions, clears, and resize semantics differently.
- What looked correct in one terminal could drop or duplicate content in another.
Resizes and layout churn.
- The TUI reacts to resizes, focus changes, and overlay transitions.
- When the viewport moved or its size changed, our attempts to keep scrollback “aligned” with the in‑memory history could go out of sync.
- In practice this meant:
  - Some lines were lost or overwritten.
  - Others were duplicated or appeared in unexpected places.
“Clear and rewrite everything” didn’t save us.
- We briefly tried a strategy of clearing large regions (or the full screen) and re‑rendering history when the layout changed.
- This ran into two issues:
  - Terminals treat full clears differently. For example, Terminal.app often leaves the cleared screen as a “page” at the top of scrollback, some terminals interpret only a subset of the ANSI clear/scrollback codes, and others (like iTerm2) gate “clear full scrollback” behind explicit user consent.
  - Replaying a long session was expensive and still subject to timing/race conditions with user output (e.g. shell prompts) when we weren’t in alt screen.

The net result: the legacy TUI could not reliably guarantee “the history you see on screen is complete, in order, and appears exactly once” across terminals, resizes, suspend/resume, and overlay transitions.

2. Goals

The redesign is guided by a few explicit goals:

Codex, not the terminal, owns the viewport.
- The in‑memory transcript (a list of history entries) is the single source of truth for what’s on screen.
- The TUI decides how to map that transcript into the current viewport; scrollback becomes an output target, not an extra data structure we try to maintain.
History must be correct, ordered, and never silently dropped.
- Every logical history cell should either:
  - Be visible in the TUI, or
  - Have been printed into scrollback as part of a suspend/exit flow.
- We would rather (rarely) duplicate content than risk losing it.
Avoid unnecessary duplication.
- When emitting history to scrollback (on suspend or exit), print each logical cell’s content at most once.
- Streaming cells are allowed to be “re‑seen” as they grow, but finished cells should not keep reappearing.
Behave sensibly under resizes.
- TUI rendering should reflow to the current width on every frame.
- History printed to scrollback may have been wrapped at different widths over time; that is acceptable, but it must not cause missing content or unbounded duplication.
Suspend/alt‑screen interaction is predictable.
- Ctrl+Z should:
  - Cleanly exit alt screen, if active.
  - Print a consistent transcript prefix into normal scrollback.
  - Resume with the TUI fully redrawn, without stale artifacts.

3. New Viewport & Transcript Model

3.1 Transcript as a logical sequence of cells

At a high level, the TUI transcript is a list of “cells”, each representing one logical thing in the conversation:

A user prompt (with padding and a distinct background).
An agent response (which may arrive in multiple streaming chunks).
System or info rows (session headers, migration banners, reasoning summaries, etc.).

Each cell knows how to draw itself for a given width: how many lines it needs, what prefixes to use, how to style its content. The transcript itself is purely logical:

It has no scrollback coordinates or terminal state baked into it.
It can be re‑rendered for any viewport width.

The TUI’s job is to take this logical sequence and decide how much of it fits into the current viewport, and how it should be wrapped and styled on screen.
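As a rough illustration of "each cell knows how to draw itself for a given width", here is a minimal sketch. The trait name `CellSketch`, the `UserPrompt` type, the `"> "` prefix, and the character-based `chunk` helper are all hypothetical stand-ins, not the actual tui2 types; real wrapping would be word-aware and measure display width.

```rust
/// Hypothetical stand-in for a transcript cell; not the actual tui2 trait.
trait CellSketch {
    /// Render this cell into display lines for the given width.
    fn display_lines(&self, width: usize) -> Vec<String>;

    /// Number of rows the cell needs at this width.
    fn desired_height(&self, width: usize) -> usize {
        self.display_lines(width).len()
    }
}

/// A user prompt: prefixed on its first line, padded with a blank line above and below.
struct UserPrompt {
    text: String,
}

/// Naive fixed-width chunking by characters (assumption: no word wrapping, no wide glyphs).
fn chunk(text: &str, width: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(width.max(1))
        .map(|c| c.iter().collect())
        .collect()
}

impl CellSketch for UserPrompt {
    fn display_lines(&self, width: usize) -> Vec<String> {
        let prefix = "> ";
        let body_width = width.saturating_sub(prefix.len()).max(1);
        let mut lines = vec![String::new()]; // top padding
        for (i, part) in chunk(&self.text, body_width).into_iter().enumerate() {
            let lead = if i == 0 { prefix } else { "  " };
            lines.push(format!("{lead}{part}"));
        }
        lines.push(String::new()); // bottom padding
        lines
    }
}
```

Because the cell stores only its logical content, the same value can be rendered at any width without consulting terminal state.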

3.2 Building viewport lines from the transcript

To render the main transcript area above the composer, the TUI:

Defines a “transcript region” as the full frame minus the height of the bottom input area.
Flattens all cells into a list of visual lines, remembering for each visual line which cell it came from and which line within that cell it corresponds to.
Uses this flattened list plus a scroll position to decide which visual line should appear at the top of the region.
Clears the transcript region and draws the visible slice of lines into it.
For user messages, paints the entire row background (including padding lines) so the user block stands out even when it does not fill the whole width.
Applies selection styling and other overlays on top of the rendered lines.

Scrolling (mouse wheel, PgUp/PgDn, Home/End) operates entirely in terms of these flattened lines and the current scroll anchor. The terminal’s own scrollback is not part of this calculation; it only ever sees fully rendered frames.
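The flatten-then-slice steps above can be sketched as follows. `FlatLine`, `flatten`, and `visible_slice` are illustrative names, and the input is assumed to be cells that have already been rendered to lines at the current width; the clamping policy for the scroll position is also an assumption.

```rust
/// One visual line plus the cell and line-within-cell it came from.
#[derive(Debug, Clone, PartialEq)]
struct FlatLine {
    cell_index: usize,
    line_in_cell: usize,
    text: String,
}

/// Flatten already-rendered cells (each a list of lines) into one list of visual lines.
fn flatten(cells: &[Vec<String>]) -> Vec<FlatLine> {
    let mut out = Vec::new();
    for (cell_index, lines) in cells.iter().enumerate() {
        for (line_in_cell, text) in lines.iter().enumerate() {
            out.push(FlatLine {
                cell_index,
                line_in_cell,
                text: text.clone(),
            });
        }
    }
    out
}

/// Return the lines visible in a region of `height` rows whose first row shows line `top`.
/// `top` is clamped so the viewport never scrolls past the last full page.
fn visible_slice(lines: &[FlatLine], top: usize, height: usize) -> &[FlatLine] {
    let max_top = lines.len().saturating_sub(height);
    let top = top.min(max_top);
    let end = (top + height).min(lines.len());
    &lines[top..end]
}
```

Keeping the `(cell_index, line_in_cell)` origin on every visual line is what lets scrolling and selection refer back to content rather than to screen rows.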

3.3 Alternate screen, overlays, and redraw guarantees

The TUI uses the terminal’s alternate screen for:

The main interactive chat session (so the viewport can cover the full terminal).
Full‑screen overlays such as the transcript pager, diff view, model migration screen, and onboarding.

Conceptually:

Entering alt screen:
- Switches the terminal into alt screen and expands the viewport to cover the full terminal.
- Clears that alt‑screen buffer.
Leaving alt screen:
- Disables “alternate scroll” so mouse wheel events behave predictably.
- Returns to the normal screen.
On leaving overlays and on resuming from suspend, the TUI viewport is explicitly cleared and fully redrawn:
- This prevents stale overlay content or shell output from lingering in the TUI area.
- The next frame reconstructs the UI entirely from the in‑memory transcript and other state, not from whatever the terminal happened to remember.

Alt screen is therefore treated as a temporary render target. The only authoritative copy of the UI is the in‑memory state.
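For readers unfamiliar with the underlying terminal protocol, the enter/leave steps correspond roughly to the xterm-style escape sequences below. This is a sketch only: the function names are hypothetical, and we assume the crate goes through its terminal library rather than writing raw escapes like this.

```rust
use std::io::{self, Write};

// xterm private modes: 1049 = alternate screen buffer, 1007 = alternate scroll mode.
const ENTER_ALT_SCREEN: &str = "\x1b[?1049h";
const LEAVE_ALT_SCREEN: &str = "\x1b[?1049l";
const DISABLE_ALT_SCROLL: &str = "\x1b[?1007l";
// Erase the whole display, then move the cursor to the top-left corner.
const CLEAR_AND_HOME: &str = "\x1b[2J\x1b[H";

/// Switch to the alt screen and clear that buffer.
fn enter_alt_screen<W: Write>(out: &mut W) -> io::Result<()> {
    out.write_all(ENTER_ALT_SCREEN.as_bytes())?;
    out.write_all(CLEAR_AND_HOME.as_bytes())?;
    out.flush()
}

/// Disable alternate scroll, then return to the normal screen.
fn leave_alt_screen<W: Write>(out: &mut W) -> io::Result<()> {
    out.write_all(DISABLE_ALT_SCROLL.as_bytes())?;
    out.write_all(LEAVE_ALT_SCREEN.as_bytes())?;
    out.flush()
}
```

Writing to any `Write` (rather than directly to stdout) keeps the sequence order easy to inspect in tests.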

4. Mouse, Selection, and Scrolling

Mouse interaction is a first‑class part of the new design:

Scrolling.
- Mouse wheel scrolls the transcript in fixed line increments.
- Keyboard shortcuts (PgUp/PgDn/Home/End) use the same scroll model, so the footer can show consistent hints regardless of input device.
Selection.
- A click‑and‑drag gesture defines a linear text selection in terms of the flattened transcript lines (not raw buffer coordinates).
- Selection tracks the content rather than a fixed screen row. When the transcript scrolls, the selection moves along with the underlying lines instead of staying glued to a particular Y position.
- The selection only covers the “transcript text” area; it intentionally skips the left gutter that we use for bullets/prefixes.
Copy.
- When the user triggers copy, the TUI re‑renders just the transcript region off‑screen using the same wrapping as the visible view.
- It then walks the selected lines and columns in that off‑screen buffer to reconstruct the exact text region the user highlighted (including internal spaces and empty lines).
- That text is sent to the system clipboard and a status footer indicates success or failure.

Because scrolling, selection, and copy all operate on the same flattened transcript representation, they remain consistent even as the viewport resizes or the chat composer grows/shrinks. Owning our own scrolling also means we must own mouse interactions end‑to‑end: if we left scrolling entirely to the terminal, we could not reliably line up selections with transcript content or avoid accidentally copying gutter/margin characters instead of just the conversation text.
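A minimal sketch of content-relative selection and copy, under simplifying assumptions: `Point`, `selected_text`, and `screen_row` are hypothetical names, the input lines are assumed to be gutter-free transcript text, and columns count characters rather than terminal display width.

```rust
/// A point in flattened-transcript coordinates (line index, column), not screen rows.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Point {
    line: usize,
    col: usize,
}

/// Extract the text between two points (inclusive), given in either order.
/// Empty lines inside the range are preserved as empty lines.
fn selected_text(lines: &[&str], a: Point, b: Point) -> String {
    let (start, end) = if a <= b { (a, b) } else { (b, a) };
    let mut out = Vec::new();
    let count = end.line - start.line + 1;
    for (i, line) in lines.iter().enumerate().skip(start.line).take(count) {
        let chars: Vec<char> = line.chars().collect();
        let from = if i == start.line { start.col.min(chars.len()) } else { 0 };
        let to = if i == end.line {
            (end.col + 1).min(chars.len())
        } else {
            chars.len()
        };
        out.push(chars[from..to.max(from)].iter().collect::<String>());
    }
    out.join("\n")
}

/// Screen row where a content line lands for a given scroll top, if it is visible.
fn screen_row(line: usize, view_top: usize, height: usize) -> Option<usize> {
    (line >= view_top && line < view_top + height).then(|| line - view_top)
}
```

Since the selection endpoints are stored as content positions, scrolling only changes what `screen_row` returns; the selected text itself is unaffected.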

5. Printing History to Scrollback

We still want the final session (and suspend points) to appear in the user’s normal scrollback, but we no longer try to maintain scrollback in lock‑step with the TUI frame. Instead, we treat scrollback as an append‑only log of logical transcript cells.

In practice this means:

The TUI may print history both when you suspend (Ctrl+Z) and when you exit.
Some users may prefer to only print on exit (for example to keep scrollback quieter during long sessions). The current design anticipates gating suspend‑time printing behind a config toggle so that this behavior can be made opt‑in or opt‑out without touching the core viewport logic, but that switch has not been implemented yet.

5.1 Cell‑based high‑water mark

Internally, the TUI keeps a simple “high‑water mark” for history printing:

Think of this as “how many cells at the front of the transcript have already been sent to scrollback.”
It is just a counter over the logical transcript, not over wrapped lines.
It moves forward only when we have actually printed more history.

This means we never try to guess “how many terminal lines have already been printed”; we only remember that “the first N logical entries are done.”

5.2 Rendering new cells for scrollback

When we need to print history (on suspend or exit), we:

Take the suffix of the transcript that lies beyond the high‑water mark.
Render just that suffix into styled lines at the current terminal width.
Write those lines to stdout.
Advance the high‑water mark to include all cells we just printed.

Older cells are never re‑rendered for scrollback; they remain in whatever wrapping they had when they were first printed. This avoids the line‑count–based bugs we had before while still allowing the on‑screen TUI to reflow freely.
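The high-water mark and the four printing steps can be sketched like this. `ScrollbackLog` and `take_unprinted` are hypothetical names, and the caller-supplied `render` closure stands in for whatever turns a cell into styled lines; writing the returned lines to stdout is left to the caller.

```rust
/// Tracks how many cells at the front of the transcript have reached scrollback.
struct ScrollbackLog {
    printed_cells: usize,
}

impl ScrollbackLog {
    fn new() -> Self {
        Self { printed_cells: 0 }
    }

    /// Render only the cells beyond the mark at `width`, then advance the mark.
    /// The mark counts logical cells, never wrapped lines, and never moves backwards.
    fn take_unprinted<C>(
        &mut self,
        transcript: &[C],
        width: usize,
        render: impl Fn(&C, usize) -> Vec<String>,
    ) -> Vec<String> {
        let start = self.printed_cells.min(transcript.len());
        let lines: Vec<String> = transcript[start..]
            .iter()
            .flat_map(|cell| render(cell, width))
            .collect();
        self.printed_cells = self.printed_cells.max(transcript.len());
        lines
    }
}
```

Calling `take_unprinted` twice without appending new cells returns an empty list the second time, which is the "at most once" property the suspend and exit flows rely on.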

5.3 Suspend (Ctrl+Z) flow

On suspend (typically Ctrl+Z on Unix):

Before yielding control back to the shell, the TUI:
- Leaves alt screen if it is active and restores normal terminal modes.
- Determines which transcript cells have not yet been printed and renders them for the current width.
- Prints those new lines once into normal scrollback.
- Marks those cells as printed in the high‑water mark.
- Finally, suspends the process so the shell regains control.

On fg, the process resumes, re‑enters TUI modes, and redraws the viewport from the in‑memory transcript. The history printed during suspend stays in scrollback and is not touched again.

5.4 Exit flow

When the TUI exits, we follow the same principle:

We compute the suffix of the transcript that has not yet been printed (taking into account any prior suspends).
We render just that suffix to styled lines at the current width.
The outer main function leaves alt screen, restores the terminal, and prints those lines, plus a blank line and token usage summary.

If you never suspended, exit prints the entire transcript once. If you did suspend one or more times, exit prints only the cells appended after the last suspend. In both cases, each logical conversation entry reaches scrollback exactly once.

6. Streaming, Width Changes, and Tradeoffs

6.1 Streaming cells

Streaming agent responses are represented as a sequence of history entries:

The first chunk produces a “first line” entry for the message.
Subsequent chunks produce continuation entries that extend that message.

From the history/scrollback perspective:

Each streaming chunk is just another entry in the logical transcript.
The high‑water mark is a simple count of how many entries at the front of the transcript have already been printed.
As new streaming chunks arrive, they are appended as new entries and will be included the next time we print history on suspend or exit.

We do not attempt to reprint or retroactively merge older chunks. In scrollback you will see the streaming response as a series of discrete blocks, matching the internal history structure.

Today, streaming rendering still “bakes in” some width at the time chunks are committed: line breaks for the streaming path are computed using the width that was active at the time, and stored in the intermediate representation. This is a known limitation and is called out in more detail in codex-rs/tui2/docs/streaming_wrapping_design.md; a follow‑up change will make streaming behavior match the rest of the transcript more closely (wrap only at display time, not at commit time).

6.2 Width changes over time

Because we now use a cell‑level high‑water mark instead of a visual line‑count, width changes are handled gracefully:

On every suspend/exit, we render the not‑yet‑printed suffix of the transcript at the current width and append those lines.
Previously printed entries remain in scrollback with whatever wrapping they had at the time they were printed.
We no longer rely on “N lines printed before, therefore skip N lines of the newly wrapped transcript,” which was the source of dropped and duplicated content when widths changed.

This does mean scrollback can contain older cells wrapped for narrower or wider widths than the final terminal size, but:

Each logical cell’s content appears exactly once.
New cells are append‑only and never overwrite or implicitly “shrink” earlier content.
The on‑screen TUI always reflows to the current width independently of scrollback.
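To make the difference concrete, the sketch below contrasts the old "skip N printed lines" bookkeeping with the cell-count bookkeeping when the width changes between prints. All function names are illustrative, and `wrap` is a naive character chunker standing in for real word wrapping.

```rust
/// Naive character-based wrapping of one cell (assumption: stands in for word wrapping).
fn wrap(cell: &str, width: usize) -> Vec<String> {
    let chars: Vec<char> = cell.chars().collect();
    chars
        .chunks(width.max(1))
        .map(|c| c.iter().collect())
        .collect()
}

/// Render every cell at the given width into one list of lines.
fn render_all(cells: &[&str], width: usize) -> Vec<String> {
    cells.iter().flat_map(|c| wrap(c, width)).collect()
}

/// Old approach: remember how many *lines* were printed and skip that many
/// lines of the freshly re-wrapped transcript.
fn new_lines_by_line_count(cells: &[&str], width: usize, printed_lines: usize) -> Vec<String> {
    render_all(cells, width)
        .into_iter()
        .skip(printed_lines)
        .collect()
}

/// New approach: remember how many *cells* were printed and render only the rest.
fn new_lines_by_cell_count(cells: &[&str], width: usize, printed_cells: usize) -> Vec<String> {
    render_all(&cells[printed_cells.min(cells.len())..], width)
}
```

Take cells `["aaaaaaaa", "bbbb"]` where the first cell was printed at width 4 (two lines). If the terminal is then widened to 8, the line-count approach skips two lines of a two-line transcript and silently drops `bbbb`. If instead the first print happened at width 8 (one line) and the terminal is narrowed to 4, it skips only one line and reprints half of the first cell. The cell-count approach yields just `bbbb` in both cases.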

If we later choose to also re‑emit the “currently streaming” cell when printing on suspend (to make sure the latest chunk of a long answer is always visible in scrollback), that would intentionally duplicate a small number of lines at the boundary of that cell. The design assumes any such behavior would be controlled by configuration (for example, by disabling suspend‑time printing entirely for users who prefer only exit‑time output).

6.3 Why not reflow scrollback?

In theory we could try to reflow already‑printed content when widths change by:

Recomputing the entire transcript at the new width, and
Printing diffs that “rewrite” old regions in scrollback.

In practice, this runs into the same issues that motivated the redesign:

Terminals treat full clears and scroll regions differently.
There is no portable way to “rewrite” arbitrary portions of scrollback above the visible buffer.
Interleaving user output (e.g. shell prompts after suspend) makes it impossible to reliably reconstruct the original scrollback structure.

We therefore deliberately accept that scrollback is append‑only and not subject to reflow; correctness is measured in terms of logical transcript content, not pixel‑perfect layout.

7. Backtrack and Overlays (Context)

While this document is focused on viewport and history, it’s worth mentioning a few related behaviors that rely on the same model.

7.1 Transcript overlay and backtrack

The transcript overlay (pager) is a full‑screen view of the same logical transcript:

When opened, it takes a snapshot of the current transcript and renders it in an alt‑screen overlay.
Backtrack mode (Esc sequences) walks backwards through user messages in that snapshot and highlights the candidate “edit from here” point.
Confirming a backtrack request forks the conversation on the server and trims the in‑memory transcript so that only history up to the chosen user message remains, then re‑renders that prefix in the main view.

The overlay is purely a different view of the same transcript; it never infers anything from scrollback.
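A minimal sketch of the transcript-trimming half of backtrack follows. `Entry`, `nth_last_user`, and `trim_before_user` are hypothetical, the server-side fork is out of scope, and we assume here that the chosen user message itself is removed (so it can be edited and resent); the real boundary may differ.

```rust
/// Simplified transcript entry; the real cells carry far more structure.
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    User(String),
    Agent(String),
}

/// Index of the nth most recent user message (n = 0 is the latest), if any.
fn nth_last_user(transcript: &[Entry], n: usize) -> Option<usize> {
    transcript
        .iter()
        .enumerate()
        .rev()
        .filter(|(_, e)| matches!(e, Entry::User(_)))
        .nth(n)
        .map(|(i, _)| i)
}

/// Trim the transcript so only entries before the chosen user message remain.
/// Returns false (and changes nothing) if there is no such message.
fn trim_before_user(transcript: &mut Vec<Entry>, n: usize) -> bool {
    match nth_last_user(transcript.as_slice(), n) {
        Some(idx) => {
            transcript.truncate(idx);
            true
        }
        None => false,
    }
}
```

Each additional step backwards in the overlay would correspond to a larger `n`; nothing here reads from scrollback.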

8. Summary of Tradeoffs

What we gain:

The TUI has a clear, single source of truth for history (the in‑memory transcript).
Viewport rendering is deterministic and independent of scrollback.
Suspend and exit flows:
- Print each logical history cell exactly once.
- Are robust to terminal width changes.
- Interact cleanly with alt screen and raw‑mode toggling.
Streaming, overlays, selection, and backtrack all share the same logical history model.
Because cells are always re‑rendered live from the transcript, per‑cell interactions can become richer over time. Instead of treating the transcript as “dead text”, we can make individual entries interactive after they are rendered: expanding or contracting tool calls, diffs, or reasoning summaries in place, jum…truncated…

9. TUI2 Implementation Notes

This section maps the design above onto the codex-rs/tui2 crate so future viewport work has concrete code pointers.

9.1 Transcript state and layout

The main app struct (codex-rs/tui2/src/app.rs) tracks the transcript and viewport state with:

transcript_cells: Vec<Arc<dyn HistoryCell>> – the logical history.
transcript_scroll: TranscriptScroll – whether the viewport is pinned to the bottom or anchored at a specific cell/line pair.
transcript_selection: TranscriptSelection – a selection expressed in screen coordinates over the flattened transcript region.
transcript_view_top / transcript_total_lines – the current viewport’s top line index and total number of wrapped lines for the inline transcript area.
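To illustrate why anchoring at a cell/line pair is useful, here is a sketch of resolving such a scroll state to a top line index. `ScrollSketch` and `resolve_top` are hypothetical and only mirror the description above; the actual `TranscriptScroll` definition and its fallback behavior may differ.

```rust
/// Hypothetical scroll state; the real `TranscriptScroll` may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ScrollSketch {
    /// Follow the latest output.
    ToBottom,
    /// Keep a specific (cell, line-within-cell) at the top of the viewport.
    Anchored { cell_index: usize, line_in_cell: usize },
}

/// Resolve the scroll state to a top line index, given the (cell_index, line_in_cell)
/// origin of every flattened line and the viewport height.
fn resolve_top(scroll: ScrollSketch, origins: &[(usize, usize)], height: usize) -> usize {
    let max_top = origins.len().saturating_sub(height);
    match scroll {
        ScrollSketch::ToBottom => max_top,
        ScrollSketch::Anchored { cell_index, line_in_cell } => origins
            .iter()
            .position(|&o| o == (cell_index, line_in_cell))
            // Assumption: if the anchor vanished (e.g. after re-wrapping), fall back to the bottom.
            .unwrap_or(max_top)
            .min(max_top),
    }
}
```

After a resize the flattened line list changes, but the anchor still names the same content, so the resolved top index follows the cell rather than a stale row number.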

9.2 Rendering, wrapping, and selection

App::render_transcript_cells defines the transcript region, builds flattened lines via App::build_transcript_lines, wraps them with word_wrap_lines_borrowed from codex-rs/tui2/src/wrapping.rs, and applies selection via apply_transcript_selection before writing to the frame buffer.

Streaming wrapping details live in codex-rs/tui2/docs/streaming_wrapping_design.md.

9.3 Input, selection, and footer state

Mouse handling lives in App::handle_mouse_event, keyboard scrolling in App::handle_key_event, selection rendering in App::apply_transcript_selection, and copy in App::copy_transcript_selection plus codex-rs/tui2/src/clipboard_copy.rs. Scroll/selection UI state is forwarded through ChatWidget::set_transcript_ui_state, BottomPane::set_transcript_ui_state, and ChatComposer::footer_props, with footer text assembled in codex-rs/tui2/src/bottom_pane/footer.rs.

9.4 Exit transcript output

App::run returns session_lines on AppExitInfo after flattening with App::build_transcript_lines and converting to ANSI via App::render_lines_to_ansi. The CLI prints those lines before the token usage and resume hints.

10. Future Work and Open Questions

10.1 Current status

This design shipped behind the tui2 feature flag (as a separate crate, duplicating the legacy tui crate to enable rollout without breaking existing behavior). The following items from early feedback are already implemented:

Bottom pane positioning is pegged high with an empty transcript and moves down as the transcript fills (including on resume).
Wheel-based transcript scrolling is enabled on top of the new scroll model.
While a selection is active, streaming stops “follow latest output” so the selection remains stable, and follow mode resumes after the selection is cleared.

10.2 Roadmap (prioritized)

This section captures a prioritized list of improvements we want to add to TUI2 based on early feedback, with the goal of making scrolling/selection/copy feel as close to “native terminal” (and Vim) behavior as we can while still owning the viewport.

P0 — must-have (usability/correctness):

Scrolling behavior. Default to small scroll increments (ideally 1 line per wheel tick) with acceleration/velocity for faster navigation, and ensure we stop scrolling when the user stops input (avoid redraw/event-loop backlog that makes scrolling feel “janky”).
Mouse event bounds. Ignore mouse events outside the transcript region so clicks in the composer/footer don’t start or mutate transcript selection state.
Copy includes offscreen lines. Make copy operate on the full selection range even when part (or all) of the selection is outside the current viewport.
Copy fidelity. Preserve meaningful indentation (especially code blocks), treat soft-wrapped prose as a single logical line when copying, and copy markdown source (including backticks and heading markers) even if we render it differently.
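For the mouse event bounds item, one possible shape is a hit test that rejects anything outside the transcript region before selection state is touched. `Region`, `transcript_hit`, and the fixed-width `gutter` parameter are assumptions for illustration; a real implementation might clamp drags into the text area instead of ignoring them.

```rust
/// A rectangle in terminal cell coordinates (hypothetical; mirrors a typical TUI rect type).
#[derive(Debug, Clone, Copy)]
struct Region {
    x: u16,
    y: u16,
    width: u16,
    height: u16,
}

impl Region {
    /// True if the terminal cell at (col, row) lies inside this rectangle.
    fn contains(&self, col: u16, row: u16) -> bool {
        col >= self.x
            && col < self.x.saturating_add(self.width)
            && row >= self.y
            && row < self.y.saturating_add(self.height)
    }
}

/// Map a mouse position to (row, col) relative to the transcript text area.
/// Returns None when the event is outside the region or inside the left gutter,
/// so clicks in the composer or footer never start or mutate a selection.
fn transcript_hit(region: Region, gutter: u16, col: u16, row: u16) -> Option<(u16, u16)> {
    if !region.contains(col, row) {
        return None;
    }
    let text_x = region.x.saturating_add(gutter);
    if col < text_x {
        return None;
    }
    Some((row - region.y, col - text_x))
}
```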

P1 — should-have (UX polish and power user workflows):

Streaming wrapping polish. Ensure all streaming paths use display-time wrapping only, and add tests that cover resizing after streaming has started.
Copy shortcut and discoverability. Switch copy from Ctrl+Y to Ctrl+Shift+C, and add an on-screen copy affordance (e.g. a small button near the selection) that also displays the shortcut.
Selection semantics. Define and implement selection behavior across multi-step output (and whether step boundaries should be copy boundaries), while continuing to exclude the left gutter from copied text.
Auto-scroll during drag. While dragging a selection, auto-scroll when the cursor is at/near the top or bottom of the transcript viewport to allow selecting beyond the visible window.
Width-aware selection. Ensure selection highlighting and copy reconstruction handle wide glyphs correctly (emoji, CJK), matching terminal display width rather than raw character count.
Multi-click selection. Support double/triple/quad click selection (word/line/paragraph), implemented on top of the transcript/viewport model rather than terminal buffer coordinates.
Find in transcript. Add text search over the transcript (and consider integrating match markers with any future scroll indicator work).
Cross-terminal behavior checks. Validate copy/selection behavior across common terminals (incl. terminal-provided “override selection” modes like holding Shift) and document the tradeoffs.

P2 — nice-to-have (polish, configuration, and interactivity):

Suspend printing. Decide whether printing history on suspend is desirable at all (it is not implemented yet). If we keep it, finalize the config shape/defaults, wire it through TUI startup, and document it in the appropriate config docs.
Terminal integration. Consider guiding (or optionally managing) terminal-emulator-specific settings that affect TUI behavior (for example iTerm’s clipboard opt-in prompts or Ghostty keybinding quirks), so the “works well out of the box” path is consistent across terminals.
Interactive cells (unlocked by transcript ownership). Because transcript entries are structured objects (not dead text in terminal scrollback), we can attach metadata to rendered regions and map mouse/keys back to the underlying cell reliably across resizes and reflow. Examples:
- Drill into a specific tool/command output. Click (or press Enter) on a tool call / command cell to open a focused overlay that shows the command, exit status, timing, and stdout/stderr as separate sections, with dedicated “copy output” actions. This enables copying just one command’s output even when multiple commands are interleaved in a turn.
- Copy an entire cell or entire turn. Provide an action to copy a whole logical unit (one cell, or “user prompt + assistant response”), without gutters and with well-defined boundaries. This is hard to do with raw selection because step boundaries and padding aren’t reliably expressible in terminal coordinates once the viewport moves or reflows.
- Expand/collapse structured subregions with source-aware copy. Tool calls, diffs, and markdown can render in a compact form by default and expand in place. Copy actions can choose between “copy rendered view” and “copy source” (e.g. raw markdown, raw JSON arguments, raw diff), since we retain the original source alongside the rendered lines.
- Cell-scoped actions. Actions like “copy command”, “yank into composer”, “retry tool call”, or “open related view” (diff/pager) can be offered per cell and behave deterministically, because the UI can address cells by stable IDs rather than by fragile screen coordinates.
Additional affordances. Consider an ephemeral scrollbar and/or a more explicit “selecting…” status if footer hints aren’t sufficient.
UX capture. Maintain short “golden path” clips showing scrolling (mouse + keys), selection and copy, streaming under resize, and suspend/resume + exit printing.

10.3 Open questions

This section collects design questions that follow naturally from the current model and are worth explicit discussion before we commit to further UI changes.

“Scroll mode” vs “live follow” UI.
- We already distinguish “scrolled away from bottom” vs “following the latest output” in the footer and scroll state. Do we need a more explicit “scroll mode vs live mode” affordance (e.g., a dedicated indicator or toggle), or is the current behavior sufficient and adding more chrome would be noise?
Ephemeral scroll indicator.
- For long sessions, a more visible sense of “where am I?” could help. One option is a minimalist scrollbar that appears while the user is actively scrolling and fades out when idle. A full “mini‑map” is probably too heavy for a TUI given the limited vertical space, but we could imagine adding simple markers along the scrollbar to show where prior prompts occurred, or where text search matches are, without trying to render a full preview of the buffer.
Selection affordances.
- Today, the primary hint that selection is active is the reversed text and the “Ctrl+Y copy selection” footer text. Do we want an explicit “Selecting… (Esc to cancel)” status while a drag is in progress, or would that be redundant/clutter for most users?
Suspend banners in scrollback.
- When printing history on suspend, should we also emit a small banner such as --- codex suspended; history up to here --- to make those boundaries obvious in scrollback? This would slightly increase noise but could make multi‑suspend sessions easier to read.
Configuring suspend printing behavior.
- The design already assumes that suspend‑time printing can be gated by config. Questions to resolve:
  - Should printing on suspend be on or off by default?
  - Should we support multiple modes (e.g., “off”, “print all new cells”, “print streaming cell tail only”) or keep it binary?
Streaming duplication at the edges.
- If we later choose to always re‑emit the “currently streaming” message when printing on suspend, we would intentionally allow a small amount of duplication at the boundary of that message (for example, its last line appearing twice across suspends). Is that acceptable if it improves the readability of long streaming answers in scrollback, and should the ability to disable suspend‑time printing be our escape hatch for users who care about exact de‑duplication?
