What Is Streaming UI in AI Applications? A Complete Guide
What Is Streaming UI?
Streaming UI is a frontend pattern where AI-generated content appears on screen incrementally, token by token, as the language model generates it. Instead of showing a loading spinner for 5-15 seconds and then displaying the complete response, the user sees text appear in real time, word by word.
If you've used ChatGPT, Claude, or Gemini, you've seen streaming UI. Words flowing onto the screen in real time are not just a visual effect. Streaming is a fundamental UX pattern that changes how users perceive speed, builds trust in the output, and gives users the ability to interrupt bad responses early.
For any AI product that uses large language models, streaming UI is the expected baseline. A non-streaming AI interface, one that makes the user wait for the full response before showing anything, feels broken by comparison.
How Streaming Works Under the Hood
The API Layer
Most LLM providers offer streaming API endpoints that send responses as a sequence of small chunks rather than a single complete response.
Server-Sent Events (SSE): The most common transport for LLM streaming. The server sends a stream of text events over a single HTTP connection. Each event contains a small chunk of the response (usually a few tokens).
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" there"}}]}
data: {"choices":[{"delta":{"content":"! How"}}]}
data: {"choices":[{"delta":{"content":" can"}}]}
data: {"choices":[{"delta":{"content":" I"}}]}
data: {"choices":[{"delta":{"content":" help"}}]}
data: [DONE]
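On the client, each event arrives as a line beginning with `data: `. A minimal parser for the OpenAI-style delta shape shown above might look like this (a sketch; a production parser must also handle events split across network chunks):

```javascript
// Extracts the text delta from one SSE "data:" line in the
// OpenAI-style format shown above. Returns '' for the [DONE]
// sentinel and for malformed or non-data lines.
function extractDelta(line) {
  if (!line.startsWith('data: ')) return '';
  const payload = line.slice('data: '.length).trim();
  if (payload === '[DONE]') return ''; // end-of-stream sentinel
  try {
    const event = JSON.parse(payload);
    return event.choices?.[0]?.delta?.content ?? '';
  } catch {
    return ''; // ignore malformed events rather than crash the stream
  }
}
```

For example, `extractDelta('data: {"choices":[{"delta":{"content":"Hello"}}]}')` yields `"Hello"`.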
WebSockets: Some providers and custom implementations use WebSockets for bidirectional streaming. This is more complex but allows the client to send new messages while the server is still streaming a response.
The Frontend Layer
The frontend receives these chunks and renders them progressively:
- Receive chunk from the stream
- Append the new text to the accumulated response
- Parse the accumulated text (handle partial Markdown, incomplete code blocks)
- Render the updated content
- Repeat until the stream signals completion
// Simplified streaming implementation
async function streamResponse(prompt) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    body: JSON.stringify({ prompt }),
  });
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let accumulated = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const chunk = decoder.decode(value, { stream: true });
    // parseChunk extracts text from the provider's event format (e.g. SSE)
    accumulated += parseChunk(chunk);
    setContent(accumulated); // Trigger re-render
  }
}
The Rendering Challenge
The hard part of streaming UI isn't receiving the data. It's rendering it correctly as it arrives. LLM responses often contain:
- Markdown that's only valid when complete (a **bold tag halfway through rendering)
- Code blocks with opening fences but no closing fence yet
- Tables that are structurally invalid until the last row arrives
- Lists where indentation determines nesting level
A naive renderer that parses Markdown on every chunk will produce flickering, broken formatting, and layout shifts. Production streaming UIs buffer partial content, defer rendering of incomplete structures, and use techniques to prevent layout thrash.
Why Streaming UI Matters
Perceived Performance
Streaming makes AI responses feel faster even when the total generation time is identical. This is backed by research on perceived latency:
- Time to first token (TTFT): The time from submitting a prompt to seeing the first word. With streaming, TTFT is typically 200-500ms. Without streaming, users wait for the entire response (5-30 seconds depending on length).
- Progressive disclosure: Users start processing the response immediately. By the time the last token arrives, they've already read and understood most of the answer.
A non-streaming interface with a 10-second wait feels slow. A streaming interface with the same total generation time feels responsive because the user is engaged the entire time.
Early Evaluation
Streaming lets users evaluate the response as it forms. If the model is heading in the wrong direction, the user can:
- Stop generation to save time and API cost
- Refine the prompt based on what they're seeing
- Start working with the partial response if it already contains what they need
Without streaming, users have to wait for a complete response before they can decide if it's useful. That's a worse user experience and a more expensive one (the model generates tokens the user doesn't need).
Trust and Transparency
Watching text appear token by token creates a sense of transparency. Users feel like they're watching the model "think." A response that appears all at once feels more opaque and machine-generated, even if the content is identical.
For AI products where trust matters (enterprise, legal, medical, financial), the perception of transparency is valuable.
Streaming UI Patterns
Basic Text Streaming
The simplest pattern: append text to a container as chunks arrive.
function StreamingMessage({ content, isStreaming }) {
  return (
    <div
      className="prose"
      aria-live="polite"
      aria-busy={isStreaming}
    >
      {content}
      {isStreaming && <span className="animate-pulse">▊</span>}
    </div>
  );
}
The blinking cursor at the end signals that generation is in progress. Remove it when streaming completes.
Markdown Streaming
Most LLM responses contain Markdown. Rendering Markdown that's still being generated requires buffering:
import { useEffect, useRef, useState } from 'react';

function StreamingMarkdown({ chunks, isStreaming }) {
  const [rendered, setRendered] = useState('');
  const bufferRef = useRef('');

  useEffect(() => {
    bufferRef.current += chunks;
    // Only render complete Markdown structures
    const safeContent = sanitizePartialMarkdown(bufferRef.current);
    setRendered(markdownToHtml(safeContent));
  }, [chunks]);

  return (
    <div
      className="prose"
      dangerouslySetInnerHTML={{ __html: rendered }}
      aria-live="polite"
      aria-busy={isStreaming}
    />
  );
}

function sanitizePartialMarkdown(text) {
  // Close any open bold/italic tags
  // Close any open code fences
  // Remove incomplete table rows
  // Buffer incomplete list items
  return cleanedText;
}
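A minimal version of that sanitizer might cover just the two most visible cases, unclosed code fences and unbalanced bold markers. This is a sketch under simplified assumptions; a production sanitizer also handles italics, inline code, tables, and lists:

```javascript
// Minimal partial-Markdown sanitizer. An odd number of ``` fences
// means the last code block is still open; an odd number of **
// markers means a bold span is still open. Close both so the
// Markdown parser never sees a dangling structure.
function sanitizePartialMarkdown(text) {
  let cleaned = text;
  const fenceCount = (cleaned.match(/```/g) || []).length;
  if (fenceCount % 2 === 1) cleaned += '\n```';
  const boldCount = (cleaned.match(/\*\*/g) || []).length;
  if (boldCount % 2 === 1) cleaned += '**';
  return cleaned;
}
```

The idea is to always hand the renderer a syntactically closed document, even if the closing markers are synthetic and will be replaced by the model's own on the next chunk.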
Code Block Streaming
Code blocks need special handling because syntax highlighting requires the complete block:
Option 1: Defer highlighting. Render code as plain monospace text during streaming. Apply syntax highlighting after the code block is complete (closing fence received).
Option 2: Progressive highlighting. Re-run the highlighter on every chunk. This works but can be expensive for long code blocks. Debounce the highlighting to every 200-300ms.
Option 3: Language detection. Wait for the opening fence with language tag (```python), then apply language-specific highlighting progressively.
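The debounce in Option 2 can be a plain trailing-edge timer wrapper, not tied to any particular highlighter:

```javascript
// Trailing-edge debounce: the wrapped function runs only after
// `delay` ms of silence, so the highlighter is not re-run on
// every streamed token.
function debounce(fn, delay) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delay);
  };
}

// Usage sketch: `highlightCodeBlock` is a hypothetical function
// from whatever highlighter you use.
// const rehighlight = debounce(highlightCodeBlock, 250);
```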
Streaming with Citations
AI products that include source citations face a display challenge: citations often reference content that appears later in the response, or the citation format is only complete when the full reference is generated.
// Citation appears inline as [1]
// Full reference appears at the bottom of the response
// During streaming, [1] might not have a matching reference yet
function StreamingWithCitations({ content, isStreaming }) {
  const { text, citations } = parseCitations(content);
  return (
    <div>
      <div className="prose" aria-live="polite" aria-busy={isStreaming}>
        {text}
      </div>
      {citations.length > 0 && (
        <div className="citations-panel" aria-label="Sources">
          {citations.map((citation, i) => (
            <a key={i} href={citation.url} className="citation-card">
              <span className="citation-number">[{i + 1}]</span>
              <span className="citation-title">{citation.title}</span>
              <span className="citation-domain">{citation.domain}</span>
            </a>
          ))}
        </div>
      )}
    </div>
  );
}
The AI UX Kit from thefrontkit includes citation components that handle inline references, expandable source cards, and keyboard navigation across citation blocks.
Stream Controls
Every streaming UI needs controls that let users manage the generation process.
Stop Button
Visible during streaming. Aborts the generation and keeps whatever has been generated so far.
function StreamControls({ isStreaming, onStop, onRetry }) {
  return (
    <div className="flex gap-2">
      {isStreaming ? (
        <button onClick={onStop} aria-label="Stop generating">
          <StopIcon aria-hidden="true" />
          Stop
        </button>
      ) : (
        <button onClick={onRetry} aria-label="Regenerate response">
          <RefreshIcon aria-hidden="true" />
          Retry
        </button>
      )}
    </div>
  );
}
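Under the hood, the stop action is typically wired to an AbortController whose signal is passed to the streaming fetch. A sketch, reusing the `/api/chat` endpoint from earlier (the `createChatStream` name is illustrative):

```javascript
// Wiring a stop button to an AbortController. Calling stop() aborts
// the in-flight fetch; the catch treats a user stop as a normal exit,
// keeping whatever text streamed before the abort.
function createChatStream() {
  const controller = new AbortController();

  async function start(prompt, onChunk) {
    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        body: JSON.stringify({ prompt }),
        signal: controller.signal,
      });
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        onChunk(decoder.decode(value, { stream: true }));
      }
    } catch (err) {
      if (err.name !== 'AbortError') throw err; // a user stop is not an error
    }
  }

  return { start, stop: () => controller.abort(), signal: controller.signal };
}
```

The stop button's onClick handler simply calls `stream.stop()`; the read loop exits and the accumulated text stays on screen.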
Retry / Regenerate
Available after streaming completes. Sends the same prompt again for a new response. Some products keep the previous response visible and show the new one alongside it.
Copy Button
Appears after streaming completes (not during, since the content is still changing). Copies the full response text to the clipboard.
Format Toggle
Some responses benefit from viewing in different formats: rendered Markdown, raw Markdown, JSON view, or plain text. Let users switch between formats without regenerating.
For a complete guide to stream controls, see AI Chat UI Best Practices.
Streaming and Accessibility
Streaming content creates specific accessibility challenges. If your app needs to meet WCAG AA (and it should), these patterns are required. See What Is WCAG 2.1 AA? for background.
Screen Reader Announcements
Use aria-live="polite" on the streaming container. This tells screen readers to announce new content when the user is idle, without interrupting what they're currently hearing.
<div aria-live="polite" aria-atomic="false" aria-busy={isStreaming}>
  {streamedContent}
</div>
- aria-live="polite": Announces changes without interrupting what the screen reader is currently saying
- aria-atomic="false": Only announces the new content, not the entire region (critical for streaming, otherwise the screen reader re-reads the entire response on every chunk)
- aria-busy="true" during streaming: Tells screen readers to hold announcements until streaming completes, preventing a flood of partial announcements
Don't Steal Focus
When a response starts streaming, do not move focus from the prompt input. Users often want to type a follow-up immediately. Let the response stream in the background while focus stays on the input.
Debounce Announcements
Screen readers announcing every token in real time creates an overwhelming, unintelligible experience. Batch announcements in 2-3 second intervals, or wait until streaming completes to announce the full response.
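That batching can be a small timer-based buffer that forwards collected text to the live region at most once per interval (a sketch; the `announce` callback would typically write into a visually hidden aria-live element, and the interval is a tunable assumption):

```javascript
// Collects streamed text and announces it in batches instead of
// per token. push() buffers text; a single pending timer flushes
// the buffer to `announce` after `intervalMs`.
function createAnnouncementBatcher(announce, intervalMs = 2500) {
  let buffer = '';
  let timer = null;
  return {
    push(text) {
      buffer += text;
      if (timer) return; // a flush is already scheduled
      timer = setTimeout(() => {
        timer = null;
        const batch = buffer;
        buffer = '';
        if (batch) announce(batch);
      }, intervalMs);
    },
  };
}
```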
Keyboard Controls
Stop, retry, and copy buttons must be keyboard accessible. Tab should move from the prompt input to the response area to the action buttons in a logical order. See Keyboard Navigation Patterns for Web Apps and ARIA Attributes Cheat Sheet for React.
Streaming with the Vercel AI SDK
The Vercel AI SDK is the most popular library for building streaming AI interfaces in Next.js. It abstracts the streaming plumbing and provides React hooks.
'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading, stop } = useChat();

  return (
    <div>
      {messages.map((message) => (
        <div key={message.id} role={message.role === 'assistant' ? 'status' : undefined}>
          <strong>{message.role === 'user' ? 'You' : 'Assistant'}</strong>
          <p>{message.content}</p>
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask something..."
          aria-label="Chat prompt"
        />
        {isLoading ? (
          <button type="button" onClick={stop}>Stop</button>
        ) : (
          <button type="submit">Send</button>
        )}
      </form>
    </div>
  );
}
The Vercel AI SDK handles the streaming connection, chunk parsing, and state management. You provide the UI layer. The AI UX Kit provides production-ready UI components (prompt input, response viewer, citations, feedback) that work with the Vercel AI SDK or any other streaming source.
Streaming UI Anti-Patterns
1. Layout Thrash
Each new token causes the response container to resize, pushing content below it down the page. Users can't read content that keeps jumping.
Fix: Set a minimum height on the response container. Use CSS that allows vertical growth without reflowing sibling elements. Scroll the response container rather than the page.
2. No Loading State Before First Token
The gap between submitting a prompt and receiving the first token (TTFT) can be 200ms to several seconds. If nothing visible happens during this time, users think the submit failed.
Fix: Show an immediate loading indicator the moment the user submits: a typing indicator, a skeleton, or a "thinking..." message.
3. Flickering Markdown
Re-parsing and re-rendering Markdown on every chunk creates visible flicker, especially when formatting toggles appear and disappear (a * that could be an italic opener or just an asterisk).
Fix: Buffer Markdown rendering. Only re-render when you receive a whitespace boundary or sentence break, not on every token.
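One way to implement that buffering is to only emit text up to the last whitespace boundary and hold the remainder until the next chunk extends it (a sketch):

```javascript
// Splits accumulated text at the last whitespace boundary.
// `stable` is safe to parse and render; `pending` is held back
// because it may be the start of a formatting marker or word
// that the next chunk will complete.
function splitAtBoundary(accumulated) {
  const lastBreak = Math.max(
    accumulated.lastIndexOf(' '),
    accumulated.lastIndexOf('\n'),
  );
  if (lastBreak === -1) return { stable: '', pending: accumulated };
  return {
    stable: accumulated.slice(0, lastBreak + 1),
    pending: accumulated.slice(lastBreak + 1),
  };
}
```

The renderer re-parses only `stable`, so a lone `*` at the end of the buffer never flickers between italic and literal asterisk.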
4. No Stop Button
If the model generates a long, irrelevant response, the user has no way to stop it. They wait for it to finish, costing time and money.
Fix: Always show a prominent stop button during streaming. It should be one of the most visible elements on screen.
5. Losing Partial Responses on Error
If the stream errors mid-response, some implementations discard everything. The user loses content they already read.
Fix: Keep the partial response visible. Show an error message with a retry option that continues from where it left off, or regenerates completely. Let the user choose.
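In the read loop, that means catching stream errors without discarding the accumulated text. A sketch, where `setContent` and `setError` are illustrative state setters:

```javascript
// On a mid-stream error, keep the accumulated partial response and
// surface an error state instead of throwing the text away.
async function readStream(reader, decoder, setContent, setError) {
  let accumulated = '';
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      accumulated += decoder.decode(value, { stream: true });
      setContent(accumulated);
    }
  } catch (err) {
    setError(err); // show a retry option next to the partial text
  }
  return accumulated; // partial content survives either way
}
```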
Building Streaming UI with thefrontkit
The AI UX Kit provides production-ready streaming UI components for React and Next.js:
- ResponseViewer: Handles buffered Markdown rendering, code syntax highlighting, streaming cursor, and aria-live regions. Supports Markdown, JSON, and plain text formats.
- PromptInput: Auto-resizing textarea with keyboard shortcuts (Enter to submit, Shift+Enter for newline), file attachments, and accessible labels.
- StreamControls: Stop, retry, and copy buttons with keyboard navigation and proper ARIA attributes.
- CitationPanel: Inline citation rendering with expandable source cards and keyboard-navigable references.
- FeedbackControls: Thumbs up/down, star ratings, and detailed feedback capture attached to responses.
Every component is WCAG AA accessible, themed with design tokens that sync between Figma and Tailwind CSS, and compatible with the Vercel AI SDK or any custom streaming source.
Combined with the SaaS Starter Kit for auth, dashboard, and settings, you get a complete AI product foundation. See Building AI Products by Combining SaaS and AI UX Kits for a walkthrough.
Related reading:
- AI Chat UI Best Practices
- Session Management in AI Chat Applications
- Production-Ready AI Interfaces with Next.js
- Why Your AI Product's UI Is Losing Users
Start building AI interfaces: View AI UX Kit | View SaaS Starter Kit | Try the live demo