LLM Streaming Response Handler

Safety Notice

This listing is imported from the skills.sh public index metadata. Review the upstream SKILL.md and repository scripts before running anything.

To install, copy the command below and run it (or send it to your AI assistant):

npx skills add curiositech/some_claude_skills/curiositech-some-claude-skills-llm-streaming-response-handler

Expert in building production-grade streaming interfaces for LLM responses that feel instant and responsive.

When to Use

✅ Use for:

  • Chat interfaces with typing animation

  • Real-time AI assistants

  • Code generation with live preview

  • Document summarization with progressive display

  • Any UI where users expect immediate feedback from LLMs

❌ NOT for:

  • Batch document processing (no user watching)

  • APIs that don't support streaming

  • WebSocket-based bidirectional chat (use Socket.IO)

  • Simple request/response (fetch is fine)

Quick Decision Tree

Does your LLM interaction:
├── Need immediate visual feedback? → Streaming
├── Display long-form content (>100 words)? → Streaming
├── User expects typewriter effect? → Streaming
├── Short response (<50 words)? → Regular fetch
└── Background processing? → Regular fetch

Technology Selection

Server-Sent Events (SSE) - Recommended

Why SSE over WebSockets for LLM streaming:

  • Simplicity: HTTP-based, works with existing infrastructure

  • Auto-reconnect: Built-in reconnection logic

  • Firewall-friendly: Easier than WebSockets through proxies

  • One-way is all you need: LLMs stream server → client only (see the sketch after this list)
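
As a minimal sketch of that simplicity, here is a browser-side consumer using the native EventSource API. Note that EventSource only supports GET requests, which is why the fetch-based patterns later in this skill are used for POST chat bodies; the /api/stream endpoint and the {content}/{done} payload shape are assumptions carried over from those patterns.

// Minimal sketch, assuming a hypothetical GET-based SSE endpoint.
const source = new EventSource('/api/stream?prompt=hello');

let output = '';

// Fires once per `data:` line the server sends
source.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.done) {
    source.close(); // Stop auto-reconnect once the response is complete
  } else if (data.content) {
    output += data.content;
    console.log(output);
  }
};

// Built-in reconnection: the browser retries automatically on network
// errors unless the stream was closed explicitly.
source.onerror = () => console.warn('Connection lost; browser will retry');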

Timeline:

  • 2015-2020: WebSockets for everything

  • 2020: SSE adoption for streaming APIs

  • 2023+: SSE standard for LLM streaming (OpenAI, Anthropic)

  • 2024: Vercel AI SDK popularizes SSE patterns

Streaming APIs

Provider | Streaming Method | Response Format

OpenAI | SSE | data: {"choices":[{"delta":{"content":"token"}}]}

Anthropic (Claude) | SSE | data: {"type":"content_block_delta","delta":{"text":"token"}}

Vercel AI SDK | SSE | Normalized across providers
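
Because each provider wraps its tokens differently, a small normalizer can hide the differences. This is a sketch based only on the formats in the table above; extractToken is a hypothetical helper, not part of any SDK.

// Sketch: extract the text delta from a parsed SSE payload,
// based on the provider formats listed above.
type Provider = 'openai' | 'anthropic';

function extractToken(provider: Provider, payload: any): string | null {
  switch (provider) {
    case 'openai':
      // data: {"choices":[{"delta":{"content":"token"}}]}
      return payload.choices?.[0]?.delta?.content ?? null;
    case 'anthropic':
      // data: {"type":"content_block_delta","delta":{"text":"token"}}
      return payload.type === 'content_block_delta'
        ? payload.delta?.text ?? null
        : null;
  }
}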

Common Anti-Patterns

Anti-Pattern 1: Buffering Before Display

Novice thinking: "Collect all tokens, then show complete response"

Problem: Defeats the entire purpose of streaming.

Wrong approach:

// ❌ Waits for the entire response before showing anything
const response = await fetch('/api/chat', { method: 'POST', body: prompt });
const fullText = await response.text();
setMessage(fullText); // User sees nothing until done

Correct approach:

// ✅ Display tokens as they arrive
const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ prompt })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const chunk = decoder.decode(value);
  const lines = chunk.split('\n').filter(line => line.trim());

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      setMessage(prev => prev + data.content); // Update immediately
    }
  }
}

Timeline:

  • Pre-2023: Many apps buffered entire response

  • 2023+: Token-by-token display expected

Anti-Pattern 2: No Stream Cancellation

Problem: User can't stop generation, wasting tokens and money.

Symptom: "Stop" button doesn't work or doesn't exist.

Correct approach:

// ✅ AbortController for cancellation
const [abortController, setAbortController] = useState<AbortController | null>(null);

const streamResponse = async () => {
  const controller = new AbortController();
  setAbortController(controller);

  try {
    const response = await fetch('/api/chat', {
      signal: controller.signal,
      method: 'POST',
      body: JSON.stringify({ prompt })
    });

    // Stream handling...
  } catch (error) {
    if ((error as Error).name === 'AbortError') {
      console.log('Stream cancelled by user');
    }
  } finally {
    setAbortController(null);
  }
};

const cancelStream = () => {
  abortController?.abort();
};

return (
  <button onClick={cancelStream} disabled={!abortController}>
    Stop Generating
  </button>
);

Anti-Pattern 3: No Error Recovery

Problem: Stream fails mid-response, user sees partial text with no indication of failure.

Correct approach:

// ✅ Error states and recovery
const [streamState, setStreamState] = useState<'idle' | 'streaming' | 'error' | 'complete'>('idle');
const [errorMessage, setErrorMessage] = useState<string | null>(null);

try {
  setStreamState('streaming');

  // Streaming logic...

  setStreamState('complete');
} catch (error) {
  setStreamState('error');

  const err = error as Error;
  if (err.name === 'AbortError') {
    setErrorMessage('Generation stopped');
  } else if (err.message.includes('429')) {
    setErrorMessage('Rate limit exceeded. Try again in a moment.');
  } else {
    setErrorMessage('Something went wrong. Please retry.');
  }
}

// UI feedback
{streamState === 'error' && (
  <div className="error-banner">
    {errorMessage}
    <button onClick={retryStream}>Retry</button>
  </div>
)}
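The retryStream handler referenced above is left undefined; one possible shape, given the state shown, is below. lastPrompt and streamResponse are assumed to exist in the surrounding component; the names are illustrative.

// Sketch: re-run the last prompt after a failure.
const retryStream = () => {
  setErrorMessage(null);
  setStreamState('idle');
  streamResponse(lastPrompt); // Re-issue the failed request
};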

Anti-Pattern 4: Memory Leaks from Unclosed Streams

Problem: Streams not cleaned up, causing memory leaks.

Symptom: Browser slows down after multiple requests.

Correct approach:

// ✅ Cleanup with useEffect
useEffect(() => {
  let reader: ReadableStreamDefaultReader | null = null;

  const streamResponse = async () => {
    const response = await fetch('/api/chat', { ... });
    reader = response.body!.getReader();

    // Streaming...
  };

  streamResponse();

  // Cleanup on unmount
  return () => {
    reader?.cancel();
  };
}, [prompt]);

Anti-Pattern 5: No Typing Indicator Between Tokens

Problem: UI feels frozen between slow tokens.

Correct approach:

// ✅ Animated cursor during generation
<div className="message">
  {content}
  {isStreaming && <span className="typing-cursor">▊</span>}
</div>

/* styles */
.typing-cursor {
  animation: blink 1s step-end infinite;
}

@keyframes blink {
  50% { opacity: 0; }
}

Implementation Patterns

Pattern 1: Basic SSE Stream Handler

async function* streamCompletion(prompt: string) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value);
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));

        if (data.content) {
          yield data.content;
        }

        if (data.done) {
          return;
        }
      }
    }
  }
}

// Usage
for await (const token of streamCompletion('Hello')) {
  console.log(token);
}
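One caveat worth noting: the split('\n') above assumes each read() returns whole lines, but a network chunk can end mid-event. A hedged variant that buffers the trailing partial line looks like this (same assumed /api/chat payload shape):

async function* streamCompletionBuffered(prompt: string) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });

    // Keep the last (possibly incomplete) line in the buffer
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
        if (data.content) yield data.content;
        if (data.done) return;
      }
    }
  }
}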

Pattern 2: React Hook for Streaming

import { useState, useCallback } from 'react';

interface UseStreamingOptions {
  onToken?: (token: string) => void;
  onComplete?: (fullText: string) => void;
  onError?: (error: Error) => void;
}

export function useStreaming(options: UseStreamingOptions = {}) {
  const [content, setContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [error, setError] = useState<Error | null>(null);
  const [abortController, setAbortController] = useState<AbortController | null>(null);

  const stream = useCallback(async (prompt: string) => {
    const controller = new AbortController();
    setAbortController(controller);
    setIsStreaming(true);
    setError(null);
    setContent('');

    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        signal: controller.signal,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt })
      });

      const reader = response.body!.getReader();
      const decoder = new TextDecoder();

      let accumulated = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        const chunk = decoder.decode(value);
        const lines = chunk.split('\n').filter(line => line.trim());

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = JSON.parse(line.slice(6));

            if (data.content) {
              accumulated += data.content;
              setContent(accumulated);
              options.onToken?.(data.content);
            }
          }
        }
      }

      options.onComplete?.(accumulated);
    } catch (err) {
      if ((err as Error).name !== 'AbortError') {
        setError(err as Error);
        options.onError?.(err as Error);
      }
    } finally {
      setIsStreaming(false);
      setAbortController(null);
    }
  }, [options]);

  const cancel = useCallback(() => {
    abortController?.abort();
  }, [abortController]);

  return { content, isStreaming, error, stream, cancel };
}

// Usage in component
function ChatInterface() {
  const { content, isStreaming, stream, cancel } = useStreaming({
    onToken: (token) => console.log('New token:', token),
    onComplete: (text) => console.log('Done:', text)
  });

  return (
    <div>
      <div className="message">
        {content}
        {isStreaming && <span className="cursor">▊</span>}
      </div>

      <button onClick={() => stream('Tell me a story')} disabled={isStreaming}>
        Generate
      </button>

      {isStreaming && <button onClick={cancel}>Stop</button>}
    </div>
  );
}

Pattern 3: Server-Side Streaming (Next.js)

// app/api/chat/route.ts
import { OpenAI } from 'openai';

export const runtime = 'edge'; // Edge runtime keeps latency low (Node runtime also supports streaming)

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    stream: true
  });

  // Convert OpenAI stream to SSE format
  const encoder = new TextEncoder();

  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const content = chunk.choices[0]?.delta?.content;

          if (content) {
            const sseMessage = `data: ${JSON.stringify({ content })}\n\n`;
            controller.enqueue(encoder.encode(sseMessage));
          }
        }

        // Send completion signal
        controller.enqueue(encoder.encode('data: {"done":true}\n\n'));
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    }
  });

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive'
    }
  });
}
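A quick way to exercise this route from a script, along the lines of the scripts/stream_tester.ts mentioned under Scripts below (the localhost port is an assumption):

// Sketch: consume the SSE route from Node 18+ (global fetch).
async function testStream() {
  const response = await fetch('http://localhost:3000/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: 'Say hi' })
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    process.stdout.write(decoder.decode(value)); // Raw SSE frames
  }
}

testStream();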

Production Checklist

□ AbortController for cancellation
□ Error states with retry capability
□ Typing indicator during generation
□ Cleanup on component unmount
□ Rate limiting on API route
□ Token usage tracking
□ Streaming fallback (if API fails)
□ Accessibility (screen reader announces updates)
□ Mobile-friendly (touch targets for stop button)
□ Network error recovery (auto-retry on disconnect)
□ Max response length enforcement
□ Cost estimation before generation
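For the network error recovery item above, one possible shape is a bounded retry wrapper around the stream call. This is a sketch, not prescribed by the skill; maxRetries and the backoff delays are illustrative choices, and startStream stands for whatever function kicks off the fetch-based stream.

// Sketch: retry a streaming call a few times with exponential backoff.
async function streamWithRetry(
  startStream: () => Promise<void>,
  maxRetries = 3
): Promise<void> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      await startStream();
      return; // Completed without a network error
    } catch (error) {
      const err = error as Error;
      if (err.name === 'AbortError' || attempt === maxRetries) {
        throw err; // User cancellation or retries exhausted
      }
      // Backoff: 1s, 2s, 4s...
      await new Promise(r => setTimeout(r, 1000 * 2 ** attempt));
    }
  }
}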

When to Use vs Avoid

Scenario | Use Streaming?

Chat interface | ✅ Yes

Long-form content generation | ✅ Yes

Code generation with preview | ✅ Yes

Short completions (<50 words) | ❌ No - regular fetch

Background jobs | ❌ No - use job queue

Bidirectional chat | ⚠️ Use WebSockets instead

Technology Comparison

Feature | SSE | WebSockets | Long Polling

Complexity | Low | Medium | High

Auto-reconnect | ✅ | ❌ | ❌

Bidirectional | ❌ | ✅ | ❌

Firewall-friendly | ✅ | ⚠️ | ✅

Browser support | ✅ All modern | ✅ All modern | ✅ Universal

LLM API support | ✅ Standard | ❌ Rare | ❌ Not used

References

  • /references/sse-protocol.md - Server-Sent Events specification details

  • /references/vercel-ai-sdk.md - Vercel AI SDK integration patterns

  • /references/error-recovery.md - Stream error handling strategies

Scripts

  • scripts/stream_tester.ts - Test SSE endpoints locally

  • scripts/token_counter.ts - Estimate costs before generation

This skill guides: LLM streaming implementation | SSE protocol | Real-time UI updates | Cancellation | Error recovery | Token-by-token display

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
