⚡ Real-Time Streaming Responses
Right now, your AI feels slow. You send a message, wait a few seconds, then the entire response appears at once.
Streaming changes this. Instead of waiting for the complete response, you get the AI’s words as it generates them - just like ChatGPT. The words appear as the AI “thinks,” making it feel more natural and responsive.
🔄 How Streaming Works
Without Streaming (what you have now):
User: "Write a story about a robot"[Wait 5 seconds...]AI: "Once upon a time, there was a robot named Charlie..."
With Streaming (what you’re building):
User: "Write a story about a robot"AI: "Once"AI: "Once upon"AI: "Once upon a time,"AI: "Once upon a time, there was"[...continues word by word]
Each piece arrives in real time, making the experience feel faster and more interactive.
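To get a feel for the difference before touching the backend, here's a purely illustrative Node sketch (not part of the app) that fakes word-by-word output in a terminal. Run it as an ES module so top-level `await` works:

```js
// Simulate the streamed feel: words trickle out instead of landing all at once
const story = "Once upon a time, there was a robot named Charlie.".split(" ");

for (const word of story) {
  process.stdout.write(word + " ");
  // A short pause between words, like chunks arriving from the model
  await new Promise((resolve) => setTimeout(resolve, 150));
}
console.log();
```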
🛠️ Backend: Adding Stream Support
Let's upgrade your backend to support streaming responses. You'll add a new endpoint specifically for streaming.
Why two routes?
Splitting the logic into two clean routes keeps your code clear and modular: you keep your original /api/chat endpoint for simple requests and add /api/chat/stream for real-time streaming. This is good for learning and gives you flexibility.
Step 1: Create the Streaming Route
Add this new route to your index.js file, right after your existing /api/chat endpoint:
```js
// Streaming AI Chat endpoint
app.post("/api/chat/stream", async (req, res) => {
  try {
    const { message } = req.body;

    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    // Set headers for streaming
    res.writeHead(200, {
      "Content-Type": "text/plain",
      "Cache-Control": "no-cache",
      "Connection": "keep-alive",
    });

    // Create a streaming response using the Responses API
    const stream = await openai.responses.create({
      model: "gpt-4o-mini",
      input: message,
      stream: true,
    });

    // Stream each chunk to the frontend - handle Responses API events
    for await (const event of stream) {
      switch (event.type) {
        case "response.output_text.delta":
          if (event.delta) {
            const textChunk =
              typeof event.delta === "string"
                ? event.delta
                : event.delta.text || "";

            if (textChunk) {
              res.write(textChunk);
              res.flush?.();
            }
          }
          break;

        case "text_delta":
          if (event.text) {
            res.write(event.text);
            res.flush?.();
          }
          break;

        case "response.created":
        case "response.completed":
        case "response.output_item.added":
        case "response.content_part.added":
        case "response.content_part.done":
        case "response.output_item.done":
        case "response.output_text.done":
          // Keep the connection alive; no content to write
          break;

        case "error":
          console.error("Stream error:", event.error);
          res.write("\n[Error during generation]");
          break;
      }
    }

    // Close the stream
    res.end();
  } catch (error) {
    console.error("OpenAI Streaming Error:", error);

    // Handle errors differently once streaming has started
    if (res.headersSent) {
      res.write("\n[Error occurred]");
      res.end();
    } else {
      res.status(500).json({
        error: "Failed to stream AI response",
        success: false,
      });
    }
  }
});
```
Step 2: Understanding the Streaming Headers
```js
res.writeHead(200, {
  "Content-Type": "text/plain",
  "Cache-Control": "no-cache",
  "Connection": "keep-alive",
});
```
What each header does:
- `Content-Type: text/plain` - tells the browser you're sending plain text, not JSON
- `Cache-Control: no-cache` - prevents browsers from caching the stream
- `Connection: keep-alive` - keeps the HTTP connection open for streaming
Why you need these: without them, the browser or an intermediate proxy may buffer or cache the response instead of handing each chunk to your code as it arrives.
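For context, plain-text chunks are only one approach. A common alternative you may see elsewhere (not used in this tutorial) is Server-Sent Events, which swaps the content type and adds a simple message framing. A hypothetical variant of the same headers would look like:

```js
// Hypothetical SSE variant - not what this tutorial uses
res.writeHead(200, {
  "Content-Type": "text/event-stream", // browsers treat this as an event stream
  "Cache-Control": "no-cache",
  "Connection": "keep-alive",
});

// SSE frames each message as a "data:" line followed by a blank line
res.write(`data: ${JSON.stringify({ text: "Hello" })}\n\n`);
```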
Step 3: Setting Up the OpenAI Responses API Stream
```js
const stream = await openai.responses.create({
  model: "gpt-4o-mini",
  input: message,
  stream: true, // This is the magic flag
});
```
Breaking it down:
- `stream: true` - tells the Responses API to send the response in chunks instead of all at once
- `model: "gpt-4o-mini"` - a fast, efficient model that works well for streaming
- `input: message` - the user's message to process

With `stream: true` set, the Responses API sends you a sequence of events as it generates the response, as the sketch below shows.
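To make the difference concrete, here's a minimal sketch contrasting the two call shapes. It assumes the same `openai` client used throughout this tutorial:

```js
// Without stream: true - one awaited object containing the full text
const full = await openai.responses.create({
  model: "gpt-4o-mini",
  input: "Say hello",
});
console.log(full.output_text); // the complete response, all at once

// With stream: true - an async iterable that yields events
const stream = await openai.responses.create({
  model: "gpt-4o-mini",
  input: "Say hello",
  stream: true,
});
for await (const event of stream) {
  console.log(event.type); // events arrive while the model is still generating
}
```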
Step 4: Processing the Responses API Stream Events
```js
for await (const event of stream) {
  switch (event.type) {
    case "response.output_text.delta":
      if (event.delta) {
        const textChunk =
          typeof event.delta === "string"
            ? event.delta
            : event.delta.text || "";

        if (textChunk) {
          res.write(textChunk);
          res.flush?.();
        }
      }
      break;

    case "text_delta":
      if (event.text) {
        res.write(event.text);
        res.flush?.();
      }
      break;
  }
}
```
What’s happening here:
- `for await` - an async loop that waits for each event from the Responses API
- `event.type` - every event carries a type that tells you what kind of data it contains
- `response.output_text.delta` - contains incremental text chunks
- `text_delta` - an alternative event type for text chunks
- `res.write()` - immediately sends that piece to your frontend
- `res.flush?.()` - forces the data out right away, if the method is available
Why this works: instead of waiting for the Responses API to finish the entire response, you forward each text chunk to the user the moment its event arrives.
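If `for await` is new to you, here's a tiny standalone sketch with a toy async generator standing in for the OpenAI stream. Run it as an ES module so top-level `for await` works:

```js
// A toy async generator that yields words with a short delay,
// mimicking how stream events trickle in over time
async function* words() {
  for (const word of ["Once", "upon", "a", "time"]) {
    await new Promise((resolve) => setTimeout(resolve, 200));
    yield word;
  }
}

// for await pauses at each step until the next value is ready
for await (const word of words()) {
  process.stdout.write(word + " ");
}
```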
📝 Complete Updated Backend
Here's your complete index.js with streaming support using the Responses API:
```js
import express from "express";
import { config } from "dotenv";
import cors from "cors";
import OpenAI from "openai";

config();

const app = express();
const port = process.env.PORT || 8000;

// Create the OpenAI client once
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

app.use(cors());
app.use(express.json());

// Test route
app.get("/", (req, res) => {
  res.send("Backend is running successfully.");
});

// Original chat endpoint (keep for compatibility)
app.post("/api/chat", async (req, res) => {
  try {
    const { message } = req.body;

    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    const response = await openai.responses.create({
      model: "gpt-4o-mini",
      input: message,
    });

    res.json({
      response: response.output_text,
      success: true,
    });
  } catch (error) {
    console.error("OpenAI API Error:", error);
    res.status(500).json({
      error: "Failed to get AI response",
      success: false,
    });
  }
});

// 🆕 NEW ADDITION: Streaming chat endpoint using the Responses API
app.post("/api/chat/stream", async (req, res) => {
  try {
    const { message } = req.body;

    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    // Set streaming headers
    res.writeHead(200, {
      "Content-Type": "text/plain",
      "Cache-Control": "no-cache",
      "Connection": "keep-alive",
    });

    // Create a streaming response using the Responses API
    const stream = await openai.responses.create({
      model: "gpt-4o-mini",
      input: message,
      stream: true,
    });

    // Stream each event from the Responses API
    for await (const event of stream) {
      switch (event.type) {
        case "response.output_text.delta":
          if (event.delta) {
            const textChunk =
              typeof event.delta === "string"
                ? event.delta
                : event.delta.text || "";

            if (textChunk) {
              res.write(textChunk);
              res.flush?.();
            }
          }
          break;

        case "text_delta":
          if (event.text) {
            res.write(event.text);
            res.flush?.();
          }
          break;

        case "response.created":
        case "response.completed":
        case "response.output_item.added":
        case "response.content_part.added":
        case "response.content_part.done":
        case "response.output_item.done":
        case "response.output_text.done":
          // Keep the connection alive; no content to write
          break;

        case "error":
          console.error("Stream error:", event.error);
          res.write("\n[Error during generation]");
          break;
      }
    }

    res.end();
  } catch (error) {
    console.error("OpenAI Streaming Error:", error);

    // Handle errors differently once streaming has started
    if (res.headersSent) {
      res.write("\n[Error occurred]");
      res.end();
    } else {
      res.status(500).json({
        error: "Failed to stream AI response",
        success: false,
      });
    }
  }
});

app.listen(port, () => {
  console.log(`🚀 Server running on http://localhost:${port}`);
});
```
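If you're setting the project up from scratch, the only dependencies are the four imports at the top. A typical setup, assuming npm and Node 18+, looks like this (note that the `import` syntax requires `"type": "module"` in your package.json, or an `.mjs` extension):

```bash
npm install express cors dotenv openai
node index.js
```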
🧪 Test Your Streaming Endpoint
Test with curl to see streaming in action:
```bash
curl -X POST http://localhost:8000/api/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"message": "Count from 1 to 10 slowly"}' \
  --no-buffer
```
You should see the response appear word by word instead of all at once!
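You can also consume the stream programmatically. Here's a minimal sketch, assuming Node 18+ (which ships a global `fetch`) and an ES module so top-level `await` works. It reads the body chunk by chunk, which is essentially the same reading pattern a browser frontend can use:

```js
// stream-test.js - assumes the server from this tutorial is running locally
const res = await fetch("http://localhost:8000/api/chat/stream", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ message: "Count from 1 to 10 slowly" }),
});

const reader = res.body.getReader();
const decoder = new TextDecoder();

// Read chunks until the server calls res.end()
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value, { stream: true }));
}
```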
🔧 Key Differences from Standard OpenAI API
The Responses API works differently from the standard Chat Completions API:
| Standard Chat API | Responses API |
| --- | --- |
| `openai.chat.completions.create()` | `openai.responses.create()` |
| `messages: [...]` | `input: message` |
| `chunk.choices[0]?.delta?.content` | `event.delta` or `event.text` |
| Single event type | Multiple event types |
Responses API Events:
- `response.output_text.delta` - contains text chunks
- `text_delta` - alternative text chunk format
- `response.created` - stream started
- `response.completed` - stream finished
- `error` - something went wrong
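If you're unsure which events your SDK version emits, a quick way to find out is to log every event type before handling it. A minimal sketch, assuming the same client setup as above:

```js
const stream = await openai.responses.create({
  model: "gpt-4o-mini",
  input: "Hello",
  stream: true,
});

// Print each event type to discover the real sequence your SDK emits
for await (const event of stream) {
  console.log(event.type);
}
```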
✅ What’s Next
Your backend now supports streaming with the Responses API! Next, you'll update your React frontend to:
- Handle streaming responses
- Display text as it arrives
- Show a typing indicator
- Create that smooth ChatGPT-like experience
The backend is ready - now let’s make the frontend feel truly real-time! 🚀