
⚡ Real-Time Streaming Responses

Right now your AI feels slow. You send a message, wait several seconds, then the entire response appears at once. That’s not how modern AI chat feels.

Streaming changes everything. Instead of waiting for the complete response, you see the AI’s words as it generates them - just like ChatGPT. The experience becomes fluid and natural.


What you have now:

User: "Write a story about a robot"
[Wait 5 seconds... nothing happens...]
AI: "Once upon a time, there was a robot named Charlie who lived in a small village..."

What streaming gives you:

User: "Write a story about a robot"
AI: "Once"
AI: "Once upon"
AI: "Once upon a"
AI: "Once upon a time,"
AI: "Once upon a time, there"
[...continues in real-time]

Each word appears instantly as the AI generates it.


🛠️ Step 1: Add the Basic Streaming Endpoint


Let’s start simple. Add this new endpoint to your index.js file, right after your existing /api/chat endpoint:

// 🚀 NEW: Streaming chat endpoint
app.post("/api/chat/stream", async (req, res) => {
  try {
    const { message } = req.body;
    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    // Create streaming response from OpenAI
    const stream = await openai.responses.create({
      model: "gpt-4o-mini",
      input: message,
      stream: true, // 👈 This enables streaming!
    });

    // TODO: We'll handle the stream in the next step
  } catch (error) {
    console.error("Streaming Error:", error);
    res.status(500).json({ error: "Failed to stream response" });
  }
});

What’s different? Just one line: stream: true. This tells OpenAI to send chunks instead of waiting for the complete response.

Think of it like this: Instead of OpenAI writing an entire letter and mailing it to you, it’s now writing each word and immediately passing it over the fence to you.


📡 Step 2: Set Up the Streaming Headers

Before we can send streaming data, we need to tell the browser this isn’t a regular JSON response. Think of HTTP headers like putting the right label on a package - they tell the recipient what to expect inside.

Add these headers to prepare for streaming:

app.post("/api/chat/stream", async (req, res) => {
try {
const { message } = req.body;
if (!message) {
return res.status(400).json({ error: "Message is required" });
}
// Set up streaming headers - like addressing an envelope
res.writeHead(200, {
'Content-Type': 'text/plain', // We're sending text, not JSON
'Cache-Control': 'no-cache', // Don't save this anywhere
'Connection': 'keep-alive', // Keep the line open
});
const stream = await openai.responses.create({
model: "gpt-4o-mini",
input: message,
stream: true,
});
// TODO: Process the stream in the next step
} catch (error) {
console.error("Streaming Error:", error);
res.status(500).json({ error: "Failed to stream response" });
}
});

Let me explain each header like I’m talking to a friend:

'Content-Type': 'text/plain'
This is like telling your browser: “Hey, I’m about to send you plain text, not a JSON object. Don’t try to parse it as data - just display it as words.”

Why not JSON? With streaming, we’re sending pieces of text one by one. JSON isn’t usable until it’s complete and valid. But with plain text, we can send “Hello”, then “ world”, then “!”, and the partial result makes sense at every step.
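To make that concrete, here’s a tiny standalone sketch (plain Node.js, nothing from this project) showing why partial text is always usable while partial JSON isn’t:

// Plain-text chunks can be appended and shown as soon as they arrive:
let fullText = "";
for (const chunk of ["Hello", " world", "!"]) {
  fullText += chunk;
  console.log(fullText); // "Hello" -> "Hello world" -> "Hello world!"
}

// A partial JSON payload is useless until the very last character arrives:
try {
  JSON.parse('{"response": "Hel'); // incomplete JSON
} catch (err) {
  console.log("Can't parse a half-finished JSON response:", err.message);
}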

'Cache-Control': 'no-cache'
This tells the browser: “Don’t save this response anywhere. Each piece of text I send is fresh and should go straight to the user.”

Why no caching? Imagine if your browser saved “Hello” and kept showing that even when the AI was trying to stream “Hello world!” You’d miss the rest of the message.

'Connection': 'keep-alive'
This is like saying: “Don’t hang up the phone! I’m going to send you multiple messages in a row.”

Normal HTTP: Request → Response → Hang up
Streaming: Request → Response part 1 → Response part 2 → Response part 3 → … → Hang up

Why keep alive? Usually, HTTP connections close after each response. But with streaming, we need to keep the connection open so we can send many chunks through the same connection.
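Side note, in case you’ve seen other streaming tutorials use a text/event-stream content type: that’s Server-Sent Events (SSE), a common alternative where each chunk is wrapped in a data: line and browsers can read it with the built-in EventSource API. This tutorial sticks with plain text; the sketch below (with a purely hypothetical /api/sse-example route) is only for comparison:

// Hypothetical route, only to illustrate the SSE format - not used in this tutorial
app.get("/api/sse-example", (req, res) => {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream', // SSE instead of plain text
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
  });

  // Each SSE message is a "data:" line followed by a blank line
  res.write(`data: ${JSON.stringify({ text: "Hello" })}\n\n`);
  res.write(`data: ${JSON.stringify({ text: " world!" })}\n\n`);
  res.end();
});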


🚀 Step 3: Process and Forward the Stream


Now for the magic part - processing each chunk as it arrives and immediately sending it to the frontend. This is where the real-time experience happens!

app.post("/api/chat/stream", async (req, res) => {
try {
const { message } = req.body;
if (!message) {
return res.status(400).json({ error: "Message is required" });
}
// Set headers to prepare for streaming
res.writeHead(200, {
'Content-Type': 'text/plain',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
});
// Get the stream from OpenAI
const stream = await openai.responses.create({
model: "gpt-4o-mini",
input: message,
stream: true,
});
// Process each chunk as it arrives
for await (const event of stream) {
if (event.type === "response.output_text.delta" && event.delta) {
const textChunk = event.delta.text || event.delta;
res.write(textChunk); // Send this chunk immediately
res.flush?.(); // Force send (don't buffer)
}
}
res.end(); // Close the stream when done
} catch (error) {
console.error("Streaming Error:", error);
res.status(500).json({ error: "Failed to stream response" });
}
});

Let me break down the streaming loop step by step:

for await (const event of stream)
This is like saying: “For each new piece that OpenAI sends me, do something with it.” The await means “wait for the next piece, then process it immediately.”

Think of it like this: OpenAI is reading you a story over the phone. Instead of waiting for them to finish the entire story, you repeat each sentence to your friend as soon as you hear it.
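If for await is new to you, here’s a tiny self-contained example (no OpenAI involved - the fakeStream generator is made up for illustration). Save it as an ES module, e.g. demo.mjs, and run it with node demo.mjs:

// An async generator that yields one word at a time, with a small delay
async function* fakeStream() {
  for (const word of ["Once", " upon", " a", " time"]) {
    await new Promise((resolve) => setTimeout(resolve, 200)); // pretend we're waiting on the network
    yield word;
  }
}

// for await...of pauses until the next value is ready, then runs the loop body
for await (const chunk of fakeStream()) {
  process.stdout.write(chunk); // prints "Once upon a time" one word at a time
}
process.stdout.write("\n");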

if (event.type === "response.output_text.delta" && event.delta)
OpenAI sends different types of events. We only care about text chunks. This line says: “Only process this if it’s actual text content.”

Why filter events? OpenAI might send metadata, status updates, or other info. We only want the actual words to display to the user.

const textChunk = event.delta.text || event.delta
Extract the actual text from the event. Sometimes it’s nested, sometimes it’s direct.

res.write(textChunk)
This sends the text chunk to your frontend immediately. It’s like whispering each word to your friend as you hear it.

res.flush?.()
This forces the data to send right now instead of later. The ?. (optional chaining) is just a safety check - some environments don’t provide a flush method, and in that case the call is simply skipped while streaming still works.

Why flush? Sometimes servers collect data before sending it (buffering). Flush says “send everything you have right now!” so users see words instantly.
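Where does flush come from? Plain Express doesn’t add a flush method to responses, but the popular compression middleware does (so it can push out partially compressed data immediately). If you happen to use it, the sketch below shows the idea - the /drip route and port are purely illustrative. If you don’t use compression, the optional chaining simply skips the call and streaming still works:

// Illustrative sketch only: the compression middleware is what provides a real res.flush()
// (npm install compression)
import express from "express";
import compression from "compression";

const app = express();
app.use(compression());

app.get("/drip", (req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.write("first chunk\n");
  res.flush(); // provided by compression: push buffered bytes out right now
  setTimeout(() => {
    res.write("second chunk\n");
    res.end();
  }, 1000);
});

app.listen(8001);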

res.end()
When OpenAI is done sending chunks, we close the connection. Like hanging up the phone when the story is finished.


🛡️ Step 4: Handle Streaming Errors Gracefully

Streaming is trickier than regular responses because you might be halfway through sending data when an error occurs. It’s like being on a phone call that drops - you need to handle it gracefully.

app.post("/api/chat/stream", async (req, res) => {
try {
const { message } = req.body;
if (!message) {
return res.status(400).json({ error: "Message is required" });
}
res.writeHead(200, {
'Content-Type': 'text/plain',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
});
const stream = await openai.responses.create({
model: "gpt-4o-mini",
input: message,
stream: true,
});
for await (const event of stream) {
if (event.type === "response.output_text.delta" && event.delta) {
const textChunk = event.delta.text || event.delta;
res.write(textChunk);
res.flush?.();
}
}
res.end();
} catch (error) {
console.error("Streaming Error:", error);
// Smart error handling for streaming
if (res.headersSent) {
// We're already streaming, send a text error
res.write("\n[Error occurred]");
res.end();
} else {
// Headers not sent yet, send JSON error
res.status(500).json({ error: "Failed to stream response" });
}
}
});

Smart error handling explained:

if (res.headersSent)
This checks: “Have I already started sending data to the frontend?” It’s like asking “Have I already started talking on the phone?”

Why does this matter? Once you start streaming, you can’t go back and send a JSON error response. You’re already in “text mode.”

Mid-stream error (headers already sent):

res.write("\n[Error occurred]");
res.end();

If we’re already streaming text, we send a simple text error message. The frontend can detect this and show a user-friendly error.

Pre-stream error (headers not sent yet):

res.status(500).json({ error: "Failed to stream response" });

If we haven’t started streaming yet, we can still send a proper JSON error response.

Always graceful: Never crash the server or leave connections hanging. Always close the connection properly.
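One more defensive pattern worth knowing (a sketch, not something this tutorial requires): if the user closes the tab mid-stream, the response emits a close event before it finishes, and you can stop forwarding chunks so you aren’t generating text nobody will see. Whether the SDK accepts an abort signal as a second argument may depend on your openai package version, so treat this as an illustration:

// Variation of the streaming handler that stops when the client disconnects
app.post("/api/chat/stream", async (req, res) => {
  const abortController = new AbortController();

  // 'close' fires on completion OR premature disconnect; writableEnded tells them apart
  res.on("close", () => {
    if (!res.writableEnded) abortController.abort(); // client left before we finished
  });

  try {
    const stream = await openai.responses.create(
      { model: "gpt-4o-mini", input: req.body.message, stream: true },
      { signal: abortController.signal } // assumption: the SDK cancels the request when aborted
    );

    if (abortController.signal.aborted) return; // client already gone

    res.writeHead(200, {
      'Content-Type': 'text/plain',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    });

    for await (const event of stream) {
      if (abortController.signal.aborted) break; // nobody is listening anymore
      if (event.type === "response.output_text.delta" && event.delta) {
        res.write(event.delta.text || event.delta);
      }
    }
    res.end();
  } catch (error) {
    if (abortController.signal.aborted) return; // expected when the client bailed out
    console.error("Streaming Error:", error);
    if (res.headersSent) {
      res.write("\n[Error occurred]");
      res.end();
    } else {
      res.status(500).json({ error: "Failed to stream response" });
    }
  }
});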


📋 Step 5: Your Complete Updated Backend


Here’s your complete index.js with both regular and streaming endpoints:

import express from "express";
import { config } from "dotenv";
import cors from "cors";
import OpenAI from "openai";

config();

const app = express();
const PORT = process.env.PORT || 8000;

// Create OpenAI client
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Middleware
app.use(cors());
app.use(express.json());

// Test route
app.get("/", (req, res) => {
  res.json({
    message: "🤖 OpenAI Backend is running!",
    endpoints: {
      chat: "/api/chat",
      streaming: "/api/chat/stream"
    }
  });
});

// Original chat endpoint (keep this!)
app.post("/api/chat", async (req, res) => {
  try {
    const { message } = req.body;
    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    const response = await openai.responses.create({
      model: "gpt-4o-mini",
      input: message,
    });

    res.json({
      response: response.output_text,
      model: "gpt-4o-mini",
      success: true
    });
  } catch (error) {
    console.error("OpenAI Error:", error);
    res.status(500).json({
      error: "Failed to get AI response",
      success: false
    });
  }
});

// 🚀 NEW: Streaming chat endpoint
app.post("/api/chat/stream", async (req, res) => {
  try {
    const { message } = req.body;
    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    // Set streaming headers
    res.writeHead(200, {
      'Content-Type': 'text/plain',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    });

    // Create streaming response
    const stream = await openai.responses.create({
      model: "gpt-4o-mini",
      input: message,
      stream: true,
    });

    // Send each chunk immediately
    for await (const event of stream) {
      if (event.type === "response.output_text.delta" && event.delta) {
        const textChunk = event.delta.text || event.delta;
        res.write(textChunk);
        res.flush?.();
      }
    }

    res.end();
  } catch (error) {
    console.error("Streaming Error:", error);
    if (res.headersSent) {
      res.write("\n[Error occurred]");
      res.end();
    } else {
      res.status(500).json({ error: "Failed to stream response" });
    }
  }
});

// Start server
app.listen(PORT, () => {
  console.log(`🚀 Server running on http://localhost:${PORT}`);
  console.log(`💬 Regular chat: POST /api/chat`);
  console.log(`⚡ Streaming chat: POST /api/chat/stream`);
});

Why keep both endpoints?

  • /api/chat - Simple, reliable, good for testing and when you need the complete response
  • /api/chat/stream - Real-time experience, what users expect from modern AI apps

Start your backend:

Terminal window
npm run dev

Test with curl (see the magic!):

Terminal window
curl -X POST http://localhost:8000/api/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"message": "Write a haiku about programming"}'

You should see: Words appearing one by one in your terminal, just like someone typing!

Code flows like water,
Bugs dance in the morning light,
Coffee saves the day.

Compare to regular endpoint:

Terminal window
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Write a haiku about programming"}'

Feel the difference? Streaming feels alive, regular feels static.
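Prefer testing from a script instead of curl? Here’s a small standalone client (assuming Node 18+ for the built-in fetch, the server running on http://localhost:8000, and a hypothetical file name test-stream.mjs):

// test-stream.mjs - run with: node test-stream.mjs
const response = await fetch("http://localhost:8000/api/chat/stream", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ message: "Write a haiku about programming" }),
});

const decoder = new TextDecoder();

// In Node, the response body is a ReadableStream we can iterate chunk by chunk
for await (const chunk of response.body) {
  process.stdout.write(decoder.decode(chunk, { stream: true }));
}
process.stdout.write("\n");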


🔧 Troubleshooting Common Issues

❌ “Cannot set headers after they are sent”

  • This happens if you try to send a JSON response after starting a stream
  • Use the error handling pattern we showed (check res.headersSent)

❌ Stream never ends in curl

  • This is normal! The stream closes when complete
  • Press Ctrl+C to exit curl if needed

❌ Getting JSON instead of streaming text

  • Check you’re calling /api/chat/stream not /api/chat
  • Verify the Content-Type is text/plain in your headers

❌ Chunks arriving all at once

  • Some networks buffer data. This is normal in development
  • Real browsers handle streaming perfectly

❌ “flush is not a function”

  • The ?. operator handles this gracefully
  • res.flush() isn’t available in all environments, but streaming still works

Backend perspective:

  1. Receive user message
  2. Start stream to OpenAI with stream: true
  3. Forward each chunk to frontend immediately
  4. Close connection when OpenAI finishes

What OpenAI sends:

// First chunk
{ type: "response.output_text.delta", delta: { text: "Code" } }
// Second chunk
{ type: "response.output_text.delta", delta: { text: " flows" } }
// Third chunk
{ type: "response.output_text.delta", delta: { text: " like" } }
// And so on...

What your backend forwards:

"Code" → " flows" → " like" → " water" → "..."

Performance benefit: Users see progress immediately instead of waiting for the complete response.

Think of it like a live sports broadcast: Instead of waiting for the game to end and then watching a recording, you see each play as it happens. That’s the difference streaming makes for AI responses.


Excellent work! 🎉 Your backend now supports real-time streaming.

You’ve just implemented one of the most important features in modern AI applications. Your chat will feel incredibly responsive and professional.

What you’ve built:

  • ⚡ Real-time streaming - Words appear as the AI generates them
  • 🛡️ Robust error handling - Graceful failures during streaming
  • 🔄 Dual endpoints - Both streaming and traditional responses
  • 🚀 Production-ready - Proper headers and connection management

You now understand:

  • 🧠 HTTP streaming concepts - How to keep connections alive and send chunks
  • 🔧 Header management - Why each header matters for streaming
  • 🛡️ Error handling - Different strategies for different error scenarios
  • 🎯 Real-time data flow - From OpenAI chunks to user’s screen

The foundation is solid. Your backend can now deliver AI responses that feel instant and engaging.

👉 Next: Streaming Frontend Integration - Let’s make your React app handle these real-time responses!