
⚡ Real-Time Streaming Responses

Right now, your AI feels slow. You send a message, wait a few seconds, then the entire response appears at once.

Streaming changes this. Instead of waiting for the complete response, you get the AI’s words as it generates them - just like ChatGPT. The words appear as the AI “thinks,” making it feel more natural and responsive.


Without Streaming (what you have now):

User: "Write a story about a robot"
[Wait 5 seconds...]
AI: "Once upon a time, there was a robot named Charlie..."

With Streaming (what you’re building):

User: "Write a story about a robot"
AI: "Once"
AI: "Once upon"
AI: "Once upon a time,"
AI: "Once upon a time, there was"
[...continues word by word]

Each piece arrives in real time, making the experience feel faster and more interactive.


Step 1: Add the Streaming Endpoint

Let’s upgrade your backend to support streaming responses. You’ll add a new endpoint specifically for streaming.

Why two routes? Keeping your original /api/chat for simple requests and adding a separate /api/chat/stream for real-time streaming keeps the code clear and modular. That separation is good for learning and gives you flexibility later.

Add this new route to your index.js file, right after your existing /api/chat endpoint:

// Streaming AI Chat endpoint
app.post("/api/chat/stream", async (req, res) => {
  try {
    const { message } = req.body;

    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    // Set headers for streaming
    res.writeHead(200, {
      'Content-Type': 'text/plain',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    });

    // Create a streaming response using the Response API
    const stream = await openai.responses.create({
      model: "gpt-4o-mini",
      input: message,
      stream: true,
    });

    // Stream each chunk to the frontend by handling Response API events
    for await (const event of stream) {
      switch (event.type) {
        case "response.output_text.delta": {
          if (event.delta) {
            // event.delta is usually a plain string; handle an object shape defensively
            const textChunk =
              typeof event.delta === "string" ? event.delta : event.delta.text || "";
            if (textChunk) {
              res.write(textChunk);
              res.flush?.(); // flush immediately if the middleware provides it
            }
          }
          break;
        }
        case "text_delta":
          if (event.text) {
            res.write(event.text);
            res.flush?.();
          }
          break;
        case "response.created":
        case "response.completed":
        case "response.output_item.added":
        case "response.content_part.added":
        case "response.content_part.done":
        case "response.output_item.done":
        case "response.output_text.done":
          // Lifecycle events: keep the connection alive, nothing to write
          break;
        case "error":
          console.error("Stream error:", event.error);
          res.write("\n[Error during generation]");
          break;
      }
    }

    // Close the stream
    res.end();
  } catch (error) {
    console.error("OpenAI Streaming Error:", error);

    // If headers were already sent, we can't switch to a JSON error response
    if (res.headersSent) {
      res.write("\n[Error occurred]");
      res.end();
    } else {
      res.status(500).json({
        error: "Failed to stream AI response",
        success: false,
      });
    }
  }
});

Step 2: Understanding the Streaming Headers

res.writeHead(200, {
  'Content-Type': 'text/plain',
  'Cache-Control': 'no-cache',
  'Connection': 'keep-alive',
});

What each header does:

  • Content-Type: text/plain - Tells the browser you’re sending plain text, not JSON
  • Cache-Control: no-cache - Prevents browsers from caching the stream
  • Connection: keep-alive - Keeps the HTTP connection open for streaming

Why you need these: Without them, the browser or an intermediate proxy may buffer the response and deliver it in one piece, which defeats the purpose of streaming.
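
One practical note: reverse proxies can buffer responses even when your headers are correct. If you ever deploy behind nginx, a commonly used extra header hints the proxy not to buffer. This is an optional addition, not required for local development:

// Optional: ask nginx-style proxies not to buffer the stream
res.writeHead(200, {
  'Content-Type': 'text/plain',
  'Cache-Control': 'no-cache',
  'Connection': 'keep-alive',
  'X-Accel-Buffering': 'no', // nginx-specific hint; harmless elsewhere
});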

Step 3: Setting Up the OpenAI Response API Stream

const stream = await openai.responses.create({
  model: "gpt-4o-mini",
  input: message,
  stream: true, // This is the magic flag
});

Breaking it down:

  • stream: true - This tells the Response API to send the response in chunks instead of all at once
  • model: "gpt-4o-mini" - Using a fast, efficient model for streaming
  • input: message - The user’s message to process
  • The Response API will now send you events as it generates the response
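
To make the flag’s effect concrete, here is a minimal sketch contrasting the two modes, using the same openai client and message variable as above:

// Without stream: true, you await one complete response object
const full = await openai.responses.create({
  model: "gpt-4o-mini",
  input: message,
});
console.log(full.output_text); // the whole reply arrives at once

// With stream: true, you get an async iterable of events instead
const stream = await openai.responses.create({
  model: "gpt-4o-mini",
  input: message,
  stream: true,
});
// `stream` has no output_text; you consume it event by event (see Step 4)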

Step 4: Processing the Response API Stream Events

for await (const event of stream) {
  switch (event.type) {
    case "response.output_text.delta": {
      if (event.delta) {
        const textChunk =
          typeof event.delta === "string" ? event.delta : event.delta.text || "";
        if (textChunk) {
          res.write(textChunk);
          res.flush?.();
        }
      }
      break;
    }
    case "text_delta":
      if (event.text) {
        res.write(event.text);
        res.flush?.();
      }
      break;
  }
}

What’s happening here:

  • for await - This loop iterates asynchronously, pausing until the Response API emits the next event
  • event.type - Each event has a type that tells you what kind of data it contains
  • response.output_text.delta - Contains incremental text chunks
  • text_delta - Alternative event type for text chunks
  • res.write() - Immediately sends that piece to your frontend
  • res.flush?.() - Forces the data to be sent immediately (if available)

Why this works: Instead of waiting for the Response API to finish the entire response, you forward each text chunk immediately to the user as events arrive.
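
One refinement you could layer on later: res.write() returns false when Node’s internal buffer is full. The tutorial code ignores this, which is fine for short replies, but a sketch of backpressure-aware writing would look like:

if (!res.write(textChunk)) {
  // The socket buffer is full; wait for the 'drain' event before writing more
  await new Promise((resolve) => res.once("drain", resolve));
}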


Step 5: The Complete Backend Code

Here’s your complete index.js with streaming support using the Response API:

import express from "express";
import { config } from "dotenv";
import cors from "cors";
import OpenAI from "openai";

config();

const app = express();
const port = process.env.PORT || 8000;

// Create OpenAI client once
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

app.use(cors());
app.use(express.json());

// Test route
app.get("/", (req, res) => {
  res.send("Backend is running successfully.");
});

// Original chat endpoint (keep for compatibility)
app.post("/api/chat", async (req, res) => {
  try {
    const { message } = req.body;

    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    const response = await openai.responses.create({
      model: "gpt-4o-mini",
      input: message,
    });

    res.json({
      response: response.output_text,
      success: true,
    });
  } catch (error) {
    console.error("OpenAI API Error:", error);
    res.status(500).json({
      error: "Failed to get AI response",
      success: false,
    });
  }
});

// 🆕 NEW ADDITION: Streaming chat endpoint using Response API
app.post("/api/chat/stream", async (req, res) => {
  try {
    const { message } = req.body;

    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    // Set streaming headers
    res.writeHead(200, {
      'Content-Type': 'text/plain',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    });

    // Create streaming response using Response API
    const stream = await openai.responses.create({
      model: "gpt-4o-mini",
      input: message,
      stream: true,
    });

    // Stream each event from Response API
    for await (const event of stream) {
      switch (event.type) {
        case "response.output_text.delta": {
          if (event.delta) {
            const textChunk =
              typeof event.delta === "string" ? event.delta : event.delta.text || "";
            if (textChunk) {
              res.write(textChunk);
              res.flush?.();
            }
          }
          break;
        }
        case "text_delta":
          if (event.text) {
            res.write(event.text);
            res.flush?.();
          }
          break;
        case "response.created":
        case "response.completed":
        case "response.output_item.added":
        case "response.content_part.added":
        case "response.content_part.done":
        case "response.output_item.done":
        case "response.output_text.done":
          // Lifecycle events: keep the connection alive, nothing to write
          break;
        case "error":
          console.error("Stream error:", event.error);
          res.write("\n[Error during generation]");
          break;
      }
    }

    res.end();
  } catch (error) {
    console.error("OpenAI Streaming Error:", error);

    // Handle error properly for streaming
    if (res.headersSent) {
      res.write("\n[Error occurred]");
      res.end();
    } else {
      res.status(500).json({
        error: "Failed to stream AI response",
        success: false,
      });
    }
  }
});

app.listen(port, () => {
  console.log(`🚀 Server running on http://localhost:${port}`);
});

Step 6: Test Your Streaming Endpoint

Test with curl to see streaming in action:

curl -X POST http://localhost:8000/api/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"message": "Count from 1 to 10 slowly"}' \
  --no-buffer

You should see the response appear word by word instead of all at once!
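
You can also test from Node itself (version 18 or newer, where fetch and web streams are built in). This is a minimal sketch of the same reading pattern your React frontend will use in the next section; the file name is just an example:

// stream-test.js: read the streaming endpoint chunk by chunk
// Run with `node stream-test.js` (needs "type": "module" or a .mjs extension)
const response = await fetch("http://localhost:8000/api/chat/stream", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ message: "Count from 1 to 10 slowly" }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Print each chunk as soon as it arrives, without a trailing newline
  process.stdout.write(decoder.decode(value, { stream: true }));
}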


🔧 Key Differences from Standard OpenAI API


The Response API works differently from the standard Chat Completions API:

Standard Chat API                         Response API
openai.chat.completions.create()          openai.responses.create()
messages: [...]                           input: message
chunk.choices[0]?.delta?.content          event.delta.text or event.text
Single event type                         Multiple event types
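
In code, the two streaming loops look like this side by side. This is a sketch for comparison only; the Chat Completions version is not part of this project:

// Chat Completions API streaming (older pattern)
const chatStream = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: message }],
  stream: true,
});
for await (const chunk of chatStream) {
  const text = chunk.choices[0]?.delta?.content || "";
  if (text) res.write(text);
}

// Response API streaming (what this tutorial uses)
const stream = await openai.responses.create({
  model: "gpt-4o-mini",
  input: message,
  stream: true,
});
for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    res.write(typeof event.delta === "string" ? event.delta : event.delta.text || "");
  }
}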

Response API Events:

  • response.output_text.delta - Contains text chunks
  • text_delta - Alternative text chunk format
  • response.created - Stream started
  • response.completed - Stream finished
  • error - Something went wrong
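
If you want to see the event sequence for yourself, a quick debugging sketch is to log only the types; you can drop this into the streaming loop temporarily:

for await (const event of stream) {
  console.log(event.type);
  // Typical order: "response.created", then many
  // "response.output_text.delta" events, then
  // "response.output_text.done" and finally "response.completed"
}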

Your backend now supports streaming with the Response API! Next, you’ll update your React frontend to:

  • Handle streaming responses
  • Display text as it arrives
  • Show a typing indicator
  • Create that smooth ChatGPT-like experience

The backend is ready - now let’s make the frontend feel truly real-time! 🚀