
⚡ Real-Time Streaming Responses

Right now your AI feels slow. You send a message, wait several seconds, then the entire response appears at once. That’s not how modern AI chat feels.

Streaming changes everything. Instead of waiting for the complete response, you see the AI’s words as it generates them - just like ChatGPT. The experience becomes fluid and natural.


What you have now:

User: "Write a story about a robot"
[Wait 5 seconds... nothing happens...]
AI: "Once upon a time, there was a robot named Charlie who lived in a small village..."

What streaming gives you:

User: "Write a story about a robot"
AI: "Once"
AI: "Once upon"
AI: "Once upon a"
AI: "Once upon a time,"
AI: "Once upon a time, there"
[...continues in real-time]

Each word appears instantly as the AI generates it.


🛠️ Step 1: Add the Basic Streaming Endpoint


Let’s start simple. Add this new endpoint to your index.js file, right after your existing /api/chat endpoint:

// 🚀 NEW: Streaming chat endpoint
app.post("/api/chat/stream", async (req, res) => {
  try {
    const { message } = req.body;
    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    // Create streaming response from OpenAI
    const stream = await openai.responses.create({
      model: "gpt-4o-mini",
      input: message,
      stream: true, // 👈 This enables streaming!
    });

    // TODO: We'll handle the stream in the next step
  } catch (error) {
    console.error("Streaming Error:", error);
    res.status(500).json({ error: "Failed to stream response" });
  }
});

What’s different? Just one line: stream: true. This tells OpenAI to send chunks instead of waiting for the complete response.

Think of it like this: Instead of OpenAI writing an entire letter and mailing it to you, it’s now writing each word and immediately passing it over the fence to you.


📡 Step 2: Set Up the Streaming Headers

Before we can send streaming data, we need to tell the browser this isn’t a regular JSON response. Think of HTTP headers like putting the right label on a package - they tell the recipient what to expect inside.

Add these headers to prepare for streaming:

app.post("/api/chat/stream", async (req, res) => {
try {
const { message } = req.body;
if (!message) {
return res.status(400).json({ error: "Message is required" });
}
// Set up streaming headers - like addressing an envelope
res.writeHead(200, {
'Content-Type': 'text/plain', // We're sending text, not JSON
'Cache-Control': 'no-cache', // Don't save this anywhere
'Connection': 'keep-alive', // Keep the line open
});
const stream = await openai.responses.create({
model: "gpt-4o-mini",
input: message,
stream: true,
});
// TODO: Process the stream in the next step
} catch (error) {
console.error("Streaming Error:", error);
res.status(500).json({ error: "Failed to stream response" });
}
});

Let me explain each header like I’m talking to a friend:

'Content-Type': 'text/plain'
This is like telling your browser: “Hey, I’m about to send you plain text, not a JSON object. Don’t try to parse it as data - just display it as words.”

Why not JSON? With streaming, we’re sending pieces of text one by one. JSON isn’t usable until it’s complete and valid. But with plain text, we can send “Hello”, then “ world”, then “!”, and the partial result makes sense at every step.
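To make that concrete, here’s a tiny standalone sketch (plain Node.js, nothing from this project) showing why partial text is always usable while partial JSON isn’t:

// Plain-text chunks can be appended and shown as soon as they arrive:
let fullText = "";
for (const chunk of ["Hello", " world", "!"]) {
  fullText += chunk;
  console.log(fullText); // "Hello" -> "Hello world" -> "Hello world!"
}

// A partial JSON payload is useless until the very last character arrives:
try {
  JSON.parse('{"response": "Hel'); // incomplete JSON
} catch (err) {
  console.log("Can't parse a half-finished JSON response:", err.message);
}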

'Cache-Control': 'no-cache'
This tells the browser: “Don’t save this response anywhere. Each piece of text I send is fresh and should go straight to the user.”

Why no caching? Imagine if your browser saved “Hello” and kept showing that even when the AI was trying to stream “Hello world!” You’d miss the rest of the message.

'Connection': 'keep-alive'
This is like saying: “Don’t hang up the phone! I’m going to send you multiple messages in a row.”

Normal HTTP: Request → Response → Hang up
Streaming: Request → Response part 1 → Response part 2 → Response part 3 → … → Hang up

Why keep alive? Usually, HTTP connections close after each response. But with streaming, we need to keep the connection open so we can send many chunks through the same connection.
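Side note, in case you’ve seen other streaming tutorials use a text/event-stream content type: that’s Server-Sent Events (SSE), a common alternative where each chunk is wrapped in a data: line and browsers can read it with the built-in EventSource API. This tutorial sticks with plain text; the sketch below (with a purely hypothetical /api/sse-example route) is only for comparison:

// Hypothetical route, only to illustrate the SSE format - not used in this tutorial
app.get("/api/sse-example", (req, res) => {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream', // SSE instead of plain text
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
  });

  // Each SSE message is a "data:" line followed by a blank line
  res.write(`data: ${JSON.stringify({ text: "Hello" })}\n\n`);
  res.write(`data: ${JSON.stringify({ text: " world!" })}\n\n`);
  res.end();
});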


🚀 Step 3: Process and Forward the Stream


Now for the magic part - processing each chunk as it arrives and immediately sending it to the frontend. This is where the real-time experience happens!

app.post("/api/chat/stream", async (req, res) => {
try {
const { message } = req.body;
if (!message) {
return res.status(400).json({ error: "Message is required" });
}
// Set headers to prepare for streaming
res.writeHead(200, {
'Content-Type': 'text/plain',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
});
// Get the stream from OpenAI
const stream = await openai.responses.create({
model: "gpt-4o-mini",
input: message,
stream: true,
});
// Process each chunk as it arrives
for await (const event of stream) {
if (event.type === "response.output_text.delta" && event.delta) {
const textChunk = event.delta.text || event.delta;
res.write(textChunk); // Send this chunk immediately
res.flush?.(); // Force send (don't buffer)
}
}
res.end(); // Close the stream when done
} catch (error) {
console.error("Streaming Error:", error);
res.status(500).json({ error: "Failed to stream response" });
}
});

Let me break down the streaming loop step by step:

for await (const event of stream)
This is like saying: “For each new piece that OpenAI sends me, do something with it.” The await means “wait for the next piece, then process it immediately.”

Think of it like this: OpenAI is reading you a story over the phone. Instead of waiting for them to finish the entire story, you repeat each sentence to your friend as soon as you hear it.
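If for await is new to you, here’s a tiny self-contained example (no OpenAI involved - the fakeStream generator is made up for illustration). Save it as an ES module, e.g. demo.mjs, and run it with node demo.mjs:

// An async generator that yields one word at a time, with a small delay
async function* fakeStream() {
  for (const word of ["Once", " upon", " a", " time"]) {
    await new Promise((resolve) => setTimeout(resolve, 200)); // pretend we're waiting on the network
    yield word;
  }
}

// for await...of pauses until the next value is ready, then runs the loop body
for await (const chunk of fakeStream()) {
  process.stdout.write(chunk); // prints "Once upon a time" one word at a time
}
process.stdout.write("\n");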

if (event.type === "response.output_text.delta" && event.delta)
OpenAI sends different types of events. We only care about text chunks. This line says: “Only process this if it’s actual text content.”

Why filter events? OpenAI might send metadata, status updates, or other info. We only want the actual words to display to the user.

const textChunk = event.delta.text || event.delta
Extract the actual text from the event. Sometimes it’s nested, sometimes it’s direct.

res.write(textChunk)
This sends the text chunk to your frontend immediately. It’s like whispering each word to your friend as you hear it.

res.flush?.()
This forces the data to send right now instead of later. The ?. (optional chaining) is just a safety check - some environments don’t provide a flush method, and in that case the call is simply skipped while streaming still works.

Why flush? Sometimes servers collect data before sending it (buffering). Flush says “send everything you have right now!” so users see words instantly.
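Where does flush come from? Plain Express doesn’t add a flush method to responses, but the popular compression middleware does (so it can push out partially compressed data immediately). If you happen to use it, the sketch below shows the idea - the /drip route and port are purely illustrative. If you don’t use compression, the optional chaining simply skips the call and streaming still works:

// Illustrative sketch only: the compression middleware is what provides a real res.flush()
// (npm install compression)
import express from "express";
import compression from "compression";

const app = express();
app.use(compression());

app.get("/drip", (req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.write("first chunk\n");
  res.flush(); // provided by compression: push buffered bytes out right now
  setTimeout(() => {
    res.write("second chunk\n");
    res.end();
  }, 1000);
});

app.listen(8001);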

res.end()
When OpenAI is done sending chunks, we close the connection. Like hanging up the phone when the story is finished.


🛡️ Step 4: Handle Streaming Errors Gracefully

Streaming is trickier than regular responses because you might be halfway through sending data when an error occurs. It’s like being on a phone call that drops - you need to handle it gracefully.

app.post("/api/chat/stream", async (req, res) => {
try {
const { message } = req.body;
if (!message) {
return res.status(400).json({ error: "Message is required" });
}
res.writeHead(200, {
'Content-Type': 'text/plain',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
});
const stream = await openai.responses.create({
model: "gpt-4o-mini",
input: message,
stream: true,
});
for await (const event of stream) {
if (event.type === "response.output_text.delta" && event.delta) {
const textChunk = event.delta.text || event.delta;
res.write(textChunk);
res.flush?.();
}
}
res.end();
} catch (error) {
console.error("Streaming Error:", error);
// Smart error handling for streaming
if (res.headersSent) {
// We're already streaming, send a text error
res.write("\n[Error occurred]");
res.end();
} else {
// Headers not sent yet, send JSON error
res.status(500).json({ error: "Failed to stream response" });
}
}
});

Smart error handling explained:

if (res.headersSent)
This checks: “Have I already started sending data to the frontend?” It’s like asking “Have I already started talking on the phone?”

Why does this matter? Once you start streaming, you can’t go back and send a JSON error response. You’re already in “text mode.”

Mid-stream error (headers already sent):

res.write("\n[Error occurred]");
res.end();

If we’re already streaming text, we send a simple text error message. The frontend can detect this and show a user-friendly error.

Pre-stream error (headers not sent yet):

res.status(500).json({ error: "Failed to stream response" });

If we haven’t started streaming yet, we can still send a proper JSON error response.

Always graceful: Never crash the server or leave connections hanging. Always close the connection properly.
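One more defensive pattern worth knowing (a sketch, not something this tutorial requires): if the user closes the tab mid-stream, the response emits a close event before it finishes, and you can stop forwarding chunks so you aren’t generating text nobody will see. Whether the SDK accepts an abort signal as a second argument may depend on your openai package version, so treat this as an illustration:

// Variation of the streaming handler that stops when the client disconnects
app.post("/api/chat/stream", async (req, res) => {
  const abortController = new AbortController();

  // 'close' fires on completion OR premature disconnect; writableEnded tells them apart
  res.on("close", () => {
    if (!res.writableEnded) abortController.abort(); // client left before we finished
  });

  try {
    const stream = await openai.responses.create(
      { model: "gpt-4o-mini", input: req.body.message, stream: true },
      { signal: abortController.signal } // assumption: the SDK cancels the request when aborted
    );

    if (abortController.signal.aborted) return; // client already gone

    res.writeHead(200, {
      'Content-Type': 'text/plain',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    });

    for await (const event of stream) {
      if (abortController.signal.aborted) break; // nobody is listening anymore
      if (event.type === "response.output_text.delta" && event.delta) {
        res.write(event.delta.text || event.delta);
      }
    }
    res.end();
  } catch (error) {
    if (abortController.signal.aborted) return; // expected when the client bailed out
    console.error("Streaming Error:", error);
    if (res.headersSent) {
      res.write("\n[Error occurred]");
      res.end();
    } else {
      res.status(500).json({ error: "Failed to stream response" });
    }
  }
});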


📋 Step 5: Your Complete Updated Backend


Here’s your complete index.js with both regular and streaming endpoints:

import express from "express";
import { config } from "dotenv";
import cors from "cors";
import OpenAI from "openai";

config();

const app = express();
const PORT = process.env.PORT || 8000;

// Create OpenAI client
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Middleware
app.use(cors());
app.use(express.json());

// Test route
app.get("/", (req, res) => {
  res.json({
    message: "🤖 OpenAI Backend is running!",
    endpoints: {
      chat: "/api/chat",
      streaming: "/api/chat/stream"
    }
  });
});

// Original chat endpoint (keep this!)
app.post("/api/chat", async (req, res) => {
  try {
    const { message } = req.body;
    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    const response = await openai.responses.create({
      model: "gpt-4o-mini",
      input: message,
    });

    res.json({
      response: response.output_text,
      model: "gpt-4o-mini",
      success: true
    });
  } catch (error) {
    console.error("OpenAI Error:", error);
    res.status(500).json({
      error: "Failed to get AI response",
      success: false
    });
  }
});

// 🚀 NEW: Streaming chat endpoint
app.post("/api/chat/stream", async (req, res) => {
  try {
    const { message } = req.body;
    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    // Set streaming headers
    res.writeHead(200, {
      'Content-Type': 'text/plain',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    });

    // Create streaming response
    const stream = await openai.responses.create({
      model: "gpt-4o-mini",
      input: message,
      stream: true,
    });

    // Send each chunk immediately
    for await (const event of stream) {
      if (event.type === "response.output_text.delta" && event.delta) {
        const textChunk = event.delta.text || event.delta;
        res.write(textChunk);
        res.flush?.();
      }
    }

    res.end();
  } catch (error) {
    console.error("Streaming Error:", error);
    if (res.headersSent) {
      res.write("\n[Error occurred]");
      res.end();
    } else {
      res.status(500).json({ error: "Failed to stream response" });
    }
  }
});

// Start server
app.listen(PORT, () => {
  console.log(`🚀 Server running on http://localhost:${PORT}`);
  console.log(`💬 Regular chat: POST /api/chat`);
  console.log(`⚡ Streaming chat: POST /api/chat/stream`);
});

Why keep both endpoints?

  • /api/chat - Simple, reliable, good for testing and when you need the complete response
  • /api/chat/stream - Real-time experience, what users expect from modern AI apps

Start your backend:

Terminal window
npm run dev

Test with curl (see the magic!):

Terminal window
curl -X POST http://localhost:8000/api/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"message": "Write a haiku about programming"}'

You should see: Words appearing one by one in your terminal, just like someone typing!

Code flows like water,
Bugs dance in the morning light,
Coffee saves the day.

Compare to regular endpoint:

Terminal window
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Write a haiku about programming"}'

Feel the difference? Streaming feels alive, regular feels static.
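Prefer testing from a script instead of curl? Here’s a small standalone client (assuming Node 18+ for the built-in fetch, the server running on http://localhost:8000, and a hypothetical file name test-stream.mjs):

// test-stream.mjs - run with: node test-stream.mjs
const response = await fetch("http://localhost:8000/api/chat/stream", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ message: "Write a haiku about programming" }),
});

const decoder = new TextDecoder();

// In Node, the response body is a ReadableStream we can iterate chunk by chunk
for await (const chunk of response.body) {
  process.stdout.write(decoder.decode(chunk, { stream: true }));
}
process.stdout.write("\n");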


🔧 Troubleshooting Common Issues

❌ “Cannot set headers after they are sent”

  • This happens if you try to send a JSON response after starting a stream
  • Use the error handling pattern we showed (check res.headersSent)

❌ Stream never ends in curl

  • This is normal! The stream closes when complete
  • Press Ctrl+C to exit curl if needed

❌ Getting JSON instead of streaming text

  • Check you’re calling /api/chat/stream not /api/chat
  • Verify the Content-Type is text/plain in your headers

❌ Chunks arriving all at once

  • Some networks buffer data. This is normal in development
  • Real browsers handle streaming perfectly

❌ “flush is not a function”

  • The ?. operator handles this gracefully
  • res.flush() isn’t available in all environments, but streaming still works

Backend perspective:

  1. Receive user message
  2. Start stream to OpenAI with stream: true
  3. Forward each chunk to frontend immediately
  4. Close connection when OpenAI finishes

What OpenAI sends:

// First chunk
{ type: "response.output_text.delta", delta: { text: "Code" } }
// Second chunk
{ type: "response.output_text.delta", delta: { text: " flows" } }
// Third chunk
{ type: "response.output_text.delta", delta: { text: " like" } }
// And so on...

What your backend forwards:

"Code" → " flows" → " like" → " water" → "..."

Performance benefit: Users see progress immediately instead of waiting for the complete response.

Think of it like a live sports broadcast: Instead of waiting for the game to end and then watching a recording, you see each play as it happens. That’s the difference streaming makes for AI responses.


Excellent work! 🎉 Your backend now supports real-time streaming.

You’ve just implemented one of the most important features in modern AI applications. Your chat will feel incredibly responsive and professional.

What you’ve built:

  • ⚡ Real-time streaming - Words appear as the AI generates them
  • 🛡️ Robust error handling - Graceful failures during streaming
  • 🔄 Dual endpoints - Both streaming and traditional responses
  • 🚀 Production-ready - Proper headers and connection management

You now understand:

  • 🧠 HTTP streaming concepts - How to keep connections alive and send chunks
  • 🔧 Header management - Why each header matters for streaming
  • 🛡️ Error handling - Different strategies for different error scenarios
  • 🎯 Real-time data flow - From OpenAI chunks to user’s screen

The foundation is solid. Your backend can now deliver AI responses that feel instant and engaging.

👉 Next: Streaming Frontend Integration - Let’s make your React app handle these real-time responses!