🧠 Adding Chat Memory

Right now, your chat has a problem. Every time you ask a question, the AI feels like it’s meeting you for the first time. You can’t say “tell me more about that” or “what was the first thing you mentioned?” because the AI has no memory of your previous messages.

This makes conversations feel disconnected and frustrating. Real conversations build on what was said before.


What’s happening now:

You: "What's the capital of France?"
AI: "Paris is the capital of France."
You: "Tell me more about that city"
AI: "I'd be happy to help! What city are you asking about?"

The AI doesn’t remember you were just talking about Paris. Each message is completely isolated.

What you want:

You: "What's the capital of France?"
AI: "Paris is the capital of France."
You: "Tell me more about that city"
AI: "Paris is a beautiful city with over 2 million people. It's famous for the Eiffel Tower, the Louvre Museum, and its café culture..."

The AI remembers the context and continues the conversation naturally.


Instead of sending just the current message to OpenAI, you need to send the entire conversation history.

Without Memory (what you have now):

// Only sending current message
const response = await openai.responses.create({
  model: "gpt-4.1",
  input: currentMessage
})

With Memory (what you’re building):

// Sending entire conversation
const response = await openai.responses.create({
  model: "gpt-4.1",
  input: [
    { role: "user", content: "What's the capital of France?" },
    { role: "assistant", content: "Paris is the capital of France." },
    { role: "user", content: "Tell me more about that city" }
  ]
})

OpenAI expects messages in a specific format with roles. Understanding these roles is crucial for building proper conversation memory.

const conversationHistory = [
  { role: "developer", content: "You are a helpful assistant." },
  { role: "user", content: "Hello!" },
  { role: "assistant", content: "Hi! How can I help you today?" },
  { role: "user", content: "What's the weather like?" }
]

🧠 Understanding Roles in the Response API


When sending messages to the OpenAI Response API, each message must include a role. This helps the model understand who is speaking and how to behave in context.

Unlike the Chat Completions API, which uses system, user, and assistant, the Response API introduces a new role called developer.


🧑‍💻 developer – Define app behavior and context


Use this at the beginning of a conversation to set expectations for the assistant.

{
  role: "developer",
  content: "You are integrated into a customer support app. Always be professional and helpful. Ask for a ticket number when there's a complaint. If a user is angry or frustrated, offer to escalate to a human agent."
}

✅ This replaces system from Chat Completions — but with better alignment to your app’s purpose.

🙋 user – The end user’s input

Used for the actual input from the end user (your app’s customer or user):

{
  role: "user",
  content: "I'm having trouble with my order."
}

🤖 assistant – Previous AI responses

Used when you want to include or simulate previous AI responses in the thread:

{
  role: "assistant",
  content: "I'm sorry to hear that. Could you provide your order number so I can help?"
}

Here’s how a real conversation with memory looks:

const buildConversationHistory = (messages) => {
  return [
    {
      role: "developer",
      content: "You are a programming tutor. Break down complex concepts into simple steps. Use code examples when helpful."
    },
    ...messages.map(msg => ({
      role: msg.isUser ? "user" : "assistant",
      content: msg.text
    }))
  ]
}

// Usage
const messages = [
  { text: "Explain JavaScript closures", isUser: true },
  { text: "A closure is a function that has access to variables...", isUser: false },
  { text: "Can you show me an example?", isUser: true }
]
const conversationHistory = buildConversationHistory(messages)

For the developer role:

  • Set clear behavior expectations
  • Define the AI’s purpose in your app
  • Include any special instructions or constraints
  • Keep it concise but comprehensive

For conversation flow:

  • Always alternate between user and assistant
  • Never have two consecutive messages with the same role
  • Include all previous messages for full context

Example of proper role sequencing:

// ✅ Good - alternating roles
[
  { role: "developer", content: "You are a helpful assistant." },
  { role: "user", content: "Hello" },
  { role: "assistant", content: "Hi there!" },
  { role: "user", content: "How are you?" },
  { role: "assistant", content: "I'm doing well, thanks!" }
]

// ❌ Bad - consecutive user messages
[
  { role: "user", content: "Hello" },
  { role: "user", content: "How are you?" }, // This breaks the pattern
  { role: "assistant", content: "Hi! I'm well!" }
]

AI models have limits on how much text they can process at once (called the “context window”). For the models used here:

  • GPT-4o: ~128,000 tokens (about 96,000 words)
  • GPT-4.1: ~1,000,000 tokens (about 750,000 words)

What this means: Very long conversations might hit this limit, so you need strategies to manage it.
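One simple strategy is a sliding window: keep your developer message plus only the most recent turns. Here’s a minimal sketch, assuming your history array is shaped like the examples above and that maxMessages is a limit you pick for your app (not an API setting):

// Keep the developer message plus only the most recent turns.
// maxMessages is an app-specific tuning knob, not an API limit.
const trimHistory = (history, maxMessages = 20) => {
  const [developerMessage, ...rest] = history
  return [developerMessage, ...rest.slice(-maxMessages)]
}

This keeps every request bounded no matter how long the chat runs, at the cost of forgetting older turns.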

Every message in your conversation history costs tokens. Because the full history is re-sent with every request, a 10-message conversation can cost over 5x the tokens of the same messages sent in isolation.

Cost example:

  • Message 1: 100 tokens
  • Message 2: 200 tokens (100 new + 100 history)
  • Message 3: 300 tokens (100 new + 200 history)
  • Message 4: 400 tokens (100 new + 300 history)

Costs grow quickly in long conversations.
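To make the growth concrete, here’s a rough back-of-the-envelope calculation, assuming every message adds about 100 tokens as in the example above:

// If each message adds ~100 tokens and the full history is re-sent,
// each request costs more than the one before it.
const tokensPerMessage = 100
let totalTokensSent = 0

for (let turn = 1; turn <= 10; turn++) {
  totalTokensSent += turn * tokensPerMessage // history + new message
}

console.log(totalTokensSent) // 5500 tokens, vs ~1000 for 10 isolated messages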

You need to decide where to store conversation history:

  • Frontend only: Simple but lost when page refreshes
  • Backend storage: Database or session storage (see the sketch below)
  • User accounts: Persistent across devices and sessions
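As a rough sketch of the backend option, an in-memory Map keyed by a session ID is enough for development (the sessionId scheme here is an assumption; a production app would swap this for a database):

// Dev-only in-memory store: conversation history keyed by session ID.
// Data is lost on restart; replace with a database for production.
const conversations = new Map()

const getHistory = (sessionId) => conversations.get(sessionId) ?? []

const appendMessage = (sessionId, message) => {
  conversations.set(sessionId, [...getHistory(sessionId), message])
}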

In the next sections, you’ll add memory to your chat by:

  1. Modifying the backend to accept and manage conversation history (see the sketch below)
  2. Updating the frontend to send full conversation context
  3. Adding memory management to handle long conversations
  4. Implementing conversation storage for persistence

By the end, your AI will remember everything from the conversation and respond naturally to follow-up questions.
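As a preview of step 1, here’s a rough sketch of the backend change, assuming an Express server and the official openai SDK (your route names and setup may differ):

import express from "express"
import OpenAI from "openai"

const app = express()
app.use(express.json())
const openai = new OpenAI() // reads OPENAI_API_KEY from the environment

// The frontend sends the full conversation so far in the request body.
app.post("/chat", async (req, res) => {
  const { messages } = req.body // [{ role, content }, ...]
  const response = await openai.responses.create({
    model: "gpt-4.1",
    input: messages
  })
  res.json({ reply: response.output_text })
})

app.listen(3000)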


You’ll explore different approaches to manage memory:

  • Full history: Store everything in the frontend; perfect for short sessions.
  • Sliding window: Keep only the last N messages to control costs and context length.
  • Summarization: Summarize old parts of the conversation to maintain context while reducing tokens (see the sketch below).
  • Database storage: Save conversations to a database for long-term storage.
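As a taste of the summarization approach, one hedged sketch: ask the model itself to compress the older messages, then keep only that summary plus the recent turns (the prompt wording and the keepRecent split are illustrative choices, not fixed rules):

// Compress everything except the last few turns into one summary message.
// Assumes an existing `openai` client, as in the earlier examples.
const summarizeOldMessages = async (history, keepRecent = 4) => {
  const old = history.slice(0, -keepRecent)
  if (old.length === 0) return history // nothing old enough to summarize

  const summary = await openai.responses.create({
    model: "gpt-4.1",
    input: [
      { role: "developer", content: "Summarize this conversation in a few sentences, keeping key facts." },
      ...old
    ]
  })

  return [
    { role: "developer", content: `Summary of earlier conversation: ${summary.output_text}` },
    ...history.slice(-keepRecent)
  ]
}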


After adding memory, your chat will:

  • ✅ Remember previous messages in the conversation
  • ✅ Handle follow-up questions naturally
  • ✅ Maintain context throughout the session
  • ✅ Feel like talking to a real person
  • ✅ Support complex, multi-turn conversations

Ready to make your AI actually remember things? Let’s start with the backend! 🚀