
πŸ“ Summary Memory Implementation

Your simple memory works great for short conversations, but what happens when chats get really long? Costs skyrocket and you might hit token limits! πŸ’Έ

Summary memory solves this by keeping recent messages intact while summarizing older parts of the conversation. This maintains full context while controlling token costs - the best of both worlds.

Building on: This assumes you’ve completed the Simple Memory Implementation. We’ll enhance that code to add intelligent summarization.


Cost Growth with Simple Memory:

Simple Memory (sends everything):
Message 10: [1,2,3,4,5,6,7,8,9,10] β†’ 1,000 tokens
Message 25: [1,2,3...23,24,25] β†’ 2,500 tokens
Message 50: [1,2,3...48,49,50] β†’ 5,000 tokens
Message 100: [1,2,3...98,99,100] β†’ 10,000 tokens
Total Cost: ~505,000 tokens πŸ’ΈπŸ’ΈπŸ’Έ (the total grows quadratically!)

How Summary Memory Solves This:

Summary Memory (summarize old + keep recent):
Message 25: Create summary of messages 1-15 + keep messages 16-25
Message 50: Update summary (1-35) + keep messages 36-50
Message 100: Update summary (1-85) + keep messages 86-100
Total Cost: ~160,000 tokens πŸ’° (roughly 70% savings!)
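
To make these numbers concrete, here’s a tiny cost model you can run in Node. The constants (~100 tokens per message, a ~200-token summary, a threshold of 25 messages, a recent window of 15) are illustrative assumptions for this example, not measurements:

// Illustrative cost model - all constants are assumptions for this example
const TOKENS_PER_MESSAGE = 100;
const SUMMARY_TOKENS = 200;
const THRESHOLD = 25; // summarization kicks in at this history length
const WINDOW = 15;    // recent messages kept verbatim afterwards

function simpleMemoryCost(totalMessages) {
  // Every request resends the whole history, so the total grows quadratically
  let total = 0;
  for (let i = 1; i <= totalMessages; i++) total += i * TOKENS_PER_MESSAGE;
  return total;
}

function summaryMemoryCost(totalMessages) {
  // After the threshold, each request sends only summary + recent window
  let total = 0;
  for (let i = 1; i <= totalMessages; i++) {
    total += i < THRESHOLD
      ? i * TOKENS_PER_MESSAGE
      : SUMMARY_TOKENS + WINDOW * TOKENS_PER_MESSAGE;
  }
  return total;
}

console.log(simpleMemoryCost(100));  // 505000
console.log(summaryMemoryCost(100)); // 159200 - roughly 70% less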

Visual Comparison:

// Simple Memory: Everything grows
conversationHistory = [msg1, msg2, msg3, ..., msg100] // All 100 messages
// Summary Memory: Smart optimization
summary = "User discussed React app setup, chose Firebase auth, implemented user login..."
recentMessages = [msg85, msg86, msg87, ..., msg100] // Last 15 messages
// Send: summary + recent messages (much more efficient!)

Why Summary Memory is Better:

  • βœ… Keeps context from old messages via summary (unlike sliding window)
  • βœ… Controls costs by limiting total tokens (unlike simple memory)
  • βœ… AI remembers your name from message 1 even at message 100
  • βœ… Stays fast with background summarization

πŸ› οΈ Step 1: Add Summarization Endpoint to Backend

Section titled β€œπŸ› οΈ Step 1: Add Summarization Endpoint to Backend”

Let’s enhance your memory backend by adding a dedicated summary endpoint. We’ll build on your existing streaming implementation.

Add this new endpoint to your index.js, right after your existing chat endpoints:

// πŸ†• SUMMARY MEMORY: Dedicated summarization endpoint
app.post("/api/summarize", async (req, res) => {
  try {
    const { messages, conversationType = 'general' } = req.body;

    if (!messages || messages.length === 0) {
      return res.status(400).json({ error: "Messages are required" });
    }

    // Summary instructions for different conversation types
    const summaryInstructions = {
      technical: "Create a technical summary focusing on technologies discussed, decisions made, code examples covered, and implementation details. Preserve specific technical context.",
      creative: "Summarize the creative process including ideas generated, concepts explored, and creative directions chosen. Maintain the creative flow context.",
      support: "Summarize the support conversation including the user's issue, troubleshooting steps attempted, solutions provided, and current status.",
      general: "Create a conversational summary capturing key topics, decisions, and important context for continuing the discussion naturally."
    };

    const instruction = summaryInstructions[conversationType] || summaryInstructions.general;

    // Build context-aware message for the AI
    const conversationText = messages
      .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
      .join('\n\n');

    // Add summarization instructions
    const contextualMessage = `You are a conversation summarizer. ${instruction} Keep it concise but comprehensive enough to maintain conversation continuity.\n\nConversation to summarize:\n${conversationText}`;

    console.log(`Creating summary for ${messages.length} messages`);

    // Create response using the Responses API
    const response = await openai.responses.create({
      model: "gpt-4o-mini",
      input: contextualMessage,
    });

    // Return results
    res.json({
      summary: response.output_text,
      messagesCount: messages.length,
      conversationType: conversationType,
      success: true,
    });
  } catch (error) {
    console.error("Summarization Error:", error);
    res.status(500).json({
      error: "Failed to create summary",
      success: false,
    });
  }
});

What this endpoint does:

  • Creates intelligent summaries of conversation history
  • Handles different conversation types (technical, creative, support, general)
  • Runs separately from chat responses to keep them fast
  • Returns structured data for frontend integration

πŸ”„ Step 2: Enhanced Chat Endpoint for Summary Support

Update your existing /api/chat/stream endpoint to handle summaries:

// πŸ”„ ENHANCED: Updated streaming endpoint with summary support
app.post("/api/chat/stream", async (req, res) => {
  try {
    const {
      message,
      conversationHistory = [],
      summary = null,
      recentWindowSize = 15
    } = req.body;

    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    // Set headers for streaming
    res.writeHead(200, {
      'Content-Type': 'text/plain',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    });

    // πŸ†• SUMMARY MEMORY: Build smart context with summary
    let contextualMessage = message;

    // If we have a summary, use it + recent messages for context
    if (summary && conversationHistory.length > 0) {
      const recentMessages = conversationHistory.slice(-recentWindowSize);
      const recentContext = recentMessages
        .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
        .join('\n');
      contextualMessage = `Previous conversation summary:\n${summary}\n\nRecent conversation:\n${recentContext}\n\nCurrent question: ${message}`;
    }
    // If no summary but we have conversation history, use all of it (Simple Memory fallback)
    else if (conversationHistory.length > 0) {
      const context = conversationHistory
        .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
        .join('\n');
      contextualMessage = `Previous conversation:\n${context}\n\nCurrent question: ${message}`;
    }

    // Create streaming response using the Responses API
    const stream = await openai.responses.create({
      model: "gpt-4o-mini",
      input: contextualMessage,
      stream: true,
    });

    // Stream each chunk to the frontend (event.delta is the text chunk for output_text delta events)
    for await (const event of stream) {
      if (event.type === "response.output_text.delta" && event.delta) {
        res.write(event.delta);
        res.flush?.();
      }
    }
    res.end();
  } catch (error) {
    console.error("Streaming Error:", error);
    if (res.headersSent) {
      res.write("\n[Error occurred]");
      res.end();
    } else {
      res.status(500).json({ error: "Failed to stream response" });
    }
  }
});

What the enhanced endpoint does:

  • Accepts summary parameter from frontend
  • Smart context building - uses summary + recent messages when available
  • Fallback support - still works with simple memory if no summary exists
  • Maintains streaming - no performance impact on chat responses

🧠 Step 3: Add Summary State and Logic to Frontend

Now let’s enhance your memory frontend to add intelligent summarization.

Add this new state to your component, right after your existing state:

// πŸ†• SUMMARY MEMORY: Summary-specific state
const [summary, setSummary] = useState(null)
const [recentWindowSize, setRecentWindowSize] = useState(15)
const [summaryThreshold, setSummaryThreshold] = useState(25)
const [isCreatingSummary, setIsCreatingSummary] = useState(false)
const [conversationType, setConversationType] = useState('general')

Add these helper functions right after your buildConversationHistory function:

// πŸ†• SUMMARY MEMORY: Detect conversation type automatically
const detectConversationType = (messages) => {
  const recentText = messages.slice(-10).map(m => m.text).join(' ').toLowerCase();
  if (recentText.includes('function') || recentText.includes('code') || recentText.includes('api')) {
    return 'technical';
  } else if (recentText.includes('create') || recentText.includes('idea') || recentText.includes('design')) {
    return 'creative';
  } else if (recentText.includes('problem') || recentText.includes('error') || recentText.includes('help')) {
    return 'support';
  }
  return 'general';
};

// πŸ†• SUMMARY MEMORY: Create summary with intelligent timing
const createSummary = async (messagesToSummarize) => {
  if (isCreatingSummary) return; // Prevent multiple simultaneous summaries
  try {
    setIsCreatingSummary(true);

    // Detect conversation type for better summaries
    const detectedType = detectConversationType(messages);

    const response = await fetch('http://localhost:8000/api/summarize', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        messages: messagesToSummarize,
        conversationType: detectedType
      }),
    });

    const data = await response.json();
    if (data.success) {
      setSummary(data.summary);
      setConversationType(data.conversationType);
      console.log(`Summary created: ${data.messagesCount} messages summarized as ${data.conversationType}`);
    }
  } catch (error) {
    console.error("Failed to create summary:", error);
  } finally {
    setIsCreatingSummary(false);
  }
};

// πŸ†• SUMMARY MEMORY: Smart summary triggers
const shouldCreateSummary = (conversationHistory) => {
  return conversationHistory.length >= summaryThreshold && !summary;
};

const shouldUpdateSummary = (conversationHistory) => {
  return conversationHistory.length >= summaryThreshold * 2 && summary;
};

const isGoodTimeToSummarize = (conversationHistory) => {
  const recentMessages = conversationHistory.slice(-3);
  // Check if we're in the middle of a complex topic
  const hasCodeDiscussion = recentMessages.some(msg =>
    msg.content.includes('```') || msg.content.includes('function'));
  const hasFollowUp = recentMessages.some(msg =>
    msg.content.toLowerCase().includes('can you explain') ||
    msg.content.toLowerCase().includes('tell me more') ||
    msg.content.toLowerCase().includes('what about'));
  return !hasCodeDiscussion && !hasFollowUp;
};

// πŸ†• SUMMARY MEMORY: Calculate memory statistics
const getMemoryStats = () => {
  const totalMessages = messages.filter(msg => !msg.isStreaming).length
  const recentMessages = Math.min(totalMessages, recentWindowSize)
  const summarizedMessages = Math.max(0, totalMessages - recentWindowSize)
  return { totalMessages, recentMessages, summarizedMessages }
};

// πŸ†• SUMMARY MEMORY: Manual summary trigger
const triggerManualSummary = async () => {
  const conversationHistory = buildConversationHistory(messages);
  // There must be more messages than the recent window, or there is nothing to summarize
  if (conversationHistory.length > recentWindowSize) {
    const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
    await createSummary(messagesToSummarize);
  }
};

Find this part of your sendMessage function:

const response = await fetch('http://localhost:8000/api/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    message: currentInput,
    conversationHistory: conversationHistory
  }),
  signal: abortControllerRef.current.signal,
})

Replace it with this enhanced version:

// πŸ†• SUMMARY MEMORY: Smart summary timing - happens in background
if (shouldCreateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
  const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
  createSummary(messagesToSummarize); // No await - background process
} else if (shouldUpdateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
  const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
  createSummary(messagesToSummarize); // No await - background process
}

const response = await fetch('http://localhost:8000/api/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    message: currentInput,
    conversationHistory: conversationHistory,
    summary: summary, // πŸ†• SUMMARY MEMORY: Include summary
    recentWindowSize: recentWindowSize // πŸ†• SUMMARY MEMORY: Include window size
  }),
  signal: abortControllerRef.current.signal,
})

What this change does:

  • Background summarization - doesn’t block chat responses
  • Intelligent timing - waits for natural conversation breaks
  • Sends summary data - includes summary and window size for smart context building

Here’s your complete src/App.jsx with summary memory functionality integrated:

import { useState, useRef } from 'react'
import { Send, Bot, User } from 'lucide-react'
import ReactMarkdown from 'react-markdown'

// Component: Handles message content with markdown formatting
function MessageContent({ message }) {
  if (message.isUser) {
    return (
      <p className="text-sm leading-relaxed whitespace-pre-wrap">
        {message.text}
        {message.isStreaming && (
          <span className="inline-block w-2 h-4 bg-blue-500 ml-1 animate-pulse" />
        )}
      </p>
    )
  }

  return (
    <div className="text-sm leading-relaxed">
      <ReactMarkdown
        components={{
          h1: ({children}) => <h1 className="text-lg font-bold mb-2 text-slate-800">{children}</h1>,
          h2: ({children}) => <h2 className="text-base font-bold mb-2 text-slate-800">{children}</h2>,
          h3: ({children}) => <h3 className="text-sm font-bold mb-1 text-slate-800">{children}</h3>,
          p: ({children}) => <p className="mb-2 last:mb-0 text-slate-700">{children}</p>,
          ul: ({children}) => <ul className="list-disc list-inside mb-2 space-y-1">{children}</ul>,
          ol: ({children}) => <ol className="list-decimal list-inside mb-2 space-y-1">{children}</ol>,
          li: ({children}) => <li className="text-slate-700">{children}</li>,
          code: ({inline, children}) => {
            const copyToClipboard = (text) => {
              navigator.clipboard.writeText(text)
            }
            if (inline) {
              return (
                <code className="bg-slate-100 text-red-600 px-1.5 py-0.5 rounded text-xs font-mono border">
                  {children}
                </code>
              )
            }
            return (
              <div className="relative group mb-2">
                <code className="block bg-gray-900 text-green-400 p-4 rounded-lg text-xs font-mono overflow-x-auto whitespace-pre border-l-4 border-blue-400 shadow-sm">
                  {children}
                </code>
                <button
                  onClick={() => copyToClipboard(children)}
                  className="absolute top-2 right-2 bg-slate-600 hover:bg-slate-500 text-white px-2 py-1 rounded text-xs opacity-0 group-hover:opacity-100 transition-opacity"
                >
                  Copy
                </button>
              </div>
            )
          },
          pre: ({children}) => <div className="mb-2">{children}</div>,
          strong: ({children}) => <strong className="font-semibold text-slate-800">{children}</strong>,
          em: ({children}) => <em className="italic text-slate-700">{children}</em>,
          blockquote: ({children}) => (
            <blockquote className="border-l-4 border-blue-200 pl-4 italic text-slate-600 mb-2">
              {children}
            </blockquote>
          ),
          a: ({href, children}) => (
            <a href={href} className="text-blue-600 hover:text-blue-800 underline" target="_blank" rel="noopener noreferrer">
              {children}
            </a>
          ),
        }}
      >
        {message.text}
      </ReactMarkdown>
      {message.isStreaming && (
        <span className="inline-block w-2 h-4 bg-blue-500 ml-1 animate-pulse" />
      )}
    </div>
  )
}
function App() {
  // State management
  const [messages, setMessages] = useState([])
  const [input, setInput] = useState('')
  const [isStreaming, setIsStreaming] = useState(false)
  const abortControllerRef = useRef(null)

  // πŸ†• SUMMARY MEMORY: Summary-specific state
  const [summary, setSummary] = useState(null)
  const [recentWindowSize, setRecentWindowSize] = useState(15)
  const [summaryThreshold, setSummaryThreshold] = useState(25)
  const [isCreatingSummary, setIsCreatingSummary] = useState(false)
  const [conversationType, setConversationType] = useState('general')

  // MEMORY: Function to build conversation history
  const buildConversationHistory = (messages) => {
    return messages
      .filter(msg => !msg.isStreaming)
      .map(msg => ({
        role: msg.isUser ? "user" : "assistant",
        content: msg.text
      }));
  };

  // πŸ†• SUMMARY MEMORY: Detect conversation type automatically
  const detectConversationType = (messages) => {
    const recentText = messages.slice(-10).map(m => m.text).join(' ').toLowerCase();
    if (recentText.includes('function') || recentText.includes('code') || recentText.includes('api')) {
      return 'technical';
    } else if (recentText.includes('create') || recentText.includes('idea') || recentText.includes('design')) {
      return 'creative';
    } else if (recentText.includes('problem') || recentText.includes('error') || recentText.includes('help')) {
      return 'support';
    }
    return 'general';
  };

  // πŸ†• SUMMARY MEMORY: Create summary with intelligent timing
  const createSummary = async (messagesToSummarize) => {
    if (isCreatingSummary) return; // Prevent multiple simultaneous summaries
    try {
      setIsCreatingSummary(true);
      // Detect conversation type for better summaries
      const detectedType = detectConversationType(messages);
      const response = await fetch('http://localhost:8000/api/summarize', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          messages: messagesToSummarize,
          conversationType: detectedType
        }),
      });
      const data = await response.json();
      if (data.success) {
        setSummary(data.summary);
        setConversationType(data.conversationType);
        console.log(`Summary created: ${data.messagesCount} messages summarized as ${data.conversationType}`);
      }
    } catch (error) {
      console.error("Failed to create summary:", error);
    } finally {
      setIsCreatingSummary(false);
    }
  };

  // πŸ†• SUMMARY MEMORY: Smart summary triggers
  const shouldCreateSummary = (conversationHistory) => {
    return conversationHistory.length >= summaryThreshold && !summary;
  };

  const shouldUpdateSummary = (conversationHistory) => {
    return conversationHistory.length >= summaryThreshold * 2 && summary;
  };

  const isGoodTimeToSummarize = (conversationHistory) => {
    const recentMessages = conversationHistory.slice(-3);
    // Check if we're in the middle of a complex topic
    const hasCodeDiscussion = recentMessages.some(msg =>
      msg.content.includes('```') || msg.content.includes('function'));
    const hasFollowUp = recentMessages.some(msg =>
      msg.content.toLowerCase().includes('can you explain') ||
      msg.content.toLowerCase().includes('tell me more') ||
      msg.content.toLowerCase().includes('what about'));
    return !hasCodeDiscussion && !hasFollowUp;
  };

  // πŸ†• SUMMARY MEMORY: Calculate memory statistics
  const getMemoryStats = () => {
    const totalMessages = messages.filter(msg => !msg.isStreaming).length
    const recentMessages = Math.min(totalMessages, recentWindowSize)
    const summarizedMessages = Math.max(0, totalMessages - recentWindowSize)
    return { totalMessages, recentMessages, summarizedMessages }
  };

  // πŸ†• SUMMARY MEMORY: Manual summary trigger
  const triggerManualSummary = async () => {
    const conversationHistory = buildConversationHistory(messages);
    // There must be more messages than the recent window, or there is nothing to summarize
    if (conversationHistory.length > recentWindowSize) {
      const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
      await createSummary(messagesToSummarize);
    }
  };

  // Helper functions (same as before)
  const createAiPlaceholder = () => {
    const aiMessageId = Date.now() + 1
    const aiMessage = {
      text: "",
      isUser: false,
      id: aiMessageId,
      isStreaming: true,
    }
    setMessages(prev => [...prev, aiMessage])
    return aiMessageId
  }

  const readStream = async (response, aiMessageId) => {
    const reader = response.body.getReader()
    const decoder = new TextDecoder()
    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      const chunk = decoder.decode(value, { stream: true })
      setMessages(prev =>
        prev.map(msg =>
          msg.id === aiMessageId
            ? { ...msg, text: msg.text + chunk }
            : msg
        )
      )
    }
  }
  const sendMessage = async () => {
    if (!input.trim() || isStreaming) return

    const userMessage = { text: input.trim(), isUser: true, id: Date.now() }
    setMessages(prev => [...prev, userMessage])
    const currentInput = input
    setInput('')
    setIsStreaming(true)

    const aiMessageId = createAiPlaceholder()

    try {
      // Build conversation history from current messages
      const conversationHistory = buildConversationHistory(messages)

      // πŸ†• SUMMARY MEMORY: Smart summary timing - happens in background
      if (shouldCreateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
        const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
        createSummary(messagesToSummarize); // No await - background process
      } else if (shouldUpdateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
        const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
        createSummary(messagesToSummarize); // No await - background process
      }

      abortControllerRef.current = new AbortController()

      const response = await fetch('http://localhost:8000/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          message: currentInput,
          conversationHistory: conversationHistory,
          summary: summary, // πŸ†• SUMMARY MEMORY: Include summary
          recentWindowSize: recentWindowSize // πŸ†• SUMMARY MEMORY: Include window size
        }),
        signal: abortControllerRef.current.signal,
      })

      if (!response.ok) throw new Error('Failed to get response')

      await readStream(response, aiMessageId)

      setMessages(prev =>
        prev.map(msg =>
          msg.id === aiMessageId ? { ...msg, isStreaming: false } : msg
        )
      )
    } catch (error) {
      if (error.name !== 'AbortError') {
        console.error('Streaming error:', error)
        setMessages(prev =>
          prev.map(msg =>
            msg.id === aiMessageId
              ? { ...msg, text: 'Sorry, something went wrong.', isStreaming: false }
              : msg
          )
        )
      }
    } finally {
      setIsStreaming(false)
      abortControllerRef.current = null
    }
  }

  const stopStreaming = () => {
    if (abortControllerRef.current) {
      abortControllerRef.current.abort()
    }
  }

  const handleKeyPress = (e) => {
    if (e.key === 'Enter' && !e.shiftKey && !isStreaming) {
      e.preventDefault()
      sendMessage()
    }
  }
  return (
    <div className="min-h-screen bg-gradient-to-br from-slate-100 to-blue-50 flex items-center justify-center p-4">
      <div className="bg-white rounded-2xl shadow-2xl w-full max-w-2xl h-[700px] flex flex-col overflow-hidden">
        {/* Header with summary controls */}
        <div className="bg-gradient-to-r from-blue-600 to-indigo-600 text-white p-6">
          <div className="flex justify-between items-start">
            <div>
              <h1 className="text-xl font-bold">⚑ Streaming AI Chat</h1>
              <p className="text-blue-100 text-sm">Smart summary memory!</p>
            </div>
            <div className="text-right space-y-2">
              <div>
                <label className="block text-xs text-blue-100">Recent: {recentWindowSize}</label>
                <input
                  type="range" min="5" max="30" value={recentWindowSize}
                  onChange={(e) => setRecentWindowSize(parseInt(e.target.value))}
                  className="w-20" disabled={isStreaming}
                />
              </div>
              <div>
                <label className="block text-xs text-blue-100">Summary at: {summaryThreshold}</label>
                <input
                  type="range" min="15" max="50" value={summaryThreshold}
                  onChange={(e) => setSummaryThreshold(parseInt(e.target.value))}
                  className="w-20" disabled={isStreaming}
                />
              </div>
              <button
                onClick={triggerManualSummary}
                disabled={isCreatingSummary || messages.length <= recentWindowSize}
                className="text-xs bg-white bg-opacity-20 px-2 py-1 rounded disabled:opacity-50"
              >
                Create Summary Now
              </button>
            </div>
          </div>
        </div>

        {/* πŸ†• SUMMARY MEMORY: Memory status dashboard */}
        <div className="bg-slate-100 px-6 py-3 border-b border-slate-200">
          {(() => {
            const { totalMessages, recentMessages, summarizedMessages } = getMemoryStats();
            return (
              <div className="space-y-2">
                <div className="flex justify-between items-center text-sm">
                  <div className="flex space-x-4 text-slate-600">
                    <span>πŸ“Š Total: {totalMessages}</span>
                    <span>πŸ”₯ Recent: {recentMessages}</span>
                    {summarizedMessages > 0 && (
                      <span>πŸ“ Summarized: {summarizedMessages}</span>
                    )}
                    <span className="text-blue-600">🧠 Type: {conversationType}</span>
                  </div>
                  <div className="flex items-center space-x-2 text-xs">
                    {summary && (
                      <span className="text-green-600">βœ… Summary Active</span>
                    )}
                    {isCreatingSummary && (
                      <span className="text-blue-600">πŸ”„ Creating Summary...</span>
                    )}
                  </div>
                </div>
                {/* Memory usage bar */}
                <div className="w-full bg-slate-200 rounded-full h-2">
                  <div
                    className="bg-blue-500 h-2 rounded-full transition-all duration-300"
                    style={{
                      width: `${Math.min(100, (totalMessages / 50) * 100)}%`
                    }}
                  />
                </div>
                <div className="text-xs text-slate-500 text-center">
                  Memory usage: {totalMessages}/50 messages before optimization
                </div>
              </div>
            );
          })()}
        </div>

        {/* πŸ†• SUMMARY MEMORY: Active summary display */}
        {summary && (
          <div className="bg-blue-50 border-l-4 border-blue-400 p-3 mx-6 mt-2 rounded">
            <div className="flex items-start">
              <span className="text-blue-600 mr-2">πŸ“‹</span>
              <div className="flex-1">
                <p className="text-xs font-medium text-blue-800 mb-1">
                  Active Summary ({conversationType})
                </p>
                <p className="text-xs text-blue-700 leading-relaxed">
                  {summary}
                </p>
              </div>
            </div>
          </div>
        )}

        {/* Messages Area */}
        <div className="flex-1 overflow-y-auto p-6 space-y-4 bg-slate-50">
          {messages.length === 0 ? (
            <div className="text-center text-slate-500 mt-20">
              <div className="w-16 h-16 bg-blue-100 rounded-2xl flex items-center justify-center mx-auto mb-4">
                <Bot className="w-8 h-8 text-blue-600" />
              </div>
              <h3 className="text-lg font-semibold text-slate-700 mb-2">
                Welcome to Smart Summary Chat!
              </h3>
              <p className="text-sm">I'll intelligently summarize our conversation to maintain context while controlling costs!</p>
            </div>
          ) : (
            messages.map(message => (
              <div
                key={message.id}
                className={`flex items-start space-x-3 ${
                  message.isUser ? 'justify-end' : 'justify-start'
                }`}
              >
                {!message.isUser && (
                  <div className="w-8 h-8 bg-gradient-to-r from-blue-500 to-indigo-600 rounded-full flex items-center justify-center flex-shrink-0">
                    <Bot className="w-4 h-4 text-white" />
                  </div>
                )}
                <div
                  className={`max-w-xs lg:max-w-md px-4 py-3 rounded-2xl ${
                    message.isUser
                      ? 'bg-gradient-to-r from-blue-600 to-indigo-600 text-white'
                      : 'bg-white text-slate-800 shadow-sm border border-slate-200'
                  }`}
                >
                  <MessageContent message={message} />
                </div>
                {message.isUser && (
                  <div className="w-8 h-8 bg-gradient-to-r from-slate-400 to-slate-600 rounded-full flex items-center justify-center flex-shrink-0">
                    <User className="w-4 h-4 text-white" />
                  </div>
                )}
              </div>
            ))
          )}
        </div>

        {/* Input Area */}
        <div className="bg-white border-t border-slate-200 p-4">
          <div className="flex space-x-3">
            <input
              type="text"
              value={input}
              onChange={(e) => setInput(e.target.value)}
              onKeyPress={handleKeyPress}
              placeholder="Ask anything - I'll maintain full context intelligently..."
              disabled={isStreaming}
              className="flex-1 border border-slate-300 rounded-xl px-4 py-3 focus:outline-none focus:ring-2 focus:ring-blue-500 disabled:bg-slate-100 transition-all duration-200"
            />
            {isStreaming ? (
              <button
                onClick={stopStreaming}
                className="bg-gradient-to-r from-red-500 to-red-600 hover:from-red-600 hover:to-red-700 text-white px-6 py-3 rounded-xl transition-all duration-200 flex items-center space-x-2 shadow-lg"
              >
                <span className="w-2 h-2 bg-white rounded-full"></span>
                <span className="hidden sm:inline">Stop</span>
              </button>
            ) : (
              <button
                onClick={sendMessage}
                disabled={!input.trim()}
                className="bg-gradient-to-r from-blue-600 to-indigo-600 hover:from-blue-700 hover:to-indigo-700 disabled:from-slate-300 disabled:to-slate-300 text-white px-6 py-3 rounded-xl transition-all duration-200 flex items-center space-x-2 shadow-lg disabled:shadow-none"
              >
                <Send className="w-4 h-4" />
                <span className="hidden sm:inline">Send</span>
              </button>
            )}
          </div>
          {isStreaming && (
            <div className="mt-3 flex items-center justify-center text-sm text-slate-500">
              <div className="flex space-x-1 mr-2">
                <div className="w-2 h-2 bg-blue-400 rounded-full animate-bounce"></div>
                <div className="w-2 h-2 bg-blue-400 rounded-full animate-bounce" style={{animationDelay: '0.1s'}}></div>
                <div className="w-2 h-2 bg-blue-400 rounded-full animate-bounce" style={{animationDelay: '0.2s'}}></div>
              </div>
              AI is generating response...
            </div>
          )}
        </div>
      </div>
    </div>
  )
}

export default App

What this complete component now includes:

  • βœ… All previous features - Streaming, memory, markdown formatting, copy buttons
  • βœ… Intelligent summarization - Automatically creates summaries to control costs
  • βœ… Context retention - Maintains full conversation context via summaries
  • βœ… Background processing - Chat responses stay instant while summaries are created
  • βœ… Smart timing - Waits for natural conversation breaks to summarize
  • βœ… Visual feedback - Shows summary status and memory optimization
  • βœ… User controls - Adjustable thresholds and manual summary triggers

Start both servers:

Backend:

cd openai-backend
npm run dev

Frontend:

cd openai-frontend
npm run dev

Test with this conversation to see summary memory in action:

  1. Set the summary threshold to 15 and the recent window to 5 (the slider minimums) for faster testing
  2. Build initial context (Messages 1-8):
You: "Hi! My name is Sarah and I'm 25 years old"
AI: "Nice to meet you, Sarah! It's great to know you're 25."
You: "I'm building a React todo app with Firebase"
AI: "That sounds like a great project! React and Firebase work well together."
You: "I'm using TypeScript and want authentication"
AI: "Excellent choice! TypeScript adds great type safety to React projects."
You: "I work as a frontend developer in New York"
AI: "That's awesome! New York has a great tech scene."
  3. Continue building history (Messages 9-12):
You: "I love using modern frameworks and tools"
AI: "Modern frameworks definitely make development more efficient."
You: "What CSS framework should I use?"
AI: "For a React app, you might consider Tailwind CSS or styled-components."
[Keep chatting until the message count passes the threshold - the memory status should show a summary being created in the background]
  4. Test context retention (once the summary is active):
You: "What do you remember about me and my project?"
AI: "Based on our conversation, you're Sarah, 25 years old, a frontend developer in New York working on a React todo app with Firebase and TypeScript authentication. You're also considering CSS frameworks like Tailwind."
[Should reference information from early messages via summary!]

What to watch for:

  • Memory indicator shows total vs summarized vs recent messages
  • Summary creation happens automatically at threshold
  • AI maintains context from early messages even after summarization
  • Chat responses stay fast (no waiting for summarization)
  • Visual feedback shows when summary is active and conversation type

50-Message Conversation Cost Analysis:

Without Summary Memory (Simple Memory):

Message 1: [1] β†’ 100 tokens
Message 10: [1,2,3,4,5,6,7,8,9,10] β†’ 1,000 tokens
Message 25: [1,2,3...23,24,25] β†’ 2,500 tokens
Message 50: [1,2,3...48,49,50] β†’ 5,000 tokens
Total Cost: ~125,000 tokens πŸ’ΈπŸ’ΈπŸ’Έ

With Summary Memory (Smart optimization):

Message 1: [1] β†’ 100 tokens
Message 10: [1,2,3,4,5,6,7,8,9,10] β†’ 1,000 tokens
Message 25: [summary] + [16,17,18,19,20,21,22,23,24,25] β†’ ~1,200 tokens
Message 50: [updated summary] + [36,37,38,39,40,41,42,43,44,45,46,47,48,49,50] β†’ ~1,700 tokens
Total Cost: ~75,000 tokens πŸ’° (roughly 40% savings already, and the gap keeps widening as the conversation grows!)

Summary Memory vs Simple Memory:

  • βœ… Summary: Maintains full context via summaries (knows your name from message 1)
  • ❌ Simple: Costs grow exponentially with conversation length

Summary Memory vs Sliding Window:

  • βœ… Summary: Remembers important context from early messages
  • ❌ Sliding Window: Completely forgets everything outside the window

Summary Memory: Best of Both Worlds:

// Perfect balance of context retention + cost control
contextualMessage = summary + recentMessages + currentMessage
// Full context + Manageable cost + Fast responses

❌ AI doesn’t maintain context from early messages

  • Check that summary is being created and included in requests
  • Verify the /api/summarize endpoint is working
  • Look at browser network tab to confirm summary is being sent

❌ Summary creation is slow or fails

  • Check backend console for summarization errors
  • Verify OpenAI API key has sufficient credits
  • Make sure conversation type detection is working

❌ Chat responses become slow

  • Ensure summary creation happens in background (no await)
  • Check that timing logic prevents summarization during complex topics

❌ Summary is too brief or loses important context

  • Adjust conversation type detection logic
  • Modify summary instructions for your specific use case
  • Increase the recentWindowSize to keep more detailed messages

When Summary Memory Works Best:

  • βœ… Long conversations (25+ messages)
  • βœ… Cost-sensitive applications with high conversation volume
  • βœ… Complex discussions requiring full context retention
  • βœ… Production applications needing predictable costs

Limitations:

  • Summary quality depends on AI’s ability to identify important context
  • Some nuance may be lost in the summarization process
  • Additional complexity compared to simple memory approaches

Best Practices:

  • Monitor summary quality and adjust conversation type detection
  • Test with different thresholds to find optimal balance
  • Provide feedback mechanisms for users to report context loss
  • Consider manual summary review for critical applications

Outstanding work! πŸŽ‰ You’ve implemented one of the most sophisticated memory strategies used in AI chat applications.

What you’ve accomplished:

  • πŸ“ Intelligent summarization - Automatically creates context-preserving summaries
  • 🧠 Full context retention - Never loses important conversation details
  • πŸ’° Cost optimization - Up to 70% savings on long conversations
  • ⚑ Background processing - Chat responses stay instant
  • 🎯 Smart timing - Summarizes at natural conversation breaks
  • πŸ“Š Visual feedback - Real-time memory optimization indicators

You now understand:

  • πŸ”„ Advanced memory strategies - When and how to use different approaches
  • πŸ€– AI orchestration - Background processing and intelligent timing
  • πŸ’Έ Token economics - Balancing context retention with cost control
  • 🎨 User experience - Transparent memory management with visual feedback

Your chat now handles long conversations gracefully - strong context retention with predictable costs. This is the same memory strategy used by many production AI applications!

πŸ‘‰ Next: Persistent Memory & Database Storage - Let’s explore memory that survives page refreshes and user sessions!