🧠 Summary Memory Implementation
Your simple memory works great for short conversations, but what happens when chats get really long? Costs skyrocket and you might hit token limits! 💸
Summary memory solves this by keeping recent messages intact while summarizing older parts of the conversation. This maintains full context while controlling token costs - the best of both worlds.
Building on: This assumes you've completed the Simple Memory Implementation. We'll enhance that code to add intelligent summarization.
🎯 The Problem with Simple Memory
Cost Growth with Simple Memory:
```
Simple Memory (sends everything):
Message 10:  [1,2,3,4,5,6,7,8,9,10] → 1,000 tokens
Message 25:  [1,2,3 ... 23,24,25]   → 2,500 tokens
Message 50:  [1,2,3 ... 48,49,50]   → 5,000 tokens
Message 100: [1,2,3 ... 98,99,100]  → 10,000 tokens

Total cost across all 100 requests: ~505,000 tokens 💸💸💸
```
How Summary Memory Solves This:
```
Summary Memory (summarize old + keep recent):
Message 25:  Create summary of messages 1-15 + keep messages 16-25
Message 50:  Update summary (1-35)           + keep messages 36-50
Message 100: Update summary (1-85)           + keep messages 86-100

Total cost across all 100 requests: ~150,000 tokens 💰 (~70% savings!)
```
Visual Comparison:
```js
// Simple Memory: Everything grows
conversationHistory = [msg1, msg2, msg3, ..., msg100] // All 100 messages

// Summary Memory: Smart optimization
summary = "User discussed React app setup, chose Firebase auth, implemented user login..."
recentMessages = [msg86, msg87, msg88, ..., msg100] // Last 15 messages

// Send: summary + recent messages (much more efficient!)
```
Why Summary Memory is Better:
- ✅ Keeps context from old messages via summary (unlike sliding window)
- ✅ Controls costs by limiting total tokens (unlike simple memory)
- ✅ AI remembers your name from message 1 even at message 100
- ✅ Stays fast with background summarization (the rough cost sketch below makes the savings concrete)
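
To make the savings concrete, here is a back-of-the-envelope sketch. The ~100 tokens per message and ~200 tokens per summary are invented for illustration; real numbers depend on your messages and model.

```js
const TOKENS_PER_MESSAGE = 100; // assumed average, illustration only
const SUMMARY_TOKENS = 200;     // assumed summary size, illustration only
const RECENT_WINDOW = 15;

// Simple memory: request n resends all n previous messages.
const simpleCost = (total) => {
  let cost = 0;
  for (let n = 1; n <= total; n++) cost += n * TOKENS_PER_MESSAGE;
  return cost;
};

// Summary memory: once past the window, each request sends summary + recent window.
const summaryCost = (total) => {
  let cost = 0;
  for (let n = 1; n <= total; n++) {
    const recent = Math.min(n, RECENT_WINDOW) * TOKENS_PER_MESSAGE;
    const summary = n > RECENT_WINDOW ? SUMMARY_TOKENS : 0;
    cost += recent + summary;
  }
  return cost;
};

console.log(simpleCost(100));  // 505000
console.log(summaryCost(100)); // 156500 -- roughly a 70% reduction
```

The exact percentage shifts with your assumptions, but the shape of the curve is the point: simple memory grows quadratically in total, summary memory grows linearly.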
🛠️ Step 1: Add Summarization Endpoint to Backend
Let's enhance your memory backend by adding a dedicated summary endpoint. We'll build on your existing streaming implementation.
Add this new endpoint to your index.js, right after your existing chat endpoints:
```js
// 🆕 SUMMARY MEMORY: Dedicated summarization endpoint
app.post("/api/summarize", async (req, res) => {
  try {
    const { messages, conversationType = 'general' } = req.body;

    if (!messages || messages.length === 0) {
      return res.status(400).json({ error: "Messages are required" });
    }

    // Summary instructions for different conversation types
    const summaryInstructions = {
      technical: "Create a technical summary focusing on technologies discussed, decisions made, code examples covered, and implementation details. Preserve specific technical context.",
      creative: "Summarize the creative process including ideas generated, concepts explored, and creative directions chosen. Maintain the creative flow context.",
      support: "Summarize the support conversation including the user's issue, troubleshooting steps attempted, solutions provided, and current status.",
      general: "Create a conversational summary capturing key topics, decisions, and important context for continuing the discussion naturally."
    };

    const instruction = summaryInstructions[conversationType] || summaryInstructions.general;

    // Build context-aware message for the AI
    const conversationText = messages
      .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
      .join('\n\n');

    // Add summarization instructions
    const contextualMessage = `You are a conversation summarizer. ${instruction} Keep it concise but comprehensive enough to maintain conversation continuity.\n\nConversation to summarize:\n${conversationText}`;

    console.log(`Creating summary for ${messages.length} messages`);

    // Create response using the Responses API
    const response = await openai.responses.create({
      model: "gpt-4o-mini",
      input: contextualMessage,
    });

    // Return results
    res.json({
      summary: response.output_text,
      messagesCount: messages.length,
      conversationType: conversationType,
      success: true,
    });
  } catch (error) {
    console.error("Summarization Error:", error);
    res.status(500).json({
      error: "Failed to create summary",
      success: false,
    });
  }
});
```
What this endpoint does:
- Creates intelligent summaries of conversation history
- Handles different conversation types (technical, creative, support, general)
- Runs separately from chat responses to keep them fast
- Returns structured data for frontend integration (a quick manual test follows below)
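
If you want to sanity-check the endpoint before wiring up the frontend, a small standalone script works. This is a hypothetical test-summarize.mjs, not part of the app; it assumes Node 18+ (global fetch) and the .mjs extension for top-level await:

```js
// test-summarize.mjs -- run with `node test-summarize.mjs` while the backend is up
const res = await fetch("http://localhost:8000/api/summarize", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    conversationType: "technical",
    messages: [
      { role: "user", content: "I'm building a React todo app with Firebase." },
      { role: "assistant", content: "Great stack! Do you need authentication?" },
      { role: "user", content: "Yes, email/password auth with TypeScript." },
    ],
  }),
});

const data = await res.json();
console.log(data.messagesCount); // 3
console.log(data.summary);       // e.g. "User is building a React todo app with Firebase..."
```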
🔄 Step 2: Enhanced Chat Endpoint for Summary Support
Update your existing /api/chat/stream endpoint to handle summaries:
```js
// 🆕 ENHANCED: Updated streaming endpoint with summary support
app.post("/api/chat/stream", async (req, res) => {
  try {
    const {
      message,
      conversationHistory = [],
      summary = null,
      recentWindowSize = 15
    } = req.body;

    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    // Set headers for streaming
    res.writeHead(200, {
      'Content-Type': 'text/plain',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    });

    // 🆕 SUMMARY MEMORY: Build smart context with summary
    let contextualMessage = message;

    // If we have a summary, use it + recent messages for context
    if (summary && conversationHistory.length > 0) {
      const recentMessages = conversationHistory.slice(-recentWindowSize);
      const recentContext = recentMessages
        .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
        .join('\n');

      contextualMessage = `Previous conversation summary:\n${summary}\n\nRecent conversation:\n${recentContext}\n\nCurrent question: ${message}`;
    }
    // If no summary but we have conversation history, use all of it (Simple Memory fallback)
    else if (conversationHistory.length > 0) {
      const context = conversationHistory
        .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
        .join('\n');

      contextualMessage = `Previous conversation:\n${context}\n\nCurrent question: ${message}`;
    }

    // Create streaming response using the Responses API
    const stream = await openai.responses.create({
      model: "gpt-4o-mini",
      input: contextualMessage,
      stream: true,
    });

    // Stream each chunk to the frontend
    for await (const event of stream) {
      if (event.type === "response.output_text.delta" && event.delta) {
        const textChunk = event.delta.text || event.delta;
        res.write(textChunk);
        res.flush?.();
      }
    }

    res.end();
  } catch (error) {
    console.error("Streaming Error:", error);

    if (res.headersSent) {
      res.write("\n[Error occurred]");
      res.end();
    } else {
      res.status(500).json({ error: "Failed to stream response" });
    }
  }
});
```
What the enhanced endpoint does:
- Accepts summary parameter from frontend
- Smart context building - uses summary + recent messages when available
- Fallback support - still works with simple memory if no summary exists
- Maintains streaming - no performance impact on chat responses (a worked example of the assembled context follows below)
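
To see what "smart context building" actually produces, here is the prompt the endpoint assembles when a summary exists, traced with made-up sample data (the template mirrors the endpoint code above):

```js
// Sample data, invented for illustration
const summary = "Sarah (25, frontend dev in NYC) is building a React todo app with Firebase.";
const recentMessages = [
  { role: "user", content: "What CSS framework should I use?" },
  { role: "assistant", content: "Tailwind CSS or styled-components both fit well." },
];
const message = "What do you remember about my project?";

const recentContext = recentMessages
  .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
  .join('\n');

const contextualMessage =
  `Previous conversation summary:\n${summary}\n\n` +
  `Recent conversation:\n${recentContext}\n\n` +
  `Current question: ${message}`;
// => one compact prompt instead of the full 100-message history
```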
🧠 Step 3: Add Summary State and Logic to Frontend
Now let's enhance your memory frontend to add intelligent summarization.
Add this new state to your component, right after your existing state:
```js
// 🆕 SUMMARY MEMORY: Summary-specific state
const [summary, setSummary] = useState(null)
const [recentWindowSize, setRecentWindowSize] = useState(15)
const [summaryThreshold, setSummaryThreshold] = useState(25)
const [isCreatingSummary, setIsCreatingSummary] = useState(false)
const [conversationType, setConversationType] = useState('general')
```
Add these helper functions right after your buildConversationHistory function:
````js
// 🆕 SUMMARY MEMORY: Detect conversation type automatically
const detectConversationType = (messages) => {
  const recentText = messages.slice(-10).map(m => m.text).join(' ').toLowerCase();

  if (recentText.includes('function') || recentText.includes('code') || recentText.includes('api')) {
    return 'technical';
  } else if (recentText.includes('create') || recentText.includes('idea') || recentText.includes('design')) {
    return 'creative';
  } else if (recentText.includes('problem') || recentText.includes('error') || recentText.includes('help')) {
    return 'support';
  }
  return 'general';
};

// 🆕 SUMMARY MEMORY: Create summary with intelligent timing
const createSummary = async (messagesToSummarize) => {
  if (isCreatingSummary) return; // Prevent multiple simultaneous summaries

  try {
    setIsCreatingSummary(true);

    // Detect conversation type for better summaries
    const detectedType = detectConversationType(messages);

    const response = await fetch('http://localhost:8000/api/summarize', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        messages: messagesToSummarize,
        conversationType: detectedType
      }),
    });

    const data = await response.json();

    if (data.success) {
      setSummary(data.summary);
      setConversationType(data.conversationType);
      console.log(`Summary created: ${data.messagesCount} messages summarized as ${data.conversationType}`);
    }
  } catch (error) {
    console.error("Failed to create summary:", error);
  } finally {
    setIsCreatingSummary(false);
  }
};

// 🆕 SUMMARY MEMORY: Smart summary triggers
const shouldCreateSummary = (conversationHistory) => {
  // Require messages outside the recent window, so there is something to summarize
  return conversationHistory.length > recentWindowSize &&
    conversationHistory.length >= summaryThreshold &&
    !summary;
};

const shouldUpdateSummary = (conversationHistory) => {
  return conversationHistory.length >= summaryThreshold * 2 && summary;
};

const isGoodTimeToSummarize = (conversationHistory) => {
  const recentMessages = conversationHistory.slice(-3);

  // Check if we're in the middle of a complex topic
  const hasCodeDiscussion = recentMessages.some(msg =>
    msg.content.includes('```') || msg.content.includes('function'));

  const hasFollowUp = recentMessages.some(msg =>
    msg.content.toLowerCase().includes('can you explain') ||
    msg.content.toLowerCase().includes('tell me more') ||
    msg.content.toLowerCase().includes('what about'));

  return !hasCodeDiscussion && !hasFollowUp;
};

// 🆕 SUMMARY MEMORY: Calculate memory statistics
const getMemoryStats = () => {
  const totalMessages = messages.filter(msg => !msg.isStreaming).length
  const recentMessages = Math.min(totalMessages, recentWindowSize)
  const summarizedMessages = Math.max(0, totalMessages - recentWindowSize)

  return { totalMessages, recentMessages, summarizedMessages }
};

// 🆕 SUMMARY MEMORY: Manual summary trigger
const triggerManualSummary = async () => {
  const conversationHistory = buildConversationHistory(messages);
  // Only summarize when some messages fall outside the recent window
  if (conversationHistory.length > recentWindowSize) {
    const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
    await createSummary(messagesToSummarize);
  }
};
````
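
How the triggers fit together can be hard to see from the individual helpers, so here is a small standalone trace of the decision flow. It re-implements the trigger conditions with plain constants instead of React state, purely to illustrate which messages get summarized (threshold 25 and window 15, matching the defaults above):

```js
const summaryThreshold = 25;
const recentWindowSize = 15;

const decide = (history, summary) => {
  const outsideWindow = history.length > recentWindowSize; // something to summarize
  const create = outsideWindow && history.length >= summaryThreshold && !summary;
  const update = outsideWindow && history.length >= summaryThreshold * 2 && summary;
  if (!create && !update) return "no summary work";
  const toSummarize = history.slice(0, -recentWindowSize);
  return `${create ? "create" : "update"} summary of ${toSummarize.length} messages`;
};

const msgs = (n) =>
  Array.from({ length: n }, (_, i) => ({ role: "user", content: `msg ${i + 1}` }));

console.log(decide(msgs(20), null));      // "no summary work" (below threshold)
console.log(decide(msgs(25), null));      // "create summary of 10 messages" (1-10; 11-25 stay verbatim)
console.log(decide(msgs(50), "summary")); // "update summary of 35 messages" (1-35 re-summarized)
```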
📤 Step 4: Update Your Send Message Function
Find this part of your sendMessage function:
```js
const response = await fetch('http://localhost:8000/api/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    message: currentInput,
    conversationHistory: conversationHistory
  }),
  signal: abortControllerRef.current.signal,
})
```
Replace it with this enhanced version:
```js
// 🆕 SUMMARY MEMORY: Smart summary timing - happens in the background
if (shouldCreateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
  const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
  createSummary(messagesToSummarize); // No await - background process
} else if (shouldUpdateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
  const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
  createSummary(messagesToSummarize); // No await - background process
}

const response = await fetch('http://localhost:8000/api/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    message: currentInput,
    conversationHistory: conversationHistory,
    summary: summary,                  // 🆕 SUMMARY MEMORY: Include summary
    recentWindowSize: recentWindowSize // 🆕 SUMMARY MEMORY: Include window size
  }),
  signal: abortControllerRef.current.signal,
})
```
What this change does:
- Background summarization - doesn't block chat responses
- Intelligent timing - waits for natural conversation breaks
- Sends summary data - includes summary and window size for smart context building
📋 Step 5: Your Complete Updated Frontend
Here's your complete src/App.jsx with summary memory functionality integrated:
````jsx
import { useState, useRef } from 'react'
import { Send, Bot, User } from 'lucide-react'
import ReactMarkdown from 'react-markdown'

// Component: Handles message content with markdown formatting
function MessageContent({ message }) {
  if (message.isUser) {
    return (
      <p className="text-sm leading-relaxed whitespace-pre-wrap">
        {message.text}
        {message.isStreaming && (
          <span className="inline-block w-2 h-4 bg-blue-500 ml-1 animate-pulse" />
        )}
      </p>
    )
  }

  return (
    <div className="text-sm leading-relaxed">
      <ReactMarkdown
        components={{
          h1: ({children}) => <h1 className="text-lg font-bold mb-2 text-slate-800">{children}</h1>,
          h2: ({children}) => <h2 className="text-base font-bold mb-2 text-slate-800">{children}</h2>,
          h3: ({children}) => <h3 className="text-sm font-bold mb-1 text-slate-800">{children}</h3>,
          p: ({children}) => <p className="mb-2 last:mb-0 text-slate-700">{children}</p>,
          ul: ({children}) => <ul className="list-disc list-inside mb-2 space-y-1">{children}</ul>,
          ol: ({children}) => <ol className="list-decimal list-inside mb-2 space-y-1">{children}</ol>,
          li: ({children}) => <li className="text-slate-700">{children}</li>,
          code: ({inline, children}) => {
            const copyToClipboard = (text) => {
              navigator.clipboard.writeText(text)
            }

            if (inline) {
              return (
                <code className="bg-slate-100 text-red-600 px-1.5 py-0.5 rounded text-xs font-mono border">
                  {children}
                </code>
              )
            }

            return (
              <div className="relative group mb-2">
                <code className="block bg-gray-900 text-green-400 p-4 rounded-lg text-xs font-mono overflow-x-auto whitespace-pre border-l-4 border-blue-400 shadow-sm">
                  {children}
                </code>
                <button
                  onClick={() => copyToClipboard(children)}
                  className="absolute top-2 right-2 bg-slate-600 hover:bg-slate-500 text-white px-2 py-1 rounded text-xs opacity-0 group-hover:opacity-100 transition-opacity"
                >
                  Copy
                </button>
              </div>
            )
          },
          pre: ({children}) => <div className="mb-2">{children}</div>,
          strong: ({children}) => <strong className="font-semibold text-slate-800">{children}</strong>,
          em: ({children}) => <em className="italic text-slate-700">{children}</em>,
          blockquote: ({children}) => (
            <blockquote className="border-l-4 border-blue-200 pl-4 italic text-slate-600 mb-2">
              {children}
            </blockquote>
          ),
          a: ({href, children}) => (
            <a href={href} className="text-blue-600 hover:text-blue-800 underline" target="_blank" rel="noopener noreferrer">
              {children}
            </a>
          ),
        }}
      >
        {message.text}
      </ReactMarkdown>
      {message.isStreaming && (
        <span className="inline-block w-2 h-4 bg-blue-500 ml-1 animate-pulse" />
      )}
    </div>
  )
}
function App() {
  // State management
  const [messages, setMessages] = useState([])
  const [input, setInput] = useState('')
  const [isStreaming, setIsStreaming] = useState(false)
  const abortControllerRef = useRef(null)

  // 🆕 SUMMARY MEMORY: Summary-specific state
  const [summary, setSummary] = useState(null)
  const [recentWindowSize, setRecentWindowSize] = useState(15)
  const [summaryThreshold, setSummaryThreshold] = useState(25)
  const [isCreatingSummary, setIsCreatingSummary] = useState(false)
  const [conversationType, setConversationType] = useState('general')

  // MEMORY: Function to build conversation history
  const buildConversationHistory = (messages) => {
    return messages
      .filter(msg => !msg.isStreaming)
      .map(msg => ({
        role: msg.isUser ? "user" : "assistant",
        content: msg.text
      }));
  };

  // 🆕 SUMMARY MEMORY: Detect conversation type automatically
  const detectConversationType = (messages) => {
    const recentText = messages.slice(-10).map(m => m.text).join(' ').toLowerCase();

    if (recentText.includes('function') || recentText.includes('code') || recentText.includes('api')) {
      return 'technical';
    } else if (recentText.includes('create') || recentText.includes('idea') || recentText.includes('design')) {
      return 'creative';
    } else if (recentText.includes('problem') || recentText.includes('error') || recentText.includes('help')) {
      return 'support';
    }
    return 'general';
  };

  // 🆕 SUMMARY MEMORY: Create summary with intelligent timing
  const createSummary = async (messagesToSummarize) => {
    if (isCreatingSummary) return; // Prevent multiple simultaneous summaries

    try {
      setIsCreatingSummary(true);

      // Detect conversation type for better summaries
      const detectedType = detectConversationType(messages);

      const response = await fetch('http://localhost:8000/api/summarize', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          messages: messagesToSummarize,
          conversationType: detectedType
        }),
      });

      const data = await response.json();

      if (data.success) {
        setSummary(data.summary);
        setConversationType(data.conversationType);
        console.log(`Summary created: ${data.messagesCount} messages summarized as ${data.conversationType}`);
      }
    } catch (error) {
      console.error("Failed to create summary:", error);
    } finally {
      setIsCreatingSummary(false);
    }
  };

  // 🆕 SUMMARY MEMORY: Smart summary triggers
  const shouldCreateSummary = (conversationHistory) => {
    // Require messages outside the recent window, so there is something to summarize
    return conversationHistory.length > recentWindowSize &&
      conversationHistory.length >= summaryThreshold &&
      !summary;
  };

  const shouldUpdateSummary = (conversationHistory) => {
    return conversationHistory.length >= summaryThreshold * 2 && summary;
  };

  const isGoodTimeToSummarize = (conversationHistory) => {
    const recentMessages = conversationHistory.slice(-3);

    // Check if we're in the middle of a complex topic
    const hasCodeDiscussion = recentMessages.some(msg =>
      msg.content.includes('```') || msg.content.includes('function'));

    const hasFollowUp = recentMessages.some(msg =>
      msg.content.toLowerCase().includes('can you explain') ||
      msg.content.toLowerCase().includes('tell me more') ||
      msg.content.toLowerCase().includes('what about'));

    return !hasCodeDiscussion && !hasFollowUp;
  };

  // 🆕 SUMMARY MEMORY: Calculate memory statistics
  const getMemoryStats = () => {
    const totalMessages = messages.filter(msg => !msg.isStreaming).length
    const recentMessages = Math.min(totalMessages, recentWindowSize)
    const summarizedMessages = Math.max(0, totalMessages - recentWindowSize)

    return { totalMessages, recentMessages, summarizedMessages }
  };

  // 🆕 SUMMARY MEMORY: Manual summary trigger
  const triggerManualSummary = async () => {
    const conversationHistory = buildConversationHistory(messages);
    // Only summarize when some messages fall outside the recent window
    if (conversationHistory.length > recentWindowSize) {
      const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
      await createSummary(messagesToSummarize);
    }
  };
  // Helper functions (same as before)
  const createAiPlaceholder = () => {
    const aiMessageId = Date.now() + 1
    const aiMessage = {
      text: "",
      isUser: false,
      id: aiMessageId,
      isStreaming: true,
    }
    setMessages(prev => [...prev, aiMessage])
    return aiMessageId
  }

  const readStream = async (response, aiMessageId) => {
    const reader = response.body.getReader()
    const decoder = new TextDecoder()

    while (true) {
      const { done, value } = await reader.read()
      if (done) break

      const chunk = decoder.decode(value, { stream: true })

      setMessages(prev =>
        prev.map(msg =>
          msg.id === aiMessageId
            ? { ...msg, text: msg.text + chunk }
            : msg
        )
      )
    }
  }

  const sendMessage = async () => {
    if (!input.trim() || isStreaming) return

    const userMessage = {
      text: input.trim(),
      isUser: true,
      id: Date.now()
    }
    setMessages(prev => [...prev, userMessage])

    const currentInput = input
    setInput('')
    setIsStreaming(true)
    const aiMessageId = createAiPlaceholder()

    try {
      // Build conversation history from current messages
      const conversationHistory = buildConversationHistory(messages)

      // 🆕 SUMMARY MEMORY: Smart summary timing - happens in the background
      if (shouldCreateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
        const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
        createSummary(messagesToSummarize); // No await - background process
      } else if (shouldUpdateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
        const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
        createSummary(messagesToSummarize); // No await - background process
      }

      abortControllerRef.current = new AbortController()

      const response = await fetch('http://localhost:8000/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          message: currentInput,
          conversationHistory: conversationHistory,
          summary: summary,                  // 🆕 SUMMARY MEMORY: Include summary
          recentWindowSize: recentWindowSize // 🆕 SUMMARY MEMORY: Include window size
        }),
        signal: abortControllerRef.current.signal,
      })

      if (!response.ok) throw new Error('Failed to get response')

      await readStream(response, aiMessageId)

      setMessages(prev =>
        prev.map(msg =>
          msg.id === aiMessageId
            ? { ...msg, isStreaming: false }
            : msg
        )
      )
    } catch (error) {
      if (error.name !== 'AbortError') {
        console.error('Streaming error:', error)
        setMessages(prev =>
          prev.map(msg =>
            msg.id === aiMessageId
              ? { ...msg, text: 'Sorry, something went wrong.', isStreaming: false }
              : msg
          )
        )
      }
    } finally {
      setIsStreaming(false)
      abortControllerRef.current = null
    }
  }

  const stopStreaming = () => {
    if (abortControllerRef.current) {
      abortControllerRef.current.abort()
    }
  }

  const handleKeyPress = (e) => {
    if (e.key === 'Enter' && !e.shiftKey && !isStreaming) {
      e.preventDefault()
      sendMessage()
    }
  }
  return (
    <div className="min-h-screen bg-gradient-to-br from-slate-100 to-blue-50 flex items-center justify-center p-4">
      <div className="bg-white rounded-2xl shadow-2xl w-full max-w-2xl h-[700px] flex flex-col overflow-hidden">

        {/* Header with summary controls */}
        <div className="bg-gradient-to-r from-blue-600 to-indigo-600 text-white p-6">
          <div className="flex justify-between items-start">
            <div>
              <h1 className="text-xl font-bold">⚡ Streaming AI Chat</h1>
              <p className="text-blue-100 text-sm">Smart summary memory!</p>
            </div>

            <div className="text-right space-y-2">
              <div>
                <label className="block text-xs text-blue-100">Recent: {recentWindowSize}</label>
                <input
                  type="range"
                  min="5"
                  max="30"
                  value={recentWindowSize}
                  onChange={(e) => setRecentWindowSize(parseInt(e.target.value))}
                  className="w-20"
                  disabled={isStreaming}
                />
              </div>
              <div>
                <label className="block text-xs text-blue-100">Summary at: {summaryThreshold}</label>
                <input
                  type="range"
                  min="10"
                  max="50"
                  value={summaryThreshold}
                  onChange={(e) => setSummaryThreshold(parseInt(e.target.value))}
                  className="w-20"
                  disabled={isStreaming}
                />
              </div>
              <button
                onClick={triggerManualSummary}
                disabled={isCreatingSummary || messages.length <= recentWindowSize}
                className="text-xs bg-white bg-opacity-20 px-2 py-1 rounded disabled:opacity-50"
              >
                Create Summary Now
              </button>
            </div>
          </div>
        </div>

        {/* 🆕 SUMMARY MEMORY: Memory status dashboard */}
        <div className="bg-slate-100 px-6 py-3 border-b border-slate-200">
          {(() => {
            const { totalMessages, recentMessages, summarizedMessages } = getMemoryStats();

            return (
              <div className="space-y-2">
                <div className="flex justify-between items-center text-sm">
                  <div className="flex space-x-4 text-slate-600">
                    <span>📊 Total: {totalMessages}</span>
                    <span>🔥 Recent: {recentMessages}</span>
                    {summarizedMessages > 0 && (
                      <span>📝 Summarized: {summarizedMessages}</span>
                    )}
                    <span className="text-blue-600">🧠 Type: {conversationType}</span>
                  </div>

                  <div className="flex items-center space-x-2 text-xs">
                    {summary && (
                      <span className="text-green-600">✅ Summary Active</span>
                    )}
                    {isCreatingSummary && (
                      <span className="text-blue-600">🔄 Creating Summary...</span>
                    )}
                  </div>
                </div>

                {/* Memory usage bar */}
                <div className="w-full bg-slate-200 rounded-full h-2">
                  <div
                    className="bg-blue-500 h-2 rounded-full transition-all duration-300"
                    style={{ width: `${Math.min(100, (totalMessages / 50) * 100)}%` }}
                  />
                </div>
                <div className="text-xs text-slate-500 text-center">
                  Memory usage: {totalMessages}/50 messages before optimization
                </div>
              </div>
            );
          })()}
        </div>

        {/* 🆕 SUMMARY MEMORY: Active summary display */}
        {summary && (
          <div className="bg-blue-50 border-l-4 border-blue-400 p-3 mx-6 mt-2 rounded">
            <div className="flex items-start">
              <span className="text-blue-600 mr-2">📝</span>
              <div className="flex-1">
                <p className="text-xs font-medium text-blue-800 mb-1">
                  Active Summary ({conversationType})
                </p>
                <p className="text-xs text-blue-700 leading-relaxed">
                  {summary}
                </p>
              </div>
            </div>
          </div>
        )}

        {/* Messages Area */}
        <div className="flex-1 overflow-y-auto p-6 space-y-4 bg-slate-50">
          {messages.length === 0 ? (
            <div className="text-center text-slate-500 mt-20">
              <div className="w-16 h-16 bg-blue-100 rounded-2xl flex items-center justify-center mx-auto mb-4">
                <Bot className="w-8 h-8 text-blue-600" />
              </div>
              <h3 className="text-lg font-semibold text-slate-700 mb-2">
                Welcome to Smart Summary Chat!
              </h3>
              <p className="text-sm">I'll intelligently summarize our conversation to maintain context while controlling costs!</p>
            </div>
          ) : (
            messages.map(message => (
              <div
                key={message.id}
                className={`flex items-start space-x-3 ${message.isUser ? 'justify-end' : 'justify-start'}`}
              >
                {!message.isUser && (
                  <div className="w-8 h-8 bg-gradient-to-r from-blue-500 to-indigo-600 rounded-full flex items-center justify-center flex-shrink-0">
                    <Bot className="w-4 h-4 text-white" />
                  </div>
                )}

                <div
                  className={`max-w-xs lg:max-w-md px-4 py-3 rounded-2xl ${message.isUser ? 'bg-gradient-to-r from-blue-600 to-indigo-600 text-white' : 'bg-white text-slate-800 shadow-sm border border-slate-200'}`}
                >
                  <MessageContent message={message} />
                </div>

                {message.isUser && (
                  <div className="w-8 h-8 bg-gradient-to-r from-slate-400 to-slate-600 rounded-full flex items-center justify-center flex-shrink-0">
                    <User className="w-4 h-4 text-white" />
                  </div>
                )}
              </div>
            ))
          )}
        </div>

        {/* Input Area */}
        <div className="bg-white border-t border-slate-200 p-4">
          <div className="flex space-x-3">
            <input
              type="text"
              value={input}
              onChange={(e) => setInput(e.target.value)}
              onKeyPress={handleKeyPress}
              placeholder="Ask anything - I'll maintain full context intelligently..."
              disabled={isStreaming}
              className="flex-1 border border-slate-300 rounded-xl px-4 py-3 focus:outline-none focus:ring-2 focus:ring-blue-500 disabled:bg-slate-100 transition-all duration-200"
            />

            {isStreaming ? (
              <button
                onClick={stopStreaming}
                className="bg-gradient-to-r from-red-500 to-red-600 hover:from-red-600 hover:to-red-700 text-white px-6 py-3 rounded-xl transition-all duration-200 flex items-center space-x-2 shadow-lg"
              >
                <span className="w-2 h-2 bg-white rounded-full"></span>
                <span className="hidden sm:inline">Stop</span>
              </button>
            ) : (
              <button
                onClick={sendMessage}
                disabled={!input.trim()}
                className="bg-gradient-to-r from-blue-600 to-indigo-600 hover:from-blue-700 hover:to-indigo-700 disabled:from-slate-300 disabled:to-slate-300 text-white px-6 py-3 rounded-xl transition-all duration-200 flex items-center space-x-2 shadow-lg disabled:shadow-none"
              >
                <Send className="w-4 h-4" />
                <span className="hidden sm:inline">Send</span>
              </button>
            )}
          </div>

          {isStreaming && (
            <div className="mt-3 flex items-center justify-center text-sm text-slate-500">
              <div className="flex space-x-1 mr-2">
                <div className="w-2 h-2 bg-blue-400 rounded-full animate-bounce"></div>
                <div className="w-2 h-2 bg-blue-400 rounded-full animate-bounce" style={{animationDelay: '0.1s'}}></div>
                <div className="w-2 h-2 bg-blue-400 rounded-full animate-bounce" style={{animationDelay: '0.2s'}}></div>
              </div>
              AI is generating response...
            </div>
          )}
        </div>
      </div>
    </div>
  )
}

export default App
````
What this complete component now includes:
- ✅ All previous features - Streaming, memory, markdown formatting, copy buttons
- ✅ Intelligent summarization - Automatically creates summaries to control costs
- ✅ Context retention - Maintains full conversation context via summaries
- ✅ Background processing - Chat responses stay instant while summaries are created
- ✅ Smart timing - Waits for natural conversation breaks to summarize
- ✅ Visual feedback - Shows summary status and memory optimization
- ✅ User controls - Adjustable thresholds and manual summary triggers
🧪 Step 6: Test Your Summary Memory
Start both servers:
Backend:
```bash
cd openai-backend
npm run dev
```
Frontend:
```bash
cd openai-frontend
npm run dev
```
Test with this conversation to see summary memory in action:
- Set the recent window to 5 and the summary threshold to 10 for faster testing using the sliders
- Build initial context (Messages 1-8):
```
You: "Hi! My name is Sarah and I'm 25 years old"
AI:  "Nice to meet you, Sarah! It's great to know you're 25."

You: "I'm building a React todo app with Firebase"
AI:  "That sounds like a great project! React and Firebase work well together."

You: "I'm using TypeScript and want authentication"
AI:  "Excellent choice! TypeScript adds great type safety to React projects."

You: "I work as a frontend developer in New York"
AI:  "That's awesome! New York has a great tech scene."
```
- Continue past the threshold (Messages 9-12):
```
You: "I love using modern frameworks and tools"
AI:  "Modern frameworks definitely make development more efficient."

You: "What CSS framework should I use?"
AI:  "For a React app, you might consider Tailwind CSS or styled-components."
```
[Watch the memory status - it should show a summary being created in the background]
- Test context retention (Message 13+):
```
You: "What do you remember about me and my project?"
AI:  "Based on our conversation, you're Sarah, 25 years old, a frontend developer in New York working on a React todo app with Firebase and TypeScript authentication. You're also considering CSS frameworks like Tailwind."
```
[The AI should reference information from early messages via the summary!]
What to watch for:
- Memory indicator shows total vs summarized vs recent messages
- Summary creation happens automatically at threshold
- AI maintains context from early messages even after summarization
- Chat responses stay fast (no waiting for summarization)
- Visual feedback shows when summary is active and conversation type
💰 Real Cost Impact
50-Message Conversation Cost Analysis:
Without Summary Memory (Simple Memory):
```
Message 1:  [1]                    → 100 tokens
Message 10: [1,2,3,4,5,6,7,8,9,10] → 1,000 tokens
Message 25: [1,2,3 ... 23,24,25]   → 2,500 tokens
Message 50: [1,2,3 ... 48,49,50]   → 5,000 tokens

Total Cost: ~125,000 tokens 💸💸💸
```
With Summary Memory (Smart optimization):
```
Message 1:  [1]                                 → 100 tokens
Message 10: [1,2,3,4,5,6,7,8,9,10]              → 1,000 tokens
Message 25: [summary] + [16,17 ... 24,25]       → ~1,200 tokens
Message 50: [updated summary] + [36,37 ... 49,50] → ~1,200 tokens

Total Cost: ~40,000 tokens 💰 (~70% savings!)
```
🎯 Summary Memory vs Other Approaches
Summary Memory vs Simple Memory:
- ✅ Summary: Maintains full context via summaries (knows your name from message 1)
- ❌ Simple: Total costs grow quadratically with conversation length
Summary Memory vs Sliding Window:
- ✅ Summary: Remembers important context from early messages
- ❌ Sliding Window: Completely forgets everything outside the window
Summary Memory: Best of Both Worlds:
```js
// Perfect balance of context retention + cost control
contextualMessage = summary + recentMessages + currentMessage
// Full context + manageable cost + fast responses
```
🔧 Common Issues & Solutions
❌ AI doesn't maintain context from early messages
- Check that the summary is being created and included in requests
- Verify the /api/summarize endpoint is working
- Look at the browser network tab to confirm the summary is being sent
❌ Summary creation is slow or fails
- Check backend console for summarization errors
- Verify OpenAI API key has sufficient credits
- Make sure conversation type detection is working
❌ Chat responses become slow
- Ensure summary creation happens in the background (no await)
- Check that the timing logic prevents summarization during complex topics
❌ Summary is too brief or loses important context
- Adjust the conversation type detection logic
- Modify the summary instructions for your specific use case (see the sketch after this list)
- Increase recentWindowSize to keep more detailed messages
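
For example, if the built-in categories don't match your domain, you can extend summaryInstructions in the /api/summarize endpoint. The "legal" type below is hypothetical; the category name and wording are illustrative only:

```js
const summaryInstructions = {
  // ...the four existing types from Step 1...
  legal: "Summarize the legal discussion including documents referenced, obligations identified, and open questions. Preserve exact clause numbers, parties, and dates.",
};
```

If you add a type, remember to extend detectConversationType on the frontend as well, or the new type will never be selected automatically.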
⚠️ Summary Memory Considerations
When Summary Memory Works Best:
- ✅ Long conversations (25+ messages)
- ✅ Cost-sensitive applications with high conversation volume
- ✅ Complex discussions requiring full context retention
- ✅ Production applications needing predictable costs
Limitations:
- Summary quality depends on the AI's ability to identify important context
- Some nuance may be lost in the summarization process
- Additional complexity compared to simple memory approaches
Best Practices:
- Monitor summary quality and adjust conversation type detection
- Test with different thresholds to find optimal balance
- Provide feedback mechanisms for users to report context loss
- Consider manual summary review for critical applications
✨ Lesson Recap
Outstanding work! 🎉 You've implemented one of the most sophisticated memory systems used in AI chat applications.
What you've accomplished:
- 📝 Intelligent summarization - Automatically creates context-preserving summaries
- 🧠 Full context retention - Preserves important conversation details via summaries
- 💰 Cost optimization - Up to 70% savings on long conversations
- ⚡ Background processing - Chat responses stay instant
- 🎯 Smart timing - Summarizes at natural conversation breaks
- 📊 Visual feedback - Real-time memory optimization indicators
You now understand:
- 📚 Advanced memory strategies - When and how to use different approaches
- 🤖 AI orchestration - Background processing and intelligent timing
- 💸 Token economics - Balancing context retention with cost control
- 🎨 User experience - Transparent memory management with visual feedback
Your chat now supports very long conversations with strong context retention and predictable costs - the same kind of memory strategy used by production AI applications.
🚀 Next: Persistent Memory & Database Storage - Let's explore memory that survives page refreshes and user sessions!