📝 Summary Memory Implementation
Sliding window memory works well for controlling costs, but it has a problem: it completely forgets older context. Summary memory solves this by keeping recent messages intact while summarizing older parts of the conversation. This maintains context while controlling token costs.
Building on: This guide assumes you’ve completed the Simple Memory Implementation. We’ll enhance that code to add intelligent summarization.
🎯 The Problem with Simple Memory
Cost Growth with Simple Memory
```
Simple Memory (sends everything):

Message 10:  [1, 2, 3, ..., 10]   → 1,000 tokens
Message 25:  [1, 2, 3, ..., 25]   → 2,500 tokens
Message 50:  [1, 2, 3, ..., 50]   → 5,000 tokens
Message 100: [1, 2, 3, ..., 100]  → 10,000 tokens

Total Cost: ~175,000 tokens 💸💸💸
```
How Summary Memory Solves This
```
Summary Memory (summarize old + keep recent):

Message 25:  summarize messages 1-10  + keep messages 11-25
Message 50:  update summary (1-35)    + keep messages 36-50
Message 100: update summary (1-85)    + keep messages 86-100

Total Cost: ~50,000 tokens 💰 (70% savings!)
```
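To make the growth curves concrete, here is a minimal cost model you can run. All the numbers are illustrative assumptions (roughly 100 tokens per message, a 200-token summary, a 15-message recent window, summarization starting at message 25); real totals depend on your actual message sizes, so they won't match the rough figures above exactly, but the shape of the curves is the point: simple memory grows quadratically, summary memory grows linearly once summarization kicks in.

```javascript
// Assumed constants for illustration only - tune these to your own data.
const TOKENS_PER_MESSAGE = 100; // average tokens per message
const SUMMARY_TOKENS = 200;     // assumed size of the rolling summary
const RECENT_WINDOW = 15;       // recent messages kept verbatim
const SUMMARY_THRESHOLD = 25;   // message count at which summarization starts

// Simple memory: the Nth request re-sends all N messages.
function simpleMemoryCost(totalMessages) {
  let cost = 0;
  for (let n = 1; n <= totalMessages; n++) {
    cost += n * TOKENS_PER_MESSAGE;
  }
  return cost;
}

// Summary memory: after the threshold, each request sends only the
// summary plus the recent window, so per-request cost stops growing.
function summaryMemoryCost(totalMessages) {
  let cost = 0;
  for (let n = 1; n <= totalMessages; n++) {
    if (n < SUMMARY_THRESHOLD) {
      cost += n * TOKENS_PER_MESSAGE;
    } else {
      cost += SUMMARY_TOKENS + RECENT_WINDOW * TOKENS_PER_MESSAGE;
    }
  }
  return cost;
}

console.log(simpleMemoryCost(100));  // grows quadratically with conversation length
console.log(summaryMemoryCost(100)); // grows linearly after the threshold
```

Under this model a 100-message conversation costs several times more with simple memory than with summary memory, and the gap widens the longer the conversation runs.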
Visual Comparison
```javascript
// Simple Memory: Everything grows
conversationHistory = [msg1, msg2, msg3, ..., msg100] // All 100 messages

// Summary Memory: Smart optimization
summary = "User discussed React app setup, chose Firebase auth, implemented user login..."
recentMessages = [msg86, msg87, msg88, ..., msg100] // Last 15 messages

// Send: summary + recent messages (much more efficient!)
```
Why Summary Memory is Better Than Alternatives
Compared to Sliding Window Memory (which you might consider):
- ✅ Summary Memory: Keeps context from old messages via summary
- ❌ Sliding Window: Completely forgets old messages
- ✅ Summary Memory: AI remembers your name from message 1 even at message 100
- ❌ Sliding Window: AI forgets your name after window size is exceeded
Best of Both Worlds:
```
// Summary Memory = Context Retention + Cost Control
summary + recentMessages = Full context + Manageable cost

// Simple Memory = Context but Expensive
allMessages = Full context + Growing cost

// Sliding Window = Cheap but Forgets
recentMessages = Limited context + Fixed cost
```
🤔 Why We Need a Separate Summary Endpoint
The Problem with Inline Summarization
You might think: “Why not just summarize during the chat response?” Here’s why that’s a bad idea:
Bad approach (summarizing inline):
```javascript
app.post("/api/chat/stream", async (req, res) => {
  // 1. User sends message
  // 2. Check if we need summary
  // 3. Create summary (takes 3-5 seconds) ❌
  // 4. Then respond to user (another 3-5 seconds) ❌
  // Total: 6-10 seconds of waiting!
})
```
What users experience:
```
You: "How do I deploy this app?"
[Wait... wait... wait... 8 seconds later...]
AI: "For deployment, you can use..."
```
The Solution: Dedicated Summary Endpoint
Good approach (separate endpoints):
```javascript
// Summarization happens separately and strategically
app.post("/api/summarize", async (req, res) => {
  // Just creates summary and returns it
})

// Chat responses stay fast
app.post("/api/chat/stream", async (req, res) => {
  // Uses existing summary + responds immediately
})
```
What users experience:
```
You: "How do I deploy this app?"
AI: "For deployment, you can use..." [instant response]
[Summary created quietly in background when needed]
```
📋 When to Trigger Summary Creation
We don’t summarize randomly. We use strategic triggers based on conversation length and natural breaks.
Trigger Strategy 1: Message Count Thresholds
```javascript
const shouldCreateSummary = (conversationHistory) => {
  return conversationHistory.length >= summaryThreshold && !summary;
};

const shouldUpdateSummary = (conversationHistory) => {
  return conversationHistory.length >= summaryThreshold * 2 && summary;
};
```
Timeline example:
- Messages 1-24: Simple memory (send all messages)
- Message 25: Create first summary (summarize messages 1-10, keep 11-25 detailed)
- Messages 26-49: Use summary + recent messages
- Message 50: Update summary (summarize messages 1-35, keep 36-50 detailed)
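The timeline above can be sketched as a small helper. `planAtMessage` is a hypothetical function written just for this walkthrough; it uses the same defaults the component adopts later (threshold 25, recent window 15) to report what the memory strategy does at a given message count.

```javascript
// Defaults matching the component state introduced later in this guide.
const summaryThreshold = 25;
const recentWindowSize = 15;

// Report the memory strategy at a given conversation length.
const planAtMessage = (count) => {
  if (count < summaryThreshold) {
    // Below the threshold: plain Simple Memory, send everything.
    return { mode: 'simple', send: count };
  }
  // At or past the threshold: summarize everything older than the window.
  return {
    mode: 'summary',
    summarized: count - recentWindowSize, // messages 1..(count - 15)
    keptDetailed: recentWindowSize,       // the last 15 messages, verbatim
  };
};

console.log(planAtMessage(10)); // { mode: 'simple', send: 10 }
console.log(planAtMessage(25)); // { mode: 'summary', summarized: 10, keptDetailed: 15 }
console.log(planAtMessage(50)); // { mode: 'summary', summarized: 35, keptDetailed: 15 }
```

Note how this reproduces the timeline: at message 25 the first 10 messages get summarized and 11-25 stay detailed; at message 50 the summary covers 1-35.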
Trigger Strategy 2: Natural Conversation Breaks
```javascript
const isGoodTimeToSummarize = (conversationHistory) => {
  const recentMessages = conversationHistory.slice(-5);

  // Don't summarize during complex topics
  const hasCodeBlocks = recentMessages.some(msg =>
    msg.content.includes('```') || msg.content.includes('function'));

  const hasFollowUps = recentMessages.some(msg =>
    msg.content.toLowerCase().includes('can you explain') ||
    msg.content.toLowerCase().includes('tell me more'));

  // Wait for natural break if in middle of complex topic
  if (hasCodeBlocks || hasFollowUps) {
    return false; // Wait for better timing
  }

  return true; // Good time to summarize
};
```
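To see how the two checks combine, here is a self-contained sketch. It reproduces the trigger logic above with the shared state (`summaryThreshold`, `summary`) replaced by plain variables, then runs it against two hypothetical histories: a calm one and one where the user just asked a follow-up.

```javascript
// Stand-ins for component state, using the defaults from this guide.
const summaryThreshold = 25;
let summary = null; // no summary exists yet

const shouldCreateSummary = (history) =>
  history.length >= summaryThreshold && !summary;

const isGoodTimeToSummarize = (history) => {
  const recent = history.slice(-5);
  const hasCodeBlocks = recent.some(msg =>
    msg.content.includes('```') || msg.content.includes('function'));
  const hasFollowUps = recent.some(msg =>
    msg.content.toLowerCase().includes('can you explain') ||
    msg.content.toLowerCase().includes('tell me more'));
  return !(hasCodeBlocks || hasFollowUps);
};

// 30 plain messages: past the threshold and no complex topic in progress.
const calmHistory = Array.from({ length: 30 }, (_, i) => ({
  role: i % 2 === 0 ? 'user' : 'assistant',
  content: `message ${i + 1}`,
}));
console.log(shouldCreateSummary(calmHistory) && isGoodTimeToSummarize(calmHistory)); // true → summarize now

// Same length, but the last message is a follow-up question.
const busyHistory = [
  ...calmHistory.slice(0, -1),
  { role: 'user', content: 'Can you explain that?' },
];
console.log(shouldCreateSummary(busyHistory) && isGoodTimeToSummarize(busyHistory)); // false → wait for a break
```

Both conditions must hold before summarization fires, which is exactly how `sendMessage` combines them later in this guide.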
🛠️ Step 1: Build the Summarization Endpoint
Let’s enhance your backend by adding a dedicated summary endpoint. We’ll build on your existing streaming implementation.
Understanding the Backend Enhancement
Your current Simple Memory backend accepts conversation history and sends it all to OpenAI. We’ll add:
- New summarization endpoint for creating summaries
- Enhanced chat endpoint that uses summaries + recent messages
- Smart context building that combines both efficiently
Step 1a: Add the Summarization Endpoint
Add this new endpoint to your backend index.js, right after your existing chat endpoints:
```javascript
// 🆕 SUMMARY MEMORY ADDITION: Dedicated summarization endpoint
app.post("/api/summarize", async (req, res) => {
  try {
    const { messages, conversationType = 'general' } = req.body;

    if (!messages || messages.length === 0) {
      return res.status(400).json({ error: "Messages are required" });
    }

    // Summary instructions for different conversation types
    const summaryInstructions = {
      technical: "Create a technical summary focusing on technologies discussed, decisions made, code examples covered, and implementation details. Preserve specific technical context.",
      creative: "Summarize the creative process including ideas generated, concepts explored, and creative directions chosen. Maintain the creative flow context.",
      support: "Summarize the support conversation including the user's issue, troubleshooting steps attempted, solutions provided, and current status.",
      general: "Create a conversational summary capturing key topics, decisions, and important context for continuing the discussion naturally."
    };

    const instruction = summaryInstructions[conversationType] || summaryInstructions.general;

    // Build context-aware message for the AI
    let contextualMessage = `Please summarize this conversation:\n\n${messages.map(msg => `${msg.role}: ${msg.content}`).join('\n\n')}`;

    // Add summarization instructions
    contextualMessage = `You are a conversation summarizer. ${instruction} Keep it concise but comprehensive enough to maintain conversation continuity.\n\n${contextualMessage}`;

    console.log(`Creating summary for ${messages.length} messages`);

    // Create the summary using the Responses API (no streaming needed here)
    const response = await openai.responses.create({
      model: "gpt-4o-mini",
      input: contextualMessage,
    });

    // Return results
    res.json({
      summary: response.output_text,
      messagesCount: messages.length,
      conversationType: conversationType,
      success: true,
    });

  } catch (error) {
    console.error("Summarization Error:", error);
    res.status(500).json({
      error: "Failed to create summary",
      success: false,
    });
  }
});
```
Step 1b: Enhance Your Chat Endpoint for Summary Support
Update your existing /api/chat/stream endpoint to handle summaries:
```javascript
// 🔄 ENHANCED: Updated streaming endpoint with summary support
app.post("/api/chat/stream", async (req, res) => {
  try {
    const {
      message,
      conversationHistory = [],
      summary = null,
      recentWindowSize = 15
    } = req.body;

    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    // Set headers for streaming
    res.writeHead(200, {
      'Content-Type': 'text/plain',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    });

    // 🆕 SUMMARY MEMORY ADDITION: Build smart context with summary
    let contextualMessage = message;

    // If we have a summary, use it + recent messages for context
    if (summary && conversationHistory.length > 0) {
      const recentMessages = conversationHistory.slice(-recentWindowSize);
      const recentContext = recentMessages
        .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
        .join('\n');

      contextualMessage = `Previous conversation summary:\n${summary}\n\nRecent conversation:\n${recentContext}\n\nCurrent question: ${message}`;
    }
    // If no summary but we have conversation history, use all of it (Simple Memory fallback)
    else if (conversationHistory.length > 0) {
      const context = conversationHistory
        .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
        .join('\n');

      contextualMessage = `Previous conversation:\n${context}\n\nCurrent question: ${message}`;
    }

    // Create streaming response using the Responses API
    const stream = await openai.responses.create({
      model: "gpt-4o-mini",
      input: contextualMessage,
      stream: true,
    });

    // Stream each chunk to the frontend - handle Responses API events
    for await (const event of stream) {
      switch (event.type) {
        case "response.output_text.delta":
          if (event.delta) {
            let textChunk = typeof event.delta === "string"
              ? event.delta
              : event.delta.text || "";

            if (textChunk) {
              res.write(textChunk);
              res.flush?.();
            }
          }
          break;

        case "text_delta":
          if (event.text) {
            res.write(event.text);
            res.flush?.();
          }
          break;

        case "response.created":
        case "response.completed":
        case "response.output_item.added":
        case "response.content_part.added":
        case "response.content_part.done":
        case "response.output_item.done":
        case "response.output_text.done":
          // Keep connection alive, no content to write
          break;

        case "error":
          console.error("Stream error:", event.error);
          res.write("\n[Error during generation]");
          break;
      }
    }

    // Close the stream
    res.end();

  } catch (error) {
    console.error("OpenAI Streaming Error:", error);

    // Handle error properly for streaming
    if (res.headersSent) {
      res.write("\n[Error occurred]");
      res.end();
    } else {
      res.status(500).json({
        error: "Failed to stream AI response",
        success: false,
      });
    }
  }
});
```
Summary of Backend Changes
🆕 New /api/summarize endpoint:
- What it does: Creates intelligent summaries of conversation history
- Why it’s separate: Keeps chat responses fast while summarization happens in background
- How it works: Uses conversation type detection for better summaries
🔄 Enhanced /api/chat/stream endpoint:
- Added parameters: summary, recentWindowSize
- Smart context building: Uses summary + recent messages when available
- Fallback support: Still works with Simple Memory if no summary exists
🔄 Step 2: Enhance Your Frontend with Summary Logic
Now let’s enhance your Simple Memory frontend to add intelligent summarization. We’ll build on your existing StreamingChat component.
Understanding the Frontend Enhancement
Your current Simple Memory frontend builds and sends all conversation history. We’ll enhance it to:
- Add summary state management for tracking summaries
- Create smart summarization logic with intelligent timing
- Send summaries + recent messages instead of all messages
- Provide visual feedback about memory optimization
Step 2a: Add Summary State Management
Update your component state to include summary-related functionality:
```jsx
function StreamingChat() {
  const [messages, setMessages] = useState([])
  const [input, setInput] = useState('')
  const [isStreaming, setIsStreaming] = useState(false)
  const abortControllerRef = useRef(null)

  // 🆕 SUMMARY MEMORY ADDITION: Summary-specific state
  const [summary, setSummary] = useState(null)
  const [recentWindowSize, setRecentWindowSize] = useState(15)
  const [summaryThreshold, setSummaryThreshold] = useState(25)
  const [isCreatingSummary, setIsCreatingSummary] = useState(false)
  const [conversationType, setConversationType] = useState('general')
```
What each new state does:
- summary - Stores the current conversation summary text
- recentWindowSize - How many recent messages to keep in detail (default 15)
- summaryThreshold - When to create the first summary (default 25 messages)
- isCreatingSummary - Shows when summarization is happening
- conversationType - Tracks the detected conversation type
Step 2b: Add Summary Creation Logic
Add these functions right after your existing buildConversationHistory function:
```jsx
// 🆕 SUMMARY MEMORY ADDITION: Detect conversation type automatically
const detectConversationType = (messages) => {
  const recentText = messages.slice(-10).map(m => m.text).join(' ').toLowerCase();

  if (recentText.includes('function') || recentText.includes('code') || recentText.includes('api')) {
    return 'technical';
  } else if (recentText.includes('create') || recentText.includes('idea') || recentText.includes('design')) {
    return 'creative';
  } else if (recentText.includes('problem') || recentText.includes('error') || recentText.includes('help')) {
    return 'support';
  }
  return 'general';
};
```
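A quick way to sanity-check this keyword heuristic is to feed it a few sample messages. The snippet below reproduces the same detection logic as a standalone function (so it runs outside React) and shows which bucket some example texts land in:

```javascript
// Same keyword heuristic as detectConversationType in the component,
// reproduced here so the example is self-contained.
const detectConversationType = (messages) => {
  const recentText = messages.slice(-10).map(m => m.text).join(' ').toLowerCase();
  if (recentText.includes('function') || recentText.includes('code') || recentText.includes('api')) {
    return 'technical';
  } else if (recentText.includes('create') || recentText.includes('idea') || recentText.includes('design')) {
    return 'creative';
  } else if (recentText.includes('problem') || recentText.includes('error') || recentText.includes('help')) {
    return 'support';
  }
  return 'general';
};

console.log(detectConversationType([{ text: 'My API call returns a 404' }]));      // "technical"
console.log(detectConversationType([{ text: 'I have a problem with my login' }])); // "support"
console.log(detectConversationType([{ text: 'What is the weather like?' }]));      // "general"
```

Note the checks run in order, so a message mentioning both "api" and "problem" is classified as technical; keep that precedence in mind if you tune the keyword lists.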
```jsx
// 🆕 SUMMARY MEMORY ADDITION: Create summary with intelligent timing
const createSummary = async (messagesToSummarize) => {
  if (isCreatingSummary) return; // Prevent multiple simultaneous summaries

  try {
    setIsCreatingSummary(true);

    // Detect conversation type for better summaries
    const detectedType = detectConversationType(messages);

    const response = await fetch('http://localhost:8000/api/summarize', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        messages: messagesToSummarize,
        conversationType: detectedType
      }),
    });

    const data = await response.json();

    if (data.success) {
      setSummary(data.summary);
      setConversationType(data.conversationType);

      console.log(`Summary created: ${data.messagesCount} messages summarized as ${data.conversationType}`);
    }
  } catch (error) {
    console.error("Failed to create summary:", error);
  } finally {
    setIsCreatingSummary(false);
  }
};
```
```jsx
// 🆕 SUMMARY MEMORY ADDITION: Smart summary triggers
const shouldCreateSummary = (conversationHistory) => {
  return conversationHistory.length >= summaryThreshold && !summary;
};

const shouldUpdateSummary = (conversationHistory) => {
  return conversationHistory.length >= summaryThreshold * 2 && summary;
};

const isGoodTimeToSummarize = (conversationHistory) => {
  const recentMessages = conversationHistory.slice(-3);

  // Check if we're in the middle of a complex topic
  const hasCodeDiscussion = recentMessages.some(msg =>
    msg.content.includes('```') || msg.content.includes('function'));

  const hasFollowUp = recentMessages.some(msg =>
    msg.content.toLowerCase().includes('can you explain') ||
    msg.content.toLowerCase().includes('tell me more') ||
    msg.content.toLowerCase().includes('what about'));

  return !hasCodeDiscussion && !hasFollowUp;
};
```
Step 2c: Update Your sendMessage Function
Replace your existing sendMessage function with this enhanced version:
```jsx
const sendMessage = async () => {
  if (!input.trim() || isStreaming) return

  const userMessage = { text: input, isUser: true, id: Date.now() }
  setMessages(prev => [...prev, userMessage])

  const currentInput = input
  setInput('')
  setIsStreaming(true)

  const aiMessageId = Date.now() + 1
  const aiMessage = { text: '', isUser: false, id: aiMessageId, isStreaming: true }
  setMessages(prev => [...prev, aiMessage])

  try {
    // Build conversation history from current messages
    const conversationHistory = buildConversationHistory(messages)

    // 🆕 SUMMARY MEMORY ADDITION: Smart summary timing - happens in background
    if (shouldCreateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
      const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
      createSummary(messagesToSummarize); // No await - background process
    } else if (shouldUpdateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
      const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
      createSummary(messagesToSummarize); // No await - background process
    }

    abortControllerRef.current = new AbortController()

    const response = await fetch('http://localhost:8000/api/chat/stream', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        message: currentInput,
        conversationHistory: conversationHistory,
        summary: summary,                  // 🆕 SUMMARY MEMORY ADDITION: Include summary
        recentWindowSize: recentWindowSize // 🆕 SUMMARY MEMORY ADDITION: Include window size
      }),
      signal: abortControllerRef.current.signal,
    })

    if (!response.ok) {
      throw new Error('Failed to get response')
    }

    // Read the stream (unchanged)
    const reader = response.body.getReader()
    const decoder = new TextDecoder()

    while (true) {
      const { done, value } = await reader.read()
      if (done) break

      const chunk = decoder.decode(value, { stream: true })

      setMessages(prev =>
        prev.map(msg =>
          msg.id === aiMessageId
            ? { ...msg, text: msg.text + chunk }
            : msg
        )
      )
    }

    // Mark streaming as complete (unchanged)
    setMessages(prev =>
      prev.map(msg =>
        msg.id === aiMessageId
          ? { ...msg, isStreaming: false }
          : msg
      )
    )

  } catch (error) {
    if (error.name === 'AbortError') {
      console.log('Request was cancelled')
    } else {
      console.error('Streaming error:', error)
      setMessages(prev =>
        prev.map(msg =>
          msg.id === aiMessageId
            ? { ...msg, text: 'Sorry, something went wrong.', isStreaming: false }
            : msg
        )
      )
    }
  } finally {
    setIsStreaming(false)
    abortControllerRef.current = null
  }
}
```
Step 2d: Add Memory Status Helper
Add this helper function for displaying memory statistics:
```jsx
// 🆕 SUMMARY MEMORY ADDITION: Calculate memory statistics
const getMemoryStats = () => {
  const totalMessages = messages.filter(msg => !msg.isStreaming).length
  const recentMessages = Math.min(totalMessages, recentWindowSize)
  const summarizedMessages = Math.max(0, totalMessages - recentWindowSize)

  return { totalMessages, recentMessages, summarizedMessages }
};
```
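Here is the same statistics logic as a standalone function, with the component state replaced by plain arguments so you can verify the arithmetic outside React. The sample history is hypothetical:

```javascript
// Standalone version of the getMemoryStats logic above: the component's
// `messages` and `recentWindowSize` state become plain parameters.
const getMemoryStats = (messages, recentWindowSize = 15) => {
  const totalMessages = messages.filter(msg => !msg.isStreaming).length;
  const recentMessages = Math.min(totalMessages, recentWindowSize);
  const summarizedMessages = Math.max(0, totalMessages - recentWindowSize);
  return { totalMessages, recentMessages, summarizedMessages };
};

// 40 finished messages with a window of 15 → 15 stay recent, 25 get summarized.
const history = Array.from({ length: 40 }, (_, i) => ({
  text: `msg ${i + 1}`,
  isStreaming: false,
}));
console.log(getMemoryStats(history)); // { totalMessages: 40, recentMessages: 15, summarizedMessages: 25 }
```

Messages still streaming are excluded from the totals, which keeps the dashboard numbers stable while a response is being generated.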
```jsx
// 🆕 SUMMARY MEMORY ADDITION: Manual summary trigger
const triggerManualSummary = async () => {
  const conversationHistory = buildConversationHistory(messages);
  const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
  // Only summarize if there are messages older than the recent window
  if (messagesToSummarize.length > 0) {
    await createSummary(messagesToSummarize);
  }
};
```
Step 2e: Enhanced UI with Summary Controls
Replace your existing return statement with this enhanced UI:
```jsx
return (
  <div className="min-h-screen bg-gray-100 flex items-center justify-center p-4">
    <div className="bg-white rounded-lg shadow-lg w-full max-w-2xl h-[600px] flex flex-col">
      {/* 🔄 ENHANCED: Header with summary controls */}
      <div className="bg-blue-500 text-white p-4 rounded-t-lg">
        <div className="flex justify-between items-start">
          <div>
            <h1 className="text-xl font-bold">Smart Summary Memory Chat</h1>
            <p className="text-blue-100 text-sm">Intelligent conversation memory with automatic summarization</p>
          </div>

          <div className="text-right space-y-2">
            <div>
              <label className="block text-xs text-blue-100">Recent: {recentWindowSize}</label>
              <input type="range" min="5" max="30" value={recentWindowSize}
                onChange={(e) => setRecentWindowSize(parseInt(e.target.value))}
                className="w-20" disabled={isStreaming} />
            </div>
            <div>
              <label className="block text-xs text-blue-100">Summary at: {summaryThreshold}</label>
              <input type="range" min="15" max="50" value={summaryThreshold}
                onChange={(e) => setSummaryThreshold(parseInt(e.target.value))}
                className="w-20" disabled={isStreaming} />
            </div>
            <button
              onClick={triggerManualSummary}
              disabled={isCreatingSummary || messages.length < 10}
              className="text-xs bg-white bg-opacity-20 px-2 py-1 rounded disabled:opacity-50"
            >
              Create Summary Now
            </button>
          </div>
        </div>
      </div>

      {/* 🆕 SUMMARY MEMORY ADDITION: Memory status dashboard */}
      <div className="bg-gray-50 px-4 py-3 border-b">
        {(() => {
          const { totalMessages, recentMessages, summarizedMessages } = getMemoryStats();

          return (
            <div className="space-y-2">
              <div className="flex justify-between items-center text-sm">
                <div className="flex space-x-4 text-gray-600">
                  <span>📊 Total: {totalMessages}</span>
                  <span>🔥 Recent: {recentMessages}</span>
                  {summarizedMessages > 0 && (
                    <span>📝 Summarized: {summarizedMessages}</span>
                  )}
                  <span className="text-blue-600">🧠 Type: {conversationType}</span>
                </div>

                <div className="flex items-center space-x-2 text-xs">
                  {summary && (
                    <span className="text-green-600">✅ Summary Active</span>
                  )}
                  {isCreatingSummary && (
                    <span className="text-blue-600">🔄 Creating Summary...</span>
                  )}
                </div>
              </div>

              {/* Memory usage bar */}
              <div className="w-full bg-gray-200 rounded-full h-2">
                <div
                  className="bg-blue-500 h-2 rounded-full transition-all duration-300"
                  style={{ width: `${Math.min(100, (totalMessages / 50) * 100)}%` }}
                />
              </div>
              <div className="text-xs text-gray-500 text-center">
                Memory usage: {totalMessages}/50 messages before optimization
              </div>
            </div>
          );
        })()}
      </div>

      {/* 🆕 SUMMARY MEMORY ADDITION: Active summary display */}
      {summary && (
        <div className="bg-blue-50 border-l-4 border-blue-400 p-3 mx-4 mt-2 rounded">
          <div className="flex items-start">
            <span className="text-blue-600 mr-2">📋</span>
            <div className="flex-1">
              <p className="text-xs font-medium text-blue-800 mb-1">
                Active Summary ({conversationType})
              </p>
              <p className="text-xs text-blue-700 leading-relaxed">
                {summary}
              </p>
            </div>
          </div>
        </div>
      )}

      {/* Messages (unchanged) */}
      <div className="flex-1 overflow-y-auto p-4 space-y-4">
        {messages.length === 0 && (
          <div className="text-center text-gray-500 mt-20">
            <Bot className="w-12 h-12 mx-auto mb-4 text-gray-400" />
            <p>Send a message to see streaming and summary memory in action!</p>
          </div>
        )}

        {messages.map((message) => (
          <div
            key={message.id}
            className={`flex items-start space-x-3 ${
              message.isUser ? 'justify-end' : 'justify-start'
            }`}
          >
            {!message.isUser && (
              <div className="bg-blue-500 p-2 rounded-full">
                <Bot className="w-4 h-4 text-white" />
              </div>
            )}

            <div
              className={`max-w-xs lg:max-w-md px-4 py-2 rounded-lg ${
                message.isUser
                  ? 'bg-blue-500 text-white'
                  : 'bg-gray-200 text-gray-800'
              }`}
            >
              {message.text}
              {message.isStreaming && (
                <span className="inline-block w-2 h-4 bg-blue-500 ml-1 animate-pulse" />
              )}
            </div>

            {message.isUser && (
              <div className="bg-gray-500 p-2 rounded-full">
                <User className="w-4 h-4 text-white" />
              </div>
            )}
          </div>
        ))}
      </div>

      {/* Input (unchanged) */}
      <div className="border-t p-4">
        <div className="flex space-x-2">
          <input
            type="text"
            value={input}
            onChange={(e) => setInput(e.target.value)}
            onKeyPress={handleKeyPress}
            placeholder="Type your message..."
            className="flex-1 border border-gray-300 rounded-lg px-4 py-2 focus:outline-none focus:ring-2 focus:ring-blue-500"
            disabled={isStreaming}
          />
          {isStreaming ? (
            <button
              onClick={stopStreaming}
              className="bg-red-500 hover:bg-red-600 text-white px-4 py-2 rounded-lg transition-colors"
            >
              Stop
            </button>
          ) : (
            <button
              onClick={sendMessage}
              disabled={!input.trim()}
              className="bg-blue-500 hover:bg-blue-600 disabled:bg-gray-300 text-white p-2 rounded-lg transition-colors"
            >
              <Send className="w-5 h-5" />
            </button>
          )}
        </div>
      </div>
    </div>
  </div>
)
```
🧪 Test Your Summary Memory
Step-by-Step Testing Guide
- Start both servers (backend and frontend)
- Open your enhanced streaming chat
- Set summary threshold to 10 for faster testing
- Have this comprehensive test conversation:
```
Messages 1-5: Build context

You: "Hi! My name is Sarah and I'm 25 years old"
AI: "Nice to meet you, Sarah! It's great to know you're 25."

You: "I'm building a React todo app with Firebase"
AI: "That sounds like a great project! React and Firebase work well together."

You: "I'm using TypeScript and want authentication"
AI: "Excellent choice! TypeScript adds great type safety to React projects."

Messages 6-10: Continue building context

You: "I work as a frontend developer in New York"
AI: "That's awesome! New York has a great tech scene."

You: "I love using modern frameworks and tools"
AI: "Modern frameworks definitely make development more efficient."

Messages 11-15: Watch summary creation

You: "What CSS framework should I use?"
AI: "For a React app, you might consider Tailwind CSS or styled-components."

[Watch the memory status - should show summary being created]

You: "What do you remember about me?"
AI: "Based on our conversation, you're Sarah, 25 years old, a frontend developer in New York working on a React todo app with Firebase and TypeScript authentication."

[Should reference information from early messages via summary!]
```
What to Watch For
- Memory indicator shows total vs summarized vs recent messages
- Summary creation happens automatically at threshold
- AI maintains context from early messages even after summarization
- Chat responses stay fast (no waiting for summarization)
- Visual feedback shows when summary is active
Complete Enhanced StreamingChat Component
Here’s your complete StreamingChat.jsx component with all Summary Memory enhancements:
import { useState, useRef } from 'react'import { Send, Bot, User } from 'lucide-react'
function StreamingChat() { const [messages, setMessages] = useState([]) const [input, setInput] = useState('') const [isStreaming, setIsStreaming] = useState(false) const abortControllerRef = useRef(null)
// 🆕 SUMMARY MEMORY ADDITION: Summary-specific state const [summary, setSummary] = useState(null) const [recentWindowSize, setRecentWindowSize] = useState(15) const [summaryThreshold, setSummaryThreshold] = useState(25) const [isCreatingSummary, setIsCreatingSummary] = useState(false) const [conversationType, setConversationType] = useState('general')
// Function to build conversation history (from Simple Memory) const buildConversationHistory = (messages) => { return messages .filter(msg => !msg.isStreaming) .map(msg => ({ role: msg.isUser ? "user" : "assistant", content: msg.text })); };
// 🆕 SUMMARY MEMORY ADDITION: Detect conversation type automatically const detectConversationType = (messages) => { const recentText = messages.slice(-10).map(m => m.text).join(' ').toLowerCase();
if (recentText.includes('function') || recentText.includes('code') || recentText.includes('api')) { return 'technical'; } else if (recentText.includes('create') || recentText.includes('idea') || recentText.includes('design')) { return 'creative'; } else if (recentText.includes('problem') || recentText.includes('error') || recentText.includes('help')) { return 'support'; } return 'general'; };
// 🆕 SUMMARY MEMORY ADDITION: Create summary with intelligent timing const createSummary = async (messagesToSummarize) => { if (isCreatingSummary) return; // Prevent multiple simultaneous summaries
try { setIsCreatingSummary(true);
// Detect conversation type for better summaries const detectedType = detectConversationType(messages);
const response = await fetch('http://localhost:8000/api/summarize', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ messages: messagesToSummarize, conversationType: detectedType }), });
const data = await response.json();
if (data.success) { setSummary(data.summary); setConversationType(data.conversationType);
console.log(`Summary created: ${data.messagesCount} messages summarized as ${data.conversationType}`); } } catch (error) { console.error("Failed to create summary:", error); } finally { setIsCreatingSummary(false); } };
// 🆕 SUMMARY MEMORY ADDITION: Smart summary triggers const shouldCreateSummary = (conversationHistory) => { return conversationHistory.length >= summaryThreshold && !summary; };
const shouldUpdateSummary = (conversationHistory) => { return conversationHistory.length >= summaryThreshold * 2 && summary; };
const isGoodTimeToSummarize = (conversationHistory) => { const recentMessages = conversationHistory.slice(-3);
// Check if we're in middle of complex topic const hasCodeDiscussion = recentMessages.some(msg => msg.content.includes('```') || msg.content.includes('function'));
const hasFollowUp = recentMessages.some(msg => msg.content.toLowerCase().includes('can you explain') || msg.content.toLowerCase().includes('tell me more') || msg.content.toLowerCase().includes('what about'));
return !hasCodeDiscussion && !hasFollowUp; };
// 🆕 SUMMARY MEMORY ADDITION: Calculate memory statistics const getMemoryStats = () => { const totalMessages = messages.filter(msg => !msg.isStreaming).length const recentMessages = Math.min(totalMessages, recentWindowSize) const summarizedMessages = Math.max(0, totalMessages - recentWindowSize)
return { totalMessages, recentMessages, summarizedMessages } };
// 🆕 SUMMARY MEMORY ADDITION: Manual summary trigger const triggerManualSummary = async () => { const conversationHistory = buildConversationHistory(messages); if (conversationHistory.length >= 10) { const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize); await createSummary(messagesToSummarize); } };
const sendMessage = async () => { if (!input.trim() || isStreaming) return
const userMessage = { text: input, isUser: true, id: Date.now() } setMessages(prev => [...prev, userMessage])
const currentInput = input setInput('') setIsStreaming(true)
const aiMessageId = Date.now() + 1 const aiMessage = { text: '', isUser: false, id: aiMessageId, isStreaming: true } setMessages(prev => [...prev, aiMessage])
try { // Build conversation history from current messages const conversationHistory = buildConversationHistory(messages)
// 🆕 SUMMARY MEMORY ADDITION: Smart summary timing - happens in background if (shouldCreateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) { const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize); createSummary(messagesToSummarize); // No await - background process } else if (shouldUpdateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) { const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize); createSummary(messagesToSummarize); // No await - background process }
abortControllerRef.current = new AbortController()
const response = await fetch('http://localhost:8000/api/chat/stream', { method: 'POST', headers: { 'Content-Type': 'application/json', }, body: JSON.stringify({ message: currentInput, conversationHistory: conversationHistory, summary: summary, // 🆕 SUMMARY MEMORY ADDITION: Include summary recentWindowSize: recentWindowSize // 🆕 SUMMARY MEMORY ADDITION: Include window size }), signal: abortControllerRef.current.signal, })
if (!response.ok) { throw new Error('Failed to get response') }
// Read the stream (unchanged) const reader = response.body.getReader() const decoder = new TextDecoder()
while (true) { const { done, value } = await reader.read() if (done) break
const chunk = decoder.decode(value, { stream: true })
setMessages(prev => prev.map(msg => msg.id === aiMessageId ? { ...msg, text: msg.text + chunk } : msg ) ) }
// Mark streaming as complete (unchanged) setMessages(prev => prev.map(msg => msg.id === aiMessageId ? { ...msg, isStreaming: false } : msg ) )
} catch (error) { if (error.name === 'AbortError') { console.log('Request was cancelled') } else { console.error('Streaming error:', error) setMessages(prev => prev.map(msg => msg.id === aiMessageId ? { ...msg, text: 'Sorry, something went wrong.', isStreaming: false } : msg ) ) } } finally { setIsStreaming(false) abortControllerRef.current = null } }
  const handleKeyPress = (e) => {
    if (e.key === 'Enter' && !e.shiftKey && !isStreaming) {
      e.preventDefault()
      sendMessage()
    }
  }

  const stopStreaming = () => {
    if (abortControllerRef.current) {
      abortControllerRef.current.abort()
    }
  }
  return (
    <div className="min-h-screen bg-gray-100 flex items-center justify-center p-4">
      <div className="bg-white rounded-lg shadow-lg w-full max-w-2xl h-[600px] flex flex-col">
        {/* 🔄 ENHANCED: Header with summary controls */}
        <div className="bg-blue-500 text-white p-4 rounded-t-lg">
          <div className="flex justify-between items-start">
            <div>
              <h1 className="text-xl font-bold">Smart Summary Memory Chat</h1>
              <p className="text-blue-100 text-sm">Intelligent conversation memory with automatic summarization</p>
            </div>

            <div className="text-right space-y-2">
              <div>
                <label className="block text-xs text-blue-100">Recent: {recentWindowSize}</label>
                <input
                  type="range"
                  min="5"
                  max="30"
                  value={recentWindowSize}
                  onChange={(e) => setRecentWindowSize(parseInt(e.target.value))}
                  className="w-20"
                  disabled={isStreaming}
                />
              </div>
              <div>
                <label className="block text-xs text-blue-100">Summary at: {summaryThreshold}</label>
                <input
                  type="range"
                  min="15"
                  max="50"
                  value={summaryThreshold}
                  onChange={(e) => setSummaryThreshold(parseInt(e.target.value))}
                  className="w-20"
                  disabled={isStreaming}
                />
              </div>
              <button
                onClick={triggerManualSummary}
                disabled={isCreatingSummary || messages.length < 10}
                className="text-xs bg-white bg-opacity-20 px-2 py-1 rounded disabled:opacity-50"
              >
                Create Summary Now
              </button>
            </div>
          </div>
        </div>

        {/* 🆕 SUMMARY MEMORY ADDITION: Memory status dashboard */}
        <div className="bg-gray-50 px-4 py-3 border-b">
          {(() => {
            const { totalMessages, recentMessages, summarizedMessages } = getMemoryStats();

            return (
              <div className="space-y-2">
                <div className="flex justify-between items-center text-sm">
                  <div className="flex space-x-4 text-gray-600">
                    <span>📊 Total: {totalMessages}</span>
                    <span>🔥 Recent: {recentMessages}</span>
                    {summarizedMessages > 0 && (
                      <span>📝 Summarized: {summarizedMessages}</span>
                    )}
                    <span className="text-blue-600">🧠 Type: {conversationType}</span>
                  </div>

                  <div className="flex items-center space-x-2 text-xs">
                    {summary && (
                      <span className="text-green-600">✅ Summary Active</span>
                    )}
                    {isCreatingSummary && (
                      <span className="text-blue-600">🔄 Creating Summary...</span>
                    )}
                  </div>
                </div>

                {/* Memory usage bar */}
                <div className="w-full bg-gray-200 rounded-full h-2">
                  <div
                    className="bg-blue-500 h-2 rounded-full transition-all duration-300"
                    style={{ width: `${Math.min(100, (totalMessages / 50) * 100)}%` }}
                  />
                </div>
                <div className="text-xs text-gray-500 text-center">
                  Memory usage: {totalMessages}/50 messages before optimization
                </div>
              </div>
            );
          })()}
        </div>

        {/* 🆕 SUMMARY MEMORY ADDITION: Active summary display */}
        {summary && (
          <div className="bg-blue-50 border-l-4 border-blue-400 p-3 mx-4 mt-2 rounded">
            <div className="flex items-start">
              <span className="text-blue-600 mr-2">📋</span>
              <div className="flex-1">
                <p className="text-xs font-medium text-blue-800 mb-1">
                  Active Summary ({conversationType})
                </p>
                <p className="text-xs text-blue-700 leading-relaxed">
                  {summary}
                </p>
              </div>
            </div>
          </div>
        )}

        {/* Messages (unchanged) */}
        <div className="flex-1 overflow-y-auto p-4 space-y-4">
          {messages.length === 0 && (
            <div className="text-center text-gray-500 mt-20">
              <Bot className="w-12 h-12 mx-auto mb-4 text-gray-400" />
              <p>Send a message to see streaming and summary memory in action!</p>
            </div>
          )}

          {messages.map((message) => (
            <div
              key={message.id}
              className={`flex items-start space-x-3 ${
                message.isUser ? 'justify-end' : 'justify-start'
              }`}
            >
              {!message.isUser && (
                <div className="bg-blue-500 p-2 rounded-full">
                  <Bot className="w-4 h-4 text-white" />
                </div>
              )}

              <div
                className={`max-w-xs lg:max-w-md px-4 py-2 rounded-lg ${
                  message.isUser
                    ? 'bg-blue-500 text-white'
                    : 'bg-gray-200 text-gray-800'
                }`}
              >
                {message.text}
                {message.isStreaming && (
                  <span className="inline-block w-2 h-4 bg-blue-500 ml-1 animate-pulse" />
                )}
              </div>

              {message.isUser && (
                <div className="bg-gray-500 p-2 rounded-full">
                  <User className="w-4 h-4 text-white" />
                </div>
              )}
            </div>
          ))}
        </div>

        {/* Input (unchanged) */}
        <div className="border-t p-4">
          <div className="flex space-x-2">
            <input
              type="text"
              value={input}
              onChange={(e) => setInput(e.target.value)}
              onKeyPress={handleKeyPress}
              placeholder="Type your message..."
              className="flex-1 border border-gray-300 rounded-lg px-4 py-2 focus:outline-none focus:ring-2 focus:ring-blue-500"
              disabled={isStreaming}
            />
            {isStreaming ? (
              <button
                onClick={stopStreaming}
                className="bg-red-500 hover:bg-red-600 text-white px-4 py-2 rounded-lg transition-colors"
              >
                Stop
              </button>
            ) : (
              <button
                onClick={sendMessage}
                disabled={!input.trim()}
                className="bg-blue-500 hover:bg-blue-600 disabled:bg-gray-300 text-white p-2 rounded-lg transition-colors"
              >
                <Send className="w-5 h-5" />
              </button>
            )}
          </div>
        </div>
      </div>
    </div>
  )
}
export default StreamingChat
Summary of All Frontend Changes
🆕 New State Variables (Lines 9-13):
- `summary` - Current conversation summary
- `recentWindowSize` - How many recent messages to keep
- `summaryThreshold` - When to trigger summary creation
- `isCreatingSummary` - Summary creation status
- `conversationType` - Detected conversation type
🆕 New Functions (Lines 24-89):
- `detectConversationType()` - Auto-detects conversation type
- `createSummary()` - Creates summaries via API call
- `shouldCreateSummary()` / `shouldUpdateSummary()` - Smart triggers
- `isGoodTimeToSummarize()` - Timing intelligence
- `getMemoryStats()` - Memory usage calculations
- `triggerManualSummary()` - Manual summary creation
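The trigger helpers listed above are defined earlier in the guide. As a refresher, their logic can be sketched roughly like this. This is a minimal, self-contained sketch: the threshold values and the explicit `summary` parameter are assumptions here, since the real component reads `summary`, `summaryThreshold`, and `recentWindowSize` from React state.

```javascript
// Sketch of the summary-trigger helpers (assumed shapes, not the guide's
// exact code). Messages use the component's { text, isUser, id } shape.
const DEFAULTS = { summaryThreshold: 25, recentWindowSize: 15 }

// Create a first summary once the history outgrows the threshold.
function shouldCreateSummary(history, summary, opts = DEFAULTS) {
  return !summary && history.length >= opts.summaryThreshold
}

// Refresh an existing summary after enough new messages pile up on top of it.
function shouldUpdateSummary(history, summary, opts = DEFAULTS) {
  return Boolean(summary) &&
    history.length >= opts.summaryThreshold + opts.recentWindowSize
}

// Avoid summarizing mid-exchange: wait until the last message is a completed
// assistant reply, i.e. a natural break in the conversation.
function isGoodTimeToSummarize(history) {
  const last = history[history.length - 1]
  return Boolean(last) && !last.isUser
}
```

Passing `summary` in explicitly keeps the sketch testable; in the component, closures over state serve the same purpose.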
🔄 Enhanced sendMessage (Lines 91-163):
- Background summary creation
- Includes summary and window size in requests
- Non-blocking summarization
🔄 Enhanced UI (Lines 174-350):
- Summary controls in header
- Memory status dashboard
- Active summary display
- Visual indicators for memory usage
🔍 How Summary Memory Works
Data Flow Visualization
Frontend State → Summary Creation → Enhanced Context
messages = [
  { text: "I'm Sarah, 25", isUser: true, id: 1 },      // Gets summarized
  { text: "Nice to meet you!", isUser: false, id: 2 },  // Gets summarized
  { text: "I work in NYC", isUser: true, id: 3 },       // Gets summarized
  // ... 20 more messages ...
  { text: "What frameworks?", isUser: true, id: 23 },   // Recent (kept)
  { text: "React is great!", isUser: false, id: 24 },   // Recent (kept)
  { text: "Tell me about CSS", isUser: true, id: 25 }   // Current message
]
↓ Summary Creation (messages 1-10) ↓
summary = "User Sarah (25) is a frontend dev in NYC building a React todo app with Firebase auth and TypeScript"
recentMessages = [messages 11-24] // Recent window (kept intact)
currentMessage = "Tell me about CSS"
↓ Enhanced Context ↓
contextualMessage = `Previous conversation summary:
User Sarah (25) is a frontend dev in NYC building a React todo app with Firebase auth and TypeScript

Recent conversation:
User: What frameworks should I use?
Assistant: React is great for modern apps...

Current question: Tell me about CSS`
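On the server, stitching these three parts together might look like the sketch below. The function name `buildContextualMessage` and the exact prompt wording are assumptions; any equivalent assembly of summary + recent window + current question works.

```javascript
// Hypothetical helper that assembles summary + recent window + current
// question into one prompt string, mirroring the flow shown above.
function buildContextualMessage(summary, recentMessages, currentMessage) {
  const recentText = recentMessages
    .map(m => `${m.isUser ? 'User' : 'Assistant'}: ${m.text}`)
    .join('\n')
  return [
    `Previous conversation summary:\n${summary}`,
    `Recent conversation:\n${recentText}`,
    `Current question: ${currentMessage}`,
  ].join('\n\n')
}
```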
Memory Optimization Process
// Before Summary (25 messages):
Request = [msg1, msg2, msg3, ..., msg25] = ~2,500 tokens

// After Summary (25+ messages):
Request = summary + [msg16, msg17, ..., msg25] = ~800 tokens
// ~70% cost reduction while maintaining full context!
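To see where the headline savings figure comes from, here is a back-of-the-envelope cost model. The 100-tokens-per-message average, the 300-token summary size, and the threshold/window values are illustrative assumptions, not measurements:

```javascript
// Illustrative cost model; real numbers depend on your content and tokenizer.
const TOKENS_PER_MESSAGE = 100
const SUMMARY_TOKENS = 300

// Simple memory re-sends the entire history on every request.
function simpleRequestTokens(messageCount) {
  return messageCount * TOKENS_PER_MESSAGE
}

// Summary memory sends the full history until the threshold is hit, then a
// fixed-size summary plus the recent window.
function summaryRequestTokens(messageCount, { threshold = 25, window = 15 } = {}) {
  if (messageCount < threshold) return simpleRequestTokens(messageCount)
  return SUMMARY_TOKENS + window * TOKENS_PER_MESSAGE
}

// Total tokens spent across a whole conversation of N messages.
function cumulativeTokens(requestFn, totalMessages) {
  let total = 0
  for (let n = 1; n <= totalMessages; n++) total += requestFn(n)
  return total
}

const simpleTotal = cumulativeTokens(simpleRequestTokens, 100)   // 505,000
const summaryTotal = cumulativeTokens(n => summaryRequestTokens(n), 100) // 166,800
const savings = Math.round((1 - summaryTotal / simpleTotal) * 100) // 67
```

With these assumptions, a 100-message conversation lands at roughly two-thirds savings, consistent with the ~70% figure quoted in this guide.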
🧪 Advanced Testing Scenarios
Test Scenario 1: Context Retention
1. Tell AI your name and age (messages 1-2)
2. Have 20+ messages about different topics
3. Ask "What do you remember about me?"
4. AI should remember details from message 1 via summary
Test Scenario 2: Summary Timing
1. Start complex coding discussion
2. Notice summary waits for natural break
3. Change topics completely
4. Summary triggers automatically
Test Scenario 3: Cost Optimization
1. Watch memory usage bar grow
2. See summarization reduce effective memory
3. Compare with Simple Memory costs
4. Verify 70%+ savings in long conversations
✅ What You’ve Built
Your intelligent Summary Memory system now provides:
Smart Memory Management
- ✅ Context retention - Never loses important conversation details
- ✅ Cost optimization - Up to 70% savings on long conversations
- ✅ Intelligent timing - Summarizes at natural conversation breaks
- ✅ Type detection - Different summary styles for different conversation types
Production-Ready Features
- ✅ Background processing - Chat responses stay instant
- ✅ Visual feedback - Real-time memory usage indicators
- ✅ User controls - Adjustable thresholds and manual triggers
- ✅ Error handling - Graceful degradation when summarization fails
Advanced Capabilities
- ✅ Unlimited conversations - Scales to any conversation length
- ✅ Smart fallbacks - Works with Simple Memory when no summary exists
- ✅ Multiple conversation types - Technical, creative, support, general
- ✅ Real-time optimization - Automatic memory management
This is production-ready memory management that combines the best of Simple Memory (context retention) and Sliding Window (cost control) without the downsides of either approach! 🧠✨