
📝 Summary Memory Implementation

Sliding window memory works well, but it has a problem: it completely forgets older context. Summary memory solves this by keeping recent messages intact while summarizing older parts of the conversation. This maintains context while controlling token costs.

Building on: This guide assumes you’ve completed the Simple Memory Implementation. We’ll enhance that code to add intelligent summarization.


Simple Memory (sends everything):
Message 10: [1,2,3,4,5,6,7,8,9,10] → 1,000 tokens
Message 25: [1,2,3...23,24,25] → 2,500 tokens
Message 50: [1,2,3...48,49,50] → 5,000 tokens
Message 100: [1,2,3...98,99,100] → 10,000 tokens
Total Cost: ~175,000 tokens 💸💸💸
Summary Memory (summarize old + keep recent):
Message 25: Create summary of messages 1-10 + keep messages 11-25
Message 50: Update summary (1-35) + keep messages 36-50
Message 100: Update summary (1-85) + keep messages 86-100
Total Cost: ~50,000 tokens 💰 (70% savings!)
// Simple Memory: Everything grows
conversationHistory = [msg1, msg2, msg3, ..., msg100] // All 100 messages
// Summary Memory: Smart optimization
summary = "User discussed React app setup, chose Firebase auth, implemented user login..."
recentMessages = [msg86, msg87, ..., msg100] // Last 15 messages
// Send: summary + recent messages (much more efficient!)
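To see where per-request numbers like these come from, here's a rough back-of-envelope sketch. It assumes ~100 tokens per message and a ~200-token summary; both are illustrative round numbers rather than measured values:

// Rough per-request token estimate (illustrative assumptions, not measured values)
const TOKENS_PER_MESSAGE = 100; // assumed average message size
const SUMMARY_TOKENS = 200;     // assumed summary size
const WINDOW = 15;              // recent messages kept in detail

const requestTokens = (messageNumber, summaryActive) =>
  summaryActive
    ? SUMMARY_TOKENS + Math.min(messageNumber, WINDOW) * TOKENS_PER_MESSAGE
    : messageNumber * TOKENS_PER_MESSAGE; // simple memory resends everything

console.log(requestTokens(100, false)); // 10000, matching "Message 100 → 10,000 tokens" above
console.log(requestTokens(100, true));  // 1700: summary + the 15 most recent messages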

Why Summary Memory is Better Than Alternatives


Compared to Sliding Window Memory (which you might consider):

  • Summary Memory: Keeps context from old messages via summary
  • Sliding Window: Completely forgets old messages
  • Summary Memory: AI remembers your name from message 1 even at message 100
  • Sliding Window: AI forgets your name after window size is exceeded

Best of Both Worlds:

// Summary Memory = Context Retention + Cost Control
summary + recentMessages = Full context + Manageable cost
// Simple Memory = Context but Expensive
allMessages = Full context + Growing cost
// Sliding Window = Cheap but Forgets
recentMessages = Limited context + Fixed cost
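To make the trade-off concrete, here's a minimal sketch of how each strategy could assemble the context for a request. The function names are illustrative, not from this guide's actual code:

// Three context-building strategies side by side (illustrative sketch)
const simpleMemory = (history) => history; // everything, every time

const slidingWindow = (history, windowSize = 15) =>
  history.slice(-windowSize); // cheap, but older messages are gone for good

const summaryMemory = (history, summary, windowSize = 15) =>
  summary
    ? [{ role: "system", content: `Summary so far: ${summary}` }, ...history.slice(-windowSize)]
    : history; // no summary yet, so fall back to simple memory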

🤔 Why We Need a Separate Summary Endpoint


You might think: “Why not just summarize during the chat response?” Here’s why that’s a bad idea:

Bad approach (summarizing inline):

app.post("/api/chat/stream", async (req, res) => {
// 1. User sends message
// 2. Check if we need summary
// 3. Create summary (takes 3-5 seconds) ❌
// 4. Then respond to user (another 3-5 seconds) ❌
// Total: 6-10 seconds of waiting!
})

What users experience:

You: "How do I deploy this app?"
[Wait... wait... wait... 8 seconds later...]
AI: "For deployment, you can use..."

Good approach (separate endpoints):

// Summarization happens separately and strategically
app.post("/api/summarize", async (req, res) => {
  // Just creates summary and returns it
})

// Chat responses stay fast
app.post("/api/chat/stream", async (req, res) => {
  // Uses existing summary + responds immediately
})

What users experience:

You: "How do I deploy this app?"
AI: "For deployment, you can use..." [instant response]
[Summary created quietly in background when needed]

We don’t summarize randomly. We use strategic triggers based on conversation length and natural breaks.

Trigger Strategy 1: Message Count Thresholds

const shouldCreateSummary = (conversationHistory) => {
  return conversationHistory.length >= summaryThreshold && !summary;
};

const shouldUpdateSummary = (conversationHistory) => {
  return conversationHistory.length >= summaryThreshold * 2 && summary;
};

Timeline example (simulated in the sketch after this list):

  • Messages 1-24: Simple memory (send all messages)
  • Message 25: Create first summary (summarize messages 1-10, keep 11-25 detailed)
  • Messages 26-49: Use summary + recent messages
  • Message 50: Update summary (summarize messages 1-35, keep 36-50 detailed)
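A quick way to sanity-check this timeline is to simulate the triggers over a growing conversation. This is a sketch assuming the defaults used later in this guide (threshold 25, recent window 15):

// Simulate when the summary triggers fire (sketch; threshold 25, window 15)
const summaryThreshold = 25;
const recentWindowSize = 15;
let summary = null;

for (let length = 1; length <= 50; length++) {
  if (length >= summaryThreshold && !summary) {
    console.log(`Message ${length}: create summary of messages 1-${length - recentWindowSize}`);
    summary = "(summary text)";
  } else if (length >= summaryThreshold * 2 && summary) {
    console.log(`Message ${length}: update summary to cover messages 1-${length - recentWindowSize}`);
  }
}
// → Message 25: create summary of messages 1-10
// → Message 50: update summary to cover messages 1-35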

Trigger Strategy 2: Natural Conversation Breaks

const isGoodTimeToSummarize = (conversationHistory) => {
  const recentMessages = conversationHistory.slice(-5);

  // Don't summarize during complex topics
  const hasCodeBlocks = recentMessages.some(msg =>
    msg.content.includes('```') || msg.content.includes('function'));
  const hasFollowUps = recentMessages.some(msg =>
    msg.content.toLowerCase().includes('can you explain') ||
    msg.content.toLowerCase().includes('tell me more'));

  // Wait for natural break if in middle of complex topic
  if (hasCodeBlocks || hasFollowUps) {
    return false; // Wait for better timing
  }
  return true; // Good time to summarize
};
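For example, the check would defer summarization mid-discussion but allow it after a natural break. The sample messages below are made up purely for illustration:

// Hypothetical histories to exercise the timing check
const midDiscussion = [
  { role: "user", content: "Can you explain how this function works?" },
  { role: "assistant", content: "Sure, the function loops over each item..." },
];
const afterTopicChange = [
  { role: "user", content: "Thanks, that all makes sense now." },
  { role: "assistant", content: "Glad it clicked! Anything else?" },
];

console.log(isGoodTimeToSummarize(midDiscussion));    // false - "can you explain" signals a follow-up
console.log(isGoodTimeToSummarize(afterTopicChange)); // true - natural break, safe to summarize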

🛠️ Step 1: Build the Summarization Endpoint


Let’s enhance your backend by adding a dedicated summary endpoint. We’ll build on your existing streaming implementation.

Your current Simple Memory backend accepts conversation history and sends it all to OpenAI. We’ll add:

  1. New summarization endpoint for creating summaries
  2. Enhanced chat endpoint that uses summaries + recent messages
  3. Smart context building that combines both efficiently

Add this new endpoint to your backend index.js, right after your existing chat endpoints:

// 🆕 SUMMARY MEMORY ADDITION: Dedicated summarization endpoint
app.post("/api/summarize", async (req, res) => {
try {
const { messages, conversationType = 'general' } = req.body;
if (!messages || messages.length === 0) {
return res.status(400).json({ error: "Messages are required" });
}
// Summary instructions for different conversation types
const summaryInstructions = {
technical: "Create a technical summary focusing on technologies discussed, decisions made, code examples covered, and implementation details. Preserve specific technical context.",
creative: "Summarize the creative process including ideas generated, concepts explored, and creative directions chosen. Maintain the creative flow context.",
support: "Summarize the support conversation including the user's issue, troubleshooting steps attempted, solutions provided, and current status.",
general: "Create a conversational summary capturing key topics, decisions, and important context for continuing the discussion naturally."
};
const instruction = summaryInstructions[conversationType] || summaryInstructions.general;
// Build context-aware message for the AI
let contextualMessage = `Please summarize this conversation:\n\n${messages.map(msg => `${msg.role}: ${msg.content}`).join('\n\n')}`;
// Add summarization instructions
contextualMessage = `You are a conversation summarizer. ${instruction} Keep it concise but comprehensive enough to maintain conversation continuity.\n\n${contextualMessage}`;
console.log(`Creating summary for ${messages.length} messages`);
// Create streaming response using Response API
const response = await openai.responses.create({
model: "gpt-4o-mini",
input: contextualMessage,
});
// Return results
res.json({
summary: response.output_text,
messagesCount: messages.length,
conversationType: conversationType,
success: true,
});
} catch (error) {
console.error("Summarization Error:", error);
res.status(500).json({
error: "Failed to create summary",
success: false,
});
}
});
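Before wiring up the frontend, you can smoke-test the endpoint on its own. A sketch, assuming the backend runs on port 8000 as elsewhere in this guide:

// Quick manual test of POST /api/summarize (run from any Node script or browser console)
const res = await fetch("http://localhost:8000/api/summarize", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    conversationType: "technical",
    messages: [
      { role: "user", content: "I'm building a React todo app with Firebase" },
      { role: "assistant", content: "Great combination! Firebase handles auth and storage well." },
    ],
  }),
});
const data = await res.json();
console.log(data.summary, data.messagesCount); // condensed summary + how many messages it covers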

Step 1b: Enhance Your Chat Endpoint for Summary Support


Update your existing /api/chat/stream endpoint to handle summaries:

// 🔄 ENHANCED: Updated streaming endpoint with summary support
app.post("/api/chat/stream", async (req, res) => {
try {
const {
message,
conversationHistory = [],
summary = null,
recentWindowSize = 15
} = req.body;
if (!message) {
return res.status(400).json({ error: "Message is required" });
}
// Set headers for streaming
res.writeHead(200, {
'Content-Type': 'text/plain',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
});
// 🆕 SUMMARY MEMORY ADDITION: Build smart context with summary
let contextualMessage = message;
// If we have a summary, use it + recent messages for context
if (summary && conversationHistory.length > 0) {
const recentMessages = conversationHistory.slice(-recentWindowSize);
const recentContext = recentMessages
.map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
.join('\n');
contextualMessage = `Previous conversation summary:\n${summary}\n\nRecent conversation:\n${recentContext}\n\nCurrent question: ${message}`;
}
// If no summary but we have conversation history, use all of it (Simple Memory fallback)
else if (conversationHistory.length > 0) {
const context = conversationHistory
.map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
.join('\n');
contextualMessage = `Previous conversation:\n${context}\n\nCurrent question: ${message}`;
}
// Create streaming response using Response API
const stream = await openai.responses.create({
model: "gpt-4o-mini",
input: contextualMessage,
stream: true,
});
// Stream each chunk to the frontend - Handle Response API events
for await (const event of stream) {
switch (event.type) {
case "response.output_text.delta":
if (event.delta) {
let textChunk = typeof event.delta === "string"
? event.delta
: event.delta.text || "";
if (textChunk) {
res.write(textChunk);
res.flush?.();
}
}
break;
case "text_delta":
if (event.text) {
res.write(event.text);
res.flush?.();
}
break;
case "response.created":
case "response.completed":
case "response.output_item.added":
case "response.content_part.added":
case "response.content_part.done":
case "response.output_item.done":
case "response.output_text.done":
// Keep connection alive, no content to write
break;
case "error":
console.error("Stream error:", event.error);
res.write("\n[Error during generation]");
break;
}
}
// Close the stream
res.end();
} catch (error) {
console.error("OpenAI Streaming Error:", error);
// Handle error properly for streaming
if (res.headersSent) {
res.write("\n[Error occurred]");
res.end();
} else {
res.status(500).json({
error: "Failed to stream AI response",
success: false,
});
}
}
});
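To illustrate what the enhanced endpoint receives once a summary exists, here's a sample request body. The values are invented, but this is the shape the Step 2 frontend will send:

// Example request body once a summary is active (illustrative values)
const body = {
  message: "What CSS framework should I use?",
  conversationHistory: [ /* full history; the server slices out the recent window */ ],
  summary: "User Sarah is building a React todo app with Firebase auth and TypeScript...",
  recentWindowSize: 15, // only the last 15 messages reach the model in detail
};
// The server builds: summary + last 15 messages + current question,
// instead of replaying the entire conversation.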

🆕 New /api/summarize endpoint:

  • What it does: Creates intelligent summaries of conversation history
  • Why it’s separate: Keeps chat responses fast while summarization happens in background
  • How it works: Tailors the summary prompt to the conversation type passed in the request (the frontend detects the type and sends it along)

🔄 Enhanced /api/chat/stream endpoint:

  • Added parameters: summary, recentWindowSize
  • Smart context building: Uses summary + recent messages when available
  • Fallback support: Still works with Simple Memory if no summary exists

🔄 Step 2: Enhance Your Frontend with Summary Logic


Now let’s enhance your Simple Memory frontend to add intelligent summarization. We’ll build on your existing StreamingChat component.

Your current Simple Memory frontend builds and sends all conversation history. We’ll enhance it to:

  1. Add summary state management for tracking summaries
  2. Create smart summarization logic with intelligent timing
  3. Send summaries + recent messages instead of all messages
  4. Provide visual feedback about memory optimization

Update your component state to include summary-related functionality:

function StreamingChat() {
  const [messages, setMessages] = useState([])
  const [input, setInput] = useState('')
  const [isStreaming, setIsStreaming] = useState(false)
  const abortControllerRef = useRef(null)

  // 🆕 SUMMARY MEMORY ADDITION: Summary-specific state
  const [summary, setSummary] = useState(null)
  const [recentWindowSize, setRecentWindowSize] = useState(15)
  const [summaryThreshold, setSummaryThreshold] = useState(25)
  const [isCreatingSummary, setIsCreatingSummary] = useState(false)
  const [conversationType, setConversationType] = useState('general')

What each new state does:

  • summary - Stores the current conversation summary text
  • recentWindowSize - How many recent messages to keep in detail (default 15)
  • summaryThreshold - When to create first summary (default 25 messages)
  • isCreatingSummary - Shows when summarization is happening
  • conversationType - Tracks detected conversation type

Add these functions right after your existing buildConversationHistory function:

// 🆕 SUMMARY MEMORY ADDITION: Detect conversation type automatically
const detectConversationType = (messages) => {
  const recentText = messages.slice(-10).map(m => m.text).join(' ').toLowerCase();

  if (recentText.includes('function') || recentText.includes('code') || recentText.includes('api')) {
    return 'technical';
  } else if (recentText.includes('create') || recentText.includes('idea') || recentText.includes('design')) {
    return 'creative';
  } else if (recentText.includes('problem') || recentText.includes('error') || recentText.includes('help')) {
    return 'support';
  }
  return 'general';
};
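A few made-up inputs show how the keyword checks play out (messages use the frontend's { text } shape):

// Hypothetical inputs for the type detector
console.log(detectConversationType([{ text: "My API call returns a 401" }]));         // 'technical' (matches "api")
console.log(detectConversationType([{ text: "Give me an idea for a logo design" }])); // 'creative' (matches "idea")
console.log(detectConversationType([{ text: "There's a problem with my login" }]));   // 'support' (matches "problem")
console.log(detectConversationType([{ text: "What's the weather like today?" }]));    // 'general' (no keyword hit)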
// 🆕 SUMMARY MEMORY ADDITION: Create summary with intelligent timing
const createSummary = async (messagesToSummarize) => {
  if (isCreatingSummary) return; // Prevent multiple simultaneous summaries

  try {
    setIsCreatingSummary(true);

    // Detect conversation type for better summaries
    const detectedType = detectConversationType(messages);

    const response = await fetch('http://localhost:8000/api/summarize', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        messages: messagesToSummarize,
        conversationType: detectedType
      }),
    });

    const data = await response.json();
    if (data.success) {
      setSummary(data.summary);
      setConversationType(data.conversationType);
      console.log(`Summary created: ${data.messagesCount} messages summarized as ${data.conversationType}`);
    }
  } catch (error) {
    console.error("Failed to create summary:", error);
  } finally {
    setIsCreatingSummary(false);
  }
};

// 🆕 SUMMARY MEMORY ADDITION: Smart summary triggers
const shouldCreateSummary = (conversationHistory) => {
  return conversationHistory.length >= summaryThreshold && !summary;
};

const shouldUpdateSummary = (conversationHistory) => {
  return conversationHistory.length >= summaryThreshold * 2 && summary;
};

const isGoodTimeToSummarize = (conversationHistory) => {
  const recentMessages = conversationHistory.slice(-3);

  // Check if we're in the middle of a complex topic
  const hasCodeDiscussion = recentMessages.some(msg =>
    msg.content.includes('```') || msg.content.includes('function'));
  const hasFollowUp = recentMessages.some(msg =>
    msg.content.toLowerCase().includes('can you explain') ||
    msg.content.toLowerCase().includes('tell me more') ||
    msg.content.toLowerCase().includes('what about'));

  return !hasCodeDiscussion && !hasFollowUp;
};

Replace your existing sendMessage function with this enhanced version:

const sendMessage = async () => {
  if (!input.trim() || isStreaming) return

  const userMessage = { text: input, isUser: true, id: Date.now() }
  setMessages(prev => [...prev, userMessage])

  const currentInput = input
  setInput('')
  setIsStreaming(true)

  const aiMessageId = Date.now() + 1
  const aiMessage = { text: '', isUser: false, id: aiMessageId, isStreaming: true }
  setMessages(prev => [...prev, aiMessage])

  try {
    // Build conversation history from current messages
    const conversationHistory = buildConversationHistory(messages)

    // 🆕 SUMMARY MEMORY ADDITION: Smart summary timing - happens in background
    if (shouldCreateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
      const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
      createSummary(messagesToSummarize); // No await - background process
    } else if (shouldUpdateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
      const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
      createSummary(messagesToSummarize); // No await - background process
    }

    abortControllerRef.current = new AbortController()

    const response = await fetch('http://localhost:8000/api/chat/stream', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        message: currentInput,
        conversationHistory: conversationHistory,
        summary: summary, // 🆕 SUMMARY MEMORY ADDITION: Include summary
        recentWindowSize: recentWindowSize // 🆕 SUMMARY MEMORY ADDITION: Include window size
      }),
      signal: abortControllerRef.current.signal,
    })

    if (!response.ok) {
      throw new Error('Failed to get response')
    }

    // Read the stream (unchanged)
    const reader = response.body.getReader()
    const decoder = new TextDecoder()

    while (true) {
      const { done, value } = await reader.read()
      if (done) break

      const chunk = decoder.decode(value, { stream: true })
      setMessages(prev =>
        prev.map(msg =>
          msg.id === aiMessageId
            ? { ...msg, text: msg.text + chunk }
            : msg
        )
      )
    }

    // Mark streaming as complete (unchanged)
    setMessages(prev =>
      prev.map(msg =>
        msg.id === aiMessageId
          ? { ...msg, isStreaming: false }
          : msg
      )
    )
  } catch (error) {
    if (error.name === 'AbortError') {
      console.log('Request was cancelled')
    } else {
      console.error('Streaming error:', error)
      setMessages(prev =>
        prev.map(msg =>
          msg.id === aiMessageId
            ? { ...msg, text: 'Sorry, something went wrong.', isStreaming: false }
            : msg
        )
      )
    }
  } finally {
    setIsStreaming(false)
    abortControllerRef.current = null
  }
}

Add this helper function for displaying memory statistics:

// 🆕 SUMMARY MEMORY ADDITION: Calculate memory statistics
const getMemoryStats = () => {
  const totalMessages = messages.filter(msg => !msg.isStreaming).length
  const recentMessages = Math.min(totalMessages, recentWindowSize)
  const summarizedMessages = Math.max(0, totalMessages - recentWindowSize)
  return { totalMessages, recentMessages, summarizedMessages }
};

// 🆕 SUMMARY MEMORY ADDITION: Manual summary trigger
const triggerManualSummary = async () => {
  const conversationHistory = buildConversationHistory(messages);
  // Need more messages than the recent window, or slice(0, -recentWindowSize) is empty
  if (conversationHistory.length > recentWindowSize) {
    const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
    await createSummary(messagesToSummarize);
  }
};
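For instance, with 40 completed messages and the default window of 15, the stats would come out as:

// Hypothetical reading with 40 completed messages and recentWindowSize = 15
getMemoryStats();
// → { totalMessages: 40, recentMessages: 15, summarizedMessages: 25 }
// i.e. 25 older messages live in the summary, 15 stay verbatim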

Step 2e: Enhanced UI with Summary Controls


Replace your existing return statement with this enhanced UI:

return (
  <div className="min-h-screen bg-gray-100 flex items-center justify-center p-4">
    <div className="bg-white rounded-lg shadow-lg w-full max-w-2xl h-[600px] flex flex-col">
      {/* 🔄 ENHANCED: Header with summary controls */}
      <div className="bg-blue-500 text-white p-4 rounded-t-lg">
        <div className="flex justify-between items-start">
          <div>
            <h1 className="text-xl font-bold">Smart Summary Memory Chat</h1>
            <p className="text-blue-100 text-sm">Intelligent conversation memory with automatic summarization</p>
          </div>
          <div className="text-right space-y-2">
            <div>
              <label className="block text-xs text-blue-100">Recent: {recentWindowSize}</label>
              <input
                type="range" min="5" max="30" value={recentWindowSize}
                onChange={(e) => setRecentWindowSize(parseInt(e.target.value))}
                className="w-20" disabled={isStreaming}
              />
            </div>
            <div>
              <label className="block text-xs text-blue-100">Summary at: {summaryThreshold}</label>
              <input
                type="range" min="15" max="50" value={summaryThreshold}
                onChange={(e) => setSummaryThreshold(parseInt(e.target.value))}
                className="w-20" disabled={isStreaming}
              />
            </div>
            <button
              onClick={triggerManualSummary}
              disabled={isCreatingSummary || messages.length < 10}
              className="text-xs bg-white bg-opacity-20 px-2 py-1 rounded disabled:opacity-50"
            >
              Create Summary Now
            </button>
          </div>
        </div>
      </div>
      {/* 🆕 SUMMARY MEMORY ADDITION: Memory status dashboard */}
      <div className="bg-gray-50 px-4 py-3 border-b">
        {(() => {
          const { totalMessages, recentMessages, summarizedMessages } = getMemoryStats();
          return (
            <div className="space-y-2">
              <div className="flex justify-between items-center text-sm">
                <div className="flex space-x-4 text-gray-600">
                  <span>📊 Total: {totalMessages}</span>
                  <span>🔥 Recent: {recentMessages}</span>
                  {summarizedMessages > 0 && (
                    <span>📝 Summarized: {summarizedMessages}</span>
                  )}
                  <span className="text-blue-600">🧠 Type: {conversationType}</span>
                </div>
                <div className="flex items-center space-x-2 text-xs">
                  {summary && (
                    <span className="text-green-600">✅ Summary Active</span>
                  )}
                  {isCreatingSummary && (
                    <span className="text-blue-600">🔄 Creating Summary...</span>
                  )}
                </div>
              </div>
              {/* Memory usage bar */}
              <div className="w-full bg-gray-200 rounded-full h-2">
                <div
                  className="bg-blue-500 h-2 rounded-full transition-all duration-300"
                  style={{
                    width: `${Math.min(100, (totalMessages / 50) * 100)}%`
                  }}
                />
              </div>
              <div className="text-xs text-gray-500 text-center">
                Memory usage: {totalMessages}/50 messages before optimization
              </div>
            </div>
          );
        })()}
      </div>
      {/* 🆕 SUMMARY MEMORY ADDITION: Active summary display */}
      {summary && (
        <div className="bg-blue-50 border-l-4 border-blue-400 p-3 mx-4 mt-2 rounded">
          <div className="flex items-start">
            <span className="text-blue-600 mr-2">📋</span>
            <div className="flex-1">
              <p className="text-xs font-medium text-blue-800 mb-1">
                Active Summary ({conversationType})
              </p>
              <p className="text-xs text-blue-700 leading-relaxed">
                {summary}
              </p>
            </div>
          </div>
        </div>
      )}
      {/* Messages (unchanged) */}
      <div className="flex-1 overflow-y-auto p-4 space-y-4">
        {messages.length === 0 && (
          <div className="text-center text-gray-500 mt-20">
            <Bot className="w-12 h-12 mx-auto mb-4 text-gray-400" />
            <p>Send a message to see streaming and summary memory in action!</p>
          </div>
        )}
        {messages.map((message) => (
          <div
            key={message.id}
            className={`flex items-start space-x-3 ${
              message.isUser ? 'justify-end' : 'justify-start'
            }`}
          >
            {!message.isUser && (
              <div className="bg-blue-500 p-2 rounded-full">
                <Bot className="w-4 h-4 text-white" />
              </div>
            )}
            <div
              className={`max-w-xs lg:max-w-md px-4 py-2 rounded-lg ${
                message.isUser
                  ? 'bg-blue-500 text-white'
                  : 'bg-gray-200 text-gray-800'
              }`}
            >
              {message.text}
              {message.isStreaming && (
                <span className="inline-block w-2 h-4 bg-blue-500 ml-1 animate-pulse" />
              )}
            </div>
            {message.isUser && (
              <div className="bg-gray-500 p-2 rounded-full">
                <User className="w-4 h-4 text-white" />
              </div>
            )}
          </div>
        ))}
      </div>
      {/* Input (unchanged) */}
      <div className="border-t p-4">
        <div className="flex space-x-2">
          <input
            type="text"
            value={input}
            onChange={(e) => setInput(e.target.value)}
            onKeyPress={handleKeyPress}
            placeholder="Type your message..."
            className="flex-1 border border-gray-300 rounded-lg px-4 py-2 focus:outline-none focus:ring-2 focus:ring-blue-500"
            disabled={isStreaming}
          />
          {isStreaming ? (
            <button
              onClick={stopStreaming}
              className="bg-red-500 hover:bg-red-600 text-white px-4 py-2 rounded-lg transition-colors"
            >
              Stop
            </button>
          ) : (
            <button
              onClick={sendMessage}
              disabled={!input.trim()}
              className="bg-blue-500 hover:bg-blue-600 disabled:bg-gray-300 text-white p-2 rounded-lg transition-colors"
            >
              <Send className="w-5 h-5" />
            </button>
          )}
        </div>
      </div>
    </div>
  </div>
)

  1. Start both servers (backend and frontend)
  2. Open your enhanced streaming chat
  3. Drag the summary threshold slider to its minimum (15) for faster testing
  4. Have this comprehensive test conversation:
Messages 1-5: Build context
You: "Hi! My name is Sarah and I'm 25 years old"
AI: "Nice to meet you, Sarah! It's great to know you're 25."
You: "I'm building a React todo app with Firebase"
AI: "That sounds like a great project! React and Firebase work well together."
You: "I'm using TypeScript and want authentication"
AI: "Excellent choice! TypeScript adds great type safety to React projects."
Messages 6-10: Continue building context
You: "I work as a frontend developer in New York"
AI: "That's awesome! New York has a great tech scene."
You: "I love using modern frameworks and tools"
AI: "Modern frameworks definitely make development more efficient."
Messages 11-15: Watch summary creation
You: "What CSS framework should I use?"
AI: "For a React app, you might consider Tailwind CSS or styled-components."
[Watch the memory status - should show summary being created]
You: "What do you remember about me?"
AI: "Based on our conversation, you're Sarah, 25 years old, a frontend developer in New York working on a React todo app with Firebase and TypeScript authentication."
[Should reference information from early messages via summary!]
What to verify:

  • Memory indicator shows total vs summarized vs recent messages
  • Summary creation happens automatically at threshold
  • AI maintains context from early messages even after summarization
  • Chat responses stay fast (no waiting for summarization)
  • Visual feedback shows when summary is active

Here’s your complete StreamingChat.jsx component with all Summary Memory enhancements:

import { useState, useRef } from 'react'
import { Send, Bot, User } from 'lucide-react'
function StreamingChat() {
  const [messages, setMessages] = useState([])
  const [input, setInput] = useState('')
  const [isStreaming, setIsStreaming] = useState(false)
  const abortControllerRef = useRef(null)

  // 🆕 SUMMARY MEMORY ADDITION: Summary-specific state
  const [summary, setSummary] = useState(null)
  const [recentWindowSize, setRecentWindowSize] = useState(15)
  const [summaryThreshold, setSummaryThreshold] = useState(25)
  const [isCreatingSummary, setIsCreatingSummary] = useState(false)
  const [conversationType, setConversationType] = useState('general')

  // Function to build conversation history (from Simple Memory)
  const buildConversationHistory = (messages) => {
    return messages
      .filter(msg => !msg.isStreaming)
      .map(msg => ({
        role: msg.isUser ? "user" : "assistant",
        content: msg.text
      }));
  };
  // 🆕 SUMMARY MEMORY ADDITION: Detect conversation type automatically
  const detectConversationType = (messages) => {
    const recentText = messages.slice(-10).map(m => m.text).join(' ').toLowerCase();
    if (recentText.includes('function') || recentText.includes('code') || recentText.includes('api')) {
      return 'technical';
    } else if (recentText.includes('create') || recentText.includes('idea') || recentText.includes('design')) {
      return 'creative';
    } else if (recentText.includes('problem') || recentText.includes('error') || recentText.includes('help')) {
      return 'support';
    }
    return 'general';
  };

  // 🆕 SUMMARY MEMORY ADDITION: Create summary with intelligent timing
  const createSummary = async (messagesToSummarize) => {
    if (isCreatingSummary) return; // Prevent multiple simultaneous summaries
    try {
      setIsCreatingSummary(true);
      // Detect conversation type for better summaries
      const detectedType = detectConversationType(messages);
      const response = await fetch('http://localhost:8000/api/summarize', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          messages: messagesToSummarize,
          conversationType: detectedType
        }),
      });
      const data = await response.json();
      if (data.success) {
        setSummary(data.summary);
        setConversationType(data.conversationType);
        console.log(`Summary created: ${data.messagesCount} messages summarized as ${data.conversationType}`);
      }
    } catch (error) {
      console.error("Failed to create summary:", error);
    } finally {
      setIsCreatingSummary(false);
    }
  };

  // 🆕 SUMMARY MEMORY ADDITION: Smart summary triggers
  const shouldCreateSummary = (conversationHistory) => {
    return conversationHistory.length >= summaryThreshold && !summary;
  };

  const shouldUpdateSummary = (conversationHistory) => {
    return conversationHistory.length >= summaryThreshold * 2 && summary;
  };

  const isGoodTimeToSummarize = (conversationHistory) => {
    const recentMessages = conversationHistory.slice(-3);
    // Check if we're in the middle of a complex topic
    const hasCodeDiscussion = recentMessages.some(msg =>
      msg.content.includes('```') || msg.content.includes('function'));
    const hasFollowUp = recentMessages.some(msg =>
      msg.content.toLowerCase().includes('can you explain') ||
      msg.content.toLowerCase().includes('tell me more') ||
      msg.content.toLowerCase().includes('what about'));
    return !hasCodeDiscussion && !hasFollowUp;
  };

  // 🆕 SUMMARY MEMORY ADDITION: Calculate memory statistics
  const getMemoryStats = () => {
    const totalMessages = messages.filter(msg => !msg.isStreaming).length
    const recentMessages = Math.min(totalMessages, recentWindowSize)
    const summarizedMessages = Math.max(0, totalMessages - recentWindowSize)
    return { totalMessages, recentMessages, summarizedMessages }
  };

  // 🆕 SUMMARY MEMORY ADDITION: Manual summary trigger
  const triggerManualSummary = async () => {
    const conversationHistory = buildConversationHistory(messages);
    // Need more messages than the recent window, or slice(0, -recentWindowSize) is empty
    if (conversationHistory.length > recentWindowSize) {
      const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
      await createSummary(messagesToSummarize);
    }
  };
  const sendMessage = async () => {
    if (!input.trim() || isStreaming) return

    const userMessage = { text: input, isUser: true, id: Date.now() }
    setMessages(prev => [...prev, userMessage])

    const currentInput = input
    setInput('')
    setIsStreaming(true)

    const aiMessageId = Date.now() + 1
    const aiMessage = { text: '', isUser: false, id: aiMessageId, isStreaming: true }
    setMessages(prev => [...prev, aiMessage])

    try {
      // Build conversation history from current messages
      const conversationHistory = buildConversationHistory(messages)

      // 🆕 SUMMARY MEMORY ADDITION: Smart summary timing - happens in background
      if (shouldCreateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
        const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
        createSummary(messagesToSummarize); // No await - background process
      } else if (shouldUpdateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
        const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
        createSummary(messagesToSummarize); // No await - background process
      }

      abortControllerRef.current = new AbortController()

      const response = await fetch('http://localhost:8000/api/chat/stream', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          message: currentInput,
          conversationHistory: conversationHistory,
          summary: summary, // 🆕 SUMMARY MEMORY ADDITION: Include summary
          recentWindowSize: recentWindowSize // 🆕 SUMMARY MEMORY ADDITION: Include window size
        }),
        signal: abortControllerRef.current.signal,
      })

      if (!response.ok) {
        throw new Error('Failed to get response')
      }

      // Read the stream (unchanged)
      const reader = response.body.getReader()
      const decoder = new TextDecoder()

      while (true) {
        const { done, value } = await reader.read()
        if (done) break

        const chunk = decoder.decode(value, { stream: true })
        setMessages(prev =>
          prev.map(msg =>
            msg.id === aiMessageId
              ? { ...msg, text: msg.text + chunk }
              : msg
          )
        )
      }

      // Mark streaming as complete (unchanged)
      setMessages(prev =>
        prev.map(msg =>
          msg.id === aiMessageId
            ? { ...msg, isStreaming: false }
            : msg
        )
      )
    } catch (error) {
      if (error.name === 'AbortError') {
        console.log('Request was cancelled')
      } else {
        console.error('Streaming error:', error)
        setMessages(prev =>
          prev.map(msg =>
            msg.id === aiMessageId
              ? { ...msg, text: 'Sorry, something went wrong.', isStreaming: false }
              : msg
          )
        )
      }
    } finally {
      setIsStreaming(false)
      abortControllerRef.current = null
    }
  }

  const handleKeyPress = (e) => {
    if (e.key === 'Enter' && !e.shiftKey && !isStreaming) {
      e.preventDefault()
      sendMessage()
    }
  }

  const stopStreaming = () => {
    if (abortControllerRef.current) {
      abortControllerRef.current.abort()
    }
  }
  return (
    <div className="min-h-screen bg-gray-100 flex items-center justify-center p-4">
      <div className="bg-white rounded-lg shadow-lg w-full max-w-2xl h-[600px] flex flex-col">
        {/* 🔄 ENHANCED: Header with summary controls */}
        <div className="bg-blue-500 text-white p-4 rounded-t-lg">
          <div className="flex justify-between items-start">
            <div>
              <h1 className="text-xl font-bold">Smart Summary Memory Chat</h1>
              <p className="text-blue-100 text-sm">Intelligent conversation memory with automatic summarization</p>
            </div>
            <div className="text-right space-y-2">
              <div>
                <label className="block text-xs text-blue-100">Recent: {recentWindowSize}</label>
                <input
                  type="range" min="5" max="30" value={recentWindowSize}
                  onChange={(e) => setRecentWindowSize(parseInt(e.target.value))}
                  className="w-20" disabled={isStreaming}
                />
              </div>
              <div>
                <label className="block text-xs text-blue-100">Summary at: {summaryThreshold}</label>
                <input
                  type="range" min="15" max="50" value={summaryThreshold}
                  onChange={(e) => setSummaryThreshold(parseInt(e.target.value))}
                  className="w-20" disabled={isStreaming}
                />
              </div>
              <button
                onClick={triggerManualSummary}
                disabled={isCreatingSummary || messages.length < 10}
                className="text-xs bg-white bg-opacity-20 px-2 py-1 rounded disabled:opacity-50"
              >
                Create Summary Now
              </button>
            </div>
          </div>
        </div>
        {/* 🆕 SUMMARY MEMORY ADDITION: Memory status dashboard */}
        <div className="bg-gray-50 px-4 py-3 border-b">
          {(() => {
            const { totalMessages, recentMessages, summarizedMessages } = getMemoryStats();
            return (
              <div className="space-y-2">
                <div className="flex justify-between items-center text-sm">
                  <div className="flex space-x-4 text-gray-600">
                    <span>📊 Total: {totalMessages}</span>
                    <span>🔥 Recent: {recentMessages}</span>
                    {summarizedMessages > 0 && (
                      <span>📝 Summarized: {summarizedMessages}</span>
                    )}
                    <span className="text-blue-600">🧠 Type: {conversationType}</span>
                  </div>
                  <div className="flex items-center space-x-2 text-xs">
                    {summary && (
                      <span className="text-green-600">✅ Summary Active</span>
                    )}
                    {isCreatingSummary && (
                      <span className="text-blue-600">🔄 Creating Summary...</span>
                    )}
                  </div>
                </div>
                {/* Memory usage bar */}
                <div className="w-full bg-gray-200 rounded-full h-2">
                  <div
                    className="bg-blue-500 h-2 rounded-full transition-all duration-300"
                    style={{
                      width: `${Math.min(100, (totalMessages / 50) * 100)}%`
                    }}
                  />
                </div>
                <div className="text-xs text-gray-500 text-center">
                  Memory usage: {totalMessages}/50 messages before optimization
                </div>
              </div>
            );
          })()}
        </div>
        {/* 🆕 SUMMARY MEMORY ADDITION: Active summary display */}
        {summary && (
          <div className="bg-blue-50 border-l-4 border-blue-400 p-3 mx-4 mt-2 rounded">
            <div className="flex items-start">
              <span className="text-blue-600 mr-2">📋</span>
              <div className="flex-1">
                <p className="text-xs font-medium text-blue-800 mb-1">
                  Active Summary ({conversationType})
                </p>
                <p className="text-xs text-blue-700 leading-relaxed">
                  {summary}
                </p>
              </div>
            </div>
          </div>
        )}
        {/* Messages (unchanged) */}
        <div className="flex-1 overflow-y-auto p-4 space-y-4">
          {messages.length === 0 && (
            <div className="text-center text-gray-500 mt-20">
              <Bot className="w-12 h-12 mx-auto mb-4 text-gray-400" />
              <p>Send a message to see streaming and summary memory in action!</p>
            </div>
          )}
          {messages.map((message) => (
            <div
              key={message.id}
              className={`flex items-start space-x-3 ${
                message.isUser ? 'justify-end' : 'justify-start'
              }`}
            >
              {!message.isUser && (
                <div className="bg-blue-500 p-2 rounded-full">
                  <Bot className="w-4 h-4 text-white" />
                </div>
              )}
              <div
                className={`max-w-xs lg:max-w-md px-4 py-2 rounded-lg ${
                  message.isUser
                    ? 'bg-blue-500 text-white'
                    : 'bg-gray-200 text-gray-800'
                }`}
              >
                {message.text}
                {message.isStreaming && (
                  <span className="inline-block w-2 h-4 bg-blue-500 ml-1 animate-pulse" />
                )}
              </div>
              {message.isUser && (
                <div className="bg-gray-500 p-2 rounded-full">
                  <User className="w-4 h-4 text-white" />
                </div>
              )}
            </div>
          ))}
        </div>
        {/* Input (unchanged) */}
        <div className="border-t p-4">
          <div className="flex space-x-2">
            <input
              type="text"
              value={input}
              onChange={(e) => setInput(e.target.value)}
              onKeyPress={handleKeyPress}
              placeholder="Type your message..."
              className="flex-1 border border-gray-300 rounded-lg px-4 py-2 focus:outline-none focus:ring-2 focus:ring-blue-500"
              disabled={isStreaming}
            />
            {isStreaming ? (
              <button
                onClick={stopStreaming}
                className="bg-red-500 hover:bg-red-600 text-white px-4 py-2 rounded-lg transition-colors"
              >
                Stop
              </button>
            ) : (
              <button
                onClick={sendMessage}
                disabled={!input.trim()}
                className="bg-blue-500 hover:bg-blue-600 disabled:bg-gray-300 text-white p-2 rounded-lg transition-colors"
              >
                <Send className="w-5 h-5" />
              </button>
            )}
          </div>
        </div>
      </div>
    </div>
  )
}
export default StreamingChat

🆕 New State Variables:

  • summary - Current conversation summary
  • recentWindowSize - How many recent messages to keep
  • summaryThreshold - When to trigger summary creation
  • isCreatingSummary - Summary creation status
  • conversationType - Detected conversation type

🆕 New Functions:

  • detectConversationType() - Auto-detects conversation type
  • createSummary() - Creates summaries via API call
  • shouldCreateSummary() / shouldUpdateSummary() - Smart triggers
  • isGoodTimeToSummarize() - Timing intelligence
  • getMemoryStats() - Memory usage calculations
  • triggerManualSummary() - Manual summary creation

🔄 Enhanced sendMessage:

  • Background summary creation
  • Includes summary and window size in requests
  • Non-blocking summarization

🔄 Enhanced UI:

  • Summary controls in header
  • Memory status dashboard
  • Active summary display
  • Visual indicators for memory usage

Frontend State → Summary Creation → Enhanced Context

messages = [
  { text: "I'm Sarah, 25", isUser: true, id: 1 },      // Gets summarized
  { text: "Nice to meet you!", isUser: false, id: 2 }, // Gets summarized
  { text: "I work in NYC", isUser: true, id: 3 },      // Gets summarized
  // ... 20 more messages ...
  { text: "What frameworks?", isUser: true, id: 23 },  // Recent (kept)
  { text: "React is great!", isUser: false, id: 24 },  // Recent (kept)
  { text: "Tell me about CSS", isUser: true, id: 25 }  // Current message
]

↓ Summary Creation (messages 1-10) ↓

summary = "User Sarah (25) is a frontend dev in NYC building a React todo app with Firebase auth and TypeScript"
recentMessages = [messages 11-24] // recent messages kept in full detail
currentMessage = "Tell me about CSS"

↓ Enhanced Context ↓

contextualMessage = `
Previous conversation summary:
User Sarah (25) is a frontend dev in NYC building a React todo app with Firebase auth and TypeScript

Recent conversation:
User: What frameworks should I use?
Assistant: React is great for modern apps...

Current question: Tell me about CSS
`

// Before Summary (25 messages):
Request = [msg1, msg2, msg3, ..., msg25] = ~2,500 tokens

// After Summary (25+ messages, recent window of 10 in this illustration):
Request = summary + [msg16, msg17, ..., msg25] = ~800 tokens

// ~70% cost reduction while maintaining full context!

Test 1: Context retention

  1. Tell the AI your name and age (messages 1-2)
  2. Have 20+ messages about different topics
  3. Ask "What do you remember about me?"
  4. The AI should remember details from message 1 via the summary

Test 2: Natural-break timing

  1. Start a complex coding discussion
  2. Notice the summary waits for a natural break
  3. Change topics completely
  4. The summary triggers automatically

Test 3: Cost monitoring

  1. Watch the memory usage bar grow
  2. See summarization reduce effective memory
  3. Compare with Simple Memory costs
  4. Verify 70%+ savings in long conversations

Your intelligent Summary Memory system now provides:

  • Context retention - Never loses important conversation details
  • Cost optimization - Up to 70% savings on long conversations
  • Intelligent timing - Summarizes at natural conversation breaks
  • Type detection - Different summary styles for different conversation types
  • Background processing - Chat responses stay instant
  • Visual feedback - Real-time memory usage indicators
  • User controls - Adjustable thresholds and manual triggers
  • Error handling - Graceful degradation when summarization fails
  • Unlimited conversations - Scales to any conversation length
  • Smart fallbacks - Works with Simple Memory when no summary exists
  • Multiple conversation types - Technical, creative, support, general
  • Real-time optimization - Automatic memory management

This is production-ready memory management that combines the best of Simple Memory (context retention) and Sliding Window (cost control) without the downsides of either approach! 🧠✨