🪟 Sliding Window Memory
Simple memory works great for short conversations, but what happens when users have long chats? Costs skyrocket and you might hit token limits. Sliding window memory solves this by keeping only the most recent messages while maintaining context.
Building on: This guide assumes you’ve completed the Simple Memory Implementation. We’ll enhance that code to add sliding window functionality.
🎯 The Problem with Simple Memory
Cost Growth Visualization
Simple Memory (grows forever):

```
Message 1:  [1]                → Send 1 message
Message 5:  [1,2,3,4,5]        → Send 5 messages
Message 20: [1,2,3...18,19,20] → Send 20 messages
Message 50: [1,2,3...48,49,50] → Send 50 messages (expensive!)
```

Token Cost: 100 → 500 → 2,000 → 5,000 tokens

Because every request resends the entire history, the cumulative token cost grows quadratically with conversation length.
Real Example: 20-Message Conversation
```js
// Without sliding window - sending everything:
conversationHistory = [
  { role: "user", content: "Hi, I'm Sarah" },     // Message 1
  { role: "assistant", content: "Hello Sarah!" }, // Message 2
  { role: "user", content: "I like pizza" },      // Message 3
  // ... 16 more messages ...
  { role: "user", content: "What's my name?" },   // Message 20
]
// Sends 19 messages as context = expensive!
```
🪟 How Sliding Window Memory Works
Sliding Window Visualization
Sliding Window (stays constant):

```
Window Size = 6 messages

Message 1:  [1]                 → Send 1 message
Message 5:  [1,2,3,4,5]         → Send 5 messages
Message 10: [5,6,7,8,9,10]      → Send last 6 messages (forgot 1-4)
Message 20: [15,16,17,18,19,20] → Send last 6 messages (forgot 1-14)
```

Token Cost: 100 → 500 → 600 → 600 tokens (constant!)
Real Example: Same 20-Message Conversation
```js
// With sliding window (size 6) - only recent messages:
conversationHistory = [
  // Forgot messages 1-14
  { role: "assistant", content: "Great choice!" },     // Message 15
  { role: "user", content: "I love coding" },          // Message 16
  { role: "assistant", content: "What languages?" },   // Message 17
  { role: "user", content: "JavaScript mainly" },      // Message 18
  { role: "assistant", content: "Excellent choice!" }, // Message 19
  { role: "user", content: "What's my name?" },        // Message 20
]
// Only sends the last 6 messages as context = much cheaper!
// But forgot your name (from message 1)
```
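The whole mechanism boils down to a single array operation. Here's a minimal sketch of what the backend will do with the history (the window size and placeholder messages are illustrative):

```js
const windowSize = 6;

// 19 earlier messages, oldest first (illustrative placeholders)
const conversationHistory = Array.from({ length: 19 }, (_, i) => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: `Message ${i + 1}`,
}));

// slice(-n) keeps the last n elements; with fewer than n, it keeps everything
const recentHistory = conversationHistory.slice(-windowSize);

console.log(recentHistory.length);     // 6
console.log(recentHistory[0].content); // "Message 14"
```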
⚙️ Step 1: Enhanced Backend with Sliding Window
Let’s enhance your working memory backend to implement a sliding window. We’ll build on your existing code.
Understanding the Backend Enhancement
Your current memory backend accepts all conversation history. We’ll enhance it to:
- Accept window size parameter from frontend
- Slice conversation history to only recent messages
- Add smart context about memory limitations
- Provide feedback about what’s being remembered
Updated Backend Code with Highlighted Changes
```js
// Enhanced streaming endpoint with sliding window memory
app.post("/api/chat/stream", async (req, res) => {
  try {
    // 🆕 SLIDING WINDOW ADDITION: Accept windowSize parameter
    const { message, conversationHistory = [], windowSize = 10 } = req.body;

    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    // Set headers for streaming (unchanged)
    res.writeHead(200, {
      'Content-Type': 'text/plain',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    });

    // 🆕 SLIDING WINDOW ADDITION: Apply sliding window to conversation history
    const recentHistory = conversationHistory.slice(-windowSize);

    // 🆕 SLIDING WINDOW ADDITION: Calculate memory stats for logging
    const totalMessages = conversationHistory.length;
    const rememberedMessages = recentHistory.length;
    const forgottenMessages = Math.max(0, totalMessages - windowSize);

    // Build context-aware message for the AI (enhanced)
    let contextualMessage = message;

    if (recentHistory.length > 0) {
      const context = recentHistory
        .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
        .join('\n');

      // 🔄 ENHANCED: Add context about memory limitations
      let memoryNote = "";
      if (forgottenMessages > 0) {
        memoryNote = `\n\nNote: This conversation started ${totalMessages} messages ago, but I only remember the last ${rememberedMessages} messages. If you reference something from earlier, I may need clarification.`;
      }

      contextualMessage = `Recent conversation:\n${context}${memoryNote}\n\nCurrent question: ${message}`;
    }

    // 🆕 SLIDING WINDOW ADDITION: Log memory usage for debugging
    console.log(`Sliding Window: Using ${rememberedMessages}/${totalMessages} messages (forgot ${forgottenMessages})`);

    // Create streaming response using the Responses API (unchanged)
    const stream = await openai.responses.create({
      model: "gpt-4o-mini",
      input: contextualMessage,
      stream: true,
    });

    // Stream each chunk to the frontend - handle Responses API events (unchanged)
    for await (const event of stream) {
      switch (event.type) {
        case "response.output_text.delta":
          if (event.delta) {
            let textChunk = typeof event.delta === "string"
              ? event.delta
              : event.delta.text || "";

            if (textChunk) {
              res.write(textChunk);
              res.flush?.();
            }
          }
          break;

        case "text_delta":
          if (event.text) {
            res.write(event.text);
            res.flush?.();
          }
          break;

        case "response.created":
        case "response.completed":
        case "response.output_item.added":
        case "response.content_part.added":
        case "response.content_part.done":
        case "response.output_item.done":
        case "response.output_text.done":
          break;

        case "error":
          console.error("Stream error:", event.error);
          res.write("\n[Error during generation]");
          break;
      }
    }

    // Close the stream (unchanged)
    res.end();

  } catch (error) {
    console.error("OpenAI Streaming Error:", error);

    if (res.headersSent) {
      res.write("\n[Error occurred]");
      res.end();
    } else {
      res.status(500).json({
        error: "Failed to stream AI response",
        success: false,
      });
    }
  }
});
```
Summary of Backend Changes
🆕 windowSize = 10 in the request destructuring
- What it does: Accepts a window size parameter (defaults to 10 messages)
- Why: Allows the frontend to control how much memory to use

🆕 const recentHistory = conversationHistory.slice(-windowSize)
- What it does: Keeps only the last N messages from the conversation history
- Why: This is the core of the sliding window: it limits memory to recent context

🆕 Memory statistics calculation (totalMessages, rememberedMessages, forgottenMessages)
- What it does: Tracks how many messages are remembered vs. forgotten
- Why: Useful for logging and debugging memory behavior

🔄 Enhanced context building (memoryNote)
- What changed: Adds a memory limitation note when messages are forgotten
- Why: Helps the AI understand it might be missing context

🆕 Memory usage logging (console.log)
- What it does: Logs memory statistics to the console
- Why: Helps developers understand sliding window behavior
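Before touching the frontend, you can sanity-check the endpoint directly. Here's a minimal sketch using Node 18+'s built-in fetch (save it as an .mjs file; it assumes your backend is running on http://localhost:8000, as elsewhere in this guide):

```js
// test-window.mjs - quick manual test for the sliding window endpoint
const conversationHistory = Array.from({ length: 20 }, (_, i) => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: `Message ${i + 1}`,
}));

const response = await fetch("http://localhost:8000/api/chat/stream", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    message: "What did I say first?",
    conversationHistory,
    windowSize: 6, // only the last 6 messages reach the model
  }),
});

// Consume the plain-text stream the same way the React frontend does
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value, { stream: true }));
}
```

If the backend is working, its console should log: Sliding Window: Using 6/20 messages (forgot 14).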
🔄 Step 2: Enhanced Frontend with Window Controls
Now let’s enhance your working memory frontend to add sliding window controls.
Understanding the Frontend Enhancement
Your current memory frontend builds and sends all conversation history. We’ll enhance it to:
- Add window size control for users to adjust memory
- Send window size to backend with each request
- Display memory statistics showing what’s being remembered
- Provide visual feedback about memory usage
Step 2a: Add Window Size State
Add this new state to your component, right after your existing state:
```jsx
function StreamingChat() {
  const [messages, setMessages] = useState([])
  const [input, setInput] = useState('')
  const [isStreaming, setIsStreaming] = useState(false)
  const abortControllerRef = useRef(null)

  // 🆕 SLIDING WINDOW ADDITION: Window size control
  const [windowSize, setWindowSize] = useState(10)
```
Step 2b: Add Memory Statistics Helper
Add this helper function right after your buildConversationHistory function:
```jsx
// 🆕 SLIDING WINDOW ADDITION: Function to calculate memory statistics
const getMemoryStats = () => {
  const totalMessages = messages.filter(msg => !msg.isStreaming).length
  const rememberedMessages = Math.min(totalMessages, windowSize)
  const forgottenMessages = Math.max(0, totalMessages - windowSize)

  return { totalMessages, rememberedMessages, forgottenMessages }
};
```
What this function does:
- Calculates total: All completed messages in the conversation
- Calculates remembered: How many messages fit in the current window
- Calculates forgotten: How many messages have been forgotten
- Returns stats: Object with all three numbers for display
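For instance, with 14 completed messages and the default window of 10, the helper returns (illustrative numbers):

```js
// 14 completed messages, windowSize = 10 (illustrative)
getMemoryStats()
// → { totalMessages: 14, rememberedMessages: 10, forgottenMessages: 4 }
```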
Step 2c: Enhanced sendMessage Function
Update your sendMessage function to include the window size:
```jsx
const sendMessage = async () => {
  if (!input.trim() || isStreaming) return

  const userMessage = { text: input, isUser: true, id: Date.now() }
  setMessages(prev => [...prev, userMessage])

  const currentInput = input
  setInput('')
  setIsStreaming(true)

  const aiMessageId = Date.now() + 1
  const aiMessage = { text: '', isUser: false, id: aiMessageId, isStreaming: true }
  setMessages(prev => [...prev, aiMessage])

  try {
    const conversationHistory = buildConversationHistory(messages)

    abortControllerRef.current = new AbortController()

    const response = await fetch('http://localhost:8000/api/chat/stream', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        message: currentInput,
        conversationHistory: conversationHistory,
        windowSize: windowSize // 🆕 SLIDING WINDOW ADDITION: Include window size
      }),
      signal: abortControllerRef.current.signal,
    })

    // ... rest of your existing sendMessage code (unchanged) ...
  } catch (error) {
    // ... your existing error handling (unchanged) ...
  } finally {
    setIsStreaming(false)
    abortControllerRef.current = null
  }
}
```
Key change: the request body now includes windowSize: windowSize, which sends the current window size to the backend.
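With this change, the backend receives a request body shaped like this (illustrative values):

```js
// Illustrative request body after this change
const requestBody = {
  message: "What's my name?",
  conversationHistory: [
    { role: "user", content: "Hi, I'm Sarah" },
    { role: "assistant", content: "Hello Sarah!" },
    // ...every completed message; the backend slices this down to windowSize
  ],
  windowSize: 10,
};
```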
Step 2d: Complete Enhanced Component
Here’s your complete component with sliding window additions highlighted:
```jsx
import { useState, useRef } from 'react'
import { Send, Bot, User } from 'lucide-react'

function StreamingChat() {
  const [messages, setMessages] = useState([])
  const [input, setInput] = useState('')
  const [isStreaming, setIsStreaming] = useState(false)
  const abortControllerRef = useRef(null)

  // 🆕 SLIDING WINDOW ADDITION: Window size control
  const [windowSize, setWindowSize] = useState(10)

  // Function to build conversation history (unchanged from simple memory)
  const buildConversationHistory = (messages) => {
    return messages
      .filter(msg => !msg.isStreaming)
      .map(msg => ({
        role: msg.isUser ? "user" : "assistant",
        content: msg.text
      }));
  };

  // 🆕 SLIDING WINDOW ADDITION: Function to calculate memory statistics
  const getMemoryStats = () => {
    const totalMessages = messages.filter(msg => !msg.isStreaming).length
    const rememberedMessages = Math.min(totalMessages, windowSize)
    const forgottenMessages = Math.max(0, totalMessages - windowSize)

    return { totalMessages, rememberedMessages, forgottenMessages }
  };

  const sendMessage = async () => {
    if (!input.trim() || isStreaming) return

    const userMessage = { text: input, isUser: true, id: Date.now() }
    setMessages(prev => [...prev, userMessage])

    const currentInput = input
    setInput('')
    setIsStreaming(true)

    const aiMessageId = Date.now() + 1
    const aiMessage = { text: '', isUser: false, id: aiMessageId, isStreaming: true }
    setMessages(prev => [...prev, aiMessage])

    try {
      const conversationHistory = buildConversationHistory(messages)

      abortControllerRef.current = new AbortController()

      const response = await fetch('http://localhost:8000/api/chat/stream', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          message: currentInput,
          conversationHistory: conversationHistory,
          windowSize: windowSize // 🆕 SLIDING WINDOW ADDITION: Include window size
        }),
        signal: abortControllerRef.current.signal,
      })

      if (!response.ok) {
        throw new Error('Failed to get response')
      }

      const reader = response.body.getReader()
      const decoder = new TextDecoder()

      while (true) {
        const { done, value } = await reader.read()
        if (done) break

        const chunk = decoder.decode(value, { stream: true })

        setMessages(prev =>
          prev.map(msg =>
            msg.id === aiMessageId
              ? { ...msg, text: msg.text + chunk }
              : msg
          )
        )
      }

      setMessages(prev =>
        prev.map(msg =>
          msg.id === aiMessageId
            ? { ...msg, isStreaming: false }
            : msg
        )
      )

    } catch (error) {
      if (error.name === 'AbortError') {
        console.log('Request was cancelled')
      } else {
        console.error('Streaming error:', error)
        setMessages(prev =>
          prev.map(msg =>
            msg.id === aiMessageId
              ? { ...msg, text: 'Sorry, something went wrong.', isStreaming: false }
              : msg
          )
        )
      }
    } finally {
      setIsStreaming(false)
      abortControllerRef.current = null
    }
  }

  const handleKeyPress = (e) => {
    if (e.key === 'Enter' && !e.shiftKey && !isStreaming) {
      e.preventDefault()
      sendMessage()
    }
  }

  const stopStreaming = () => {
    if (abortControllerRef.current) {
      abortControllerRef.current.abort()
    }
  }

  return (
    <div className="min-h-screen bg-gray-100 flex items-center justify-center p-4">
      <div className="bg-white rounded-lg shadow-lg w-full max-w-2xl h-[600px] flex flex-col">
        {/* 🔄 ENHANCED: Header with sliding window controls */}
        <div className="bg-blue-500 text-white p-4 rounded-t-lg">
          <div className="flex justify-between items-center">
            <div>
              <h1 className="text-xl font-bold">Streaming AI Chat with Sliding Window</h1>
              <p className="text-blue-100">Memory window: {windowSize} messages</p>
            </div>
            <div className="text-right">
              <label className="block text-sm text-blue-100 mb-1">
                Memory Window
              </label>
              <input
                type="range"
                min="5"
                max="50"
                value={windowSize}
                onChange={(e) => setWindowSize(parseInt(e.target.value))}
                className="w-24"
                disabled={isStreaming}
              />
              <div className="text-xs text-blue-200 mt-1">{windowSize} msgs</div>
            </div>
          </div>
        </div>

        {/* 🆕 SLIDING WINDOW ADDITION: Memory usage indicator */}
        <div className="bg-gray-50 px-4 py-2 border-b">
          {(() => {
            const { totalMessages, rememberedMessages, forgottenMessages } = getMemoryStats()
            return (
              <div className="flex justify-between text-sm text-gray-600">
                <span>📊 Total: {totalMessages} messages</span>
                <span>🧠 Remembering: {rememberedMessages} messages</span>
                {forgottenMessages > 0 && (
                  <span className="text-orange-600">💭 Forgotten: {forgottenMessages} messages</span>
                )}
              </div>
            )
          })()}
        </div>

        {/* Messages (unchanged) */}
        <div className="flex-1 overflow-y-auto p-4 space-y-4">
          {messages.length === 0 && (
            <div className="text-center text-gray-500 mt-20">
              <Bot className="w-12 h-12 mx-auto mb-4 text-gray-400" />
              <p>Send a message to see streaming and sliding window memory in action!</p>
            </div>
          )}

          {messages.map((message) => (
            <div
              key={message.id}
              className={`flex items-start space-x-3 ${
                message.isUser ? 'justify-end' : 'justify-start'
              }`}
            >
              {!message.isUser && (
                <div className="bg-blue-500 p-2 rounded-full">
                  <Bot className="w-4 h-4 text-white" />
                </div>
              )}

              <div
                className={`max-w-xs lg:max-w-md px-4 py-2 rounded-lg ${
                  message.isUser
                    ? 'bg-blue-500 text-white'
                    : 'bg-gray-200 text-gray-800'
                }`}
              >
                {message.text}
                {message.isStreaming && (
                  <span className="inline-block w-2 h-4 bg-blue-500 ml-1 animate-pulse" />
                )}
              </div>

              {message.isUser && (
                <div className="bg-gray-500 p-2 rounded-full">
                  <User className="w-4 h-4 text-white" />
                </div>
              )}
            </div>
          ))}
        </div>

        {/* Input (unchanged) */}
        <div className="border-t p-4">
          <div className="flex space-x-2">
            <input
              type="text"
              value={input}
              onChange={(e) => setInput(e.target.value)}
              onKeyPress={handleKeyPress}
              placeholder="Type your message..."
              className="flex-1 border border-gray-300 rounded-lg px-4 py-2 focus:outline-none focus:ring-2 focus:ring-blue-500"
              disabled={isStreaming}
            />
            {isStreaming ? (
              <button
                onClick={stopStreaming}
                className="bg-red-500 hover:bg-red-600 text-white px-4 py-2 rounded-lg transition-colors"
              >
                Stop
              </button>
            ) : (
              <button
                onClick={sendMessage}
                disabled={!input.trim()}
                className="bg-blue-500 hover:bg-blue-600 disabled:bg-gray-300 text-white p-2 rounded-lg transition-colors"
              >
                <Send className="w-5 h-5" />
              </button>
            )}
          </div>
        </div>
      </div>
    </div>
  )
}

export default StreamingChat
```
Summary of Frontend Changes
🆕 const [windowSize, setWindowSize] = useState(10)
- What it does: Adds state to control the sliding window size
- Why: Users can adjust how much memory the AI has

🆕 getMemoryStats function
- What it does: Calculates memory usage statistics
- Why: Shows users what’s being remembered vs. forgotten

🆕 windowSize: windowSize in the request body
- What it does: Sends the window size to the backend
- Why: The backend needs this to apply the sliding window

🔄 Enhanced header with slider control
- What changed: Added a range input to control the window size
- Why: Gives users control over the memory vs. cost tradeoff

🆕 Memory usage indicator
- What it does: Shows real-time memory statistics
- Why: Visual feedback about what’s being remembered
🧪 Test Your Sliding Window Memory
Step-by-Step Testing Guide
- Start both servers (backend and frontend)
- Open your enhanced streaming chat
- Set window size to 5 using the slider
- Have this test conversation:
Messages 1-2
You: "My name is Sarah and I'm 25 years old"
AI: "Nice to meet you, Sarah! It's great to know you're 25."

Messages 3-4 (filling the window)
You: "I love pizza"
AI: "Pizza is delicious! What's your favorite topping?"

Messages 5-6 (window almost full)
You: "I like pepperoni"
AI: "Great choice! Pepperoni is a classic."

Messages 7-8 (window now full; message 1, with your name, drops out)
You: "I work as a developer"
AI: "That's awesome! What kind of development do you do?"

Messages 9-10 (messages 1-3 are forgotten now)
You: "I code in JavaScript"
AI: "JavaScript is great! Very versatile language."

Messages 11-12 (test the memory limitation)
You: "What's my name and age?"
AI: "I don't see that information in our recent conversation. Could you remind me what your name and age are?"
Visual Memory Test
Watch the memory indicator as you chat:
- Total messages increases with each message
- Remembering stops growing at your window size
- Forgotten appears and grows when window is exceeded
Try adjusting the window size during conversation and see how it affects the AI’s memory!
📊 Memory Window Size Guidelines
Visual Cost Comparison
```
Window Size 5:  [🧠🧠🧠🧠🧠]           → ~500 tokens per message
Window Size 10: [🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠] → ~1,000 tokens per message
Window Size 20: [🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠] → ~2,000 tokens per message
```
Choosing the Right Window Size
Small Window (5-8 messages)
- ✅ Pros: Very low costs, fast responses
- ❌ Cons: Forgets context quickly, may need frequent clarification
- 🎯 Best for: Simple Q&A, customer service, cost-sensitive applications
Medium Window (10-15 messages)
- ✅ Pros: Good balance of cost and context retention
- ❌ Cons: May lose important context in longer discussions
- 🎯 Best for: Most applications, general chat, moderate conversations
Large Window (20-50 messages)
- ✅ Pros: Maintains lots of context, handles complex discussions
- ❌ Cons: Higher costs, slower responses, more tokens per message
- 🎯 Best for: Complex problem-solving, detailed analysis, premium features
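If you want to bake these guidelines into code, a small preset map is one option; the names and values below are hypothetical midpoints of the ranges above:

```js
// Hypothetical presets based on the guidelines above
const WINDOW_PRESETS = {
  small: 6,   // simple Q&A, customer service, cost-sensitive apps
  medium: 12, // general chat, most applications
  large: 30,  // complex problem-solving, premium features
};

// e.g. send WINDOW_PRESETS.medium as windowSize in the request body
```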
💰 Real Cost Impact Visualization
50-Message Conversation Cost Analysis
Without Sliding Window (Simple Memory):
```
Message 1:  [1]                    → 100 tokens
Message 10: [1,2,3,4,5,6,7,8,9,10] → 1,000 tokens
Message 25: [1,2,3...23,24,25]     → 2,500 tokens
Message 50: [1,2,3...48,49,50]     → 5,000 tokens
```
Total Cost: ~127,500 tokens 💸💸💸
With Sliding Window (Size 10):
```
Message 1:  [1]                             → 100 tokens
Message 10: [1,2,3,4,5,6,7,8,9,10]          → 1,000 tokens
Message 25: [16,17,18,19,20,21,22,23,24,25] → 1,000 tokens (capped!)
Message 50: [41,42,43,44,45,46,47,48,49,50] → 1,000 tokens (capped!)
```
Total Cost: ~45,500 tokens 💰 (~64% savings, and the savings grow as the conversation gets longer!)
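You can verify these totals with a quick script; it assumes a flat ~100 tokens per message, as in the examples above:

```js
// Back-of-the-envelope cost comparison (assumes ~100 tokens per message)
const TOKENS_PER_MESSAGE = 100;
const MESSAGES = 50;
const WINDOW = 10;

let simpleTotal = 0;
let windowTotal = 0;
for (let n = 1; n <= MESSAGES; n++) {
  simpleTotal += n * TOKENS_PER_MESSAGE;                   // resends all n messages
  windowTotal += Math.min(n, WINDOW) * TOKENS_PER_MESSAGE; // capped at WINDOW
}

console.log(simpleTotal); // 127500
console.log(windowTotal); // 45500
console.log(`${Math.round((1 - windowTotal / simpleTotal) * 100)}% savings`); // 64% savings
```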
📝 For Normal Chat Implementation
Apply the same sliding window logic to your normal chat endpoint:
```js
// Normal chat with sliding window
app.post("/api/chat", async (req, res) => {
  try {
    // 🆕 SLIDING WINDOW ADDITION: Accept windowSize parameter
    const { message, conversationHistory = [], windowSize = 10 } = req.body;

    // 🆕 SLIDING WINDOW ADDITION: Apply sliding window
    const recentHistory = conversationHistory.slice(-windowSize);

    // Build context-aware message (same logic as streaming)
    let contextualMessage = message;

    if (recentHistory.length > 0) {
      const context = recentHistory
        .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
        .join('\n');

      const totalMessages = conversationHistory.length;
      const forgottenMessages = Math.max(0, totalMessages - windowSize);

      let memoryNote = "";
      if (forgottenMessages > 0) {
        memoryNote = `\n\nNote: I only remember the last ${windowSize} messages of our ${totalMessages}-message conversation.`;
      }

      contextualMessage = `Recent conversation:\n${context}${memoryNote}\n\nCurrent question: ${message}`;
    }

    const response = await openai.responses.create({
      model: "gpt-4o-mini",
      input: contextualMessage,
    });

    res.json({
      response: response.output_text,
      success: true,
    });
  } catch (error) {
    console.error("OpenAI API Error:", error);
    res.status(500).json({
      error: "Failed to get AI response",
      success: false,
    });
  }
});
```
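To exercise this endpoint, send the same request shape as the streaming version (assuming the backend runs on http://localhost:8000, as in this guide):

```js
// Quick manual test for the non-streaming endpoint
const res = await fetch("http://localhost:8000/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    message: "What's my name?",
    conversationHistory: [
      { role: "user", content: "Hi, I'm Sarah" },
      { role: "assistant", content: "Hello Sarah!" },
    ],
    windowSize: 10,
  }),
});

const data = await res.json();
console.log(data.response); // the AI's reply as a single string
```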
✅ What You’ve Built
Your enhanced sliding window memory system now provides:
Smart Memory Management
- ✅ Controlled costs - Token usage stays constant regardless of conversation length
- ✅ Recent context - Maintains the most relevant recent conversation
- ✅ User control - Adjustable window size for different use cases
- ✅ Visual feedback - Real-time memory usage statistics
Advanced Features
- ✅ Memory awareness - AI knows when it might be missing context
- ✅ Graceful degradation - Handles memory limitations intelligently
- ✅ Cost optimization - Up to 80% cost savings on long conversations
- ✅ Production ready - Scales to unlimited conversation length
Enhanced User Experience
- ✅ Transparent memory - Users see exactly what’s being remembered
- ✅ Adjustable settings - Real-time window size control
- ✅ Smart responses - AI asks for clarification when needed
- ✅ Visual indicators - Clear memory status display
Perfect for production applications that need predictable costs while maintaining conversation quality! 🚀
Next Steps: Ready to explore even more advanced memory strategies like conversation summarization and persistent storage? Let’s build those next!