🪟 Sliding Window Memory
Your AI chat has memory, but what happens when conversations get long? Costs skyrocket and you might hit token limits! 💸
Sliding window memory solves this by keeping only the most recent messages while maintaining context. It's like having a short-term memory that focuses on what's most relevant.
Building on: This assumes you've completed the Simple Memory Implementation. We'll enhance that code to add smart sliding window functionality.
🎯 The Problem with Simple Memory
Cost Growth Visualization:

Simple Memory (grows forever):
Message 1:  [1]                  → Send 1 message
Message 5:  [1,2,3,4,5]          → Send 5 messages
Message 20: [1,2,3...18,19,20]   → Send 20 messages
Message 50: [1,2,3...48,49,50]   → Send 50 messages (expensive!)

Token Cost: 100 → 500 → 2,000 → 5,000 tokens
Real Example: 20-Message Conversation
// Without sliding window - sending everything:
conversationHistory = [
  { role: "user", content: "Hi, I'm Sarah" },      // Message 1
  { role: "assistant", content: "Hello Sarah!" },  // Message 2
  { role: "user", content: "I like pizza" },       // Message 3
  // ... 14 more messages ...
  { role: "user", content: "What's my name?" },    // Message 20
]
// Sends 19 messages as context = expensive!
🪟 How Sliding Window Memory Works
Sliding Window Visualization:

Sliding Window (stays constant):
Window Size = 6 messages

Message 1:  [1]                     → Send 1 message
Message 5:  [1,2,3,4,5]             → Send 5 messages
Message 10: [5,6,7,8,9,10]          → Send last 6 messages (forgot 1-4)
Message 20: [15,16,17,18,19,20]     → Send last 6 messages (forgot 1-14)

Token Cost: 100 → 500 → 600 → 600 tokens (constant!)
Real Example: Same 20-Message Conversation
// With sliding window (size 6) - only recent messages:
conversationHistory = [
  // Forgot messages 1-13
  { role: "assistant", content: "Great choice!" },    // Message 14
  { role: "user", content: "I love coding" },         // Message 15
  { role: "assistant", content: "What languages?" },  // Message 16
  { role: "user", content: "JavaScript mainly" },     // Message 17
  { role: "assistant", content: "Excellent choice!" },// Message 18
  { role: "user", content: "What's my name?" },       // Message 19
]
// Only sends 5 messages as context = much cheaper!
// But forgot your name (from message 1)
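The windowing itself is a one-liner in JavaScript. Here is a runnable sketch of the idea (the message shape and window size match the examples above; the history contents are made up):

```javascript
// Build a 20-message toy history: [{ role, content }, ...]
const history = Array.from({ length: 20 }, (_, i) => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: `Message ${i + 1}`,
}));

const windowSize = 6;

// slice(-n) keeps the last n elements - this IS the sliding window
const recent = history.slice(-windowSize);

console.log(recent.length);         // 6
console.log(recent[0].content);     // "Message 15"
console.log(recent.at(-1).content); // "Message 20"
```

No matter how long `history` grows, `recent` never exceeds six messages, which is exactly why the token cost plateaus.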
🛠️ Step 1: Enhanced Backend with Sliding Window
Let's enhance your memory backend to implement the sliding window. We'll build on your existing code.
Find your /api/chat/stream route and replace it with this enhanced version:
// 🆕 Enhanced streaming endpoint with sliding window memory
app.post("/api/chat/stream", async (req, res) => {
  try {
    // 🆕 SLIDING WINDOW: Accept windowSize parameter
    const { message, conversationHistory = [], windowSize = 10 } = req.body;

    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    // Set streaming headers (unchanged)
    res.writeHead(200, {
      'Content-Type': 'text/plain',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    });

    // 🆕 SLIDING WINDOW: Apply sliding window to conversation history
    const recentHistory = conversationHistory.slice(-windowSize);

    // 🆕 SLIDING WINDOW: Calculate memory stats for logging
    const totalMessages = conversationHistory.length;
    const rememberedMessages = recentHistory.length;
    const forgottenMessages = Math.max(0, totalMessages - windowSize);

    // Build context-aware message for the AI (enhanced)
    let contextualMessage = message;

    if (recentHistory.length > 0) {
      const context = recentHistory
        .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
        .join('\n');

      // 🆕 ENHANCED: Add context about memory limitations
      let memoryNote = "";
      if (forgottenMessages > 0) {
        memoryNote = `\n\nNote: This conversation started ${totalMessages} messages ago, but I only remember the last ${rememberedMessages} messages. If you reference something from earlier, I may need clarification.`;
      }

      contextualMessage = `Recent conversation:\n${context}${memoryNote}\n\nCurrent question: ${message}`;
    }

    // 🆕 SLIDING WINDOW: Log memory usage for debugging
    console.log(`Sliding Window: Using ${rememberedMessages}/${totalMessages} messages (forgot ${forgottenMessages})`);

    // Create streaming response (unchanged)
    const stream = await openai.responses.create({
      model: "gpt-4o-mini",
      input: contextualMessage,
      stream: true,
    });

    // Stream each chunk to the frontend (unchanged)
    for await (const event of stream) {
      if (event.type === "response.output_text.delta" && event.delta) {
        const textChunk = event.delta.text || event.delta;
        res.write(textChunk);
        res.flush?.();
      }
    }

    res.end();

  } catch (error) {
    console.error("Streaming Error:", error);

    if (res.headersSent) {
      res.write("\n[Error occurred]");
      res.end();
    } else {
      res.status(500).json({ error: "Failed to stream response" });
    }
  }
});
What each sliding window addition does:
🆕 windowSize = 10
- Accepts a window size parameter from the frontend (defaults to 10 messages)
- Why: Allows the frontend to control how much memory to use
🆕 const recentHistory = conversationHistory.slice(-windowSize)
- Keeps only the most recent messages using JavaScript's slice method
- Why: This is the core of the sliding window - it limits memory to recent context
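A useful property of `slice(-n)`, worth knowing before you rely on it: when the history is shorter than the window, it simply returns the whole array, so no special-casing is needed. A quick sketch:

```javascript
const windowSize = 10;

// Shorter than the window: slice(-10) returns everything
console.log([1, 2, 3].slice(-windowSize)); // [ 1, 2, 3 ]

// Longer than the window: only the last 10 survive
const long = Array.from({ length: 15 }, (_, i) => i + 1);
console.log(long.slice(-windowSize)); // [ 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ]

// And slice never mutates the original history
console.log(long.length); // 15
```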
🆕 Memory statistics calculation
- Tracks how many messages fell outside the window
- Why: Provides the numbers needed for logging and the AI's memory note
🆕 Enhanced context building
- Adds a memory-limitation note when messages have been forgotten
- Why: Helps the AI understand it might be missing context and ask for clarification instead of guessing
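To make the context building concrete, here is a small sketch of the prompt string the backend assembles. The window size and history here are invented for illustration; the real endpoint uses the values from the request:

```javascript
const windowSize = 3;
const conversationHistory = [
  { role: "user", content: "My name is Sarah" },
  { role: "assistant", content: "Hello Sarah!" },
  { role: "user", content: "I like pizza" },
  { role: "assistant", content: "Great choice!" },
  { role: "user", content: "I love coding" },
];
const message = "What's my name?";

// Same logic as the endpoint, minus Express and streaming
const recentHistory = conversationHistory.slice(-windowSize);
const totalMessages = conversationHistory.length;
const forgottenMessages = Math.max(0, totalMessages - windowSize);

const context = recentHistory
  .map(msg => `${msg.role === "user" ? "User" : "Assistant"}: ${msg.content}`)
  .join("\n");

const memoryNote = forgottenMessages > 0
  ? `\n\nNote: This conversation started ${totalMessages} messages ago, but I only remember the last ${recentHistory.length} messages.`
  : "";

const contextualMessage = `Recent conversation:\n${context}${memoryNote}\n\nCurrent question: ${message}`;
console.log(contextualMessage);
```

Running this prints the last three messages, a note that two earlier ones were forgotten, and the current question; the message "My name is Sarah" never reaches the model, which is exactly why the AI can no longer answer it.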
🔧 Step 2: Add Sliding Window Controls to Frontend
Now let's enhance your memory frontend to add sliding window controls and visual feedback.
Add this new state to your component, right after your existing state:

// 🆕 SLIDING WINDOW: Window size control
const [windowSize, setWindowSize] = useState(10)
Add this helper function right after your buildConversationHistory function:

// 🆕 SLIDING WINDOW: Function to calculate memory statistics
const getMemoryStats = () => {
  const totalMessages = messages.filter(msg => !msg.isStreaming).length
  const rememberedMessages = Math.min(totalMessages, windowSize)
  const forgottenMessages = Math.max(0, totalMessages - windowSize)

  return { totalMessages, rememberedMessages, forgottenMessages }
};
What this function does:
- Calculates total completed messages in the conversation
- Calculates remembered messages that fit in the current window
- Calculates forgotten messages that no longer fit in the window
- Returns stats for display in the UI
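If you want to sanity-check that arithmetic in isolation, here is the same math outside React (the conversation lengths are arbitrary):

```javascript
const windowSize = 10;

// Mirrors getMemoryStats, minus the React state
const statsFor = (totalMessages) => ({
  totalMessages,
  rememberedMessages: Math.min(totalMessages, windowSize),
  forgottenMessages: Math.max(0, totalMessages - windowSize),
});

// Short chat: everything remembered, nothing forgotten
console.log(statsFor(4));  // { totalMessages: 4, rememberedMessages: 4, forgottenMessages: 0 }

// Long chat: remembering caps at the window size, forgotten keeps growing
console.log(statsFor(25)); // { totalMessages: 25, rememberedMessages: 10, forgottenMessages: 15 }
```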
📤 Step 3: Update Your Send Message Function
Find this part of your sendMessage function:

body: JSON.stringify({
  message: currentInput,
  conversationHistory: conversationHistory
}),

Replace it with:

body: JSON.stringify({
  message: currentInput,
  conversationHistory: conversationHistory,
  windowSize: windowSize // 🆕 SLIDING WINDOW: Include window size
}),
What this change does:
- Sends window size to backend with each request
- Enables sliding window - backend uses this to limit memory
📊 Step 4: Update Visual Indicators
Let's update your UI to show sliding window controls and memory statistics.
Update your header to include the window size control:
Find your header section and replace it with:

{/* Header with sliding window controls */}
<div className="bg-gradient-to-r from-blue-600 to-indigo-600 text-white p-6">
  <div className="flex justify-between items-center">
    <div>
      <h1 className="text-xl font-bold">⚡ Streaming AI Chat</h1>
      <p className="text-blue-100 text-sm">Smart sliding window memory!</p> {/* 🆕 Updated subtitle */}
    </div>
    <div className="text-right"> {/* 🆕 Added control section */}
      <label className="block text-sm text-blue-100 mb-1">
        Memory Window
      </label>
      <input
        type="range"
        min="5"
        max="30"
        value={windowSize}
        onChange={(e) => setWindowSize(parseInt(e.target.value))}
        className="w-24"
        disabled={isStreaming}
      />
      <div className="text-xs text-blue-200 mt-1">{windowSize} msgs</div>
    </div>
  </div>
</div>
What this adds:
- Window size slider - Users can adjust memory window from 5-30 messages
- Live value display - Shows current window size
- Disabled during streaming - Prevents changes during AI responses
Add the memory usage indicator below the header:

{/* 🆕 SLIDING WINDOW: Memory usage indicator */}
<div className="bg-slate-100 px-6 py-3 border-b border-slate-200">
  {(() => {
    const { totalMessages, rememberedMessages, forgottenMessages } = getMemoryStats()
    return (
      <div className="flex justify-between text-sm text-slate-600">
        <span>
          📊 Total: {totalMessages} messages
        </span>
        <span>
          🧠 Remembering: {rememberedMessages} messages
        </span>
        {forgottenMessages > 0 && (
          <span className="text-orange-600">
            💭 Forgotten: {forgottenMessages} messages
          </span>
        )}
      </div>
    )
  })()}
</div>
What this indicator shows:
- Total messages - Complete conversation length
- Remembering - Messages currently in the window
- Forgotten - Messages outside the window (only shows when > 0)
- Color coding - Orange for forgotten messages to draw attention
Update the welcome message:
Find your welcome state and update it:

<h3 className="text-lg font-semibold text-slate-700 mb-2">
  Welcome to Smart Memory Chat! {/* 🆕 Updated title */}
</h3>
<p className="text-sm">I'll remember recent messages and forget old ones to control costs!</p> {/* 🆕 Updated description */}

Update the input placeholder:

placeholder="Ask anything - I'll manage memory intelligently..." {/* 🆕 Updated placeholder */}
📋 Step 5: Your Complete Updated Frontend
Here's your complete src/App.jsx with sliding window functionality integrated:
import { useState, useRef } from 'react'
import { Send, Bot, User } from 'lucide-react'
import ReactMarkdown from 'react-markdown'

// Component: Handles message content with markdown formatting
function MessageContent({ message }) {
  if (message.isUser) {
    return (
      <p className="text-sm leading-relaxed whitespace-pre-wrap">
        {message.text}
        {message.isStreaming && (
          <span className="inline-block w-2 h-4 bg-blue-500 ml-1 animate-pulse" />
        )}
      </p>
    )
  }

  return (
    <div className="text-sm leading-relaxed">
      <ReactMarkdown
        components={{
          h1: ({children}) => <h1 className="text-lg font-bold mb-2 text-slate-800">{children}</h1>,
          h2: ({children}) => <h2 className="text-base font-bold mb-2 text-slate-800">{children}</h2>,
          h3: ({children}) => <h3 className="text-sm font-bold mb-1 text-slate-800">{children}</h3>,
          p: ({children}) => <p className="mb-2 last:mb-0 text-slate-700">{children}</p>,
          ul: ({children}) => <ul className="list-disc list-inside mb-2 space-y-1">{children}</ul>,
          ol: ({children}) => <ol className="list-decimal list-inside mb-2 space-y-1">{children}</ol>,
          li: ({children}) => <li className="text-slate-700">{children}</li>,
          code: ({inline, children}) => {
            const copyToClipboard = (text) => {
              navigator.clipboard.writeText(text)
            }

            if (inline) {
              return (
                <code className="bg-slate-100 text-red-600 px-1.5 py-0.5 rounded text-xs font-mono border">
                  {children}
                </code>
              )
            }

            return (
              <div className="relative group mb-2">
                <code className="block bg-gray-900 text-green-400 p-4 rounded-lg text-xs font-mono overflow-x-auto whitespace-pre border-l-4 border-blue-400 shadow-sm">
                  {children}
                </code>
                <button
                  onClick={() => copyToClipboard(children)}
                  className="absolute top-2 right-2 bg-slate-600 hover:bg-slate-500 text-white px-2 py-1 rounded text-xs opacity-0 group-hover:opacity-100 transition-opacity"
                >
                  Copy
                </button>
              </div>
            )
          },
          pre: ({children}) => <div className="mb-2">{children}</div>,
          strong: ({children}) => <strong className="font-semibold text-slate-800">{children}</strong>,
          em: ({children}) => <em className="italic text-slate-700">{children}</em>,
          blockquote: ({children}) => (
            <blockquote className="border-l-4 border-blue-200 pl-4 italic text-slate-600 mb-2">
              {children}
            </blockquote>
          ),
          a: ({href, children}) => (
            <a href={href} className="text-blue-600 hover:text-blue-800 underline" target="_blank" rel="noopener noreferrer">
              {children}
            </a>
          ),
        }}
      >
        {message.text}
      </ReactMarkdown>
      {message.isStreaming && (
        <span className="inline-block w-2 h-4 bg-blue-500 ml-1 animate-pulse" />
      )}
    </div>
  )
}

function App() {
  // State management
  const [messages, setMessages] = useState([])
  const [input, setInput] = useState('')
  const [isStreaming, setIsStreaming] = useState(false)
  const abortControllerRef = useRef(null)

  // 🆕 SLIDING WINDOW: Window size control
  const [windowSize, setWindowSize] = useState(10)

  // MEMORY: Function to build conversation history
  const buildConversationHistory = (messages) => {
    return messages
      .filter(msg => !msg.isStreaming)
      .map(msg => ({
        role: msg.isUser ? "user" : "assistant",
        content: msg.text
      }));
  };

  // 🆕 SLIDING WINDOW: Function to calculate memory statistics
  const getMemoryStats = () => {
    const totalMessages = messages.filter(msg => !msg.isStreaming).length
    const rememberedMessages = Math.min(totalMessages, windowSize)
    const forgottenMessages = Math.max(0, totalMessages - windowSize)

    return { totalMessages, rememberedMessages, forgottenMessages }
  };

  // Helper functions (same as before)
  const createAiPlaceholder = () => {
    const aiMessageId = Date.now() + 1
    const aiMessage = {
      text: "",
      isUser: false,
      id: aiMessageId,
      isStreaming: true,
    }
    setMessages(prev => [...prev, aiMessage])
    return aiMessageId
  }

  const readStream = async (response, aiMessageId) => {
    const reader = response.body.getReader()
    const decoder = new TextDecoder()

    while (true) {
      const { done, value } = await reader.read()
      if (done) break

      const chunk = decoder.decode(value, { stream: true })

      setMessages(prev =>
        prev.map(msg =>
          msg.id === aiMessageId
            ? { ...msg, text: msg.text + chunk }
            : msg
        )
      )
    }
  }

  const sendMessage = async () => {
    if (!input.trim() || isStreaming) return

    const userMessage = {
      text: input.trim(),
      isUser: true,
      id: Date.now()
    }
    setMessages(prev => [...prev, userMessage])

    const currentInput = input
    setInput('')
    setIsStreaming(true)
    const aiMessageId = createAiPlaceholder()

    try {
      const conversationHistory = buildConversationHistory(messages)

      abortControllerRef.current = new AbortController()

      const response = await fetch('http://localhost:8000/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          message: currentInput,
          conversationHistory: conversationHistory,
          windowSize: windowSize // 🆕 SLIDING WINDOW: Include window size
        }),
        signal: abortControllerRef.current.signal,
      })

      if (!response.ok) throw new Error('Failed to get response')

      await readStream(response, aiMessageId)

      setMessages(prev =>
        prev.map(msg =>
          msg.id === aiMessageId
            ? { ...msg, isStreaming: false }
            : msg
        )
      )

    } catch (error) {
      if (error.name !== 'AbortError') {
        console.error('Streaming error:', error)
        setMessages(prev =>
          prev.map(msg =>
            msg.id === aiMessageId
              ? { ...msg, text: 'Sorry, something went wrong.', isStreaming: false }
              : msg
          )
        )
      }
    } finally {
      setIsStreaming(false)
      abortControllerRef.current = null
    }
  }

  const stopStreaming = () => {
    if (abortControllerRef.current) {
      abortControllerRef.current.abort()
    }
  }

  const handleKeyPress = (e) => {
    if (e.key === 'Enter' && !e.shiftKey && !isStreaming) {
      e.preventDefault()
      sendMessage()
    }
  }

  return (
    <div className="min-h-screen bg-gradient-to-br from-slate-100 to-blue-50 flex items-center justify-center p-4">
      <div className="bg-white rounded-2xl shadow-2xl w-full max-w-2xl h-[700px] flex flex-col overflow-hidden">

        {/* Header with sliding window controls */}
        <div className="bg-gradient-to-r from-blue-600 to-indigo-600 text-white p-6">
          <div className="flex justify-between items-center">
            <div>
              <h1 className="text-xl font-bold">⚡ Streaming AI Chat</h1>
              <p className="text-blue-100 text-sm">Smart sliding window memory!</p>
            </div>
            <div className="text-right">
              <label className="block text-sm text-blue-100 mb-1">
                Memory Window
              </label>
              <input
                type="range"
                min="5"
                max="30"
                value={windowSize}
                onChange={(e) => setWindowSize(parseInt(e.target.value))}
                className="w-24"
                disabled={isStreaming}
              />
              <div className="text-xs text-blue-200 mt-1">{windowSize} msgs</div>
            </div>
          </div>
        </div>

        {/* 🆕 SLIDING WINDOW: Memory usage indicator */}
        <div className="bg-slate-100 px-6 py-3 border-b border-slate-200">
          {(() => {
            const { totalMessages, rememberedMessages, forgottenMessages } = getMemoryStats()
            return (
              <div className="flex justify-between text-sm text-slate-600">
                <span>
                  📊 Total: {totalMessages} messages
                </span>
                <span>
                  🧠 Remembering: {rememberedMessages} messages
                </span>
                {forgottenMessages > 0 && (
                  <span className="text-orange-600">
                    💭 Forgotten: {forgottenMessages} messages
                  </span>
                )}
              </div>
            )
          })()}
        </div>

        {/* Messages Area */}
        <div className="flex-1 overflow-y-auto p-6 space-y-4 bg-slate-50">
          {messages.length === 0 ? (
            <div className="text-center text-slate-500 mt-20">
              <div className="w-16 h-16 bg-blue-100 rounded-2xl flex items-center justify-center mx-auto mb-4">
                <Bot className="w-8 h-8 text-blue-600" />
              </div>
              <h3 className="text-lg font-semibold text-slate-700 mb-2">
                Welcome to Smart Memory Chat!
              </h3>
              <p className="text-sm">I'll remember recent messages and forget old ones to control costs!</p>
            </div>
          ) : (
            messages.map(message => (
              <div
                key={message.id}
                className={`flex items-start space-x-3 ${
                  message.isUser ? 'justify-end' : 'justify-start'
                }`}
              >
                {!message.isUser && (
                  <div className="w-8 h-8 bg-gradient-to-r from-blue-500 to-indigo-600 rounded-full flex items-center justify-center flex-shrink-0">
                    <Bot className="w-4 h-4 text-white" />
                  </div>
                )}

                <div
                  className={`max-w-xs lg:max-w-md px-4 py-3 rounded-2xl ${
                    message.isUser
                      ? 'bg-gradient-to-r from-blue-600 to-indigo-600 text-white'
                      : 'bg-white text-slate-800 shadow-sm border border-slate-200'
                  }`}
                >
                  <MessageContent message={message} />
                </div>

                {message.isUser && (
                  <div className="w-8 h-8 bg-gradient-to-r from-slate-400 to-slate-600 rounded-full flex items-center justify-center flex-shrink-0">
                    <User className="w-4 h-4 text-white" />
                  </div>
                )}
              </div>
            ))
          )}
        </div>

        {/* Input Area */}
        <div className="bg-white border-t border-slate-200 p-4">
          <div className="flex space-x-3">
            <input
              type="text"
              value={input}
              onChange={(e) => setInput(e.target.value)}
              onKeyPress={handleKeyPress}
              placeholder="Ask anything - I'll manage memory intelligently..."
              disabled={isStreaming}
              className="flex-1 border border-slate-300 rounded-xl px-4 py-3 focus:outline-none focus:ring-2 focus:ring-blue-500 disabled:bg-slate-100 transition-all duration-200"
            />

            {isStreaming ? (
              <button
                onClick={stopStreaming}
                className="bg-gradient-to-r from-red-500 to-red-600 hover:from-red-600 hover:to-red-700 text-white px-6 py-3 rounded-xl transition-all duration-200 flex items-center space-x-2 shadow-lg"
              >
                <span className="w-2 h-2 bg-white rounded-full"></span>
                <span className="hidden sm:inline">Stop</span>
              </button>
            ) : (
              <button
                onClick={sendMessage}
                disabled={!input.trim()}
                className="bg-gradient-to-r from-blue-600 to-indigo-600 hover:from-blue-700 hover:to-indigo-700 disabled:from-slate-300 disabled:to-slate-300 text-white px-6 py-3 rounded-xl transition-all duration-200 flex items-center space-x-2 shadow-lg disabled:shadow-none"
              >
                <Send className="w-4 h-4" />
                <span className="hidden sm:inline">Send</span>
              </button>
            )}
          </div>

          {isStreaming && (
            <div className="mt-3 flex items-center justify-center text-sm text-slate-500">
              <div className="flex space-x-1 mr-2">
                <div className="w-2 h-2 bg-blue-400 rounded-full animate-bounce"></div>
                <div className="w-2 h-2 bg-blue-400 rounded-full animate-bounce" style={{animationDelay: '0.1s'}}></div>
                <div className="w-2 h-2 bg-blue-400 rounded-full animate-bounce" style={{animationDelay: '0.2s'}}></div>
              </div>
              AI is generating response...
            </div>
          )}
        </div>
      </div>
    </div>
  )
}

export default App
What this complete component now includes:
- ✅ All previous features - Streaming, memory, markdown formatting, copy buttons
- ✅ Sliding window memory - Controls conversation length and costs
- ✅ User controls - Adjustable window size with slider
- ✅ Visual feedback - Real-time memory usage statistics
- ✅ Smart cost management - Prevents unlimited token growth
🧪 Step 6: Test Your Sliding Window Memory
Start both servers:
Backend:
cd openai-backend
npm run dev
Frontend:
cd openai-frontend
npm run dev
Test with this conversation to see the sliding window in action:
- Set window size to 5 using the slider
- Have this test conversation:

Message 1: "My name is Sarah and I'm 25 years old"
Message 2: "I love pizza with pepperoni"
Message 3: "I work as a software developer"
Message 4: "I use JavaScript and React"
Message 5: "I live in New York City"

[Window is now full - next messages will start forgetting]

Message 6: "I have a cat named Whiskers"
Message 7: "What's my name and age?"
Expected behavior:
- AI should remember your cat (recent)
- AI should forget your name and age (forgotten in sliding window)
- Memory indicator shows what's being remembered vs forgotten
Watch the memory indicator:
- Total messages increases with each message
- Remembering stops growing at your window size
- Forgotten appears when window is exceeded
📏 Memory Window Size Guidelines
Choosing the Right Window Size:
Small Window (5-8 messages)
- ✅ Pros: Very low costs, fast responses
- ❌ Cons: Forgets context quickly, may need frequent clarification
- 🎯 Best for: Simple Q&A, customer service, cost-sensitive applications
Medium Window (10-15 messages)
- ✅ Pros: Good balance of cost and context retention
- ❌ Cons: May lose important context in longer discussions
- 🎯 Best for: Most applications, general chat, moderate conversations
Large Window (20-30 messages)
- ✅ Pros: Maintains lots of context, handles complex discussions
- ❌ Cons: Higher costs, slower responses, more tokens per message
- 🎯 Best for: Complex problem-solving, detailed analysis, premium features
💰 Real Cost Impact
50-Message Conversation Cost Analysis:
Without Sliding Window (Simple Memory):

Message 1:  [1]                              → 100 tokens
Message 10: [1,2,3,4,5,6,7,8,9,10]           → 1,000 tokens
Message 25: [1,2,3...23,24,25]               → 2,500 tokens
Message 50: [1,2,3...48,49,50]               → 5,000 tokens

Total Cost: ~127,500 tokens 💸💸💸

With Sliding Window (Size 10):

Message 1:  [1]                              → 100 tokens
Message 10: [1,2,3,4,5,6,7,8,9,10]           → 1,000 tokens
Message 25: [16,17,18,19,20,21,22,23,24,25]  → 1,000 tokens (capped!)
Message 50: [41,42,43,44,45,46,47,48,49,50]  → 1,000 tokens (capped!)

Total Cost: ~45,500 tokens 💰 (roughly 65% savings here - and the longer the chat runs, the bigger the savings!)
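You can reproduce the comparison with a few lines of arithmetic. This is a deliberately crude model (every message ≈ 100 tokens, every request re-sends its full context); real token counts vary with message length:

```javascript
const tokensPerMessage = 100;
const totalMessages = 50;
const windowSize = 10;

let simpleTotal = 0;   // simple memory: request n re-sends all n messages
let windowedTotal = 0; // sliding window: request n re-sends at most windowSize
for (let n = 1; n <= totalMessages; n++) {
  simpleTotal += n * tokensPerMessage;
  windowedTotal += Math.min(n, windowSize) * tokensPerMessage;
}

console.log(simpleTotal);   // 127500
console.log(windowedTotal); // 45500
console.log(Math.round((1 - windowedTotal / simpleTotal) * 100) + "% saved"); // "64% saved"
```

Because the windowed cost grows by at most 1,000 tokens per request while the simple cost keeps compounding, the percentage saved climbs as the conversation gets longer - rerun the loop with `totalMessages = 200` to see it pass 80%.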
🔧 Common Issues & Solutions
❌ AI forgets important information too quickly
- Increase the window size using the slider
- Consider which information is truly essential for context
❌ Costs still too high
- Decrease the window size for more aggressive cost control
- Monitor the memory indicator to find the right balance
❌ AI seems confused about context
- Check that the memory note is being sent to the AI
- Verify the backend is logging memory stats correctly
❌ Window size not updating
- Make sure you're sending windowSize in the request body
- Check that the slider's onChange updates state properly
⚠️ Sliding Window Considerations
What Gets Forgotten:
- Early conversation context - Names, preferences mentioned at the start
- Detailed instructions - Complex setup from beginning of chat
- Historical references - Things discussed many messages ago
What Gets Remembered:
- Recent context - Last N messages in the conversation
- Current conversation flow - What you're currently discussing
- Immediate references - Things mentioned in recent messages
Best Practices:
- Important info should be repeated if the conversation is long
- Key context can be re-established when needed
- Window size should match conversation complexity
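One common way to apply the "important info should be repeated" advice in code is to pin key messages so they ride along with every window. This pattern is not part of this lesson's implementation - the `pinned` flag below is hypothetical, just a sketch of where you could take it:

```javascript
const windowSize = 4;

// Messages flagged as important survive outside the sliding window
const history = [
  { role: "user", content: "My name is Sarah", pinned: true },
  { role: "assistant", content: "Hello Sarah!" },
  { role: "user", content: "I like pizza" },
  { role: "assistant", content: "Great choice!" },
  { role: "user", content: "I love coding" },
  { role: "assistant", content: "What languages?" },
];

// Pinned messages are prepended; the recent window excludes them to avoid duplicates
const pinned = history.filter(msg => msg.pinned);
const recent = history.slice(-windowSize).filter(msg => !msg.pinned);
const contextToSend = [...pinned, ...recent];

console.log(contextToSend.length);     // 5
console.log(contextToSend[0].content); // "My name is Sarah"
```

With this variant the AI would still know the user's name at message 50, at the cost of a few extra tokens per request.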
✨ Lesson Recap
Excellent work! 🎉 You've implemented intelligent memory management that scales with conversation length.
What you've accomplished:
- 🪟 Sliding window memory - Maintains recent context while controlling costs
- 🎛️ User controls - Adjustable window size for different needs
- 📊 Visual feedback - Real-time memory usage statistics
- 💰 Cost optimization - Major savings on long conversations
- 🧠 Smart AI awareness - AI knows when it might be missing context
You now understand:
- 📚 Memory strategies - Different approaches to conversation context
- 💸 Token economics - How to balance context and cost
- 🎯 User experience - Providing control and transparency
- 🛠️ Production patterns - Scalable memory management
Your chat now provides intelligent memory management that works for both short conversations and extended discussions. The sliding window ensures costs stay predictable while maintaining relevant context!
🚀 Next: Persistent Memory & Database Storage - Let's explore memory that survives page refreshes and user sessions!