🪟 Sliding Window Memory

Simple memory works great for short conversations, but what happens when users have long chats? Costs skyrocket and you might hit token limits. Sliding window memory solves this by sending only the most recent messages, keeping costs flat while preserving recent context.

Building on: This guide assumes you’ve completed the Simple Memory Implementation. We’ll enhance that code to add sliding window functionality.


Simple Memory (grows forever):
Message 1: [1] → Send 1 message
Message 5: [1,2,3,4,5] → Send 5 messages
Message 20: [1,2,3...18,19,20] → Send 20 messages
Message 50: [1,2,3...48,49,50] → Send 50 messages (expensive!)
Token Cost: 100 → 500 → 2,000 → 5,000 tokens
// Without sliding window - sending everything:
conversationHistory = [
  { role: "user", content: "Hi, I'm Sarah" },      // Message 1
  { role: "assistant", content: "Hello Sarah!" },  // Message 2
  { role: "user", content: "I like pizza" },       // Message 3
  // ... 14 more messages ...
  { role: "user", content: "What's my name?" },    // Message 20
]
// Sends 19 messages as context = expensive!

Sliding Window (stays constant):
Window Size = 6 messages
Message 1: [1] → Send 1 message
Message 5: [1,2,3,4,5] → Send 5 messages
Message 10: [5,6,7,8,9,10] → Send last 6 messages (forgot 1-4)
Message 20: [15,16,17,18,19,20] → Send last 6 messages (forgot 1-14)
Token Cost: 100 → 500 → 600 → 600 tokens (constant!)
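
Under the hood, the window is nothing more than an array slice. Here is a minimal standalone sketch of the idea (applyWindow is just an illustrative helper name, not part of the guide's code):

// Minimal sketch: keep only the last `size` entries of the history array
function applyWindow(history, size = 6) {
  // slice(-size) returns the last `size` items;
  // if the array has fewer than `size` items, it returns the whole array
  return history.slice(-size);
}

const history = Array.from({ length: 20 }, (_, i) => `Message ${i + 1}`);
console.log(applyWindow(history)); // ["Message 15", ..., "Message 20"]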

Real Example: Same 20-Message Conversation

// With sliding window (size 6) - only recent messages:
conversationHistory = [
  // Forgot messages 1-14
  { role: "assistant", content: "Great choice!" },      // Message 15
  { role: "user", content: "I love coding" },           // Message 16
  { role: "assistant", content: "What languages?" },    // Message 17
  { role: "user", content: "JavaScript mainly" },       // Message 18
  { role: "assistant", content: "Excellent choice!" },  // Message 19
  { role: "user", content: "What's my name?" },         // Message 20
]
// Only sends 5 messages as context = much cheaper!
// But forgot your name (from message 1)

⚙️ Step 1: Enhanced Backend with Sliding Window

Let’s enhance your working memory backend to implement a sliding window, building on your existing code.

Your current memory backend accepts all conversation history. We’ll enhance it to:

  1. Accept window size parameter from frontend
  2. Slice conversation history to only recent messages
  3. Add smart context about memory limitations
  4. Provide feedback about what’s being remembered

Updated Backend Code with Highlighted Changes

// Enhanced streaming endpoint with sliding window memory
app.post("/api/chat/stream", async (req, res) => {
  try {
    // 🆕 SLIDING WINDOW ADDITION: Accept windowSize parameter
    const { message, conversationHistory = [], windowSize = 10 } = req.body;
    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }
    // Set headers for streaming (unchanged)
    res.writeHead(200, {
      'Content-Type': 'text/plain',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    });
    // 🆕 SLIDING WINDOW ADDITION: Apply sliding window to conversation history
    const recentHistory = conversationHistory.slice(-windowSize);
    // 🆕 SLIDING WINDOW ADDITION: Calculate memory stats for logging
    const totalMessages = conversationHistory.length;
    const rememberedMessages = recentHistory.length;
    const forgottenMessages = Math.max(0, totalMessages - windowSize);
    // Build context-aware message for the AI (enhanced)
    let contextualMessage = message;
    if (recentHistory.length > 0) {
      const context = recentHistory
        .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
        .join('\n');
      // 🔄 ENHANCED: Add context about memory limitations
      let memoryNote = "";
      if (forgottenMessages > 0) {
        memoryNote = `\n\nNote: This conversation started ${totalMessages} messages ago, but I only remember the last ${rememberedMessages} messages. If you reference something from earlier, I may need clarification.`;
      }
      contextualMessage = `Recent conversation:\n${context}${memoryNote}\n\nCurrent question: ${message}`;
    }
    // 🆕 SLIDING WINDOW ADDITION: Log memory usage for debugging
    console.log(`Sliding Window: Using ${rememberedMessages}/${totalMessages} messages (forgot ${forgottenMessages})`);
    // Create streaming response using Response API (unchanged)
    const stream = await openai.responses.create({
      model: "gpt-4o-mini",
      input: contextualMessage,
      stream: true,
    });
    // Stream each chunk to the frontend - handle Response API events (unchanged)
    for await (const event of stream) {
      switch (event.type) {
        case "response.output_text.delta":
          if (event.delta) {
            let textChunk = typeof event.delta === "string"
              ? event.delta
              : event.delta.text || "";
            if (textChunk) {
              res.write(textChunk);
              res.flush?.();
            }
          }
          break;
        case "text_delta":
          if (event.text) {
            res.write(event.text);
            res.flush?.();
          }
          break;
        case "response.created":
        case "response.completed":
        case "response.output_item.added":
        case "response.content_part.added":
        case "response.content_part.done":
        case "response.output_item.done":
        case "response.output_text.done":
          break;
        case "error":
          console.error("Stream error:", event.error);
          res.write("\n[Error during generation]");
          break;
      }
    }
    // Close the stream (unchanged)
    res.end();
  } catch (error) {
    console.error("OpenAI Streaming Error:", error);
    if (res.headersSent) {
      res.write("\n[Error occurred]");
      res.end();
    } else {
      res.status(500).json({
        error: "Failed to stream AI response",
        success: false,
      });
    }
  }
});

🆕 Line 4: windowSize = 10

  • What it does: Accepts window size parameter (defaults to 10 messages)
  • Why: Allows frontend to control how much memory to use

🆕 Line 15: const recentHistory = conversationHistory.slice(-windowSize)

  • What it does: Keeps only the last N messages from conversation history
  • Why: This is the core of the sliding window: it limits the context sent to the AI to the most recent messages

🆕 Lines 17-20: Memory statistics calculation

  • What it does: Tracks how many messages are remembered vs forgotten
  • Why: Useful for logging and debugging memory behavior

🔄 Lines 27-33: Enhanced context building

  • What changed: Adds memory limitation note when messages are forgotten
  • Why: Helps AI understand it might be missing context

🆕 Line 38: Memory usage logging

  • What it does: Logs memory statistics to console
  • Why: Helps developers understand sliding window behavior
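
For example, once the conversation holds 14 messages and the window is 10, the console line produced by this log statement would read:

Sliding Window: Using 10/14 messages (forgot 4)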

🔄 Step 2: Enhanced Frontend with Window Controls

Now let’s enhance your working memory frontend to add sliding window controls.

Your current memory frontend builds and sends all conversation history. We’ll enhance it to:

  1. Add window size control for users to adjust memory
  2. Send window size to backend with each request
  3. Display memory statistics showing what’s being remembered
  4. Provide visual feedback about memory usage

Add this new state to your component, right after your existing state:

function StreamingChat() {
  const [messages, setMessages] = useState([])
  const [input, setInput] = useState('')
  const [isStreaming, setIsStreaming] = useState(false)
  const abortControllerRef = useRef(null)
  // 🆕 SLIDING WINDOW ADDITION: Window size control
  const [windowSize, setWindowSize] = useState(10)

Add this helper function right after your buildConversationHistory function:

// 🆕 SLIDING WINDOW ADDITION: Function to calculate memory statistics
const getMemoryStats = () => {
  const totalMessages = messages.filter(msg => !msg.isStreaming).length
  const rememberedMessages = Math.min(totalMessages, windowSize)
  const forgottenMessages = Math.max(0, totalMessages - windowSize)
  return { totalMessages, rememberedMessages, forgottenMessages }
};

What this function does:

  • Calculates total: All completed messages in the conversation
  • Calculates remembered: How many messages fit in the current window
  • Calculates forgotten: How many messages have been forgotten
  • Returns stats: Object with all three numbers for display
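
For example, with 14 completed messages and the default window of 10, the helper would return:

getMemoryStats()
// → { totalMessages: 14, rememberedMessages: 10, forgottenMessages: 4 }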

Update your sendMessage function to include window size:

const sendMessage = async () => {
  if (!input.trim() || isStreaming) return
  const userMessage = { text: input, isUser: true, id: Date.now() }
  setMessages(prev => [...prev, userMessage])
  const currentInput = input
  setInput('')
  setIsStreaming(true)
  const aiMessageId = Date.now() + 1
  const aiMessage = { text: '', isUser: false, id: aiMessageId, isStreaming: true }
  setMessages(prev => [...prev, aiMessage])
  try {
    const conversationHistory = buildConversationHistory(messages)
    abortControllerRef.current = new AbortController()
    const response = await fetch('http://localhost:8000/api/chat/stream', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        message: currentInput,
        conversationHistory: conversationHistory,
        windowSize: windowSize // 🆕 SLIDING WINDOW ADDITION: Include window size
      }),
      signal: abortControllerRef.current.signal,
    })
    // ... rest of your existing sendMessage code (unchanged) ...
  } catch (error) {
    // ... your existing error handling (unchanged) ...
  } finally {
    setIsStreaming(false)
    abortControllerRef.current = null
  }
}

Key change:

  • Line 25: windowSize: windowSize - Sends the current window size to backend

Here’s your complete component with sliding window additions highlighted:

import { useState, useRef } from 'react'
import { Send, Bot, User } from 'lucide-react'
function StreamingChat() {
  const [messages, setMessages] = useState([])
  const [input, setInput] = useState('')
  const [isStreaming, setIsStreaming] = useState(false)
  const abortControllerRef = useRef(null)
  // 🆕 SLIDING WINDOW ADDITION: Window size control
  const [windowSize, setWindowSize] = useState(10)
  // Function to build conversation history (unchanged from simple memory)
  const buildConversationHistory = (messages) => {
    return messages
      .filter(msg => !msg.isStreaming)
      .map(msg => ({
        role: msg.isUser ? "user" : "assistant",
        content: msg.text
      }));
  };
  // 🆕 SLIDING WINDOW ADDITION: Function to calculate memory statistics
  const getMemoryStats = () => {
    const totalMessages = messages.filter(msg => !msg.isStreaming).length
    const rememberedMessages = Math.min(totalMessages, windowSize)
    const forgottenMessages = Math.max(0, totalMessages - windowSize)
    return { totalMessages, rememberedMessages, forgottenMessages }
  };
  const sendMessage = async () => {
    if (!input.trim() || isStreaming) return
    const userMessage = { text: input, isUser: true, id: Date.now() }
    setMessages(prev => [...prev, userMessage])
    const currentInput = input
    setInput('')
    setIsStreaming(true)
    const aiMessageId = Date.now() + 1
    const aiMessage = { text: '', isUser: false, id: aiMessageId, isStreaming: true }
    setMessages(prev => [...prev, aiMessage])
    try {
      const conversationHistory = buildConversationHistory(messages)
      abortControllerRef.current = new AbortController()
      const response = await fetch('http://localhost:8000/api/chat/stream', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          message: currentInput,
          conversationHistory: conversationHistory,
          windowSize: windowSize // 🆕 SLIDING WINDOW ADDITION: Include window size
        }),
        signal: abortControllerRef.current.signal,
      })
      if (!response.ok) {
        throw new Error('Failed to get response')
      }
      const reader = response.body.getReader()
      const decoder = new TextDecoder()
      while (true) {
        const { done, value } = await reader.read()
        if (done) break
        const chunk = decoder.decode(value, { stream: true })
        setMessages(prev =>
          prev.map(msg =>
            msg.id === aiMessageId
              ? { ...msg, text: msg.text + chunk }
              : msg
          )
        )
      }
      setMessages(prev =>
        prev.map(msg =>
          msg.id === aiMessageId
            ? { ...msg, isStreaming: false }
            : msg
        )
      )
    } catch (error) {
      if (error.name === 'AbortError') {
        console.log('Request was cancelled')
      } else {
        console.error('Streaming error:', error)
        setMessages(prev =>
          prev.map(msg =>
            msg.id === aiMessageId
              ? { ...msg, text: 'Sorry, something went wrong.', isStreaming: false }
              : msg
          )
        )
      }
    } finally {
      setIsStreaming(false)
      abortControllerRef.current = null
    }
  }
  const handleKeyPress = (e) => {
    if (e.key === 'Enter' && !e.shiftKey && !isStreaming) {
      e.preventDefault()
      sendMessage()
    }
  }
  const stopStreaming = () => {
    if (abortControllerRef.current) {
      abortControllerRef.current.abort()
    }
  }
  return (
    <div className="min-h-screen bg-gray-100 flex items-center justify-center p-4">
      <div className="bg-white rounded-lg shadow-lg w-full max-w-2xl h-[600px] flex flex-col">
        {/* 🔄 ENHANCED: Header with sliding window controls */}
        <div className="bg-blue-500 text-white p-4 rounded-t-lg">
          <div className="flex justify-between items-center">
            <div>
              <h1 className="text-xl font-bold">Streaming AI Chat with Sliding Window</h1>
              <p className="text-blue-100">Memory window: {windowSize} messages</p>
            </div>
            <div className="text-right">
              <label className="block text-sm text-blue-100 mb-1">
                Memory Window
              </label>
              <input
                type="range"
                min="5"
                max="50"
                value={windowSize}
                onChange={(e) => setWindowSize(parseInt(e.target.value))}
                className="w-24"
                disabled={isStreaming}
              />
              <div className="text-xs text-blue-200 mt-1">{windowSize} msgs</div>
            </div>
          </div>
        </div>
        {/* 🆕 SLIDING WINDOW ADDITION: Memory usage indicator */}
        <div className="bg-gray-50 px-4 py-2 border-b">
          {(() => {
            const { totalMessages, rememberedMessages, forgottenMessages } = getMemoryStats()
            return (
              <div className="flex justify-between text-sm text-gray-600">
                <span>
                  📊 Total: {totalMessages} messages
                </span>
                <span>
                  🧠 Remembering: {rememberedMessages} messages
                </span>
                {forgottenMessages > 0 && (
                  <span className="text-orange-600">
                    💭 Forgotten: {forgottenMessages} messages
                  </span>
                )}
              </div>
            )
          })()}
        </div>
        {/* Messages (unchanged) */}
        <div className="flex-1 overflow-y-auto p-4 space-y-4">
          {messages.length === 0 && (
            <div className="text-center text-gray-500 mt-20">
              <Bot className="w-12 h-12 mx-auto mb-4 text-gray-400" />
              <p>Send a message to see streaming and sliding window memory in action!</p>
            </div>
          )}
          {messages.map((message) => (
            <div
              key={message.id}
              className={`flex items-start space-x-3 ${
                message.isUser ? 'justify-end' : 'justify-start'
              }`}
            >
              {!message.isUser && (
                <div className="bg-blue-500 p-2 rounded-full">
                  <Bot className="w-4 h-4 text-white" />
                </div>
              )}
              <div
                className={`max-w-xs lg:max-w-md px-4 py-2 rounded-lg ${
                  message.isUser
                    ? 'bg-blue-500 text-white'
                    : 'bg-gray-200 text-gray-800'
                }`}
              >
                {message.text}
                {message.isStreaming && (
                  <span className="inline-block w-2 h-4 bg-blue-500 ml-1 animate-pulse" />
                )}
              </div>
              {message.isUser && (
                <div className="bg-gray-500 p-2 rounded-full">
                  <User className="w-4 h-4 text-white" />
                </div>
              )}
            </div>
          ))}
        </div>
        {/* Input (unchanged) */}
        <div className="border-t p-4">
          <div className="flex space-x-2">
            <input
              type="text"
              value={input}
              onChange={(e) => setInput(e.target.value)}
              onKeyPress={handleKeyPress}
              placeholder="Type your message..."
              className="flex-1 border border-gray-300 rounded-lg px-4 py-2 focus:outline-none focus:ring-2 focus:ring-blue-500"
              disabled={isStreaming}
            />
            {isStreaming ? (
              <button
                onClick={stopStreaming}
                className="bg-red-500 hover:bg-red-600 text-white px-4 py-2 rounded-lg transition-colors"
              >
                Stop
              </button>
            ) : (
              <button
                onClick={sendMessage}
                disabled={!input.trim()}
                className="bg-blue-500 hover:bg-blue-600 disabled:bg-gray-300 text-white p-2 rounded-lg transition-colors"
              >
                <Send className="w-5 h-5" />
              </button>
            )}
          </div>
        </div>
      </div>
    </div>
  )
}
export default StreamingChat

🆕 Line 9: const [windowSize, setWindowSize] = useState(10)

  • What it does: Adds state to control sliding window size
  • Why: Users can adjust how much memory the AI has

🆕 Lines 23-30: getMemoryStats function

  • What it does: Calculates memory usage statistics
  • Why: Shows users what’s being remembered vs forgotten

🆕 Line 55: windowSize: windowSize in request body

  • What it does: Sends window size to backend
  • Why: Backend needs this to apply the sliding window

🔄 Lines 125-143: Enhanced header with slider control

  • What changed: Added range input to control window size
  • Why: Gives users control over memory vs cost tradeoff

🆕 Lines 146-162: Memory usage indicator

  • What it does: Shows real-time memory statistics
  • Why: Visual feedback about what’s being remembered

🧪 Test Your Sliding Window Memory

  1. Start both servers (backend and frontend)
  2. Open your enhanced streaming chat
  3. Set window size to 5 using the slider
  4. Have this test conversation:
Messages 1-2
You: "My name is Sarah and I'm 25 years old"
AI: "Nice to meet you, Sarah! It's great to know you're 25."
Messages 3-4 (filling the window)
You: "I love pizza"
AI: "Pizza is delicious! What's your favorite topping?"
Messages 5-6 (window almost full)
You: "I like pepperoni"
AI: "Great choice! Pepperoni is a classic."
Messages 7-8 (window full, the oldest messages start dropping)
You: "I work as a developer"
AI: "That's awesome! What kind of development do you do?"
Messages 9-10 (the earliest messages, including your name, are now forgotten)
You: "I code in JavaScript"
AI: "JavaScript is great! Very versatile language."
Message 11 (test the memory limitation)
You: "What's my name and age?"
AI: "I don't see that information in our recent conversation. Could you remind me what your name and age are?"

Watch the memory indicator as you chat:

  • Total messages increases with each message
  • Remembering stops growing at your window size
  • Forgotten appears and grows when window is exceeded

Try adjusting the window size mid-conversation and watch how it changes what the AI remembers!


Window Size 5: [🧠🧠🧠🧠🧠] → ~500 tokens per message
Window Size 10: [🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠] → ~1,000 tokens per message
Window Size 20: [🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠] → ~2,000 tokens per message

Small Window (5-8 messages)

  • Pros: Very low costs, fast responses
  • Cons: Forgets context quickly, may need frequent clarification
  • 🎯 Best for: Simple Q&A, customer service, cost-sensitive applications

Medium Window (10-15 messages)

  • Pros: Good balance of cost and context retention
  • Cons: May lose important context in longer discussions
  • 🎯 Best for: Most applications, general chat, moderate conversations

Large Window (20-50 messages)

  • Pros: Maintains lots of context, handles complex discussions
  • Cons: Higher costs, slower responses, more tokens per message
  • 🎯 Best for: Complex problem-solving, detailed analysis, premium features
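
If you prefer named presets over a raw slider, one option is a small lookup table that feeds setWindowSize. This is only an illustration; the preset names and values below are not part of the guide's code:

// Hypothetical window-size presets for different use cases
const WINDOW_PRESETS = {
  compact: 6,    // simple Q&A, cost-sensitive apps
  balanced: 12,  // general chat
  deep: 30,      // complex, long-running discussions
};

// e.g. <button onClick={() => setWindowSize(WINDOW_PRESETS.balanced)}>Balanced</button>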

Without Sliding Window (Simple Memory):

Message 1: [1] → 100 tokens
Message 10: [1,2,3,4,5,6,7,8,9,10] → 1,000 tokens
Message 25: [1,2,3...23,24,25] → 2,500 tokens
Message 50: [1,2,3...48,49,50] → 5,000 tokens
Total Cost: ~125,000 tokens 💸💸💸

With Sliding Window (Size 10):

Message 1: [1] → 100 tokens
Message 10: [1,2,3,4,5,6,7,8,9,10] → 1,000 tokens
Message 25: [16,17,18,19,20,21,22,23,24,25] → 1,000 tokens (capped!)
Message 50: [41,42,43,44,45,46,47,48,49,50] → 1,000 tokens (capped!)
Total Cost: ~45,000 tokens 💰 (~65% savings!)
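
These totals are rough estimates that assume each message adds about 100 tokens of context. A quick sketch of the arithmetic behind them:

// Rough cost model: every message in the context window costs ~100 tokens
const TOKENS_PER_MESSAGE = 100;

function estimateTotalTokens(messageCount, windowSize = Infinity) {
  let total = 0;
  for (let n = 1; n <= messageCount; n++) {
    // request n sends at most `windowSize` messages (or all n, if fewer)
    total += Math.min(n, windowSize) * TOKENS_PER_MESSAGE;
  }
  return total;
}

console.log(estimateTotalTokens(50));     // ≈ 127,500 tokens without a window
console.log(estimateTotalTokens(50, 10)); // ≈ 45,500 tokens with a window of 10

The gap keeps widening as the conversation grows: at 100 messages the same model gives roughly 505,000 versus 95,500 tokens, which is where savings in the 80% range come from.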

Apply the same sliding window logic to your normal chat endpoint:

// Normal chat with sliding window
app.post("/api/chat", async (req, res) => {
  try {
    // 🆕 SLIDING WINDOW ADDITION: Accept windowSize parameter
    const { message, conversationHistory = [], windowSize = 10 } = req.body;
    // 🆕 SLIDING WINDOW ADDITION: Apply sliding window
    const recentHistory = conversationHistory.slice(-windowSize);
    // Build context-aware message (same logic as streaming)
    let contextualMessage = message;
    if (recentHistory.length > 0) {
      const context = recentHistory
        .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
        .join('\n');
      const totalMessages = conversationHistory.length;
      const forgottenMessages = Math.max(0, totalMessages - windowSize);
      let memoryNote = "";
      if (forgottenMessages > 0) {
        memoryNote = `\n\nNote: I only remember the last ${windowSize} messages of our ${totalMessages}-message conversation.`;
      }
      contextualMessage = `Recent conversation:\n${context}${memoryNote}\n\nCurrent question: ${message}`;
    }
    const response = await openai.responses.create({
      model: "gpt-4o-mini",
      input: contextualMessage,
    });
    res.json({
      response: response.output_text,
      success: true,
    });
  } catch (error) {
    console.error("OpenAI API Error:", error);
    res.status(500).json({
      error: "Failed to get AI response",
      success: false,
    });
  }
});
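
Clients call this endpoint the same way as the streaming one, passing windowSize alongside the history. A minimal fetch sketch with example data (assumes it runs inside an async function):

// Example request to the non-streaming endpoint (illustrative data)
const res = await fetch('http://localhost:8000/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    message: "What's my name?",
    conversationHistory: [
      { role: 'user', content: "Hi, I'm Sarah" },
      { role: 'assistant', content: 'Hello Sarah!' },
    ],
    windowSize: 10,
  }),
});
const data = await res.json();
console.log(data.response);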

Your enhanced sliding window memory system now provides:

  • Controlled costs - Token usage stays constant regardless of conversation length
  • Recent context - Maintains the most relevant recent conversation
  • User control - Adjustable window size for different use cases
  • Visual feedback - Real-time memory usage statistics
  • Memory awareness - AI knows when it might be missing context
  • Graceful degradation - Handles memory limitations intelligently
  • Cost optimization - Up to 80% cost savings on long conversations
  • Production ready - Scales to unlimited conversation length
  • Transparent memory - Users see exactly what’s being remembered
  • Adjustable settings - Real-time window size control
  • Smart responses - AI asks for clarification when needed
  • Visual indicators - Clear memory status display

Perfect for production applications that need predictable costs while maintaining conversation quality! 🚀

Next Steps: Ready to explore even more advanced memory strategies like conversation summarization and persistent storage? Let’s build those next!