
πŸ“ Summary Memory Implementation

Your simple memory works great for short conversations, but what happens when chats get really long? Costs skyrocket and you might hit token limits! πŸ’Έ

Summary memory solves this by keeping recent messages intact while summarizing older parts of the conversation. This maintains full context while controlling token costs - the best of both worlds.

Building on: This assumes you’ve completed the Simple Memory Implementation. We’ll enhance that code to add intelligent summarization.


Cost Growth with Simple Memory:

Simple Memory (sends everything):
Message 10: [1,2,3,4,5,6,7,8,9,10] β†’ 1,000 tokens
Message 25: [1,2,3...23,24,25] β†’ 2,500 tokens
Message 50: [1,2,3...48,49,50] β†’ 5,000 tokens
Message 100: [1,2,3...98,99,100] β†’ 10,000 tokens
Total Cost: ~505,000 tokens πŸ’ΈπŸ’ΈπŸ’Έ (the total grows quadratically!)

How Summary Memory Solves This:

Summary Memory (summarize old + keep recent):
Message 25: Create summary of messages 1-15 + keep messages 16-25
Message 50: Update summary (1-35) + keep messages 36-50
Message 100: Update summary (1-85) + keep messages 86-100
Total Cost: ~160,000 tokens πŸ’° (roughly 70% savings!)
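
To make these numbers concrete, here’s a tiny cost model you can run in Node. The constants (~100 tokens per message, a ~200-token summary, a threshold of 25 messages, a recent window of 15) are illustrative assumptions for this example, not measurements:

// Illustrative cost model - all constants are assumptions for this example
const TOKENS_PER_MESSAGE = 100;
const SUMMARY_TOKENS = 200;
const THRESHOLD = 25; // summarization kicks in at this history length
const WINDOW = 15;    // recent messages kept verbatim afterwards

function simpleMemoryCost(totalMessages) {
  // Every request resends the whole history, so the total grows quadratically
  let total = 0;
  for (let i = 1; i <= totalMessages; i++) total += i * TOKENS_PER_MESSAGE;
  return total;
}

function summaryMemoryCost(totalMessages) {
  // After the threshold, each request sends only summary + recent window
  let total = 0;
  for (let i = 1; i <= totalMessages; i++) {
    total += i < THRESHOLD
      ? i * TOKENS_PER_MESSAGE
      : SUMMARY_TOKENS + WINDOW * TOKENS_PER_MESSAGE;
  }
  return total;
}

console.log(simpleMemoryCost(100));  // 505000
console.log(summaryMemoryCost(100)); // 159200 - roughly 70% less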

Visual Comparison:

// Simple Memory: Everything grows
conversationHistory = [msg1, msg2, msg3, ..., msg100] // All 100 messages
// Summary Memory: Smart optimization
summary = "User discussed React app setup, chose Firebase auth, implemented user login..."
recentMessages = [msg85, msg86, msg87, ..., msg100] // Last 15 messages
// Send: summary + recent messages (much more efficient!)

Why Summary Memory is Better:

  • βœ… Keeps context from old messages via summary (unlike sliding window)
  • βœ… Controls costs by limiting total tokens (unlike simple memory)
  • βœ… AI remembers your name from message 1 even at message 100
  • βœ… Stays fast with background summarization

πŸ› οΈ Step 1: Add Summarization Endpoint to Backend

Section titled β€œπŸ› οΈ Step 1: Add Summarization Endpoint to Backend”

Let’s enhance your memory backend by adding a dedicated summary endpoint. We’ll build on your existing streaming implementation.

Add this new endpoint to your index.js, right after your existing chat endpoints:

// πŸ†• SUMMARY MEMORY: Dedicated summarization endpoint
app.post("/api/summarize", async (req, res) => {
  try {
    const { messages, conversationType = 'general' } = req.body;

    if (!messages || messages.length === 0) {
      return res.status(400).json({ error: "Messages are required" });
    }

    // Summary instructions for different conversation types
    const summaryInstructions = {
      technical: "Create a technical summary focusing on technologies discussed, decisions made, code examples covered, and implementation details. Preserve specific technical context.",
      creative: "Summarize the creative process including ideas generated, concepts explored, and creative directions chosen. Maintain the creative flow context.",
      support: "Summarize the support conversation including the user's issue, troubleshooting steps attempted, solutions provided, and current status.",
      general: "Create a conversational summary capturing key topics, decisions, and important context for continuing the discussion naturally."
    };

    const instruction = summaryInstructions[conversationType] || summaryInstructions.general;

    // Build context-aware message for the AI
    const conversationText = messages
      .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
      .join('\n\n');

    // Add summarization instructions
    const contextualMessage = `You are a conversation summarizer. ${instruction} Keep it concise but comprehensive enough to maintain conversation continuity.\n\nConversation to summarize:\n${conversationText}`;

    console.log(`Creating summary for ${messages.length} messages`);

    // Create response using the Responses API
    const response = await openai.responses.create({
      model: "gpt-4o-mini",
      input: contextualMessage,
    });

    // Return results
    res.json({
      summary: response.output_text,
      messagesCount: messages.length,
      conversationType: conversationType,
      success: true,
    });
  } catch (error) {
    console.error("Summarization Error:", error);
    res.status(500).json({
      error: "Failed to create summary",
      success: false,
    });
  }
});

What this endpoint does:

  • Creates intelligent summaries of conversation history
  • Handles different conversation types (technical, creative, support, general)
  • Runs separately from chat responses to keep them fast
  • Returns structured data for frontend integration

πŸ”„ Step 2: Enhanced Chat Endpoint for Summary Support

Update your existing /api/chat/stream endpoint to handle summaries:

// πŸ”„ ENHANCED: Updated streaming endpoint with summary support
app.post("/api/chat/stream", async (req, res) => {
  try {
    const {
      message,
      conversationHistory = [],
      summary = null,
      recentWindowSize = 15
    } = req.body;

    if (!message) {
      return res.status(400).json({ error: "Message is required" });
    }

    // Set headers for streaming
    res.writeHead(200, {
      'Content-Type': 'text/plain',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    });

    // πŸ†• SUMMARY MEMORY: Build smart context with summary
    let contextualMessage = message;

    // If we have a summary, use it + recent messages for context
    if (summary && conversationHistory.length > 0) {
      const recentMessages = conversationHistory.slice(-recentWindowSize);
      const recentContext = recentMessages
        .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
        .join('\n');
      contextualMessage = `Previous conversation summary:\n${summary}\n\nRecent conversation:\n${recentContext}\n\nCurrent question: ${message}`;
    }
    // If no summary but we have conversation history, use all of it (Simple Memory fallback)
    else if (conversationHistory.length > 0) {
      const context = conversationHistory
        .map(msg => `${msg.role === 'user' ? 'User' : 'Assistant'}: ${msg.content}`)
        .join('\n');
      contextualMessage = `Previous conversation:\n${context}\n\nCurrent question: ${message}`;
    }

    // Create streaming response using the Responses API
    const stream = await openai.responses.create({
      model: "gpt-4o-mini",
      input: contextualMessage,
      stream: true,
    });

    // Stream each chunk to the frontend (event.delta is the text chunk for output_text delta events)
    for await (const event of stream) {
      if (event.type === "response.output_text.delta" && event.delta) {
        res.write(event.delta);
        res.flush?.();
      }
    }
    res.end();
  } catch (error) {
    console.error("Streaming Error:", error);
    if (res.headersSent) {
      res.write("\n[Error occurred]");
      res.end();
    } else {
      res.status(500).json({ error: "Failed to stream response" });
    }
  }
});

What the enhanced endpoint does:

  • Accepts summary parameter from frontend
  • Smart context building - uses summary + recent messages when available
  • Fallback support - still works with simple memory if no summary exists
  • Maintains streaming - no performance impact on chat responses

🧠 Step 3: Add Summary State and Logic to Frontend

Now let’s enhance your memory frontend to add intelligent summarization.

Add this new state to your component, right after your existing state:

// πŸ†• SUMMARY MEMORY: Summary-specific state
const [summary, setSummary] = useState(null)
const [recentWindowSize, setRecentWindowSize] = useState(15)
const [summaryThreshold, setSummaryThreshold] = useState(25)
const [isCreatingSummary, setIsCreatingSummary] = useState(false)
const [conversationType, setConversationType] = useState('general')

Add these helper functions right after your buildConversationHistory function:

// πŸ†• SUMMARY MEMORY: Detect conversation type automatically
const detectConversationType = (messages) => {
  const recentText = messages.slice(-10).map(m => m.text).join(' ').toLowerCase();
  if (recentText.includes('function') || recentText.includes('code') || recentText.includes('api')) {
    return 'technical';
  } else if (recentText.includes('create') || recentText.includes('idea') || recentText.includes('design')) {
    return 'creative';
  } else if (recentText.includes('problem') || recentText.includes('error') || recentText.includes('help')) {
    return 'support';
  }
  return 'general';
};

// πŸ†• SUMMARY MEMORY: Create summary with intelligent timing
const createSummary = async (messagesToSummarize) => {
  if (isCreatingSummary) return; // Prevent multiple simultaneous summaries
  try {
    setIsCreatingSummary(true);

    // Detect conversation type for better summaries
    const detectedType = detectConversationType(messages);

    const response = await fetch('http://localhost:8000/api/summarize', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        messages: messagesToSummarize,
        conversationType: detectedType
      }),
    });

    const data = await response.json();
    if (data.success) {
      setSummary(data.summary);
      setConversationType(data.conversationType);
      console.log(`Summary created: ${data.messagesCount} messages summarized as ${data.conversationType}`);
    }
  } catch (error) {
    console.error("Failed to create summary:", error);
  } finally {
    setIsCreatingSummary(false);
  }
};

// πŸ†• SUMMARY MEMORY: Smart summary triggers
const shouldCreateSummary = (conversationHistory) => {
  return conversationHistory.length >= summaryThreshold && !summary;
};

const shouldUpdateSummary = (conversationHistory) => {
  return conversationHistory.length >= summaryThreshold * 2 && summary;
};

const isGoodTimeToSummarize = (conversationHistory) => {
  const recentMessages = conversationHistory.slice(-3);
  // Check if we're in the middle of a complex topic
  const hasCodeDiscussion = recentMessages.some(msg =>
    msg.content.includes('```') || msg.content.includes('function'));
  const hasFollowUp = recentMessages.some(msg =>
    msg.content.toLowerCase().includes('can you explain') ||
    msg.content.toLowerCase().includes('tell me more') ||
    msg.content.toLowerCase().includes('what about'));
  return !hasCodeDiscussion && !hasFollowUp;
};

// πŸ†• SUMMARY MEMORY: Calculate memory statistics
const getMemoryStats = () => {
  const totalMessages = messages.filter(msg => !msg.isStreaming).length
  const recentMessages = Math.min(totalMessages, recentWindowSize)
  const summarizedMessages = Math.max(0, totalMessages - recentWindowSize)
  return { totalMessages, recentMessages, summarizedMessages }
};

// πŸ†• SUMMARY MEMORY: Manual summary trigger
const triggerManualSummary = async () => {
  const conversationHistory = buildConversationHistory(messages);
  // There must be more messages than the recent window, or there is nothing to summarize
  if (conversationHistory.length > recentWindowSize) {
    const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
    await createSummary(messagesToSummarize);
  }
};

Find this part of your sendMessage function:

const response = await fetch('http://localhost:8000/api/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    message: currentInput,
    conversationHistory: conversationHistory
  }),
  signal: abortControllerRef.current.signal,
})

Replace it with this enhanced version:

// πŸ†• SUMMARY MEMORY: Smart summary timing - happens in background
if (shouldCreateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
  const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
  createSummary(messagesToSummarize); // No await - background process
} else if (shouldUpdateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
  const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
  createSummary(messagesToSummarize); // No await - background process
}

const response = await fetch('http://localhost:8000/api/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    message: currentInput,
    conversationHistory: conversationHistory,
    summary: summary, // πŸ†• SUMMARY MEMORY: Include summary
    recentWindowSize: recentWindowSize // πŸ†• SUMMARY MEMORY: Include window size
  }),
  signal: abortControllerRef.current.signal,
})

What this change does:

  • Background summarization - doesn’t block chat responses
  • Intelligent timing - waits for natural conversation breaks
  • Sends summary data - includes summary and window size for smart context building

Here’s your complete src/App.jsx with summary memory functionality integrated:

import { useState, useRef } from 'react'
import { Send, Bot, User } from 'lucide-react'
import ReactMarkdown from 'react-markdown'

// Component: Handles message content with markdown formatting
function MessageContent({ message }) {
  if (message.isUser) {
    return (
      <p className="text-sm leading-relaxed whitespace-pre-wrap">
        {message.text}
        {message.isStreaming && (
          <span className="inline-block w-2 h-4 bg-blue-500 ml-1 animate-pulse" />
        )}
      </p>
    )
  }

  return (
    <div className="text-sm leading-relaxed">
      <ReactMarkdown
        components={{
          h1: ({children}) => <h1 className="text-lg font-bold mb-2 text-slate-800">{children}</h1>,
          h2: ({children}) => <h2 className="text-base font-bold mb-2 text-slate-800">{children}</h2>,
          h3: ({children}) => <h3 className="text-sm font-bold mb-1 text-slate-800">{children}</h3>,
          p: ({children}) => <p className="mb-2 last:mb-0 text-slate-700">{children}</p>,
          ul: ({children}) => <ul className="list-disc list-inside mb-2 space-y-1">{children}</ul>,
          ol: ({children}) => <ol className="list-decimal list-inside mb-2 space-y-1">{children}</ol>,
          li: ({children}) => <li className="text-slate-700">{children}</li>,
          code: ({inline, children}) => {
            const copyToClipboard = (text) => {
              navigator.clipboard.writeText(text)
            }
            if (inline) {
              return (
                <code className="bg-slate-100 text-red-600 px-1.5 py-0.5 rounded text-xs font-mono border">
                  {children}
                </code>
              )
            }
            return (
              <div className="relative group mb-2">
                <code className="block bg-gray-900 text-green-400 p-4 rounded-lg text-xs font-mono overflow-x-auto whitespace-pre border-l-4 border-blue-400 shadow-sm">
                  {children}
                </code>
                <button
                  onClick={() => copyToClipboard(children)}
                  className="absolute top-2 right-2 bg-slate-600 hover:bg-slate-500 text-white px-2 py-1 rounded text-xs opacity-0 group-hover:opacity-100 transition-opacity"
                >
                  Copy
                </button>
              </div>
            )
          },
          pre: ({children}) => <div className="mb-2">{children}</div>,
          strong: ({children}) => <strong className="font-semibold text-slate-800">{children}</strong>,
          em: ({children}) => <em className="italic text-slate-700">{children}</em>,
          blockquote: ({children}) => (
            <blockquote className="border-l-4 border-blue-200 pl-4 italic text-slate-600 mb-2">
              {children}
            </blockquote>
          ),
          a: ({href, children}) => (
            <a href={href} className="text-blue-600 hover:text-blue-800 underline" target="_blank" rel="noopener noreferrer">
              {children}
            </a>
          ),
        }}
      >
        {message.text}
      </ReactMarkdown>
      {message.isStreaming && (
        <span className="inline-block w-2 h-4 bg-blue-500 ml-1 animate-pulse" />
      )}
    </div>
  )
}
function App() {
  // State management
  const [messages, setMessages] = useState([])
  const [input, setInput] = useState('')
  const [isStreaming, setIsStreaming] = useState(false)
  const abortControllerRef = useRef(null)

  // πŸ†• SUMMARY MEMORY: Summary-specific state
  const [summary, setSummary] = useState(null)
  const [recentWindowSize, setRecentWindowSize] = useState(15)
  const [summaryThreshold, setSummaryThreshold] = useState(25)
  const [isCreatingSummary, setIsCreatingSummary] = useState(false)
  const [conversationType, setConversationType] = useState('general')

  // MEMORY: Function to build conversation history
  const buildConversationHistory = (messages) => {
    return messages
      .filter(msg => !msg.isStreaming)
      .map(msg => ({
        role: msg.isUser ? "user" : "assistant",
        content: msg.text
      }));
  };

  // πŸ†• SUMMARY MEMORY: Detect conversation type automatically
  const detectConversationType = (messages) => {
    const recentText = messages.slice(-10).map(m => m.text).join(' ').toLowerCase();
    if (recentText.includes('function') || recentText.includes('code') || recentText.includes('api')) {
      return 'technical';
    } else if (recentText.includes('create') || recentText.includes('idea') || recentText.includes('design')) {
      return 'creative';
    } else if (recentText.includes('problem') || recentText.includes('error') || recentText.includes('help')) {
      return 'support';
    }
    return 'general';
  };

  // πŸ†• SUMMARY MEMORY: Create summary with intelligent timing
  const createSummary = async (messagesToSummarize) => {
    if (isCreatingSummary) return; // Prevent multiple simultaneous summaries
    try {
      setIsCreatingSummary(true);
      // Detect conversation type for better summaries
      const detectedType = detectConversationType(messages);
      const response = await fetch('http://localhost:8000/api/summarize', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          messages: messagesToSummarize,
          conversationType: detectedType
        }),
      });
      const data = await response.json();
      if (data.success) {
        setSummary(data.summary);
        setConversationType(data.conversationType);
        console.log(`Summary created: ${data.messagesCount} messages summarized as ${data.conversationType}`);
      }
    } catch (error) {
      console.error("Failed to create summary:", error);
    } finally {
      setIsCreatingSummary(false);
    }
  };

  // πŸ†• SUMMARY MEMORY: Smart summary triggers
  const shouldCreateSummary = (conversationHistory) => {
    return conversationHistory.length >= summaryThreshold && !summary;
  };

  const shouldUpdateSummary = (conversationHistory) => {
    return conversationHistory.length >= summaryThreshold * 2 && summary;
  };

  const isGoodTimeToSummarize = (conversationHistory) => {
    const recentMessages = conversationHistory.slice(-3);
    // Check if we're in the middle of a complex topic
    const hasCodeDiscussion = recentMessages.some(msg =>
      msg.content.includes('```') || msg.content.includes('function'));
    const hasFollowUp = recentMessages.some(msg =>
      msg.content.toLowerCase().includes('can you explain') ||
      msg.content.toLowerCase().includes('tell me more') ||
      msg.content.toLowerCase().includes('what about'));
    return !hasCodeDiscussion && !hasFollowUp;
  };

  // πŸ†• SUMMARY MEMORY: Calculate memory statistics
  const getMemoryStats = () => {
    const totalMessages = messages.filter(msg => !msg.isStreaming).length
    const recentMessages = Math.min(totalMessages, recentWindowSize)
    const summarizedMessages = Math.max(0, totalMessages - recentWindowSize)
    return { totalMessages, recentMessages, summarizedMessages }
  };

  // πŸ†• SUMMARY MEMORY: Manual summary trigger
  const triggerManualSummary = async () => {
    const conversationHistory = buildConversationHistory(messages);
    // There must be more messages than the recent window, or there is nothing to summarize
    if (conversationHistory.length > recentWindowSize) {
      const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
      await createSummary(messagesToSummarize);
    }
  };

  // Helper functions (same as before)
  const createAiPlaceholder = () => {
    const aiMessageId = Date.now() + 1
    const aiMessage = {
      text: "",
      isUser: false,
      id: aiMessageId,
      isStreaming: true,
    }
    setMessages(prev => [...prev, aiMessage])
    return aiMessageId
  }

  const readStream = async (response, aiMessageId) => {
    const reader = response.body.getReader()
    const decoder = new TextDecoder()
    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      const chunk = decoder.decode(value, { stream: true })
      setMessages(prev =>
        prev.map(msg =>
          msg.id === aiMessageId
            ? { ...msg, text: msg.text + chunk }
            : msg
        )
      )
    }
  }
  const sendMessage = async () => {
    if (!input.trim() || isStreaming) return

    const userMessage = { text: input.trim(), isUser: true, id: Date.now() }
    setMessages(prev => [...prev, userMessage])
    const currentInput = input
    setInput('')
    setIsStreaming(true)

    const aiMessageId = createAiPlaceholder()

    try {
      // Build conversation history from current messages
      const conversationHistory = buildConversationHistory(messages)

      // πŸ†• SUMMARY MEMORY: Smart summary timing - happens in background
      if (shouldCreateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
        const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
        createSummary(messagesToSummarize); // No await - background process
      } else if (shouldUpdateSummary(conversationHistory) && isGoodTimeToSummarize(conversationHistory)) {
        const messagesToSummarize = conversationHistory.slice(0, -recentWindowSize);
        createSummary(messagesToSummarize); // No await - background process
      }

      abortControllerRef.current = new AbortController()

      const response = await fetch('http://localhost:8000/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          message: currentInput,
          conversationHistory: conversationHistory,
          summary: summary, // πŸ†• SUMMARY MEMORY: Include summary
          recentWindowSize: recentWindowSize // πŸ†• SUMMARY MEMORY: Include window size
        }),
        signal: abortControllerRef.current.signal,
      })

      if (!response.ok) throw new Error('Failed to get response')

      await readStream(response, aiMessageId)

      setMessages(prev =>
        prev.map(msg =>
          msg.id === aiMessageId ? { ...msg, isStreaming: false } : msg
        )
      )
    } catch (error) {
      if (error.name !== 'AbortError') {
        console.error('Streaming error:', error)
        setMessages(prev =>
          prev.map(msg =>
            msg.id === aiMessageId
              ? { ...msg, text: 'Sorry, something went wrong.', isStreaming: false }
              : msg
          )
        )
      }
    } finally {
      setIsStreaming(false)
      abortControllerRef.current = null
    }
  }

  const stopStreaming = () => {
    if (abortControllerRef.current) {
      abortControllerRef.current.abort()
    }
  }

  const handleKeyPress = (e) => {
    if (e.key === 'Enter' && !e.shiftKey && !isStreaming) {
      e.preventDefault()
      sendMessage()
    }
  }
  return (
    <div className="min-h-screen bg-gradient-to-br from-slate-100 to-blue-50 flex items-center justify-center p-4">
      <div className="bg-white rounded-2xl shadow-2xl w-full max-w-2xl h-[700px] flex flex-col overflow-hidden">
        {/* Header with summary controls */}
        <div className="bg-gradient-to-r from-blue-600 to-indigo-600 text-white p-6">
          <div className="flex justify-between items-start">
            <div>
              <h1 className="text-xl font-bold">⚑ Streaming AI Chat</h1>
              <p className="text-blue-100 text-sm">Smart summary memory!</p>
            </div>
            <div className="text-right space-y-2">
              <div>
                <label className="block text-xs text-blue-100">Recent: {recentWindowSize}</label>
                <input
                  type="range" min="5" max="30" value={recentWindowSize}
                  onChange={(e) => setRecentWindowSize(parseInt(e.target.value))}
                  className="w-20" disabled={isStreaming}
                />
              </div>
              <div>
                <label className="block text-xs text-blue-100">Summary at: {summaryThreshold}</label>
                <input
                  type="range" min="15" max="50" value={summaryThreshold}
                  onChange={(e) => setSummaryThreshold(parseInt(e.target.value))}
                  className="w-20" disabled={isStreaming}
                />
              </div>
              <button
                onClick={triggerManualSummary}
                disabled={isCreatingSummary || messages.length <= recentWindowSize}
                className="text-xs bg-white bg-opacity-20 px-2 py-1 rounded disabled:opacity-50"
              >
                Create Summary Now
              </button>
            </div>
          </div>
        </div>

        {/* πŸ†• SUMMARY MEMORY: Memory status dashboard */}
        <div className="bg-slate-100 px-6 py-3 border-b border-slate-200">
          {(() => {
            const { totalMessages, recentMessages, summarizedMessages } = getMemoryStats();
            return (
              <div className="space-y-2">
                <div className="flex justify-between items-center text-sm">
                  <div className="flex space-x-4 text-slate-600">
                    <span>πŸ“Š Total: {totalMessages}</span>
                    <span>πŸ”₯ Recent: {recentMessages}</span>
                    {summarizedMessages > 0 && (
                      <span>πŸ“ Summarized: {summarizedMessages}</span>
                    )}
                    <span className="text-blue-600">🧠 Type: {conversationType}</span>
                  </div>
                  <div className="flex items-center space-x-2 text-xs">
                    {summary && (
                      <span className="text-green-600">βœ… Summary Active</span>
                    )}
                    {isCreatingSummary && (
                      <span className="text-blue-600">πŸ”„ Creating Summary...</span>
                    )}
                  </div>
                </div>
                {/* Memory usage bar */}
                <div className="w-full bg-slate-200 rounded-full h-2">
                  <div
                    className="bg-blue-500 h-2 rounded-full transition-all duration-300"
                    style={{
                      width: `${Math.min(100, (totalMessages / 50) * 100)}%`
                    }}
                  />
                </div>
                <div className="text-xs text-slate-500 text-center">
                  Memory usage: {totalMessages}/50 messages before optimization
                </div>
              </div>
            );
          })()}
        </div>

        {/* πŸ†• SUMMARY MEMORY: Active summary display */}
        {summary && (
          <div className="bg-blue-50 border-l-4 border-blue-400 p-3 mx-6 mt-2 rounded">
            <div className="flex items-start">
              <span className="text-blue-600 mr-2">πŸ“‹</span>
              <div className="flex-1">
                <p className="text-xs font-medium text-blue-800 mb-1">
                  Active Summary ({conversationType})
                </p>
                <p className="text-xs text-blue-700 leading-relaxed">
                  {summary}
                </p>
              </div>
            </div>
          </div>
        )}

        {/* Messages Area */}
        <div className="flex-1 overflow-y-auto p-6 space-y-4 bg-slate-50">
          {messages.length === 0 ? (
            <div className="text-center text-slate-500 mt-20">
              <div className="w-16 h-16 bg-blue-100 rounded-2xl flex items-center justify-center mx-auto mb-4">
                <Bot className="w-8 h-8 text-blue-600" />
              </div>
              <h3 className="text-lg font-semibold text-slate-700 mb-2">
                Welcome to Smart Summary Chat!
              </h3>
              <p className="text-sm">I'll intelligently summarize our conversation to maintain context while controlling costs!</p>
            </div>
          ) : (
            messages.map(message => (
              <div
                key={message.id}
                className={`flex items-start space-x-3 ${
                  message.isUser ? 'justify-end' : 'justify-start'
                }`}
              >
                {!message.isUser && (
                  <div className="w-8 h-8 bg-gradient-to-r from-blue-500 to-indigo-600 rounded-full flex items-center justify-center flex-shrink-0">
                    <Bot className="w-4 h-4 text-white" />
                  </div>
                )}
                <div
                  className={`max-w-xs lg:max-w-md px-4 py-3 rounded-2xl ${
                    message.isUser
                      ? 'bg-gradient-to-r from-blue-600 to-indigo-600 text-white'
                      : 'bg-white text-slate-800 shadow-sm border border-slate-200'
                  }`}
                >
                  <MessageContent message={message} />
                </div>
                {message.isUser && (
                  <div className="w-8 h-8 bg-gradient-to-r from-slate-400 to-slate-600 rounded-full flex items-center justify-center flex-shrink-0">
                    <User className="w-4 h-4 text-white" />
                  </div>
                )}
              </div>
            ))
          )}
        </div>

        {/* Input Area */}
        <div className="bg-white border-t border-slate-200 p-4">
          <div className="flex space-x-3">
            <input
              type="text"
              value={input}
              onChange={(e) => setInput(e.target.value)}
              onKeyPress={handleKeyPress}
              placeholder="Ask anything - I'll maintain full context intelligently..."
              disabled={isStreaming}
              className="flex-1 border border-slate-300 rounded-xl px-4 py-3 focus:outline-none focus:ring-2 focus:ring-blue-500 disabled:bg-slate-100 transition-all duration-200"
            />
            {isStreaming ? (
              <button
                onClick={stopStreaming}
                className="bg-gradient-to-r from-red-500 to-red-600 hover:from-red-600 hover:to-red-700 text-white px-6 py-3 rounded-xl transition-all duration-200 flex items-center space-x-2 shadow-lg"
              >
                <span className="w-2 h-2 bg-white rounded-full"></span>
                <span className="hidden sm:inline">Stop</span>
              </button>
            ) : (
              <button
                onClick={sendMessage}
                disabled={!input.trim()}
                className="bg-gradient-to-r from-blue-600 to-indigo-600 hover:from-blue-700 hover:to-indigo-700 disabled:from-slate-300 disabled:to-slate-300 text-white px-6 py-3 rounded-xl transition-all duration-200 flex items-center space-x-2 shadow-lg disabled:shadow-none"
              >
                <Send className="w-4 h-4" />
                <span className="hidden sm:inline">Send</span>
              </button>
            )}
          </div>
          {isStreaming && (
            <div className="mt-3 flex items-center justify-center text-sm text-slate-500">
              <div className="flex space-x-1 mr-2">
                <div className="w-2 h-2 bg-blue-400 rounded-full animate-bounce"></div>
                <div className="w-2 h-2 bg-blue-400 rounded-full animate-bounce" style={{animationDelay: '0.1s'}}></div>
                <div className="w-2 h-2 bg-blue-400 rounded-full animate-bounce" style={{animationDelay: '0.2s'}}></div>
              </div>
              AI is generating response...
            </div>
          )}
        </div>
      </div>
    </div>
  )
}

export default App

What this complete component now includes:

  • βœ… All previous features - Streaming, memory, markdown formatting, copy buttons
  • βœ… Intelligent summarization - Automatically creates summaries to control costs
  • βœ… Context retention - Maintains full conversation context via summaries
  • βœ… Background processing - Chat responses stay instant while summaries are created
  • βœ… Smart timing - Waits for natural conversation breaks to summarize
  • βœ… Visual feedback - Shows summary status and memory optimization
  • βœ… User controls - Adjustable thresholds and manual summary triggers

Start both servers:

Backend:

cd openai-backend
npm run dev

Frontend:

cd openai-frontend
npm run dev

Test with this conversation to see summary memory in action:

  1. Set the summary threshold to 15 and the recent window to 5 (the slider minimums) for faster testing
  2. Build initial context (Messages 1-8):
You: "Hi! My name is Sarah and I'm 25 years old"
AI: "Nice to meet you, Sarah! It's great to know you're 25."
You: "I'm building a React todo app with Firebase"
AI: "That sounds like a great project! React and Firebase work well together."
You: "I'm using TypeScript and want authentication"
AI: "Excellent choice! TypeScript adds great type safety to React projects."
You: "I work as a frontend developer in New York"
AI: "That's awesome! New York has a great tech scene."
  3. Continue building history (Messages 9-12):
You: "I love using modern frameworks and tools"
AI: "Modern frameworks definitely make development more efficient."
You: "What CSS framework should I use?"
AI: "For a React app, you might consider Tailwind CSS or styled-components."
[Keep chatting until the message count passes the threshold - the memory status should show a summary being created in the background]
  4. Test context retention (once the summary is active):
You: "What do you remember about me and my project?"
AI: "Based on our conversation, you're Sarah, 25 years old, a frontend developer in New York working on a React todo app with Firebase and TypeScript authentication. You're also considering CSS frameworks like Tailwind."
[Should reference information from early messages via summary!]

What to watch for:

  • Memory indicator shows total vs summarized vs recent messages
  • Summary creation happens automatically at threshold
  • AI maintains context from early messages even after summarization
  • Chat responses stay fast (no waiting for summarization)
  • Visual feedback shows when summary is active and conversation type

50-Message Conversation Cost Analysis:

Without Summary Memory (Simple Memory):

Message 1: [1] β†’ 100 tokens
Message 10: [1,2,3,4,5,6,7,8,9,10] β†’ 1,000 tokens
Message 25: [1,2,3...23,24,25] β†’ 2,500 tokens
Message 50: [1,2,3...48,49,50] β†’ 5,000 tokens
Total Cost: ~125,000 tokens πŸ’ΈπŸ’ΈπŸ’Έ

With Summary Memory (Smart optimization):

Message 1: [1] β†’ 100 tokens
Message 10: [1,2,3,4,5,6,7,8,9,10] β†’ 1,000 tokens
Message 25: [summary] + [16,17,18,19,20,21,22,23,24,25] β†’ ~1,200 tokens
Message 50: [updated summary] + [36,37,38,39,40,41,42,43,44,45,46,47,48,49,50] β†’ ~1,700 tokens
Total Cost: ~75,000 tokens πŸ’° (roughly 40% savings already, and the gap keeps widening as the conversation grows!)

Summary Memory vs Simple Memory:

  • βœ… Summary: Maintains full context via summaries (knows your name from message 1)
  • ❌ Simple: Costs grow exponentially with conversation length

Summary Memory vs Sliding Window:

  • βœ… Summary: Remembers important context from early messages
  • ❌ Sliding Window: Completely forgets everything outside the window

Summary Memory: Best of Both Worlds:

// Perfect balance of context retention + cost control
contextualMessage = summary + recentMessages + currentMessage
// Full context + Manageable cost + Fast responses

❌ AI doesn’t maintain context from early messages

  • Check that summary is being created and included in requests
  • Verify the /api/summarize endpoint is working
  • Look at browser network tab to confirm summary is being sent

❌ Summary creation is slow or fails

  • Check backend console for summarization errors
  • Verify OpenAI API key has sufficient credits
  • Make sure conversation type detection is working

❌ Chat responses become slow

  • Ensure summary creation happens in background (no await)
  • Check that timing logic prevents summarization during complex topics

❌ Summary is too brief or loses important context

  • Adjust conversation type detection logic
  • Modify summary instructions for your specific use case
  • Increase the recentWindowSize to keep more detailed messages

When Summary Memory Works Best:

  • βœ… Long conversations (25+ messages)
  • βœ… Cost-sensitive applications with high conversation volume
  • βœ… Complex discussions requiring full context retention
  • βœ… Production applications needing predictable costs

Limitations:

  • Summary quality depends on AI’s ability to identify important context
  • Some nuance may be lost in the summarization process
  • Additional complexity compared to simple memory approaches

Best Practices:

  • Monitor summary quality and adjust conversation type detection
  • Test with different thresholds to find optimal balance
  • Provide feedback mechanisms for users to report context loss
  • Consider manual summary review for critical applications

Outstanding work! πŸŽ‰ You’ve implemented one of the most sophisticated memory strategies used in AI chat applications.

What you’ve accomplished:

  • πŸ“ Intelligent summarization - Automatically creates context-preserving summaries
  • 🧠 Full context retention - Never loses important conversation details
  • πŸ’° Cost optimization - Up to 70% savings on long conversations
  • ⚑ Background processing - Chat responses stay instant
  • 🎯 Smart timing - Summarizes at natural conversation breaks
  • πŸ“Š Visual feedback - Real-time memory optimization indicators

You now understand:

  • πŸ”„ Advanced memory strategies - When and how to use different approaches
  • πŸ€– AI orchestration - Background processing and intelligent timing
  • πŸ’Έ Token economics - Balancing context retention with cost control
  • 🎨 User experience - Transparent memory management with visual feedback

Your chat now handles long conversations gracefully - strong context retention with predictable costs. This is the same memory strategy used by many production AI applications!

πŸ‘‰ Next: Persistent Memory & Database Storage - Let’s explore memory that survives page refreshes and user sessions!