Memory System

The Memory System in @hazeljs/rag provides persistent context and conversation management, enabling AI applications to remember conversations, user preferences, and historical interactions across sessions.

Overview

Building context-aware AI applications requires managing conversation history, tracking entities, storing facts, and maintaining temporal context. The Memory System solves these challenges by providing:

  • 5 Memory Types: Conversation, Entity, Fact, Event, and Working memory
  • 3 Storage Strategies: BufferMemory (fast), VectorMemory (semantic), HybridMemory (best of both)
  • Semantic Search: Find relevant memories using embeddings
  • Auto-Summarization: Compress old conversations automatically
  • Entity Tracking: Remember people, companies, and relationships
  • Importance Scoring: Prioritize relevant information
  • RAG Integration: Combine document retrieval with conversation context


Memory Types

Conversation Memory

Track multi-turn conversations with automatic summarization.

import { MemoryManager, BufferMemory } from '@hazeljs/rag';

const memoryStore = new BufferMemory({ maxSize: 100 });
const memoryManager = new MemoryManager(memoryStore, {
  maxConversationLength: 20,
  summarizeAfter: 50,
});

await memoryManager.initialize();

// Add messages
await memoryManager.addMessage(
  { role: 'user', content: 'What is HazelJS?' },
  'session-123'
);

await memoryManager.addMessage(
  { role: 'assistant', content: 'HazelJS is an AI-native framework...' },
  'session-123'
);

// Get history
const history = await memoryManager.getConversationHistory('session-123', 10);

// Summarize
const summary = await memoryManager.summarizeConversation('session-123');

Features:

  • Sliding window for recent messages
  • Automatic summarization of old conversations
  • Token-aware context management
  • Multi-session support
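The token-aware part of this can be pictured as a simple trim loop that walks the history backwards until a token budget is exhausted. The sketch below is illustrative only, not the library's implementation: `estimateTokens` and its four-characters-per-token heuristic are assumptions.

```typescript
interface Message { role: 'user' | 'assistant' | 'system'; content: string }

// Rough token estimate: ~4 characters per token (a common heuristic).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the most recent messages that fit within a token budget,
// walking backwards so the newest turns are always retained.
function trimToTokenBudget(history: Message[], budget: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > budget) break;
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}
```

Messages that no longer fit are the ones a manager like this would hand off to summarization rather than drop outright.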

Entity Memory

Track entities (people, companies, concepts) mentioned in conversations.

// Track an entity
await memoryManager.trackEntity({
  name: 'Alice',
  type: 'person',
  attributes: {
    role: 'engineer',
    company: 'TechCorp',
  },
  relationships: [
    { type: 'works_at', target: 'TechCorp' },
  ],
  firstSeen: new Date(),
  lastSeen: new Date(),
  mentions: 1,
});

// Retrieve entity
const alice = await memoryManager.getEntity('Alice');

// Update entity (getEntity can return null, hence the optional chaining)
await memoryManager.updateEntity('Alice', {
  attributes: { ...alice?.attributes, status: 'premium' },
});

// Get all entities
const entities = await memoryManager.getAllEntities('session-123');

Use Cases:

  • Customer relationship management
  • Personalized recommendations
  • Knowledge graph construction
  • Context-aware responses

Semantic Memory (Facts)

Store and recall facts with semantic understanding.

// Store facts (storeFact returns the new fact's id)
const factId = await memoryManager.storeFact(
  'User prefers dark mode',
  { userId: 'user-123', category: 'preference' }
);

await memoryManager.storeFact(
  'HazelJS supports TypeScript decorators',
  { category: 'framework-feature' }
);

// Recall facts semantically
const facts = await memoryManager.recallFacts('user preferences', {
  topK: 5,
  minScore: 0.7,
});

// Update a fact
await memoryManager.updateFact(factId, 'User prefers light mode');

Features:

  • Semantic search across facts
  • Time-based relevance
  • Conflict detection
  • Automatic consolidation
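Conflict detection could work by comparing a new fact against stored ones and flagging near-duplicates whose text disagrees. The sketch below is a toy illustration, not the library's algorithm: the `StoredFact` shape, the cosine comparison, and the 0.9 threshold are all assumptions, with plain number arrays standing in for real embeddings.

```typescript
interface StoredFact { text: string; embedding: number[] }

// Cosine similarity over toy vectors stands in for real embedding similarity.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// A new fact that is semantically very close to an existing fact, but whose
// text differs, is flagged as a potential contradiction for review.
function findConflicts(
  newFact: StoredFact,
  existing: StoredFact[],
  threshold = 0.9,
): StoredFact[] {
  return existing.filter(
    (f) =>
      f.text !== newFact.text &&
      cosine(f.embedding, newFact.embedding) >= threshold,
  );
}
```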

Working Memory

Temporary scratchpad for current task context.

// Set context
await memoryManager.setContext('current_task', 'checkout', 'session-123');
await memoryManager.setContext('cart_items', ['item1', 'item2'], 'session-123');

// Get context
const task = await memoryManager.getContext('current_task', 'session-123');
const items = await memoryManager.getContext('cart_items', 'session-123');

// Clear context
await memoryManager.clearContext('session-123');

Use Cases:

  • Multi-step workflows
  • State management
  • Temporary calculations
  • Task coordination
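To make the scratchpad semantics concrete, here is a toy stand-in for a session-scoped working memory. The `ScratchpadMemory` class and its evict-the-oldest-key behavior once `maxSize` is exceeded are illustrative assumptions, not the actual @hazeljs/rag store.

```typescript
// Toy session-scoped scratchpad. Each session holds at most maxSize keys;
// the oldest key is evicted when the limit is exceeded (an assumption,
// mirroring the maxWorkingMemorySize config option).
class ScratchpadMemory {
  private store = new Map<string, Map<string, unknown>>();
  constructor(private maxSize = 10) {}

  set(sessionId: string, key: string, value: unknown): void {
    const session = this.store.get(sessionId) ?? new Map<string, unknown>();
    session.delete(key); // re-inserting moves the key to the newest slot
    session.set(key, value);
    if (session.size > this.maxSize) {
      const oldest = session.keys().next().value as string;
      session.delete(oldest);
    }
    this.store.set(sessionId, session);
  }

  get(sessionId: string, key: string): unknown {
    return this.store.get(sessionId)?.get(key);
  }

  clear(sessionId: string): void {
    this.store.delete(sessionId);
  }
}
```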

Storage Strategies

BufferMemory

Fast FIFO in-memory buffer for recent memories.

import { BufferMemory } from '@hazeljs/rag';

const buffer = new BufferMemory({
  maxSize: 100,
  ttl: 3600000, // 1 hour in milliseconds
});

Best For:

  • Development and testing
  • Recent conversation history
  • Low-latency requirements
  • Temporary context

Advantages:

  • Extremely fast (in-memory)
  • Zero setup
  • No external dependencies
  • Automatic TTL expiration

Limitations:

  • Data lost on restart
  • Limited capacity
  • No semantic search

VectorMemory

Stores memories as embeddings for semantic search.

import { VectorMemory, MemoryVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
});

const vectorStore = new MemoryVectorStore(embeddings);
const vectorMemory = new VectorMemory(vectorStore, embeddings, {
  collectionName: 'memories',
});

Best For:

  • Long-term memory storage
  • Semantic search requirements
  • Production deployments
  • Large memory volumes

Advantages:

  • Semantic search
  • Persistent storage
  • Scalable
  • Works with any vector store

Limitations:

  • Slower than buffer
  • Requires embeddings
  • External dependencies

HybridMemory

Combines buffer and vector storage for optimal performance.

import { HybridMemory, BufferMemory, VectorMemory } from '@hazeljs/rag';

const buffer = new BufferMemory({ maxSize: 20 });
const vectorMemory = new VectorMemory(vectorStore, embeddings);

const hybrid = new HybridMemory(buffer, vectorMemory, {
  archiveThreshold: 15, // Archive after 15 messages
});

Best For:

  • Production applications
  • Balancing speed and persistence
  • Large-scale deployments
  • Best of both worlds

How It Works:

  1. Recent memories stay in fast buffer
  2. Old memories automatically archive to vector store
  3. Searches check both stores
  4. Deduplication ensures consistency
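The four steps above can be sketched with a toy synchronous implementation. `ToyHybrid` is purely illustrative: the real stores are asynchronous and embedding-backed, and substring matching stands in here for semantic search.

```typescript
interface MemoryItem { id: string; content: string }

// Toy sketch of the hybrid flow: recent items live in a bounded buffer,
// overflow is archived, and searches merge both tiers with id-based
// deduplication.
class ToyHybrid {
  private buffer: MemoryItem[] = [];
  private archive: MemoryItem[] = [];
  constructor(private archiveThreshold = 15) {}

  add(item: MemoryItem): void {
    this.buffer.push(item);
    while (this.buffer.length > this.archiveThreshold) {
      this.archive.push(this.buffer.shift()!); // oldest moves to the archive
    }
  }

  search(term: string): MemoryItem[] {
    const seen = new Set<string>();
    const hits: MemoryItem[] = [];
    for (const m of [...this.buffer, ...this.archive]) {
      if (m.content.includes(term) && !seen.has(m.id)) {
        seen.add(m.id);
        hits.push(m);
      }
    }
    return hits;
  }
}
```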

RAG Integration

Combine memory with document retrieval for context-aware responses.

import {
  RAGPipelineWithMemory,
  MemoryManager,
  HybridMemory,
  BufferMemory,
  VectorMemory,
  MemoryVectorStore,
  OpenAIEmbeddings,
} from '@hazeljs/rag';

// Setup memory
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
});

const buffer = new BufferMemory({ maxSize: 20 });
const memoryVectorStore = new MemoryVectorStore(embeddings);
const vectorMemory = new VectorMemory(memoryVectorStore, embeddings);
const hybridMemory = new HybridMemory(buffer, vectorMemory);

const memoryManager = new MemoryManager(hybridMemory, {
  maxConversationLength: 20,
  summarizeAfter: 50,
  entityExtraction: true,
});

// Setup RAG
const documentVectorStore = new MemoryVectorStore(embeddings);

const rag = new RAGPipelineWithMemory(
  {
    vectorStore: documentVectorStore,
    embeddingProvider: embeddings,
    topK: 5,
  },
  memoryManager,
  llmFunction // your LLM call, e.g. (prompt: string) => Promise<string>
);

await rag.initialize();

// Add documents
await rag.addDocuments([
  {
    content: 'HazelJS is a modern TypeScript framework...',
    metadata: { source: 'docs' },
  },
]);

// Query with memory context
const response = await rag.queryWithMemory(
  'What did we discuss about pricing?',
  'session-123',
  'user-456'
);

console.log(response.answer);
console.log('Sources:', response.sources);
console.log('Memories:', response.memories);
console.log('History:', response.conversationHistory);

Enhanced Context

The RAG pipeline with memory combines three sources of context:

  1. Document Retrieval: Relevant documents from knowledge base
  2. Conversation History: Recent messages in the conversation
  3. Relevant Memories: Semantically similar past interactions
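One way to picture the combination is a prompt template that stitches the three sources together before the question. The section layout below is an assumption for illustration; the actual template used by RAGPipelineWithMemory is internal.

```typescript
// Illustrative only: stitch retrieved documents, conversation history, and
// relevant memories into a single prompt, with the question last.
function buildPrompt(
  question: string,
  documents: string[],
  history: { role: string; content: string }[],
  memories: string[],
): string {
  return [
    '## Retrieved documents',
    ...documents.map((d) => `- ${d}`),
    '## Conversation history',
    ...history.map((m) => `${m.role}: ${m.content}`),
    '## Relevant memories',
    ...memories.map((m) => `- ${m}`),
    `## Question\n${question}`,
  ].join('\n');
}
```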

Beyond plain retrieval, the pipeline can also learn new facts from its own answers:

// Automatic fact extraction
const response = await rag.queryWithLearning(
  'Tell me about HazelJS features',
  'session-123',
  'user-456'
);
// Facts from response are automatically stored

// Get conversation summary
const summary = await rag.getConversationSummary('session-123');

// Recall specific facts
const facts = await rag.recallFacts('user preferences', 5);

// Memory statistics
const stats = await rag.getMemoryStats('session-123');

Advanced Features

Memory Search

Search across all memories semantically:

import { MemoryType } from '@hazeljs/rag';

const relevantMemories = await memoryManager.relevantMemories(
  'pricing and discounts',
  {
    sessionId: 'session-123',
    types: [MemoryType.CONVERSATION, MemoryType.FACT],
    topK: 5,
    minScore: 0.7,
  }
);

Importance Scoring

Automatically calculate and use importance scores:

const memoryManager = new MemoryManager(memoryStore, {
  importanceScoring: true, // Enable automatic scoring
});

// Memories with higher importance are retained longer
// Questions and long content get higher scores
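A toy scorer matching that rule of thumb might look like the following; the base value and bonus weights are assumptions, not the library's actual heuristic.

```typescript
// Illustrative importance heuristic: questions and longer content score
// higher, capped at 1. The 0.3 base, the bonuses, and the 200-character
// cutoff are assumptions for the sketch.
function importanceScore(content: string): number {
  let score = 0.3; // base importance
  if (content.includes('?')) score += 0.3; // questions carry user intent
  if (content.length > 200) score += 0.2; // long content is information-dense
  return Math.min(score, 1);
}
```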

Memory Decay

Time-based relevance scoring:

const memoryManager = new MemoryManager(memoryStore, {
  memoryDecay: true,
  decayRate: 0.1, // 10% decay per time unit
});

// Older memories gradually become less relevant
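One plausible reading of decayRate: 0.1 is multiplicative decay, where each elapsed time unit scales relevance by (1 - decayRate). The exact formula the library applies is not specified here, so treat this as an illustration of the idea rather than the implementation.

```typescript
// Illustrative decay model: relevance shrinks by a factor of (1 - decayRate)
// for every time unit of age, so a memory one unit old keeps 90% of its
// score at the default rate of 0.1.
function decayedRelevance(
  baseScore: number,
  ageInUnits: number,
  decayRate = 0.1,
): number {
  return baseScore * Math.pow(1 - decayRate, ageInUnits);
}
```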

Memory Statistics

Monitor memory usage:

const stats = await memoryManager.getStats('session-123');

console.log(`Total memories: ${stats.totalMemories}`);
console.log(`By type:`, stats.byType);
console.log(`Average importance: ${stats.averageImportance}`);
console.log(`Oldest: ${stats.oldestMemory}`);
console.log(`Newest: ${stats.newestMemory}`);

Memory Pruning

Clean up old or low-importance memories:

// Prune memories older than 30 days
const pruned = await memoryManager.prune({
  olderThan: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000),
});

// Prune memories below an importance floor
const prunedLowImportance = await memoryManager.prune({
  minImportance: 0.5,
});

Configuration

Memory Manager Config

const config = {
  maxConversationLength: 20,      // Max messages in buffer
  summarizeAfter: 50,              // Summarize after N messages
  entityExtraction: true,          // Auto-extract entities
  importanceScoring: true,         // Calculate importance scores
  memoryDecay: false,              // Enable time-based decay
  decayRate: 0.1,                  // Decay rate (if enabled)
  maxWorkingMemorySize: 10,        // Max working memory items
};

const memoryManager = new MemoryManager(memoryStore, config);

Buffer Memory Config

const bufferConfig = {
  maxSize: 100,                    // Max memories in buffer
  ttl: 3600000,                    // Time to live (ms)
};

const buffer = new BufferMemory(bufferConfig);

Hybrid Memory Config

const hybridConfig = {
  bufferSize: 20,                  // Buffer size
  archiveThreshold: 15,            // Archive after N messages
  ttl: 3600000,                    // Buffer TTL
};

const hybrid = new HybridMemory(buffer, vectorMemory, hybridConfig);

Use Cases

Customer Support Bot

// Remember customer information
await memoryManager.trackEntity({
  name: 'Jane Smith',
  type: 'customer',
  attributes: { tier: 'premium', accountId: 'ACC-123' },
  // ...
});

// Store support history
await memoryManager.storeFact(
  'Customer reported login issues on 2024-01-15',
  { customerId: 'ACC-123', category: 'support' }
);

// Context-aware responses
const response = await rag.queryWithMemory(
  'What was my previous issue?',
  'session-123',
  'ACC-123'
);

Personal AI Assistant

// Remember preferences
await memoryManager.storeFact('User prefers concise responses');
await memoryManager.storeFact('User timezone is PST');

// Track tasks
await memoryManager.setContext('active_tasks', ['email', 'meeting'], 'session-123');

// Personalized responses
const response = await rag.queryWithMemory(
  'What should I focus on today?',
  'session-123'
);

Educational Tutor

// Track learning progress
await memoryManager.trackEntity({
  name: 'Student-123',
  type: 'student',
  attributes: {
    level: 'intermediate',
    completedLessons: ['intro', 'basics'],
  },
  // ...
});

// Remember misconceptions
await memoryManager.storeFact(
  'Student confused about async/await',
  { studentId: 'Student-123', topic: 'javascript' }
);

Best Practices

Choose the Right Store

  • Development: Use BufferMemory for fast iteration
  • Production: Use HybridMemory for best performance
  • Semantic Search: Use VectorMemory when search is critical

Set Appropriate Limits

  • Configure maxConversationLength based on LLM token limits
  • Set archiveThreshold to balance performance and memory
  • Use summarizeAfter to compress long conversations

Enable Features Selectively

  • entityExtraction: For tracking people and things
  • importanceScoring: For prioritization
  • memoryDecay: For time-based relevance

Monitor Memory Usage

// Regular monitoring
const stats = await memoryManager.getStats();
console.log(`Memory usage: ${stats.totalMemories}`);

// Periodic pruning
setInterval(async () => {
  const thirtyDaysAgo = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
  await memoryManager.prune({ olderThan: thirtyDaysAgo });
}, 24 * 60 * 60 * 1000); // Daily

Session Management

// Use consistent session IDs: generate one when the conversation starts,
// then reuse that same ID for every call in the conversation
const sessionId = `user-${userId}-${Date.now()}`;

// Clear sessions when done
await memoryManager.clearConversation(sessionId);
await memoryManager.clearContext(sessionId);

Examples

Check out the memory examples for complete working code:

  • Basic Memory: Core features and memory types
  • RAG with Memory: Integration with document retrieval
  • Chatbot with Memory: Complete context-aware chatbot

API Reference

MemoryManager

class MemoryManager {
  // Conversation
  addMessage(message: Message, sessionId: string): Promise<string>
  getConversationHistory(sessionId: string, limit?: number): Promise<Message[]>
  summarizeConversation(sessionId: string): Promise<string>
  clearConversation(sessionId: string): Promise<void>
  
  // Entity
  trackEntity(entity: Entity): Promise<void>
  getEntity(name: string): Promise<Entity | null>
  updateEntity(name: string, updates: Partial<Entity>): Promise<void>
  getAllEntities(sessionId?: string): Promise<Entity[]>
  
  // Facts
  storeFact(fact: string, metadata?: Record<string, any>): Promise<string>
  recallFacts(query: string, options?: MemorySearchOptions): Promise<string[]>
  updateFact(id: string, newContent: string): Promise<void>
  
  // Working Memory
  setContext(key: string, value: any, sessionId: string): Promise<void>
  getContext(key: string, sessionId: string): Promise<any>
  clearContext(sessionId: string): Promise<void>
  
  // Search & Stats
  relevantMemories(query: string, options: MemorySearchOptions): Promise<Memory[]>
  getStats(sessionId?: string): Promise<MemoryStats>
}

RAGPipelineWithMemory

class RAGPipelineWithMemory extends RAGPipeline {
  queryWithMemory(
    query: string,
    sessionId: string,
    userId?: string,
    options?: RAGQueryOptions
  ): Promise<RAGResponseWithMemory>
  
  queryWithLearning(
    query: string,
    sessionId: string,
    userId?: string,
    options?: RAGQueryOptions
  ): Promise<RAGResponseWithMemory>
  
  clearSessionMemory(sessionId: string): Promise<void>
  getConversationSummary(sessionId: string): Promise<string>
  storeFact(fact: string, sessionId?: string, userId?: string): Promise<string>
  recallFacts(query: string, topK?: number): Promise<string[]>
  getMemoryStats(sessionId?: string): Promise<MemoryStats>
}

Next Steps