RAG Patterns & Best Practices

Advanced patterns, techniques, and best practices for building production-ready RAG applications with HazelJS.

RAG Architecture Patterns

Basic RAG Pattern

The simplest RAG implementation: retrieve relevant documents and use them as context.

import { RAGPipeline, OpenAIEmbeddings, MemoryVectorStore } from '@hazeljs/rag';
import { AIService } from '@hazeljs/ai';

async function basicRAG(query: string) {
  // 1. Setup
  const embeddings = new OpenAIEmbeddings({
    apiKey: process.env.OPENAI_API_KEY,
  });
  
  const vectorStore = new MemoryVectorStore(embeddings);
  await vectorStore.initialize();
  
  const rag = new RAGPipeline({
    vectorStore,
    embeddingProvider: embeddings,
    topK: 3,
  });
  
  // 2. Retrieve
  const results = await rag.query(query);
  
  // 3. Format context
  const context = results.sources
    .map(s => s.content)
    .join('\n\n');
  
  // 4. Generate with LLM
  const aiService = new AIService();
  const response = await aiService.executeTask({
    name: 'rag-answer',
    provider: 'openai',
    model: 'gpt-4-turbo-preview',
    prompt: `Context:\n${context}\n\nQuestion: ${query}\n\nAnswer:`,
    outputType: 'string',
  }, {});
  
  return {
    answer: response.data,
    sources: results.sources,
  };
}
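
Calling it is then a single await:

const { answer, sources } = await basicRAG('What is HazelJS?');
console.log(answer);
console.log(`Based on ${sources.length} sources`);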

Hybrid RAG Pattern

Combines vector search with keyword search for better retrieval.

import { HybridSearchRetrieval } from '@hazeljs/rag';

async function hybridRAG(query: string) {
  // `vectorStore` comes from the basic setup above; `documents` is your corpus
  const hybridSearch = new HybridSearchRetrieval(vectorStore, {
    vectorWeight: 0.7,
    keywordWeight: 0.3,
  });
  
  await hybridSearch.indexDocuments(documents);
  
  const results = await hybridSearch.search(query, { topK: 5 });
  
  // Continue with LLM generation...
}

Multi-Query RAG Pattern

Generates multiple query variations for comprehensive retrieval.

import { MultiQueryRetrieval } from '@hazeljs/rag';

async function multiQueryRAG(query: string) {
  const multiQuery = new MultiQueryRetrieval(vectorStore, {
    numQueries: 3,
    llmProvider: 'openai',
    apiKey: process.env.OPENAI_API_KEY,
  });
  
  const results = await multiQuery.retrieve(query, { topK: 10 });
  
  // Results are already deduped and ranked
  // Continue with LLM generation...
}

Conversational RAG Pattern

Maintains conversation history for context-aware responses.

interface ConversationMessage {
  role: 'user' | 'assistant';
  content: string;
}

class ConversationalRAG {
  // Reuses the `rag` pipeline and `aiService` from the basic setup above
  private conversations = new Map<string, ConversationMessage[]>();
  
  async chat(sessionId: string, message: string) {
    // 1. Get conversation history
    const history = this.conversations.get(sessionId) || [];
    
    // 2. Retrieve relevant documents
    const results = await rag.query(message);
    const context = results.sources.map(s => s.content).join('\n\n');
    
    // 3. Format with history
    const historyText = history
      .map(m => `${m.role}: ${m.content}`)
      .join('\n');
    
    // 4. Generate response
    const response = await aiService.executeTask({
      name: 'conversational-rag',
      provider: 'openai',
      model: 'gpt-4-turbo-preview',
      prompt: `
        Conversation History:
        ${historyText}
        
        Context:
        ${context}
        
        User: ${message}
        Assistant:
      `,
      outputType: 'string',
    }, {});
    
    // 5. Update history
    history.push(
      { role: 'user', content: message },
      { role: 'assistant', content: response.data }
    );
    this.conversations.set(sessionId, history);
    
    return {
      answer: response.data,
      sources: results.sources,
    };
  }
}
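
Usage is one call per turn, keyed by session ID; history accumulates automatically:

const chat = new ConversationalRAG();

const first = await chat.chat('session-1', 'What is HazelJS?');
const followUp = await chat.chat('session-1', 'How does its RAG package work?');
// followUp was generated with the first exchange included in the prompt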

Stateful RAG with Database Persistence

Use the Agent Runtime with database state management for production-grade conversational RAG:

import { Agent, Tool, AgentRuntime } from '@hazeljs/agent';
import { DatabaseStateManager } from '@hazeljs/agent/state';
import { RAGPipeline } from '@hazeljs/rag';
import { PrismaClient } from '@prisma/client';

@Agent({ name: 'conversational-rag-agent' })
class ConversationalRAGAgent {
  constructor(private rag: RAGPipeline) {}

  @Tool({ description: 'Answer questions using RAG with conversation context' })
  async answer(query: string): Promise<any> {
    // Agent runtime automatically provides conversation history
    // from database state manager
    const results = await this.rag.query(query);
    
    return {
      answer: results.answer,
      sources: results.sources,
      confidence: results.confidence,
    };
  }
}

// Setup with database persistence
const prisma = new PrismaClient();
const stateManager = new DatabaseStateManager({
  client: prisma,
  softDelete: true,        // Keep deleted contexts for audit
  autoArchive: true,       // Archive old conversations
  archiveThresholdDays: 30, // Archive after 30 days
});

const runtime = new AgentRuntime({
  stateManager,
  // ... other config
});

// Execute with persistent conversation state
const result = await runtime.execute('conversational-rag-agent', 'What is HazelJS?', {
  sessionId: 'user-123',
  userId: 'user-abc',
  enableMemory: true,  // Automatic conversation history
  enableRAG: true,     // Enable RAG integration
});

// Continue conversation - history is automatically loaded from database
const followUp = await runtime.resume(result.executionId, 'Tell me more about its features');

// Query all conversations for a session
const sessionContexts = await stateManager.getSessionContexts('user-123');
console.log(`Found ${sessionContexts.length} conversations in this session`);

Database State Manager Features:

  • Automatic State Persistence: All conversation history, steps, and context saved to database
  • Session Management: Track multiple conversations per user/session
  • Pause/Resume: Long-running RAG queries can be paused and resumed
  • Soft Deletes: Keep audit trail of deleted conversations
  • Auto-Archiving: Automatically archive old conversations
  • Working Memory: Store temporary context and variables
  • Entity Tracking: Track entities mentioned in conversations
  • Full Audit Trail: Complete history of all agent steps and decisions

Benefits:

  • Production-Ready: Durable state management for production applications
  • Scalable: Database-backed state works across multiple instances
  • Queryable: SQL queries for analytics and monitoring
  • Recoverable: Resume conversations after crashes or restarts
  • Compliant: Full audit trail for compliance requirements

Document Chunking Strategies

Fixed-Size Chunking

Simple but effective for uniform content.

import { RecursiveTextSplitter } from '@hazeljs/rag';

const splitter = new RecursiveTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});

const chunks = splitter.splitText(longDocument);

Best For:

  • Uniform content (articles, documentation)
  • Simple implementation
  • Predictable chunk sizes

Semantic Chunking

Split by meaning rather than size (future enhancement).

// Future implementation
const semanticSplitter = new SemanticTextSplitter({
  embeddingProvider: embeddings,
  similarityThreshold: 0.8,
});
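
Until that ships, the idea can be approximated by hand: embed consecutive sentences and start a new chunk wherever similarity to the previous sentence drops below the threshold. A minimal sketch, assuming the embeddings.embed() call from the setup above and naive sentence splitting:

// Sketch only: naive sentence splitting plus a local cosine similarity helper
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return dot / (normA * normB);
}

async function semanticChunks(text: string, threshold = 0.8): Promise<string[]> {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks: string[] = [];
  let current = sentences[0];
  let currentVec = await embeddings.embed(current);

  for (const sentence of sentences.slice(1)) {
    const vec = await embeddings.embed(sentence);
    if (cosine(currentVec, vec) >= threshold) {
      current += ' ' + sentence; // same topic: keep growing the chunk
    } else {
      chunks.push(current); // topic shift: start a new chunk
      current = sentence;
    }
    currentVec = vec;
  }

  chunks.push(current);
  return chunks;
}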

Best For:

  • Preserving context
  • Complex documents
  • Better retrieval quality

Document Structure-Aware Chunking

Split by document structure (headings, paragraphs).

function splitByStructure(markdown: string) {
  // Split on level-2 headings; the first element is any preamble before them
  const sections = markdown.split(/^##\s+/gm).filter(Boolean);
  
  return sections.map(section => ({
    content: section,
    metadata: {
      heading: section.split('\n')[0], // first line is the heading text
      type: 'section',
    },
  }));
}

Best For:

  • Structured documents (Markdown, HTML)
  • Maintaining hierarchy
  • Better context preservation

Metadata Strategies

Rich Metadata

Add comprehensive metadata for better filtering and ranking.

await vectorStore.addDocuments([
  {
    content: 'Document content',
    metadata: {
      // Source information
      source: 'documentation',
      url: 'https://example.com/doc',
      
      // Temporal information
      createdAt: '2024-01-01',
      updatedAt: '2024-01-15',
      
      // Classification
      category: 'technical',
      tags: ['typescript', 'framework', 'rag'],
      
      // Quality signals
      author: 'John Doe',
      reviewStatus: 'approved',
      
      // Custom fields
      importance: 'high',
      audience: 'developers',
    },
  },
]);
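
That metadata pays off at query time. A filtered search, using the same filter option shown in the hierarchical example below:

const results = await vectorStore.search('rag best practices', {
  filter: {
    category: 'technical',
    reviewStatus: 'approved', // only surface reviewed content
  },
});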

Hierarchical Metadata

Organize documents in hierarchies.

await vectorStore.addDocuments([
  {
    content: 'Chapter content',
    metadata: {
      book: 'HazelJS Guide',
      chapter: 'RAG Package',
      section: 'Vector Stores',
      subsection: 'Pinecone',
      hierarchy: ['book', 'chapter', 'section', 'subsection'],
    },
  },
]);

// Search within hierarchy
const results = await vectorStore.search('pinecone setup', {
  filter: {
    book: 'HazelJS Guide',
    chapter: 'RAG Package',
  },
});

Temporal Metadata

Track document freshness and relevance.

function addTemporalMetadata(doc: Document) {
  return {
    ...doc,
    metadata: {
      ...doc.metadata,
      indexedAt: new Date().toISOString(),
      expiresAt: new Date(Date.now() + 30 * 24 * 60 * 60 * 1000).toISOString(),
      version: '1.0',
    },
  };
}

// Filter by freshness
const recentResults = await vectorStore.search('query', {
  filter: {
    indexedAt: { $gte: '2024-01-01' },
  },
});
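
Freshness can also feed into ranking rather than only filtering. A sketch that decays similarity scores by document age (the 30-day half-life is an assumption to tune):

function applyRecencyDecay(results: SearchResult[], halfLifeDays = 30): SearchResult[] {
  const now = Date.now();
  return results
    .map(r => {
      const indexedAt = new Date(r.metadata?.indexedAt ?? now).getTime();
      const ageDays = (now - indexedAt) / (24 * 60 * 60 * 1000);
      const decay = Math.pow(0.5, ageDays / halfLifeDays); // halves every halfLifeDays
      return { ...r, score: r.score * decay };
    })
    .sort((a, b) => b.score - a.score);
}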

Query Optimization

Query Expansion

Expand user queries with synonyms and related terms.

async function expandQuery(query: string): Promise<string[]> {
  const aiService = new AIService();
  
  const expansion = await aiService.executeTask({
    name: 'query-expansion',
    provider: 'openai',
    model: 'gpt-3.5-turbo',
    prompt: `
      Generate 3 alternative phrasings for this query: "${query}"
      Return as JSON array of strings.
    `,
    outputType: 'json',
  }, {});
  
  return [query, ...expansion.data];
}

// Use expanded queries
const queries = await expandQuery('vector database setup');
const allResults = (await Promise.all(
  queries.map(q => vectorStore.search(q))
)).flat(); // flatten one result list per query variant into a single list

Query Rewriting

Rewrite queries for better retrieval.

async function rewriteQuery(query: string): Promise<string> {
  const aiService = new AIService();
  
  const rewritten = await aiService.executeTask({
    name: 'query-rewrite',
    provider: 'openai',
    model: 'gpt-3.5-turbo',
    prompt: `
      Rewrite this query to be more specific and searchable: "${query}"
      Focus on key technical terms and concepts.
    `,
    outputType: 'string',
  }, {});
  
  return rewritten.data;
}

Query Classification

Route queries to appropriate retrieval strategies.

async function classifyQuery(query: string) {
  // Simple classification
  const hasQuestionWords = /^(what|how|why|when|where|who)/i.test(query);
  const hasTechnicalTerms = /\b(api|code|function|class|error)\b/i.test(query);
  
  if (hasQuestionWords) {
    return 'semantic'; // Use vector search
  } else if (hasTechnicalTerms) {
    return 'hybrid'; // Use hybrid search
  } else {
    return 'keyword'; // Use BM25
  }
}

async function smartSearch(query: string) {
  const strategy = await classifyQuery(query);
  
  switch (strategy) {
    case 'semantic':
      return vectorStore.search(query);
    case 'hybrid':
      return hybridSearch.search(query);
    case 'keyword':
      return bm25.search(query);
  }
}

Response Generation

Citation-Aware Generation

Include source citations in responses.

async function generateWithCitations(query: string) {
  const results = await rag.query(query);
  
  const contextWithCitations = results.sources
    .map((source, idx) => `[${idx + 1}] ${source.content}`)
    .join('\n\n');
  
  const response = await aiService.executeTask({
    name: 'cited-answer',
    provider: 'openai',
    model: 'gpt-4-turbo-preview',
    prompt: `
      Context (with citations):
      ${contextWithCitations}
      
      Question: ${query}
      
      Provide an answer and cite sources using [1], [2], etc.
    `,
    outputType: 'string',
  }, {});
  
  return {
    answer: response.data,
    sources: results.sources.map((s, idx) => ({
      citation: `[${idx + 1}]`,
      content: s.content,
      metadata: s.metadata,
    })),
  };
}

Confidence Scoring

Score answer confidence based on retrieval quality.

function calculateConfidence(results: SearchResult[]): number {
  if (results.length === 0) return 0;
  
  // Average similarity score
  const avgScore = results.reduce((sum, r) => sum + r.score, 0) / results.length;
  
  // Score distribution (lower variance = higher confidence)
  const variance = results.reduce(
    (sum, r) => sum + Math.pow(r.score - avgScore, 2),
    0
  ) / results.length;
  
  // Combine metrics
  const confidence = avgScore * (1 - Math.min(variance, 0.5));
  
  return Math.round(confidence * 100);
}

async function answerWithConfidence(query: string) {
  const results = await rag.query(query);
  const confidence = calculateConfidence(results.sources);
  
  if (confidence < 50) {
    return {
      answer: "I don't have enough information to answer confidently.",
      confidence,
      sources: results.sources,
    };
  }
  
  // Generate answer...
}

Performance Optimization

Caching Strategy

Cache embeddings and search results for better performance.

In-Memory Caching (Development)

Simple in-memory caching for development:

class CachedRAG {
  private embeddingCache = new Map<string, number[]>();
  private searchCache = new Map<string, SearchResult[]>();
  
  async getEmbedding(text: string): Promise<number[]> {
    if (this.embeddingCache.has(text)) {
      return this.embeddingCache.get(text)!;
    }
    
    const embedding = await embeddings.embed(text);
    this.embeddingCache.set(text, embedding);
    return embedding;
  }
  
  async search(query: string): Promise<SearchResult[]> {
    const cacheKey = `${query}:${Date.now() / 60000 | 0}`; // 1-minute cache
    
    if (this.searchCache.has(cacheKey)) {
      return this.searchCache.get(cacheKey)!;
    }
    
    const results = await vectorStore.search(query);
    this.searchCache.set(cacheKey, results);
    return results;
  }
}

Redis Caching (Production)

Use Redis for distributed caching across multiple instances:

import { CacheService } from '@hazeljs/cache';
import { RAGPipeline } from '@hazeljs/rag';

class ProductionRAGService {
  private cache: CacheService;
  private rag: RAGPipeline;

  constructor() {
    // Initialize Redis cache
    this.cache = new CacheService({
      strategy: 'redis',
      redis: {
        host: process.env.REDIS_HOST || 'localhost',
        port: parseInt(process.env.REDIS_PORT || '6379'),
        password: process.env.REDIS_PASSWORD,
      },
      ttl: 3600, // 1 hour default
    });

    this.rag = new RAGPipeline({
      vectorStore,
      embeddingProvider: embeddings,
    });
  }

  async search(query: string, options?: { topK?: number }): Promise<SearchResult[]> {
    const cacheKey = `rag:search:${query}:${options?.topK || 5}`;
    
    // Try cache first
    const cached = await this.cache.get<SearchResult[]>(cacheKey);
    if (cached) {
      console.log('Cache hit for query:', query);
      return cached;
    }

    // Perform search
    console.log('Cache miss, performing search:', query);
    const results = await this.rag.search(query, options);

    // Cache results with tags for invalidation
    await this.cache.setWithTags(
      cacheKey,
      results,
      3600, // 1 hour TTL
      ['rag-searches', `topK:${options?.topK || 5}`]
    );

    return results;
  }

  async addDocuments(documents: Document[]): Promise<void> {
    // Add documents to vector store
    await this.rag.addDocuments(documents);

    // Invalidate all search caches since index changed
    await this.cache.invalidateTags(['rag-searches']);
    console.log('Invalidated all RAG search caches');
  }

  async getCachedStats(): Promise<any> {
    return await this.cache.getStats();
  }
}

// Usage
const ragService = new ProductionRAGService();

// First search - cache miss
const results1 = await ragService.search('What is HazelJS?');

// Second search - cache hit (fast!)
const results2 = await ragService.search('What is HazelJS?');

// Add new documents - invalidates cache
await ragService.addDocuments([
  { content: 'New documentation', metadata: {} }
]);

// Next search - cache miss (cache was invalidated)
const results3 = await ragService.search('What is HazelJS?');

// Check cache performance
const stats = await ragService.getCachedStats();
console.log('Cache hit rate:', stats.hitRate);

Multi-Tier Caching (Optimal Performance)

Combine memory and Redis for best performance:

import { CacheService } from '@hazeljs/cache';
import { RAGPipeline } from '@hazeljs/rag';

class OptimizedRAGService {
  private cache: CacheService;
  private rag: RAGPipeline;

  constructor(rag: RAGPipeline) {
    this.rag = rag;

    // Multi-tier: L1 (memory) + L2 (Redis)
    this.cache = new CacheService({
      strategy: 'multi-tier',
      redis: {
        host: process.env.REDIS_HOST || 'localhost',
        port: parseInt(process.env.REDIS_PORT || '6379'),
      },
      ttl: 3600,
    });
  }

  async search(query: string): Promise<SearchResult[]> {
    return await this.cache.getOrSet(
      `rag:search:${query}`,
      async () => {
        // Only called on cache miss
        return await this.rag.search(query);
      },
      3600,
      ['rag-searches']
    );
  }
}

Benefits of Redis Caching:

  • Distributed: Shared cache across multiple server instances
  • Persistent: Cache survives server restarts
  • Scalable: Handle high traffic with Redis cluster
  • Fast: Sub-millisecond response times for cached queries
  • Tag-based invalidation: Invalidate related caches together

Batch Processing

Process documents in batches for efficiency.

async function indexLargeDataset(documents: Document[]) {
  const batchSize = 100;
  
  for (let i = 0; i < documents.length; i += batchSize) {
    const batch = documents.slice(i, i + batchSize);
    
    console.log(`Processing batch ${i / batchSize + 1}...`);
    await vectorStore.addDocuments(batch);
    
    // Rate limiting
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
}

Parallel Retrieval

Retrieve from multiple sources in parallel.

async function parallelRetrieval(query: string) {
  const [vectorResults, keywordResults, cachedResults] = await Promise.all([
    vectorStore.search(query),
    bm25.search(query, 10),
    getCachedResults(query),
  ]);
  
  // Merge and deduplicate
  const allResults = [...vectorResults, ...keywordResults, ...cachedResults];
  const uniqueResults = deduplicateById(allResults);
  
  return uniqueResults.sort((a, b) => b.score - a.score).slice(0, 10);
}
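
deduplicateById is left undefined above; a minimal version that keeps the highest-scoring copy of each ID:

function deduplicateById(results: SearchResult[]): SearchResult[] {
  const byId = new Map<string, SearchResult>();
  for (const result of results) {
    const existing = byId.get(result.id);
    if (!existing || result.score > existing.score) {
      byId.set(result.id, result); // keep the best-scoring duplicate
    }
  }
  return [...byId.values()];
}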

Error Handling

Graceful Degradation

Fall back to simpler strategies on failure.

async function robustSearch(query: string) {
  try {
    // Try hybrid search first
    return await hybridSearch.search(query);
  } catch (error) {
    console.warn('Hybrid search failed, falling back to vector search');
    
    try {
      return await vectorStore.search(query);
    } catch (fallbackError) {
      console.error('Vector search failed, using cached results');
      return getCachedResults(query);
    }
  }
}
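
getCachedResults is left undefined in the snippets above; a minimal in-memory sketch (a real deployment would back this with the Redis CacheService shown earlier):

const resultCache = new Map<string, SearchResult[]>();

function getCachedResults(query: string): SearchResult[] {
  // Return possibly stale results if we have them; otherwise an empty list
  return resultCache.get(query) ?? [];
}

function cacheResults(query: string, results: SearchResult[]): void {
  resultCache.set(query, results);
}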

Retry Logic

Implement exponential backoff for transient failures.

async function searchWithRetry(
  query: string,
  maxRetries = 3
): Promise<SearchResult[]> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await vectorStore.search(query);
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      
      const delay = Math.pow(2, attempt) * 1000;
      console.log(`Retry ${attempt + 1} after ${delay}ms`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  
  throw new Error('Max retries exceeded');
}

Monitoring and Observability

Metrics Collection

Track key RAG metrics.

class RAGMetrics {
  private metrics = {
    queries: 0,
    avgLatency: 0,
    avgRelevance: 0,
    cacheHits: 0,
    errors: 0,
  };
  
  async trackQuery(fn: () => Promise<any>) {
    const start = Date.now();
    this.metrics.queries++;
    
    try {
      const result = await fn();
      const latency = Date.now() - start;
      
      this.metrics.avgLatency = 
        (this.metrics.avgLatency * (this.metrics.queries - 1) + latency) /
        this.metrics.queries;
      
      return result;
    } catch (error) {
      this.metrics.errors++;
      throw error;
    }
  }
  
  getMetrics() {
    return { ...this.metrics };
  }
}
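
Wrap any retrieval call to record it. Note the sketch above only updates latency and error counts; avgRelevance and cacheHits are left for you to wire up:

const metrics = new RAGMetrics();

const results = await metrics.trackQuery(() =>
  vectorStore.search('vector database setup')
);

console.log(metrics.getMetrics());
// e.g. { queries: 1, avgLatency: 142, avgRelevance: 0, cacheHits: 0, errors: 0 }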

Logging

Structured logging for debugging.

function logRAGOperation(operation: string, data: any) {
  console.log(JSON.stringify({
    timestamp: new Date().toISOString(),
    operation,
    ...data,
  }));
}

async function searchWithLogging(query: string) {
  logRAGOperation('search_start', { query });
  
  const start = Date.now();
  const results = await vectorStore.search(query);
  const duration = Date.now() - start;
  
  logRAGOperation('search_complete', {
    query,
    resultCount: results.length,
    duration,
    topScore: results[0]?.score,
  });
  
  return results;
}

Testing Strategies

Unit Testing

Test individual components.

import { describe, it, expect } from '@jest/globals';

describe('RAG Pipeline', () => {
  it('should retrieve relevant documents', async () => {
    const rag = new RAGPipeline({
      vectorStore: new MemoryVectorStore(embeddings),
      embeddingProvider: embeddings,
    });
    
    await rag.addDocuments([
      { content: 'TypeScript is a typed superset of JavaScript' },
    ]);
    
    const results = await rag.query('What is TypeScript?');
    
    expect(results.sources).toHaveLength(1);
    expect(results.sources[0].score).toBeGreaterThan(0.7);
  });
});

Integration Testing

Test the end-to-end RAG flow.

describe('RAG Integration', () => {
  it('should answer questions correctly', async () => {
    const answer = await basicRAG('What is HazelJS?');
    
    expect(answer.answer).toContain('framework');
    expect(answer.sources.length).toBeGreaterThan(0);
  });
});

Evaluation Metrics

Measure RAG quality.

interface EvaluationResult {
  precision: number;
  recall: number;
  f1Score: number;
}

function evaluateRetrieval(
  retrieved: SearchResult[],
  relevant: string[]
): EvaluationResult {
  const retrievedIds = new Set(retrieved.map(r => r.id));
  const relevantIds = new Set(relevant);
  
  const truePositives = [...retrievedIds].filter(id => relevantIds.has(id)).length;
  const precision = retrieved.length > 0 ? truePositives / retrieved.length : 0;
  const recall = relevant.length > 0 ? truePositives / relevant.length : 0;
  const f1Score =
    precision + recall > 0 ? (2 * precision * recall) / (precision + recall) : 0;
  
  return { precision, recall, f1Score };
}
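
A worked example: if the retriever returns three documents with IDs 'a', 'b', 'c' and the gold set is ['a', 'b', 'd', 'e'], then precision is 2/3, recall is 2/4, and F1 ≈ 0.57:

// `retrievedResults` is a hypothetical list of three SearchResults with ids 'a', 'b', 'c'
const { precision, recall, f1Score } = evaluateRetrieval(
  retrievedResults,
  ['a', 'b', 'd', 'e']
);
// precision ≈ 0.67, recall = 0.5, f1Score ≈ 0.57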

Production Checklist

Before Deployment

  • Choose appropriate vector store for scale
  • Implement error handling and retries
  • Add monitoring and logging
  • Set up caching strategy
  • Configure rate limiting
  • Test with production-like data
  • Optimize chunk size and overlap
  • Implement metadata filtering
  • Add confidence scoring
  • Set up backup and recovery

Performance Targets

  • Search Latency: < 500ms for p95
  • Indexing Throughput: > 100 docs/second
  • Cache Hit Rate: > 70%
  • Error Rate: < 1%
  • Relevance Score: > 0.7 average

What's Next?