The @hazeljs/rag package provides a comprehensive Retrieval-Augmented Generation (RAG) implementation with built-in memory management, so you can build intelligent, context-aware applications backed by semantic search, vector databases, and persistent conversation memory.
Building RAG applications requires integrating vector databases, managing embeddings, implementing search strategies, handling document chunking, and maintaining conversation context. The @hazeljs/rag package solves these challenges by providing:
- @Embeddable, @SemanticSearch, and @HybridSearch decorators for declarative RAG.
- A unified vector store API: start with in-memory storage for development, then seamlessly switch to Pinecone, Qdrant, Weaviate, or ChromaDB for production, all with the same API.
- Built-in support for hybrid search (combining vector and keyword search), multi-query retrieval (generating multiple search queries), and BM25 keyword ranking.
- A decorator-based API, so you can add RAG capabilities with a single decorator instead of managing vector stores, embeddings, or search logic manually.
- Proper error handling, TypeScript support, connection pooling, and battle-tested patterns that make it ready for production use.
- Extensibility: add custom vector stores, embedding providers, or retrieval strategies by implementing simple interfaces (see the sketch after this list).
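For example, a custom embedding provider only needs to turn text into vectors, for single strings and for batches. The sketch below is a minimal illustration; the interface name and method signatures are assumptions inferred from the embed/embedBatch calls shown later in this guide, not a definitive contract.
// Hypothetical shape of an embedding provider, inferred from the
// embed()/embedBatch() usage elsewhere in this guide.
interface EmbeddingProvider {
  embed(text: string): Promise<number[]>;
  embedBatch(texts: string[]): Promise<number[][]>;
}
// Toy provider that hashes characters into a fixed-size vector.
// Useful only for tests; a real provider would call an embedding API.
class HashEmbeddings implements EmbeddingProvider {
  constructor(private dimensions = 64) {}
  async embed(text: string): Promise<number[]> {
    const vector = new Array<number>(this.dimensions).fill(0);
    for (let i = 0; i < text.length; i++) {
      vector[text.charCodeAt(i) % this.dimensions] += 1;
    }
    return vector;
  }
  async embedBatch(texts: string[]): Promise<number[][]> {
    return Promise.all(texts.map((text) => this.embed(text)));
  }
}
An instance of such a provider could then be passed wherever this guide passes OpenAIEmbeddings, assuming the package accepts any object with these methods.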
# Core RAG package
npm install @hazeljs/rag
# Peer dependencies (choose based on your needs)
npm install openai # For OpenAI embeddings
# Optional: Vector store clients (install only what you need)
npm install @pinecone-database/pinecone # For Pinecone
npm install @qdrant/js-client-rest # For Qdrant
npm install weaviate-ts-client # For Weaviate
npm install chromadb # For ChromaDB
Optional Dependencies:
# For Cohere embeddings
npm install cohere-ai
The simplest way to get started with RAG:
import {
RAGPipeline,
OpenAIEmbeddings,
MemoryVectorStore
} from '@hazeljs/rag';
// Setup embeddings provider
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
dimensions: 1536,
});
// Create vector store
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
// Create RAG pipeline
const rag = new RAGPipeline({
vectorStore,
embeddingProvider: embeddings,
topK: 5, // Return top 5 results
});
await rag.initialize();
// Index documents
await rag.addDocuments([
{
content: 'HazelJS is a modern TypeScript framework for building scalable applications.',
metadata: { category: 'framework', source: 'docs' },
},
{
content: 'The RAG package provides semantic search and vector database integration.',
metadata: { category: 'rag', source: 'docs' },
},
]);
// Query with semantic search
const results = await rag.search('What is HazelJS?', { topK: 3 });
console.log('Search Results:');
results.forEach((result, index) => {
console.log(`${index + 1}. ${result.content}`);
console.log(` Score: ${result.score}`);
console.log(` Metadata:`, result.metadata);
});
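The retrieved chunks are usually passed to an LLM as grounding context for answer generation. A minimal sketch of that step follows; the prompt format and the callLLM placeholder are illustrative and not part of the package.
// Assemble the retrieved chunks into a context block for the model.
const context = results
  .map((result, index) => `[${index + 1}] ${result.content}`)
  .join('\n');
const prompt = `Answer the question using only the context below.

Context:
${context}

Question: What is HazelJS?`;
// callLLM is a placeholder for whichever chat/completion client you use.
// const answer = await callLLM(prompt);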
The RAG package supports 5 vector store implementations with a unified interface.
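The shared surface, as used throughout this guide, boils down to initialization, document ingestion, and search. The interface below is a rough sketch inferred from those calls; the package's actual type may include more members.
// Rough shape of the unified store API, inferred from the examples below.
interface VectorStoreLike {
  initialize(): Promise<void>;
  addDocuments(
    documents: { content: string; metadata?: Record<string, unknown> }[],
  ): Promise<void>;
  search(
    query: string,
    options?: { topK?: number; filter?: Record<string, unknown> },
  ): Promise<{ content: string; score: number; metadata?: Record<string, unknown> }[]>;
}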
In-memory storage with no external dependencies. Perfect for development and testing.
Advantages: no external services or API keys, instant setup, and fast lookups for small datasets.
Limitations: data is not persisted (it is lost on restart) and it does not scale beyond a single process, so it is unsuitable for production workloads.
import { MemoryVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
// Use it
await vectorStore.addDocuments(documents);
const results = await vectorStore.search('query', { topK: 5 });
Fully managed, serverless vector database with automatic scaling.
Advantages: fully managed and serverless, so there is no infrastructure to run, with automatic scaling, persistence, and hybrid search support.
Limitations: it is a paid service and requires a Pinecone account and API key.
import { PineconeVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new PineconeVectorStore(embeddings, {
apiKey: process.env.PINECONE_API_KEY,
environment: process.env.PINECONE_ENVIRONMENT,
indexName: 'my-knowledge-base',
});
await vectorStore.initialize();
// Same API as Memory store
await vectorStore.addDocuments(documents);
const results = await vectorStore.search('query', { topK: 5 });
Setup: create an index in the Pinecone console, then provide PINECONE_API_KEY, PINECONE_ENVIRONMENT, and your index name as shown above.
Rust-based vector database optimized for speed and efficiency.
Advantages: open source and free to self-host, with high performance, persistence, hybrid search, and metadata filtering.
Limitations: you run and operate the server yourself (typically via Docker).
import { QdrantVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new QdrantVectorStore(embeddings, {
url: process.env.QDRANT_URL || 'http://localhost:6333',
collectionName: 'my-knowledge-base',
});
await vectorStore.initialize();
Setup with Docker:
docker run -p 6333:6333 qdrant/qdrant
Open-source vector database with GraphQL API and advanced features.
Advantages: open source, with a GraphQL API, persistence, hybrid search, and multi-tenancy support.
Limitations: self-hosted, so you manage the deployment (typically via Docker).
import { WeaviateVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new WeaviateVectorStore(embeddings, {
host: process.env.WEAVIATE_HOST || 'http://localhost:8080',
className: 'MyKnowledgeBase',
});
await vectorStore.initialize();
Setup with Docker:
docker run -p 8080:8080 semitechnologies/weaviate:latest
Lightweight, embeddable vector database perfect for prototyping.
Advantages: lightweight, quick to set up, and free, which makes it well suited to prototyping.
Limitations: medium scalability, and no hybrid search or multi-tenancy support.
import { ChromaVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new ChromaVectorStore(embeddings, {
url: process.env.CHROMA_URL || 'http://localhost:8000',
collectionName: 'my-knowledge-base',
});
await vectorStore.initialize();
// ChromaDB-specific features
const stats = await vectorStore.getStats();
console.log('Collection size:', stats.count);
const preview = await vectorStore.peek(5);
console.log('First 5 documents:', preview);
Setup with Docker:
docker run -p 8000:8000 chromadb/chroma
| Feature | Memory | Pinecone | Qdrant | Weaviate | ChromaDB |
|---|---|---|---|---|---|
| Setup | None | API Key | Docker | Docker | Docker |
| Persistence | ❌ | ✅ | ✅ | ✅ | ✅ |
| Scalability | Low | High | High | High | Medium |
| Performance | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Cost | Free | Paid | Free (OSS) | Free (OSS) | Free (OSS) |
| Best For | Dev/Test | Production | High-perf | GraphQL | Prototyping |
| Metadata Filtering | ✅ | ✅ | ✅ | ✅ | ✅ |
| Hybrid Search | ❌ | ✅ | ✅ | ✅ | ❌ |
| Multi-tenancy | ❌ | ✅ | ✅ | ✅ | ❌ |
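Because every store exposes the same methods, switching from development to production can be a configuration concern rather than a code change. A minimal sketch, using only the constructors shown earlier in this guide:
import {
  MemoryVectorStore,
  PineconeVectorStore,
  OpenAIEmbeddings,
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});
// Pick a store based on the environment; the rest of the code is unchanged.
const vectorStore =
  process.env.NODE_ENV === 'production'
    ? new PineconeVectorStore(embeddings, {
        apiKey: process.env.PINECONE_API_KEY,
        environment: process.env.PINECONE_ENVIRONMENT,
        indexName: 'my-knowledge-base',
      })
    : new MemoryVectorStore(embeddings);
await vectorStore.initialize();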
Embedding providers convert text into vector representations for semantic search.
State-of-the-art embeddings from OpenAI with multiple model options.
Models:
- text-embedding-3-small: 1536 dimensions, fast and cost-effective
- text-embedding-3-large: 3072 dimensions, highest quality
- text-embedding-ada-002: legacy model, 1536 dimensions

import { OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
dimensions: 1536, // Optional: reduce dimensions for faster search
});
// Embed single text
const vector = await embeddings.embed('Hello world');
console.log('Vector dimensions:', vector.length);
// Embed multiple texts (batch)
const vectors = await embeddings.embedBatch([
'First document',
'Second document',
'Third document',
]);
Multilingual embeddings from Cohere with excellent performance.
Models:
- embed-english-v3.0: English-only, high quality
- embed-multilingual-v3.0: 100+ languages
- embed-english-light-v3.0: faster, smaller model

import { CohereEmbeddings } from '@hazeljs/rag';
const embeddings = new CohereEmbeddings({
apiKey: process.env.COHERE_API_KEY,
model: 'embed-english-v3.0',
inputType: 'search_document', // or 'search_query'
});
const vector = await embeddings.embed('Hello world');
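Cohere distinguishes embeddings intended for stored documents from embeddings intended for queries via inputType. One way to handle this, sketched under the assumption that the constructor options above are the only configuration needed, is to keep one instance per purpose:
// One provider instance for indexing documents...
const docEmbeddings = new CohereEmbeddings({
  apiKey: process.env.COHERE_API_KEY,
  model: 'embed-english-v3.0',
  inputType: 'search_document',
});
// ...and another for embedding user queries at search time.
const queryEmbeddings = new CohereEmbeddings({
  apiKey: process.env.COHERE_API_KEY,
  model: 'embed-english-v3.0',
  inputType: 'search_query',
});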
Advanced search strategies for better results.
Combines vector similarity search with BM25 keyword search for best results.
How it works: the query runs through both vector similarity search and BM25 keyword search, and the two score sets are combined using the configured vectorWeight and keywordWeight to produce the final ranking.
import {
HybridSearchRetrieval,
MemoryVectorStore,
OpenAIEmbeddings
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
const hybridSearch = new HybridSearchRetrieval(vectorStore, {
vectorWeight: 0.7, // 70% weight to semantic search
keywordWeight: 0.3, // 30% weight to keyword search
topK: 10,
});
// Add documents
await vectorStore.addDocuments(documents);
// Search with hybrid strategy
const results = await hybridSearch.search('machine learning algorithms', {
topK: 5,
});
Generates multiple query variations using an LLM to improve recall.
How it works: an LLM generates several variations of the original query, each variation is searched against the vector store, and the results are merged into a single ranked list, improving recall for ambiguous or underspecified questions.
import {
MultiQueryRetrieval,
MemoryVectorStore,
OpenAIEmbeddings
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
const multiQuery = new MultiQueryRetrieval(vectorStore, {
llmApiKey: process.env.OPENAI_API_KEY,
numQueries: 3, // Generate 3 query variations
topK: 10,
});
// Add documents
await vectorStore.addDocuments(documents);
// Search with multiple query variations
const results = await multiQuery.search('How do I deploy my app?', {
topK: 5,
});
Intelligent document chunking for optimal retrieval.
Splits text recursively by trying different separators (paragraphs, sentences, words).
import { RecursiveCharacterTextSplitter } from '@hazeljs/rag';
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000, // Target chunk size in characters
chunkOverlap: 200, // Overlap between chunks for context
separators: ['\n\n', '\n', '. ', ' '], // Try these in order
});
const chunks = await splitter.splitText(longDocument);
console.log(`Split into ${chunks.length} chunks`);
chunks.forEach((chunk, i) => {
console.log(`Chunk ${i + 1}: ${chunk.substring(0, 50)}...`);
});
Simple character-based splitting with overlap.
import { CharacterTextSplitter } from '@hazeljs/rag';
const splitter = new CharacterTextSplitter({
chunkSize: 500,
chunkOverlap: 50,
separator: '\n',
});
const chunks = await splitter.splitText(document);
Splits by token count (useful for LLM context windows).
import { TokenTextSplitter } from '@hazeljs/rag';
const splitter = new TokenTextSplitter({
chunkSize: 512, // Max tokens per chunk
chunkOverlap: 50, // Overlap in tokens
encodingName: 'cl100k_base', // OpenAI encoding
});
const chunks = await splitter.splitText(document);
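Splitters and vector stores are typically combined when indexing long documents: split first, then add each chunk with metadata pointing back to its source. A minimal sketch using the APIs shown above (the metadata fields are illustrative):
import {
  RecursiveCharacterTextSplitter,
  MemoryVectorStore,
  OpenAIEmbeddings,
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const longDocument = 'Your long source text goes here...';
// Split the source document and index each chunk with provenance metadata.
const chunks = await splitter.splitText(longDocument);
await vectorStore.addDocuments(
  chunks.map((content, i) => ({
    content,
    metadata: { source: 'user-guide.md', chunkIndex: i },
  })),
);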
Declarative RAG with decorators.
Mark a class as embeddable for automatic vector storage.
import { Embeddable, Embedded } from '@hazeljs/rag';
@Embeddable({
vectorStore: 'memory',
embeddingProvider: 'openai',
})
class Article {
@Embedded()
title: string;
@Embedded()
content: string;
metadata: {
author: string;
date: Date;
};
}
Add semantic search to a method.
import { Controller, Get, Query } from '@hazeljs/common';
import { SemanticSearch } from '@hazeljs/rag';
@Controller('search')
class SearchController {
@Get()
@SemanticSearch({
vectorStore: 'pinecone',
topK: 5,
})
async search(@Query('q') query: string) {
// Results automatically injected
return { query, results: this.searchResults };
}
}
Add hybrid search (vector + keyword) to a method.
import { Controller, Get, Query } from '@hazeljs/common';
import { HybridSearch } from '@hazeljs/rag';
@Controller('search')
class SearchController {
@Get('hybrid')
@HybridSearch({
vectorStore: 'qdrant',
vectorWeight: 0.7,
keywordWeight: 0.3,
topK: 10,
})
async hybridSearch(@Query('q') query: string) {
return { query, results: this.searchResults };
}
}
Choose a vector store that matches your stage:
- MemoryVectorStore for fast iteration
- PineconeVectorStore for zero infrastructure
- QdrantVectorStore for performance and cost
- ChromaVectorStore for quick setup

Tune chunk size to the task:
// For Q&A: Smaller chunks (200-500 chars)
const qaChunks = new RecursiveCharacterTextSplitter({
chunkSize: 300,
chunkOverlap: 50,
});
// For summarization: Larger chunks (1000-2000 chars)
const summaryChunks = new RecursiveCharacterTextSplitter({
chunkSize: 1500,
chunkOverlap: 200,
});
// Add metadata when indexing
await vectorStore.addDocuments([
{
content: 'Document content',
metadata: {
category: 'technical',
date: '2024-01-01',
author: 'John Doe',
},
},
]);
// Filter during search
const results = await vectorStore.search('query', {
topK: 5,
filter: {
category: 'technical',
date: { $gte: '2024-01-01' },
},
});
import { CacheService } from '@hazeljs/cache';
class RAGService {
constructor(
private vectorStore: VectorStore,
private cache: CacheService,
) {}
async search(query: string) {
const cacheKey = `search:${query}`;
// Check cache first
const cached = await this.cache.get(cacheKey);
if (cached) return cached;
// Perform search
const results = await this.vectorStore.search(query);
// Cache results
await this.cache.set(cacheKey, results, 3600); // 1 hour
return results;
}
}
async function searchWithMetrics(query: string) {
const start = Date.now();
try {
const results = await vectorStore.search(query);
const duration = Date.now() - start;
console.log(`Search completed in ${duration}ms`);
console.log(`Found ${results.length} results`);
return results;
} catch (error) {
console.error('Search failed:', error);
throw error;
}
}
// Add retry logic
async function connectWithRetry(vectorStore: VectorStore, maxRetries = 3) {
for (let i = 0; i < maxRetries; i++) {
try {
await vectorStore.initialize();
console.log('Connected successfully');
return;
} catch (error) {
console.log(`Connection attempt ${i + 1} failed`);
if (i === maxRetries - 1) throw error;
await new Promise(resolve => setTimeout(resolve, 1000 * (i + 1)));
}
}
}
// Ensure embedding dimensions match vector store configuration
// OpenAI text-embedding-3-small = 1536 dimensions
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
dimensions: 1536, // Must match index
});
# Qdrant
docker run -p 6333:6333 qdrant/qdrant
# Weaviate
docker run -p 8080:8080 semitechnologies/weaviate:latest
# ChromaDB
docker run -p 8000:8000 chromadb/chroma
The RAG package includes a powerful memory system for building context-aware AI applications. See the Memory System Guide for complete documentation.
import {
RAGPipelineWithMemory,
MemoryManager,
HybridMemory,
BufferMemory,
VectorMemory,
} from '@hazeljs/rag';
// Setup memory
const buffer = new BufferMemory({ maxSize: 20 });
const vectorMemory = new VectorMemory(vectorStore, embeddings);
const hybridMemory = new HybridMemory(buffer, vectorMemory);
const memoryManager = new MemoryManager(hybridMemory, {
maxConversationLength: 20,
summarizeAfter: 50,
entityExtraction: true,
});
// Create RAG with memory (llmFunction is your LLM callable used to generate answers)
const rag = new RAGPipelineWithMemory(
{ vectorStore, embeddingProvider: embeddings },
memoryManager,
llmFunction
);
// Query with conversation context
const response = await rag.queryWithMemory(
'What did we discuss about pricing?',
'session-123',
'user-456'
);
console.log(response.answer);
console.log('Memories:', response.memories);
console.log('History:', response.conversationHistory);
Learn more in the Memory System Guide.
For complete API documentation, see the RAG API Reference.