The @hazeljs/rag package provides a comprehensive Retrieval-Augmented Generation (RAG) implementation with built-in memory management, so you can build intelligent, context-aware applications backed by semantic search, vector databases, and persistent conversation memory.
Building RAG applications requires integrating vector databases, managing embeddings, implementing search strategies, handling document chunking, and maintaining conversation context. The @hazeljs/rag package solves these challenges by providing:
- @Embeddable, @SemanticSearch, and @HybridSearch decorators for declarative RAG.
- A unified vector store API: start with in-memory storage for development, then seamlessly switch to Pinecone, Qdrant, Weaviate, or ChromaDB for production, all with the same API.
- Built-in support for hybrid search (combining vector and keyword search), multi-query retrieval (generating multiple search queries), and BM25 keyword ranking.
- A decorator-based API, so you can add RAG capabilities with a single decorator instead of managing vector stores, embeddings, or search logic manually.
- Proper error handling, TypeScript support, connection pooling, and battle-tested patterns that make it ready for production use.
- Extensibility: add custom vector stores, embedding providers, or retrieval strategies by implementing simple interfaces (see the sketch after this list).
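For example, a custom embedding provider only needs to turn text into vectors, for single strings and for batches. The sketch below is a minimal illustration; the interface name and method signatures are assumptions inferred from the embed/embedBatch calls shown later in this guide, not a definitive contract.
// Hypothetical shape of an embedding provider, inferred from the
// embed()/embedBatch() usage elsewhere in this guide.
interface EmbeddingProvider {
  embed(text: string): Promise<number[]>;
  embedBatch(texts: string[]): Promise<number[][]>;
}
// Toy provider that hashes characters into a fixed-size vector.
// Useful only for tests; a real provider would call an embedding API.
class HashEmbeddings implements EmbeddingProvider {
  constructor(private dimensions = 64) {}
  async embed(text: string): Promise<number[]> {
    const vector = new Array<number>(this.dimensions).fill(0);
    for (let i = 0; i < text.length; i++) {
      vector[text.charCodeAt(i) % this.dimensions] += 1;
    }
    return vector;
  }
  async embedBatch(texts: string[]): Promise<number[][]> {
    return Promise.all(texts.map((text) => this.embed(text)));
  }
}
An instance of such a provider could then be passed wherever this guide passes OpenAIEmbeddings, assuming the package accepts any object with these methods.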
# Core RAG package
npm install @hazeljs/rag
# Peer dependencies (choose based on your needs)
npm install openai # For OpenAI embeddings
# Optional: Vector store clients (install only what you need)
npm install @pinecone-database/pinecone # For Pinecone
npm install @qdrant/js-client-rest # For Qdrant
npm install weaviate-ts-client # For Weaviate
npm install chromadb # For ChromaDB
Optional Dependencies:
# For Cohere embeddings
npm install cohere-ai
The simplest way to get started with RAG:
import {
RAGPipeline,
OpenAIEmbeddings,
MemoryVectorStore
} from '@hazeljs/rag';
// Setup embeddings provider
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
dimensions: 1536,
});
// Create vector store
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
// Create RAG pipeline
const rag = new RAGPipeline({
vectorStore,
embeddingProvider: embeddings,
topK: 5, // Return top 5 results
});
await rag.initialize();
// Index documents
await rag.addDocuments([
{
content: 'HazelJS is a modern TypeScript framework for building scalable applications.',
metadata: { category: 'framework', source: 'docs' },
},
{
content: 'The RAG package provides semantic search and vector database integration.',
metadata: { category: 'rag', source: 'docs' },
},
]);
// Query with semantic search
const results = await rag.search('What is HazelJS?', { topK: 3 });
console.log('Search Results:');
results.forEach((result, index) => {
console.log(`${index + 1}. ${result.content}`);
console.log(` Score: ${result.score}`);
console.log(` Metadata:`, result.metadata);
});
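The retrieved chunks are usually passed to an LLM as grounding context for answer generation. A minimal sketch of that step follows; the prompt format and the callLLM placeholder are illustrative and not part of the package.
// Assemble the retrieved chunks into a context block for the model.
const context = results
  .map((result, index) => `[${index + 1}] ${result.content}`)
  .join('\n');
const prompt = `Answer the question using only the context below.

Context:
${context}

Question: What is HazelJS?`;
// callLLM is a placeholder for whichever chat/completion client you use.
// const answer = await callLLM(prompt);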
The RAG package supports 5 vector store implementations with a unified interface.
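The shared surface, as used throughout this guide, boils down to initialization, document ingestion, and search. The interface below is a rough sketch inferred from those calls; the package's actual type may include more members.
// Rough shape of the unified store API, inferred from the examples below.
interface VectorStoreLike {
  initialize(): Promise<void>;
  addDocuments(
    documents: { content: string; metadata?: Record<string, unknown> }[],
  ): Promise<void>;
  search(
    query: string,
    options?: { topK?: number; filter?: Record<string, unknown> },
  ): Promise<{ content: string; score: number; metadata?: Record<string, unknown> }[]>;
}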
In-memory storage with no external dependencies. Perfect for development and testing.
Advantages: no external services or API keys, instant setup, and fast lookups for small datasets.
Limitations: data is not persisted (it is lost on restart) and it does not scale beyond a single process, so it is unsuitable for production workloads.
import { MemoryVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
// Use it
await vectorStore.addDocuments(documents);
const results = await vectorStore.search('query', { topK: 5 });
Fully managed, serverless vector database with automatic scaling.
Advantages: fully managed and serverless, so there is no infrastructure to run, with automatic scaling, persistence, and hybrid search support.
Limitations: it is a paid service and requires a Pinecone account and API key.
import { PineconeVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new PineconeVectorStore(embeddings, {
apiKey: process.env.PINECONE_API_KEY,
environment: process.env.PINECONE_ENVIRONMENT,
indexName: 'my-knowledge-base',
});
await vectorStore.initialize();
// Same API as Memory store
await vectorStore.addDocuments(documents);
const results = await vectorStore.search('query', { topK: 5 });
Setup: create an index in the Pinecone console, then provide PINECONE_API_KEY, PINECONE_ENVIRONMENT, and your index name as shown above.
Rust-based vector database optimized for speed and efficiency.
Advantages: open source and free to self-host, with high performance, persistence, hybrid search, and metadata filtering.
Limitations: you run and operate the server yourself (typically via Docker).
import { QdrantVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new QdrantVectorStore(embeddings, {
url: process.env.QDRANT_URL || 'http://localhost:6333',
collectionName: 'my-knowledge-base',
});
await vectorStore.initialize();
Setup with Docker:
docker run -p 6333:6333 qdrant/qdrant
Open-source vector database with GraphQL API and advanced features.
Advantages: open source, with a GraphQL API, persistence, hybrid search, and multi-tenancy support.
Limitations: self-hosted, so you manage the deployment (typically via Docker).
import { WeaviateVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new WeaviateVectorStore(embeddings, {
host: process.env.WEAVIATE_HOST || 'http://localhost:8080',
className: 'MyKnowledgeBase',
});
await vectorStore.initialize();
Setup with Docker:
docker run -p 8080:8080 semitechnologies/weaviate:latest
Lightweight, embeddable vector database perfect for prototyping.
Advantages: lightweight, quick to set up, and free, which makes it well suited to prototyping.
Limitations: medium scalability, and no hybrid search or multi-tenancy support.
import { ChromaVectorStore, OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new ChromaVectorStore(embeddings, {
url: process.env.CHROMA_URL || 'http://localhost:8000',
collectionName: 'my-knowledge-base',
});
await vectorStore.initialize();
// ChromaDB-specific features
const stats = await vectorStore.getStats();
console.log('Collection size:', stats.count);
const preview = await vectorStore.peek(5);
console.log('First 5 documents:', preview);
Setup with Docker:
docker run -p 8000:8000 chromadb/chroma
| Feature | Memory | Pinecone | Qdrant | Weaviate | ChromaDB |
|---|---|---|---|---|---|
| Setup | None | API Key | Docker | Docker | Docker |
| Persistence | ❌ | ✅ | ✅ | ✅ | ✅ |
| Scalability | Low | High | High | High | Medium |
| Performance | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Cost | Free | Paid | Free (OSS) | Free (OSS) | Free (OSS) |
| Best For | Dev/Test | Production | High-perf | GraphQL | Prototyping |
| Metadata Filtering | ✅ | ✅ | ✅ | ✅ | ✅ |
| Hybrid Search | ❌ | ✅ | ✅ | ✅ | ❌ |
| Multi-tenancy | ❌ | ✅ | ✅ | ✅ | ❌ |
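Because every store exposes the same methods, switching from development to production can be a configuration concern rather than a code change. A minimal sketch, using only the constructors shown earlier in this guide:
import {
  MemoryVectorStore,
  PineconeVectorStore,
  OpenAIEmbeddings,
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});
// Pick a store based on the environment; the rest of the code is unchanged.
const vectorStore =
  process.env.NODE_ENV === 'production'
    ? new PineconeVectorStore(embeddings, {
        apiKey: process.env.PINECONE_API_KEY,
        environment: process.env.PINECONE_ENVIRONMENT,
        indexName: 'my-knowledge-base',
      })
    : new MemoryVectorStore(embeddings);
await vectorStore.initialize();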
Embedding providers convert text into vector representations for semantic search.
State-of-the-art embeddings from OpenAI with multiple model options.
Models:
- text-embedding-3-small: 1536 dimensions, fast and cost-effective
- text-embedding-3-large: 3072 dimensions, highest quality
- text-embedding-ada-002: legacy model, 1536 dimensions

import { OpenAIEmbeddings } from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
dimensions: 1536, // Optional: reduce dimensions for faster search
});
// Embed single text
const vector = await embeddings.embed('Hello world');
console.log('Vector dimensions:', vector.length);
// Embed multiple texts (batch)
const vectors = await embeddings.embedBatch([
'First document',
'Second document',
'Third document',
]);
Multilingual embeddings from Cohere with excellent performance.
Models:
- embed-english-v3.0: English-only, high quality
- embed-multilingual-v3.0: 100+ languages
- embed-english-light-v3.0: faster, smaller model

import { CohereEmbeddings } from '@hazeljs/rag';
const embeddings = new CohereEmbeddings({
apiKey: process.env.COHERE_API_KEY,
model: 'embed-english-v3.0',
inputType: 'search_document', // or 'search_query'
});
const vector = await embeddings.embed('Hello world');
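Cohere distinguishes embeddings intended for stored documents from embeddings intended for queries via inputType. One way to handle this, sketched under the assumption that the constructor options above are the only configuration needed, is to keep one instance per purpose:
// One provider instance for indexing documents...
const docEmbeddings = new CohereEmbeddings({
  apiKey: process.env.COHERE_API_KEY,
  model: 'embed-english-v3.0',
  inputType: 'search_document',
});
// ...and another for embedding user queries at search time.
const queryEmbeddings = new CohereEmbeddings({
  apiKey: process.env.COHERE_API_KEY,
  model: 'embed-english-v3.0',
  inputType: 'search_query',
});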
Advanced search strategies for better results.
Combines vector similarity search with BM25 keyword search for best results.
How it works: the query runs through both vector similarity search and BM25 keyword search, and the two score sets are combined using the configured vectorWeight and keywordWeight to produce the final ranking.
import {
HybridSearchRetrieval,
MemoryVectorStore,
OpenAIEmbeddings
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
const hybridSearch = new HybridSearchRetrieval(vectorStore, {
vectorWeight: 0.7, // 70% weight to semantic search
keywordWeight: 0.3, // 30% weight to keyword search
topK: 10,
});
// Add documents
await vectorStore.addDocuments(documents);
// Search with hybrid strategy
const results = await hybridSearch.search('machine learning algorithms', {
topK: 5,
});
Generates multiple query variations using an LLM to improve recall.
How it works: an LLM generates several variations of the original query, each variation is searched against the vector store, and the results are merged into a single ranked list, improving recall for ambiguous or underspecified questions.
import {
MultiQueryRetrieval,
MemoryVectorStore,
OpenAIEmbeddings
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
const multiQuery = new MultiQueryRetrieval(vectorStore, {
llmApiKey: process.env.OPENAI_API_KEY,
numQueries: 3, // Generate 3 query variations
topK: 10,
});
// Add documents
await vectorStore.addDocuments(documents);
// Search with multiple query variations
const results = await multiQuery.search('How do I deploy my app?', {
topK: 5,
});
Intelligent document chunking for optimal retrieval.
Splits text recursively by trying different separators (paragraphs, sentences, words).
import { RecursiveCharacterTextSplitter } from '@hazeljs/rag';
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000, // Target chunk size in characters
chunkOverlap: 200, // Overlap between chunks for context
separators: ['\n\n', '\n', '. ', ' '], // Try these in order
});
const chunks = await splitter.splitText(longDocument);
console.log(`Split into ${chunks.length} chunks`);
chunks.forEach((chunk, i) => {
console.log(`Chunk ${i + 1}: ${chunk.substring(0, 50)}...`);
});
Simple character-based splitting with overlap.
import { CharacterTextSplitter } from '@hazeljs/rag';
const splitter = new CharacterTextSplitter({
chunkSize: 500,
chunkOverlap: 50,
separator: '\n',
});
const chunks = await splitter.splitText(document);
Splits by token count (useful for LLM context windows).
import { TokenTextSplitter } from '@hazeljs/rag';
const splitter = new TokenTextSplitter({
chunkSize: 512, // Max tokens per chunk
chunkOverlap: 50, // Overlap in tokens
encodingName: 'cl100k_base', // OpenAI encoding
});
const chunks = await splitter.splitText(document);
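Splitters and vector stores are typically combined when indexing long documents: split first, then add each chunk with metadata pointing back to its source. A minimal sketch using the APIs shown above (the metadata fields are illustrative):
import {
  RecursiveCharacterTextSplitter,
  MemoryVectorStore,
  OpenAIEmbeddings,
} from '@hazeljs/rag';
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
});
const vectorStore = new MemoryVectorStore(embeddings);
await vectorStore.initialize();
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const longDocument = 'Your long source text goes here...';
// Split the source document and index each chunk with provenance metadata.
const chunks = await splitter.splitText(longDocument);
await vectorStore.addDocuments(
  chunks.map((content, i) => ({
    content,
    metadata: { source: 'user-guide.md', chunkIndex: i },
  })),
);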
Declarative RAG with decorators.
Mark a class as embeddable for automatic vector storage.
import { Embeddable, Embedded } from '@hazeljs/rag';
@Embeddable({
vectorStore: 'memory',
embeddingProvider: 'openai',
})
class Article {
@Embedded()
title: string;
@Embedded()
content: string;
metadata: {
author: string;
date: Date;
};
}
Add semantic search to a method.
import { Controller, Get, Query } from '@hazeljs/common';
import { SemanticSearch } from '@hazeljs/rag';
@Controller('search')
class SearchController {
@Get()
@SemanticSearch({
vectorStore: 'pinecone',
topK: 5,
})
async search(@Query('q') query: string) {
// Results automatically injected
return { query, results: this.searchResults };
}
}
Add hybrid search (vector + keyword) to a method.
import { Controller, Get, Query } from '@hazeljs/common';
import { HybridSearch } from '@hazeljs/rag';
@Controller('search')
class SearchController {
@Get('hybrid')
@HybridSearch({
vectorStore: 'qdrant',
vectorWeight: 0.7,
keywordWeight: 0.3,
topK: 10,
})
async hybridSearch(@Query('q') query: string) {
return { query, results: this.searchResults };
}
}
Choose a vector store that matches your stage:
- MemoryVectorStore for fast iteration
- PineconeVectorStore for zero infrastructure
- QdrantVectorStore for performance and cost
- ChromaVectorStore for quick setup

Tune chunk size to the task:
// For Q&A: Smaller chunks (200-500 chars)
const qaChunks = new RecursiveCharacterTextSplitter({
chunkSize: 300,
chunkOverlap: 50,
});
// For summarization: Larger chunks (1000-2000 chars)
const summaryChunks = new RecursiveCharacterTextSplitter({
chunkSize: 1500,
chunkOverlap: 200,
});
// Add metadata when indexing
await vectorStore.addDocuments([
{
content: 'Document content',
metadata: {
category: 'technical',
date: '2024-01-01',
author: 'John Doe',
},
},
]);
// Filter during search
const results = await vectorStore.search('query', {
topK: 5,
filter: {
category: 'technical',
date: { $gte: '2024-01-01' },
},
});
import { CacheService } from '@hazeljs/cache';
class RAGService {
constructor(
private vectorStore: VectorStore,
private cache: CacheService,
) {}
async search(query: string) {
const cacheKey = `search:${query}`;
// Check cache first
const cached = await this.cache.get(cacheKey);
if (cached) return cached;
// Perform search
const results = await this.vectorStore.search(query);
// Cache results
await this.cache.set(cacheKey, results, 3600); // 1 hour
return results;
}
}
async function searchWithMetrics(query: string) {
const start = Date.now();
try {
const results = await vectorStore.search(query);
const duration = Date.now() - start;
console.log(`Search completed in ${duration}ms`);
console.log(`Found ${results.length} results`);
return results;
} catch (error) {
console.error('Search failed:', error);
throw error;
}
}
// Add retry logic
async function connectWithRetry(vectorStore: VectorStore, maxRetries = 3) {
for (let i = 0; i < maxRetries; i++) {
try {
await vectorStore.initialize();
console.log('Connected successfully');
return;
} catch (error) {
console.log(`Connection attempt ${i + 1} failed`);
if (i === maxRetries - 1) throw error;
await new Promise(resolve => setTimeout(resolve, 1000 * (i + 1)));
}
}
}
// Ensure embedding dimensions match vector store configuration
// OpenAI text-embedding-3-small = 1536 dimensions
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
model: 'text-embedding-3-small',
dimensions: 1536, // Must match index
});
# Qdrant
docker run -p 6333:6333 qdrant/qdrant
# Weaviate
docker run -p 8080:8080 semitechnologies/weaviate:latest
# ChromaDB
docker run -p 8000:8000 chromadb/chroma
The RAG package includes a powerful memory system for building context-aware AI applications. See the Memory System Guide for complete documentation.
import {
RAGPipelineWithMemory,
MemoryManager,
HybridMemory,
BufferMemory,
VectorMemory,
} from '@hazeljs/rag';
// Setup memory
const buffer = new BufferMemory({ maxSize: 20 });
const vectorMemory = new VectorMemory(vectorStore, embeddings);
const hybridMemory = new HybridMemory(buffer, vectorMemory);
const memoryManager = new MemoryManager(hybridMemory, {
maxConversationLength: 20,
summarizeAfter: 50,
entityExtraction: true,
});
// Create RAG with memory (llmFunction is your LLM callable used to generate answers)
const rag = new RAGPipelineWithMemory(
{ vectorStore, embeddingProvider: embeddings },
memoryManager,
llmFunction
);
// Query with conversation context
const response = await rag.queryWithMemory(
'What did we discuss about pricing?',
'session-123',
'user-456'
);
console.log(response.answer);
console.log('Memories:', response.memories);
console.log('History:', response.conversationHistory);
Learn more in the Memory System Guide.
For complete API documentation, see the RAG API Reference.