PDF-to-Audio Package

The @hazeljs/pdf-to-audio package converts PDF documents to audio using OpenAI TTS. It extracts text from a PDF, splits it into chunks sized for TTS, generates speech for each chunk, and merges the audio into a single output file. Typical uses include audiobooks, accessibility, and listening on the go.

Purpose

Converting documents to audio involves PDF parsing, text chunking, TTS orchestration, and audio merging. The @hazeljs/pdf-to-audio package handles these concerns:

PDF Text Extraction — Extract and clean text from PDF documents
Intelligent Chunking — Split text into optimal segments for TTS
OpenAI TTS — High-quality speech synthesis via tts-1 or tts-1-hd
Async Job Queue — BullMQ-based jobs for long conversions
Physical File Storage — Output files written to disk for serving or archiving
AI Summaries — Optional document summary at the start of the audio
Summary-Only Mode — Output just the summary without reading the full document
CLI Support — Convert PDFs from the command line against a running API
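
The chunking step in the list above can be illustrated with a minimal sketch. It is not the package's own chunker: `chunkText` and its `maxChars` parameter are hypothetical names, and the 4000-character default is an assumption chosen to stay under the 4096-character input limit of OpenAI's speech endpoint. The sketch packs whole sentences into each chunk and hard-splits only sentences that exceed the limit.

```typescript
// Hypothetical helper for illustration; not part of the @hazeljs/pdf-to-audio API.
// Splits text into chunks of at most maxChars, preferring sentence boundaries.
function chunkText(text: string, maxChars = 4000): string[] {
  if (maxChars <= 0) {
    throw new Error('maxChars must be positive');
  }
  // Collapse all whitespace runs (including newlines) to single spaces.
  const normalized = text.replace(/\s+/g, ' ').trim();
  if (normalized === '') {
    return [];
  }
  // A sentence ends at ., ! or ? followed by whitespace or end of text,
  // so decimals such as "3.14" are not split. Trailing text without
  // punctuation is captured by the second alternative.
  const sentences = (normalized.match(/.+?[.!?]+(?=\s|$)\s*|.+$/g) || []).map((s) => s.trim());

  const chunks: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    // Hard-split any single sentence that is longer than the limit.
    for (let i = 0; i < sentence.length; i += maxChars) {
      const piece = sentence.slice(i, i + maxChars);
      const candidate = current === '' ? piece : `${current} ${piece}`;
      if (candidate.length <= maxChars) {
        current = candidate;
      } else {
        chunks.push(current);
        current = piece;
      }
    }
  }
  if (current !== '') {
    chunks.push(current);
  }
  return chunks;
}
```

Splitting on sentence boundaries keeps each TTS request self-contained, which tends to avoid audible breaks in the middle of a sentence when the segments are merged.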

Architecture

graph TD
  A["PDF File"] --> B["PDF Parser"]
  B --> C["Text Extraction"]
  C --> D["Chunker"]
  D --> E["Chunks"]
  E --> F["OpenAI TTS"]
  F --> G["Audio Segments"]
  G --> H["Merge"]
  H --> I["MP3 / Opus Output"]
  J["Optional"] --> K["AI Summary"]
  K --> I
  
  style A fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style B fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style C fill:#3b82f6,stroke:#60a5fa,stroke-width:2px,color:#fff
  style D fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
  style E fill:#10b981,stroke:#34d399,stroke-width:2px,color:#fff
  style F fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
  style G fill:#8b5cf6,stroke:#a78bfa,stroke-width:2px,color:#fff
  style H fill:#f59e0b,stroke:#fbbf24,stroke-width:2px,color:#fff
  style I fill:#ec4899,stroke:#f472b6,stroke-width:2px,color:#fff
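
As a rough sketch of the flow in the diagram, the function below narrates an optional summary first, then each chunk in order, and joins the resulting segments. Everything here is an assumption made for illustration: `narrate`, `Synthesize`, and `concatBytes` are hypothetical names, the TTS call is injected rather than taken from a real provider, and plain byte concatenation stands in for the merge step, which in practice may need format-aware handling.

```typescript
// Hypothetical types for illustration; the package's real internals may differ.
type Synthesize = (text: string) => Promise<Uint8Array>;

// Concatenate byte arrays in order into one array.
function concatBytes(parts: Uint8Array[]): Uint8Array {
  const total = parts.reduce((sum, part) => sum + part.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}

async function narrate(
  chunks: string[],
  synthesize: Synthesize,
  summary?: string,
): Promise<Uint8Array> {
  // The optional summary is narrated first, then the chunks in document order.
  const inputs = summary ? [summary, ...chunks] : chunks;
  const segments: Uint8Array[] = [];
  for (const input of inputs) {
    // Sequential calls preserve segment order and avoid bursts of TTS requests.
    segments.push(await synthesize(input));
  }
  // Naive merge: assumes the segments can simply be appended to one another.
  return concatBytes(segments);
}
```

Injecting `synthesize` keeps the orchestration testable without network access; a real implementation would pass a function that calls the TTS provider.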

Installation

npm install @hazeljs/pdf-to-audio @hazeljs/core @hazeljs/ai @hazeljs/queue @hazeljs/rag ioredis

Note: Redis is required for the job queue. Start Redis before using PDF-to-audio.

Environment

Variable	Description	Required
`OPENAI_API_KEY`	OpenAI API key for TTS	Yes
`REDIS_HOST`	Redis host (default: `localhost`)	For module/CLI
`REDIS_PORT`	Redis port (default: `6379`)	For module/CLI
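
One way to read these variables at startup is sketched below. The `readEnv` function and `PdfToAudioEnv` type are hypothetical and not exported by the package; the sketch only applies the defaults from the table and fails early when the API key is missing.

```typescript
// Hypothetical helper for illustration; not part of the @hazeljs/pdf-to-audio API.
interface PdfToAudioEnv {
  openaiApiKey: string;
  redis: { host: string; port: number };
}

function readEnv(env: Record<string, string | undefined>): PdfToAudioEnv {
  const openaiApiKey = env.OPENAI_API_KEY;
  if (!openaiApiKey) {
    throw new Error('OPENAI_API_KEY is required for TTS');
  }
  // Defaults follow the table above: localhost and 6379.
  const port = parseInt(env.REDIS_PORT || '6379', 10);
  if (isNaN(port) || port < 1 || port > 65535) {
    throw new Error(`REDIS_PORT is not a valid port: ${env.REDIS_PORT}`);
  }
  return {
    openaiApiKey,
    redis: { host: env.REDIS_HOST || 'localhost', port },
  };
}
```

In a Node.js process this would be called as `readEnv(process.env)`, and the resulting `redis` object has the same shape as the `connection` option shown in the module setup.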

Module (REST API)

import { HazelModule } from '@hazeljs/core';
import { PdfToAudioModule } from '@hazeljs/pdf-to-audio';

@HazelModule({
  imports: [
    PdfToAudioModule.forRoot({
      connection: {
        host: process.env.REDIS_HOST || 'localhost',
        port: parseInt(process.env.REDIS_PORT || '6379', 10),
      },
      outputDir: './data/pdf-to-audio', // optional, default: ./data/pdf-to-audio
    }),
  ],
})
export class AppModule {}

Endpoints

POST /api/pdf-to-audio/convert — Submit a PDF for conversion
- Content-Type: multipart/form-data
- Field: file (PDF file)
- Response: { jobId } (202)
- Optional fields: includeSummary, summaryOnly, voice, model, format
GET /api/pdf-to-audio/status/:jobId — Check job status
- Statuses: pending, processing, completed, failed
GET /api/pdf-to-audio/download/:jobId — Download the audio file (MP3 or Opus) once the job is completed

Example: Submit and Poll

# Submit job
curl -X POST -F "file=@report.pdf" http://localhost:3000/api/pdf-to-audio/convert
# => { "jobId": "abc123" }

# Check status
curl http://localhost:3000/api/pdf-to-audio/status/abc123

# Download when completed
curl -o audio.mp3 http://localhost:3000/api/pdf-to-audio/download/abc123
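
The same polling loop can be sketched in TypeScript. The status endpoint's JSON layout is not documented here, so the `{ status }` field is an assumption, as are the names `waitForJob` and `FetchLike` and the interval and attempt defaults. The fetch function is passed in so the sketch runs without a server; in Node.js 18+ the global `fetch` should fit this shape.

```typescript
// Statuses come from the endpoint list above; the JSON layout is an assumption.
type JobStatus = 'pending' | 'processing' | 'completed' | 'failed';

// Minimal subset of the fetch API that this sketch relies on.
interface FetchLike {
  (url: string): Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: resolves when the job completes, rejects if it fails
// or if it is still unfinished after maxAttempts status checks.
async function waitForJob(
  fetchFn: FetchLike,
  baseUrl: string,
  jobId: string,
  intervalMs = 2000,
  maxAttempts = 300,
): Promise<void> {
  const url = `${baseUrl}/api/pdf-to-audio/status/${encodeURIComponent(jobId)}`;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchFn(url);
    if (!res.ok) {
      throw new Error(`Status check failed with HTTP ${res.status}`);
    }
    const body = (await res.json()) as { status?: JobStatus };
    if (body.status === 'completed') {
      return;
    }
    if (body.status === 'failed') {
      throw new Error(`Job ${jobId} failed`);
    }
    // Still pending or processing: wait before the next check.
    await sleep(intervalMs);
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} checks`);
}
```

A caller would run `await waitForJob(fetch, 'http://localhost:3000', jobId)` and then request the download endpoint. Capping the number of attempts keeps a stuck job from blocking the caller indefinitely.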

Service (Programmatic)

Use PdfToAudioService when you need to convert PDFs directly in code without the REST API:

import { promises as fs } from 'fs';
import { PdfToAudioService } from '@hazeljs/pdf-to-audio';
import { OpenAIProvider } from '@hazeljs/ai';

const provider = new OpenAIProvider();
const service = new PdfToAudioService(provider);

// Read the PDF into memory, then convert it with the chosen voice, model, and format.
const pdfBuffer = await fs.readFile('document.pdf');
const audioBuffer = await service.convert(pdfBuffer, {
  voice: 'alloy',
  model: 'tts-1',
  format: 'mp3',
});

CLI

The CLI works with a running API server. It submits a job, optionally waits for completion, then saves the output.

# Submit and wait, save to output file
hazel pdf-to-audio convert document.pdf --api-url http://localhost:3000 --wait -o audio.mp3

# Submit only (returns job ID)
hazel pdf-to-audio convert document.pdf --api-url http://localhost:3000

# Check status and download when ready
hazel pdf-to-audio status <jobId> --api-url http://localhost:3000 -o audio.mp3

# Summary only (no full document)
hazel pdf-to-audio convert report.pdf --summary-only --wait -o summary.mp3

Options

Option	Description	Default
`voice`	TTS voice: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`	`alloy`
`model`	TTS model: `tts-1`, `tts-1-hd`	`tts-1`
`format`	Output format: `mp3`, `opus`	`mp3`
`includeSummary`	Include AI-generated document summary at the start	`true`
`summaryOnly`	Output only the summary—do not read the full document	`false`
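
The defaults in the table can be expressed as a typed options object. This is a sketch only: `ConvertOptions`, `DEFAULT_OPTIONS`, and `resolveOptions` are hypothetical names, and the package may export its own option types.

```typescript
// Hypothetical types mirroring the options table; not taken from the package.
type Voice = 'alloy' | 'echo' | 'fable' | 'onyx' | 'nova' | 'shimmer';

interface ConvertOptions {
  voice: Voice;
  model: 'tts-1' | 'tts-1-hd';
  format: 'mp3' | 'opus';
  includeSummary: boolean;
  summaryOnly: boolean;
}

// Default values as listed in the table above.
const DEFAULT_OPTIONS: ConvertOptions = {
  voice: 'alloy',
  model: 'tts-1',
  format: 'mp3',
  includeSummary: true,
  summaryOnly: false,
};

// Fill in any option the caller left out (or set to undefined) with its default.
function resolveOptions(overrides: Partial<ConvertOptions> = {}): ConvertOptions {
  return {
    voice: overrides.voice ?? DEFAULT_OPTIONS.voice,
    model: overrides.model ?? DEFAULT_OPTIONS.model,
    format: overrides.format ?? DEFAULT_OPTIONS.format,
    includeSummary: overrides.includeSummary ?? DEFAULT_OPTIONS.includeSummary,
    summaryOnly: overrides.summaryOnly ?? DEFAULT_OPTIONS.summaryOnly,
  };
}
```

With union types for `voice`, `model`, and `format`, a misspelled value such as `'mp4'` is rejected at compile time rather than at request time.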

Best Practices

Long documents: Use async jobs; conversions can take several minutes.
Physical storage: Set outputDir so files are written to disk for easier serving and archiving.
Summary-only: Use summaryOnly: true for quick 2–4 sentence overviews without full narration.

What's Next?

See the AI Package for LLM and TTS providers
Explore the RAG Package for document processing
Read the v0.2.0-beta.27 release blog for PDF-to-audio updates

API Reference

For the complete API, see the @hazeljs/pdf-to-audio package.