Document Store
Classes
Chunker
Hybrid chunking engine that splits documents into optimal chunks for RAG embeddings
Constructor
constructor(options: ChunkOptions)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| options | ChunkOptions | Yes | |
Methods
chunkDocument
Main entry point. Dispatches to the appropriate chunking strategy based on the content type.
chunkDocument(documentId: string, content: ExtractedContent): ChunkedDocument
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| documentId | string | Yes | |
| content | ExtractedContent | Yes | |
Returns:
ChunkedDocument
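The dispatch described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `SketchContent`, `SketchOptions`, `packUnits`, and `chunkSketch` are hypothetical names, the two content types are assumed, and the real `ChunkOptions` and `ExtractedContent` shapes are not shown in this reference.

```typescript
// Hypothetical stand-ins for ExtractedContent and ChunkOptions.
type SketchContentType = "code" | "prose";

interface SketchContent {
  text: string;
  contentType: SketchContentType;
}

interface SketchOptions {
  maxChars: number; // target upper bound on chunk size, in characters
}

// Greedily pack whole units (lines or paragraphs) into chunks.
// A single unit longer than maxChars is kept whole rather than split.
function packUnits(units: string[], separator: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const unit of units) {
    const candidate = current === "" ? unit : current + separator + unit;
    if (candidate.length > maxChars && current !== "") {
      chunks.push(current);
      current = unit;
    } else {
      current = candidate;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

// Dispatch on content type: code is packed by line, prose by paragraph.
function chunkSketch(content: SketchContent, options: SketchOptions): string[] {
  if (content.contentType === "code") {
    return packUnits(content.text.split("\n"), "\n", options.maxChars);
  }
  return packUnits(content.text.split(/\n{2,}/), "\n\n", options.maxChars);
}
```

Packing whole lines or paragraphs keeps each chunk readable on its own, at the cost of chunks that vary in size.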
ContentExtractor
Extracts text, structure, and metadata from document buffers
Methods
extractText
Extract text content from a buffer
extractText(buffer: Buffer<ArrayBufferLike>, mimeType: string, language?: string | undefined): ExtractedContent
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| buffer | Buffer<ArrayBufferLike> | Yes | The document buffer |
| mimeType | string | Yes | MIME type of the document |
| language | string \| undefined | No | Optional language/type identifier |
Returns:
ExtractedContent - Extracted content with text, structure, and metadata
FileDetector
Methods
detectLanguage
Detect the programming language or file type from filename and MIME type
detectLanguage(filename: string, mimeType: string): string
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | |
| mimeType | string | Yes | |
Returns:
string
detectContentType
Classify the content type of a file based on filename and detected language
detectContentType(filename: string, language: string): ContentType
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | |
| language | string | Yes | |
Returns:
ContentType
isTextFile
Determine if a file is text-based from its MIME type
isTextFile(mimeType: string): boolean
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| mimeType | string | Yes | |
Returns:
boolean
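A MIME-type check of this kind might look like the sketch below. The function name `isTextFileSketch` and the allowlist of `application/*` types are assumptions made for illustration; the actual rules used by `isTextFile` are not documented here.

```typescript
// Assumed allowlist of non-"text/*" types that still carry text; the real list may differ.
const TEXTUAL_APPLICATION_TYPES = new Set<string>([
  "application/json",
  "application/xml",
  "application/javascript",
  "application/x-yaml",
]);

function isTextFileSketch(mimeType: string): boolean {
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const base = (mimeType.split(";")[0] ?? "").trim().toLowerCase();
  if (base.startsWith("text/")) return true;
  // Structured-syntax suffixes such as "+json" and "+xml" indicate a text-based format.
  if (base.endsWith("+json") || base.endsWith("+xml")) return true;
  return TEXTUAL_APPLICATION_TYPES.has(base);
}
```

Stripping parameters first means a value such as `text/plain; charset=utf-8` is classified the same way as `text/plain`.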
RAGQueue
Queue for managing RAG sync jobs with BullMQ and Redis
Constructor
constructor(config: RAGQueueConfig)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| config | RAGQueueConfig | Yes | |
Methods
enqueue
Add a job to the queue
enqueue(job: RAGSyncJob, options?: EnqueueOptions | undefined): Promise<string>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| job | RAGSyncJob | Yes | The RAG sync job data |
| options | EnqueueOptions \| undefined | No | Optional job options (priority, delay, attempts) |
Returns:
Promise<string> - Job ID
getStats
Get queue statistics
getStats(): Promise<QueueStats>
Returns:
Promise<QueueStats> - Current queue statistics
onCompleted
Register a callback for when jobs complete successfully
onCompleted(callback: (jobId: string, result: RAGSyncResult) => void | Promise<void>): void
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| callback | (jobId: string, result: RAGSyncResult) => void \| Promise<void> | Yes | Function to call when a job completes |
onFailed
Register a callback for when jobs fail
onFailed(callback: (jobId: string, error: Error) => void | Promise<void>): void
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| callback | (jobId: string, error: Error) => void \| Promise<void> | Yes | Function to call when a job fails |
close
Close the queue and all connections gracefully
close(): Promise<void>
Returns:
Promise<void>
EmbeddingService
Service for generating text embeddings using Ollama
Constructor
constructor(config: EmbeddingServiceConfig)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| config | EmbeddingServiceConfig | Yes | |
Methods
embed
Generate embedding for a single text
embed(text: string): Promise<number[]>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | |
Returns:
Promise<number[]>
embedBatch
Generate embeddings for multiple texts in a batch
embedBatch(texts: string[]): Promise<number[][]>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| texts | string[] | Yes | |
Returns:
Promise<number[][]>
S3Adapter
Constructor
constructor(config: S3Config)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| config | S3Config | Yes | |
Methods
generateUploadUrl
generateUploadUrl(params: { key: string; contentType: string; }): Promise<PresignedUploadUrl>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| params | { key: string; contentType: string; } | Yes | |
Returns:
Promise<PresignedUploadUrl>
generateDownloadUrl
generateDownloadUrl(params: { key: string; }): Promise<PresignedDownloadUrl>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| params | { key: string; } | Yes | |
Returns:
Promise<PresignedDownloadUrl>
deleteObject
deleteObject(params: { key: string; }): Promise<void>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| params | { key: string; } | Yes | |
Returns:
Promise<void>
downloadObject
downloadObject(params: { key: string; }): Promise<Buffer<ArrayBufferLike>>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| params | { key: string; } | Yes | |
Returns:
Promise<Buffer<ArrayBufferLike>>
BatchAccumulator
Accumulates file changes and emits batches after debounce timeout
Constructor
constructor(config?: BatchAccumulatorConfig)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| config | BatchAccumulatorConfig | No | |
Methods
add
Add a file change to the accumulator
add(change: GitFileChange): void
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| change | GitFileChange | Yes | |
addMultiple
Add multiple changes at once
addMultiple(changes: GitFileChange[]): void
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| changes | GitFileChange[] | Yes | |
getPending
Get pending changes without emitting batch
getPending(): GitFileChange[]
Returns:
GitFileChange[]
flush
Manually flush pending changes as a batch
flush(): Promise<GitWatcherBatch | null>
Returns:
Promise<GitWatcherBatch | null>
stop
Stop the accumulator and clear timers
stop(): void
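The debounce-and-batch behavior can be sketched roughly as below. `ChangeSketch` and `AccumulatorSketch` are hypothetical stand-ins; the real `GitFileChange` and `GitWatcherBatch` types are not shown in this reference. Deduplicating by path and returning `null` from an empty flush are assumptions, the latter suggested only by the `GitWatcherBatch | null` return type.

```typescript
// Hypothetical shape; the real GitFileChange likely carries more fields.
interface ChangeSketch {
  path: string;
  kind: "added" | "modified" | "deleted";
}

class AccumulatorSketch {
  private pending = new Map<string, ChangeSketch>();
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly debounceMs: number;
  private readonly onBatch: (batch: ChangeSketch[]) => void;

  constructor(debounceMs: number, onBatch: (batch: ChangeSketch[]) => void) {
    this.debounceMs = debounceMs;
    this.onBatch = onBatch;
  }

  add(change: ChangeSketch): void {
    // Assumption: a later change to the same path replaces the earlier one.
    this.pending.set(change.path, change);
    // Restart the debounce window on every change.
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = setTimeout(() => {
      void this.flush();
    }, this.debounceMs);
  }

  getPending(): ChangeSketch[] {
    return Array.from(this.pending.values());
  }

  async flush(): Promise<ChangeSketch[] | null> {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (this.pending.size === 0) return null;
    const batch = this.getPending();
    this.pending.clear();
    this.onBatch(batch);
    return batch;
  }

  stop(): void {
    // Only the timer is cleared; pending changes are left in place.
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
  }
}
```

Restarting the timer on each `add` means a burst of edits produces one batch once the repository has been quiet for `debounceMs` milliseconds.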
FileFilter
Filters files based on include/exclude patterns and file type
Constructor
constructor(config?: FileFilterConfig)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| config | FileFilterConfig | No | |
Methods
shouldInclude
Check if a file should be included
shouldInclude(filePath: string): boolean
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| filePath | string | Yes | |
Returns:
boolean
getStats
Get filter statistics
getStats(): { totalChecked: number; included: number; excluded: number; }
Returns:
{ totalChecked: number; included: number; excluded: number; }
resetStats
Reset statistics
resetStats(): void
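As an illustration of include/exclude filtering with statistics, the sketch below uses a reduced glob dialect (`**` and `*` only). `globToRegExp` and `FileFilterSketch` are hypothetical names, the rule that an exclude match always wins is an assumption, and the file-type check mentioned above is left out.

```typescript
// Convert a simple glob to a RegExp. "**/" matches zero or more directories,
// "**" matches anything, "*" matches anything except "/". Illustration only.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    if (glob.startsWith("**/", i)) {
      source += "(?:.*/)?";
      i += 2;
    } else if (glob.startsWith("**", i)) {
      source += ".*";
      i += 1;
    } else if (glob.charAt(i) === "*") {
      source += "[^/]*";
    } else {
      // Escape regex metacharacters in literal text.
      source += glob.charAt(i).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp("^" + source + "$");
}

class FileFilterSketch {
  private readonly include: RegExp[];
  private readonly exclude: RegExp[];
  private stats = { totalChecked: 0, included: 0, excluded: 0 };

  constructor(include: string[], exclude: string[]) {
    this.include = include.map((g) => globToRegExp(g));
    this.exclude = exclude.map((g) => globToRegExp(g));
  }

  shouldInclude(filePath: string): boolean {
    this.stats.totalChecked++;
    // Assumed precedence: an exclude match always wins over an include match.
    const ok =
      !this.exclude.some((re) => re.test(filePath)) &&
      this.include.some((re) => re.test(filePath));
    if (ok) this.stats.included++;
    else this.stats.excluded++;
    return ok;
  }

  getStats(): { totalChecked: number; included: number; excluded: number } {
    return { ...this.stats };
  }

  resetStats(): void {
    this.stats = { totalChecked: 0, included: 0, excluded: 0 };
  }
}
```

Returning a copy from `getStats` keeps callers from mutating the filter's internal counters.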
GitMetadataExtractor
Extracts git metadata for files in a repository
Methods
isValidRepo
Check if a path is a valid git repository
isValidRepo(repoPath: string): Promise<boolean>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| repoPath | string | Yes | |
Returns:
Promise<boolean>
getCurrentBranch
Get current branch name
getCurrentBranch(repoPath: string): Promise<string>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| repoPath | string | Yes | |
Returns:
Promise<string>
getCommitHash
Get commit hash for a specific file
getCommitHash(repoPath: string, filePath: string): Promise<string>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| repoPath | string | Yes | |
| filePath | string | Yes | |
Returns:
Promise<string>
getFileMetadata
Get git metadata for a file (commit hash, author, timestamp)
getFileMetadata(repoPath: string, filePath: string): Promise<{ commitHash: string; author: string; committedAt: string; branch: string; }>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| repoPath | string | Yes | |
| filePath | string | Yes | |
Returns:
Promise<{ commitHash: string; author: string; committedAt: string; branch: string; }>
GitWatcher
Monitors a repository for file changes and batches them for RAG processing
Constructor
constructor(config: GitWatcherConfig)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| config | GitWatcherConfig | Yes | |
Methods
start
Start watching the repository
start(): Promise<void>
Returns:
Promise<void>
pause
Pause watching
pause(): Promise<void>
Returns:
Promise<void>
resume
Resume watching
resume(): Promise<void>
Returns:
Promise<void>
stop
Stop watching
stop(): Promise<void>
Returns:
Promise<void>
getStatus
Get current watcher status
getStatus(): GitWatcherStatus
Returns:
GitWatcherStatus
manualSync
Manually trigger sync (bypass debounce)
manualSync(): Promise<GitWatcherBatch | null>
Returns:
Promise<GitWatcherBatch | null>
WatcherManager
Manages multiple git watchers per workspace
Constructor
constructor(maxWatchersPerWorkspace?: number | undefined)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| maxWatchersPerWorkspace | number \| undefined | No | |
Methods
register
Register a new git watcher
register(config: GitWatcherConfig): Promise<string>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| config | GitWatcherConfig | Yes | |
Returns:
Promise<string>
get
Get watcher by ID
get(watcherId: string): GitWatcher | undefined
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| watcherId | string | Yes | |
Returns:
GitWatcher | undefined
listForWorkspace
List all watchers for workspace
listForWorkspace(workspaceId: string): GitWatcherStatus[]
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| workspaceId | string | Yes | |
Returns:
GitWatcherStatus[]
pause
Pause watcher
pause(watcherId: string): Promise<void>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| watcherId | string | Yes | |
Returns:
Promise<void>
resume
Resume watcher
resume(watcherId: string): Promise<void>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| watcherId | string | Yes | |
Returns:
Promise<void>
unregister
Unregister watcher
unregister(watcherId: string): Promise<void>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| watcherId | string | Yes | |
Returns:
Promise<void>
manualSync
Manually trigger sync for watcher
manualSync(watcherId: string): Promise<GitWatcherBatch | null>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| watcherId | string | Yes | |
Returns:
Promise<GitWatcherBatch | null>
onBatch
Register batch handler
onBatch(handler: (batch: GitWatcherBatch) => Promise<void>): void
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| handler | (batch: GitWatcherBatch) => Promise<void> | Yes | |
stopAll
Stop all watchers
stopAll(): Promise<void>
Returns:
Promise<void>
RAGWorker
BullMQ worker that processes RAG sync jobs
Processing pipeline:
- Download file from S3
- Extract text content
- Chunk content
- Generate embeddings
- Insert to SurrealDB
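The five steps above can be sketched as one function over injected dependencies. `PipelineDeps`, `RecordSketch`, and `processJobSketch` are hypothetical names chosen for this illustration; the real worker wires in its own S3, extraction, chunking, embedding, and SurrealDB components, and their exact interfaces are assumptions here.

```typescript
// Hypothetical record shape; the real VectorRecord type is not shown in this reference.
interface RecordSketch {
  documentId: string;
  chunkIndex: number;
  text: string;
  vector: number[];
}

// Each member stands in for one pipeline stage.
interface PipelineDeps {
  download(key: string): Promise<Uint8Array>;
  extract(bytes: Uint8Array): string;
  chunk(text: string): string[];
  embed(chunks: string[]): Promise<number[][]>;
  insert(records: RecordSketch[]): Promise<number>;
}

async function processJobSketch(
  documentId: string,
  key: string,
  deps: PipelineDeps,
): Promise<number> {
  const bytes = await deps.download(key); // 1. download file
  const text = deps.extract(bytes); // 2. extract text content
  const chunks = deps.chunk(text); // 3. chunk content
  const vectors = await deps.embed(chunks); // 4. generate embeddings
  if (vectors.length !== chunks.length) {
    throw new Error("embedding count does not match chunk count");
  }
  const records: RecordSketch[] = chunks.map((chunkText, chunkIndex) => ({
    documentId,
    chunkIndex,
    text: chunkText,
    vector: vectors[chunkIndex] ?? [],
  }));
  return deps.insert(records); // 5. insert into the vector store
}
```

Passing the stages in as dependencies makes the ordering easy to test with in-memory stubs, without S3, Ollama, or SurrealDB running.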
Constructor
constructor(config: RAGWorkerConfig)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| config | RAGWorkerConfig | Yes | |
Methods
initialize
Initialize connections (must be called before processing)
initialize(): Promise<void>
Returns:
Promise<void>
stop
Gracefully stop the worker and close connections
stop(): Promise<void>
Returns:
Promise<void>
SurrealAdapter
Adapter for interacting with SurrealDB for vector operations
Constructor
constructor(config: SurrealConfig)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| config | SurrealConfig | Yes | |
Methods
connect
Establish connection to SurrealDB and initialize schema
connect(): Promise<void>
Returns:
Promise<void>
disconnect
Close connection to SurrealDB
disconnect(): Promise<void>
Returns:
Promise<void>
isConnected
Check if connected to database
isConnected(): boolean
Returns:
boolean
insertChunks
Insert multiple vector chunks into SurrealDB
insertChunks(records: VectorRecord[]): Promise<number>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| records | VectorRecord[] | Yes | |
Returns:
Promise<number>
deleteByDocumentId
Delete all vectors associated with a document
deleteByDocumentId(documentId: string): Promise<number>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| documentId | string | Yes | |
Returns:
Promise<number>
vectorSearch
Vector similarity search
vectorSearch(options: { queryVector: number[]; workspaceId?: string; limit?: number; minScore?: number; }): Promise<{ documentId: string; chunkIndex: number; text: string; score: number; metadata?: Record<string, unknown>; }[]>
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| options | { queryVector: number[]; workspaceId?: string; limit?: number; minScore?: number; } | Yes | |
Returns:
Promise<{ documentId: string; chunkIndex: number; text: string; score: number; metadata?: Record<string, unknown>; }[]>
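To show how the `workspaceId`, `limit`, and `minScore` options could interact, here is an in-memory sketch. It assumes cosine similarity as the score and defaults of `limit = 10` and `minScore = 0`; neither the metric nor the defaults are stated in this reference, and `StoredChunk`, `cosineSimilarity`, and `vectorSearchSketch` are hypothetical names. The real method runs the search inside SurrealDB.

```typescript
// Hypothetical in-memory record standing in for rows stored in the database.
interface StoredChunk {
  documentId: string;
  chunkIndex: number;
  text: string;
  vector: number[];
  workspaceId?: string;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

function vectorSearchSketch(
  store: StoredChunk[],
  options: { queryVector: number[]; workspaceId?: string; limit?: number; minScore?: number },
): { documentId: string; chunkIndex: number; text: string; score: number }[] {
  // Assumed defaults; the real defaults are not documented here.
  const { queryVector, workspaceId, limit = 10, minScore = 0 } = options;
  return store
    .filter((c) => workspaceId === undefined || c.workspaceId === workspaceId)
    .map((c) => ({
      documentId: c.documentId,
      chunkIndex: c.chunkIndex,
      text: c.text,
      score: cosineSimilarity(queryVector, c.vector),
    }))
    .filter((hit) => hit.score >= minScore)
    .sort((x, y) => y.score - x.score) // highest score first
    .slice(0, limit);
}
```

Filtering by workspace before scoring avoids computing similarities for chunks the caller cannot see, and applying `minScore` before `limit` ensures the limit counts only qualifying hits.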