Document Store

Classes

Chunker

Hybrid chunking engine that splits documents into optimal chunks for RAG embeddings

Constructor

constructor(options: ChunkOptions)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| options | ChunkOptions | Yes | |

Methods

chunkDocument

Main entry point. Dispatches to the appropriate chunking strategy based on content type.

chunkDocument(documentId: string, content: ExtractedContent): ChunkedDocument

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| documentId | string | Yes | |
| content | ExtractedContent | Yes | |

Returns:

ChunkedDocument
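
The dispatch that chunkDocument performs can be pictured with the sketch below. It is not the library's implementation: the `SketchContent` and `SketchChunkedDocument` shapes, the two content type values, and both strategies are hypothetical stand-ins, since the real ChunkOptions, ExtractedContent, and ChunkedDocument fields are not listed in this section.

```typescript
// Hypothetical stand-ins; the real ExtractedContent and ChunkedDocument
// types are documented elsewhere and may look different.
type SketchContentType = "code" | "prose";

interface SketchContent {
  text: string;
  contentType: SketchContentType;
}

interface SketchChunkedDocument {
  documentId: string;
  chunks: string[];
}

// Strategy for prose: one chunk per paragraph (split on blank lines).
function chunkProse(text: string): string[] {
  return text
    .split(/\n\s*\n/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0);
}

// Strategy for code: fixed-size windows of lines.
function chunkCode(text: string, linesPerChunk: number): string[] {
  const lines = text.split("\n");
  const chunks: string[] = [];
  for (let i = 0; i < lines.length; i += linesPerChunk) {
    chunks.push(lines.slice(i, i + linesPerChunk).join("\n"));
  }
  return chunks;
}

// Dispatch on content type, mirroring the role chunkDocument plays.
function chunkDocumentSketch(documentId: string, content: SketchContent): SketchChunkedDocument {
  const chunks =
    content.contentType === "code" ? chunkCode(content.text, 2) : chunkProse(content.text);
  return { documentId, chunks };
}
```

The point of the sketch is only the shape of the dispatch: one entry point, one strategy per content type, one result type.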

ContentExtractor

Extracts text, structure, and metadata from document buffers

Methods

extractText

Extract text content from a buffer

extractText(buffer: Buffer<ArrayBufferLike>, mimeType: string, language?: string | undefined): ExtractedContent

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| buffer | Buffer<ArrayBufferLike> | Yes | The document buffer |
| mimeType | string | Yes | MIME type of the document |
| language | string \| undefined | No | Optional language/type identifier |

Returns:

ExtractedContent - Extracted content with text, structure, and metadata

FileDetector

Methods

detectLanguage

Detect the programming language or file type from filename and MIME type

detectLanguage(filename: string, mimeType: string): string

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| filename | string | Yes | |
| mimeType | string | Yes | |

Returns:

string

detectContentType

Classify the content type of a file based on filename and detected language

detectContentType(filename: string, language: string): ContentType

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| filename | string | Yes | |
| language | string | Yes | |

Returns:

ContentType

isTextFile

Determine if a file is text-based from its MIME type

isTextFile(mimeType: string): boolean

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| mimeType | string | Yes | |

Returns:

boolean
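
One way the three methods might be combined is shown below: skip non-text files first, then detect the language, then classify. The `FileDetectorLike` interface only restates the documented signatures (with ContentType narrowed to `string`, because its members are not listed here). `stubDetector` and `classify` are hypothetical and exist only so that the example runs on its own; the stub's rules are not the library's detection logic.

```typescript
// Structural view of the three documented FileDetector methods.
interface FileDetectorLike {
  detectLanguage(filename: string, mimeType: string): string;
  detectContentType(filename: string, language: string): string;
  isTextFile(mimeType: string): boolean;
}

// Hypothetical stub with made-up rules, used in place of a real FileDetector.
const stubDetector: FileDetectorLike = {
  detectLanguage: (filename, mimeType) =>
    filename.endsWith(".ts") ? "typescript" : mimeType === "text/markdown" ? "markdown" : "unknown",
  detectContentType: (_filename, language) => (language === "typescript" ? "code" : "document"),
  isTextFile: (mimeType) => mimeType.startsWith("text/") || mimeType === "application/json",
};

// Classify a file, returning null for binary content before any language detection.
function classify(
  detector: FileDetectorLike,
  filename: string,
  mimeType: string,
): { language: string; contentType: string } | null {
  if (!detector.isTextFile(mimeType)) {
    return null;
  }
  const language = detector.detectLanguage(filename, mimeType);
  return { language, contentType: detector.detectContentType(filename, language) };
}
```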

RAGQueue

Queue for managing RAG sync jobs with BullMQ and Redis

Constructor

constructor(config: RAGQueueConfig)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| config | RAGQueueConfig | Yes | |

Methods

enqueue

Add a job to the queue

enqueue(job: RAGSyncJob, options?: EnqueueOptions | undefined): Promise<string>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| job | RAGSyncJob | Yes | The RAG sync job data |
| options | EnqueueOptions \| undefined | No | Optional job options (priority, delay, attempts) |

Returns:

Promise<string> - Job ID

getStats

Get queue statistics

getStats(): Promise<QueueStats>

Returns:

Promise<QueueStats> - Current queue statistics

onCompleted

Register a callback for when jobs complete successfully

onCompleted(callback: (jobId: string, result: RAGSyncResult) => void | Promise<void>): void

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| callback | (jobId: string, result: RAGSyncResult) => void \| Promise<void> | Yes | Function to call when a job completes |

onFailed

Register a callback for when jobs fail

onFailed(callback: (jobId: string, error: Error) => void | Promise<void>): void

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| callback | (jobId: string, error: Error) => void \| Promise<void> | Yes | Function to call when a job fails |

close

Close the queue and all connections gracefully

close(): Promise<void>

Returns:

Promise<void>
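
A typical call order is to register the completion and failure callbacks, enqueue, and close on shutdown. The sketch below shows that order against a structural interface that restates the documented method signatures. `TJob` and `TResult` stand in for RAGSyncJob and RAGSyncResult, whose fields are not shown in this section. `FakeQueue` and `runOnce` are hypothetical: the fake completes every job immediately in memory and involves neither BullMQ nor Redis.

```typescript
type CompletedCallback<TResult> = (jobId: string, result: TResult) => void | Promise<void>;
type FailedCallback = (jobId: string, error: Error) => void | Promise<void>;

// Structural view of the documented RAGQueue methods used below.
interface RAGQueueLike<TJob, TResult> {
  enqueue(job: TJob): Promise<string>;
  onCompleted(callback: CompletedCallback<TResult>): void;
  onFailed(callback: FailedCallback): void;
  close(): Promise<void>;
}

// Hypothetical in-memory fake so that the example runs on its own.
class FakeQueue<TJob> implements RAGQueueLike<TJob, { ok: boolean }> {
  private completed: CompletedCallback<{ ok: boolean }>[] = [];
  private nextId = 1;

  async enqueue(_job: TJob): Promise<string> {
    const jobId = String(this.nextId++);
    // Pretend the job finished at once and notify the listeners.
    for (const callback of this.completed) {
      await callback(jobId, { ok: true });
    }
    return jobId;
  }

  onCompleted(callback: CompletedCallback<{ ok: boolean }>): void {
    this.completed.push(callback);
  }

  onFailed(_callback: FailedCallback): void {
    // The fake never fails, so the callback is not stored.
  }

  async close(): Promise<void> {
    this.completed = [];
  }
}

// Register handlers first, then enqueue, then close.
async function runOnce<TJob>(queue: RAGQueueLike<TJob, { ok: boolean }>, job: TJob): Promise<string[]> {
  const log: string[] = [];
  queue.onCompleted((jobId, result) => {
    log.push(`completed ${jobId} ok=${result.ok}`);
  });
  queue.onFailed((jobId, error) => {
    log.push(`failed ${jobId}: ${error.message}`);
  });
  const jobId = await queue.enqueue(job);
  log.push(`enqueued ${jobId}`);
  await queue.close();
  return log;
}
```

Registering callbacks before the first enqueue avoids missing events for jobs that finish quickly.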

EmbeddingService

Service for generating text embeddings using Ollama

Constructor

constructor(config: EmbeddingServiceConfig)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| config | EmbeddingServiceConfig | Yes | |

Methods

embed

Generate embedding for a single text

embed(text: string): Promise<number[]>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | |

Returns:

Promise<number[]>

embedBatch

Generate embeddings for multiple texts in a batch

embedBatch(texts: string[]): Promise<number[][]>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| texts | string[] | Yes | |

Returns:

Promise<number[][]>
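
When a document produces many chunks, a caller may want to bound the size of each embedBatch call. The helper below is a sketch of that pattern, not part of the documented API. `EmbedderLike` restates the embedBatch signature. `embedAll`, its `batchSize` parameter, and `fakeEmbedder` (which returns made-up two-dimensional vectors instead of calling Ollama) are hypothetical.

```typescript
// Structural view of the documented embedBatch method.
interface EmbedderLike {
  embedBatch(texts: string[]): Promise<number[][]>;
}

// Hypothetical fake: maps each text to [length, 1]. Not a real embedding model.
const fakeEmbedder: EmbedderLike = {
  embedBatch: async (texts) => texts.map((text) => [text.length, 1]),
};

// Embed many texts in slices of `batchSize`, preserving input order.
async function embedAll(embedder: EmbedderLike, texts: string[], batchSize: number): Promise<number[][]> {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const vectors: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = await embedder.embedBatch(texts.slice(i, i + batchSize));
    vectors.push(...batch);
  }
  return vectors;
}
```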

S3Adapter

Constructor

constructor(config: S3Config)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| config | S3Config | Yes | |

Methods

generateUploadUrl

generateUploadUrl(params: { key: string; contentType: string; }): Promise<PresignedUploadUrl>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| params | { key: string; contentType: string; } | Yes | |

Returns:

Promise<PresignedUploadUrl>

generateDownloadUrl

generateDownloadUrl(params: { key: string; }): Promise<PresignedDownloadUrl>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| params | { key: string; } | Yes | |

Returns:

Promise<PresignedDownloadUrl>

deleteObject

deleteObject(params: { key: string; }): Promise<void>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| params | { key: string; } | Yes | |

Returns:

Promise<void>

downloadObject

downloadObject(params: { key: string; }): Promise<Buffer<ArrayBufferLike>>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| params | { key: string; } | Yes | |

Returns:

Promise<Buffer<ArrayBufferLike>>

BatchAccumulator

Accumulates file changes and emits batches after debounce timeout

Constructor

constructor(config?: BatchAccumulatorConfig)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| config | BatchAccumulatorConfig | No | |

Methods

add

Add a file change to the accumulator

add(change: GitFileChange): void

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| change | GitFileChange | Yes | |

addMultiple

Add multiple changes at once

addMultiple(changes: GitFileChange[]): void

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| changes | GitFileChange[] | Yes | |

getPending

Get pending changes without emitting batch

getPending(): GitFileChange[]

Returns:

GitFileChange[]

flush

Manually flush pending changes as a batch

flush(): Promise<GitWatcherBatch | null>

Returns:

Promise<GitWatcherBatch | null>

stop

Stop the accumulator and clear timers

stop(): void
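
The debounce behaviour described for this class can be illustrated with a small stand-alone accumulator. This is a sketch under assumptions, not the library's code: `SketchChange` stands in for GitFileChange, the constructor takes a delay and a callback directly instead of a BatchAccumulatorConfig, and `flush` is synchronous here whereas the documented method returns a Promise.

```typescript
// Hypothetical stand-in for GitFileChange; the real type is documented elsewhere.
interface SketchChange {
  path: string;
  kind: "added" | "modified" | "deleted";
}

class DebouncedAccumulator {
  private pending: SketchChange[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly debounceMs: number;
  private readonly onBatch: (batch: SketchChange[]) => void;

  constructor(debounceMs: number, onBatch: (batch: SketchChange[]) => void) {
    this.debounceMs = debounceMs;
    this.onBatch = onBatch;
  }

  // Each add restarts the timer, so a batch is emitted only after a quiet period.
  add(change: SketchChange): void {
    this.pending.push(change);
    this.clearTimer();
    this.timer = setTimeout(() => {
      this.flush();
    }, this.debounceMs);
  }

  // Copy of the pending changes; does not emit anything.
  getPending(): SketchChange[] {
    return [...this.pending];
  }

  // Emit whatever is pending now; returns null when there is nothing to emit.
  flush(): SketchChange[] | null {
    this.clearTimer();
    if (this.pending.length === 0) {
      return null;
    }
    const batch = this.pending;
    this.pending = [];
    this.onBatch(batch);
    return batch;
  }

  // Cancel any scheduled emission.
  stop(): void {
    this.clearTimer();
  }

  private clearTimer(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
  }
}
```

Restarting the timer on every add is what turns a burst of file events into a single batch.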

FileFilter

Filters files based on include/exclude patterns and file type

Constructor

constructor(config?: FileFilterConfig)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| config | FileFilterConfig | No | |

Methods

shouldInclude

Check if a file should be included

shouldInclude(filePath: string): boolean

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| filePath | string | Yes | |

Returns:

boolean

getStats

Get filter statistics

getStats(): { totalChecked: number; included: number; excluded: number; }

Returns:

{ totalChecked: number; included: number; excluded: number; }

resetStats

Reset statistics

resetStats(): void
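
How include/exclude filtering and the statistics counters fit together can be sketched as follows. The pattern syntax accepted by FileFilterConfig is not described in this section, so the sketch uses plain regular expressions. `PatternFilter`, its constructor arguments, and the rule that exclusions take precedence are assumptions made for illustration.

```typescript
class PatternFilter {
  private readonly include: RegExp[];
  private readonly exclude: RegExp[];
  private stats = { totalChecked: 0, included: 0, excluded: 0 };

  // Patterns should not use the `g` flag, because RegExp.test is stateful with it.
  constructor(include: RegExp[], exclude: RegExp[]) {
    this.include = include;
    this.exclude = exclude;
  }

  // Exclusions win; an empty include list admits everything that is not excluded.
  shouldInclude(filePath: string): boolean {
    this.stats.totalChecked += 1;
    const isExcluded = this.exclude.some((pattern) => pattern.test(filePath));
    const isIncluded = this.include.length === 0 || this.include.some((pattern) => pattern.test(filePath));
    const accepted = !isExcluded && isIncluded;
    if (accepted) {
      this.stats.included += 1;
    } else {
      this.stats.excluded += 1;
    }
    return accepted;
  }

  // Return a copy so that callers cannot mutate the counters.
  getStats(): { totalChecked: number; included: number; excluded: number } {
    return { ...this.stats };
  }

  resetStats(): void {
    this.stats = { totalChecked: 0, included: 0, excluded: 0 };
  }
}
```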

GitMetadataExtractor

Extracts git metadata for files in a repository

Methods

isValidRepo

Check if a path is a valid git repository

isValidRepo(repoPath: string): Promise<boolean>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| repoPath | string | Yes | |

Returns:

Promise<boolean>

getCurrentBranch

Get current branch name

getCurrentBranch(repoPath: string): Promise<string>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| repoPath | string | Yes | |

Returns:

Promise<string>

getCommitHash

Get commit hash for a specific file

getCommitHash(repoPath: string, filePath: string): Promise<string>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| repoPath | string | Yes | |
| filePath | string | Yes | |

Returns:

Promise<string>

getFileMetadata

Get git metadata for a file (commit hash, author, timestamp)

getFileMetadata(repoPath: string, filePath: string): Promise<{ commitHash: string; author: string; committedAt: string; branch: string; }>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| repoPath | string | Yes | |
| filePath | string | Yes | |

Returns:

Promise<{ commitHash: string; author: string; committedAt: string; branch: string; }>
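
This section does not say how the metadata is obtained. One plausible approach, sketched below purely as an assumption, is to run `git log` and `git rev-parse` in the repository and parse their output. The helper names (`gitLogArgs`, `gitBranchArgs`, `parseFileMetadata`) are hypothetical. The git format placeholders are standard: `%H` is the commit hash, `%an` the author name, `%cI` the committer date in strict ISO 8601, and `%x09` a tab. The sketch only builds the argument lists and parses output; it does not spawn git.

```typescript
// Arguments for `git log`: newest commit touching one file, fields separated by tabs.
function gitLogArgs(filePath: string): string[] {
  return ["log", "-1", "--format=%H%x09%an%x09%cI", "--", filePath];
}

// Arguments for the current branch name (git prints "HEAD" when HEAD is detached).
const gitBranchArgs: string[] = ["rev-parse", "--abbrev-ref", "HEAD"];

// Same field names as the documented getFileMetadata result.
interface SketchFileMetadata {
  commitHash: string;
  author: string;
  committedAt: string;
  branch: string;
}

// Parse the stdout of the two commands above into one metadata object.
function parseFileMetadata(logOutput: string, branchOutput: string): SketchFileMetadata {
  const [commitHash, author, committedAt] = logOutput.trim().split("\t");
  if (!commitHash || !author || !committedAt) {
    throw new Error("unexpected git log output");
  }
  return { commitHash, author, committedAt, branch: branchOutput.trim() };
}
```

Using `--` before the path keeps git from interpreting a file name as a revision or option.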

GitWatcher

Monitors a repository for file changes and batches them for RAG processing

Constructor

constructor(config: GitWatcherConfig)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| config | GitWatcherConfig | Yes | |

Methods

start

Start watching the repository

start(): Promise<void>

Returns:

Promise<void>

pause

Pause watching

pause(): Promise<void>

Returns:

Promise<void>

resume

Resume watching

resume(): Promise<void>

Returns:

Promise<void>

stop

Stop watching

stop(): Promise<void>

Returns:

Promise<void>

getStatus

Get current watcher status

getStatus(): GitWatcherStatus

Returns:

GitWatcherStatus

manualSync

Manually trigger sync (bypass debounce)

manualSync(): Promise<GitWatcherBatch | null>

Returns:

Promise<GitWatcherBatch | null>

WatcherManager

Manages multiple git watchers per workspace

Constructor

constructor(maxWatchersPerWorkspace?: number | undefined)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| maxWatchersPerWorkspace | number \| undefined | No | |

Methods

register

Register a new git watcher

register(config: GitWatcherConfig): Promise<string>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| config | GitWatcherConfig | Yes | |

Returns:

Promise<string>

get

Get watcher by ID

get(watcherId: string): GitWatcher | undefined

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| watcherId | string | Yes | |

Returns:

GitWatcher | undefined

listForWorkspace

List all watchers for workspace

listForWorkspace(workspaceId: string): GitWatcherStatus[]

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| workspaceId | string | Yes | |

Returns:

GitWatcherStatus[]

pause

Pause watcher

pause(watcherId: string): Promise<void>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| watcherId | string | Yes | |

Returns:

Promise<void>

resume

Resume watcher

resume(watcherId: string): Promise<void>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| watcherId | string | Yes | |

Returns:

Promise<void>

unregister

Unregister watcher

unregister(watcherId: string): Promise<void>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| watcherId | string | Yes | |

Returns:

Promise<void>

manualSync

Manually trigger sync for watcher

manualSync(watcherId: string): Promise<GitWatcherBatch | null>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| watcherId | string | Yes | |

Returns:

Promise<GitWatcherBatch | null>

onBatch

Register batch handler

onBatch(handler: (batch: GitWatcherBatch) => Promise<void>): void

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| handler | (batch: GitWatcherBatch) => Promise<void> | Yes | |

stopAll

Stop all watchers

stopAll(): Promise<void>

Returns:

Promise<void>

RAGWorker

BullMQ worker that processes RAG sync jobs

Processing pipeline:

  1. Download file from S3
  2. Extract text content
  3. Chunk content
  4. Generate embeddings
  5. Insert to SurrealDB
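
The five steps above can be sketched as a single function over injected dependencies. This is an illustration of the data flow only, not the worker's code: `PipelineDeps`, `processJob`, the record shape, and `fakeDeps` are hypothetical, and the fakes replace S3, the extractor, the chunker, Ollama, and SurrealDB with in-memory stand-ins.

```typescript
interface SketchVectorRecord {
  documentId: string;
  chunkIndex: number;
  text: string;
  vector: number[];
}

// Hypothetical dependency bundle: one function per pipeline step.
interface PipelineDeps {
  download(key: string): Promise<Uint8Array>;
  extract(bytes: Uint8Array): string;
  chunk(text: string): string[];
  embed(chunks: string[]): Promise<number[][]>;
  insert(records: SketchVectorRecord[]): Promise<number>;
}

// Run the steps in order and return the number of inserted records.
async function processJob(deps: PipelineDeps, documentId: string, key: string): Promise<number> {
  const bytes = await deps.download(key); // 1. Download file from S3
  const text = deps.extract(bytes); // 2. Extract text content
  const chunks = deps.chunk(text); // 3. Chunk content
  const vectors = await deps.embed(chunks); // 4. Generate embeddings
  if (vectors.length !== chunks.length) {
    throw new Error("embedding count does not match chunk count");
  }
  const records = chunks.map((chunkText, chunkIndex) => ({
    documentId,
    chunkIndex,
    text: chunkText,
    vector: vectors[chunkIndex] ?? [],
  }));
  return deps.insert(records); // 5. Insert to SurrealDB
}

// In-memory fakes so that the example runs without any external service.
const fakeDeps: PipelineDeps = {
  download: async () => new TextEncoder().encode("alpha\n\nbeta"),
  extract: (bytes) => new TextDecoder().decode(bytes),
  chunk: (text) => text.split("\n\n"),
  embed: async (chunks) => chunks.map((chunkText) => [chunkText.length]),
  insert: async (records) => records.length,
};
```

Passing the steps in as functions keeps each stage replaceable in tests.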

Constructor

constructor(config: RAGWorkerConfig)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| config | RAGWorkerConfig | Yes | |

Methods

initialize

Initialize connections (must be called before processing)

initialize(): Promise<void>

Returns:

Promise<void>

stop

Gracefully stop the worker and close connections

stop(): Promise<void>

Returns:

Promise<void>

SurrealAdapter

Adapter for interacting with SurrealDB for vector operations

Constructor

constructor(config: SurrealConfig)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| config | SurrealConfig | Yes | |

Methods

connect

Establish connection to SurrealDB and initialize schema

connect(): Promise<void>

Returns:

Promise<void>

disconnect

Close connection to SurrealDB

disconnect(): Promise<void>

Returns:

Promise<void>

isConnected

Check if connected to database

isConnected(): boolean

Returns:

boolean

insertChunks

Insert multiple vector chunks into SurrealDB

insertChunks(records: VectorRecord[]): Promise<number>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| records | VectorRecord[] | Yes | |

Returns:

Promise<number>

deleteByDocumentId

Delete all vectors associated with a document

deleteByDocumentId(documentId: string): Promise<number>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| documentId | string | Yes | |

Returns:

Promise<number>

vectorSearch

Vector similarity search

vectorSearch(options: { queryVector: number[]; workspaceId?: string; limit?: number; minScore?: number; }): Promise<{ documentId: string; chunkIndex: number; text: string; score: number; metadata?: Record<string, unknown>; }[]>

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| options | { queryVector: number[]; workspaceId?: string; limit?: number; minScore?: number; } | Yes | |

Returns:

Promise<{ documentId: string; chunkIndex: number; text: string; score: number; metadata?: Record<string, unknown>; }[]>
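
The meaning of the options can be illustrated with an in-memory search. This sketch assumes that `score` is cosine similarity, that `minScore` is an inclusive lower bound, and that omitted options default to `limit = 10` and `minScore = 0`. None of these is confirmed by this section, and the real method runs the query in SurrealDB. `SketchRecord`, `cosine`, and `vectorSearchSketch` are hypothetical names.

```typescript
interface SketchRecord {
  documentId: string;
  chunkIndex: number;
  text: string;
  workspaceId: string;
  vector: number[];
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Filter by workspace, score, drop low scores, sort best-first, and truncate.
function vectorSearchSketch(
  records: SketchRecord[],
  options: { queryVector: number[]; workspaceId?: string; limit?: number; minScore?: number },
): { documentId: string; chunkIndex: number; text: string; score: number }[] {
  const { queryVector, workspaceId, limit = 10, minScore = 0 } = options;
  return records
    .filter((record) => workspaceId === undefined || record.workspaceId === workspaceId)
    .map((record) => ({
      documentId: record.documentId,
      chunkIndex: record.chunkIndex,
      text: record.text,
      score: cosine(queryVector, record.vector),
    }))
    .filter((hit) => hit.score >= minScore)
    .sort((left, right) => right.score - left.score)
    .slice(0, limit);
}
```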
