maintainSource Edge Function
Purpose: Document management (add, update, delete) with queue-based processing, RLS, and transaction logging
Status: Production
Last Updated: 2026-01-01
Change Log
| Date | Change | By |
|---|---|---|
| 2026-01-01 | skipParentStorage: Added flag to bypass parent document storage for faster processing of large files | Claude |
| 2026-01-01 | User filtering: Added user_id=mine filter to manageQueue for viewing own submissions | Claude |
| 2025-12-31 | Fire-and-forget mode: Added async parameter for immediate return with queueId (HTTP 202) | Claude |
| 2025-12-31 | Enhanced edge_function_response: Added processingStats with parent/vector doc counts, file type, and load mode | Claude |
| 2025-11-18 | Queue Everything Pattern: All submissions logged for audit trail and retry capability | Claude |
| 2025-11-14 | Phase 3 Integration: Account balance + platform pricing | Claude |
| 2025-11-06 | Initial implementation with RLS and transaction logging | Claude |
Overview
This Edge Function wraps the maintainSource service from _langchain/services/documents-manager.ts. It provides:
- JWT validation and user claim extraction
- Content moderation via OpenAI Moderation API (optional)
- Queue-based processing for audit trail and retry capability
- Transaction logging for business model tracking
- RLS enforcement via hybrid client approach
API Reference
Endpoint
POST /maintainSource
Authorization: Bearer <jwt_token>
Content-Type: application/json
Request Body
{
// Required
sourceUrl: string; // URL of document to process
sourceDate: string; // ISO date string (e.g., "2025-12-31")
// Actions (at least one required)
delDocs?: boolean; // Delete existing documents for this source
addDocs?: boolean; // Add new documents (fetch, chunk, embed)
updDocs?: boolean; // Update metadata only (no re-embedding)
// Optional metadata
sourceTitle?: string; // Document title
iprOwner?: string; // IPR owner identifier (UUID)
isIprOwner?: boolean; // User is the rights holder
dcCreator?: string; // Dublin Core: Creator
dcPublisher?: string; // Dublin Core: Publisher
dcRights?: string; // Dublin Core: Rights statement
dcIdentifier?: string; // Dublin Core: Alternative ID (DOI, ISBN)
dcSource?: string; // Dublin Core: Source reference
metadata?: object; // Additional custom metadata
// Queue options
priority?: number; // Queue priority 0-100 (higher = first)
// Admin-only options (requires accessLevel >= 9)
active?: boolean; // Set document active status
access_level?: number; // Set document access level (0-10)
onBehalfOfUserId?: string; // Transfer ownership to another user
adminOverride?: boolean; // Bypass ownership check for replace/delete
// Processing options
verbose?: boolean; // Enable detailed logging
skipModeration?: boolean; // Skip content moderation check
content?: string; // Pre-fetched content (skips fetch for moderation)
queueId?: string; // Skip queue creation (already queued)
// Async mode (fire-and-forget)
async?: boolean; // Return immediately with queueId (HTTP 202)
// Performance optimization
skipParentStorage?: boolean; // Skip parent doc storage (faster for large files)
}
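The constraints in the request body (required fields, at least one action, priority range) can be checked on the client before submitting. The following is a minimal sketch, not the function's actual validation logic; validateRequest and its error strings are hypothetical, and only a subset of the fields is modelled.

```typescript
// Hypothetical pre-flight check; field names are taken from the request body above.
interface MaintainSourceRequest {
  sourceUrl: string;
  sourceDate: string;
  delDocs?: boolean;
  addDocs?: boolean;
  updDocs?: boolean;
  priority?: number;
}

// Returns a list of problems; an empty list means the request looks well-formed.
function validateRequest(req: MaintainSourceRequest): string[] {
  const errors: string[] = [];
  if (!req.sourceUrl) errors.push("sourceUrl is required");
  if (!req.sourceDate || isNaN(Date.parse(req.sourceDate))) {
    errors.push("sourceDate must be an ISO date string");
  }
  if (!req.delDocs && !req.addDocs && !req.updDocs) {
    errors.push("at least one of delDocs, addDocs, updDocs is required");
  }
  if (req.priority !== undefined && (req.priority < 0 || req.priority > 100)) {
    errors.push("priority must be between 0 and 100");
  }
  return errors;
}
```

An empty result does not guarantee acceptance: the server may apply further checks (ownership, moderation, account balance) that this sketch does not model.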
Response
Standard Response (sync mode)
{
success: boolean;
statusCode: number; // 200 on success
message: string;
// On success
transactionId?: string; // Transaction ID for tracking
queueId?: string; // Queue item ID
transactionResult?: { // Business model details
success: boolean;
platformFee: number;
transactionType: string;
};
// Dispute support
disputeMailto?: string; // Pre-filled mailto link for disputes
// Processing details (in queue via edge_function_response)
processingStats?: {
parentDocs: { deleted: number; added: number };
vectorDocs: { deleted: number; added: number };
sourceDocsLoaded?: number;
fileType?: string; // "pdf", "html", "txt", "vtt"
loadMode?: string; // "local", "remote", "auto"
};
duration?: number; // Document load duration (ms)
storageDuration?: number; // Storage/embedding duration (ms)
netlifyDuration?: number; // Netlify processing duration (ms)
}
Async Response (fire-and-forget mode)
When async: true is set, the function returns immediately with HTTP 202:
{
success: true;
statusCode: 202;
message: "Document queued for processing";
queueId: string; // Use this to poll for status
userId: string;
orgId: string;
sourceUrl: string;
}
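A caller that supports both modes needs to tell the two response shapes apart. The sketch below assumes the shapes documented above; the type names, isAsyncAccepted, and nextStep are hypothetical helpers, not part of the API.

```typescript
// Shapes follow the documented responses; only the fields used here are listed for sync mode.
interface AsyncAccepted {
  success: true;
  statusCode: 202;
  message: string;
  queueId: string;
  userId: string;
  orgId: string;
  sourceUrl: string;
}

interface SyncResult {
  success: boolean;
  statusCode: number;
  message: string;
  queueId?: string;
  transactionId?: string;
}

type MaintainSourceResponse = AsyncAccepted | SyncResult;

// True when the function only queued the document and returned HTTP 202.
function isAsyncAccepted(res: MaintainSourceResponse): res is AsyncAccepted {
  return res.statusCode === 202 && typeof res.queueId === "string";
}

// Decide what the client should do next with a parsed response body.
function nextStep(res: MaintainSourceResponse): "poll" | "done" | "error" {
  if (isAsyncAccepted(res)) return "poll";
  return res.success ? "done" : "error";
}
```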
Fire-and-Forget Mode
For long-running document processing, use async mode to avoid client timeouts:
1. Submit with async mode
const response = await fetch(`${SUPABASE_URL}/functions/v1/maintainSource`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${jwt}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    sourceUrl: "https://example.com/large-document.pdf",
    sourceDate: new Date().toISOString(),
    addDocs: true,
    async: true, // Fire-and-forget mode
  }),
});
const { queueId } = await response.json();
// Returns immediately with HTTP 202
2. Check status (choose one approach)
Option A: Manual refresh (recommended)
Use the user_id=mine filter to list the current user's submissions. The user clicks refresh to see updates.
// List user's queue items with current status
const response = await fetch(
  `${SUPABASE_URL}/functions/v1/manageQueue/items?user_id=mine`,
  { headers: { Authorization: `Bearer ${jwt}` } }
);
const { items } = await response.json();
// items contains all of the user's submissions with their current status
// completed items have edge_function_response with results
Option B: Active polling (optional)
For real-time updates (toast notifications, progress indicators):
async function pollQueueStatus(queueId, jwt, maxWaitMs = 600000) {
  let delayMs = 1000; // Start with 1s
  const maxDelayMs = 30000; // Max 30s between polls
  const startTime = Date.now();
  while (Date.now() - startTime < maxWaitMs) {
    const response = await fetch(
      `${SUPABASE_URL}/functions/v1/manageQueue/items/${queueId}`,
      { headers: { Authorization: `Bearer ${jwt}` } }
    );
    if (!response.ok) {
      throw new Error(`Queue status request failed: HTTP ${response.status}`);
    }
    const item = await response.json();
    if (item.status === "completed") {
      return { success: true, result: item.edge_function_response };
    }
    if (item.status === "failed") {
      return { success: false, error: item.error_message };
    }
    // Still processing - wait and retry
    await new Promise((r) => setTimeout(r, delayMs));
    delayMs = Math.min(delayMs * 1.5, maxDelayMs); // Exponential backoff
  }
  throw new Error("Queue processing timeout");
}
// Usage: poll in background after submission
// (submitAsync is a placeholder for the step 1 request; it returns the parsed JSON body)
const { queueId } = await submitAsync(sourceUrl);
pollQueueStatus(queueId, jwt)
  .then((result) => {
    if (result.success) {
      showToast("Document processed successfully");
      refreshQueueList();
    } else {
      showToast(`Document processing failed: ${result.error}`);
    }
  })
  .catch((err) => showToast(err.message)); // Polling timed out or a status request failed
3. Trigger processing
Processing is triggered via the admin UI "Process Queue" button, which:
- Calls the get_next_upload_queue_item() RPC
- Processes each item via maintainSource
- Updates status via mark_upload_queue_completed or mark_upload_queue_failed
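The three steps above can be sketched as a drain loop. This is an illustration only: the RPC names come from this README, but their argument names (queue_id, response, error_message), the QueueItem shape, and the injected rpc and processItem callbacks are assumptions, not the admin UI's actual code.

```typescript
// rpc stands in for a database RPC call (for example a Supabase client's rpc method).
type Rpc = (fn: string, args?: Record<string, unknown>) => Promise<unknown>;

// Hypothetical shape of a queue row; the real columns may differ.
interface QueueItem {
  id: string;
  payload: Record<string, unknown>;
}

async function processQueue(
  rpc: Rpc,
  processItem: (item: QueueItem) => Promise<unknown>,
): Promise<{ completed: number; failed: number }> {
  let completed = 0;
  let failed = 0;
  for (;;) {
    // Step 1: claim the next item; null means the queue is empty.
    const item = (await rpc("get_next_upload_queue_item")) as QueueItem | null;
    if (!item) break;
    try {
      // Step 2: run the item through maintainSource (injected here as processItem).
      const result = await processItem(item);
      // Step 3a: record success. Argument names are assumptions.
      await rpc("mark_upload_queue_completed", { queue_id: item.id, response: result });
      completed++;
    } catch (err) {
      // Step 3b: record failure so the item can be retried later.
      await rpc("mark_upload_queue_failed", { queue_id: item.id, error_message: String(err) });
      failed++;
    }
  }
  return { completed, failed };
}
```

Injecting rpc and processItem keeps the loop testable without a database; a failing item is recorded and the loop moves on rather than aborting the whole run.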
Queue Integration
All submissions are logged to document_upload_queue table for:
- Audit trail - Complete record of all document operations
- Retry capability - Failed items can be retried
- Admin visibility - Queue management UI shows all operations
Queue Status Flow
pending → processing → completed/failed
↓
expired (if JWT expires during retry)
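For clients that poll, the flow above implies that only pending and processing are worth waiting on. A minimal sketch, assuming the status column holds exactly the five names shown in the flow (QueueStatus and shouldKeepPolling are hypothetical names):

```typescript
// Status names are copied from the flow above; the stored values are assumed to match.
type QueueStatus = "pending" | "processing" | "completed" | "failed" | "expired";

// completed, failed, and expired will not change without a retry or resubmission,
// so a poller can stop as soon as it sees one of them.
function shouldKeepPolling(status: QueueStatus): boolean {
  return status === "pending" || status === "processing";
}
```

Treating expired as a stop condition avoids polling until the client-side timeout for an item that can no longer complete with its original JWT.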
Viewing Queue Details
The edge_function_response JSONB column stores processing results:
{
"success": true,
"statusCode": 200,
"message": "Successfully processed https://example.com/doc.pdf. Before: 0, After: 150",
"duration": 12500,
"storageDuration": 8200,
"netlifyDuration": 4300,
"processingStats": {
"parentDocs": { "deleted": 0, "added": 25 },
"vectorDocs": { "deleted": 0, "added": 150 },
"sourceDocsLoaded": 45,
"fileType": "pdf",
"loadMode": "remote"
}
}
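A queue UI might condense this JSON into a single line. The sketch below reads only fields shown in the example; summarizeStats and its output format are hypothetical.

```typescript
interface DocCounts {
  deleted: number;
  added: number;
}

interface ProcessingStats {
  parentDocs: DocCounts;
  vectorDocs: DocCounts;
  sourceDocsLoaded?: number;
  fileType?: string;
  loadMode?: string;
}

interface EdgeFunctionResponse {
  success: boolean;
  statusCode: number;
  message: string;
  processingStats?: ProcessingStats;
}

// Falls back to the plain message when no stats were recorded.
function summarizeStats(res: EdgeFunctionResponse): string {
  const stats = res.processingStats;
  if (!stats) return res.message;
  const fmt = (c: DocCounts): string => `+${c.added}/-${c.deleted}`;
  const kind = stats.fileType === undefined ? "unknown" : stats.fileType;
  const mode = stats.loadMode === undefined ? "unknown" : stats.loadMode;
  return `${kind} (${mode}): vector ${fmt(stats.vectorDocs)}, parent ${fmt(stats.parentDocs)}`;
}
```

For the example above this yields "pdf (remote): vector +150/-0, parent +25/-0".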
Supported File Types
| Extension | Handler | Notes |
|---|---|---|
| .pdf | maintainParentDocumentsText | Uses local/remote fallback (Netlify for large files) |
| .html, .md | maintainParentDocumentsHTML | CSS selector support for content extraction |
| .txt | maintainParentDocumentsText | Plain text processing |
| .vtt | maintainParentDocumentsText | WebVTT captions (pre-chunked by dialogue) |
| No extension | maintainParentDocumentsHTML | Assumes HTML (e.g., 11ty pages) |
| Google Docs | maintainParentDocumentsHTML | Auto-converts to export/view URLs |
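The table can be read as a lookup from URL to handler. The sketch below is one way to express it, not the dispatch code in documents-manager.ts: pickHandler is hypothetical, Google Docs detection by the docs.google.com hostname is an assumption, and returning null for unlisted extensions is a placeholder because the table does not say what happens to them.

```typescript
type Handler = "maintainParentDocumentsText" | "maintainParentDocumentsHTML";

function pickHandler(sourceUrl: string): Handler | null {
  const url = new URL(sourceUrl);
  // Assumption: Google Docs links are recognised by hostname.
  if (url.hostname === "docs.google.com") return "maintainParentDocumentsHTML";
  // Look at the last path segment only, so dots in directory names are ignored.
  const segments = url.pathname.split("/");
  const last = segments[segments.length - 1];
  const dot = last.lastIndexOf(".");
  const ext = dot >= 0 ? last.slice(dot).toLowerCase() : "";
  switch (ext) {
    case ".pdf":
    case ".txt":
    case ".vtt":
      return "maintainParentDocumentsText";
    case ".html":
    case ".md":
    case "": // No extension: assumed to be HTML
      return "maintainParentDocumentsHTML";
    default:
      return null; // Not listed in the table
  }
}
```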
Content Moderation
Optional content moderation via OpenAI Moderation API:
// Skip moderation
{
skipModeration: true;
}
// Pre-fetch content for moderation
{
content: "Pre-fetched document text...";
}
If content is flagged, the request is rejected with category details.
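The "category details" can be derived from the Moderation API response, which reports a flagged boolean and a per-category boolean map for each input. The sketch below extracts the flagged category names; flaggedCategories is a hypothetical helper, and how maintainSource formats its rejection message is not documented here.

```typescript
// Subset of the OpenAI Moderation API response used by this sketch.
interface ModerationResult {
  flagged: boolean;
  categories: Record<string, boolean>;
}

interface ModerationResponse {
  results: ModerationResult[];
}

// Collects the names of all categories marked true on flagged results.
function flaggedCategories(res: ModerationResponse): string[] {
  const names: string[] = [];
  for (const result of res.results) {
    if (!result.flagged) continue;
    for (const name of Object.keys(result.categories)) {
      if (result.categories[name] && names.indexOf(name) < 0) names.push(name);
    }
  }
  return names;
}
```

An empty array means nothing was flagged and processing can continue.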
Admin Operations
Users with accessLevel >= 9 can:
- Override ownership - Delete/replace documents they don't own
- Transfer ownership - Assign documents to other users
- Set access level - Control document visibility
- Set active status - Activate/deactivate documents
{
adminOverride: true, // Bypass ownership check
onBehalfOfUserId: "uuid...", // Transfer to this user
access_level: 5, // Set access level
active: false // Deactivate document
}
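A client can warn early when a non-admin request includes admin-only fields. The sketch below uses the field names and the accessLevel >= 9 threshold from this README; rejectedAdminFields is hypothetical, and whether the server rejects or silently ignores such fields is not stated here.

```typescript
// Fields listed under "Admin-only options" in the request body.
const ADMIN_ONLY_FIELDS = ["active", "access_level", "onBehalfOfUserId", "adminOverride"] as const;
const ADMIN_MIN_ACCESS_LEVEL = 9;

// Returns the admin-only fields present in the body that this caller may not use.
function rejectedAdminFields(body: Record<string, unknown>, accessLevel: number): string[] {
  if (accessLevel >= ADMIN_MIN_ACCESS_LEVEL) return [];
  return ADMIN_ONLY_FIELDS.filter((field) => body[field] !== undefined);
}
```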
Timeout Constraints
Document processing involves multiple components with different timeout limits:
Architecture
Client → maintainSource (Edge Function) → document-loader → Netlify Background Function
                        ↓                                                   ↓
                 Polls database ←←←←←←←← writes results to ←←←←←←←←←←←←←←←←←┘
                 (document_loading_jobs table)
Timeout Limits
| Component | Limit | Notes |
|---|---|---|
| Supabase Edge Function | 400 seconds max | Hard limit, cannot be extended |
| Polling timeout | 390 seconds default | Just under Edge Function limit |
| Netlify Background Function | 15 minutes | Runs independently in Node.js |
How It Works
- Edge Function calls Netlify with a job_id
- Netlify returns 202 Accepted immediately (background execution)
- Edge Function polls document_loading_jobs table for completion
- Netlify writes results to database when done
The Bottleneck
Even though Netlify can run for 15 minutes, the Edge Function will stop waiting after its timeout. If the Edge Function times out:
- The Netlify function continues processing
- Results are written to document_loading_jobs table
- But the original HTTP request returns a timeout error
- The queue item is marked as failed (can be retried)
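The poll-until-deadline behaviour can be sketched as follows. This is not the code in document-loader.ts: waitForJob, the LoadingJob shape, the "completed"/"failed" status values, and the 2-second interval are assumptions; only the 390-second default comes from the table above.

```typescript
// Hypothetical view of a document_loading_jobs row.
interface LoadingJob {
  status: string;
  result?: unknown;
}

async function waitForJob(
  fetchJob: () => Promise<LoadingJob>, // Reads the job row for the current job_id
  timeoutMs: number = 390000,          // Just under the Edge Function limit
  intervalMs: number = 2000,           // Assumed polling interval
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<LoadingJob> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const job = await fetchJob();
    if (job.status === "completed" || job.status === "failed") return job;
    await sleep(intervalMs);
  }
  // The background function is not cancelled; it may still write its result later.
  throw new Error(`Timed out after ${timeoutMs} ms; the background job may still finish`);
}
```

The thrown error corresponds to the case described above: the caller gives up, but the job row can still reach a finished state afterwards, which is why a retry or a later check of the table can succeed.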
Recommendations for Large Files
The default polling timeout is now 390 seconds (6.5 minutes), which should handle most documents. For files that may still time out:
- Use skipParentStorage: true - Bypass parent document storage (significant speedup for large files)
- Pre-split very large PDFs - Break into smaller files before upload (recommended for > 50MB)
- Use queue retry - Failed items can be retried automatically
- Monitor document_loading_jobs - Check for completed jobs that timed out client-side
Performance Optimization: skipParentStorage
The skipParentStorage flag bypasses the parent document retriever pattern, storing vector chunks directly in the vectorstore without saving parent documents to Supabase Storage.
When to Use
- Large documents (> 10MB) - Eliminates Storage bucket overhead
- High-volume processing - Faster chunk embedding
- When parent document retrieval isn't needed - RAG queries that work fine with vector chunks
Trade-offs
| Feature | skipParentStorage: false (default) | skipParentStorage: true |
|---|---|---|
| Storage | Parent docs in Storage bucket + vector chunks | Vector chunks only |
| Speed | Slower (Storage operations) | Faster (direct embedding) |
| Parent Doc Retrieval | ✅ Supported | ❌ Not available |
| Context window | Larger chunks available | Standard 400-char chunks |
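The size guidance in this README (> 10MB for skipParentStorage, > 50MB for pre-splitting) can be combined into a small decision helper. This is a sketch under assumptions: planUpload is hypothetical, MB is taken as 1024 * 1024 bytes, and using async mode above 10MB is a guess, since no async threshold is documented.

```typescript
const MB = 1024 * 1024; // Assumption: binary megabytes

interface UploadPlan {
  preSplit: boolean;          // Split the file before uploading
  skipParentStorage: boolean; // Pass skipParentStorage: true
  async: boolean;             // Pass async: true (fire-and-forget)
}

function planUpload(fileSizeBytes: number, needsParentRetrieval: boolean): UploadPlan {
  const large = fileSizeBytes > 10 * MB;
  return {
    preSplit: fileSizeBytes > 50 * MB,
    // Skipping parent storage removes parent doc retrieval, so only do it when that is acceptable.
    skipParentStorage: large && !needsParentRetrieval,
    async: large, // Assumed threshold for fire-and-forget mode
  };
}
```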
Example
// Fast processing for large file (no parent doc storage)
{
sourceUrl: "https://example.com/large-document.pdf",
sourceDate: "2026-01-01",
addDocs: true,
skipParentStorage: true // Skip parent doc storage for speed
}
Stats Difference
When skipParentStorage: true, the processingStats will show:
{
"parentDocs": { "deleted": 0, "added": 0 }, // Always 0
"vectorDocs": { "deleted": 0, "added": 250 }
}
Environment Variables
# Document loader mode: "local" | "remote" | "auto" (default)
DOCUMENT_LOADER_MODE=auto
# Netlify background function URL
NETLIFY_DOCUMENT_LOADER_URL=https://your-site.netlify.app/.netlify/functions/document-loader-background
# API key for Netlify function
NETLIFY_BACKGROUND_API_KEY=your-api-key
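Reading these variables with validation might look like the sketch below. readLoaderConfig is hypothetical; the default of auto follows the comment above, while requiring the Netlify URL and API key in remote mode is an assumption about how the loader behaves.

```typescript
type LoaderMode = "local" | "remote" | "auto";

interface LoaderConfig {
  mode: LoaderMode;
  netlifyUrl?: string;
  netlifyApiKey?: string;
}

function isLoaderMode(value: string): value is LoaderMode {
  return value === "local" || value === "remote" || value === "auto";
}

// getEnv is injected so the same code works with any runtime's environment accessor.
function readLoaderConfig(getEnv: (name: string) => string | undefined): LoaderConfig {
  const rawMode = getEnv("DOCUMENT_LOADER_MODE");
  const mode = rawMode === undefined || rawMode === "" ? "auto" : rawMode;
  if (!isLoaderMode(mode)) {
    throw new Error(`Invalid DOCUMENT_LOADER_MODE: ${mode}`);
  }
  const netlifyUrl = getEnv("NETLIFY_DOCUMENT_LOADER_URL");
  const netlifyApiKey = getEnv("NETLIFY_BACKGROUND_API_KEY");
  // Assumption: remote mode cannot work without both Netlify settings.
  if (mode === "remote" && (!netlifyUrl || !netlifyApiKey)) {
    throw new Error("remote mode requires NETLIFY_DOCUMENT_LOADER_URL and NETLIFY_BACKGROUND_API_KEY");
  }
  return { mode, netlifyUrl, netlifyApiKey };
}
```

In a Deno-based Edge Function the accessor could be (name) => Deno.env.get(name).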
Related Files
| File | Purpose |
|---|---|
| manageQueue/README.md | Queue API documentation (for polling status) |
| documents-manager.ts | Core processing logic |
| document-loader.ts | Local/remote loader with Netlify fallback |
| QUEUE_UI_DESIGN.md | Queue management UI design |
| upload-transaction-service.ts | Transaction logging |
| 20251118000002_simplified_queue.sql | Queue table migration |
Testing
# Local testing
curl -X POST http://localhost:54321/functions/v1/maintainSource \
-H "Authorization: Bearer $JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"sourceUrl": "https://example.com/doc.pdf",
"sourceDate": "2025-12-31",
"addDocs": true,
"verbose": true
}'
Deployment
# Deploy to Supabase
deno task deploy:maintainSource
# Or via supabase CLI
supabase functions deploy maintainSource