RAG Edge Function

Purpose: Streaming Retrieval-Augmented Generation (RAG) endpoint for knowledge base Q&A with RoC (Return on Contribution) credit distribution.

Status: Active
Last Updated: 2026-01-20
Version: 2.2.0


Overview

The RAG edge function is the core Q&A endpoint that:

  • Streams LLM-generated answers via Server-Sent Events (SSE)
  • Retrieves relevant documents from Supabase vector store
  • Calculates source attention weights for citation accuracy
  • Distributes RoC credits to content contributors
  • Supports multiple LLM providers (OpenAI, Anthropic)
  • Maintains conversation history via Supabase (per-org tables)
  • Persists metadata with assistant responses (model, confidence, sources, transaction, timing)
  • Extracts topic tags for Hot Topics analytics (privacy-preserving query analytics)

Base URL

https://<project-ref>.supabase.co/functions/v1/rag

Authentication

All requests require JWT authentication via the Authorization header:

Authorization: Bearer <user-jwt>

The JWT must contain app claims with:

  • authId - User's auth UUID
  • orgId - Organization ID
  • accessLevel - User's access level
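
For illustration, a minimal sketch of reading these claims from the bearer token. Where the claims sit in the payload is an assumption, and signature verification is assumed to happen upstream:

// Decode the JWT payload (base64url) and read the app claims.
// NOTE: assumes the claims sit at the top level of the payload and
// skips signature verification, which is handled before this point.
interface AppClaims {
  authId: string;
  orgId: string;
  accessLevel: string;
}

function readAppClaims(authHeader: string): AppClaims {
  const token = authHeader.replace(/^Bearer\s+/i, "");
  const payloadPart = token.split(".")[1]
    .replace(/-/g, "+")
    .replace(/_/g, "/");
  const { authId, orgId, accessLevel } = JSON.parse(atob(payloadPart));
  if (!authId || !orgId) throw new Error("Missing required app claims");
  return { authId, orgId, accessLevel };
}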

Request Format

Method: POST
Content-Type: application/json

Request Body

{
  // Required
  chatMessages: {
    system: string;   // System prompt context
    user: string;     // User's question
  };

  // Optional
  sessionId?: string;          // Conversation session ID (for memory)
  metadataFilter?: object;     // Vector search filter
  specificity?: "1" | "2" | "3"; // Retrieval method (default: "1")
  modelVersion?: string;       // LLM model name (default: platform default)
  provider?: "openai" | "anthropic"; // LLM provider (default: "openai")
  apiKey?: string;             // BYOK API key (optional)
  verbose?: boolean;           // Enable verbose logging
}

Specificity Values

| Value | Method | Description |
|-------|--------|-------------|
| "1" | SimilaritySearch | Standard vector similarity search (default) |
| "2" | Self-Query | LLM-structured query (temporarily disabled) |
| "3" | Parent | Parent document retrieval (not implemented) |

API Key Options

  1. Platform Key (default): Uses platform's OpenAI/Anthropic key
  2. Header Key: x-openai-key or x-anthropic-key header
  3. Body Key: apiKey field in request body

When using platform keys, the model is enforced to the platform default (e.g., gpt-4.1-mini).
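
A sketch of that precedence (header key, then body key, then platform key) and the model clamp. Names are illustrative, not the actual module API:

// Resolve the key in precedence order: header > body > platform env.
// When the platform key is used, force the platform default model.
function resolveApiKey(
  req: Request,
  body: { apiKey?: string; provider?: string; modelVersion?: string },
) {
  const isAnthropic = body.provider === "anthropic";
  const headerKey = req.headers.get(
    isAnthropic ? "x-anthropic-key" : "x-openai-key",
  );
  const key = headerKey ?? body.apiKey ??
    Deno.env.get(isAnthropic ? "ANTHROPIC_API_KEY" : "OPENAI_API_KEY");
  const usingPlatformKey = !headerKey && !body.apiKey;
  const model = usingPlatformKey
    ? Deno.env.get("OPENAI_DEFAULT_MODEL") ?? "gpt-4.1-mini"
    : body.modelVersion ?? Deno.env.get(
      isAnthropic ? "ANTHROPIC_DEFAULT_MODEL_BYOK" : "OPENAI_DEFAULT_MODEL_BYOK",
    );
  return { key, model, usingPlatformKey };
}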


Response Format

The response is a Server-Sent Events (SSE) stream. See SSE_EVENTS_REFERENCE.md for complete event documentation.

Key SSE Events

| Event Type | Description |
|------------|-------------|
| model_info | LLM provider/model being used |
| progress | Pipeline progress phases |
| results | Streaming answer text chunks |
| events-answer-confidence | Answer confidence score |
| events-sources-data | Source contribution breakdown |
| transaction-summary | Query cost and RoC distribution |
| error | Error messages |
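
As a rough client-side sketch (not an official SDK), the stream can be read with fetch and a line-based parser; the event names match the table above:

// Minimal SSE client: POST the query, then parse "event:"/"data:" lines.
// Simplified: ignores multi-line data fields and blank-line delimiters.
async function streamRag(url: string, jwt: string, body: unknown) {
  const res = await fetch(url, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${jwt}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = "";
  let event = "message";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    const lines = buffer.split("\n");
    buffer = lines.pop()!; // keep any partial line for the next chunk
    for (const line of lines) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) {
        const data = line.slice(5).trim();
        if (event === "results") console.log(data); // answer text chunk
        if (event === "error") console.error(data);
      }
    }
  }
}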

Transaction Summary Structure

The transaction-summary event includes Dublin Core metadata for each source:

{
  transaction_id: string;
  platform_fee: number;
  duration_seconds: number;
  duration_ms: number;
  sources_used: number;
  roc_credits_distributed: number;
  contributors: number;
  hourly_rate: number;
  roc_split_percent: number;
  balance_before?: number;
  balance_after?: number;
  sources: [
    {
      source_url: string;
      source_title?: string;
      contributor_id: string;
      contributor_name?: string;
      portion: number;
      roc_earned?: number;
      chunks_used?: number;
      dublin_core?: {           // Dublin Core metadata for popover display
        dc_title?: string;
        dc_creator?: string;
        dc_publisher?: string;
        dc_date?: string;
        dc_rights?: string;
        dc_description?: string;
        dc_source?: string;
        dc_identifier?: string;
      } | null;
    }
  ];
}
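
The per-source roc_earned values follow from the transaction-level fields. A sketch of the relationship, assuming portion is a 0-1 attention weight and the portions of credited sources sum to 1:

// Derive a source's RoC credit from the transaction-level split.
function rocForSource(
  summary: { platform_fee: number; roc_split_percent: number },
  portion: number,
): number {
  return summary.platform_fee * (summary.roc_split_percent / 100) * portion;
}

// e.g., platform_fee = 10, roc_split_percent = 60, portion = 0.5
// → 10 * 0.6 * 0.5 = 3 credits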

Example Request

curl -X POST \
  'https://<project>.supabase.co/functions/v1/rag' \
  -H 'Authorization: Bearer <jwt>' \
  -H 'Content-Type: application/json' \
  -d '{
    "chatMessages": {
      "system": "You are a helpful assistant.",
      "user": "What is the company mission?"
    },
    "sessionId": "session-123",
    "specificity": "1"
  }'

With BYOK (Bring Your Own Key)

curl -X POST \
  'https://<project>.supabase.co/functions/v1/rag' \
  -H 'Authorization: Bearer <jwt>' \
  -H 'Content-Type: application/json' \
  -H 'x-anthropic-key: sk-ant-...' \
  -d '{
    "chatMessages": {
      "system": "You are a helpful assistant.",
      "user": "Explain the architecture."
    },
    "provider": "anthropic",
    "modelVersion": "claude-sonnet-4-20250514"
  }'

Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| OPENAI_API_KEY | Yes | Platform OpenAI API key |
| ANTHROPIC_API_KEY | No | Platform Anthropic API key |
| OPENAI_DEFAULT_MODEL | No | Default model for platform (default: gpt-4.1-mini) |
| OPENAI_DEFAULT_MODEL_BYOK | No | Default model for BYOK users |
| ANTHROPIC_DEFAULT_MODEL_BYOK | No | Default Anthropic model for BYOK |
| RETRIEVER_K | No | Number of documents to retrieve (default: 5) |
| SUPABASE_DISTANCE | No | Distance metric: cosine, inner, euclidean |
| ATTENTION_THRESHOLD | No | Min attention % to include source (default: 0) |
| VERBOSE | No | Enable verbose logging (default: true) |
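
A sketch of loading these in the function with the documented defaults (the config object shape is illustrative):

// Read configuration once at startup, applying the defaults above.
const config = {
  openaiKey: Deno.env.get("OPENAI_API_KEY"), // required
  anthropicKey: Deno.env.get("ANTHROPIC_API_KEY"),
  defaultModel: Deno.env.get("OPENAI_DEFAULT_MODEL") ?? "gpt-4.1-mini",
  retrieverK: Number(Deno.env.get("RETRIEVER_K") ?? "5"),
  distance: Deno.env.get("SUPABASE_DISTANCE"), // cosine | inner | euclidean
  attentionThreshold: Number(Deno.env.get("ATTENTION_THRESHOLD") ?? "0"),
  verbose: (Deno.env.get("VERBOSE") ?? "true") === "true",
};

if (!config.openaiKey) {
  throw new Error("OPENAI_API_KEY must be set");
}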

File Structure

rag/
├── README.md                      # This file
├── index.ts                       # Main RAG endpoint
├── _deps.ts                       # Dependency re-exports
├── deno.json                      # Deno configuration
├── SSE_EVENTS_REFERENCE.md        # Complete SSE event documentation
├── test.html                      # Browser-based test client
├── rag.test.js                    # Integration tests
├── helpers-utils.test.ts          # Unit tests
│
├── # Dashboard-related docs (deprecated - see dashboardApi)
├── DASHBOARDS_README.md
├── DASHBOARD_SESSION_SUMMARY.md
├── TRANSACTION_MINI_DASHBOARD.md
├── transaction-mini-dashboard.js
└── dashboard-account-summary.html

Development

Run Locally

deno task run:rag

Run with Debug Inspection

deno task run:rag:inspect

Deploy

deno task deploy:rag

Architecture

Request Flow

1. JWT Validation → Extract user claims (authId, orgId)
2. API Key Resolution → Platform vs BYOK
3. Vector Store Init → Supabase vector store with RLS
4. Document Retrieval → Similarity search with scores
5. Attention Calculation → Source contribution weights
6. LLM Generation → Streaming answer via LangChain
7. Source Relevance → LLM self-reflection on source contribution
8. Topic Extraction → Extract topic tags (if sources contributed)
9. Transaction Logging → RoC credit distribution + topic tags
10. Balance Update → Deduct platform fee, credit contributors
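
In outline form (every helper below is a hypothetical declared stub, not the actual module API), the flow maps roughly to:

// Illustrative outline of the steps above; the stubs stand in for
// the real modules listed under "Internal Modules".
type ScoredDoc = { pageContent: string; score: number };

declare function validateJwt(req: Request): { authId: string; orgId: string };
declare function resolveKey(req: Request): { key: string; model: string };
declare function retrieve(orgId: string, query: string): Promise<ScoredDoc[]>;
declare function attentionWeights(docs: ScoredDoc[]): number[];
declare function streamAnswer(
  model: string,
  key: string,
  docs: ScoredDoc[],
): ReadableStream<Uint8Array>;

async function handleRag(req: Request): Promise<Response> {
  const { orgId } = validateJwt(req);                    // step 1
  const { key, model } = resolveKey(req);                // step 2
  const { chatMessages } = await req.json();
  const docs = await retrieve(orgId, chatMessages.user); // steps 3-4
  const weights = attentionWeights(docs);                // step 5
  const stream = streamAnswer(model, key, docs);         // step 6
  // Steps 7-10 (self-reflection, topic extraction, transaction
  // logging, balance update) run after generation completes.
  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream" },
  });
}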

Key Dependencies

| Module | Purpose |
|--------|---------|
| @langchain/openai | OpenAI LLM integration |
| @langchain/anthropic | Anthropic LLM integration |
| @langchain/core | LangChain core abstractions |
| @langchain/community | Supabase vector store |
| @xata.io/client | Conversation history storage |
| @supabase/supabase-js | Database client |

Internal Modules

| Import | Purpose |
|--------|---------|
| @core/deps.ts | Centralized dependency exports |
| @utils/helpers-utils.ts | Utility functions |
| @stores/supabase-store.ts | Vector store operations |
| @stores/xata.ts | Conversation memory |
| @business-model/services/ | Transaction logging, RoC distribution |

RoC Credit Distribution

When sources contribute to an answer:

  1. Platform Fee: Charged to the querying user (based on query duration)
  2. RoC Split: 60% of platform fee distributed to contributors
  3. Portion-Based: Credits weighted by source attention scores
  4. Self-Use Filter: Users don't receive RoC for their own content
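
For example (illustrative numbers): a query that incurs a 10-credit platform fee yields a 6-credit RoC pool (60%). If two external sources contributed with attention portions of 0.75 and 0.25, their contributors earn 4.5 and 1.5 credits respectively; if the second source belongs to the querying user, its 1.5 credits are not paid out under the self-use filter.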

Dublin Core Metadata

Source documents may include Dublin Core metadata (ISO 15836) extracted from vector metadata:

  • dc_title - Document title
  • dc_creator - Author/creator
  • dc_publisher - Publishing entity
  • dc_date - Publication date
  • dc_rights - Copyright/license info
  • dc_description - Brief summary
  • dc_source - Original source URL
  • dc_identifier - DOI, ISBN, etc.

Frontend clients can display this metadata in popovers for source attribution.
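
For instance, a popover might collapse the fields into a single attribution line; a sketch (layout and helper name are the client's choice):

// Join whatever Dublin Core fields are present into one line,
// falling back to the raw source URL when metadata is absent.
interface DublinCore {
  dc_title?: string;
  dc_creator?: string;
  dc_publisher?: string;
  dc_date?: string;
}

function attributionLine(dc: DublinCore | null, sourceUrl: string): string {
  if (!dc) return sourceUrl;
  const parts = [dc.dc_creator, dc.dc_title, dc.dc_publisher, dc.dc_date];
  return parts.filter(Boolean).join(", ") || sourceUrl;
}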


Hot Topics (Query Analytics)

The RAG function extracts topic tags from user queries for privacy-preserving analytics. This enables the "Hot Topics" dashboard visualization without exposing raw query text.

How It Works

  1. Topic Extraction: After the RAG chain completes successfully, an LLM extracts 2-5 topic tags from the query
  2. Smart Gating: Topics are only extracted when sources actually contribute to the answer
    • Skipped when resultsFound === false (no documents retrieved)
    • Skipped when sourcesContributed === false (LLM didn't use sources)
  3. Storage: Topics are stored in transactions.metadata.query_topics:
    {
      query_topics: {
        topics: ["budget-planning", "q4-forecast"],
        extracted_at: "2026-01-19T...",
        confidence: 0.9,
        version: "v1"
      }
    }
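
The gating in step 2 reduces to a simple guard; a sketch (extractTopics is a hypothetical stand-in for the real module in _shared/core/topic-extraction.ts):

// Skip extraction unless retrieval found documents AND the LLM
// actually drew on them; HOT_TOPICS_ENABLED=false disables it entirely.
declare function extractTopics(query: string): Promise<string[]>;

async function maybeExtractTopics(
  query: string,
  resultsFound: boolean,
  sourcesContributed: boolean,
): Promise<string[] | null> {
  if (Deno.env.get("HOT_TOPICS_ENABLED") === "false") return null;
  if (!resultsFound || !sourcesContributed) return null;
  return await extractTopics(query);
}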

Tag Structure Rules

Tags follow consistent semantic structure for aggregation:

  • Document types: [type]-[subject] ordering

    • transcript-meeting (not meeting-transcript)
    • report-financial (not financial-report)
    • policy-hr (not hr-policy)
  • Actions: [action]-[subject] ordering

    • planning-budget (not budget-planning)
    • audit-compliance (not compliance-audit)
  • Time periods: Standalone

    • q4, annual, monthly, 2025

Canonical Terms

The LLM is instructed to use canonical terms for consistency:

  • transcript (not: transcription, notes, minutes)
  • meeting (not: call, session, standup)
  • report (not: document, file, summary)
  • policy (not: guideline, rule, procedure)
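
The mapping lives in the extraction prompt; as a defensive illustration (not part of the source), the same canonicalization could also be enforced in post-processing:

// Map synonym parts of a tag onto the canonical terms above.
const CANONICAL: Record<string, string> = {
  transcription: "transcript", notes: "transcript", minutes: "transcript",
  call: "meeting", session: "meeting", standup: "meeting",
  document: "report", file: "report", summary: "report",
  guideline: "policy", rule: "policy", procedure: "policy",
};

function canonicalizeTag(tag: string): string {
  return tag.split("-").map((p) => CANONICAL[p] ?? p).join("-");
}

// canonicalizeTag("minutes-call") → "transcript-meeting"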

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| HOT_TOPICS_ENABLED | true | Set to false to disable topic extraction |

Privacy Considerations

  • Raw queries never stored - Only derived topic tags
  • Minimum threshold - Dashboard only shows topics with ≥3 queries
  • No user attribution - Topics not linked to individual users
  • Time boundary - Topics older than 90 days excluded from visualization
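
Taken together, the last three rules imply a dashboard-side aggregation like the following sketch (the row shape is assumed from the storage example above):

// Count topic occurrences within the 90-day window, then drop any
// topic below the 3-query minimum before visualization.
interface TopicRow {
  topics: string[];
  extracted_at: string;
}

function hotTopics(rows: TopicRow[], now = new Date()): Map<string, number> {
  const cutoffMs = now.getTime() - 90 * 24 * 60 * 60 * 1000;
  const counts = new Map<string, number>();
  for (const row of rows) {
    if (new Date(row.extracted_at).getTime() < cutoffMs) continue;
    for (const tag of row.topics) {
      counts.set(tag, (counts.get(tag) ?? 0) + 1);
    }
  }
  for (const [tag, n] of counts) {
    if (n < 3) counts.delete(tag);
  }
  return counts;
}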


Change Log

2026-01-20

  • FEATURE: Hot Topics query analytics
    • Extracts 2-5 topic tags from user queries using LLM
    • Tags stored in transactions.metadata.query_topics
    • Smart gating: skips extraction when sources don't contribute to answer
    • Consistent tag structure with [type]-[subject] ordering
    • Canonical terms for better aggregation
    • Can be disabled via HOT_TOPICS_ENABLED=false
    • New module: _shared/core/topic-extraction.ts
    • New tests in rag.test.js for extraction behavior

2026-01-16

  • FEATURE: Added metadata persistence with chat history
    • Assistant responses now include model, confidence, sources, transaction, and timing metadata

2025-12-18

  • FEATURE: Added Dublin Core metadata to transaction-summary SSE event
    • New dublin_core field on each source in the sources[] array
    • Includes: dc_title, dc_creator, dc_publisher, dc_date, dc_rights, dc_description, dc_source, dc_identifier
    • Extracted from vector metadata at query time
    • Enables frontend to display metadata popovers in real-time query results

2025-12-13

  • FEATURE: Added multi-provider LLM support (OpenAI, Anthropic)
  • FEATURE: Added model_info SSE event for provider/model transparency
  • FEATURE: BYOK (Bring Your Own Key) support via headers

2025-12-10

  • FEATURE: Added LLM self-reflection for source contribution assessment
  • FEATURE: Answer confidence scoring based on source relevance

2025-11-25

  • FEATURE: Added attention-based source attribution
  • FEATURE: RoC credit distribution with per-source breakdown
  • FEATURE: Transaction logging with balance tracking

2025-09-10

  • Initial streaming RAG implementation
  • Supabase vector store integration
  • Xata conversation memory