Plan: Chat Metadata Persistence
Status: Draft
Feature: Persist chat metadata for session history and copy functionality
Created: 2026-01-16
From: Frontend Claude
To: Backend Claude
Problem Statement
Currently, rich metadata (sources, confidence, transaction details) is only available for the most recent chat query. When a user has multiple exchanges in a session, clicking the copy button on an older message only copies the prompt and response text - not the metadata.
Additionally, when users switch sessions via the session history dropdown, they cannot see previous conversations or their associated metadata.
What We Need
1. Persist Metadata with Each Assistant Response
When the RAG endpoint saves a chat message, please also persist the metadata that is currently streamed via SSE events. This metadata is already generated - it just needs to be stored.
Metadata to persist (per assistant response):
| Category | SSE Event Source | Why We Need It |
|---|---|---|
| Model info | `model_info` | Show which model answered (for copy output, audit) |
| Confidence | `events-answer-confidence` | Include confidence % and reasoning in copied content |
| Sources | `events-sources-data` | Generate source attribution with Mermaid charts |
| Transaction | `transaction-summary` | Include cost, RoC distribution, Dublin Core provenance |
| Timing | progress events | Include response timing in copied content |
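As a rough illustration of what "collect metadata during streaming" could mean, here is a minimal sketch that folds the events in the table above into a single object before persistence. The `SseEvent` shape and the `collectMetadata` helper are hypothetical names for this sketch, not existing backend code.

```typescript
// Hypothetical accumulator: folds the SSE events listed above into one
// metadata object that can be stored alongside the assistant message.
type SseEvent = { event: string; data: unknown };

interface CollectedMetadata {
  model?: unknown;        // from model_info
  confidence?: unknown;   // from events-answer-confidence
  sources?: unknown;      // from events-sources-data
  transaction?: unknown;  // from transaction-summary
  timing?: { ttf_ms?: number; duration_ms?: number }; // derived from progress events
}

function collectMetadata(events: SseEvent[]): CollectedMetadata {
  const meta: CollectedMetadata = {};
  for (const { event, data } of events) {
    switch (event) {
      case "model_info": meta.model = data; break;
      case "events-answer-confidence": meta.confidence = data; break;
      case "events-sources-data": meta.sources = data; break;
      case "transaction-summary": meta.transaction = data; break;
      // Timing would be derived from progress events / timestamps in practice.
    }
  }
  return meta;
}
```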
2. API Endpoint to Fetch Session History with Metadata
We need an endpoint to retrieve a full conversation with metadata when a user switches sessions.
Request:
```
GET /api/chat/history?sessionId={sessionId}
```
Expected Response:
```json
{
"sessionId": "my-session",
"messages": [
{
"id": "msg-1",
"role": "user",
"content": "What is Smart Data?",
"timestamp": "2026-01-16T10:00:00Z",
"metadata": null
},
{
"id": "msg-2",
"role": "assistant",
"content": "Smart Data is...",
"timestamp": "2026-01-16T10:00:05Z",
"metadata": {
"model": { "provider": "openai", "model": "gpt-4o-mini", "mode": "platform" },
"confidence": { "confidence": 85, "sourcesContributed": true, "reasoning": "...", "sourcesFound": 5 },
"sources": { "totalSources": 5, "usedAttention": true, "sources": [...] },
"transaction": { "platform_fee": 0.02, "roc_credits_distributed": 0.01, "sources": [...] },
"timing": { "ttf_ms": 1500, "duration_ms": 5000 }
}
}
]
}
```
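For reference only, a minimal sketch of what such an endpoint could look like as a Supabase Edge Function, assuming a `chat_messages` table with `session_id`, `role`, `content`, `created_at`, and a jsonb `metadata` column. All table and column names here are assumptions, not the actual schema; the backend is free to structure this however fits.

```typescript
// Hypothetical Edge Function sketch for GET /api/chat/history?sessionId=...
// Assumes a chat_messages table with a jsonb "metadata" column.
import { createClient } from "npm:@supabase/supabase-js@2";

Deno.serve(async (req) => {
  const sessionId = new URL(req.url).searchParams.get("sessionId");
  if (!sessionId) {
    return new Response(JSON.stringify({ error: "sessionId is required" }), { status: 400 });
  }

  const supabase = createClient(
    Deno.env.get("SUPABASE_URL")!,
    Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
  );

  const { data, error } = await supabase
    .from("chat_messages")                      // assumed table name
    .select("id, role, content, created_at, metadata")
    .eq("session_id", sessionId)
    .order("created_at", { ascending: true });

  if (error) {
    return new Response(JSON.stringify({ error: error.message }), { status: 500 });
  }

  const messages = (data ?? []).map((row) => ({
    id: row.id,
    role: row.role,
    content: row.content,
    timestamp: row.created_at,
    metadata: row.metadata ?? null, // legacy rows without metadata stay null
  }));

  return new Response(JSON.stringify({ sessionId, messages }), {
    headers: { "Content-Type": "application/json" },
  });
});
```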
Why We Need This
1. Copy Any Conversation (not just the latest)
Users want to copy older Q&A exchanges with full source attribution and confidence scores - not just the most recent one.
2. Session Recall
When users select a previous session from the history dropdown, we want to restore the full conversation with all metadata, so they can continue where they left off.
3. Content Provenance
For audit and trust purposes, users need to trace any answer back to its original sources, including Dublin Core metadata and contributor attribution.
How Frontend Will Use This
- On session switch (`session-history-btn` dropdown selection):
  - Call the history endpoint
  - Render the messages in the chat UI
  - Cache metadata client-side for copy functionality
- On copy button click:
  - Look up the metadata in the client-side cache
  - Generate markdown with the sources chart, confidence, etc.
- Future: "Copy entire conversation" button:
  - Iterate over all messages and generate combined markdown (see the sketch after this list)
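A minimal client-side sketch of this flow, assuming the history endpoint above and a `generateCopyMarkdown` helper; the helper and module paths are hypothetical and only stand in for the existing copy logic.

```typescript
// Hypothetical frontend flow: fetch history on session switch, cache
// metadata per message id, and reuse it when the copy button is clicked.
// Types are those defined in the "TypeScript Types for Frontend" section below.
import type { ChatHistoryResponse, AssistantMetadata } from "./types";

const metadataCache = new Map<string, AssistantMetadata>();

async function onSessionSwitch(sessionId: string): Promise<ChatHistoryResponse> {
  const res = await fetch(`/api/chat/history?sessionId=${encodeURIComponent(sessionId)}`);
  if (!res.ok) throw new Error(`History fetch failed: ${res.status}`);
  const history: ChatHistoryResponse = await res.json();

  for (const msg of history.messages) {
    if (msg.role === "assistant" && msg.metadata) {
      metadataCache.set(msg.id, msg.metadata); // cache for later copy actions
    }
  }
  return history; // caller renders the messages in the chat UI
}

function onCopyClick(messageId: string, content: string): string {
  const metadata = metadataCache.get(messageId) ?? null;
  // Fall back to plain text when no metadata exists (legacy messages).
  return metadata ? generateCopyMarkdown(content, metadata) : content;
}

// Placeholder for the existing markdown generator (sources chart, confidence, etc.).
declare function generateCopyMarkdown(content: string, metadata: AssistantMetadata): string;
```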
Backwards Compatibility
- Existing messages without metadata should still load (return `null` for metadata)
- Frontend will handle missing metadata gracefully
Questions
- What endpoint structure works best for your architecture?
- Any concerns about storage size with full metadata per message?
- Should metadata be filtered based on user role (e.g., hide `contributor_id` from non-admins)?
Acceptance Criteria
- [ ] Assistant responses are saved with associated metadata
- [ ] History endpoint returns messages with metadata for a given session
- [ ] Existing messages (without metadata) continue to work
- [ ] Response time for a history fetch is acceptable (< 500 ms for a typical session)
Related Documentation
- SSE_EVENTS_REFERENCE.md - Current SSE event structure
- clearChatHistory/README.md - Chat history management
- How-RAG-Works.md - RAG system overview
Implementation Files
Primary files for backend implementation:
| File | Purpose |
|---|---|
| supabase/functions/rag/index.ts | Main RAG function - where metadata is generated and streamed via SSE |
| supabase/functions/rag/README.md | RAG function documentation |
Key implementation points in rag/index.ts:
- Metadata is already generated during query processing (model_info, confidence, sources, transaction, timing)
- Currently streamed via SSE but not persisted
- Chat history is saved via the Supabase chat history store (see `@stores/` imports)
- Backend needs to: (1) collect metadata during streaming, (2) persist it alongside the assistant message (a sketch follows below)
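A sketch of that persistence step, assuming the same `chat_messages` table and jsonb `metadata` column as in the history-endpoint sketch above. Again, these names describe an assumed schema for illustration, not the current code.

```typescript
// Hypothetical persistence step inside rag/index.ts: after streaming finishes,
// attach the collected metadata to the assistant message row.
import { createClient } from "npm:@supabase/supabase-js@2";

async function saveAssistantMessage(
  sessionId: string,
  content: string,
  metadata: unknown, // the object built up from SSE events during streaming
): Promise<void> {
  const supabase = createClient(
    Deno.env.get("SUPABASE_URL")!,
    Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
  );

  const { error } = await supabase
    .from("chat_messages") // assumed table name
    .insert({
      session_id: sessionId,
      role: "assistant",
      content,
      metadata, // stored as jsonb; null stays valid for legacy writes
    });

  if (error) {
    // Persisting metadata should not break the chat flow; log and continue.
    console.error("Failed to save assistant message metadata:", error.message);
  }
}
```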
Mock Data for Frontend Development
The following mock data matches the actual SSE event structures documented in SSE_EVENTS_REFERENCE.md. Frontend Claude can use these to implement and test the UI before the backend endpoint is ready.
Mock Session History Response
```json
{
"sessionId": "session-abc123",
"messages": [
{
"id": "msg-001",
"role": "user",
"content": "What is Smart Data and how does it relate to content provenance?",
"timestamp": "2026-01-16T10:00:00.000Z",
"metadata": null
},
{
"id": "msg-002",
"role": "assistant",
"content": "Smart Data is an approach to data management that embeds intelligence and context directly into data assets. It goes beyond traditional data storage by incorporating metadata, relationships, and provenance information that travels with the data.\n\nIn terms of content provenance, Smart Data enables:\n\n1. **Origin Tracking**: Every piece of content maintains a record of where it came from, who created it, and when\n2. **Transformation History**: As data is processed or modified, the changes are logged\n3. **Attribution Chain**: Contributors are credited for their intellectual contributions\n4. **Rights Management**: Usage permissions and licensing information are preserved\n\nThis is particularly valuable in AI/RAG systems where answers are synthesized from multiple sources - Smart Data ensures proper attribution and enables fair compensation to content creators.",
"timestamp": "2026-01-16T10:00:08.500Z",
"metadata": {
"model": {
"provider": "openai",
"model": "gpt-4o-mini",
"mode": "platform"
},
"confidence": {
"confidence": 85,
"sourcesContributed": true,
"reasoning": "The answer directly incorporates specific concepts and terminology from the retrieved documents about Smart Data architecture and provenance tracking. Multiple sources contributed complementary information.",
"sourcesFound": 8,
"llmAssessmentConfidence": 92
},
"sources": {
"totalSources": 5,
"usedAttention": true,
"sources": [
{
"source_url": "https://docs.rosie.ai/concepts/smart-data",
"source_title": "Smart Data Architecture Overview",
"contributor_id": "user-uuid-001",
"contributor_name": "Alice Chen",
"portion": 0.35,
"roc_earned": 0.0049,
"chunks_used": 4,
"dublin_core": {
"dc_title": "Smart Data Architecture Overview",
"dc_creator": "Alice Chen",
"dc_publisher": "Rosie AI Documentation",
"dc_date": "2025-11-15",
"dc_rights": "CC BY-SA 4.0",
"dc_description": "Comprehensive guide to Smart Data concepts and implementation patterns",
"dc_source": null,
"dc_identifier": "rosie-doc-smart-data-001"
}
},
{
"source_url": "https://docs.rosie.ai/guides/content-provenance",
"source_title": "Content Provenance in RAG Systems",
"contributor_id": "user-uuid-002",
"contributor_name": "Bob Martinez",
"portion": 0.28,
"roc_earned": 0.0039,
"chunks_used": 3,
"dublin_core": {
"dc_title": "Content Provenance in RAG Systems",
"dc_creator": "Bob Martinez",
"dc_publisher": "Rosie AI Documentation",
"dc_date": "2025-12-01",
"dc_rights": "CC BY-SA 4.0",
"dc_description": "How to track and attribute content sources in retrieval-augmented generation",
"dc_source": null,
"dc_identifier": "rosie-doc-provenance-001"
}
},
{
"source_url": "https://example.org/papers/data-attribution.pdf",
"source_title": "Fair Attribution in AI Systems",
"contributor_id": "user-uuid-003",
"contributor_name": "Dr. Carol Singh",
"portion": 0.22,
"roc_earned": 0.0031,
"chunks_used": 2,
"dublin_core": {
"dc_title": "Fair Attribution in AI Systems",
"dc_creator": "Dr. Carol Singh",
"dc_publisher": "Journal of AI Ethics",
"dc_date": "2025-09-22",
"dc_rights": "All Rights Reserved",
"dc_description": "Academic paper on attribution mechanisms for AI-generated content",
"dc_source": "https://doi.org/10.1234/jaie.2025.attribution",
"dc_identifier": "DOI:10.1234/jaie.2025.attribution"
}
},
{
"source_url": "https://internal.company.com/wiki/metadata-standards",
"source_title": "Internal Metadata Standards Guide",
"contributor_id": "user-uuid-001",
"contributor_name": "Alice Chen",
"portion": 0.1,
"roc_earned": 0.0014,
"chunks_used": 1,
"dublin_core": {
"dc_title": "Internal Metadata Standards Guide",
"dc_creator": "Alice Chen",
"dc_publisher": "Company Wiki",
"dc_date": "2025-10-05",
"dc_rights": "Confidential - Internal Use Only",
"dc_description": null,
"dc_source": null,
"dc_identifier": null
}
},
{
"source_url": "https://blog.rosie.ai/smart-data-explained",
"source_title": "Smart Data Explained: A Beginner's Guide",
"contributor_id": "user-uuid-004",
"contributor_name": "Diana Lee",
"portion": 0.05,
"roc_earned": 0.0007,
"chunks_used": 1,
"dublin_core": null
}
]
},
"transaction": {
"platform_fee": 0.0236,
"roc_credits_distributed": 0.014,
"usage_duration_seconds": 8.5,
"transaction_type": "query_usage",
"transaction_id": "txn-uuid-001",
"balance_before": 99.9764,
"balance_after": 99.9528,
"sources": [
{
"source_url": "https://docs.rosie.ai/concepts/smart-data",
"source_title": "Smart Data Architecture Overview",
"contributor_id": "user-uuid-001",
"contributor_name": "Alice Chen",
"portion": 0.35,
"roc_earned": 0.0049,
"chunks_used": 4
},
{
"source_url": "https://docs.rosie.ai/guides/content-provenance",
"source_title": "Content Provenance in RAG Systems",
"contributor_id": "user-uuid-002",
"contributor_name": "Bob Martinez",
"portion": 0.28,
"roc_earned": 0.0039,
"chunks_used": 3
},
{
"source_url": "https://example.org/papers/data-attribution.pdf",
"source_title": "Fair Attribution in AI Systems",
"contributor_id": "user-uuid-003",
"contributor_name": "Dr. Carol Singh",
"portion": 0.22,
"roc_earned": 0.0031,
"chunks_used": 2
},
{
"source_url": "https://internal.company.com/wiki/metadata-standards",
"source_title": "Internal Metadata Standards Guide",
"contributor_id": "user-uuid-001",
"contributor_name": "Alice Chen",
"portion": 0.1,
"roc_earned": 0.0014,
"chunks_used": 1
},
{
"source_url": "https://blog.rosie.ai/smart-data-explained",
"source_title": "Smart Data Explained: A Beginner's Guide",
"contributor_id": "user-uuid-004",
"contributor_name": "Diana Lee",
"portion": 0.05,
"roc_earned": 0.0007,
"chunks_used": 1
}
]
},
"timing": {
"ttf_ms": 1704,
"duration_ms": 8500
}
}
},
{
"id": "msg-003",
"role": "user",
"content": "How is contributor compensation calculated?",
"timestamp": "2026-01-16T10:02:30.000Z",
"metadata": null
},
{
"id": "msg-004",
"role": "assistant",
"content": "Contributor compensation in the Rosie platform is calculated using an attention-weighted attribution model. Here's how it works:\n\n**1. Content Retrieval**\nWhen a query is processed, the RAG system retrieves relevant document chunks from multiple contributors.\n\n**2. Attention Weight Calculation**\nThe system calculates how much each source contributed to the final answer using attention weights. Sources that provided more relevant information receive higher weights.\n\n**3. RoC Distribution Formula**\n```\nRoC earned = (attention_portion) × (total_roc_pool) × (1 - platform_fee_percentage)\n```\n\n**4. Example Breakdown**\n- Query cost: $0.024\n- Platform fee (40%): $0.010\n- RoC pool for contributors: $0.014\n- Top contributor (35% attention): $0.0049\n- Second contributor (28% attention): $0.0039\n\n**5. Self-Use Exemption**\nWhen users query their own contributed content, no RoC is distributed for those chunks (marked as `roc_earned: null`).",
"timestamp": "2026-01-16T10:02:42.200Z",
"metadata": {
"model": {
"provider": "openai",
"model": "gpt-4o-mini",
"mode": "platform"
},
"confidence": {
"confidence": 92,
"sourcesContributed": true,
"reasoning": "The answer directly uses specific formulas, percentages, and examples from the retrieved documentation about the RoC compensation model. High confidence due to exact numerical matches with source material.",
"sourcesFound": 4,
"llmAssessmentConfidence": 95
},
"sources": {
"totalSources": 3,
"usedAttention": true,
"sources": [
{
"source_url": "https://docs.rosie.ai/economics/roc-model",
"source_title": "RoC Compensation Model",
"contributor_id": "user-uuid-005",
"contributor_name": "Eric Johnson",
"portion": 0.55,
"roc_earned": 0.0061,
"chunks_used": 5,
"dublin_core": {
"dc_title": "RoC Compensation Model",
"dc_creator": "Eric Johnson",
"dc_publisher": "Rosie AI Documentation",
"dc_date": "2025-12-10",
"dc_rights": "CC BY-SA 4.0",
"dc_description": "Technical specification of the Return on Content (RoC) compensation system",
"dc_source": null,
"dc_identifier": "rosie-doc-roc-001"
}
},
{
"source_url": "https://docs.rosie.ai/concepts/attention-weighting",
"source_title": "Attention-Based Content Attribution",
"contributor_id": "user-uuid-002",
"contributor_name": "Bob Martinez",
"portion": 0.3,
"roc_earned": 0.0033,
"chunks_used": 3,
"dublin_core": {
"dc_title": "Attention-Based Content Attribution",
"dc_creator": "Bob Martinez",
"dc_publisher": "Rosie AI Documentation",
"dc_date": "2025-11-28",
"dc_rights": "CC BY-SA 4.0",
"dc_description": "How attention weights determine content contribution percentages",
"dc_source": null,
"dc_identifier": "rosie-doc-attention-001"
}
},
{
"source_url": "https://docs.rosie.ai/faq/compensation",
"source_title": "Compensation FAQ",
"contributor_id": "user-uuid-006",
"contributor_name": "Fiona Wright",
"portion": 0.15,
"roc_earned": 0.0017,
"chunks_used": 1,
"dublin_core": {
"dc_title": "Compensation FAQ",
"dc_creator": "Fiona Wright",
"dc_publisher": "Rosie AI Documentation",
"dc_date": "2026-01-05",
"dc_rights": "CC BY-SA 4.0",
"dc_description": "Frequently asked questions about contributor payments",
"dc_source": null,
"dc_identifier": "rosie-doc-faq-comp-001"
}
}
]
},
"transaction": {
"platform_fee": 0.0185,
"roc_credits_distributed": 0.0111,
"usage_duration_seconds": 12.2,
"transaction_type": "query_usage",
"transaction_id": "txn-uuid-002",
"balance_before": 99.9528,
"balance_after": 99.9343,
"sources": [
{
"source_url": "https://docs.rosie.ai/economics/roc-model",
"source_title": "RoC Compensation Model",
"contributor_id": "user-uuid-005",
"contributor_name": "Eric Johnson",
"portion": 0.55,
"roc_earned": 0.0061,
"chunks_used": 5
},
{
"source_url": "https://docs.rosie.ai/concepts/attention-weighting",
"source_title": "Attention-Based Content Attribution",
"contributor_id": "user-uuid-002",
"contributor_name": "Bob Martinez",
"portion": 0.3,
"roc_earned": 0.0033,
"chunks_used": 3
},
{
"source_url": "https://docs.rosie.ai/faq/compensation",
"source_title": "Compensation FAQ",
"contributor_id": "user-uuid-006",
"contributor_name": "Fiona Wright",
"portion": 0.15,
"roc_earned": 0.0017,
"chunks_used": 1
}
]
},
"timing": {
"ttf_ms": 1520,
"duration_ms": 12200
}
}
},
{
"id": "msg-005",
"role": "user",
"content": "What's the weather like today?",
"timestamp": "2026-01-16T10:05:00.000Z",
"metadata": null
},
{
"id": "msg-006",
"role": "assistant",
"content": "I don't have access to real-time weather data or your location. The Rosie knowledge base focuses on documentation, technical content, and organizational knowledge rather than live data feeds.\n\nFor current weather information, I'd recommend checking a weather service like Weather.com, your phone's weather app, or asking a general-purpose AI assistant that has web access.",
"timestamp": "2026-01-16T10:05:04.800Z",
"metadata": {
"model": {
"provider": "openai",
"model": "gpt-4o-mini",
"mode": "platform"
},
"confidence": {
"confidence": 35,
"sourcesContributed": false,
"reasoning": "No relevant sources were found in the knowledge base for weather information. The answer is based entirely on general knowledge about the system's limitations.",
"sourcesFound": 0,
"llmAssessmentConfidence": 98
},
"sources": {
"totalSources": 0,
"usedAttention": false,
"sources": []
},
"transaction": {
"platform_fee": 0.008,
"roc_credits_distributed": 0.0,
"usage_duration_seconds": 4.8,
"transaction_type": "query_usage",
"transaction_id": "txn-uuid-003",
"balance_before": 99.9343,
"balance_after": 99.9263,
"sources": []
},
"timing": {
"ttf_ms": 980,
"duration_ms": 4800
}
}
}
]
}
```
Mock Data for Legacy Messages (No Metadata)
This shows how older messages (before metadata persistence) should appear:
```json
{
"sessionId": "session-legacy-001",
"messages": [
{
"id": "msg-legacy-001",
"role": "user",
"content": "Tell me about the platform architecture",
"timestamp": "2025-12-15T14:30:00.000Z",
"metadata": null
},
{
"id": "msg-legacy-002",
"role": "assistant",
"content": "The platform uses a microservices architecture with Supabase Edge Functions...",
"timestamp": "2025-12-15T14:30:12.000Z",
"metadata": null
}
]
}
```
TypeScript Types for Frontend
```typescript
// Types matching the API response structure
interface ChatHistoryResponse {
sessionId: string;
messages: ChatMessage[];
}
interface ChatMessage {
id: string;
role: "user" | "assistant";
content: string;
timestamp: string; // ISO 8601 format
metadata: AssistantMetadata | null;
}
interface AssistantMetadata {
model: ModelInfo;
confidence: ConfidenceInfo;
sources: SourcesInfo;
transaction: TransactionInfo;
timing: TimingInfo;
}
interface ModelInfo {
provider: "openai" | "anthropic";
model: string;
mode: "platform" | "byok";
}
interface ConfidenceInfo {
confidence: number; // 10-95
sourcesContributed: boolean;
reasoning: string;
sourcesFound: number;
llmAssessmentConfidence: number;
}
interface SourcesInfo {
totalSources: number;
usedAttention: boolean;
sources: SourceDetail[];
}
interface SourceDetail {
source_url: string;
source_title: string | null;
contributor_id: string;
contributor_name: string | null;
portion: number; // 0-1
roc_earned: number | null; // null if self-use
chunks_used: number;
dublin_core: DublinCoreMetadata | null;
}
interface DublinCoreMetadata {
dc_title: string | null;
dc_creator: string | null;
dc_publisher: string | null;
dc_date: string | null;
dc_rights: string | null;
dc_description: string | null;
dc_source: string | null;
dc_identifier: string | null;
}
interface TransactionInfo {
platform_fee: number;
roc_credits_distributed: number;
usage_duration_seconds: number;
transaction_type: string;
transaction_id: string;
balance_before: number;
balance_after: number;
sources: TransactionSource[];
}
interface TransactionSource {
source_url: string;
source_title: string | null;
contributor_id: string;
contributor_name: string | null;
portion: number;
roc_earned: number | null;
chunks_used: number;
}
interface TimingInfo {
ttf_ms: number; // Time to first response
duration_ms: number; // Total duration
}
```
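As one possible use of these types, here is a hypothetical helper that turns `AssistantMetadata` into copy-ready markdown, including a Mermaid pie chart for source attribution. The exact copy format is up to the frontend; nothing here is prescribed by the backend.

```typescript
// Hypothetical copy formatter built on the types above.
function generateCopyMarkdown(content: string, meta: AssistantMetadata): string {
  const lines: string[] = [content, ""];

  lines.push(`**Model:** ${meta.model.provider}/${meta.model.model} (${meta.model.mode})`);
  lines.push(`**Confidence:** ${meta.confidence.confidence}% (${meta.confidence.reasoning})`);
  lines.push(`**Timing:** first token ${meta.timing.ttf_ms} ms, total ${meta.timing.duration_ms} ms`);

  if (meta.sources.sources.length > 0) {
    lines.push("", "**Source attribution:**", "", "```mermaid", "pie title Source contribution");
    for (const s of meta.sources.sources) {
      const title = s.source_title ?? s.source_url;
      lines.push(`  "${title}" : ${(s.portion * 100).toFixed(1)}`);
    }
    lines.push("```");
  }

  return lines.join("\n");
}
```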
Edge Cases to Handle
| Scenario | Expected Behavior |
|---|---|
| Legacy message (no metadata) | `metadata: null` |
| No sources found | `sources.sources: []`, `sourcesContributed: false` |
| Self-use (user queries own content) | `roc_earned: null` for those sources |
| Missing Dublin Core | `dublin_core: null` for that source |
| BYOK mode | `model.mode: "byok"` with the user's chosen model |
| Empty session | `messages: []` |
| Invalid session ID | HTTP 404 with error message |
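A brief, hypothetical illustration of how the frontend might guard against these cases when rendering or copying a message (it assumes the types above; the function name is made up for this sketch):

```typescript
// Hypothetical guards mapping the edge cases above to frontend behavior.
function describeAttribution(msg: ChatMessage): string {
  if (!msg.metadata) return "No metadata available (legacy message).";

  const { sources, confidence } = msg.metadata;
  if (sources.sources.length === 0 || !confidence.sourcesContributed) {
    return "No knowledge-base sources contributed to this answer.";
  }

  return sources.sources
    .map((s) => {
      const credit = s.roc_earned === null ? "self-use, no RoC" : `${s.roc_earned} RoC`;
      const title = s.source_title ?? s.source_url;
      return `${title}: ${(s.portion * 100).toFixed(0)}% (${credit})`;
    })
    .join("\n");
}
```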
Last Updated: 2026-01-16
Maintained By: Frontend/Backend Claude collaboration