Work In Progress

Modified: 2025-Dec-03 09:08:52 UTC
Lucid Rosie AI: Solution

To Do List

  1. Metadata Filters Bug

    1. netlify/edge-functions/utils/helpers.generateJsonFilter does not work for multiple orgIds; it only applies the last one loaded
    2. Modify /home/steven/github/collabventures/llm-meeting-minutes/netlify/functions/utils/cli/loadDocsToVectorStore.mjs to accept a parameter for orgId and dynamically import the correct file
    3. There should be one loadDocSources file for each OrgId
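
The three sub-items above could be sketched roughly as follows. This is a minimal illustration, not the project's code: the file naming scheme `loadDocSources.<orgId>.mjs`, the helper names, and the filter shape are all assumptions. The `$in` and `$eq` operators follow Pinecone's metadata filter syntax; other vector stores use different operator names.

```javascript
// Hypothetical sketch: names, paths, and the filter shape are assumptions.

// Only allow simple identifiers so an orgId can never escape the directory.
const ORG_ID_PATTERN = /^[A-Za-z0-9_-]+$/;

// Map an orgId to its own doc-sources module, e.g. "./loadDocSources.gswg.mjs".
function docSourcesPathFor(orgId) {
  if (typeof orgId !== "string" || !ORG_ID_PATTERN.test(orgId)) {
    throw new Error(`Invalid orgId: ${String(orgId)}`);
  }
  return `./loadDocSources.${orgId}.mjs`;
}

// Dynamically import the per-org module and return its default export.
// Not exercised in this sketch because the per-org files are hypothetical.
async function loadDocSources(orgId) {
  const mod = await import(docSourcesPathFor(orgId));
  return mod.default;
}

// Build one metadata filter that covers every requested orgId,
// instead of keeping only the last one.
function generateOrgFilter(orgIds) {
  const unique = [...new Set(orgIds)];
  if (unique.length === 0) return {};
  if (unique.length === 1) return { orgId: { $eq: unique[0] } };
  return { orgId: { $in: unique } };
}
```

In a CLI such as loadDocsToVectorStore.mjs, the orgId could be read from `process.argv[2]` and passed to `loadDocSources`. Validating the value before building the import path keeps arbitrary input from selecting an unintended file.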

  2. loadSource: "title": "2024-01-11 GSWG undefined Transcripts", replace undefined with ""
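
One way to avoid the literal "undefined" in the title is to build it from parts and skip any that are missing. The helper below is a hypothetical sketch; the real loadSource code may assemble the title differently. Note that it drops missing parts entirely rather than substituting "", which also avoids a double space.

```javascript
// Hypothetical helper: joins title parts, skipping undefined, null, or blank ones.
function buildTitle(...parts) {
  return parts
    .filter((part) => part !== undefined && part !== null)
    .map((part) => String(part).trim())
    .filter((part) => part !== "")
    .join(" ");
}
```

For example, `buildTitle("2024-01-11", "GSWG", undefined, "Transcripts")` yields "2024-01-11 GSWG Transcripts".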

  3. Streaming not working

  4. https://js.langchain.com/docs/modules/model_io/llms/subscribing_events is supposed to return the tokens used, which could feed into the business model, but it's not working. Tested with retrievalAugmentedGeneration & _xata

  5. Create Xata table for a status line to be retrieved by chat_script.js and displayed in the chat console so users know if there's a problem with Langchain, Supabase, or Xata.
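
A rough sketch of how chat_script.js might turn status rows into a single line for the chat console. The row shape (`service`, `ok`, `message`) is an assumption for illustration only. No Xata table schema has been defined yet, and fetching the rows is left out.

```javascript
// Hypothetical row shape: { service: string, ok: boolean, message?: string }
function formatStatusLine(rows) {
  const problems = rows.filter((row) => !row.ok);
  if (problems.length === 0) return "All services operational";
  return problems
    .map((row) => `${row.service}: ${row.message || "unavailable"}`)
    .join(" | ");
}
```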

  6. Semantic Corrections

    1. Potentially use GitHub Actions Suggest Changes "Using AI and LLMs in docs-as-code pipelines" 2024-01-24 Zoom call
    2. Load: Summarization
      1. Create a load contents page
      2. Metadata form or load from URL (meeting page agenda), document type like transcripts, meeting page, document, etc.
      3. Upload files
      4. Load from URL, check for YouTube
      5. Clear vector store
    3. Cleanse: Plan & Execute Chain
      1. Create a cleanse document page describing the process
      2. Create chain (Conversation Retrieval Agent with Tools) to:
        1. Load vector store
        2. Use Chat History
        3. Create a table with rows number, incorrect text, semantic correction
        4. Prompt User to select all, none, or individual rows to correct
        5. Save corrected text in markdown format with appropriate tags and metadata
        6. Load new markdown file into vector store
        7. Summarize document
        8. Save summary in markdown format with appropriate tags and metadata
        9. Load new markdown file into vector store
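
Steps 3 to 5 of the chain (table of corrections, user selection, corrected text) could be sketched as below. This is only an illustration of the selection logic, not the agent itself. The row fields (`number`, `incorrect`, `correction`) mirror the table columns named above, and the function names are hypothetical.

```javascript
// rows: [{ number, incorrect, correction }]
// selection: "all", "none", or an array of row numbers chosen by the user
function selectRows(rows, selection) {
  if (selection === "all") return rows;
  if (selection === "none") return [];
  const wanted = new Set(selection);
  return rows.filter((row) => wanted.has(row.number));
}

// Replace every occurrence of each selected row's incorrect text.
function applyCorrections(text, rows, selection) {
  return selectRows(rows, selection).reduce(
    (result, row) => result.split(row.incorrect).join(row.correction),
    text
  );
}
```

Using `split` and `join` replaces the text literally, so punctuation in a transcript is never interpreted as a regular expression.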
  7. Keys

    1. Use Chat Messages div to display errors or instructions if no keys provided
    2. BYOK for OpenAI
    3. BYOK for vector store? (teams for shared vector store?)
    4. Use hashed key as prefix for SessionId
    5. Save Settings to Local Storage (not secure)
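
A minimal sketch of item 4, assuming a Node.js runtime and SHA-256 as the hash. The function names and the 16-character prefix length are assumptions. In the project's .mjs files the equivalent would be `import { createHash, randomUUID } from "node:crypto"`.

```javascript
const { createHash, randomUUID } = require("node:crypto");

// Derive a short, stable prefix from the user's key without storing the key itself.
function sessionPrefix(apiKey) {
  return createHash("sha256").update(apiKey, "utf8").digest("hex").slice(0, 16);
}

// Combine the prefix with a random UUID so each session stays unique per key.
function newSessionId(apiKey) {
  return `${sessionPrefix(apiKey)}-${randomUUID()}`;
}
```

The same key always yields the same prefix, so chat history for one key can be grouped by prefix while the raw key never appears in the SessionId.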
  8. Rosie Docs

    1. Change chat.njk to chatContainer to markdown
    2. Create tool to save output in markdown and push to Rosie-Core
    3. Create tool to translate to another language like Esperanto, Chinese, Japanese, Spanish, French
  9. Misc

    1. Write Pricing Guide
    2. Create Rosie Help Bot
    3. Usage by organization member. See community thread

Done List

  1. Redirect handled errors to console UI
  2. Implement GPT-4.0 variable
  3. Query: Conversational. See also langchain_playground: tools_AgentWebBrowserChat
    1. Add Filter UI by metadata
  4. Implement Messages Roles
  5. Chat History to vector store Xata
  6. Load text documents
  7. Load vtt text transcripts and convert them to json
  8. Add text to vector store
  9. Q&A from vector stores HNSWLib (local), Pinecone, Xata
  10. Ingest YouTube transcripts
  11. Implement LangChain meeting minutes
  12. BUG: New AI role repeats last User message
  13. Change Inputs to Datalists. Using json for now.
  14. Implement LangChain

Notes

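# Assumes `db` is an already-populated LangChain vector store.
from langchain.chains import VectorDBQA
from langchain.chat_models import ChatOpenAI

# "stuff" places the k retrieved documents directly into the prompt.
qa = VectorDBQA.from_chain_type(llm=ChatOpenAI(), chain_type="stuff", vectorstore=db, k=1)
query = "What is the document about?"
print(qa.run(query))

Legend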

To-do WIP Done Important

Napkin Business Canvas

source

Rosie-AI

Generating reputation.

is an AI Assistant
for organizations
fearful of AI-generated content that may cause reputational harm.
Unlike the commonly used ChatGPT on the web, which cannot access current information, or add-ons that create clueless meeting minutes, drafts, or summaries,
Rosie-AI trains itself with industry- and organization-specific resources. It then assists you in creating authentic, context-driven content, thereby continuously training itself with complete transparency and attribution of sources.
This is worth the price of an entry-level employee for our customers,
which we find through channels like Trust Over IP.

Parking Lot

  1. Working on authentic content
  2. Of the belief that more knowledge is shared in meetings than is captured and capitalized on.
  3. Langchain Syncing data sources to vector stores
    1. Test deleting docs from Xata
    2. Need a vector store that meets Indexing API requirements
    3. Xata should work
    4. Pinecone should work
    5. 2023-09-13: Chroma will be offering hosted vector stores for serverless apps
    6. 2023-09-21: Indexing API is not available in JavaScript yet.
    7. See Pinecone doc to query by metadata - like source and then delete all docs before adding new ones
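
The "delete all docs for a source, then add the new ones" idea from item 7 can be sketched against a small in-memory stand-in. The `InMemoryStore` class and its methods are hypothetical and exist only to make the example self-contained. Real stores such as Pinecone or Xata expose their own delete and filter APIs, which would replace `deleteWhere` here.

```javascript
// Minimal in-memory stand-in for a vector store (hypothetical interface).
class InMemoryStore {
  constructor() {
    this.docs = [];
  }
  async deleteWhere(predicate) {
    this.docs = this.docs.filter((doc) => !predicate(doc));
  }
  async addDocuments(docs) {
    this.docs.push(...docs);
  }
}

// Remove every doc previously loaded from `source`, then add the fresh ones,
// stamping each with the source so the next sync can find them again.
async function replaceDocsForSource(store, source, newDocs) {
  await store.deleteWhere((doc) => doc.metadata.source === source);
  await store.addDocuments(
    newDocs.map((doc) => ({ ...doc, metadata: { ...doc.metadata, source } }))
  );
}
```

Deleting before adding keeps re-runs of the loader from accumulating duplicate chunks for the same source, which is the gap the Indexing API would otherwise cover.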
  4. Finding information in long documents with AI using vector databases and MapReduceChain from Langchain
  5. Create Summarization Provider option
    1. Add Model Version to Chat Settings to allow for 16k
    2. Added env RETRIEVER_K
    3. Added RETRIEVER_K slider for Chat Settings
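
A small sketch of how the RETRIEVER_K setting might be read safely. The fallback of 4 and the 1 to 20 range are assumptions for illustration; the actual defaults and slider bounds are not recorded in these notes.

```javascript
// Parse RETRIEVER_K from an env-like object, falling back and clamping as needed.
function retrieverK(env, { fallback = 4, min = 1, max = 20 } = {}) {
  const parsed = Number.parseInt(env.RETRIEVER_K ?? "", 10);
  if (Number.isNaN(parsed)) return fallback;
  return Math.min(max, Math.max(min, parsed));
}
```

Calling `retrieverK(process.env)` on the server and applying the same bounds to the Chat Settings slider would keep the two in agreement.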
  6. Are AssemblyAI transcripts better than Zoom's? Can "Plan & Execute" map Speaker A to a Zoom participant? If so, then use AssemblyAI transcripts.
  7. AssemblyAI: Live transcript and adding Speaker A's name in real time. See also https://picovoice.ai/docs/quick-start/eagle-web/
  8. Integrate Audio into LangChain.js apps in 5 Minutes - YouTube
    1. https://www.assemblyai.com/docs/Models/speech_recognition#custom-vocabulary
      • Not good enough?
    2. Create /.netlify/functions/webhook.mjs?testing=123 to call background function and avoid timeout
    3. Implement AssemblyAI webhook.
      • Not possible with AssemblyAI Loader

Value Proposition

Jobs to be Done

Pains

Gains