Work In Progress

https://lucid.app/lucidchart/7f175f7b-3168-42a1-bace-06e6d158b6cc/edit?invitationId=inv_4b7cdc80-33bf-4e61-837f-aa3d5ed5a326&page=QyLL-WN1ffe1# — Lucid Rosie AI: Solution

To Do List

Metadata Filters Bug
1. netlify/edge-functions/utils/helpers.generateJsonFilter does not work for multiple orgIds; it only reflects the last orgId loaded
2. Modify /home/steven/github/collabventures/llm-meeting-minutes/netlify/functions/utils/cli/loadDocsToVectorStore.mjs to accept a parameter for orgId and dynamically import the correct file
3. There should be one loadDocSources file for each orgId
loadSource: "title": "2024-01-11 GSWG undefined Transcripts", replace undefined with ""
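A minimal sketch of one way to address the items above, written in Python to match the snippet under Notes (the project itself is JavaScript). Every name here is hypothetical, and the real generateJsonFilter may be shaped quite differently. The idea is to key loaded doc sources by orgId so a later load cannot overwrite an earlier one, and to drop missing title parts instead of printing "undefined".

```python
# Hypothetical sketch: none of these names come from the project's code.
_DOC_SOURCES_BY_ORG = {}  # orgId -> list of doc-source dicts


def register_doc_sources(org_id, sources):
    """Store sources under their own orgId so a later load cannot overwrite them."""
    _DOC_SOURCES_BY_ORG[org_id] = list(sources)


def build_title(date, group, kind, suffix="Transcripts"):
    """Join the title parts, skipping any that are missing or empty."""
    parts = [date, group, kind, suffix]
    return " ".join(p for p in parts if p)


def generate_json_filter(org_id):
    """Return the filter options for one orgId only."""
    sources = _DOC_SOURCES_BY_ORG.get(org_id, [])
    return {
        "orgId": org_id,
        "titles": sorted({s["title"] for s in sources}),
    }
```

Skipping the missing part, rather than substituting an empty string, avoids leaving a double space in the title.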
Streaming not working
https://js.langchain.com/docs/modules/model_io/llms/subscribing_events is supposed to return the tokens used, which could feed into the business model, but it's not working. Tested with retrievalAugmentedGeneration & _xata
Create a Xata table for a status line that chat_script.js retrieves and displays in the chat console, so users know if there's a problem with LangChain, Supabase, or Xata.
Semantic Corrections
1. Potentially use GitHub Actions Suggest Changes (see "Using AI and LLMs in docs-as-code pipelines", 2024-01-24 Zoom call)
2. Load: Summarization
  1. Create a load contents page
  2. Metadata form or load from URL (meeting page agenda), document type like transcripts, meeting page, document, etc.
  3. Upload files
  4. Load from URL, check for YouTube
  5. Clear vector store
3. Cleanse: Plan & Execute Chain
  1. Create a cleanse document page describing the process
  2. Create chain (Conversation Retrieval Agent with Tools) to:
    1. Load vector store
    2. Use Chat History
    3. Create a table with rows number, incorrect text, semantic correction
    4. Prompt User to select all, none, or individual rows to correct
    5. Save corrected text in markdown format with appropriate tags and metadata
    6. Load new markdown file into vector store
    7. Summarize document
    8. Save summary in markdown format with appropriate tags and metadata
    9. Load new markdown file into vector store
Keys
1. Use Chat Messages div to display errors or instructions if no keys provided
2. BYOK for OpenAI
3. BYOK for vector store? (teams for shared vector store?)
4. Use hashed key as prefix for SessionId
5. Settings saved to Local Storage are not stored securely
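One possible reading of item 4, sketched in Python to match the snippet under Notes: hash the user's key and use a short slice of the digest as the SessionId prefix, so sessions from the same key group together without the key appearing in the ID. The function name, the SHA-256 choice, and the 12-character prefix length are all assumptions.

```python
import hashlib
import uuid


def session_id_for(api_key, prefix_len=12):
    """Build '<hash prefix>-<random id>' without exposing the key itself."""
    digest = hashlib.sha256(api_key.encode("utf-8")).hexdigest()
    # The random part keeps each session unique; the prefix ties it to the key.
    return f"{digest[:prefix_len]}-{uuid.uuid4().hex}"
```

A short prefix trades collision resistance for readability. A longer slice is safer if many keys are expected.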
Rosie Docs
1. Change chat.njk to chatContainer to markdown
2. Create tool to save output in markdown and push to Rosie-Core
3. Create tool to translate to another language like Esperanto, Chinese, Japanese, Spanish, French
Misc
1. Write Pricing Guide
2. Create Rosie Help Bot
3. Usage by organization member. See community thread

Done List

Redirect handled errors to console UI
Implement GPT-4.0 variable
Query: Conversational. See also langchain_playground: tools_AgentWebBrowserChat
1. Add Filter UI by metadata
Implement Messages Roles
Chat History to vector store Xata
Load text documents
Load vtt text transcripts and convert them to json
Add text to vector store
Q&A from vector stores HNSWLib (local), Pinecone, Xata
Ingest YouTube transcripts
Implement LangChain meeting minutes
BUG: New AI role repeats last User message
Change Inputs to Datalists. Using json for now.
Implement LangChain
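For reference, the "Load vtt text transcripts and convert them to json" item above could look roughly like the sketch below. It is not the project's implementation (which is JavaScript). It assumes Zoom-style cues where the text may start with "Speaker Name: ", and the function and field names are hypothetical.

```python
import json
import re

TIMESTAMP = r"(?:\d{2}:)?\d{2}:\d{2}\.\d{3}"
CUE_TIMING = re.compile(rf"({TIMESTAMP}) --> ({TIMESTAMP})")


def vtt_to_records(vtt_text):
    """Turn WebVTT cues into a list of dicts with start, end, speaker, text."""
    records = []
    current = None
    for raw in vtt_text.splitlines():
        line = raw.strip()
        match = CUE_TIMING.match(line)
        if match:
            current = {"start": match.group(1), "end": match.group(2),
                       "speaker": None, "text": ""}
            records.append(current)
        elif not line:
            current = None  # a blank line ends the cue
        elif current is not None:
            # Assumption: the first text line may carry "Speaker Name: ..."
            if current["text"] == "" and ": " in line:
                current["speaker"], line = line.split(": ", 1)
            current["text"] = (current["text"] + " " + line).strip()
    return records


def vtt_to_json(vtt_text):
    """Serialize the parsed cues as a JSON string."""
    return json.dumps(vtt_to_records(vtt_text), indent=2)
```

The speaker check is deliberately naive: any first line containing ": " is treated as speaker-prefixed, so real transcripts may need a stricter rule.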

Notes

Source: https://medium.com/mlearning-ai/using-chatgpt-for-question-answering-on-your-own-data-afa33d82fbd0

from langchain.chains import VectorDBQA
from langchain.chat_models import ChatOpenAI

# db is assumed to be a vector store built earlier (not shown in these notes)
qa = VectorDBQA.from_chain_type(llm=ChatOpenAI(), chain_type="stuff", vectorstore=db, k=1)
query = "What is the document about?"
print(qa.run(query))

Legend

To-do WIP Done Important

Napkin Business Canvas

source

Rosie-AI

Generating reputation.

is an AI Assistant
for organizations
fearful of AI-generated content that may cause reputational harm.
Unlike the commonly used ChatGPT on the web, which cannot access current information, or add-ons that create clueless meeting minutes, drafts, or summaries,
Rosie-AI trains itself with industry- and organization-specific resources. It then assists you in creating authentic, context-driven content, thereby continuously training itself with complete transparency and attribution of sources.
This is worth the price of an entry-level employee for our customers,
which we find through channels like Trust Over IP.

Parking Lot

Working on authentic content
of the belief there's more knowledge being shared in meetings than being captured and capitalized on.
Langchain Syncing data sources to vector stores
1. Test deleting docs from Xata
2. Need a vector store that meets Indexing API requirements
3. Xata should work
4. Pinecone should work
5. 2023-09-13: Chroma will be offering hosted vector stores for serverless apps
6. 2023-09-21: Indexing API is not available in JavaScript yet.
7. See Pinecone doc to query by metadata - like source and then delete all docs before adding new ones
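The "delete by source, then add" approach in item 7 can be illustrated with a plain in-memory list standing in for the vector store. This is only a sketch of the logic in Python, not Pinecone's or Xata's API, and the function name and document shape are assumptions.

```python
def replace_docs_for_source(store, source, new_texts):
    """Drop every doc whose metadata source matches, then add the new ones.

    `store` is a plain list of {"text": ..., "metadata": {...}} dicts that
    stands in for a real vector store. Returns the number of docs removed.
    """
    kept = [d for d in store if d["metadata"].get("source") != source]
    removed = len(store) - len(kept)
    store[:] = kept + [
        {"text": text, "metadata": {"source": source}} for text in new_texts
    ]
    return removed
```

Re-running a load for the same source then leaves exactly one copy of its documents instead of accumulating duplicates.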
Finding information in long documents with AI using vector databases and MapReduceChain from Langchain
Create Summarization Provider option
1. Add Model Version to Chat Settings to allow for 16k
2. Added env RETRIEVER_K
3. Added RETRIEVER_K slider for Chat Settings
Are AssemblyAI transcripts better than Zoom's? Can "Plan & Execute" map Speaker A to a Zoom participant? If so, then use AssemblyAI transcripts.
AssemblyAI: Live transcript and adding Speaker A's name in realtime. See also https://picovoice.ai/docs/quick-start/eagle-web/
Integrate Audio into LangChain.js apps in 5 Minutes (YouTube)
1. https://www.assemblyai.com/docs/Models/speech_recognition#custom-vocabulary
  - Not good enough?
2. Create /.netlify/functions/webhook.mjs?testing=123 to call background function and avoid timeout
3. Implement AssemblyAI webhook.
  - Not possible with AssemblyAI Loader

Value Proposition

Jobs to be Done

Pains

Gains