# AAC Context-Aware Demo: To-Do Document

## Goal

Create a proof-of-concept offline-capable RAG (Retrieval-Augmented Generation) system for ALS AAC users that:

* Uses a lightweight knowledge graph (JSON)
* Supports utterance suggestion and correction
* Uses local/offline LLMs (e.g., Gemma, Flan-T5)
* Includes a semantic retriever to match context (e.g., conversation partner, topics)
* Provides a Gradio-based UI for deployment on HuggingFace
---
## Phase 1: Environment Setup

* [ ] Install Gradio, Transformers, and Sentence-Transformers
* [ ] Choose and install inference backends:
  * [ ] `google/flan-t5-base` (via HuggingFace Transformers)
  * [ ] Gemma 2B via Ollama or Transformers (check support for offline use)
  * [ ] Sentence-similarity model (`sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` or similar)
---
## Phase 2: Knowledge Graph

* [ ] Create an example `social_graph.json` (people, topics, relationships)
* [ ] Define a function to extract relevant context for a selected person:
  * Name, relationship, typical topics, contact frequency
* [ ] Format the context for prompt injection: inline text the LLM can consume
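The graph file and the extraction/formatting steps above could be sketched as follows. Every person, field name, and topic here is an illustrative placeholder, not a fixed schema; the real `social_graph.json` can evolve from this shape.

```python
import json

# Illustrative stand-in for social_graph.json; all entries are placeholders.
SOCIAL_GRAPH = {
    "people": {
        "bob": {
            "name": "Bob",
            "relationship": "brother",
            "topics": ["football", "gardening"],
            "frequency": "weekly",
        },
        "alice": {
            "name": "Alice",
            "relationship": "speech therapist",
            "topics": ["therapy exercises", "communication devices"],
            "frequency": "fortnightly",
        },
    }
}


def get_person_context(graph: dict, person_id: str) -> dict:
    """Extract name, relationship, typical topics, and contact frequency."""
    person = graph["people"][person_id]
    return {k: person[k] for k in ("name", "relationship", "topics", "frequency")}


def format_for_prompt(context: dict) -> str:
    """Flatten the extracted context into one line for prompt injection."""
    return (
        f"The user is talking to {context['name']} ({context['relationship']}), "
        f"seen {context['frequency']}. Typical topics: {', '.join(context['topics'])}."
    )


print(format_for_prompt(get_person_context(SOCIAL_GRAPH, "bob")))
# prints: The user is talking to Bob (brother), seen weekly. Typical topics: football, gardening.
```

Keeping the extraction and formatting as separate functions means the JSON can later gain fields (sessions, emotional tone) without changing the prompt-injection code.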
---
## Phase 3: Semantic Retriever

* [ ] Load the sentence-transformer model
* [ ] Create an index from the social graph topics/descriptions
* [ ] Match the transcript to the closest node(s) in the graph
* [ ] Retrieve context for prompt generation
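A minimal sketch of the index/match/retrieve steps above. The toy bag-of-words embedder here is only a stand-in so the structure is clear; in the real pipeline, `SentenceTransformer.encode` (and its dense vectors) would replace `toy_embed`, with the same index/retrieve shape.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in embedder: a bag-of-words count vector (replace with a
    sentence-transformer encode call in the real system)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_index(nodes, embed=toy_embed):
    """Embed every graph node's description once, up front."""
    return [(node, embed(node["description"])) for node in nodes]


def retrieve(transcript, index, embed=toy_embed, top_k=1):
    """Return the top_k graph nodes closest to the transcript."""
    query = embed(transcript)
    ranked = sorted(index, key=lambda pair: cosine(query, pair[1]), reverse=True)
    return [node for node, _ in ranked[:top_k]]


# Placeholder topic nodes, as might come from social_graph.json.
nodes = [
    {"id": "football", "description": "chatting about the football match and league scores"},
    {"id": "gardening", "description": "watering plants in the garden"},
]
index = build_index(nodes)
print(retrieve("did you see the football match", index)[0]["id"])  # prints: football
```

Precomputing the index keeps per-utterance latency down to one embedding call plus a few similarity comparisons, which matters for an offline AAC device.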
---
## Phase 4: Gradio UI

* [ ] Build a simple interface:
  * Dropdown: select "Who is speaking?" (Bob, Alice, etc.)
  * Record button: capture audio input
  * Text area: show the transcript
  * Toggle tabs:
    * [ ] "Suggest Utterance"
    * [ ] "Correct Message"
  * Output: the generated message
* [ ] Implement Whisper transcription (via `whisper`, `faster-whisper`, or `whisper.cpp`)
* [ ] Pass the transcript plus retrieved context to the LLM
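The last step above, combining the retrieved context and the transcript into a single LLM prompt, might be sketched like this. The mode names and prompt wording are assumptions matching the two tabs, not a settled design.

```python
def build_prompt(mode: str, transcript: str, context_line: str) -> str:
    """Assemble the instruction sent to the local LLM.

    mode selects between the two UI tabs: "suggest" asks for a reply
    suggestion, "correct" asks for a cleaned-up version of the message.
    context_line is the inline context extracted from the social graph.
    """
    if mode == "suggest":
        task = "Suggest a short reply the user could send next."
    elif mode == "correct":
        task = "Correct any recognition errors in the user's message."
    else:
        raise ValueError(f"unknown mode: {mode}")
    return f"{context_line}\nTranscript: {transcript}\n{task}"


print(build_prompt("suggest", "see you at the match", "Talking to Bob (brother)."))
```

Keeping prompt assembly in one function makes it easy to A/B different phrasings during the Phase 5 model comparison without touching the UI code.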
---
## Phase 5: Model Comparison

* [ ] Test both Flan-T5 and Gemma:
  * [ ] Evaluate speed/quality trade-offs
  * [ ] Compare correction accuracy and context-specific generation
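A simple harness for the speed side of the comparison could look like the sketch below. The stub functions only stand in for the real Flan-T5 and Gemma generation calls, which would be dropped in with the same `generate(prompt) -> str` signature.

```python
import time


def time_generation(generate, prompt: str, runs: int = 3):
    """Call generate(prompt) several times; return (best_seconds, last_output).

    Taking the best of several runs reduces noise from warm-up and caching.
    """
    best, output = float("inf"), None
    for _ in range(runs):
        start = time.perf_counter()
        output = generate(prompt)
        best = min(best, time.perf_counter() - start)
    return best, output


# Stubs standing in for the real model pipelines.
def flan_t5_stub(prompt: str) -> str:
    return prompt.upper()


def gemma_stub(prompt: str) -> str:
    return prompt[::-1]


for name, fn in [("flan-t5", flan_t5_stub), ("gemma", gemma_stub)]:
    seconds, _ = time_generation(fn, "suggest a reply")
    print(f"{name}: best of 3 runs = {seconds:.6f}s")
```

Quality and correction accuracy still need a small hand-labelled set of transcripts to score against; this harness only covers the latency half of the trade-off.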
---
## Optional Phase 6: HuggingFace Deployment

* [ ] Clean up the UI and remove GPU-only dependencies
* [ ] Upload the Gradio demo to HuggingFace Spaces
* [ ] Add documentation and example graphs/transcripts
---
## Notes

* Keep user privacy and safety in mind (no cloud transcription when offline Whisper is available)
* Keep the JSON editable for later expansion (sessions, emotional tone, etc.)
* Consider caching LLM suggestions for fast recall
---
## Future Features (Post-Proof of Concept)

* Add a visualisation of the social graph (D3 or static SVG)
* Add an editable profile page for caregivers
* Add a chat history / rolling transcript viewer
* Add emotion/sentiment detection for tone-aware suggestions