# The Heart Space — Chatbot with Cloned Voice: Full Documentation
## Overview
"The Heart Space" is an AI-powered trauma recovery chatbot embedded in the Trauma Navigator platform. It combines:
- **OpenAI GPT-4o** for empathetic, trauma-informed text responses
- **HeyGen Voice Cloning** (primary) to speak responses in a cloned human voice (Felicia)
- **OpenAI TTS** (fallback) using the `nova` voice model
- **RAG (Retrieval-Augmented Generation)** to ground responses in the toolkit's exercise library
- **Persistent user accounts** with full conversation history
---
## API Dependencies
| Service | Purpose | API Key Environment Variable |
|---|---|---|
| OpenAI (GPT-4o) | Chat completion (AI responses) | `OPENAI_API_KEY` |
| OpenAI (text-embedding-3-small) | RAG vector embeddings | `OPENAI_API_KEY` |
| OpenAI (tts-1-hd, nova) | Fallback TTS voice | `OPENAI_API_KEY` |
| HeyGen v3 Voices API | Primary cloned voice synthesis | `HEYGEN_API_KEY` |
| ElevenLabs (inactive) | Legacy secondary TTS (no longer primary) | `ELEVENLABS_API_KEY` |
---
## Environment Variables
All keys must be set in the server environment. They are configured in `.env` and also in `ecosystem.config.cjs` for the PM2 process manager.
```
OPENAI_API_KEY=sk-...
HEYGEN_API_KEY=sk_V2_...
ELEVENLABS_API_KEY=... # legacy, kept but not primary
ADMIN_PASSWORD=... # protects admin-only routes
SITE_URL=https://your-domain.com # used during RAG re-indexing
```
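If a key is missing, the failure otherwise tends to appear only on the first request that needs that key. The following is a minimal fail-fast sketch. The helper name `checkChatbotEnv` and the split into required and optional keys are assumptions. `HEYGEN_API_KEY` is treated as optional because a missing key triggers the OpenAI TTS fallback.

```typescript
// Hypothetical startup check, not taken from the Trauma Navigator codebase.
// OPENAI_API_KEY powers chat, embeddings and the TTS fallback, so it is required here.
const REQUIRED_ENV = ["OPENAI_API_KEY", "ADMIN_PASSWORD"];
// HEYGEN_API_KEY is optional in this sketch: without it the OpenAI voice is used.
const OPTIONAL_ENV = ["HEYGEN_API_KEY", "ELEVENLABS_API_KEY", "SITE_URL"];

export function checkChatbotEnv(env: Record<string, string | undefined>): string[] {
  // Warn about optional keys so a silent voice fallback is visible in the PM2 logs.
  for (const key of OPTIONAL_ENV) {
    if (!env[key]) console.warn(`[env] ${key} is not set (optional)`);
  }
  // Return the missing required keys; the caller decides whether to exit.
  return REQUIRED_ENV.filter((key) => !env[key]);
}

// Usage at server startup:
// const missing = checkChatbotEnv(process.env);
// if (missing.length) { console.error(`Missing env: ${missing.join(", ")}`); process.exit(1); }
```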
---
## Key Source Files
| File | Role |
|---|---|
| `server/openai.ts` | All AI/TTS generation functions |
| `server/routes.ts` | All chatbot API routes + speech cache |
| `server/embeddings.ts` | RAG context retrieval + content indexing |
| `server/auth.ts` | Password hashing/verification (scrypt) |
| `server/storage.ts` | Database access layer (Drizzle ORM) |
| `shared/schema.ts` | PostgreSQL schema definitions |
| `client/src/components/ChatBot.tsx` | React UI component (floating widget) |
| `client/src/hooks/use-tts.ts` | Reusable TTS hook (calls `/api/tts`) |
---
## Database Schema
Defined in `shared/schema.ts` using Drizzle ORM against PostgreSQL.
### `visitors` table
```
id serial PRIMARY KEY
name text NOT NULL
email text NOT NULL
passwordHash text -- scrypt hash
lastState text
createdAt timestamp
```
### `conversations` table
```
id serial PRIMARY KEY
visitorId integer REFERENCES visitors(id)
createdAt timestamp
```
### `messages` table
```
id serial PRIMARY KEY
conversationId integer REFERENCES conversations(id)
role text -- 'user' | 'assistant'
content text
createdAt timestamp
```
### `resources` table (exercise library)
```
id serial PRIMARY KEY
title text
description text
content text
type text -- 'breathing' | 'somatic' | 'grounding' | 'creative'
category text -- 'sympathetic' | 'parasympathetic' | 'creative'
imageUrl text
```
### `resource_embeddings` table (RAG)
```
id serial PRIMARY KEY
resourceId integer REFERENCES resources(id)
pageId integer REFERENCES pages(id)
embedding jsonb -- float[] vector from text-embedding-3-small
contentType text -- 'resource' | 'page'
url text -- direct link to the content
```
### `knowledge_documents` table (admin knowledge base)
```
id serial PRIMARY KEY
title text
sourceType text -- 'upload' | 'url' | 'manual'
sourceUrl text
content text
contentType text -- 'pdf' | 'txt' | 'url' | 'manual'
embedding jsonb
createdAt timestamp
updatedAt timestamp
```
---
## Step-by-Step: How the Voice Clone Was Created
### Step 1 — Voice Recording (External to Codebase)
1. Record clean audio samples of the target voice (Felicia) — typically 1–5 minutes of clear speech with no background noise.
2. Log in to [HeyGen Studio](https://app.heygen.com) → **Voices** → **Voice Clone**.
3. Upload the audio samples and create an **Instant Voice Clone** or a **Professional Voice Clone**.
4. Once processed, HeyGen assigns a **Voice ID** (UUID). This is the cloned voice's identifier.
### Step 2 — Store the Voice ID
The Voice ID is hardcoded as a constant in `server/openai.ts`:
```ts
// server/openai.ts (line 97)
const FELICIA_VOICE_ID = "b5d52e83e8fe4a34a24bc5cffd2ada3a";
```
The same value also appears as `FELICIA_VOICE_CLONE` on line 147, a legacy ElevenLabs constant kept for reference.
### Step 3 — Store the API Key
Add the HeyGen API key to the server environment:
```
HEYGEN_API_KEY=sk_V2_...
```
In `ecosystem.config.cjs`, this is passed via PM2's `env` block so the Node.js process receives it at runtime.
---
## Step-by-Step: Full Chat + Voice Workflow
### Phase 1 — User Authentication
1. User clicks the floating heart button → the `ChatBot.tsx` widget opens.
2. User registers (`POST /api/chat/register`) or logs in (`POST /api/chat/login`).
   - **Register:** `name`, `email`, `password` → password hashed with `scrypt` → `visitors` row + `conversations` row created → returns `{ visitor, conversation }`.
   - **Login:** email + password → verified against stored `scrypt` hash → returns `{ visitor, conversations[] }`.
3. Session is persisted in browser `sessionStorage` as `chat_visitor` and `chat_conversation` keys.
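The register/login flow relies on `server/auth.ts` for scrypt hashing. The sketch below shows one way to implement this with Node's built-in `node:crypto` module. The `saltHex:hashHex` storage format, the 16-byte salt, and the 64-byte key length are all assumptions. Compare them with real `passwordHash` values before using this code against existing accounts.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Assumed storage format for visitors.passwordHash: "<saltHex>:<hashHex>".
const SALT_BYTES = 16;
const KEY_LENGTH = 64;

export function hashPassword(password: string): string {
  const salt = randomBytes(SALT_BYTES);
  const hash = scryptSync(password, salt, KEY_LENGTH);
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

export function verifyPassword(password: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(":");
  // Malformed stored values never verify.
  if (!saltHex || !hashHex) return false;
  const expected = Buffer.from(hashHex, "hex");
  if (expected.length === 0) return false;
  // Derive a key of the same length so timingSafeEqual compares in constant time.
  const actual = scryptSync(password, Buffer.from(saltHex, "hex"), expected.length);
  return timingSafeEqual(actual, expected);
}
```

`scryptSync` blocks the event loop while it runs. Inside request handlers, the usual non-blocking alternative is the callback-based `scrypt` wrapped with `util.promisify`.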
### Phase 2 — Sending a Message
1. User types a message and submits the form → `POST /api/chat/message` with `{ conversationId, content }`.
2. Server saves the user message to the `messages` table (`role: 'user'`).
3. Full conversation history is retrieved from the database.
4. Server scans past assistant messages to detect if one-time phrases ("Dear one", "you are not broken") have already been used.
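Step 4 can be sketched as a case-insensitive scan over earlier assistant turns. The helper name is hypothetical, and the phrase list contains only the two phrases named above. The real check lives in the server code and may differ. The returned phrases would then feed the one-time phrase guards in the system prompt.

```typescript
// Hypothetical helper; the actual detection logic is not shown in this document.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Only the two phrases named in this document; the real list may be longer.
const ONE_TIME_PHRASES = ["dear one", "you are not broken"];

// Returns the one-time phrases that any earlier assistant message already used.
// Matching is case-insensitive, so "Dear one," and "dear one" both count.
export function findUsedOneTimePhrases(history: ChatMessage[]): string[] {
  const assistantText = history
    .filter((m) => m.role === "assistant")
    .map((m) => m.content.toLowerCase())
    .join("\n");
  return ONE_TIME_PHRASES.filter((phrase) => assistantText.includes(phrase));
}
```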
### Phase 3 — RAG Context Retrieval
1. The user's latest message is vectorised via `generateEmbedding()` → OpenAI `text-embedding-3-small` model → returns a `float[]` vector.
2. `findRelevantContext()` in `server/embeddings.ts` fetches all stored embeddings from three sources:
   - `resource_embeddings` (trauma exercises)
   - `page_embeddings` (institutional pages)
   - `knowledge_documents` (admin-uploaded knowledge base)
3. **Cosine similarity** is computed between the query vector and every stored embedding.
4. The top-5 most similar items are assembled into a context string, each including title, type, URL, description, and up to 500 characters of content.
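Steps 3 and 4 amount to a linear scan that ranks every stored item by cosine similarity, `dot(a, b) / (|a| * |b|)`. Below is a minimal sketch of that ranking. It assumes each stored row is loaded as an object with the fields listed in step 4 plus its `embedding` array. The interface and function names are hypothetical.

```typescript
// Sketch of the ranking step; field and function names are assumptions.
interface EmbeddedItem {
  title: string;
  type: string;
  url: string;
  description: string;
  content: string;
  embedding: number[]; // stored as a jsonb float[] in the database
}

export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every item against the query, keeps the top k, and formats each
// entry with title, type, URL, description and at most 500 content characters.
export function buildContext(query: number[], items: EmbeddedItem[], k = 5): string {
  return items
    .map((item) => ({ item, score: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ item }) =>
      [
        `Title: ${item.title} (${item.type})`,
        `URL: ${item.url}`,
        `Description: ${item.description}`,
        `Content: ${item.content.slice(0, 500)}`,
      ].join("\n"),
    )
    .join("\n\n");
}
```

A full scan is simple and adequate for a small exercise library. Its cost, however, grows linearly with the number of stored embeddings, because every query is compared against every row.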
### Phase 4 — AI Response Generation
1. `getChatCompletion()` in `server/openai.ts` sends the following to OpenAI `gpt-4o`:
   - **System prompt:** Defines the persona "The Heart Space" (a blend of Louise Hay and Sarah Blondin), tone rules, one-time phrase guards, the exercise suggestion format (`[[Title|URL]]`), and the RAG context.
   - **Message history:** All prior user + assistant messages in the conversation.
   - **Temperature:** `0.7`
2. The AI response is returned and saved to the `messages` table (`role: 'assistant'`).
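The request in step 1 can be pictured as a system message followed by the stored history. The function below is a hypothetical illustration, not the code in `server/openai.ts`. The persona text and the exact wording of the guards are placeholders.

```typescript
// Hypothetical request builder; the real system prompt is not reproduced here.
type Role = "system" | "user" | "assistant";
interface PromptMessage {
  role: Role;
  content: string;
}

export function buildChatRequest(
  personaPrompt: string,
  ragContext: string,
  usedPhrases: string[],
  history: { role: "user" | "assistant"; content: string }[],
) {
  // Guard line only appears once a one-time phrase has been used.
  const guards = usedPhrases.length
    ? `Do not use these phrases again: ${usedPhrases.map((p) => `"${p}"`).join(", ")}.`
    : "";
  const system = [
    personaPrompt,
    "Suggest exercises in the format [[Title|URL]].",
    guards,
    `Relevant content:\n${ragContext}`,
  ]
    .filter(Boolean) // drop the empty guard line when nothing was used
    .join("\n\n");
  const messages: PromptMessage[] = [{ role: "system", content: system }, ...history];
  return { model: "gpt-4o", temperature: 0.7, messages };
}
```

The returned object has the same `model`, `temperature`, and `messages` fields as a Chat Completions request body. It could therefore be passed to the `openai` SDK's `chat.completions.create()`, possibly with a cast to the SDK's message type.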
### Phase 5 — Pre-Generated Speech Cache (Background)
Immediately after saving the AI message, the server **pre-generates audio in the background** (without blocking the HTTP response):
```ts
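// server/routes.ts (lines 248–254)
// Fire-and-forget: the HTTP response is not blocked while audio is generated.
generateHeyGenSpeech(cleanTextForSpeech(aiResponse)).then(result => {
  if ("audio" in result) {
    speechCache.set(msgId, { buffer: result.audio, expiresAt: Date.now() + 15 * 60 * 1000 });
  }
}).catch(() => {}); // errors are ignored; a later cache miss regenerates on demand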
```
- The audio `Buffer` is stored in a **server-side in-memory Map** (`speechCache`) keyed by `messageId`.
- Cache entries **expire after 15 minutes** and are purged every 5 minutes.
- This means when the user clicks "Listen", the audio is already ready.
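The cache described above needs little more than a `Map`, a TTL, and a periodic sweep. Below is a minimal sketch that assumes the `{ buffer, expiresAt }` entry shape from the snippet above. `cacheSpeech`, `takeSpeech`, and `purgeExpired` are hypothetical helper names.

```typescript
// In-memory speech cache sketch; entry shape mirrors the routes.ts snippet.
interface CachedSpeech {
  buffer: Buffer;
  expiresAt: number; // epoch milliseconds
}

const SPEECH_TTL_MS = 15 * 60 * 1000;
const PURGE_INTERVAL_MS = 5 * 60 * 1000;

export const speechCache = new Map<number, CachedSpeech>();

export function cacheSpeech(messageId: number, buffer: Buffer, now = Date.now()): void {
  speechCache.set(messageId, { buffer, expiresAt: now + SPEECH_TTL_MS });
}

// One-shot read: any lookup removes the entry, matching "return buffer, delete entry".
export function takeSpeech(messageId: number, now = Date.now()): Buffer | undefined {
  const entry = speechCache.get(messageId);
  if (!entry) return undefined;
  speechCache.delete(messageId);
  return entry.expiresAt > now ? entry.buffer : undefined;
}

// Removes expired entries and returns how many were dropped.
export function purgeExpired(now = Date.now()): number {
  let removed = 0;
  for (const [id, entry] of speechCache) {
    if (entry.expiresAt <= now) {
      speechCache.delete(id); // deleting during Map iteration is safe in JS
      removed++;
    }
  }
  return removed;
}

// unref() keeps this timer from holding the Node.js process open on shutdown.
setInterval(() => purgeExpired(), PURGE_INTERVAL_MS).unref();
```

Because the cache lives in process memory, a PM2 restart empties it, and each process in a cluster would hold its own copy. A lost entry simply becomes a cache miss, and the audio is regenerated on demand.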
### Phase 6 — Voice Playback ("Listen" Button)
1. User clicks the **Listen** button below any assistant message → `playSpeech(id, text)` runs in `ChatBot.tsx`.
2. Client calls `POST /api/chat/speech` with `{ text, messageId }`.
3. Server checks the in-memory `speechCache`:
   - **Cache HIT:** Returns the pre-buffered `audio/mpeg` binary immediately, then deletes the entry.
   - **Cache MISS:** Calls `generateHeyGenSpeech(cleanText)` on demand.
4. `generateHeyGenSpeech()` in `server/openai.ts`:
   - **Step 1:** `POST https://api.heygen.com/v3/voices/speech` with `{ text, voice_id: FELICIA_VOICE_ID, speed: 1.0 }` → HeyGen returns `{ data: { audio_url, duration } }`.
   - **Step 2:** Server fetches the binary audio from `audio_url` and returns it as an `audio/mpeg` stream to the client.
5. **Fallback:** If HeyGen fails (API key missing, network error, or API error), server falls back to OpenAI `tts-1-hd` with the `nova` voice at `speed: 0.9`.
6. Client receives the `Blob`, creates an object URL, constructs an `HTMLAudioElement`, and calls `.play()`.
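On the server side, steps 3 to 5 form a chain: cache first, then the primary provider, then the fallback. The sketch below injects the cache lookup and both providers, which lets the order be tested without network access. `resolveSpeech` and the provider signatures are assumptions. Judging by the `"audio" in result` check earlier, the real `generateHeyGenSpeech()` reports failures through its return value rather than by throwing. An adapter would therefore be needed to plug it in as `primary`.

```typescript
// Hypothetical, dependency-injected version of the /api/chat/speech decision logic.
type SpeechProvider = (text: string) => Promise<Buffer>;

interface SpeechDeps {
  takeCached: (messageId: number) => Buffer | undefined;
  primary: SpeechProvider; // stands in for the HeyGen cloned voice
  fallback: SpeechProvider; // stands in for OpenAI tts-1-hd, voice "nova", speed 0.9
  log?: (msg: string) => void;
}

export async function resolveSpeech(
  messageId: number,
  cleanText: string,
  deps: SpeechDeps,
): Promise<{ audio: Buffer; source: "cache" | "heygen" | "openai" }> {
  // 1. Pre-generated audio wins if it is still cached.
  const cached = deps.takeCached(messageId);
  if (cached) return { audio: cached, source: "cache" };
  try {
    // 2. Otherwise generate with the cloned voice on demand.
    return { audio: await deps.primary(cleanText), source: "heygen" };
  } catch (err) {
    // 3. Any HeyGen failure falls through to the OpenAI voice.
    deps.log?.(`HeyGen failed, using OpenAI fallback: ${String(err)}`);
    return { audio: await deps.fallback(cleanText), source: "openai" };
  }
}
```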
### Phase 7 — Text Pre-Processing for Speech
Before any text is sent to the TTS providers, it is cleaned by `cleanTextForSpeech()`:
```ts
// server/routes.ts (lines 19–24)
function cleanTextForSpeech(text: string): string {
  return text
    .replace(/\[\[(.*?)\|(.*?)\]\]/g, "I suggest the $1") // [[Title|URL]] → spoken suggestion
    .replace(/\[(.*?)\]\((.*?)\)/g, "$1")                 // [Title](URL) → title only
    .replace(/[*_#]/g, "");                               // strip markdown
}
```
---
## API Route Reference
| Method | Route | Description |
|---|---|---|
| `POST` | `/api/chat/register` | Create account → visitor + conversation |
| `POST` | `/api/chat/login` | Authenticate → visitor + conversations list |
| `POST` | `/api/chat/new-conversation` | Create a new conversation for existing visitor |
| `GET` | `/api/chat/visitor/:visitorId/conversations` | List all conversation previews |
| `POST` | `/api/chat/start` | Legacy guest start (no password) |
| `POST` | `/api/chat/message` | Send message → triggers RAG + GPT-4o + voice pre-gen |
| `GET` | `/api/chat/history/:conversationId` | Retrieve all messages in a conversation |
| `POST` | `/api/chat/speech` | Get audio for a message (cache-first, HeyGen primary) |
| `POST` | `/api/tts` | Generic TTS via ElevenLabs (used by `useTTS` hook) |
| `POST` | `/api/admin/index-knowledge` | Re-index all resources + pages into vector DB |
---
## HeyGen API Details
**Endpoint:** `POST https://api.heygen.com/v3/voices/speech`
**Request headers:**
```
X-Api-Key: <HEYGEN_API_KEY>
Content-Type: application/json
```
**Request body:**
```json
{
"text": "Your response text here",
"voice_id": "b5d52e83e8fe4a34a24bc5cffd2ada3a",
"speed": 1.0
}
```
**Response body:**
```json
{
"error": null,
"data": {
"audio_url": "https://cdn.heygen.com/.../audio.mp3",
"duration": 4.2
}
}
```
The server then fetches the binary from `audio_url` and proxies it directly as `audio/mpeg` to the client — the client never touches the HeyGen CDN directly.
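The request and response shapes above combine into a two-step call, sketched below. The endpoint, header, body, and response fields all come from this section. The function name `heygenSpeechSketch` and the injected `fetchImpl` parameter are assumptions; the parameter exists so the sketch can be tested offline.

```typescript
// Minimal fetch-compatible types so a stub can replace the network in tests.
interface FetchResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
  arrayBuffer(): Promise<ArrayBuffer>;
}
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<FetchResponseLike>;

interface HeyGenSpeechResponse {
  error: unknown;
  data?: { audio_url: string; duration: number };
}

export async function heygenSpeechSketch(
  text: string,
  apiKey: string,
  voiceId: string,
  fetchImpl: FetchLike,
): Promise<Buffer> {
  // Step 1: ask HeyGen to synthesise the text with the cloned voice.
  const res = await fetchImpl("https://api.heygen.com/v3/voices/speech", {
    method: "POST",
    headers: { "X-Api-Key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ text, voice_id: voiceId, speed: 1.0 }),
  });
  if (!res.ok) throw new Error(`HeyGen speech request failed: HTTP ${res.status}`);
  const payload = (await res.json()) as HeyGenSpeechResponse;
  if (payload.error || !payload.data?.audio_url) {
    throw new Error(`HeyGen returned no audio: ${JSON.stringify(payload.error)}`);
  }
  // Step 2: download the MP3 so the server can proxy it to the client as audio/mpeg.
  const audio = await fetchImpl(payload.data.audio_url);
  if (!audio.ok) throw new Error(`Audio download failed: HTTP ${audio.status}`);
  return Buffer.from(await audio.arrayBuffer());
}
```

In Node.js 18+ the global `fetch` can be passed as `fetchImpl`. An Express handler could then send the result with `res.type("audio/mpeg").send(buffer)`.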
---
## OpenAI TTS Fallback Details
**Model:** `tts-1-hd`
**Voice:** `nova` (warmer and more expressive)
**Speed:** `0.9` (slightly slower to avoid monotonous delivery)
Called via the official `openai` Node.js SDK:
```ts
openai.audio.speech.create({ model: "tts-1-hd", voice: "nova", input: text, speed: 0.9 })
```
---
## Process Management
The server is managed by **PM2** with the app name `trauma-navigator`.
```bash
pm2 start ecosystem.config.cjs # start
pm2 restart trauma-navigator # restart after code changes
pm2 logs trauma-navigator # view logs
pm2 status # check running status
```
---
## How to Replace the Cloned Voice
1. Record new voice samples and create a new Voice Clone in HeyGen Studio.
2. Copy the new Voice ID from the HeyGen dashboard.
3. Update `FELICIA_VOICE_ID` in `server/openai.ts` (line 97):
```ts
const FELICIA_VOICE_ID = "<new-voice-id-here>";
```
4. Restart the server: `pm2 restart trauma-navigator`
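Because the ID is a hardcoded constant, every voice change requires a code change. One possible alternative keeps the existing ID as the default. It assumes a new `HEYGEN_VOICE_ID` environment variable, which the current codebase does not read:

```typescript
// Hypothetical: HEYGEN_VOICE_ID is not an existing variable in this project.
const DEFAULT_VOICE_ID = "b5d52e83e8fe4a34a24bc5cffd2ada3a"; // current Felicia clone

// Prefers a non-empty HEYGEN_VOICE_ID, otherwise falls back to the current clone.
export function resolveVoiceId(env: Record<string, string | undefined>): string {
  const fromEnv = env.HEYGEN_VOICE_ID?.trim();
  return fromEnv ? fromEnv : DEFAULT_VOICE_ID;
}

// In server/openai.ts this could replace the hardcoded constant:
// const FELICIA_VOICE_ID = resolveVoiceId(process.env);
```

With this approach, the ID would be set in `.env` and `ecosystem.config.cjs` like the other keys. PM2 keeps a process's previous environment, so use `pm2 restart ecosystem.config.cjs --update-env` to pick up a changed value.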
---
## Speech Cache Architecture
```
POST /api/chat/message
├── Save user msg
├── RAG lookup
├── GPT-4o completion
├── Save assistant msg (msgId = N)
└── [background] generateHeyGenSpeech(text)
    └── speechCache.set(N, { buffer, expiresAt })

POST /api/chat/speech { messageId: N }
├── Cache HIT → return buffer, delete entry (instant)
└── Cache MISS → generateHeyGenSpeech on demand (2–4s)
```
Cache TTL: **15 minutes**. Purge interval: **5 minutes**.
---
# Integrate The Heart Space Chatbot into a New Project
This plan walks a new developer through adding the full AI chatbot (GPT-4o + HeyGen voice clone + RAG) from Trauma Navigator into a new Node.js / Express / React / PostgreSQL / Drizzle / PM2 project.
## Prerequisites (gather before starting)
- **OpenAI API key**: needs access to `gpt-4o`, `text-embedding-3-small`, and `tts-1-hd`.
- **HeyGen API key**: `sk_V2_...` format, from app.heygen.com → Settings → API.
- **HeyGen Voice ID**: see Step 3 below for how to clone a voice and get the UUID.
- **PostgreSQL** database running and accessible via `DATABASE_URL`.
- **Node.js ≥ 20**, with `npm`, `tsx`, and `pm2` installed globally.
## Step 1 — Copy source files into the new project
Copy these files/directories verbatim from the Trauma Navigator repo into your project root:

| Source path | What it is |
|---|---|
| `server/openai.ts` | GPT-4o completion + HeyGen + OpenAI TTS functions |
| `server/routes.ts` | All chatbot API routes + speech cache logic |
| `server/embeddings.ts` | RAG retrieval: cosine similarity + context builder |
| `server/auth.ts` | scrypt password hashing / verification |
| `server/storage.ts` | Drizzle database access layer |
| `shared/schema.ts` | Drizzle schema (all tables) |
| `client/src/components/ChatBot.tsx` | React floating chat widget |
| `client/src/hooks/use-tts.ts` | Generic TTS hook (calls `/api/tts`) |

**Register routes:** in your main `index.ts`, import and mount the routes from `routes.ts` with `app.use(routes)`.
## Step 2 — Install npm dependencies
Add these packages to `package.json` and run `npm install` (the command below does both):
```bash
npm install openai pg drizzle-orm drizzle-zod drizzle-kit express multer pdf-parse zod zod-validation-error bcryptjs @types/bcryptjs express-session connect-pg-simple memorystore passport passport-local
```
For the React client, these are already present if you scaffolded with Vite + Shadcn, but confirm: `@tanstack/react-query`, `lucide-react`, `wouter`, `framer-motion`.
## Step 3 — Create the HeyGen voice clone (or reuse existing)
Skip this step if you have an existing Voice ID.
- Record 1–5 minutes of clean speech (no background noise) from the target voice.
- Log in to HeyGen Studio → **Voices** → **Voice Clone**.
- Upload samples → create an **Instant Voice Clone** (or **Professional** for higher quality).
- Once processed, copy the Voice ID UUID shown in the dashboard.
## Step 4 — Set the Voice ID in the codebase
In `server/openai.ts`, update line 97:
```ts
const FELICIA_VOICE_ID = "<your-voice-id-uuid-here>";
```
## Step 5 — Configure environment variables
Create a `.env` file in the project root:
```
DATABASE_URL=postgres://user:password@host:5432/dbname
OPENAI_API_KEY=sk-...
HEYGEN_API_KEY=sk_V2_...
ELEVENLABS_API_KEY=            # optional: legacy fallback, can be left empty
ADMIN_PASSWORD=your-admin-pw   # protects /api/admin/* routes
SITE_URL=https://your-domain.com
```
For PM2, mirror all keys in `ecosystem.config.cjs` inside the `env` block so the production process receives them:
```js
module.exports = {
  apps: [{
    name: "your-app-name",
    script: "dist/index.cjs",
    env: {
      NODE_ENV: "production",
      DATABASE_URL: "...",
      OPENAI_API_KEY: "...",
      HEYGEN_API_KEY: "...",
      ADMIN_PASSWORD: "...",
      SITE_URL: "..."
    }
  }]
};
```
## Step 6 — Apply the database schema
Push the Drizzle schema to PostgreSQL to create all required tables:
```bash
npm run db:push
```
This reads `shared/schema.ts` and pushes the schema to the database at `DATABASE_URL`. Tables created: `visitors`, `conversations`, `messages`, `resources`, `pages`, `resource_embeddings`, `page_embeddings`, `knowledge_documents`.
## Step 7 — Seed resources (exercise library) for RAG
The RAG system searches the `resources` and `pages` tables. Without content, the chatbot still works but won't suggest exercises.
- Insert resource rows into the `resources` table (`title`, `description`, `content`, `type`, `category`).
- Insert page rows into the `pages` table if you have institutional content.
## Step 8 — Build the project
```bash
npm run build
```
This compiles the Express server to `dist/index.cjs` and bundles the React client into `public`.
## Step 9 — Start with PM2
```bash
pm2 start ecosystem.config.cjs
pm2 save      # persist across reboots
pm2 startup   # install startup hook
```
Verify:
```bash
pm2 status              # should show "online"
pm2 logs your-app-name  # watch for startup errors
```
## Step 10 — Run RAG indexing (generates embeddings)
After the server is running, trigger initial embedding generation:
```bash
curl -X POST https://your-domain.com/api/admin/index-knowledge \
  -H "x-admin-password: your-admin-pw"
```
This vectorises all resources, pages, and knowledge documents using `text-embedding-3-small` and writes the results to `resource_embeddings`, `page_embeddings`, and `knowledge_documents`.
Re-run it any time you add new resources or pages.
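The curl call authenticates with an `x-admin-password` header, which is checked against `ADMIN_PASSWORD`. This document does not show how `server/routes.ts` performs that check. The following is a minimal Express-style sketch. The middleware name, the 401 response body, and the hash-then-`timingSafeEqual` comparison are all assumptions.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Minimal structural types so the sketch does not depend on @types/express;
// an Express Request/Response satisfies them.
interface ReqLike {
  headers: Record<string, string | string[] | undefined>;
}
interface ResLike {
  status(code: number): { json(body: unknown): unknown };
}

// Hash both sides first so timingSafeEqual always compares equal-length buffers.
function safeEqual(a: string, b: string): boolean {
  const ha = createHash("sha256").update(a).digest();
  const hb = createHash("sha256").update(b).digest();
  return timingSafeEqual(ha, hb);
}

export function requireAdmin(adminPassword: string | undefined) {
  return (req: ReqLike, res: ResLike, next: () => void): void => {
    // Node lowercases incoming header names.
    const header = req.headers["x-admin-password"];
    const supplied = Array.isArray(header) ? header[0] : header;
    // Refuse everything if ADMIN_PASSWORD is unset rather than leaving routes open.
    if (!adminPassword || !supplied || !safeEqual(supplied, adminPassword)) {
      res.status(401).json({ error: "Unauthorized" });
      return;
    }
    next();
  };
}

// Usage (Express):
// app.post("/api/admin/index-knowledge", requireAdmin(process.env.ADMIN_PASSWORD), handler);
```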
## Step 11 — Verify end-to-end
- Open the app in a browser → the floating heart button should appear (rendered by `ChatBot.tsx`).
- Register a new account → verify that a `visitors` row and a `conversations` row are created in the DB.
- Send a message → confirm an AI response is returned.
- Click the **Listen** button → audio should play (2–4s on cache miss, instant on cache hit).
- Check PM2 logs for any HeyGen API errors; if the cloned voice fails, the fallback OpenAI `nova` voice plays automatically.
## Troubleshooting quick reference
| Symptom | Likely cause | Fix |
|---|---|---|
| `DATABASE_URL` error on startup | Env var missing | Check `.env` and `ecosystem.config.cjs` |
| No AI response | `OPENAI_API_KEY` invalid or model not enabled | Verify the key in the OpenAI dashboard |
| Voice plays OpenAI `nova` instead of the cloned voice | `HEYGEN_API_KEY` wrong or Voice ID incorrect | Check the HeyGen dashboard; update `.env` and the `FELICIA_VOICE_ID` constant |
| RAG returns no context | Embeddings not indexed | Re-run `POST /api/admin/index-knowledge` |
| `pm2 logs` shows a port conflict | Another process on the same port | Change `PORT` in `.env` or stop the conflicting process |
