# 🔊 Make Your AI Talk Back!
Your AI can chat, create images, understand audio, and analyze files. Now let's give it a voice! 🎤
Imagine users asking "What's the weather like?" and your AI speaking back in a warm, friendly voice instead of just showing text. Or imagine it reading long articles aloud while they work on other things!
**What we're building:** Your AI will be able to speak any text in six different voice personalities, from professional business tones to energetic marketing voices. It's like having a team of voice actors inside your app!
## 🎯 From Silent Text to Speaking AI
**Current state:** Your AI shows brilliant text responses.
**Target state:** Users can hear your AI speak with natural voices!
## 🔄 The Amazing Transformation
**Before (silent AI):**
User: "Explain quantum physics"AI: [Shows long text explanation]User: [Has to read everything] π΄After (Speaking AI):
User: "Explain quantum physics"AI: [Shows text AND speaks it] πUser: [Can listen while doing other things] π§The magic: Your AI becomes accessible, engaging, and multitask-friendly!
## 🌟 Why Voice Makes Your App Incredible
**Real-world impact:**
- ♿ **Accessibility heroes** - Visually impaired users can fully enjoy your app
- 🏃 **Multitasking magic** - Users can listen while exercising, driving, or working
- 🧠 **Learning boost** - Audio learners absorb information better when they hear it
- 🎙️ **Instant podcasts** - Turn any article into audio content on demand
- 🎯 **Better engagement** - Voice keeps users active instead of passive readers
**Without voice AI:**

- ❌ Hire expensive voice actors
- ❌ Use robotic computer voices
- ❌ Lose the sizable share of users who prefer audio
- ❌ Limited to text-only experiences

**With voice AI:**

- ✅ Professional voices in seconds
- ✅ Natural, engaging speech
- ✅ Serve all learning styles
- ✅ Complete multimedia experience

## 🎭 Meet Your 6 AI Voice Actors
OpenAI gives you a complete voice acting team! Each one has a distinct personality:
### 🎙️ Alloy - The Professional

- **Perfect for:** Business presentations, formal content
- **Sounds like:** Your trusted corporate spokesperson
- **User feels:** Confident and professional

### 🌊 Echo - The Calm Companion

- **Perfect for:** Meditation apps, soothing content
- **Sounds like:** Your gentle yoga instructor
- **User feels:** Relaxed and peaceful

### 📖 Fable - The Master Storyteller

- **Perfect for:** Creative content, engaging stories
- **Sounds like:** Your favorite audiobook narrator
- **User feels:** Captivated and entertained

### 🎯 Onyx - The Authority

- **Perfect for:** News, important announcements
- **Sounds like:** Your trusted news anchor
- **User feels:** Informed and confident

### ☀️ Nova - The Friendly Helper

- **Perfect for:** Tutorials, customer support
- **Sounds like:** Your helpful best friend
- **User feels:** Welcome and supported

### ✨ Shimmer - The Energy Booster

- **Perfect for:** Marketing, motivational content
- **Sounds like:** Your enthusiastic coach
- **User feels:** Excited and motivated

**Pro tip:** We'll build a voice selector so users can choose their favorite!
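If you later want your app to suggest a voice automatically from the kind of content, a tiny lookup table is enough. Here's a minimal sketch; `pickVoice` and the category names are our own invention, and only the six voice ids come from OpenAI:

```javascript
// Hypothetical helper: map a content category to a default voice.
// The category names are ours; the six voice ids are OpenAI's.
const DEFAULT_VOICES = {
  business: "alloy",
  meditation: "echo",
  story: "fable",
  news: "onyx",
  tutorial: "nova",
  marketing: "shimmer",
};

function pickVoice(category) {
  // Fall back to the neutral, professional voice for unknown categories
  return DEFAULT_VOICES[category] ?? "alloy";
}

console.log(pickVoice("story"));   // "fable"
console.log(pickVoice("unknown")); // "alloy"
```

A default like this keeps the voice selector optional: users who never open the settings panel still get a sensible voice.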
## 🛠️ Step 1: Add Voice Power to Your Backend
Good news: we're using the exact same patterns you already know!
**What you already have:**

```javascript
// Your familiar Response API pattern
const response = await client.responses.create({
  model: "gpt-4o",
  input: [systemPrompt, userMessage]
});
```

**What we're adding:**

```javascript
// New voice synthesis (same style!)
const speech = await client.audio.speech.create({
  model: "tts-1",
  voice: "alloy",
  input: textToSpeak
});
```

Perfect! Same patterns, just different endpoints.
## 🧠 Understanding Voice Generation Flow
**Simple concept:** Text goes in → natural voice comes out!
```javascript
// What we need to track:
const voiceState = {
  textInput: "Hello, I'm your AI assistant!", // What to say
  selectedVoice: "nova",                      // Who says it
  audioSettings: {                            // How to say it
    speed: 1.0,      // Normal speed
    quality: "hd",   // High definition
    format: "mp3"    // Audio format
  },
  generatedAudio: "audio-file-url"            // Result!
};
```

**Voice options:**

- 🏃 **TTS-1** - Fast generation (great for testing)
- 🌟 **TTS-1-HD** - Premium quality (perfect for production)
- ⚡ **Speed control** - From 0.25x (slow) to 4x (fast)
- 🎵 **Formats** - MP3, Opus, AAC, FLAC
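Since the API only accepts speeds between 0.25x and 4x and a fixed set of formats, it's worth normalizing settings before sending them. A minimal sketch; `normalizeTtsSettings` is a name we made up, not part of any library:

```javascript
// Clamp speed into OpenAI's supported 0.25x-4x range and
// fall back to mp3 for unsupported formats.
const SUPPORTED_FORMATS = ["mp3", "opus", "aac", "flac"];

function normalizeTtsSettings({ speed = 1.0, format = "mp3", model = "tts-1" } = {}) {
  return {
    model: model === "tts-1-hd" ? "tts-1-hd" : "tts-1",
    speed: Math.max(0.25, Math.min(4.0, Number(speed) || 1.0)),
    format: SUPPORTED_FORMATS.includes(format) ? format : "mp3",
  };
}

console.log(normalizeTtsSettings({ speed: 9, format: "wav" }));
// speed is clamped to 4, format falls back to "mp3"
```

Doing this on both the client and the server means a bad slider value or a hand-crafted request can never reach OpenAI with out-of-range parameters.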
## Step 2: Add the Voice Generation Route
Add this to your existing server - same patterns you know and love:
```javascript
import fs from 'fs';
import path from 'path';

// 🎭 VOICE PROFILES: Available AI voices with personalities
const VOICE_PROFILES = {
  alloy:   { name: "Alloy",   description: "Professional and versatile", bestFor: "Business content, presentations" },
  echo:    { name: "Echo",    description: "Calm and soothing",          bestFor: "Meditation, relaxation content" },
  fable:   { name: "Fable",   description: "Expressive storyteller",     bestFor: "Stories, creative content" },
  onyx:    { name: "Onyx",    description: "Deep and authoritative",     bestFor: "News, formal announcements" },
  nova:    { name: "Nova",    description: "Warm and friendly",          bestFor: "Customer service, tutorials" },
  shimmer: { name: "Shimmer", description: "Bright and energetic",       bestFor: "Marketing, upbeat content" }
};

// 🔧 HELPER FUNCTION: Save generated audio to a temp directory
const saveAudioToTemp = async (audioBuffer, format = 'mp3') => {
  const tempDir = path.join(process.cwd(), "temp");

  // Create temp directory if it doesn't exist
  if (!fs.existsSync(tempDir)) {
    fs.mkdirSync(tempDir, { recursive: true });
  }

  // Create unique filename
  const filename = `tts-${Date.now()}.${format}`;
  const filepath = path.join(tempDir, filename);

  // Write audio file
  fs.writeFileSync(filepath, audioBuffer);

  // Auto-cleanup after 1 hour
  setTimeout(() => {
    try {
      if (fs.existsSync(filepath)) {
        fs.unlinkSync(filepath);
        console.log(`🧹 Cleaned up: ${filename}`);
      }
    } catch (error) {
      console.error("Error cleaning up audio file:", error);
    }
  }, 3600000); // 1 hour

  return { filepath, filename };
};
```
```javascript
// 🔊 AI Text-to-Speech endpoint - add this to your existing server
app.post("/api/tts/generate", async (req, res) => {
  try {
    // 🛡️ VALIDATION: Check required inputs
    const {
      text,
      voice = "alloy",
      model = "tts-1",
      speed = 1.0,
      format = "mp3"
    } = req.body;

    if (!text || text.trim() === "") {
      return res.status(400).json({ error: "Text is required", success: false });
    }

    if (text.length > 4096) {
      return res.status(400).json({
        error: "Text too long. Maximum 4096 characters allowed.",
        current_length: text.length,
        success: false
      });
    }

    console.log(`🔊 Generating speech: ${text.substring(0, 50)}... (${voice})`);

    // 🎙️ AI SPEECH GENERATION: Convert text to speech
    const response = await openai.audio.speech.create({
      model: model,                               // tts-1 (fast) or tts-1-hd (high quality)
      voice: voice,                               // AI voice personality
      input: text.trim(),                         // Text to convert
      response_format: format,                    // Audio format (mp3, opus, aac, flac)
      speed: Math.max(0.25, Math.min(4.0, speed)) // Speaking speed (0.25x to 4x)
    });

    // 💾 AUDIO PROCESSING: Save audio file
    const audioBuffer = Buffer.from(await response.arrayBuffer());
    const { filepath, filename } = await saveAudioToTemp(audioBuffer, format);

    // 📤 SUCCESS RESPONSE: Send audio info and download link
    res.json({
      success: true,
      audio: {
        filename: filename,
        format: format,
        size: audioBuffer.length,
        duration_estimate: Math.ceil(text.length / 14), // rough guess: ~14 characters per second
        download_url: `/api/tts/download/${filename}`
      },
      generation: {
        voice: voice,
        voice_info: VOICE_PROFILES[voice],
        model: model,
        speed: speed,
        text_length: text.length
      },
      timestamp: new Date().toISOString()
    });

  } catch (error) {
    // 🚨 ERROR HANDLING: Handle TTS failures
    console.error("Text-to-speech error:", error);
    res.status(500).json({
      error: "Failed to generate speech",
      details: error.message,
      success: false
    });
  }
});
```
```javascript
// 📥 Audio download endpoint - serve generated audio files
app.get("/api/tts/download/:filename", (req, res) => {
  try {
    const { filename } = req.params;
    const filepath = path.join(process.cwd(), "temp", filename);

    // Security check - ensure filename is safe
    if (!filename.match(/^tts-\d+\.(mp3|opus|aac|flac)$/)) {
      return res.status(400).json({ error: "Invalid filename" });
    }

    // Check if file exists
    if (!fs.existsSync(filepath)) {
      return res.status(404).json({ error: "Audio file not found or expired" });
    }

    // Serve audio file with the standard MIME type for each extension
    // (note: mp3 is "audio/mpeg", not "audio/mp3")
    const extension = path.extname(filename).substring(1);
    const mimeTypes = { mp3: 'audio/mpeg', opus: 'audio/ogg', aac: 'audio/aac', flac: 'audio/flac' };
    res.setHeader('Content-Type', mimeTypes[extension] || 'application/octet-stream');
    res.setHeader('Content-Disposition', `attachment; filename="${filename}"`);

    const audioBuffer = fs.readFileSync(filepath);
    res.send(audioBuffer);

  } catch (error) {
    console.error("Audio download error:", error);
    res.status(500).json({
      error: "Failed to download audio",
      message: error.message
    });
  }
});
```
```javascript
// 🎙️ Voice information endpoint - get available voices
app.get("/api/tts/voices", (req, res) => {
  res.json({
    success: true,
    voices: VOICE_PROFILES,
    models: [
      { id: "tts-1",    name: "TTS-1",    description: "Fast, cost-effective synthesis", quality: "standard" },
      { id: "tts-1-hd", name: "TTS-1 HD", description: "High-definition audio quality",  quality: "premium" }
    ],
    formats: ["mp3", "opus", "aac", "flac"],
    speed_range: { min: 0.25, max: 4.0, default: 1.0 },
    text_limit: 4096
  });
});
```

**What this does (step by step):**

- ✅ **Validates text** - Makes sure we have something to say
- 🎭 **Picks voice** - Selects the right AI personality
- 🎙️ **Generates speech** - OpenAI creates natural audio
- 💾 **Saves file** - Stores audio temporarily for download
- 📤 **Returns results** - Sends back audio URL and metadata
- 🧹 **Cleans up** - Removes old files automatically

Same reliable patterns as your chat and image features!
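The route rejects text over 4,096 characters. If you ever want to speak a longer article, one approach is to split it on sentence boundaries and generate one clip per chunk. A rough sketch, assuming a hypothetical helper name (`splitTextForTTS`) and that no single sentence exceeds the limit:

```javascript
// Split long text on sentence boundaries so each chunk stays
// under OpenAI's 4096-character TTS limit.
// Limitation: a single sentence longer than maxLength still produces
// an oversized chunk; real input would need a word-level fallback.
function splitTextForTTS(text, maxLength = 4096) {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [text];
  const chunks = [];
  let current = "";

  for (const sentence of sentences) {
    // Flush the current chunk before it would overflow
    if ((current + sentence).length > maxLength && current) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());

  return chunks;
}
```

Each chunk can then be POSTed to `/api/tts/generate` in sequence, and the resulting clips played back-to-back on the client.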
### Step 2C: Adding Error Handling for TTS
Add this middleware to handle text-to-speech specific errors:
```javascript
// 🚨 TTS ERROR HANDLING: Handle text-to-speech errors
app.use((error, req, res, next) => {
  if (error.message && error.message.includes('Invalid voice')) {
    return res.status(400).json({
      error: "Invalid voice selected. Please choose from: alloy, echo, fable, onyx, nova, shimmer",
      success: false
    });
  }

  if (error.message && error.message.includes('text too long')) {
    return res.status(400).json({
      error: "Text exceeds maximum length of 4096 characters",
      success: false
    });
  }

  next(error);
});
```

Your backend now supports:
- Text chat (existing functionality)
- Streaming chat (existing functionality)
- Image generation (existing functionality)
- Audio transcription (existing functionality)
- File analysis (existing functionality)
- Text-to-speech (new functionality)
---
## 🎨 Step 3: Building the React Text-to-Speech Component
Now let's create a React component for text-to-speech using the same patterns from your existing components.
### **Step 3A: Creating the Text-to-Speech Component**
Create a new file `src/TextToSpeech.jsx`:
```jsx
import { useState, useRef, useEffect } from "react";
import { Volume2, Play, Pause, Download, Settings } from "lucide-react";

function TextToSpeech() {
  // 🧠 STATE: Text-to-speech data management
  const [text, setText] = useState("");                           // Text to convert
  const [selectedVoice, setSelectedVoice] = useState("alloy");    // AI voice selection
  const [audioSettings, setAudioSettings] = useState({            // TTS settings
    model: "tts-1",
    speed: 1.0,
    format: "mp3"
  });
  const [isGenerating, setIsGenerating] = useState(false);        // Processing status
  const [generatedAudio, setGeneratedAudio] = useState([]);       // Generated audio list
  const [currentlyPlaying, setCurrentlyPlaying] = useState(null); // Audio playback state
  const [voices, setVoices] = useState({});                       // Available voices
  const [error, setError] = useState(null);                       // Error messages
  const audioRef = useRef(null);

  // Load available voices on component mount
  useEffect(() => {
    fetchVoices();
  }, []);

  const fetchVoices = async () => {
    try {
      const response = await fetch("http://localhost:8000/api/tts/voices");
      const data = await response.json();
      if (data.success) {
        setVoices(data.voices);
      }
    } catch (error) {
      console.error('Failed to fetch voices:', error);
    }
  };
  // 🔧 FUNCTIONS: Text-to-speech logic engine

  // Main speech generation function
  const generateSpeech = async () => {
    // 🛡️ GUARDS: Prevent invalid generation
    if (!text.trim() || isGenerating) return;

    // 🚀 SETUP: Prepare for generation
    setIsGenerating(true);
    setError(null);

    try {
      // 📤 API CALL: Send to your backend
      const response = await fetch("http://localhost:8000/api/tts/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          text: text.trim(),
          voice: selectedVoice,
          ...audioSettings
        })
      });

      const data = await response.json();

      if (!response.ok) {
        throw new Error(data.error || 'Failed to generate speech');
      }

      // ✅ SUCCESS: Store generated audio
      const newAudio = {
        id: Date.now(),
        text: text.trim(),
        voice: selectedVoice,
        settings: audioSettings,
        audio: data.audio,
        generation: data.generation,
        timestamp: new Date().toISOString()
      };

      setGeneratedAudio(prev => [newAudio, ...prev]);
      setText(""); // Clear input after successful generation

    } catch (error) {
      // 🚨 ERROR HANDLING: Show user-friendly message
      console.error('Speech generation failed:', error);
      setError(error.message || 'Something went wrong while generating speech');
    } finally {
      // 🧹 CLEANUP: Reset generation state
      setIsGenerating(false);
    }
  };
  // Audio playback function
  const playAudio = async (audioItem) => {
    try {
      if (currentlyPlaying?.id === audioItem.id) {
        // Pause current audio
        if (audioRef.current) {
          audioRef.current.pause();
          setCurrentlyPlaying(null);
        }
        return;
      }

      // Stop any currently playing audio
      if (audioRef.current) {
        audioRef.current.pause();
      }

      // Create new audio element
      const audio = new Audio(`http://localhost:8000${audioItem.audio.download_url}`);
      audioRef.current = audio;

      audio.onloadstart = () => setCurrentlyPlaying({ ...audioItem, status: 'loading' });
      audio.oncanplay = () => setCurrentlyPlaying({ ...audioItem, status: 'ready' });
      audio.onplay = () => setCurrentlyPlaying({ ...audioItem, status: 'playing' });
      audio.onpause = () => setCurrentlyPlaying({ ...audioItem, status: 'paused' });
      audio.onended = () => setCurrentlyPlaying(null);
      audio.onerror = () => {
        setCurrentlyPlaying(null);
        setError('Failed to play audio');
      };

      await audio.play();
    } catch (error) {
      console.error('Audio playback error:', error);
      setCurrentlyPlaying(null);
      setError('Failed to play audio');
    }
  };

  // Download audio function
  const downloadAudio = (audioItem) => {
    try {
      const link = document.createElement('a');
      link.href = `http://localhost:8000${audioItem.audio.download_url}`;
      link.download = `speech-${audioItem.id}.${audioItem.audio.format}`;
      document.body.appendChild(link);
      link.click();
      document.body.removeChild(link);
    } catch (error) {
      console.error('Download error:', error);
      setError('Failed to download audio');
    }
  };
  // Sample texts for quick testing
  const sampleTexts = [
    "Welcome to our application! I'm excited to help you with AI-powered text-to-speech.",
    "Once upon a time, in the world of artificial intelligence, voices came alive with just a few lines of code.",
    "This is a test of the emergency broadcast system. This is only a test.",
    "Take a deep breath and relax as you listen to this calming AI-generated voice.",
    "Breaking news: AI technology continues to amaze us with natural-sounding speech synthesis."
  ];

  // Utility functions
  const formatFileSize = (bytes) => {
    if (bytes === 0) return '0 Bytes';
    const k = 1024;
    const sizes = ['Bytes', 'KB', 'MB'];
    const i = Math.floor(Math.log(bytes) / Math.log(k));
    return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i];
  };

  const formatDuration = (seconds) => {
    const mins = Math.floor(seconds / 60);
    const secs = Math.floor(seconds % 60);
    return `${mins}:${secs.toString().padStart(2, '0')}`;
  };
  // 🎨 UI: Interface components
  return (
    <div className="min-h-screen bg-gradient-to-br from-orange-50 to-red-50 flex items-center justify-center p-4">
      <div className="bg-white rounded-2xl shadow-2xl w-full max-w-4xl flex flex-col overflow-hidden">

        {/* Header */}
        <div className="bg-gradient-to-r from-orange-600 to-red-600 text-white p-6">
          <div className="flex items-center space-x-3">
            <div className="w-10 h-10 bg-white bg-opacity-20 rounded-full flex items-center justify-center">
              <Volume2 className="w-5 h-5" />
            </div>
            <div>
              <h1 className="text-xl font-bold">🔊 AI Text-to-Speech</h1>
              <p className="text-orange-100 text-sm">Convert any text to natural speech!</p>
            </div>
          </div>
        </div>
        {/* Voice Settings Section */}
        <div className="p-6 border-b border-gray-200">
          <h3 className="font-semibold text-gray-900 mb-4 flex items-center">
            <Settings className="w-5 h-5 mr-2 text-orange-600" />
            Voice Settings
          </h3>

          <div className="grid grid-cols-1 md:grid-cols-4 gap-4">
            {/* Voice Selection */}
            <div>
              <label className="block text-sm font-medium text-gray-700 mb-2">Voice</label>
              <select
                value={selectedVoice}
                onChange={(e) => setSelectedVoice(e.target.value)}
                disabled={isGenerating}
                className="w-full px-3 py-2 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-orange-500 disabled:bg-gray-100"
              >
                {Object.entries(voices).map(([key, voice]) => (
                  <option key={key} value={key}>
                    {voice.name} - {voice.description}
                  </option>
                ))}
              </select>
            </div>

            {/* Model Selection */}
            <div>
              <label className="block text-sm font-medium text-gray-700 mb-2">Quality</label>
              <select
                value={audioSettings.model}
                onChange={(e) => setAudioSettings(prev => ({ ...prev, model: e.target.value }))}
                disabled={isGenerating}
                className="w-full px-3 py-2 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-orange-500 disabled:bg-gray-100"
              >
                <option value="tts-1">Standard (Fast)</option>
                <option value="tts-1-hd">HD (High Quality)</option>
              </select>
            </div>

            {/* Speed Control */}
            <div>
              <label className="block text-sm font-medium text-gray-700 mb-2">
                Speed ({audioSettings.speed}x)
              </label>
              <input
                type="range"
                min="0.25"
                max="4"
                step="0.05"
                value={audioSettings.speed}
                onChange={(e) => setAudioSettings(prev => ({ ...prev, speed: parseFloat(e.target.value) }))}
                disabled={isGenerating}
                className="w-full h-2 bg-gray-200 rounded-lg appearance-none cursor-pointer disabled:cursor-not-allowed"
              />
            </div>

            {/* Format Selection */}
            <div>
              <label className="block text-sm font-medium text-gray-700 mb-2">Format</label>
              <select
                value={audioSettings.format}
                onChange={(e) => setAudioSettings(prev => ({ ...prev, format: e.target.value }))}
                disabled={isGenerating}
                className="w-full px-3 py-2 border border-gray-300 rounded-lg focus:outline-none focus:ring-2 focus:ring-orange-500 disabled:bg-gray-100"
              >
                <option value="mp3">MP3</option>
                <option value="opus">Opus</option>
                <option value="aac">AAC</option>
                <option value="flac">FLAC</option>
              </select>
            </div>
          </div>
        </div>
        {/* Text Input Section */}
        <div className="p-6 border-b border-gray-200">
          <div className="mb-4">
            <div className="flex justify-between items-center mb-2">
              <label className="block text-sm font-medium text-gray-700">Text to Convert</label>
              <span className="text-sm text-gray-500">{text.length}/4096 characters</span>
            </div>
            <textarea
              value={text}
              onChange={(e) => setText(e.target.value)}
              placeholder="Enter the text you want to convert to speech..."
              className="w-full px-4 py-3 border border-gray-300 rounded-xl focus:outline-none focus:ring-2 focus:ring-orange-500 focus:border-transparent transition-all duration-200 resize-none"
              rows={4}
              maxLength={4096}
              disabled={isGenerating}
            />
          </div>

          {/* Sample Texts */}
          <div className="mb-4">
            <p className="text-sm text-gray-600 mb-2">Quick samples:</p>
            <div className="flex flex-wrap gap-2">
              {sampleTexts.map((sample, index) => (
                <button
                  key={index}
                  onClick={() => setText(sample)}
                  disabled={isGenerating}
                  className="px-3 py-1 text-sm bg-gray-100 hover:bg-orange-100 text-gray-700 hover:text-orange-700 rounded-full transition-colors duration-200 disabled:opacity-50 disabled:cursor-not-allowed"
                >
                  {sample.substring(0, 30)}...
                </button>
              ))}
            </div>
          </div>

          {/* Generate Button */}
          <div className="flex justify-center">
            <button
              onClick={generateSpeech}
              disabled={isGenerating || !text.trim()}
              className="px-8 py-3 bg-gradient-to-r from-orange-600 to-red-600 hover:from-orange-700 hover:to-red-700 disabled:from-gray-300 disabled:to-gray-300 text-white rounded-xl transition-all duration-200 flex items-center space-x-2 shadow-lg disabled:shadow-none"
            >
              {isGenerating ? (
                <>
                  <div className="w-4 h-4 border-2 border-white border-t-transparent rounded-full animate-spin"></div>
                  <span>Generating...</span>
                </>
              ) : (
                <>
                  <Volume2 className="w-4 h-4" />
                  <span>Generate Speech</span>
                </>
              )}
            </button>
          </div>
        </div>
        {/* Results Section */}
        <div className="flex-1 p-6">
          {/* Error Display */}
          {error && (
            <div className="bg-red-50 border border-red-200 rounded-lg p-4 mb-4">
              <p className="text-red-700">
                <strong>Error:</strong> {error}
              </p>
            </div>
          )}

          {/* Generated Audio List */}
          {generatedAudio.length === 0 ? (
            <div className="text-center py-12">
              <div className="w-16 h-16 bg-orange-100 rounded-2xl flex items-center justify-center mx-auto mb-4">
                <Volume2 className="w-8 h-8 text-orange-600" />
              </div>
              <h3 className="text-lg font-semibold text-gray-700 mb-2">
                No Audio Generated Yet
              </h3>
              <p className="text-gray-600 max-w-md mx-auto">
                Enter some text above and click "Generate Speech" to create your first AI voice.
              </p>
            </div>
          ) : (
            <div className="space-y-4">
              <h4 className="font-semibold text-gray-900 mb-4">
                Generated Audio ({generatedAudio.length})
              </h4>

              {generatedAudio.map((audioItem) => (
                <div key={audioItem.id} className="bg-gray-50 rounded-lg p-4 border border-gray-200">
                  <div className="flex items-start justify-between mb-3">
                    <div className="flex-1">
                      <div className="flex items-center space-x-2 mb-2">
                        <div className="p-1 bg-orange-100 rounded">
                          <Volume2 className="w-4 h-4 text-orange-600" />
                        </div>
                        <span className="font-medium text-gray-900 text-sm">
                          {voices[audioItem.voice]?.name || audioItem.voice}
                        </span>
                        <span className="text-xs text-gray-500">
                          {new Date(audioItem.timestamp).toLocaleTimeString()}
                        </span>
                      </div>

                      <p className="text-sm text-gray-700 mb-2 line-clamp-2">
                        {audioItem.text}
                      </p>

                      <div className="flex flex-wrap gap-1 text-xs">
                        <span className="px-2 py-1 bg-orange-100 text-orange-800 rounded-full">
                          {audioItem.settings.model}
                        </span>
                        <span className="px-2 py-1 bg-blue-100 text-blue-800 rounded-full">
                          {audioItem.settings.speed}x speed
                        </span>
                        <span className="px-2 py-1 bg-green-100 text-green-800 rounded-full">
                          {formatFileSize(audioItem.audio.size)}
                        </span>
                        <span className="px-2 py-1 bg-gray-100 text-gray-800 rounded-full">
                          ~{formatDuration(audioItem.audio.duration_estimate)}
                        </span>
                      </div>
                    </div>

                    <div className="flex items-center space-x-2">
                      <button
                        onClick={() => playAudio(audioItem)}
                        className="p-2 bg-orange-500 hover:bg-orange-600 text-white rounded-lg transition-colors duration-200"
                        title={currentlyPlaying?.id === audioItem.id ? "Pause" : "Play"}
                      >
                        {currentlyPlaying?.id === audioItem.id && currentlyPlaying?.status === 'playing' ? (
                          <Pause className="w-4 h-4" />
                        ) : (
                          <Play className="w-4 h-4" />
                        )}
                      </button>

                      <button
                        onClick={() => downloadAudio(audioItem)}
                        className="p-2 bg-green-500 hover:bg-green-600 text-white rounded-lg transition-colors duration-200"
                        title="Download audio"
                      >
                        <Download className="w-4 h-4" />
                      </button>
                    </div>
                  </div>
                </div>
              ))}
            </div>
          )}
        </div>
      </div>
    </div>
  );
}
export default TextToSpeech;
```

### Step 3B: Adding Text-to-Speech to Navigation
Update your `src/App.jsx` to include the new text-to-speech component:
```jsx
import { useState } from "react";
import StreamingChat from "./StreamingChat";
import ImageGenerator from "./ImageGenerator";
import AudioTranscription from "./AudioTranscription";
import FileAnalysis from "./FileAnalysis";
import TextToSpeech from "./TextToSpeech";
import { MessageSquare, Image, Mic, Folder, Volume2 } from "lucide-react";

function App() {
  // 🧠 STATE: Navigation management
  const [currentView, setCurrentView] = useState("chat"); // 'chat', 'images', 'audio', 'files', or 'speech'

  // 🎨 UI: Main app with navigation
  return (
    <div className="min-h-screen bg-gray-100">
      {/* Navigation Header */}
      <nav className="bg-white shadow-sm border-b border-gray-200">
        <div className="max-w-6xl mx-auto px-4">
          <div className="flex items-center justify-between h-16">
            {/* Logo */}
            <div className="flex items-center space-x-3">
              <div className="w-8 h-8 bg-gradient-to-r from-blue-500 to-purple-600 rounded-lg flex items-center justify-center">
                <span className="text-white font-bold text-sm">AI</span>
              </div>
              <h1 className="text-xl font-bold text-gray-900">OpenAI Mastery</h1>
            </div>

            {/* Navigation Buttons */}
            <div className="flex space-x-2">
              <button
                onClick={() => setCurrentView("chat")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "chat"
                    ? "bg-blue-100 text-blue-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <MessageSquare className="w-4 h-4" />
                <span>Chat</span>
              </button>

              <button
                onClick={() => setCurrentView("images")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "images"
                    ? "bg-purple-100 text-purple-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <Image className="w-4 h-4" />
                <span>Images</span>
              </button>

              <button
                onClick={() => setCurrentView("audio")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "audio"
                    ? "bg-blue-100 text-blue-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <Mic className="w-4 h-4" />
                <span>Audio</span>
              </button>

              <button
                onClick={() => setCurrentView("files")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "files"
                    ? "bg-green-100 text-green-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <Folder className="w-4 h-4" />
                <span>Files</span>
              </button>

              <button
                onClick={() => setCurrentView("speech")}
                className={`px-4 py-2 rounded-lg flex items-center space-x-2 transition-all duration-200 ${
                  currentView === "speech"
                    ? "bg-orange-100 text-orange-700 shadow-sm"
                    : "text-gray-600 hover:text-gray-900 hover:bg-gray-100"
                }`}
              >
                <Volume2 className="w-4 h-4" />
                <span>Speech</span>
              </button>
            </div>
          </div>
        </div>
      </nav>

      {/* Main Content */}
      <main className="h-[calc(100vh-4rem)]">
        {currentView === "chat" && <StreamingChat />}
        {currentView === "images" && <ImageGenerator />}
        {currentView === "audio" && <AudioTranscription />}
        {currentView === "files" && <FileAnalysis />}
        {currentView === "speech" && <TextToSpeech />}
      </main>
    </div>
  );
}

export default App;
```

## 🧪 Testing Your Text-to-Speech
Let's test your text-to-speech feature step by step to make sure everything works correctly.
### Step 1: Backend Route Test
First, verify your backend route works by testing it directly:
Test with a simple text:
```bash
curl -X POST http://localhost:8000/api/tts/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, this is a test of AI voice synthesis.", "voice": "alloy", "model": "tts-1"}'
```

**Expected response:**

```json
{
  "success": true,
  "audio": {
    "filename": "tts-1234567890.mp3",
    "format": "mp3",
    "size": 15420,
    "duration_estimate": 3,
    "download_url": "/api/tts/download/tts-1234567890.mp3"
  },
  "generation": {
    "voice": "alloy",
    "voice_info": { "name": "Alloy", "description": "Professional and versatile" },
    "model": "tts-1",
    "speed": 1.0,
    "text_length": 44
  }
}
```

### Step 2: Full Application Test
Start both servers:
**Backend** (in your backend folder):

```bash
npm run dev
```

**Frontend** (in your frontend folder):

```bash
npm run dev
```

**Test the complete flow:**
- **Navigate to Speech** → click the "Speech" tab in navigation
- **Select voice settings** → choose voice, quality, speed, and format
- **Enter text** → type or select a sample text
- **Generate speech** → click "Generate Speech" and see the loading state
- **Listen to audio** → click the play button to hear the generated voice
- **Download audio** → test downloading the speech file
- **Try different voices** → test all six AI voices with the same text
### Step 3: Voice Comparison Test
Test all six voices with the same text to hear their personalities:

- 🎙️ **Alloy:** Professional and neutral
- 🌊 **Echo:** Calm and soothing
- 📖 **Fable:** Expressive storyteller
- 🎯 **Onyx:** Deep and authoritative
- ☀️ **Nova:** Warm and friendly
- ✨ **Shimmer:** Bright and energetic

**Expected behavior:**
- Each voice has a distinct personality and tone
- Audio quality is clear and natural
- Playback controls work smoothly
- Download generates proper audio files
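If you'd rather drive the comparison from a script than click through the UI, you can build one request body per voice and POST each to your `/api/tts/generate` route. A small sketch; `buildVoiceComparisonRequests` is a helper name we invented:

```javascript
// The six voice ids supported by the backend route
const VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"];

// Build one /api/tts/generate request body per voice so the same
// sentence can be generated with every personality.
function buildVoiceComparisonRequests(text, model = "tts-1") {
  return VOICES.map((voice) => ({
    text,
    voice,
    model,
    speed: 1.0,
    format: "mp3",
  }));
}

const requests = buildVoiceComparisonRequests("The quick brown fox jumps over the lazy dog.");
console.log(requests.length); // 6
```

Each payload can then be sent with `fetch("http://localhost:8000/api/tts/generate", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(payload) })`, giving you six clips of the same sentence to compare side by side.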
## ✅ What You Built
Congratulations! You've completed your comprehensive OpenAI mastery application with text-to-speech:
- ✅ **Extended your backend** with voice synthesis and audio file management
- ✅ **Added a React speech component** following the same patterns as your other features
- ✅ **Implemented six AI voices** with distinct personalities and use cases
- ✅ **Created flexible audio settings** for quality, speed, and format control
- ✅ **Added playback functionality** with play/pause controls
- ✅ **Maintained consistent design** with your existing application
Your complete application now has:
- Text chat with streaming responses
- Image generation with DALL-E 3 and GPT-Image-1
- Audio transcription with Whisper voice recognition
- File analysis with intelligent document processing
- Text-to-speech with six AI voice personalities
- Unified navigation between all features
- Professional UI with consistent TailwindCSS styling
🎉 You've built a complete OpenAI mastery application! Your users can now chat with AI, generate images, transcribe audio, analyze files, and hear AI responses spoken aloud, all in one seamless experience.
Your application demonstrates mastery of OpenAI's entire ecosystem and provides a solid foundation for building even more advanced AI-powered applications. 🚀