RunPod WhisperX Integration Setup
This guide explains how to set up and use the RunPod WhisperX endpoint for transcription in the canvas website.
Overview
The transcription system can now use a hosted WhisperX endpoint on RunPod instead of running the Whisper model locally in the browser. This provides:
- Better accuracy with WhisperX's advanced features
- Faster processing (no model download needed)
- Reduced client-side resource usage
- Support for longer audio files
Prerequisites
- A RunPod account with an active WhisperX endpoint
- Your RunPod API key
- Your RunPod endpoint ID
Configuration
Environment Variables
Add the following environment variables to your .env.local file (or your deployment environment):
# RunPod Configuration
VITE_RUNPOD_API_KEY=your_runpod_api_key_here
VITE_RUNPOD_ENDPOINT_ID=your_endpoint_id_here
Or if using Next.js:
NEXT_PUBLIC_RUNPOD_API_KEY=your_runpod_api_key_here
NEXT_PUBLIC_RUNPOD_ENDPOINT_ID=your_endpoint_id_here
Getting Your RunPod Credentials
- API Key:
  - Go to RunPod Settings
  - Navigate to API Keys section
  - Create a new API key or copy an existing one
- Endpoint ID:
  - Go to RunPod Serverless Endpoints
  - Find your WhisperX endpoint
  - Copy the endpoint ID from the URL or endpoint details
  - Example: If your endpoint URL is https://api.runpod.ai/v2/lrtisuv8ixbtub/run, then lrtisuv8ixbtub is your endpoint ID
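If you only have the endpoint URL at hand, the ID can be extracted programmatically. The following is a minimal sketch; `endpointIdFromUrl` is a hypothetical helper, not part of the integration, and it assumes the URL follows the `https://api.runpod.ai/v2/<id>/<operation>` pattern from the example above.

```typescript
// Hypothetical helper (not part of the integration): pulls the endpoint ID
// out of a URL shaped like https://api.runpod.ai/v2/<id>/<operation>.
function endpointIdFromUrl(url: string): string | null {
  const match = /^https:\/\/api\.runpod\.ai\/v2\/([^/?#]+)(?:[/?#]|$)/.exec(url);
  // Return null rather than throwing so callers can show their own error.
  return match?.[1] ?? null;
}

// Prints: lrtisuv8ixbtub
console.log(endpointIdFromUrl("https://api.runpod.ai/v2/lrtisuv8ixbtub/run"));
```

URLs on any other host yield `null`, which avoids silently accepting a mistyped address.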
Usage
Automatic Detection
The transcription hook automatically detects if RunPod is configured and uses it instead of the local Whisper model. No code changes are needed!
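The detection can be pictured as a simple check on the two environment variables. This is a sketch only: `RunPodEnv` and `isRunPodConfigured` are hypothetical names, the Vite variable names are taken from the Configuration section, and the hook's real check may differ.

```typescript
// Assumed shape of the relevant part of the environment object
// (import.meta.env in a Vite app). Variable names come from the
// Configuration section of this guide.
type RunPodEnv = {
  VITE_RUNPOD_API_KEY?: string;
  VITE_RUNPOD_ENDPOINT_ID?: string;
};

// Hypothetical check: RunPod counts as configured only when both values
// are present and non-empty after trimming whitespace.
function isRunPodConfigured(env: RunPodEnv): boolean {
  const apiKey = env.VITE_RUNPOD_API_KEY?.trim() ?? "";
  const endpointId = env.VITE_RUNPOD_ENDPOINT_ID?.trim() ?? "";
  return apiKey.length > 0 && endpointId.length > 0;
}
```

In a Vite project this would be called as `isRunPodConfigured(import.meta.env)`; a Next.js project would read the `NEXT_PUBLIC_` variants instead.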
Manual Override
If you want to explicitly control which transcription method to use:
import { useWhisperTranscription } from '@/hooks/useWhisperTranscriptionSimple'

const {
  isRecording,
  transcript,
  startRecording,
  stopRecording
} = useWhisperTranscription({
  useRunPod: true, // Force RunPod usage
  language: 'en',
  onTranscriptUpdate: (text) => {
    console.log('New transcript:', text)
  }
})
Or to force local model:
useWhisperTranscription({
  useRunPod: false, // Force local Whisper model
  // ... other options
})
API Format
The integration sends audio data to your RunPod endpoint in the following format:
{
  "input": {
    "audio": "base64_encoded_audio_data",
    "audio_format": "audio/wav",
    "language": "en",
    "task": "transcribe"
  }
}
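As an illustration of how such a body could be assembled from raw audio bytes, here is a minimal sketch. `bytesToBase64` and `buildRequestBody` are hypothetical helpers; the field names come from the JSON above, and the fixed `audio/wav` format is an assumption.

```typescript
// Request shape taken from the JSON format shown in this section.
interface RunPodRequest {
  input: {
    audio: string;        // base64-encoded audio bytes
    audio_format: string; // MIME type, e.g. "audio/wav"
    language: string;
    task: "transcribe";
  };
}

// Base64-encode raw bytes. btoa expects a "binary string", so each byte
// is first mapped to the character with the same code point.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return btoa(binary);
}

// Hypothetical builder; assumes the audio is already WAV-encoded.
function buildRequestBody(bytes: Uint8Array, language = "en"): RunPodRequest {
  return {
    input: {
      audio: bytesToBase64(bytes),
      audio_format: "audio/wav",
      language,
      task: "transcribe",
    },
  };
}
```

The result can be passed through `JSON.stringify` and sent as the POST body.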
Expected Response Format
The endpoint should return one of these formats:
Direct Response:
{
  "output": {
    "text": "Transcribed text here"
  }
}
Or with segments:
{
  "output": {
    "segments": [
      {
        "start": 0.0,
        "end": 2.5,
        "text": "Transcribed text here"
      }
    ]
  }
}
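A client has to normalize these two output shapes into a single string. The sketch below shows one way to do that; `extractTranscript` is a hypothetical name, and joining segment texts with a single space is an assumption rather than the integration's confirmed behavior.

```typescript
interface Segment {
  start: number;
  end: number;
  text: string;
}

interface RunPodOutput {
  text?: string;
  segments?: Segment[];
}

// Returns the transcript from either output shape, or null when neither
// shape yields any text.
function extractTranscript(response: { output?: RunPodOutput }): string | null {
  const output = response.output;
  if (!output) return null;

  // Shape 1: a single "text" field.
  if (typeof output.text === "string" && output.text.trim() !== "") {
    return output.text.trim();
  }

  // Shape 2: a list of timed segments, joined in order.
  if (Array.isArray(output.segments)) {
    const joined = output.segments
      .map((segment) => segment.text.trim())
      .filter((text) => text !== "")
      .join(" ");
    return joined !== "" ? joined : null;
  }

  return null;
}
```

A `null` result corresponds to the case where no transcription text can be found in the response.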
Async Job Pattern:
{
  "id": "job-id-123",
  "status": "IN_QUEUE"
}
The integration automatically handles async jobs by polling the status endpoint until completion.
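The polling loop can be sketched roughly as follows. This is not the integration's actual code: `nextStep` and `pollJob` are hypothetical names, the interval and attempt limit are arbitrary, and the sketch assumes that `COMPLETED` marks success, that `IN_PROGRESS` (like `IN_QUEUE`) means the job is still pending, and that any other status is a failure. The status lookup is injected so the loop does not depend on a particular HTTP client; in practice it would query `https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{JOB_ID}`.

```typescript
interface JobState {
  id: string;
  status: string;
  output?: unknown;
}

// Decide what the poller should do for a given status.
// Assumption: anything other than the three named states is a failure.
function nextStep(status: string): "wait" | "done" | "fail" {
  if (status === "COMPLETED") return "done";
  if (status === "IN_QUEUE" || status === "IN_PROGRESS") return "wait";
  return "fail";
}

// Poll until the job completes, fails, or the attempt limit is reached.
async function pollJob(
  jobId: string,
  fetchStatus: (jobId: string) => Promise<JobState>,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<JobState> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = await fetchStatus(jobId);
    const step = nextStep(state.status);
    if (step === "done") return state;
    if (step === "fail") {
      throw new Error(`RunPod job ${jobId} ended with status ${state.status}`);
    }
    // Still queued or running: wait before asking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`RunPod job ${jobId} did not finish after ${maxAttempts} attempts`);
}
```

Capping the number of attempts keeps a stuck job from polling forever.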
Customizing the API Request
If your WhisperX endpoint expects a different request format, you can modify src/lib/runpodApi.ts:
// In transcribeWithRunPod function
const requestBody = {
  input: {
    // Adjust these fields based on your endpoint
    audio: audioBase64,
    // Add or modify fields as needed
  }
}
Troubleshooting
"RunPod API key or endpoint ID not configured"
- Ensure environment variables are set correctly
- Restart your development server after adding environment variables
- Check that variable names match exactly (case-sensitive)
"RunPod API error: 401"
- Verify your API key is correct
- Check that your API key has not expired
- Ensure you're using the correct API key format
"RunPod API error: 404"
- Verify your endpoint ID is correct
- Check that your endpoint is active in the RunPod console
- Ensure the endpoint URL format matches: https://api.runpod.ai/v2/{ENDPOINT_ID}/run
"No transcription text found in RunPod response"
- Check your endpoint's response format matches the expected format
- Verify your WhisperX endpoint is configured correctly
- Check the browser console for detailed error messages
"Failed to return job results" (400 Bad Request)
This error occurs on the server side when your WhisperX endpoint tries to return results. This typically means:
- Response format mismatch: Your endpoint's response doesn't match RunPod's expected format
  - Ensure your endpoint returns {"output": {"text": "..."}} or {"output": {"segments": [...]}}
  - The response must be valid JSON
  - Check your endpoint handler code to ensure it's returning the correct structure
- Response size limits: The response might be too large
  - Try with shorter audio files first
  - Check RunPod's response size limits
- Timeout issues: The endpoint might be taking too long to process
  - Check your endpoint logs for processing time
  - Consider optimizing your WhisperX model configuration
- Check endpoint handler: Review your WhisperX endpoint's handler.py or equivalent:
  # Example correct format
  def handler(event):
      # ... process audio ...
      return {
          "output": {
              "text": transcription_text
          }
      }
Transcription not working
- Check browser console for errors
- Verify your endpoint is active and responding
- Test your endpoint directly using curl or Postman
- Ensure audio format is supported (WAV format is recommended)
- Check RunPod endpoint logs for server-side errors
Testing Your Endpoint
You can test your RunPod endpoint directly:
curl -X POST https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "input": {
      "audio": "base64_audio_data_here",
      "audio_format": "audio/wav",
      "language": "en"
    }
  }'
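The same request can be issued from TypeScript. The sketch below only builds the URL, headers, and body, mirroring the curl command above; `buildRunRequest` is a hypothetical helper, and the actual `fetch` call is left as a comment because it needs real credentials.

```typescript
interface RunRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical builder mirroring the curl command: same URL pattern,
// same two headers, and the payload wrapped in an "input" object.
function buildRunRequest(
  endpointId: string,
  apiKey: string,
  input: Record<string, unknown>,
): RunRequest {
  return {
    url: `https://api.runpod.ai/v2/${encodeURIComponent(endpointId)}/run`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ input }),
    },
  };
}

// Usage (performs a real network call, so it is left commented out):
// const { url, init } = buildRunRequest("YOUR_ENDPOINT_ID", "YOUR_API_KEY", {
//   audio: "base64_audio_data_here",
//   audio_format: "audio/wav",
//   language: "en",
// });
// const response = await fetch(url, init);
// console.log(response.status, await response.json());
```

Separating request construction from sending makes it easy to log the exact payload when comparing against a working curl call.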
Fallback Behavior
The system selects a transcription method in the following order:
- Try RunPod first if it is configured
- Fall back to the local Whisper model if RunPod fails or is not configured
- Show an error message if both methods fail
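The fallback order described above can be sketched as a small loop over the available methods. All names here (`backendOrder`, `transcribeWithFallback`, the `Transcriber` type) are hypothetical, and the two transcriber functions are assumed to be supplied by the caller; the real hook may structure this differently.

```typescript
type Backend = "runpod" | "local";
type Transcriber = (audio: Uint8Array) => Promise<string>;

// RunPod goes first only when it is configured; the local model is
// always the last resort.
function backendOrder(runPodConfigured: boolean): Backend[] {
  return runPodConfigured ? ["runpod", "local"] : ["local"];
}

// Try each backend in order and return the first successful transcript.
// If every backend throws, report all failure reasons in one error.
async function transcribeWithFallback(
  audio: Uint8Array,
  runPodConfigured: boolean,
  transcribers: Record<Backend, Transcriber>,
): Promise<string> {
  const failures: string[] = [];
  for (const backend of backendOrder(runPodConfigured)) {
    try {
      return await transcribers[backend](audio);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${backend}: ${reason}`);
    }
  }
  throw new Error(`All transcription methods failed (${failures.join("; ")})`);
}
```

Collecting the failure reasons means the final error message shows why RunPod failed as well as why the local model failed.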
Performance Considerations
- RunPod: Better for longer audio files and higher accuracy, but requires a network connection
- Local Model: Works offline, but requires a model download and uses more client resources
Support
For issues specific to:
- RunPod API: Check RunPod Documentation
- WhisperX: Check your WhisperX endpoint configuration
- Integration: Check browser console for detailed error messages