# RunPod WhisperX Integration Setup

This guide explains how to set up and use the RunPod WhisperX endpoint for transcription in the canvas website.

## Overview

The transcription system can now use a hosted WhisperX endpoint on RunPod instead of running the Whisper model locally in the browser. This provides:
- Better accuracy with WhisperX's advanced features
- Faster processing (no model download needed)
- Reduced client-side resource usage
- Support for longer audio files

## Prerequisites

1. A RunPod account with an active WhisperX endpoint
2. Your RunPod API key
3. Your RunPod endpoint ID

## Configuration

### Environment Variables

Add the following environment variables to your `.env.local` file (or to your deployment environment):

```bash
# RunPod Configuration
VITE_RUNPOD_API_KEY=your_runpod_api_key_here
VITE_RUNPOD_ENDPOINT_ID=your_endpoint_id_here
```

Or, if you are using Next.js:

```bash
NEXT_PUBLIC_RUNPOD_API_KEY=your_runpod_api_key_here
NEXT_PUBLIC_RUNPOD_ENDPOINT_ID=your_endpoint_id_here
```

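
As a rough illustration of how these variables might be consumed, the sketch below reads both naming schemes from a plain object and returns `null` when either value is missing. The helper name `readRunPodConfig` and the `RunPodEnvConfig` type are hypothetical and are not taken from the project's source.

```typescript
// Hypothetical helper for illustration; the project's real lookup may differ.
interface RunPodEnvConfig {
  apiKey: string
  endpointId: string
}

type EnvLike = Record<string, string | undefined>

// Returns the RunPod settings if both values are present, otherwise null.
// Accepts either the Vite (VITE_) or the Next.js (NEXT_PUBLIC_) names.
function readRunPodConfig(env: EnvLike): RunPodEnvConfig | null {
  const apiKey = (env.VITE_RUNPOD_API_KEY ?? env.NEXT_PUBLIC_RUNPOD_API_KEY ?? '').trim()
  const endpointId = (env.VITE_RUNPOD_ENDPOINT_ID ?? env.NEXT_PUBLIC_RUNPOD_ENDPOINT_ID ?? '').trim()
  if (apiKey === '' || endpointId === '') return null
  return { apiKey, endpointId }
}
```

In a Vite app this could be called as `readRunPodConfig(import.meta.env)`. Next.js only inlines literal `process.env.NEXT_PUBLIC_*` references, so there the object would need to be built from explicit property reads. Both prefixes embed the values in the client bundle, so the key is visible to anyone who loads the page.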
### Getting Your RunPod Credentials

1. **API Key**:
   - Go to [RunPod Settings](https://www.runpod.io/console/user/settings)
   - Navigate to the API Keys section
   - Create a new API key or copy an existing one

2. **Endpoint ID**:
   - Go to [RunPod Serverless Endpoints](https://www.runpod.io/console/serverless)
   - Find your WhisperX endpoint
   - Copy the endpoint ID from the URL or from the endpoint details
   - Example: If your endpoint URL is `https://api.runpod.ai/v2/lrtisuv8ixbtub/run`, then `lrtisuv8ixbtub` is your endpoint ID

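
The endpoint ID is the path segment that follows `/v2/` in the endpoint URL. A minimal sketch of pulling it out programmatically, assuming the URL shape shown in the example above (the function name `extractEndpointId` is hypothetical):

```typescript
// Extracts the endpoint ID from a URL of the form
// https://api.runpod.ai/v2/{ENDPOINT_ID}/run. Returns null if the URL
// does not match that shape.
function extractEndpointId(url: string): string | null {
  const match = /^https:\/\/api\.runpod\.ai\/v2\/([^/?#]+)(?:[/?#]|$)/.exec(url.trim())
  return match ? (match[1] ?? null) : null
}
```

For the example URL above, `extractEndpointId('https://api.runpod.ai/v2/lrtisuv8ixbtub/run')` yields `lrtisuv8ixbtub`.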
## Usage

### Automatic Detection

The transcription hook automatically detects whether RunPod is configured and, if so, uses it instead of the local Whisper model. No code changes are needed!

### Manual Override

If you want to explicitly control which transcription method to use:

```typescript
import { useWhisperTranscription } from '@/hooks/useWhisperTranscriptionSimple'

const {
  isRecording,
  transcript,
  startRecording,
  stopRecording
} = useWhisperTranscription({
  useRunPod: true, // Force RunPod usage
  language: 'en',
  onTranscriptUpdate: (text) => {
    console.log('New transcript:', text)
  }
})
```

Or, to force the local model:

```typescript
useWhisperTranscription({
  useRunPod: false, // Force local Whisper model
  // ... other options
})
```

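
The interaction between automatic detection and the `useRunPod` override can be pictured as a small decision function. This is only a sketch of the precedence described above, not the hook's actual code; `chooseBackend`, `BackendChoice`, and `OverrideOptions` are hypothetical names, and it assumes an explicit `useRunPod` value always wins over detection.

```typescript
type BackendChoice = 'runpod' | 'local'

interface OverrideOptions {
  useRunPod?: boolean
}

// Assumed precedence: an explicit useRunPod value wins; otherwise RunPod is
// used only when its credentials are configured.
function chooseBackend(options: OverrideOptions, runPodConfigured: boolean): BackendChoice {
  if (options.useRunPod === true) return 'runpod'
  if (options.useRunPod === false) return 'local'
  return runPodConfigured ? 'runpod' : 'local'
}
```

Under this assumption, forcing `useRunPod: true` without credentials still selects RunPod, so the configuration error surfaces instead of being silently masked.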
## API Format

The integration sends audio data to your RunPod endpoint in the following format:

```json
{
  "input": {
    "audio": "base64_encoded_audio_data",
    "audio_format": "audio/wav",
    "language": "en",
    "task": "transcribe"
  }
}
```

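
A typed builder for this payload might look like the sketch below. The field names come from the JSON above; the function name `buildTranscriptionRequest`, the `WhisperXInput` type, and the default values are assumptions for illustration, and the audio is expected to be base64-encoded already.

```typescript
interface WhisperXInput {
  audio: string // base64-encoded audio data
  audio_format: string // MIME type, e.g. "audio/wav"
  language: string
  task: 'transcribe'
}

// Builds the request body in the shape shown above.
function buildTranscriptionRequest(
  audioBase64: string,
  language = 'en',
  audioFormat = 'audio/wav',
): { input: WhisperXInput } {
  if (audioBase64.length === 0) {
    throw new Error('audio payload is empty')
  }
  return {
    input: {
      audio: audioBase64,
      audio_format: audioFormat,
      language,
      task: 'transcribe',
    },
  }
}
```

Keeping the payload construction in one function means a differently shaped endpoint only requires a change in one place.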
### Expected Response Format

The endpoint should return one of these formats:

**Direct Response:**
```json
{
  "output": {
    "text": "Transcribed text here"
  }
}
```

**Or with segments:**
```json
{
  "output": {
    "segments": [
      {
        "start": 0.0,
        "end": 2.5,
        "text": "Transcribed text here"
      }
    ]
  }
}
```

**Async Job Pattern:**
```json
{
  "id": "job-id-123",
  "status": "IN_QUEUE"
}
```

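
The three response shapes above can be told apart with a small parser. The following is a sketch rather than the integration's actual code: the names `parseRunPodResponse`, `RunPodResponse`, and `ParsedResponse` are hypothetical, joining segments with a single space is an assumption, and `IN_PROGRESS` is assumed as a second non-final status next to the `IN_QUEUE` value shown above.

```typescript
interface TranscriptSegment {
  start: number
  end: number
  text: string
}

interface RunPodResponse {
  id?: string
  status?: string
  output?: { text?: string; segments?: TranscriptSegment[] }
}

type ParsedResponse =
  | { kind: 'text'; text: string }
  | { kind: 'pending'; jobId: string }
  | { kind: 'empty' }

function parseRunPodResponse(res: RunPodResponse): ParsedResponse {
  const output = res.output
  // Direct response: output.text holds the full transcript.
  if (output && typeof output.text === 'string' && output.text.trim() !== '') {
    return { kind: 'text', text: output.text.trim() }
  }
  // Segment response: concatenate the text of each segment in order.
  if (output && Array.isArray(output.segments) && output.segments.length > 0) {
    const text = output.segments.map((s) => s.text.trim()).join(' ')
    return { kind: 'text', text }
  }
  // Async job: no output yet, only a job ID and a non-final status.
  if (res.id && (res.status === 'IN_QUEUE' || res.status === 'IN_PROGRESS')) {
    return { kind: 'pending', jobId: res.id }
  }
  // Nothing usable, e.g. the "No transcription text found" case.
  return { kind: 'empty' }
}
```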
The integration automatically handles async jobs by polling the status endpoint until completion.

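
A polling loop of this kind could be sketched as follows. This is not the integration's actual implementation: `pollJob`, `statusUrl`, and `isTerminal` are hypothetical names, the interval and attempt limit are arbitrary, and the set of final status names is an assumption based on RunPod's documented job states. The status fetch is injected so the loop stays independent of the HTTP client.

```typescript
interface JobState {
  id: string
  status: string
  output?: unknown
}

// Status URL for a job on a RunPod serverless endpoint.
function statusUrl(endpointId: string, jobId: string): string {
  return `https://api.runpod.ai/v2/${endpointId}/status/${jobId}`
}

// Assumed final states; any other status means the job is still pending.
function isTerminal(status: string): boolean {
  return ['COMPLETED', 'FAILED', 'CANCELLED', 'TIMED_OUT'].indexOf(status) !== -1
}

// Calls fetchStatus repeatedly until the job reaches a final state or the
// attempt limit is hit.
async function pollJob(
  fetchStatus: () => Promise<JobState>,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<JobState> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = await fetchStatus()
    if (isTerminal(state.status)) return state
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
  }
  throw new Error(`job did not finish after ${maxAttempts} polls`)
}
```

In practice `fetchStatus` would issue a GET request to `statusUrl(endpointId, jobId)` with the same `Authorization` header used for the initial request.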
## Customizing the API Request

If your WhisperX endpoint expects a different request format, you can modify `src/lib/runpodApi.ts`:

```typescript
// In the transcribeWithRunPod function
const requestBody = {
  input: {
    // Adjust these fields based on your endpoint
    audio: audioBase64,
    // Add or modify fields as needed
  }
}
```

## Troubleshooting

### "RunPod API key or endpoint ID not configured"

- Ensure the environment variables are set correctly
- Restart your development server after adding environment variables
- Check that the variable names match exactly (they are case-sensitive)

### "RunPod API error: 401"

- Verify that your API key is correct
- Check that your API key has not expired
- Ensure you're using the correct API key format

### "RunPod API error: 404"

- Verify that your endpoint ID is correct
- Check that your endpoint is active in the RunPod console
- Ensure the endpoint URL format matches: `https://api.runpod.ai/v2/{ENDPOINT_ID}/run`

### "No transcription text found in RunPod response"

- Check that your endpoint's response format matches the expected format
- Verify that your WhisperX endpoint is configured correctly
- Check the browser console for detailed error messages

### "Failed to return job results" (400 Bad Request)

This error occurs on the **server side** when your WhisperX endpoint tries to return results. This typically means:

1. **Response format mismatch**: Your endpoint's response doesn't match RunPod's expected format
   - Ensure your endpoint returns: `{"output": {"text": "..."}}` or `{"output": {"segments": [...]}}`
   - The response must be valid JSON
   - Check your endpoint handler code to ensure it's returning the correct structure

2. **Response size limits**: The response might be too large
   - Try with shorter audio files first
   - Check RunPod's response size limits

3. **Timeout issues**: The endpoint might be taking too long to process
   - Check your endpoint logs for processing time
   - Consider optimizing your WhisperX model configuration

4. **Check endpoint handler**: Review your WhisperX endpoint's `handler.py` or equivalent:
   ```python
   # Example correct format
   def handler(event):
       # ... process audio ...
       # transcription_text is the string produced by the processing step above.
       # Note: the RunPod Python SDK typically places the handler's return value
       # under "output" in the job result itself. If responses arrive
       # double-nested, return only the inner dict.
       return {
           "output": {
               "text": transcription_text
           }
       }
   ```

### Transcription not working

- Check the browser console for errors
- Verify your endpoint is active and responding
- Test your endpoint directly using curl or Postman
- Ensure the audio format is supported (WAV format is recommended)
- Check the RunPod endpoint logs for server-side errors

## Testing Your Endpoint

You can test your RunPod endpoint directly:

```bash
curl -X POST https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "input": {
      "audio": "base64_audio_data_here",
      "audio_format": "audio/wav",
      "language": "en"
    }
  }'
```

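
The same request can be assembled in TypeScript for use with `fetch`. The sketch below only builds the URL and request options, mirroring the curl command above; `buildRunRequest` and `RunRequest` are hypothetical names, not part of the integration.

```typescript
interface RunRequest {
  url: string
  init: {
    method: 'POST'
    headers: Record<string, string>
    body: string
  }
}

// Mirrors the curl command: POST to /run with a JSON body and a Bearer token.
function buildRunRequest(
  endpointId: string,
  apiKey: string,
  input: Record<string, unknown>,
): RunRequest {
  return {
    url: `https://api.runpod.ai/v2/${encodeURIComponent(endpointId)}/run`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ input }),
    },
  }
}
```

A caller would then run `const { url, init } = buildRunRequest(id, key, input)` followed by `await fetch(url, init)` and inspect the JSON response.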
## Fallback Behavior

The system chooses a transcription method in this order:
1. Use RunPod if it is configured
2. Fall back to the local Whisper model if RunPod fails or is not configured
3. Show error messages if both methods fail

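
The fallback order above can be expressed as a short async function. This is a sketch under assumptions, not the hook's actual code: `transcribeWithFallback` and `Transcriber` are hypothetical names, and each backend is modeled as a function from audio bytes to text.

```typescript
type Transcriber = (audio: Uint8Array) => Promise<string>

// Tries RunPod first (when configured), then the local model, and throws a
// combined error only if both fail.
async function transcribeWithFallback(
  audio: Uint8Array,
  runPod: Transcriber | null, // null when RunPod is not configured
  local: Transcriber,
): Promise<{ text: string; backend: 'runpod' | 'local' }> {
  let runPodError: unknown = null
  if (runPod) {
    try {
      return { text: await runPod(audio), backend: 'runpod' }
    } catch (err) {
      runPodError = err // remember the failure, then fall through to local
    }
  }
  try {
    return { text: await local(audio), backend: 'local' }
  } catch (localError) {
    const runPodReason = runPodError === null ? 'not configured' : String(runPodError)
    throw new Error(`Transcription failed. RunPod: ${runPodReason}; local: ${String(localError)}`)
  }
}
```

Returning the backend alongside the text lets the caller show which method produced the transcript.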
## Performance Considerations

- **RunPod**: Better for longer audio files and higher accuracy, but requires a network connection
- **Local Model**: Works offline, but requires a model download and uses more client resources

## Support

For issues specific to:
- **RunPod API**: Check the [RunPod Documentation](https://docs.runpod.io)
- **WhisperX**: Check your WhisperX endpoint configuration
- **Integration**: Check the browser console for detailed error messages