RunPod WhisperX Integration Setup

This guide explains how to set up and use the RunPod WhisperX endpoint for transcription in the canvas website.

Overview

The transcription system can now use a hosted WhisperX endpoint on RunPod instead of running the Whisper model locally in the browser. This provides:

Better accuracy with WhisperX's advanced features
Faster processing (no model download needed)
Reduced client-side resource usage
Support for longer audio files

Prerequisites

A RunPod account with an active WhisperX endpoint
Your RunPod API key
Your RunPod endpoint ID

Configuration

Environment Variables

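Variables prefixed with VITE_ (or NEXT_PUBLIC_ in Next.js) are bundled into client-side code, so the API key will be readable in the browser. Add the following to your .env.local file (or to your deployment environment):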

# RunPod Configuration
VITE_RUNPOD_API_KEY=your_runpod_api_key_here
VITE_RUNPOD_ENDPOINT_ID=your_endpoint_id_here

Or, if using Next.js:

NEXT_PUBLIC_RUNPOD_API_KEY=your_runpod_api_key_here
NEXT_PUBLIC_RUNPOD_ENDPOINT_ID=your_endpoint_id_here

Getting Your RunPod Credentials

API Key:
- Go to RunPod Settings
- Navigate to the API Keys section
- Create a new API key or copy an existing one
Endpoint ID:
- Go to RunPod Serverless Endpoints
- Find your WhisperX endpoint
- Copy the endpoint ID from the URL or endpoint details
- Example: If your endpoint URL is https://api.runpod.ai/v2/lrtisuv8ixbtub/run, then lrtisuv8ixbtub is your endpoint ID
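
If you only have the endpoint URL at hand, the ID can be pulled out programmatically. The helper below is a small sketch; `extractEndpointId` is a hypothetical name and is not part of the integration, and it assumes the URL follows the https://api.runpod.ai/v2/{ENDPOINT_ID}/... pattern shown above.

```typescript
// Hypothetical helper (not part of the integration): pulls the endpoint ID
// out of a RunPod serverless URL of the form https://api.runpod.ai/v2/<id>/...
function extractEndpointId(url: string): string | null {
  const match = /^https:\/\/api\.runpod\.ai\/v2\/([^/]+)(?:\/|$)/.exec(url.trim());
  return match?.[1] ?? null;
}

// Prints the ID from the example URL above.
console.log(extractEndpointId("https://api.runpod.ai/v2/lrtisuv8ixbtub/run"));
```

The function returns null for any URL that does not match the pattern, so a mistyped host or a missing /v2/ segment is caught before the value ends up in .env.local.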

Usage

Automatic Detection

The transcription hook checks whether the RunPod API key and endpoint ID are configured and, if they are, uses RunPod instead of the local Whisper model. No code changes are needed.

Manual Override

If you want to explicitly control which transcription method to use:

import { useWhisperTranscription } from '@/hooks/useWhisperTranscriptionSimple'

const {
  isRecording,
  transcript,
  startRecording,
  stopRecording
} = useWhisperTranscription({
  useRunPod: true, // Force RunPod usage
  language: 'en',
  onTranscriptUpdate: (text) => {
    console.log('New transcript:', text)
  }
})

Or to force local model:

useWhisperTranscription({
  useRunPod: false, // Force local Whisper model
  // ... other options
})
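
Taken together, automatic detection and the useRunPod option amount to a small decision. The sketch below shows one way that decision could look; `readRunPodConfig` and `shouldUseRunPod` are hypothetical names, and the actual hook may implement this differently. The environment is passed in as a plain object so the logic does not depend on a particular bundler.

```typescript
// Hypothetical shape; the real hook may store its configuration differently.
interface RunPodConfig {
  apiKey: string;
  endpointId: string;
}

type Env = Record<string, string | undefined>;

// Returns the RunPod config when both values are present, otherwise null.
// Checks the Vite variable names first, then the Next.js ones.
function readRunPodConfig(env: Env): RunPodConfig | null {
  const apiKey = (env["VITE_RUNPOD_API_KEY"] ?? env["NEXT_PUBLIC_RUNPOD_API_KEY"] ?? "").trim();
  const endpointId = (env["VITE_RUNPOD_ENDPOINT_ID"] ?? env["NEXT_PUBLIC_RUNPOD_ENDPOINT_ID"] ?? "").trim();
  return apiKey !== "" && endpointId !== "" ? { apiKey, endpointId } : null;
}

// An explicit useRunPod option wins; otherwise RunPod is used when configured.
function shouldUseRunPod(env: Env, useRunPod?: boolean): boolean {
  return useRunPod ?? (readRunPodConfig(env) !== null);
}
```

In a Vite app the `env` argument would typically be `import.meta.env`. Note that forcing `useRunPod: true` without credentials does not make RunPod usable; under this sketch the request would still fail with the "not configured" error described under Troubleshooting.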

API Format

The integration sends audio data to your RunPod endpoint in the following format:

{
  "input": {
    "audio": "base64_encoded_audio_data",
    "audio_format": "audio/wav",
    "language": "en",
    "task": "transcribe"
  }
}
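
As an illustration of how such a payload might be assembled from recorded audio, the following sketch base64-encodes the raw bytes and wraps them in the structure above. `bytesToBase64` and `buildRequestBody` are hypothetical helpers, not the code in src/lib/runpodApi.ts, and the defaults simply mirror the example values shown.

```typescript
interface RunPodTranscribeRequest {
  input: {
    audio: string;
    audio_format: string;
    language: string;
    task: "transcribe";
  };
}

// Converts raw bytes to base64 via a binary string and the global btoa().
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return btoa(binary);
}

// Hypothetical helper: wraps the encoded audio in the documented request shape.
function buildRequestBody(
  audio: Uint8Array,
  language = "en",
  audioFormat = "audio/wav",
): RunPodTranscribeRequest {
  return {
    input: {
      audio: bytesToBase64(audio),
      audio_format: audioFormat,
      language,
      task: "transcribe",
    },
  };
}
```

In the browser, the bytes would usually come from a recorded Blob, for example `new Uint8Array(await blob.arrayBuffer())`.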

Expected Response Format

The endpoint should return one of these formats:

Direct Response:

{
  "output": {
    "text": "Transcribed text here"
  }
}

Or with segments:

{
  "output": {
    "segments": [
      {
        "start": 0.0,
        "end": 2.5,
        "text": "Transcribed text here"
      }
    ]
  }
}
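
Both synchronous shapes above can be reduced to a single string. The sketch below shows one possible way to do that; `extractTranscript` is a hypothetical name, and joining segments with a single space is an assumption rather than the integration's documented behavior.

```typescript
interface Segment {
  start: number;
  end: number;
  text: string;
}

interface RunPodResponse {
  output?: {
    text?: string;
    segments?: Segment[];
  };
}

// Prefers output.text; otherwise joins the segment texts in order.
// Throws the same message the Troubleshooting section refers to.
function extractTranscript(response: RunPodResponse): string {
  const output = response.output;
  const direct = output?.text?.trim() ?? "";
  if (direct !== "") return direct;

  const joined = (output?.segments ?? [])
    .map((segment) => segment.text.trim())
    .filter((text) => text !== "")
    .join(" ");
  if (joined !== "") return joined;

  throw new Error("No transcription text found in RunPod response");
}
```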

Async Job Pattern:

{
  "id": "job-id-123",
  "status": "IN_QUEUE"
}

The integration automatically handles async jobs by polling the status endpoint until completion.
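
A polling loop of this kind might look roughly like the sketch below. It assumes RunPod's /status/{jobId} route and the job states IN_QUEUE, IN_PROGRESS, COMPLETED, FAILED, CANCELLED and TIMED_OUT; `statusUrl` and `pollUntilDone` are hypothetical names, and the interval and attempt limit are arbitrary defaults, not values taken from the integration.

```typescript
type JobState =
  | "IN_QUEUE"
  | "IN_PROGRESS"
  | "COMPLETED"
  | "FAILED"
  | "CANCELLED"
  | "TIMED_OUT";

interface JobStatus {
  id: string;
  status: JobState;
  output?: unknown;
  error?: string;
}

// Builds the status URL for a job: /v2/{endpointId}/status/{jobId}.
function statusUrl(endpointId: string, jobId: string): string {
  return `https://api.runpod.ai/v2/${endpointId}/status/${jobId}`;
}

// Polls until the job completes, fails, or the attempts run out.
// fetchStatus is injected so the loop can be exercised without a network.
async function pollUntilDone(
  fetchStatus: () => Promise<JobStatus>,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchStatus();
    if (job.status === "COMPLETED") return job;
    if (job.status !== "IN_QUEUE" && job.status !== "IN_PROGRESS") {
      throw new Error(`RunPod job ${job.id} ended with status ${job.status}`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for RunPod job to complete");
}
```

In the app, `fetchStatus` would typically wrap a call such as `fetch(statusUrl(endpointId, jobId), { headers: { Authorization: "Bearer " + apiKey } })` followed by `response.json()`. Capping the number of attempts keeps a stuck job from polling forever.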

Customizing the API Request

If your WhisperX endpoint expects a different request format, you can modify src/lib/runpodApi.ts:

// In transcribeWithRunPod function
const requestBody = {
  input: {
    // Adjust these fields based on your endpoint
    audio: audioBase64,
    // Add or modify fields as needed
  }
}

Troubleshooting

"RunPod API key or endpoint ID not configured"

Ensure environment variables are set correctly
Restart your development server after adding environment variables
Check that variable names match exactly (case-sensitive)

"RunPod API error: 401"

Verify your API key is correct
Check that your API key has not expired
Ensure you're using the correct API key format

"RunPod API error: 404"

Verify your endpoint ID is correct
Check that your endpoint is active in the RunPod console
Ensure the endpoint URL format matches: https://api.runpod.ai/v2/{ENDPOINT_ID}/run

"No transcription text found in RunPod response"

Check your endpoint's response format matches the expected format
Verify your WhisperX endpoint is configured correctly
Check the browser console for detailed error messages

"Failed to return job results" (400 Bad Request)

This error occurs on the server side, when your WhisperX endpoint tries to return its results to RunPod. It typically has one of the following causes:

Response format mismatch: Your endpoint's response doesn't match RunPod's expected format
- Ensure your endpoint returns: {"output": {"text": "..."}} or {"output": {"segments": [...]}}
- The response must be valid JSON
- Check your endpoint handler code to ensure it's returning the correct structure
Response size limits: The response might be too large
- Try with shorter audio files first
- Check RunPod's response size limits
Timeout issues: The endpoint might be taking too long to process
- Check your endpoint logs for processing time
- Consider optimizing your WhisperX model configuration

Check endpoint handler: Review your WhisperX endpoint's handler.py or equivalent:

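# Example of the expected return structure
def handler(event):
    # ... process event["input"]["audio"] with WhisperX here ...
    transcription_text = "Transcribed text here"  # placeholder for the real result
    return {
        "output": {
            "text": transcription_text
        }
    }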

Transcription not working

Check the browser console for errors
Verify your endpoint is active and responding
Test your endpoint directly using curl or Postman
Ensure the audio format is supported (WAV format is recommended)
Check the RunPod endpoint logs for server-side errors
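
If the recorded audio is available as raw samples, it can be converted to WAV before upload. The sketch below is one way to do that, assuming mono Float32 samples in the range -1 to 1 (as the Web Audio API produces) and 16-bit PCM output; `encodeWav` is a hypothetical helper and is not part of the integration.

```typescript
// Encodes mono Float32 samples (range -1..1) as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataLength = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataLength);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataLength, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // number of channels
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataLength, true);

  // Clamp each sample and scale it to the signed 16-bit range.
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, scaled, true);
  });

  return new Uint8Array(buffer);
}
```

The resulting bytes can then be base64-encoded and sent with audio_format set to "audio/wav".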

Testing Your Endpoint

You can test your RunPod endpoint directly:

curl -X POST https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "input": {
      "audio": "base64_audio_data_here",
      "audio_format": "audio/wav",
      "language": "en"
    }
  }'
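
The same request can be issued from TypeScript. The sketch below only builds the URL and fetch options so it can be inspected without a network call; `buildRunRequest` is a hypothetical helper, and the placeholder values are the same as in the curl command above.

```typescript
interface RunRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Builds the URL and fetch() options equivalent to the curl command above.
function buildRunRequest(
  endpointId: string,
  apiKey: string,
  audioBase64: string,
  language = "en",
): RunRequest {
  return {
    url: `https://api.runpod.ai/v2/${encodeURIComponent(endpointId)}/run`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        input: { audio: audioBase64, audio_format: "audio/wav", language },
      }),
    },
  };
}

// Usage (performs a real network call, so it is left commented out):
// const { url, init } = buildRunRequest("YOUR_ENDPOINT_ID", "YOUR_API_KEY", "base64_audio_data_here");
// const response = await fetch(url, init);
// console.log(response.status, await response.json());
```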

Fallback Behavior

The system chooses a transcription method in this order:

Use RunPod if it is configured
Fall back to the local Whisper model if RunPod fails or is not configured
Show an error message if both methods fail
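
The fallback order can be sketched as a small wrapper around two transcription functions. This is an illustration only; `Transcriber` and `transcribeWithFallback` are hypothetical names, and the real hook may structure its error handling differently.

```typescript
type Transcriber = (audio: Uint8Array) => Promise<string>;

// Tries RunPod first (when it is configured) and falls back to the local
// model on failure; throws only if every attempted method fails.
async function transcribeWithFallback(
  audio: Uint8Array,
  runPod: Transcriber | null, // null when RunPod is not configured
  local: Transcriber,
): Promise<string> {
  const errors: string[] = [];

  if (runPod !== null) {
    try {
      return await runPod(audio);
    } catch (err) {
      errors.push(`RunPod: ${err instanceof Error ? err.message : String(err)}`);
    }
  }

  try {
    return await local(audio);
  } catch (err) {
    errors.push(`Local Whisper: ${err instanceof Error ? err.message : String(err)}`);
  }

  throw new Error(`Transcription failed (${errors.join("; ")})`);
}
```

Collecting the individual error messages means the final error shows why each method failed, which is useful when checking the browser console.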

Performance Considerations

RunPod: Better for longer audio files and higher accuracy, but requires a network connection
Local Model: Works offline, but requires a model download and uses more client resources

Support

For issues specific to:

RunPod API: Check RunPod Documentation
WhisperX: Check your WhisperX endpoint configuration
Integration: Check browser console for detailed error messages
