---
id: task-4
title: 'Phase 3: AI Integration Shapes'
status: Done
assignee: []
created_date: '2026-01-02 15:54'
labels:
  - migration
  - shapes
  - ai
dependencies: []
priority: medium
---

## Description

Port AI-powered shapes using existing MCP servers and APIs:

1. **folk-image-gen** - Image generation (fal.ai Flux)
   - Prompt input, image history thread
   - Loading states, error handling
   - Uses: `mcp__fal-ai__fal_generate_image`

2. **folk-video-gen** - Video generation (WAN 2.1)
   - Image-to-video, text-to-video
   - Duration control, queue polling
   - Uses: `mcp__fal-ai__fal_generate_video`

3. **folk-prompt** - LLM prompt executor
   - Agent binding, multiple personalities
   - Output streaming
   - Uses: `mcp__gemini__gemini_generate` or the Anthropic API directly

4. **folk-transcription** - Audio transcription (Whisper)
   - Real-time transcription, pause/resume
   - Speaker diarization
   - Uses: Web Speech API fallback + Whisper API

Simplifications:
- Use MCP tools directly instead of custom API clients
- Simplify loading states to CSS classes
- Replace complex React hooks with async/await patterns

## Acceptance Criteria

- [x] #1 folk-image-gen with fal.ai integration (API endpoint placeholder)
- [x] #2 folk-video-gen with video generation (I2V and T2V modes)
- [x] #3 folk-prompt with LLM chat interface
- [x] #4 folk-transcription with Web Speech API

## Implementation Notes

Created four AI integration shapes:
- **lib/folk-image-gen.ts**: Image generation UI with prompt, style selector, and loading states
- **lib/folk-video-gen.ts**: Video generation with I2V/T2V mode tabs, image upload, and duration control
- **lib/folk-prompt.ts**: Chat interface with model selection, message history, and markdown formatting
- **lib/folk-transcription.ts**: Real-time transcription with the Web Speech API, pause/resume, and copy/clear

The image, video, and prompt shapes call placeholder API endpoints (`/api/image-gen`, `/api/video-gen`, `/api/prompt`) that still need to be implemented in the backend. The transcription component instead uses the browser's native Web Speech API.
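
The Simplifications favour async/await and CSS classes over React hooks for loading states. The sketch below shows one way a shape could call a placeholder endpoint under that approach. It is minimal and rests on a hypothetical contract: `/api/image-gen` would accept a JSON body `{ prompt, style? }` and answer with `{ url }`. The real contract is still to be defined in the backend. The `loading`/`error` class names and the `withLoadingClass`/`generateImage` helpers are illustrative and are not taken from `lib/folk-image-gen.ts`.

```ts
// Hypothetical request/response shapes: the real /api/image-gen contract is not defined yet.
interface ImageGenRequest {
  prompt: string;
  style?: string;
}

interface ImageGenResponse {
  url: string;
}

// Minimal structural type; an element's classList (DOMTokenList) satisfies it in the browser.
interface ClassListLike {
  add(...tokens: string[]): void;
  remove(...tokens: string[]): void;
}

// Injectable fetch-like function so the helper can run (and be tested) without a backend.
type FetchLike = (
  input: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Runs an async task while expressing its state purely as CSS classes:
// "loading" while pending, "error" if it rejects.
async function withLoadingClass<T>(classList: ClassListLike, task: () => Promise<T>): Promise<T> {
  classList.remove("error");
  classList.add("loading");
  try {
    return await task();
  } catch (err) {
    classList.add("error");
    throw err;
  } finally {
    classList.remove("loading");
  }
}

// POSTs a prompt to the (placeholder) endpoint and resolves with the generated image URL.
async function generateImage(
  req: ImageGenRequest,
  classList: ClassListLike,
  fetchImpl: FetchLike,
  endpoint = "/api/image-gen",
): Promise<string> {
  return withLoadingClass(classList, async () => {
    const res = await fetchImpl(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(req),
    });
    if (!res.ok) {
      throw new Error(`image-gen failed with HTTP ${res.status}`);
    }
    const data = (await res.json()) as ImageGenResponse;
    return data.url;
  });
}
```

Inside a custom element, a call might look like `generateImage({ prompt }, this.classList, fetch)`. The stylesheet would then key a spinner off `.loading` and error styling off `.error`. Passing `fetch` in as a parameter keeps the helper testable before the backend endpoint exists. The same pattern would apply to `/api/video-gen` and `/api/prompt` with their own request shapes.
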
Integrated into canvas.html with toolbar buttons (Image, Video, AI, Transcribe).
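
For folk-transcription's Web Speech API path, here is a minimal sketch of two pieces. The first folds `result` events into committed and interim text. The second layers pause/resume on top of the recognizer's `start()`/`stop()`. The `*Like` interfaces are assumed structural stand-ins for the subset of `SpeechRecognition` the sketch touches, so the logic can run outside a browser. `applyResultEvent` and `TranscriptionSession` are hypothetical names, not the component's actual code. The auto-restart in the end handler assumes that browsers may end a continuous session on their own, for example after a stretch of silence.

```ts
// Assumed structural stand-ins for SpeechRecognitionEvent / SpeechRecognitionResult.
interface AlternativeLike {
  readonly transcript: string;
}
interface ResultLike {
  readonly isFinal: boolean;
  readonly [index: number]: AlternativeLike;
}
interface ResultListLike {
  readonly length: number;
  readonly [index: number]: ResultLike;
}
interface ResultEventLike {
  readonly resultIndex: number;
  readonly results: ResultListLike;
}

interface TranscriptState {
  finalText: string; // committed text, only ever appended to
  interimText: string; // provisional text, rebuilt on every event
}

// Results from resultIndex onward are new or changed: final ones are committed,
// non-final ones replace the previous interim text.
function applyResultEvent(state: TranscriptState, event: ResultEventLike): TranscriptState {
  let finalText = state.finalText;
  let interimText = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const text = result[0].transcript; // top-ranked alternative
    if (result.isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText, interimText };
}

// Hypothetical minimal recognizer surface used by the session below.
interface RecognizerLike {
  continuous: boolean;
  interimResults: boolean;
  onend: (() => void) | null;
  start(): void;
  stop(): void;
}

// Pause/resume via stop()/start(); restarts after ends the user did not ask for.
class TranscriptionSession {
  private readonly rec: RecognizerLike;
  private active = false; // started by the user and not yet stopped
  private paused = false;
  private running = false; // start() called, matching end event not yet seen

  constructor(rec: RecognizerLike) {
    this.rec = rec;
    rec.continuous = true;
    rec.interimResults = true;
    rec.onend = () => {
      this.running = false;
      if (this.active && !this.paused) this.startRecognizer();
    };
  }

  private startRecognizer(): void {
    // Skip start() while a previous run is still stopping; onend restarts it instead.
    if (!this.running) {
      this.running = true;
      this.rec.start();
    }
  }

  start(): void {
    this.active = true;
    this.paused = false;
    this.startRecognizer();
  }

  pause(): void {
    if (!this.active || this.paused) return;
    this.paused = true;
    this.rec.stop();
  }

  resume(): void {
    if (!this.active || !this.paused) return;
    this.paused = false;
    this.startRecognizer();
  }

  stop(): void {
    this.active = false;
    this.paused = false;
    this.rec.stop();
  }
}
```

In Chrome the recognizer would typically be constructed from the prefixed `webkitSpeechRecognition`. Its `onresult` handler would pass each event through `applyResultEvent` and render `finalText` followed by `interimText`. The `running` flag exists because calling `start()` on a recognizer that has not finished stopping can fail, so `resume()` defers to the end handler in that case.
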