---
id: task-4
title: 'Phase 3: AI Integration Shapes'
status: Done
assignee: []
created_date: '2026-01-02 15:54'
labels:
- migration
- shapes
- ai
dependencies: []
priority: medium
---
## Description
<!-- SECTION:DESCRIPTION:BEGIN -->
Port AI-powered shapes using existing MCP servers and APIs:

1. **folk-image-gen** - Image generation (fal.ai Flux)
   - Prompt input, image history thread
   - Loading states, error handling
   - Uses: `mcp__fal-ai__fal_generate_image`
2. **folk-video-gen** - Video generation (WAN 2.1)
   - Image-to-video, text-to-video
   - Duration control, queue polling
   - Uses: `mcp__fal-ai__fal_generate_video`
3. **folk-prompt** - LLM prompt executor
   - Agent binding, multiple personalities
   - Output streaming
   - Uses: `mcp__gemini__gemini_generate` or the Anthropic API directly
4. **folk-transcription** - Audio transcription (Whisper)
   - Real-time transcription, pause/resume
   - Speaker diarization
   - Uses: Web Speech API fallback + Whisper API

Simplifications:

- Use MCP tools directly instead of custom API clients
- Reduce loading states to plain CSS classes
- Replace complex React hooks with async/await patterns (see the sketch below)
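
As a rough illustration of the simplified pattern, a minimal sketch of what an image-gen call could look like. The function name, element handling, and response shape are assumptions for illustration; only the `/api/image-gen` path matches the placeholder endpoint noted in the implementation notes below:

```ts
// Sketch only: async/await against a placeholder endpoint, with loading and
// error states expressed as CSS classes instead of React state.
async function generateImage(prompt: string, container: HTMLElement): Promise<void> {
  container.classList.add('loading'); // loading state as a CSS class
  try {
    const res = await fetch('/api/image-gen', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt }),
    });
    if (!res.ok) throw new Error(`Image generation failed: ${res.status}`);
    // Assumed response shape; the real backend contract is not defined yet.
    const { url } = (await res.json()) as { url: string };
    const img = document.createElement('img');
    img.src = url;
    container.append(img); // append to the image history thread
  } catch (err) {
    container.classList.add('error'); // error handling via CSS class too
    console.error(err);
  } finally {
    container.classList.remove('loading');
  }
}
```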
<!-- SECTION:DESCRIPTION:END -->
## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 folk-image-gen with fal.ai integration (API endpoint placeholder)
- [x] #2 folk-video-gen with video generation (I2V and T2V modes)
- [x] #3 folk-prompt with LLM chat interface
- [x] #4 folk-transcription with Web Speech API
<!-- AC:END -->
## Implementation Notes
Created four AI integration shapes:
- **lib/folk-image-gen.ts**: Image generation UI with prompt, style selector, loading states
- **lib/folk-video-gen.ts**: Video generation with I2V/T2V mode tabs, image upload, duration control
- **lib/folk-prompt.ts**: Chat interface with model selection, message history, markdown formatting
- **lib/folk-transcription.ts**: Real-time transcription with Web Speech API, pause/resume, copy/clear

All shapes call placeholder API endpoints (/api/image-gen, /api/video-gen, /api/prompt) that still need to be implemented in the backend. The transcription component uses the browser's native Web Speech API, sketched below.
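
For reference, the core Web Speech API pattern the transcription shape relies on looks roughly like this (a sketch, not the shape's actual code):

```ts
// Chrome ships the API behind the webkit prefix; lib.dom does not declare the
// types, hence the `any` casts.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionImpl();
recognition.continuous = true;     // keep listening across utterances
recognition.interimResults = true; // stream partial transcripts for real-time display

recognition.onresult = (event: any) => {
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const transcript = event.results[i][0].transcript;
    if (event.results[i].isFinal) {
      console.log('final:', transcript);
    } else {
      console.log('interim:', transcript);
    }
  }
};

recognition.start(); // pause/resume maps to stop()/start()
```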
Integrated into canvas.html with toolbar buttons (Image, Video, AI, Transcribe).