id: task-4
title: Phase 3: AI Integration Shapes
status: Done
assignee: (none)
created_date: 2026-01-02 15:54
labels: migration, shapes, ai
dependencies: (none)
priority: medium

Description

Port AI-powered shapes using existing MCP servers and APIs:

  1. folk-image-gen - Image generation (fal.ai Flux)

    • Prompt input, image history thread
    • Loading states, error handling
    • Uses: mcp__fal-ai__fal_generate_image
  2. folk-video-gen - Video generation (WAN 2.1)

    • Image-to-video, text-to-video
    • Duration control, queue polling
    • Uses: mcp__fal-ai__fal_generate_video
  3. folk-prompt - LLM prompt executor

    • Agent binding, multiple personalities
    • Output streaming
    • Uses: mcp__gemini__gemini_generate or direct Anthropic API
  4. folk-transcription - Audio transcription (Whisper)

    • Real-time transcription, pause/resume
    • Speaker diarization
    • Uses: Web Speech API fallback + Whisper API
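
The queue polling listed for folk-video-gen can be sketched as a small async loop. This is only a sketch under stated assumptions: the `JobStatus` shape, the `checkStatus` callback, and the default interval and attempt limit are hypothetical and do not come from the shape's code or from fal.ai.

```typescript
// Hypothetical job status shape; the real backend response is not specified in this task.
type JobStatus =
  | { state: "queued" | "running" }
  | { state: "done"; videoUrl: string }
  | { state: "failed"; error: string };

interface PollOptions {
  intervalMs?: number; // delay between status checks
  maxAttempts?: number; // give up after this many checks
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls checkStatus until the job is done or failed, or the attempt limit is hit.
async function pollVideoJob(
  checkStatus: () => Promise<JobStatus>,
  options: PollOptions = {},
): Promise<string> {
  const { intervalMs = 2000, maxAttempts = 150, sleep = defaultSleep } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status.state === "done") return status.videoUrl;
    if (status.state === "failed") throw new Error(status.error);
    // Still queued or running: wait before the next check, except after the last one.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Video job did not finish after ${maxAttempts} checks`);
}
```

The caller supplies `checkStatus`, so the same loop works whether the status comes from an MCP tool call or from a backend endpoint.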

Simplifications:

  • Use MCP tools directly instead of custom API clients
  • Simplify loading states to CSS classes
  • Remove complex React hooks, use async/await patterns
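
The last two simplifications combine naturally: an async function can toggle a CSS class around the awaited call instead of tracking loading state in hooks. A minimal sketch, assuming the class names `loading` and `error` (hypothetical, not taken from the shapes) and a `ClassListLike` interface that mirrors the `add`/`remove` subset of `element.classList`:

```typescript
// Minimal subset of DOMTokenList so the helper can be exercised without a DOM.
interface ClassListLike {
  add(...tokens: string[]): void;
  remove(...tokens: string[]): void;
}

// Runs an async task while the element carries a "loading" class and
// marks it with an "error" class if the task rejects.
// Both class names are assumptions for illustration.
async function withLoadingClass<T>(
  classList: ClassListLike,
  task: () => Promise<T>,
): Promise<T> {
  classList.remove("error");
  classList.add("loading");
  try {
    return await task();
  } catch (err) {
    classList.add("error");
    throw err; // let the caller decide how to report the failure
  } finally {
    classList.remove("loading");
  }
}
```

Inside a custom element this could be called as `withLoadingClass(this.classList, () => someAsyncCall())`, leaving spinners and error styling entirely to the stylesheet.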

Acceptance Criteria

  • #1 folk-image-gen with fal.ai integration (API endpoint placeholder)
  • #2 folk-video-gen with video generation (I2V and T2V modes)
  • #3 folk-prompt with LLM chat interface
  • #4 folk-transcription with Web Speech API

Implementation Notes

Created four AI integration shapes:

  • lib/folk-image-gen.ts: Image generation UI with prompt, style selector, loading states
  • lib/folk-video-gen.ts: Video generation with I2V/T2V mode tabs, image upload, duration control
  • lib/folk-prompt.ts: Chat interface with model selection, message history, markdown formatting
  • lib/folk-transcription.ts: Real-time transcription with Web Speech API, pause/resume, copy/clear
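
For the transcription shape, the core of real-time display is folding each Web Speech API result event into final and interim text. The sketch below is not the code in lib/folk-transcription.ts; `TranscriptState` and `applyResults` are hypothetical names, and the `*Like` interfaces are structural stand-ins for `SpeechRecognitionEvent` so the function can run without a browser.

```typescript
// Structural stand-ins for SpeechRecognitionResult and SpeechRecognitionEvent.
interface ResultLike extends ArrayLike<{ transcript: string }> {
  isFinal: boolean;
}
interface RecognitionEventLike {
  resultIndex: number; // lowest index in results that changed in this event
  results: ArrayLike<ResultLike>;
}

// Hypothetical state kept by the shape between events.
interface TranscriptState {
  finalText: string; // text the recognizer has committed
  interimText: string; // provisional text that may still change
}

// Appends newly finalized results and rebuilds the interim text from scratch.
function applyResults(
  state: TranscriptState,
  event: RecognitionEventLike,
): TranscriptState {
  let finalText = state.finalText;
  let interimText = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const text = result[0].transcript; // first alternative is the top hypothesis
    if (result.isFinal) finalText += text;
    else interimText += text;
  }
  return { finalText, interimText };
}
```

In a browser, this would be called from the recognizer's `onresult` handler with `interimResults` enabled, rendering `finalText` normally and `interimText` in a muted style.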

The image, video, and prompt shapes call placeholder API endpoints (/api/image-gen, /api/video-gen, /api/prompt) that still need to be implemented in the backend. The transcription shape calls no backend endpoint; it uses the browser's native Web Speech API.
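
Since the backend endpoints do not exist yet, the call from a shape can only be sketched. The example below assumes a JSON POST to /api/image-gen; the request fields (`prompt`, `style`), the response field (`url`), and the names `requestImage` and `FetchLike` are hypothetical and should be replaced once the backend contract is defined.

```typescript
// Hypothetical request and response shapes; the backend contract is not defined yet.
interface ImageGenRequest {
  prompt: string;
  style?: string;
}
interface ImageGenResponse {
  url: string;
}

// Narrow view of fetch so the function can be tested with a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function requestImage(
  fetchFn: FetchLike,
  req: ImageGenRequest,
): Promise<ImageGenResponse> {
  const res = await fetchFn("/api/image-gen", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) {
    throw new Error(`Image generation failed with HTTP ${res.status}`);
  }
  // Validate the payload before trusting it, since the endpoint is still a placeholder.
  const data = await res.json();
  if (
    typeof data !== "object" ||
    data === null ||
    typeof (data as { url?: unknown }).url !== "string"
  ) {
    throw new Error("Unexpected response shape from /api/image-gen");
  }
  return { url: (data as { url: string }).url };
}
```

In the browser the first argument can be `(url, init) => fetch(url, init)`; passing a stub instead keeps the shape testable before the backend exists.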

Integrated into canvas.html with toolbar buttons (Image, Video, AI, Transcribe).