Native Android app for voice-to-text with on-device Whisper transcription
Jeff Emmett c459d58563 fix: build configuration and resource fixes for successful APK build
- Fixed sherpa-onnx dependency to use Maven Central package
- Fixed VoiceIntent enum name conflict with android.content.Intent
- Added AndroidX configuration in gradle.properties
- Added gradle wrapper jar and script
- Added app launcher icons (adaptive icons)
- Fixed drawable tint references
- Added colors.xml resource file
- Downloaded Whisper tiny.en model tokens.txt
- Updated download-models.sh to download tar.bz2 package

Build now produces 141MB debug APK with sherpa-onnx and Whisper model.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 12:52:18 -08:00
| Name | Last commit | Date |
| --- | --- | --- |
| app | fix: build configuration and resource fixes for successful APK build | 2025-12-07 12:52:18 -08:00 |
| backlog | fix: build configuration and resource fixes for successful APK build | 2025-12-07 12:52:18 -08:00 |
| gradle | fix: build configuration and resource fixes for successful APK build | 2025-12-07 12:52:18 -08:00 |
| .gitignore | Initial commit: Native Android voice transcription app | 2025-12-06 22:46:45 -08:00 |
| README.md | Initial commit: Native Android voice transcription app | 2025-12-06 22:46:45 -08:00 |
| build.gradle.kts | Initial commit: Native Android voice transcription app | 2025-12-06 22:46:45 -08:00 |
| download-models.sh | fix: build configuration and resource fixes for successful APK build | 2025-12-07 12:52:18 -08:00 |
| gradle.properties | fix: build configuration and resource fixes for successful APK build | 2025-12-07 12:52:18 -08:00 |
| gradlew | fix: build configuration and resource fixes for successful APK build | 2025-12-07 12:52:18 -08:00 |
| settings.gradle.kts | Initial commit: Native Android voice transcription app | 2025-12-06 22:46:45 -08:00 |

README.md

Voice Command - Native Android App

A fully integrated Android app for voice-to-text transcription with on-device Whisper processing. No server required, no Termux, no additional apps needed.

Features

  • 100% On-Device Transcription - Uses sherpa-onnx with Whisper models
  • Privacy-First - All processing happens locally, no data leaves your device
  • Multiple Trigger Methods:
    • Floating button overlay (always accessible)
    • Volume button combo (press both volume buttons at once)
    • Quick Settings tile (notification shade)
  • Smart Routing:
    • Copy to clipboard
    • Share via any app
    • Save as markdown note
    • Create task (Backlog.md compatible)
  • Intent Detection - Automatically suggests the best action based on the transcribed content
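
As an illustration of how intent detection could feed the routing options above, here is a minimal keyword-based sketch. It is not the app's actual logic: `SuggestedAction`, `suggestAction`, and the cue lists are hypothetical names and values chosen for the example.

```kotlin
// Hypothetical action set mirroring the Smart Routing options listed above.
enum class SuggestedAction { COPY_TO_CLIPBOARD, SHARE, SAVE_NOTE, CREATE_TASK }

// Illustrative cue phrases (assumptions, not the app's real heuristics).
val taskCues = listOf("todo", "to do", "remind me", "need to", "don't forget")
val shareCues = listOf("send to", "share with")
val noteCues = listOf("note", "idea", "journal")

// Returns the first action whose cue phrase appears in the transcript,
// falling back to the clipboard when nothing matches.
fun suggestAction(transcript: String): SuggestedAction {
    val text = transcript.trim().lowercase()
    return when {
        text.isEmpty() -> SuggestedAction.COPY_TO_CLIPBOARD
        taskCues.any { it in text } -> SuggestedAction.CREATE_TASK
        shareCues.any { it in text } -> SuggestedAction.SHARE
        noteCues.any { it in text } -> SuggestedAction.SAVE_NOTE
        else -> SuggestedAction.COPY_TO_CLIPBOARD
    }
}

fun main() {
    println(suggestAction("Remind me to renew the domain"))  // CREATE_TASK
    println(suggestAction("Idea: offline maps for hikers"))  // SAVE_NOTE
}
```

Because the result is only a suggestion, a wrong guess costs the user one extra tap rather than a misrouted note.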

Requirements

  • Android 10 (API 29) or higher
  • ~40-250MB storage for the Whisper model, depending on the selected model (see Models)
  • Microphone permission

Installation

  1. Download the latest APK from releases
  2. Enable "Install from unknown sources" if prompted
  3. Install and open Voice Command
  4. Grant microphone permission
  5. Wait for model download (~40-250MB depending on selected model)

Build from Source

# Clone the repository
git clone https://gitea.jeffemmett.com/jeffemmett/voice-command.git
cd voice-command/android-native

# Build debug APK
./gradlew assembleDebug

# Build release APK (requires signing config)
./gradlew assembleRelease

The APK will be in app/build/outputs/apk/

Usage

Quick Start

  1. Open the app and grant microphone permission
  2. Tap the big mic button to start recording
  3. Speak your note or task
  4. Tap again to stop - transcription happens automatically
  5. Choose an action from the menu
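
The tap-to-start, tap-to-stop flow above can be modelled as a small state machine. The following is a sketch under that assumption, not the app's code: `RecorderState`, `onMicTap`, and `onTranscriptReady` are hypothetical names introduced for illustration.

```kotlin
// Hypothetical states for the Quick Start flow.
sealed class RecorderState {
    object Idle : RecorderState()
    object Recording : RecorderState()
    object Transcribing : RecorderState()
    data class Done(val text: String) : RecorderState()
}

// A tap starts recording from Idle/Done and stops it from Recording.
// Taps while a transcription is running are ignored.
fun onMicTap(state: RecorderState): RecorderState = when (state) {
    is RecorderState.Idle, is RecorderState.Done -> RecorderState.Recording
    is RecorderState.Recording -> RecorderState.Transcribing
    is RecorderState.Transcribing -> state
}

// Called when the engine finishes; only meaningful while Transcribing.
fun onTranscriptReady(state: RecorderState, text: String): RecorderState =
    if (state is RecorderState.Transcribing) RecorderState.Done(text) else state

fun main() {
    var state: RecorderState = RecorderState.Idle
    state = onMicTap(state)                        // Recording
    state = onMicTap(state)                        // Transcribing
    state = onTranscriptReady(state, "buy milk")   // Done(text=buy milk)
    println(state)
}
```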

Trigger Methods

Floating Button

  • Enable in Settings
  • Drag to reposition
  • Tap to start/stop recording
  • Works over any app
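
A button that supports both dragging and tapping has to tell the two gestures apart. One common approach, assumed here rather than taken from the app's source, is to treat small movements as taps and to clamp the final position to the screen. The function names and the 16 px threshold are illustrative; on Android the platform threshold is available through `ViewConfiguration.get(context).scaledTouchSlop`.

```kotlin
import kotlin.math.hypot

// Treats a gesture as a tap when the finger moved no more than slopPx
// between touch-down and touch-up. slopPx = 16f is an arbitrary example value.
fun isTap(downX: Float, downY: Float, upX: Float, upY: Float, slopPx: Float = 16f): Boolean =
    hypot(upX - downX, upY - downY) <= slopPx

// Keeps the button fully on screen after a drag.
// Assumes the button is smaller than the screen in both dimensions.
fun clampToScreen(x: Int, y: Int, screenW: Int, screenH: Int, buttonSize: Int): Pair<Int, Int> =
    x.coerceIn(0, screenW - buttonSize) to y.coerceIn(0, screenH - buttonSize)

fun main() {
    println(isTap(100f, 100f, 103f, 104f))              // true: moved 5 px
    println(clampToScreen(-20, 2500, 1080, 2400, 120))  // (0, 2280)
}
```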

Volume Buttons

  • Enable Accessibility Service in Settings
  • Press Volume Up + Volume Down simultaneously
  • Vibration confirms recording start/stop
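
Detecting a "both buttons at once" press amounts to checking that the two key-down events arrive within a short window. The sketch below shows that idea in plain Kotlin. `VolumeComboDetector` and the 300 ms window are assumptions for illustration; in an accessibility service the events would typically come from `AccessibilityService.onKeyEvent`.

```kotlin
// Values match android.view.KeyEvent.KEYCODE_VOLUME_UP / KEYCODE_VOLUME_DOWN.
val KEYCODE_VOLUME_UP = 24
val KEYCODE_VOLUME_DOWN = 25

// Hypothetical detector: reports a combo when both keys go down within windowMs.
class VolumeComboDetector(private val windowMs: Long = 300) {
    private var upPressedAt: Long? = null
    private var downPressedAt: Long? = null

    // Returns true only for the key-down event that completes the combo.
    fun onKeyDown(keyCode: Int, timeMs: Long): Boolean {
        when (keyCode) {
            KEYCODE_VOLUME_UP -> upPressedAt = timeMs
            KEYCODE_VOLUME_DOWN -> downPressedAt = timeMs
            else -> return false
        }
        val up = upPressedAt ?: return false
        val down = downPressedAt ?: return false
        if (kotlin.math.abs(up - down) > windowMs) return false
        // Reset so one combo toggles recording exactly once.
        upPressedAt = null
        downPressedAt = null
        return true
    }

    // Releasing a key cancels its half of a pending combo.
    fun onKeyUp(keyCode: Int) {
        when (keyCode) {
            KEYCODE_VOLUME_UP -> upPressedAt = null
            KEYCODE_VOLUME_DOWN -> downPressedAt = null
        }
    }
}

fun main() {
    val detector = VolumeComboDetector()
    println(detector.onKeyDown(KEYCODE_VOLUME_UP, 1_000))    // false
    println(detector.onKeyDown(KEYCODE_VOLUME_DOWN, 1_100))  // true
}
```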

Quick Settings Tile

  • Swipe down to open the notification shade
  • Add the "Voice Note" tile to Quick Settings
  • Tap the tile to toggle recording

Models

| Model | Size | Languages | Quality |
| --- | --- | --- | --- |
| Tiny English | ~40MB | English only | Good for quick notes |
| Base English | ~75MB | English only | Better accuracy |
| Small English | ~250MB | English only | Best accuracy |
| Tiny | ~40MB | Multilingual | Basic quality |
| Base | ~75MB | Multilingual | Good quality |
| Small | ~250MB | Multilingual | Best quality |
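
The size/quality trade-off in the table can be turned into a simple selection rule: pick the largest model that fits in free storage. This is a sketch only. `WhisperModel` and `pickModel` are hypothetical names, and the sizes are the approximate figures from the table.

```kotlin
// Approximate download sizes from the table above; identifiers are illustrative.
enum class WhisperModel(val approxSizeMb: Int, val multilingual: Boolean) {
    TINY_EN(40, false), BASE_EN(75, false), SMALL_EN(250, false),
    TINY(40, true), BASE(75, true), SMALL(250, true)
}

// Returns the largest model of the requested kind that fits, or null if none does.
fun pickModel(freeMb: Long, needMultilingual: Boolean): WhisperModel? =
    WhisperModel.values()
        .filter { it.multilingual == needMultilingual && it.approxSizeMb <= freeMb }
        .maxByOrNull { it.approxSizeMb }

fun main() {
    println(pickModel(freeMb = 100, needMultilingual = false))  // BASE_EN
    println(pickModel(freeMb = 30, needMultilingual = true))    // null
}
```

A real check would likely leave extra headroom beyond the model size, for example for unpacking a downloaded archive.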

Architecture

┌─────────────────────────────────────────────────────┐
│                   Voice Command App                  │
├─────────────────────────────────────────────────────┤
│  UI Layer (Jetpack Compose)                         │
│  ├── MainActivity (main interface)                  │
│  ├── RecordingScreen (recording controls)           │
│  └── TranscriptionResultActivity (result dialog)    │
├─────────────────────────────────────────────────────┤
│  Service Layer                                      │
│  ├── FloatingButtonService (overlay)                │
│  ├── VolumeButtonAccessibilityService (vol combo)   │
│  └── VoiceCommandTileService (Quick Settings)       │
├─────────────────────────────────────────────────────┤
│  Core Layer                                         │
│  ├── AudioRecorder (16kHz PCM capture)              │
│  ├── SherpaTranscriptionEngine (Whisper wrapper)    │
│  └── ActionRouter (clipboard, files, share)         │
├─────────────────────────────────────────────────────┤
│  Native Layer (sherpa-onnx)                         │
│  └── Whisper ONNX models + ONNX Runtime             │
└─────────────────────────────────────────────────────┘
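
Between the audio capture and the transcription engine in the diagram, raw PCM usually has to be converted into the floating-point samples that speech models consume. The sketch below assumes 16-bit little-endian mono PCM at 16 kHz. `pcm16ToFloat` and `durationSeconds` are illustrative helpers, not part of the app or of sherpa-onnx.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Sample rate named in the architecture diagram.
val SAMPLE_RATE_HZ = 16_000

// Converts little-endian 16-bit PCM bytes into floats in [-1.0, 1.0).
// A trailing odd byte, if any, is ignored.
fun pcm16ToFloat(pcm: ByteArray): FloatArray {
    val shorts = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer()
    return FloatArray(shorts.remaining()) { i -> shorts.get(i) / 32768f }
}

// Length of a mono recording in seconds.
fun durationSeconds(sampleCount: Int): Double = sampleCount.toDouble() / SAMPLE_RATE_HZ

fun main() {
    val samples = pcm16ToFloat(byteArrayOf(0x00, 0x40, 0x00, 0x80.toByte()))
    println(samples.toList())          // [0.5, -1.0]
    println(durationSeconds(16_000))   // 1.0
}
```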

Permissions

| Permission | Purpose |
| --- | --- |
| RECORD_AUDIO | Voice recording |
| SYSTEM_ALERT_WINDOW | Floating button overlay |
| FOREGROUND_SERVICE | Background recording |
| POST_NOTIFICATIONS | Service notifications |
| VIBRATE | Recording feedback |
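
These permissions are not all granted the same way on Android. Some are given at install time, some need a runtime prompt, and the overlay permission is granted from a system settings screen. The sketch below summarises this; `GrantMode` and `grantMode` are hypothetical names, and the mapping reflects general Android behaviour rather than anything specific to this app.

```kotlin
// How a permission is obtained. Names are illustrative.
enum class GrantMode { INSTALL_TIME, RUNTIME_PROMPT, SETTINGS_SCREEN, NOT_REQUIRED }

// sdkInt is the device API level (Build.VERSION.SDK_INT on Android).
fun grantMode(permission: String, sdkInt: Int): GrantMode = when (permission) {
    // Dangerous permission: always needs a runtime prompt.
    "RECORD_AUDIO" -> GrantMode.RUNTIME_PROMPT
    // Only exists as a runtime permission from Android 13 (API 33).
    "POST_NOTIFICATIONS" ->
        if (sdkInt >= 33) GrantMode.RUNTIME_PROMPT else GrantMode.NOT_REQUIRED
    // Special permission: the user enables "Display over other apps" in settings.
    "SYSTEM_ALERT_WINDOW" -> GrantMode.SETTINGS_SCREEN
    // Normal permissions: granted automatically at install time.
    "FOREGROUND_SERVICE", "VIBRATE" -> GrantMode.INSTALL_TIME
    else -> throw IllegalArgumentException("Unknown permission: $permission")
}

fun main() {
    println(grantMode("POST_NOTIFICATIONS", sdkInt = 29))  // NOT_REQUIRED
    println(grantMode("POST_NOTIFICATIONS", sdkInt = 34))  // RUNTIME_PROMPT
}
```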

Output Formats

Notes (Markdown)

# Voice Note Title

Your transcribed text here...

---
Created: 2025-12-06 14:30
Source: voice
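
A note in the layout above could be produced by a small formatting function like the one below. This is a sketch: `renderNote` is a hypothetical helper, and the timestamp pattern is inferred from the sample.

```kotlin
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Builds a markdown note in the layout shown above.
fun renderNote(title: String, body: String, createdAt: LocalDateTime): String {
    val stamp = createdAt.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm"))
    return listOf(
        "# $title",
        "",
        body.trim(),
        "",
        "---",
        "Created: $stamp",
        "Source: voice"
    ).joinToString("\n")
}

fun main() {
    println(renderNote("Voice Note Title", "Your transcribed text here...",
        LocalDateTime.of(2025, 12, 6, 14, 30)))
}
```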

Tasks (Backlog.md Compatible)

---
title: Task Title
status: To Do
priority: medium
created: 2025-12-06T14:30:00
source: voice
---

# Task Title

Your transcribed text here...
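
The task format differs from the note format mainly in its front matter block. Below is a hedged sketch of a generator for it. `renderTask` is a hypothetical name, the field values mirror the sample above, and the code assumes the title contains no characters that would need YAML quoting (such as a colon).

```kotlin
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Builds a task file with front matter followed by a markdown body.
// Assumes the title needs no YAML escaping.
fun renderTask(
    title: String,
    body: String,
    createdAt: LocalDateTime,
    priority: String = "medium"
): String {
    val created = createdAt.format(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss"))
    return listOf(
        "---",
        "title: $title",
        "status: To Do",
        "priority: $priority",
        "created: $created",
        "source: voice",
        "---",
        "",
        "# $title",
        "",
        body.trim()
    ).joinToString("\n")
}

fun main() {
    println(renderTask("Task Title", "Your transcribed text here...",
        LocalDateTime.of(2025, 12, 6, 14, 30)))
}
```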

Troubleshooting

Model won't load

  • Ensure sufficient storage (~250MB free)
  • Check internet connection for initial download
  • Try a smaller model (Tiny instead of Small)

Recording not working

  • Check that the microphone permission is granted
  • Ensure no other app is using the microphone
  • Try restarting the app

Volume buttons not detected

  • Enable Accessibility Service in Android Settings
  • Grant all requested permissions
  • Some custom ROMs may block this feature

Floating button not appearing

  • Enable the "Display over other apps" permission
  • Check for the "Floating Button Active" notification
  • Some launchers may hide overlays

Privacy

  • All transcription happens on-device
  • No audio or text is sent to any server
  • No analytics or tracking
  • Notes/tasks saved only to local storage

Credits

License

MIT