Native Android app for voice-to-text with on-device Whisper transcription


Jeff Emmett c459d58563 fix: build configuration and resource fixes for successful APK build - Fixed sherpa-onnx dependency to use Maven Central package - Fixed VoiceIntent enum name conflict with android.content.Intent - Added AndroidX configuration in gradle.properties - Added gradle wrapper jar and script - Added app launcher icons (adaptive icons) - Fixed drawable tint references - Added colors.xml resource file - Downloaded Whisper tiny.en model tokens.txt - Updated download-models.sh to download tar.bz2 package Build now produces 141MB debug APK with sherpa-onnx and Whisper model. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>		2025-12-07 12:52:18 -08:00


Voice Command - Native Android App

A fully integrated Android app for voice-to-text transcription with on-device Whisper processing. No server required, no Termux, no additional apps needed.

Features

100% On-Device Transcription - Uses sherpa-onnx with Whisper models
Privacy-First - All processing happens locally, no data leaves your device
Multiple Trigger Methods:
- Floating button overlay (always accessible)
- Volume button combo (press Volume Up and Volume Down together)
- Quick Settings tile (notification shade)
Smart Routing:
- Copy to clipboard
- Share via any app
- Save as markdown note
- Create task (Backlog.md compatible)
Intent Detection - Automatically suggests best action based on content
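The README does not describe how intent detection decides between actions. As a rough illustration only, a naive keyword heuristic could map a transcript to one of the routing actions listed above. The keywords, the action names, and the `suggest_action` helper in this shell sketch are hypothetical, not the app's actual rules.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL sketch of keyword-based intent detection. The keywords and the
# returned action names are illustrative assumptions, not the app's rules.
suggest_action() {
  local text
  # Lowercase the transcript so matching is case-insensitive.
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$text" in
    *"remind me"*|*"todo"*|*"need to"*) echo "task" ;;
    *"note"*|*"idea"*)                  echo "note" ;;
    *)                                  echo "clipboard" ;;  # default: copy
  esac
}

suggest_action "Remind me to call the dentist"   # task
suggest_action "Idea for the blog post"          # note
suggest_action "Meet at 5pm"                     # clipboard
```

Substring matching like this is cheap but produces false positives (any transcript containing "note" becomes a note), which is why the app only *suggests* an action and still shows the menu.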

Requirements

Android 10 (API 29) or higher
~40-250MB storage for the Whisper model, depending on the model chosen
Microphone permission
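To check a device against the API 29 minimum before installing, you can read its API level over adb from the standard `ro.build.version.sdk` system property. A minimal shell sketch; `meets_min_sdk` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Minimal sketch: checks whether an Android API level meets the README's
# minimum (Android 10 = API 29).
MIN_SDK=29

meets_min_sdk() {
  local sdk="$1"
  # Reject empty or non-numeric input (e.g. when no device is attached).
  [[ "$sdk" =~ ^[0-9]+$ ]] || return 2
  (( sdk >= MIN_SDK ))
}

# Usage with a connected device (requires adb on PATH; tr strips the CR
# that some adb versions append to shell output):
#   sdk=$(adb shell getprop ro.build.version.sdk | tr -d '\r')
#   if meets_min_sdk "$sdk"; then echo "OK (API $sdk)"; else echo "Unsupported: $sdk"; fi
```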

Installation

From APK (Recommended)

Download the latest APK from releases
Enable "Install from unknown sources" if prompted
Install and open Voice Command
Grant microphone permission
Wait for model download (~40-250MB depending on selected model)

Build from Source

```bash
# Clone the repository (the Gradle wrapper is at the repository root)
git clone https://gitea.jeffemmett.com/jeffemmett/voice-command.git
cd voice-command

# Build debug APK
./gradlew assembleDebug

# Build release APK (requires signing config)
./gradlew assembleRelease
```

APKs are written to app/build/outputs/apk/<variant>/, for example app/build/outputs/apk/debug/app-debug.apk for a debug build.
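After a build, the APK can be installed on a connected device with `adb install -r`, which replaces an existing install while keeping its data. A minimal sketch that assumes the default Android Gradle Plugin output layout; `find_apk` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Minimal sketch: print the most recently built APK for a build variant.
# Assumes the default layout app/build/outputs/apk/<variant>/*.apk.
find_apk() {
  local variant="${1:-debug}" root="${2:-.}"
  local dir="$root/app/build/outputs/apk/$variant"
  # ls -t sorts newest first; keep only the first match. Prints nothing
  # if the variant has not been built.
  ls -t "$dir"/*.apk 2>/dev/null | head -n 1
}

# Usage (from the repository root, with a device connected):
#   apk=$(find_apk debug) && [ -n "$apk" ] && adb install -r "$apk"
```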

Usage

Quick Start

Open the app and grant microphone permission
Tap the big mic button to start recording
Speak your note or task
Tap again to stop - transcription happens automatically
Choose an action from the menu

Trigger Methods

Floating Button

Enable it in the app's Settings (requires the "Display over other apps" permission)
Drag to reposition
Tap to start/stop recording
Works over any app

Volume Buttons

Enable the app's accessibility service in Android Settings (usually under Accessibility)
Press Volume Up + Volume Down simultaneously
Vibration confirms recording start/stop
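On a development device, the accessibility service can also be switched on over adb through the `enabled_accessibility_services` secure setting, which holds a colon-separated list of component names. Because `settings put` replaces the whole list, this sketch appends to it instead. The package name in the usage comment is a placeholder: the README names only the `VolumeButtonAccessibilityService` class, not the app's package. `append_service` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Minimal sketch: add one component to the colon-separated list stored in
# Settings.Secure "enabled_accessibility_services" without dropping services
# that are already enabled.
append_service() {
  local current="$1" component="$2"
  # `settings get` prints "null" when the setting has never been written.
  if [[ -z "$current" || "$current" == "null" ]]; then
    echo "$component"
  elif [[ ":$current:" == *":$component:"* ]]; then
    echo "$current"              # already enabled; leave the list unchanged
  else
    echo "$current:$component"
  fi
}

# Usage (PLACEHOLDER component: replace com.example.voicecommand and the class
# path with the values declared in the app's manifest):
#   comp="com.example.voicecommand/.VolumeButtonAccessibilityService"
#   cur=$(adb shell settings get secure enabled_accessibility_services | tr -d '\r')
#   adb shell settings put secure enabled_accessibility_services "$(append_service "$cur" "$comp")"
#   adb shell settings put secure accessibility_enabled 1
```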

Quick Settings Tile

Swipe down to open the notification shade, then open the Quick Settings editor (usually the pencil icon)
Add the "Voice Note" tile
Tap tile to toggle recording

Models

| Model | Size | Languages | Quality |
|---|---|---|---|
| Tiny English | ~40MB | English only | Good for quick notes |
| Base English | ~75MB | English only | Better accuracy |
| Small English | ~250MB | English only | Best accuracy |
| Tiny | ~40MB | Multilingual | Basic quality |
| Base | ~75MB | Multilingual | Good quality |
| Small | ~250MB | Multilingual | Best quality |
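A rough way to choose a model from the table above given the free space on a device. Sizes are the table's approximate figures; the names follow Whisper's own naming (`tiny.en`, `base`, ...) and may not match the labels in the app's settings. `pick_model` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Minimal sketch: pick the largest Whisper model from the table that fits in
# the given free space (MB). Names follow Whisper's convention and are not
# necessarily the identifiers used by the app.
pick_model() {
  local free_mb="$1" lang="${2:-en}"   # lang: "en" (English-only) or "multi"
  local suffix=""
  if [[ "$lang" == "en" ]]; then suffix=".en"; fi
  if   (( free_mb >= 250 )); then echo "small$suffix"
  elif (( free_mb >= 75 ));  then echo "base$suffix"
  elif (( free_mb >= 40 ));  then echo "tiny$suffix"
  else
    echo "none"   # not enough space for even the Tiny model
    return 1
  fi
}
```

Comparing against the bare model size leaves no headroom for temporary files during the download, so in practice you may want to subtract a margin from the free space first.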

Architecture

```
┌─────────────────────────────────────────────────────┐
│                  Voice Command App                  │
├─────────────────────────────────────────────────────┤
│  UI Layer (Jetpack Compose)                         │
│  ├── MainActivity (main interface)                  │
│  ├── RecordingScreen (recording controls)           │
│  └── TranscriptionResultActivity (result dialog)    │
├─────────────────────────────────────────────────────┤
│  Service Layer                                      │
│  ├── FloatingButtonService (overlay)                │
│  ├── VolumeButtonAccessibilityService (vol combo)   │
│  └── VoiceCommandTileService (Quick Settings)       │
├─────────────────────────────────────────────────────┤
│  Core Layer                                         │
│  ├── AudioRecorder (16kHz PCM capture)              │
│  ├── SherpaTranscriptionEngine (Whisper wrapper)    │
│  └── ActionRouter (clipboard, files, share)         │
├─────────────────────────────────────────────────────┤
│  Native Layer (sherpa-onnx)                         │
│  └── Whisper ONNX models + ONNX Runtime             │
└─────────────────────────────────────────────────────┘
```
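The AudioRecorder layer captures 16 kHz PCM. Assuming 16-bit mono samples (the README states only the sample rate), raw audio size grows linearly with duration: 16,000 samples/s × 2 bytes × 1 channel = 32,000 bytes per second. A small shell calculation under that assumption:

```bash
#!/usr/bin/env bash
# Worked arithmetic for the AudioRecorder layer: raw PCM size of a recording.
# ASSUMPTION: 16-bit (2-byte) mono samples; the README only states 16 kHz PCM.
SAMPLE_RATE=16000     # samples per second
BYTES_PER_SAMPLE=2    # 16-bit PCM
CHANNELS=1            # mono

pcm_bytes() {
  local seconds="$1"
  echo $(( seconds * SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS ))
}

pcm_bytes 1     # 32000 bytes per second
pcm_bytes 60    # 1920000 bytes (about 1.83 MiB) per minute
```

At under 2 MiB per minute, even long voice notes fit comfortably in memory before being handed to the transcription engine.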

Permissions

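| Permission | Purpose |
|---|---|
| `RECORD_AUDIO` | Voice recording |
| `SYSTEM_ALERT_WINDOW` | Floating button overlay |
| `FOREGROUND_SERVICE` | Recording while the app is in the background |
| `POST_NOTIFICATIONS` | Service notifications (runtime permission on Android 13+) |
| `VIBRATE` | Recording feedback |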
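On a test device, the runtime and special-access permissions in the table can be granted from adb instead of tapping through prompts: `pm grant` handles runtime permissions and `appops set` handles the overlay permission. `FOREGROUND_SERVICE` and `VIBRATE` are normal permissions granted at install time, so they need no command. The package name is a placeholder (the README does not state it), and `grant_commands` is a hypothetical helper that only prints the commands so you can review them first.

```bash
#!/usr/bin/env bash
# Minimal sketch: print adb commands that grant the user-facing permissions.
# The package passed in is a PLACEHOLDER; use the app's real package name.
grant_commands() {
  local pkg="$1"
  echo "adb shell pm grant $pkg android.permission.RECORD_AUDIO"
  # POST_NOTIFICATIONS is a runtime permission only on Android 13 (API 33)+;
  # on older devices this command fails and can be skipped.
  echo "adb shell pm grant $pkg android.permission.POST_NOTIFICATIONS"
  # SYSTEM_ALERT_WINDOW ("Display over other apps") is an app op, not pm grant.
  echo "adb shell appops set $pkg SYSTEM_ALERT_WINDOW allow"
}

# Review first, then execute:
#   grant_commands com.example.voicecommand
#   grant_commands com.example.voicecommand | sh
```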

Output Formats

Notes (Markdown)

```markdown
# Voice Note Title

Your transcribed text here...

---
Created: 2025-12-06 14:30
Source: voice
```
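To produce files in the same note layout outside the app (for example, to test tooling that reads these notes), a minimal shell sketch follows. The timestamp-based file name and the `write_note` helper are assumptions; the README does not say how the app names its files.

```bash
#!/usr/bin/env bash
# Minimal sketch: write a transcript in the README's note layout.
# HYPOTHETICAL naming scheme: <dir>/YYYYMMDD-HHMMSS.md
write_note() {
  local title="$1" body="$2" dir="${3:-.}"
  local file="$dir/$(date '+%Y%m%d-%H%M%S').md"
  {
    printf '# %s\n\n' "$title"
    printf '%s\n\n' "$body"
    printf -- '---\n'                       # "--" stops printf reading "---" as an option
    printf 'Created: %s\n' "$(date '+%Y-%m-%d %H:%M')"
    printf 'Source: voice\n'
  } > "$file"
  echo "$file"                              # report the path that was written
}
```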

Tasks (Backlog.md Compatible)

```markdown
---
title: Task Title
status: To Do
priority: medium
created: 2025-12-06T14:30:00
source: voice
---

# Task Title

Your transcribed text here...
```
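The task layout can be generated the same way. This sketch writes the frontmatter and heading shown above to a path chosen by the caller; `write_task` is a hypothetical helper, and titles containing `: ` or other YAML-special characters would need quoting, which it does not do.

```bash
#!/usr/bin/env bash
# Minimal sketch: write a task with Backlog.md-style YAML frontmatter,
# mirroring the README's task layout (status and priority fixed as shown).
write_task() {
  local title="$1" body="$2" file="$3"
  {
    printf -- '---\n'
    printf 'title: %s\n' "$title"
    printf 'status: To Do\n'
    printf 'priority: medium\n'
    printf 'created: %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')"
    printf 'source: voice\n'
    printf -- '---\n\n'
    printf '# %s\n\n' "$title"
    printf '%s\n' "$body"
  } > "$file"
}
```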

Troubleshooting

Model won't load

Ensure sufficient storage (~250MB free)
Check your internet connection (required for the initial model download)
Try a smaller model (Tiny instead of Small)
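To check free space against the ~250MB figure above, you can parse `df` output. This sketch assumes POSIX-style `df -Pk` columns (available KiB in the fourth field); on a device, toybox `df -k` is assumed to use the same column order. `avail_mb` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Minimal sketch: read `df -Pk`-style output on stdin and print the available
# space of the first filesystem listed, in MiB (integer, rounded down).
avail_mb() {
  # Line 1 is the header; column 4 of line 2 is available space in KiB.
  awk 'NR == 2 { printf "%d\n", $4 / 1024 }'
}

# Usage:
#   df -Pk . | avail_mb                              # this machine
#   adb shell df -k /data | tr -d '\r' | avail_mb    # connected device
```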

Recording not working

Check that the microphone permission is granted
Ensure no other app is using the microphone
Try restarting the app

Volume buttons not detected

Enable the app's accessibility service in Android Settings (usually under Accessibility)
Grant all requested permissions
Some custom ROMs may block this feature

Floating button not appearing

Enable "Display over other apps" permission
Check that the "Floating Button Active" notification is showing
Some launchers may hide overlays

Privacy

All transcription happens on-device
No audio or text is sent to any server
No analytics or tracking
Notes/tasks saved only to local storage

Credits

sherpa-onnx - On-device speech recognition
OpenAI Whisper - Original Whisper model
Jetpack Compose - Modern Android UI

License

MIT