# Voice Command - Native Android App

A self-contained Android app for voice-to-text transcription with on-device Whisper processing. No server, no Termux, and no additional apps are required.

## Features

- **100% On-Device Transcription** - Uses sherpa-onnx with Whisper models
- **Privacy-First** - All processing happens locally; no data leaves your device
- **Multiple Trigger Methods**:
  - Floating button overlay (always accessible)
  - Volume button combo (press both volume buttons)
  - Quick Settings tile (notification shade)
- **Smart Routing**:
  - Copy to clipboard
  - Share via any app
  - Save as Markdown note
  - Create task (Backlog.md compatible)
- **Intent Detection** - Automatically suggests the best action based on the transcribed content

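The README does not describe how Intent Detection picks an action. Purely as an illustration, a keyword heuristic in Kotlin might look like the sketch below; `RoutingAction`, the keyword lists, the eight-word threshold, and `suggestAction` are all hypothetical and are not taken from the app's `ActionRouter`.

```kotlin
// Hypothetical sketch: the app's real intent detection may work differently.
enum class RoutingAction { COPY, SHARE, NOTE, TASK }

// Illustrative trigger phrases, not taken from the Voice Command source.
private val taskKeywords = listOf("todo", "remind me", "need to", "task")
private val noteKeywords = listOf("note", "idea", "remember that")

// SHARE is never auto-suggested in this sketch; the user can still pick it.
fun suggestAction(transcript: String): RoutingAction {
    val text = transcript.lowercase().trim()
    val wordCount = text.split(Regex("\\s+")).size
    return when {
        taskKeywords.any { it in text } -> RoutingAction.TASK
        noteKeywords.any { it in text } -> RoutingAction.NOTE
        // Short snippets are most likely meant to be pasted somewhere.
        wordCount <= 8 -> RoutingAction.COPY
        else -> RoutingAction.NOTE
    }
}
```

With this sketch, `suggestAction("Remind me to renew the domain")` returns `RoutingAction.TASK`. Plain substring matching produces false positives (for example, "note" also matches "notebook"), so a real implementation would likely need word-boundary checks.
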
## Requirements

- Android 10 (API 29) or higher
- ~40-250MB of storage for the Whisper model, depending on the model selected
- Microphone permission

## Installation

### From APK (Recommended)

1. Download the latest APK from the releases page
2. If prompted, allow your browser or file manager to install unknown apps
3. Install and open Voice Command
4. Grant microphone permission
5. Wait for the model to download (~40-250MB, depending on the selected model)

### Build from Source

```bash
# Clone the repository
git clone https://gitea.jeffemmett.com/jeffemmett/voice-command.git
cd voice-command/android-native

# Build debug APK
./gradlew assembleDebug

# Build release APK (requires signing config)
./gradlew assembleRelease
```

The APK will be in `app/build/outputs/apk/`.

## Usage

### Quick Start

1. **Open the app** and grant microphone permission
2. **Tap the big mic button** to start recording
3. **Speak your note or task**
4. **Tap again to stop** - transcription happens automatically
5. **Choose an action** from the menu

### Trigger Methods

#### Floating Button

- Enable in Settings
- Drag to reposition
- Tap to start/stop recording
- Works over any app

#### Volume Buttons

- Enable the accessibility service in Android Settings
- Press Volume Up + Volume Down simultaneously
- A vibration confirms when recording starts and stops

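Detecting "both volume buttons at once" usually means treating the two key-down events as a combo only when they arrive within a short window of each other. The sketch below shows that idea in plain Kotlin; `VolumeComboDetector` and its 300 ms window are assumptions, not the app's code. The two constants mirror Android's `KeyEvent.KEYCODE_VOLUME_UP` (24) and `KeyEvent.KEYCODE_VOLUME_DOWN` (25).

```kotlin
import kotlin.math.abs

// Same values as android.view.KeyEvent.KEYCODE_VOLUME_UP / KEYCODE_VOLUME_DOWN.
const val KEYCODE_VOLUME_UP = 24
const val KEYCODE_VOLUME_DOWN = 25

// Hypothetical detector: reports a combo when both volume keys go down
// within windowMs of each other (300 ms is an assumed default).
class VolumeComboDetector(private val windowMs: Long = 300) {
    private var upPressedAt: Long? = null
    private var downPressedAt: Long? = null

    /** Returns true when this key-down completes a Volume Up + Volume Down combo. */
    fun onKeyDown(keyCode: Int, eventTimeMs: Long): Boolean {
        when (keyCode) {
            KEYCODE_VOLUME_UP -> upPressedAt = eventTimeMs
            KEYCODE_VOLUME_DOWN -> downPressedAt = eventTimeMs
            else -> return false
        }
        val up = upPressedAt ?: return false
        val down = downPressedAt ?: return false
        if (abs(up - down) > windowMs) return false
        // Clear both timestamps so the same pair of presses is reported only once.
        upPressedAt = null
        downPressedAt = null
        return true
    }
}
```

In an `AccessibilityService`, a detector like this would be fed from `onKeyEvent` for `KeyEvent.ACTION_DOWN` events (passing `event.keyCode` and `event.eventTime`); the service only receives key events if it requests key-event filtering in its configuration.
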
#### Quick Settings Tile

- Swipe down to open the notification shade
- Edit the tiles and add the "Voice Note" tile
- Tap the tile to toggle recording

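All three triggers above, like the main mic button in the Quick Start, start and stop the same record/transcribe cycle. A hypothetical Kotlin sketch of that cycle as a small state machine, assuming transcription completes asynchronously; `RecorderState` and `RecordingController` are illustrative names, not the app's classes.

```kotlin
// Hypothetical sketch of the tap-to-start / tap-to-stop cycle.
sealed class RecorderState {
    object Idle : RecorderState()
    object Recording : RecorderState()
    object Transcribing : RecorderState()
    data class Done(val text: String) : RecorderState()
}

class RecordingController {
    var state: RecorderState = RecorderState.Idle
        private set

    // Shared entry point for every trigger (mic button, overlay, volume combo, tile).
    fun toggle() {
        state = when (state) {
            RecorderState.Idle, is RecorderState.Done -> RecorderState.Recording
            RecorderState.Recording -> RecorderState.Transcribing
            // A tap while transcription is running is ignored.
            RecorderState.Transcribing -> RecorderState.Transcribing
        }
    }

    // Called when the (asynchronous) transcription engine returns text.
    fun onTranscriptionFinished(text: String) {
        if (state == RecorderState.Transcribing) state = RecorderState.Done(text)
    }
}
```

Routing every trigger through a single `toggle()` keeps the floating button, volume combo, and tile from disagreeing about whether a recording is in progress.
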
## Models

| Model | Size | Languages | Quality |
|-------|------|-----------|---------|
| Tiny English | ~40MB | English only | Good for quick notes |
| Base English | ~75MB | English only | Better accuracy |
| Small English | ~250MB | English only | Best accuracy |
| Tiny | ~40MB | Multilingual | Basic quality |
| Base | ~75MB | Multilingual | Good quality |
| Small | ~250MB | Multilingual | Best quality |

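The sizes in this table could drive a simple fallback when storage is tight, in the spirit of the troubleshooting advice to try Tiny instead of Small. A hypothetical sketch: the `WhisperModel` enum copies the approximate sizes from the table, while `largestModelThatFits` and its 50 MB headroom are assumptions, not the app's model-selection logic.

```kotlin
// Approximate download sizes copied from the Models table (MB).
enum class WhisperModel(val approxSizeMb: Int, val englishOnly: Boolean) {
    TINY_EN(40, true), BASE_EN(75, true), SMALL_EN(250, true),
    TINY(40, false), BASE(75, false), SMALL(250, false),
}

// Hypothetical helper: pick the largest model of the requested kind that fits
// in free storage, leaving some headroom for notes and temporary files.
fun largestModelThatFits(
    freeMb: Int,
    englishOnly: Boolean,
    headroomMb: Int = 50,
): WhisperModel? =
    WhisperModel.values()
        .filter { it.englishOnly == englishOnly }
        .filter { it.approxSizeMb + headroomMb <= freeMb }
        .maxByOrNull { it.approxSizeMb }
```

On a device, `freeMb` could be derived from something like `(context.filesDir.usableSpace / (1024 * 1024)).toInt()`; a `null` result means even the Tiny model would not fit.
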
## Architecture

```
┌───────────────────────────────────────────────────┐
│ Voice Command App                                 │
├───────────────────────────────────────────────────┤
│ UI Layer (Jetpack Compose)                        │
│ ├── MainActivity (main interface)                 │
│ ├── RecordingScreen (recording controls)          │
│ └── TranscriptionResultActivity (result dialog)   │
├───────────────────────────────────────────────────┤
│ Service Layer                                     │
│ ├── FloatingButtonService (overlay)               │
│ ├── VolumeButtonAccessibilityService (vol combo)  │
│ └── VoiceCommandTileService (Quick Settings)      │
├───────────────────────────────────────────────────┤
│ Core Layer                                        │
│ ├── AudioRecorder (16kHz PCM capture)             │
│ ├── SherpaTranscriptionEngine (Whisper wrapper)   │
│ └── ActionRouter (clipboard, files, share)        │
├───────────────────────────────────────────────────┤
│ Native Layer (sherpa-onnx)                        │
│ └── Whisper ONNX models + ONNX Runtime            │
└───────────────────────────────────────────────────┘
```

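The Core Layer's `AudioRecorder` captures 16 kHz PCM, while sherpa-onnx recognizers take floating-point samples normalized to roughly [-1, 1]. A hedged sketch of the glue in between, assuming 16-bit mono PCM (the README does not state bit depth or channel count); `pcm16ToFloat` and `durationSeconds` are hypothetical helpers.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// The README states 16 kHz capture; 16-bit mono is an assumption here.
const val SAMPLE_RATE_HZ = 16_000

// Hypothetical helper: 16-bit samples (e.g. from AudioRecord.read(ShortArray, ...))
// scaled to floats in [-1, 1).
fun pcm16ToFloat(samples: ShortArray): FloatArray =
    FloatArray(samples.size) { samples[it] / 32768.0f }

// Same conversion for raw little-endian bytes, such as a saved PCM buffer.
fun pcm16ToFloat(pcm: ByteArray): FloatArray {
    val shorts = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer()
    return FloatArray(shorts.remaining()) { shorts.get(it) / 32768.0f }
}

// 16-bit mono at 16 kHz is 2 bytes per sample, i.e. 32,000 bytes per second.
fun durationSeconds(pcmBytes: Int): Double = pcmBytes / 2.0 / SAMPLE_RATE_HZ
```

The resulting `FloatArray` is the shape the sherpa-onnx Kotlin examples hand to a stream's `acceptWaveform` together with the sample rate; the exact call should be checked against the sherpa-onnx version the app bundles.
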
## Permissions

| Permission | Purpose |
|------------|---------|
| `RECORD_AUDIO` | Voice recording |
| `SYSTEM_ALERT_WINDOW` | Floating button overlay |
| `FOREGROUND_SERVICE` | Background recording |
| `POST_NOTIFICATIONS` | Service notifications |
| `VIBRATE` | Recording feedback |

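These permissions are not all granted the same way, which is why the overlay needs its own "Display over other apps" step. A hedged summary of standard Android behavior as Kotlin; the `GrantType` enum and `grantTypeFor` function are illustrative only and not part of the app.

```kotlin
enum class GrantType { INSTALL_TIME, RUNTIME_PROMPT, SPECIAL_SETTINGS_SCREEN, NOT_APPLICABLE }

// Summary of standard Android behavior for the permissions in the table above.
fun grantTypeFor(permission: String, sdkInt: Int): GrantType = when (permission) {
    // Dangerous permission: the user must accept a runtime prompt.
    "android.permission.RECORD_AUDIO" -> GrantType.RUNTIME_PROMPT
    // Runtime permission only from Android 13 (API 33); earlier versions have no such permission.
    "android.permission.POST_NOTIFICATIONS" ->
        if (sdkInt >= 33) GrantType.RUNTIME_PROMPT else GrantType.NOT_APPLICABLE
    // Special access granted on the "Display over other apps" settings screen.
    "android.permission.SYSTEM_ALERT_WINDOW" -> GrantType.SPECIAL_SETTINGS_SCREEN
    // Normal permissions: granted automatically at install time.
    "android.permission.FOREGROUND_SERVICE",
    "android.permission.VIBRATE" -> GrantType.INSTALL_TIME
    else -> throw IllegalArgumentException("Not covered by this sketch: $permission")
}
```

In practice, the `RUNTIME_PROMPT` permissions can be requested together with `ActivityResultContracts.RequestMultiplePermissions`, while the overlay permission is obtained by sending the user to `Settings.ACTION_MANAGE_OVERLAY_PERMISSION`.
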
## Output Formats

### Notes (Markdown)

```markdown
# Voice Note Title

Your transcribed text here...

---
Created: 2025-12-06 14:30
Source: voice
```

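A minimal sketch of producing the note layout above from a title, transcript, and timestamp. `renderNote` is a hypothetical helper; how the app chooses the title is not documented, so the title is simply passed in.

```kotlin
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Matches the "Created: 2025-12-06 14:30" line in the example note.
private val noteTimestamp = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm")

// Hypothetical helper that mirrors the note layout shown above.
fun renderNote(title: String, body: String, created: LocalDateTime): String =
    buildString {
        appendLine("# $title")
        appendLine()
        appendLine(body.trim())
        appendLine()
        appendLine("---")
        appendLine("Created: ${created.format(noteTimestamp)}")
        appendLine("Source: voice")
    }
```

`appendLine` always writes `\n`, so the output uses Unix line endings regardless of platform.
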
### Tasks (Backlog.md Compatible)

```markdown
---
title: Task Title
status: To Do
priority: medium
created: 2025-12-06T14:30:00
source: voice
---

# Task Title

Your transcribed text here...
```

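A hedged sketch of writing the Backlog.md-compatible task file, reproducing the front-matter fields exactly as in the example (`title`, `status`, `priority`, `created`, `source`). `renderTask` is hypothetical; the `To Do` and `medium` defaults come from the example, and whether the app ever varies them is not documented.

```kotlin
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Explicit pattern: LocalDateTime.toString() would print "2025-12-06T14:30" (no seconds).
private val taskTimestamp = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss")

// Hypothetical helper that mirrors the task layout shown above.
fun renderTask(
    title: String,
    body: String,
    created: LocalDateTime,
    status: String = "To Do",
    priority: String = "medium",
): String = buildString {
    appendLine("---")
    appendLine("title: $title")
    appendLine("status: $status")
    appendLine("priority: $priority")
    appendLine("created: ${created.format(taskTimestamp)}")
    appendLine("source: voice")
    appendLine("---")
    appendLine()
    appendLine("# $title")
    appendLine()
    appendLine(body.trim())
}
```

A title containing `: ` would need YAML quoting in the front matter, which this sketch does not do.
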
## Troubleshooting

### Model won't load

- Ensure there is enough free storage (~250MB for the Small models)
- Check your internet connection for the initial model download
- Try a smaller model (Tiny instead of Small)

### Recording not working

- Check that microphone permission is granted
- Make sure no other app is using the microphone
- Try restarting the app

### Volume buttons not detected

- Enable the accessibility service in Android Settings
- Grant all requested permissions
- Some custom ROMs may block this feature

### Floating button not appearing

- Grant the "Display over other apps" permission
- Check for the "Floating Button Active" notification
- Some launchers may hide overlays

## Privacy

- **All transcription happens on-device**
- No audio or text is sent to any server
- No analytics or tracking
- Notes and tasks are saved only to local storage

## Credits

- [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) - On-device speech recognition
- [OpenAI Whisper](https://openai.com/research/whisper) - Original Whisper model
- [Jetpack Compose](https://developer.android.com/compose) - Modern Android UI

## License

MIT