VoiceLayer adds bidirectional voice to AI coding agents via the Model Context Protocol. It provides 5 voice modes (announce, brief, consult, converse, think) for different interaction patterns — from fire-and-forget status updates to full voice Q&A with local speech-to-text. Uses edge-tts for neural text-to-speech and whisper.cpp for local transcription (~300ms on Apple Silicon). Session booking prevents mic conflicts between parallel Claude sessions. Everything runs locally with zero cloud APIs.
voice_speak and voice_ask cover the full range: fire-and-forget TTS to interactive Q&A, with automatic mode detection.
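A minimal sketch of what those two tool surfaces and the mode auto-detection could look like. The type and parameter names here are illustrative assumptions, not VoiceLayer's actual API; the heuristic shown (a trailing question mark implies a conversation) is one plausible detection rule, not the project's documented logic.

```typescript
// Hypothetical shapes for the two tools; names are assumptions.
type VoiceMode = "announce" | "brief" | "consult" | "converse" | "think";

interface VoiceSpeakInput {
  text: string;     // what the agent says aloud
  mode?: VoiceMode; // omitted in practice: the mode is auto-detected
}

interface VoiceAskInput {
  question: string;   // spoken prompt to the user
  timeoutMs?: number; // how long to listen before falling back to text
}

// One plausible auto-detection rule: questions open a conversation,
// statements are fire-and-forget announcements.
function detectMode(text: string): VoiceMode {
  return text.trimEnd().endsWith("?") ? "converse" : "announce";
}

console.log(detectMode("Build passed."));     // announce
console.log(detectMode("Ship this change?")); // converse
```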
whisper.cpp transcription at ~300ms. No cloud APIs, no data leaving your machine.
Speech (user voice input) → STT (whisper.cpp, ~300ms) → Voice Tools (2 tools, auto detection) → Session Mgr (lockfile mutex) → TTS Output (edge-tts, neural)
bunx voicelayer-mcp

Typing every interaction with AI coding agents felt wrong. QA testing, code review, and design discussions should be conversations, not typing marathons. Existing voice platforms charge per minute and send data to the cloud.
Designed 5 distinct modes for different moments: announce (fire-and-forget status), brief (agent reads back findings), consult (checkpoint before action), converse (full bidirectional Q&A), and think (silent notes to markdown).
Speech-to-text runs locally using whisper.cpp with CoreML/Metal acceleration — transcription in ~200-400ms on Apple Silicon. No cloud APIs, no per-minute billing, no data leaving the machine.
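The local STT step amounts to shelling out to a whisper.cpp binary. A sketch under assumptions: the binary name ("whisper-cli") and model path are placeholders, while -m, -f, and -nt are whisper.cpp's flags for the model file, the input WAV, and timestamp-free text output. The argument list is built by a separate helper so it can be tested without the binary installed.

```typescript
// Hedged sketch of invoking whisper.cpp for local transcription.
// Binary path and model path below are assumptions, not VoiceLayer's config.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Build the CLI argument list; kept separate so it is unit-testable.
function whisperArgs(wavPath: string, modelPath: string): string[] {
  return ["-m", modelPath, "-f", wavPath, "-nt"];
}

async function transcribe(wavPath: string): Promise<string> {
  const { stdout } = await run(
    "./whisper-cli",                                 // hypothetical binary path
    whisperArgs(wavPath, "models/ggml-base.en.bin"), // hypothetical model path
  );
  return stdout.trim(); // whisper.cpp prints the transcript to stdout
}
```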
Lockfile-based mutex prevents mic conflicts. Only one voice session at a time — other Claude sessions see "line busy" and fall back to text. Stale locks from dead processes are auto-cleaned.
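A lockfile mutex of this shape can be sketched as follows. This is an illustrative implementation, not VoiceLayer's actual code: the lock path is a placeholder, and the stale-lock check uses the POSIX trick of sending signal 0 to test whether the holder process is still alive.

```typescript
// Sketch of a lockfile mic mutex with stale-lock cleanup (assumed design).
import { writeFileSync, readFileSync, unlinkSync, existsSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const LOCK = join(tmpdir(), "voicelayer.lock"); // hypothetical path

function pidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 sends nothing; throws if pid is gone
    return true;
  } catch {
    return false;
  }
}

function acquireMic(): boolean {
  if (existsSync(LOCK)) {
    const holder = Number(readFileSync(LOCK, "utf8"));
    if (pidAlive(holder)) return false; // "line busy": fall back to text
    unlinkSync(LOCK);                   // stale lock from a dead process
  }
  try {
    // "wx" fails if the file reappeared between check and write,
    // closing the race between two sessions grabbing the mic at once.
    writeFileSync(LOCK, String(process.pid), { flag: "wx" });
    return true;
  } catch {
    return false;
  }
}

function releaseMic(): void {
  try { unlinkSync(LOCK); } catch { /* already released */ }
}
```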
75 tests with 178 assertions. 7 MCP tools (5 modes + 2 aliases). Full CI pipeline, branch protection, TypeScript strict mode. Docs site live at etanhey.github.io/voicelayer.
Lockfile-based mic mutex. Other sessions see "line busy". No conflicts.
Neural-quality text-to-speech. Free, local, multiple voices. User-controlled stop.
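The TTS side can be sketched by driving the edge-tts CLI (a real tool; --text, --voice, and --write-media are its documented flags). The voice name, output path, and the afplay player are assumptions for a macOS setup, not VoiceLayer's exact pipeline; the argument builder is split out so it can be tested without edge-tts installed.

```typescript
// Hedged sketch of synthesizing and playing speech with edge-tts.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Build the edge-tts argument list; kept separate for unit testing.
function edgeTtsArgs(text: string, voice: string, outPath: string): string[] {
  return ["--text", text, "--voice", voice, "--write-media", outPath];
}

async function speak(text: string, voice = "en-US-AriaNeural"): Promise<void> {
  const out = "/tmp/voicelayer-tts.mp3";                // hypothetical path
  await run("edge-tts", edgeTtsArgs(text, voice, out)); // synthesize to mp3
  await run("afplay", [out]);                           // macOS player; use aplay/mpv on Linux
}
```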