VoiceLayer adds bidirectional voice to AI coding agents via the Model Context Protocol. It provides 5 voice modes (announce, brief, consult, converse, think) for different interaction patterns — from fire-and-forget status updates to full voice Q&A with local speech-to-text. Uses edge-tts for neural text-to-speech and whisper.cpp for local transcription (~300ms on Apple Silicon). Session booking prevents mic conflicts between parallel Claude sessions. Everything runs locally with zero cloud APIs.
voice_speak and voice_ask cover the full range: fire-and-forget TTS to interactive Q&A, with automatic mode detection.
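A minimal sketch of what those two tool surfaces and the mode auto-detection could look like. The type and parameter names here are illustrative assumptions, not VoiceLayer's actual API; the heuristic shown (a trailing question mark implies a conversation) is one plausible detection rule, not the project's documented logic.

```typescript
// Hypothetical shapes for the two tools; names are assumptions.
type VoiceMode = "announce" | "brief" | "consult" | "converse" | "think";

interface VoiceSpeakInput {
  text: string;     // what the agent says aloud
  mode?: VoiceMode; // omitted in practice: the mode is auto-detected
}

interface VoiceAskInput {
  question: string;   // spoken prompt to the user
  timeoutMs?: number; // how long to listen before falling back to text
}

// One plausible auto-detection rule: questions open a conversation,
// statements are fire-and-forget announcements.
function detectMode(text: string): VoiceMode {
  return text.trimEnd().endsWith("?") ? "converse" : "announce";
}

console.log(detectMode("Build passed."));     // announce
console.log(detectMode("Ship this change?")); // converse
```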
whisper.cpp transcription at ~300ms. No cloud APIs, no data leaving your machine.
Speech (user voice input) → STT (whisper.cpp, ~300ms) → Voice Tools (2 tools, auto detection) → Session Mgr (lockfile mutex) → TTS Output (edge-tts, neural)
bunx voicelayer-mcp

Typing every interaction with AI coding agents felt wrong. QA testing, code review, and design discussions should be conversations, not typing marathons. Existing voice platforms charge per minute and send data to the cloud.
Designed 5 distinct modes for different moments: announce (fire-and-forget status), brief (agent reads back findings), consult (checkpoint before action), converse (full bidirectional Q&A), and think (silent notes to markdown).
Speech-to-text runs locally using whisper.cpp with CoreML/Metal acceleration — transcription in ~200-400ms on Apple Silicon. No cloud APIs, no per-minute billing, no data leaving the machine.
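The local STT step amounts to shelling out to a whisper.cpp binary. A sketch under assumptions: the binary name ("whisper-cli") and model path are placeholders, while -m, -f, and -nt are whisper.cpp's flags for the model file, the input WAV, and timestamp-free text output. The argument list is built by a separate helper so it can be tested without the binary installed.

```typescript
// Hedged sketch of invoking whisper.cpp for local transcription.
// Binary path and model path below are assumptions, not VoiceLayer's config.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Build the CLI argument list; kept separate so it is unit-testable.
function whisperArgs(wavPath: string, modelPath: string): string[] {
  return ["-m", modelPath, "-f", wavPath, "-nt"];
}

async function transcribe(wavPath: string): Promise<string> {
  const { stdout } = await run(
    "./whisper-cli",                                 // hypothetical binary path
    whisperArgs(wavPath, "models/ggml-base.en.bin"), // hypothetical model path
  );
  return stdout.trim(); // whisper.cpp prints the transcript to stdout
}
```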
Lockfile-based mutex prevents mic conflicts. Only one voice session at a time — other Claude sessions see "line busy" and fall back to text. Stale locks from dead processes are auto-cleaned.
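A lockfile mutex of this shape can be sketched as follows. This is an illustrative implementation, not VoiceLayer's actual code: the lock path is a placeholder, and the stale-lock check uses the POSIX trick of sending signal 0 to test whether the holder process is still alive.

```typescript
// Sketch of a lockfile mic mutex with stale-lock cleanup (assumed design).
import { writeFileSync, readFileSync, unlinkSync, existsSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const LOCK = join(tmpdir(), "voicelayer.lock"); // hypothetical path

function pidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 sends nothing; throws if pid is gone
    return true;
  } catch {
    return false;
  }
}

function acquireMic(): boolean {
  if (existsSync(LOCK)) {
    const holder = Number(readFileSync(LOCK, "utf8"));
    if (pidAlive(holder)) return false; // "line busy": fall back to text
    unlinkSync(LOCK);                   // stale lock from a dead process
  }
  try {
    // "wx" fails if the file reappeared between check and write,
    // closing the race between two sessions grabbing the mic at once.
    writeFileSync(LOCK, String(process.pid), { flag: "wx" });
    return true;
  } catch {
    return false;
  }
}

function releaseMic(): void {
  try { unlinkSync(LOCK); } catch { /* already released */ }
}
```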
75 tests with 178 assertions. 7 MCP tools (5 modes + 2 aliases). Full CI pipeline, branch protection, TypeScript strict mode. Docs site live at etanhey.github.io/voicelayer.
Lockfile-based mic mutex. Other sessions see "line busy". No conflicts.
Neural-quality text-to-speech. Free, local, multiple voices. User-controlled stop.
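The TTS side can be sketched by driving the edge-tts CLI (a real tool; --text, --voice, and --write-media are its documented flags). The voice name, output path, and the afplay player are assumptions for a macOS setup, not VoiceLayer's exact pipeline; the argument builder is split out so it can be tested without edge-tts installed.

```typescript
// Hedged sketch of synthesizing and playing speech with edge-tts.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Build the edge-tts argument list; kept separate for unit testing.
function edgeTtsArgs(text: string, voice: string, outPath: string): string[] {
  return ["--text", text, "--voice", voice, "--write-media", outPath];
}

async function speak(text: string, voice = "en-US-AriaNeural"): Promise<void> {
  const out = "/tmp/voicelayer-tts.mp3";                // hypothetical path
  await run("edge-tts", edgeTtsArgs(text, voice, out)); // synthesize to mp3
  await run("afplay", [out]);                           // macOS player; use aplay/mpv on Linux
}
```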