// exploration · voice-coach · open question

What can voice do that a chat box can't?

An exploration of spoken interaction with AI: when talking is genuinely faster, clearer or more accessible than typing — and how to build for it, instead of bolting a microphone onto a chatbot.

The question

Most "voice AI" is a chatbot you talk at: speech is just a transcription front-end for the same old text turn. But speech is a different medium: hands-free, serial, with timing and tone. The open question isn't "can we add voice"; it's where voice is actually the right interface, and what makes it good there.

What it explores

  • Turn-taking & barge-in. When may the assistant finish a sentence, and when must it stop the instant a human speaks?
  • Latency budget. How fast must a reply land before a conversation feels broken, and which techniques (streaming, backchannels) buy time?
  • When voice wins. Hands busy, eyes elsewhere, on the move, procedural steps — versus where text stays clearly better (lists, code, anything to re-read).
  • Accessibility. Speech as a primary way in, not a gimmick — for people and situations where typing is the barrier.
  • Holding the thread. How a voice interface keeps context across turns without making you repeat yourself.
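
To make the turn-taking question concrete, here is a minimal sketch of one possible barge-in rule, not a settled design. It assumes a very short burst of user speech is a backchannel ("mm-hm") the assistant can talk through, and that anything longer is an interruption. The names `decideOnUserSpeech`, `BargeInPolicy` and `backchannelMaxMs` are hypothetical, and the threshold is a placeholder to be tuned by listening, not a measured value.

```typescript
// Hypothetical sketch: nothing here is existing voice-coach code.
type AssistantState = "idle" | "speaking";
type Decision = "ignore" | "keep-speaking" | "stop-now";

interface BargeInPolicy {
  // Assumption: user speech at or below this duration is treated as a backchannel.
  backchannelMaxMs: number;
}

// Decide what the assistant does when user speech of a given duration is detected.
function decideOnUserSpeech(
  state: AssistantState,
  userSpeechMs: number,
  policy: BargeInPolicy,
): Decision {
  // Nothing to interrupt if the assistant is not talking.
  if (state !== "speaking") return "ignore";
  // Short bursts count as backchannels; longer speech is a real barge-in.
  return userSpeechMs <= policy.backchannelMaxMs ? "keep-speaking" : "stop-now";
}

// Placeholder value for illustration only.
const policy: BargeInPolicy = { backchannelMaxMs: 300 };
console.log(decideOnUserSpeech("speaking", 150, policy));
console.log(decideOnUserSpeech("speaking", 900, policy));
```

Keeping the rule a pure function means the threshold can be swapped or replaced wholesale (for example, "always stop") while the prototype is still being felt out.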

The approach

Build the smallest real spoken interaction for one genuine task, measure where it helps and where it grates, and let that decide what voice-coach becomes. Research first, product second.

Built with (planned)

Workers AI / speech · streaming audio · TypeScript

If a spoken flow ever needs to take a real, irreversible action, it can hand that off to mcp-approval — but voice-coach is about the interaction itself, not the approval. That's a separate question.

Status & roadmap

Where it stands: concept — a research question with a defined shape, no production code yet. It's on the hub because the question is worth answering, not to pad the list.

Next: a throwaway prototype of one spoken flow to feel out turn-taking and latency for real.
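
One way such a prototype could put numbers on latency is sketched below, assuming it can timestamp three events per turn: when the user stops speaking, when an optional backchannel plays, and when the first reply audio plays. The field names, `reportLatency`, and the budget passed in are all hypothetical; the budget itself is the thing the research is meant to find, so it stays a parameter.

```typescript
// Hypothetical sketch: timestamps are in milliseconds on one shared clock.
interface TurnTimings {
  userStoppedAt: number;   // end of user speech detected
  firstAudioAt: number;    // first chunk of the actual reply starts playing
  backchannelAt?: number;  // optional filler sound played before the reply
}

interface LatencyReport {
  timeToFirstAudioMs: number; // silence until the real reply begins
  perceivedGapMs: number;     // silence until any assistant sound at all
  withinBudget: boolean;      // perceived gap compared against the budget
}

function reportLatency(t: TurnTimings, budgetMs: number): LatencyReport {
  const timeToFirstAudioMs = t.firstAudioAt - t.userStoppedAt;
  // A backchannel only buys time if it plays before the reply does.
  const firstSoundAt =
    t.backchannelAt !== undefined
      ? Math.min(t.backchannelAt, t.firstAudioAt)
      : t.firstAudioAt;
  const perceivedGapMs = firstSoundAt - t.userStoppedAt;
  return {
    timeToFirstAudioMs,
    perceivedGapMs,
    withinBudget: perceivedGapMs <= budgetMs,
  };
}

// Illustrative numbers only, not measurements.
console.log(reportLatency({ userStoppedAt: 1000, firstAudioAt: 1900, backchannelAt: 1300 }, 500));
```

Reporting the two gaps separately keeps the "what buys time" question answerable: a backchannel shortens the perceived gap without changing how long the real reply takes.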