Cleft

Founder dev log

Recording that survives the real world

The hard part of a voice app is capturing clean audio on real devices that connect late, switch mid-sentence, and get interrupted by calls. Here is the work that went into making recording survive that.

Author: Jonathan Cosgrove
Read: 2 min

People assume the hard part of a voice notes app is the transcription. In practice the hard part is getting clean audio off a real device, in a real pocket, on a real commute, every time.

A lot of our work between 1.10.2 and 1.12.2 went into one unglamorous goal: you press record, you talk, and the whole thing is captured, with no silent gaps, no empty notes, and no crash when you stop.

The audio session keeps changing underneath you

The audio session is not really yours. iOS hands you an AVAudioSession and can reshape it at any moment:

  • Your AirPods connect a half second after you tap record.
  • You start on the phone speaker and plug in a USB mic halfway through.
  • A call comes in and the system takes the microphone away.
  • You pause, take a breath, and resume.

Each of these used to be a way to lose part of a recording, or to get silence where your voice should be. There was no single clever fix. Each transition had to be handled on its own: when the device changes, keep recording and follow it; when you pause and resume, make sure the resumed audio actually has sound in it; when the session is interrupted while you are stopping, save the note instead of letting the app fall over.
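To make "each transition handled on its own" concrete, here is a minimal sketch, not Cleft's actual code. The `SessionEvent`, `RecorderState`, and `RecorderAction` types and the `action(for:in:)` function are hypothetical names invented for illustration, and pausing on an interruption during recording is an assumption. Only the `AVAudioSession` notifications in the iOS-only part are real system API.

```swift
import Foundation
#if os(iOS)
import AVFoundation
#endif

// Hypothetical model of the session events described above.
enum SessionEvent {
    case routeChanged                        // AirPods connect, USB mic plugged in
    case interruptionBegan                   // e.g. a call takes the microphone
    case interruptionEnded(shouldResume: Bool)
}

enum RecorderState { case recording, paused, stopping }

// Illustrative actions; a real recorder would carry more detail.
enum RecorderAction: Equatable {
    case followNewInput   // keep recording and rebind to the new device
    case saveAndFinish    // persist the note rather than fail
    case pause            // assumption: pause when interrupted mid-recording
    case resume
    case ignore
}

// One explicit rule per transition instead of a single catch-all fix.
func action(for event: SessionEvent, in state: RecorderState) -> RecorderAction {
    switch (event, state) {
    case (.routeChanged, .recording):
        return .followNewInput
    case (.interruptionBegan, .stopping):
        return .saveAndFinish
    case (.interruptionBegan, .recording):
        return .pause
    case (.interruptionEnded(let shouldResume), .paused):
        return shouldResume ? .resume : .ignore
    default:
        return .ignore
    }
}

#if os(iOS)
// Feeds real AVAudioSession notifications into the rules above.
// Keep the returned tokens alive for as long as you want to observe.
func observeSession(state: @escaping @Sendable () -> RecorderState,
                    perform: @escaping @Sendable (RecorderAction) -> Void) -> [NSObjectProtocol] {
    let center = NotificationCenter.default
    let session = AVAudioSession.sharedInstance()
    let route = center.addObserver(forName: AVAudioSession.routeChangeNotification,
                                   object: session, queue: .main) { _ in
        perform(action(for: .routeChanged, in: state()))
    }
    let interruption = center.addObserver(forName: AVAudioSession.interruptionNotification,
                                          object: session, queue: .main) { note in
        guard let raw = note.userInfo?[AVAudioSessionInterruptionTypeKey] as? UInt,
              let type = AVAudioSession.InterruptionType(rawValue: raw) else { return }
        switch type {
        case .began:
            perform(action(for: .interruptionBegan, in: state()))
        case .ended:
            let optRaw = note.userInfo?[AVAudioSessionInterruptionOptionKey] as? UInt ?? 0
            let resume = AVAudioSession.InterruptionOptions(rawValue: optRaw).contains(.shouldResume)
            perform(action(for: .interruptionEnded(shouldResume: resume), in: state()))
        @unknown default:
            break
        }
    }
    return [route, interruption]
}
#endif
```

Keeping the decision table free of any framework types means each rule can be checked on its own, without a device or a real audio session.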

By 1.12.2 the result was simple to state: connect AirPods, plug in a mic, switch a Bluetooth headset, and your recording keeps going without missing a word.

Asking for the mic once

The other half of "just works" is permission timing. Ask for the microphone at the wrong moment and you teach people to tap No. So we ask once, at the point it makes sense, and then stay out of the way. After that, recording should never make you think about permission again.
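One way to sketch "ask once" is to decide what to do from the current permission state at the moment the user taps record. This is an illustration under stated assumptions, not our shipping code: `MicPermission`, `PermissionStep`, `nextStep(for:)`, and `handleRecordTap` are hypothetical names, and treating the first record tap as "the point it makes sense" is an assumption. The iOS-only part uses the `AVAudioSession` permission calls, which newer SDKs mark as deprecated in favor of equivalents on `AVAudioApplication`.

```swift
import Foundation
#if os(iOS)
import AVFoundation
#endif

// Hypothetical mirror of the system's three permission states.
enum MicPermission { case undetermined, denied, granted }

// What the app does when the user taps record. Names are illustrative.
enum PermissionStep: Equatable {
    case prompt           // show the system prompt (only ever shown once)
    case record           // already allowed: start without asking again
    case explainSettings  // already denied: point to Settings, do not nag
}

func nextStep(for permission: MicPermission) -> PermissionStep {
    switch permission {
    case .undetermined: return .prompt
    case .granted:      return .record
    case .denied:       return .explainSettings
    }
}

#if os(iOS)
func currentPermission() -> MicPermission {
    switch AVAudioSession.sharedInstance().recordPermission {
    case .granted:      return .granted
    case .denied:       return .denied
    case .undetermined: return .undetermined
    @unknown default:   return .undetermined
    }
}

// Assumption: called from the record button's tap handler.
func handleRecordTap(startRecording: @escaping @Sendable () -> Void,
                     showSettingsHint: @escaping @Sendable () -> Void) {
    switch nextStep(for: currentPermission()) {
    case .record:
        startRecording()
    case .explainSettings:
        showSettingsHint()
    case .prompt:
        AVAudioSession.sharedInstance().requestRecordPermission { granted in
            // The callback may arrive off the main thread.
            DispatchQueue.main.async {
                if granted { startRecording() } else { showSettingsHint() }
            }
        }
    }
}
#endif
```

Because iOS shows its own prompt only while the state is undetermined, checking the state first is what keeps every later recording free of permission UI.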

Protecting the recording first

When something does go wrong, the recording is the thing we protect. Stop a recording while a call interrupts the audio and the note is preserved instead of lost to a crash. When the device runs short on memory during transcription, the work falls back from the GPU to the CPU instead of taking the app down.
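The GPU-to-CPU fallback can be sketched as a retry on a slower backend. Everything here is hypothetical: `ComputeBackend`, `TranscriptionError`, and `transcribeWithFallback` are illustrative names, and the closure stands in for whatever transcription engine is in use. The sketch assumes the engine reports memory pressure as a catchable error.

```swift
import Foundation

// Hypothetical backends and error; a real engine defines its own.
enum ComputeBackend { case gpu, cpu }
enum TranscriptionError: Error { case outOfMemory }

// Try the fast path first. If it runs out of memory, retry on the CPU
// instead of letting the failure take the app down. Any other error,
// or a failure on the CPU retry, is passed on to the caller.
func transcribeWithFallback(_ run: (ComputeBackend) throws -> String) throws -> String {
    do {
        return try run(.gpu)
    } catch TranscriptionError.outOfMemory {
        return try run(.cpu)
    }
}
```

The recording itself is never an input to this decision: it is already saved, so the retry only costs time.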

Losing a recording is the failure we are least willing to accept, so wherever there is a tradeoff we take the slower, safer path.