Meetings and transcription

Control Center can record a call and hand you back a clean writeup — a summary, action items, and decisions — without any of the audio or transcript ever leaving your machine. Think Granola, but local-first and wired into the same workspace your agents live in.

A meeting is a recorded, transcribed session. It is not the same thing as a calendar event: an event is a scheduled commitment synced from Google, while a meeting is something you actually recorded. The two can be linked (you can start a recording from an event), but they stay distinct records.

Every meeting is workspace-scoped, like everything else in Control Center. Meetings recorded in one workspace never surface in another.

Everything happens on-device:

  • Audio is captured locally and written to a local file.
  • Transcription runs through an on-device Whisper model — no audio is uploaded.
  • Speaker diarization runs locally with sherpa-onnx.

The only step that involves an agent is the summary, which runs over the already-transcribed text through your configured agent CLI.

A meeting records two audio channels at once:

  • You (“me”) — your microphone.
  • Them (“them”) — the system audio output, captured by a driver-free loopback:
    • macOS — Core Audio process/device taps
    • Windows — WASAPI loopback
    • Linux — a PipeWire / PulseAudio monitor

Because the two channels are captured separately, the transcript is speaker-attributed from the start: your words are tagged me, and everyone else on the call is tagged them.

To keep your own voice from being transcribed twice (once from the mic, once bleeding through the system output), Control Center applies echo cancellation — a signal-level WebRTC AEC pass when the platform supports it, and an always-on text-level echo filter as a cross-platform fallback.
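As a rough illustration of what a text-level echo filter can look like, the sketch below drops a them segment when it closely matches a me segment that started at nearly the same time. This is not Control Center's actual filter: the Segment shape, the 3-second window, the 0.85 similarity threshold, and the choice to keep the mic copy are all assumptions made for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Segment:
    speaker: str   # "me" or "them"
    start_ms: int  # offset from the start of the recording
    text: str


def _normalize(text: str) -> str:
    # Lowercase and replace punctuation with spaces so small decoding
    # differences do not hide a duplicate.
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return " ".join(cleaned.split())


def filter_text_echo(segments, window_ms=3000, threshold=0.85):
    """Drop "them" segments that near-duplicate a nearby "me" segment."""
    mine = [s for s in segments if s.speaker == "me"]
    kept = []
    for seg in segments:
        if seg.speaker == "them":
            is_echo = any(
                abs(seg.start_ms - m.start_ms) <= window_ms
                and SequenceMatcher(
                    None, _normalize(seg.text), _normalize(m.text)
                ).ratio() >= threshold
                for m in mine
            )
            if is_echo:
                continue  # assumed policy: keep the mic copy, drop the echo
        kept.append(seg)
    return kept
```

A filter like this only sees text, so it works the same way on every platform, which is presumably why it can serve as the fallback when signal-level cancellation is unavailable.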

While you record, audio is decoded in rolling windows (cut on a short trailing silence, or at a maximum window length) by a Whisper model running off the UI thread. Silent windows are skipped without decoding. Each window becomes a speaker-tagged transcript segment with millisecond offsets.
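The windowing rule described above can be sketched as follows. The function works on a list of per-frame energy values and returns the frame ranges that would be handed to the decoder. The threshold, the trailing-silence length, and the maximum window length are placeholder values chosen for the example, not the ones Control Center uses.

```python
def cut_windows(energies, silence_threshold=0.01, trailing_silence=3, max_frames=10):
    """Split frames into (start, end) windows, end exclusive.

    A window is closed after `trailing_silence` consecutive quiet frames
    or once it reaches `max_frames`. Windows with no speech are skipped.
    """
    windows = []
    start = 0
    quiet_run = 0
    has_speech = False
    for i, energy in enumerate(energies):
        if energy < silence_threshold:
            quiet_run += 1
        else:
            quiet_run = 0
            has_speech = True
        length = i - start + 1
        if quiet_run >= trailing_silence or length >= max_frames:
            if has_speech:
                windows.append((start, i + 1))
            # Silent windows fall through here without being recorded.
            start, quiet_run, has_speech = i + 1, 0, False
    if has_speech:
        # Flush whatever speech is left when the input ends.
        windows.append((start, len(energies)))
    return windows
```

For example, `cut_windows([0.2, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.4])` returns `[(0, 5), (8, 10)]`: the first window closes on three quiet frames, the all-silent stretch in the middle is skipped, and the final speech is flushed at the end.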

After you stop, diarization runs offline over the recording and splits the remote channel into individual speakers — Person 1, Person 2, and so on — which you can rename. The transcript is rendered as [mm:ss] SPEAKER: text lines.
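A minimal sketch of that rendering, going from a segment's millisecond offset to a [mm:ss] SPEAKER: text line. The function name and the `names` rename map are hypothetical. The sketch also assumes that minutes keep counting past 59 for recordings longer than an hour, which the format above does not specify.

```python
def format_line(offset_ms, speaker, text, names=None):
    """Render one transcript line as "[mm:ss] SPEAKER: text"."""
    minutes, seconds = divmod(offset_ms // 1000, 60)
    # Apply a user rename (for example "Person 1" -> "Dana") if one exists.
    label = (names or {}).get(speaker, speaker)
    return f"[{minutes:02d}:{seconds:02d}] {label}: {text}"
```

With these assumptions, `format_line(65_400, "Person 1", "Hello")` gives `[01:05] Person 1: Hello`, and passing `names={"Person 1": "Dana"}` gives `[01:05] Dana: Hello`.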

When a recording stops, Control Center publishes a MeetingRecordingStopped domain event. A built-in pipeline template — meeting_summary — is triggered by that event. The recorder doesn’t wait on it; the meeting simply transitions through its status lifecycle:

recording → processing → done
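One way to model this lifecycle is a small forward-only state machine. The sketch below is illustrative: the three status values come from the lifecycle above, but the enum, the `advance` helper, and the decision to reject any other transition are assumptions.

```python
from enum import Enum


class MeetingStatus(Enum):
    RECORDING = "recording"
    PROCESSING = "processing"
    DONE = "done"


# The only transitions allowed in this sketch: strictly forward.
_NEXT = {
    MeetingStatus.RECORDING: MeetingStatus.PROCESSING,
    MeetingStatus.PROCESSING: MeetingStatus.DONE,
}


def advance(status: MeetingStatus) -> MeetingStatus:
    """Return the next status, or raise if the meeting is already done."""
    if status not in _NEXT:
        raise ValueError(f"no transition out of {status.value!r}")
    return _NEXT[status]
```

In this model, stopping the recorder would move a meeting from recording to processing, and the pipeline would move it to done when it finishes, so the recorder never has to block on the summary.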

The summary agent receives the title, your rough live notes, and the transcript, and returns structured JSON. The pipeline’s persist steps then write that JSON to discrete rows:

  • enhancedNotes and summary → the meeting’s notes
  • each action item → a MeetingActionItem row (content, owner, optional ticket link)
  • each decision → a MeetingDecision row

Action items and decisions are never parsed out of free-form markdown — only from the agent’s structured arrays. If a run produces no structured output, the persist steps are skipped and the raw transcript is kept as a fallback, so you never lose the record.
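The persist logic can be sketched roughly as below. The enhancedNotes and summary fields and the action-item fields (content, owner, ticket link) follow the description above. The `actionItems`, `decisions`, and `ticket` key names, the assumption that decisions are plain strings, and the shape of the returned dictionary are all guesses made for illustration. A real implementation would write database rows rather than return a dictionary.

```python
import json


def persist_summary(raw_output, transcript):
    """Turn the agent's structured JSON into notes, action items, and decisions.

    If the output is not a JSON object, skip persisting and fall back to the
    raw transcript so the record is never lost.
    """
    try:
        data = json.loads(raw_output)
    except (TypeError, ValueError):
        data = None
    if not isinstance(data, dict):
        return {"notes": transcript, "action_items": [], "decisions": [], "fallback": True}

    notes = {
        "enhancedNotes": data.get("enhancedNotes", ""),
        "summary": data.get("summary", ""),
    }
    # Only the structured arrays are read; the notes text is never scanned
    # for action items or decisions.
    action_items = [
        {"content": item["content"], "owner": item.get("owner"), "ticket": item.get("ticket")}
        for item in (data.get("actionItems") or [])
        if isinstance(item, dict) and item.get("content")
    ]
    decisions = [
        d for d in (data.get("decisions") or [])
        if isinstance(d, str) and d.strip()
    ]
    return {"notes": notes, "action_items": action_items, "decisions": decisions, "fallback": False}
```

Reading only the arrays keeps the behaviour predictable: a bullet that merely looks like a task inside the notes never becomes an action item by accident.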

| Surface | What it shows |
| --- | --- |
| /meetings | The list of meetings, with action-item and decision counts |
| /meetings/record | The live recording HUD: your notes on one side, the streaming transcript on the other |
| /meetings/:meetingId | A meeting’s detail: Notes, Transcript, Action Items, and Decisions tabs |

On-device capture is driver-free on all three desktop platforms (Core Audio taps on macOS, WASAPI loopback on Windows, a PipeWire / PulseAudio monitor on Linux). Whisper and diarization models are provided on-device; when a model isn’t available, the feature degrades rather than failing the recording.