How it works
When you hold the hotkey, four stages run in sequence. Three of them stay on your Mac. One — LLM cleanup — talks to a cloud provider you configure (or skip entirely). This page walks through each stage and notes what gets persisted versus what doesn't.
Example flow

Audio: ~3 s WAV, 16 kHz mono
→ Transcript: "lets go to the beach this weekend"
→ Cleaned: "Let's go to the beach this weekend."
→ Pasted into the focused app
Stage 1: Capture
AVAudioEngine taps the system microphone at the hardware sample rate and converts each buffer in-line to 16 kHz mono Int16 — the format the local ASR model expects. Audio accumulates as raw samples in process memory; nothing is ever written to disk.
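The conversion step can be sketched as follows. This is a simplified stand-in that assumes the hardware rate is an integer multiple of 16 kHz and uses naive averaging; the function name is illustrative, and the app's real tap does its conversion inside the AVAudioEngine buffer callback.

```swift
import Foundation

// Sketch only: naive hardware-rate -> 16 kHz mono conversion, assuming the
// hardware rate divides evenly by 16 kHz. `downsampleTo16k` is an
// illustrative name, not Parleq's actual API.
func downsampleTo16k(_ samples: [Float], hardwareRate: Int) -> [Int16] {
    let factor = hardwareRate / 16_000          // e.g. 48_000 / 16_000 = 3
    var out: [Int16] = []
    out.reserveCapacity(samples.count / factor)
    var i = 0
    while i + factor <= samples.count {
        // Average each group of `factor` samples (crude low-pass + decimate).
        let avg = samples[i..<(i + factor)].reduce(0, +) / Float(factor)
        // Clamp to [-1, 1] and scale to the Int16 range.
        let clamped = max(-1.0, min(1.0, avg))
        out.append(Int16(clamped * 32767))
        i += factor
    }
    return out
}

let buffer = [Float](repeating: 0.5, count: 4_800)   // 100 ms at 48 kHz
let pcm16 = downsampleTo16k(buffer, hardwareRate: 48_000)
print(pcm16.count)   // 1600 samples = 100 ms at 16 kHz
```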
Stage 2: Transcribe
WAV bytes are POSTed to a sidecar process that runs Parakeet TDT v3 (CoreML) on the Apple Neural Engine. Typical latency is ~64 ms for a 5-second clip once the model is warm. The model is ~150 MB and is downloaded from Hugging Face on first launch.
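Since the sidecar expects WAV bytes, the raw samples need a RIFF header before the POST. A minimal sketch of that wrapping — the function name is an assumption, but the 44-byte PCM header layout it writes is the standard one:

```swift
import Foundation

// Sketch: wrap raw 16 kHz mono Int16 samples in a minimal 44-byte WAV
// (RIFF) header. `wavData` is an illustrative name.
func wavData(from samples: [Int16], sampleRate: UInt32 = 16_000) -> Data {
    let dataSize = UInt32(samples.count * 2)        // 2 bytes per Int16 sample
    let byteRate = sampleRate * 2                   // mono * 16-bit
    var d = Data()
    func put<T>(_ value: T) { withUnsafeBytes(of: value) { d.append(contentsOf: $0) } }
    d.append(contentsOf: Array("RIFF".utf8)); put((36 + dataSize).littleEndian)
    d.append(contentsOf: Array("WAVE".utf8))
    d.append(contentsOf: Array("fmt ".utf8)); put(UInt32(16).littleEndian)
    put(UInt16(1).littleEndian)                     // audio format: PCM
    put(UInt16(1).littleEndian)                     // channels: mono
    put(sampleRate.littleEndian)
    put(byteRate.littleEndian)
    put(UInt16(2).littleEndian)                     // block align
    put(UInt16(16).littleEndian)                    // bits per sample
    d.append(contentsOf: Array("data".utf8)); put(dataSize.littleEndian)
    for s in samples { put(s.littleEndian) }
    return d
}

let clip = wavData(from: [Int16](repeating: 0, count: 16_000))  // 1 s of silence
print(clip.count)   // 44-byte header + 32_000 data bytes = 32_044
```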
Stage 3: Cleanup
The raw transcript streams to a configurable AI provider that lightly cleans the text: capitalization, punctuation, filler-word removal, common transcription errors, and spoken numbers turned into digits when the context is technical. Cleaned words stream into the overlay as they arrive.
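A request to such a provider might look like the following sketch, assuming an OpenAI-compatible chat-completions API. The model name, prompt wording, and field names are illustrative, not Parleq's actual provider code:

```swift
import Foundation

// Sketch of a streaming cleanup request body. "gpt-4o-mini", the system
// prompt text, and the JSON shape are placeholder assumptions for a
// generic OpenAI-style provider.
let body: [String: Any] = [
    "model": "gpt-4o-mini",                      // placeholder model name
    "stream": true,                              // tokens arrive incrementally
    "messages": [
        ["role": "system",
         "content": "Clean up this transcript: fix capitalization and punctuation, drop filler words."],
        ["role": "user",
         "content": "lets go to the beach this weekend"]
    ]
]
let json = try! JSONSerialization.data(withJSONObject: body)
```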
Stage 4: Paste
On accept (auto-timer or manual hotkey tap), the cleaned text pastes into whatever app was focused when you originally pressed the hotkey — not whatever happens to be focused at accept time. CGEventTap synthesizes the keystrokes; the trailing-space heuristic adds a space after the pasted text by default.
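The trailing-space heuristic might look like this sketch. The function name and the ends-in-whitespace guard are assumptions; the source only states that a space is appended by default:

```swift
import Foundation

// Sketch of paste-time text shaping: append a trailing space unless the
// cleaned text already ends in whitespace. `textToPaste` and the guard
// are illustrative assumptions.
func textToPaste(_ cleaned: String, trailingSpace: Bool = true) -> String {
    guard trailingSpace else { return cleaned }
    if let last = cleaned.unicodeScalars.last,
       CharacterSet.whitespacesAndNewlines.contains(last) {
        return cleaned
    }
    return cleaned + " "
}

print(textToPaste("Let's go to the beach this weekend."))
// pastes with a trailing space appended
```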
The loop
Stages 3 and 4 aren't a one-shot pipeline. While the overlay is open, each subsequent hotkey press re-runs stage 3 with a different system prompt — a refine prompt that takes the existing text plus your new utterance as an edit instruction and produces the smallest change that fully accomplishes it. Stage 4 fires only when you tap to accept.
It's a voice undo/redo loop. The visible text is the working state; each press replaces it with the LLM's edit. Tone, format, length, structure — anything you can describe in a sentence is a valid edit.
After stage 3 (initial cleanup):
"Yeah, we should probably move the meeting to next Thursday because too many people are out this week."

You press ⌥ and say "make it more professional". Refine pass (same LLM, different prompt):
"I'd like to move our meeting to next Thursday — too many team members are out this week."

You press ⌥ and say "shorter, end with a question mark":
"Could we move our meeting to next Thursday?"

You press ⌥ and say "add 'attendance has been thin lately' as the reason":
"Could we move our meeting to next Thursday? Attendance has been thin lately."

↳ tap ⌥ — stage 4 runs, text pastes
Implementation note: the refine prompt lives in SystemPrompts.swift alongside the cleanup prompt. Both run through the same streaming LLM provider; the only differences are the system prompt text and the user-message shape (refine includes the prior text + edit instruction in a single user message).
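The two message shapes can be sketched like this. The Message type, the prompt placeholders, and the exact wrapper text around the prior text and edit instruction are illustrative assumptions; only the shapes themselves come from the note above:

```swift
import Foundation

// Sketch of the two stage-3 message shapes. `Message` and the wrapper
// strings are stand-ins for what lives in SystemPrompts.swift.
struct Message { let role: String; let content: String }

func cleanupMessages(transcript: String, systemPrompt: String) -> [Message] {
    [Message(role: "system", content: systemPrompt),
     Message(role: "user", content: transcript)]
}

func refineMessages(priorText: String, instruction: String, systemPrompt: String) -> [Message] {
    // Refine packs prior text + edit instruction into a single user message.
    [Message(role: "system", content: systemPrompt),
     Message(role: "user",
             content: "Current text:\n\(priorText)\n\nEdit instruction:\n\(instruction)")]
}

let msgs = refineMessages(priorText: "Could we move our meeting to next Thursday?",
                          instruction: "add 'attendance has been thin lately' as the reason",
                          systemPrompt: "<refine prompt>")
print(msgs.count)   // 2: one system message, one combined user message
```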
Privacy posture
- ~/.parleq/config.json — settings + custom dictionary, user-authored.
- ~/.parleq/usage.jsonl — one line per LLM call: timestamp, model, token counts, latency. Metadata only.
- ~/.parleq/app.log — diagnostic log. ASR/LLM metrics are length-only ("post-utterance 87 ms, 142 chars / 28 words"); never the transcript itself.
- /tmp/parleq-*.wav.
- ~/.aws/sso/cache/, ~/.config/gcloud/, and ~/.azure/. Parleq stores no long-lived session tokens directly.

Read the full enterprise-review packet in SECURITY_REVIEW.md.