Azure OpenAI
Azure OpenAI is Microsoft's hosted version of OpenAI's GPT-4o family and the newer gpt-5 reasoning models. The models are the same ones OpenAI ships, but the calls run through Microsoft's infrastructure on your organization's Azure subscription — billed alongside other Azure services and governed by Microsoft's enterprise contracts and data privacy commitments rather than OpenAI's consumer terms. If your team already uses Azure for other workloads, an Azure OpenAI resource generally slips through procurement and IT review the same way any other Azure resource would.
At a glance
- Auth modes (two): Resource API key (simpler — paste a key from the portal), or Microsoft Entra ID (uses `az login` — no key to paste, refreshes automatically).
- Model family setting: Standard (`gpt-4o`, `gpt-4`) or Reasoning (`gpt-5`, o-series). Picked once in Settings to tell Parleq which request shape to send — Azure routes by deployment name, so the model itself is configured in the Azure portal, not Parleq.
- Region: Model availability varies by region. `eastus2`, `swedencentral`, and `westus3` tend to carry the broadest selection.
- IAM role for Entra ID auth: `Cognitive Services OpenAI User` on the resource (data-plane role — Owner / Contributor on the subscription is not sufficient).
- Routing quirk: Azure routes by deployment name, not model name. Pick a deployment name when you deploy the model in the portal; that's what Parleq calls.
Before you start
1. Create an Azure OpenAI (or AI Foundry) resource. Go to Azure portal → Create an Azure OpenAI resource. Pick a subscription, resource group, and a region — model availability varies by region, so check the model availability matrix first. `eastus2`, `swedencentral`, and `westus3` tend to carry the broadest selection. New resources use the AI Foundry hostname pattern `*.cognitiveservices.azure.com`; older Azure-OpenAI-specific resources use `*.openai.azure.com`. Parleq accepts either — see the Resource field section below.
2. Deploy a model under the resource. In Azure AI Foundry (or "Azure OpenAI Studio" on older resources): Deployments → Deploy model. Recommended starting point is `gpt-4o-mini` for low latency and predictable behavior; `gpt-5-mini` works too if you have access to the reasoning family in your subscription. The deployment name is what Parleq calls. Azure's API routes requests by deployment name, not model name. You'll typically name the deployment after the model (e.g. `gpt-4o-mini`), but anything works — just paste the deployment name into Parleq's Deployment field. New subscriptions sometimes have zero quota for the model you want — the deployment dialog will say "no capacity in this region." Either pick a different region (revisit step 1), or request a quota increase from Quotas in the Azure portal.
Mode 1: Resource API key
Simplest path. Each Azure OpenAI resource exposes two interchangeable API keys (Key 1 and Key 2 — for rotation). Parleq sends whichever you pasted in the `api-key` header.
Setup:
1. On your resource's portal page → Keys and Endpoint. Copy Key 1 and the Endpoint.
2. In Parleq's Setup Wizard, pick Azure OpenAI, leave Auth mode on API key, fill in:
   - Resource: see the next section.
   - Deployment: the deployment name you picked in step 2 above.
   - API Key: paste Key 1.
3. Continue → Finish & Restart.
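On the wire, key-mode requests look roughly like this — a sketch with hypothetical resource and deployment names. The exact path Parleq builds may differ, but the deployment name always appears in the URL and the key travels in the `api-key` header:

```shell
# Hypothetical values — substitute your own.
RESOURCE="my-openai"             # bare resource name (classic *.openai.azure.com)
DEPLOYMENT="gpt-4o-mini"         # deployment name, not model name
API_VERSION="2025-04-01-preview"

# Azure routes by deployment name: it lives in the URL path.
URL="https://${RESOURCE}.openai.azure.com/openai/deployments/${DEPLOYMENT}/chat/completions?api-version=${API_VERSION}"
echo "$URL"

# A real call adds the key header (requires a live resource):
# curl "$URL" -H "api-key: $AZURE_OPENAI_KEY" -H "Content-Type: application/json" \
#   -d '{"messages":[{"role":"user","content":"hi"}]}'
```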
Mode 2: Microsoft Entra ID (az login)
Parleq mints a short-lived OAuth bearer token from your `az login` session and sends it as `Authorization: Bearer <token>`. No API key to paste, no secrets in the Keychain — auth flows entirely through your existing Azure CLI session and refreshes via `az account get-access-token`.
Setup:
```shell
# Install the Azure CLI if you haven't:
brew install azure-cli

# Sign in (opens browser):
az login

# If you have multiple subscriptions, set the right one as default:
az account set --subscription "My Team Subscription"
```

Critical IAM step. Owner / Contributor on the subscription is not enough — those are control-plane roles, and Azure OpenAI requires the data-plane role Cognitive Services OpenAI User (`a97b65f3-24c7-4388-baec-2e87135dc908`) on the resource itself.
Grant it: Resource → Access control (IAM) → Add → Add role assignment → Cognitive Services OpenAI User → assign to your user. Wait ~1 minute for the assignment to propagate before testing.
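The same grant can be scripted. A sketch with placeholder subscription, group, and resource names — the commented `az role assignment create` is the CLI equivalent of the portal steps:

```shell
# Placeholder identifiers — substitute your own.
SUBSCRIPTION="00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP="my-rg"
RESOURCE="my-openai"

# The role must land on the resource itself (data-plane scope),
# not the resource group or subscription.
SCOPE="/subscriptions/${SUBSCRIPTION}/resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.CognitiveServices/accounts/${RESOURCE}"
echo "$SCOPE"

# With a live az login session:
# az role assignment create --role "Cognitive Services OpenAI User" \
#   --assignee "you@example.com" --scope "$SCOPE"
```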
Then in Parleq's Setup Wizard, pick Azure OpenAI, switch Auth mode to Microsoft Entra ID, fill in Resource + Deployment, then Finish & Restart. No API key field appears in this mode.
Parleq augments PATH at process launch to include `/opt/homebrew/bin` and `/usr/local/bin`, so a Homebrew-installed `az` is reachable even when launched from Spotlight or as a login item.
The Resource field
Parleq accepts three input shapes in the Resource field, so you can paste whatever the Azure portal shows you:
- Bare resource name — `my-openai`. Parleq builds the URL as `my-openai.openai.azure.com`. Use this for classic Azure OpenAI resources.
- Full hostname — `my-openai.cognitiveservices.azure.com` or `my-openai.services.ai.azure.com`. Use this for newer AI Foundry resources where the hostname uses a different domain.
- Full URL — `https://my-openai.cognitiveservices.azure.com/openai/responses?api-version=...`. Parleq strips the scheme and path and uses just the hostname. Convenient when you copy-paste the full Endpoint URL from the portal.
Detection rule: if the value contains a dot, it's treated as a hostname; otherwise as a bare resource name. Azure resource names are alphanumeric + hyphens, no dots, so this is unambiguous.
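The rule can be sketched as a small shell helper (hypothetical — not Parleq's actual code):

```shell
# Normalize any of the three accepted shapes down to a hostname.
resolve_host() {
  local v="$1"
  v="${v#http://}"; v="${v#https://}"    # strip scheme if a full URL was pasted
  v="${v%%/*}"                           # strip any path/query, keeping the hostname
  case "$v" in
    *.*) echo "$v" ;;                    # contains a dot → already a hostname
    *)   echo "${v}.openai.azure.com" ;; # bare resource name → classic hostname
  esac
}

resolve_host "my-openai"                              # → my-openai.openai.azure.com
resolve_host "my-openai.cognitiveservices.azure.com"  # → my-openai.cognitiveservices.azure.com
resolve_host "https://my-openai.cognitiveservices.azure.com/openai/responses?api-version=preview"
# → my-openai.cognitiveservices.azure.com
```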
Model family (and the API version)
Azure's API routes calls by deployment name, not model name — there is no per-call model identifier on the wire, only the deployment you pick in the URL. So Parleq's Settings doesn't ask for a model name; instead, it asks for a Model family:
- Standard — pick this when your deployment serves `gpt-4o-mini`, `gpt-4o`, `gpt-4`, `gpt-3.5-turbo`, or any non-reasoning model. Parleq sends `max_tokens` + an explicit temperature.
- Reasoning — pick this when your deployment serves `gpt-5`, `gpt-5-mini`, or any o-series model (`o1`, `o3`, `o4`, `o5`). These models reject `max_tokens` in favor of `max_completion_tokens` and refuse any temperature other than 1.0, so Parleq has to send a different request shape.
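Concretely, the two request bodies differ in the token-limit field and the presence of temperature — illustrative JSON, not Parleq's exact payload:

```shell
# Standard deployments: max_tokens plus an explicit temperature.
standard='{"messages":[{"role":"user","content":"..."}],"max_tokens":256,"temperature":0.2}'

# Reasoning deployments: max_completion_tokens, with temperature omitted
# (anything other than the default 1.0 is rejected).
reasoning='{"messages":[{"role":"user","content":"..."}],"max_completion_tokens":256}'

echo "$standard"
echo "$reasoning"
```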
Default is Standard. If you mismatch (e.g., picked Standard but deployed a gpt-5 model), Azure responds with a clear 400 error message that names the offending parameter — flip the family in Settings and dictate again.
API version. Defaults to 2025-04-01-preview, which supports gpt-5-series models. If your deployment requires a different version, override in Settings.
Reasoning models are noticeably slower than the gpt-4o family — they think before responding. Worth it if you've tuned the cleanup style toward gpt-5; otherwise the gpt-4o-mini path wins on latency.
Pricing & quotas
Per-token pricing per Azure's published rates, plus regional capacity (TPM — tokens per minute) granted per deployment. Cleanup traffic is short-prompt + short-response, so single-user dictation use is well below default TPM caps.
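A back-of-envelope check of that claim, with hypothetical per-dictation numbers:

```shell
# Hypothetical sizes — tune to your own prompt and typical utterance length.
PROMPT_TOKENS=400        # cleanup instructions + transcript
RESPONSE_TOKENS=200      # cleaned-up text back
PER_DICTATION=$((PROMPT_TOKENS + RESPONSE_TOKENS))

# Even at an aggressive 4 dictations per minute:
PER_MINUTE=$((PER_DICTATION * 4))
echo "${PER_MINUTE} tokens/min"   # 2400 — far below typical default TPM grants
```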
Charges land on the Azure subscription that owns the resource, alongside everything else your org bills there.
Cleaning up after testing. Idle Azure OpenAI resources don't incur compute charges — there's no provisioned capacity to pay for unless you've explicitly purchased it. You can leave the resource in place between sessions; if you want to remove it entirely, delete the deployment first, then the resource, both from the Azure portal.
Compliance notes
Azure OpenAI traffic is governed by your Microsoft Customer Agreement / EA terms, not OpenAI's. Per Microsoft's documented data privacy commitments, prompts and completions are not used to train or improve OpenAI's or Microsoft's foundation models. Data stays in the Azure region you deployed to. Activity logs (Azure Monitor → Diagnostic settings on the resource) capture call metadata for audit.
Audio never reaches Azure. Only the text transcript output by Parleq's on-device speech model is sent.
Troubleshooting
404 Resource not found on first dictation
The URL Parleq built doesn't match a real deployment. Check three things in order: (1) the Resource field — for an AI Foundry resource, paste the full hostname (`*.cognitiveservices.azure.com`) rather than the bare name; (2) the Deployment name — it's case-sensitive and has to match exactly; (3) the API version — bump to `2025-04-01-preview` or later if you're calling gpt-5-series.
401 PermissionDenied in Entra ID mode
Your az login identity doesn't have the data-plane role. Even Owner on the resource isn't enough — Azure OpenAI specifically requires Cognitive Services OpenAI User. Grant it on the resource (not the resource group) and wait ~1 minute for propagation.
400 max_tokens not supported / "use 'max_completion_tokens'"
Family mismatch. Your deployment serves a reasoning model but Parleq is configured as Standard. Open Settings → LLM → switch Model family to Reasoning (gpt-5, o-series), then dictate again.
400 unsupported parameter: 'max_completion_tokens'
The reverse mismatch: Parleq is in Reasoning mode but your deployment serves a standard model. Switch Model family back to Standard and try again.
az: command not found
Parleq couldn't find az on PATH. Most likely cause: you installed the Azure CLI somewhere outside /opt/homebrew/bin / /usr/local/bin. Symlink it into /usr/local/bin, or switch to API-key mode.
429 Too Many Requests
You've hit the deployment's TPM quota. Either wait for the per-minute window to reset, or in the Azure portal increase the deployment's TPM allocation under Deployments → your deployment → Edit.