Google Vertex AI
Vertex AI is Google's enterprise-flavored access to its Gemini models. The models themselves are identical to what the direct Google AI API serves — same Gemini 2.5 Flash, same response quality — but the calls run through your organization's Google Cloud (GCP) project instead of through a personal AI Studio account. That brings with it the IAM controls, audit logs, regional deployment guarantees, and contract terms most companies require to actually use an AI service for work. If your team already uses GCP, Vertex is usually the right answer.
At a glance
- Auth modes (two): gcloud Application Default Credentials (uses your existing CLI session; easiest on developer machines), or service-account JSON (a dedicated identity for Parleq, no `gcloud` install required).
- Default model: `gemini-2.5-flash`, the same model the direct Gemini API serves.
- Region: pick the closest one offering the model. `us-central1` carries every Gemini variant; `europe-west4` is a common EU choice.
- IAM role required: Vertex AI User (`roles/aiplatform.user`) on the project, granted to your user (ADC mode) or to the service account (JSON mode).
- Compliance: Vertex traffic is governed by your GCP organization's terms. Prompts and completions are not used for model training, and Cloud Audit Logging captures every call.
Before you start
1. Have a GCP project with the Vertex AI API enabled. In Cloud Console → APIs & Services → Vertex AI API, click Enable. Note the Project ID (not the project name; the ID looks like `my-team-prod-1234`). Parleq needs it.
2. Pick a region with Gemini 2.5 Flash available. `us-central1` (Parleq's default) carries every Gemini variant; `europe-west4` is a common EU choice. Pick the region closest to you for latency. The region you pick goes into the request URL twice (subdomain + path), so the model has to be available in that exact region, not a multi-region.
3. Decide who's calling: you or a service account. gcloud ADC is right when Parleq runs on your developer machine and you want calls billed and audit-logged as you. Service-account JSON is right when you can't (or don't want to) install gcloud locally, when you want a dedicated identity for Parleq's calls, or when your org policy prefers service accounts for tool-driven traffic.
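To see why the region matters twice, here is a sketch of the endpoint shape Vertex uses for Gemini publisher models (the project ID and model name below are illustrative placeholders):

```python
def vertex_stream_url(project: str, region: str, model: str) -> str:
    """Build the Vertex streaming endpoint URL.

    The region appears twice: once in the hostname, once in the
    resource path. Both must name the same single region, and the
    model must be available there.
    """
    return (
        f"https://{region}-aiplatform.googleapis.com/v1"
        f"/projects/{project}/locations/{region}"
        f"/publishers/google/models/{model}:streamGenerateContent"
    )
```

A multi-region value like `us` would produce a hostname that doesn't resolve, which is why a single concrete region is required.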
Mode 1: gcloud Application Default Credentials
Parleq shells out to `gcloud auth application-default print-access-token` on each token refresh, getting back a fresh OAuth bearer token minted from your existing CLI session. This is the same auth path you already use for any other GCP-touching tool on your machine.
First-time setup (one-shot):

```shell
# Install gcloud if you haven't:
brew install --cask google-cloud-sdk

# Sign in (opens browser):
gcloud auth application-default login

# Tell ADC which project to bill the calls against:
gcloud auth application-default set-quota-project my-team-prod-1234
```

In Parleq's Setup Wizard, pick Google Vertex AI, leave Auth mode on gcloud (ADC), fill in Project + Region, click Continue, then Finish & Restart.
Parleq augments PATH at process launch to include /opt/homebrew/bin and /usr/local/bin, so a Homebrew-installed gcloud is found even when launched from Spotlight, the Dock, or as a login item — paths that launchd doesn't normally include.
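The two behaviors above (PATH augmentation, then shelling out for a token) can be sketched in Python; Parleq itself is presumably not Python, and the function names are illustrative:

```python
import os
import subprocess

# Homebrew bin directories that launchd-launched apps don't see by default.
BREW_PATHS = ["/opt/homebrew/bin", "/usr/local/bin"]


def augmented_path(current: str) -> str:
    """Append the Homebrew bin directories to PATH, skipping any already present."""
    parts = current.split(os.pathsep) if current else []
    for p in BREW_PATHS:
        if p not in parts:
            parts.append(p)
    return os.pathsep.join(parts)


def adc_access_token() -> str:
    """Shell out to gcloud for a fresh OAuth bearer token.

    Requires gcloud installed and `gcloud auth application-default login`
    already run; raises CalledProcessError otherwise.
    """
    env = dict(os.environ, PATH=augmented_path(os.environ.get("PATH", "")))
    result = subprocess.run(
        ["gcloud", "auth", "application-default", "print-access-token"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

The augmentation is additive and deduplicated, so a shell-launched Parleq with a normal PATH behaves identically to one launched from the Dock.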
Mode 2: Service-account JSON
Parleq mints OAuth bearer tokens on its own by signing a JWT with the service account's RSA private key and exchanging it at Google's token endpoint. No gcloud install needed — useful on locked-down workstations where you can't install dev tools, or for users on personal Macs who want a dedicated GCP identity for Parleq.
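The claim set of that JWT follows Google's standard service-account OAuth flow. A minimal sketch of the claims is below; the actual RS256 signing with `private_key` and the POST of the signed assertion to the token endpoint need a crypto library and are omitted:

```python
import time

TOKEN_URL = "https://oauth2.googleapis.com/token"
SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def jwt_claims(client_email, now=None, lifetime=3600):
    """Claim set for the signed-JWT grant.

    The service account is both issuer and subject, the audience is
    Google's token endpoint, and the assertion may live at most an hour.
    """
    now = int(time.time()) if now is None else now
    return {
        "iss": client_email,
        "sub": client_email,
        "aud": TOKEN_URL,
        "scope": SCOPE,
        "iat": now,
        "exp": now + lifetime,
    }
```

The signed assertion is then exchanged at `TOKEN_URL` with `grant_type=urn:ietf:params:oauth:grant-type:jwt-bearer`, yielding a short-lived bearer token used exactly like the one gcloud prints in ADC mode.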
Setup:
1. In Cloud Console → IAM & Admin → Service Accounts, create a service account named something like `parleq-cleanup`.
2. On the new service account: Permissions → grant the Vertex AI User role (`roles/aiplatform.user`) on this project. That's the minimum role for calling `streamGenerateContent` on Gemini publisher models.
3. On the service account's Keys tab: Add Key → Create new key → JSON. The browser downloads a JSON file. Open it in a text editor and copy the entire contents.
4. In Parleq's Setup Wizard, pick Google Vertex AI, switch Auth mode to Service account JSON, fill in Project + Region, paste the full JSON into the multi-line field, then Finish & Restart.
The JSON contents are stored in the macOS Keychain — never in ~/.parleq/config.json. Parleq only reads two fields from it: client_email and private_key.
Service-account keys never expire on their own — they're valid until you delete or disable them. Rotate by creating a new key, pasting it into Parleq's Settings (replaces the old value in the Keychain), and deleting the old key from the GCP console.
Choosing a model
gemini-2.5-flash is the default and is the right choice for almost every dictation use case — same model as the direct Gemini API, just running on your GCP project. Cleanup is a low-stakes, low-token task; Flash matches Pro for it at a fraction of the latency.
To pick a different model, open Settings… → LLM. The Model dropdown carries the recommended Gemini variants — Flash (default), Flash-Lite (faster + cheaper), and Pro (long, technical transcripts). Pick Custom (enter below) to type any other Vertex-supported model identifier, e.g. a regional preview model or a publisher model your project has access to. Make sure the chosen model is offered in the configured region.
Power-user note: every Settings field maps to a key in ~/.parleq/config.json and can also be edited there directly. Settings is the recommended path for most users.
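As an illustration only, a Vertex configuration in `~/.parleq/config.json` might look like the fragment below. The key names here are hypothetical; the authoritative names are whatever the Settings pane writes, so edit an existing file rather than typing these from scratch. Note the service-account JSON never appears here (it lives in the Keychain):

```json
{
  "llm": {
    "provider": "vertex",
    "project": "my-team-prod-1234",
    "region": "us-central1",
    "model": "gemini-2.5-flash"
  }
}
```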
Pricing & quotas
Vertex pricing matches the direct Gemini API at the per-token level. The difference is billing: charges land on the GCP project's billing account, with full Cloud Billing reporting and BigQuery export, rather than on a personal AI Studio account.
Vertex enforces per-region, per-model QPM (queries-per-minute) quotas separately from the direct API. Default quotas are generous — single-user dictation use will not hit them. If you do, request a quota increase in Cloud Console → Quotas & System Limits.
Compliance notes
Vertex traffic is governed by your GCP organization's terms of service. Per Google's Cloud Service Terms, prompts and completions on Vertex are not used to train Google models. Audit logs (Cloud Audit Logging → aiplatform.googleapis.com) capture every call with caller identity, timestamp, and resource. VPC Service Controls work normally if you've already perimeterized your GCP environment.
Audio never reaches Google. Only the text transcript output by Parleq's on-device speech model is sent.
If you're choosing between direct Gemini and Vertex: pick Vertex when you need IAM roles, audit logs, data-residency commitments, or your org has already invested in GCP. Pick the direct API when you want zero setup overhead and the consumer terms are acceptable.
Troubleshooting
PERMISSION_DENIED — caller does not have aiplatform.endpoints.predict
The identity Parleq is using doesn't have the Vertex AI User role on the configured project. For ADC, this is your identity — check the IAM page for your user. For service-account auth, it's the SA's email — make sure the role was granted on the right project.
FAILED_PRECONDITION — Vertex AI API not enabled
Enable the API on the project at Cloud Console → API Library.
"Could not obtain an access token" / gcloud: command not found
Parleq couldn't find gcloud on PATH. Most likely cause: gcloud is installed somewhere other than /opt/homebrew/bin or /usr/local/bin (e.g., a custom directory referenced only from your shell rc, which GUI-launched apps never read). Either symlink the binary into /usr/local/bin, or switch to service-account JSON auth, which doesn't need gcloud at all.
"That doesn't look like a service account JSON"
Parleq validates the pasted JSON against the SA-key shape (must have type, client_email, private_key). If you accidentally pasted an OAuth client JSON or a Workload Identity config, this fires. Re-download the SA key from IAM → Service Accounts → your SA → Keys → Add Key → JSON.
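The shape check can be sketched as follows; this is an approximation of the validation described above, not Parleq's actual code (the extra `type == "service_account"` check is an assumption that conveniently rejects OAuth client and Workload Identity configs, which carry different `type` values or none at all):

```python
import json

# Fields a service-account key JSON must carry.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def looks_like_sa_key(pasted: str) -> bool:
    """True only if the pasted text parses as JSON and has the SA-key shape."""
    try:
        doc = json.loads(pasted)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(doc, dict)
        and all(field in doc for field in REQUIRED_FIELDS)
        and doc.get("type") == "service_account"
    )
```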