Gemini is Google’s family of multimodal AI models. With the Gemini Speech-to-Speech (STS) integration, Agent Voice Response (AVR) can handle conversations where speech input is directly transformed into speech output, without requiring separate ASR (speech-to-text) and TTS (text-to-speech) components.
This approach significantly reduces latency and enables more natural, human-like voice interactions.
avr-sts-gemini 1.5.0+ supports two ways to authenticate with Gemini Live. Choose one mode per deployment — do not set an API key and Vertex flags together unless you intend Vertex mode (Vertex takes precedence when GOOGLE_GENAI_USE_VERTEXAI=true).
| Mode | Best for | Credentials |
|---|---|---|
| Google AI Studio (default) | Quick setup, local dev | API key (GEMINI_API_KEY or GOOGLE_API_KEY) |
| Vertex AI (Google Cloud Console) | GCP projects, service accounts, enterprise | Application Default Credentials (ADC) + project/region |
Connector reference: https://github.com/agentvoiceresponse/avr-sts-gemini
- Set GEMINI_API_KEY (or GOOGLE_API_KEY) in your environment

No Google Cloud project is required for this path.
Use Vertex when Gemini is enabled in Google Cloud Console instead of AI Studio.
- Region (for example us-central1): set GOOGLE_CLOUD_LOCATION to match
- ADC from a key file: GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json (mount the file into the container)
- ADC from a local login: gcloud auth application-default login

Optional AVR-prefixed aliases: GEMINI_USE_VERTEXAI, GEMINI_VERTEX_PROJECT, GEMINI_VERTEX_LOCATION.
| Variable | Description | Example Value |
|---|---|---|
| PORT | Port on which the Gemini STS service runs | 6037 |
| GEMINI_MODEL | Gemini model ID (Live native audio) | gemini-2.5-flash-native-audio-preview-12-2025 |
| GEMINI_API_VERSION | Optional API version override | (unset) |
| GEMINI_INSTRUCTIONS | System prompt for the voice assistant | "You are a helpful assistant." |
| GEMINI_URL_INSTRUCTIONS | URL to fetch dynamic instructions | https://your-api.com/instructions |
| GEMINI_FILE_INSTRUCTIONS | Path to local instruction file | ./instructions.txt |
| Variable | Required | Description | Example Value |
|---|---|---|---|
| GEMINI_API_KEY | Yes* | API key from Google AI Studio | AIza... |
| GOOGLE_API_KEY | Yes* | Alternative name (Google GenAI SDK) | AIza... |
* One of GEMINI_API_KEY or GOOGLE_API_KEY is required. Do not set GOOGLE_GENAI_USE_VERTEXAI=true for AI Studio mode.
| Variable | Required | Description | Example Value |
|---|---|---|---|
| GOOGLE_GENAI_USE_VERTEXAI | Yes | Enable Vertex AI mode | true |
| GOOGLE_CLOUD_PROJECT | Yes | GCP project ID | my-avr-project |
| GOOGLE_CLOUD_LOCATION | Yes | Vertex region | us-central1 |
| GOOGLE_APPLICATION_CREDENTIALS | Yes† | Path to service account JSON (in container) | /secrets/gcp-sa.json |
† Required in Docker/production unless the host provides ADC another way (e.g. GCE metadata).
Aliases: GEMINI_USE_VERTEXAI, GEMINI_VERTEX_PROJECT, GEMINI_VERTEX_LOCATION.
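The mode selection described above (Vertex wins when GOOGLE_GENAI_USE_VERTEXAI=true, otherwise an API key is expected) can be sketched as a small pre-flight check. This is an illustrative sketch, not the connector's code: the function names `first_set` and `resolve_auth` are hypothetical, and two details are assumptions: that the Google-named variables are consulted before the AVR-prefixed aliases, and that the flag is compared case-insensitively against `true`.

```python
from typing import Dict, Mapping


def first_set(env: Mapping[str, str], *names: str) -> str:
    """Return the first non-empty value among the given variable names."""
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return value
    return ""


def resolve_auth(env: Mapping[str, str]) -> Dict[str, object]:
    """Pick Vertex AI or AI Studio mode and list any missing settings."""
    # Assumption: Google-named variables are checked before the AVR aliases.
    flag = first_set(env, "GOOGLE_GENAI_USE_VERTEXAI", "GEMINI_USE_VERTEXAI")
    if flag.lower() == "true":
        project = first_set(env, "GOOGLE_CLOUD_PROJECT", "GEMINI_VERTEX_PROJECT")
        location = first_set(env, "GOOGLE_CLOUD_LOCATION", "GEMINI_VERTEX_LOCATION")
        missing = [
            label
            for label, value in (("project", project), ("location", location))
            if not value
        ]
        return {"mode": "vertex", "project": project,
                "location": location, "missing": missing}

    # AI Studio mode: either key name is accepted.
    api_key = first_set(env, "GEMINI_API_KEY", "GOOGLE_API_KEY")
    return {"mode": "ai_studio", "api_key": api_key,
            "missing": [] if api_key else ["API key"]}
```

Calling `resolve_auth(os.environ)` before starting the container would surface a missing project, location, or API key early, which mirrors the kinds of problems the tables above are meant to prevent.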
We’ve added support for the following Gemini settings:
- GEMINI_THINKING_LEVEL=MINIMAL
- GEMINI_THINKING_BUDGET=0

More details here 👉 https://ai.google.dev/gemini-api/docs/thinking?hl=en

Supported values for GEMINI_THINKING_LEVEL:

- THINKING_LEVEL_UNSPECIFIED
- LOW
- MEDIUM
- HIGH
- MINIMAL

GEMINI_THINKING_BUDGET:

- 0 → turn off thinking
- -1 → enable dynamic thinking

Pin the image tag in production (for example 1.5.0). :latest tracks the newest release.
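For the thinking settings listed earlier, a quick way to catch typos before deployment is to validate the two variables against the documented values. The sketch below is only illustrative: `parse_thinking` and the result keys `thinking_level` and `thinking_budget` are hypothetical names, exact (case-sensitive) matching of the level is an assumption, and budgets other than 0 and -1 are passed through as plain integers without any claim about how the connector treats them.

```python
from typing import Dict, Mapping, Union

# Values documented for GEMINI_THINKING_LEVEL.
THINKING_LEVELS = {
    "THINKING_LEVEL_UNSPECIFIED", "LOW", "MEDIUM", "HIGH", "MINIMAL",
}


def parse_thinking(env: Mapping[str, str]) -> Dict[str, Union[str, int]]:
    """Validate the thinking variables; unset ones are left out of the result."""
    settings: Dict[str, Union[str, int]] = {}

    level = env.get("GEMINI_THINKING_LEVEL", "").strip()
    if level:
        if level not in THINKING_LEVELS:
            raise ValueError(f"Unsupported GEMINI_THINKING_LEVEL: {level!r}")
        settings["thinking_level"] = level

    budget = env.get("GEMINI_THINKING_BUDGET", "").strip()
    if budget:
        # int() raises ValueError for non-numeric input such as "off".
        settings["thinking_budget"] = int(budget)

    return settings
```

With the defaults shown in this section, `parse_thinking({"GEMINI_THINKING_LEVEL": "MINIMAL", "GEMINI_THINKING_BUDGET": "0"})` yields a level of `MINIMAL` and a budget of `0`.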
Add the following service to your docker-compose.yml:
```yaml
avr-sts-gemini:
  image: agentvoiceresponse/avr-sts-gemini:1.5.0
  platform: linux/x86_64
  container_name: avr-sts-gemini
  restart: always
  environment:
    - PORT=6037
    # AI Studio mode: API key only, no Vertex flags
    - GEMINI_API_KEY=${GEMINI_API_KEY}
    - GEMINI_MODEL=${GEMINI_MODEL:-gemini-2.5-flash-native-audio-preview-12-2025}
    - GEMINI_INSTRUCTIONS=${GEMINI_INSTRUCTIONS:-You are a helpful assistant}
    - GEMINI_THINKING_LEVEL=${GEMINI_THINKING_LEVEL:-MINIMAL}
    - GEMINI_THINKING_BUDGET=${GEMINI_THINKING_BUDGET:-0}
  networks:
    - avr
```
Mount a service account key and enable Vertex mode:
```yaml
avr-sts-gemini:
  image: agentvoiceresponse/avr-sts-gemini:1.5.0
  platform: linux/x86_64
  container_name: avr-sts-gemini
  restart: always
  environment:
    - PORT=6037
    # Vertex AI mode: no API key, credentials come from ADC
    - GOOGLE_GENAI_USE_VERTEXAI=true
    - GOOGLE_CLOUD_PROJECT=${GOOGLE_CLOUD_PROJECT}
    - GOOGLE_CLOUD_LOCATION=${GOOGLE_CLOUD_LOCATION:-us-central1}
    # Path inside the container; must match the volume target below
    - GOOGLE_APPLICATION_CREDENTIALS=/run/secrets/gcp-sa.json
    - GEMINI_MODEL=${GEMINI_MODEL:-gemini-2.5-flash-native-audio-preview-12-2025}
    - GEMINI_INSTRUCTIONS=${GEMINI_INSTRUCTIONS:-You are a helpful assistant}
  volumes:
    # Read-only mount of the service account key from the host
    - ${GOOGLE_APPLICATION_CREDENTIALS_HOST_PATH}:/run/secrets/gcp-sa.json:ro
  networks:
    - avr
```
Set GOOGLE_APPLICATION_CREDENTIALS_HOST_PATH in your .env to the JSON key path on the host (for example ./secrets/gcp-sa.json).
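Because a wrong host path can leave the container with a mount that is not a usable key file, it may help to sanity-check the key before starting the stack. The following is a minimal sketch under stated assumptions: `check_service_account_key` is a hypothetical helper, and the field list reflects the usual layout of a Google service account JSON key rather than anything this connector validates itself.

```python
import json
from pathlib import Path
from typing import List

# Fields normally present in a Google service account JSON key.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path: str) -> List[str]:
    """Return a list of problems with the key file (empty means it looks usable)."""
    key_path = Path(path)
    if not key_path.is_file():
        # Covers both a missing path and a directory at the mount point.
        return [f"{path} does not exist or is not a regular file"]

    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]

    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if not data.get(name)]
    if data.get("type") not in (None, "service_account"):
        problems.append(f"unexpected key type: {data['type']!r}")
    return problems
```

Running it against the host path from your .env (for example `check_service_account_key("./secrets/gcp-sa.json")`) returns an empty list when the file looks like a service account key.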
If WebSocket init fails, the connector returns a specific error message (missing API key, project, or location) instead of a generic failure. Check the container logs and the message from avr-core.
Configure avr-core to use Gemini STS by setting STS_URL:
```yaml
avr-core:
  image: agentvoiceresponse/avr-core
  platform: linux/x86_64
  container_name: avr-core
  restart: always
  environment:
    - PORT=5001
    # Points at the avr-sts-gemini service and its PORT
    - STS_URL=ws://avr-sts-gemini:6037
  ports:
    - 5001:5001
  networks:
    - avr
```
⚠️ When STS_URL is configured, ASR_URL, LLM_URL, and TTS_URL must be commented out.
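The rule above is easy to break when switching an existing deployment to STS mode. As a minimal sketch, a check like the following could run against the avr-core environment; `find_sts_conflicts` is a hypothetical helper, and treating an empty value as "not set" is an assumption.

```python
from typing import List, Mapping

# Variables of the separate ASR/LLM/TTS pipeline.
PIPELINE_VARS = ("ASR_URL", "LLM_URL", "TTS_URL")


def find_sts_conflicts(env: Mapping[str, str]) -> List[str]:
    """Return the pipeline variables that are set while STS_URL is also set."""
    if not env.get("STS_URL"):
        return []  # STS mode is not in use, so nothing can conflict
    return [name for name in PIPELINE_VARS if env.get(name)]
```

An empty result means the environment is consistent with STS mode; any returned names are the variables to comment out.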
The Gemini STS integration supports multiple instruction sources with a clear priority order.
```env
GEMINI_INSTRUCTIONS="You are a specialized customer service agent for a tech company. Always be polite and helpful."
```
If set, this overrides all other instruction sources.
```env
GEMINI_URL_INSTRUCTIONS="https://your-api.com/instructions"
```

Expected response format:

```json
{
  "system": "You are a helpful assistant that provides technical support."
}
```

The request will include the call UUID as an HTTP header:

```
X-AVR-UUID: <call-uuid>
```
This enables dynamic, per-call instruction generation.
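To make the exchange concrete, here is a client-side sketch of the request and response handling using only the Python standard library. It is not the connector's implementation: the function names are hypothetical, and the use of GET and a 5-second timeout are assumptions, since the source specifies only the header and the response shape.

```python
import json
import urllib.request


def build_instructions_request(url: str, call_uuid: str) -> urllib.request.Request:
    """Build a request that carries the call UUID in the X-AVR-UUID header."""
    # Assumption: the endpoint is queried with GET.
    # urllib stores the header name as "X-avr-uuid"; HTTP header names are
    # case-insensitive, so the server still sees X-AVR-UUID.
    return urllib.request.Request(
        url, headers={"X-AVR-UUID": call_uuid}, method="GET"
    )


def extract_system_prompt(body: bytes) -> str:
    """Read the "system" field from the documented JSON response format."""
    payload = json.loads(body.decode("utf-8"))
    system = payload.get("system") if isinstance(payload, dict) else None
    if not isinstance(system, str) or not system.strip():
        raise ValueError('response has no non-empty "system" string')
    return system


def fetch_instructions(url: str, call_uuid: str, timeout: float = 5.0) -> str:
    """Fetch per-call instructions from the configured URL."""
    request = build_instructions_request(url, call_uuid)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_system_prompt(response.read())
```

A server implementing the endpoint can read the UUID from the header and return a different `system` string for each call.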
```env
GEMINI_FILE_INSTRUCTIONS="./instructions.txt"
```
The file should contain plain text instructions.
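The way the three sources combine can be sketched as a simple fallback chain, with GEMINI_INSTRUCTIONS first, then GEMINI_URL_INSTRUCTIONS, then GEMINI_FILE_INSTRUCTIONS. The sketch is illustrative only: `resolve_instructions` and `DEFAULT_INSTRUCTIONS` are hypothetical names, and both the fall-through on a failed fetch or read and the final default prompt are assumptions about behavior the source does not spell out.

```python
from typing import Callable, Mapping

# Assumption: a generic prompt is used when no source yields instructions.
DEFAULT_INSTRUCTIONS = "You are a helpful assistant."


def resolve_instructions(
    env: Mapping[str, str],
    fetch_url: Callable[[str], str],
    read_file: Callable[[str], str],
) -> str:
    """Return instructions from the highest-priority source that yields text."""
    inline = env.get("GEMINI_INSTRUCTIONS", "").strip()
    if inline:
        return inline  # overrides every other source

    url = env.get("GEMINI_URL_INSTRUCTIONS", "").strip()
    if url:
        try:
            return fetch_url(url)
        except (OSError, ValueError):
            pass  # assumption: try the next source if the fetch fails

    path = env.get("GEMINI_FILE_INSTRUCTIONS", "").strip()
    if path:
        try:
            return read_file(path).strip()
        except OSError:
            pass  # assumption: fall back to the default if the file is unreadable

    return DEFAULT_INSTRUCTIONS
```

Passing the fetch and read steps in as callables keeps the ordering logic testable without a network or filesystem; in practice `read_file` could be `lambda p: open(p, encoding="utf-8").read()`.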
Priority order (highest first):

1. GEMINI_INSTRUCTIONS
2. GEMINI_URL_INSTRUCTIONS
3. GEMINI_FILE_INSTRUCTIONS

A ready-to-use example is available in the avr-infra repository:
- docker-compose-gemini.yml (Example #10)
- https://github.com/agentvoiceresponse/avr-infra
INTERRUPT_LISTENING are ignored in STS mode