Gemini is Google’s family of multimodal AI models. With the Gemini Speech-to-Speech (STS) integration, Agent Voice Response (AVR) can handle conversations where speech input is directly transformed into speech output, without requiring separate ASR (speech-to-text) and TTS (text-to-speech) components.
This approach significantly reduces latency and enables more natural, human-like voice interactions.
To connect AVR with Gemini, you need a Gemini API key, which you can create in Google AI Studio.
You will use this key as `GEMINI_API_KEY` in your Docker environment.
| Variable | Description | Example Value |
|---|---|---|
| `PORT` | Port on which the Gemini STS service runs | `6037` |
| `GEMINI_API_KEY` | API key from Google AI Studio | `AIza...` |
| `GEMINI_MODEL` | Gemini model ID to use | `gemini-2.5-flash-preview-native-audio-dialog` |
| `GEMINI_INSTRUCTIONS` | System prompt for the voice assistant | `"You are a helpful assistant."` |
| `GEMINI_URL_INSTRUCTIONS` | URL to fetch dynamic instructions | `https://your-api.com/instructions` |
| `GEMINI_FILE_INSTRUCTIONS` | Path to local instruction file | `./instructions.txt` |
We’ve added support for the following Gemini settings:

- `GEMINI_THINKING_LEVEL=MINIMAL`
- `GEMINI_THINKING_BUDGET=0`

More details here 👉 https://ai.google.dev/gemini-api/docs/thinking?hl=en

Supported values for `GEMINI_THINKING_LEVEL`:

- `THINKING_LEVEL_UNSPECIFIED`
- `LOW`
- `MEDIUM`
- `HIGH`
- `MINIMAL`

Values for `GEMINI_THINKING_BUDGET`:

- `0` → turn off thinking
- `-1` → enable dynamic thinking

Add the following service to your docker-compose.yml:
```yaml
avr-sts-gemini:
  image: agentvoiceresponse/avr-sts-gemini
  platform: linux/x86_64
  container_name: avr-sts-gemini
  restart: always
  environment:
    - PORT=6037
    - GEMINI_API_KEY=$GEMINI_API_KEY
    - GEMINI_MODEL=$GEMINI_MODEL
    - GEMINI_INSTRUCTIONS=$GEMINI_INSTRUCTIONS
  networks:
    - avr
```
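Before starting the container, it can help to sanity-check the values you plan to pass in. The following is a minimal sketch, not part of AVR: the function name `check_gemini_env` is hypothetical, and the checks only mirror what this page states (a required API key, a numeric port, the listed thinking levels). The thinking budget is only checked for being an integer, since this page describes `0` and `-1` but does not enumerate every accepted value.

```python
import os

# Values listed on this page for GEMINI_THINKING_LEVEL.
THINKING_LEVELS = {
    "THINKING_LEVEL_UNSPECIFIED", "LOW", "MEDIUM", "HIGH", "MINIMAL",
}


def check_gemini_env(env):
    """Return a list of human-readable problems found in a mapping of settings."""
    problems = []

    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is missing")

    port = env.get("PORT")
    if port is not None and (not port.isdigit() or not 1 <= int(port) <= 65535):
        problems.append(f"PORT is not a valid TCP port: {port!r}")

    level = env.get("GEMINI_THINKING_LEVEL")
    if level is not None and level not in THINKING_LEVELS:
        problems.append(f"GEMINI_THINKING_LEVEL is not a listed value: {level!r}")

    budget = env.get("GEMINI_THINKING_BUDGET")
    if budget is not None:
        try:
            int(budget)  # 0 turns thinking off, -1 enables dynamic thinking
        except ValueError:
            problems.append(f"GEMINI_THINKING_BUDGET is not an integer: {budget!r}")

    return problems


if __name__ == "__main__":
    # Check the current shell environment and print anything suspicious.
    for problem in check_gemini_env(os.environ):
        print(problem)
```

Running the script in the same shell that launches Docker Compose prints one line per problem and nothing when the settings look consistent.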
Configure avr-core to use Gemini STS by setting STS_URL:
```yaml
avr-core:
  image: agentvoiceresponse/avr-core
  platform: linux/x86_64
  container_name: avr-core
  restart: always
  environment:
    - PORT=5001
    - STS_URL=ws://avr-sts-gemini:6037
  ports:
    - 5001:5001
  networks:
    - avr
```
⚠️ When `STS_URL` is configured, `ASR_URL`, `LLM_URL`, and `TTS_URL` must be commented out.
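The rule above is easy to overlook when switching an existing setup to STS mode. As a rough illustration (the helper `conflicting_pipeline_vars` is hypothetical and not shipped with AVR), a small check can report which of the ASR/LLM/TTS variables are still set alongside `STS_URL`:

```python
# Variables that this page says must be commented out in STS mode.
CLASSIC_PIPELINE_VARS = ("ASR_URL", "LLM_URL", "TTS_URL")


def conflicting_pipeline_vars(env):
    """Return the ASR/LLM/TTS variables that are set together with STS_URL."""
    if not env.get("STS_URL"):
        # STS mode is not enabled, so there is nothing to conflict with.
        return []
    return [name for name in CLASSIC_PIPELINE_VARS if env.get(name)]
```

For example, passing `{"STS_URL": "ws://avr-sts-gemini:6037", "ASR_URL": "http://asr:6001"}` (the ASR address is a placeholder) returns `["ASR_URL"]`, which tells you what still needs to be commented out.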
The Gemini STS integration supports multiple instruction sources with a clear priority order.
GEMINI_INSTRUCTIONS="You are a specialized customer service agent for a tech company. Always be polite and helpful."
If set, this overrides all other instruction sources.
GEMINI_URL_INSTRUCTIONS="https://your-api.com/instructions"
Expected response format:
```json
{
  "system": "You are a helpful assistant that provides technical support."
}
```
The request will include the call UUID as an HTTP header:
X-AVR-UUID: <call-uuid>
This enables dynamic, per-call instruction generation.
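To make the contract concrete, here is a minimal sketch of an endpoint that could sit behind `GEMINI_URL_INSTRUCTIONS`, using only the Python standard library. Several details are assumptions rather than documented behavior: that the request is an HTTP GET, the port `8000`, and the `PROMPTS_BY_CALL` lookup table (a real service would more likely query a database or CRM). Only the `X-AVR-UUID` header and the `{"system": ...}` response shape come from this page.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical per-call prompts keyed by call UUID (illustration only).
PROMPTS_BY_CALL = {
    "demo-call-uuid": "You are a support agent for a returning customer.",
}
DEFAULT_PROMPT = "You are a helpful assistant that provides technical support."


def build_payload(call_uuid):
    """Build the response body in the documented {"system": ...} shape."""
    return {"system": PROMPTS_BY_CALL.get(call_uuid, DEFAULT_PROMPT)}


class InstructionsHandler(BaseHTTPRequestHandler):
    def do_GET(self):  # Assumption: instructions are fetched with GET.
        # The call UUID arrives in the X-AVR-UUID request header.
        call_uuid = self.headers.get("X-AVR-UUID", "")
        body = json.dumps(build_payload(call_uuid)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the sketch server until interrupted (port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), InstructionsHandler).serve_forever()
```

Calling `serve()` starts the server; pointing `GEMINI_URL_INSTRUCTIONS` at it would then return a call-specific prompt whenever the UUID is known and the default prompt otherwise.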
GEMINI_FILE_INSTRUCTIONS="./instructions.txt"
The file should contain plain text instructions.
Priority order (highest first):

1. `GEMINI_INSTRUCTIONS`
2. `GEMINI_URL_INSTRUCTIONS`
3. `GEMINI_FILE_INSTRUCTIONS`

A ready-to-use example is available in the avr-infra repository:

- docker-compose-gemini.yml (Example #10)
- https://github.com/agentvoiceresponse/avr-infra
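Returning to the instruction sources, the priority order can be sketched as a small resolver. This is an illustration of the documented precedence, not the service's actual code: the function names are hypothetical, the URL fetch is assumed to be a GET with a 5-second timeout, and returning `None` when nothing is configured is an assumption about fallback behavior that this page does not describe.

```python
import json
import urllib.request
from pathlib import Path


def fetch_url_instructions(url, call_uuid):
    """Fetch instructions from a URL, sending the call UUID as X-AVR-UUID."""
    request = urllib.request.Request(url, headers={"X-AVR-UUID": call_uuid})
    with urllib.request.urlopen(request, timeout=5) as response:
        # The documented response format is {"system": "..."}.
        return json.load(response)["system"]


def resolve_instructions(env, call_uuid, fetch=fetch_url_instructions):
    """Pick the system prompt: environment variable, then URL, then file."""
    if env.get("GEMINI_INSTRUCTIONS"):
        # Highest priority: overrides every other source.
        return env["GEMINI_INSTRUCTIONS"]
    if env.get("GEMINI_URL_INSTRUCTIONS"):
        return fetch(env["GEMINI_URL_INSTRUCTIONS"], call_uuid)
    if env.get("GEMINI_FILE_INSTRUCTIONS"):
        path = Path(env["GEMINI_FILE_INSTRUCTIONS"])
        return path.read_text(encoding="utf-8").strip()
    return None  # Assumption: no source configured means no custom prompt.
```

The `fetch` parameter exists so the precedence logic can be exercised without a network call, for example by passing a stub that returns a fixed string.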
Note: `INTERRUPT_LISTENING` is ignored in STS mode.