Gemini is Google’s family of multimodal AI models. With the Gemini STS (Speech-to-Speech) integration, AVR can natively handle conversations in which incoming speech is transformed directly into outgoing speech, without separate ASR (speech-to-text) and TTS (text-to-speech) components.
Removing those intermediate steps reduces latency and delivers more natural, human-like dialogue.
To connect AVR with Gemini, you need an API key from Google AI Studio.
You’ll use this key as GEMINI_API_KEY in your Docker environment.
Variable | Description | Example Value |
---|---|---|
PORT | Port on which the Gemini STS service runs | 6037 |
GEMINI_API_KEY | API Key from Google AI Studio | AIza... |
GEMINI_MODEL | Gemini model ID to use | gemini-2.5-flash-preview-native-audio-dialog |
GEMINI_INSTRUCTIONS | System prompt for the voice assistant | "You are a helpful assistant." |
Add the following service to your docker-compose.yml:
avr-sts-gemini:
  image: agentvoiceresponse/avr-sts-gemini
  platform: linux/x86_64
  container_name: avr-sts-gemini
  restart: always
  environment:
    - PORT=6037
    - GEMINI_API_KEY=$GEMINI_API_KEY
    - GEMINI_MODEL=$GEMINI_MODEL
    - GEMINI_INSTRUCTIONS=$GEMINI_INSTRUCTIONS
  networks:
    - avr
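The `$GEMINI_API_KEY`-style references above are filled in by Docker Compose from the shell environment or from a `.env` file next to docker-compose.yml. As a hedged sketch, Compose's interpolation syntax can also supply fallbacks or stop early when a value is missing. The fallback values below are copied from the example column of the table; they are an illustrative choice, not documented defaults of the avr-sts-gemini image.

```yaml
services:
  avr-sts-gemini:
    image: agentvoiceresponse/avr-sts-gemini
    environment:
      - PORT=6037
      # ${VAR:?message} makes Compose abort with the message if VAR is unset or empty.
      - GEMINI_API_KEY=${GEMINI_API_KEY:?set GEMINI_API_KEY in .env}
      # ${VAR:-default} substitutes the default if VAR is unset or empty.
      # These defaults are assumptions taken from the table's example values.
      - GEMINI_MODEL=${GEMINI_MODEL:-gemini-2.5-flash-preview-native-audio-dialog}
      - GEMINI_INSTRUCTIONS=${GEMINI_INSTRUCTIONS:-You are a helpful assistant.}
```

Running `docker compose config` prints the file with all variables resolved, which is a quick way to confirm the key and model are picked up before starting the stack.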
Point avr-core to the Gemini STS service by setting STS_URL to the STS container's service name and port:
avr-core:
  image: agentvoiceresponse/avr-core
  platform: linux/x86_64
  container_name: avr-core
  restart: always
  environment:
    - PORT=5001
    - STS_URL=ws://avr-sts-gemini:6037
  ports:
    - 5001:5001
  networks:
    - avr
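Both services join a network named `avr`, which Compose expects to be declared at the top level of the file. The trimmed sketch below shows how the pieces fit together: on a shared network, Docker's built-in DNS resolves the service name `avr-sts-gemini`, which is why it can be used as the host in STS_URL, and the port in that URL has to match the PORT given to the STS service. The `bridge` driver here is an assumption; the avr-infra project may configure the network differently.

```yaml
services:
  avr-sts-gemini:
    image: agentvoiceresponse/avr-sts-gemini
    environment:
      - PORT=6037              # port the STS service listens on
    networks:
      - avr

  avr-core:
    image: agentvoiceresponse/avr-core
    environment:
      # host = STS service name, port = PORT of avr-sts-gemini
      - STS_URL=ws://avr-sts-gemini:6037
    networks:
      - avr

networks:
  avr:
    driver: bridge             # assumed driver for this sketch
```

Only avr-core publishes a port to the host in the configuration above; the STS service is reached over the internal network.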
A ready-to-use integration example is available in the avr-infra GitHub project:
docker-compose-gemini.yml: Example 10 configuration of AVR with Gemini STS.