Kokoro is a fast, lightweight, and open-source text-to-speech (TTS) engine that can be deployed locally.
This makes it an excellent choice when you want full control, reduced latency, or the ability to run offline without cloud dependencies.
AVR integrates seamlessly with Kokoro to provide natural, customizable voices directly inside your telephony infrastructure.
An example configuration is available in the docker-compose-local.yml file in the avr-infra repository.
| Variable | Description | Example Value |
|---|---|---|
| PORT | Port where the Kokoro TTS service listens | 6012 |
| KOKORO_BASE_URL | Base URL of your Kokoro server | http://avr-kokoro:8880 |
| KOKORO_VOICE | Voice model to use | af_alloy |
| KOKORO_SPEED | Speaking rate (1.0 = normal, >1 faster, <1 slower) | 1.3 |
Example .env file:
```env
PORT=6012
KOKORO_BASE_URL=http://avr-kokoro:8880
KOKORO_VOICE=af_alloy
KOKORO_SPEED=1.3
```
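Before starting the containers, it can help to sanity-check these values. The following is a minimal sketch in Python; `parse_env` and `check_kokoro_env` are hypothetical helpers written for this illustration, and the validation rules (port range, positive speed, http(s) URL) are assumptions inferred from the table above, not the checks the AVR service itself performs.

```python
from urllib.parse import urlparse


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {line!r}")
        values[key.strip()] = value.strip()
    return values


def check_kokoro_env(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the values look sane.

    These rules are assumptions for illustration only.
    """
    problems = []

    port = values.get("PORT", "")
    if not (port.isascii() and port.isdigit() and 1 <= int(port) <= 65535):
        problems.append(f"PORT must be an integer in 1..65535, got {port!r}")

    url = urlparse(values.get("KOKORO_BASE_URL", ""))
    if url.scheme not in ("http", "https") or not url.netloc:
        problems.append("KOKORO_BASE_URL must be an http(s) URL")

    if not values.get("KOKORO_VOICE"):
        problems.append("KOKORO_VOICE must not be empty")

    try:
        speed_ok = float(values.get("KOKORO_SPEED", "")) > 0
    except ValueError:
        speed_ok = False
    if not speed_ok:
        problems.append("KOKORO_SPEED must be a positive number")

    return problems
```

A typical use would be to read the `.env` file, pass its text to `parse_env`, and print whatever `check_kokoro_env` returns before running `docker compose up`.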
Example Docker Compose services:

```yaml
avr-kokoro:
  image: ghcr.io/remsky/kokoro-fastapi-cpu
  container_name: avr-kokoro
  restart: always
  ports:
    - 8880:8880
  networks:
    - avr

avr-tts-kokoro:
  image: agentvoiceresponse/avr-tts-kokoro
  platform: linux/x86_64
  container_name: avr-tts-kokoro
  restart: always
  environment:
    - PORT=6012
    - KOKORO_BASE_URL=http://avr-kokoro:8880
    - KOKORO_VOICE=af_alloy
    - KOKORO_SPEED=1.3
  networks:
    - avr
```
You can test Kokoro directly from its built-in web UI:
http://localhost:8880/web/
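To test the AVR Kokoro TTS service itself, send a request to its streaming endpoint (this assumes port 6012 is reachable from where you run curl):

```bash
curl -X POST http://localhost:6012/text-to-speech-stream \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, welcome to Agent Voice Response with Kokoro TTS!"}' \
  --output response.raw
```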
The resulting response.raw file contains raw, headerless PCM audio (8 kHz, 16-bit, mono).
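Because raw PCM has no header, most players cannot open it directly. The sketch below shows one way to make the same request from Python and wrap the audio in a WAV container using only the standard library. The function names are hypothetical, the URL and audio parameters are taken from the example above, and the byte order (signed little-endian samples) is an assumption, since the format description does not state it.

```python
import json
import urllib.request
import wave

# Values taken from the curl example above; adjust to your deployment.
TTS_URL = "http://localhost:6012/text-to-speech-stream"
SAMPLE_RATE_HZ = 8000   # 8 kHz
SAMPLE_WIDTH_BYTES = 2  # 16-bit
CHANNELS = 1            # mono


def build_tts_request(text: str, url: str = TTS_URL) -> urllib.request.Request:
    """Build the same JSON POST request as the curl example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pcm_to_wav(pcm: bytes, wav_path: str) -> None:
    """Wrap raw PCM in a WAV header (assumes signed little-endian samples)."""
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm)


def synthesize_to_wav(text: str, wav_path: str) -> None:
    """Request speech from the running service and save it as a WAV file."""
    with urllib.request.urlopen(build_tts_request(text)) as response:
        pcm_to_wav(response.read(), wav_path)
```

With the containers running, `synthesize_to_wav("Hello from Kokoro", "response.wav")` should produce a file that ordinary audio players can open. To convert an existing response.raw instead, read its bytes and pass them to `pcm_to_wav`.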