
Vosk is an open-source speech recognition toolkit that runs locally, with no internet connection required. It supports more than 20 languages and dialects, which makes it a good fit for offline, privacy-friendly transcription.
Vosk provides pre-trained models in over 20 languages, including:
English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh, Swedish, Japanese, Esperanto, Hindi, Czech, Polish, Uzbek, Korean, Breton, Gujarati, Tajik, Telugu.
The full list and downloads are available on the Vosk Models page: https://alphacephei.com/vosk/models
To use Vosk with AVR, add the avr-asr-vosk service to your docker-compose.yml and point avr-core at it through ASR_URL:

    avr-asr-vosk:
      image: agentvoiceresponse/avr-asr-vosk
      platform: linux/x86_64
      container_name: avr-asr-vosk
      restart: always
      environment:
        - PORT=6010
        - MODEL_PATH=model
      volumes:
        - ./model:/usr/src/app/model
      networks:
        - avr

    avr-core:
      image: agentvoiceresponse/avr-core
      container_name: avr-core
      restart: always
      environment:
        - ASR_URL=http://avr-asr-vosk:6010/speech-to-text-stream
        ...
      networks:
        - avr
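
The volume mount above expects an unpacked model in ./model on the host. The following is a minimal sketch of one way to get there from a downloaded model zip. It assumes the archive contains a single top-level folder, which is how the published Vosk models are usually packaged; `install_model` and its parameters are hypothetical names used only for illustration.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_model(archive: Path, target: Path = Path("model")) -> Path:
    """Unpack a Vosk model zip so its files sit directly inside `target`.

    Assumption: the archive holds exactly one top-level folder
    (for example vosk-model-small-en-us-0.15/).
    """
    if target.exists():
        raise FileExistsError(f"{target} already exists; remove it first")

    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(tmp)
        entries = list(Path(tmp).iterdir())
        if len(entries) != 1 or not entries[0].is_dir():
            raise ValueError("expected exactly one top-level folder in the archive")
        # Move the unpacked folder to the host side of the volume mount.
        shutil.move(str(entries[0]), str(target))
    return target
```

Calling `install_model(Path("vosk-model-small-en-us-0.15.zip"))` from the directory that holds docker-compose.yml would leave the model files under ./model, ready to be mounted.
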

| Variable | Description | Example Value |
|---|---|---|
| PORT | Port on which the Vosk ASR service runs | 6010 |
| MODEL_PATH | Path to the mounted Vosk model (container) | model |
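
MODEL_PATH and the volume target have to agree, or the service will not find the model. The sketch below shows that relationship as a small check. It assumes the service resolves a relative MODEL_PATH against /usr/src/app, which is inferred from the volume target in the compose file and not confirmed by the image itself; both function names are hypothetical.

```python
import posixpath


def container_model_dir(model_path: str, workdir: str = "/usr/src/app") -> str:
    """Return the absolute in-container path that MODEL_PATH points at.

    Assumption: `workdir` is the directory the service resolves relative
    paths against (taken from the volume target, not from the image).
    """
    # posixpath.join leaves an absolute MODEL_PATH unchanged and resolves
    # a relative one against the working directory.
    return posixpath.normpath(posixpath.join(workdir, model_path))


def mount_matches(volume: str, model_path: str) -> bool:
    """Check that a 'host:container[:mode]' volume targets the model directory."""
    parts = volume.split(":")
    if len(parts) < 2:
        raise ValueError("expected a 'host:container' volume string")
    return posixpath.normpath(parts[1]) == container_model_dir(model_path)
```

With the values from the compose file, `mount_matches("./model:/usr/src/app/model", "model")` holds, while a mount such as `./model:/models` would need MODEL_PATH=/models instead.
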
For most AVR use cases, small or medium models provide a good balance between latency and accuracy.
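
One way to compare candidate models on the latency side is to measure the real-time factor (processing time divided by audio duration) on a sample recording. The sketch below does this with the `vosk` Python package directly, outside the AVR containers, so it reflects the model and the host machine, not the avr-asr-vosk service. It assumes `vosk` is installed and that the input is a 16-bit mono PCM WAV file; `benchmark` and `real_time_factor` are hypothetical helper names, and accuracy still has to be judged by reading the transcripts.

```python
import json
import time
import wave


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def benchmark(model_dir: str, wav_path: str) -> tuple[str, float]:
    """Transcribe one WAV file and return (text, real-time factor)."""
    # Imported here so the helper above works without the vosk package installed.
    from vosk import KaldiRecognizer, Model

    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono PCM WAV file")
        audio_seconds = wf.getnframes() / wf.getframerate()
        recognizer = KaldiRecognizer(Model(model_dir), wf.getframerate())

        start = time.perf_counter()
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            recognizer.AcceptWaveform(data)
        # FinalResult() returns a JSON string; the transcript is under "text".
        text = json.loads(recognizer.FinalResult()).get("text", "")
        elapsed = time.perf_counter() - start

    return text, real_time_factor(elapsed, audio_seconds)
```

Running `benchmark("model", "sample.wav")` once per candidate model gives a rough basis for choosing between a small and a larger model on the target hardware.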