The AVR Infrastructure project provides a complete, modular deployment environment for the Agent Voice Response (AVR) system.
It allows you to deploy AVR Core, ASR (Automatic Speech Recognition), LLM (Large Language Model), TTS (Text-to-Speech), or a unified STS (Speech-to-Speech) service, all integrated with Asterisk PBX using the AudioSocket protocol.
AVR supports a wide range of providers, including cloud services such as OpenAI, Deepgram, Google, ElevenLabs, and Anthropic, as well as local and open-source options like Vosk, Kokoro, CoquiTTS, and Ollama.
The entire stack can be customized and deployed using Docker Compose.
Before starting, ensure the following tools and credentials are available:

- Docker and Docker Compose
- Git (to clone the repository)
- API keys or credentials for the cloud providers you plan to use
AVR follows a modular and provider-agnostic architecture.
At runtime, AVR Core acts as the orchestrator between Asterisk and the configured AI services.
At a high level, a call handled by AVR follows this lifecycle:

1. Call initialization: the Asterisk dialplan registers the call with AVR Core via HTTP.
2. Audio streaming: Asterisk opens an AudioSocket TCP connection to AVR Core.
3. Transcription: caller audio is streamed to the ASR service.
4. Reasoning: the final transcript is sent to the LLM.
5. Synthesis: the LLM response is converted to speech by the TTS service and played back to the caller.

Alternatively, a single STS service replaces steps 3–5.
The sections below describe each step in detail.
Before audio streaming begins, the call can be explicitly initialized via HTTP.
This step is optional but strongly recommended, as it enables per-call configuration and webhook notifications such as call_initiated.
The Asterisk dialplan sends a POST /call request to AVR Core. AVR Core registers the call and emits the call_initiated webhook.
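As a sketch, the initialization request can be built as below. The /call path and the use of the call UUID come from this section; the hostname, port, and any payload fields beyond the UUID are assumptions and may differ in your AVR Core version.

```python
import json
import urllib.request
import uuid

AVR_CORE_URL = "http://avr-core:5001"  # hostname and port are assumptions


def build_call_request(call_uuid: str) -> urllib.request.Request:
    """Build the POST /call request that initializes a call in AVR Core.

    The payload shape is an assumption; consult your AVR Core version
    for the exact fields it accepts.
    """
    body = json.dumps({"uuid": call_uuid}).encode("utf-8")
    return urllib.request.Request(
        f"{AVR_CORE_URL}/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    call_uuid = str(uuid.uuid4())
    req = build_call_request(call_uuid)
    with urllib.request.urlopen(req) as resp:  # requires a running AVR Core
        print(resp.status)
```

The same UUID must later be handed to AudioSocket so AVR Core can correlate the TCP stream with this call.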
Once the call is initialized, Asterisk opens a TCP AudioSocket connection to AVR Core.
From the dialplan, the same UUID used in the POST /call request is passed to AudioSocket. AVR Core is then responsible for accepting the TCP connection, matching the UUID to the previously initialized call, and bridging audio between Asterisk and the configured AI services.
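For reference, AudioSocket is a simple TCP framing protocol: each message is a 1-byte type, a 2-byte big-endian payload length, and the payload. The sketch below is not AVR Core's actual implementation; it just packs and parses frames using the message types from the Asterisk AudioSocket specification.

```python
import struct

# AudioSocket message types (from the Asterisk AudioSocket protocol)
KIND_HANGUP = 0x00   # terminate the call
KIND_UUID   = 0x01   # payload: 16-byte call UUID (first frame sent by Asterisk)
KIND_DTMF   = 0x03   # payload: one ASCII DTMF digit
KIND_AUDIO  = 0x10   # payload: 16-bit signed linear PCM, 8 kHz mono
KIND_ERROR  = 0xFF   # payload: 1-byte error code


def pack_frame(kind: int, payload: bytes = b"") -> bytes:
    """Encode one frame: type (1 byte) + length (2 bytes, big-endian) + payload."""
    return struct.pack(">BH", kind, len(payload)) + payload


def parse_frame(buf: bytes) -> tuple[int, bytes, bytes]:
    """Decode one frame from buf; returns (kind, payload, remaining bytes)."""
    kind, length = struct.unpack(">BH", buf[:3])
    return kind, buf[3:3 + length], buf[3 + length:]
```

A 20 ms audio frame at 8 kHz, 16-bit mono carries 320 bytes of PCM, so a full audio frame on the wire is 323 bytes.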
When using the classic pipeline:
AVR Core streams audio chunks to the ASR service (ASR_URL). The ASR returns interim results while the caller is speaking and, once they stop, a final transcript. AVR Core collects these results as they arrive.
The final transcript triggers the reasoning step.
AVR Core sends the final transcript and conversation context to the LLM (LLM_URL). The LLM generates the assistant response, typically streamed token by token.
AVR Core handles provider-specific streaming and normalization.
The response text is streamed to the TTS service (TTS_URL), which returns synthesized audio. The caller hears the response with minimal latency.
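The classic ASR → LLM → TTS flow can be sketched as a chain of streaming stages. The stubs below stand in for the HTTP calls to ASR_URL, LLM_URL, and TTS_URL; function names and signatures are illustrative, not AVR Core's real API.

```python
from typing import Iterable, Iterator


def transcribe(audio_chunks: Iterable[bytes]) -> str:
    """Stub for the ASR stage (ASR_URL): consume caller audio, return the final transcript."""
    total = sum(len(chunk) for chunk in audio_chunks)
    return f"<final transcript of {total} audio bytes>"


def generate_reply(transcript: str, context: list[str]) -> Iterator[str]:
    """Stub for the LLM stage (LLM_URL): stream response tokens for the transcript."""
    context.append(transcript)
    yield from ["Hello, ", "how can I help?"]


def synthesize(tokens: Iterable[str]) -> Iterator[bytes]:
    """Stub for the TTS stage (TTS_URL): stream synthesized audio as text arrives."""
    for token in tokens:
        yield token.encode("utf-8")  # stand-in for PCM audio


def handle_turn(audio_chunks: Iterable[bytes], context: list[str]) -> bytes:
    """One conversational turn: ASR -> LLM -> TTS, streamed end to end."""
    transcript = transcribe(audio_chunks)
    return b"".join(synthesize(generate_reply(transcript, context)))
```

Because each stage is a stream, TTS can begin speaking before the LLM has finished generating, which is what keeps the perceived latency low.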
If STS_URL is configured, AVR Core bypasses ASR, LLM, and TTS entirely.
Caller speech is sent directly to the STS provider, which returns synthesized speech.
This approach reduces end-to-end latency and simplifies the stack, since a single provider handles both understanding and synthesis.
STS routing can be static or dynamic via the call_initiated webhook.
From the caller’s perspective, nothing changes: they speak and hear responses exactly as with the classic pipeline.
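The routing rule this section describes reduces to one check. The function below is a sketch, not AVR Core's internal logic; the docs state only that a configured STS_URL bypasses ASR, LLM, and TTS.

```python
def select_mode(env: dict[str, str]) -> str:
    """Pick the processing path: a configured STS_URL bypasses ASR/LLM/TTS entirely."""
    if env.get("STS_URL"):
        return "sts"
    if all(env.get(key) for key in ("ASR_URL", "LLM_URL", "TTS_URL")):
        return "pipeline"
    raise ValueError("configure either STS_URL or ASR_URL + LLM_URL + TTS_URL")
```

With dynamic routing via the call_initiated webhook, the same decision can be made per call rather than per deployment.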
Use one of the preconfigured Docker Compose files to launch AVR with your preferred providers.
git clone https://github.com/agentvoiceresponse/avr-infra
cd avr-infra
cp .env.example .env
Run a stack, for example:
docker-compose -f docker-compose-openai.yml up -d
Or with local providers:
docker-compose -f docker-compose-local.yml up -d
Edit .env with your provider credentials.
The key variables are ASR_URL, LLM_URL, TTS_URL, and STS_URL (when STS_URL is set, ASR / LLM / TTS are disabled). For example:
ASR_URL=http://avr-asr-deepgram:6010/speech-to-text-stream
LLM_URL=http://avr-llm-anthropic:6000/prompt-stream
TTS_URL=http://avr-tts-google:6003/text-to-speech-stream
STS_URL=http://avr-sts-openai:6033/speech-to-speech-stream
Hostnames and ports depend on your Docker Compose configuration.
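To wire these variables into the stack, the compose file passes them to the AVR Core service. The fragment below is illustrative only; the service and image names in your bundled compose files may differ.

```yaml
services:
  avr-core:
    image: agentvoiceresponse/avr-core   # illustrative; use the image from the bundled files
    environment:
      - ASR_URL=${ASR_URL}
      - LLM_URL=${LLM_URL}
      - TTS_URL=${TTS_URL}
```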