Whisper — Speech-to-Text¶
Whisper provides audio transcription (speech-to-text) for RAG pipelines and audio analysis in AKKO. It uses an OpenAI-compatible API to transcribe audio files into text — fully offline, no external API calls.
Architecture¶
ai-service / ADEN / Cockpit
             |
      +------v------+
      |   Whisper   |  REST API (port 8000)
      | (Speech-to- |  WAV/MP3/M4A/FLAC/OGG -> Text
      |    Text)    |
      +-------------+
Supported Formats¶
| Input Format | Description |
|---|---|
| WAV | Uncompressed PCM audio |
| MP3 | MPEG Layer 3 compressed audio |
| M4A | AAC/ALAC compressed audio |
| FLAC | Lossless compressed audio |
| OGG | Vorbis/Opus compressed audio |
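The table above can be mirrored in client code with a small pre-flight check, so unsupported files fail fast before an upload. A minimal sketch (the service itself performs the authoritative validation):

```python
from pathlib import Path

# Formats accepted by the transcription endpoint, from the table above
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}

def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one Whisper can transcribe."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```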
Usage¶
From Trino (akko_ai_transcribe)¶
-- Transcribe a single audio file from object storage
SELECT akko_ai_transcribe('s3://akko-documents/meeting-2026-04.wav');
-- Transcribe all audio files and store results
SELECT
file_path,
akko_ai_transcribe(file_path) AS transcript
FROM iceberg.raw.audio_files;
From ai-service (REST API)¶
import httpx
# Upload a file
with open("meeting.wav", "rb") as f:
response = httpx.post(
"http://akko-akko-ai-service:8000/v1/transcribe",
files={"file": ("meeting.wav", f)},
)
print(response.json()["text"])
# Or use an S3 URI
response = httpx.get(
"http://akko-akko-ai-service:8000/v1/transcribe",
params={"s3_uri": "s3://akko-documents/meeting.wav"},
)
print(response.json()["text"])
From Notebooks¶
import requests
# Transcribe an audio file from object storage
resp = requests.get(
"http://akko-akko-ai-service:8000/v1/transcribe",
params={"s3_uri": "s3://akko-documents/interview.mp3"},
timeout=300,
)
result = resp.json()
print(f"Language: {result['language']}")
print(f"Duration: {result['duration_seconds']}s")
print(f"Transcript:\n{result['text']}")
Health Check¶
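A liveness probe can be scripted with the standard library alone. The `/health` path and the `{"status": "ok"}` response shape below are assumptions; substitute whatever probe endpoint your deployment exposes:

```python
import json
import urllib.request

def is_healthy(payload: dict) -> bool:
    """Interpret a health-check response body (assumed shape: {"status": "ok"})."""
    return payload.get("status") == "ok"

def check_whisper_health(base_url: str = "http://akko-akko-ai-service:8000") -> bool:
    """Fetch the (assumed) /health endpoint and report whether the service is up."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return is_healthy(json.load(resp))
```

From inside the cluster, `check_whisper_health()` returns True when the service responds with an ok status within five seconds.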
Airflow DAG¶
The akko_audio_transcription DAG runs every 15 minutes and automatically:
- Lists new audio files in the `akko-documents` S3 bucket
- Transcribes each file via the AI Service `/v1/transcribe` endpoint
- Stores the transcript in pgvector `rag.documents` (`content_type` = `audio/transcript`)
- Tracks processed files in `rag.audio_transcription_tracking`
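The DAG's incremental behaviour amounts to a set difference between the bucket listing and the tracking table. A minimal sketch of that selection logic (the function name and signature are illustrative, not the DAG's actual code):

```python
AUDIO_EXTENSIONS = (".wav", ".mp3", ".m4a", ".flac", ".ogg")

def select_new_audio_files(bucket_keys, processed_keys):
    """Return bucket objects that are audio files and not yet tracked as processed."""
    processed = set(processed_keys)
    return sorted(
        key for key in bucket_keys
        if key.lower().endswith(AUDIO_EXTENSIONS) and key not in processed
    )
```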
Configuration¶
Kubernetes (Helm)¶
akko-whisper:
enabled: true
image:
repository: hwdsl2/whisper-server
tag: "latest" # Pin to a specific version in production
whisperModel: "base" # Options: tiny, base, small, medium, large
resources:
requests:
cpu: 250m
memory: 256Mi
limits:
cpu: "2"
memory: 2Gi
Whisper Model Selection¶
| Model | Size | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| tiny | 39 MB | Fastest | Low | Quick previews, development |
| base | 74 MB | Fast | Moderate | Default, good balance |
| small | 244 MB | Moderate | Good | Production with decent hardware |
| medium | 769 MB | Slow | High | High-quality transcription |
| large | 1.5 GB | Slowest | Highest | Maximum accuracy |
Memory Requirement
The Whisper model is loaded into memory at startup. The base model requires ~256 Mi, while large requires ~2 Gi. Adjust resource limits accordingly.
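A helper can sanity-check a model choice against the pod's memory limit before a `helm upgrade`. Only the `base` (~256 Mi) and `large` (~2 Gi) figures come from this page; the `tiny`, `small`, and `medium` estimates below are interpolated guesses and should be verified against your deployment:

```python
# Approximate resident memory per model, in MiB.
# base and large follow the note above; the rest are rough assumptions.
MODEL_MEMORY_MIB = {"tiny": 150, "base": 256, "small": 600, "medium": 1200, "large": 2048}

def largest_model_for(memory_limit_mib: int) -> str:
    """Pick the largest Whisper model whose estimated footprint fits the limit."""
    fitting = [m for m, mib in MODEL_MEMORY_MIB.items() if mib <= memory_limit_mib]
    if not fitting:
        raise ValueError(f"no model fits within {memory_limit_mib} MiB")
    return max(fitting, key=MODEL_MEMORY_MIB.get)
```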
Network Access¶
Whisper is an internal service with no internet access. It processes audio locally using CPU-based speech recognition. The NetworkPolicy restricts:
- Ingress: Only ai-service, ADEN, and cockpit can reach port 8000
- Egress: DNS only (no internet access)
RBAC¶
The akko_ai_transcribe Trino function is available to:
- admin — Full access
- engineer — Full access
- analyst — Full access
- steward — No access (governance-only role)
- viewer — No access
Troubleshooting¶
Whisper Pod CrashLoopBackOff (OOMKilled)¶
Symptoms: The Whisper pod enters CrashLoopBackOff status. kubectl describe pod shows OOMKilled as the last termination reason.
Cause: The selected Whisper model is too large for the configured memory limits.
Solution:
# Check current memory limits
kubectl get pod -n akko -l app.kubernetes.io/name=akko-whisper -o jsonpath='{.items[0].spec.containers[0].resources}'
# Use a smaller model or increase memory
helm upgrade akko helm/akko/ -n akko -f helm/examples/values-dev.yaml \
--set akko-whisper.whisperModel=tiny \
--set akko-whisper.resources.limits.memory=1Gi
Slow Transcription¶
Symptoms: Audio transcription takes several minutes for short files. CPU usage is at 100%.
Cause: Whisper uses CPU-based inference. Larger models and longer audio files require more processing time.
Solution:
# Check CPU allocation
kubectl top pod -n akko -l app.kubernetes.io/name=akko-whisper
# Use a smaller model for faster processing
helm upgrade akko helm/akko/ -n akko -f helm/examples/values-dev.yaml \
--set akko-whisper.whisperModel=tiny
# Or increase CPU limits
helm upgrade akko helm/akko/ -n akko -f helm/examples/values-dev.yaml \
--set akko-whisper.resources.limits.cpu=4
Empty Transcription Results¶
Symptoms: The /v1/transcribe endpoint returns {"status": "error", "error": "Could not transcribe audio"}.
Cause: The audio file may be corrupted, in an unsupported format, or contain only silence.
Solution: