
Whisper — Speech-to-Text

Whisper provides audio transcription (speech-to-text) for RAG pipelines and audio analysis in AKKO. It exposes an OpenAI-compatible API that transcribes audio files into text, fully offline with no external API calls.

Architecture

ai-service / ADEN / Cockpit
            |
      +-----v-------+
      |   Whisper   |  REST API (port 8000)
      | (Speech-to- |  WAV/MP3/M4A/FLAC/OGG -> Text
      |    Text)    |
      +-------------+

Supported Formats

| Input Format | Description |
| --- | --- |
| WAV | Uncompressed PCM audio |
| MP3 | MPEG Layer 3 compressed audio |
| M4A | AAC/ALAC compressed audio |
| FLAC | Lossless compressed audio |
| OGG | Vorbis/Opus compressed audio |
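Clients can pre-filter files by extension before submitting them for transcription. A minimal sketch, assuming the extension set mirrors the table above (the helper name is illustrative, not part of the service API):

```python
# Extensions accepted by the Whisper service, per the table above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}

def is_supported_audio(path: str) -> bool:
    """Return True if the file's extension is one Whisper can transcribe."""
    dot = path.rfind(".")
    return dot != -1 and path[dot:].lower() in SUPPORTED_EXTENSIONS
```

This also works on S3 URIs, since only the trailing extension is inspected.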

Usage

From Trino (akko_ai_transcribe)

-- Transcribe a single audio file from object storage
SELECT akko_ai_transcribe('s3://akko-documents/meeting-2026-04.wav');

-- Transcribe all audio files and store results
SELECT
    file_path,
    akko_ai_transcribe(file_path) AS transcript
FROM iceberg.raw.audio_files;

From ai-service (REST API)

import httpx

# Upload a file (transcription can take minutes; allow a generous timeout)
with open("meeting.wav", "rb") as f:
    response = httpx.post(
        "http://akko-akko-ai-service:8000/v1/transcribe",
        files={"file": ("meeting.wav", f)},
        timeout=300,
    )
response.raise_for_status()
print(response.json()["text"])

# Or use an S3 URI
response = httpx.get(
    "http://akko-akko-ai-service:8000/v1/transcribe",
    params={"s3_uri": "s3://akko-documents/meeting.wav"},
    timeout=300,
)
response.raise_for_status()
print(response.json()["text"])
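The two call styles above differ only in how the audio source is passed. A small request-builder sketch that captures that distinction as pure logic (the function name and returned keys are illustrative, not part of the API):

```python
# Maps an input source to the HTTP call style used by /v1/transcribe above:
# S3 URIs go as a GET query parameter, local files as a multipart POST upload.
BASE_URL = "http://akko-akko-ai-service:8000"

def build_transcribe_request(source: str) -> dict:
    """Return the method and arguments for a given audio source."""
    if source.startswith("s3://"):
        return {
            "method": "GET",
            "url": f"{BASE_URL}/v1/transcribe",
            "params": {"s3_uri": source},
        }
    return {
        "method": "POST",
        "url": f"{BASE_URL}/v1/transcribe",
        "file_path": source,
    }
```

A caller can then dispatch to `httpx.get` or `httpx.post` based on the `method` field.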

From Notebooks

import requests

# Transcribe an audio file from object storage
resp = requests.get(
    "http://akko-akko-ai-service:8000/v1/transcribe",
    params={"s3_uri": "s3://akko-documents/interview.mp3"},
    timeout=300,
)
result = resp.json()
print(f"Language: {result['language']}")
print(f"Duration: {result['duration_seconds']}s")
print(f"Transcript:\n{result['text']}")

Health Check

curl http://akko-akko-whisper:8000/health

Airflow DAG

The akko_audio_transcription DAG runs every 15 minutes and automatically:

  1. Lists new audio files in the akko-documents S3 bucket
  2. Transcribes each file via the AI Service /v1/transcribe endpoint
  3. Stores the transcript in pgvector rag.documents (content_type=audio/transcript)
  4. Tracks processed files in rag.audio_transcription_tracking
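The interplay of steps 1 and 4 is what makes the DAG incremental: only files without a row in the tracking table are transcribed. A sketch of that selection step as pure logic (names are illustrative; the real DAG reads the tracked set from `rag.audio_transcription_tracking`):

```python
# Incremental selection: keep only bucket listings that have no tracking row.
def select_new_files(listed: list[str], already_processed: set[str]) -> list[str]:
    """Return audio files from the S3 listing not yet transcribed."""
    return [f for f in listed if f not in already_processed]
```

Each successful transcription then appends its file to the tracked set, so the next 15-minute run skips it.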

Configuration

Kubernetes (Helm)

akko-whisper:
  enabled: true
  image:
    repository: hwdsl2/whisper-server
    tag: "latest"  # Pin to a specific version in production
  whisperModel: "base"  # Options: tiny, base, small, medium, large
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: "2"
      memory: 2Gi

Whisper Model Selection

| Model | Size | Speed | Accuracy | Use Case |
| --- | --- | --- | --- | --- |
| tiny | 39 MB | Fastest | Low | Quick previews, development |
| base | 74 MB | Fast | Moderate | Default, good balance |
| small | 244 MB | Moderate | Good | Production with decent hardware |
| medium | 769 MB | Slow | High | High-quality transcription |
| large | 1.5 GB | Slowest | Highest | Maximum accuracy |
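One way to automate the trade-off in the table above is to pick the most accurate model whose download fits a size budget. A sketch using the figures from the table (the helper is illustrative; 1.5 GB is approximated as 1536 MB):

```python
# Download sizes in MB, from the model table above (1.5 GB ~ 1536 MB).
MODEL_SIZE_MB = {"tiny": 39, "base": 74, "small": 244, "medium": 769, "large": 1536}
# Ordered from least to most accurate, per the table.
ACCURACY_ORDER = ["tiny", "base", "small", "medium", "large"]

def largest_model_within(size_budget_mb: int) -> str:
    """Pick the most accurate model whose download fits the budget."""
    fitting = [m for m in ACCURACY_ORDER if MODEL_SIZE_MB[m] <= size_budget_mb]
    if not fitting:
        raise ValueError("no Whisper model fits the given size budget")
    return fitting[-1]
```

Note that download size is not the same as resident memory; validate the memory limits separately (see below).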

Memory Requirement

The Whisper model is loaded into memory at startup. The base model requires ~256 Mi, while large requires ~2 Gi. Adjust resource limits accordingly.

Network Access

Whisper is an internal service with no internet access. It processes audio locally using CPU-based speech recognition. The NetworkPolicy restricts:

  • Ingress: Only ai-service, ADEN, and cockpit can reach port 8000
  • Egress: DNS only (no internet access)

RBAC

The akko_ai_transcribe Trino function is available to:

  • admin — Full access
  • engineer — Full access
  • analyst — Full access
  • steward — No access (governance-only role)
  • viewer — No access
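A service gating calls on the caller's role could mirror the list above with a simple membership check (a sketch; the function name and role strings as plain lowercase identifiers are assumptions):

```python
# Roles permitted to call akko_ai_transcribe, per the RBAC list above.
TRANSCRIBE_ROLES = {"admin", "engineer", "analyst"}

def can_transcribe(role: str) -> bool:
    """True if the given role may call akko_ai_transcribe."""
    return role in TRANSCRIBE_ROLES
```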

Troubleshooting

Whisper Pod CrashLoopBackOff (OOMKilled)

Symptoms: The Whisper pod enters CrashLoopBackOff status. kubectl describe pod shows OOMKilled as the last termination reason.

Cause: The selected Whisper model is too large for the configured memory limits.

Solution:

# Check current memory limits
kubectl get pod -n akko -l app.kubernetes.io/name=akko-whisper -o jsonpath='{.items[0].spec.containers[0].resources}'

# Use a smaller model or increase memory
helm upgrade akko helm/akko/ -n akko -f helm/examples/values-dev.yaml \
  --set akko-whisper.whisperModel=tiny \
  --set akko-whisper.resources.limits.memory=1Gi

Slow Transcription

Symptoms: Audio transcription takes several minutes for short files. CPU usage is at 100%.

Cause: Whisper uses CPU-based inference. Larger models and longer audio files require more processing time.

Solution:

# Check CPU allocation
kubectl top pod -n akko -l app.kubernetes.io/name=akko-whisper

# Use a smaller model for faster processing
helm upgrade akko helm/akko/ -n akko -f helm/examples/values-dev.yaml \
  --set akko-whisper.whisperModel=tiny

# Or increase CPU limits
helm upgrade akko helm/akko/ -n akko -f helm/examples/values-dev.yaml \
  --set akko-whisper.resources.limits.cpu=4

Empty Transcription Results

Symptoms: The /v1/transcribe endpoint returns {"status": "error", "error": "Could not transcribe audio"}.

Cause: The audio file may be corrupted, in an unsupported format, or contain only silence.

Solution:

# Check Whisper pod logs
kubectl logs -n akko -l app.kubernetes.io/name=akko-whisper --tail=50

# Verify the Whisper service is reachable from ai-service
kubectl exec -n akko deploy/akko-akko-ai-service -- \
  curl -s "http://akko-akko-whisper:8000/health"