Tempo — Distributed Tracing¶
Tempo is the distributed tracing backend for AKKO. It receives OpenTelemetry (OTLP) spans from ADEN and other instrumented services, stores them locally or on S3, and exposes a query API that Dashboards reads via the Tempo datasource. This enables end-to-end trace visibility across the AKKO pipeline.
Architecture¶
ADEN / AI Service / Airflow
| (OTLP gRPC :4317 or HTTP :4318)
|
+--v-----------+
| Tempo |
| (:3200 query|
| :4317 gRPC |
| :4318 HTTP)|
+--+-----------+
|
+--v-----------+
| Dashboards |
| (Tempo |
| datasource)|
+--------------+
- OTLP receivers — gRPC on port 4317 and HTTP on port 4318 for span ingestion
- Query API on port 3200 consumed by the Dashboards Tempo datasource
- Local filesystem storage for development; S3 backend for production
- Apache 2.0 licensed (R27 compliant)
URLs¶
Tempo is an internal-only service (no ingress). Traces are visualized through Dashboards.
| Endpoint | Port | Protocol |
|---|---|---|
| OTLP gRPC | 4317 | gRPC |
| OTLP HTTP | 4318 | HTTP |
| Query API | 3200 | HTTP |
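The query API can also be exercised directly from inside the cluster, e.g. for smoke tests. A minimal sketch using only the Python standard library; the `/api/search` and `/api/traces/{traceID}` routes are Tempo's standard HTTP query API, but verify the paths against your Tempo version:

```python
from urllib.parse import urlencode

# In-cluster base URL for Tempo's query API (port 3200, no ingress).
TEMPO_BASE = "http://akko-akko-tempo:3200"

def search_url(service_name: str, limit: int = 20) -> str:
    """Build a /api/search URL that filters traces by service.name."""
    query = urlencode({"tags": f"service.name={service_name}", "limit": limit})
    return f"{TEMPO_BASE}/api/search?{query}"

def trace_url(trace_id: str) -> str:
    """Build a /api/traces/<id> URL that fetches a single trace by ID."""
    return f"{TEMPO_BASE}/api/traces/{trace_id}"

# From a pod: urllib.request.urlopen(search_url("aden")) returns JSON
# with matching trace summaries (traceID, rootServiceName, durationMs).
```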
Configuration (Helm values)¶
akko-tempo:
enabled: true
image:
repository: grafana/tempo
tag: "2.6.1"
replicas: 1
service:
otlpGrpcPort: 4317
otlpHttpPort: 4318
queryPort: 3200
persistence:
enabled: true
size: 10Gi
retention:
blockRetention: 72h # 3 days for local dev, 14-30d for production S3
serviceMonitor:
enabled: true
interval: 30s
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 1
memory: 1Gi
Health Check¶
Tempo exposes metrics and health on the query port:
livenessProbe:
httpGet:
path: /ready
port: 3200
initialDelaySeconds: 15
periodSeconds: 30
readinessProbe:
httpGet:
path: /ready
port: 3200
initialDelaySeconds: 10
periodSeconds: 10
RBAC (who can access)¶
Tempo is an internal-only service with no ingress. Access is controlled by:
- NetworkPolicy — restricts which pods can send spans (OTLP) and query traces
- Dashboards — traces are visualized through Dashboards (Dashboards is authenticated via Keycloak)
Sending Traces to Tempo¶
Services send OTLP spans to Tempo using environment variables:
OTEL_EXPORTER_OTLP_ENDPOINT=http://akko-akko-tempo:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_SERVICE_NAME=<service-name>
For Python services (ADEN, AI Service):
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Export spans to Tempo's OTLP gRPC receiver. insecure=True is acceptable
# here because traffic stays on the cluster network (no TLS on the service).
provider = TracerProvider()
exporter = OTLPSpanExporter(endpoint="http://akko-akko-tempo:4317", insecure=True)
# BatchSpanProcessor buffers spans and exports them asynchronously.
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
Key Features¶
| Feature | Description |
|---|---|
| OTLP ingestion | Receives spans via gRPC (4317) and HTTP (4318) |
| Dashboards integration | Native Tempo datasource for trace visualization |
| S3 backend | Production deployments store traces on object storage/S3 |
| Configurable retention | 72h default (local), tunable for production |
| Prometheus metrics | ServiceMonitor scrapes /metrics on port 3200 |
| Trace-to-logs | Link traces to correlated logs in Dashboards |
Resource Requirements¶
| Component | Minimum RAM | Recommended |
|---|---|---|
| Tempo | 256 Mi | 1 Gi |
Troubleshooting¶
No Traces in Dashboards¶
Symptoms: Dashboards Tempo datasource returns no results, or shows "No traces found".
Cause: Services are not sending spans, Tempo is not receiving them, or the datasource is misconfigured.
Solution:
# Verify Tempo pod is running and ready
kubectl get pods -n akko -l app.kubernetes.io/name=akko-tempo
# Check Tempo readiness
kubectl exec -n akko deploy/akko-akko-tempo -- wget -qO- http://localhost:3200/ready
# Verify OTLP endpoint is reachable from a service pod
kubectl exec -n akko deploy/akko-akko-aden -- curl -s http://akko-akko-tempo:3200/ready
# Check Tempo logs for ingestion errors
kubectl logs -n akko deploy/akko-akko-tempo --tail=50 | grep -i "error\|warn"
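If the pipeline looks healthy but no traces appear, inject a known span by hand and search for it. A sketch that builds an OTLP/JSON payload for the HTTP receiver on 4318; the field names follow the OTLP JSON encoding, and the service name `test-client` is only a marker for finding the span afterwards:

```python
import random
import time

def make_otlp_payload(service_name: str, span_name: str) -> dict:
    """Build a minimal OTLP/JSON trace payload with one finished span."""
    now_ns = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [{
                "key": "service.name",
                "value": {"stringValue": service_name},
            }]},
            "scopeSpans": [{
                "scope": {"name": "manual-test"},
                "spans": [{
                    # OTLP/JSON encodes IDs as lowercase hex strings.
                    "traceId": f"{random.getrandbits(128):032x}",
                    "spanId": f"{random.getrandbits(64):016x}",
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # Timestamps are nanoseconds since epoch, as strings.
                    "startTimeUnixNano": str(now_ns),
                    "endTimeUnixNano": str(now_ns + 1_000_000),
                }],
            }],
        }],
    }

# From a pod, POST the JSON to http://akko-akko-tempo:4318/v1/traces
# with Content-Type: application/json, then search service "test-client".
```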
Tempo Disk Full¶
Symptoms: Tempo pod restarts or traces are silently dropped. Logs show disk full or WAL write failure.
Cause: The PVC is full due to high trace volume or insufficient retention cleanup.
Solution:
# Check PVC usage
kubectl exec -n akko deploy/akko-akko-tempo -- df -h /var/tempo
# Reduce retention
# In values: akko-tempo.retention.blockRetention: 24h
# Increase PVC size (if supported by StorageClass)
kubectl edit pvc -n akko akko-akko-tempo
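When tuning retention against PVC size, a back-of-envelope estimate helps. The spans/sec and bytes/span figures below are illustrative assumptions; measure your real ingest rate from Tempo's Prometheus metrics before sizing:

```python
def tempo_storage_bytes(spans_per_sec: float, bytes_per_span: float,
                        retention_hours: float) -> float:
    """Rough on-disk footprint: ingest rate x span size x retention window."""
    return spans_per_sec * bytes_per_span * retention_hours * 3600

# Example: 200 spans/s at ~500 compressed bytes each over the 72h default.
gib = tempo_storage_bytes(200, 500, 72) / 2**30
# That is roughly 24 GiB, well over the 10Gi default PVC, so either
# shorten blockRetention or grow the volume.
```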
OTLP Connection Refused¶
Symptoms: Services log OTLP exporter: connection refused or failed to export spans.
Cause: Tempo pod is not ready, or NetworkPolicy blocks OTLP traffic.
Solution:
# Verify the Tempo pod is running and ready
kubectl get pods -n akko -l app.kubernetes.io/name=akko-tempo
# Check that a NetworkPolicy allows OTLP traffic on ports 4317/4318
kubectl get networkpolicy -n akko
kubectl describe networkpolicy -n akko <policy-name>