Observability

Twinkle Server provides full observability through OpenTelemetry, covering traces, metrics, and logs.

Quick Start

1. Start the Observability Stack

The project includes a one-command Docker Compose setup based on the grafana/otel-lgtm image, which bundles the OTel Collector, Prometheus, Tempo, Loki, and Grafana:

cd cookbook/observability
docker compose up -d

Available services after startup:

Service     URL                     Purpose
Grafana     http://localhost:3000   Dashboards and data exploration
OTLP gRPC   localhost:4317          Point Twinkle’s otlp_endpoint here
OTLP HTTP   localhost:4318          Same, HTTP alternative

2. Configure the Server

Enable telemetry in server_config.yaml:

telemetry:
  enabled: true
  otlp_endpoint: http://localhost:4317

3. Install Dependencies

pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp

4. Launch the Server

twinkle-server launch -c server_config.yaml

5. Open Grafana

Navigate to http://localhost:3000. Default credentials: admin / admin.

telemetry Configuration Fields

Field                 Type   Default                 Description
enabled               bool   false                   Whether to enable the telemetry pipeline
service_name          str    twinkle-server          Reported service name
otlp_endpoint         str    http://localhost:4317   OTel Collector gRPC address
debug                 bool   false                   When true, dumps spans/metrics to console instead of OTLP
export_interval_ms    int    30000                   Metrics export interval (milliseconds)
resource_attributes   dict   {}                      Additional resource attributes attached to all telemetry
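
To make the defaults above concrete, here is a minimal Python sketch that mirrors the telemetry block as a dataclass. The names TelemetryConfig and load_telemetry are hypothetical and are not taken from Twinkle's code base; the validation shown is an illustrative assumption, not the server's actual behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TelemetryConfig:
    """Hypothetical mirror of the `telemetry` block; not Twinkle's actual class."""

    enabled: bool = False
    service_name: str = "twinkle-server"
    otlp_endpoint: str = "http://localhost:4317"
    debug: bool = False
    export_interval_ms: int = 30000
    resource_attributes: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Illustrative sanity check only; the server's real validation may differ.
        if self.export_interval_ms <= 0:
            raise ValueError("export_interval_ms must be positive")


def load_telemetry(raw: Dict[str, Any]) -> TelemetryConfig:
    """Build a config from a parsed `telemetry` mapping, rejecting unknown keys."""
    known = set(TelemetryConfig.__dataclass_fields__)
    unknown = set(raw) - known
    if unknown:
        raise KeyError(f"unknown telemetry fields: {sorted(unknown)}")
    return TelemetryConfig(**raw)


# Fields left out of the mapping fall back to the documented defaults.
cfg = load_telemetry({"enabled": True, "otlp_endpoint": "http://localhost:4317"})
print(cfg.service_name, cfg.export_interval_ms)
```

Rejecting unknown keys is a design choice in this sketch: it surfaces typos such as otlp_endpont early instead of silently falling back to a default.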

Built-in Grafana Dashboard

The provisioned Twinkle Server Overview dashboard includes:

  • HTTP request rate and P95 latency per deployment (Gateway / Model / Sampler / Processor)
  • Active resource counts (sessions, models, sampling sessions, futures)
  • Task queue depth, execution P95, wait-time P95
  • Rate-limit rejections and task completions by status

Metric Naming Reference

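Twinkle names its metrics in OpenTelemetry dot notation. Prometheus OTLP ingestion converts dots to underscores and appends _total to monotonic counters whose names do not already end in it. Histograms are exposed as _bucket, _sum, and _count series; the table lists the _bucket form used in quantile queries: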

OpenTelemetry Name                      Prometheus Name
twinkle.http.requests.total             twinkle_http_requests_total
twinkle.http.request.duration_seconds   twinkle_http_request_duration_seconds_bucket
twinkle.queue.depth                     twinkle_queue_depth
twinkle.task.execution_seconds          twinkle_task_execution_seconds_bucket
twinkle.task.wait_seconds               twinkle_task_wait_seconds_bucket
twinkle.rate_limit.rejections.total     twinkle_rate_limit_rejections_total
twinkle.tasks.total                     twinkle_tasks_total
twinkle.sessions.active                 twinkle_sessions_active
twinkle.models.active                   twinkle_models_active
twinkle.sampling_sessions.active        twinkle_sampling_sessions_active
twinkle.futures.active                  twinkle_futures_active

The *.active resource gauges report absolute values. Do NOT wrap them with rate() or increase().
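
The mapping in the table can be approximated with a few lines of Python. This is a simplified sketch, not the translation code Prometheus itself runs: the function name to_prometheus_name and its kind argument are hypothetical, and the real translation also handles units and characters other than dots.

```python
def to_prometheus_name(otel_name: str, kind: str) -> str:
    """Approximate the Prometheus series name for a Twinkle metric.

    kind is one of "counter", "histogram", or "gauge" (an assumption of this
    sketch; the caller has to know the instrument type).
    """
    name = otel_name.replace(".", "_")
    if kind == "counter" and not name.endswith("_total"):
        # Monotonic counters get a _total suffix unless they already have one.
        name += "_total"
    elif kind == "histogram":
        # Histograms also expose _sum and _count; _bucket is the one the table lists.
        name += "_bucket"
    return name


print(to_prometheus_name("twinkle.http.requests.total", "counter"))
print(to_prometheus_name("twinkle.task.wait_seconds", "histogram"))
print(to_prometheus_name("twinkle.sessions.active", "gauge"))
```

Gauges pass through with only the dot replacement, which matches the note above: their values are absolute readings, so no counter-style suffix applies.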

Tracing

Twinkle spans are namespaced under twinkle.server.<component> (Gateway / Model / Sampler / Processor). Each request carries twinkle.session_id and trace_id correlation keys, supporting end-to-end cross-deployment tracing.

In Grafana, switch the datasource to Tempo to search traces by service name or span name.

Production Deployment

The LGTM all-in-one image in cookbook/observability is for local development and demos only. For production:

  • Deploy Mimir / Tempo / Loki / Grafana separately with persistent storage and replicas
  • Place an independent OTel Collector tier in front for sampling and routing
  • The telemetry block in server_config.yaml and the metric names carry over unchanged

Troubleshooting

Grafana shows “No data”

  • Confirm telemetry.enabled: true in your config
  • Confirm worker logs show Worker telemetry initialized
  • Set debug: true to verify spans appear in the console, then switch back to debug: false

Twinkle can’t reach the Collector

  • otlp_endpoint must be reachable from the Twinkle process. If Twinkle runs in a separate container, use the Docker network address, for example http://twinkle-lgtm:4317
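
One quick way to test reachability is a plain TCP connection attempt, run from the same host or container as the Twinkle process. The sketch below uses only the Python standard library; split_endpoint and can_connect are hypothetical helper names. A successful connection only shows that the port is open, not that the Collector is accepting OTLP data.

```python
import socket
from urllib.parse import urlparse


def split_endpoint(otlp_endpoint: str, default_port: int = 4317):
    """Return (host, port) from an endpoint such as http://localhost:4317."""
    # urlparse only recognizes the host when a scheme or leading // is present.
    target = otlp_endpoint if "//" in otlp_endpoint else f"//{otlp_endpoint}"
    parsed = urlparse(target)
    if parsed.hostname is None:
        raise ValueError(f"cannot parse endpoint: {otlp_endpoint!r}")
    return parsed.hostname, parsed.port or default_port


def can_connect(otlp_endpoint: str, timeout: float = 2.0) -> bool:
    """True if a TCP connection to the endpoint succeeds within the timeout."""
    host, port = split_endpoint(otlp_endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, can_connect("http://twinkle-lgtm:4317") returning False from inside the Twinkle container suggests a Docker networking problem rather than a telemetry configuration problem.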

Resource gauges stuck at 0

  • Only the cleanup-leader worker pushes resource counts. If gauges remain at 0 for longer than export_interval_ms × 2 after startup, check logs for “became cleanup leader” messages
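
The waiting period in the rule above works out to 60 seconds with the default export_interval_ms of 30000. A small sketch of that arithmetic follows; the function names are hypothetical. Note that a gauge can legitimately read 0 when no sessions, models, or futures exist, so a positive result is a prompt to check the logs, not proof of a fault.

```python
def gauge_grace_period_s(export_interval_ms: int = 30000) -> float:
    """Seconds to wait after startup before a zero gauge is worth investigating."""
    return export_interval_ms * 2 / 1000


def gauges_look_stuck(seconds_since_startup: float, gauge_value: float,
                      export_interval_ms: int = 30000) -> bool:
    """True if the gauge is still 0 after two export intervals have passed."""
    grace = gauge_grace_period_s(export_interval_ms)
    return gauge_value == 0 and seconds_since_startup > grace


print(gauge_grace_period_s())        # grace period for the default interval
print(gauges_look_stuck(90, 0))      # zero gauge, 90 s after startup
print(gauges_look_stuck(30, 0))      # zero gauge, still inside the grace period
```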

Tear Down

cd cookbook/observability
docker compose down -v   # -v removes the data volume as well