# Agent Observability Dashboard

Unified observability for OpenClaw agents: metrics, traces, and performance insights.
## What It Does
OpenClaw agents need production-grade visibility. Several observability platforms exist (Langfuse, LangSmith, AgentOps), but none of them provides a unified view.
Agent Observability Dashboard provides:
- Metrics tracking – Latency, success rate, token usage, error counts
- Trace visualization – Tool chains, decision flows, session timelines
- Cross-agent aggregation – Compare performance across multiple agents/sessions
- Exportable reports – JSON, CSV, markdown for human review
- Alert thresholds – Notify when metrics exceed limits
## Problem It Solves
- No centralized view of OpenClaw agent performance
- Hard to debug across multiple tool calls
- No way to compare agents or track regressions
- Enterprise software gets production-grade monitoring; agents need the same
## Usage
```bash
# Start dashboard server
python3 scripts/observability.py --dashboard

# Record metrics from a session
python3 scripts/observability.py --record --session agent:main --latency 1.5 --success true

# View session trace
python3 scripts/observability.py --trace --session agent:main:12345

# Get performance report
python3 scripts/observability.py --report --period 24h

# Export to CSV
python3 scripts/observability.py --export metrics.csv

# Set alert thresholds
python3 scripts/observability.py --alert --metric latency --threshold 5.0
```
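As a minimal sketch of what the alert check might do under the hood, the snippet below flags a metric whose average exceeds a configured threshold. The `check_alerts` helper and the record shape are illustrative assumptions, not the tool's actual API.

```python
# Illustrative only: the record shape and check_alerts() are assumptions,
# not observability.py's real internals.
from statistics import mean

def check_alerts(records, metric="latency_ms", threshold=5000.0):
    """Return an alert message if the mean of `metric` exceeds `threshold`."""
    values = [r[metric] for r in records if metric in r]
    if not values:
        return None
    avg = mean(values)
    if avg > threshold:
        return f"ALERT: mean {metric} {avg:.1f} exceeds threshold {threshold:.1f}"
    return None

records = [
    {"tool": "web_search", "latency_ms": 6200, "success": True},
    {"tool": "memory_write", "latency_ms": 45, "success": True},
]
print(check_alerts(records, threshold=3000.0))  # mean 3122.5 ms -> alert
```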
## Metrics Tracked

| Category | Metric | Description |
|---|---|---|
| Performance | Latency | Tool call latency (ms) |
| Performance | Throughput | Calls per second |
| Success | Success Rate | % of successful tool calls |
| Success | Error Count | Failed operations |
| Cost | Token Usage | Input + output tokens |
| Cost | API Cost | Estimated cost in USD |
| Quality | Hallucinations | Detected false outputs |
| Quality | Corrections Needed | User corrections |
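For a rough sense of how the analytics layer could derive several of these metrics from raw trace records, here is a pandas sketch. The column names mirror the trace format below; the per-token price is a made-up placeholder, not a real rate.

```python
# Aggregation sketch: column names follow the trace format,
# the per-token price is a placeholder.
import pandas as pd

trace = pd.DataFrame([
    {"tool": "web_search",   "latency_ms": 1234, "success": True,  "tokens_used": 150},
    {"tool": "memory_write", "latency_ms": 45,   "success": True,  "tokens_used": 0},
    {"tool": "web_search",   "latency_ms": 2100, "success": False, "tokens_used": 90},
])

summary = {
    "calls": len(trace),
    "success_rate_pct": 100 * trace["success"].mean(),
    "p95_latency_ms": trace["latency_ms"].quantile(0.95),
    "total_tokens": int(trace["tokens_used"].sum()),
    "est_cost_usd": trace["tokens_used"].sum() * 0.000002,  # placeholder rate
}
print(summary)
```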
## Trace Format
Each tool call is logged with:
- Timestamp
- Agent session ID
- Tool name + parameters
- Latency
- Success/failure
- Token usage
- Error details (if failed)
Example trace:
```json
{
  "session_id": "agent:main:12345",
  "trace": [
    {
      "timestamp": "2026-01-31T14:00:00Z",
      "tool": "web_search",
      "params": {"query": "agent observability"},
      "latency_ms": 1234,
      "success": true,
      "tokens_used": 150
    },
    {
      "timestamp": "2026-01-31T14:00:02Z",
      "tool": "memory_write",
      "params": {"content": "..."},
      "latency_ms": 45,
      "success": true,
      "tokens_used": 0
    }
  ]
}
```
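For illustration, a trace entry like the one above could be modeled and appended to a local log roughly as follows. The `TraceEntry` dataclass and the `traces.jsonl` file are assumptions for the sketch, not the project's actual storage format.

```python
# Sketch only: TraceEntry and the JSONL layout are illustrative assumptions.
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class TraceEntry:
    session_id: str
    tool: str
    params: dict
    latency_ms: int
    success: bool
    tokens_used: int = 0
    error: Optional[str] = None  # populated only when success is False
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def append_trace(entry: TraceEntry, path: str = "traces.jsonl") -> None:
    """Append one tool-call record as a JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry)) + "\n")

append_trace(TraceEntry(
    session_id="agent:main:12345",
    tool="web_search",
    params={"query": "agent observability"},
    latency_ms=1234,
    success=True,
    tokens_used=150,
))
```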
## Architecture

```
┌──────────────────┐
│ Instrumentation  │  ← Auto-capture from OpenClaw logs
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Metrics Store   │  ← SQLite/InfluxDB for time-series
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│    Analytics     │  ← Aggregations, trends, anomalies
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│   Dashboard UI   │  ← Web interface (Flask/FastAPI)
└──────────────────┘
```
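To make the Instrumentation → Metrics Store hop concrete, here is a minimal sketch that writes one tool-call record into the local SQLite default. The `tool_calls` table name and schema are assumptions, not the project's actual schema.

```python
# Sketch of persisting a tool-call metric into a local SQLite store.
# The table name and columns are illustrative assumptions.
import sqlite3

def init_store(path="metrics.db"):
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS tool_calls (
            ts          TEXT,
            session_id  TEXT,
            tool        TEXT,
            latency_ms  REAL,
            success     INTEGER,
            tokens_used INTEGER
        )
    """)
    return conn

def record_call(conn, ts, session_id, tool, latency_ms, success, tokens_used=0):
    conn.execute(
        "INSERT INTO tool_calls VALUES (?, ?, ?, ?, ?, ?)",
        (ts, session_id, tool, latency_ms, int(success), tokens_used),
    )
    conn.commit()

conn = init_store()
record_call(conn, "2026-01-31T14:00:00Z", "agent:main:12345", "web_search", 1234, True, 150)
```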
## Requirements
- Python 3.9+
- flask (for dashboard web UI)
- pandas (for analytics)
- influxdb-client (optional, for production storage)
## Installation
```bash
# Clone repo
git clone https://github.com/orosha-ai/agent-observability-dashboard

# Install dependencies
pip install flask pandas influxdb-client

# Run dashboard
python3 scripts/observability.py --dashboard

# Open http://localhost:5000
```
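For a sense of the Dashboard UI layer, the sketch below serves recent records from the SQLite store in the architecture sketch above through a small Flask endpoint. The `/api/metrics` route and query are illustrative, not the shipped UI.

```python
# Illustrative Flask sketch; the /api/metrics route and query are assumptions.
import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)

@app.get("/api/metrics")
def recent_metrics():
    conn = sqlite3.connect("metrics.db")
    rows = conn.execute(
        "SELECT ts, session_id, tool, latency_ms, success, tokens_used "
        "FROM tool_calls ORDER BY ts DESC LIMIT 100"
    ).fetchall()
    conn.close()
    keys = ("ts", "session_id", "tool", "latency_ms", "success", "tokens_used")
    return jsonify([dict(zip(keys, r)) for r in rows])

if __name__ == "__main__":
    app.run(port=5000)
```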
## Inspiration

- Dynatrace AI Observability App – Enterprise-grade unified observability
- Langfuse vs AgentOps benchmarks – Comparison of platforms
- Microsoft .NET tracing guide – Practical implementation patterns
- OpenLLMetry – OpenTelemetry integration for LLMs
## Local-Only Promise
- Metrics stored locally (SQLite/InfluxDB)
- Dashboard runs locally
- No data sent to external services
## Version History

- v0.1 – MVP: Metrics tracking, trace visualization, dashboard UI
- Roadmap: InfluxDB integration, anomaly detection, multi-agent comparison