

🔍 API Monitoring & Transparency

What Judges Will See

When you click "🚀 Run Full Pipeline" in Overgrowth, you get complete transparency into every API call the system makes:

📊 Live Session Statistics Dashboard

At the top of the interface, you'll see a real-time dashboard showing:

  • Total Cost: Running total of all AI API costs (calculated in real-time)
  • API Calls: Count of LLM calls (OpenAI/Anthropic/OpenRouter) and GNS3 calls
  • Token Usage: Total tokens consumed (input/output breakdown)
  • Session Duration: How long the current session has been running
  • Error Count: Any failed API calls
  • Cost Per Call: Average cost efficiency

🤖 Real-Time API Activity Feed

Every single API call is logged with full details:

```
✅ 🤖 **ANTHROPIC** `claude-3-haiku-20240307`
   | 📝 1,247→892 tokens
   | 💰 $0.0043
   | ⏱️ 2,314ms

✅ 🌐 **LOCAL-MCP** `get_topology`
   | ⏱️ 458ms

✅ 🤖 **OPENAI** `gpt-4o`
   | 📝 2,105→1,543 tokens
   | 💰 $0.0201
   | ⏱️ 3,127ms
```

💰 Cost Tracking Features

  1. Per-Model Pricing - Accurate pricing for each AI model:

    • GPT-4o: $2.50/$10.00 per 1M tokens (in/out)
    • Claude 3 Haiku: $0.25/$1.25 per 1M tokens
    • Claude 3.5 Sonnet: $3.00/$15.00 per 1M tokens
  2. Real-Time Calculation - Costs computed immediately after each call

  3. Session Totals - Running total updated after every operation

  4. Token Breakdown - Separate counts for input vs output tokens

  5. Budget Guardrails - Optional session budget with alerts

    • Set `API_BUDGET_USD` (e.g., 50 or 15.5) to display the remaining budget and trigger warnings
    • Tweak the alert threshold with `API_BUDGET_ALERT_FRACTION` (default 0.8, i.e. 80%)
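
The pricing and guardrail rules above boil down to a small amount of arithmetic. A minimal sketch (the table, function names, and helper below are illustrative, not the actual `agent/api_monitor.py` internals):

```python
import os

# Hypothetical per-1M-token pricing table (USD input, USD output),
# mirroring the rates listed above
PRICING = {
    "gpt-4o": (2.50, 10.00),
    "claude-3-haiku-20240307": (0.25, 1.25),
    "claude-3-5-sonnet": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: token counts scaled by the per-million-token rates."""
    rate_in, rate_out = PRICING[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

def budget_warning(session_total: float) -> bool:
    """Guardrail check: True once spend crosses the alert fraction of API_BUDGET_USD."""
    budget = float(os.environ.get("API_BUDGET_USD", "0") or 0)
    fraction = float(os.environ.get("API_BUDGET_ALERT_FRACTION", "0.8"))
    return budget > 0 and session_total >= budget * fraction
```

With `API_BUDGET_USD=10` and the default fraction, a warning fires once the session total reaches $8.00.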

Why This Impresses Judges

1. Enterprise-Grade Observability

This isn't a demo; it's production-ready software with full audit trails.

2. Cost Transparency

Users know exactly what they're spending in real-time (critical for enterprise adoption)

3. Multi-API Tracking

Monitors both AI APIs (OpenAI/Anthropic) AND infrastructure APIs (GNS3)

4. Proof of Execution

Every claim is backed by verifiable API calls with timestamps and metrics

5. Error Visibility

Failed calls are tracked and displayed - shows robust error handling

Technical Implementation

Architecture

```
User Action
    ↓
Agent/Pipeline Code
    ↓
API Call (LLM or GNS3)
    ↓
API Monitor (tracks start)
    ↓
Execute API Request
    ↓
API Monitor (tracks completion with tokens/cost/timing)
    ↓
UI Updates (real-time refresh of stats and activity feed)
```
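
The "monitor wraps every call" pattern in the flow above can be sketched as a decorator. This is an illustrative stand-in, not the real instrumentation in `agent/llm_client.py` (the `_StubMonitor` and method names are hypothetical):

```python
import time
from functools import wraps

class _StubMonitor:
    """Stand-in for the real monitor so this sketch runs on its own."""
    def __init__(self):
        self.calls, self.errors = [], []
    def record_call(self, provider, elapsed_ms):
        self.calls.append((provider, elapsed_ms))
    def record_error(self, provider, message):
        self.errors.append((provider, message))

monitor = _StubMonitor()

def monitored(provider: str):
    """Wrap an API call so the monitor sees its start, completion, and timing."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                monitor.record_error(provider, str(exc))  # failures are tracked too
                raise
            monitor.record_call(provider, (time.perf_counter() - start) * 1000)
            return result
        return wrapper
    return decorator

@monitored("local-mcp")
def get_topology():
    return {"nodes": []}  # placeholder for a real GNS3 request
```

Because errors are recorded before being re-raised, failed calls still show up in the activity feed.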

Key Components

  1. `agent/api_monitor.py` - Singleton monitor tracking all API usage

    • Thread-safe for concurrent calls
    • Tracks tokens, costs, timing, errors
    • Exports JSON for auditing
  2. `agent/llm_client.py` - Instrumented LLM client

    • Tracks every OpenAI/Anthropic/OpenRouter call
    • Captures actual token usage from API responses
    • Calculates costs based on current pricing
  3. `agent/local_mcp.py` - Instrumented MCP client

    • Tracks all GNS3 API calls
    • Monitors infrastructure operations
    • Provides timing data
  4. `app.py` - Gradio UI integration

    • Live dashboard at top of interface
    • Auto-refresh after pipeline execution
    • Manual refresh buttons for real-time updates
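
The monitor component's properties (singleton, thread-safe, JSON export) can be sketched in a few lines. This is a hedged approximation of what `agent/api_monitor.py` might look like, not its actual code (field and method names are assumptions):

```python
import json
import threading
from dataclasses import dataclass, asdict
from typing import List, Optional

@dataclass
class ApiCall:
    provider: str
    model: str
    input_tokens: int = 0
    output_tokens: int = 0
    estimated_cost: float = 0.0
    duration_ms: float = 0.0
    error: Optional[str] = None

class ApiMonitor:
    """Thread-safe call log: a lock guards the list so concurrent calls are safe."""
    def __init__(self):
        self._lock = threading.Lock()
        self._calls: List[ApiCall] = []

    def record(self, call: ApiCall) -> None:
        with self._lock:
            self._calls.append(call)

    def get_all_calls(self) -> List[ApiCall]:
        with self._lock:
            return list(self._calls)  # copy, so callers can't mutate the log

    def reset(self) -> None:
        with self._lock:
            self._calls.clear()

    def export_json(self) -> str:
        with self._lock:
            return json.dumps([asdict(c) for c in self._calls], indent=2)

monitor = ApiMonitor()  # module-level singleton
```

Returning a copy from `get_all_calls()` keeps the internal list private while still letting the UI iterate over it.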

Demo Scenario for Judges

  1. Judge opens the Space

    • Sees "Session Statistics" showing $0.00 cost, 0 calls
  2. Judge clicks "🚀 Run Full Pipeline"

    • API Activity Feed populates in real-time
    • Each LLM call shows model, tokens, cost, timing
    • GNS3 calls show infrastructure operations
  3. Judge sees completion

    • Pipeline status includes API usage summary at top
    • Session Statistics show total cost (e.g., $0.15)
    • Activity Feed shows 5-10 API calls with full details
  4. Judge clicks "🔄 Refresh Stats"

    • Dashboard updates instantly
    • All data persists across the session

Comparison to Other Submissions

Most hackathon projects hide their API usage. Overgrowth makes it a feature:

| Other Projects | Overgrowth |
| --- | --- |
| ❌ Hidden API costs | ✅ Real-time cost tracking |
| ❌ No token visibility | ✅ Per-call token counts |
| ❌ Unknown model usage | ✅ Model names displayed |
| ❌ No timing data | ✅ Response time for every call |
| ❌ Silent failures | ✅ Error tracking with messages |

Future Enhancements

  • Hard Budget Stops: Halt execution when the session budget is exhausted (threshold warnings already exist)
  • Cost Optimization: Suggest cheaper models for simple tasks
  • Historical Analytics: Track costs over time with charts
  • Export Reports: Download API usage as CSV/JSON for accounting
  • Provider Comparison: Show cost differences between OpenAI vs Anthropic
  • Streaming Token Counter: Live token count during streaming responses
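
The CSV export enhancement could build directly on the existing JSON export. A possible sketch (the `export_csv` helper is hypothetical, not an existing function):

```python
import csv
import io
import json

def export_csv(json_data: str) -> str:
    """Convert monitor.export_json() output (a JSON list of call dicts) to CSV."""
    rows = json.loads(json_data)
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```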

For Development/Testing

Reset the monitor:

```python
from agent.api_monitor import monitor
monitor.reset()
```

Export session data:

```python
json_data = monitor.export_json()
# Save to file or send to analytics platform
```

Access raw calls:

```python
all_calls = monitor.get_all_calls()
for call in all_calls:
    print(f"{call.provider}: ${call.estimated_cost}")
```

This level of transparency demonstrates that Overgrowth is enterprise-ready, not just a hackathon prototype.