The Five Monitoring Dimensions for AI Systems
1. Infrastructure health (standard)
- API error rate by model and endpoint
- Latency percentiles (p50, p95, p99)
- Queue depth and consumer lag (for async systems)
- Rate limit hit rate
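The latency percentiles above can be computed from a window of request timings with the standard library alone; a minimal sketch, assuming latencies are collected per (model, endpoint) pair upstream (the function name and units are illustrative):

```python
from statistics import quantiles

def latency_percentiles(latencies_ms):
    """Return p50/p95/p99 for a window of request latencies (ms)."""
    if not latencies_ms:
        return {}
    # quantiles(n=100) yields 99 cut points: index 49 -> p50,
    # index 94 -> p95, index 98 -> p99.
    q = quantiles(sorted(latencies_ms), n=100)
    return {"p50": q[49], "p95": q[94], "p99": q[98]}
```

In production you would feed this from a rolling window or a histogram sketch rather than raw samples, but the percentile indices are the same.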
2. Cost anomalies
- Cost per request vs historical baseline
- Token count per request (input + output) vs baseline
- Total daily cost vs budget
- Sudden cost spikes (often indicate runaway retries or malformed inputs)
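One simple way to operationalize "cost per request vs historical baseline" is a z-score check against a rolling window. A sketch, assuming per-request costs are already aggregated; the 3-sigma threshold and window handling are illustrative defaults, not recommendations from any specific tool:

```python
from statistics import mean, stdev

def is_cost_spike(history, current, z_threshold=3.0):
    """Flag the current per-request cost if it sits more than
    z_threshold standard deviations above the historical mean."""
    if len(history) < 2:
        return False  # not enough baseline to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current > mu  # flat baseline: any increase is anomalous
    return (current - mu) / sigma > z_threshold
```

The same check applies to token counts per request; a spike in both usually points at runaway retries or malformed inputs rather than organic growth.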
3. Output quality metrics
- Quality score distribution (% of outputs above/below quality threshold)
- Low-confidence output rate (what % of outputs have confidence below the review threshold)
- Human override rate (what % of auto-approved outputs are later corrected)
- Evaluation set score (weekly automated evaluation against ground truth)
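The low-confidence and human-override rates fall out of a single pass over the output log. A sketch, where the field names (`confidence`, `auto_approved`, `overridden`) and the 0.7 review threshold are assumptions about what your log records contain:

```python
def quality_metrics(records, review_threshold=0.7):
    """Compute low-confidence rate over all outputs and override
    rate over the auto-approved subset."""
    total = len(records)
    if total == 0:
        return {"low_confidence_rate": 0.0, "override_rate": 0.0}
    low_conf = sum(1 for r in records if r["confidence"] < review_threshold)
    auto = [r for r in records if r["auto_approved"]]
    overridden = sum(1 for r in auto if r["overridden"])
    return {
        "low_confidence_rate": low_conf / total,
        "override_rate": overridden / len(auto) if auto else 0.0,
    }
```

Note the override rate is deliberately computed over auto-approved outputs only: it measures how often the system's own approval decision was wrong, not how often humans touch outputs in general.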
4. Data quality signals
- Input data quality metrics (completeness, format compliance)
- Distribution shift indicators (statistical distance of current inputs from baseline)
- Missing required fields rate
- Upstream data freshness (when did the source data last update?)
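For the distribution-shift indicator, one concrete choice of statistical distance is the Population Stability Index (PSI). A sketch, where the bin edges come from the baseline sample and the commonly cited 0.2 alert threshold is a rule of thumb, not a universal constant:

```python
import math

def psi(baseline, current, bins=10):
    """Population Stability Index between a baseline sample and a
    current sample of a numeric input feature."""
    lo, hi = min(baseline), max(baseline)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def bucket_fractions(values):
        counts = [0] * bins
        for v in values:
            i = sum(v > e for e in edges)  # index of the bin holding v
            counts[min(i, bins - 1)] += 1
        # small epsilon avoids log(0) for empty buckets
        return [(c + 1e-6) / (len(values) + bins * 1e-6) for c in counts]

    b, c = bucket_fractions(baseline), bucket_fractions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))
```

Run this per input feature against a frozen baseline window; identical distributions score near zero, and values well above 0.2 typically warrant investigating the upstream data source.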
5. Business outcome metrics
- The business metric the AI system is designed to move — measured continuously
- Downstream process health (if AI outputs feed another process, track that process's health)
- Customer-visible error rate or satisfaction score
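Measuring the business metric continuously usually means joining AI output IDs to downstream outcome events. A minimal sketch of that join for a conversion-style metric; the ID-based linkage and field names are illustrative assumptions about your event pipeline:

```python
def outcome_rate(output_ids, successful_ids):
    """Fraction of AI outputs whose ID later appears in the set of
    downstream successes (e.g., conversions, resolved tickets)."""
    ids = list(output_ids)
    if not ids:
        return 0.0
    return sum(1 for i in ids if i in successful_ids) / len(ids)
```

Because outcomes lag outputs, this metric should be computed over a window old enough for outcomes to have arrived, or it will understate performance.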