System Metrics and Alarms
In managed environments (cloud, single-tenant, and managed on-premise), Grafana, Prometheus, and several exporters are deployed to aggregate system metrics, render them in custom dashboards, and send alerts when a metric deviates from its expected range or reaches a dangerous value.
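As a rough illustration of the two alert conditions mentioned above (a dangerous value and a deviation), the following Python sketch checks a single reading against a fixed limit and against its recent baseline. It is not the actual alert configuration used in managed environments; the function name `should_alert`, the parameters, and the three-sigma default are all hypothetical.

```python
from statistics import mean, stdev


def should_alert(history, current, hard_limit, max_sigma=3.0):
    """Return a reason string if `current` warrants an alert, else None.

    All names and defaults here are illustrative assumptions.
    """
    # Dangerous value: the reading reaches or crosses a fixed limit.
    if current >= hard_limit:
        return "dangerous value"
    # Deviation: the reading is far from the recent baseline.
    if len(history) >= 2:
        baseline = mean(history)
        spread = stdev(history)
        if spread > 0 and abs(current - baseline) > max_sigma * spread:
            return "deviation"
    return None
```

For example, with a recent history of `[50, 52, 48, 51]` and a limit of `90`, a reading of `95` is flagged as a dangerous value, while a reading of `70` is flagged as a deviation.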
The monitored systems include:
- Load balancer
- Redis
- Postgres
- Postgres worker queue
- Redis worker queue
- Workflow executions
- Connect Proxy Requests
- Microservices

Load balancer metrics and alarms include:
- Request count / response time
- HTTP response codes
- Connection count
- Bytes processed
- TLS errors
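A metric such as HTTP response codes is typically turned into an alarm by computing the share of server errors. The sketch below assumes response counts are already grouped by status code; the function name `error_ratio` and the input shape are hypothetical and do not reflect a specific exporter.

```python
def error_ratio(counts_by_status):
    """Fraction of responses whose status code is 5xx.

    `counts_by_status` maps an integer status code to a response count
    (an assumed input shape for illustration).
    """
    total = sum(counts_by_status.values())
    if total == 0:
        return 0.0
    errors = sum(n for code, n in counts_by_status.items() if 500 <= code <= 599)
    return errors / total
```

With 90 responses of status 200, 5 of status 404, and 5 of status 502, the ratio is 5 out of 100.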

Redis metrics and alarms include:
- Uptime
- Clients
- Memory usage
- Commands executed/second
- Hits/misses per second
- Total items per database
- Network I/O
- Expiring vs non-expiring keys
- Expired/evicted
- Command calls/second
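Per-second metrics like the ones above are usually derived from monotonically increasing counters (for Redis, for example, the `total_commands_processed` field of `INFO`). The following sketch shows one way to turn two counter samples into a rate, treating a drop in the counter as a restart. The function name and its handling of resets are assumptions for illustration.

```python
def per_second_rate(prev_value, prev_time, curr_value, curr_time):
    """Average per-second increase of a counter between two samples."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        # The counter went backwards, so assume the server restarted and
        # count only what has accumulated since the reset.
        delta = curr_value
    return delta / elapsed
```

Two samples of 1000 and 1600 commands taken 60 seconds apart give an average of 10 commands per second.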

Postgres metrics and alarms include:
- CPU usage
- Memory usage
- Transactions
- Locks
- Conflicts/deadlocks
- Cache hit rate
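Cache hit rate for Postgres is commonly computed from the `blks_hit` and `blks_read` counters in the `pg_stat_database` view. The sketch below shows that ratio; whether the dashboards use exactly this formula is an assumption, and the function name is hypothetical.

```python
def cache_hit_rate(blks_hit, blks_read):
    """Share of block requests served from the buffer cache.

    Returns None when no blocks have been requested yet.
    """
    total = blks_hit + blks_read
    if total == 0:
        return None
    return blks_hit / total
```

For instance, 990 buffer hits and 10 disk reads correspond to a 99% hit rate.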

Postgres worker queue metrics and alarms include:
- Workers
- Throughput
- Average wait
- Job statuses
- Job duration
- Error rate
- Average wait per queue
- Workers per queue

Redis worker queue metrics and alarms include:
- Queue length
- Queue states
- Failures by queue
- Job duration

Workflow execution metrics and alarms include:
- Workflow executions
- Step executions
- Workflow completion rate

Connect Proxy Requests metrics and alarms include:
- Total request count
- Latency
- Open requests
- Status code
- Status code by integration
- Status code by credential

Microservices metrics and alarms include:
- Requests
- Apdex score
- Error rate
- Event loop lag
- CPU
- Heap
- Request duration
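The Apdex score listed above summarizes request durations against a target threshold T: requests at or under T count as satisfied, requests up to 4T count as tolerating (half weight), and slower requests count as frustrated. The sketch below implements that standard formula; the function name and the choice of threshold are assumptions, since the threshold used in the managed dashboards is not stated here.

```python
def apdex(durations, threshold):
    """Apdex score in [0, 1] for a list of request durations.

    `threshold` is the target duration T, in the same unit as `durations`.
    Returns None when there are no samples.
    """
    if not durations:
        return None
    satisfied = sum(1 for d in durations if d <= threshold)
    tolerating = sum(1 for d in durations if threshold < d <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(durations)
```

With durations of 0.1, 0.2, 0.6, 1.0, and 3.0 seconds and T = 0.5 seconds, two requests are satisfied, two are tolerating, and one is frustrated, giving a score of 0.6.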