Monitoring
Developers and administrators can monitor application, system, and infrastructure metrics for Grainite. Grainite exposes a Prometheus endpoint on port 5064 for gathering and processing monitoring data. Metrics created with the counters and gauges APIs (described in the API section of the documentation) are exposed to Prometheus by Grainite and can be visualized in a tool such as Grafana.
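To check what the endpoint returns before wiring up Prometheus, the metrics can be read directly. The following is a minimal sketch, assuming the endpoint serves the standard Prometheus text exposition format. The host name, the `/metrics` path, and the metric names in the usage note are assumptions for illustration and are not taken from the Grainite documentation.

```python
import urllib.request


def parse_prometheus_text(text):
    """Parse Prometheus text exposition format into {series: value}.

    The key is the metric name including its label set, exactly as it
    appears on the sample line.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labelled sample: name{labels} value [timestamp]
            end = line.rfind("}")
            series = line[: end + 1]
            rest = line[end + 1:].split()
        else:
            # Unlabelled sample: name value [timestamp]
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if rest:
            samples[series] = float(rest[0])
    return samples


def fetch_metrics(url, timeout=5.0):
    """Fetch a metrics page over HTTP and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

With a reachable cluster, a call such as `fetch_metrics("http://grainite.example.internal:5064/metrics")` would return a dictionary of series names to values. The URL here is hypothetical; substitute the address and path used by your deployment.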
Developer-defined application metrics
Developers can define application metrics using the counters and gauges APIs. The Userflows example included in our Samples demonstrates the following developer-defined application metrics.
Description | Metric Used | Metric Type |
---|---|---|
Completed flows | | Counter |
Abandoned flows | | Counter |
Current flows | | Gauge |
Current flows by type | | Gauge |
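The two metric types are read differently. A gauge such as current flows is meaningful as a point-in-time value, while a counter such as completed flows is cumulative and is usually more useful as a rate over time. The sketch below shows one way to derive a per-second rate from two counter samples, treating a decrease as a restart from zero. It is a simplified illustration of the idea behind Prometheus's `rate()` function, not Grainite's or Prometheus's actual implementation, and the function name is hypothetical.

```python
def counter_rate(prev_value, prev_ts, curr_value, curr_ts):
    """Per-second rate of increase between two counter samples.

    Timestamps are in seconds. A counter only goes up, so a lower
    current value is assumed to mean the counter was reset to zero.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    if curr_value >= prev_value:
        increase = curr_value - prev_value
    else:
        # Counter reset: everything counted since the restart is new.
        increase = curr_value
    return increase / elapsed
```

For example, a counter that moves from 100 to 160 over 60 seconds corresponds to one completed flow per second.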
Grainite-defined metrics
Grainite provides built-in metrics that allow developers and administrators to monitor the following:
Application runtime metrics
Event processing metrics
Database metrics
Application Runtime Metrics
Description | Metric Used | Metric Type |
---|---|---|
Rate of action invocations per minute | | Gauge |
Count of action errors | | Counter |
Average action execution latency | | Counter |
Paused endpoints due to failures | | Gauge |
Task execution errors | | Counter |
Task instance execution errors | | Counter |
Task execution status | | Gauge |
Event processing metrics
Description | Metric Used | Metric Type |
---|---|---|
Message delivery latency over the last 30-second window of data, published for the 50th, 95th, and 99th percentiles | | Counter |
Topic consumption latency over the last 30-second window of data, published for the 50th, 95th, and 99th percentiles | | Counter |
Batch size of fetched requests | | Counter |
Total events fetched from a topic | | Counter |
Total events fetched and processed | | Counter |
Total Grain-to-Grain messages fetched | | Counter |
Total Grain-to-Grain messages fetched and processed | | Counter |
Number of events that have been pulled from a topic but have not yet been processed | | Gauge |
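To make the latency rows concrete, the sketch below computes the 50th, 95th, and 99th percentiles over the most recent 30 seconds of samples. It uses the nearest-rank method. How Grainite computes its percentiles internally is not stated in this documentation, so the method, the function names, and the sample layout are assumptions for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (0 < pct <= 100)."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Nearest rank: smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def window_percentiles(samples, now, window_s=30.0, pcts=(50, 95, 99)):
    """Percentiles of latencies observed in the last window_s seconds.

    samples is an iterable of (timestamp_seconds, latency) pairs.
    """
    recent = [lat for ts, lat in samples if now - window_s < ts <= now]
    return {p: percentile(recent, p) for p in pcts}
```

A wide gap between the 50th and 99th percentile values typically points to a small share of slow deliveries rather than a uniformly slow pipeline.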
Database Metrics
Description | Metric Used | Metric Type |
---|---|---|
Average latency to process requests to the database | | Counter |
Cumulative count of writes to Grains | | Counter |
Disk currently used by apps and system | | Gauge |
Number of Grain updates that have materialized | | Counter |
Number of Grain updates pending | | Gauge |
Cluster Health
Description | Metric Used | Metric Type |
---|---|---|
The total high load and total stalled metrics indicate the health of the compute capability of the Grainite cluster | | Counter |
The WAL disk used metric provides the current utilization of the Grainite Write Ahead Log (WAL) | | Gauge |
The current allowed rate and current target rate help determine whether there is a continuous event execution overload on the cluster | | Gauge |
Infrastructure Metrics
Cloud providers' monitoring solutions can be used to gather infrastructure-level metrics. We recommend monitoring the following metrics:
CPU usage: CPU usage by each Kubernetes node, measured in number of CPU cores
CPU utilization: CPU utilization by each node, measured as a percentage of available CPU resources
Bytes transmitted: Throughput of network traffic sent out of each node, measured in bytes
Bytes received: Throughput of network traffic received by each node, measured in bytes
Memory usage: Memory usage by each node, measured in GiB
Disk read: Throughput of reads by each node from its persistent disk, measured in IOPS
Disk write: Throughput of writes by each node to its persistent disk, measured in IOPS
Additional metrics can be added as desired for your deployments within the cloud provider's monitoring console.
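The relationship between the usage and utilization figures in the list above can be sketched as follows. This is a minimal illustration under the assumption that raw values are available as cores and bytes; the function names are hypothetical and do not correspond to any cloud provider's API.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def cpu_utilization_percent(cores_used, cores_available):
    """CPU utilization as a percentage of the node's available cores."""
    if cores_available <= 0:
        raise ValueError("cores_available must be positive")
    return 100.0 * cores_used / cores_available


def bytes_to_gib(num_bytes):
    """Convert a byte count to GiB, the unit used for memory usage."""
    return num_bytes / GIB
```

For example, a node using 1.5 of its 4 available cores is at 37.5 percent CPU utilization.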