Alerting on metrics
You can set up alerts based on the metrics collected for SpatialOS cloud deployments. For example, you might want to receive alerts if the number of nodes is lower than it should be, or if snapshots keep failing.
Example: setting up alerts using Grafana
In this example, you want to receive alerts if there are no workers connected for a period of 5 minutes or more.
In Grafana, create a query that returns the number of connected workers of a particular type.
You can copy the query text from the “Workers connected” example below.
For more details, see Grafana’s documentation on querying using Prometheus, and Prometheus’s documentation on querying.
For more examples of how to query the SpatialOS deployment metrics, see the metrics reference page.
Create an alert.
Set the conditions so that you’ll be alerted if the maximum value returned from the query within the last 5 minutes is below 1. In other words, you’ll be alerted if no workers were connected at any point in the last 5 minutes.
For more details, see Grafana’s documentation on configuring alerts.
Set up notifications for the alert. For details, see Grafana’s documentation on alert notifications.
Examples of sensible alerts
Many of your alerts will need to be game-specific, but there are a few that we think are useful for everyone as a starting point.
Snapshot failure
Alert me if, in the last hour, more than 30% of snapshots failed.
Query:
spatialos_snapshot_count::sum{project="project_name", dpl="deployment_name", outcome="failure"} / ignoring(cluster, dpl, outcome, project) sum(spatialos_snapshot_count::sum{project="project_name", dpl="deployment_name"}) > 0
Alert:
WHEN max() OF query (A, 1h, now) IS ABOVE 0.3
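To make the condition concrete, here is a minimal Python sketch of the check it describes: take the failure ratios returned by the query over the last hour, find the maximum, and compare it with 0.3. This is not Grafana's implementation. The function names and the `(timestamp, value)` sample format are assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical sample format: (unix timestamp in seconds, failure ratio).
Sample = Tuple[float, float]


def max_in_window(samples: Sequence[Sample], now: float,
                  window_seconds: float) -> Optional[float]:
    """Return the largest value inside [now - window_seconds, now], or None."""
    values = [value for ts, value in samples
              if now - window_seconds <= ts <= now]
    return max(values) if values else None


def snapshot_failure_alert(samples: Sequence[Sample], now: float,
                           threshold: float = 0.3,
                           window_seconds: float = 3600.0) -> bool:
    """Mirror 'WHEN max() OF query (A, 1h, now) IS ABOVE 0.3'."""
    peak = max_in_window(samples, now, window_seconds)
    # In this sketch, an empty window does not fire the alert.
    return peak is not None and peak > threshold
```

Because the condition uses `max()`, a single point above the threshold within the hour is enough to fire the alert in this sketch.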
Workers connected
Alert me if, in the last 5 minutes, no workers of a particular type were connected.
Query:
spatialos_worker_connected::sum{project="test_project", dpl="test_deployment", dpl_tag="tag", worker_type="worker_type"} > 0
Alert:
WHEN max() OF query (A, 5m, now) IS BELOW 1
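The following Python sketch illustrates how a condition like this could be evaluated against recent query results. It is an illustration only, not Grafana's implementation. The function name, the state strings, and the `(timestamp, value)` sample format are assumptions made for the example.

```python
from typing import List, Tuple


def worker_alert_state(samples: List[Tuple[float, float]], now: float,
                       window_seconds: float = 300.0,
                       threshold: float = 1.0) -> str:
    """Mirror 'WHEN max() OF query (A, 5m, now) IS BELOW 1'.

    samples holds hypothetical (unix timestamp, workers connected) pairs.
    Returns "alerting", "ok", or "no_data" (labels chosen for this sketch).
    """
    # Keep only the points that fall inside the evaluation window.
    recent = [value for ts, value in samples
              if now - window_seconds <= ts <= now]
    if not recent:
        # Nothing to evaluate: report it separately instead of guessing.
        return "no_data"
    # If even the highest count in the window is below 1, no worker
    # was connected at any sampled point.
    return "alerting" if max(recent) < threshold else "ok"
```

Because the example query ends in `> 0`, PromQL drops samples whose value is 0, so a period with no workers connected can appear as missing data rather than as zeros. How that case is treated depends on how your alert is configured to handle missing data, which is why the sketch reports it as a separate state.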
Other metrics to alert on
You might also want to set up alerts for the following metrics:
How you set up the queries and alerts for these metrics depends on your game. Think about what values you expect to see, and what values you’d consider problematic.