Metrics reference
When you run a deployment, SpatialOS collects metrics on it, which you can use to monitor the deployment’s health and status. This page lists the metrics that are collected and explains what you can use them for.
Metric detail levels
Metrics are stored at two levels of granularity:
Aggregated metrics: These are kept for 9 days for deployments tagged alpha, beta or prod, and for 1 day for all other deployments.
Detailed metrics: These carry extra labels that allow much finer-grained querying, which makes them especially useful for debugging. Because of their storage impact, they are kept for only 30 minutes after they are collected.
Use the aggregated metrics for general monitoring of a deployment, and the detailed metrics for in-depth investigation.
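The retention rules above can be sketched as a small lookup. This is only an illustration of the rules as stated on this page: the function name `retention` and the `level` strings are hypothetical and not part of any SpatialOS API.

```python
from datetime import timedelta

# Tags that, per this page, give aggregated metrics the longer retention.
LONG_RETENTION_TAGS = {"alpha", "beta", "prod"}


def retention(level, deployment_tags):
    """Return how long a metric is kept (hypothetical helper).

    level: "aggregated" or "detailed".
    deployment_tags: iterable of the deployment's tags.
    """
    if level == "detailed":
        # Detailed metrics are kept for 30 minutes regardless of tags.
        return timedelta(minutes=30)
    if level == "aggregated":
        if LONG_RETENTION_TAGS & set(deployment_tags):
            return timedelta(days=9)
        return timedelta(days=1)
    raise ValueError(f"unknown detail level: {level!r}")
```

For example, `retention("aggregated", ["prod"])` gives 9 days, while `retention("aggregated", ["dev"])` gives 1 day.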
List of metrics
Worker metrics
Worker connected
A gauge for the number of workers connected to the SpatialOS runtime.
Useful for:
- Checking how many players are logged in.
- Checking if the correct number of managed workers are running.
Aggregated metric
- Name: spatialos_worker_connected::sum
- Labels: project, cluster, dpl, dpl_tag, worker_type
Example query:
spatialos_worker_connected::sum{project="test_project", dpl="test_deployment", dpl_tag="prod", worker_type="MyCSharpWorker"}
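If you generate queries like the one above from code, a small string builder can keep the label syntax consistent. The following is a minimal sketch, assuming label values never need escaping beyond what is checked here; `build_query` is a hypothetical helper, not a SpatialOS function.

```python
def build_query(metric, **labels):
    """Format a metric name and label filters as a query string.

    Hypothetical helper: it only builds the text of the query and does
    not contact any metrics service.
    """
    for key, value in labels.items():
        # Assumption: values containing quotes or backslashes are
        # rejected rather than escaped, to keep the sketch simple.
        if '"' in str(value) or "\\" in str(value):
            raise ValueError(f"unsupported character in label {key!r}")
    selector = ", ".join(f'{key}="{value}"' for key, value in labels.items())
    return f"{metric}{{{selector}}}"


query = build_query(
    "spatialos_worker_connected::sum",
    project="test_project",
    dpl="test_deployment",
    dpl_tag="prod",
    worker_type="MyCSharpWorker",
)
print(query)
```

Keyword arguments keep their order, so the labels appear in the query in the order you pass them.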
Worker update
The worker operation update rate in the last minute for each worker platform.
- Use “update_size_bytes” metrics for bandwidth
- Use “update_messages” metrics for messages sent
Use the detailed metric (spatialos_worker_update_size_bytes:rate1m) to check updates per component type.
Useful for:
- Optimising for performance and cost.
Aggregated metric
- Name: spatialos_worker_update_size_bytes::rate1m
- Name: spatialos_worker_update_messages::rate1m
- Labels: project, cluster, dpl, dpl_tag, worker_type, direction={“from_worker”|“to_worker”}
Example query:
spatialos_worker_update_size_bytes::rate1m{project="test_project", dpl="test_deployment", dpl_tag="prod", worker_type="MyCSharpWorker", direction="from_worker"}
Detailed metric
- Name: spatialos_worker_update_size_bytes:rate1m
- Name: spatialos_worker_update_messages:rate1m
- Labels: project, dpl, dpl_tag, worker_type, direction={“from_worker”|“to_worker”}, component_type={ }
Example query:
spatialos_worker_update_size_bytes:rate1m{project="test_project", dpl="test_deployment", dpl_tag="prod", worker_type="MyCSharpWorker", direction="from_worker", component_type="player.Health"}
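In the names listed on this page, aggregated metrics use a double colon before the suffix (for example `::rate1m`) and detailed metrics use a single colon (`:rate1m`). The sketch below classifies a name on that basis. It relies only on the pattern observed on this page, and `detail_level` is a hypothetical helper.

```python
def detail_level(metric_name):
    """Classify a metric name as "aggregated" or "detailed".

    Assumption: aggregated names contain "::" and detailed names contain
    a single ":", as in the metric names listed on this page.
    """
    if "::" in metric_name:
        return "aggregated"
    if ":" in metric_name:
        return "detailed"
    raise ValueError(f"no detail-level separator in {metric_name!r}")


print(detail_level("spatialos_worker_update_size_bytes::rate1m"))
print(detail_level("spatialos_worker_update_size_bytes:rate1m"))
```

A check like this can help catch a query that mixes an aggregated name with labels that exist only on the detailed metric, such as component_type.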
Node metrics
Node up
A gauge for the number of nodes that are exporting metrics.
Use detailed metrics to break down the value by node category (node_cat).
Useful for:
- Setting up alerts if nodes are not all up.
Aggregated metric
- Name: spatialos_node_up::sum
- Labels: project, cluster, dpl, dpl_tag
Example query:
spatialos_node_up::sum{project="test_project", dpl="test_deployment", dpl_tag="prod"}
Detailed metric
- Name: spatialos_node_up:sum
- Labels: project, dpl, dpl_tag, node_cat
Example query:
spatialos_node_up:sum{project="test_project", dpl="test_deployment", dpl_tag="prod", node_cat="master"}
Node CPU usage ratio
Useful for:
- Optimising for performance and cost.
Aggregated metric
A gauge for the highest ratio, across nodes, of CPU cores used to total available CPU cores (that is, the CPU cores available for user code).
- Name: spatialos_node_cpu_used::max_ratio
- Labels: project, cluster, dpl, dpl_tag, node_cat
Example query:
spatialos_node_cpu_used::max_ratio{project="test_project", dpl="test_deployment", node_cat="gsimbridge"}
Detailed metric
A gauge for the ratio of CPU cores used to total available CPU cores (that is, the CPU cores available for user code).
- Name: spatialos_node_cpu_used:ratio
- Labels: project, dpl, dpl_tag, node, node_cat
Example query:
spatialos_node_cpu_used:ratio{project="test_project", dpl="test_deployment", node="gsimbridge02"}
Memory usage ratio
Useful for:
- Optimising for performance and cost.
- Detecting memory leaks.
Aggregated metric
A gauge for the highest ratio, across nodes, of memory used to total available memory.
- Name: spatialos_node_memory_used::max_ratio
- Labels: project, cluster, dpl, dpl_tag, node_cat
Example query:
spatialos_node_memory_used::max_ratio{project="test_project", dpl="test_deployment", node_cat="fsim"}
Detailed metric
A gauge for the ratio of memory used to total available memory.
- Name: spatialos_node_memory_used:ratio
- Labels: project, dpl, dpl_tag, node, node_cat
Example query:
spatialos_node_memory_used:ratio{project="test_project", dpl="test_deployment", node="fsim_01"}
Logging metrics
Log rate
The rate of error or warning logs per minute.
Useful for:
- Triggering alerts if the error rate is too high
Aggregated metric
- Name: spatialos_logging_logs::rate1m
- Labels: project, cluster, dpl, dpl_tag, level={“ERROR”|“WARN”}
Example query:
spatialos_logging_logs::rate1m{project="test_project", dpl="test_deployment", level="ERROR"}
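One way to turn this metric into an alert is to fire only when the error rate stays above a threshold for several consecutive samples, which avoids alerting on a single spike. The sketch below assumes you have already fetched the samples as a list of numbers. The function name, the threshold and the number of consecutive samples are all hypothetical choices, not SpatialOS defaults.

```python
def should_alert(samples, threshold, consecutive=3):
    """Return True if the most recent `consecutive` samples all exceed
    `threshold`.

    samples: error-rate values, oldest first (assumed already fetched).
    """
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    if len(samples) < consecutive:
        # Not enough data yet to decide; do not alert.
        return False
    return all(value > threshold for value in samples[-consecutive:])


# The last three samples are all above 10, so this prints True.
print(should_alert([1, 12, 15, 11], threshold=10))
```

Raising `consecutive` makes the alert less sensitive to short bursts, at the cost of reacting more slowly.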
Entity metrics
Entity count
A gauge for the number of entities.
Useful for:
- Debugging peaks or drops of entity counts in your deployment.
- Designing your game and tweaking its mechanics, for example “Are there too many/too few entities of a given kind?”
Aggregated metric
- Name: spatialos_entity_count::sum
- Labels: project, cluster, dpl, dpl_tag
Example query:
spatialos_entity_count::sum{project="test_project", dpl="test_deployment"}
Detailed metric
- Name: spatialos_entity_count:sum
- Labels: project, dpl, dpl_tag, entity_type={“Player”|…}
Example query:
spatialos_entity_count:sum{project="test_project", dpl="test_deployment", entity_type="Player"}
Entities created
The rate of entities created per minute.
Useful for:
- Debugging spikes of entities created.
Aggregated metric
- Name: spatialos_entity_created::rate1m
- Labels: project, cluster, dpl, dpl_tag
Example query:
spatialos_entity_created::rate1m{project="test_project", dpl="test_deployment"}
Detailed metric
- Name: spatialos_entity_created:rate1m
- Labels: project, dpl, dpl_tag, entity_type={“Player”|…}
Example query:
spatialos_entity_created:rate1m{project="test_project", dpl="test_deployment", entity_type="Player"}
Entities deleted
The rate of entities deleted per minute.
Useful for:
- Debugging spikes of entities deleted.
Aggregated metric
- Name: spatialos_entity_deleted::rate1m
- Labels: project, cluster, dpl, dpl_tag
Example query:
spatialos_entity_deleted::rate1m{project="test_project", dpl="test_deployment"}
Detailed metric
- Name: spatialos_entity_deleted:rate1m
- Labels: project, dpl, dpl_tag, entity_type={“Player”|…}
Example query:
spatialos_entity_deleted:rate1m{project="test_project", dpl="test_deployment", entity_type="Player"}
Command metrics
Command count
The rate of commands sent per minute. The status label values are defined on the API reference pages: C# and C++.
Useful for:
- Alerting and debugging spikes or drops in commands sent.
- Optimising for performance and cost.
Aggregated metric
- Name: spatialos_command_count::rate1m
- Labels: project, cluster, dpl, dpl_tag
Example query:
spatialos_command_count::rate1m{project="test_project", dpl="test_deployment"}
Detailed metric
- Name: spatialos_command_count:rate1m
- Labels: project, dpl, dpl_tag, component_type={“player.test”|“SYSTEM”}, command_type={“USER_DEFINED”|“CREATE_ENTITY_REQUEST”|“REMOVE_ENTITY_REQUEST”|…}, status
Example query:
spatialos_command_count:rate1m{project="test_project", dpl="test_deployment", component_type="player.Health", command_type="USER_DEFINED"}
Command latency
The latency of commands over the last five minutes, reported at the 99th, 90th and 50th percentiles. Latency is measured from the moment the SpatialOS runtime receives the command request to the moment it receives the command response, and is capped at 1 second. The status label values are defined on the API reference pages: C# and C++.
Useful for:
- Alerting on abnormal latency in a deployment.
- Debugging latency for certain components.
- Optimising for performance and cost.
Aggregated metric
- Name: spatialos_command_latency_seconds::summary5m
- Labels: project, cluster, dpl, dpl_tag, quantile
Example query:
spatialos_command_latency_seconds::summary5m{project="test_project", dpl_tag="prod", quantile="0.99"}
Detailed metric
- Name: spatialos_command_latency_seconds:summary5m
- Labels: project, dpl, dpl_tag, quantile, component_type={“player.test”|“SYSTEM”}, command_type={“USER_DEFINED”|“CREATE_ENTITY_REQUEST”|“REMOVE_ENTITY_REQUEST”|…}, status
Example query:
spatialos_command_latency_seconds:summary5m{project="test_project", dpl="test_deployment", quantile="0.99", component_type="player.Health", command_type="USER_DEFINED"}
Network metrics
Network egress rate
The rate of total network egress (traffic leaving the cloud), in bytes per minute.
Useful for:
- Optimising for performance and cost.
Aggregated metric
- Name: spatialos_network_egress_bytes::rate1m
- Labels: project, cluster, dpl, dpl_tag
Example query:
spatialos_network_egress_bytes::rate1m{project="test_project", dpl="test_deployment"}
Detailed metric
- Name: spatialos_network_egress_bytes:rate1m
- Labels: project, dpl, dpl_tag, node
Example query:
spatialos_network_egress_bytes:rate1m{project="test_project", dpl="test_deployment", node="worker_01"}
Runtime metrics
Worker to runtime latency
The round-trip time (RTT) from workers to the SpatialOS runtime over the last five minutes, reported at the 99th, 90th and 50th percentiles.
Useful for:
- Optimising for performance and cost.
Aggregated metric
- Name: spatialos_runtime_worker_latency_seconds::summary5m
- Labels: project, cluster, dpl, dpl_tag, worker_type, quantile
Example query:
spatialos_runtime_worker_latency_seconds::summary5m{project="test_project", dpl="test_deployment", worker_type="MyCSharpClient", quantile="0.90"}
Snapshot metrics
Snapshot count
A counter for the number of snapshots.
Useful for:
- Alerting when there is a snapshot failure.
Aggregated metric
- Name: spatialos_snapshot_count::sum
- Labels: project, cluster, dpl, dpl_tag, outcome={“success”|“failure”}
Example query:
spatialos_snapshot_count::sum{project="test_project", dpl="test_deployment", outcome="failure"}