Metrics reference

When you run a deployment, SpatialOS collects metrics on it, which you can use to monitor the deployment’s health and status. This page lists the metrics that are collected and explains what you can use them for.

This page also contains information on how to query the metrics using the Prometheus query syntax.

You can query the metrics through code, or on your own analytics platform. Both of these methods are only available to select users for testing. If you’d like access, raise a support request (for customers with a service agreement) or ask on our forums.
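As a rough sketch of what querying through code might look like, the snippet below builds the URL for an instant query. It assumes the metrics are served through a Prometheus-compatible HTTP API (`/api/v1/query` is the instant-query path in Prometheus itself); this page does not confirm that, and the base URL is a hypothetical placeholder. Use whatever endpoint and authentication you are given when access is granted.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical placeholder, not a real SpatialOS endpoint.
BASE_URL = "https://metrics.example.invalid"


def instant_query_url(promql: str, base_url: str = BASE_URL,
                      time: Optional[float] = None) -> str:
    """Build a GET URL for a Prometheus-style instant query.

    Assumes a Prometheus-compatible HTTP API; the path and parameter
    names follow Prometheus, not any documented SpatialOS API.
    """
    params = {"query": promql}
    if time is not None:
        # Evaluation time as a Unix timestamp in seconds.
        params["time"] = f"{time:.3f}"
    # urlencode percent-encodes braces, quotes and spaces in the query.
    return f"{base_url}/api/v1/query?{urlencode(params)}"


url = instant_query_url(
    'spatialos_entity_count::sum{project="test_project", dpl="test_deployment"}'
)
print(url)
```

The function only builds the URL, so it can be checked without network access.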

Metric detail levels

Metrics are stored at two levels of granularity:

  • Aggregated metrics: These are kept for 9 days for deployments with the alpha, beta or prod tags. Otherwise they are kept for 1 day.

  • Detailed metrics: These carry additional labels that allow much finer-grained querying, which makes them especially useful for debugging. Because of their storage impact, they are kept for only 30 minutes after they’ve been collected.

Use the aggregated metrics for general monitoring of a deployment, and the detailed metrics for in-depth investigation.
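The retention rules above can be sketched as a small helper that tells you whether a given lookback window can still be answered from a metric level. The level names `"aggregated"` and `"detailed"` and both function names are illustrative choices for this sketch, not part of any SpatialOS API.

```python
from datetime import timedelta
from typing import Iterable

# Deployment tags that get the longer aggregated retention (from the text above).
LONG_RETENTION_TAGS = {"alpha", "beta", "prod"}


def retention(level: str, deployment_tags: Iterable[str]) -> timedelta:
    """Return how long metrics at the given detail level are kept."""
    if level == "detailed":
        return timedelta(minutes=30)
    if level == "aggregated":
        if LONG_RETENTION_TAGS & set(deployment_tags):
            return timedelta(days=9)
        return timedelta(days=1)
    raise ValueError(f"unknown detail level: {level!r}")


def can_answer(level: str, deployment_tags: Iterable[str],
               lookback: timedelta) -> bool:
    """True if data from `lookback` ago should still be retained."""
    return lookback <= retention(level, deployment_tags)


print(can_answer("detailed", ["prod"], timedelta(hours=2)))    # False
print(can_answer("aggregated", ["prod"], timedelta(hours=2)))  # True
```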

List of metrics

Worker metrics

Worker connected

A gauge for the number of workers connected to the SpatialOS runtime.

Useful for:

  • Checking how many players are logged in.
  • Checking if the correct number of managed workers are running.
Aggregated metric
  • Name: spatialos_worker_connected::sum
  • Labels: project, cluster, dpl, dpl_tag, worker_type

Example query: spatialos_worker_connected::sum{project="test_project", dpl="test_deployment", dpl_tag="prod", worker_type="MyCSharpWorker"}
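Every example query on this page has the same shape: a metric name followed by label matchers in braces. A minimal sketch of a helper that renders that shape from keyword arguments is shown below. `build_selector` is a hypothetical name, and only the equality matcher (`=`) is handled.

```python
def build_selector(metric: str, **labels: str) -> str:
    """Render a PromQL selector such as name{a="x", b="y"}."""
    def escape(value: str) -> str:
        # Escape characters that would break a double-quoted PromQL string.
        return (value.replace("\\", "\\\\")
                     .replace('"', '\\"')
                     .replace("\n", "\\n"))

    matchers = ", ".join(f'{key}="{escape(val)}"' for key, val in labels.items())
    # Doubled braces in an f-string produce literal { and }.
    return f"{metric}{{{matchers}}}" if matchers else metric


query = build_selector(
    "spatialos_worker_connected::sum",
    project="test_project",
    dpl="test_deployment",
    dpl_tag="prod",
    worker_type="MyCSharpWorker",
)
print(query)
```

Building selectors this way also avoids typographic (curly) quotes, which PromQL does not accept in place of straight double quotes.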

Worker update

The worker operation update rate in the last minute for each worker platform.

  • Use the update_size_bytes metrics for bandwidth.
  • Use the update_messages metrics for the number of messages sent.

Use the detailed metric (spatialos_worker_update_size_bytes:rate1m) to check updates per component type.

Useful for:

  • Optimising for performance and cost.
Aggregated metric
  • Name: spatialos_worker_update_size_bytes::rate1m
  • Name: spatialos_worker_update_messages::rate1m

  • Labels: project, cluster, dpl, dpl_tag, worker_type, direction={"from_worker"|"to_worker"}

Example query: spatialos_worker_update_size_bytes::rate1m{project="test_project", dpl="test_deployment", dpl_tag="prod", worker_type="MyCSharpWorker", direction="from_worker"}

Detailed metric
  • Name: spatialos_worker_update_size_bytes:rate1m
  • Name: spatialos_worker_update_messages:rate1m

  • Labels: project, dpl, dpl_tag, worker_type, direction={"from_worker"|"to_worker"}, component_type={}

Example query: spatialos_worker_update_size_bytes:rate1m{project="test_project", dpl="test_deployment", dpl_tag="prod", worker_type="MyCSharpWorker", direction="from_worker", component_type="player.Health"}
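Because the byte and message metrics share the same labels, dividing one by the other gives an approximate average update size. The sketch below builds that PromQL expression as a string. It assumes both series carry identical label sets so that PromQL's default one-to-one vector matching pairs them up. `avg_update_size_query` is a hypothetical helper name.

```python
from typing import Dict


def selector(metric: str, labels: Dict[str, str]) -> str:
    """Render metric{key="value", ...} with equality matchers."""
    body = ", ".join(f'{key}="{val}"' for key, val in labels.items())
    return f"{metric}{{{body}}}"


def avg_update_size_query(labels: Dict[str, str]) -> str:
    """PromQL for bytes per update message, using the aggregated metrics."""
    bytes_rate = selector("spatialos_worker_update_size_bytes::rate1m", labels)
    message_rate = selector("spatialos_worker_update_messages::rate1m", labels)
    return f"{bytes_rate} / {message_rate}"


print(avg_update_size_query({
    "project": "test_project",
    "dpl": "test_deployment",
    "worker_type": "MyCSharpWorker",
    "direction": "from_worker",
}))
```

When no messages were sent in the window, the division has a zero denominator, so treat non-finite results as "no data".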

Node metrics

Node up

A gauge for the number of nodes that are exporting metrics. Use the detailed metric to break the value down by node category (the node_cat label).

Useful for:

  • Setting up alerts if nodes are not all up.
Aggregated metric
  • Name: spatialos_node_up::sum

  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_node_up::sum{project="test_project", dpl="test_deployment", dpl_tag="prod"}

Detailed metric
  • Name: spatialos_node_up:sum

  • Labels: project, dpl, dpl_tag, node_cat

Example query: spatialos_node_up:sum{project="test_project", dpl="test_deployment", dpl_tag="prod", node_cat="master"}
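To illustrate the alerting use case, here is a minimal sketch of a check over values already fetched for spatialos_node_up::sum. The expected node count and the "several consecutive samples" rule are assumptions of this sketch, chosen to avoid alerting on a single missed scrape. Neither comes from SpatialOS.

```python
from typing import Sequence


def should_alert(samples: Sequence[float], expected_nodes: int,
                 min_consecutive: int = 3) -> bool:
    """True if the most recent `min_consecutive` samples are all below
    the expected node count. `samples` is ordered oldest to newest."""
    if len(samples) < min_consecutive:
        return False  # not enough data to decide
    recent = samples[-min_consecutive:]
    return all(value < expected_nodes for value in recent)


print(should_alert([5, 5, 4, 4, 4], expected_nodes=5))  # True
print(should_alert([5, 4, 4, 5], expected_nodes=5))     # False
```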

Node CPU usage ratio

Useful for:

  • Optimising for performance and cost.
Aggregated metric
  • Name: spatialos_node_cpu_used::max_ratio
  • Description: A gauge for the highest ratio of CPU cores used per total available CPU cores (i.e. the CPU cores available for user code) across nodes.

  • Labels: project, cluster, dpl, dpl_tag, node_cat

Example query: spatialos_node_cpu_used::max_ratio{project="test_project", dpl="test_deployment", node_cat="gsimbridge"}

Detailed metric
  • Name: spatialos_node_cpu_used:ratio
  • Description: A gauge for the ratio of CPU cores used per total available CPU cores (i.e. the CPU cores available for user code).

  • Labels: project, dpl, dpl_tag, node, node_cat

Example query: spatialos_node_cpu_used:ratio{project="test_project", dpl="test_deployment", node="gsimbridge02"}

Memory usage ratio

Useful for:

  • Optimising for performance and cost.

  • Detecting memory leaks.

Aggregated metric
  • Name: spatialos_node_memory_used::max_ratio
  • Description: A gauge for the highest ratio of memory used per total available memory across nodes.

  • Labels: project, cluster, dpl, dpl_tag, node_cat

Example query: spatialos_node_memory_used::max_ratio{project="test_project", dpl="test_deployment", node_cat="fsim"}

Detailed metric
  • Name: spatialos_node_memory_used:ratio
  • Description: A gauge for the ratio of memory used per total available memory.

  • Labels: project, dpl, dpl_tag, node, node_cat

Example query: spatialos_node_memory_used:ratio{project="test_project", dpl="test_deployment", node="fsim_01"}
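One simple way to use the memory ratio for leak detection is to fit a straight line through recent samples and flag sustained growth. The sketch below does this with an ordinary least-squares slope. The growth threshold is an arbitrary illustrative value, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix_timestamp_seconds, memory_used_ratio)


def slope_per_second(samples: Sequence[Sample]) -> float:
    """Least-squares slope of ratio against time."""
    count = len(samples)
    if count < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / count
    mean_v = sum(v for _, v in samples) / count
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("all samples share the same timestamp")
    return numerator / denominator


def looks_like_leak(samples: Sequence[Sample],
                    min_growth_per_hour: float = 0.01) -> bool:
    """True if the fitted trend grows by at least `min_growth_per_hour`
    (as a fraction of total memory) per hour."""
    return slope_per_second(samples) * 3600 >= min_growth_per_hour


growing = [(0, 0.50), (600, 0.51), (1200, 0.52), (1800, 0.53)]
flat = [(0, 0.50), (600, 0.50), (1200, 0.50)]
print(looks_like_leak(growing), looks_like_leak(flat))
```

Since detailed metrics are kept for only 30 minutes, a longer trend has to be built from the aggregated metric or from samples you store yourself.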

Logging metrics

Log rate

The rate of error and warning logs.

Useful for:

  • Triggering alerts if the error rate is too high.
Aggregated metric
  • Name: spatialos_logging_logs::rate1m

  • Labels: project, cluster, dpl, dpl_tag, level={"ERROR"|"WARN"}

Example query: spatialos_logging_logs::rate1m{project="test_project", dpl="test_deployment", level="ERROR"}

Entity metrics

Entity count

A gauge for the number of entities.

Useful for:

  • Debugging peaks or drops of entity counts in your deployment.
  • Designing your game and tweaking its mechanics, e.g. “Are there too many or too few entities of a given kind?”
Aggregated metric
  • Name: spatialos_entity_count::sum
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_entity_count::sum{project="test_project", dpl="test_deployment"}

Detailed metric
  • Name: spatialos_entity_count:sum
  • Labels: project, dpl, dpl_tag, entity_type={"Player"|…}

Example query: spatialos_entity_count:sum{project="test_project", dpl="test_deployment", entity_type="Player"}

Entities created

The rate of entities created per minute.

Useful for:

  • Debugging spikes of entities created.
Aggregated metric
  • Name: spatialos_entity_created::rate1m
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_entity_created::rate1m{project="test_project", dpl="test_deployment"}

Detailed metric
  • Name: spatialos_entity_created:rate1m
  • Labels: project, dpl, dpl_tag, entity_type={"Player"|…}

Example query: spatialos_entity_created:rate1m{project="test_project", dpl="test_deployment", entity_type="Player"}

Entities deleted

The rate of entities deleted per minute.

Useful for:

  • Debugging spikes of entities deleted.
Aggregated metric
  • Name: spatialos_entity_deleted::rate1m
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_entity_deleted::rate1m{project="test_project", dpl="test_deployment"}

Detailed metric
  • Name: spatialos_entity_deleted:rate1m
  • Labels: project, dpl, dpl_tag, entity_type={"Player"|…}

Example query: spatialos_entity_deleted:rate1m{project="test_project", dpl="test_deployment", entity_type="Player"}

Entity authority changes

The rate at which the authority of entities changes.

Useful for:

  • Debugging spikes in the frequency of entities crossing worker boundaries.
Aggregated metric
  • Name: spatialos_authority_changes::rate1m
  • Labels: project, dpl, dpl_tag, outcome

Example query: spatialos_authority_changes::rate1m{project="test_project", dpl="test_deployment", outcome="failure"}

Command metrics

Command count

The rate of commands sent per minute. The status label values are defined on the API reference pages: C# and C++.

Useful for:

  • Alerting and debugging spikes or drops in commands sent.
  • Optimising for performance and cost.
Aggregated metric
  • Name: spatialos_command_count::rate1m
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_command_count::rate1m{project="test_project", dpl="test_deployment"}

Detailed metric
  • Name: spatialos_command_count:rate1m

  • Labels: project, dpl, dpl_tag, component_type={"player.test"|"SYSTEM"}, command_type={"USER_DEFINED"|"CREATE_ENTITY_REQUEST"|"REMOVE_ENTITY_REQUEST"|…}, status

Example query: spatialos_command_count:rate1m{project="test_project", dpl="test_deployment", component_type="player.Health", command_type="USER_DEFINED"}

Command latency

The latency of commands over the last five minutes, at the 99th, 90th and 50th percentiles. Latency is measured from the SpatialOS runtime receiving the command request to the runtime receiving the command response, and is capped at 1 second. The status label values are defined on the API reference pages: C# and C++.

Useful for:

  • Alerting on abnormal latency in a deployment.
  • Debugging latency for certain components.
  • Optimising for performance and cost.
Aggregated metric
  • Name: spatialos_command_latency_seconds::summary5m
  • Labels: project, cluster, dpl, dpl_tag, quantile

Example query: spatialos_command_latency_seconds::summary5m{project="test_project", dpl_tag="prod", quantile="0.99"}

Detailed metric
  • Name: spatialos_command_latency_seconds:summary5m
  • Labels: project, dpl, dpl_tag, quantile, component_type={"player.test"|"SYSTEM"}, command_type={"USER_DEFINED"|"CREATE_ENTITY_REQUEST"|"REMOVE_ENTITY_REQUEST"|…}, status

Example query: spatialos_command_latency_seconds:summary5m{project="test_project", dpl="test_deployment", quantile="0.99", component_type="player.Health", command_type="USER_DEFINED"}
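Because latencies are capped at 1 second, a reported value at the cap only tells you the real latency was at least that high. A small sketch of how an alerting script might account for this is shown below. The latency budget and the category names are illustrative assumptions.

```python
# From the text above: reported command latencies are capped at 1 second.
LATENCY_CAP_SECONDS = 1.0


def classify_latency(value_seconds: float, budget_seconds: float) -> str:
    """Classify one quantile sample against a latency budget."""
    if value_seconds >= LATENCY_CAP_SECONDS:
        # The true latency may be higher than the reported value.
        return "capped"
    if value_seconds > budget_seconds:
        return "over_budget"
    return "ok"


for sample in (0.05, 0.35, 1.0):
    print(sample, classify_latency(sample, budget_seconds=0.2))
```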

Network metrics

Network egress rate

The rate of total network egress (traffic going out of the cloud) bytes per minute.

Useful for:

  • Optimising for performance and cost.
Aggregated metric
  • Name: spatialos_network_egress_bytes::rate1m
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_network_egress_bytes::rate1m{project="test_project", dpl="test_deployment"}

Detailed metric
  • Name: spatialos_network_egress_bytes:rate1m
  • Labels: project, dpl, dpl_tag, node

Example query: spatialos_network_egress_bytes:rate1m{project="test_project", dpl="test_deployment", node="worker_01"}

Runtime metrics

Worker to runtime latency

The round-trip time (RTT) from workers to the SpatialOS runtime over the last five minutes, at the 99th, 90th and 50th percentiles.

Useful for:

  • Optimising for performance and cost.
Aggregated metric
  • Name: spatialos_runtime_worker_latency_seconds::summary5m

  • Labels: project, cluster, dpl, dpl_tag, worker_type, quantile

Example query: spatialos_runtime_worker_latency_seconds::summary5m{project="test_project", dpl="test_deployment", worker_type="MyCSharpClient", quantile="0.90"}

View lateness

The time it takes, at the 50th percentile, for an update anywhere in the system to be reflected in a view.

Useful for:

  • Optimising for performance and cost.
Aggregated metric
  • Name: spatialos_runtime_view_lateness_50th_percentile_ms
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_runtime_view_lateness_50th_percentile_ms{project="test_project", dpl="test_deployment"}

Snapshot metrics

Snapshot count

A counter for the number of snapshots.

Useful for:

  • Alerting when there is a snapshot failure.
Aggregated metric
  • Name: spatialos_snapshot_count::sum

  • Labels: project, cluster, dpl, dpl_tag, outcome={"success"|"failure"}

Example query: spatialos_snapshot_count::sum{project="test_project", dpl="test_deployment", outcome="failure"}
