Metrics reference

These metrics are currently only available to select customers for testing. Contact support if you’d like access to these metrics.

When you run a deployment, SpatialOS collects a set of metrics on it. You can use these metrics to monitor the deployment’s health and status, or to trigger alerts. This page describes each metric that is collected.

The metrics are stored in Prometheus. To query these metrics yourself, use the Prometheus query syntax.

If you want to create a custom dashboard based on these metrics, see the Build a metrics dashboard page.

If you want to access these metrics programmatically (for example, from a continuous integration script), see the Accessing metrics through code page.
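As a rough illustration of programmatic access, the sketch below builds an instant-query URL in the style of the standard Prometheus HTTP API (`/api/v1/query`). The base URL is a placeholder, and this page does not say whether SpatialOS exposes that API path or how requests are authenticated, so treat both as assumptions and check the Accessing metrics through code page for the real details.

```python
from urllib.parse import urlencode

# Hypothetical base URL: the real endpoint is not given on this page.
BASE_URL = "https://metrics.example.com"


def build_query_url(base_url: str, promql: str) -> str:
    """Return a Prometheus-style instant-query URL for a PromQL expression."""
    # /api/v1/query is the instant-query path of the standard Prometheus HTTP API.
    # Whether SpatialOS serves metrics on this path is an assumption.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


query = 'spatialos_worker_connected::sum{project="test_project", dpl="test_deployment"}'
url = build_query_url(BASE_URL, query)
print(url)
```

The query string is URL-encoded because PromQL selectors contain characters such as braces, quotes and colons that are not safe in a raw URL.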

Metric detail levels

Metrics are stored at two levels of granularity:

  • Aggregated metrics: These are kept for 9 days for deployments with the alpha, beta or prod tags. Otherwise they are kept for 1 day.

  • Detailed metrics: These carry additional labels that allow much finer-grained querying of the data, which makes them especially useful for debugging. Because of their storage impact, they are kept for only 30 minutes after they’ve been collected.

Use the aggregated metrics for general monitoring of a deployment, and the detailed metrics for detailed investigation. In the metric names on this page, aggregated metrics have a double colon (::) before the suffix and detailed metrics have a single colon (:).
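The retention rules above can be sketched as a small lookup, which may help when deciding how soon after an incident you need to query. The function and constant names are illustrative only and are not part of any SpatialOS API.

```python
from datetime import timedelta
from typing import Iterable

# Deployment tags that get the longer aggregated retention, as listed above.
LONG_RETENTION_TAGS = {"alpha", "beta", "prod"}


def retention(level: str, dpl_tags: Iterable[str]) -> timedelta:
    """Return how long a metric is kept, given its detail level and deployment tags."""
    if level == "detailed":
        # Detailed metrics are kept for 30 minutes regardless of tags.
        return timedelta(minutes=30)
    if level == "aggregated":
        # Aggregated metrics are kept for 9 days on alpha/beta/prod deployments.
        if LONG_RETENTION_TAGS & set(dpl_tags):
            return timedelta(days=9)
        return timedelta(days=1)
    raise ValueError(f"unknown detail level: {level!r}")


print(retention("aggregated", ["prod"]))
print(retention("aggregated", ["dev"]))
print(retention("detailed", ["prod"]))
```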

Worker metrics

Worker connected

A gauge for the number of workers connected to the SpatialOS runtime.

Use cases:

  • Checking how many players are logged in.
  • Checking if the correct number of managed workers are running.
Aggregated metric
  • Name: spatialos_worker_connected::sum
  • Labels: project, cluster, dpl, dpl_tag, worker_type

Example query: spatialos_worker_connected::sum{project="test_project", dpl="test_deployment", dpl_tag="prod", worker_type="UnityFSim"}
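If you generate queries like this one from a script, a small helper can assemble the label selector from a dictionary. This is a minimal sketch; `build_selector` is a hypothetical helper, not something SpatialOS or Prometheus provides.

```python
def _escape(value: str) -> str:
    """Escape backslashes and double quotes for use in a PromQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_selector(metric: str, labels: dict) -> str:
    """Return a PromQL selector such as metric{key="value", ...}."""
    pairs = ", ".join(f'{key}="{_escape(value)}"' for key, value in labels.items())
    # Doubled braces in the f-string produce the literal { and } of the selector.
    return f"{metric}{{{pairs}}}"


selector = build_selector(
    "spatialos_worker_connected::sum",
    {
        "project": "test_project",
        "dpl": "test_deployment",
        "dpl_tag": "prod",
        "worker_type": "UnityFSim",
    },
)
print(selector)
```

Note that PromQL string literals need straight double quotes; curly quotes pasted from a formatted page are rejected by the query parser.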

Worker update

The worker operation update rate in the last minute for each worker platform.

  • Use the “update_size_bytes” metrics for bandwidth.
  • Use the “update_messages” metrics for messages sent.

Use the detailed metric (spatialos_worker_update_size_bytes:rate1m) to check updates per component type.

Use cases:

  • Optimizing for performance and cost.
Aggregated metric
  • Name: spatialos_worker_update_size_bytes::rate1m
  • Name: spatialos_worker_update_messages::rate1m

  • Labels: project, cluster, dpl, dpl_tag, worker_type, direction={"from_worker"|"to_worker"}

Example query: spatialos_worker_update_size_bytes::rate1m{project="test_project", dpl="test_deployment", dpl_tag="prod", worker_type="UnityFSim", direction="from_worker"}

Detailed metric
  • Name: spatialos_worker_update_size_bytes:rate1m
  • Name: spatialos_worker_update_messages:rate1m

  • Labels: project, dpl, dpl_tag, worker_type, direction={"from_worker"|"to_worker"}, component_type={}

Example query: spatialos_worker_update_size_bytes:rate1m{project="test_project", dpl="test_deployment", dpl_tag="prod", worker_type="UnityFSim", direction="from_worker", component_type="player.Health"}

Node metrics

Node up

A gauge for the number of nodes that are exporting metrics. Use the detailed metric to break down the value by node category (the node_cat label).

Use cases:

  • Setting up alerts if nodes are not all up.
Aggregated metric
  • Name: spatialos_node_up::sum

  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_node_up::sum{project="test_project", dpl="test_deployment", dpl_tag="prod"}

Detailed metric
  • Name: spatialos_node_up:sum

  • Labels: project, dpl, dpl_tag, node_cat

Example query: spatialos_node_up:sum{project="test_project", dpl="test_deployment", dpl_tag="prod", node_cat="master"}
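For the alerting use case, one possible approach is to compare the per-category node counts against what you expect for the deployment. The sketch below assumes the query result arrives in the standard Prometheus instant-query JSON format. The sample response, the node categories and the expected counts are all made up for illustration.

```python
import json

# Illustrative response in the standard Prometheus instant-query JSON format.
# The node categories and counts are invented for this example.
SAMPLE_RESPONSE = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"node_cat": "master"}, "value": [1500000000, "1"]},
      {"metric": {"node_cat": "fsim"}, "value": [1500000000, "2"]}
    ]
  }
}
""")

# Hypothetical expected number of nodes per category for the deployment.
EXPECTED = {"master": 1, "fsim": 3}


def nodes_up_by_category(response: dict) -> dict:
    """Map each node_cat label to the number of nodes reported as up."""
    counts = {}
    for sample in response["data"]["result"]:
        # Prometheus encodes each sample as [timestamp, "value-as-string"].
        counts[sample["metric"]["node_cat"]] = int(float(sample["value"][1]))
    return counts


def missing_nodes(actual: dict, expected: dict) -> dict:
    """Return the shortfall per category where fewer nodes are up than expected."""
    return {
        cat: want - actual.get(cat, 0)
        for cat, want in expected.items()
        if actual.get(cat, 0) < want
    }


shortfall = missing_nodes(nodes_up_by_category(SAMPLE_RESPONSE), EXPECTED)
print(shortfall)
```

A category that is absent from the result is treated as having zero nodes up, so it is reported as a full shortfall rather than silently ignored.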

Node CPU usage ratio

Use cases:

  • Optimizing for performance and cost.
Aggregated metric
  • Name: spatialos_node_cpu_used::max_ratio
  • Description: A gauge for the highest ratio, across nodes, of CPU cores used to total available CPU cores (that is, the CPU cores available for user code).

  • Labels: project, cluster, dpl, dpl_tag, node_cat

Example query: spatialos_node_cpu_used::max_ratio{project="test_project", dpl="test_deployment", node_cat="gsimbridge"}

Detailed metric
  • Name: spatialos_node_cpu_used:ratio
  • Description: A gauge for the ratio of CPU cores used to total available CPU cores (that is, the CPU cores available for user code).

  • Labels: project, dpl, dpl_tag, node, node_cat

Example query: spatialos_node_cpu_used:ratio{project="test_project", dpl="test_deployment", node="gsimbridge02"}

Memory usage ratio

Use cases:

  • Optimizing for performance and cost.

  • Detecting memory leaks.

Aggregated metric
  • Name: spatialos_node_memory_used::max_ratio
  • Description: A gauge for the highest ratio, across nodes, of memory used to total available memory.

  • Labels: project, cluster, dpl, dpl_tag, node_cat

Example query: spatialos_node_memory_used::max_ratio{project="test_project", dpl="test_deployment", node_cat="fsim"}

Detailed metric
  • Name: spatialos_node_memory_used:ratio
  • Description: A gauge for the ratio of memory used to total available memory.

  • Labels: project, dpl, dpl_tag, node, node_cat

Example query: spatialos_node_memory_used:ratio{project="test_project", dpl="test_deployment", node="fsim_01"}

Logging metrics

Log rate

A counter for the number of user-facing logs at the error or warning level.

Use cases:

  • Triggering alerts if the error rate is too high.
Aggregated metric
  • Name: spatialos_logging_logs::rate1m

  • Labels: project, cluster, dpl, dpl_tag, level={"ERROR"|"WARN"}

Example query: spatialos_logging_logs::rate1m{project="test_project", dpl="test_deployment", level="ERROR"}

Entity metrics

Entity count

A gauge for the number of entities.

Use cases:

  • Debugging peaks or drops of entity counts in your deployment.
  • Designing your game and tweaking its mechanics. For example: are there too many or too few entities of a given kind?
Aggregated metric
  • Name: spatialos_entity_count::sum
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_entity_count::sum{project="test_project", dpl="test_deployment"}

Detailed metric
  • Name: spatialos_entity_count:sum
  • Labels: project, dpl, dpl_tag, entity_type={"Player"|…}

Example query: spatialos_entity_count:sum{project="test_project", dpl="test_deployment", entity_type="Player"}

Entities created

The rate of entities created per minute.

Use cases:

  • Debugging spikes of entities created.
Aggregated metric
  • Name: spatialos_entity_created::rate1m
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_entity_created::rate1m{project="test_project", dpl="test_deployment"}

Detailed metric
  • Name: spatialos_entity_created:rate1m
  • Labels: project, dpl, dpl_tag, entity_type={"Player"|…}

Example query: spatialos_entity_created:rate1m{project="test_project", dpl="test_deployment", entity_type="Player"}

Entities deleted

The rate of entities deleted per minute.

Use cases:

  • Debugging spikes of entities deleted.
Aggregated metric
  • Name: spatialos_entity_deleted::rate1m
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_entity_deleted::rate1m{project="test_project", dpl="test_deployment"}

Detailed metric
  • Name: spatialos_entity_deleted:rate1m
  • Labels: project, dpl, dpl_tag, entity_type={"Player"|…}

Example query: spatialos_entity_deleted:rate1m{project="test_project", dpl="test_deployment", entity_type="Player"}

Command metrics

Command count

The rate of commands sent per minute. The status label values are defined on the API reference page.

Use cases:

  • Alerting on, and debugging, spikes or drops in commands sent.
  • Optimizing for performance and cost.
Aggregated metric
  • Name: spatialos_command_count::rate1m
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_command_count::rate1m{project="test_project", dpl="test_deployment"}

Detailed metric
  • Name: spatialos_command_count:rate1m

  • Labels: project, dpl, dpl_tag, component_type={"player.test"|"SYSTEM"}, command_type={"USER_DEFINED"|"CREATE_ENTITY_REQUEST"|"REMOVE_ENTITY_REQUEST"|…}, status

Example query: spatialos_command_count:rate1m{project="test_project", dpl="test_deployment", component_type="player.Health", command_type="USER_DEFINED"}

Command latency

The latency of commands over the last five minutes, reported at the 99th, 90th and 50th percentiles. Latency is measured from the SpatialOS runtime receiving the command request to the runtime receiving the command response. The status label values are defined on the API reference page. Latencies are capped at 1 second.

Use cases:

  • Alerting on abnormal latency in a deployment.
  • Debugging latency for certain components.
  • Optimizing for performance and cost.
Aggregated metric
  • Name: spatialos_command_latency_seconds::summary5m
  • Labels: project, cluster, dpl, dpl_tag, quantile

Example query: spatialos_command_latency_seconds::summary5m{project="test_project", dpl_tag="prod", quantile="0.90"}

Detailed metric
  • Name: spatialos_command_latency_seconds:summary5m
  • Labels: project, dpl, dpl_tag, quantile, component_type={"player.test"|"SYSTEM"}, command_type={"USER_DEFINED"|"CREATE_ENTITY_REQUEST"|"REMOVE_ENTITY_REQUEST"|…}, status

Example query: spatialos_command_latency_seconds:summary5m{project="test_project", dpl="test_deployment", quantile="0.90", component_type="player.Health", command_type="USER_DEFINED"}
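Because latencies are capped at 1 second, a reported value at the cap only tells you that the true latency was at least 1 second. The sketch below shows one way an alerting script might account for that. The sample values, the threshold and the quantile label format ("0.50", "0.90", "0.99") are assumptions made for illustration.

```python
# Latencies are capped at 1 second, as stated in the metric description.
LATENCY_CAP_SECONDS = 1.0


def check_latency(samples: dict, threshold_seconds: float) -> list:
    """Return a finding for each quantile that is capped or above the threshold."""
    findings = []
    for quantile, seconds in sorted(samples.items()):
        if seconds >= LATENCY_CAP_SECONDS:
            # A capped value is a lower bound, not an exact measurement.
            findings.append(f"quantile {quantile}: at least 1s (capped)")
        elif seconds > threshold_seconds:
            findings.append(
                f"quantile {quantile}: {seconds:.3f}s exceeds {threshold_seconds:.3f}s"
            )
    return findings


# Made-up latency values in seconds, keyed by quantile label.
samples = {"0.50": 0.02, "0.90": 0.3, "0.99": 1.0}
findings = check_latency(samples, threshold_seconds=0.25)
for finding in findings:
    print(finding)
```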

Network metrics

Network egress rate

The rate of total network egress (traffic going out of the cloud) bytes per minute.

Use cases:

  • Optimizing for performance and cost.
Aggregated metric
  • Name: spatialos_network_egress_bytes::rate1m
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_network_egress_bytes::rate1m{project="test_project", dpl="test_deployment"}

Detailed metric
  • Name: spatialos_network_egress_bytes:rate1m
  • Labels: project, dpl, dpl_tag, node

Example query: spatialos_network_egress_bytes:rate1m{project="test_project", dpl="test_deployment", node="worker_01"}
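When reasoning about cost, it can help to convert the egress rate into a more familiar bandwidth unit. The sketch below assumes the metric value is in bytes per minute, as the description above states. The input figure is made up.

```python
def bytes_per_minute_to_mbit_per_second(bytes_per_minute: float) -> float:
    """Convert an egress rate in bytes per minute to megabits per second."""
    # 8 bits per byte, 60 seconds per minute.
    bits_per_second = bytes_per_minute * 8 / 60
    # 1 megabit = 1,000,000 bits (decimal, as is usual for network rates).
    return bits_per_second / 1_000_000


# Made-up sample: 750 MB of egress in one minute.
example = bytes_per_minute_to_mbit_per_second(750_000_000)
print(f"{example:.1f} Mbit/s")
```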

Runtime metrics

Worker to runtime latency

The round-trip time (RTT) from workers to the SpatialOS runtime over the last five minutes, reported at the 99th, 90th and 50th percentiles.

Use cases:

  • Optimizing for performance and cost.
Aggregated metric
  • Name: spatialos_runtime_worker_latency_seconds::summary5m

  • Labels: project, cluster, dpl, dpl_tag, worker_type, quantile

Example query: spatialos_runtime_worker_latency_seconds::summary5m{project="test_project", dpl="test_deployment", worker_type="UnityClient", quantile="0.90"}

Snapshot metrics

Snapshot count

A counter for the number of snapshots.

Use cases:

  • Alerting when there is a snapshot failure.
Aggregated metric
  • Name: spatialos_snapshot_count::sum

  • Labels: project, cluster, dpl, dpl_tag, outcome={"success"|"failure"}

Example query: spatialos_snapshot_count::sum{project="test_project", dpl="test_deployment", outcome="failure"}
