These are the docs for 14.0, an old version of SpatialOS. The docs for this version are frozen: we do not correct, update or republish them. 14.2 is the newest →

Metrics reference

When you run a deployment, SpatialOS collects metrics on it, which you can use to monitor the deployment’s health and status. This page lists the metrics that are collected and explains what you can use them for.

This page also contains information on how to query the metrics using the Prometheus query syntax.

You can query the metrics through code, or on your own analytics platform.

Metric detail levels

Metric values are retrieved roughly every 15 seconds, and stored at two levels of granularity:

  • Aggregated metrics: These are kept for 9 days for deployments with the alpha, beta or prod tags. Otherwise they are kept for 1 day.

  • Detailed metrics (alpha): These carry additional labels that allow much finer-grained querying of the data, which makes them particularly useful for debugging. Because of their storage impact, they are kept for only 30 minutes after they’ve been collected.

For general monitoring of a deployment, use the aggregated metrics. For detailed investigation, use the detailed metrics (alpha).
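As a rough illustration of the retention rules above, the following sketch works out how long a sample is kept and roughly how many samples fit into that window. The function and constant names are illustrative and not part of any SpatialOS API; the numbers come from the text above.

```python
from datetime import timedelta

# Values taken from the text above; all names here are illustrative.
SAMPLE_INTERVAL = timedelta(seconds=15)  # "roughly every 15 seconds"
LONG_RETENTION_TAGS = {"alpha", "beta", "prod"}


def retention(deployment_tags, detailed=False):
    """Return how long a metric sample is kept, per the rules described above."""
    if detailed:
        return timedelta(minutes=30)
    if LONG_RETENTION_TAGS & set(deployment_tags):
        return timedelta(days=9)
    return timedelta(days=1)


def approx_samples(window):
    """Approximate number of samples in a window at the ~15 s collection interval."""
    return int(window / SAMPLE_INTERVAL)


print(retention(["prod"]))                           # 9 days, 0:00:00
print(approx_samples(retention([], detailed=True)))  # 120
```

In other words, a detailed metric gives you on the order of 120 data points per series before the data expires, so capture what you need while the issue is still recent.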

Worker metrics

Login outcome

A counter for the outcome of connection attempts to the SpatialOS Runtime using the connect methods provided by the Worker SDK (C#/C++).

Useful for:

  • Checking how many connection attempts have been made.
  • Checking how many attempts have been rejected because of rate or capacity limits.
Aggregated metric

Metric name: spatialos_login_outcome::sum

Label Description
project The name of the project (for example test_project). This label is mandatory.
dpl The name of the deployment (for example test_deployment).
worker_type The name of the worker type (for example MyCSharpWorker).
outcome The outcome of the login request. Possible values: SUCCESS, JOIN_RATE_EXCEEDED, CAPACITY_EXCEEDED.

Example query

spatialos_login_outcome::sum{project="test_project", dpl="test_deployment", worker_type="MyCSharpWorker"}
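If you build queries in code, a small helper can assemble the selector string from a metric name and a set of labels. The sketch below is one possible way to do this in Python; `build_selector` is a hypothetical helper, not part of the Worker SDK or any SpatialOS tooling, and it does not escape special characters in label values.

```python
def build_selector(metric_name, labels):
    """Format a metric name and label matchers as a Prometheus-style selector."""
    if "project" not in labels:
        # The label tables on this page mark `project` as mandatory.
        raise ValueError("the 'project' label is mandatory")
    matchers = ", ".join(f'{key}="{value}"' for key, value in labels.items())
    return f"{metric_name}{{{matchers}}}"


query = build_selector(
    "spatialos_login_outcome::sum",
    {
        "project": "test_project",
        "dpl": "test_deployment",
        "worker_type": "MyCSharpWorker",
    },
)
print(query)
```

This prints the same string as the example query above. Adding an `outcome` entry to the dictionary would narrow the result to a single login outcome.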

Worker connected

A gauge for the number of worker instances connected to the SpatialOS Runtime.

Useful for:

  • Checking how many players are logged in.
  • Checking if the correct number of managed workers are running.
Aggregated metric

Metric name: spatialos_worker_connected::sum

Label Description
project The name of the project (for example test_project). This label is mandatory.
cluster The name of the cluster which the deployment is running in (for example eu1-prod).
dpl The name of the deployment (for example test_deployment).
dpl_tag The deployment stage of the deployment, as set in the SpatialOS Console. Possible values: beta, alpha, prod.
worker_type The name of the worker type (for example MyCSharpWorker).

Example query

spatialos_worker_connected::sum{project="test_project", dpl="test_deployment", dpl_tag="prod", worker_type="MyCSharpWorker"}

Worker update

The worker operation update rate in the last minute for each worker platform.

  • Use “update_size_bytes” metrics for bandwidth
  • Use “update_messages” metrics for messages sent

Use the detailed metric (alpha) (spatialos_worker_update_size_bytes:rate1m) to check updates per component type.

Useful for:

  • Optimising for performance and cost.
Aggregated metric

Metric names: spatialos_worker_update_size_bytes::rate1m, spatialos_worker_update_messages::rate1m

Label Description
project The name of the project (for example test_project). This label is mandatory.
cluster The name of the cluster which the deployment is running in (for example eu1-prod).
dpl The name of the deployment (for example test_deployment).
dpl_tag The deployment stage of the deployment, as set in the SpatialOS Console. Possible values: beta, alpha, prod.
worker_type The name of the worker type (for example MyCSharpWorker).
direction The direction of the message (egress or ingress). Possible values: from_worker, to_worker.

Example query

spatialos_worker_update_size_bytes::rate1m{project="test_project", dpl="test_deployment", dpl_tag="prod", worker_type="MyCSharpWorker", direction="from_worker"}
Detailed metric (alpha)
  • Metric names: spatialos_worker_update_size_bytes:rate1m, spatialos_worker_update_messages:rate1m
Label Description
project The name of the project (for example test_project). This label is mandatory.
cluster The name of the cluster which the deployment is running in (for example eu1-prod).
dpl The name of the deployment (for example test_deployment).
dpl_tag The deployment stage of the deployment, as set in the SpatialOS Console. Possible values: beta, alpha, prod.
worker_type The name of the worker type (for example MyCSharpWorker).
direction The direction of the message (egress or ingress). Possible values: from_worker, to_worker.
component_type The fully-qualified name of a component as defined in the schema (for example player.Health).

Example query:

spatialos_worker_update_size_bytes:rate1m{project="test_project", dpl="test_deployment", dpl_tag="prod", worker_type="MyCSharpWorker", direction="from_worker", component_type="player.Health"}

Worker CPU Usage (alpha)

A gauge for the average, minimum, and maximum CPU usage across a given worker type. Usage is reported as a ratio of the cores used to the total available cores, so 1.0 represents 100% utilization of all available cores.

For example, assume you’ve got only one instance of only one worker type running on a given node that has 4 CPU cores, and nothing else is running on the node. If this worker instance is using 50% of a single core, this metric reports 0.125 CPU usage across all aggregations: avg, min, and max. The metric is calculated as follows: 1 represents 4 cores, so 1 core is 0.25 and 50% of 1 core is 0.125.
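The arithmetic in this example can be written out as a short sketch. The function name is illustrative; only the numbers come from the example above.

```python
def cpu_usage_ratio(cores_used, total_cores):
    """Cores in use divided by cores available on the node; 1.0 means every core is fully used."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    return cores_used / total_cores


# One worker instance using 50% of one core on a 4-core node.
ratios = [cpu_usage_ratio(0.5, 4)]

# With a single instance, the three aggregations coincide.
avg, lowest, highest = sum(ratios) / len(ratios), min(ratios), max(ratios)
print(avg, lowest, highest)  # 0.125 0.125 0.125
```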

Useful for:

  • Determining whether any worker instances of a given worker type are significantly overloaded or underloaded, by comparing the minimum and maximum usage with the average usage. If the maximum of a single worker instance is far above the average and minimum of the other worker instances of the same worker type, this worker instance might be having performance issues.
  • If you have only one single instance of a worker type, this metric indicates the actual CPU usage of that instance. Actual values are the same across avg, min, and max.

To troubleshoot a performance issue that you have identified, you can use SSH/TCP port forwarding to attach a profiler or debugger to the worker instance that you suspect of having performance issues.

Aggregated metrics
  • Name: spatialos_worker_cpu_usage_ratio::avg
  • Name: spatialos_worker_cpu_usage_ratio::min
  • Name: spatialos_worker_cpu_usage_ratio::max

  • Labels: project, cluster, dpl, dpl_tag, worker_type

Example query: spatialos_worker_cpu_usage_ratio::avg{project="test_project",dpl="test_deployment",worker_type="UnrealWorker"}

Worker Memory Usage (alpha)

A gauge for the minimum, average, and maximum memory usage across a given worker type. The memory usage is measured as the resident set size (RSS) and is reported in bytes.

Useful for:

  • Determining whether any worker instances are using significantly less or more memory than others of the same type, by comparing the minimum and maximum usage to the average. If the maximum of a single worker instance is far above the average and minimum of all the other worker instances of the same worker type, this worker instance might be having memory issues.
  • If you have only one single instance of a worker type, this metric indicates the actual memory usage of that instance. Actual values are the same across avg, min, and max.
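The comparison described above could be automated along the following lines. This is only a sketch: the function name and the `factor` threshold are hypothetical, and you would need to choose a threshold that suits your own worker types.

```python
GIB = 1024 ** 3


def max_is_outlier(avg_bytes, min_bytes, max_bytes, factor=2.0):
    """Return True when the maximum is far above the average for a worker type.

    `factor` is a hypothetical threshold: the maximum counts as an outlier
    when it exceeds `factor` times the average.
    """
    if avg_bytes <= 0 or min_bytes == max_bytes:
        # No data, or a single instance (avg, min and max are all the same).
        return False
    return max_bytes > factor * avg_bytes


# Most instances use about 1 GiB, but one instance uses 4 GiB.
print(max_is_outlier(1.5 * GIB, 1 * GIB, 4 * GIB))  # True
# A single instance: all three values are identical.
print(max_is_outlier(2 * GIB, 2 * GIB, 2 * GIB))    # False
```

The three inputs correspond to the ::avg, ::min and ::max metrics listed below, queried for the same worker type.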

To troubleshoot a memory issue that you have identified, you can use SSH/TCP port forwarding to attach a profiler or debugger to the worker instance that you suspect of having memory issues.

Aggregated metrics

  • Name: spatialos_worker_memory_bytes::avg
  • Name: spatialos_worker_memory_bytes::min
  • Name: spatialos_worker_memory_bytes::max

  • Labels: project, cluster, dpl, dpl_tag, worker_type

Example query: spatialos_worker_memory_bytes::avg{project="test_project",dpl="test_deployment",worker_type="UnrealWorker"}

Node metrics

Node up

A gauge for the number of nodes that are exporting metrics. Use the detailed metric (alpha) to break down the value by node category (the node_cat label).

Useful for:

  • Setting up alerts if nodes are not all up.
Aggregated metric
  • Name: spatialos_node_up::sum

  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_node_up::sum{project="test_project", dpl="test_deployment", dpl_tag="prod"}

Detailed metric (alpha)
  • Name: spatialos_node_up:sum

  • Labels: project, dpl, dpl_tag, node_cat

Example query: spatialos_node_up:sum{project="test_project", dpl="test_deployment", dpl_tag="prod", node_cat="master"}

Node CPU usage ratio

Useful for:

  • Optimising for performance and cost.
Aggregated metric
  • Name: spatialos_node_cpu_used::max_ratio
    A gauge for the highest ratio of CPU cores used per total available CPU cores (i.e. the CPU cores available for user code) across nodes.

  • Labels: project, cluster, dpl, dpl_tag, node_cat

Example query: spatialos_node_cpu_used::max_ratio{project="test_project", dpl="test_deployment", node_cat="gsimbridge"}

Detailed metric (alpha)
  • Name: spatialos_node_cpu_used:ratio
    A gauge for the ratio of CPU cores used per total available CPU cores (i.e. the CPU cores available for user code).

  • Labels: project, dpl, dpl_tag, node, node_cat

Example query: spatialos_node_cpu_used:ratio{project="test_project", dpl="test_deployment", node="gsimbridge02"}

Node CPU core usage

A gauge for the maximum CPU usage of a single core on a worker node.

For example, if a single core on a node is fully utilized, this metric reports 1.0. If the highest per-core utilization on that node is 50%, this metric reports 0.5.

This metric is supported only for worker nodes.

Useful for:

  • Determining whether work is being distributed across cores on a worker node, by comparing the usage of the most utilized core (spatialos_node_cpucore_used:rate1m_max) with the average core usage (spatialos_node_cpu_used:ratio). You can use this metric to understand whether a process on a node (for example, a worker instance) might be bottlenecked on a single core.
Detailed metric (alpha)
  • Name: spatialos_node_cpucore_used:rate1m_max
    A gauge for usage (seconds/second) of the most utilized core.

  • Labels: project, dpl, dpl_tag, node

Example query: spatialos_node_cpucore_used:rate1m_max{project="test_project", dpl="test_deployment", node="gsimbridge02"}
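The comparison between the most utilized core and the average core usage, described under "Useful for" above, might look like this in code. The function name and both thresholds (`busy` and `gap`) are hypothetical values chosen for illustration only.

```python
def looks_single_core_bound(max_core_usage, avg_core_usage, busy=0.9, gap=0.4):
    """Return True when one core is nearly saturated while the node as a whole is not.

    max_core_usage: value of spatialos_node_cpucore_used:rate1m_max for the node
    avg_core_usage: value of spatialos_node_cpu_used:ratio for the same node
    busy, gap:      hypothetical thresholds; tune them for your deployment
    """
    return max_core_usage >= busy and (max_core_usage - avg_core_usage) >= gap


# One core pinned at 100% on a node that is only 25% utilized overall.
print(looks_single_core_bound(1.0, 0.25))  # True
# Load spread evenly across cores.
print(looks_single_core_bound(0.5, 0.45))  # False
```

A True result suggests that a single-threaded process, such as a worker instance, may be limited by one core even though the node has spare capacity.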

Memory usage ratio

Useful for:

  • Optimising for performance and cost.

  • Detecting memory leaks.

Aggregated metric
  • Name: spatialos_node_memory_used::max_ratio
    A gauge for the highest ratio of memory used per total available memory across nodes.

  • Labels: project, cluster, dpl, dpl_tag, node_cat

Example query: spatialos_node_memory_used::max_ratio{project="test_project", dpl="test_deployment", node_cat="fsim"}

Detailed metric (alpha)
  • Name: spatialos_node_memory_used:ratio
    A gauge for the ratio of memory used per total available memory.

  • Labels: project, dpl, dpl_tag, node, node_cat

Example query: spatialos_node_memory_used:ratio{project="test_project", dpl="test_deployment", node="fsim_01"}

Disk space available

A gauge for the number of bytes available on the filesystem root (/).

  • Name: spatialos_node_filesystem_available_bytes::sum
  • Labels: project, dpl, dpl_tag, node, node_cat

Useful for:

  • Setting up alerts for disk space usage

Example query: spatialos_node_filesystem_available_bytes::sum{project="test_project", dpl="test_deployment", node="workers"}

Logging metrics

Log rate

A rate for the number of error or warning logs.

Useful for:

  • Triggering alerts if the error rate is too high
Aggregated metric
  • Name: spatialos_logging_logs::rate1m

  • Labels: project, cluster, dpl, dpl_tag, level={"ERROR"|"WARN"}

Example query: spatialos_logging_logs::rate1m{project="test_project", dpl="test_deployment", level="ERROR"}

Entity metrics

Entity count

A gauge for the number of entities.

Useful for:

  • Debugging peaks or drops of entity counts in your deployment.
  • Designing your game and tweaking its mechanics, for example: “Are there too many or too few entities of a given kind?”
Aggregated metric
  • Name: spatialos_entity_count::sum
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_entity_count::sum{project="test_project", dpl="test_deployment"}

Entities created

A rate of entities created per minute.

Useful for:

  • Debugging spikes of entities created.
Aggregated metric
  • Name: spatialos_entity_created::rate1m
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_entity_created::rate1m{project="test_project", dpl="test_deployment"}

Entities deleted

The rate of entities deleted per minute.

Useful for:

  • Debugging spikes of entities deleted.
Aggregated metric
  • Name: spatialos_entity_deleted::rate1m
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_entity_deleted::rate1m{project="test_project", dpl="test_deployment"}

Entity write access authority changes

The rate at which the write access authority of entities changes.

Useful for:

  • Debugging spikes in the frequency of entities crossing worker boundaries.
Aggregated metric
  • Name: spatialos_authority_changes::rate1m
  • Labels: project, dpl, dpl_tag, outcome

Example query: spatialos_authority_changes::rate1m{project="test_project", dpl="test_deployment", outcome="failure"}

Command metrics

Command count

The rate of commands sent per minute. The status label values are defined on the API reference pages: C# and C++.

Useful for:

  • Alerting and debugging spikes or drops in commands sent.
  • Optimising for performance and cost.
Aggregated metric
  • Name: spatialos_command_count::rate1m
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_command_count::rate1m{project="test_project", dpl="test_deployment"}

Detailed metric (alpha)
  • Name: spatialos_command_count:rate1m

  • Labels: project, dpl, dpl_tag, component_type={"player.test"|"SYSTEM"}, command_type={"USER_DEFINED"|"CREATE_ENTITY_REQUEST"|"REMOVE_ENTITY_REQUEST"|…}, status

Example query: spatialos_command_count:rate1m{project="test_project", dpl="test_deployment", component_type="player.Health", command_type="USER_DEFINED"}

Command latency

The latency of commands over the last five minutes, at the 99th, 90th and 50th percentiles. Latency is measured from the SpatialOS Runtime receiving the command request to the Runtime receiving the command response. The status label values are defined on the API reference pages: C# and C++. Latency is capped at 1 second, so any command taking longer than this is reported as taking 1s.

Useful for:

  • Alerting on abnormal latency in a deployment.
  • Debugging latency for certain components.
  • Optimising for performance and cost.
Aggregated metric
  • Name: spatialos_command_latency_seconds::summary5m
  • Labels: project, cluster, dpl, dpl_tag, quantile

Example query: spatialos_command_latency_seconds::summary5m{project="test_project", dpl_tag="prod", quantile="0.95"}

Detailed metric (alpha)
  • Name: spatialos_command_latency_seconds:summary5m
  • Labels: project, dpl, dpl_tag, quantile, component_type={"player.test"|"SYSTEM"}, command_type={"USER_DEFINED"|"CREATE_ENTITY_REQUEST"|"REMOVE_ENTITY_REQUEST"|…}, status

Example query: spatialos_command_latency_seconds:summary5m{project="test_project", dpl="test_deployment", quantile="0.95", component_type="player.Health", command_type="USER_DEFINED"}

Network metrics

Network egress rate

The rate of total network egress (traffic going out of the cloud) bytes per minute.

Useful for:

  • Optimising for performance and cost.
Aggregated metric
  • Name: spatialos_network_egress_bytes::rate1m
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_network_egress_bytes::rate1m{project="test_project", dpl="test_deployment"}

Detailed metric (alpha)
  • Name: spatialos_network_egress_bytes:rate1m
  • Labels: project, dpl, dpl_tag, node

Example query: spatialos_network_egress_bytes:rate1m{project="test_project", dpl="test_deployment", node="worker_01"}
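For cost estimates it can help to convert the egress rate into more familiar bandwidth units. The sketch below takes the description above at face value and assumes the value is in bytes per minute; the function name is illustrative. If the values you observe are actually per-second rates, drop the division by 60.

```python
def bytes_per_minute_to_mbit_per_second(bytes_per_minute):
    """Convert an egress rate in bytes/minute to megabits/second (1 Mbit = 1,000,000 bits)."""
    bits_per_second = bytes_per_minute * 8 / 60
    return bits_per_second / 1_000_000


# 750 MB of egress in one minute is a sustained 100 Mbit/s.
print(bytes_per_minute_to_mbit_per_second(750_000_000))  # 100.0
```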

Runtime metrics

Operations

Operations (ops) are messages carrying information between client-worker and server-worker instances and SpatialOS. The size of an op is measured in operation units.

Useful for:

  • Optimizing for performance and cost.
Aggregated metric
  • Name: spatialos_worker_ops::rate1m
Label Description
project The name of the project (for example test_project). This label is mandatory.
dpl The name of the deployment (for example test_deployment).
dpl_tag The deployment stage of the deployment, as set in the SpatialOS Console. Possible values: beta, alpha, prod.
worker_type The name of the worker type (for example MyCSharpWorker).
direction Whether the operation was sent to or from a worker instance. Possible values: to_worker, from_worker.

Example query: spatialos_worker_ops::rate1m{project="test_project", dpl="test_deployment", dpl_tag="prod", worker_type="PhysicsWorker", direction="from_worker"}

Operation units

Every operation counts towards one or more operation units (op units) based on its payload size. The total op units per second across the whole deployment is the total amount of information SpatialOS is synchronizing. Your choice of game template specifies an upper limit on how many op units your deployment can handle.

Aggregated metric
  • Name: spatialos_worker_op_units::rate1m
Label Description
project The name of the project (for example test_project). This label is mandatory.
dpl The name of the deployment (for example test_deployment).
dpl_tag The deployment stage of the deployment, as set in the SpatialOS Console. Possible values: beta, alpha, prod.
worker_type The name of the worker type (for example MyCSharpWorker).
direction Whether the operation was sent to or from a worker instance. Possible values: to_worker, from_worker.

Example query: spatialos_worker_op_units::rate1m{project="test_project", dpl="test_deployment", dpl_tag="prod", worker_type="PhysicsWorker", direction="from_worker"}

Data latency

A measure of the total time taken for the following, in the last five minutes, in 99th, 90th and 50th percentiles:

  • the Runtime sending an op to a worker instance
  • the worker instance responding to that op
  • the Runtime processing the response from the worker instance

It is capped at 10 seconds, so anything taking longer than this will be reported as taking 10s.

Data latency is not the same as network latency. It’s always larger than network latency. It accounts for back-up of ops received by the worker instance from the Runtime, and by the Runtime from the worker instance.

If network latency is normal and data latency is high, it can mean that the worker instance is struggling to keep up with the amount of data sent to it.

This data is coarsely updated; that is, updated every few seconds, rather than every time the worker instance sends an op.
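The interpretation rules above can be summarised in a short sketch. The 10-second cap comes from this page; the function name and the two thresholds for what counts as "high" data latency and "normal" network latency are hypothetical and would need tuning for your game.

```python
DATA_LATENCY_CAP_SECONDS = 10.0  # cap described above


def interpret_data_latency(data_latency_s, network_latency_s,
                           high_data_s=1.0, normal_network_s=0.2):
    """Give a rough reading of a data latency sample.

    high_data_s and normal_network_s are hypothetical thresholds.
    """
    if data_latency_s >= DATA_LATENCY_CAP_SECONDS:
        return "at cap: the real latency is 10 s or more"
    if data_latency_s >= high_data_s and network_latency_s <= normal_network_s:
        return "worker instance may be struggling to keep up with incoming data"
    return "no obvious backlog"


print(interpret_data_latency(10.0, 0.05))
print(interpret_data_latency(2.5, 0.05))
print(interpret_data_latency(0.1, 0.05))
```

Because the metric is capped, a reported value of 10 s only tells you that the real latency was at least that long.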

Useful for:

  • Optimising for performance and cost.
Aggregated metric
  • Name: spatialos_runtime_worker_latency_seconds::summary5m

  • Labels: project, cluster, dpl, dpl_tag, worker_type, quantile

Example query: spatialos_runtime_worker_latency_seconds::summary5m{project="test_project", dpl="test_deployment", worker_type="MyCSharpClient", quantile="0.90"}

View lateness

The latency for an update anywhere in the system to be reflected in a view at the 50th percentile.

Useful for:

  • Optimising for performance and cost.
Aggregated metric
  • Name: spatialos_runtime_view_lateness_50th_percentile_ms
  • Labels: project, cluster, dpl, dpl_tag

Example query: spatialos_runtime_view_lateness_50th_percentile_ms{project="test_project", dpl="test_deployment"}

Snapshot metrics

Snapshot count

A counter for the number of snapshots.

Useful for:

  • Alerting when there is a snapshot failure.
Aggregated metric
  • Name: spatialos_snapshot_count::sum

  • Labels: project, cluster, dpl, dpl_tag, outcome={"success"|"failure"}

Example query: spatialos_snapshot_count::sum{project="test_project", dpl="test_deployment", outcome="failure"}
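Because this metric is a counter, an alert on snapshot failures usually looks at how much the counter has grown between samples rather than at its absolute value. The sketch below shows one way to compute that increase from a list of sampled values, treating any drop in the value as a counter reset (the same convention that Prometheus uses for counters). The function name is illustrative, and whether this particular metric ever resets is an assumption.

```python
def counter_increase(samples):
    """Total increase across consecutive samples, treating any drop as a counter reset."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # The counter restarted from zero, so everything up to `current` is new.
            total += current
    return total


# Failure counter sampled over time: one new failure, a reset, then two more.
failures = [3, 3, 4, 0, 2]
new_failures = counter_increase(failures)
print(new_failures)  # 3.0
if new_failures > 0:
    print("alert: snapshot failures detected")
```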
