Version: Nightly

Overview

Autopilot is a GreptimeDB Enterprise capability that automatically optimizes cluster load and data distribution. It runs in Metasrv, continuously collects write statistics from Datanodes and Regions, and submits scheduling actions when the configured conditions are met. This reduces the operational cost of identifying hotspots and manually adjusting the cluster.

Autopilot currently includes the following capabilities:

Region Balancer: Automatically migrates hot Regions to balance write load across Datanodes.
Auto Repartition: Automatically splits large Regions into smaller Regions to prevent a single large Region from becoming a performance bottleneck. The resulting Regions can then be scheduled across multiple Datanodes, spreading load that would otherwise concentrate on one Region.

How it works

Autopilot consists of a shared runtime, shared cluster statistics, scheduling strategies, and executors:

Runtime: Triggers a scheduling cycle at a fixed interval.
Cluster statistics: Collects Region write statistics from Datanode heartbeats and smooths short-term fluctuations.
Scheduling strategies: Decide whether to move Regions or split large Regions based on the collected statistics.
Executors: Submit actions generated by the strategies, such as Region Migration or Repartition.

When both Region Balancer and Auto Repartition are enabled, they share the same Autopilot runtime and cluster statistics.
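
The component roles above can be sketched as a single scheduling cycle. This is a minimal Python illustration, not GreptimeDB code: `Action`, `run_tick`, `toy_balancer`, and `toy_repartition` are hypothetical names, and the placeholder rules and the `100.0` threshold are invented for the example. The point it shows is that all enabled strategies read the same statistics snapshot and hand their actions to one executor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Action:
    kind: str        # "migrate" or "repartition" (illustrative labels only)
    region_id: int


# Region id -> smoothed write rate (hypothetical representation of the stats).
Stats = Dict[int, float]
Strategy = Callable[[Stats], List[Action]]


def run_tick(stats: Stats, strategies: List[Strategy],
             submit: Callable[[Action], None]) -> int:
    """Run one scheduling cycle and return the number of submitted actions.

    Every strategy receives the same statistics snapshot, and every action
    it proposes is passed to the executor callback.
    """
    submitted = 0
    for strategy in strategies:
        for action in strategy(stats):
            submit(action)
            submitted += 1
    return submitted


def toy_balancer(stats: Stats) -> List[Action]:
    # Placeholder rule (assumption): propose moving the busiest Region.
    if not stats:
        return []
    hottest = max(stats, key=stats.get)
    return [Action("migrate", hottest)]


def toy_repartition(stats: Stats) -> List[Action]:
    # Placeholder rule (assumption): propose splitting Regions above 100.0.
    return [Action("repartition", rid)
            for rid, rate in sorted(stats.items()) if rate > 100.0]


submitted_actions: List[Action] = []
snapshot = {1: 20.0, 2: 150.0, 3: 40.0}
count = run_tick(snapshot, [toy_balancer, toy_repartition],
                 submitted_actions.append)
```

In the real system the runtime repeats this cycle every `tick_interval`. The sketch runs it once so the flow from statistics to submitted actions is easy to follow.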

When to use Autopilot

Autopilot is useful in the following scenarios:

Some Datanodes have a write load that remains significantly higher than others.
Some large Regions may become performance bottlenecks.
You want to reduce the operational cost of manually identifying load bottlenecks and running Region Migration or Repartition.

Limitations

Each Autopilot strategy has its own limitations:

Region Balancer requires the number of schedulable Regions to be greater than the number of active Datanodes. Otherwise, moving Regions cannot distribute the load evenly across Datanodes.
Auto Repartition only works for partitioned tables: it can split only tables that already have partition rules, and it does not generate partition rules for tables that lack them. For more information about table partitioning and Repartition, see Table Sharding and Repartition.
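
The intuition behind the Region Balancer requirement can be checked with a small brute-force experiment. The sketch below is not the balancer's algorithm. `best_spread` is a hypothetical helper that tries every placement of a few Regions on a few Datanodes and reports the smallest achievable gap between the busiest and the idlest node. The load values are made up.

```python
from itertools import product
from typing import List


def best_spread(region_loads: List[float], datanodes: int) -> float:
    """Smallest gap between the busiest and idlest Datanode over all
    possible placements (brute force, suitable for tiny inputs only)."""
    best = float("inf")
    for placement in product(range(datanodes), repeat=len(region_loads)):
        node_loads = [0.0] * datanodes
        for load, node in zip(region_loads, placement):
            node_loads[node] += load
        best = min(best, max(node_loads) - min(node_loads))
    return best


# Two Regions on two Datanodes: no placement closes the gap.
few = best_spread([50.0, 30.0], datanodes=2)

# Three Regions on two Datanodes: 50 on one node, 30 + 20 on the other.
more = best_spread([50.0, 30.0, 20.0], datanodes=2)
```

With as many Regions as Datanodes, the best possible gap stays at 20. With one more Region, the same total write load can be split evenly. This is why migration alone helps only when there are more schedulable Regions than active Datanodes.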

Configuration

Autopilot configuration includes shared configuration and strategy-specific configuration:

plugins.autopilot: Configures the Autopilot runtime.
plugins.cluster_stat: Configures sampling and smoothing for Datanode and Region write statistics.
plugins.region_balancer: Enables and configures Region Balancer.
plugins.auto_repartition: Enables and configures Auto Repartition.

The following example enables both Region Balancer and Auto Repartition:

[[plugins]]
[plugins.autopilot]
tick_interval = "45s"

[[plugins]]
[plugins.cluster_stat]
sampling_window = "45s"
max_history_windows = 5
ewma_alpha = 0.2

[[plugins]]
[plugins.region_balancer]
acceptable_load_ratio = 0.12
min_load_threshold = "4MB"
region_migration_cooldown_period = "1h"
window_stability_threshold = 2

[[plugins]]
[plugins.auto_repartition]
split_trigger_ratio = 1.8
max_split_parts = 3
table_repartition_cooldown_period = "60s"
max_actions_per_tick = 4
max_actions_per_table_per_tick = 2

If you only need one strategy, configure only plugins.region_balancer or plugins.auto_repartition.

Autopilot runtime configuration

Option	Default	Description
`tick_interval`	`"45s"`	The interval of Autopilot scheduling cycles. A shorter interval reacts faster to load changes but may increase scheduling overhead.

Cluster statistics configuration

Option	Default	Description
`sampling_window`	`"45s"`	The duration of each statistics window. A larger window smooths short-term fluctuations but reacts more slowly.
`max_history_windows`	`5`	The number of historical statistics windows to keep. Region Balancer and Auto Repartition use historical windows to determine whether load is stable.
`ewma_alpha`	`0.2`	The EWMA smoothing factor. A larger value gives more weight to recent observations. A smaller value makes the statistics smoother.
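
To make the effect of `ewma_alpha` concrete, the sketch below applies the textbook EWMA recurrence, `s_t = alpha * x_t + (1 - alpha) * s_{t-1}`, to a short series with one spike. Treat it as an assumption-laden illustration: the source does not state how Autopilot seeds the average or which exact form it uses, and the `ewma` function and the sample values are hypothetical.

```python
from typing import Iterable, List


def ewma(samples: Iterable[float], alpha: float) -> List[float]:
    """Exponentially weighted moving average, seeded with the first sample."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    prev = None
    for x in samples:
        # The first sample seeds the average; later samples are blended in.
        prev = x if prev is None else alpha * x + (1.0 - alpha) * prev
        smoothed.append(prev)
    return smoothed


# One write-rate sample per statistics window, with a spike in the third.
rates = [10.0, 10.0, 50.0, 10.0, 10.0]

smooth = ewma(rates, alpha=0.2)    # the documented default
reactive = ewma(rates, alpha=0.8)  # weights recent windows more heavily
```

With `alpha=0.2` the spike of 50 raises the smoothed value only to 18. With `alpha=0.8` it rises to 42. The larger factor tracks recent load closely, while the smaller one damps short bursts so that a brief spike is less likely to look like sustained load.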

Next steps

To automatically balance write load across Datanodes, see Region Balancer.
To automatically split large Regions, see Auto Repartition.
