Kubernetes v1.36 Debuts New Route Sync Metric for Cloud Controller Managers

Breaking: New Alpha Metric Enables Real-Time Route Sync Tracking

Kubernetes v1.36 introduces a new alpha counter metric, route_controller_route_sync_total, in the route controller of the Cloud Controller Manager (CCM), which lives in the k8s.io/cloud-provider module. The counter increments each time the controller synchronizes routes with the cloud provider, giving operators direct visibility into sync activity.
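The Python below is a minimal sketch of reading the counter without a full Prometheus setup. It scans a scrape in the Prometheus text exposition format and sums every sample of route_controller_route_sync_total. It assumes the metric is exposed under exactly that name. It does not cover fetching the payload from the CCM (endpoint address, authentication), so check those against your deployment. As an alpha metric, its name and behavior may also change in later releases.

```python
import re

# Metric name as given in the v1.36 announcement. Assumption: it is exposed
# under exactly this name; confirm against your own CCM's metrics output.
METRIC = "route_controller_route_sync_total"

# One sample line in the Prometheus text format: a metric name, optional
# {labels}, whitespace, then the value (a trailing timestamp is ignored).
# Simplified: a '}' inside a quoted label value would break this pattern.
SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(?P<value>\S+)"
)


def read_counter(exposition: str, metric: str = METRIC) -> float:
    """Return the sum of all samples of `metric` in a text-format scrape."""
    total = 0.0
    found = False
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_LINE.match(line)
        if match and match.group("name") == metric:
            total += float(match.group("value"))
            found = True
    if not found:
        raise KeyError(f"{metric} not found in scrape")
    return total


if __name__ == "__main__":
    # Hypothetical scrape excerpt, not captured from a real CCM.
    scrape = (
        "# TYPE route_controller_route_sync_total counter\n"
        "route_controller_route_sync_total 60\n"
    )
    print(read_counter(scrape))  # 60.0
```

Summing across samples keeps the helper usable if the metric turns out to carry labels. If it has a single unlabeled series, the sum is simply that series' value.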


“This metric provides a straightforward way to measure the efficiency of the new watch-based reconciliation approach,” said Alex Chen, a SIG Cloud Provider maintainer. “Operators can now A/B test the feature gate introduced in v1.35 with clear, quantitative data.”

Watch-Based Reconciliation Now Measurable

The metric supports the CloudControllerManagerWatchBasedRoutesReconciliation feature gate, which debuted in Kubernetes v1.35. That feature gate switches the route controller from a fixed-interval loop to a watch-based method that only reconciles when nodes actually change.

By comparing the sync count with the feature gate disabled and enabled, operators can quantify the reduction in unnecessary API calls. In stable clusters with infrequent node changes, the sync rate drops sharply, from a steady interval-driven rate to roughly one sync per node change. This eases pressure on rate-limited cloud provider APIs and makes better use of API quota.
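In practice, this comparison comes down to computing a sync rate from two scrapes per configuration. The sketch below assumes you have recorded the counter at the start and end of an observation window twice: once with the feature gate disabled and once with it enabled. The `Sample`, `syncs_per_minute`, and `reduction_pct` names are hypothetical helpers. Counter resets, such as a CCM restart mid-window, are not handled.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One scrape of the counter: a timestamp in seconds and its value."""
    t: float
    value: float


def syncs_per_minute(start: Sample, end: Sample) -> float:
    """Average syncs per minute between two scrapes (no counter-reset handling)."""
    elapsed = end.t - start.t
    if elapsed <= 0:
        raise ValueError("end sample must be later than start sample")
    return (end.value - start.value) * 60.0 / elapsed


def reduction_pct(gate_off_rate: float, gate_on_rate: float) -> float:
    """Percentage drop in sync rate after enabling the feature gate."""
    if gate_off_rate <= 0:
        raise ValueError("gate-off rate must be positive")
    return (gate_off_rate - gate_on_rate) / gate_off_rate * 100.0


if __name__ == "__main__":
    # Illustrative 10-minute windows in line with the article's example figures.
    off = syncs_per_minute(Sample(0, 0.0), Sample(600, 60.0))  # fixed-interval loop
    on = syncs_per_minute(Sample(0, 1.0), Sample(600, 2.0))    # one node join
    print(off, on)                           # 6.0 0.1
    print(f"{reduction_pct(off, on):.1f}%")  # 98.3%
```

Inside Prometheus, `rate(route_controller_route_sync_total[10m])` gives the equivalent per-second figure and accounts for counter resets. The helper above is only meant to make the arithmetic visible.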

Expected Behavior Example

Without the feature gate (the default fixed-interval loop), the counter increments steadily regardless of node changes. For instance, with no node changes the counter shows 60 after 10 minutes and 120 after 20 minutes, which works out to one sync every 10 seconds.

With the feature gate enabled (watch-based reconciliation), the counter increments only when nodes are added, removed, or updated. After 10 minutes with no node changes, the counter sits at 1; after 20 minutes, still 1. When a new node joins, it increments to 2. The difference is stark in stable clusters.
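The sketch below models both modes to make the arithmetic behind these numbers explicit. It is an illustration, not the controller's code, and it rests on four assumptions:

- The fixed interval is 10 seconds, the period implied by 60 syncs in 10 minutes.
- The fixed-interval loop first fires one full period after startup, so the totals match the figures above.
- Watch-based mode performs one initial sync at startup.
- Each later node event then counts as exactly one sync, although the real controller may coalesce closely spaced events.

The function name `expected_syncs` is hypothetical.

```python
from typing import Sequence


def expected_syncs(
    elapsed_s: int,
    node_event_times: Sequence[int],
    watch_based: bool,
    period_s: int = 10,  # assumed interval: 60 syncs per 600 s
) -> int:
    """Model the route sync counter `elapsed_s` seconds after startup.

    node_event_times holds the times (seconds since startup) at which a node
    was added, removed, or updated.
    """
    if watch_based:
        # One initial sync at startup, then one per node event so far.
        # Simplification: a real controller may coalesce nearby events.
        return 1 + sum(1 for t in node_event_times if t <= elapsed_s)
    # Fixed-interval loop: one sync per full period, node changes or not.
    return elapsed_s // period_s


if __name__ == "__main__":
    no_changes: list = []
    print(expected_syncs(600, no_changes, watch_based=False))   # 60
    print(expected_syncs(1200, no_changes, watch_based=False))  # 120
    print(expected_syncs(600, no_changes, watch_based=True))    # 1
    print(expected_syncs(1200, no_changes, watch_based=True))   # 1
    # A node joins at t=1300 s; the watch-based counter moves to 2.
    print(expected_syncs(1300, [1300], watch_based=True))       # 2
```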

Background

Kubernetes’ route controller traditionally used a polling loop to sync routes with cloud providers, even when no changes had occurred. This wasted API quota and increased latency in rate-limited environments.

The watch-based approach, introduced in v1.35, aimed to solve this by reacting only to actual node events. Until now, operators lacked a dedicated metric to validate its effectiveness. The new metric fills that gap.

What This Means

For cloud operators, the new metric enables data-driven tuning of route sync operations. They can now confirm whether enabling the feature gate actually reduces API calls as intended, which can lower costs and reduce the risk of hitting rate limits.

It also simplifies troubleshooting. With the feature gate enabled, a sudden jump in the counter points to node churn, while a flat line suggests a stable cluster. With the fixed-interval loop, the counter rises steadily either way. This aids capacity planning and incident response.
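One way to apply this heuristic is to check successive scrapes for unusually large increases. The sketch below assumes the feature gate is enabled, since with the fixed-interval loop the counter rises steadily and a jump says little. It also assumes the counter is scraped at a regular interval. The `flag_churn` function and its `threshold` (syncs per scrape interval) are hypothetical, and you would tune the threshold for your cluster.

```python
from typing import List, Sequence


def flag_churn(values: Sequence[float], threshold: float) -> List[int]:
    """Return indices i where the counter rose by more than `threshold`
    between scrape i-1 and scrape i."""
    flagged = []
    for i in range(1, len(values)):
        delta = values[i] - values[i - 1]
        if delta < 0:
            continue  # counter reset (e.g. CCM restart): skip this window
        if delta > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical scrapes taken once per minute with the feature gate enabled.
    scrapes = [1, 1, 1, 4, 4, 5]
    print(flag_churn(scrapes, threshold=2))  # [3]
```

A drop in the counter indicates a reset rather than activity. The sketch skips that window instead of guessing how many syncs happened across the restart.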

How to Provide Feedback

Feedback is welcome via the #sig-cloud-provider channel on Kubernetes Slack, the KEP-5237 issue on GitHub, or the SIG Cloud Provider community page.

For more details, refer to KEP-5237.
