Kubernetes v1.36 Beta: Adjusting Job Resources on the Fly for Suspended Workloads


Introduction

Kubernetes v1.36 promotes the ability to modify container resource requests and limits in the pod template of a suspended Job from alpha to beta. Introduced as alpha in v1.35, this feature lets queue controllers and cluster administrators adjust CPU, memory, GPU, and other extended resource requests and limits on a Job while it is suspended, before it starts or resumes execution. This closes a gap in batch and machine learning workflows, where resource demands are not always known when the Job is created.


Why Mutable Pod Resources for Suspended Jobs?

Batch and machine learning workloads often have resource requirements that depend on current cluster capacity, queue priorities, and the availability of specialized hardware like GPUs. Before this feature, the resource fields in a Job’s pod template were immutable once the Job was created. If a queue controller such as Kueue determined that a suspended Job should run with different resources, the only option was to delete and recreate the Job. That approach discarded the Job’s metadata, status, and history, which made it expensive and disruptive.

This feature offers a more graceful path. For example, a Job created by a CronJob can proceed with reduced resources rather than failing outright when the cluster is heavily loaded. Queue controllers can also adjust resource allocation dynamically, which can improve overall cluster utilization and Job success rates.

Example: Machine Learning Training Job

Consider a machine learning training Job that initially requests 4 GPUs:

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never

A queue controller managing cluster resources might determine that only 2 GPUs are available. With this feature, the controller can update the Job’s resource requests before resuming it:

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
          limits:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
      restartPolicy: Never

After the resources are updated, the controller resumes the Job by setting spec.suspend to false, and the new Pods are created with the adjusted resource specifications. This process avoids deletion and preserves all associated metadata and history.
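
The resume step described above can be sketched as a patch that changes only the suspend field. This is a minimal illustration, assuming the body is submitted as a merge patch against the Job from the example; how a controller actually sends it (a client library or kubectl patch) is not specified in this article.

```yaml
# Hypothetical patch body: resume the Job after its resources were adjusted.
# Only spec.suspend changes; the pod template keeps the values set while
# the Job was suspended.
spec:
  suspend: false
```

Keeping the resource update and the resume as two separate writes means the resource change is applied while the Job is still suspended.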

How It Works

The Kubernetes API server relaxes the immutability constraint on pod template resource fields specifically for Jobs that are suspended. No new API types are introduced; the existing Job and pod template structures accommodate the change through a controlled relaxation of validation rules. The feature is enabled by default in v1.36 as a beta feature, meaning cluster operators can rely on it without needing to explicitly enable a feature gate.

Key technical aspects include:

Use Cases for Mutable Resources

Benefits and Limitations

This feature provides significant operational flexibility for batch and ML workloads. However, it comes with some important considerations:

Getting Started

To use this feature, you need a Kubernetes cluster running v1.36 or later. The feature is enabled by default. You can suspend a Job by setting spec.suspend: true, update the pod template’s resources section, and then resume the Job. For queue controllers, integrate with the Kubernetes API to watch suspended Jobs and apply resource modifications programmatically.
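
As a rough sketch of the update step, the patch below changes only the resources of the trainer container from the earlier example. It assumes a strategic merge patch, where the containers list is merged by container name; the file name resources-patch.yaml is hypothetical, and the values mirror the reduced 2-GPU example above.

```yaml
# resources-patch.yaml (hypothetical file name)
# Strategic merge patch for a Job that is still suspended.
# The containers list is matched by name, so only "trainer" is touched.
spec:
  template:
    spec:
      containers:
      - name: trainer
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
          limits:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
```

One way to apply it would be kubectl patch job training-job-example-abcd123 --patch-file resources-patch.yaml, run while spec.suspend is still true. A queue controller would more likely send an equivalent patch through its API client.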

For more details, refer to the official Kubernetes documentation on job suspension and resource management for containers.

Conclusion

The promotion of mutable pod resources for suspended Jobs to beta in Kubernetes v1.36 is a meaningful step toward more resource-efficient batch processing. By allowing on-the-fly adjustments without deleting and recreating the Job, it makes the platform better suited to dynamic, large-scale workloads. As Kubernetes continues to evolve, features like this reflect a commitment to flexible and adaptable scheduling mechanisms.
