How to Adjust Pod Resources for Suspended Kubernetes Jobs (v1.36+)

Introduction

In Kubernetes v1.36, a new beta feature lets you modify CPU, memory, GPU, and extended resource requests and limits on a suspended Job. This matters for batch and machine learning workloads, where the right resource request often depends on cluster capacity and queue priorities at the moment the Job is admitted. Previously, you had to delete and recreate a Job to change its resource spec, losing its metadata and history. Now you can adjust resources while the Job is suspended and then resume it without starting from scratch.

This step-by-step guide will walk you through using this feature manually or with a queue controller like Kueue.

What You Need

Step-by-Step Instructions

Step 1: Create or Identify a Suspended Job

If you don’t already have a suspended Job, create one that requests specific resources. The key is to set spec.suspend: true in the Job manifest. Below is an example of a machine learning training Job asking for 4 GPUs, 8 CPUs, and 32 GiB of memory:

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
spec:
  suspend: true
  template:
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never

Apply this manifest with kubectl apply -f job.yaml.

Step 2: Confirm the Job Is Suspended

Run the following command to verify that the Job is in a suspended state:

kubectl get job training-job-example-abcd123 -o jsonpath='{.spec.suspend}'

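It should output true. You can also list all Jobs with kubectl get jobs and look for 0/1 in the COMPLETIONS column, although that value alone does not prove the Job is suspended, so treat the spec.suspend check as the authoritative one.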
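For scripts, the same check can be wrapped in a small shell function. This is a minimal sketch: job_is_suspended is a hypothetical helper name, not part of kubectl, and it assumes kubectl is on the PATH and already pointed at the right cluster and namespace.

```shell
# Hypothetical helper: succeeds (exit 0) only when spec.suspend is "true".
# When the field is unset, the jsonpath query prints an empty string,
# which is treated here as "not suspended".
job_is_suspended() {
  job_name="$1"
  # Return 2 if kubectl itself fails (for example, the Job does not exist).
  suspend="$(kubectl get job "$job_name" -o jsonpath='{.spec.suspend}')" || return 2
  [ "$suspend" = "true" ]
}
```

A possible use is to guard later steps, for example: job_is_suspended training-job-example-abcd123 && echo "safe to patch resources".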

Step 3: Modify the Resource Requests and Limits

While the Job is suspended, you can change its pod template resource fields. For example, if the cluster only has 2 GPUs available, adjust the requests and limits accordingly. Use kubectl patch, kubectl edit, or a direct update via API. Here’s how to patch the resource fields:

kubectl patch job training-job-example-abcd123 --type='strategic' -p='{"spec":{"template":{"spec":{"containers":[{"name":"trainer","resources":{"requests":{"cpu":"4","memory":"16Gi","example-hardware-vendor.com/gpu":"2"},"limits":{"cpu":"4","memory":"16Gi","example-hardware-vendor.com/gpu":"2"}}}]}}}}'

This updates the Job’s pod template. Because the Job is suspended, this modification is allowed (the immutability constraint is relaxed).
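If the values come from a script or a queue controller hook, the patch body can be generated rather than typed by hand. The sketch below is one way to do that; build_resource_patch is a hypothetical helper, and it assumes the inputs contain no characters that need JSON escaping. It is meant to be sent as a strategic merge patch, which merges entries of the containers list by name, whereas a JSON merge patch (--type='merge') replaces the whole list and would drop fields such as image.

```shell
# Hypothetical helper: prints a patch body that sets identical requests and
# limits for one container. Arguments:
#   $1 container name   $2 CPU   $3 memory
#   $4 extended resource name (e.g. a GPU resource)   $5 extended resource count
# Assumption: no argument contains quotes or backslashes (no JSON escaping is done).
build_resource_patch() {
  container="$1"
  cpu="$2"
  memory="$3"
  gpu_resource="$4"
  gpu_count="$5"
  # The same resource map is reused for both requests and limits.
  resources="$(printf '{"cpu":"%s","memory":"%s","%s":"%s"}' \
    "$cpu" "$memory" "$gpu_resource" "$gpu_count")"
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","resources":{"requests":%s,"limits":%s}}]}}}}' \
    "$container" "$resources" "$resources"
}
```

The output can then be passed to kubectl, for example: kubectl patch job training-job-example-abcd123 --type='strategic' -p "$(build_resource_patch trainer 4 16Gi example-hardware-vendor.com/gpu 2)".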

Step 4: Verify the Changes

Check that the resources have been updated correctly:

kubectl get job training-job-example-abcd123 -o yaml

Look under spec.template.spec.containers[0].resources — they should now show the adjusted values. No new Pods are created yet because the Job is still suspended.
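When the Job has more than one container, or when the full YAML is too noisy, a narrower query may be easier to read. The following sketch selects the container by name with a JSONPath filter instead of relying on index 0; job_container_resources is a hypothetical helper name, and the container name is assumed not to contain quote characters.

```shell
# Hypothetical helper: prints only the resources block of the named container
# in the Job's pod template.
#   $1 Job name   $2 container name
job_container_resources() {
  job_name="$1"
  container="$2"
  # The filter [?(@.name=="...")] picks the container whose name matches.
  kubectl get job "$job_name" \
    -o "jsonpath={.spec.template.spec.containers[?(@.name==\"$container\")].resources}"
}
```

For the example Job, this would be called as job_container_resources training-job-example-abcd123 trainer.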

Step 5: Resume the Job

Once you’re satisfied with the resource settings, unsuspend the Job by setting spec.suspend to false:

kubectl patch job training-job-example-abcd123 --type='merge' -p='{"spec":{"suspend":false}}'

Kubernetes will now launch the Pods using the updated resource specifications. You can monitor progress with:

kubectl get pods -l job-name=training-job-example-abcd123

Step 6: Confirm Pod Resources

After the Job resumes, inspect one of the running Pods to ensure the resources are applied:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources}'

The output should match the new values you set in Step 3. If everything looks good, you’ve successfully adjusted resources for a suspended Job.
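Pod names are generated by the Job controller, so it can be convenient to look one up by the job-name label used in Step 5 and inspect it in one go. This is a rough sketch under that assumption; job_pod_resources is a hypothetical helper, and it only inspects the first Pod returned and the first container in it.

```shell
# Hypothetical helper: finds the first Pod belonging to a Job (via the
# job-name label) and prints the resources of its first container.
#   $1 Job name
job_pod_resources() {
  job_name="$1"
  # If no Pod exists yet, the lookup fails or prints nothing; give up in both cases.
  pod_name="$(kubectl get pods -l "job-name=$job_name" \
    -o jsonpath='{.items[0].metadata.name}')" || return 1
  [ -n "$pod_name" ] || return 1
  kubectl get pod "$pod_name" -o jsonpath='{.spec.containers[0].resources}'
}
```

Calling job_pod_resources training-job-example-abcd123 shortly after resuming may fail until the first Pod has been created, so a retry or a short wait is a reasonable addition.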

Tips and Best Practices

By following these steps, you can adjust resource allocations for batch and ML Jobs without deleting them or losing their metadata and history. This feature streamlines workload scheduling in dynamic cluster environments.
