Kubernetes is built to scale, and with managed Kubernetes services, you can deploy a Pod without having to worry about capacity planning at all. So why do Pods sometimes get stuck in an "Unschedulable" state? How do you end up with Pods that have been "Pending" for several minutes? In this post, we'll dig into why Pods fail to schedule, how to troubleshoot it, and how to prevent it.
What are unschedulable Pods and why is it important to address them?
A Pod is unschedulable when it's been put into Kubernetes' scheduling queue, but can't be deployed to a node. This can be for a number of reasons, including:
- The cluster not having enough CPU or RAM available to meet the Pod's requirements.
- Pod affinity or anti-affinity rules preventing it from being deployed to available nodes.
- Nodes being cordoned due to updates or restarts.
- The Pod requiring a persistent volume that's unavailable, or bound to an unavailable node.
Although the reasons vary, an unschedulable Pod is almost always a symptom of a larger problem. The Pod itself may be fine, but the cluster isn't operating the way it should, which makes resolving the issue even more critical.
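Some of these causes are easy to check directly. Running `kubectl cordon` sets `spec.unschedulable: true` on a node, so you can spot cordoned nodes in the output of `kubectl get nodes -o json`. The sketch below is a minimal example of that check. The node names and sample JSON are hypothetical.

```python
import json


def cordoned_nodes(nodes_json: str) -> list[str]:
    """Return the names of nodes marked unschedulable (cordoned).

    Expects JSON in the shape produced by `kubectl get nodes -o json`.
    """
    data = json.loads(nodes_json)
    return [
        node["metadata"]["name"]
        for node in data.get("items", [])
        # `kubectl cordon` sets spec.unschedulable to true; uncordoned nodes omit it.
        if node.get("spec", {}).get("unschedulable", False)
    ]


# Hypothetical sample: two nodes, one of which has been cordoned.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "node-a"}, "spec": {}},
        {"metadata": {"name": "node-b"}, "spec": {"unschedulable": True}},
    ]
})
print(cordoned_nodes(sample))  # ['node-b']
```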
How do I detect unschedulable Pods?
Unfortunately, there's no single kubectl query that returns only unschedulable Pods. Pods waiting to be scheduled are in the "Pending" phase, and a Pod that can't be scheduled stays there indefinitely. However, Pods that are starting up normally (for example, while their container images are being pulled) are also marked as "Pending." The difference comes down to how long a Pod remains in "Pending."
We can get a list of pending Pods by querying for them. For example, using kubectl:
kubectl get pods --field-selector=status.phase=Pending
This gives us our list of Pods, which we can use to do additional troubleshooting. Here, we have a Pod called web-frontend that's been Pending for over 20 minutes:
NAME                           READY   STATUS    RESTARTS   AGE
web-frontend-3fc842jb1-gxvwx   0/1     Pending   0          21m48s
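When the scheduler can't place a Pod, it sets the Pod's PodScheduled condition to "False" with the reason "Unschedulable." Combining that condition with the Pod's age gives a sharper filter than the phase alone. Here's a minimal sketch that reads the JSON from `kubectl get pods -o json`. The five-minute threshold, the other Pod names, and the timestamps are hypothetical choices for illustration.

```python
import json
from datetime import datetime, timedelta, timezone


def stuck_unschedulable(pods_json: str, now: datetime,
                        threshold: timedelta = timedelta(minutes=5)) -> list[str]:
    """Return Pending Pods the scheduler marked Unschedulable, older than threshold."""
    data = json.loads(pods_json)
    stuck = []
    for pod in data.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        # The scheduler reports a failed placement via the PodScheduled condition.
        unschedulable = any(
            c.get("type") == "PodScheduled"
            and c.get("status") == "False"
            and c.get("reason") == "Unschedulable"
            for c in status.get("conditions", [])
        )
        created = datetime.strptime(
            pod["metadata"]["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if unschedulable and now - created >= threshold:
            stuck.append(pod["metadata"]["name"])
    return stuck


unsched = [{"type": "PodScheduled", "status": "False", "reason": "Unschedulable"}]
sample = json.dumps({"items": [
    # Unschedulable for 21m48s: reported.
    {"metadata": {"name": "web-frontend-3fc842jb1-gxvwx",
                  "creationTimestamp": "2024-01-01T12:00:00Z"},
     "status": {"phase": "Pending", "conditions": unsched}},
    # Hypothetical Pod that is scheduled and still pulling images: ignored.
    {"metadata": {"name": "api-5c7d8-k2x9p",
                  "creationTimestamp": "2024-01-01T12:21:18Z"},
     "status": {"phase": "Pending",
                "conditions": [{"type": "PodScheduled", "status": "True"}]}},
    # Hypothetical Pod unschedulable for only one minute: below the threshold.
    {"metadata": {"name": "worker-6b9f4-p8q2z",
                  "creationTimestamp": "2024-01-01T12:20:48Z"},
     "status": {"phase": "Pending", "conditions": unsched}},
]})
now = datetime(2024, 1, 1, 12, 21, 48, tzinfo=timezone.utc)
print(stuck_unschedulable(sample, now))  # ['web-frontend-3fc842jb1-gxvwx']
```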
How do I troubleshoot unschedulable Pods?
When troubleshooting unschedulable Pods, the first place to check is your scheduler, which is the component responsible for determining which node each Pod gets assigned to. The default Kubernetes scheduler, kube-scheduler, records the reason a Pod can't be scheduled as an event on that Pod. You can view these events with a tool like kubectl.
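If you have many Pending Pods, it can help to pull every scheduling failure at once instead of describing Pods one by one. The sketch below groups FailedScheduling messages by Pod name from the JSON produced by `kubectl get events -o json`. It is a minimal example. The sample events are hypothetical, although the first message is copied from the output shown later in this post.

```python
import json
from collections import defaultdict


def failed_scheduling_messages(events_json: str) -> dict[str, list[str]]:
    """Group FailedScheduling event messages by the Pod they refer to."""
    data = json.loads(events_json)
    by_pod: dict[str, list[str]] = defaultdict(list)
    for event in data.get("items", []):
        obj = event.get("involvedObject", {})
        # Scheduling failures are recorded against the Pod, not its Deployment.
        if event.get("reason") == "FailedScheduling" and obj.get("kind") == "Pod":
            by_pod[obj["name"]].append(event.get("message", ""))
    return dict(by_pod)


msg = ("No nodes are available that match all of the following "
       "predicates:: Insufficient cpu (3).")
sample = json.dumps({"items": [
    {"reason": "FailedScheduling", "message": msg,
     "involvedObject": {"kind": "Pod", "name": "web-frontend-3fc842jb1-gxvwx"}},
    {"reason": "ScalingReplicaSet", "message": "Scaled up replica set",
     "involvedObject": {"kind": "Deployment", "name": "web-frontend"}},
]})
print(failed_scheduling_messages(sample))
```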
Let's look back at the web-frontend example. Imagine this Pod is part of a larger Deployment, also called web-frontend. Scheduling events are recorded against individual Pods rather than the Deployment, so we start debugging by running kubectl describe pod web-frontend-3fc842jb1-gxvwx and looking at the Events section. Under the Reason column, look for lines containing FailedScheduling. For example:
Events:
  FirstSeen  LastSeen  Count  From               SubObjectPath  Type     Reason            Message
  ---------  --------  -----  ----               -------------  -------- ------            -------
  25m        0s        15     default-scheduler                 Warning  FailedScheduling  No nodes are available that match all of the following predicates:: Insufficient cpu (3).
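Looking under the Message column, it's clear that the Pod wasn't scheduled because of insufficient CPU. The number in parentheses is how many nodes failed that check. Note that the scheduler compares the Pod's CPU requests against each node's allocatable CPU minus the requests of the Pods already assigned to it, not against actual CPU usage. So this could mean that there isn't enough unreserved CPU capacity in your cluster to accommodate new Pods, or that your CPU requests are set too high.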
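To make the "Insufficient cpu" check concrete, here is a simplified model of the resource-fit filter kube-scheduler applies to each node. It only considers CPU, and it ignores everything else the real scheduler evaluates. It is a sketch, not kube-scheduler's code. The node names, allocatable values, and existing requests are hypothetical.

```python
def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "250m", "1", or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def nodes_with_insufficient_cpu(pod_request: str, nodes: dict[str, dict]) -> list[str]:
    """Return the nodes whose unrequested allocatable CPU is below the Pod's request."""
    need = parse_cpu(pod_request)
    short = []
    for name, node in nodes.items():
        # Sum the requests of Pods already placed on the node (not their live usage).
        requested = sum(parse_cpu(r) for r in node["requests"])
        if parse_cpu(node["allocatable"]) - requested < need:
            short.append(name)
    return short


# Hypothetical three-node cluster.
nodes = {
    "node-1": {"allocatable": "2", "requests": ["1500m", "250m"]},  # 250m left
    "node-2": {"allocatable": "2", "requests": ["1", "500m"]},      # 500m left
    "node-3": {"allocatable": "1930m", "requests": ["1200m"]},      # 730m left
}
print(nodes_with_insufficient_cpu("1", nodes))     # ['node-1', 'node-2', 'node-3']
print(nodes_with_insufficient_cpu("200m", nodes))  # []
```

A request of one full core fails on all three nodes (matching a message like "Insufficient cpu (3)"), while a 200m request fits everywhere, even though none of the nodes has changed.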
How do I fix unschedulable Pods?
There is no single solution for unschedulable Pods as they have many different causes. However, there are a few things you can try depending on the cause.
Enable cluster autoscaling
If you're using a managed Kubernetes service like Amazon EKS or Google Kubernetes Engine (GKE), you can use autoscaling to increase and decrease cluster capacity on demand. With autoscaling enabled, the Kubernetes Cluster Autoscaler watches for Pods that can't be scheduled because of insufficient resources and asks your provider to add nodes. As long as you've configured your cluster's node pool and it hasn't reached its maximum node count, your provider will automatically provision a new node and add it to the pool, making it available to the cluster and to your Pods.
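The scale-up condition described above can be sketched as a simple decision: a node group helps only if it is below its maximum size and one new node would be big enough for the Pending Pod. This is a heavily simplified, hypothetical model that only considers CPU. The real Cluster Autoscaler simulates scheduling against node group templates and uses a configurable "expander" to choose between groups. The group names and sizes are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeGroup:
    name: str
    node_cpu_allocatable_m: int  # allocatable CPU of one new node, in millicores
    current_size: int
    max_size: int


def pick_group_for_scale_up(pending_cpu_m: int, groups: list[NodeGroup]) -> Optional[str]:
    """Return the first group that can grow and whose new node would fit the Pod."""
    for group in groups:
        if group.current_size >= group.max_size:
            continue  # already at its max node limit, so it can't help
        if group.node_cpu_allocatable_m >= pending_cpu_m:
            return group.name
    return None  # no group can host the Pod: it stays unschedulable


groups = [
    NodeGroup("small", node_cpu_allocatable_m=940, current_size=3, max_size=3),
    NodeGroup("large", node_cpu_allocatable_m=3920, current_size=1, max_size=5),
]
print(pick_group_for_scale_up(1000, groups))  # large
print(pick_group_for_scale_up(5000, groups))  # None
```

The second call shows why autoscaling alone isn't always enough: if no node type is large enough for the request, adding more nodes of that type won't help.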
Increase your node capacity
Even if you have enough nodes available, those nodes might not be large enough to accommodate your Pod. As an example, some AWS compute-optimized instance families start at just one vCPU and 2GiB of memory. Imagine we set a CPU request of just 100m on each of our containers. Even with that low amount, a 1 vCPU (1000m) node can hold at most 10 such containers, and fewer in practice, because some CPU is reserved for system daemons and other Pods on the node (such as DaemonSets) have requests of their own.
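The arithmetic is easy to check. The sketch below computes how many equally sized containers fit on a node after subtracting reserved CPU and DaemonSet requests. The 60m reservation and the two 100m DaemonSets are hypothetical figures, not values from any specific provider.

```python
def containers_that_fit(node_cpu_m: int, reserved_m: int,
                        daemonset_requests_m: list[int], per_container_m: int) -> int:
    """How many containers with the given CPU request fit on one node."""
    available = node_cpu_m - reserved_m - sum(daemonset_requests_m)
    return max(available, 0) // per_container_m


# 1 vCPU with nothing reserved: the theoretical ceiling.
print(containers_that_fit(1000, 0, [], 100))          # 10
# Hypothetical: 60m reserved for system daemons, two DaemonSets at 100m each.
print(containers_that_fit(1000, 60, [100, 100], 100))  # 7
```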
To avoid this, consider moving from lower-capacity instances to higher-capacity instances. This will increase costs, but it makes it much more likely that your Pods can be scheduled without slowing down your systems or being marked as unschedulable. On EKS, for example, you can do this by creating a new node group with your new instance type, then deleting the previous node group. EKS cordons the nodes in the old group so that no new Pods are scheduled on them, then drains them, evicting their Pods. The controllers that own those Pods (such as a Deployment's ReplicaSet) recreate them, and the scheduler places them on the new nodes.
Check your Pod requests
There's a chance the problem is due to a misconfiguration or typo in the Pod's manifest. If you define requests, make sure they're set to the values you intended. Remember: requests determine the amount of resources the scheduler reserves for a container, while limits determine the maximum it can use. If you set a container's CPU request to 1000m (or simply 1), the container's Pod won't be scheduled unless a node has an entire CPU core's worth of unrequested capacity. If your nodes are already running many Pods with their own requests, it can be very hard to find a node with that much CPU left.
One way to approach this is to consider whether a single container really needs that much CPU allocated to it. If not, consider lowering the request and checking whether that affects the container's performance or stability. Keep in mind that Kubernetes sums the requests across all of the Pod's containers. For instance, if you have two containers, both requesting 250m CPU, then Kubernetes will look for a node with at least 500m of unrequested CPU capacity.
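The summing rule can be written out as a small function. One detail the example above doesn't cover is init containers. Because they run one at a time before the app containers start, Kubernetes uses the larger of two values: the sum of the app containers' requests, or the largest single init container request. This sketch covers only CPU and ignores Pod overhead and sidecar-style init containers.

```python
from collections.abc import Sequence


def effective_cpu_request_m(app_containers_m: Sequence[int],
                            init_containers_m: Sequence[int] = ()) -> int:
    """Effective Pod CPU request in millicores, as used for scheduling (simplified)."""
    app_total = sum(app_containers_m)                 # app containers run together
    init_max = max(init_containers_m, default=0)      # init containers run one at a time
    return max(app_total, init_max)


print(effective_cpu_request_m([250, 250]))         # 500
print(effective_cpu_request_m([250, 250], [800]))  # 800
```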
Check your affinity and anti-affinity rules
Affinity and anti-affinity rules let you control which Pods get deployed to which nodes based on certain conditions. For example, if you have a Pod that requires high-speed continuous read/write access to a filesystem, you can define a node affinity rule so the Pod only gets scheduled onto nodes that have SSDs. Required affinity rules (requiredDuringSchedulingIgnoredDuringExecution) are strict: if no available node meets the Pod's requirements, the Pod won't run at all and will be unschedulable.
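Here's a simplified model of how a required node affinity rule filters nodes. The node selector terms are ORed together, and the match expressions inside a single term are ANDed. It handles the In, NotIn, Exists, and DoesNotExist operators. The label key `disktype` and the node names are hypothetical examples.

```python
from typing import Any


def expression_matches(labels: dict[str, str], expr: dict[str, Any]) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return labels.get(key) in values
    if op == "NotIn":
        return labels.get(key) not in values  # a missing label also satisfies NotIn
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not handled in this sketch: {op}")


def eligible_nodes(node_labels: dict[str, dict[str, str]],
                   terms: list[dict[str, Any]]) -> list[str]:
    """Nodes that satisfy at least one term (all expressions within that term)."""
    return [
        name for name, labels in node_labels.items()
        if any(all(expression_matches(labels, e) for e in t["matchExpressions"])
               for t in terms)
    ]


terms = [{"matchExpressions": [
    {"key": "disktype", "operator": "In", "values": ["ssd"]}]}]
nodes = {"node-a": {"disktype": "hdd"}, "node-b": {"kubernetes.io/os": "linux"}}
print(eligible_nodes(nodes, terms))  # [] -> the Pod is unschedulable
nodes["node-c"] = {"disktype": "ssd"}
print(eligible_nodes(nodes, terms))  # ['node-c']
```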
The most direct solution to this problem is to deploy a node that meets the Pod's requirements. You might also want to check your existing nodes' labels in case they're misconfigured. If affinity isn't a hard requirement (i.e., your Pod will work fine on a non-ideal node), you can change your rule to a preferred rule (preferredDuringSchedulingIgnoredDuringExecution). The scheduler will still favor nodes that match, but will fall back to another available node if it can't find one.
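A preferred rule affects scoring rather than filtering. Each matching preference adds its weight to a node's score, and a node with a score of zero is still a valid choice. The sketch below models only that idea with simple key/value preferences. The real scheduler combines this score with several other scoring plugins. The tie-breaking by node name is a simplification chosen here to keep the output deterministic, and the labels are hypothetical.

```python
from typing import Optional


def pick_node(feasible: dict[str, dict[str, str]],
              preferred: list[tuple[int, str, str]]) -> Optional[str]:
    """Pick the feasible node with the highest total preference weight.

    preferred is a list of (weight, label_key, label_value) tuples.
    Returns None only when no node passed filtering.
    """
    if not feasible:
        return None

    def score(name: str) -> int:
        labels = feasible[name]
        return sum(w for w, key, value in preferred if labels.get(key) == value)

    # max() keeps the first of equally scored nodes, so ties go to the first name.
    return max(sorted(feasible), key=score)


prefs = [(80, "disktype", "ssd")]
print(pick_node({"node-a": {"disktype": "hdd"}, "node-b": {}}, prefs))  # node-a (fallback)
print(pick_node({"node-a": {"disktype": "hdd"}, "node-c": {"disktype": "ssd"}}, prefs))  # node-c
```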
As with most Kubernetes features, affinity and anti-affinity go much deeper than simple "do/don't deploy" rules. There are Pod topology spread constraints, taints and tolerations, inter-Pod affinity, and more. You can learn more about these in the Kubernetes documentation.
What other Kubernetes risks should I be looking for?
If you want to know how to protect yourself against other risks like resource exhaustion, missing liveness probes, and improperly configured high-availability clusters, check out our blog series on Detected Risks. Each post walks you through a specific risk: how to detect it, how to resolve it, and how to prevent it from recurring.
In the meantime, if you'd like a free report of your reliability risks in just a few minutes, you can sign up for a free 30-day Gremlin trial.