
Debugging in Kubernetes

Hands-On Lab

 


Will Boyd

DevOps Team Lead in Content

Length

01:30:00

Difficulty

Intermediate

Kubernetes is great for managing complex applications. Unfortunately, though, even in the best circumstances, problems can still occur. Therefore, debugging is an important skill when it comes to managing Kubernetes applications in practice. This lab will give you an opportunity to practice some common Kubernetes debugging skills, such as obtaining important debugging info and locating problems within the cluster.

What are Hands-On Labs?

Hands-On Labs are scenario-based learning environments where learners can practice without consequences. Don't compromise a system or waste money on expensive downloads. Practice real-world skills without the real-world risk, no assembly required.

Debugging in Kubernetes

The scenario

Our company has a robust Kubernetes infrastructure that is used by multiple teams. Congratulations! However, one of those teams just told us that there is a problem with a service they use in the cluster, and they want us to fix it. Unfortunately, no one is able to give us much information about the service. We don't even know what it is called or where it is located. All we know is that there is likely a pod failing somewhere.

The team has asked us to take the lead in debugging the issue. They want us to locate the problem and collect some relevant information that will help them analyze the problem and correct it in the future. They also want us to go ahead and get the broken pod running again.

To get this all done, we've got several tasks ahead of us, so let's get right to it.

Get logged in

Use the credentials and server IP in the hands-on lab overview page to log in with SSH.

Find the broken pod and save the pod name to /home/cloud_user/debug/broken-pod-name.txt

Since we don't know what namespace the broken pod is in, a quick way to find the broken pod is to list all pods from all namespaces:

[user@host ]$ kubectl get pods --all-namespaces

Check the STATUS column to find which pod is broken (most likely the one whose status is not Running). Once you have located the broken pod, create a text file to hold its name, and save the file:

[user@host ]$ vi /home/cloud_user/debug/broken-pod-name.txt
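
If the cluster is running a lot of pods, one quick way to narrow the list (assuming the broken pod is not reporting a Running status) is to filter the healthy ones out with grep. You can also write the pod name to the file with a simple redirect instead of opening vi:

[user@host ]$ kubectl get pods --all-namespaces | grep -v Running
[user@host ]$ echo '<pod name>' > /home/cloud_user/debug/broken-pod-name.txt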

In the same namespace as the broken pod, find out which pod is using the most CPU and output the name of that pod to /home/cloud_user/debug/high-cpu-pod-name.txt

Look at the namespace of the broken pod, and then use kubectl top pod to show resource usage for all pods in that namespace.

[user@host ]$ kubectl top pod -n <namespace>

Identify which pod in that namespace is using the most CPU cycles, then create another text file to hold its name:

[user@host ]$ vi /home/cloud_user/debug/high-cpu-pod-name.txt
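
Recent versions of kubectl can also sort the metrics for you, which makes the top consumer easier to spot, and as before you can write the name with a redirect rather than vi (note that the --sort-by flag is not available on older kubectl releases):

[user@host ]$ kubectl top pod -n <namespace> --sort-by=cpu
[user@host ]$ echo '<pod name>' > /home/cloud_user/debug/high-cpu-pod-name.txt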

Get the broken pod's summary data in JSON format and output it to /home/cloud_user/debug/broken-pod-summary.json

You can get the JSON data and output it to the file like this:

[user@host ]$ kubectl get pod <pod name> -n <namespace> -o json > /home/cloud_user/debug/broken-pod-summary.json
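
As a quick sanity check, you can also pull individual fields with a JSONPath query rather than reading the whole file; for example, the container's restart count (this assumes the broken pod has a single container):

[user@host ]$ kubectl get pod <pod name> -n <namespace> -o jsonpath='{.status.containerStatuses[0].restartCount}'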

Get the broken pod's container logs and put them in /home/cloud_user/debug/broken-pod-logs.log

You can get the logs and output them to the file like this:

[user@host ]$ kubectl logs <pod name> -n <namespace> > /home/cloud_user/debug/broken-pod-logs.log
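
If the broken container has already restarted, the command above only shows output from the current instance. Adding --previous retrieves the logs of the last terminated instance, which often contains the original error:

[user@host ]$ kubectl logs <pod name> -n <namespace> --previous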

Fix the problem with the broken pod so that it enters the Running state

Describe the broken pod to help identify what is wrong:

[user@host ]$ kubectl describe pod <pod name> -n <namespace>

Check the Events to see if you can spot what is wrong.

You may notice the pod's liveness probe is failing. If you look closely, you might also notice the path for the liveness probe looks like it may be incorrect.
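
To confirm the suspicion, you can read the probe's configured path straight out of the live pod spec (this assumes an httpGet liveness probe on the pod's first container):

[user@host ]$ kubectl get pod <pod name> -n <namespace> -o jsonpath='{.spec.containers[0].livenessProbe.httpGet.path}'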

In order to edit and fix the liveness probe, you will need to delete and recreate the pod. You should save the pod descriptor before deleting it, or you will have no way to recover it!

[user@host ]$ kubectl get pod <pod name> -n <namespace> -o yaml --export > broken-pod.yml

Delete the broken pod:

[user@host ]$ kubectl delete pod <pod name> -n <namespace>

Now, edit the descriptor file, and fix the path attribute for the liveness probe (it should say /healthz, not /ealthz):

[user@host ]$ vi broken-pod.yml
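
Since this is a one-character fix, you could instead patch the file from the command line (assuming GNU sed is available on the server):

[user@host ]$ sed -i 's|/ealthz|/healthz|' broken-pod.yml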

Recreate the broken pod with the fixed probe:

[user@host ]$ kubectl apply -f broken-pod.yml -n <namespace>

Check to make sure the pod is now running properly:

[user@host ]$ kubectl get pod <pod name> -n <namespace>
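
It can take a few seconds for the recreated pod to pass its first liveness checks, so you may want to watch it until the STATUS column settles on Running:

[user@host ]$ kubectl get pod <pod name> -n <namespace> -w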

Conclusion

That's it! We did it. We walked into a strange Kubernetes infrastructure that was broken, found the problem and got things up and running again for our development team. Congratulations!