Debugging in Kubernetes
Kubernetes is great for managing complex applications. Unfortunately, even in the best of circumstances, problems can still occur, so debugging is an important skill for managing Kubernetes applications in practice. This lab will give you an opportunity to practice some common Kubernetes debugging skills, such as obtaining important debugging information and locating problems within the cluster.
Our company has a robust Kubernetes infrastructure that is used by multiple teams. However, one of those teams just told us that there is a problem with a service they use in the cluster, and they want us to fix it. Unfortunately, no one is able to give us much information about the service. We don't even know what it is called or where it is located. All we know is that there is likely a pod failing somewhere.
The team has asked us to take the lead in debugging the issue. They want us to locate the problem and collect some relevant information that will help them analyze the problem and correct it in the future. They also want us to go ahead and get the broken pod running again.
To get this all done, we've got several tasks ahead of us, so let's get right to it.
Get logged in
Use the credentials and server IP in the hands-on lab overview page to log in with SSH.
Find the broken pod and save the pod name to /home/cloud_user/debug/broken-pod-name.txt
Since we don't know what namespace the broken pod is in, a quick way to find the broken pod is to list all pods from all namespaces:
[user@host ]$ kubectl get pods --all-namespaces
Check the STATUS field to find which pod is broken. Once you have located the broken pod (likely the one whose status is not Running), create a text file to hold the name of the broken pod, and save the file:
[user@host ]$ vi /home/cloud_user/debug/broken-pod-name.txt
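Scanning the STATUS column by eye works, but the broken pod can also be picked out with a short pipeline. A sketch, assuming the default kubectl column layout (NAMESPACE, NAME, READY, STATUS, ...); the pod and namespace names below are hypothetical sample data:

```shell
# Real command (requires a cluster):
#   kubectl get pods --all-namespaces --no-headers | awk '$4 != "Running" && $4 != "Completed" {print $2}'
# The same filter demonstrated on sample output:
sample='kube-system   coredns-5d78c9869d-x2k4p       1/1   Running            0   5d
default       web-frontend-7f9c6b5d4-q8zl2   0/1   CrashLoopBackOff   7   5d'
printf '%s\n' "$sample" | awk '$4 != "Running" && $4 != "Completed" {print $2}'   # → web-frontend-7f9c6b5d4-q8zl2
```

The awk filter skips pods whose STATUS is Running or Completed and prints the NAME column of anything else.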
In the same namespace as the broken pod, find out which pod is using the most CPU and output the name of that pod to /home/cloud_user/debug/high-cpu-pod-name.txt
Look at the namespace of the broken pod, and then use kubectl top pod to show resource usage for all pods in that namespace:
[user@host ]$ kubectl top pod -n <namespace>
Identify which pod in that namespace is using the most CPU cycles, then create another text file to hold its name:
[user@host ]$ vi /home/cloud_user/debug/high-cpu-pod-name.txt
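If the namespace has many pods, the highest CPU consumer can be picked out mechanically. A sketch, assuming kubectl top reports every pod's CPU in millicores (if one pod reported whole cores, e.g. "1", the numeric sort would mis-rank it); the pod names below are hypothetical sample data:

```shell
# Real command (requires a cluster with metrics-server):
#   kubectl top pod -n <namespace> --no-headers | sort -k2 -rn | head -n1 | awk '{print $1}'
# The same pipeline demonstrated on sample output:
sample='auth-service    120m   64Mi
log-collector   850m   200Mi
web-frontend    45m    32Mi'
printf '%s\n' "$sample" | sort -k2 -rn | head -n1 | awk '{print $1}'   # → log-collector
```

sort -k2 -rn sorts numerically (descending) on the CPU column, head keeps the top row, and awk prints the pod name.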
Get the broken pod's summary data in JSON format and output it to /home/cloud_user/debug/broken-pod-summary.json
You can get the JSON data and output it to the file like this:
[user@host ]$ kubectl get pod <pod name> -n <namespace> -o json > /home/cloud_user/debug/broken-pod-summary.json
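Once the JSON summary is saved, individual fields can be pulled out of it for a quick look. jq is the idiomatic tool if it is installed (e.g. jq -r '.status.phase' on the saved file); a portable sed fallback is sketched below on a minimal, hypothetical JSON excerpt:

```shell
# With jq (if available):
#   jq -r '.status.phase' /home/cloud_user/debug/broken-pod-summary.json
# Portable fallback using sed, shown on a minimal sample:
sample='{ "status": { "phase": "Pending" } }'
printf '%s\n' "$sample" | sed -n 's/.*"phase": *"\([^"]*\)".*/\1/p'   # → Pending
```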
Get the broken pod's container logs and put them in /home/cloud_user/debug/broken-pod-logs.log
You can get the logs and output them to the file like this:
[user@host ]$ kubectl logs <pod name> -n <namespace> > /home/cloud_user/debug/broken-pod-logs.log
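If the container has already crashed and restarted, kubectl logs with the --previous flag retrieves the prior container's logs, which often hold the actual failure. Once the logs are saved, a grep pass helps surface likely failure lines; the log lines below are hypothetical sample data:

```shell
# Retrieving logs from the prior, crashed container (requires a cluster):
#   kubectl logs <pod name> -n <namespace> --previous
# Scanning saved logs for likely failure lines, shown on sample data:
sample='2024-05-01T10:00:00Z INFO  server starting on :8080
2024-05-01T10:00:05Z ERROR liveness check handler returned 404'
printf '%s\n' "$sample" | grep -iE 'error|fatal|fail'
```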
Fix the problem with the broken pod so that it enters the Running state
Describe the broken pod to help identify what is wrong:
[user@host ]$ kubectl describe pod <pod name> -n <namespace>
Check the Events section at the bottom of the output to see if you can spot what is wrong.
You may notice the pod's liveness probe is failing. If you look closely, you might also notice the path for the liveness probe looks like it may be incorrect.
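The describe output is long, so filtering it down to the Events section makes the failure easier to spot. A sketch, demonstrated on an abbreviated, hypothetical describe output:

```shell
# Real command (requires a cluster):
#   kubectl describe pod <pod name> -n <namespace> | grep -A 10 '^Events'
# Demonstrated on an abbreviated sample:
sample='Name:         web-frontend-7f9c6b5d4-q8zl2
Namespace:    default
Events:
  Warning  Unhealthy  kubelet  Liveness probe failed: HTTP probe failed with statuscode: 404'
printf '%s\n' "$sample" | grep -A 10 '^Events'
```

grep -A 10 prints the matching Events line plus the ten lines after it; raise the number if the pod has more events.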
In order to edit and fix the liveness probe, you will need to delete and recreate the pod. You should save the pod descriptor before deleting it, or you will have no way to recover it!
[user@host ]$ kubectl get pod <pod name> -n <namespace> -o yaml > broken-pod.yml
(Older guides add --export here, but that flag was deprecated and removed in kubectl 1.18; plain -o yaml works fine for this purpose.)
Delete the broken pod:
[user@host ]$ kubectl delete pod <pod name> -n <namespace>
Now, edit the descriptor file, and fix the path attribute for the liveness probe so that it points to the correct endpoint:
[user@host ]$ vi broken-pod.yml
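For orientation while editing, a liveness probe in a pod spec has this general shape. The container name, path, port, and timing values below are illustrative only; take the real values from your saved descriptor and the application, not from this example:

```yaml
# Illustrative fragment -- not the actual values from this lab's pod.
spec:
  containers:
  - name: web-frontend          # hypothetical container name
    livenessProbe:
      httpGet:
        path: /healthz          # hypothetical path; this is the field to fix
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
```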
Recreate the broken pod with the fixed probe:
[user@host ]$ kubectl apply -f broken-pod.yml -n <namespace>
Check to make sure the pod is now running properly:
[user@host ]$ kubectl get pod <pod name> -n <namespace>
That's it! We did it. We walked into an unfamiliar Kubernetes infrastructure, found the broken pod, collected the diagnostic data the team asked for, and got things up and running again. Congratulations!