Yesterday I got information from the development team that some applications on our GKE cluster weren't working properly. After checking for a while, I found that many pods were stuck in the "ContainerCreating" status and that the GKE cluster was in the middle of an upgrade. I tried to delete the pods, but they got stuck in the "Terminating" status.
Our cluster has 3 nodes and many deployed applications, such as ArgoCD, NATS, EcpRouter, Keycloak, Prometheus, many Prometheus exporters, and so on. Some applications were working fine and some were not.
We waited about 1 hour for the node upgrade to complete, but the pods in the "ContainerCreating" and "Terminating" statuses were still stuck.
After investigating a little longer, I found that all 3 new nodes were ready, but every pod with a problem was on the 3rd node.
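A quick way to see which node the stuck pods are on is to group them by node. The snippet below is only a minimal sketch of that idea, not the exact commands I ran at the time: the function name `stuck_pods_by_node` is made up for this example, and it assumes the default column layout of `kubectl get pods -A -o wide --no-headers` (namespace, name, ready, status, restarts, age, IP, node).

```shell
# Count pods stuck in ContainerCreating or Terminating, grouped by node.
# Hypothetical helper: reads the output of
#   kubectl get pods -A -o wide --no-headers
# on stdin, assuming STATUS is column 4 and NODE is column 8.
stuck_pods_by_node() {
  awk '
    { gsub(/ \([^)]*\)/, "") }   # drop the "(5m ago)" part of RESTARTS so columns line up
    $4 == "ContainerCreating" || $4 == "Terminating" { count[$8]++ }
    END { for (node in count) print node, count[node] }
  ' | sort
}

# Usage against a live cluster:
#   kubectl get pods -A -o wide --no-headers | stuck_pods_by_node
```

If almost every stuck pod is reported under one node name, as in our case, that node is the first thing to suspect.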
I tried to force delete one of the pods with:
kubectl delete pod/nats-2 --grace-period=0 --force -n nats-prod
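The pod was deleted and a new one was created, but it landed on the 3rd node again and was still stuck in the "ContainerCreating" status.
Finally, I decided to create a 4th node and cordon the 3rd node. After cordoning the 3rd node I waited a long time, because the pods in the "ContainerCreating" status did not move to another node (cordoning only stops new pods from being scheduled onto a node; it does not move pods that are already assigned to it). So I force deleted all the pods in the "ContainerCreating" status again, and this time it looked good: the new pods were created on the new node and every pod reached the "Running" status.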
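The cordon and force-delete step can be sketched roughly as follows. This is an illustration under assumptions rather than the exact commands I ran: `force_delete_cmds` and the `NODE` variable are hypothetical names, the input is again assumed to be the output of `kubectl get pods -A -o wide --no-headers`, and the function only prints the delete commands so they can be reviewed before anything is removed.

```shell
# Print (do not run) a force-delete command for every pod stuck in
# ContainerCreating on the given node.
# Hypothetical helper: reads `kubectl get pods -A -o wide --no-headers`
# output on stdin, assuming NAMESPACE=$1, NAME=$2, STATUS=$4, NODE=$8.
force_delete_cmds() {
  node="$1"
  awk -v node="$node" '
    { gsub(/ \([^)]*\)/, "") }   # drop the "(5m ago)" part of RESTARTS
    $4 == "ContainerCreating" && $8 == node {
      printf "kubectl delete pod/%s --grace-period=0 --force -n %s\n", $2, $1
    }
  '
}

# Usage against a live cluster (NODE is a placeholder for the bad node's name):
#   kubectl cordon "$NODE"    # stop new pods from being scheduled there
#   kubectl get pods -A -o wide --no-headers | force_delete_cmds "$NODE"
#   # after reviewing the printed commands, pipe the same output to sh to run them
```

Printing the commands first is a deliberate choice: a force delete skips the normal graceful shutdown, so it is worth checking the list before running it, especially for StatefulSet pods.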
After everything was working well, I deleted the 3rd node, which I believe was the one with the problem.
I hope this write-up helps someone.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — —
Credit : TrueDigitalGroup
— — — — — — — — — — — — — — — — — — — — — — — — — — — — —