Handling a k8s worker node failure: temporary notes
j6V0MxmTOdYq, November 2, 2023

Under normal conditions, the pod information for the k8s environment looks like this:


[root@gip33 ~]# kubectl get pods
NAME                                       READY   STATUS    RESTARTS        AGE
greatdb-sample-dashboard                   1/1     Running   3 (4m3s ago)    45h
greatdb-sample-greatdb-0                   3/3     Running   10 (4m3s ago)   45h
greatdb-sample-greatdb-1                   3/3     Running   10 (4m3s ago)   45h
greatdb-sample-greatdb-2                   3/3     Running   10 (4m3s ago)   45h
greatdbpaxos-operator-5f46f78ff6-r84vh     1/1     Running   1               47h
local-client-provisioner-b7c46f5db-swbh2   1/1     Running   3 (4m13s ago)   2d

[root@gip33 ~]# kubectl get nodes -o wide
NAME    STATUS   ROLES           AGE    VERSION    INTERNAL-IP     EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                CONTAINER-RUNTIME
gip28   Ready    control-plane   2d2h   v1.24.17   172.17.136.28   <none>        CentOS Linux 8   4.18.0-348.7.1.el8_5.x86_64   docker://24.0.5
gip29   Ready    <none>          2d1h   v1.24.17   172.17.136.29   <none>        CentOS Linux 8   4.18.0-348.7.1.el8_5.x86_64   docker://24.0.5
gip32   Ready    <none>          2d1h   v1.24.17   172.17.136.32   <none>        CentOS Linux 8   4.18.0-348.7.1.el8_5.x86_64   docker://24.0.5
gip33   Ready    <none>          2d1h   v1.24.17   172.17.136.33   <none>        CentOS Linux 8   4.18.0-348.7.1.el8_5.x86_64   docker://24.0.5
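As a quick way to spot a node that has left this healthy baseline, the STATUS column of the kubectl get nodes output can be filtered. The following is only a sketch: the function name not_ready_nodes is made up for illustration, and it assumes the default column layout with a header line, as in the output above.

```shell
# Print the names of nodes whose STATUS column is anything other than
# exactly "Ready". Reads `kubectl get nodes` output (with its header
# line) on stdin. A cordoned node ("Ready,SchedulingDisabled") is also
# reported, which is usually worth knowing about.
not_ready_nodes() {
    awk 'NR > 1 && NF >= 2 && $2 != "Ready" { print $1 }'
}

# Typical use on a live cluster (not executed here):
#   kubectl get nodes | not_ready_nodes
```

An empty result means every listed node reports Ready; it says nothing about whether the API server itself is reachable.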


1. kubectl get pods reports that it cannot get the API server group list and the request times out; kubectl get nodes fails the same way:

kubectl get nodes
E0831 11:25:06.676204 1306788 memcache.go:265] couldn't get current server API group list: Get "https://172.17.136.28:6443/api?timeout=32s": net/http: TLS handshake timeout
E0831 11:25:16.685134 1306788 memcache.go:265] couldn't get current server API group list: Get "https://172.17.136.28:6443/api?timeout=32s": net/http: TLS handshake timeout
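The wording of the error already narrows things down. A TLS handshake timeout means the TCP connection to 172.17.136.28:6443 was opened but the handshake did not finish in time, which tends to point at an unhealthy or overloaded API server rather than a closed port. A rough sketch of sorting such messages into buckets follows; the function name classify_api_error and the labels it prints are illustrative, not part of kubectl.

```shell
# Classify a kubectl error message by the transport-level failure it
# mentions. Reads the error text on stdin and prints one short label.
# The matched substrings are the usual Go network error texts; the
# labels themselves are made up for this sketch.
classify_api_error() {
    msg=$(cat)
    case "$msg" in
        *"TLS handshake timeout"*) echo "tls-handshake-timeout" ;;
        *"connection refused"*)    echo "connection-refused" ;;
        *"i/o timeout"*)           echo "network-timeout" ;;
        *)                         echo "unknown" ;;
    esac
}

# Typical use (not executed here):
#   kubectl get nodes 2>&1 | classify_api_error
```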



2. Check the status of the kubelet service

systemctl  status kubelet

[root@gip28 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Wed 2023-08-30 10:08:48 CST; 1 day 1h ago
     Docs: https://kubernetes.io/docs/
 Main PID: 1820 (kubelet)
    Tasks: 20 (limit: 101083)
   Memory: 69.9M
   CGroup: /system.slice/kubelet.service
           └─1820 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --con>

Aug 31 11:25:13 gip28 kubelet[1820]: E0831 11:25:13.775149    1820 cri_stats_provider.go:669] "Unable to fetch container log stats" err="failed to get fsstats for \"/var/log/pods/kube-system_coredns-74586cf9b6>
Aug 31 11:25:24 gip28 kubelet[1820]: E0831 11:25:24.858653    1820 cri_stats_provider.go:669] "Unable to fetch container log stats" err="failed to get fsstats for \"/var/log/pods/kube-system_kube-proxy-fk6vp_e>
Aug 31 11:25:24 gip28 kubelet[1820]: E0831 11:25:24.858862    1820 cri_stats_provider.go:669] "Unable to fetch container log stats" err="failed to get fsstats for \"/var/log/pods/kube-system_calico-kube-contro>
Aug 31 11:25:24 gip28 kubelet[1820]: E0831 11:25:24.859064    1820 cri_stats_provider.go:669] "Unable to fetch container log stats" err="failed to get fsstats for \"/var/log/pods/kube-system_calico-node-q2llv_>
Aug 31 11:25:24 gip28 kubelet[1820]: E0831 11:25:24.859201    1820 cri_stats_provider.go:669] "Unable to fetch container log stats" err="failed to get fsstats for \"/var/log/pods/kube-system_kube-scheduler-gip>
Aug 31 11:25:24 gip28 kubelet[1820]: E0831 11:25:24.859358    1820 cri_stats_provider.go:669] "Unable to fetch container log stats" err="failed to get fsstats for \"/var/log/pods/kube-system_coredns-74586cf9b6>
Aug 31 11:25:24 gip28 kubelet[1820]: E0831 11:25:24.859511    1820 cri_stats_provider.go:669] "Unable to fetch container log stats" err="failed to get fsstats for \"/var/log/pods/kube-system_etcd-gip28_a758dc2>
Aug 31 11:25:24 gip28 kubelet[1820]: E0831 11:25:24.860389    1820 cri_stats_provider.go:669] "Unable to fetch container log stats" err="failed to get fsstats for \"/var/log/pods/kube-system_kube-controller-ma>
Aug 31 11:25:24 gip28 kubelet[1820]: E0831 11:25:24.860836    1820 cri_stats_provider.go:669] "Unable to fetch container log stats" err="failed to get fsstats for \"/var/log/pods/kube-system_kube-apiserver-gip>
Aug 31 11:25:24 gip28 kubelet[1820]: E0831 11:25:24.861060    1820 cri_stats_provider.go:669] "Unable to fetch container log stats" err="failed to get fsstats for \"/var/log/pods/kube-system_coredns-74586cf9b6>
[root@gip28 ~]#


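Troubleshooting approach:

1. First check whether the kubelet, docker, and containerd services are running normally

2. Check whether the network on each node host is healthy, and whether the k8s version on any node has been upgraded automatically

3. Check whether the containers referenced in the logs still exist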
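The first step of the list above can be sketched as a small helper around systemctl is-active. The function name check_services is made up for illustration, and the unit names passed to it (kubelet, docker, containerd) are assumptions about how the services are named on these hosts.

```shell
# Report the systemd state of each service named on the command line.
# `systemctl is-active` prints a state such as "active" or "inactive"
# and exits non-zero when the unit is not active.
# Returns 0 only if every listed service is active.
check_services() {
    rc=0
    for svc in "$@"; do
        state=$(systemctl is-active "$svc" 2>/dev/null) || true
        [ -n "$state" ] || state="unknown"
        echo "$svc: $state"
        [ "$state" = "active" ] || rc=1
    done
    return "$rc"
}

# Typical use on a node (not executed here):
#   check_services kubelet docker containerd
```

Note that an active kubelet does not by itself mean the node is healthy: in the status output above the service is active (running) while it logs errors continuously.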


It turned out that the log information referenced in those errors did not exist.
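This check can be done by pulling the paths out of the kubelet error lines and testing whether each one exists on disk. The sketch below assumes the message keeps the shape seen in the status output, with the path wrapped in escaped quotes after "failed to get fsstats for"; the function name missing_log_paths is hypothetical.

```shell
# Read kubelet log lines on stdin, extract the path quoted after
# "failed to get fsstats for", and print each distinct path that does
# not exist on this host.
missing_log_paths() {
    sed -n 's/.*failed to get fsstats for \\"\([^\\"]*\)\\".*/\1/p' |
    sort -u |
    while IFS= read -r path; do
        [ -e "$path" ] || echo "$path"
    done
}

# Typical use on the affected node (not executed here):
#   journalctl -u kubelet --since "10 minutes ago" --no-pager | missing_log_paths
```

Lines that journalctl or systemctl has cut off at the terminal width (shown with a trailing > above) will not match, so the full log lines are needed.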



After asking a colleague, the cause became clear: while the database containers were running, someone had run docker rm $(docker ps -a|grep Exited|awk -F ' ' '{print $1}') to clean up containers in the Exited state.
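One way to make such a cleanup less risky on a k8s node is to leave alone any container that kubelet manages. The sketch below assumes those containers carry the k8s_ name prefix, the naming convention used by dockershim and cri-dockerd; the function name non_k8s_container_ids is made up for illustration, and the resulting list should still be reviewed before anything is removed.

```shell
# Read "ID NAME" pairs on stdin and print only the IDs of containers
# that do not look Kubernetes-managed (names not starting with "k8s_").
non_k8s_container_ids() {
    awk 'NF >= 2 && $2 !~ /^k8s_/ { print $1 }'
}

# Typical use (not executed here; prints IDs only, removes nothing):
#   docker ps -a --filter status=exited --format '{{.ID}} {{.Names}}' | non_k8s_container_ids
```

Filtering with --filter status=exited also avoids matching the word Exited anywhere else in the docker ps line, which the grep in the original command would do.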


The fix was simple and blunt: reboot the gip28 host, after which new containers were started automatically.




Everything returned to normal!



Last edited on November 8, 2023
