[Kubernetes] K8s Cluster Installation and Deployment Guide (3): Resolving Errors Encountered During Installation

Common k8s commands

A collection of frequently used k8s commands:

# List all nodes in the current cluster
kubectl get node
# Show detailed information about a node (rarely needed)
kubectl describe node node1

# List all pods
kubectl get pod --all-namespaces
# List pods with extra details
kubectl get pods -o wide --all-namespaces

# List all services
kubectl get service

# List all deployments
kubectl get deploy

# Restart a pod (this deletes the original pod and recreates it, which effectively restarts it)
# Restart when you have the yaml file
kubectl replace --force -f xxx.yaml
# Restart when you do not have the yaml file
kubectl get pod <POD_NAME> -n <NAMESPACE> -o yaml | kubectl replace --force -f -
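# (A hedged addition) For workloads managed by a Deployment, a rolling restart is usually
# cleaner than force-replacing pods by hand; <DEPLOY_NAME> and <NAMESPACE> are placeholders.
kubectl rollout restart deployment <DEPLOY_NAME> -n <NAMESPACE>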

# Show detailed information about a pod
kubectl describe pod nfs-client-provisioner-65c77c7bf9-54rdp -n default

# Create a Pod resource from a yaml file
kubectl apply -f pod.yaml
# Delete the Pod defined in pod.yaml
kubectl delete -f pod.yaml

# View a container's logs
kubectl logs <pod-name>
# Follow the logs in real time
kubectl logs -f <pod-name>
# View the logs of a specific container (if the pod has only one container, -c can be omitted)
kubectl logs <pod-name> -c <container_name>
# Return the merged logs of all pods labeled app=frontend
kubectl logs -l app=frontend

# Get a bash TTY inside a container of a pod, i.e. "log in" to the container
# kubectl exec -it <pod-name> -c <container-name> -- bash
# e.g.:
kubectl exec -it redis-master-cln81 -- bash

# List endpoints
kubectl get endpoints

# List existing tokens
kubeadm token list
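# (A hedged addition) Generate a fresh token together with the full command for joining a worker node
kubeadm token create --print-join-command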

Troubleshooting 1: node NotReady (pod kube-flannel-ds-xxxx in status Init:ImagePullBackOff)

The worker node is in the NotReady state:

[root@M001 ~]# kubectl get nodes -o wide         
NAME   STATUS     ROLES           AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE          KERNEL-VERSION          CONTAINER-RUNTIME
m001   Ready   control-plane   78m   v1.26.0   192.168.11.120   <none>        CentOS Stream 9   5.14.0-325.el9.x86_64   containerd://1.6.21
n001   NotReady   <none>          31m   v1.26.0   192.168.11.121   <none>        CentOS Stream 9   5.14.0-325.el9.x86_64   containerd://1.6.21
n002   Ready      <none>          27m   v1.26.0   192.168.11.122   <none>        CentOS Stream 9   5.14.0-171.el9.x86_64   containerd://1.6.21
[root@M001 ~]#

What has already been confirmed is noted inline below (after "==").

When a node's status in Kubernetes shows "NotReady", it usually means the node is not working properly or has lost contact with the control plane.

The following are common causes of a "NotReady" node and how to address them:

  1. Network problems: check whether the node can communicate with the other nodes and with the control plane. Make sure the network is configured correctly and that the node can reach the other components over the required ports and protocols.
    == communication is normal
  2. CNI plugin problems: check the configuration and status of the Container Network Interface (CNI) plugin. The CNI plugin provides networking for containers and sets up the network interfaces on the node. Make sure it is installed correctly and running.
    == we are using Flannel
    2.1 Check the configuration file
    On the NotReady node there is no Flannel configuration file under /etc/cni/net.d/ (the Ready nodes do have one).
    2.2 Check the pod status
    Checked on M001:
[root@M001 ~]# kubectl get pods -n kube-flannel
NAME                    READY   STATUS                  RESTARTS         AGE
kube-flannel-ds-5nzr9   0/1     Init:ImagePullBackOff   1                40m
kube-flannel-ds-g94hc   0/1     CrashLoopBackOff        18 (3m54s ago)   73m
kube-flannel-ds-jmc5d   0/1     CrashLoopBackOff        11 (4m24s ago)   37m
[root@M001 ~]#

Note:

"Init:ImagePullBackOff" means that an init container of the Flannel pod cannot pull its image.

This error is usually caused by one of the following (see the quick checks after this list):

  1. Wrong image address: make sure the Flannel image address is correct and reachable. Check the pod's configuration or deployment file and confirm the image name and tag. If necessary, switch to another available Flannel image source or version.
  2. Registry credentials: if a private image registry is used, valid credentials may be needed to pull the image. Confirm that the correct credentials are configured in a Kubernetes Secret and that they match the registry in use.
  3. Network problems: if the node cannot reach the internet or the required image registry, image pulls will fail. Make sure the node has the same network configuration as the healthy nodes and can reach the required resources, including the image registry.
  4. Firewall problems: some firewall rules restrict the node's outbound access and can make image pulls fail. Check the node's firewall rules and make sure the required ports and traffic are allowed.
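A quick way to narrow an ImagePullBackOff down is to read the pod's events and then try pulling the same image by hand on the affected node. A minimal sketch, assuming crictl is installed on the node and that <image> is replaced with the exact image reported in the events:

# on the control plane: the Events section names the image that failed and why
kubectl describe pod kube-flannel-ds-5nzr9 -n kube-flannel
# on the NotReady node: pull the same image manually through the CRI
crictl pull <image>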

2.3 Fix the abnormal pod state: remove the pod

Before removing any pod, list all pods running on the node with:

[root@M001 ~]# kubectl get pods --all-namespaces --field-selector spec.nodeName=n001
NAMESPACE      NAME                    READY   STATUS                  RESTARTS      AGE
kube-flannel   kube-flannel-ds-5nzr9   0/1     Init:ImagePullBackOff   1             46m
kube-system    kube-proxy-5q92t        1/1     Running                 1 (34m ago)   46m
[root@M001 ~]#

Then delete each returned pod with:

kubectl delete pod <pod-name> --namespace <namespace>

<pod-name> is the name of the pod to delete (names are case-sensitive) and <namespace> is the namespace it belongs to. Repeat until no pods are left running on the node.

kubectl delete pod kube-flannel-ds-5nzr9 --namespace kube-flannel
kubectl delete pod kube-proxy-5q92t --namespace kube-system

Tip: if the delete command hangs without responding, check on the worker node (note: on the worker node, not the control plane) whether the containerd service is running, and start it manually if it is not.
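A minimal check on the worker node, assuming containerd is managed by systemd as in this setup:

# on the worker node: is containerd running?
systemctl status containerd
# start it now and enable it at boot if it is not
systemctl enable --now containerd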

[root@M001 ~]# kubectl delete pod kube-flannel-ds-5nzr9 --namespace kube-flannel
pod "kube-flannel-ds-5nzr9" deleted

[root@M001 ~]#

After the deletion succeeds, check again:

[root@M001 ~]# kubectl get pods --all-namespaces --field-selector spec.nodeName=n001
NAMESPACE      NAME                    READY   STATUS             RESTARTS        AGE
kube-flannel   kube-flannel-ds-77b9l   0/1     CrashLoopBackOff   3 (49s ago)     3m11s
kube-system    kube-proxy-5q92t        1/1     Running            2 (3m17s ago)   58m
[root@M001 ~]#

The situation has already changed, so we stop deleting the remaining pods for now.

[root@M001 ~]# kubectl get nodes -o wide
NAME   STATUS   ROLES           AGE    VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE          KERNEL-VERSION          CONTAINER-RUNTIME
m001   Ready    control-plane   115m   v1.26.0   192.168.11.120   <none>        CentOS Stream 9   5.14.0-325.el9.x86_64   containerd://1.6.21
n001   Ready    <none>          67m    v1.26.0   192.168.11.121   <none>        CentOS Stream 9   5.14.0-325.el9.x86_64   containerd://1.6.21
n002   Ready    <none>          64m    v1.26.0   192.168.11.122   <none>        CentOS Stream 9   5.14.0-171.el9.x86_64   containerd://1.6.21
[root@M001 ~]#

Next we try to fix the new failure (status: CrashLoopBackOff). For completeness, here are the remaining generic causes of a NotReady node, continuing the list above:

  3. Insufficient resources: check the node's CPU, memory and storage usage. A node that is short on resources may stop working properly or be marked "NotReady". Consider giving the node more resources or rescheduling workloads to reduce its load.
  4. Node failure: check the state and health of the node itself. It may have hit a hardware fault, an operating-system problem or some other unexpected failure. Make sure the node is running normally and shows no errors.
  5. Container runtime problems: Kubernetes manages containers through a container runtime such as Docker, containerd or CRI-O. Check the runtime's logs and status and make sure it is running without errors or faults.

Troubleshooting 2: pod kube-flannel-ds-xxxx in status CrashLoopBackOff on all nodes

A pod status of CrashLoopBackOff means the pod keeps crashing and restarting after it starts. This is usually caused by a persistent error or exception inside one of the pod's containers.

To troubleshoot a CrashLoopBackOff, work through the following steps:

  1. View the pod logs: use the following command to get more detail about the error or exception:
kubectl logs <pod-name> -n <namespace>

<pod-name> is the name of the pod stuck in CrashLoopBackOff and <namespace> is its namespace. Look for error messages in the log; they usually point at the root cause.
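Besides the current log, the log of the previously crashed instance and of the init containers often contains the real error. A hedged sketch, using the container names that appear in the output below:

# log of the previously crashed container instance
kubectl logs <pod-name> -n <namespace> --previous
# log of a specific (init) container, e.g. flannel's install-cni
kubectl logs <pod-name> -n <namespace> -c install-cni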

[root@M001 ~]# kubectl logs kube-flannel-ds-77b9l -n kube-flannel          
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I0616 09:13:06.428205       1 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false}
W0616 09:13:06.428264       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
E0616 09:13:06.439102       1 main.go:228] Failed to create SubnetManager: error parsing subnet config: invalid character '#' looking for beginning of object key string
[root@M001 ~]#

This points to a problem in the configuration, and flannel's configuration file is kube-flannel.yml.

The "invalid character '#'" error means there is a '#' comment inside the JSON network configuration (the net-conf.json section of kube-flannel.yml); JSON does not allow comments, so delete the commented lines:

[Screenshot: the commented lines removed from the net-conf.json section of kube-flannel.yml]

Reapply the flannel configuration

# delete
[root@M001 ~]# kubectl delete -f kube-flannel.yml
namespace "kube-flannel" deleted
clusterrole.rbac.authorization.k8s.io "flannel" deleted
clusterrolebinding.rbac.authorization.k8s.io "flannel" deleted
serviceaccount "flannel" deleted
configmap "kube-flannel-cfg" deleted
daemonset.apps "kube-flannel-ds" deleted
[root@M001 ~]# 
# check
[root@M001 ~]# kubectl get ns
NAME              STATUS   AGE
default           Active   143m
kube-node-lease   Active   143m
kube-public       Active   143m
kube-system       Active   143m
[root@M001 ~]# kubectl get pods --all-namespaces --field-selector spec.nodeName=m001
NAMESPACE     NAME                           READY   STATUS              RESTARTS   AGE
kube-system   coredns-5bbd96d687-c948c       0/1     ContainerCreating   0          143m
kube-system   coredns-5bbd96d687-tpqs6       0/1     ContainerCreating   0          143m
kube-system   etcd-m001                      1/1     Running             0          143m
kube-system   kube-apiserver-m001            1/1     Running             0          143m
kube-system   kube-controller-manager-m001   1/1     Running             0          143m
kube-system   kube-proxy-khc5n               1/1     Running             0          143m
kube-system   kube-scheduler-m001            1/1     Running             0          143m



# apply again
[root@M001 ~]# kubectl apply -f  kube-flannel.yml                                   
namespace/kube-flannel created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
[root@M001 ~]# 
[root@M001 ~]# kubectl get ns
NAME              STATUS   AGE
default           Active   144m
kube-flannel      Active   5s
kube-node-lease   Active   144m
kube-public       Active   144m
kube-system       Active   144m
[root@M001 ~]# kubectl get pods --all-namespaces --field-selector spec.nodeName=m001
NAMESPACE      NAME                           READY   STATUS              RESTARTS      AGE
kube-flannel   kube-flannel-ds-fc2fb          0/1     CrashLoopBackOff    1 (11s ago)   19s
kube-system    coredns-5bbd96d687-c948c       0/1     ContainerCreating   0             143m
kube-system    coredns-5bbd96d687-tpqs6       0/1     ContainerCreating   0             143m
kube-system    etcd-m001                      1/1     Running             0             144m
kube-system    kube-apiserver-m001            1/1     Running             0             144m
kube-system    kube-controller-manager-m001   1/1     Running             0             144m
kube-system    kube-proxy-khc5n               1/1     Running             0             143m
kube-system    kube-scheduler-m001            1/1     Running             0             144m
[root@M001 ~]#

The problem persists.

[root@M001 ~]# kubectl get pods --all-namespaces
NAMESPACE      NAME                           READY   STATUS              RESTARTS      AGE
default        ngx                            0/1     ContainerCreating   0             75m
kube-flannel   kube-flannel-ds-bssdt          0/1     CrashLoopBackOff    4 (28s ago)   2m4s
kube-flannel   kube-flannel-ds-fc2fb          0/1     CrashLoopBackOff    4 (28s ago)   2m5s
kube-flannel   kube-flannel-ds-hrlb4          0/1     CrashLoopBackOff    4 (26s ago)   2m4s
kube-system    coredns-5bbd96d687-c948c       0/1     ContainerCreating   0             145m
kube-system    coredns-5bbd96d687-tpqs6       0/1     ContainerCreating   0             145m
kube-system    etcd-m001                      1/1     Running             0             146m
kube-system    kube-apiserver-m001            1/1     Running             0             145m
kube-system    kube-controller-manager-m001   1/1     Running             0             145m
kube-system    kube-proxy-5q92t               1/1     Running             2 (43m ago)   98m
kube-system    kube-proxy-b245t               1/1     Running             0             95m
kube-system    kube-proxy-khc5n               1/1     Running             0             145m
kube-system    kube-scheduler-m001            1/1     Running             0             145m
[root@M001 ~]#

Check the logs again:

[root@M001 ~]# kubectl logs kube-flannel-ds-bssdt -n kube-flannel     
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I0616 09:34:32.875469       1 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false}
W0616 09:34:32.875698       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0616 09:34:32.886229       1 kube.go:144] Waiting 10m0s for node controller to sync
I0616 09:34:32.886617       1 kube.go:485] Starting kube subnet manager
I0616 09:34:33.887163       1 kube.go:151] Node controller sync successful
I0616 09:34:33.887187       1 main.go:231] Created subnet manager: Kubernetes Subnet Manager - n002
I0616 09:34:33.887192       1 main.go:234] Installing signal handlers
I0616 09:34:33.887331       1 main.go:542] Found network config - Backend type: vxlan
I0616 09:34:33.887347       1 match.go:206] Determining IP address of default interface
I0616 09:34:33.887627       1 match.go:259] Using interface with name ens33 and address 192.168.11.122
I0616 09:34:33.887651       1 match.go:281] Defaulting external address to interface address (192.168.11.122)
I0616 09:34:33.887793       1 vxlan.go:140] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0616 09:34:33.888320       1 main.go:334] Error registering network: failed to acquire lease: node "n002" pod cidr not assigned
W0616 09:34:33.888508       1 reflector.go:347] github.com/flannel-io/flannel/pkg/subnet/kube/kube.go:486: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
I0616 09:34:33.888729       1 main.go:522] Stopping shutdownHandler...
[root@M001 ~]# kubectl logs kube-flannel-ds-fc2fb -n kube-flannel     
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I0616 09:34:27.296223       1 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false}
W0616 09:34:27.296364       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0616 09:34:27.304363       1 kube.go:144] Waiting 10m0s for node controller to sync
I0616 09:34:27.304426       1 kube.go:485] Starting kube subnet manager
I0616 09:34:28.304699       1 kube.go:151] Node controller sync successful
I0616 09:34:28.304801       1 main.go:231] Created subnet manager: Kubernetes Subnet Manager - m001
I0616 09:34:28.304825       1 main.go:234] Installing signal handlers
I0616 09:34:28.304987       1 main.go:542] Found network config - Backend type: vxlan
I0616 09:34:28.305082       1 match.go:206] Determining IP address of default interface
I0616 09:34:28.305483       1 match.go:259] Using interface with name ens33 and address 192.168.11.120
I0616 09:34:28.305598       1 match.go:281] Defaulting external address to interface address (192.168.11.120)
I0616 09:34:28.305650       1 vxlan.go:140] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0616 09:34:28.305914       1 main.go:334] Error registering network: failed to acquire lease: node "m001" pod cidr not assigned
I0616 09:34:28.306086       1 main.go:522] Stopping shutdownHandler...
W0616 09:34:28.306091       1 reflector.go:347] github.com/flannel-io/flannel/pkg/subnet/kube/kube.go:486: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
[root@M001 ~]# kubectl logs kube-flannel-ds-hrlb4 -n kube-flannel     
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I0616 09:34:30.437216       1 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false}
W0616 09:34:30.437290       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0616 09:34:30.449034       1 kube.go:144] Waiting 10m0s for node controller to sync
I0616 09:34:30.449066       1 kube.go:485] Starting kube subnet manager
I0616 09:34:31.449763       1 kube.go:151] Node controller sync successful
I0616 09:34:31.449795       1 main.go:231] Created subnet manager: Kubernetes Subnet Manager - n001
I0616 09:34:31.449803       1 main.go:234] Installing signal handlers
I0616 09:34:31.449885       1 main.go:542] Found network config - Backend type: vxlan
I0616 09:34:31.449906       1 match.go:206] Determining IP address of default interface
I0616 09:34:31.450696       1 match.go:259] Using interface with name ens33 and address 192.168.11.121
I0616 09:34:31.450794       1 match.go:281] Defaulting external address to interface address (192.168.11.121)
I0616 09:34:31.450902       1 vxlan.go:140] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0616 09:34:31.451147       1 main.go:334] Error registering network: failed to acquire lease: node "n001" pod cidr not assigned
W0616 09:34:31.451351       1 reflector.go:347] github.com/flannel-io/flannel/pkg/subnet/kube/kube.go:486: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
I0616 09:34:31.451362       1 main.go:522] Stopping shutdownHandler...
[root@M001 ~]#

Based on the error message:

Make sure node "m001" is configured correctly and has been assigned a pod CIDR. You can inspect the node's configuration with:

[root@M001 ~]# kubectl describe node m001
Name:               m001
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=m001
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 16 Jun 2023 15:07:33 +0800
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  m001
  AcquireTime:     <unset>
  RenewTime:       Fri, 16 Jun 2023 17:36:58 +0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 16 Jun 2023 17:32:33 +0800   Fri, 16 Jun 2023 15:07:31 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 16 Jun 2023 17:32:33 +0800   Fri, 16 Jun 2023 15:07:31 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 16 Jun 2023 17:32:33 +0800   Fri, 16 Jun 2023 15:07:31 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Fri, 16 Jun 2023 17:32:33 +0800   Fri, 16 Jun 2023 16:26:16 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.11.120
  Hostname:    m001
Capacity:
  cpu:                2
  ephemeral-storage:  38700584Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1789588Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  35666458156
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1687188Ki
  pods:               110
System Info:
  Machine ID:                 714a13eae5c04693ad91bf1cdbcf706f
  System UUID:                9ca14d56-178a-3d0f-d26c-94bcbee0f0e5
  Boot ID:                    f943b00b-7105-4d26-ac7e-1cbf4b261bd5
  Kernel Version:             5.14.0-325.el9.x86_64
  OS Image:                   CentOS Stream 9
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.21
  Kubelet Version:            v1.26.0
  Kube-Proxy Version:         v1.26.0
Non-terminated Pods:          (8 in total)
  Namespace                   Name                            CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                            ------------  ----------  ---------------  -------------  ---
  kube-flannel                kube-flannel-ds-fc2fb           100m (5%)     0 (0%)      50Mi (3%)        0 (0%)         5m34s
  kube-system                 coredns-5bbd96d687-c948c        100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    149m
  kube-system                 coredns-5bbd96d687-tpqs6        100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    149m
  kube-system                 etcd-m001                       100m (5%)     0 (0%)      100Mi (6%)       0 (0%)         149m
  kube-system                 kube-apiserver-m001             250m (12%)    0 (0%)      0 (0%)           0 (0%)         149m
  kube-system                 kube-controller-manager-m001    200m (10%)    0 (0%)      0 (0%)           0 (0%)         149m
  kube-system                 kube-proxy-khc5n                0 (0%)        0 (0%)      0 (0%)           0 (0%)         149m
  kube-system                 kube-scheduler-m001             100m (5%)     0 (0%)      0 (0%)           0 (0%)         149m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                950m (47%)   0 (0%)
  memory             290Mi (17%)  340Mi (20%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-1Gi      0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Events:
  Type    Reason            Age                    From           Message
  ----    ------            ----                   ----           -------
  Normal  CIDRNotAvailable  4m30s (x38 over 149m)  cidrAllocator  Node m001 status is now: CIDRNotAvailable
[root@M001 ~]#

Check the flannel subnet file and the container log:

[root@M001 ~]# cat /run/flannel/subnet.env
cat: /run/flannel/subnet.env: No such file or directory
[root@M001 ~]# kubectl logs kube-flannel-ds-fc2fb -n kube-flannel
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I0619 08:24:03.670815       1 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false}
W0619 08:24:03.670900       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0619 08:24:03.681661       1 kube.go:144] Waiting 10m0s for node controller to sync
I0619 08:24:03.681716       1 kube.go:485] Starting kube subnet manager
I0619 08:24:04.682401       1 kube.go:151] Node controller sync successful
I0619 08:24:04.682488       1 main.go:231] Created subnet manager: Kubernetes Subnet Manager - m001
I0619 08:24:04.682494       1 main.go:234] Installing signal handlers
I0619 08:24:04.682865       1 main.go:542] Found network config - Backend type: vxlan
I0619 08:24:04.682913       1 match.go:206] Determining IP address of default interface
I0619 08:24:04.683997       1 match.go:259] Using interface with name ens33 and address 192.168.11.120
I0619 08:24:04.684037       1 match.go:281] Defaulting external address to interface address (192.168.11.120)
I0619 08:24:04.684245       1 vxlan.go:140] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0619 08:24:04.684536       1 main.go:334] Error registering network: failed to acquire lease: node "m001" pod cidr not assigned
I0619 08:24:04.684744       1 main.go:522] Stopping shutdownHandler...
W0619 08:24:04.685011       1 reflector.go:347] github.com/flannel-io/flannel/pkg/subnet/kube/kube.go:486: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
[root@M001 ~]#

We see:

Error registering network: failed to acquire lease: node "m001" pod cidr not assigned

So the pod CIDR configuration is still the problem.

Check the cluster's pod CIDR configuration:

[root@M001 ~]# kubectl cluster-info dump | grep -m 1 cluster-cidr
                            "--cluster-cidr=10.100.0.0/16",
[root@M001 ~]#
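The pod CIDR assigned to each individual node can also be checked directly; a small sketch:

# per-node pod CIDR assignment (empty when nothing has been allocated yet)
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR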

Something looks wrong here; check the yml configuration file that was used to initialize k8s:

[root@M001 ~]# 
[root@M001 ~]# cat kubeadm-init.yml 
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.11.120
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
#  name: node
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: xml_k8s
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.26.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.100.0.0/16
scheduler: {}

---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd                                       # declare that the cgroup driver is systemd
failSwapOn: false

---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs


[root@M001 ~]#

The podSubnet above was a slip of the hand: 10.100.0.0/16 lies entirely inside the serviceSubnet 10.96.0.0/12 (which spans 10.96.0.0-10.111.255.255), so the CIDR allocator has nothing left to hand out to the nodes and keeps reporting CIDRNotAvailable.

The cause is found; the next question is how to fix it, preferably without tearing down and rebuilding the whole cluster.

Steps:

a) Modify the cluster configuration

Edit the cluster configuration ConfigMap and correct podSubnet under networking.

[root@M001 ~]# kubectl edit cm kubeadm-config -n kube-system              
configmap/kubeadm-config edited
[root@M001 ~]# 

---
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      extraArgs:
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta3
    certificatesDir: /etc/kubernetes/pki
    clusterName: xml_k8s
    controllerManager: {}
    dns: {}
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: registry.aliyuncs.com/google_containers
    kind: ClusterConfiguration
    kubernetesVersion: v1.26.0
    networking:
      dnsDomain: cluster.local
      podSubnet: 10.112.0.0/12        # changed so it no longer overlaps the serviceSubnet; it must also match the flannel config
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
kind: ConfigMap
metadata:
  creationTimestamp: "2023-06-16T07:07:35Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "199"
  uid: 1c652fe9-a733-4793-a6f0-cc9d2d779644
---

b) Modify the startup arguments of the controller-manager static pod

Make sure the following arguments are present and set to the new range:

--allocate-node-cidrs=true

--cluster-cidr=10.112.0.0/12

[root@M001 ~]# kubectl edit cm kubeadm-config -n kube-system              
configmap/kubeadm-config edited
[root@M001 ~]# vi /etc/kubernetes/manifests/kube-controller-manager.yaml    
[root@M001 ~]# cat /etc/kubernetes/manifests/kube-controller-manager.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true     # make sure this flag is present
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=127.0.0.1
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cluster-cidr=10.112.0.0/12   # changed here to match the flannel config
    - --cluster-name=xml_k8s
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --leader-elect=true
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --root-ca-file=/etc/kubernetes/pki/ca.crt
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=10.96.0.0/12
    - --use-service-account-credentials=true
    image: registry.aliyuncs.com/google_containers/kube-controller-manager:v1.26.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10257
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-controller-manager
    resources:
      requests:
        cpu: 200m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10257
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
    - mountPath: /etc/pki
      name: etc-pki
      readOnly: true
    - mountPath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
      name: flexvolume-dir
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
    - mountPath: /etc/kubernetes/controller-manager.conf
      name: kubeconfig
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/ssl/certs
      type: DirectoryOrCreate
    name: ca-certs
  - hostPath:
      path: /etc/pki
      type: DirectoryOrCreate
    name: etc-pki
  - hostPath:
      path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
      type: DirectoryOrCreate
    name: flexvolume-dir
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs
  - hostPath:
      path: /etc/kubernetes/controller-manager.conf
      type: FileOrCreate
    name: kubeconfig
status: {}
[root@M001 ~]#

c) Check that the configuration has taken effect

[root@M001 ~]# kubectl cluster-info dump | grep -m 1 cluster-cidr        
                            "--cluster-cidr=10.112.0.0/12",
[root@M001 ~]#

Note: if the change is slow to propagate, you can delete the affected pods manually so they are recreated, e.g. kubectl delete pod -n kube-flannel kube-flannel-ds-???
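If you would rather not look up the generated pod names, the flannel pods can also be deleted by label so the DaemonSet recreates them; a hedged sketch, assuming the labels from the upstream kube-flannel.yml (app=flannel):

# delete all flannel pods at once; the DaemonSet recreates them
kubectl delete pod -n kube-flannel -l app=flannel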

d) Confirm the setting in the flannel configuration file

Flannel's configuration file is kube-flannel.yml; check that its Network value is exactly the same as the value set in the previous steps.

[Screenshot: the Network field in the net-conf.json section of kube-flannel.yml, set to 10.112.0.0/12]
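The value flannel actually uses can also be read back from the ConfigMap created by kube-flannel.yml; a small sketch:

# show net-conf.json; its Network field must match the cluster-cidr set above
kubectl get cm kube-flannel-cfg -n kube-flannel -o jsonpath='{.data.net-conf\.json}'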

e) Confirm the status of each node, one by one

#M001
[root@M001 ~]# kubectl describe node m001                        
Name:               m001
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=m001
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"b2:50:68:29:17:b5"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.11.120
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 16 Jun 2023 15:07:33 +0800
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  m001
  AcquireTime:     <unset>
  RenewTime:       Mon, 19 Jun 2023 16:58:48 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Mon, 19 Jun 2023 16:55:02 +0800   Mon, 19 Jun 2023 16:55:02 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Mon, 19 Jun 2023 16:54:31 +0800   Fri, 16 Jun 2023 15:07:31 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Mon, 19 Jun 2023 16:54:31 +0800   Fri, 16 Jun 2023 15:07:31 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Mon, 19 Jun 2023 16:54:31 +0800   Fri, 16 Jun 2023 15:07:31 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Mon, 19 Jun 2023 16:54:31 +0800   Fri, 16 Jun 2023 16:26:16 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.11.120
  Hostname:    m001
Capacity:
  cpu:                2
  ephemeral-storage:  38700584Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1789588Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  35666458156
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1687188Ki
  pods:               110
System Info:
  Machine ID:                 714a13eae5c04693ad91bf1cdbcf706f
  System UUID:                9ca14d56-178a-3d0f-d26c-94bcbee0f0e5
  Boot ID:                    f943b00b-7105-4d26-ac7e-1cbf4b261bd5
  Kernel Version:             5.14.0-325.el9.x86_64
  OS Image:                   CentOS Stream 9
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.21
  Kubelet Version:            v1.26.0
  Kube-Proxy Version:         v1.26.0
PodCIDR:                      10.112.0.0/24     # now assigned
PodCIDRs:                     10.112.0.0/24     # now assigned
Non-terminated Pods:          (8 in total)
  Namespace                   Name                            CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                            ------------  ----------  ---------------  -------------  ---
  kube-flannel                kube-flannel-ds-fc2fb           100m (5%)     0 (0%)      50Mi (3%)        0 (0%)         2d23h
  kube-system                 coredns-5bbd96d687-c948c        100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    3d1h
  kube-system                 coredns-5bbd96d687-tpqs6        100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    3d1h
  kube-system                 etcd-m001                       100m (5%)     0 (0%)      100Mi (6%)       0 (0%)         3d1h
  kube-system                 kube-apiserver-m001             250m (12%)    0 (0%)      0 (0%)           0 (0%)         3d1h
  kube-system                 kube-controller-manager-m001    200m (10%)    0 (0%)      0 (0%)           0 (0%)         4m55s
  kube-system                 kube-proxy-khc5n                0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d1h
  kube-system                 kube-scheduler-m001             100m (5%)     0 (0%)      0 (0%)           0 (0%)         3d1h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                950m (47%)   0 (0%)
  memory             290Mi (17%)  340Mi (20%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-1Gi      0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Events:
  Type    Reason            Age                    From             Message
  ----    ------            ----                   ----             -------
  Normal  CIDRNotAvailable  6m59s (x92 over 3d1h)  cidrAllocator    Node m001 status is now: CIDRNotAvailable
  Normal  RegisteredNode    4m26s                  node-controller  Node m001 event: Registered Node m001 in Controller
[root@M001 ~]# 


#N001
[root@M001 ~]# kubectl describe node n001
Name:               n001
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=n001
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"4e:99:fd:f9:6a:d5"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.11.121
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 16 Jun 2023 15:55:01 +0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  n001
  AcquireTime:     <unset>
  RenewTime:       Mon, 19 Jun 2023 17:00:56 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Mon, 19 Jun 2023 16:56:45 +0800   Mon, 19 Jun 2023 16:56:45 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Mon, 19 Jun 2023 16:59:45 +0800   Fri, 16 Jun 2023 16:50:08 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Mon, 19 Jun 2023 16:59:45 +0800   Fri, 16 Jun 2023 16:50:08 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Mon, 19 Jun 2023 16:59:45 +0800   Fri, 16 Jun 2023 16:50:08 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Mon, 19 Jun 2023 16:59:45 +0800   Fri, 16 Jun 2023 16:51:48 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.11.121
  Hostname:    n001
Capacity:
  cpu:                2
  ephemeral-storage:  38700584Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1789588Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  35666458156
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1687188Ki
  pods:               110
System Info:
  Machine ID:                 7e170c574de842a6b03a8c74efbc7755
  System UUID:                6d2b4d56-c0f1-6a16-8354-64e1136889f0
  Boot ID:                    bff2448c-9be0-4c49-94bd-e13d66671e55
  Kernel Version:             5.14.0-325.el9.x86_64
  OS Image:                   CentOS Stream 9
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.21
  Kubelet Version:            v1.26.0
  Kube-Proxy Version:         v1.26.0
PodCIDR:                      10.112.1.0/24
PodCIDRs:                     10.112.1.0/24
Non-terminated Pods:          (2 in total)
  Namespace                   Name                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                     ------------  ----------  ---------------  -------------  ---
  kube-flannel                kube-flannel-ds-hrlb4    100m (5%)     0 (0%)      50Mi (3%)        0 (0%)         2d23h
  kube-system                 kube-proxy-5q92t         0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d1h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (5%)  0 (0%)
  memory             50Mi (3%)  0 (0%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-1Gi      0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
Events:
  Type    Reason            Age                    From             Message
  ----    ------            ----                   ----             -------
  Normal  CIDRNotAvailable  8m42s (x99 over 3d1h)  cidrAllocator    Node n001 status is now: CIDRNotAvailable
  Normal  RegisteredNode    6m27s                  node-controller  Node n001 event: Registered Node n001 in Controller
[root@M001 ~]# 




#N002
[root@M001 ~]# kubectl describe node n002
Name:               n002
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=n002
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"f6:03:6c:3e:42:cb"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.11.122
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 16 Jun 2023 15:58:11 +0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  n002
  AcquireTime:     <unset>
  RenewTime:       Mon, 19 Jun 2023 17:01:16 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Mon, 19 Jun 2023 16:56:13 +0800   Mon, 19 Jun 2023 16:56:13 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Mon, 19 Jun 2023 16:58:22 +0800   Fri, 16 Jun 2023 15:58:11 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Mon, 19 Jun 2023 16:58:22 +0800   Fri, 16 Jun 2023 15:58:11 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Mon, 19 Jun 2023 16:58:22 +0800   Fri, 16 Jun 2023 15:58:11 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Mon, 19 Jun 2023 16:58:22 +0800   Fri, 16 Jun 2023 15:59:40 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.11.122
  Hostname:    n002
Capacity:
  cpu:                1
  ephemeral-storage:  38700584Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             756904Ki
  pods:               110
Allocatable:
  cpu:                1
  ephemeral-storage:  35666458156
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             654504Ki
  pods:               110
System Info:
  Machine ID:                 a9e6288c521d4f868c036c3b466209c4
  System UUID:                03794d56-7ee4-42f8-c910-3bd20730aef1
  Boot ID:                    c3775db2-5fae-4906-adca-23427f4c22d3
  Kernel Version:             5.14.0-171.el9.x86_64
  OS Image:                   CentOS Stream 9
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.21
  Kubelet Version:            v1.26.0
  Kube-Proxy Version:         v1.26.0
PodCIDR:                      10.112.2.0/24
PodCIDRs:                     10.112.2.0/24
Non-terminated Pods:          (3 in total)
  Namespace                   Name                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                     ------------  ----------  ---------------  -------------  ---
  default                     ngx                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d
  kube-flannel                kube-flannel-ds-bssdt    100m (10%)    0 (0%)      50Mi (7%)        0 (0%)         2d23h
  kube-system                 kube-proxy-b245t         0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d1h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                100m (10%)  0 (0%)
  memory             50Mi (7%)   0 (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:
  Type    Reason            Age                  From             Message
  ----    ------            ----                 ----             -------
  Normal  CIDRNotAvailable  10m (x83 over 3d1h)  cidrAllocator    Node n002 status is now: CIDRNotAvailable
  Normal  RegisteredNode    6m52s                node-controller  Node n002 event: Registered Node n002 in Controller
[root@M001 ~]#

f) Check that the pods are running

# running state
[root@M001 ~]# kubectl get pods --all-namespaces
NAMESPACE      NAME                           READY   STATUS    RESTARTS       AGE
default        ngx                            1/1     Running   0              3d
kube-flannel   kube-flannel-ds-bssdt          1/1     Running   60 (11m ago)   2d23h
kube-flannel   kube-flannel-ds-fc2fb          1/1     Running   60 (12m ago)   2d23h
kube-flannel   kube-flannel-ds-hrlb4          1/1     Running   60 (10m ago)   2d23h
kube-system    coredns-5bbd96d687-c948c       1/1     Running   0              3d1h
kube-system    coredns-5bbd96d687-tpqs6       1/1     Running   0              3d1h
kube-system    etcd-m001                      1/1     Running   0              3d1h
kube-system    kube-apiserver-m001            1/1     Running   0              3d1h
kube-system    kube-controller-manager-m001   1/1     Running   0              8m37s
kube-system    kube-proxy-5q92t               1/1     Running   2 (3d ago)     3d1h
kube-system    kube-proxy-b245t               1/1     Running   0              3d1h
kube-system    kube-proxy-khc5n               1/1     Running   0              3d1h
kube-system    kube-scheduler-m001            1/1     Running   0              3d1h
[root@M001 ~]# 
# the log of one of them
[root@M001 ~]# kubectl logs kube-flannel-ds-hrlb4 -n kube-flannel
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I0619 08:56:44.161875       1 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false}
W0619 08:56:44.162426       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0619 08:56:44.172385       1 kube.go:144] Waiting 10m0s for node controller to sync
I0619 08:56:44.172539       1 kube.go:485] Starting kube subnet manager
I0619 08:56:44.176719       1 kube.go:506] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.112.2.0/24]
I0619 08:56:44.176756       1 kube.go:506] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.112.0.0/24]
I0619 08:56:45.173029       1 kube.go:151] Node controller sync successful
I0619 08:56:45.173057       1 main.go:231] Created subnet manager: Kubernetes Subnet Manager - n001
I0619 08:56:45.173062       1 main.go:234] Installing signal handlers
I0619 08:56:45.173321       1 main.go:542] Found network config - Backend type: vxlan
I0619 08:56:45.173344       1 match.go:206] Determining IP address of default interface
I0619 08:56:45.173709       1 match.go:259] Using interface with name ens33 and address 192.168.11.121
I0619 08:56:45.173730       1 match.go:281] Defaulting external address to interface address (192.168.11.121)
I0619 08:56:45.173769       1 vxlan.go:140] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
W0619 08:56:45.182753       1 main.go:595] no subnet found for key: FLANNEL_SUBNET in file: /run/flannel/subnet.env
I0619 08:56:45.182820       1 main.go:481] Current network or subnet (10.112.0.0/12, 10.112.1.0/24) is not equal to previous one (0.0.0.0/0, 0.0.0.0/0), trying to recycle old iptables rules
I0619 08:56:45.183633       1 kube.go:506] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.112.1.0/24]
I0619 08:56:45.204871       1 main.go:356] Setting up masking rules
I0619 08:56:45.207593       1 main.go:407] Changing default FORWARD chain policy to ACCEPT
I0619 08:56:45.209693       1 iptables.go:290] generated 7 rules
I0619 08:56:45.212622       1 main.go:435] Wrote subnet file to /run/flannel/subnet.env
I0619 08:56:45.212704       1 main.go:439] Running backend.
I0619 08:56:45.213367       1 iptables.go:290] generated 3 rules
I0619 08:56:45.214654       1 vxlan_network.go:64] watching for new subnet leases
I0619 08:56:45.215811       1 watch.go:51] Batch elem [0] is { subnet.Event{Type:0, Lease:subnet.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa700200, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:subnet.LeaseAttrs{PublicIP:0xc0a80b7a, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x31, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x66, 0x36, 0x3a, 0x30, 0x33, 0x3a, 0x36, 0x63, 0x3a, 0x33, 0x65, 0x3a, 0x34, 0x32, 0x3a, 0x63, 0x62, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0619 08:56:45.215870       1 watch.go:51] Batch elem [0] is { subnet.Event{Type:0, Lease:subnet.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa700000, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:subnet.LeaseAttrs{PublicIP:0xc0a80b78, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x31, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x62, 0x32, 0x3a, 0x35, 0x30, 0x3a, 0x36, 0x38, 0x3a, 0x32, 0x39, 0x3a, 0x31, 0x37, 0x3a, 0x62, 0x35, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0619 08:56:45.219194       1 main.go:460] Waiting for all goroutines to exit
I0619 08:56:45.235656       1 iptables.go:283] bootstrap done
I0619 08:56:45.242457       1 iptables.go:283] bootstrap done
[root@M001 ~]#

g) Check the pod IP addresses

[root@M001 ~]# kubectl get pods -o wide --all-namespaces
NAMESPACE      NAME                           READY   STATUS    RESTARTS       AGE     IP               NODE   NOMINATED NODE   READINESS GATES
default        ngx                            1/1     Running   0              3d      10.112.2.2       n002   <none>           <none>
kube-flannel   kube-flannel-ds-bssdt          1/1     Running   60 (17m ago)   2d23h   192.168.11.122   n002   <none>           <none>
kube-flannel   kube-flannel-ds-fc2fb          1/1     Running   60 (18m ago)   2d23h   192.168.11.120   m001   <none>           <none>
kube-flannel   kube-flannel-ds-hrlb4          1/1     Running   60 (16m ago)   2d23h   192.168.11.121   n001   <none>           <none>
kube-system    coredns-5bbd96d687-c948c       1/1     Running   0              3d2h    10.112.0.3       m001   <none>           <none>
kube-system    coredns-5bbd96d687-tpqs6       1/1     Running   0              3d2h    10.112.0.2       m001   <none>           <none>
kube-system    etcd-m001                      1/1     Running   0              3d2h    192.168.11.120   m001   <none>           <none>
kube-system    kube-apiserver-m001            1/1     Running   0              3d2h    192.168.11.120   m001   <none>           <none>
kube-system    kube-controller-manager-m001   1/1     Running   0              14m     192.168.11.120   m001   <none>           <none>
kube-system    kube-proxy-5q92t               1/1     Running   2 (3d ago)     3d1h    192.168.11.121   n001   <none>           <none>
kube-system    kube-proxy-b245t               1/1     Running   0              3d1h    192.168.11.122   n002   <none>           <none>
kube-system    kube-proxy-khc5n               1/1     Running   0              3d2h    192.168.11.120   m001   <none>           <none>
kube-system    kube-scheduler-m001            1/1     Running   0              3d2h    192.168.11.120   m001   <none>           <none>
[root@M001 ~]# 
[root@M001 ~]# kubectl get po -n kube-flannel -o wide
NAME                    READY   STATUS    RESTARTS       AGE     IP               NODE   NOMINATED NODE   READINESS GATES
kube-flannel-ds-bssdt   1/1     Running   60 (24m ago)   2d23h   192.168.11.122   n002   <none>           <none>
kube-flannel-ds-fc2fb   1/1     Running   60 (25m ago)   2d23h   192.168.11.120   m001   <none>           <none>
kube-flannel-ds-hrlb4   1/1     Running   60 (23m ago)   2d23h   192.168.11.121   n001   <none>           <none>
[root@M001 ~]#

At this point, the problem is solved.

If your situation is different, you can continue with the additional steps below.

  2. Check the container configuration: make sure the container's configuration is correct and complete. Verify the image, the start command, environment variables and other parameters against what the application expects. Errors in the configuration can make the container fail to start and fall into CrashLoopBackOff.
  3. Update the container image or configuration: if the image or configuration turns out to be the problem, update it. Change the image version or rewrite the configuration as needed, then apply the change to the pod with:
kubectl apply -f <config-file>

<config-file> is the path of the updated configuration file.

  4. Check the resource limits: make sure the pod's resource limits (CPU and memory) are suitable for the application running in the container. Limits that are too low can prevent the application from running and push it into CrashLoopBackOff.
  5. Check dependencies: if the application depends on other services or resources (databases, configuration data, and so on), make sure those dependencies are configured correctly and reachable. Unsatisfied dependencies can crash the application and lead to CrashLoopBackOff (see the sketch after this list).
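For steps 4 and 5, the pod's events and the last terminated state usually tell you whether the container was OOM-killed or exited with an error code; a hedged sketch:

# events, restart reasons and resource-related messages
kubectl describe pod <pod-name> -n <namespace>
# exit code / reason of the last terminated container instance
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[0].lastState}'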

Verify by using the cluster

Create a Pod

Create:

[root@M001 ~]# kubectl run ngx --image=nginx:alpine --port=80
pod/ngx created
[root@M001 ~]#

Check:

[root@M001 ~]# kubectl get pods ngx -o wide
NAME   READY   STATUS    RESTARTS   AGE    IP           NODE   NOMINATED NODE   READINESS GATES
ngx    1/1     Running   0          3d1h   10.112.2.2   n002   <none>           <none>
[root@M001 ~]# 
# wait until the status becomes Running

Create a Service

Create:

[root@M001 ~]# kubectl expose pod ngx --target-port 80 --type NodePort
service/ngx exposed
[root@M001 ~]# 
# the 80 here refers to port 80 of the ngx pod

Check:

[root@M001 ~]# kubectl get service
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP        3d2h
ngx          NodePort    10.107.222.138   <none>        80:31718/TCP   14s
[root@M001 ~]# kubectl get service ngx                                
NAME   TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
ngx    NodePort   10.102.254.255   <none>        80:31718/TCP   20s
[root@M001 ~]#
# to access it, use
#http://192.168.11.120:31718

Verify access

The nginx Service is exposed via NodePort, so you can access port 31718 directly from a browser.

[Screenshot: the nginx welcome page opened in a browser at http://192.168.11.120:31718]
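The same check can be done from the command line; a small sketch (any node IP works for a NodePort Service):

# request the nginx welcome page through the NodePort
curl -I http://192.168.11.120:31718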

How do you pin a Service to a fixed port?

To create a Kubernetes Service bound to a fixed port, create a Service object and specify the port for it explicitly.

Here is an example YAML file for a Service with a fixed port:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP
  ports:
    - protocol: TCP
      port: <port-number>
      targetPort: <target-port>
  selector:
    app: my-app

Replace <port-number> with the port you want to use and <target-port> with the container port the traffic should be forwarded to; my-service is the Service name you want, and app: my-app is a label selector that associates the Service with pods labeled app=my-app.

Save the YAML above as service.yaml, then create the Service with:

kubectl apply -f service.yaml

This creates a Service named my-service bound to the specified port.

Note that this example creates a ClusterIP Service, which is only reachable from inside the cluster. If you need to reach the Service from outside the cluster, consider another Service type such as NodePort or LoadBalancer.
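For the NodePort case specifically, the external port can be pinned by setting spec.ports[].nodePort (the value must fall inside the cluster's NodePort range, 30000-32767 by default). A hedged sketch that pins the ngx Service created above to the port it happened to receive:

# pin the NodePort of the existing ngx Service to 31718
kubectl patch service ngx --type='json' \
  -p='[{"op":"replace","path":"/spec/ports/0/nodePort","value":31718}]'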


That is all for this article; the next one will cover deploying a graphical management UI for k8s.

This article was prepared in a hurry, so there may be typos, unclear wording or even mistakes. If you find anything wrong, you are sincerely welcome to leave a comment and I will do my best to correct it.

If you enjoyed this article, please like, favorite and share. Thanks!
