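I. Basic Concepts
A probe is a periodic diagnostic that the kubelet performs on a container. To run it, the kubelet invokes a handler implemented by the container.
For a running container, the kubelet can optionally run the following three kinds of probes and react to their results:
- livenessProbe: checks whether the container is alive, that is, whether it is still running properly. If the liveness probe finds the container unhealthy, the kubelet kills the container, and the container's restart policy decides whether it is restarted. If a container defines no liveness probe, the kubelet treats the probe result as always successful. In the output of kubectl get pods, the STATUS column shows whether the Pod is Running, and the RESTARTS column counts how many times its containers have been restarted.
- readinessProbe: checks whether the container is ready to serve requests. If the readiness probe fails, the container's Ready condition becomes False and the endpoints controller removes the Pod's address from the Endpoints of every Service that matches it, so no traffic is routed to the Pod until a later probe succeeds. Unlike livenessProbe, a failed readinessProbe never causes the kubelet to restart the container. In the output of kubectl get pods, the READY column shows how many of the Pod's containers are ready (for example 1/1).
- startupProbe: checks whether the application inside the container has started. If a startupProbe is configured, the other probes are disabled until it succeeds. If it fails, the kubelet kills the container, which is then handled according to its restart policy.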
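As a minimal sketch of how the three probes sit side by side on one container: the Pod name, the image, the port and the timing values below are illustrative assumptions, not taken from the examples later in this article.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo            # hypothetical name
spec:
  containers:
  - name: web
    image: nginx              # assumed image; the stock image serves HTTP on port 80
    ports:
    - containerPort: 80
    startupProbe:             # the other two probes are held back until this succeeds
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
      failureThreshold: 12
    livenessProbe:            # on failure the container is killed and restarted per restartPolicy
      httpGet:
        path: /
        port: 80
      periodSeconds: 10
    readinessProbe:           # on failure the Pod is removed from Service endpoints; no restart
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
```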
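II. When to Use Each Probe
Use a startupProbe for applications that take a long time to start.
Use a readinessProbe when the Pod has started but the application in its container is still initializing, so that requests sent too early would fail; this keeps the service highly available.
Use a livenessProbe for programs that cannot crash on their own when they run into a problem.
III. Similarities and Differences Among the Three Probes
startupProbe is configured the same way as readinessProbe and livenessProbe, and it treats the container the same way livenessProbe does: on failure the container is restarted. It runs only during container startup and stops once it has succeeded.
readinessProbe: on failure, the Pod's IP:Port is removed from the corresponding Endpoints list. It runs for the whole lifetime of the container.
livenessProbe: on failure, the container is killed and the Pod's restart policy determines what happens next. It runs for the whole lifetime of the container.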
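IV. The Three Probe Mechanisms
The hands-on examples in this article use three mechanisms: exec (run a command inside the container), httpGet (send an HTTP GET request to the container), and tcpSocket (open a TCP connection to a container port).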
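A minimal sketch of what the three mechanisms look like as probe stanzas. The command, path and port are borrowed from the hands-on examples in this article, and each stanza is shown as a separate YAML document because a container can carry only one livenessProbe.

```yaml
# exec: run a command inside the container; exit code 0 counts as success
livenessProbe:
  exec:
    command: ["cat", "/tmp/healthy"]
---
# httpGet: send an HTTP GET; a status code from 200 up to (but not including) 400 counts as success
livenessProbe:
  httpGet:
    path: /
    port: 8080
---
# tcpSocket: try to open a TCP connection; an accepted connection counts as success
livenessProbe:
  tcpSocket:
    port: 8080
```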
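V. Hands-On Examples
Liveness probe
1. Liveness probe with ExecAction
Five seconds after the container starts, the kubelet begins checking every 5 seconds whether the file /tmp/healthy exists. The container deletes the file after 30 seconds, so the probe starts failing; once it has failed six times in a row (failureThreshold: 6), the kubelet kills the container and restarts it according to the restart policy.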
[root@k8s-master1 ~]# cat exec-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5   # delay before the first probe after the container starts
      periodSeconds: 5         # probe interval; defaults to 10 seconds, minimum 1 second
      timeoutSeconds: 1        # probe timeout
      successThreshold: 1      # one success marks the probe as passing
      failureThreshold: 6      # six consecutive failures mark the probe as failed
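initialDelaySeconds + periodSeconds * failureThreshold = roughly how long a container is given to start passing the probe before the probe is declared failed. With the values above: 5 + 5 * 6 = 35 seconds.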
[root@k8s-master1 ~]# kubectl describe pod liveness-exec
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 51s default-scheduler Successfully assigned default/liveness-exec to k8s-node2
Normal Pulling 50s kubelet, k8s-node2 Pulling image "busybox"
Normal Pulled 35s kubelet, k8s-node2 Successfully pulled image "busybox"
Normal Created 35s kubelet, k8s-node2 Created container liveness
Normal Started 35s kubelet, k8s-node2 Started container liveness
Warning Unhealthy 4s kubelet, k8s-node2 Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
[root@k8s-master1 ~]# kubectl get pods
NAME                       READY   STATUS    RESTARTS   AGE
busybox-5d7b4b65d6-6gfbb   1/1     Running   0          129m
liveness-exec              1/1     Running   5          8m37s
myweb-7ccb985444-cqbft     1/1     Running   0          129m
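2. Liveness probe with an HTTP request
Probing starts 15 seconds after the container starts. The timeout is 1 second, so the container must respond within 1 second or the probe counts as a failure. The kubelet probes the container every 10 seconds (period=10s) and restarts it after three consecutive failures (#failure=3). The manifest sets only initialDelaySeconds; the other values are the Kubernetes defaults.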
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-error
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-error
  template:
    metadata:
      name: nginx-error
      labels:
        app: nginx-error
    spec:
      containers:
      - name: nginx-error
        image: nginx-error
        imagePullPolicy: Never
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /
            port: 8080
            httpHeaders:
            - name: X-Custom-Header
              value: Awesome
          initialDelaySeconds: 15
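The timeout, period and failure threshold quoted in the description of this example do not appear in the manifest because they are defaults. As a sketch, the same probe with those defaults written out explicitly would look like this; the four added fields are the documented default values, not settings from the original manifest.

```yaml
livenessProbe:
  httpGet:
    path: /
    port: 8080
    httpHeaders:
    - name: X-Custom-Header
      value: Awesome
  initialDelaySeconds: 15
  periodSeconds: 10      # default
  timeoutSeconds: 1      # default
  successThreshold: 1    # default; liveness probes only accept 1 here
  failureThreshold: 3    # default
```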
[root@k8s-master1 ~]# kubectl describe deployment/nginx-error
Name: nginx-error
Namespace: default
CreationTimestamp: Mon, 03 Jan 2022 03:35:00 -0500
Labels: <none>
Annotations: deployment.kubernetes.io/revision: 1
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"name":"nginx-error","namespace":"default"},"spec":{"replicas":1,...
Selector: app=nginx-error
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=nginx-error
Containers:
nginx-error:
Image: nginx
Port: 8080/TCP
Host Port: 0/TCP
Liveness: http-get http://:8080/ delay=15s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: nginx-error-54c58ddb79 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 88s deployment-controller Scaled up replica set nginx-error-54c58ddb79 to 1
[root@k8s-master1 ~]# kubectl get pods -o wide
NAME                           READY   STATUS             RESTARTS   AGE     IP            NODE        NOMINATED NODE   READINESS GATES
busybox-5d7b4b65d6-6gfbb       1/1     Running            0          148m    10.244.1.2    k8s-node1   <none>           <none>
liveness-exec                  1/1     Running            10         27m     10.244.2.3    k8s-node2   <none>           <none>
myweb-7ccb985444-cqbft         1/1     Running            0          148m    10.244.2.10   k8s-node2   <none>           <none>
nginx-error-54c58ddb79-6h9mx   0/1     CrashLoopBackOff   5          5m36s   10.244.2.4    k8s-node2   <none>           <none>
3. Liveness probe with a tcpSocket check
[root@k8s-master1 ~]# cat nginx-error2_deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-error2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-error2
  template:
    metadata:
      name: nginx-error2
      labels:
        app: nginx-error2
    spec:
      containers:
      - name: nginx-error2
        image: nginx
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        #readinessProbe:
        #  tcpSocket:
        #    port: 8080
        #  initialDelaySeconds: 5
        #  periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20
        #livenessProbe:
        #  httpGet:
        #    path: /
        #    port: 8080
        #  initialDelaySeconds: 15
[root@k8s-master1 ~]# kubectl describe pod/nginx-error2-65bb47c758-d9gkb
Name: nginx-error2-65bb47c758-d9gkb
Namespace: default
Priority: 0
Node: k8s-node2/192.168.255.144
Start Time: Mon, 03 Jan 2022 03:46:10 -0500
Labels: app=nginx-error2
pod-template-hash=65bb47c758
Annotations: <none>
Status: Running
IP: 10.244.2.5
IPs:
IP: 10.244.2.5
Controlled By: ReplicaSet/nginx-error2-65bb47c758
Containers:
nginx-error2:
Container ID: docker://96e949a0c05ad384da75a33130341a1cea3f10bf784493e57347cd15d809a5b7
Image: nginx
Image ID: docker-pullable://nginx@sha256:0d17b565c37bcbd895e9d92315a05c1c3c9a29f762b011a10c54a66cd53c9b31
Port: 8080/TCP
Host Port: 0/TCP
State: Running
Started: Mon, 03 Jan 2022 03:47:43 -0500
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 03 Jan 2022 03:46:27 -0500
Finished: Mon, 03 Jan 2022 03:47:27 -0500
Ready: True
Restart Count: 1
Liveness: tcp-socket :8080 delay=15s timeout=1s period=20s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-vf9dn (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-vf9dn:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-vf9dn
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 106s default-scheduler Successfully assigned default/nginx-error2-65bb47c758-d9gkb to k8s-node2
Normal Pulling 29s (x2 over 105s) kubelet, k8s-node2 Pulling image "nginx"
Warning Unhealthy 29s (x3 over 69s) kubelet, k8s-node2 Liveness probe failed: dial tcp 10.244.2.5:8080: connect: connection refused
Normal Killing 29s kubelet, k8s-node2 Container nginx-error2 failed liveness probe, will be restarted
Normal Pulled 13s (x2 over 90s) kubelet, k8s-node2 Successfully pulled image "nginx"
Normal Created 13s (x2 over 89s) kubelet, k8s-node2 Created container nginx-error2
Normal Started 13s (x2 over 89s) kubelet, k8s-node2 Started container nginx-error2
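The "connection refused" events are consistent with the stock nginx image listening on port 80 by default while the probe targets port 8080. For contrast, here is a sketch of a container section whose TCP probe would be expected to pass; it assumes the unmodified official nginx image and changes only the port.

```yaml
containers:
- name: nginx-error2
  image: nginx
  ports:
  - containerPort: 80
  livenessProbe:
    tcpSocket:
      port: 80               # assumes nginx's default listen port
    initialDelaySeconds: 15
    periodSeconds: 20
```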
Readiness probe
[root@k8s-master1 ~]# cat nginx-error2_deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-error2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-error2
  template:
    metadata:
      name: nginx-error2
      labels:
        app: nginx-error2
    spec:
      containers:
      - name: nginx-error2
        image: nginx
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
The readinessProbe stanza can use an exec check instead:
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 5
Or an httpGet check:
        readinessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 15
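A readiness probe only has a visible effect when a Service selects the Pod. As a sketch, a Service for the Deployment above could look like this; the Service name and the Service port are hypothetical, while the selector and targetPort are taken from the Deployment.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-error2-svc      # hypothetical name
spec:
  selector:
    app: nginx-error2         # the Pod label set in the Deployment
  ports:
  - port: 80                  # assumed Service port
    targetPort: 8080          # the containerPort declared in the Deployment
```

Running kubectl get endpoints nginx-error2-svc would then list only the addresses of Pods whose readiness probe currently passes. With the stock nginx image, which listens on port 80, a probe on 8080 is expected to keep failing, so the list would stay empty.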
Startup probe
startupProbe:
  httpGet:
    path: /
    port: 8080
  failureThreshold: 10       # consecutive failures tolerated before the container is killed
  initialDelaySeconds: 10    # wait 10 seconds before the first probe
  periodSeconds: 10          # probe every 10 seconds
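To show the fragment in context, here is a sketch of a full Pod that pairs this startup probe with a liveness probe. The Pod name and the image are hypothetical placeholders for a slow-starting application that serves HTTP on port 8080; the startup probe values are the ones from the fragment above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slow-start-demo          # hypothetical name
spec:
  containers:
  - name: app
    image: example/slow-app:1.0  # hypothetical image, assumed to serve HTTP on 8080
    ports:
    - containerPort: 8080
    startupProbe:
      httpGet:
        path: /
        port: 8080
      failureThreshold: 10
      initialDelaySeconds: 10
      periodSeconds: 10          # startup budget: roughly 10 + 10 * 10 = 110 seconds
    livenessProbe:               # not run until the startup probe has succeeded
      httpGet:
        path: /
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
```

Keeping the long startup budget on the startupProbe lets the livenessProbe stay strict, so a hang after startup is still detected within about 30 seconds.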
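VI. Order of Precedence
startupProbe > livenessProbe > readinessProbe. The startup probe takes precedence: the other two probes are not run until it succeeds. After that, the liveness and readiness probes run independently of each other.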