I. Background
Container runtimes such as Docker support persisting data outside the container itself, either on the local disk of the host machine or on storage elsewhere on the network. Similarly, Kubernetes can persist the data of short-lived containers, but the volume is bound to the Pod rather than to an individual container, and all containers in the Pod share the Pod's persistent volumes.
II. Volume Types and Overview
1. emptyDir volumes
An emptyDir volume is a temporary directory whose lifetime ends with its Pod. Although the containers in a Pod may mount the emptyDir volume at the same or different paths, they all read and write the same files in it. When a Pod is removed from a node for any reason, the data in its emptyDir volume is deleted permanently (a container crash does not remove the Pod from the node, so emptyDir data survives container crashes). An example:
apiVersion: v1
kind: Pod
metadata:
name: emptydir
spec:
containers:
- name: redis
image: redis
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data/redis
name: emptydir
- name: busybox
image: busybox
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /srv/busybox
name: emptydir
command: [ "sleep", "1000000" ]
volumes:
- name: emptydir
emptyDir: {}
After creating the Pod from the YAML above, first enter the busybox container and, under the mount point /srv/busybox, create a file named 1.txt containing "1111":
root@master1:~/yaml# kubectl exec -it pod/emptydir -c busybox -- /bin/sh
/ # echo 1111 >> /srv/busybox/1.txt
/ # cat /srv/busybox/1.txt
1111
Then enter the redis container and read 1.txt under the mount point /data/redis; its content matches what was written in the busybox container:
root@master1:~# kubectl exec -it pod/emptydir -c redis -- /bin/sh
# cat redis/1.txt
1111
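Beyond the default disk-backed form shown above, emptyDir can also be backed by tmpfs and capped in size. A minimal sketch (the volume name and the size value are illustrative, not taken from the example above):

```yaml
# emptyDir variant held in RAM (tmpfs); memory used counts against
# the container's memory limit, and sizeLimit caps the volume size.
volumes:
- name: cache          # illustrative name
  emptyDir:
    medium: Memory
    sizeLimit: 128Mi
```

Such a volume is still emptied on Pod removal, exactly like the disk-backed form.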
2. hostPath volumes
A hostPath volume mounts a file or directory from the host node's filesystem into your Pod. This is generally discouraged: hostPath volumes carry many security risks, and the best practice is to avoid them where possible. When a hostPath volume must be used, scope it to only the file or directory needed and mount it read-only. Be careful with this volume type, because:
- hostPath volumes can expose privileged system credentials (such as the kubelet's) or privileged APIs (such as the container runtime socket), which can be used for container escape or to attack other parts of the cluster.
- Pods with identical configuration (for example, created from the same PodTemplate) may behave differently on different nodes because the files on each node differ.
- Files or directories created on the underlying host are writable only by root. You either need to run your process as root in a privileged container or modify the file permissions on the host so the container can write to the hostPath volume.
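Following the advice above, a hostPath mount can be scoped to a single file and made read-only. A minimal sketch (the container name and the host path are illustrative):

```yaml
# Read-only hostPath limited to one existing file, per the best practice above.
containers:
- name: app                  # illustrative
  image: busybox
  volumeMounts:
  - mountPath: /etc/localtime
    name: tz
    readOnly: true           # the container cannot modify the host file
volumes:
- name: tz
  hostPath:
    path: /etc/localtime
    type: File               # must already exist as a regular file
```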
With the DirectoryOrCreate type, the directory /hostpath/redis need not exist on the node beforehand:
apiVersion: v1
kind: Pod
metadata:
name: hostpath
spec:
containers:
- name: redis
image: redis
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data/redis
name: redis
volumes:
- name: redis
hostPath:
path: /hostpath/redis
type: DirectoryOrCreate
When a Pod using the Directory type is created, it fails if the path /hostname/busybox does not already exist on the node:
apiVersion: v1
kind: Pod
metadata:
name: hostpath
spec:
containers:
- name: busybox
image: busybox
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /srv/busybox
name: busybox
command: [ "sleep", "1000000" ]
volumes:
- name: busybox
hostPath:
path: /hostname/busybox
type: Directory
3. NFS volumes
Besides emptyDir and hostPath volumes, NFS (Network File System) shares can be mounted into your Pod. Unlike emptyDir, which is erased when the Pod is deleted, an nfs volume's contents are preserved on Pod deletion; the volume is merely unmounted. This means an nfs volume can be pre-populated with data, and that data can be shared between Pods. (A hostPath volume also keeps its data after Pod deletion, so another option is to mount the NFS share onto the node first and use a hostPath volume; this example uses the nfs volume type over the network.)
First, confirm the NFS server is serving its exports correctly (setup steps omitted):
root@master1:~/yaml# showmount -e 172.16.100.30
Export list for 172.16.100.30:
/nfs/dir3 172.16.100.0/24
/nfs/dir2 172.16.100.0/24
/nfs/dir1 172.16.100.0/24
Next, define the YAML file, specifying the NFS server address and the exported directory:
apiVersion: v1
kind: Pod
metadata:
name: nfs
spec:
containers:
- image: redis
name: redis
volumeMounts:
- mountPath: /data/nfs
name: nfs-volume
volumes:
- name: nfs-volume
nfs:
server: 172.16.100.30
path: /nfs/dir1
readOnly: false
Then enter the Pod's /data/nfs directory and create a file named nfs.txt:
root@master1:~/yaml# kubectl get pod
NAME READY STATUS RESTARTS AGE
nfs 1/1 Running 0 5s
root@master1:~/yaml# kubectl exec -it pod/nfs -- /bin/sh
# cd nfs
# pwd
/data/nfs
# touch nfs.txt
Finally, check on the NFS server that the file exists:
[root@nfs1 dir1]# pwd
/nfs/dir1
[root@nfs1 dir1]# ls
nfs.txt
4. RBD block volumes
Before using Ceph storage, make sure the Kubernetes worker nodes can access the Ceph cluster (they need ceph-common installed, plus the Ceph cluster's configuration file and keyring), and that the corresponding block device already exists on the Ceph cluster. Ceph cluster setup is omitted here.
First, create the block storage on the Ceph cluster, then copy the cluster configuration file and the admin user's keyring to the Kubernetes worker nodes:
### create the storage pool
[root@ceph1 ceph]# ceph osd pool create kube.pool 64 64
### initialize the pool for the rbd application
[root@ceph1 ceph]# ceph osd pool application enable kube.pool rbd
enabled application 'rbd' on pool 'kube.pool'
### create the block image
[root@ceph1 ceph]# rbd create kube.pool/rbd-image1 -s 10G
[root@ceph1 ceph]# rbd info kube.pool/rbd-image1
rbd image 'rbd-image1':
size 10 GiB in 2560 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: df0b7ee92306
block_name_prefix: rbd_data.df0b7ee92306
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
create_timestamp: Thu Aug 31 23:43:36 2023
access_timestamp: Thu Aug 31 23:43:36 2023
modify_timestamp: Thu Aug 31 23:43:36 2023
### copy ceph.conf and the admin keyring to the Kubernetes worker nodes
[root@ceph1 ceph]# scp ceph.conf ceph.client.admin.keyring root@172.16.100.11:/etc/ceph/
[root@ceph1 ceph]# scp ceph.conf ceph.client.admin.keyring root@172.16.100.12:/etc/ceph/
[root@ceph1 ceph]# scp ceph.conf ceph.client.admin.keyring root@172.16.100.13:/etc/ceph/
Next, install the ceph-common package on all worker nodes and confirm they can see the Ceph cluster:
root@node1:~# apt install ceph-common
root@node1:~# ceph -s
cluster:
id: 7629fef6-1943-11ee-9b2d-000c29dbfa5e
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph1,ceph3,ceph2 (age 3w)
mgr: ceph1.xxracu(active, since 3w), standbys: ceph2.zhftxe, ceph3.wcysbm
mds: 3 up:standby
osd: 10 osds: 10 up (since 2w), 10 in (since 2w)
rgw: 2 daemons active (myorg.us-east-1.ceph1.bslmbn, myorg.us-east-1.ceph2.baypns)
task status:
data:
pools: 8 pools, 297 pgs
objects: 264 objects, 14 MiB
usage: 11 GiB used, 489 GiB / 500 GiB avail
pgs: 297 active+clean
Finally, define the YAML file and create the Pod:
apiVersion: v1
kind: Pod
metadata:
name: rbd
spec:
containers:
- image: redis
name: redis
imagePullPolicy: IfNotPresent
volumeMounts:
- name: rbdvolume
mountPath: /data/rbdvolume
volumes:
- name: rbdvolume
rbd:
monitors:
- '10.0.0.1:6789'
- '10.0.0.2:6789'
- '10.0.0.3:6789'
pool: kube.pool
image: rbd-image1
fsType: xfs
user: admin
keyring: /etc/ceph/ceph.client.admin.keyring
root@master1:~/yaml# kubectl apply -f rbd.yaml
pod/rbd created
root@master1:~/yaml# kubectl get pod
NAME READY STATUS RESTARTS AGE
rbd 1/1 Running 0 43s
root@master1:~/yaml# kubectl exec -it pod/rbd -- lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 63.4M 1 loop
loop1 7:1 0 79.9M 1 loop
loop2 7:2 0 111.9M 1 loop
loop3 7:3 0 40.8M 1 loop
loop5 7:5 0 53.3M 1 loop
loop6 7:6 0 63.5M 1 loop
sda 8:0 0 100G 0 disk
|-sda1 8:1 0 1M 0 part
|-sda2 8:2 0 2G 0 part
`-sda3 8:3 0 98G 0 part /etc/hosts
sr0 11:0 1 1024M 0 rom
rbd0 252:0 0 10G 0 disk /data/rbdvolume
III. PV, PVC, and SC
As seen above, a Pod can persist data by writing NFS or other storage details directly into its YAML, but that requires the author to be familiar with both the Kubernetes platform and the backend storage. In practice, end users usually have no need to understand how the backend storage works, so Kubernetes introduces the PV/PVC intermediate layer, turning storage into a consumer/provider model.
1. PersistentVolume (PV)
A PV can be provisioned in advance by an administrator (static provisioning) or provisioned dynamically using a StorageClass. Persistent volumes are cluster resources with a lifecycle independent of any Pod:
apiVersion: v1
kind: PersistentVolume
metadata:
name: dir1
spec:
storageClassName: manual
capacity:
storage: 5Gi
volumeMode: Filesystem
accessModes:
- ReadWriteOnce
nfs:
path: /nfs/dir1
server: 172.16.100.30
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: dir2
spec:
capacity:
storage: 5Gi
volumeMode: Filesystem
accessModes:
- ReadWriteOnce
nfs:
path: /nfs/dir2
server: 172.16.100.30
root@master1:~/yaml# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
dir1 5Gi RWO Retain Available manual 22h
dir2 5Gi RWO Retain Available 22h
2. PersistentVolumeClaim (PVC)
A PVC expresses a user's request for storage. A claim consumes PV resources, provided its access mode (ReadWriteOnce, ReadOnlyMany, or ReadWriteMany) and StorageClass match the PV's; alternatively, the volumeName field can name the PV to bind explicitly:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc1
spec:
storageClassName: auto
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc2
spec:
storageClassName: manual
accessModes:
- ReadWriteMany
resources:
requests:
storage: 5Gi
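The volumeName route mentioned above skips class-based matching by naming the target PV directly. A sketch, assuming the no-class PV dir2 from the previous example (the claim name is illustrative):

```yaml
# PVC pinned to one specific PV via volumeName.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-pinned          # illustrative name
spec:
  volumeName: dir2          # bind exactly this PV
  storageClassName: ""      # empty string: match PVs that have no class
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```

The accessModes and requested capacity must still be compatible with the named PV.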
3. StorageClass (SC)
Each PV may belong to a class, specified by setting its storageClassName attribute to the name of a StorageClass. A PV of a given class can only be bound to PVCs requesting that class, and a PV with no storageClassName can only be bound to PVCs that request no particular class. So the SC is not a mandatory part of PVC/PV binding; its main purpose is dynamic provisioning, where the class is passed to a PV provisioner so that PVs are created automatically.
Earlier, Kubernetes used the annotation volume.beta.kubernetes.io/storage-class instead of the storageClassName attribute. The annotation still works, but it will be fully deprecated in a future Kubernetes release.
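The StorageClass object itself is small. A minimal sketch (the class name and provisioner are placeholders; real provisioners appear in the dynamic-provisioning examples later in this article):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: manual-demo             # illustrative name
provisioner: example.com/nfs    # placeholder provisioner
reclaimPolicy: Retain           # keep the PV after its PVC is deleted
volumeBindingMode: Immediate    # provision/bind as soon as the PVC is created
```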
4. PVC/PV binding conditions
The PVCs created from the YAML above cannot bind any PV, confirming that a PVC will not bind a PV if either storageClassName or accessModes differ:
root@master1:~/yaml# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pvc1 Pending auto 33s
pvc2 Pending manual 33s
root@master1:~/yaml# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
dir1 5Gi RWO Retain Available manual 22h
dir2 5Gi RWO Retain Available 22h
Capacity, however, is not a strict matching constraint. Change the YAML above as follows:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc1
spec:
storageClassName: manual
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 3Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc2
spec:
storageClassName: manual
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 3Gi
Now the claims bind: a PVC chooses the PV whose capacity is closest to its request (the requested capacity must be less than or equal to the PV's capacity). Class matching still applies, including the empty case: a PVC with no SC can only bind a PV with no SC, so in this example pvc2 (class manual) still cannot bind dir2 (no class).
The SC name is user-defined, but accessModes must match what the backend storage supports. There are four access modes, and a volume can be mounted with only one access mode at a time, even if it supports several:
- ReadWriteOnce: the volume can be mounted read-write by a single node. ReadWriteOnce still allows multiple Pods on that same node to access the volume.
- ReadOnlyMany: the volume can be mounted read-only by many nodes.
- ReadWriteMany: the volume can be mounted read-write by many nodes.
- ReadWriteOncePod: the volume can be mounted read-write by a single Pod. Use this mode to ensure that only one Pod in the whole cluster can read or write the PVC. It is only supported for CSI volumes and requires Kubernetes 1.22 or later.
| Volume plugin | ReadWriteOnce | ReadOnlyMany | ReadWriteMany | ReadWriteOncePod |
| --- | --- | --- | --- | --- |
| AzureFile | ✓ | ✓ | ✓ | - |
| CephFS | ✓ | ✓ | ✓ | - |
| CSI | depends on the driver | depends on the driver | depends on the driver | depends on the driver |
| FC | ✓ | ✓ | - | - |
| FlexVolume | ✓ | ✓ | depends on the driver | - |
| GCEPersistentDisk | ✓ | ✓ | - | - |
| Glusterfs | ✓ | ✓ | ✓ | - |
| HostPath | ✓ | - | - | - |
| iSCSI | ✓ | ✓ | - | - |
| NFS | ✓ | ✓ | ✓ | - |
| RBD | ✓ | ✓ | - | - |
| VsphereVolume | ✓ | - | - (works when Pods are on the same node) | - |
| PortworxVolume | ✓ | - | ✓ | - |
IV. Static and Dynamic PV Provisioning
1. Static PV provisioning
Following the consumer/provider model, adapt the NFS volume example's YAML slightly: end users creating Pods then need no knowledge of the backend storage and only reference a PVC according to the storage administrator's conventions. Note that the Pod's configuration names a PVC, not a PV; to the Pod, the PVC is simply a volume in the same namespace as the Pod (the namespace is set in metadata):
apiVersion: v1
kind: Pod
metadata:
name: pv-pod
spec:
containers:
- name: redis
image: redis
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: "/data/redis"
name: pv-storage
volumes:
- name: pv-storage
persistentVolumeClaim:
claimName: pv-claim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pv-claim
spec:
storageClassName: manual
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 3Gi
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-volume
spec:
storageClassName: manual
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
nfs:
path: /nfs/dir1
server: 172.16.100.30
After applying it, the PV, PVC, and Pod are all created normally:
root@master1:~/yaml# kubectl apply -f pv.yaml
pod/pv-pod created
persistentvolumeclaim/pv-claim created
persistentvolume/pv-volume created
root@master1:~/yaml# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv-volume 10Gi RWO Retain Bound default/pv-claim manual 15s
root@master1:~/yaml# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pv-claim Bound pv-volume 10Gi RWO manual 20s
root@master1:~/yaml# kubectl get pod
NAME READY STATUS RESTARTS AGE
pv-pod 1/1 Running 0 30s
2. Dynamic PV provisioning
Static provisioning requires the administrator to prepare the PVCs and PVs ahead of time, with the PVC referenced directly at Pod creation. That only suits stable setups that change rarely, and it wastes some space (a PVC binds the closest-capacity PV rather than an exact match). The solution is dynamic provisioning, which creates PVs automatically: the administrator deploys a PV provisioner and defines a corresponding StorageClass, and developers simply choose the storage class they need when creating a PVC. The steps are:
- The Pod's YAML references a PVC.
- The PVC selects a StorageClass via storageClassName.
- The StorageClass is passed to the PV provisioner, which creates the PV automatically.
- The PV and PVC are then bound.
- The persistent volume is then available to the Pod.
With a StorageClass, creating the PVC also creates the PV and binds the two. This removes the work of creating and maintaining PVs by hand; as a user, you only specify storageClassName when declaring the PVC, which greatly reduces the workload.
A key plugin was mentioned above: the PV provisioner. Provisioners come in two kinds, in-tree and out-of-tree; from v1.28 on, RBD dynamic provisioning requires the out-of-tree CSI approach.
Also, when Kubernetes dynamically creates Ceph storage through a StorageClass, controller-manager needs the rbd command to talk to the Ceph cluster, but the default controller-manager image does not include a Ceph rbd client. Kubernetes recommends using an external provisioner to solve this. Note that, unlike with static provisioning, the master nodes also need the ceph-common package installed.
a. Trying kubernetes.io/rbd
Ceph clusters enable cephx authentication by default; for simplicity, this demo creates the Secret resource from Ceph's admin account.
First, fetch the admin user's key from Ceph. Kubernetes Secret data is base64-encoded (base64 is an encoding, not encryption), so encode the key accordingly:
[root@ceph1 ~]# ceph auth get-key client.admin | base64
QVFDSktLSmtuSTk1QnhBQUtBb2VUd2dGUzZvWnI4TUJERG1CMWc9PQ==
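Base64 here is a reversible encoding, and the Secret value can be produced with standard tools. Using the key from the output above:

```shell
# Encode the admin key the way a Secret's data field expects it.
key='AQCJKKJknI95BxAAKAoeTwgFS6oZr8MBDDmB1g=='
encoded=$(printf '%s' "$key" | base64 | tr -d '\n')
echo "$encoded"   # QVFDSktLSmtuSTk1QnhBQUtBb2VUd2dGUzZvWnI4TUJERG1CMWc9PQ==

# Decoding restores the original key, showing nothing is actually hidden.
printf '%s' "$encoded" | base64 -d
```

In practice, `kubectl create secret generic` performs this encoding automatically.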
Next, create the Secret resource from the admin key:
apiVersion: v1
kind: Secret
metadata:
name: kube-secret-admin
type: "kubernetes.io/rbd"
data:
key: QVFDSktLSmtuSTk1QnhBQUtBb2VUd2dGUzZvWnI4TUJERG1CMWc9PQ==
root@master1:~# kubectl apply -f kube-secret-admin.yaml
secret/kube-secret-admin created
root@master1:~# kubectl get secrets
NAME TYPE DATA AGE
default-token-p67nm kubernetes.io/service-account-token 3 120d
kube-secret-admin kubernetes.io/rbd 1 11s
Then create a StorageClass resource named kube:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: kube
provisioner: kubernetes.io/rbd
parameters:
monitors: 10.0.0.1:6789,10.0.0.2:6789,10.0.0.3:6789
adminId: admin
adminSecretName: kube-secret-admin
adminSecretNamespace: default
pool: k8s.pool
userId: admin
userSecretName: kube-secret-admin
userSecretNamespace: default
fsType: xfs
imageFormat: "2"
imageFeatures: "layering"
root@master1:~/yaml# kubectl apply -f kube.yaml
storageclass.storage.k8s.io/kube created
root@master1:~/yaml# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
kube kubernetes.io/rbd Delete Immediate false 6s
Finally, create a PVC specifying the kube SC and check whether a PV is created and bound automatically:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-kube
spec:
storageClassName: kube
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
root@master1:~/yaml# kubectl apply -f pvc-kube.yaml
persistentvolumeclaim/pvc-kube created
root@master1:~/yaml# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pvc-kube Pending kube 10s
The PVC stays in Pending; describing it shows an error returned by persistentvolume-controller:
root@master1:~/yaml# kubectl describe persistentvolumeclaim/pvc-kube
Name: pvc-kube
Namespace: default
StorageClass: kube
Status: Pending
Volume:
Labels: <none>
Annotations: volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/rbd
volume.kubernetes.io/storage-provisioner: kubernetes.io/rbd
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 8s (x6 over 81s) persistentvolume-controller Failed to provision volume with StorageClass "kube": failed to create rbd image: executable file not found in $PATH, command output:
It turns out that the kube-controller-manager deployed by kubeadm uses an image without the ceph-common package, so the rbd subcommand is unavailable (a binary deployment would not hit this problem). This confirms that the default controller-manager image does not integrate a Ceph rbd client.
b. Trying ceph.com/rbd
step1: as officially recommended, deploy an external rbd-provisioner to provide the service:
root@master1:~/yaml# vim ceph-rbd-provisioner.yml
---
kind: ServiceAccount
apiVersion: v1
metadata:
name: rbd-provisioner
namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: rbd-provisioner
namespace: kube-system
rules:
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "update", "patch"]
- apiGroups: [""]
resources: ["services"]
resourceNames: ["kube-dns","coredns"]
verbs: ["list", "get"]
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: rbd-provisioner
namespace: kube-system
subjects:
- kind: ServiceAccount
name: rbd-provisioner
namespace: kube-system
roleRef:
kind: ClusterRole
name: rbd-provisioner
apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: rbd-provisioner
namespace: kube-system
rules:
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get"]
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: rbd-provisioner
namespace: kube-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: rbd-provisioner
subjects:
- kind: ServiceAccount
name: rbd-provisioner
namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: rbd-provisioner
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
app: rbd-provisioner
strategy:
type: Recreate
template:
metadata:
labels:
app: rbd-provisioner
spec:
containers:
- name: rbd-provisioner
image: "quay.io/external_storage/rbd-provisioner:latest"
env:
- name: PROVISIONER_NAME
value: ceph.com/rbd
serviceAccount: rbd-provisioner
root@master1:~/yaml# kubectl apply -f ceph-rbd-provisioner.yml
serviceaccount/rbd-provisioner created
clusterrole.rbac.authorization.k8s.io/rbd-provisioner created
clusterrolebinding.rbac.authorization.k8s.io/rbd-provisioner created
role.rbac.authorization.k8s.io/rbd-provisioner created
rolebinding.rbac.authorization.k8s.io/rbd-provisioner created
deployment.apps/rbd-provisioner created
root@master1:~/yaml# kubectl get pods -l app=rbd-provisioner -n kube-system
NAME READY STATUS RESTARTS AGE
rbd-provisioner-7dd6d8f459-mkkdl 1/1 Running 0 41s
step2: fetch the Ceph admin user's key and create it as a Secret resource in the Kubernetes cluster:
[root@ceph1 ~]# ceph auth get-key client.admin | base64
QVFDSktLSmtuSTk1QnhBQUtBb2VUd2dGUzZvWnI4TUJERG1CMWc9PQ==
root@master1:~/yaml# vim ceph-admin-secret.yaml
---
apiVersion: v1
kind: Secret
metadata:
name: ceph-admin-secret
namespace: kube-system
data:
key: QVFDSktLSmtuSTk1QnhBQUtBb2VUd2dGUzZvWnI4TUJERG1CMWc9PQ==
type:
kubernetes.io/rbd
root@master1:~/yaml# kubectl apply -f ceph-admin-secret.yaml
secret/ceph-admin-secret created
root@master1:~/yaml# kubectl get secrets ceph-admin-secret -n kube-system
NAME TYPE DATA AGE
ceph-admin-secret kubernetes.io/rbd 1 13s
step3: on the Ceph cluster, create the pool and client key that Kubernetes will use:
[root@ceph1 ~]# ceph auth add client.kube mon 'allow r' osd 'allow rwx pool=k8s'
[root@ceph1 ~]# rbd pool init k8s
[root@ceph1 ~]# ceph osd pool ls
k8s
[root@ceph1 ~]# ceph auth get-key client.kube | base64
QVFCRUlmUmtlNVhNT0JBQVM1VWN0SEJDN0JQVGR3SWZCTnBOU1E9PQ==
root@master1:~/yaml# vim ceph-k8s-secret.yaml
---
apiVersion: v1
kind: Secret
metadata:
name: ceph-k8s-secret
namespace: default
data:
key: QVFCRUlmUmtlNVhNT0JBQVM1VWN0SEJDN0JQVGR3SWZCTnBOU1E9PQ==
type:
kubernetes.io/rbd
root@master1:~/yaml# kubectl apply -f ceph-k8s-secret.yaml
secret/ceph-k8s-secret created
root@master1:~/yaml# kubectl get secrets
NAME TYPE DATA AGE
ceph-k8s-secret kubernetes.io/rbd 1 41s
step4: create the SC:
root@master1:~/yaml# vim ceph-rbd-sc.yaml
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: ceph-rbd
provisioner: ceph.com/rbd
parameters:
monitors: 10.0.0.1:6789, 10.0.0.2:6789, 10.0.0.3:6789
pool: k8s
adminId: admin
adminSecretNamespace: kube-system
adminSecretName: ceph-admin-secret
userId: kube
userSecretNamespace: default
userSecretName: ceph-k8s-secret
imageFormat: "2"
imageFeatures: layering
root@master1:~/yaml# kubectl apply -f ceph-rbd-sc.yaml
storageclass.storage.k8s.io/ceph-rbd created
root@master1:~/yaml# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
ceph-rbd ceph.com/rbd Delete Immediate false 8s
step5: create the PVC; it is again stuck in Pending:
root@master1:~/yaml# vim ceph-rbd-claim.yml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: ceph-rbd-claim1
spec:
accessModes:
- ReadWriteOnce
storageClassName: ceph-rbd
resources:
requests:
storage: 1Gi
root@master1:~/yaml# kubectl apply -f ceph-rbd-claim.yml
persistentvolumeclaim/ceph-rbd-claim1 created
root@master1:~/yaml# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ceph-rbd-claim1 Pending ceph-rbd 71m
Describing it shows the controller waiting for the volume to be created:
root@master1:~/yaml# kubectl describe persistentvolumeclaim/ceph-rbd-claim1
Name: ceph-rbd-claim1
Namespace: default
StorageClass: ceph-rbd
Status: Pending
Volume:
Labels: <none>
Annotations: volume.beta.kubernetes.io/storage-provisioner: ceph.com/rbd
volume.kubernetes.io/storage-provisioner: ceph.com/rbd
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 63s (x283 over 71m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "ceph.com/rbd" or manually created by system administrator
The provisioner logs point to selfLink as the cause: Kubernetes disabled selfLink starting with version 1.20:
root@master1:~/yaml# kubectl logs -f rbd-provisioner-7dd6d8f459-mkkdl -n kube-system
I0903 09:37:28.060086 1 controller.go:987] provision "default/ceph-rbd-claim1" class "ceph-rbd": started
E0903 09:37:28.072152 1 controller.go:1004] provision "default/ceph-rbd-claim1" class "ceph-rbd": unexpected error getting claim reference: selfLink was empty, can't make reference
Edit /etc/kubernetes/manifests/kube-apiserver.yaml to re-enable that behavior:
spec:
containers:
- command:
...
...
- kube-apiserver
- --feature-gates=RemoveSelfLink=false
After restarting the apiserver, a new error appears: the rbd-provisioner container cannot find the ceph.conf configuration file:
root@master1:/etc/kubernetes/manifests# kubectl apply -f kube-apiserver.yaml
root@master1:/etc/kubernetes/manifests# kubectl logs -f rbd-provisioner-7dd6d8f459-mkkdl -n kube-system
2023-09-03 11:10:19.378 7f36a6273900 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2023-09-03 11:10:19.378 7f36a6273900 -1 parse_file: cannot open /root/.ceph/ceph.conf: (2) No such file or directory
Enter the rbd-provisioner container and add the ceph.conf and keyring files manually:
root@master1:~# kubectl exec -it rbd-provisioner-7dd6d8f459-mkkdl -n kube-system -- /bin/sh
sh-4.2# cd /etc/ceph/
sh-4.2# ls
ceph.client.admin.keyring ceph.conf rbdmap
The error message then changes to the following:
root@master1:/etc/kubernetes/manifests# kubectl logs -f rbd-provisioner-7dd6d8f459-mkkdl -n kube-system
2023-09-03 11:59:12.666 7f605026f900 -1 monclient: get_monmap_and_config failed to get config
rbd: couldn't connect to the cluster!
Research suggests this is caused by a version mismatch between the Ceph cluster and rbd-provisioner:
sh-4.2# ceph -v
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
[root@ceph1 ceph]# ceph -v
ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
Replace the yum repository inside rbd-provisioner and upgrade its Ceph version:
sh-4.2# ceph -v
ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
sh-4.2# ceph -s
cluster:
id: 7629fef6-1943-11ee-9b2d-000c29dbfa5e
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph1,ceph3,ceph2 (age 4w)
mgr: ceph1.xxracu(active, since 4w), standbys: ceph2.zhftxe, ceph3.wcysbm
mds: 3 up:standby
osd: 10 osds: 10 up (since 3w), 10 in (since 3w)
rgw: 2 daemons active (myorg.us-east-1.ceph1.bslmbn, myorg.us-east-1.ceph2.baypns)
task status:
data:
pools: 11 pools, 457 pgs
objects: 288 objects, 21 MiB
usage: 11 GiB used, 489 GiB / 500 GiB avail
pgs: 457 active+clean
With rbd-provisioner no longer reporting errors, the PVC now has an automatically created PV and is Bound:
root@master1:~# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ceph-rbd-claim1 Bound pvc-bbf9d2a5-e870-4901-94f9-63e0c5e3fc83 1Gi RWO ceph-rbd 171m
step6: finally, create a new Pod using dynamic PV provisioning to test the result:
root@master1:~/yaml# vim dynamic-pv-pod.yaml
---
kind: Pod
apiVersion: v1
metadata:
name: rbd-test-pod
spec:
containers:
- name: redis
image: redis
imagePullPolicy: IfNotPresent
volumeMounts:
- name: pvc
mountPath: "/data/redis"
volumes:
- name: pvc
persistentVolumeClaim:
claimName: ceph-rbd-claim2
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: ceph-rbd-claim2
spec:
accessModes:
- ReadWriteOnce
storageClassName: ceph-rbd
resources:
requests:
storage: 10Gi
root@master1:~/yaml# kubectl apply -f dynamic-pv-pod.yaml
pod/rbd-test-pod created
persistentvolumeclaim/ceph-rbd-claim2 created
root@master1:~/yaml# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-bbf9d2a5-e870-4901-94f9-63e0c5e3fc83 1Gi RWO Delete Bound default/ceph-rbd-claim1 ceph-rbd 19m
pvc-d27e1b6d-b519-4d10-ae8d-21fa0c69a85b 10Gi RWO Delete Bound default/ceph-rbd-claim2 ceph-rbd 25s
root@master1:~/yaml# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ceph-rbd-claim1 Bound pvc-bbf9d2a5-e870-4901-94f9-63e0c5e3fc83 1Gi RWO ceph-rbd 3h6m
ceph-rbd-claim2 Bound pvc-d27e1b6d-b519-4d10-ae8d-21fa0c69a85b 10Gi RWO ceph-rbd 43s
root@master1:~/yaml# kubectl get pod
NAME READY STATUS RESTARTS AGE
rbd-test-pod 1/1 Running 0 58s
root@master1:~# kubectl exec -it pods/rbd-test-pod -- lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 40.8M 1 loop
loop1 7:1 0 63.4M 1 loop
loop2 7:2 0 111.9M 1 loop
loop4 7:4 0 53.3M 1 loop
loop5 7:5 0 79.9M 1 loop
loop6 7:6 0 63.5M 1 loop
sda 8:0 0 100G 0 disk
|-sda1 8:1 0 1M 0 part
|-sda2 8:2 0 2G 0 part
`-sda3 8:3 0 98G 0 part /etc/hosts
sr0 11:0 1 1024M 0 rom
rbd0 252:0 0 10G 0 disk /data/redis