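Installing the monitoring components
First, log in to the Huawei Cloud CCE cluster console. Under "Add-on Management", search for and install the two add-ons "prometheus" and "metrics-server".
After installation, run kubectl get po -n monitoring to check the status of the components.
Configuring the custom-metrics-apiserver service
This add-on exposes user-defined metrics to the cluster so that HPA policies can be created from them. For a self-built Kubernetes cluster, see https://github.com/kubernetes-sigs/custom-metrics-apiserver/tree/master for a custom installation. This article is based on a Huawei Cloud Kubernetes cluster, so the installation steps are not covered in detail.
Configuration file: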
kind: Deployment
apiVersion: apps/v1
metadata:
  name: custom-metrics-apiserver
  namespace: monitoring
  uid: 8f51ef30-4a2b-45cc-a4b3-5a694aeeff8c
  resourceVersion: '356797141'
  generation: 2
  creationTimestamp: '2023-07-31T05:23:47Z'
  labels:
    app: custom-metrics-apiserver
    app.kubernetes.io/managed-by: Helm
    release: cceaddon-prometheus
  annotations:
    deployment.kubernetes.io/revision: '2'
    meta.helm.sh/release-name: cceaddon-prometheus
    meta.helm.sh/release-namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: custom-metrics-apiserver
  template:
    metadata:
      name: custom-metrics-apiserver
      creationTimestamp: null
      labels:
        app: custom-metrics-apiserver
        release: cceaddon-prometheus
      annotations:
        seccomp.security.alpha.kubernetes.io/pod: runtime/default
    spec:
      volumes:
        - name: config
          configMap:
            name: adapter-config
            defaultMode: 420
        - name: tmp-vol
          emptyDir: {}
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
      containers:
        - name: custom-metrics-apiserver
          image: swr.cn-east-3.myhuaweicloud.com/hwofficial/k8s-prometheus-adapter:2.23.32
          args:
            - '--secure-port=6443'
            - '--cert-dir=/tmp/cert'
            - '--logtostderr=true'
            - '--prometheus-url=http://172.18.0.170:9090' # The author runs a self-hosted Prometheus, so this points to its address
            - '--metrics-relist-interval=1m'
            - '--config=/etc/adapter/config.yaml'
            - '--tls-min-version=VersionTLS12'
            - '--tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384'
            - '--bind-address=$(POD_IP)'
          ports:
            - containerPort: 6443
              protocol: TCP
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
          resources:
            limits:
              cpu: '1'
              memory: 500Mi
            requests:
              cpu: 200m
              memory: 200Mi
          volumeMounts:
            - name: config
              readOnly: true
              mountPath: /etc/adapter/
            - name: tmp-vol
              mountPath: /tmp
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: custom-metrics-apiserver
      serviceAccount: custom-metrics-apiserver
      securityContext: {}
      imagePullSecrets:
        - name: default-secret
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - custom-metrics-apiserver
              topologyKey: kubernetes.io/hostname
      schedulerName: default-scheduler
      priorityClassName: system-node-critical
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
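The Deployment mounts a ConfigMap named adapter-config and reads /etc/adapter/config.yaml from it, but the contents of that ConfigMap are not shown in this article. The following is only a sketch of what an adapter rule and a matching HPA might look like. The metric http_requests_total, its namespace and pod labels, the workload demo-app, and the replica and target numbers are all assumptions made for illustration; the HPA also assumes the cluster serves autoscaling/v2. Because the add-on is managed by Helm, manual changes to adapter-config may be overwritten when the add-on is upgraded.

```yaml
# Sketch only: metric name, labels, and workload names below are hypothetical.
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config          # name referenced by the Deployment's "config" volume
  namespace: monitoring
data:
  config.yaml: |
    rules:
      # Discover series that carry both a namespace and a pod label.
      - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        # Expose the counter as a per-second rate under a new name.
        name:
          matches: "^(.*)_total$"
          as: "${1}_per_second"
        metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app-hpa            # hypothetical
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app              # hypothetical workload
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # name produced by the rule above
        target:
          type: AverageValue
          averageValue: "10"
```

If a rule like this is picked up, the metric should become visible through the custom metrics API (custom.metrics.k8s.io) after the adapter's next relist, which the Deployment sets to one minute with --metrics-relist-interval=1m.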
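Configuring the prometheus-kube-state-metrics service
Overview: prometheus-kube-state-metrics is a tool for monitoring the state of resources in a Kubernetes cluster. It collects data from the Kubernetes API server and converts it into the Prometheus format so that Prometheus can scrape and analyze it.
Configuration file: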
kind: Deployment
apiVersion: apps/v1
metadata:
  name: cceaddon-prometheus-kube-state-metrics
  namespace: monitoring
  uid: 59cf2c57-9553-4943-9d53-d34549829b53
  resourceVersion: '353929528'
  generation: 1
  creationTimestamp: '2023-07-31T05:23:47Z'
  labels:
    app: prometheus
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-2.23.32
    component: kube-state-metrics
    heritage: Helm
    release: cceaddon-prometheus
  annotations:
    deployment.kubernetes.io/revision: '1'
    meta.helm.sh/release-name: cceaddon-prometheus
    meta.helm.sh/release-namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: kube-state-metrics
      release: cceaddon-prometheus
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: prometheus
        chart: prometheus-2.23.32
        component: kube-state-metrics
        heritage: Helm
        release: cceaddon-prometheus
      annotations:
        prometheus.io/port: '8080'
        prometheus.io/scrape: 'true'
        seccomp.security.alpha.kubernetes.io/pod: runtime/default
    spec:
      volumes:
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
      containers:
        - name: prometheus-kube-state-metrics
          image: swr.cn-east-3.myhuaweicloud.com/hwofficial/kube-state-metrics:2.23.32
          command:
            - /bin/sh
            - '-c'
            - /kube-state-metrics --host=${POD_IP} --port=8080 --telemetry-host=${POD_IP} --telemetry-port=8081
          ports:
            - name: metrics
              containerPort: 8080
              protocol: TCP
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
          resources:
            limits:
              cpu: '2'
              memory: 3Gi
            requests:
              cpu: 100m
              memory: 128Mi
          volumeMounts:
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: cceaddon-prometheus-kube-state-metrics
      serviceAccount: cceaddon-prometheus-kube-state-metrics
      securityContext: {}
      imagePullSecrets:
        - name: default-secret
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: component
                    operator: In
                    values:
                      - kube-state-metrics
              topologyKey: kubernetes.io/hostname
      schedulerName: default-scheduler
      priorityClassName: system-node-critical
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
Create the Service:
metadata:
  name: kube-stat-metirc
  namespace: monitoring
  uid: 804ca6ff-1d9a-4746-916a-68866ff86346
  resourceVersion: '346490466'
  creationTimestamp: '2023-07-31T05:54:37Z'
  labels:
    app: prometheus
    component: kube-state-metrics
  annotations:
    kubernetes.io/elb.class: union
    kubernetes.io/elb.health-check-flag: 'off'
    kubernetes.io/elb.id: 2b52fa5a-fb48-46b0-a7d8-995764921d26
    kubernetes.io/elb.lb-algorithm: ROUND_ROBIN
    kubernetes.io/elb.mark: '0'
  finalizers:
    - service.kubernetes.io/load-balancer-cleanup
spec:
  ports:
    - name: cce-service-0
      protocol: TCP
      port: 8080
      targetPort: 8080
      nodePort: 31404
  selector:
    app: prometheus
    component: kube-state-metrics
    release: cceaddon-prometheus
  clusterIP: 10.247.54.237
  clusterIPs:
    - 10.247.54.237
  type: LoadBalancer
  sessionAffinity: None
  loadBalancerIP: 123.60.54.238
  externalTrafficPolicy: Cluster
status:
  loadBalancer:
    ingress:
      - ip: 123.60.54.238
      - ip: 192.168.0.68
apiVersion: v1
kind: Service
Port 8080 must be exposed outside the cluster so that the self-hosted Prometheus can reach it.
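The Service shown here is an export of a live object, so it carries fields the API server fills in on its own (uid, resourceVersion, clusterIP, nodePort, status, and so on). A trimmed manifest that could be applied to create an equivalent Service might look like the sketch below. It assumes an existing ELB instance in your own account; the placeholder in kubernetes.io/elb.id must be replaced, and the remaining annotations are copied from the export above rather than verified against current CCE documentation.

```yaml
# Sketch only: replace the ELB ID placeholder with an instance from your account.
apiVersion: v1
kind: Service
metadata:
  name: kube-stat-metirc        # name kept exactly as in the export above
  namespace: monitoring
  labels:
    app: prometheus
    component: kube-state-metrics
  annotations:
    kubernetes.io/elb.class: union
    kubernetes.io/elb.id: <your-elb-id>     # placeholder, not a real ID
    kubernetes.io/elb.lb-algorithm: ROUND_ROBIN
    kubernetes.io/elb.health-check-flag: 'off'
spec:
  type: LoadBalancer
  selector:
    app: prometheus
    component: kube-state-metrics
    release: cceaddon-prometheus
  ports:
    - name: cce-service-0
      protocol: TCP
      port: 8080          # port the self-hosted Prometheus scrapes
      targetPort: 8080    # kube-state-metrics container port
```

The scrape job later in this article targets the private address 192.168.0.68:8080, so a private-network load balancer may be enough if Prometheus runs in the same VPC.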
Installing the self-hosted Prometheus service
Follow the official Prometheus documentation for installation.
Configuration file in detail:
- job_name: 'k8s-prod'
  metrics_path: metrics
  static_configs:
    - targets: ['192.168.0.68:8080']
- job_name: k8s-prod-custom-metrics
  kubernetes_sd_configs:
    - role: pod
      api_server: https://192.168.0.152:5443
      bearer_token_file: /opt/module/prometheus-2.45.0.linux-amd64/k8s/token_k8s-01_custom
      tls_config:
        insecure_skip_verify: true
  scrape_interval: 60s
  scheme: https
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /opt/module/prometheus-2.45.0.linux-amd64/k8s/token_k8s-01_custom
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape, __meta_kubernetes_namespace]
      action: keep
      regex: true;default
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: (\d+)
      replacement: $1
      target_label: __meta_kubernetes_pod_container_port_number
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name, __meta_kubernetes_pod_container_port_number]
      action: replace
      regex: (.+);(.+);(.+)
      replacement: api/v1/namespaces/$1/pods/$2:$3/proxy/metrics
      target_label: __metrics_path__
    - source_labels: [__metrics_path__, __meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      regex: (.+)/metrics;/?(.+)
      replacement: $1/$2
      target_label: __metrics_path__
    - source_labels: []
      action: replace
      regex: ""
      replacement: 192.168.0.152:5443
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: pod
- job_name: k8s-prod-cadvisor
  kubernetes_sd_configs:
    - role: node
      api_server: https://192.168.0.152:5443
      bearer_token_file: /opt/module/prometheus-2.45.0.linux-amd64/k8s/token_k8s-01_custom # API server credentials
      tls_config:
        insecure_skip_verify: true
  scrape_interval: 60s
  scheme: https
  tls_config:
    #insecure_skip_verify: false
    insecure_skip_verify: true
    #server_name: kubernetes
    #ca_file: /opt/prometheus/k8s/k8s-01.crt
  bearer_token_file: /opt/module/prometheus-2.45.0.linux-amd64/k8s/token_k8s-01_custom
  relabel_configs:
    - target_label: __address__
      replacement: 192.168.0.152:5443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      target_label: __metrics_path__
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: service_name
- job_name: k8s-prod-node
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
    - role: node
      api_server: https://192.168.0.152:5443
      bearer_token_file: /opt/module/prometheus-2.45.0.linux-amd64/k8s/token_k8s-01_custom
      tls_config:
        insecure_skip_verify: true
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.*)
    - action: replace
      regex: (.*):(.*)
      source_labels: ["__address__"]
      target_label: __address__
      replacement: ${1}:9100
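Building the Grafana dashboard
Only part of the dashboard is shown here; templates can be downloaded from the Grafana website.
To read metrics from inside the cluster, a Role and RoleBinding are also needed to grant the required permissions; see the kube-state-metrics documentation on GitHub for reference. The author took a shortcut and simply granted cluster-admin to the account, using the following command: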
kubectl create clusterrolebinding --user system:serviceaccount:default:default kube-system-cluster-admin --clusterrole cluster-admin
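Binding cluster-admin works, but it grants far more than scraping needs. A narrower alternative is sketched below. It assumes Prometheus authenticates as the default ServiceAccount in the default namespace, as in the command above, and that it only needs service discovery plus the API server proxy paths used by the scrape jobs in this article. The names prometheus-scrape and prometheus-scrape-token are hypothetical, and the rule set has not been tested against CCE.

```yaml
# Sketch only: object names are hypothetical; adjust the subject to your account.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-scrape
rules:
  # Needed by kubernetes_sd_configs (role: pod / role: node) to discover targets.
  - apiGroups: [""]
    resources: ["nodes", "pods", "services", "endpoints"]
    verbs: ["get", "list", "watch"]
  # Needed to scrape through /api/v1/nodes/<node>/proxy/... and
  # /api/v1/namespaces/<ns>/pods/<pod>:<port>/proxy/...
  - apiGroups: [""]
    resources: ["nodes/proxy", "nodes/metrics", "pods/proxy"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-scrape
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-scrape
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default
---
# Long-lived token for the ServiceAccount; its decoded "token" field is what
# would be written to the file referenced by bearer_token_file.
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-scrape-token
  namespace: default
  annotations:
    kubernetes.io/service-account.name: default
type: kubernetes.io/service-account-token
```

A dedicated ServiceAccount for Prometheus, instead of default:default, would keep these permissions away from other workloads in the default namespace.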
The next article will cover how Java services report custom metrics and how to monitor them.