Monitoring a k8s Cluster on Huawei Cloud CCE
  mU7QSV7oimBB  November 2, 2023

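Installing the monitoring components

First, log in to the Huawei Cloud CCE cluster console. Under "Add-on Management", search for and install the two add-ons "prometheus" and "metrics-server".

Once installation finishes, run kubectl get po -n monitoring to check the status of the installed components.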


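Configuring the custom-metrics-apiserver service

This add-on collects user-defined metrics and lets you create HPA policies based on them. For a self-managed k8s cluster, see https://github.com/kubernetes-sigs/custom-metrics-apiserver/tree/master for a custom installation. This article is based on a Huawei Cloud k8s cluster, so the installation steps are not covered in detail. Configuration file: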

kind: Deployment
apiVersion: apps/v1
metadata:
  name: custom-metrics-apiserver
  namespace: monitoring
  uid: 8f51ef30-4a2b-45cc-a4b3-5a694aeeff8c
  resourceVersion: '356797141'
  generation: 2
  creationTimestamp: '2023-07-31T05:23:47Z'
  labels:
    app: custom-metrics-apiserver
    app.kubernetes.io/managed-by: Helm
    release: cceaddon-prometheus
  annotations:
    deployment.kubernetes.io/revision: '2'
    meta.helm.sh/release-name: cceaddon-prometheus
    meta.helm.sh/release-namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: custom-metrics-apiserver
  template:
    metadata:
      name: custom-metrics-apiserver
      creationTimestamp: null
      labels:
        app: custom-metrics-apiserver
        release: cceaddon-prometheus
      annotations:
        seccomp.security.alpha.kubernetes.io/pod: runtime/default
    spec:
      volumes:
        - name: config
          configMap:
            name: adapter-config
            defaultMode: 420
        - name: tmp-vol
          emptyDir: {}
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
      containers:
        - name: custom-metrics-apiserver
          image: swr.cn-east-3.myhuaweicloud.com/hwofficial/k8s-prometheus-adapter:2.23.32
          args:
            - '--secure-port=6443'
            - '--cert-dir=/tmp/cert'
            - '--logtostderr=true'
            - '--prometheus-url=http://172.18.0.170:9090' # the author runs a self-hosted Prometheus, so this is changed to its address
            - '--metrics-relist-interval=1m'
            - '--config=/etc/adapter/config.yaml'
            - '--tls-min-version=VersionTLS12'
            - '--tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384'
            - '--bind-address=$(POD_IP)'
          ports:
            - containerPort: 6443
              protocol: TCP
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
          resources:
            limits:
              cpu: '1'
              memory: 500Mi
            requests:
              cpu: 200m
              memory: 200Mi
          volumeMounts:
            - name: config
              readOnly: true
              mountPath: /etc/adapter/
            - name: tmp-vol
              mountPath: /tmp
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: custom-metrics-apiserver
      serviceAccount: custom-metrics-apiserver
      securityContext: {}
      imagePullSecrets:
        - name: default-secret
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - custom-metrics-apiserver
              topologyKey: kubernetes.io/hostname
      schedulerName: default-scheduler
      priorityClassName: system-node-critical
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
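
The Deployment above mounts a ConfigMap named adapter-config at /etc/adapter/ and reads config.yaml from it; that file holds the rules that map Prometheus series to custom metrics. As a rough sketch of how such a rule and a matching HPA might look, assuming a hypothetical application metric http_requests_total, a hypothetical Deployment demo-app in the default namespace, and a cluster that serves autoscaling/v2 (none of these come from the original setup):

```yaml
# Sketch only: the metric name, thresholds, and demo-app are hypothetical.
# The add-on's Helm release may manage this ConfigMap, so merge the rule
# into the existing config rather than applying this file blindly.
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config          # name taken from the Deployment's volume
  namespace: monitoring
data:
  config.yaml: |
    rules:
      # Expose http_requests_total as a per-second rate per pod.
      - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)_total$"
          as: "${1}_per_second"
        metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app-hpa            # hypothetical
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app              # hypothetical workload
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"   # scale out above ~100 req/s per pod
```

If the adapter picks up the rule, the metric should be listed by kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1. The rule only works when the series in Prometheus carry namespace and pod labels.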

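Configuring the prometheus-kube-state-metrics service

Overview: prometheus-kube-state-metrics monitors the state of resources in a Kubernetes cluster. It collects the data exposed by the Kubernetes API server and converts it into Prometheus format, so that Prometheus can scrape and analyze the metrics. Configuration file: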

kind: Deployment
apiVersion: apps/v1
metadata:
  name: cceaddon-prometheus-kube-state-metrics
  namespace: monitoring
  uid: 59cf2c57-9553-4943-9d53-d34549829b53
  resourceVersion: '353929528'
  generation: 1
  creationTimestamp: '2023-07-31T05:23:47Z'
  labels:
    app: prometheus
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-2.23.32
    component: kube-state-metrics
    heritage: Helm
    release: cceaddon-prometheus
  annotations:
    deployment.kubernetes.io/revision: '1'
    meta.helm.sh/release-name: cceaddon-prometheus
    meta.helm.sh/release-namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: kube-state-metrics
      release: cceaddon-prometheus
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: prometheus
        chart: prometheus-2.23.32
        component: kube-state-metrics
        heritage: Helm
        release: cceaddon-prometheus
      annotations:
        prometheus.io/port: '8080'
        prometheus.io/scrape: 'true'
        seccomp.security.alpha.kubernetes.io/pod: runtime/default
    spec:
      volumes:
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
      containers:
        - name: prometheus-kube-state-metrics
          image: swr.cn-east-3.myhuaweicloud.com/hwofficial/kube-state-metrics:2.23.32
          command:
            - /bin/sh
            - '-c'
            - /kube-state-metrics --host=${POD_IP} --port=8080 --telemetry-host=${POD_IP} --telemetry-port=8081
          ports:
            - name: metrics
              containerPort: 8080
              protocol: TCP
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
          resources:
            limits:
              cpu: '2'
              memory: 3Gi
            requests:
              cpu: 100m
              memory: 128Mi
          volumeMounts:
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: cceaddon-prometheus-kube-state-metrics
      serviceAccount: cceaddon-prometheus-kube-state-metrics
      securityContext: {}
      imagePullSecrets:
        - name: default-secret
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: component
                    operator: In
                    values:
                      - kube-state-metrics
              topologyKey: kubernetes.io/hostname
      schedulerName: default-scheduler
      priorityClassName: system-node-critical
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600

Create the Service:

metadata:
  name: kube-stat-metirc
  namespace: monitoring
  uid: 804ca6ff-1d9a-4746-916a-68866ff86346
  resourceVersion: '346490466'
  creationTimestamp: '2023-07-31T05:54:37Z'
  labels:
    app: prometheus
    component: kube-state-metrics
  annotations:
    kubernetes.io/elb.class: union
    kubernetes.io/elb.health-check-flag: 'off'
    kubernetes.io/elb.id: 2b52fa5a-fb48-46b0-a7d8-995764921d26
    kubernetes.io/elb.lb-algorithm: ROUND_ROBIN
    kubernetes.io/elb.mark: '0'
  finalizers:
    - service.kubernetes.io/load-balancer-cleanup
spec:
  ports:
    - name: cce-service-0
      protocol: TCP
      port: 8080
      targetPort: 8080
      nodePort: 31404
  selector:
    app: prometheus
    component: kube-state-metrics
    release: cceaddon-prometheus
  clusterIP: 10.247.54.237
  clusterIPs:
    - 10.247.54.237
  type: LoadBalancer
  sessionAffinity: None
  loadBalancerIP: 123.60.54.238
  externalTrafficPolicy: Cluster
status:
  loadBalancer:
    ingress:
      - ip: 123.60.54.238
      - ip: 192.168.0.68
apiVersion: v1
kind: Service

Port 8080 must be exposed outside the cluster so that the self-hosted Prometheus can reach it.
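
The Service listing above is a dump of the live object, so it includes server-populated fields (uid, resourceVersion, creationTimestamp, clusterIP, status) that are not supplied when creating a Service. A trimmed sketch that could serve as a starting point, keeping the name, selector, port, and ELB annotations from the dump. The elb.id value belongs to the author's load balancer and would need replacing, and whether every remaining annotation is required on a given CCE version is an assumption:

```yaml
# Sketch: creatable version of the Service, without server-populated fields.
apiVersion: v1
kind: Service
metadata:
  name: kube-stat-metirc
  namespace: monitoring
  labels:
    app: prometheus
    component: kube-state-metrics
  annotations:
    kubernetes.io/elb.class: union
    kubernetes.io/elb.id: 2b52fa5a-fb48-46b0-a7d8-995764921d26  # replace with your own ELB ID
    kubernetes.io/elb.lb-algorithm: ROUND_ROBIN
    kubernetes.io/elb.health-check-flag: 'off'
spec:
  type: LoadBalancer
  selector:
    # Must match the pod labels of the kube-state-metrics Deployment
    app: prometheus
    component: kube-state-metrics
    release: cceaddon-prometheus
  ports:
    - name: cce-service-0
      protocol: TCP
      port: 8080
      targetPort: 8080
```

The nodePort and clusterIP are left out here so that the cluster assigns them.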


Installing the self-hosted Prometheus

Install it by following the official Prometheus documentation. The scrape configuration in detail:

scrape_configs:
  # kube-state-metrics, reached through the LoadBalancer Service on port 8080
  - job_name: 'k8s-prod'
    metrics_path: /metrics
    static_configs:
    - targets: ['192.168.0.68:8080']

  - job_name: k8s-prod-custom-metrics
    kubernetes_sd_configs:
    - role: pod
      api_server: https://192.168.0.152:5443
      bearer_token_file: /opt/module/prometheus-2.45.0.linux-amd64/k8s/token_k8s-01_custom
      tls_config:
        insecure_skip_verify: true
    scrape_interval: 60s
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /opt/module/prometheus-2.45.0.linux-amd64/k8s/token_k8s-01_custom
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape, __meta_kubernetes_namespace]
      action: keep
      regex: true;default
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: (\d+)
      replacement: $1
      target_label: __meta_kubernetes_pod_container_port_number
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name, __meta_kubernetes_pod_container_port_number]
      action: replace
      regex: (.+);(.+);(.+)
      replacement: /api/v1/namespaces/$1/pods/$2:$3/proxy/metrics
      target_label: __metrics_path__
    - source_labels: [__metrics_path__, __meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      regex: (.+)/metrics;/?(.+)
      replacement: $1/$2
      target_label: __metrics_path__
    - source_labels: []
      action: replace
      regex: ""
      replacement: 192.168.0.152:5443
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: pod

  - job_name: k8s-prod-cadvisor
    kubernetes_sd_configs:
    - role: node
      api_server: https://192.168.0.152:5443
      bearer_token_file: /opt/module/prometheus-2.45.0.linux-amd64/k8s/token_k8s-01_custom # API server credentials
      tls_config:
        insecure_skip_verify: true
    scrape_interval: 60s
    scheme: https
    tls_config:
      #insecure_skip_verify: false
      insecure_skip_verify: true
      #server_name: kubernetes
      #ca_file: /opt/prometheus/k8s/k8s-01.crt
    bearer_token_file: /opt/module/prometheus-2.45.0.linux-amd64/k8s/token_k8s-01_custom
    relabel_configs:
    - target_label: __address__
      replacement: 192.168.0.152:5443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      target_label: __metrics_path__
    # role: node discovery exposes node meta labels only (no service or namespace labels)
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)

  - job_name: k8s-prod-node
    metrics_path: /metrics
    scheme: http
    kubernetes_sd_configs:
    - role: node
      api_server: https://192.168.0.152:5443
      bearer_token_file: /opt/module/prometheus-2.45.0.linux-amd64/k8s/token_k8s-01_custom
      tls_config:
        insecure_skip_verify: true
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.*)
    - action: replace
      regex: (.*):(.*)
      source_labels: ["__address__"]
      target_label: __address__
      replacement: ${1}:9100

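
Each Kubernetes job above carries two sets of credentials: the tls_config and bearer_token_file nested under kubernetes_sd_configs are used for service discovery against the API server, while the job-level ones are used for the scrape requests themselves. All of them set insecure_skip_verify: true, which disables certificate verification. The commented-out lines in the cAdvisor job point to a stricter variant. A sketch of it, assuming the cluster CA certificate has been saved to the path given in that comment and that the API server certificate is valid for the name kubernetes (neither is verified here):

```yaml
# Sketch: cAdvisor job with certificate verification enabled.
# ca_file and server_name are taken from the commented-out lines in the
# original job; adjust both to match the actual cluster certificate.
scrape_configs:
  - job_name: k8s-prod-cadvisor
    scheme: https
    scrape_interval: 60s
    kubernetes_sd_configs:
    - role: node
      api_server: https://192.168.0.152:5443
      bearer_token_file: /opt/module/prometheus-2.45.0.linux-amd64/k8s/token_k8s-01_custom
      tls_config:                      # used for discovery requests
        ca_file: /opt/prometheus/k8s/k8s-01.crt
        server_name: kubernetes
    tls_config:                        # used for scrape requests
      ca_file: /opt/prometheus/k8s/k8s-01.crt
      server_name: kubernetes
    bearer_token_file: /opt/module/prometheus-2.45.0.linux-amd64/k8s/token_k8s-01_custom
    relabel_configs:
    # Send every scrape to the API server, which proxies to the kubelet
    - target_label: __address__
      replacement: 192.168.0.152:5443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      target_label: __metrics_path__
```

If the API server certificate already lists 192.168.0.152 as a subject alternative name, server_name can probably be dropped.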
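Building the Grafana dashboards

Only part of the dashboard is shown here; ready-made dashboard templates can be downloaded from the Grafana website.

[Screenshot: Grafana dashboard for the cluster]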

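To read metrics from inside the cluster, you also need to create a role and a role binding to grant the required permissions; see the kube-state-metrics documentation on GitHub. The author took a shortcut and bound the default service account to cluster-admin directly, which is not advisable in production. The command is:

kubectl create clusterrolebinding kube-system-cluster-admin --clusterrole=cluster-admin --serviceaccount=default:default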
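
Binding cluster-admin grants far more than scraping needs. As a narrower alternative, here is a sketch of a read-only role covering what the scrape jobs in this article touch: node and pod discovery, plus the nodes/proxy and pods/proxy subresources behind the API server proxy paths. The ServiceAccount name prometheus-external and the token Secret are hypothetical, and the exact resource list is an assumption that may need adjusting for a given cluster:

```yaml
# Sketch: least-privilege account for an external Prometheus.
# All names below are hypothetical.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-external
  namespace: monitoring
---
# Long-lived token for the ServiceAccount (populated by the control plane)
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-external-token
  namespace: monitoring
  annotations:
    kubernetes.io/service-account.name: prometheus-external
type: kubernetes.io/service-account-token
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-external-read
rules:
  # Service discovery (role: node, role: pod)
  - apiGroups: [""]
    resources: ["nodes", "pods", "services", "endpoints"]
    verbs: ["get", "list", "watch"]
  # Scrapes that go through the API server proxy
  - apiGroups: [""]
    resources: ["nodes/proxy", "nodes/metrics", "pods/proxy"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-external-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-external-read
subjects:
  - kind: ServiceAccount
    name: prometheus-external
    namespace: monitoring
```

The token could then be extracted with kubectl -n monitoring get secret prometheus-external-token -o jsonpath='{.data.token}' | base64 -d and saved to the file that bearer_token_file points to.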

The next article will cover how to report and monitor custom metrics from Java services.

Last edited on November 8, 2023
