Alert configuration file
Prometheus screenshot:
AlertManager configuration file:
global:
  ...
route:
  receiver: default
  group_by:
    - alertname  # group alerts by this label; other labels can be used as well
  continue: false
  routes:
    - receiver: critical_alerts  # name of the receiver
      match:
        severity: critical  # label that this route matches on
      continue: false  # stop here, do not evaluate sibling routes
      group_wait: 1m  # wait before the first notification for a new group
      group_interval: 1m  # wait before notifying about new alerts in an existing group
      repeat_interval: 5m  # wait before re-sending a notification that is still firing
    - receiver: warning_alerts
      match:
        severity: warning
      continue: false
      group_wait: 30m
      group_interval: 30m
      repeat_interval: 2h
    - receiver: info_alerts
      match:
        severity: info
      continue: false
      group_wait: 3h
      group_interval: 3h
      repeat_interval: 1d
  group_wait: 30s
  group_interval: 30s
  repeat_interval: 10m
inhibit_rules:  # rules for inhibiting alerts
  - source_match:
      severity: critical
    target_match_re:
      severity: warning|info
    equal:
      - origin_prometheus
      - namespace
      - alertname
  - source_match:
      severity: warning
    target_match_re:
      severity: info
    equal:
      - origin_prometheus
      - namespace
      - alertname
receivers:  # notification receivers
  - name: default
    webhook_configs:
      - send_resolved: true
        http_config:
          follow_redirects: true
        url: http://xxxxx:32555/prometheusalert?type=fs&tpl=prometheus-fs-wraning&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/4197322c-3c93-4d8a-xxxxx
        max_alerts: 0
  - name: warning_alerts
    webhook_configs:
      - send_resolved: true
        http_config:
          follow_redirects: true
        url: http://xxxxxxx:32555/prometheusalert?type=fs&tpl=prometheus-fs-wraning&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/4197322c-3c93-4d8a-xxxxx
        max_alerts: 0
  - name: critical_alerts
    webhook_configs:
      - send_resolved: true
        http_config:
          follow_redirects: true
        url: http://xxxxxxx:32555/prometheusalert?type=fs&tpl=prometheus-fs-critical&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/1d422c40-2b88-4ead-xxxxx
        max_alerts: 0
  - name: info_alerts
    webhook_configs:
      - send_resolved: true
        http_config:
          follow_redirects: true
        url: http://xxxxxx:32555/prometheusalert?type=fs&tpl=prometheus-fs-info&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/a1c21ffc-6413-4bab-xxxxx
        max_alerts: 0
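AlertManager 0.22 and later also accept a `matchers` list on routes, and `source_matchers` / `target_matchers` on inhibit rules; the `match`, `match_re`, `source_match`, and `target_match_re` fields used in this file are marked as deprecated in those releases. The fragment below is a minimal sketch of the critical route and the first inhibit rule rewritten in that form. It assumes the deployed version supports the newer syntax, and it omits the `global` and `receivers` sections, so it is not a complete configuration on its own.

```yaml
route:
  receiver: default
  routes:
    - receiver: critical_alerts
      matchers:
        - severity = "critical"      # replaces match: {severity: critical}
      continue: false
      group_wait: 1m
      group_interval: 1m
      repeat_interval: 5m
inhibit_rules:
  - source_matchers:
      - severity = "critical"        # replaces source_match
    target_matchers:
      - severity =~ "warning|info"   # replaces target_match_re
    equal:
      - origin_prometheus
      - namespace
      - alertname
```

The behaviour should be the same as with the original fields; only the matching syntax changes.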
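Alert levels (custom convention)
| Alert level | Notification interval | Description |
| --- | --- | --- |
| info | 24h | Not important and not urgent; can be handled at any time. Reminder once a day. |
| warning | 1h | Fairly important but not urgent. Reminder once an hour. |
| critical | 2m | Important and urgent; must be handled immediately. |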
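The routes in the AlertManager configuration select a receiver by the `severity` label, so each alert needs to carry one of these three levels. That label is normally attached in the Prometheus alerting rule. A minimal sketch of such a rule follows; the group name, alert name, expression, and `for` duration are hypothetical and only illustrate where the label is set.

```yaml
groups:
  - name: example-availability        # hypothetical group name
    rules:
      - alert: InstanceDown           # hypothetical alert name
        expr: up == 0                 # target failed its last scrape
        for: 5m                       # must hold for 5 minutes before firing
        labels:
          severity: critical          # one of: info, warning, critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```

An alert without a `severity` label matches none of the three child routes and goes to the `default` receiver.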
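Suppressing (silencing) alerts
Use cases:
An operator has received an alert and is about to work on it.
Alerts that are expected in advance, for example those caused by upgrades or restarts.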
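For the second use case, when planned work follows a fixed schedule, AlertManager can also mute notifications by time window in the configuration file. This is a different mechanism from the manually created suppression demonstrated on this page, and is shown only as an option. A minimal sketch, assuming AlertManager 0.24 or later; the interval name and the Saturday 02:00 to 04:00 window (UTC by default) are hypothetical.

```yaml
time_intervals:
  - name: weekly-maintenance          # hypothetical interval name
    time_intervals:
      - weekdays: ['saturday']
        times:
          - start_time: '02:00'
            end_time: '04:00'
route:
  receiver: default
  routes:
    - receiver: warning_alerts
      matchers:
        - severity = "warning"
      mute_time_intervals:            # no notifications inside these windows
        - weekly-maintenance
```

`mute_time_intervals` can only be set on child routes, not on the root route. For one-off work, a manually created entry in the AlertManager UI is usually the simpler choice.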
Demo: saas-test
AlertManager
Silencing an alert that is currently firing
Viewing and creating silences
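Alert message display
A different bot can be configured for each kind of alert, so that each bot sends its own message content.
Notification after optimization: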