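Preface:

Prometheus is a much-loved monitoring tool. Getting it running is one problem; using it well is another.

This article uses a Prometheus server set up on CentOS 7 as an example and explains how to pick up new exporters through file-based service discovery.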

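1.

The Prometheus configuration file

First,

Most services have exactly one main configuration file. MySQL, for example, has my.cnf, and deployment tutorials put every setting into that one file.

That is a reasonable approach: a single file is easy to manage, and for a service like MySQL a configuration file of fewer than 100 lines is really quite small.

The main Prometheus configuration file is a different story. If you only monitor a handful of servers it does not matter. For example, here is a configuration that monitors just three hosts running node_exporter: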

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
 
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
 
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
 
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
 
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
 
    static_configs:
      - targets: ["192.168.217.24:9090"]  # this machine's IP and port; nothing else needs changing
      - targets: ["192.168.217.25:9090"] 
      - targets: ["192.168.217.26:9090"]

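Now the problem: if hundreds of servers run node_exporter and this Prometheus server has to monitor all of them, do we write hundreds of lines like - targets: ["192.168.217.24:9090"]? And it is not only node_exporter. Suppose a few dozen of those servers run MySQL: do dozens of mysqld_exporter targets go into the Prometheus configuration file as well? What about everything else that needs monitoring, such as nginx? How much more configuration does that add?

The configuration file would become very bloated. (As a rule of thumb, a configuration file should not exceed about 100 lines; beyond that it becomes hard to manage.)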

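Second,

Making configuration changes take effect

Normally the main configuration file of a service is tied to the service's start and stop scripts, which means that a change to the file only takes effect after the service is restarted, and the same is true of Prometheus. (For example, after adding a reverse-proxy block to nginx, you have to restart nginx, or at least reload it, before the proxy works.)

This is where Prometheus earns its reputation: the problem goes away, because Prometheus provides automatic service discovery.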

2.

The concept of file-based discovery:

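Prometheus can watch a set of local target files to obtain scrape targets and their labels. This is file-based service discovery.

It is a more generic way of configuring static targets. Prometheus reads a set of files, each containing a list of zero or more <static_config> entries. Changes to any of the listed files are detected through disk watches and applied immediately. Target files can be written in YAML or JSON. The YAML format looks like this: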

- targets:
  [ - '<host>' ]
  labels:
    [ <labelname>: <labelvalue> ... ]
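
The same target list can also be written as JSON. The snippet below is a minimal sketch that writes one such file. The directory /tmp/files-sd-demo, the file name, and the env label are placeholders chosen for illustration and are not part of the setup in this article. Prometheus only reads target files whose names end in .json, .yml, or .yaml, so the files pattern in the configuration would have to match the .json extension for this file to be picked up.

```shell
# Demo directory (an assumption); on a real server this would be the
# directory named in file_sd_configs.
SD_DIR="${SD_DIR:-/tmp/files-sd-demo}"
mkdir -p "$SD_DIR"

# JSON equivalent of the YAML layout: a list of objects, each with
# "targets" and an optional "labels" map.
cat > "$SD_DIR/node-exporter-example.json" <<'EOF'
[
  {
    "targets": ["192.168.217.20:9100"],
    "labels": {
      "env": "demo"
    }
  }
]
EOF
```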

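3.

How to set up file-based discovery

Edit the main Prometheus configuration file and append the following at the end:

The configuration uses a wildcard, but you can also give the absolute path of a single file. Either way, the path must actually exist.

One more note: node_exporter is already installed and running on all four servers, 192.168.217.19/20/21/22.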


cat /usr/local/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["192.168.217.22:9090"]  # this machine's IP and port; nothing else needs changing

  - job_name: "node-exporter"
    file_sd_configs: 
      - files: ['/opt/promethes/files-sd/*.yml'] # custom directory for target files and the file type to discover
        refresh_interval: 5s # re-read the files every 5 seconds, kept short here to speed up testing

Based on the configuration above, create the directory:

mkdir -p /opt/promethes/files-sd

Restart the Prometheus service:

systemctl daemon-reload && systemctl restart prometheus

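Now create files in /opt/promethes/files-sd as needed. The file extension must be yml, as defined above, or Prometheus will not pick the file up:

Note that labels can be anything you like, but meaningful names are best.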

cat >/opt/promethes/files-sd/node-exporter1.yml <<eof
- targets: ['192.168.217.19:9100']
  labels:
    job: node1
eof

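OK. From here on the Prometheus server no longer needs a restart, and the new target shows up right away in the Prometheus web UI. Create the other three files the same way, with the following content: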

cat >/opt/promethes/files-sd/node-exporter2.yml <<eof
- targets: ['192.168.217.20:9100']
  labels:
    job: node2
eof


cat >/opt/promethes/files-sd/node-exporter3.yml <<eof
- targets: ['192.168.217.21:9100']
  labels:
    job: node3
eof


cat >/opt/promethes/files-sd/node-exporter4.yml <<eof
- targets: ['192.168.217.22:9100']
  labels:
    job: node4
eof
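
When there are many hosts, the files can be generated in a loop instead of by hand. The following is only a sketch: the demo directory /tmp/files-sd-demo and the temporary-file naming are assumptions, and the host list simply repeats the four servers from this article. Each file is first written under a name that does not match *.yml and then renamed, so that a half-written file is never matched by the pattern.

```shell
# Demo directory (an assumption). Set SD_DIR=/opt/promethes/files-sd to
# write into the directory used in this article.
SD_DIR="${SD_DIR:-/tmp/files-sd-demo}"
mkdir -p "$SD_DIR"

n=1
for ip in 192.168.217.19 192.168.217.20 192.168.217.21 192.168.217.22; do
  # Write to a temporary name first, then rename into place.
  tmp="$SD_DIR/.node-exporter${n}.tmp"
  cat > "$tmp" <<EOF
- targets: ['${ip}:9100']
  labels:
    job: node${n}
EOF
  mv "$tmp" "$SD_DIR/node-exporter${n}.yml"
  n=$((n + 1))
done
```

This produces node-exporter1.yml through node-exporter4.yml with the same layout as the hand-written files.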

Open the Prometheus web UI and go to Status -> Service Discovery. You should see something like the following:

[Screenshot: the Service Discovery page of the Prometheus web UI]

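OK, suppose that one day node 192.168.217.19 fails and will not boot. Removing it from Prometheus monitoring is then very simple: rename that node's target file so that it no longer matches the *.yml pattern: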

mv node-exporter1.yml node-exporter1.yml-bak

Go back to the web UI and wait about 5 seconds (the refresh interval defined above). The node_exporter target for node1 is gone:

[Screenshot: the Service Discovery page after the file was renamed]
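
The rename works because the new name no longer ends in .yml, so the files pattern skips it. The sketch below reproduces the effect with a shell glob in a throwaway directory (/tmp/files-sd-rename-demo, an assumption). The shell's *.yml matching is only a stand-in to illustrate the idea and is not a description of how Prometheus matches files internally.

```shell
# Throwaway demo directory (an assumption), separate from the real one.
DEMO_DIR="${DEMO_DIR:-/tmp/files-sd-rename-demo}"
mkdir -p "$DEMO_DIR"

# Two placeholder target files standing in for node1 (.19) and node2 (.20).
for n in 1 2; do
  echo "- targets: ['192.168.217.$((18 + n)):9100']" > "$DEMO_DIR/node-exporter${n}.yml"
done

# Take node1 out of monitoring by renaming its file.
mv "$DEMO_DIR/node-exporter1.yml" "$DEMO_DIR/node-exporter1.yml-bak"

# Only names that still end in .yml are listed.
ls "$DEMO_DIR"/*.yml
```

To bring the node back, rename the file so that it ends in .yml again.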

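Summary:

File-based service discovery has the following advantages:

1. It keeps the main configuration file small and prevents it from becoming bloated.

2. Targets are decoupled from the main configuration into separate files, so they can be changed dynamically without restarting Prometheus. When Prometheus monitors a large number of services, this avoids disrupting the monitoring of the others with a restart.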