Prometheus监控MySQL慢SQL 方案

前言:后续补充 大晚上手挫给我搓爽了 这套方案用于Linux 环境: mysqld_exporter 采集 MySQL 指标。 Prometheus 抓取 mysqld_exporter:9104 。 使用 mysql_global_status_slow_queries 监控全局慢查询增长。 使用...
Prometheus监控MySQL慢SQL 方案
Prometheus监控MySQL慢SQL 方案

前言:后续补充

大晚上手挫给我搓爽了 :laughing:

这套方案用于Linux 环境:

  • mysqld_exporter 采集 MySQL 指标。
  • Prometheus 抓取 mysqld_exporter:9104
  • 使用 mysql_global_status_slow_queries 监控全局慢查询增长。
  • 使用 mysql_perf_schema_events_statements_*performance_schema.events_statements_summary_by_digest 发现具体慢SQL摘要。

1. MySQL 开启慢查询与 performance_schema

建议在 /etc/my.cnf/etc/mysql/mysql.conf.d/mysqld.cnf 中配置:

[mysqld]
performance_schema=ON
slow_query_log=ON
slow_query_log_file=/var/log/mysql/mysql-slow.log
long_query_time=1
log_queries_not_using_indexes=OFF

登录mysql执行授权 SQL:

mysql -uroot -p

> CREATE USER IF NOT EXISTS 'exporter'@'127.0.0.1'
  IDENTIFIED BY 'change_me'
  WITH MAX_USER_CONNECTIONS 3;
>GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'127.0.0.1';
>FLUSH PRIVILEGES;

运行命令检查:

SHOW VARIABLES LIKE 'performance_schema';
SHOW VARIABLES LIKE 'slow_query_log';
SHOW VARIABLES LIKE 'long_query_time';
SHOW GLOBAL STATUS LIKE 'Slow_queries';

2. 部署 mysqld_exporter

下载并安装二进制:

sudo useradd --no-create-home --shell /usr/sbin/nologin prometheus
curl -LO https://github.com/prometheus/mysqld_exporter/releases/download/v0.19.0/mysqld_exporter-0.19.0.linux-amd64.tar.gz
tar -xzf mysqld_exporter-0.19.0.linux-amd64.tar.gz
sudo install -m 0755 mysqld_exporter-0.19.0.linux-amd64/mysqld_exporter /usr/local/bin/mysqld_exporter
sudo mkdir -p /etc/mysqld_exporter
sudo install -m 0640 mysqld_exporter/.my.cnf.example /etc/mysqld_exporter/.my.cnf
sudo chown -R prometheus:prometheus /etc/mysqld_exporter
sudo install -m 0644 mysqld_exporter/mysqld_exporter.service /etc/systemd/system/mysqld_exporter.service
sudo systemctl daemon-reload
sudo systemctl enable --now mysqld_exporter

mysqld_exporter.service配置:

[Unit]
Description=Prometheus MySQL Exporter
After=network-online.target
Wants=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/mysqld_exporter \
  --config.my-cnf=/etc/mysqld_exporter/.my.cnf \
  --web.listen-address=:9104 \
  --collect.global_status \
  --collect.global_variables \
  --collect.perf_schema.eventsstatements \
  --collect.perf_schema.eventsstatements.limit=250 \
  --collect.perf_schema.eventsstatements.timelimit=86400 \
  --collect.perf_schema.eventsstatements.digest_text_limit=160
Restart=always
RestartSec=5s
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true

[Install]
WantedBy=multi-user.target

修改 /etc/mysqld_exporter/.my.cnf 中密码后再启动服务。

.my.cnf内容:

[client]
user=exporter
password=change_me
host=127.0.0.1
port=3306

验证:

curl -s http://127.0.0.1:9104/metrics | grep -E 'mysql_global_status_slow_queries|mysql_perf_schema_events_statements'

3. 配置 Prometheus

prometheus.ymlrules/mysql-slow-query.rules.yml 合并到 Prometheus 配置目录。

prometheus.yml内容配置:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - /etc/prometheus/rules/mysql-slow-query.rules.yml

scrape_configs:
  - job_name: mysql
    static_configs:
      - targets:
          - 127.0.0.1:9104
        labels:
          service: mysql
          env: prod


mysql-slow-query.rules.yml内容配置:

groups:
  - name: mysql-slow-query
    interval: 30s
    rules:
      - record: mysql:slow_queries:rate5m
        expr: rate(mysql_global_status_slow_queries[5m])

      - record: mysql:statement_digest:avg_latency_seconds5m
        expr: |
          rate(mysql_perf_schema_events_statements_seconds_total[5m])
          /
          clamp_min(rate(mysql_perf_schema_events_statements_total[5m]), 0.001)

      - record: mysql:statement_digest:rows_examined_rate5m
        expr: rate(mysql_perf_schema_events_statements_rows_examined_total[5m])

      - alert: MySQLSlowQueriesIncreasing
        expr: mysql:slow_queries:rate5m > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "MySQL slow queries are increasing on {{ $labels.instance }}"
          description: "Slow query rate is {{ $value | printf \"%.3f\" }}/s for 10m. Check MySQL slow log and statement digest metrics."

      - alert: MySQLSlowSQLDigestHighAvgLatency
        expr: mysql:statement_digest:avg_latency_seconds5m > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Slow SQL digest on {{ $labels.instance }}"
          description: "schema={{ $labels.schema }}, avg_latency={{ $value | printf \"%.3f\" }}s, digest={{ $labels.digest }}, sql={{ $labels.digest_text }}"

      - alert: MySQLSQLDigestExaminesTooManyRows
        expr: mysql:statement_digest:rows_examined_rate5m > 100000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "MySQL SQL digest examines too many rows on {{ $labels.instance }}"
          description: "schema={{ $labels.schema }}, rows_examined_rate={{ $value | printf \"%.0f\" }}/s, digest={{ $labels.digest }}, sql={{ $labels.digest_text }}"

      - alert: MySQLSQLDigestNoIndexUsed
        expr: increase(mysql_perf_schema_events_statements_no_index_used_total[15m]) > 0
        for: 5m
        labels:
          severity: info
        annotations:
          summary: "MySQL SQL digest used no index on {{ $labels.instance }}"
          description: "schema={{ $labels.schema }}, no_index_count={{ $value | printf \"%.0f\" }} in 15m, digest={{ $labels.digest }}, sql={{ $labels.digest_text }}"


检查配置并重载:

promtool check config /etc/prometheus/prometheus.yml
sudo systemctl reload prometheus

4. 导入 Grafana dashboard

可以直观看到慢 SQL。导入 grafana/mysql-slow-sql-dashboard.json 后,选择 Prometheus 数据源即可看到:

  • 慢查询增长速率趋势。
  • Top 慢 SQL digest 表格,按 5 分钟平均耗时排序。
  • Top 扫描行数 SQL digest。
  • 未使用索引 SQL digest。

导入路径:

Grafana -> Dashboards -> New -> Import -> Upload JSON file

mysql-slow-sql-dashboard.json配置:

{
  "__inputs": [
    {
      "name": "DS_PROMETHEUS",
      "label": "Prometheus",
      "description": "Prometheus datasource used by this dashboard",
      "type": "datasource",
      "pluginId": "prometheus",
      "pluginName": "Prometheus"
    }
  ],
  "__requires": [
    {
      "type": "grafana",
      "id": "grafana",
      "name": "Grafana",
      "version": "10.0.0"
    },
    {
      "type": "datasource",
      "id": "prometheus",
      "name": "Prometheus",
      "version": "1.0.0"
    },
    {
      "type": "panel",
      "id": "timeseries",
      "name": "Time series",
      "version": ""
    },
    {
      "type": "panel",
      "id": "table",
      "name": "Table",
      "version": ""
    }
  ],
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "target": {
          "limit": 100,
          "matchAny": false,
          "tags": [],
          "type": "dashboard"
        },
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": null,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 0.1
              }
            ]
          },
          "unit": "qps"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 1,
      "options": {
        "legend": {
          "calcs": [
            "lastNotNull"
          ],
          "displayMode": "table",
          "placement": "right",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "rate(mysql_global_status_slow_queries{instance=~\"$instance\"}[5m])",
          "legendFormat": "{{instance}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Slow Query Rate",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "custom": {
            "align": "auto",
            "cellOptions": {
              "type": "auto"
            },
            "filterable": true,
            "inspect": false
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 1
              }
            ]
          },
          "unit": "s"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "SQL"
            },
            "properties": [
              {
                "id": "custom.width",
                "value": 720
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "Schema"
            },
            "properties": [
              {
                "id": "custom.width",
                "value": 120
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "Avg Latency"
            },
            "properties": [
              {
                "id": "unit",
                "value": "s"
              },
              {
                "id": "decimals",
                "value": 3
              },
              {
                "id": "custom.cellOptions",
                "value": {
                  "mode": "gradient",
                  "type": "color-background"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 10,
        "w": 24,
        "x": 0,
        "y": 8
      },
      "id": 2,
      "options": {
        "cellHeight": "sm",
        "footer": {
          "countRows": false,
          "fields": "",
          "reducer": [
            "sum"
          ],
          "show": false
        },
        "showHeader": true,
        "sortBy": [
          {
            "desc": true,
            "displayName": "Avg Latency"
          }
        ]
      },
      "pluginVersion": "10.0.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "topk($topk, rate(mysql_perf_schema_events_statements_seconds_total{instance=~\"$instance\",schema=~\"$schema\"}[5m]) / clamp_min(rate(mysql_perf_schema_events_statements_total{instance=~\"$instance\",schema=~\"$schema\"}[5m]), 0.001))",
          "format": "table",
          "instant": true,
          "legendFormat": "__auto",
          "range": false,
          "refId": "A"
        }
      ],
      "title": "Top Slow SQL Digests by Avg Latency",
      "transformations": [
        {
          "id": "organize",
          "options": {
            "excludeByName": {
              "Time": true,
              "digest": true,
              "instance": true,
              "job": true,
              "service": true
            },
            "indexByName": {
              "Value": 3,
              "digest_text": 2,
              "schema": 1
            },
            "renameByName": {
              "Value": "Avg Latency",
              "digest_text": "SQL",
              "schema": "Schema"
            }
          }
        }
      ],
      "type": "table"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "custom": {
            "align": "auto",
            "cellOptions": {
              "type": "auto"
            },
            "filterable": true,
            "inspect": false
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 100000
              }
            ]
          },
          "unit": "rows"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "SQL"
            },
            "properties": [
              {
                "id": "custom.width",
                "value": 720
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "Rows Examined/s"
            },
            "properties": [
              {
                "id": "decimals",
                "value": 0
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 10,
        "w": 12,
        "x": 0,
        "y": 18
      },
      "id": 3,
      "options": {
        "cellHeight": "sm",
        "footer": {
          "countRows": false,
          "fields": "",
          "reducer": [
            "sum"
          ],
          "show": false
        },
        "showHeader": true,
        "sortBy": [
          {
            "desc": true,
            "displayName": "Rows Examined/s"
          }
        ]
      },
      "pluginVersion": "10.0.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "topk($topk, rate(mysql_perf_schema_events_statements_rows_examined_total{instance=~\"$instance\",schema=~\"$schema\"}[5m]))",
          "format": "table",
          "instant": true,
          "range": false,
          "refId": "A"
        }
      ],
      "title": "Top SQL Digests by Rows Examined",
      "transformations": [
        {
          "id": "organize",
          "options": {
            "excludeByName": {
              "Time": true,
              "digest": true,
              "instance": true,
              "job": true,
              "service": true
            },
            "indexByName": {
              "Value": 3,
              "digest_text": 2,
              "schema": 1
            },
            "renameByName": {
              "Value": "Rows Examined/s",
              "digest_text": "SQL",
              "schema": "Schema"
            }
          }
        }
      ],
      "type": "table"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "custom": {
            "align": "auto",
            "cellOptions": {
              "type": "auto"
            },
            "filterable": true,
            "inspect": false
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 1
              }
            ]
          },
          "unit": "short"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "SQL"
            },
            "properties": [
              {
                "id": "custom.width",
                "value": 720
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "No Index Used"
            },
            "properties": [
              {
                "id": "decimals",
                "value": 0
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 10,
        "w": 12,
        "x": 12,
        "y": 18
      },
      "id": 4,
      "options": {
        "cellHeight": "sm",
        "footer": {
          "countRows": false,
          "fields": "",
          "reducer": [
            "sum"
          ],
          "show": false
        },
        "showHeader": true,
        "sortBy": [
          {
            "desc": true,
            "displayName": "No Index Used"
          }
        ]
      },
      "pluginVersion": "10.0.0",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "topk($topk, increase(mysql_perf_schema_events_statements_no_index_used_total{instance=~\"$instance\",schema=~\"$schema\"}[15m]))",
          "format": "table",
          "instant": true,
          "range": false,
          "refId": "A"
        }
      ],
      "title": "SQL Digests Without Index",
      "transformations": [
        {
          "id": "organize",
          "options": {
            "excludeByName": {
              "Time": true,
              "digest": true,
              "instance": true,
              "job": true,
              "service": true
            },
            "indexByName": {
              "Value": 3,
              "digest_text": 2,
              "schema": 1
            },
            "renameByName": {
              "Value": "No Index Used",
              "digest_text": "SQL",
              "schema": "Schema"
            }
          }
        }
      ],
      "type": "table"
    }
  ],
  "refresh": "30s",
  "schemaVersion": 38,
  "style": "dark",
  "tags": [
    "mysql",
    "slow-sql",
    "prometheus"
  ],
  "templating": {
    "list": [
      {
        "current": {},
        "datasource": {
          "type": "prometheus",
          "uid": "${DS_PROMETHEUS}"
        },
        "definition": "label_values(mysql_up, instance)",
        "hide": 0,
        "includeAll": true,
        "label": "Instance",
        "multi": true,
        "name": "instance",
        "options": [],
        "query": {
          "query": "label_values(mysql_up, instance)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "type": "query"
      },
      {
        "current": {},
        "datasource": {
          "type": "prometheus",
          "uid": "${DS_PROMETHEUS}"
        },
        "definition": "label_values(mysql_perf_schema_events_statements_total{instance=~\"$instance\"}, schema)",
        "hide": 0,
        "includeAll": true,
        "label": "Schema",
        "multi": true,
        "name": "schema",
        "options": [],
        "query": {
          "query": "label_values(mysql_perf_schema_events_statements_total{instance=~\"$instance\"}, schema)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "type": "query"
      },
      {
        "current": {
          "selected": false,
          "text": "10",
          "value": "10"
        },
        "hide": 0,
        "label": "TopK",
        "name": "topk",
        "options": [
          {
            "selected": true,
            "text": "10",
            "value": "10"
          },
          {
            "selected": false,
            "text": "20",
            "value": "20"
          },
          {
            "selected": false,
            "text": "50",
            "value": "50"
          }
        ],
        "query": "10,20,50",
        "queryValue": "",
        "skipUrlSync": false,
        "type": "custom"
      }
    ]
  },
  "time": {
    "from": "now-1h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "MySQL Slow SQL Monitoring",
  "uid": "mysql-slow-sql",
  "version": 1,
  "weekStart": ""
}

如果 Grafana 显示 No data 或面板左上角有告警图标,按下面顺序排查:

mysql_up

如果没有结果,说明 Prometheus 没抓到 mysqld_exporter,检查 prometheus.yml 里的 targets 和 Prometheus Status -> Targets 页面。

mysql_global_status_slow_queries

如果 mysql_up 有结果但这个没有,检查 mysqld_exporter 是否能连接 MySQL,以及 exporter 用户权限。

mysql_perf_schema_events_statements_total

如果前两个有结果但这个没有,说明 SQL digest 采集没打开或 MySQL performance_schema 没有可用数据,检查 systemd 服务里是否包含:

--collect.perf_schema.eventsstatements

同时在 MySQL 执行:

SHOW VARIABLES LIKE 'performance_schema';
SELECT COUNT(*) FROM performance_schema.events_statements_summary_by_digest;

如果 events_statements_summary_by_digest 为空,先执行几条业务 SQL,等待 Prometheus 下一个抓取周期后刷新 Grafana。

5. 常用 PromQL

全局慢查询增长速度:

rate(mysql_global_status_slow_queries[5m])

过去 5 分钟平均耗时最高的 SQL digest:

topk(
  10,
  rate(mysql_perf_schema_events_statements_seconds_total[5m])
  /
  clamp_min(rate(mysql_perf_schema_events_statements_total[5m]), 0.001)
)

过去 5 分钟扫描行数明显高的 SQL digest:

topk(
  10,
  rate(mysql_perf_schema_events_statements_rows_examined_total[5m])
)

未使用索引的 SQL digest:

topk(
  10,
increase(mysql_perf_schema_events_statements_no_index_used_total[15m])
)

6. 告警说明

规则文件已实现这些告警:

  • MySQLSlowQueriesIncreasing:全局慢查询持续增加。
  • MySQLSlowSQLDigestHighAvgLatency:某条 SQL digest 平均耗时超过 1 秒。
  • MySQLSQLDigestExaminesTooManyRows:某条 SQL digest 扫描行数过高。
  • MySQLSQLDigestNoIndexUsed:某条 SQL digest 发生未使用索引。

赶紧睡了,明天上班摸鱼再继续施工

1 个帖子 - 1 位参与者

阅读完整话题

来源: LinuxDo 最新话题查看原文