前言:后续补充
大晚上手挫给我搓爽了 ![]()
这套方案用于Linux 环境:
mysqld_exporter采集 MySQL 指标。- Prometheus 抓取
mysqld_exporter:9104。 - 使用
mysql_global_status_slow_queries监控全局慢查询增长。 - 使用
mysql_perf_schema_events_statements_*从performance_schema.events_statements_summary_by_digest发现具体慢SQL摘要。
1. MySQL 开启慢查询与 performance_schema
建议在 /etc/my.cnf 或 /etc/mysql/mysql.conf.d/mysqld.cnf 中配置:
[mysqld]
performance_schema=ON
slow_query_log=ON
slow_query_log_file=/var/log/mysql/mysql-slow.log
long_query_time=1
log_queries_not_using_indexes=OFF
登录mysql执行授权 SQL:
mysql -uroot -p
> CREATE USER IF NOT EXISTS 'exporter'@'127.0.0.1'
IDENTIFIED BY 'change_me'
WITH MAX_USER_CONNECTIONS 3;
>GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'127.0.0.1';
>FLUSH PRIVILEGES;
运行命令检查:
SHOW VARIABLES LIKE 'performance_schema';
SHOW VARIABLES LIKE 'slow_query_log';
SHOW VARIABLES LIKE 'long_query_time';
SHOW GLOBAL STATUS LIKE 'Slow_queries';
2. 部署 mysqld_exporter
下载并安装二进制:
sudo useradd --no-create-home --shell /usr/sbin/nologin prometheus
curl -LO https://github.com/prometheus/mysqld_exporter/releases/download/v0.19.0/mysqld_exporter-0.19.0.linux-amd64.tar.gz
tar -xzf mysqld_exporter-0.19.0.linux-amd64.tar.gz
sudo install -m 0755 mysqld_exporter-0.19.0.linux-amd64/mysqld_exporter /usr/local/bin/mysqld_exporter
sudo mkdir -p /etc/mysqld_exporter
sudo install -m 0640 mysqld_exporter/.my.cnf.example /etc/mysqld_exporter/.my.cnf
sudo chown -R prometheus:prometheus /etc/mysqld_exporter
sudo install -m 0644 mysqld_exporter/mysqld_exporter.service /etc/systemd/system/mysqld_exporter.service
sudo systemctl daemon-reload
sudo systemctl enable --now mysqld_exporter
mysqld_exporter.service配置:
[Unit]
Description=Prometheus MySQL Exporter
After=network-online.target
Wants=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/mysqld_exporter \
--config.my-cnf=/etc/mysqld_exporter/.my.cnf \
--web.listen-address=:9104 \
--collect.global_status \
--collect.global_variables \
--collect.perf_schema.eventsstatements \
--collect.perf_schema.eventsstatements.limit=250 \
--collect.perf_schema.eventsstatements.timelimit=86400 \
--collect.perf_schema.eventsstatements.digest_text_limit=160
Restart=always
RestartSec=5s
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
[Install]
WantedBy=multi-user.target
修改 /etc/mysqld_exporter/.my.cnf 中密码后再启动服务。
.my.cnf内容:
[client]
user=exporter
password=change_me
host=127.0.0.1
port=3306
验证:
curl -s http://127.0.0.1:9104/metrics | grep -E 'mysql_global_status_slow_queries|mysql_perf_schema_events_statements'
3. 配置 Prometheus
把 prometheus.yml 和 rules/mysql-slow-query.rules.yml 合并到 Prometheus 配置目录。
prometheus.yml内容配置:
global:
scrape_interval: 15s
evaluation_interval: 15s
rule_files:
- /etc/prometheus/rules/mysql-slow-query.rules.yml
scrape_configs:
- job_name: mysql
static_configs:
- targets:
- 127.0.0.1:9104
labels:
service: mysql
env: prod
mysql-slow-query.rules.yml内容配置:
groups:
- name: mysql-slow-query
interval: 30s
rules:
- record: mysql:slow_queries:rate5m
expr: rate(mysql_global_status_slow_queries[5m])
- record: mysql:statement_digest:avg_latency_seconds5m
expr: |
rate(mysql_perf_schema_events_statements_seconds_total[5m])
/
clamp_min(rate(mysql_perf_schema_events_statements_total[5m]), 0.001)
- record: mysql:statement_digest:rows_examined_rate5m
expr: rate(mysql_perf_schema_events_statements_rows_examined_total[5m])
- alert: MySQLSlowQueriesIncreasing
expr: mysql:slow_queries:rate5m > 0.1
for: 10m
labels:
severity: warning
annotations:
summary: "MySQL slow queries are increasing on {{ $labels.instance }}"
description: "Slow query rate is {{ $value | printf \"%.3f\" }}/s for 10m. Check MySQL slow log and statement digest metrics."
- alert: MySQLSlowSQLDigestHighAvgLatency
expr: mysql:statement_digest:avg_latency_seconds5m > 1
for: 10m
labels:
severity: warning
annotations:
summary: "Slow SQL digest on {{ $labels.instance }}"
description: "schema={{ $labels.schema }}, avg_latency={{ $value | printf \"%.3f\" }}s, digest={{ $labels.digest }}, sql={{ $labels.digest_text }}"
- alert: MySQLSQLDigestExaminesTooManyRows
expr: mysql:statement_digest:rows_examined_rate5m > 100000
for: 10m
labels:
severity: warning
annotations:
summary: "MySQL SQL digest examines too many rows on {{ $labels.instance }}"
description: "schema={{ $labels.schema }}, rows_examined_rate={{ $value | printf \"%.0f\" }}/s, digest={{ $labels.digest }}, sql={{ $labels.digest_text }}"
- alert: MySQLSQLDigestNoIndexUsed
expr: increase(mysql_perf_schema_events_statements_no_index_used_total[15m]) > 0
for: 5m
labels:
severity: info
annotations:
summary: "MySQL SQL digest used no index on {{ $labels.instance }}"
description: "schema={{ $labels.schema }}, no_index_count={{ $value | printf \"%.0f\" }} in 15m, digest={{ $labels.digest }}, sql={{ $labels.digest_text }}"
检查配置并重载:
promtool check config /etc/prometheus/prometheus.yml
sudo systemctl reload prometheus
4. 导入 Grafana dashboard
可以直观看到慢 SQL。导入 grafana/mysql-slow-sql-dashboard.json 后,选择 Prometheus 数据源即可看到:
- 慢查询增长速率趋势。
- Top 慢 SQL digest 表格,按 5 分钟平均耗时排序。
- Top 扫描行数 SQL digest。
- 未使用索引 SQL digest。
导入路径:
Grafana -> Dashboards -> New -> Import -> Upload JSON file
mysql-slow-sql-dashboard.json配置:
{
"__inputs": [
{
"name": "DS_PROMETHEUS",
"label": "Prometheus",
"description": "Prometheus datasource used by this dashboard",
"type": "datasource",
"pluginId": "prometheus",
"pluginName": "Prometheus"
}
],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "10.0.0"
},
{
"type": "datasource",
"id": "prometheus",
"name": "Prometheus",
"version": "1.0.0"
},
{
"type": "panel",
"id": "timeseries",
"name": "Time series",
"version": ""
},
{
"type": "panel",
"id": "table",
"name": "Table",
"version": ""
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"target": {
"limit": 100,
"matchAny": false,
"tags": [],
"type": "dashboard"
},
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 0.1
}
]
},
"unit": "qps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"legend": {
"calcs": [
"lastNotNull"
],
"displayMode": "table",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "rate(mysql_global_status_slow_queries{instance=~\"$instance\"}[5m])",
"legendFormat": "{{instance}}",
"range": true,
"refId": "A"
}
],
"title": "Slow Query Rate",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"custom": {
"align": "auto",
"cellOptions": {
"type": "auto"
},
"filterable": true,
"inspect": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 1
}
]
},
"unit": "s"
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "SQL"
},
"properties": [
{
"id": "custom.width",
"value": 720
}
]
},
{
"matcher": {
"id": "byName",
"options": "Schema"
},
"properties": [
{
"id": "custom.width",
"value": 120
}
]
},
{
"matcher": {
"id": "byName",
"options": "Avg Latency"
},
"properties": [
{
"id": "unit",
"value": "s"
},
{
"id": "decimals",
"value": 3
},
{
"id": "custom.cellOptions",
"value": {
"mode": "gradient",
"type": "color-background"
}
}
]
}
]
},
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 8
},
"id": 2,
"options": {
"cellHeight": "sm",
"footer": {
"countRows": false,
"fields": "",
"reducer": [
"sum"
],
"show": false
},
"showHeader": true,
"sortBy": [
{
"desc": true,
"displayName": "Avg Latency"
}
]
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"exemplar": false,
"expr": "topk($topk, rate(mysql_perf_schema_events_statements_seconds_total{instance=~\"$instance\",schema=~\"$schema\"}[5m]) / clamp_min(rate(mysql_perf_schema_events_statements_total{instance=~\"$instance\",schema=~\"$schema\"}[5m]), 0.001))",
"format": "table",
"instant": true,
"legendFormat": "__auto",
"range": false,
"refId": "A"
}
],
"title": "Top Slow SQL Digests by Avg Latency",
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"digest": true,
"instance": true,
"job": true,
"service": true
},
"indexByName": {
"Value": 3,
"digest_text": 2,
"schema": 1
},
"renameByName": {
"Value": "Avg Latency",
"digest_text": "SQL",
"schema": "Schema"
}
}
}
],
"type": "table"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"custom": {
"align": "auto",
"cellOptions": {
"type": "auto"
},
"filterable": true,
"inspect": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 100000
}
]
},
"unit": "rows"
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "SQL"
},
"properties": [
{
"id": "custom.width",
"value": 720
}
]
},
{
"matcher": {
"id": "byName",
"options": "Rows Examined/s"
},
"properties": [
{
"id": "decimals",
"value": 0
}
]
}
]
},
"gridPos": {
"h": 10,
"w": 12,
"x": 0,
"y": 18
},
"id": 3,
"options": {
"cellHeight": "sm",
"footer": {
"countRows": false,
"fields": "",
"reducer": [
"sum"
],
"show": false
},
"showHeader": true,
"sortBy": [
{
"desc": true,
"displayName": "Rows Examined/s"
}
]
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "topk($topk, rate(mysql_perf_schema_events_statements_rows_examined_total{instance=~\"$instance\",schema=~\"$schema\"}[5m]))",
"format": "table",
"instant": true,
"range": false,
"refId": "A"
}
],
"title": "Top SQL Digests by Rows Examined",
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"digest": true,
"instance": true,
"job": true,
"service": true
},
"indexByName": {
"Value": 3,
"digest_text": 2,
"schema": 1
},
"renameByName": {
"Value": "Rows Examined/s",
"digest_text": "SQL",
"schema": "Schema"
}
}
}
],
"type": "table"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"custom": {
"align": "auto",
"cellOptions": {
"type": "auto"
},
"filterable": true,
"inspect": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 1
}
]
},
"unit": "short"
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "SQL"
},
"properties": [
{
"id": "custom.width",
"value": 720
}
]
},
{
"matcher": {
"id": "byName",
"options": "No Index Used"
},
"properties": [
{
"id": "decimals",
"value": 0
}
]
}
]
},
"gridPos": {
"h": 10,
"w": 12,
"x": 12,
"y": 18
},
"id": 4,
"options": {
"cellHeight": "sm",
"footer": {
"countRows": false,
"fields": "",
"reducer": [
"sum"
],
"show": false
},
"showHeader": true,
"sortBy": [
{
"desc": true,
"displayName": "No Index Used"
}
]
},
"pluginVersion": "10.0.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "topk($topk, increase(mysql_perf_schema_events_statements_no_index_used_total{instance=~\"$instance\",schema=~\"$schema\"}[15m]))",
"format": "table",
"instant": true,
"range": false,
"refId": "A"
}
],
"title": "SQL Digests Without Index",
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"digest": true,
"instance": true,
"job": true,
"service": true
},
"indexByName": {
"Value": 3,
"digest_text": 2,
"schema": 1
},
"renameByName": {
"Value": "No Index Used",
"digest_text": "SQL",
"schema": "Schema"
}
}
}
],
"type": "table"
}
],
"refresh": "30s",
"schemaVersion": 38,
"style": "dark",
"tags": [
"mysql",
"slow-sql",
"prometheus"
],
"templating": {
"list": [
{
"current": {},
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"definition": "label_values(mysql_up, instance)",
"hide": 0,
"includeAll": true,
"label": "Instance",
"multi": true,
"name": "instance",
"options": [],
"query": {
"query": "label_values(mysql_up, instance)",
"refId": "PrometheusVariableQueryEditor-VariableQuery"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
},
{
"current": {},
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"definition": "label_values(mysql_perf_schema_events_statements_total{instance=~\"$instance\"}, schema)",
"hide": 0,
"includeAll": true,
"label": "Schema",
"multi": true,
"name": "schema",
"options": [],
"query": {
"query": "label_values(mysql_perf_schema_events_statements_total{instance=~\"$instance\"}, schema)",
"refId": "PrometheusVariableQueryEditor-VariableQuery"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
},
{
"current": {
"selected": false,
"text": "10",
"value": "10"
},
"hide": 0,
"label": "TopK",
"name": "topk",
"options": [
{
"selected": true,
"text": "10",
"value": "10"
},
{
"selected": false,
"text": "20",
"value": "20"
},
{
"selected": false,
"text": "50",
"value": "50"
}
],
"query": "10,20,50",
"queryValue": "",
"skipUrlSync": false,
"type": "custom"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "MySQL Slow SQL Monitoring",
"uid": "mysql-slow-sql",
"version": 1,
"weekStart": ""
}
如果 Grafana 显示 No data 或面板左上角有告警图标,按下面顺序排查:
mysql_up
如果没有结果,说明 Prometheus 没抓到 mysqld_exporter,检查 prometheus.yml 里的 targets 和 Prometheus Status -> Targets 页面。
mysql_global_status_slow_queries
如果 mysql_up 有结果但这个没有,检查 mysqld_exporter 是否能连接 MySQL,以及 exporter 用户权限。
mysql_perf_schema_events_statements_total
如果前两个有结果但这个没有,说明 SQL digest 采集没打开或 MySQL performance_schema 没有可用数据,检查 systemd 服务里是否包含:
--collect.perf_schema.eventsstatements
同时在 MySQL 执行:
SHOW VARIABLES LIKE 'performance_schema';
SELECT COUNT(*) FROM performance_schema.events_statements_summary_by_digest;
如果 events_statements_summary_by_digest 为空,先执行几条业务 SQL,等待 Prometheus 下一个抓取周期后刷新 Grafana。
5. 常用 PromQL
全局慢查询增长速度:
rate(mysql_global_status_slow_queries[5m])
过去 5 分钟平均耗时最高的 SQL digest:
topk(
10,
rate(mysql_perf_schema_events_statements_seconds_total[5m])
/
clamp_min(rate(mysql_perf_schema_events_statements_total[5m]), 0.001)
)
过去 5 分钟扫描行数明显高的 SQL digest:
topk(
10,
rate(mysql_perf_schema_events_statements_rows_examined_total[5m])
)
未使用索引的 SQL digest:
topk(
10,
increase(mysql_perf_schema_events_statements_no_index_used_total[15m])
)
6. 告警说明
规则文件已实现这些告警:
MySQLSlowQueriesIncreasing:全局慢查询持续增加。MySQLSlowSQLDigestHighAvgLatency:某条 SQL digest 平均耗时超过 1 秒。MySQLSQLDigestExaminesTooManyRows:某条 SQL digest 扫描行数过高。MySQLSQLDigestNoIndexUsed:某条 SQL digest 发生未使用索引。
赶紧睡了,明天上班摸鱼再继续施工
1 个帖子 - 1 位参与者