A database inspection scheme for finding instances that are no longer in use

2023-12-19 20:50:44

Background:
It is unclear whether some production instances are no longer in use but have never been released.

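Plan: call the Alibaba Cloud monitoring API every 2 hours to fetch the previous two hours of monitoring data (maximum read QPS, maximum write QPS, maximum connection count, and so on), then analyze the maximum values over the last three days.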
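The collection schedule above can be sketched as follows. This is only an illustration: the helper names `previous_window` and `three_day_max` are hypothetical, and aligning windows to even hours is an assumption the original does not state.

```python
from datetime import datetime, timedelta


def previous_window(now, hours=2):
    """Return (start, end) of the most recent completed window.

    Assumption: windows are aligned to multiples of `hours` within the day.
    """
    end = now.replace(minute=0, second=0, microsecond=0)
    end -= timedelta(hours=end.hour % hours)
    return end - timedelta(hours=hours), end


def three_day_max(samples, now, days=3):
    """Return the largest value among samples newer than `days` days.

    `samples` is an iterable of (exec_date, value) pairs, one per window.
    Returns None when no sample falls inside the range.
    """
    cutoff = now - timedelta(days=days)
    values = [value for exec_date, value in samples if exec_date > cutoff]
    return max(values) if values else None
```

Each 2-hour run would store one row of per-window maxima, and the idle check then only needs the maximum of those rows over three days.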

Monitoring metrics collected for RDS:

 ["CpuUsage", "IOPSUsage", "DiskUsage", "MySQL_ComDelete", "MySQL_ComInsert",
  "MySQL_ComInsertSelect", "MySQL_ComReplace","MySQL_ComSelect", "MySQL_ComUpdate", "DASMySQLSessionCount"]

DASMySQLSessionCount is obtained through the DAS service API:
das20200116_models.GetMySQLAllSessionAsyncRequest
The other metrics are obtained through the CloudMonitor (CMS) DescribeMetricDataRequest interface.

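Criteria for judging an RDS instance as no longer in use: over the last three days, maximum write QPS below 1, maximum CPU below 1, maximum IOPS below 1, and maximum long-lived sessions below 5. The thresholds are not zero because the platform's own monitoring generates a small amount of QPS even on an unused instance.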

a.max_write_max < 1
and a.max_cpu_max < 1
and a.max_iops_max < 1
and a.max_das_total_session_count < 5
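For use outside SQL, the same four conditions can be written as a small predicate. This is a minimal sketch: the `RdsStats` class and `is_rds_idle` function are hypothetical names, and only the field names and thresholds come from the conditions above.

```python
from dataclasses import dataclass


@dataclass
class RdsStats:
    """Three-day maxima for one RDS instance (field names follow the SQL aliases)."""
    max_write_max: float
    max_cpu_max: float
    max_iops_max: float
    max_das_total_session_count: int


def is_rds_idle(stats):
    """True when every three-day maximum is below its threshold."""
    return (
        stats.max_write_max < 1
        and stats.max_cpu_max < 1
        and stats.max_iops_max < 1
        and stats.max_das_total_session_count < 5
    )
```

All four conditions must hold together, so a single busy metric is enough to keep an instance off the candidate list.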

Query SQL:

select
  b.*
from
  (
    select
      inst_id,
      `inst_name`,
      max_write_max,
      max_select_max,
      max_das_total_session_count,
      max_cpu_max,
      max_iops_max
    from
      (
        SELECT
          inst_id,
          `inst_name`,
          max(write_max) AS max_write_max,
          max(select_max) as max_select_max,
          max(das_total_session_count) as max_das_total_session_count,
          max(`cpu_max`) as max_cpu_max,
          max(`iops_max`) as max_iops_max
        FROM
          `rds_monitor_data`
        where
          `exec_date` > date_sub(now(), interval 3 day)
        group by
          inst_id,
          inst_name
      ) a
    where
      a.max_write_max < 1
      and a.max_cpu_max < 1
      and a.max_iops_max < 1
      and a.max_das_total_session_count < 5
  ) b
  JOIN rds_info c ON b.inst_id = c.inst_id
where
  c.`status` = 1
order by
  b.max_write_max

Redis

Metrics collected:

["CpuUsage", "memoryUsage", "TotalQps", "GetQps", "PutQps","intranetInRatio","intranetOutRatio","dasRedisTotalSession"]

dasRedisTotalSession is obtained through the DAS interface das20200116_models.GetRedisAllSessionRequest.
The other metrics are obtained through the Redis DescribeHistoryMonitorValuesRequest monitoring interface.

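Criteria for judging an Alibaba Cloud Redis instance as no longer in use.
Alibaba Cloud's own Redis monitoring generates some QPS, so "QPS equals 0" cannot be used to decide whether an instance is in use.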

a.max_put_qps_max < 2 (the main condition; judging by this one alone also works)
and a.max_get_qps_max < 25 (auxiliary condition)
and a.max_das_total_session_count < 30 (auxiliary condition)
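The split between the main condition and the auxiliary conditions could be expressed as below. This is an illustrative sketch: `is_redis_idle` and its `strict` flag are hypothetical, and only the thresholds are taken from the conditions above.

```python
def is_redis_idle(max_put_qps, max_get_qps, max_sessions, strict=True):
    """Judge a Redis instance from its three-day maxima.

    With strict=False only the main write-QPS condition is checked;
    with strict=True the two auxiliary conditions must hold as well.
    """
    primary = max_put_qps < 2
    if not strict:
        return primary
    return primary and max_get_qps < 25 and max_sessions < 30
```

Running with `strict=False` first gives a wider candidate list, which the auxiliary conditions then narrow down.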

ES criteria

Metrics collected:

 ["NodeCPUUtilization", "NodeDiskUtilization", "NodeLoad_1m", "NodeHeapMemoryUtilization", "ClusterIndexQPS", "ClusterQueryQPS", "elasticsearch-server.bulk_total_operations", "elasticsearch-server.search_total"]

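The monitoring metrics are obtained through the ES Grafana interface.
client.get_emon_monitor_data_with_options returns the read and write requests per index over a time range.
Indices matching '.kibana|.report|.monitoring|.apm-|.security-' are excluded, and the index with the most reads and the index with the most writes are kept.
Metrics: one for write operations and one for read operations.
"elasticsearch-server.bulk_total_operations", "elasticsearch-server.search_total"
Part of the request body: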

      request_body = """

              {{
                  "start":{pe_start_ts},
                  "queries":[
                      {{
                       
                          "metric":"{metric_key}",
                          "aggregator":"sum",
                          "downsample":"avg",
                          "tags":
                              {{
                                  "instanceId":"{instance_id}",
                                  "es_resourceUid":"1241148226163200",
                                  "index":"*"
                              }},
                              "granularity":"1m"
                      }}
                  ],
                  "limit":"",
                  "end":{pe_end_ts}
              }}
              """.format(
            pe_start_ts=pe_start_ts, pe_end_ts=pe_end_ts, instance_id=instance_id, metric_key=metric_key)
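The index filtering step described above (drop the built-in indices, keep the busiest remaining one) might look like the sketch below. The function name `busiest_user_index` and the input shape, a dict from index name to request count, are assumptions; the real response of get_emon_monitor_data_with_options would have to be parsed into that shape first. The pattern escapes the dots and anchors at the start of the name, which is slightly stricter than the literal pattern in the text.

```python
import re

# Built-in indices to ignore; dots are escaped so they match literally.
SYSTEM_INDEX = re.compile(r"^(\.kibana|\.report|\.monitoring|\.apm-|\.security-)")


def busiest_user_index(totals):
    """Return the non-system index with the most requests, or '' if none.

    `totals` maps index name -> request count for the queried time range.
    Assumption: an index with zero requests counts as not used.
    """
    user = {
        name: count
        for name, count in totals.items()
        if not SYSTEM_INDEX.match(name) and count > 0
    }
    return max(user, key=user.get) if user else ""
```

Called once with the search totals and once with the bulk totals, this would yield values playing the role of index_read_max_name and index_bulk_write_max_name.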

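Criteria:
index_read_max_name and index_bulk_write_max_name, the names of the most-read and most-written indices, are empty (the built-in ES indices '.kibana|.report|.monitoring|.apm-|.security-' have already been excluded).
max_cluster_index_qps_max and max_cluster_query_qps_max are not 0 even when the cluster is unused, because the cluster's own monitoring also accesses the built-in indices and generates requests.
These two metrics are included mainly as a fallback: some Alibaba Cloud clusters were created too long ago, and the API cannot return the
"elasticsearch-server.bulk_total_operations" and "elasticsearch-server.search_total" metrics for them.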

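a.max_cluster_index_qps_max < 20 (auxiliary condition)
and a.max_cluster_query_qps_max < 5 (auxiliary condition)
and a.index_read_max_name = '' (the most accurate signal, when the API can return this value)
and a.index_bulk_write_max_name = '' (the most accurate signal, when the API can return this value)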
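One way to combine these conditions in code is sketched below. `is_es_idle` is a hypothetical helper, and using `None` to mean "the API could not return index-level metrics" is an assumption added here; the original conditions simply compare the names with an empty string.

```python
def is_es_idle(max_index_qps, max_query_qps, read_index=None, write_index=None):
    """Judge an ES cluster from its three-day maxima.

    read_index / write_index are the busiest non-system index names:
    '' means no user index saw traffic, None means the value is unavailable
    (assumed here for older clusters without index-level metrics).
    """
    auxiliary = max_index_qps < 20 and max_query_qps < 5
    if read_index is None or write_index is None:
        # Fall back to the cluster-level QPS thresholds alone.
        return auxiliary
    return auxiliary and read_index == "" and write_index == ""
```

Keeping "unavailable" distinct from "empty" makes it possible to flag clusters that were judged only on the weaker cluster-level thresholds.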

Source: https://blog.csdn.net/qq_35640866/article/details/134803079