Orchestrator Source Code Walkthrough 3: Failure Handling Phase

2024-01-10 11:30:56

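This post continues from the previous one, "Orchestrator Source Code Walkthrough 2: Failure Detection", which explained how failures are discovered. Orchestrator (OC) collects status information from the databases it manages. It then runs a complex query against its own backend database, and several status values are already evaluated inside that SQL. The query results are stored in a struct, and the failure type is determined mainly from that struct's fields. Some failure types require recovery handling and some do not. The table below lists, for each category, the failure type, its handler function, the triggering condition in the source code, and whether the recovery is actionable.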
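To make "the counters are computed in SQL and stored in a struct" concrete, here is a minimal Go sketch under simplifying assumptions. In Orchestrator the per-master counters come from aggregate functions in the backend query and are scanned into a `ReplicationAnalysis` struct. The sketch reproduces that aggregation in memory instead. `ReplicaState`, `aggregate` and the `lagThreshold` parameter are hypothetical illustrations, not Orchestrator code, and the struct keeps only the fields used in the table below.

```go
package main

import "fmt"

// ReplicaState is a hypothetical, simplified view of one replica as observed
// in the latest discovery round (not an Orchestrator type).
type ReplicaState struct {
	LastCheckValid bool // OC reached the replica on its last check
	Replicating    bool // both replication threads are running
	LagSeconds     int  // observed replication lag
	SQLDelay       int  // configured intentional delay; > 0 means a delayed replica
}

// ReplicationAnalysis keeps only the counters used by the rules in the table;
// the real struct has many more fields.
type ReplicationAnalysis struct {
	IsMaster                      bool
	LastCheckValid                bool
	CountReplicas                 uint
	CountValidReplicas            uint
	CountValidReplicatingReplicas uint
	CountLaggingReplicas          uint
	CountDelayedReplicas          uint
}

// aggregate reproduces in memory the per-master counting that the backend
// SQL performs with aggregate functions. lagThreshold is a hypothetical
// parameter standing in for the "acceptable lag" setting.
func aggregate(isMaster, masterCheckValid bool, replicas []ReplicaState, lagThreshold int) ReplicationAnalysis {
	a := ReplicationAnalysis{IsMaster: isMaster, LastCheckValid: masterCheckValid}
	for _, r := range replicas {
		a.CountReplicas++
		if r.LastCheckValid {
			a.CountValidReplicas++
			// Only reachable replicas count as "valid replicating".
			if r.Replicating {
				a.CountValidReplicatingReplicas++
			}
		}
		if r.LagSeconds > lagThreshold {
			a.CountLaggingReplicas++
		}
		if r.SQLDelay > 0 {
			a.CountDelayedReplicas++
		}
	}
	return a
}

func main() {
	// Master unreachable; both replicas reachable but neither is replicating.
	a := aggregate(true, false, []ReplicaState{
		{LastCheckValid: true},
		{LastCheckValid: true},
	}, 10)
	fmt.Printf("%+v\n", a)
}
```

With an unreachable master and two alive replicas that are not replicating, the resulting counters satisfy the DeadMaster condition in the table.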

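Master

| Failure type | Handler function | Condition in source | Condition in plain words | Description | isActionableRecovery | If isInEmergencyOperationGracefulPeriod (handler, isActionableRecovery) |
|---|---|---|---|---|---|---|
| NoProblem | none | | Cluster is healthy | Healthy | FALSE | |
| DeadMasterWithoutReplicas | none | `a.IsMaster && !a.LastCheckValid && a.CountReplicas == 0` | Instance is a master AND its last check failed AND it has no replicas | Master is down and has no replicas | FALSE | |
| DeadMaster | `checkAndRecoverDeadMaster` | `a.IsMaster && !a.LastCheckValid && a.CountValidReplicas == a.CountReplicas && a.CountValidReplicatingReplicas == 0` | Instance is a master AND its last check failed AND all replicas are alive AND no replica is replicating normally | Master is down; all replicas are alive but replication is broken on every one of them | TRUE | `checkAndRecoverGenericProblem`, FALSE |
| DeadMasterAndReplicas | `checkAndRecoverGenericProblem` | `a.IsMaster && !a.LastCheckValid && a.CountReplicas > 0 && a.CountValidReplicas == 0 && a.CountValidReplicatingReplicas == 0` | Instance is a master AND its last check failed AND it has at least one replica AND no replica is alive AND no replica is replicating normally | Master and all of its replicas are down | FALSE | |
| DeadMasterAndSomeReplicas | `checkAndRecoverDeadMaster` | `a.IsMaster && !a.LastCheckValid && a.CountValidReplicas < a.CountReplicas && a.CountValidReplicas > 0 && a.CountValidReplicatingReplicas == 0` | Instance is a master AND its last check failed AND fewer replicas are alive than exist AND at least one replica is alive AND no replica is replicating normally | Master and some of its replicas are down | TRUE | `checkAndRecoverGenericProblem`, FALSE |
| UnreachableMasterWithLaggingReplicas | `checkAndRecoverGenericProblem` | `a.IsMaster && !a.LastCheckValid && a.CountLaggingReplicas == a.CountReplicas && a.CountDelayedReplicas < a.CountReplicas && a.CountValidReplicatingReplicas > 0` | Instance is a master AND its last check failed AND every replica is lagging AND not every replica is a configured delayed replica AND at least one replica is replicating normally | Master is unreachable and all replicas are lagging | FALSE | |
| UnreachableMaster | `checkAndRecoverGenericProblem` | `a.IsMaster && !a.LastCheckValid && !a.LastCheckPartialSuccess && a.CountValidReplicas > 0 && a.CountValidReplicatingReplicas > 0` | Instance is a master AND its last check failed AND the check was not even partially successful AND at least one replica is alive AND at least one replica is replicating normally | OC cannot connect to the master, but some replicas are still replicating normally | FALSE | |
| MasterSingleReplicaNotReplicating | | `a.IsMaster && a.LastCheckValid && a.CountReplicas == 1 && a.CountValidReplicas == a.CountReplicas && a.CountValidReplicatingReplicas == 0` | Instance is a master AND its last check succeeded AND it has exactly one replica AND that replica is alive AND it is not replicating normally | Master is healthy and has a single replica, but that replica's replication is broken | | |
| MasterSingleReplicaDead | | `a.IsMaster && a.LastCheckValid && a.CountReplicas == 1 && a.CountValidReplicas == 0` | Instance is a master AND its last check succeeded AND it has exactly one replica AND no replica is alive | Master is healthy and has a single replica, but that replica is down | | |
| AllMasterReplicasNotReplicating | `checkAndRecoverGenericProblem` | `a.IsMaster && a.LastCheckValid && a.CountReplicas > 1 && a.CountValidReplicas == a.CountReplicas && a.CountValidReplicatingReplicas == 0` | Instance is a master AND its last check succeeded AND it has more than one replica AND all replicas are alive AND no replica is replicating normally | Master is healthy, but replication is broken on all replicas | FALSE | |
| AllMasterReplicasNotReplicatingOrDead | `checkAndRecoverGenericProblem` | `a.IsMaster && a.LastCheckValid && a.CountReplicas > 1 && a.CountValidReplicas < a.CountReplicas && a.CountValidReplicas > 0 && a.CountValidReplicatingReplicas == 0` | Instance is a master AND its last check succeeded AND it has more than one replica AND some but not all replicas are alive AND no replica is replicating normally | Master is healthy, but every replica is either not replicating or down | FALSE | |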
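The master rows can be read as a single if/else-if chain that is evaluated top to bottom, where the first matching rule wins. The following is a minimal Go sketch that assumes only the fields shown in the table. `classifyMaster` is a hypothetical name, the struct is trimmed, and the rules for other roles are omitted. Order matters here. For example, a master with zero replicas also satisfies the DeadMaster condition (0 alive == 0 total, 0 replicating), so DeadMasterWithoutReplicas must be tested first.

```go
package main

import "fmt"

// AnalysisCode mirrors the failure-type names in the master table.
type AnalysisCode string

const (
	NoProblem                             AnalysisCode = "NoProblem"
	DeadMasterWithoutReplicas             AnalysisCode = "DeadMasterWithoutReplicas"
	DeadMaster                            AnalysisCode = "DeadMaster"
	DeadMasterAndReplicas                 AnalysisCode = "DeadMasterAndReplicas"
	DeadMasterAndSomeReplicas             AnalysisCode = "DeadMasterAndSomeReplicas"
	UnreachableMasterWithLaggingReplicas  AnalysisCode = "UnreachableMasterWithLaggingReplicas"
	UnreachableMaster                     AnalysisCode = "UnreachableMaster"
	MasterSingleReplicaNotReplicating     AnalysisCode = "MasterSingleReplicaNotReplicating"
	MasterSingleReplicaDead               AnalysisCode = "MasterSingleReplicaDead"
	AllMasterReplicasNotReplicating       AnalysisCode = "AllMasterReplicasNotReplicating"
	AllMasterReplicasNotReplicatingOrDead AnalysisCode = "AllMasterReplicasNotReplicatingOrDead"
)

// ReplicationAnalysis is trimmed to the fields the master rules read.
type ReplicationAnalysis struct {
	IsMaster                      bool
	LastCheckValid                bool
	LastCheckPartialSuccess       bool
	CountReplicas                 uint
	CountValidReplicas            uint
	CountValidReplicatingReplicas uint
	CountLaggingReplicas          uint
	CountDelayedReplicas          uint
}

// classifyMaster (hypothetical) applies the master rules in table order and
// returns the first match; no match falls back to NoProblem.
func classifyMaster(a ReplicationAnalysis) AnalysisCode {
	switch {
	case a.IsMaster && !a.LastCheckValid && a.CountReplicas == 0:
		return DeadMasterWithoutReplicas
	case a.IsMaster && !a.LastCheckValid && a.CountValidReplicas == a.CountReplicas && a.CountValidReplicatingReplicas == 0:
		return DeadMaster
	case a.IsMaster && !a.LastCheckValid && a.CountReplicas > 0 && a.CountValidReplicas == 0 && a.CountValidReplicatingReplicas == 0:
		return DeadMasterAndReplicas
	case a.IsMaster && !a.LastCheckValid && a.CountValidReplicas < a.CountReplicas && a.CountValidReplicas > 0 && a.CountValidReplicatingReplicas == 0:
		return DeadMasterAndSomeReplicas
	case a.IsMaster && !a.LastCheckValid && a.CountLaggingReplicas == a.CountReplicas && a.CountDelayedReplicas < a.CountReplicas && a.CountValidReplicatingReplicas > 0:
		return UnreachableMasterWithLaggingReplicas
	case a.IsMaster && !a.LastCheckValid && !a.LastCheckPartialSuccess && a.CountValidReplicas > 0 && a.CountValidReplicatingReplicas > 0:
		return UnreachableMaster
	case a.IsMaster && a.LastCheckValid && a.CountReplicas == 1 && a.CountValidReplicas == a.CountReplicas && a.CountValidReplicatingReplicas == 0:
		return MasterSingleReplicaNotReplicating
	case a.IsMaster && a.LastCheckValid && a.CountReplicas == 1 && a.CountValidReplicas == 0:
		return MasterSingleReplicaDead
	case a.IsMaster && a.LastCheckValid && a.CountReplicas > 1 && a.CountValidReplicas == a.CountReplicas && a.CountValidReplicatingReplicas == 0:
		return AllMasterReplicasNotReplicating
	case a.IsMaster && a.LastCheckValid && a.CountReplicas > 1 && a.CountValidReplicas < a.CountReplicas && a.CountValidReplicas > 0 && a.CountValidReplicatingReplicas == 0:
		return AllMasterReplicasNotReplicatingOrDead
	}
	return NoProblem
}

func main() {
	// Unreachable master with two alive replicas, neither replicating.
	a := ReplicationAnalysis{IsMaster: true, CountReplicas: 2, CountValidReplicas: 2}
	fmt.Println(classifyMaster(a))
}
```

For the sample in `main`, the function returns DeadMaster.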
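Semi-synchronous replication

| Failure type | Handler function | Condition in source | Condition in plain words | Description | isActionableRecovery | If isInEmergencyOperationGracefulPeriod (handler, isActionableRecovery) |
|---|---|---|---|---|---|---|
| LockedSemiSyncMasterHypothesis | | | | | | |
| LockedSemiSyncMaster | `checkAndRecoverLockedSemiSyncMaster` | `a.IsMaster && a.SemiSyncMasterEnabled && a.SemiSyncMasterStatus && a.SemiSyncMasterWaitForReplicaCount > 0 && a.SemiSyncMasterClients < a.SemiSyncMasterWaitForReplicaCount` | Instance is a master AND semi-sync is enabled AND semi-sync is active AND the required acknowledgement count is greater than 0 AND fewer semi-sync replicas are connected than required | Master is locked because semi-sync replication is not receiving acknowledgements from enough replicas | TRUE | `checkAndRecoverGenericProblem`, FALSE |
| MasterWithTooManySemiSyncReplicas | `checkAndRecoverMasterWithTooManySemiSyncReplicas` | `config.Config.EnforceExactSemiSyncReplicas && a.IsMaster && a.SemiSyncMasterEnabled && a.SemiSyncMasterStatus && a.SemiSyncMasterWaitForReplicaCount > 0 && a.SemiSyncMasterClients > a.SemiSyncMasterWaitForReplicaCount` | EnforceExactSemiSyncReplicas is configured AND instance is a master AND semi-sync is enabled AND semi-sync is active AND the required acknowledgement count is greater than 0 AND more semi-sync replicas are connected than required | More semi-sync replicas than configured | TRUE | |
| MasterWithoutReplicas | | | | | | |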
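The two semi-sync conditions share the same prefix and differ only in how connected semi-sync replicas compare with the required acknowledgement count. The following is a minimal Go sketch. It assumes that `SemiSyncMasterWaitForReplicaCount` and `SemiSyncMasterClients` reflect MySQL's `rpl_semi_sync_master_wait_for_slave_count` variable and `Rpl_semi_sync_master_clients` status respectively (an assumption, not confirmed by the source above). `Config`, `SemiSyncAnalysis` and the function names are hypothetical.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for the one setting the rule reads
// (config.Config.EnforceExactSemiSyncReplicas in the table).
type Config struct {
	EnforceExactSemiSyncReplicas bool
}

// SemiSyncAnalysis holds only the semi-sync fields used by the two rules.
type SemiSyncAnalysis struct {
	IsMaster                          bool
	SemiSyncMasterEnabled             bool // semi-sync enabled on the master
	SemiSyncMasterStatus              bool // semi-sync currently active
	SemiSyncMasterWaitForReplicaCount uint // acknowledgements the master waits for
	SemiSyncMasterClients             uint // connected semi-sync replicas
}

// semiSyncActive is the prefix shared by both conditions in the table.
func semiSyncActive(a SemiSyncAnalysis) bool {
	return a.IsMaster && a.SemiSyncMasterEnabled && a.SemiSyncMasterStatus &&
		a.SemiSyncMasterWaitForReplicaCount > 0
}

// isLockedSemiSyncMaster: fewer semi-sync replicas than required
// acknowledgements, so commits on the master can end up waiting.
func isLockedSemiSyncMaster(a SemiSyncAnalysis) bool {
	return semiSyncActive(a) && a.SemiSyncMasterClients < a.SemiSyncMasterWaitForReplicaCount
}

// isMasterWithTooManySemiSyncReplicas only fires when exact enforcement is on.
func isMasterWithTooManySemiSyncReplicas(cfg Config, a SemiSyncAnalysis) bool {
	return cfg.EnforceExactSemiSyncReplicas && semiSyncActive(a) &&
		a.SemiSyncMasterClients > a.SemiSyncMasterWaitForReplicaCount
}

func main() {
	a := SemiSyncAnalysis{IsMaster: true, SemiSyncMasterEnabled: true, SemiSyncMasterStatus: true,
		SemiSyncMasterWaitForReplicaCount: 2, SemiSyncMasterClients: 1}
	fmt.Println("locked:", isLockedSemiSyncMaster(a))
	a.SemiSyncMasterClients = 3
	fmt.Println("too many:", isMasterWithTooManySemiSyncReplicas(Config{EnforceExactSemiSyncReplicas: true}, a))
}
```

Factoring out the shared prefix makes it clear that the rules are mutually exclusive. A master cannot have fewer and more clients than required at the same time.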
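Co-master

| Failure type | Handler function | Description | isActionableRecovery |
|---|---|---|---|
| DeadCoMaster | `checkAndRecoverDeadCoMaster` | OC cannot reach the co-master AND none of its replicas is replicating normally | TRUE |
| DeadCoMasterAndSomeReplicas | `checkAndRecoverDeadCoMaster` | | TRUE |
| UnreachableCoMaster | | | |
| AllCoMasterReplicasNotReplicating | | | |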
Intermediate master (the intermediate master in cascading replication)

| Failure type | Handler function | isActionableRecovery |
|---|---|---|
| DeadIntermediateMaster | `checkAndRecoverDeadIntermediateMaster` | TRUE |
| DeadIntermediateMasterWithSingleReplica | `checkAndRecoverDeadIntermediateMaster` | TRUE |
| DeadIntermediateMasterWithSingleReplicaFailingToConnect | `checkAndRecoverDeadIntermediateMaster` | TRUE |
| DeadIntermediateMasterAndSomeReplicas | `checkAndRecoverDeadIntermediateMaster` | TRUE |
| DeadIntermediateMasterAndReplicas | `checkAndRecoverGenericProblem` | FALSE |
| UnreachableIntermediateMasterWithLaggingReplicas | `checkAndRecoverGenericProblem` | FALSE |
| UnreachableIntermediateMaster | | |
| AllIntermediateMasterReplicasFailingToConnectOrDead | `checkAndRecoverDeadIntermediateMaster` | |
| AllIntermediateMasterReplicasNotReplicating | | |
| FirstTierReplicaFailingToConnectToMaster | | |
| BinlogServerFailingToConnectToMaster | | |
Group replication

| Failure type | Handler function | isActionableRecovery |
|---|---|---|
| DeadReplicationGroupMemberWithReplicas | `checkAndRecoverDeadGroupMemberWithReplicas` | TRUE |

Source: https://blog.csdn.net/weixin_48154829/article/details/135490192