Daily Database Case 20231219 - Analyzing and Handling ORA-27301/ORA-27300 on a 19c Cluster
1 Problem Overview
Today's routine database check turned up the following errors:
Errors in file /u01/app/oracle/diag/rdbms/orcldb/orcldb1/incident/incdir_4510778/orcldb1_m004_322434_i4510778.trc:
ORA-00313: open failed for members of log group 6 of thread 1
ORA-00312: online log 6 thread 1: '+REDO/orcldb/ONLINELOG/group_6.266.1070296359'
ORA-17503: ksfdopn:2 Failed to open file +REDO/orcldb/ONLINELOG/group_6.266.1070296359
ORA-15064: communication failure with ASM instance
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
2023-12-19T03:03:32.788217+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcldb/orcldb1/incident/incdir_4510778/orcldb1_m004_322434_i4510778.trc:
ORA-00313: open failed for members of log group 7 of thread 1
ORA-00312: online log 7 thread 1: '+DGDATA4/orcldb/ONLINELOG/group_7.262.1066409025'
ORA-17503: ksfdopn:2 Failed to open file +DGDATA4/orcldb/ONLINELOG/group_7.262.1066409025
ORA-15064: communication failure with ASM instance
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
2023-12-19T03:03:42.795004+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcldb/orcldb1/incident/incdir_4510778/orcldb1_m004_322434_i4510778.trc:
ORA-00313: open failed for members of log group 8 of thread 2
ORA-00312: online log 8 thread 2: '+REDO/orcldb/ONLINELOG/group_8.267.1070296539'
ORA-17503: ksfdopn:2 Failed to open file +REDO/orcldb/ONLINELOG/group_8.267.1070296539
ORA-15064: communication failure with ASM instance
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
2023-12-19T03:03:52.801743+08:00
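The incidents above repeat every few seconds, so a quick count of ORA-27301 hits gives a sense of scope. The snippet below is a minimal sketch that runs against an inline sample so it is self-contained; in practice, point grep at the instance alert log (on this host that would be under /u01/app/oracle/diag/rdbms/orcldb/orcldb1/trace/ — the exact file name is an assumption, verify it on your system).

```shell
# Count ORA-27301 occurrences. The here-document stands in for the real
# alert log; replace it with the actual alert log file on the database host.
SAMPLE=$(cat <<'EOF'
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
ORA-27301: OS failure message: No buffer space available
EOF
)
HITS=$(printf '%s\n' "$SAMPLE" | grep -c 'ORA-27301')
echo "ORA-27301 occurrences: ${HITS}"
```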
2 Host NUMA and MTU Information
[root@orcl01 ~]# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
node 0 size: 195001 MB
node 0 free: 1759 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
node 1 size: 196608 MB
node 1 free: 629 MB
node 2 cpus: 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139
node 2 size: 131072 MB
node 2 free: 613 MB
node 3 cpus: 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
node 3 size: 131072 MB
node 3 free: 465 MB
node distances:
node   0   1   2   3
  0:  10  21  21  21
  1:  21  10  21  21
  2:  21  21  10  21
  3:  21  21  21  10
[root@orcl01 ~]# ip ad sh
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c4:b8:b4:2e:b6:f0 brd ff:ff:ff:ff:ff:ff
3: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 34:0a:98:9c:ed:37 brd ff:ff:ff:ff:ff:ff
    inet 10.2.0.101/24 brd 10.2.0.255 scope global ens3f0
       valid_lft forever preferred_lft forever
    inet 169.254.22.17/19 brd 169.254.31.255 scope global ens3f0:1
       valid_lft forever preferred_lft forever
    inet6 fe80::360a:98ff:fe9c:ed37/64 scope link
       valid_lft forever preferred_lft forever
4: ens3f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 34:0a:98:9c:ed:38 brd ff:ff:ff:ff:ff:ff
5: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether c4:b8:b4:2e:b6:f1 brd ff:ff:ff:ff:ff:ff
    inet 10.228.224.4/26 brd 10.228.224.63 scope global eno3
       valid_lft forever preferred_lft forever
    inet 10.228.224.10/26 brd 10.228.224.63 scope global secondary eno3:1
       valid_lft forever preferred_lft forever
    inet 10.228.224.12/26 brd 10.228.224.63 scope global secondary eno3:2
       valid_lft forever preferred_lft forever
    inet6 2409:8002:5a06:120:10:0:2:d00c/116 scope global deprecated
       valid_lft forever preferred_lft 0sec
    inet6 2409:8002:5a06:120:10:0:2:d00a/116 scope global deprecated
       valid_lft forever preferred_lft 0sec
    inet6 2409:8002:5a06:120:10:0:2:d004/116 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::c6b8:b4ff:fe2e:b6f1/64 scope link
       valid_lft forever preferred_lft forever
6: eno1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c4:b8:b4:2e:b6:f2 brd ff:ff:ff:ff:ff:ff
7: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c4:b8:b4:2e:b6:f3 brd ff:ff:ff:ff:ff:ff
8: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:76:94:f3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
9: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:76:94:f3 brd ff:ff:ff:ff:ff:ff
[root@orcl01 ~]#
3 Analysis and Resolution
I have run into IPC problems like this many times before. On today's hosts with large physical memory, the kernel reserve that keeps defragmented pages available for network buffers is too small, and the MTU of the loopback interface is also wrong. Following the instructions in the official MOS note fixes it:
On servers with High Physical Memory, the parameter vm.min_free_kbytes should be set in the order of 0.4% of total Physical Memory. This helps in keeping a larger range of defragmented memory pages available for network buffers reducing the probability of a low-buffer-space conditions.
*** For example, on a server which is having 256GB RAM, the parameter vm.min_free_kbytes should be set to 1073742 ***
On NUMA Enabled Systems, the value of vm.min_free_kbytes should be multiplied by the number of NUMA nodes since the value is to be split across all the nodes.
On NUMA Enabled Systems, the value of vm.min_free_kbytes = n * 0.4% of total Physical Memory. Here ‘n’ is the number of NUMA nodes.
Additionally, the MTU value should be modified as below
#ifconfig lo mtu 16436
To make the change persistent over reboot add the following line in the file /etc/sysconfig/network-scripts/ifcfg-lo :
MTU=16436
Save the file and restart the network service to load the changes
#service network restart
Note: While making the changes on CRS nodes, if the network is restarted while CRS is up, it can hang CRS. So cluster services should be stopped prior to the network restart.
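Applying the formula from the note to this host (four NUMA nodes, node sizes taken from the numactl output in section 2) gives a concrete target value. A minimal sketch of the arithmetic:

```shell
# Derive the recommended vm.min_free_kbytes per the MOS formula
# (n * 0.4% of total physical RAM). The node sizes below come from
# the numactl --hardware output above; adjust for your own host.
NODES=4
TOTAL_MB=$((195001 + 196608 + 131072 + 131072))   # sum of the four node sizes
# 0.4% of total RAM, expressed in KB, multiplied by the NUMA node count
REC_KB=$(awk -v mb="$TOTAL_MB" -v n="$NODES" 'BEGIN { printf "%d", mb * 1024 * 0.004 * n }')
echo "total RAM: ${TOTAL_MB} MB, recommended vm.min_free_kbytes: ${REC_KB}"
```

For this ~638 GB, four-node box the formula works out to roughly 10.7 million KB, well above the kernel default.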
- Database alert log captures below:
mtype: 61 process 14152 failed because of a resource problem in the OS. The OS has most likely run out of buffers (rval: 4)
Errors in file /oracle/admin/diag/rdbms/<DB>/<SID>/trace/<SID>_ora_14152.trc (incident=249601):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
Incident details in: /oracle/admin/diag/rdbms/<DB>/<SID>/incident/incdir_249601/<SID>_ora_14152_i249601.trc
- Network communication issue and High Memory consumption would be observed during this time.
- Server is running with UEK3 kernel
This happens due to less space available for network buffer reservation.
A search of Oracle MOS (Metalink) confirms that the host parameters need to be adjusted and tuned as described above.
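Putting the MOS steps together, the change procedure on each node looks roughly like the following. This is a command sketch, not a tested script: run it as root with cluster services down, and note that the vm.min_free_kbytes value (4 * 0.4% of this host's ~638 GB, per the formula in the note) and the file paths must be verified against your own environment.

```shell
# Sketch of the change procedure from the MOS note for this 4-node host.
crsctl stop crs                                     # stop clusterware first

# 1) Kernel reserve for network buffers: n * 0.4% of total RAM
sysctl -w vm.min_free_kbytes=10711089
echo 'vm.min_free_kbytes = 10711089' >> /etc/sysctl.conf          # persist

# 2) Loopback MTU per the note
ifconfig lo mtu 16436
echo 'MTU=16436' >> /etc/sysconfig/network-scripts/ifcfg-lo       # persist

service network restart                             # load the MTU change
crsctl start crs                                    # bring the cluster back
```

Afterwards, `sysctl vm.min_free_kbytes` and `ip link show lo` can be used to confirm the new values took effect.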