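Oracle 11g RAC Cluster Troubleshooting
Symptoms:
The Oracle cluster no longer seems to fail over its IPs: all sessions are concentrated on a single node, and when that node goes down, clients are not automatically reconnected to the other node.
Investigation:
Run the cluster resource status check on node rac01 (note the entries whose STATE is OFFLINE although their TARGET is ONLINE):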
grid@rac01:[/home/grid]crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARDATA.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.CLDATA.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.LISTENER.lsnr
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.USDATA.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.USDATA01.dg
ONLINE OFFLINE rac01
ONLINE ONLINE rac02
ora.asm
ONLINE ONLINE rac01 Started
ONLINE ONLINE rac02 Started
ora.gsd
OFFLINE OFFLINE rac01
OFFLINE OFFLINE rac02
ora.net1.network
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.ons
ONLINE ONLINE rac01
ONLINE ONLINE rac02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac02
ora.cvu
1 ONLINE ONLINE rac02
ora.oc4j
1 ONLINE ONLINE rac02
ora.rac01.vip
1 ONLINE ONLINE rac01
ora.rac02.vip
1 ONLINE ONLINE rac02
ora.racdb.db
1 ONLINE OFFLINE Instance Shutdown
2 ONLINE ONLINE rac02 Open
ora.scan1.vip
1 ONLINE ONLINE rac02
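In a long status listing the few abnormal entries are easy to miss. The following is a minimal sketch, not an Oracle tool, that scans saved `crsctl stat res -t` text for entries whose TARGET is ONLINE but whose STATE is OFFLINE. The function name `find_offline` and the whitespace-based parsing are assumptions made for illustration; the sample text is an excerpt of the output above.

```python
# Sketch: flag resources that should be ONLINE but are OFFLINE.
# Assumes the text layout shown above: a resource name line ("ora....")
# followed by one status line per server or per instance.

SAMPLE = """\
ora.USDATA.dg
ONLINE ONLINE rac01
ONLINE ONLINE rac02
ora.USDATA01.dg
ONLINE OFFLINE rac01
ONLINE ONLINE rac02
ora.gsd
OFFLINE OFFLINE rac01
OFFLINE OFFLINE rac02
ora.racdb.db
1 ONLINE OFFLINE Instance Shutdown
2 ONLINE ONLINE rac02 Open
"""


def find_offline(crs_output):
    """Return (resource, instance, remainder) for TARGET=ONLINE, STATE=OFFLINE."""
    problems = []
    current = None
    for raw in crs_output.splitlines():
        line = raw.strip()
        if not line or line.startswith("-"):
            continue  # blank lines and separator rules
        if line.startswith("ora."):
            current = line  # a new resource block starts here
            continue
        fields = line.split()
        instance = None
        # Cluster resources carry a leading instance number; peel it off.
        if fields[0].isdigit():
            instance, fields = fields[0], fields[1:]
        if current and fields[:2] == ["ONLINE", "OFFLINE"]:
            # The remainder is the server name and/or the state details.
            problems.append((current, instance, " ".join(fields[2:])))
    return problems


if __name__ == "__main__":
    for resource, instance, remainder in find_offline(SAMPLE):
        print(resource, instance, remainder)
```

Resources such as ora.gsd, whose TARGET is itself OFFLINE, are deliberately not reported, since they are not expected to be running.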
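The status output suggests that the USDATA01 disk group is not working on rac01: ora.USDATA01.dg is OFFLINE on that node, and instance 1 of ora.racdb.db is shut down. An attempt to start the USDATA01 disk group still fails: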
grid@rac01:[/home/grid]srvctl start diskgroup -g USDATA01
PRCR-1079 : Failed to start resource ora.USDATA01.dg
CRS-5017: The resource action "ora.USDATA01.dg start" encountered the following error:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "USDATA01" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "USDATA01"
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/rac01/agent/crsd/oraagent_grid/oraagent_grid.log".
CRS-2674: Start of 'ora.USDATA01.dg' on 'rac01' failed
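The error stack mixes several message families (PRCR, CRS, ORA). As a small illustration, the sketch below pulls the codes out of the saved message text and picks the last ORA- entry, which in this particular stack (ORA-15063) is the most specific one. The helper names `error_codes` and `last_ora_code` are hypothetical, and treating the last ORA- line as the most specific is an assumption that holds for this stack rather than a general rule.

```python
import re

# Sketch: extract Oracle message codes from a saved error stack.
ERROR_TEXT = """\
PRCR-1079 : Failed to start resource ora.USDATA01.dg
CRS-5017: The resource action "ora.USDATA01.dg start" encountered the following error:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "USDATA01" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "USDATA01"
CRS-2674: Start of 'ora.USDATA01.dg' on 'rac01' failed
"""

CODE_RE = re.compile(r"\b(?:ORA|CRS|PRCR)-\d+\b")


def error_codes(text):
    """Return all PRCR-/CRS-/ORA- codes in order of appearance."""
    return CODE_RE.findall(text)


def last_ora_code(text):
    """Return the last ORA- code in the text, or None if there is none."""
    ora = [code for code in error_codes(text) if code.startswith("ORA-")]
    return ora[-1] if ora else None


if __name__ == "__main__":
    print(error_codes(ERROR_TEXT))
    print(last_ora_code(ERROR_TEXT))
```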
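Listing the ASM disk groups on rac01 shows that USDATA01 is missing: ASM on this node cannot recognize the disk group, which is consistent with the ORA-15063 error above. Node rac01 is in an abnormal state because it cannot recognize this disk group.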
grid@rac01:[/home/grid]asmcmd
ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 666603 332494 0 332494 0 N ARDATA/
MOUNTED EXTERN N 512 4096 1048576 51207 50811 0 50811 0 Y CLDATA/
MOUNTED EXTERN N 512 4096 1048576 512009 124302 0 124302 0 N USDATA/
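The comparison made above by eye, disk groups registered in the cluster versus disk groups that `lsdg` reports as mounted, can be sketched as follows. This is an illustrative parser over saved command output, not an Oracle utility; the function names are hypothetical, and it assumes that disk group resources are named `ora.<NAME>.dg` and that the disk group name is the last column of `lsdg`, as in the listings in this article.

```python
import re

# Sketch: find disk groups registered in CRS but not mounted on this node.
CRS_SAMPLE = """\
ora.ARDATA.dg
ora.CLDATA.dg
ora.LISTENER.lsnr
ora.USDATA.dg
ora.USDATA01.dg
"""

LSDG_SAMPLE = """\
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 666603 332494 0 332494 0 N ARDATA/
MOUNTED EXTERN N 512 4096 1048576 51207 50811 0 50811 0 Y CLDATA/
MOUNTED EXTERN N 512 4096 1048576 512009 124302 0 124302 0 N USDATA/
"""

DG_RESOURCE_RE = re.compile(r"^ora\.(\w+)\.dg$")


def registered_diskgroups(crs_output):
    """Names taken from resource lines of the form ora.<NAME>.dg."""
    names = set()
    for line in crs_output.splitlines():
        match = DG_RESOURCE_RE.match(line.strip())
        if match:
            names.add(match.group(1))
    return names


def mounted_diskgroups(lsdg_output):
    """Names from lsdg rows whose State column is MOUNTED."""
    names = set()
    for line in lsdg_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "MOUNTED":
            names.add(fields[-1].rstrip("/"))  # "USDATA/" -> "USDATA"
    return names


def missing_diskgroups(crs_output, lsdg_output):
    """Registered disk groups that lsdg does not show as mounted."""
    return sorted(registered_diskgroups(crs_output) - mounted_diskgroups(lsdg_output))


if __name__ == "__main__":
    print(missing_diskgroups(CRS_SAMPLE, LSDG_SAMPLE))
```

Run against the listings from rac01, the only registered disk group without a mounted counterpart is USDATA01.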