从alert日志看Oracle 11g DataGuard日志传输

Oracel DG技术本身,是借助redo log的传递和应用,确保在standby端数据和primary端保持一致数据。在这个过程中,Redo Transport和Redo Apply是两个核心动作。Redo Transport是将Redo Log信息传递到Standby端,等待进行Apply。而Redo Apply就是将这些日志应用执行,更改Standby端的数据,来实现一致。
 

下面实验,就是利用alert log来观察一对Primary和Standby在启动过程、工作过程中传递日志的情况。从而证明Oracle DG的工作特点和机制。

相关参考:

1、环境介绍

我们在Oracle 11g上进行试验,版本为11.2.0.4。由于环境限制,笔者Primary和Physical Standby在相同服务器上。Primary实例名称为ora11g,Standby实例名为ora11gsy。
 
监听程序首先关闭,来查看数据库行为。

[oracle@SimpleLinux ~]$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 27-APR-2014 13:40:15

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

 

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=SimpleLinux)(PORT=1521)))
 
TNS-12541: TNS:no listener

 TNS-12560: TNS:protocol adapter error

  TNS-00511: No listener

  Linux Error: 111: Connection refused

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521)))

TNS-12541: TNS:no listener

 TNS-12560: TNS:protocol adapter error

  TNS-00511: No listener

  Linux Error: 111: Connection refused

 

主库Primary日志。

 

[root@SimpleLinux ~]# su - oracle

[oracle@SimpleLinux ~]$ cd /u01/app/diag/rdbms/ora11g/ora11g/trace/

[oracle@SimpleLinux trace]$ ls -l | grep alert

-rw-r-----. 1 oracle oinstall 176813 Apr 21 21:58 alert_ora11g.log

 

2、Primary端启动过程

 

首先启动数据库到nomount状态,此时pmon是进行工作的。

 

[oracle@SimpleLinux ~]$ env | grep ORACLE_SID

ORACLE_SID=ora11g

[oracle@SimpleLinux ~]$ sqlplus /nolog

 

SQL*Plus: Release 11.2.0.4.0 Production on Sun Apr 27 13:54:15 2014

 

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

 

SQL> conn / as sysdba

Connected to an idle instance.

SQL> startup nomount

ORACLE instance started.

 

Total System Global Area  372449280 bytes

Fixed Size                  1364732 bytes

Variable Size            331353348 bytes

Database Buffers          33554432 bytes

Redo Buffers                6176768 bytes

 

这个阶段日志是没有什么额外特殊的信息的,只有正常的后台实例启动。

 

Sun Apr 27 13:54:58 2014

Starting ORACLE instance (normal)

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Initial number of CPU is 1

CELL communication is configured to use 0 interface(s):

CELL IP affinity details:

    NUMA status: non-NUMA system

    cellaffinity.ora status: N/A

CELL communication will use 1 IP group(s):

    Grp 0:

(篇幅原因,有省略……)

starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
 
Sun Apr 27 13:55:07 2014

MMNL started with pid=16, OS id=1776

starting up 1 shared server(s) ...

ORACLE_BASE from environment = /u01/app

 

切换到mount状态。

 

SQL> alter database mount;

Database altered.

 

日志中,定位到mount状态。

 

Sun Apr 27 14:03:18 2014

alter database mount

Sun Apr 27 14:03:23 2014

Successful mount of redo thread 1, with mount id 4242195174

Database mounted in Exclusive Mode

Lost write protection disabled

Completed: alter database mount

 

之后启动数据库,到open状态。在mount之前,数据库是不会生成和执行redo相关的动作的。从mount到open阶段,是需要进行一个instance recovery过程的,也就是日志前滚后滚的动作。在mount和mount之前,是不会有Redo Transport过程的。
 
 

Sun Apr 27 14:24:56 2014

alter database open

Beginning crash recovery of 1 threads

Started redo scan

Completed redo scan

 read 78 KB redo, 26 data blocks need recovery

Started redo application at

 Thread 1: logseq 32, block 47

Recovery of Online Redo Log: Thread 1 Group 1 Seq 32 Reading mem 0

  Mem# 0: /u01/app/oradata/ORA11G/onlinelog/o1_mf_1_9mnjwtj9_.log

  Mem# 1: /u01/app/fast_recovery_area/ORA11G/onlinelog/o1_mf_1_9mnjwvdm_.log

Completed redo application of 0.02MB

Completed crash recovery at

 Thread 1: logseq 32, block 203, scn 815633

 26 data blocks read, 26 data blocks written, 78 redo k-bytes read

Sun Apr 27 14:24:58 2014

LGWR: STARTING ARCH PROCESSES

Sun Apr 27 14:24:58 2014

 

Fatal NI connect error 12541, connecting to:

 (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.137.88)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=ora11gsy)(CID=(PROGRAM=oracle)(HOST=SimpleLinux.localdomain)(USER=oracle))))
 
 

  VERSION INFORMATION:

        TNS for Linux: Version 11.2.0.4.0 - Production

        TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production

  Time: 27-APR-2014 14:25:03

  Tracing not turned on.

  Tns error struct:

    ns main err code: 12541

 

TNS-12541: TNS:no listener

    ns secondary err code: 12560

    nt main err code: 511

 

TNS-00511: No listener

    nt secondary err code: 111

    nt OS err code: 0

 

 

Check whether the listener is up and running.

FAL[server, ARC3]: Error 12541 creating remote archivelog file 'ora11gsy'

FAL[server, ARC3]: FAL archive failed, see trace file.

ARCH: FAL archive failed. Archiver continuing

ORACLE Instance ora11g - Archival Error. Archiver continuing.

Starting background process QMNC

Sun Apr 27 14:25:11 2014

QMNC started with pid=25, OS id=2129

Sun Apr 27 14:25:25 2014

Completed: alter database open

Sun Apr 27 14:25:28 2014

Starting background process CJQ0

Sun Apr 27 14:25:28 2014

CJQ0 started with pid=28, OS id=2149

Setting Resource Manager plan SCHEDULER[0x32DF]:DEFAULT_MAINTENANCE_PLAN via scheduler window
 
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter

Sun Apr 27 14:25:34 2014

Starting background process VKRM

Sun Apr 27 14:25:34 2014

VKRM started with pid=29, OS id=2151

 

Open过程完成,但是数据库日志不断报错。主要体现在两个方面,一个是监听器故障,Primary在传递日志的时候,archive_log_dest配置两种类型,location和service。其中service就存放对应服务名称。这个服务名称是借助Oracle NET实现,Oracle NET外表体现就是监听器。另一种类型错误就是心跳机制,Oracle Primary在不断的访问standby。
 
 

在之后过程,日志中不断报错。说明这个过程是一个周期性确认的过程。

 

(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.137.88)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=ora11gsy)(CID=(PROGRAM=oracle)(HOST=SimpleLinux.localdomain)(USER=oracle))))
 
 

  VERSION INFORMATION:

        TNS for Linux: Version 11.2.0.4.0 - Production

        TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production

  Time: 27-APR-2014 14:30:00

  Tracing not turned on.

  Tns error struct:

    ns main err code: 12541

 

TNS-12541: TNS:no listener

    ns secondary err code: 12560

    nt main err code: 511

 

TNS-00511: No listener

    nt secondary err code: 111

    nt OS err code: 0

Error 12541 received logging on to the standby

Check whether the listener is up and running.

PING[ARC1]: Heartbeat failed to connect to standby 'ora11gsy'. Error is 12541.

 

心跳12504错误,信息为监听器没有启动。

 

[oracle@SimpleLinux ~]$ oerr ora 12504

12504, 00000, "TNS:listener was not given the SERVICE_NAME in CONNECT_DATA"

// *Cause:  The listener was not configured with a default service and

//  SERVICE_NAME was missing from the CONNECT_DATA received by the listener.

// *Action: Possible solutions are:

//  - Configure DEFAULT_SERVICE parameter in LISTENER.ORA with a valid service

//    name. Reload the listener parameter file using reload [<listener_name>].

//    This should be done by the LISTENER administrator.

//  - If using a service name, Check that the connect descriptor corresponding

//    to the service name in TNSNAMES.ORA has a SERVICE_NAME or SID component

//    in the CONNECT_DATA.

相关推荐