DG延迟排查思路-摩杜云开发者社区

问题概述

近期在生产环境进行dg切换演练，需要active备库，演练完用闪回点切换回dg。切换后大量数据库产生应用延迟。

1.检查dg状态，显示各进程正常，但MRP0进程的SEQUENCE#和BLOCK#无变化，说明无日志应用。

rfs进程能正常接收日志。

SQL> select process,status,sequence#,block# from v$managed_standby;

PROCESS   STATUS        SEQUENCE#     BLOCK#
--------- ------------ ---------- ----------
ARCH      CLOSING              55      92160
ARCH      CLOSING              54      86016
ARCH      CONNECTED             0          0
ARCH      CLOSING              53      77824
RFS       IDLE                  0          0
RFS       IDLE                  0          0
RFS       IDLE                 56       9990
MRP0      APPLYING_LOG         56       6097

8 rows selected.

SQL>
SQL>
SQL> /

PROCESS   STATUS        SEQUENCE#     BLOCK#
--------- ------------ ---------- ----------
ARCH      CLOSING              55      92160
ARCH      CLOSING              54      86016
ARCH      CONNECTED             0          0
ARCH      CLOSING              53      77824
RFS       IDLE                  0          0
RFS       IDLE                  0          0
RFS       IDLE                 56       9995
MRP0      APPLYING_LOG         56       6097

8 rows selected.

2.检查v$dataguard_stats视图。同样，发现日志接收正常，但应用有延迟。

SQL>  select * from v$dataguard_stats;

NAME                             VALUE                     UNIT                           TIME_COMPUTED                  DATUM_TIME
-------------------------------- ------------------------- ------------------------------ ------------------------------ ------------------------------
transport lag                    +00 00:00:00              day(2) to second(0) interval   04/16/2022 12:52:36            04/16/2022 12:52:35
apply lag                        +00 01:48:29              day(2) to second(0) interval   04/16/2022 12:52:36            04/16/2022 12:52:35
apply finish time                                          day(2) to second(3) interval   04/16/2022 12:52:36
estimated startup time           83                        second                         04/16/2022 12:52:36

3.排查主库，指向备库的归档路径正常。

SQL> select dest_id,DESTINATION,error from v$archive_dest where dest_id=2;

   DEST_ID DESTINATION                              ERROR
---------- ---------------------------------------- ----------------------------------------------------------------------------------------------------
         2 xwjdgt

4.这说明dg在接收日志上正常，卡在日志应用上。

检查mrp进程等待事件。

SQL>  select addr,PID,spid,pname from v$process where upper(pname) like upper('%mrp%');

ADDR                    PID SPID                     PNAME
---------------- ---------- ------------------------ ----------------------------------------
000000006DC42A30         37 14854                    MRP0

SQL> col event for a50
SQL> select event,blocking_session from v$session where paddr='000000006DC42A30';

EVENT                                              BLOCKING_SESSION
-------------------------------------------------- ----------------
flashback log file sync

从mrp等待事件看，进程在等待flashback log file sync

5.同时，在alert日志中也可以看到大量关于闪回日志的等待

*************************************************************
Unable to allocate flashback log of 1600 blocks from
current recovery area of size 73400320 bytes.
Recovery Writer (RVWR) is stuck until more space
is available in the recovery area.
Unable to write Flashback database log data because the
recovery area is full, presence of a guaranteed
restore point and no reusable flashback logs.
Sat Apr 16 11:52:58 2022
*************************************************************
Unable to allocate flashback log of 1600 blocks from
current recovery area of size 73400320 bytes.
Recovery Writer (RVWR) is stuck until more space
is available in the recovery area.
Unable to write Flashback database log data because the
recovery area is full, presence of a guaranteed
restore point and no reusable flashback logs.
Sat Apr 16 11:53:58 2022
*************************************************************
Unable to allocate flashback log of 1600 blocks from
current recovery area of size 73400320 bytes.
Recovery Writer (RVWR) is stuck until more space
is available in the recovery area.
Unable to write Flashback database log data because the
recovery area is full, presence of a guaranteed
restore point and no reusable flashback logs.

从mrp等待事件和alert日志中可以看到，因为快速闪回区满了，导致闪回日志无法切换，mrp不能应用日志。

检查v$restore_point，因为创建了guaranteed point，数据库要保证这个闪回点可闪回。

SQL>  select * from v$restore_point;

       SCN DATABASE_INCARNATION# GUA STORAGE_SIZE TIME                                     RESTORE_POINT_TIME   PRE NAME
---------- --------------------- --- ------------ ---------------------------------------- -------------------- --- --------------------
   1319954                     2 YES     65536000 16-APR-22 10.27.52.000000000 AM                               YES TEST

问题原因

因为存在一个guaranteed，闪回区不能覆盖，导致mrp无法继续应用。

关于闪回点的描述见参考文档部分。

解决方案

1.删除闪回点。

drop restore point test;

2.增加闪回区大小

alter system set db_recover_file_dest_size=100G;

参考文档

https://docs.oracle.com/database/121/SQLRF/statements_6013.htm#SQLRF20001