记enq: HW - contention等待事件与表空间自动扩展引发的故障

一、故障现象

在15:30-16:00 检测发现大量enq: HW - contention和enq: TX - row lock contention等待事件,并且这两事件越来越多,最终导致短暂的业务宕机。后面检测发现uses表空间正在使用自动扩展,于是增加users表空间,问题解决

二、故障分析

1、查看15:30-16:00时间的snap_id,用于定位这个时间段内的所有session操作
                
                select * from dba_hist_snapshot where begin_interval_time>to_date('2019-12-03','yyyy-mm-dd')
                
                        
2、查看这个时间段主要的等待事件
                
                select 
                   to_char(sample_time, 'YYYY-MM-DD HH24:MI'),event,
                   count(*)
                from dba_hist_active_sess_history
                where snap_id in (116604, 116605)
                and sample_time>=to_date('2019-12-03 15:10','yyyy-mm-dd hh24:mi')
                group by  to_char(sample_time, 'YYYY-MM-DD HH24:MI'),event
                order by 1,3;
                
                        

从上面分析可以看出等待事件为enq: HW – contention,enq: TX - row lock contention

3、根据上面的等待事件查看哪些语句执行产生的(526awsta39934)
                
                select t.instance_number,
                    t.sample_time,
                    t.session_id,
                    t.session_serial#,
                    t.sql_id,
                    t.event,
                    t.blocking_session,
                    t.blocking_inst_id,
                    t.program,
                    t.machine
                from dba_hist_active_sess_history t
                where snap_id in (116604, 116605)
                and to_char(sample_time,'yyyy-mm-dd hh24:mi')='2019-12-03 15:39'
                and event in ('enq: TX - row lock contention', 'enq: HW - contention')
                order by 2, 1;
                
                        
4、根据上面的sql_id为‘526awsta39934’引起的‘enq: HW - contention’高水位线争用事件,查看出语句为
                
                    INSERT INTO TRANSACTION_LOG VALUES (:B1 ,:B2 ,:B3 ,:B4 ,:B5 ,:B6 ,:B7,:B8 ,:B9 ,:B10 ,:B11 ,:B12 ,:B13 ,:B14 ,:B15 )
                
                        
5、在根据步骤3,可以查出‘enq: TX - row lock contention’行锁的阻塞会话id

根据上面阻塞的会话id可以,查找正好是sql_id’ 526awsta39934’阻塞

                
                select t.instance_number,
                    t.sample_time,
                    t.session_id,
                    t.session_serial#,
                    t.sql_id,
                    t.event,
                    t.blocking_session,
                    t.blocking_inst_id,
                    t.program,
                    t.machine
                from dba_hist_active_sess_history t
                where snap_id in (116604, 116605)
                and /*sample_time >= to_date('2019-12-03 15:10', 'yyyy-mm-dd hh24:mi')
                and */to_char(sample_time,'yyyy-mm-dd hh24:mi')='2019-12-03 15:39'
                and t.session_id in (2531,3595,2128,3208,3555,5598,168)
                and t.instance_number=2
                -- and event in ('enq: TX - row lock contention', 'enq: HW - contention')
                order by 2, 1;
                
                        
6、总结

由上面可以分析当大量高并发语句‘526awsta39934’,往数据库中插入数据会去查看数据块高水位线下的可用块,正好此时表空间users处于自动扩展,同时其他的会话也在插入数据或修改数据,也需要高水位线下的可用块。最终引起了行锁。

ps:关于oracle高水位线参考

https://www.cnblogs.com/linjiqin/archive/2012/01/15/2323030.html