Monthly Archives: March 2011

Simple explanation of Wait Event “cursor: pin S”

Let’s start with the basics to understand the root cause of this problem.

What are Mutexes?

Oracle introduced mutexes in version 10.2 to replace library cache pins. The name stands for mutual exclusion, and their functionality is similar to mutexes in C. Mutexes protect data and other resources from concurrent access. Don’t draw any conclusions from that statement yet; please read on.

How do Oracle Mutexes work?

Oracle uses counters to implement mutexes. These counters are called reference counters, or ref counters:

1. Every time a particular mutex is obtained, Oracle increments its reference counter.

2. Every time a particular mutex is released, Oracle decrements its reference counter.

Finally, what is “cursor: pin S”?

“cursor: pin S” is a wait event that occurs when a session wants a mutex in shared mode on a cursor. As mentioned in the previous section, Oracle has to update the mutex's reference counter to get the mutex.

However, it is very important to understand that access to these counters is not concurrent. If several sessions try to obtain the mutex at the same time, only one of them can actually increment or decrement the reference counter at a time, so the other sessions must wait.

Four key points to avoid confusion:

  1. Oracle is requesting the mutex in shared mode.
  2. No session is holding the mutex in exclusive mode.
  3. The wait event occurs because Oracle was not able to update the reference counter.
  4. Sessions retry over and over until the mutex is obtained, which may cause a CPU spike.
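To make the four points concrete, here is a minimal Java sketch of a shared-mode mutex built on a reference counter. It is a simplified model, not Oracle's implementation: the class and method names (RefCountMutex, getShared, releaseShared) are hypothetical, and the assumption that the counter is updated with an atomic compare-and-swap is ours; the post does not say how Oracle performs the update. Nobody ever holds the mutex exclusively, yet sessions still have to retry whenever another session changed the counter first, and those retries are the spinning that shows up as CPU.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of a shared-mode mutex implemented as a reference counter.
// ASSUMPTION: the counter is updated with an atomic compare-and-swap (CAS).
// This illustrates the idea only; it is not Oracle's mutex code.
public final class RefCountMutex {
    private final AtomicInteger refCount = new AtomicInteger(0);
    // How often a session lost the race to update the counter and had to retry.
    private final AtomicLong failedUpdates = new AtomicLong(0);

    // Shared get: increment the reference counter.
    public void getShared() {
        updateBy(+1);
    }

    // Shared release: decrement the reference counter.
    public void releaseShared() {
        updateBy(-1);
    }

    private void updateBy(int delta) {
        while (true) {
            int current = refCount.get();
            // Only one session can move the counter from 'current' to the new value.
            if (refCount.compareAndSet(current, current + delta)) {
                return;
            }
            // Another session changed the counter first. No one holds the mutex
            // exclusively; this is pure counter contention, so spin and retry.
            failedUpdates.incrementAndGet();
            Thread.onSpinWait(); // Java 9+
        }
    }

    public int refCount() { return refCount.get(); }
    public long failedUpdates() { return failedUpdates.get(); }

    public static void main(String[] args) throws InterruptedException {
        RefCountMutex cursorMutex = new RefCountMutex();
        Thread[] sessions = new Thread[8];
        for (int i = 0; i < sessions.length; i++) {
            sessions[i] = new Thread(() -> {
                for (int n = 0; n < 100_000; n++) {
                    cursorMutex.getShared();     // "execute" the shared cursor
                    cursorMutex.releaseShared();
                }
            });
            sessions[i].start();
        }
        for (Thread t : sessions) {
            t.join();
        }
        System.out.println("final ref count = " + cursorMutex.refCount());
        System.out.println("failed counter updates (retries) = " + cursorMutex.failedUpdates());
    }
}
```

Running main always ends with a reference count of 0, because every get is paired with a release. The number of retries depends on the machine and the number of threads; the more sessions execute the same cursor at once, the more retries you should expect.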

The simplest analogy for “cursor: pin S” is a traffic bottleneck caused by a toll booth. The road may have plenty of capacity and no lane merges; the traffic problem is caused by the toll booth itself.

The easiest way to detect “cursor: pin S” is to run an AWR report; the queries causing the wait event can then be identified with an ADDM report. You can also query:

  1. V$ACTIVE_SESSION_HISTORY: column P1 (idn, i.e. the SQL hash_value)
  2. V$MUTEX_SLEEP_HISTORY: column MUTEX_IDENTIFIER (idn, i.e. the SQL hash_value)
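For example, the ASH check can be scripted with a short JDBC sketch that groups “cursor: pin S” samples by P1 (the idn, per the list above) and SQL_ID. The class and method names are hypothetical, and it assumes an open JDBC connection with the Oracle driver on the classpath and SELECT access to V$ACTIVE_SESSION_HISTORY. Note that ASH is part of the Oracle Diagnostics Pack, which must be licensed.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Lists the mutex identifiers (P1) and SQL_IDs that were sampled most often
// waiting on "cursor: pin S".
// ASSUMPTIONS: an open JDBC Connection to the database, the Oracle JDBC driver
// on the classpath, and SELECT access to V$ACTIVE_SESSION_HISTORY.
public final class CursorPinSReport {
    static final String WAIT_EVENT = "cursor: pin S";

    static final String TOP_MUTEXES_SQL =
        "select p1 as idn, sql_id, count(*) as samples "
        + "from v$active_session_history "
        + "where event = ? "
        + "group by p1, sql_id "
        + "order by samples desc";

    // Prints at most maxRows rows, most-sampled first.
    static void printTopMutexes(Connection conn, int maxRows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(TOP_MUTEXES_SQL)) {
            ps.setString(1, WAIT_EVENT);
            try (ResultSet rs = ps.executeQuery()) {
                int printed = 0;
                while (printed < maxRows && rs.next()) {
                    System.out.printf("idn=%d sql_id=%s samples=%d%n",
                        rs.getLong("idn"), rs.getString("sql_id"), rs.getLong("samples"));
                    printed++;
                }
            }
        }
    }
}
```

The wait event name is passed as a bind variable rather than concatenated into the SQL text, so the report itself is parsed once and reused.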

Bugs

One of the notorious bugs associated with mutexes is Bug 6904068. It affects almost all versions of Oracle from 10.2.0.x to 11.2.0.x. We hit this bug on a 10.2.0.3 database because one of our SQL statements was executed with very high concurrency.

Database Solutions (as usual, please contact Oracle Support before playing around with underscore parameters)

1. Apply the patch for bug 6904068 and then adjust _FIRST_SPARE_PARAMETER. After applying the patch, setting _FIRST_SPARE_PARAMETER to zero retains the current behavior.

2. Set the underscore parameter “_CURSOR_FEATURES_ENABLED” to 10.

3. Disable mutexes and embrace library cache pins. Mutexes can be disabled by setting the underscore parameter “_KKS_USE_MUTEX_PIN” to false.

Application Solutions

1. Synonym Approach: Create multiple synonyms for the table and use a different one in each copy of the SQL statement. This forces Oracle to use a different SQL area for each copy. This is good if the high-concurrency SQL is used in multiple packages.

  • Select ename from synonym1 where job=:b1; (In Package emppkg1)
  • Select ename from synonym2 where job=:b1; (In Package emppkg2)

Where synonym1 and synonym2 are synonyms for table EMP.

2. Table Alias Approach: If the same SQL is used in different packages, give it a different table alias in each package.

  • Select ename from EMP A where job=:b1; (In Package emppkg1)
  • Select ename from EMP B where job=:b1; (In Package emppkg2)

3. Java Prepared Statement Approach: This approach works when the SQL statement is prepared from Java. We modify the SQL text by adding a table alias, so that the same statement gets multiple SQL areas. The simplest way to generate the table alias is from the hostname.

“select ename from emp myapp1 where myapp1.ename = :1” instead of “select ename from emp where ename = :1”
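A minimal Java sketch of the hostname-based alias follows. The names AliasedSql, aliasFromHost, buildQuery and prepareForThisHost are hypothetical, and so are the sanitizing rules: we assume an unquoted Oracle identifier must start with a letter and contain only letters, digits and underscores, and we cap it at 30 characters, the limit before Oracle 12.2 (12.2 and later allow 128 bytes). A hostname that happens to be a reserved word is not handled. The query uses a JDBC `?` placeholder instead of `:1`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Locale;

// Builds a per-host variant of the same statement so that each application
// host sends different SQL text and therefore gets its own SQL area (cursor).
public final class AliasedSql {
    private static final int MAX_IDENTIFIER_LENGTH = 30; // pre-12.2 Oracle limit

    // "app-server01.example.com" -> "APP_SERVER01"
    static String aliasFromHost(String hostName) {
        int dot = hostName.indexOf('.');
        String shortName = dot >= 0 ? hostName.substring(0, dot) : hostName;
        String alias = shortName.replaceAll("[^A-Za-z0-9_]", "_").toUpperCase(Locale.ROOT);
        // An unquoted Oracle identifier must start with a letter.
        if (alias.isEmpty() || !Character.isLetter(alias.charAt(0))) {
            alias = "H" + alias;
        }
        return alias.length() > MAX_IDENTIFIER_LENGTH
            ? alias.substring(0, MAX_IDENTIFIER_LENGTH)
            : alias;
    }

    // The query from the post, with the table alias added.
    static String buildQuery(String alias) {
        return "select " + alias + ".ename from emp " + alias
            + " where " + alias + ".ename = ?";
    }

    // Prepares the host-specific statement on an existing connection.
    static PreparedStatement prepareForThisHost(Connection conn)
            throws SQLException, UnknownHostException {
        String alias = aliasFromHost(InetAddress.getLocalHost().getHostName());
        return conn.prepareStatement(buildQuery(alias));
    }
}
```

With this scheme the number of copies of the statement equals the number of application hosts, so the pressure on any single cursor mutex is divided among them. The trade-off is one extra parse and one extra SQL area in the shared pool per host.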


Moral of the Story

Everything in excess is opposed to nature. – Hippocrates



Raw Devices still rule the roost

Note: The focus of this post is not Data Guard.

We found ourselves in a complex situation while implementing Data Guard. Our SLAs are measured in milliseconds, and our response times jumped 400% when we implemented synchronous physical Data Guard, even though all of Oracle's best practices were followed. See Metalink Note 387343.1 and “Data Guard Redo Apply and Media Recovery Best Practices 10gR2”.

Data Guard added its own wait events, but log file sync waits also increased more than tenfold, indicating that writes to the standby redo logs were taking more time. We have a very talented storage team. In the first phase, we striped the standby redo log files (1 GB) across 8 disks in such a way that the data resides only on the tracks closest to the drive head. This considerably reduced the impact of Data Guard and helped us maintain our SLAs.

In the next phase, we used raw devices for the redo logs and control files. With raw devices, our response times came close to those of the architecture without Data Guard.

Our Environment

Database: 10.2.0.3

OS: AIX 5.3

Storage: DMX

Thanks