Monthly Archives: December 2011

Recreating your Oracle Inventory

From 10g onwards, you can reverse-engineer and recreate your Oracle inventory if it gets corrupted or accidentally deleted, avoiding a time-consuming reinstallation of the Oracle software or other unsupported tricks.

If the Oracle inventory is corrupted or missing, you generally get the error below when you run an opatch command.

=============================================

oracle@myhost:/app/oracle$ opatch lsinventory
Invoking OPatch 11.2.0.1.6

Oracle Interim Patch Installer version 11.2.0.1.6
Copyright (c) 2011, Oracle Corporation.  All rights reserved.

Oracle Home                   : /app/oracle/product/10.2/db
Central Inventory           : /app/oracleai/oraInventory
from                                : /etc/oraInst.loc
OPatch version               : 11.2.0.1.6
OUI version                     : 10.2.0.3.0
Log file location               : /app/oracle/product/10.2/db/cfgtoollogs/opatch/opatch2011-12-27_13-19-08PM.log

OPatch failed to locate Central Inventory.
Possible causes are:
The Central Inventory is corrupted
The oraInst.loc file specified is not valid.
LsInventorySession failed: OPatch failed to locate Central Inventory.
Possible causes are:
The Central Inventory is corrupted
The oraInst.loc file specified is not valid.

OPatch failed with error code 73
oracle@myhost:/app/oracle$

=============================================

You may also get this error because of an incorrect inventory location, so it is a good idea to make sure the inventory location is specified correctly in one of the following files, depending on your OS.

  1. /var/opt/oracle/oraInst.loc
  2. /etc/oraInst.loc

Contents of oraInst.loc

bash-3.2$ cat /etc/oraInst.loc
inventory_loc=/app/oraInventory
inst_group=dba
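
A quick sanity check of the pointer file can be scripted. The following is a minimal sketch, not an Oracle-supplied tool: the function names inventory_loc_from and check_inventory_pointer are made up for illustration, and it assumes the file uses the simple key=value layout shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of OUI or OPatch).

# Print the inventory_loc value from an oraInst.loc-style file.
inventory_loc_from() {
    sed -n 's/^inventory_loc=//p' "$1" | head -n 1
}

# Check that the pointer file is readable and that the directory it names exists.
check_inventory_pointer() {
    pointer_file=$1
    if [ ! -r "$pointer_file" ]; then
        echo "pointer file not readable: $pointer_file"
        return 1
    fi
    inv_dir=$(inventory_loc_from "$pointer_file")
    if [ -z "$inv_dir" ]; then
        echo "no inventory_loc entry in $pointer_file"
        return 1
    fi
    if [ ! -d "$inv_dir" ]; then
        echo "inventory directory missing: $inv_dir"
        return 1
    fi
    echo "inventory directory found: $inv_dir"
}

# Example: check_inventory_pointer /etc/oraInst.loc
```

If the pointer file and the directory both check out and opatch still fails, the inventory contents themselves are the more likely problem.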

If the error is due to a missing or corrupt inventory, you can recreate the inventory by following the steps below.

  1. Back up your existing corrupted Oracle inventory, if it exists.
  2. Run the following OUI command from the Oracle home whose inventory is corrupt or missing.

cd $ORACLE_HOME/oui/bin

./runInstaller -silent -attachHome ORACLE_HOME="/app/oracle/product/10.2/db" ORACLE_HOME_NAME="Ora10202Home"

Note: Although -attachHome was introduced with OUI version 10.1, it is documented only from OUI 10.2 onwards.
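
Steps 1 and 2 can be combined into a small script. This is a sketch under assumptions: backup_inventory and reattach_home are hypothetical helper names, the backup is a plain timestamped copy, and the runInstaller arguments are simply the ones shown above, so substitute your own home path and home name.

```shell
#!/bin/sh
# Hypothetical helpers; adjust paths and names to your environment.

# Step 1: copy an existing inventory directory to a timestamped backup
# and print the backup path. Does nothing if the directory is absent.
backup_inventory() {
    inv_dir=$1
    [ -d "$inv_dir" ] || return 0
    backup_dir="${inv_dir}.bak.$(date +%Y%m%d%H%M%S)"
    cp -Rp "$inv_dir" "$backup_dir" && echo "$backup_dir"
}

# Step 2: attach the home to the central inventory, then re-run the
# command that failed earlier to confirm the inventory can be read.
# Assumes opatch is on PATH and ORACLE_HOME is set, as in the session above.
reattach_home() {
    oracle_home=$1
    home_name=$2
    ( cd "$oracle_home/oui/bin" &&
      ./runInstaller -silent -attachHome \
          ORACLE_HOME="$oracle_home" ORACLE_HOME_NAME="$home_name" ) &&
    opatch lsinventory
}

# Example (values taken from this post):
#   backup_inventory /app/oraInventory
#   reattach_home /app/oracle/product/10.2/db Ora10202Home
```

Keeping the backup step separate means a failed attach leaves the original inventory copy untouched for later inspection.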

Problem Solving with AWR report

The AWR report is by far one of the best Oracle diagnostic tools; in my experience, no other database diagnostic tool comes close. I will discuss one of our recent experiences of using AWR to resolve a complex issue. I consider it complex because the issue was identified with AWR even though it was not a database issue.

Note: Our SLAs are in milliseconds, and every change to the database has to be performance tested. You will be surprised how sensitive the database, OS, and application become when you have millisecond SLAs.

Problem: Response times were very bad in our performance environment.

What could be the most likely culprits?

  1. Newly applied AIX patch
  2. Application code
  3. Unusual storage activity/SRDF
  4. Network connectivity issues
  5. The database itself

Top wait events from Oracle.

“log file sync” was the top wait event, with an average wait of 17 ms; normally it is around 8-9 ms. We also noticed an unusual number of “SQL*Net break/reset to client” waits. I think the most important part of troubleshooting is trying to relate wait events to one another. Since “log file sync” had doubled, the obvious suspect was our EMC storage, but that turned out not to be the cause.

The key to resolving our issue was the load profile.

From the load profile, the redo size increased from 169 to 1394 and the number of block changes increased from 1207 to 15437, which led to the conclusion that some additional, unusual activity was occurring in the database. This was all the more intriguing because there had been no change to the application or the database. With this clue, we drilled further into the AWR reports and identified a few more major variations.

Let me summarize the above data.

  1. “transaction rollbacks” increased from 281 to 322,439.
  2. “undo change vector size” increased from 62.2 to 330.7.
  3. “user commits” increased from 93K to 679K.
  4. “user rollbacks” remained the same.
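
To put these jumps in perspective, the before and after figures can be turned into growth factors. The sketch below uses a hypothetical helper, growth_factor, and plugs in only the numbers quoted in this post; the units are whatever AWR reported, which does not matter for a ratio.

```shell
#!/bin/sh
# Hypothetical helper: print after/before rounded to one decimal place.
growth_factor() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", after / before }'
}

# Figures quoted in the post (baseline run first, slow run second).
echo "redo size:               $(growth_factor 169 1394)x"
echo "block changes:           $(growth_factor 1207 15437)x"
echo "transaction rollbacks:   $(growth_factor 281 322439)x"
echo "undo change vector size: $(growth_factor 62.2 330.7)x"
echo "user commits:            $(growth_factor 93 679)x"
```

Seen this way, “transaction rollbacks” stands out: it grew by more than a thousandfold, while the other statistics grew by roughly 5 to 13 times.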

Here are definitions of some of the wait events and statistics discussed in this post:

  1. SQL*Net break/reset: A wait event indicating an error or unhandled exception during execution.
  2. User rollbacks: Rollbacks issued by the application or user.
  3. Transaction rollbacks: Rollbacks performed by Oracle itself, for example when a transaction cannot be completed because of a constraint violation.

Because we saw an increase in “transaction rollbacks” and “redo size”, the evidence pointed towards a data issue that caused additional business logic to execute and then fail. Further analysis proved this theory correct, and the issue was resolved: we had inadvertently deleted some test data, which caused the business logic to follow a different flow from our regular tests.

So, AWR to the rescue again.

ORA-14128: FOREIGN KEY constraint mismatch in ALTER TABLE EXCHANGE PARTITION : The solution

I am writing this post to provide a simple way to resolve ORA-14128. I did not find any reasonable documented solution that fixes this issue without disabling the constraints.

Here is my story. Upgrading from Oracle 10.2.0.3 to 11.2.0.2 introduced a few problems, mainly because of our high-concurrency, high-availability OLTP system. In 10.2.0.3, our partition maintenance was done online, without disabling constraints, using the approach described below with an example.

Let’s assume tables transaction and sales with the following attributes:

Transaction (parent/child) table with transid as the primary key and an index on salesid.

Sales (parent) table with salesid as the primary key.

Foreign key definition: transaction.salesid references sales.salesid. Many other tables also reference transaction.transid, so the transaction table is both a parent and a child.

Partition Maintenance Approach: Our ultimate goal for partition maintenance is to drop old partitions. In principle, all I have to do is issue “alter table transaction drop partition transaction_part_2011”. This command succeeds only if no foreign key constraints reference this parent table; otherwise it results in “ORA-02266: unique/primary keys in table referenced by enabled foreign keys”. Since partition maintenance is done online, we don’t have the luxury of disabling constraints.

Therefore we had to perform the additional steps involved in a partition exchange to accommodate online partition maintenance. One of the key requirements of a partition exchange is that the tables being exchanged must be structurally identical, with the same indexes, constraints, and so on. You also have the option of disabling constraints to reach this goal.

Note: Steps 1-4 make the transaction_duplicate table identical to transaction.

  1. create table transaction_duplicate as select * from transaction where 1 = 0;
  2. alter table transaction_duplicate add constraint transaction_dup_pk primary key (transid);
  3. create index salesid_idx on transaction_duplicate (salesid);
  4. alter table transaction_duplicate add constraint fk_salesid foreign key (salesid) references sales (salesid);
  5. alter table transaction exchange partition transaction_part_2011 with table transaction_duplicate including indexes without validation;
  6. alter table transaction drop partition transaction_part_2011;

VERY IMPORTANT: Step 4 is optional if one of the tables used in the exchange is empty, which is true in our case because we are creating an empty duplicate table.

Our problems started with step 4 after upgrading to 11.2.0.2. We started getting “ORA-00054: resource busy and acquire with NOWAIT specified” because Oracle changed the locking behavior in 11g.

We opened an SR with Oracle. According to Oracle, the code fix for bug 5909305 introduces an intentional change in locking behavior that is effective from 11.1.0.6 onwards; that is, from 11g onwards it is correct and expected that DML on the child table takes an SX lock on the parent.

For customers who cannot live with the changed behavior, the fix in bug 6117274 allows the locking change to be reverted to pre-11g behavior by setting “_fix_control” to ‘5909305:OFF’.

In our case, since step 4 was optional, we removed it, but our problems did not stop there. Due to the nature of our business, every change that goes into the database requires a rollback plan in case it is needed. In other words, we wanted to be able to roll back the partition exchange by performing another exchange. Step 4 now became mandatory for the rollback, because neither table is empty after the first exchange (one of the partitions is empty, not the table).

We were back to square one, with a small difference: initially we were trying to fix ORA-00054, and now we were trying to fix ORA-14128. After a lot of reading, trials, and prototypes, I was able to fix ORA-14128. The solution is very simple: all you have to do is put the referential integrity constraints in the “ENABLE NOVALIDATE” state. This was acceptable for our database. So if you can put your constraints in “ENABLE NOVALIDATE”, you have a simple fix for ORA-14128.
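
As a rough illustration, the statement involved looks like the one generated below. This is a sketch under assumptions: the post does not spell out which constraints were changed, so applying it to fk_salesid on transaction_duplicate is only an example, and novalidate_sql is a hypothetical helper that merely prints SQL for you to review and run in SQL*Plus. A constraint in the ENABLE NOVALIDATE state is still enforced for new DML, but existing rows are not checked.

```shell
#!/bin/sh
# Hypothetical helper: print the DDL that switches an existing constraint
# to ENABLE NOVALIDATE, followed by a query to confirm its state.
novalidate_sql() {
    table_name=$1
    constraint_name=$2
    cat <<EOF
alter table $table_name modify constraint $constraint_name enable novalidate;
select constraint_name, status, validated
  from user_constraints
 where constraint_name = upper('$constraint_name');
EOF
}

# Example using names from the steps above (an assumption, not the
# author's confirmed choice of constraint).
novalidate_sql transaction_duplicate fk_salesid
```

Printing the SQL rather than executing it keeps the sketch safe to run anywhere; review the output before running it against a real schema.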

Thanks for reading.