Blog Archives

IBM ServeRAID Manager Agent – memory leak

Facts:

  • server IBM x3950
  • Windows 2003 x64, SP2
  • IBM ServeRAID Manager 8.40

While working on a couple of Oracle servers running on IBM x3950 hardware, I noticed extremely large memory consumption by the RAIDSERV.EXE process, which belongs to “IBM ServeRAID Manager Agent”. It’s a 32-bit process running inside WOW64. On one server raidserv.exe had 800MB allocated, on another 1600MB. Immediately after service startup the allocation was 22MB, and it has been slowly increasing since. I can’t tell yet how fast the leak progresses; I’ll know in a couple of days.
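Once a few readings of the process size have been collected, the growth can be put into numbers. Here is a minimal Python sketch that fits a straight line through (time, memory) samples. The function name and the sample readings are hypothetical; only the 22MB startup figure comes from the observation above, and a real leak does not have to grow linearly.

```python
from datetime import datetime


def leak_rate_mb_per_hour(samples):
    """Least-squares slope of memory (MB) against elapsed time (hours).

    samples: list of (datetime, megabytes) tuples, oldest first.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0 = samples[0][0]
    hours = [(t - t0).total_seconds() / 3600.0 for t, _ in samples]
    mbs = [float(mb) for _, mb in samples]
    mean_h = sum(hours) / len(hours)
    mean_mb = sum(mbs) / len(mbs)
    denom = sum((h - mean_h) ** 2 for h in hours)
    if denom == 0:
        raise ValueError("samples must cover more than one point in time")
    return sum((h - mean_h) * (m - mean_mb) for h, m in zip(hours, mbs)) / denom


# Hypothetical readings, NOT measured values: 22MB at startup, then two
# made-up readings taken 12 and 24 hours later.
samples = [
    (datetime(2007, 10, 1, 8, 0), 22.0),
    (datetime(2007, 10, 1, 20, 0), 34.0),
    (datetime(2007, 10, 2, 8, 0), 46.0),
]
rate = leak_rate_mb_per_hour(samples)
```

With a rate in hand, dividing the remaining headroom of the 32-bit address space by it gives a rough idea of how often the service would have to be restarted.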

Critical bug in ArcServe Agent 9.0

If you still have ArcServe 9.0 Agents (confirmed on Build 2050) around the server farm, be very careful when you back up mounted volumes on Windows 2000/2003. Mounted volumes are common on database servers, where they avoid assigning a drive letter to every individual volume.

Let’s say that you have the following mount points:

D:\ORADATA\ORADB\DATA01 --> pointing to VOLUME1
D:\ORADATA\ORADB\DATA02 --> pointing to VOLUME2

You prepare the backup in ArcServe as usual: connect to the Agent, select the directories above and run the backup. Everything seems kosher... until you try to do a restore. At that point you may find that the mount point D:\ORADATA\ORADB\DATA02 contains files from some other, seemingly random volume. Yes, that means the backup is useless. There is no error and everything appears to work fine; ArcServe Agent 9 simply gets wrong information about the volume. For example, out of 10 mounted volumes, nine will be backed up correctly and one will have the content of another volume. One workaround is to assign a drive letter to the problematic mount point with Disk Manager, then back up the volume by drive letter instead of by mount point.

I couldn’t find any official CA bug note. All I know is that this doesn’t happen with the ArcServe 11.5 (SP3) agent.

The moral of this story is to test your restore procedures as much as (or more than) you test the backup itself.
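A restore test for this kind of problem can be as simple as comparing checksums of the original files with the restored copies. Below is a minimal Python sketch; the function names are hypothetical, and it assumes the files are static between backup and restore (for example a test volume or a cold backup), since open database files change constantly and would never match.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            h = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in 1MB chunks so large datafiles do not fill memory.
                for chunk in iter(lambda: fh.read(1024 * 1024), b""):
                    h.update(chunk)
            digests[rel] = h.hexdigest()
    return digests


def compare_trees(original_root, restored_root):
    """Return (missing, unexpected, changed) lists of relative paths."""
    original = tree_digests(original_root)
    restored = tree_digests(restored_root)
    missing = sorted(set(original) - set(restored))
    unexpected = sorted(set(restored) - set(original))
    changed = sorted(
        p for p in set(original) & set(restored) if original[p] != restored[p]
    )
    return missing, unexpected, changed
```

Run it once per mount point, passing the mount point directory and the directory it was restored to. A volume that was backed up from the wrong place should show up as long `missing` and `unexpected` lists.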

Regards,
Ales

ORA-600 … While Upgrading Or Patching Databases To 10.2.0.3

If you’re upgrading (or patching) a database that was created as 32-bit to Oracle 10.2.0.3 (64-bit), make sure you read and understand Metalink note 412271.1, “ORA-600 [22635] and ORA-600 [KOKEIIX1] Reported While Upgrading Or Patching Databases To 10.2.0.3”.

Unless you install the necessary patch on top of 10.2.0.3 before you start the upgrade, you’ll likely hit the bug, which ends in a corrupted database.

To check whether your database was created as 32-bit, Oracle suggests examining the string returned by this query:

sql> select metadata from sys.kopm$ ;

If you find B023 in the string, the database was created as 32-bit; otherwise you’ll find B047.
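When many databases have to be checked, the rule above can be applied by a small script. This is a minimal Python sketch; the function name and the sample strings are hypothetical, and it assumes (as the rule implies) that B047 marks a database created as 64-bit. Treat the Metalink note as the authority on how to read the value.

```python
def word_size_from_kopm_metadata(metadata_hex):
    """Classify the METADATA string selected from sys.kopm$.

    Returns "32-bit", "64-bit" or "unknown" (neither or both markers found).
    """
    text = metadata_hex.upper()
    has_32 = "B023" in text
    has_64 = "B047" in text
    if has_32 and not has_64:
        return "32-bit"
    if has_64 and not has_32:
        return "64-bit"
    return "unknown"


# Made-up strings for illustration only, not real kopm$ content.
example_32 = word_size_from_kopm_metadata("0000b0230000")
example_64 = word_size_from_kopm_metadata("0000B0470000")
```

Returning "unknown" rather than guessing keeps an unexpected string from being silently classified as safe to upgrade.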

The Metalink note also says that this bug will occur if we patch the 32-bit 10.2.0.2 release to 10.2.0.3 (64-bit). I performed several tests (on Windows x64, with Oracle EE 10.2.0.3 and without any patch on top of that!) and could not reproduce the ORA-600 error.

Btw., performing the word size change and patching at the same time is not recommended by Oracle, but it is nevertheless a valid option; otherwise I would expect Oracle to prevent us from doing that in the first place, by including a check in the upgrade script or something similar.

Although my test results suggest that ORA-600 will not occur during the upgrade of our 32-bit 10.2.0.2 databases, I’ll (of course) follow Oracle’s recommendation and install Patch 5 (or higher) on top of 10.2.0.3 before I run the upgrade/patch script.

For a long time I thought Oracle would never provide the patch for the Windows platform and that we’d have to wait for the 10.2.0.4 release.
(Un)fortunately, on the 14th of September 2007 Oracle updated the Metalink note to say that the fix for the bug is included in Patch 5 (and higher) for the Windows platform. That is four months after the release of Patch 5. May I say that I’m not a happy camper! :-(

Regards,
Ales

Exceptionally High Memory Consumption of Oracle 10.2.0.x on Windows EM64T

While testing 64-bit Oracle EE 10.2.0.2 on Windows 2003 SP1 (EM64T), I noticed exceptionally high virtual memory usage by the oracle.exe process during startup. For example, with the SGA sized at 1.5GB, Task Manager reported 4-6GB of virtual memory at instance startup.
I thought Oracle had fixed the memory management problem on EM64T in patchset 10.2.0.3, which claims to fix bug 5205552, “EXCEPTIONAL HIGH VM SIZE USAGE FOR ORACLE.EXE ON 64 BIT X86 WINDOWS PLATFORM”.
Perhaps they did, but there is another bug left in the code, and (at least in my case) it is quite easy to reproduce.

Some facts:

– reproduced on Windows 2003 x64, Enterprise Edition with SP1/SP2
– reproduced with 64-bit Oracle 10.2.0.2/10.2.0.3/10.2.0.3 + Patch 5 bundle for Windows x64
– tested on three different servers (one Dell and two IBM servers)

Steps:

1) make sure you don’t have the following line in the SQLNET.ORA (server side of course):

NAMES.DIRECTORY_PATH= (TNSNAMES)

2) try to create some dummy database link with no TNS entry present in TNSNAMES.ORA, such as:


sql> connect scott/tiger

sql> create database link dummy connect to dummy identified by dummy using 'dummy';

3) the above statement will “hang” for a while; during this time observe “Virtual Memory Size”, “Peak Memory Usage” and “Page Faults” for the oracle.exe process in Task Manager. You’ll likely see excessive growth in memory usage: roughly three times the size of the SGA is used during the create database link statement before Oracle returns control to the user. Imagine this happening on a production server, where two or three users send a rogue create database link command at the same time. They could easily bring the server down.

The only workaround known to me is to make sure that NAMES.DIRECTORY_PATH is present in sqlnet.ora (server side), such as:

NAMES.DIRECTORY_PATH= (TNSNAMES)
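To verify across several servers that the workaround is in place, the server-side sqlnet.ora can be checked with a few lines of script. This is a minimal Python sketch; the function name is hypothetical, and it only handles a parameter written on a single line, as shown above.

```python
import re

# Matches e.g. "NAMES.DIRECTORY_PATH= (TNSNAMES)" on a single line.
_PARAM = re.compile(
    r"names\.directory_path\s*=\s*\(?([^)]*)\)?\s*$", re.IGNORECASE
)


def names_directory_path(sqlnet_text):
    """Return the naming methods listed in NAMES.DIRECTORY_PATH.

    Returns None if the parameter is absent or commented out.
    Assumption: the parameter and its value sit on one line.
    """
    for raw in sqlnet_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # '#' starts a comment
        match = _PARAM.match(line)
        if match:
            return [v.strip().upper() for v in match.group(1).split(",") if v.strip()]
    return None


with_param = names_directory_path("NAMES.DIRECTORY_PATH= (TNSNAMES)\n")
without_param = names_directory_path("# NAMES.DIRECTORY_PATH= (TNSNAMES)\n")
```

A result of `None` means the server is in the state described in step 1 and is a candidate for the memory problem.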

I also noticed that this bug is only semi-reproducible. In my case, for example, the first create database link statement shows excessive memory growth, then two or three similar statements go smoothly, then one statement again causes excessive memory allocation, and so on.

Regards,
Ales

Another day, another bug … Bug 4732503 – Self-deadlock on TT enqueue

Sequence:

0) Oracle 10.2.0.2 EE
1) user scott hits his tablespace quota on tablespace users; he remains connected to the instance
2) DBA tries to add some space to the schema: alter user scott quota 2000m on users;
3) the DBA session with the alter user statement will hang until scott’s session is closed (or killed)

There are several variations of this scenario: alter tablespace add datafile, allocation of undo segments in the undo tablespace, etc.
Fixed in 10.2.0.3 / 11g R1.

Regards,
Ales