Author Archives: alesk

Reactive Performance Management Intensive

I’ll spend the next two days in Zagreb attending a two-day seminar by Craig Shallahamer, titled “Reactive Performance Management Intensive”. I hope Oracle University from Croatia will continue to “hunt” down prominent Oracle speakers and bring them to Zagreb, at least once or twice per year.
I spent this afternoon strolling through the parks and streets of Zagreb, enjoying the sunshine and recharging my batteries (not the batteries for my laptop… for a change ;-).
Anyway, I’ll post a comment to this blog with interesting notes (links) from the seminar that I find worth sharing with you. Right now I’m at the hotel writing this short post and preparing for the seminar by reading some papers from OraPub. (For those of you who are not familiar with orapub.com yet, a fair warning: the process of downloading white papers from orapub.com looks a bit awkward at first with its “shopping cart”, but nevertheless the papers are free – all that is needed is some patience.)

Simple script for sftp automation on Windows

The main goal of this script is to automate file transfer from a Linux box to a Windows server via sftp. Files saved on the Linux server are first transferred to the Windows server and immediately afterwards deleted (purged) from the Linux server.
The script will run once per hour and must satisfy some restrictions that I put in place:

  • it must run on any Windows Server 2003 with minimal dependency on third party tools
  • it should not depend on WSH or PowerShell – a plain old DOS script is the preferred solution in this case
  • I was tempted to write the script in Python, but that would cause the script to end up under my “support jurisdiction” and I don’t want that to happen ;(. It must be as simple as possible, so that others will not have an excuse not to take over the script for further enhancement – you know who you are! ;-)
  • file transfer must be encrypted using ssh protocol
  • no dependency on commercial tools

At first, I considered using a Windows port of rsync, cwRsync, but decided not to, mainly because of the cygwin1.dll conflict with the existing CopSSH installation and possibly with other open source tools that rely on cygwin.

My final decision was to write a simple DOS script that depends only on OpenSSH (ssh, sftp). For now, I haven’t put any error notification into the script. I’ll probably make a few changes for the production version of the script (at least logging, error notification and public key authentication) but for now it is what it is – a draft version.

@echo off
:: ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: AlesK
:: Simple script for transferring/purging files with SFTP
:: ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::
:: Global variables
::
set sshuser=oracle
set sshhost=192.168.1.10
set remotedir=/home/oracle/work
set batchcopy=batchcopy.sftp
set batchdelete=batchdelete.sftp
::
::
:: Generate two batch files, one to copy files (get...) from remote host
:: and second one to delete files on remote host (rm...), that'll be
:: used for sftp with option -b <batchfile>
::
::
:: Initialize batch files that are needed by sftp
::
echo # Generated by test.cmd > %Batchcopy%
echo # Generated by test.cmd > %Batchdelete%
::
::
:: After initializing batch files we generate get and rm statements
:: with simple FOR statement. 
::
for /F %%i IN ('ssh %sshuser%@%sshhost% "ls -1 %remotedir%"') DO (
                echo get %remotedir%/%%i >> %batchcopy%
                echo rm %remotedir%/%%i  >> %batchdelete%
                )
::
:: So far so good. While writing and testing the script I didn't use public key authentication,
:: instead I chose to enter password each time I run the script. 
::
:: At my workplace I have an older version of OpenSSH (3.7.p1) that worked flawlessly as such:
::
:: cmd> sftp -b batchcopy.sftp sshuser@sshhost   ...and sftp asked me for a password
:: 
:: At home I'm using a newer version of OpenSSH (4.7p1) and the same line of code failed with the error
::
:: Permission denied (publickey,password)
::
:: After some RTFM and Googling (what else? :), I "discovered" that this is perfectly
:: normal behavior: with -b, sftp runs ssh with BatchMode=yes, which disables password
:: prompts (obviously this was not the case in OpenSSH 3.7).
:: The workaround that I found on the Net is not intuitive at all: with option -o we turn batchmode
:: off immediately before we pass the batch file with -b. Hmm... what a logic!
::
::
:: Get files with sftp in batch mode (for testing purposes only I prefer password authentication, 
:: in production it's better to use public key authentication)
sftp -o "batchmode no" -b %batchcopy% %sshuser%@%sshhost%
::
:: check the errorlevel, if 0 then sftp completed successfully and we can delete files on remote host
if %errorlevel% EQU 0 goto delete
goto end
::
:delete
sftp -o "batchmode no" -b %batchdelete% %sshuser%@%sshhost%
goto end
::
::
:end
echo on
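
For readers who want to see what the FOR loop above writes into the two sftp batch files, here is a minimal Python sketch of the same logic. It is only an illustration, not a replacement for the DOS script; the function name `build_sftp_batches` and the sample file names are made up for this example.

```python
def build_sftp_batches(remotedir, filenames):
    """Return (copy_lines, delete_lines) mirroring batchcopy.sftp and batchdelete.sftp."""
    header = "# Generated by test.cmd"
    copy_lines = [header]
    delete_lines = [header]
    for name in filenames:
        # One "get" per remote file for the copy pass...
        copy_lines.append(f"get {remotedir}/{name}")
        # ...and one matching "rm" for the purge pass that follows a successful copy.
        delete_lines.append(f"rm {remotedir}/{name}")
    return copy_lines, delete_lines


# Hypothetical file names, standing in for the output of "ls -1" on the remote host.
copy_lines, delete_lines = build_sftp_batches("/home/oracle/work", ["a.dmp", "b.log"])
print("\n".join(copy_lines))
print("\n".join(delete_lines))
```

Keeping the get and rm statements in two separate batch files is what allows the script to run the delete pass only when the copy pass has returned errorlevel 0.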

Job hang with Wait Event: log file switch (private strand flush incomplete)

If only someone would give us a buck for each bug that we “(re)discover” in 10g (precisely 10.2.0.x). I wonder if 10g (R1 & R2) is (was) really a production level product at all. Sometimes I feel that working with 10g is very much like working with a Beta product – just an expensive one :-(.
I can only hope that Oracle will keep patching 10g R2 to at least patchset 10.2.0.8 – yeah, I know, zero chance of that happening. I guess we’ll have to live with tons of bugs for several years.

Today, I had to kill a hung job that had spent hours waiting for the event: log file switch (private strand flush incomplete).

Perhaps you have noticed the occasional “Private Strand Flush Not Complete” message in alert.log. It can be safely ignored, as described in Metalink note 372557.1, “Alert Log Messages: Private Strand Flush Not Complete”; the notice can, for example, follow a manual log file switch.

Our scenario is different:

  • an application developer started a job via dbms_job
  • within the job’s execution time frame, the daily RMAN incremental backup started and finished (in approx. 10 minutes). Part of the backup job is also a log switch – nothing more than a detail that I think is worth mentioning
  • the RMAN job completed successfully, but the other job that was running simply hung, waiting on the above event for several more hours – I could not confirm that the hang happened at the exact time when the log switch that is part of the daily RMAN backup job kicked in
  • just for the record, everything was ok with archiver and I/O subsystem

I’m not implying that the log switch that was part of the RMAN job “locked” the other process running the job. It’s just a fact that at the time of the hang only the job submitted with dbms_job and the RMAN backup were active; perhaps it’s just a coincidence and those two events are not related at all!?

Anyway, a quick search on Metalink revealed a recently filed bug that resembles our case very well:

6806770 LGWR SPINS WHEN OTHER PROCESSES ARE WAITING FOR ‘LOG FILE SWITCH’

What worries me is that the bug is somehow connected with the bag of 10g so-called “new features behind the scenes” (one such feature is In Memory Undo, IMU) and that the only workaround proposed is to disable IMU by setting _in_memory_undo = FALSE.

How unfortunate is that? I was just recently reading an excellent white paper written by Craig Shallahamer about IMU.

For now, I decided not to turn IMU off – but if the problem persists then I’m afraid we’ll have to turn In Memory Undo off. (It’s becoming some kind of folklore – get to know the cool new features, then turn them off and wait until they’re debugged :-).

Destroying data on Windows – for free :)

I’m in the middle of preparing our venerable SAN disk subsystem for retirement. It served us well for the last six years. Due to the confidentiality of the data once stored on this system, I opted to erase the data by first formatting all logical volumes at OS level, then erasing the data with a specialized tool that writes random data in a selectable number of passes, and finally low-level formatting all volumes with the SAN management tool. Steps two and three are the time-consuming parts of the process and will take approx. 3 full days. And I’m wiping out a mere 1.6TB system!
If you’re looking for a specialized tool for destroying data I suggest that you start here. Depending on the platform, choose the most appropriate tool. I tried Eraser for Windows and it performed well. I think this is another tool to add to my list of the free open source software that I use.

EventCombMT – free Windows event log search tool

I was looking for a tool that can help me search the Event Logs on a number of servers for a particular event. My first thought was to do an “exercise” by writing a Python script, when a colleague of mine sent me a link to the Microsoft article How to use the EventCombMT utility to search event logs for account lockouts. EventCombMT is part of the “Account Lockout and Management Tools” but it can be used for generic event log searches. After downloading the pack and starting EventCombMT, my first impression was that it would not suit my needs, as it’s a GUI and I would prefer a command line version of the tool. Fortunately, I checked the Help before dumping the tool, and found out that we can easily use the tool from the command line as well.

Let’s see EventCombMT in action:

We want to search the Event Log of our Oracle server (ORASERVER1) for a critical error which indicates that a Fibre Channel path connected to our SAN is down. I’m searching for Event 5 with text similar to this: “Path1 removed from multipath device nn by MPIO”.

cmd> EventCombMT /s:oraserver1  /evt:"5" /et:we /log:sys /outdir:"C:\temp" /t:1 /after:01152008120000 /before:02122008120000 /start

The entire line must be executed as a single cmd line!

Short explanation of the switches:

  • server /s (we can specify file with the server names instead)
  • we’re searching for event #5 (/evt:"5") that is associated with the error, such as:
    sdddsm “Path1 removed from multipath device nn by MPIO”
  • we’re interested in two event types: Warning or Error (/et:we)
  • we want to limit the search to System part of the event log (/log:sys)
  • how many threads should EventCombMT use for the search (/t:1)
  • time interval for the search (/after, /before) in the format MMDDYYYYHHMMSS (this format is mandatory)
  • we want to execute the search from the command line (/start)
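
The MMDDYYYYHHMMSS format is easy to get wrong by hand, so here is a minimal Python sketch that formats the two timestamps and assembles the command line. It uses only the switches listed above; the helper names `eventcomb_timestamp` and `build_command` are made up for this example and are not part of EventCombMT.

```python
from datetime import datetime


def eventcomb_timestamp(moment):
    """Format a datetime as MMDDYYYYHHMMSS, the layout used by /after and /before."""
    return moment.strftime("%m%d%Y%H%M%S")


def build_command(server, event_id, after, before, outdir):
    """Assemble the EventCombMT command line from the switches described above."""
    return [
        "EventCombMT",
        f"/s:{server}",
        f'/evt:"{event_id}"',
        "/et:we",      # event types: Warning or Error
        "/log:sys",    # search the System log only
        f'/outdir:"{outdir}"',
        "/t:1",        # one search thread
        f"/after:{eventcomb_timestamp(after)}",
        f"/before:{eventcomb_timestamp(before)}",
        "/start",      # run the search without the GUI
    ]


command = build_command(
    "oraserver1", 5,
    datetime(2008, 1, 15, 12, 0, 0),
    datetime(2008, 2, 12, 12, 0, 0),
    r"C:\temp",
)
print(" ".join(command))
```

With these inputs the printed line has the same switches and values as the command shown earlier (apart from spacing).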

The command will produce two files, the first one is called EventCombMT.txt and it’ll look something like this:

Find Events After: Tue Jan 15 12:00:00 2008 
Find Events Before: Tue Feb 12 12:00:00 2008 
Searching System Logs
Event IDs:   5
No Event Text specified.
No Event Source specified.
No Between Event IDs specified.
Will Search the following servers:
oraserver1
To find these events we'll need a search running. It has already begun....
 
Spawning Thread for: oraserver1
Thread Running for: oraserver1
Opening: C:\temp\oraserver1-System_LOG.txt
Number Of Records for the System log on oraserver1 is 1248
Total Bytes Read ending with the System log on oraserver1: 189124
C:\temp\oraserver1-System_LOG.txt contains 22 parsed events.
Exiting thread for: oraserver1
All threads Scheduled to run are running.
Total events searched: 1248
Total matches found: 22
Servers/Logs Searched: 1
DLL Cache Contained: 0
SID Cache Contained: 0
Start time: Tue Feb 12 16:12:51 2008 
Finish time: Tue Feb 12 16:12:51 2008 
True records per second: 1248.00

and the second file oraserver1-System_LOG.txt that will contain events that were found in the system log:

5,WARNING,sdddsm,Sat Jan 19 18:57:41 2008,No User,  1 13 
5,WARNING,sdddsm,Sat Jan 19 18:57:41 2008,No User,  1 12 
5,WARNING,sdddsm,Sat Jan 19 18:57:41 2008,No User,  1 11 
5,WARNING,sdddsm,Sat Jan 19 18:57:41 2008,No User,  1 10 
5,WARNING,sdddsm,Sat Jan 19 18:57:41 2008,No User,  1 9 
5,WARNING,sdddsm,Sat Jan 19 18:57:41 2008,No User,  1 8 
5,WARNING,sdddsm,Sat Jan 19 18:57:41 2008,No User,  1 7 
5,WARNING,sdddsm,Sat Jan 19 18:57:41 2008,No User,  1 6 
5,WARNING,sdddsm,Sat Jan 19 18:57:41 2008,No User,  1 5 
5,WARNING,sdddsm,Sat Jan 19 18:57:41 2008,No User,  1 4 
5,WARNING,sdddsm,Sat Jan 19 18:57:41 2008,No User,  1 3 
5,WARNING,sdddsm,Sat Jan 19 18:57:41 2008,No User,  1 2 
5,WARNING,sdddsm,Sat Jan 19 18:57:41 2008,No User,  1 1 
5,WARNING,sdddsm,Sat Jan 19 18:57:41 2008,No User,  1 0 
5,WARNING,sdddsm,Fri Jan 18 16:25:52 2008,No User,  1 13 
5,WARNING,sdddsm,Fri Jan 18 16:25:52 2008,No User,  1 12 
5,WARNING,sdddsm,Fri Jan 18 16:25:51 2008,No User,  1 11 
5,WARNING,sdddsm,Fri Jan 18 16:25:51 2008,No User,  1 10 
5,WARNING,sdddsm,Fri Jan 18 16:25:51 2008,No User,  0 13 
5,WARNING,sdddsm,Fri Jan 18 16:25:51 2008,No User,  0 12 
5,WARNING,sdddsm,Fri Jan 18 16:25:50 2008,No User,  0 11 
5,WARNING,sdddsm,Fri Jan 18 16:25:50 2008,No User,  0 10 
C:\temp\oraserver1-System_LOG.txt contains 22 parsed events.

At this point I can write a simple script (with Python, of course ;-) that’ll check for the presence of a certain event in the logfile and send an SMS alert to my friendly Motorola Tamagotchi.
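
A minimal sketch of what the checking part of such a script might look like, assuming the comma-separated layout shown in the sample output above (event ID, type, source, timestamp, user, text). The names `LoggedEvent`, `parse_events` and `needs_alert` are made up for this example, and the SMS part is left out because it depends entirely on the gateway you have available.

```python
from typing import NamedTuple


class LoggedEvent(NamedTuple):
    event_id: int
    event_type: str
    source: str
    timestamp: str
    user: str
    text: str


def parse_events(lines):
    """Parse EventCombMT result lines, skipping the summary line and anything malformed."""
    events = []
    for line in lines:
        fields = line.rstrip("\n").split(",", 5)
        if len(fields) != 6 or not fields[0].isdigit():
            continue  # e.g. "... contains 22 parsed events."
        events.append(LoggedEvent(int(fields[0]), fields[1], fields[2],
                                  fields[3], fields[4], fields[5].strip()))
    return events


def needs_alert(events, event_id=5, source="sdddsm"):
    """Return True if at least one event with the given ID and source was found."""
    return any(e.event_id == event_id and e.source == source for e in events)


sample = [
    "5,WARNING,sdddsm,Sat Jan 19 18:57:41 2008,No User,  1 13 ",
    "5,WARNING,sdddsm,Fri Jan 18 16:25:50 2008,No User,  0 10 ",
    r"C:\temp\oraserver1-System_LOG.txt contains 22 parsed events.",
]
events = parse_events(sample)
print(len(events), needs_alert(events))
```

In a real run the lines would be read from the file in the /outdir directory instead of the inline sample.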