If you’re an avid user of Grid Control, like me, then you’ve probably fallen in love with the ability to run OS commands on multiple machines from GC. It’s a great tool!
For example, I often want to know how much space my Archive Logs are taking up – easy! Since we have a very uniform environment, I’m able to query the /oracle/$ORACLE_SID/oraarch mount point and get the results ASAP.
df -h | grep arch
Easy enough, right? Here is the problem: Oracle has hardcoded the shell the Agent uses, and it bites you as your commands get more advanced! The Agent DOES NOT take the shell of the process owner. In our case the owner is the OS user oracle, whose shell is /bin/ksh.
This may not affect you on “df -h | grep arch” but it most certainly will affect you if you get into special characters! Let’s say you want to use Grid Control to see if all of your /var/opt/oracle/oratab files have the autostart parameter set to “Y.” This is a fair test, I think.
From the command line, it’s as simple as…
[oracle: /oracle] grep -v ^# /var/opt/oracle/oratab
Let’s create an OS Command job that is exactly what we did from the command line…
Here is the output if you do the same thing in Grid Control… what gives??
The interpreter is hardcoded as /bin/sh.
On Solaris this is the Bourne shell. On Linux this is the Bash shell.
I verified that the interpreter is /bin/sh and that this accounts for the differences between Linux and Solaris.
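You can verify that a couple of different ways. In the session below, the second grep works because this shell needs the pattern in double quotes before it will take the special characters literally (in the Bourne shell, an unquoted ^ is an old synonym for the pipe character). If you copy that quoted command into an OS Command Job it’ll work just fine.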
[oracle:<AD1> /oracle] echo $SHELL
[oracle:<AD1> /oracle] sh
[$LOGNAME:<$ORACLE_SID> $PWD] grep -v ^# /var/opt/oracle/oratab
[$LOGNAME:<$ORACLE_SID> $PWD] grep -v "^#" /var/opt/oracle/oratab
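The quoting fix can be sketched as a pair of small shell functions. This is a minimal sketch, not anything Grid Control ships: the function names are hypothetical, and the second one simply assumes the Agent launches commands with /bin/sh -c, as the ps output in this post suggests.

```shell
# Hypothetical helper: print the non-comment lines of an oratab-style file.
# The pattern is single-quoted so that no shell treats ^ or # specially.
active_oratab_lines() {
    grep -v '^#' "$1"
}

# The same quoted command pushed through /bin/sh -c, to mimic how the
# Agent appears to launch OS commands (an assumption based on ps output).
active_oratab_lines_via_sh() {
    /bin/sh -c 'grep -v "^#" "$1"' sh "$1"
}

# Example call (Solaris oratab location used in this post):
# active_oratab_lines /var/opt/oracle/oratab
```

Both functions should return the same lines whether /bin/sh is the Bourne shell or Bash, because the pattern never reaches the shell unquoted.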
Alternatively, I created an OS Command Job that runs “ps -ef | grep rman” and then quickly ran the same command on the target host to catch the job while it was executing.
[oracle:<AD1> /oracle] ps -ef | grep rman
oracle 8167 8163 0 11:05:52 ? 0:00 grep rman
oracle 8163 8161 0 11:05:52 ? 0:00 /bin/sh -c ps -ef | grep rman
You can see how the Agent is executing the command: it wraps the whole thing in /bin/sh -c.
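Another way to confirm this is to have the command report its own interpreter. The following is a minimal sketch under the assumption that the Agent really does launch commands with /bin/sh -c; both function names are hypothetical.

```shell
# Hypothetical helper: report the shell running this code and the
# login shell of the user. Paste the two echo lines into a job to compare.
report_interpreter() {
    echo "interpreter: $0"
    echo "login shell: ${SHELL:-unset}"
}

# Mimic the Agent's launch method. With -c and no extra arguments,
# $0 is the name the shell was invoked with.
run_like_agent() {
    /bin/sh -c 'echo "interpreter: $0"'
}
```

If the two lines printed by report_interpreter disagree inside a job, the job is not running under the owner's login shell.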
Lucky for me, I was executing something simple and harmless here, but imagine if I had been trying to edit a configuration file across my environment. When I found this error, I was doing just that! Fortunately, it was just my crontabs and I could (relatively) easily go back and fix them. I was very, very lucky it wasn’t something more serious.
I’ve been working with a great Oracle Support Rep, Peter, but unfortunately he gave me some bad news on Wednesday regarding this SR:
The enhancement request has been rejected. It was suggested that script version of host command be used instead and specify which shell interpreter you want to use in the “Interpreter” field.
Don’t think there is anything more I can do on this.
You OK for me to close this SR now?
But I don’t go down without a fight. I asked for an escalation of this issue.
I think this is a very important issue and I think Oracle development needs to recognize it and FIX IT. It’s easy to put a note in the Solaris SPARC README saying “oh, remember that this OS uses the Bourne Shell while Linux defaults to the BASH Shell.” But that would be a horrible remedy, in my humble opinion. That’s a band-aid, not a solution. This is a problem that needs a solution.
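In the meantime, for anyone who wants to try the script route Support suggested, here is a minimal sketch of the oratab autostart check written as a script body. It assumes the job's “Interpreter” field is set to the shell you actually want (for example /bin/ksh) and that a POSIX awk is available (on Solaris that may mean nawk rather than the default awk). The function name is hypothetical.

```shell
# Hypothetical helper: print the SIDs in an oratab-style file whose
# autostart flag (the third colon-separated field) is not "Y".
# Comment lines and lines with fewer than three fields are skipped.
sids_without_autostart() {
    awk -F: '$0 !~ /^#/ && NF >= 3 && $3 != "Y" { print $1 }' "$1"
}

# Example call (Solaris oratab location used in this post):
# sids_without_autostart /var/opt/oracle/oratab
```

Because the regular expression lives inside a single-quoted awk program, it never depends on how the shell treats ^ or #.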