With any code upgrade there is always a chance of the microcode getting “confused”. Have there been any recent changes in the way the system is accessed? Have programmers changed any of the system values, or tried to modify any native IBM commands (RSTLIB, etc.) with code of their own? I worked Disaster Recovery for 7 years, and we had certain clients that required us to wipe our Test system 3 times to get it back to a normal state because they had made modifications that IBM does not recommend.
<i>The first time I got the code that indicated there was a bad disk controller.
The second time I got the code that indicated two disk drives were missing.</i>
I assume that when you say you “got the code” you mean an SRC code, that you called IBM hardware (not software) support at that time, and that you have logged two separate support calls.
At this point, ensure that all problem log issues are cleared, all suggested PTFs have been applied, and all corrective actions have been taken. If parts are called out, send the problem reports to IBM through the problem log.
Run DSPPTF to an outfile. Query it for any PTFs with actions that still need to be taken, then perform those actions.
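If you pull the outfile off the system to check it, the filtering step might look like the sketch below. It assumes the outfile has been exported to CSV, and the column names `PTF_ID` and `ACTION_PENDING` (with `Y` meaning an action is still needed) are hypothetical placeholders, not the real outfile field names; substitute whatever your export actually contains. The PTF numbers in the sample are made up.

```python
import csv
import io

# Hypothetical column names: assumes the DSPPTF outfile was exported to CSV
# with one column holding the PTF ID and one holding an "action pending" flag.
PTF_ID_COL = "PTF_ID"          # hypothetical
ACTION_COL = "ACTION_PENDING"  # hypothetical; "Y" = action still required


def ptfs_needing_action(csv_text):
    """Return the PTF IDs whose action-pending flag is 'Y' (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[PTF_ID_COL].strip()
        for row in reader
        if (row.get(ACTION_COL) or "").strip().upper() == "Y"
    ]


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = "PTF_ID,ACTION_PENDING\nSI00001,N\nSI00002,Y\nSI00003,y\n"
    print(ptfs_needing_action(sample))  # ['SI00002', 'SI00003']
```

Anything the function returns is a PTF whose follow-up action (an IPL, a manual step in the cover letter, and so on) has not been done yet.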
If you have HiPers to apply, load and apply them at least a week before the cume. You want to run clean for a few days, and a week is a good length of time. Once the HiPers are applied, check again that no required actions were missed.
Plan to apply the cume at least 6-8 weeks after it becomes available. By then, most issues will have been found by someone else, and the relevant HiPers will be out, so that is when you order, load, and apply them. (And run on them for a week.)
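The timing rules above reduce to simple date arithmetic. This is a minimal sketch, assuming you know the date the cume became available; `plan_cume` and its defaults (6 weeks of waiting, HiPers applied 7 days ahead) are illustrative choices taken from the guidelines above, not an IBM tool.

```python
from datetime import date, timedelta


def plan_cume(cume_available, wait_weeks=6, hiper_lead_days=7):
    """Return (earliest cume apply date, latest HiPer apply date).

    wait_weeks: how long to wait after the cume is available (6-8 suggested).
    hiper_lead_days: how long to run on the HiPers before applying the cume.
    """
    cume_apply = cume_available + timedelta(weeks=wait_weeks)
    hiper_deadline = cume_apply - timedelta(days=hiper_lead_days)
    return cume_apply, hiper_deadline


if __name__ == "__main__":
    cume, hipers = plan_cume(date(2024, 1, 1))
    print(cume, hipers)  # 2024-02-12 2024-02-05
```

Passing `wait_weeks=8` gives the more conservative end of the 6-8 week window.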
The plan should include enough time to deal with problems when the cume is applied. If an SRC code appears, call hardware support. With two previous incidents, “confused” microcode is not going to be an acceptable explanation. The resolution has to be a fix that actually clears up the problem.
Or replacing your disk controller (or whatever is sending the intermittent error report).