Phantom locks

6735 pts.
Tags:
AS/400
iSeries
OS/400
SAVLIB
Our nightly backups consist of a series of saves to save files, followed by a save of the save file library to tape. For the past 10 days or so the program has been receiving CPF3761 - Cannot use FILE EC#FILE in SAVELIBS. milliseconds after the SAVLIB command is issued. The second-level text reads FILE EC#FILE is locked by another job, or another save or restore operation is in progress for this object or for library SAVELIBS. A WRKOBJLCK OBJ(SAVELIBS/EC#FILE) OBJTYPE(*FILE) OUTPUT(*PRINT) issued just prior to the SAVLIB tells me that no locks exist for the specified object. An ALCOBJ OBJ((SAVELIBS *LIB *EXCLRD)) issued just prior to the SAVLIB has no effect. The only time CPF3761 didn't make an appearance was after I'd deleted the EC#FILE save file from the library (the program will create a save file if one doesn't exist). Reviewing the system logs for the times in question don't show anything that could be locking the save file. The scheduled run time has been changed (twice) to try and avoid anything missed in the logs, to no effect. I'm running pretty low on ideas, so any suggestions would be appreciated.

Software/Hardware used:
iseries, os/400

Answer Wiki

Thanks. We'll let you know when a new response is added.

Managed to leave out the acronym IBM at the end of that first sentence.

Discuss This Question: 29  Replies

 
There was an error processing your information. Please try again later.
Thanks. We'll let you know when a new response is added.
Send me notifications when members answer or reply to this question.

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
  • TomLiotta
    Can you turn object auditing on for EC#FILE to see if ZC or ZR entries show up in the audit journal? Tom
    125,585 pointsBadges:
    report
  • Splat
    DSPAUDJRNE provided these entries:
    ZC DAYEND     EC#FILE    SAVELIBS   *FILE    BACKUP     DAYEND     283615 2010-05-28-03.55.43.392816
    ZC DAYEND     EC#FILE    SAVELIBS   *FILE    BACKUP     DAYEND     283615 2010-05-28-03.55.49.996512
    
    ZR DAYEND     SECONDARIE EC#FILE    *QRYDFN  BACKUP     DAYEND     283615 2010-05-28-03.55.44.766304
    From the timestamp, it's evident that these are from the save to save file SAVLIB for that library. The save to tape SAVLIB command executed at 05:23:40.584984, and the CPF3761 message appeared at 05:23:40.985064.
    6,735 pointsBadges:
    report
  • Splat
    The text editor apparently didn't like the ZR segment. Let me try that one again.
    ZR DAYEND     SECONDARIE EC#FILE    *QRYDFN  BACKUP     DAYEND     283615 2010-05-28-03.55.44.766304
    6,735 pointsBadges:
    report
  • Splat
    ZR DAYEND SECONDARIE EC#FILE *QRYDFN BACKUP DAYEND 283615 2010-05-28-03.55.44.766304
    6,735 pointsBadges:
    report
  • DanTheDane
    Could this be a job-owner issue !! - check JOBSCDE entry parameter USER ; - does this user have proper authority to use program ? - just a quick thought. good luck DanF
    2,555 pointsBadges:
    report
  • Splat
    Dan, the user has the requisite authority; it owns the objects, creates/clears the save files, and isn't glitching on any of the other save files I'm digging through the application vendor's files to see if they're using one of those to drive a process I haven't identified (getting information from the vendor is like pulling teeth at times).
    6,735 pointsBadges:
    report
  • TomLiotta
    For the past 10 days or so the program has been receiving CPF3761... What is the program doing when the message is sent? ...a series of saves to save files, followed by a save of the save file library to tape. Is doing one of the "series of saves" or is it doing the "save of the save file library"? I don't yet think it matters, but the message mentions two possibilities -- "FILE EC#FILE" or "library SAVELIBS". There's a chance that the library isn't quite available yet. Just an additional item to look into. To9m
    125,585 pointsBadges:
    report
  • Splat
    Tom, What is the program doing when the message is sent? It's starting the execution of the SAVLIB LIB(SAVELIBS) DEV(TAP01) ENDOPT(*LEAVE) SAVACT(*LIB) Is doing one of the “series of saves” or is it doing the “save of the save file library”? The save of the save file library. It ran at 05:23:06.763280 this morning; the CPF3761 appeared at 05:23:07.122096. I can't see that the SAVELIBS library is unavailable, though I'll look for anything that may be holding on to it. I tried allocating it *EXCLRD and received no indication that it was locked by any other job. A WRKACTJOB OUTPUT(*PRINT), executing just prior to the above SAVLIB, doesn't show anything that looks as though it would be tying up either the SAVELIBS library or the EC#FILE save file in that library. Thanks for the suggestions.
    6,735 pointsBadges:
    report
  • Rayj1031
    Two thought come to mind. Idea number 1: Add the following MONMSG CPF3761 to your program and see who the culprit is.
     MONMSG MSGID(CPF3761)  EXEC(WRKOBJLCK OBJ(SAVLIBS/EC#FILE) OBJTYPE(*FILE) OUTPUT(*PRINT))                                          
    
    Idea number 2 - if you are on a OS/400 V5R4 or above investigate virtual tape instead of save files. Virtual tape works similarly to save files except that you can save multiple objects and multipel libraries to a single tape. The concept of virtual tape is like haveing a virtual tape library on disk.
    335 pointsBadges:
    report
  • Splat
    Rayj1031, I've got a global monitor (CPF0000 MCH0000) in the CL - it picks up that there's an error, but CPF3761 is a level 20 message so it's not preventing the SAVLIB to tape command from executing. (This morning the SAVLIB command started at 05:23:53 and finished at 05:56:16.) I'd placed a WRKOBJLCK OBJ(SAVELIBS/EC#FILE) OBJTYPE(*FILE) OUTPUT(*PRINT) in the error trap, but to date it's shown nothing. I'll plug a MONMSG in right after the SAVLIB command and see if it produces anything. I'm not familiar with virtual tape (we are on V5R4), but it sounds like something worth looking into after I figure out what's gone wonky here.
    6,735 pointsBadges:
    report
  • Rayj1031
    Even with a Global MONMSG, a specific MONMSG immediatly following the SAVLIB command should take presidence. If the error CPF3761 is issued, the MONMSG immediately following the SAVLIB should create a printed report. If that doesn't work try this:
    SAVLIB SAVLIB LIB(SAVELIBS) DEV(TAP01) ENDOPT(*LEAVE) SAVACT(*LIB)
    MONMSG CPF3761 EXEC(DO)
      CHGJOB OUTQ(MYQ) LOG(4 00 *SECLVL) LOGCLPGM(*YES)       
      WRKOBJLCK OBJ(SAVLIBS/EC#FILE)              
      SNDPGMMSG MSG('CPF3761 was issued')    TOMSGQ(*HSTLOG) 
      WRKACTJOB OUTPUT(*PRINT)
      anything else you would like to know .....
    ENDDO    
    ...next step in program.
    
    The SNDPGMMSG will put a marker in your history log as well. I often use this technique to find thing in the history log. Also I suggest you add the following line to the beginning of the program if you haven't already: CHGJOB LOG(4 00 *SECLVL) LOGCLPGM(*YES) This provides the most error information from the job log. -- different subject- Virtual Tape works great at V5R4. - The advantages of using Virtual Tape is that you can save multiple libraries to a single virtual tape cartridge and you can make your virtual tape cartridges the precise size of your real cartridge so that when you copy your saved tape imges to media they fit on the cartridge. If you have used virtual CD images to install PTF's or release upgrades, then Virtual Tapes are very similar.
    335 pointsBadges:
    report
  • TomLiotta
    MONMSG CPF3761 EXEC(DO)
    There's not much point in monitoring for CPF3761 since it won't be caught. It's not sent as an *ESCAPE (nor *STATUS nor *NOTIFY) message. The CPF3761 would need to be received from the program (or job) message queue to see if it exists. That also implies that its attributes need to be carefully examined to ensure that it wasn't sent from any prior SAVxxx in the same job. I tried allocating it *EXCLRD and received no indication that it was locked by any other job. Did you try *EXCL instead? An *EXCLRD lock won't prohibit all locks by other jobs. Tom
    125,585 pointsBadges:
    report
  • Splat
    Rayj1031, I plugged in the MONMSG for the CPF3761 with a WRKOBJLCK to print. Unfortunately, Tom is right - nothing was produced. The job description is set to (4 00 *NOLIST), so I've been getting job logs (the error trap generates a program dump), I just can't figure out what in blazes is going on from what I'm reading. Tom, the system wouldn't allow me to put an *EXCL lock on a library: CPF0979 (Lock *EXCL not valid for object type *LIB.)
    6,735 pointsBadges:
    report
  • TomLiotta
    Tom, the system wouldn’t allow me to put an *EXCL lock on a library: Sorry, I misunderstood. I was thinking that you were trying to allocate the object that was giving you a problem, not the library. (Locking the library *EXCLRD won't make any difference for the problem you're having. That lock status on the library won't interfere at all with other jobs that might use objects inside of the library unless those jobs are creating or deleting objects.) What I was thinking was that you'd put *EXCL against the SAVELIBS/EC#FILE *FILE object to see if you could lock it exclusively. If the attempt failed, then you'd run WRKOBJLCK OUTPUT(*PRINT) inside of the MONMSG DO-group against it to capture the conflicting locks. And if it succeeded, you'd immediately release it with DLCOBJ. Actually, in addition to WRKOBJLCK, I might also do something like WRKACTJOB *PRINT in the DO-group. It might be worthwhile having a clear list of everything that was running at that same point in time. Tom
    125,585 pointsBadges:
    report
  • Rayj1031
    OK. I think I see the question. The object does not get saved but the error is not an escape message so MONMSG ignores it, right? Change the job logging level in the program just before you do the savelib and see if you get any additional information.
    CHGJOB LOG(4 00 *SECLVL) LOGCLPGM(*YES) 
    
    If the log doesn't provide any more information I have one other trick that I have used. I have captured this kind of information using two jobs, where one job monitors the other jobs message queue and when the message appears in the first job's message queue, the second job does the wrkobjlck command. Its kind of clunky I have successfully captured similar, (but not the same) information. Let me know if you are interested in the details. It is an interesting bug.
    335 pointsBadges:
    report
  • Splat
    Tom, surprisingly it seems the system objects to using the ALCOBJ command against a save file, resulting in a CPF0984. Rayj1031, I'm getting both job logs and program dumps. The problem is that I can't make any sense out of the information provided - the diagnostic message CPF3761 indicates the save file is locked, but a WRKOBJLCK shows nothing and a WRKACTJOB shows no identifiable conflicting jobs. Getting tired of making dents in the wall with my head, I called IBM. According to the guy I talked to the system has an aborted save logged (dunno where that came from - we haven't had any aborted saves) which is triggering the 'lock' on that particular save file. He said an IPL should clear it out. Since we've got a system save planned for this weekend, I'll find out whether an IPL sets matters to the right come Monday morning.
    6,735 pointsBadges:
    report
  • TomLiotta
    ...surprisingly it seems the system objects to using the ALCOBJ command against a save file, resulting in a CPF0984. I agree. I guess I've never tried it before. ...the system has an aborted save logged... Sounds plausible. If so, I'd wonder if it could be deleted/recreated just to see the result. OTOH, if the IPL clears it up, that's also worth knowing. If it clears up, I suggest that you log the 'Answer' for your question. I also suggest that a Design Change Request (DCR) be entered to get some kind of command, API or tool created that could unlock the condition in order to avoid having to wait for an IPL that might be weeks away. Tom
    125,585 pointsBadges:
    report
  • pdraebel
    I remember having similar after an ENDJOBABN being issued to a Backup job. Seems that after that event a system object QSR.LOCK doesn't get cleaned up. The solution was either an IPL of the Following command : DMPSYSOBJ OBJ(QSR.SRLOCK) CONTEXT(QSYS) TYPE(0E) SUBTYPE(EF) Hope this helps, Peter
    2,570 pointsBadges:
    report
  • Splat
    Peter, I went through the DMPSYSOBJ OBJ(QSR.SRLOCK) CONTEXT(QSYS) TYPE(0E) SUBTYPE(EF) with. The primary associated space showed 00000001 in the first column of offset 000000, and zeros everywhere else, so it didn't give us any good direction. Yours isn't the only mention about an ENDJOBABN being issued against a save process; I'm a bit concerned because I should be the only one with the knowledge and authority to do such, and I can say with some confidence that I didn't.
    6,735 pointsBadges:
    report
  • Splat
    Managed to leave 'IBM' off of the end of that first sentence.
    6,735 pointsBadges:
    report
  • Splat
    Update. IPLed the system on Sunday, with a RCLSTG tossed in for good measure - no problems with the RCLSTG or the system save. Checked the job log for the nightly backup on Monday; the problem persists. Contacted IBM & sent the job log, program dump, etc. for them to look over. To ensure I had a backup of the file library in question, I added code to create a second save file (EC#FILE_1) and save the file library a second time. I figured this way I could be reasonably confident that I had something to fall back on if things hit the fan. Looking over the job log this morning, it wasn't surprising to see the familiar CPF3761 message. What was surprising was to see that it now read Cannot use FILE EC#FILE_1 in SAVELIBS. Sent the job log, program dump, etc. so IBM can puzzle over this development. Peter, thanks for posting the DMPSYSOBJ command; having it here has saved me some scrounging.
    6,735 pointsBadges:
    report
  • Rayj1031
    I would be interested in the resolution , however after loosing that much blood on the wall from beating my head against it, I would at least consider virtual tape. Start by saving one library to one virtual tape cartridge. Then copy the virtual tape image to a real tape cartridge using the DUPTAP command. Then delete your test library then restore the test library from tape. Once you have done that exercise you will understand and trust virtual tape. I realize that this may seem like a significant change, but try virtual tape with a single TEST library then analyze how much work it would be to change your current backup process. It might surprise you. It might surprise me. ?? Another advantage is that virtual tape images can be FTP's to another server for archival.
    335 pointsBadges:
    report
  • Abigail
    Try: ===> SAVLIB LIB(SAVELIBS) DEV(TAP01) ENDOPT(*LEAVE) SAVACT(*SYNCLIB) SAVACTWAIT(10) SAVACTMSGQ(QSYSOPR)
    645 pointsBadges:
    report
  • Splat
    Abigail, I tried the save to tape with SAVACT(*SYNCLIB). Unfortunately it didn't make any difference - the program still fell over with the CPF3761. IBM claims the problem lies with the iTera HA software. I've given Vision the information and am waiting to hear back.
    6,735 pointsBadges:
    report
  • TomLiotta
    ...HA software. This is a HA environment? Well, in that case, that's the first place to check. for future reference, that's an element that should be mentioned in the beginning for many (unexpected/unexplained) lock-related issues. Vendor contact is the best step. Tom
    125,585 pointsBadges:
    report
  • Splat
    Tom, you're right, I should have made note of it. I didn't mention it because I couldn't see how it was related in any way; I've been able to trace its effects elsewhere, so I assumed (mistakenly) that as I couldn't see any interaction there was none. I've been in touch with iTera & will be trying a different approach to the backups. BTW, iTera didn't have an explanation for why WRKOBJLCK didn't show anything.
    6,735 pointsBadges:
    report
  • TomLiotta
    I should have added a 'smiley' to my previous comment. Experience with issues from a lot of customer sites makes me perhaps over-sensitive to HA environments. There's nothing wrong with them; they just tend to expose odd bits of the system that few of us experience under normal circumstances. Unexpected/unexplained locks seem to top the list. The vendors almost always seem plenty knowledgeable on recovery or avoidance or whatever is needed. Tom
    125,585 pointsBadges:
    report
  • Splat
    Tom, no need for a sweetener. I'd left out something important. After talking to iTera, I pulled the library out of the replication. I added code following the save to tape to push the objects to the backup machine using an iTera command for objects not being replicated. For the first time in more than a month, the backup ran without a hitch. Thanks to everyone for their suggestions - there are some I still plan to investigate now that I've got the main problem taken care of.
    6,735 pointsBadges:
    report
  • Rayj1031
    If you consider what iTera is doing (watching every object on the system and capturing every change that occurs) an unexpected lock on an object seems very reasonable now that you know what the problem is. :-) iTera is just doing what it does knows how to do, watching everything on the system, kind like a phantom, looking (and apparently locking) things as he stealths around checking for changed objects.
    335 pointsBadges:
    report

Forgot Password

No problem! Submit your e-mail address below. We'll send you an e-mail containing your password.

Your password has been sent to:

To follow this tag...

There was an error processing your information. Please try again later.

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Thanks! We'll email you when relevant content is added and updated.

Following