Lotus Alive: Lotus Administration for Everyone

Apr 7 2012   11:15AM GMT

IBM OS/400 & Domino 8 Issues



Posted by: David Vasta
Tags:
8.5
Domino
Domino 8.5.X
IBM i
Lotus
OS/400
System i
V5R4
V6R1
V7R1

I have been fighting a problem in my current employer's environment that has been hard to diagnose. The mail servers run on IBM’s Power platform with i5/OS (OS/400) V5R4, and the Domino version is 8.5.1 FP4. With that setup, you would think everything would be just fine. There is plenty of memory, plenty of CPU and about 1.5TB of spare disk space. Everything should be perfect, but it’s not, and you know this because I would not have started out by saying I was “fighting a problem”.

What was the problem?

There are two servers: one primary and one secondary. Most of the users would run on the primary until it crashed, and then, as expected, they would fail over to the secondary. The problem was that when they started to move to the secondary, that server would run up to 300% CPU (which is OK on an LPAR system if you have the CPU) and eventually crash as well, leaving NO mail server, which is a bad thing.

After begging IBM Support for help, I finally got some answers. It seems there is a problem with OS/400 V5R4 that does not make Domino 8.5.X very happy. The issue lies in the way OS/400 does lookups on the files/DBs: OS/400 adds a 4K record to every record looked up. On a server with over 5,000 users and a Domino Directory with 25,000 users and 65,000 groups, that is going to get messy. So when the failover started, the server would go nuts trying to index, compact, fail over and do lookups to make sure everything was right, going into a spiral of doom and eventually crashing. Add 4K to every record, every lookup and every little document while the server is failing over, and you get all kinds of issues.

So not only was my primary down, now my secondary was down as well.

So with both servers down, a few questions come up:

1. Why? and 2. When can we move to Exchange?

Answers:

1. I am working on it and 2. Never!

I am still looking for the IBM article that describes the problem. I will try to contact IBM Support again this weekend to get it; I am sure some of you want to know more.

The fix is simple (I use the term loosely): upgrade the OS to V6R1 or V7R1 and the problem goes away. But in a big IT environment it’s not that easy. So I have spent the past 6 months building new servers and moving them into place to replace the old ones. I am still in the process of doing this today, and things are going well. The secondary server is doing well and the primary is due for a swap-out in the next few weeks.

Now the question is WHY?

Why is this an issue?

Why did IBM fail to fix it years ago and lastly WHY is V5R4 still supported?

I know that in the next few months V5R4 will fall out of support; I think it is later this year. But as some of you have pointed out, this is unacceptable and the question needs to be asked: why did IBM not want to invest the time to fix what I consider one of its most stable and robust OSes (short of OS/2 :-) )?

7  Comments on this Post

  • berntrop
    Phew. I guess we were lucky to have moved to V7R1 almost a year ago, although we recently diagnosed a networking problem. A real kicker for Designer-related performance is to make sure to upgrade ALL files that have an ODS (nsf, ntf, and mail.box) to the latest ODS (you must have Create_R85_Databases=1 in the server notes.ini). Then perform a Compact -ODS -* with the server down to convert everything in one go. The ones that do not upgrade their ODS need to have a new replica made: they were .ns5 databases that had been renamed to .nsf, and compact -c will not reset the bit that keeps the database at the older ODS. It made a huge difference in design refreshes, which had been really slow, and in creating new templates on the server (from 2+ minutes to 8 seconds).
  • David Vasta
    Going to have to look into that. ODS is always something I forget about, but it is rather important. Once the second server gets to V7R1 I am going to tune everything on it: get the applications off to an application server and just let it do mail. We are also getting SMTP off the server, but that is another post. Is there a Lotus doc you can point everyone to?
  • Spitcher
    David, I would hold IBM Support's feet to the fire on this one. I had a few servers on 8.5.X on V5R4 for a while and never experienced this problem. All releases of 8.5.X are supported on V5R4, so they'd best find a solution for this. Having to migrate to 6.1 or 7.1 is really unacceptable. I would pose this question to the Domino/400 mailing list on Midrange.com if I were you. A few IBM'ers who specialize in Domino on IBM i (Walter Scanlan especially; he's probably the most knowledgeable IBM'er on the topic IMHO) help customers there. Send an email to domino400@midrange.com with a link to your post.
  • David Vasta
    Walter is the one who recommended upgrading from V5R4 to V6R1 or V7R1. The OS problem is fixed in these two releases.
  • David Vasta
    I will get the tech note this week and post it as well.
  • Spitcher
    That's unfortunate. But if Walter says it's the only option then I guess that's it. He's the master.
  • David Vasta
    There is a tech note; Walter had to back it up. I trust him, but opinions are not allowed in IT without some basis in fact. I will get the facts and post them.