So, this morning we found these entries on our syslog server:
Feb 27 17:15:58: %STACKMGR-4-STACK_LINK_CHANGE: Stack Port 1 Switch 2 has changed to state DOWN
Feb 27 17:15:58: %STACKMGR-4-STACK_LINK_CHANGE: Stack Port 2 Switch 3 has changed to state DOWN
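If you want to catch these events automatically, one option is to match the message text wherever the syslog lines land. The following is a minimal Python sketch. It assumes the messages always arrive in exactly the format of the two lines above. The regex and the function name `parse_stack_link_change` are our own and hypothetical, not part of any Cisco tool.

```python
import re

# Matches STACKMGR stack-link messages like the two shown above.
# Assumption: the pattern is inferred from these two sample lines only.
STACK_LINK_RE = re.compile(
    r"%STACKMGR-4-STACK_LINK_CHANGE: Stack Port (?P<port>\d+) "
    r"Switch (?P<switch>\d+) has changed to state (?P<state>\w+)"
)


def parse_stack_link_change(line):
    """Return (switch, port, state) for a stack-link message, else None."""
    match = STACK_LINK_RE.search(line)
    if match is None:
        return None
    return int(match["switch"]), int(match["port"]), match["state"]


if __name__ == "__main__":
    log = [
        "Feb 27 17:15:58: %STACKMGR-4-STACK_LINK_CHANGE: Stack Port 1 Switch 2 has changed to state DOWN",
        "Feb 27 17:15:58: %STACKMGR-4-STACK_LINK_CHANGE: Stack Port 2 Switch 3 has changed to state DOWN",
    ]
    for entry in log:
        print(parse_stack_link_change(entry))  # (2, 1, 'DOWN') then (3, 2, 'DOWN')
```

The two messages describe the two ends of the same physical link: Port 1 on Switch 2 and Port 2 on Switch 3. If a real alert were built on this, both messages would likely be grouped into a single event.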
Naturally, this tells us that a stack link went down. But why, and how? Those questions remain unanswered. Further reading of the Cisco documentation indicated that the only cause for this is a faulty connection, usually a bad stack cable. To confirm that the link was down, we used the following command:
Switch#show switch detail
Switch#  Role    Mac Address      Priority  State
*1       Master  0019.e71f.6a80   10        Ready
 2       Member  f4ac.c14e.3100   1         Ready
 3       Member  04fe.7fc5.2980   3         Ready

         Stack Port Status     Neighbors
Switch#  Port 1     Port 2     Port 1   Port 2
 1       Ok         Ok         2        3
 2       Down       Ok         None     1
 3       Ok         Down       1        None
As can be seen, one link is indeed down: Port 1 on Switch 2 and Port 2 on Switch 3, the same ports reported in the syslog messages. Because we are using cross connection (full redundant connection), the stack did not break, and it is still functioning right now.
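The reasoning above can be checked mechanically. With the stack cabled as a ring, one broken link leaves a chain (here 2, 1, 3), so every member can still reach every other member. Below is a rough Python sketch of that check. It is our own and not a Cisco utility. It assumes the rows of the Stack Port Status / Neighbors table are pasted in exactly as shown, without the header lines. The function names are hypothetical.

```python
from collections import deque

# Rows from the Stack Port Status / Neighbors table above:
# switch, port 1 status, port 2 status, port 1 neighbor, port 2 neighbor.
SAMPLE = """\
1 Ok Ok 2 3
2 Down Ok None 1
3 Ok Down 1 None
"""


def parse_stack_ports(text):
    """Map switch number -> {port: (status, neighbor switch or None)}."""
    stack = {}
    for line in text.strip().splitlines():
        sw, st1, st2, nb1, nb2 = line.split()
        stack[int(sw)] = {
            1: (st1, None if nb1 == "None" else int(nb1)),
            2: (st2, None if nb2 == "None" else int(nb2)),
        }
    return stack


def down_ports(stack):
    """List (switch, port) pairs whose stack port status is not Ok."""
    return [(sw, port)
            for sw, ports in sorted(stack.items())
            for port, (status, _) in sorted(ports.items())
            if status != "Ok"]


def stack_connected(stack):
    """True if every member is reachable over the stack links that are Ok."""
    adjacency = {sw: set() for sw in stack}
    for sw, ports in stack.items():
        for status, neighbor in ports.values():
            if status == "Ok" and neighbor is not None:
                # Treat each working stack link as bidirectional.
                adjacency[sw].add(neighbor)
                adjacency.setdefault(neighbor, set()).add(sw)
    # Breadth-first search from an arbitrary member.
    start = next(iter(adjacency))
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adjacency[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == set(adjacency)


if __name__ == "__main__":
    stack = parse_stack_ports(SAMPLE)
    print(down_ports(stack))       # [(2, 1), (3, 2)]
    print(stack_connected(stack))  # True
```

A second broken link in a three-member ring would isolate at least one switch, and `stack_connected` would then return False. This is the situation the redundant cabling protects against until the faulty cable is replaced.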
We are investigating how to fix the issue without rebooting the switches or causing any downtime. Even 5 minutes of outage is critical for this stack, since it is connected in our server farm.
We have set up a test bed with two 3750 switches and are going to do all possible testing before we even try to fix the current live setup. All findings will be posted.