We are running Windows Server 2003 and recently replaced our old hubs with 3Com switches, and our old Cat-5 cabling with fiber and 3Com transceivers. For the last month, our server has gone down intermittently (4 out of 5 days a week) at about 8 AM. We are a school system, and that is roughly when most of our staff of 75 try to log on. If we do nothing, the system comes back up within an hour or so; if we power-cycle the 2816-SFP Plus switch, it comes back up in about 15 minutes. We have replaced the switch and the problem persists. 3Com tech support suggests that we have a broadcast domain problem: when traffic is heaviest first thing in the morning, the switch becomes "confused," and after a while it sorts itself out. We are at a loss as to how to resolve (or even properly troubleshoot) this issue. Any help you can provide would be greatly appreciated!
Asked: May 4, 2005 5:30 PM
Updated: May 17, 2005 12:32 PM
I agree that the answer given by 3Com is “less than helpful.” I have experienced similar issues, though not with a 3Com switch: when a large number of clients logged on, the network would “crash.” Using a packet sniffer (WildPackets’ EtherPeek worked for us, although there are many good programs out there), we determined that several printers on our TCP/IP network were still running IPX/SPX, a remnant of our Novell days. They were being switched on at about the same time and were causing the congestion. Disabling the IPX protocol on them solved the problem.
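If your sniffer can export raw frames, the check for leftover IPX traffic is easy to script. The sketch below is a hypothetical helper (the function name is ours, not part of EtherPeek or any other tool) that classifies one Ethernet frame by the four framings NetWare used: Ethernet II, Novell raw 802.3, 802.2 LLC, and SNAP. It assumes untagged frames (no 802.1Q VLAN tag).

```python
from typing import Optional

ETHERTYPE_IPX = 0x8137  # EtherType assigned to IPX (used by Ethernet II and SNAP framing)


def ipx_encapsulation(frame: bytes) -> Optional[str]:
    """Return the IPX framing of an untagged Ethernet frame, or None if it is not IPX."""
    if len(frame) < 16:
        return None  # too short for a 14-byte Ethernet header plus two payload bytes
    type_or_len = int.from_bytes(frame[12:14], "big")
    if type_or_len >= 0x0600:
        # Values of 0x0600 and above are EtherTypes (Ethernet II framing).
        return "Ethernet_II" if type_or_len == ETHERTYPE_IPX else None
    # Otherwise the field is an 802.3 length; inspect the first payload bytes.
    payload = frame[14:]
    if payload[:2] == b"\xff\xff":
        return "802.3 raw"  # Novell raw 802.3: IPX header (checksum 0xFFFF) follows directly
    if payload[:2] == b"\xe0\xe0":
        return "802.2"  # LLC header with DSAP/SSAP 0xE0 (NetWare)
    if payload[:2] == b"\xaa\xaa" and len(payload) >= 8:
        # SNAP: DSAP/SSAP 0xAA, control byte, 3-byte OUI, then a 2-byte EtherType
        return "SNAP" if int.from_bytes(payload[6:8], "big") == ETHERTYPE_IPX else None
    return None


# Example: an IPX broadcast in Ethernet II framing, then an ordinary IPv4 frame.
bcast, src = b"\xff" * 6, bytes.fromhex("00a0c9000001")
print(ipx_encapsulation(bcast + src + b"\x81\x37" + bytes(30)))  # Ethernet_II
print(ipx_encapsulation(bcast + src + b"\x08\x00" + bytes(30)))  # None
```

Any non-None result from a printer's MAC address is a candidate for the same fix we applied: turn IPX off on that device.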
The key here is, of course, a network analysis program or device…
You may also wish to limit your broadcast domain by creating VLANs, which also lets you tighten security between those segments with ACLs on the routed interfaces. While this is not foolproof, it does help reduce the “confused” state of the switch, and because you are not routing IPX, any IPX broadcast traffic (if that turns out to be the issue) would stay local to the ports in its VLAN.
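To see why VLANs contain broadcasts, here is a toy model using a hypothetical 16-port layout. A switch floods a broadcast out every port in the same VLAN except the one it arrived on, so splitting one flat VLAN into three cuts the number of ports each broadcast reaches. The model does not cover ACLs or routing between VLANs.

```python
from typing import Dict, Set


def broadcast_recipients(port_vlan: Dict[int, int], ingress_port: int) -> Set[int]:
    """Ports a switch floods a broadcast to: every other port in the ingress port's VLAN."""
    vlan = port_vlan[ingress_port]
    return {port for port, v in port_vlan.items() if v == vlan and port != ingress_port}


# Hypothetical 16-port switch: first one flat VLAN, then split into three VLANs.
flat = {port: 1 for port in range(1, 17)}
split = {port: (10 if port <= 6 else 20 if port <= 12 else 30) for port in range(1, 17)}

print(len(broadcast_recipients(flat, 3)))      # 15: every other port hears it
print(sorted(broadcast_recipients(split, 3)))  # [1, 2, 4, 5, 6]: only VLAN 10
```

In the same way, an IPX printer on port 3 would only load the five other ports in VLAN 10, since nothing routes IPX between VLANs.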
Also, run Ethereal during this window to determine who the top talkers are, and track them down. It is also possible that when a machine is turned on, some malware, spyware, or virus is attempting to call home at the same time.
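Ethereal can save a capture in libpcap format, and the top-talker question then reduces to counting frames per source MAC address. A minimal sketch, assuming a classic libpcap file (not pcapng) captured on Ethernet: by default it counts only frames whose destination has the group bit set (broadcast or multicast), or every frame if `broadcast_only=False`. The function name and file name are hypothetical.

```python
import struct
from collections import Counter
from typing import BinaryIO, List, Tuple

LINKTYPE_ETHERNET = 1


def top_talkers(f: BinaryIO, n: int = 5, broadcast_only: bool = True) -> List[Tuple[str, int]]:
    """Return the n source MACs that sent the most frames in a classic libpcap capture."""
    header = f.read(24)  # global header: magic, version, zone, sigfigs, snaplen, link type
    if len(header) < 24:
        raise ValueError("file too short to be a pcap capture")
    magic = header[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # written little-endian (microsecond or nanosecond timestamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # written big-endian
    else:
        raise ValueError("not a classic libpcap file (pcapng is not handled here)")
    (linktype,) = struct.unpack(endian + "I", header[20:24])
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"expected an Ethernet capture, got link type {linktype}")

    counts: Counter = Counter()
    while True:
        record = f.read(16)  # per-packet header: ts_sec, ts_frac, incl_len, orig_len
        if len(record) < 16:
            break
        _ts_sec, _ts_frac, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        frame = f.read(incl_len)
        if len(frame) < incl_len:
            break  # truncated final record
        if len(frame) < 12:
            continue  # not enough bytes for destination and source addresses
        dst, src = frame[0:6], frame[6:12]
        # The low bit of the first destination byte marks broadcast/multicast frames.
        if not broadcast_only or dst[0] & 0x01:
            counts[src.hex(":")] += 1
    return counts.most_common(n)


# Usage (hypothetical file name):
# with open("morning.pcap", "rb") as f:
#     for mac, frames in top_talkers(f):
#         print(mac, frames)
```

Capturing from about 7:45 to 8:15 and comparing the top entries against your printers and workstations should narrow things down quickly.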
Also, what do the statistics on the PDC/BDC/DC look like during logon? Is the server taking any large hits in processor/CPU/page file usage, etc.?
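Performance Logs and Alerts on Windows Server 2003 can save counter logs as comma-delimited text, which makes the 8 AM window easy to review afterward. The sketch below assumes such a CSV, with a timestamp in the first column and counter paths as column headers, and returns the samples where one counter exceeds a threshold. The server name, counter, and threshold in the example are placeholders.

```python
import csv
import io
from typing import List, Tuple


def busy_samples(csv_text: str, counter: str, threshold: float) -> List[Tuple[str, float]]:
    """Return (timestamp, value) for samples where the named counter exceeds threshold."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Pick the first column whose counter path contains the requested name.
    matches = [i for i, name in enumerate(header) if counter in name]
    if not matches:
        raise ValueError(f"no counter column matching {counter!r}")
    col = matches[0]
    hits = []
    for row in reader:
        if len(row) <= col:
            continue
        try:
            value = float(row[col])
        except ValueError:
            continue  # blank or non-numeric sample, e.g. a missed collection interval
        if value > threshold:
            hits.append((row[0], value))
    return hits


sample = (
    '"(PDH-CSV 4.0) (Eastern Daylight Time)(240)","\\\\DC1\\Processor(_Total)\\% Processor Time"\n'
    '"05/04/2005 07:58:00.000","12.5"\n'
    '"05/04/2005 08:00:00.000","97.3"\n'
)
print(busy_samples(sample, "% Processor Time", 90))  # [('05/04/2005 08:00:00.000', 97.3)]
```

The same call works for any other counter you logged, such as `Pages/sec`, by changing the `counter` argument.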
Please let us know what happens. If you have tracked it down to a particular machine or machines but cannot determine the cause, let me know and I can give you a BAT file to run on the machine(s) in question. It will show you what is running so that you can cross-reference that against what you know should be running and then track the problem down to its source.
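The BAT file mentioned in the answer is not shown. As a separate, hypothetical sketch of the cross-referencing step only: save the output of the built-in `tasklist /fo csv` command on the suspect machine, list the image names you expect to see, and report anything else. It assumes English-language column headers (`Image Name`); the baseline list and the `winupd32.exe` name below are made up.

```python
import csv
import io
from typing import Iterable, List


def unexpected_processes(tasklist_csv: str, expected: Iterable[str]) -> List[str]:
    """List image names in `tasklist /fo csv` output that are not in the expected set."""
    allowed = {name.lower() for name in expected}  # Windows image names are case-insensitive
    rows = csv.DictReader(io.StringIO(tasklist_csv.strip()))
    running = {row["Image Name"] for row in rows if row.get("Image Name")}
    return sorted(name for name in running if name.lower() not in allowed)


sample = (
    '"Image Name","PID","Session Name","Session#","Mem Usage"\n'
    '"svchost.exe","812","Console","0","4,512 K"\n'
    '"winupd32.exe","1440","Console","0","2,100 K"\n'
)
print(unexpected_processes(sample, ["svchost.exe"]))  # ['winupd32.exe']
```

Anything it reports is worth looking up before assuming it is harmless, especially if it starts at boot.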
TA
CiscoCat6k – CCDP