Network timeouts

Tags: Network connectivity, Network Timeout, Switches
We've been having occasional network timeout issues with a particular group of users on a very old 10/100 switch. So we upgraded their switch yesterday to a brand new Gig switch with larger throughput capacity. (Have not upgraded the users to Gig NICs yet.) Anyway - It made the problem worse! They are all heavy users of a SQL database, which has been having locked process problems intermittently with these users. Today *all* of these users were locked. We've monitored the network activity on the SQL server, and it is not maxed out. It looks like a latency issue, somehow made worse by the new switch. Any ideas on how to further troubleshoot this issue?
ASKED: January 24, 2008  4:49 PM
UPDATED: March 11, 2008  6:18 PM

Answer Wiki

We had a problem like this where the network switch and router didn’t auto-negotiate at the correct speed.

The other thing to check, if it’s an older building, is the wiring in the walls. If it’s not wired to the CAT5 standard, it will work fine on the slower-speed hubs but won’t auto-negotiate properly with the Gig switch. I have a lot of customers where I have had to go back and re-punch the ends of the installed cables to make them CAT5 compliant. With 10M you don’t have to cross any wires and it will still communicate at the proper speed. At 100M, if the wires aren’t crossed correctly, it will work intermittently.

Also look at the NIC settings on your SQL server and on the workstations that are having the slowdown. See if their advanced settings include any form of checksum offload or hardware offload, and turn them all off.
Even though you have checked for loopbacks between the switches, check again, because this would be the number one thing that would cause this problem. Make sure that there is only one cable from this switch to any other switch.

Discuss This Question: 11  Replies

 
  • Dilbertina
    The NICs and the switch both indicated that they were connecting at 100M, and we had 2 other users with Gig NICs that showed up on the switch as Gig (but they weren't among the trouble users). Is it worth trying to hard-set each switch port and each NIC? Most of the cables are recent, home-made (by our very knowledgeable crew), and strung across cable ladders in a warehouse-type building. I'm pretty sure they are Cat5e compliant. I will double-check that with someone who's been here longer, though. (I'm new: 3 months.) Also, it seems to take a while for the problems to start happening. Everything is fine for about an hour, then the problems get worse and worse. By 3 hours, they're all unable to work. We took the new switch down today to rethink why this is happening without interrupting the workers on the production floor. I am not sure what to do next. The switch is a Netgear ProSafe GS748T. Has anyone heard anything bad about those in general? I've talked to tech support, and they have been no help at all. :-/
  • Nox.freak
    Are you having a network loopback problem? Is your switch showing a high number of collisions? If you are using a managed switch, then you should check the logs/stats.
  • Tbitner
    Is the SQL server connected to this new switch or are you crossing network boundaries to access the server?
  • Dilbertina
    The switch is sort-of managed. It's a "smart switch" with a web console. It can show the amount of traffic per port and the number of errors per port. Traffic looks normal and evenly distributed, and for the short time we had it connected there were no errors. Weird. The old switch was not managed, so I don't have any history to look at. We did initially have a loopback problem (before I posted this thread), but that is solved. The SQL Server is connected to our Cisco-Linksys managed Gig switch in the main server room. There is a Gig uplink cable running from that switch to the new switch in the Receiving dept. This is the same wire, on the same port, that was working before. We just now figured out the login for the server room switch, so today I'm going to take a look at its logs. The IP addressing is all on the same LAN, nothing fancy at all. The weird thing is that any switch EXCEPT our new one seems to work fine. But we only have older 10/100s with one Gig uplink port to choose from as spares. So I suspect we have some type of network problem that only shows up on full Gigabit, or only shows up with the more modern switch that has auto-uplink on every port, etc. And since I can't leave the new one in place for very long, I need a very specific test plan before I hook it back up again. How could I test for another loop if the spare switch does not have auto-uplink and is not managed?
  • Jirvine
    Is there anything you can do after 3 hours to get it back up again, e.g. reboot the switch? If rebooting the switch fixes it, I would suspect the switch. And if the cabling isn't CAT5 compliant, it will still show up as a 100M link; it just may or may not communicate. If you look at the end of the cable, you should see the wires in this order: White/Orange - Orange - White/Green - Blue - White/Blue - Green - White/Brown - Brown. Or, as an alternative, the Orange and Green pairs may be swapped. Both ways are correct for standard CAT5 cables.
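
As a rough illustration of the wire-order check described above, here is a minimal Python sketch. The names `T568B` and `T568A` are the usual labels for the two orders listed in the reply (the reply itself does not name them), and `identify_pinout` and `cable_type` are hypothetical helpers made up for this example. It only compares colour order as read off each connector; it says nothing about cable quality or pair twist.

```python
# Wire colours from pin 1 to pin 8, in the order given in the reply above.
# This order is commonly called T568B.
T568B = ["white/orange", "orange", "white/green", "blue",
         "white/blue", "green", "white/brown", "brown"]

# The alternative order swaps the orange and green pairs (commonly T568A).
_SWAP = {"orange": "green", "green": "orange"}


def _swap_pair(color):
    """Swap orange<->green in a colour name, keeping any 'white/' prefix."""
    prefix, _, base = color.rpartition("/")
    swapped = _SWAP.get(base, base)
    return f"{prefix}/{swapped}" if prefix else swapped


T568A = [_swap_pair(c) for c in T568B]


def identify_pinout(wires):
    """Return 'T568A', 'T568B', or None for a list of 8 colour names."""
    normalized = [w.strip().lower() for w in wires]
    if normalized == T568B:
        return "T568B"
    if normalized == T568A:
        return "T568A"
    return None


def cable_type(end_a, end_b):
    """Classify a cable from the colour order seen at each end."""
    a, b = identify_pinout(end_a), identify_pinout(end_b)
    if a is None or b is None:
        return "miswired"
    # Same standard on both ends is a normal patch cable; one end of each
    # standard is the classic 10/100 crossover cable.
    return "straight-through" if a == b else "crossover"
```

For example, `cable_type(T568B, T568B)` gives `"straight-through"`, while an end with two wires transposed is reported as `"miswired"`.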
  • Buddyfarr
    If it is older cable, look in the wall to make sure that the coating on the cable actually says Cat5 or higher. If so, have you tried swapping out the Linksys switch with the new one? There was a problem before, and now, with the new switch on the end of the route, the problem gets worse (probably because the traffic is higher). If the initial problem was with the Linksys, then maybe that is the root of the problem?
  • Pelle
    Have you run the Windows or UNIX command 'netstat' on your SQL server to look at the number of IP connections? Look for a lot of TIME_WAIT entries, and if you see A LOT, start looking for another application :-) Not really; before looking for new apps, please search for 'time wait' in the MS Knowledge Base or the Linux HOWTOs and follow the recommendations. I did, and it solved my problem well enough . . . until we found a new application and kicked out our old DB-server-killer application :-) This problem will grow if you move to a high-speed switch, because the application will be able to make more DB requests in a short period of time, making the problem bigger. Good luck, and remember: most complex problems are not network problems . . . (only).
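
To make the netstat check above less tedious to repeat, here is a minimal Python sketch that tallies TCP connection states from `netstat -an` output. It assumes `netstat` is on the PATH and that the state appears as its own column, as it does on typical Windows and Linux builds. The function names and the state list are illustrative, not part of any tool.

```python
import subprocess
from collections import Counter

# TCP state names as printed by common Windows and Linux netstat builds.
KNOWN_STATES = {
    "LISTEN", "LISTENING", "ESTABLISHED", "TIME_WAIT", "CLOSE_WAIT",
    "SYN_SENT", "SYN_RECV", "SYN_RECEIVED", "FIN_WAIT1", "FIN_WAIT2",
    "FIN_WAIT_1", "FIN_WAIT_2", "LAST_ACK", "CLOSING", "CLOSED",
}


def count_tcp_states(netstat_output):
    """Count TCP connections per state in the text printed by netstat -an."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only TCP rows carry a state; skip headers, UDP rows, blank lines.
        if not fields or not fields[0].lower().startswith("tcp"):
            continue
        for field in fields[1:]:
            if field in KNOWN_STATES:
                counts[field] += 1
                break
    return counts


def report():
    """Run netstat -an and print the states, most frequent first."""
    output = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    ).stdout
    for state, n in count_tcp_states(output).most_common():
        print(f"{state:<14}{n}")
```

Calling `report()` on the SQL server at intervals (say, every 15 minutes after the new switch goes in) would show whether the TIME_WAIT count climbs as the lockups get worse.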
  • Action CCS
    And after you do all of that stuff posted above, try to determine if there's an environmental problem with the affected workstations. Following the cabling from the problem machines, check to see if it runs on or over a fluorescent light fixture, near an electric motor, etc. Even properly connected CAT5e can have problems if the cabling is physically near an RF source. White noise on the line caused by EM interference will exhibit the symptoms you describe, and the symptoms will be exacerbated by a higher-speed connection. Sometimes it's not the computer system, it's the environment the system is expected to work in. This seems possible or even likely, given that you describe the problem as: 1. Preexisting but worsened by the new switch. 2. Limited to the same workstations in both the old and new hardware installations. 3. Increasing over time. One other thing to think about is PEBKAC. Do your users have Internet access? Is it possible that one of the computers is infested with spyware/malware that's taking up all the bandwidth? BitTorrent will hog bandwidth like mad if it's not configured not to; might one of your users be downloading music? Is someone getting streaming content from the Internet? Look at your switch logs to see if one computer is getting a lot more packets than the others; if so, there's your problem child. This doesn't seem as likely as the environment problem, considering that there are other workstations on the same switch that function properly.
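
The "problem child" check above can be done by eye, but here is a minimal Python sketch for when the per-port counters are copied out of the switch's web console. The dictionary layout, the function name `busiest_ports`, and the factor of 5 are all assumptions made for illustration; nothing here reads from the switch itself.

```python
from statistics import median


def busiest_ports(packets_by_port, factor=5.0):
    """Return ports whose packet count exceeds factor x the median.

    packets_by_port maps a port number to a packet count typed in by hand.
    Idle ports (count 0) are ignored when working out the typical level.
    """
    active = [n for n in packets_by_port.values() if n > 0]
    if not active:
        return []
    typical = median(active)
    flagged = [p for p, n in packets_by_port.items() if n > factor * typical]
    # Heaviest port first, so the likeliest culprit is at the top.
    return sorted(flagged, key=lambda p: packets_by_port[p], reverse=True)
```

A port that shows up here is only a candidate: the uplink port and the port facing a server will naturally carry far more traffic than a single workstation.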
  • Jerry Lees
    You mention auto-uplink on the ports. I assume the old switch doesn't have this feature. I would try to set the port you're using as an uplink as an actual uplink instead of letting it auto-negotiate. It certainly does sound like an auto-negotiation issue, so possibly that is the place to look. I assume the computers on the same switch don't have any issue communicating with each other?
  • TuftyB
    This sort of issue could be caused by one or even many things, but here is my attempt at assisting you.

    It is best practice to nail the uplink ports between switches/routers, so set both ends of the uplink (GBIC) port to 1000/full. How long is the uplink cable? If it is over 100 m, then I’m not surprised you’re getting problems (check for errors on both ends of the link).

    I would suggest that you get some baseline data before you do anything else, i.e.:
    - Ping times to and from various devices, local and remote to the switch.
    - Check the SQL server log.
    - Nominate a ‘test’ PC that has experienced the problem and install a network monitor (e.g. Microsoft's Network Monitor / Ethereal). Don’t try it on the SQL server, because it will capture too much data. (BTW: change the buffer size if you use MS Net Mon, because it defaults to 1MB.)
    - On the ‘test’ PC, check the Event Viewer (eventvwr).

    Do you have a spare cable between the switch and the server room? If not, then how about getting a new one installed and ensuring that it is at least Cat5e standard?

    How about either connecting the new switch to the main server room (if you have a spare cable to the server room) or, if there is no spare cable, daisy-chaining it to the back of the old switch and moving a few of the PCs across to it? Then see if the problem recurs on those moved PCs. If not, move them all across in batches until either the problem occurs or you get all the PCs on the new switch working for a couple of days without a problem. If you get that far, then it would appear to be an uplink issue to the main server room.

    Another idea (if there is a spare cable between the server room and the new switch) would be to connect the server directly to the new switch. This would be a good idea if the majority of users were on this switch, because then the server is ‘local’ to the PCs. If the problem recurs, do the same tests as per the baseline tests; this should provide some pointers to the problem.

    Make sure that you capture some data from your network monitor, even though you may not be able to understand it at the moment; I’m sure that someone can give you some pointers (if I’m around, I’ll try). One last thing to consider: was the old switch really a switch, or was it a hub? Let us know how things progress.
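
One possible way to collect the baseline timings suggested above, sketched in Python. Instead of ICMP ping (which needs raw-socket privileges), it times a TCP connect to a service port. This is a substitute technique, not what the reply literally describes. The host name and port used in the usage note are placeholders, and the helper names are invented for this example.

```python
import socket
import time
from statistics import mean


def time_tcp_connect(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return None


def summarize(samples):
    """Reduce a list of timings (None = failed attempt) to basic stats."""
    ok = [s for s in samples if s is not None]
    return {
        "sent": len(samples),
        "failed": len(samples) - len(ok),
        "min_ms": min(ok) if ok else None,
        "avg_ms": mean(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


def collect_baseline(host, port, count=10, interval=1.0):
    """Take `count` connect timings, `interval` seconds apart."""
    samples = []
    for _ in range(count):
        samples.append(time_tcp_connect(host, port))
        time.sleep(interval)
    return summarize(samples)
```

Running something like `collect_baseline("sqlserver.example", 1433)` from the ‘test’ PC on the old switch, and again at intervals after moving to the new switch, would give comparable numbers; 1433 is the default SQL Server port and is assumed here, not confirmed by the thread.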
  • Dilbertina
    Thank you all for your help! Just wanted to give you an update: I was able to log in to the managed switch upstream and log some activity on the port in question. It logged several Internal MAC errors, so NetGear says the new Gig switch is bad. They replaced it under warranty, but I haven't had a chance to try the new NetGear Gig switch yet. We have an ISO audit coming up, so I've been reviewing quality documentation like crazy for the past 2 weeks. I will let you know what happens when the new switch goes in.
