Possible Networking Issue

Tags: 3Com, Availability, Bandwidth, Cisco, Dell, Networking
I am not a network guy, but I'm troubleshooting what I think is one or more network problems. 1) Some workstations at my company are slow to save audio files when using an audio editor. Event Viewer sometimes shows the error message "A driver packet received from the I/O subsystem was invalid. The data is the packet," or an Mrxsmb entry with Event ID 3019, which indicates a redirector error. These error messages are not consistent; sometimes Event Viewer is "clean." 2) The application log for the audio application sometimes has the error message "Disk drive or network share is either invalid or cannot be accessed." The workstation may reconnect after 30 seconds but sometimes needs a reboot. I believe these are two separate problems. All affected users are in a newly renovated area of the building, on Ethernet, using a 3Com 3C905C NIC installed in a Dell Precision workstation running W2K. Before moving to the new area, they were connected using ATM without any problems, and they are using the same workstations as before the move. I do not have access to switch logs or networking tools, but I would appreciate any advice that would help resolve this problem, as well as your thoughts on what the problem might be.
ASKED: May 17, 2006  7:54 AM
UPDATED: May 24, 2006  8:55 AM

Answer Wiki


The information you provided really isn’t enough to track down the problem. What kind of switches are you using? What is the distance to the switch? Now that most installations are 100Mbit or even 1Gbit, length limitations are more critical than they were with 10Mbit networks. Is the problem consistent or intermittent?
Are you sure you were using ATM in the old installation? This doesn’t make sense to me.
Do you have any problems saving the files locally? This test will help show if the problem is really reaching the server or something on the machine.
Try pinging the server when the problem occurs. Can you do a capture of traffic during a failure? (Your company may have rules about this.) You can get Ethereal for free. A capture should help explain the nature of a network failure. You also really need to know if there are problems in the switch logs.
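
One way to catch an intermittent failure is to leave a small reachability logger running on an affected workstation. The sketch below is only an illustration: it uses a TCP connect to the SMB port in place of an ICMP ping (an assumption, since raw ICMP usually needs elevated privileges), and the function names, port 445, and the polling interval are hypothetical choices, not anything from the original setup.

```python
import socket
import time
from datetime import datetime


def probe(host, port=445, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outage_windows(samples):
    """Collapse (timestamp, ok) samples into (start, end) runs of consecutive failures."""
    windows = []
    start = last = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts  # first failed probe of a new outage
            last = ts
        elif start is not None:
            windows.append((start, last))
            start = None
    if start is not None:  # still failing when sampling stopped
        windows.append((start, last))
    return windows


def monitor(host, interval=5.0, count=12):
    """Probe host count times, interval seconds apart, and return the samples."""
    samples = []
    for _ in range(count):
        samples.append((datetime.now(), probe(host)))
        time.sleep(interval)
    return samples
```

Passing the samples from `monitor("fileserver.example")` (a placeholder host name) to `outage_windows` gives a list of failure windows that can be compared against the times users report slow saves or disconnects.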

We had some problems similar to yours. The intermittents were driving me nuts because I couldn’t diagnose the problem before it went away. People had trouble saving files, logging onto the domain, staying connected to the exchange server, and getting files from the servers. There were no errors in any of the switch logs. As far as I could see, the network was working fine.
Finally the problem got bad enough, for long enough, for me to track it down. It turned out to be our new Dell Gbit switch. A warranty replacement over spring break seemed to fix everything until the students came back.
When I called Dell support again they were unable to figure out what was happening and insisted I was running some new protocol that was messing up the network. A quick look at my captures didn’t show anything unusual. (If I had had time to look deeper into the captures I would have seen the retries.)
Finally, I discovered a pattern in ping failures to the switch. They would fail for a while, then succeed for a while. I figured out that when they were failing, pinging the router “fixed” my pings for a while. I immediately suspected the CAM table. The Dell tech showed me where to find it and I was surprised to see it was only 1K in size. We have more than 1000 nodes at our main campus.
Later that day I moved our servers over to a slower Cisco switch and the problems went away. My interpretation is that the CAM table overflowed and the switch started silently dropping packets. We are in the process of ordering an HP switch to replace the Dell switch. The CAM table in the HP switch is 16K.
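
To make the table-size arithmetic concrete, here is a toy model of a fixed-capacity address table. It is a hypothetical sketch, not any vendor's implementation: the `CamTable` class and `unlearned_fraction` helper are invented for illustration, and what a real switch does with frames for addresses it could not learn varies by model. The sketch only counts how many nodes fail to get an entry.

```python
class CamTable:
    """Toy fixed-capacity MAC address table (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # MAC address -> switch port

    def learn(self, mac, port):
        """Record mac on port; return False if the table is full and mac is new."""
        if mac in self.entries or len(self.entries) < self.capacity:
            self.entries[mac] = port
            return True
        return False

    def lookup(self, mac):
        """Return the port for mac, or None if it was never learned."""
        return self.entries.get(mac)


def unlearned_fraction(node_count, capacity):
    """Fraction of node_count distinct addresses that do not fit in the table."""
    table = CamTable(capacity)
    missed = 0
    for i in range(node_count):
        # Port numbers are arbitrary here; 48 is just a typical port count.
        if not table.learn(f"node-{i}", i % 48):
            missed += 1
    return missed / node_count
```

With a 1K (1024-entry) table, `unlearned_fraction(2048, 1024)` comes out to half the nodes, while the same node count fits entirely in a 16K (16384-entry) table.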

The point of this long story is to show you can have problems even when all of the network logs say there is no problem. If I had been set up for a capture during one of the intermittent failures I would have found the problem sooner.

Discuss This Question: 10  Replies

 
  • Petroleumman
    Hello, Before you get too involved with troubleshooting, ask yourself a few questions. Do you have any problems accessing the file server across the network? Are your users noticing prolonged logons? Can you save other file types to the file server across the network without delay? If the problem is truly network related, you should experience some trouble performing these functions as well. You noted one error referencing an invalid driver packet. Have you confirmed the drivers used with your application are the latest version? Do you use mapped drives linking the workstations to the file server? How does the application react when saving the audio files locally? Can you save files of the same type successfully from workstations located on other network segments? Troubleshooting network issues is best approached by starting with the most obvious first, then expanding out. Don't ever assume something unless you have checked it first. More often than not, people want to jump right to the routers, switches and other network gear without looking at the obvious. Good luck!
  • Bobkberg
    Hear Hear petroleumman and astronomer!! Both good pointers. I suspect that many people tend to point the finger at the portion of the whole network they understand least, or which is not under their control. I'm assuming that you mean "slow saving of audio files" to a network share. Have you tried this locally? And if so, what were the results? When troubleshooting, one of the things you need to do is to change or eliminate one variable at a time, and then try it again. When you say "they were connected using ATM w/o any problems", do you mean that you had an ATM backbone running LANE? Or did the end stations have ATM adapters and you switched to the 3Com Ethernet NICs? If it's the network (local saving never has a problem), the next place I'd look is to see whether the switches and workstations are using hard-set speed and duplex vs. Auto-Detect. I've had (and read about in this forum) numerous problems that were related to that. Since you say that you don't have access to switch logs or networking tools, I'm assuming that you are not part of the IT group, or the network portion of the IT group. Those folks should be brought into the problem. If they are denying any possible involvement, then show them this post and ask what they think of the various suggestions from the group. It DOES get tiring having everything blamed on the network when the problem is often elsewhere - I get my share of that too, but without a better picture from you of what it all looks like, it's hard to tell. Write back with more or better detail. Bob
  • Idapsdst
    My apologies to all. I'm not trying to make networking people the scapegoat. I work in IT and support all aspects of the audio applications (a client/server based system), including automation systems, local DBs, and live assist. This also includes troubleshooting, configuring, and testing apps, as well as some basic back-end work. The reason I presented the question the way I did is that the Networking group is not being as helpful as I would like. They are blaming the problem on the image, the audio server (where we have found a problem and will resolve it shortly), and the NIC. They say the network is fine and the switch is fine. I'm hearing rumors that it may be a Spanning Tree issue. There are multiple issues in my opinion: 1) Networking (slow saves across the network to the file server). 2) One audio server NIC not configured properly (100/Full instead of 1G/Full). 3) Clients using audio apps have at times experienced the error message "Disk drive or network share is either invalid or cannot be accessed." Disconnects were viewed using the TCP View utility. This occurs on the ATM and Ethernet segments. I am positive workstations were connected using ATM in the past without any problems. Most of the users in the building are still on ATM. We just recently began to move users to Ethernet, and the problem seems to be localized to a group of about 75 users. The problem (slow saves of audio files over the network) isn't every day, or even on every workstation, but it's consistent with the users who moved from ATM to the newly renovated area with Ethernet connections. Users in other parts of the building on Ethernet have not complained; only the users who moved to the newly renovated area are having these problems. Saving audio files locally isn't an issue. Workstations are configured for 100/Full and seem to experience the problem only when saving audio files; with email, internet, and Word there are no real problems. Two audio servers are on Ethernet and one is on ATM.
    The other problem, which has occurred on both ATM and Ethernet (more prominent on Ethernet and mostly localized to the area I explained above), is "Disk drive or network share is either invalid or cannot be accessed." Workstations lose connectivity to the audio server and the audio application writes the error message to the application log. In Event Viewer I've seen the workstation disconnected for 30 seconds to 2 minutes. Sometimes the Event Viewer log and the application error message times are identical, and other times the Event Viewer logs are clean and indicate no problems. Once again, my apologies to all. Thanks for your interest and assistance.
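
Since the application log and Event Viewer only sometimes agree, it may help to line the two sets of timestamps up mechanically. The following is a minimal sketch under assumptions: the timestamps are taken to be already parsed into `datetime` objects, the `unmatched_events` name is hypothetical, and the two-minute tolerance is simply borrowed from the longest disconnect mentioned above.

```python
from datetime import datetime, timedelta


def unmatched_events(app_errors, system_events, tolerance=timedelta(seconds=120)):
    """Return application error times with no system event within tolerance."""
    return [
        t for t in app_errors
        if not any(abs(t - s) <= tolerance for s in system_events)
    ]
```

Application errors that come back from this function are the ones where Event Viewer stayed "clean", which may point at a different cause than the errors that have a matching system event.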
  • Astronomer
    Intermittents are normally tough. Is there any kind of action that fails consistently? Is there a way to hook a malfunctioning workstation directly to the server with a crossover cable, or to use a switch that isn't part of the main network? I noticed you are hard coding the ports for duplex and speed. When you do this you need to make sure both ends are hard coded. I found some connections in our net that were auto-negotiate on one end and hard coded on the other. When I changed them to hard coded (or in some cases auto-negotiate) on both ends, the reliability went up significantly. You should also check the loading on the net. Some switches start dropping packets when the load reaches 60%. If your people are using MRTG, they can graph every port to see if there is a correlation between loading and failures. I still think it would be useful to sniff the traffic to see how it is failing. I haven't done it much here because the network and server people are so tightly integrated, but I have a friend who consistently does traffic sniffing whenever the server group blames the network. In the vast majority of cases he was able to discover what was wrong with the server or workstations. rt
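
For anyone who wants to check port loading by hand, utilization can be worked out from two readings of an interface octet counter, which is roughly what a grapher like MRTG does. This is a hedged sketch: the `utilization_percent` function is a made-up helper, and it assumes a 32-bit counter (as with the standard `ifInOctets` object) that wraps at most once between readings.

```python
def utilization_percent(octets_start, octets_end, seconds, link_bps, counter_bits=32):
    """Percent of link capacity used between two octet-counter readings."""
    # Modular subtraction handles a single counter wrap between readings.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return delta_octets * 8 * 100 / (seconds * link_bps)
```

For example, 2,250,000,000 octets in a 300 second interval on a 100Mbit port works out to the 60% load level mentioned above.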
  • Bobkberg
    At the risk of getting on my soapbox, I've written blogs about this subject... What I'd suggest is to try and get at least one person from every group into a whiteboard (chalktalk for those of you in my age bracket) meeting to lay out the entire picture. Key to this is a NO-BLAME scenario. What you're trying to do is to get a clear handle on the entire situation, and to do that successfully, you'll need data. Who, What, Where, Why, When and How. Draw lots of pictures, take notes, and ask for explanations from each participant as to how they picture the operation (or non-operation) to go. You may be surprised to learn that not everyone has the same picture, and your colleagues may respect you more for trying to learn from them rather than to place blame or demand fixes for things which may or may not be under their control. Bob
  • Petroleumman
    Hello, After reading your most recent post, one thing I suspect is a possible mismatch in link speed between your NICs and your subnet router. You mentioned that your workstation NICs are set to 100/Full. Change your workstations to Auto Detect and set your router's link speed to Auto Negotiate. If one piece of equipment is set to 100/Full and the other is set to anything else, you will in essence create a bottleneck during peak traffic times, such as when saving large audio files. This can also occur when forcing link speeds (100/Full, etc.). Let your equipment negotiate a speed that works best. Good luck!
  • Idapsdst
    Unfortunately I do not have access to the switch, the configs or the logs. We are also getting high "ARPing" on workstations, 80-100% using a Fluke. We have good days and bad days; sometimes it affects a random group of workstations one day and another random set the next. I'm convinced there are several problems: 1) slow saves and poor performance, and 2) disconnecting from the file server. The problem seems to have increased since we went to Active Directory 6 months ago. Networking also put in a new switch last month in the area where the problem is more pronounced. Regarding the issue of disconnecting from the file server, I'm seeing "MRxsmb" with an event ID of 3004 in Event Viewer. I've been able to compile the following about "MRxsmb":
    "Most Microsoft platforms, including Windows 2000, use the Common Internet File System (CIFS) standard to implement file and print sharing. Win2K implements CIFS with an enhanced version of the Server Message Block (SMB) protocol (which explains the "smb" part of mrxsmb.sys). Two kernel mode components initiate and manage remote connections, mrxsmb.sys and rdbss.sys. Together, these components create a remote session, perform the file system operations you request (e.g., open, close, read, or write a file or spool a print job), and terminate the session when you no longer need the resource. When a system encounters a problem connecting to or accessing a remote resource, you see event log warnings and error messages from mrxsmb.sys. In severe cases, mrxsmb.sys crashes with a veritable smorgasbord of stop codes.
    "Mrxsmb writes event log messages when a network is alive and well and when a system has connectivity problems. For example, when you boot a system that claims to be the master browser, Mrxsmb writes event ID 8003 informing you that a new guy on the block attempted to take over the role of master browser and that a browser election has occurred. When you boot a system that is unable to contact a domain controller (DC) or a DNS server, you see multiple messages from Mrxsmb, including event ID 3034 "The redirector was unable to initialize security context or query context attributes" and Event ID 3019 "The redirector failed to determine the connection type." Although event ID 3034 most often indicates a serious problem, the Microsoft article "Error Message: The Redirector Failed to Determine the Connection Type" states that you can safely ignore the event ID 3019 warning message."
    AND...
    ""The redirector was unable to initialize security context or query context attributes." What is a security context? The security context represents the compilation of rights and permissions that a certain account has. At any time, an application runs in a certain "security context" and that could be one for the user that is currently logged on, the "local system" account or a different account (like when an application is started using "run as" mode). Unless it is using the user currently logged in, in order to obtain the "security context", an application needs to be able to connect to a domain controller and obtain the "security token" for that user. Failure to connect to a domain controller would stop the application from obtaining the security context."
    Thanks once again for your assistance!!!
  • Petroleumman
    Hello, Those Mrxsmb errors and/or warnings you're getting do help but are not always diagnostic of the problem. W2K and Windows Server 2003 generate a lot of those types of errors or warnings with failed connectivity issues; however, they are usually downstream errors in response to other problems that are occurring elsewhere. Since you have limited or no access to the network components of your network, my suggestion is to follow the advice given in an earlier post and get your networking team involved. I would compare the configuration of a 'working' Ethernet subnet with the config of your problematic segment and search for clues. Once you have the cooperation of your network team and can access switch and/or router information, maybe then you can post details from that side of the fence, as it doesn't look like we are getting anywhere working just from the workstation side. Good luck!
