Question

  Asked: Jul 19 2007   3:55 PM GMT
  Asked by: tbitner


TCP Retransmissions b/c of Checksum Incorrect


Networking, Availability, Bandwidth, Hardware, Routers, Switches, Hubs, Cabling, Cisco, Fault isolation, Network testing, Protocol analysis, Network protocols, Ethernet, IPv4, NetBIOS, TCP

We have a problem where some users will complain of slow application response time (on the LAN) from our ERP server, but others will be fine. After many hours of troubleshooting I've discovered the following:

- Packet sniffs of affected users show TCP retransmissions because the checksum from the server is incorrect.
- Replacing the NIC on the workstation temporarily fixes the errors. Then the problem may start happening on the new NIC, so I swap back to the old NIC and the problem goes away for a while.
- Sometimes the problem clears itself up for certain people after a couple of months.
- There is no similarity between the affected machines: different NICs, drivers, and versions, on laptops and desktops, wired and wireless.
- Affected users have been tested on different segments of the LAN with no success.


Answer Wiki


 RATE THIS ANSWER
0
Click to Vote:
  •   0
  •  0



You said "the Checksum is Incorrect from the server". Did you mean server or client? Just to be absolutely clear!!

If you really meant "server", then you should be looking at the server's configuration & hardware.

If not, can you identify any other common characteristics of the users?


Discuss This Answer



tbitner  |   Jul 19 2007  4:41PM GMT

I accidentally hit reply before I was finished composing last reply. The Retransmission requests are coming from the server stating “Checksum: 0x493a (incorrect, should be 0x4939)”.
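
For reference, here is a minimal Python sketch of the 16-bit ones' complement checksum that a protocol analyzer recomputes when it reports "incorrect, should be ...". It follows the arithmetic described in RFC 1071. The function names are illustrative, and the sample bytes are the worked example from that RFC rather than data from this capture. A real TCP checksum is computed over a pseudo-header (source and destination IP addresses, protocol number, TCP length) followed by the TCP header and payload, which this sketch leaves to the caller.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into the low 16 bits
    return ~total & 0xFFFF


def checksum_is_valid(data_with_checksum: bytes) -> bool:
    """Data that already contains a correct checksum field sums to zero."""
    return internet_checksum(data_with_checksum) == 0


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")  # example words from RFC 1071
    print(hex(internet_checksum(sample)))
```

A receiver (or analyzer) runs the same sum over the received bytes, checksum field included. Any result other than zero means either the data or the checksum field differs from what the sender computed.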

The server is HP-UX 11.11 but will be migrating to 11.23 in the next couple months. From my testing it seems to be something wrong with the server, but we’re DBA-less currently so I’m reluctant to make any changes!

 

jtt555  |   Jul 19 2007  4:58PM GMT

If the OS is Microsoft, you may find this useful:
article - 224829

A possible reason for the incorrect checksum is that your network cards are capable of performing TCP checksum offload. Broadcom and Intel gigabit cards are among those that can offload TCP checksum calculation. Linux enables TCP checksum offload automatically when it is available.

With TCP checksum offload, outgoing packets are captured on the sending host before the card calculates the checksum, so the checksums shown in the capture may not be correct. The checksum actually transmitted on the wire and received by the destination host will be correct.

On Linux, it is possible to disable TCP checksum offload.
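
As a rough sketch of how to check this on a Linux host, the Python below reads the output of `ethtool -k <interface>` and reports whether receive and transmit checksumming are on. It assumes `ethtool` is installed; the interface name is whatever your system uses, and the helper names are made up for illustration. Offload itself is normally switched off as root with `ethtool -K <interface> rx off tx off`, though some drivers mark these features as fixed and will refuse the change.

```python
import subprocess


def parse_offload_features(ethtool_output: str) -> dict:
    """Parse 'name: on|off' lines from `ethtool -k` output into a dict of booleans."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'name: value' line
        words = value.split()
        if words and words[0] in ("on", "off"):
            features[name.strip()] = words[0] == "on"
    return features


def checksum_offload_state(interface: str) -> dict:
    """Run `ethtool -k` for one interface and return only the checksum settings."""
    result = subprocess.run(
        ["ethtool", "-k", interface],
        capture_output=True, text=True, check=True,
    )
    features = parse_offload_features(result.stdout)
    wanted = ("rx-checksumming", "tx-checksumming")
    return {name: features[name] for name in wanted if name in features}
```

If transmit checksumming is on for the machine where the capture was taken, bad checksums on that machine's own outgoing packets are most likely a capture artifact rather than a real fault.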

Of course, it could also be due to any number of conditions, such as hardware failure, corruption of an IP datagram, a router problem, or congestion. Make sure your NIC drivers (server and workstation) are up to date. You may need to configure an NLB setup to make a fatter pipe for your ERP server.
Good luck!

 

tbitner  |   Jul 20 2007  11:28AM GMT

jtt555,

I don’t think the client OS is generating these checksums since I’m seeing retransmission requests from the server. I’m starting to lean towards the server as the culprit from my tests. Can a bad cable cause incorrect checksums, or would it be the server’s NIC?

Thanks

 

Snapper70  |   Oct 12 2007  9:17PM GMT

You might want to verify the duplex setting between the HP and the switch it’s connected to. If the HP is set to 100/full and the switch is set to autonegotiate, then you may have a mismatch, and as load increases you WILL get a lot of runts and retransmissions. The OTHER thing is that some older HPs didn’t seem to run full duplex even if configured that way, although recent models don’t have that issue.

What we HAVE done is to FTP to/from the HP to a high end workstation, and verified the transmission rate. If you’re on a 100 Meg connection, you should get at LEAST 30 Meg from a high end workstation to/from the HP via FTP (use a large file of 50 Meg or more). If your rate is only 5 meg or so, you’ve probably got a duplex mismatch.
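
To turn an FTP test like this into a number you can compare, a small Python helper is enough. This is only a sketch: the function name is made up, and it reports megabits per second, which is an assumption since the post does not say whether "Meg" means megabits or megabytes.

```python
def throughput_mbps(size_bytes: int, seconds: float) -> float:
    """Return the average transfer rate in megabits per second (10**6 bits/s)."""
    if seconds <= 0:
        raise ValueError("transfer time must be positive")
    return size_bytes * 8 / seconds / 1_000_000


if __name__ == "__main__":
    # A 50,000,000-byte file moved in 10 seconds versus 80 seconds.
    print(throughput_mbps(50_000_000, 10))  # 40.0
    print(throughput_mbps(50_000_000, 80))  # 5.0
```

Time the same large file in both directions, since a duplex problem often shows up far more strongly one way than the other.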

If you use autonegotiate on the switches, you should also use autonegotiate on the server; if you hardcode speed/duplex on one side, make sure you set it the same on the other. A duplex mismatch will severely impact performance, but may appear normal under low volume traffic.
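
The rule above can be written down as a tiny Python model. It is a simplification under stated assumptions: both ends support full duplex, and an autonegotiating port whose partner does not negotiate falls back to half duplex, which is the usual behavior when only the speed can be detected. The function names and the "auto"/"full"/"half" labels are illustrative, not any vendor's configuration syntax.

```python
def effective_duplex(local: str, remote: str) -> str:
    """Duplex the local port ends up using; settings are 'auto', 'full' or 'half'."""
    if local != "auto":
        return local  # hardcoded ports use exactly what was configured
    if remote == "auto":
        return "full"  # both sides negotiate and agree on the best common mode
    return "half"  # partner does not negotiate, so the auto side falls back to half


def has_duplex_mismatch(side_a: str, side_b: str) -> bool:
    """True when the two ends of the link would run different duplex modes."""
    return effective_duplex(side_a, side_b) != effective_duplex(side_b, side_a)


if __name__ == "__main__":
    for server, switch in [("full", "auto"), ("auto", "auto"), ("full", "full")]:
        print(server, switch, has_duplex_mismatch(server, switch))
```

In this model the server hardcoded to full against an autonegotiating switch is the only one of the three combinations that comes out mismatched, which matches the 100/full versus autonegotiate case described earlier in the thread.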