Posted by: SolarWinds
We will be providing a four-part blog series on troubleshooting network problems. The series will address:
- Part 1: Network Device Performance
- Part 2: Network Device Configurations
- Part 3: Network Traffic and Bandwidth Consumption
- Part 4: IP Address Issues
No matter how carefully planned your network design is, how much redundancy you have built in, or how much you proactively monitor your network, you are bound to have a problem at some point. And when that problem occurs, you need some steps and tools to troubleshoot it so you can minimize the impact on your users.
“The network is slow today” is without a doubt one of the phrases network administrators most dislike hearing. The network has become a dumping ground for problems that originate from servers and applications as often as from the network itself. Thus, one of the biggest jobs of the network administrator is to defend the network from being labeled the cause of today’s problem. Because slow performance is often first (and often incorrectly) attributed to the network, rapid problem identification and isolation are a critical part of the administrator’s workload.
But what causes a “slow network”? Most network performance issues can be attributed to one of four broad categories: device performance, device configurations, traffic/bandwidth, or IP address issues. Today, I will provide you with some basic tools and tips for troubleshooting network device performance issues.
Baseline Network Performance
Hopefully you have performed a baseline of your network performance so you know the normal working conditions of your network infrastructure. This baseline can then be used for comparison to catch changes that could indicate a problem, provide early indicators that application and network demands are pushing near the available capacity, and align network performance baselines with service-level agreements (SLAs).
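As a rough illustration of comparing a new reading against a baseline, here is a minimal Python sketch. It assumes the baseline is summarized as a mean and standard deviation, and it flags any reading more than k standard deviations away. The sample values, the function names, and the choice of k = 3 are hypothetical and not taken from any particular monitoring product.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical samples as (mean, sample standard deviation)."""
    return mean(samples), stdev(samples)


def deviates_from_baseline(value, baseline, k=3.0):
    """Return True when value is more than k standard deviations from the mean."""
    avg, sd = baseline
    return abs(value - avg) > k * sd


# Hypothetical CPU utilization samples (percent) gathered during normal operation.
history = [22, 25, 24, 23, 26, 24, 25, 23]
baseline = build_baseline(history)

print(deviates_from_baseline(27, baseline))  # within the normal range
print(deviates_from_baseline(70, baseline))  # far outside the normal range
```

In practice you would probably keep separate baselines per device, per metric, and per time of day, since "normal" at 3 a.m. rarely matches "normal" at midday.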
If you haven’t established a baseline, then you will need to rely on your equipment vendors and their recommended or “best practice” thresholds. You can also use various network equipment or monitoring forums to see what other IT professionals are doing.
Collect Network Device Performance Metrics
Network device performance metrics provide information about the system resources on each individual device. These metrics are critical in ascertaining whether resource overuse is a central cause of a reduction in performance. Collecting and reporting on these metrics helps the troubleshooting administrator quickly identify whether the device is a source of the problem or the problem lies elsewhere.
Device monitoring using the Simple Network Management Protocol (SNMP) provides a very device-centric view of network conditions. Using SNMP, counters on a device such as a router, switch, or firewall can be measured and forwarded to a network management system for review. This data is useful for understanding performance conditions that are specific to that device. Performance statistics such as CPU utilization, interface/bandwidth utilization, and memory utilization reveal the majority of performance issues encountered in the day-to-day operation of network devices. Dozens of free and commercial tools on the market will let you monitor these device statistics.
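Many SNMP values, such as interface octet counts, are cumulative counters, so a management system has to turn two successive readings into a rate. The Python sketch below shows one way this might be done. It assumes a 32-bit counter that wraps at 2^32 and at most one wrap between polls. The function names and sample readings are hypothetical, and the polling itself is left out.

```python
COUNTER32_MODULUS = 2**32  # a 32-bit SNMP counter wraps back to 0 after 2**32 - 1


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two readings of a wrapping counter.

    Assumes the counter wrapped at most once between the two readings.
    """
    return (current - previous) % modulus


def bits_per_second(prev_octets, curr_octets, interval_seconds):
    """Average bit rate between two octet-counter readings."""
    return counter_delta(prev_octets, curr_octets) * 8 / interval_seconds


# Hypothetical readings taken 60 seconds apart.
print(bits_per_second(0, 1_500_000, 60))   # 200000.0
# The counter wrapped between polls; the delta is still 1000 octets.
print(counter_delta(4_294_967_000, 704))   # 1000
```

On fast links a 32-bit counter can wrap more than once between polls, which this sketch cannot detect; shorter polling intervals or 64-bit counters, where the device offers them, avoid that ambiguity.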
Switch/Router CPU Utilization
Common symptoms of high CPU utilization within your switch or router include:
- High percentages in the show processes cpu command output
- Input queue drops
- Slow performance
- Services such as Telnet, console access, ping responses, or routing updates slow down or fail
- High buffer failures
If you are able to connect to the router, you can use the show processes cpu command (on Cisco routers) to check whether CPU utilization is high due to interrupts or processes.
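The interrupt-versus-process split can be read from the first line of that command's output, where the five-second figure is shown as total/interrupt. The Python sketch below parses that line, assuming it follows the commonly seen Cisco IOS format. The sample line and helper names are illustrative only, and other platforms or software versions may format the line differently.

```python
import re

# Assumed format of the summary line, e.g.
# "CPU utilization for five seconds: 8%/4%; one minute: 6%; five minutes: 5%"
CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu_summary(line):
    """Split the five-second CPU figure into interrupt and process portions."""
    match = CPU_LINE.search(line)
    if match is None:
        raise ValueError("not a CPU utilization summary line")
    total, interrupt, one_min, five_min = map(int, match.groups())
    return {
        "total_5s": total,
        "interrupt_5s": interrupt,
        "process_5s": total - interrupt,  # remainder is process-level load
        "one_minute": one_min,
        "five_minutes": five_min,
    }


def dominant_load(summary):
    """Name whichever portion accounts for more of the five-second load."""
    if summary["interrupt_5s"] > summary["process_5s"]:
        return "interrupts"
    return "processes"


# Illustrative sample line, not captured from a real device.
sample = "CPU utilization for five seconds: 92%/78%; one minute: 85%; five minutes: 80%"
summary = parse_cpu_summary(sample)
print(summary["process_5s"], dominant_load(summary))  # 14 interrupts
```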
Cisco provides two great documents on Troubleshooting High CPU Utilization and Troubleshooting High CPU Utilization Caused by Interrupts.
Switch/Router Memory Utilization
Memory is a limited resource on all network devices and must be controlled and monitored to ensure that utilization is kept in check. A memory allocation failure means either the network device has used all available memory or the memory has fragmented such that the device cannot find a usable available block. The symptoms of memory allocation failure include, but are not limited to:
- A memory-related console or log error message (“%SYS-2-MALLOCFAIL: Memory allocation of 1028 bytes failed from 0x6015EC84, Pool Processor, alignment 0” in the case of a Cisco router)
- Refused Telnet sessions
- The show processor memory command is displayed no matter what command you type on a console
- No output from some show commands
- “Low on memory” messages
- The console message “Unable to create EXEC – no memory or too many processes”
- Router hanging, no console response
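If you collect device logs centrally, the allocation-failure message in the first symptom can be picked apart automatically. The Python sketch below extracts the requested size, the calling address, the memory pool, and the alignment from a message shaped like the example above. The regular expression is an assumption based on that one example, so treat it as a starting point rather than a complete parser.

```python
import re

# Pattern modeled on the single example message quoted in the list above.
MALLOCFAIL = re.compile(
    r"%SYS-2-MALLOCFAIL: Memory allocation of (?P<bytes>\d+) bytes failed "
    r"from (?P<caller>0x[0-9A-Fa-f]+), Pool (?P<pool>[^,]+), "
    r"alignment (?P<alignment>\d+)"
)


def parse_mallocfail(message):
    """Return the fields of a MALLOCFAIL message, or None if it does not match."""
    match = MALLOCFAIL.search(message)
    if match is None:
        return None
    return {
        "bytes": int(match["bytes"]),
        "caller": match["caller"],
        "pool": match["pool"].strip(),
        "alignment": int(match["alignment"]),
    }


log_line = (
    "%SYS-2-MALLOCFAIL: Memory allocation of 1028 bytes failed "
    "from 0x6015EC84, Pool Processor, alignment 0"
)
print(parse_mallocfail(log_line))
```

The pool field is the most useful one for triage, because it tells you which memory pool the failed request was made against.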
Possible causes of memory failure include:
- Memory Size Does not Support the Cisco IOS Software Image
- In Processor Memory (“Pool Processor” on all platforms):
  - Large quantity of memory used for normal or abnormal processes
  - Memory fragmentation problem or bug
  - Memory allocation failure at process = <interrupt level>
  - Memory leak bug
- In Packet Memory:
  - Not enough shared memory for the interfaces
  - Buffer leak bug
  - Router running low on fast memory
For additional detail and troubleshooting steps for Cisco routers, see Troubleshooting Memory Problems.
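The distinction between running out of memory and memory fragmentation can be expressed as a simple check. In the Python sketch below, total_free and largest_free_block stand for figures you would read from the device's memory statistics. The function name, parameters, and sample numbers are hypothetical, and the sketch is only meant to make the two failure modes concrete.

```python
def classify_allocation_failure(requested, total_free, largest_free_block):
    """Classify why an allocation of `requested` bytes would fail.

    "exhausted"  -> not enough free memory in total
    "fragmented" -> enough free memory, but no single block is large enough
    "ok"         -> a large enough block exists, so the request should succeed
    """
    if total_free < requested:
        return "exhausted"
    if largest_free_block < requested:
        return "fragmented"
    return "ok"


# Hypothetical figures, in bytes.
print(classify_allocation_failure(1028, total_free=600, largest_free_block=400))
print(classify_allocation_failure(1028, total_free=500_000, largest_free_block=900))
print(classify_allocation_failure(1028, total_free=500_000, largest_free_block=64_000))
```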
Interface/Bandwidth Utilization
Before you start digging into the gory details of your router interfaces, it is best to simply monitor the overall network bandwidth utilization to determine whether you even have a problem. Numerous open source and free tools from network management suppliers greatly simplify the process of gathering bandwidth utilization data and presenting it in an easy-to-consume graphical format. SolarWinds’ free Real-Time Bandwidth Analyzer is an example of a commercially developed free tool that displays network device interface utilization.
If you determine that you have a problem, then you will want to get detailed information about the interface on your router. On Cisco routers, you can view the information about a particular interface using the “show interfaces” command.
If you identify an interface that has high utilization, you can take the appropriate steps to reconfigure or load balance your system.
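To make "high utilization" concrete, the Python sketch below converts a measured bit rate into a percentage of link speed and lists the interfaces above a threshold. The interface names, rates, link speeds, and the 80 percent threshold are all hypothetical; pick thresholds from your own baseline or your vendor's guidance.

```python
def utilization_percent(bits_per_second, link_speed_bps):
    """Measured bit rate as a percentage of the link's nominal speed."""
    return 100.0 * bits_per_second / link_speed_bps


def busy_interfaces(rates, threshold=80.0):
    """Return {interface: utilization %} for interfaces at or above threshold.

    `rates` maps an interface name to (measured bits per second, link speed in bps).
    """
    busy = {}
    for name, (bps, speed) in rates.items():
        pct = utilization_percent(bps, speed)
        if pct >= threshold:
            busy[name] = round(pct, 1)
    return busy


# Hypothetical measurements: (measured bps, link speed bps).
rates = {
    "GigabitEthernet0/1": (910_000_000, 1_000_000_000),
    "GigabitEthernet0/2": (120_000_000, 1_000_000_000),
    "Serial0/0": (1_400_000, 1_544_000),
}
print(busy_interfaces(rates))
# {'GigabitEthernet0/1': 91.0, 'Serial0/0': 90.7}
```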
Stay tuned for future articles where I will provide similar tips and tools for troubleshooting device configurations, traffic/bandwidth issues, and IP address issues.
By Brad Hale, Product Marketing Principal for SolarWinds. SolarWinds (NYSE: SWI) provides powerful and affordable IT management software to customers worldwide.