IT Performance Management Call for Resources; I have a dream for performance management
Posted by: Ryan Shopp
So in my last posting I called out for links and resources that people recommend to others when it comes to understanding the variety of options and functions for Network & Application Performance Management. After making the request I decided to spend a few minutes looking around. First up for me was a quick trip over to Wikipedia to see what they have on the topic.
On the topic of Network Performance Management, there is a nice write-up on factors that contribute to performance issues: latency, packet loss, retransmission, throughput.
On the topic of Application Performance Management, there were some very in-depth graphs focused on monitoring response time, which I found intriguing.
On the topic of Performance Engineering, I was very surprised not only by a nice write-up of principles and perspectives related to the software development lifecycle, but also by a laundry list of interesting and applicable whitepapers at the bottom.
So at this point I stopped and started pondering: is there a product out there that goes beyond grabbing statistics and reporting on them? Some tools collect data from flows, some collect data from individual resources, some tools set up endpoints that systematically send synthetic transactions to measure response times, etc.
What do I really mean by this? Is there a product that takes a troubleshooting workflow (think Run Book Automation) approach to the different steps involved in diagnosing a performance concern? Here is what I mean…
- Start with monitoring traffic flows for their response time
- Automatically baseline this and when a major deviation occurs go to the next bullet point
- Is this delay limited to one type of traffic, or is it affecting all traffic?
- To find what is causing this anomaly, calculate which points of the infrastructure are traversed by these traffic flows
- Look at each input/output point on the infrastructure (e.g., interfaces) to see if there are errors, retransmissions, etc.
- If no errors, next look at each input/output point on the infrastructure to see if throughput is bottlenecked
- If no bottlenecks, next look at the processors/CPU on each point of the infrastructure to see if that is causing the delay
- If no processor delays, look at…. (etc, etc, etc)
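To make the idea a bit more concrete, here is a rough sketch of the steps above in Python. It is only an illustration under my own assumptions, not how any particular product works: the `Hop` model, the `deviates` and `triage` functions, the three-standard-deviation baseline rule and all of the thresholds are hypothetical, and the "is it one type of traffic or all traffic" step is left out to keep it short.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Hop:
    """One point of the infrastructure traversed by a flow (hypothetical model)."""
    name: str
    error_rate: float   # fraction of frames with errors or retransmissions
    utilization: float  # link utilization, 0.0 to 1.0
    cpu_load: float     # processor load, 0.0 to 1.0


def deviates(history, current, sigmas=3.0):
    """True when `current` is more than `sigmas` standard deviations above the baseline mean."""
    if len(history) < 2:
        return False  # not enough samples to form a baseline
    return current > mean(history) + sigmas * stdev(history)


def triage(history, current, path, err_max=0.01, util_max=0.9, cpu_max=0.85):
    """Walk the checks in order and stop at the first one that finds a suspect."""
    if not deviates(history, current):
        return "within baseline"
    # Ordered the same way as the bullet list: errors, then throughput, then CPU.
    checks = [
        ("errors/retransmissions", lambda h: h.error_rate > err_max),
        ("throughput bottleneck", lambda h: h.utilization > util_max),
        ("processor load", lambda h: h.cpu_load > cpu_max),
    ]
    for label, failed in checks:
        suspects = [h.name for h in path if failed(h)]
        if suspects:
            return f"{label} on {', '.join(suspects)}"
    return "no cause found; continue to next check"


# Example: response times in ms for a flow, and the hops it traverses (made-up values).
history = [100, 102, 98, 101, 99]
path = [Hop("core-sw", 0.0, 0.95, 0.2), Hop("edge-rtr", 0.0, 0.3, 0.1)]
print(triage(history, 250, path))
```

The point of the ordered `checks` list is that each step only runs when the previous one came up empty, which is the run book behavior I am after; a real tool would pull these numbers from flow and device data rather than hard-coded values.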
At this point I think we get the picture. Most products I’m familiar with collect metrics from one, two, or three points of view on the network and roll them up into impressive-looking graphical reports. Then it’s up to the administrator to review each report and do the analysis themselves. As mentioned in previous posts, I’m familiar with Integrien, Netuitive & BMC (ProactiveNet), which perform impressive behavioral baselining to create more intelligent alerts to forward to the event management console, but I’m looking for more here. I want someone to take all the collected data and basically apply root cause analysis/run book automation principles. If someone out there is doing this, please speak up and throw a link to your site down in the comments so I can come take a look.



