Shenning |
Ryan - let me talk a bit more about what Integrien Alive does, because we provide far more than just baselining and more actionable alerts. We integrate whatever monitoring data sources a customer has available so that we can provide our insights cross-silo. We can analyze the data across all of a business service’s components, which is critical to troubleshooting. Of course, we do the intelligent, behavioral baselining you spoke about, but the main reason for this is to establish the abnormal precursors to problems, which static threshold-based alerting cannot do. What really separates our solution from the competition is that we allow our customers to set key indicators based on business service performance, end user experience, and/or pure IT metrics.
These key indicators drive our patented problem modeling analytics. When a key indicator is exceeded (either by deviating from its normal behavior or by crossing a preset problem indicator value), our solution creates a model of the building pattern of abnormalities that led to the problem (we refer to this as “Problem Fingerprinting”). This problem model provides a forensics tool - similar to what you are describing above - that allows the Ops team to focus their troubleshooting efforts. The model will pinpoint the silos of the business service where the problem resides.
For example, the model may show that the application server and database tiers are where the abnormal behaviors are occurring when a serious transaction slowdown occurs. The application server and DB server experts are provided with the specific abnormal symptom behaviors that occurred up to an hour before the problem manifested. Armed with this information, the experts can use their deep dive tools to determine the root cause of the problem. To continue my example, it may be found that the database abnormalities are the result of duplicate transactions being submitted by the application logic, so the problem does not reside in the DB tier. The application expert may use a tool such as CA Wily to find the errant code that caused the problem and note this as a regression from a previous build.
The focus provided by the problem modeling significantly reduces the MTTI/MTTR compared with getting representatives from all silo teams on a bridge call. The other benefit is that once a model has been captured, if a similar building pattern of events occurs in the future, the transaction slowdown can be predicted so that it can be addressed before it occurs. The predictive alert that is sent provides complete information on what to look for and how the problem was solved the first time it occurred.
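
To make the idea of recognizing a "similar building pattern" concrete, here is a minimal sketch. It is not Integrien's patented analytics; it assumes, purely for illustration, that a fingerprint is a set of (component, metric) pairs that behaved abnormally, and that similarity is measured with the Jaccard index. The names `match_fingerprint`, `library`, and the 0.6 cutoff are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_fingerprint(current, library, min_similarity=0.6):
    """Return (name, score) of the stored fingerprint most similar to `current`.

    `current` is a set of (component, metric) pairs that are abnormal right now.
    `library` maps a problem name to its stored set of abnormal pairs.
    The name is None when no stored fingerprint reaches `min_similarity`.
    """
    best_name, best_score = None, 0.0
    for name, fingerprint in library.items():
        score = jaccard(current, fingerprint)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= min_similarity:
        return best_name, best_score
    return None, best_score


# Hypothetical fingerprint captured the first time the slowdown occurred.
library = {
    "transaction_slowdown": {
        ("app_server", "heap_usage"),
        ("app_server", "thread_count"),
        ("database", "query_rate"),
        ("database", "lock_waits"),
    }
}

# Three of the four stored symptoms are abnormal again.
building_pattern = {
    ("app_server", "heap_usage"),
    ("app_server", "thread_count"),
    ("database", "query_rate"),
}
print(match_fingerprint(building_pattern, library))
```

A match before the key indicator is exceeded is what would trigger a predictive alert in this toy version; a real product would also have to account for the order and timing of the abnormalities, which this sketch ignores.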
While the troubleshooting workflow approach you speak of would certainly work in individual silos (and I believe is already implemented in some tools like EMC SMARTS for the network silo), the complexity of today’s multi-tier applications and their interdependencies makes this approach unlikely to work across an entire business service. The complexity of managing performance and availability in multi-tier business services requires an analytics-based solution, with data-agnostic algorithms, that can model problems and allow Ops to focus their efforts. Once this focus is provided, the silo-based experts and their troubleshooting workflow tools can do their thing.
Amena |
Ryan, OPNET has an integrated application performance management solution that provides real-time monitoring and global visibility all the way down to local troubleshooting and problem remediation.
ACE Live delivers an end-to-end solution that spans monitoring, measurement, and detection of violations, and then bridges seamlessly into uncovering the root-cause of application performance problems. It provides visibility of all transactions and users across the enterprise, with detailed real-time and historical information about performance, utilization, route quality, ISP performance, and end-user response times.
While ACE Live provides visibility of all users and transactions across the network, OPNET Panorama collects detailed data across all the servers in your application’s environment, then feeds this into our expert analysis engine, to produce real-time dashboards, historical reports and in-depth data views based on events and behavior patterns. Alarming thresholds are established dynamically, automatically adjusting their limits based on historical performance. Forensic ‘snapshots’ capture and archive in-depth data on key events for detailed troubleshooting. Drill-down capabilities identify specific resources, such as CPU, Java/.NET classes or database components, that scale inefficiently. Deep transaction tracing enables a detailed analysis of execution times at the method level, pinpointing statements in the application code that are responsible for performance problems.
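
As a rough illustration of a threshold that adjusts itself from historical performance, the sketch below derives an upper alarm limit from past samples as the mean plus k standard deviations. This is a common textbook approach and an assumption on my part, not a description of how OPNET Panorama computes its limits; the function names and the default `k=3.0` are hypothetical.

```python
from statistics import mean, stdev


def dynamic_threshold(history, k=3.0):
    """Upper alarm limit derived from past samples: mean + k * sample std dev."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    return mean(history) + k * stdev(history)


def is_abnormal(history, sample, k=3.0):
    """True when `sample` exceeds the limit computed from `history`."""
    return sample > dynamic_threshold(history, k)


# Hypothetical response times in milliseconds from a quiet period.
response_ms = [100, 102, 98, 101, 99, 100]

print(dynamic_threshold(response_ms))
print(is_abnormal(response_ms, 104))
print(is_abnormal(response_ms, 150))
```

Because the limit is recomputed from whatever history is passed in, sliding the history window forward makes the threshold follow gradual changes in normal behavior instead of staying fixed.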
OPNET’s ACE Analyst provides the “forensic” analysis that complements ACE Live. ACE Analyst automatically deconstructs individual application transactions to determine protocol delay, error messages, retransmissions, and arrival times. Diagnostic reports pinpoint performance bottlenecks and summarize sources of response time delay, providing actionable recommendations for improving response time (e.g. is application “chattiness” or TCP windowing between a specific client or server causing performance issues?). Hundreds of protocol and transaction level decodes provide code-level visibility into application statements. ACE Analyst even goes a step further than root-cause analysis. Its unique predictive model, created from captured traces, allows the troubleshooter to quickly and easily validate fixes to performance problems before implementation. For example, you can adjust infrastructure and application design parameters (such as bandwidth and application message turns) and immediately see the impact on the application’s response time.
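
The what-if analysis on bandwidth and application turns can be pictured with a deliberately simplified model: response time is roughly transmission time plus one round trip per application turn plus server processing time. This is a back-of-the-envelope sketch under that assumption, not ACE Analyst's predictive model, and every name and number below is hypothetical.

```python
def estimate_response_time(payload_bytes, bandwidth_bps, round_trip_s,
                           app_turns, processing_s=0.0):
    """Simplified response-time estimate in seconds.

    transmission = payload size in bits / bandwidth
    latency      = one network round trip per application turn
    """
    transmission = payload_bytes * 8 / bandwidth_bps
    latency = app_turns * round_trip_s
    return transmission + latency + processing_s


# Hypothetical transaction: 1 MB over an 8 Mbit/s link, 50 ms RTT, 40 turns.
baseline = estimate_response_time(1_000_000, 8_000_000, 0.05, 40)

# Candidate fixes: double the bandwidth, or halve the application turns.
more_bandwidth = estimate_response_time(1_000_000, 16_000_000, 0.05, 40)
fewer_turns = estimate_response_time(1_000_000, 8_000_000, 0.05, 20)

print(baseline, more_bandwidth, fewer_turns)
```

With these made-up numbers the chatty application gains more from cutting turns than from adding bandwidth, which is the kind of comparison a what-if model lets you make before changing anything in production.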
All the pieces described above are tightly integrated with seamless workflows, delivering a complete monitoring and root-cause analysis solution, as you describe in your “dream”. Here’s a link to our website so you can come take a look: http://www.opnet.com/solutions/application_performance/index.html
One final note on the subject of Performance Engineering: OPNET provides a methodology service for establishing a performance engineering practice at your organization, which has been highly successful operationally at a number of our key client sites.