Posted by: Ryan Shopp
In part one we talked about data collection and the basic functionality offered by Performance/Capacity/Availability Management for Data Center Automation. In part two we looked at how some vendors are taking this collected data and extending its effectiveness in the name of proactive, even predictive, analysis while attempting to drive false-positive issue identification down to zero. Now, how do we take this information and apply it back to the Data Center Automation Blueprint we've been working on?
Two distinct entities are currently combined within the Performance & Availability functional category:
- Reactive events/alarms/alerts: something has already happened, and single-pane-of-glass consoles, event correlation, or root cause analysis tools attempt to sort through it all automatically, weeding out less important concerns and false positives.
- Proactive determination: we are attempting to identify issues before they happen. Many of these proactive tools also feed their event/alert/alarm information up to the reactive single-pane-of-glass consoles.
Both areas are necessary, and I believe that with further automation they will continue to consolidate. So I believe they should stay together as one entity, and we should keep pushing vendors to integrate them further. What we really want is a unified list of events that are 100% accurate, detailed instances of 1) things that will go wrong very soon and 2) conditions or concerns that are upon us right now due to unforeseen or uncontrollable circumstances. A term I have seen used in the past, mostly in the service provider space, for these conjoined areas is “service assurance.” I really like this term, as it's all about assuring that our data center is providing the business services we have come to expect from it. Maybe I'll use it in the blueprint going forward.
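To make the idea of a single unified event list a little more concrete, here is a minimal sketch, assuming a hypothetical `Event` record with a `Kind` of either `PREDICTED` (proactive) or `ACTIVE` (reactive) and a hypothetical `unify()` helper. None of these names come from any vendor product. Because proactive tools often forward their alerts to the reactive console, the same event can arrive twice, so the sketch drops duplicates before ordering by urgency.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Kind(Enum):
    PREDICTED = "predicted"  # proactive: expected to go wrong soon
    ACTIVE = "active"        # reactive: condition is already upon us


@dataclass(frozen=True)
class Event:
    service: str
    kind: Kind
    description: str
    minutes_until_impact: int  # 0 for active conditions


def unify(reactive: Iterable[Event], proactive: Iterable[Event]) -> List[Event]:
    """Merge both feeds, drop duplicates reported by both, and order by urgency:
    active conditions first, then predicted issues soonest-first."""
    seen = set()
    merged = []
    for event in list(reactive) + list(proactive):
        key = (event.service, event.kind, event.description)
        if key not in seen:
            seen.add(key)
            merged.append(event)
    return sorted(merged, key=lambda e: (e.kind is not Kind.ACTIVE, e.minutes_until_impact))


# Hypothetical example: the storage prediction was forwarded to the reactive console too.
reactive = [
    Event("email", Kind.ACTIVE, "SMTP queue stalled", 0),
    Event("storage", Kind.PREDICTED, "LUN 90% full", 120),
]
proactive = [
    Event("storage", Kind.PREDICTED, "LUN 90% full", 120),
    Event("web", Kind.PREDICTED, "latency trending up", 30),
]
service_assurance_view = unify(reactive, proactive)
```

Sorting on the pair (not active, minutes until impact) puts everything already happening ahead of anything merely predicted, which mirrors the two-part list above.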
One other area whose continued convergence into this “service assurance” category I encourage, and expect, is not performance- or outage-related situations but security and privacy events. When an abnormality caused by a worm results in an outage or degraded performance, there is no reason its metrics should not be correlated with the performance and availability data instead of sitting in today's separate silos. But that is a whole other novel.
So with that said, I'm planning to update the Data Center Automation Blueprint to relabel Performance & Availability as Service Assurance (Performance, Capacity & Availability). Now, what about analytics? That's the next topic to tackle in part 4 of this series.