Adventures in Data Center Automation

Nov 6 2007   5:17PM GMT

Building out a reference model for data center monitoring/automation



Posted by: Ryan Shopp
DataCenter, Network monitoring, Performance management, Security, Systems monitoring, Virtualization

As this blog gets rolling, I’m quickly realizing that I need a graphical reference model, along with key approaches and metrics to refer back to. To get that process started, I’m going to brainstorm some items here and hope to get feedback on areas I shouldn’t forget. I’m not trying to reinvent the wheel, but in my experience with ITIL, FCAPS, TMN, OSS, etc., I still haven’t found a model that is technical enough to capture the essence of the challenges I’m solving without being so technical that I get lost in the weeds. I’ve seen 50,000-foot views and I’ve seen 10,000-foot views, but I’m aspiring to find something at the 30,000-foot level.

Data Center Infrastructure categories:

  • Network Connectivity: Routers, Switches, CSU/DSU, WiFi
  • Network/Application Optimization: Load Balancers, WAN Optimizers
  • Network/Application Security: Firewalls, Intrusion Prevention, Data Leakage
  • Application Servers: Windows, Solaris, Linux, Virtualization
  • Applications: ERP, CRM, Web, Databases, VoIP, Streaming Media (may need to break this down further)

Data Center Automation categories:

  • Performance/Capacity Management – throughput, processor usage, memory usage, latency
  • Event/Fault Management – availability, consolidation of all alerts/messages into a single pane of glass
  • Configuration/Software Management – upgrades, functionality changes, deployment, provisioning
  • Security Management – vulnerabilities, intrusions, leakage
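
As a rough sketch of how the two category lists above could be combined into one reference grid, here is a minimal Python example. The class names, the function name, and the idea of crossing the two lists into a grid are my own assumptions for illustration, not a finished model.

```python
from enum import Enum
from itertools import product


class InfraCategory(Enum):
    """Data center infrastructure categories from the list above."""
    NETWORK_CONNECTIVITY = "Network Connectivity"
    OPTIMIZATION = "Network/Application Optimization"
    SECURITY = "Network/Application Security"
    APPLICATION_SERVERS = "Application Servers"
    APPLICATIONS = "Applications"


class AutomationCategory(Enum):
    """Data center automation categories from the list above."""
    PERFORMANCE_CAPACITY = "Performance/Capacity Management"
    EVENT_FAULT = "Event/Fault Management"
    CONFIGURATION_SOFTWARE = "Configuration/Software Management"
    SECURITY = "Security Management"


def build_reference_grid():
    """Hypothetical helper: one empty list of notes per (infrastructure, automation) pair."""
    return {cell: [] for cell in product(InfraCategory, AutomationCategory)}


grid = build_reference_grid()

# Example note: WAN optimizers report bandwidth savings as a performance metric.
grid[(InfraCategory.OPTIMIZATION, AutomationCategory.PERFORMANCE_CAPACITY)].append(
    "bandwidth savings"
)
```

Each cell starts empty and can later hold metrics, notes, or vendors, which is one possible way to keep the model at a consistent level of detail.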

The first area I’m thinking through is Performance Management: gathering key metrics over time to help identify current or future performance problems that could ultimately cost an enterprise productivity or revenue.

Key Performance Metrics

  • Basic (all components in the Data Center should provide these): Processor Usage, Memory Usage, Throughput, Latency
  • Advanced (will be unique/specific to each Data Center category): Bandwidth savings (e.g., WAN optimization), transaction failures, page faults, etc.

Point of View for Each Metric

  • System-centric – something specific to a Data Center infrastructure category (e.g., processor utilization)
  • Flow-centric – something watching transactions end-to-end at some point in the infrastructure (e.g., a VoIP transaction or a DNS resolution request)
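
To make the two dimensions above concrete (basic vs. advanced, system-centric vs. flow-centric), here is a small Python sketch of how a metric definition might be tagged. All names are hypothetical, and the tier assigned to the flow-centric examples is my assumption, since I haven't decided where those belong yet.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    BASIC = "basic"        # every component should provide these
    ADVANCED = "advanced"  # specific to one infrastructure category


class PointOfView(Enum):
    SYSTEM_CENTRIC = "system-centric"
    FLOW_CENTRIC = "flow-centric"


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    tier: Tier
    point_of_view: PointOfView


# Illustrative catalog only. The ADVANCED tier on the flow-centric entries is an assumption.
CATALOG = [
    MetricDefinition("Processor Usage", Tier.BASIC, PointOfView.SYSTEM_CENTRIC),
    MetricDefinition("Memory Usage", Tier.BASIC, PointOfView.SYSTEM_CENTRIC),
    MetricDefinition("DNS resolution request", Tier.ADVANCED, PointOfView.FLOW_CENTRIC),
    MetricDefinition("VoIP transaction", Tier.ADVANCED, PointOfView.FLOW_CENTRIC),
]


def names_for(catalog, point_of_view):
    """Return the metric names that share a given point of view, in catalog order."""
    return [m.name for m in catalog if m.point_of_view is point_of_view]


system_metrics = names_for(CATALOG, PointOfView.SYSTEM_CENTRIC)
flow_metrics = names_for(CATALOG, PointOfView.FLOW_CENTRIC)
```

Keeping tier and point of view as separate fields means a metric can be filtered along either dimension without forcing the two into one hierarchy.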

The last area to consider is the method by which this information is gathered: proprietary agent, agentless, hardware appliance, leveraging an established vendor’s agent, etc. Certain information may only be available through certain methods, and those methods may or may not be an option depending on the enterprise’s business requirements. I’m going to need a way to organize/categorize the underlying protocols and interfaces (e.g., NetFlow, RMON2, SNMP, WMI, RPC, XML, proprietary) based on business uses.
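
The trade-off described above (some metrics are only reachable through some methods, and some methods may be ruled out by the business) can be sketched as a simple coverage check. This is a minimal Python illustration under my own assumptions: the class and function names are hypothetical, and the sample entries pairing a protocol with a method and a set of metrics are placeholders, not statements about what any real product exposes.

```python
from dataclasses import dataclass
from enum import Enum


class CollectionMethod(Enum):
    PROPRIETARY_AGENT = "proprietary agent"
    AGENTLESS = "agentless"
    HARDWARE_APPLIANCE = "hardware appliance"
    ESTABLISHED_VENDOR_AGENT = "established vendor's agent"


@dataclass(frozen=True)
class Source:
    protocol: str             # e.g. "SNMP", "NetFlow", "WMI"
    method: CollectionMethod  # how the data is gathered
    metrics: frozenset        # metric names this source can supply


def usable_sources(sources, permitted_methods):
    """Keep only the sources whose collection method the business permits."""
    return [s for s in sources if s.method in permitted_methods]


def uncovered_metrics(required, sources, permitted_methods):
    """Return the required metrics that no permitted source can supply."""
    covered = set()
    for source in usable_sources(sources, permitted_methods):
        covered |= source.metrics
    return set(required) - covered


# Placeholder entries for illustration only.
SAMPLE_SOURCES = [
    Source("SNMP", CollectionMethod.AGENTLESS, frozenset({"Processor Usage", "Throughput"})),
    Source("Proprietary", CollectionMethod.PROPRIETARY_AGENT, frozenset({"Page faults"})),
]

# Suppose the enterprise only allows agentless collection.
gaps = uncovered_metrics(
    {"Processor Usage", "Page faults"},
    SAMPLE_SOURCES,
    {CollectionMethod.AGENTLESS},
)
```

Here `gaps` holds the metrics the enterprise would have to give up, or allow another collection method for, under its current restrictions.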

So stay tuned as I work to pull this together over the days ahead. Once I’ve hashed out this model, I hope to provide a taxonomy of vendors and how they map to it. Once we have that in place, it will be time to start going through best practices and methodologies for evaluating vendors against your company’s individual business requirements.

As always, please provide feedback, thoughts, and ideas as we build this out. Note to self: This is currently centered on managing the IP portion of the Data Center, not inclusive of power, space, non-IP storage, etc. Once I get the IP portion down, I hope to extend into those areas.
