Adventures in Data Center Automation:

DCAB

Jun 17 2008   10:41PM GMT

Performance and Availability vs. Analytics - Part 5 of 5



Posted by: Ryan Shopp
DataCenter, Analytics, CMDB, DCAB, IT Process Automation, BSM

Finally, the last installment of this 5 part series (which was originally a ? part series). This last segment took a little more time than expected to get to. These days you can find BSM definitions and products all over the place. So the question I’ve been asking myself is: is BSM different than Analytics (as defined in our Data Center Automation Blueprint)?

First up, here are some of the best definitions of Business Service Management:

  • Business Service Management is about enabling IT operations and support staff with empowering information that helps them to understand the impact on the business in business terms
  • Business Service Management is the integration and consolidation of systems management with business management
  • Business Service Management is about understanding the business perspective also known as the “top down perspective”. What value, revenue, cost, churn, ROI, etc. can be associated with the IT services, applications, processes, transactions, etc. being delivered and supported by your IT organization?
  • Business Service Management is an IT operations management software product that links the availability and performance status of IT infrastructure components to business-oriented IT services that enable business processes
  • A Business Service Management metric might look at the dollar impact of server downtime, as opposed to an ITSM (IT-focused) metric that identifies the percent uptime for the same server
  • First-generation BSM solutions offer:
    • a way of defining and describing business processes;
    • discovery (partly manual, partly automatic) of IT service components;
    • mapping (partly manual, partly automatic) of business processes to IT components;
    • adapters to other infrastructure management products;
    • measurement of end-to-end performance for business processes;
    • measurement of the business impact of downtime;
    • analysis of the root causes of incidents resulting in downtime;
    • dashboard views so that selected target audiences can combine relevant information.
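
To make the difference between a BSM metric and an ITSM metric a bit more concrete, here is a minimal Python sketch that reports on the same outages both ways. The `Outage` class, the server name and the revenue-per-hour figure are all made up for illustration; a real BSM product would derive the business value from its own service mappings.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One downtime incident for a server, in hours."""
    server: str
    hours: float


def percent_uptime(outages, period_hours):
    """ITSM-style metric: share of the reporting period the server was up."""
    down = sum(o.hours for o in outages)
    return 100.0 * (period_hours - down) / period_hours


def dollar_impact(outages, revenue_per_hour):
    """BSM-style metric: revenue at risk while the supported service was down."""
    return sum(o.hours * revenue_per_hour[o.server] for o in outages)


# Hypothetical numbers for illustration only.
outages = [Outage("web01", 1.5), Outage("web01", 0.5)]
revenue_per_hour = {"web01": 12_000.0}  # assumed revenue tied to the service on web01

uptime = percent_uptime(outages, period_hours=720)  # 720 hours = a 30-day month
impact = dollar_impact(outages, revenue_per_hour)
print(f"uptime: {uptime:.2f}%  impact: ${impact:,.0f}")
```

Both numbers come from the same two outages. The uptime percentage looks healthy to an operations team, while the dollar figure is the one a business owner would ask about.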

In part 4, I defined the Data Center Automation Blueprint’s analytics category as “a roll-up aggregation view of metrics that are mapped to the business metrics and goals.”

So from what I’ve heard/read etc., Analytics is a major component/subset of BSM…but BSM isn’t specifically or just analytics. BUT, if you combine (from the DCAB) Resource Reconciliation, Process Orchestration and Analytics into one bundle, things are looking very, very close to BSM as I read it. With that said, going forward we will continue to watch/discuss BSM and its specific applicability to the data center.

To jump back to any of the previous topics in this series, follow the links below:

Part one covered data collection
Part two covered applying analytics and business/service mapping to those collection points
Part three covered evolving the Data Center Automation Blueprint from Performance & Availability to Service Assurance.
Part four covered the term analytics and how it’s applied as a standalone category and within categories of the DCAB.

The next action item would be a couple of key updates to the Data Center Automation Blueprint.

Apr 17 2008   9:58PM GMT

Performance and Availability Management vs. Analytics - Part 1 of ?



Posted by: Ryan Shopp
nimsoft, cittio, eg innovations, Alcatel-Lucent, Analytics, Apparent Networks, Brix Networks, Compuware, Entuity, Fluke Networks, Gomez, Groundwork, Hyperic, Indicative, Application monitoring, DCAB, Firescope, HP Software, IBM Tivoli, InfoVista, Integrien, NetScout, Netuitive, Solarwinds, Systems monitoring, BMC, Quest Software, NetIQ, Network monitoring, Packet Design, Performance management, CA, Keynote, NAGIOS, NetQoS, Network Instruments, OpenNMS, Opnet, Xangati, ZenOSS

I’ve had an opportunity to be briefed over the past couple of months by a number of the vendors currently listed in the Data Center Automation Blueprint’s Performance & Availability category (e.g., CITTIO, eG Innovations, InfoVista, Integrien, Nimsoft). With that and some further research, I think I’m ready to take another pass at this area of the blueprint.

First up, all these vendors use a variety of techniques to collect a variety of data from as many points of view as possible.

  • Their own server agents that collect data about systems, services, applications, databases, etc. and then aggregate it back to a centralized console
  • Agent-less centralized consoles that leverage standard infrastructure communications protocols (e.g., SNMP, RPC, ODBC, WMI, SSH, TCP, UDP, HTTP) to query or connect remotely to collect data from networks, systems, services, applications, databases, etc.
  • Passive traffic flow collectors (which can be agents or appliances) that are either in-line with the traffic flows or receive an exact copy of all traffic flows traversing a network connection (e.g., switch port uplink) through hardware vendor capabilities (e.g., spanning)

These data collection points can be statistics about a specific IT infrastructure resource: physical devices, virtual devices, physical connections, virtual connections or resources running on physical or virtual devices like services, processes, applications, databases, etc.

Or the data collection points can be traffic flows or end-to-end specifics including passive traffic flows, synthetic transactions or even something as simple as pinging from remote points.

Metrics that are captured typically revolve around throughput, errors, utilization, latency, up/down status, etc. (there are way too many to mention here).

After saying all this, there is a list a mile long of vendors (a number already noted on the DCAB) that capture these predominately time-series oriented data points about performance, capacity and availability using any/all of these methods or vantage points (I know, passive traffic flows are not time-series data, but patterns/usage/performance etc. can be determined from them).

So, with all that data, what most of these vendors offer are two primary types of functionality: 1) a variety of graphical reports and 2) metric thresholding capabilities that produce a list of outstanding issues/alerts/alarms/events/concerns (whatever you want to call them).
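
As a rough illustration of that second type of functionality, here is a small Python sketch of static metric thresholding. The metric names, limits and sample values are assumptions made up for the example and do not come from any particular vendor's product.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One collected data point for a resource."""
    resource: str
    metric: str
    value: float


# Assumed static limits; real products let you tune these per resource.
THRESHOLDS = {"cpu_util_pct": 90.0, "latency_ms": 250.0, "error_rate_pct": 1.0}


def evaluate(samples, thresholds=THRESHOLDS):
    """Return one alert string per sample that exceeds its metric's limit."""
    alerts = []
    for s in samples:
        limit = thresholds.get(s.metric)
        if limit is not None and s.value > limit:
            alerts.append(f"{s.resource}: {s.metric}={s.value} exceeds {limit}")
    return alerts


# Hypothetical samples from three different technology silos.
samples = [
    Sample("db01", "cpu_util_pct", 97.0),
    Sample("web01", "latency_ms", 120.0),
    Sample("core-sw1", "error_rate_pct", 2.5),
]
alerts = evaluate(samples)
for line in alerts:
    print(line)
```

The output is exactly the kind of flat list of outstanding alerts described above, which is part of why this layer feels like a commodity.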

Ok, so why did I organize and point all this out? So I can draw a line around where most of the innovation, from my perspective, is occurring. The above is, for the most part, a commodity in my eyes these days. Most companies have had collection/reporting/thresholding capabilities spanning multiple technology silos since pretty close to the start of enterprise networking. The reports continue to get fancier, the number of data sources a single product collects from continues to expand, etc. Another sign of commoditization is the variety of business models offering these products: open source, managed service providers, internet distributed products, appliance deployment models and indirect sales forces, large enterprise direct sales forces, completely flexible frameworks for service providers to basically “build their own,” etc.

For the most part, the majority of technical innovation these days is occurring in the next layer above this data collection, reporting and alerting. Now let me say this, yes…there is some great innovation still occurring in the data collection realm (e.g., Xangati offering real-time Netflow down to a user level, Packet Design monitoring routing messages, NetQoS leveraging advanced TCP/IP theory to analyze where end-to-end bottlenecks are occurring). But, for the most part, these new data sources are being used to augment or replace currently deployed data sources, in an attempt to see things from either as many vantage points or the best vantage points and so avoid surprises within each unique enterprise IT environment.

So where is the serious innovation coming from…stay tuned for part 2.


Apr 14 2008   9:45PM GMT

Mapping HP Software to the Data Center Automation Blueprint



Posted by: Ryan Shopp
DataCenter, Analytics, CMDB, DCAB, HP Software, Integrien, Netuitive, GridApp Systems

I recently had the chance to chat with an executive at HP to break down which pieces and parts ended up where after the Peregrine, Mercury and Opsware acquisitions. Here is my attempt at mapping them to the Data Center Automation Blueprint.

  • Configuration & Change
    • for networks - Network Automation Software (formerly Opsware, formerly Rendition)
    • for servers - Server Automation Software (formerly Opsware)
    • for storage - Storage Essentials Software (formerly Appilog)
  • Resource Reconciliation
    • Universal CMDB software (formerly Mercury, formerly AppLogic)
  • Process Orchestration - Operations Orchestration Software (formerly Opsware, formerly iConclude)

The focus of our call was around the above areas…from here I’m trying to piece the rest together using the website and the knowledge that:

  • The Business Service Management group is where all the monitoring products reside: Mercury (excluding QA products) and the original OpenView monitoring products. There still seems to be a ton of overlap here…
  • The IT Service Management group is where Peregrine and the original HP Service Desk products reside.

So that means for the other functional areas of the Data Center Automation Blueprint we have:

  • Analytics
    • HP Dashboard software & HP Business Service Level Management - each offers a unified user interface consolidating reports and statistics spanning multiple other product lines within Performance & Availability to IT Service Desks.
  • Performance & Availability
    • Products that are event/availability centric for the Data Center Infrastructure
      • HP Network Node Manager software - agent-less performance and availability software for networks
      • HP Operations Manager software - agent-based performance and availability software for servers/services/applications/databases.
      • HP Problem Isolation software - agent-less performance and availability software for servers/services/applications/databases.
      • HP Process Monitor software & HP TransactionVision software - agent-based performance and availability software for services/applications/databases
    • Products that are trend/capacity centric for the Data Center Infrastructure
      • HP Performance Insight software - agent-less time series performance and capacity reporting software for networks that also consolidates data for reporting on servers/services/applications/databases
      • HP SiteScope software - agent-less performance and availability software for servers/services/applications/databases
      • HP Performance Manager software & HP GlancePlus software - agent-based time series performance & capacity statistics collected from servers/services/applications/databases.
      • HP Real-User Monitor software - monitors applications/services/data traffic flows
  • Security & Prevention
    • HP WebInspect software - web application vulnerability scanning
      • **NOTE: In my eyes, this is more a security extension to the QA and Testing products from Mercury than part of a security & prevention software portfolio like that of Symantec, McAfee or EMC RSA.

So there we have it (I think). Now please correct me if I’m wrong, but one thing I didn’t see in the portfolio was anything that does proactive performance analytics like Integrien, Netuitive or ProactiveNet (acquired by BMC). Besides that, from an outside perspective they merely have a very confusing Performance & Availability functional category (due to Mercury/OpenView overlap) that does seem to have all the pieces. So for HP Software, it’s just about executing and tying things together based on end-to-end use cases from their customers. One other area to keep an eye on is Configuration & Change for databases (from companies like GridApp). As more and more enterprises deploy the Server Automation Software, they may start wanting to get more detailed in the world of databases; if so, that may be a build/buy decision point to consider in the future. One other thing, based on what I’ve read, is that all these products are busy making sure they extend beyond physical systems support into the virtualized world.

I guess one outstanding thing to ponder is why shouldn’t HP also offer a comprehensive security & prevention offering to help them better compete against IBM? At some point many people assume/expect security and operations to converge, why not help drive that with a comprehensive security offering?


Apr 7 2008   7:08PM GMT

Data Center Automation Blueprint - made a round of updates



Posted by: Ryan Shopp
DataCenter, DCAB

It’s been a few weeks since I’ve taken a pass at the Data Center Automation Blueprint, so it was time for some additions and clean-up. First up I added descriptions for some of the categories and made sure the known vendor list was up to date (e.g., BladeLogic acquired by BMC).

Resource Reconciliation
Description - Automation that captures a complete view of all IT resources, assets, services, etc. and their relationships, layers 1 through 7. This comprehensive view of all IT resources is the “record of truth” and needs to always be 100% accurate. Once in place, this is the hub of information that keeps all other monitoring and management solutions on the same page so nothing is missed or overlooked.

Process Orchestration
Description - Cross-silo automation for mundane manual or high occurrence tasks. The capabilities are focused around helping individual technology domains (e.g., network, windows, unix, database, etc) communicate and collaborate to automate tasks that before required numerous people and passing around a trouble ticket.

Configuration & Change
Description - Automation around making configuration or software changes en masse, or in a more controlled, systematic way even at an individual level. It also covers understanding the potential impact or risks associated with making a change, and keeping tabs on what is changing and whether it is authorized or in line with established standards.

Top Capabilities
1) Making changes easier through a simplified user interface - enables more junior administrators to make traditionally more complex changes that required senior individuals.
2) Abstraction layer that enables the same change to be applied to numerous resources, which includes spanning multiple vendors.
3) Ability to flag when a change is not recommended or even unauthorized…understanding the interdependencies and risks associated with a change.

One area I’m still pondering back and forth is Analytics.  The more I research and dig into things, I’m seeing that analytics automation is functional category specific (e.g., Config, Performance or Availability) with only a hint of cross DCAB category integration today.  Examples include:

  • Configuration & Change Management vendors offering analytics for servers/applications that integrate details from help desk solutions about change tickets, with a hint of cross-functional integration through applicable incidents from an availability solution
  • Performance & Capacity Management vendors offering analytics for end-to-end applications/services with a hint of configuration & change specifics, so they know what changed and whether it had an impact.
  • Performance & Capacity Management vendors offering analytics through real-time algorithms that perform dynamic thresholding and problem fingerprinting based on performance and availability conditions.
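
For a feel of what dynamic thresholding means compared with a fixed limit, here is a minimal Python sketch that flags a data point when it strays too far from a trailing baseline. The window size, the multiplier `k` and the latency values are assumptions for illustration; the vendors mentioned use their own (far more sophisticated) algorithms, which are not reproduced here.

```python
from collections import deque
from statistics import mean, pstdev


def dynamic_threshold_alerts(values, window=5, k=3.0):
    """Return (index, value) pairs that fall outside mean +/- k*stdev of the trailing window."""
    history = deque(maxlen=window)
    flagged = []
    for i, v in enumerate(values):
        # Only judge a point once a full window of history exists.
        if len(history) == window:
            mu, sigma = mean(history), pstdev(history)
            if abs(v - mu) > k * sigma:
                flagged.append((i, v))
        # Note: the point joins the baseline even if it was flagged.
        history.append(v)
    return flagged


# Hypothetical response-time samples in milliseconds.
latencies = [100, 102, 98, 101, 99, 100, 180, 101]
flagged = dynamic_threshold_alerts(latencies)
print(flagged)
```

The limit moves with the recent behavior of the metric, so a value that is normal for one server can still be flagged on another whose baseline is different.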

It seems that at the point it goes seriously cross-functional, we find it discussed in terms of Business Intelligence, Business Service Management or Dashboards. So my gut is telling me to go back to these 6 areas of the Data Center Automation Blueprint, where analytics is a key area of capabilities within each functional area…not its own standalone category:

  • Resource Reconciliation (aka CMDB)
  • Process Orchestration (aka RBA)
  • Availability & Notification
  • Performance & Capacity
  • Security & Protection
  • Configuration & Change

Any thoughts on this? Please speak up!


    Mar 11 2008   1:27PM GMT

    EMC adds Service Desk to Data Center Management portfolio



    Posted by: Ryan Shopp
    BladeLogic, DCAB, HP Software, BMC, NetIQ, Performance management, Symantec, EMC, NetQoS, Packet Design, Xangati

    EMC made a move yesterday that continued to show their intent and desire to compete against the Big 4 in IT Infrastructure Management (e.g., BMC, CA, HP, IBM).  All those other players have their own Service Desk offering, so it was time to join those ranks.

    Infra Corporation was acquired by EMC’s Resource Management Software Business Unit for undisclosed financial terms.

    Combined with their previous acquisitions:

    SMARTS - Availability & Performance Management - Q1 2005
    nLayers - IT Resource Reconciliation (e.g., CMDB) - Q3 2006
    Voyence - Configuration & Change Management (for Network Devices) - Q4 2007

    This acquisition shows a slowly increasing pace of their acquisitions (within the software group).  With that being said, looking at their portfolio, I would be surprised if we don’t see another one or maybe even two (depending on the size) before the year is out.  Areas they could benefit from (aka we could see) would be Configuration & Change Management (for Systems/Applications) or a move to strengthen their Availability & Performance Management offering; specifically more application performance centric.

    On the CCM front there are numerous virtual & physical system configuration vendors sprouting up these days, whereas before the primary game in town was BladeLogic (or Opsware before HP acquired them). Meanwhile, on the Performance Management front they have a variety of options that could include grabbing a smaller application performance appliance vendor (e.g., Mazu, Xangati, Packet Design) or something bigger like maybe a NetQoS. Or even bigger and more interesting (but convoluted) could be buying out NetIQ, who continues to innovate within Attachmate (e.g., Aegis product), or the artist formerly known as Precise Software (and now again known by the same name after Symantec spun them back out). Probably long shots, but just thoughts to ponder, as the EMC Resource Management Software portfolio could use expansion in either or both functional areas of the DCAB.

    Bottom line from my outsider’s perspective is EMC is one or two moves away from changing conversations from the big 4 to maybe the big 5.


    Mar 5 2008   7:59PM GMT

    Top Enterprise Management Tools vs. Data Center Automation Blueprint



    Posted by: Ryan Shopp
    DataCenter, Analytics, Application monitoring, CMDB, DCAB, HP Software, IBM Tivoli, InfoVista, IT Process Automation, Netuitive, RBA, RealOps, Run Book Automation, Systems monitoring, BMC, Network configuration, Network monitoring, Networkingchannel, Performance management, CA, NetQoS, Opnet, Tideway

    I was doing some “light” reading this morning and came upon this recent article:  Top 10 Enterprise Management Tools

    It’s focused on Complete Enterprise Management, not specifically focused on the Data Center so I thought I would summarize and then compare/contrast/discuss:

    • Network Fault & Performance: CA eHealth & Spectrum
    • Consolidated Event Management: IBM Tivoli Netcool
    • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
    • Application Discovery Mapping: Tideway Foundation
    • Business Intelligence: Cognos
    • ITSM Workflow, CMDB and Service Desk: BMC Remedy ITSM and Atrium
    • Network & Systems Configuration Management: HP Automation (formerly Opsware SAS & NAS)
    • Process Automation: BMC RunBook Automation

    Since it isn’t data center centric, it’s light on automated management for applications & databases.  It also chooses to stay away from the very congested and sometimes confusing security/protection market.

    Next up, I thought  it would be fun to do a quick mapping to the Data Center Automation Blueprint.

    • Network Fault & Performance, Consolidated Event Management, Service Impact Monitoring = Availability & Performance
    • Application Discovery Mapping, CMDB = IT Resource Reconciliation
    • Business Intelligence = Analytics (maybe…Analytics is still a work in progress…need to figure out this vs. BSM etc)
    • ITSM Workflow, Service Desk = outside of DCAB listed as Manual Task Orchestration

    I was surprised not to see an End-User Application Performance Monitoring category. These products either do their duty from passive agents on the endpoint or from data center appliances using slick algorithms, TCP/IP theory, etc. Maybe that could have indirectly been rolled under Network Fault & Performance, as CA acquired Wily which offers that. The other one missing was more towards Capacity Planning and Trending Analytics, either based off historical data like what Opnet offers or from real-time data patterns from Netuitive.

    Needless to say I found it a really nice write-up and summary of those products/offerings. The only thing I struggle with is that all of the big 4 (BMC, CA, HP, IBM) are represented in this mix, which means you will have 4 sales guys all continuously battling it out to grab more land. This may be good from a cost competition standpoint, but it’s a real fiasco for making sure all parts are playing nicely with each other or simply managing those vendor relationships. Bottom line, you’re always going to have at least one of the big 4 in there as they continue to snap up the innovative smaller companies/technologies to enhance their portfolio and offer differentiation. So I’d typically recommend a strategy where you pick 2 of the big 4 and keep them in check versus each other while continually looking for those innovative start-ups to fill in the gaps. Here is an example of how you could do this using the categories in the original article.

    • Network Fault & Performance: HP Network Node Manager, Operations Manager, Performance Insight
    • Consolidated Event Management: IBM Tivoli Netcool
    • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
    • Application Discovery Mapping: IBM Tivoli Application Dependency Discovery Manager
    • Business Intelligence: Cognos (which IBM recently acquired)
    • ITSM Workflow, CMDB and Service Desk: HP AssetCenter (formerly Peregrine)
    • Network & Systems Configuration Management: HP Data Center Automation (formerly Opsware SAS & NAS)
    • Process Automation: HP Operations Orchestration (formerly iConclude that Opsware acquired)

    Or, if you want to completely rebel and go the non-big 4 route, take a look at the above mappings to the DCAB and look for a name that’s not big-4.  Example:  Network Fault & Performance: InfoVista or NetQoS


    Feb 28 2008   4:55PM GMT

    Analytics; What are the top capabilities?



    Posted by: Ryan Shopp
    Analytics, DataCenter, DCAB, Integrien, NCCM, BMC, Alterpoint, Configuresoft, Netuitive, Opnet

    Recently, I made some adjustments to the Data Center Automation Blueprint where we combined 2 original areas and added a new one for Analytics.  Steve Henning just posted a great guest blog entry over at Doug McClure’s blog called “Why Real Time Analytics?” I personally liked the analogy to TQM and the manufacturing industry.

    He also recently jotted down some of his thoughts on capabilities within the comments section for the posting “Data Center Automation Blueprint; now includes virtualization thoughts.”

    Here are some of my initial thoughts that I will take another pass at cleaning up in the next week or two.  I wanted to get this posted in a timely manner to hopefully inspire some discussions:

    1) Inter-domain Integrations - Steve called it “Cross Silo” in his comment post. But the analytics solutions need to have a data model and API/SDK that is not specific to one domain (e.g., databases, windows systems, network devices, websphere applications). To perform holistic analysis you need more than one point of view.

    2) Pattern Logic Automation - Automation through algorithms, rules etc. that work to mimic the human problem solving / analysis process.

    3) “Advanced” Graphical Visualization - more than summary graphics, pie charts etc…what I’m thinking of here is something I can look at that helps me see the pattern or some unique situation/trend affecting the business (e.g., correlation of trouble ticket and performance monitoring details). A better name than “advanced” is needed here for sure.

    So far the vendors I’m thinking of when I’m creating the above functionality list (as noted in the DCAB) include;

    Who else do we believe should be in this analytics bucket? Thoughts on these 3 capabilities?  What are some others?


    Feb 21 2008   11:18PM GMT

    IT Resource Reconciliation (CMDB) - Top 5 Capabilities



    Posted by: Ryan Shopp
    DataCenter, CMDB, DCAB, Tideway

    The crew over at Tideway offered up an impressive in-depth product demo last week. It made me realize I haven’t circled back to throw down my top five features for this functional area of the Data Center Automation Blueprint we’ve been working on.

    With that said, I was impressed with their comprehensive agent-less discovery vs. the agent centric approach of Symantec (Relicore), HP (Opsware), IBM (Collation), CA (Cendura) or the passive-flow based from EMC (nLayers). I know some of these vendors can do some discovery through an agent-less approach but to get comprehensive feature functionality they will lead you toward deploying their agents.

    So on to the top five features…

    1) Comprehensive discovery engine that can automate the identification of any IT resource (e.g., applications, databases, services, systems, storage, network etc.) and its communication relationships

    2) Impressive visibility capabilities including multi-layer topological / dependency mapping illustrations while offering comprehensive reporting options (e.g., graphical summaries down to detailed lists)

    3) Reconciliation automation where this solution serves as the “source of truth” for the current state of the IT resources in the data center. At a minimum this should offer the ability to report differences between this and other Data Center Automation solutions. The real deal would have embedded automation/integrations that keep all products synchronized, saving major amounts of time for the system administrators and avoiding an event from occurring when it unfortunately wasn’t being monitored.

    4) Accurate fingerprinting (e.g., discovery-to-data model mapping). Making sure the discovery process has the ability to keep up with newer software versions, new vendors etc for all the possible IT resources in the data center.

    5) A fast search engine to quickly find an IT resource that you are troubleshooting, that you need to review before putting in a change order to understand potential impact, or that may be susceptible to a recently announced security threat, etc.

    5b) A policy engine, built on the search engine, that enables users to define desired attributes for specific types of IT resources and be notified immediately when something doesn’t match that desired state so it can be remediated.
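
To illustrate features 3 and 5b, here is a minimal Python sketch of a reconciliation report plus a policy check layered on a simple attribute search. The inventories, attribute names and function names are hypothetical and do not reflect how Tideway or any other vendor implements this.

```python
# Hypothetical inventories: what the CMDB discovered vs. what monitoring covers.
cmdb = {
    "web01": {"os": "linux", "ntp": "enabled"},
    "db01": {"os": "linux", "ntp": "disabled"},
    "app01": {"os": "windows", "ntp": "enabled"},
}
monitored = {"web01", "db01", "old-ftp"}


def reconcile(source_of_truth, other):
    """Report resources the other tool is missing and ones it tracks that no longer exist."""
    truth = set(source_of_truth)
    return {"missing": sorted(truth - other), "stale": sorted(other - truth)}


def policy_violations(inventory, selector, desired):
    """Search for resources matching `selector`, then list those that drift from `desired`."""
    violations = []
    for name, attrs in sorted(inventory.items()):
        if all(attrs.get(k) == v for k, v in selector.items()):
            drift = {k: attrs.get(k) for k, v in desired.items() if attrs.get(k) != v}
            if drift:
                violations.append((name, drift))
    return violations


report = reconcile(cmdb, monitored)
violations = policy_violations(cmdb, {"os": "linux"}, {"ntp": "enabled"})
print(report)
print(violations)
```

The "missing" entries are the resources that could fail without anyone being alerted, and the violations list is what a policy engine would notify on so the drift can be remediated.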

    One other thing I noticed about the Tideway product that was appealing was its transparent approach. All communications between their product and each IT resource are visible down to the specific commands that are run. This enables the product to quickly build trust with the user since they can see the specific queries/commands used and their results.

    I know there are other desired features so let’s hear them!

    Speaking of that, at some point I need to put together the “table stakes” features that any DCAB product should have. You know what I mean - slick dashboard (e.g. iGoogle), RBAC, SDK/API, Grouping, etc, etc, etc.

    I’ve also made a few more updates to the wiki summary version of the Data Center Automation Blueprint, come take a look and throw down some feedback.


    Feb 18 2008   6:20PM GMT

    links for 2008-02-18



    Posted by: Ryan Shopp
    DataCenter, DCAB

    Also, I’ve taken my second pass at updating the wiki page for the Data Center Automation Blueprint, I have the six areas with some of the vendors listed.  Next up is to round out each vendor list and also add in the key features/capabilities for each area.  Please come leave your comments or even make some edits yourself.


    Feb 14 2008   9:01PM GMT

    Data Center Automation Blueprint status



    Posted by: Ryan Shopp
    DataCenter, DCAB

    This is a work in progress as always, but here is a first stab at a round four graphic for the Data Center Automation Blueprint (DCAB).

    As always comments are welcome but it’s time to migrate to a wiki driven interface for the DCAB to allow others to contribute or add comments specific to this model that won’t get lost between blog postings or require me to link from version to version to version.  I will migrate/copy the capabilities and categories from previous blog postings next week.

    data-center-automation-reference-model4.jpg