Adventures in Data Center Automation:

BMC

Apr 21 2008   3:11PM GMT

Performance and Availability vs. Analytics - Part 2 of ?



Posted by: Ryan Shopp
DataCenter, Analytics, eg innovations, Indicative, HP Software, Integrien, Netuitive, BMC, NetQoS, Opnet

So in part 1 we talked through the collection of performance/capacity/availability data. Next up: where innovations using this collected data are taking us.

The next level of Performance & Availability I previously mentioned is coming from a variety of companies doing cross-metric analysis or even automated behavioral analytics. These vendors typically classify themselves as Service Level Management, some types of Business Service Management or Analytics. They either leverage a variety of data collection entities or they themselves offer capabilities that span multiple sources to elevate and/or automate results in the hope of proactive (even predictive) identification of issues with minimal (striving for zero) false positives. Here are some more thoughts on each of these areas:

  • Service Level Management vendors seem to focus on leveraging a variety of data sources/metrics and normalizing them into very detailed quality of service/performance agreements between a service provider and its customers (in some situations the service provider is the internal IT department itself).
  • Business Service Management vendors in the realm of performance/capacity/availability seem to focus on mapping each business service (e.g., the application(s) and the infrastructure that supports them) from an end-to-end perspective. Then, if any component in the mapped bundle shows signs of trouble, an alert is raised for proactive resolution. NOTE: BSM is a very broad term. I’m focusing it down here on just this functional area; I’m not talking about comprehensive dashboards spanning all functional areas, service desks, etc.
  • Real-time Analytics vendors seem to leverage a variety of time-series metrics from various collection sources mapped together appropriately (like BSM); then, using behavioral algorithms, they dynamically determine normal behavior. If something deviates from that behavior, an alarm is raised in real time (now we’re getting predictive).
  • Historical Analytics or modeling/simulation vendors seem to leverage a variety of data sources coupled with other cross-functional details (e.g., CMDB, configuration settings) to establish a model and expected behavior. Then you can tweak, tune or even re-design to see the impact of potential changes, upgrades, etc.
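
To make the real-time analytics idea above a little more concrete, here is a minimal sketch of behavioral baselining in Python. It is not how any of the vendors mentioned actually do it; the rolling mean/standard deviation baseline, the function name `find_deviations`, the window size and the tolerance are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, stdev


def find_deviations(samples, window=5, tolerance=3.0):
    """Return (index, value) pairs that stray from the rolling baseline.

    The baseline is the mean and standard deviation of the previous
    `window` normal samples. Both defaults are illustrative assumptions.
    """
    history = deque(maxlen=window)
    deviations = []
    for index, value in enumerate(samples):
        if len(history) == window:
            baseline = mean(history)
            spread = stdev(history)
            # Guard against a perfectly flat history (spread of zero).
            limit = tolerance * max(spread, 1e-9)
            if abs(value - baseline) > limit:
                deviations.append((index, value))
                continue  # keep the anomaly out of the learned baseline
        history.append(value)
    return deviations


# Hypothetical response times in milliseconds with one obvious spike.
response_times_ms = [100, 102, 98, 101, 99, 100, 250, 101, 100]
print(find_deviations(response_times_ms))
```

The point of the sketch is that nobody sets a fixed limit by hand: "normal" is learned from recent samples, and only values far outside that learned range raise an alarm.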

We could probably come up with better names for these higher-level performance/capacity/availability areas, but Service Level Management, Business Service Management and Performance Analytics are the ones being marketed today.

One area of data collection and reporting that does continue to innovate is the end-user, passive traffic flow perspective. This first popped up on the scene back in the late 1990s, and since then there seems to have been a major resurgence in vendors focusing on specific, mission-critical applications. Since these agents typically reside on and monitor from the desktop or mobile device, I’ve placed them beyond the scope and control of Data Center Automation. Some vendors are doing the end-to-end monitoring (as mentioned before) from an appliance in the data center, making some TCP/IP assumptions (e.g., NetQoS, CA Wily).

So now we’ve discussed Performance/Capacity/Availability management and how it also has analytics occurring within that functional silo. So what does that mean for the Data Center Automation Blueprint from my perspective? Stay tuned for part 3.

Apr 17 2008   9:58PM GMT

Performance and Availability Management vs. Analytics - Part 1 of ?



Posted by: Ryan Shopp
nimsoft, cittio, eg innovations, Alcatel-Lucent, Analytics, Apparent Networks, Brix Networks, Compuware, Entuity, Fluke Networks, Gomez, Groundwork, Hyperic, Indicative, Application monitoring, DCAB, Firescope, HP Software, IBM Tivoli, InfoVista, Integrien, NetScout, Netuitive, Solarwinds, Systems monitoring, BMC, Quest Software, NetIQ, Network monitoring, Packet Design, Performance management, CA, Keynote, Nagios, NetQoS, Network Instruments, OpenNMS, Opnet, Xangati, ZenOSS

I’ve had an opportunity to be briefed over the past couple months by a number of current Data Center Automation Blueprint’s Performance & Availability vendors (e.g., CITTIO, eG Innovations, InfoVista, Integrien, Nimsoft).  With that and some further research I think I’m ready to take another pass at this area of the blueprint.

First up, all these vendors use a variety of techniques to collect a variety of data from as many points of view as possible.

  • Their own server agents that collect data about systems, services, applications, databases, etc. and then aggregate it back to a centralized console
  • Agent-less centralized consoles that leverage standard infrastructure communications protocols (e.g., SNMP, RPC, ODBC, WMI, SSH, TCP, UDP, HTTP) to query or connect remotely to collect data from networks, systems, services, applications, databases, etc.
  • Passive traffic flow collectors (which can be agents or appliances) that are either in-line with the traffic flows or receive an exact copy of all traffic flows traversing a network connection (e.g., switch port uplink) through hardware vendor capabilities (e.g., spanning)

These data collection points can be statistics about a specific IT infrastructure resource: physical devices, virtual devices, physical connections, virtual connections or resources running on physical or virtual devices like services, processes, applications, databases, etc.

Or the data collection points can be traffic flows or end-to-end specifics including passive traffic flows, synthetic transactions or even something as simple as pinging from remote points.

Metrics that are captured typically revolve around throughput, errors, utilization, latency, up/down status, etc. (there are way too many to mention here).

After saying all this, there is a list a mile long of vendors (a number already noted on the DCAB) that capture these predominantly time-series oriented data points about performance, capacity and availability using any/all of these methods or vantage points (I know, passive traffic flows are not time-series data, but patterns/usage/performance etc. can be determined from them).

So, with all that data, what most of these vendors offer is two primary types of functionality: 1) a variety of graphical reports and 2) metric thresholding capabilities that produce a list of outstanding issues/alerts/alarms/events/concerns (whatever you want to call them).
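
For anyone who has not worked with these tools, metric thresholding boils down to something like the following minimal Python sketch. The `Threshold` class, the `evaluate` function, the metric names and the limits are all hypothetical, made up for illustration rather than taken from any product.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str
    warning: float
    critical: float


def evaluate(readings, thresholds):
    """Return a list of (resource, metric, severity, value) alerts."""
    limits = {t.metric: t for t in thresholds}
    alerts = []
    for resource, metric, value in readings:
        limit = limits.get(metric)
        if limit is None:
            continue  # no threshold configured for this metric
        if value >= limit.critical:
            alerts.append((resource, metric, "critical", value))
        elif value >= limit.warning:
            alerts.append((resource, metric, "warning", value))
    return alerts


# Hypothetical thresholds and collected readings.
thresholds = [
    Threshold("cpu_pct", warning=80.0, critical=95.0),
    Threshold("latency_ms", warning=200.0, critical=500.0),
]
readings = [
    ("web01", "cpu_pct", 97.0),
    ("web01", "latency_ms", 120.0),
    ("db01", "cpu_pct", 82.0),
    ("db01", "disk_pct", 40.0),
]
for alert in evaluate(readings, thresholds):
    print(alert)
```

Every limit here is static and set by hand, which is a big part of why I consider this layer a commodity.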

Ok, so why did I organize and point all this out? So I can draw a line around where most of the innovation, from my perspective, is occurring. The above is for the most part, in my eyes, a commodity these days. Most companies have had collection/reporting/thresholding capabilities spanning multiple technology silos since pretty close to the start of enterprise networking. The reports continue to get fancier, the number of data sources a single product collects from continues to expand, etc. Another sign of commoditization is the variety of economic business models offering these products: open source, managed service providers, internet distributed products, appliance deployment models and indirect sales forces, large enterprise direct sales forces, completely flexible frameworks for service providers to basically “build their own,” etc.

For the most part, the majority of technical innovation these days is occurring in the next layer above this data collection, reporting and alerting. Now let me say this: yes…there is some great innovation still occurring in the data collection realm (e.g., Xangati offering real-time Netflow down to a user level, Packet Design monitoring routing messages, NetQoS leveraging advanced TCP/IP theory to analyze where end-to-end bottlenecks are occurring). But, for the most part, these new data sources are being used to augment or replace currently deployed data sources in an attempt to see things from either as many vantage points as possible or the best vantage points, to avoid surprises within each unique enterprise IT environment.

So where is the serious innovation coming from? Stay tuned for part 2.


Mar 26 2008   2:03PM GMT

IT Performance Management Call for Resources; I have a dream for performance management



Posted by: Ryan Shopp
Integrien, Netuitive, RBA, Run Book Automation, BMC, Performance management

So in my last posting I called out for some links and resources that people recommend to others when it comes to understanding the variety of options and functions for Network & Application Performance Management. Upon making the request I decided to spend a few minutes looking around. First up for me was a quick trip over to Wikipedia to see what they have on the topic.

On the topic of Network Performance Management, there is a nice write-up on factors that contribute to performance issues: latency, packet loss, retransmission, throughput.

On the topic of Application Performance Management, there were some very in-depth graphs focused around monitoring response time which I found intriguing.

On the topic of Performance Engineering, I was very surprised not only by a nice write-up of principles and perspectives related to the software development lifecycle, but also by a laundry list of interesting and applicable whitepapers at the bottom.

So at this point I stopped and started pondering: is there a product out there that goes beyond grabbing statistics and reporting on them? Some tools collect data from flows, some collect data from individual resources, some tools set up endpoints that systematically send synthetic transactions to measure response times, etc.

What do I really mean by this…is there a product that takes a troubleshooting workflow (think Run Book Automation) approach to the different steps involved in determining a performance concern? Here is what I mean…

  • Start with monitoring traffic flows for their response time
  • Automatically baseline this and when a major deviation occurs go to the next bullet point
  • Is this traffic delay specific to one type of traffic or is it affecting all traffic?
  • What is causing this anomaly? Calculate which points of the infrastructure are traversed by these traffic flows
  • Look at each input/output point on the infrastructure (e.g., interfaces) to see if there are errors, retransmissions, etc.
  • If no errors, next look at each input/output point on the infrastructure to see if throughput is bottlenecked
  • If no bottlenecks, next look at the processors/CPU on each point of the infrastructure to see if that is causing the delay
  • If no processor delays, look at…. (etc, etc, etc)
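
The later steps in the list above (errors, then throughput, then CPU) can be sketched as an ordered chain of checks. This is only a rough Python illustration of the workflow idea, not a description of any existing product; the function names, the flow data structure and the 90% cut-offs are all assumptions.

```python
def diagnose(flow, checks):
    """Run checks in order and stop at the first one that reports a cause."""
    trail = []
    for name, check in checks:
        finding = check(flow)
        trail.append(name)
        if finding:
            return {"cause": finding, "steps_run": trail}
    return {"cause": None, "steps_run": trail}


def check_errors(flow):
    noisy = [hop["name"] for hop in flow["path"] if hop["errors"] > 0]
    return f"interface errors on {', '.join(noisy)}" if noisy else None


def check_throughput(flow):
    # The 90% utilization cut-off is an arbitrary illustrative value.
    full = [hop["name"] for hop in flow["path"] if hop["utilization_pct"] >= 90]
    return f"throughput bottleneck on {', '.join(full)}" if full else None


def check_cpu(flow):
    # The 90% CPU cut-off is likewise an assumption.
    busy = [hop["name"] for hop in flow["path"] if hop["cpu_pct"] >= 90]
    return f"CPU saturation on {', '.join(busy)}" if busy else None


# The order mirrors the bullets: errors, then throughput, then CPU.
RUNBOOK = [
    ("errors", check_errors),
    ("throughput", check_throughput),
    ("cpu", check_cpu),
]

# Hypothetical path traversed by a traffic flow that deviated from baseline.
slow_flow = {
    "path": [
        {"name": "edge-sw1", "errors": 0, "utilization_pct": 35, "cpu_pct": 20},
        {"name": "core-rtr1", "errors": 0, "utilization_pct": 96, "cpu_pct": 40},
    ]
}
print(diagnose(slow_flow, RUNBOOK))
```

Keeping the runbook as plain ordered data means the "(etc, etc, etc)" steps would just be more entries appended to the list.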

At this point I think we get the picture. Most products I’m familiar with collect data metrics from one, two, three, etc. points of view on the network and roll those up into impressive-looking graphical reports. Then it’s up to the administrator to review each report and self-analyze. As mentioned in previous posts, I’m familiar with Integrien, Netuitive & BMC (ProactiveNet), who perform impressive behavioral baselining to create more intelligent alerts to forward to the event management console, but I’m looking for more here. I want someone to take all the collected data and basically apply root cause analysis/run book automation principles. If someone is out there doing this, please speak up and throw a link to your site down in the comments so I can come take a look.


Mar 17 2008   1:22PM GMT

BMC makes the big move, buys BladeLogic for $800M



Posted by: Ryan Shopp
BladeLogic, HP Software, IBM Tivoli, RealOps, BMC, CA, EMC

So BMC is the one, not IBM or EMC, that decides to piece it all together. Responding to HP acquiring Opsware (July ‘07), BMC has, in less than a year, acquired RealOps (July ‘07), Emprisa (Oct ‘07) and now BladeLogic, pulling together the critical components for their DCA strategy that all tie in nicely with Remedy, Atrium, etc. Very impressive! They have most of the pieces; now it’s about execution on the vision/strategy.

So HP & BMC have acquired the major pieces. IBM has many of the pieces too, but some are showing their age versus the newer products that were acquired by their competitors. CA has been the quietest of all the players, so I would expect them to make some moves to shore things up ASAP (but most likely at this point having to pay premiums based on previous CCM valuations). Meanwhile, EMC has been methodically building themselves up in the hope of making a run at knocking off one of the big 4 in IT Infrastructure Management, but they still have some serious work to do given the recent moves of some of the current big 4.

Data Center Automation is about to hit the major growth curve now that multiple big guys have strong portfolios in the game. As predicted, 2008 is going to be hot for Data Center Automation!


Mar 11 2008   1:27PM GMT

EMC adds Service Desk to Data Center Management portfolio



Posted by: Ryan Shopp
BladeLogic, DCAB, HP Software, BMC, NetIQ, Performance management, Symantec, EMC, NetQoS, Packet Design, Xangati

EMC made a move yesterday that continued to show their intent and desire to compete against the Big 4 in IT Infrastructure Management (e.g., BMC, CA, HP, IBM).  All those other players have their own Service Desk offering, so it was time to join those ranks.

Infra Corporation was acquired by EMC’s Resource Management Software Business Unit for undisclosed financial terms.

Combined with their previous acquisitions:

SMARTS - Availability & Performance Management - Q1 2005
nLayers - IT Resource Reconciliation (e.g., CMDB) - Q3 2006
Voyence - Configuration & Change Management (for Network Devices) - Q4 2007

This acquisition shows a slowly increasing pace of their acquisitions (within the software group).  With that being said, looking at their portfolio, I would be surprised if we don’t see another one or maybe even two (depending on the size) before the year is out.  Areas they could benefit from (aka we could see) would be Configuration & Change Management (for Systems/Applications) or a move to strengthen their Availability & Performance Management offering; specifically more application performance centric.

On the CCM front there are numerous virtual & physical system configuration vendors sprouting up these days, whereas before the primary game in town was BladeLogic (or Opsware before HP acquired them). Meanwhile, on the Performance Management front they have a variety of options that could include grabbing a smaller application performance appliance vendor (e.g., Mazu, Xangati, Packet Design) or something bigger like maybe a NetQoS. Or even bigger and more interesting (but convoluted) could be buying out NetIQ, who continues to innovate within Attachmate (e.g., Aegis product), or the artist formerly known as Precise Software (and now again known by the same name after Symantec spun them back out). Probably long shots, but just thoughts to ponder, as the EMC Resource Management Software portfolio could use expansion in either or both functional areas of the DCAB.

Bottom line from my outsider’s perspective is EMC is one or two moves away from changing conversations from the big 4 to maybe the big 5.


Mar 5 2008   7:59PM GMT

Top Enterprise Management Tools vs. Data Center Automation Blueprint



Posted by: Ryan Shopp
DataCenter, Analytics, Application monitoring, CMDB, DCAB, HP Software, IBM Tivoli, InfoVista, IT Process Automation, Netuitive, RBA, RealOps, Run Book Automation, Systems monitoring, BMC, Network configuration, Network monitoring, Networkingchannel, Performance management, CA, NetQoS, Opnet, Tideway

I was doing some “light” reading this morning and came upon this recent article:  Top 10 Enterprise Management Tools

It’s focused on Complete Enterprise Management, not specifically focused on the Data Center so I thought I would summarize and then compare/contrast/discuss:

  • Network Fault & Performance: CA eHealth & Spectrum
  • Consolidated Event Management: IBM Tivoli Netcool
  • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
  • Application Discovery Mapping: Tideway Foundation
  • Business Intelligence: Cognos
  • ITSM Workflow, CMDB and Service Desk: BMC Remedy ITSM and Atrium
  • Network & Systems Configuration Management: HP Automation (formerly Opsware SAS & NAS)
  • Process Automation: BMC RunBook Automation

Since it isn’t data center centric, it’s light on automated management for applications & databases.  It also chooses to stay away from the very congested and sometimes confusing security/protection market.

Next up, I thought  it would be fun to do a quick mapping to the Data Center Automation Blueprint.

  • Network Fault & Performance, Consolidated Event Management, Service Impact Monitoring = Availability & Performance
  • Application Discovery Mapping, CMDB = IT Resource Reconciliation
  • Business Intelligence = Analytics (maybe…Analytics is still a work in progress…need to figure out this vs. BSM etc)
  • ITSM Workflow, Service Desk = outside of DCAB listed as Manual Task Orchestration

I was surprised not to see an End-User Application Performance Monitoring category. These products either do their duty from passive agents on the endpoint or from data center appliances using slick algorithms, TCP/IP theory, etc. Maybe that could have indirectly been rolled under Network Fault & Performance, as CA acquired Wily, which offers that. The other one missing was more towards Capacity Planning and Trending Analytics, either based off historical data like what Opnet offers or from real-time data patterns like Netuitive.

Needless to say I found it a really nice write-up and summary of those products/offerings. The only thing I struggle with is that all of the big 4 (BMC, CA, HP, IBM) are represented in this mix, which means you will have 4 sales guys all continuously battling it out to grab more land. This may be good from a cost competition standpoint, but it’s a real fiasco for making sure all parts are playing nicely with each other or simply managing those vendor relationships. Bottom line, you’re always going to have at least one of the big 4 in there as they continue to snap up the innovative smaller companies/technologies to enhance their portfolios and offer differentiation. So I’d typically recommend a strategy where you pick 2 of the big 4 and keep them in check versus each other while continually looking for those innovative start-ups to fill in the gaps. Here is an example of how you could do this using the categories in the original article.

  • Network Fault & Performance: HP Network Node Manager, Operations Manager, Performance Insight
  • Consolidated Event Management: IBM Tivoli Netcool
  • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
  • Application Discovery Mapping: IBM Tivoli Application Dependency Discovery Manager
  • Business Intelligence: Cognos (which IBM recently acquired)
  • ITSM Workflow, CMDB and Service Desk: HP AssetCenter (formerly Peregrine)
  • Network & Systems Configuration Management: HP Data Center Automation (formerly Opsware SAS & NAS)
  • Process Automation: HP Operations Orchestration (formerly iConclude that Opsware acquired)

Or, if you want to completely rebel and go the non-big 4 route, take a look at the above mappings to the DCAB and look for a name that’s not big-4.  Example:  Network Fault & Performance: InfoVista or NetQoS


Feb 28 2008   4:55PM GMT

Analytics; What are the top capabilities?



Posted by: Ryan Shopp
Analytics, DataCenter, DCAB, Integrien, NCCM, BMC, Alterpoint, Configuresoft, Netuitive, Opnet

Recently, I made some adjustments to the Data Center Automation Blueprint where we combined 2 original areas and added a new one for Analytics.  Steve Henning just posted a great guest blog entry over at Doug McClure’s blog called “Why Real Time Analytics?” I personally liked the analogy to TQM and the manufacturing industry.

He also recently jotted down some of his thoughts on capabilities within the comments section for the posting “Data Center Automation Blueprint; now includes virtualization thoughts.”

Here are some of my initial thoughts that I will take another pass at cleaning up in the next week or two.  I wanted to get this posted in a timely manner to hopefully inspire some discussions:

1) Inter-domain Integrations - Steve called it “Cross Silo” in his comment post. The analytics solutions need to have a data model and API/SDK that is not specific to one domain (e.g., databases, Windows systems, network devices, WebSphere applications). To perform holistic analysis you need more than one point of view.

2) Pattern Logic Automation - Automation through algorithms, rules, etc. that work to mimic the human problem solving / analysis process.

3) “Advanced” Graphical Visualization - more than summary graphics, pie charts, etc…what I’m thinking of here is something I can look at that helps me see the pattern or some unique situation/trend affecting the business (e.g., correlation of trouble ticket and performance monitoring details). A better name than “advanced” is needed here for sure.

So far the vendors I’m thinking of when I’m creating the above functionality list (as noted in the DCAB) include;

Who else do we believe should be in this analytics bucket? Thoughts on these 3 capabilities?  What are some others?


Jan 25 2008   9:00AM GMT

Couple recent notes on CMDB, aka Resource Reconciliation



Posted by: Ryan Shopp
DataCenter, CMDB, Opalis, Scalent, Symantec, BMC, NetIQ, CA

Another great post by Glenn O’Donnell; CMDB is the new integration mechanism. I’m looking forward to seeing his forthcoming book on the same topic!

2007 TechTarget Products of the Year - Data Center include (categories by DCAB functional categories):

Resource Reconciliation (category combined with Configuration & Change) solutions from CA, BMC and Scalent

A couple other categories that map to the DCAB are;

Process Orchestration solutions from Symantec, Opalis and CA

Performance & Capacity solutions from NetIQ, BalancePoint and CiRBA

I find the CiRBA solution very intriguing after my read and post on Innovations in Performance Management yesterday.


Jan 24 2008   3:11PM GMT

Innovations and evolutions in Performance Management



Posted by: Ryan Shopp
Integrien, Netuitive, BMC, NetQoS

Great write-up by Glenn on the innovation occurring in Performance Management: Get Innovative About Performance. He has tremendous perspective and I’m excited to see his candid perspective back now that he has departed EMC. Great job Glenn, keep it up!

A well-articulated summary of the entire post, from my perspective, is this statement:  “Analysis has proven effective for fault management (evaluation of up/down conditions), but performance is a different animal. Whereas fault management deals with binary conditions of black and white, performance involves the full pallet of colors and shades of gray. Of course, dealing in colors is much more difficult than black and white, but help is now here.”

If you’re a vendor or enterprise doing something innovative, beyond reporting, in Performance Management, please throw down some details in the comments section sharing the company, capability and benefits for others (including myself) to check out.

BTW, my conversation thread on virtualization I’ve put on hold for a week or two.  I’m still pulling together some research and thoughts.  In the meanwhile, a great resource I’ve come across for learning and tracking the world of virtualization is Virtualization.info. 


Jan 21 2008   1:43PM GMT

Quick Monday Summary of events from late last week/weekend



Posted by: Ryan Shopp
Compuware, Symantec, BMC, Quest Software, NetIQ, Indicative, NetQoS, NetScout

Symantec to sell off Application Performance Monitoring group.  Looks like Precise Software is back and the Symantec Data Center group will focus in on the configuration and change management side of things.

BarcampESM took place over the weekend. Here are some materials to take a look at: BSM by Doug; discussions around open software and open standards; the desire for an “open agent”. From this point forward keep track of things via the Open Management Consortium discussions.

Application Performance Management (APM) rolling review continues at InformationWeek; recently highlighted: ProactiveNet (recently acquired by BMC). Previous reviews include Quest Software Foglight (Dec 2007), Network General (Nov 2007), Nimsoft Nimbus (Oct 2007), Compuware Vantage (Oct 2007), NetIQ AppManager (Sept 2007), NetQoS SuperAgent (Sept 2007), Indicative (Aug 2007). As you can see this is a very congested space, pardon the pun, but it is sized at over $2B by Forrester.

Now that we’ve run through all 6 functional areas of the Data Center Automation Blueprint, we plan to discuss the impact of virtualization over the next couple of posts. Thanks in advance to those I’ve been talking with for their perspectives on this topic.