Adventures in Data Center Automation:

Run Book Automation

Mar 26 2008   2:03PM GMT

IT Performance Management Call for Resources; I have a dream for performance management



Posted by: Ryan Shopp
Integrien, Netuitive, RBA, Run Book Automation, BMC, Performance management

So in my last posting I called out for some links and resources that people recommend to others when it comes to understanding the variety of options and functions for Network & Application Performance Management.  Upon making the request I decided to spend a few minutes looking around.  First up for me was a quick trip over to Wikipedia to see what they have on the topic.

On the topic of Network Performance Management, there is a nice write-up on factors that contribute to performance issues: latency, packet loss, retransmission and throughput.

On the topic of Application Performance Management, there were some very in-depth graphs focused on monitoring response time, which I found intriguing.

On the topic of Performance Engineering, I was very surprised not only by a nice write-up of principles and perspectives related to the software development lifecycle, but also by a laundry list of interesting and applicable whitepapers at the bottom.

So at this point I stopped and started pondering: is there a product out there that goes beyond grabbing statistics and reporting on them?  Some tools collect data from flows, some collect data from individual resources, some set up endpoints that systematically send synthetic transactions to measure response times, etc.

What do I really mean by this?  Is there a product that takes a troubleshooting workflow (think Run Book Automation) approach to the different steps involved in diagnosing a performance concern?  Here is what I mean…

  • Start with monitoring traffic flows for their response time
  • Automatically baseline this and when a major deviation occurs go to the next bullet point
  • Is this traffic delay specific to one type of traffic or is it affecting all traffic?
  • What is causing this anomaly? Calculate which points of the infrastructure are traversed by these traffic flows
  • Look at each input/output point on the infrastructure (e.g., interfaces) to see if there are errors, retransmissions, etc.
  • If no errors, next look at each input/output point on the infrastructure to see if throughput is bottlenecked
  • If no bottlenecks, next look at the processors/CPU on each point of the infrastructure to see if that is causing the delay
  • If no processor delays, look at…. (etc, etc, etc)
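To make the idea a little more concrete, here is a minimal Python sketch of that kind of staged workflow. Everything in it is an assumption made for illustration: the `Hop` record, the three-standard-deviation baseline test and the 90% utilization/CPU limits are hypothetical and not taken from any product. It also assumes the list of traversed infrastructure points is already known, and it leaves out the "one traffic type or all traffic" step.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Hop:
    """One point of infrastructure traversed by a flow (hypothetical model)."""
    name: str
    errors: int = 0            # interface errors / retransmissions seen
    utilization: float = 0.0   # link throughput as a fraction of capacity
    cpu: float = 0.0           # processor load as a fraction of capacity


def deviates(history, latest, sigmas=3.0):
    """True when `latest` is more than `sigmas` standard deviations above the baseline."""
    return latest > mean(history) + sigmas * stdev(history)


def diagnose(history, latest, path, util_limit=0.9, cpu_limit=0.9):
    """Run the checks in order and return the first finding as a string."""
    if not deviates(history, latest):
        return "response time within baseline"
    for hop in path:  # step 1: errors or retransmissions on any hop
        if hop.errors > 0:
            return f"errors/retransmissions on {hop.name}"
    for hop in path:  # step 2: throughput bottlenecks
        if hop.utilization >= util_limit:
            return f"throughput bottleneck on {hop.name}"
    for hop in path:  # step 3: processor load
        if hop.cpu >= cpu_limit:
            return f"processor load on {hop.name}"
    return "no cause found; continue to the next check"


# Example: response times in ms, then a spike to 180 ms
history = [100, 102, 98, 101, 99]
path = [Hop("edge-router", utilization=0.4), Hop("core-switch", utilization=0.95)]
print(diagnose(history, 180, path))  # prints "throughput bottleneck on core-switch"
```

The point of the sketch is the ordering: each check only runs when the previous one came back clean, which is the run book style of troubleshooting rather than a pile of separate reports.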

At this point I think we get the picture.  Most products I’m familiar with collect data metrics from one, two, three or more points of view on the network and roll those up into impressive-looking graphical reports.  Then it’s up to the administrator to review each report and self-analyze.  As mentioned in previous posts, I’m familiar with Integrien, Netuitive & BMC (ProactiveNet), who perform impressive behavioral baselining to create more intelligent alerts to forward to the event management console, but I’m looking for more here.  I want someone to take all the collected data and basically apply root cause analysis/run book automation principles.  If someone is out there doing this, please speak up and throw a link to your site down in the comments so I can come take a look.

Mar 5 2008   7:59PM GMT

Top Enterprise Management Tools vs. Data Center Automation Blueprint



Posted by: Ryan Shopp
DataCenter, Analytics, Application monitoring, CMDB, DCAB, HP Software, IBM Tivoli, InfoVista, IT Process Automation, Netuitive, RBA, RealOps, Run Book Automation, Systems monitoring, BMC, Network configuration, Network monitoring, Networkingchannel, Performance management, CA, NetQoS, Opnet, Tideway

I was doing some “light” reading this morning and came upon this recent article:  Top 10 Enterprise Management Tools

It’s focused on Complete Enterprise Management, not specifically on the Data Center, so I thought I would summarize and then compare/contrast/discuss:

  • Network Fault & Performance: CA eHealth & Spectrum
  • Consolidated Event Management: IBM Tivoli Netcool
  • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
  • Application Discovery Mapping: Tideway Foundation
  • Business Intelligence: Cognos
  • ITSM Workflow, CMDB and Service Desk: BMC Remedy ITSM and Atrium
  • Network & Systems Configuration Management: HP Automation (formerly Opsware SAS & NAS)
  • Process Automation: BMC RunBook Automation

Since it isn’t data center centric, it’s light on automated management for applications & databases.  It also chooses to stay away from the very congested and sometimes confusing security/protection market.

Next up, I thought  it would be fun to do a quick mapping to the Data Center Automation Blueprint.

  • Network Fault & Performance, Consolidated Event Management, Service Impact Monitoring = Availability & Performance
  • Application Discovery Mapping, CMDB = IT Resource Reconciliation
  • Business Intelligence = Analytics (maybe…Analytics is still a work in progress…need to figure out this vs. BSM etc)
  • ITSM Workflow, Service Desk = outside of DCAB listed as Manual Task Orchestration

I was surprised not to see an End-User Application Performance Monitoring category.  These products either do their duty from passive agents on the endpoint or from data center appliances using slick algorithms, TCP/IP theory, etc.  Maybe that could have indirectly been rolled under Network Fault & Performance, as CA acquired Wily, which offers that.  The other category missing was Capacity Planning and Trending Analytics, either based off historical data like what Opnet offers or from real-time data patterns like what Netuitive offers.

Needless to say I found it a really nice write-up and summary of those products/offerings.  The only thing I struggle with is that all of the big 4 (BMC, CA, HP, IBM) are represented in this mix, which means you will have 4 sales guys all continuously battling it out to grab more land.  This may be good from a cost competition standpoint, but it’s a real fiasco for making sure all parts are playing nicely with each other or simply managing those vendor relationships.  Bottom line, you’re always going to have at least one of the big 4 in there as they continue to snap up the innovative smaller companies/technologies to enhance their portfolio and offer differentiation.  So I’d typically recommend a strategy where you pick 2 of the big 4 and keep them in check versus each other while continually looking for those innovative start-ups to fill in the gaps.  Here is an example of how you could do this using the categories in the original article.

  • Network Fault & Performance: HP Network Node Manager, Operations Manager, Performance Insight
  • Consolidated Event Management: IBM Tivoli Netcool
  • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
  • Application Discovery Mapping: IBM Tivoli Application Dependency Discovery Manager
  • Business Intelligence: Cognos (which IBM recently acquired)
  • ITSM Workflow, CMDB and Service Desk: HP AssetCenter (formerly Peregrine)
  • Network & Systems Configuration Management: HP Data Center Automation (formerly Opsware SAS & NAS)
  • Process Automation: HP Operations Orchestration (formerly iConclude that Opsware acquired)

Or, if you want to completely rebel and go the non-big 4 route, take a look at the above mappings to the DCAB and look for a name that’s not big-4.  Example:  Network Fault & Performance: InfoVista or NetQoS


Jan 31 2008   5:04PM GMT

Month in Review - January 2008



Posted by: Ryan Shopp
DataCenter, Application monitoring, CMDB, IT Process Automation, ITIL, RBA, Run Book Automation, OSS, Virtualization

Thanks for all your feedback and insights during this month’s postings. Keep them coming!

Development of Data Center Automation Blueprint (DCAB)

Discussions beyond the DCAB functional areas

Overall DCA Trends and Observations

Current events in DCA

Keep the feedback and conversations flowing.  As I’ve mentioned before I just enjoy learning and talking about the innovation occurring in DCA…I’m really hoping and attempting to facilitate dialog from vendors and customers alike on various topics.  So don’t be shy, create an ID and leave some thoughts/comments!


Jan 17 2008   7:14PM GMT

What are the most desired features in IT Process Orchestration (e.g. RBA)?



Posted by: Ryan Shopp
DataCenter, Enigmatec, HP Software, IBM Tivoli, IT Process Automation, Opalis, Optinuity, RBA, RealOps, Run Book Automation, Stratavia, BMC, LANDesk, NetIQ, OpTier, Scapa Technologies

Alright, looking for feedback on this one. After talking about the players in the IT Process Orchestration space, I’m wondering what are the primary capabilities people are looking for?

Here are my top five, please feel free to throw down yours in the comments below:

  1. Drag/Drop graphical interface for designing process workflows
  2. Common, normalized Data Model of common/primary attributes
  3. Library of pre-defined, re-usable actions/triggers/processes for usage out-of-the-box (bigger the better - even a community that shares is a plus)
  4. Policy/Desired-state engine driving things
  5. Sandbox, simulator to help test workflows without impacting actual resources/instances within the production enterprise.

Beyond these five core capabilities, depending on the processes you wish to automate you need to verify what interaction/communications protocols are supported (e.g., SNMP, WMI, JMX, ODBC, Telnet/SSH/FTP to CLI, XML/Web Services). Make sure they have what you need to communicate with.

Of course, it also goes without saying (just like with any commercial product) table stakes require RBAC security, reporting, logging, appropriate hardware/software requirements.

Bottom line, I guarantee if you’re a medium to large enterprise you have current manual processes that these products can automate for you! The benefits include reducing errors due to the mundane nature of the task, freeing up the people currently doing the task for other projects, and the intangible benefit that it’s simply faster, which provides better customer service depending on the process that is automated. Make this a priority in 2008 and get one of these vendors in there to help out!

Disclosure: I have no relationships with any of the vendors in this space. The comments are all made based on my personal experiences and perspectives.


Jan 14 2008   8:42PM GMT

Digging into the DCAB’s 6 functional areas: Process Orchestration



Posted by: Ryan Shopp
DataCenter, HP Software, IBM Tivoli, IT Process Automation, Opalis, Optinuity, RBA, Run Book Automation, Stratavia, BMC, NetIQ, OpTier, Scapa Technologies, LANDesk, Enigmatec, GridApp Systems

Alright, back on track with our review of the 6 functional DCAB areas. We are now onto the hottest, fastest-growing areas! First up is Process Orchestration, or what Gartner has coined Run Book Automation.

These products offer the ability to define, build, orchestrate, manage, monitor and report on workflows that automate specific IT intra- or inter-domain processes (intra = between different products for the Windows Server team, inter = between the application and network teams). There are a ton of case studies and examples on most of the players’ websites.

A couple quick examples to get a flavor include:

A monitoring product identifies a specific condition (e.g., an outage) and then checks a configuration auditing product to see if a recent change was performed for that system.

A configuration auditing product that monitors whether a device is in or out of compliance notices a situation and automatically opens a trouble ticket. Later, it notices that the situation has been resolved, adds the appropriate details to the ticket and automatically closes it out.
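For a rough feel of what those two examples look like as workflow logic, here is a small Python sketch. It is purely illustrative: the change-log records, the 24-hour look-back window and the in-memory `TicketDesk` class are hypothetical stand-ins and do not represent any vendor's API.

```python
from datetime import datetime, timedelta


def recent_changes(change_log, system, now, window=timedelta(hours=24)):
    """First example: after an outage alert, list changes made to `system` within `window`."""
    return [c for c in change_log
            if c["system"] == system and now - c["when"] <= window]


class TicketDesk:
    """Stand-in for a trouble-ticket system (hypothetical, in-memory only)."""

    def __init__(self):
        self.tickets = {}  # device name -> ticket dict

    def on_compliance_check(self, device, compliant, detail):
        """Second example: open a ticket on a violation, then annotate and close it on recovery."""
        ticket = self.tickets.get(device)
        if not compliant and (ticket is None or ticket["status"] == "closed"):
            self.tickets[device] = {"status": "open", "notes": [detail]}
        elif compliant and ticket is not None and ticket["status"] == "open":
            ticket["notes"].append(detail)
            ticket["status"] = "closed"
        return self.tickets.get(device)


# Example 1: an outage on web01 triggers a look at the last day's changes
now = datetime(2008, 1, 14, 12, 0)
log = [{"system": "web01", "when": now - timedelta(hours=2), "what": "ACL update"},
       {"system": "web01", "when": now - timedelta(days=5), "what": "OS patch"}]
print([c["what"] for c in recent_changes(log, "web01", now)])  # prints ['ACL update']

# Example 2: a compliance violation opens a ticket, the fix closes it
desk = TicketDesk()
desk.on_compliance_check("sw-12", False, "telnet enabled")
print(desk.on_compliance_check("sw-12", True, "telnet disabled")["status"])  # prints closed
```

In a real deployment each function would be a call out to a separate product (monitoring, configuration auditing, service desk); the orchestration layer is the glue that carries the result of one step into the next.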

Here are the companies I know about (as always, in alphabetical order)

BMC (formerly RealOps)
Enigmatec
GridApp
HP (formerly Opsware, formerly iConclude)
IBM (formerly ThinkDynamics)
LANDesk (Process Manager product)
NetIQ (Aegis product)
OpTier
Opalis
Optinuity
Scapa Technologies
Stratavia
UC4 Software
xTigo

As always, who am I missing? What are the opinions out there from users or evaluators for each platform (please chime in down in the comments section)? I have personal product exposure and experience with only BMC and Stratavia. Some of the key features that I learned from those products included the value of having a normalized, common data model and “action” abstraction capabilities so you can re-use previous process actions in new workflows.

Here are a couple good reviews and write-ups for further reading if desired.

Data Center Manager Primed for IT Process Automation
IT Process Automation Overview and review of some players


Dec 17 2007   5:59PM GMT

Next pass on Data Center Automation “Blueprint”



Posted by: Ryan Shopp
DataCenter, CMDB, eTOM, FCAPS, IT Process Automation, ITIL, Application monitoring, Network monitoring, Performance management, Security, Storage, Virtualization, RBA, Run Book Automation, Systems monitoring, Systemschannel, WAN optimization

Thanks for the feedback. I’ve incorporated some of the points that have been made into an updated version of the Data Center Automation Blueprint (DCAB).

data-center-automation-blueprint2.jpg

As mentioned previously, this is a work in progress and I love getting feedback, ideas, concerns, etc. on the model. I’m trying to build a functional model (at the 30,000 foot level) that represents key software functionality to automate the data center towards someday becoming “lights out.”

Also, with that said, it needs to be comprehensive but not overwhelming. I want to keep the yellow DCA functional areas limited in number…if this grows to be much more than the current six I feel it becomes too complex. So to add any new areas I need to assess how they compare to the current areas and whether I could combine any areas.

One I’m struggling with right now is that I’ve received feedback that analytics itself is an area. The interesting thing is that analytics currently fits to some degree within each of the 4 horizontal functional areas (e.g., Configuration/Change, Security/Protection), as each of those products offers advanced reporting and, as that progresses, predictive reporting and analytics around that functional area.

Analytics would also show up at the dashboard level (currently beyond the scope of what I’m defining as the functional areas of the Data Center Automation Blueprint) where you would correlate business intelligence, patterns etc. across not just Data Center Automation functional categories but also across manual task orchestration (e.g., service/help desk) details.

Thoughts?

One more thing to clear up: I know some (many) of these functional categories and their products extend beyond the Data Center. The lens this blog looks through is exclusively focused on the challenges posed by large, complex data centers. For example, I know performance products are also useful in companies of all sizes (big & small) and also beyond the data center (e.g., headquarters, remote offices, partner networks, etc).


Dec 10 2007   6:44PM GMT

2 of 10 are “disruptive” according to Gartner



Posted by: Ryan Shopp
CMDB, IT Process Automation, DataCenter, RBA, Run Book Automation

Gartner held their Data Center conference during the last week of November where a presentation was given by Research Vice President Carl Claunch on his view of the top 10 disruptive technologies for the data center.

Two of those 10 are management/automation oriented and have already found their way into the reference model or Data Center Automation “Blueprint” that we’ve been working on.

1) IT operations process automation (e.g., the DCA Blueprint currently calls this process orchestration)
Gartner estimates that operational error causes about 40% of all outages. Why? Because as technology gets cheaper, the same number of people have more to manage. Errors will happen.

“When you have these two trends, they intersect at points, and it’s time to shift what was human labor to automation,” he said. “As we move to a real-time infrastructure, we need to find a way of automating all of these tasks.”

2) CMDBs (e.g., the DCA Blueprint currently calls this Resource Reconciliation)
According to Gartner, through 2009 the implementation of a configuration and change management strategy will reduce downtime by as much as 35%.

“Knowing what is actually happening is important to making sure it is working right,” he said. “If you try to respond to a problem based on outdated information, you make mistakes.”

Some of the others find themselves covered by our categorization of the Data Center Infrastructure itself.

These by far are the two hottest areas of the 6 functional areas currently covered by the model (e.g., DCA Blueprint).  They are also probably the most complex to solve.  Both require significant initial start-up costs to codify or integrate current technologies, software or processes into the functional applications themselves.  But the strategic and tactical ROI from what I’ve seen and heard so far is tremendous.

Special thanks to Mark Fontecchio’s complete article over on searchdatacenter.com, which is where I grabbed the two snippets on the two disruptive technologies.


Nov 28 2007   8:22PM GMT

IT Operations Process Automation - aka “Run Book” continues to mature!



Posted by: Ryan Shopp
DataCenter, BMC, RealOps, Optinuity, Opalis, Alterpoint, BladeLogic, HP Software, IT Process Automation, Run Book Automation, RBA, NetIQ, Stratavia

This is an area I haven’t hit on yet but will also need to fit into the reference model (that one of these days I’ll get back on track).

Lots of action in what Gartner and others are calling Run Book Automation or RBA!!!  So let’s summarize the latest.

Optinuity launched a new version of their product that has also been re-branded. They are attempting to elevate and differentiate themselves beyond the other RBA vendors by re-focusing their primary target audience (from IT Operations Executives to Enterprise Application Executives) and adding specific functionality to provide a self-contained (not reliant on IT Operations) closed-loop, automated process (e.g., application monitoring).  The goal, per talking with CEO Scott Stouffer, is to get as close to the enterprise applications themselves as possible (e.g., the teams that develop and/or perform the advanced support/administration for them).  One example discussed was a unique “locked account” scenario that was happening thousands of times a month and thus wasting hundreds, if not thousands, of man hours a month!

Opalis launched a new version of their product (version 5.4) which includes some intriguing enhancements in the areas of automating virtualization and the ability to run simulations of process automation workflows prior to deployment in the live environment. They also continue to sport a very impressive list of out-of-the-box IT Operations centric connectors for products/companies that don’t have a process automation product, including BladeLogic, EMC, IBM, Microsoft and Symantec, along with support for various products from the other big 4 vendors that do have competing products (e.g., BMC, CA, HP).

HP announced their re-branded suite that includes the former iConclude product. HP has so many pieces for automating the data center (beyond the RBA capabilities)…the question now is can they execute on its organization (e.g., product bundling/branding), integration (e.g., focus on delivering the right use cases end-to-end) and deployments (e.g., making this all come together inside complex enterprises).

BMC made their move into this space back in the summer time (July) with their acquisition of RealOps. They re-branded this product as BMC Run Book Automation and are using it to tighten up and automate the process flows between their other products: Remedy, Atrium, Marimba, etc. Of course you can still use the platform to integrate with non-BMC products but they are going to focus on their own product line.

NetIQ recently threw their hat into the ring also. Now a subsidiary of Attachmate, they built their solution internally over the past couple of years (prior to BMC or HP joining in). Their focus appears to be, in my opinion, around helping ensure their product AppManager stays competitive with other System/Application monitoring vendors (e.g., BMC, HP, IBM, CA, Microsoft). The challenge will be that the service desks they would integrate with are part of companies that now also offer this Run Book Automation technology. So basically, if you’re a current NetIQ customer and happy, then you now won’t be as motivated to go to BMC or HP, who own all three components (e.g., system monitoring, process automation and service desk).  Smart strategy move to continue innovating and keep current customers happy.

Stratavia also announced their latest product release in October.  Originally more focused on automation tasks for databases, they continue to evolve their product to be competitive with the other non-database centric but more system/applications centric vendors.  This database automation functionality evolved from their original business model of being a managed service provider for remote database management (at that time they were called ExtraQuest).

To that point, it’s amazing how many of these RBA or IT Process Automation companies come out of operational businesses.  Stratavia was originally a managed services provider, and RealOps came out of the consulting ranks from Windward Consulting.  This makes sense with various Data Center Automation functions…they are very complex and challenging tasks that are originally tackled with service-based approaches, only then to be automated with software.  Beyond this RBA sector, another couple of vendors that started from similar origins would be Opsware (originally a managed service provider) and BladeLogic (whose founders were previously responsible for operating the infrastructure for a managed service provider).

I also read in a recent Forrester report by Jean-Pierre Garbani that the first market sizing for the IT process automation software space puts it at about $50 million today, with a forecast to grow to about $700 million by 2015.  Now that is some SERIOUS GROWTH!
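To put a number on that growth, here is a quick back-of-the-envelope calculation in Python. It assumes "today" means 2007 (when this was posted), so the forecast spans 8 years; that base year is my assumption, not something stated in the Forrester figures quoted above.

```python
def cagr(start, end, years):
    """Compound annual growth rate that takes `start` to `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# $50 million in 2007 (assumed base year) to $700 million in 2015
rate = cagr(50, 700, 2015 - 2007)
print(f"{rate:.1%}")
```

Under that assumption the market would need to grow by roughly 39% every year, compounding, to hit the forecast.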

One last item, I want to give credit where credit is due to a former boss, colleague and friend Dave Williams who is now at Gartner.  I remember him talking about this space looong before anyone else!  That is recognized in this write-up by internetnews.com. When he left AlterPoint back in February 2006 I remember talking about these products over lunch a number of times.  I had the chance to work closely with the RealOps executive team when AlterPoint built a partnership and integration with them.

So if you have a very, very complex IT Operations environment or are seeing skilled people doing very unskilled/mundane tasks over and over and over…it’s time to check out one or more of these vendors!

So what other “Run Book Automation” vendors are out there, and what have been your experiences so far with their products, the company itself and their partners?  Please chime in with your comments as I know there are a ton of people evaluating and using these products these days!