Adventures in Data Center Automation:

IBM Tivoli

Apr 17 2008   9:58PM GMT

Performance and Availability Management vs. Analytics - Part 1 of ?



Posted by: Ryan Shopp
nimsoft, cittio, eg innovations, Alcatel-Lucent, Analytics, Apparent Networks, Brix Networks, Compuware, Entuity, Fluke Networks, Gomez, Groundwork, Hyperic, Indicative, Application monitoring, DCAB, Firescope, HP Software, IBM Tivoli, InfoVista, Integrien, NetScout, Netuitive, Solarwinds, Systems monitoring, BMC, Quest Software, NetIQ, Network monitoring, Packet Design, Performance management, CA, Keynote, NAGIOS, NetQoS, Network Instruments, OpenNMS, Opnet, Xangati, ZenOSS

I’ve had an opportunity to be briefed over the past couple months by a number of current Data Center Automation Blueprint’s Performance & Availability vendors (e.g., CITTIO, eG Innovations, InfoVista, Integrien, Nimsoft).  With that and some further research I think I’m ready to take another pass at this area of the blueprint.

First up, all these vendors use a variety of techniques to collect a variety of data from as many points of view as possible.

  • Their own server agents that collect data about systems, services, applications, databases, etc. and then aggregate it back to a centralized console
  • Agent-less centralized consoles that leverage infrastructure standard communications protocols (e.g., SNMP, RPC, ODBC, WMI, SSH, TCP, UDP, HTTP) to query or connect remotely to collect data from networks, systems, services, applications, databases, etc.
  • Passive traffic flow collectors (which can be agents or appliances) that are either in-line with the traffic flows or receive an exact copy of all traffic flows traversing a network connection (e.g., switch port uplink) through hardware vendor capabilities (e.g., spanning)

These data collection points can be statistics about a specific IT infrastructure resource: physical devices, virtual devices, physical connections, virtual connections or resources running on physical or virtual devices like services, processes, applications, databases, etc.

Or the data collection points can be traffic flows or end-to-end specifics including passive traffic flows, synthetic transactions or even something as simple as pinging from remote points.

Metrics that are captured typically revolve around throughput, errors, utilization, latency, up/down status, etc. (there are way too many to mention here).
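
To make the metrics above a little more concrete, here is a minimal sketch of how a utilization figure is commonly derived from two polls of a cumulative byte counter (the kind of counter an agent-less poller might read remotely). The `CounterSample` structure, the function name and the 32-bit default are assumptions for illustration only; they are not taken from any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One poll of a cumulative byte counter (hypothetical structure)."""
    timestamp: float  # seconds, on any monotonically increasing clock
    octets: int       # cumulative bytes seen on the interface


def utilization_percent(prev: CounterSample, curr: CounterSample,
                        link_bps: float, counter_bits: int = 32) -> float:
    """Return the average link utilization between two polls, in percent."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.octets - prev.octets
    if delta < 0:
        # The counter wrapped between polls; assume it wrapped only once.
        delta += 2 ** counter_bits
    bits_per_second = delta * 8 / elapsed
    return 100.0 * bits_per_second / link_bps
```

Polling more often makes the single-wrap assumption safer on fast links, at the cost of more collection traffic.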

After saying all this, there is a list a mile long of vendors (a number already noted on the DCAB) that capture these predominantly time-series oriented data points about performance, capacity and availability using any/all of these methods or vantage points (I know, passive traffic flows are not time-series data, but patterns/usage/performance etc. can be determined from them).

So, with all that data, what most of these vendors offer are two primary types of functionality: 1) a variety of graphical reports and 2) metric thresholding capabilities that produce a list of outstanding issues/alerts/alarms/events/concerns (whatever you want to call them).
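
As a rough illustration of the thresholding idea, the sketch below compares collected samples against static warning/critical limits and returns the resulting alert list. The `Sample` and `Threshold` structures, the metric names and the alert text are all hypothetical; real products layer on far more (hysteresis, schedules, de-duplication).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Sample:
    resource: str  # e.g. a host or interface name
    metric: str    # e.g. "cpu_util"
    value: float


@dataclass
class Threshold:
    metric: str
    warning: float
    critical: float


def evaluate(samples: Iterable[Sample],
             thresholds: Iterable[Threshold]) -> List[str]:
    """Return one alert string per sample at or above a limit."""
    by_metric = {t.metric: t for t in thresholds}
    alerts = []
    for s in samples:
        t = by_metric.get(s.metric)
        if t is None:
            continue  # no threshold defined for this metric
        if s.value >= t.critical:
            alerts.append(f"CRITICAL {s.resource} {s.metric}={s.value}")
        elif s.value >= t.warning:
            alerts.append(f"WARNING {s.resource} {s.metric}={s.value}")
    return alerts
```

For example, with a `cpu_util` threshold of 80 (warning) and 90 (critical), a sample of 95.0 on `web01` would show up as a critical alert while a sample of 50.0 would produce nothing.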

Ok, so why did I organize and point all this out? So I can draw a line around where most of the innovation, from my perspective, is occurring. The above is, for the most part, a commodity in my eyes these days. Most companies have had collection/reporting/thresholding capabilities spanning multiple technology silos since pretty close to the start of enterprise networking. The reports continue to get fancier, the number of data sources a single product collects from continues to expand, etc.  Another sign of commoditization is the variety of business models offering these products: open source, managed service providers, internet-distributed products, appliance deployment models and indirect sales forces, large enterprise direct sales forces, completely flexible frameworks for service providers to basically “build their own,” etc.

For the most part, the majority of technical innovation these days is occurring in the next layer above this data collection, reporting and alerting. Now let me say this: yes…there is some great innovation still occurring in the data collection realm (e.g., Xangati offering real-time Netflow down to a user level, PacketDesign monitoring routing messages, NetQoS leveraging advanced TCP/IP theory to analyze where end-to-end bottlenecks are occurring). But, for the most part, these new data sources are being used to augment or replace currently deployed data sources in an attempt to see things from either as many vantage points or the best vantage points to avoid surprises within a unique enterprise IT environment.

So where is the serious innovation coming from…stay tuned for part 2.

Mar 17 2008   1:22PM GMT

BMC makes the big move, buys BladeLogic for $800M



Posted by: Ryan Shopp
BladeLogic, HP Software, IBM Tivoli, RealOps, BMC, CA, EMC

So BMC is the one, not IBM or EMC, that decides to piece it all together.  Responding to HP acquiring Opsware (July ‘07), BMC has, in less than a year, acquired RealOps (July ‘07), Emprisa (Oct ‘07) and now BladeLogic, pulling together the critical components for their DCA strategy that all tie in nicely with Remedy, Atrium etc.  Very impressive!  They have most of the pieces, now it’s about execution on the vision/strategy.

So HP & BMC have acquired the major pieces, IBM has many of the pieces too, but some are showing their age versus the newer products that were acquired by their competitors.  CA has been the quietest of all players, so I would expect for them to make some moves to shore things up ASAP (but most likely at this point having to pay premiums based on previous CCM valuations).  Meanwhile, EMC has been methodically building themselves up in the hope to make a run at knocking off one of the big 4 in IT Infrastructure Management, but they still have some serious work based on the recent moves of some of the current big 4.

Data Center Automation is about to hit the major growth curve now that multiple big guys have strong portfolios in the game.  As predicted, 2008 is going to be hot for Data Center Automation!


Mar 5 2008   7:59PM GMT

Top Enterprise Management Tools vs. Data Center Automation Blueprint



Posted by: Ryan Shopp
DataCenter, Analytics, Application monitoring, CMDB, DCAB, HP Software, IBM Tivoli, InfoVista, IT Process Automation, Netuitive, RBA, RealOps, Run Book Automation, Systems monitoring, BMC, Network configuration, Network monitoring, Networkingchannel, Performance management, CA, NetQoS, Opnet, Tideway

I was doing some “light” reading this morning and came upon this recent article:  Top 10 Enterprise Management Tools

It’s focused on Complete Enterprise Management, not specifically focused on the Data Center so I thought I would summarize and then compare/contrast/discuss:

  • Network Fault & Performance: CA eHealth & Spectrum
  • Consolidated Event Management: IBM Tivoli Netcool
  • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
  • Application Discovery Mapping: Tideway Foundation
  • Business Intelligence: Cognos
  • ITSM Workflow, CMDB and Service Desk: BMC Remedy ITSM and Atrium
  • Network & Systems Configuration Management: HP Automation (formerly Opsware SAS & NAS)
  • Process Automation: BMC RunBook Automation

Since it isn’t data center centric, it’s light on automated management for applications & databases.  It also chooses to stay away from the very congested and sometimes confusing security/protection market.

Next up, I thought  it would be fun to do a quick mapping to the Data Center Automation Blueprint.

  • Network Fault & Performance, Consolidated Event Management, Service Impact Monitoring = Availability & Performance
  • Application Discovery Mapping, CMDB = IT Resource Reconciliation
  • Business Intelligence = Analytics (maybe…Analytics is still a work in progress…need to figure out this vs. BSM etc)
  • ITSM Workflow, Service Desk = outside of DCAB listed as Manual Task Orchestration

I was surprised not to see an End-User Application Performance Monitoring category.  These products either do their duty from passive agents on the endpoint or from data center appliances using slick algorithms, TCP/IP theory, etc.  Maybe that could have indirectly been rolled under Network Fault & Performance, as CA acquired Wily, which offers that.  The other one missing was more towards Capacity Planning and Trending Analytics, either based off historical data like what Opnet offers or from real-time data patterns like what Netuitive offers.

Needless to say, I found it a really nice write-up and summary of those products/offerings.  The only thing I struggle with is that all of the big 4 (BMC, CA, HP, IBM) are represented in this mix, which means you will have 4 sales guys all continuously battling it out to grab more land.  This may be good from a cost competition standpoint, but it’s a real fiasco for making sure all parts are playing nicely with each other or simply managing those vendor relationships.  Bottom line, you’re always going to have at least one of the big 4 in there as they continue to snap up the innovative smaller companies/technologies to enhance their portfolio and offer differentiation.  So I’d typically recommend a strategy where you pick 2 of the big 4 and keep them in check versus each other while continually looking for those innovative start-ups to fill in the gaps.  Here is an example of how you could do this using the categories in the original article.

  • Network Fault & Performance: HP Network Node Manager, Operations Manager, Performance Insight
  • Consolidated Event Management: IBM Tivoli Netcool
  • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
  • Application Discovery Mapping: IBM Tivoli Application Dependency Discovery Manager
  • Business Intelligence: Cognos (which IBM recently acquired)
  • ITSM Workflow, CMDB and Service Desk: HP AssetCenter (former Peregrine)
  • Network & Systems Configuration Management: HP Data Center Automation (formerly Opsware SAS & NAS)
  • Process Automation: HP Operations Orchestration (formerly iConclude that Opsware acquired)

Or, if you want to completely rebel and go the non-big 4 route, take a look at the above mappings to the DCAB and look for a name that’s not big-4.  Example:  Network Fault & Performance: InfoVista or NetQoS


Jan 18 2008   4:14PM GMT

Digging into the DCAB’s 6 functional areas: Resource Reconciliation



Posted by: Ryan Shopp
DataCenter, CMDB, HP Software, IBM Tivoli, ITIL, Symantec, BMC, EMC

 The second up and coming area goes by many names these days.  Some call it next-generation asset management, many others call it CMDB.

I’m calling it resource reconciliation as I would like to see it extend beyond a discovery engine, IT asset database, dependency mapping and the necessary graphical topology and reports.  I also believe that these tools not only should communicate directly with the infrastructure outlined in the Data Center Automation Blueprint (DCAB) - but also synchronize and provide reconciliation capabilities with the 5 other DCAB functions.

What I’m saying is I want to make sure that all my other functional products are always 100% accurate to what my IT infrastructure contains.  There is no reason my performance & capacity products shouldn’t know about a specific IT resource.  Nor do I want multiple discovery engines combing my infrastructure, setting off false alarms in my security products or requiring me to open additional communication avenues that make the infrastructure less secure.

Here is a list of the vendors I know of. This space saw some major consolidation during 2006.

BMC
CA (Cendura acquisition)
EMC (nLayers acquisition)
HP (Opsware acquisition)
IBM (Collation acquisition)
Symantec (Relicore acquisition)
Tideway

Another area I’m researching and pondering for inclusion in this category is service catalogs (e.g., NewScale).  Any thoughts or opinions on how they compare to the players/products above?


Jan 17 2008   7:14PM GMT

What are the most desired features in IT Process Orchestration (e.g. RBA)?



Posted by: Ryan Shopp
DataCenter, Enigmatec, HP Software, IBM Tivoli, IT Process Automation, Opalis, Optinuity, RBA, RealOps, Run Book Automation, Stratavia, BMC, LANDesk, NetIQ, OpTier, Scapa Technologies

Alright, looking for feedback on this one. After talking about the players in the IT Process Orchestration space, I’m wondering what are the primary capabilities people are looking for?

Here are my top five, please feel free to throw down yours in the comments below:

  1. Drag/Drop graphical interface for designing process workflows
  2. Common, normalized Data Model of common/primary attributes
  3. Library of pre-defined, re-usable actions/triggers/processes for usage out-of-the-box (bigger the better - even a community that shares is a plus)
  4. Policy/Desired-state engine driving things
  5. Sandbox, simulator to help test workflows without impacting actual resources/instances within the production enterprise.

Beyond these five core capabilities, depending on the processes you wish to automate you need to verify what interaction/communications protocols are supported (e.g., SNMP, WMI, JMX, ODBC, Telnet/SSH/FTP to CLI, XML/Web Services). Make sure they have what you need to communicate with.

Of course, it also goes without saying (just like with any commercial product) table stakes require RBAC security, reporting, logging, appropriate hardware/software requirements.

Bottom line, I guarantee if you’re a medium to large enterprise you have current manual processes that these products can automate for you! The benefits include reducing errors due to the mundane nature of the task, freeing up the people currently doing the task for other projects or tasks, and the intangible benefit that it’s simply faster, which provides better customer service depending on the process that is automated. Make this a priority in 2008 and get one of these vendors in there to help out!

Disclosure: I have no relationships with any of the vendors in this space. The comments are all made based on my personal experiences and perspectives.


Jan 14 2008   8:42PM GMT

Digging into the DCAB’s 6 functional areas: Process Orchestration



Posted by: Ryan Shopp
DataCenter, HP Software, IBM Tivoli, IT Process Automation, Opalis, Optinuity, RBA, Run Book Automation, Stratavia, BMC, NetIQ, OpTier, Scapa Technologies, LANDesk, Enigmatec, GridApp Systems

Alright, back on track with our review of the 6 functional DCAB areas. We are now onto the hottest, fastest-growing areas! First up, Process Orchestration, or what Gartner has coined Run Book Automation.

These products offer the ability to define, build, orchestrate, manage, monitor and report on workflows that automate specific IT intra- or inter-domain processes (intra = between different products for the Windows Server team; inter = between the application and network teams). There are a ton of case studies and examples on most of the players’ websites.

A couple quick examples to get a flavor include:

A monitoring product identifies a specific condition (e.g., an outage) and then checks a configuration auditing product to see if a recent change was performed for that system.

A configuration auditing product monitoring whether a device is in or out of compliance notices a situation and automatically opens a trouble ticket. Later, it notices the situation has been resolved, adds the appropriate details to the ticket and automatically closes it out.
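
The second example can be sketched in a few lines. This is only an illustration of the workflow logic, assuming a hypothetical in-memory `TicketSystem` standing in for a real trouble-ticket product and a hypothetical `on_audit_result` callback standing in for the auditing product's event feed; it does not reflect any vendor's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Ticket:
    device: str
    notes: List[str] = field(default_factory=list)
    is_open: bool = True


class TicketSystem:
    """In-memory stand-in for a trouble-ticket product (hypothetical)."""

    def __init__(self) -> None:
        self.tickets: List[Ticket] = []

    def open_ticket(self, device: str, note: str) -> Ticket:
        ticket = Ticket(device=device, notes=[note])
        self.tickets.append(ticket)
        return ticket


class ComplianceWorkflow:
    """Opens a ticket on a violation and closes it once resolved."""

    def __init__(self, ticketing: TicketSystem) -> None:
        self.ticketing = ticketing
        self.active: Dict[str, Ticket] = {}  # device -> open ticket

    def on_audit_result(self, device: str, compliant: bool,
                        detail: str) -> None:
        ticket = self.active.get(device)
        if not compliant and ticket is None:
            # First sighting of the violation: open exactly one ticket.
            self.active[device] = self.ticketing.open_ticket(
                device, f"Out of compliance: {detail}")
        elif compliant and ticket is not None:
            # Situation resolved: record the details and close the ticket.
            ticket.notes.append(f"Resolved: {detail}")
            ticket.is_open = False
            del self.active[device]
```

Keeping track of the open ticket per device is what stops repeated audit runs from opening duplicate tickets for the same situation.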

Here are the companies I know about (as always, in alphabetical order)

BMC (formerly RealOps)
Enigmatec
GridApp
HP (formerly Opsware, formerly iConclude)
IBM (formerly ThinkDynamics)
LANDesk (Process Manager product)
NetIQ (Aegis product)
OpTier
Opalis
Optinuity
Scapa Technologies
Stratavia
UC4 Software
xTigo

As always, who am I missing? What are the opinions out there from users or evaluators of each platform (please chime in down in the comments section)? I have personal product exposure and experience with only BMC and Stratavia. Some of the key features that I learned about from those products included the value of having a normalized, common data model and “action” abstraction capabilities so you can re-use previous process actions in new workflows.

Here are a couple good reviews and write-ups for further reading if desired.

Data Center Manager Primed for IT Process Automation
IT Process Automaton Overview and review of some players


Jan 5 2008   7:40PM GMT

Digging into the DCAB’s 6 functional areas: Security and Protection



Posted by: Ryan Shopp
DataCenter, Reconnex, NetForensics, LogLogic, ArcSight, EMC, Ecora, Skybox Security, Tripwire, nCircle, Vericept, Configuresoft, HP Software, IBM Tivoli, Symantec

The massive number of security management vendors makes simply covering this portion of the DCAB a very intimidating task. There are so many technology approaches and different data center technology focuses (e.g., networks vs. systems vs. applications etc.). I’ve attempted a first pass at sub-dividing this functional area. I know that due to its vastness, I’m going to miss tons of vendors I already know about and also stretch the categories a little in my attempt to limit the number of sub-divisions.

Proactive Identification (proactive searching for a potential exposure point that could become a situation) which includes:

  • IP Scanning - a remote query that simply requires an IP address to gather information and determine if there is a potential condition of concern. Vendors include: eEye, nCircle, Nessus, Qualys, McAfee, Rapid7
  • Configuration/Settings Auditing - querying remotely (using credentials) or having an agent on the system to take a more detailed look at the configuration files, etc. Vendors include: ConfigureSoft, Ecora, nCircle, Tripwire, Solidcore, Skybox Security
  • Penetration Testing - remote query attempts to actually expose or harm a data center resource. Vendors include: Core Security, HP (former Spi Dynamics), IBM (former Watchfire), Imperva, Mu Security, BreakingPoint Systems

Reactive Identification (reactive collecting of events or watching data flows to identify a condition or recurring trend)

  • Security Event Consolidation (aka. SEM) - unified view of events from a variety of sources with the hope that you can quickly identify a problem and resolve it sooner after it occurred, or seeing something that tells you that problem may be about to happen. Vendors include: ArcSight, NetForensics, EMC/RSA
  • Information Archival & Reporting (aka. SIM) - archiving and then the analysis and mining of all that event data to identify a re-occurring situation that could be resolved. This archive is also a great resource for reporting certain compliance situation to auditors. Vendors include: ArcSight, NetForensics, LogLogic
  • Data Leakage - monitoring activities or traffic flows to identify if sensitive information is being leaked. Vendors include: EMC/RSA (Tablus), Reconnex, Symantec (Vontu), Vericept

Alright, that will have to do for now. Identity & Access Management is a whole other area, but I’ll leave it for later. Wow, I’m really starting to realize that with this DCAB I was biting off more than I could honestly chew :) Hopefully, it will prove helpful to someone out there. When I do start to make updates, the best way to manage that may be moving this to a wiki.

Quick status check: I’ve now taken a first pass on 4 of the 6 functional areas (and most of them require/deserve a return visit sometime soon). Each functional area alone probably could/would be topic enough for an individual blogger (any volunteers?). I’ve also had some great recent conversations with people on virtualization, process orchestration and resource reconciliation that I’m eager to talk about. So as I’ve stated before, comments are open for anyone and everyone to add thoughts and commentary. Which vendors did I miss, and what capabilities/functions did I miss as we monitor the security in our data center?


Jan 2 2008   11:10PM GMT

Digging into the DCAB’s 6 functional areas: Configuration and Change



Posted by: Ryan Shopp
DataCenter, Ecora, BladeLogic, Cassatt, Configuresoft, HP Software, IBM Tivoli, mValent, Scalent, Solidcore, BMC, CA, EMC

There seem to be two key components or approaches to this functional area. Some vendors are focused on auditing & monitoring the configuration/state of a device while others are focused on that and the provisioning/deployment of configuration/software to a device. Typically, the vendors going across data center technology categories are audit-centric.

Vendors doing both Deployment & Auditing (listed alphabetical)

  • AlterPoint (for network devices)
  • BladeLogic (for applications, servers)
  • BMC (for applications, servers with Marimba acquisition and networks with Emprisa acquisition)
  • CA (for systems)
  • Cassatt (for systems, applications, networks)
  • Cisco (for network devices)
  • ConfigureSoft (for applications, servers)
  • Ecora (for servers, applications)
  • EMC (for network with Voyence acquisition, for storage with ControlCenter)
  • HP (former Opsware for applications, servers, networks, storage)
  • IBM Tivoli (for applications, servers)
  • mValent (for applications)
  • Phurnace (for applications)
  • Scalent Systems (for servers, applications)
  • Symantec (for servers, applications with Jareva, Altiris and storage with CommandCenter)

Vendors focused on Auditing

Vendors that do both primarily for desktop’s which extends to provide some server configuration and change capabilities for the data center

Just as with my previous post on Performance & Capacity, I’m not done with this one. I started going through the laundry list of vendors in the “virtualization” space but simply ran out of my allocated time for today. So I’ll pick back up on it at a later time.


Dec 28 2007   11:31PM GMT

Digging into each of these 6 functional areas: Performance and Capacity



Posted by: Ryan Shopp
DataCenter, HP Software, IBM Tivoli, InfoVista, Integrien, Netuitive, Systems monitoring, OSS, BMC, Quest Software, NetIQ, Network monitoring, Performance management, CA, Zabbix, ZenOSS, OpenNMS, NAGIOS, Hyperic, Groundwork, Packet Design, Apparent Networks, Xangati, Gomez, Keynote, Brix Networks, Entuity, Opnet, Network Instruments, Fluke Networks, Alcatel-Lucent, Compuware, NetScout, NetQoS, Symantec, EMC

First things first, we have many of the same vendors from the Availability & Notification functional area of this Data Center Automation Blueprint in this category. Which probably begs the question, do we combine Availability & Notification with Performance & Capacity? I know in the OSS (not Open Source Software but telco-oriented Operational Support Systems) model they do this and call it “Service Assurance”. Another name could be Service Level Management, as the two monitoring-centric functions are about ensuring service levels are met…or do I simply call it Availability & Performance? I’ll come back to this at the end after I type up the players in this Performance & Capacity area:

But then, we have a slew of others that have been around for quite some time now…

And some innovative up-and-comers in some unique technology/approaches…

Real-Time Behavior/Pattern Analysis through Dynamic Thresholding

IP Traffic/Packet Flow Monitoring & Analysis

Open Source Software (OSS) vendors

Whew…that was more work than I expected to pull together and I’m not done yet…  Please throw into the comments who I’ve missed (I know there have to be a few).

The major challenge here is organizing and breaking down this functional area.  There are so many approaches to obtain performance metrics from/for the data center.  Some of the techniques and perspectives include:

  • passive vs. active
  • agent vs. agent-less
  • in-line appliance vs. out-of-band appliance (e.g., span a port)
  • proprietary vs. leverage infrastructure mgmt. capabilities (e.g., Cisco Netflow)
  • outside the data center looking in vs. inside the data center itself.
  • Reactive troubleshooting vs. Proactive Predictive

I’m going to need to have a part two (and maybe more) for this functional category breaking down the pros and cons of various approaches, which vendors do what, etc.  I also need to revisit that question from the top: do we combine this into a single “availability & performance” functional category?  For now, this first pass will have to do…


Dec 27 2007   6:04PM GMT

Great write-up on Security Management activities this year



Posted by: Ryan Shopp
Symantec, HP Software, IBM Tivoli, Security, Securitychannel, EMC

I have Security as one of the 6 DCAB Functional Categories.  This article does a great job highlighting some key landscape changes in the overall Security Management market (some items are beyond what is covered by this blog).  As it relates to monitoring/managing the security of the data center this points out some key activities:

  • Web Application Vulnerability Scanning - IBM acquiring Watchfire, HP acquiring SPI
  • Data Leakage Monitoring - Symantec acquiring Vontu, EMC acquiring Tablus and others.

As noted, these capabilities aren’t exclusive to the data center but have applicability.