Adventures in Data Center Automation:

Application monitoring

Apr 17 2008   9:58PM GMT

Performance and Availability Management vs. Analytics - Part 1 of ?



Posted by: Ryan Shopp
nimsoft, cittio, eg innovations, Alcatel-Lucent, Analytics, Apparent Networks, Brix Networks, Compuware, Entuity, Fluke Networks, Gomez, Groundwork, Hyperic, Indicative, Application monitoring, DCAB, Firescope, HP Software, IBM Tivoli, InfoVista, Integrien, NetScout, Netuitive, Solarwinds, Systems monitoring, BMC, Quest Software, NetIQ, Network monitoring, Packet Design, Performance management, CA, Keynote, Nagios, NetQoS, Network Instruments, OpenNMS, Opnet, Xangati, ZenOSS

I’ve had an opportunity to be briefed over the past couple of months by a number of the vendors currently listed in the Performance & Availability area of the Data Center Automation Blueprint (e.g., CITTIO, eG Innovations, InfoVista, Integrien, Nimsoft).  With that and some further research I think I’m ready to take another pass at this area of the blueprint.

First up, all these vendors use a variety of techniques to collect a variety of data from as many points of view as possible.

  • Their own server agents that collect data about systems, services, applications, databases, etc. and then aggregate it back to a centralized console
  • Agent-less centralized consoles that leverage infrastructure standard communications protocols (e.g., SNMP, RPC, ODBC, WMI, SSH, TCP, UDP, HTTP) to query or connect remotely to collect data from networks, systems, services, applications, databases, etc.
  • Passive traffic flow collectors (which can be agents or appliances) that are either in-line with the traffic flows or receive an exact copy of all traffic flows traversing a network connection (e.g., a switch port uplink) through hardware vendor capabilities (e.g., port spanning)

These data collection points can be statistics about a specific IT infrastructure resource: physical devices, virtual devices, physical connections, virtual connections, or resources running on physical or virtual devices like services, processes, applications, databases, etc.

Or the data collection points can be traffic flows or end-to-end specifics, including passive traffic flows, synthetic transactions or even something as simple as pinging from remote points.
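To make the "synthetic transaction" idea concrete, here is a minimal sketch of the simplest possible end-to-end probe: time a TCP handshake from a remote point and record up/down status plus latency. The names `ProbeResult` and `tcp_probe` are my own illustrations, not any vendor's API, and a real product would layer scheduling, retries and full protocol-level transactions on top of something like this.

```python
import socket
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """One sample from a hypothetical synthetic probe."""
    host: str
    port: int
    up: bool
    latency_ms: Optional[float]  # None when the target could not be reached


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> ProbeResult:
    """Time a TCP handshake; report the target as down on any connection error."""
    start = time.perf_counter()
    try:
        # create_connection resolves the name and completes the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            return ProbeResult(host, port, True, elapsed_ms)
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return ProbeResult(host, port, False, None)
```

Calling something like `tcp_probe("intranet.example.com", 443)` (a placeholder host) on a schedule from several remote offices would give you a basic availability and latency series per vantage point.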

The metrics that are captured typically revolve around throughput, errors, utilization, latency, up/down status, etc. (there are way too many to mention here).
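As a rough illustration of how raw collected data turns into metrics like these, the sketch below derives throughput and utilization from two samples of an interface octet counter, the kind of value an SNMP poll typically returns. It assumes a 32-bit counter that wraps at 2^32 and at most one wrap between polls; the function names are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps back to zero at this value


def counter_delta(prev: int, curr: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Difference between two counter samples, allowing for a single wrap."""
    return (curr - prev) % modulus


def throughput_bps(prev_octets: int, curr_octets: int, interval_s: float) -> float:
    """Average bits per second between two polls taken interval_s seconds apart."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(prev_octets, curr_octets) * 8 / interval_s


def utilization_pct(bps: float, link_speed_bps: float) -> float:
    """Throughput as a percentage of the link's nominal speed."""
    if link_speed_bps <= 0:
        raise ValueError("link_speed_bps must be positive")
    return 100.0 * bps / link_speed_bps


# Example: 375,000,000 octets seen over a 5 minute poll on a 100 Mbps link.
rate = throughput_bps(1_000, 375_001_000, 300)
print(rate, utilization_pct(rate, 100_000_000))  # prints "10000000.0 10.0"
```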

After saying all this, there is a list a mile long of vendors (a number already noted on the DCAB) that capture these predominantly time-series oriented data points about performance, capacity and availability using any or all of these methods or vantage points (I know, passive traffic flows are not time-series data, but patterns/usage/performance etc. can be determined from them).

So, with all that data, what most of these vendors offer is two primary types of functionality: 1) a variety of graphical reports and 2) metric thresholding capabilities that produce a list of outstanding issues/alerts/alarms/events/concerns (whatever you want to call them).
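For anyone who hasn't looked under the hood, metric thresholding boils down to something like the sketch below: compare each collected sample against warning and critical limits and emit a list of alerts. This is a simplified illustration with made-up names (`Threshold`, `Alert`, `evaluate`); real products add things like hysteresis, time-over-threshold rules and alert de-duplication.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Threshold:
    metric: str
    warning: float
    critical: float


@dataclass(frozen=True)
class Alert:
    resource: str
    metric: str
    value: float
    severity: str


def evaluate(samples: Iterable[Tuple[str, str, float]],
             thresholds: List[Threshold]) -> List[Alert]:
    """Return an alert for every (resource, metric, value) sample at or above a limit."""
    limits = {t.metric: t for t in thresholds}
    alerts: List[Alert] = []
    for resource, metric, value in samples:
        limit = limits.get(metric)
        if limit is None:
            continue  # no threshold configured for this metric
        if value >= limit.critical:
            alerts.append(Alert(resource, metric, value, "critical"))
        elif value >= limit.warning:
            alerts.append(Alert(resource, metric, value, "warning"))
    return alerts


# Illustrative samples: only the first two cross a limit.
alerts = evaluate(
    [("db01", "cpu_pct", 97.0), ("web01", "cpu_pct", 85.0), ("web02", "cpu_pct", 40.0)],
    [Threshold("cpu_pct", warning=80.0, critical=95.0)],
)
for alert in alerts:
    print(alert.resource, alert.metric, alert.value, alert.severity)
```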

Ok, so why did I organize and point all this out? So I can draw a line around where most of the innovation, from my perspective, is occurring. The above is for the most part, in my eyes, a commodity these days. Most companies have had collection/reporting/thresholding capabilities spanning multiple technology silos since pretty close to the start of enterprise networking. The reports continue to get fancier, the number of data sources a single product collects from continues to expand, etc.  Another sign of commoditization is the variety of business models through which these products are offered: open source, managed service providers, internet-distributed products, appliance deployment models, indirect sales forces, large enterprise direct sales forces, completely flexible frameworks for service providers to basically “build their own,” etc.

For the most part, the majority of technical innovation these days is occurring in the next layer above this data collection, reporting and alerting. Now let me say this, yes…there is some great innovation still occurring in the data collection realm (e.g., Xangati offering real-time NetFlow down to a user level, Packet Design monitoring routing messages, NetQoS leveraging advanced TCP/IP theory to analyze where end-to-end bottlenecks are occurring). But, for the most part these new data sources are being used to augment or replace currently deployed data sources, in an attempt to see things from either as many vantage points as possible or the best vantage points, so that customers can avoid surprises within their unique enterprise IT environment.

So where is the serious innovation coming from…stay tuned for part 2.

Mar 5 2008   7:59PM GMT

Top Enterprise Management Tools vs. Data Center Automation Blueprint



Posted by: Ryan Shopp
DataCenter, Analytics, Application monitoring, CMDB, DCAB, HP Software, IBM Tivoli, InfoVista, IT Process Automation, Netuitive, RBA, RealOps, Run Book Automation, Systems monitoring, BMC, Network configuration, Network monitoring, Networkingchannel, Performance management, CA, NetQoS, Opnet, Tideway

I was doing some “light” reading this morning and came upon this recent article:  Top 10 Enterprise Management Tools

It’s focused on Complete Enterprise Management, not specifically on the Data Center, so I thought I would summarize and then compare/contrast/discuss:

  • Network Fault & Performance: CA eHealth & Spectrum
  • Consolidated Event Management: IBM Tivoli Netcool
  • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
  • Application Discovery Mapping: Tideway Foundation
  • Business Intelligence: Cognos
  • ITSM Workflow, CMDB and Service Desk: BMC Remedy ITSM and Atrium
  • Network & Systems Configuration Management: HP Automation (formerly Opsware SAS & NAS)
  • Process Automation: BMC RunBook Automation

Since it isn’t data center centric, it’s light on automated management for applications & databases.  It also chooses to stay away from the very congested and sometimes confusing security/protection market.

Next up, I thought it would be fun to do a quick mapping to the Data Center Automation Blueprint.

  • Network Fault & Performance, Consolidated Event Management, Service Impact Monitoring = Availability & Performance
  • Application Discovery Mapping, CMDB = IT Resource Reconciliation
  • Business Intelligence = Analytics (maybe…Analytics is still a work in progress…need to figure out this vs. BSM etc)
  • ITSM Workflow, Service Desk = outside of the DCAB, listed as Manual Task Orchestration

I was surprised not to see an End-User Application Performance Monitoring category.  These products either do their duty from passive agents on the endpoint or from data center appliances using slick algorithms, TCP/IP theory, etc.  Maybe that could have indirectly been rolled under Network Fault & Performance, as CA acquired Wily, which offers that.  The other one missing was more towards Capacity Planning and Trending Analytics, either based off historical data like what Opnet offers or off real-time data patterns like what Netuitive offers.

Needless to say I found it a really nice write-up and summary of those products/offerings.  The only thing I struggle with is that all of the big 4 (BMC, CA, HP, IBM) are represented in this mix, which means you will have 4 sales guys all continuously battling it out to grab more land.  This may be good from a cost competition standpoint, but it’s a real fiasco for making sure all parts are playing nicely with each other or simply managing those vendor relationships.  Bottom line, you’re always going to have at least one of the big 4 in there as they continue to snap up the innovative smaller companies/technologies to enhance their portfolios and offer differentiation.  So I’d typically recommend a strategy where you pick 2 of the big 4 and keep them in check versus each other while continually looking for those innovative start-ups to fill in the gaps.  Here is an example of how you could do this using the categories in the original article.

  • Network Fault & Performance: HP Network Node Manager, Operations Manager, Performance Insight
  • Consolidated Event Management: IBM Tivoli Netcool
  • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
  • Application Discovery Mapping: IBM Tivoli Application Dependency Discovery Manager
  • Business Intelligence: Cognos (which IBM recently acquired)
  • ITSM Workflow, CMDB and Service Desk: HP AssetCenter (formerly Peregrine)
  • Network & Systems Configuration Management: HP Data Center Automation (formerly Opsware SAS & NAS)
  • Process Automation: HP Operations Orchestration (formerly iConclude that Opsware acquired)

Or, if you want to completely rebel and go the non-big 4 route, take a look at the above mappings to the DCAB and look for a name that’s not big-4.  Example:  Network Fault & Performance: InfoVista or NetQoS


Jan 31 2008   5:04PM GMT

Month in Review - January 2008



Posted by: Ryan Shopp
DataCenter, Application monitoring, CMDB, IT Process Automation, ITIL, RBA, Run Book Automation, OSS, Virtualization

Thanks for all your feedback and insights during this month’s postings. Keep them coming!

Development of Data Center Automation Blueprint (DCAB)

Discussions beyond the DCAB functional areas

Overall DCA Trends and Observations

Current events in DCA

Keep the feedback and conversations flowing.  As I’ve mentioned before I just enjoy learning and talking about the innovation occurring in DCA…I’m really hoping and attempting to facilitate dialog from vendors and customers alike on various topics.  So don’t be shy, create an ID and leave some thoughts/comments!


Dec 17 2007   5:59PM GMT

Next pass on Data Center Automation “Blueprint”



Posted by: Ryan Shopp
DataCenter, CMDB, eTOM, FCAPS, IT Process Automation, ITIL, Application monitoring, Network monitoring, Performance management, Security, Storage, Virtualization, RBA, Run Book Automation, Systems monitoring, Systemschannel, WAN optimization

Thanks for the feedback. I’ve incorporated some of the points that have been made into an updated version of the Data Center Automation Blueprint (DCAB).

data-center-automation-blueprint2.jpg

As mentioned previously, this is a work in progress and I love getting feedback, ideas, concerns, etc. on the model. I’m trying to build a functional model (at the 30,000 foot level) that represents the key software functionality needed to automate the data center towards someday becoming “lights out.”

Also, with that said, it needs to be comprehensive but not overwhelming. I want to keep the yellow DCA functional areas limited in number…if this grows to be much more than the current six I feel it becomes too complex. So to add any new areas I need to assess how they compare to the current areas and whether I could combine any areas.

One I’m struggling with right now: I’ve received feedback that analytics itself is an area. The interesting thing is that analytics currently fits to some degree within each of the 4 horizontal functional areas (e.g., Configuration/Change, Security/Protection), as each of those products offers advanced reporting, and as that progresses they do predictive reporting and analytics around that functional area.

Analytics would also show up at the dashboard level (currently beyond the scope of what I’m defining as the functional areas of the Data Center Automation Blueprint) where you would correlate business intelligence, patterns etc. across not just Data Center Automation functional categories but also across manual task orchestration (e.g., service/help desk) details.

Thoughts?

One more thing to clear up: I know some (many) of these functional categories and their products extend beyond the Data Center. The lens this blog looks through is exclusively focused on the challenges posed by large, complex data centers. For example, I know performance products are also useful in companies of all sizes (big & small) and also beyond the data center (e.g., headquarters, remote offices, partner networks, etc.).


Oct 31 2007   8:12PM GMT

Activities in Application, System & Network Performance Monitoring



Posted by: Ryan Shopp
Microsoft Windows, EMC, Symantec, BMC, HP Software, CA, IBM Tivoli, Accellent, InfoVista, Solarwinds, Systems monitoring, Network monitoring, Application monitoring, Performance management, Quest Software, Networking, DataCenter

Big item to post about right out of the gate!  We all are familiar with the “Performance Management” sector within the Data Center.  Quick couple-sentence summary: software that automates the collection of performance data and the identification of potential performance bottlenecks within the data center.  Performance bottlenecks meaning real-time delays, conditions that are affecting productivity, or analytics that leverage historical collected data to help predict a potential performance concern before it happens.
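To illustrate the "predict a concern before it happens" part, here is a minimal sketch of trend-based prediction: fit a straight line to historical samples and estimate how long until the metric reaches a limit. This is an assumption-laden toy (plain least squares, hypothetical function names), not how any particular vendor does it; commercial analytics deal with seasonality, baselines and noise that a straight line ignores.

```python
from typing import List, Optional, Tuple


def linear_fit(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for (time, value) samples."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in points) / var_t
    return slope, mean_v - slope * mean_t


def time_to_threshold(points: List[Tuple[float, float]],
                      threshold: float) -> Optional[float]:
    """Time units after the last sample until the fitted trend reaches threshold.

    Returns None when the trend is flat or falling, and 0.0 when the fitted
    line is already at or above the threshold.
    """
    slope, intercept = linear_fit(points)
    if slope <= 0:
        return None
    crossing_time = (threshold - intercept) / slope
    last_time = max(t for t, _ in points)
    return max(0.0, crossing_time - last_time)


# Illustrative data: disk usage (%) sampled once per day, growing 5 points a day.
history = [(0.0, 50.0), (1.0, 55.0), (2.0, 60.0), (3.0, 65.0)]
print(time_to_threshold(history, 90.0))  # prints 5.0 (days until the 90% mark)
```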

Now there are a TON of large players in this space, which we will review in more detail in upcoming posts (e.g., BMC, CA, HP, IBM, EMC, Symantec, Quest Software, Microsoft), but today I want to hit on a couple of vendors you should consider if you’re tired of working with your current vendor (most likely one of the big names above).

InfoVista is one of the last pure-play companies that provide solutions for automating Data Center Performance Management/Monitoring.  Yesterday, they finally announced a move (after years of OEM’ing various products) to round out their solution on the application performance management side.  I’ve talked to a number of large global enterprise/telecom customers who speak the gospel about the quality and capabilities of their products.  They’ve been known in the past for their network and systems centric capabilities, but with the acquisition of Accellent they now own the application monitoring technology.  Now, let’s be clear - their solution is designed for large, large Enterprises and/or Telecommunication companies.  If you’re not looking to do a major global deployment spanning large data centers and/or vast numbers of remote offices, this solution may be overkill for you.   If that is the case I would recommend you take a look at another company.

Solarwinds is making some major investments in their offerings.  If you’re a small or medium business, or wish to manage a portion (a specific group/organization) of a larger enterprise, then take a look at their Orion product line.  You get a major bang for your buck (many times 75% of the functionality you use from one of the big guys, at a price point most likely less than your annual maintenance contract).  The other beautiful thing is you can download and evaluate the product in all its glory without ever talking to a single sales person.  Also, they have a very active community behind their products, including a great blog, Geek Speak by my friend Josh Stephens, that provides very useful insights and perspective on leveraging their products.