Adventures in Data Center Automation:

Networking

Jun 23 2008   3:00PM GMT

So let’s talk a little about Traffic Flow Reporting and Analysis



Posted by: Ryan Shopp
DataCenter, Alcatel-Lucent, Compuware, Accellent, Application monitoring, HP Software, InfoVista, NetScout, Solarwinds, Network monitoring, Packet Design, Performance management, NetQoS, Opnet, Xangati

Next up, I plan to dig into this sector a little deeper (as always from a purely data center centric perspective - aka no End-User Monitoring that requires a desktop agent).

The priority for these products is to provide an end-to-end service/application perspective on traffic performance and capacity. The goals: help quickly troubleshoot from an application or end-point perspective, or better understand how much traffic is going where across the infrastructure. All this from a network-centric control point (no loading of agents on a server or client, since the network team doesn’t own the responsibility for those).

So on the surface I see two main categories (each has subcategories that I’ll dig into during follow-up posts)

Flow Reporting-centric (these vendors gather Cisco NetFlow, J-flow, sFlow from infrastructure agents and report in various ways)

  • Netscout, Solarwinds, CA eHealth, NetQoS, Mazu Networks, Xangati, InfoVista, Opnet, Lancope, Packet Design, Q1 Labs, Alcatel-Lucent VitalNet, HP Performance Insight - to name a few

Flow Self-Collection & Reporting (these vendors span/tap actual traffic flows and report in various ways)

  • NetQoS, Mazu Networks, InfoVista (through acquisition of Accellent), Lancope, CA Wily, Q1 Labs, Compuware - to name a few

I quickly notice now that many of the vendors actually support both, which I assume is about flexibility, as some customers don’t have NetFlow-type capabilities enabled or don’t wish to enable them for a variety of reasons.

So the first set of questions I’m now reading about and researching is:

1) What are the key benefits of going the self-collection route over the reporting-only route? Unique metrics? Scalability? Limitations around NetFlow (e.g., performance)?

2) When it comes to reporting only, using NetFlow etc., what metrics are being used these days?

I remember first integrating and being able to report on RMON2 probes and early Cisco NetFlow data back in 2001 within the Lucent VitalNet product…so where are things 6 years later, now that NetFlow is much more pervasive and, I’m sure, improved?

My assumptions on some of these are as follows (vendors & users, please leave comments to help educate me for my follow-up posts):

When it comes to reporting, there are historical/capacity-centric reports and there are real-time/troubleshooting-centric views. My assumption (again, currently an assumption..I haven’t read too much on this topic yet) is that most of the reporting-centric vendors (that don’t also offer their own passive flow monitoring capability) are focused more on those historical/capacity reports (e.g., eHealth, Solarwinds, InfoVista). These reports show how much data is going where, and what type of data it is, over a day/week/month etc. Once this data is archived, they slice & dice it in a variety of ways. But basically it’s about looking at it for trends over time.
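To make the "how much data is going where over a day/week/month" idea concrete, here is a minimal sketch of that kind of roll-up. It assumes the flow records have already been decoded into simple tuples; the record layout, the sample values and the function names are all made up for illustration and are not taken from any vendor's product.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, already-decoded flow records:
# (timestamp, source, destination, application, bytes).
# Real NetFlow/sFlow exports carry many more fields.
FLOWS = [
    (datetime(2008, 6, 2, 9, 15), "10.0.0.5", "10.1.0.9", "http", 120_000),
    (datetime(2008, 6, 2, 14, 40), "10.0.0.5", "10.1.0.9", "http", 80_000),
    (datetime(2008, 6, 3, 10, 5), "10.0.0.7", "10.1.0.9", "smtp", 30_000),
    (datetime(2008, 6, 10, 11, 0), "10.0.0.5", "10.1.0.9", "http", 50_000),
]


def bucket_key(ts, period):
    """Map a timestamp to a day, ISO week, or month label."""
    if period == "day":
        return ts.strftime("%Y-%m-%d")
    if period == "week":
        year, week, _weekday = ts.isocalendar()
        return f"{year}-W{week:02d}"
    if period == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown period: {period}")


def bytes_by_period_and_app(flows, period):
    """Sum bytes per (time bucket, application) for a capacity-style report."""
    totals = defaultdict(int)
    for ts, _src, _dst, app, nbytes in flows:
        totals[(bucket_key(ts, period), app)] += nbytes
    return dict(totals)


if __name__ == "__main__":
    for key, total in sorted(bytes_by_period_and_app(FLOWS, "week").items()):
        print(key, total)
```

Swapping the grouping key (source, destination, interface) is what the "slice & dice" part amounts to in a sketch like this.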

Now, when it comes to real-time, since so much data is coming in so quickly there needs to be extra intelligence/automation helping out: building a “what looks normal” model, then identifying when something “odd” happens and alerting someone. Of course, these products need to store/report on much of the same data as the historic/capacity-centric products as they build credibility and trust with their users.
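As a rough illustration of a “what looks normal” model, the sketch below keeps a sliding window of recent samples and flags a new value that sits several standard deviations away from the window's mean. This is only one simple way to baseline; the class name, window size and tolerance are assumptions for the example, and I have no insight into what the vendors actually use.

```python
from collections import deque
from statistics import mean, pstdev


class Baseline:
    """Sliding-window baseline; a hypothetical stand-in for vendor logic."""

    def __init__(self, window=12, tolerance=3.0):
        self.samples = deque(maxlen=window)
        self.tolerance = tolerance

    def observe(self, value):
        """Return True if value looks odd versus the recent window."""
        odd = False
        # Only judge once the window is full, so early samples just train it.
        if len(self.samples) == self.samples.maxlen:
            avg = mean(self.samples)
            spread = pstdev(self.samples)
            # Guard against a zero spread when all samples are identical.
            odd = abs(value - avg) > self.tolerance * max(spread, 1e-9)
        # Odd values are not learned, so one spike does not shift "normal".
        if not odd:
            self.samples.append(value)
        return odd


if __name__ == "__main__":
    model = Baseline(window=5)
    for sample in [100, 102, 98, 101, 99, 100, 500, 101]:
        print(sample, "ODD" if model.observe(sample) else "ok")
```

Leaving odd samples out of the window is a design choice: it keeps a single spike from redefining normal, at the cost of adapting slowly to a genuine shift in traffic.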

So when it comes down to it..much of the same data is being used for 2 distinct users…one focused on planning improvements and the other focused on quickly resolving issues. So now that I’ve finished writing this post, a better way to organize the field of play is probably not by technology (NetFlow vs. Self-Collect) but by usage. I’ll read some more and do that next time.

Another angle to ponder on this topic will be around the WAN acceleration/optimization vendors…but again, for another day.

Apr 17 2008   9:58PM GMT

Performance and Availability Management vs. Analytics - Part 1 of ?



Posted by: Ryan Shopp
nimsoft, cittio, eg innovations, Alcatel-Lucent, Analytics, Apparent Networks, Brix Networks, Compuware, Entuity, Fluke Networks, Gomez, Groundwork, Hyperic, Indicative, Application monitoring, DCAB, Firescope, HP Software, IBM Tivoli, InfoVista, Integrien, NetScout, Netuitive, Solarwinds, Systems monitoring, BMC, Quest Software, NetIQ, Network monitoring, Packet Design, Performance management, CA, Keynote, NAGIOS, NetQoS, Network Instruments, OpenNMS, Opnet, Xangati, ZenOSS

I’ve had an opportunity to be briefed over the past couple months by a number of current Data Center Automation Blueprint’s Performance & Availability vendors (e.g., CITTIO, eG Innovations, InfoVista, Integrien, Nimsoft).  With that and some further research I think I’m ready to take another pass at this area of the blueprint.

First up, all these vendors use a variety of techniques to collect a variety of data from as many points of view as possible.

  • Their own server agents that collect data about systems, services, applications, databases, etc and then aggregate back to a centralized console
  • Agent-less centralized consoles that leverage infrastructure standard communications protocols (e.g., SNMP, RPC, ODBC, WMI, SSH, TCP, UDP, HTTP) to query or connect remotely to collect data from networks, systems, services, applications, databases, etc.
  • Passive traffic flow collectors (which can be agents or appliances) that are either in-line with the traffic flows or receive an exact copy of all traffic flows traversing a network connection (e.g., switch port uplink) through hardware vendor capabilities (e.g., spanning)

These data collection points can be statistics about a specific IT infrastructure resource: physical devices, virtual devices, physical connections, virtual connections or resources running on physical or virtual devices like services, processes, applications, databases, etc.

Or the data collection points can be traffic flows or end-to-end specifics including passive traffic flows, synthetic transactions or even something as simple as pinging from remote points.

Metrics that are captured typically revolve around throughput, errors, utilization, latency, up/down status, etc. (there are way too many to mention here).

After saying all this, there is a list a mile long of vendors (a number already noted on the DCAB) that capture these predominately time-series oriented data points about performance, capacity, availability using any/all these methods or vantage points (I know, passive traffic flows are not time-series data but patterns/usage/performance etc can be determined from them).

So, with all that data, what most of these vendors offer are two primary types of functionality: 1) a variety of graphical reports and 2) metric thresholding capabilities that produce a list of outstanding issues/alerts/alarms/events/concerns (whatever you want to call them).
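The thresholding half of that is simple enough to sketch. The example below compares the latest sample for each resource/metric pair against warning and critical limits and returns the outstanding alerts. The metric names, limits and data shapes are invented for illustration, not drawn from any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Warning and critical limits for one metric (hypothetical shape)."""
    metric: str
    warning: float
    critical: float


def evaluate(samples, thresholds):
    """Turn the latest samples into a list of outstanding alerts.

    samples: {(resource, metric): value}
    thresholds: {metric: Threshold}
    Returns (severity, resource, metric, value) tuples, critical first.
    """
    alerts = []
    for (resource, metric), value in samples.items():
        rule = thresholds.get(metric)
        if rule is None:
            continue  # no threshold defined for this metric
        if value >= rule.critical:
            severity = "critical"
        elif value >= rule.warning:
            severity = "warning"
        else:
            continue
        alerts.append((severity, resource, metric, value))
    # "critical" sorts before "warning" alphabetically.
    return sorted(alerts)


if __name__ == "__main__":
    rules = {"utilization_pct": Threshold("utilization_pct", 80, 90)}
    latest = {("router1", "utilization_pct"): 95,
              ("switch2", "utilization_pct"): 82}
    for alert in evaluate(latest, rules):
        print(alert)
```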

Ok, so why did I organize and point all this out? So I can draw a line around where most of the innovation is occurring, from my perspective. The above is, for the most part, a commodity in my eyes these days. Most companies have had collection/reporting/thresholding capabilities spanning multiple technology silos since pretty close to the start of enterprise networking. The reports continue to get fancier, the number of data sources a single product collects from continues to expand, etc.  Another sign of commoditization is the variety of business models for offering these products: open source, managed service providers, internet-distributed products, appliance deployment models with indirect sales forces, large enterprise direct sales forces, completely flexible frameworks for service providers to basically “build their own,” etc.

For the most part, the majority of technical innovation these days is occurring in the next layer above this data collection, reporting and alerting. Now let me say this: yes…there is some great innovation still occurring in the data collection realm (e.g., Xangati offering real-time NetFlow down to a user level, PacketDesign monitoring routing messages, NetQoS leveraging advanced TCP/IP theory to analyze where end-to-end bottlenecks are occurring). But, for the most part, these new data sources are being used to augment or replace currently deployed data sources in an attempt to see things from either as many vantage points as possible or the best vantage points, to avoid surprises within each unique enterprise IT environment.

So where is the serious innovation coming from…stay tuned for part 2.


Mar 26 2008   2:03PM GMT

IT Performance Management Call for Resources; I have a dream for performance management



Posted by: Ryan Shopp
Integrien, Netuitive, RBA, Run Book Automation, BMC, Performance management

So in my last posting I called out for links and resources that people recommend to others when it comes to understanding the variety of options and functions for Network & Application Performance Management.  Upon making the request I decided to spend a few minutes looking around.  First up for me was a quick trip over to Wikipedia to see what they have on the topic.

On the topic of Network Performance Management; there is a nice write-up  on factors that contribute to performance issues - Latency, Packet loss, retransmission, throughput.

On the topic of Application Performance Management; there were some very in-depth graphs focused around monitoring response time which I found intriguing.

On the topic of Performance Engineering; I was very surprised not only by a nice write-up of principles and perspectives related to the software development lifecycle, but also a laundry list of interesting and applicable whitepapers at the bottom.

So at this point I stopped and started pondering: is there a product out there that goes beyond grabbing statistics and reporting on them?  Some tools collect data from flows, some collect data from individual resources, some tools set up endpoints that systematically send synthetic transactions to measure response times, etc.

What do I really mean by this…is there a product that takes a troubleshooting workflow (think Run Book Automation) approach to the different steps involved in diagnosing a performance concern?  Here is what I mean…

  • Start with monitoring traffic flows for their response time
  • Automatically baseline this and when a major deviation occurs go to the next bullet point
  • Is this traffic delay specific to one type of traffic or is it affecting all traffic?
  • What is causing this anomaly? Calculate which points of the infrastructure are traversed by these traffic flows
  • Look at each input/output point on the infrastructure (e.g., interfaces) to see if there are errors, retransmissions, etc
  • If no errors, next look at each input/output point on the infrastructure to see if throughput is bottlenecked.
  • If no bottlenecks, next look at the processors/CPU on each point of the infrastructure to see if that is causing the delay
  • If no processor delays, look at…. (etc, etc, etc)
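The steps above can be sketched as an ordered set of checks, where each check only runs if the previous one found nothing. This is a toy illustration of the workflow idea, not any product's logic; the metric names, limits and the shape of the per-hop data are all assumptions.

```python
def scope(response_ms, baseline_ms, factor=2.0):
    """Decide whether a slowdown hits one traffic type or all of them.

    Both arguments map a traffic type to a response time in ms.
    """
    slow = sorted(app for app, ms in response_ms.items()
                  if ms > factor * baseline_ms[app])
    if not slow:
        return "normal", []
    if len(slow) == len(response_ms):
        return "all traffic", slow
    return "specific traffic", slow


def diagnose(path, error_limit=0, utilization_limit=90.0, cpu_limit=85.0):
    """Walk the hops a flow traverses, checking one cause at a time.

    path: list of dicts with hypothetical keys
    name, errors, utilization_pct and cpu_pct.
    """
    checks = [
        ("interface errors", lambda hop: hop["errors"] > error_limit),
        ("throughput bottleneck",
         lambda hop: hop["utilization_pct"] > utilization_limit),
        ("processor load", lambda hop: hop["cpu_pct"] > cpu_limit),
    ]
    for cause, failed in checks:
        suspects = [hop["name"] for hop in path if failed(hop)]
        if suspects:
            return cause, suspects  # stop at the first step that explains it
    return "undetermined", []


if __name__ == "__main__":
    hops = [
        {"name": "edge-router", "errors": 0,
         "utilization_pct": 40.0, "cpu_pct": 30.0},
        {"name": "core-switch", "errors": 0,
         "utilization_pct": 97.0, "cpu_pct": 50.0},
    ]
    print(scope({"http": 900, "smtp": 40}, {"http": 200, "smtp": 35}))
    print(diagnose(hops))
```

Adding another step ("If no processor delays, look at…") is just another entry in the checks list, which is roughly what a run book is.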

At this point I think we get the picture.  Most products I’m familiar with collect data metrics from one, two, three, etc points of view on the network and roll those up into impressive-looking graphical reports.  Then it’s up to the administrator to review each report and self-analyze.  As mentioned in previous posts, I’m familiar with Integrien, Netuitive & BMC (ProactiveNet), who perform impressive behavioral baselining to create more intelligent alerts to forward to the event management console, but I’m looking for more here.  I want someone to take all the collected data and basically apply root cause analysis/run book automation principles.  If someone is out there doing this, please speak up and throw a link to your site down in the comments so I can come take a look.


Mar 11 2008   1:27PM GMT

EMC adds Service Desk to Data Center Management portfolio



Posted by: Ryan Shopp
BladeLogic, DCAB, HP Software, BMC, NetIQ, Performance management, Symantec, EMC, NetQoS, Packet Design, Xangati

EMC made a move yesterday that continued to show their intent and desire to compete against the Big 4 in IT Infrastructure Management (e.g., BMC, CA, HP, IBM).  All those other players have their own Service Desk offering, so it was time to join those ranks.

Infra Corporation was acquired by EMC’s Resource Management Software Business Unit for undisclosed financial terms.

Combined with their previous acquisitions:

SMARTS - Availability & Performance Management - Q1 2005
nLayers -  IT  Resource Reconciliation (e.g., CMDB) - Q3 2006
Voyence - Configuration & Change Management (for Network Devices) - Q4 2007

This acquisition shows a slowly increasing pace of their acquisitions (within the software group).  With that being said, looking at their portfolio, I would be surprised if we don’t see another one or maybe even two (depending on the size) before the year is out.  Areas they could benefit from (aka we could see) would be Configuration & Change Management (for Systems/Applications) or a move to strengthen their Availability & Performance Management offering; specifically more application performance centric.

On the CCM front there are numerous virtual & physical system configuration vendors sprouting up these days, whereas before the primary game in town was BladeLogic (or Opsware before HP acquired them).  Meanwhile, on the Performance Management front they have a variety of options that could include grabbing a smaller application performance appliance vendor (e.g., Mazu, Xangati, Packet Design) or something bigger like maybe a NetQoS.  Or even bigger and more interesting (but convoluted) could be buying out NetIQ, who continues to innovate within Attachmate (e.g., Aegis product), or the artist formerly known as Precise Software (and now again known by the same name after Symantec spun them back out).  Probably long shots, but just thoughts to ponder, as the EMC Resource Management Software portfolio could use expansion in either or both functional areas of the DCAB.

Bottom line from my outsider’s perspective is EMC is one or two moves away from changing conversations from the big 4 to maybe the big 5.


Mar 5 2008   7:59PM GMT

Top Enterprise Management Tools vs. Data Center Automation Blueprint



Posted by: Ryan Shopp
DataCenter, Analytics, Application monitoring, CMDB, DCAB, HP Software, IBM Tivoli, InfoVista, IT Process Automation, Netuitive, RBA, RealOps, Run Book Automation, Systems monitoring, BMC, Network configuration, Network monitoring, Networkingchannel, Performance management, CA, NetQoS, Opnet, Tideway

I was doing some “light” reading this morning and came upon this recent article:  Top 10 Enterprise Management Tools

It’s focused on Complete Enterprise Management, not specifically focused on the Data Center so I thought I would summarize and then compare/contrast/discuss:

  • Network Fault & Performance: CA eHealth & Spectrum
  • Consolidated Event Management: IBM Tivoli Netcool
  • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
  • Application Discovery Mapping: Tideway Foundation
  • Business Intelligence: Cognos
  • ITSM Workflow, CMDB and Service Desk: BMC Remedy ITSM and Atrium
  • Network & Systems Configuration Management: HP Automation (formerly Opsware SAS & NAS)
  • Process Automation: BMC RunBook Automation

Since it isn’t data center centric, it’s light on automated management for applications & databases.  It also chooses to stay away from the very congested and sometimes confusing security/protection market.

Next up, I thought  it would be fun to do a quick mapping to the Data Center Automation Blueprint.

  • Network Fault & Performance, Consolidated Event Management, Service Impact Monitoring = Availability & Performance
  • Application Discovery Mapping, CMDB = IT Resource Reconciliation
  • Business Intelligence = Analytics (maybe…Analytics is still a work in progress…need to figure out this vs. BSM etc)
  • ITSM Workflow, Service Desk = outside of DCAB listed as Manual Task Orchestration

I was surprised not to see an End-User Application Performance Monitoring category.  These products either do their duty from passive agents on the endpoint or from data center appliances using slick algorithms, TCP/IP theory, etc.  Maybe that could have indirectly been rolled under Network Fault & Performance, as CA acquired Wily which offers that.  The other one missing was more towards Capacity Planning and Trending Analytics, either based off historical data like what Opnet offers or from real-time data patterns like what Netuitive offers.

Needless to say I found it a really nice write-up and summary of those products/offerings.  The only thing I struggle with is that all of the big 4 (BMC, CA, HP, IBM) are represented in this mix.  Which means you will have 4 sales guys all continuously battling it out to grab more land.  This may be good from a cost competition standpoint, but it’s a real fiasco for making sure all parts are playing nicely with each other or simply managing those vendor relationships.  Bottom line, you’re always going to have at least one of the big 4 in there as they continue to snap up the innovative smaller companies/technologies to enhance their portfolio and offer differentiation.  So I’d typically recommend a strategy where you pick 2 of the big 4 and keep them in check versus each other while continually looking for those innovative start-ups to fill in the gaps.  Here is an example of how you could do this using the categories in the original article.

  • Network Fault & Performance: HP Network Node Manager, Operations Manager, Performance Insight
  • Consolidated Event Management: IBM Tivoli Netcool
  • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
  • Application Discovery Mapping: IBM Tivoli Application Dependency Discovery Manager
  • Business Intelligence: Cognos (which IBM recently acquired)
  • ITSM Workflow, CMDB and Service Desk: HP AssetCenter (former Peregrine)
  • Network & Systems Configuration Management: HP Data Center Automation (formerly Opsware SAS & NAS)
  • Process Automation: HP Operations Orchestration (formerly iConclude that Opsware acquired)

Or, if you want to completely rebel and go the non-big 4 route, take a look at the above mappings to the DCAB and look for a name that’s not big-4.  Example:  Network Fault & Performance: InfoVista or NetQoS


Feb 28 2008   4:55PM GMT

Analytics; What are the top capabilities?



Posted by: Ryan Shopp
Analytics, DataCenter, DCAB, Integrien, NCCM, BMC, Alterpoint, Configuresoft, Netuitive, Opnet

Recently, I made some adjustments to the Data Center Automation Blueprint where we combined 2 original areas and added a new one for Analytics.  Steve Henning just posted a great guest blog entry over at Doug McClure’s blog called “Why Real Time Analytics?” I personally liked the analogy to TQM and the manufacturing industry.

He also recently jotted down some of his thoughts on capabilities within the comments section for the posting “Data Center Automation Blueprint; now includes virtualization thoughts.”

Here are some of my initial thoughts that I will take another pass at cleaning up in the next week or two.  I wanted to get this posted in a timely manner to hopefully inspire some discussions:

1) Inter-domain Integrations - Steve called it “Cross Silo” in his comment post. The analytics solutions need to have a data model and API/SDK that is not specific to one domain (e.g., databases, Windows systems, network devices, WebSphere applications).  To perform holistic analysis you need more than one point of view.

2) Pattern Logic Automation- Automation through algorithms, rules etc that work to mimic the human problem solving / analysis process.

3) “Advanced” Graphical Visualization - more than summary graphics, pie charts etc…what I’m thinking of here is something I can look at that helps me see the pattern or some unique situation/trend affecting the business (e.g., correlation of trouble ticket and performance monitoring details).  A better name than “advanced” is needed here for sure.

So far the vendors I’m thinking of when I’m creating the above functionality list (as noted in the DCAB) include;

Who else do we believe should be in this analytics bucket? Thoughts on these 3 capabilities?  What are some others?


Jan 25 2008   9:00AM GMT

Couple recent notes on CMDB, aka Resource Reconciliation



Posted by: Ryan Shopp
DataCenter, CMDB, Opalis, Scalent, Symantec, BMC, NetIQ, CA

Another great post by Glenn O’Donnell; CMDB is the new integration mechanism. I’m looking forward to seeing his forthcoming book on the same topic!

2007 TechTarget Products of the Year - Data Center include (categories by DCAB functional categories):

Resource Reconciliation (category combined with Configuration & Change) solutions from CA, BMC and Scalent

A couple other categories that map to the DCAB are;

Process Orchestration solutions from Symantec, Opalis and CA

Performance & Capacity solutions from NetIQ, BalancePoint and CiRBA

I find the CiRBA solution very intriguing after my read and post on Innovations in Performance Management yesterday.


Jan 21 2008   1:43PM GMT

Quick Monday Summary of events from late last week/weekend



Posted by: Ryan Shopp
Compuware, Symantec, BMC, Quest Software, NetIQ, Indicative, NetQoS, NetScout

 Symantec to sell off Application Performance Monitoring group.  Looks like Precise Software is back and the Symantec Data Center group will focus in on the configuration and change management side of things.

BarcampESM took place over the weekend.  Here are some materials to take a look at.  BSM by Doug,  Discussions around open software and open standards, the desire for an “open agent” .  From this point forward keep track of things via the Open Management Consortium discussions.

Application Performance Management (APM) rolling review continues at InformationWeek - recently highlighted, ProactiveNet (recently acquired by BMC).  Previous reviews include Quest Software Foglight (Dec 2007), Network General (Nov 2007), Nimsoft Nimbus (Oct 2007), Compuware Vantage (Oct 2007), NetIQ AppManager (Sept 2007), NetQoS SuperAgent (Sept 2007), Indicative (Aug 2007).  As you can see this is a very congested space, pardon the pun, but it is sized at over $2B by Forrester.

Now that we’ve run through all 6 functional areas of the Data Center Automation Blueprint, we plan to discuss the impact of virtualization over the next couple of posts.  Thanks in advance to those I’ve been talking with for their perspectives on this topic.


Jan 17 2008   7:14PM GMT

What are the most desired features in IT Process Orchestration (e.g. RBA)?



Posted by: Ryan Shopp
DataCenter, Enigmatec, HP Software, IBM Tivoli, IT Process Automation, Opalis, Optinuity, RBA, RealOps, Run Book Automation, Stratavia, BMC, LANDesk, NetIQ, OpTier, Scapa Technologies

Alright, looking for feedback on this one. After talking about the players in the IT Process Orchestration space, I’m wondering what are the primary capabilities people are looking for?

Here are my top five, please feel free to throw down yours in the comments below:

  1. Drag/Drop graphical interface for designing process workflows
  2. Common, normalized Data Model of common/primary attributes
  3. Library of pre-defined, re-usable actions/triggers/processes for usage out-of-the-box (bigger the better - even a community that shares is a plus)
  4. Policy/Desired-state engine driving things
  5. Sandbox, simulator to help test workflows without impacting actual resources/instances within the production enterprise.

Beyond these five core capabilities, depending on the processes you wish to automate you need to verify what interaction/communications protocols are supported (e.g., SNMP, WMI, JMX, ODBC, Telnet/SSH/FTP to CLI, XML/Web Services). Make sure they have what you need to communicate with.

Of course, it also goes without saying (just like with any commercial product) table stakes require RBAC security, reporting, logging, appropriate hardware/software requirements.

Bottom line, I guarantee if you’re a medium to large enterprise you have current manual processes that these products can automate for you! That means reducing errors caused by the mundane nature of the task, freeing up the people currently doing the task for other projects, and the intangible benefit that it’s simply faster, which provides better customer service depending on the process that is automated. Make this a priority in 2008 and get one of these vendors in there to help out!

Disclosure: I have no relationships with any of the vendors in this space. The comments are all made based on my personal experiences and perspectives.


Jan 14 2008   8:42PM GMT

Digging into the DCAB 6’s functional areas: Process Orchestration



Posted by: Ryan Shopp
DataCenter, HP Software, IBM Tivoli, IT Process Automation, Opalis, Optinuity, RBA, Run Book Automation, Stratavia, BMC, NetIQ, OpTier, Scapa Technologies, LANDesk, Enigmatec, GridApp Systems

Alright, back on track with our review of the 6 functional DCAB areas. We are now onto the hottest, fastest-growing areas! First up, Process Orchestration, or what Gartner has coined Run Book Automation.

These products offer the ability to define, build, orchestrate, manage, monitor and report on workflows that automate specific IT intra- or inter-domain processes (intra = between different products for the Windows Server team; inter = between the application and network teams). There are a ton of case studies and examples on most of the players’ websites.

A couple quick examples to get a flavor include:

A monitoring product identifies a specific condition (e.g., an outage), it then checks a configuration auditing product to see if a recent change was performed for that system.

A configuration auditing product that monitors whether a device is in or out of compliance notices a situation and automatically opens a trouble ticket. Later, it notices the situation has been resolved, adds the appropriate details to the ticket and automatically closes it out.
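The second example above can be sketched in a few lines. This is only an illustration of the open-then-close pattern, using an in-memory stand-in for the ticketing system; the class and method names are hypothetical and do not correspond to any of the products listed below.

```python
class TicketSystem:
    """In-memory stand-in for a trouble-ticket system (hypothetical)."""

    def __init__(self):
        self.tickets = {}
        self._next_id = 1

    def open(self, device, summary):
        ticket_id = self._next_id
        self._next_id += 1
        self.tickets[ticket_id] = {
            "device": device, "summary": summary,
            "status": "open", "notes": [],
        }
        return ticket_id

    def close(self, ticket_id, note):
        ticket = self.tickets[ticket_id]
        ticket["notes"].append(note)
        ticket["status"] = "closed"


class ComplianceWorkflow:
    """Opens a ticket when a device drifts, closes it when it recovers."""

    def __init__(self, tickets):
        self.tickets = tickets
        self.open_by_device = {}  # device name -> open ticket id

    def on_audit(self, device, compliant, detail=""):
        ticket_id = self.open_by_device.get(device)
        if not compliant and ticket_id is None:
            self.open_by_device[device] = self.tickets.open(device, detail)
            return "opened"
        if compliant and ticket_id is not None:
            self.tickets.close(ticket_id, "compliance restored")
            del self.open_by_device[device]
            return "closed"
        return "no action"  # nothing changed since the last audit


if __name__ == "__main__":
    workflow = ComplianceWorkflow(TicketSystem())
    print(workflow.on_audit("fw-01", False, "ACL drift"))
    print(workflow.on_audit("fw-01", True))
```

Tracking the open ticket per device is what keeps repeated audit results from opening duplicates.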

Here are the companies I know about (as always, in alphabetical order)

BMC (formerly RealOps)
Enigmatec
GridApp
HP (formerly Opsware, formerly iConclude)
IBM (formerly ThinkDynamics)
LANDesk (Process Manager product)
NetIQ (Aegis product)
OpTier
Opalis
Optinuity
Scapa Technologies
Stratavia
UC4 Software
xTigo

As always, who am I missing? What are the opinions out there from users or evaluators of each platform (please chime in down in the comments section)? I have personal product exposure and experience with only BMC and Stratavia. Some of the key features that I learned from those products included the value of having a normalized, common data model and “action” abstraction capabilities so you can re-use previous process actions in new workflows.

Here are a couple good reviews and write-ups for further reading if desired.

Data Center Manager Primed for IT Process Automation
IT Process Automation Overview and review of some players