Adventures in Data Center Automation:

InfoVista

Jun 23 2008   3:00PM GMT

So let’s talk a little about Traffic Flow Reporting and Analysis



Posted by: Ryan Shopp
DataCenter, Alcatel-Lucent, Compuware, Accellent, Application monitoring, HP Software, InfoVista, NetScout, Solarwinds, Network monitoring, Packet Design, Performance management, NetQoS, Opnet, Xangati

Next up, I plan to dig into this sector a little deeper (as always from a purely data center centric perspective - aka no End-User Monitoring that requires a desktop agent).

The priority for these products is to provide an end-to-end service/application perspective on traffic performance and capacity. The goals: help quickly troubleshoot from an application or end-point perspective OR better understand what traffic is going where, and at what levels, across the infrastructure. All this from a network-centric control point (no loading of agents on a server or client - since the network team doesn’t own the responsibility for those).

So on the surface I see two main categories (each has subcategories that I’ll dig into during follow-up posts)

Flow Reporting-centric (these vendors gather Cisco NetFlow, J-flow, sFlow from infrastructure agents and report in various ways)

  • NetScout, Solarwinds, CA eHealth, NetQoS, Mazu Networks, Xangati, InfoVista, Opnet, Lancope, Packet Design, Q1 Labs, Alcatel-Lucent VitalNet, HP Performance Insight - to name a few

Flow Self-Collection & Reporting (these vendors span/tap actual traffic flows and report in various ways)

  • NetQoS, Mazu Networks, InfoVista (through acquisition of Accellent), Lancope, CA Wily, Q1 Labs, Compuware - to name a few

I quickly notice now that many of the vendors actually support both - which I assume is about flexibility, as some customers don’t have NetFlow-type capabilities enabled or don’t wish to enable them for a variety of reasons.

So my first set of questions/experiences I’m now reading/researching about are:

1) What are the key benefits to going the self-collection route over the reporting-only route? Unique metrics? Scalability? Limitations around NetFlow (e.g., performance)?

2) When it comes to reporting only using NetFlow, etc - what metrics are being used these days?

I remember first integrating and being able to report on RMON2 probes and early Cisco NetFlow data back in 2001 within the Lucent VitalNet product…so where are things 6 years later now that NetFlow is much more pervasive and I’m sure improved.

My assumptions on some of these are as follows (vendors & users, please leave comments to help educate me for my follow-up posts):

When it comes to reporting, there are historical/capacity centric reports & there are real-time/troubleshooting centric views. My assumption (again, currently an assumption…I haven’t read too much on this topic yet) is that most of the reporting centric vendors (that don’t also offer their own passive flow monitoring capability) are focused more on those historical/capacity reports (e.g., eHealth, Solarwinds, InfoVista). These reports show how much data is going where and what type of data it is over a day/week/month etc. Once this data is archived, they slice & dice it in a variety of ways. But basically it’s about looking at it for trends over time.
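To make the historical/capacity idea concrete, here is a minimal sketch of the "how much data is going where, by day" roll-up. It is not how any of the vendors above actually do it. The record layout, the sample values and the port-to-application mapping are all assumptions made up for illustration; real NetFlow/sFlow exports carry many more fields.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical, simplified flow records: (unix_time, src, dst, dst_port, bytes).
# Real NetFlow/sFlow exports carry many more fields than this.
FLOWS = [
    (1214179200, "10.0.0.5", "10.0.1.20", 80, 12000),
    (1214182800, "10.0.0.6", "10.0.1.20", 80, 8000),
    (1214186400, "10.0.0.5", "10.0.1.30", 1433, 50000),
    (1214265600, "10.0.0.7", "10.0.1.20", 80, 4000),
]

# Assumed port-to-application mapping, for illustration only.
APP_BY_PORT = {80: "http", 1433: "mssql"}


def daily_bytes_by_app(flows):
    """Sum bytes per (UTC day, application) from simplified flow records."""
    totals = defaultdict(int)
    for ts, _src, _dst, port, nbytes in flows:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        app = APP_BY_PORT.get(port, "other")
        totals[(day, app)] += nbytes
    return dict(totals)


if __name__ == "__main__":
    for (day, app), nbytes in sorted(daily_bytes_by_app(FLOWS).items()):
        print(day, app, nbytes)
```

Once the data is bucketed like this, the weekly and monthly views are just the same sum over a bigger bucket, which is why I think of these reports as trend tools rather than troubleshooting tools.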

Now, when it comes to real-time, since so much data is coming in so quickly there needs to be extra intelligence/automation helping out - building a “what looks normal” model and then focusing on identifying and alerting someone when something “odd” is noted. Of course, these products need to store/report on much of the same data as the historic/capacity centric products as they build credibility and trust with their users.
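Here is a rough sketch of what a “what looks normal” model could look like in its simplest form. I am assuming a rolling mean and standard deviation over the last few samples; the function name, window size and the floor on the spread are my own choices for illustration, and the real products surely use far more sophisticated models.

```python
from collections import deque
from statistics import mean, pstdev


def find_anomalies(samples, window=5, k=3.0):
    """Return indexes of samples that deviate from a rolling baseline.

    The baseline is the mean of the previous `window` samples; a sample is
    flagged when it differs from that mean by more than k standard deviations.
    """
    history = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(samples):
        if len(history) == window:
            baseline = mean(history)
            spread = pstdev(history)
            # A floor on the spread avoids flagging tiny wiggles on flat traffic.
            if abs(value - baseline) > k * max(spread, 1.0):
                flagged.append(i)
        history.append(value)
    return flagged


if __name__ == "__main__":
    # Hypothetical per-minute traffic samples (e.g. Mbps) with one spike.
    print(find_anomalies([100, 102, 98, 101, 99, 100, 500, 101]))
```

One weakness of this naive version is that the spike itself goes into the history and inflates the baseline for the next few samples, which hints at why the real-time vendors need that extra intelligence.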

So when it comes down to it..much of the same data is being used for 2 unique users…one focused on planning improvements and the other focused on quickly resolving issues. So now that I’ve finished writing this post a better way to probably organize the field of play is not by technology (NetFlow vs. Self-Collect) but by usage. I’ll read some more and do that next time.

Another angle to ponder on this topic will be around the WAN acceleration/optimization vendors…but again, for another day.

Apr 17 2008   9:58PM GMT

Performance and Availability Management vs. Analytics - Part 1 of ?



Posted by: Ryan Shopp
nimsoft, cittio, eg innovations, Alcatel-Lucent, Analytics, Apparent Networks, Brix Networks, Compuware, Entuity, Fluke Networks, Gomez, Groundwork, Hyperic, Indicative, Application monitoring, DCAB, Firescope, HP Software, IBM Tivoli, InfoVista, Integrien, NetScout, Netuitive, Solarwinds, Systems monitoring, BMC, Quest Software, NetIQ, Network monitoring, Packet Design, Performance management, CA, Keynote, NAGIOS, NetQoS, Network Instruments, OpenNMS, Opnet, Xangati, ZenOSS

I’ve had an opportunity to be briefed over the past couple months by a number of current Data Center Automation Blueprint’s Performance & Availability vendors (e.g., CITTIO, eG Innovations, InfoVista, Integrien, Nimsoft).  With that and some further research I think I’m ready to take another pass at this area of the blueprint.

First up, all these vendors use a variety of techniques to collect a variety of data from as many points of view as possible.

  • Their own server agents that collect data about systems, services, applications, databases, etc and then aggregate back to a centralized console
  • Agent-less centralized consoles that leverage infrastructure standard communications protocols (e.g., SNMP, RPC, ODBC, WMI, SSH, TCP, UDP, HTTP) to query or connect remotely to collect data from networks, systems, services, applications, databases, etc.
  • Passive traffic flow collectors (which can be an agent or an appliance) that are either in-line with the traffic flows or receive an exact copy of all traffic flows traversing a network connection (e.g., switch port uplink) through hardware vendor capabilities (e.g., spanning)

These data collection points can be statistics about a specific IT infrastructure resource: physical devices, virtual devices, physical connections, virtual connections, or resources running on physical or virtual devices like services, processes, applications, databases, etc.

Or the data collection points can be traffic flows or end-to-end specifics, including passive traffic flows, synthetic transactions or even something as simple as pinging from remote points.

Metrics that are captured typically revolve around throughput, errors, utilization, latency, up/down status, etc. (there are way too many to mention here).

After saying all this, there is a list a mile long of vendors (a number already noted on the DCAB) that capture these predominately time-series oriented data points about performance, capacity, availability using any/all these methods or vantage points (I know, passive traffic flows are not time-series data but patterns/usage/performance etc can be determined from them).

So, with all that data, what most of these vendors offer are two primary types of functionality: 1) a variety of graphical reports and 2) metric thresholding capabilities that produce a list of outstanding issues/alerts/alarms/events/concerns (whatever you want to call them).
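The thresholding half of that is simple enough to sketch in a few lines. The metric names, limits and sample tuples below are hypothetical and not taken from any vendor's product; the sketch only shows the basic "compare each collected value to a static limit and emit an alert" pattern.

```python
# Hypothetical static thresholds per metric (upper limits), for illustration.
THRESHOLDS = {"cpu_util_pct": 90.0, "link_util_pct": 80.0, "latency_ms": 200.0}


def evaluate(samples, thresholds=THRESHOLDS):
    """Return one alert string per sample that exceeds its metric's threshold.

    `samples` is an iterable of (resource, metric, value) tuples. Metrics
    without a configured threshold are ignored.
    """
    alerts = []
    for resource, metric, value in samples:
        limit = thresholds.get(metric)
        if limit is not None and value > limit:
            alerts.append(f"{resource}: {metric}={value} exceeds {limit}")
    return alerts


if __name__ == "__main__":
    collected = [
        ("sw1-uplink", "link_util_pct", 93.5),
        ("db1", "cpu_util_pct", 45.0),
    ]
    for alert in evaluate(collected):
        print(alert)
```

That so little logic is needed is part of why I consider this layer a commodity.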

Ok, so why did I organize and point all this out? So I can draw a line around where most of the innovation is occurring, from my perspective. The above is, for the most part, a commodity in my eyes these days. Most companies have had collection/reporting/thresholding capabilities spanning multiple technology silos since pretty close to the start of enterprise networking. The reports continue to get fancier, the number of data sources a single product collects from continues to expand, etc.  Another sign of commoditization is the variety of business models for offering these products: open source, managed service providers, internet distributed products, appliance deployment models and indirect sales forces, large enterprise direct sales forces, completely flexible frameworks for service providers to basically “build their own,” etc.

For the most part, the majority of technical innovation these days is occurring in the next layer above this data collection, reporting and alerting. Now let me say this, yes…there is some great innovation still occurring in the data collection realm (e.g., Xangati offering real-time NetFlow down to a user level, Packet Design monitoring routing messages, NetQoS leveraging advanced TCP/IP theory to analyze where end-to-end bottlenecks are occurring). But for the most part these new data sources are being used to augment or replace currently deployed data sources, in an attempt to see things from either as many vantage points as possible or the best vantage points, to avoid surprises within each unique enterprise IT environment.

So where is the serious innovation coming from…stay tuned for part 2.


Mar 5 2008   7:59PM GMT

Top Enterprise Management Tools vs. Data Center Automation Blueprint



Posted by: Ryan Shopp
DataCenter, Analytics, Application monitoring, CMDB, DCAB, HP Software, IBM Tivoli, InfoVista, IT Process Automation, Netuitive, RBA, RealOps, Run Book Automation, Systems monitoring, BMC, Network configuration, Network monitoring, Networkingchannel, Performance management, CA, NetQoS, Opnet, Tideway

I was doing some “light” reading this morning and came upon this recent article:  Top 10 Enterprise Management Tools

It’s focused on Complete Enterprise Management, not specifically focused on the Data Center so I thought I would summarize and then compare/contrast/discuss:

  • Network Fault & Performance: CA eHealth & Spectrum
  • Consolidated Event Management: IBM Tivoli Netcool
  • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
  • Application Discovery Mapping: Tideway Foundation
  • Business Intelligence: Cognos
  • ITSM Workflow, CMDB and Service Desk: BMC Remedy ITSM and Atrium
  • Network & Systems Configuration Management: HP Automation (formerly Opsware SAS & NAS)
  • Process Automation: BMC RunBook Automation

Since it isn’t data center centric, it’s light on automated management for applications & databases.  It also chooses to stay away from the very congested and sometimes confusing security/protection market.

Next up, I thought  it would be fun to do a quick mapping to the Data Center Automation Blueprint.

  • Network Fault & Performance, Consolidated Event Management, Service Impact Monitoring = Availability & Performance
  • Application Discovery Mapping, CMDB = IT Resource Reconciliation
  • Business Intelligence = Analytics (maybe…Analytics is still a work in progress…need to figure out this vs. BSM etc)
  • ITSM Workflow, Service Desk = outside of DCAB listed as Manual Task Orchestration

I was surprised not to see an End-User Application Performance Monitoring category.  These products either do their duty from passive agents on the endpoint or from data center appliances using slick algorithms, TCP/IP theory, etc.  Maybe that could have indirectly been rolled under Network Fault & Performance, as CA acquired Wily which offers that.  The other one missing was more towards Capacity Planning and Trending Analytics, either based off historical data like what Opnet offers or from real-time data patterns like what Netuitive offers.

Needless to say I found it a really nice write-up and summary of those products/offerings.  The only thing I struggle with is that all of the big 4 (BMC, CA, HP, IBM) are represented in this mix, which means you will have 4 sales guys all continuously battling it out to grab more land.  This may be good from a cost competition standpoint, but it’s a real fiasco for making sure all parts are playing nicely with each other or simply managing those vendor relationships.  Bottom line, you’re always going to have at least one of the big 4 in there as they continue to snap up the innovative smaller companies/technologies to enhance their portfolio and offer differentiation.  So I’d typically recommend a strategy where you pick 2 of the big 4 and keep them in check versus each other while continually looking for those innovative start-ups to fill in the gaps.  Here is an example of how you could do this using the categories in the original article.

  • Network Fault & Performance: HP Network Node Manager, Operations Manager, Performance Insight
  • Consolidated Event Management: IBM Tivoli Netcool
  • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
  • Application Discovery Mapping: IBM Tivoli Application Dependency Discovery Manager
  • Business Intelligence: Cognos (which IBM recently acquired)
  • ITSM Workflow, CMDB and Service Desk: HP AssetCenter (formerly Peregrine)
  • Network & Systems Configuration Management: HP Data Center Automation (formerly Opsware SAS & NAS)
  • Process Automation: HP Operations Orchestration (formerly iConclude that Opsware acquired)

Or, if you want to completely rebel and go the non-big 4 route, take a look at the above mappings to the DCAB and look for a name that’s not big-4.  Example:  Network Fault & Performance: InfoVista or NetQoS


Dec 28 2007   11:31PM GMT

Digging into each of these 6 functional areas: Performance and Capacity



Posted by: Ryan Shopp
DataCenter, HP Software, IBM Tivoli, InfoVista, Integrien, Netuitive, Systems monitoring, OSS, BMC, Quest Software, NetIQ, Network monitoring, Performance management, CA, Zabbix, ZenOSS, OpenNMS, NAGIOS, Hyperic, Groundwork, Packet Design, Apparent Networks, Xangati, Gomez, Keynote, Brix Networks, Entuity, Opnet, Network Instruments, Fluke Networks, Alcatel-Lucent, Compuware, NetScout, NetQoS, Symantec, EMC

First things first, we have many of the same vendors from the Availability & Notification functional area of this Data Center Automation Blueprint in this category. Which probably begs the question, do we combine Availability & Notification with Performance & Capacity? I know in the OSS (not Open Source Software but telco-oriented Operational Support Systems) model they do this and call it “Service Assurance”. Another name could be Service Level Management, as the two monitoring-centric functions are about ensuring service levels are met…or do I simply call it Availability & Performance? I’ll come back to this at the end after I type up the players in this Performance & Capacity area:

But then, we have a slew of others that have been around for quite some time now…

And some innovative up-and-comers in some unique technology/approaches…

Real-Time Behavior/Pattern Analysis through Dynamic Thresholding

IP Traffic/Packet Flow Monitoring & Analysis

Open Source Software (OSS) vendors

Whew…that was more work than I expected to pull together and I’m not done yet…  Please throw into the comments who I’ve missed (I know there have to be a few).

The major challenge here is organizing and breaking down this functional area.  There are so many approaches to obtaining performance metrics from/for the data center.  Some of the techniques and perspectives include:

  • passive vs. active
  • agent vs. agent-less
  • in-line appliance vs. out-of-band appliance (e.g., span a port)
  • proprietary vs. leveraging infrastructure mgmt. capabilities (e.g., Cisco NetFlow)
  • outside the data center looking in vs. inside the data center itself
  • reactive troubleshooting vs. proactive/predictive

I’m going to need to have a part two (and maybe more) for this functional category breaking down the pros and cons of various approaches, which vendors do what, etc.  I also need to revisit that question from the top: do we combine this into a single “availability & performance” functional category?  For now, this first pass will have to do…


Dec 7 2007   6:52PM GMT

Data Center Virtualization Automation/Management is becoming very, very congested



Posted by: Ryan Shopp
DataCenter, BladeLogic, Cassatt, HP Software, IBM Tivoli, InfoVista, Symantec, BMC, Microsoft Windows, Virtualization, Netuitive, PlateSpin, Quest Software, Stratavia, Veeam, Vizioncore

I just saw this snippet from the 451 consulting group and WOW!  In December 2006 they were covering 6 players in the Virtualization Management arena, now there are over 60!

I have some reading to do it seems.  I don’t have the $$$ to pay for the 170 page report but will take the time to go review the websites and read articles about many of these vendors, then report back what I learn here on this blog.  Reading through the below list I recognize a number of them…but some are names I’ve never even heard of to this point.  My quick notes are mentioned next to the company name…kind of like the word association game “what is the first thing you think of when I say…”

The companies listed by the report that have a virtualization management play include:

3Leaf Systems - who?
Acronis - who?
Akorri - who?
Availigent - who?
Avocent - the ones who acquired LANdesk
BladeLogic - major player in DCA systems/application automation
Blue Lane Technologies - virtual patching appliance
BMC Software - one of the big 4 has something, not sure how deep or what
CA - one of the big 4 has something, not sure how deep or what
Cassatt - virtualization pure play with “green” positioning
Catbird - who?
CiRBA - monitoring product to help with server consolidation efforts
Cisco Systems - big guy with their ambitious Data Center 3.0 initiative
Citrix Systems - acquired XenSource post VMware IPO
CohesiveFT - who?
CollabNet - who?
Configuresoft - big but still growing systems & security mgmt player
Desktone - who?
DeviceVM - who?
Egenera - who?
eG Innovations - who?
Embotics - who?
Enigmatec - who?
Enomaly - who?
FastScale - who?
Hewlett-Packard - major player/move with Opsware acquisition
Hyperic - who?
IBM - one of the big 4 has something, not sure how deep or what
illumita - who?
InfoVista - not sure what they have in virtualization, maybe a performance monitoring for some virtual servers?
InovaWave - who?
Leostream - who?
Marathon Technologies - who?
Mendocino Software - who?
Microsoft - gorilla, who will have an impact in this space!
Netuitive - automated performance threshold monitoring, I assume they must do this for virtual servers to be included here.
Network Appliance - not sure
Nimsoft - application monitoring, been on my todo list to read more on them.
Novell - big guy, has some play here - not sure what
Onaro - who?
Pano Logic - who?
PlateSpin - known virtualization automation player I’ve talked about previously
Platform Computing - who?
Quest Software - database, application monitoring
Qumranet - who?
Red Hat - linux
Reflex Security - who?
RingCube - who?
Scalent Systems - known virtualization player with recent major OEM announcements
ScienceLogic - who?
SteelEye Technology - who?
Stratavia - Run Book Automation
Surgient - austin company, not sure what they have these days…need to look
SWsoft and Parallels - Macintosh ability to run Windows
Sychron - who?
Sun Microsystems - solaris and grid computing initiatives
Symantec - security and storage with some systems products they’ve acquired
ToutVirtual - who?
Univa UD - who?
Veeam Software - known virtualization player I’ve previously talked about
Virtual Iron - heard of them…haven’t looked at them yet though
Virtugo Software - who?
Vizioncore - known virtualization player I’ve previously talked about
VMLogix - heard of them…haven’t looked at them yet though
VMware - if you don’t know this name you must be dead, or at least not into technology or the stock market
XDS - who?
Xsigo - who?

Bottom line, I have a ton of reading to do!!!  I’ll start with the smaller guys and work my way up.  If you have any perspectives or insights please don’t hesitate to leave them in the comments section.


Dec 4 2007   10:04PM GMT

What are the Six Functional Areas of Data Center Automation



Posted by: Ryan Shopp
DataCenter, Alterpoint, BladeLogic, Cassatt, Integrien, IT Process Automation, HP Software, IBM Tivoli, InfoVista, BMC, Microsoft Windows, NetIQ, Netuitive, Opalis, Optinuity, PlateSpin, RealOps, Scalent, Stratavia, Veeam, Vizioncore

Alright, here is my first pass at a graphic I’m attempting to build that will capture the spirit of my previous posts (this is a work still in progress as previously mentioned);

I’m attempting to come up with a 30,000 foot reference model (functionality focused) for when you’re building out a data center’s software automation architecture.

The yellow areas are the 6 current areas I’ve functionally identified. The tricky part is based on the complexities of each category in the Data Center Infrastructure (e.g., Network vs. System), many of the functional areas require technical depth and audience-specific focus (e.g., network engineers vs. SAP administrators). The arrows are trying to capture that.

I know this still needs work but this is an evolution, and I only have a little time each week to currently work on it during these blog posts.

Below the graphic are some current vendors, by function, that have product(s) in each function and that I’ve mentioned during previous blog postings so far.

data-center-automation-reference-model-v1.jpg

  • Configuration & Change: BMC (Marimba), CA, EMC (Voyence), HP (Opsware), IBM, BladeLogic, Cassatt, AlterPoint, PlateSpin, Scalent, Veeam, Vizioncore
  • Security & Protection: Symantec, IBM, EMC, McAfee, nCircle, Lumension, ArcSight
  • Performance & Capacity: BMC, CA, EMC, HP, IBM, Quest, InfoVista
  • Availability & Notification: BMC, CA, EMC, HP, IBM, Microsoft, Quest, Integrien, Netuitive, NetIQ
  • Process Orchestration: BMC (RealOps), HP (iConclude), Opalis, Optinuity, NetIQ, Stratavia
  • Resource Reconciliation: Symantec, IBM, HP, BMC, EMC

I know I’ve missed many and also it would probably be helpful to not simply mention the company but also the product name but that will have to wait until another time.


Nov 2 2007   3:11PM GMT

Why not AlterPoint, NCCM continues to consolidate?



Posted by: Ryan Shopp
Network configuration, NCCM, Alterpoint, DataCenter, HP Software, IBM Tivoli, InfoVista, BMC, Microsoft Windows, EMC, BladeLogic, CA

Now let me be clear here. I’m very biased on this topic. Full disclosure, I spent almost 4 years of “blood, sweat and tears” at AlterPoint from its version 1.0, no revenue days through its last leadership transition. Back in Summer of 2006 we had a new leadership team come in with new blood/energy that really invigorated things. This was needed since the company, like Voyence, had been around since early 2000 and in the world of start-ups you work lots of 80 hour plus weeks that can wear and tear on a person.

What I’m perplexed about is that over the past 30 days two other Network Configuration/Change Management vendors have been consolidated by major players: Voyence by EMC and Emprisa by BMC. So why not AlterPoint? That is what I’ve been pondering over the last couple of days.  Time to jump on my soapbox for a minute or two…

AlterPoint has a marquee customer list that includes Citigroup, HSBC, Microsoft, Yahoo, Hertz, TJX, Walgreen, Cingular (now AT&T Wireless) and numerous others, a list that from my perspective easily eclipses what Voyence or Emprisa had captured.

Additionally, AlterPoint is diversified in their offerings. They recently announced specific new applications that leverage the core NCCM technology for Compliance & Analytics. Finally, talk about being a good corporate citizen - they have led the way for a commercial IT management vendor by taking a portion of their revenue producing product and productizing it for open source (called ziptie). So they have a thriving customer list, are not a “one trick pony” and are giving back/building a strong community behind their capabilities. What’s not to love :)

So if we take a quick look at the landscape, that leaves IBM, Symantec, maybe CA (they had an NCCM type module included in the Aprisma acquisition) and maybe Microsoft (they recently OEM’ed InfoVista which I discussed in my last posting) with a big hole! So in my opinion the best NCCM business/product is still out there on the market, so let the bidding begin. :) The longer any of those players wait, the further behind they will get in delivering end-to-end use cases for their customers that require the capabilities of NCCM.

Now my hat must go off to Opsware, who was the first to see and execute on the end-to-end configuration vision for data centers. They acquired Rendition back in late 2004 and once they brought things together their valuation continued to increase, which likely assisted with the recent acquisition of Opsware by HP.

Bottom line here, if you’re not currently leveraging an NCCM product, either commercial or open source, let me say they are amazing products that help save time, money and frustration for network engineering and operations. These automation tools are critical to the data center and beyond and complement similar automation tools on the applications/systems side (those offered by BladeLogic, Opsware, etc). More on those automation players in upcoming posts. I would also recommend taking time to subscribe to or at least check out the AlterPoint sponsored blog highlighting key evolutions and perspectives in Network Management.

As noted in my personal about section, these are my own opinions, based on personal beliefs and public knowledge. I left AlterPoint back in September 2006 for some new opportunities but continue to be an avid fan and cheerleader of the NCCM space, all the vendors (competition is a good thing) and especially my friends still over at AlterPoint!


Oct 31 2007   8:12PM GMT

Activities in Application, System & Network Performance Monitoring



Posted by: Ryan Shopp
Microsoft Windows, EMC, Symantec, BMC, HP Software, CA, IBM Tivoli, Accellent, InfoVista, Solarwinds, Systems monitoring, Network monitoring, Application monitoring, Performance management, Quest Software, Networking, DataCenter

Big item to post about right out of the gate!  We all are familiar with the “Performance Management” sector within the Data Center.  Quick couple-sentence summary: software that automates the collection and identification of potential performance bottlenecks within the data center.  That means real-time delays and conditions that are affecting productivity, as well as analytics that leverage historically collected data to help predict a potential performance concern before it happens.

Now there are a TON of large players in this space which we will review in more detail in upcoming posts (e.g., BMC, CA, HP, IBM, EMC, Symantec, Quest Software, Microsoft), but today I want to hit on a couple of vendors you should consider if you’re tired of working with your current vendor (most likely one of the big names above).

InfoVista is one of the last pure-play companies that provide solutions for automating Data Center Performance Management/Monitoring.  Yesterday, they finally announced a move (after years of OEM’ing various products) to round out their solution on the application performance management side.  I’ve talked to a number of large global enterprise/telecom customers who speak the gospel about the quality and capabilities of their products.  They’ve been known in the past for their network and systems centric capabilities, but with the acquisition of Accellent they now own the application monitoring technology.  Now, let’s be clear - their solution is designed for large, large Enterprises and/or Telecommunication companies.  If you’re not looking to do a major global deployment spanning large data centers and/or vast numbers of remote offices, this solution may be overkill for you.   If that is the case I would recommend taking a look at another company.

Solarwinds is making some major investments in their offerings.  If you’re a small or medium business, or wishing to manage a portion (specific group/organization) within a larger enterprise, then take a look at their Orion product line.  You get a major bang for your buck (many times 75% of the functionality you use from one of the big guys at a price point most likely less than your annual maintenance contract).  The other beautiful thing is you can download and evaluate the product in all its glory without ever talking to a single sales person.  Also, they have a very active community behind their products including a great blog, Geek Speak by my friend Josh Stephens, that provides very useful insights and perspective on leveraging their products.