Opnet archives - Adventures in Data Center Automation

Jun 23 2008   3:00PM GMT

So let’s talk a little about Traffic Flow Reporting and Analysis



Posted by: Ryan Shopp
Network monitoring, Performance management, Alcatel-Lucent, NetScout, DataCenter, Application monitoring, SolarWinds, InfoVista, Accellent, HP Software, NetQoS, Compuware, Opnet, Xangati, Packet Design

Next up, I plan to dig into this sector a little deeper (as always from a purely data center centric perspective - aka no End-User Monitoring that requires a desktop agent).

The priority for these products is to provide an end-to-end service/application perspective on traffic performance and capacity. The goals: help quickly troubleshoot from an application or end-point perspective OR better understand what traffic is crossing the infrastructure, where, and at what levels. All this from a network-centric control point (no loading of agents on a server or client - since the network team doesn’t own the responsibility for those).

So on the surface I see two main categories (each has subcategories that I’ll dig into during follow-up posts)

Flow Reporting-centric (these vendors gather Cisco NetFlow, J-flow, sFlow from infrastructure agents and report in various ways)

  • NetScout, SolarWinds, CA eHealth, NetQoS, Mazu Networks, Xangati, InfoVista, Opnet, Lancope, Packet Design, Q1 Labs, Alcatel-Lucent VitalNet, HP Performance Insight - to name a few

Flow Self-Collection & Reporting (these vendors span/tap actual traffic flows and report in various ways)

  • NetQoS, Mazu Networks, InfoVista (through acquisition of Accellent), Lancope, CA Wily, Q1 Labs, Compuware - to name a few

I quickly notice now that many of the vendors actually support both - which I assume is about flexibility, as some customers don’t have NetFlow-type capabilities enabled or don’t wish to enable them for a variety of reasons.

So my first set of questions/experiences I’m now reading/researching about are:

1) What are the key benefits of going the self-collection route over the reporting-only route? Unique metrics? Scalability? Limitations around NetFlow (e.g., performance)?

2) When it comes to reporting only, using NetFlow etc. - what metrics are being used these days?

I remember first integrating and being able to report on RMON2 probes and early Cisco NetFlow data back in 2001 within the Lucent VitalNet product…so where are things 6 years later now that NetFlow is much more pervasive and I’m sure improved.

My assumptions on some of these are as follows (vendors & users, please leave comments to help educate me for my follow-up posts):

When it comes to reporting, there are historical/capacity-centric reports & there are real-time/troubleshooting-centric views. My assumption (again, currently an assumption..I haven’t read too much on this topic yet) is that most of the reporting-centric vendors (that don’t also offer their own passive flow monitoring capability) are focused more on those historical/capacity reports (e.g., eHealth, SolarWinds, InfoVista). These reports show how much data is going where, and what type of data it is, over a day/week/month etc. Once this data is archived, they slice & dice it in a variety of ways. But basically it’s about looking at it for trends over time.
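To make the historical/capacity idea a little more concrete, here is a minimal sketch of that kind of slicing and dicing. It is not how any of the vendors above do it; the record layout, the sample addresses and the function names (`daily_totals`, `top_talkers`) are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, simplified flow records: (timestamp, source, destination, application, bytes).
# Real NetFlow/sFlow exports carry many more fields; this layout is illustrative only.
FLOWS = [
    (datetime(2008, 6, 2, 9, 15), "10.0.1.5", "10.0.9.20", "http", 120_000),
    (datetime(2008, 6, 2, 14, 40), "10.0.1.5", "10.0.9.20", "http", 80_000),
    (datetime(2008, 6, 2, 14, 45), "10.0.2.7", "10.0.9.30", "sql", 300_000),
    (datetime(2008, 6, 3, 10, 5), "10.0.1.5", "10.0.9.20", "http", 50_000),
]


def daily_totals(flows):
    """Sum bytes per (day, application): the 'what type of data, over what period' view."""
    totals = defaultdict(int)
    for ts, _src, _dst, app, nbytes in flows:
        totals[(ts.date().isoformat(), app)] += nbytes
    return dict(totals)


def top_talkers(flows, n=3):
    """Rank (source, destination) pairs by total bytes: the 'how much is going where' view."""
    pairs = defaultdict(int)
    for _ts, src, dst, _app, nbytes in flows:
        pairs[(src, dst)] += nbytes
    return sorted(pairs.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Swapping the day key for a week or month key gives the longer trend views; the archived records stay the same and only the grouping changes.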

Now, when it comes to real-time, since so much data is coming in so quickly there needs to be extra intelligence/automation helping out - building a “what looks normal” model and then focusing on identifying and alerting someone when something “odd” is noted. Of course, they need to store/report on much of the same data as the historic/capacity-centric products as they build credibility and trust with their users.
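As a rough illustration of a “what looks normal” model, here is a small sketch that keeps a rolling window of recent samples and flags anything far outside it. The rolling mean and standard deviation are my stand-in technique, not something I know any vendor to use, and the class name and its parameters (`window`, `tolerance`, `min_samples`) are hypothetical.

```python
from collections import deque
from statistics import mean, stdev


class BaselineDetector:
    """Learns a rolling baseline and flags samples that deviate strongly from it."""

    def __init__(self, window=20, tolerance=3.0, min_samples=5):
        self.history = deque(maxlen=window)  # recent "normal" samples
        self.tolerance = tolerance           # allowed deviation, in standard deviations
        self.min_samples = min_samples       # samples required before judging anything

    def observe(self, value):
        """Return True if value looks odd relative to recent history."""
        odd = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            # Guard against a perfectly flat baseline (sigma == 0).
            odd = abs(value - mu) > self.tolerance * max(sigma, 1e-9)
        if not odd:
            # Only normal samples feed the baseline, so one spike does not distort it.
            self.history.append(value)
        return odd
```

Keeping odd samples out of the baseline is a design choice with a cost: a permanent shift in traffic levels would keep alerting until someone resets the model, which is part of why I assume the real products need considerably more intelligence than this.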

So when it comes down to it..much of the same data is being used for 2 unique users…one focused on planning improvements and the other focused on quickly resolving issues. So now that I’ve finished writing this post a better way to probably organize the field of play is not by technology (NetFlow vs. Self-Collect) but by usage. I’ll read some more and do that next time.

Another angle to ponder on this topic will be around the WAN acceleration/optimization vendors…but again, for another day.

May 20 2008   10:47PM GMT

Performance and Availability vs. Analytics - Part 4 of 5



Posted by: Ryan Shopp
DataCenter, Managed Objects, Integrien, Opnet, Firescope

Sorry for the delay but family time called as we were blessed with a baby boy a couple weeks ago.  So back on track; in part one we hit data collection, part two talked about applying analytics and business/service mapping and part three we hit on evolving the Data Center Automation Blueprint from Performance & Availability to Service Assurance.  So what does that mean for analytics?

Well, here is where it gets tricky.  I believe there are two types of analytics that are sometimes being confused or blended together…

Type 1: Per functional category - meaning software automation that uses algorithms, automated analysis, etc. focused on one of the 3 functional categories (e.g., Performance & Availability, Configuration & Change, Security & Protection).

Type 2: Cross-functional category - like Process Orchestration & Resource Reconciliation, you have a roll-up aggregated view of metrics that are mapped to the business (beyond IT-specific metrics).  This is also commonly called Business Service Management by most definitions.

Some quick examples….companies like Integrien, Opnet fall into type 1, while companies like Managed Objects, Firescope map closer to type 2.  Now this all gets very confusing as there are overlaps where vendors who do mostly type 1 analytics and some type 2 analytics claim both and even call themselves BSM vendors…meanwhile, the same occurs where mostly type 2 analytics (aka BSM) also claim to do some type 1.  So I’m not a BSM guru but I do exchange blogs/emails with some and would love to hear them chime in on this thread.  Based on this feedback and some further reading over at my favorite BSM blog, my next post will wrap up this series and I’ll update the Data Center Automation Blueprint.


Apr 21 2008   3:11PM GMT

Performance and Availability vs. Analytics - Part 2 of ?



Posted by: Ryan Shopp
BMC, DataCenter, Analytics, HP Software, Netuitive, Integrien, NetQoS, Opnet, Indicative, eg innovations

So in part 1 we talked through the collection of performance/capacity/availability data. Next up is focused on where innovations using this collected data are taking us.

The next level of Performance & Availability I previously mentioned is coming from a variety of companies doing cross-metric analysis or even automated behavioral analytics. These vendors typically classify themselves as Service Level Management, some type of Business Service Management, or Analytics. They either leverage a variety of data collection entities or they themselves offer capabilities that span multiple sources to elevate and/or automate results in the hope of proactive (even predictive) identification of issues with minimal (striving for zero) false positives. Here are some more thoughts on each of these areas:

  • Service Level Agreement vendors seem to focus on leveraging a variety of data sources/metrics and normalizing them into very detailed quality of service/performance agreements between a service provider and their customers (in some situations the service provider is the internal IT department themselves).
  • Business Service Management vendors in the realm of performance/capacity/availability seem to focus on the mapping of each business service (e.g., application(s) and the infrastructure that supports those application(s) from an end-to-end perspective). Then, if any component in the mapped bundle shows signs of trouble, an alert is raised for proactive resolution.  NOTE:  BSM is a very broad term - I’m focusing it down here on just this functional area, I’m not talking about a comprehensive dashboard spanning all functional areas, service desks etc.
  • Real-time Analytic vendors seem to leverage a variety of time-series metrics from various collection sources mapped together appropriately (like BSM), then using behavioral algorithms they dynamically determine normal behavior. If something deviates from that behavior, an alarm is raised in real-time (now we’re getting predictive).
  • Historical Analytics or modeling/simulation vendors seem to leverage a variety of data sources coupled with other cross-functional details (e.g., CMDB, configuration settings) to establish a model and expected behavior. Then you can tweak, tune or even re-design to see impact of potential changes, upgrades, etc.
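The business-service mapping idea in the BSM bullet can be sketched in a few lines. This is only a toy under my own assumptions: the service names, component names and the `impacted_services` function are hypothetical, and real products discover and maintain these maps rather than hard-coding them.

```python
# Hypothetical service map: each business service lists the components it depends on.
SERVICE_MAP = {
    "online-ordering": ["web-01", "app-01", "db-01", "core-switch"],
    "payroll": ["app-02", "db-01"],
    "intranet": ["web-02"],
}


def impacted_services(service_map, troubled_components):
    """Return {service: [troubled components]} for every service that touches one."""
    troubled = set(troubled_components)
    impact = {}
    for service, components in service_map.items():
        hits = [c for c in components if c in troubled]
        if hits:
            impact[service] = hits
    return impact


# A single database showing signs of trouble surfaces as two business-level alerts.
print(impacted_services(SERVICE_MAP, ["db-01"]))
```

The point of the roll-up is that the alert names the business service at risk, not just the component that tripped.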

We could probably come up with better names for these higher-level performance/capacity/availability areas, but Service Level Management, Business Service Management and Performance Analytics are the ones being marketed today.

One area of data collection and reporting that does continue to innovate is the end-user, passive traffic flow perspective. This first popped up on the scene back in the late 1990s and since then there seems to have been a major resurgence in vendors focusing on specific, mission-critical applications. Since these agents typically reside and monitor from the desktop or mobile device perspective I’ve placed them beyond the scope and control of Data Center Automation. Some vendors are doing the end-to-end monitoring (as mentioned before) from an appliance in the data center making some TCP/IP assumptions (e.g., NetQoS, CA Wily).

So now we’ve discussed Performance/Capacity/Availability management and how it also has analytics occurring within that functional silo. So what does that mean to the Data Center Automation Blueprint from my perspective? Stay tuned for part 3.


Apr 17 2008   9:58PM GMT

Performance and Availability Management vs. Analytics - Part 1 of ?



Posted by: Ryan Shopp
Network monitoring, Performance management, BMC, NetIQ, Alcatel-Lucent, NetScout, Analytics, CA, Systems monitoring, Application monitoring, SolarWinds, InfoVista, IBM Tivoli, HP Software, Quest Software, Netuitive, Integrien, NetQoS, Compuware, Fluke Networks, Network Instruments, Opnet, Entuity, Brix Networks, Keynote, Gomez, Xangati, Apparent Networks, Packet Design, Groundwork, Hyperic, Nagios, OpenNMS, ZenOSS, Firescope, Indicative, DCAB, eg innovations, cittio, nimsoft

I’ve had an opportunity to be briefed over the past couple months by a number of current Data Center Automation Blueprint’s Performance & Availability vendors (e.g., CITTIO, eG Innovations, InfoVista, Integrien, Nimsoft).  With that and some further research I think I’m ready to take another pass at this area of the blueprint.

First up, all these vendors use a variety of techniques to collect a variety of data from as many points of view as possible.

  • Their own server agents that collect data about systems, services, applications, databases, etc and then aggregate back to a centralized console
  • Agent-less centralized consoles that leverage infrastructure standard communications protocols (e.g., SNMP, RPC, ODBC, WMI, SSH, TCP, UDP, HTTP) to query or connect remotely to collect data from networks, systems, services, applications, databases, etc.
  • Passive traffic flow collectors (which can be agents or appliances) that are either in-line with the traffic flows or receive an exact copy of all traffic flows traversing a network connection (e.g., switch port uplink) through hardware vendor capabilities (e.g., spanning)

These data collection points can be statistics about a specific IT infrastructure resource: physical devices, virtual devices, physical connections, virtual connections or resources running on physical or virtual devices like services, processes, applications, databases, etc.

Or the data collection points can be traffic flows or end-to-end specifics including passive traffic flows, synthetic transactions or even something as simple as pinging from remote points.

Metrics that are captured typically revolve around throughput, errors, utilization, latency, up/down status, etc. (there are way too many to mention here).

After saying all this, there is a list a mile long of vendors (a number already noted on the DCAB) that capture these predominantly time-series-oriented data points about performance, capacity and availability using any/all of these methods or vantage points (I know, passive traffic flows are not time-series data but patterns/usage/performance etc. can be determined from them).

So, with all that data, what most of these vendors offer are two primary types of functionality: 1) a variety of graphical reports and 2) metric thresholding capabilities that produce a list of outstanding issues/alerts/alarms/events/concerns (whatever you want to call them).
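For anyone who hasn’t lived with one of these products, metric thresholding boils down to something like the sketch below. The metric names, limits, severities and the `evaluate` function are all hypothetical choices of mine, not any vendor’s defaults.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str      # name of the collected metric
    limit: float     # a value above this limit raises an alert
    severity: str    # label attached to the resulting alert


# Hypothetical static thresholds; real products ship far larger, tunable sets.
THRESHOLDS = [
    Threshold("cpu_utilization_pct", 90.0, "critical"),
    Threshold("link_utilization_pct", 80.0, "warning"),
    Threshold("latency_ms", 250.0, "warning"),
]


def evaluate(samples, thresholds):
    """samples: {resource: {metric: value}} -> list of alerts, one per breached threshold."""
    alerts = []
    for resource, metrics in sorted(samples.items()):
        for t in thresholds:
            value = metrics.get(t.metric)
            if value is not None and value > t.limit:
                alerts.append({
                    "resource": resource,
                    "metric": t.metric,
                    "value": value,
                    "severity": t.severity,
                })
    return alerts
```

Feeding it the latest poll from each collector produces exactly the kind of outstanding-issues list described above, which is also why I consider this layer a commodity: the limits are fixed numbers someone has to pick and maintain by hand.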

Ok, so why did I organize and point all this out? So I can draw a line around where most of the innovation from my perspective is occurring. The above is for the most part in my eyes a commodity these days. Most companies have had collection/reporting/thresholding capabilities spanning multiple technology silos since pretty close to the start of enterprise networking. The reports continue to get fancier, the number of data sources a single product collects from continues to expand, etc.  Another sign of commoditization is the variety of business models offering these products: open source, managed service providers, internet-distributed products, appliance deployment models and indirect sales forces, large enterprise direct sales forces, completely flexible frameworks for service providers to basically “build their own,” etc.

For the most part, where the majority of technical innovation is occurring these days is the next layer above this data collection, reporting and alerting. Now let me say this, yes…there is some great innovation still occurring in the data collection realm (e.g., Xangati offering real-time NetFlow down to a user level, Packet Design monitoring routing messages, NetQoS leveraging advanced TCP/IP theory to analyze where end-to-end bottlenecks are occurring). But, for the most part these new data sources are being used to augment or replace currently deployed data sources in an attempt to see things from either as many vantage points or the best vantage points to avoid surprises within their unique enterprise IT environment.

So where is the serious innovation coming from…stay tuned for part 2.


Mar 5 2008   7:59PM GMT

Top Enterprise Management Tools vs. Data Center Automation Blueprint



Posted by: Ryan Shopp
Network monitoring, Performance management, BMC, DataCenter, Networkingchannel, Analytics, CA, Systems monitoring, CMDB, Application monitoring, InfoVista, IBM Tivoli, HP Software, Network Configuration, RealOps, RBA, Run Book Automation, IT Process Automation, Netuitive, NetQoS, Opnet, DCAB, Tideway

I was doing some “light” reading this morning and came upon this recent article:  Top 10 Enterprise Management Tools

It’s focused on Complete Enterprise Management, not specifically focused on the Data Center so I thought I would summarize and then compare/contrast/discuss:

  • Network Fault & Performance: CA eHealth & Spectrum
  • Consolidated Event Management: IBM Tivoli Netcool
  • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
  • Application Discovery Mapping: Tideway Foundation
  • Business Intelligence: Cognos
  • ITSM Workflow, CMDB and Service Desk: BMC Remedy ITSM and Atrium
  • Network & Systems Configuration Management: HP Automation (formerly Opsware SAS & NAS)
  • Process Automation: BMC RunBook Automation

Since it isn’t data center centric, it’s light on automated management for applications & databases.  It also chooses to stay away from the very congested and sometimes confusing security/protection market.

Next up, I thought  it would be fun to do a quick mapping to the Data Center Automation Blueprint.

  • Network Fault & Performance, Consolidated Event Management, Service Impact Monitoring = Availability & Performance
  • Application Discovery Mapping, CMDB = IT Resource Reconciliation
  • Business Intelligence = Analytics (maybe…Analytics is still a work in progress…need to figure out this vs. BSM etc)
  • ITSM Workflow, Service Desk = outside of DCAB listed as Manual Task Orchestration

I was surprised not to see an End-User Application Performance Monitoring category.  These products either do their duty from passive agents on the endpoint or from data center appliances using slick algorithms, TCP/IP theory, etc.  Maybe that could have indirectly been rolled under Network Fault & Performance as CA acquired Wily, which offers that.  The other one missing was more towards Capacity Planning and Trending Analytics, either based off historical data like what Opnet offers or from real-time data patterns like what Netuitive offers.

Needless to say I found it a really nice write-up and summary of those products/offerings.  The only thing I struggle with is that all of the big 4 (BMC, CA, HP, IBM) are represented in this mix.  Which means you will have 4 sales guys all continuously battling it out to grab more land.  This may be good from a cost competition standpoint, but it’s a real fiasco for making sure all parts are playing nicely with each other or simply managing those vendor relationships.  Bottom line, you’re always going to have at least one of the big 4 in there as they continue to snap up the innovative smaller companies/technologies to enhance their portfolio and offer differentiation.  So I’d typically recommend a strategy where you pick 2 of the big 4 and keep them in check versus each other while continually looking for those innovative start-ups to fill in the gaps.  Here is an example of how you could do this using the categories in the original article.

  • Network Fault & Performance: HP Network Node Manager, Operations Manager, Performance Insight
  • Consolidated Event Management: IBM Tivoli Netcool
  • Service Impact Monitoring: IBM Tivoli Business Service Manager & Service Level Advisor
  • Application Discovery Mapping: IBM Tivoli Application Dependency Discovery Manager
  • Business Intelligence: Cognos (which IBM recently acquired)
  • ITSM Workflow, CMDB and Service Desk: HP AssetCenter (formerly Peregrine)
  • Network & Systems Configuration Management: HP Data Center Automation (formerly Opsware SAS & NAS)
  • Process Automation: HP Operations Orchestration (formerly iConclude that Opsware acquired)

Or, if you want to completely rebel and go the non-big 4 route, take a look at the above mappings to the DCAB and look for a name that’s not big-4.  Example:  Network Fault & Performance: InfoVista or NetQoS


Feb 28 2008   4:55PM GMT

Analytics; What are the top capabilities?



Posted by: Ryan Shopp
BMC, Configuresoft, DataCenter, Analytics, NCCM, Alterpoint, Netuitive, Integrien, Opnet, DCAB

Recently, I made some adjustments to the Data Center Automation Blueprint where we combined 2 original areas and added a new one for Analytics.  Steve Henning just posted a great guest blog entry over at Doug McClure’s blog called “Why Real Time Analytics?” I personally liked the analogy to TQM and the manufacturing industry.

He also recently jotted down some of his thoughts on capabilities within the comments section for the posting “Data Center Automation Blueprint; now includes virtualization thoughts.”

Here are some of my initial thoughts that I will take another pass at cleaning up in the next week or two.  I wanted to get this posted in a timely manner to hopefully inspire some discussions:

1) Inter-domain Integrations - Steve called it “Cross Silo” in his comment post. But the analytics solutions need to have a data model and API/SDK that is not specific to one domain (e.g., databases, Windows systems, network devices, WebSphere applications).  To perform holistic analysis you need more than one point of view.

2) Pattern Logic Automation- Automation through algorithms, rules etc that work to mimic the human problem solving / analysis process.

3) “Advanced” Graphical Visualization - more than summary graphics, pie charts etc…what I’m thinking of here is something I can look at that helps me see the pattern or some unique situation/trend affecting the business (e.g., correlation of trouble ticket and performance monitoring details).  A better name than “advanced” is needed here for sure.
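To show what I mean by a data model that is not specific to one domain, and by correlating a trouble ticket with performance details, here is a minimal sketch. Everything in it is an assumption on my part: the `Observation` fields, the domain labels and the `correlate` function are hypothetical, and a time window is only the crudest possible form of correlation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Observation:
    """Domain-neutral record: any silo (network, database, service desk) can emit one."""
    timestamp: datetime
    domain: str     # e.g. "network", "database", "service-desk"
    resource: str
    kind: str       # e.g. "latency_ms", "ticket_opened"
    value: float


def correlate(observations, anchor_kind, window=timedelta(minutes=10)):
    """For each anchor observation, collect other-domain observations close to it in time."""
    anchors = [o for o in observations if o.kind == anchor_kind]
    related = {}
    for a in anchors:
        related[a] = [
            o for o in observations
            if o.domain != a.domain and abs(o.timestamp - a.timestamp) <= window
        ]
    return related
```

Because every silo writes the same record shape, asking “what else was happening around the time this ticket was opened?” becomes one query instead of one integration per tool.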

So far the vendors I’m thinking of when I’m creating the above functionality list (as noted in the DCAB) include;

Who else do we believe should be in this analytics bucket? Thoughts on these 3 capabilities?  What are some others?


Dec 28 2007   11:31PM GMT

Digging into each of these 6 functional areas: Performance and Capacity



Posted by: Ryan Shopp
Network monitoring, Performance management, Symantec, BMC, EMC, NetIQ, Alcatel-Lucent, NetScout, DataCenter, CA, OSS, Systems monitoring, InfoVista, IBM Tivoli, HP Software, Quest Software, Netuitive, Integrien, NetQoS, Compuware, Fluke Networks, Network Instruments, Opnet, Entuity, Brix Networks, Keynote, Gomez, Xangati, Apparent Networks, Packet Design, Groundwork, Hyperic, Nagios, OpenNMS, ZenOSS, Zabbix

First things first, we have many of the same vendors from the Availability & Notification functional area of this Data Center Automation Blueprint in this category. Which probably begs the question, do we combine Availability & Notification with Performance & Capacity? I know in the OSS (not Open Source Software but telco-oriented Operational Support Systems) model they do this and call it “Service Assurance”. Another name could be Service Level Management, as the two monitoring-centric functions are about ensuring service levels are met…or do I simply call it Availability & Performance? I’ll come back to this at the end after I type up the players in this Performance & Capacity area:

But then, we have a slew of others that have been around for quite some time now…

And some innovative up-and-comers in some unique technology/approaches…

Real-Time Behavior/Pattern Analysis through Dynamic Thresholding

IP Traffic/Packet Flow Monitoring & Analysis

Open Source Software (OSS) vendors

Whew..that was more work than I expected to pull together and I’m not done yet…  Please throw into the comments who I’ve missed (I know there have to be a few).

The major challenge here is organizing and breaking down this functional area.  There are so many approaches to obtain performance metrics from/for the data center.  Some of the techniques and perspectives include:

  • passive vs. active
  • agent vs. agent-less
  • in-line appliance vs. out-of-band appliance (e.g., span a port)
  • proprietary vs. leveraging infrastructure mgmt. capabilities (e.g., Cisco NetFlow)
  • outside the data center looking in vs. inside the data center itself.
  • Reactive troubleshooting vs. Proactive Predictive

I’m going to need to have a part two (and maybe more) for this functional category, breaking down the pros and cons of various approaches, which vendors do what, etc.  I also need to revisit that question from the top: do we combine this into a single “availability & performance” functional category???  For now, this first pass will have to do…