Posted by: Matt Stansberry
Theo Schlossnagle, CEO and founder of managed services and hosting provider OmniTI, hopes his company’s new tool, Reconnoiter, will solve some of the common complaints about open source systems management tools.
OmniTI manages 15 data centers with heterogeneous architectures for multiple clients, and Schlossnagle said he’s used every tool under the sun: Zenoss, Tivoli, OpenView, Nagios, Cacti and more.
The recurring problems Schlossnagle found with open source management tools (scaling issues, repeated configuration effort, and the need for powerful server infrastructure) frustrated his team to the point where OmniTI built its own toolset for monitoring metrics, graphing data for capacity planning, and post-mortem analysis of problems.
The tool uses an agent-based system. Users install a Noit Daemon in each important portion of their infrastructure and configure it to monitor different services. The software is written in C, and plug-ins are written in C or Lua. Reconnoiter uses SNMP, ICMP and HTTP, among other protocols.
The company is offering it for free under a BSD license on its website.
According to Schlossnagle, challenges using the open source management software Nagios were a major driver for developing the Reconnoiter tool.
“Nagios is quite inefficient in the way it collects data,” Schlossnagle said. “It follows the age-old Unix philosophy that you use the right tool for each job. This means that Nagios ends up launching thousands of small applications to test things. While the lots of little tools philosophy is often convenient, it heavily conflicts with high performance, low latency requirements. Often purpose built tools need to take over in that role — that is what Reconnoiter is.
“I have to buy a big, expensive box to run Nagios — I don’t with the Reconnoiter agents,” Schlossnagle continued. “Nagios does fault detection, but not trending — which means I have to double my efforts by configuring both Nagios and another tool.”
Schlossnagle also said Nagios’ monitoring was centralized, so it was difficult to add checks in the field. Configurations were hard to keep track of as new services and machines were deployed.
Reconnoiter polls systems to see if they’re healthy, much as Nagios does, but of the open source-commercial hybrid products available, Schlossnagle said it is most similar to Hyperic.
“Hyperic takes a more holistic view of monitoring in that it includes both trending and fault detection. Reconnoiter takes this approach as well.”
The Reconnoiter tool is also designed to help IT managers analyze Web traffic events in a very granular way, even ones that happened in the distant past. “RRDTool is specifically designed to retain data within size constraints. You define how long you wish to retain data on various granularities,” Schlossnagle said. “In most systems that use rrdtool (like Cacti) recent data (like one week) is retained on five minute granularity, while data older than a week is reduced to a granularity of one hour. So, if you want to compare a spike today to one from six months ago, it is very likely that you have a defeating skew: 288 five-minute intervals for ‘today’ and four six-hour intervals for the day in question six months back.”
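The interval counts in that comparison follow directly from the step size. As a rough illustration (the helper name `points_per_day` is made up for this sketch and is not part of RRDTool, Cacti or Reconnoiter), the arithmetic looks like this:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def points_per_day(step_seconds: int) -> int:
    """Return how many samples cover one day at the given step size."""
    return SECONDS_PER_DAY // step_seconds


# Granularities mentioned in the article, expressed in seconds.
five_minute = points_per_day(5 * 60)
one_hour = points_per_day(60 * 60)
six_hour = points_per_day(6 * 60 * 60)

print(five_minute, one_hour, six_hour)  # 288 24 4
```

A day stored at five-minute steps has 72 times as many points as the same day stored at six-hour steps, which is the skew Schlossnagle describes.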
Reconnoiter approaches this by taking the stance that storage is cheap. “There is no excuse for throwing any of that data away. I’ll go buy a terabyte of disk. I’m not going to search back 12 months very often, so it doesn’t need to be fast, but I need to be able to do it.”
According to Schlossnagle, watching the spike happen gives you a better understanding of how traffic patterns shift during a major event, for example, a website being picked up by a large social media site like Digg.
“If I’m looking at that spike on my systems at thirty second granularity, I can tell you how fast that spike happened. If I use the RRD tool with Nagios and Cacti, I can only see that day at that level of granularity for about six hours.”
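To see why coarse roll-ups hide how fast a spike happened, here is a minimal sketch that averages a thirty-second series into larger buckets. The traffic numbers are invented for illustration, and `consolidate` is a hypothetical helper that only approximates an averaging roll-up. It is not RRDTool’s or Reconnoiter’s actual code.

```python
from statistics import mean


def consolidate(samples, factor):
    """Average each run of `factor` consecutive samples into one point."""
    return [mean(samples[i:i + factor]) for i in range(0, len(samples), factor)]


# Hypothetical traffic for one hour at 30-second granularity (120 samples):
# 100 req/s baseline, a two-minute ramp to 1000 req/s, three minutes at the
# peak, then back to baseline.
raw = [100] * 60 + [325, 550, 775, 1000] + [1000] * 6 + [100] * 50

five_minute = consolidate(raw, 10)   # 10 x 30 s = 5 minutes per point
hourly = consolidate(raw, 120)       # 120 x 30 s = 1 hour per point

print("raw peak:        ", max(raw))
print("five-minute peak:", max(five_minute))
print("hourly peak:     ", max(hourly))
```

In this made-up series the raw data shows the ramp sample by sample, the five-minute view folds the whole ramp into a single point, and the hourly view flattens the event into one slightly elevated average.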
The tool can help IT managers plan capacity for traffic spikes and compare current events with past ones.
“Our primary goal was to make our lives easier. This tool replaces an enormous amount of headache at OmniTI,” Schlossnagle said. “Making it a successful open source tool makes it even easier. One of the short term goals is to have it adopted in other places and get the tool deployed in large environments.”
Today, OmniTI is slowly introducing Reconnoiter to its managed services clients. The company is currently monitoring tens of thousands of metrics across five data centers, approaching a terabyte of metric data.
Unlike Hyperic or Zenoss, OmniTI does not plan to develop a commercial version at this time. “An open source approach with a strong community is better,” Schlossnagle said. “I don’t want to be in the tools business. If a company wants to give us money for support and indemnify them with IP rights, we won’t turn away that money.
“The key difference being the product we deliver, support and indemnify, would be the same product, not the one that has special neat features that paying customers get.”
You can give Reconnoiter a test run at labs.omniti.com.