Linux Done Right archives - Enterprise Linux Log

Dec 18 2007   2:44PM GMT

Splunk: Or how I learned to stop worrying and love log files



Posted by: admin
disaster recovery, UNIX, Andrew Kutz, Linux Done Right

Log files may be the most important piece of forensic information we have when determining why a server or application crashed. But warnings of such a disaster are often available to IT administrators before it happens; they just have to know where to look (hint: what do you think log files are for?).

Looking for a repeating pattern in a list one thousand items long might seem daunting, but luckily there is help. There’s no need to fear, Splunk is here.
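To make the "repeating pattern" idea concrete, here is a minimal Python sketch of one way to do it by hand. It is not how Splunk works internally. Each line is reduced to a rough template by masking hex IDs and numbers, and the templates are then counted. The masking rules and the sample lines are assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical masking rules: hex IDs first, then any remaining digit runs.
_HEX = re.compile(r"\b0x[0-9a-fA-F]+\b")
_NUM = re.compile(r"\d+")


def template(line: str) -> str:
    """Reduce a log line to a rough pattern by masking variable fields."""
    line = _HEX.sub("<HEX>", line.strip())
    return _NUM.sub("<N>", line)


def top_patterns(lines, n=5):
    """Return the n most frequent line templates with their counts."""
    return Counter(template(line) for line in lines if line.strip()).most_common(n)


if __name__ == "__main__":
    # Made-up sample lines, not real output from any particular server.
    sample = [
        "Dec 18 14:02:11 web1 httpd[2211]: child 3021 exited on signal 11",
        "Dec 18 14:05:42 web1 httpd[2211]: child 3187 exited on signal 11",
        "Dec 18 14:06:03 web1 sshd[901]: Accepted publickey for admin",
    ]
    for pattern, count in top_patterns(sample):
        print(count, pattern)
```

The two httpd lines collapse into one template with a count of 2, which is exactly the kind of repetition that is easy to miss when scrolling through a thousand raw lines with grep.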

Splunk is an amazing little web application (currently at version 3.1.3) that indexes just about any type of log file you can think of. Not only does Splunk index the information, but it presents it in a beautiful, easy-to-use web interface (purists need not worry: you can access the information from a terminal as well). So what, you say, is the big deal about searching log files? You can do that with grep. True, but Splunk is far more powerful than grep and excels in four areas:

  • Indexing
  • Presentation
  • Analysis
  • Collaboration

Indexing

Splunk can index logs from a number of sources:

  • Files and directories
  • FIFO queues (pipes)
  • Network ports (syslogging directly to Splunk)

[Screenshot: Splunk data inputs]

Splunk can tail individual log files, monitor entire directories and pipes, and even open network ports so that applications can send their logs directly to Splunk itself (although I recommend using a separate syslog server in order to maintain a file-based log rotation history).
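For the network-port option, an application only needs to speak syslog over the network. Below is a minimal Python sketch using the standard library's `logging.handlers.SysLogHandler`, which sends UDP datagrams when given a `(host, port)` address. The function name `make_syslog_logger`, the logger name, and the address are placeholders; in line with the recommendation above, you would normally point this at your syslog server rather than at Splunk directly.

```python
import logging
import logging.handlers


def make_syslog_logger(host: str, port: int = 514, name: str = "app") -> logging.Logger:
    """Return a logger that ships each record as a syslog datagram over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # A (host, port) tuple plus the default socktype means UDP syslog.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Placeholder address: replace with your syslog server's host name or IP.
    log = make_syslog_logger("127.0.0.1")
    log.warning("disk /var is 95% full")
```

Each call adds a new handler to the named logger, so in a real application you would create the logger once at startup.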

Presentation

Splunk is also more visually appealing than grep (no offense, grep). To give you an idea of what data looks like in Splunk, take a gander at this screenshot:

Looking at log files has never been so much fun!

Analysis

This is where Splunk really outshines its command-line competition. Imagine you wanted to comb your log files to figure out which VM has had the most VMotion events in your VMware Infrastructure. With Splunk that is as easy as pie (a pie chart, that is):

Splunk allows you to easily query the data with its own search language in order to build complex analysis reports. And if that were not enough…
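Under the hood, the VMotion question boils down to a group-and-count. The Python sketch below shows that aggregation by hand, assuming a hypothetical log format in which each migration line contains the word "VMotion" and a `vm=<name>` field. Real VirtualCenter logs look different, so the regular expression would need adjusting. It only illustrates the kind of work Splunk's reports and charts do for you, not Splunk's own query syntax.

```python
import re
from collections import Counter

# Hypothetical format: a line mentioning VMotion that carries a "vm=<name>" field.
VMOTION = re.compile(r"VMotion.*\bvm=(?P<vm>[\w.-]+)", re.IGNORECASE)


def vmotion_counts(lines):
    """Count VMotion events per VM name."""
    counts = Counter()
    for line in lines:
        m = VMOTION.search(line)
        if m:
            counts[m.group("vm")] += 1
    return counts


def as_percentages(counts):
    """Share of all VMotion events per VM, i.e. the slices of a pie chart."""
    total = sum(counts.values())
    return {vm: 100.0 * c / total for vm, c in counts.items()} if total else {}


if __name__ == "__main__":
    # Made-up sample lines in the assumed format.
    sample = [
        "2007-12-18 14:01 VMotion completed vm=db01 from=esx1 to=esx2",
        "2007-12-18 14:20 VMotion completed vm=web03 from=esx2 to=esx3",
        "2007-12-18 15:11 VMotion completed vm=db01 from=esx2 to=esx1",
        "2007-12-18 15:12 Snapshot created vm=web03",
    ]
    counts = vmotion_counts(sample)
    print(counts.most_common(1))  # the VM that moved most often
    print(as_percentages(counts))
```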

Collaboration

Splunk not only allows administrators to easily determine the goings-on of their servers through log file analysis, it also allows them to share their logs with the rest of the Splunk community. Imagine this scenario: a major website's (say, Amazon's) web servers are crashing, and the administrators cannot figure out why. As an Internet business, the company makes its sales on the web, so if its web servers go offline, that is very bad. The administrators are pulling out their hair trying to figure out the problem when one of them realizes they haven't checked Splunk. Because they participate in SplunkBase, they can analyze not only their own log files but also the logs of anyone else who uploads logs to Splunk's community. Bingo! They discover that the problem was a lock that was not getting destroyed.

By themselves, the administrators did not have a large enough data set to determine the problem, but because others had generated similar logs and figured out the problem already, the website admins were able to quickly resolve the issue.

Splunk-tastic!

I'll say it again: Splunk is great. Apart from VMware Server, Splunk may be my favorite server application to come along in the past few years. I cannot imagine running an enterprise data center without Splunk. See you on SplunkBase!

Oct 2 2007   12:59PM GMT

Linux Done Right (personals edition): Linux shop seeks Linux vendor



Posted by: admin
disaster recovery, Backup & recovery, Hardware issues, Linux Done Right, Administration, interoperability and integration

Consider this the second in an occasional, meandering series of articles on Linux done right. These aren't meant to boost the sales of any particular vendor, but instead are meant to show other end users, IT managers and decision makers what to look for when vetting applications and operating system migrations. It can be support, migration strategies, execution or anything and everything in between. If it's Linux done right, then you'll find it here.


Matthew Porter, the CEO of Contegix, is an anomaly as far as I'm concerned, and I don't mean that in a negative way whatsoever. You see, Contegix, a managed hosting provider based in St. Louis, Mo., is a 100% Linux shop. Every server they run internally has Red Hat Enterprise Linux 3, 4 or 5 installed (although they're not using Xen just yet), and all their applications run on Linux, save a financial/payroll application that just has to run on Windows as a virtual instance in VMware. OK, so that makes them a 99% Linux shop with a vestigial Microsoft Windows appendix, and I apologize. In an industry that holds the "five nines" sacred, I think you can cut me some slack on this one.

Anyway, outside of European universities and some HPC instances, 100% Linux shops are a rare breed in this heterogeneous operating system mishmash of a world we live in today. But that still hasn't stopped Contegix. In a call last week, Porter told me that business is going well and growing fast. So fast, in fact, that Porter called what's happened over the past few months "explosive." "We've grown 10% every month over the past couple of years," he said. "Today it's more like 14%."

I called Contegix an anomaly, but their story isn't all that surprising when you look at Linux growth over the same period. Everyone from Gartner to IDC to our friends at Saugatuck has pegged 2009-2011 or thereabouts as the window in which Linux takes an approximate 50% share of all mission-critical operations in the enterprise. And that's not edge-of-the-enterprise workloads lumped in with mission-critical ones, either; it's the bare-bones "if this messes up then our business suffers" stuff.

But that's all in the amorphous soup of the far future. Contegix is an all-Linux shop now, and with all of that growth over the past few quarters, it was starting to experience what can only be described as growing pains. Legacy software and a surging pile of user data that grew every month were taxing the system and tying up resources for days at a time, Porter said.

Their old backup solution, Arkeia, worked well for about a year, Porter said, but it couldn't scale, and Contegix was spending more than 40 hours per week managing backups and recoveries.

"The problem we were dealing with was that we were working around the limitations of our previous software," Porter said. "It often took 24 hours to back up the index that the software was using." Sometimes that 24-hour estimate was generous, and the backup took longer (some recovery or file system-related efforts were eating up 42 or more hours a clip). "When a customer needed something restored, even if it was just a 65 meg file or a database or whatever, it may have taken an hour just to restore that. And we were storing about 50 terabytes a month," he said.

As Contegix continued to grow, speeding up backup and recovery became a top priority.

Looking for options, thinking of Linux

A Linux shop should expect a certain degree of Linux respect and understanding from its vendors, right? Contegix was no exception. From the outset, Porter and his team sought out vendors who could provide backup and recovery peace of mind with a Linux twist, no questions asked. They had to, because Porter wasn't about to spend even more money to retrain his staff on Windows or SQL Server.

“We have a lot of Postgres and MySQL, so it was critical to have hot backup plug-ins for those databases … [and] we had literally no technical staff that used Windows as a desktop. We didn’t want to learn SQL Server,” he said.

Those strict requirements hurt the first candidate, Oceanport, N.J.-based CommVault, right out of the gate. With CommVault's offering, called Simpana, Porter said his staff was asked to learn SQL Server, which in his view gave CommVault a higher total cost of ownership.

Nor did CommVault offer support for MySQL or PostgreSQL. Contegix was also unable to test the application because CommVault wanted a signed PO first. No deal.

The next solution came from Symantec, which Porter and some of the Contegix team had had some experience with at a previous company. From what Porter told me, things didn’t go well even with the prior encounter serving as a foot in the door. Again, the hangup arrived because of how Contegix viewed the vendor’s approach to Linux, Linux support and testing.

"[Symantec] wasn't as nimble in the evaluation process as it could have been. It took two months to get a quote, but there was still no demo unit. The installation process was too costly. Then there was the Linux dynamic. The reseller we went through basically said, 'We only sell for Windows, but we can do Linux after we get approval for Linux,'" Porter said. "It kind of felt like they fully supported [Linux], but not fully at all."

Symantec’s application, NetBackup, was also out of Contegix’s price range, and they were worried about the potential management hours they would have to spend on NetBackup.

Cue the Price is Right “you lose” gong sound.

Finding some Linux spine

Rounding out the trio of backup and recovery options was BakBone Software, a backup and recovery vendor based in San Diego. Interestingly enough, the trait that immediately stuck out in Porter's mind about his experiences with BakBone wasn't technical; it was related to sales and support.

“The same sales rep we dealt with in the beginning was there a year later. Sometimes when you see a lot of turnover the reps don’t really believe in the product, or it’s not selling, but that obviously wasn’t the case,” he said.

Then came the point on which many Linux and open source software relationships are made or broken: support. How does it fare? Is it what you've become accustomed to over the years? Is it better? Is it completely different? Is it professional?

In Porter’s case, he asks similar questions, but he also has a test of his own that’s been generated from Contegix’s own support practices. “[As a managed hosting provider] we always have support staff on hand at all times 24/7/365, and we answer every ticket in five minutes. We assign an engineer to that ticket, not some sales rep or whatever. When an organization like ours is built around support as the number one feature, then vendors must have that same mentality,” he said.

Long story short, BakBone supported MySQL and Postgres as well as the handful of other applications Contegix had on hand, such as Ruby on Rails and Java; it allowed testing; and the price point was right. So Porter bought into NetVault: Backup 8.0.

The server implementation took less than a day, and today Contegix has migrated about 98% of its Arkeia servers over to NetVault. In twenty more days, Porter expects the migration to be complete.

"The consolidation was a huge benefit for us. They can do a full consolidation or a synthetic one. The second big draw for us is not just that the consolidation is there; it is the fact that we have great independent restore time that's fast, and a great way to back up our catalogue and index," Porter said. "We do a lot of backup to a Fibre Channel SAN. With NetVault, we could mount our SAN in drivesafe just like Oracle does, so that the load can be shared among back-end servers and multiple backups and clients. Literally, we have three or four servers that just perform backup."

For Contegix, the ability to share media and run those multiple backup servers is "unbelievably smart," Porter said. "We were spending so much time writing custom scripts to work with the ODL system before, and many of those were already features in BakBone," he said.

Indeed, before the third-party backup and recovery app was introduced to the Contegix back-end environment, the IT staff was wasting a good 100-150 hours per month on those custom scripts. But not anymore.

As I wrote earlier, the migration off the legacy system is about 98% done. Something could still go wrong, I suppose, but that's not the feeling I got when talking with Porter. From the sounds of things, this shop will remain a Linux-only club for the indefinite future.


Have a Linux Done Right success story you’d like to share? Send it to me at Jack Loftus, News Writer and I guarantee I’ll get you the 15 minutes of IT fame you so richly deserve.


Sep 10 2007   3:38PM GMT

Linux Done Right: A user’s pleasant surprise



Posted by: admin
support, identity management, Enterprise applications for Linux, Samba, Linux basics, Linux versus Windows, Linux desktops, Open source applications, Linux Done Right, Administration, interoperability and integration

Consider this the first in an occasional, meandering series of articles on Linux done right. These aren't meant to boost the sales of any particular vendor, but instead are meant to show other end users, IT managers and decision makers what to look for when vetting applications and operating system migrations. It can be support, migration strategies, execution or anything and everything in between. If it's Linux done right, then you'll find it here.


First, a little background.

I initially spoke with John Flores, a system administrator with the University of Texas at San Antonio, earlier this year for a broad SearchEnterpriseLinux.com article on Linux support. The article focused on the good, the bad and the ugly of working with commercial Linux distributors, as well as with the alternatives like CentOS and Debian. It was also a comparison of the past, present and future of Linux support as a whole.

Flores and his data center, like many data centers today, were at a crossroads. He was using Windows NT as his domain controller, but it was time to upgrade: a few Dell servers were past their prime, and new ones were set to be introduced in the summer of 2006.

"We had an old Dell 6300 that was to be put out of service … it was what was running the NT 4.0," Flores told me. "Rather than move NT 4.0 to a new server, we were looking for an OS that we could put onto a new server, and it was going to be either Linux or MS."

But old servers weren't the only issue at UTSA that summer. Flores explained that NT 4.0 had become "unstable, mostly due to age." The software configurations were also old and difficult to maintain, he said, and a lot of "junk" had accumulated over the years. The clutter was quickly becoming a maintenance issue for the IT staff. "We were having a server failure almost once every two weeks. A server would have a major problem, so we'd have to reboot it and bring it back up again," Flores said. But then things got even worse.

"Because this is a university environment, we have a whole new set of something like 5,000 users changing over every semester. We have to log all those IDs and passwords every semester."