Disaster Recovery archives - Enterprise Linux Log

Oct 24 2008   7:19PM GMT

Whirlwind Tech Tour explores remote administration tools



Posted by: Caroline Hunter
disaster recovery, Security, Linux, Enterprise applications for Linux, Administration, interoperability and integration

This week, SearchEnterpriseLinux.com launched its Whirlwind Tech Tour, a new site feature in which we ask Linux professionals a weekly question and post their answers side by side. This week we asked about remote server administration. Done correctly, remote server administration enables companies to distribute resources and prepare for disaster recovery. It also requires a strong toolset to perform these roles well.  

Which tool is best for remote server administration in a Linux environment, and why?

Jay Lyman, an open source analyst at Boulder, Colo.-based 451 Group, recommends the General Public License (GPL)-licensed Virtual Network Computing (VNC) system for its user-friendly graphical user interface. This tool works with OpenSSH, the open source Secure Shell implementation, to perform tunneling, a method of establishing secure connections between local and remote networks. OpenSSH itself received several mentions in our IT pros’ responses.

As Kristian Erik Hermansen noted, the tool does more than tunnel. Hermansen’s description of OpenSSH’s capabilities: It can “forward graphical applications to remote machines, create a series of tunnels, redirect traffic over a SOCKS proxy, and perform way too many other features to mention.”  
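For readers who have not used these OpenSSH features, here is a minimal sketch of the three capabilities mentioned above: a VNC tunnel, a SOCKS proxy and graphical application forwarding. It only builds the `ssh` command lines and does not run them. The helper names, the host `admin@server.example.com` and the port choices are illustrative assumptions, not anything our respondents specified. The `-L`, `-D`, `-X` and `-N` flags are standard OpenSSH client options, and VNC display :N conventionally listens on TCP port 5900 + N.

```python
# Illustrative helpers that build OpenSSH client command lines.
# Host names, user names and port choices are placeholders, not values from the article.

VNC_BASE_PORT = 5900  # VNC display :N conventionally listens on TCP port 5900 + N


def vnc_tunnel_command(remote, display=1, local_port=None):
    """Forward a local port to the VNC display on the remote host (ssh -L)."""
    vnc_port = VNC_BASE_PORT + display
    if local_port is None:
        local_port = vnc_port
    # -N: do not run a remote command, just hold the tunnel open
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{vnc_port}", remote]


def socks_proxy_command(remote, local_port=1080):
    """Open a dynamic (SOCKS) forward on a local port (ssh -D)."""
    return ["ssh", "-N", "-D", str(local_port), remote]


def x11_forward_command(remote, program):
    """Run a graphical program remotely and display it locally (ssh -X)."""
    return ["ssh", "-X", remote, program]


if __name__ == "__main__":
    for cmd in (
        vnc_tunnel_command("admin@server.example.com", display=1),
        socks_proxy_command("admin@server.example.com"),
        x11_forward_command("admin@server.example.com", "xterm"),
    ):
        print(" ".join(cmd))
```

With the first command running, a VNC viewer pointed at `localhost:5901` would reach the remote desktop through the encrypted tunnel rather than over an open VNC port.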

Serge Wroclawski expected SSH to be at the top of respondents’ lists but suggested they trade it in for more automated remote administration tools. He advises managing remote server configuration with tools such as bcfg2 and Puppet. 

“Remote server management is a multidimensional problem, and managing the Linux OS is only a part of it,” said Ideas International Inc. CEO Tony Iams. Iams outlined several considerations in approaching this problem but concluded that “perhaps the most important factor in choosing a remote Linux management tool … is to make sure it integrates smoothly into the dominant management tools and procedures that are already in place.”

Do you have a question you’d like to see asked and answered? Email it to editor at searchenterpriselinux.com. To see the complete responses from our IT pros, go to the feature main page.


Jun 17 2008   7:00PM GMT

SEP adds online backup for VMware



Posted by: Pam Derringer
disaster recovery, Virtualization, VMware, DataCenter, DataManagement, Backup & recovery, Enterprise applications for Linux, Linux blogs and news, Open source applications, Administration, interoperability and integration

SEP Software LLC, a Germany-based company with U.S. headquarters in Boulder, Colo., introduced SEP Sesam 3.4 backup and recovery software, with additional support for VMware, this week at the fourth annual Red Hat Summit. Known primarily in Europe, the company has been expanding its U.S. presence for about 18 months.

According to SEP Software President Tim Wagner, the new version runs on Red Hat, Novell SUSE, Debian, Ubuntu and other open source operating systems and is very easy to use in a cross-platform environment. Unlike its competitors, SEP Sesam 3.4 enables users to back up and recover virtualized data online.

SEP already supports all major hardware, operating systems and databases, but it has now extended to virtualized data and can be installed as a guest for concurrent backups if the user already has another backup product, Wagner said. SEP provides snapshot backups for VMware, requiring only one installation per VMware host. Installation and subsequent backup and recovery operations are quick and easy to do. Backups also can be performed within a storage area network.

In addition, SEP enables users to migrate data from disk to disk to tape and transfer data securely over the network via AES 256 encryption and decryption. An administrative application program interface provides access to all servers and their data.

Prices start at $377 per server and $214 per client. Online groupware and database modules range from $845 to $3,845, depending on operating system and hardware manufacturer.


Dec 18 2007   2:44PM GMT

Splunk: Or how I learned to stop worrying and love log files



Posted by: admin
disaster recovery, UNIX, Andrew Kutz, Linux Done Right

Log files may be the most important piece of forensic information we have when determining why a server or application crashes. However, warnings of such a disaster are available to IT administrators. They just have to know where to look (hint: what do you think log files are for?).

Looking for a repeating pattern in a list one thousand items long might seem daunting, but luckily there is help. There’s no need to fear, Splunk is here.

Splunk is an amazing little web application (currently at version 3.1.3) that indexes just about any type of log file you can think of. Not only does Splunk index the information, but it presents it as a beautiful, easy-to-use, web application (purists need not worry, you can access the information from a terminal as well.) So you say, what is the big deal about searching log files? You say that you can do that with grep. That is true, but Splunk is hundreds of times more powerful and excels in four areas:

  • Indexing
  • Presentation
  • Analysis
  • Collaboration

Indexing

Splunk can index logs from a number of sources:

  • Files and directories
  • FIFO queues (pipes)
  • Network ports (syslogging directly to Splunk)

Splunk data inputs

Splunk enables you to tail log files, the contents of entire directories, pipes, and even open ports for applications to send their logs directly to Splunk itself (although I recommend using a separate syslog server in order to maintain a file-based log rotation history.)

Presentation

Also, Splunk is more visually appealing than grep (no offense, grep). To give you an idea of what data looks like in Splunk, take a gander at this screenshot:

Looking at log files has never been so much fun!

Analysis

This is where Splunk really outshines its command line competition. Imagine you wanted to comb your log files to figure out which VM has had the most VMotion events in your VMware Infrastructure. With Splunk that is as easy as pie (a pie chart, that is):

Splunk allows you to easily query the data using SQL in order to build complex analysis reports. And if that was not enough…
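To appreciate what Splunk saves you, here is a rough sketch of the do-it-yourself alternative: a short script that counts VMotion events per VM. This is not how Splunk does it. The log line format, the regular expression and the sample data are all hypothetical, made up for illustration, and real VMware logs will look different.

```python
import re
from collections import Counter

# Hypothetical log format: "<timestamp> <host> vmotion: VM '<name>' migrated ..."
# Real VMware log lines differ; adjust the pattern to match your own files.
VMOTION_PATTERN = re.compile(r"vmotion: VM '(?P<vm>[^']+)'", re.IGNORECASE)


def count_vmotion_events(lines):
    """Return a Counter mapping VM name to number of matching log lines."""
    counts = Counter()
    for line in lines:
        match = VMOTION_PATTERN.search(line)
        if match:
            counts[match.group("vm")] += 1
    return counts


# Made-up sample data, for illustration only.
SAMPLE_LOG = [
    "2007-12-18T10:00:01 esx01 vmotion: VM 'web01' migrated to esx02",
    "2007-12-18T10:05:12 esx02 vmotion: VM 'db01' migrated to esx01",
    "2007-12-18T10:09:40 esx02 vmotion: VM 'web01' migrated to esx03",
    "2007-12-18T10:11:03 esx03 sshd: session opened for user root",
]

if __name__ == "__main__":
    # Busiest VM first
    for vm, events in count_vmotion_events(SAMPLE_LOG).most_common():
        print(f"{vm}: {events}")
```

Even this toy version needs a hand-written pattern for every log format, and it still leaves you without the indexing, the charts or the sharing.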

Collaboration

Splunk not only allows administrators to easily determine the goings-on of their servers through log file analysis, it also allows them to share their logs with the rest of the Splunk community. Imagine this scenario: a major website’s web servers are crashing and the website’s administrators cannot figure out why. As an Internet business, their primary point-of-sale is the web, so if their web servers go offline that is very bad. The administrators are pulling out their hair trying to figure out the problem when one of them realizes they haven’t checked Splunk. Because the website’s administrators are participating in SplunkBase, they can analyze not only their own log files but also the logs of anyone else who uploads logs to Splunk’s community. Bingo! They discover that the problem was a lock that was not getting destroyed.

By themselves, the administrators did not have a large enough data set to determine the problem, but because others had generated similar logs and figured out the problem already, the website admins were able to quickly resolve the issue.

Splunk-tastic!

I’ll say it again, Splunk is great. Apart from VMware Server, Splunk may be my favorite server application to come along in the past few years. I cannot imagine running an enterprise data center without Splunk. See you on SplunkBase!


Nov 9 2007   9:52AM GMT

UPDATE REMINDER: Product of the Year nominations are going on now!



Posted by: admin
disaster recovery, Database, authentication, blades, identity management, Backup & recovery, Enterprise applications for Linux, Xen, Red Hat, green computing, Systems Management, Linux basics, SUSE/Novell, Hardware issues, Clusters, grids and mainframes, Open source applications, Administration, interoperability and integration

2007 Product of the Year Awards

Working with vendors is tough. You need their help, they want your money. Hopefully, whatever it is they help you install works and the price meets you both somewhere in the middle (as in your side of the middle, right?).

Sometimes this process is a headache, but sometimes a project can really surprise you—things just work and upper management is just peachy keen with how the whole thing looks on the balance sheet.

In that vein, SearchEnterpriseLinux.com wants to help its readers discover the best of the best in Linux products for the enterprise in our prestigious SearchEnterpriseLinux.com 2007 Products of the Year awards. We’ve been asking readers and vendors over at SearchEnterpriseLinux.com to nominate a favorite product they’ve used or to nominate their own new product, and now we’ve opened it up to the Intertubes here at the Enterprise Linux Log. Regardless of where you fall (vendor, user or general Linux guru), the deadline is drawing near!

Our editorial team and a select panel of industry experts and analysts are currently accepting submissions online until 5 p.m. PST on Nov. 9, 2007 in a range of categories, including: Server Linux platform product (either a distribution release or a new, integrated server Linux offering); Security applications/tools for Linux on the server; Virtualization product for Linux on the server; and Linux administration tools. You can access the 2007 POY submission page in the link above.

To qualify, new or significantly upgraded products must have been shipped after October 31, 2006, and before November 1, 2007. Submit your entry today and let us know what you think are the top data center products on the market!


Oct 2 2007   12:59PM GMT

Linux Done Right (personals edition): Linux shop seeks Linux vendor



Posted by: admin
disaster recovery, Backup & recovery, Hardware issues, Linux Done Right, Administration, interoperability and integration

Consider this the second in an occasional, meandering series of articles on Linux done right. These aren’t meant to boost the sales of any particular vendor, but instead are meant to show other end users, IT managers and decision makers what to look for when vetting applications and operating system migrations. It can be support, migration strategies, execution or anything and everything in between. If it’s Linux done right, then you’ll find it here.


Matthew Porter, the CEO of Contegix, is an anomaly as far as I’m concerned–and I don’t mean that in a negative way whatsoever. You see, Contegix, a managed hosting provider based in St. Louis, Mo., is a 100% Linux shop. Every server they run internally has Red Hat Enterprise Linux 3, 4 or 5 installed (although they’re not using Xen just yet), and all their applications, save a financial/payroll application that just has to run on Windows as a virtual instance in VMware, run on Linux. OK, so that makes them a 99% Linux shop with a vestigial Microsoft Windows appendix, and I apologize. In an industry that holds sacred the “five nines,” I think you can give me some slack on this one.

Anyway, outside of European universities and some HPC instances, 100% Linux shops are a rare breed in this heterogeneous operating system mishmash of a world we live in today. But that still hasn’t stopped Contegix. In a call last week, Porter told me that business is going well and growing fast. So fast, in fact, that Porter called what’s happened over the past few months “explosive.” “We’ve grown 10% every month over the past couple of years,” he said. “Today it’s more like 14%.”

I called Contegix an anomaly, but their story isn’t all that surprising when you look at Linux growth over the same period of time. Everyone from Gartner to IDC to our friends at Saugatuck has pegged 2009-2011 or thereabouts as the magic window when Linux takes an approximate 50% share of all mission critical operations in the enterprise. That’s not edge of enterprise stuff in addition to mission critical, either–it’s bare bones “if this messes up then our business suffers” stuff.

But that’s all in the amorphous soup of the far future. Contegix is an all-Linux shop now, and with all of that growth over the past few quarters, it was starting to experience what can only be described as growing pains. Legacy software and a surging pile of user data that grew every month were taxing the system and tying up resources for days at a time, Porter said.

Their old backup solution, Arkeia, worked well for about a year, Porter said, but couldn’t scale and Contegix was spending 40+ hours per week managing backups and recoveries.

“The problem we were dealing with was that we were working around the limitations of our previous software,” Porter said. “It often took 24 hours to back up the index that the software was using.” Sometimes that 24-hour estimate was being generous, and the backup took longer (some recovery or file system-related efforts were eating up 42 or more hours a clip). “When a customer needed something restored, even if it was just a 65 meg file or a database or whatever, it may have taken an hour just to restore that. And we were storing about 50 terabytes a month,” he said.

As Contegix continued to grow, speeding up the backup and recovery time would become a top priority going forward.

Looking for options, thinking of Linux

A Linux shop should expect a certain degree of Linux respect and understanding, right? Contegix’s case was no exception. From the outset, Porter and his team sought out vendors who could provide recovery and backup peace of mind with a Linux twist, no questions asked. They had to, because Porter wasn’t about to spend even more money to retrain his staff on Windows or SQL Server.

“We have a lot of Postgres and MySQL, so it was critical to have hot backup plug-ins for those databases … [and] we had literally no technical staff that used Windows as a desktop. We didn’t want to learn SQL Server,” he said.

Those strict specifications hurt the first candidate, Oceanport, N.J.-based CommVault, right out of the gate. With CommVault’s offering, called Simpana, Porter said his staff was asked to learn SQL Server. “Given the ownership costs, CommVault had higher costs of ownership,” Porter said.

Nor did CommVault offer support for MySQL or PostgreSQL. Contegix was also unable to test the application because CommVault wanted a signed PO first. No deal.

The next solution came from Symantec, which Porter and some of the Contegix team had had some experience with at a previous company. From what Porter told me, things didn’t go well even with the prior encounter serving as a foot in the door. Again, the hangup arrived because of how Contegix viewed the vendor’s approach to Linux, Linux support and testing.

“[Symantec] weren’t as nimble in [the] evaluation process as they could have been. It took two months to get a quote, but there was still no demo unit. The installation process was too costly. Then there was the Linux dynamic. The reseller we went through basically said ‘we only sell for Windows, but we can do Linux after we get approval for Linux,’” Porter said. “It kind of felt like they fully supported [Linux], but not fully at all.”

Symantec’s application, NetBackup, was also out of Contegix’s price range, and they were worried about the potential management hours they would have to spend on NetBackup.

Cue the Price is Right “you lose” gong sound.

Finding some Linux spine

Rounding out a trio of backup and recovery options was BakBone Software, a backup and recovery vendor based in San Diego. Interestingly enough, the trait that immediately stuck out in Porter’s mind about his experiences with BakBone wasn’t technical; it was support and sales-related.

“The same sales rep we dealt with in the beginning was there a year later. Sometimes when you see a lot of turnover the reps don’t really believe in the product, or it’s not selling, but that obviously wasn’t the case,” he said.

Then came the point on which many Linux and open source software relationships are made or broken: support. How does it fare? Is it what you’ve become accustomed to over the years? Is it better? Is it completely different? Is it professional?

In Porter’s case, he asks similar questions, but he also has a test of his own that’s been generated from Contegix’s own support practices. “[As a managed hosting provider] we always have support staff on hand at all times 24/7/365, and we answer every ticket in five minutes. We assign an engineer to that ticket, not some sales rep or whatever. When an organization like ours is built around support as the number one feature, then vendors must have that same mentality,” he said.

Long story short, BakBone did support MySQL and Postgres, and the handful of other applications on hand like Ruby on Rails and Java, and it allowed testing and the price point was right, so Porter bought into NetVault: Backup 8.0.

The server implementation took less than a day, and today Contegix has migrated about 98% of its Arkeia servers over to NetVault. In twenty more days, Porter expects the migration to be complete.

“The consolidation was a huge benefit for us. They can do full consolidation or a synthetic one. The second big draw for us is not just that the consolidation is there, it is the fact that we have great independent restore time that’s fast, and a great way to back up our catalogue and index,” Porter said. “We do a lot of backup to a Fibre Channel SAN. With NetVault, we could mount our SAN in drivesafe just like Oracle does, so that the load can be shared among back end servers and multiple backups and clients. Literally, we have three or four servers that just perform backup.”

For Contegix, the ability to share media and have those multiple backup servers is “unbelievably smart,” Porter said. “We were spending so much time writing custom scripts to work with the old system before, and many of those were already features in BakBone.”

Indeed, before the third party backup and recovery app was introduced to the Contegix back end environment, the IT staff was wasting a good 100-150 hours per month on those custom scripts. But not anymore.

Like I wrote earlier, the migration off legacy is about 98% done. Something could still go wrong, I suppose, but that’s not the feeling I got when talking with Porter. From the sounds of things this shop will remain a Linux-only club for the indefinite future.


Have a Linux Done Right success story you’d like to share? Send it to me, Jack Loftus, News Writer, and I guarantee I’ll get you the 15 minutes of IT fame you so richly deserve.


Aug 22 2007   2:20PM GMT

LinuxWorld wrap-up: Demystifying data recovery



Posted by: admin
disaster recovery, LinuxWorld

This gem didn’t have a home on any of our sites, but I didn’t want the reporting to go to waste. So it’s going underground on the Enterprise Linux Log. On that note, enjoy some session coverage from LinuxWorld 2007!


SAN FRANCISCO – Everyone in IT uses storage in their data center, therefore everyone will one day have to deal with that storage failing. It could happen at any time. Even in the moments before your LinuxWorld presentation on demystifying data recovery.

That’s what happened to Chris Bross anyway, roughly five minutes before attendees started filing into his session on “Demystifying Data Recovery” here at the LinuxWorld Conference and Expo.

Bross is an enterprise recovery engineer with Novato, Calif.-based DriveSavers Data Recovery Inc., and the good news for his presentation was that he had brought along a backup USB thumb drive with a copy of his presentation. All too often, however, IT managers and decision makers are not taking the steps necessary to secure and recover their data, Bross said.

All storage fails eventually

“All storage is going to fail eventually. All hardware breaks. Are you prepared for the inevitable?” Bross said before conducting an informal poll about who had ever lost data.

A smattering of attendees, Bross included, raised their hands. (In addition to losing a USB thumb drive, Bross would later admit that one of his two Ubuntu laptops failed during a shipping snafu.)

But the informal poll belied a much bigger problem in data back up and recovery in today’s enterprise; one which Bross set out to diagnose and recover much as he and his staff have done hundreds of times back in Novato with hard disks damaged by fire, water and mechanical defects (the latter being demonstrated with a variety of drive-head-on-platter audio clips from real life recovery efforts at the DriveSavers clean room).

Disaster recovery: the numbers

“For all of the effort [systems administrators] put into assigning employees backup tasks, 60% of all corporate data today resides on unprotected PC desktops and laptops,” Bross said, citing industry research from Rochester, N.Y.-based Harris Research.

And when natural disasters strike (and they will, despite the disagreement over disaster recovery between business executives and IT staffs), the track record of today’s data centers is poor.

According to a study from the University of Texas, when U.S. small and medium-sized businesses lose data in a natural disaster, 50% never reopen and 90% are gone within two years. Bross said the cost to “recreate” these battered data centers can run anywhere from $50,000 per hour to $2 million per job at large eCommerce sites.

The reality of reliability

Bross said common knowledge in data centers is that the mean time between failures (MTBF), sometimes called “mean time to failure,” for a typical hard drive is between 500,000 and 1.5 million hours. In an ideal environment, the annual rate of failure of any given drive is 0.88%.
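For the curious, the link between an MTBF figure and an annual failure rate is simple arithmetic. The sketch below is my own back-of-the-envelope illustration, not something Bross presented. It assumes a drive that is powered on all year (8,760 hours) and a constant failure rate, and the function name is made up for this example.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 powered-on hours, assuming the drive runs all year


def annual_failure_rate(mtbf_hours, exact=False):
    """Annualized failure rate implied by a datasheet MTBF.

    The simple estimate is hours-per-year / MTBF. With exact=True the
    constant-failure-rate (exponential) model 1 - exp(-t/MTBF) is used instead.
    """
    ratio = HOURS_PER_YEAR / mtbf_hours
    return 1 - math.exp(-ratio) if exact else ratio


if __name__ == "__main__":
    for mtbf in (500_000, 1_000_000, 1_500_000):
        print(f"MTBF {mtbf:>9,} h -> AFR {annual_failure_rate(mtbf):.2%}")
```

Under these assumptions, an MTBF of 1 million hours works out to 8,760 / 1,000,000, or roughly 0.88% per year, which suggests that is the datasheet figure behind the number quoted above.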

But two studies from Google Inc. and Carnegie Mellon disagree. In both studies, real world testing of drive reliability found the actual annual replacement rate was 3-8%. On top of that revelation was word that failure rates double after the first year of service. For drives older than one year, Bross gave simple advice: “If you experience a drive error of any kind, pull the drive. It’s better to be safe than sorry,” he said.

Mechanical failures are only part of the picture, though, as are natural disasters and virus corruption. Truth be told, studies have shown that user error is by far the biggest contributor to data loss. Fully 60% of all hardware failure is the result of the user, Bross said, which includes malicious or accidental deletion of code, incorrect RAID configuration, accidental reformatting and bad maintenance.

Data protection and the inevitable

Bross concluded his session with a number of tips and best practices for systems administrators to use in specific situations.

Hurricanes and floods – “Remember, if you want to preserve data, you’ll want to make sure that the drive is kept wet,” Bross said. “Storage needs to remain wet. If it dries out, there’s lots of calcification and mineral deposits that can form and cause havoc.” Bross instructs all of his customers to keep wet drives submerged and cool.

Data has been damaged, now what? – Rule number one: Don’t panic. Evaluate the failure and check the status of your backup, Bross said. “Don’t run repair utilities on it. Don’t reformat the volume. Don’t restore backup to the drive in question. Don’t remove drives from a RAID system or rebuild it.” Instead, keep a cool head and evaluate and check the backup drive first, he said.

RAID is not equal to backup! – RAID, by its definition, is a redundant array of disks. “The reality is that a RAID device is only part of the backup solution,” Bross said. “RAID is good for one thing and it’s not as the primary backup application–it’s fault tolerance.”

DIY data recovery – Doing things on your own is good for corruption, deletion or logical corruption of volumes. However, Bross warned that this approach is bad for hardware damage or complex configurations.

Local service providers – This is a good option for transfers, but not for data recovery (this category also encompasses a local expert).

Professional data recovery services – For mission critical data where risk is not an option. These services employ clean room facilities as required by drive manufacturers. Bross said that even in these pristine professional conditions “not every patient makes it through this ‘ER for hard drives’.”

Remote access services – Cannot be used with physical device failures. There is the potential risk to customer data that hardware or logical volumes could degrade during diagnosis. Potential benefits are a quick resolution and recovery of data. And there’s no need to ship hardware to a lab, either.

Bross said users can avoid needing these data recovery strategies in the first place by making complete backups regularly. “Have a process and a schedule. Assign responsibility and create a chain of command,” he said. “All storage eventually fails. Run home and back up your data now. Be happy you did and sleep well tonight.”


Aug 15 2007   7:56AM GMT

Where’d LinuxToday go?!



Posted by: admin
disaster recovery, Hardware issues

Apparently, the Perfect Server Storm hit popular third party news site LinuxToday.com yesterday and everything went dark. Countless legions of Linux reading geeks and nerds (LinuxWorld Golden Penguin Bowl, anyone?) went entire minutes, if not hours without Linux news and views. It was awful.

Thanks to a crashing server, a database that went awry, and a certain editor’s home/office Internet connection going belly up–all at the same time, yesterday afternoon Linux Today, JustLinux, and LinuxPR all went into the mode technologists sometimes refer to as “bye-bye.”

It was, actually, almost the perfect storm of things that could go wrong all at once. I would like to thank the system administrators who logged in after hours to restart the site at around 2215 EDT (0215 GMT) last night. (Some readers noted the sudden appearance of 13 new stories at that time.)

All three situations seem repaired now, so I would also like to thank everyone for all of their patience and continued readership.

I’ve had the pleasure of meeting and conversing with LinuxToday Managing Editor Brian Proffitt for the better part of the past three years now at various shows, so my sympathies are with him on this dark, dark day. :-) Good job to his systems administrators for getting LT back up and running so quickly.

On that note, perhaps Brian would like to read my upcoming post-LinuxWorld story on disaster recovery? Hmm? :-)