when relevant content is
added and updated.
If you sell disaster recovery products and solutions, you have a big batch of potential new customers: Major U.S. airlines. Four airlines — Delta, Southwest, United, and American, which just happen to control 80 percent of all domestic travel in the U.S. — have suffered major outages in recent months that have been attributed to a lack of a proper disaster recovery plan.
In August, Delta customers suffered major delays because of…a power failure? “A power outage in Atlanta, which began at approximately 2:30 a.m. ET, has impacted Delta computer systems and operations worldwide, resulting in flight delays,” the company reportedly wrote on its blog at the time. “Following the power loss, some critical systems and network equipment didn’t switch over to Delta’s backup systems,” reported Kim Nash in the Wall Street Journal.
“How could a company as technologically savvy and mature with its business processes as Delta not have a working disaster recovery plan?” writes M.J. Shoer in Seacoast Online. “It’s a fair question and we are still waiting to learn the details. I can’t for a minute believe Delta does not have a disaster recovery plan to deal with an event like this, but it failed. That begs the question as to when it was last tested and how often this plan is reviewed, revised and retested.”
In July, “Southwest Airlines canceled more than 2,000 flights over several days after an outage that it blamed on a faulty network router,” write Alastair Jamieson, Shamar Walters, Kurt Chirbas and Gabe Gutierrez for NBC News. While the company had redundancies built into its equipment, they didn’t work, according to CEO Gary Kelly. “A back-up system also failed, extending the outage. Ultimately the company had to replace the router and reboot 400 servers,” the company’s chief operating officer told Conor Shine of the Dallas Morning News.
Delta and Southwest aren’t alone. “Computer network outages have affected nearly all the major carriers in recent years,” writes David Koenig for the Associated Press. “After it combined IT systems with merger partner Continental, United suffered shutdowns on several days, most recently in 2015. American also experienced breakdowns in 2015, including technology problems that briefly stopped flights at its big hub airports in Dallas, Chicago and Miami.”
Altogether, thousands of flights were delayed, resulting in costs of millions of dollars. Southwest’s Kelly could even lose his job over the incident, after criticism from a number of employee unions.
“The meltdown highlights the vulnerability in Delta’s computer system, and raises questions about whether a recent wave of four U.S. airline mergers that created four large carriers controlling 85 percent of domestic capacity has built companies too large and too reliant on IT systems that date from the 1990s,” writes Susan Carey in the Wall Street Journal.
Moreover, it points to how vulnerable airlines are to terrorist attacks, writes Hugo Martin in the Los Angeles Times. In fact, at first, some even attributed Delta’s outage to a terrorist attack – maybe because it just seemed so unbelievable that an airline could be brought down by a power failure. But it’s happening.
So What’s Causing It?
“There have been several reservation system outages that have hit worldwide airline ops with distressing regularity over the past few years,” risk management specialist Robert Charette told Willie Jones of IEEE Spectrum. “Southwest Airlines had one just a few weeks ago. (It had another big one June of 2013 and another in October 2015.) What you’ll see in reviewing them is recurring problems with infrastructure (i.e., power, networks, routers, servers, etc.) that seem to keep surprising the airlines. In every case I can recall, there were backup systems in place, but they failed—another recurring theme. The Southwest CEO claimed that the last outage—caused by a router—was equivalent to a 1000-year flood. Not only was that a comical overstatement, but it also shows the thinking that is probably [leading to the airlines] skimping on contingency management preparations.”
Some have attributed the failures to too much consolidation in the airline industry and too much emphasis on efficiency. One study, Delivering the Benefits? Efficiencies and Airline Mergers, found that not only did mergers not save operational money, but often cost much more than expected – particularly in terms of integrating IT. Some have attributed Delta’s failure, in particular, to the retiring of its previous CIO, Theresa Wise – who merged the Delta and Northwest IT teams — in January.
“Is this a sign that airlines aren’t investing enough money in their IT infrastructure?” writes Adam Levine-Weinberg in The Motley Fool.
Airlines Aren’t Alone
On the other hand, airlines aren’t alone in having insufficient disaster recovery protection – just the most conspicuous. A survey – admittedly by a disaster recovery vendor – found that companies in general aren’t testing their disaster recovery systems enough. “When asked about how frequently they tested their DR environment, more than half of the respondents indicated that they tested less than once a year; even worse, a third said that they tested infrequently or never,” the survey found.
Airlines’ IT systems’ complexity makes it worse. “Since they are needed on a 24/7 basis, 365 days a year, it’s hard to fully test every potential scenario that could cause problems,” Levine-Weinberg writes. “As a result, it may be impossible to fully eliminate large-scale IT outages across the airline industry.”
The biggest problem with this failure in disaster recovery? The computer networks and systems can be repaired. Disaster recovery plans can be created for the next time. But repairing customers’ trust may not be so easy.