Data center outages at Delta Air Lines and Amazon Web Services stole the headlines in recent months, but there are plenty of other outages at everyday enterprises that fly under the radar.
IT pros dished the dirt last week at IBM Interconnect, anonymously sharing tales about their data center outages — an illustration of the various problems behind data center downtime, and a reality check about how that next outage could be caused by just about anything.
A CIO, two weeks into the new position, claimed he was hired to implement a “transformational agenda,” but first he endured a one-week outage of a core, externally facing customer system. “I spent months delaying my agenda to focus on sustainability,” wrote the unnamed CIO.
An insurance company in Connecticut performed a data migration from its original system to a new platform, then shut down the old system, claimed another contributor. But when they attempted to bring up the new system, the data was corrupt.
In a networking tale of woe, an F5 refresh took out an entire website when a parameter set to direct traffic to the least loaded server instead sent the traffic to a test server. You can probably guess what happened next.
Another debacle cited the failure of an unspecified storage component, which degraded performance and ultimately triggered the disaster recovery plan. But there was one problem: “We had no way to failback – not good,” wrote the IT pro.
Nature was blamed for one data center takedown: a squirrel chewed into a main power feed during maintenance to the data center’s battery backup. That caused a blackout, albeit a short one, with the data center going down for about five seconds until the generators kicked in. No word on whether any data (or the squirrel) was lost.
One IT pro lamented how a load test was conducted on production storage during working hours. It was a virtualized environment and nothing should have happened, but the ports became saturated and the network couldn’t handle the load, so there was downtime.
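The anecdote does not say exactly how the parameter went wrong, so the following is only a sketch of one plausible reading: a least-loaded rule will always favor an idle test machine if that machine is allowed into the candidate pool. The `Server` class, the `in_production_pool` flag and the server names are hypothetical and do not reflect any F5 configuration syntax.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    active_connections: int
    in_production_pool: bool = True  # hypothetical flag, not an F5 setting


def pick_least_loaded(servers):
    """Return the eligible server with the fewest active connections."""
    eligible = [s for s in servers if s.in_production_pool]
    if not eligible:
        raise ValueError("no production servers available")
    return min(eligible, key=lambda s: s.active_connections)


servers = [
    Server("web-1", active_connections=120),
    Server("web-2", active_connections=95),
    # An idle test box looks like the least loaded candidate,
    # so it has to be kept out of the pool explicitly.
    Server("test-1", active_connections=0, in_production_pool=False),
]

chosen = pick_least_loaded(servers)
```

With the test server excluded, `chosen` is the production server with the fewest connections. Flip the flag on `test-1` and the same rule sends traffic to the test server, which is the failure mode this sketch assumes.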
Timing can be everything, and that was certainly the case when the hard drive died in a network staging server at one company — just before a new product was to be launched, according to the anonymous writer.
Backup for data center cooling and power systems is especially important, as shown by one story where an IT pro claimed that there was no UPS or generator backup for cooling towers on the roof of the data center. When power went out, the CPUs overheated with no working cooling system.
Don’t blame me
Notice a common theme? None of the authors accept guilt in their stories of data center downtime. In fact, nobody is blamed in most cases. So much for the blameless post mortem, even when it is anonymous. A majority of data center outages are caused by human error, which leaves us wondering exactly what the painful truth behind these outages was.
Now that you’ve read some tales from the data center trenches, what’s your best story about an outage and downtime?
When I first noticed the line there were maybe a few dozen people in it. The number quickly swelled to 100 and then 200 or more, as the line snaked its way down the aisles of the vendor exhibition at IBM’s Interconnect 2017 conference in Las Vegas.
It’s not that unusual to see show attendees lined up at a vendor’s booth waiting for, well, almost anything. This line, however, carried a certain sense of anticipation and energy, with people engaged in animated conversations and craning their necks to see if the long line was moving forward.
Who or what could they be waiting for? Was it a well-known computer industry or sports celebrity making an appearance, or perhaps a software giveaway?
Turns out it had nothing to do with anything like that. These people were waiting for a copy of a book with the decidedly unsexy title: Blockchain Revolution: How the Technology Behind Bitcoin Is Changing Money, Business, and the World.
Really? A book chronicling the rise of blockchain — or as some refer to it, the blockchain — draws this large a crowd? I felt a little like the clueless reporter in Bob Dylan’s Ballad Of A Thin Man with the chorus: Something is happening here and you don’t know what it is, do you Mr. Jones?
Well alright, maybe I’m not that clueless. I did spend the previous two days at the show listening to a series of executives from IBM and end user companies sing the praises of the technology. Some even suggested that blockchain is the breakthrough technology the Internet has been waiting for.
There certainly is a lot of hype building around this technology. That’s nothing new in this industry, going back to the glitzy, multi-million dollar marketing campaigns for desktop products like Apple’s original Macintosh, or Microsoft’s Windows 95. For the most part these and other heavily hyped products lived up to those promises and/or went on to influence other breakthrough products.
But this one is different. The success of blockchain – and we are still waiting to see how successful it will become – figures to be more through its quiet adoption by large (and some smaller) enterprises. True Believers say it offers rock-solid security in the cloud through what is essentially a distributed database that records transactions in blocks in a way that makes tampering damn near impossible.
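To make the tamper-resistance claim more concrete, here is a minimal sketch of the hash-linking idea only. It assumes SHA-256 and a simple JSON encoding, and it leaves out the distribution and consensus mechanisms that real blockchain platforms depend on. The function names and the sample transactions are hypothetical and are not taken from any IBM or Hyperledger API.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, transactions, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, transactions):
    """Append a block that is linked to the current end of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "transactions": transactions,
        "prev_hash": prev_hash,
        "hash": block_hash(index, transactions, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_PREV_HASH
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(
            block["index"], block["transactions"], block["prev_hash"]
        )
        if block["hash"] != recomputed:
            return False
    return True


ledger = []
add_block(ledger, ["alice pays bob 5"])
add_block(ledger, ["bob pays carol 2"])
untampered_ok = is_valid(ledger)

# Quietly rewriting an old transaction breaks the stored hash.
ledger[0]["transactions"] = ["alice pays bob 500"]
tampered_ok = is_valid(ledger)
```

In this toy version a forger could recompute every later hash. The practical protection comes from many parties holding copies and agreeing on the valid chain, which is outside the scope of the sketch.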
IBM is certainly one of those True Believers. Some company executives believe it is one of the company’s two or three most strategically important technologies. Big Blue is betting a good chunk of its future on blockchain along with its Watson cognitive technology and an aggressive adoption of open source across its software portfolio.
The latest testimony to that is the raft of announcements the company made at Interconnect involving all three technologies, along with a featured presentation by Leanne Kemp, CEO of Everledger, about how her company uses IBM’s blockchain to prevent fraud in the diamond industry.
IBM is not alone in not just talking the talk but walking the walk about blockchain. Microsoft has stated its commitment to the technology and is working on its Project Bletchley, an open blockchain-as-a-service offering. JP Morgan now has two blockchain offerings aimed at the enterprise called Juno and Quorum. Accenture has introduced an early version of an “editable blockchain” allowing for a blockchain to be edited under what the company calls “extraordinary circumstances” to fix human errors or accommodate certain legal and regulatory requirements. And the Linux Foundation has its Hyperledger Fabric.
And in one last example of blockchain’s growing popularity, not just among fully matured IT geeks in enterprise shops but among aspiring geeks, Wiley this spring will release Blockchain For Dummies.
So if I am having difficulty grasping the finer points of blockchain this year as explained to me by technical industry experts, maybe you’ll find me in a similar line at a trade show looking to pick up a copy.
After cooking for more than three decades in its research laboratories, Big Blue has served up an appetizer of its quantum computing efforts, to give the IT world its first taste of what possibilities the emerging technology might offer.
The first morsel is a set of APIs to help developers build interfaces between five IBM-hosted quantum computers and the company’s installed base of host-based systems. Included with the new APIs is an improved quantum simulator, accessible through the IBM cloud, to run algorithms on model circuits of up to 20 qubits.
IBM promises to follow up with a full-blown SDK in the first half of this year that lets programmers build simple quantum applications for both business and scientific use.
Despite the sci-fi aura around quantum computing, it would be a mistake to view the technology as the successor to IBM’s current mainframe and Watson technologies any time soon. Rather than a one-for-one replacement, it is intended more as a complementary technology that solves complex problems beyond the abilities of today’s traditional systems.
“There could be a universal fault-tolerant quantum computer able to do everything a classic computer can do, but that’s a long way off,” said Dave Turek, vice president with IBM’s high performance computing and cognitive systems. “For now, what we are trying to do is solve problems that are intractable for conventional computers.”
One practical example is what Turek referred to as the traveling salesman problem. Conventional computers could easily calculate an optimal route for a salesman if it only involved two, three or four cities. But with 20 or more cities, the number of possible routes grows factorially, and checking every one becomes too difficult for today’s systems.
“To solve that you have to assess a number of routes that can only be counted by counting the number of atoms in the universe. That is how big that number of routes gets,” Turek said.
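A quick calculation shows how fast the route count grows. This sketch assumes a round trip that returns to the starting city and treats a route and its reverse as the same, which gives (n - 1)! / 2 distinct routes for n cities. The figure of 10^80 atoms is a commonly quoted rough estimate, not a number from Turek, and the function names are illustrative.

```python
from math import factorial

# Rough, commonly quoted order of magnitude; an assumption for illustration.
ATOMS_IN_UNIVERSE_ESTIMATE = 10 ** 80


def round_trip_count(cities):
    """Distinct round trips through `cities` cities with symmetric distances."""
    if cities < 3:
        return 1  # with one or two cities there is only one possible trip
    return factorial(cities - 1) // 2


def cities_needed_to_exceed(limit):
    """Smallest number of cities whose route count is larger than `limit`."""
    n = 3
    while round_trip_count(n) <= limit:
        n += 1
    return n


routes_for_4 = round_trip_count(4)
routes_for_20 = round_trip_count(20)
threshold = cities_needed_to_exceed(ATOMS_IN_UNIVERSE_ESTIMATE)
```

Under these assumptions, four cities give only three routes, 20 cities give roughly 6 x 10^16, and the count passes the atom estimate at about 60 cities. Brute-force enumeration is the worst case; practical solvers use smarter methods, but the search space still grows at this rate.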
Another example of what quantum computing can do is to help discover more efficient and creative ways to produce greater quantities of ammonia, a key ingredient in fertilizer. With the world’s population expected to nearly double by 2050 and farmers having to feed twice as many people with the same amount of land at their disposal, such advances will be essential.
“Conventional computing models can’t get within a reasonable distance of what is going on at the molecular level to create models to solve such problems,” Turek said.
Other business problems that can be addressed with quantum computing, Turek said, include finding optimal routes for logistics and supply chains, coming up with better ways to model financial data and identifying risk factors in making investments, making cloud computing more secure through quantum physics and improving the capabilities of machine learning when handling large data sets.
Since last year, IBM and a handful of industrial partners, all members of the IBM Research Frontiers Institute and including Samsung, Honda, Canon and Hitachi Metals, have researched potential quantum applications and evaluated their viability for business use.
Full-blown quantum computing systems won’t be operational for several years, but opening the technology up to corporate developers now will get IT shops thinking about what it can do, Turek said.
“I think you will see some very interesting things in the next two years as developers begin solving real commercial problems,” he said.
Specifications for the new Quantum APIs are available on GitHub at https://github.com/IBM/qiskit-api-py.
Ed Scannell is a senior executive editor with TechTarget. Contact him at email@example.com.
IT professionals hoping for a taste of what the combined Dell-EMC-VMware will serve up in new products and strategies got a bowl of steam instead.
In formally announcing the completion of their $67 billion deal, executives from Dell and EMC spent most of their presentation on Sept. 7 reciting the resume of the combined companies, reminding us of how big and bad they plan to be in the IT world:
• The world’s largest privately held technology company ($74 billion in revenues);
• Holding the number one, two or three position in several major product categories including PCs, servers, storage and virtualization; and
• A corporate structure that supposedly allows them to innovate and pivot quickly like a startup, but with pockets deep enough to heavily invest in research and development for the long term.
“We are going to be the trusted provider of essential infrastructure for the next industrial revolution, so organizations can build the next generation of transformational IT,” said Michael Dell, chairman and CEO of Dell Technologies.
If nothing else, you have to admire Mr. Dell’s confidence and ambitions. On paper, the new company at least appears to have a fighting chance of accomplishing this objective. With archrivals IBM and HPE either selling, spinning off, or merging huge pieces of themselves and creating much smaller competitors, Dell Technologies could indeed end up being the biggest and baddest boy on the IT block.
But what looks formidable on paper, as we have seen in this industry time and again, often ends up not being worth the paper it’s written on. For instance, Hewlett Packard execs believed they would dominate the world of desktop PCs and Intel-based servers after buying Compaq Computer Corp. in 2001, only to squander whatever advantages the latter had when dozens of key Compaq executives left and a number of key products were dropped just a year or two after the deal.
“They have enough resources to compete with just about anyone,” said one long-time IT professional with investments in both Dell server and EMC storage products. “But they haven’t specifically laid out how they [Dell-EMC-VMware] will work together to make, say, cloud-based environments work hand-in-glove with on-premises environments.”
Such a lack of clarity, he added, “reminds me of a certain presidential candidate with huge ambitions and few details about how he gets there.”
It’s not just the lack of specifics about how the combined companies will work together that makes some skeptical. It is also Michael Dell’s bold claim that the new company can “innovate like a startup.” But can a newly formed $74 billion elephant not only keep pace with real jackrabbit startups, but also invest enough to match the R&D dollars typically spent by IBM, Microsoft and Google annually?
Dell certainly has a history of being a fast follower in the hardware business over the past 30 years, but it has never been a company that felt comfortable making a living out on the razor’s edge.
Michael Dell’s answer to growing this now mammoth business while still delivering more innovative products faster seems to revolve around Dell’s decision to go private a couple of years ago.
“The single best way to get bigger, but also move faster, is to detach yourself from the 90-day reporting cycles that are common among larger companies,” he said. “I think going private has kicked the company into a new gear. We have had 14 quarters in a row of gaining [market] share in our client business. Dell Technologies can act fast and not be governed by short-term concerns.”
Going private indeed may have helped spur consistent growth in Dell’s client business – a business that is declining for not just Dell but all of its major competitors – but he failed to mention how it has resulted in any significant technology innovations the past couple of years.
As announced earlier this year the new company is now called Dell Technologies, with Michael Dell serving as chairman and CEO. The company is split into two groups: Client Solutions headed by Dell president and vice chairman Jeff Clarke, and an infrastructure group to be led by David Goulden, the former head of EMC’s Information Infrastructure organization. Both organizations will be supported by a Dell EMC Service unit.
The rest of the old EMC Federation, namely VMware, Virtustream, Pivotal, Boomi, RSA and SecureWorks, will continue to function independently and are free to pursue their own strategic agendas and develop their own ecosystems, “which is our commitment to remaining open and offering customer choice,” said Michael Dell. “But we have also strategically aligned our technologies to deliver integrated solutions like hybrid cloud, and security and seamless infrastructure technology from the edge to the core to the cloud.”
Again, all that looks good on paper. But can this melding of two giant IT suppliers work beneficially for users where so many similar unions have failed? Maybe with the next press conference Dell can offer users at least an appetizer instead of a bowl of steam as to how this will all work.
Companies that offer DCIM tools position them as essential, promising a holistic view of the performance of a data center. The DCIM market went from volatile to pretty stagnant, though a buyout between two of the major vendors could jump start demand.
There are several problems with data center infrastructure management (DCIM) tools at the moment.
DCIM tools can be fairly complex, and IT pros may initially be overwhelmed by the amount of information the tools provide. Going all in with DCIM may even require organizational changes, so slowly adding tools is probably a better bet.
These three comments highlight the broad points of view around the industry about DCIM.
DCIM tools help solve problems like the one craigslist engineer Jeremy Zawodny posted on Quora, an open question-and-answer website:
There’s more potential in DCIM than power and cooling measurements or even asset controls. According to data center facilities expert Robert McFarlane, DCIM tools will fall away from the forefront, but that might not be a bad thing.
“DCIM will become less of the big industry buzz word and settle down into the background,” McFarlane said. He doesn’t think that means DCIM will be less important, but rather, that IT pros will take a close look at DCIM when they want to track a specific metric in the deployed infrastructure. Some in the industry even see DCIM being essential to preventative data center maintenance.
Potential users who invest heavily in DCIM tools today expect a broad, integrated platform that isn’t always reality. Commenter ‘NoteShark’ detailed this disconnect between expectations and reality in response to Robert Gates’ story “Buyout could give stagnant DCIM tools market a boost” from February 2016 (linked above).
For some, configuration drift seems to occur due to the ops and facility teams only having a normative description on which to base their designs, from racks to system architectures. When the tool doesn’t have input from everywhere in the stack from facility to app, DCIM tools don’t live up to their fullest potential. And while configuration drift can happen at every level, ‘NotesShark’ goes on to say how a DevOps-type IT environment, where there is more communication and a better flow of information between dev and ops teams, would benefit most from thorough asset and portfolio management alongside current DCIM tools’ abilities in facility and hardware tracking.
Where do you stand on DCIM tools’ usefulness and their future?
Cloud infrastructure offerings increased in resiliency in 2015, assuaging the fears of many businesses looking to switch some applications or transition production IT entirely to the cloud. Enterprises want to save money while retaining the same performance, which cloud providers aim to deliver. Granted, 2015 wasn’t a perfect year.
While evaluating cloud providers’ reliability is difficult since there are few independent data sources, it is not impossible. SearchCloudComputing created a general assessment of cloud infrastructure performance in 2015 by combining a few sources of data, including a CloudHarmony snapshot of cloud provider performance over a 30-day period and Nasuni’s reports on the cloud providers that it uses.
In February 2015, Google’s infrastructure as a service offering Google Compute Engine (GCE) experienced a global outage for over two hours. The outage was at its peak for forty minutes, during which outbound traffic from GCE experienced 70% loss of flows.
Months later, Amazon Web Services (AWS) experienced outages over a weekend in September that affected content delivery giant Netflix and throttled service for other U.S.-East-1 region AWS users while recovery efforts took place. Compared to previous years when AWS experienced some major outages, 2015’s cloud problems were less severe, more of a slowdown than a full stop. However, the list of AWS services affected was longer than the list of services unaffected.
Is Colo the Way to Go?
Even though offerings from cloud providers are improving, some companies found that the cloud just couldn’t handle their business needs. Since 2011, Groupon has been moving away from the cloud and to a colocation provider. Cost drove the online deals company towards running its own data center IT, with its enterprise needs covered in nearly every area, from databases and storage to hosting virtual machines.
However, colocation providers aren’t free of problems. A study of the costs of data center outages from Emerson Network Power and the Ponemon Institute found that UPS system failure accounted for a fourth of all unplanned outages, while cybercrime rose from 2% of outages in 2010 to 22% in 2016.
Verizon’s recent data center outage that took airline company JetBlue offline for three hours and grounded flights highlights the importance of failover plans and redundant power. Verizon, which runs its own data centers for its telecom business, is a surprising sufferer in this outage scenario, according to some observers.
Companies that run owned data centers aren’t free from the same problems that plague cloud and colocation data centers, from stale diesel fuel to poor disaster recovery planning in advance of an attack, error or natural disaster. Data center IT staff must consider how much oversight they have over potential problem areas, and how much control they want — or can have — over the outage and how it is resolved. Visibility into the outage and its aftermath also will vary from provider to provider.
Each year, SearchDataCenter ushers in the holiday season with a geek gift guide by Beth Pariseau, who enjoys a brief break from breaking stories about AWS public cloud to tell you about what to find on Amazon’s other major property.
This year, SearchDataCenter’s writers and editors decided to get in on the fun, and share what tech gift they’d like to unwrap:
Geekiest gift he’s ever gotten? “A Motorola Xoom.” You might remember the Super Bowl commercial for it.
Michelle Boisvert, executive site editor: “The Garmin Forerunner 920XT watch. As a triathlete and a Type A personality (they go hand in hand), I like to track everything during my training and races. What was my swim pace, my transition time, my cadence on the bike? Currently, I have two different watches I use: an old school Timex wristwatch for swimming and a Garmin FR60 for running. This works for training — when I have time to swap watches — but not in races. The Garmin Forerunner 920XT is a single watch that tracks swim distance and speed (in the pool or in open water!), pace, power output, heart rate and cadence on the bike (with optional bike mount) and all the bells and whistles of data wanted during a run. So, if anyone happens to have an extra $450 lying around, you know where to find me.”
Geekiest gift she ever received? “Probably a Tanita scale that measures weight and body fat percentage. And no, I did not want this. No one wants to see a scale around the holidays!”
Meredith Courtemanche, senior site editor: “I would love a new iPod. My iPod Touch is over five years old now, and likes to repeat a song two, sometimes three times before moving on to the next one. If anyone wants to come over and trance out to Bing Crosby’s White Christmas on repeat, hit me up.”
Geekiest gift she’s ever gotten? “Does a VTech from childhood count? It looked like a laptop anyway and since this was before toys could go online, my data is safe in an attic somewhere.”
Stephen J. Bigelow, senior technology editor: “I could go for a nice low-profile bluetooth headset for the gym so that I can play music from my smartphone and still be able to work the machines without those silly, wired earbuds falling out or yanking my smartphone to the ground.”
What about you, IT reader? What do you want for the holidays?
The summer weather didn’t slow down anyone in the cool, dark halls of the data center. Catch up on the big news and expert advice from the past month that other data center pros found valuable.
The big picture:
These trends, shared at the Gartner IT operations conference, will shape the face of the data center sector for the coming years.
Top story on jobs:
The buzz at Red Hat Summit included talk of converged and hyper-converged infrastructures. Many attendees were keen to learn if and how these systems would change their daily work.
Opinions to stir up conversation:
Do you run a DC data center with high-voltage racks built into a glacier? No? Neither do most of your peers. But just because a concept missed mainstream adoption or faded from use does not mean that we can’t learn something from it for tomorrow’s data centers.
Most helpful tip:
Big data means data sprawl and more work for data centers. This tip outlines ways to corral enterprise data and store it without exhausting your hardware and staff resources.
In the news:
HP told attendees at its HP Discover conference that the impetus for its Grommet user interface came from a decision to look like one company across its various enterprise tools and applications.
The June issue of Modern Infrastructure
This e-zine covers everything from micro services to mega convergence in data center storage. Check out expert stories on bare metal, desktop security and big data as well.
The problem with getting older is that I sometimes find myself set in my ways — gravitating toward things that I knew (or was at least interested in). I confess that I sometimes feel a little overwhelmed by the many abstract concepts emerging across the industry like big data and the Internet of Things just to name a few. After all, I’m a hardware guy, and finding ways to monetize or justify business value in 26 billion connected devices or securely deliver streaming content to a multitude of remote device users is tougher to wrap my brain around than the newest Intel command set. There are moments when I’d rather just move to Nebraska and raise alpacas.
But watching this morning’s keynote address by Gartner’s Chris Howard on “Scenarios for the future of IT” at Gartner IT Operations Strategies & Solutions Summit in Orlando, Fla., reminded me of something that I’d long-forgotten: IT has never been about servers and networks and stacks and all of the engineering stuff; IT is about solving business problems and enabling the business.
Back in those ancient days before the Internet (yes, I was there), IT supported the business by storing and serving up files and even supporting the groundbreaking notion of collaboration. Later, networks and user bases expanded, and businesses needed IT to solve new problems, allowing businesses to support remote users and market the business differently on that thing called the World Wide Web.
As we fast-forward to today, Howard’s hour-long keynote focused on the challenges of the digital business. This included the importance of context, providing access to data that isn’t tied to devices, where devices have the intelligence to determine where you are and what you need. He also talked about the need for analytics that extend to the edge of the environment (not just in a data center) to decide what data is important and how it should be used.
And while Howard cited numerous examples of these issues — where many of the working elements are already in place — there was NO mention of the underlying systems, networks, software, or other elements needed to make all of these business activities possible. It was then that I realized there shouldn’t be.
It’s not that the underlying parts aren’t important. It’s just that the underlying parts aren’t the point. Thinking back, it really never mattered what server or disk group served up files back in the day. The only goal was that IT needed to deploy, configure and maintain that capability. While today’s business demands and pace have changed dramatically, the basic role of IT remains essentially unchanged: to enable, protect and support those competitive business capabilities in a reliable, cost-effective manner. The underlying “stuff” is there, and IT professionals have the savvy to make it all work.
So the real challenge for today’s IT pros is to embrace these many new ideas and find the way to map those complex business needs to the underlying infrastructure, which must inevitably evolve and grow to meet ever-greater bandwidth, storage, and computing demands.
Who knows what the next few days in Orlando might bring? Maybe this old dog might actually learn a new trick or two?
Google used machine learning to parse the multitudinous data inputs on its data center operations, as a way to bust through a plateau in energy efficiency evidenced by its measured power usage effectiveness (PUE).
In a white paper describing the effort to improve PUE below 1.12, Google’s Jim Gao, data center engineer, wrote that the machine learning approach does what humans cannot: Model all the possible operating configurations and predict the best one for energy use in a given setting.
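For readers who want the arithmetic behind that 1.12 figure, PUE is conventionally defined as total facility power divided by the power delivered to IT equipment, so a PUE of 1.12 means roughly 12% overhead for cooling, power distribution and the like. The sketch below uses made-up kilowatt figures purely for illustration; they are not Google's numbers, and the function names are hypothetical.

```python
def pue(total_facility_kw, it_load_kw):
    """Power usage effectiveness: total facility power over IT power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


def overhead_kw(pue_value, it_load_kw):
    """Power spent on everything other than IT gear at a given PUE."""
    return (pue_value - 1.0) * it_load_kw


# Illustrative figures only: a 10 MW IT load in a site drawing 11.2 MW.
example_pue = pue(total_facility_kw=11_200.0, it_load_kw=10_000.0)
example_overhead = overhead_kw(1.12, 10_000.0)
```

At that scale, each hundredth of a point shaved off PUE corresponds to about 100 kW of overhead in this example, which helps explain why a plateau near 1.12 is worth a modeling effort.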
The 19 factors that interrelate to affect energy usage are as follows, according to Google’s program:
- Total server IT load (kW)
- Total campus core network room IT load (kW)
- Total number of process water pumps (PWPs) running
- Mean PWP variable frequency drive (VFD) speed: Percent
- Total number of condenser water pumps (CWP) running
- Mean CWP VFD speed: Percent
- Total number of cooling towers running
- Mean cooling tower leaving water temperature set point
- Total number of chillers running
- Total number of dry coolers running
- Total number of chilled water injection pumps running
- Mean chilled water injection pump set point temperature
- Mean heat exchanger approach temperature
- Outside air wet bulb temperature
- Outside air dry bulb temperature
- Outside air enthalpy (kJ/kg)
- Outside air relative humidity: Percent
- Outdoor wind speed
- Outdoor wind direction
Gao states: “A typical largescale [data center] generates millions of data points across thousands of sensors every day, yet this data is rarely used for applications other than monitoring purposes.” Machine learning can understand nonlinear changes in efficiency better than traditional engineering formulas.
Read the paper here.