Posted by: Beth Pariseau
Data center disaster recovery planning
According to reports out of the U.K. yesterday, Barclays ATMs stopped working Tuesday because of a fault with one of the bank’s disk arrays.
The exact nature of the problem has not been specified, but Barclays is publicly known to be a customer for Hitachi Data Systems’ (HDS) USP-V. In February, HDS supplied a SAN subsystem based on its high-end USP-V hardware to bring capacity to 1 PB at a new 28,000-square-foot Gloucester data center, the same data center where the outage occurred.
Reached for comment, an HDS spokesperson wrote to Storage Soup in an email:
Not much to respond to as Barclays’ operations are now fully back online as of end of business day yesterday local time. Barclays and Hitachi Data Systems are investigating the cause of the problem. As a trusted storage partner to customers around the globe, it is our commitment to deliver on high standards of customer service and support excellence to Barclays and all of our customers worldwide.
U.K. storage consultant Chris M. Evans, who has worked with HDS products and customers, came to the vendor’s defense. He pointed the finger at the lack of redundancy in Barclays’ architecture.
What surprises me with this story is the time Barclays appeared to take to recover from the original incident. If a storage array is supporting a number of critical applications including online banking and ATMs, then surely a high degree of resilience has been built in that caters for more than just simple hardware failures? Surely the data and servers supporting ATMs and the web are replicated (in real time) with automated clustered failover or similar technology?
We shouldn’t be focusing here on the technology that failed. We should be focusing on the process, design and support of the environment that wasn’t able to manage the hardware failure and “re-route” around the problem.
One other thought. I wonder if this problem would have been avoided with a bit of Hitachi HAM?