There have been a couple of instances recently where government agencies have been careless with data, losing control of personally identifiable information (PII) such as Social Security numbers.
First, a NASA laptop that “contained records of sensitive personally identifiable information for a large number of NASA employees, contractors and others” was stolen from a vehicle, and while the laptop itself was password-protected, the data on it was not encrypted. In its memo about the incident, NASA didn’t say how many staffers might have been affected.
Second, the state of South Carolina’s Department of Revenue determined that hackers had broken into its database, putting the PII of up to 4 million people and 700,000 businesses at risk — again, because data had not been encrypted — in what is said to be the largest breach ever of a state agency. “Hackers also stole 3.3 million bank account numbers and the tax files of 700,000 businesses,” wrote Reuters. The Social Security numbers of 1.9 million children on parents’ returns were also compromised.
Are you detecting a Trend? Like, maybe, that encrypting PII is a Good Idea?
NASA, which had already lost another laptop in March to a similar theft, is actually in the process of implementing encryption on its systems — the stolen laptop just hadn’t gotten through the process yet. However, the agency expects all of its laptops to be encrypted by December 21, a spokeswoman told the New York Times. The agency didn’t say how much the breach would cost.
South Carolina’s encryption plans are less clear. Gov. Nikki Haley, who had reportedly claimed the breach wasn’t the state’s fault until an investigation by the security company Mandiant proved her wrong, has been blaming the problem on “antiquated state software and outdated IRS security guidelines” that don’t require encryption. And while the state has implemented some security measures, such as increased monitoring, reports haven’t yet said anything about South Carolina installing encryption, though the Republican governor did write the IRS a Strongly Worded Letter encouraging the federal agency to require states to encrypt.
“Had I known that IRS compliance meant that our Social Security numbers were not encrypted, I would have been shocked,” Haley was quoted as saying on local news.
Haley said the state also hadn’t encrypted the data because it was complicated. “But it’s highly unlikely that anyone on the security team at the Department of Revenue recommended storing millions of SSNs in plaintext because the alternative–deploying an encryption package–was too complicated,” wrote Dennis Fisher of Threatpost in a scathing rebuttal. “More likely, someone looked at his budget, looked at the price of the database encryption package, and made a hard choice. Lots of businesses, government agencies, non-profits and other organizations face the same choice every year and some of them decide that the cost of the encryption outweighs the potential benefit. And that can work out fine. That is, until something like the South Carolina data breach happens. Then things tend to be not fine.”
If the goal was to save money, they chose…poorly. “The cost of the state’s response has exceeded $14 million,” reported the Post. “That includes $12 million to the Experian credit-monitoring agency to cover taxpayers who sign up — half of which is due next month — and nearly $800,000 for the extra security measures ordered last week. The Revenue Department has estimated spending $500,000 for Mandiant, $100,000 for outside attorneys and $150,000 for a public relations firm. But those costs will depend on the total hours those firms eventually spend on the issue. The agency also expects to spend $740,000 to mail letters to an estimated 1.3 million out-of-state taxpayers.”
Plus, there’s the class action lawsuit, which could amount to $4 billion or more.
Meanwhile, other states such as Georgia and Alabama are hastening to point out that they don’t have any problems like this because they encrypt their data. However, most other states don’t, said Larry Ponemon, chairman of The Ponemon Institute, which researches privacy and data protection.
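As a rough check on the “exceeded $14 million” figure, the line items in the Post’s breakdown can simply be added up. A minimal Python sketch: the labels are mine, the amounts are the rounded figures quoted above (“nearly $800,000” is entered as $800,000), and several were estimates that depend on billable hours, so the total is approximate.

```python
# Line items from the cost breakdown quoted above, in dollars.
# "Nearly $800,000" is entered as 800,000, so the total is approximate.
costs = {
    "Experian credit monitoring": 12_000_000,
    "extra security measures": 800_000,
    "Mandiant (estimate)": 500_000,
    "outside attorneys (estimate)": 100_000,
    "public relations firm (estimate)": 150_000,
    "letters to out-of-state taxpayers": 740_000,
}

total = sum(costs.values())
# Cost per letter for the estimated 1.3 million out-of-state taxpayers.
per_letter = costs["letters to out-of-state taxpayers"] / 1_300_000

print(f"Total: ${total:,}")                          # Total: $14,290,000
print(f"Mailing cost per letter: ${per_letter:.2f}")  # Mailing cost per letter: $0.57
```

The tally lands just above $14 million, consistent with the Post’s figure, before counting any class action exposure.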
Acquisitions are hard.
HP announced this week that it was being forced to write off $8.8 billion of the $9.7 billion cost of its year-old acquisition of Autonomy.
Let the fingerpointing begin.
HP blamed the write-down on what it said were systematic accounting games that made the U.K. company look much more valuable than it was, games it said it learned of only when a senior Autonomy official pointed them out.
Autonomy grew through acquisitions, buying everything from storage businesses like Iron Mountain’s archiving and e-discovery unit to enterprise software firms like Interwoven. They’d then go to customers and offer them a deal they couldn’t refuse. Say a customer had $5 million and four years left on a data-storage contract, or “disk,” in the trade. Autonomy would offer them, say, the same amount of storage for $4 million but structure it as a $3 million purchase of IDOL software, paid for up front, and $1 million worth of disk. The software sales dropped to the bottom line and burnished Autonomy’s reputation for being a fast-growing, cutting-edge software company a la Oracle, while the revenue actually came from the low-margin, commodity storage business.
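Working the hypothetical deal through shows what the structure does to reported revenue: the customer saves $1 million, and three quarters of the deal is booked as software even though the customer was really buying storage. A Python sketch using only the illustrative numbers from the example, not actual Autonomy financials.

```python
# Illustrative numbers from the example above, in millions of dollars.
existing_contract = 5.0   # what the customer would otherwise pay for storage
offer = {"IDOL software (paid up front)": 3.0, "disk": 1.0}

deal_total = sum(offer.values())
customer_savings = existing_contract - deal_total
software_share = offer["IDOL software (paid up front)"] / deal_total

print(f"Deal: ${deal_total:.1f}M, customer saves ${customer_savings:.1f}M")
print(f"Booked as software revenue: {software_share:.0%}")  # 75%
```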
Mike Lynch, former CEO of Autonomy, who was reportedly fired by HP in April, strenuously denied the allegations, which he said he knew nothing about until the press release came out, and said that HP had mismanaged the company in the year it had run it.
“I think what has happened here is that they have got themselves in a mess. They did the acquisition of EDS, they had to write that one down. They had to write Palm down. When Autonomy was acquired it was done by a CEO who wanted to get rid of various divisions of that business and lead with software. He was ousted in an internal coup d’etat. From that point Autonomy was at odds with the divisions that were in power. There was a series of mismanagement steps. They lost hundreds of the talented people at Autonomy. The whole management team basically went out of the door. Sadly they are left with the results of having destroyed all that value.”
Analysts blamed HP for spending too much on the acquisition — which they’d said since the initial announcement — and for not doing due diligence.
“Even if Autonomy passed the auditors’ scrutiny the way it did, how can you pay $10 billion for a company that has $1 billion in revenue and growing at 10% a year only,” writes Jean-Baptiste Su in Forbes. “It didn’t make sense then, and now H-P is taking an $8.8 billion charge and blaming everyone else for its mistake.”
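Su’s round numbers and HP’s charge reduce to two ratios: a price of about ten times annual revenue, and a write-down of roughly 91% of what HP paid. A quick Python sketch using the figures in the text, in billions of dollars.

```python
# Figures from the text, in billions of dollars.
purchase_price = 9.7              # HP's stated cost of the acquisition
write_down = 8.8                  # the charge HP announced
su_price, su_revenue = 10.0, 1.0  # Su's rounded figures

multiple = su_price / su_revenue
written_off = write_down / purchase_price
not_written_off = purchase_price - write_down

print(f"Price-to-revenue multiple: {multiple:.0f}x")     # 10x
print(f"Share of price written off: {written_off:.1%}")  # 90.7%
print(f"Not written off: ${not_written_off:.1f}B")       # $0.9B
```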
(Oracle’s Larry Ellison caused a kerfuffle in September 2011 when he said that Autonomy had been shopped to Oracle as well, ironically on April 1 of that year. Lynch denied it until Ellison posted the PowerPoint presentation from the meeting. Ellison said the company was overpriced, but given the feud between Oracle and HP at the time, this likely sounded like sour grapes.)
HP’s Meg Whitman, who’d been CEO for only a few weeks when the acquisition was finalized but who had been on the board at the time, blamed former CEO Leo Apotheker, who was forced out after the acquisition when he also tried to kill HP’s PC division, and former chief strategy officer Shane Robison, who has also left the company. She also said that accounting firm Deloitte had vetted the company, and that KPMG had vetted Deloitte.
Apotheker declared he was shocked, shocked, and talked about the due diligence he had done.
Deloitte, for its part, had already come under criticism earlier this year for making mistakes.
There. Have we left anyone out?
HP would have much more credibility about its accusations if the company didn’t have such a poor track record with its other acquisitions. Wrote Su:
Unfortunately, Autonomy is just the latest example of the company’s dreadful acquisition track record and value destruction that started, in earnest, 10 years ago with Carly Fiorina’s decision to acquire Compaq for $25 billion, followed then by Mercury Interactive in 2006 ($4.5B), EDS in 2009 ($13.9B) and Palm in 2010 ($1.2B).
In what seems like a prescient story, the Wall Street Journal wrote earlier this month about HP’s troubles.
“Like most big tech companies, H-P has acquired new technologies. But many of its biggest purchases have fizzled, leaving the company with less cash and more debt than its rivals and effectively shutting it out of future deals,” the paper wrote. “Neither H-P’s buybacks nor acquisitions have panned out. H-P wrote off $8 billion of the $13 billion EDS deal earlier this year. In 2010, H-P also spent $1.2 billion to acquire mobile-device maker Palm, but shut down the unit and wrote it off a year later. Last year, H-P paid more than $10 billion to acquire software maker Autonomy, but has already said sales in that business are declining.”
One also wonders: if the tale about HP discovering the issue through an internal investigation prompted by the Autonomy executive’s tip in May is true, why did it take the company so long to notice? If it hadn’t been for the tip, when would the problem have come out?
And if “H-P’s internal team was aware of talk about accounting irregularities at the time the deal was struck…[and] was looking for a way to unwind the deal before it closed, but couldn’t find any material accounting issues,” as the Wall Street Journal writes now, wouldn’t the company have studied those irregularities right away? Were Lynch and the other executives supposed to be so diabolically clever that an army of accountants couldn’t find the fraud until the former Autonomy executive pointed it out for them? (U.K. laws reportedly make it difficult and expensive to renege on an acquisition.)
And finally, just what sort of due diligence did HP and Deloitte do? Did they actually research the numbers themselves, or just trust the numbers they got from Autonomy?
HP acquired Autonomy in August 2011, soon after Gartner released its first e-discovery Magic Quadrant in May 2011. At the time, Gartner spoke glowingly about the company, which it placed in the Leaders quadrant.
“Autonomy is a brand and marketing powerhouse that appears on many clients’ shortlists,” Gartner said in its earlier report. “Although we have seen little appetite for ‘full-service e-discovery platforms’ from clients as yet, Autonomy is positioned to seize these opportunities when they do arise — indeed, the overall market may evolve in that direction.”
In that report, Gartner also predicted that consolidation would eliminate one in four enterprise e-discovery vendors by 2014, with the acquirers likely to be mainstream companies such as Hewlett-Packard, Oracle, Microsoft, and storage vendors. Autonomy itself acquired Iron Mountain’s archiving, e-discovery and online backup business in May 2011 for US$380 million in cash.
At the time, HP was thought to be pursuing a similar strategy to that of IBM, which divested itself of its PC business and moved instead to primarily software and services.
Now, HP’s strategy, as well as that of Whitman — whom some are, fairly or unfairly, blaming for the whole situation — is going to be finding a way to survive.
When I first started this blog almost two years ago, I called it “yottabytes” because that was the term commonly accepted for the biggest size of storage (1000^8, or a 1 followed by 24 zeroes, compared with, say, 1000^4 for a terabyte). But as people actually start to refer to petabytes (1000^5) and exabytes (1000^6) of storage, there’s more discussion of what comes next.
No, I’m not changing the name of the blog.
The proximate cause for the discussion now is a presentation by Shantanu Gupta, director of Connected Intelligent Solutions for Intel’s Data Center and Connected Systems Group, which showed up in GigaOm the other day. According to this presentation, what comes after yottabyte is brontobyte, or a 1 followed by 27 zeroes (1000^9).
This is not definite; as GigaOm’s Stacey Higginbotham points out, it’s not an official prefix, though it has been discussed since at least 1991, when it meant a 1 followed by 15 zeroes. It does, however, appear to be more widely accepted for that number than hella-, which had a brief flurry a couple of years ago as people tried to promote it.
Past bronto, at a 1 followed by 30 zeroes (1000^10), it gets more complicated, partly due to what honestly looks like typographical errors.
Gupta refers to “geobyte” in his presentation, but he also refers to “bronobyte” as opposed to “brontobyte” for a 1 followed by 27 zeroes. Wikianswers also refers to “geobyte.”
Higginbotham refers to “gegobyte” for the figure, as does Seagate in a blog posting riffing on the GigaOm post.
Geo-, gego-, or geop-? It kind of doesn’t matter, because it’s all unofficial anyway, but somebody might want to figure it out at some point.
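Since every step on this ladder is a factor of 1,000, the naming debate can be summarized in a few lines. A Python sketch; “bronto” and “geop” are unofficial (and the spelling of the latter is unsettled, as noted), so treat those two entries as assumptions rather than standard prefixes.

```python
# Decimal storage prefixes, where each step multiplies by 1000.
# "bronto" and "geop" are NOT official SI prefixes; "geop" stands in for
# the disputed geo-/gego-/geop- spelling.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa",
            "zetta", "yotta", "bronto", "geop"]


def bytes_in(prefix: str) -> int:
    """Number of bytes in one <prefix>byte, counting in powers of 1000."""
    power = PREFIXES.index(prefix) + 1
    return 1000 ** power


for power, name in enumerate(PREFIXES, start=1):
    n = bytes_in(name)
    zeroes = len(str(n)) - 1  # n is a 1 followed by this many zeroes
    print(f"1 {name}byte = 1000^{power} bytes = a 1 followed by {zeroes} zeroes")
```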
Beyond what-do-we-call-it, we also have the obligatory how-to-put-it-in-terms-we-puny-humans-can-understand discussion, aka the Flurry of Analogies that came up when IBM announced a 120-petabyte hard drive a year ago. Depending on where you read about it, that drive was:
- 2.4 million Blu-ray discs
- 24 million HD movies
- 24 billion MP3s
- 1 trillion files
- Eight times as large as the biggest disk array available previously
- More than twice the entire written works of mankind from the beginning of recorded history in all languages
- 6,000 Libraries of Congress (a standard unit of data measure)
- Almost as much data as Google processes every week
- Or, four Facebooks
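Working backward from 120 petabytes shows the per-item size each analogy quietly assumes, which is a handy way to sanity-check such comparisons. A Python sketch using decimal units; the results (for example, 50 GB per Blu-ray disc, the capacity of a dual-layer disc) are derived from the figures in the list, not taken from the original articles.

```python
# Implied per-item size for each analogy, given 120 PB (decimal units).
TOTAL_BYTES = 120 * 10**15

analogies = {
    "Blu-ray disc": 2.4e6,
    "HD movie": 24e6,
    "MP3": 24e9,
    "file": 1e12,
    "Library of Congress": 6_000,
    "Facebook": 4,
}


def human(n: float) -> str:
    """Format a byte count with decimal (powers-of-1000) prefixes."""
    for unit in ["B", "KB", "MB", "GB", "TB", "PB", "EB"]:
        if n < 1000:
            return f"{n:g} {unit}"
        n /= 1000
    return f"{n:g} ZB"


for item, count in analogies.items():
    print(f"1 {item} ~ {human(TOTAL_BYTES / count)}")
```

Each analogy implies a plausible unit size (5 GB per HD movie, 5 MB per MP3, 20 TB per Library of Congress), so the list is at least internally consistent.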
So, how big are bronto- and geo/gego/geop-?
Well, GigaOm wrote, “Cisco estimates we’ll see 1.3 zettabytes of traffic annually over the internet in 2016.” On the other hand, GigaOm cited a piece with the Cisco estimate being 130 exabytes, which would be only 0.13 zettabytes if I have my math right. Seagate estimates that total storage capacity demand will reach 7 zettabytes in 2020.
Yottabytes are in the realm of CIA and NSA spy data, noted a piece on Examiner.com, which went on to point out, “As of 2011, no storage system has achieved one zettabyte of information. The combined space of all computer hard drives in the world does not amount to even one yottabyte, but was estimated at approximately 160 exabytes in 2006. As of 2009, the entire Internet was estimated to contain close to 500 exabytes.” A yottabyte would also be 250 trillion DVDs, GigaOm wrote.
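These unit conversions are easy to check with exact integer arithmetic (decimal units, so 1 ZB = 1,000 EB). A Python sketch; it confirms that 130 EB is 0.13 ZB, and that “250 trillion DVDs” per yottabyte implies 4 GB per DVD, in the same ballpark as a 4.7 GB single-layer disc.

```python
from fractions import Fraction

# Decimal units, in bytes.
EB, ZB, YB = 10**18, 10**21, 10**24

# Cisco's 130 exabyte figure, expressed in zettabytes.
cisco_zb = Fraction(130 * EB, ZB)
print(cisco_zb, "=", float(cisco_zb), "ZB")   # 13/100 = 0.13 ZB

# The 2006 estimate of all hard drives (160 EB) as a fraction of a yottabyte.
drives_2006_yb = Fraction(160 * EB, YB)
print(float(drives_2006_yb), "YB")            # 0.00016 YB

# "250 trillion DVDs" per yottabyte implies this many bytes per DVD.
bytes_per_dvd = YB // (250 * 10**12)
print(bytes_per_dvd / 10**9, "GB per DVD")    # 4.0 GB per DVD
```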
For brontobyte, which Gupta said would be used primarily for the “Internet of Things” ubiquitous sensors, there are also somewhat fanciful definitions such as “More than the number of all the cells of the human body in each person living in Indiana and then some,” and “You would need a brontobyte computer to download everything on the Internet” (though, apparently not, according to Examiner.com).
Once we start talking in terms of trillions of DVDs, we obviously need another unit of measure. Interestingly, Seagate used geographic area.
“If today’s 4 terabyte 3.5-inch drive is roughly .16 square feet, you can get approximately 24 terabytes per square foot. That’s .0046 square miles of land mass per 4 terabytes. Assuming 1 terabyte per disk was the maximum areal density, and hard drives will not get any thicker than 1 inch:
- An exabyte hard drive would be about the size of Connecticut [or, I would add, Owyhee County in Idaho]
- A zettabyte hard drive would be about the size of Antarctica
- A yottabyte hard drive would cover the earth 23 times
- A brontobyte hard drive would cover the earth 23,000 times
- A gegobyte hard drive would cover the earth 23,000,000 times”
Beyond that? Wikianswers postulated Saganbyte, Jotabyte, and Gatobyte, while Wikipedia referred to a system working backward through the Greek alphabet — though that one wouldn’t include brontobyte or geo/gego/geopbyte.
When one thinks of E-discovery pioneers, one doesn’t tend to think of Hooters. But a recent legal case with the, ahem, female-oriented restaurant has ramifications for the E-discovery industry, specifically in the area of predictive coding.
Predictive coding, or predictive technology, is the use of computer-assisted review software to help determine which documents are relevant to a legal case. It wasn’t until March that a judge first ruled that predictive coding could be used in a case.
Now, we’ve gone one step further, with a judge actually *requiring* two parties (one of which is Hooters) to use predictive coding in their case. Well, at least Strongly Suggesting it.
“Vice Chancellor J. Travis Laster in Delaware Chancery Court has made e-discovery history, again, with a surprise bench order requiring both sides to use predictive coding and to use the same vendor,” writes Ralph Losey for the E-Discovery Team blog. “This appears to be the first time a judge has required both sides of a dispute to use predictive coding when neither has asked for it. It may also be the first time a judge has ordered parties to use the same vendor.” If the parties could not agree on a vendor, the judge continued, he would select one.
(What’s the case about? It’s kind of hard to tell. “A complex multimillion dollar commercial indemnity dispute involving the sale of Hooters, a very well-known restaurant, famous for its chicken and wings, beer, and other things,” Losey writes.)
The goal appeared to be to use the case as an example of how to save money. “The problem is that these types of indemnification claims can generate a huge amount of documents,” the judge said in his decision. “That’s why I would really encourage you all, instead of burning lots of hours with people reviewing, it seems to me this is the type of non-expedited case where we could all benefit from some new technology use.”
“Predictive coding used correctly, promises to reduce costs and turn over at least as much responsive ESI and less unresponsive ESI than our current eyeballs-on-every-document approach,” agrees Karl Schieneman in E-Discovery Journal.
At this point, the two sides need to determine whether they will go along with the judge’s recommendation, or fight it using the Sedona Principles that, basically, leave it up to the participants to determine the best way to proceed. But that might be fraught. “Who wants to tell the judge to butt-out, no matter how politely you say it, or how many Sedona Principles you cite?” Losey writes. “Better to let the other side be the complainer, even if you do not much like it either. Much will depend on who has the heaviest production burden.”
The two sides might also object to the suggestion that they use the same vendor, and the judge’s motivation for suggesting that isn’t clear, Losey continues. Schieneman agrees, noting that it might be difficult to find a neutral vendor and it also isn’t clear how the software will be paid for. “There is an argument to be made that this is a well-intentioned, but possibly uneducated bench that is forcing parties to use and pay for an undefined, black box marketing label.”
Schieneman went on to write that he hopes judges don’t get too enthusiastic about predictive coding and start requiring it willy-nilly. “It could be an unintended disaster if every judge ordered the use of predictive coding,” he says, both because of the learning curve for law firms and because predictive coding vendors might not be able to adequately support all these new potential customers. “While judicial encouragement of predictive coding is great and absolutely necessary, blind encouragement could be dangerous,” he adds.
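Vendors’ predictive coding engines are proprietary and considerably more elaborate (they typically iterate with human reviewers over several rounds), but the core idea, learning from a reviewer-coded seed set and ranking the rest of the collection by likely relevance, can be sketched in a few lines. A toy Python example using a multinomial naive Bayes classifier, which is my choice of technique for illustration rather than what any particular vendor or the court specified; the documents are made up.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(seed: list[tuple[str, bool]]):
    """Count words in reviewer-coded relevant (True) and non-relevant (False) documents."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, relevant in seed:
        counts[relevant].update(tokens(text))
        docs[relevant] += 1
    return counts, docs


def score(text: str, counts, docs) -> float:
    """Log-odds of relevance under multinomial naive Bayes with add-one smoothing.

    Words never seen in the seed set are ignored. Assumes the seed set
    contains at least one document of each class.
    """
    vocab = set(counts[True]) | set(counts[False])
    totals = {label: sum(counts[label].values()) for label in (True, False)}
    log_odds = math.log(docs[True] / docs[False])
    for word in tokens(text):
        if word not in vocab:
            continue
        p_rel = (counts[True][word] + 1) / (totals[True] + len(vocab))
        p_non = (counts[False][word] + 1) / (totals[False] + len(vocab))
        log_odds += math.log(p_rel / p_non)
    return log_odds


# Hypothetical seed set coded by a human reviewer.
seed = [
    ("indemnification claim for breach of purchase agreement", True),
    ("escrow release and indemnity payment under the sale agreement", True),
    ("lunch menu for the office party", False),
    ("parking reminder for visitors", False),
]
counts, docs = train(seed)

unreviewed = ["party menu update", "draft indemnity notice under purchase agreement"]
ranked = sorted(unreviewed, key=lambda d: score(d, counts, docs), reverse=True)
print(ranked)  # the indemnity notice ranks first
```

In practice, reviewers would check the top-ranked documents, feed their calls back into the seed set, and repeat; that feedback loop is where the claimed savings over “eyeballs-on-every-document” review come from.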
Though there have been a number of data center outages associated with the Sandy megastorm, and it’s not over yet, what may be most surprising is how little disruption it actually caused — particularly in comparison to the outages caused by June’s thunderstorm.
While several data centers were knocked offline due to flooding (most notably Datagram, which hosts Gawker, Gizmodo, Buzzfeed, The Huffington Post, and Media), many stayed online, often by running generators on diesel fuel. (The New York Times, which criticized data centers just last month for their use of diesel backup generators, was strangely silent on the subject this week.)
The problem then became getting fuel delivered, since data centers typically keep only three days’ worth of generator fuel on-site. That window did, however, give customers of those data centers time to find alternatives.
Though data centers went through extensive preparation, the ones that were knocked offline typically had either the data center, or the fuel systems, or both, in the basement, which flooded. Some sites went offline after they weren’t able to get fuel delivered to the island of Manhattan.
“The situation shows that in many ways, Lower Manhattan is one terrible place to put a data center,” noted Cade Metz in Wired. On the other hand, he said, data centers need to be near where the business action is to provide low-latency data transmission.
In one case, at the building housing Fog Creek Software and Squarespace, which had its fuel pumps in the flooded basement and its generator on the 17th floor, employees formed a bucket brigade to carry fuel up the stairs to the generator.
Other customers were migrated to cloud services such as Amazon Web Services — ironically, since it has suffered a number of outages over the past few months.
The Internet proved relatively resilient not only to the hurricane itself but also to the increased load from all the East Coast residents who stayed home, watched Netflix, and chatted with their loved ones over Skype.
A few lessons:
- Don’t put data centers, or their diesel backup, in the basement. On the other hand, it’s not like you want it up over your head, either — especially if you end up needing to do a bucket brigade to the roof.
- Have data centers, or backup data centers, located in separate geographic regions.
- Plan, plan, plan. And don’t wait until an actual emergency to test the plan. “You can’t wait ’til folks’ hair is on fire to plan these things,” Shannon Snowden, a data center professional who is now senior technical marketing architect for Zerto, a company with technology that helps companies move and fail over applications, told GigaOm. “What you should be doing from the data center perspective is [always] make sure the power has been tested, that you can fail over to generators, that those generators are tested to make sure they’re functional and that they have enough fuel,” he said.
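Snowden’s checklist translates naturally into a routine automated check. A minimal Python sketch; the `Generator` record, the 72-hour default (echoing the three days of on-site fuel mentioned earlier) and the 30-day test interval are hypothetical choices for illustration, not an industry standard.

```python
from dataclasses import dataclass


@dataclass
class Generator:
    """Hypothetical record for one backup generator."""
    name: str
    fuel_on_hand_gal: float       # fuel currently in the tank
    burn_rate_gal_per_hr: float   # consumption at the expected load
    days_since_load_test: int     # days since the last test under load


def check(gen: Generator, required_hours: float = 72, max_test_age_days: int = 30) -> list[str]:
    """Return a list of problems; an empty list means the generator passes."""
    problems = []
    runtime_hr = gen.fuel_on_hand_gal / gen.burn_rate_gal_per_hr
    if runtime_hr < required_hours:
        problems.append(f"{gen.name}: only {runtime_hr:.0f} h of fuel, need {required_hours:.0f} h")
    if gen.days_since_load_test > max_test_age_days:
        problems.append(f"{gen.name}: last load test was {gen.days_since_load_test} days ago")
    return problems


# Hypothetical fleet: gen-a passes, gen-b is short on fuel and overdue for a test.
fleet = [
    Generator("gen-a", fuel_on_hand_gal=4000, burn_rate_gal_per_hr=50, days_since_load_test=14),
    Generator("gen-b", fuel_on_hand_gal=2500, burn_rate_gal_per_hr=50, days_since_load_test=45),
]
for gen in fleet:
    for problem in check(gen):
        print(problem)
```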
Finally, the vendor I lambasted in April for exploiting natural disasters to promote their product was at it again — press release issued right on schedule on Monday afternoon.
It’s been a busy week in the storage market. The biggest news is Microsoft’s acquisition of cloud storage vendor StorSimple, but in addition, Carbonite, a provider of online backup solutions, acquired open source and SMB cloud backup vendor Zmanda, while Persistent Systems acquired Doyenz, which sells disaster recovery as a service.
The nice thing about three of them happening at once is that this makes it a Trend, so instead of addressing each acquisition individually, we can talk about What It All Means.
From the startup side, there are really only three exit strategies. You can die. You can file for an IPO (as Violin also did last week). Or you can get acquired. If either the company or the market isn’t strong enough, an IPO isn’t necessarily a good idea. So that leaves acquisitions. (We’ll assume no startup plans to die.)
Being acquired doesn’t mean giving up or throwing in the towel. Particularly in the case of the company doing the acquiring, it can be a good idea. It’s a quick way to collect a bunch of new people, a new technology, and perhaps some new customers. “The vast majority (over 90 percent) of the successful private company exits in 2011 and 2012 have been through company sale or M&A,” writes Jim Price in Business Insider.
The next thing to look at is who’s doing the acquiring. Is it two small companies hoping that together they’ll be strong enough to survive? I don’t want to pick on Carbonite, but given the sort of year they’ve had, that might be a factor. Or is it a big company looking to add an innovative new technology to its portfolio? Certainly in the case of Microsoft and StorSimple, it’s the latter.
As far as what’s next, keep in mind that acquisitions tend to run in clumps. A new technology comes along, a bunch of little companies start up to use it, and then some of them die, some of them merge with each other, and some of them get acquired by larger companies — typically with the strongest players going first and the later ones being picked up by latecomers in the market who are desperate to own a piece of it, in sort of a high-tech version of Musical Chairs. For a big company, it can be a much safer way to innovate than trying to develop a new technology yourself.
We saw something similar a year ago, when Gartner did its first Magic Quadrant on E-Discovery, and predicted that 25% of the companies in it would be acquired by 2014 by major vendors. As it happened, Gartner didn’t even get the report published before the first acquisition happened, and they’ve been falling steadily like little dominoes ever since, especially after Gartner conveniently provided a shopping list.
“Probably the prime imperative for Fortune 500 managers is to find areas for revenue and profit growth,” writes Price. “But the challenge is to do so without endangering the existing franchise. Too often, the dilemma from the helm looks like this: You know you need to get into a promising new space, but it’s quite unproven and you suspect running two or three concurrent experiments might bleed cash for years. So in a real sense, you can’t afford – on a quarter-to-quarter income statement basis – to run too many such risky projects. But if you let entrepreneurial startups run the experiments with their energy, time and capital – and let them ring out the technology risk and the market risk – then once a winner appears, you can buy that winner with capital off your balance sheet.”
Certainly Microsoft and StorSimple would qualify.
Since that’s the case, StorSimple competitors like Nasuni and Panzura, which were speaking with a great deal of bravado about the 800-pound gorilla suddenly in their midst, should probably expect calls from other large vendors in the next few weeks, and will need to decide which startup exit strategy they plan to follow.
All of a sudden, backing up Oracle databases is big news, with Wikibon and Amazon Web Services each releasing new insights about how to do it.
What makes this a big deal? As Wikibon mentions, nearly 30% of Oracle shops are managing more than 100 TB of data that needs to be backed up. And with ‘big data’ becoming a buzzword, not only is the data getting bigger, but people are paying more attention to it.
Wikibon points out several trends, including increasing virtualization, more space devoted to backups, and the persistence of tape: 45% of customers report that more than half of their backup data resides on tape, Wikibon says.
But one of the backup choices that Wikibon mentions is RMAN, Oracle’s Recovery Manager. Its advantage is highlighted by another big recent development in Oracle backup: RMAN’s newer ability to back up to the cloud.
That’s where the Amazon Web Services white paper comes in. It describes how Amazon itself started backing up all its Oracle databases to the cloud using RMAN. While such white papers are often pretty self-serving — and now we’re talking about one where a vendor is using its own product, or what EMC’s Paul Maritz refers to as “eating your own dog food” — this one has some hard numbers behind it.
“The transition to S3-based backup started last year and by summer, 30 percent of backups were on S3; three months later it was 50 percent. The company expects the transition to be done by year’s end — except for databases in regions where Amazon S3 is not available,” writes Barb Darrow for GigaOm. Moreover, the company is saving $1 million per year for backups that take only half as long, she writes.
Whether you want to go the AWS route for Oracle backups or not, the Wikibon report has some interesting information on the backup subject. Granted, some of them are pretty Mom-and-apple pie — implement redundancy, test your backups, use dedupe — but others are more nuanced.
For example, Wikibon notes, organizations are increasingly virtualizing their Oracle servers, which could affect how quickly those servers can be backed up. “The big initial attraction of server virtualization is that it increased average utilization from 15% to about 85%,” Wikibon writes. “This means that virtualized environments will see a drastic reduction in overall server capacity, some of which was used to run backups.”
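Wikibon’s utilization figures can be turned into rough arithmetic: the same total load on fewer, busier machines leaves far less idle capacity for backups to soak up. A Python sketch assuming load can be spread evenly across servers (a simplification) and starting from a hypothetical 100 servers.

```python
def servers_after_consolidation(servers: int, util_before_pct: int, util_after_pct: int) -> int:
    """Servers needed to carry the same total load at a higher average utilization."""
    busy = servers * util_before_pct      # total load, in "percent-servers"
    return -(-busy // util_after_pct)     # ceiling division


def idle_capacity(servers: int, util_pct: int) -> float:
    """Server-equivalents of unused headroom (e.g., available for running backups)."""
    return servers * (100 - util_pct) / 100


before = 100                                        # hypothetical starting fleet
after = servers_after_consolidation(before, 15, 85)
print(after)                        # 18
print(idle_capacity(before, 15))    # 85.0
print(idle_capacity(after, 85))     # 2.7
```

Under these assumptions, idle headroom drops from about 85 server-equivalents to under 3, which is why backup windows that once ran in spare cycles start competing with production work.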
It was just a year ago that the Thailand flooding, only a few months after the Japan earthquake, devastated the storage industry, causing a temporary shortage of disk drives and an increase in prices. But now that it’s all over, a funny story is coming out of BackBlaze, which found itself literally thinking outside the box.
The company, which is known for providing low-cost constant backups for its subscribers, is also known for building its cloud out of a whole lot of teeny (well, 3 TB) commodity disk drives rather than a few great big ones. This saves money and helps the company grow more granularly.
The only problem comes if you suddenly run out of teeny commodity disk drives, or find that in a matter of two weeks they’ve tripled in price, as BackBlaze did at a time when it was adding 50 TB of capacity a day. At the same time, the company wasn’t buying enough to get volume deals from the manufacturers.
In an extremely detailed, hysterically funny blog post, the company is now relating how it dealt with the crisis — basically, by buying them as consumer commodities rather than as parts, and turning them into the parts they needed to build the “storage pods” on which their service was based.
“With our normal channels charging usury prices for the hard drives core to our business, we needed a miracle,” writes Andrew Klein, director of product marketing. “We got two: Costco and Best Buy. On Brian [Wilson, CTO]’s whiteboard he listed every Costco and Best Buy in the San Francisco Bay Area and then some. We would go to each location and buy as many 3 TB drives as possible.”
The company then had to “shuck” the drives from their external cases, but this still saved it $100 per drive over buying from its usual suppliers. Problem solved.
For a while.
“The ‘Two Drive Limit’ signs started appearing in retail stores in mid-November,” Klein writes. “At first we didn’t believe them, but we quickly learned otherwise.” So workers started making the circuit, circling the San Francisco Bay and hitting local Costco and Best Buy stores: 10 stores, 46 disk drives, 212 miles. It put a lot of miles on the cars and took a lot of time, but it solved that problem.
For a while.
Then BackBlaze employees started getting banned from stores.
At that point, they started hitting up friends and family, and not just in the Bay Area, but nationwide. “It was cheaper to buy external drives at a store in Iowa and have Yev’s dad, Boris, ship them to California than it was to buy internal drives through our normal channels,” Klein writes.
(The company also apparently considered renting a moving van to drive across the country, hitting stores along the way — a variation on the “bandwidth of a station wagon of tapes” problem — but decided it wouldn’t be economical.)
By the time internal drive prices got back to their normal level, the company had bought 5.5 petabytes of storage through retail channels, or more than 1,800 disk drives. But finally, it could go back to its normal practices.
“On July 25th of this year, Backblaze took $5M in venture funding,” Klein writes. “At the same time, Costco was offering 3TB external drives for $129, about $30 less than we could get for internal drives. The limit was five drives per person. Needless to say, it was a deal we couldn’t refuse.”
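The numbers in Klein’s account hang together. A quick Python check; it treats BackBlaze’s 50 TB a day as raw drive capacity (an assumption, since the post doesn’t say how redundancy was counted) and uses decimal units.

```python
import math

DRIVE_TB = 3               # size of the drives BackBlaze was buying
DAILY_GROWTH_TB = 50       # capacity added per day, treated as raw capacity
RETAIL_TOTAL_TB = 5_500    # 5.5 PB bought at retail, in decimal TB

drives_per_day = math.ceil(DAILY_GROWTH_TB / DRIVE_TB)
retail_drives = RETAIL_TOTAL_TB / DRIVE_TB
drives_per_store = 46 / 10  # the 10-store, 46-drive circuit around the Bay

print(drives_per_day)        # 17
print(round(retail_drives))  # 1833
print(drives_per_store)      # 4.6
```

Roughly 17 drives a day, and about 1,833 drives in total, which squares with “more than 1,800.”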
Disclosure: I am a Backblaze customer.
First it was HGST with helium. Now it’s Hitachi itself with glass. The company has announced a technology that enables it to store data for what it says is forever.
The technology uses a 2cm-square piece of glass, 2mm thick, that is etched in binary with a laser. It has four layers, which yields a density of 40MB per square inch. “That’s better than a CD (which tops out at 35MB per square inch), but not nearly as good as a standard hard disk, which can encode a terabyte in the same space,” writes Sam Grobart in Bloomberg. The company said it could also add more layers for more density.
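To put those densities in concrete terms, here is a back-of-the-envelope Python sketch using only the quoted figures. It assumes the 40MB-per-square-inch figure already counts all four layers and applies across the whole 2cm piece, and that MB and TB are decimal units; Hitachi may have measured differently.

```python
# Back-of-the-envelope numbers for Hitachi's glass, using only the figures
# quoted above. Assumption: "40MB per square inch" already counts all four
# layers and applies across the whole 2cm x 2cm piece; MB and TB are decimal.

CM_PER_INCH = 2.54

def area_sq_in(side_cm: float) -> float:
    """Area of a square with the given side length, in square inches."""
    return (side_cm / CM_PER_INCH) ** 2

GLASS_MB_PER_SQ_IN = 40
HDD_MB_PER_SQ_IN = 1_000_000  # "a terabyte in the same space" = 10**6 MB

piece_mb = area_sq_in(2.0) * GLASS_MB_PER_SQ_IN
print(f"one 2cm piece holds about {piece_mb:.1f} MB")
print(f"hard disk vs. glass density: {HDD_MB_PER_SQ_IN // GLASS_MB_PER_SQ_IN:,}x")
```

Under those assumptions, one piece holds about 24.8 MB, and the hard disk is roughly 25,000 times denser.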
Of course, the selling point is not density but longevity: the glass will supposedly last forever, without the bit rot that degrades magnetic storage. That bit rot is leading some to fear a “digital dark age,” in which we lose access to large swaths of our history and culture because they are stored on magnetic media.
The technology was developed in 2009 and may be made available as a product by 2015, Hitachi said, according to Broadcast Engineering.
There’s more to the digital dark ages than simply preserving the media, however — there’s also the factor of having the hardware and software that enables people to read the data. Anyone who’s found a perfectly pristine 78-rpm record in their grandparents’ attic is familiar with that problem.
Hitachi says that won’t be a problem because all computers ultimately store data in binary, and the glass could be read using a microscope. But the company didn’t say how the data would be encoded: how the binary would be translated back into music, movies or whatever else. A microscope could read the bits, but how would it know what they meant?
According to Broadcast Engineering, the service may work by having organizations with a great deal of data to preserve, such as governments, museums and religious organizations, send their data to Hitachi for encoding.
The quartz glass is said to be impervious to heat (the demonstration included baking it at 1,000 degrees Celsius for two hours to simulate aging), as well as to water, radiation, radio waves and most chemicals, which is why many laboratory containers are made of glass.
On the other hand, the glass is vulnerable to breakage. Anyone who has used a microscope has probably turned the focus knob too far at some point; imagine doing that while reading the data and watching in horror as centuries-old data gets crunched.
Virtualization. In describing how underutilized data center servers are, and in apparently limiting himself to less-than-state-of-the-art facilities, Glanz failed to notice how prevalent virtualization is becoming. Virtualization enables an organization to set up numerous “virtual servers” inside a single physical server, which results in much higher utilization. “[V]irtualized systems can be easily run at greater than 50% utilization rates, and cloud systems at greater than 70%,” writes Clive Longbottom in SearchDataCenter.
“[I]n many cases the physical ‘server’ doesn’t even exist since everyone doing web at scale makes extensive use of virtualization, either by virtualizing at the OS level and running multiple virtual machines (in which case, yes, perhaps that one machine is bigger than a desktop, but it runs several actual server processes in it) or distributing the processing and storage at a more fine-grained level,” writes Diego Doval in his critique of the New York Times piece. “There’s no longer a 1-1 correlation between ‘server’ and ‘machine,’ and, increasingly, ‘servers’ are being replaced by services.”
“Although the article mentions virtualization and the cloud as possible solutions to improve power utilization, VMware is not mentioned,” agrees Dan Woods in Forbes’ critique of the piece. “If the reporter talked to VMware or visited their web site, he would have found massive amounts of material that documents how thousands of data centers are using virtualization to increase server utilization.”
Storage. Similarly, Glanz appeared unaware of advances in storage technology, even though some of them are taking place in the very data centers he lambasted in his articles. In Prineville, Ore., for example, not far from the Quincy, Wash., data centers he criticized, Facebook is designing its own storage hardware to eliminate unnecessary parts, and is setting up low-cost, slow-access storage whose disks are spun down most of the time.
Facebook, which does this research precisely because of the economies of scale in its massive data centers, is making similar advances in servers. Moreover, through its Open Compute Project, the company is releasing these designs to the computer industry at large so that others can take advantage of them, too.
In addition, Glanz focused on the “spinning disks” of storage systems, apparently not realizing that organizations such as eBay are increasingly moving to solid-state “flash” storage, which uses much less power.
Storage also isn’t as big a deal as it used to be, or as the story makes out. “A Mr Burton from EMC lets slip that the NYSE ‘produces up to 2,000 gigabytes of data per day that must be stored for years’,” writes Ian Bitterlin in Data Center Dynamics’ critique of the New York Times piece. “A big deal? No, not really, since a 2TB (2,000 gigabytes) hard-drive costs $200 – less than a Wall Street trader spends on lunch!”
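Bitterlin’s arithmetic is easy to extend. A minimal Python sketch, counting only raw drive purchases at the prices he quotes (an assumption that ignores redundant copies, power and the rest of the storage system), shows what a day and a year of NYSE data would cost in drives:

```python
# Rough check of Bitterlin's comparison, using only the figures he quotes:
# 2,000 GB of new NYSE data per day and $200 per 2TB (2,000 GB) hard drive.
# Counts raw drive purchases only; redundancy, power, racks and staff are ignored.

GB_PER_DAY = 2_000
DRIVE_GB = 2_000
DRIVE_PRICE_USD = 200

def drive_cost(days: int) -> tuple[int, int]:
    """Return (drives, dollars) needed to store `days` of data."""
    total_gb = GB_PER_DAY * days
    drives = -(-total_gb // DRIVE_GB)  # round up to whole drives
    return drives, drives * DRIVE_PRICE_USD

print(drive_cost(1))    # one day
print(drive_cost(365))  # one year
```

A day’s data fits on one $200 drive, and a full year needs about 365 drives, or roughly $73,000 in raw disks.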
Disaster recovery. Glanz also criticized data centers for their redundancy, particularly their on-site diesel generators for handling power failures, apparently not realizing that such redundancy is what keeps data centers running.
And yet, even with all this redundancy, there have been a number of well-publicized data center failures in recent months caused by events as mundane as a thunderstorm. Such outages can cost a single company up to $200,000 per hour, and a data center such as Amazon’s can serve many companies at once. If anything, one might argue that the costs of downtime call for more redundancy, not less.
Of course it’s important to ensure that data centers are making efficient use of power, but it’s also important to understand the context.