EMC is sort of the IBM of the storage industry — big, not necessarily terribly exciting or innovative, but continuing to be a major player simply because it's big. Just like IBM can suddenly make a particular technology front-page news by throwing a billion dollars at it, as it did with flash a few weeks ago, EMC can make a big deal about storage virtualization, software-defined storage, mobile, cloud, and so on simply by virtue of being EMC, even though other storage vendors have been doing those things for years.
There are other places to read about the specific announcements, so I won't go into them, other than to observe that EMC is saying you will be able to use them to have your own Facebook-like data center. Except the whole point of Facebook's data center storage is that it uses commodity hardware, and if you're using commodity hardware, then what do you need EMC for anyway? I know, I know, it's a metaphor, never mind.
Befitting the conference’s theme of “transformation,” EMC seemed to be spending an awful lot of time explaining the various reorganizations it’s had over the past few years, starting when CEO Joe Tucci decided he was going to retire, then changed his mind, followed by a lot of musical chairs between EMC and VMware, and culminating in the recent announcement of Pivotal, which rearranges yet more pieces of EMC and VMware.
At the same time, the company also spent a lot of time talking about the “third platform” — a conglomeration of mobile, big data, cloud, and so on, after the first platform of mainframes and the second platform of client/server. After all, if EMC can make mobile and the cloud sound like just another generational version of mainframes, it sounds more like they’ll continue to be the logical alternative, right?
And of course EMC is going to do all it can to promote big data. Like Cowboy Curtis, who knows that “big feet” means “big boots,” EMC knows that big data means big hardware to put it on, and nobody does it bigger than EMC.
Ironically, this was all happening against a backdrop of EMC announcing it was laying off more than 1,000 people, with VMware laying off another 800. The company said it was always doing this and that by the end of the year it would actually have more people than it started with. Okay. But seriously? After all the investment in hiring and training those people, the company sees no other way but to do a forklift upgrade of its employees?
On second thought, for EMC, maybe that isn’t so surprising after all.
In any event, EMC has to at least go through the motions of being up on what users are interested in, lest it sound too much like another Bruno Mars number, “The Lazy Song” [mildly NSFW]:
Today I don’t feel like doing anything
I just wanna lay in my bed
Don’t feel like picking up my phone, so leave a message at the tone
‘Cause today I swear I’m not doing anything
I’ll be lounging on the couch just chilling in my Snuggie
Click to MTV so they can teach me how to dougie
‘Cause in my castle I’m the freaking man
As an example, analyst firm IDC included the Austin, Texas-based StoredIQ in its IDC MarketScape: Worldwide Standalone Early Case Assessment Applications 2011 Vendor Analysis, but Gartner hasn’t included it in either of its e-discovery Magic Quadrants — from which a number of larger vendors have plucked other acquisitions. (However, Gartner did name StoredIQ as a “Cool Vendor” in April of this year.)
Instead, IBM is working on creating a family of “information lifecycle management” applications, which are kinda both — big data, because they cover all an organization’s data, but also e-discovery, because part of the reason for having such applications is litigation support, both for identifying data needed in legal situations and for helping reduce, in a legally justifiable way, the amount of such data in the first place.
StoredIQ’s advantage is that it manages the data in situ rather than moving it to a secondary location, which saves the cost of the secondary storage, noted Zacks Equity Research, adding that the company had received $11.4 million in funding in August and had 120 clients — though it warned that IBM faces competition from vendors such as EMC, Oracle, and SAP.
The company has also been working to make its product, which includes software and an appliance, easy enough for even legal professionals to use, rather than requiring IT people to operate it. In addition, it has partnered with a wide variety of other vendors over the years, including NetApp, EMC, and NewsGator, and has supported a number of platforms and formats, including SharePoint and Office 365.
As big data has become more prevalent, companies are interested in saving their data in hopes of being able to analyze it at some point and improve their businesses. But such data hoarding, as Law Technology News calls it, is a problem for two reasons.
First, there’s the cost. Though the price of storage itself has been dropping, it still costs something, plus there’s the cost of managing it, backing it up, and so on — which could amount to $5,000 per terabyte, Law Technology News said.
Second, there’s the legal cost. Should an organization be sued, it not only needs to provide all the pertinent information that the other side asks for, but it has to find it in the first place — and the more data a company has, the more expensive that search is. Also, companies have to balance the value of the data for analysis with what it might cost them should it reveal something in a lawsuit. This cost is on the order of $15,000 per gigabyte, Law Technology News said.
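To put those two numbers side by side, here's a quick back-of-the-envelope sketch in Python, using the Law Technology News figures and a hypothetical 10 TB of "just in case" data (your own costs will obviously vary):

```python
# A rough comparison of keeping data versus having to review it in litigation,
# using the approximate figures cited by Law Technology News.
STORAGE_COST_PER_TB = 5_000    # storing, managing, and backing up, per terabyte per year
REVIEW_COST_PER_GB = 15_000    # e-discovery review cost, per gigabyte

def annual_storage_cost(terabytes: float) -> float:
    """Rough yearly cost of simply keeping the data around."""
    return terabytes * STORAGE_COST_PER_TB

def review_cost(terabytes: float) -> float:
    """Rough cost if that same data gets swept into a lawsuit."""
    return terabytes * 1_000 * REVIEW_COST_PER_GB   # 1 TB = 1,000 GB

tb = 10  # a hypothetical 10 TB of "just in case" data
print(f"Keeping {tb} TB:   ${annual_storage_cost(tb):,.0f} per year")
print(f"Reviewing {tb} TB: ${review_cost(tb):,.0f} if it ends up in discovery")
# Keeping 10 TB:   $50,000 per year
# Reviewing 10 TB: $150,000,000 if it ends up in discovery
```

Even if the per-gigabyte review figure is off by an order of magnitude, the legal exposure still dwarfs the cost of the storage itself.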
In fact, legal organizations have been advising companies to look for opportunities to delete data, pointing out how much money they can save. However, they have to do this as a regular, routine practice, because once a lawsuit is filed, a “legal hold” is placed on the data and it can’t be deleted; otherwise, a company is subject to large fines.
The acquisition becomes part of IBM’s Information Lifecycle Governance suite, headed by Deidre Paknad, vice president of Information Lifecycle Governance. Paknad had been CEO of PSS Systems Inc. in Mountain View, Calif., a pioneer in the e-discovery space, which itself was acquired by IBM in 2010. The group also includes Vivisimo, which IBM acquired earlier this year.
The acquisition was not a surprise; IBM had partnered with StoredIQ for two years. As is typical for IBM, it did not reveal the cost of the acquisition. It is expected to be finalized in the first calendar quarter of 2013.
No, I’m not changing the name of the blog.
The proximate cause for the discussion now is a presentation by Shantanu Gupta, director of Connected Intelligent Solutions for Intel’s Data Center and Connected Systems Group, which showed up in GigaOm the other day. According to this presentation, what comes after yottabyte is brontobyte, or a 1 followed by 27 zeroes.
This is not definite; as GigaOm’s Stacey Higginbotham points out, it’s not an official prefix, though it has been discussed since at least 1991 — back then, though, it was a 1 followed by 15 zeroes. It does, however, appear to be more widely accepted for the number than hella-, which had a brief flurry of support a couple of years ago as people tried to promote it.
Past bronto, at a 1 followed by 30 zeroes, it gets more complicated, partly due to what honestly look like typographical errors.
Gupta refers to “geobyte” in his presentation — but he also writes “bronobyte” rather than “brontobyte” for a 1 followed by 27 zeroes. Wikianswers also refers to “geobyte.”
Higginbotham refers to “gegobyte” for the figure, as does Seagate in a blog posting riffing on the GigaOm post.
On the other hand, answers.yahoo.com uses “geopbyte” for the figure, as do the Urban Dictionary and whatisabyte.com.
Geo-, gego-, or geop-? It kind of doesn’t matter, because it’s all unofficial anyway, but somebody might want to figure it out at some point.
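For anyone keeping score at home, here's a little Python table of how the prefixes stack up. Everything past yotta- is unofficial, and I've picked "geopbyte" for the 1-followed-by-30-zeroes one purely for the sake of argument:

```python
# Byte prefixes as powers of 10. Everything past yottabyte is unofficial;
# "geopbyte" below is just one of the candidate spellings for 10**30.
PREFIXES = {
    "terabyte":   10**12,
    "petabyte":   10**15,
    "exabyte":    10**18,
    "zettabyte":  10**21,
    "yottabyte":  10**24,   # the largest official SI prefix as of this writing
    "brontobyte": 10**27,   # unofficial
    "geopbyte":   10**30,   # unofficial; also seen as "geobyte" or "gegobyte"
}

for name, size in PREFIXES.items():
    print(f"{name:>11}: a 1 followed by {len(str(size)) - 1} zeroes")

# Handy for sanity checks, too, such as 130 exabytes expressed in zettabytes:
print(130 * PREFIXES["exabyte"] / PREFIXES["zettabyte"])   # 0.13
```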
Beyond what-do-we-call-it, we also have the obligatory how-to-put-it-in-terms-we-puny-humans-can-understand discussion, aka the Flurry of Analogies that came up when IBM announced a 120-petabyte hard drive a year ago, with every publication describing that drive in its own favorite units.
So, how big are bronto- and geo/gego/geop-?
Well, GigaOm wrote, “Cisco estimates we’ll see 1.3 zettabytes of traffic annually over the internet in 2016.” On the other hand, GigaOm cited a piece that put the Cisco estimate at 130 exabytes, which would only be .13 zettabytes if I have my math right. Seagate estimates that total storage capacity demand will reach 7 zettabytes in 2020.
Yottabytes are in the realm of CIA and NSA spy data, noted a piece on Examiner.com, which went on to point out, “As of 2011, no storage system has achieved one zettabyte of information. The combined space of all computer hard drives in the world does not amount to even one yottabyte, but was estimated at approximately 160 exabytes in 2006. As of 2009, the entire Internet was estimated to contain close to 500 exabytes.” A yottabyte would also be 250 trillion DVDs, GigaOm wrote.
For the brontobyte, which Gupta said would be used primarily for data from the ubiquitous sensors of the “Internet of Things,” there are also somewhat fanciful definitions, such as “More than the number of all the cells of the human body in each person living in Indiana and then some,” and “You would need a brontobyte computer to download everything on the Internet” (though apparently not, according to Examiner.com).
Of course, once we start talking in terms of trillions of DVDs, obviously we’ve got to find another unit of measure. Interestingly, Seagate used geographic area.
“If today’s 4 terabyte 3.5-inch drive is roughly .16 square feet, you can get approximately 24 terabytes per square foot. That’s .0046 square miles of land mass per 4 terabytes,” Seagate wrote, going on to work out how much land area the bigger units would cover, assuming 1 terabyte per disk as the maximum areal density and hard drives no thicker than 1 inch.
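To see where that kind of math goes, here's a rough sketch using Seagate's 24-terabytes-per-square-foot figure. It assumes a single layer of bare drives, with no racks, aisles, power, or cooling, so treat it as purely illustrative:

```python
# How much land a single layer of today's 4 TB drives would cover, using
# Seagate's figure of roughly 24 terabytes per square foot.
TB_PER_SQ_FT = 24
SQ_FT_PER_SQ_MILE = 5280 ** 2   # 27,878,400

def square_miles(total_tb: float) -> float:
    """Land area needed to lay out total_tb terabytes as one layer of drives."""
    return total_tb / TB_PER_SQ_FT / SQ_FT_PER_SQ_MILE

print(f"1 zettabyte:  {square_miles(10**9):,.1f} square miles of drives")
print(f"1 brontobyte: {square_miles(10**15):,.0f} square miles of drives")
# 1 zettabyte:  about 1.5 square miles
# 1 brontobyte: roughly 1.5 million square miles, or about half the land area of the lower 48
```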
Beyond that? Wikianswers postulated Saganbyte, Jotabyte, and Gatobyte, while Wikipedia referred to a system working backward through the Greek alphabet — though that one wouldn’t include brontobyte or geo/gego/geopbyte.
What makes this a big deal? As Wikibon mentions, nearly 30% of Oracle shops are managing more than 100 TB of data that needs to be backed up. And with ‘big data’ becoming a buzzword, not only is the data getting bigger, but people are paying more attention to it.
Wikibon points out several trends, including increasing virtualization, more space devoted to backups, and the fact that tape is still around: 45% of customers report that more than half of their backup data resides on tape, Wikibon says.
But one of the newer backup choices that Wikibon mentions is RMAN. And its advantage comes up in one of the other big recent developments in Oracle backup: RMAN’s ability to back up to the cloud.
That’s where the Amazon Web Services white paper comes in. It describes how Amazon itself started backing up all its Oracle databases to the cloud using RMAN. While such white papers are often pretty self-serving — and now we’re talking about one where a vendor is using its own product, or what EMC’s Paul Maritz refers to as “eating your own dog food” — this one has some hard numbers behind it.
“The transition to S3-based backup started last year and by summer, 30 percent of backups were on S3; three months later it was 50 percent. The company expects the transition to be done by year’s end — except for databases in regions where Amazon S3 is not available,” writes Barb Darrow for GigaOm. Moreover, the company is saving $1 million per year on backups that take only half as long, she writes.
Whether you want to go the AWS route for Oracle backups or not, the Wikibon report has some interesting recommendations on the backup subject. Granted, some of them are pretty mom-and-apple-pie — implement redundancy, test your backups, use dedupe — but others are more nuanced.
For example, the company notes, organizations are increasingly virtualizing their Oracle servers — which could have an impact on the speed of backing them up. “The big initial attraction of server virtualization is that it increased average utilization from 15% to about 85%,” Wikibon writes. “This means that virtualized environments will see a drastic reduction in overall server capacity, some of which was used to run backups.”
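Wikibon's utilization numbers make the point clearer if you actually run them: the idle headroom that backup jobs used to soak up mostly disappears. A quick sketch, using just the 15% and 85% figures from the quote:

```python
# Wikibon's numbers: virtualization pushes average server utilization from 15%
# to about 85%, so the idle headroom that backup jobs used to borrow shrinks
# from 85% of the box to 15% of it.
util_before, util_after = 0.15, 0.85

headroom_before = 1 - util_before   # 0.85
headroom_after = 1 - util_after     # 0.15 (give or take floating-point fuzz)

print(f"Idle capacity before virtualization: {headroom_before:.0%}")
print(f"Idle capacity after virtualization:  {headroom_after:.0%}")
print(f"Roughly {headroom_before / headroom_after:.1f}x less headroom for backup jobs")
# Roughly 5.7x less headroom for backup jobs
```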
The Digital Government plan doesn’t even mention the word “storage,” even though open data accessible to everyone is one of the linchpins of the plan.
But a recent survey by MeriTalk of 151 federal government IT professionals about big data found that storage was already an issue.
Factors found in the survey indicate the following:
It is not one humungo drive; it is, in fact, an array of 200,000 conventional hard drives (not even solid-state disk) hooked together (which would make them an average of 600 GB each).
Unfortunately, you’re not going to be able to trundle down to Fry’s and get one anytime soon. No, this is something being put together by the IBM Almaden research lab in San Jose, Calif., according to MIT Technology Review.
What exactly it’s going to be used for, IBM wouldn’t say, only that it was for “an unnamed client that needs a new supercomputer for detailed simulations of real-world phenomena.” Most writers speculated that that meant weather, though Popular Science thought it could be used for seismic monitoring — or by the NSA for spying on people.
Like the Cray supercomputer back in the day, and some high-powered PCs even now, the system is reportedly water-cooled rather than cooled by fans.
Needless to say, it also uses a different file system than a typical PC: IBM’s General Parallel File System (GPFS), which according to Wikipedia has been available on IBM’s AIX since 1998, on Linux since 2001, and on Microsoft Windows Server since 2008, and which some tests have shown can work up to 37 times faster than a typical file system. (The Wikipedia entry also has an interesting comparison with the file system used by the big data framework Hadoop.)
GPFS provides higher input/output performance by “striping” blocks of data from individual files over multiple disks, and reading and writing these blocks in parallel.
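For the curious, here's a toy Python sketch of the striping idea: blocks of a file dealt out round-robin across several disks, then read back in parallel. It's meant only to illustrate the concept; real GPFS does far more (variable block sizes, distributed metadata, and so on):

```python
# A toy illustration of striping: blocks of one file are dealt out round-robin
# across several "disks" so they can be read back in parallel.
from concurrent.futures import ThreadPoolExecutor

NUM_DISKS = 4
BLOCK_SIZE = 4   # bytes; absurdly small, just for the demo

def stripe(data: bytes):
    """Split data into blocks and assign them round-robin to the disks."""
    disks = [[] for _ in range(NUM_DISKS)]
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    for i, block in enumerate(blocks):
        disks[i % NUM_DISKS].append(block)
    return disks, len(blocks)

def read_striped(disks, num_blocks):
    """Read all disks in parallel and reassemble the blocks in their original order."""
    with ThreadPoolExecutor(max_workers=NUM_DISKS) as pool:
        per_disk = list(pool.map(lambda d: disks[d], range(NUM_DISKS)))
    return b"".join(per_disk[i % NUM_DISKS][i // NUM_DISKS] for i in range(num_blocks))

disks, n = stripe(b"blocks of one file spread across several disks")
assert read_striped(disks, n) == b"blocks of one file spread across several disks"
```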
The system also has a kind of super-mondo RAID that lets dying disks copy their contents elsewhere before they get replaced, which reportedly gives the system a mean time between failure of a million years.
Technology Review didn’t say how much space it took up, but if a typical drive is, say, 4 in. x 5.75 in. x 1 in., we’re talking 4.6 million cubic inches just for the drives themselves, not counting the cooling system and cables and so on. That’s a 20-ft. x 20-ft. square about 6.7 feet high, just of drives. (This is all back-of-the-envelope calculations.)
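Here's that back-of-the-envelope math in Python form, under the same assumptions (a bare 4 in. x 5.75 in. x 1 in. drive and nothing else):

```python
# The same back-of-the-envelope math: 200,000 bare drives, each assumed to be
# about 4 x 5.75 x 1 inches, with no racks, cables, or cooling included.
NUM_DRIVES = 200_000
DRIVE_VOLUME_IN3 = 4 * 5.75 * 1               # ~23 cubic inches per drive

avg_capacity_gb = 120_000_000 / NUM_DRIVES    # 120 petabytes expressed in gigabytes
total_volume_in3 = NUM_DRIVES * DRIVE_VOLUME_IN3

footprint_in2 = (20 * 12) ** 2                # a 20 ft. x 20 ft. square, in square inches
stack_height_ft = total_volume_in3 / footprint_in2 / 12

print(f"Average drive capacity: {avg_capacity_gb:.0f} GB")                    # 600 GB
print(f"Total drive volume: {total_volume_in3:,.0f} cubic inches")            # 4,600,000
print(f"Height on a 20 x 20 ft. footprint: about {stack_height_ft:.1f} ft.")  # about 6.7 ft.
```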
In fact, the system needs two petabytes of its storage just to keep track of all the index files and metadata, Technology Review reported.