A while back, I wrote a piece on how the Arizona State University School of Earth and Space Exploration (SESE) moved a petabyte of data from its previous storage system to its new one. That was pretty impressive.
Now, how about 30 petabytes?
- More than 10 times as much as stored in the hippocampus of the human brain
- All the data used to render Avatar’s 3D effects — times 30
- More than the amount of data passed daily through AT&T or Google
So what is there bigger than AT&T or Google? It could only be Facebook, which added the last 10 petabytes just in the past year, when it was *already* the largest Hadoop cluster in the world. Writes Paul Yang on the Facebook Engineering blog:
During the past two years, the number of shared items has grown exponentially, and the corresponding requirements for the analytics data warehouse have increased as well…By March 2011, the cluster had grown to 30 PB — that’s 3,000 times the size of the Library of Congress! At that point, we had run out of power and space to add more nodes, necessitating the move to a larger data center.
What was particularly ambitious is that Facebook wanted to do this without shutting down, which is why it couldn’t just move all the existing machines to the new space, Yang described. Instead, the company built a giant new cluster, and then replicated all the existing data to it — while the system was still up. Then, after all the data was replicated, all the data that had changed since the replication started was copied over as well.
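The copy-then-catch-up idea described above can be sketched in a few lines. This is only an illustration of the general two-phase pattern, not Facebook's actual replication system: the in-memory dictionaries standing in for clusters, the `change_log` list, and the function names `bulk_copy` and `catch_up` are all hypothetical.

```python
from typing import Dict, Iterable


def bulk_copy(source: Dict[str, bytes]) -> Dict[str, bytes]:
    """Phase 1: copy every file as it exists when the copy runs."""
    return dict(source)


def catch_up(dest: Dict[str, bytes],
             source: Dict[str, bytes],
             changed_paths: Iterable[str]) -> None:
    """Phase 2: re-copy (or delete) only the paths that changed during phase 1."""
    for path in changed_paths:
        if path in source:
            dest[path] = source[path]   # created or modified on the source
        else:
            dest.pop(path, None)        # deleted on the source


# Hypothetical stand-ins for the old and new clusters.
old_cluster = {"/warehouse/a": b"v1", "/warehouse/b": b"v1"}
new_cluster = bulk_copy(old_cluster)

# The old cluster stays live, so it keeps changing; each change is logged.
change_log = []
old_cluster["/warehouse/a"] = b"v2"
change_log.append("/warehouse/a")
old_cluster["/warehouse/c"] = b"v1"
change_log.append("/warehouse/c")
del old_cluster["/warehouse/b"]
change_log.append("/warehouse/b")

catch_up(new_cluster, old_cluster, change_log)
```

After the catch-up pass the two dictionaries match. On a system that never stops taking writes, the catch-up step would presumably have to repeat until the remaining lag is small enough to switch over.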
Facebook uses Hive for analytics, which means it uses the Hadoop distributed file system (HDFS), which is particularly well suited for big data, Yang said. The replication work also has the potential to be useful more broadly in the future, he added:
As an additional benefit, the replication system also demonstrated a potential disaster-recovery solution for warehouses using Hive. Unlike a traditional warehouse using SAN/NAS storage, HDFS-based warehouses lack built-in data-recovery functionality. We showed that it was possible to efficiently keep an active multi-petabyte cluster properly replicated, with only a small amount of lag.
Yang didn’t say how long all this took. But the capability will stand Facebook in good stead in the future, as the company builds a new data center in Prineville, Ore., as well as another one in North Carolina, noted GigaOm.
Okay, I don’t usually talk about speeds and feeds here, but this is cool. Western Digital has designed a hard disk drive that lets you have a 1-terabyte drive on a notebook.
Heck, the brick I do my backups on isn’t that big.
(You have to understand, I’m old. When I bought my first computer, in 1983, a Hewlett-Packard HP-150 (hey! it had a touchscreen! it was ahead of its time!), I could have gotten a 10-*mega*byte hard disk with it that was the same size as the computer and cost just as much, even with my employee discount. So this is my tell-us-about-the-first-time-you-saw-a-car-Grandma moment.)
So here’s the deal. The Scorpio Blue is a 2.5-inch form factor drive with standard 9.5 mm height, which means it can fit into a notebook. But it’s the first drive with this type of capacity. WD did it by fitting 500 GB on each of two platters, rather than using the three platters most terabyte drives require, said Jason Bache at MyCE:
Traditional terabyte drives use three 334-GB platters to achieve their capacity, which inevitably makes the drives too thick to fit in anything but a desktop or a specially-modified notebook case. Both Samsung’s and Western Digital’s new drives use two 500-GB platters, made possible by advances in platter formatting.
While Samsung announced a similar drive in June, WD is the first vendor to be able to ship one, Bache says, something that is also repeated in numerous other articles, though vendors are apparently taking orders for it.
Shopping for it can be a little challenging; doing a Google shopping search, for example, you run into all sorts of things, including the 12.5 mm version, and ones that aren’t actually 1 TB even if that’s what you search for. (Oddly, there’s also some priced at $4,000 or more; I wonder if it’s some sort of automatic pricing issue.) Anyway, here’s the real thing, and it seems to be $120 or so.
CNET notes that it spins at 5200 rpm instead of 5400 rpm, which means it’s going to be slower (and is probably also behind the low power requirements, low noise, and low operating temperature that WD is touting).
No doubt, as with the quadruple toe loop, everyone will be doing it now that someone proved they can; if nothing else, Seagate will be doing it because it is acquiring Samsung.
For me, it’s still in the waltzing-bear stage, but it’s tempting, just for the size queen aspect of it.
It’s not always fun to be right.
Less than three weeks ago, I wrote a piece about the downsides of backing up to the cloud vs. backing up to one’s own storage, talking about several potential problems, including that of (as Steven J. Vaughan-Nichols of ZDNet had said before me)
“Wouldn’t that be just wonderful! Locked out of your local high-speed ISP for a year because you spent too much time working on Office 365 and watching The Office reruns.”
As it happens, exactly that has happened — except the culprit wasn’t even The Office reruns, but a cloud backup service!
André Vrignaud, an entertainment industry consultant, described in his blog this week how he was cut off from Comcast — about the only broadband provider in his area — for a year for breaking his 250-gigabyte bandwidth cap two months in a row because, as it turns out, he was backing up his voluminous picture and video files to the cloud.
You know, like writers and vendors keep telling you that you should be doing.
While he knew about the cap, he didn’t realize that data he uploaded counted against it as well as data he downloaded.
One could argue that Vrignaud — who worked at home — shouldn’t have been using a consumer service in the first place. He points out, however, that the business service is considerably more expensive for a lesser service, that he would also be required to sign up for a long-term plan and buy additional equipment he didn’t need, and that, in any event, it was now moot because Comcast had banned him from *all* its services.
Stacey Higginbotham of GigaOm went on to note that it also isn’t always easy to determine what would constitute “business use,” and that neither ISPs like Comcast nor cloud service providers are doing a good job of alerting users to the potential problem or telling them what to do should it occur. Moreover, she added, aside from the issue of whether such a cap was productive, the particular cap Comcast instituted was archaic; the median usage when Comcast implemented the cap in 2008 was 2 to 4 GB per month, and it has now more than doubled, to 4 to 6 GB per month, but with no increase in the cap.
With services such as cloud backup, online phone calling, and music services such as Spotify becoming more prevalent, this is likely to become more of an issue. Jason Garland, who identifies himself as a senior voice network engineer for Amazon, posted a spreadsheet on Google+ demonstrating that, depending on the speed of the connection, users could hit the cap in less than five hours of a single day. It’s hard to imagine that cloud application providers are going to sit still for this for long.
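To get a feel for the arithmetic, here is a back-of-the-envelope sketch. It is not Garland’s spreadsheet, whose assumptions aren’t given here; it simply assumes a connection saturated at a constant rate, decimal units (1 GB = 8,000 megabits), and the 250 GB cap mentioned above. The function name `hours_to_hit_cap` is made up for illustration.

```python
CAP_GB = 250  # the monthly cap cited above


def hours_to_hit_cap(speed_mbps: float, cap_gb: float = CAP_GB) -> float:
    """Hours of continuous transfer at speed_mbps needed to move cap_gb."""
    cap_megabits = cap_gb * 1000 * 8      # decimal GB -> megabits
    seconds = cap_megabits / speed_mbps   # assumes the link is fully saturated
    return seconds / 3600


# A few illustrative (assumed) connection speeds, in megabits per second.
for mbps in (5, 25, 100, 200):
    print(f"{mbps:>4} Mbps: {hours_to_hit_cap(mbps):7.1f} hours")
```

Under these assumptions, a sustained rate of a little over 110 Mbps is what it would take to burn through the whole cap in under five hours. Upload speeds on consumer plans are usually far lower, so a backup would take much longer to get there, but it would get there all the same.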
Today was a big day for people interested in both virtualization and the cloud, with Citrix buying Cloud.com and VMware announcing what it said was a cloud infrastructure platform. The two announcements put the companies in competition not only with each other but with other cloud infrastructure vendors such as Microsoft and Amazon.
Citrix, which bought the virtualization software company XenSource in 2007, has now bought Cloud.com, which produces software that lets companies set up private clouds at a cost of a tenth of that of competing services such as those from Microsoft and VMware, Bloomberg quoted Forrester analyst James Staten as saying.
Cloud.com has already used Citrix Xen software to help it build private clouds for customers such as Zynga, KT, Tata, and Go Daddy, according to Cloud.com CEO and founder Sheng Liang in a blog post.
In comparison, VMware — headed by former Microsoft executive Paul Maritz — announced a cloud infrastructure program that he told the New York Times would be the “Microsoft Office” of cloud computing software.
The VMware announcement was not as innovative as the Citrix one, noted GigaOm, but VMware has the advantage of being bigger and having a larger share of the virtualization market.
Either way, this is supposed to be good for the user. The announcements — as well as several smaller ones by Microsoft — “signaled significant progress in making cloud platforms (Infrastructure as a Service (IaaS) and Platform as a Service (PaaS)) more enterprise ready and consumable by I&O professionals,” Staten wrote in a blog post for Forbes.
Ironically, this is all happening against a backdrop of user organizations finding that neither virtualization nor cloud implementations are meeting their expectations — though they still plan to support those movements increasingly over the next year — as well as concerns that the extra bandwidth required to support the cloud could be both more expensive and less reliable than in-house storage.
What nationality is your data?
It might sound like a funny question, but in these days of multinational companies, data in the cloud, software as a service, and worldwide replication, it’s deadly serious.
It’s particularly becoming an issue for companies outside the U.S. that are concerned about their data “entering” the U.S., and consequently becoming subject to laws that would enable the U.S. government to seize the data — perhaps without the parent company even knowing about it.
This has become an issue with data providers such as Dropbox, which revealed earlier this year that it would release data to U.S. authorities if required to, as well as with Microsoft’s cloud-based Office 365 — to the extent that the issue could affect the product’s success outside the U.S., according to Computerweekly:
Microsoft, like other cloud providers, will need to clarify data sovereignty issues, if Office Live is to be taken seriously. While it does have a datacentre in Dublin – so it can guarantee data resides in the EU – Microsoft is headquartered in the US and will be subject to US legislation, such as Homeland Security, as well as UK and EU law. It is far from clear how government legislation will affect data in the cloud. But this will be an issue enterprises will need to address if they are to take Office 365 seriously.
The issue is also arising in Australia, which is developing its own government cloud computing initiative, provided by Hewlett-Packard, but is concerned about the ramifications of data leaving the country — for availability reasons as well as security.
Moreover, by giving the U.S. government access to company data, a company could potentially be violating its own country’s laws. According to The Register:
Buyers of off-the-peg cloud contracts could unwittingly be putting themselves in breach of UK data protections laws, says Kathryn Wynn, an associate at the law firm Pinsent Masons. Many service providers have standard terms that specify compliance with US laws, for example, which could put the customer in breach of the UK’s Data Protection Act.
The Data Protection Act forbids sensitive data from being stored offshore. Companies in the European Union should make sure that their data providers have “safe harbor” agreements, the Register said.
It’s not just the U.S., either. Other examples of different countries’ restrictions on data include France, which used to disallow encryption unless the government was given the key, and Germany, which has even stricter privacy regulations than does the E.U. Countries and companies will need to work together to negotiate these differences.
People sometimes ask me why I still have DVDs when I have Netflix, why I still have CDs when I have iTunes, why I still have a landline when I have a cellphone and Skype, and why I still have books when I can download all that stuff off the Internet. The thing is, I’m a mistrustful old cuss and I don’t like depending on a single source for things.
Consequently, it makes me nervous when people talk about putting everything on the cloud. Yes, I agree, there’s specific use cases where that’s preferable. And there’s replication, multiple copies, worldwide access, I can get to it anywhere. Fine.
But it’s not here. It’s that high tech/high touch thing, as John Naisbitt would say.
So that’s why it was particularly interesting to read a couple of different takes on the cloud and how it relates to storage. Supposedly, part of the reason for going to the cloud is to save money — by paying operational expenses to someone to manage your storage instead of having to pay capital expenses and salary to buy and manage your own storage.
But an organization called Backblaze did a truly wonderful infographic talking about the cost of storage vs. the cost of bandwidth over time — and bandwidth wasn’t winning. The price of buying a gigabyte vs. downloading a megabit per second crossed over around 1995 — and the split’s been getting wider since then. In fact, if bandwidth had decreased in cost in the U.S. as quickly as the cost of storage, we’d get 985 Mbps for $45 a month.
A couple of pundits found this infographic very interesting.
“Cloud computing, if anything, depends on the idea that we will have ample and cheap bandwidth that will allow us to access various types of information and services on any kind of device, anywhere. The rapid growth of cloud as outlined by Amazon CTO Werner Vogels at our Structure 2011 conference only underscores the need for more bandwidth. This need only goes up as we start living in an on-demand world, streaming instead of storing information locally,” wrote Om Malik of GigaOm.
And says Tim Worstall at ChannelRegister UK:
“Yes, access from anywhere is lovely, and being able to get at your data as long as you have access to the cloud is cool. Being able to time-share is also pretty good: we don’t all need to have huge computing power at our fingertips all the time and the cloud can provide us with that when we need it. However, part of the basic contention does depend upon the relative prices of local storage and the cost of transporting from remote storage to local use and usability: in short, the cost of bandwidth. If local disk space is falling in price faster than bandwidth, then the economics are moving in favour of local storage, not the cloud.”
But that’s just the cost factor. Steven J. Vaughan-Nichols at ZDNet went on to talk — in the context of cloud-based applications such as Microsoft Office 365, but the principle is the same — about the dangers of relying on access to the cloud to run a business, and how such access was becoming less sure over time, not more. He points out potential problems such as delays in Internet access during busy times and the increasing number of Internet service providers imposing bandwidth caps:
“Wouldn’t that be just wonderful! Locked out of your local high-speed ISP for a year because you spent too much time working on Office 365 and watching The Office reruns.”
I’m thinking I should buy another terabyte of storage for my home office. Maybe two.
In a move that might let every individual user learn the joys of virtualization, a Windows blogger, Robert McLaws, discovered this week that the forthcoming Windows 8 operating system has a Microsoft Hyper-V hypervisor built into it.
This made a lot of people very excited (more than 12,000 people had hit Robert’s blog as I write this). Previously, Hyper-V has only been used as a server-end virtualization utility, noted International Business Times.
So, that leaves us with two major questions:
What does this mean for Windows users?
McLaws laid out a dozen new features with the hypervisor, including support for more than four cores and the ability to support up to a 16-TB disk. In addition, the system has the potential to offer much improved emulation support, including for Windows XP and Windows 7, as well as Windows Phone 7. That’s particularly good for developers writing applications for those operating systems, he says. Linux or Apple operating systems are also a possibility.
It would also help organizations that still run legacy software, says Mike Halsey of Windows 8 News and Tips. “[T]he largest problem facing Windows is the need for legacy support, which can account for 80% of the patches and updates delivered to the operating system, and is also a major factor in some older software not working.”
It could also help protect the machines by virtualizing everything. In particular, it could help solve the problem of people using their business computers for personal reasons, said Eric Knorr and Doug Dineley in InfoWorld.
One basic division would be between a “business virtual machine” and a “personal virtual machine” running on the same client. The business virtual machine would be a supersecure environment without any of the personal stuff users download or run; changes to that business virtual machine would be synced to the server when users were online. If the client hardware was lost or stolen or the user’s relationship with the company ended, the virtual machine could be killed by admins remotely.
(InfoWorld’s J. Peter Bruzzese and ZDNet’s Mary Jo Foley were also competing about which of them had first come up with the idea that Microsoft should do this, with Bruzzese pointing to a 2009 column and Foley pointing to a 2010 article that referenced a 2009 blog post).
Not everyone, though, was enamored. Take Kevin Fogerty of ITWorld, who expects it to show up only in high-end versions of the operating system:
It would be a great idea if – at least right now – provisioning, storing, launching and managing VMs on a desktop weren’t already too complicated for most users to handle. Rather than reducing support requirement, it might increase them. It would also confuse users who often can’t tell the difference between the monitor, the computer, the applications and the “cloud” what they’re actually working with, making support calls infinitely longer and even more frustrating than they are now.
What does this mean for VMware users?
Hyper-V isn’t the only hypervisor out there, of course, and what will having Hyper-V built into Windows 8 do for people running VMware? As Fogerty says:
[O]nce all those copies of Hyper-V are running on everyone’s desktop, what possible reason could there be to go buy desktop virtualization from VMware, Citrix or elsewhere. Microsoft would put itself back in the game for the virtual desktop by giving away in the operating system all the goodies other vendors rely on for revenue and growth. I wonder if Microsoft has ever done that before?
Needless to say, the VMware support boards were buzzing with the news, as well, and while VMware people weren’t talking, they didn’t sound particularly nervous or surprised, either. “Unfortunately NDAs prohibit the release of any information and VMware won’t officially comment, so we will all just have to wait and see,” noted VMRoyale, one of the moderators of the VMware community. “Who knows – maybe the intent is already there??” replied Maishsk, another moderator.
Vendor surveys are always dicey; it’s remarkable how often survey responses just happen to line up perfectly with the vendor’s product line. But Symantec’s survey on Virtualization and Evolution to the Cloud seems on the up-and-up, if only because the results are such a bummer to virtualization and cloud vendors.
Most notably, organizations that have implemented various kinds of virtualization and cloud technology indicated that they were frequently disappointed by results that did not meet their expectations. Server virtualization was actually one of the better ones, having an average expectation gap of 4% overall, including 7% in scalability, 12% in reducing capital expenditures, and 10% in reducing operational expenditures. In fact, more than half of respondents (56%) said storage costs somewhat or significantly increased with server virtualization.
In contrast, private Storage as a Service had an average expectation gap of 37%, including 34% in scalability, 40% in reducing complexity, and 37% in efficiency. Storage virtualization was almost as bad, with an average expectation gap of 34%, including 32% in agility, 35% in scalability, and 32% in reducing operational expenditures. Hybrid/private cloud computing had a 32% average expectation gap, composed of 39% in time to provision new resources, 34% in scalability, and 29% in security. Finally, endpoint/desktop virtualization had an average expectation gap of 26%, including 27% in new endpoint deployment, 30% in application delivery, and 27% in application compatibility.
Symantec attributed the varying gaps to the varying degrees of maturity of the various technologies, noting that server virtualization is more mature than storage-as-a-service, for example. “Expectations are unlikely to be matched by reality until IT organizations gain sufficient experience with these technologies to understand their potential,” the report said. “These gaps are a hallmark of early stage markets where expectations are out of step with reality.”
Similarly, organizations indicated that they were more willing to virtualize business-critical applications than they were to put them on the cloud — probably for the same reason.
“Among those currently implementing hybrid/private cloud computing, the most common concerns regarding placing business-critical applications into the cloud are related to disaster recovery, security and maintaining control over data. Disaster Recovery concerns were expressed by 70 percent of respondents, and more than two-thirds expressed concerns over loss of physical control over data and fear of hijacked accounts or traffic. Other concerns involve performance and compliance issues.”
The survey also noted that executives were much more hesitant about placing business-critical applications into a virtualized or cloud environment than were more IT-specific people such as server groups and application owners, due to concerns about reliability, security, and performance. At the same time, actual implementation results typically met performance goals. Symantec attributed this misperception in the face of reality to a lack of communication between IT and executives, meaning that executives weren’t hearing from IT the degree to which such implementations actually were successful.
That said, of enterprises that are implementing virtualization, more than half (59%) plan to virtualize database applications in the next 12 months, 55% plan to virtualize Web applications, 47% plan to virtualize email and calendar applications and 41% plan to virtualize ERP applications.
About a quarter of the survey’s respondents said their organizations have already implemented some form of virtualization or cloud, with another quarter in the midst of implementing, 20% in pilot projects, and about 20% discussing or planning for it.
The survey was performed in April, and consisted of 3,700 organizations of various sizes in 35 countries, including small, medium, and large enterprises. Respondents represented a wide range of industries and included a mix of C-level (CIO, CISO, etc.) executives (31%), IT management who were primarily focused on strategic issues (35%), and IT management primarily focused on tactical issues (34%). 60% were 31 to 49 years of age, with the rest split between those less than 30 (30%) or older than 50 (10%). 79% were male. The typical respondent had worked in IT for 10 years. 20% said their companies were shrinking in terms of revenue, while 61% reported growth, Symantec said.
“Excitement” and “storage” aren’t really words that go together very often. But this week was different, with the first storage IPO in three years sparking a surge of interest not only in the newly public flash vendor Fusion-io (NYSE:FIO) but investments in several other storage companies as well.
Fusion-io has the advantage of a couple of big names associated with it — chief technology officer Steve Wozniak (am I old enough that I have to explain his connection with Apple?) and Facebook, which uses the company’s storage devices. Another couple of big names that helped were LinkedIn and Zynga, not because they use the company’s products but by having successful IPOs in the computer industry in the past few weeks that paved the way.
Like LinkedIn, Fusion-io raised the planned price of its IPO the day before it went public, to $16 to $18 per share after originally suggesting it would be priced at $13 to $15 — and then actually priced it at $19, raising $233.7 million and giving the company a valuation of $1.8 billion, according to Investor’s Business Daily.
The Debbie Downers at the Wall Street Journal, however, pointed out a number of issues with Fusion-io:
- It doesn’t expect to maintain its growth
- It’s never made a profit for an entire year at a time
- The nine-month period ending March 31 showed a slender profit of $35,000
- 10 of its clients account for 91% of its revenue
- Facebook alone accounts for 47% of its revenue
- Oh, and by the way, that was going to decrease
As for the other storage investments this week:
- Virsto Software, a virtual machines storage company, raised $12 million in Series B venture capital funding led by InterWest Partners with August Capital and Canaan Partners also participating
- Virsto also acquired EvoStor, which specializes in storage virtualization technology for VMware environments, for an undisclosed amount
- Flash array maker Violin Memory raised a $40 million Series C round from public-market investors
- VeloBit raised an undisclosed amount of Series A funding from Fairhaven Capital and Longworth Venture Partners
IDC recently released its Q1 disk storage systems sales figures, and there’s good news and…well, actually, it’s pretty much just good news, unless you’re Dell or a small vendor.
Here are several aspects of the good news:
- 13.2% growth in external disk storage factory revenues year over year
- 17.3% growth in open networked disk storage systems
- 13.4% growth in open SAN
- 27.1% growth in NAS
- 23.0% growth in iSCSI SAN
- 12.1% growth in total disk storage systems
- Fifth quarter in a row of double-digit growth
- 46.3% growth in capacity
Broken down by vendor, in terms of market share, things haven’t changed much, relatively speaking. In external disk storage, the top five vendors are EMC, NetApp, IBM, HP, and Fujitsu — NetApp and IBM swapped places compared with a year ago. In the total open networked disk storage market, EMC led NetApp. Broken out into components, Open SAN had EMC, IBM, and HP; NAS had EMC and NetApp; and iSCSI SAN had Dell, followed by HP and EMC tied for second. Finally, in worldwide total disk storage factory revenue, we have EMC, HP, IBM, Dell, and NetApp, the same order as a year ago.
There are, however, a couple of interesting points to be made:
- We saw a case of “the rich getting richer.” Generally, the market shares of the top vendors increased, while the market share of “other” decreased.
- The one exception was Dell, which went from 12.7% to 11.4%, and that was *after* IDC started including Compellent in its figures, after the company’s acquisition. Chris Mellor of the Register UK points out that Dell fell completely out of the top 5 in external disk revenues, being replaced by Hitachi, with which it had tied in the previous quarter. In fact, in total revenues, NetApp may overtake Dell in the next quarter, he adds.
- In external disk storage, ranked by revenue growth, we’d have seen NetApp, Hitachi, EMC, IBM, and HP.
- In total disk storage, we’d have seen NetApp, EMC, IBM, HP, and Dell. Mellor points out, however, that NetApp’s growth has slowed compared to previous quarters.
Some of HP’s and EMC’s growth is due to acquisition: in HP’s case, it’s H3C and 3PAR, while in EMC’s case it’s Isilon.
It will be interesting to see how things change in the next quarter.
- With everyone talking about the cloud, will people be buying fewer drives?
- Or will the storage sold to all the cloud vendors make up for it?
- Or, will the Amazon outage send people scrambling to take care of their own storage again?
- What will happen to disk storage sales as flash becomes more popular?
- How might acquisitions in the drive manufacturing space change things in the system space?
- What will happen with Dell?