Zhang wrote a note and put up flyers about the theft. The note was picked up by ABC News, posted by a friend to his Facebook page, and then posted to Reddit and many other websites beyond that. He offered $1000 to the thieves for the data, telling them exactly where on the disk they could find it, giving them the password, and telling them they could keep the computer already; he just wanted to graduate.
Now, in honor of the “Everything Wrong With … in X Minutes” CinemaSins YouTube movie spoofs (and they’re hysterical), here’s everything wrong with this story.
William Steven Albaugh, 67, was arrested after police found “numerous files of child pornography” on his Verizon online storage locker and several thumb drives, the Baltimore Sun reported. “Detectives began investigating Albaugh after Verizon Online notified the National Center for Missing and Exploited Children that Albaugh, a subscriber, had stored images of children engaged in sexual acts on the online cloud storage system, police said.”
We already learned, in 2011, that cloud storage systems such as Dropbox would turn over files if requested by law enforcement. We also learned that some systems such as Dropbox, when a file is uploaded, check to see if it’s already online, and, if so, just save a pointer to the original copy. While this saves space, it also means that, in theory, law enforcement could upload any number of files it’s illegal to own — such as copies of movies — and if the stored file length is less than the original file, it means someone has it on the system already.
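The "pointer to the original copy" idea can be sketched in a few lines. This is only an illustration, assuming a SHA-256 digest stands in for a file's identity; the `DedupStore` class and its `upload` method are hypothetical names, not how Dropbox or any other real service is implemented. The point is the return value: the service's answer to an upload reveals whether somebody already has the same file.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical uploads share one stored copy."""

    def __init__(self):
        self.blobs = {}       # digest -> file bytes, stored only once
        self.user_files = {}  # (user, filename) -> digest, i.e. the "pointer"

    def upload(self, user, filename, data):
        """Store the file and report whether its contents were already here."""
        digest = hashlib.sha256(data).hexdigest()
        already_stored = digest in self.blobs
        if not already_stored:
            self.blobs[digest] = data  # first copy: actually keep the bytes
        # Every uploader gets a pointer, whether or not bytes were stored.
        self.user_files[(user, filename)] = digest
        return already_stored


store = DedupStore()
first = store.upload("alice", "movie.mp4", b"same bytes")
second = store.upload("bob", "copy.mp4", b"same bytes")
print(first, second, len(store.blobs))  # False True 1
```

The second upload comes back `True` even though the two users never shared anything, which is exactly the side channel described above.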
In the process of that, we learned, if we didn’t know already, that in 2010 New York Attorney General Andrew Cuomo made an agreement with several online services such as Facebook and LiveJournal to check uploaded images for child pornography. “Through its investigations, the Attorney General’s Office has created a database of more than 8,000 hash values that are associated with images of child pornography,” the Attorney General’s office wrote at the time. “The database can be used to identify the corresponding child pornography images through the fingerprints and stop that picture from ending up on a site.” The office also said it would continue working with other online services to encourage them to do the same thing.
Apparently, at least one of them was the Verizon Online Backup and Sharing cloud storage service.
Media outlets have pointed out that this was all clearly spelled out in the terms of service. “Like many types of online storage or media services, Verizon’s Online Backup and Sharing states in its terms of service that the company is ‘required by law to report any facts or circumstances reported to us or that we discover from which it appears there may be a violation of the child pornography laws,’” writes the International Business Times.
Because, of course, we all read every word of our terms of service.
If this sounds familiar, it may be because, as of last July, four out of the five cases concerning whether people have to provide the key to their encrypted storage also have had to do with child pornography, according to the Electronic Frontier Foundation’s attorney Marcia Hoffman.
Look, there isn’t any question that child pornography is bad. But there’s a saying, “Hard cases make bad law” — that is, an unpleasant case can lead to a harsher general law that can end up being more widely applied. (We don’t know whether law enforcement is more likely to push the envelope of legal search because they so badly want to catch child pornographers, or because they think people will be less likely to criticize their methods because the crime is so heinous.)
If it’s determined through these cases that checking people’s files as they are uploaded to a cloud storage service is an acceptable practice, it has the potential to apply to all files and all people, not just ones we don’t like.
In the meantime, it sounds like we’d better be sure to read our terms of service carefully.
That’s the question that the Electronic Privacy Information Center (EPIC) is asking the Supreme Court to decide, in a friend-of-the-court brief it has filed in the case of Jennings vs. Jennings.
“The privacy group is filing an amicus brief asking the high court to accept an email privacy case from South Carolina that’s exacerbated confusion over what courts consider electronic storage,” writes the political journalism site Politico. “In the filing, submitted on behalf of nearly 20 privacy advocates, EPIC tells the Supreme Court that email privacy rules and definitions have become increasingly unclear, thanks to the rise of cloud computing, and Congress has yet to step in to fill the gap.”
The question of what “storage” is came to a head last fall, when the South Carolina Supreme Court ruled that, under the Stored Communications Act (SCA), email in a Yahoo! account should not be considered protected from unauthorized access because email sitting in the cloud was not “stored” the same way as it would be sitting on one’s own computer — which was protected.
This means the same is true for anyone who uses a cloud-based email system — not just Yahoo, but also Gmail and a plethora of other systems. Not to mention some components of the federal government itself that have moved to cloud-based email, EPIC notes in its brief.
The original case was a domestic dispute — a husband was cheating on his wife, and the wife’s daughter-in-law figured out the husband’s e-mail password and logged in to his personal account to read the e-mails between the husband and his paramour, wrote Orin Kerr in The Volokh Conspiracy legal blog. “The daughter-in-law found the e-mails and shared them. The husband filed suit under several laws including the Stored Communications Act, 18 U.S.C. 2701, which only allows a civil suit if the e-mails accessed were in ‘electronic storage.’”
The Supreme Court may get involved because this decision conflicts with a similar case by the Ninth Circuit Court in 2004, wrote Andrew Hoffman at the Information Law Group blog.
“The Jennings opinion establishes a split with the Ninth Circuit’s opinion in Theofel v. Farey-Jones, 359 F.3d 1066 (9th Cir. 2004), which found that emails that had been received, read, and left on the server were stored “for purposes of backup protection” and therefore within the ambit of the SCA,” Hoffman wrote.
This is a problem because it’s not good for different courts to have different ideas of what does and doesn’t constitute a legal issue, Hoffman wrote. “Thus, until the split of authority is resolved, the same conduct will disparately subject some individuals to civil liability, depending on the interpretation of the SCA applied by the court. Such disparate interpretations could create an incentive for forum shopping and pose conflict of law questions, when multiple states (and even nations) could be involved in an email hacking case. Such disparate interpretations may also pose problems for employers investigating suspected employee misconduct involving webmail.”
Just to show how confused the South Carolina court was, its judges couldn’t even agree on why the email wasn’t stored, but instead had three different opinions, Kerr wrote.
Aside from the issue of protection, the issue of defining what storage is is important because it is the primary difference between the Stored Communications Act — the law under which the original suit was filed — and the Electronic Communications Privacy Act, according to EPIC.
A related question is “What is a backup?” because some of the legal arguments also hinged on whether the email retrieved from the account was the “only copy” or a backup — a question that is kind of irrelevant in cloud storage, which may feature multiple replicated copies of data, EPIC writes.
“A wealth of personal and private messages are now stored remotely in the cloud, and their protection depends on the interpretation of ‘electronic storage’ under ECPA,” EPIC writes.
“Dropbox will be acquired by a major enterprise infrastructure player,” the company wrote. “In another sign that “consumerization” doesn’t mean mimicking consumer technologies in the enterprise but actually acquiring and/or integrating with widely adopted consumer offerings in the enterprise, IDC predicts that Dropbox will be acquired by a major enterprise infrastructure player in 2013. This will certainly be an expensive acquisition, but it will be one that brings an enormous number of consumers (many of whom are also employees), and a growing number of ecosystem partners, along with Dropbox’s technology.”
“Expensive” is putting it mildly; a $250 million Series B funding round last fall gave the company a $4 billion valuation, which is expected to be even higher now (though GigaOm still thinks the market is small). Only a major enterprise infrastructure player would be able to afford it.
Part of what makes this prediction interesting is that a Dropbox IPO has been rumored — and highly anticipated — since last year. Dropbox founder and CEO Drew Houston had reportedly received a nine-figure acquisition offer from Apple early on, Forbes reported last year, but turned it down because he wanted to run a big company — though he sounded at the end of the article as though he might be reconsidering that.
As he walked out of [Facebook founder Mark] Zuckerberg’s relatively modest Palo Alto colonial, clearly enroute to becoming the big company CEO he had told Steve Jobs he would be, Houston noticed the security guard parked outside, presumably all day, every day and pondered the corollaries of the path: “I’m not sure I want to live that life, you know?”
The downside with getting a big funding round is that eventually investors want to see some return on their investment — and typically that means either an IPO or an acquisition. Employees also typically want their big buyout, though Dropbox employee stock has reportedly been available on the secondary market.
The advantage of an acquisition by a major vendor is that it could give Dropbox the credibility and structure it would need to fit into the enterprise. It’s not that people aren’t using Dropbox. Quite the contrary — a recent survey by storage vendor Nasuni found that 20% of corporate users were using Dropbox.
This is despite the security and governance holes inherent with using a system such as Dropbox, the security holes in Dropbox in particular, and rules that corporations have attempted to put into place to keep people from using it. (Nasuni found that 49% of the people whose companies had rules against it were using it anyway.) As long as people have multiple devices — and they show no signs of stopping — and need access to their files, as well as the ability to send large files to other people, there’s going to be a need for the functionality, and all the rules in the world aren’t going to stop it, especially when, as Nasuni’s survey indicated, some of the worst offenders are executives.
“The most blatant offenders are near the top of the corporate heap — VPs and directors are most likely to use Dropbox despite the documented risks and despite corporate edicts,” writes GigaOm’s Barb Darrow. “C-level and other execs are the people who brought their personal iPads and iPhones into the office in the first place and demanded they be supported.”
So being purchased by a major player offers the opportunity to rein in some of these users, while still giving them the functionality they need. The company itself has also indicated that it plans to address the issue to make the product safer for corporate users — which would also make it more attractive to an acquirer.
The other likely aspect is that, as we’ve seen with e-discovery and other emerging markets, when the first big vendor goes, many of the smaller vendors quickly follow like dominoes. A Dropbox acquisition would likely presage a whole round of other ones; Wikipedia lists 17 “notable competitors,” including Box.Net and YouSendIt, and there are others. Acquisitions would also help simplify the complicated market.
Although major players such as Apple, Google, and Microsoft already offer their own cloud storage solutions, the vendors might want to acquire other ones for their technology, their people, or simply to get them off the market, while other vendors (dare I suggest HP, which doesn’t have a great track record on acquisitions these days?) would do so simply to get a toe in the market.
Either way, it seems likely that something will happen to this market next year.
The nice thing about three of them happening at once is that this makes it a Trend, so instead of addressing each acquisition individually, we can talk about What It All Means.
From the startup side, there are really only three exit strategies you can have. You can die. You can file an IPO (like Violin also did last week). Or you can get acquired. If either the company isn’t strong enough, or the market isn’t strong enough, an IPO isn’t necessarily a good idea. So that leaves acquisitions. (We’ll assume no startup plans to die.)
Being acquired doesn’t mean giving up or throwing in the towel. Particularly in the case of the company doing the acquiring, it can be a good idea. It’s a quick way to collect a bunch of new people, a new technology, and perhaps some new customers. “The vast majority (over 90 percent) of the successful private company exits in 2011 and 2012 have been through company sale or M&A,” writes Jim Price in Business Insider.
The next thing to look at is who’s doing the acquiring. Is it two small companies hoping that together they’ll be strong enough to survive? I don’t want to pick on Carbonite, but given the sort of year they’ve had, that might be a factor. Or is it a big company looking to add an innovative new technology to its portfolio? Certainly in the case of Microsoft and StorSimple, it’s the latter.
As far as what’s next, keep in mind that acquisitions tend to run in clumps. A new technology comes along, a bunch of little companies start up to use it, and then some of them die, some of them merge with each other, and some of them get acquired by larger companies — typically with the strongest players going first and the later ones being picked up by latecomers in the market who are desperate to own a piece of it, in sort of a high-tech version of Musical Chairs. For a big company, it can be a much safer way to innovate than trying to develop a new technology yourself.
We saw something similar a year ago, when Gartner did its first Magic Quadrant on E-Discovery, and predicted that 25% of the companies in it would be acquired by 2014 by major vendors. As it happened, Gartner didn’t even get the report published before the first acquisition happened, and they’ve been falling steadily like little dominoes ever since — especially after Gartner conveniently provided a shopping list.
“Probably the prime imperative for Fortune 500 managers is to find areas for revenue and profit growth,” writes Price. “But the challenge is to do so without endangering the existing franchise. Too often, the dilemma from the helm looks like this: You know you need to get into a promising new space, but it’s quite unproven and you suspect running two or three concurrent experiments might bleed cash for years. So in a real sense, you can’t afford – on a quarter-to-quarter income statement basis – to run too many such risky projects. But if you let entrepreneurial startups run the experiments with their energy, time and capital – and let them ring out the technology risk and the market risk – then once a winner appears, you can buy that winner with capital off your balance sheet.”
Certainly Microsoft and StorSimple would qualify.
Since that’s the case, it seems likely that StorSimple competitors like Nasuni and Panzura — which were speaking with a great deal of bravado about the 800-pound gorilla suddenly in their midst — should be expecting to get calls from other large vendors in the next few weeks, and deciding which startup exit strategy they plan to follow.
What makes this a big deal? As Wikibon mentions, nearly 30% of Oracle shops are managing more than 100 TB of data that needs to be backed up. And with ‘big data’ becoming a buzzword, not only is the data getting bigger, but people are paying more attention to it.
Wikibon points out several trends, including increasing virtualization, more space devoted to backups, and that tape is still around. 45% of customers report that more than half of backup data resides on tape, Wikibon says.
But one of the newer backup choices that Wikibon mentions is RMAN. And the advantage to that is brought up in one of the other big recent developments in Oracle backup, which is RMAN’s newer ability to back up to the cloud.
That’s where the Amazon Web Services white paper comes in. It describes how Amazon itself started backing up all its Oracle databases to the cloud using RMAN. While such white papers are often pretty self-serving — and now we’re talking about one where a vendor is using its own product, or what EMC’s Paul Maritz refers to as “eating your own dog food” — this one has some hard numbers behind it.
“The transition to S3-based backup started last year and by summer, 30 percent of backups were on S3; three months later it was 50 percent. The company expects the transition to be done by year’s end — except for databases in regions where Amazon S3 is not available,” writes Barb Darrow for GigaOm. Moreover, the company is saving $1 million per year for backups that take only half as long, she writes.
Whether you want to go the AWS route for Oracle backups or not, the Wikibon report has some interesting recommendations on the backup subject. Granted, some of them are pretty Mom-and-apple-pie — implement redundancy, test your backups, use dedupe — but others are more nuanced.
For example, the company notes, organizations are increasingly virtualizing their Oracle servers — which could have an impact on the speed of backing them up. “The big initial attraction of server virtualization is that it increased average utilization from 15% to about 85%,” Wikibon writes. “This means that virtualized environments will see a drastic reduction in overall server capacity, some of which was used to run backups.”
The company, which is known for providing low-cost constant backups for its subscribers, is also known for building its cloud out of a whole lot of teeny (well, 3 TB) commodity disk drives rather than a few great big ones. This saves money and helps the company grow more granularly.
The only problem is if you suddenly run out of teeny commodity disk drives — or find, in a matter of two weeks, that they’ve tripled in price, as BackBlaze did when it was adding 50 TB of capacity a day. At the same time, the company wasn’t buying enough to be able to get deals from the manufacturers.
In an extremely detailed, hysterically funny blog post, the company is now relating how it dealt with the crisis — basically, by buying them as consumer commodities rather than as parts, and turning them into the parts they needed to build the “storage pods” on which their service was based.
“With our normal channels charging usury prices for the hard drives core to our business, we needed a miracle,” writes Andrew Klein, director of product marketing. “We got two: Costco and Best Buy. On Brian [Wilson, CTO]’s whiteboard he listed every Costco and Best Buy in the San Francisco Bay Area and then some. We would go to each location and buy as many 3 TB drives as possible.”
While the company then had to “shuck” the drives from their cases, this saved the company $100 per drive over buying them from its usual suppliers. Problem solved.
For a while.
“The “Two Drive Limit” signs started appearing in retail stores in mid-November,” Klein writes. “At first we didn’t believe them, but we quickly learned otherwise.” So workers started making the circuit, circling the San Francisco Bay and hitting local Costco and Best Buy stores: 10 stores, 46 disk drives, 212 miles. It put a lot of miles on the cars, and took a lot of time, but it solved that problem.
For a while.
Then BackBlaze employees started getting banned from stores.
At that point, they started hitting up friends and family, and not just in the Bay Area, but nationwide. “It was cheaper to buy external drives at a store in Iowa and have Yev’s dad, Boris, ship them to California than it was to buy internal drives through our normal channels,” Klein writes.
(The company also apparently considered renting a moving van to drive across the country, hitting stores along the way — a variation on the “bandwidth of a station wagon of tapes” problem — but decided it wouldn’t be economical.)
By the time internal drive prices got to their normal level, the company had bought 5.5 petabytes of storage through retail channels — or more than 1800 disk drives. But finally, it could go back to its normal practices.
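The "more than 1800 disk drives" figure is easy to sanity-check. The sketch below assumes every retail drive was a 3 TB model and uses decimal units (1 PB = 1,000 TB); both are assumptions for illustration, and `drives_needed` is a made-up helper name.

```python
import math

TB_PER_PB = 1000   # decimal units, the way drive makers count
DRIVE_SIZE_TB = 3  # assumption: every retail drive was a 3 TB model


def drives_needed(petabytes, drive_tb=DRIVE_SIZE_TB):
    """Smallest whole number of drives that covers the given capacity."""
    return math.ceil(petabytes * TB_PER_PB / drive_tb)


retail_drives = drives_needed(5.5)
print(retail_drives)  # 1834
```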
“On July 25th of this year, Backblaze took $5M in venture funding,” Klein writes. “At the same time, Costco was offering 3TB external drives for $129 about $30 less than we could get for internal drives. The limit was five drives per person. Needless to say, it was a deal we couldn’t refuse.”
Disclosure: I am a BackBlaze customer.
Virtualization. In talking about how under-utilized data center servers are, and in appearing to limit himself to less than state-of-the-art facilities, Glanz failed to notice how prevalent virtualization is becoming, which enables an organization to set up numerous “virtual servers” inside a physical server — which, in the process, results in much higher utilization. “[V]irtualized systems can be easily run at greater than 50% utilization rates, and cloud systems at greater than 70%,” writes Clive Longbottom in SearchDataCenter.
“[I]n many cases the physical “server” doesn’t even exist since everyone doing web at scale makes extensive use of virtualization, either by virtualizing at the OS level and running multiple virtual machines (in which case, yes, perhaps that one machine is bigger than a desktop, but it runs several actual server processes in it) or distributing the processing and storage at a more fine-grained level,” writes Diego Doval in his critique of the New York Times piece. “There’s no longer a 1-1 correlation between “server” and “machine,” and, increasingly, “servers” are being replaced by services.”
“Although the article mentions virtualization and the cloud as possible solutions to improve power utilization, VMware is not mentioned,” agrees Dan Woods in Forbes’ critique of the piece. “If the reporter talked to VMware or visited their web site, he would have found massive amounts of material that documents how thousands of data centers are using virtualization to increase server utilization.”
Storage. Similarly, Glanz appeared to not be aware of advances in storage technology, even though some of them are taking place in the very data centers he lambasted in his articles. In Prineville, Ore., for example, not all that far from the Quincy, Wash., data centers he criticized, Facebook is working on designing its own storage to eliminate unnecessary parts, as well as setting up low-cost slow-access storage that is spun down most of the time.
Facebook — which does this research precisely because of the economies of scale in its massive data centers — is making similar advances in servers. Moreover, the company’s OpenCompute initiative is releasing all these advances to the computer industry in general to help it take advantage of them, too.
In addition, Glanz focused on the “spinning disks” of the storage systems, apparently not realizing that increasingly organizations like eBay are moving to solid-state “flash” storage technology that uses much less power.
Also, storage just isn’t as big a deal as it used to be and as the story makes out. “A Mr Burton from EMC lets slip that the NYSE ‘produces up to 2,000 gigabytes of data per day that must be stored for years’,” reports Ian Bitterlin of Data Center Dynamics in its critique of the New York Times piece. “A big deal? No, not really, since a 2TB (2,000 gigabytes) hard-drive costs $200 – less than a Wall Street trader spends on lunch!”
Disaster recovery. Glanz also criticized data centers for redundancy — particularly their having diesel generators on-site to deal with power failures — apparently not realizing that such redundancy is necessary to make sure the data centers stay up.
And yet, even with all this redundancy, there have been a number of well-publicized data center failures in recent months caused by events as mundane as a thunderstorm. Such outages can cost up to $200,000 per hour for a single company — and a data center such as Amazon’s can service multiple companies. If anything, one might argue that the costs of downtime require more redundancy, not less.
Of course it’s important to ensure that data centers are making efficient use of power, but it’s also important to understand the context.
Last week it was Facebook’s Sub-Zero. This week it’s Amazon’s Glacier.
In both cases, the vendors are offering low-cost storage for long-term archiving in return for customers being willing to wait several hours to retrieve their data — though, in Facebook’s case, the customer appears to be primarily itself, at least for the time being.
“To keep costs low, Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable,” says Amazon. “With Amazon Glacier, customers can reliably store large or small amounts of data for as little as $0.01 per gigabyte per month.”
A penny per gigabyte per month works out to $10 per terabyte (1,000 gigabytes) per month – compared with $79.99 for the cheapest 1-TB external drive from Amazon’s product search, while Dropbox’s 1-TB plan costs $795 annually, notes Law.com.
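To put the penny-per-gigabyte figure on the same footing as an annual price, here is a rough calculation. It assumes Amazon's quoted $0.01 per gigabyte is a monthly rate and counts storage only, with no retrieval or request fees; `glacier_cost` is a hypothetical helper, not part of any Amazon API.

```python
GLACIER_PER_GB_MONTH = 0.01  # dollars per GB per month, per Amazon's quote
GB_PER_TB = 1000
DROPBOX_1TB_YEARLY = 795     # dollars per year, the figure cited by Law.com


def glacier_cost(gigabytes, months):
    """Storage-only cost in dollars; retrieval fees are deliberately ignored."""
    return round(gigabytes * GLACIER_PER_GB_MONTH * months, 2)


monthly = glacier_cost(GB_PER_TB, 1)
yearly = glacier_cost(GB_PER_TB, 12)
print(monthly, yearly)              # 10.0 120.0
print(DROPBOX_1TB_YEARLY / yearly)  # 6.625
```

On those assumptions a terabyte in Glacier runs about $120 a year, so the one-time $79.99 drive is still cheaper up front, while the Dropbox plan costs more than six times as much.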
The service is intended not for the typical consumer, but for people who are already using Amazon’s Web Services (AWS) cloud service. Amazon describes typical use cases as offsite enterprise information archiving for regulatory purposes, archiving large volumes of data such as media or scientific data, digital preservation, or replacement of tape libraries.
“If you’re not an Iron Mountain customer, this product probably isn’t for you,” notes one online commenter who claimed to have worked on the product. “It wasn’t built to back up your family photos and music collection.”
The service isn’t intended to replace Amazon’s S3 storage service, but to supplement it, the company says. “Use Amazon S3 if you need low latency or frequent access to your data,” Amazon says. “Use Amazon Glacier if low storage cost is paramount, your data is rarely retrieved, and data retrieval times of several hours are acceptable.” In addition, Amazon S3 will introduce an option that will allow customers to move data between Amazon S3 and Amazon Glacier based on data lifecycle policies, the company says.
There is also some concern about the cost to retrieve data, particularly because the formula for calculating it is somewhat complicated.
While there is no limit to the total amount of data that can be stored in Amazon Glacier, individual archives are limited to a maximum size of 40 terabytes, and customers can create up to 1,000 “vaults” of data, Amazon says.
While it doesn’t deal with the issue of data for software that no longer exists, the Glacier service could help users circumvent the problem of the “digital dark ages” of data being stored in a format that is no longer readable, notes GigaOm.
Can similar services for other cloud products, such as Microsoft’s Azure, or for consumers, be far behind?
So here we are today, and we have a batch of cloud storage and cloud synchronization services — Box, Dropbox, Drive, SkyDrive, and so on, not to mention my venerable Qwest Digital Vault, which magically changed its name to the CenturyLink Digital Vault when CenturyLink bought Qwest.
I have quite an assortment of space kicking around — 25 GB with Google Drive, 2 GB with Dropbox, 5 GB with Box, and I think 7 GB with SkyDrive; I already missed out on a 25-GB offer there, and if there’s an easy way to find out what my capacity is there, I’m not finding it. (I can consider myself lucky I don’t have iCloud. I don’t think.)
My Digital Vault is a whole other kettle of fish. I thought I had 25 GB there — but when I try to log in, it doesn’t recognize my password. Then again, unless my mother has come back from the dead and changed her maiden name, it doesn’t recognize that, either, so perhaps I’m actually using the wrong ID — but the site doesn’t offer me a way to be reminded what that is, and so far I’ve gone through two separate kinds of chat sessions and neither of them can tell me, either. In any event, its website says it only offers 2 GB free, so that’s probably what I have.
(Plus I have BackBlaze for my actual backups, but I’m not counting that.)
That all totals up to 41 GB, which sounds pretty impressive.
The problem is, it’s not really enough to do anything with. It’d be great to store all my pictures in the cloud, so I could always retrieve them and have plenty of copies to keep them safe, but the picture folder on my NAS (aka “The Big Brick,” 2 terabytes) is 57 GB all by itself. Yes, some of those are duplicates — remember the part about copies to keep them safe? — and some of those are videos, now that I have a camera that can take both still photos and videos. Plus every few months or so I collect all my pictures from all the various sources and save those to the big brick, so they’re all together.
Yes, I should delete all the picture copies sometime — but I’m petrified about making a mistake, and who’s got time?
But for the sake of argument, let’s pretend that I’ve found something I can store in the cloud, something that fits. So then I have to try to keep track of which of my fistful of services I’ve stored it in. I also have to keep track of how to get into each one — something I’ve already demonstrated I have trouble with.
I can sign up for Spanning Stats for Google Drive, but that’s yet another site I have to remember to go check. Plus it turns out that it’s actually a sales tool to encourage you to sign up for its Google Drive backup service. Great. Google doesn’t back up its own cloud service?
That brings up my next level of fear — how do I make sure that what I put into the cloud is still there the next time I look for it? I certainly wouldn’t put my *only* copy of my pictures up there — what if there was a problem? What if the service went out of business? If I have to keep copies and worry about backups and synchronization with the cloud, too, then what’s the point?
(I’m not even worrying yet about the different levels of privacy that the different products offer — and I probably should.)
Spanning Stats also helps me only for Google Drive — I would need an application for Box, Dropbox, SkyDrive, and so on. So that’d be at least five applications and websites (each with their own user ID and password) that I’d have to remember.
What I really want is one thing that would check all my cloud storage systems and tell me what’s in each of them. And maybe while it’s at it, it could also keep track of all the various special offers I get for more free cloud storage space — and when they’re going to expire, and how to move the files around so I don’t get charged for anything. Because you know that’s coming, if it’s not already here. We’ve seen it with the credit cards.
Speaking of which, maybe someone could write an app like that for credit card offers, too.