According to the New York Times, Yahoo! will offer subscribers unlimited email storage on their free webmail accounts starting in May. The company currently has a 1 GB mailbox limit; the move was attributed to “explosive growth in the size of attachments as people share ever more photos, music and videos via e-mail,” but we think it might also have something to do with rival Google making more and more noise in the email storage space.
As backup software vendors are discovering, being flexible is the name of the game when it comes to incorporating the management of some of today’s hottest storage technologies – CDP, data classification, data de-duplication or integration with VTLs – into their backup software.
I am also finding that when one tries to get updates from these companies, one needs to exercise some flexibility as well. Though I had indicated in my last blog entry that I planned to cover Symantec in this month’s blog, the two of us could not get our schedules in sync. So instead I spoke to CA and CommVault and plan to cover Symantec’s NetBackup in more detail next time – or so I hope.
This month I began by talking to Kelly Polanski, CommVault’s Director of Product Marketing, and during our conversation she gave me a statistic that set me back. She said that nearly 80% of CommVault Galaxy’s customer base already uses disk as its primary backup target – either in the form of a virtual tape library (VTL) or disk-as-disk.
This stat caught me off-guard since it contradicts what I have heard to date. For example, Bocada, which makes an independent data protection management software product that reports on all major backup software products, recently told me it still typically sees 75% of its customers using tape as their primary target for backups.
So, I did some checking to see if CommVault was like Superman in the backup software space or if other backup software vendors were seeing similar increases in their percentages of customers using disk as their primary target for backup.
Neither CA nor EMC could provide any definitive numbers as to what percentage of their customers were using disk as a primary target for backup though both know that their numbers are growing. Symantec had some numbers to share as they had recently completed an internal survey of 200 of their customers and found that 63% of them now use some form of disk-based protection.
On a side note – I do have to congratulate EMC on their strategy of boosting (inflating?) their numbers – devilish though it may be. EMC is finding more of their customers switching to disk, but they conveniently ship NetWorker with their VTLs. How much NetWorker functionality and how many licenses EMC includes with each VTL surely varies by how many billions of TBs of storage the customer buys. But, it should come as no surprise to anyone that backup to disk is escalating in new deployments of NetWorker at EMC customer sites.
Sarcasm aside, this rapidly rising rate of users backing up to disk increases the urgency for backup software vendors to integrate the management of each of these different technologies. As time-consuming as it is to log in to manage each CDP, replication and backup product separately, it is even more difficult to create a consistent set of policies across these products that ensures the level of data protection and recovery matches the application’s requirements.
Of course, the difficulty arises from the fact that each of these different products usually makes its own copies of data, has its own database and is driven by its own policy engine. From a global management perspective, this makes it almost impossible to achieve any consistent method of locating the right copy of data, applying policies centrally or really knowing where anything is.
Both CA and CommVault (I know, it took me a while to get here) address these issues but are taking different paths to do so. This month (March 2007), CA is releasing a service pack (SP) for its BrightStor ARCserve backup software that will more closely tie together its ARCserve and WANSync replication software. The SP provides ARCserve with an interface into the WANSync product and allows ARCserve to back up the copies of data that WANSync creates. While a step in the right direction, this is more of a patch job than anything really innovative.
CA’s longer-term plan is much more intriguing, if they can pull it off. CA is leveraging its June 2006 acquisition of MDY Group International and its enterprise records management software (soon to be renamed CA Records Manager) to lay the foundation for enterprise-wide policy management for any product database.
According to Kristi Perdue, CA’s Product Marketing Director for Information Management products, the CA Records Manager will provide users a centralized policy engine that they can apply to any vendor’s product data repository. Configured modularly with an open architecture, it permits organizations to use a common set of policies for any vendor’s replication or backup product. (I ought to be in marketing for CA, you think?)
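On paper, what Perdue describes is a classic adapter architecture: one policy engine up top, with a translation layer for each vendor’s data repository underneath. Here is a purely illustrative sketch of the idea (every class, method and policy name below is hypothetical, not CA’s actual design):

```python
# Sketch of a centralized policy engine fronting multiple vendor
# repositories via adapters. All names are hypothetical and for
# illustration only; this is not CA's actual API.

class RetentionPolicy:
    def __init__(self, name, keep_days):
        self.name = name
        self.keep_days = keep_days

class RepositoryAdapter:
    """One adapter per vendor product (backup, replication, etc.)."""
    def __init__(self, vendor):
        self.vendor = vendor
        self.applied = []
    def apply(self, policy):
        # A real adapter would translate the generic policy into the
        # vendor's own policy-engine calls here.
        self.applied.append(policy.name)

class PolicyEngine:
    def __init__(self):
        self.adapters = []
    def register(self, adapter):
        self.adapters.append(adapter)
    def apply_everywhere(self, policy):
        # One policy definition, pushed to every registered repository.
        for adapter in self.adapters:
            adapter.apply(policy)

engine = PolicyEngine()
networker = RepositoryAdapter("NetWorker")
galaxy = RepositoryAdapter("Galaxy")
engine.register(networker)
engine.register(galaxy)

engine.apply_everywhere(RetentionPolicy("keep-7-years", 7 * 365))
print(networker.applied)  # ['keep-7-years']
print(galaxy.applied)     # ['keep-7-years']
```

The hard part, of course, is the adapters: each one has to map a generic policy onto a vendor’s proprietary policy engine, which is exactly where integration efforts like this tend to bog down.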
Overall, not a bad idea, but unfortunately at this time it is still vaporware. Even though CA’s Perdue describes CA’s integration efforts as “very aggressive” in this area, I wouldn’t expect to see a product release from CA for at least another year.
Of the two, at least CommVault’s technology is real. All of its products – Galaxy, QuickRecovery, and ContinuousDataReplicator – use the same underlying database and share a common set of policies. This even extends to setting policies for performing data archiving and data migrations, which is great – assuming you are using CommVault exclusively on all of your servers.
That is my main concern about CommVault: unless you are using its products exclusively, you may still have to bring in something like CA’s Records Manager to manage CommVault alongside all of your other backup software products. But whether that is a flaw in CommVault’s product design, or a larger indicator of how enterprises run their businesses – or let their businesses run them – is a topic for another day.
Hitachi Data Systems CTO Hu Yoshida has an interesting post up on his blog that predicts storage is headed for a “bust” period in terms of petabytes shipped per year. He’s basing this in part on IDC’s recent numbers which show that server shipments are down thanks to virtualization. Yoshida predicts a similar boom in storage virtualization will improve utilization on storage arrays, staving off additional shipments of disk in the coming years.
In 1999 we had 100% growth during the tail end of the dot-com boom and the run-up to Y2K. In 2000 and 2001, we saw the rate of capacity growth slow down sharply as the industry went through a period of consolidation after the excesses of the boom and Y2K preparation. I believe we are in a boom cycle now and are headed for another bust.
I believe we are ready for another round of storage consolidations, which will drive the growth rate down below 50%.
He’s got a point: the name of the game in storage currently is consolidation and improved utilization, and it’s clear users are serious about finding ways NOT to throw hardware at a problem. It’s also unusual for a major storage vendor to predict any kind of decline in its market, and Yoshida is among the most knowledgeable names in the business – so we’re paying attention.
But having increased storage efficiency as the acknowledged goal and actually reaching that goal are two different things. And the cynical side of us is tempted to think this is a rationalization of IDC’s recent storage numbers, which showed Hitachi squarely in fourth place in most external disk categories, with fourth-quarter ’06 growth rates hovering between 2% and 5% (EMC, IBM and NetApp all consistently showed double-digit growth in those same numbers). Nor does the utilization angle explain why HDS fell 37.9% year over year in storage device management software and 24% annually in the IDC software tracker.
Curiouser and curiouser.
Another software company riding the Google wave came to our attention this week – Datacatch, an Australian company that already markets an indexing tool for offline media written from Windows clients, including tape, CDs, DVDs, Blu-ray and HD DVD discs, as well as flash drives.
Datacatch has been marketing its Data Librarian product for $39 to small-office and home-office users, as well as small enterprises. This week, the company announced a new free plugin to Google Desktop that will merge Google Desktop Search with Data Librarian, based on Google’s API for developers.
According to Datacatch CEO Lindsay Lyon, the product is currently Windows client-based and can’t be run centrally, but updates planned for the third quarter of this year will create a networked-storage version of Data Librarian. That version could make it possible for midsize organizations to add Google Desktop to their backup clients – a far less expensive proposition than something like Index Engines’ enterprise-level eDiscovery Appliance, which performs similar searches on offline tape media starting at a cool $50,000.
Granted, Lyon admitted, Data Librarian “is not intended to be the panacea for e-discovery and compliance”; if you face strict regulations, you’re better off with an enterprise-class indexing product. But one place Lyon said the product could fit in the enterprise is as a personal assistant to IT pros managing, for example, hundreds of CD-ROMs’ worth of development software licensing subscriptions.
“A lot of software developers also archive code to DVD in test and development environments,” Lyon said. “Most IT pros have dozens of thumb drives, CDs or DVDs with licenses or installation files on them, and other removable media in use that aren’t necessarily a part of the company’s main backup workflow.”
The product can be purchased online at http://www.datacatch.com/purchase.html.
EMC and Microsoft announced a partnership today under which Microsoft will integrate EMC’s Smarts network discovery and modeling software into future versions of Microsoft’s Systems Center Operations Manager. The companies also said they will be working on common models for networking and storage, going forward.
In October 2006, EMC said it would integrate its Documentum enterprise content management system with Microsoft’s Office and SharePoint 2007, SQL Server 2005, and enterprise search offerings.
And prior to that, in January 2006, EMC strengthened its Microsoft competencies by acquiring Internosis Inc, a specialist consulting and service provider for Microsoft shops.
These days, it’s tough to ignore the ever-present coverage of security breaches and identity theft in the news. In this podcast, storage security expert Kevin Beaver offers practical answers to the most common security questions he hears from storage pros today.
Download the Storage Security FAQ podcast.
Kevin is a frequent contributor to SearchStorage.com; check out some of his recent storage security tips below.
Elsewhere on the internerd, you might want to check out this recent webcast on storage security by Jon Toigo.
Then, go lock the doors and windows.
The Burton Group put out a press release this week warning of some “gotchas” with Google’s software as a service (SaaS). Storage Soup caught up with Burton Group enterprise search and records management analyst Guy Creese today to chat about his take on Google Apps Premier Edition. (GAPE offers, among other things, 10 GB email inboxes, which we covered over on the news page a few weeks ago).
Storage Soup: So. Let’s talk Google. Are the issues with software as a service in general or Google specifically?
Creese: There are not too many generic gotchas for software as a service. In other words, in the four or five years that it’s been available, I think a lot of companies have gone from being slightly nervous about it to realizing it’s a valid form of service delivery. Look at Salesforce.com and its imitators, for example, as well as the Web analytics vendors – there are a lot of large corporations using software as a service. So from my point of view it’s less of a generic issue with software as a service and more a question of whether this specific [Google] application is appropriate for that [market]. [GAPE] is sort of a “ready, fire, aim” thing from a product point of view.
Storage Soup: How so?
Creese: Well, for example, a lot of the offerings in Google Apps Premier Edition are still pretty rudimentary. There’s apparently a limit on sending out emails to more than 500 people in a day – if you do that, your account gets temporarily suspended. A lot of this behavior is really because of the way Google has done it, which is basically to take Web apps and move them over to the enterprise division. I think that’s a holdover from worries about spam, whereas typically in an enterprise you don’t worry about that with employees. The workaround that Google recommends – if, let’s say, you’re sending out an email to 10,000 employees once a month – is to just set up separate accounts and then you can send out 500 per account. Which I think is not quite appropriate [for the enterprise]. At the moment there are no distribution lists, and nothing comparable to PowerPoint in terms of presentation capability, although I’m sure they’re working on that.
Storage Soup: What about records management?
Creese: They do offer email archiving via a partner, Postini. But there’s silence–I’ve asked and never received an answer–on archiving for documents and spreadsheets. So that’s a bit of a worry, because you’re thinking, okay, two years from now the SEC calls me up, and says hey, what about this? And so then let’s assume everybody’s filed tons of documents, although you may be able to get at the documents, part of electronic discovery is only giving over the pertinent ones and not handing over everything. So you can do search, but it’s not always that great. You’re kind of stuck, because Google has your documents, but there’s no possibility of Google sending a really huge file transfer so you can then take those documents and use whatever electronic discovery software you want on them. They haven’t really thought about that.
Now, to be fair, this seems to be a blind spot with a lot of software as a service applications. The emphasis has been on a service that’s quick to put up and easy to maintain and so on, but for a short while there has been a lot of legacy thinking. When I talk to software as a service companies, one of my stumper questions is often, “So how do you handle records management?” And there’s often a long silence.
Storage Soup: What about concerns about privacy and security?
Creese: In the security spectrum of things, for me the higher concern is still the records management part as opposed to the intrusion part. [These providers] do this day in and day out and have dogs and armed guards and all kinds of stuff. Salesforce.com, for example, certainly has corporate information that’s pretty valuable to people, and companies have gotten used to the idea that Salesforce is their agent and it’s not going to march out and sell their information. From an intrusion point of view, I certainly haven’t heard people worry about that.
Storage Soup: Anything else that’s related to storage that you feel is important to bring up with software as a service?
Creese: These companies will eventually get to the point where they can’t save everything. Even with storage prices dropping, as more and more corporations put their data into software as a service there’s going to be a tipping point coming, where either it starts to become expensive to save everything for the service and the service therefore raises its rates, or it’s just too difficult to find what’s there. It’s sort of like having an 8 million volume university library and no card catalog. The information is there but it’s as if it weren’t there because it’s not retrievable. I just think it’s something that enterprises should ask about. When they host this in internal software, they worry about that.
Storage Soup: But doesn’t their search engine answer that concern? Wouldn’t that be Google’s answer?
Creese: Yes, but it’s keyword search, so ultimately you run into the problem of what if you didn’t use the same term to describe the same thing? That’s what often comes up in electronic discovery, where somebody calls something one thing and someone else in the business calls it something else, and you sit there racking your brain trying to think of all the different ways somebody could refer to something. There are search engines that are much more concept-based, so for example if you have a physician saying “myocardial infarction” and a lay person saying “heart attack”, that those are viewed as documents of the same thing, even though the keywords are completely different.
The good news is that with Google getting into this, they’re going to be putting tons of resources into it, and the service will improve by leaps and bounds every month. But at the moment I would say it’s still a work in progress.
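As an aside, Creese’s “myocardial infarction” versus “heart attack” example is easy to illustrate: a concept-based engine expands the query into a set of equivalent terms before matching, while a plain keyword engine matches only the literal phrase. Here’s a toy sketch (the synonym table and documents are invented for illustration):

```python
# Toy contrast between keyword search and concept-expanded search.
# The synonym table and documents are invented for illustration.

CONCEPTS = {
    "heart attack": {"heart attack", "myocardial infarction"},
    "myocardial infarction": {"heart attack", "myocardial infarction"},
}

docs = [
    "Patient suffered a myocardial infarction last spring.",
    "No history of heart attack in the family.",
    "Routine checkup, all clear.",
]

def keyword_search(query, docs):
    # Matches only the literal query string.
    return [d for d in docs if query in d.lower()]

def concept_search(query, docs):
    # Expands the query into its concept group, then matches any term.
    terms = CONCEPTS.get(query.lower(), {query.lower()})
    return [d for d in docs if any(t in d.lower() for t in terms)]

print(len(keyword_search("heart attack", docs)))  # 1: misses the physician's wording
print(len(concept_search("heart attack", docs)))  # 2: finds both phrasings
```

Real concept-based engines use thesauri, ontologies or statistical co-occurrence rather than a hand-built table, but the retrieval gap Creese describes is exactly this one.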
TeraCloud has introduced a pay-as-you-go pricing model for its SRM software that includes the ability to run at least one level of the package for free–indefinitely.
TSF Express, newly introduced as part of this program, is an SRM tool compatible with Solaris, Linux, Windows and AIX. It provides daily data collection and ad hoc reporting capabilities through a Java interface.
Users interested in running TSF Express can go to the website and download a free trial for 90 days without entering any information. If they want to keep using the software, registration is required, but a free year-long license is available with that registration; if in a year’s time the user still wants to run the software, they have to register again, but there is still no charge.
The Express version of TSF gathers host-level storage metrics – including how many drives and volumes are assigned to a host and how many files, directories and domains it holds – with the ability to drill down into individual hosts as well as view summary reports. TSF Express can only provide historical information for up to three days.
TSF Express can be converted directly, with another set of software licensing keys, to TSF Lite, which costs $395 per month for up to 20 terabytes of managed storage and adds more detailed reporting. “And if you don’t use it,” said TeraCloud CEO Gary Tidd, “you don’t pay for it.”
TSF Lite includes longer historical reporting; launch actions, which allow users to create scripts to manipulate the environment based on the SRM tool’s findings; the ability to group servers by application; trending analysis; and a topology viewer. Both pieces of software require host agents.
The products will be available through the company’s Web site at www.teracloud.com as of tomorrow. This is TeraCloud’s second attempt to reinvent itself as an open-systems storage company, after 12 years specializing in mainframe storage before its first repositioning attempt in 2000. The company went back into “stealth mode” in 2003 to develop this latest product.
Our story on Google’s storage assistance to academic and research institutions focused on the Archimedes Palimpsest, but this article in Wired has some interesting further info on the Hubble telescope project, which we mentioned in our piece but didn’t interview anyone from.
How do you get 120 terabytes of data — the equivalent of 123,000 iPod shuffles (roughly 30 million songs) — from A to B? For the most part, the old-fashioned way: via a sneakernet. It’s not glamorous, but Google engineers hope to at least end the arduous process of transferring massive quantities of data — which can literally take weeks to upload onto the internet — with something affectionately called “FedExNet” by the scientists who use it…The near totality of all the astronomical data and images that Hubble has ever collected [is] about 120 terabytes.
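The “weeks to upload” claim survives a quick back-of-the-envelope check. Assuming, for the sake of argument, a sustained 100 Mbit/s uplink (our assumption; the article gives no figure), pushing 120 TB over the wire would take roughly 111 days:

```python
# Back-of-the-envelope transfer time for the Hubble data set.
# The 100 Mbit/s sustained uplink is our assumption, not a figure
# from the article.

TERABYTE = 10 ** 12  # bytes, decimal TB

def transfer_days(size_bytes, link_bits_per_sec):
    # Convert bytes to bits, divide by link speed, convert to days.
    seconds = (size_bytes * 8) / link_bits_per_sec
    return seconds / 86400

days = transfer_days(120 * TERABYTE, 100e6)
print(round(days, 1))  # 111.1 days at a sustained 100 Mbit/s
```

At numbers like that, loading hard drives onto a truck – the “FedExNet” – wins easily.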
Do also check out the glamour shot of Google’s open source program manager Chris DiBona posted with the article–we reckon we’ve never seen such a creative executive headshot.
In this hilarious post over at StorageMojo.com, an EMC lawyer issues a “cease and desist” order over the recent publication on the site of an EMC price list, calling it a “trade secret.” He uses some ominous language indeed in his missive, which is reprinted in full by StorageMojo blogger Robin Harris.
Now that you know the facts of the matter I expect an email from you confirming that you have examined the links and documents provided above and that you now understand that EMC’s price list is not a trade secret, despite what you were led to believe by the person who referred StorageMojo.com to you.
Also, you might want to consult with EMC’s public relations and analyst relations groups as to the advisability of continuing to press confidentiality claims against StorageMojo. The internet community – StorageMojo.com had over 100,000 visitors last month – does not take kindly to attempts to limit the free flow of information and First Amendment rights.
We have to say we also got a chuckle out of the Obi-Wan Kenobi reference.