Tory Skyers’ post about dedupe and the law jogged my memory about recent conversations I’ve had with users about data compliance and archiving. It’s become a big topic for this industry, and as stewards of data, storage managers are part of the legal e-Discovery process.
But some storage managers are beginning to draw a line when it comes to the extent of their role in that process. A discussion about compliance only goes so far these days before frustration starts to show. Someone from a municipal government shop I met at Symantec Vision last week extolled the virtues of Symantec’s Enterprise Vault for data retention and said his organization has policies for dealing with litigation. But he was clear that his role in the process involves managing bits on disk, period. “I don’t delete anything without the department that owns it giving me explicit instructions,” he said. “It’s not up to me to decide to delete data–it’s up to me to keep the storage and backups running on whatever data departments want to keep.”
This week I spoke to a storage guy from a hospital about email management and archiving, and he told me his shop deletes all email after 60 days. “We wrote policies that say we don’t keep email very long because of the storage cost,” he said, and then added that he’d been told by some vendors pushing archiving that a short enough retention period could “make him look guilty.”
“I’m not guilty of anything,” retorted the user. “I’m an IT guy trying to keep email running.”
And he’s right. As long as a company’s retention policy is clearly defined and followed scrupulously, it can be just about any length of time.
As everybody and their uncle tries to get in on selling e-Discovery products and services, new players emerge and the competition gets fiercer. It sounds to me like this is leading some vendors to use scare tactics to push sales by exaggerating how much liability the storage people have when it comes to data compliance and retention. Analysts increasingly agree that organizations of sufficient size should dedicate a liaison between IT and corporate governance to oversee policy instead of tossing legal liability onto the shoulders of IT.
The problem is, IT people remain responsible for understanding and following policies. They also may be called upon to testify as to what those policies are. While I don’t think users should have to take on the legal burden alone, I hope they’re not being pushed too far in the opposite direction, so caught up in shrugging off false expectations that they aren’t mindful of the real ones.
Data deduplication is the poster child of 2008. Everyone is rushing to add this capability to just about everything that could possibly ever sit on a network–I thought I saw an ad for a cable tester with de-dupe built in! On the face of it, de-dupe looks like the savior it's made out to be (except in a few isolated instances where it actually inflates the size of stored data, but that's another subject for another time).
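To make the mechanics concrete, here's a toy sketch of how block-level dedupe works in principle. The fixed-size blocks and SHA-256 fingerprints are my illustrative assumptions, not any vendor's actual implementation:

```python
import hashlib

def dedupe(data: bytes, block_size: int = 4096):
    """Toy block-level dedupe: store each unique block once,
    keep an ordered 'recipe' of hashes to rebuild the original."""
    store = {}   # hash -> unique block bytes
    recipe = []  # ordered hashes describing the original stream
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        h = hashlib.sha256(block).hexdigest()
        store.setdefault(h, block)  # only the first copy is kept
        recipe.append(h)
    return store, recipe

def rehydrate(store, recipe):
    """Reassemble the original data from the recipe."""
    return b"".join(store[h] for h in recipe)

# Highly redundant input: 100 identical blocks plus one different one.
data = b"A" * 4096 * 100 + b"B" * 4096
store, recipe = dedupe(data)
assert rehydrate(store, recipe) == data
print(len(data), sum(len(b) for b in store.values()))  # prints: 413696 8192
```

Note the flip side: for data with no duplicate blocks, the recipe of hashes is pure overhead, which is exactly how dedupe can inflate rather than shrink a data set.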
But let's take a deeper look, with my paranoid, curmudgeonly, semi-lawyeresque hat on.
De-dupe technology has been likened to "zip" on the fly (no pun intended), which is where I have a couple of problems while wearing my pseudo-legal hat. The first is the act of compression itself. Way back in the olden days of computing there was a product appropriately named Stacker; its purpose in life was to let you fit more onto the ridiculously expensive devices in our computers called "hard drives." Microsoft, after its licensing talks with Stac fell apart, created DoubleSpace (got sued and lost), then DriveSpace (MS-DOS 6.22).
Via the use of a TSR (even the acronym is dusty), these products would intercept all calls destined for your hard drive and compress the data before it got there. Sound familiar? Those disk compression tools had their run, and I used them, but they presented problems with memory management (this was back when Bill Gates had supposedly decided no one would ever need more than 640KB), among other things. That became a phenomenally large problem whenever I loaded up one of my favorite games of the time, Spectrum Holobyte's Falcon 3.0. Falcon fans know what sort of contortions one had to endure to free up enough lower memory to run it, but I digress.
So I would try to get around having Stacker or DoubleSpace turned on all the time. That didn’t work out well for me, and I spent quite a bit of time compressing and re-compressing my hard drive, enabling and disabling Stacker and DoubleSpace and setting up various non-compressed partitions.
While I don't see that specific instance as an issue now per se, I do have that (bad) experience, and because of it I have a problem with something sitting inline with my data, compressing it with a proprietary algorithm that I can't undo if and when the device decides it doesn't like me anymore. Jumping back 16 years, it wasn't that hard to format and reinstall DOS, which took up a small part of my (then gigantic) 160MB ESDI hard drive, to get around the problems I had. But today, when we're talking about multiple terabytes, I want to be sure I can get to my data unfettered when I need it.
The reason I am paranoid about getting access to my data when I need it: compliance and legal situations. Which brings me to my second point. How will de-dupe stand up in court? Is it even an issue? Is compression so well understood and accepted that it wouldn't even be a problem? Even as paranoid as I am, I would have to say … maybe.
Compression has been around for a very long time. We are used to it, we accept it, and we accept some of its shortcomings (ever try to recover a corrupted zip file?) and its limitations. But will that stand up in court? In today's digital world, quite a few things being decided in our court systems don't necessarily make sense. Are we sure our legislators understand the difference between zip (lossless) and JPEG (lossy) compression? How does the act of compressing affect the validity of the data? Does it affect the metadata or envelope information? The answers to these questions, while second nature for us technology folks, may not be so second nature for the people deciding court cases. Because compressing and decompressing data is a physical change to the data itself, I can imagine a lawyer trying to invalidate data on that basis.
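For what it's worth, the lossless side of that distinction is easy to demonstrate. A zlib round trip (my stand-in here for "zip"-style compression) returns the data bit for bit, which is exactly the argument one would make about its evidentiary validity:

```python
import hashlib
import zlib

# A repetitive, email-like payload compresses well.
original = b"Subject: quarterly results\r\n\r\nThe numbers are attached.\r\n" * 50
compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

# Lossless compression is bit-for-bit reversible: the restored
# bytes, and therefore their cryptographic hashes, match exactly.
assert restored == original
assert hashlib.sha256(restored).digest() == hashlib.sha256(original).digest()
print(len(original), len(compressed))
```

A JPEG round trip, by contrast, would fail that byte-for-byte assertion by design, which is why the lossless/lossy distinction matters when arguing about data integrity.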
I hope that doesn't turn out to be the case. The de-dupe products currently on the market have some astounding technology and performance. They also return quite a bit to the bottom line when used as prescribed, and for most shops the solid, quantifiable return on investment they represent outweighs any risks.
I had a technology demo Tuesday with Xiotech, where they showed off their new baby, the Emprise storage system. A technology demo might seem like a worse fate than death to most, but I appreciate the opportunity to get out from behind my phone and computer screen and actually see things in the flesh (or silicon, as it were).
Xiotech's reps showed me a pre-recorded demo of the Emprise self-healing process, including automated power cycling on a drive and the process of copying data off a drive to the others in its DataPac storage unit, remanufacturing the drive, bringing it back online and restriping the data. Lots of blinky lights and bar graphs of I/O going up and down.
To say Xiotech officials are excited about Emprise would be a vast understatement. But in the midst of discussing power supply and airflow designs, SCSI command sets and their varying quality from device to device, future storage media such as solid state drives, and parallelized application performance, a little light bulb suddenly went off in the back of my mind.
“What ever happened to Daticon?” I asked. I’ll admit it was something of a non sequitur but it occurred to me at random.
There was a pause. The marketing communications guy looked at CTO Steve Sicola; Sicola looked back at the marketing communications guy. "Well, there was a press release last week…"
Last week I was dead to the world beyond Symantec, but it doesn't appear this press release was heavily broadcast, either: as of June 6, Daticon has been sold to Electronic Evidence Discovery Inc. (EED). According to Xiotech director of marketing communications Bruce Caswell, "the opportunity to buy [Seagate's Advanced Storage Architecture (ASA) group] came to light about a year ago, and we had two opportunities to pursue: e-Discovery and storage. We had to decide what we really wanted to pursue."
He added, “that’s why we announced some evidence management solutions with Daticon and then sort of went dark.”
Xiotech also went dark for about nine months before the Daticon acquisition. At the time, Mike Stolz, vice president of corporate marketing, said "adding this functionality gets us out of day-to-day combat with EMC and IBM…evidence management and data discovery evolve around the storage system but at a higher level." That made it appear Xiotech would transform from a general storage array vendor into an e-Discovery specialist.
Now Xiotech appears to be putting all of its resources into the Emprise and its relationship with drive vendor Seagate, which owned Xiotech at one time and remains the sole drive supplier for the Emprise (it has to be for the drive diagnostic firmware to work). Generally, array vendors use more than one drive manufacturer to force better pricing and overcome manufacturing anomalies, which crop up from time to time for particular suppliers.
Sicola says Xiotech has a contract with Seagate designed to keep raw material costs competitive, but otherwise Xiotech makes no apology for slightly more expensive components, which also include fans and power supplies engineered to use the same bearings as the disk drives, cutting down on vibration within each DataPac. Xiotech argues that spending more on better parts cuts down on failure rates, SCSI errors and service costs. "You can build a better mousetrap, but you need better parts," he told me today.
As for manufacturing anomalies affecting whole batches of disk drives, “even when they reach epidemic proportions, they affect 10% of the product on the market,” Sicola said. “Problems with vibration, cooling and bad controller software make them worse–we want to fix that stuff by getting down to clean code.”
What do you think? Does that approach sound risky, or clever? Does Emprise seem like another false start a la Daticon, or is it really the next big thing for Xiotech?
Better late than never. Backup software vendor Atempo has ventured into the email archiving market by coming out with the first full integration into its product line of intellectual property it acquired with Lighthouse Global Technologies in February.
Obviously, the release of an email archiving product is hardly earth-shattering. That market is headed for maturity very rapidly. Atempo knows this, which is why it actually released its email archiving software, called the Atempo Digital Archive for Messaging (ADAM), after releasing its file archiving software (ADA).
The first edition of ADAM will be integrated with ADA from the get-go. Atempo has also included features that not all of its predecessors have, such as message stubbing and support for Lotus email. But it’s the file archiving integration, according to Atempo’s VP of marketing Karim Toubba, “that shrinks a competitive landscape of more than 20 players down to just a few.”
In another attempt to differentiate ADAM, Atempo is using search from Exalead, rather than the more commonly used FAST or open-source search engines. This allows for automated retention according to message header info for e-Discovery.
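Exalead's internals aren't spelled out here, but the general idea of header-driven retention can be sketched with Python's standard email library. The 60-day window and the function name are hypothetical, chosen only to illustrate the kind of rule involved:

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

RETENTION = timedelta(days=60)  # hypothetical policy window

def should_expire(raw_message: str, now: datetime) -> bool:
    """Decide expiry from the Date: header alone -- the kind of
    header-driven retention rule described above, not Exalead's
    actual implementation."""
    msg = message_from_string(raw_message)
    sent = parsedate_to_datetime(msg["Date"])
    return (now - sent) > RETENTION

raw = "Date: Mon, 02 Jun 2008 10:00:00 +0000\r\nSubject: test\r\n\r\nbody"
now = datetime(2008, 9, 1, tzinfo=timezone.utc)
print(should_expire(raw, now))  # True: the message is older than 60 days
```

A real archiving product would of course apply such rules per mailbox or per litigation hold rather than globally, but the decision input is the same envelope metadata.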
“The downside for Atempo is that their brand is associated with Apple and the Mac,” said Enterprise Strategy Group analyst Brian Babineau, referring to Atempo’s TimeNavigator backup software. “On the positive side, they have a strong European and channel presence.”
The march of 2.5-inch SAS drives into networked storage took another step today when Dell launched its PowerVault MD1120 storage expansion enclosure, which is designed around the small form factor drives.
The MD1120 expansion enclosure isn't a SAN, but a JBOD that connects to Dell's PowerEdge servers. And it's not the first external storage system with 2.5-inch SAS drives — Infortrend has been shipping one since January. But Dell will obviously drive a lot more adoption than Infortrend, and Dell execs expect 2.5-inch SAS drives to co-exist with 3.5-inch drives in SANs before long.
“We see 3.5-inch drives being relevant for a long time in external storage, with 2.5-inch becoming a relevant complement in the next few years,” said Howard Shoobe, Dell’s senior manager of storage product management.
Small form factor drives allow for denser enclosures and reduce power consumption, but capacity is the main inhibitor for their inclusion in enterprise SANs. The new Dell enclosure holds 24 drives that are either 10,000 RPM 146GB or 15,000 RPM 73GB models. Shoobe expects the tipping point to come when 300 GB SAS 2.5-inch drives are shipping. Seagate has announced a 300 GB 2.5-inch drive that should begin shipping in systems later this year. Shoobe says Dell will incorporate them into the MD1120 when they’re available.
“The capacity we offer today will double, and that’s the trigger point,” he said.
Dell isn’t giving a forecast on when we might see 2.5-inch drives in its EqualLogic PS iSCSI SANs, and certainly not for the Clariion systems it co-markets with EMC. But even if it takes longer than expected to show up in enterprise SANs, Dell sees 2.5-inch SAS helping to give a new life to DAS because of the small form factor and coming bump from 3 Gbps to 6 Gbps.
“We have invested in DAS while the rest of the industry has been abandoning it,” said Praveen Asthana, global director of storage and networking for Dell. “DAS boxes are becoming more capable, especially with SAS. Why are we getting excited about a DAS announcement? It’s big business for us, and it’s growing.”
It never ceases to amaze me how easy it is sometimes to turn grown men into kids again. IT geeks gathered at the robotics competition at Symantec Vision actually giggled with delight at the mechanical violence.
The Geek Squads battled it out.
Copan has enhanced its Revolution 300 Series to beef up its virtual tape library (VTL) capabilities and to try to keep distance between the MAID pioneer and those who have followed with disk spin-down products.
The most interesting Revolution enhancements involve the data deduplication Copan added late last year for its VTL via an OEM deal with FalconStor. Now Revolution customers can set up a 40-drive cache landing zone, which supports more than 1,000 concurrent data streams. Up to 40 drives will run separately from the MAID pool, so those drives always spin and increase the ingestion rate while deduping. The cache will spin down with the rest of the drives after it finishes ingestion.
Copan also added a hot standby deduplication option that provides a spare dedupe engine that replaces a failed unit for high availability deduping.
Other enhancements include support for 1 TB SATA drives that bring maximum capacity to 896 TB in a single frame, data shredding to destroy tape data and tape caching to automate moving data from the VTL to physical tape.
Copan was the first to deliver MAID systems in 2004, but rivals Nexsan, Hitachi Data Systems, and EMC have since come out with their own spin-down products. But Copan CTO and founder Chris Santilli says there is more to MAID than spin-down, and the new enhancements make Copan's MAID more enterprise-ready than the competition's.
“MAID does not equal spin down,” Santilli said. “There’s more to it than saving power by spinning down drives. Enterprise MAID is a combination of density and reliability, and adding software services and features. We ingest as fast as we can, stage the data on MAID, and now we can do dedupe, replication, and encryption on the data.”
Analyst Mark Peters of Enterprise Strategy Group agrees the caching and other enhancements make MAID more valuable than merely spinning down disks.
“Caching shows they understand there are people who want to get a lot of data in their system fast,” Peters said. “If 25 percent of your disks are doing something else, that creates a problem. They created a side stream to the main river. This is a special section where you can keep the drives on.”
At a Storage Foundation and Veritas Cluster Server roadmap session at Vision on Thursday, a Symantec exec revealed that the company will be coming out with its own clustered NAS system, based on the next generation of its Storage Foundation Scalable File Server. This will be accomplished by layering a NAS personality onto Symantec's existing clustered file system.
“We’re going to leverage our file system know-how to deliver next generation object storage for cloud computing,” said Rob Soderbery, senior vice president of the storage and availability management group.
The system will mostly be used as the back end for Symantec Protection Network SaaS offerings, but will also be available to service-provider customers, according to Soderbery. Currently called Symantec Secure Scalable Storage (S4), the new system is slated for an alpha later this year, beta early next year and live availability for SaaS in mid-2009.
By putting S4 behind its backup SaaS, Soderbery said, Symantec would be able to offer users online access to files backed up through SPN or the Backup Exec SPN integration. “The backup use case would blur with the storage SaaS use case,” he said.
Other roadmap items for Storage Foundation and Veritas Cluster Server highlighted in the presentation:
- Heterogeneous clustering between server types, OSes and physical and virtual servers with the rollout of the new VCS One product later this year. This builds on an early adopter version of the product released last year called Veritas Application Director. VCS One will use a policy master, and the goal is to support up to 256 mixed OS nodes for multi-tiered application-based HA and DR.
- Change management through Command Central Storage, also due out later this year. In addition to both proactive and reactive change management analysis for the primary storage environment, the product will also track the impact of changes to the DR plan, and allow for policy-based enforcement of configuration standards.
- Symantec will also be rolling out Veritas Operations Services. These services are Web-based configuration management offerings. SFPrep, a utility that checks OS versions, patches, etc. when Storage Foundation is installed, is in beta testing. It will also allow users to submit “goal builds” for review by Symantec’s engineers, who will tell them whether they’ll work or not, and offer remediation if they won’t. “We want to cut out the cycle of deploy, problem, fix,” Soderbery said. He added that out of 800 configurations that had been submitted so far, 25% were problematic.
Ease of use and solving compatibility issues were themes among the roadmap/user feedback sessions at the conference Thursday. Users in a NetBackup roadmap session asked for common management tools to be made available for NetBackup and Backup Exec in mixed environments, and for integration with Active Directory and LDAP.
In another session on upgrades to NetBackup 6.5, users and analysts praised a newly available upgrade process for media servers through LiveUpdate. Previously, updates were only made to the master server through the utility. However, users said they hoped that in the future, LiveUpdate could be delivered as a preconfigured virtual appliance, rather than requiring users to set up a separate physical or virtual host on their own to run it.
After Symantec confirmed its acquisition of backup SaaS partner SwapDrive on Wednesday, I sent out some questions to Symantec. Here are the responses I got back from a spokesperson:
How will Symantec integrate SwapDrive into Symantec Protection Network?
SwapDrive offerings are focused on consumers while the Symantec Protection Network is focused on the needs of small and medium businesses. Symantec will continue to offer both. The needs of consumers and businesses can be quite different. We expect, however, that consumers, businesses, and partners will all benefit from knowledge sharing that will take place between the SwapDrive and SPN teams.
Is it true that SwapDrive doesn't back up open files?
SwapDrive accommodates open files differently across its various implementations. SwapDrive is designed to back up open files by automatically shutting them down, backing them up, and then re-opening them transparently to the end user. Further, SwapDrive will implement other "open file" backup techniques as partners request them.
Will SwapDrive add a Mac client?
SwapDrive's web-based applications, such as SwapDrive File Sharing and WhaleMail, work on all major platforms, including the Mac. For example, there are many Mac WhaleMail users.
SwapDrive’s pricing for 2 GB / year is $50–EMC’s Mozy offers this amount for free. Any plans to change that pricing?
SwapDrive’s current online pricing will keep pace with the market and the value derived. Our service is more robust and redundant than many others offered in the market today. We will constantly innovate and price for the market and value we provide. Services included in some of our products (ex. WhaleMail for sending large files) are not offered by other low cost providers.
SwapDrive also supports numerous partners who offer storage to consumers via different arrangements. For example, Norton 360 includes 2GB of storage as part of the product's purchase price (MSRP $79.99).
In keynotes and 1:1 executive briefings at Symantec Vision this week, Symantec officials have opened the kimono about plans for future integration of their products, including integration of software pieces from storage and security units.
It won't be product integration per se, as has been done with the individual backup products NetBackup and PureDisk, because changes to product-level code across different disciplines of IT could be an impediment to adoption for users, according to CTO Mark Bregman. "That kind of integration suggests things bolted together, and our approach will be to let the separate products talk to each other," he said.
Symantec will use IP acquired in its Altiris acquisition, extracting data through Web Services standards and APIs for legacy applications. Some products are already shipping with the necessary Web Services support today, such as Symantec Endpoint Protection and Backup Exec System Recovery (BESR). The overall integration platform, which will be included free in products going forward, is referred to as the Open Collaboration Architecture, or OCA, within Symantec. A small team of engineers has been assigned to build the connectors to it for each product. Currently, applications that support it like BESR can issue reports through the architecture, but taking action is still a ways off, Bregman said.
One of the use cases for OCA discussed by execs is endpoint virtualization using a combination of IP from Symantec's AppStream and Vontu acquisitions. The technology is tangential to storage, but will be Symantec's way of addressing what it calls the consumerization of IT–i.e., the use of mobile devices for work and personal computing. According to Enrique Salem, AppStream's application streaming software would make an application and its data temporarily available on a mobile device. To keep corporate data from floating around on personal devices, Vontu's data loss prevention software would track data created on the mobile device and clean it up once the AppStream session is over. Vontu's software could also prevent users from forwarding sensitive corporate material elsewhere. OCA underpins all of this, and Salem said Symantec is rolling out the pieces today. It remains unclear whether Symantec will prepackage this as a storage security application for mobile devices, but Salem said it can be put together today through Symantec's professional services for users who want it.
Users around the show say it’s a nice idea, but there are improvements they’d like to see to Symantec’s existing frameworks first. Users of Symantec’s OpenStorage API integration with Data Domain say there’s room for improvement there–with the current version, users must manually select a replicated copy of data from a secondary site in the event of an outage.
Elsewhere, a storage director for a financial company said he’d like to see summary reports from Symantec’s management products that put data into business terms like RTO and RPO (Symantec CEO John Thompson said Symantec would prefer to hook OCA into third party reporting tools like Crystal Reports, because “one organization’s great report is another organization’s pain in the butt.”). This user also said he’d like to see more product-level integration. “Anything with catalog integration is high on our wish list,” he said.