Conversations with IT people about long-term archiving usually begin by focusing on a specific storage device, and then it quickly becomes apparent that much more is involved. Addressing a long-term archive is a complex issue that requires education to understand. There is no single silver-bullet product.
The technology discussions include devices/media for storing data and the storage systems and features utilized. Storage systems that automatically and non-disruptively migrate data from one generation of a system to another are key to long-term archiving. I use the analogy of passing the baton along in a relay race.
The information maintained in an archive is another key consideration. Information is data with context, where the context is really an understanding of what the data is, what it means, and what its value is. Maintaining information over time requires applications that understand the information, devices that can read the information, and a method for determining when the information no longer has value as part of a data retention policy. Kicking the can of information down the road for years when it has no value makes no sense.
The ability to read and understand the information years into the future is another major concern for long-term archiving. Without applications that do this, the issue of addressing long-term archiving becomes moot. I try to divide the problem into two parts. The first is defining information that is “system of record” where the data must be processed by the application to produce results. The simplest example of this is business records that produce reports, statistics, or other numbers. In this case, there must be a linkage between the information and the application.
If the application changes or is replaced, then the information also must be carried along with translation so the new app understands it. If not, the information no longer has value.
The second part of the application issue concerns information that needs to be viewable in the future where no application is needed. This case is handled by putting the information in a viewable format that will persist for a long time. Today that would be a PDF document. At some point that may change and the PDF documents would have to be translated or transformed for the new viewable format, once again requiring a linkage between the information and application.
You must address all of these points for a long-term archive to achieve its goal of making information available and readable when it’s needed.
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).
The Storage Networking World (SNW) conference was disrupted Tuesday for a couple of hours when a tornado hit the Dallas-Fort Worth area in Texas. The day started out with cloudy skies, but in the afternoon a siren went off throughout Dallas, which was the first sign that something was amiss.
I was sitting on the outside balcony at the Omni Hotel and Resort in downtown Dallas interviewing an executive from Mezeo Software when news started circulating that a tornado was in the area. It didn’t take long before SNW attendees started heading for the windows to watch the strange swirl of clouds in the distance. Hotel personnel quickly started to order everyone on the balcony to move back into the hotel and away from the windows. But not everyone was willing to miss the chance to see a tornado hit the city. Many people kept going back to the windows, pulling out their phones and taking pictures.
One SNW attendee, a meteorologist who has been chasing storms for 10 years, started arguing with hotel employees because he wanted to watch the cloud movement from a window while also tracking its progress from his iPad.
The exhibit hall was closed and sessions were canceled or delayed for at least an hour while everyone waited for the tornado to pass. A heavy rainstorm followed after the dark, swirling clouds lifted, giving attendees plenty to talk about besides storage when the conference resumed.
This wasn’t the first time SNW was held at the site of nasty weather. The 2005 fall SNW was disrupted by a hurricane in Orlando, Fla., that prevented many would-be attendees from making it to the show. Fall SNW returned to Orlando last fall for the first time since the hurricane (Orlando had only been the site for the spring show in recent years). The fall SNW in October 2012 will be in Santa Clara.
Hitachi Global Storage Technologies, now part of Western Digital, today launched the first 4 TB enterprise hard drive.
The Ultrastar 7K4000 is a 3.5-inch 7,200 rpm SATA drive with a 2 million hours mean time between failure (MTBF) and five-year warranty. Current SATA enterprise drives top out at 3 TB, and HGST’s main enterprise drive rival Seagate has not yet released a 4 TB drive.
HGST VP of product marketing Brendan Collins said he sees the larger drives as a boon for big internet companies and cloud providers because they allow organizations to pack in 33% more capacity than they can now while reducing power by 24%.
“If you’re a massive data center running out of space and you have to react to petabyte growth, one way of doing that is replacing 3 TB drives with 4 TB,” he said.
OEM partners are qualifying the drives, and Collins said he expects them to ship in volume around the middle of the year. But some vendors may hold off shipping due to the transition to the new Advanced Format 4K hard drive sectors. In moving from 512-byte sectors to 4,096-byte sectors, Advanced Format handles large files more efficiently and improves data integrity. However, server and storage vendors must rewrite their software to support the new format.
The Ultrastar 7K4000 is known as a 512e (emulation) drive because it is configured with 4,096-byte sectors and 512-byte firmware that allows software written for the older format to work with the new drive format. However, there will be performance degradation during the translation process and Collins said some storage vendors might wait until native 512-byte versions are available later this year before shipping the drives.
“Storage system vendors design their own file systems,” Collins said. “Some are ready [for 4K] and can drop it in immediately with no impact. If they’re not ready, they can wait for the native [512-byte] version.”
Collins expects the largest storage vendors to use the 512e drives. He also said HGST will likely have a SAS version of the 4 TB drive later this year.
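The translation penalty of 512e emulation described above can be illustrated with a short Python sketch. It assumes the general behavior of 512-byte emulation on 4,096-byte physical sectors: a write that does not start and end on a physical-sector boundary forces the drive to read the affected physical sector, merge in the new data, and write it back. The function names and example addresses are hypothetical and chosen for illustration; nothing here is taken from HGST's firmware.

```python
LOGICAL_SECTOR = 512      # bytes per sector as seen by software written for the older format
PHYSICAL_SECTOR = 4096    # bytes per sector on the Advanced Format media
RATIO = PHYSICAL_SECTOR // LOGICAL_SECTOR  # 8 logical sectors per physical sector


def needs_read_modify_write(start_lba: int, sector_count: int) -> bool:
    """Return True if a write of `sector_count` logical (512-byte) sectors
    starting at logical block address `start_lba` only partially covers
    at least one physical (4,096-byte) sector."""
    if start_lba < 0 or sector_count <= 0:
        raise ValueError("start_lba must be >= 0 and sector_count must be > 0")
    end_lba = start_lba + sector_count  # exclusive end of the write
    return start_lba % RATIO != 0 or end_lba % RATIO != 0


def physical_sectors_touched(start_lba: int, sector_count: int) -> int:
    """Count the physical sectors a logical write overlaps."""
    first = start_lba // RATIO
    last = (start_lba + sector_count - 1) // RATIO
    return last - first + 1


if __name__ == "__main__":
    # (start LBA, length in 512-byte sectors); values are illustrative only
    for start, count in [(0, 8), (63, 8), (0, 1), (8, 16)]:
        print(start, count,
              "read-modify-write" if needs_read_modify_write(start, count) else "aligned",
              physical_sectors_touched(start, count))
```

In this model, a 4 KB write at logical address 0 maps to exactly one physical sector, while the same write at logical address 63 (a common starting offset for older partition layouts) straddles two physical sectors and needs the extra read step in both.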
Violin Memory today picked up another $50 million in funding and a new strategic partner in SAP. If the market cooperates, it will be the last funding round before Violin follows its solid-state storage rival Fusion-io to an initial public offering (IPO).
Violin has pulled in $150 million in funding since former Fusion-io CEO Don Basile became Violin’s CEO in late 2009. Basile said Violin has grown from 100 employees to 320 since last June, and sales increased 500% over the last year. He puts Violin’s valuation at $800 million, which is probably more than 10 times its annual revenue.
Violin likes to bring in funding money from strategic investors as well as venture capitalists. Violin’s NAND supplier Toshiba has been an investor since the first funding round, and joined SAP as the largest investors in this latest round. Previous investor Juniper Networks and newcomer Highland Capital Partners were other investors in today’s round.
“We’re getting in the habit of this,” Basile said after closing his fourth funding round at Violin. “At the end of last year we considered the public market, but our bankers weren’t sure if the public market was open in the first quarter of 2012, so we took a mezzanine round. This gives us money to grow and operate regardless of market decisions.”
Violin sells all-flash storage arrays and caching appliances. Basile points to EMC’s VFCache PCIe caching product and its plans for a Project Thunder flash-based shared storage appliance that will compete with Violin as proof that the enterprise flash market is poised to take off.
By Basile’s count, there are at least 30 companies selling all-flash arrays now, although he said Violin mostly competes with traditional storage vendors offering solid-state drives mixed in their hard drive arrays. Solid-state storage companies raised more than $300 million in funding in 2011, and have also been prime acquisition targets.
Violin acquired the assets of Gear6 in 2010, and turned the technology into its vCache NFS Caching product. Basile said some of Violin’s latest funding may be used for small acquisitions to enhance its product line. “We’re an active reviewer of companies,” he said. “Expect us to acquire things that make sense to buy rather than engineer from the ground up.”
Disaster recovery in the cloud is improving by the day.
At least three vendors upgraded services in the past week, concentrating on faster recovery for small enterprises and SMBs.
EVault added a four-hour option for its EVault Cloud Disaster Recovery Service (EVault CDR) to go with its previous 24- and 48-hour SLA options. EVault is promising to have applications on the four-hour SLA up and running within that window.
EVault president Terry Cunningham said four hours is the magic number to gain critical mass for his company’s cloud DR service because it opens the door for heavily regulated businesses that cannot stand long outages for critical systems.
“This opens up the whole market for us,” Cunningham said. “One customer said, ‘When you deliver four hours, you get all our business.’”
He said the technology is available for more granular snapshots and shorter backup windows, making the four-hour SLA possible. The EVault service includes a minimum of one DR test per year, and customers can choose different SLAs for different applications. They can use the four-hour recovery for critical apps, and the longer recovery options for others. He declined to give exact pricing because it is set by EVault’s distribution partners, but the four-hour SLA costs twice as much as the 24-hour option.
EVault, owned by Seagate, changed its name back from i365 to EVault last December.
Not everyone is so impressed with four-hour recovery. QuorumLabs promises instant recovery with its new Hybrid Cloud Disaster Recovery service that lets customers install one of the vendor’s onQ appliances on site and replicate to another appliance at a QuorumLabs’ off-site data center.
QuorumLabs’ hybrid service keeps up-to-date virtual clones of critical systems that run on the appliance or in the cloud. The service builds new recovery nodes continuously and the vendor says the cloud appliance can take over for failed servers with one mouse click.
“Compared to our offering – ready in minutes, tested daily – [four-hour recovery] is like a pizza delivery guaranteed to arrive sometime in the next several days,” QuorumLabs CEO Larry Lang said.
QuorumLabs already has customers who set up DR by installing appliances at two locations, but not all of its customers have a second site. “If something were to happen, we bring up an exact copy of that server in your cloud,” Lang said. “Users just redirect their client to the cloud. Literally in an hour they can have something up and running.”
QuorumLabs’ service is priced by the number of servers and the amount of data protected. Lang said a customer with 10 servers and 3 TB would pay about $20,000 per year.
Zetta also upgraded its cloud backup and DR service. Zetta’s DataProtect 3.0 uses the ZettaMirror software agent on the customer site and synchronizes data to one of the vendor’s cloud data centers. The latest version adds support for Apple desktops and laptops as well as Microsoft SQL Server and Windows system state, improves performance with compression and a metadata cache, and allows snapshots of synched data.
EVault’s Cunningham said the cloud’s role in data protection has made the business more competitive. He said customers are re-evaluating their backup and DR processes and find it easier to switch vendors.
“It used to be that when you made a backup deal, it was for life,” he said. “We used to sell you some software and say ‘Good luck with that, hope it works out.’ Today it’s a service. We have to earn the business every month.
“The customer has more options for switching now. There are some technical challenges, but you can do it. If vendors screw up, they lose the customers.”
Atlantis Computing today launched Atlantis ILIO for Citrix XenApp, which helps reduce I/O and latency problems often associated with application virtualization. The product runs on a VMware vSphere hypervisor and is aimed at customers planning to virtualize XenApp 6.5 with Windows Server 2008 R2.
The new product is built on the same codebase as Atlantis ILIO for VDI, but this new version is targeted at customers deploying application virtualization. Atlantis ILIO helps eliminate I/O bottlenecks because it processes I/O locally within the hypervisor’s memory. It does inline deduplication to reduce the amount of data hitting the NAS or SAN.
Atlantis ILIO for XenApp is a virtual machine that is deployed on each XenApp server and creates an NFS datastore that acts as the storage for the XenApp VMs running on Windows Server 2008 R2.
“We correct the problem the way we do with VDI,” said Seth Knox, Atlantis’ director of marketing. “All duplicate storage traffic is generally eliminated before it’s sent to the storage.”
Torsten Volk, senior analyst for Enterprise Management Associates, said Atlantis ILIO for XenApp helps optimize performance because it sequentializes and dedupes the I/O traffic. He also said support for XenApp will broaden Atlantis’ market substantially.
“There is a much larger customer base for Citrix XenApp compared to the VDI market and only minimal changes to the Atlantis ILIO codebase were required to accommodate XenApp,” Volk said. “Not many are using VDI because the ROI is still unclear, but XenApp is a well-liked and vastly adopted platform that has provided tremendous customer value for over a decade.”
Knox said there are customers who ask for both products, but agreed there will be more demand for ILIO for XenApp.
“There is a much larger install base of people using XenApp,” Knox said. “Many of our customers use both VDI and XenApp, so they asked us to do a version for XenApp.”
According to the vendor, Riak CS lets customers store and retrieve content up to 5 GB per object, is compatible with the Amazon S3 API, has multi-tenancy features, and reports on per-tenant usage data and statistics on network I/O. Pricing for Riak CS starts at $10,000 per hardware node, which comes to about 40 cents per GB for a 24 TB node.
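The per-GB figure quoted above follows from simple division, sketched here in Python. The function name is hypothetical, and the sketch assumes decimal units (1 TB = 1,000 GB), which the source does not specify; it also counts only the software license price, not the cost of the hardware node itself.

```python
def cost_per_gb(node_price_usd: float, node_capacity_tb: float,
                gb_per_tb: int = 1000) -> float:
    """License cost per GB for one node (decimal units assumed by default)."""
    if node_capacity_tb <= 0:
        raise ValueError("node_capacity_tb must be positive")
    return node_price_usd / (node_capacity_tb * gb_per_tb)


if __name__ == "__main__":
    # $10,000 per node and a 24 TB node, as quoted in the text
    print(f"${cost_per_gb(10_000, 24):.2f} per GB")
```

With the quoted numbers this comes to roughly 42 cents per GB, in line with the "about 40 cents" in the text.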
Riak CS is Basho’s second software application. Its Riak NoSQL database is based on principles outlined in the 2007 Amazon Dynamo white paper. While Riak is an open source application, Riak CS is not. Basho added multi-tenancy, S3 API compatibility, large object support and per tenant usage, billing and metering to Riak CS to make it a cloud application.
“We look at ourselves as an arms dealer of Amazon principles [outlined in the 2007 Amazon Dynamo distributed white paper],” Basho CMO Bobby Patrick said. “Riak CS is for large service providers looking for scalability and tenancy, and also large companies that want S3 without AWS [Amazon Web Services]. This is S3-compatible, but for a private cloud.”
He said several large multinational companies are evaluating Riak CS as a method of keeping important data in-house behind a firewall.
Riak CS is built to run on commodity hardware. Patrick said it will compete mainly with OpenStack Swift object storage, but it will also face competition from EMC’s Atmos and software from smaller vendors such as Scality Ring and Gemini Mobile Cloudian.
“Any hosting company, any telecom company, any infrastructure-as-a-service company, is going to have to evolve from expensive shared storage to cloud storage for economic scale benefits,” Patrick said. “A new architecture is needed for that. They need to do it on cheap commodity hardware and in a way they can manage it.”
DataDirect Networks (DDN) launched two storage systems for people who want to start small in their approach to “big data.”
DDN is known for storage systems that deliver extreme performance and capacity but also carry large price tags. To try to broaden its market, the vendor this week introduced lower-priced arrays, including one that starts at $100,000 under introductory pricing that runs until the end of June.
“We found there are a lot of customers and prospective customers looking to start with DataDirect at a lower price and form factor while benefitting from scalability,” DDN marketing VP Jeff Denworth said.
The new systems are the DDN SFA10K-M and SFA10K-ME. The 10K-M scales to 720 TB with InfiniBand or Fibre Channel networking and with SAS, SATA or solid-state drives (SSDs). Customers can upgrade the 20U system to the larger SFA10K-X.
The SFA10K-ME is the same hardware as the 10K-M, but can be bundled with DDN’s GridScaler or ExaScaler parallel file systems. The promotional $100,000 price is for a SFA10K-M with eight InfiniBand ports, a 60-slot disk enclosure, and 16 GB of mirrored cache.
DDN says its new systems cost 40% less with a 57% smaller form factor than its larger SFA storage arrays.
“The news of dramatically smaller footprints and reduced-cost SFA entry points is not what we’re used to hearing from a company that is accustomed to extending the scalability and performance envelopes of big data applications,” Taneja Group analyst Jeff Byrne wrote of DDN’s new systems in a blog on the Taneja web site.
Denworth said the new systems fill a gap in DDN’s product line between the S2A6620 midrange storage for media/entertainment and high performance computing and the SFA10K-X high-bandwidth petabyte capacity platforms.
“Customers can grow the system as requirements and budget dictates,” Denworth said.
SFA10K-M customers can upgrade to DDN 10K or SFA12K systems, but they would have to take the systems offline. There are no non-disruptive upgrades.
How long does an organization keep a storage system? That depends on a few things. For disk systems, there are several driving factors:
• The length of the warranty period and the cost of a service contract after the warranty period.
• The depreciation period on the system.
These factors usually lead organizations to plan on four or five years before replacing their disk storage system.
For tape systems that use LTO technology, IT generally looks at how long new tape drives can be purchased to read their existing tapes. Each new generation of LTO tape drive can read tapes created on the two previous generations. The period for replacing tapes (meaning migrating the data on those tapes to a new generation) therefore depends on how quickly new LTO generations are released. It usually takes around seven years to reach the first generation of drive that can no longer read a given tape.
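The read-compatibility rule above can be expressed as a short Python sketch. It assumes the rule exactly as stated in the text (a drive reads its own generation and the two before it) and that this holds for the generations in question. The function names are hypothetical, and the 2.5-year release interval in the usage example is an illustrative assumption, not a figure from the source.

```python
def readable_tape_generations(drive_generation: int) -> list[int]:
    """Tape generations a drive can read under the rule stated in the text:
    its own generation plus the two previous ones."""
    if drive_generation < 1:
        raise ValueError("drive_generation must be >= 1")
    oldest = max(1, drive_generation - 2)
    return list(range(oldest, drive_generation + 1))


def first_drive_that_cannot_read(tape_generation: int) -> int:
    """First drive generation that can no longer read the given tape."""
    if tape_generation < 1:
        raise ValueError("tape_generation must be >= 1")
    return tape_generation + 3


def years_until_unreadable(years_between_generations: float) -> float:
    """Time from a tape generation's release until the first drive generation
    that cannot read it, assuming a constant release interval (an assumption)."""
    return 3 * years_between_generations


if __name__ == "__main__":
    print(readable_tape_generations(5))      # generations a generation-5 drive reads
    print(first_drive_that_cannot_read(3))   # first drive that cannot read generation-3 tape
    print(years_until_unreadable(2.5))       # illustrative interval of 2.5 years
```

Under these assumptions, a release every two to three years puts the migration deadline at six to nine years after a tape generation first ships, which is consistent with the "around seven years" cited above.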
When I discuss the limited lifespans of storage systems with contemporaries of mine in other technology disciplines, they find it hard to believe how short those lifespans are. They usually say that, with the amount of investment made, a storage system should be kept for at least 10 years.
They understand the shorter lifespan better when I explain the pace at which storage technology changes and the benefits of more frequent updates. These include:
• Greater efficiency in power, space, and cooling with new, higher capacity devices
• Improved performance with system support for solid-state technology
• New warranty periods for new storage systems rather than relatively expensive maintenance contracts for storage systems past their warranty period
• Improved reliability for new systems.
The discussion then shifts to how difficult it is to move to a new storage system, mainly because of data migration. Some storage systems automatically migrate data from an older storage system, especially if the migration is between different generations of the same system. If the migration is not transparent and automatic, it costs more to move to another generation of disk storage.
It gets more complicated when switching to another vendor or a different architecture from the same vendor. The new system may require administrators to provision and manage the storage differently than the old system. Administrators must understand the differences, learn new tools or administrative interfaces, and set up new procedures to monitor and respond to issues. These add to the acquisition cost when calculating TCO (Total Cost of Ownership) and pose a potential risk before being effectively implemented.
IT teams would obviously like a longer lifespan for storage systems, but changes in technology skew the tradeoffs toward replacement at regular intervals. And as technology progresses, there may come a point at which longer-lifespan systems have greater economic advantages than they do now.
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).
Do you ever wonder how long it would take to move a dozen terabytes from one cloud provider to another, or even between two accounts in the same cloud?
Probably not, if you’re sane. But maybe you do if you have data in the cloud and think you might want to switch one day for performance or pricing reasons. And you definitely do if you’re a cloud storage vendor that promises service levels that might require non-disruptive cloud-to-cloud migration.
Nasuni fits in that last category, so the vendor conducted extensive testing of what it considers the top three cloud providers based on the stress testing it conducted last year. The latest results are published in its Bulk Data Migration in the Cloud report issued today.
In case you were wondering, here’s how long Nasuni estimates it would take to migrate a 12 TB volume:
• Amazon S3 to another Amazon S3 bucket: Four hours
• Amazon S3 to Microsoft Windows Azure: 40 hours
• Amazon S3 to Rackspace: Just under one week
• Microsoft Windows Azure to Amazon S3: Four hours
• Rackspace to Amazon S3: Five hours
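To put the estimates above in perspective, the following Python sketch converts each one into an implied average throughput. It assumes decimal units (1 TB = 10^12 bytes) and treats "just under one week" as a full 168 hours, so the Rackspace figure is a slight underestimate. The function and variable names are hypothetical; only the 12 TB volume size and the durations come from the text.

```python
BYTES_PER_TB = 10**12   # decimal terabyte (assumption; the source does not specify)
BYTES_PER_MB = 10**6


def implied_throughput_mb_s(volume_tb: float, hours: float) -> float:
    """Average throughput in MB/s needed to move `volume_tb` in `hours`."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return volume_tb * BYTES_PER_TB / (hours * 3600) / BYTES_PER_MB


# Durations as listed in the text, in hours
ESTIMATES_HOURS = {
    "Amazon S3 -> Amazon S3": 4,
    "Amazon S3 -> Windows Azure": 40,
    "Amazon S3 -> Rackspace": 168,   # "just under one week", so this is an upper bound
    "Windows Azure -> Amazon S3": 4,
    "Rackspace -> Amazon S3": 5,
}

if __name__ == "__main__":
    for route, hours in ESTIMATES_HOURS.items():
        rate = implied_throughput_mb_s(12, hours)
        print(f"{route}: about {rate:.0f} MB/s")
```

The four-hour estimates imply an average of more than 800 MB/s, while the week-long Rackspace write implies roughly 20 MB/s, which shows how much of the gap comes from the destination provider.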
Nasuni CEO Andres Rodriguez said transmission speeds vary depending on time of day, but the biggest difference comes from the cloud providers’ write capabilities, and S3 had by far the best transfer times.
Nasuni determines the best back-end cloud for its customers, and usually selects S3 with Azure as the second choice. Nasuni’s competitors sell storage appliances and let customers pick their cloud provider, but Rodriguez said Nasuni picks the cloud provider to meet its SLAs.
“Our enterprise customers using storage in their data centers let Nasuni be the one to move data,” he said. “All customers want from Nasuni is storage service. They don’t care about which cloud it is unless they want data in a specific geographic location. But that’s a location issue, not a provider issue.”
That means Nasuni customers can’t decide to switch providers based on pricing changes, but Rodriguez said he doesn’t recommend that practice.
“This is not an operation you want to be doing dynamically daily so you can save a few cents here and there,” he said. “You do it to take advantage of better features and performance.”