Every week, I visit IT professionals and I often hear the same complaint about dealing with a file server environment that has grown out of control. The problem is that these file servers have millions of small files and customers are looking for ways to better protect this file data.
Second, disk-based archiving truly fixes areas of the backup that most D2D solutions do not. Customers are highly frustrated with backup applications stumbling over what I call the “millions of small files issue.” This is primarily caused by the never-ending growth of a standard file server’s data. Most backup applications struggle with this millions of files scenario. Customers are counting on D2D to help, and it will… a little. The target disk may be faster, but mostly it is much more forgiving than tape. Tape needs to stream, or be fed a constant flow of data, in order to reach maximum write performance. Millions of small files make it difficult for those tape drives to be fed consistently. Disk backup, on the other hand, will maintain the same write performance no matter how inconsistent the data feed is.
That solves half the backup problem. The other half of the performance problem with millions of small files backup is that the backup software still needs to walk those millions of small files, identifying which ones need to be backed up. This file system walk can be very time consuming. Then, the backup software needs to update its own database that tracks what files were backed up and where. Imagine adding millions of records to a database every night, as fast as possible. That database gets HUGE in a hurry, can easily be corrupted and again, even if everything goes right, is very time consuming. Lastly, with most D2D backup solutions you still need to send the entire data load across the network. Even with deduplication solutions, the entire data payload needs to get to the appliance before deduping happens. All of this consumes network bandwidth. Disk-based archiving may circumvent or delay the need to upgrade network bandwidth by clearing this old data out of the way.
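The file system walk described above can be sketched in a few lines. This is only an illustration of why the walk scales with the total number of files rather than with the number of changed files. It assumes modification time is the change signal, which real backup applications may supplement with archive bits, change journals or their own catalogs. The function names and the demo tree are hypothetical.

```python
import os
import tempfile
import time


def files_changed_since(root, last_backup_time):
    """Walk `root`; return (files_examined, paths modified after the cutoff)."""
    examined = 0
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            examined += 1  # every file costs a stat call, changed or not
            if mtime > last_backup_time:
                changed.append(path)
    return examined, changed


def demo():
    """Build a tiny tree with 5 old files and 2 new ones, then walk it."""
    cutoff = time.time() - 3600  # pretend the last backup ran an hour ago
    with tempfile.TemporaryDirectory() as root:
        for i in range(5):
            path = os.path.join(root, f"old_{i}.txt")
            with open(path, "w") as handle:
                handle.write("x")
            # Push the modification time to a day before the cutoff.
            os.utime(path, (cutoff - 86400, cutoff - 86400))
        for i in range(2):
            path = os.path.join(root, f"new_{i}.txt")
            with open(path, "w") as handle:
                handle.write("x")
        return files_changed_since(root, cutoff)


if __name__ == "__main__":
    examined, changed = demo()
    print(f"examined {examined} files to find {len(changed)} changed")
```

Every file is examined even though only a few have changed, which is the cost that archiving old files removes from the nightly walk.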
Disk-based archiving eliminates the problem of moving most of these millions of files. With disk-based archiving, the “old” files are stored on the archive and no longer need to be backed up. They are safer on disk than they are on tape (data integrity checking and replication) and they are out of the way. The backup software no longer needs to walk those files to find which ones need to be protected or send them across the wire to be backed up, and they no longer consume disk space on the file server or the D2D backup target. Additionally, since the archive is disk and not tape, you can be more aggressive with what is archived.
With a classic tape-based archive, customers will wait for data to get very old before moving it to tape. In addition, they will invest in elaborate data movers to provide transparent access to tape. Lastly, data that has stopped changing but is still being referenced or viewed cannot move to tape at all. With a disk-based archive, the delivery back to the user is relatively fast, so you can be more aggressive with your move to archive disk storage and there is less of a need to build elaborate access schemes. Most disk-based archives simply show up as a share on the network and you can archive reference data, further eliminating the data that needs to be protected by traditional backup methods.
A disk-based archive is the perfect complement to D2D backup. It will reduce the investment in disk needed for backup, and an archive strategy may pay for itself on this reduction alone. This is because a disk-based archive will clear out the fixed data (data that has stopped changing). That makes the software modules required by most backup applications for D2D cheaper (since they charge on stored capacity), and it reduces the disk capacity needed for the disk backup as well as the primary (expensive) disk needed on the file server.
What does this look like in hard cost savings? Disk-based archiving can reduce primary storage requirements (at least a 10X dollar saving: $4 vs. $43/GB), and it can reduce backup requirements (fixed information is said to occupy, on average, 50% of most enterprise primary disk capacity), saving customers an additional $6/GB.
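To make those figures concrete, here is a rough worked example. It is a sketch under stated assumptions, not a pricing model: it assumes $43/GB is the primary disk cost and $4/GB the archive disk cost, that the $6/GB backup saving applies to each gigabyte moved to the archive, and that 50% of primary capacity is fixed data. The function and parameter names are illustrative.

```python
def archive_savings(
    primary_gb,
    fixed_fraction=0.5,        # share of primary data that has stopped changing
    primary_cost_per_gb=43.0,  # assumed: the $43/GB figure is primary disk
    archive_cost_per_gb=4.0,   # assumed: the $4/GB figure is archive disk
    backup_saving_per_gb=6.0,  # assumed: applies per archived gigabyte
):
    """Estimate savings from moving fixed data off primary disk."""
    archived_gb = primary_gb * fixed_fraction
    primary_saving = archived_gb * (primary_cost_per_gb - archive_cost_per_gb)
    backup_saving = archived_gb * backup_saving_per_gb
    return {
        "archived_gb": archived_gb,
        "primary_saving": primary_saving,
        "backup_saving": backup_saving,
        "total_saving": primary_saving + backup_saving,
    }


if __name__ == "__main__":
    for key, value in archive_savings(10_000).items():
        print(f"{key}: {value:,.0f}")
```

For a file server with 10,000 GB of primary data, these assumptions move 5,000 GB to the archive, for a saving of $195,000 on primary disk and $30,000 on backup.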
For more information please email me at firstname.lastname@example.org or visit the Storage Switzerland Web site at: http://web.mac.com/georgeacrump.
First, a disclaimer: no one here has personally evaluated this product, tested its features, or been able to talk to someone who has (yet). But at face value, an announcement recently came across our desk that could easily get lost in all the sturm und drang of storage news this week, that we thought was at least worth a closer look.
MicroNet’s Fantom Drives G-Force MegaDisk NAS appliance is a 1 TB desktop USB disk enclosure with USB expansion ports for two additional disks. For the $350 starting price, it comes with 1 TB capacity in either RAID-0, RAID-1 or JBOD, and MicroNet’s management software, which allows the MegaDisk to act as an iTunes server (an update also recently added to its higher-end PlatinumNAS product line) or a print server. The software also has a feature that allows the product to work as an unattended download manager for BitTorrent and other large Web-based content management services. Finally, MicroNet is bundling in NTI’s Shadow backup software, which crawls the system looking for file changes in the background without user intervention.
Sound too good to be true? We thought so, too, especially at that price tag. According to Joe Trupiano, VP of marketing for MicroNet, if you want up to 3 TB capacity (with 1 TB SATA expansion disks, that is), it’ll be between $600 and $900.
Still, Trupiano says that what you see is what you get for the 1 TB/$350 starting price. He explained the price by pointing out that MicroNet is a consumer storage company with many other products in its portfolio, and it ships around 30,000 hard drives per month. That adds up to some deep discounts on disks. “If you went down to the store and wanted a 1 TB disk without all these features, it would run you about $400,” he said. “But disk drive makers practically pay us to ship their product.”
In the grand scheme of things, $300 per TB isn’t the ratio enterprise managers are used to, which along with the 1 TB capacity puts this squarely in the consumer marketplace, especially since it’s difficult to expand the box much further, even with the USB drives (the expansion disks cannot be combined with the capacity of the main enclosure into a single volume).
But, we know how storage admins like to play with gadgets on their own time, and thought this one would be of interest to the gadget geeks among our readership. If you know of any other interesting consumer storage products, fire away in the comments.
IBM has reported two discoveries in its ongoing work in nanotechnology, both of which have implications for data storage of almost unimaginably tiny proportions.
According to a Reuters report, scientists at IBM reported yesterday that they have determined how to move the magnetic orientation of an atom, a key step toward using atoms as tiny storage devices. Each atom has a magnetic field that needs to be stabilized somehow before it can be used as the basis for a storage system where “bits” are not magnetized particles, as they are today, but atoms themselves.
IBM scientists in Zurich also announced that they have successfully “switched” the polarity of molecules, another key to computing on an atomic level. Currently, computer systems rely on the ability to “flip” magnetic particles to represent the ones and zeros of binary code. Being able to do the same with molecules, or even atoms, could eventually lead to microscopic computers, and breakthroughs in density on the order of 30,000 movies on an iPod, according to the researchers.
IBM’s not the only organization currently working on theoretical physics that’ll give you an ice-cream headache. Research revealed in July from the University of California Santa Cruz could lead to similar breakthroughs in the stabilization of magnetic fields on conventionally-constructed disk drives, the better to prevent data corruption. Industry experts pointed out that this research is most likely to be used for near-term density breakthroughs.
This research also represents a crucial piece of the puzzle for IBM’s atomic equations — figuring out the polarity of the individual bits or atoms is only half the battle. How those atoms or molecules or bits behave as a group on the surface of a drive is also a hurdle to be overcome before you can carry around a whole Blockbuster Video in your iPod shuffle.
It is not often that a non-storage conference distracts from the normal round of storage conferences such as Storage Decisions and Storage Networking World, but that may be the case when VMworld kicks off on September 11. What makes this event unique is that multiple storage vendors are planning to use VMworld as their venue for new product announcements or, in the case of startups, as their coming out party.
Selecting VMworld as a product or company launching point does not so much diminish the value of other storage conferences, as it reflects the growing importance that VMware is taking on in corporate boardrooms. Storage vendors know that companies are going to need more virtualization technologies, not less, if they adopt VMware, so these vendors see VMworld as a perfect opportunity to share in VMware’s spotlight.
There are only a couple of small problems with storage vendors piggybacking on the VMware express. VMware as a company is already nine years old, having been founded in 1998. VMware has also had a functioning product since 1999, and its recently released ESX Server operating system is now in its third generation.
IT managers should exercise some caution because, while VMware offers savings in server consolidation, IT managers cannot automatically extend VMware’s savings and benefits to complementary storage technologies. VMware has spent years developing its technology and building a user and knowledge base. Products from these storage companies may not have reached the same level of maturity.
The good news is that storage virtualization went through a similar round of hype about five to six years ago. Some of the companies that survived that round, such as DataCore Software and FalconStor Software, now have much more mature products that are still around and in use in mission-critical environments.
VMworld is acting as a demarcation point in the future of storage management. Virtualization is no longer something that companies can ignore or minimize; it is critical to the future of enterprise storage management, and storage vendors recognize that sharing in VMware’s spotlight will likely pay huge dividends for them in the coming years. However, IT managers still need to verify, before they spend big money on complementary storage virtualization technologies, that these products can technically and financially deliver on their promised benefits.
A recent New York Times report touched off speculation last week that Seagate was about to be bought out by a Chinese company, rumored to be Lenovo, “raising concerns among American government officials about the risks to national security in transferring high technology to China,” according to the Times report.
The Times report was based on an interview with Seagate CEO William D. Watkins, in which Watkins is quoted saying there are no plans to sell the company, but that “if a high enough premium was offered to shareholders it would be difficult to stop.”
Since then, Seagate has released a statement through news wires clarifying (repeating?) that there are no plans to sell the company, to a Chinese buyer or anyone else. Ironically, this now has some in the industry eyeing Western Digital as the possible acquisition target for a Chinese company. Might it have been Watkins speaking generally or hypothetically?
I have to admit I’m scratching my head a little about the supposed security threat–even in the original Times report, two contradictory statements about it follow one another. An anonymous industry executive is quoted as saying “I do not think anyone in the U.S. wants the Chinese to have access to the controller chips for a disk drive. One never knows what the Chinese could do to instrument the drive.” But a paragraph later, it’s noted that “China, however, still lags in basic manufacturing skills like semiconductor design and manufacturing.” So…do they have the means to commit dastardly acts of international espionage or not?
Even if it’s not this acquisition, this time, everyone knows China is a fast-rising power in the global economy. And they already make quite a large proportion of the products Americans use every day. It seems from my view that at least one instance of this type of acquisition is inevitable. Also, from my point of view–which I will admit is not one of experience in constructing foreign policy–it’s probably better to learn how to work with the situation than against it.
What are your thoughts?
Our esteemed colleague and zeitgeist-chaser extraordinaire, Alex Howard of WhatIs.com, has put together a nice post, including podcast, about that most time-honored and mysterious of storage industry questions: What is ILM? Opening a can of worms, to be sure, but Alex does a commendable job creating a definitive resource.
Even if you already know what ILM is, download the podcast anyway, have it at the ready, and simply press “play” for those less expert in the nuances of storage technology (like, say, when upper management wants to know what “ILM devices” you have in place). It could save you loads of time.
Businesses and their storage departments have so far largely dodged the digital media bullet – other than maybe workers tying up network bandwidth while watching video over the Internet. It certainly has not resulted in massive storage growth and management problems, though that is already changing in some verticals.
Companies in industries that use digital media, such as oil and gas and real estate development, along with certain government agencies, are already feeling the pain of digital media.
It is not uncommon for these companies to store digital images on tens if not hundreds of TBs that require hundreds or thousands of disk drives. This creates storage management problems ranging from how to best grow their storage infrastructure to data protection to managing the tedious but necessary task of replacing failed disk drives.
The rest of the business world has so far largely dodged this digital media bullet, though it may soon hit them in an area where they least expect it: video surveillance. Video surveillance is already standard practice in the high-security portions of most businesses’ buildings, with the footage stored on VCRs or digital video recorders (DVRs).
The emerging generation of video surveillance technology eliminates the need for these one-off technologies, moving cameras and storage off closed networks and onto corporate networks. Using network-attached cameras, video surveillance software captures video and streams it across Ethernet networks to network-attached storage allowing companies to deploy network-attached cameras almost anywhere.
Though it is impossible to predict the scope to which businesses may adopt it, expect companies, even small and midsize businesses, to increase video surveillance for no other reason than to protect themselves against future lawsuits. Yet, right now, this technology raises more questions than it answers. Where will the cameras be deployed? How many to deploy? Who will determine video retention periods? Is it worth keeping a second copy of video as a backup? How can one know if the captured video is authentic? Who will manage its storage growth?
Video surveillance is still reserved for a small segment of the market. But now that it is no longer difficult to deploy and no longer requires dedicated staff to manage, it will likely take only one well-publicized incident in which a company could have avoided a million-dollar settlement to spark its corporate adoption. And its adoption will usher in a new generation of corporate network and storage management challenges.
A report surfaced last week in ComputerWorld that Iron Mountain will be adding a security system called InControl to its delivery trucks that are carting around sensitive data. This week, I’ve talked to some users about how they feel about the program and also caught up with Iron Mountain’s CEO, Richard Reese, to talk about Iron Mountain’s point of view on security and chains of custody for the data it transports. In both cases, I heard some interesting comments.
The Iron Mountain updates, which come as the result of a $15 million investment over the last 18 months, will not require an additional fee, according to Reese. Bundled under the InControl umbrella are products, services and processes including more extensive background checks on employees and an employee training program on chain of custody procedures.
Reese also said the company has added on-board computers into the majority of its North American truck fleet. The computers will detect common human errors through sensors in the vehicle–a driver using a vehicle retrofitted with this system can’t start the truck if all doors aren’t locked and alarmed. If the system fails and the door somehow comes open anyway, an alarm will sound in the truck cab. The truck will also only allow one door to be open at a time if there are multiple doors on the vehicle, “so you can’t put the box [of tapes] down on the sidewalk and then go behind an open door and lose sight of it,” Reese said.
Drivers will also be given RFID fobs to keep on their keychains, so if they fail to lock the doors while making a delivery, an alarm will go off. Hand-held GPS-enabled scanners will report the whereabouts of shipments back to users through a Web portal that was already in place. The scanners will also alert drivers immediately to inconsistencies so that errors in shipment routing can be corrected more quickly.
Going forward, the program will be expanded to cover Iron Mountain’s international businesses. Right now retrofits have begun in the UK, and Reese said the company is studying legal regulations in other countries before it figures out how to roll out InControl everywhere.
The customer view of this depends on who you talk to. Dwayne Suizer, VP/Director of Technical Operations for First Independent Bank, said looking into the details of the plan put his mind more at ease. “At first, I thought they were just going to be able to track the trucks, but as I read more and understand how the driver proximity works and the dual ignition systems, it seems like these are all great steps forward.”
But another user, who declined to be named for legal reasons, said it’s “‘too little, too late’ for Iron Mountain. Many companies have been affected by Iron Mountain’s losses of tapes in transport mishaps and the seemingly-avoidable fires at two of their UK facilities last year. Two fires, so closely together, could be seen as unlucky or ill-prepared. It’s up to Iron Mountain’s customers to choose.”
Meanwhile, Reese’s response to the criticism that InControl is a day late and a buck short is that it’s only been in the last 18 months or so that data privacy laws have necessitated this type of control over data. “If you go back 2 to 5 years, customers were more concerned about driving down the cost of transportation than data loss–they could make three or four copies of a tape and if one got lost in transit, it wasn’t a big deal. Now they’re changing their own inside operations as well to deal with the new privacy regulations, and we’re trying to take on the same burden.”
Reese also said that there are premium services Iron Mountain users can pay for to have things like point-to-point dedicated routes for their deliveries and two drivers in order to guard against theft, and that Iron Mountain had, until the addition of InControl, been pushing its customers concerned about data security to purchase those extra safeguards. “They just wouldn’t do it. They preferred the common carriers.”
Not everybody’s buying it. “I see RFID tracking and a rigorously-enforced chain-of-custody as standard requirements for today’s off-site storage vendors. RFID tracking can be implemented inexpensively,” said the user who spoke on condition of anonymity.
So why did it take several instances of data loss and destruction for Iron Mountain to begin this grand security scheme? “Let me be clear that there will be other instances,” responded Reese. “InControl will also not be 100%. Any process that involves humans will have errors, and customers also need to understand where their high-risk data is and apply the right solutions. Especially for this baseline service which we just improved radically at no additional cost to customers, I’m not going to guarantee perfection.”
Suizer did have one suggestion for better security: RFID tags in each tape shipment box, an idea Reese said is good in theory, but is “not technically or economically feasible.” RFID tags’ antennas “need to see the sky”, he said, in order to communicate. “Once they go in the loading dock somewhere, the tracking is useless.” Passive RFID tags, which don’t contain batteries, have a much smaller transmission range–5 or 6 feet–than active RFID tags, but the Catch-22 is that active RFID tags require batteries, which are not long-lived. “RFID is not a cure-all,” he said.
Summertime is the best, obviously. BBQs, swimming and generally lounging around can’t be beat. Unfortunately, the weekend’s over and we’re all back at our computers pecking away.
Jo Maitland and I recently put together a summer reading roundup of the top 10 data backup news stories of the year with related expert advice. Some of the backup topics that we’ve been following this year include data deduplication, backup as a service, remote backup, tape transport and bare-metal restore.
Check out our Top 10 data backup news stories and tips now.
Companies tend to focus on the positive aspects of using SATA disk drives for a growing portion of their enterprise storage needs, but as some companies are finding out, managing thousands or tens of thousands of SATA disk drives can take on a life of its own.
Recently, I spoke to Lawrence Livermore National Laboratory (LLNL), which is a huge DataDirect Networks user. By huge, I mean they use multiple DataDirect Networks storage systems with the total number of SATA disk drives in production numbering in the tens of thousands, possibly even up to a hundred thousand. More impressive, LLNL uses these storage systems in conjunction with some of the world’s fastest supercomputers, including the BlueGene/L, currently rated #1 among the world’s fastest computers.
The issue that crops up when companies own tens of thousands of disk drives, whether SATA or FC, is the growing task of managing failed disk drives. Companies such as Nexsan Technologies report failure rates of less than half of 1% of all SATA disk drives that they have deployed out in the field. Those numbers sound impressive until one begins to encounter environments like LLNL that may have up to a hundred thousand SATA disk drives. Using a 0.5% failure rate across tens of thousands of drives, companies can statistically expect a SATA disk drive to fail about every other day, which is in line with LLNL’s experience.
This is in no way intended to reflect negatively on DataDirect Networks. If users were to deploy similar numbers of disk drives from any other SATA storage system provider, be it Excel Meridian, Nexsan Technologies or Winchester Systems, they could expect similar SATA disk drive failure rates.
The cautionary note for users here is twofold. First, be sure your disk management practices keep up with your growth in disk drives. Replacing a disk drive may not sound like a big deal, but consider what is involved with a disk drive replacement:
- Discovering the disk drive failure
- Contacting and scheduling time for the vendor to replace the disk drive
- Monitoring the rebuild of the spare disk drive
- Determining if there is application impact during the disk drive rebuild
- Physically changing out the disk drive
Assuming a 0.5% failure rate, companies with hundreds of disk drives will repeat this process once a year, those with thousands of disk drives once a quarter and those with tens of thousands once a week. Once a company crosses the 10,000-drive threshold, it needs to seriously contemplate dedicating a person at least part-time just to monitor and manage the task of disk drive replacements, regardless of which vendor’s storage system it selects.
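The replacement cadence above follows from simple arithmetic. A minimal sketch, assuming that “half of 1%” means a rate of 0.005 expressed as a fraction (0.5%) and that it is an annual rate, which the once-a-year figure implies but the vendors’ numbers do not state. The function names are illustrative.

```python
def expected_failures_per_year(drive_count, annual_failure_rate=0.005):
    """Expected number of drive failures in a year for a fleet of drives."""
    return drive_count * annual_failure_rate


def days_between_failures(drive_count, annual_failure_rate=0.005):
    """Average number of days between failures for the same fleet."""
    per_year = expected_failures_per_year(drive_count, annual_failure_rate)
    return 365.0 / per_year


if __name__ == "__main__":
    for drives in (200, 1_000, 10_000, 36_500, 100_000):
        per_year = expected_failures_per_year(drives)
        gap = days_between_failures(drives)
        print(f"{drives:>7} drives: {per_year:6.1f} failures/year, "
              f"one every {gap:6.1f} days")
```

Under these assumptions, 200 drives yield about one failure a year and 10,000 drives about 50 a year, or roughly one a week. A failure every other day corresponds to a fleet of about 36,500 drives, and a fleet of 100,000 drives would see more than one failure a day.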
The other cautionary note is that the more disk drives one deploys, the more likely it becomes that two or even three disk drives in the same RAID group will fail before a recovery of an existing failed disk drive is complete. Companies, now more than ever, need to ensure they are using RAID-6 for their SATA disk drive array groups and, when crossing the 10,000 disk drive threshold, should consider the new generation of SATA storage systems from companies such as DataDirect Networks and NEC. These systems give companies more data protection and recovery options for their SATA disk drives.