Jun 15 2010 7:10PM GMT
Posted by: Graeme Elliott
File Virtualization
In the file virtualization world, Network Attached Storage (NAS) devices, NFS servers and file servers are referred to as back-end filers. I will use this term in the rest of this post.
Once described to me as “DNS for files,” file virtualization places a layer between your users and your business files, allowing for transparent policy-based migration and information life-cycle management. From a user’s perspective, the files that appear in a directory may actually reside on different shares on the same back-end filer or on different back-end filers. With policies, files can be migrated between shares and back-end filers with total transparency to the users.
A file virtualization appliance (or cluster) is placed between the business users in your company and the file shares on your company’s back-end filers. Because it consolidates and aggregates some, or even all, of the file shares from the back-end filers into a single device (or cluster), it provides what is known as a Global Namespace: a single access point to all files.
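The “DNS for files” idea can be sketched as a lookup table. The Python below is a minimal, hypothetical model (the `GlobalNamespace` and `Location` names, the filer names and the paths are all made up for illustration), not how any particular appliance is implemented: users always ask for the same virtual path, and a policy-driven migration only changes where that path points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Physical location of a file on a back-end filer (hypothetical model)."""
    filer: str
    share: str
    path: str


class GlobalNamespace:
    """Maps user-visible virtual paths to physical locations, much as DNS maps names to addresses."""

    def __init__(self) -> None:
        self._table: dict[str, Location] = {}

    def add(self, virtual_path: str, location: Location) -> None:
        self._table[virtual_path] = location

    def resolve(self, virtual_path: str) -> Location:
        # Users only ever ask for the virtual path; the appliance supplies the real location.
        return self._table[virtual_path]

    def migrate(self, virtual_path: str, new_filer: str, new_share: str) -> None:
        # A policy moves the file; only the mapping changes, the virtual path does not.
        old = self._table[virtual_path]
        self._table[virtual_path] = Location(new_filer, new_share, old.path)


ns = GlobalNamespace()
ns.add("/corp/projects/plan.docx", Location("filer01", "projects", "plan.docx"))
print(ns.resolve("/corp/projects/plan.docx"))
ns.migrate("/corp/projects/plan.docx", new_filer="filer02", new_share="archive")
print(ns.resolve("/corp/projects/plan.docx"))
```

Because users only see `/corp/projects/plan.docx`, the move from `filer01` to `filer02` is invisible to them. The same property is what lets an appliance stand in for an old filer while its contents are moved elsewhere.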
File virtualization appliances are offered by F5 (ARX range), EMC (Rainfinity), AutoVirt and BlueArc (File System products). Know of any additional file virtualization appliances? Please leave a comment.
These appliances can be used for (to name a few):
1. Migrating data between back-end filers to aid in life-cycle management.
2. Tiering by file type to lower storage cost (e.g. migrating video and audio files).
3. Tiering by file age to lower storage cost (e.g. migrating files that have not been accessed for a given period of time).
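A type- and age-based tiering policy like the ones in points 2 and 3 comes down to a per-file decision. This is a minimal Python sketch under assumed parameters (the extension list, the 180-day threshold and the tier names are hypothetical). On a real system the last-access time would typically come from `os.stat(path).st_atime`, bearing in mind that many file systems reduce or disable access-time updates.

```python
import os
import time

# Hypothetical policy settings; tune these for your own environment.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".mp4", ".avi", ".mov"}
MAX_IDLE_DAYS = 180
SECONDS_PER_DAY = 86_400


def choose_tier(path: str, last_access: float, now: float) -> str:
    """Return the tier a file should live on under a simple type/age policy."""
    extension = os.path.splitext(path)[1].lower()
    if extension in MEDIA_EXTENSIONS:
        return "tier2-media"                  # tiering by file type
    idle_days = (now - last_access) / SECONDS_PER_DAY
    if idle_days > MAX_IDLE_DAYS:
        return "tier2-aged"                   # tiering by file age (last access)
    return "tier1"                            # stays on primary storage


now = time.time()
files = {
    "reports/q1.xlsx": now - 10 * SECONDS_PER_DAY,
    "training/intro.MP4": now - 2 * SECONDS_PER_DAY,
    "old/2008-budget.xlsx": now - 400 * SECONDS_PER_DAY,
}
for path, last_access in files.items():
    print(path, "->", choose_tier(path, last_access, now))
```

Checking file type before age keeps media on the cheaper tier regardless of how recently it was opened; the order of the rules is itself a policy decision.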
Tiering can also improve backup performance by migrating files that do not need to be backed up (video, audio or old files) to another tier. The backup software then has fewer files to scan and fewer files to back up.
Some of these appliances offer migration to Hierarchical Storage Management (HSM) systems and Content Addressable Storage (CAS, an archiving platform) for further tiered processing, such as putting the data on tape. I will cover CAS in more depth in an upcoming blog post.
My own opinion is that these appliances are very expensive for what they do, so you will need a good business case or a specific use in mind to justify them. They do excel in life-cycle management, such as migrating files to a new back-end filer when an old one is being replaced. With appropriate planning, only a short outage is required for the file virtualization appliance to start impersonating the old back-end filer, after which seamless file migration can begin.
If you do have a specific use for these devices, I would love to hear about it so please leave a comment.
Jun 7 2010 4:36PM GMT
Posted by: Graeme Elliott
Backup,
Snapshots
Snapshots are now a feature of just about all mid-range to high-end storage arrays, virtual storage appliances and virtualization technologies such as ESX, Hyper-V and Xen.
A snapshot, as its name suggests, is a point-in-time copy of your data. Storage and server vendors implement this in different ways, but in most cases it is done using a method called copy-on-write. Copy-on-write involves copying the original data into a snapshot pool before it is overwritten. This means that creating a snapshot is immediate, with no initial impact on the storage array or underlying system.
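Copy-on-write can be illustrated with a toy block volume. This Python sketch is a simplified model (the `Volume` class and its block-level granularity are assumptions for illustration; real arrays track changes in their own metadata structures): taking a snapshot copies nothing, and an original block is copied into the snapshot pool only the first time it is about to be overwritten.

```python
class Volume:
    """Toy block volume with copy-on-write snapshots (illustrative only)."""

    def __init__(self, blocks: list[bytes]) -> None:
        self.blocks = list(blocks)                    # live data
        self.snapshots: list[dict[int, bytes]] = []   # one snapshot pool per snapshot

    def snapshot(self) -> int:
        # Creating a snapshot copies no data: it starts as an empty pool.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index: int, data: bytes) -> None:
        # Copy-on-write: before overwriting, save the original block into every
        # snapshot pool that does not already hold a copy of it.
        for pool in self.snapshots:
            if index not in pool:
                pool[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, snap_id: int, index: int) -> bytes:
        # Changed blocks come from the pool; unchanged blocks are read from the live volume.
        pool = self.snapshots[snap_id]
        return pool.get(index, self.blocks[index])


vol = Volume([b"A", b"B", b"C"])
snap = vol.snapshot()            # instant: nothing is copied yet
vol.write(1, b"X")               # the original b"B" is copied to the pool first
print(vol.blocks)                                        # [b'A', b'X', b'C']
print([vol.read_snapshot(snap, i) for i in range(3)])    # [b'A', b'B', b'C']
```

Note that `read_snapshot` still reads unchanged blocks from the live volume. That dependency on the base data is why losing the base image also makes its snapshots unusable.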
Unless there is some interaction between the device taking the snapshot and the applications being snapped, the data captured is only in a “Crash Consistent” state.
“Crash Consistent” is really a marketing term: the data is consistent with what a system crash would leave behind, which means it is in no way guaranteed to be in a consistent state. This is important to understand, as systems do not always come back online correctly after a crash. Databases, file systems and applications can take from minutes to hours to become available, depending on the activity on the server at the time the snapshot was taken.
“Application Aware” refers to snapshot devices that can in some way (usually via an agent or some operating system integration) communicate with file systems, databases and applications. This ensures that the system is in a consistent state at the time the snapshot is taken, so that a system restored from the snapshot comes back up without any issues or lengthy delays.
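The difference between the two states can be shown with a toy application that buffers writes in memory. In this Python sketch, `Database`, `quiesced` and `take_snapshot` are hypothetical stand-ins: the agent's job is modelled as "flush buffers and pause writes, take the snapshot, resume". On Windows, for example, the Volume Shadow Copy Service (VSS) provides this kind of coordination, but the code below does not use or imitate its actual API.

```python
from contextlib import contextmanager
from typing import Iterator


class Database:
    """Hypothetical application that buffers writes in memory before flushing them."""

    def __init__(self) -> None:
        self.buffered: list[str] = []   # acknowledged but not yet on disk
        self.on_disk: list[str] = []
        self.frozen = False

    def write(self, record: str) -> None:
        if self.frozen:
            raise RuntimeError("writes are paused while the snapshot is taken")
        self.buffered.append(record)

    def flush_and_freeze(self) -> None:
        # Flush buffers to disk and pause new writes so the on-disk state is consistent.
        self.on_disk.extend(self.buffered)
        self.buffered.clear()
        self.frozen = True

    def thaw(self) -> None:
        self.frozen = False


def take_snapshot(app: Database) -> list[str]:
    # Stand-in for the storage array's snapshot: it only sees what is on disk.
    return list(app.on_disk)


@contextmanager
def quiesced(app: Database) -> Iterator[None]:
    # The agent's job: bring the application to a consistent state, then release it.
    app.flush_and_freeze()
    try:
        yield
    finally:
        app.thaw()   # resume the application even if the snapshot fails


db = Database()
db.write("txn-1")
db.write("txn-2")

crash_consistent = take_snapshot(db)   # buffered transactions are missing
with quiesced(db):
    app_aware = take_snapshot(db)      # both transactions are captured

print(crash_consistent)  # []
print(app_aware)         # ['txn-1', 'txn-2']
```

The `finally` clause matters: an agent that freezes an application must always release it, otherwise a failed snapshot turns into an outage.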
An inherent problem with snapshots is that they rely on a base image. If the base image is destroyed or corrupted, all the snapshots based on that image become unusable. For this reason, I don’t believe snapshots will replace traditional backup for the long-term storage of data. But because access to snapshots is almost immediate, they are ideal for your most recent backups or for data that needs to be restored often.
There are many other uses for snapshots apart from backups, such as speeding up your traditional backups, offloading batch processing and saving the state of systems prior to upgrades and configuration changes. Keep an eye out for upcoming posts covering these topics.
Jun 1 2010 10:04PM GMT
Posted by: Graeme Elliott
Cloud Computing,
Storage as a Service
While this is not strictly a storage blog, I thought you might find some thoughts on cloud computing interesting.
If we look at the individual parts of a private cloud, it can be broken down as follows:
1. The Orchestration Layer: This is the software that provides the workspace and smarts to provision your servers in the private cloud.
2. Server Infrastructure: This is usually in the form of a Blade Chassis or two with a number of Server Blades.
3. Hypervisor: The virtual layer that runs on the servers. Usually VMware ESX or Xen (there is probably a flavor somewhere running Microsoft’s hypervisor as well).
4. SAN and Storage: Most Blade Chassis have built-in Fibre Channel switches that connect to the back-end storage.
5. Network: Most Blade Chassis also have built-in network switches.
Does this sound familiar? Apart from the orchestration layer, most data centres already have all the layers listed. In a sense, most organizations already have an orchestration layer too: their IT staff and the current provisioning tools used to deploy servers.
Admittedly, the orchestration layer will give you a faster way to provision your servers, but why do we have to use the vendor’s hardware as well? This is just a way for vendors to get their hardware products into organizations they otherwise wouldn’t be able to get into.
Other points to consider when implementing a private cloud as a total vendor package:
1. Will they meet the SLAs already established with the business units? You will now be reliant on the vendor to support the whole cloud environment.
2. How do you report chargeback on the end-to-end solution (storage and CPU)?
3. Do your current reporting tools work with the private cloud solution, or do you need yet another tool?
4. Can you plug it into your security model (Active Directory or LDAP)?
5. Will your Standard Operating Image work on the virtual machines? Some private cloud implementations limit the storage that can be allocated to virtual machines.
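To make point 2 concrete, end-to-end chargeback means combining figures from at least two layers (CPU from the hypervisor, storage from the SAN) into one charge per business unit. The Python sketch below assumes made-up unit rates and a hypothetical `VmUsage` record; in practice the inputs would come from whatever reporting each layer provides, which is exactly what you need to check with a vendor package.

```python
from dataclasses import dataclass

# Hypothetical unit rates; real figures would come from your own cost model.
RATE_PER_CPU_HOUR = 0.05   # currency units per vCPU-hour
RATE_PER_GB_MONTH = 0.10   # currency units per GB of allocated storage per month


@dataclass
class VmUsage:
    business_unit: str
    cpu_hours: float    # vCPU-hours consumed in the month (hypervisor layer)
    storage_gb: float   # storage allocated to the VM (SAN and storage layer)


def chargeback(usage: list[VmUsage]) -> dict[str, float]:
    """Combine CPU and storage into one monthly charge per business unit."""
    totals: dict[str, float] = {}
    for vm in usage:
        cost = vm.cpu_hours * RATE_PER_CPU_HOUR + vm.storage_gb * RATE_PER_GB_MONTH
        totals[vm.business_unit] = totals.get(vm.business_unit, 0.0) + cost
    return totals


report = chargeback([
    VmUsage("finance", cpu_hours=720, storage_gb=200),
    VmUsage("finance", cpu_hours=360, storage_gb=50),
    VmUsage("marketing", cpu_hours=100, storage_gb=1000),
])
for unit, cost in sorted(report.items()):
    print(f"{unit}: {cost:.2f}")
```

If either layer cannot export per-VM figures, one of the two terms in `cost` is missing and the report is no longer end-to-end.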
Before jumping on the private cloud bandwagon, check whether you are already doing it. You may find that your own IT staff can provide a better solution than outsourcing to another vendor in your own data center.
May 22 2010 10:28PM GMT
Posted by: Graeme Elliott
The LTO roadmap has been extended to LTO 8, which will allow up to 12.8TB of uncompressed data to fit on a single cartridge. The upgrade includes quite a few changes, such as a larger compression history buffer, which could mean up to 32TB on a single generation 8 cartridge (12.8TB at a 2.5:1 compression ratio).
From LTO generation 5 onward, drives can also split a cartridge into two partitions. Each partition can be accessed independently, providing faster access to the data.
The Linear Tape File System (LTFS) specification was also announced. It defines a new file system that uses the partitioning feature of LTO generation 5 and later cartridges. This allows those cartridges to be used as a storage medium for unstructured data, extending the operating system so that it can store data on tape as if it were a disk.
Support is currently planned for Linux, Mac OS and MS Windows (see the LTO website).
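One way partitioning makes file-system-style access practical is by keeping a small index in one partition and the file data in the other. LTFS does use an index partition and a data partition, but the Python below is a greatly simplified, hypothetical model (the `Cartridge` and `Extent` classes are invented here; the real format has an XML index and many more rules). It shows why listing or locating files need not mean scanning the whole tape.

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    start: int    # offset of the file's data within the data partition
    length: int


@dataclass
class Cartridge:
    """Toy two-partition cartridge: a small index partition and a large data partition."""
    index_partition: dict[str, Extent] = field(default_factory=dict)
    data_partition: bytearray = field(default_factory=bytearray)

    def write_file(self, name: str, data: bytes) -> None:
        # Tape is sequential, so new file data is always appended to the data partition.
        extent = Extent(len(self.data_partition), len(data))
        self.data_partition.extend(data)
        # The index (file name -> location) is kept separately so it can be read quickly.
        self.index_partition[name] = extent

    def list_files(self) -> list[str]:
        # A directory listing only needs the index partition.
        return sorted(self.index_partition)

    def read_file(self, name: str) -> bytes:
        extent = self.index_partition[name]
        return bytes(self.data_partition[extent.start:extent.start + extent.length])


cart = Cartridge()
cart.write_file("report.pdf", b"%PDF-data")
cart.write_file("notes.txt", b"hello tape")
print(cart.list_files())            # ['notes.txt', 'report.pdf']
print(cart.read_file("notes.txt"))  # b'hello tape'
```

Rewriting a file in this model appends a new copy and updates the index; the old data stays on tape until the cartridge is reformatted, which is the usual trade-off of a sequential medium.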
Another notable factor is that, under the LTO specification, a drive can only read cartridges from its own generation and the two previous generations. For example, LTO 5 drives will not be able to read LTO 1 and LTO 2 cartridges. By the time LTO 8 is released, organizations will need, at a minimum, LTO 3 drives to read LTO 1 through LTO 3 cartridges, LTO 6 drives to read LTO 4 through LTO 6 cartridges, and LTO 8 drives to read LTO 7 and LTO 8 cartridges.
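The drive combination above follows from a simple greedy rule: for the oldest generation not yet covered, pick the newest drive that can still read it. This Python sketch assumes the "own generation plus two previous" read rule described here; the `read_back` parameter is included in case the rule differs for the drives you are planning around.

```python
def drives_needed(oldest: int, newest: int, read_back: int = 2) -> list[int]:
    """Pick a minimal set of LTO drive generations that can read every cartridge
    generation from `oldest` to `newest`, assuming a drive reads its own
    generation and `read_back` previous generations."""
    drives = []
    gen = oldest
    while gen <= newest:
        # Newest drive that can still read generation `gen`, capped at the
        # newest generation actually in use.
        drive = min(gen + read_back, newest)
        drives.append(drive)
        gen = drive + 1   # this drive covers every generation up to itself
    return drives


print(drives_needed(1, 8))   # [3, 6, 8]
print(drives_needed(3, 5))   # [5]
```

Picking the newest drive that still reads the oldest uncovered generation is optimal here, because each drive covers a contiguous window of generations ending at its own.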
The LTO roadmap offers some great new features, including a lot more storage capacity. Are they worth the limited backward compatibility?
May 22 2010 10:15PM GMT
Posted by: Graeme Elliott
VTL
There is currently a lot of discussion on the web about the pros and cons of inline versus post-process deduplication, mostly about how fast each can ingest data. There are a number of other factors that need to be taken into account, especially when using the post-processing method.
While I believe that an inline VTL will never be able to ingest data as fast as a post-processing VTL, ingest rate is not relevant for most small to medium sites, as they will never push their VTLs hard enough for it to become an issue. Keep in mind that the following considerations may differ between post-processing VTL vendors and that there are other factors to consider, but in general:
1. Since post-processing requires a transient storage area to hold data before it is deduplicated, consideration needs to be given to the size of this transient area. If this area runs out of space, it is no longer possible to write to the VTL. At a minimum, the transient area must be able to hold one day’s worth of backups.
2. If your backup software needs to use a virtual tape while that tape is being deduplicated, deduplication of the tape stops until the backup software has finished with it. This increases the time needed for deduplication and will require a larger transient area.
3. If replication is used, virtual tapes will not be replicated until their deduplication has finished. Combined with the second point, this means virtual tapes may take many hours to reach the destination site after they have been created.
4. From what I have seen in the VTL space, the post-processing method lends itself better to a scalable model. Inline appliances tend to be built as a single frame of hardware, whereas with post-processing you can simply add another node to the deduplication cluster as you need more processing power.
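The sizing concern in points 1 and 2 can be estimated with a simple day-by-day model. This Python sketch assumes a fixed deduplication throughput per day and that each day's backups land in the transient area in one go; both are simplifications, and the figures in the example are invented. Pauses caused by point 2 effectively lower the daily throughput, which is what pushes the peak up.

```python
def peak_staging_gb(daily_ingest_gb: list[float], dedup_gb_per_day: float) -> float:
    """Estimate the peak size of a post-processing VTL's transient (landing) area.

    Assumes each day's backups land in the transient area first and the
    deduplication engine then drains it at a fixed rate before the next day.
    """
    backlog = 0.0
    peak = 0.0
    for ingest in daily_ingest_gb:
        backlog += ingest                                  # backups land un-deduplicated
        peak = max(peak, backlog)                          # the area must hold this much
        backlog = max(0.0, backlog - dedup_gb_per_day)     # deduplication drains the area
    return peak


print(peak_staging_gb([1000, 1000, 1000], dedup_gb_per_day=1200))  # 1000.0
print(peak_staging_gb([1000, 1000, 1000], dedup_gb_per_day=600))   # 1800.0
```

When deduplication keeps up, the area only needs one day's worth of backups (the stated minimum); once it falls behind, the backlog compounds day after day and the area has to be correspondingly larger.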
In summary, inline deduplication requires less storage and so has a smaller footprint in your data center, but it is slower at ingesting data and less scalable than post-processing deduplication offerings.
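The storage trade-off in this summary comes from where the deduplication work happens. The toy Python store below assumes fixed 4-byte chunks and SHA-256 fingerprints purely for readability; real VTLs use their own (often variable-size) chunking, indexes and hashing. Inline ingest hashes every chunk before storing it, while post-processing lands the raw data in a transient area and deduplicates it later. The sketch only illustrates the difference in space used; it does not model ingest speed.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks so the example is easy to follow


def split_chunks(data: bytes) -> list[bytes]:
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


class DedupStore:
    """Toy deduplicating store: each unique chunk is kept once, keyed by SHA-256."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}
        self.staging: list[bytes] = []   # transient area, only used by post-processing

    def _dedupe(self, data: bytes) -> list[str]:
        recipe = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)   # store only the first copy
            recipe.append(digest)
        return recipe

    def ingest_inline(self, data: bytes) -> list[str]:
        # Inline: fingerprint every chunk before it is written (less disk, more work per write).
        return self._dedupe(data)

    def ingest_post(self, data: bytes) -> None:
        # Post-processing: land raw data in the transient area as fast as possible.
        self.staging.append(data)

    def process_staging(self) -> list[list[str]]:
        # Later pass: deduplicate everything in the transient area, then free it.
        recipes = [self._dedupe(d) for d in self.staging]
        self.staging.clear()
        return recipes

    def restore(self, recipe: list[str]) -> bytes:
        return b"".join(self.chunks[d] for d in recipe)

    def used_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values()) + sum(len(d) for d in self.staging)


backup = b"ABCDABCDABCDWXYZ"   # three identical chunks plus one unique chunk

inline = DedupStore()
recipe = inline.ingest_inline(backup)
print(inline.used_bytes())                 # 8
print(inline.restore(recipe) == backup)    # True

post = DedupStore()
post.ingest_post(backup)
print(post.used_bytes())                   # 16: raw data sits in the transient area
post.process_staging()
print(post.used_bytes())                   # 8
```

Both stores end up the same size; the post-processing store simply needs the extra transient space until its deduplication pass has run.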