FalconStor made its virtual tape library and data deduplication partnership with Hitachi Data Systems official today, disclosing that HDS will resell FalconStor’s VTL with dedupe and its File-interface Deduplication Software (FDS) integrated with the HDS Adaptable Modular Storage (AMS) 2000 platform.
During their last earnings report conference call in April, FalconStor execs hinted that they were working on partnerships with HDS. They didn’t disclose what products were involved, but there were rumblings around the industry that HDS had agreed to sell FalconStor’s FDS dedupe either through an OEM or reseller deal.
The reseller arrangement means HDS will sell the FalconStor products under the FalconStor brand rather than the HDS brand.
Nexsan and SpectraLogic also resell FalconStor deduplication software, but HDS is now the largest FalconStor dedupe partner as the software vendor looks to replace revenue lost from EMC and Sun over the past year. EMC sells a lot less FalconStor VTL software now that it has Data Domain deduplication boxes in its portfolio. After buying Sun, Oracle ended Sun’s reseller arrangement for FalconStor VTL and dedupe software.
“We have a very tight relationship with HDS now,” FalconStor marketing VP Fadi Albatal said. “There’s a lot of collaboration between the two companies.”
But FalconStor’s collaborator was strangely silent for this announcement. There were no HDS executives quoted in the press release, and requests I made to HDS for comment over the last two days went unanswered. The HDS deduplication strategy remains unclear. It sells CommVault’s backup software with dedupe through an OEM deal, and has a reseller deal for Diligent ProtecTier VTL and dedupe software dating to before IBM acquired Diligent in 2008. Sepaton uses HDS hardware as the backend storage for its VTLs with dedupe. Sepaton execs claim HDS sales people have financial incentives to sell Sepaton’s VTLs, but HDS hasn’t confirmed that.
If HDS has a preferred dedupe partner among those options, it isn’t saying.
Meanwhile, Albatal says FalconStor is considering extending its dedupe capabilities to primary storage. “We have the building blocks,” he said. “Primary deduplication has to be a post-process method, which is the nature of our solution. We won’t have something in the near future, but it’s something we will be looking at.”
Flash memory appliance vendor Violin Memory today said it acquired the assets of failed caching startup Gear6 for an undisclosed price, and plans to add Gear6’s NAS and Memcached software to Violin arrays.
Gear6 sold its NAS product on large appliances – the smallest was an 11u device that cost $150,000 when it launched two years ago, and larger systems cost more than twice as much. Violin’s 3u arrays range from $30,000 for 700 GB to $200,000 for 10 TB of single-level cell (SLC) solid state capacity.
Violin Memory CEO Don Basile said Gear6 and Violin both set out to eliminate I/O bottlenecks in the data center.
“Instead of using a Gear6-type appliance, we’ll bring the software of both solutions on top of Violin devices,” he said.
Basile said the NAS caching product is the more interesting of the two Gear6 offerings, adding “we need a little more study on Memcached,” a web caching product based on open source software.
“The NFS piece aligns with Violin’s mission and its vision for the data center evolution,” he said. “Violin’s array is far denser and far faster than what Gear6 was able to do. We’ll take expensive complicated hardware from Gear6 and make it more scalable. We can extend our footprint by offering NFS caching in front of NAS devices, and solve performance problems without people needing to replace their NAS infrastructure.”
Violin last month launched the Violin 3200 SSD with plans to eventually scale it to 100 TB. The 3200 holds 84 SLC memory modules of 128 GB each for 10 TB of total capacity, and can have up to 500 GB of RAM cache.
Gear6 filed to liquidate its assets earlier this year after burning through $24 million in venture funding and failing to get more. Basile said Violin bought its technology and patents, and will hire some Gear6 engineers. Violin identified at least 30 Gear6 customers, and Basile suspects there could be as many as 60. He said Violin is “sorting through contracts” to determine its support obligation. He says some of those Gear6 customers are also using Violin products.
Analyst Greg Schulz of StorageIO says Gear6 likely aimed too high with its products and ignored the mainstream NAS market.
“Gear6 was trying to create a market, but I think they focused on higher-end customers as opposed to making it more viable for general purpose NAS,” he said. “They were also going after read-intensive NFS-type environments that may be looking to use deduplication as opposed to an accelerator.”
Schulz says the Gear6 technology will expand the Violin offering.
“They’re adding additional personality to their solid state system,” he said. “The real secret to Gear6 was its caching algorithm and its ability to support files. Now Violin has a NAS solution.”
IBM is looking to grab primary data reduction vendor Storwize for $140 million, according to Israeli financial news websites Globes and TheMarker. Whether that deal comes off or not, you can expect a series of either OEM deals or outright acquisitions involving large storage vendors and suppliers of primary reduction technology – which now includes Permabit Technology, Ocarina Networks and Storwize.
Permabit and Ocarina each say they have one large OEM primary deduplication deal nailed down and are working with more storage vendors to secure others. Dell, Hewlett-Packard, Hitachi Data Systems, IBM, and LSI are all believed to be on the prowl for the technology and it’s a matter of whether they will forge OEM deals or acquire the technology outright. NetApp and EMC already have primary reduction capabilities.
Ocarina has been the subject of acquisition rumors, and Ocarina director of marketing Mike Davis says an IBM-Storwize deal would raise his company’s value.
“There’s demand for this technology, and we’ve had contact and serious conversations with all the OEMs out there,” Davis said. “We don’t know how serious IBM’s interest is [in Storwize], but Storwize only does a subset of what Ocarina does. So if they’re worth $140 million, Ocarina should be worth even more than that.”
Individual careers in the fast-paced world of technology are just as subject to change.
Five years ago, I was working two jobs, a typical 40-hour-a-week office job by day and a ‘stringer’ position for a local newspaper by night. I had graduated from college into what was then the worst job market since the Great Depression (as we all know, sadly, it has since been surpassed).
I knew I wanted to go into journalism, but couldn’t find a way in the door at traditional “dead tree” media publications. Sometime in the early spring of 2005, it dawned on me that the stringer position was aptly named — I was being strung along, but a full-time job on the paper’s staff was probably not in the cards.
That’s when I entered the word “writer” into a job search site, and a job description caught my eye for a “News Writer” position being offered at a company I’d never heard of before. I did the customary interviews, and was then hired on to the best job I’ve ever had.
It hasn’t just been technology I’ve gotten the chance to learn while working as a news writer for the Storage Media Group at Tech Target. It’s also given me an opportunity that’s becoming increasingly rare in today’s world: the chance to gain irreplaceable experience as a young writer covering a daily beat. Tech Target hired me with virtually no technology experience, and gave me the opportunity to establish a career for myself.
That’s why I’m happy to say that as I move to the next step in that career, it will still be as a Tech Target employee, in the Data Center and Server Virtualization Group.
Part of the “reporter’s personality” is being a naturally curious person. The need to know, to find out, to keep learning more, is deeply ingrained in a mind that is journalistically inclined. Which is why, though I have grown quite comfortable in the storage industry in the last five years, it has come time for me to begin broadening my technical expertise again.
I’m a nerd from way back. The only activity I’ve enjoyed more in my life than writing was being a student. Now, I want to understand other facets of the enterprise IT infrastructure. I want to keep learning, keep growing, the same way the technologies we all work with continually develop and advance. An opportunity to do just that, while remaining with a company that can provide me with the means to develop as a journalist and learn on the job, as Tech Target can, is an opportunity that’s just too good to pass up, just as it was when I first joined this company half a decade ago.
We’ve all seen how way leads on to way in enterprise IT, how paths cross, companies and technologies integrate and consolidate — and thus I anticipate remaining in touch with the brilliant people I’ve had the good fortune to get to know while covering this market. Data storage remains a crucial aspect of the evolution of digital technologies in our modern age, and that’s one thing I don’t see changing.
I had a brief conversation last week with Ed Chapman, Riverbed’s VP of cloud storage acceleration products, hired away from Cisco in May. Chapman (and senior vice president of marketing and business development Eric Wolford, who chimed in frequently) stopped short of divulging much detail on the planned Cloud Storage Accelerator product, but did offer some new information about its origins…
What are the goals for Riverbed’s cloud business this year?
Wolford: We haven’t launched yet, but we told our user group about the Cloud Storage Accelerator that Ed is going to head up getting to market. It’s a bit of a phoenix from the ashes of Atlas in part, and Steelhead in part, and some new development, in part. I don’t want you to think it’s the same product [as Atlas] – we took key components from that product and put them into the Cloud Storage Accelerator, to sit in the data center and accelerate access to AT&T Synaptic Storage, Amazon S3, etc.
While Riverbed’s working on its cloud product, other vendors like Nasuni, TwinStrata and StorSimple are already out in the market selling cloud storage gateway appliances to interface securely – and in some cases with deduplication features – with the cloud. How does Riverbed plan to differentiate its cloud product against those offerings?
Chapman: We are not going after the same market segments they’re going after. We’re going after a focused market segment that we think is more applicable in the marketplace. Just to give you a viewpoint of what customers are looking for in general, if you look at the application of new storage technologies in the market, it sort of follows a hierarchy…[users say] ‘maybe I’ll use this for backup’… then archive…then they look at all the rest, including primary storage. What we’ve heard from our customers that we’ve spoken to, they want to utilize cloud storage infrastructure in the same sort of mechanism, backup and archive and then going down the rest of the hierarchy from a storage perspective. Our goal looking at the marketplace is to leverage things our customers will want to leverage and utilize first, along the parameters of backup and archive rather than primary storage filer replacement.
Doesn’t Riverbed already allow replication to cloud services for DR? How is this different from that?
Wolford: Well…we’re just going to have to wait.
Similarly, EMC and other vendors are working on systems like VPlex and Atmos, which they claim can replicate data at scale among data centers, with little mention of WAN optimization technology as a necessary component of the infrastructure. Do those products represent a threat to Riverbed’s market? What would your role be in that environment?
Wolford: We’ll help them. The parallel I would make is that when cloud first came out, nobody mentioned WAN optimization. Nobody mentioned, ‘we have this great product but unfortunately, it has this problem.’ Any time users have distance between them and their data they have a problem. There’s been a trend toward consolidation in remote offices and data centers – the cloud is a variant of that reality. The more that reality occurs, we are just lovin’ it because it spotlights the performance problem.
Chapman: EMC has been selling SRDF and SRDF/A and never said WAN optimization is needed, but we just won EMC Select Channel Partner of the Year because we could be used as the primary WAN optimization tool with SRDF/A. So my point there is while EMC talks and launches VPlex and talks about distributed caching (and I think it’s a fabulous technology), that doesn’t mean we’re not going to be able to add a lot of value to it the way we have with SRDF, SRDF/A and other technology used for replication.
Emulex today acquired its 10-Gigabit Ethernet silicon partner ServerEngines for $78 million in cash and eight million shares of Emulex stock to be issued at closing, which could bring the price to around $160 million.
The deal is expected to close next month. The Emulex shares would be worth $81 million using Friday’s closing price of $10.11.
Emulex sold ServerEngines silicon with its OneConnect Universal Converged Network Adapters (UCNAs) for the past two years as part of an OEM and joint development deal between the two. ServerEngines contributed the Ethernet and iSCSI portions of the ASICs for the converged adapters.
Emulex CEO Jim McCluney says ServerEngines gave Emulex a fast path to combining Ethernet with its own Fibre Channel stack, and now that the UCNAs are coming to market, it makes sense to bring the technology in-house. That gives Emulex more control of the technology.
“We saw this as a new game where the old rules didn’t apply,” McCluney said of converged networking. “Instead of repurposing our Fibre Channel ASICs, we wanted to do something unique. We went out and found best of breed 10-gigabit ASIC to combine with our own Fibre Channel stack. We felt the time was right to take things to the next level.”
ServerEngines has two main product families: the BladeEngine 10-GigE ASICs that Emulex uses in OneConnect, and a Pilot family of server management controllers sold by Cisco, Hewlett-Packard, NEC, and Unisys. McCluney said the Pilot products will bring Emulex around $4 million in revenues each quarter.
ServerEngines has about 170 employees, mostly engineers based in Sunnyvale, CA, Austin, TX, and Hyderabad, India.
The reorganization of the Storage and Availability Management group within Symantec was the source of widespread speculation in April that the vendor was preparing to divest entirely from the storage business. Layoffs in the first quarter affected engineers working on Veritas file system, Backup Exec System Recovery (BESR), Enterprise Vault and Storage Foundation. The speculation prompted the vendor to post a statement on its website reiterating its commitment to its storage products.
“A lot of the rumors were way way wild in my view, a lot of the numbers were way off,” Anil Chakravarthy, the storage and availability management group’s new senior vice president, told me this week. He would not disclose the actual numbers, but said there are 920 people in the business unit worldwide now, including 860 engineers.
He confirmed earlier reports that much of the engineering work in this product group has been moved to Symantec’s development centers in China and India, though development continues at Mountain View, Calif. Chakravarthy previously headed up Symantec’s global services division, he said, but prior to that he ran Symantec’s Indian engineering business.
The reorganization came about, at least in part, after the Unix server market “took a sharp dive last year, and we took a dive along with it,” Chakravarthy said. Unix represented the biggest market for Symantec’s storage management and file system products.
For more on Symantec’s specific plans for its Storage Foundation and Veritas Cluster Server (VCS) products, see our story on SearchStorage.com.
According to a press release circulated this morning by Storage Newsletter, storage software startup Seanodes has sold off its assets to a French VMware distributor called Deletec after filing for bankruptcy last December.
The report says Deletec will continue to sell Seanodes’ Exanodes software, which turns commodity servers into iSCSI SANs.
Calls to phone numbers listed on the Seanodes website for its offices in Cambridge, MA and France returned a busy signal and “please check the number and dial again” message, respectively. Meanwhile, the most recent press release on the Seanodes website is from last September, and one source close to the company speaking on condition of anonymity has confirmed the company went out of business after failing to get more funding last fall.
A receptionist answering the phone at Deletec’s Paris office said no one at the company would be available for comment until next week.
We’ve also heard reports of funding troubles for cloud storage vendor Parascale. Another source who asked not to go on record says Parascale failed to get Series B funding and is being reorganized. The company quietly changed CEOs last month, bringing in Ken Fehrnstrom to replace Sajai Krishnan. Nobody at Parascale was available for official comment on this Friday before Memorial Day, either.
Yesterday I met with execs from a company called Gluster, which is developing an open-source, software-only, scale out NAS system for unstructured data. As we discussed their market, products and competitors, we got into the nitty gritty of their technical differentiation as well – pasted below is an extended Q&A with CTO and co-founder Anand Babu Periasamy about Gluster’s way of handling metadata, most often a bugaboo when it comes to file system scalability.
Beth Pariseau: So as I’m sure you’re aware, there are many scale-out file system products out there addressing unstructured data growth. What’s Gluster’s differentiation in this increasingly crowded market?
ABP: What we figured out was that centralized and distributed metadata both have their own problems, so you have to get rid of them both. That’s the most important advantage when it comes to the Gluster file system. The reason why we got to a production-ready stage very quickly – we wrote the file system in six months and took it to production, because a customer had already paid for it, and they had a desperate need to scale with seismic data that was very critical, and they could no longer reason with that data because it was all sitting on tapes. I looked around, there was no file system around – the file systems they had used before were for scratch data, they had found a scalability advantage [to scale-out], but the problem was metadata.
The problem with the metadata is if you have centralized metadata it becomes a choke point, and distributed gets extremely complicated, and the problem with both is if your metadata server is lost, your data is gone, it’s finished…became very clear we had to get rid of the metadata server. The moment you separate data and metadata you are introducing cache coherency issues that are incredibly complicated to solve. By eliminating the need to separate data and metadata we made the file system resilient. On the underlying disks, you can format them with any standard file system – we don’t need any of its features. We just want the disk to be accessible by a standard interface, so even tomorrow if you don’t like the Gluster file system or there is a serious crash in your data center, you can just pull the drives out, put them in a different machine and have your data with you – you’re not tied to Gluster at all. Because we didn’t have any metadata the data can be kept, as files and folders, the way users copied it onto the global namespace.
Within the file system the scalability problem became seamless because we didn’t have to put a lock on metadata and slow down the whole thing, we can pretty easily scale because every machine in the system is self-contained and intelligent, equally, as all other machines. So if you want more capacity, more performance, you just throw more machines at it, and the file system pretty much linearly scales, because there’s nothing centrally holding the scalability.
BP: So it’s an aggregation of multiple file systems rather than one coherent file system that has to maintain consistency?
ABP: No. The disk file system is just a matter of formatting the drives. The Gluster file system is a complete storage operating system stack. We did not rely on the underlying operating system at all, because we figured out very quickly [things like] memory manager, volume manager, software RAID, we even already support RDMA over 10 Gigabit Ethernet or InfiniBand, you pretty much have the entire storage operating system stack that’s a lot more scalable than a Unix or Linux kernel. We treat the underlying kernel more like a hypervisor, or a microkernel and don’t rely on any of its features. By pushing everything to user space we were able to very quickly innovate new complicated things that were not possible before and pretty much scale very nicely across multiple machines.
Gluster VP of marketing Jack O’Brien: The three big architectural decisions we made early on…one is that we were in user space rather than kernel space and the second is that rather than having a centralized or distributed metadata store, there’s this concept called elastic hashing where essentially you algorithmically determine where the data lies and the metadata is with the data rather than being separated. And the third is open source.
BP: Did you see the EMC announcement about VPlex or are you familiar with YottaYotta and what they did with cache coherency, having a pointer rather than having to make sure all data is replicated across all nodes? Is it similar to that?
ABP: What it sounds like they’re describing is basically asynchronous replication with locking, that’s how they bring you the cache coherency issue. But what I explained was, the file system is completely self-contained and distributed so we don’t have to handle the cache coherency issue. The cache coherency issue comes when you separate the data and metadata so when you’re modifying a file you have to hold the lock until a change appears…because we don’t have to hold metadata separately, we don’t have to hold the lock in the first place because we don’t have the cache coherency issue.
JO: Another way to think of it is, every node in the cluster runs the same algorithm to identify where a file’s located and every file has its own unique filename. The hash translates that into a unique ID—
BP: Oh, so it’s object based.
ABP: It is a hashing algorithm inside the file system, but for the end user it’s still files and folders.
BP: But this is how Panasas is, too, right? Underneath that file system interface to the user they have an object-based system with unique IDs…
ABP: But those IDs are stored in a distributed metadata server. We don’t have to do that.
JO: Our ID is part of the extended attributes of the file itself.
ABP: The back end file name is already unique enough, you don’t really need to store it in a separate index in a separate metadata server, we figured out we can come up with a smarter approach to do this. The reason [competitors] all had complications is because they parallelized it at the block layer, basically they took a disk file system and parallelized it, it’s a very complicated problem…you should parallelize at a much higher layer, at the VFS layer, and have a much simpler, more elegant approach.
JO: So a node doesn’t have to look up something centrally and it doesn’t have to ask anybody else in the cluster. It knows where the file’s located algorithmically.
BP: I think that’s what’s giving me the ice-cream headache here. So each node has a database within it? The thing I’m sticking on is ‘algorithmically knows where to look for it.’
ABP: At the highest level …given a file name and path that’s already unique, if you hash it it comes out to a number. If you have 10 machines, the number has to be between 1 and 10. No matter how many times from wherever you calculate it, you get the same number. So if the number is, for example seven, then the file has to be on the seventh node on the seventh particular data tree. The problem in hashing is when you add the 11th and 12th node you have to rehash everything. Hashing is a static logic, as you copy more and more data you can easily get hot spots and you can’t solve that problem. The others parallelize at the block layer and put the blocks across. Because we solve the problem at the file level, if you want to find a file…internally what happens is the operating system sends a lookup call…to verify whether the home directory exists and [the user] has the necessary permissions…and then it sends an open call on the file. Internally what happens is by the time the directory calls come, the call on the directory…has all the information about the file properties. We also send information about a bit map.
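The plain hashing scheme Periasamy describes here, and the rehashing problem he raises, can be sketched in a few lines of Python. This is an illustrative sketch only, not Gluster's code: the function names, the choice of MD5 and the sample paths are all assumptions made for the example.

```python
import hashlib


def node_for(path: str, node_count: int) -> int:
    """Map a file path to a node number in 1..node_count using a stable hash.

    Any machine that runs this with the same inputs gets the same answer,
    so no central lookup is needed. (MD5 is an arbitrary choice here.)
    """
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % node_count + 1


def moved_fraction(paths, old_count: int, new_count: int) -> float:
    """Fraction of paths whose node changes when the node count changes."""
    moved = sum(
        1 for p in paths if node_for(p, old_count) != node_for(p, new_count)
    )
    return moved / len(paths)


if __name__ == "__main__":
    # Hypothetical file names, purely for demonstration.
    sample = [f"/home/user{i}/file{i}.dat" for i in range(10_000)]
    print("node for first file with 10 nodes:", node_for(sample[0], 10))
    print("share of files that move going from 10 to 12 nodes:",
          moved_fraction(sample, 10, 12))
```

With a uniform hash, a file stays put when going from 10 to 12 nodes only if its hash leaves the same remainder for both counts, which in expectation is about one file in six. The rest would have to be moved, which is the "rehash everything" problem.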
Instead of taking a simple plain hash logic which cannot scale…you don’t have to physically think that you only have 10 machines. You can think logically, mathematically, you can think you have a thousand machines, there is nothing stopping you from doing that, it’s the idea of a virtual storage solution. It’s like with virtual machines, you may have only 10 machines but you think you have a thousand virtual machines, so we mathematically think we have a thousand disks. It can be any bigger number, and the actual number is really big. Then we present each logical disk as a bit, so the entire information is basically just a bit array, and the bit array is stored as a standard attribute on the data tree itself. By the time the OS or application tries to open the file, a stat call comes and the stat call already has this bitmap, and the hash logic will index into a virtual disk which really doesn’t exist, it could be some 33,000th cluster disk. And whichever directory wants that bit, you know that the file is in that machine, and don’t need to ask the metadata server, ‘tell me where my block is, hold the lock on the metadata because I need to change this bit.’
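One way to picture the virtual-disk idea is the Python sketch below. It is a loose illustration under stated assumptions, not Gluster's implementation: the 1,024-slot virtual space, the round-robin starting layout, the rebalancing rule and every function name are hypothetical. The points it tries to capture are that the hash always targets a fixed virtual space rather than the physical node count, and that each node's ownership can be expressed as a bit array.

```python
import hashlib
from collections import Counter

VIRTUAL_DISKS = 1024  # assumed size of the fixed virtual space (hypothetical)


def virtual_disk_for(path: str) -> int:
    """Hash a path to a virtual disk index; never depends on the node count."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % VIRTUAL_DISKS


def initial_layout(nodes):
    """Spread the virtual disks round-robin over the physical nodes."""
    return [nodes[i % len(nodes)] for i in range(VIRTUAL_DISKS)]


def add_node(layout, new_node):
    """Give new_node an even share of virtual disks, leaving the rest untouched.

    Assumes new_node does not already appear in the layout.
    """
    layout = list(layout)
    counts = Counter(layout)
    target = VIRTUAL_DISKS // (len(counts) + 1)
    taken = 0
    for slot, owner in enumerate(layout):
        if taken >= target:
            break
        if counts[owner] > target:  # only take from nodes above the fair share
            counts[owner] -= 1
            layout[slot] = new_node
            taken += 1
    return layout


def bitmaps(layout):
    """Per node, an integer bit array with one bit set per owned virtual disk."""
    maps = {}
    for slot, owner in enumerate(layout):
        maps[owner] = maps.get(owner, 0) | (1 << slot)
    return maps


def locate(path, maps):
    """Find the node whose bit array has the bit for this path's virtual disk."""
    bit = 1 << virtual_disk_for(path)
    for node, bitmap in maps.items():
        if bitmap & bit:
            return node
    raise LookupError(path)


if __name__ == "__main__":
    ten = initial_layout([f"node{i}" for i in range(1, 11)])
    eleven = add_node(ten, "node11")
    changed = sum(a != b for a, b in zip(ten, eleven))
    print("virtual disks reassigned when an 11th node joins:", changed)
    print("owner of /seismic/run1.dat:", locate("/seismic/run1.dat", bitmaps(eleven)))
```

In this sketch, adding an 11th node reassigns roughly one virtual disk in eleven, so only the files that hash to those virtual disks would need to move, in contrast to rehashing against the raw node count.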
BP: But then if two people want to write at the same file at the same time…
ABP: We have a distributed locking mechanism. Because the knowledge of files is there across the stack, we only had to write a locking module that knows how to handle one file.