In contrast to the other management products recently developed, released or acquired by VMware, SRM breaks the pattern. You can quite clearly see what the point of this product is, the people it is targeted at and the benefits it can bring. I like SRM, and although it’s still in beta, and will be a version 1.0 product when it ships, for me it will be like when I first got hold of VMotion or DRS. SRM itself is not without issues, and I detect a slight “disconnect” between the main engineering team that owns ESX and VirtualCenter and the guys who own the SRM project. The first issue concerns the fact that SRM requires two non-linked VirtualCenters. That means the metadata (if you want to call it that) held in the VirtualCenter inventory, and stored in the SQL backend, cannot currently be replicated from the Protected Site to the Recovery Site. You would think you could just use the built-in replication in MS SQL or Oracle. But the architecture of VC currently doesn’t allow for this. It would require another iteration of VC. In short, that means the SRM team needs to convince the VC team to change the design. This is both a technical ask and a political ask. Let’s hope it’s there in time for Vi4, which some people are saying is not too far around the corner!
The second issue concerns the replication of VMFS. Now, SRM doesn’t do replication. That’s provided by your storage vendor. BUT, it does have to live with how that replication is done. Currently, the SRM system at the Recovery Site “re-signatures” the VMFS volumes which are replicated. The resignature process has a huge impact on the VMFS metadata: it means you get totally different volume names and GUID values. This resignature process is being done to be ultra-ultra safe, to ensure that one ESX server doesn’t get visibility of two LUNs with the exact same metadata. The fear is that, because the whole point of LUN replication is to create identical LUNs, an ESX host that saw both LUNs would not know where to write its data. So far so good. SRM automates the registering of VMs at the Recovery Site, so although all that lovely VMFS metadata has changed, you don’t get any nasty “I can’t power on a VM because the VMX file points to a totally differently named VMFS volume or GUID value” errors. It does, however, have huge consequences for failback. The design remit of SRM included failover to the Recovery Site, but not failback. Right now the process of invoking failback is not a pretty one. It means inverting the replication direction (so the protected site becomes the recovery site, and the recovery site becomes the protected site), then re-running your BC/DR plan again. Sounds simple? It’s far from it. In the banking and financial sector, where they invoke and test their BC/DR on a quarterly or six-monthly basis, this isn’t going to play well.
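To make the resignature point a bit more concrete, here is a toy model in Python. It is only a sketch of the idea, not how ESX or SRM actually implement it: the `Volume` class, the `resignature` and `remap_vmx_path` helpers, the `snap-` label prefix and the volume and VM names are all assumptions made up for illustration. It shows why a replicated volume gets a new identity, and why anything that pointed at the old volume name has to be re-pointed afterwards.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Volume:
    """Hypothetical stand-in for the identity metadata of a VMFS volume."""
    label: str
    vol_uuid: str


def resignature(vol: Volume) -> Volume:
    """Return a copy of the volume with a fresh UUID and a changed label.

    The 'snap-' prefix is an illustrative assumption, not a documented format.
    """
    return replace(vol, label=f"snap-{vol.label}", vol_uuid=str(uuid.uuid4()))


def remap_vmx_path(path: str, old: Volume, new: Volume) -> str:
    """Re-point a '[volume] folder/file.vmx' style path at the new volume."""
    prefix = f"[{old.label}] "
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} does not live on volume {old.label!r}")
    return f"[{new.label}] " + path[len(prefix):]


if __name__ == "__main__":
    protected = Volume(label="prod_vmfs01", vol_uuid=str(uuid.uuid4()))
    replica = resignature(protected)

    # The two copies no longer share an identity, so a host that can see
    # both of them has no ambiguity about which one it is writing to.
    print(protected)
    print(replica)

    # The price: every reference to the old name must be rewritten.
    print(remap_vmx_path("[prod_vmfs01] web01/web01.vmx", protected, replica))
```

In this model, failing back means going through the same rename-and-remap exercise in the opposite direction, which gives a feel for why it is more awkward than it first sounds.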
What is promising about SRM is that, having spoken very briefly and informally to people within the SRM product team, I know they are more than aware of these issues. That’s a good sign. It means something can and will be done about the shortcomings. However, not in this release; after all, when it reaches GA it will be a version 1.0 product. I think we all know what that means in the world of software.