Question: When recovering an application at a remote disaster recovery site, how can the data’s integrity be verified before resuming production processing?
One of the primary goals of a disaster recovery (DR) plan is to protect business data. The work covered by the recovery time objective (RTO) is often the most time-consuming part of a DR plan. The objective is to shorten the recovery task as much as possible while maintaining trust in your data, so that the business can resume regular operations quickly. The complexity of many applications leaves far too many places for bad data to hide, so the following recommendations are intended to help speed the recovery process.
Disaster Recovery Preparation
- Implement the fastest replication to your DR site that budget and technology allow. The best option to minimize data loss is synchronous replication of your business-critical data to a remote backup site. Because transactions are written to both sites at the same time and the replication logs record data currency, this gives the strongest assurance that the DR copy is current and complete.
- The next best option is an asynchronous replication scheme. Whether the delay is measured in minutes or depends on periodic file copies, you accept some known level of data loss. However, since this option is considerably less expensive, the business owners might decide that the reduced cost offsets the increased risk of data loss. Make sure that everyone understands the impact of data loss on the business recovery process. This includes establishing data loss tolerances and identifying likely types of data loss during the development of the DR plan.
- Copy verifying data to your DR site along with the primary data. This means finding supporting data from outside the application that corroborates your database. This might include data input forms, activity logs, emails, checksums, or input files. Some of these may be paper-based and could be scanned or copied to the DR site.
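As one illustration of the checksum idea in the last recommendation, the sketch below builds a manifest of SHA-256 digests for a directory of files and compares the manifest shipped from the primary site with one rebuilt at the DR site. It is a minimal sketch, not a prescribed tool: the function names (`file_sha256`, `build_manifest`, `compare_manifests`) and the assumption that the verifying data is a directory tree of files are hypothetical and chosen only for the example.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 digest.

    Hypothetical helper: assumes the data to protect lives under one directory.
    """
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(primary, recovered):
    """Report files that are missing, unexpected, or changed at the DR site."""
    shared = set(primary) & set(recovered)
    return {
        "missing": sorted(set(primary) - set(recovered)),
        "unexpected": sorted(set(recovered) - set(primary)),
        "mismatched": sorted(n for n in shared if primary[n] != recovered[n]),
    }
```

In this sketch the manifest built at the primary site would be copied to the DR site along with the data. After recovery, rebuilding the manifest there and passing both to `compare_manifests` gives a short list of files to investigate before production resumes.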
Recovery Plan Preparation
- Assess the data at the DR site by establishing both its currency (i.e., how old it is, as precisely as possible) and its consistency across multiple applications. Currency matters because it identifies any lost transactions or updates, while consistency matters because incomplete transactions can cause data corruption problems after you have resumed production.
- Review the replication logs for the last data transmission to establish currency.
- Compare verifying data at the DR site with the recovered data to follow specific transactions and identify the degree of data loss.
- Have the business application users access the data and determine if recent changes to production data are found in the recovered data.
- Execute a build acceptance test (BAT) like the ones used to verify application installation and integrity. These tests often point out data anomalies.
- Have a process and tools in place for tracking and resolving data problems after you return the applications to production.
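The currency and comparison steps above can be sketched in code. The example below is an illustrative assumption, not part of any replication product. It supposes you can extract three things: the timestamp of the last successful transmission from the replication logs, the set of transaction IDs present in the recovered data, and a list of (transaction ID, timestamp) pairs from the verifying data. The names `data_age` and `reconcile` are hypothetical.

```python
from datetime import datetime, timezone


def data_age(last_replicated_at, failure_at):
    """Currency: how far the recovered data lags behind the moment of failure."""
    return failure_at - last_replicated_at


def reconcile(verifying_records, recovered_ids, last_replicated_at):
    """Classify transactions from the verifying data against the recovered data.

    verifying_records: iterable of (transaction_id, timestamp) pairs taken from
        sources outside the application (assumed format, for illustration).
    recovered_ids: set of transaction IDs found in the recovered database.
    last_replicated_at: time of the last transmission per the replication logs.
    """
    result = {"confirmed": [], "expected_loss": [], "unexplained": []}
    for txn_id, stamp in sorted(verifying_records, key=lambda rec: rec[1]):
        if txn_id in recovered_ids:
            # Present in the recovered data: corroborated by outside evidence.
            result["confirmed"].append(txn_id)
        elif stamp > last_replicated_at:
            # Happened after the last transmission: loss within known currency.
            result["expected_loss"].append(txn_id)
        else:
            # Should have been replicated but is absent: investigate first.
            result["unexplained"].append(txn_id)
    return result
```

Transactions in the `expected_loss` group are candidates for re-entry from the verifying data, while anything in `unexplained` points to a possible consistency problem and is worth resolving or tracking before the application returns to production.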
The above recommendations may not identify all the issues, but they will go a long way toward meeting your RTOs and customer requirements, so that you will be able to start using the application while continuing the verification of the recovered data.
John McWilliams, JH McWilliams & Associates, Business Continuity Consultants