Synchronous Replication and Data Inconsistency

Tags:
Backup and Recovery
EMC
HP
IBM
Storage
Veritas
Hi, what are the possible scenarios where one could have data inconsistency when using array-based synchronous replication? What could be done to prevent it, and has anybody had any real-life experience with these scenarios? Thanks, Steve

Answer Wiki


Steve,

Here are several scenarios I can think of right now (I am certain there are more):
(*) Re-mirroring – this scenario occurs whenever synchronization is re-established after either a planned or unplanned disconnection and queued source data is flushed (this is quite common and can happen, for example, after a planned DR test is completed). Most arrays (or host-based replication products, come to that) will not guarantee consistency of the remote copy while the re-sync is in progress, and the re-sync can take some time with a large amount of data, low bandwidth, or both.
Best practice to reduce risk – keep a snapshot or BCV of the remote copy, taken just before the re-sync starts (see the sketch below).
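A minimal sketch of that ordering, assuming a hypothetical array CLI named `arraycli` – the command names and the device label are placeholders, so substitute your vendor's snapshot/BCV and mirror commands (TimeFinder, Business Copy, FlashCopy, etc.):

```python
#!/usr/bin/env python3
"""Hypothetical sketch: protect the remote copy before starting a re-sync."""
import subprocess
import sys
from datetime import datetime

REMOTE_DEVICE = "remote_lun_01"  # placeholder device name
SNAP_NAME = f"pre_resync_{datetime.now():%Y%m%d_%H%M%S}"

def run(cmd):
    """Run a command and abort on failure -- never start the re-sync otherwise."""
    print("+", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        sys.exit(f"aborting: {' '.join(cmd)} failed: {result.stderr.strip()}")

# 1. Take a point-in-time copy of the (still consistent) remote device.
run(["arraycli", "snapshot", "create", REMOTE_DEVICE, "--name", SNAP_NAME])

# 2. Only then re-establish the mirror; the remote copy may be inconsistent
#    until the re-sync completes, but the snapshot preserves a fallback.
run(["arraycli", "mirror", "resync", REMOTE_DEVICE])
print(f"re-sync started; fallback snapshot is {SNAP_NAME}")
```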

(*) Improper host-based caching – some applications cache critical data in memory and asynchronously flush it to disk. If the application is sluggish enough, a power failure or similar unplanned event may leave the local on-disk copy of the data inconsistent. Of course, with synchronous replication in place, the remote copy will be inconsistent as well.
Best practice to reduce risk – keep consistent snapshots at the remote site (using an application-supported mechanism, if possible). Keep a remote copy of the application log files (if there are any). Try to tune the application to increase the disk-flush rate or eliminate caching altogether, if that is feasible without sacrificing too much performance. You may also consider the use of CDP products.
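To illustrate the caching point in application terms (a generic Python illustration, not tied to any particular product): data sitting in user-space or OS buffers never reaches the storage layer, so synchronous replication cannot protect it until the application flushes and fsyncs it to disk.

```python
"""Illustration: making an application's critical writes durable on disk."""
import os

def append_record(path: str, record: str) -> None:
    with open(path, "a") as f:
        f.write(record + "\n")   # still only in buffers at this point
        f.flush()                # push Python's buffer down to the OS
        os.fsync(f.fileno())     # ask the OS to push it to stable storage

# Only after fsync returns is the record on the local array -- and therefore
# covered by synchronous replication to the remote copy.
append_record("orders.log", "order=1234 state=COMMITTED")
```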

(*) “Rolling”, or “evolving”, faults – imagine that your data comprises multiple subsets, each handled by a different process. If the data relies on consistency across subsets, you are at risk. Further imagine that a fault or disaster at the primary site does not crash the processes simultaneously, but rather one after the other. It is not always relevant whether milliseconds or minutes separate each failure. The result is that the remote copy, while consistent at the subset level, will still be inconsistent as a whole.
Best practice to reduce risk – this takes some serious design. In a nutshell, consider a strategy that will enable you to roll the remote copy of the data back to a KNOWN past consistent state (a sketch of one such approach follows).
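One possible way to implement that roll-back strategy, sketched under the assumption that each process periodically records a shared checkpoint id for its own data subset (as a snapshot label or marker file, for example): after a rolling failure, roll every subset back to the newest checkpoint that all subsets reached, which is a point that is consistent as a whole.

```python
"""Hypothetical sketch: finding a known consistent restore point across subsets."""

# Checkpoint ids each subset is known to have completed at the remote site
# (illustrative data only).
subset_checkpoints = {
    "orders":    [101, 102, 103, 104],
    "inventory": [101, 102, 103],        # this process failed a little earlier
    "billing":   [101, 102, 103, 104],
}

def last_common_checkpoint(checkpoints: dict[str, list[int]]) -> int:
    """Newest checkpoint id present in every subset."""
    common = set.intersection(*(set(ids) for ids in checkpoints.values()))
    if not common:
        raise RuntimeError("no common checkpoint -- no consistent restore point")
    return max(common)

restore_point = last_common_checkpoint(subset_checkpoints)
print(f"roll every subset back to checkpoint {restore_point}")  # -> 103
```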

(*) I am sure there are more scenarios – these three are what popped out first.

Hope this helps,

Dron
