Data Replication – the key to distributed applications
Data Replication, copying of data in databases to multiple locations, is an essential tool to support distributed applications (where the applications are closer to the line of business). As distributed data storage become more prevalent in large enterprises, the need for maintaining an increasing number of data copies to support timely local response would keep increasing.
Replication provides the benefits of fault tolerance in distributed computing environment, whereby the system can continue to function even when a part of the environment is down. Replication typically provides users with their own local copies of data. These local, updatable data copies can support increased localized processing, reduced network traffic, easy scalability and cheaper approaches for distributed, non-stop processing.
Data replication brings in the challenge of how to maintain the consistency and integrity of the data. With more nodes (or machines) in the cluster, the tightly coupled two-phase commit process becomes impractical. The new replication approaches decouples the applications from the underlying multiple data copies and allows the applications to proceed.
The data synchronization to keep the copies consistent happens at the background. Replication is the best current solution, if the application can deal with some inconsistency among the different data nodes for short periods of time.
All data replication copies data from source(s) to target(s) and the two major models involved are:
1. Master/Slave (or Single Master) model – data is replicated from Master to slave. A transaction is considered to be successful if it is committed at the master site. In the master/slave model every table fragment is assigned to a primary site. Master Slave model works very well in general when the application has a high read/write ratio. Master could become a single-point of failure and most products using Master/Slave approach provide option for backup masters which idles normally and takes over when Master node fails.
2. Peer-to-peer (p2p or no Master) model – Updates can be made to any data location and then copied to other locations. Transaction is considered successful if it is committed at any one site. Peer-to-peer is closest in capability to a true distributed database as there is no limitation on where the data can be located or updated. As there is no single master, there is no single point of failure. But it is more complex as it requires additional software for collision detection and resolution.
The replication timing is another important factor that impacts the consistency and performance of the system. Various alternatives include:
- Immediately or as soon as possible (ASAP).
- Scheduled, as determined by the system administrator.
- Triggered, by user defined criteria such as an event happening, the number of records exceeding a limit or time of day.
- Under manual control
In most NoSQL databases, the eventual consistency is achieved by using replication as the key.
If R=read replica count, W= Write replica count, N=replication factor, and Q (QUORUM) = N / 2 + 1; You will have consistency, when W + R > N;
- If W=Q and R=Q, then you have consistency and remaining replicas are checked in background
- If W=1 and R=N (OR) W=N and R =1, you have consistency as either all replicas have the latest version or all replicas are read for finding the latest
- If W=1 and R=1, you can get older data and Replication typically happens in the background
Collision occurs when two different originating nodes update two different physical copies of the same logical data with two different transactions. When the changes are broadcast changes from each of those originating sites the conflicts are identified and the process of reconciling the differences begins. Various conflict resolution possibilities exist:
- The initial update has priority. Rollback the conflicting (and later) transaction with necessary messages to designated parties.
- The last update has priority. Overwrite the conflict and send the necessary notices.
- Resolve the conflict by firing a user specified trigger.
- Halt the replication process and send a message to the administrator (for manual resolution)