Posted by: Sasirekha R
Apache, CAP Theorem, Consistency, CouchDB, Document Store, NoSQL, Relational Database, Scalability
CouchDB – NOSQL – for Document based applications
CouchDB, as the name suggests, carries “relax” as the byline on its official logo, and when you start CouchDB it says: “Apache CouchDB has started. Time to relax”!
CouchDB’s creators found existing database tools too cumbersome to work with during development or in production, and decided to focus on making CouchDB easy, even a pleasure, to use.
Like most NoSQL databases, CouchDB builds on Brewer’s CAP theorem, which states that it is impossible for a distributed computer system to simultaneously provide all three guarantees: Consistency, Availability and Partition Tolerance.
The point is that even the most successful websites today can guarantee only two of Consistency, Availability and Partition Tolerance. So why not consider the same trade-offs in corporate environments? The answer is definitely yes. At the same time, it becomes obvious that no single form of data storage is going to fit all your application requirements; what you now have is the freedom to pick the trade-off based on what your application requires.
What makes CouchDB relaxing?
1. Learning CouchDB and understanding its core concepts should feel natural to almost everybody who has done any work on the Web, and it is also easy to explain to non-technical people.
2. In a production environment, CouchDB’s fault-tolerant internal architecture ensures that failures occur in a controlled environment and are dealt with gracefully. Single problems do not cascade through an entire server system but stay isolated in single requests.
3. CouchDB is designed to handle varying traffic gracefully. For instance, if a website is experiencing a sudden spike in traffic, CouchDB will generally absorb a lot of concurrent requests without falling over. It may take a little more time for each request, but they all get answered. When the spike is over, CouchDB returns to its regular speed.
4. Graceful scaling (growing and shrinking the underlying hardware of your application) is handled effectively. CouchDB enforces a set of limits on the programmer, which at first look seem inflexible, and leaves out some features by design so that programmers cannot create applications that are unable to deal with scaling up or down.
According to its creators, what it boils down to is “CouchDB doesn’t let you do things that would get you in trouble later on. This sometimes means you’ll have to unlearn best practices you might have picked up in your current or past work”.
CouchDB (which falls under the “document store” breed of NoSQL) drastically changes the way document-based applications are built. CouchDB combines an intuitive document storage model with a powerful query engine in a way that’s so simple that a Django developer says “CouchDB makes Django look old-school in the same way that Django makes ASP look outdated”.
The idea of “evolving, self-contained documents” is the very core of its data model. The design of CouchDB can be summed up as:
1. Based on Web architecture and the concepts of resources, methods, and representations.
2. Augmented with powerful ways to query, map, combine, and filter data.
3. Added fault tolerance, extreme scalability, and incremental replication.
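As a rough illustration of the first point, CouchDB exposes databases and documents as HTTP resources that are manipulated with standard methods and exchanged as JSON representations. The sketch below only builds the requests and does not send them. The address `localhost:5984`, the database name `invoices`, the document id `inv-001` and the helper `build_request` are assumptions made for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumed address of a local CouchDB


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for a CouchDB resource.

    Hypothetical helper for illustration only.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# A database is a resource: PUT on its URL creates it.
create_db = build_request("PUT", "/invoices")
# A document is a resource inside the database: PUT stores its JSON representation.
save_doc = build_request("PUT", "/invoices/inv-001", {"buyer": "Alice", "total": 120})
# GET on the same URL retrieves the current representation.
read_doc = build_request("GET", "/invoices/inv-001")

for req in (create_db, save_doc, read_doc):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it, provided a CouchDB instance is actually listening at the assumed address.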
Simple Read vs Simple Writes
In CouchDB, an invoice that contains all the pertinent information about a single transaction (the seller, the buyer, the date, and a list of the items or services sold) is stored as a single document.
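To make this concrete, here is a minimal sketch of what such an invoice document might look like, written as a Python dictionary that maps directly to JSON. All field names and values are invented for illustration, apart from `_id`, which is the field CouchDB uses for the document identifier.

```python
import json

# Hypothetical invoice; field names and values are made up for illustration.
invoice = {
    "_id": "invoice-0042",
    "type": "invoice",
    "date": "2011-03-15",
    "seller": {"name": "Acme Supplies"},
    "buyer": {"name": "Jane Doe", "email": "jane@example.com"},
    "items": [
        {"description": "Paper, A4 ream", "quantity": 10, "unit_price": 4.50},
        {"description": "Toner cartridge", "quantity": 2, "unit_price": 60.00},
    ],
}


def invoice_total(doc):
    """Sum quantity * unit_price over the items embedded in the document."""
    return sum(item["quantity"] * item["unit_price"] for item in doc["items"])


# Reading this one document is enough to display or total the invoice.
print(json.dumps(invoice, indent=2))
print("Total:", invoice_total(invoice))
```

Everything needed to render the invoice travels together, so no second lookup is required.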
In a relational database, the equivalent case would involve rows written to the buyer, seller and items tables, with the list of items then written to a transaction table, one row per item. Relational data is thus normalized so that it is not repeated or duplicated, and hence can be easily added, updated or deleted. In effect, the relational model enforces a consistent state by keeping writes simple. The trade-off is that retrieving the details of a single invoice requires reading from multiple tables using joins. Joins are in most cases complex and hence make reading slow, and in most environments relational databases have not scaled well in a web world with millions of concurrent users.
Document stores like CouchDB can handle millions of concurrent reads because a read is simple (typically one document contains all the required information), at the expense of “Consistency”.
In either case, relational or document store, there is a compromise, and hence it is up to the developers to make an informed choice based on the specific application’s requirements. CouchDB is a better fit for common applications that involve taking mundane information (contacts, invoices, receipts, etc.) and manipulating it using a computer application.
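The relational side of this comparison can be sketched with Python's built-in `sqlite3` module. The schema below is an assumption made for illustration: the table and column names are invented, and the transaction table is called `invoice_line` to avoid the SQL keyword. Note how reading one invoice back takes a four-way join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE buyer   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE seller  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item    (id INTEGER PRIMARY KEY, description TEXT, unit_price REAL);
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, buyer_id INTEGER,
                          seller_id INTEGER, date TEXT);
    CREATE TABLE invoice_line (invoice_id INTEGER, item_id INTEGER, quantity INTEGER);

    -- Simple writes: each fact is stored exactly once.
    INSERT INTO buyer   VALUES (1, 'Jane Doe');
    INSERT INTO seller  VALUES (1, 'Acme Supplies');
    INSERT INTO item    VALUES (1, 'Paper, A4 ream', 4.50);
    INSERT INTO item    VALUES (2, 'Toner cartridge', 60.00);
    INSERT INTO invoice VALUES (42, 1, 1, '2011-03-15');
    INSERT INTO invoice_line VALUES (42, 1, 10);
    INSERT INTO invoice_line VALUES (42, 2, 2);
""")

# Complex read: reassembling one invoice touches every table.
rows = conn.execute("""
    SELECT b.name, s.name, i.description, l.quantity, i.unit_price
    FROM invoice_line AS l
    JOIN invoice AS inv ON inv.id = l.invoice_id
    JOIN buyer   AS b   ON b.id   = inv.buyer_id
    JOIN seller  AS s   ON s.id   = inv.seller_id
    JOIN item    AS i   ON i.id   = l.item_id
    WHERE inv.id = ?
    ORDER BY i.id
""", (42,)).fetchall()

total = sum(quantity * unit_price for _, _, _, quantity, unit_price in rows)
for row in rows:
    print(row)
print("Total:", total)
```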
CouchDB’s schema-free design makes it suitable for handling what is referred to as a “real-world document”. A business card is an example of a real-world document: cards typically carry the same information (identity, affiliation and contact details) but present it in different forms. One business card may have only a phone number, while another may have a phone number, a fax number and an email address as contact information. In effect, real-world documents of the same type are similar in semantics but vary hugely in syntax.
A relational database typically requires the data to be modeled up front and expects every column to be filled; in this case the fax number would be stored as “not available” or “NULL”. CouchDB is schema-free and hence can store unstructured data and allows aggregating the data after the fact, as humans tend to do.
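The business card case can be sketched as follows. Each card is a document that carries only the fields it actually has, and the aggregation is decided later rather than being built into a schema. This is a plain-Python imitation of the map-and-reduce idea, not CouchDB's own view API (CouchDB views are normally written in JavaScript); the sample cards and the function names `map_contact_methods` and `count_by_key` are invented for illustration.

```python
# Three business cards with different sets of fields; no shared schema.
cards = [
    {"name": "Alice", "company": "Acme", "phone": "555-0100"},
    {"name": "Bob", "company": "Globex", "phone": "555-0101",
     "fax": "555-0102", "email": "bob@example.com"},
    {"name": "Carol", "company": "Acme", "email": "carol@example.com"},
]


def map_contact_methods(doc):
    """Emit (contact_method, 1) for each contact field the card actually has."""
    for field in ("phone", "fax", "email"):
        if field in doc:
            yield field, 1


def count_by_key(docs, map_fn):
    """Run map_fn over every document and sum the emitted values per key."""
    counts = {}
    for doc in docs:
        for key, value in map_fn(doc):
            counts[key] = counts.get(key, 0) + value
    return counts


# Decide after the fact which question to ask of the stored documents.
print(count_by_key(cards, map_contact_methods))
```

A card without a fax number simply omits the field, so nothing has to be padded with placeholder values.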
While it is possible to use CouchDB as is, it also provides tools with a multitude of knobs with which you can make the system work better in one area (at the cost of others, of course). You can build a system that is
- super fast, but with reduced reliability
- reliable, but at the cost of a performance hit
- low in latency, but with reduced concurrency and throughput.
In effect, CouchDB acknowledges the CAP theorem and provides enough building blocks for users to create a system that suits their requirements, in turn letting them decide on the trade-off.
Local Data is King
Erlang, CouchDB’s implementation language, was designed to run on embedded devices orders of magnitude smaller and less powerful than today’s phones. Instead of relying on the network connection or network speed, a CouchDB instance installed on a phone or other mobile device works with local data and hence has reduced latency. These devices synchronize their data with centrally hosted CouchDB instances when they are on a network.
CouchDBCP, short for CouchDB Clustering Proxy, is a tool for maintaining CouchDB clusters. Its objective is to provide the abstraction of a single reliable CouchDB device using a collection of possibly unreliable CouchDB units.
Cloudant, a venture-backed startup, takes the reliability, simplicity, and power of CouchDB and adds distribution, scalability, and ‘cloud readiness’, and offers BigCouch.
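The synchronization step can be sketched with CouchDB's replication endpoint, `POST /_replicate`, which takes a JSON body naming a `source` and a `target` database. The snippet below only constructs the two requests (push and pull) and does not send them. Both server addresses, the database name `contacts` and the helper `replication_request` are assumptions made for this example.

```python
import json
import urllib.request

LOCAL = "http://localhost:5984"            # assumed CouchDB running on the device
CENTRAL = "http://couch.example.com:5984"  # hypothetical central server


def replication_request(source_url, target_url):
    """Build (but do not send) a request asking the local CouchDB to replicate.

    Hypothetical helper for illustration only.
    """
    body = json.dumps({"source": source_url, "target": target_url}).encode("utf-8")
    return urllib.request.Request(
        LOCAL + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Push local changes to the central server, then pull remote changes back.
push = replication_request(LOCAL + "/contacts", CENTRAL + "/contacts")
pull = replication_request(CENTRAL + "/contacts", LOCAL + "/contacts")

for req in (push, pull):
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

While the device is offline, the application keeps reading and writing the local `contacts` database; requests like these would be sent only once a network is available.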
To summarize, CouchDB is a schema-less data store built on HTTP, REST and JSON. It is suitable for document-based applications because it relies heavily on the concept of “evolving, self-contained documents” that are quite similar to real-world documents. It runs on most platforms: Unix, Windows and also Android phones. CouchDB has the backing of the Apache Software Foundation and a large community of enthusiastic developers, users, and contributors. The BBC uses CouchDB to ensure 24×7 availability of its public website, and it is also used by organizations like Mozilla and Canonical.