Apache Cassandra – Distributed Database – Part II
As seen in the discussion of consistency, Cassandra offers users the choice of synchronous or asynchronous replication. Reducing the replication factor is easy, as it only requires running a cleanup afterwards to remove the extra replicas. Highly available asynchronous operations are optimized with features like:
- Hinted Handoff – If a node that should receive a write is down, a hint will be written to a live replica node indicating that the write needs to be replayed to the unavailable node. If there are no live replica nodes for the key, the coordinating node will write the hint locally. This reduces the time required for a temporarily failed node to become consistent again. Continued »
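The hinted-handoff behaviour described above can be sketched as a minimal in-memory simulation. The `Node`, `write` and `replay_hints` structures below are hypothetical, for illustration only; real Cassandra stores and replays hints internally:

```python
# Minimal sketch of hinted handoff; illustrative only, not Cassandra's actual code.
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key -> value held by this replica
        self.hints = []   # (target_node, key, value) hints held for down nodes

def write(replicas, key, value):
    """Write to live replicas; a live node holds a hint for each down replica."""
    live = [n for n in replicas if n.up]
    for n in live:
        n.data[key] = value
    for down in (n for n in replicas if not n.up):
        if live:
            live[0].hints.append((down, key, value))  # replay later to `down`

def replay_hints(holder):
    """When a failed node recovers, the hint holder replays its hints to it."""
    remaining = []
    for target, key, value in holder.hints:
        if target.up:
            target.data[key] = value   # the missed write is replayed
        else:
            remaining.append((target, key, value))
    holder.hints = remaining

a, b = Node("a"), Node("b")
b.up = False
write([a, b], "k1", "v1")   # b misses the write; a holds a hint for it
b.up = True
replay_hints(a)             # b becomes consistent again
```

The point of the sketch is the recovery property: the temporarily failed node catches up from the stored hint rather than waiting for a full repair.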
Apache Cassandra – Distributed Database – Part I
Apache Cassandra is a highly available, incrementally scalable, eventually consistent, distributed database that brings together Amazon Dynamo’s fully distributed design and Google Bigtable’s Column Family-based data model.
Schema-less and Rich Data Model
Cassandra is considered a schema-less datastore. Like Bigtable, Cassandra provides a ColumnFamily-based data model that is richer than typical key/value systems. A column-oriented model can be visualized as a key/value pair where the value is itself a collection of other key/value elements. Continued »
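That "key/value of key/value elements" picture can be shown with a tiny sketch; the row keys and column data below are hypothetical, and a plain dict only illustrates the shape of the model, not Cassandra's storage:

```python
# Illustrative sketch: a ColumnFamily modelled as row-key -> {column: value}.
# The data is made up; this shows the shape of the model, not Cassandra itself.
column_family = {
    "user:1001": {"name": "Alice", "email": "alice@example.com"},
    "user:1002": {"name": "Bob"},   # rows need not share the same columns
}

# The value for a row key is itself a collection of key/value columns:
email = column_family["user:1001"]["email"]   # -> "alice@example.com"
```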
Hadoop Database for Big Data – Part III
For insert-only situations (in most cases, at least) involving crawling and indexing (both on the web and in enterprises), blogs/wikis, Facebook-like applications, search-based retrieval (as opposed to query-based), and batch-oriented or in-memory aggregations and computations, wide-column stores on Hadoop with KVP support would be relevant.
Hadoop provides a simplified programming model that allows the user to quickly write and test distributed systems. It distributes data and work across machines efficiently and automatically, in turn utilizing the underlying parallelism of the CPU cores. Continued »
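The simplified programming model amounts to writing a map function and a reduce function while the framework handles distribution. A minimal word-count sketch in plain Python (Hadoop itself exposes this model through a Java API; this only illustrates the idea):

```python
# Sketch of the map/reduce programming model; not the Hadoop API itself.
from collections import defaultdict

def map_phase(records):
    """The user-written 'map' step: emit (word, 1) for every word."""
    for line in records:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """The user-written 'reduce' step: sum the counts grouped per key.
    In Hadoop the grouping (shuffle) is done by the framework across machines."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(map_phase(["big data", "big table"]))
# counts == {"big": 2, "data": 1, "table": 1}
```

Because map and reduce are side-effect-free per record, the framework is free to run them in parallel across CPU cores and machines.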
Hadoop Database for Big Data – Part II
HBase uses a data model similar to that of Google’s Bigtable. Data is logically organized into tables, rows and columns. A data row has a sortable row key and an arbitrary number of columns. The tables are stored sparsely so that the rows in the same table can have widely varying number of columns. Any particular column may have multiple versions for the same row key. Continued »
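The sparse, versioned model described above can be sketched as a nested map from (row, column) to timestamped versions; the row keys, column names and timestamps below are hypothetical, for illustration only:

```python
# Sketch of HBase-style versioned cells: row -> column -> {timestamp: value}.
# Illustrative only; not HBase internals or its client API.
table = {}

def put(row, column, value, timestamp):
    table.setdefault(row, {}).setdefault(column, {})[timestamp] = value

def get_latest(row, column):
    versions = table[row][column]
    return versions[max(versions)]   # the highest timestamp wins

put("row1", "cf:qual", "old", 1)
put("row1", "cf:qual", "new", 2)   # a second version of the same cell
get_latest("row1", "cf:qual")      # -> "new"
```

Note that rows only store the columns actually written to them, which is what makes the table sparse.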
Redis, NoSQL with extensive features – Part II
Redis supports master-slave replication, which can be used to improve scalability (multiple slaves scaling to huge numbers of reads) as well as to improve data redundancy.
Redis allows any number of slave servers to be exact copies of master servers. The important facts about Redis replication are: Continued »
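The read-scaling idea can be sketched with a tiny in-memory simulation: writes go to the master, replication keeps the slaves as exact copies, and reads are spread across the slaves. The classes and names below are hypothetical, for illustration only; they are not the Redis protocol:

```python
# Illustrative sketch of master-slave read scaling; not actual Redis code.
import itertools

class Slave:
    def __init__(self):
        self.data = {}

class Master:
    def __init__(self, slaves):
        self.data = {}
        self.slaves = slaves

    def set(self, key, value):
        self.data[key] = value
        for s in self.slaves:        # replication: slaves stay exact copies
            s.data[key] = value

slaves = [Slave(), Slave()]
master = Master(slaves)
master.set("greeting", "hello")

read_cycle = itertools.cycle(slaves)  # round-robin reads across the slaves
value = next(read_cycle).data["greeting"]   # -> "hello"
```

Since every slave holds the full dataset, adding slaves multiplies read capacity without touching the write path.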
Redis, NoSQL with extensive features – Part I
Redis (Remote Dictionary Server) is an open source, networked, in-memory, persistent, journaled, key-value data store. It is similar to memcached, but goes beyond storing simple string values by allowing for lists, sets and sorted sets. It can be used as a cache in front of a traditional database, but because the in-memory datasets are not volatile and are persisted on disk, it can also be used on its own (a case study using Redis alone for a web application is available at http://code.google.com/p/redis/wiki/TwitterAlikeExample). Continued »
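Sorted sets are the richest of the value types mentioned above: each member carries a score and members are ranked by it. The helpers below only mimic the semantics of ZADD/ZRANGE in plain Python; they are not a Redis client:

```python
# Plain-Python sketch of Redis sorted-set semantics (member ranked by score).
# zadd/zrange here only mimic ZADD/ZRANGE; this is not the Redis client API.
scores = {}

def zadd(member, score):
    scores[member] = score

def zrange(start, stop):
    """Return members ordered by ascending score, stop inclusive."""
    ranked = sorted(scores, key=lambda m: scores[m])
    return ranked[start:stop + 1]

zadd("alice", 10)
zadd("bob", 5)
zrange(0, 1)   # -> ["bob", "alice"]: lowest score first
```

This score-ordered structure is what makes leaderboards and ranked timelines natural use cases for Redis.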
RabbitMQ, Open Source Enterprise Messaging
RabbitMQ is an open source solution providing robust messaging for applications. It is easy to use, fit for purpose at cloud scale, and supported on all major operating systems and developer platforms.
RabbitMQ can be downloaded and installed from http://www.rabbitmq.com/download.html. It has an interesting tagline “RabbitMQ – Messaging that just works”, and the website says that “you can have RabbitMQ up and running within two minutes of completing your download”. Setting up RabbitMQ is described well in http://www.skaag.net/2010/03/12/rabbitmq-for-beginners/.
RabbitMQ is designed from the ground up to interoperate with other messaging systems. It is the leading implementation of AMQP, the open standard for business messaging. Continued »
Understanding Elastic Caching Platforms – IBM eXtreme Scale
Data caching is a standard technique for improving application performance. Local caching, though the fastest, is not able to scale, and caching multiple copies of the same data in local caches raises the complication of keeping the copies in sync (which becomes all the more difficult to manage when hundreds of nodes are involved).
With Elastic Caching, a caching layer (a cluster with a large number of caching nodes) is added to the web architecture. Elastic caching can handle large amounts of data (it is not limited by the memory in a single server) and provides massive scalability. Cloud computing, which can be said to be about elastic infrastructure, makes elastic caching platforms necessary. Continued »
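A common way such caching clusters place data across many nodes is consistent hashing, so that adding or removing a node moves only a fraction of the keys. A minimal sketch (the node names are hypothetical, and real products such as eXtreme Scale use their own partitioning schemes):

```python
# Minimal consistent-hashing sketch for a caching cluster; illustrative only.
import bisect
import hashlib

def h(value):
    """Map a string to a point on the hash ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def node_for(self, key):
        """Route a key to the first node clockwise from the key's hash."""
        points = [p for p, _ in self.ring]
        i = bisect.bisect(points, h(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["cache-1", "cache-2", "cache-3"])
node = ring.node_for("user:42")   # the same key always routes to the same node
```

Because each node owns only an arc of the ring, the cluster's total capacity grows with the number of nodes rather than being bounded by any one server's memory.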
Using DB2 Hash organized tables for improved performance
In DB2 version 10 for z/OS, IBM introduces a new access type called hash access and a new access method called hash space. This new option of organizing tables using a hash improves the performance of queries that access individual rows using an equals predicate (say, fetching data by customer number or product number).
DB2 uses an internal hash algorithm with the hash space to locate the data rows. A hash access path thus means (in most cases) only one I/O to retrieve a row from the table, which reduces CPU usage and improves response time, making it a very compelling proposition.
Using hash access also removes the need to maintain the data sequence or a clustering index (for that matter, index clustering is not allowed if the table is hash organized). This results in efficient insert processing and avoids data-sharing contention in maintaining a clustering sequence or clustering index. Continued »
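The advantage of an equals-predicate lookup through a hash space can be illustrated in miniature: the key hashes directly to the row's location, with no index traversal or ordered scan. The table and column names below are hypothetical, and a Python dict (itself a hash table) stands in for the hash space:

```python
# Illustrative analogue of hash-organized access; not DB2 internals.
# A Python dict plays the role of the hash space: key -> row location.
rows = [
    {"customer_no": 101, "name": "Acme"},     # hypothetical rows
    {"customer_no": 205, "name": "Globex"},
]

# Build the "hash space": hash the key column to reach each row directly.
hash_space = {row["customer_no"]: row for row in rows}

# SELECT ... WHERE customer_no = 205 -> a single probe, no index traversal,
# and no clustering order to maintain on insert.
result = hash_space[205]["name"]   # -> "Globex"
```

The trade-off implied by the excerpt is that this direct access applies only to equals predicates; range scans still favour ordered (clustered) organizations.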