Enterprise IT Consultant Views on Technologies and Trends

Nov 9 2010   12:31PM GMT

MongoDB, the easy way to try out NoSQL – Part 2

Sasirekha R

MongoDB, the NoSQL favourite – Part 2

In this post, I shall cover some more interesting features of MongoDB.

MongoDB is available under a free license (AGPL) across platforms (Linux, Windows, Mac OS X, FreeBSD and Solaris).  MongoDB currently has client support for the following programming languages: C++, Java, JavaScript, Perl, PHP, Python and Ruby.

BSON objects in MongoDB are limited to 4MB in size. To store larger files, MongoDB uses the GridFS storage specification, which handles this transparently.  GridFS splits a large object into small chunks (usually 256K), and each chunk is stored as a separate document in a chunks collection. Metadata about the file (filename, content type and any other information) is stored as a document in a files collection. To retrieve chunks efficiently, we should create a compound index on the chunks collection for files_id and n (the chunk number). GridFS allows efficient storage of large objects, such as videos, and permits range operations (e.g., fetching only the first N bytes of a file).
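The chunking scheme described above can be sketched in plain Python. This is only a toy model and not the MongoDB driver API: the function names, the sample file and the metadata field names other than files_id and n are assumptions made for illustration, and a dictionary keyed by (files_id, n) stands in for the compound index.

```python
# Toy model of GridFS-style chunking; plain Python, not the MongoDB driver API.
CHUNK_SIZE = 256 * 1024  # the 256K chunk size mentioned in the text


def split_into_chunks(files_id, data, chunk_size=CHUNK_SIZE):
    """Return one chunk document per slice of the data, numbered by n."""
    return [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def make_file_doc(files_id, filename, content_type, data, chunk_size=CHUNK_SIZE):
    """Metadata document that would live in the files collection (field names illustrative)."""
    return {
        "_id": files_id,
        "filename": filename,
        "contentType": content_type,
        "length": len(data),
        "chunkSize": chunk_size,
    }


def read_first_bytes(chunks_index, file_doc, count):
    """Fetch only the chunks needed to cover the first `count` bytes of the file."""
    count = min(count, file_doc["length"])
    needed = -(-count // file_doc["chunkSize"])  # ceiling division
    parts = [chunks_index[(file_doc["_id"], n)]["data"] for n in range(needed)]
    return b"".join(parts)[:count]


payload = bytes(range(256)) * 4096  # 1 MiB of sample data
chunks = split_into_chunks("video-1", payload)
file_doc = make_file_doc("video-1", "clip.bin", "application/octet-stream", payload)
# A dict keyed by (files_id, n) stands in for the compound index on the chunks collection.
chunks_index = {(c["files_id"], c["n"]): c for c in chunks}
head = read_first_bytes(chunks_index, file_doc, 300_000)
print(len(chunks), len(head))
```

Because each chunk carries its own number, a range read touches only the chunks that overlap the requested bytes instead of loading the whole file.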

MongoDB supports asynchronous replication of data between servers for failover and redundancy. Only the master server is active for writes at a given time, which enables strong consistency. Read operations can optionally be sent to the slaves when eventual consistency is acceptable. Originally MongoDB supported replication with 1 master and N slaves, where failover had to be handled manually. Replica sets are supported from v1.6.0: N servers, only one of which is primary at a time, provide auto-failover and auto-recovery.

Writes committed at the primary of the set may be visible before the true cluster-wide commit has occurred, which makes the theoretically achievable performance and availability higher. On a failover, any data that has not yet been replicated from the primary is dropped. A client can block until a write operation has been replicated to N servers (i.e., a write is only truly committed after it has replicated to N servers). For important writes, the client should request acknowledgement of this with a getLastError({w:…}) call.
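To make the trade-off concrete, here is a toy model in plain Python of a primary with asynchronously updated secondaries. It is not MongoDB's replication protocol or API: the class and method names are hypothetical, replication is triggered by hand, and each secondary's log is assumed to be a prefix of the primary's log.

```python
# Toy model of primary/secondary replication; not MongoDB's actual protocol or API.
class ToyReplicaSet:
    def __init__(self, secondaries):
        self.primary = []  # writes accepted by the primary, in order
        self.secondaries = [[] for _ in range(secondaries)]

    def write(self, doc):
        """Accept a write on the primary and return its position in the log."""
        self.primary.append(doc)
        return len(self.primary) - 1

    def replicate(self, secondary, upto):
        """Copy the first `upto` entries of the primary's log to one secondary."""
        self.secondaries[secondary] = self.primary[:upto]

    def replicated_to(self, position):
        """Count the servers (primary included) that hold the write at `position`."""
        return 1 + sum(len(log) > position for log in self.secondaries)

    def acknowledged(self, position, w):
        """True once the write has reached at least `w` servers."""
        return self.replicated_to(position) >= w

    def fail_over(self, secondary):
        """Promote a secondary; return the writes it never received, which are dropped."""
        lost = self.primary[len(self.secondaries[secondary]):]
        self.primary = list(self.secondaries[secondary])
        return lost


rs = ToyReplicaSet(secondaries=2)
first = rs.write({"order": 1})
second = rs.write({"order": 2})
rs.replicate(0, upto=1)  # only the first write reaches secondary 0
print(rs.acknowledged(first, w=2))   # the first write is on two servers
print(rs.acknowledged(second, w=2))  # the second write is still only on the primary
print(rs.fail_over(0))               # the unreplicated write is what gets dropped
```

A client that waits for acknowledgement from two servers before treating the second write as committed would not be surprised by its loss on failover.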

MongoDB supports sharding of data among multiple machines in an order-preserving manner. For example, if we shard a collection of users by their state of residence, any operation specifying the state will be routed only to those nodes containing that state. MongoDB’s auto-sharding scaling model is similar to that of Google’s BigTable.
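The routing idea can be sketched with a sorted list of split points. This is a toy illustration in plain Python, not how MongoDB stores or balances its ranges: the shard names, the split points and the function names are all made up for the example.

```python
import bisect

# Toy order-preserving router; shard names and split points are made up for illustration.
SPLIT_POINTS = ["H", "P"]  # keys < "H" | "H" <= keys < "P" | keys >= "P"
SHARDS = ["shard0", "shard1", "shard2"]


def shard_for(state):
    """Route an exact shard-key value to the single shard that owns its range."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, state)]


def shards_for_range(low, high):
    """Route the key range [low, high] to the contiguous run of shards covering it."""
    first = bisect.bisect_right(SPLIT_POINTS, low)
    last = bisect.bisect_right(SPLIT_POINTS, high)
    return SHARDS[first:last + 1]


print(shard_for("California"))
print(shard_for("Texas"))
print(shards_for_range("Alabama", "Kansas"))
```

Because the key order is preserved, a query on one state goes to one shard, and a query on a range of states goes only to neighbouring shards rather than to every node.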

Sharding occurs on a per-collection basis, not on the database as a whole. Within a database, the smaller collections can stay on a single server, and only the collection(s) with huge size and throughput demands are sharded.

MongoDB scales horizontally via an auto-sharding architecture that automatically manages failover and the balancing of nodes.  Applications connect to the sharded cluster through a mongos process, which can be treated as a database router: it makes the cluster appear as a single database to the application and routes operations to the appropriate shard(s).

The current version of Mongo supports only very basic security. One authenticates with a username and password in the context of a particular database.  Once authenticated, a normal user has full read and write access to the database, while a read-only user has only read access.  The admin database is special: authenticating on admin gives one read and write access to all databases on the server. It is therefore preferable to run the Mongo database in a trusted environment.

Other Points on MongoDB:

  • 10gen began developing MongoDB in Oct 2007, and the first public release was in Feb 2009.
  • The name Mongo is derived from “humongous”, which means extremely large.
  • MongoHQ is a cloud-based hosted database solution for MongoDB.
  • Several GUIs – Fang of Mongo, Mongo3, MongoHub, RockMongo, Opricot, MongoVUE, Futon4Mongo – have been created to help visualize MongoDB data.
  • Prominent users of MongoDB include Business Insider, Carbon Calculated, Etsy, EventBrite, foursquare, LHC, New York Times, Shutterfly and SourceForge.

Typical use cases for MongoDB include log centralization, applications requiring full-text search, content management, document management (or electronic record keeping) and real-time analytics.

Being simple to learn and use, available in the cloud, and free and open source, MongoDB is an ideal choice for RDBMS professionals who want to cross over and try out NoSQL.
