MongoDB, the NoSQL favourite – Part 2
In this post, I shall cover some more interesting features of MongoDB.
BSON objects in MongoDB are limited to 4MB in size. MongoDB is also meant for storing larger files and uses the GridFS storage specification to achieve this transparently. GridFS splits a large object into small chunks (usually 256K) and stores each chunk as a separate document in a chunks collection. Metadata about the file (filename, content type and any other information) is stored as a document in a files collection. Continued »
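The chunking described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a driver implementation: the function names are hypothetical, the documents are plain dictionaries rather than real collections, and the field names (files_id, n, data, length, chunkSize, contentType) reflect my reading of the GridFS specification and should be treated as assumptions.

```python
CHUNK_SIZE = 256 * 1024  # the "usually 256K" chunk size quoted in the text


def split_into_gridfs_docs(file_id, filename, content_type, data, chunk_size=CHUNK_SIZE):
    """Return (files_doc, chunk_docs) describing `data` in a GridFS-like layout.

    Hypothetical helper: plain dicts stand in for documents in the
    files and chunks collections.
    """
    # One document per chunk, numbered by position so the file can be rebuilt.
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    # One metadata document per file.
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "contentType": content_type,
        "length": len(data),
        "chunkSize": chunk_size,
    }
    return files_doc, chunk_docs


def reassemble(chunk_docs):
    """Rebuild the original bytes by concatenating the chunks in order of 'n'."""
    ordered = sorted(chunk_docs, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)
```

Because every chunk stays well below the document size limit, the file as a whole can be far larger than a single BSON object.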
MongoDB, the NoSQL favourite – Part 1
MongoDB seems to be the favourite choice when NoSQL is considered – mainly because it is simple and closer to object-oriented concepts and relational database usage. MongoDB aims to bridge the gap between key-value stores (which are fast and highly scalable) and traditional RDBMSs (which provide rich queries and deep functionality).
MongoDB is an open source, scalable, high-performance, schema-free, document-oriented database written in the C++ programming language. In MongoDB, a database consists of one or more collections, the documents in those collections, and an optional set of security credentials for controlling access. MongoDB uses type-rich BSON as the data storage and network transfer format for “documents”. In addition to the basic JSON types of string, integer, boolean, double, null, array and object, BSON types include date, object id, binary data, regular expression and code. Continued »
Neo4j, the Graph Database – for high performance traversals
There are currently many areas – the Semantic Web movement (of W3C), content management, bioinformatics, artificial intelligence, social networks, business intelligence etc. – where data is naturally ordered in networks. (Networks are very efficient data storage structures, as seen in the human brain and the World Wide Web.) With this in mind, a team set out to create a transactional persistence engine with high performance, scalability and robustness but without the disadvantages of the relational model. The result is Neo4j, which provides:
- An intuitive graph-oriented model for data representation. The programmer can use an object-oriented, flexible graph network consisting of nodes, relationships and properties, called a node space.
- A disk-based, native storage manager optimized for storing graph structures.
- A powerful traversal framework for high-speed traversals in the node space.
- A simple object-oriented API Continued »
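To make the node space idea concrete, here is a minimal Python sketch of nodes, typed relationships and properties, with a breadth-first traversal over one relationship type. It is not Neo4j's API (which is Java based); the class names, the traverse function and the "KNOWS" relationship type used in the example are all assumptions made for illustration.

```python
from collections import deque


class Node:
    """A node carrying arbitrary properties and its outgoing relationships."""

    def __init__(self, **properties):
        self.properties = properties
        self.relationships = []


class Relationship:
    """A typed, directed relationship between two nodes, with its own properties."""

    def __init__(self, start, end, rel_type, **properties):
        self.start = start
        self.end = end
        self.type = rel_type
        self.properties = properties
        start.relationships.append(self)


def traverse(start, rel_type, max_depth):
    """Breadth-first traversal along outgoing relationships of one type.

    Returns the nodes reachable within max_depth hops, nearest first.
    """
    seen = {id(start)}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for rel in node.relationships:
            if rel.type == rel_type and id(rel.end) not in seen:
                seen.add(id(rel.end))
                found.append(rel.end)
                queue.append((rel.end, depth + 1))
    return found


# Example node space: Alice knows Bob, Bob knows Carol.
alice, bob, carol = Node(name="Alice"), Node(name="Bob"), Node(name="Carol")
Relationship(alice, bob, "KNOWS", since=2008)
Relationship(bob, carol, "KNOWS")
friends_of_friends = traverse(alice, "KNOWS", max_depth=2)
```

The traversal only follows references held by each node, which is the property a native graph store exploits to avoid join-style lookups.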
DB2 version 10 provides row versioning and time travelling
DB2 version 10 for z/OS, recently released, provides features for row versioning and time travel querying. With audit and compliance requirements demanding that multiple versions of the data covering a long period be stored and that the changes made be trackable, we can almost say that it was high time DB2 came up with these features.
What we used to achieve using triggers, stored procedures and complex querying is now achievable using simple SQL. I believe it is no overstatement when one of the insurance customers – who was involved in the beta program – says that over 80% of the applications requiring temporal features can exploit this, saving time and making applications easier for business users to understand. Continued »
Varnish, the Open Source Web Accelerator
Varnish is an open source web accelerator created by Varnish Software (Norway) and first released in 2006. Varnish accelerates a website by sitting in front of the web server and caching its content. Unlike Squid (a caching proxy server that could be turned into a fairly functional HTTP accelerator), Varnish is designed from the ground up to be an HTTP accelerator, a caching reverse proxy.
Varnish keeps the most recently requested pages in memory and serves them from memory, producing high performance responses with reduced latency. Instead of logging directly to log files, Varnish logs everything to a segment in shared memory, which means it doesn’t get slowed down waiting for disks. Logging to disk can then be done by a separate program.
Varnish can be used for applications not designed to work with a cache, and in such cases it behaves very conservatively. Continued »
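The idea of keeping the most recently requested pages in memory can be sketched as a small cache sitting in front of a backend. This is a simplified illustration, not how Varnish itself is implemented: the PageCache class, the fetch_from_backend callback and the max_pages limit are hypothetical, and the eviction policy (drop the least recently requested page) is my assumption.

```python
from collections import OrderedDict


class PageCache:
    """In-memory page cache placed in front of a (hypothetical) backend fetch function."""

    def __init__(self, fetch_from_backend, max_pages):
        self._fetch = fetch_from_backend
        self._max_pages = max_pages
        self._pages = OrderedDict()  # url -> body, oldest request first

    def get(self, url):
        if url in self._pages:
            # Cache hit: mark the page as most recently requested.
            self._pages.move_to_end(url)
            return self._pages[url]
        # Cache miss: ask the web server and remember the answer.
        body = self._fetch(url)
        self._pages[url] = body
        if len(self._pages) > self._max_pages:
            # Out of room: drop the least recently requested page.
            self._pages.popitem(last=False)
        return body
```

Repeated requests for a popular page never reach the web server while the page stays in the cache, which is where the reduced latency comes from.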
MapReduce replacing complex SQL queries
Most NoSQL systems – like CouchDB, MongoDB, Hadoop, Redis – support MapReduce, the programming paradigm for parallel computing pioneered (and also patented!) by Google. Support for MapReduce in the traditional database products is increasing every day. MapReduce is fast turning out to be the common factor between the traditional database products and the new NoSQL movement.
The possibility of applying the MapReduce model for large scale, fault tolerant computations in suitable applications in the enterprise context is being explored with keen interest. Hadoop is an open source implementation of the MapReduce model and is available on pre-packaged AMIs in the Amazon EC2 cloud platform.
Google points out that MapReduce is a powerful tool that can be applied for a variety of purposes including distributed grep, distributed sort, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning and statistical machine translation. A much longer list of MapReduce applications is available at http://www.dbms2.com/2008/08/26/known-applications-of-mapreduce/.
Traditional databases are providing MapReduce support in addition to the standard SQL interface. While the DBAs are expected to continue using SQL, more and more developers are using MapReduce instead of complex SQL queries. Continued »
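To show what the model looks like in practice, here is a single-process Python sketch of inverted index construction, one of the applications listed above. It only mimics the three stages (map, group by key, reduce) in memory; a real framework would run the map and reduce calls in parallel across machines. All function names here are hypothetical.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (word, doc_id) pair per word occurrence."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, doc_ids):
    """Reduce: collapse the values for one word into a sorted list of unique ids."""
    return word, sorted(set(doc_ids))


def build_inverted_index(documents):
    """Run map, shuffle and reduce over a {doc_id: text} dictionary."""
    pairs = (
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())
```

The equivalent in SQL would need a table of tokens plus a GROUP BY; here the developer only writes the two small functions and leaves the grouping to the framework.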
Understanding SRB, the possible key to zPrime – better known for the Neon vs. IBM lawsuits
IBM mainframe specialty processors – zAAP and zIIP – cost significantly less than general purpose processors, and they “do not count” in software pricing calculations and ISV licensing costs. Third party tools are needed to make the best use of this, as the system has to be tuned to use the right processors for the right load.
One such tool, Neon zPrime, has resulted in lawsuits and has raised a whole stream of questions about the future of these specialty processors. More details on the lawsuits between Neon and IBM can be found at http://itknowledgeexchange.techtarget.com/mainframe-blog/neon-ibm-wrangle-over-trial-timeline-for-zprime-lawsuit/.
zPrime “facilitates” (and doesn’t guarantee) the movement of workloads from a general purpose processor (CP) to a specialty processor (zIIP or zAAP), potentially saving millions of dollars in software and hardware costs. The idea is to fully leverage the capacity of the specialty processors with significant loads moved away from the CPs.
What has made the issue interesting is that Neon says that zPrime is a proprietary software solution governed by trade secrets and that the details will not be publicly disclosed. Neon also claims that zPrime uses publicly documented exits and doesn’t modify IBM licensed internal code, z/OS or other IBM software, and that it has no hooks into the z/OS dispatcher, SMF, RLM or WLM either.
Though how zPrime works is not disclosed by Neon (and apparently not yet figured out by IBM, as the issue is still open), some of the following points hint at SRB being the key: Continued »
Using Memcached for speeding up web sites and reducing database load
Memcached is an open source, high-performance, distributed memory object caching system (not a database like MemcacheDB) intended for speeding up dynamic web applications by alleviating database load. It caches data and objects in RAM to reduce the number of times an external data source (a database or API) must be accessed.
The key components are:
- Client software, which is given a list of available Memcached servers.
- A client-based hashing algorithm, which chooses a server based on the “key” input.
- Server software, which stores the values with their keys into an internal hash table.
- Server algorithms, which determine when to throw out old data (if out of memory), or reuse memory. Continued »
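The first three components can be sketched in Python as follows. This is an illustration under stated assumptions rather than a real Memcached client: CacheServer and CacheClient are hypothetical classes, the servers are plain in-process dictionaries, the hash is a simple SHA-1 modulo the server count, and the server-side eviction algorithms are left out entirely.

```python
import hashlib


class CacheServer:
    """Stands in for one Memcached server: values stored by key in a hash table."""

    def __init__(self, name):
        self.name = name
        self.table = {}


class CacheClient:
    """Client that is given the server list and picks a server from the key."""

    def __init__(self, servers):
        self.servers = servers

    def _server_for(self, key):
        # Client-based hashing: the same key always maps to the same server.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.servers)
        return self.servers[index]

    def set(self, key, value):
        self._server_for(key).table[key] = value

    def get(self, key):
        return self._server_for(key).table.get(key)


def load_user(client, user_id, query_database):
    """Try the cache first; fall back to the database and cache the result."""
    key = "user:%d" % user_id
    user = client.get(key)
    if user is None:
        user = query_database(user_id)  # hypothetical database lookup
        client.set(key, user)
    return user
```

Note that the servers never talk to each other in this arrangement; the distribution lives entirely in the client's choice of server. A plain modulo hash reshuffles most keys when a server is added or removed, which is a known limitation of this simple scheme.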
Jena for building Semantic Web Applications
Jena, a Java RDF API and toolkit, is an open source Java framework for building Semantic Web applications. The Semantic Web framework, developed by W3C, is described at http://itknowledgeexchange.techtarget.com/enterprise-IT-tech-trends/w3c-semantic-web-framework/.
In Jena, the subject of a statement is always a Resource, the predicate is represented by a Property, and the object is either another Resource or a literal value. The different relationship types can be described using the properties siblingOf, spouseOf, parentOf, and childOf, taken from the “Relationship” vocabulary. Continued »
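The subject, predicate and object structure can be illustrated with a small Python sketch. This is not Jena's API (Jena is a Java library); the Resource, Statement and Model names below only mirror the concepts in the text, the example.org URIs are made up, and the namespace used for the “Relationship” vocabulary is my assumption.

```python
from collections import namedtuple

# Assumed namespace for the "Relationship" vocabulary.
REL = "http://purl.org/vocab/relationship/"

Resource = namedtuple("Resource", ["uri"])
Statement = namedtuple("Statement", ["subject", "predicate", "object"])

# Properties are identified by URIs, just like other resources.
SPOUSE_OF = Resource(REL + "spouseOf")
PARENT_OF = Resource(REL + "parentOf")


class Model:
    """A bag of statements; the object is a Resource or a plain literal value."""

    def __init__(self):
        self.statements = []

    def add(self, subject, predicate, obj):
        self.statements.append(Statement(subject, predicate, obj))

    def objects_of(self, subject, predicate):
        """Return the objects of all statements matching subject and predicate."""
        return [
            s.object
            for s in self.statements
            if s.subject == subject and s.predicate == predicate
        ]


# Hypothetical family data.
john = Resource("http://example.org/people/john")
mary = Resource("http://example.org/people/mary")
anna = Resource("http://example.org/people/anna")

family = Model()
family.add(john, SPOUSE_OF, mary)
family.add(john, PARENT_OF, anna)
family.add(mary, PARENT_OF, anna)
```

A query such as family.objects_of(john, PARENT_OF) then walks the statements and returns the matching objects.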
W3C Semantic Web Framework
The Semantic Web – a collaborative effort led by W3C – provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is also about a language for recording how the data relates to real-world objects.
The Semantic Web involves the following technologies:
- A global naming scheme (Uniform Resource Identifiers or URIs)
- A standard syntax for describing data (Resource Description Framework or RDF)
- A standard means of describing the properties of that data (RDF Schema or RDF-S)
- A standard means of describing relationships between data items (Ontology – OWL) Continued »