By James Kobielus (@jameskobielus)
Big data is intimately tied to data storage. No single storage technology suffices for all the data types within most enterprise big-data infrastructures. Each data type, ranging from structured to unstructured and from at-rest to in-motion, has distinct storage, compression, and retrieval requirements. Structured data in batch environments has traditionally been stored on hard disk drives (HDD). Workloads with more real-time requirements might use solid state drives (SSD), especially flash storage, and cache memory.
Big data thrives on “fit-for-purpose” storage, deployed differently across functionally differentiated tiers. In a multi-tier big data architecture, you should mix and match storage technologies. Put SSD in front-end query/access nodes for high performance, and high-capacity HDD in hub and staging nodes. SSD is best for real-time, interactive, fast query and exploration, whereas rotating disk is best for lower-speed, batch I/O. The newer in-memory platforms are also essential for real-time decision-support applications.
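The tier-to-storage pairing above can be sketched as a small lookup. This is only an illustration of the rule of thumb, not a sizing tool: the tier names, the `Storage` enum, and the `pick_storage` function are hypothetical names made up for this example.

```python
from enum import Enum


class Storage(Enum):
    IN_MEMORY = "in-memory"
    SSD = "ssd"
    HDD = "hdd"


# Rule of thumb from the paragraph above: SSD on front-end query/access
# nodes, high-capacity HDD on hub and staging nodes.
# The tier names are hypothetical labels chosen for this sketch.
TIER_STORAGE = {
    "query": Storage.SSD,
    "hub": Storage.HDD,
    "staging": Storage.HDD,
}


def pick_storage(tier: str, real_time_decision_support: bool = False) -> Storage:
    """Return the storage class suggested for a node tier.

    Real-time decision-support workloads are routed to in-memory
    platforms regardless of tier (an assumption of this sketch).
    """
    if real_time_decision_support:
        return Storage.IN_MEMORY
    if tier not in TIER_STORAGE:
        raise ValueError(f"unknown tier: {tier!r}")
    return TIER_STORAGE[tier]
```

Calling `pick_storage("query")` returns `Storage.SSD`, while `pick_storage("staging")` returns `Storage.HDD`. A real deployment would weigh capacity, cost, and latency targets rather than a fixed table.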
The optimal mix of HDD vs. SSD/in-memory storage for big data is rapidly tipping toward the latter. SSD is moving into big-data environments quickly, pushing HDDs and other traditional rotating media ever further to the periphery. Likewise, in-memory platforms for analytic and transaction computing are becoming the principal approach for business intelligence and data science applications, which thrive on low-latency, speed-of-thought architectures.
The new era of all-SSD big-data environments is fast approaching. SSD is proving to be a more cost-effective approach over the data management lifecycle. In terms of acquisition cost on a per-TB basis, SSD vs. HDD, I predict the tipping point toward SSD will be in 2015. To support my prediction, I call your attention to the following recent article: SSD Flash Storage At Tipping Point: IBM.
Another corroborating article provides strong evidence that flash storage is reaching the tipping point against HDDs. Chief among this evidence: “NAND flash drives are proving themselves to be both performance- and endurance-worthy in production situations–making them a better buy over time than mechanical hard drives.”
As HDD technology rapidly tips toward obsolescence, enterprise big-data platform managers will need road maps for migrating their data to the all-solid-state and all-in-memory platforms that replace it. By the end of this decade, these newer, faster, more cost-effective technologies will have almost entirely pushed HDD-based platforms into the computer history museums of the world.