Algorithm image via FreeImages
By James Kobielus (@jameskobielus)
Deep learning delivers extraordinary cognitive powers in the never-ending battle to distill sense from data at ever larger scales. But high performance doesn’t come cheap.
Deep learning relies on the application of multilevel neural-network algorithms to high-dimensional data objects. As such, it requires fast matrix manipulations in highly parallel architectures to identify complex, elusive patterns (objects, faces, voices, threats, and so on) amid big data's "3 V" noise. As evidence for the technology's increasingly superhuman cognitive abilities, check out research projects such as this that use it to put the Turing test to shame.
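Computationally, the multilevel pattern-matching described above boils down to chained matrix multiplications with nonlinearities in between. Here's a minimal sketch in Python with NumPy; the layer sizes, random weights, and ReLU nonlinearity are illustrative assumptions, not details from any system discussed in this article:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Elementwise nonlinearity applied between layers
    return np.maximum(x, 0.0)

# A batch of 64 high-dimensional inputs (e.g., flattened image patches)
batch = rng.standard_normal((64, 4096))

# Two illustrative dense layers: each is just a matrix multiply
w1 = rng.standard_normal((4096, 1024)) * 0.01
w2 = rng.standard_normal((1024, 10)) * 0.01

hidden = relu(batch @ w1)   # (64, 4096) @ (4096, 1024) -> (64, 1024)
scores = hidden @ w2        # (64, 1024) @ (1024, 10)   -> (64, 10)
print(scores.shape)         # (64, 10)
```

Real networks stack many more such layers, but the workload stays the same: large dense matrix products, which is exactly what parallel hardware accelerates.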
Extremely high-dimensional data is the bane of deep learning from a performance standpoint. That's because crunching through high-dimensional data is an exceptionally resource-intensive task, often consuming all of the processor, memory, disk, and I/O capacity thrown at it. Examples of the sorts of high-dimensional objects against which deep learning algorithms are usually applied include streaming media, photographic images, aggregated environmental feeds, rich behavioral data, and geospatial intelligence.
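A back-of-the-envelope calculation shows why dimensionality drives resource consumption: a single dense layer stores one weight per input-output pair, so its memory footprint grows with the product of the two dimensions. The figures below are illustrative arithmetic, not benchmarks of any particular system:

```python
# Rough memory cost of one dense layer as input dimensionality grows.
BYTES_PER_FLOAT32 = 4

def dense_layer_bytes(input_dim, output_dim):
    # A fully connected layer stores input_dim * output_dim weights
    return input_dim * output_dim * BYTES_PER_FLOAT32

for input_dim in (1_000, 100_000, 10_000_000):
    gib = dense_layer_bytes(input_dim, 1024) / 2**30
    print(f"{input_dim:>10} inputs -> {gib:8.2f} GiB of weights")
```

At ten million input dimensions, a single 1,024-unit layer already needs tens of gigabytes just for its weights, before any activations, gradients, or additional layers are counted.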
In data scientists’ attempts to algorithmically replicate the unfathomable intricacies of the mind, they must necessarily leverage the fastest chips, the largest clusters, and the most capacious interconnect bandwidth available to drive increasingly sophisticated deep learning algorithms. All of that assumes, of course, that these high-performance cluster-computing services are within their budgets.
What's the optimal hardware substrate for deep learning? For high-dimensional deep learning to become more practical and pervasive, the underlying pattern-crunching hardware needs to become faster, cheaper, more scalable, and more versatile. It also needs to handle data sets that will continue to grow in dimensionality as new sources are added, merged with other data, and analyzed by deep learning algorithms of greater sophistication. And the hardware, from the chipsets and servers to the massively parallel clusters and distributed clouds, will need to keep crunching through higher-dimensionality data sets that also scale inexorably in volume, velocity, and variety.
Increasingly, many industry observers are touting graphics processing units (GPUs) as the ideal chipsets for deep learning. As discussed in this 2015 Wired article and this recent Data Science Central post, GPUs, which were originally developed for video games and have high-performance math-processing features, may be far less hardware-intensive and less costly than general-purpose CPUs.
The Wired article mentions a Stanford researcher who used GPUs to “string together a three-computer system that could do the work of Google’s 1,000-computer cloud.” The article is also quick to point out that GPUs are pulling their deep-learning weight in production commercial and government applications, including as a complement to supercomputing resources at national laboratories. And it notes that some of the more intensive deep-learning algorithms are using GPUs to crunch through many billions of dimensions. The Data Science Central article, from a GPU hardware vendor, says that GPU technology is getting “smarter at a pace way faster than Moore’s Law,” though it offers none of the price-performance trend data needed to bolster that claim.
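The reason matrix workloads map so well onto GPUs is that every cell of a matrix product is an independent dot product, so thousands of GPU cores can each compute one output cell concurrently with no coordination. The sketch below demonstrates that decomposition on the CPU with NumPy; the actual parallel dispatch to GPU threads is handled by frameworks and drivers and isn't shown here:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((8, 16))
b = rng.standard_normal((16, 4))

# Reference result: one fused matrix multiply
fused = a @ b

# The same result computed as 8 * 4 = 32 independent dot products.
# Each writes a disjoint output cell, so a GPU can assign every
# (i, j) pair to its own thread.
independent = np.empty((8, 4))
for i in range(8):
    for j in range(4):
        independent[i, j] = np.dot(a[i, :], b[:, j])

print(np.allclose(fused, independent))  # True
```

Scale those 32 independent cells up to the millions found in a real network's layers, and the appeal of a chip with thousands of simple arithmetic cores becomes obvious.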
All of this raises the question of whether general-purpose CPUs have a future in high-performance deep learning. Some argue that general-purpose CPUs might continue to add value, either stand-alone or as a complement to GPUs, as long as they keep improving in performance and are optimized for high-performance, massively parallel clusters built on low-cost commodity hardware. Users such as Facebook are relying on GPU-based infrastructure to train their deep learning models, while also exploring new multi-core CPU chips that may approach the performance of GPUs in the near future.
A chipset-agnostic hybrid deep-learning hardware environment such as this may be the best approach, considering the vast range of specialized deep-learning applications and the likelihood that various hardware substrates will be optimized for diverse types of algorithmic analysis. In such a scenario, special-purpose "neural" chips, such as IBM SyNAPSE, may be incorporated for tasks for which neither GPUs nor CPUs are optimal. Field-programmable gate arrays (FPGAs) are also a credible option for deep learning.
Let's leave quantum computing fabrics out of the discussion until they emerge from the laboratory ready for robust commercial deep-learning implementations. Deep learning needs serious acceleration in the here and now and shouldn't pin its outsize performance requirements on unproven architectures that still have one foot in the lab.