By James Kobielus (@jameskobielus)
Logs of all sorts (web logs, application logs, database logs, system logs, and so on) are fundamental to the promise of the Internet of Things (IoT).
Without continuous logging of relevant events, the IoT can’t fulfill its core role as the real-time event-notification bus of the online world. Machine-readable event logging is fundamental to all the core applications of IoT, including real-time sensor grids, remote telemetry, self-healing network computing, medical monitoring, traffic management, emergency response, and security incident and event monitoring. Ubiquitous IoT will depend on the ability to support continuous real-time ingest, analysis, correlation, handling, and any-to-any routing of machine-generated information.
IoT’s development depends on implementation of a ubiquitous, general-purpose event-logging infrastructure. This global logging infrastructure must support disparate relational and nonrelational logged data types; execute advanced analytics against myriad logged data objects; work in both batch and streaming environments; and scale to handle growing volumes of in-flight log data replication without choking or slowing down. Individual event logs need not be petascale; in fact, most IoT devices will keep local logs that are limited by the devices’ increasingly tight storage constraints and disparate form factors.
I recently came across a great article on the untapped potential for general-purpose logging infrastructure in the IoT age. Though the author, LinkedIn software engineer Jay Kreps, doesn’t specifically connect his discussion to IoT, the affinity is obvious. The two trends that he highlights as increasing the need for distributed data logging, the “event data firehose” and the “explosion of specialized data systems,” are at the very heart of the IoT revolution.
Kreps lays out a real-time pub-sub architecture reminiscent of the time-honored concept of an “enterprise service bus” (ESB). “[M]any of the things we were building,” he says, “had a very simple concept at their heart: the log. Sometimes called write-ahead logs or commit logs or transaction logs, logs have been around almost as long as computers and are at the heart of many distributed data systems and real-time application architectures.”
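To make the idea concrete, here is a minimal Python sketch of a log used as a pub-sub backbone. It is my own illustration, not code from Kreps’s article: the names AppendOnlyLog and Subscriber, and the sample sensor records, are hypothetical, and a real system would add persistence, partitioning, and networking.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class AppendOnlyLog:
    """Hypothetical in-memory log: records are appended, never modified."""
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Add a record to the end and return its offset (its position)."""
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset: int) -> List[Any]:
        """Return every record at or after the given offset."""
        return self.records[offset:]


@dataclass
class Subscriber:
    """Hypothetical consumer that remembers how far it has read."""
    name: str
    offset: int = 0

    def poll(self, log: AppendOnlyLog) -> List[Any]:
        """Fetch unread records and advance this subscriber's offset."""
        new_records = log.read_from(self.offset)
        self.offset += len(new_records)
        return new_records


log = AppendOnlyLog()
log.append({"sensor": "thermostat-1", "temp_c": 21.5})
log.append({"sensor": "thermostat-1", "temp_c": 22.0})

dashboard = Subscriber("dashboard")
archiver = Subscriber("archiver")

first_batch = dashboard.poll(log)    # dashboard reads both records
log.append({"sensor": "door-7", "open": True})
second_batch = dashboard.poll(log)   # dashboard sees only the new record
archive_batch = archiver.poll(log)   # archiver starts late, sees all three
```

The point of the sketch is that publishers and subscribers never talk to each other directly. Each subscriber tracks only its own offset, so a slow or late consumer can catch up without affecting anyone else.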
To the extent that we intend for IoT to evolve into an ESB-like infrastructure for big data applications, we must grapple with the central role of distributed logs and with the protocols that support distributed log-data consistency, replication, and concurrency.
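One way to see why ordered logs matter for consistency and replication: if every replica applies the same entries in the same order, every replica ends up in the same state. The Python sketch below illustrates only that principle. It is a simplification I am assuming for illustration (the apply_log function and the thermostat readings are hypothetical), not a real replication protocol, and it ignores failures and concurrent writers.

```python
from typing import Dict, List, Tuple

# Each hypothetical log entry is (device_id, reading): "set the latest reading".
Entry = Tuple[str, float]


def apply_log(entries: List[Entry]) -> Dict[str, float]:
    """Build a replica's state by applying entries strictly in log order."""
    state: Dict[str, float] = {}
    for device_id, reading in entries:
        state[device_id] = reading  # later entries overwrite earlier ones
    return state


shared_log: List[Entry] = [
    ("thermostat-1", 21.5),
    ("thermostat-2", 19.0),
    ("thermostat-1", 22.0),
]

replica_a = apply_log(shared_log)
replica_b = apply_log(shared_log)             # same log, same order
lagging_replica = apply_log(shared_log[:2])   # has not seen the last entry yet
caught_up = apply_log(shared_log[:2] + shared_log[2:])  # replays the remainder
```

Here replica_a and replica_b are identical, the lagging replica still holds the older thermostat-1 reading, and it converges once it has replayed the rest of the log.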
It seems to me that this loosely coupled, ESB-like approach to data integration is the best foundation for truly flexible, increasingly heterogeneous big data, IoT, and cloud infrastructures. The log will be the common-denominator abstraction for data storage and integration.