The trend that sees SQL query engines appearing on Hadoop is just the start of a movement; SQL query engines running on data stored outside HDFS may follow. If these trends portend fitful change for users, they also affect vendors.
One vendor’s journey here is particularly telling. Starburst Data might be called a ‘re-start-up.’ The company was the brainchild of a group of young data technicians, among them Daniel Abadi, an academic researcher who helped advance the notion of column-store parallel databases in the early 2000s. In 2011, he helped form Hadapt, one of the first SQL-on-Hadoop providers.
In 2014, Hadapt was purchased by Teradata. The timing proved a bit odd, as it nearly coincided with Facebook ceding much development responsibility to Teradata for Presto, a SQL-on-Hadoop tool that the social media giant had forged in-house, and which has subsequently been endorsed by no less than Amazon for its Athena SQL engine. The former Hadapt group within Teradata shifted its efforts to improving the performance of a Presto-compatible SQL query engine.
At the end of 2017, the Hadapt principals within Teradata spun out to form Starburst, with Teradata’s blessing. One Starburst goal is to bring SQL engine prowess to SMBs that are still outliers in Teradata’s more familiar big-player universe. An early effort for standalone Starburst has been a Cost Based Optimizer for Presto, built in collaboration with Facebook technicians. For the many lovers of SQL joins, the new optimizer supports Join Reordering and Join Distribution Choice.
The picture emerging shows differences in use cases between plain vanilla Hadoop and SQL on Hadoop: the difference is between Hadoop being fit for the purposes of small data science groups and skunk works, and Hadoop being useful for the interactive needs of wider groups of SQL business analytics users. We are also seeing HDFS, the file system at the base of Hadoop, giving way as more people choose to pursue these types of applications in the cloud rather than on premises.
Listen to the latest Talking Data podcast, which features Starburst Data CEO Justin Borgman. We left a noisy restaurant to record the interview, and found a noisy Boston waterfront, with massively loud construction in full throat. Enjoy! – Jack Vaughan