By James Kobielus (@jameskobielus)
Data science can function as a sustainable business resource only if it’s managed professionally. Regardless of how your enterprise chooses to organize its data science processes, you need professional management.
One approach for professionalizing your data scientists is to establish internal centers of excellence. Where they exist, centers of excellence are usually the pet projects of one or more practicing data scientists who seek to bring greater consistency and repeatability to their teams’ practices and procedures.
Of course, professionalism is a two-edged sword. Data scientists are often proud, stubborn, and fiercely independent, a fact that might make them resistant to innovations in how you organize their work. To the extent that data scientists are given free rein to do what they wish and ad-hoc methods prevail, it may be difficult to establish structured, transparent practices within their teams.
As you introduce industrial-grade automation into your data science practice, excessive professionalism may cause sparks to fly. Data scientists may regard the new automation tools as an irritation, an affront to their professional judgment, or even (worst case) an existential threat. The tensions are likely to grow as automation pushes deeper into the machine learning development pipeline, per my recent discussion here.
However, there’s no turning back to manual methods. Automating the data science development pipeline is the key to operating at enterprise scale. Data scientists will be swamped with unmanageable workloads if they don’t begin to offload many formerly manual tasks to automated tooling. Automation can also help control the cost of developing, scoring, validating, and deploying a growing scale and variety of models against ever-expanding big-data collections.
To work in today’s business world, a data scientist must become more like an industrial engineer. In other words, their professional pride must shift toward a 24×7 regimen of building, training, deploying, productionizing, and managing a steady stream of data-driven models. If anything, they will need to master the new generation of data science development tools that:
- Automatically generate customized REST APIs and Docker images around machine-learning models during the promotion and deployment stages;
- Automatically deploy models for execution into private, public, or hybrid multi-cloud platforms;
- Automatically scale models’ runtime resource consumption up or down based on changing application requirements;
- Automatically retrain models using fresh data prior to redeploying them;
- Automatically keep track of which model version is currently deployed; and
- Automatically ensure that a sufficiently predictive model is always in live production status.
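The promotion logic behind the last two bullets, tracking which version is deployed and guaranteeing that only a sufficiently predictive model goes live, can be sketched in a few lines. Everything here is illustrative: the `ModelRegistry` class, its `min_score` threshold, and the toy least-squares "model" are assumptions for the sketch, not any particular vendor's API.

```python
import statistics

class ModelRegistry:
    """Toy registry: tracks trained versions and the currently deployed one."""

    def __init__(self, min_score):
        self.min_score = min_score  # floor for "sufficiently predictive"
        self.versions = []          # list of (model, holdout_score) tuples
        self.deployed = None        # index of the live version, if any

    def train(self, xs, ys):
        """Fit a one-variable least-squares line on fresh data."""
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                 / sum((x - mx) ** 2 for x in xs))
        return (slope, my - slope * mx)  # (slope, intercept)

    def score(self, model, xs, ys):
        """R-squared on a holdout set."""
        slope, intercept = model
        my = statistics.fmean(ys)
        ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
        ss_tot = sum((y - my) ** 2 for y in ys)
        return 1 - ss_res / ss_tot

    def promote(self, model, holdout_x, holdout_y):
        """Deploy only if the candidate clears the floor AND beats the champion."""
        s = self.score(model, holdout_x, holdout_y)
        self.versions.append((model, s))
        champion = (self.versions[self.deployed][1]
                    if self.deployed is not None else float("-inf"))
        if s >= self.min_score and s > champion:
            self.deployed = len(self.versions) - 1
        return self.deployed

# Each retraining cycle appends a version; the deployed pointer moves
# only when a candidate genuinely improves on the live model.
registry = ModelRegistry(min_score=0.5)
candidate = registry.train([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
live = registry.promote(candidate, [1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

The design choice worth noting is that a weak candidate never displaces the champion: the registry records it, but `deployed` stays put, which is what "always in live production status" requires in practice.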
Clearly, these levels of automation will still require expert personnel to set up, monitor, and tweak the repeatable workflows they’re managing. In this new industrial order, the role of the working data scientist will become similar to that of a foreman in a factory that has implemented robotics and computer numerical control.
If you’re a data science, DevOps, or IT operations professional, you almost certainly have practical insights for automating data-driven business processes. I would love to hear your thoughts. Please join me on Wednesday, November 1, 2:00-3:00pm (eastern) for the Wikibon CrowdChat “Automating Data Analytics Management to the Max.” You can participate simply by clicking here, logging in with your Twitter handle, and posting your thoughts in an interactive, moderated Q&A format.