By James Kobielus (@jameskobielus)
Data scientists are not an elite class in our society. The concept of a “Citizen Data Scientist” describes a new generation of largely self-taught statistical explorers. In today’s dynamic free-market economy, they’re emerging to satisfy insatiable demand for their services.
Citizen Data Scientists are challenging the notion that you need some minimal academic qualifications to present yourself, without prevarication, as a competent professional in this discipline. In this economy, anybody can become a data scientist simply by doing the work and consistently producing the intended results.
The rise of the Citizen Data Scientist stems from three principal trends:
- Subject matter experts are shifting their focus toward data science. Increasingly, analysts of all sorts are acquiring data science skills and learning the tools of the trade in order to kickstart their careers in an exciting and potentially lucrative new direction. Mid-career professionals are leveraging the wealth of online tools, education, and community resources to master predictive modeling, machine learning, data engineering, and other key data-science practices. Many of the new data scientists are availing themselves of the ample free online resources to bootstrap themselves into this “sexy” profession on the cheap.
- Data science initiatives are increasingly open to team members with non-traditional backgrounds. The shortage of skilled, established data scientists relative to the demand for their services is causing analytics leaders to soften their recruitment and hiring criteria. As long as that undersupply persists, autodidacts who can actually deliver the goods will be able to prosper in today’s big-data-besotted economy.
- Data scientists of all skill levels are volunteering their efforts on a growing range of pro bono, humanitarian, and charitable projects. As befits the “citizen” sobriquet, someone who embarks on this career path might typically cut their teeth on such projects, perhaps working closely with established data scientists on sabbatical from their day jobs. Citizen data scientists’ insights, developed in close collaboration with subject-matter experts, can provide the decision support needed by agencies, community groups, and others who are in a position to fix the problems.
Clearly, most of the citizen data scientists who participate in communities such as New York-based DataKind have day jobs to pay the bills. But they see larger humanitarian causes (reuniting refugees, curing infectious diseases, feeding hungry populations, guaranteeing civil rights to the disenfranchised, and so on) that can benefit from data scientists of all sorts, including the self-taught, applying their best efforts and tools to the task.
For-profit organizations everywhere can play a huge role in cultivating the next generation of citizen data scientists. As I discussed here, private-sector companies are engaging in humanitarian data-science initiatives. For example, IBM’s Global Citizenship program enables our employees to volunteer their time and talent anywhere there is a social need. Note that, although IBM encourages employees to volunteer under the program, our personnel and the community participants among whom they volunteer know that they are sharing personal time and are not representing the company in any way. In other words, this is a corporate-citizenship program whose aim is to foster private-citizen volunteerism in data science and other capacities.
Even without taking leave from their day jobs, people can cultivate Citizen Data Scientist skills that they can apply to data science projects in company-sponsored extracurriculars and other settings. Employers can encourage business analysts to acquire data science skills beyond any that they picked up in school.
Company-sponsored data-science centers of excellence are a good way to nurture a new crop of Citizen Data Scientists. An informal center of excellence may be best for attracting people who don’t see themselves becoming heavy-hitting, PhD-caliber data scientists. At the very least, companies should facilitate ongoing communication between knowledge workers and established data scientists. For example, Friday lunch-and-learn sessions might interest analysts who want to immerse themselves in presentations, demonstrations, and discussions by established data scientists.
Whether you choose to hire or retain a data scientist with minimal qualifications or a thin track record is your decision, and the risks are obvious. In business contexts, it might make good sense to keep Citizen Data Scientists on a short leash until they demonstrate a basic level of competence in this function.
In that regard, William Vorhies does a good job discussing these risks in this recent blog. While highlighting the importance of nurturing Citizen Data Scientists in business contexts, he spells out broad recommendations for mitigating the accompanying risks. I’ll paraphrase these risk-mitigation principles as follows:
- Ensure that Citizen Data Scientists apply established methodologies for data sourcing, cleansing, transformation, outlier analysis, and model development.
- Require Citizen Data Scientists to discuss their methodologies and results in visual data-centric narratives.
- For projects that have a potential bottom-line business impact, require Citizen Data Scientists to have their work reviewed by established data scientists.
- For data-driven predictive models and other artifacts developed by Citizen Data Scientists, require that established data scientists certify those assets before they’re deployed into operational systems, business processes, or applications in conjunction with live data sets.
- Ensure that Citizen Data Scientists comply with all relevant data governance, privacy, security, and other procedural controls throughout the lifecycle of their projects
All of that makes exquisite sense. It’s great to have a force multiplier of self-taught, hard-working, creative new contributors for your company’s data-science initiatives. But it would be foolish in the extreme to let them climb their learning curves without constant monitoring and supervision.