Posted by: Michael Tidmarsh
By James Kobielus (@jameskobielus)
Scientific discovery is a shared resource of the human race. Though it’s extraordinarily competitive, science doesn’t truly produce shared new understandings of empirical reality if researchers hold any aspect of their work close to the vest in perpetuity. Science can only advance if researchers share all the data, assumptions, methodologies, hypotheses, findings, and constraints behind their insights. If nothing else, all of this must be on the collective table in order for independent verification, replication, analysis, and review to take place throughout the scientific community.
Big scientific projects that involve many distributed researchers usually produce massive amounts of observational data. They may also require massive amounts of reference data to serve as input, context, and control for their findings. Ideally, all previously gathered, relevant research data should be at the disposal of researchers exploring some new hypothesis, or simply attempting to verify others’ work.
An Internet-centric commitment to radical openness has transformed the global scientific establishment over the past decade. One of the most noteworthy initiatives is the Public Library of Science (www.plos.org), which its website describes as a “nonprofit publisher and advocacy organization with a mission of leading a transformation in scientific and medical research communication. Every article we publish is open-access – freely available online for anyone to use – which benefits everyone, from researchers, educators, and patient advocates to funders, policymakers, and the public.”
Another interesting effort in open science is Corral, an initiative hosted at the Texas Advanced Computing Center (TACC), which maintains around 100 scientific research collections that are open and freely available to researchers everywhere. As described in this recent article (http://ow.ly/pLIdT), Corral provides reference data sets in archaeology, biology, ecology, natural history, political science, space science, zoology, and other fields. Corral supports high-volume storage and open sharing of research data sets and findings in diverse scientific disciplines.
Crowdsourcing of data science, in online marketplaces such as Kaggle and TopCoder, is yet another indicator that the culture of modern science is blossoming through the synergies of openness. Check out my recent blog for a discussion of the layers of openness that you need in your big-data analytics initiatives.
If you wish to accelerate the brilliance of the world community, open access to data science expertise is just as important as open access to data.