Enterprise IT Watch Blog

Nov 8 2013   1:46PM GMT

Big science thrives on open reference data



Posted by: Michael Tidmarsh
Tags: Big Data, Data Science, Data scientists


By James Kobielus (@jameskobielus)

Scientific discovery is a shared resource of the human race. Though science is extraordinarily competitive, it doesn't truly produce shared new understandings of empirical reality if researchers hold any aspect of their work close to the vest in perpetuity. Science can only advance if researchers share all the data, assumptions, methodologies, hypotheses, findings, and constraints behind their insights. If nothing else, all of this must be on the collective table so that independent verification, replication, analysis, and review can take place throughout the scientific community.

Big scientific projects that involve many distributed researchers usually produce massive amounts of observational data. They may also require massive amounts of reference data to serve as input, reference, context, and control for their findings. Ideally, all previously gathered, relevant research data should be at the disposal of researchers exploring a new hypothesis or simply attempting to verify others' work.

An Internet-centric commitment to radical openness has transformed the global scientific establishment over the past decade. One of the most noteworthy initiatives is the Public Library of Science (www.plos.org), which its website describes as a “nonprofit publisher and advocacy organization with a mission of leading a transformation in scientific and medical research communication. Every article we publish is open-access – freely available online for anyone to use – which benefits everyone, from researchers, educators, and patient advocates to funders, policymakers, and the public.”

Another interesting effort in open science is Corral, an initiative hosted at the Texas Advanced Computing Center (TACC), which maintains around 100 scientific research collections that are open and freely available to researchers everywhere. As described in this recent article (ow.ly/pLIdT), Corral provides reference data sets in archaeology, biology, ecology, natural history, political science, space science, zoology, and other fields. Corral supports high-volume storage and open sharing of research data sets and findings in diverse scientific disciplines.

Crowdsourcing of data science, in online marketplaces such as Kaggle and TopCoder, is yet another indicator that the culture of modern science is blossoming through the synergies of openness. Check out my recent blog post for a discussion of the layers of openness that you need in your big-data analytics initiatives.

If you wish to accelerate the brilliance of the world community, open access to data science expertise is just as important as open access to data.
