Data image via FreeImages
By James Kobielus (@jameskobielus)
I’m a pragmatist. I like to think that you are what you do. So if you look, walk, and quack like a data scientist, you’re a data scientist, aren’t you?
This is not a metaphysical inquiry. As we encourage more people to acquire data science tools and skills, what point is there in distinguishing between data scientists and those who, for all intents and purposes, are of the same species, albeit without traditional track records, tools, and certifications?
This question occurred to me as I was reading about a new DARPA program called Data-Driven Discovery of Models (D3M). Its aim is to enable greater automation throughout the data-science lifecycle. The program recognizes that many of the most critical tasks will be performed by people who are new to this field and who may not fit the traditional profile of the professional data scientist. As stated by the agency, the program’s goal is to “develop algorithms and software to help overcome the data-science expertise gap by facilitating non-experts to construct complex empirical models through automation of large parts of the model-creation process.”
What’s exciting about this initiative is that it focuses on the imperative of multiplying the productivity of data-science teams. It seeks innovative approaches that use automated machine-learning algorithms to accelerate the upfront process of composing data-scientific models that are best suited to a particular analytic challenge. I like the fact that it focuses on giving subject matter experts the tools to specify the analytic challenge to be addressed, identify the data to be analyzed, and evaluate the findings from the machine-learning models that are automatically composed. I think it’s good that they’re building hooks into this environment that would allow established data scientists to evaluate the results of automated methods. And I’m encouraged that the program will also address automation of data-science initiatives that are underspecified in terms of the features to be modeled and the data sets to be analyzed.
If it realizes its objectives, DARPA’s program will enable everybody everywhere to enjoy the fruits of high-quality data-science tools. However, I take issue with the self-contradictory notion, as expressed by DARPA in its solicitation, that the subject matter experts who would use such a tool are “non-experts.” Fortunately, the agency expresses its intention more cogently at another point in the document when it states its aim of enabling “users with subject matter expertise but no data science background [to] create empirical models of real, complex processes.”
But that statement also suffers from a fundamental conceptual flaw. What DARPA spells out sounds very much like the core competency of an expert data scientist, rather than a “non-expert” dabbler. After all, the core competency of data scientists is the creation and testing of complex empirical models. No matter what their academic or professional background, data scientists specialize in identifying analytic problems to be solved; defining the principal features of that problem that can be statistically modeled; acquiring, evaluating, cleansing, and preparing data sources to be used in the modeling; and building, testing, evaluating, and refining the resultant models.
At another point in the solicitation, DARPA states that one of its program’s core objectives is to develop a framework for “formal definition of modeling problems and curation of automatically constructed models by users who are not data scientists.” But that’s a self-devouring distinction. If someone, of any background, is able to use such a tool to perform this entire lifecycle of data-science tasks, they are thereby a genuine data scientist. They are not merely some incorporeal “virtual data scientist” or robotic “automated data scientist” (to cite two marginalizing phrases that Network World uses in this article about the DARPA program). And they are not necessarily a “citizen data scientist,” in the “impassioned amateur” sense in which many construe that phrase.
When deciding whether a subject matter expert is also a bona fide data scientist, the fact that they performed these data-science functions in a largely tool-automated fashion, rather than through manual techniques, is irrelevant. DARPA’s discussion seems to be hung up on the bogus notion that “curation,” the core of their “non-data scientist” distinction, is something less than full-blooded data science. Essentially, the agency uses this term to refer to two distinct data-science lifecycle tasks: evaluating the relevance of data sources to a specific modeling problem, and assessing the predictive fit of a constructed model to that same problem. However, by anybody’s reckoning, these tasks are at the heart of professional data science. The former is central to data engineering, and the latter to data modeling.
But I should point out that, for all its scoping flaws, DARPA’s initiative is on the right track. Modeling automation initiatives such as this are driving the new era of democratized data science. If subject matter experts everywhere embrace self-service tools for high-quality data science, we will unlock a world of data-driven creativity and innovation.
Social networking image via FreeImages
What can we expect from the Microsoft-LinkedIn deal? Check out all the details on the latest acquisition in this week’s roundup.
1. Microsoft-LinkedIn deal to shake up enterprise social networking – Brian Holak (SearchCIO)
The Microsoft-LinkedIn deal gives us a glimpse into a hyper-social business future. Will employees like what they see? Also in Searchlight: federal court upholds FCC net neutrality rules; big announcements from Apple’s WWDC.
2. Price isn’t everything: Google bets big on machine learning – Trevor Jones (SearchCloudComputing)
Google sees machine learning and deep analytics as the future of the cloud, as it seeks a strategy to stand out from the crowd beyond price.
3. HPE turns to Docker Engine to fuel server sales – Robert Gates (SearchDataCenter)
HPE has found a data center friend in Docker to ease container entry into the enterprise, but right now, it’s a step too far for some IT shops.
4. June Patch Tuesday addresses DNS, SMB Server vulnerabilities – Tayla Holman (SearchWindowsServer)
June’s batch of security bulletins included a number of updates to close Windows Server vulnerabilities, including a remote code execution flaw in DNS server.
5. LinkedIn could get UC features following Microsoft acquisition – Tracee Herbaugh (SearchUnifiedCommunications)
Analysts believe Microsoft will integrate UC features from Skype for Business into LinkedIn, opening up communications between users of the professional social network.
Deal image via FreeImages
Do you think the Mitel-Polycom deal will go through? Check out the latest details in this week’s roundup.
1. Tech buyers watch fate of Mitel-Polycom deal after second offer made – Tracee Herbaugh (SearchUnifiedCommunications)
Tech buyers face benefits and drawbacks if a New York-based private equity group outbids Mitel for the video conferencing company Polycom.
2. Microsoft warns of rare ransomware worm – Michael Heller (SearchSecurity)
Microsoft warned users of a rare ransomware worm affecting older versions of Windows, but experts are wary of the recommended mitigation technique.
3. Users give thumbs-up to lower-end versions of VMware’s NSX – Ed Scannell (SearchServerVirtualization)
VMware looks to finally establish a foothold in corporate accounts with two low-end versions of NSX. But will the enterprise take the bait?
4. At Cloud Expo, get the latest thoughts on DevOps and Agile – Valerie Silverthorne (SearchSoftwareQuality)
Everyone wants to be Agile and do DevOps, but, of course, it’s harder than it seems. Find out what industry experts will be talking about at Cloud Expo.
5. Falling into the tech skills gap? Try a new recruiting tack – Jason Sparapani (SearchCIO)
Organizations forging into the digital future often come up short in a search for talent. But there are novel ways to close the tech skills gap, say execs at the MIT Sloan CIO Symposium.
CEO image via FreeImages
What do you make of Citrix’s new look? Find out how IT professionals reacted in this week’s roundup.
1. Citrix gets a new look under CEO Kirill Tatarinov – Ramin Edmond (SearchVirtualDesktop)
Citrix is shouting its message from the rooftops that the company will focus on supporting cloud technology, Microsoft integrations and its core end-user computing products going forward.
2. Delivering composable infrastructure holds SDDC key for HPE – Ed Scannell and Robert Gates (SearchDataCenter)
Running hard to catch up with competitors, HPE gets ready to deliver a new approach to help users realize their long-held vision for a software-defined data center.
3. Should IT be looking at a Cisco alternative? – Eamon McCarthy Earls (SearchNetworking)
This week, bloggers debate whether it would be better if the industry found a Cisco alternative and reveal what one survey indicated will be the top IT priorities in 2016.
4. ‘Ingenious’ attack mixes memory deduplication with Rowhammer – Michael Heller (SearchSecurity)
Researchers demonstrated an exploit that combines rare attacks on memory deduplication and Rowhammer in order to allow an adversary access to read or write system memory.
5. Mitel’s Polycom acquisition could spawn cloud video, mobile UC services – Katherine Finnell (SearchUnifiedCommunications)
The UC buyer could see new products and services emerge after Mitel’s Polycom acquisition. Analysts envision cloud video services and mobile UC offerings.
Security image via FreeImages
What can you learn from LinkedIn’s 2012 data breach? Check out several key lessons in this week’s roundup.
1. Lessons from LinkedIn data breach revelations – Warwick Ashford (ComputerWeekly)
There are several important lessons to be learned from revelations about LinkedIn’s 2012 data breach, say security experts.
2. Public cloud vendors jump on serverless computing bandwagon – Trevor Jones (SearchCloudComputing)
Serverless computing is all the rage with cloud providers, and tools such as AWS Lambda may change the way resources are utilized — though it’s still early days.
3. New CEO talks Panzura cloud controller, Nirvanix downfall – Sonia Lelii (SearchCloudStorage)
Nirvanix founder and new Panzura CEO Harr says his company is in the sweet spot for expanding the use of cloud storage and seeks to double business over the next year.
4. VMware customers express frustration, confusion – Ryan Lanigan (SearchVMware)
VMware continues to evolve as a company but there are still users who are frustrated and confused about their products.
5. 2016 GRC conference calendar for IT leaders – Mekhala Roy (SearchCompliance)
Attending a GRC conference can keep you up to speed on compliance regulations, risk management strategies and governance trends. Check out our list of upcoming GRC conferences.
Dell image via FreeImages
Do you think EMC and Dell have made progress on their pending merger? Find out what’s working in the companies’ favor in this week’s roundup.
1. ‘Astonishing how open’ EMC and Dell are about merger – Carol Sliwa (SearchStorage)
Evaluator Group’s Randy Kerns looks at how Dell and EMC said all the right things at the recent EMC World and notes a key factor working in the companies’ favor.
2. Enterprises offer own twist on DevOps adoption – Margie Semilof (SearchDataCenter)
For many companies, DevOps remains a long-term goal, and they are putting their own spin on how to best adopt it.
3. May Patch Tuesday brings critical updates for browsers, Microsoft Office – Tayla Holman (SearchWindowsServer)
Microsoft issued 16 security bulletins for May’s Patch Tuesday, including critical updates for its Internet Explorer and Edge browsers, as well as Microsoft Office.
4. Consumers still eye smart home technologies warily – Lauren Horwitz (SearchCRM)
Despite the ever-ballooning list of smart home products on the market, consumers still see smart home technologies with reticence and as unnecessary.
5. SAP and Apple join to link SAP HANA Cloud Platform with iOS – Jim O’Donnell (SearchSAP)
A new partnership between SAP and Apple for an SDK that enables developers to build iOS apps that link with SAP HANA Cloud Platform underwhelms some observers.
IT company image via FreeImages
What does the future hold for Citrix? Find out why the company’s landscape looks much clearer in this week’s roundup.
1. Future of Citrix looks clearer after turmoil – Gabe Knuth (SearchVirtualDesktop)
Heading into Citrix Synergy 2016, the company is in much better shape than it was last year. Citrix pared down its product portfolio, but it didn’t make any moves that will affect its end-user computing customers.
2. Google cloud security plays catch-up with AWS, Azure – Trevor Jones (SearchCloudComputing)
New Google security certifications are welcome, if belated, additions to the cloud platform, providing assurances to enterprise customers about protecting their data.
3. Craig Wright fails, again, to prove he’s the bitcoin creator – Michael Heller (SearchSecurity)
Craig Wright’s second attempt to prove he’s the bitcoin creator, Satoshi Nakamoto, was debunked after fooling the mainstream press, but his motives are still a mystery.
4. Survey: BI implementation remains top software priority – Ed Burns (SearchBusinessAnalytics)
You might think that in today’s big data world it’s all about advanced analytics, but TechTarget’s IT Priorities survey shows basic BI software tools are still a hot commodity.
5. OAUG head Dues talks tech plans, Oracle cloud applications – Jessica Sirkin (SearchOracle)
OAUG president Patricia Dues talks about the technology that has the OAUG’s attention and why it’s important to learn about the cloud even if you aren’t planning to use it.
Apple image via FreeImages
Are you surprised by Apple’s declining earnings report? Find out why the company shouldn’t be worried in this week’s roundup.
1. One bad Apple earnings report doesn’t signal a mighty fall – Jason Sparapani (SearchCIO)
After 13 years of consistent growth, Apple earnings show a decline. Also in Searchlight: Amazon and Facebook revenues soar; ‘Snowden’ preview goes online.
2. Make the bed, enterprise OpenStack deployment is moving in – Robert Gates (SearchDataCenter)
A typical OpenStack deployment is still for ‘net new’ workloads outside of the world’s largest enterprises, but some big names have started to show off their use of OpenStack.
3. Clarity is key in improving VMware private cloud strategy – Ryan Lanigan (SearchVMware)
VMware has stressed its dedication to the private cloud. The SearchVMware Advisory Board weighs in on what VMware needs to do to take the next step.
4. Apple/FBI battle continues over iPhone vulnerabilities – Peter Loshin (SearchSecurity)
More fallout from the Apple/FBI conflict: The second iPhone suit was dropped; the FBI can’t provide details of a tool used to unlock the San Bernardino shooter’s phone.
5. Apple’s shift on WebRTC technology lacks details – Tracee Herbaugh (SearchUnifiedCommunications)
Apple has said it is developing WebRTC technology, but enterprises won’t see any benefits in online communication applications until the vendor installs the code in Safari.
Data Science image via Shutterstock
By James Kobielus (@jameskobielus)
Data scientists are not an elite class in our society. The concept of a “Citizen Data Scientist” describes a new generation of largely self-taught statistical explorers. In today’s dynamic free-market economy, they’re emerging to satisfy insatiable demand for their services.
Citizen Data Scientists are challenging the notion that you need some minimal academic qualifications to present yourself, without prevarication, as a competent professional in this discipline. In this economy, anybody can become a data scientist simply by doing the work and consistently producing the intended results.
The rise of the Citizen Data Scientist stems from three principal trends:
- Subject matter experts are shifting their focus toward data science. Increasingly, analysts of all sorts are acquiring data science skills and learning the tools of the trade in order to kickstart their careers in an exciting and potentially lucrative new direction. Mid-career professionals are leveraging the wealth of online tools, education, and community resources to master predictive modeling, machine learning, data engineering, and other key data-science practices. Many of the new data scientists are availing themselves of the ample free online resources to bootstrap themselves into this “sexy” profession on the cheap.
- Data science initiatives are increasingly open to team members with non-traditional backgrounds. The shortage of skilled, established data scientists relative to the demand for their services is causing analytics leaders to soften their recruitment and hiring criteria. Given the persistent undersupply of qualified data scientists to meet growing demand, the autodidacts (who can actually deliver the goods) will be able to prosper in today’s big-data-besotted economy.
- Data scientists of all skill levels are volunteering their efforts to a growing range of projects of a voluntary, pro bono, humanitarian, or charitable nature. As befits the “citizen” sobriquet, someone who embarks on this career path might typically cut their teeth on such projects, perhaps working closely with established data scientists on sabbatical from their day jobs. Citizen data scientists’ insights, developed in close collaboration with subject-matter experts, can provide the decision support needed by agencies, community groups, and others who are in a position to fix the problems.
Clearly, most of the citizen data scientists who participate in communities such as New York-based DataKind have day jobs to pay the bills. But they see larger humanitarian causes (reuniting refugees, curing infectious diseases, feeding hungry populations, guaranteeing civil rights to the disenfranchised, etc.) that can benefit from data scientists of all sorts, including the self-taught, applying their best efforts and tools to the task.
For-profit organizations everywhere can play a huge role in cultivating the next generation of citizen data scientists. As I discussed here, for-profit private-sector organizations are engaging in humanitarian data-scientific initiatives. For example, IBM’s Global Citizenship program enables our employees to volunteer their time and talent anywhere there is a social need. Note that, although IBM encourages employees to volunteer under the program, our personnel and the community participants among whom they volunteer know that they are sharing personal time and are not representing the company in any way. In other words, this is a corporate-citizenship program whose aim is to foster private-citizen volunteerism in data science and other capacities.
Even without taking leave from their day jobs, people can cultivate Citizen Data Scientist skills that they can apply to data science projects in company-sponsored extracurriculars and other settings. Employers can encourage business analysts to acquire data science skills beyond any that they picked up in school.
Company-sponsored data-science centers of excellence are a good way to nurture a new crop of Citizen Data Scientists. The informal center-of-excellence may be best for attracting people who don’t see themselves becoming heavy-hitting PhD-quality data scientists. At the very least, companies should facilitate ongoing communications between knowledge workers and established data scientists. For example, Friday lunch-and-learn sessions might interest analysts who want to immerse themselves in presentations, demonstrations, and discussions by established data scientists.
Whether you choose to hire or retain a data scientist with minimal qualifications or track records is your decision, and the risks are obvious. In business contexts, it might make good sense to give Citizen Data Scientists a short leash until such time as they prove out some basic level of competence in this function.
In that regard, William Vorhies does a good job discussing these risks in this recent blog. While highlighting the importance of nurturing Citizen Data Scientists in business contexts, he spells out broad recommendations for mitigating the accompanying risks. I’ll paraphrase these risk-mitigation principles as follows:
- Ensure that Citizen Data Scientists apply established methodologies for data sourcing, cleansing, transformation, outlier analysis, and model development.
- Require Citizen Data Scientists to discuss their methodologies and results in visual data-centric narratives.
- For projects that have a potential bottom-line business impact, require Citizen Data Scientists to have their work reviewed by established data scientists.
- For data-driven predictive models and other artifacts developed by Citizen Data Scientists, require that established data scientists certify those assets before they’re deployed into operational systems, business processes, or applications in conjunction with live data sets.
- Ensure that Citizen Data Scientists comply with all relevant data governance, privacy, security, and other procedural controls throughout the lifecycle of their projects.
All of that makes exquisite sense. It’s great to have a force multiplier of self-taught, hard-working, creative new contributors for your company’s data-science initiatives. But it would be foolish in the extreme to let them bootstrap their learning curves without constant monitoring and supervision.
Encryption image via FreeImages
Are you impressed with Apple’s enterprise security features? Find out why many IT professionals are showing confidence in iOS in this week’s roundup.
1. IT pros confident in Apple iOS data protection, encryption – Ramin Edmond (SearchMobileComputing)
Apple’s enterprise security features draw praise from IT pros who say iOS data protection and encryption make the operating system a strong business platform.
2. Google’s second Android Security Report is a mixed bag – Michael Heller (SearchSecurity)
The second annual Android Security Report details a number of ways Google has been working to improve security on its mobile platform but also highlights persistent problems.
3. Mitel and Polycom announce $1.96 billion merger – Tracee Herbaugh (SearchUnifiedCommunications)
Mitel and Polycom agree to a merger that strengthens each company’s product portfolio and global market reach. The combined company will have $2.5 billion in annual revenue.
4. New CEO, unified management take center stage at Citrix Synergy 2016 – Carl Setterlund (SearchVirtualDesktop)
New Citrix CEO Kirill Tatarinov will set the tone for Synergy 2016 with his opening keynote, but IT administrators have plenty else to look forward to after recent updates to XenApp, XenDesktop and XenMobile.
5. 3D printing industry described as healthy and growing – Jim O’Donnell (SearchManufacturingERP)
Industry expert Terry Wohlers said the state of the 3D printing industry is strong, with investment from large corporations and new innovations leading the way.