Data science image via Shutterstock
By James Kobielus (@jameskobielus)
Science isn’t a mad dash to enlightenment. Instead, it usually involves tedious, painstaking, and methodical slogs through empirical data by researchers seeking confirmation of precisely scoped hypotheses.
You shouldn’t trust scientific findings unless they’ve been independently reproduced. To confirm someone else’s findings, an independent researcher needs to know precisely how those results were achieved in the first place. If, however, the original researcher failed to document their procedures in precise detail, nobody, including the original researcher, can be confident that the findings can be reproduced at a later date.
When scientists use agile methodologies, they have every incentive to skimp on documentation in the interest of speed. The essence of agile is that self-organizing, cross-functional teams sprint toward results in fast, iterative, incremental, and adaptive steps. Because methodological improvisation is at the heart of this approach, it’s not surprising that teams that follow agile principles may neglect to record every step of the winding journey they took to achieve the desired outcomes. If they fail to maintain a detailed audit trail of their efforts, they may inadvertently undermine later efforts to reproduce their discoveries.
If data scientists wish to deserve the title of “scientist,” they can’t turn a deaf ear to the need for reproducibility of findings. Unfortunately, reproducibility is seldom a high priority in data science initiatives, especially those caught up in an agile scramble for some semblance of statistical truth. As Daniel Whitenack says in this recent O’Reilly article, “Truly ‘scientific’ results should not be accepted by the community unless they can be clearly reproduced and have undergone a peer review process. Of course, things get messy in practice for both academic scientists and data scientists, and many workflows employed by data scientists are far from reproducible…. At the very best, the results generated by these sorts of workflows could be re-created by the person(s) directly involved with the project, but they are unlikely to be reproduced by anyone new to the project or by anyone charged with reviewing the project.”
To keep agile methods from undermining reproducibility, data scientists need to ensure that their teams conduct all their work on shared platforms that automate the following functions:
- Logging of every step in the process of acquiring, manipulating, modeling, and deploying data-driven analytics;
- Versioning of all data, models, and other artifacts at all stages in the development pipeline;
- Retention of archival copies of all data sets, plots, scripts, tools, random seeds, and other artifacts used in every iteration of the modeling process;
- Generation of detailed narratives that spell out how each step contributed to analytical results achieved in every iteration; and
- Accessibility of every script, run, and result to independent parties, who can inspect them at the project level.
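As a minimal sketch of what logging, versioning, and retention can look like for a single run, the Python below records a content hash of the input data, the parameters, the random seed, and a hash of the result in a manifest, then lets an independent party replay the run from that manifest. The function names (`toy_analysis`, `run_with_manifest`, `reproduce`) and the manifest fields are hypothetical; a real platform would persist manifests and artifacts to shared storage rather than return them in memory.

```python
import hashlib
import json
import platform
import random
from datetime import datetime, timezone


def sha256_hex(payload: bytes) -> str:
    """Content hash used as an immutable version ID for data and results."""
    return hashlib.sha256(payload).hexdigest()


def toy_analysis(rows: list[str], sample_size: int, seed: int) -> list[str]:
    """Hypothetical stand-in for a modeling step that depends on randomness."""
    rng = random.Random(seed)  # private RNG: the recorded seed fully determines the output
    return sorted(rng.sample(rows, k=min(sample_size, len(rows))))


def run_with_manifest(data: bytes, params: dict, seed: int) -> tuple[list[str], dict]:
    """Run the analysis and return its result plus a manifest of how it was produced."""
    rows = data.decode("utf-8").splitlines()
    result = toy_analysis(rows, params["sample_size"], seed)
    manifest = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # logging
        "python": platform.python_version(),
        "platform": platform.platform(),
        "data_sha256": sha256_hex(data),  # versioning of the input data
        "params": params,
        "seed": seed,  # retention of the random seed
        "result_sha256": sha256_hex(json.dumps(result).encode("utf-8")),
    }
    return result, manifest


def reproduce(data: bytes, manifest: dict) -> bool:
    """An independent party replays the run from the manifest and compares result hashes."""
    if sha256_hex(data) != manifest["data_sha256"]:
        return False  # not the same data version the original run used
    _, replay = run_with_manifest(data, manifest["params"], manifest["seed"])
    return replay["result_sha256"] == manifest["result_sha256"]


if __name__ == "__main__":
    data = "\n".join(f"record-{i}" for i in range(100)).encode("utf-8")
    result, manifest = run_with_manifest(data, {"sample_size": 5}, seed=42)
    print(json.dumps(manifest, indent=2))
    print("reproducible:", reproduce(data, manifest))
```

Hashing content rather than trusting file names means a silently edited data set produces a different `data_sha256`, so the replay fails loudly instead of appearing to reproduce.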
Of course, some data scientists might argue that retraining their models on fresh data is a form of reproducibility. In other words, iterative training of models shows that the features and correlations identified in prior runs are still valid predictors of the phenomena of interest. But retraining does not address the following concerns that stand in the way of true reproducibility of data-scientific findings:
- Training may not flag circumstances in which a statistical model has been overfitted to a particular data set, a phenomenon that limits the reproducibility of its predictions in other circumstances.
- Training may simply confirm that the model has identified key statistical correlations, but may obscure the underlying causal factors that could be confirmed through independent reproduction.
- Training doesn’t address the need for interpretability of the results by independent parties, which ensures that the reproduced findings are not only statistically significant but also relevant to the application domain of interest.
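The overfitting concern can be illustrated with a deliberately extreme toy model. The sketch below uses synthetic data with 30% label noise (an assumption chosen for illustration) and 1-nearest-neighbor classification, which simply memorizes its training set. Scoring it on the data it was fit to yields perfect accuracy, while an independently drawn sample exposes how much of that fit was noise.

```python
import random


def make_data(n: int, rng: random.Random, noise: float = 0.3) -> list[tuple[float, int]]:
    """Points on [0, 1); the true label is x > 0.5, but a fraction `noise` of labels is flipped."""
    points = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label  # label noise that a good model should NOT learn
        points.append((x, label))
    return points


def one_nn_predict(train: list[tuple[float, int]], x: float) -> int:
    """1-nearest-neighbor: copy the label of the closest training point, i.e., memorize."""
    nearest = min(train, key=lambda p: abs(p[0] - x))
    return nearest[1]


def accuracy(train: list[tuple[float, int]], test: list[tuple[float, int]]) -> float:
    hits = sum(one_nn_predict(train, x) == y for x, y in test)
    return hits / len(test)


rng = random.Random(0)
train = make_data(200, rng)
holdout = make_data(200, rng)  # independent sample, standing in for an independent reproduction

print(f"accuracy on training data: {accuracy(train, train):.2f}")
print(f"accuracy on held-out data: {accuracy(train, holdout):.2f}")
```

Training accuracy is 1.00 by construction, because every training point is its own nearest neighbor. Any model judged only on the data it was trained on can mislead in the same way; the held-out sample plays the role that independent reproduction plays for a published finding.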
For all these reasons, data scientists should always ensure that agile methods leave a sufficient audit trail for independent verification, even if their peers (or compliance specialists) never choose to take them up on that challenge.
Reproducibility is the hallmark of professional integrity because it is grounded in a commitment to the quality, transparency, and reusability of one’s work. At the data-science community level, reproducibility can be the greatest agility enabler of all. If statistical modelers ensure that their work meets consistent standards of transparency, auditability, and accessibility, they make it easier for others to review, cite, and reuse it in other contexts.
Science is, after all, an iterative process in which an entire community of data-driven investigators systematically probes its way closer to the truth.
Web security image via FreeImages
How should business and federal government cybersecurity policies differ? Find out in this week’s roundup.
1. Cybersecurity policies take center stage at RSA 2017 – Eamon McCarthy Earls (SearchNetworking)
This week, bloggers look into cybersecurity policies presented at RSA 2017, how to confront hybrid cloud challenges and the meaning of the SMS-Curvature merger.
2. Microsoft commits to GDPR compliance in the cloud by 2018 deadline – Peter Loshin (SearchSecurity)
Microsoft vows GDPR compliance in all cloud services when enforcement of the new EU data privacy regulation begins in May 2018, but companies still must take action to avoid fines.
3. Azure Stack appliance choices widen, as pricing questions linger – Robert Gates (SearchDataCenter)
Azure Stack will have a fourth appliance option when it becomes generally available later this year, but questions about pricing continue to emerge.
4. Kubernetes on Azure hints at hybrid cloud endgame – Beth Pariseau (SearchCloudComputing)
Microsoft’s Azure container strategy could take hybrid computing to an entirely new level and help launch both technologies into more mainstream waters.
5. HIMSS 2017 buzz ranges from patient engagement to AI, machine learning – Shaun Sutner (SearchHealthIT)
The busy floor and outskirts of HIMSS 2017 were abuzz with hot health IT topics ranging from patient engagement and care collaboration to AI and machine learning.
Security image via FreeImages
What should the next cybersecurity policy look like for the new presidential administration? Check out some of the suggestions in this week’s roundup.
1. Experts debate national cybersecurity policy suggestions at RSAC 2017 – Michael Heller (SearchSecurity)
Experts at RSAC 2017 discussed national cybersecurity policy suggestions for the new presidential administration, including what to do about encryption and the DHS mission.
2. Cisco revenue continues to fall from weak sales in switches, routers – Antone Gonsalves (SearchNetworking)
Cisco revenue dropped for the fifth consecutive quarter due to declining sales of switches and routers. The company is expected to counter the drop with more acquisitions of network software makers.
3. SAP S/4HANA Cloud deepens vendor’s cloud offerings – Jim O’Donnell (SearchSAP)
SAP S/4HANA Cloud was unveiled at the Capital Markets Day event; ‘next-generation intelligent’ ERP aims to give large enterprises a cloud option for S/4HANA.
4. Tech trends in 2017: A legal view – Jason Sparapani (SearchCIO)
Data regulations, blockchain and the popularity of Agile will have a significant impact on organizations’ technology partnerships and contracting, according to law firm Mayer Brown.
5. New Google cloud database service brings scale, data consistency – Trevor Jones (SearchCloudComputing)
Google’s Cloud Spanner may be years from broad adoption, but could represent a big step toward maintaining consistency across the globe with massive data sets in the public cloud.
Plane image via FreeImages
Over the past few months, several airlines have come under scrutiny after a series of data center outages. Have they finally made progress? Find out in this week’s roundup.
1. Lessons learned from data center outages, but still a long trip ahead – Robert Gates (SearchDataCenter)
The hits keep on coming for the airline industry, with several more IT outages that have stranded angry passengers in recent months. Are there any new lessons for IT pros?
2. Trump tells White House cybersecurity officer, ‘You’re fired’ – Michael Heller (SearchSecurity)
Rumors have been confirmed that President Trump has fired the White House cybersecurity officer in charge of making sure he and his staff are not hacked.
3. Cisco joins Microsoft in providing Azure Stack services in UCS server – Antone Gonsalves (SearchNetworking)
Cisco and Microsoft have worked together in delivering Azure Stack services through Cisco’s UCS server. The new product ships in the third quarter.
4. The data storage industry will turn upside down in 2017, or will it? – Rich Castagna (SearchStorage)
Rich Castagna reviews the prognostications offered by data storage vendors on the future of data storage technology in 2017.
5. AWS IPv6 support answers call for IP address space – David Carty (SearchAWS)
IPv6 is not new, but the proliferation of the internet of things creates new demand for the protocol. And AWS has responded in deliberate fashion.
Security image via FreeImages
What do you think should be included in any potential cybersecurity executive order? Check out what several experts think in this week’s roundup.
1. Experts debate effects of government cybersecurity executive order – Michael Heller (SearchSecurity)
A leaked version of a draft of a government cybersecurity executive order from President Trump has experts debating the effects such an order would have.
2. Slack Enterprise Grid needs more than tech to beat Microsoft Teams – Antone Gonsalves (SearchUnifiedCommunications)
The new Slack Enterprise Grid has the technology basics for business. But winning large enterprise deals will require a better strategy against Microsoft.
3. Oracle cloud licensing requirements doubled for AWS, Azure users – Adam Hughes (SearchOracle)
Oracle has updated its cloud licensing policy, and the result doubles the processor license requirements for customers on the AWS and Azure platforms.
4. How Salesforce AI aims to change everyday business – Lauren Horwitz (SearchSalesforce)
The Salesforce flavor of artificial intelligence, Einstein, is trying to bring practical productivity to everyday tasks, but can it prevail over long-standing competition?
5. Advisory board: Learn from these top data center challenges – SearchDataCenter Advisory Board (SearchDataCenter)
For many, time is the ultimate teacher. Explore the top data center challenges and lessons learned from our advisory board members in 2016, and how they plan to move forward.
Virus image via FreeImages
Remember the infamous Heartbleed bug? Well, find out why it’s still affecting thousands of devices in this week’s roundup.
1. Heartbleed bug still found to affect 200,000 services on the web – Michael Heller (SearchSecurity)
Researchers found the infamous Heartbleed bug is still unpatched on as many as 200,000 services connected to the internet and experts don’t expect that number to change.
2. Will AppDynamics pricing stay too high for small, medium businesses? – Antone Gonsalves (SearchNetworking)
Cisco will broaden its application monitoring portfolio with the acquisition of AppDynamics. But will the vendor make AppDynamics pricing friendlier to smaller businesses?
3. SAP names IoT services SAP Leonardo, debuts IoT kickstarter program – Jim O’Donnell (SearchSAP)
SAP has branded the IoT services portfolio it debuted last fall as SAP Leonardo, and it unveiled a kickstarter program for companies that want to develop IoT applications.
4. Open source challenges reduce menu choices in Docker data storage – Beth Pariseau (SearchITOperations)
Open source is all the rage in the modern IT ops world, but it can be hard to build a business that way — just ask the former CEO of ClusterHQ.
5. Dodge sneaky colocation costs by monitoring your bill – Erica Mixon (SearchDataCenter)
Colocation fees can pile up if you’re not savvy. Negotiate with your provider and predict the scale of your organization to avoid surprises on your next bill.
Storage image via FreeImages
What should we expect from the storage industry in 2017? Check out several predictions in this week’s roundup.
1. Enterprise storage market poised for more disruption in 2017 – Carol Sliwa (SearchStorage)
CTOs share enterprise storage predictions for 2017. Cloud, server-based storage, HCI, and growing use of containers and analytics will spur further disruption.
2. Cloud, IoT to drive enterprise IT trends in 2017 – Mike Matchett (SearchCloudComputing)
Cloud computing has evolved quite a bit in the last few years, but it still has far to go. Technologies such as big data, containers and IoT will have a big part to play in the future.
3. Future of the federal CISO position in question as Touhill steps down – Michael Heller (SearchSecurity)
Retired Brig. Gen. Gregory Touhill stepped down as the federal CISO, leaving questions surrounding the future of the position and the work he has done.
4. HPE-SimpliVity deal raises support, price and development questions – Robert Gates (SearchDataCenter)
With HPE’s buy of No. 2 SimpliVity, the first big deal in the hyper-converged infrastructure space, IT pros see a more robust offering, but also higher prices and weaker support.
5. Debate over big data and privacy is just getting started – Ed Burns (SearchBusinessAnalytics)
For years, the tension between privacy and big data has been apparent, but with emerging technologies generating huge amounts of data, the debate will intensify.
Data image via FreeImages
Is 2017 the year of analytics? Find out why you should embrace it in this week’s roundup.
1. Five analytics priorities for 2017 – Nicole Laskowski (SearchCIO)
The International Institute for Analytics recommends embracing AI, clearly defining roles, and finding a balance between experimentation and deployment.
2. Cisco market share report shows big lead for the vendor – Eamon McCarthy Earls (SearchNetworking)
This week, a report shows a big lead in Cisco market share in multiple segments; Ericsson extends its Cisco partnership; and Extreme targets retailers with new products.
3. Microsoft privacy tools give users control over data collection – Michael Heller (SearchSecurity)
New Microsoft privacy tools will give users control over the data collected on the web and within Windows and experts hope the tools will offer data privacy transparency.
4. Google key management keeps pace with AWS, Azure – Trevor Jones (SearchCloudComputing)
A new Google Cloud Key Management Service attempts to keep pace with AWS and Azure with an important feature for highly regulated industries and enterprises that operate on its cloud.
5. PrivacyCon: Tech’s assault on (obliteration of?) consumer privacy – Linda Tucci (SearchCIO)
The attack on consumer privacy by new tech is huge and growing, enabled by consumers and greased by profit; in other words, a fait accompli?
Chipset image via FreeImages
By James Kobielus (@jameskobielus)
Deep learning has moved well beyond the proof-of-concept stage. The technology is rapidly being incorporated into diverse applications in the cloud and at the network’s edge, especially in embedded, mobile, and Internet of Things (IoT) platforms.
Deep learning is all the rage. But the pace at which the technology is being adopted depends on the extent to which it is incorporated into commodity neuromorphic chipsets. To be ready for widespread adoption, deep learning’s algorithmic smarts need to be miniaturized into low-cost, reliable, high-performance chips for robust crunching of locally acquired sensor data. Chipsets must be able to execute layered deep neural network algorithms—especially convolutional and recurrent—that detect patterns in high-dimensional data objects.
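To make the pattern-detecting layers concrete, the sketch below computes one convolutional step by hand: a small kernel slides over a 2-D frame, every output value is a multiply-accumulate over one window, and a ReLU follows. It is pure Python for readability, not how a chipset would execute it. The kernel is hand-picked here, whereas a trained network learns its weights, and, like most deep-learning frameworks, the code does not flip the kernel, so it is strictly a cross-correlation.

```python
def conv2d_valid(image: list[list[float]], kernel: list[list[float]]) -> list[list[float]]:
    """'Valid' 2-D convolution in the deep-learning sense (no kernel flip, no padding).

    Each output cell is a multiply-accumulate over one kernel-sized window of the image.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            out[i][j] = acc
    return out


def relu(feature_map: list[list[float]]) -> list[list[float]]:
    """Element-wise ReLU nonlinearity applied after the convolution."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# A 5x5 "sensor frame" with a vertical edge between columns 1 and 2.
frame = [[0, 0, 1, 1, 1] for _ in range(5)]
# A hand-picked vertical-edge detector (a trained network would learn these weights).
edge_kernel = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

for row in relu(conv2d_valid(frame, edge_kernel)):
    print(row)
```

Every printed row is `[3.0, 3.0, 0.0]`: the detector responds where its window straddles the edge and stays silent over the uniform region. Each output value costs nine multiply-accumulates for a 3x3 kernel; repeated across many channels, layers, and frames, that arithmetic is what accelerator chipsets are built to parallelize.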
Embedded deep learning apps will be as diverse as the endpoints whose automated behaviors they drive. In 2017 and beyond, a new generation of neuromorphic chipsets is emerging to address the growing demand for acceleration of artificial intelligence (AI)-powered mobile devices, IoT endpoints, and connected cars. Embedding of fast deep-learning chipsets is fundamental to the promise of an IoT in which endpoints can take actions autonomously based on algorithmic sensing of patterns in locally acquired sensor data.
What deep-learning chipset architecture will become the industry’s de facto standard? It’s too early to say. Currently, most deep neural networks run on graphics processing units, but other approaches are taking shape and are in various stages of being commercialized across the industry. What emerges from this ferment will be innovative approaches that combine GPUs with central processing units, field programmable gate arrays, and application-specific integrated circuits such as the Google Tensor Processing Unit (TPU).
However, no matter what architecture they incorporate or what deep-learning apps they drive, mass-market neuromorphic chipsets will need to support the following core requirements:
- Perform matrix manipulations at high speed in highly parallel architectures in order to identify complex, elusive patterns such as objects, faces, voices, and threats;
- Achieve 10-100x boosts in the performance, scalability, and power efficiency of deep learning hardware platforms available to the mass market;
- Process sensor datasets that are locally acquired, low latency, specialized, and predominantly persisted in memory;
- Accelerate specialized neural-network functions, in keeping with the task-specific nature of most deep-learning edge applications;
- Execute a wide range of hierarchical neural-net processing patterns in a consistent fashion, in keeping with the various requirements of image, video, audio, and other complex pattern-recognition tasks;
- Enable flash upgrades that push revised deep neural network algorithms to edge devices over wireless connections;
- Minimize interprocessor communication and infrastructure roundtripping, in keeping with the need for deep-learning edge devices to operate in intermittently connected, low-bandwidth, autonomous-decisioning scenarios;
- Enable over-the-air or other remote distribution of machine learning and other algorithmic artifacts, as well as security patches and updates, an approach that will become standard; and
- Provide more resource-efficient neural-network designs, model compression, and data encodings that shrink the algorithms and data deployed to deep-learning edge devices without sacrificing predictive accuracy.
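Model compression comes in many forms; one common technique, not one the article singles out, is post-training quantization. The sketch below applies symmetric per-tensor int8 quantization to a hypothetical weight vector, so each weight is stored as one byte plus a shared scale factor instead of a four-byte float32 (the storage format assumed for the deployed model), and then compares a dot product computed with the original and the quantized weights. The weights, inputs, and helper names are illustrative only.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]  # round, then clamp to int8 range
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 codes and the shared scale."""
    return [scale * v for v in q]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Hypothetical weights of one neuron and one input vector.
weights = [0.82, -1.37, 0.05, 0.44, -0.91, 1.12, -0.03, 0.67]
inputs = [0.5, 0.2, -1.0, 0.8, 0.1, -0.4, 0.9, 0.3]

q, scale = quantize_int8(weights)
approx = dequantize(q, scale)

print("int8 codes:", q)
print(f"storage: {4 * len(weights)} bytes as float32 -> {len(q)} bytes as int8 (plus one scale)")
print("max weight error:", max(abs(w - a) for w, a in zip(weights, approx)))
print("exact output:", dot(weights, inputs), "quantized output:", dot(approx, inputs))
```

Rounding to the nearest integer step bounds each weight’s error at half the scale, which is why quantization can shrink a model roughly fourfold while keeping outputs close; whether the resulting accuracy loss is acceptable has to be validated on the actual model and data.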
One positive sign for the deep-learning industry is the speed at which next-generation neuromorphic hardware platforms are taking shape. As discussed in this recent EETimes article:
- Hardware startups and venture-capital funding are entering the deep learning field at a blistering pace.
- Benchmarking tools for assessing and optimizing the comparative performance of deep neural nets on alternative hardware platforms are being adopted.
- Hardware-based test and prototyping platforms are coming into the hands of deep neural network developers.
- Industry projects, such as NeuRAM3, are springing up to develop new multi-core neuromorphic chip designs that address the deep-learning industry’s insatiable need for speed, scalability, miniaturization, and power efficiency.
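Benchmarking tools differ widely, but most follow the same basic discipline: confirm that the implementations under test compute the same result, run a few warm-up iterations, then report a robust statistic such as the median over repeated timed runs. The sketch below applies that discipline to two pure-Python matrix-multiply implementations on one machine; comparing real hardware platforms would swap in accelerator-specific kernels, which are not shown here. `benchmark`, `matmul_naive`, and `matmul_transposed` are illustrative names, not part of any benchmarking suite.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 3, repeats: int = 15) -> float:
    """Median wall-clock seconds per call, measured after warm-up runs."""
    for _ in range(warmup):
        fn()  # warm-up: exclude one-time costs such as cache population
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)  # median resists outliers from background activity


def matmul_naive(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)] for i in range(n)]


def matmul_transposed(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    bt = list(zip(*b))  # transpose once so the inner loop walks both operands row-wise
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


size = 32
a = [[(i * size + j) % 7 - 3 for j in range(size)] for i in range(size)]
b = [[(i + 2 * j) % 5 - 2 for j in range(size)] for i in range(size)]

# Same workload, two implementations: verify identical results before comparing speed.
assert matmul_naive(a, b) == matmul_transposed(a, b)
for name, impl in [("naive", matmul_naive), ("transposed", matmul_transposed)]:
    print(f"{name:>10}: {benchmark(lambda: impl(a, b)) * 1e3:.2f} ms")
```

Integer inputs keep the correctness check exact; with floating-point kernels on different hardware, results would instead be compared within a tolerance.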
There’s no doubt that embedded neuromorphic chips have the potential to change the world around us and even prolong our lives. Check out IBM’s recent “5 in 5” announcement for examples of medical, environmental, and other IoT apps that benefit from deep-learning algorithms in embedded and/or cloud-based platforms.
Technology image via FreeImages
What should CIOs take away from CES 2017? Find out in this week’s roundup.
1. CES 2017 for CIOs: Making consumer tech business-ready? – Jason Sparapani (SearchCIO)
Artificial intelligence and the internet of things were big at this year’s extravaganza. Here’s what IT chiefs need to know.
2. What effect will Salesforce acquisitions have on the company’s future? – Jesse Scardina (SearchSalesforce)
After Salesforce bought 10 companies in 2016, analysts are watching for its next step, whether more acquisitions or more integration.
3. VMware cloud services remain a concern following 2016 – Ryan Lanigan (SearchVMware)
Our advisory board members reflect on VMware’s past year, with praise for VSAN and vSphere and worries about cloud strategy.
4. AI, messaging top unified communications industry trends – Katherine Finnell (SearchUnifiedCommunications)
Unified communications industry analysts explore the trends that will affect organizations in 2017. Artificial intelligence, messaging and infrastructure are key areas to watch.
5. Looking at cybersecurity initiatives in 2016 and 2017 – Eamon McCarthy Earls (SearchNetworking)
This week, bloggers explore cybersecurity initiatives, Google Compute Engine and new Versa SD-WAN options.