Online transportation company Uber has open sourced Pyro, a probabilistic programming language developed internally.
As many readers will know, a probabilistic programming language is a high-level language (one with a high level of abstraction upwards from machine code) designed to give software programmers a way of defining probability models, which the language's inference machinery can then solve automatically.
As O’Reilly points out, these languages incorporate random events as primitives and their runtime environment handles inference.
Uber is releasing Pyro in the specific hope that developers will use it to create further language advancements that will serve its own industry — and (presumably) benefit other vertical industries.
Deep probabilistic modelling
The company describes Pyro as a tool for deep probabilistic modelling, unifying the best of modern deep learning and Bayesian modelling.
As clarified on probabilistic-programming.org, “Probabilistic graphical models provide a formal lingua franca for modelling and a common target for efficient inference algorithms. Their introduction gave rise to an extensive body of work in machine learning, statistics, robotics, vision, biology, neuroscience, Artificial Intelligence (AI) and cognitive science.”
Predictive models are communicated using a mix of natural language, pseudo code and mathematical formulae and solved using special purpose, one-off inference methods.
Now, with an eye on AI and its use in autonomous vehicles, Uber says it believes the critical ideas to solve AI will come from a joint effort among a worldwide community of people pursuing diverse approaches.
According to Uber, “By open sourcing Pyro, we hope to encourage the scientific world to collaborate on making AI tools more flexible, open, and easy-to-use. We expect the current (alpha!) version of Pyro will be of most interest to probabilistic modellers who want to leverage large data sets and deep networks, PyTorch users who want easy-to-use Bayesian computation, and data scientists ready to explore the ragged edge of new technology.”
Marrying probability with representational power
The company further explains that specifying probabilistic models directly can be cumbersome and implementing them can be very error-prone. Software engineers at Uber have explained that probabilistic programming languages (PPLs) solve these problems by marrying probability with the representational power of programming languages.
Uber’s Noah Goodman further clarifies that a probabilistic program is a mix of ordinary deterministic computation and randomly sampled values; this stochastic computation represents a generative story about data. The probabilities are implicit in this representation—there is no need to derive formulas—and yet this specification is also universal: any computable probabilistic model can be written this way.
Pyro builds on full Python as its base language.
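Goodman's point can be illustrated in a few lines of ordinary Python. The sketch below uses the standard library's `random` module rather than Pyro's own `pyro.sample` primitives, purely to keep it dependency-free; the "weather" model and its numbers are invented for illustration. A deterministic program with sampled values embedded in it is a generative story about data, and the probabilities are implicit — no formulas need deriving:

```python
import random

def weather(rng):
    """A generative story: the temperature depends on an earlier random choice."""
    cloudy = rng.random() < 0.3           # random primitive: P(cloudy) = 0.3
    mean_temp = 55.0 if cloudy else 75.0  # ordinary deterministic computation
    temp = rng.gauss(mean_temp, 10.0)     # another random primitive
    return cloudy, temp

# Running the program forward draws samples from the implied joint distribution
rng = random.Random(42)
samples = [weather(rng) for _ in range(100_000)]
cloudy_rate = sum(c for c, _ in samples) / len(samples)
avg_temp = sum(t for _, t in samples) / len(samples)
```

Repeated forward runs recover the implied distribution (`cloudy_rate` settles near 0.3); Pyro's contribution is the inference machinery that runs such stories backwards from observed data.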
Red Hat has provided a presence (and a lot of free red Fedora giveaways) at SAP’s TechEd Europe 2017 conference in Barcelona this week.
The open source champions (Red Hat, not SAP) have announced the availability of SAP Vora on Red Hat OpenShift Container Platform.
SAP Vora is a distributed computing solution for business that provides enriched interactive analytics across Hadoop and corporate data.
The end result is an integrated solution that pairs enterprise-grade Kubernetes with big data analytics.
What SAP brings to the yard
Running on Kubernetes clusters and Linux containers driven by OpenShift, SAP Vora provides containerized analytics engines and services and collects big data via Apache Spark, Apache Hadoop, and directly from cloud environments to use for business intelligence.
SAP Vora provides interactive analytics on stored data from various sources, including traditional storage, Hadoop and Amazon Simple Storage Service (Amazon S3). It helps package analysis from disparate data types with a common interface, modeling tools and access to SAP HANA.
Red Hat brings a milkshake too
Red Hat OpenShift delivers a scalable platform upon which these capabilities can run, integrating containerized services and SAP Vora resources for a unified, flexible offering.
Red Hat OpenShift Container Platform unites developers and IT operations on a single platform to build, deploy and manage applications consistently across hybrid and multi-cloud infrastructures. Built on open source standards including Red Hat Enterprise Linux and Kubernetes, it promises shorter development cycles and lower operating costs for both modern and traditional applications.
Key features of the integrated offering include:
- On-demand in-memory big data analytics
- Easier management of big data analytics at scale
- Easier integration of SAP Vora with SAP HANA
- Better support for agile development around big data use cases
Additionally, to help support the consistent creation of applications using components of SAP HANA and the SAP NetWeaver technology platforms, Red Hat is now making its newly-launched Red Hat Enterprise Linux for SAP Solutions available as part of the Red Hat Developer Program.
MongoDB 3.6 will be generally available in early December.
The open source (at its core) general purpose database has some notable changes (its makers would call them enhancements), including a so-called ‘change streams’ feature, which enables developers to build what are being described as more ‘reactive’ web, mobile and IoT applications that can view, filter and act on data changes as they occur in the database.
Whenever data is changed in MongoDB, the updates are automatically reflected (in real time) in the application that the database itself is serving.
The speed of data
For example, a weather application that pulls from constantly changing datasets (as the weather shifts) would have previously required a developer to write application code that periodically polls the database for updates, limiting the application’s ability to provide an accurate, real-time user experience.
Change streams automate that process. This, then, is the ‘speed of data’: the velocity at which real-world ‘things’ change the data in databases serving applications.
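The polling-versus-push contrast is easy to sketch in plain Python. This is the pattern, not MongoDB's actual API (in MongoDB 3.6 the push side is exposed as a cursor returned by a collection's `watch()` method); the in-memory "store" and timestamps here are invented stand-ins:

```python
import queue

# Polling: the application repeatedly asks the database "anything new since ts?"
def poll_for_updates(store, last_seen_ts):
    return [doc for doc in store if doc["ts"] > last_seen_ts]

store = [{"ts": 1, "temp": 21}, {"ts": 2, "temp": 22}, {"ts": 3, "temp": 23}]
late_arrivals = poll_for_updates(store, last_seen_ts=1)  # only sees changes when it asks

# Reactive: the database pushes each change as it happens; the app just consumes
changes = queue.Queue()

def on_write(doc):
    # In a change stream, this hand-off is done by the database, not app code
    changes.put(doc)

for doc in store:
    on_write(doc)

received = []
while not changes.empty():
    received.append(changes.get())
```

With polling, freshness is capped by the poll interval; with the push model, the application reacts the moment a write lands.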
(1) always on & (2) distributed
President and CEO of MongoDB Dev “please pronounce my name Dave” Ittycheria insists that MongoDB 3.6 makes it easier (and faster) to build always-on applications that react in real time to changes in data streamed across distributed systems.
That’s the message for cloud apps, then: (1) always on and (2) distributed.
“MongoDB has always aimed to make developers more productive by giving them the most friction-free means of working with data,” said Eliot Horowitz, CTO and co-founder, MongoDB. “With advancements like change streams and retryable writes, MongoDB 3.6 handles critical tasks at the database layer that used to take up developer time and energy. Extensions to the query language make array updates and joins more expressive, and new security features curb the possibility of MongoDB instances being left mistakenly exposed.”
Readers will note that the above mentioned ‘retryable writes’ move the complexity of handling systems failures from the application to the database. Instead of developers having to implement custom, client-side code to handle network errors during updates, MongoDB automatically retries writes.
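The client-side boilerplate that retryable writes make redundant looks something like the hypothetical sketch below. The exception class, `write_fn` and backoff policy are all invented for illustration — this is the shape of the custom code developers used to write, not MongoDB driver API:

```python
import time

class TransientNetworkError(Exception):
    """Stands in for a dropped connection or failover mid-write."""

def retry_write(write_fn, attempts=3, backoff_s=0.01):
    """The kind of custom retry loop MongoDB 3.6 moves down into the database layer."""
    for attempt in range(1, attempts + 1):
        try:
            return write_fn()
        except TransientNetworkError:
            if attempt == attempts:
                raise                      # out of retries: surface the failure
            time.sleep(backoff_s * attempt)  # simple linear backoff before retrying

# Simulate a write that fails once (network blip), then succeeds
calls = {"n": 0}

def flaky_write():
    calls["n"] += 1
    if calls["n"] == 1:
        raise TransientNetworkError()
    return "ok"

result = retry_write(flaky_write)
```

With retryable writes enabled, the driver performs this dance itself, so the application simply issues the write once.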
Navigating schema, with a compass
Other notable features include MongoDB Compass, which allows users to analyse and understand database schemas.
Compass now includes query auto-complete, query history and table views.
For users who are looking for other features, the new Compass Plugin Framework gives them the power to build and distribute plugins to make MongoDB Compass their ideal navigation tool.
Also of note in the forthcoming release, MongoDB Ops Manager (which allows users to manage, optimize, secure and back up globally distributed clusters) now has a new Data Explorer, Real-Time Performance Panel and Performance Advisor.
Ops Manager 3.6 makes it easier than ever for ops teams to inspect and improve database performance in real time.
A new dimension of agility
“Agility is expressed both in terms of speed… and in terms of flexibility in handling various data formats and volumes, as well as speed in terms of application innovation. Often, one’s data management technology is an inhibitor in both regards. MongoDB 3.6, with its features that address both dimensions of agility as well as increasing demands for better security and geographic flexibility, is well poised to power the functionality that will make enterprises winners in this digitally driven economy,” said Carl Olofson, research vice president for data management software at IDC.
Schema governance with JSON schema lets developers and ops teams combine the flexibility of the document model with data conformance and validation capabilities to precisely the degree that best suits their application.
Because schema validation is fully tunable, teams can add pinpoint validation to only the critical fields of their model, or start a project with data governance appropriate to the development stage and tighten it as the project moves into production. Teams can thus benefit from the ease of development that the document model offers, while still maintaining the strict data governance controls that are critical for applications in regulated industries.
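A pinpoint validator of the kind described might look like the sketch below. The `$jsonSchema` keywords (`required`, `properties`, `bsonType`) are MongoDB 3.6's, but the field names are invented and the `validates` function is a deliberately minimal stand-in for the server's own validation logic:

```python
# Pinpoint governance: only the critical field is validated; other fields stay free-form
validator = {
    "$jsonSchema": {
        "required": ["patient_id"],
        "properties": {
            "patient_id": {"bsonType": "string"},
        },
    }
}

BSON_TYPES = {"string": str, "int": int, "double": float, "bool": bool}

def validates(doc, validator):
    """Minimal stand-in for server-side validation: required fields plus type checks."""
    schema = validator["$jsonSchema"]
    for field in schema.get("required", []):
        if field not in doc:
            return False
    for field, rule in schema.get("properties", {}).items():
        if field in doc and not isinstance(doc[field], BSON_TYPES[rule["bsonType"]]):
            return False
    return True

good = {"patient_id": "p-123", "notes": "ungoverned fields still pass"}
bad = {"notes": "missing the one governed field"}
```

Tightening governance for production is then a matter of growing the `required` list and `properties` rules, not rewriting application code.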
Are developers making the most of containers, or are they [just] treating them like virtual machines?
Could the dawn of cloud-native software application development herald a new era when programmers focus on doing more with containers by virtue of their own work with cloud computing environments, tools and platforms?
Ceppi writes as follows…
While it is indisputable that containers are one of the hottest tickets in open source technology (451 Research projects more than 250% growth in the market from 2016 to 2020), the question still remains: are developers making the most of them?
It’s easy to see why it’s such an enticing option.
Container technology can combine speed and density with the security of traditional virtual machines and requires far smaller footprint operating systems in order to run.
Containers offer a new form of virtualisation, providing almost equivalent levels of resource isolation as a traditional hypervisor.
Additionally, containers present lower overheads both in terms of lower memory footprint and higher efficiency. This means that higher density can be achieved – simply put, you can get more for the same hardware.
The age of LXD
The telco industry has been at the cutting edge of adopting LXD machine container technology. Part of the catalyst for this trend has been the NFV (network function virtualisation) revolution – the concept of telcos shifting what were traditionally welded-shut proprietary hardware appliances into virtual machines.
NOTE: LXD is a next generation system container manager — it offers a user experience similar to virtual machines but using Linux containers instead.
In this sense, it is unarguable that developers are treating containers like virtual machines, even though containers used in their traditional sense offer both higher performance to the end user, as well as operational efficiency for the cloud administrator.
Unfortunately, many CIOs are still unsure if containers are the best option of technology for them, due to wider market misconceptions. For example, some believe that by using one particular type of container, they are going to tie themselves into a specific vendor.
Another common misconception that might present an obstacle to enterprise or developer adoption is security. There are, however, controls in place that enable us to say, with confidence, that an LXD machine container is more than secure enough to satisfy the CIO that is, understandably, more security-conscious than ever.
Container technology has brought about a step-change in virtualisation technology.
Organisations implementing containers see considerable opportunities to improve agility, efficiency, speed and manageability within their IT environments. Containers promise to improve data centre efficiency and performance without having to make additional investments in hardware or infrastructure.
For Linux-on-Linux workloads, containers can offer a faster, more efficient and cost-effective way to create an infrastructure. Companies using these technologies can take advantage of brand-new code, written using modern advances in technology and development discipline.
We see a lot of developers and small to medium organisations adopting container technology as they build from scratch, but established enterprises of all sizes and in all industries also need to channel this spirit of disruption to keep up with the more agile and scalable new kids on the block.
You can follow Marco Ceppi on Twitter here.
This is a guest post for the Computer Weekly Open Source Insider column written by Tim Mackey in his capacity as technology evangelist for open source applications and container management & security firm Black Duck Software.
As detailed on Computer Weekly here, containers encapsulate discrete components of application logic provisioned only with the minimal resources needed to do their job.
Containers are easily packaged, lightweight and designed to run anywhere — multiple containers can be deployed in a single Virtual Machine (VM).
Mackey writes as follows…
Managing container infrastructure in a production environment becomes challenging at deployment scale, and one of the biggest problems is trust — specifically, trust of the application.
Quite simply, can you trust that all containers in your Kubernetes or OpenShift cluster are performing the tasks you expect of them?
Container assertions, you gotta make ’em
To answer those questions, you need to make some assertions:
- That all containerised applications were pen-tested and subjected to static code analysis;
- That you know the provenance of the container through signatures and from trusted repositories;
- That appropriate perimeter defences are in place and authorisation controls are gating deployment changes.
These assertions define a trust model but omit a key perspective – the attacker profile.
When defending against attackers at scale, you need to understand what information they use to design their attacks.
Shifting deployment responsibilities
When you’re using commercial software, the vendor is responsible for deployment guidance, security vulnerability notification and solutions for disclosed vulnerabilities. If you’re using open source software, those responsibilities shift.
When Black Duck analysed audits of over 1,000 commercial applications, we found the average application included 147 unique open source components. Tracking down the fork, version and project stability of each component is a monumental task for development teams.
Potential attackers know how difficult it can be to put together this information, and they exploit the lack of visibility into open source components. There are two primary ways for a hacker to mount an effective attack on a given open source component.
Component attack #1
First, they contribute code in a highly active area of the component to plant a back door, hoping that their code won’t be noticed in a rapidly evolving component.
Component attack #2
Second, hackers look for old, stable code. Why? Older code may have been written by someone who has left the project, or doesn’t recall exactly why it was written that way.
The goal in both cases is to create an attack against the component, so hackers test, fail and iterate against the component until they’re successful or move on to another component.
Hacking the hackers
However, even when facing a prepared hacker, you can make an attack much harder to mount. Consider an attacker who recognises they’re in a container and assumes there are multiple containers with the same profile. As an administrator, you can randomise the memory load location, set up kernel security profiles and enable role-based access control. These are just a few changes that make it harder for hackers to know whether they have created a viable attack.
We know that containerisation has increased the pace of deployment, creating questions of trust for many administrators.
Key to protecting your applications in production is maintaining visibility into your open source components and proactively patching vulnerabilities as they are disclosed.
If you assume from the outset your containers will be compromised, you can prepare by making changes that make it much harder to mount an attack from a compromised container.
This article is based on Tim Mackey’s presentation from 20th October at DevSecCon, London: “When Good Containers Go Bad.”
American telecommunications mainstay AT&T is working with Indian multinational IT and networking technology firm Tech Mahindra to build an open source Artificial Intelligence (AI) platform named Acumos.
Hosted by The Linux Foundation, the platform is intended to provide a marketplace for accessing, using and enhancing AI applications.
The firm says the industry needs a way to make those apps reusable and accessible to those beyond the company that created them and simplify deployments to lower the barrier to entry.
The Acumos platform is an extensible framework for machine learning solutions — it provides the capability to edit, integrate, compose, package, train and deploy AI microservices.
Simply put, it’s an AI marketplace where applications can be chained to create complex and sophisticated AI services.
According to AT&T, “Take someone who wants to create an AI application for video analytics. The Acumos platform gives them a variety of applications to choose from, like location tracking and facial recognition. The platform interface lets you choose AI capabilities and stitch them together automatically so they function as a single application. In this case, the new service could identify where the video was shot based on background landmarks, and identify the speakers in it – design and deploy in a single interface and with minimal additional code development.”
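The “stitching together” AT&T describes is, at heart, function composition. The sketch below is purely conceptual: the stage names are borrowed from AT&T's own video-analytics example, and the implementations are dummies standing in for real AI microservices:

```python
def locate(frame):
    # Dummy stage: a real model would detect background landmarks in the video
    frame = dict(frame)
    frame["location"] = "landmark-detected"
    return frame

def recognise_faces(frame):
    # Dummy stage: a real model would identify the speakers
    frame = dict(frame)
    frame["speakers"] = ["speaker-1", "speaker-2"]
    return frame

def chain(*stages):
    """Stitch independent AI capabilities into a single application."""
    def pipeline(data):
        for stage in stages:
            data = stage(data)
        return data
    return pipeline

video_analytics = chain(locate, recognise_faces)
result = video_analytics({"video": "clip-001"})
```

Each stage stays an independent, reusable unit; the marketplace's job is to make picking and chaining them a design-time choice rather than a coding exercise.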
Content curation, autonomous cars, drones, and augmented reality/virtual reality are other areas where AI models could be used with the Acumos platform.
“Our goal with open sourcing the Acumos platform is to make building and deploying AI applications as easy as creating a website,” said Mazin Gilbert, vice president of advanced technology at AT&T Labs. “We’re collaborating with Tech Mahindra to establish an industry standard for AI in the networking space. We invite others to join us to create a global harmonization in AI and set the stage for all future AI network applications and services.”
The Acumos platform is built on open source technologies and can federate across the various AI tools available today, enabling easy access for developers and businesses.
Linux Foundation executive director Jim Zemlin explains that the organisation has previously used this collaborative model to launch ONAP, the operating system for virtualised networks.
The involved players here are getting the initial framework into open source as quickly as possible so the developer community can accelerate the development of the platform.
Branded as an open source event-driven microgateway, TIBCO’s Project Mashling arrives at the same time as a new release of open source Project Flogo, the firm’s effort to bring embedded machine learning capabilities to edge computing.
The two projects are meant to work in unison to give developers a route to using event-driven microservices.
What is a microgateway?
As explained by IBM, a gateway is used to protect and control access to API services, typically handling security and traffic control; it is, essentially, a proxy that secures and forwards requests to backend APIs.
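In code terms, a gateway is little more than a checked hand-off. The sketch below is deliberately simplified and entirely invented (the key set, request shape and backend table are illustrative; a real microgateway such as Project Mashling is configured and event-driven rather than hard-coded):

```python
# Backend APIs the gateway fronts (dummy handlers standing in for real services)
BACKENDS = {"/weather": lambda req: {"status": 200, "body": "sunny"}}
VALID_KEYS = {"secret-key"}

def gateway(request):
    """Authenticate and rate-limit at the edge, then forward to the backend API."""
    if request.get("api_key") not in VALID_KEYS:
        return {"status": 401, "body": "unauthorised"}   # blocked at the gateway
    backend = BACKENDS.get(request["path"])
    if backend is None:
        return {"status": 404, "body": "no such API"}
    return backend(request)   # forward the request, return the backend's response

ok = gateway({"path": "/weather", "api_key": "secret-key"})
denied = gateway({"path": "/weather", "api_key": "wrong"})
```

The point of the “micro” prefix is that this control point is small enough to deploy alongside each microservice instead of as one central choke point.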
Project Flogo provides an open model for developers to run deep learning frameworks within a microservice flow, launched with support for Google’s TensorFlow and TIBCO Statistica. With Project Flogo deployed on edge devices, microservice flows can process data locally and predict imminent events to take action on, without the transfer of information to and from the cloud.
Project Mashling is an ultralight event-driven microgateway that accelerates the development of established event-driven microservices.
“The release of these projects underscores TIBCO’s commitment to the open source community and our desire to foster innovation in the microservice and edge computing ecosystem,” said Rajeev Kozhikkattuthodi, vice president, product management, TIBCO. “We embrace an open core model, which allows customers to benefit from collaboration with the open source community, as well as accessing new innovations incorporated into our commercial offerings and TIBCO Connected Intelligence Cloud.”
The solution is designed to complement full lifecycle API management solutions, such as TIBCO Mashery. Project Mashling offers visual and integrated development environment tooling and reusable microgateway patterns developed by the Mashling community.
In line with this news, TIBCO also announced TIBCO Cloud Messaging, along with Community Editions for TIBCO FTL, TIBCO eFTL and TIBCO ActiveSpaces. The firm is aiming to make enterprise-class in-memory data grid and high performance messaging tools more accessible to mainstream organisations, including midsize businesses.
The Pentaho brand is now a fully signed-up, card-carrying element of Hitachi Vantara.
But making good on its promise to invest in what was a company and is now a brand/product, the PentahoWorld 2017 user conference saw Hitachi Vantara launch the Pentaho 8.0 version release.
This data integration and analytics platform software is now enhanced with support for Spark and Kafka to improve data and stream processing.
Note: Apache Spark is an open source parallel processing framework for running large-scale data analytics applications across clustered computers. Apache Kafka is a distributed publish-subscribe messaging system designed to replace traditional message brokers.
Hitachi Vantara also points out product enhancements to Pentaho which see it up its ability to match compute resources with business demands, in real time.
According to analyst-style estimates from IDC, the global datasphere will grow to 163 zettabytes by 2025.
IDC also forecasts that more than a quarter of that data will be real-time in nature, with IoT data making up more than 95-percent of it.
If these predictions hold any water, Hitachi Vantara’s acquisition of (and investment in) Pentaho would appear to be fairly well validated.
“We want to help customers to prepare their businesses to address this real-time data deluge by optimising and modernising their data analytics pipelines and improving the productivity of their existing teams,” said the firm, in a press statement.
New enhancements to the Pentaho 8.0 platform include:
- Stream processing with Spark: Pentaho 8.0 now enables stream data ingestion and processing using its native engine or Spark. This adds to existing Spark integration with SQL, MLlib and Pentaho’s adaptive execution layer.
- Connect to Kafka Streams: Kafka is a very popular publish/subscribe messaging system that handles large data volumes that are common in today’s big data and IoT environments. Pentaho 8.0 now enables real-time processing with specialized steps that connect Pentaho Data Integration (PDI) to Kafka.
- Big data security with Knox: Building on its existing enterprise-level security for Cloudera and Hortonworks, Pentaho 8.0 now adds support for the Knox Gateway used for authenticating users to Hadoop services.
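Kafka's publish/subscribe model, mentioned in the list above, works at its simplest like the toy broker below. This is a stdlib sketch of the pattern only — not Kafka's or Pentaho Data Integration's actual API — with invented topic and message names:

```python
from collections import defaultdict

class Broker:
    """Toy publish/subscribe broker: each topic fans messages out to its subscribers."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self.subscribers[topic]:
            handler(message)   # push the message to every interested consumer

broker = Broker()
seen = []
broker.subscribe("sensor-readings", seen.append)   # e.g. a PDI-style ingest step
broker.publish("sensor-readings", {"device": 7, "temp": 21.5})
```

Decoupling producers from consumers this way is what lets a system like Kafka absorb the large, bursty data volumes typical of big data and IoT environments.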
“On the path to digital transformation, enterprises must fully exploit all the data available to them. This requires connecting traditional data silos and integrating their operational and information technologies to build modern analytics data pipelines that can accommodate a more connected, open and fluid world of data,” said Donna Prlich, chief product officer for Pentaho software at Hitachi Vantara. “Pentaho 8.0 provides enterprise scale and faster processing in anticipation of future data challenges to better support Hitachi’s customers on their digital journeys.”
Also here we find enhancements to optimise processing resources. The firm says that every organisation has constrained data processing resources that it wants to use intelligently, guaranteeing high availability even when demand for computation resources is high.
To support this, Pentaho 8.0 provides worker nodes to scale out enterprise workloads: IT managers can now bring up additional nodes and spread simultaneous workloads across all available computation resources to match capacity with demand.
This matching provides elasticity and portability between cloud and on-premises environments resulting in faster and more efficient processing for end users.
Pentaho 8.0 also comes with several new features to help increase productivity across the data pipeline. These include granular filters for preparing data, improved repository usability and easier application auditing.
For more on this subject read Hitachi Vantara PentahoWorld 2017 major trends in data clarified.
The Computer Weekly Open Source Insider blog has reached 1000 posts since starting in June 2010.
This very short post is intended to convey a heartfelt thank you to all the firms (and their corresponding communications engines, whatever shape they may take) who have provided solid no-spin commentary and helped us understand how open platforms, open design and open computing have developed over most of the last decade.
Here’s to what’s next… and may it stay open.
Canonical has announced the release of the Ubuntu 17.10 operating system featuring a new GNOME desktop on Wayland and new versions of KDE, MATE and Budgie to suit a range of tastes.
The firm says that version 17.10 brings Kubernetes 1.8 for hyper-elastic container operations and minimal base images for containers.
Canonical reminds us that this is the 27th release of Ubuntu — and that this is the world’s most widely used distribution of Linux.
“Ubuntu 17.10 is a milestone in our mission to enable developers across the cloud and the Internet of Things” said Mark Shuttleworth, CEO and founder of Canonical. “With the latest capabilities in Linux, it provides a preview of the next major LTS and a new generation of operations for AI, container-based applications and edge computing.”
In this release we also find enhanced security and productivity for developers.
The Atom editor and Microsoft Visual Studio Code are emerging as the new wave of popular development tools, and both are available across all supported releases of Ubuntu including 16.04 LTS and 17.10.
The new default desktop features the latest version of GNOME, with extensions developed in collaboration with the GNOME Shell team to provide a familiar experience to long-standing Ubuntu users.
Connecting to WiFi in public areas is simplified with support for captive portals. Firefox 56 and Thunderbird 52 both come as standard together with the latest LibreOffice 5.4.1 suite.
Ubuntu 17.10 also supports driverless printing with IPP Everywhere, Apple AirPrint, Mopria, and WiFi Direct. This release enables simple switching between built-in audio devices and Bluetooth.
Ubuntu 17.10 ships with the 4.13-based Linux kernel, enabling the latest hardware and peripherals from ARM, IBM, Dell, Intel and others.
Also here, Ubuntu 17.10 features platform snaps for GNOME and KDE which enable developers to build and distribute smaller snaps with shared common libraries. Delta updates already ensure that snap updates are generally faster, use less bandwidth, and are more reliable than updates to traditional deb packages in Ubuntu.
The 17.10 kernel adds support for OPAL disk drives and numerous improvements to disk I/O. Namespaced file capabilities and Linux Security Module stacking reinforce Ubuntu’s leadership in container capabilities for cloud and bare-metal Kubernetes, Docker and LXD operations.