Databricks announced MLflow, a new open source project for machine learning, at the Spark Summit this month.
The company focuses on cloud-based big data processing using the open source Apache Spark cluster computing framework.
The company’s chief technologist Matei Zaharia says that the team built its machine learning (ML) approach to address the problems that people typically voice when it comes to ML.
Typical ML challenges
A myriad of tools, spread across each ‘phase’ of the ML lifecycle, from data preparation to model training.
“Unlike traditional software development, where teams select one tool for each phase, in ML you usually want to try every available tool (e.g. algorithm) to see whether it improves results. ML developers thus need to use and productionize dozens of libraries,” noted Zaharia, in a blog.
He also notes that because ML algorithms have dozens of configurable parameters, it is difficult to track which parameters, code and data went into each experiment to produce a model.
Zaharia explains that without detailed tracking, teams often have trouble getting the same code to work again. Being unable to reproduce steps makes debugging tough too, obviously.
“[It’s also] hard to deploy ML. Moving a model to production can be challenging due to the plethora of deployment tools and environments it needs to run in (e.g. REST serving, batch inference, or mobile apps). There is no standard way to move models from any library to any of these tools, creating a new risk with each new deployment,” said Zaharia.
What we have ended up with is big vendors producing internal ML platforms that do some of the job, but are limited in scope because they are tied to each company’s own technology infrastructure.
MLflow is built with an open interface and is therefore designed to work with any ML library, algorithm, deployment tool or language.
It’s also built around REST APIs and simple data formats (e.g., a model can be viewed as a lambda function) that can be used from a variety of tools, instead of only providing a small set of built-in functionality.
“We’re releasing MLflow as an open source project that users and library developers can extend. In addition, MLflow’s open format makes it very easy to share workflow steps and models across organisations if you wish to open source your code,” said Zaharia.
Developers can use MLflow Tracking in any environment (for example, a standalone script or a notebook) to log results to local files or to a server, then compare multiple runs.
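The idea behind that kind of run tracking can be sketched in plain Python. To be clear, this is a toy illustration of the pattern (log each run’s parameters and metrics to local files, then compare runs), not the actual MLflow API, and the file layout and helper names here are invented:

```python
import json
import tempfile
from pathlib import Path

def log_run(run_dir, run_id, params, metrics):
    """Record one experiment run (its parameters and results) as a JSON file."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / f"{run_id}.json").write_text(
        json.dumps({"run_id": run_id, "params": params, "metrics": metrics})
    )

def best_run(run_dir, metric):
    """Compare all logged runs and return the one with the highest metric."""
    runs = [json.loads(p.read_text()) for p in Path(run_dir).glob("*.json")]
    return max(runs, key=lambda r: r["metrics"][metric])

# Log two runs with different hyper-parameters, then compare them.
workdir = tempfile.mkdtemp()
log_run(workdir, "run1", {"alpha": 0.1}, {"accuracy": 0.91})
log_run(workdir, "run2", {"alpha": 0.5}, {"accuracy": 0.87})
print(best_run(workdir, "accuracy")["run_id"])  # run1 scored higher
```

The point is the workflow shape: every experiment leaves a comparable record behind, whether it lands in local files or on a shared server.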
Databricks says it is ‘just getting started’ with MLflow, so there is a lot more to come. Apart from updates to the project, the team plans to introduce major new components (e.g. monitoring), library integrations and extensions such as support for more environment types in the weeks and months to come.
TigerGraph has released a free Developer Edition of its graph analytics platform for lifetime non-commercial use.
But, please, what is graph analytics?
As defined nicely here by Hitachi Vantara’s Bill Schmarzo, “Graph analytics leverage graph structures to understand, codify, and visualise relationships that exist between people or devices in a network. Graph analytics, built on the mathematics of graph theory, is used to model pairwise relationships between people, objects, or nodes in a network. It can uncover insights about the strength and direction of the relationship.”
Mike Ferguson of IBM further defines four ‘types’ of graph analytics here.
- Path analysis: Determines the shortest distance between two nodes in a graph.
- Connectivity analysis: Determines weaknesses in networks.
- Community analysis: Determines distance- and density-based analysis in groups of people and identifies whether individuals are transient or if the network will grow.
- Centrality analysis: Determines the most influential people in a social network or most highly accessed web pages.
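The first of those types is easy to make concrete. Here is a minimal sketch of path analysis in Python, using breadth-first search over a small invented social network (real graph platforms run this kind of query over billions of relationships, but the idea is the same):

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search: returns the shortest node-to-node path, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in graph.get(path[-1], []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

# A toy social network, expressed as an adjacency list.
network = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "eve"],
    "dave": ["eve"],
}
print(shortest_path(network, "alice", "eve"))  # ['alice', 'carol', 'eve']
```

The other three analysis types build on the same underlying structure: they simply ask different questions of the nodes and edges.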
Back to the news story that drove this post, TigerGraph produces a graph analytics platform for developers to create their own big data graph applications i.e. ones whose central function will feature an element of the types of data analytics relationships detailed above.
“As graphs continue to go mainstream, the next phase of the graph evolution has arrived. Cypher vs. Gremlin is no longer the right question to ask,” said Dr. Yu Xu, founder and CEO of TigerGraph. “The time has come to rethink graph analytics with TigerGraph and GSQL, the most complete query language on the market. One hour with our free Developer Edition is all you need to experience TigerGraph’s superiority in unlocking value from connected data at massive scale.”
This technology stores all data sources in a single, unified multiple-graph store that can scale out and up to explore, discover and predict relationships. Unlike traditional graph databases, TigerGraph can scale real-time multi-hop queries to trillions of relationships.
TigerGraph offers enterprise graph MPP (massively parallel processing), to support big data, complex business queries – all with GSQL, the graph query language that is claimed to be intuitive for people who already know SQL.
Other graph analytics tools include Neo4j and Amazon Neptune.
Computer Weekly Open Source Insider talks to Patrick McFadin in his role as vice president for developer relations at DataStax.
DataStax produces a distributed cloud database built on Apache Cassandra – the firm is a key contributor to the Cassandra project and describes its technology as an always-on data platform.
This interview forms part of a mini-series of posts (also running on the Computer Weekly Developer Network) related to the rise of what we have called the ‘holistic’ application.
Not perhaps a formally defined technology industry term, the holistic application is an app that has arisen in the age of cloud with containerisation, microservices and compartmentalisation of discrete components at its core.
CW: Can we put the holistic into legacy?
Patrick McFadin: It is possible to put a wrapper around more traditional applications that provides access to an API. For these legacy applications, this layered approach can be better and safer than trying to completely re-engineer and migrate a whole application from scratch. However, this should not be seen as removing the need to look at legacy applications and how we manage them. Each API call can be eventually replaced with a modern version without the front end code needing to be changed.
CW: Is the ‘cloud developer’ a naturally holistic continuous developer?
Patrick McFadin: Let’s think about cloud – when you implement new services on top of Azure or AWS, you are naturally consuming small amounts of underlying resources, most likely through API calls. So you are naturally going to move over to that more modern approach.
This should open up opportunities to look at tech like serverless and containers, as these help you keep your applications portable. If you want to go all in on public cloud, you can do that; however, this just exchanges one point of lock-in for another. You also have very little control or agency once you are consuming multiple services within one cloud. Large enterprises are looking at this very seriously, as they want to avoid the risk of not having an exit strategy when it comes to moving their data or their application back from a specific provider.
CW: Can you tell us more about cloud lock-in?
Patrick McFadin: Companies did not tend to worry about lock-in so much when it was an internal database sitting in their datacentre. Partly, I think this is because that database company would not be competing with them for customers like an Amazon might; partly, because that internal IT felt like it was ‘yours’ rather than being rented from another company. Cloud feels very different, so that exit strategy and need to avoid lock-in has become more important.
Having said that, there will always be certain points of control where you exchange that flexibility for the ability to differentiate what you are doing. If you need specific services in a cloud platform to build a new application or to manage data clusters, then that is acceptable. Enterprise IT teams are going into this environment with their eyes open now, and I think more developers are too.
CW: Do holistic apps naturally promote microservices?
Patrick McFadin: Holistic applications are a great use case for microservices – rather than using one single application to perform a role, you have a set of individual elements that add up to deliver that result instead. There’s no single code path here — and each of those elements is glued together using APIs. This helps companies scale, as each element of an application can have more resources added to meet demand. This can even be automated to make life easier for your developer team, so they can concentrate on software rather than infrastructure.
The end result here is that holistic applications should help you deliver a better service to the customer. By federating all these elements together, you can put together that more personalised experience that includes all the results and data that this customer wants to see.
CW: Is there a downside to holistic architecture?
Patrick McFadin: The flipside of this is that you have a much more complex environment. Each element will be made up of a cluster of nodes running its own service, creating its own data, and having to meet its own SLA. This expansion can be trickier to manage, as it relies on all the elements working together well and on the data from each of the services being managed effectively.
CW: Do we need a different mindset for holistic?
Patrick McFadin: We do… and it’s all about effecting an actual progression in terms of mindset – it’s much harder to call a project ‘done’ as each element can be changed or updated at any time. While the services can be more flexible, it does mean thinking about the role of software developers differently. Companies that have implemented Agile development properly should be equipped to manage this change more effectively – those that namecheck agile or don’t engage in the process fully will struggle with this move.
Progress has announced the release of NativeScript 4.0, an open source framework for delivering cross-platform, native iOS and Android apps.
NOTE: As explained by Steve Fenton here: compiling is the general term for taking source code written in one language and transforming it into another – ‘transpiling’ is therefore a more specific term for taking source code written in one language and transforming it into another language that has a similar level of abstraction.
The new version of NativeScript boasts Angular-based enhancements, streamlined workflows, advanced navigation scenarios, integration with Vue.js and out-of-the-box assets generation.
Angular developers should be happy.
Developer advocate for Angular Stephen Fluin has noted that NativeScript now provides official support for processes and tooling for building web and mobile apps with the Angular CLI from a single code base. This enables developers to add native mobile projects to existing Angular and web projects by reusing an existing code base.
This also includes support for Angular Schematics, the workflow tool focused on ease of use and development, extensibility and reusability, atomicity and asynchronicity.
NOTE: In database systems, atomicity (or atomicness; from Greek atomos, undividable) is one of the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs.
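Atomicity is easy to demonstrate with Python’s built-in sqlite3 module, which rolls back a transaction if an exception occurs inside its context manager. The table and values below are invented purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")
conn.commit()

# An atomic transfer: both updates succeed, or neither is applied.
try:
    with conn:  # the connection acts as a transaction context manager
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'a'")
        raise RuntimeError("simulated failure mid-transaction")
        # this second update is never reached:
        conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'b'")
except RuntimeError:
    pass

# The failed transaction was rolled back, so no money moved.
print(conn.execute("SELECT balance FROM accounts WHERE name = 'a'").fetchone()[0])  # 100
```

The first update did run, but because the transaction never completed, the database behaves as if nothing happened — which is exactly the ‘all or nothing’ property the note above describes.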
LiveSync with Webpack
NativeScript developers can now use LiveSync and Webpack simultaneously.
This allows for a better development experience, as developers can Webpack an application as part of the development process. This makes it easier to identify and address issues earlier in the development lifecycle, prior to going into release mode.
According to Progress, “While asset generation was previously a common problem for developers, the latest release of NativeScript is now able to generate icons and splash screens based on a single high-resolution image, as chosen by the developer. This saves time by eliminating the need for image editing.”
As noted above, NativeScript also has expanded functionality for Vue.js developers in relation to code sharing capabilities.
NativeScript has been downloaded more than two million times. It was originally developed, and is still supported, by Progress.
Alfresco Software has updated its open source process automation, content management and information governance software.
The immodestly named Alfresco Insight Engine creates dashboard-style statistics and data trend reports that previously required what the company decries as more arduous and time consuming Extract, Transform and Load (ETL) tools and techniques.
Results can be obtained by querying existing Solr indices with familiar SQL statements.
Users can use the out-of-the-box report builder or existing enterprise business intelligence tools to analyse unstructured information.
With a library of over 150 components, the Alfresco Application Development Framework uses Angular and Material Design technologies to help developers build across multiple platforms.
Alfresco Process Services now includes a Quick Start for Amazon Web Services (AWS). Alfresco Content Services can be used by IT teams to build services, package them in Docker containers and then use Kubernetes for deployment into cloud environments such as AWS or Microsoft Azure.
Also in the product updates section here, Alfresco Governance Services 3.0 now offers enhanced features for records management and compliance, as well as enhanced desktop sync functionality for both Windows and Mac.
It appears that Build, Deploy & Iterate is the ‘new’ Extract, Transform & Load.
Mozilla has partnered with Open Tech Strategies for a research project, reviewing the open source industry and studying Mozilla projects closely.
Open Tech Strategies is a services consultancy that aims to provide advice on open source implementation. The company says its clients come to it for assistance in launching, joining, evaluating, or influencing open source software projects.
A report has been generated which claims to offer ‘a new conceptual framework’ of open source project archetypes.
This research covers aspects of open source spanning business objectives, licensing, community standards, component coupling and project governance.
It also contains some practical advice on how to use the framework (it actually is a working framework) and on how to set up projects.
Mozilla reminds us that the famous ‘Four Freedoms’ originally defined by the Free Software Foundation are unambiguous, but says that they only define the rights conveyed by the software’s license.
The company says that people (users) often have expectations that go well beyond that arguably quite strict definition: expectations about development models, business models, community structure, even tool chains.
It is not uncommon, in fact, for open source projects to be criticised for failing to comply with those unspoken expectations.
In a blog, Mozilla says that it recognises that there is no one true model.
“As Mozilla evolves more and more into a multi-product organization, there will be different models that suit different products and different environments. Structure, governance, and licensing policies should all be explicit choices based on the strategic goals of an open source project. A challenge for any organisation is how to articulate these choices, or to put it simply, how do you answer the question, ‘what kind of open source project is this?’,” notes the firm.
You can read more here.
Every company wants to be a platform company, this is one of life’s great software industry truths.
Once you have a ‘platform play’, then other applications (or application components and other discrete compartmentalised services) can be ‘coded to it’ and the platform itself can be opened to integrations via Application Programming Interfaces (APIs) and more.
Facebook (rightly or wrongly) is a platform, Windows (and indeed Office 365) are platforms and of course there are the big cloud platform players. All of these higher-level software constructs allow developers to ‘code to’ them, or indeed ‘code from’ them i.e. to build software that extends the core functionality initially offered.
Software AG positions its recently acquired Cumulocity division – which is now a brand inside of the parent company – as a platform for Internet of Things (IoT) applications.
Offering a 100% open approach to APIs, Cumulocity is focused on handling the complex basic infrastructure elements associated with IoT applications, such as scalability, security and multi-tenancy.
The Cumulocity IoT platform includes a range of pre-packaged solutions such as condition monitoring, predictive maintenance and ‘track & trace’ functionality. It also includes device and sensor management. This is an OT-IT play, that is – a coming together of operational technology (OT) and information technology (IT).
According to Software AG, “Cumulocity IoT is unique in that it provides an IoT-as-a-Service solution that includes enhanced high availability and multi-cluster deployment options. Additionally, Cumulocity IoT incorporates several carrier-grade features, including code-free integration of devices supporting a variety of network technologies.”
Various types of IoT network technology break down into: Low Power WAN (LPWAN) technologies used for long-term low bandwidth remote monitoring, Narrowband IoT (NB-IoT), Lightweight M2M (LWM2M) and Long Range (LoRa).
How does Cumulocity work?
So how does Cumulocity, as an IoT platform, actually work? The company has explained its technology as software that is capable of providing a ‘service wrapper’ for every IoT device project that a customer might want to deploy.
When we say ‘service wrapper’ in this sense, it is the platform’s ability to handle (as noted already) the core infrastructure side of IoT data operations — plus also, its ability to offer pre-packaged functions.
Over and above those core operational technologies, Cumulocity provides the ability to feed device data into dashboards that can track Key Performance Indicators (KPIs) and apply a layer of business logic to enable users to interpret what is happening.
The resultant service that Cumulocity produces is often white labelled – hence why the company refers to it as a wrapper. That service could be anything from vehicle emission reports to office equipment maintenance status – anything with an IoT sensor, basically.
But even with an IoT platform to back up your IoT project, Cumulocity says we need to approach deployments in this space in a specific strategic way.
“It’s like the expression – how do you eat an elephant? You cut it into pieces. It’s important to start small and then scale up fast,” said Bernd Gross in his role as senior VP of IoT & cloud at Software AG. “For a successful IoT deployment, it’s important to create an architectural set up that is not limited to one single use case because this can stifle your ability to scale in the longer term. Adopting an IoT platform approach allows you to change your business model and innovate over time.”
Software AG Cumulocity demonstrated its software in action at Internet of Things World in Santa Clara this month.
Imagine a bottling plant with sensors to track the following types of actions:
- bottles being filled
- tops being put on bottles
- labels being stuck onto bottles
- bottles being fitted into cases
- cases of bottles being set on pallets
Using Cumulocity’s visual control panels, a user can ‘declare a variable’ that describes the value recorded by a particular IoT sensor and be able track the value of that variable visually during live production. The system can also be set to flag thresholds for actions to be taken on the factory floor — and this is the part where business logic helps control the actions of the system within the data model that has been laid down for the use case in hand.
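That threshold-checking business logic can be sketched in a few lines of Python. To be clear, this is a conceptual illustration only — Cumulocity exposes this through its visual control panels rather than code, and the sensor names and limits below are invented:

```python
def check_thresholds(readings, thresholds):
    """Flag any sensor whose latest value breaches its configured threshold."""
    alerts = []
    for sensor, value in readings.items():
        limit = thresholds.get(sensor)
        if limit is not None and value > limit:
            alerts.append(f"{sensor}: {value} exceeds limit {limit}")
    return alerts

# Declared variables for a bottling line, with action thresholds.
thresholds = {"fill_level_ml": 510, "capper_torque_nm": 2.5}
readings = {"fill_level_ml": 515, "capper_torque_nm": 2.1}
for alert in check_thresholds(readings, thresholds):
    print(alert)  # only the fill level breaches its limit
```

In a real deployment, an alert like this would trigger an action on the factory floor — stopping the filler, raising a maintenance ticket — according to the data model laid down for the use case in hand.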
Software AG lists a selection of key Cumulocity customer use cases as follows:
Trackerando markets a portfolio of GPS tracking solutions for a variety of applications, including fleet management, personal car locating, people tracking, elderly patient monitoring and pet recovery.
“It took us only six weeks from start to finish to build a fully customised IoT solution using Software AG’s Cumulocity IoT. This enables us to connect, monitor and track vehicles, people and assets in real-time and gain significant insights into the location of thousands of devices with diverse uses. Trackerando enables its customers to recover misplaced cars, monitor their children’s location, find lost pets and, on the industrial side, optimise fleet management routes and improve resource allocation,” said Bodo Erken, chief executive officer at Trackerando.
Software AG has also noted that STW (Sensor-Technik Wiedemann) is implementing a software system powered by Cumulocity for remote condition monitoring of exhaust gas treatment systems retrofitted to London buses as part of the Greater London Authority’s plans to create an Ultra Low Emission Zone (ULEZ).
HJS Emission Technology is the company that will provide the high-efficiency emissions systems to the Transport for London (TfL) buses, coupled with its UK partner, Emission Engineering Ltd (EEL).
By 2021, more than 5,000 buses in London’s public transport network are to be retrofitted with HJS Emission technology’s SCRT (Selective Catalytic Reduction Technology), which uses particle filters and catalytic converters, to reduce soot particles and nitrogen oxide (NOx). STW’s TC1 Telematics Controller (with built-in GPS) will also be fitted onto each bus in order to monitor their emissions at any time or location.
Software AG already had an arguably substantial IoT offering before its Cumulocity acquisition in the form of the real-time analytics and algorithms already present in its core IT stack. In that regard then, the addition of Cumulocity of course makes good business sense.
What will be interesting from this point forward is to see just how customers use the platform to ‘wrap’ the output of IoT sensors into orchestrated higher-level services connected to defined business logic designed to actually drive business actions.
The open hybrid cloud lies ahead of us, this is the way of things. This truism (if indeed it is one) is impacting the way firms like Red Hat are building out virtualisation technologies.
Red Hat Virtualization 4.2 is the newest release of the company’s Kernel-based Virtual Machine (KVM)-powered virtualisation software.
A Kernel-based Virtual Machine is an open source virtualisation technology built into Linux that allows us to turn Linux into a hypervisor, which in turn allows a host machine to run multiple, isolated virtual environments called guests or virtual machines (VMs).
Built on Red Hat Enterprise Linux, Red Hat Virtualization 4.2 offers product updates in virtual networking.
Introduced in tandem with the new version is Red Hat Virtualization Suite (comprised of Red Hat Virtualization and Red Hat CloudForms), Red Hat’s hybrid infrastructure management platform.
Essentially, this is all about creating a pre-integrated (and simplified) access point to open virtualisation technologies combined with management – and there’s a new User Interface (UI) as well.
Other functionality includes new Disaster Recovery (DR) capabilities for a native site-to-site failover capability. There’s also Red Hat Ansible Playbooks and Roles for automated failover and failback of DR processes, which Red Hat insists will limit the potential for human error to cause data and operational losses.
“Open Virtual Network (OVN) has been integrated with Red Hat Virtualization 4.2 to deliver a native SDN solution through Open vSwitch. This is designed to provide automated management of network infrastructure and a Neutron compatible API for external network providers, as well as network self-service for users, which helps to free up network administrators from user requests for additional infrastructure,” said the company, in a press statement.
Red Hat has also upped the metrics and logging options.
The new metrics and logging features offer reporting and visualisation capabilities built around the Elasticsearch, Fluentd and Kibana (EFK) stack.
The firm underlines this news by saying that open integration is a key tenet of Red Hat Virtualization and that its mission is to be able to manage heterogeneous environments across multiple clouds, hypervisors, containers and traditional computing infrastructure.
TIBCO extends its open source credentials this month with commercial support and services for Apache Kafka as part of the firm’s branded Messaging product line.
Apache Kafka is a distributed open source publish-subscribe messaging system designed to replace traditional message brokers – as such, it can be classed as a stream-processing software platform. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. It is written in the Scala and Java programming languages.
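The publish-subscribe idea at Kafka’s core can be sketched with a toy in-memory broker in plain Python. This is not the Kafka API — topic and message names are invented — but it shows the essential shape: producers append to a topic’s ordered log, and consumers read from any offset independently:

```python
from collections import defaultdict

class Broker:
    """A toy in-memory publish-subscribe broker: producers append messages
    to named topics, and consumers read each topic as an ordered log."""
    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, topic, offset=0):
        return self.topics[topic][offset:]

broker = Broker()
broker.publish("page-views", {"user": "u1", "page": "/home"})
broker.publish("page-views", {"user": "u2", "page": "/docs"})

# Independent consumers can replay the same log from different offsets.
print(len(broker.consume("page-views")))       # 2
print(broker.consume("page-views", offset=1))  # only the second message
```

Kafka adds what this sketch leaves out — partitioning, replication, persistence and high throughput at scale — which is what makes it viable as a replacement for traditional message brokers.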
Let’s remember that Apache Kafka takes its name from the writer Franz Kafka – reportedly because it is ‘a system optimised for writing’.
Already known as a company focused on data transfer and integration by data bus (TIBCO stands for The Information Bus COmpany), TIBCO Messaging – Apache Kafka Distribution will provide integration between other TIBCO Messaging components and Apache Kafka.
TIBCO says that Messaging customers can now bridge Apache Kafka applications to their existing investments in TIBCO FTL, TIBCO eFTL and TIBCO Enterprise Message Service technologies.
The company also announced the availability of MQTT broker capabilities within TIBCO Messaging, via the open source Eclipse Mosquitto project. MQTT is a popular messaging protocol widely used in IoT scenarios.
“Our announcement of support for Apache Kafka and Eclipse Mosquitto as first-class citizens is an important next step in the continued evolution of TIBCO Messaging and our efforts in open source, including Jaspersoft, Project Flogo and Project Mashling,” said Matt Quinn, chief operating officer, TIBCO. “By adopting popular open-source projects, we are supporting the evolving needs of our customers, while also sharing our messaging experience with the broader OSS community.”
This announcement is basically an offering to data architects and data developers to say that they can now select a single offering that provides a spectrum of capabilities, ranging from high-volume batch processing, to ultra-low-latency distribution, to streaming and IoT messaging.
This could be deployed in use cases such as applications that use data from billions of device endpoints, including data streams from low power IoT devices and gateways; or perhaps event-driven architectures with loosely coupled microservices that rely on an ultra low-latency messaging infrastructure.
TIBCO Messaging is available for free as a community edition, allowing for production use up to 100 instances. Commercial subscriptions are, obviously, not free.
Red Hat changes its tagline from time to time, but this year the firm appears happy to be labelled ‘the world’s leading provider of open source solutions’ — perhaps, with Microsoft and so many others picking up the flame, Red Hat feels it needs to state its aim with such simplicity.
Branding shenanigans notwithstanding, Red Hat Enterprise Linux (RHEL) must go through its release cycles and now we reach version 7.5 in all its glory.
Big themes this release include a view that the operating system is a) a foundation for hybrid cloud environments, b) a platform with enhanced security and compliance controls and c) more deeply integrated with Microsoft Windows infrastructure, both on-premises and in Microsoft Azure.
The hybrid play here is, of course, because organisations are frequently seeking to pair existing infrastructure and application investments with both bare metal and public clouds.
But hybrid brings with it ‘multiple deployment footprints’, so Red Hat is aiming to align security controls for that aspect.
A major component of these controls is security automation through the integration of OpenSCAP with Red Hat Ansible Automation. This is designed to enable the creation of Ansible Playbooks directly from OpenSCAP scans which can then be used to implement remediations more rapidly and consistently across a hybrid IT environment.
This release also includes storage optimisation. Virtual Data Optimizer (VDO) technology reduces data redundancy and improves storage capacity through de-duplication and compression of data before it lands on a disk.
There’s love for Linux systems administrators, troubleshooters and developers too, through enhancements to the Cockpit administrator console.
New functionality and integration with Windows-based infrastructure is now offered via improved management and communication with Windows Server implementations, more secure data transfers with Microsoft Azure… and performance improvements for complex Microsoft Active Directory architectures.
Overall, says Red Hat, this can help to provide a smoother transition for organisations seeking to bridge the scalability and flexibility of Red Hat Enterprise Linux 7.5 implementations with existing Windows-based IT investments.
Red Hat Enterprise Linux 7.5 also adds full support for Buildah, an open source utility designed to help developers create and modify Linux container images without a full container runtime or daemon running in the background.
This enables IT teams to build and deploy containerised applications more quickly without needing to run a full container engine, reducing the attack surface and removing the need to run a container engine on a system not intended to do so in production.
Denise Dumas, vice president for platform engineering at Red Hat, provides the summary comment.
“The future of enterprise IT doesn’t exist solely in the datacentre or in the public cloud, but rather as a fusion of environments spread across IT’s four footprints: physical, virtual, private cloud, and public cloud. Red Hat Enterprise Linux serves as a scalable, flexible and robust bridge across these footprints,” said Dumas.
Goodness! Footprint fusion and cross-platform containerised interconnectedness inside hybrid development spheres — this all points to some new umbrella term… interoperability-ness isn’t a word yet, but it could be, you have been warned.