Open Source Insider


July 18, 2018  9:00 AM

What are ‘mature’ stateful applications?

Adrian Bridgwater

BlueK8s is a new open source Kubernetes initiative from ‘big data workloads’ company BlueData — the project’s direction tells us a little about where containerised cloud-centric applications are heading.

Kubernetes is a portable and extensible open source platform for managing containerised workloads and services (essentially it is a container ‘orchestration’ system) that facilitates both declarative configuration and automation.

The first open project in the BlueK8s initiative is Kubernetes Director (aka KubeDirector), for deploying and managing distributed ‘stateful applications’ with Kubernetes.

Apps can be stateful or stateless.

A stateful app is a program that saves client data from the activities of one session for use in the next session — the data that is saved is called the application’s state.

The company reminds us that Kubernetes adoption is accelerating for stateless applications and microservices… and the community is beginning to evolve and mature the capabilities required for stateful applications.

Mature stateful apps?

What the company really means here is large-scale, distributed and typically complex stateful applications.

These large-scale distributed stateful applications span use cases in analytics, data science, machine learning (ML) and deep learning (DL), as well as wider AI and big data work — and the problem is that such apps are still complex and challenging to deploy with Kubernetes.

Typically, stateless applications are microservices or containerised applications that have no need for long-running data persistence and are not required to store data.

That being said, cloud native web services (such as a web server or front-end web user interface) can often be run as containerised stateless applications, since HTTP is stateless by nature: there is no dependency on local container storage for the workload.

Stateful applications, as stated above, are services that save data to storage and use that data; persistence and state are essential to running the service.
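To make the distinction concrete, here is a minimal Python sketch (our illustration, not BlueData's code) contrasting a stateless handler with a stateful one that persists a counter to disk between sessions; the file path is a hypothetical placeholder:

```python
import json
import os

STATE_FILE = "counter_state.json"  # hypothetical path, for illustration only

def stateless_handler(request: str) -> str:
    # Stateless: the response depends only on the request itself;
    # nothing survives between calls, so any replica can serve it.
    return request.upper()

def stateful_handler() -> int:
    # Stateful: the visit count is loaded from, and saved back to,
    # persistent storage -- this saved data is the application's 'state'.
    state = {"visits": 0}
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            state = json.load(f)
    state["visits"] += 1
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)
    return state["visits"]

print(stateless_handler("hello"))   # HELLO, every time
print(stateful_handler())           # 1, then 2, then 3 across runs
```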

Example uses

These mature stateful apps include databases as well as complex distributed applications for big data and AI use cases: e.g. multi-service environments for large-scale data processing, data science and machine learning that employ open source frameworks such as Hadoop, Spark, Kafka and TensorFlow, as well as a variety of different commercial tools for analytics, business intelligence, ETL and visualisation.

Kumar Sreekanti, co-founder and CEO of BlueData, explains that in enterprise deployments each of these different tools and applications needs to interoperate in a single cohesive environment for an end-to-end distributed data pipeline. Yet these mature stateful apps typically have many interdependent services and require persistent storage that can survive service restarts. They have dependencies on storage and networking, and state is distributed across multiple configuration files.

Sreekanti points out that the Kubernetes ecosystem has added building blocks such as StatefulSets – as well as open source projects including the Operator framework, Helm, Kubeflow, Airflow and others – that have begun to address some of the requirements for packaging, deploying and managing stateful applications.
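For a sense of how those building blocks surface to developers, here is a minimal sketch using the official Kubernetes Python client (a hedged illustration that assumes a reachable cluster and a local kubeconfig), listing each StatefulSet and its ready replica count:

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (e.g. ~/.kube/config).
config.load_kube_config()

apps = client.AppsV1Api()

# List every StatefulSet in the cluster and show how many of its
# persistent, ordered pods are currently ready.
for sts in apps.list_stateful_set_for_all_namespaces().items:
    print(f"{sts.metadata.namespace}/{sts.metadata.name}: "
          f"{sts.status.ready_replicas or 0}/{sts.spec.replicas} ready")
```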

But, claims BlueData, there are still gaps in the deployment patterns and tooling for complex distributed stateful applications in large-scale enterprise environments.

BlueData recently joined the Cloud Native Computing Foundation (CNCF) – the organisation behind Kubernetes and other cloud native open source projects – in order to foster collaboration in this area with developers and end users in the Kubernetes ecosystem.

KubeDirector is currently in pre-alpha and under active development.

July 13, 2018  10:52 AM

GitHub Enterprise 2.14 is ‘open goodness’ behind an enterprise firewall

Adrian Bridgwater

GitHub Enterprise 2.14 has arrived this week.

This latest version of the web-based code repository and version control system also, of course, features collaborative functions, options for bug tracking and features related to task management — it is, indeed, a portal with many wikis.

Now under the auspices, stewardship and ownership wing of Microsoft, the most prominent new feature in this version update is unified search.

This search function is intended to allow users to wade through github.com content directly from GitHub Enterprise.

In order to do so, customers must have both a GitHub Enterprise account and a GitHub Business Cloud account.

The good news

What this means, in many ways, is an option to open source ‘innovations’ from a commercial enterprise footing behind an enterprise firewall – and that (arguably) is not a bad thing, i.e. why not open a window into the community contribution model of code development for enterprise users?

Could this, even, be the beauty of Microsoft’s ownership of GitHub?

GitHub says it’s a chance to find public content and collaborate with the entire GitHub community without sacrificing security.

Upping the API

Also here is the Checks API, which helps integrators build tools for continuous integration, linting (see below) and acceptance testing on GitHub.

As noted on Stack Overflow: linting is the process of running a program that will analyse code for potential errors. Lint was the name originally given to a particular program that flagged suspicious and non-portable constructs (likely to be bugs) in C language source code.

We should also note (with regard to the Checks API) that previously, integrators could report success or failure of a build and include a link to more information using the Statuses API. With the new Checks API, they can specify more status information during builds and collect richer data, providing a more integrated experience for developers.
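As a hedged illustration of that richer reporting, a CI integrator might create a check run with a single REST call. The repository, token and commit SHA below are placeholders; note that the Checks API launched behind a preview media type:

```python
import requests

# Hypothetical repository and token, for illustration only.
REPO = "my-org/my-repo"
TOKEN = "example-token"

response = requests.post(
    f"https://api.github.com/repos/{REPO}/check-runs",
    headers={
        "Authorization": f"token {TOKEN}",
        # The Checks API shipped behind a preview media type.
        "Accept": "application/vnd.github.antiope-preview+json",
    },
    json={
        "name": "lint",
        "head_sha": "abc123",              # placeholder commit being checked
        "status": "completed",
        "conclusion": "failure",
        "output": {                        # richer data than a bare pass/fail
            "title": "3 lint errors",
            "summary": "unused import, shadowed variable, long line",
        },
    },
)
response.raise_for_status()
print(response.json()["id"])
```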

Other features

Users of the GitHub Enterprise 2.14 edition will also find multiple issue templates, an ignore-whitespace function (when reviewing code, a diff with a number of whitespace changes can distract from the changes that matter), options to insist upon multiple reviewers (as projects grow, users may want additional reviews for a team’s code changes) and a variety of admin improvements that make life easier.

A GitHub technical blog is posted here.


July 10, 2018  9:01 AM

Acquia CTO defines ‘decoupled’ Drupal

Adrian Bridgwater

Many open source enthusiasts (practitioners, paragons, partisans, preachers and protagonists) will have heard of Drupal.

For those that haven’t, Drupal is an open source content management framework, as well as an extended community of developers, maintainers and business supporters.

Acquia Inc. is a software-as-a-service firm that provides enterprise-grade hosting, support and services for Drupal.

So that’s Drupal, but what is Decoupled Drupal?

Acquia CTO and founder of Drupal, Dries Buytaert, posted a blog at the start of the year which explains how decoupled Drupal works and the ways in which businesses can use it.

Here’s a snapshot of what Buytaert is saying; you can read more in the blog linked above.

Three levels of coupling

Traditional Drupal architecture (coupled Drupal) is a monolithic implementation where Drupal maintains control over all front-end and back-end concerns. The ‘coupling’ in this sense, is Drupal’s hard-wired control relationships at both front and back.

Buytaert says that traditional coupled Drupal is ideal for conventional websites: it offers fast deployment (time to market) and alleviates a content creator’s reliance on front-end developers.

Detailing a second approach, Buytaert says progressively decoupled Drupal offers an approach that strikes a balance between editorial needs like layout management and developer desires to use more JavaScript, by interpolating a JavaScript framework into the Drupal front end.

“Progressive decoupling is in fact a spectrum, whether it is Drupal only rendering the page’s shell and populating initial data — or JavaScript only controlling explicitly delineated sections of the page. Progressively decoupled Drupal hasn’t taken the world by storm, likely because it’s a mixture of both JavaScript and PHP and doesn’t take advantage of server-side rendering via Node.js. Nonetheless, it’s an attractive approach because it makes more compromises and offers features important to both editors and developers,” wrote Buytaert.

Last but not least, Buytaert comes to fully decoupled Drupal and says that this model has gained more attention in recent years as the growth of JavaScript continues with no signs of slowing down.

“This involves a complete separation of concerns between the structure of your content and its presentation. In short, it’s like treating your web experience as just another application that needs to be served content. Even though it results in a loss of some out-of-the-box CMS functionality such as in-place editing or content preview, it’s been popular because of the freedom and control it offers front-end developers,” noted Buytaert.
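To picture fully decoupled Drupal in practice, here is a small hedged sketch in Python. It assumes a Drupal 8 site with the JSON API module enabled at its default /jsonapi path; the site address is a placeholder:

```python
import requests

# Hypothetical Drupal site, for illustration only; assumes the
# JSON API module is enabled at its default /jsonapi path.
SITE = "https://example-drupal-site.com"

resp = requests.get(f"{SITE}/jsonapi/node/article")
resp.raise_for_status()

# Drupal serves structured content; presentation is left entirely to
# the decoupled front end (a JavaScript app, a mobile client, etc.).
for article in resp.json()["data"]:
    print(article["attributes"]["title"])
```

The front end then decides how to render that content, which is exactly the separation of concerns Buytaert describes.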

Horses for courses

Buytaert says that the most critical factor in any decision to decouple Drupal is the must-have feature set desired for both editors and developers.

He asserts that in order to determine whether you should use decoupled Drupal, it’s important to isolate which features are most valuable for your editors and developers.

Unfortunately, he concludes, it’s a horses-for-courses situation and there are no black-and-white answers here; every project will have to weigh the different pros and cons.

Buytaert: Drupal provides a spectrum of architectural possibilities tuned to the diverse needs of different organisations.


July 9, 2018  7:48 AM

Indico Enso opens open route to ‘transfer learning’ AI data

Adrian Bridgwater

With its focus on AI software for unstructured content, Boston-based Indico has now come forward with a new open source project focused on simplifying the use of transfer learning with natural language.

What is transfer learning?

Transfer learning (although initially also applicable to humans) is a part of machine learning — it is the process through which knowledge gained from solving one problem can be applied to a different (often tangentially related) problem or analysis case.

For example, knowledge gained while learning to recognise dogs can be applied to the process of attempting to recognise cats… and knowledge gained while learning to recognise cars can be applied to the process of attempting to recognise trucks and lorries… and so on.
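As a toy illustration of the idea (our sketch, not Indico’s code), the following Python example trains a small neural network on one set of digit classes and then reuses its learned hidden-layer representation as features for a different, unseen set of classes:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# Source task: learn to recognise digits 0-4.
src = y < 5
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
mlp.fit(X[src], y[src])

# Transfer: reuse the frozen hidden layer as a feature extractor for
# the target task (digits 5-9), which the network never saw in training.
def hidden_features(X):
    # Recompute the first (ReLU) layer's activations by hand.
    return np.maximum(0, X @ mlp.coefs_[0] + mlp.intercepts_[0])

tgt = ~src
X_tr, X_te, y_tr, y_te = train_test_split(X[tgt], y[tgt], random_state=0)
clf = LogisticRegression(max_iter=1000).fit(hidden_features(X_tr), y_tr)
print("target-task accuracy:", clf.score(hidden_features(X_te), y_te))
```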

Back to Indico then. The company has produced Enso, an open-source library designed to streamline the benchmarking of embedding and transfer learning methods for a wide variety of natural language processing tasks.

It provides machine learning engineers and software developers with a standard interface and tools for the fair comparison of varied feature representations and target task models.

“The open source community is the driving force for innovation in machine learning, and Indico has benefitted from it and embraces the open source effort fully,” said Slater Victoroff, co-founder and CTO at Indico. “Enso is a way for us to give back to the community and continue to promote the benefits of transfer learning to accelerate its adoption and reduce the barriers to machine learning.”

To date, transfer learning has seen success in the field of computer vision and image classification.

One of the major problems associated with transfer learning is so-called ‘overfitting’ to specific datasets – that is, many of the models used for benchmarking are tied to specific datasets, making it difficult to take a model trained in one domain and retrain it in another.

The Enso project promotes the availability of more general datasets and stronger baselines to compare research against. This is said to help users ascertain where application of a given method is effective and where it is not — the end result, in theory, being a chance to accelerate the application of machine learning for more practical purposes.

Enso is compatible with Python 3.4+.


July 5, 2018  10:42 AM

Reply: open data is ‘intellectual infrastructure’

Adrian Bridgwater

Our embrace of so-called ‘open data’ is coming to the fore, but what is it, why does it matter and what role should it play in the wider development of enterprise technology infrastructures?

According to Jason Hill, Executive Partner from Reply, open data is as it sounds – open and accessible data that is available to anyone.

Hill further states that open data must be interoperable so it can be shared, adapted and reused with other datasets.

Why ask the somewhat obtusely ‘new age’ named Reply?

Because the firm specialises in consulting, system integration and digital services, with a focus on the design and implementation of solutions based on the web and social networks – hence, it touches a lot of data and an increasing amount of that information is starting to be pushed towards open data frameworks.

The Open Data Institute states that open data is only useful if it’s shared in ways that people can actually understand. It needs to be shared in a standardised format and easily traced back to where it came from.
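As a small illustration of that principle, a hedged Python sketch follows: it loads a dataset published in a standard CSV format and records the source URL so the data stays traceable back to where it came from. The portal address is hypothetical:

```python
import pandas as pd

# Hypothetical open data endpoint, for illustration only; real portals
# publish datasets in standard, machine-readable formats such as CSV.
SOURCE_URL = "https://open-data.example.org/transport/journeys.csv"

df = pd.read_csv(SOURCE_URL)

# Provenance: keep the dataset traceable back to its source.
df.attrs["source"] = SOURCE_URL
print(df.head())
```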

Intellectual infrastructure

So open data is data that is freely available to anyone to analyse, process and republish without restrictions from copyright, patents or other mechanisms of control.

According to Hill, “Open data should be considered intellectual infrastructure, as essential to the smooth running of our societies as roads and railways and critical to the insights of private enterprise. Open data lets people identify trends, fill gaps and collaborate on large scale projects. For example, open data help city planners (and citizens) analyse and interpret people movements and predict individual journeys based on factors like disrupted transport and weather patterns so they can provide intuitive and smart city solutions for their citizens. In healthcare, open data helps researchers identify large-scale trends and ‘causational’ factors to illness that could lead to disease reduction or improved treatments.”

Reply’s Jason Hill reminds us that examples of open data include interactive maps, historical weather data, flight plans, government spending and new scientific records.

CWDN readers may wish to note that the annual Reply Code Challenge is a team programming competition open to students and professional coders.



July 3, 2018  9:17 AM

The 5 API cornerstones of ADE (API Development Environment)

Adrian Bridgwater

Slightly cheesy with a dash of spin perhaps? But API (Application Programming Interface) development is now a real ‘thing’, so should we look more deeply into what API development involves? For real developers, we mean.

The ADE (API Development Environment) streamlines the development process for API developers.

ADE, a new term created by Postman, is argued to be a logical extension of the IDE (Integrated Development Environment), which provides an integrated set of components to support software development within a single user interface.

The API workflow complements and overlaps with the software development cycle.

The company says that a strong ADE will integrate with software development at multiple points to provide these benefits and more in a single environment:

  1. Testing & Debugging: ADE provides a single place to debug, create tests and scripts, and run automated tests over time.
  2. Accurate API Documentation: ADE enables devs to maintain a single source of truth for the API as it gets updated and improved over time.
  3. Collaboration & Version Control: ADE allows devs to collaborate in real time with effective access to version control.
  4. Flexibility in Specification and Design: ADE captures multiple forms of existing API specs and allows creation of an API spec.
  5. Ease of Publishing: ADE helps API publisher get their API in the hands of developers so they can onboard quickly and effectively.
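To ground the first of those cornerstones, here is a minimal hedged Python sketch of the kind of automated API test an ADE might run on every change; the endpoint and response shape are placeholders:

```python
import requests

# Hypothetical API under test, for illustration only.
BASE_URL = "https://api.example.com/v1"

def test_get_user_returns_expected_shape():
    resp = requests.get(f"{BASE_URL}/users/42")
    # Assert on status, content type and response schema -- the kind of
    # checks an ADE would re-run automatically as the API evolves.
    assert resp.status_code == 200
    assert resp.headers["Content-Type"].startswith("application/json")
    body = resp.json()
    assert {"id", "name", "email"} <= body.keys()

if __name__ == "__main__":
    test_get_user_returns_expected_shape()
    print("API contract test passed")
```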


June 27, 2018  8:40 AM

MapD gets granular on spatio-temporal data

Adrian Bridgwater

What’s better than data analytics?

Well, one answer is Marmite mixed with peanut butter and orange marmalade, obviously.

MapD Technologies thinks it can go one better.

The company’s spin doctors have unabashedly labelled its product not analytics, but Extreme Analytics™ — with a fancy trademark and everything. So can MapD substantiate and technically validate this grandeur, or is this just another case of marketingspeak?

Open source at its heart, MapD 4.0 is a software product designed to handle large-scale interactive geospatial analytics.

The technology has been built with native support for geospatial data from the start and is tightly integrated with a GPU-based rendering engine. So what is it used for?

MapD 4.0 might be applied to location intelligence use cases such as:

  • Visually uncovering the relationship between demographic data and spending patterns on a map.
  • Uncovering driver behaviour patterns from connected vehicle telemetry.
  • Gauging cellular signal strength variances in a city, down to the block [street] level.

“Organisations are dealing every day with a deluge of location enriched data, from always-on mobile devices, IoT enabled objects, connected vehicles and location-stamped transactions. Many analytics tools lack the capabilities to handle this spatio-temporal data at granular levels. This represents a massive opportunity cost for all large businesses and government agencies,” said Venkat Krishnamurthy, Veep of product management, MapD.

Krishnamurthy claims that MapD 4.0 could give geospatial analytics to everyone, from techies right ‘down’ to citizen data scientists.

Delivered in open source, cloud and enterprise editions, MapD has been applied in telecom, financial services, defense and intelligence, automotive, retail, pharmaceutical, advertising and academia.

For geospatial analysts, MapD 4.0 natively supports geometry and geographic data types such as points, lines, polygons and multipolygons, as well as key spatial operators. A newly-enhanced rendering engine means users can now query and visualise up to millions of polygons and billions of points.
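For a flavour of how such a query might look from Python, here is a hedged sketch using the pymapd client; the connection details, table and column names are all placeholders:

```python
from pymapd import connect

# Hypothetical connection details, for illustration only.
con = connect(user="mapd", password="secret",
              host="localhost", dbname="mapd")

# A hypothetical 'trips' table with a POINT column 'pickup': find trips
# whose pickup location falls inside a given polygon, using the kind of
# spatial operator MapD 4.0 supports natively.
cursor = con.execute("""
    SELECT trip_id
    FROM trips
    WHERE ST_Contains(
        'POLYGON((-74.02 40.70, -73.93 40.70, -73.93 40.80,
                  -74.02 40.80, -74.02 40.70))',
        pickup)
    LIMIT 10
""")
for (trip_id,) in cursor:
    print(trip_id)
```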

According to Krishnamurthy, “MapD 4.0 helps users ask questions and explore trends that were once too large or difficult to answer. Computation-heavy challenges are now possible at extreme speed, such as identifying two cargo trucks in one area, moving in the same direction and at the same time, while calculating their speed. Similarly, for retail, city planning or marketing purposes, users can create or select a customized geographic area anywhere in the world and instantly view demographic information in that area.”

In addition to its expanded polygon and rendering engine improvements, MapD 4.0 offers a number of improvements for enterprise-readiness that make it easier to support machine learning, access management and collaboration.

Showboating and marketingspeak?

Not so much: this is complex stuff, yes… extreme, even – and it appears to be well packaged and presented for a fascinating new use case stream.


June 26, 2018  7:19 AM

Pusher: treat developers ‘as customers’

Adrian Bridgwater

Pusher is a developer tools company that makes communication and collaboration APIs for web and mobile applications.

The company’s core product is called Channels; developers use it to create features such as in-app notifications, activity streams, chat, real-time dashboards and multi-user collaborative apps.
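For a flavour of how Channels is driven from the server side, here is a minimal sketch using Pusher’s Python SDK; the credentials below are placeholders:

```python
import pusher

# Placeholder credentials, for illustration only.
pusher_client = pusher.Pusher(
    app_id="123456",
    key="app-key",
    secret="app-secret",
    cluster="eu",
)

# Publish an event to a channel; every subscribed web or mobile
# client receives it in real time (e.g. an in-app notification).
pusher_client.trigger("notifications", "new-message",
                      {"body": "Build finished successfully"})
```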

News that almost passed us by last month saw Pusher release its new Pusher Developer Package, a new route offering access to 13 key services.

Pusher is joined in this project by service suppliers including Algolia, Auth0, ButterCMS, Chargebee, Cloudflare, Codeship, DataDog, DigitalOcean, Instabug, MongoDB, Mux.com, Nexmo and SendGrid.

Developers can use the key APIs for apps, from hosting, real-time communications, data storage and authentication to email, data measurement and search.

“We are happy to partner with Pusher on this project to help developers build apps. We care about saving them time and helping them overcome the challenges of reliably delivering emails as their products grow,” said Paul Ford, VP of community development, SendGrid.

Zan Markan, developer evangelist at Pusher, says that Pusher was at the forefront of the PaaS revolution that focused on developers as customers.

Pusher claims to currently have over 250,000 developer customers across 170 countries.


June 25, 2018  8:04 AM

SUSE CaaS Platform 3: ein fabelhaftes Container-Paket (a fabulous container package)

Adrian Bridgwater

Software- und System-Entwicklung (or SUSE, to you and me) has been active this month and opened the box on SUSE CaaS Platform 3.

SUSE CaaS is the acronym for the open source operating system company’s Container-as-a-Service technology.

The firm presents SUSE CaaS Platform as a route to ‘packaged’ Kubernetes in an enterprise-class container management solution.

Originally designed by Google, Kubernetes is an open source system for automating the deployment and scaling of containerised applications and managing clusters of containers.

SUSE is focused on delivering an exceptional operator experience with this technology… and the ‘operator’ here is, typically, a member of a DevOps team working to deploy, manage and scale container-based applications and services.

Deep competencies

How does SUSE pull this off?

Because, says Gerald (no relation to Michelle) Pfeifer, vice president of products and technology programs at SUSE, the firm has ‘deep competencies’ in infrastructure, systems, process integration, platform security, lifecycle management and enterprise-grade support.

“Properly deployed, Kubernetes leads to agility. But before application teams can use Kubernetes, the platform itself needs to be in place. It needs to be secure, well controlled and maintained. SUSE CaaS Platform helps enterprises provide and consume Kubernetes more easily with a complete solution that is designed with the platform operator, as well as the platform user, in mind,” asserted Pfeifer.

SUSE CaaS Platform 3 is said to expand choices for cluster optimisation and provide new support for more efficient and secure container image management… it also simplifies deployment and management of long-running workloads.

Cluster-luck

The product works to optimise cluster configuration with expanded data centre integration and cluster reconfiguration options.

“Setting up a Kubernetes environment is simplified with improved integration of private and public cloud storage and automatic deployment of the Kubernetes software load balancer. A new SUSE toolchain module also allows customers to tune the MicroOS container operating system to support custom configuration. With the new cluster reconfiguration capabilities, they can transform a start-up cluster into a scalable and highly available environment,” said the company, in a product statement.

SUSE also tells us that users will be able to manage container images more efficiently and securely with a local container registry.

Customers can download a container image from an external registry once, then save a copy in their local registry for sharing among all nodes in the cluster.
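In practice, that pull-once, share-locally workflow might look something like this hedged Python sketch using the Docker SDK; the registry address and image are placeholders:

```python
import docker

# Hypothetical local registry and image, for illustration only.
LOCAL_REGISTRY = "registry.cluster.local:5000"

client = docker.from_env()

# Pull once from the external registry...
image = client.images.pull("nginx:1.15")

# ...then tag and push a copy into the local registry, where every
# node in the cluster can fetch it without going outside again.
image.tag(f"{LOCAL_REGISTRY}/nginx", tag="1.15")
client.images.push(f"{LOCAL_REGISTRY}/nginx", tag="1.15")
print(f"mirrored nginx:1.15 to {LOCAL_REGISTRY}")
```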


June 19, 2018  9:51 AM

Kafkaesque: Instaclustr creates Kafka-as-a-Service

Adrian Bridgwater

Instaclustr has announced Kafka-as-a-Service in a bid to provide an easier route to the real-time data streaming platform.

An open source player from the start, the e-dropping Instaclustr specifies that this release follows an ‘early access programme’ that saw a handful of Instaclustr users deploy the Kafka-as-a-Service solution to manage high volume data streams in real time.

Apache Kafka is an open source project providing distributed processing of continuous data streams.
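To show the basic model, here is a minimal hedged sketch using the kafka-python client; the broker address and topic are placeholders:

```python
from kafka import KafkaConsumer, KafkaProducer

BROKER = "localhost:9092"   # placeholder broker address
TOPIC = "sensor-readings"   # placeholder topic

# Producer: append messages to a distributed, durable log.
producer = KafkaProducer(bootstrap_servers=BROKER)
producer.send(TOPIC, b'{"sensor": "a1", "temp": 21.5}')
producer.flush()

# Consumer: read the stream continuously from the beginning.
consumer = KafkaConsumer(TOPIC, bootstrap_servers=BROKER,
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000)
for message in consumer:
    print(message.value)
```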

The managed Kafka offering follows provisioning and management patterns used to deliver other open source technologies provided through the Instaclustr platform – including Apache Cassandra, Apache Spark, Apache Lucene and Elassandra.

The Kafka platform itself is also backed by data technologies designed to deliver scalability, high performance and uninterrupted availability.

“We believe providing Kafka-as-a-Service will be uniquely beneficial to enterprises looking to take advantage of this powerful data streaming technology,” said Peter Nichol, CEO, Instaclustr. “Our expertise with Kafka combined with our 20 million node hours under management make Instaclustr the most trusted and experienced Kafka service provider in the market. We’re excited to invite all customers and interested organizations to get the most out of what Kafka has to offer with our newest managed service.”

Additionally, Instaclustr provides customers with a SOC2-certified Kafka managed service for data management and client privacy.

Kafka can be run as a standalone managed service or integrated with the other open source data management technologies that Instaclustr provides. It is available within a choice of cloud environments, including AWS, Microsoft Azure, Google Cloud Platform and IBM Cloud.

