What’s being ‘pushed to’ open source this last weekend or so?
Answer: quite a lot.
There’s a whole smörgåsbord of openness out there… but one project emanating from Redmond stands out for sure.
Microsoft’s Open Network Emulator (ONE) project is intended to simulate the entire Azure network infrastructure as a means of finding bugs, glitches and other nasties that end up contributing to network outages.
The project’s replication technology now moves to official open source status, according to Microsoft Research distinguished scientist Victor Bahl.
Why is this a ‘stand out’ piece of work?
The answer, for most people, arguably, would be that emulating a cloud network (and emulating something that encompasses huge chunks of the Azure cloud platform) is a massive task and a very difficult thing to pull off.
Bahl has said that his team is trying to build the most reliable, the most accessible, the most secure, fastest network on the planet.
“We have decided that this is such an important resource for everybody that just hoarding it [for ourselves] is not the right thing to do. So, we are making it available to the entire community so that they can now – and it’s not just for production systems, but also for students that are now graduating. Because it emulates the network so well, they can actually do some amazing research without having major datacenters available to them,” said Bahl.
As Bahl stresses, it’s been very hard [even for Microsoft] to emulate cloud-scale networks.
Why? Because they’re large, obviously.
But also because there’s a huge amount of work needed to follow through on network verification… because network verification ties into availability.
“So, let’s say everything is working perfectly well. Barring hardware failure, everything should be fine. But then somebody, who is part of your team, goes and changes something somewhere. [This can potentially] bring down an entire region, because, you know, if you break the network, your packets are going nowhere. They’re not going to the right places. Let’s put it that way. And that’s a complete no-no for businesses,” said Bahl.
A Microsoft Research interview with Dr Victor Bahl and accompanying half hour podcast is available here.
New England headquartered application development company Progress is flexing its programmer credentials this month.
The Massachusetts-HQ’d firm has now come forward with its Progress Spark Toolkit… but what is it?
The Progress Spark Toolkit is a set of open source ABL code combined with some recommended best-practices.
What is ABL?
OpenEdge Advanced Business Language (OpenEdge ABL) is a business application development language created and maintained by Progress. It boasts ‘English-like’ syntax and has ingrained connectivity to (and programmability for) relational databases… and as such, it is typically regarded to be a Rapid Application Development (RAD) tool.
Previously only available from Progress Services, the Spark Toolkit was created in collaboration with the Progress Common Component Specification (CCS) project.
What is the CCS project?
This is a Progress-specific programme that brings together a group of Progress OpenEdge customers and partners to define a standard set of specifications for the common components for building modern business applications.
Progress OpenEdge itself is an application development environment with various components including a development language (the aforementioned ABL), the Kendo UI user interface building tools and a selection of other ‘companion solutions’ that focus on tasks such as data replication and data management… plus a whole set of application layer technologies including an application server, business process management and business rules management.
By engaging the community in its CCS project, Progress claims to be able to bring forward best practices for the development of these standards-based components and tools for interoperability, flexibility and so on.
John Ainsworth, senior vice president of core products at Progress notes that there are currently 10 components available within the Spark Toolkit. The first three are required components focused on starting up and bootstrapping sessions, business services and an authentication component.
“The remaining components focus on everything from connecting to a service, to logging and catalog management, tagging and mapping and more,” said Ainsworth.
The toolkit’s compliance with the CCS means that customers avoid component lock-in risk and can choose from a variety of vendors that implement to the standard. It is compatible with the latest version of OpenEdge 11.7 and is available under Apache License 2.0.
More components are expected to be added in the future.
Cloud giant Amazon has announced the open source release of its Alexa Auto SDK (Software Development Kit).
The news comes from the company’s Amazon Voice Services (AVS) division.
The SDK is provided for software application developers working with automobile manufacturers, enabling them to build Alexa-controlled intelligent assistant applications (and related smaller functions) into software systems deployed inside working cars.
As detailed on TechTarget, AVS and Alexa were first introduced with Echo, the company’s intelligent speaker, which enables voice interaction with various systems.
Alexa’s main competitors are Google Assistant, Apple Siri and Microsoft Cortana.
According to Amazon AVS, the Alexa Voice Service (AVS) enables [developers] to access cloud-based Alexa capabilities with the support of AVS APIs, hardware kits, software tools and documentation.
“We simplify building ‘voice-forward’ products by handling complex speech recognition and natural language understanding in the cloud, reducing your development costs and accelerating your time to market,” notes Amazon AVS.
What’s inside the SDK?
In terms of components and functionality, the SDK provides a runtime engine to enable data communication interactions with Alexa itself.
The SDK will also provide the necessary interfaces that developers will need to work with audio input controls and media playback… but it is also intended to power the creation of voice-driven (let’s use Amazon’s term and say voice-forward) app functions, which could include other automated operations inside a car such as air conditioning, electric windows and other ‘custom skills’.
According to Amazon, “The Alexa Auto SDK includes core Alexa functionality, such as speech recognition and synthesis, and other capabilities such as streaming media, controlling smart home devices, notifications, weather reports and tens-of-thousands of custom skills.”
Additionally, the SDK provides the hooks required to connect to a wake word engine, local media player, local phone and local navigation system.
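To make the idea of voice-forward in-car functions more concrete, here is a minimal sketch of an intent-dispatch pattern; the class, intent and handler names are hypothetical illustrations, not the Alexa Auto SDK’s actual API (which is not Python-based).

```python
# Hypothetical sketch only -- not the real Alexa Auto SDK API.

class VehicleCommandRouter:
    """Routes recognised voice intents to in-car functions."""

    def __init__(self):
        self._handlers = {}

    def register(self, intent, handler):
        # e.g. intent "SetTemperature" -> a climate-control callback
        self._handlers[intent] = handler

    def dispatch(self, intent, slots):
        # Look up the handler for a recognised intent and invoke it
        handler = self._handlers.get(intent)
        if handler is None:
            return "Sorry, I can't do that in this car."
        return handler(**slots)

router = VehicleCommandRouter()
router.register("SetTemperature", lambda degrees: f"Climate set to {degrees} degrees")
router.register("OpenWindow", lambda side: f"Opening {side} window")

print(router.dispatch("SetTemperature", {"degrees": 21}))  # Climate set to 21 degrees
```

The pattern simply maps a recognised intent name to a callback; a real automotive integration would route to vehicle control systems rather than returning strings.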
The SDK is available on GitHub.
Open source log file analytics specialist InfluxData is insistent that we should take a ‘metrics first’ approach to log analysis.
The company says it believes in a metrics first approach that provides developers with the means to ingest, correlate and visualise all time series data at three levels:
Data level one: data relating to technology infrastructure metrics including applications, databases, systems, containers etc.
Data level two: data from business metrics including profit and loss and all the normal economic business monitors.
Data level three: log events… a log, in a computing context, is the automatically produced and time-stamped documentation of events relevant to a particular system; virtually all software applications and systems produce log files.
InfluxData’s technology is focused on the visualisation and analysis of structured application and system events captured via log files. By correlating business metrics to server and application metrics with structured logs, InfluxData claims to be able to provide more precise problem investigation and root-cause analysis capabilities.
The firm’s most recent software release expands functionality with new support for high-speed parsing and ingestion using the syslog protocol, custom log parsing and pre-built log visualisation components.
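As a rough illustration of what structured ingestion via the syslog protocol involves, here is a minimal sketch of parsing an RFC 5424 syslog line into fields; the field names follow the RFC, but the parser itself is illustrative and not InfluxData’s code.

```python
import re

# Minimal RFC 5424 syslog parser sketch (illustrative, not InfluxData code).
SYSLOG_5424 = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>\d) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<appname>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<sd>(-|\[.*?\])) ?(?P<msg>.*)"
)

def parse_syslog(line):
    """Split a syslog line into the metadata fields named by RFC 5424."""
    m = SYSLOG_5424.match(line)
    if not m:
        return None
    fields = m.groupdict()
    # The PRI value encodes facility and severity: pri = facility * 8 + severity
    pri = int(fields["pri"])
    fields["facility"], fields["severity"] = divmod(pri, 8)
    return fields

sample = ("<165>1 2018-07-30T22:14:15.003Z web01.example.com nginx "
          "1234 - - upstream timed out while reading response")
parsed = parse_syslog(sample)
```

It is exactly this kind of per-field metadata (host, application, severity) that a metrics-first approach correlates with infrastructure and business metrics.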
InfluxData founder and CTO Paul Dix says that each log message represents an event in time and the same metadata that accompanies metrics can be used to pinpoint the valuable contextual information contained within those files.
“By starting with metrics and their associated metadata, operators and developers can understand where and how to interrogate the large volumes of event data contained within logs without performing expensive search queries. This reduces much of the guesswork and prior knowledge required to sift through log data that is typically present when using logs as the initial and primary source of anomaly detection,” said Dix.
InfluxData says that its platform lets users capture metadata at the collection point, allowing the developer to map elements across systems and supplement additional information when and where required, providing consistency and richness to the logs transmitted via the syslog protocol.
It provides an improved workflow for log visualisation within the same environment where developers have constructed metrics dashboards, which allows a developer to analyse the captured log events for a specific time interval and narrow data down by the important metadata elements, such as host, application, subsystem etc.
The Linux Foundation Deep Learning Foundation (LF DLF) has announced five new members: Ciena, DiDi, Intel, Orange and Red Hat.
As an umbrella organization of The Linux Foundation itself, the LF DLF supports and sustains open source innovation in Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL).
What is Deep Learning?
Deep Learning is defined as an aspect of AI that is concerned with emulating the learning approach that human beings use to gain certain types of knowledge. It can be thought of as a way to automate predictive analytics and is also sometimes known as deep structured learning or hierarchical learning.
Deep Learning concerns ‘learning data representations’ as opposed to ‘task-specific algorithms’.
It can be supervised, semi-supervised or unsupervised and can be used to build architectures such as deep neural networks, deep belief networks and recurrent neural networks that have been used in fields including computer vision and speech recognition etc.
AI model discovery
The Linux Foundation says that these new members will provide additional resources to the community to develop and expand open source AI, ML and DL projects, such as the Acumos AI Project, the foundation’s platform for AI model discovery, development and sharing.
These companies join founding members Amdocs, AT&T, B.Yond, Baidu, Huawei, Nokia, Tech Mahindra, Tencent, Univa and ZTE.
Chief operating officer of The Linux Foundation Lisbeth McNabb makes it clear that the LF Deep Learning Foundation is a neutral space for harmonisation and acceleration of separate technical projects focused on AI, ML and DL technologies.
“Deep learning has the potential to change everything about how we learn from data,” said Chris Wright, VP & CTO at Red Hat. “Open source communities are at the heart of advancing deep learning frameworks and we’re excited to see further collaboration with the LF Deep Learning Foundation around model discovery, development and lifecycles… and bringing open source software development best practices to deep learning models.”
Good developments in Deep Learning then… almost good enough to help us calculate the answer to life, the universe and everything?
No need to bother, we already know that the answer is still 42, right?
Instaclustr is known for its focus on providing the Cassandra database as a managed service in the cloud — the company is equally known for its work providing Apache Kafka, Apache Spark and Elasticsearch.
Instaclustr’s Slater writes on the subject of improving Apache Kafka management. Apache Kafka is an open source ‘stream processing’ software platform developed by the Apache Software Foundation and written in Scala and Java… it handles trillions of events every day.
Stream processing is useful in areas such as massive multiplayer online gaming and other more enterprise level forms of extreme data processing connectivity such as trading and Internet of Things device log file processing.
Slater writes as follows…
Like the works of the famed novelist whose name it bears, Apache Kafka is fairly easy to get started with – but understanding its deeper nuances and capturing all that it has to offer can be a heck of a challenge.
Here are nine tips for making sure your Kafka deployment remains simple to manage and fully optimized:
1) Configure logs to keep them from getting out of hand.
log.segment.bytes, log.segment.ms and log.cleanup.policy (or the topic-level equivalent) are the parameters that allow you to control log behaviour. For example, if you have no need for past logs you can set cleanup.policy to “delete”, so that Kafka eliminates log files after a set time period or once they reach a pre-determined file size. Alternatively, you can set the policy to “compact” to retain logs, tailoring the parameters to fit your use case as needed.
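As a hedged illustration, broker-level settings along these lines might look as follows; the values are examples only and should be tuned per workload:

```properties
# server.properties -- illustrative values, tune for your workload.
# The same keys exist per-topic without the "log." prefix,
# e.g. segment.bytes, segment.ms, cleanup.policy.
log.segment.bytes=1073741824     # roll to a new segment at 1 GiB
log.segment.ms=604800000         # ...or after seven days, whichever comes first
log.cleanup.policy=delete        # discard old segments rather than compacting
log.retention.hours=168          # with "delete", keep closed segments for a week
```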
2) Understand Kafka’s hardware needs.
Because Kafka is designed for horizontal scaling and doesn’t require a great deal of resources, you can run successful deployments while using affordable commodity hardware. Here’s a breakdown:
- Kafka doesn’t require a powerful CPU, except when SSL and log compression are needed.
- 6 GB of RAM, used for heap space, allows Kafka to run optimally in most use cases. More is often helpful to assist with disk caching.
- When it comes to the disk, non-SSD drives are often suitable due to Kafka’s typical sequential access pattern.
3) Make the most of Apache ZooKeeper.
Be sure to cap the number of Apache ZooKeeper nodes at five or fewer. ZooKeeper also pairs well with strong network bandwidth. In pursuing minimal latency, use optimal disks with logs stored elsewhere, isolate the ZooKeeper process with swap disabled, and monitor latency closely.
4) Be smart in establishing replication & redundancy.
Kafka’s resilience depends on your wise pursuit of redundancy and reliability in the face of disaster. For example, Kafka’s default replication factor of two should be increased to three in most production deployments.
5) Be careful with topic configurations.
Set topic configurations properly in the first place, and create a new topic if changes do become necessary.
6) Take advantage of parallel processing.
More partitions mean greater parallelisation and throughput, but also extra replication latency, rebalances, and open server files. Safely estimated, a single partition on a single topic can deliver 10 MB/s (the reality is more favourable); using this baseline you can determine the targeted total throughput for your system.
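The arithmetic behind that baseline can be sketched in a few lines; the 250 MB/s target below is a made-up example:

```python
import math

# Back-of-the-envelope partition sizing from the conservative
# 10 MB/s-per-partition baseline mentioned above.
PER_PARTITION_MB_S = 10

def partitions_needed(target_mb_s):
    """Minimum partition count to hit a target aggregate throughput."""
    return math.ceil(target_mb_s / PER_PARTITION_MB_S)

print(partitions_needed(250))  # hypothetical 250 MB/s target -> 25 partitions
```

Because the 10 MB/s figure is deliberately conservative, the result is a safe lower bound rather than an exact requirement.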
7) Secure Kafka through proper configuration & isolation.
The 0.9 release of Kafka added an array of useful security features, including support for authentication both between Kafka and clients and between Kafka and ZooKeeper. Kafka also added support for TLS, which is a key security precaution for systems with clients directly connecting from the public internet.
8) Set a high Ulimit to avoid outages.
Setting your ulimit configuration is pretty straightforward: raise the system-wide maximum for open files if needed (fs.file-max in /etc/sysctl.conf) and set a hard per-process limit of 128,000 or higher for the broker user (in /etc/security/limits.conf on most Linux distributions), then restart. Doing so avoids the all-too-common scenario of experiencing what looks like a load issue with brokers going down, but is actually a simple “too many open files” error.
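A minimal sketch of the per-process limit, assuming the broker runs as a user named kafka:

```
# /etc/security/limits.conf -- open-file limits for the broker user
# (assumed here to be 'kafka'); pair with fs.file-max in /etc/sysctl.conf
kafka  soft  nofile  128000
kafka  hard  nofile  128000
```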
9) Utilise effective monitoring & alerts.
Kafka’s two key areas to monitor are 1) system metrics and 2) JVM stats. Monitoring system metrics means tracking open file handles, network throughput, load, memory, disk usage, and more. For JVM stats, be aware of any GC pauses and heap usage. Informative history tools and dashboards for swift debugging are your friends here.
If you were about to start a company dedicated to providing software application development engineers with mapping and location-based APIs and SDKs, then what would you call it?
MapTech perhaps, or Locationsoft… perhaps even GeoMapLocaNavTech?
Here Technologies went for the arguably more honest, Ronseal-style ‘does what it says on the tin’ approach and called itself Here.
Actually it didn’t quite: the company cheekily tried (like so many others) to spell its non-acronym name all in capital letters. You might think, in branding terms, that that’s quite a clever way of getting your corporate label listed more prominently in the press, shown in full caps. But it isn’t.
Here works in mapping and location platform services and has now introduced a freemium option for developers to build applications using the company’s location software, which is said to be ‘enterprise-grade’ in form and function.
What Here gives developers is access to geo-location-related data and platform services including: maps, geocoding, geofencing, places and intermodal routing, as well as features such as turn-by-turn navigation and custom route, waypoint and fleet APIs with the simple pricing plan described below.
To define a couple of those terms then:
Geocoding is the computational process of transforming a physical address description to a location on the Earth’s surface as spatial representation in numerical coordinates. Reverse geocoding, on the other hand, converts geographic coordinates to a description of a location, usually the name of a place or an addressable location.
Geofencing is the use of GPS or RFID technology to create a virtual geographic boundary, enabling software to trigger a response when a mobile device enters or leaves a particular area.
Intermodal routing is a service that allows developers to provide routes combining three different routing modes for car, pedestrian and public transit; as well as access to parking information within the proximity of transit stops.
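A minimal sketch of the geofencing idea described above, using a circular fence and the haversine great-circle distance; the coordinates and 500 m radius are arbitrary examples, not anything drawn from Here’s APIs:

```python
import math

EARTH_RADIUS_M = 6_371_000

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def inside_geofence(lat, lon, centre_lat, centre_lon, radius_m):
    """True if the point lies within the circular fence."""
    return haversine_m(lat, lon, centre_lat, centre_lon) <= radius_m

# A 500 m fence around a point in central London (example values)
fence = (51.5074, -0.1278, 500)
print(inside_geofence(51.5080, -0.1280, *fence))  # nearby point -> True
print(inside_geofence(51.5500, -0.1278, *fence))  # several km away -> False
```

A real geofencing service would additionally track state transitions (entering versus leaving the fence) to trigger notifications, rather than just testing membership.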
“Location-awareness is the foundation for our digitally connected world,” claims the fabulously named Edzard Overbeek in his role of CEO of Here Technologies.
Overbeek details the freemium plans saying that it provides free access to Here software to create public, private, paid and free applications and websites with a limit of 250,000 platform transactions, 5,000 SDK active users and 250 managed assets per month.
There are extended pay-as-you-grow options that go beyond its freemium offering linked here.
Elastic is joining forces with Insight.io, a Palo Alto-based startup developing search tools that claim to provide a ‘semantic understanding’ of software source code.
Known for its Elasticsearch and Elastic Stack products, Elastic insists that Insight.io’s technology is ‘highly complementary’ to other Elastic use cases and solutions — indeed, Insight.io is built on the Elastic Stack.
Insight.io provides an interface to search and navigate source code that is said to ‘go beyond’ simple free text search.
Current programming language support includes C/C++, Java, Scala, Ruby, Python and PHP.
This ‘beyond text search’ function gives developers the ability to search for code pertaining to specific application functionality and dependencies.
Essentially it provides IDE-like code intelligence features such as cross-reference, class hierarchy and semantic understanding.
The impact of such functionality should stretch beyond exploratory question-and-answer utility, for example, enabling more efficient onboarding for new team members and reducing duplication of work for existing teams as they scale.
Elastic founder and chief executive officer Shay Banon explains that initial technical integration of Insight.io technology will follow a similar path to other recent Elastic acquisitions such as Opbeat and Prelert, with a focus on creating a scalable single purpose server for the new code search functionality.
“Insight.io’s IDE-like user interface will be released as an official Kibana app, with the full solution then included into the standard distribution of the Elastic Stack,” said Banon.
On the operational side, Elastic will be welcoming all Insight.io’s employees into its development team.
“We founded Insight.io to create a tool that would enable code development efficiency and insight for the millions of developers building new applications,” said Chongzhe Li, co-founder of Insight.io. “A few years ago, we decided to build our product on top of the Elastic Stack because it allowed us to build the best product for our users.”
Co-founded by Fuyao Zhao, Chongzhe Li and Mengwei Ding and based in Palo Alto, CA, Insight.io also has an engineering team based in Beijing, giving Elastic its first formal development team in China.
Facebook, Twitter, Google and Microsoft have joined an open source initiative designed to help users transfer data across multiple online platform services without facing privacy issues.
Established back in 2017, the newly expanded Data Transfer Project (DTP) is open source at its heart.
The Data Transfer Project (DTP) describes itself as a collaboration of organisations committed to building a common framework with open source code that can connect any two online service providers, enabling a seamless, direct, user initiated portability of data between the two platforms.
“Using [your own user] data [that exists in] one service when you sign up for another still isn’t as easy as it should be,” said Steve Satterfield, privacy and public policy director at Facebook.
Satterfield provided an example: a user might use an app to share photos publicly, a social networking app for updates with friends and a fitness app… but, he contends, the connection between those apps and the platforms those apps run on is, today, far from seamless, or indeed secure.
The Data Transfer Project uses services’ existing APIs (i.e. the Application Programming Interfaces belonging to those services in the first place) and authorisation mechanisms to access data.
The project’s software framework then uses service specific adapters to transfer that data into a common format, and then back into the new service’s API.
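That export-to-common-format-to-import flow can be sketched as follows; the adapter class names and the dict-based ‘common format’ are hypothetical illustrations, not the DTP framework’s actual (Java-based) API:

```python
# Hypothetical adapter sketch -- not the real Data Transfer Project code.

class PhotoExportAdapter:
    """Pulls records from a source service's API into a common format."""
    def __init__(self, fetch):
        self._fetch = fetch  # callable standing in for the source service API

    def export(self):
        # Translate service-specific fields into the shared schema
        return [{"title": p["name"], "url": p["link"]} for p in self._fetch()]

class PhotoImportAdapter:
    """Pushes common-format records into the destination service's API."""
    def __init__(self, upload):
        self._upload = upload  # callable standing in for the destination API

    def import_all(self, records):
        for r in records:
            self._upload(title=r["title"], url=r["url"])

# Simulated transfer between two in-memory 'services'
source = [{"name": "holiday", "link": "https://example.com/1.jpg"}]
dest = []
PhotoImportAdapter(lambda **kw: dest.append(kw)).import_all(
    PhotoExportAdapter(lambda: source).export())
```

The key design point is that neither service needs to know about the other: each side only translates between its own API and the shared schema.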
Craig Shank, VP corporate standards at Microsoft, has said that for people on slow or low bandwidth connections, service-to-service portability will be especially important where infrastructure constraints and expense make importing and exporting data to or from the user’s system impractical, if not nearly impossible.
“We encourage others in the industry to join us in advancing a broader view of the data portability ecosystem. This project launch is a starting point for that effort, and we look forward to working with our current and future partners to iterate on designs, improve the ways we serve our customers, and ensure people can benefit from the innovation and diversity of user choice that can be driven through greater portability,” said Shank.
BlueK8s is a new open source Kubernetes initiative from ‘big data workloads’ company BlueData — the project’s direction tells us a little about where containerised cloud-centric applications are heading.
Kubernetes is a portable and extensible open source platform for managing containerised workloads and services (essentially it is a container ‘orchestration’ system) that facilitates both declarative configuration and automation.
The first open project in the BlueK8s initiative is Kubernetes Director (aka KubeDirector), for deploying and managing distributed ‘stateful applications’ with Kubernetes.
Apps can be stateful or stateless.
A stateful app is a program that saves client data from the activities of one session for use in the next session — the data that is saved is called the application’s state.
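The distinction can be shown in a few lines of illustrative Python (nothing here is BlueData code): the stateless function derives its output purely from its input, while the stateful class persists a counter between ‘sessions’, with a JSON file standing in for durable storage.

```python
import json
import os
import tempfile

def stateless_greet(name):
    # Stateless: same input always yields the same output, nothing is saved
    return f"Hello, {name}!"

class VisitCounter:
    """Stateful: each 'session' loads and updates persisted state."""
    def __init__(self, path):
        self._path = path

    def record_visit(self, user):
        state = {}
        if os.path.exists(self._path):
            with open(self._path) as f:
                state = json.load(f)  # recover state from the last session
        state[user] = state.get(user, 0) + 1
        with open(self._path, "w") as f:
            json.dump(state, f)       # persist state for the next session
        return state[user]

path = os.path.join(tempfile.mkdtemp(), "state.json")
counter = VisitCounter(path)
counter.record_visit("alice")          # first session
print(counter.record_visit("alice"))   # next session sees saved state -> 2
```

In Kubernetes terms, the stateless function can be killed and rescheduled anywhere, while the stateful class needs its storage (the file above) to survive restarts, which is exactly the harder problem KubeDirector targets.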
The company reminds us that Kubernetes adoption is accelerating for stateless applications and microservices… and the community is beginning to evolve and mature the capabilities required for stateful applications.
Mature stateful apps?
What the company really means here are large-scale, distributed and typically complex stateful applications.
These span use cases in analytics, data science, machine learning (ML) and deep learning (DL)… plus AI and big data more broadly – and the problem is that such apps are still complex and challenging to deploy with Kubernetes.
Typically, stateless applications are microservices or containerised applications that have no need for long-running [data] persistence and aren’t required to store data.
But, that being said, cloud native web services (such as a web server or front end web user interface) can often be run as containerised stateless applications since HTTP is stateless by nature: there is no dependency on the local container storage for the workload.
Stateful applications, as stated above, are services that save data to storage and use that data; persistence and state are essential to running the service.
These mature stateful apps include databases as well as complex distributed applications for big data and AI use cases: e.g. multi-service environments for large-scale data processing, data science and machine learning that employ open source frameworks such as Hadoop, Spark, Kafka, and TensorFlow as well as a variety of different commercial tools for analytics, business intelligence, ETL and visualization.
Kumar Sreekanti, co-founder and CEO of BlueData, explains that in enterprise deployments, each of these different tools and applications needs to interoperate in a single cohesive environment for an end-to-end distributed data pipeline. Yet they [mature stateful apps, that is] typically have many interdependent services and require persistent storage that can survive service restarts. They have dependencies on storage and networking, and state is distributed across multiple configuration files.
Sreekanti points out that the Kubernetes ecosystem has added building blocks such as StatefulSets – as well as open source projects including the Operator framework, Helm, Kubeflow, Airflow, and others – that have begun to address some of the requirements for packaging, deploying, and managing stateful applications.
But, claims BlueData, there are still gaps in the deployment patterns and tooling for complex distributed stateful applications in large-scale enterprise environments.
BlueData recently joined the Cloud Native Computing Foundation (CNCF) – the organisation behind Kubernetes and other cloud native open source projects – in order to foster collaboration in this area with developers and end users in the Kubernetes ecosystem.
KubeDirector is currently in pre-alpha and under active development.