Much as we would like to be, the Computer Weekly Developer Network and Open Source Insider team can’t be everywhere at once, and this week that means we’re missing MongoDB World 2019 in New York.
In light of this absence, we have been to the local deli to stock up on bagels and beef pastrami with dill pickles, and have knocked on our neighbour’s door to shout: “ba-da-bing, fo’get-about-it.”
So then, stereotyped exaggeration out of the way, what did open source purists at MongoDB get up to in the Big Apple this week?
One of the major pieces of news saw the company unveiling its product vision for Realm, a mobile database and synchronisation platform company it acquired in May of 2019, which will now merge with the serverless platform MongoDB Stitch.
Realm’s synchronisation protocol will connect with the MongoDB Atlas global cloud database on the backend, making Realm Sync a way for developers to connect data to the devices running their applications.
“As a part of MongoDB, Realm will become the default database for mobile developers and the easiest way to build real-time data applications in the browser and in iOS and Android devices,” claimed the company, grandly, in a press statement.
Beyond the database
The company also announced several new cloud services and features.
The event saw the introduction of Data Lake, Full Text Search and MongoDB Atlas Auto Scaling as well as the general availability of MongoDB Charts, all of which are intended to drive a competitive edge with data in multiple stages of the data layer.
IDC predicts that by 2025 global data will reach 175 Zettabytes and 49% of it will reside in the public cloud. Yet the complexity of Hadoop and the rigidity of traditional data warehouses are making it increasingly difficult and expensive to get clear value from rich, modern data in the cloud.
MongoDB Search gives developers and users the option to filter, rank and sort through their data to surface the most relevant results.
However, to gain access to rich search functionality, many organisations pair their database with a search engine such as Elasticsearch or Solr, which MongoDB claims can complicate development and operations — because we end up with two entirely separate systems to learn, maintain and scale.
“Atlas Full Text Search provides rich text search capabilities based on Apache Lucene 8 against fully managed MongoDB databases with no additional infrastructure or systems to manage. Once indexes have been created using either the Atlas UI or API, developers can run sophisticated search queries using MQL, saving significant time and energy,” noted MongoDB, in its product announcement specifications.
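As a rough illustration of what “search queries using MQL” might look like, the sketch below builds an aggregation pipeline as plain Python data. The `$search` stage and `text` operator follow the syntax Atlas documents today (the 2019 beta used a differently named stage), while the field names (`plot`, `title`) and the helper function are hypothetical. Nothing here connects to a database.

```python
import json


def build_search_pipeline(query, path, limit=5):
    """Return an aggregation pipeline that starts with a full-text stage.

    Hypothetical helper: the field names are invented for illustration.
    """
    return [
        # The search stage has to come first in the pipeline.
        {"$search": {"text": {"query": query, "path": path}}},
        # Keep only the top few matches.
        {"$limit": limit},
        # Return the title plus the relevance score computed by the search.
        {"$project": {"_id": 0, "title": 1, "score": {"$meta": "searchScore"}}},
    ]


pipeline = build_search_pipeline("dill pickles", "plot")
print(json.dumps(pipeline, indent=2))
```

With a driver such as PyMongo, a list like this would typically be passed to `collection.aggregate(pipeline)`, which is the point of the announcement: search becomes one more pipeline stage rather than a second system.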
MongoDB also has visualisation technologies and MongoDB Charts is now generally available.
Available as a managed service in MongoDB Atlas, or downloadable to run on-premises, users can create charts and graphs, build dashboards, share them with other team members for collaboration and embed them directly into web apps to create more engaging user experiences.
To top all that, the company also announced the latest version of its core database, MongoDB 4.2. Key features include distributed transactions, field level encryption (FLE) and an updated Kubernetes Operator.
This is really just a taster of what the firm is discussing in Manhattan this week, although we have attempted to cover most of the major bases.
It is worth noting that MongoDB issued a 1,858-word press statement covering three major news headline streams without using a single “I’m delighted that we are blah blah blah,” said the MongoDB CEO.
They (it, the company) must have had some real product-spec related news to go through… who’d a thunk it?
Tibco has made a direct developer play designed to drive recognition of its technologies in areas where programmers are looking to build cloud-native applications.
Known for its integration, API management and analytics stack, Tibco (sometimes written as TIBCO for The Information Bus COmpany) has now enhanced TIBCO Cloud Integration, TIBCO Cloud Mashery and TIBCO Cloud Events.
All three products make use of cloud-native and open source technologies.
There is now native support for GraphQL: an open source data query and manipulation language for APIs and a runtime for fulfilling queries with existing data.
Tibco says that its Cloud Integration technology is the first enterprise Integration Platform-as-a-Service to provide native support for GraphQL.
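For readers who have not met GraphQL, here is a minimal sketch of what a client actually sends: a query naming exactly the fields it wants, wrapped in the conventional JSON envelope with `query` and `variables` keys. The `order`, `status` and `customer` fields are hypothetical and have nothing to do with any Tibco schema.

```python
import json

# A hypothetical query: the client asks for two fields and nothing else.
QUERY = """
query OrderById($id: ID!) {
  order(id: $id) {
    status
    customer { name }
  }
}
"""


def graphql_request_body(query, variables):
    """Serialise a GraphQL operation into the usual JSON request body."""
    return json.dumps({"query": query.strip(), "variables": variables})


body = graphql_request_body(QUERY, {"id": "42"})
decoded = json.loads(body)
print(decoded["variables"])
```

The server side (the “runtime for fulfilling queries”) would resolve each named field against existing data and return JSON in the same shape as the query.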
There is also news that the company’s Cloud Mashery microgateway now supports ‘event-driven patterns’ through support for the AsyncAPI Project when using Apache Kafka.
AsyncAPI is a means of creating machine-readable definitions for message-driven systems.
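To make “machine-readable definition” a little more concrete, the sketch below writes a tiny AsyncAPI-style document as a Python dictionary. The layout follows the AsyncAPI 2.x specification as we understand it; the channel name and payload fields are invented, and a real Kafka setup would add server and binding details that are left out here.

```python
import json

# A hypothetical, minimal AsyncAPI-style description of one message channel.
asyncapi_doc = {
    "asyncapi": "2.0.0",
    "info": {"title": "Order events (hypothetical)", "version": "0.1.0"},
    "channels": {
        "orders.created": {
            # Consumers may subscribe to this channel to receive messages.
            "subscribe": {
                "message": {
                    "payload": {
                        "type": "object",
                        "properties": {
                            "orderId": {"type": "string"},
                            "total": {"type": "number"},
                        },
                        "required": ["orderId"],
                    }
                }
            }
        }
    },
}


def list_channels(doc):
    """Return the channel names a tool could discover from the document."""
    return sorted(doc["channels"])


print(list_channels(asyncapi_doc))
print(json.dumps(asyncapi_doc, indent=2))
```

Because the document is plain data, tooling such as a gateway can read it to learn which channels exist and what shape their messages take.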
Deeper into open source, we also find the new open source Project Flogo streams designer.
Raw data pipelines
Rajeev Kozhikkattuthodi, vice president, product management and strategy at Tibco says that Project Flogo now offers a web designer — he claims that this simplifies how developers work with streaming data by allowing them to process raw data pipelines using tasks such as aggregations, join streams and filtering.
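The three tasks named above can be sketched in a few lines of plain Python. This is not Flogo code; the function names, the sensor data and the count-based window are all assumptions for illustration, and the join shown is the simpler stream-to-lookup-table variety rather than a join of two live streams.

```python
from collections import defaultdict


def filter_stream(events, predicate):
    """Pass through only the events that satisfy the predicate."""
    for event in events:
        if predicate(event):
            yield event


def tumbling_average(events, window_size):
    """Average `value` per sensor over fixed-size, non-overlapping windows."""
    buckets = defaultdict(list)
    for event in events:
        bucket = buckets[event["sensor"]]
        bucket.append(event["value"])
        if len(bucket) == window_size:
            yield {"sensor": event["sensor"], "avg": sum(bucket) / window_size}
            bucket.clear()


def join_with_lookup(events, lookup, key):
    """Enrich each event with reference data keyed on one of its fields."""
    for event in events:
        yield {**event, **lookup.get(event[key], {})}


readings = [
    {"sensor": "a", "value": 10.0},
    {"sensor": "a", "value": -1.0},  # invalid reading, filtered out
    {"sensor": "a", "value": 20.0},
    {"sensor": "b", "value": 5.0},
    {"sensor": "b", "value": 7.0},
]
locations = {"a": {"site": "north"}, "b": {"site": "south"}}

valid = filter_stream(readings, lambda e: e["value"] >= 0)
averaged = tumbling_average(valid, window_size=2)
results = list(join_with_lookup(averaged, locations, key="sensor"))
print(results)
```

Each stage is a generator, so events flow through one at a time rather than being collected first, which is the basic shape of a streaming pipeline.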
The company finally notes that Tibco Cloud Events adds commercial support for these capabilities, along with business rules authoring.
This is supposed to enable non-technical business users to collaborate with developers and build applications to identify meaningful events and take the next best actions.
It’s hard to know whether to pronounce software infrastructure company Wind River as wind (as in eaten too many beans, that thing that makes sails billow out) or wind (as in snakey, twisty) river.
It looks like it’s wind as in breezy mistrals on this link, so let’s go with that.
Whether it be winding or breezy, the company has this month updated its Wind River Linux with a release focused on ease of adoption of containers in embedded systems.
How do you make container adoption easier? We’re glad you asked.
It’s all about offering pre-built containers, tools and documentation as well as support for frameworks such as Docker and Kubernetes.
Appliances at the network edge
Wind River Linux is freely available for download and the technology is aligned to support software application development for cloud-native appliances that will exist at the network edge i.e. ones that will be in place, doing a job, for quite a while.
The company says that while containers can deliver powerful benefits such as greater scalability and flexibility, most current frameworks lack the right design or support for mission-critical industries that typically employ devices with extremely long lifecycles.
Embedded devices in the Operational Technology (OT) realm, such as those for industrial, medical equipment and automotive systems, also often require lightweight, reliable software with long lifecycles.
“Existing container technologies and platforms, like those in enterprise Linux, are often bloated or require updates too frequently to run effectively on these embedded systems. Wind River Linux includes container technology that supports the development and orchestration frameworks such as Docker and Kubernetes. It is Docker compatible under Open Container Initiative (OCI) specifications, but it is also lighter weight and has a smaller footprint than Docker, which is often a vital need for embedded systems,” said the company, in a press statement.
Insisting that his organisation is a ‘champion of open source’, Wind River VP of product Michel Genard has noted that although Linux containers have been widely deployed in datacentres and IT environments, without easy-to-use pre-integrated platforms or meaningful engagement across the ecosystem, container use has been scarce in small-footprint and long-lifecycle edge embedded systems.
The company rounds out by saying that by incorporating containers in Wind River Linux and combining this runtime with technologies such as the edge compute software product Wind River Helix Virtualization Platform, heterogeneous systems employing a mix of OSes (and requiring determinism and safety certification) can use the scalability of containers while meeting the often stringent requirements of embedded systems.
There are many magic rings in this world… and none of them should be used lightly. This is true.
It is also true that organisations in every vertical are now having to work hard and find automation streams that they can digitise (on the road to *yawn* digital transformation, obviously) and start to apply AI and machine learning to.
Another key truth lies in the amount of codified best practices that organisations now have the opportunity to lay down.
Once we can denote a particular set of workflows in a particular department (or team, or group, or any other collective) as being as efficient as possible, then we can lay that process down as a best practice.
These best practices are often now taken forward as templates for other firms to be able to use (once any user data is appropriately obfuscated and anonymised), especially when the best practice itself is identified under the stewardship of some higher level platform provider.
But there is another way we should think about best practice i.e. we should think about its existence as a necessary part of open business in the digital age.
The existence of open source projects (and the use of open platform technologies) could be regarded as a piece of corporate best practice i.e. firms should directly identify that they do engage with open source, because life in a proprietary-only technology world would always be more restrictive.
According to the report, “By implementing open source best practices, organisations are helping developers become both more productive and more structured in how they manage the often abundant open source software their businesses rely on.”
Almost 800 people were surveyed and around half were developers.
Wot no open source?
We’re not quite at the stage where people will refuse to take a job at a particular company based upon whether or not an organisation can evidence a substantial use of open source technologies — but we do know that that type of ethical concern is right up there for millennials and the Generation-Z workers just starting work now at the end of the current decade — so this could (and arguably should) be a trend to look out for.
Elastic (the company known for the Elasticsearch open source text search and analytics engine and the Elastic Stack data analysis and visualisation toolset) will now acquire Endgame.
Endgame is a security company focused on endpoint prevention, detection and response.
Elastic wants to add Security Information and Event Management (SIEM) to its stack, so Endgame is a logical enough purchase.
Shay Banon, CEO and Founder of Elastic says he’s excited about the chance to combine the two firms’ competencies. CEO of Endgame Nate Fick thinks that it’s a good idea too. CTO of Endgame Jamie Butler insists that both organisations share a commitment to openness, transparency, and user enablement.
All three men added: customers, customers, customers etc…
As the creators of the Elastic Stack (Elasticsearch, Kibana, Beats, and Logstash), Elastic builds self-managed and SaaS offerings that claim to make data usable in real time at scale for use cases like application search, site search, enterprise search, logging, APM, metrics, security and business analytics.
Endgame makes endpoint protection using machine learning technology that is supposedly capable of stopping everything from ransomware, to phishing and targeted attacks.
The company says its USP lies in its hybrid architecture that offers both cloud administration and data localisation that meets industry, regulatory and global compliance requirements.
The combined company names ElasticEnd, EndElastic, RobustRubber are not being considered at the time of writing.
But what is Kedro?
Kedro is a development workflow framework that structures a programmer’s data pipeline and provides a standardised approach to collaboration for teams building deployable, reproducible, portable and versioned data pipelines.
In 2015, QuantumBlack was acquired by McKinsey & Company — and the management consultancy has never before created a publicly available open source project.
Global head of engineering & product for QuantumBlack at McKinsey is Michele Battelli. He asserts that many data scientists need to perform the routine tasks of data cleaning, processing and compilation that may not be their favourite activities but still form a large percentage of their day to day tasks.
He claims Kedro makes it easier to build a data pipeline to automate the ‘heavy lifting’ and reduce the amount of time spent on this kind of task.
- Structure analytics code in a uniform way so that it flows seamlessly through all stages of a project
- Deliver code that is ‘production-ready’, making it [theoretically] easier to integrate into a business process
- Build data pipelines that are modular, tested, reproducible in different environments and versioned, allowing users to access previous data states
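The idea of a modular pipeline can be sketched without any framework at all. The code below is not the Kedro API; `Node`, `run_pipeline` and the two toy functions are hypothetical names, used only to show how small functions with named inputs and outputs can be wired together and run in dependency order.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """A unit of work: a function plus the names of its inputs and output."""
    func: Callable
    inputs: List[str]
    output: str


def run_pipeline(nodes, data):
    """Run every node once, as soon as all of its named inputs exist."""
    data = dict(data)
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(i in data for i in n.inputs)]
        if not ready:
            raise ValueError("unresolvable inputs: pipeline has a gap or cycle")
        for n in ready:
            data[n.output] = n.func(*[data[i] for i in n.inputs])
            pending.remove(n)
    return data


def clean(rows):
    """Drop missing values."""
    return [r for r in rows if r is not None]


def mean(rows):
    """Average the remaining values."""
    return sum(rows) / len(rows)


# Declared out of order on purpose: the runner works out the sequence.
nodes = [
    Node(mean, ["clean_rows"], "average"),
    Node(clean, ["raw_rows"], "clean_rows"),
]
result = run_pipeline(nodes, {"raw_rows": [1, None, 3]})
print(result["average"])
```

Because each function is ordinary Python with no knowledge of the pipeline, it can be unit tested on its own, which is where the “modular, tested” claim comes from.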
QuantumBlack says it has used Kedro on more than 60 projects.
“Every data scientist follows their own workflow when solving analytics problems. When working in teams, a common ground needs to be agreed for efficient collaboration. However, distractions and shifting deadlines may introduce friction, ultimately resulting in incoherent processes and bad code quality. This can be alleviated by adopting an unbiased standard which captures industry best practices and conventions,” noted Battelli and team, in a press statement.
The Kedro team state that production ready code should have the following attributes — it should be:
- Reproducible in order to be trusted
- Modular in order to be maintainable and extensible
- Monitored to make it easy to identify errors
- Tested to prevent failure in a production environment
- Well documented and easy to operate by non-experts
Battelli thinks that code written during a pilot phase rarely meets these specifications and can sometimes require weeks of re-engineering work before it can be used in a production environment.
Kedro also features data abstraction, to enable developers to manage how an application will load and save data — this is so they don’t have to worry about the reproducibility of the code itself in different environments.
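A minimal sketch of that kind of data abstraction follows, using only the standard library. The class names echo the general idea of a data catalogue but are hypothetical rather than Kedro’s own classes: pipeline code asks for a dataset by name and never needs to know whether it lives in memory or in a CSV file.

```python
import csv
import os
import tempfile


class MemoryDataSet:
    """Keeps data in memory, handy for intermediate results."""

    def __init__(self):
        self._data = None

    def save(self, data):
        self._data = data

    def load(self):
        return self._data


class CSVDataSet:
    """Reads and writes a list of dictionaries as a CSV file."""

    def __init__(self, path):
        self.path = path

    def save(self, rows):
        with open(self.path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)

    def load(self):
        with open(self.path, newline="") as f:
            return list(csv.DictReader(f))


class Catalog:
    """Maps dataset names to storage, so callers only ever use names."""

    def __init__(self, datasets):
        self._datasets = datasets

    def save(self, name, data):
        self._datasets[name].save(data)

    def load(self, name):
        return self._datasets[name].load()


tmp_dir = tempfile.mkdtemp()
catalog = Catalog({
    "scores": CSVDataSet(os.path.join(tmp_dir, "scores.csv")),
    "scratch": MemoryDataSet(),
})
catalog.save("scores", [{"name": "ada", "score": "9"}])
loaded = catalog.load("scores")
print(loaded)
```

Moving to a different environment then means swapping the catalogue entries, not editing the analysis code.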
Kedro also features modularity, allowing developers to break large chunks of code into smaller self-contained and understandable logical units. There’s also ‘seamless’ packaging, allowing coders to ship projects to production, e.g. using Docker or Airflow.
The team invites open contributions to the project and says that it is excited to see how it develops in the future — Kedro is here on GitHub.
The instant clustering aficionados at Instaclustr have created an anomaly detection application capable of processing and vetting real-time events at a uniquely massive scale – 19 billion events per day.
They did it by using open source Apache Cassandra and Apache Kafka and Kubernetes container orchestration technologies.
Keen to show just how much scalability the scalability factor in its own managed platform technology could handle, Instaclustr completed this action and made detailed design information available here and source code available here.
According to an Instaclustr white paper, anomaly detection is the identification of unusual events within an event stream – often indicating fraudulent activity, security threats or in general a deviation from the expected norm.
Anomaly detection applications are deployed across numerous use cases, including financial fraud detection, IT security intrusion and threat detection, website user analytics and digital ad fraud, IoT systems and beyond.
“Anomaly detection applications typically compare inspected streaming data with historical event patterns, raising alerts if those patterns match previously recognised anomalies or show significant deviations from normal behaviour. These detection systems utilise a stack of [technologies] that often include machine learning, statistical analysis and algorithm optimisation and [use] data-layer technologies to ingest, process, analyse, disseminate and store streaming data,” notes Instaclustr.
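To give a feel for “significant deviations from normal behaviour”, here is a deliberately simple sketch that flags a value when it sits more than a set number of standard deviations from a rolling history. This is a plain z-score check chosen for brevity, not Instaclustr’s algorithm, and the class name, window size and threshold are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flag values that deviate strongly from recent history."""

    def __init__(self, window=50, threshold=3.0, min_history=5):
        self.history = deque(maxlen=window)
        self.threshold = threshold
        self.min_history = min_history

    def observe(self, value):
        """Return True if the value looks anomalous against the history."""
        is_anomaly = False
        if len(self.history) >= self.min_history:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            # With zero spread there is no scale to judge against, so skip.
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        # Anomalies are kept out of the baseline so they do not distort it.
        if not is_anomaly:
            self.history.append(value)
        return is_anomaly


detector = RollingAnomalyDetector()
events = [10, 11, 9, 10, 12, 11, 10, 95, 10]
flags = [detector.observe(v) for v in events]
anomalies = [i for i, flagged in enumerate(flags) if flagged]
print(anomalies)
```

The hard part at the scale described is not this arithmetic but fetching the relevant history for each key quickly enough, which is where the data layer comes in.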
The company notes that the challenge comes in designing an architecture capable of detecting anomalies in high-scale environments where the volume of daily events reaches into the millions or billions.
When events hit the millions (or indeed billions) a streaming data pipeline application needs to be engineered for mass scale.
“To achieve this, Instaclustr teamed the NoSQL Cassandra database and the Kafka streaming platform with application code hosted in Kubernetes to create an architecture with the scalability and performance required for the solution to be viable in real-world scenarios. Kafka supports fast, scalable ingestion of streaming data, and uses a store and forward design that provides a buffer preventing Cassandra from being overwhelmed by large data spikes,” notes Instaclustr.
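The store and forward idea can be shown with a toy simulation. It involves no Kafka or Cassandra client code; the function name, burst sizes and per-tick capacity are invented purely to illustrate how a buffer lets a sink with fixed write capacity ride out a spike.

```python
from collections import deque


def simulate(bursts, sink_capacity_per_tick):
    """Return (writes to the sink per tick, backlog left after each tick)."""
    buffer = deque()
    writes, backlog = [], []
    for arriving in bursts:
        # Ingest everything that arrives, however large the burst.
        buffer.extend(range(arriving))
        # Forward to the sink no faster than it can absorb.
        drained = 0
        while buffer and drained < sink_capacity_per_tick:
            buffer.popleft()
            drained += 1
        writes.append(drained)
        backlog.append(len(buffer))
    return writes, backlog


writes, backlog = simulate([2, 10, 0, 0, 1], sink_capacity_per_tick=4)
print(writes)   # the sink never sees more than 4 per tick
print(backlog)  # the spike is absorbed and then worked off
```

Every event still reaches the sink; the buffer simply trades a little latency for a steady write rate.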
Cassandra serves as a linearly scalable, write-optimised database for storing high-velocity streaming data — so then, proceeding with an incremental development approach, Instaclustr monitored, debugged, tuned and retuned specific functions within the pipeline to optimise its capabilities.
No mention of customer case study references, just hardcore data crunching based upon open source technologies — that’s why we quite liked this.
Oracle may not always be viewed positively in open source circles, and the company’s approach to Java and the wider open platform still draws headlines a decade after it took up a position of stewardship over the Java platform and language in line with the acquisition of Sun Microsystems.
Looking to highlight more positive angles in terms of Oracle’s open universe this month is the company’s David Cabelus in his position as senior principal product manager for developer services.
Cabelus notes the continued adoption of DevOps and Kubernetes and says that the notion of simplified and combined deployment is what spawned the Open Service Broker API project, which provides a consistent model for exposing cloud services to applications and application deployment tooling.
What are service brokers?
Service brokers manage the lifecycle of services — so this means that platforms interact with service brokers to provision, get access to and manage the services they offer. The Open Service Broker API defines these interactions and allows software providers to offer their services to anyone, regardless of the technology or infrastructure those software providers wish to utilise.
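The interactions in question are ordinary HTTP calls. The sketch below describes four of them as plain data rather than sending anything: the paths and the version header follow the Open Service Broker API specification as we understand it, while the version value, the identifiers and the abbreviated request bodies are assumptions for illustration (the specification lists further fields).

```python
from urllib.parse import urlencode

# Header name is from the specification; the version value is an assumption.
HEADERS = {"X-Broker-API-Version": "2.14"}


def catalog_request():
    """Ask the broker which services and plans it offers."""
    return {"method": "GET", "path": "/v2/catalog", "headers": HEADERS}


def provision_request(instance_id, service_id, plan_id):
    """Create a service instance (body abbreviated for illustration)."""
    return {
        "method": "PUT",
        "path": f"/v2/service_instances/{instance_id}",
        "headers": HEADERS,
        "body": {"service_id": service_id, "plan_id": plan_id},
    }


def bind_request(instance_id, binding_id, service_id, plan_id):
    """Request credentials so an application can use the instance."""
    return {
        "method": "PUT",
        "path": f"/v2/service_instances/{instance_id}/service_bindings/{binding_id}",
        "headers": HEADERS,
        "body": {"service_id": service_id, "plan_id": plan_id},
    }


def deprovision_request(instance_id, service_id, plan_id):
    """Delete the service instance when it is no longer needed."""
    query = urlencode({"service_id": service_id, "plan_id": plan_id})
    return {
        "method": "DELETE",
        "path": f"/v2/service_instances/{instance_id}?{query}",
        "headers": HEADERS,
    }


# Hypothetical identifiers throughout.
lifecycle = [
    catalog_request(),
    provision_request("inst-1", "hypothetical-db", "small"),
    bind_request("inst-1", "bind-1", "hypothetical-db", "small"),
    deprovision_request("inst-1", "hypothetical-db", "small"),
]
for request in lifecycle:
    print(request["method"], request["path"])
```

Since every broker answers the same calls, a platform such as Kubernetes can provision very different services without service-specific code.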
The newest service broker is an implementation of the Open Service Broker API for use in a Kubernetes cluster and for use with Oracle Cloud Infrastructure services.
“It simplifies access to Oracle Cloud Infrastructure services, including new services like our Autonomous Databases, which can be a highly scalable, automated, self-tuning backend for microservices and containerised applications,” said Cabelus.
He notes that service brokers enable application portability, too.
“The combination of a consistent model and embedding cloud service provisioning within an application deployment process means that when you deploy your application in a new cloud environment, it has everything that it needs to run. This is true for dev-test-production progressions, and for on-premises-to-cloud migrations,” said Cabelus.
Oracle’s open faction has also pointed to work highlighted at the recent KubeCon + CloudNativeCon Europe 2019 in Barcelona.
In a similar vein, the company highlighted the open sourcing of Oracle Cloud Infrastructure Service Broker for Kubernetes.
This includes a recent set of Oracle open source solutions that facilitate enterprise cloud migrations including Helidon, GraalVM, Fn Project, MySQL Operator for Kubernetes and the WebLogic Operator for Kubernetes.
In addition, the recently launched Oracle Cloud Developer Image provides a development platform on Oracle Cloud Infrastructure that includes Oracle Linux, Oracle Java SE (includes Java 8, 11, and 12), Terraform and many SDKs.
A holy trinity of forces
Could we be seeing a coming together of forces (for the greater good) when we consider where development goes next?
Oracle suggests that a holy trinity of forces (DevOps, cloud-native, open source) could be pushing us towards a cultural change in software programming where complexity is more managed and functionality is more controlled.
Oracle may not be every developer’s favourite and will (most likely) never be regarded as a real open purist (some might say, well look at Red Hat, who is these days?)… so it’s important to look past the corporate showboating that features so prevalently at its big events and read into the developer blogs.
DataStax closed out the final day of its ‘Accelerate 2019’ conference by focusing on a selection of platform-level developments including its community development stream.
Nate McCall, project chair, Apache Cassandra spoke openly on the journey to Apache Cassandra and its 4.0 iteration.
“Community is the lifeblood of our ecosystem,” said McCall.
Looking at the history of Cassandra, McCall explained how Cassandra had started as a project that came out of Facebook with some additional technologies emanating on Google Code back in 2009. Looking at what has happened over the last decade, he thinks that the project has grown considerably and now presents itself with ‘well-defined protocols’ and a far higher degree of usability.
Today we know that most contributors are running clusters and most of them are really big; these are the people that are making commits on the project. This means, logically, that many of the features appearing in the project are being created by users operating at that kind of level.
“The marketing team was not involved [in the roadmap development] because we know real developers are the ones driving Cassandra forward,” said McCall.
Explaining that contributions can take any form – not just code – McCall pointed to some specific companies whose contributions have been mainly focused on documentation, which is just fine i.e. it’s all valuable and all needed.
Cassandra development in the community this year will come together at both ApacheCon North America… and at the NGCC i.e. the Next Generation Cassandra Conference. The community is now looking to share out who is doing the test tracking for various aspects of the project based upon the specific competencies and skills that those individuals (or teams) have.
Folks fuel forks
In reality, the community knows that lots of folks are running internal forks… and this may be because users are waiting for Cassandra 4.0 to come forward. McCall says that a big part of the reason the community is being made to wait is that the team know that they have to make 4.0 really, really good.
McCall handed over to DataStax CTO and co-founder Jonathan Ellis.
No vendor does more for Cassandra than DataStax, claimed Ellis… and he was referring to all the support and training that the company inputs alongside all the code input that it drives towards the core.
In terms of ease of use, Graph is a powerful way of exploring data and DataStax is bringing Graph techniques to Cassandra. Equally, Ellis claims that DataStax will be giving some of the wide distributed data positives back to Graph in general.
Ellis dug into the Kubernetes Operator that DataStax is developing to automate container provisioning, automate roll out of config changes and a range of other functions. He also spoke about DataStax Desktop (which offers a good deal of configuration shortcuts), which is available on the firm’s DataStax Labs website zone.
The Labs zone itself is typified by its beta-level software… users need to sign up to the following agreement if they wish to use these technologies:
“You agree to test DataStax Labs Software and provide DataStax with comments, notes, bug reports and feature comments with sufficient documentation, samples, code error, screen shots, etc., to help DataStax evaluate and improve the DataStax Labs Software.”
Ellis also spent time explaining the workings of DataStax Studio, which is an interactive developer tool for CQL (Cassandra Query Language), Spark SQL and DSE Graph.
Thinking about what DataStax has announced in line with this year’s event itself, Ellis also worked through some of the functions available in DataStax Insights, the company’s product devoted to performance management and monitoring.
AppStax.io is the firm’s approach to automating a data layer… so this is DataStax aiming to provide a new layer of automation for developers, rather than operations, sysadmin staff or database administrators of any kind.
DataStax is aiming to bring off a complex feat of engineering and there is abstraction, automation and simplification here at many levels.
DataStax CEO Billy Bosworth started out as a database administrator (DBA), so one would hope that he knows how to build, compile, manage and deploy in all senses of those terms, right?
Bosworth took the stage to keynote the opening session at his firm’s ‘Accelerate 2019’ data-developer conference and exhibition, which was held in Washington DC in May 2019.
His CEO presentation was formally entitled: accelerating development in a cloud world.
Now then, that sounds like marketing, so why did he use that title?
Perhaps it’s because DataStax’s core drive is all about user choice (by which we mean data-centric developer/programmer system and operations engineer, rather than ‘human’ user — although the happiness of the latter depends upon the former).
What does technical user choice mean in this sense? It’s all about options such as scale-up and scale-down functionality on Cassandra clusters with consumption-based pricing. So it’s tuning the data delivery model to more sympathetically suit what the use case demands.
Bosworth suggested that developers today face a special challenge because new tools emerge every day. He also said that technology both ‘enables and restrains’ a developer’s ability to accelerate innovation.
Similarly, he has pointed out that operations and IT teams responsible for the deployment, upkeep and support of new applications are simultaneously challenged and supported by new ways to deploy — whether that be on-premises, in the cloud, or using a hybrid approach.
Newer, easier-to-use tools
What all of this drives us towards is the idea that we should be able to get easier-to-use tools that make it faster to develop apps that perform regardless of data location, size, and format — and this is (perhaps) an encapsulation of the mission that DataStax says it is on.
Bosworth is clearly impressed by the scale and performance of Apache Cassandra, but he thinks that the top adjectives that come to mind when using this platform would NOT include ‘easy’ or ‘simple’.
“We want to take the powerful technology that exists in Apache Cassandra and make it easy and better – I might even say fun,” said Bosworth, in his keynote address. “Because companies are having to expand and work in new markets all the time, business is getting even tougher. But even in a world where so many underlying infrastructures are changing at lower levels, one common thread we see across all business models is the move to cloud.”
The DataStax CEO says that his firm’s technology is ideally (okay, he said uniquely) suited to cloud environments because it is essentially distributed and has a ‘masterless’ architecture with no single point of failure… but, at the same time, it offers a single toolset to manage cloud and on-premises environments.
Demo: chaos in the clouds
A live demo 18-minutes into a keynote is probably a good indication of any event being a real ground level data-centric developer-centric gathering. Bosworth was joined on stage by Chelsea Navo in her role as global vanguard lead at DataStax.
Navo demo’d DataStax’s technology which is capable of spanning connections to (in the example shown) three major cloud providers. With a massive number of servers out there, there are millions of points of potential failure… so apps need to be able to withstand a rough ride and rely upon a database that can handle disruptive events happening across the network. If nodes drop out (or whole clouds drop out), DataStax will provide enough resilience to be able to deal with ‘chaos in the clouds’ and switch to resources that are live and functional.
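The principle behind surviving that kind of outage can be sketched in a few lines. This is an in-memory toy, not a DataStax driver: the `Replica` class, node names and cloud labels are hypothetical. The point is simply that when every node holds a copy and none is the master, a client can move on to the next live node.

```python
class Replica:
    """One node holding a full copy of the data (hypothetical, in-memory)."""

    def __init__(self, name, cloud, data, up=True):
        self.name = name
        self.cloud = cloud
        self.data = data
        self.up = up

    def read(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in turn; any live one can answer the read."""
    for replica in replicas:
        try:
            return replica.name, replica.read(key)
        except ConnectionError:
            continue
    raise RuntimeError("no replica reachable")


data = {"user:1": "ada"}
cluster = [
    Replica("node-a", "cloud-1", data),
    Replica("node-b", "cloud-2", data),
    Replica("node-c", "cloud-3", data),
]

before = read_with_failover(cluster, "user:1")
# Simulate two of the three clouds dropping out.
cluster[0].up = False
cluster[1].up = False
after = read_with_failover(cluster, "user:1")
print(before, after)
```

A real distributed database also has to keep those copies consistent while nodes come and go, which is the genuinely hard part and is not modelled here.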
This all leads towards the need for a new layer of management intelligence for cloud applications… and this is what DataStax is aiming to provide with its new DataStax Constellation announcement.
Constellation will launch later this year with two cloud services: DataStax Apache Cassandra as a Service and DataStax Insights. DataStax Apache Cassandra as a Service will deliver easy scale-up and scale-down Cassandra clusters, on consumption-based pricing, which is backed by the stability and performance enhancements of DataStax Enterprise.
Follow-up presentations after Bosworth’s CEO address featured talks from Deloitte’s principal for technology services Mark White and from Judy Meyer, VP, WW ISV business leader and AI Advocate, Microsoft.
“We know that the world of the developer is changing. I want you to think about the ubiquitous compute power that the cloud gives us… at what is now at zettabytes of data. This means that we need a modern data estate – and this can be done using DataStax running on Microsoft Azure,” said Meyer.
When she talks about data estates, Meyer is referring to data existing on all devices, all computers, all ‘things’ in the Internet of Things, all operational databases, data lakes and all databases themselves.
There’s certainly a theme coming forward here: if we’re in a world with complexity in the cloud and chaos in the cloud (and perhaps even chaotic cloud complexity) then we need management tools and platform intelligence capable of helping us navigate out of those black holes. Crucially, we need to be able to do that fast and do it without necessarily needing a massive amount of network engineering expertise, because so many more technology practitioners are exposed to the engine rooms of the cloud itself. The sheer scale of the complexity and raw power in cloud has the potential to actually slow us down and encumber IT systems because it’s such a massive (virtual) engine to fuel, run and operate… but it can also accelerate our applications and data services if we use the correct tools — and DataStax wants to be that toolset.