There’s a problem with architecture modernisation — and it’s a big problem.
As many as 99% of IT executives are said to currently report challenges with architecture modernisation — but what does that mean, who’s saying it and what should we do about it?
This ‘revelation’ (Ed – or… a contrived PR-driven survey result designed to push a corporate message, perhaps?) comes from the 2019 IT Architecture Modernisation Trends survey, carried out by DataStax.
DataStax is, of course, the company that produces the most popular (commercially supported… or ‘production-certified’ as DataStax would put it) database built on open source Apache Cassandra.
Why is architecture modernisation broken?
Because 98% of IT team managers report challenges with their corporate data architectures due to the presence of data silos i.e. elements of fixed data that remain isolated from the rest of the organisation under the control of one single department (or team, or division etc.) due either to technical legacy issues or to cultural and personal issues where teams don’t see eye to eye.
Data silos can also occur, of course, as a result of vendor lock-in, and the DataStax survey notes this as an issue for 95% of the 300 execs the company surveyed.
The hardest part
“What this report makes clear is that data is certainly the hardest part of architecture modernisation,” said DataStax SVP and chief product officer Robin Schumacher. “While the cloud makes so many things around architectures much easier, it also creates additional data-related challenges.”
Schumacher claims that DataStax helps enterprises face those (above) exact challenges, well, that’s what survey-based stories are for, right?
- Architecture modernisation is both necessary and hard: 100% of respondents are modernising their technology architecture — however, 99% report challenges with architecture modernisation and no standards exist for funding new application development.
- Cloud flexibility is key: 85% of respondents have cloud initiatives as part of modernisation efforts, 72% are moving to a hybrid or multi-cloud infrastructure — 95% have concerns about vendor lock-in.
- Data is the driving factor behind modernisation: 98% of respondents report challenges with their corporate data architectures with data silos topping the list — 84% say that they are developing more real-time transactional applications.
- Open source is increasingly valued by large organisations: 82% of respondents report that their teams are more receptive to open source today than five years ago and 50% report open source is part of their architecture modernisation plans.
Read the full report here.
By goodness and by golly, hasn’t AI already thrown up its own set of buzzphrases?
Many of them have come about as a result of the bias that was an inherent part of initial systems development in this field.
Because these systems were often developed by males, gender bias was overlooked… and then, looking wider, we quickly found that sensitivity to culture, race, religion, ethnicity and many other defining human traits had never been conscientiously ‘programmed in’ to these systems in their foundational architecture.
Today, we’re not just worried about getting our hands on working AI… we need ‘Explainable AI’ and ‘Operationalised AI’ … and perhaps just plain old ‘Governed AI’ and ‘Trusted AI’.
This is part of why IBM created Watson OpenScale.
Offering manager for Watson OpenScale Susannah Shattuck has explained that this IBM brand introduced the idea of giving business users and non-data scientists the ability to monitor their AI and machine learning models to understand performance, help detect and mitigate algorithmic bias and to get explanations for AI outputs.
But, she says, that was just the start.
IBM Watson OpenScale has now been augmented to make it easier to detect and mitigate bias against ‘protected attributes’ such as sex and ethnicity, through recommended bias monitors.
“Up till now, users manually selected which features or attributes of a model to monitor for bias in production, based on their own knowledge. With the new recommended bias monitors, Watson OpenScale will now automatically identify whether known protected attributes, including sex, ethnicity, marital status, and age, are present in a model and recommend they be monitored. Such functionality will help users avoid missing these attributes and ensure that bias against them is tracked in production,” said Shattuck.
She also notes that the team is working with regulatory compliance experts to continue expanding this list of attributes to cover the sensitive demographic attributes most commonly referenced in data regulation.
Shattuck says that in addition to detecting protected attributes, Watson OpenScale will recommend which values within each attribute should be set as the monitored values and which as the reference value.
For example, it might recommend that within the “Sex” attribute, the bias monitor be configured such that “Woman” and “Non-Binary” are the monitored values and “Male” is the reference value.
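To make the monitored-versus-reference idea concrete, here is a toy fairness check in plain Python. It is purely illustrative and is not the Watson OpenScale API: the record fields and the disparate-impact heuristic are assumptions made for the sake of the example.

```python
# Illustrative only: a toy disparate-impact check, not the Watson OpenScale API.
# 'records' is a hypothetical list of model decisions.
def favourable_rate(records, attribute, group, favourable="approved"):
    """Share of favourable outcomes for one group within an attribute."""
    group_records = [r for r in records if r[attribute] == group]
    if not group_records:
        return 0.0
    return sum(r["outcome"] == favourable for r in group_records) / len(group_records)

def disparate_impact(records, attribute, monitored, reference):
    """Ratio of favourable-outcome rates, monitored group over reference group.
    Values well below 1.0 suggest the monitored group is being disadvantaged."""
    ref_rate = favourable_rate(records, attribute, reference)
    if ref_rate == 0:
        return float("inf")
    return favourable_rate(records, attribute, monitored) / ref_rate

decisions = [
    {"sex": "Woman", "outcome": "approved"},
    {"sex": "Woman", "outcome": "denied"},
    {"sex": "Male", "outcome": "approved"},
    {"sex": "Male", "outcome": "approved"},
]
print(disparate_impact(decisions, "sex", monitored="Woman", reference="Male"))  # 0.5
```

A bias monitor in production would run this kind of comparison continuously against live scoring data, rather than against a static list as shown here.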
“Recommended bias monitors help to speed up configuration and ensure that you are checking your AI models for fairness against sensitive attributes. As regulators begin to turn a sharper eye on algorithmic bias, it is becoming more critical that organisations have a clear understanding of how their models are performing, and whether they are producing unfair outcomes for certain groups,” concluded Shattuck.
We’re on the road to bias-free, open-attribute AI, but there’s a sea of algorithmic bias out there to combat… so it’s still early days.
We’re expanding our blockchain cryptocurrency vocabulary all the time.
As such, Loopring may be the next term you add to your technology lexicon.
Loopring is an open source and multilateral token exchange protocol based on smart contracts for Decentralised EXchanges (DEXs) on the Ethereum blockchain.
Ethereum is also open-source. It is a public distributed blockchain platform with smart contract (scripting) functionality for online contractual agreements.
Loopring allows multiple exchanges to mix and match orders, supports ‘off-chain’ order-matching and handles ‘on-chain’ transaction clearing and payment.
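As a purely conceptual illustration of off-chain matching feeding on-chain settlement, the toy Python below checks whether a set of orders forms a closed loop that could be settled together. Everything here (the order structure, the field names, the ring check itself) is hypothetical and is not Loopring protocol or smart-contract code.

```python
# Conceptual toy only: a loop of token orders in which each order sells the
# token that the next order in the loop wants to buy. Not Loopring code.
orders = [
    {"sell": "ETH", "buy": "LRC"},
    {"sell": "LRC", "buy": "DAI"},
    {"sell": "DAI", "buy": "ETH"},
]

def is_closed_loop(ring):
    """Each order's 'buy' token must be the next order's 'sell' token,
    wrapping around so the last order feeds back into the first."""
    return all(
        ring[i]["buy"] == ring[(i + 1) % len(ring)]["sell"]
        for i in range(len(ring))
    )

if is_closed_loop(orders):
    # In a real deployment, a matched set of orders like this would be
    # submitted to an Ethereum smart contract for clearing and payment.
    print("These orders could be settled together on-chain")
```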
Loopring is ebullient about its open source status and says that it offers open participation.
“Anyone can become a DEX with our software: you can join our DEX network to share liquidity, or create and manage your own DEX network. Sharing liquidity greatly improves your competitive advantage,” states the organisation, on its home pages.
The Loopring protocol is blockchain agnostic and can be deployed on any blockchain with smart contract capability.
Loopring says it has made it possible for AI-based smart agents/robots to trade tokens without human assistance. This could be key for the future economy where AI-based automatons will think, own, trade and perhaps evolve in ways that we have yet to imagine.
Much as we would like it to be, the Computer Weekly Developer Network and Open Source Insider team can’t be everywhere at once — and this week, that means we’re missing MongoDB World 2019 in New York.
In light of this absence, we have been to the local deli to stock up on bagels and beef pastrami with dill pickles, and knocked on our neighbour’s door to shout: “ba-da-bing, fo’get-about-it.”
So then, stereotyped exaggeration out of the way, what did open source purists at MongoDB get up to in the Big Apple this week?
One of the major pieces of news saw the company unveiling its product vision for Realm, the mobile database and synchronisation platform company it acquired in May 2019, which will now merge with the MongoDB Stitch serverless platform.
Realm’s synchronization protocol will connect with the MongoDB Atlas global cloud database on the backend, making Realm Sync a way for developers to connect data to the devices running their applications.
“As a part of MongoDB, Realm will become the default database for mobile developers and the easiest way to build real-time data applications in the browser and in iOS and Android devices,” claimed the company, grandly, in a press statement.
Beyond the database
The company also announced several new cloud services and features.
The event saw the introduction of Data Lake, Full Text Search and MongoDB Atlas Auto Scaling as well as the general availability of MongoDB Charts, all of which are intended to drive a competitive edge with data in multiple stages of the data layer.
IDC predicts that by 2025 global data will reach 175 Zettabytes and 49% of it will reside in the public cloud. Yet the complexity of Hadoop and the rigidity of traditional data warehouses are making it increasingly difficult and expensive to get clear value from rich, modern data in the cloud.
MongoDB Search gives developers and users the option to filter, rank and sort through their data to surface the most relevant results.
However, to gain access to rich search functionality, many organisations pair their database with a search engine such as Elasticsearch or Solr, which MongoDB claims can complicate development and operations — because we end up with two entirely separate systems to learn, maintain and scale.
“Atlas Full Text Search provides rich text search capabilities based on Apache Lucene 8 against fully managed MongoDB databases with no additional infrastructure or systems to manage. Once indexes have been created using either the Atlas UI or API, developers can run sophisticated search queries using MQL, saving significant time and energy,” noted MongoDB, in its product announcement specifications.
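For a sense of what that looks like from the developer’s side, here is a minimal sketch using PyMongo against a hypothetical Atlas cluster. It assumes a full-text search index named “default” on a “movies” collection; the connection string, database and field names are placeholders, and since the feature launched in beta the exact aggregation stage name may vary by release.

```python
# Minimal sketch: Atlas full-text search via an aggregation pipeline.
# Connection string, database, collection and field names are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:password@cluster0.example.mongodb.net")
movies = client["sample_mflix"]["movies"]

pipeline = [
    {"$search": {                      # Lucene-backed Atlas search stage
        "index": "default",
        "text": {"query": "space opera", "path": "plot"},
    }},
    {"$limit": 5},
    {"$project": {"title": 1, "plot": 1, "_id": 0}},
]

for doc in movies.aggregate(pipeline):
    print(doc["title"])
```

The point MongoDB is making is that the search index lives alongside the database, so there is no separate Elasticsearch or Solr cluster to provision, synchronise and scale.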
MongoDB also has visualisation technologies and MongoDB Charts is now generally available.
Available as a managed service in MongoDB Atlas, or downloadable to run on-premises, Charts lets users create charts and graphs, build dashboards, share them with other team members for collaboration and embed them directly into web apps to create more engaging user experiences.
To top all that, the company also announced the latest version of its core database, MongoDB 4.2. Key features include distributed transactions, field level encryption (FLE) and an updated Kubernetes Operator.
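By way of illustration, this is roughly what a distributed (multi-document) transaction looks like from PyMongo against a MongoDB 4.2+ replica set. It is a hedged sketch: the connection string, database and collection names are placeholders.

```python
# Minimal sketch of a multi-document transaction with PyMongo.
# Requires a MongoDB 4.2+ replica set or sharded cluster; names are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client["bank"]

with client.start_session() as session:
    with session.start_transaction():
        # Both writes commit together or not at all.
        db.accounts.update_one({"_id": "alice"}, {"$inc": {"balance": -100}}, session=session)
        db.accounts.update_one({"_id": "bob"}, {"$inc": {"balance": 100}}, session=session)
```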
This is really just a taster of what the firm is discussing in Manhattan this week, although we have attempted to cover most of the major bases.
It is worth noting that MongoDB issued a 1,858-word press statement covering three major news streams without a single “I’m delighted that we are blah blah blah,” from the MongoDB CEO.
They (it, the company) must have had some real product-spec related news to go through… who’d a thunk it?
Tibco has made a direct developer play designed to drive recognition of its technologies in areas where programmers are looking to build cloud-native applications.
Known for its integration, API management and analytics stack, Tibco (sometimes written as TIBCO for The Information Bus COmpany) has now enhanced TIBCO Cloud Integration, TIBCO Cloud Mashery and TIBCO Cloud Events.
All three products make use of cloud-native and open source technologies.
There is now native support for GraphQL: an open source data query and manipulation language for APIs and a runtime for fulfilling queries with existing data.
Tibco says that its Cloud Integration technology is the first enterprise Integration Platform-as-a-Service to provide native support for GraphQL.
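For readers new to GraphQL, this is what calling a GraphQL endpoint looks like from Python. The endpoint URL and the ‘orders’ schema are invented for the example and are not part of TIBCO’s actual API.

```python
# Illustrative only: posting a GraphQL query over HTTP with the requests library.
# The endpoint URL and schema (the 'orders' field) are hypothetical.
import requests

query = """
query RecentOrders($limit: Int!) {
  orders(limit: $limit) {
    id
    status
    customer { name }
  }
}
"""

response = requests.post(
    "https://integration.example.com/graphql",   # hypothetical endpoint
    json={"query": query, "variables": {"limit": 5}},
    timeout=10,
)
response.raise_for_status()
print(response.json()["data"]["orders"])
```

The appeal for integration platforms is that the client asks for exactly the fields it needs in a single request, rather than stitching together several REST calls.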
There is also news that the company’s Cloud Mashery microgateway now supports ‘event-driven patterns’ through support for the AsyncAPI Project when using Apache Kafka.
AsyncAPI is a means of creating machine-readable definitions for message-driven systems.
Deeper into open source, we also find the new open source Project Flogo streams designer.
Raw data pipelines
Rajeev Kozhikkattuthodi, vice president of product management and strategy at Tibco, says that Project Flogo now offers a web designer — he claims that this simplifies how developers work with streaming data by allowing them to process raw data pipelines with tasks such as aggregations, stream joins and filtering.
The company finally notes that Tibco Cloud Events adds commercial support for these capabilities, along with business rules authoring.
This is supposed to enable non-technical business users to collaborate with developers and build applications to identify meaningful events and take the next best actions.
It’s hard to know whether to pronounce software infrastructure company Wind River as wind (as in eaten too many beans, that thing that makes sails billow out) or wind (as in snakey, twisty) river.
It looks like it’s wind as in breezy mistrals on this link, so let’s go with that.
Whether it be winding or breezy, the company has this month updated its Wind River Linux with a release focused on ease of adoption of containers in embedded systems.
How do you make containers adoption easier? We’re glad you asked.
It’s all about offering pre-built containers, tools and documentation as well as support for frameworks such as Docker and Kubernetes.
Appliances at the network edge
Wind River Linux is freely available for download and the technology is aligned to support software application development for cloud-native appliances that will exist at the network edge i.e. ones that will be in place, doing a job, for quite a while.
The company says that while containers can deliver powerful benefits such as greater scalability and flexibility, most current frameworks lack the right design or support for mission-critical industries that typically employ devices with extremely long lifecycles.
Embedded devices in the Operational Technology (OT) realm, such as those for industrial, medical equipment and automotive systems, also often require lightweight, reliable software with long lifecycles.
“Existing container technologies and platforms, like those in enterprise Linux, are often bloated or require updates too frequently to run effectively on these embedded systems. Wind River Linux includes container technology that supports the development and orchestration frameworks such as Docker and Kubernetes. It is Docker compatible under Open Container Initiative (OCI) specifications, but it is also lighter weight and has a smaller footprint than Docker, which is often a vital need for embedded systems,” said the company, in a press statement.
Insisting that his organisation is a ‘champion of open source’, Wind River VP of product Michel Genard has noted that although Linux containers have been widely deployed in datacentres and IT environments, without easy-to-use pre-integrated platforms or meaningful engagement across the ecosystem, container use has been scarce in small-footprint and long-lifecycle edge embedded systems.
The company rounds out by saying that by incorporating containers in Wind River Linux and combining this runtime with technologies such as the edge compute software product Wind River Helix Virtualization Platform, heterogeneous systems employing a mix of OSes (and requiring determinism and safety certification) can use the scalability of containers while meeting the often stringent requirements of embedded systems.
There are many magic rings in this world… and none of them should be used lightly. This is true.
It is also true that organisations in every vertical are now having to work hard and find automation streams that they can digitise (on the road to *yawn* digital transformation, obviously) and start to apply AI and machine learning to.
Another key truth lies in the amount of codified best practices that organisations now have the opportunity to lay down.
Once a particular set of workflows in a particular department (or team, or group, or any other collective) is deemed to be as efficient as possible, we can lay that process down as a best practice.
These best practices are often now taken forward as templates for other firms to be able to use (once any user data is appropriately obfuscated and anonymised), especially when the best practice itself is identified under the stewardship of some higher level platform provider.
But there is another way we should think about best practice i.e. we should think about its existence as a necessary part of open business in the digital age.
The existence of open source projects (and the use of open platform technologies) could be regarded as a piece of corporate best practice i.e. firms should directly identify that they do engage with open source, because life in a proprietary-only technology world would always be more restrictive.
According to the report, “By implementing open source best practices, organisations are helping developers become both more productive and more structured in how they manage the often abundant open source software their businesses rely on.”
Almost 800 people were surveyed and around half were developers.
Wot no open source?
We’re not quite at the stage where people will refuse to take a job at a particular company based on whether or not the organisation can evidence a substantial use of open source technologies. But we do know that this type of ethical concern is right up there for millennials and the Generation-Z workers just starting out at the end of the current decade, so this could (and arguably should) be a trend to look out for.
Elastic (the company known for the Elasticsearch open source text search and analytics engine and the Elastic Stack data analysis and visualisation toolset) will now acquire Endgame.
Endgame is a security company focused on endpoint prevention, detection and response.
Elastic wants to add Security Information and Event Management (SIEM) to its stack, so Endgame is a logical enough purchase.
Shay Banon, CEO and Founder of Elastic says he’s excited about the chance to combine the two firms’ competencies. CEO of Endgame Nate Fick thinks that it’s a good idea too. CTO of Endgame Jamie Butler insists that both organisations share a commitment to openness, transparency, and user enablement.
All three men added: customers, customers, customers etc…
As the creators of the Elastic Stack (Elasticsearch, Kibana, Beats, and Logstash), Elastic builds self-managed and SaaS offerings that claim to make data usable in real time at scale for use cases like application search, site search, enterprise search, logging, APM, metrics, security and business analytics.
Endgame makes endpoint protection using machine learning technology that is supposedly capable of stopping everything from ransomware to phishing and targeted attacks.
The company says its USP lies in its hybrid architecture that offers both cloud administration and data localisation that meets industry, regulatory and global compliance requirements.
The combined company names ElasticEnd, EndElastic and RobustRubber are not being considered at the time of writing.
But what is Kedro?
Kedro is a development workflow framework that structures a programmer’s data pipeline and provides a standardised approach to collaboration for teams building deployable, reproducible, portable and versioned data pipelines.
In 2015, QuantumBlack was acquired by McKinsey & Company — and the management consultancy has never before created a publicly available open source project.
Global head of engineering & product for QuantumBlack at McKinsey is Michele Battelli — he asserts that many data scientists have to perform the routine tasks of data cleaning, processing and compilation that may not be their favourite activities but still form a large percentage of their day-to-day work.
He claims Kedro makes it easier to build a data pipeline to automate this ‘heavy lifting’ and reduce the amount of time spent on such tasks. According to the team, Kedro helps developers to:
- Structure analytics code in a uniform way so that it flows seamlessly through all stages of a project
- Deliver code that is ‘production-ready’, [theoretically] making it easier to integrate into a business process
- Build data pipelines that are modular, tested, reproducible in different environments and versioned, allowing users to access previous data states
QuantumBlack says it has used Kedro on more than 60 projects.
“Every data scientist follows their own workflow when solving analytics problems. When working in teams, a common ground needs to be agreed for efficient collaboration. However, distractions and shifting deadlines may introduce friction, ultimately resulting in incoherent processes and bad code quality. This can be alleviated by adopting an unbiased standard which captures industry best practices and conventions,” noted Battelli and team, in a press statement.
The Kedro team state that production ready code should have the following attributes — it should be:
- Reproducible in order to be trusted
- Modular in order to be maintainable and extensible
- Monitored to make it easy to identify errors
- Tested to prevent failure in a production environment
- Well documented and easy to operate by non-experts
Battelli thinks that code written during a pilot phase rarely meets these specifications and can sometimes require weeks of re-engineering work before it can be used in a production environment.
Kedro also features data abstraction, to enable developers to manage how an application will load and save data — this is so they don’t have to worry about the reproducibility of the code itself in different environments.
Kedro also features modularity, allowing developers to break large chunks of code into smaller self-contained and understandable logical units. There’s also ‘seamless’ packaging, allowing coders to ship projects to production, e.g. using Docker or Airflow.
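For a flavour of the node-and-pipeline approach, here is a minimal sketch. Exact class names and module paths vary between Kedro versions, so treat it as illustrative rather than definitive; the functions and dataset names are invented for the example.

```python
# Minimal sketch of Kedro's node/pipeline idea: pure functions wired together
# by named inputs and outputs. Class names vary across Kedro versions.
from kedro.io import DataCatalog, MemoryDataSet
from kedro.pipeline import Pipeline, node
from kedro.runner import SequentialRunner

def clean(raw_orders):
    """Drop records that are missing an amount."""
    return [row for row in raw_orders if row.get("amount") is not None]

def total_by_customer(clean_orders):
    """Aggregate order amounts per customer."""
    totals = {}
    for row in clean_orders:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return totals

pipeline = Pipeline([
    node(clean, inputs="raw_orders", outputs="clean_orders"),
    node(total_by_customer, inputs="clean_orders", outputs="customer_totals"),
])

# The catalog abstracts how each named dataset is loaded and saved; here an
# in-memory dataset stands in for a file, database table or cloud store.
catalog = DataCatalog({"raw_orders": MemoryDataSet([
    {"customer": "acme", "amount": 10},
    {"customer": "acme", "amount": None},
    {"customer": "globex", "amount": 7},
])})

print(SequentialRunner().run(pipeline, catalog))
```

Because the functions only know their named inputs and outputs, swapping a local CSV for a cloud store is a catalog change rather than a code change, which is the reproducibility point Battelli is making.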
The team invites open contributions to the project and says that it is excited to see how it develops in the future — Kedro is here on GitHub.
The instant clustering aficionados at Instaclustr have created an anomaly detection application capable of processing and vetting real-time events at a uniquely massive scale – 19 billion events per day.
They did it using the open source Apache Cassandra database and Apache Kafka streaming platform, plus Kubernetes container orchestration.
Keen to show just how much scale its own managed platform technology could handle, Instaclustr completed this exercise and made detailed design information available here and source code available here.
According to an Instaclustr white paper, anomaly detection is the identification of unusual events within an event stream – often indicating fraudulent activity, security threats or in general a deviation from the expected norm.
Anomaly detection applications are deployed across numerous use cases, including financial fraud detection, IT security intrusion and threat detection, website user analytics and digital ad fraud, IoT systems and beyond.
“Anomaly detection applications typically compare inspected streaming data with historical event patterns, raising alerts if those patterns match previously recognised anomalies or show significant deviations from normal behaviour. These detection systems utilise a stack of [technologies] that often include machine learning, statistical analysis and algorithm optimisation and [use] data-layer technologies to ingest, process, analyse, disseminate and store streaming data,” notes Instaclustr.
The company notes that the challenge comes in designing an architecture capable of detecting anomalies in high-scale environments where the volume of daily events reaches into the millions or billions.
When events hit the millions (or indeed billions) a streaming data pipeline application needs to be engineered for mass scale.
“To achieve this, Instaclustr teamed the NoSQL Cassandra database and the Kafka streaming platform with application code hosted in Kubernetes to create an architecture with the scalability and performance required for the solution to be viable in real-world scenarios. Kafka supports fast, scalable ingestion of streaming data, and uses a store and forward design that provides a buffer preventing Cassandra from being overwhelmed by large data spikes,” notes Instaclustr.
Cassandra serves as a linearly scalable, write-optimised database for storing high-velocity streaming data — so then, proceeding with an incremental development approach, Instaclustr monitored, debugged, tuned and retuned specific functions within the pipeline to optimise its capabilities.
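A highly simplified Python sketch of that Kafka-to-Cassandra shape is below. It is not Instaclustr’s published code: the topic, keyspace, table and the crude deviation check are all placeholders, and it assumes the kafka-python and DataStax Python driver libraries.

```python
# Highly simplified sketch of the Kafka -> detector -> Cassandra pipeline shape
# described above. Topic, keyspace, table and the deviation rule are placeholders.
import json
from kafka import KafkaConsumer
from cassandra.cluster import Cluster

consumer = KafkaConsumer(
    "events",                                    # hypothetical topic
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

session = Cluster(["localhost"]).connect("anomaly")   # hypothetical keyspace
insert = session.prepare(
    "INSERT INTO events_by_key (key, ts, value) VALUES (?, ?, ?)"
)
history = session.prepare(
    "SELECT value FROM events_by_key WHERE key = ? LIMIT 50"
)

for message in consumer:
    event = message.value
    # Write the new event to Cassandra (the write-optimised path).
    session.execute(insert, (event["key"], event["ts"], event["value"]))
    # Compare against recent history for this key and flag large deviations.
    recent = [row.value for row in session.execute(history, (event["key"],))]
    if recent:
        mean = sum(recent) / len(recent)
        if abs(event["value"] - mean) > 3 * ((max(recent) - min(recent)) or 1):
            print("anomaly:", event)
```

Kafka acts as the buffer between bursty producers and the detector, so spikes in event volume queue up rather than overwhelming the Cassandra writes, which is the store-and-forward point made in the quote above.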
No mention of customer case study references, just hardcore data crunching based upon open source technologies — that’s why we quite liked this.