Facebook has open sourced its Zstandard compression algorithm. The technology is said to outperform ‘zlib’, which has previously been considered the reigning standard in this field.
What is a compression algorithm?
A compression algorithm reduces the size of the data being handled — lossless and lossy compression describe whether or not all of the original data can be recovered when the file is decompressed.
In terms of usage, lossless compression suits (for example) text or spreadsheet files, while lossy compression is better suited to (for example) video and sound, where a certain amount of information loss will not be noticed by most users.
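As a quick illustration of the lossless case, a round trip through Python's standard-library zlib module (used here only as a stand-in for any lossless codec, not for Zstandard itself) recovers the original bytes exactly:

```python
import zlib

original = b"open source compression " * 100   # repetitive data compresses well
compressed = zlib.compress(original, level=9)  # highest compression level
restored = zlib.decompress(compressed)

assert restored == original  # lossless: every byte is recovered
print(len(original), len(compressed))  # the compressed form is far smaller
```

A lossy codec, by contrast, would discard detail during compression, so the equivalent equality check could never be guaranteed.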
According to facebook.github.io/zstd/, “Zstandard is a real-time compression algorithm, providing high compression ratios. It offers a very wide range of compression / speed trade-off, while being backed by a very fast decoder. It also offers a special mode for small data, called dictionary compression and can create dictionaries from any sample set. Zstandard library is provided as open source software using a BSD license.”
In this case, Zstandard compression is a lossless compression technology.
Zstandard in action
Zstandard works by keeping the handling of the data in hand as ‘branchless’ as is physically and mathematically possible. In doing so, Zstandard reduces the number of potential ‘pipeline flushes’ that can occur (during decompression) as a result of incorrect branch predictions.
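To see what "branchless" means in practice, here is a toy Python sketch of the classic bit-twiddling trick that computes a minimum without a conditional branch. This is purely illustrative of the idea, not Zstandard's actual implementation (and in Python the interpreter still branches internally; on real hardware in C, the branchless form avoids branch-misprediction stalls):

```python
def branchy_min(a: int, b: int) -> int:
    # Conventional form: the CPU must predict which branch is taken
    if a < b:
        return a
    return b

def branchless_min(a: int, b: int) -> int:
    # -(a < b) is 0 when the test is false and -1 (all ones) when true,
    # so the mask selects a or b arithmetically, with no branch at all
    return b ^ ((a ^ b) & -(a < b))

assert branchless_min(3, 7) == branchy_min(3, 7) == 3
```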
Linux devotees can now get the beta release of openSUSE Leap 42.2 and the new release is all about stability.
This hybrid community-enterprise distribution is the safe choice (says openSUSE) because it has the stability of an enterprise distribution with community-built packages.
“Leap is for pragmatic and conservative technology adopters,” said Ludwig Nussel, the release manager for Leap. “Testing the beta helps make Leap even more mature, so we encourage as many people as possible to test it.”
It’s important to note that openSUSE Leap focuses on well-established packages, like systemd 228 and Qt 5.6.
“The hundreds of SUSE Linux Enterprise (SLE) Service Pack (SP) 2 packages and the thousands of community-built packages allow for an effective development-to-production protocol,” says the firm.
While most can be ‘utterly content’ with the life-cycle and versions of packages in openSUSE Leap, the professional distribution gives developers and organisations an ability to bridge to a faster release cycle with openSUSE Tumbleweed or to a more Long Term Support enterprise solution with SLE.
The release day for the official version is scheduled for November 16, 2016.
Hazelcast is an open source in-memory data grid with 500,000 installed nodes and over 16 million ‘node starts’ per month. The firm has now announced Hazelcast 3.7 which is claimed to be 30% faster than previous versions and is the first fully modularised version of Hazelcast.
Each client/language and plugin is now available as a module – the theory being that this speeds up the development process for open source contributors, with new features and bug fixes released as modules alongside Hazelcast 3.7.
This release also features native Cloud Foundry integration.
To make Hazelcast 3.7 faster, the networking layer was reworked for greater concurrency.
Hazelcast can now work in nine cloud environments and can be extended via Cloud Discovery Plugins.
For PaaS, Hazelcast is now available as a service on Cloud Foundry and OpenShift. Also, Hazelcast includes container deployment options for Docker.
In 3.7, you can now specify a partition strategy of ZONE_AWARE. This allows a single cluster to run across multiple availability zones, with backups kept in zones separate from the primary data. An entire availability zone can be lost and the cluster keeps running.
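In declarative form, this is configured via the partition-group setting — a sketch of how it might look in a member's hazelcast.xml (consult the Hazelcast 3.7 reference manual for the exact schema):

```xml
<hazelcast>
  <!-- Group partition backups by availability zone so that primary
       data and its backups never share a zone -->
  <partition-group enabled="true" group-type="ZONE_AWARE"/>
</hazelcast>
```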
The Hazelcast open source community has created clients for programming environments including Java, Scala, .Net/C#, C++, Python, Node.js and Clojure. Java and Scala can be used for both clients and embedded members.
MariaDB is a database that was created as a community-developed ‘fork’ of the MySQL relational database management system and, as such, has always been free to use under the GNU General Public License.
Recent developments though have seen MariaDB announce the availability of MariaDB Galera Cluster 5.5.51 Stable (GA) and MariaDB Connector/J 1.5.1 Release Candidate (RC). But… the MariaDB corporation (as opposed to the Foundation) is also reported to be releasing its MaxScale database proxy software under a proprietary licence.
MaxScale will now be available under what MariaDB’s Michael “Monty” Widenius calls a Business Source Licence, and usage of the software is free when an application uses it with a total of fewer than three database server instances for production purposes.
“The open source model presents challenges to creating a software company that has the needed resources to continually invest in product development and innovation. One reason for this is a lack of understanding of the costs associated with developing and extending software. As one example of what I regard to be unrealistic user expectations,” writes Widenius.
The product is described as an open source, dynamic data routing platform for minimum downtime, security, scalability and interoperability beyond MariaDB and MySQL.
MariaDB MaxScale is a database proxy that sits between a database layer and the clients of that database — the technology is said to be fundamental to monetising MariaDB because it enables MariaDB to operate at enterprise scale.
“Built upon MaxScale’s Binlog Server functionality, you can now stream transactional data in real time from MariaDB to other big data stores like Hadoop or a data warehouse through messaging systems, like Kafka, for real-time analytics and machine learning applications. The package includes sample client applications for a Kafka producer and a standalone Python application to receive streaming data from MaxScale,” explains Dipti Joshi, senior product manager for MariaDB MaxScale.
A new fork has already surfaced in reaction to this development.
This is a special guest post for Computer Weekly written by Lars Herrmann, GM of the integrated solutions business unit at Red Hat. Herrmann writes specifically for Open Source Insider to detail the six most common misconceptions that have arisen surrounding the subject of ‘container’ technologies.
Containers, as we know by now, are best described as independently deployable chunks of software application code (discrete components of application logic) that can be used to build wider (very often Agile) applications. Containers are ‘intelligent’ enough to make their own calls for the application resources they need in order to function, and they do this through Application Programming Interfaces (APIs).
Red Hat’s Herrmann writes from this point:
1 — Containers are exciting, but only in cloud-native application development
While much of the buzz and early adoption of containers centres on developers using them to build cloud-native applications, the benefits and use cases of containers reach far beyond that. Containers provide a practical path for an organisation to adopt the hottest macro-trends such as hybrid cloud, DevOps and microservices. The combination of a general-purpose OS technology with built-in abstraction, automation and separation of concerns, all baked into a set of prescriptive workflows for building, deploying, running and managing applications and services, forms a new operational model that allows enterprise IT to introduce certain business benefits: increased agility, efficiency and innovation across a broad range of applications and environments. It also defines a technology system around which organisations can build processes and structure, to overcome the complex inter-human interactions preventing these benefits today.
2 — Container technology is ‘new’
Containers are often perceived to be a new technology. True, many of their use cases are only emerging now, but most of the technologies inherent to Linux containers have been around for years and have provided the foundation of many first generation PaaS offerings. The new part is the ability to run and manage a broad set of applications such as cloud-native microservices as well as traditional applications with an image-based delivery model.
Equally, the idea of sharing an operating system instance by isolating different parts of an application is not a new concept. Solutions have been available for splitting up and dedicating system resources efficiently for some time now.
3 — Containers can/will replace virtual machines (VMs)
Containers aren’t the same as VMs, and therefore cannot replace them entirely. This is largely because virtualisation and containerisation address different problems: virtualisation provides abstraction from hardware, while containers provide speed and agility through lightweight application packaging and isolation.
Also, while some enterprise workloads lend themselves to running as containers, others are better served by the hardware abstraction provided by VMs.
For these reasons, we like to think of container technology as a complementary solution to VMs, rather than an out-and-out replacement.
4 — Containers are just that, self-contained
Contrary to the suggestion within the name, containers aren’t completely self-contained. Each individual container leverages the same host operating system and its services. The upshot is that businesses can greatly reduce overheads and improve performance; the downside is that this sharing creates potential security or interoperability issues. This leads us nicely to the next misconception.
5 — Containers are watertight in terms of security
Linux containers can rely on a very secure foundation: Linux. Because containers share a host OS, with all resources therefore managed by that OS, security needs to be addressed differently than with VMs. There are two entities that need to be made secure: the OS running the containers – which might itself run in a VM – and the software payload of each individual container.
Out of the box, Linux offers technologies to isolate containers, such as process isolation and namespaces. However, despite their effectiveness, they cannot shut down every route malicious code could take to access other containers in a single environment. Additional layers of security are necessary to create a completely locked-down environment, such as SELinux, which provides military-grade security by enforcing policies for mandatory access control.
Often overlooked, the container payloads carry most of the security risk in a containerised environment, driven by usage patterns that allow development teams to define what goes into these containers and when and how they change. Industry best practice is to run only trusted components inside a container, complemented by scanning techniques that create actionable insights on potential security risks such as viruses, known vulnerabilities, or weak configurations and default settings.
6 — Containers will be universally portable
This isn’t the case… yet. For containers to be truly portable, there needs to be an integrated application delivery platform built on open standards that provides consistent execution across different environments. Containers rely on the host OS and its services for compute, network, storage and management, across physical hardware, hypervisors, private clouds and public clouds. The ecosystem is the key here: there need to be industry standards for image format, runtime and distribution in order for universal portability to become possible.
This need is recognised by the industry and relevant communities who formed entities to define and evolve these standards, such as the Open Container Initiative and Cloud Native Computing Foundation.
The press conference is dead as a meeting format, isn’t it? No, apparently not. Managed cloud computing company Rackspace staged what is now its sixth breakfast press briefing this morning in London’s glittering Soho region.
Chaired by industry analyst and (arguably) nicest man on the planet Jon Collins, the full panel included the following:
Igor Ljubuncic, principal engineer, Rackspace
Mat Keep, director, product & market analysis, MongoDB
Martin Percival, senior solutions architect, Red Hat
Clive Hackney, senior engineer, Capgemini
Alexis Richardson, co-founder and CEO, Weaveworks
Markus Leberecht, senior data centre solutions architect, Intel
The big switch to open source
MongoDB’s Mat Keep explained how far ‘open source first’ has now become the primary approach to enterprise-level software application development, over and above the use of proprietary software.
“Using open source products, modules, components used to be an exception and you needed to get special permission to implement it – the opposite reality now exists in many cases and developers will find that they more likely need approval to use proprietary chunks if they feel they need to do so for some reason,” said Keep.
This event certainly drew a few real opinions out of both audience and speakers. Refreshingly, not everybody agreed with each other… this was not a corporate (one might say ‘proprietary’) over-rehearsed set of practised hyperbole. One might suggest that this reflects the true, fluid and dynamic nature of open source.
Controlled chaos, in a good way
According to Rackspace’s Igor Ljubuncic, “Open source could be controlled chaos in some ways depending on how you look at it, but there is control and management. The most important thing to remember is that when you expose your codebase you have to do a good job or the community can fork your product and make it for themselves, so responsiveness to community needs are very important.”
Red Hat, for its part, was at pains to explain that it sees itself as a ‘catalyst’ for open source development throughout the community… but that it still exists in a role to ‘harden’ the code in use. What Red Hat’s Percival was alluding to was the process of locking down dynamic libraries for commercial use.
More women needed in open source
As a side note of huge interest… during general discussions it emerged that (according to one statistic) the split between male and female developers in the industry at large is roughly 80% to 20%. But, significantly, in open source that split widens to roughly 90% to 10%. Why that should be is unknown, but it may be a good pointer for where responsibilities lie.
One of the key takeaways from this event focused on exactly what kind of development, creation and implementation realities we will see with open source companies in the real world.
If you are focused on creating a single piece of single-function software, then it is feasible that one developer or one open source team of developers might create it. But if we are talking about the need to create a piece of software that works all the way up from base-level system networking needs, through functionality and presentation-layer technologies, onward to operating system integration and then bridging to cloud computing structures and beyond… then this could (and potentially should) turn to a more formalised enterprise model.
Albeit… this enterprise team could (and potentially should) still adhere to open source principles.
There’s only one thing worse than an InfoGraphic… and that’s two InfoGraphics, usually. Usually too long to read or overloaded with tiny text, InfoGraphics are (arguably) typically laden with the kind of statistics that bear little relation to the real world and aren’t a whole lot of use.
The exception that proves the rule
This being so, we should be naturally wary of the InfoGraphic in all its forms, obviously. However, the exception that proves the rule is the Operation Hadoop image shared recently by Pepperdata.
What does Pepperdata do? Pepperdata’s Adaptive Performance Core software observes and reshapes applications’ usage of CPU, RAM, network and disk, without user intervention, to ensure jobs complete on time.
Clusters in flux
The software is supposed to prevent bottlenecks in multi-tenant, multi-workload clusters so that many users and jobs can run reliably on a single cluster at maximum utilisation.
The company naughtily suggests that we look at its most recent InfoGraphic, which is filled with data from more than 100 production Hadoop clusters. The study uncovered the most common and insidious symptoms of cluster flux that plague companies of all sizes.
According to the firm, “Pepperdata senses contention for CPU, memory, disk I/O, and network at run time and will automatically slow down low-priority tasks when needed to ensure that your high-priority jobs complete on time — without the need to isolate workloads on separate clusters.”
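The idea described above — sense contention, then slow low-priority work so high-priority jobs finish on time — can be sketched in a few lines of Python. This is a toy illustration only, not Pepperdata's actual product logic; the task fields and the 0.8 threshold are invented for the example:

```python
def throttle(tasks, cpu_load, threshold=0.8):
    """Pause low-priority tasks while measured contention is high."""
    for task in tasks:
        if cpu_load > threshold and task["priority"] == "low":
            task["state"] = "paused"   # yield resources to high-priority jobs
        else:
            task["state"] = "running"
    return tasks

jobs = [
    {"name": "batch-etl", "priority": "low"},
    {"name": "sla-report", "priority": "high"},
]
throttle(jobs, cpu_load=0.95)  # the low-priority ETL job gets paused
```

The real system works continuously at run time across CPU, memory, disk I/O and network rather than on a single load number, but the priority-based back-off principle is the same.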
Just to make it more fun to look at, Pepperdata has presented the information in the form of the Hadoop elephant mocked up as the body from the children’s game ‘Operation‘.
So did we actually run an InfoGraphic? No, not quite: just the fun image below.
Who works with big data? Is it mostly developers or DataBase Administrators (DBAs), or even sysadmins? The answer of course, increasingly, is none of the above… we are defining new roles in the form of so-called ‘data scientists’ or ‘data engineers’. These are the people most commonly getting their hands dirty in big data engineering where they will often be exposed to the R programming language.
What is the R language?
R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and macOS.
R was designed 20 years ago to allow academic statisticians and others with sophisticated programming skills to perform complex data statistical analysis and display the results in any of a multitude of visual graphics. In the past, R has been criticised for delivering slow analyses when applied to large data sets, but more recent versions of the language are attempting to address this problem.
So why all this talk of R?
Because data scientists and data engineers using R will now be able to use IBM Watson more easily. IBM has partnered with Columbus Collaboratory on the release of CognizeR, an open-source R extension.
Advanced analytics & cyber security
Columbus Collaboratory is an ecosystem of companies and partners focused on innovation in the areas of advanced analytics and cyber security.
According to IBM, “CognizeR offers easier access to a variety of Watson’s Artificial Intelligence (AI) services that can enhance the performance of predictive models developed in R.”
Using the CognizeR extension, users can get a channel to IBM Watson directly from their R environment.
“As we collect feedback, we’ll be able to continually improve the experience by adding the cognitive services that data scientists want and need the most,” said Shivakumar Vaithyanathan of the IBM Watson Content Services division.
Ty Henkaline, chief analytics innovator at Columbus Collaboratory, comments: “CognizeR now shortens the journey toward building real cognitive solutions by providing quick and easy access to Watson services. Releasing this code to the open-source community advances our mission of delivering accelerated business value to our member companies and beyond.”
The Lithuanian police department (Lietuvos Policija) has gone open source (atviro kodo), choosing the LibreOffice suite of productivity applications over any previous preference for Microsoft products.
The installation is a test-phase project spanning more than 8,000 workstations running Ubuntu Linux.
The installation was completed in June of 2016, according to official police media statements.
General N. Malaškevičius speaks
“At first, it all seemed like a dream. We talked about how we could reduce the ever-increasing cost of software. We knew that more than one Western company, municipality or public body uses open-source software, but we did not believe that we ourselves could make the switch relatively easily. As in all modern developments, the most sensitive link is not the equipment, nor one piece of software interfacing with another, but the people: adapting to a slightly different desktop is far more complicated than it may seem at the outset. But when they realised what huge amounts of money we save, the workers became more open to change and will soon become skilled in word processing, spreadsheets, slideshow creation, drawing, inserting mathematical formulas and using database functions with the LibreOffice suite,” said deputy police commissioner General N. Malaškevičius.
LibreOffice is said to be ‘one of several’ open source products now implemented by the Lithuanian police. The organisation claims that open source now accounts for around 30% of its software.
This is a guest post for the Computer Weekly Open Source Insider blog written by Luke Whitehead in his capacity as head of EMEA marketing for Couchbase — the firm is an open source, distributed (shared-nothing architecture), multi-model, NoSQL document-oriented database specialist.
Whitehead writes as follows…
Digital transformation = revolution
The very term revolution is defined as a dramatic and wide-reaching change in conditions, attitudes, or operations.
What better way to describe the colossal change facing businesses today as industry after industry shifts to the digital economy? Businesses speak of the Digital Economy, but what tools and innovations are opening up new opportunities? According to the European Commission, the digital economy is “the single most important driver of innovation, competitiveness and growth, and it holds huge potential for European entrepreneurs and small and medium-sized enterprises (SMEs).”
Web, mobile & IoT apps are the heart of the new Digital Economy – a multi-trillion-dollar business opportunity – and NoSQL is the operational database powering those apps. Built on the cloud, mobile, social media and big data, it has become fundamental to businesses across the globe. The explosion of new apps seems unstoppable, but what’s behind the applications? What tools are developers using to deal with the mass of data?
Relational vs. non-relational
The all-important backbone is often down to a decision between relational vs. a non-relational database. Relational databases were born in the era of mainframes and business applications – long before the Internet, the cloud, big data, mobile and now, the digital economy. NoSQL databases have been specifically engineered to meet a new generation of enterprise requirements including the need to develop with agility and to operate at any scale.
However, that doesn’t mean there isn’t a place for relational databases within today’s business world; what’s important is for a developer to understand the business needs and support them with the best possible infrastructure.
Relational databases are well suited to legacy business management applications, such as enterprise resource planning (ERP) and supply chain management (SCM) – still vital across enterprises. The difference with NoSQL is in the detail. Non-relational databases meet the performance, scalability, availability and agility requirements of interactive customer-facing applications.
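That "detail" is easiest to see in code. This standard-library-only Python sketch (the field names are invented for illustration) shows the document model's flexibility: a new attribute appears on one record without any schema migration, where a relational table would require an ALTER TABLE before the new column could be stored:

```python
import json

# Two customer "documents" in the same collection — the second one
# gained a loyalty_tier field with no schema change at all.
customers = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace", "loyalty_tier": "gold"},
]

for doc in customers:
    print(json.dumps(doc))  # each document carries its own structure
```

This per-document flexibility is what lets NoSQL stores absorb the varied, fast-changing data structures of web, mobile and IoT applications without the migration overhead of a fixed schema.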
With so much data up for grabs, data has become the new currency, and open source developers are the bankers so critical to business success.
Only by deriving data-driven insights, at the moment of interaction, can a business recognise who they’re engaging with, understand what they want and deliver a great experience. Those that can store, find and access their data at a moment’s notice will stay ahead in the game: able to deliver exceptional customer experiences and create innovative products and services that will allow them to succeed in the digital economy.
The most innovative companies are embracing NoSQL by successfully introducing it into their relational environments. Many developers are deploying NoSQL broadly to address the following challenges and new business requirements:
1. Customers are becoming more demanding, meaning business processes must be fast and agile in order to cater to their needs.
2. As part of this demand, businesses must scale to support thousands if not millions of users with consistently high performance, 24 hours a day, 365 days a year.
3. The Internet is connecting everything, meaning businesses must support a variety of applications with different data structures and countless real-time interactions.
4. Big data is getting bigger, making it essential to store customer-generated semi-structured/unstructured data from a variety of sources.
NoSQL looks to address these needs, helping businesses to remain competitive across a huge number of applications including:
- Supporting large numbers of concurrent users
- Delivering highly responsive experiences to a globally distributed base of users
- Being always available, no downtime ever
- Handling semi and unstructured data
- Rapidly adapting to changing requirements with frequent updates and new features
You might be wondering ‘why now?’, ‘why change?’ or even ‘if it ain’t broke don’t fix it’. The bottom line is the digital economy is expected to reach €3.2 trillion in the G-20 economies and already contributes up to eight per cent of GDP across Europe. It represents opportunity and potential in times of business uncertainty and political change.
Amidst change and uncertainty, the digital economy is here to stay – it powers growth, creating jobs and contributing to higher productivity gains, providing opportunities for developers throughout Europe.
A modern infrastructure is essential to powering maximum uptime and the all-important scale that meets the requirements of doing business in today’s digital economy. With NoSQL, enterprises can both develop with agility and operate at any scale, delivering the performance and availability required to meet the demands of digital economy businesses.