This is a Q&A session with Carlos Sanchez in his capacity as engineer at CloudBees, a provider of continuous delivery and integration software services. The company’s open source continuous integration tool, Jenkins, is the focal point of CloudBees’ services.
Computer Weekly Open Source Insider (CWOSI) posed its nine most pertinent Kubernetes questions to Sanchez in an attempt to uncover the heart of the matter given the rise in popularity and profile that Kubernetes is experiencing in 2017.
CWOSI #1: For those that don’t know Kubernetes, how would you summarise and define this technology?
Sanchez: Kubernetes is an open source platform designed to automate the deployment, scaling and operation of containers. It’s a cluster technology that allows for running containers at scale. It enables the execution of applications in isolation across large data centres.
CWOSI #2: How and why did Kubernetes come about in your view — why do we need it?
Sanchez: Docker really made containers successful. Google has been running containers for a number of years – billions of containers, in fact. Kubernetes came out of Google’s experiences running containers at this scale, resulting in Google taking the technology into the open source world to make it easier for others to manage containers.
As for why we need Kubernetes, this is because containers are becoming more important for organisations large and small, empowering development teams to operate in massively distributed environments, to deliver software faster with DevOps and Continuous Delivery practices. Anything that simplifies the efficient operation and management of containers in this context was always going to be in hot demand from the enterprise.
CWOSI #3: Kubernetes is essentially open source, of course, but how many developers actually contribute code commits to a piece of technology that is so inherently infrastructural?
Sanchez: Overall, more than 1,400 contributors. Google, Red Hat and Microsoft are included in that 1,400, to name just a few. Most recently Amazon and Alibaba have become some of the biggest companies to get involved with the technology. The Cloud Native Computing Foundation oversees the technology as a whole.
CWOSI #4: Does containerised technology ultimately mean that each individual component is more accountable in terms of its need to validate its purpose and ultimate ability to deliver a specific output or function?
Sanchez: Containers are typically associated with microservices architectures. Each component is expected to fulfil a specific contract. The components do have a purpose and they have inputs and outputs that are marked by this contract and APIs. They have to be able to deliver whatever their responsibilities are. They should be independent and fulfil a very specific role in architecture where hundreds or thousands of these services coexist.
CWOSI #5: When DON’T you need Kubernetes… is it when you know you don’t need to scale or span multiple machines?
Sanchez: Kubernetes is a complex system. It only makes sense to adopt the technology if you have the scale to justify the deployment. For example, if you only use one or two virtual machines, or you don’t have any more demanding requirements, you may not need Kubernetes – Docker, alone, may suffice. That being said, current cloud offerings by Google or Azure make it really easy to get started with Kubernetes and scale from there.
CWOSI #6: Can you explain Kubernetes pods for us?
Sanchez: A Kubernetes pod is essentially a group of containers that run together in the same host. These containers share certain characteristics; for example, they share the same networking space and resources. Really a Kubernetes pod is composed of containers that need to coexist with each other.
CWOSI #7: How easy is it to get Kubernetes wrong and put together the wrong kind of implementation?
Sanchez: This comes back to the installation – it’s a complex piece of software and it requires certain expertise to be set up. That’s why people resort to using Google Kubernetes Engine or Azure container services instead.
That said, there’s an increasing number of tools, both open source and commercial, such as kops, kube-aws or kubeadm that would help you execute a proper installation. If you don’t use one of the installers to simplify the installation, then it’s more likely you will make an error during this process.
CWOSI #8: What is the CloudBees central position on Kubernetes?
Sanchez: We are committed to supporting Kubernetes – the industry has clearly embraced it… and CloudBees Jenkins Enterprise already runs on it.
CWOSI #9: How will Kubernetes develop over the next couple of years in your view?
Sanchez: There will be an increasing number and variety of Kubernetes offerings entering the market from a host of different providers, not just cloud providers but also OS providers. Kubernetes will become the de facto OS for clusters. In addition, Kubernetes is going to evolve into this set of standard APIs that will allow you to run cluster architectures.
We’re seeing cloud providers tearing up infrastructure so that you will be able to run Kubernetes without even needing to run servers. As a result we’ll see vendors providing Kubernetes as a Service where you will be able to run containers in the cloud without having to worry about machines anymore. AWS have already announced their intent to offer this, and this is a trend that’s set to continue among other providers.
Barely even featuring as an item on its press room pages, news has filtered out this week of Czech virus deerstalking firm Avast releasing its machine code decompiler RetDec to open source.
The decompiler is engineered to convert binary machine code into a form that looks, feels and executes like the original source code.
As explained here on TechTarget, to decompile is to convert executable (ready-to-run) program code (sometimes called object code) into some form of higher-level programming language so that it can be read by a human.
Decompilation is a type of reverse engineering that does the opposite of what a compiler does.
Decompilation vs disassembly
It’s important to remember that decompilation via a decompiler is not the same as disassembly via a disassembler.
To explain the difference, disassemblers turn binary into assembly code (a low-level language with little abstraction, but more readable to humans than hard core machine code). Decompilers go a step further, reconstructing higher-level source code with more abstraction.
Avast’s intention for RetDec (and the reason for putting it out in the wild and open sourcing it) is to address the suggestion (made by Avast) that existing open source decompilers are not stable enough and fail to provide an appropriate level of code readability.
RetDec is so named because, in full, it reads Retargetable Decompiler, and this re-target-ability allows the tool to focus on code from different 32-bit architectures.
Security minded developers can try out decompilation in the browser here.
Herrmann states that, as many of us know, Kubernetes was originally designed and developed by Google employees when, at the time, Google was one of the first supporters of Linux container technology.
Kubernetes is an open source project forming the basis for container management in many deployments. It provides an environment for the automated provisioning, scaling and management of application containers.
Herrmann writes as follows…
Back in May 2014, Google publicly announced that Google cloud services run on containers. Each week, Google generated over two billion containers on Borg, its internal platform, which served as the predecessor to Kubernetes.
The wealth of experience that Google gained from developing Borg over the years significantly influenced Kubernetes technology.
Google handed over the Kubernetes project to the newly founded Cloud Native Computing Foundation in 2015.
Kubernetes provides a platform on which containers can be deployed and executed using clusters of physical or virtual machines. As key technologies, Kubernetes uses Linux and a container runtime.
Linux runs the containers and manages resources and security. The container runtime manages host-level instantiation and resource assignment (for example Docker or CRI-O). IT departments can use Kubernetes to:
- Orchestrate containers across several hosts
- Make more efficient use of hardware resources required to run company applications
- Manage and automate application deployment and updates
- Mount storage and add storage capacities in order to run stateful applications
- Scale application containers and their resources
Moreover, Kubernetes integrates and uses the services and components of additional open source projects, such as Atomic registry, OpenvSwitch, SELinux or Ansible.
Current focus of Kubernetes
Version 1.8 of Kubernetes has been available since September 2017. The members of the community are currently focusing on five areas:
#1 Service automation
One of the new features of Kubernetes 1.8 in the area of service automation is Horizontal Pod Autoscaling (HPA). HPA enables Kubernetes to automatically scale the number of pods based on usage. Integrating custom metrics allows users to benefit from greater flexibility in scaling workloads.
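As a rough illustration of usage-based scaling, the sketch below follows the replica calculation described in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up. The function name, the parameter names and the default bounds are assumptions made for this example; they are not part of any Kubernetes API.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Illustrative replica calculation (hypothetical helper, not a Kubernetes API)."""
    ratio = current_metric / target_metric
    # Skip scaling when usage is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Keep the result inside the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Three pods at 200 millicores each against a 100 millicore target.
    print(desired_replicas(3, 200, 100))
```

With three pods running at twice the target usage, the sketch asks for six pods; at half the target usage, five pods would shrink to three.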
#2 Workload Diversity
Workload diversity takes two main considerations into account. The first of these is batch- or task-based computing. Many users are interested in moving some batch workloads to their OpenShift clusters. That is why several new alpha-stage features have been added. These concern batch retries, the waiting time between failed attempts, and other activities necessary for managing large parallel or serial implementations. The second consideration is ScheduledJob, which has now been renamed CronJob and is in beta development.
#3 Security: role-based access control
The Red Hat OpenShift Container Platform was one of the first Kubernetes solutions to support multi-tenancy. Multi-tenancy, in turn, makes it necessary to develop role-based access control (RBAC) for the cluster. RBAC version 1 is generally available with Kubernetes 1.8. RBAC authorisation has been a direct port from the OpenShift authorisation system since version 3.0 and enables granular access control to the Kubernetes API.
Immediately deployable RoleBindings are another new feature; these range from discovery roles, to user-facing roles, through to framework control roles and controller roles. New features also include integration with escalation prevention and node bootstrapping as well as the possibility to adapt and expand RoleBindings and ClusterRoleBindings.
It will be possible for kubectl – a command line tool for running commands against Kubernetes clusters – to work with plug-ins, thanks to the work carried out by the CLI Special Interest Group. This function is still in an early stage, and will make it possible to expand kubectl without having to clone the code repository. Developers write the desired code in the language of their choice and then issue the command. This leads to new subcommands, since an executable file is stored at a particular storage location on the hard drive.
#5 Cluster stability
Kubernetes 1.8 features a client-side event filter in order to increase cluster stability. This filter will stop excess data traffic on the API server caused by internal cluster components. There is also a new option to limit the number of events processed by the API server. Threshold values can be globally set on a server, or set per namespace, user, or source+object. Moreover, Red Hat has worked to enable API users to receive results in pages. This will minimize the memory allocation impact caused by comprehensive queries.
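To show why paged results reduce memory pressure, here is a minimal sketch of the pattern. The Kubernetes API exposes paging through the `limit` and `continue` list parameters; everything else below, including the in-memory `list_page` function standing in for a server and the integer-based token, is a hypothetical simplification for illustration only.

```python
def list_page(items, limit, continue_token=None):
    """Hypothetical stand-in for a server-side list call that returns one page."""
    start = int(continue_token) if continue_token else 0
    page = items[start:start + limit]
    next_start = start + limit
    # A token is only handed back while more items remain.
    token = str(next_start) if next_start < len(items) else None
    return page, token


def list_all(items, limit):
    """Client-side loop: fetch page after page until no token comes back."""
    token = None
    while True:
        page, token = list_page(items, limit, token)
        yield page
        if token is None:
            break


if __name__ == "__main__":
    for page in list_all(list(range(7)), limit=3):
        print(page)
```

Neither side ever holds more than `limit` items per response, which is the point of paging a comprehensive query.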
Finally, another new feature in Kubernetes 1.8 is a stable version of the lightweight container runtime CRI-O. This makes it possible to use OCI-compatible (OCI = Open Container Initiative) containers in Kubernetes, without requiring additional code or other tools.
At present, CRI-O focuses on starting and stopping containers. Although CRI-O has a command line interface, this was only designed to test CRI-O itself and is not suitable for managing containers in a live system. The Red Hat OpenShift Container Platform, for instance, provides an opportunity to create and operate containers via CRI-O.
Open source In-Memory Data Grid (IMDG) company Hazelcast has joined the Eclipse Foundation – and it has done so for a reason.
In particular, Hazelcast will be collaborating with members to popularize JCache, a Java Specification Request (JSR-107).
So what place does JCache fill in the universe then?
In the simplest terms, JCache is the standard caching API for Java. It works to ‘specify API and semantics’ for temporary in-memory caching of Java objects.
The specification covers operations including object creation, shared access, spooling, invalidation and consistency across Java Virtual Machines (JVMs). These operations help scale out applications and manage their high-speed access to frequently used data.
In the Java Community Process (JCP), Hazelcast’s CEO, Greg Luck, has been the co-spec lead (and then, after that, maintenance lead) on “JCache – Java Temporary Caching API” since 2007.
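JCache itself is a Java API, so the following is not JCache code. It is only a small Python sketch of what temporary in-memory caching with invalidation means in practice: entries are stored with an expiry time, served from memory while fresh, and dropped once stale or explicitly removed. The class name, method names and the injectable clock are all assumptions made for this illustration.

```python
import time


class TemporaryCache:
    """Illustrative in-memory cache with time-based expiry (not the JCache API)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (value, expires_at)

    def put(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Stale entry: invalidate it and report a miss.
            del self._entries[key]
            return None
        return value

    def remove(self, key):
        """Explicit invalidation; returns True if an entry was dropped."""
        return self._entries.pop(key, None) is not None


if __name__ == "__main__":
    cache = TemporaryCache(ttl_seconds=60)
    cache.put("user:42", {"name": "Ada"})
    print(cache.get("user:42"))
```

A distributed data grid adds consistency across several JVMs on top of this basic idea, which the single-process sketch does not attempt to show.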
Prior to becoming a Solution Member of the Eclipse Foundation, Hazelcast was already an active member of the Eclipse MicroProfile project. This is a baseline platform definition that optimizes enterprise Java for a microservices architecture and delivers application portability across multiple MicroProfile runtimes.
The initially planned baseline is JAX-RS + CDI + JSON-P, with the intent of the community having an active role in the MicroProfile definition and roadmap.
Community members will continue to work independently, but the MicroProfile project allows collaboration where there is a commonality. Other members include IBM, Red Hat, Tomitribe, Payara, the London Java Community (LJC), SouJava, Hazelcast, Fujitsu, SmartBear and Oracle.
It’s that ‘wonderful’ time of year, when people all across the land exchange presents, meal invitations and predictions for what the open source landscape might look like in the months ahead according to our current understanding of time in relation to space and the wider universe.
SVP of technology at GitHub Jason Warner has cheekily suggested that the cloud model of service based processing, storage and analytics is about to move from being Cloud 1.0 to some higher level notion of the model itself – say, Cloud 2.0, perhaps?
Well, it’s Christmas-Kwanzaa, so why on Earth not?
Warner’s Cloud 2.0 is a magical world of fairies and elves who spend most of their time in the workshop playing with data tools and services that support it, like analytics and machine learning systems.
Also worth looking forward to, whether you have been naughty or nice, is the suggestion that the workflow war will heat up. Warner thinks that solving infrastructure problems and building better workflow tools could be a key focus in the months ahead.
Could we finally see infrastructure have its Ruby on Rails moment? The man from GitHub – he say yes!
“New tools will help developers get their ideas to production faster and save them time turning knobs under the hood. With applications taking some of the infrastructure burden off developers, they’ll be free to focus on the stuff they care about most – building, growing and evolving their projects and products,” said Warner.
Automated secure layers
He also asserts that security needs to be built into code development, not added in production. The suggestion is that we’ll also see the rise of more intelligent systems, eventually culminating in a series of automatically secured layers.
“The fragility of net neutrality and the rise of country-specific data localisation laws will undoubtedly test the resilience not only of the internet—but also the fabric of global society and how businesses work together worldwide,” Warner concludes.
The company has said that 2018 will decide the future of net neutrality – GitHub says it will feel the impact, whatever the outcome.
This is a guest post for the Computer Weekly Open Source Insider blog written by John Pocknell in his capacity as senior product manager at Quest.
Pocknell has been with Quest Software since 2000, working in the database design, development and deployment product areas. He is now responsible for the strategy and roadmap of the Toad portfolio of products worldwide.
Toad Software is a database management toolset from Quest that database developers, database administrators and data analysts use to manage both relational and non-relational databases using SQL.
Pocknell writes as follows…
Over the past few years, there has been a phenomenal adoption of open source databases among enterprises. In fact, Gartner forecasts that by 2018, more than 70% of new in-house applications will be developed on open source database management systems (OSDBMS) — and 50% of existing commercial relational database management system (RDBMS) instances will have been converted or will be in process.
A growing number of businesses are implementing heterogeneous, cloud-based databases to power business-critical applications in finance, CRM, HR, e-commerce, business intelligence and analytics and more.
In many cases, these critical applications are dependent on the database vendor and its related offerings, creating a vendor lock-in situation – right down to management and replication – that does not support a hybrid environment.
DevOps establishes a culture and environment to build, test and release software in a rapid, frequent and reliable fashion by embracing Agile methodologies across the IT teams.
If application updates require changes to the database, however, the DevOps process often breaks down, because databases are historically developed and managed differently due to their complexity, development process and sensitive nature.
Database development also frequently lacks code testing and reviews, source code controls and the ability to integrate with existing build automation and release processes, which are critical to preventing errors impacting production systems.
DBAs need to consider adopting tooling that can help them navigate these potential pitfalls by breaking down the common barriers associated with deploying database changes alongside application changes in the DevOps workflow, by integrating those changes with the continuous integration and continuous delivery aspects of DevOps processes.
Sharpened proven tools
With proven tooling, DBAs can ensure they test the functionality of code to reduce defects during the automated build process, perform static code reviews, and integrate with popular tools such as Jenkins, Bamboo, and Team Foundation Server.
However, developers and database administrators may still have hesitancies about moving to a new platform.
The good news is that resistance to the OSDBMS amongst enterprise organisations is diminishing as CIOs and senior IT managers realise that it is a low-cost yet reliable alternative to the proprietary RDBMS, especially with the advent of better management functions and support.
In short, more enterprise organisations are taking OSDBMS databases such as MySQL and PostgreSQL seriously, so DBAs are less likely to face resistance internally.
At the heart of good database management is the ability to facilitate innovation and reduce the amount of time and resources dedicated to oversight and administration. For this reason, open source databases and DevOps have unleashed immeasurable performance increases for enterprise organisations across the globe, all while saving billions.
In order to embrace open source, DBAs need the tools at hand to ensure they do not become overwhelmed.
Open source cross-platform developer and graphic designer focused user interface specialist Qt (now known as the Qt Company) has this month come forward with Qt 3D Studio.
Emanating from its development bases in Helsinki, Finland and Santa Clara, California, Qt explains that its latest product is a 3D design and development tool for major industrial use cases.
Essentially, this is a 3D human-machine interface (HMI) authoring system.
Now with the contribution of the Nvidia Drive Design Studio into the Qt ecosystem, Qt 3D Studio works to provide a 3D user interface (UI) authoring system.
With the onset of 3D technologies across a wide range of industries, Qt suggests that businesses and developers seek 3D design tools to create the next generation of embedded devices using 3D such as:
- digital cockpits,
- medical devices,
- clinical wearables,
- smart home applications.
The primary features of Qt 3D Studio include real-time editing of the user interface so that fast iterations on desktop and target hardware allow designers to select the best graphically capable hardware to match a UX vision early in the development process.
Developers can use Qt’s libraries to combine 2D and 3D UIs, using one codebase to develop across desktop, embedded and mobile devices.
“Qt 3D Studio opens the doors to the development of cutting-edge and visually immersive user interfaces,” said Lars Knoll, CTO of The Qt Company. “As 3D UI design becomes increasingly integral to the development of today’s screens and devices, designers and developers can now work hand-in-hand to meet this growing need and drive innovation in the automotive, healthcare and industrial automation industries.”
Qt 3D Studio offers a plugin based architecture for rendering, input, materials and effects with access to all source code.
Mozilla is on a mission… and it’s a mission designed to ‘empower’ software application developers with tools to help create more STT apps.
STT you say?
Yes, that would be speech-to-text applications.
More specifically, STT apps typically depend upon voice recognition and deep [machine] learning algorithms, the kind of functionality that is not always available to every coder in every environment… but Mozilla wants to address that fact with a little democratisation.
“There are only a few commercial quality speech recognition services available, dominated by a small number of large companies. This reduces user choice and available features for startups, researchers or even larger companies that want to speech-enable their products and services,” said Sean White, vice president of technology strategy at Mozilla.
This then is DeepSpeech from Mozilla.
The firm has also noted that it is releasing the world’s second largest publicly available voice dataset, which was contributed to by nearly 20,000 people.
The initial release of DeepSpeech sees the company include pre-built packages for Python, NodeJS and a command-line binary that developers can use right away to experiment with speech recognition.
Sexist speech tools
Mozilla laments the fact that too often existing speech recognition services can’t understand people with different accents… and many are better at understanding men than women.
This, the company says, is a result of biases within the data on which they are trained.
“Our hope is that the number of speakers and their different backgrounds and accents will create a globally representative dataset, resulting in more inclusive technologies,” said Mozilla’s White.
Mozilla insists that its approach to developing this technology is open by design and so the firm welcomes more collaborators and contributors.
This is a guest post for the Computer Weekly Open Source Insider blog written by Ben Slater in his capacity as chief product officer at Instaclustr.
Instaclustr positions itself as a firm offering managed and supported solutions for Apache Cassandra, ScyllaDB, Elasticsearch, Apache Spark, Apache Zeppelin, Kibana and Apache Lucene.
Indeed, Instaclustr is known for its willingness to describe itself as a managed open source as a service company, if that expression actually exists.
The original title in full for this piece was: Migrating Your Cassandra Cluster – with Zero Downtime – in 7 Easy Steps.
Slater’s piece is (obviously) directed at companies that are looking to move a live Apache Cassandra deployment to a new location.
With this task in mind, it is (obviously) natural that these same companies will have some concerns, such as how to keep Cassandra clusters 100% available throughout the process.
Slater argues that if your application is able to remain online throughout connection setting changes, it can also remain fully available during this transition.
NOTE: For extra protection and peace of mind, the following technique also includes a rapid rollback strategy to return to your original configuration, up until the moment the migration is completed.
Slater writes as follows:
Here’s a recommended 7-step Cassandra cluster migration order-of-operations that will avoid any downtime:
1) Get your existing environment ready
First of all, make sure that your application is using a datacentre-aware load balancing policy, as well as LOCAL_* consistency levels (for example, LOCAL_QUORUM). Also, check that all of the keyspaces that will be copied over to the new cluster are set to use NetworkTopologyStrategy as their replication strategy. It’s also recommended that all keyspaces use this replication strategy when created, as altering this later can become complicated.
2) Create the new cluster
Now it’s time to create the new cluster that you’ll be migrating to. A few things to be careful about here: be sure that the new cluster and the original cluster use the same Cassandra version and cluster name. Also, the new datacenter name that you use must be different from the name of the existing datacenter.
3) Join the clusters together
To do this, first make any necessary firewall rule changes in order to allow the clusters to be joined, remembering that some changes to the source cluster may also be necessary. Then, change the new cluster’s seed nodes – and start them. Once this is done, the new cluster will be a second datacenter in the original cluster.
4) Change the replication settings
Next, in the existing cluster, update the replication settings for the keyspaces that will be copied, so that data will now be replicated with the new datacenter as the destination.
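As a hedged illustration of this step, the sketch below builds the kind of CQL statement that adds a second datacenter to a keyspace that already uses NetworkTopologyStrategy. The keyspace name, the datacenter names and the replication factors are made-up values for the example, and the helper function is hypothetical; the statement it produces would still need to be reviewed and run through your usual CQL client.

```python
def alter_keyspace_cql(keyspace, replicas_by_dc):
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy.

    replicas_by_dc maps each datacenter name to its replication factor.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc_name, replication_factor in replicas_by_dc.items():
        options.append("'" + dc_name + "': " + str(replication_factor))
    body = ", ".join(options)
    return "ALTER KEYSPACE " + keyspace + " WITH replication = {" + body + "};"


if __name__ == "__main__":
    # Hypothetical names: 'dc_old' is the existing datacenter, 'dc_new' the new one.
    print(alter_keyspace_cql("shop", {"dc_old": 3, "dc_new": 3}))
```

The same helper, called with only the new datacenter in the mapping, gives the statement needed later when replication to the original cluster is switched off.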
5) Copy the data to the new cluster
When the clusters are joined together, Cassandra will begin to replicate writes to the new cluster. It’s still necessary, however, to copy any existing data over with the nodetool rebuild function. It’s a best practice to perform this function on the new cluster one or two nodes at a time, so as not to place an overwhelming streaming load on the existing cluster.
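The advice to rebuild only one or two nodes at a time can be sketched as a simple batching plan. The snippet below only prints the commands rather than running them; the node addresses and datacenter name are invented for the example, and the helper is hypothetical. It assumes the usual `nodetool rebuild -- <source datacenter>` form, run on each node of the new cluster.

```python
def rebuild_batches(new_nodes, source_dc, batch_size=2):
    """Group new-cluster nodes into small batches of (node, command) pairs."""
    batches = []
    for start in range(0, len(new_nodes), batch_size):
        batch = new_nodes[start:start + batch_size]
        # Each node streams its data from the existing (source) datacenter.
        batches.append([(node, "nodetool rebuild -- " + source_dc) for node in batch])
    return batches


if __name__ == "__main__":
    nodes = ["10.0.1.1", "10.0.1.2", "10.0.1.3"]   # hypothetical addresses
    for number, batch in enumerate(rebuild_batches(nodes, "dc_old"), start=1):
        print("batch", number)
        for node, command in batch:
            print("  on", node, "run:", command)
```

Waiting for each batch to finish before starting the next keeps the streaming load on the existing cluster bounded.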
6) Change over the application’s connection points
After all uses of the rebuild function are completed, each of the clusters will contain a complete copy of the data being migrated, which Cassandra will keep in sync automatically. It’s now time to change the initial connection points of your application over to the nodes in the new cluster. Once this is completed, all reads and writes will be served by the new cluster, and will subsequently be replicated in the original cluster. Finally, it’s smart to run a repair function across the cluster, in order to ensure that all data has been replicated successfully from the original.
7) Shut down the original cluster
Complete the process with a little post-migration clean up, removing the original cluster. First, change the firewall rules to disconnect the original cluster from the new one. Then, update the replication settings in the new cluster to cease replication of data to the original cluster. Lastly, shut the original cluster down.
There you have it: your Apache Cassandra deployment has been fully migrated, with zero downtime, low risk and in a manner completely seamless and transparent from the perspective of your end users.
The Apache Software Foundation (ASF) has graduated Apache Impala to become a Top-Level Project (TLP).
Apache Impala itself is an analytic database for Apache Hadoop, the open source software framework used for distributed storage and processing of big data datasets.
This TLP status is intended to signify that the project’s community and products have been well-governed under the ASF’s meritocratic process and principles.
Massively parallel processing
Impala is built with what is known as a massively parallel processing (MPP) SQL query engine. This allows analytical queries on data stored on-premises (in HDFS or Apache Kudu) or in cloud object storage via SQL or business intelligence tools.
“The Impala project has grown a lot since we entered incubation in December 2015,” said Jim Apple, VP of Apache Impala.
In addition to using the same unified storage platform as other Hadoop components, Impala also uses the same metadata, SQL syntax (Apache Hive SQL), ODBC driver and user interface (Impala query UI in Hue) as Hive.
Inspired by Google
Impala was inspired by Google’s F1 database, which also separates query processing from storage management.
“In 2011, we started development of Impala in order to make state-of-the-art SQL analytics available to the user community as open source technology,” said Marcel Kornacker, original founder of the Impala project.
Apache Impala is deployed across a number of industries such as financial services, healthcare and telecommunications, and is in use at companies that include Caterpillar, Cox Automotive and the New York Stock Exchange. In addition, Impala is shipped by Cloudera, MapR and Oracle.