Over the past few weeks, I’ve had the great privilege to partner on a series of roadshows with Gene Kim (@realgenekim), author of “The Phoenix Project,” “The DevOps Handbook” and the annual “State of DevOps Report”. The events are called “Culture, Containers and Accelerating DevOps: the Path to Digital Transformation,” and they provide us with an opportunity to speak with developers, enterprise architects, IT operations engineers and business executives about how they are implementing technology and culture changes to help them deliver software into their markets faster.
During Gene’s presentation, he highlights a series of lessons that he’s learned since writing The Phoenix Project. Some of these include:
- The Business Value of DevOps is higher than expected.
- DevOps is as Good for Ops, as it is for Devs.
- The Importance of Measuring Code Deployment Lead Times.
- The Surprising Implications of Conway’s Law. (organizational structure)
- DevOps is for the Unicorns, and the Horses too.
The lessons are supported by a series of stories, examples and data from businesses that Gene has interacted with over the past 4-5 years, as they navigate their DevOps journey.
What’s the Most Important DevOps Metric?
At some point in every event, someone from the audience will ask the question, “If you had to boil it down to a single thing, what is the most important DevOps metric for us to track?” Gene’s answer is often topic #3 (above), focused on measuring code lead times. It comes from his experience studying the Toyota Production System and its approach to flows of work, managing and correcting defects, and empowering employees to make the ongoing changes needed to improve production of the end product. In essence, he highlights that today’s data centers have become 21st-century bits factories, with the goal of producing high-quality, agile software applications.
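To make the lead-time metric concrete, here is a minimal sketch (in Python, using hypothetical sample timestamps) of how you might compute lead time from code commit to production deploy, and summarize it with a median:

```python
from datetime import datetime
from statistics import median

def lead_times_hours(changes):
    """Lead time per change: commit timestamp -> production deploy timestamp."""
    return [(deployed - committed).total_seconds() / 3600
            for committed, deployed in changes]

# Hypothetical sample data: (commit time, production deploy time) pairs.
changes = [
    (datetime(2017, 6, 1, 9, 0), datetime(2017, 6, 1, 15, 0)),   # 6 hours
    (datetime(2017, 6, 2, 10, 0), datetime(2017, 6, 5, 10, 0)),  # 72 hours
    (datetime(2017, 6, 6, 8, 0), datetime(2017, 6, 6, 20, 0)),   # 12 hours
]

print(median(lead_times_hours(changes)))  # -> 12.0
```

The median (rather than the mean) keeps one slow outlier change from hiding how quickly typical changes ship.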
For the most part, I’d agree with Gene that this is an important metric to track. But with all due respect to his expertise in this area, I believe there is a better metric to track. And Gene actually calls out this concept in his talk: An Organization’s Confidence Level of Shipping Software into Production.
In my opinion, this concept is more appropriate than any specific metric, because it forces the technology team to think about their actions in business terms. It also allows them to have a conversation with the business leaders in a way that is focused on the impact on customers. It directly aligns with every company becoming a software company, and the importance of making software “go to market” a core competency of the business.
It allows the technology team to begin with the end in mind, and work backwards from the goal of safely shipping software into production. Their level of confidence in the end goal will also force them to consider why their current confidence level may not be as high as it could be.
- Are we investing (people, technology, partnerships) for success towards the goal?
- Are we designing our software, our systems and our platforms to handle the emerging needs of the business?
- Are we enabling a culture and set of systems that allow us to learn from our mistakes, and make improvements when needed?
When I think about this concept, I’m encouraged by the level of confidence from John Rzeszotarski – SVP, Director of Continuous Delivery and Feedback – at KeyBank.
John talked about the DevOps journey at KeyBank and how they focused on culture, continuous integration pipelines, automation, containers and their container-application deployment platform. This was a 12-18 month journey, but where they are today is pretty remarkable. He summed up his talk by telling a story about how they recently re-launched services from one of the banks they had acquired. The highlight was that they were able to deploy 10 new updates to the production application, with zero defects, during the middle of the day. That is a very high level of confidence in shipping software into production, and in the elements that make up their DevOps culture.
The KeyBank story is a great example of making a significant impact to the business, and measuring the technology in terms of business success and agility.
NOTE: Below are my slides from the Culture, Containers and Accelerating DevOps roadshows.
This week, I was listening to an episode of the “Speaking in Tech” podcast, and the guest was talking about why he believed that container usage may be overhyped and not necessarily a good thing to expose to developers or operators.
When having these types of discussions, I believe it’s important to look at not just the evolution of container technology, but the evolution of container platforms. From this, we have learned a few valuable lessons:
Give Developers Flexibility
Whether we’re talking about PaaS (Platform as a Service) or CaaS (Containers as a Service) technologies, the end goals are fairly similar – make it simpler for developers to get their software into production in a faster, more stable, more secure way. And as much as possible, hide/abstract away much of the complexity to make that happen.
Early PaaS platforms got part of this equation correct by delivering Heroku-like “push” functionality to developers. Have code, push code, run code. But they also got part of the equation wrong, or at least limited the developer experience too much. In other words, the experience was too opinionated. The early platforms limited which languages could be used by developers. They also forced developers down a path that limited the versions or variants of a language that they could use.
Letting developers use standards-based containers as a packaging mechanism gave them more flexibility than the original PaaS platforms allowed. It allowed for more experimentation, as well as letting developers validate functionality using local resources (e.g. their laptop).
Align Technology and Culture
The guest on Speaking in Tech was correct in saying that no single technology, nor a single culture shift, will give a company a technology advantage in the market. It requires a mix of technology evolution and cultural evolution, geared towards delivering better software for the business. Containers play a role in this. Since the container can be both the unit of packaging and the unit of operations, it begins to create a common language and set of processes for both Developers and Operators. It’s not everything, but it’s a starting point. It needs to be augmented with expertise around automated testing, automated deployments and the CI/CD pipeline tools (e.g. Jenkins) that allow for the consistent movement of software from developer to QA to production, and the ongoing operation of that software. Hiding containers from either developers or operators makes it harder to find the commonality between the evolving technology and culture.
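As a rough illustration of that “consistent movement of software from developer to QA to production,” here is a toy Python sketch of pipeline stages handing the same container artifact forward. The registry name and stage logic are placeholders, not a real Jenkins configuration:

```python
import hashlib

def build(source: str) -> str:
    """Build stage: package source into an (illustrative) image reference."""
    tag = hashlib.sha256(source.encode()).hexdigest()[:12]
    return f"registry.example.com/app:{tag}"

def run_tests(image: str) -> bool:
    """Test stage: stand-in for automated unit/integration tests."""
    return image.startswith("registry.example.com/")

def deploy(image: str, env: str) -> str:
    """Deploy stage: stand-in for rolling the image into an environment."""
    return f"{image} deployed to {env}"

def pipeline(source: str) -> str:
    """Developer -> QA -> production, promoting the same artifact each time."""
    image = build(source)
    if not run_tests(image):
        raise RuntimeError("tests failed; stopping the pipeline")
    deploy(image, "qa")
    return deploy(image, "production")

print(pipeline("app source v1"))
```

The key property is that one immutable image flows through every stage, which is exactly what gives developers and operators a shared vocabulary.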
Container Platforms Have Quickly Evolved
As container usage grows, we’re learning quite a bit about making them work successfully in production. At the core of that learning is the maturity of the underlying container platforms and how to manage them. The reality is that successful container platforms are a combination of the simplicity of PaaS and the flexibility of CaaS. They allow developers to push code, binary images and containers into the platform (directly or via CI/CD integrations) and they give operators a stable, multi-cloud platform to run those applications. We’ve seen the number of developers working on platforms like Kubernetes grow significantly, and businesses are adopting it around the world. And with the evolution of Kubernetes Federation, we’ll begin to see even greater adoption of truly hybrid and multi-cloud environments for businesses.
Containers have experienced a meteoric rise in interest from developers over the last few years. They’re enabling greater flexibility for developers, bringing together developer and operations teams with common technology, and enabling multi-cloud deployments that are expanding the interactions between companies and their marketplaces.
For the last few weeks, I’ve been traveling quite a bit, so I’ve spent a decent amount of time on airplanes. When airplane WiFi is poor (which is quite frequent), I pass the time watching movies. Lately, I’ve been watching the excellent “Long Strange Trip” documentary on Amazon Prime, about the history of the Grateful Dead. If you like music, or history, or just enjoy good storytelling, I highly recommend the series.
Coming up on my 1yr anniversary of working at Red Hat, it struck me how many parallels there are between the evolution of the Dead and how open source software communities evolve [LSD trips and being under the constant influence of drugs excluded]. The Grateful Dead have often been characterized as “a tribe of contrarians who made art out of open-ended chaos”. That phrase could easily apply to many open source communities.
[Episode 1] Committed to Constant Change
The Grateful Dead are known as a touring band, not one that spent time focused on commercial success via studio albums. Like open source software, their music was constantly evolving, and it was interpreted differently by nearly everyone that saw them perform live. As their output began to slow, their model was “forked” and replicated by touring bands like Phish and Widespread Panic. Similarly, open source software is less about a single project than a style of development and collaboration that is constantly evolving, with its principles being copied (and evolved) by many other projects.
[Episode 2] Finding Success on Their Own Terms
While the record labels wanted them to conform to their recording and sales models that were used by most other bands, the Grateful Dead decided to adopt alternative business models. At the time, selling albums would have been more profitable, but they were actually ahead of their time in focusing on live events and allowing their music to be fragmented and easily copied (bootleg tapes). Similarly, many analysts would like to see open source companies deriving revenues in similar ways to proprietary companies, but that model hasn’t been fruitful. Successful open source companies have adopted support models and SaaS models to drive revenues and success.
[Episode 3] Let’s Join the Band
While the Grateful Dead had 5 or 6 original members, the documentary highlights how Donna and Keith Godchaux “just decided to learn the music and join the band” in 1971. Random fans of the Dead actually joined the band and stayed with them for many years. This is not unlike how anyone can join an open source project just by showing interest and making a meaningful contribution.
[Episode 4] Who’s In Charge Here?
For many people, the connection between Linus Torvalds and the Linux project is the model that they expect all open source projects to have. They expect a BDFL (Benevolent Dictator for Life). In most projects, the BDFL role doesn’t really exist. There might be strong leaders, but they realize that broader success needs many leaders and tribes to emerge. This same dichotomy emerged for the Grateful Dead, where Jerry Garcia was the visible leader, but he didn’t want to set all the rules for how the band (or their audience) needed to behave.
[Episode 5] and [Episode 6] I’ve yet to see these episodes (they’re saved for the next airplane flights), but judging from the previews, they appear to have similar open source parallels. They focus on the growing success of the band and how people set higher expectations than the band wanted to take on themselves. This can often happen with successful projects, where commercial expectations begin to drift from core community expectations. This is where strong leadership is needed just as much as in the early days of a project.
If you’re interested in open source software, or some insight into how communities ebb and flow, I highly recommend this documentary. And the music is obviously great too.
This past week, Walmart issued a statement to their retail partners, suggesting that they should not run their technology stacks on the AWS cloud. This is not an unprecedented move for Walmart, which has for many years required that partners have a physical presence in Bentonville, AR (Walmart HQ), in order to simplify meetings and reduce travel costs for Walmart.
It’s understandable that Walmart wants to keep valuable information about their business trends and details about their partners away from AWS (and indirectly, Amazon). This is not to imply (in any way) that customer data is collected by AWS, but there is no way to determine how much meta-information AWS can collect about usage patterns that could influence the services they offer.
What’s interesting about this statement from Walmart is that they don’t offer a Walmart-branded hosted cloud alternative to AWS. This brings up an interesting dilemma:
- Does this create a unique opportunity for the Azure cloud or Google cloud?
- Does Walmart have concerns about Google’s alternative businesses (e.g. Alphabet) collecting data patterns about their partners?
- Will Walmart partners be swayed by this edict, especially given Amazon’s growing market share in retail?
- Will this force Walmart to get into the hosted cloud business? Do they keep enough cash on their balance sheet to compete in that market?
Back in December, I predicted that the Trump administration would pick a fight with Amazon, as proxy for Jeff Bezos’ ownership of the Washington Post. That hasn’t materialized yet, although the year is only half way complete.
This action by Walmart ultimately brings up the question: can non-traditional tech companies begin to impact AWS in ways that traditional tech companies have been unable to do – e.g. slow down AWS growth? The reach of companies such as HPE hasn’t been able to slow it down, but maybe Walmart’s massive reach can have a different impact on the market. It will be interesting to see if Walmart reports this in their quarterly reports, or begins to make this a public issue through their Office of the CTO.
Beyond Amazon vs. Walmart, this brings up yet another interesting question – will we see existing companies with large ecosystems or supply chains (e.g. automotive, healthcare, etc.) apply cloud guidance to their partners (e.g. must use XYZ cloud), or has the world of APIs completely changed what a modern supply chain looks like? The concept of “community clouds” has never really taken off in practice.
This past week, we did some reflection on The Cloudcast about the evolution of technology over the last 6+ years. One of the topics we discussed was the impact that OpenStack has had on the industry. People have various (strong) opinions about the level of success that OpenStack has achieved, but we discussed how OpenStack changed the IT landscape in a number of significant ways.
Announced and launched in 2010, OpenStack was designed to deliver an API-driven cloud infrastructure, similar to AWS EC2 compute and S3 storage. At the time, there was a split about whether the project(s) would focus on being a VMware replacement, or an open version of AWS services. This was heavily debated by groups focused on both agendas.
Software Defined Infrastructure
While OpenStack was by no means the first implementation of software-defined infrastructure services (networking, storage, firewall, proxy, etc.), it was the first significant time this approach to technology was embraced by Enterprise-centric vendors. Until then, those vendors had continued to provide hardware-centric offerings that complemented products like VMware virtualization. Since then, API-centric infrastructure has become more commonplace in the Enterprise, especially with the emergence of containers and container platforms.
Open Source in the Enterprise
While companies like Red Hat, SUSE and Canonical had been selling commercial open source to the Enterprise for many years, OpenStack was the first time that companies like Cisco, HPE, NetApp, EMC and many others were attempting to combine proprietary and open source software into their go-to-market offerings. Since then, more IT vendors have been building open source offerings, or partnering with open-source-centric companies, to bring offerings to market for customers that demand open-first software.
Who’s in Charge of OpenStack?
While Rackspace may have wanted to leverage all the engineering talent to take on AWS, it wasn’t able to maintain ownership of the project. The OpenStack foundation was an early attempt at trying to bring together many competing vendor interests under a single governance model. Critics would argue that it may have tried to take on too many use-cases (e.g. PaaS, Big Data, DBaaS) and projects in the early days, but the project has continued to evolve and many large cloud environments (Enterprise, Telco) are running on OpenStack.
Since the creation of the OpenStack Foundation, several other highly visible open source projects have created independent foundations to manage the governance of the projects (e.g. CNCF, Cloud Foundry, etc.)
Founders Don’t Always Make the Big Bucks
While OpenStack was viewed as a disruptive threat to the $1T Enterprise infrastructure industry, and heavily funded by venture capital, most of the founding individuals didn’t make out in a big way financially. Piston Cloud and Cloudscaling were sold to Cisco and EMC, respectively, with relatively small exits. SwiftStack has pivoted from just supporting OpenStack to also supporting multiple public cloud storage APIs and software-defined storage use-cases. Nebula went bankrupt. Even Mirantis has moved their focus over to Kubernetes and containers. Ironically, Red Hat has become the Red Hat of OpenStack.
Most tech events that I attend are fairly positive events, with people talking about new technologies and how these might “change the world”. The pushback on most talks is about the viability of the technology, or who would actually attempt to use that technology in production.
But a couple weeks ago at Interop, I experienced a much different vibe in several of the cloud computing talks I attended: people in the audience were asking how this technology would replace their jobs and what they could do to prevent it.
We’ve Seen this Before
Now, this isn’t really a new sentiment. We heard it from mainframe and mini admins when open systems and client-server computing were introduced. We heard it from telecom admins when voice-over-IP was introduced. And we heard it from various infrastructure teams when virtualization and software-defined infrastructure were introduced.
What seemed different about the concerns at this event was that most of the people asking questions didn’t believe they’d ever get the opportunity to expand their current skills at their current employer. In essence, they were saying, “I don’t doubt that DevOps or Public Cloud or Cloud-native apps will happen; we just don’t see how they’ll happen via the IT organization at our company.”
I’ve written before about how learning new technologies has never been more accessible (here, here, here). But I also realize that many people aren’t going to take the time to learn something new if it can’t be immediately applied to their current job. It’s sort of like taking classes in a foreign language, but not having anyone to practice your new language with.
Do we need more IT Admins?
During one of the sessions, Joe Emison (@joeemison) made the point that while developers are driving more changes within IT today, developers aren’t very good at many of the tasks that IT admins typically perform. But this is leading them to leverage more and more public cloud services (see chart).
It was a sobering slide for those in attendance, especially those who had spent many years building up those skills. There was also a realization that they were part of IT organizations that had never really been measured or incentivized to optimize for speed, but rather to optimize for cost reduction and application uptime.
Double down on developers?
There really weren’t many answers for people asking about their future in a world of DevOps, Public Cloud, Automation and more focus on developing and deploying software quickly. Most answers were focused on learning the software skills necessary to program something – whether it was an application or the automation tools needed to stand up infrastructure/security/CI pipelines quickly. Those might not have been the answers that IT admins wanted to hear, but they are the answers that provide some path forward. Answers that tell people to do nothing, or to just wait for the future to change, probably aren’t going to create the future that people in the audience had hoped for.
This past week I had the opportunity to present a session entitled “Managing Containers in Production: What you need to know” at the Interop conference in Las Vegas. In addition to the talk, I had the opportunity to watch several other presentations about containers and cloud-native applications. One session was focused on “The Case for Containers: What, When, and Why?”. It was primarily focused on Containers 101 and some examples of how you might run containers on your local machine. It highlighted for me three distinct differences between running containers locally and running them in production.
Local Containers vs. Container Platforms
One of the discussion points was getting from running a single container to running the several containers that make up an application, or several interconnected services. The suggestion was that people can just use the built-in “Swarm Mode” to interconnect these containers into clusters. While this is true, the session failed to mention the more popular way to do this: using Kubernetes. A member of the audience also asked if this could create a multi-tenant environment for their business, and they were told that there were no multi-tenant technologies for containers. It’s true that Swarm Mode does not natively support multi-tenancy. But it is incorrect that multi-tenancy isn’t supported for containers: Red Hat OpenShift delivers a multi-tenant environment for containers (via projects, etc.), built on top of Kubernetes.
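To illustrate what namespace-based multi-tenancy (the model behind OpenShift projects and Kubernetes namespaces) buys you, here is a toy Python model. The Cluster class, team names and quota logic are purely illustrative, not a real Kubernetes API:

```python
class Cluster:
    """Toy model: namespaces (OpenShift 'projects') scope names and quotas."""

    def __init__(self):
        self.namespaces = {}

    def create_namespace(self, name, pod_quota):
        """Each tenant gets its own namespace with its own resource quota."""
        self.namespaces[name] = {"quota": pod_quota, "pods": []}

    def run_pod(self, namespace, pod_name):
        """Pods are scheduled within a namespace, bounded by its quota."""
        ns = self.namespaces[namespace]
        if len(ns["pods"]) >= ns["quota"]:
            raise RuntimeError(f"pod quota exceeded in namespace {namespace}")
        ns["pods"].append(pod_name)

cluster = Cluster()
cluster.create_namespace("team-a", pod_quota=2)
cluster.create_namespace("team-b", pod_quota=2)
cluster.run_pod("team-a", "web")  # the same pod name is fine in
cluster.run_pod("team-b", "web")  # another tenant's namespace
```

The point of the sketch: two tenants can use identical names and stay within separate quotas, which is the isolation property the audience member was asking about.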
Docker Hub vs. Managed Container Registries
Throughout the talk, the speaker used Docker Hub as the source for all container images. While Docker Hub has done a great job of bringing together the containerized applications of ISVs and independent engineers, it does have its challenges. First, several independent studies have shown that many images on Docker Hub have known security vulnerabilities or viruses. This means that it’s important to know the source of container images, as well as to have a mechanism to scan (and re-scan) any images you use in your environment. Second, Docker Hub is a registry located across the Internet from your environment. What will you do if Docker Hub isn’t reachable from your application pipeline? This leads many companies to look at using local container registries, not only to improve availability but also to manage bandwidth requirements, which can be high for large container images. It also allows companies to better manage image sources (e.g. a corporate standard for trusted images) and scanning capabilities.
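One common mitigation for the trust problem is pinning images by content digest rather than by a mutable tag, so any change in image content is detectable. Here is a minimal Python sketch of the idea; the blob contents are placeholders:

```python
import hashlib

def image_digest(blob: bytes) -> str:
    """Registries address image content by SHA-256 digest, not by mutable tag."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()

def verify(blob: bytes, pinned_digest: str) -> bool:
    """Re-check pulled content against the digest pinned in your manifest."""
    return image_digest(blob) == pinned_digest

blob = b"example image content"
pinned = image_digest(blob)              # recorded at build/scan time
print(verify(blob, pinned))              # -> True: content unchanged
print(verify(b"tampered blob", pinned))  # -> False: any change is caught
```

A tag like `latest` can silently point at new (and possibly vulnerable) content; a digest cannot, which is why scan results are meaningful only when tied to a digest.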
Aligning Container OS vs. Host OS
A final topic that came up as a result of an audience question was whether or not you should align the base Linux image in the container with the OS in the host where the container is running. This is an important topic to discuss because containers are a core element of the Linux operating system. In essence, they divide the Linux running on the host into two sections: container image and container host.
For an individual’s machine, it may not matter whether the container base image and the host OS are aligned. Misalignment can easily happen if you’re using the defaults in a tool like Docker for Windows/Mac (e.g. LinuxKit or Alpine Linux) and the popular images from Docker Hub (e.g. Ubuntu Linux). But as this moves into a production environment, alignment becomes more critical. There are many elements to Linux containers and Linux hosts: there can be differences between versions of an OS, versions of the Linux kernel, and the libraries included with each one. These differences can introduce security vulnerabilities or a lack of functionality.
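A simple way to check for alignment is to compare the /etc/os-release data from the container base image and the host. Here is a rough Python sketch using illustrative sample values (not read from real machines):

```python
def parse_os_release(text: str) -> dict:
    """Parse /etc/os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        if "=" in line and not line.startswith("#"):
            key, _, value = line.partition("=")
            info[key.strip()] = value.strip().strip('"')
    return info

# Illustrative sample values for a host OS and a container base image.
host = parse_os_release('NAME="Red Hat Enterprise Linux"\nVERSION_ID="7.3"')
image = parse_os_release('NAME="Alpine Linux"\nVERSION_ID="3.5"')

aligned = (host["NAME"], host["VERSION_ID"]) == (image["NAME"], image["VERSION_ID"])
print(aligned)  # -> False: the base image and host OS differ
```

In a real environment you would read the host’s /etc/os-release directly and the image’s copy via your build tooling, then flag mismatches in CI before they reach production.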
Overall, it’s great to see container topics being widely discussed not only at DevOps- and Developer-centric events, but also at Infrastructure-centric events like Interop. But it’s important that we discuss not only the basics, but also how the emerging best practices get put into production in a way that not only benefits developers and their applications, but also gives operators and infrastructure teams a model to keep those applications running and secure.
Next week at Interop 2017 in Las Vegas, I’m giving a talk about managing containers. The focus of the talk is to look at the expanded interactions that are required as engineers move from having a single container, running on their laptop, to moving it into production. It looks at how much developers need to know about containers to get their applications working, and what operations teams need to plan for in terms of container scheduling, networking, storage and security.
Breaking down the talk, there are three critical messages to take away.
The Need for Container Platforms
Platforms that manage containers have been around for quite a while (the artist formerly known as “PaaS”), just like Linux containers have been around for much longer than Docker. But as containers become more popular with developers, as the native packaging mechanism for applications, it becomes increasingly important that operations teams have the right tools in place to manage those containers. Hence the need for container platforms, and the emergence of technologies like Kubernetes.
The Developer Experience Matters
As platforms transition from PaaS to CaaS, or some combination of the two, it’s important to remember that the container is just a packaging mechanism for applications. It’s critical to make sure that developers are able to use the platform to rapidly build and deploy applications. This could mean that they package the application on their laptop using a container, or push their code directly into a CI/CD pipeline. In either case, the container platform must be able to take that application and run it in a production environment. The platform shouldn’t restrict one development pattern or another.
Operational Visibility is Critical
While containers bring some interesting properties around packaging and portability, it’s important for operational teams to realize that they have different characteristics from virtual machines. Containers may run for a few seconds or for long periods of time. This means that the management, monitoring and logging tools have to be re-thought in order to be valuable in a container-platform environment.
The discussions and sessions at ServerlessConf Austin ’17 were a good mix of emerging technology and existing use-cases. The high level message is that serverless allows developers to focus entirely on their applications and all (OK, most) of the infrastructure and operations challenges get handled by the underlying services.
But as with any emerging technology, after a year+ of learnings, the focus of “what’s next?” begins to evolve. In discussions with several early-stage users, they said that there are two big areas they expect to see highlighted in 2017/2018:
- The evolution of frameworks from being focused on functions to being focused on events and connected services. In particular, the Serverless Framework was mentioned as one that will need to evolve its focus from functions to events.
- The set of corner-cases and advanced use-cases that can’t be addressed with current services and tools.
From Functions to Events – The Natural Platform Evolution
If we go back a couple of years to when AWS Lambda was first announced, it had somewhat limited functionality and limited language support. Fast forward to today, and the number of other AWS services that can interact with Lambda has grown significantly, as has the number of supported languages. Services like API Gateway, Kinesis and CloudFormation are now capable of invoking Lambda functions.
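A function in this model is just a small handler invoked with an event payload. As a rough sketch, here is a minimal Python handler in the Lambda style, responding to a hypothetical API Gateway style proxy event (the field names beyond the basic shape are illustrative):

```python
import json

def handler(event, context):
    """Minimal Lambda-style function for an API Gateway style proxy event."""
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Invoking the function locally with a sample event (context is unused here):
response = handler({"body": json.dumps({"name": "serverless"})}, None)
print(response["statusCode"])  # -> 200
```

The same handler could be wired to different event sources; only the shape of `event` changes, which is exactly why the focus is shifting from the function code to the events and connectors around it.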
Moving outside the AWS ecosystem, we now see Azure beginning to expand not only its Azure Functions capabilities, but also the tools that connect Azure Functions to other services. At ServerlessConf, it was highlighted that Azure Logic Apps now supports over 120 connectors to a mix of Azure and 3rd-party services.
As we move from being focused on the code within a function to the connections with other services, the mindset begins to change from “solving technical challenges” to “thinking about business logic”. With the broad range of connectors becoming readily available in the marketplace, across many services, it’s not hard to imagine a line-of-business leader being able to bring together a new application to address a business concept with very limited effort. It begins to democratize the development process.
This move from thinking about functions to thinking in terms of events and connectors also creates the possibility for more asynchronous business interactions. Instead of thinking about the end-to-end process of a transaction or process, the steps of interaction can be asynchronous and more loosely coupled. Or they can become more micro-interactions, eventually allowing a more customized experience to be tailored to individual customers.
There are still many use-cases and corner-cases to be worked out as serverless moves the focus from functions to events, but it is definitely an area that the community at ServerlessConf was spending quite a bit of time thinking about and discussing possibilities.
This past week was ServerlessConf in Austin. The event has now done a world tour over the last year, starting in New York in 2016 and coming back to the states in 2017.
We have been following the serverless space for over a year now (see @serverlesscast), but this was the first time we had a chance to attend an event in person. I went down there looking to better understand four key areas:
- The Community – Who is attending, what are they working on, and what level of progress have they made?
- The Market – Is this an area where people are using the technology for real business challenges, and is there a market to make money from it?
- Cloud options vs. Open options – Most of the discussion about serverless so far has been about services delivered via public clouds (e.g. AWS Lambda, Azure Functions, Google Firebase, Auth0, Netlify, etc.) – are open alternatives emerging?
- Market Investment – Is this a market where a lot of VC money has already been placed, or is funding coming from somewhere else?
The events are hosted by A Cloud Guru, which not only does training for cloud services such as serverless, but has also built its entire business on serverless technologies. They have done a great job of building the community and attracting end users who are building interesting things with serverless.
The show attracted about 250 people for the Day 1 hands-on training sessions, and almost 450 people for the Day 2/3 talks and networking. The audience still seems to be people that are doing early-stage projects with serverless, and it hasn’t (yet) been overtaken by the vendors. Nearly everyone in attendance was working on some serverless project for their business, from large (beta) rollouts to entire businesses being delivered via serverless architectures.
Serverless is still in its early days, as we don’t have any companies publicly announcing their revenues at this point. AWS is currently the leader, as their AWS Lambda service has been in the market the longest and attracted the early-adopter following. But Microsoft Azure Functions had a very strong presence, showcasing not only Azure Functions but also a broad range of event connectors through Azure Logic Apps. Google has a split portfolio between Firebase, which has been a Backend-as-a-Service for a while, and the beta Google Cloud Functions. Application frameworks like the Serverless Framework and Go Sparta are also starting to emerge.
There were several businesses talking about their real-world use cases, including consumer companies like iRobot and large Enterprise companies like Accenture.
Cloud options vs. Open options
Most of the event was focused on cloud services for serverless, but a few open options are beginning to emerge in the market. IBM has donated OpenWhisk to the Apache Software Foundation. Kubernetes has a few projects (Funktion, Fission, Kubeless) that are maturing for on-premises or cloud deployments. Also, a new project from stdlib was announced, called FaaSLang. Since one of the key value points of Serverless/FaaS is paying only for usage, it will be interesting to watch whether any on-premises or open source offerings catch hold in the market.
The amount of VC funding that was present, via startup funding, wasn’t very large at ServerlessConf this week. There were smaller companies such as IOpipe, stdlib, Fauna, A Cloud Guru and Serverless Framework, but none of them had raised a large amount of money yet. The market is still trying to figure out how quickly developers will be attracted to this new mode of developing applications, and if there will be any white space left after the public cloud providers begin to roll out more serverless tools and services.