What the Heck is a Container, and Why Do I Care?

By Chris McBeth, Senior Solutions Architect, cStor

Kubernetes is a vendor-agnostic cluster and container management tool, open-sourced by Google in 2014. It provides a “platform for automating deployment, scaling, and operations of application containers across clusters of hosts.” Above all, Kubernetes lowers cloud computing costs and simplifies operations and architecture.

So what the heck does all that mean?

Below, I’ll explain Kubernetes from a high level, and answer the following questions:

What is Kubernetes, and what does it do?
Why should people use it?
What does orchestration mean?
Why do people use containers?
Why should IT administrators and leadership care?

Kubernetes and the Need for Containers

Before we explain what Kubernetes does, we need to explain what containers are and why people are using them. At a very basic level, containers are individual, self-contained, platform-agnostic application environments, and Kubernetes is the overarching management tool that helps you keep track of and manage all of them.

A container is like a mini virtual machine. It is small because it doesn’t carry device drivers and all the other components of a regular virtual machine (VM) or a full operating system of its own; instead, it shares the kernel of the host it runs on. Docker is by far the most popular container platform; it is written in Go and was originally built for Linux. Microsoft has even added containers to Windows because they have become so popular. This is particularly interesting to loyal Microsoft users because the Windows operating system is undeniably bloated, since it contains code and drivers to make it as universally compatible as possible.

The best way to illustrate why containers are useful and important is through an example. Suppose you want to install a simple web server on a Linux platform. You have several ways to do that. First, you could install it directly on a physical server’s OS, or more likely, install it inside a virtual machine with a pre-loaded operating system like Windows, or in this case, Linux.

A traditional VM contains a full-blown operating system, including all the associated drivers and libraries that make it extremely versatile and robust. So, what could be bad about that? The fact that it’s a full-blown OS (VM OS) sitting on top of another full-blown OS (host OS) is a huge waste of resources and disk space. In many cases, it also adds unnecessary cost.

Sometimes, you don’t need all that extra “stuff,” not to mention the licensing for the hypervisor as well as for the OS on each VM, which can get expensive if you’re using VMware, Windows or other commercial-grade hypervisors and OSs. There’s also the issue of size: the bigger a VM is, the harder it is to deploy elsewhere, back up and replicate for resiliency and disaster recovery (DR).

So, setting up a new virtual machine to perform a specific function can require a fairly significant amount of administrative effort and cost, and consume considerable resources on a physical server or virtual host that otherwise could be used to run other applications and workloads.

Besides consuming time and unnecessary resources, single-use VMs are usually underutilized if you dedicate them to just one task, which is how a lot of IT admins like to deploy them. They do it that way because it enables them to deploy copies of that application on other servers without having to re-install it over and over. Copy, paste and activate! That is the value of virtualization in a nutshell!

Putting a single application on a server (virtual or physical) also isolates its operating environment. This makes it much easier to troubleshoot problems if and when they occur since the problem can only be caused by one or two code-bases (app and OS) vs. several potential sources of problems (multiple app code-bases installed on one VM).

For efficiency purposes, it would be better to load that one VM with a web server, messaging software, DNS server and several other useful services to increase its utility, if you had the ability to do so. However, doing so makes the VM harder to support and probably less useful for copying.

The people who invented containers thought through these issues and reasoned that since many applications just need some bare minimum operating system code to run, why not make a stripped-down version of an OS, place the application inside that, and run it? Then you have a self-contained, machine-agnostic unit that can be installed anywhere. In other words, it’s a micro-VM that can “live” inside other standard VMs alongside many other micro-VMs, or even on bare metal servers with no hypervisor! It’s scalable, utilizes physical and logical resources in a vastly superior way, and enables true portability for the containerized workloads!

Some say containers have become so popular that they will eventually render standard VMs obsolete. That’s probably a bit of an exaggeration, but who’s to say what will happen over the next 20 or 30 years? It’s not too much of a stretch to evolve current enterprise VM environments to the point where the actual virtual machine is “abstracted” from the underlying application. After all, VMs abstract the physical infrastructure from the OS, so it’s only logical to take that to the next level and abstract the VM and OS from the applications.

Docker Hub

Making the container small is not the only advantage. The container can be deployed just like a VM template, meaning once an application is “containerized” it’s ready to go and requires little or no configuration regardless of where its container host is running (e.g. a Linux server, a Windows or macOS workstation, or a public cloud).

There are thousands of pre-configured Docker images in the Docker Hub public repository. There, people have taken the time to assemble open-source software configurations that might take someone else hours or days to put together. People benefit from that because they can install these containers by simply downloading them, turning them on and away they go!

On the Need for Orchestration

Now, there is an inherent problem with containers, just like there is with virtual machines: the need to keep track of them. When public cloud companies bill you for CPU time or storage, you need to make sure you do not have any orphaned machines spinning out there doing nothing. Plus, there is the need to automatically spin up more machines when a workload needs additional memory, CPU or storage, as well as shut them down when the load lightens.

Orchestration tackles these problems. This is where Kubernetes comes in.
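
To make those two chores concrete, here is a minimal Python sketch of what an orchestrator checks for: machines that look orphaned, and whether the fleet should grow or shrink. The `Machine` type, the function names and every threshold are assumptions made up for illustration; this is not how Kubernetes itself is implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Machine:
    name: str
    cpu_percent: float  # average CPU utilization over some recent window


def find_idle(machines: List[Machine], idle_below: float = 5.0) -> List[str]:
    """Names of machines that are barely used and may be orphaned."""
    return [m.name for m in machines if m.cpu_percent < idle_below]


def scaling_action(machines: List[Machine],
                   scale_up_above: float = 80.0,
                   scale_down_below: float = 20.0) -> str:
    """Decide whether the fleet should grow, shrink, or stay as it is."""
    if not machines:
        return "scale up"
    average = sum(m.cpu_percent for m in machines) / len(machines)
    if average > scale_up_above:
        return "scale up"
    # Never shrink below one machine in this sketch.
    if average < scale_down_below and len(machines) > 1:
        return "scale down"
    return "hold"


fleet = [Machine("web-1", 92.0), Machine("web-2", 88.0), Machine("old-test", 1.0)]
print(find_idle(fleet))       # machines worth a second look
print(scaling_action(fleet))  # what to do about overall load
```

A real orchestrator runs checks like these continuously and acts on the result for you, which is the whole point.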

Kubernetes

Google built Kubernetes on roughly a decade of experience running containers in production with its internal cluster manager, Borg. That this approach has been used to run Google’s massive systems for that long is one of its key selling points. In 2014, Google released Kubernetes as open source. Kubernetes is a cluster and container management tool. It lets you deploy containers to clusters, meaning a network of virtual or physical machines, and it works with different container runtimes, not just Docker.

The basic idea of Kubernetes is to further abstract machines, storage and networks away from their physical implementation. So, it is a single interface to deploy containers to all kinds of clouds, virtual machines and physical machines.

Here are a few Kubernetes concepts to help you understand what it does.

Node

A node is a physical or virtual machine. It is not created by Kubernetes. You create those with a cloud operating system, like OpenStack or Amazon EC2, or manually install them. So, you need to lay down your basic infrastructure before you use Kubernetes to deploy your apps. However, from that point, it can define virtual networks, storage, etc. For example, you could use OpenStack Neutron or Romana to define networks and push those out from Kubernetes.

Pods

A pod is one or more containers that logically go together. Pods run on nodes, and the containers in a pod run together as a logical unit, so they share the same context. They all share the same IP address, can reach each other via localhost, and can share storage. All the containers in a pod run on the same node; a single pod does not span machines. One node can run multiple pods.
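
As a rough mental model, the relationship between nodes, pods and containers can be sketched with a couple of Python data classes. The class names, fields and sample values below are hypothetical and only mirror the description above; they are not Kubernetes API objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pod:
    name: str
    ip: str                 # one IP address for the whole pod
    containers: List[str]   # names of the containers grouped in this pod

    def container_addresses(self) -> Dict[str, str]:
        """Every container in the pod is reachable at the pod's single IP."""
        return {container: self.ip for container in self.containers}


@dataclass
class Node:
    name: str
    pods: List[Pod] = field(default_factory=list)

    def schedule(self, pod: Pod) -> None:
        """Place a pod (and therefore all of its containers) on this node."""
        self.pods.append(pod)


node = Node("node-1")
node.schedule(Pod("web", "10.0.0.5", ["nginx", "log-shipper"]))
node.schedule(Pod("cache", "10.0.0.6", ["redis"]))

for pod in node.pods:
    print(pod.name, pod.container_addresses())
```

Both containers in the `web` pod report the same address, while the `cache` pod on the same node gets its own.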

Pods are cloud-aware. For example, you could spin up two Nginx instances and assign them a public IP address on the Google Compute Engine (GCE). To do that, you would start the Kubernetes cluster, configure the connection to GCE, and then type something like:
kubectl expose deployment my-nginx --port=80 --type=LoadBalancer
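
The `kubectl expose` command above creates a Kubernetes Service object behind the scenes. As a hedged sketch, the Python below builds a roughly equivalent Service manifest and prints it as JSON, which `kubectl apply -f` also accepts. The helper name and the `app: my-nginx` selector label are assumptions; the real selector depends on how the deployment’s pods are labeled.

```python
import json
from typing import Dict


def load_balancer_service(name: str, port: int, selector: Dict[str, str]) -> dict:
    """Build a Service manifest that exposes matching pods via a cloud load balancer."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            "selector": selector,  # which pods receive the traffic
            "ports": [{"protocol": "TCP", "port": port, "targetPort": port}],
        },
    }


# Assumption: the my-nginx pods carry the label app=my-nginx.
manifest = load_balancer_service("my-nginx", 80, {"app": "my-nginx"})
print(json.dumps(manifest, indent=2))
```

Saving that output to a file and applying it should have much the same effect as the one-line command, with the advantage that the definition can be kept in version control.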

Deployment

A deployment manages a set of identical pods. It ensures that a sufficient number of pods are running at one time to service the app, and pods that are not needed are shut down. For example, the number of pods can be adjusted automatically by looking at CPU utilization.
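
To give a feel for scaling on CPU utilization, here is a small Python sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The function names, default limits and sample numbers are assumptions for illustration, and the real autoscaler applies extra rules (tolerances, stabilization windows) that are left out here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_cpu: float,
                     target_cpu: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Scale the pod count in proportion to how far CPU is from its target."""
    raw = math.ceil(current_replicas * current_cpu / target_cpu)
    # Clamp to the configured floor and ceiling.
    return max(min_replicas, min(max_replicas, raw))


def pods_to_change(running: int, desired: int) -> int:
    """Positive: start this many pods. Negative: shut this many down."""
    return desired - running


running = 2
target = desired_replicas(running, current_cpu=90.0, target_cpu=50.0)
print(target, pods_to_change(running, target))
```

With two pods averaging 90% CPU against a 50% target, the calculation asks for four pods, so two more would be started.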

Vendor Agnostic

Kubernetes works with many cloud and server products, and the list keeps growing because so many companies contribute to the open-source project. Even though Google invented it, Google is said not to dominate its development.

To illustrate, the OpenStack service that provides block storage is called Cinder, and OpenStack orchestration is called Heat. You can use Heat with Kubernetes to manage storage with Cinder.

Kubernetes works with Amazon EC2, Azure Container Service, Rackspace, GCE, IBM Software and other clouds. It works with bare metal (using something like CoreOS), Docker, and vSphere. It also works with libvirt and KVM, which turn Linux machines into hypervisors (i.e., platforms to run virtual machines).

Use Cases

So, why would you use Kubernetes on, for example, Amazon EC2 when Amazon has its own tool for orchestration (CloudFormation)? Because with Kubernetes you can use the same orchestration tool and command-line interfaces for all your different systems. Amazon CloudFormation only works within Amazon’s own cloud. So, with Kubernetes, you could push containers to the Amazon cloud, your in-house virtual and physical machines and other clouds.
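
One way to picture “the same tool everywhere” is the short Python sketch below, which builds an identical `kubectl apply` command for several clusters and changes only the `--context` flag. It assumes each cluster is already registered as a context in your kubeconfig; the context names and the manifest filename are hypothetical, and the commands are only printed, not executed.

```python
from typing import List


def apply_everywhere(manifest_path: str, contexts: List[str]) -> List[List[str]]:
    """Build one kubectl command per cluster; only the context name differs."""
    return [
        ["kubectl", "--context", context, "apply", "-f", manifest_path]
        for context in contexts
    ]


# Hypothetical kubeconfig contexts: one per environment.
commands = apply_everywhere("my-nginx.json", ["aws-prod", "onprem-lab", "gce-dev"])
for command in commands:
    print(" ".join(command))
```

Each printed line could be run as-is, or handed to `subprocess.run`, to push the same container definition to a different cloud or data center.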

Wrapping Up

In summary, Kubernetes is an orchestration tool for containers. What are containers? They are small virtual machines that contain ready-to-run applications on top of other virtual machines or any host OS. They greatly simplify deploying applications, and they make sure machines are fully utilized. All of this lowers the cost of IT operations and eases the burden when architecting, testing and using an orchestrated BC/DR plan.