Let's design an agile Monitoring Solution for the Containerized OpenStack

Cloud providers are moving toward a new deployment method based on container technology. The objective is to replace the inflexible, painful, resource-intensive deployment process of the cloud infrastructure with a flexible, painless, inexpensive one. For instance, OpenStack drives this effort through the Kolla project, which has already reached a level of maturity suitable for production use. The new software architecture changes how services are connected together, which exposes the cloud platform to many new challenges, including resource provisioning, performance, and failure management. Therefore, to guarantee the availability and stability of the containerized system, monitoring has to be considered thoroughly. This blog post discusses the main aspects of such an agile, plug-and-play, and practical monitoring system.

OpenStack is an open source cloud platform with a modular architecture. Deploying OpenStack is difficult because it comprises many different components (e.g. Nova, Neutron, Keystone) that need to be connected together. A big problem with most existing deployment methods is that all OpenStack services are deployed as static packages on top of a shared operating system, which makes ongoing operations, troubleshooting, and upgrades really problematic. The obvious remedy is to deploy all OpenStack services as containers managed by a container management system. Kolla was born out of that idea and has become one of the main projects housed under the OpenStack umbrella. The deployment process is quite complicated; however, the end result is a highly flexible OpenStack cloud deployed using Docker containers, managed and orchestrated by Kolla Ansible, a set of Ansible playbooks that automate the tasks.
Kolla provides production-ready containers and deployment tools for operating OpenStack clouds that are scalable, fast, reliable, and upgradeable, using community best practices. Kolla uses Docker to store the file-system contents of images and to provide an execution environment for the containers. One key advantage of Docker is that a registry provides a central repository of images from which a deployment can be instantiated. By choosing Docker, cloud providers gain immutability through the Docker registry. Kolla has approximately ninety containers. About sixty are likely to be running; the remainder are base containers used as intermediate build steps shared by child containers. Fig. 1 illustrates, at a high level, the architecture of OpenStack deployed with Kolla, where all of the services are containers.

Figure 1: OpenStack Kolla High-level Architecture
(This is a simplified version that excludes the CI/CD parts on the Master and the file system on the Hosts.)
Monitoring a cloud platform like OpenStack is a mature subject in both the literature and practice. It is a critical process for checking the health of a cloud infrastructure's software, physical, and virtual assets. Moreover, monitoring is the first step in minimizing the impact of cloud failures [10]. Cloud providers often use an agent-based monitoring solution, which installs client software alongside the OpenStack services and collects all the needed information. Because most of the services are installed natively on the operating system in the traditional deployment model, a monitoring agent is simply another application that operates at the system level and can read most of the desired data. However, when all of the services are packed into containers, that monitoring model has to change. Specifically, the agent software needs the ability to harvest information from the running containers. The interconnection of OpenStack services also requires a different monitoring approach because, unlike conventional services, the containers do not communicate with each other through ports but run in host networking mode, which effectively disables network isolation and gives all containers access to the TCP/IP stack of their Docker host. Those considerations are critical because, without the data sent from the agents, the monitoring server is useless, which in turn undermines OpenStack's reliability.

A good understanding of the containerized cloud platform is crucial to designing a comprehensive monitoring solution for it. Thus, the first thing we introduce in this section is a brief analysis of the containerized OpenStack. As shown in Fig. 1, it consists of two main functional groups. The first group is the Deployment Master, the server in charge of orchestrating the container deployment of OpenStack services using the Ansible automation tool. In addition, a Docker Registry can optionally be placed on the Deployment Master to provide the container images for the Ansible tasks; otherwise, Kolla has to use an external container repository. The second functional group is the Deployment Hosts, which may be a single host or multiple hosts depending on the size of the desired cloud system. On each host, a Docker engine is used to manage the dockerized services. Hence, two different monitoring targets can be identified: the Deployment Master and Hosts themselves, and the containerized OpenStack services inside each host.
Based on the architecture of Kolla, an end-to-end monitoring solution is designed. The proposed system comprises two components. The first is the server side, which is itself split into three elements: the monitoring server, which scrapes monitored data from the agents and stores it as time series; the alert manager, which handles all alerting tasks when a predefined event occurs (e.g. sending an alarm email to the administrator when a target is down); and the front-end, which queries and visualizes the monitored data stored in the server. The second component is the client side, i.e. the monitoring agents, which play a very important role in the system, as discussed above.
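To make the "target is down" alert more concrete, here is a minimal Python sketch of how such a rule could be evaluated. It is an illustration only, not the alert manager's actual implementation: the names `TargetState`, `record_scrape`, `firing_alerts`, and the 60-second hold time are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TargetState:
    """Scrape health of one monitoring target (hypothetical structure)."""
    name: str
    down_since: Optional[float] = None  # time of the first failed scrape, if any


def record_scrape(state: TargetState, scrape_ok: bool, now: float) -> None:
    """Update a target after a scrape attempt made at time `now` (seconds)."""
    if scrape_ok:
        state.down_since = None      # target recovered, clear the outage
    elif state.down_since is None:
        state.down_since = now       # first failure, start the outage clock


def firing_alerts(states: List[TargetState], now: float,
                  hold_seconds: float = 60.0) -> List[str]:
    """Return the names of targets that have been down for at least hold_seconds."""
    return [
        s.name
        for s in states
        if s.down_since is not None and now - s.down_since >= hold_seconds
    ]
```

The hold time keeps a single failed scrape from paging the administrator; only a target that stays down for the whole window produces an alert, which could then be turned into the alarm email.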
Two types of agents need to be deployed in the proposed system because there are two different monitoring targets: the Deployment Master and Hosts, which are operating systems (OS), and the containerized OpenStack services, which are special packages containing the library dependencies, the binaries, and the basic configuration they need. Each agent exposes a specific set of metrics from the environment in which it is installed. The agent deployed on the Deployment Master or Hosts is a normal application that can scrape information such as CPU usage, memory usage, and disk I/O directly from the underlying OS. On the other hand, the client software that monitors containers, i.e. the cAgent, needs to be able to communicate, via the Docker engine, with the cgroups virtual file system, where the performance statistics of Docker containers are stored [8]. The latter agent type can be realized using cAdvisor, a tool designed specifically to monitor containers. It can be installed natively on the host or inside a container.
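As a rough illustration of what a container agent does when it reads the cgroups virtual file system, the Python sketch below collects memory usage per container. It is not how cAdvisor is implemented. The default path assumes a cgroup v1 layout with one directory per container ID under `/sys/fs/cgroup/memory/docker`; the real location depends on the cgroup version and driver of the host, so treat the path and the function name `read_container_memory` as assumptions.

```python
from pathlib import Path

# Assumed cgroup v1 layout; adjust for the host's cgroup version and driver.
DEFAULT_CGROUP_ROOT = "/sys/fs/cgroup/memory/docker"


def read_container_memory(cgroup_root=DEFAULT_CGROUP_ROOT):
    """Return {container_id: memory usage in bytes} for every container found.

    Each container is expected to have its own directory under cgroup_root
    holding a memory.usage_in_bytes file with a single integer.
    """
    root = Path(cgroup_root)
    usage = {}
    if not root.is_dir():
        return usage                      # no containers, or a different layout
    for entry in root.iterdir():
        usage_file = entry / "memory.usage_in_bytes"
        if entry.is_dir() and usage_file.is_file():
            usage[entry.name] = int(usage_file.read_text().strip())
    return usage
```

Passing the root directory as a parameter keeps the sketch testable against a temporary directory and makes the layout assumption easy to change.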

Figure 2: The Proposed Monitoring System Architecture.
By being flexible about the agent software, the proposed monitoring system makes it possible to collect and process many different useful metrics from the targets. However, that flexibility complicates the communication between the clients and the monitoring server, which traditionally requires a lot of development effort. Therefore, a unified communication protocol between the monitoring server and its agents is required. In the proposed architecture, monitoring clients expose the metric data of the targets to the server via a TCP/IP channel, i.e. an address and a port on each target. Hence, Prometheus, a monitoring software that supports this kind of client-server communication, is leveraged.
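The exchange described above can be sketched in a few lines of Python using only the standard library. The agent serves its current metrics as plain text over HTTP, and the server pulls them from the target's address and port. This is a simplified stand-in loosely modeled on the line-oriented text that Prometheus scrapes, not a complete implementation of it; the function names and the one-sample-per-line format without timestamps are assumptions made for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


def render_metrics(metrics):
    """Agent side: format {name: value} as text, one 'name value' sample per line."""
    return "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))


def parse_metrics(text):
    """Server side: turn scraped text back into {name: float}.

    Blank lines and comment lines starting with '#' are skipped.
    Assumes each sample line ends with its value (no trailing timestamp).
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def make_handler(read_metrics):
    """Build a request handler that serves read_metrics() at /metrics."""
    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render_metrics(read_metrics()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return MetricsHandler


def serve_agent(read_metrics, port):
    """Agent side: block forever, answering scrapes on the given port."""
    HTTPServer(("", port), make_handler(read_metrics)).serve_forever()


def scrape(address, port, timeout=5.0):
    """Server side: fetch and parse one target, identified by address and port."""
    with urlopen(f"http://{address}:{port}/metrics", timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Because every agent speaks the same simple pull-based protocol, the server only needs a list of addresses and ports, whatever software produces the numbers on the other end.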
As shown in Fig. 2, the proposed architecture can monitor the Deployment Master, where all the automated deployment tasks are controlled in harmony with Continuous Integration and Continuous Delivery (CI/CD); the Deployment Hosts, on which the containerized OpenStack services are deployed; and the containers themselves. Together, those capabilities make the designed system an agile end-to-end monitoring solution for the container-based cloud platform.

With the advanced features of container technology, OpenStack deployment and operation become increasingly flexible and painless. However, while the containerized architecture of the cloud eliminates some existing problems, such as dependencies and resource consumption, it creates new challenges in failure management and reliability assurance. To tackle those issues, the proposed monitoring mechanism, through careful consideration and experiment, has proved to be a feasible and practical end-to-end monitoring solution for the container-based OpenStack. Nevertheless, the designed system has one drawback: deploying the monitoring agents is still a manual and quite laborious process. Therefore, future work will consider an automatic and scalable deployment model for the agents that can adapt to the increasingly complex structure of the cloud.


[1] Marcel Großmann, Clemens Klug, Monitoring Container Services at the Network Edge, 2017 29th International Teletraffic Congress, 2017.
[2] Chia-Chen Chang, Shun-Ren Yang, En-Hau Yeh, Phone Lin, Jeu-Yih Jeng, A Kubernetes-Based Monitoring Platform for Dynamic Cloud Resource Provisioning, 2017 IEEE Global Communications Conference, December 2017.
[3] Farnaz Moradi, Christofer Flinta, Andreas Johnsson, Catalin Meirosu, ConMon: An Automated Container Based Network Performance Monitoring System, 2017 International Symposium on Integrated Network Management, 2017.
[4] Asif Khan, Key Characteristics of a Container Orchestration Platform to Enable a Modern Application, IEEE Cloud Computing, December 2017.
[5] Hui Kang, Michael Le, Shu Tao, Container and Microservice Driven Design for Cloud Infrastructure DevOps, IEEE International Conference on Cloud Engineering, 2016.
[6] Wes Lloyd, Shruti Ramesh, Swetha Chinthalapati, Lan Ly, Shrideep Pallickara, Serverless Computing: An Investigation of Factors Influencing Microservice Performance, IEEE International
[7] Claus Pahl, Antonio Brogi, Jacopo Soldani, Pooyan Jamshidi, Cloud Container Technologies: a State-of-the-art Review, IEEE Transaction on Cloud Computing, 2016.
[8] Emiliano Casalicchio, Vanessa Perciballi, Measuring Docker Performance: What a mess!!!, 8th ACM/SPEC on International Conference on Performance Engineering Companion, April 2017.
[9] Vaibhav Agrawal, Devanjal Kotia, Kamelia Moshirian, Mihui Kim, Log-Based Cloud Monitoring System for OpenStack, IEEE Fourth International Conference on Big Data Computing Service and Applications, 2018.
[10] Patricia Takako Endo, Guto Leoni Santos, Daniel Rosendo, Demis Moacir Gomes, Andre Moreira, Judith Kelner, Djamel Sadok, Glauco Estacio Gonçalves, Mozhgan Mahloo, Minimizing and Managing Cloud Failures, IEEE Computer Society, November 2017.