Container cluster using Mesos and Marathon
Inception


This is one of the most interesting projects I worked on while I was at Mindtree. It was before Kubernetes and managed cloud container services were widely available, and Docker itself was still in its early days. I was part of a non-profit project, IGG (I Got Garbage). I was given a few plain on-premise machines and asked to make good use of them.

I wanted to build a fully automated deployment system for our applications, covering build, deployment and scaling. I started by working out the concepts and then implemented them, beginning with a single service on a single server. My first goal was to deploy a single instance of my application automatically.

Building Images


I was a Java developer at the time, so I knew that I just needed to put the WAR file inside the Tomcat container and the application would start. For those who don't know, a WAR file is a packaged archive of a compiled web application. Today we have Spring Boot applications with an embedded Tomcat, so we can just deploy the generated JAR file.

Next came automating this at build time. We used Jenkins, which made deploying applications very easy. Back then Jenkins itself came as a jenkins.war, which we hosted simply by putting it inside Tomcat's webapps folder. Jenkins had options to run pre-build and post-build scripts. So, once the build was complete, we could pull the Java-based Linux image onto the Jenkins VM, add the WAR file to it, create a new image, tag it and push it to the private registry (more details about this in the next section).
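The post-build step described above can be sketched roughly as follows. This is only an illustration, not the original script: the registry address `registry.example.local:5000`, the image names, the tag and the webapps path are all hypothetical, and the functions only produce the Dockerfile text and the Docker CLI commands rather than running them.

```python
from typing import List


def dockerfile_for_war(
    base_image: str,
    war_name: str,
    webapps_dir: str = "/usr/local/tomcat/webapps",  # assumed path, depends on the base image
) -> str:
    """Return Dockerfile text that copies a WAR into Tomcat's webapps folder."""
    return (
        f"FROM {base_image}\n"
        f"COPY {war_name} {webapps_dir}/{war_name}\n"
    )


def post_build_commands(base_image: str, registry: str, app: str, version: str) -> List[List[str]]:
    """Return the Docker CLI commands for pull, build-and-tag, and push."""
    image = f"{registry}/{app}:{version}"
    return [
        ["docker", "pull", base_image],         # fetch the Java/Tomcat base image
        ["docker", "build", "-t", image, "."],  # build the new image and tag it
        ["docker", "push", image],              # publish it to the private registry
    ]
```

A Jenkins post-build script could write the Dockerfile next to the generated WAR and then run each command in order, for example with `subprocess.run(cmd, check=True)`.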

Private Registry


The Docker registry image is freely available on the internet, so I set up a private registry and pushed and pulled images from it. This is where I kept all our different images. At the time there were no images available that came with Java or any other software installed on top of a plain Linux base. There were also multiple Java distributions released by different organisations, such as Oracle's JDK and OpenJDK, and there was always the question of which Java version an application needed, so maintaining our own images with versions as tags made sense.

The images included base-ubuntu, mysql (DB), tomcat, vertx_cluster (cluster image) and application (the image containing the WAR). You could then pull these images onto the respective machines in the network and start them to deploy the applications.

Multi-Cluster Setup


Now came the main challenge: running the application as a cluster across multiple machines. We needed an orchestrator that knew how much capacity was available and could allocate a VM, as well as a runner that could then pull the images and start the application on that VM. There were options like Docker Swarm at the time, and I think Kubernetes didn't even exist yet. So I went with what looked like the most stable approach, which was Mesos and Marathon. Mesos is a resource manager.

It works on a master-slave architecture: the Mesos master service runs on an orchestrator VM, and the Mesos slave service runs on every VM we want to use as a resource. The Mesos master had a dashboard showing how much memory was allocated to applications and how much was unused.
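To make the idea of resource tracking concrete, here is a toy model of it. It is not how Mesos actually schedules work (Mesos hands out resource offers to frameworks such as Marathon); it only illustrates the bookkeeping of allocated versus unused memory per slave VM. The class, function and VM names are made up for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SlaveVM:
    name: str
    total_mem_mb: int
    used_mem_mb: int = 0

    @property
    def free_mem_mb(self) -> int:
        return self.total_mem_mb - self.used_mem_mb


def place(slaves: List[SlaveVM], required_mem_mb: int) -> Optional[SlaveVM]:
    """Reserve memory on the slave with the most free memory, or return None."""
    candidates = [s for s in slaves if s.free_mem_mb >= required_mem_mb]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda s: s.free_mem_mb)
    chosen.used_mem_mb += required_mem_mb
    return chosen


def memory_summary(slaves: List[SlaveVM]) -> Dict[str, int]:
    """Totals similar to what a resource dashboard would display."""
    return {
        "allocated_mb": sum(s.used_mem_mb for s in slaves),
        "unused_mb": sum(s.free_mem_mb for s in slaves),
    }
```

Calling `place` repeatedly fills the slaves until no VM has enough free memory left, at which point it returns `None` and the caller has to wait or add another machine.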

Marathon, on the other hand, was the runner: it pulled the required image onto the slave VM and assigned a port to the container. You could also give it a health-check URL to make sure the application was running. If the health check failed, it asked Mesos to allocate a different resource until the requested number of application instances was running. For example, if the cluster was configured with 3 instances, there should be at least 3 containers running with passing health checks.
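As a rough illustration, an application definition of the kind Marathon accepts (as JSON sent to its `/v2/apps` endpoint) might be built like this. The field names follow the Marathon app definition format as I recall it from that era, so check them against the version you run. The app id, image name, container port, CPU, memory and timing values are all made up for the example.

```python
import json
from typing import Any, Dict


def marathon_app(app_id: str, image: str, instances: int, health_path: str) -> Dict[str, Any]:
    """Build an app definition with a Docker container and an HTTP health check."""
    return {
        "id": app_id,
        "instances": instances,  # how many healthy containers should be running
        "cpus": 0.5,             # illustrative value
        "mem": 512,              # illustrative value, in MB
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks Marathon to pick a free port on the slave VM
                "portMappings": [{"containerPort": 8080, "hostPort": 0}],
            },
        },
        "healthChecks": [
            {
                "protocol": "HTTP",
                "path": health_path,
                "portIndex": 0,
                "gracePeriodSeconds": 120,
                "intervalSeconds": 30,
                "maxConsecutiveFailures": 3,
            }
        ],
    }


if __name__ == "__main__":
    app = marathon_app("/igg/web", "registry.example.local:5000/application:1.0", 3, "/health")
    print(json.dumps(app, indent=2))
```

With `instances` set to 3, Marathon keeps replacing containers whose health check keeps failing until three are running and healthy.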

Multi-Cluster Logging Dashboard


Next came the challenge of logging and debugging multiple application instances per cluster. For this I created a React-based dashboard that collected logs from the respective VMs and displayed them in a formatted way. I also configured log mounts on the containers so that the logs could be read easily using short polling. However, this approach cannot guarantee that the logs are shown in the same order in which the containers generated them. Still, it was the minimum I could come up with to debug errors in containers.
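One common way to approximate a global order, which is not what the dashboard did, is to merge the per-VM logs by timestamp. The sketch below assumes every log line starts with an ISO-8601 timestamp in the same timezone, that each VM's log is already in order, and that the VM clocks are reasonably in sync. The sample lines used with it are invented.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

LogEntry = Tuple[str, str, str]  # (timestamp, source VM, message)


def parse(line: str, source: str) -> LogEntry:
    """Split 'TIMESTAMP message...' into a sortable tuple."""
    timestamp, _, message = line.partition(" ")
    return (timestamp, source, message)


def merge_logs(streams: Dict[str, List[str]]) -> Iterator[LogEntry]:
    """Merge per-VM logs that are each already sorted by timestamp."""
    parsed = [[parse(line, source) for line in lines] for source, lines in streams.items()]
    # ISO-8601 timestamps in one timezone sort correctly as plain strings
    return heapq.merge(*parsed)
```

For example, merging `{"vm-1": ["2000-01-01T10:00:00 start", "2000-01-01T10:00:02 ready"], "vm-2": ["2000-01-01T10:00:01 start"]}` yields the vm-2 line between the two vm-1 lines.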

I also added form-based deployment of new applications to the dashboard. It used the Jenkins API to build the application and then the Marathon API to deploy it, provided memory was available on any Mesos slave.
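The flow behind that form can be sketched as below. The two callables stand in for the real Jenkins and Marathon API calls, which are not shown in this post, and the function name, status strings and memory map are hypothetical.

```python
from typing import Callable, Dict


def deploy_from_form(
    app_name: str,
    required_mem_mb: int,
    free_mem_by_slave: Dict[str, int],
    build_image: Callable[[str], str],
    submit_to_marathon: Callable[[str, str], None],
) -> str:
    """Build the image, then deploy it only if some slave has enough free memory."""
    image = build_image(app_name)  # stand-in for triggering the Jenkins job
    if not any(free >= required_mem_mb for free in free_mem_by_slave.values()):
        return "no-capacity"
    submit_to_marathon(app_name, image)  # stand-in for the Marathon API call
    return "deployed"
```

Passing the API calls in as functions keeps the decision logic easy to test without a running Jenkins or Marathon.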

Architecture

Conclusion


Finally, I deployed the whole setup on AWS as well as on Azure VMs to verify my work, and it worked seamlessly. So you could deploy a complete cluster-based system using this architecture.