In the last two months, we started our journey towards a new microservices architecture. Among other things, we found that our existing CD tools were not ready to scale with our new requirements. So we tried a new approach: defining our pipelines in code using LambdaCD. In combination with a Mesos cluster, we can deploy new applications within a few minutes to see how they fit into our architecture by running tests against existing services.
Part 1: The underlying infrastructure
Part 2: Microservices and continuous integration
Part 3: Current architecture and vision for the future
LambdaCD is an open-source project initiated by Florian Sellmayr in August 2014, and Otto is the first company to trust in this work and use it in production. It’s a good feeling to explore a new world, but we had to work out how to get the best out of this tool. We gained practical experience and adapted the tool to our needs. Now we have a solution that fits our architecture better than any other CD tool.
Defining your pipelines in code is a big advantage, but we had to find a way to control this power. Therefore, we started with the most obvious approach: creating one LambdaCD instance for every application. This sounds harder than creating a Jenkins pipeline, but we automated this step, so it is possible to have a production-ready pipeline within a few minutes.
Current Architecture – Pipelines
Because we have one LambdaCD instance for each project, the structure is very straightforward. An instance has two separate pipelines: the first one is responsible for building and deploying the code of the project, and the second one for the self-deployment of the instance itself. That is why we call it the meta-pipeline. If you change the code (configuration) of one of these two pipelines, the meta-pipeline is triggered by your commit and deploys a new version of this LambdaCD instance.
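To make this concrete: a LambdaCD pipeline definition is just Clojure data, a list of step functions. The following is a minimal sketch, not our actual pipeline; the step names are illustrative, and a real pipeline would use steps from the LambdaCD library (e.g. Git triggers and shell steps):

```clojure
;; Minimal sketch of a LambdaCD-style pipeline definition.
;; Step names are illustrative, not our real build steps.

;; A build step is an ordinary function of [args ctx] that
;; returns a result map with at least a :status key.
(defn compile-project [args ctx]
  {:status :success :out "compiled"})

(defn run-tests [args ctx]
  {:status :success :out "all tests green"})

(defn deploy [args ctx]
  {:status :success :out "deployed to the Mesos cluster"})

;; The pipeline itself is plain data: an ordered list of steps.
(def pipeline-def
  `(compile-project
    run-tests
    deploy))
```

Because steps are plain functions and the pipeline is plain data, the meta-pipeline can treat a pipeline definition like any other piece of code it builds and deploys.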
But how can we do the first deployment? Do we have a bootstrapping problem, because only the instance knows how it has to be deployed? No, not really. As I mentioned in the last part of this post, you have to define your pipelines in Clojure because LambdaCD itself is written in this functional language. The advantage is that you can extend the tool and interact with its internal components in a very impressive way. But the point is that a LambdaCD instance is an ordinary piece of Clojure code, so you can simply execute it locally to make the first deployment.
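Assuming a standard Leiningen project layout (the commands below are a hedged sketch, not our exact setup), bootstrapping could look like this:

```shell
# Run the LambdaCD instance locally; it starts the pipelines
# and the embedded web UI.
lein run

# Open the local web UI and trigger the deployment pipeline once.
# The port is configurable; 8080 is just an example.
open http://localhost:8080
```

After this first manual run, the instance is deployed and its meta-pipeline takes over all further self-deployments.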
In addition, each instance has a web UI that lets you check the state of your pipelines or the output of a single step. Because all of our microservices are stateless, the build history is lost every time we redeploy an instance. Fortunately, the definition of a pipeline doesn’t change that often.
Over time, we added pipelines for new services to existing LambdaCD instances because these services are very small and change very rarely. If we had not done that, we would have had many instances running all day and consuming resources unnecessarily.
All LambdaCD instances are based on the tesla-microservice, which uses the Component framework and provides some nice features like health checks, a status page, metrics, configuration-file loading and graceful shutdown. It is a good starting point if you don’t want to reinvent the wheel every time. Check it out! Maybe you can use it to create your own microservices, or you can fork it and adapt it to your needs.
Current Architecture – Build-Monitor
Our pipelines use the LambdaCD cctray extension. The cctray.xml format is widely used by CI tools to report the state of pipelines and their steps. At first, we tried out nevergreen, which only displays running and sick steps. But this open-source tool has two disadvantages: you can only display one cctray.xml feed, and the configuration is stored in the browser cache. The second point is impractical because every developer has to configure the monitor on their local machine.
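For reference, a cctray.xml feed is a very small XML document; a minimal example might look like this (the project name and URL are illustrative):

```xml
<Projects>
  <Project name="my-service :: build-and-deploy"
           activity="Sleeping"
           lastBuildStatus="Success"
           lastBuildLabel="42"
           lastBuildTime="2015-08-01T12:00:00"
           webUrl="http://lambdacd-my-service.example.com/"/>
</Projects>
```

Any monitor that understands this format can aggregate the state of all pipelines, which is exactly what our build monitor does.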
At the moment, the monitor isn’t perfect: the font isn’t scaled dynamically, and if you have too many running or sick pipelines, your screen could be too small. But I will publish it within the next few weeks, and maybe somebody will have a good idea how to fix these problems.
Current Architecture – Graphite
To know which LambdaCD instance needs more or fewer resources, we use Graphite. With a quick look you can detect instances running out of memory, disk space or CPU power.
These graphs are very useful for scaling a pipeline according to its consumption. Moreover, if you look at the spikes in these graphs, you can see how often a pipeline is triggered or restarted.
Current Architecture – Disadvantages
The current architecture has grown over the last few weeks, and we have collected a few things to improve:
- Stateless applications – All of our microservices are stateless. As a result, a LambdaCD instance loses all information about past pipeline iterations.
- Many web UIs – Each pipeline has its own web UI, and if you want to see why a step failed, you have to know the URL of the right LambdaCD instance. Better would be a central service which collects all this information and displays a clear overview.
- Idle pipelines – Most pipelines just wait all day for the next run, but they still block resources. This isn’t a problem for a few pipelines, but in a year we might have 50 or 100.
The Second Approach
Over the next few weeks, I’m going to implement some new LambdaCD components to get rid of these disadvantages. I love the flexibility of LambdaCD, which lets me extend the tool in any direction. I can’t show you how to build the best LambdaCD infrastructure, because at the moment I don’t know that myself. But I want to give a sense of how easy it is to build a system that adapts to your needs and is ready for further improvements.
Let’s have a look at the overview:
The Pipeline-Builder, the Pipeline-Spawner and the Pipeline-State-Controller have been added, and I will introduce them in the next paragraphs.
The Second Approach – Pipeline-Builder
The Pipeline-Builder is a place for all of our meta-pipelines. You no longer have to create such a pipeline in every project. Just add the Git URL to the trigger of the Pipeline-Builder and you are done.
If you don’t use the Pipeline-Spawner, you have to add a step to deploy your pipeline.
The Second Approach – Pipeline-Spawner
If the code of the project is changed, the Pipeline-Spawner is triggered by the commit and deploys the corresponding pipeline, which was built by the Pipeline-Builder. In addition, the service injects the commit hash into the pipeline to guarantee that it uses the right version of the code.
You don’t have to use this spawner, but it is a good way to save resources because you run a pipeline only when you need it.
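One way the spawner could inject the commit hash is via the environment of the spawned pipeline instance, e.g. in the Marathon app definition. The following is a hedged sketch under that assumption; all names here are illustrative and not part of the LambdaCD or Marathon API:

```clojure
;; Hedged sketch: inject the commit hash into a spawned pipeline
;; via its environment. Names are illustrative, not a real API.

(defn marathon-request
  "Builds a (hypothetical) Marathon app definition for a pipeline
   instance, passing Git URL and commit hash as environment variables."
  [app-id git-url commit-hash]
  {:id  app-id
   :env {"GIT_URL"     git-url
         "COMMIT_HASH" commit-hash}})

;; The spawned pipeline reads the hash from its environment
;; instead of polling Git itself:
(defn commit-hash-from-env [env]
  (get env "COMMIT_HASH"))
```

This keeps the pipeline itself trivially simple: it never has to decide which revision to build, because the decision was already made by the spawner.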
The Second Approach – Pipelines
Pipelines no longer have a Git trigger in front of the other steps because they already know the commit hash, which was injected by the Pipeline-Spawner. The other steps are the same as in the current architecture. But a pipeline is only executed once, which is why it has to shut itself down after the last step.
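Since a LambdaCD step is just a function, the self-shutdown can be modeled as an ordinary final step. A hedged sketch (in production the shutdown function would terminate the JVM or tell Marathon to stop the app; here it is passed in so the idea stays testable):

```clojure
;; Hedged sketch: a final pipeline step that shuts the instance
;; down after the last build step. The shutdown function is
;; injected; in production it might be #(System/exit 0).

(defn shutdown-step [shutdown-fn]
  (fn [args ctx]
    (shutdown-fn)
    {:status :success :out "instance shutting down"}))
```

Appending such a step to the end of the pipeline definition ensures the instance frees its resources as soon as the single run has finished.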
The Second Approach – Pipeline-State-Controller
The Pipeline-State-Controller acts as a proxy between all other components and a database, e.g. MongoDB. It stores the state of these components and provides two different views:
- Control-Center – Just a simple view to get all information about your LambdaCD cluster and to trigger manual steps:
- Build-Monitor – This could be reused from the first approach to get a kind of dashboard.
To sum up, the second approach gives us:
- Centralized web UI
- Persistent build history
- Pipelines run on demand
- Avoid meta-pipelines in every LambdaCD instance
- Meta-pipelines can’t break a build of your project anymore. This used to happen if you triggered the meta-pipeline and the LambdaCD instance restarted while a build of your project was running.
If the Pipeline-Builder or Pipeline-Spawner restarts, it could miss a commit because it is stateless and doesn’t know the last commit it has seen. To avoid this, you can use a Git hook instead of a polling mechanism. With this approach you can also add a load balancer and a second instance of both components, so it is possible to distribute the builds over these instances and guarantee that you don’t miss a commit anymore.
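A server-side post-receive hook that pushes commits to the spawner could look roughly like this; the endpoint URL and payload format are hypothetical, only the hook mechanism itself (Git passing old revision, new revision and ref name on stdin) is standard:

```sh
#!/bin/sh
# Hedged sketch of a post-receive Git hook that notifies the
# Pipeline-Spawner instead of having it poll the repository.
# The URL and JSON payload are hypothetical.
while read oldrev newrev refname; do
  curl -s -X POST "http://pipeline-spawner.example.com/commits" \
       -H "Content-Type: application/json" \
       -d "{\"ref\": \"$refname\", \"commit\": \"$newrev\"}"
done
```

With a load balancer in front of two spawner instances, each push notification reaches exactly one instance, so no commit is lost even if one of them restarts.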
Now you know our infrastructure, our current architecture and our plans for the next weeks. But all these ideas are based on two months of experience with microservices and our new Marathon/Mesos infrastructure. So it is likely that our plans will change and we will develop new services. But one big advantage of microservices is that you can develop them within a few hours or days, test them, and then throw them away to create new, more powerful services.