A lot has been written about polyglot programming. I have not read much of it, so forgive me if I miss something obvious. For me, the definition is simple: Polyglot programming is when a team of developers works with a set of different programming languages.
To understand my team’s approach I will have to take you on a little digression on architecture first.
The fall of the monoliths
As for monolithic software, a lot has changed in quite a short period of time. We moved from rather big vertically decomposed systems to micro-verticals and microservices. See for example Guido Steinacker’s post on that.
Also, we are now creating a new and much more flexible infrastructure based on Mesos and Docker. Simon Monecke has written a post about that and about how we deploy to that environment with LambdaCD. His post is only the first in a series of three.
Our new Mesos-based infrastructure is not only a suitable runtime for our classical web applications (micro or not), but also for Apache Spark. Which brings us to the first of the two lambdas I promised:
The Lambda Architecture
„Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods. This approach to architecture attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data.“
In our team, which was founded in early 2015, we work on such an architecture. Figure 1 shows a typical lambda architecture with
- A source of streaming data
- A source of batch data
- A processing unit for the streaming data
- A processing unit for the batch data
- Shared storage for batch and stream processing
- A cache for the batch results
- A cache for the streaming results
- A facade for accessing the data from the caches
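The interplay of these components can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not our production code: the click events, the `count_clicks` function and the `Facade` class are hypothetical names invented for this example, and in-memory counters stand in for the real caches.

```python
from collections import Counter


def count_clicks(events):
    """Count clicks per page for an iterable of (user, page) events.

    Hypothetical processing logic, used here by both processing units.
    """
    return Counter(page for _user, page in events)


class Facade:
    """Hypothetical facade that merges the batch and the streaming results."""

    def __init__(self):
        self.batch_cache = Counter()   # stands in for the cache of batch results
        self.stream_cache = Counter()  # stands in for the cache of streaming results

    def run_batch(self, all_events):
        # Batch processing: recompute the complete view from the full history.
        self.batch_cache = count_clicks(all_events)
        # The streaming results are now covered by the batch view, so reset them.
        self.stream_cache = Counter()

    def on_stream_events(self, new_events):
        # Stream processing: add only the events that arrived since the last batch.
        self.stream_cache.update(count_clicks(new_events))

    def clicks(self, page):
        # A query combines the accurate batch view with the fresh streaming view.
        return self.batch_cache[page] + self.stream_cache[page]


history = [("u1", "/home"), ("u2", "/home"), ("u1", "/cart")]
facade = Facade()
facade.run_batch(history)
facade.on_stream_events([("u3", "/home")])
print(facade.clicks("/home"))  # 3: two clicks from the batch view, one from the stream
```

The point of the sketch is the last method: a reader of the data never needs to know which of the two paths produced a number.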
Early Lambda architecture implementations often suffer from using different technologies on the two streams of data. The batch processing is usually done with Hadoop MapReduce, which is fundamentally unsuitable for stream processing. Thus the streaming part has to be implemented in a different technology. While this offers the advantage of allowing you to choose different solutions for different problems, it can also make you quite inefficient if the data you process in batch and streaming is essentially the same.
Apache Spark is there to close that gap. It provides a sane API and a huge base of data processing libraries (e.g. MLlib) that can be used for both batch and stream processing. Spark features immutable, distributed data structures and tries to keep as much of the data in memory as possible. Spark programs can be treated as microservices in their own right. You end up with a bunch of different jobs which can be developed, deployed, scheduled, scaled and run independently.
Spark is written in Scala, which is also the native language in which to program Spark. This is why Scala is now part of our polyglot portfolio. Our team contains not only developers, but also data scientists with a tradition of prototyping and developing models with Python. Luckily, there also exists a Python API for Spark.
The Lambda architecture is a very powerful pattern, but building and maintaining a production system with it brings a lot of requirements for which Spark is not necessarily the best fit. Instead we use microservices to satisfy requirements like:
- The facade service itself (A)
- Dashboards for business users (B)
- Import and transformation of additional data sources (C)
- Technical and functional monitoring
- Dashboards for operations
For the sake of simplicity, Figure 2 only sketches the first three services. We implement these services with Clojure. Which brings us to the second Lambda:
The Lambda calculus
Clojure is a Lisp, which is in turn based on the Lambda Calculus. Lisp is a programming language that dates as far back as 1958. The Lambda Calculus is even older and was introduced in 1936.
Our services share a minimal framework. Like our team, it is named after Nikola Tesla: tesla-microservice. As other teams at OTTO are now also using our framework, we published the source code on GitHub. Read on for details about Clojure as a programming language.
Wait. Isn’t that three Lambdas now?
Yes. Interestingly, you are right. As mentioned above, we are currently replacing our Jenkins CI server with pipelines driven by LambdaCD: a continuous delivery pipeline in code. Clojure code, that is. It feels amazingly good to have pipelines that can be executed on any machine and that are also unit-tested. But this is not the topic now. I heard a rumour that the follow-up to Simon’s article will contain real code samples and will be released as soon as next week.
We are not only polyglot in our programming languages but also in our persistence technologies. Currently we are using Kafka, MongoDB, Redis and HDFS. We chose them partly because they are best for the respective job, partly because they were the easiest to set up and perform well enough. We will certainly stay curious about other options like Apache Cassandra and Cognitect’s Datomic.
If you take everything together, you get the picture in Figure 3. Not in the picture are Graphite for metrics, Elasticsearch for logs and ZooKeeper for configuration. It might appear a little messy at first sight, but it isn’t. It is a straightforward, data-centric architecture. It uses the most pragmatic tools for any given task. All components are easily interchangeable. Last but not least: It is a lot of fun to work with.
I fear I may have bored you with all that contextual information, so let’s get back on topic: the programming languages.
Scala really is a great language. It has tons of interesting features. From a Java programmer’s perspective, Scala feels like two huge steps forward. Data structures are immutable. Functional programming is well supported. There is pattern matching and countless other cool features. All that makes it very interesting and fun to work with.
At the same time it feels like a little step back. Scala is complicated. All too often a pair of developers will look at each other and say „Did we really just spend two hours getting the types right when converting that data representation into that other one?“ Also, it will frequently give you a hard time trying to understand what some library code does. Chances are the authors of the library fancy a different set of language features and syntax options than you do.
I would not call myself an expert on Scala, so I will illustrate my feelings about it with three analogies:
- Scala is like C++: Powerful and Multifaceted. It is a very powerful language giving you the freedom to solve problems in a lot of different ways.
- Scala is like Haskell: Academic standards. Being the subject of a lot of scientists’ work results in well thought out (but possibly a little too academic) features like the powerful type system.
- Scala is like Gentoo Linux: A research lab for the best solution. Like Gentoo, Scala has a rather steep learning curve. Like Gentoo, Scala is the playground of a lot of language nerds who are in search of the best possible solutions. Sooner or later everybody profits from that work, as the results migrate into the more user-friendly distributions. So if you are not that nerd, you do not have to use it.
Python is a nice and friendly little language that has been around for quite a while now. By using indentation instead of braces, it removes a lot of visual bloat. It is friendly to beginners and it is also understood by many, many techies who would not call themselves software developers. I just asked one of our data scientists, Stephan, what he thinks about Python and he said:
„Python not only enables me to solve almost all of my computational problems but it is also, unlike R, a lingua franca to communicate with computer scientists.“
That’s what it is. It is a language equally well suited to support the rather exploratory work of a data scientist and to fulfill all the requirements of reliably operating production software. The support for Python in Spark makes it a first-class language for us to build prototypes and to bring prototypes to production with little or no conversion overhead. I’m not sure we will ever find performance problems that are bad enough to require a migration to Scala, but if so, then that’s what we will do.
Clojure! It is by far our favourite language. Clojure has the intent to make simple easy. And it does. You will often hear a pair say „Oh, I think we are done.“, followed by a disbelieving „But that was only five minutes!“ And then they move on to the next task.
As a Lisp, Clojure features a syntax that is fundamentally different from everything most Java, Scala and Python programmers are accustomed to. This can be perceived as a hurdle when migrating to Clojure. In our experience it is only a low hurdle though. Most developers, once they have cleared it, are excited by the simplicity, brevity and expressiveness of Clojure code.
What is particularly puzzling at first sight is the different use of parentheses. A pair of parentheses defines a list. That list can be either data or code. When it is code, the first symbol in the list is treated as a function and all following elements are its arguments. Sounds complicated? It isn’t. Here are some examples of how to translate code:
1 + 2 becomes
(+ 1 2).
The slightly more complicated formula
1 + 3 * 2 becomes
(+ 1 (* 3 2)) which demonstrates the simplicity of the syntax: As the order of execution is clear from the nesting, mathematical operators do not have to be complected with precedence rules. (Yes, to complect is a word.)
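For readers who do not have a Clojure REPL at hand, the evaluation rule can be imitated in Python. The following is only a toy sketch to show the principle, not how Clojure is actually implemented: nested Python lists stand in for Clojure lists, and the names `OPERATORS` and `evaluate` are made up for this example.

```python
import math
import operator

# Hypothetical lookup table from operator symbols to Python functions.
OPERATORS = {
    "+": lambda *args: sum(args),
    "*": lambda *args: math.prod(args),
    "-": operator.sub,
    "/": operator.truediv,
}


def evaluate(form):
    """Evaluate a prefix expression written as nested Python lists."""
    if not isinstance(form, list):
        return form  # anything that is not a list evaluates to itself
    symbol, *args = form  # first element names the function, the rest are arguments
    function = OPERATORS[symbol]
    # Arguments are evaluated first, so nesting alone fixes the order of execution.
    return function(*(evaluate(arg) for arg in args))


print(evaluate(["+", 1, 2]))             # 3
print(evaluate(["+", 1, ["*", 3, 2]]))   # 7
```

Note that the evaluator contains no precedence table at all. The inner list is simply evaluated before the outer one.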
From my own experience and from that of many of my colleagues I can now confidently say: It is only syntax. You get used to it quickly. Looking at the code of a library will frequently surprise you, too, but in a good way. You will be surprised by how little code is actually necessary to do the trick.
Clojure has no static type system (although an optional one exists), and it turns out you don’t need it. Clojure is data-centric instead. With immutability as the default, Clojure offers a Software Transactional Memory to manage mutable state. This makes it much, much harder to shoot yourself in the foot with shared mutable state problems. Also, it has very cool and simple concurrency features built in. Because Clojure is homoiconic, macros can be written in the language itself, which makes Clojure easily extensible. There are extensions for logic programming, matrix computation, pattern matching and many more.
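To give non-Clojure programmers a feeling for what immutability as the default means, here is a rough Python analogy. It is only a sketch: Python is mutable by default, so the example has to opt in with a frozen dataclass and tuples, and the `Order` type is a hypothetical example rather than anything from our code base.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Order:
    """Hypothetical immutable value: fields cannot be reassigned after creation."""
    customer: str
    items: tuple  # a tuple, so the collection itself cannot be changed in place


original = Order("alice", ("book",))

# An "update" builds a new value and leaves the original untouched.
updated = replace(original, items=original.items + ("lamp",))

print(original.items)  # ('book',)
print(updated.items)   # ('book', 'lamp')

# Mutating in place is rejected, so code sharing `original` cannot be surprised.
try:
    original.customer = "bob"
except FrozenInstanceError:
    print("orders are immutable")
```

In Clojure you get this behaviour for the built-in data structures without asking for it, which is why shared values can be passed between threads so casually.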
I could go on with this list, but I will finish here. Just one more thing: All this goodness comes as a JVM-language. So the complete ecosystem of the Java world is only one import away (just like in Scala).
Should I try polyglot programming then?
Should you try polyglot programming? Yes, you should. But isn’t that inefficient? No, it is not.
Let me say it with another analogy: Polyglot programming is a lot like pair programming. If you have not tried it, it is intuitive to assume it means a lot of overhead and is of doubtful benefit. But as in pair programming, the overall productivity of a team is not diminished. Quite the contrary is true:
- As in pair programming, looking at a problem from different perspectives makes you understand the problem better.
- As in pair programming, the team ends up seeking the most pragmatic solution rather than the most perfect one.
- As in pair programming, fiddling with the tools becomes less important and the business problems that need to be solved move into focus.
- As in pair programming, it takes a little while to get used to it, but then you do not usually want to go back.
- As in pair programming, polyglot programming makes work a lot more fun.
So go for it!
There is always new stuff to try. As I mentioned, there are persistence technologies like Cassandra and Datomic. There are a lot of programming languages yet to be tried. And as soon as we feel a little safer with Spark and understand it better than we do now, we will look at Clojure APIs for it, such as Sparkling.
Last year I visited the EuroClojure conference in the beautiful city of Kraków. It was by far the most useful conference I have attended so far (see my reports on day one and day two). This very week I will go to Barcelona, where this year’s EuroClojure takes place. This time I won’t go alone, but will take a few colleagues with me. I am really looking forward to that. See you there!