We use Erlang, ML, and Haskell for a few small projects but (as I'm sure you guessed) we'd like to focus on the lessons we've learned using Erlang for building Facebook Chat. The channel servers are at the heart of our Chat infrastructure; they receive messages from our web servers and queue them for HTTP delivery. Presence information, the knowledge of who is currently using Chat, is aggregated and shipped to a separate service. The service has been operational since our launch about a year ago and we continue to develop and improve it frequently. We're also developing an XMPP interface to Facebook Chat based on the ejabberd project. A lot of functionality has been added to interact with the many moving parts of our infrastructure.
Erlang has been the right tool for the job. It's well-known that Erlang is a good choice for communications applications and real-time chat in particular, and we'd like to mention that Erlang's strong support for concurrency, distribution, hot code loading, and online debugging have been indispensable, and that it's those same weaknesses that lead us away from C++ when first designing the service. However, we'd like to focus on the factors inside and outside Facebook that enabled us to choose Erlang in the first place. Most importantly, we needed good support for language-independent RPC. Fortunately we rely heavily on Thrift for all our services. Both the channel and Jabber servers need to speak to programs written in PHP and C++, and having infrastructure and engineers comfortable debugging it already in place when we began was invaluable. Our service management tools as well are language-independent; all our services regardless of language are controlled using the same interface. Facebook and the Thrift project are also fortunate to have an active community. The original Thrift binding for Erlang was implemented in-house but substantial portions were reimplemented by outside contributors. As well we've leveraged existing open source products including ejabberd and Bob Ippolito's Mochiweb framework.
The other half of our story consists of the barriers presented and risks taken in choosing Erlang. I mentioned above that much of Facebook's infrastructure is language-agnostic, but our build and deploy systems unfortunately are not. Many of our internal tools weren't written with Erlang or (in particular) hot code reloading in mind, so we needed to write many one-offs to support all the runtime goodness that Erlang affords us. Facebook runs many services, most of which are in C++, Python, or Java, and only two in Erlang, so in some respects we've needed to work harder to integrate. We at Facebook also tend to work in large codebases and share the responsibility of bug fixing. Unfortunately, our Erlang projects live outside of our main repository, and the extra learning necessary to develop or just fix bugs in Erlang tends to keep us isolated from the pool of talent in our department. Unsurprisingly, the job of fixing Chat bugs falls to one of a very small group, whereas bugs in our PHP code have a small army of PHP generalists waiting to squash them. However Erlang has such a steep learning curve only because most undergraduate programs don't stress functional programming, and Facebook in particular doesn't expose most engineers to functional programming, or even information about the strengths and weaknesses of FP versus PHP, C++, and Python. In particular we'd like to touch on the prevalence of object-oriented programming in education and in practice, and the tendency of engineers (even engineers with FP experience) to guide their design by OOP principles even when they don't fit. The recurring theme is "the right tool for the job" — actor-based, functional, imperative, and object-oriented programming are all valuable tools for modeling particular problems, but the most important lessons in our experience are how to lower the barriers to choosing the right tool, and how to arm engineers with the knowledge they need to make an appropriate choice.