I have been reading some of the papers published by the Google engineers. It started with Bigtable: A Distributed Storage System for Structured Data. I am not sure exactly how I got started. The Official Google Blog posted a link announcing their new technology roundtable series. I watched the “MapReduce” discussion, where the engineers talked about Bigtable and how it is used in MapReduce. This led me to look for more information about Bigtable, as I was looking for information on distributed “communication” techniques to enhance the littles3 implementation. (The current littles3 architecture is very simple and only supports one node. It works, but doesn’t do any cool things like scale storage or tolerate faults.) I had heard Bigtable discussed on various technical blogs, but I had no idea that there was a paper from 2 years ago that described the Bigtable system. (I guess I don’t read the technical CS journals like I should. I may have to become more active in IEEE.)
While reading the paper (I did find it very readable. Okay, I am a computer geek. Fair warning.) I noticed that Bigtable, which is a highly scalable distributed database (not relational), uses a “lock service” called Chubby. What is a “lock service”? Well, the paper The Chubby Lock Service for Loosely-Coupled Distributed Systems will tell you. I am currently reading this paper. (Again, this is from 2006! Where have I been?) Mike Burrows, the author of The Chubby Lock Service for Loosely-Coupled Distributed Systems, sprinkles humor into a computer science paper discussing Paxos, “a family of protocols for solving consensus in a network of unreliable processors”. What I found interesting is how the “lock service” is used to share information in a highly distributed system. The Bigtable implementation is a client of the “lock service” and uses it to elect a leader: every candidate node tries to acquire the same lock, only one node will get it, and that node is the leader. The “lock service” can also store small amounts of information, like metadata or configuration information, that a client application can read back from it.
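To make the leader election idea concrete for myself, here is a minimal sketch in Python. It is only a toy: the class ToyLockService, the function elect_leader, the node names, and the "/demo/leader" path are all hypothetical names I made up, not Chubby's API. It also runs in a single process, whereas the real service is replicated and relies on Paxos to agree on who holds a lock.

```python
import threading


class ToyLockService:
    """In-memory stand-in for a lock service (hypothetical; not Chubby's API)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._owners = {}  # lock name -> id of the node holding it
        self._files = {}   # path -> small piece of metadata

    def try_acquire(self, lock_name, node_id):
        """Return True if node_id holds lock_name after this call."""
        with self._mutex:
            # setdefault only records node_id if nobody holds the lock yet.
            owner = self._owners.setdefault(lock_name, node_id)
            return owner == node_id

    def release(self, lock_name, node_id):
        """Give up the lock, but only if node_id is the current holder."""
        with self._mutex:
            if self._owners.get(lock_name) == node_id:
                del self._owners[lock_name]

    def write(self, path, data):
        """Store a small value that other clients can read later."""
        with self._mutex:
            self._files[path] = data

    def read(self, path):
        """Return the stored value, or None if nothing was written."""
        with self._mutex:
            return self._files.get(path)


def elect_leader(service, node_ids, lock_name="/demo/leader"):
    """Every node tries the same lock; the one that gets it is the leader."""
    leader = None
    for node_id in node_ids:
        if service.try_acquire(lock_name, node_id):
            leader = node_id
            # The winner advertises itself so the other nodes can find it.
            service.write(lock_name + "/address", node_id)
    return leader


if __name__ == "__main__":
    service = ToyLockService()
    winner = elect_leader(service, ["node-a", "node-b", "node-c"])
    print("leader:", winner)
    print("advertised:", service.read("/demo/leader/address"))
```

The part I like is that the nodes never talk to each other directly. They only talk to the lock service, and the small value stored next to the lock is how the losers learn who won.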
Next up is the paper Paxos Made Live – An Engineering Perspective. This paper provides some details on how the Google team implemented Chubby, some of the history of the previous implementation, and some of the issues that they discovered while implementing the Paxos algorithm.
Together, these papers provide some details of how Google has implemented highly distributed systems. So far, the information about Paxos has been very enlightening. And I am impressed with the way in which a “lock service” is used to coordinate communication and direct cooperation in an automated distributed network. It seems that they have created simple building blocks that work together, sometimes in unique ways, to make a complex system.