Distributed Systems Design Thumbrules

Some common thumbrules to keep in mind when designing distributed systems:

  1. Scale: All your lists are billions of elements long.
  2. Failure is the norm: Will your solution break if some one machine fails? (You don’t have to worry about this in MapReduce in this course because the MR framework takes care of it — you can check the original MR paper for more details — but it can be relevant in other parts of the course.)
  3. Scale-out > scale-up: Many less-powerful machines are better than a single powerful machine. So if you use a fancy fast hash-map in a single reduce instance processing a billion values, it doesn’t matter much — 100 reduce instances doing something simple and processing just 10 million values each will do equally well or better.
  4. Others (for later discussions):
    1. No concept of a “clock” — each machine has its own clock and we can’t have a global clock.
    2. No common memory (i.e. no global variables) — common memory is kind of possible but incurs network access. This relates to the well-known shared memory vs. message passing design choice.