Distributed Systems Design Thumbrules
Some common thumbrules to keep in mind when designing distributed systems:
- Scale: Assume all your lists are billions of elements long.
- Failure is the norm: Will your solution break if any single machine fails? (You don’t have to worry about this for MapReduce in this course because the MR framework takes care of it; see the original MR paper for details. It can still be relevant in other parts of the course.)
- Scale-out > scale-up: Many less-powerful machines are better than a single powerful machine. A fancy, fast hash map in a single reduce instance processing a billion values doesn’t buy you much: 100 reduce instances doing something simple and processing just 10 million values each will do equally well or better.
- Others (for later discussions):
  - No concept of a “clock”: each machine has its own clock, and there is no global clock.
  - No common memory (i.e. no global variables): common memory can be emulated, but every access to it goes over the network. This relates to the well-known shared memory vs. message passing design choice.
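
The scale-out rule can be illustrated with a small sketch: keys are hash-partitioned across a number of reduce instances, and each instance uses nothing fancier than a plain dictionary. The names `partition`, `simple_reduce`, `run`, and `NUM_REDUCERS` are illustrative assumptions, not part of any MapReduce framework, and the input is scaled far down so it runs on one machine.

```python
from collections import defaultdict
import zlib

NUM_REDUCERS = 100  # hypothetical number of reduce instances


def partition(key: str, num_reducers: int = NUM_REDUCERS) -> int:
    """Pick a reducer for a key using a hash that is stable across machines."""
    # Python's built-in hash() is salted per process for strings, so use CRC32.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def simple_reduce(pairs):
    """Sum the values for each key using an ordinary dict."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run(pairs, num_reducers: int = NUM_REDUCERS):
    # Shuffle step: route every (key, value) pair to exactly one reducer.
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    # Each bucket could be reduced on a different machine; here we just loop.
    merged = {}
    for bucket in buckets:
        # A key lands in exactly one bucket, so results never collide.
        merged.update(simple_reduce(bucket))
    return merged


if __name__ == "__main__":
    data = [(f"word{i % 1000}", 1) for i in range(100_000)]
    counts = run(data)
    print(len(counts), counts["word0"])
```

Because each key is routed to exactly one bucket, the per-bucket work is independent, which is what lets simple reducers on many machines stand in for one heavily optimized reducer.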
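
Since there is no global clock and no shared memory, one common workaround is to order events with a logical counter carried on messages. The sketch below shows a Lamport-style logical clock; this technique is not named in these notes and is included only as an illustration, and the class name `LamportClock` and its method names are hypothetical.

```python
class LamportClock:
    """Logical clock: orders events without relying on wall-clock time."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Timestamp an outgoing message (sending counts as an event)."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """Merge the sender's timestamp, then count the receive as an event."""
        self.time = max(self.time, msg_time) + 1
        return self.time


if __name__ == "__main__":
    a, b = LamportClock(), LamportClock()
    a.tick()                # local event on machine a
    stamp = a.send()        # a attaches its counter to the message
    b.receive(stamp)        # b jumps ahead of the timestamp it received
    print(a.time, b.time)
```

The only state the two machines share is the timestamp inside the message, which fits the message-passing side of the design choice mentioned above.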