Systems
Content
Hardware
- Processor, RAM, buses, GPU, disk, SSD, network, switches, racks, data centers
- Bandwidth, latency and faults
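A common first-order model for the cost of moving data is a fixed per-message latency plus size divided by bandwidth. The sketch below shows why batching matters. The link parameters (0.5 ms latency, 1 Gbit/s, i.e. 125e6 bytes/s) are hypothetical illustrative values, not measurements of any particular hardware.

```python
def transfer_time(size_bytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """First-order cost model: fixed per-message latency plus size / bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Hypothetical link for illustration: 0.5 ms latency, 1 Gbit/s = 125e6 bytes/s.
LATENCY = 0.5e-3
BANDWIDTH = 125e6

# Sending 1000 separate 1 KB messages versus one batched 1 MB message.
many_small = 1000 * transfer_time(1_000, LATENCY, BANDWIDTH)   # dominated by latency
one_large = transfer_time(1_000_000, LATENCY, BANDWIDTH)       # dominated by bandwidth
```

Under these assumptions the 1000 small messages take roughly 0.508 s, while the single batched message takes about 0.0085 s, because the latency is paid only once. The model deliberately ignores faults, congestion, and protocol overhead.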
Basic parallelization paradigms
- Trees, stars, rings, queues
- Hashing (consistent, proportional)
- Distributed hash tables and P2P
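The idea behind consistent hashing is that nodes and keys are hashed onto the same ring and each key is stored on the first node clockwise from it, so adding or removing a node only moves the keys on the affected arcs. The following is a minimal sketch under these assumptions: MD5 truncated to 64 bits as the ring hash, and a hypothetical `weight` parameter giving each node that many virtual points. Giving a node more points is one simple way to get proportional load; it is not necessarily the scheme used in the cited papers.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on a 2**64 ring (assumption: first 8 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Consistent hashing ring with weighted virtual nodes (illustrative sketch)."""

    def __init__(self):
        self._points = []  # sorted positions of all virtual nodes on the ring
        self._owner = {}   # ring position -> physical node name

    def add(self, node: str, weight: int = 100) -> None:
        # A node with more virtual points receives a proportionally larger share of keys.
        for i in range(weight):
            p = _hash(f"{node}#{i}")
            if p in self._owner:  # skip the (very unlikely) hash collision
                continue
            bisect.insort(self._points, p)
            self._owner[p] = node

    def remove(self, node: str) -> None:
        # Only keys on the removed node's arcs move; they go to the next point clockwise.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        if not self._points:
            raise KeyError("ring is empty")
        # First virtual node clockwise from the key's position, wrapping past the end.
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

For example, after building a ring with nodes "a", "b", "c" and then calling `ring.add("d")`, every key whose owner changes is now owned by "d"; all other keys stay where they were.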
Storage
- RAID
- Google File System / Hadoop Distributed File System (HDFS)
- Distributed (key, value) storage
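Parity-based RAID levels (RAID 4 and RAID 5) store, for each stripe, the XOR of the data blocks on an extra disk, so the contents of any single failed disk can be rebuilt from the survivors. The sketch below shows only this parity arithmetic; it assumes equal-length blocks and ignores striping layout, parity rotation across disks, and write handling.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def parity(data_blocks):
    """Parity block for one stripe: XOR of all data blocks."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks, parity_block):
    """Recover the single missing data block: XOR of the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])
```

This works because XOR is its own inverse: XOR-ing the parity with all surviving blocks cancels them and leaves the missing one. A second simultaneous failure cannot be recovered with a single parity block.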
Processing
- MapReduce
- Dryad
- S4 / stream processing
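The MapReduce programming model has users supply a map function that emits intermediate (key, value) pairs and a reduce function that combines all values sharing a key; the framework partitions the intermediate keys across reducers. The following single-process word-count sketch simulates that data flow. The names `map_fn`, `reduce_fn`, `run_mapreduce`, and `num_reducers` are hypothetical, and a real deployment would run map and reduce tasks on many machines and handle failures.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Map: emit (word, 1) for every word in a document."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce: sum the counts collected for one word."""
    yield word, sum(counts)


def run_mapreduce(inputs, mapper, reducer, num_reducers=2):
    # Map phase plus shuffle: each intermediate pair goes to partition hash(key) % R.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in inputs:
        for k, v in mapper(key, value):
            partitions[hash(k) % num_reducers][k].append(v)
    # Reduce phase: each partition processes its keys in sorted order.
    output = {}
    for part in partitions:
        for k in sorted(part):
            for rk, rv in reducer(k, part[k]):
                output[rk] = rv
    return output
```

For example, `run_mapreduce([("d1", "the cat"), ("d2", "The dog the")], map_fn, reduce_fn)` returns `{"cat": 1, "dog": 1, "the": 3}` (dictionary order may differ). Partitioning by key hash guarantees that all values for one key reach the same reducer.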
Structured access beyond SQL
- BigTable
- Cassandra
Supplementary material
Slides in PDF and Keynote. If you want to extract the equations from the slides, you can do so with LaTeXiT by simply dragging the equation images into it.
Scribe’s notes will be available once they’re ready.
Links
Consistent hashing (Karger et al.) paper
Stateless Proportional Caching (Chawla et al.) paper, slides
MapReduce (Dean and Ghemawat) site
Google File System (Ghemawat, Gobioff, Leung) site
Amazon Dynamo (DeCandia et al.) slides, paper, Werner Vogels’s DynamoDB post
BigTable (Chang et al.) site
Ceph filesystem (proportional hashing, file system) homepage, paper on distribution protocol
NVIDIA CUDA site
ATI Stream Computing site
Microsoft Dryad (Isard et al.) site
Memcached site
LinkedIn Voldemort (key, value) storage design description
PNUTS distributed storage (Cooper et al.) paper
SSDs (solid state drives) benchmarks
All Things Distributed, Werner Vogels’s blog
Videos
The quality is rather rough (this is unedited video straight from a Lumix GF2 with a 14mm lens, which should explain the sound and the exaggerated hands). Still, it should be a useful supplement to the slides.