Wednesday, July 04, 2012

Notes on Facebook's Haystack

A very interesting paper from Facebook on their photo storage (Finding a needle in Haystack: Facebook’s photo storage). A simple yet effective solution to their problem. Definitely a good example of how to leverage the properties of your problem to arrive at a design that delivers a big gain (4x reads/sec improvement). Some notes:
  • The solution is essentially: (1) cache file metadata in memory to reduce the number of I/Os for each photo retrieval; (2) store many photos in a single file, keeping files very large and reducing the amount of metadata.
  • This solution is tailored for this photo storage use case: written once, read often, never modified, and rarely deleted.
  • CDNs alone were not good enough: they are expensive and not well suited to Facebook's workload (the long tail of requests is substantial, but not cacheable by a CDN). Haystack still uses a CDN for the hottest photos.
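
To make the two ideas in the notes above concrete, here is a minimal sketch in Python. It is a toy, not Facebook's implementation: the class name TinyHaystack, its methods, and the (offset, size) index layout are all made up for illustration. It only shows the general shape: many photos appended to one large file, with an in-memory map so that a read needs a single seek and no per-photo filesystem metadata lookup.

```python
import os


class TinyHaystack:
    """Toy append-only store (hypothetical, for illustration only).

    Many photos live in one large file; a dict in memory maps each
    photo id to its (offset, size) inside that file.
    """

    def __init__(self, path):
        self.path = path
        self.index = {}  # photo_id -> (offset, size), kept entirely in memory
        open(path, "ab").close()  # create the volume file if it is missing

    def put(self, photo_id, data):
        # Write once: append the bytes and remember where they landed.
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        # Read often: the index lookup is in memory, so the only disk
        # work is one seek and one read in the big file.
        entry = self.index.get(photo_id)
        if entry is None:
            return None
        offset, size = entry
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(size)

    def delete(self, photo_id):
        # Rarely deleted: in this sketch we only forget the entry;
        # the bytes stay in the file until some later compaction.
        self.index.pop(photo_id, None)
```

Usage would look like store = TinyHaystack("volume.dat"), then store.put(1, photo_bytes) and store.get(1). The sketch leaves out everything that makes a real system robust (checksums, rebuilding the index after a crash, replication, compaction); it is only meant to show why the write-once, read-often, rarely-deleted workload makes such a simple layout workable.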
