- The solution essentially (1) caches file metadata in memory to reduce the number of disk I/Os per photo retrieval, and (2) stores many photos in a single large file, which keeps the amount of metadata small.
- The design is tailored to this photo-storage workload: photos are written once, read often, never modified, and rarely deleted.
- CDNs alone were not good enough. Besides being expensive, they don't suit Facebook's traffic, where the long tail of requests for less popular photos is substantial and is not effectively cacheable by a CDN. Haystack still uses CDNs for the hottest photos, though.
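The two ideas in the first note can be illustrated with a toy store. This is a minimal Python sketch under my own assumptions, not Haystack's actual on-disk format: the 13-byte record header and the `HaystackStore` class with its `put`/`get`/`delete` methods are hypothetical. Photos are appended to one large file, and an in-memory dict maps each photo key to its offset and size, so a read needs one seek and one read rather than per-photo filesystem metadata lookups.

```python
import os
import struct
import tempfile

# Hypothetical record header (not Haystack's real format):
# 8-byte photo key, 4-byte data size, 1-byte deleted flag, little-endian, no padding.
HEADER = struct.Struct("<QIB")
FLAG_OFFSET = 12  # byte position of the deleted flag inside HEADER


class HaystackStore:
    """Hypothetical toy store: many photos in one large file plus an in-memory index."""

    def __init__(self, path):
        open(path, "ab").close()  # create the file if it does not exist yet
        self.f = open(path, "r+b")
        # In-memory index: key -> (header offset, data size).
        self.index = {}
        self._load_index()

    def _load_index(self):
        # One sequential scan at startup rebuilds the in-memory index.
        self.f.seek(0)
        offset = 0
        while True:
            header = self.f.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            key, size, deleted = HEADER.unpack(header)
            if deleted:
                self.index.pop(key, None)
            else:
                self.index[key] = (offset, size)
            offset += HEADER.size + size
            self.f.seek(offset)

    def put(self, key, data):
        # Photos are written once: always append at the end, never overwrite.
        offset = self.f.seek(0, os.SEEK_END)
        self.f.write(HEADER.pack(key, len(data), 0))
        self.f.write(data)
        self.f.flush()
        self.index[key] = (offset, len(data))

    def get(self, key):
        # In-memory lookup, then a single seek + read on the large file.
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, size = entry
        self.f.seek(offset + HEADER.size)
        return self.f.read(size)

    def delete(self, key):
        # Rare operation: flip the flag in place and drop the key from memory.
        offset, _ = self.index.pop(key)
        self.f.seek(offset + FLAG_OFFSET)
        self.f.write(b"\x01")
        self.f.flush()

    def close(self):
        self.f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = HaystackStore(os.path.join(d, "volume.dat"))
        store.put(42, b"...jpeg bytes...")
        print(store.get(42))
        store.close()
```

Because photos are never modified, appending is the only write path here; the rare delete just flips a flag, and this sketch never reclaims the space.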
Wednesday, July 04, 2012
Notes on Facebook's Haystack
A very interesting paper from Facebook on their photo storage (Finding a needle in Haystack: Facebook’s photo storage). It presents a simple yet effective solution to their problem, and is a good example of how to exploit the properties of your problem to arrive at a design that performs very well (4x reads/sec improvement). Some notes: