Tuesday, January 25, 2011

How to find topmost frequent items from a data stream?

I've been asked this question in a interview a long time back: if you have a huge amount of data and don't have memory to keep all of them, how do you get the top items, even if you have some error. Answering this precisely is not simple and requiring the interviewee to provide a 100% correct may be too much, but the goal is more about the discussion - and luckily it seems I did well at that time. Anyway, below you can find two good references on the topic, in case you are interested in learning more:

Manku, Motwani - "Approximate Frequency Counts over Data Streams" [pdf]

Data Streams: Algorithms and Applications
Post a Comment