The Bloom filter and Count-Min Sketch data structures keep cropping while I’m reading so I figured I would try to implement them. They are both part of a family of probabilistic data structures that give up accuracy to store a lot of information in a tiny amount of space.
Bloom Filter - Could I have seen this before?
Count–Min Sketch - What’s the maximum number of times I could have seen this?
It’s also worth looking at:
HyperLogLog - Approximately how many unique elements have I seen?
Great post on probabilistic data structures