A blog about software and making.

Exploring Probabilistic Data Structures

The Bloom filter and Count-Min Sketch data structures keep cropping while I’m reading so I figured I would try to implement them. They are both part of a family of probabilistic data structures that give up accuracy to store a lot of information in a tiny amount of space.

Bloom Filter - Could I have seen this before?
Count–Min Sketch - What’s the maximum number of times I could have seen this?

It’s also worth looking at:
HyperLogLog - Approximately how many unique elements have I seen?
Great post on probabilistic data structures

Bloom Filter code
Count-Min Sketch code