I am currently working on a Bloom filter implementation. I was wondering how can I test such a data structure since a Bloom filter is probabilistic in nature, I guess . I want to unit test and also test for false positives. in particular how can I unit test the filter methods (add/contains) and how can I verify that false positives are below a certain percent?
Asked
Active
Viewed 656 times
2
-
Are the hash functions used in your filter selectable? – Philip Kendall Jun 01 '17 at 05:59
-
Possible duplicate of [What are best practices for testing programs with stochastic behavior?](https://softwareengineering.stackexchange.com/questions/170392/what-are-best-practices-for-testing-programs-with-stochastic-behavior) – gnat Jun 01 '17 at 09:59
-
It is possible to use a very large set of random input, that is not generated (i.e. fixed random), for which you have previously computed the correct answer in some way. – Frank Hileman Jun 01 '17 at 23:55
1 Answers
1
False positives are possible in a bloom filter. So having one isn't a failure. You can test for many other things but this is asking the impossible.
You could set a threshold that you expect the false positives to be under but that requires enough trials that the test stops being fast.

candied_orange
- 102,279
- 24
- 197
- 315
-
...or a deterministic unit test. It's still possible for the test to fail, even with a threshold set. – Robert Harvey Jun 01 '17 at 02:26
-
@RobertHarvey Well, kinda. You could use the same data in the test every time trying to make it deterministic. But then the implementation becomes the source of random making it fail or pass based on arbitrary implementation choices that aren't actually wrong, just unlucky. – candied_orange Jun 01 '17 at 02:38
-
1@CandiedOrange I'd try hard to separate testing the Bloom filter (which I think is doable if you can mock out the hash functions) from testing the behaviour of the hash functions themselves, which I agree is a hard problem. – Philip Kendall Jun 01 '17 at 06:01