Bloom Filter implementation in F#
Further to my previous post on bloom filters for efficient ontology lookup, I’ve made a simple implementation in F#. This is based on a Java implementation by Ian Clarke.
The neat thing about Ian’s implementation is its use of Random to extend the hashcode provided by the object being stored into a hash of arbitrary length, suitable for use by the bloom filter algorithm. This will reduce the quality of the hash, but for an arbitrary passed-in object, it’s hard to do better. (For a specific application, like storing ontology labels, it would be better to use a more specific algorithm such as a Jenkin’s Hash).
#light open System // Based on Java Bloom Filter implementation by Ian Clarke// http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/ type BloomFilter(bitArraySize : int, expectedElementCount : int) = let bitSet = new System.Collections.BitArray(bitArraySize, false) let bitArraySize = bitArraySize let expectedElementCount = expectedElementCount let k = (int) (Math.Ceiling( ((double) bitArraySize / (double) expectedElementCount) * Math.Log(2.0) )) let bitSequence o = let r = new Random( hash o ) Seq.init_infinite (fun n -> r.Next(bitArraySize)) member b.expectedFalsePositiveProbability = Math.Pow((1.0 - Math.Exp( -((double) k) * (double) expectedElementCount / (double) bitArraySize)), (double) k) member b.add o = let sq = bitSequence o for x in 0 to k do bitSet.Set( Seq.hd sq , true) member b.addAll os = Seq.iter b.add os member b.clear = bitSet.SetAll(false) member b.contains o = let sq = bitSequence o let isSet n = bitSet.Get( Seq.hd sq ) Seq.for_all isSet [0..k] member b.containsAll os = Seq.for_all b.contains os
Using F# makes the code slightly neater than Ian’s Java version - I’ve been able to factor out the hash code into a sequence supplied by the bitSequence function, and to collapse some of the for loops into operations over lists instead. But, the basic structure of the code is still very similar.
March 19th, 2008 at 2:54 pm
Neat
I’m glad you liked the trick with Random, it occurred to me while lying in bed at about 5am one morning, and 30 minutes later I had a working BloomFilter implementation
Ian.