Archive for the ‘fsharp’ Category

OpenHandle code examples in C# and F#

Wednesday, April 23rd, 2008

I’ve just contributed some code examples to the OpenHandle project.

OpenHandle exposes data from Handle, which is an interesting way of providing persistent digital identifiers for information, incorporating metadata. It’s used by systems such as DSpace and DOI.  In some ways, it’s a competitor to existing DNS-based ways of providing persistent URIs.

The code examples I’ve written demonstrate how to download OpenHandle data in F# and in C#. The C# implementation uses Linq.

Bloom Filter implementation in F#

Wednesday, March 19th, 2008

Further to my previous post on bloom filters for efficient ontology lookup, I’ve made a simple implementation in F#. This is based on a Java implementation by Ian Clarke.

The neat thing about Ian’s implementation is its use of Random to extend the hashcode provided by the object being stored into a hash of arbitrary length, suitable for use by the bloom filter algorithm. This will reduce the quality of the hash, but for an arbitrary passed-in object, it’s hard to do better. (For a specific application, like storing ontology labels, it would be better to use a more specific algorithm such as a Jenkin’s Hash).

#light
open System
// Based on Java Bloom Filter implementation by Ian Clarke 
// http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/
type BloomFilter(bitArraySize : int, expectedElementCount : int) =
    let bitSet  = new System.Collections.BitArray(bitArraySize, false)
    let bitArraySize = bitArraySize
    let expectedElementCount = expectedElementCount
    let k = (int) (Math.Ceiling( ((double) bitArraySize / (double) expectedElementCount) * Math.Log(2.0) ))
    let bitSequence o =
        let r = new Random( hash o )
        Seq.init_infinite (fun n -> r.Next(bitArraySize))
    member b.expectedFalsePositiveProbability =
        Math.Pow((1.0 - Math.Exp( -((double) k) * (double) expectedElementCount / (double) bitArraySize)), (double) k)
    member b.add o =
        let sq = bitSequence o
        for x in 0 to k do bitSet.Set( Seq.hd sq , true)
    member b.addAll os = Seq.iter b.add os
    member b.clear = bitSet.SetAll(false)
    member b.contains o =
        let sq = bitSequence o
        let isSet n = bitSet.Get( Seq.hd sq )
        Seq.for_all isSet [0..k]
    member b.containsAll os = Seq.for_all b.contains os

Using F# makes the code slightly neater than Ian’s Java version - I’ve been able to factor out the hash code into a sequence supplied by the bitSequence function, and to collapse some of the for loops into operations over lists instead. But, the basic structure of the code is still very similar.