Some relational algebra with datatypes in Clojure 1.2

By: on August 21, 2010

Clojure 1.2 has been released and Leiningen has moved up to 1.3.0 (or was it briefly 1.2.0) which supports the latest version – so I decided to spend a spare afternoon playing with one of the new features of Clojure.

One of the interesting new features is datatypes which are a replacement for the structmap feature from earlier versions of Clojure. Datatypes and structmaps are directly equivalent to maps. Clojure provides an implementation of relational algebra that works on its maps in the clojure.set namespace, I expected that these functions should work with datatypes but I decided to check for myself.

I specifically wanted to check how close you an approach to an ideal relational system as defined by C. J Date in his books – Database in Depth and Databases, Types, and the Relational Model: The Third Manifesto. Date strongly believes that databases should be built on relational algebra and that none of the current relational databases achieve that goal. Database in Depth is a very readable and concise introduction to relational algebra and is well worth reading.

clojure.set contains six functions that correspond to the six fundamental operations in relational algebra – union, difference, rename, selection, projection and join. The first two, union and difference, are simple set operations that will definitely work on sets of datatypes. Let us try the other four operators on some datatypes, I’ll base my examples on a cut-down set of relations used by Date describing suppliers, parts and shipments.

A clojure datatype is created using the defrecord function. We will a datatype for a supplier, a part and a shipment:

(defrecord Supplier [number name status city])
(defrecord Part [number name colour weight city])
(defrecord Shipment [supplier part quantity])

We can create a supplier like this:

(Supplier. "S1" "Supplier 1" 100 "London")

We can then define our three relations like this:

(def suppliers
  #{(Supplier. "S1" "Smith" 20 "London")
    (Supplier. "S2" "Jones" 10 "Paris")
    (Supplier. "S3" "Blake" 30 "Paris")})
(def parts
  #{(Part. "P1" "Nut" "Red" 12.0 "London")
    (Part. "P2" "Bolt" "Green" 17.0 "Paris")
    (Part. "P3" "Screw" "Blue" 17.0 "Oslo")})
(def shipments
  #{(Shipment. "S1" "P1" 300)
    (Shipment. "S2" "P2" 200)
    (Shipment. "S3" "P3" 400)})

Now let us try the first relational operation rename:

clojure.core=> (use 'clojure.set)
clojure.core=> (rename parts {:number :id :city :location})
 #{{:location "London", :name "Nut", :colour "Red", :weight 12.0, :id "P1"}
 {:location "Paris", :name "Bolt", :colour "Green", :weight 17.0, :id "P2"}
 {:location "Oslo", :name "Screw", :colour "Blue", :weight 17.0, :id "P3"}}

So this results in a new map based on the parts relation with the number and city fields renamed to id and location respectively.

Now let us try a select operation:

clojure.core=> (select #(= (:name %) "Smith") suppliers)
#{#:Supplier{:number "S1", :name "Smith", :status 20, :city "London"}}

Note in this case the set that has been returned contains the datatype defined earlier not a simple map.

Now let us try the project operation:

clojure.core=> (project suppliers [:city])
#{{:city "London"} {:city "Paris"}}

Finally let us join the parts and shipments relations:

clojure.core=> (join parts shipments {:number :part})
#{#:Part{:number "P1", :name "Nut", :colour "Red", :weight 12.0, :city "London", :quantity 300, :part "P1", :supplier "S1"}}

Note that the results here is a set of the datatype Part with the additional joined fields held as additional fields on Part.

So this seems to work perfectly, it is also transitive, for example:

      (select #(= (:city %) "Paris")
      suppliers) shipments {:number :supplier})

This will produce the name of the suppliers that have shipped parts and are located in Paris.

So all is good with relational algebra with datatypes in Clojure! Not quite, C J Date is a bit more picky than that, he really doesn’t like null values, his ideal system has no place for null values. Clojure has a null value, nil, and any value can be nil. So I can create a record in which all values are nil, for example:

clojure.core=> (Supplier. nil nil nil nil)
#:Supplier{:number nil, :name nil, :status nil, :city nil}

Oh no – there are unknown values! We could create a factory function that ensured that you can’t pass in any nils but you can still do this:

clojure.core=> (assoc (Supplier. 1 "A" 20 "London") :number nil)
#:Supplier{:number nil, :name "A", :status 20, :city "London"}

So nil can still worm its way in to the datatype and we’ve introduced unknown values into our relational algebra which we probably don’t want to do. So more work is needed to satisfy C J Date. However, the core operations from clojure.set have done remarkably well and it could easily sit in the core of a relational system with some wrapping to keep the evil nil at bay.



  1. tonyg says:

    Given that Scheme^H^H^H^H^H^HClojure is unityped, aren’t nils perfectly valid column values all of a sudden? Wasn’t Date’s complaint more about NULL being considered a legitimate INTEGER? In a unityped world without column type constraints, you could just as well put a VARCHAR instead of an INTEGER, so singling out NULL seems a bit unfair.

  2. Sam Aaron says:

    Interesting, thanks for sharing.

    It reminds me of a previous project in which I used the functional aspects of Ruby to examine and explore a relational data set. It felt a lot more fluid and natural (to me) than using SQL. However, I wasn’t using relational algebra like you describe above, rather the more usual (yet seemingly closely related) map, inject/reduce, filter family of functions.

    I imagine that Clojure will perform pretty nicely over large data sets (especially compared to Ruby!).

  3. Sam Aaron says:

    @tonyg “Given that Scheme^H^H^H^H^H^HClojure”

    Not to be at all dismissive of your general point regarding the inclusion of NULL, isn’t it slightly unfair to conflate these two languages in such a manner? It feels like the only major similarity is that they are both Lisp-1s.

    It feels to me that the lispy part of Clojure is only a small part of its attraction. I’m also very much compelled by its concurrency semantics, immutable data-structures, its idea of being a hosted-language and the wonderful focus on a well curated suite of abstractions.

    Oh, and the fact that Rich Hickey has the most badass hairdo out of any of the language designers out there today!

  4. tonyg says:

    @Sam Aaron:

    “isn’t it slightly unfair to conflate these two languages in such a manner?”

    Yep. I wasn’t serious (though I wasn’t entirely joking either).

  5. tim says:

    Date has lots of complaints about nulls in relational algebra, the one he uses in the books I referenced is that if you have nulls you are on a slippery slope towards tri-valued logic, a slippery slope that SQL takes and gets wrong – if I remember correctly.

    Personally I think that if you don’t know what a value is it shouldn’t be added to the tuple!

  6. Olivier says:

    If you want to see a db system done in the spirit of the Third Manifesto you might want to look at the D4/Dataphor system. It looks like it never got real traction but it was a valiant attempt and it’s not alpha- or beta-quality software either.

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>