An Architect's View

CFML, Clojure, Software Design, Frameworks and more...

Real World Clojure - XML generation

October 27, 2011

As mentioned earlier in this series, we have a process written in Clojure that reads updated member profiles from our MySQL database, converts them to XML and posts them to our search engine. XML generation in Clojure is made very easy by using a library called hiccup. For example, the following XML:

<items>
  <item id="Clojure" type="language">
    <emotion>Joyful!</emotion>
  </item>
</items>

is represented in hiccup as the following data structure:

[:items
  [:item {:id "Clojure" :type "language"}
    [:emotion "Joyful!"]]]

and is then rendered to XML by calling hiccup.core/html (hiccup is primarily an HTML library, but the same notation works for XML). Since we are simply creating Clojure data structures, all of the normal benefits of Clojure apply and we can use all of the functions we are used to. Once we have our list of member profiles to publish, we simply do this to create the appropriate XML:

(hiccup.core/html [:changeset (map render-user users)])
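To make the mapping from vectors to markup concrete, here is a toy renderer in plain Clojure. It is not hiccup's implementation (hiccup does a great deal more, and hiccup.core/html is what you would actually call); it only illustrates the convention hiccup follows: the first item is the tag, an optional map holds the attributes, everything else is content, and a nested sequence such as the result of (map render-user users) is rendered element by element in place. The names toy-html and render-attrs are hypothetical.

```clojure
;; Toy, hiccup-style renderer for illustration only: no escaping, no
;; special handling of empty tags, not hiccup's actual implementation.
(defn- render-attrs [attrs]
  ;; {:id "Clojure" :type "language"} => " id=\"Clojure\" type=\"language\""
  (apply str (for [[k v] attrs] (str " " (name k) "=\"" v "\""))))

(defn toy-html [node]
  (cond
    ;; [:tag {attrs} & children] or [:tag & children]
    (vector? node)
    (let [[tag & more] node
          [attrs children] (if (map? (first more))
                             [(first more) (rest more)]
                             [{} more])]
      (str "<" (name tag) (render-attrs attrs) ">"
           (apply str (map toy-html children))
           "</" (name tag) ">"))
    ;; a seq (e.g. from map or for) is rendered element by element in place
    (seq? node) (apply str (map toy-html node))
    (nil? node) ""
    :else (str node)))

(toy-html [:items
           [:item {:id "Clojure" :type "language"}
            [:emotion "Joyful!"]]])
;; => "<items><item id=\"Clojure\" type=\"language\"><emotion>Joyful!</emotion></item></items>"
```

Because render-user returns plain vectors, it can be unit tested by comparing data structures, without parsing any XML.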

The render-user function starts out like this:

(defn- render-user [user]
  (if (excluded-from-search user)
    [:remove-item {:id (:id user)}]
    [:set-item {:id (:id user)}
     [:properties
      [:struct
        ...]]]))

The elided code walks through the elements of a user profile and maps them to something like this:

[:entry {:name "Height"} [:string "183"]]
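The code that walks a profile is elided above, so the following is only a hypothetical sketch of that step. It assumes a profile is a Clojure map and uses a made-up field-names map to choose which keys to publish and what to call them; render-entry and render-profile-fields are illustrative names. It emits the [:entry {:name ...} [:string ...]] shape that render-name-value produces, while the real code also handles multi-valued fields and transformations.

```clojure
;; Hypothetical: which profile keys to publish and their search-engine names.
(def field-names {:height "Height" :languages "Languages"})

(defn render-entry [field-name value]
  ;; e.g. "Height", 183 => [:entry {:name "Height"} [:string "183"]]
  [:entry {:name field-name} [:string (str value)]])

(defn render-profile-fields [profile]
  ;; Walk the known fields that are present in this profile. `for` returns a
  ;; lazy seq, which hiccup renders by rendering each element in turn.
  (for [[k field-name] field-names
        :let [v (get profile k)]
        :when (some? v)]
    (render-entry field-name v)))

(render-profile-fields {:id 42 :height 183})
;; => ([:entry {:name "Height"} [:string "183"]])
```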

The value for a profile element can be a comma-separated list of strings, which must be rendered as an array, so the function that actually renders a name/value entry looks something like this:

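(defn- render-name-value [name ^String value]
  ;; the ^String hint lets the Java calls below resolve without reflection
  [:entry {:name name}
   ;; a comma-separated value becomes an array of string elements
   (if (.contains value ",")
     [:array (for [item (.split value ",")]
               [:element
                [:string item]])]
     [:string value])])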

Again, see how easily we can drop into Java and how the result of String.split() can be treated as a Clojure sequence in the for expression. In the actual code, we run transformation functions against a number of user attributes and we also make the values XML-safe.
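The post doesn't show how values are made XML-safe. One possible approach, sketched here with a hypothetical xml-safe helper, is to replace the five characters that have predefined XML entities using clojure.string/escape, and to apply it inside a variant of render-name-value.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: replace the five characters with predefined XML entities.
(defn xml-safe [^String s]
  (str/escape s {\& "&amp;" \< "&lt;" \> "&gt;" \" "&quot;" \' "&apos;"}))

;; A variant of render-name-value that escapes each value before it goes
;; into the hiccup structure.
(defn render-name-value [name ^String value]
  [:entry {:name name}
   (if (.contains value ",")
     [:array (for [item (.split value ",")]
               [:element [:string (xml-safe item)]])]
     [:string (xml-safe value)])])

(xml-safe "R&D <team>")
;; => "R&amp;D &lt;team&gt;"
```

Escaping before the data reaches the renderer keeps the hiccup structure itself free of any special cases.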

This is part of our "publisher" daemon that runs 24x7, keeping our search engine up-to-date. To cope with heavy traffic, we batch the updated profiles and perform the transform and post operation in parallel across those batches:

(doall (pmap (partial publish-user-block options)
       (partition-all num-records users-to-publish)))

We typically read up to a couple of thousand records at a time and break them into groups of num-records records each, so publish-user-block runs on each group with as much parallelism as pmap allows. We experimented with various combinations of batch size and group size to find what was optimal for our servers (on our less powerful CI and QA servers, the publisher processes use fewer groups).
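The batching step can be tried in isolation. In this sketch publish-user-block is a hypothetical stub that only counts the users in its group, and publish-all is a made-up wrapper around the expression above; the real function renders each group with hiccup and posts it to the search engine.

```clojure
;; Hypothetical stand-in for publish-user-block: no rendering, no HTTP.
(defn publish-user-block [options users]
  {:dry-run? (:dry-run? options) :published (count users)})

(defn publish-all [options num-records users-to-publish]
  ;; partition-all keeps the final, shorter group; pmap is lazy, so doall
  ;; forces every group to be processed before we return
  (doall (pmap (partial publish-user-block options)
               (partition-all num-records users-to-publish))))

(map :published (publish-all {:dry-run? true} 250 (range 1800)))
;; => (250 250 250 250 250 250 250 50)
```

With 1,800 users and num-records of 250, partition-all yields seven groups of 250 and a final group of 50, and pmap returns the results in the same order as the groups. In the clojure.core implementation, pmap runs ahead by roughly the number of available processors plus two, so on a given machine the group size is the main thing left to tune.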

