Entries for month: July 2011
Hot on the heels of 0.0.4, here's another point release that fixes a compatibility issue with HSQLDB (thanks to Aaron Bedra for finding this problem!) as well as allowing more control over the PreparedStatement in with-query-results:
Changes in 0.0.5:
- Add prepare-statement function to ease creation of a PreparedStatement with common options (see the docstring for details)
- with-query-results now allows the SQL/params vector to be:
  - a PreparedStatement object, followed by any parameters the SQL needs
  - a SQL query string, followed by any parameters it needs
  - options (for prepareStatement), followed by a SQL query string and any parameters it needs
- Add support for databases that cannot return generated keys (e.g., HSQLDB):
  - insert operations silently return the insert counts instead of generated keys
  - it is the user's responsibility to handle this when using such a database!
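As a rough sketch of what those three shapes look like in use, following the 0.x with-connection style - the db spec and the fruit table here are illustrative, not from the original post:

```clojure
(require '[clojure.java.jdbc :as sql])

;; Illustrative HSQLDB connection spec - adjust for your database.
(def db {:classname   "org.hsqldb.jdbcDriver"
         :subprotocol "hsqldb"
         :subname     "mem:testdb"})

(sql/with-connection db
  ;; 1. a SQL query string, followed by its parameters:
  (sql/with-query-results rs ["SELECT * FROM fruit WHERE cost < ?" 100]
    (doall rs))
  ;; 2. an options map (passed along to prepare-statement), then SQL and parameters:
  (sql/with-query-results rs [{:fetch-size 50}
                              "SELECT * FROM fruit WHERE cost < ?" 100]
    (doall rs))
  ;; 3. a PreparedStatement you built yourself, followed by its parameters:
  (let [stmt (sql/prepare-statement (sql/connection)
                                    "SELECT * FROM fruit WHERE cost < ?")]
    (sql/with-query-results rs [stmt 100]
      (doall rs))))
```

This won't run without a JDBC driver on the classpath and a fruit table in place, so treat it as a shape reference rather than a ready-made script.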
I've been a bit lax in announcing new versions on my blog so, to rectify that, this post includes the full set of changes across all four dot releases so far (below). This is mostly a bug fix release, but it also lays the groundwork for exposing more control over the PreparedStatement used in queries, coming in one of the next few dot releases.
In Leiningen, just add [org.clojure/java.jdbc "0.0.4"] to your :dependencies to use this latest release.
clojure · oss
On July 13th, 2011 Robert C. Martin (aka "Uncle Bob") gave a talk at Skills Matter in London called The Last Programming Language. He was scheduled to give a version of it as the keynote for ACCU 2011, a conference I remember with fondness from my days back in England as a member of the Association of C and C++ Users! You can read Martin's blog post about the talk here but note there's a $2 charge to watch the version linked from that blog post - the Skills Matter version linked above is free.
TL;DR: He asks whether we've exhausted all possible programming paradigms and languages and whether we should now consider a single standardized programming language (and offers a suggestion of what that might be). Preposterous?
clojure · programming
Back in early May I talked about presenting FW/1 at D2WC, the upcoming Designer/Developer Workflow Conference in Kansas City, MO. I was really looking forward to visiting a new city and trying out a new approach to presenting. Unfortunately, personal issues have conspired to rob me of that opportunity and I'm no longer able to make the trip. Fortunately, my esteemed colleague at Railo, Mark Drew, has agreed to take over my talk at very short notice so you can still look forward to:
- An introductory talk about FW/1 by a stout, bald bloke with a funny accent - you'll hardly know the difference!
- Sponsorship by Railo Technologies, with a key staff member on site to answer all of your Railo questions and, hopefully, encourage you to let the Railo consulting team help solve your CFML problems!
What's new is the opportunity to take advantage of the comprehensive Railo Server Administration training course that Mark is running after the conference!
If you haven't registered for D2WC, there's still time. The conference offers an unparalleled blend of designer/developer topics focused on workflow and a host of expert speakers. I'm dead jealous of those of you attending!
coldfusion · d2wc · fw1 · railo
At World Singles, like many other companies, we use a Power MTA server to handle the email we send out - because we send a lot of email every day. When you have a large user community that you email regularly, one of the problems you have to deal with is that users change or abandon their email accounts and you often end up with a lot of stale email addresses that you need to process. The Power MTA server produces a daily accounting log file that is a giant CSV file containing the delivery status of every email it sends. If you're sending a lot of email, these accounting files can be very big. We send close to half a million emails a day so our accounting files are about 200MB.
To automatically handle bounced emails, all we need to do is extract the list of email addresses for which the Power MTA server was unable to deliver messages. A parsed CSV file is just a sequence of records, each record holding the column values of one line of the file. Given the size of the files, we don't want to load the whole thing into memory and convert it to a data structure - we'd quickly run out of memory! Fortunately, Clojure has lazy data structures which allow us to process large amounts of data one piece at a time.
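The laziness point can be seen with nothing but core Clojure: line-seq produces a lazy sequence of lines, and filter and count consume it one line at a time, so only a small window of the file is in memory at any moment. A minimal sketch - the in-memory sample here stands in for a real accounting file:

```clojure
(require '[clojure.java.io :as io])

;; Stand-in for a real accounting log; in production this would be a
;; reader over the 200MB file, consumed lazily one line at a time.
(def sample (str "b,1,2,3,4,bad@example.com\n"
                 "d,1,2,3,4,ok@example.com\n"
                 "b,1,2,3,4,gone@example.com"))

(with-open [rdr (io/reader (java.io.StringReader. sample))]
  ;; line-seq is lazy: each line is read, tested, and discarded in turn.
  (count (filter #(.startsWith % "b") (line-seq rdr))))
;; => 2
```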
David Santiago has created a very nice abstraction for CSV files - clojure-csv - and very quickly responded to my request to update the library to work with Clojure 1.3.0, which is what we're using at World Singles. With just that library, here's what it takes to extract all the email addresses from a 200MB CSV file without using a great deal of memory:
(ns pmta.bounces ; the original snippet omitted the ns form; this name is mine
  (:require [clojure.java.io :as io]
            [clojure-csv.core :as csv]))

(defn to-csv [file]
  (csv/parse-csv (csv/char-seq (io/reader file))))
(defn get-bounces [csv]
(filter #(= (first %) "b") csv))
(defn get-bouncing-emails [bounces]
(map #(% 5) bounces))
(def test-file "/Developer/workspace/worldsingles/ws/pmta/acct-2011-06-29-0000.csv")
(count (get-bouncing-emails (get-bounces (to-csv test-file))))
This is just a proof of concept to show the feasibility of such parsing. For production code, the use of reader should be wrapped in with-open to ensure the file handle is closed after the data is processed - and of course the actual list of emails needs to be processed against our member database so that we can flag bouncing email addresses. So, how does it work?
to-csv opens the file with a reader, produces a (lazy) character sequence and then parses that sequence to produce a (lazy) sequence of vectors, where each vector is a row of the file.
get-bounces filters the sequence of vectors to return just those marked as bounces (first column is "b"). Again, it's a lazy sequence.
get-bouncing-emails extracts just column 5 (numbered from 0) which represents the recipient ("rcpt" in the original CSV file), again as a lazy sequence.
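Putting those pieces together, here's a self-contained sketch of the same pipeline with with-open in place, as production code would need. A naive comma split stands in for clojure-csv (so it ignores quoting and embedded commas), and an in-memory sample replaces the real accounting file; column 0 is the record type and column 5 the rcpt address, as described above.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn bouncing-emails
  "Return the rcpt column of every bounce record read from rdr.
   Naive comma split in place of clojure-csv: no quoting support."
  [rdr]
  (->> (line-seq rdr)              ; lazy sequence of lines
       (map #(str/split % #","))   ; lazy sequence of column vectors
       (filter #(= (first %) "b")) ; keep only bounce records
       (map #(nth % 5))))          ; extract the rcpt column

;; Stand-in for the real 200MB accounting file.
(def sample (str "b,t,u,v,w,bad@example.com\n"
                 "d,t,u,v,w,ok@example.com\n"
                 "b,t,u,v,w,bad@example.com\n"))

;; with-open closes the reader when its body finishes, so the lazy
;; sequence must be fully realized (doall) before that happens.
(with-open [rdr (io/reader (java.io.StringReader. sample))]
  (doall (distinct (bouncing-emails rdr))))
;; => ("bad@example.com")
```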
I picked an arbitrary CSV file and then ran that last line which: converts the test file to a CSV sequence, filters it to get just bounced records, extracts just the email address and counts how many addresses we found. That line took about 45 seconds to process just over 450,000 records in an almost 200MB CSV file. A total of 46,000 email attempts bounced. (count (distinct (get-bouncing-emails (get-bounces (to-csv test-file))))) told me there were about 38,500 unique email addresses in that list.