A discussion that crops up from time to time is whether primary keys should be UUIDs or simple auto-incrementing integers. There are pros and cons on both sides. UUIDs provide universal uniqueness which allows data to be stored across multiple databases, merged and so on. Auto-increment primary keys are easy to work with and efficient for indexing - but cause problems when you need to deal with data created across multiple databases, such as data migration and merging. UUIDs, as 36 character strings, take up a lot more space and can be much less efficient as an index.
But what about storing UUIDs in a more efficient format? A DBA recently recommended using BINARY(16) since that holds 128-bit data and a UUID is usually decomposable as two 64-bit long integers. The question is how to do this? And, I guess, another question is: is it really that much more efficient than a 36 character string?
When I first looked at implementing the DBA's advice, I couldn't find a type in cfqueryparam that would support BINARY(16). There's BLOB... and I guess that would work if I could provide a byte array? Then the question is how to get a UUID - a 36 character string of hex with '-' separators - into a byte array in the first place? And then you have to ask how efficient that would be (under the hood, the UUID generator will create two 64-bit longs and then have to convert that to a string before CFML gets it's hands on it!).
Then another issue is the raw speed of UUID generation in the first place. Adobe made some big improvements between CF8 and CF9 here (100x faster) but generating UUIDs is traditionally still quite slow and in CFML they get converted to a delimited hex string so you can't even get at the underlying binary data. When I did some research, I found that the UUID generator built into Java (since Java 5) can only produce type 3 (name-based) and type 4 (random) UUIDs - it can't create time-based (type 1) or DCE security (type 2) UUIDs. There are third-party generators that can produce time-based UUIDs and which perform better than the built-in Java version.
Overall tho' this sort of fiddling around is something better suited to a low-level language like Java. We want a raw UUID. We want to get the two 64-bit longs and write them to a 16-byte array and use that as our BINARY key. As an experiment, I tried Johann Burkard's lightweight time-based UUID generator - see below - and wrapped it up in Clojure to create the value I wanted:
(defn uuid [] (com.eaio.uuid.UUID.))
(defn uuid-as-byte-array []
(let [u (uuid)
^long lo (.getClockSeqAndNode u)
^long hi (.getTime u)]
(-> (java.nio.ByteBuffer/allocate 16)
(.putLong hi)
(.putLong lo)
(.array))))
There may well be more efficient ways of constructing the byte array, such as direct write operations and bit manipulation, but this seems clean and simple enough.
I'd be very interested to hear what others have done around UUID and/or binary indexes on large tables and, in particular, how you've deal with that from CFML?
See Johann Burkard's comparison of UUID generators for feature / performance analysis. See also Mark Kruger's recent post about leveraging Java UUID from ColdFusion - and the performance improvements possible (and read the comments).

10 responses so far ↓
1 Randy Merrill // Apr 16, 2011 at 8:58 PM
I found this post that talks about the performance difference a bit between the uuid and a normal character array: http://jochem.vandieten.net/2008/02/06/postgresql-uuids-and-coldfusion-1/
For me the benefits of it being almost guaranteed unique across databases is a big benefit. I haven't had to work with any databases where it would actually be an big performance slayer so I can't really speak from experience there.
2 Tyler Clendenin // Apr 17, 2011 at 2:26 AM
Unfortunately the method NEWSEQUENTIALID()
http://msdn.microsoft.com/en-us/library/ms189786.aspx
came around after we started using uniqueidentifiers for our primary keys. From what I understand (sans a reboot). As long as you have a network card (which of course you will) GUID's created with the sequential id function will be unique and will be sequential. This seems like it would have tremendous advantages. Eventually I hope to start using this method to generate primary keys for our schemas.
3 Tyler Clendenin // Apr 17, 2011 at 2:33 AM
http://kb2.adobe.com/cps/190/tn_19007.html
4 Barney Boisvert // Apr 17, 2011 at 2:23 PM
5 Sam Jones // Apr 18, 2011 at 12:39 AM
6 Sean Corfield // Apr 19, 2011 at 2:29 PM
7 Ken Redler // Apr 19, 2011 at 7:23 PM
8 Michel Fleur // Apr 22, 2011 at 1:31 AM
- Since 1.5, java has java.util.UUID, that hasn't got the clock-problem that Form.CreateUUID has.
- That problem was related to Thread.sleep() and fixed from Java 1.4.1
- UUIDs and Clustered indexes is indeed not a very good combination, on the other hand
- Generating random UUIDs client (java)-side increases scalability, because there is less (=no) coherence in the generation process. Beware of the birthday paradox though...
@Ken Redler
The problem with adding an additional, more meaningful column for a clustered index, is that it also introduces a performance hit, unless it's a column with a strictly rising value.
9 Arne Lewinski // May 20, 2011 at 6:40 AM
10 Sean Corfield // May 20, 2011 at 11:24 AM
You certainly do not give up layering - UUID creation is done in the data layer, where it belongs (and with newer data stores, UUID generation is done in the database itself).
Also, there's no less type information in a UUID than any other PK. The value alone is never enough: you always need something else to determine the "shape" of the data behind the PK.
However, it's entirely possible you will never need UUIDs. I don't have that luxury, hence this blog post...
Leave a Comment