An Architect's View

CFML, Clojure, Software Design, Frameworks and more...

duplicate() is bad for your (object's) health

June 1, 2007 · 31 Comments

ColdFusion 8 brings a lot of enhancements, both large and small, and it's interesting to see what gets some people excited. Andrew Powell thinks that being able to duplicate a CFC is the most important new feature in ColdFusion 8. I've already commented on that blog post but I thought I'd elaborate and talk about why I think this particular feature is dangerous and misguided. I really hope that this is just a temporary aberration in the public beta build and that the ColdFusion team remove this ability and restore the CFMX 7 behavior: duplicate() on a CFC should throw an exception.

"What?", you say, "but we've been asking for the ability to duplicate() CFCs for ages!" Yes, yes, I know... but have you actually thought about what it means?

As it stands, calling duplicate() on a CFC produces a full, deep copy of a CFC. It's quite a common design idiom to have references to other objects within any given object. If you duplicate such an object, you will create duplicates of all the objects it references - a full, deep copy.

Now look at Model-Glue and Transfer: both of these frameworks create objects that contain references back to the core framework object. In both cases, the core framework object is a singleton - only one instance is supposed to exist. In Model-Glue, the event context contains a reference back to the core ModelGlue framework object (several other objects also follow this model). In Transfer, each generated TransferObject contains a reference back to the core Transfer framework object. I expect Mach II and Reactor and some other frameworks behave the same way. It's a common idiom.

Duplicate one of these and you have a pretty serious problem: you suddenly have a full, deep copy of the entire framework object tree in your newly minted object! Ouch! With Transfer, for example, you'll now have a separate copy of the cache in each of your duplicated objects and you'll start to get subtle problems with the integrity of your data.
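
To make the hazard concrete, here is a minimal sketch. It assumes an engine on which duplicate() deep-copies CFCs, as described above for the ColdFusion 8 public beta. Registry.cfc and Item.cfc are hypothetical stand-ins for a framework singleton and a transient object. They are not taken from Model-Glue or Transfer, and all method names are invented for illustration.

```cfml
<!--- Registry.cfc: hypothetical stand-in for a framework singleton that owns a cache --->
<cfcomponent output="false">
    <cfset variables.entries = structNew()>

    <cffunction name="put" access="public" returntype="void" output="false">
        <cfargument name="key" type="string" required="true">
        <cfargument name="value" type="any" required="true">
        <cfset variables.entries[arguments.key] = arguments.value>
    </cffunction>

    <cffunction name="has" access="public" returntype="boolean" output="false">
        <cfargument name="key" type="string" required="true">
        <cfreturn structKeyExists(variables.entries, arguments.key)>
    </cffunction>
</cfcomponent>

<!--- Item.cfc: hypothetical transient object holding a reference back to the registry --->
<cfcomponent output="false">
    <cffunction name="init" access="public" returntype="any" output="false">
        <cfargument name="registry" type="any" required="true">
        <cfset variables.registry = arguments.registry>
        <cfreturn this>
    </cffunction>

    <cffunction name="getRegistry" access="public" returntype="any" output="false">
        <cfreturn variables.registry>
    </cffunction>
</cfcomponent>

<!--- demo.cfm: assumes both CFCs sit in the same directory as this template --->
<cfset registry = createObject("component", "Registry")>
<cfset original = createObject("component", "Item").init(registry)>
<cfset copy = duplicate(original)>

<!--- Write through the copy's registry, then ask the registry the application believes is unique --->
<cfset copy.getRegistry().put("user:42", "cached row")>
<cfoutput>
    Original registry sees the entry: #registry.has("user:42")#
</cfoutput>
```

If duplicate() performed a deep copy, the final check comes back false: the write landed in a second, private registry that the rest of the application never sees. With a shared reference it would come back true.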
Something that seemed like a simple operation - copying a transient object - suddenly turns into an extremely hard-to-debug problem with random data corruption in your application! Ouch! Ouch!

So why do you actually want duplicate() in the first place? The most common reason I've heard so far is that createObject() is "slow" so it would be great if you could just create one object and then duplicate it to produce new objects. This assumes duplicate() is faster than createObject(), right? And why do you think it would be? createObject() just creates a new object and runs the pseudo-constructor. duplicate() on the other hand would have to allocate space and copy all of the elements of the original object recursively. I think duplicate() would be slower than createObject(), especially now that ColdFusion 8 has made incredible improvements in performance, especially around creating new objects.

I ran some tests. I created a simple.cfc that has just a small init() method that sets two variables and two getters for those variables. Then I timed 1000 createObject() calls. About 50ms. Then I timed 1000 creations plus calls to the init() method. About 60ms. Nice. What about duplicate()? You think it'll be faster? Well, 1000 duplicate() calls took about 3 seconds. Yes, you read that right: 3000ms. Still want duplicate() on CFCs?
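
The test code itself is not shown in this post, so the following is only a rough reconstruction of what such a timing run might look like. The component name Simple.cfc, its property names (label, amount) and the template name are assumptions. Timings will vary with engine, version and hardware, so treat any numbers it prints as indicative at best.

```cfml
<!--- Simple.cfc: hypothetical bean with a small init() and two getters --->
<cfcomponent output="false">
    <cffunction name="init" access="public" returntype="any" output="false">
        <cfargument name="label" type="string" required="true">
        <cfargument name="amount" type="numeric" required="true">
        <cfset variables.label = arguments.label>
        <cfset variables.amount = arguments.amount>
        <cfreturn this>
    </cffunction>

    <cffunction name="getLabel" access="public" returntype="string" output="false">
        <cfreturn variables.label>
    </cffunction>

    <cffunction name="getAmount" access="public" returntype="numeric" output="false">
        <cfreturn variables.amount>
    </cffunction>
</cfcomponent>

<!--- timing.cfm: assumes Simple.cfc sits in the same directory as this template --->
<cfset iterations = 1000>

<!--- 1. Bare createObject() --->
<cfset start = getTickCount()>
<cfloop from="1" to="#iterations#" index="i">
    <cfset obj = createObject("component", "Simple")>
</cfloop>
<cfset createMs = getTickCount() - start>

<!--- 2. createObject() followed by init() --->
<cfset start = getTickCount()>
<cfloop from="1" to="#iterations#" index="i">
    <cfset obj = createObject("component", "Simple").init("a", 1)>
</cfloop>
<cfset initMs = getTickCount() - start>

<!--- 3. duplicate() of one pre-built instance --->
<cfset prototype = createObject("component", "Simple").init("a", 1)>
<cfset start = getTickCount()>
<cfloop from="1" to="#iterations#" index="i">
    <cfset obj = duplicate(prototype)>
</cfloop>
<cfset duplicateMs = getTickCount() - start>

<cfoutput>
    createObject(): #createMs# ms<br>
    createObject() + init(): #initMs# ms<br>
    duplicate(): #duplicateMs# ms
</cfoutput>
```

getTickCount() returns milliseconds, so each figure is the wall-clock time for the whole loop. It is worth running the page several times, since the first request also pays for compiling the CFC.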

Tags: coldfusion

31 responses so far ↓

  • 1 John Farrar // Jun 1, 2007 at 5:26 AM

    Much of the success of any web app today is shared hosting. This has the potential to destroy the rep of CF! (Your point is well taken.)

    My vote is the time would have been better spent adding some things to cfscript like redirect, abort, etc. to make it a complete scripting environment!
  • 2 Jeremy French // Jun 1, 2007 at 5:40 AM

    There could also be a problem in duplicating a CFC with a bidirectional relationship, e.g. an Order object that has references to OrderItem objects, which each have a reference back to their parent Order object. I haven't tried it yet, but wouldn't a deep copy create an infinite loop?

    The only way I can see duplicate() being workable on a CFC is if it's NOT a deep copy, in which case, how much use is it?
  • 3 Sami Hoda // Jun 1, 2007 at 6:16 AM

    Can you come up with some scenarios where using duplicate() would be appropriate? Or is it all bad?
  • 4 Radek Gruchalski // Jun 1, 2007 at 6:24 AM

    You are completely right, Sean. That is why some developers are asking for real statics in CF, I think...
  • 5 Damien McKenna // Jun 1, 2007 at 6:27 AM

    The speed rationale is misguided: shaving a few ms off object creation is going to have a negligible effect if your app spends most of its time accessing the database. "Premature optimization is the root of all evil," as they say.
  • 6 Steve Bryant // Jun 1, 2007 at 6:55 AM

    Sean,

    Great post. When I first read Andrew's post, something bothered me about using Duplicate on CFCs. I couldn't figure out why it seemed wrong to me.

    That being said, I can see where duplicating a CFC could be nice for testing purposes if I want to see what would happen to a component if I did X, but I don't want to mess with the component itself (and I plan to destroy the copy after I run my test).

    Not sure if I would actually do that in practice or not though.
  • 7 Sean Corfield // Jun 1, 2007 at 7:40 AM

    @Radek, not sure what static methods / data have to do with this - could you explain your logic there?

    @Sami, duplicate() only makes sense on a CFC that has just simple data members (i.e., very simple beans) but even then it will be faster to explicitly create a new object and initialize it from the existing values (not that speed should really influence our decision at that level).
  • 8 Andy Powell // Jun 1, 2007 at 7:54 AM

    My main point really is not speed, but code management. Why have a bunch of createObject() calls around your app when you can manage all of them in one place?
  • 9 Matthew Lesko // Jun 1, 2007 at 8:05 AM

    <rant>
    Stop asking for cfscript additions! Coldfusion is a tag-based, Markup Language (i.e. CFML). cfscript is not, and never can be, ECMA compliant unless changes to do so break existing code. However, it pretends/appears to be, which is just that much worse. So better to deprecate rather than expand a poorly implemented experiment.
    </rant>

    Regarding duplicate(), it now provides developers a loaded gun that can take off a leg (or worse), but may only be apparent when an application experiences load. That said, the same can be said of concurrency (i.e. cfthread). But these issues are not new and I think the solutions they provide outweigh the hazards they create.

    When I read Andrew Powell's blog about applying duplicate() to framework code I had the same reaction as Sean, but I don't know enough about any framework's design to comment definitively. Conceptually though, I see applicability in the ORM space, but not MVC.

    That said, I do think duplicate() provides an elegant solution for creating objects when:

    1. part of the object instantiation (including its composition) is expensive (usually some sort of I/O dependency in my experience) and identical between instances. Note, you then need to be able to alter object state after copying.
    2. classes are loaded dynamically at run time so you cannot use sub classing or decorators.

    See the GoF Prototype pattern for a more in depth discussion.
  • 10 Sean Corfield // Jun 1, 2007 at 8:36 AM

    @Andy, factories are a better way to manage your createObject() calls rather than replacing a bunch of createObject() calls with duplicate() calls. IMO.

    @Matthew, again, factories are the solution to the "expensive" instantiation problem - you don't need to duplicate the CFC, just the data that would be expensive to fetch. Not sure what you mean about dynamically loaded classes (Transfer ORM, for example, generates objects on the fly at runtime but still allows you to use decorators).
  • 11 John Farrar // Jun 1, 2007 at 8:43 AM

    Matthew, I checked Google to see whether script could comply with the standard or not, and this is the first thing that caught my interest.

    Read this.

    http://www.microsoft.com/presspass/press/1997/jun97/jecmapr.mspx

    In other words, this was achieved with script in IE4. 100% compliance.
  • 12 Steve Bryant // Jun 1, 2007 at 9:03 AM

    Matt,

    John didn't ask for ECMA compliance, just more utility in cfscript. You may not like cfscript, but others do.

    Personally, I think it would help the language if a developer could use a script format or a tag format for anything. It might help attract those who prefer a scripting syntax.

    Andy,

    Take a look at ColdSpring or Lightwire for some good offerings at circumventing the need for tons of CreateObject() calls (both are Dependency Injection engines).
  • 13 Fred Fortier // Jun 1, 2007 at 12:09 PM

    I am using cfscript a lot, not because tag scripting is not good, just because I (personal opinion here) find the cfscript markup easier to read in CFC's.

    ColdFusion tag scripting is great when working with HTML and generating tables etc. But I force myself never to put any display/HTML generating function in CFC's... just to keep things separated and clean.

    If I could do all my CFC's in cfscript that would be awesome.
  • 14 Mike Kelp // Jun 1, 2007 at 3:22 PM

    Surprised no one brought this up yet, but since CF 8 also provides us with interfaces, you might think they would have considered allowing duplicate() only on CFCs that implement a Cloneable interface, or something similar to Java's, which then lets you implement a clone method if you like.

    I know CF isn't Java but I think it was a good way to address all the concerns mentioned above as well as security.

    Mike.
  • 15 Daniel Greenfeld // Jun 1, 2007 at 10:19 PM

    Fred Fortier,

    I share your feeling about cfscript not having enough utility. So I extend it.

    What I do is take the tags I need from CF that are not available in cfscript and replicate them as functions inside a utility cfc. A simple example follows:

    <cfdump> becomes dump(). dump() does everything that <cfdump> does. Often I extend these functions to include more capability. Better yet, many of these have already been done and are stored on cflib.org. So much of the work is already done!

    So I end up with a win/win situation.
  • 16 Adam Cameron // Jun 2, 2007 at 4:35 AM

    Hi Sean
    Surely the problem is not with the notion of duplicating CFC instances, it's with HOW the duplicate() function seems to have been implemented to do this.

    If an object (quicker to type than "CFC instance"!) has a member variable that is a REFERENCE to another object, then the REFERENCE to the underlying object should be duplicated, NOT a "deep" duplication of the object to which the reference... err... refers.

    Or does this raise issues of its own? (I've only spent about 30sec thinking about this, and I have a hangover).

    ?

    --
    Adam
  • 17 John Farrar // Jun 2, 2007 at 6:41 AM

    Daniel... that sounds good. But your technique is not forward-compatible enough. If Adobe adds dump to the functions your code could break. You should make custom functions have a different format in some way. Perhaps you should use _dump() rather than dump(). Perhaps Sean or another guru could tell us if this could be an issue.

    I also tend for now to use _redirect(), and _abort() in my code. The truth is most of the tags just are not "fundamental". The bigger issue is things like <cfAbort> and things like that are wrong to be missing from script.

    And to the thought that you can just add them... functions like <cfSaveContent> could be added without making it part of the core language. Yet, adding that and others enhanced the utility of tag developers. The same utility is what those of us who do use script are after. (I am not asking for "all" the tags to be converted.)
  • 18 Nolan // Jun 2, 2007 at 8:25 AM

    I was just thinking about this, and had similar thoughts to the one Mike just brought up, two comments above.

    Sean, I understand your point, and it's a valid one, however it seems (to me) that your argument is specific to applications that have an object pattern which would break if Duplicate() were used. Moving forward with CF8, we'll have interfaces to use, which could cause a shift in how CF apps are designed, making this less of an issue, yes?

    Also, just saying "Duplicate() is bad for objects" is no different than saying "cfregistry in a cfloop could cause your server to crash". Okay so the tools used in a specific context aren't the best way to write code. Why not provide the tools for those situations that have a valid use, and educate against the possible negative aspects? Adobe could do something like...

    "objectCopy() -- Note: depending on your object model, using this function may cause unexpected behavior. Do not use objectCopy() if your object inherits from a Singleton as it will cause a bug in the application."

    ...and then if a user does it anyway, it's his/her own fault. However for those cases that aren't tied to a Singleton or a framework (maybe college kids using CF to learn how objects work? or just simple apps that aren't framework based but still use CFCs), we'd have a simple way to copy objects as needed.

    Yes? No?

    2 cents.
    -Nolan
    http://www.southofshasta.com/blog/
  • 19 Sean Corfield // Jun 2, 2007 at 9:38 AM

    @Nolan, interfaces will make no difference to this issue - it's about composition not inheritance (and you don't inherit from a singleton - singleton is a runtime construct, you can only inherit at compile-time). Even in the absence of singletons, I think people will find a deep copy of objects often gives surprising results (depending on their object model).

    You could happily use duplicate() on simple beans - but it would still be (much) faster to create a new object and initialize it with data from the original bean according to my tests.
  • 20 Gert Franz // Jun 2, 2007 at 1:39 PM

    Hi Sean,

    here my two things about duplicate(). There were about the same reasons why we have implemented a second parameter into the duplicate() method called deepcopy. You then can command Railo to do a flat copy of the component and maintain the pointers to the same objects. By default deepcopy is set to true.
    I have done the same tests as you did and it turned out that Railo needs 147ms for 10,000 object creations and only 32ms for duplicates. But with deepcopy set to true.

    Gert
  • 21 Elliott Sprehn // Jun 12, 2007 at 3:22 AM

    "I've been arguing this exact same thing for years so I totally agree with you. Anyone who bases a code choice on the results of running some code fragment in a loop on their own workstation is living in a dream world!" - Sean Corfield (http://bluedragon.blog-city.com/fallacy_of_loop_testing.htm)

    Anyway, I'm not sure I agree that this should throw an exception. C and C++ let you do some pretty awful things with pointers and memory (corruption), but those features exist in case you need them.

    Duplicate could be quite useful if you need to create another instance of an object that's expensive to create where a factory does not exist (3rd party framework) and could not be easily implemented.

    Also, if you were unit testing and wanted to compare the members of two objects, before and after a change. You could, of course, mirror the setup calls in the unit test for both instances, or you could write a factory, which honestly seems like overkill if it'll only ever need to be done in the tests.

    Duplicate is also useful for working on copies of expensive to create configuration objects. Could we write a factory? Sure. But we could also duplicate.

    Your loop doesn't test the performance to duplicate a very expensive object, like the core of Model Glue, vs loading it all over again, or writing a method for it to create a deep copy, either.

    If you were going to duplicate the entire expensive-like-ModelGlue instance we can:

    - Write a factory method that gets each member, duplicates its primitive structure or creates a new instance if it's a cfc, and walks the whole tree. This is going to be O(n) in the CFML code.

    - Keep a reference to the loaded configuration data (XML for MG) and generate a new instance of the object each time. This is going to be O(n) in the CFML code.

    - Just call duplicate() on it. This is going to be O(n) in the Java code.

    Which one is more readable? Which one uses less memory? Which one is faster? Which one results in more maintainable code?

    My bet is on duplicate. It might not be the best solution for all problems, it might not be the best solution to most problems, but to say it'll never be useful doesn't seem quite right to me.

    (Note that this is hardly news too, C++ has deep copies, Python has deep copies, ruby has deep copies, BlueDragon has had deep copying for cfcs with duplicate() since 6.1, and Railo allows it too)
  • 22 Andrew Powell // Jun 12, 2007 at 8:38 AM

    You know in the end, it's just like CFCs... You don't have to use this new functionality if you don't want to. Developers can still write CF apps with or without using duplicate() like this.

    When it all comes down to it, THAT is the beauty and power of CF. It can accommodate any development style a developer wants to use or not to use.
  • 23 Sean Corfield // Jun 12, 2007 at 8:47 AM

    @Elliott, C++ does *not* have deep copy semantics for objects - if you want deep copy semantics, you have to write your own copy constructor and copy assignment operator. That's what I'm advocating here - if duplicate() was a shallow copy on CFCs (but a deep copy on arrays and structs), I'd be much less concerned.

    Your example of the Model-Glue core is specious: Model-Glue is a singleton. Most "expensive to create" objects are singletons so duplicate() won't apply to them by definition, in my opinion.

    I'm sure folks will find duplicate() useful - I'm just very concerned that people will duplicate() a CFC and shoot themselves in the foot because deep copy semantics are not appropriate.

    For simple bean-like CFCs, duplicate() will work fine (although not for Reactor or Transfer managed objects!) but, as I noted, duplicate() in the Public Beta is much, much slower than createObject() now that the CF team have vastly improved the performance of createObject().
  • 24 Gert Franz // Jun 12, 2007 at 8:55 AM

    @Sean, I agree on that, but what about the additional parameter I mentioned above. Would it not solve the problem of deep copy and maybe improve performance even more?
  • 25 Sean Corfield // Jun 12, 2007 at 9:18 AM

    @Gert, yes, that would help a *little* but the problems still arise when you have a CFC with a reference to another CFC that in turn has a reference to a singleton: deep copy is not correct here but shallow copy might not be correct either. If duplicate() looked for a clone() method (or some such) and used that if present, then developers - particularly framework developers - would have sufficient control over the behavior to make it "safe".
  • 26 Elliott Sprehn // Jun 12, 2007 at 11:19 AM

    @Sean, Correct, you'd have to write your own copy constructor, but the capacity for creating a deep copy exists, it's just more involved. duplicate just does that work for you.

    It seems silly to me to require people to write their own deep copy constructor for every component to create a deep copy. CF is about saving time.

    I think using the clone() method, as you suggest, if it existed, and if it didn't then the default deep copy behavior seems like a good compromise.

    As a side note, duplicate() deep copies, structCopy() shallow copies. Most struct functions work on cfc instances. So it makes more sense to me to add the capacity to structCopy() a cfc instance for a shallow copy than to add parameters to duplicate.
  • 27 Elliott // Aug 8, 2008 at 2:03 AM

    I ran some recent benchmarks with CF8.0.1 and I'm definitely not seeing this huge performance hit. Quite the opposite.

    Creating 3000 objects:

    coldspring.beans.DefaultXmlBeanFactory:
    createObject: ~3400ms, duplicate: ~3400ms

    So I looked through the code and it turns out that CS calls createUUID() in the body of coldspring.beans.AbstractBeanFactory. I removed this because I suspected a performance hit...

    coldspring.beans.DefaultXmlBeanFactory(3000 times):
    createObject: ~850ms, duplicate: ~486ms

    Then I tried something very simple...

    coldspring.beans.BeanReference(3000 times):
    createObject: ~400ms, duplicate: ~70ms

    So duplicate can be 6x faster in the right conditions, or just as slow in the bad conditions.

    (Incidentally Transfer uses duplicate() to make creating TransferObjects faster on CF8)

    The oddest part of the initial test is that the createUUID() in the *body* of the <cfcomponent> tag was causing the duplicate slowness.

    So I tried adding writeOutput("test!") to the body of a cfcomponent. Creating it once and duplicating it 3000 times. And what did I get? "test!" output 3000 times! eep.

    That definitely has to be a bug (I hope), and a really nasty one at that.

    It does seem using that bug that you can fake singleton behavior though:

    <cfcomponent>
    <cfif isDefined("application.instances.MySingleton")>
    <cfthrow ...>
    </cfif>
    <cfset application.instances.MySingleton = 1>
    </cfcomponent>
  • 28 Sean Corfield // Aug 8, 2008 at 6:25 AM

    @Elliott, interesting. So it sounds like they've largely addressed the performance problems with duplicate() in CF8.0.1 (you should run your tests on CF8.0.0 as well to confirm that if you can).

    Interesting behavior that the pseudo-constructor is executed in duplicate() - have you filed it as a bug? http://adobe.com/go/wish
  • 29 Gary // Oct 29, 2008 at 4:02 AM

    I would like the ability to mark my complex objects as "transient" so that duplicate passes over them!
  • 30 Joel Ferreira // Mar 15, 2011 at 8:00 AM

    Just my two cents here but I was brainstorming one day and had the idea that the duplicate() would be faster than createObject() also. Our code base uses very simple CFC's that do not have references to others, circa CF6.

    I setup a test case and was very surprised by the results.

    in CF8 - duplicate() was twice as fast as createObject()

    in CF9 - duplicate() was about 30% slower than createObject()

    Something under the hood has definitely changed to be producing this behavior.
  • 31 Jose Galdamez // Dec 20, 2012 at 8:26 AM

    I'm working on an app that's using WireBox for the first time. Since I'm getting terribly slow performance on object creation, I considered the possibility of using duplicate() for a few of our transient objects. This blog post gives me reason enough to stay away from it.
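
Several comments above (7, 19 and 25) point to the same alternative: rather than calling duplicate(), create a new instance and initialize it from the existing values, keeping any shared references shared. A minimal sketch of that idea follows, assuming a hypothetical Address.cfc bean. The clone() method name echoes the suggestion in comment 25, but it is an ordinary user-written method here, not something duplicate() is known to call.

```cfml
<!--- Address.cfc: hypothetical bean with an explicit, hand-written clone() --->
<cfcomponent output="false">
    <cffunction name="init" access="public" returntype="any" output="false">
        <cfargument name="street" type="string" required="true">
        <cfargument name="city" type="string" required="true">
        <cfargument name="service" type="any" required="true" hint="Shared object that must not be copied">
        <cfset variables.street = arguments.street>
        <cfset variables.city = arguments.city>
        <cfset variables.service = arguments.service>
        <cfreturn this>
    </cffunction>

    <cffunction name="clone" access="public" returntype="any" output="false">
        <!--- New instance, same simple values, and the SAME service reference (no deep copy) --->
        <cfreturn createObject("component", "Address").init(variables.street, variables.city, variables.service)>
    </cffunction>

    <cffunction name="getCity" access="public" returntype="string" output="false">
        <cfreturn variables.city>
    </cffunction>

    <cffunction name="getService" access="public" returntype="any" output="false">
        <cfreturn variables.service>
    </cffunction>
</cfcomponent>
```

Calling clone() on an Address yields a second bean whose simple fields can change independently, while getService() on both instances still returns the one shared object. The author of the component decides, field by field, what is copied and what is shared.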
