Tuesday, 27 April 2010

ID the Identity

Since I started working in my new place I've come across the same sort of design issues that I encountered in my previous company. This is a post originally from my previous blog. I wanted to highlight this issue again since it has reappeared in my consciousness, but I also wanted to take the opportunity to update it with new learnings, so I moved it here.

One such common design issue is the matter of object identity and how it is used within the application under development.  What I have seen all too often is the following:

The Primary Key reference from the database record is the object identifier!

So, hands up who recognises what is stated above because they have done it in the past, are currently supporting a legacy app with this nuance, or are still doing it? And, as everyone heard all too often as a teenager, "everyone's doing it, what's the problem?"

Well, for a start, I've tried not to use the term "problem" up until now because, for one, I'm just not the dramatic Graham Norton type, and for two, I'm sure that if you are doing this on your current development project it probably won't have manifested itself as a problem. Not yet, at any rate.
But consider these points.

By exposing your primary keys as application IDs, you are writing this identifier in stone. Of course your identifier should be immutable, but this cannot always be guaranteed with the database sequences that are used for primary keys: rows can be renumbered when data is migrated or when two databases are merged. And for many successful (and therefore large) systems, the database has to be scaled out into server farms, so GUIDs are recommended to guarantee uniqueness even across multiple database servers. Who fancies having to remember a GUID for an object identifier?

The reason above looks at this issue from a purely technical viewpoint. The reasons below consider the functional aspects of such an implementation and (in my book) are all the more convincing for it.

Consider this - database primary keys are quite simply a database storage mechanism. They are a handy way to uniquely identify individual records in relational databases. Relational databases in enterprise applications are for persistence and nothing more. Why let your persistence mechanism bleed into your functionality? The primary key value is not application data and therefore should not feature in the application. So, on all projects I have a say on, I now vehemently insist on making the primary key ID a private or protected variable. People who haven't previously worked on a project with me usually look confused, or think I'm making a big deal out of nothing, at this stage.
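To make the "private or protected" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Customer` class, the `account_number` field, and the in-memory `CustomerRepository` standing in for a real table. Python has no enforced private members, so the leading underscore on `_row_id` is only a convention that says "persistence detail, hands off".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Customer:
    # Identity the business actually uses (hypothetical account number).
    account_number: str
    name: str
    # Database primary key: a persistence detail, kept out of repr and equality.
    _row_id: Optional[int] = field(default=None, repr=False, compare=False)


class CustomerRepository:
    """In-memory stand-in for a table with an auto-incrementing primary key."""

    def __init__(self) -> None:
        self._rows = {}          # primary key -> Customer
        self._next_row_id = 1    # plays the role of a database sequence

    def save(self, customer: Customer) -> None:
        # Only the repository ever assigns or reads the primary key.
        if customer._row_id is None:
            customer._row_id = self._next_row_id
            self._next_row_id += 1
        self._rows[customer._row_id] = customer

    def find_by_account_number(self, account_number: str) -> Optional[Customer]:
        # Callers look objects up by the application identity, never by row id.
        for customer in self._rows.values():
            if customer.account_number == account_number:
                return customer
        return None


repo = CustomerRepository()
repo.save(Customer("ACC-0042", "Acme Ltd"))
found = repo.find_by_account_number("ACC-0042")
print(found)  # the printed representation does not include the row id
```

The point of the design is that nothing outside the repository needs to know the row id exists, so the storage layer is free to renumber or re-key rows without the rest of the application noticing.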

Still not convinced? OK, what about this - by exposing the primary key of the record you are unnecessarily exposing your application implementation to your users. This may not be a big deal to all developers or in all situations, but for customers who want their applications to be as secure as possible, can you truthfully say that you have mitigated every possible security risk when you are advertising database IDs?

Not only have you unnecessarily introduced a potential security risk, you may be unintentionally misleading your customers. While I was listening to an old DotNetRocks podcast, another reason why this is not a good idea was highlighted - your customers can be misled as to the meaning of the identifier. The example given in the podcast (thanks Richard & Carl) was of a customer who was upset that their own customers were identified in their system by the database primary key. When their biggest customer turned out to be ID 372, it was a disaster and they insisted it had to be changed. No matter how arbitrary that ID is to a developer, perception is everything in business.

And finally, the reason that has me so cantankerous on this subject - in Eric Evans' DDD book he explains:
"it is common for identity to be significant outside a particular software system"
What Evans encourages with Domain Driven Design is to let the "domain drive your design". Nothing insightful there, and it sounds quite simple, right? So why isn't this enforced in so many software designs? What Evans failed to push home is that when you do have a significant identity outside the software system, that identity should be used as the object identifier. By designing your object model to incorporate such identities you are not only aligning your implementation more closely with the business domain, but also gaining and sharing a better understanding of that domain with anyone else using the object model (however subtle it may be). Simply taking the easy way out by using a database primary key not only encourages a lazy solution, but also discourages your development team from realising a better and more accurate design.
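As a rough illustration of using an identity that is significant outside the system, the sketch below identifies a book by its ISBN. The `Book` class is my own hypothetical example, not something taken from Evans, and the ISBN strings are made-up placeholders rather than real, validated numbers. Equality and hashing are based on the ISBN alone, and the ISBN is exposed through a read-only property so it cannot be reassigned by accident.

```python
class Book:
    """Entity identified by its ISBN, an identity that exists outside the system."""

    def __init__(self, isbn: str, title: str) -> None:
        self._isbn = isbn
        self.title = title

    @property
    def isbn(self) -> str:
        # Read-only property: there is no setter, so the identity stays fixed.
        return self._isbn

    def __eq__(self, other: object) -> bool:
        # Two Book objects are the same entity when their ISBNs match,
        # regardless of any other attribute.
        if not isinstance(other, Book):
            return NotImplemented
        return self.isbn == other.isbn

    def __hash__(self) -> int:
        return hash(self.isbn)


# Placeholder ISBNs, for illustration only.
first = Book("978-0-00-000000-0", "Working title")
second = Book("978-0-00-000000-0", "Corrected title")
print(first == second)  # same ISBN, so the two objects represent the same book
```

Anyone reading this class learns something about the business (books are known by their ISBN), which is exactly what a database row number cannot tell them.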

Of course, Evans goes on to explain that although this is indeed common, identities are sometimes only important in the system context (there are exceptions to every rule). That is OK, and should be expected too. The main point I am trying to emphasise is that you should always look at the domain first to identify an identity for your object. When there isn't a logical one in the real world, well, of course you should have a unique identifier mechanism. Absolutely, but please just do not make it a database primary key!
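One possible shape for such a mechanism is an application-level reference generator that lives in the domain code rather than in the table definition. The sketch below is an assumption of mine, not a prescription: `ReferenceGenerator`, the `ORD` prefix and the six-digit format are all hypothetical, and a real system would have to persist the counter (or use a different scheme entirely) so that references stay unique across restarts and servers.

```python
import itertools


class ReferenceGenerator:
    """Issues application-level references such as 'ORD-000001'.

    Illustrative only: the counter is held in memory, so uniqueness is
    guaranteed only within a single running process.
    """

    def __init__(self, prefix: str, start: int = 1) -> None:
        self._prefix = prefix
        self._counter = itertools.count(start)

    def next_reference(self) -> str:
        # Zero-padded so references sort and read consistently.
        return f"{self._prefix}-{next(self._counter):06d}"


orders = ReferenceGenerator("ORD")
first_ref = orders.next_reference()   # "ORD-000001"
second_ref = orders.next_reference()  # "ORD-000002"
print(first_ref, second_ref)
```

Because the reference is issued by the application, the database remains free to use whatever primary key suits it (a sequence, a GUID, anything), and that key never has to be shown to a user.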
