Where there is data associated with a particular object that may change over a period of time there may be a requirement to keep a history of those changes so that you can tell what values were in effect for a particular date. Not only is this useful for keeping a record of changes that have been made in the past, it may also be useful for entering changes that will not come into effect until a date in the future. Typical examples of this requirement are:
In my long career I have seen several different ways of satisfying this requirement, some methods being better than others, so I want to share with you what I consider to be the most effective and efficient design.
First we must state the rules that must be satisfied in the design:
As we may be holding multiple history records for an object the database design should be obvious - a one-to-many relationship between the object and its history, as shown in figure 1 below:
Figure 1 - E-R diagram of OBJECT and OBJECT_HISTORY
The only questionable area now is the layout of the OBJECT_HISTORY table. Below is one design that I came across quite recently:
Design 1 - not good
|ID||Technical primary key|
|OBJECT_ID||Foreign key to OBJECT table|
|START_DATE||Starting date for this value|
I do not like this design as the use of an unnecessary technical primary key requires the maintenance of a counter and the creation of a second index for the foreign key. For further insight into my opinion on the indiscriminate versus intelligent usage of technical keys please refer to Technical Keys - Their Uses and Abuses which is available on my articles page.
A second design I came across several years ago was similar to the following:
Design 2 - not good
|OBJECT_ID||Primary key, and Foreign key to OBJECT table|
|DATE||Primary key, Starting date for this value|
I do not like this design as it has the start date built into the primary key, which means that it cannot be changed. I remember the panic this caused when some butter-fingered user accidentally entered the wrong date and wanted to change it in a hurry.
I dislike both of these designs as they do not hold the end date for each entry, therefore they both require to access more than 1 occurrence in order to find the single occurrence that matches the target date. The implementation I saw for Design 1 required separate stored procedures to accomplish the following steps:
The following design is the one that I prefer to use as it makes the retrieval of data extremely fast and efficient:
Design 3 - my favourite
|OBJECT_ID||Primary key, Foreign key to OBJECT table|
|SEQ_NO||Primary key, starts at 1 for each object|
|START_DATE||Starting date for this value|
|END_DATE||Ending date for this value|
This design has the following advantages over the others:
read u_where (start_date <= $target_date$ & end_date >= $target_date$)
Note that if an entry does not yet have a value for END_DATE I do not leave it as null. I always use a dummy date such as 31-12-9999 to simulate 'sometime in the future', as explained in Dealing with null End Dates from my Tips & Trix page.
The maintenance of these history occurrences is not a problem provided that you keep to the following rules:
selectdbstatement as follows:
selectdb max(seq_no) from "object_history" %\ u_where (object_id.object_history = object_id.object) %\ to $1 seq_no.object_history = $1 + 1or by using a counter on the parent record as follows:
last_seq_no.object = last_seq_no.object + 1 seq_no.object_history = last_seq_no.object
Note that the
selectdb statement above is very efficient as it references a field which is indexed. This requires only a single database access.
Note also that the use of a sequential number in the compound primary key makes the identification and retrieval of the previous and next occurrences very simple and very efficient:
This simplicity and efficiency is lost if you employ one of the solutions shown in Design 1 and Design 2.
Although this is a common and relatively simple requirement which can appear to be satisfied in several different ways there may be hidden drawbacks in a particular design that do not make themselves apparent until after it has been implemented. I have personally witnessed the weaknesses of some designs and have therefore created my own solution which does not contain any of those weaknesses. I hope that you can benefit from my experience and thus avoid your own painful learning curve.
8th November 2001