(with apologies to William Shakespeare)
I recently came across an article written by Robert C. Martin (Uncle Bob) with the title NO DB in which he complains about the rise of relational databases and the fact that some developers have the audacity to design their systems around the database. Here are some quotes from that article:
During this time I watched in horror as team after team put the database at the center of their system. They had been convinced by the endless marketing hype that the data model was the most important aspect of the architecture, and that the database was the heart and soul of the design.
But then I noticed something. Some of the systems using these nice, simple, tasty, non-relational databases were being designed around those databases. The database, wrapped in shiny new frameworks, was still sitting at the center of the design! That poisonous old relational marketing hype was still echoing through the minds of the designers. They were still making the fatal mistake.
As a developer who has been designing and building enterprise applications for over 30 years I dispute the notion that designing an application around the database is a fatal mistake. On the contrary, not only is it not fatal, by adopting a database-centric approach I have created an open source framework called RADICORE for building database applications which allows me to be far more productive than any of my rivals. I have used this framework to build a large ERP package, originally called TRANSIX, but which is now being marketed under the name GM-X by Geoprise Technologies. This application uses 350 database tables, 700 relationships, and handles over 2,800 user transactions.
I use my own framework instead of any of the others for several reasons:
In his article Uncle Bob makes the following statement:
The center of your application is not the database. Nor is it one or more of the frameworks you may be using. The center of your application are the use cases of your application.
While I agree that the use cases are vitally important, I do not agree that they are the center of the application. Use cases merely identify what needs to be done, but the application implements those use cases by showing how it is actually done using a particular toolset (operating system, programming language, database, et cetera). The use cases are nothing more than a paper representation of the user requirements, while the application is a physical implementation of those requirements.
When building software which interfaces with a database you are building what used to be called a "data processing system", so it is necessary to first understand what is meant by a "system" in general then a "data processing system" in particular.
A "system" is a something which takes "input" and transforms it into "output", as shown in Figure 1:
Figure 1 - a System
A factory is an example of a system as it takes raw materials and transforms them into finished goods.
A "data processing system" is something which sits between a user and a database. Raw data goes in and is stored in a database from where it can be extracted and formatted in a variety of different ways. This is shown in Figure 2:
Figure 2 - a Data Processing System
AS the software sits between two entities - a user and a database - it is therefore required to have a separate interface for each of these entities, as shown in Figure 3:
Figure 3 - software with two interfaces
Those of you who are on the ball may notice that this 3-part structure can be implemented using the 3-Tier Architecture.
Having worked in a software consultancy for many years designing and building a variety of applications for a variety of customers I have personally been part of the full application life cycle which follows these steps:
This "logical" design is nothing more than a paper document which does not require the generation of any program code. It can be shown to and discussed with the customer to verify that it can meet all their requirements. It can also be used by the consultancy to provide cost and time estimates to build the application.
Note that the output of the analysis/design phase has two parts - a list of use cases with their user interfaces and a logical database design. One without the other would be virtually useless.
As the customer paid separately for the production of the Logical Design it had to be of good enough quality and have sufficient detail so that it could be implemented by another consultancy. Different consultancies use different methodologies and toolsets, so may be able to implement that design with different costs and/or timescales.
The team leaders decide on the application architecture, the programming language, and what framework and libraries will be used. Each functional specification identifies a user interface, what actions need to be taken on what database tables, and what business rules need to be applied. The developer then turns the specification into code, tests it, then releases it.
It is important to note here that the output of the analysis/design phase identifies what needs to be done but not how it should be done. It should not be biased towards any particular languages, toolsets or libraries unless it is required to integrate closely with an existing application.
Up until this point each Use Case is nothing but a paper design, and the application itself does not exist except on paper. Once the Build phase starts the physical database can be built from the logical design, then detailed functional specifications can be produced and handed off to the developers so that they can turn the paper designs into working code. I have seen two approaches used by analysts when preparing functional specifications for developers:
In my experience the former has always led to disasters as analysts who are not also developers are not the best people to determine what code should be written in order to achieve the desired result. An experienced developer is quite capable of taking a document which describes the what and turning it into the how by writing efficient and effective code. I have personally seen what happens when the analyst emphasises the how without mentioning the what. In one case I wrote the code exactly as it was specified as that was all I had to work with. When the analyst tested what I had written he complained to me that it did not produce the results that he expected. It was only by getting him to explain precisely what the program was supposed to do that I spotted a serious flaw in his pseudo-code. The code that I eventually wrote to implement his requirements bore no resemblance to the code that he had envisaged, yet the results were correct.
Please read Logical Versus Physical Database Modeling for a more complete description of the differences between a logical and physical database design.
The start of the Build phase is also when the big question is asked - do you design the software to match the database, or design the database to match the software?
Too many of today's OO programmers know very little of how relational databases work, but have their heads crammed full of Object-Oriented Design (OOD) theory. They seem to think that the universe revolves around their precious theories and that everything else, especially the database, is nothing but an implementation detail which can be left until later. A common approach is to use mock objects to simulate all communication with the database and not to build the physical database until the software has been finalised. A reason for this is that it may become difficult to make changes to the database structure after the software has been written. As far as I am concerned if an application which is written to interact with a database cannot deal with changes to the database structure without requiring significant changes to large numbers of software modules then that software has been badly written.
Anyone who has been involved in writing database applications for a reasonable amount of time should be able to tell you that the design of the database is absolutely paramount. If you get this design wrong this will have a detrimental effect on the entire application and no amount of clever code will fix it. This is emphasised in the following quote:
Smart data structures and dumb code works a lot better than the other way around.
Eric S. Raymond, "The Cathedral and the Bazaar"
In order to design an efficient structure for your database you should follow the rules of data normalisation. You know when this design is correct when you can generate SQL queries to obtain the data for all your use cases as efficiently as possible. Any online query that is forced to use a full table scan is the product of a bad database design and will cause slow response times followed by complaints from your users. The beauty of a good database design is that not only will it handle all the queries you have identified today, but it will also be able to handle most if not all of the new queries that may arise in the future. Invariably the user will generate new or modified requirements which may require the addition of extra columns or extra tables, or perhaps moving columns from one table to another. Any such modifications should also be subject to the rules of normalisation so that the data structure is not compromised otherwise it will come back to bite you. Been there, done that, got the t-shirt.
Once the physical database has been built it should be possible to start writing the code to access that database. I say should, but unfortunately the OO purists just love to introduce a totally unnecessary software design phase which has to deal with IS-A and HAS-A relationships, along with class hierarchies, dependencies, associations, aggregations and compositions. The result of this process is invariably a software structure which is incompatible with the database structure. This causes a problem known as the Object-Relational Impedence Mismatch for which the solution is an Object-Relational Mapper (ORM). I think that such things are an abomination as they create more problems than they solve. In a situation such as this I prefer to follow this advice:
It is better to attack the root cause of a problem instead of addressing its symptoms.
If the root cause of the problem is a mismatch between the two structures then in my book the best solution would be to remove the cause of the mismatch. This can be done by producing one structure instead of two, but which one should you keep and which one should you ignore? The answer to me was a no-brainer. Having designed and developed database applications for several decades before switching to an OO-capable language I was very familiar with database design. In 2002 when I ventured into the world of OOP with the PHP language I knew nothing about object-oriented programming except that it was the same as procedural programming but with the addition of encapsulation, inheritance and polymorphism. I was not aware that I was supposed to design my classes according to a different set of artificial rules, so I ended up in the position of ignoring a huge number of them.
It is not possible to build an application directly from a user's Specification of Requirements (SOR) as it invariably does not contain enough detail. A proper analysis and design phase is needed in order to identify two things - a list of use cases (user transactions) with their user interfaces (UI), and a logical database design. The number of database tables and relationships, plus the number and complexity of the use cases, will give an idea of the size of the application and what resources may be required to build it. Once you know what each use case is supposed to achieve from a user's perspective you can then map out which database tables to access, how they need to be accessed (Create, Read, Update or Delete), and what business rules need to be applied.
It is even possible to start with nothing more than a database design and begin building the software straight away by deducing what user transactions will be needed. In other words you can build the use cases from the database design. This is actually easier than than the other way around.
Once you have the database design you do not need to go through a separate design phase for the software. You simply build a user transaction for each use case which performs the relevant CRUD operation(s) on the relevant database table(s) and add in the code to deal with any extra business rules. It helps if you have a framework which was specifically designed to build user transactions which do things with database tables.
The biggest problem that I have seen amongst OO purists is that they ignore the database design when building the software, and so produce a software structure which does not match the database structure. This is a fundamental mistake which I learned to avoid as far back as 1980 when I attended a Jackson Structured Programming course, and it is a mistake that I absolutely refuse to repeat. Consequently when I start the build phase all I do is create the physical database from the logical database design, then start building my components to implement each use case. I do not waste any time in designing my software structure - what classes do I need, what class hierarchies do I need, what methods do I need to define - as all those decisions are unnecessary when using my RADICORE framework:
Using my Data Dictionary I am able to generate a fully-functioning class for each database table at the touch of a button. Using my library of Transaction Patterns I can also create fully-functioning user transactions at the touch of a button. Note here that each use case has to be translated into what communication it has with the database, so "Create an Invoice" becomes "Add a record to the INVOICE table". Every instance of "Add a record to the ??? table" then becomes an instance of either the ADD1 or ADD2 patterns. These provide basic behavior, including primary data validation, which can be augmented by inserting custom code into the relevant customisable methods. This approach means that developers get to spend the minimum amount of time on doing the basic stuff which leaves them with much more time to spend on the important stuff.
Being able to build an entire application starting with nothing more than a database design and a list of use cases means that I can produce applications far more quickly than my OO purist rivals who like to waste their time ensuring that their software follows a set of arbitrary and artificial rules. This proved useful in 2007 when, after having built a bespoke Sales Order Processing system for a client I was asked if I could build a package which could be used by any client. Having previously obtained a copy of Len Silverston's Data Model Resource Book, which contains a library of universal data models for all enterprises, I knew this would be a good starting point. Using my framework I created a prototype multi-lingual and multi-currency application around the PARTY, PRODUCT, ORDER, INVOICE, INVENTORY and SHIPMENT databases which I could demonstrate to the client in only 6 months. That was over 600 working user transactions in just SIX MONTHS! If you cannot match that level of productivity then you must be doing something wrong.
Since that prototype went live in 2008 I have modified the databases, even added new databases, and added more user transactions. The application how has 350+ database tables, 700+ relationships, and 2,800+ user transactions. I could not have achieved this if I had followed the advice of the OO purists, which tells me that OO "purity" is not the silver bullet that it's made out to be. In fact I would go so far as to say that it is more like a gilded turd.
© Tony Marston
5th March 2017