8th June 2011
The premise of the article SOLID OO Principles is that in order to be any "good" at OO programming or design you must follow the SOLID principles, otherwise it is highly unlikely that you will create a system which is maintainable and extensible over time. This to me is another glaring example of the Snake Oil pattern, or "OO programming according to the church of <enter your favourite religion here>". These principles are nothing but fake medicine being presented to the gullible as the universal cure-all. In particular the words "highly unlikely" lead me to the following observations:
Some of these principles may have merit in the minds of their authors, but to the rest of us they may be totally worthless. For example, some egotistical zealot invents the rule "Thou shalt not eat cheese on a Monday!" What happens if I ignore this rule? Does the world come to an end? Does the sun stop shining? Do the birds stop singing? Does the grass stop growing? Does my house burn down? Does my dog run away? Does my hair fall out? If I ignore this rule and nothing bad happens, then why am I "wrong"? If I follow this rule and nothing good happens, then why am I "right"?
It is possible to write programs which are maintainable and extensible WITHOUT following these principles, so following them is no guarantee of success, just as not following them is no guarantee of failure. Rather than being "solid" these principles are vague, wishy-washy, airy-fairy, and have very little substance at all. They are open to interpretation and therefore mis-interpretation and over-interpretation. Any competent programmer will be able to see through the smoke with very little effort.
My own development infrastructure is based on the 3 Tier Architecture, which means that it contains a Presentation layer, a Business layer and a Data Access layer.
It also contains an implementation of the Model-View-Controller (MVC) design pattern, which means that it contains Models, Views and Controllers.
These two patterns, and the way they overlap, are shown in Figure 1:
Figure 1 - MVC plus 3 Tier Architecture
Please note that the 3 Tier Architecture and Model-View-Controller (MVC) design pattern are not the same thing.
The results of my approach clearly show the following:
Yet in spite of all this my critics (of which there are many) still insist that my implementation is wrong simply because it breaks the rules (or their interpretation of their chosen rules).
My primary criticism of each of the SOLID principles is that, like the whole idea of Object Oriented Programming, it is possible for individuals to create their own interpretation, then try to enforce that as the "only" interpretation worth having. There are too many people out there who are fond of saying "Don't do it like that, do it like this!" where "this" is always different to what the previous person said. Below I will examine each of the principles in turn and show you what I mean.
My secondary criticism of these principles is that they do not come with a "solid" reason for using them. If they are supposed to be a solution to a problem then I want to see the following:
If the only bad thing that happens if I choose to ignore any of these principles is that I offend someone's delicate sensibilities (aah, diddums!), then I'm afraid that the principle does not have enough substance for me to bother with it, in which case I can consign it to the dustbin and not waste any of my valuable time on it.
It is also worth noting here that some of the problem/solution combinations I have come across on the interweb thingy are restricted to a particular language or a particular group of languages. PHP is a dynamically-typed scripting language, and therefore does not have the problems encountered in a strongly-typed or compiled language. PHP may also achieve certain things in a different manner from other languages, therefore something which may be a problem in one of those other languages simply doesn't exist as a problem in PHP.
There is an old axiom in the engineering world which states "If it ain't broke then don't fix it". If I have code that works why should I change it so that it satisfies your idea of how it should be written? Refactoring code unnecessarily is often a good way to introduce new bugs into the system.
A similar saying is "If I don't have your problem then I don't need your solution". Too many so-called "experts" see an idea or design pattern that has benefits in a limited set of circumstances, so they instantly come up with a blanket rule that it should be implemented in all circumstances without further thought. If you are not prepared to think about what you are doing, and why, then how can you be sure that you are not introducing a problem instead of a solution?
Another old saying is "prevention is better than cure". Sometimes a proposed solution does nothing more than mask the symptoms of the problem instead of actually curing it. For example, if your software structure is different from your database structure then the popular solution is to implement an Object Relational Mapper to deal with the differences. My solution would be totally different - eliminate the problem by not having incompatible structures in the first place!
The Single Responsibility Principle (SRP), also known as Separation of Concerns (SoC), states that an object should have only a single responsibility, and that responsibility should be entirely encapsulated by the class. All its services should be narrowly aligned with that responsibility. But what is this thing called "responsibility" or "concern"? How do you know when a class has too many and should be split? When you start the splitting process, how do you know when to stop? In his article Test Induced Design Damage? Robert C. Martin (Uncle Bob) provides this description:
How do you separate concerns? You separate behaviors that change at different times for different reasons. Things that change together you keep together. Things that change apart you keep apart.
GUIs change at a very different rate, and for very different reasons, than business rules. Database schemas change for very different reasons, and at very different rates than business rules. Keeping these concerns (GUI, business rules, database) separate is good design.
If you take a look at Figure 1 you will see that the GUI is handled in the Presentation layer, business rules are handled in the Business layer, and database access is handled in the Data Access layer.
In a later article called The Single Responsibility Principle Uncle Bob also wrote the following:
This is the reason we do not put SQL in JSPs. This is the reason we do not generate HTML in the modules that compute results. This is the reason that business rules should not know the database schema. This is the reason we separate concerns.
My framework does not access the database from the Presentation layer, nor does it generate any HTML from the Business layer, therefore I am following Uncle Bob's description.
He also says the following:
Another wording for the Single Responsibility Principle is: Gather together the things that change for the same reasons. Separate those things that change for different reasons.
If you think about this you'll realize that this is just another way to define cohesion and coupling. We want to increase the cohesion between things that change for the same reasons, and we want to decrease the coupling between those things that change for different reasons.
That is why if I want to switch the DBMS from MySQL to something else like PostgreSQL, Oracle or SQL Server I need only change the Data Access layer. If I want to change the output from HTML to something else like PDF, CSV or XML I need only change the Presentation layer. Each component in the Business layer handles the data for a single database table, so this only changes if the table's structure, data validation rules or business rules change. The Business layer is not affected by a change in the DBMS engine, nor a change in the way its data is presented to the user.
So if my implementation follows the descriptions provided by Uncle Bob, who are you to tell me that I am wrong?
If you build a user transaction where all the code is contained within a single script you have an example of the architecture shown in Figure 2:
Figure 2 - The 1-Tier Architecture
This means that you cannot make a change in one of those areas without having to change the whole component. Only by splitting that code into separate components, where each component is responsible for only one of those areas, will you have a modular system with reusable and interchangeable modules. This satisfies Robert C. Martin's description as it allows you to make a change in any one of those layers without affecting any of the others. You can take this separation a step further by splitting the Presentation layer into separate components for the View and Controller, as shown in Figure 1.
Another problem I have encountered quite often in other people's designs is deciding on the size and scope of objects in the Business layer (or Model in the MVC design pattern). As I design and build nothing but database applications my natural inclination is to create a separate class for each database table, but I have been told on more than one occasion that this is not good OO. This means that the "proper" approach, according to those people who consider themselves to be experts in such matters, is to create compound objects which deal with multiple database tables. The structure shown in Figure 3 identifies a single Sales Order object which has data spread across several tables in the database:
Figure 3 - Single object dealing with multiple tables
Far too many OO programmers seem to think that the concept of an "order" requires a single class which encompasses all of the data even when that data is split across several database tables. What they totally fail to take into consideration is that it will be necessary, within the application, to write to or read from tables individually rather than collectively. Each database table should really be considered as a separate entity in its own right by virtue of the fact that it has its own properties, its own methods and its own validation rules. The compound class will therefore require separate methods for each table within the collection, and these method names must contain the name of the table on which they operate and the operation which is to be performed. This in turn means that there must be special controllers which reference these unique method names, which in turn means that the controller(s) are tightly coupled to this single compound class. As tight coupling is supposed to be a bad thing, how can this structure be justified?
On the other hand if you go too far you end up with ravioli code, a mass of tiny classes which end up being less readable, less usable, less efficient, less testable and less maintainable. This is like having an ant colony with a huge number of workers where each worker does something different. When you look at this mass of ants, how do you decide who does what? Where do you look to find the source of a bug, or where to make a change? A prime example of this is a certain open source email library which uses 100 classes, as shown in Figure 4:
Figure 4 - Too much separation
This appalling design is made possible by constructing some classes which contain a single method, and having some methods which contain a single line of code. But who in their right minds would create 100 classes just to send an email? WTF!!!
This particular problem is discussed further by Brandon Savage in his article Avoiding Object Oriented Overkill.
The confusion over the idea that "responsibility" should be treated as "reason for change" is discussed in I don't love the single responsibility principle.
Uncle Bob also wrote an article called One Thing: Extract till you Drop in which he advocated that you should extract all the different actions out of a function or method until it is physically impossible to extract anything else. In theory the end result could be a large number of methods each containing a single line of code. While this may sound good as an academic exercise, is it worthwhile in the real world? Some of his commenters raised the following objections:
A function by definition, returns 1 result from 1 input. If there's no reuse, there is no "should". Decomposition is for reuse, not just to decompose. Depending on the language/compiler there may be additional decision weights.
What I see from the example is you've gone and polluted your namespace with increasingly complex, longer, more obscure, function name mangling which could have been achieved (quick and readable) with whitespace and comments.
When we write code, we group together elements that are strongly associated (the cohesion/coupling principle). Functions & methods are a way of grouping such elements and providing an image (method name). Saying that a function should do one thing is the same as identifying it as a chunk. However, using the indirection of a function has time and capacity penalties for our limited cognitive process. We can use statement structure and comments to identify chunks and provide images rather than revert to formal functions (and the associated indirection cost). I would envisage that each person's view of the most appropriate level of extraction is different and probably primarily dependent upon programming expertise.
Some people try to justify this excessive proliferation of classes by inventing a totally artificial rule which says "No method should have more than X lines, and no class should have more than Y methods". What these dunderheads fail to realise is that such a rule completely violates the principle of encapsulation, which states that:
ALL the properties and ALL the methods for an object should be assembled into a SINGLE class. This means that splitting off an object's properties into separate classes, or an object's methods into separate classes, is a clear violation of this fundamental principle.
In my own development framework, the basic structure of which is shown in Figure 1, when I create the Model components I go as far as creating a separate class for each database table. Anything less would be not enough, and anything more would be too much.
In my own development infrastructure, which is shown in Figure 1, each component has a separate and distinct responsibility:
Note that only the Model classes in the Business layer are application-aware. All the Views, Controllers and DAOs are application-agnostic and can work with any database table. This architecture meets the criteria of "reason for change" because of the following:
Each component has a single responsibility which can be readily identified, either as a data entity or a function which can be performed on that data, so when someone tells me that I have not achieved the "correct" separation of responsibilities please excuse me when I say that they are talking bullshit out of the wrong end of their alimentary canals.
The Open/Closed Principle (OCP) states that "software entities should be open for extension, but closed for modification". This is actually confusing as there are two different descriptions - Meyer's Open/Closed Principle and the Polymorphic Open/Closed Principle. The idea is that once completed, the implementation of a class can only be modified to correct errors; new or changed features will require that a different class be created. That class could reuse coding from the original class through inheritance.
While this may sound a "good" thing in principle, in practice it can quickly become a real PITA. My main application has over 200 database tables with a different class for each table. These classes are actually subclasses which are derived from a single abstract table class, and this abstract class is quite large as it contains all the standard code to deal with any operation that can be performed on any database table. The subclasses merely contain the specifics for an individual database table. Over the years I have had to modify the abstract table class by changing the code within existing methods or adding new methods. If I followed this principle to the letter I would leave the original abstract class untouched and extend it into another abstract class, but then I would have to go through all my subclass definitions and extend them from the new abstract class so that they had access to the latest changes. I would then end up with a collection of abstract classes which had different behaviours, and each subclass would behave differently depending on which superclass it extended.
It may take time and skill to modify my single abstract table class without causing a problem in any of my 200 subclasses, but it is, in my humble opinion, far better than having to manage a collection of different abstract table classes.
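The arrangement described above, one shared abstract class with thin per-table subclasses, can be sketched roughly as follows. This is a minimal illustration, not the framework's actual code: the names AbstractTable, Customer, Product, $fieldSpec and validate() are all hypothetical.

```php
<?php
// Hypothetical names throughout; a sketch, not the author's framework code.
abstract class AbstractTable
{
    protected string $tableName = '';
    /** @var array<string, array<string, bool>> column specifications */
    protected array $fieldSpec = [];

    // Shared validation logic lives once, in the superclass.
    public function validate(array $row): array
    {
        $errors = [];
        foreach ($this->fieldSpec as $field => $spec) {
            $missing = !isset($row[$field]) || $row[$field] === '';
            if (!empty($spec['required']) && $missing) {
                $errors[$field] = "$field is required";
            }
        }
        return $errors;
    }

    public function getTableName(): string
    {
        return $this->tableName;
    }
}

// Each subclass supplies only the specifics of one table.
final class Customer extends AbstractTable
{
    protected string $tableName = 'customer';
    protected array $fieldSpec = [
        'customer_id' => ['required' => true],
        'name'        => ['required' => true],
        'email'       => [],
    ];
}

final class Product extends AbstractTable
{
    protected string $tableName = 'product';
    protected array $fieldSpec = [
        'product_id' => ['required' => true],
        'price'      => ['required' => true],
    ];
}

$errors = (new Customer())->validate(['customer_id' => 7, 'name' => '']);
```

In a layout like this, a fix made inside validate() reaches every subclass at once, which is the trade-off the preceding paragraphs weigh against extending the abstract class into a new one.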
According to Craig Larman in his article Protected Variation: The Importance of Being Closed (PDF) the OCP is essentially equivalent to the Protected Variation (PV) pattern: "Identify points of predicted variation and create a stable interface around them". OCP and PV are two expressions of the same principle - protection against change to the existing code and design at variation and evolution points - with minor differences in emphasis. This makes much more sense to me - identify some processing which may change over time and put that processing behind a stable interface, so that when the processing does actually change all you have to do is change the implementation which exists behind the interface and not all those places which call it. This is exactly what I have done with all my SQL generation as I have a separate class for each of the mysql_*, mysqli_*, PostgreSQL, Oracle and SQL Server extensions. I can change one line in my config file which identifies which SQL class to load at runtime, and I can switch between one DBMS and another without having to change a line of code in any of my model classes.
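As a rough illustration of a stable interface around a point of variation, the sketch below puts DBMS-specific SQL generation behind one interface and picks the implementation from a configuration value. The interface SqlDialect, its two implementations, the $config array and both functions are hypothetical names invented for this example; they are not taken from the framework described here.

```php
<?php
// Hypothetical interface and class names, for illustration only.
interface SqlDialect
{
    // Build a SELECT that returns at most $limit rows.
    public function selectLimited(string $table, int $limit): string;
}

final class MysqlDialect implements SqlDialect
{
    public function selectLimited(string $table, int $limit): string
    {
        return "SELECT * FROM `$table` LIMIT $limit";
    }
}

final class SqlServerDialect implements SqlDialect
{
    public function selectLimited(string $table, int $limit): string
    {
        return "SELECT TOP $limit * FROM [$table]";
    }
}

// Stand-in for the one line in a config file that names the class to load.
$config = ['dialect' => 'MysqlDialect'];

function loadDialect(array $config): SqlDialect
{
    $class = $config['dialect'];
    return new $class();
}

// Model-level code depends only on the stable interface.
function firstTenCustomers(SqlDialect $dialect): string
{
    return $dialect->selectLimited('customer', 10);
}

$sql = firstTenCustomers(loadDialect($config));
```

Changing the 'dialect' value to 'SqlServerDialect' alters the generated SQL without touching firstTenCustomers(), which is the kind of protection the PV pattern is describing.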
In the same article he also makes this interesting observation:
We can prioritize our goals and strategies as follows:
- We wish to save time and money, reduce the introduction of new defects, and reduce the pain and suffering inflicted on overworked developers.
- To achieve this, we design to minimize the impact of change.
- To minimize change impact, we design with the goal of low coupling.
- To design for low coupling, we design for PVs.
Low coupling and PV are just one set of mechanisms to achieve the goals of saving time, money, and so forth. Sometimes, the cost of speculative future proofing to achieve these goals outweighs the cost incurred by a simple, highly coupled "brittle" design that is reworked as necessary in response to true change pressures. That is, the cost of engineering protection at evolution points can be higher than reworking a simple design.
If the need for flexibility and PV is immediately applicable, then applying PV is justified. However, if you're using PV for speculative future proofing or reuse, then deciding which strategy to use is not as clear-cut. Novice developers tend toward brittle designs, and intermediates tend toward overly fancy and flexible generalized ones (in ways that never get used). Experts choose with insight - perhaps choosing a simple and brittle design whose cost of change is balanced against its likelihood. The journey is analogous to the well-known stanza from the Diamond Sutra:
Before practicing Zen, mountains were mountains and rivers were rivers.
While practicing Zen, mountains are no longer mountains and rivers are no longer rivers.
After realization, mountains are mountains and rivers are rivers again.
Even if I do create a core class and modify it directly instead of creating a new subclass, what problem does it cause (apart from offending someone's sensibilities)? If the effort of following this principle has enormous costs but little (or no) pay back, then would it really be worth it?
Like everything else associated with OO, this principle uses definitions which are extremely vague and open to enormous amounts of misinterpretation, as discussed in Say "No" to the Open/Closed pattern.
The Liskov Substitution Principle (LSP) states that "objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program". In geek-speak this is expressed as: "If S is a subtype of T, then objects of type T in a program may be replaced with objects of type S without altering any of the desirable properties of that program". This is supposed to prevent the problem where a subtype may introduce methods that are not present in the supertype, and the introduction of these methods may allow state changes in the subtype that are not permissible in the supertype.
This may be difficult to comprehend unless you have an example which violates this rule, and such an example can be found in the Circle-ellipse problem. This can be summarised as follows:
One possible solution would be for the subclass to implement the inapplicable method, but to either return a result of FALSE or to throw an exception. Even though this would circumvent the problem in practice, it would still technically be a violation of LSP, so you will always find someone somewhere who will argue against such a practical and pragmatic approach and insist that the software be rewritten so that it conforms to the principle in the "proper" manner.
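The pragmatic workaround just described can be sketched using the Circle-ellipse problem itself. The class and method names below (Ellipse, Circle, stretchX, widen) are hypothetical, and the choice of LogicException is only one reasonable option.

```php
<?php
// Hypothetical classes illustrating the circle-ellipse problem.
class Ellipse
{
    protected float $width;
    protected float $height;

    public function __construct(float $width, float $height)
    {
        $this->width = $width;
        $this->height = $height;
    }

    // Stretch along one axis only; valid for a general ellipse.
    public function stretchX(float $factor): void
    {
        $this->width *= $factor;
    }

    public function area(): float
    {
        return M_PI * ($this->width / 2) * ($this->height / 2);
    }
}

class Circle extends Ellipse
{
    public function __construct(float $diameter)
    {
        parent::__construct($diameter, $diameter);
    }

    // The inherited method is inapplicable to a circle, so it is refused.
    public function stretchX(float $factor): void
    {
        throw new LogicException('A circle cannot be stretched on one axis');
    }
}

// Client code written against the supertype.
function widen(Ellipse $shape): string
{
    try {
        $shape->stretchX(2.0);
        return 'stretched';
    } catch (LogicException $e) {
        return 'refused';
    }
}

$results = [widen(new Ellipse(4.0, 2.0)), widen(new Circle(3.0))];
```

The caller has to know that a subtype may refuse the call, which is exactly why purists still count this as a technical violation of LSP even though the program behaves sensibly.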
Even if your code actually violates this principle, what would be the effect in the real world? If you try to invoke a non-existent method on an object then the program would abort, but as this error would make itself known in the system/QA testing it would be fixed before being released to the outside world. So this type of error would be easily detected and fixed, thus making it a non-problem.
But how does this rule fit in with my application where I have 200 table classes which all inherit from the same abstract class? Should I be expected to substitute the Product subclass with my Customer subclass and still have the application perform as expected?
The Liskov Substitution Principle is also closely related to the concept of Design by Contract (DbC) as it shares the following rules:
Unfortunately not all programming languages have the ability to support DbC, and PHP is not one of them due to the fact it is not statically typed and not compiled. This is discussed in Programming by contracts in PHP. Certain languages, like Eiffel, have direct support for preconditions and postconditions. You can actually declare them, and have the runtime system verify them for you.
The Interface Segregation Principle (ISP) states that "many client specific interfaces are better than one general purpose interface". Once an interface has become too 'fat' it needs to be split into smaller and more specific interfaces so that any clients of the interface will only know about the methods that pertain to them. In a nutshell, no client should be forced to depend on methods it does not use.
What exactly does 'fat' mean in this context? From the simple description above I would define a general-purpose interface as one which could perform several operations, such as create, read, update or delete, where one of the arguments defined the operation to be performed. The method would then invoke a different piece of code depending on what operation name was supplied at runtime. I totally agree that such a method, which is capable of performing several separate and distinct operations, should be split into a separate method for each operation. That is why my abstract table class does not have a single doSomething($operation, $data) method, but instead has a separate method for each separate operation:
You may think that this is OK, but some bright spark will jump up and say that my getData method is too generic as it can return any number of records using any combination of selection criteria. "You should do what I do" he says, "and have separate and distinct methods for getById, getByName, getByX, getByY, getMultiple, et cetera". Now why would I want to follow his example and create more work without any tangible benefit?
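To make the comparison concrete, here is a minimal sketch of a class with one method per operation and a single generic read method which takes its selection criteria as an argument. Only the name getData comes from the text above; InMemoryTable, insertRecord, deleteRecord and the array-based storage are assumptions made purely for illustration.

```php
<?php
// Hypothetical in-memory stand-in for a table class; only getData is named in the text.
final class InMemoryTable
{
    /** @var array<int, array<string, mixed>> */
    private array $rows = [];

    // True when every column => value pair in $where matches the row.
    private function matches(array $row, array $where): bool
    {
        foreach ($where as $column => $value) {
            if (!array_key_exists($column, $row) || $row[$column] !== $value) {
                return false;
            }
        }
        return true;
    }

    public function insertRecord(array $row): void
    {
        $this->rows[] = $row;
    }

    // One generic read: any combination of criteria, any number of results.
    public function getData(array $where = []): array
    {
        return array_values(array_filter(
            $this->rows,
            fn (array $row): bool => $this->matches($row, $where)
        ));
    }

    // Returns the number of rows removed.
    public function deleteRecord(array $where): int
    {
        $before = count($this->rows);
        $this->rows = array_values(array_filter(
            $this->rows,
            fn (array $row): bool => !$this->matches($row, $where)
        ));
        return $before - count($this->rows);
    }
}

$table = new InMemoryTable();
$table->insertRecord(['id' => 1, 'name' => 'Ann', 'city' => 'Leeds']);
$table->insertRecord(['id' => 2, 'name' => 'Bob', 'city' => 'York']);
$table->insertRecord(['id' => 3, 'name' => 'Cy', 'city' => 'Leeds']);

$byId   = $table->getData(['id' => 2]);          // what getById would return
$byCity = $table->getData(['city' => 'Leeds']);  // what getByCity would return
$all    = $table->getData();                     // what getMultiple would return
```

The three calls at the end cover what separate getById, getByCity and getMultiple methods would do, with the criteria carried in the argument instead of in the method name.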
Yet there is another definition of what "fat" means. In his paper entitled The Interface Segregation Principle (PDF) Robert C. Martin defines "fat interfaces" as those which are non-cohesive, or not specific to a single client. In other words, the interfaces of the class can be broken up into groups of member functions. Each group serves a different set of clients. Thus some clients use one group of member functions, and other clients use the other groups. In his example of the ATM User Interface he defines a "client" as a transaction (with its own UI) which performs a distinct operation such as deposit, withdraw and transfer. Each of these transactions uses some of the methods which are available in the ATM object, but not all of them, and it is these "unused" methods which cause a (theoretical) problem. But what is the problem, exactly? According to Robert "This creates the possibility that changes to one of the derivatives of Transaction will force corresponding change to the UI, thereby affecting all the other derivatives of Transaction, and every other class that depends upon the UI interface." I'm sorry but "creates the possibility" is simply not good enough. If I know that this possibility does not exist in my application, especially as I am programming in a dynamic language which has no compilation phase and no DLL files, then why should I expend effort to deal with a situation that simply will not happen?
If I have a class that contains 10 methods, and this class has 10 clients which access only 1 method each and ignore the other 9, this means that I do not have separate interfaces which are specific to a single client but rather a single set of interfaces which can be used, or not used, by any client. This would appear to be a violation of this principle, but so what? What exactly is the problem caused by a client being exposed to interfaces that it does not actually use? If a client does not reference a particular method then is it actually "exposed" to that interface? How does the existence of this non-referenced method affect this particular client? If it does not cause a problem, then why should I implement the solution of creating a separate subclass with a separate interface which is unique to each client? This would not work anyway as it is simply not possible, using inheritance, to create a subclass which has fewer methods than its superclass. I could do this by creating separate classes instead of subclasses (which would break encapsulation by the way), but the effort would be considerable and the pay back absolutely nil, so why should I bother?
My interpretation of this principle is that although each class provides separate methods for the Create, Read, Update and Delete (CRUD) operations, I have a collection of controllers which only refer to the operations that they actually use. Thus the Create controller does not refer to the Delete operation just as the Delete controller does not refer to the Create operation. This means that I can change any interface without having to change any controller which does not refer to that interface. In a compiled language "change" would also mean "recompile", but that does not arise here as PHP is not a compiled language.
The Dependency Inversion Principle (DIP) states that "the programmer should depend upon abstractions and not depend upon concretions". This is pure gobbledygook to me! Another way of saying this is:
This to me is confusing as it uses the terms "abstract" and "concrete" in ways which contradict how OO languages have been implemented. My understanding is as follows:
This principle is actually concerned with a specific form of de-coupling of software modules. According to the definition of coupling the aim is to reduce coupling from high or tight to low or loose and not to eliminate it altogether as an application whose modules are completely de-coupled simply will not work. Coupling is also associated with Dependency. The modules in my application are as loosely coupled as it is possible to be, and any further reduction would make the code more complex, less readable and therefore less maintainable.
The only workable example of this principle which makes any sense to me is the "Copy" program which can be found in Robert C. Martin's The Dependency Inversion Principle (PDF). In this the Copy module is hard-coded to call the readKeyboard module and the writePrinter module. While the readKeyboard and writePrinter modules are both reusable in that they can be used by other modules which need to gain access to the keyboard and the printer, the Copy module is not reusable as it is tied to, or dependent upon, those other two modules. It would not be easy to change the Copy module to read from a different input device or write to a different output device. One method would be to code in a dependency to each new device as it became available, and have some sort of runtime switch which told the Copy module which devices to use, but this would eventually make it bloated and fragile.
The proposed solution is to make the high level Copy module independent of its two low level reader and writer modules. This is done using dependency injection where the reader and writer modules are instantiated outside the Copy module and then injected into it just before it is told to perform the copy operation. In this way the Copy module does not know which devices it is dealing with, nor does it care. Provided that they each have the relevant read and write methods then it will work. This means that new devices can be created and used with the Copy module without requiring any code changes or even any recompilations of that module.
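A minimal PHP sketch of the injected version of the "Copy" program might look like the following. The Reader and Writer interfaces, the StringReader and BufferWriter stand-ins for real devices, and the Copier class are hypothetical names for this example; Martin's paper is not written in PHP.

```php
<?php
// Hypothetical interfaces and classes; the source names only Copy, readKeyboard and writePrinter.
interface Reader
{
    // Return the next character, or null when the input is exhausted.
    public function read(): ?string;
}

interface Writer
{
    public function write(string $char): void;
}

// Stand-in for a low-level input device.
final class StringReader implements Reader
{
    private string $text;
    private int $pos = 0;

    public function __construct(string $text)
    {
        $this->text = $text;
    }

    public function read(): ?string
    {
        return $this->pos < strlen($this->text) ? $this->text[$this->pos++] : null;
    }
}

// Stand-in for a low-level output device.
final class BufferWriter implements Writer
{
    public string $buffer = '';

    public function write(string $char): void
    {
        $this->buffer .= $char;
    }
}

// The high-level module: it knows only the two abstractions.
final class Copier
{
    private Reader $reader;
    private Writer $writer;

    public function __construct(Reader $reader, Writer $writer)
    {
        $this->reader = $reader;
        $this->writer = $writer;
    }

    public function copy(): void
    {
        while (($char = $this->reader->read()) !== null) {
            $this->writer->write($char);
        }
    }
}

// The devices are created outside the Copier and injected into it.
$out = new BufferWriter();
(new Copier(new StringReader('hello'), $out))->copy();
```

A new device only needs to implement read() or write() to be usable; the Copier class itself is never edited.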
While this contrived example shows obvious benefits from using the Dependency Inversion Principle, it would be foolish to assume that similar benefits can be obtained from implementing this principle in every situation where there is a dependency. What if you know that your dependencies will never change, or will never need a different configuration? Why go through the effort of implementing the ability to make changes when those changes will never happen? This particular topic is described more fully in Dependency Injection is Evil.
Like any other design pattern each of these principles has been formulated to offer a solution to a specific problem, and this leads me to make the following observations:
Sometimes, the cost of speculative future proofing outweighs the cost incurred by a simple, highly coupled "brittle" design that is reworked as necessary in response to true change pressures. Is it really worthwhile to spend a pound to save a penny?
Following these principles with blind obedience and implementing them without question is no guarantee that your software will be perfect. As I said in the introduction different OO "experts" have different opinions as to what is the "right" way and what is the "wrong" way. It is simply not possible to follow one person's opinion without offending someone else. If you don't follow the SOLID principles someone will be offended. Even if you do attempt to follow them someone else will jump up and say "Your implementation is wrong!", or "Your implementation goes too far!" or "Your implementation does not go far enough!" or "Don't do it like that, do it like this!" It is simply not possible to find a solution which satisfies everyone and offends no one, and if you attempt to do so you may end up in the situation described in The Man, The Boy, and The Donkey where the punch line is "if you try to please everyone you may as well kiss your ass goodbye!"
If it is not possible to please everyone then what can you do? The simple answer is to please yourself - ignore everyone else and do what you think is best for your particular circumstances. After all, it is you building the software, not them. You are the one who is going to deploy and maintain it, not them. It is your ass on the line, not theirs.
When I was a junior programmer I had to follow the lead set by my so-called "superiors", but I kept hitting obstacles that their methodologies created. When I proposed solutions to these obstacles I was constantly put down with "You can't do that as it is against the rules!" or sometimes "How dare you have the audacity to question the rules! Don't think about them, just do as you're told!" When I became senior enough to create my own methodology I concentrated on what I needed to do to get the job done with as few of these known obstacles as possible, and in the process I found myself throwing more and more of these silly rules into the waste bin. When others see that I am not following "their" set of rules they instantly accuse me of being "wrong", "impure" and a heretic, but why should I care? I am results-oriented, not rules-oriented; I am a pragmatist, not a dogmatist, so the fact that I have created software which is powerful, flexible, extensible and maintainable is all the justification that I need. If it works, and if the effort required to keep it working and update it is minimal, then how can it possibly be "wrong"? I have seen projects fail because too much attention was focussed on the rules instead of the results, so if something fails how can it possibly be "right"?
© Tony Marston