Tony Marston's Blog About software development, PHP and OOP

What is Object Oriented Programming (OOP)?

Posted on 3rd December 2006 by Tony Marston

Amended on 4th February 2023

Introduction
What OOP is NOT
What is an Object Oriented language?
Basic Terminology
Other Terminology
What OOP is
Optional Extras
The difference between OOP and non-OOP
Practical Examples
Encapsulation
Inheritance
Polymorphism
Popular misconceptions
What Abstraction is not
What Encapsulation is not
What Polymorphism is not
What Inheritance is not
OOP requires a totally different thought process
What types of object should I create?
How many objects should I create?
What structure should I use?
How much reusability should I have?
Conclusion
References
Amendment History
Comments

Introduction

Quite often I see a question in a newsgroup or forum along the lines of: What is this thing called 'OOP'? What is so special about it? Why should I use it? How do I use it? The person asking this type of question usually has experience of non-OO programming and wants to know the benefits of making the switch. Unfortunately most of the replies I have seen have been long on words but short on substance, full of airy-fairy, wishy-washy, meaningless phrases which are absolutely no use at all to man or beast.

Having created thousands of programs using non-OO languages, and another 500+ using the OO features of PHP, I feel more than qualified to add my own contribution to the melting pot. According to some OO 'purists' I am not qualified at all as I was not taught to do things 'their' way and I refuse to follow 'their' methods. My response to that accusation is that there is no such thing as 'only one true way' with OOP just as there is no such thing as 'only one true way' with religion. People tell me that my methods are wrong, but they are making a classic mistake. My methods cannot be wrong for the simple reason that they work, and anybody with more than two brain cells to rub together will tell you that something that works cannot be wrong just as something that does not work cannot be right. My methods are not wrong, they are simply different, and sometimes it is a willingness to adopt a different approach that separates the code monkeys from the engineers.

One reason why some people give totally useless answers is that it was what they were taught, and they do not have the intelligence to look beyond what they were taught. Another reason is that some of the explanations about OO are rather vague and can be interpreted in several ways, and if something is open to interpretation it is also open to a great deal of mis-interpretation. If you do not believe that there is widespread confusion as to what OO is and is not then take a look at Nobody Agrees On What OO Is. Even some of the basic terminology can mean different things to different people, as explained in Abstraction, Encapsulation, and Information Hiding. If these people cannot agree on the basic concepts of OOP, then how can they possibly agree on how those concepts may be implemented?


What OOP is NOT

As a first step I shall debunk some of the answers that I have seen. In compiling the following list I picked out those descriptions which are not actually unique to OOP, as features which already exist in non-OO languages cannot be used to differentiate between the two. A distinguishing feature has to be unique to OOP and not shared with non-OO languages.

OOP is about modeling the 'real world'

OOP is a programming paradigm that uses abstraction to create models based on the real world. It provides for better modeling of the real world by providing a much needed improvement in domain analysis and then integration with system design.

Rubbish. OOP is no better at modeling the real world than any other method. Every computer program which seeks to replace a manual process is based on a conceptual software model of that process, and if the model is wrong then the software will also be wrong. The conceptual model is created as an analyst's view of the real world, and the computer software is based solely on this conceptual model. OOP does not provide the ability to model objects which could not be modelled in previous paradigms, it simply provides the ability to produce different types of models where both the data and the operations which act upon that data can be defined (encapsulated) in the same unit (class). OOP does not guarantee that the model will be better, just that the implementation of that model will be different. You should also consider the fact that it would be totally impractical to model the whole of the real world as it is simply too vast and too complicated. It is only ever necessary to model those parts which are actually relevant to your current application, and it is the exercise of deciding what is and is not relevant which decides if your abstraction is correct.

Bear in mind that unless you are developing software which directly manipulates a real-world object, such as process control, robotics, avionics or missile guidance systems, then some of the properties and methods which apply to that real-world object may be completely irrelevant in your software representation. If, for example, you are developing an enterprise application such as Sales Order Processing which deals with entities such as Products, Customers and Orders, you are only manipulating the information about those entities and not the actual entities themselves. In pre-computer days this information was held on paper documents, but nowadays it is held in a database in the form of tables, columns and relationships. An object in the real world may have many properties and methods, but in the software representation it may only need a small subset. For example, an organisation may sell many different products with each having different properties, but all that the software may require to maintain is an identity, a description and a price. A real person may have operations such as stand, sit, walk, and run, but these operations would never be needed in an enterprise application. Regardless of the operations that can be performed on a real-world object, with a database table the only operations that can be performed are Create, Read, Update and Delete (CRUD). Following the process called data normalisation the information for an entity may need to be split across several tables, each with its own columns, constraints and relationships. Each object in the database is a separate table, so I see no reason why I should not have a separate class in my software to deal with each object in my database. Some people advocate having a group of database tables being handled by a single class, but this is not how databases work. It is not necessary to go through one table to get to another as each table is an independent object with its own properties. 
So, each independent object in the database should have its own independent class in the software.

The article Don't try to model the real world, it doesn't exist puts forward an interesting viewpoint.

The term "abstraction" is also open to interpretation, and therefore mis-interpretation, as discussed in Understand what "abstraction" really means. This is why some people's abstractions look more like the work of Picasso when what is required should look like the work of Michelangelo.

That is why it is possible to create software that does A, B and C but which is useless to the customer as it does not also do X, Y and Z. The real world may contain X, Y and Z but the analyst did not include them in his model either because he did not spot them or because the customer failed to mention them in his Specification Of Requirements (SOR). I know because I have encountered both situations in my long career.

Not everyone agrees that direct real-world mapping is facilitated by OOP, or is even a worthy goal; Bertrand Meyer argues in Object-Oriented Software Construction that a program is not a model of the world but a model of a model of some part of the world; "Reality is a cousin twice removed".

OOP is about code re-use

The power of object-oriented systems lies in their promise of code reuse which will increase productivity, reduce costs and improve software quality.

Rubbish. This implies that code re-use is possible in OOP and not possible in non-OOP. Using OOP does not guarantee that more reusable code will be available as reusability depends on how the code is written, not the language in which it was written. It is possible to produce libraries of reusable modules in any non-OO language (I know, because I was doing just that with COBOL in 1985) just as it is possible to produce volumes of non-reusable code in any OO language.

It does not depend on the capabilities of the language, as it is possible to have the same block of code duplicated in 100 places in any language. It is also possible, in any language, to put that block of code into a reusable module and call that module from those 100 places.
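A minimal sketch of that point in procedural PHP, with no classes involved. The function name and the tax calculation are hypothetical, chosen only to show one block of code being called from more than one place instead of being duplicated.

```php
<?php
// Hypothetical reusable module: defined once, callable from anywhere.
function addSalesTax(float $net, float $ratePercent): float
{
    return round($net * (1 + $ratePercent / 100), 2);
}

// Two of the many possible call sites; neither duplicates the calculation.
$invoiceTotal = addSalesTax(100.00, 20.0);
$quoteTotal   = addSalesTax(59.99, 20.0);
```

If the calculation ever changes, only the body of the function needs to be touched, which is the same benefit that a class method would provide.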

The only big difference between procedural and OO languages is that the latter has encapsulation, inheritance and polymorphism while the former does not. The aim should be to use these features in such a way as to create code which has more reusability. These features can be used as follows:

  1. Encapsulation - identify a business entity, something with attributes and operations with which the software needs to interact, and create a class for that entity where the attributes become properties and the operations become methods.
  2. Inheritance - where several classes share properties and methods you can put these into a sharable superclass (most likely an abstract class) and then share them with multiple concrete subclasses by means of inheritance. Each subclass then contains everything which was defined in the superclass without it having to be redefined in each subclass.
  3. Polymorphism - where different classes share the same method signatures so when that method is called it is possible to swap the class which is used to instantiate that object for a completely different class. The class/object name is not fixed in the calling code, only the method and its arguments, and the class/object name is not specified until runtime. This leads to a type of code reuse called Dependency Injection.

Note that you must apply these concepts in the stated sequence in order to achieve maximum benefits. If you create classes for the wrong entities then you reduce the possibilities for inheritance. If you get your inheritance wrong then you reduce the possibilities for polymorphism. If you have limited polymorphism then you are reducing the places where you can employ Dependency Injection.
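The three steps above can be sketched in PHP as follows. This is an illustration only: the class names, the in-memory array standing in for a database, and the `addRecord()` helper are all assumptions made for the example and are not taken from any framework.

```php
<?php
// 1. Encapsulation: the data and operations for an entity live in one class.
// 2. Inheritance: whatever is common to every entity sits in an abstract superclass.
abstract class Table
{
    protected string $tableName = '';
    protected array $rows = [];            // stands in for the database

    public function getTableName(): string
    {
        return $this->tableName;
    }

    public function insertRecord(array $data): array
    {
        $data = $this->validate($data);    // subclass-specific rules
        $this->rows[] = $data;
        return $data;
    }

    abstract protected function validate(array $data): array;
}

class Product extends Table
{
    protected string $tableName = 'product';

    protected function validate(array $data): array
    {
        $data['price'] = (float) ($data['price'] ?? 0);
        return $data;
    }
}

class Customer extends Table
{
    protected string $tableName = 'customer';

    protected function validate(array $data): array
    {
        $data['name'] = trim($data['name'] ?? '');
        return $data;
    }
}

// 3. Polymorphism: the calling code fixes the method name, not the class name.
function addRecord(string $className, array $data): array
{
    $object = new $className();            // class is not chosen until runtime
    return $object->insertRecord($data);
}

$product  = addRecord('Product', ['description' => 'Widget', 'price' => '9.50']);
$customer = addRecord('Customer', ['name' => '  Alice  ']);
```

Because `addRecord()` never mentions `Product` or `Customer`, the same function can be reused for any future subclass of `Table` without modification.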

One of the early promises of OOP that I heard many years ago was that it would be possible for a software vendor to produce a library of pre-written classes, and for other developers to use these "off the shelf" classes instead of creating their own custom versions and thus "re-inventing the wheel". This dream never materialised, which just goes to prove that OOP promises much but delivers little.

OOP is about modularity

The source code for an object can be written and maintained independently of the source code for other objects. Once created, an object can be easily passed around inside the system.

Rubbish. The concept of modular programming has existed in non-OO languages for many years, so this argument cannot be used to explain why OO is supposed to be better than non-OO. Just as it is possible in any language to hold the source code for an entire application in a single file, it is just as possible, in any language, to break that source code into smaller modules so that the source code for each module can be maintained and compiled independently of all other modules.

Besides, any software which consists of multiple classes is automatically "modular" as each class can be considered to be a self-contained "module". The critical factor is how well each module or class is designed.

OOP is about pluggability

If a particular object turns out to be problematic, you can simply remove it from your application and plug in a different object as its replacement. This is analogous to fixing mechanical problems in the real world. If a bolt breaks, you replace it, not the entire machine.

Rubbish. This is the same as modularity where the source code for any individual module can be modified, recompiled and inserted into the application without having to touch any of the other modules.

OOP is about implementation hiding

By interacting only with an object's methods, the details of its internal implementation remain hidden from the outside world.

In the first place implementation hiding was never one of the aims of OOP, it is merely a by-product of encapsulation. The outside world can see the method names which can be used on an object, but not the code which exists behind those method names.

In the second place implementation hiding is not unique to OOP, nor did it suddenly appear because of OOP. In any language, whether it is object oriented, procedural, functional, or whatever, when you write a function or procedure (and remember that a class method is nothing more than a procedural function within a class) all you are exposing to the outside world, including programmers who write code which calls that function, is the function's signature, as in:

$return = functionName(arg1, arg2, ..., argX);           // procedural
$return = $object->functionName(arg1, arg2, ..., argX);  // object oriented

Here you are identifying three things: the name of the function, the arguments which are passed in, and the value which is returned.

The only thing which is not exposed is the code that is executed when the function is called. You know what the function does but not how it does it. You know what data goes in and what data comes out, but not what code is executed in the middle. In other words how that function is implemented, the actual code which is executed, is hidden from view. The documentation which comes with the function library should describe what the function does so that the programmer can decide if that function is the right one to call, and how to call it, but the actual code behind the function name is still hidden. The documentation may provide a listing of the source code, and the actual source code may be provided for the programmer to view and possibly modify, but as far as the function's signature goes the implementation is effectively hidden. This means that at any time you could install a new version of that function with a modified implementation and, provided that the function's signature did not change, you would not have to change any code which calls that function. This means that the implementation of a function could change at any time but the calling program would not know that it had changed. The implementation is "hidden", so how can the calling program possibly know that it has changed?
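As a small illustration, the sketch below exposes the same signature once as a procedural function and once as a class method, each with a different body. The names `sumTo` and `Calculator` are hypothetical. In both cases the caller sees only the name, the argument and the return value.

```php
<?php
// Procedural: the body happens to use a loop, but the caller cannot see that.
function sumTo(int $n): int
{
    $total = 0;
    for ($i = 1; $i <= $n; $i++) {
        $total += $i;
    }
    return $total;
}

// Object oriented: the same signature as a class method. This body uses a
// closed formula instead of a loop, and the caller cannot tell the difference.
class Calculator
{
    public function sumTo(int $n): int
    {
        return intdiv($n * ($n + 1), 2);
    }
}

$procedural     = sumTo(100);
$calculator     = new Calculator();
$objectOriented = $calculator->sumTo(100);
```

Either body could be replaced by the other without any change to the calling code, provided that the signature stays the same.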

OOP is about information hiding

A lot of people take the misunderstanding regarding implementation hiding one step further and say that because the data is part of the implementation then the data must be hidden as well. This is how they justify the use of the visibility options of public, private and protected, which in turn necessitate the use of getters (accessors) and setters (mutators). These people do not realise that there is a fundamental difference between "implementation" and "information": the implementation is the code behind an object's methods, while the information is the data which that code processes.

The act of encapsulation is supposed to put an entity's methods and data inside a capsule, but nowhere does it say that the walls of the capsule should be opaque. Nowhere does it say that the data inside the capsule should be hidden from view. Nowhere does it say that an object's data cannot be accessed directly without the use of a separate API. It is only the code behind the object's API which is hidden from view, not the data contained within the object itself.

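It is possible to access an object's data in two ways: directly, by reading or writing a public property, or indirectly, through getter and setter methods.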
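A minimal sketch of direct and indirect access side by side, assuming a hypothetical `Person` class with one public property and one private property. It shows only the mechanics and does not settle which style should be preferred.

```php
<?php
class Person
{
    public string $name = '';     // can be read and written directly
    private int $age = 0;         // can only be reached through methods

    public function getAge(): int
    {
        return $this->age;
    }

    public function setAge(int $age): void
    {
        $this->age = $age;
    }
}

$person = new Person();

$person->name = 'Fred';           // direct write
$name = $person->name;            // direct read

$person->setAge(42);              // indirect write via a setter
$age = $person->getAge();         // indirect read via a getter
```

In both cases the code inside the methods remains hidden; the only difference is whether the data is reached through an extra layer of method calls.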

While some programmers say that the use of getters and setters to access an object's data should be mandatory, there are others who have a different opinion, as shown in the following:

OOP is about the passing of messages.

Message passing is the process by which an object sends data to another object or asks the other object to invoke a method.

Rubbish. The way that an object's method is invoked in an OO language is identical to the way in which a function or procedure in a non-OO language is invoked. If the language supports both non-OO functions and object methods (as PHP does) the method of invocation is called "calling", not "message passing". In fact in some languages it is necessary to specify the word "call" when invoking a subroutine.

non-OO: $result = functionName(arg1, arg2, ...);
OO:     $result = $object->functionName(arg1, arg2, ...);

The result of each invocation is exactly the same - the caller is suspended while control is passed to the callee, and control is not returned to the caller until the callee has finished.

I have worked with messaging software in the past and I can tell you quite categorically that they are completely different. A true messaging system has the following characteristics:

A common example of an asynchronous message system is email. You send an email, and it goes into the recipient's queue. While you are waiting for a reply you can do other things, but every now and then you check your inbox for a reply.

Activating a method on an object is exactly the same as calling a function, and works as follows:

As you can see, the mechanics of activating a method in an object are exactly the same as calling a non-OO function and nothing like sending a message in a messaging system.
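The contrast can be sketched in a single PHP script. This is a simulation under stated assumptions: the queue is PHP's built-in `SplQueue`, the sender and receiver run in the same process purely for illustration, and the message layout is invented. A real messaging system would normally place the receiver in a separate process.

```php
<?php
$log = [];

function greet(string $name, array &$log): string
{
    $log[] = 'callee runs';
    return "Hello $name";
}

// A call: the caller is suspended until the callee has finished.
$log[] = 'before call';
$reply = greet('world', $log);
$log[] = 'after call';

// A message: the sender puts it on a queue and carries on immediately.
$queue = new SplQueue();
$log[] = 'before send';
$queue->enqueue(['to' => 'greeter', 'body' => 'world']);
$log[] = 'after send';

// Some time later a receiver takes each message off the queue and handles it.
while (!$queue->isEmpty()) {
    $message = $queue->dequeue();
    $log[] = 'receiver handles ' . $message['body'];
}
```

In the call the callee's work appears between "before call" and "after call", whereas with the queue nothing is handled until after the sender has moved on.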

OOP is about separation of responsibilities.

Each object can be viewed as an independent little machine with a distinct role or responsibility.

Rubbish. It depends entirely on how the module was written, and not the language in which it was written. It is possible to write independent modules in a procedural language such as COBOL, just as it is possible to write non-independent modules in an OO language.

The problem with "separation of responsibilities" is that different people have a different interpretation as to what it actually means. To some people the database operations such as SELECT, INSERT, UPDATE and DELETE require their own objects whereas others (like myself) put them all together in a single data access object (DAO). Some programmers may have a separate DAO for each table in the database while others (like myself) may have a single DAO which can deal with any and all database tables for a specific DBMS. Before you can separate any responsibilities you must first identify what those responsibilities are, and this is a design decision which is totally separate from the language in which the design is ultimately implemented.

The Single Responsibility Principle (SRP) was first defined by Robert C. Martin who said the following:

How do you separate concerns? You separate behaviors that change at different times for different reasons. Things that change together you keep together. Things that change apart you keep apart.

GUIs change at a very different rate, and for very different reasons, than business rules. Database schemas change for very different reasons, and at very different rates than business rules. Keeping these concerns (GUI, business rules, database) separate is good design.

Creating separate components for the GUI, business rules and database access is not restricted to OOP. This is in fact a description of the 3-Tier Architecture which I first encountered in a non-OO language called UNIFACE.

In all my many years of experience the only project that I have ever been involved in which failed to be implemented due to "technical difficulties" was one where the system architects were OO "experts" who knew everything there was to know (or so they thought) about this "separation of responsibilities". They designed a system around design patterns which had a different module for each responsibility, and this resulted in a design with at least ten layers of code between the UI and the database. This made the creation of new components far more complicated and convoluted than it need be, and it made testing and debugging an absolute nightmare. The result was far too expensive for the client, both in time and money, so he pulled the plug on the whole project and cut his losses. A pair of components which took 10 days to build using these "new fangled" OO techniques took me less than an hour to build using my "old fashioned" non-OO methods. So much for the superiority of OO.

Besides, any software which consists of multiple classes/modules automatically has "separation of concerns" as each class/module can be considered to be "concerned with" or "responsible for" a particular entity. The critical factor is how well each class/module deals with the requirements of its entity.

OOP is easier to learn.

OOP is easier to learn for those new to computer programming than previous approaches, and its approach is often simpler to develop and to maintain, lending itself to more direct analysis, coding, and understanding of complex situations and procedures than other programming methods.

Rubbish. This is just marketing hype. Every new language/tool/paradigm is supposed to be better than everything else, but it rarely is. It is not what you use but how you use it that counts, and I have personally witnessed cases where an "old" language, when used by competent programmers, regularly outperformed a "new" language which was advertised as being more productive by several orders of magnitude.

A person's ability to learn something is often limited by the quality of the teachers or teaching materials, and I'm afraid that too much of what is being taught is too complicated, too inefficient, and more likely to lead to project failures than successes. Too often the teachers insist that there is "only one way" to do OOP, and that is where I most strongly disagree. I have successfully migrated to OOP by ignoring all these so-called "experts" and drawing on my years of experience with non-OO languages.

Someone once told me that OOP is not as simple as taking a procedural function and wrapping it in a class. I disagree. It *IS* that simple. The only "trick" is placing related functions in the same class (this is called encapsulation), then adjusting them to deal with the state which can be maintained within an object. The really clever thing that you can do with classes is to extend a parent or abstract class into a number of subclasses through inheritance. You can also have a function/method which is available in objects which are instantiated from different classes, which gives you polymorphism. I have seen too many examples where classes have been created with the wrong mix of functions, either related functions not being in the same class, or classes containing functions which are not actually related. I have seen inheritance overused so much that the resulting class hierarchy is really difficult to maintain and enhance. I have seen programmers use every OO feature or construct which is available in the language for no better reason than to impress other programmers with their ability to write obscure code, the theory being that the more obscure it is the more OO it is. They seem to think that if it is too simple then you are not doing it right. These people have obviously not heard of the KISS principle.
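A minimal sketch of that "trick", assuming a hypothetical shopping basket. The two procedural functions are related, so they are placed in the same class and adjusted to use state held within the object instead of state passed in by the caller.

```php
<?php
// Procedural version: the caller has to carry the state between calls.
function basketAdd(array $basket, string $item, float $price): array
{
    $basket[$item] = $price;
    return $basket;
}

function basketTotal(array $basket): float
{
    return array_sum($basket);
}

// Class version: the same two functions, now sharing state held in the object.
class Basket
{
    private array $items = [];

    public function add(string $item, float $price): void
    {
        $this->items[$item] = $price;
    }

    public function total(): float
    {
        return array_sum($this->items);
    }
}

$basket = basketAdd([], 'pen', 1.5);
$basket = basketAdd($basket, 'ink', 4.0);
$proceduralTotal = basketTotal($basket);

$object = new Basket();
$object->add('pen', 1.5);
$object->add('ink', 4.0);
$objectTotal = $object->total();
```

The bodies of the functions barely change; what changes is where the state lives.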

OOP is about actors and actions.

Object Oriented Programming is a mode of software development that modularizes and decomposes code authorship into the definition of actors and actions.

This is so vague it is meaningless, and therefore of absolutely no use at all.

OOP is all about late binding.

'Late' refers to the fact that the binding decisions (which binary to load, which function to call) are deferred as long as possible, often until just before the function is called, rather than having the binding decisions made at compile time (early).

Rubbish. Whether such binding takes place early or late does not separate OOP from non-OOP. It is possible to have a non-OO language which offers late binding, but that does not magically turn it into OO. Conversely, a language which supports classes, encapsulation, inheritance and polymorphism does not suddenly cease to be OO simply because it only offers early binding.
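Late binding without any classes can be sketched with PHP's variable functions, where the function to be called is held in a string and is not resolved until the moment of the call. The function names below are hypothetical.

```php
<?php
function formatAsUpper(string $text): string
{
    return strtoupper($text);
}

function formatAsLower(string $text): string
{
    return strtolower($text);
}

// The function to call is not known until runtime: $formatter holds its name.
function render(string $formatter, string $text): string
{
    return $formatter($text);     // variable function, bound when this line runs
}

$loud  = render('formatAsUpper', 'Hello');
$quiet = render('formatAsLower', 'Hello');
```

Nothing here is object oriented, yet the decision about which function to execute is deferred until just before the call.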


As you can see, the above descriptions are either too vague or not specific to OOP, so they cannot be used as distinguishing features.


What is an Object Oriented language?

Basic Terminology

A computer language can be said to be Object Oriented if it provides support for the following:

Abstraction - The process of separating the abstract from the concrete, the general from the specific, by examining a group of objects looking for both similarities and differences. The similarities can be shared by all members of that group while the differences are unique to individual members. The result of this process should then be an abstract superclass containing the shared characteristics and a separate concrete subclass to contain the differences for each unique instance.

As explained in What is "abstraction" there are two flavours:

Please also refer to What Abstraction is not.

Abstract Class - A class which cannot be instantiated into an object. It can only be used as a superclass which can be inherited by any number of subclasses.

See also The difference between an interface and an abstract class.

Concrete Class - A class that can be instantiated into an object.
Class - A class is a blueprint, or prototype, that defines the variables (data) and the methods (operations) common to all objects of a certain kind.
Object - An instance of a class. A class must be instantiated into an object before it can be used in the software. More than one instance of the same class can be in existence at any one time.
Encapsulation - The act of placing data and the operations that are performed on that data in the same class. The class then becomes the 'capsule' or container for the data and operations. This binds together the data and functions that manipulate the data.
Inheritance - The reuse of base classes (superclasses) to form derived classes (subclasses). Methods and properties defined in the superclass are automatically shared by any subclass. A subclass may override any of the methods in the superclass, or may introduce new methods of its own.
Polymorphism - Same interface, different implementation. The ability to substitute one class for another. This means that different classes may contain the same method signature, but the result which is returned by calling that method on a different object will be different as the code behind that method (the implementation) is different in each object.
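The terms Abstract Class, Concrete Class, Class and Object can be illustrated with a short PHP sketch. The `Shape` and `Rectangle` classes are hypothetical and exist only to show which kind of class can be instantiated and that each instance holds its own data.

```php
<?php
abstract class Shape                        // abstract class: cannot be instantiated
{
    public string $label = '';

    abstract public function area(): float;

    public function describe(): string      // shared by every subclass
    {
        return sprintf('%s with area %.2f', $this->label, $this->area());
    }
}

class Rectangle extends Shape               // concrete class: can be instantiated
{
    public float $width = 0.0;
    public float $height = 0.0;

    public function area(): float
    {
        return $this->width * $this->height;
    }
}

// Two objects (instances) of the same class, each with its own data.
$a = new Rectangle();
$a->label = 'A';
$a->width = 2.0;
$a->height = 3.0;

$b = new Rectangle();
$b->label = 'B';
$b->width = 1.0;
$b->height = 1.5;

// Attempting to instantiate the abstract class raises an Error.
$abstractFailed = false;
try {
    $className = 'Shape';
    new $className();
} catch (Error $e) {
    $abstractFailed = true;
}
```

Both rectangles share the `describe()` method inherited from `Shape` without either of them having to redefine it.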

A class defines (encapsulates) both the properties (data) of an entity and the methods (functions or operations) which may act upon those properties. Neither properties nor methods which can be applied to that entity should exist outside of that class definition, and each class should not be responsible for more than one entity.

Inheritance is a method of sharing the properties and methods of an existing (super)class to create one or more (sub)classes.

Polymorphism is made available when multiple objects share the same method signature. This is usually, but not necessarily, achieved through inheritance. This means that when an object calls a method on a dependent object and the method being called exists in multiple objects, it should be possible to switch the identity of that dependent object to any of those objects and the method call will still work, but with different results. The mechanism used to switch the identity of the dependent object at runtime is known as Dependency Injection (DI). Note that using DI in a situation where polymorphism does not exist would be a bad idea as you would be adding a feature to your code which would never be used, thus violating YAGNI.
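A minimal sketch of Dependency Injection built on polymorphism that does not come from inheritance. The class names are hypothetical: `PdfOutput` and `CsvOutput` have no common parent and simply share the `render()` method signature, while `Report` receives whichever object it is given.

```php
<?php
class PdfOutput
{
    public function render(array $data): string
    {
        return 'PDF:' . implode(',', $data);
    }
}

class CsvOutput
{
    public function render(array $data): string
    {
        return implode(',', $data);
    }
}

class Report
{
    private object $output;

    // The dependent object is injected; Report never names its class.
    public function __construct(object $output)
    {
        $this->output = $output;
    }

    public function produce(array $data): string
    {
        return $this->output->render($data);   // same call, different results
    }
}

$asPdf = (new Report(new PdfOutput()))->produce(['a', 'b']);
$asCsv = (new Report(new CsvOutput()))->produce(['a', 'b']);
```

If only one output class existed there would be nothing to swap, and the constructor argument would be a feature that is never used.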

Other Terminology

Other concepts or terminology which you may encounter in OOP are as follows:

Association - A description of a related set of links between objects of two types. This defines a relationship between classes of objects that allows one object instance to cause another to perform an action on its behalf.

In a database application two types (tables) may be joined in a parent-to-child or senior-to-junior relationship where one table is the parent (senior) in the relationship and the other is the child (junior).

In the few code samples I have seen, if there is an association between two objects you are required to create a separate method which then instantiates both objects in order to handle the communication between them. This is not how databases work, so in the RADICORE framework there is no instantiation of any related objects. Instead the details of any relationships are held as metadata in two class properties:

  • $child_relations array - this identifies all the child/junior relationships where the current table is the parent/senior.
  • $parent_relations array - this identifies all the parent/senior relationships where the current table is the child/junior.

The $parent_relations array can be referenced when building an SQL SELECT statement in order to JOIN to a parent table so that data from that table can be included in the result.
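The idea of building a JOIN from relationship metadata can be sketched as follows. Note the assumptions: the layout of the array, the `buildSelect()` function and the table names are all invented for this illustration and are not RADICORE's actual format or code.

```php
<?php
// ASSUMPTION: this array layout is invented for illustration only and is
// not taken from the RADICORE framework.
$parent_relations = [
    [
        'parent' => 'customer',
        'fields' => ['customer_id' => 'customer_id'],  // child column => parent column
    ],
];

// Hypothetical helper: adds one JOIN clause for each parent relationship.
function buildSelect(string $table, array $parentRelations): string
{
    $sql = "SELECT * FROM $table";
    foreach ($parentRelations as $relation) {
        $parent = $relation['parent'];
        $conditions = [];
        foreach ($relation['fields'] as $childField => $parentField) {
            $conditions[] = "$table.$childField = $parent.$parentField";
        }
        $sql .= " LEFT JOIN $parent ON (" . implode(' AND ', $conditions) . ')';
    }
    return $sql;
}

$sql = buildSelect('sales_order', $parent_relations);
```

The point of the sketch is that the relationship is described as data, so no related object has to be instantiated in order to read from the parent table.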

While a concept in the real world, such as a Sales Order, may have a whole which is made up of several parts, it may not be a good idea to create a single composite class to encapsulate the whole. In a database where the design has been properly normalised and each part has been separated out to its own table, it would be wise to create a separate class for each of those tables. Each table in a database is a separate entity with its own structure and its own business rules, so each entity should have its own class. A composite class would be responsible for multiple entities and would therefore violate the Single Responsibility Principle.

Please also read OOP for Heretics - Object Associations.

Aggregation - A property of an association representing a whole-part relationship and (usually) life-time containment. This is a subset of Association where the child in a relationship can exist independently of the parent. Example: Class (parent) and Student (child). Delete the Class and the Students still exist.

Please also read OOP for Heretics - Object Aggregations.

Composition - The identification of a type in which each instance is comprised of other objects. This is a subset of Association where the child in a relationship cannot exist without the parent. Example: House (parent) and Room (child). Rooms don't exist separate to a House.

My main gripe about object composition is that it plays against one of the primary aims of OOP which is to decrease code maintenance by increasing code reuse. Directly related to this is the amount of code you have to write in order to reuse some other code - if you have to write lots of code in order to take an empty object and include its composite parts then how is this better than using inheritance which requires nothing but the single word extends?
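The difference in the amount of code can be seen in a small sketch, assuming a hypothetical `Logger` class whose single method is to be reused by two other classes.

```php
<?php
class Logger
{
    public function log(string $message): string
    {
        return '[log] ' . $message;
    }
}

// Reuse by inheritance: the single word "extends" is all that is required.
class InheritingService extends Logger
{
}

// Reuse by composition: the part must be created, stored and forwarded to.
class ComposingService
{
    private Logger $logger;

    public function __construct()
    {
        $this->logger = new Logger();
    }

    public function log(string $message): string
    {
        return $this->logger->log($message);
    }
}

$viaInheritance = (new InheritingService())->log('saved');
$viaComposition = (new ComposingService())->log('saved');
```

Both services produce the same result, but the composed version needs a property, a constructor and a forwarding method for every method which is to be reused.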

Please also read OOP for Heretics - Object Composition.

Collaboration - Two or more objects that participate in a client/server relationship in order to provide a service. In the RADICORE framework every user transaction is provided by a combination of a Model, a View, a Controller and a Data Access Object as shown in this diagram.

Cohesion - Describes the contents of a module. The degree to which the responsibilities of a single module/component form a meaningful unit.
Coupling - A dependency between elements (usually types, classes and subsystems), typically resulting from collaboration between the elements to provide a service.
Delegation - The notion that an object can issue a message to another object in response to a message. The first object therefore delegates the responsibility to the second object. For example, in a web application a Controller receives a message when the SUBMIT button in an HTML document is pressed, and the Controller calls a method on a Model to carry out the request. The Model may pass a message to a DAO to obtain data from the database.
Domain - A formal boundary that defines a particular subject or area of interest. A large enterprise application may cover several different but interconnected domains such as Products, Customers, Orders, Invoices, Inventory and Shipments.

In the RADICORE framework each domain/subsystem has its own database and its own set of tasks (user transactions). Tasks in one domain can access the tables in another domain.

Rather than treating each subsystem as a separate domain which requires its own separate design methodology I treat each one as being a part of a larger system with common and therefore sharable characteristics. Everything which can be shared by the numerous subsystems is therefore built into the RADICORE framework which was purpose-built to satisfy the requirements of any subsystem which has a web front-end and a relational database at the back-end. So I regard my "domain" as being web-based database applications and each subsystem as a different implementation of that domain.

Event A noteworthy occurrence. Also known as a unit of work as it defines an action which the application must perform in order to bring it into line with an action which is performed in the real world. This usually involves reading from and/or writing to the application's database.

In the RADICORE framework each event is defined as a task in the MNU-TASK table, which then allows it to appear on a menu button or navigation button so that the user can select it whenever necessary.

Type A description of a set of like objects with attributes (properties) and operations (methods). A category of people or things having common characteristics. This normally equates to a class in OOP.
Use Case A narrative, textual description of the sequence of events and actions that occur when a user participates in a dialog with a system during a meaningful process. May also be known as a user transaction (to differentiate it from a database transaction) or a task. An individual task may be used either to query the contents of the application database or to update it in some way. Some tasks may only touch a single record in a single table while others may touch multiple records in multiple tables.

In the RADICORE framework each task can only handle a single View, so multiple Views require separate tasks, one for each Transaction Pattern. Each task has its own entry on the TASK table after which it can be added to the MENU and/or NAVIGATION_BUTTON tables. A user's access to available tasks is controlled by the Role-Based Access Control (RBAC) feature of the framework.


What OOP is

When I came to learn OOP in late 2001 and early 2002 the resources which were available on the internet were very small in number and far less complicated. All I had to go on was a description of what made a language object oriented in Object Oriented Programming from October 2001 which stated something similar to the following:

Object Oriented Programming is programming which is oriented around objects, thus taking advantage of Encapsulation, Inheritance and Polymorphism to increase code reuse and decrease code maintenance.

To do OO programming you need an OO language, and a language can only be said to be object oriented if it supports encapsulation (classes and objects), inheritance and polymorphism. It may support other features, but encapsulation, inheritance and polymorphism are the bare minimum. That is not just my personal opinion, it is also the opinion of the man who invented the term. In addition, Bjarne Stroustrup (who designed and implemented the C++ programming language), provides this broad definition of the term "Object Oriented" in section 3 of his paper called Why C++ is not just an Object Oriented Programming Language:

A language or technique is object-oriented if and only if it directly supports:
  1. Abstraction - providing some form of classes and objects.
  2. Inheritance - providing the ability to build new abstractions out of existing ones.
  3. Runtime polymorphism - providing some form of runtime binding.

This is the current definition found in ISO/IEC 2382:2015:

2122503
object-oriented
pertaining to a technique or a programming language that supports objects, classes, and inheritance

Note 1 to entry: Some authorities list the following requirements for object-oriented programming: information hiding or encapsulation, data abstraction, message passing, polymorphism, dynamic binding, and inheritance.

The fact that some authorities use a slightly different list of requirements in their definition of OO just proves that there is no single definition which satisfies everybody. Provided that I use terms which are included in the ISO/IEC definition - such as encapsulation, inheritance and polymorphism - then I feel justified in saying that my definition cannot be regarded as "wrong". On the contrary, anyone who says that OO requires features or concepts which are NOT in the above list has no justification in doing so and can be ignored with impunity.

OO theory is constantly being expanded to include more and more concepts, and these concepts are becoming more and more complicated. As languages are modified to include these add-on concepts newcomers to these languages become convinced that it is these add-ons which define what OO is. I totally disagree. OOP does not require the use of any of these optional extras, so it is wrong to say that a program is not OO simply because it does not use them. It would be like saying that a car is not a car unless it has climate control and satnav. Those are optional extras, not the distinguishing features, and not having them does not make your car not a car. It would also be incorrect to say that a car is a car because it has wheels. Having wheels does not make something a car - a pram has wheels, but that does not make it a car, so having wheels is not a distinguishing feature.

That is why I say that such things as "modularity", "reusability" and "messaging" are not features which distinguish an OO language from a non-OO language for the simple reason that they already existed in some non-OO languages. That is why I say that using some of these later additions, these optional extras, to OO languages does not make your code "more" OO. If you write programs which are oriented around objects then your code is object oriented, and you are an object oriented programmer. It's as simple as that. But it is how you implement the concepts of encapsulation, inheritance and polymorphism to achieve strong cohesion, loose coupling and the elimination of redundancies which really matters. It is the ability to produce code which has increased reusability and decreased maintenance when compared to previous paradigms which determines if your implementation is effective or not. If I can write effective software quicker and cheaper with a procedural language than you can with your OO language, then I'm afraid that it is you the programmer who has failed. It is not a failure in the language because I personally have made the move from non-OO languages to an OO-capable language and can produce identical software faster and cheaper than I could before. The failure is in the way that modern programmers are being taught to write OO code. There is too much emphasis on following academic principles instead of getting the job done, which is supposed to be the production of cost-effective software for the benefit of the paying customer.

I highlighted the phrase "when compared to previous paradigms" deliberately. You cannot say that something is better than all its alternatives unless you actually have some of those alternatives available for comparison. Today's programmers who have never written software in a non-OO language have nothing to compare against, so how do they know that what they are doing is better? I have several decades' worth of experience in writing database applications in non-OO languages, so I am more able than most to make that comparison. I develop nothing but database applications for use by the enterprise, and I judge the effectiveness of a language by how productive I can be in that language. For each table in the database I usually have to create a family of forms in order to view and maintain the contents of that table, and as I have moved from one language to another the amount of time taken to build that family of forms has decreased.

Note that I also created my own frameworks in each of those languages, so this would also have contributed to my levels of productivity.

I created my OO framework in PHP 4 because PHP 4 had what was necessary according to the available definition at that time to write OO programs. This OO framework enabled me to create applications at a faster rate than my earlier frameworks in other languages simply because it had higher levels of reusability, which therefore increased both its speed of development and its maintainability. I have therefore used the concepts of OOP and achieved the objectives of OOP, so what justification do these young upstarts have for telling me that my implementation is wrong?

Optional Extras

Among these "optional extras" which have nothing to do with a language being OO or not are:

There is an additional list of "optional extras" at A minimalist approach to Object Oriented Programming with PHP.

As these are optional extras I am merely exercising the option to not use them. Some people say that my minimalist approach to OOP, the fact that I use nothing but encapsulation, inheritance and polymorphism, means that I am not a "proper" OO programmer. Yet with my approach I can still achieve high reusability and low maintenance, so why am I wrong?


The difference between OOP and non-OOP

A better way of trying to explain the differences between non-OO and OO programming is to use actual examples.

They are defined differently

A function is defined as a self-contained block of code. Each function name "fName" must be unique within the application.

function fName ($arg1, $arg2) 
// function description
{
    ....
    
    return $result;
    
} // fName

A class method is defined within the boundaries of a class definition. Each class name "cName" must be unique within the application. Each class may contain any number of functions (also known as "methods" or "operations"), and the function name "fName" must be unique within the class but need not be unique within the application. In fact, the ability for different classes to share common function/method names is a requirement of polymorphism.

class cName
{
    function fName ($arg1, $arg2) 
    // function description
    {
        ....
        
        return $result;
        
    } // fName
    
} // cName
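The point about names can be shown with a small runnable sketch. The class and method names here (Circle, Square, describe) are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical classes: both define a method called describe().
class Circle
{
    function describe ()
    {
        return 'I am a circle';
    }
}

class Square
{
    function describe ()
    {
        return 'I am a square';
    }
}

// The same method name can be called on either object.
foreach (array(new Circle, new Square) as $shape) {
    echo $shape->describe(), "\n";
}

// A plain function may also use the name, but only once: a second
// "function describe()" at this level would be rejected by PHP because
// function names cannot be redeclared.
function describe ()
{
    return 'I am a plain function';
}
echo describe(), "\n";
```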

In his article The Object-Oriented Thought Process the author Matt Weisfeld states the following:

Difference Between OO and Procedural
This is the key difference between OO and procedural programming. In OO design, the attributes and behavior are contained within a single object, whereas in procedural, or structured design, the attributes and behavior are normally separated.

They are accessed differently

It is important to note that neither a function nor a class can be accessed until the function/class definition has been loaded.

Calling a function is very straightforward:

$result = fName($arg1, $arg2);

Calling a class method is not so straightforward. First it is necessary to create an instance of the class (an object), then to access the function (method) name through the object. The class name must be unique within the application. The name of the variable in which the object is stored need bear no relation to the name of the class.

$object = new cName; 
$result = $object->fName($arg1, $arg2);

Objects have inheritance, Functions do not

In OOP it is possible to define a superclass (parent) with methods and properties, and then create a subclass (child) which inherits everything from the superclass. In this way all the code in the superclass can be shared by every subclass. It is possible for the subclass to contain additional methods and properties, or even to override (alter) any of those methods or properties.

When overriding a method in the child class it is possible to completely replace the inherited method with a different implementation. It is also possible to call the inherited method as well as performing additional steps as shown below:

class A
{
    function foo (...)
    {
        ... default code ...
    }
    function bar (...)
    {
        ... default code ...
    }
}
class B extends A
{
    function foo (...)
    {
        $result = parent::foo(...);  // execute default code

        ... additional code ...
    }
}

Note here that class B does not override method bar, so when $objectB->bar() is called it actually executes the bar() method that was defined in class A.
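A runnable version of the same idea might look like the following. The return values are placeholders invented for this sketch so that the effect of parent::foo() can be seen in the output.

```php
<?php
class A
{
    function foo ()
    {
        return 'A::foo';              // default code
    }
    function bar ()
    {
        return 'A::bar';              // default code
    }
}

class B extends A
{
    function foo ()
    {
        $result = parent::foo();      // execute default code
        return $result . ' + B::foo'; // additional code
    }
}

$objectB = new B;
echo $objectB->foo(), "\n"; // A::foo + B::foo
echo $objectB->bar(), "\n"; // A::bar (inherited, not overridden)
```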

They have different numbers of working copies

A function does not have to be instantiated before it can be accessed, therefore only one copy (or instance) is said to exist at any one time.

A class method can only be accessed after its class has been instantiated into an object (unless it has been defined as a static method, see below), and it is possible to create multiple instances (objects) of the same class with different object names.

$object1 = new cName;
$object2 = new cName;
$object3 = new cName; 

Although it is possible to access a static method without first creating an object, this is no better than accessing a non-class function. As it is not actually using an object it cannot be considered part of object oriented programming.
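As a minimal sketch of that difference, assuming a hypothetical Temperature class that is not part of any framework, the static method below needs no object and cannot touch $this, while the instance method works on the state held inside a particular object.

```php
<?php
class Temperature
{
    var $celsius = 0;

    // static method: no object, no $this, behaves like a plain function
    static function toFahrenheit ($celsius)
    {
        return $celsius * 9 / 5 + 32;
    }

    // instance method: acts upon the state held in the object
    function asFahrenheit ()
    {
        return self::toFahrenheit($this->celsius);
    }
}

echo Temperature::toFahrenheit(100), "\n"; // no object required

$reading = new Temperature;
$reading->celsius = 37;
echo $reading->asFahrenheit(), "\n";       // uses the object's own value
```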

They have different numbers of entry points

A function has only a single point of entry, and that is the function name itself.

An object has multiple points of entry, one for each method name.

They have different methods of maintaining state

A function by default does not have state, by which I mean that each time that it is called it is treated as a fresh invocation and not a continuation of any previous invocation.

An object does have state, by which I mean that each time an object's method is called it acts upon the object's state (the class properties) as it was after the previous method call.

It is possible for both a function and a class method to use local variables, and they both operate in the same way. This means that the local variables do not exist outside the scope of the function or class method, and any values placed in them do not persist between invocations.

It is possible for a function to remember values between different invocations by declaring a variable as static, as in the following example:

// named nextCount because count() is a built-in PHP function
function nextCount () {
    static $count = 0;
    $count++;
    return $count;
}

Each time this function is called it will return a value that is one greater than the previous call. Without the keyword static it would always return the value '1'.
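The effect of the keyword can be checked by running two near-identical functions side by side. The function names nextNumber and alwaysOne are hypothetical and exist only for this comparison.

```php
<?php
function nextNumber ()
{
    static $count = 0;  // initialised once, then remembered between calls
    $count++;
    return $count;
}

function alwaysOne ()
{
    $count = 0;         // an ordinary local variable is reset on every call
    $count++;
    return $count;
}

echo nextNumber(), nextNumber(), nextNumber(), "\n"; // 123
echo alwaysOne(), alwaysOne(), alwaysOne(), "\n";    // 111
```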

Class variables which need to persist outside of a function (method) are declared at class level, as follows:

class calculator
{
    // define class properties (member variables)
    var $value;
    
    // define class methods
    function setValue ($value) 
    {
        $this->value = $value;
        
        return;
        
    } // setValue
    
    function getValue () 
    // function description
    {
        return $this->value;
        
    } // getValue
    
    function add ($value) 
    // function description
    {
        $this->value = $this->value + $value;
        
        return $this->value;
        
    } // add
    
    function subtract ($value) 
    // function description
    {
        $this->value = $this->value - $value;
        
        return $this->value;
        
    } // subtract
    
} // calculator

Note that all class/object variables are referenced with the prefix $this-> as in $this->varname. Any variable which is referenced without this keyword, as in $varname, is treated as a local variable.

Note also that each instance of the class (object) maintains its own set of variables, so the contents of one object are totally independent of the contents of another object, even if it is from the same class.
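The following sketch shows both points: state carried over from one method call to the next, and state kept separate between instances. It uses a cut-down copy of the calculator class, and it assumes that $value starts at zero, which the class above does not specify.

```php
<?php
// Cut-down version of the calculator class, just enough to show state.
class calculator
{
    var $value = 0;     // assumption: start from zero

    function add ($value)
    {
        $this->value = $this->value + $value;
        return $this->value;
    }

    function getValue ()
    {
        return $this->value;
    }
}

$first  = new calculator;
$second = new calculator;

$first->add(10);
$first->add(5);     // continues from the state left by the previous call
$second->add(1);    // a separate object with its own $value

echo $first->getValue(), "\n";   // 15
echo $second->getValue(), "\n";  // 1
```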


Practical Examples

Here are some practical examples which demonstrate Encapsulation, Inheritance and Polymorphism.

Encapsulation

Encapsulation The act of placing data and the operations that operate on that data in the same class. The class then becomes the 'capsule' or container for the data and operations. This binds together the data and functions that manipulate the data.

More details can be found at Object-Oriented Programming for Heretics

Every application deals with a number of different entities or "things", such as "customer", "product" and "invoice", so it is common practice to create a different class for each of these entities. At runtime the software will create one or more objects from each class definition, and when it wants to do something with one of these entities it will do so by calling the relevant method on the relevant object.

The data held within each object at runtime cannot remain in memory for ever, so it is written out to a persistent data store (a database) with a separate table for each entity. There are only four basic operations which can be performed on a database table (Create, Read, Update, Delete) so I shall start by creating a method for each one.

class entity1
{
    // class properties
    var $dbname;           // database name
    var $errors = array(); // array of error messages, indexed by field name
    var $fieldarray;       // associative array of name=value pairs
    var $fieldlist;        // array of field (column) names
    var $fieldspec;        // array of field specifications
    var $numrows;          // number of database rows affected
    var $primary_key;      // array of field names which make up the primary key
    var $tablename;        // table name
    
    // class methods
    function __construct ()
    // constructor
    {
        $this->tablename   = 'entity1';
        $this->dbname      = 'foobar';
        
        $this->fieldlist   = array('column1', 'column2', 'column3', 'column4');
        $this->primary_key = array('column1');
        
    } // __construct
    
    // class methods
    function getData ($where)
    // read data from the database which satisfies the selection criteria in $where
    {
        ....
        
        return $this->fieldarray;
        
    } // getData
    
    function insertRecord ($fieldarray)
    // create a database record using the contents of $fieldarray
    {
        ....
        
        return $fieldarray;
        
    } // insertRecord
    
    function updateRecord ($fieldarray)
    // update a database record using the contents of $fieldarray
    {
        ....
        
        return $fieldarray;
        
    } // updateRecord
    
    function deleteRecord ($fieldarray)
    // delete a database record identified in $fieldarray
    {
        ....
        
        return $fieldarray;
        
    } // deleteRecord
    
} // entity1

Please note the following:

Each of these classes therefore acts as a 'capsule' which contains both the data for an entity and the operations which can be performed upon that data. This is 'encapsulation'.
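To see the capsule in action without a database, here is a minimal sketch in which an in-memory array stands in for the table. The $rows property, the key-based getData() and all of the method bodies are assumptions made for this illustration, they are not the code which was elided above.

```php
<?php
// Hypothetical sketch: an in-memory array stands in for the database table
// so that the example can run without a database connection.
class entity1
{
    var $tablename   = 'entity1';
    var $primary_key = array('column1');
    var $rows        = array();   // stand-in for the table, keyed on column1
    var $numrows     = 0;

    function insertRecord ($fieldarray)
    {
        $this->rows[$fieldarray['column1']] = $fieldarray;
        $this->numrows = 1;
        return $fieldarray;
    }

    function getData ($key)
    {
        // the real method takes a WHERE string; a key lookup keeps this short
        $found = isset($this->rows[$key]) ? array($this->rows[$key]) : array();
        $this->numrows = count($found);
        return $found;
    }

    function updateRecord ($fieldarray)
    {
        $key = $fieldarray['column1'];
        if (isset($this->rows[$key])) {
            $this->rows[$key] = array_merge($this->rows[$key], $fieldarray);
            $this->numrows = 1;
        } else {
            $this->numrows = 0;
        }
        return $fieldarray;
    }

    function deleteRecord ($fieldarray)
    {
        $key = $fieldarray['column1'];
        $this->numrows = isset($this->rows[$key]) ? 1 : 0;
        unset($this->rows[$key]);
        return $fieldarray;
    }
}

$object = new entity1;
$object->insertRecord(array('column1' => 'A1', 'column2' => 'first'));
$object->updateRecord(array('column1' => 'A1', 'column2' => 'changed'));
$data = $object->getData('A1');
echo $data[0]['column2'], "\n";             // changed
$object->deleteRecord(array('column1' => 'A1'));
echo count($object->getData('A1')), "\n";   // 0
```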

Inheritance

Inheritance The reuse of base classes (superclasses) to form derived classes (subclasses). Methods and properties defined in the superclass are automatically shared by any subclass. A subclass may override any of the methods in the superclass, or may introduce new methods of its own.

More details can be found at Object-Oriented Programming for Heretics

After writing and testing a class to deal with 'entity1' I copied it and made it work for 'entity2'. I then compared the two classes to see what code was common and could be shared, and what code was unique and could not be shared. I then transferred all the common code into a separate class known as a 'superclass'.

Firstly, to create the superclass, I changed the class name and the constructor to the following:

abstract class Default_Table
{
    // class properties
    var $dbname;           // database name
    var $errors = array(); // array of error messages, indexed by field name
    var $fieldarray;       // associative array of name=value pairs
    var $fieldlist;        // array of field (column) names
    var $fieldspec;        // array of field specifications
    var $numrows;          // number of database rows affected
    var $primary_key;      // array of field names which make up the primary key
    var $tablename;        // table name
    
    // class methods
    function __construct ()
    // constructor
    {
        $this->tablename   = 'unknown';
        $this->dbname      = 'unknown';
        
        $this->fieldlist   = array();
        $this->primary_key = array();
        
    } // __construct
    
    function getData ($where)
    {
        ....
    }
    function insertRecord ($fieldarray)
    {
        ....
    }
    function updateRecord ($fieldarray)
    {
        ....
    }
    function deleteRecord ($fieldarray)
    {
        ....
    }
} // Default_Table

Note here that it is defining an unknown table with an unknown number of fields/columns, so it cannot be used as a genuine object. This is reinforced by the use of the word "abstract" in front of the class name which will prevent the "new" keyword from being used. It is only when the details of a specific database table are combined with this abstract definition through the mechanism known as inheritance that a "concrete" class is made available for instantiation.
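The effect of the "abstract" keyword can be demonstrated with a short sketch. It assumes PHP 7 or later, where the failed instantiation is reported as a catchable Error, and the getTableName() method is a hypothetical addition used only to show the result.

```php
<?php
abstract class Default_Table
{
    var $tablename = 'unknown';

    // hypothetical helper, used here only to inspect the object
    function getTableName ()
    {
        return $this->tablename;
    }
}

class entity1 extends Default_Table
{
    function __construct ()
    {
        $this->tablename = 'entity1';
    }
}

try {
    $object  = new Default_Table;     // not allowed on an abstract class
    $message = 'instantiated';
} catch (Error $e) {
    $message = $e->getMessage();
}
echo $message, "\n";

$object = new entity1;                // the concrete subclass is fine
echo $object->getTableName(), "\n";   // entity1
```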

Secondly, I altered each table class to remove the common methods and properties, and included the keyword extends to force inheritance from the abstract superclass.

include 'default.class.inc';
class entity1 extends Default_Table
{
    function __construct ()
    // constructor
    {
        $this->tablename   = 'entity1';
        $this->dbname      = 'foobar';
        
        $this->fieldlist   = array('column1', 'column2', 'column3', 'column4');
        $this->primary_key = array('column1');
        
    } // __construct
    
} // entity1

When a subclass is instantiated into an object that object will combine all the properties and methods of the superclass as well as those of the subclass. If anything has been defined in both the superclass and the subclass, then the definition from the subclass will take precedence.

In my current development environment the superclass contains several thousand lines of code, but there is only one copy of this code which is inherited by several hundred table classes. Inheritance is therefore a powerful mechanism for making one copy of common code accessible to many similar objects instead of having multiple copies of that common code.

Polymorphism

Polymorphism Same interface, different implementation. The ability to substitute one class for another. This means that different classes may contain the same method signature, but the result which is returned by calling that method on a different object will be different as the code behind that method (the implementation) is different in each object.

More details can be found at Object-Oriented Programming for Heretics

Polymorphism can only be employed where the same method signatures exist in several classes. The code within the method may be inherited from a parent class, or it may be totally different. This means that the same method can be used on different objects, but the results will be different.

For example, take a series of classes called 'Customer', 'Product' and 'Invoice'. One practice I have seen which makes polymorphism impossible is to incorporate the entity name into the method name, as in:

  1. getCustomer(), insertCustomer(), updateCustomer(), deleteCustomer()
  2. getProduct(), insertProduct(), updateProduct(), deleteProduct()
  3. getInvoice(), insertInvoice(), updateInvoice(), deleteInvoice()

The problem with this approach is that the object (the controller in MVC) which communicates with each table object (the model in MVC) needs to know the method name before it can open up that channel of communication. If each model has a unique set of method names then there is no polymorphism, no possibility of using Dependency Injection, which in turn means that each Model must have a unique Controller (or set of Controllers) to communicate with it. This also means that you have to write specialised code to deal with each set of "uniqueness" instead of being able to reuse sharable code that deals with areas of commonality.

My approach is to use a standard set of method names for standard operations, as in:

  1. getData(), insertRecord(), updateRecord(), deleteRecord()

This is made easier as these methods are defined in the superclass and made available to each subclass through inheritance.

The advantage of this is that I can have one standard controller for each standard function, and this controller can work with any table class in the system. This is far better than having a separate set of controllers for each table class.

Here is some example code from one of my controllers:

$table_id = '????';
....
include "classes/$table_id.class.inc";
$object = new $table_id;
$data = $object->getData($where);
....

The contents of $table_id and $where are made available at runtime.

The significant point is that the name of the class (database table) is not hard-coded into the controller, it is passed as an argument at runtime. Only the method names are hard-coded, but as these method names exist within every table class by being inherited from the superclass they will always work. So, if the class name is 'Customer' the controller will obtain data from the 'Customer' table, if it is 'Product' it will obtain data from the 'Product' table, and so on.
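A self-contained sketch of this arrangement follows. To keep it runnable the classes are defined inline instead of being loaded with include, getData() returns a query string instead of database rows, and the runController() function is a hypothetical stand-in for a controller script.

```php
<?php
// Hypothetical sketch: not the framework's actual code.
abstract class Default_Table
{
    var $tablename = 'unknown';

    function getData ($where)
    {
        return "SELECT * FROM {$this->tablename} WHERE $where";
    }
}

class Customer extends Default_Table
{
    var $tablename = 'customer';
}

class Product extends Default_Table
{
    var $tablename = 'product';
}

// The "controller": only the method name is hard-coded,
// the class name arrives as an argument at runtime.
function runController ($table_id, $where)
{
    $object = new $table_id;
    return $object->getData($where);
}

echo runController('Customer', "id='42'"), "\n";
echo runController('Product', "id='7'"), "\n";
```

One controller function serves both classes because both respond to the same method name, which is the whole point of polymorphism.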

Note that similar code can be used to insert data into the database as well as retrieving it, as shown in the following snippet:

$table_id = '????';
....
include "classes/$table_id.class.inc";
$dbobject = new $table_id;
$result = $dbobject->insertRecord($_POST);

I pass in the entire contents of the $_POST array in order to make my code as loosely coupled as possible. The array is disassembled and validated using logic within the Model.
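What "disassembled and validated within the Model" could look like is sketched below. The field specifications, the validation rules and the error messages are all invented for this illustration, and the database write is left out.

```php
<?php
// Hypothetical sketch: the specifications and rules are illustrative only.
class Customer
{
    var $errors    = array();
    var $fieldspec = array(
        'name' => array('required' => true),
        'age'  => array('required' => false, 'type' => 'integer'),
    );

    function insertRecord ($fieldarray)
    {
        $this->errors = array();
        $clean = array();

        // keep only the fields that this table knows about, and validate each
        foreach ($this->fieldspec as $field => $spec) {
            $value = isset($fieldarray[$field]) ? trim($fieldarray[$field]) : '';
            if ($value === '') {
                if (!empty($spec['required'])) {
                    $this->errors[$field] = 'This field is required';
                }
                continue;
            }
            if (isset($spec['type']) && $spec['type'] == 'integer'
            && !preg_match('/^\d+$/', $value)) {
                $this->errors[$field] = 'This field must be an integer';
                continue;
            }
            $clean[$field] = $value;
        }

        // a real implementation would write $clean to the database here
        return $clean;
    }
}

// stand-in for $_POST, including a button name that is not a table column
$post = array('name' => 'Fred', 'age' => 'abc', 'submit' => 'Save');

$dbobject = new Customer;
$result   = $dbobject->insertRecord($post);
print_r($dbobject->errors);
```

The controller does not need to know which fields exist or which rules apply, so the same controller code can pass its input to any Model.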


Popular misconceptions

I am often berated by my critics, of which there are more than a few, for not understanding what OOP really means. This simply boils down to the fact that they have extended the basic principles of OOP to include their personal interpretations, and they have developed rules which govern how these interpretations should be implemented.

Let me make it quite clear that I do not care for these personal interpretations, and I certainly do not care for these rules.

What Abstraction is not

This is supposed to be one of the key concepts of OOP, yet I find the descriptions of what it is, how to do it and what the results are supposed to be totally inadequate as they are vague, imprecise, open to interpretation, and in some cases downright wrong. For every OO concept I expect to see a code sample showing it in action so that everybody knows what it looks like and how to use it, but while this is true for encapsulation, inheritance and polymorphism, for "abstraction" there is absolutely nothing. The nearest reference in the PHP manual is where it mentions an abstract class, but are these two linked or not? According to the following links they are not:

The first pillar of OOP is "Abstraction". "Abstraction is the process of selecting data to show only the relevant information to the user".

From Software Testing Help

Abstraction is the concept of object-oriented programming that "shows" only essential attributes and "hides" unnecessary information. The main purpose of abstraction is hiding the unnecessary details from the users. Abstraction is selecting data from a larger pool to show only relevant details of the object to the user.

From GURU99

Through the process of abstraction, a programmer hides all but the relevant data about an object in order to reduce complexity and increase efficiency.

From WhatIs.com

Abstraction is the process of hiding the internal details of an application from the outer world. Abstraction is used to describe things in simple terms. It's used to create a boundary between the application and the client programs.

From DigitalOcean.com and LogicMojo.com

In object oriented programming, abstraction involves exposing necessary functionality to external objects and hiding implementation details.

From EnjoyAlgorithms.com

Abstraction means displaying only essential information and hiding the details. Data abstraction refers to providing only essential information about the data to the outside world, hiding the background details or implementation.

From GeeksForGeeks.org

An abstraction is a way of hiding the implementation details and showing only the functionality to the users. In other words, it ignores the irrelevant details and shows only the required one.

From javatpoint.com

in Object-oriented programming, abstraction is a process of hiding the implementation details from the user, only the functionality will be provided to the user. In other words, the user will have the information on what the object does instead of how it does it.

From tutorialspoint.com

In Object Oriented Programming abstraction concept the actual implementation is hidden from the user and only required functionality will be accessible or available to the user.

From topperskills.com

Abstraction is a process of hiding unnecessary data and showing only relevant data.

From upgrad.com

All of the above links describe abstraction as separating what can be hidden from what can be left visible. I disagree completely. It was not until I came across a paper called Designing Reusable Classes, which was published by Ralph E. Johnson & Brian Foote in 1988, that I read a proper and unambiguous description of abstraction. In it they say that the process of abstraction involves looking at several objects so that you can separate the abstract from the concrete, the similar from the different. You are, in effect, looking for repeating patterns. All the similar protocols (methods) can be placed in an abstract class and all the differences placed in concrete subclasses. All the similar protocols are then shared via inheritance. They call this programming-by-difference. It means that instead of writing duplicate copies of the similar protocols you create a single master copy which you can then reuse as many times as you like. This later became the basis for the Don't Repeat Yourself (DRY) principle.

I have written more on this subject in The meaning of "abstraction".

While an abstract class can take care of the standard shared methods, what happens when a concrete class needs to deviate from the standard and perform some processing which is unique to that class? This problem is solved with the Template Method Pattern which allows the abstract class to contain some empty "hook" methods at various places in its processing cycle. Any subclass is then free to provide its own implementation of these methods and thus override the standard processing.
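A minimal sketch of the Template Method Pattern, using hypothetical hook names (beforeInsert and afterInsert) and a log array in place of a real database write, might look like this:

```php
<?php
// Hypothetical sketch: the method names are invented for illustration.
abstract class Default_Table
{
    var $log = array();

    // the template method: a fixed sequence of steps with hook points
    function insertRecord ($fieldarray)
    {
        $fieldarray  = $this->beforeInsert($fieldarray);   // hook
        $this->log[] = 'insert ' . implode(',', $fieldarray);
        $fieldarray  = $this->afterInsert($fieldarray);    // hook
        return $fieldarray;
    }

    // empty hooks: they do nothing unless a subclass overrides them
    function beforeInsert ($fieldarray)
    {
        return $fieldarray;
    }

    function afterInsert ($fieldarray)
    {
        return $fieldarray;
    }
}

class Product extends Default_Table
{
    // uses the standard processing with no changes
}

class Customer extends Default_Table
{
    // deviates from the standard by filling in one of the hooks
    function beforeInsert ($fieldarray)
    {
        $fieldarray['name'] = strtoupper($fieldarray['name']);
        return $fieldarray;
    }
}

$product  = new Product;
$customer = new Customer;
$p = $product->insertRecord(array('name' => 'widget'));
$c = $customer->insertRecord(array('name' => 'fred'));
echo $p['name'], ' ', $c['name'], "\n";   // widget FRED
```

The sequence of steps lives once in the abstract class, and each concrete class supplies only its differences.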

What Encapsulation is not

I disagree with all the following statements:

  1. Encapsulation and Abstraction mean the same thing

    This is a popular misconception, but the fact that one leads to the other does not mean that they are the same thing.

    You should be able to see here that encapsulation involves the creation of concrete classes, and it is only after you have several of those classes that you can move any shared protocols to an abstract class. In my own application I have over 400 database tables which share their common protocols from a single abstract class, so that is a huge amount of code which is reused a huge number of times.

    The differences between encapsulation and abstraction are also discussed in Mistaking Encapsulation for Abstraction by Kevin Buchanan.

  2. Encapsulation is about implementation hiding

    This is a meaningless statement as "implementation hiding" is not restricted to encapsulation, and neither is it restricted to OO languages. This was not the aim of encapsulation, nor is it a unique or distinguishing feature of encapsulation. It is a universal property of all languages and all paradigms, as described in OOP is about information hiding. In every procedural language you can create a function with a specific signature (the API), and this signature only exposes what it does, not how it does it (the implementation).

  3. Encapsulation is about information (data) hiding

    Just because "information" and "implementation" have similar sounds, it does not follow that they also have similar meanings. It is also untrue to say "data is part of the implementation, therefore implementation hiding automatically means data hiding". This is wrong for the simple reason that implementation is code while information is data. Data is not code, so information is not implementation.

    If you think that I am the only person who thinks that encapsulation has nothing to do with information hiding then take a look at the following:

    The article Abstraction, Encapsulation, and Information Hiding shows how different authors have provided their own interpretations of encapsulation which have caused the true meaning to become corrupted beyond recognition:

    To enclose in or as if in a capsule.

    -- Mish, 1988

    The concept of encapsulation as used in an object-oriented context is not essentially different from its dictionary definition. It still refers to building a capsule, in the case a conceptual barrier, around some collection of things.

    -- Wirfs-Brock et al, 1990

    But then the idea of "data hiding" crept in:

    It is a simple, yet reasonable effective, system-building tool. It allows suppliers to present cleanly specified interfaces around the services they provide. A consumer has full visibility to the procedures offered by an object, and no visibility to its data. From a consumer's point of view, and object is a seamless capsule that offers a number of services, with no visibility as to how these services are implemented ... The technical term for this is encapsulation.

    -- Cox, 1986

    Encapsulation, or equivalently information hiding, refers to the practice of including within an object everything it needs, and furthermore doing this in such a way that no other object need ever be aware of this internal structure.

    -- Graham, 1991

    We say that the changeable, hidden information becomes the secret of the module; also, according to a widely used jargon, we say that such information is encapsulated within the implementation.

    -- Ghezzi et al, 1991

    Data hiding is sometimes called encapsulation because the data and its code are put together in a package or 'capsule'.

    -- Smith, 1991

    Encapsulation is used as a generic term for techniques which realize data abstraction. Encapsulation therefore implies the provision of mechanisms to support both modularity and information hiding. There is therefore a one to one correspondence in this case between the technique of encapsulation and the principle of data abstraction.

    -- Blair et al, 1991

    Encapsulation (also information hiding) consists of separating the external aspects of an object which are accessible to other objects, from the internal implementation details of the object, which are hidden from other objects.

    -- Rumbaugh et al, 1991

    Encapsulation -- also known as information hiding -- prevents clients from seeing its inside view, where the behavior of the abstraction is implemented.

    -- Booch, 1991

    The idea of data hiding has always seemed strange to me. Surely in an application whose sole purpose is to move data between a GUI and a database then hiding that data defeats that purpose?

    The same article shows different sets of definitions for Information Hiding:

    The lack of clear, precise and unambiguous definitions of important OO concepts leads to confusion, misinterpretations and mistakes, and until these ambiguities are rectified the promises of better software made for OOP will never be realised.

  4. You must have a separate method for each use case

    This idea never occurred to me, which is why I came up with a superior technique. Each use case requires the use of a particular Model, View and Controller. Instead of having separate Controllers which are tightly coupled to particular Models my reusable Controllers can call any of the standard CRUD operations on any Model. This is made possible by virtue of the fact that each concrete table class (Model) inherits its standard methods from an abstract table class. Each use case has its own component script in the file system which is accessed via its own URL. Each of these component scripts identifies which Model and View needs to be linked with which Controller. This means that in an enterprise application containing 4,000 use cases I do not have a different method name for each, I have a different component script.

  5. You must use getters and setters to access your data

    As a consequence of the "Encapsulation means information/data hiding" rule some programmers insist that it also means that the visibility of each piece of data should be changed from "public" to "private". This means that instead of

    $foo = $object->foo;
    $object->foo = 'bar';
    
    I should use:
    $foo = $object->getFoo();
    $object->setFoo('bar');
    

    This is because the 'get' and 'set' methods (also known as 'accessors' and 'mutators') provide the opportunity to execute some additional code when the data is being retrieved or inserted.

    Considering that I do not even acknowledge that the rule exists in the first place, why should I be bound by additional consequential rules?

    It should also be noted by experienced programmers that there are not just two methods of retrieving data from and inserting data into an object. There is actually a third - on the method signature itself. Consider the following statement:

    $output = $object->doSomething($input);
    

    $input is data going in, and $output is data coming out. Neither of these variables can have their visibility downgraded from "public" for the simple reason that they are part of the method signature and therefore cannot be hidden. This means that I can put data into and get data out of an object without having to reference a class variable, public or not, hidden or not.

    There is also no rule that says each class property must be a scalar value, so I can use an array if I want to. This can be especially useful when dealing with data associated with relational databases as they deal with rows and columns, and arrays are the perfect mechanism as they can deal with any number of columns from any number of rows. It is possible for the sql SELECT statement to exclude some of the table's columns, or it may include columns from other tables via a JOIN. This method is much more flexible because I don't have to code the names of any columns in any getters and setters. The Controller object injects the entire HTTP request into the Model object as a single array, and the View object retrieves all the data from the Model as a single array.

    This is explained in more detail in Don't use getters and setters for user data.

  6. You must validate your data in the setter methods

    There is a golden rule in programming that you must never trust any data which is supplied by a user - it should always be validated or filtered before it gets written to the database. While I agree with this rule (now there's a surprise!) I do not like the follow-on rule which states "As you must be using setter methods it follows that you must validate the user input within the setter method".

    I don't use setters therefore I cannot validate within setters. That does not mean that the data goes unvalidated, it just means that I perform my validation in a different manner. All data goes into the object as an array, and all data gets passed to the data access object (DAO) as an array so that it can be written to the database. But it only gets to the DAO after it has been validated, and that is done by passing the entire array through a validation object. If there are any validation errors then the whole array gets thrown back to the user with a suitable error message and never gets as far as the database.

  7. You must not have more than N methods in a class

    Some programmers say that if you have more than N methods in a class (where N is a completely arbitrary number) then your class is too big and unmanageable. They say that such a class must surely be breaking the Single Responsibility Principle. They say that you should break that class down into smaller subclasses as they are easier for the programmer to get his brain around.

    Firstly, encapsulation requires that all the data and all the operations that can be performed on that data are placed in the same class. If you use multiple classes then you are breaking encapsulation.

    Secondly, if a class requires 100 methods then it requires 100 methods. If a programmer cannot deal with 100 methods in a single class then how can he possibly deal with all those methods spread across multiple classes with the additional complexity of extra code which would then be required to pass control to another class just to deal with another operation, or a different facet of the same operation?

    Thirdly, you can take the idea of "lots of small methods" too far and end up with an unmaintainable mess. As a prime example I was recently forced to use a certain email library, and when I downloaded it I found it had over 140 classes, each in its own file, spread across 24 directories and subdirectories. When I came to step through it with my debugger in order to track down what I thought was a minor problem I spent over 30 minutes stepping through line after line of code which didn't actually do anything useful. All it was doing was instantiating object after object and jumping from one object method to another. Most of these methods contained just a single line of code, and very little of this code was actually associated with constructing and sending the email. If that is your idea of "best practice", and you are teaching this idea to others, then all I can say is "God help us!"
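The array-based alternative to getters and setters described above, with validation performed on the whole array by a separate object, could be sketched roughly as follows. This is an illustration under stated assumptions, not the code of any real framework: `Validator`, `CustomerModel`, the `$rules` layout and the `$written` property (which merely stands in for a data access object) are all hypothetical.

```php
<?php
// Hypothetical validation object: checks a whole array of field values in one pass.
class Validator
{
    // $rules maps a field name to a maximum length (a deliberately simple rule).
    public function validate(array $data, array $rules): array
    {
        $errors = [];
        foreach ($rules as $field => $maxLength) {
            if (!isset($data[$field]) || $data[$field] === '') {
                $errors[$field] = 'This field is required';
            } elseif (strlen($data[$field]) > $maxLength) {
                $errors[$field] = "Cannot exceed $maxLength characters";
            }
        }
        return $errors;
    }
}

// Hypothetical Model: data goes in and comes out as an array, with no getters or setters.
class CustomerModel
{
    public array $errors = [];
    public array $written = [];   // stands in for the data access object
    private array $rules = ['name' => 10, 'city' => 20];

    public function insertRecord(array $data): array
    {
        $validator = new Validator();
        $this->errors = $validator->validate($data, $this->rules);
        if (empty($this->errors)) {
            $this->written[] = $data;   // only validated data gets this far
        }
        return $data;   // the whole array goes back to the caller either way
    }
}

$model = new CustomerModel();
$model->insertRecord(['name' => 'Smith', 'city' => 'London']);
$model->insertRecord(['name' => 'A name which is too long', 'city' => '']);
```

Note that no column name appears in any method name. Adding a column to the table would mean adding one entry to `$rules`, not writing another pair of accessor methods.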

Common implementation errors

As I have previously stated the simplest yet most accurate definition of encapsulation which I have found goes like this:

The act of placing data and the operations that perform on that data in the same class. The class then becomes the 'capsule' or container for the data and operations. This binds together the data and functions that manipulate the data.

Encapsulation is the creation of classes which can be instantiated into objects. This means that you first identify the entities with which your application will be concerned where an "entity" has data and operations that perform on that data. In the case of a database application each table is an entity. You then create a separate class for each entity where all the data is defined as class properties and all the operations are defined as class methods. All the properties and methods for an entity go into a SINGLE class and should NOT be split across multiple classes, with the only exception being to separate out the Presentation processing and the Data Access processing as required by the 3 Tier Architecture. Yet there are still many groups of programmers out there who still manage to get this basic idea completely wrong. Among the mistakes are:

What Polymorphism is not

The definition of polymorphism should be easily understood by everyone, yet there are some people who even manage to screw this up. They focus on the phrase "same interface, different implementation" and automatically assume that it means object interface instead of method signature. A practical example of polymorphism in action is where I have the code $result = $object->method() and I can create $object from any number of different classes. All that is required is that each of those classes supports that method signature. It does not matter how the method got into the class - it could be either inherited or hard-coded - just that it exists.
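To make the point concrete, here is a small sketch in which two classes share a method signature without sharing a parent class or a declared interface. The class names `Invoice` and `Shipment`, the method `describe()` and the function `show()` are hypothetical and exist only for this example.

```php
<?php
// Two unrelated hypothetical classes: no shared parent, no shared interface.
class Invoice
{
    public function describe(): string
    {
        return 'invoice';
    }
}

class Shipment
{
    public function describe(): string
    {
        return 'shipment';
    }
}

// The calling code does not care which class $object came from,
// only that it supports the describe() method signature.
function show($object): string
{
    return 'Type: ' . $object->describe();
}

$results = [];
foreach ([new Invoice(), new Shipment()] as $object) {
    $results[] = show($object);
}
```

The same line of calling code produces a different result depending on which object it is given, which is "same interface, different implementation" with the method hard-coded in each class rather than inherited.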

In my main enterprise application I have 450+ concrete table (model) classes and 40 generic page controllers. Each of those controllers can be used with any of those models by virtue of the fact that each model class inherits from the same abstract table class, and each controller communicates with whatever model it has been given by calling methods which were defined in the abstract table class and therefore automatically available in each concrete class. If you do the maths you will see that this arrangement produces 18,000 (450*40) opportunities for polymorphism. Did you read that? EIGHTEEN THOUSAND! If you don't have that level of polymorphism in your application then how can you possibly tell me that my implementation of OOP is wrong?

I recently answered a question in the Dynamic form generation thread in the comp.lang.php newsgroup in which a comedian called Jerry Stuckle said:

You've got a *partial* definition. Polymorphism is only applicable when the two classes have a parent/child hierarchy, and the child class has a method of the same name (and in some languages, the same parameter list) as the parent.

When there is no parent/child relationship (as in the case of two different database tables), there is no polymorphism.

This was my reply:

The definition of polymorphism does NOT state that the classes have to exist in a parent/child hierarchy, only that they have the same method signature. Having said that, it is usually the case that the two classes ARE related.

You obviously haven't used an abstract table class which is inherited by every concrete table class. All my concrete table classes have instant access to all the methods and properties which are defined just once in the abstract class.

To which Jerry responded:

I've used them much longer than you've even known they existed. When you have a class derived from the abstract class, there is a parent/child relationship. But there is no such relationship between two classes derived from the same one.

Once again you show you have no knowledge of OO. Polymorphism cannot exist without inheritance - which requires a parent/child hierarchy.

This was my reply:

Yes it can. Polymorphism simply requires that the same interface exists in more than one class. That may come from inheritance, or it may not. It *IS* possible to define the same interface more than once without inheritance. Each one of the 400 concrete table classes in my application is derived from the same abstract table class, so each one of those 400 classes is a sibling of the other, and "sibling" implies a relationship.

To which Jerry responded:

So there is a sibling relationship? Once again you prove how you don't understand the concept of polymorphism.

Later on he said:

You can have the same interface in more than one object WITHOUT inheritance, but that is not polymorphism.

So, according to Jerry Stuckle, a self-proclaimed "expert", polymorphism is restricted by the following rules:

If multiple classes have the "same interface, different implementation" then the conditions for polymorphism exist whether you like it or not. If you have invented additional rules then you are wrong for inventing those rules. I am most definitely *NOT* wrong for refusing to follow those additional rules.

What Inheritance is not

Some people claim that there are two types of inheritance:

Note that a concrete class can only contain plain (non-abstract) methods while an abstract class can contain a mixture of both abstract and non-abstract methods, or nothing but non-abstract methods.

There are differences between inheritance in the real world and inheritance in OOP:

Interface inheritance is therefore a misnomer on two fronts:

This is like telling someone that they have inherited a suitcase full of money, and when they ask "Where is it?" you tell them that not only do they have to provide the money themselves, they also have to provide the suitcase.

TRUE Inheritance is a technique which allows you to define methods and their implementations in one class (superclass), then to reuse those methods in another class (subclass) simply by using the extends keyword. The subclass then becomes a combination of what is in the superclass plus whatever is defined within itself. The article Pragmatic OOP by Ricki Sickenger says the following:

OOP is supposed to be a practical way to organize a program into hierarchies of objects where similar objects can inherit behavior from each other and override that behavior when necessary.

A problem with inheritance is that it can be used incorrectly. The above article contains the following:

A Car and a Train and a Truck can all inherit behavior from a Vehicle object, adding their subtle differences. A Firetruck can inherit from the Truck object, and so on. Wait.. and so on? The thing about inheritance is that is so easy to create massive trees of objects. But what OO-bigots won't tell you is that these trees will mess you up big time if you let them grow too deep, or grow for the wrong reasons.

One problem encountered with the overuse of inheritance is when the superclass contains a method which does not apply in a subclass. This problem led to the creation of the Liskov Substitution Principle. It also led to the idea of Favour Composition over Inheritance.

In Object Composition vs. Inheritance I found the following description:

Most designers overuse inheritance, resulting in large inheritance hierarchies that can become hard to deal with. Object composition is a different method of reusing functionality. Objects are composed to achieve more complex functionality. The disadvantage of object composition is that the behavior of the system may be harder to understand just by looking at the source code. A system using object composition may be very dynamic in nature so it may require running the system to get a deeper understanding of how the different objects cooperate.
[....]
However, inheritance is still necessary. You cannot always get all the necessary functionality by assembling existing components.

Interestingly enough the same article also contains this:

The disadvantage of class inheritance is that the subclass becomes dependent on the parent class implementation. This makes it harder to reuse the subclass, especially if part of the inherited implementation is no longer desirable. ... One way around this problem is to only inherit from abstract classes.

The way to avoid problems with inheritance is therefore to avoid deep hierarchies, and to inherit only from abstract classes wherever possible. As I only develop database applications my software never interacts with objects in the "real world", just objects in a database. Every object in a database "is-a" table, so I created an abstract table class to contain the methods that could be applied to any database table, and inherit from this class to create a separate concrete table class for each physical table in my database. My abstract table class is quite large as there is a lot of processing which could be done on each table, and as I have over 400 concrete classes this results in a large amount of code which is reused through inheritance.
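A shallow hierarchy of this kind, with one abstract table class and a concrete class per table, might be sketched as follows. The property names, the `buildSelect()` method and the two example tables are assumptions made for illustration only.

```php
<?php
// Hypothetical abstract table class: methods that apply to any database table.
abstract class AbstractTable
{
    protected string $tableName = '';
    protected array $fieldNames = [];

    // Defined once here, available in every concrete table class.
    public function buildSelect(): string
    {
        return 'SELECT ' . implode(', ', $this->fieldNames) . ' FROM ' . $this->tableName;
    }
}

// Each concrete class only supplies what is different: its own structure.
class Customer extends AbstractTable
{
    protected string $tableName = 'customer';
    protected array $fieldNames = ['customer_id', 'name'];
}

class Product extends AbstractTable
{
    protected string $tableName = 'product';
    protected array $fieldNames = ['product_id', 'description', 'price'];
}

$customerSql = (new Customer())->buildSelect();
$productSql  = (new Product())->buildSelect();
```

The hierarchy is only one level deep, and every subclass inherits from an abstract class, never from another concrete class.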

By having an abstract table class which is inherited by every concrete table class I can make use of the Template Method Pattern. This allows me to define standard sharable code in the abstract class while each subclass can override the variable/customisable methods to provide behaviour which is specific to that subclass.

OOP requires a totally different thought process

There are a surprising number of people who hold the opinion that object oriented programming is totally different from procedural programming, and that it requires a totally different way of thinking and a different way of writing code. There are even those who say that if you do not utilise OO concepts in the "proper" way then even though you may be using objects you are nothing more than a procedural programmer, where the term "procedural" is used as an insult. Take a look at the following articles:

The article Pragmatic OOP by Ricki Sickenger contains the following:

I have met programmers who believe that anywhere there is a conditional statement in OO code, there is cause to subclass, "because that is the OO way!". And they will defend it against any pragmatic reasoning. So anywhere you see an if/then/else or a switch statement, you should find a way to break the logic into separate objects to avoid the logic. The dogma here is that conditional statements complicate things and are not strictly OO, so they must be minimized and preferable erased.

In Are You Still Debugging? the author Yegor Bugayenko says the following:

Code is procedural when it is all about how the goal should be achieved instead of what the goal is.
A method is procedural if the name is centered around a verb, but OO if it is centered around a noun.

I disagree. Objects (entities) are nouns while methods (operations) are verbs.

In this post a person called Fasda said the following:

The key you need to focus is THE WAY YOU THINK A SOLUTION.
In procedural you thinks solutions as writing a recipe, step by step. First do that, then that, and continue with...
In OO you think solutions as "people asking favors to others" -> Objects and Messages only. Try to use some pure OO language to stop thinking on if, while, for and all those keywords so common in procedural.

The article Getters/Setters. Evil. Period. contains the following quote from David West:

Step one in the transformation of a successful procedural developer into a successful object developer is a lobotomy.

This tells me that object developers have half the brains of procedural developers.

I do not share any of these opinions.

In his article All evidence points to OOP being bullshit John Barker says the following:

Procedural programming languages are designed around the idea of enumerating the steps required to complete a task. OOP languages are the same in that they are imperative - they are still essentially about giving the computer a sequence of commands to execute. What OOP introduces are abstractions that attempt to improve code sharing and security. In many ways it is still essentially procedural code.

In his paper Encapsulation as a First Principle of Object-Oriented Design (PDF) the author Scott L. Bain wrote the following:

Object Orientation (OO) addresses as its primary concern those things which influence the rate of success for the developer or team of developers: how easy is it to understand and implement a design, how extensible (and understandable) an existing code set is, how much pain one has to go through to find and fix a bug, add a new feature, change an existing feature, and so forth. Beyond simple "buzzword compliance", most end users and stakeholders are not concerned with whether or not a system is designed in an OO language or using good OO techniques. They are concerned with the end result of the process - it is the development team that enjoys the direct benefits that come from using OO.

This should not surprise us, since OO is routed in those best-practice principles that arose from the wise dons of procedural programming. The three pillars of "good code", namely strong cohesion, loose coupling and the elimination of redundancies, were not discovered by the inventors of OO, but were rather inherited by them (no pun intended).

Cohesion is the degree to which the responsibilities of a single module/component form a meaningful unit. High cohesion is considered to be better than low cohesion.

Coupling is the degree of interaction between two modules. Whenever you have one module calling another you have coupling. Loose coupling is considered to be better than tight coupling.

Elimination of redundancies is aimed at removing code that you do not need and is now called the YAGNI principle.

Other best practices which evolved in procedural languages, but which are still relevant in the OO world, are the KISS and DRY principles.

This tells me several things:

This also tells me that OOP can be adequately supported in procedural languages (such as PHP and COBOL) which have had the necessary syntax added in to enable encapsulation, inheritance and polymorphism without replacing the original syntax with something which is more OO-like. For example, if a procedural language allows a statement such as $result = uppercase($string) it would seem to be overkill to replace it with $result = $string->uppercase(). The result is exactly the same, but a lot of effort has been expended just to do it differently.
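The comparison can be made concrete in PHP, where the built-in function is actually called `strtoupper()`. Since a PHP string is not an object, the "OO-like" form needs a wrapper class first. The `StringObject` class below is hypothetical and was written only for this comparison.

```php
<?php
// Procedural form: PHP's built-in function does the job directly.
$string = 'hello';
$result1 = strtoupper($string);

// "OO-like" form: StringObject is a hypothetical wrapper class,
// needed because a PHP string has no methods of its own.
class StringObject
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function uppercase(): string
    {
        return strtoupper($this->value);
    }
}

$result2 = (new StringObject($string))->uppercase();
```

Both forms produce the same value, and the wrapper ends up calling the procedural function anyway.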

When some people say that OO programming is completely different from procedural programming they are making a fundamental mistake. OO programming is exactly the same as procedural programming except for the addition of encapsulation, inheritance and polymorphism. The only difference between these two paradigms is that one supports encapsulation, inheritance and polymorphism while the other does not. Just as it is possible to produce spaghetti code (unstructured branching using GOTO) in a procedural language it is also possible to produce ravioli code (too many small classes) or lasagne code (too many layers) in an OO language. Using the features that the language provides will not guarantee "good" code, it is how you make use of those features which is the deciding factor.

OO theory is constantly being expanded to include more and more concepts, and these concepts are becoming more and more complicated. As languages are modified to include these add-on concepts newcomers to these languages become convinced that it is these add-ons which define what OO is. I totally disagree. OOP does not require the use of any of these optional extras, so it is wrong to say that a program is not OO simply because it does not use them. It would be like saying that a car is not a car unless it has climate control and satnav. Those are optional extras, not the distinguishing features, and not having them does not make your car not a car. It would also be incorrect to say that a car is a car because it has wheels. Having wheels does not make something a car - a pram has wheels, but that does not make it a car, so having wheels is not a distinguishing feature.

As has already been stated in What OOP is you need to be using a programming language that supports encapsulation, inheritance and polymorphism, and you need to use these features in a way that creates more reusable code. The more reusable code you have the less you have to write, which in turn means the less code you need to read and maintain.

For further thoughts on this matter please read What is the difference between Procedural and OO programming?


What types of object should I create?

In his article How to write testable code the author identifies three distinct categories of object:

  1. Value objects - an immutable object whose responsibility is mainly holding state but may have some behavior. Examples of Value Objects might be Color, Temperature, Price and Size.
  2. Entities - an object whose job is to hold state and associated behavior. Examples of this might be Account, Product or User.
  3. Services - an object which performs an operation. It encapsulates an activity but has no encapsulated state (that is, it is stateless). Examples of Services could include a parser, an authenticator, a validator or a transformer (such as transforming raw data into XML or HTML).

This is also discussed in When to inject: the distinction between newables and injectables.

The PHP language does not have value objects, so I shall ignore them.

It would be advisable to avoid the temptation to create Anemic Domain Models which contain data but no processing. This goes against the whole idea of OO which is to create objects which contain both data and processing.
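The difference between an entity and a service can be sketched as below. `Account` appears in the list of example entities above and a transformer appears in the list of example services, but the code itself, including the `withdraw()` rule and the `XmlTransformer` class, is a hypothetical illustration.

```php
<?php
// Hypothetical Entity: holds state together with the rules that act on that state.
class Account
{
    public array $data = ['balance' => 0];

    // A business rule lives in the same class as the data it protects.
    public function withdraw(int $amount): bool
    {
        if ($amount <= 0 || $amount > $this->data['balance']) {
            return false;
        }
        $this->data['balance'] -= $amount;
        return true;
    }
}

// Hypothetical Service: performs an operation but holds no state of its own.
class XmlTransformer
{
    public function transform(array $data): string
    {
        $xml = '<row>';
        foreach ($data as $name => $value) {
            $xml .= "<$name>" . htmlspecialchars((string) $value) . "</$name>";
        }
        return $xml . '</row>';
    }
}

$account = new Account();
$account->data['balance'] = 100;
$ok     = $account->withdraw(30);
$denied = $account->withdraw(500);
$xml    = (new XmlTransformer())->transform($account->data);
```

An anemic version of `Account` would keep the `$data` array but move `withdraw()` into some other class, leaving an object with data and no processing.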

My framework contains the following objects:

  1. Model - this is an entity. One of these is created for each entity in the application and holds all the business rules for that entity.
  2. View - this is a service, a reusable component which is provided by the framework.
  3. Controller - this is a service, a reusable component which is provided by the framework.
  4. Data Access Object - this is a service, a reusable component which is provided by the framework.

Note here that all application/domain knowledge is confined to the Models (the Business layer). There is absolutely no application knowledge in any of the services which are built into the framework. This means that the services (Controllers, Views and DAOs) are application-agnostic while the Models are framework-agnostic.

Every application, however small or large, will be comprised of a number of user transactions (sometimes known as Use Cases) which perform a specific unit of work for the user. In my framework each user transaction has its own Component script which does nothing but identify which combination of Model, View and Controller is required to carry out the relevant processing.
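The idea of a component script that merely names its parts can be sketched as follows. Everything here is an assumption made for illustration: the class names, the `listController()` function and the variable names are hypothetical, and in a real application each part would live in its own file so that the component script itself would be only the last few lines.

```php
<?php
// Hypothetical Model (an entity): the only place with application knowledge.
class ProductModel
{
    public function getData(): array
    {
        return [
            ['product_id' => 1, 'description' => 'Widget'],
            ['product_id' => 2, 'description' => 'Gadget'],
        ];
    }
}

// Hypothetical View (a service): renders whatever rows it is given.
class ListView
{
    public function render(array $rows): string
    {
        $lines = [];
        foreach ($rows as $row) {
            $lines[] = implode(' | ', $row);
        }
        return implode("\n", $lines);
    }
}

// Hypothetical Controller (a service): works with any Model and any View.
function listController(string $modelClass, string $viewClass): string
{
    $model = new $modelClass();
    $view  = new $viewClass();
    return $view->render($model->getData());
}

// The component script itself: it names the parts and hands over control.
$model_id = 'ProductModel';
$view_id  = 'ListView';
$output   = listController($model_id, $view_id);
```

Creating another use case for a different table would mean writing another small script with a different `$model_id`, while the Controller and View are reused unchanged.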


How many objects should I create?

This is a tricky question, but there are two extremes which you should avoid:

It is generally accepted that you should break your application down into areas of different logic where each area has a single responsibility, but what exactly is a "responsibility"? The confusion over the idea that "responsibility" should be treated as "reason for change" is discussed in I don't love the single responsibility principle in which Marco Cecconi says the following:

The purpose of classes is to organize code as to minimize complexity. Therefore, classes should be:
  1. small enough to lower coupling, but
  2. large enough to maximize cohesion.
By default, choose to group by functionality.

He also points out that an over-enthusiastic implementation of SRP can result in large numbers of anemic micro-classes that do little and complicate the organisation of the code base.

The simplest way that I have found to identify what classes you need is to identify the entities that will be of interest to the business/domain layer of your application. An entity is something which has both data and operations that act upon that data. An e-commerce application will therefore have CUSTOMERS, PRODUCTS and ORDERS. Note that the data for each of these entities may be spread across more than one database table, such as this collection for a Sales Order, so do you create a class for each table or just a single class for the entire entity? Any programmer with experience of writing database applications and using SQL will be able to tell you that you simply do not deal with that collection of tables as a single unit as each table has its own data structure, its own business rules and is subject to the same set of CRUD operations. In a database you have to deal with each table as a separate entity therefore it makes sense, to me at least, to create a separate class for each table.

In my own architecture, which is shown in Figure 1, I have the following component numbers:


What structure should I use?

In the early days of computing monolithic systems were the most common. These systems have the following characteristics:

A software system is called "monolithic" if it has a monolithic architecture, in which functionally distinguishable aspects (for example data input and output, data processing, error handling, and the user interface) are all interwoven, rather than containing architecturally separate components.

This also became known as the Single-Tier Architecture when the idea of splitting the code into layers or tiers gradually became popular. After using monolithic structures with the COBOL language I switched to UNIFACE which provided the following:

I liked the 3-Tier Architecture so much that I used it when I rebuilt my development framework in PHP.

OO aficionados should note that the 3-Tier Architecture conforms to the Single Responsibility Principle which was written by Robert C. Martin (Uncle Bob). Although in his original article he used the vague phrase "a class should only have one reason to change", in later articles he gave more usable descriptions:

This is the reason we do not put SQL in JSPs. This is the reason we do not generate HTML in the modules that compute results. This is the reason that business rules should not know the database schema. This is the reason we separate concerns.

In Test Induced Design Damage? he wrote the following:

How do you separate concerns? You separate behaviors that change at different times for different reasons. Things that change together you keep together. Things that change apart you keep apart.

GUIs change at a very different rate, and for very different reasons, than business rules. Database schemas change for very different reasons, and at very different rates than business rules. Keeping these concerns (GUI, business rules, database) separate is good design.

Martin Fowler also describes this separation into three layers in his article PresentationDomainDataLayering where he refers to the "Business" layer as the "Domain" layer. In his article AnemicDomainModel he says the following:

It's also worth emphasizing that putting behavior into the domain objects should not contradict the solid approach of using layering to separate domain logic from such things as persistence and presentation responsibilities. The logic that should be in a domain object is domain logic - validations, calculations, business rules - whatever you like to call it.

Note that each of these three layers is not restricted to having a single component. You may find it convenient to split the program logic into more specialised components. For example, in my Presentation layer I built a separate component to create all HTML output using XML and XSL transformations, and if you treat the Business layer as being the same as the Model, this also produced an implementation of the Model-View-Controller design pattern, as shown in Figure 1:

Figure 1 - The MVC and 3-Tier architectures combined

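To make the separation of layers more concrete, here is a minimal sketch under stated assumptions. It is not the code from my framework: the names InMemoryDataAccess, Product, renderProductList and addProductController are hypothetical, an array stands in for the database, and the View builds its HTML with plain string functions instead of XML and XSL transformations. The point is only that each layer has knowledge which the others do not.

```php
<?php
// Hypothetical sketch: all names are illustrative only.

// Data Access layer: the only component which knows how rows are stored.
final class InMemoryDataAccess
{
    private array $tables = [];

    public function insert(string $table, array $row): void
    {
        $this->tables[$table][] = $row;
    }

    public function select(string $table): array
    {
        return $this->tables[$table] ?? [];
    }
}

// Business layer (the Model): validation and business rules, no SQL, no HTML.
final class Product
{
    private InMemoryDataAccess $dao;

    public function __construct(InMemoryDataAccess $dao)
    {
        $this->dao = $dao;
    }

    // Returns a list of error messages; an empty list means the row was stored.
    public function add(array $data): array
    {
        $errors = [];
        if (trim((string) ($data['name'] ?? '')) === '') {
            $errors[] = 'name is required';
        }
        if (!is_numeric($data['price'] ?? null) || $data['price'] < 0) {
            $errors[] = 'price must be a non-negative number';
        }
        if ($errors === []) {
            $this->dao->insert('product', [
                'name'  => $data['name'],
                'price' => (float) $data['price'],
            ]);
        }
        return $errors;
    }

    public function all(): array
    {
        return $this->dao->select('product');
    }
}

// Presentation layer, View: turns raw data into HTML and nothing else.
function renderProductList(array $rows): string
{
    $html = "<ul>\n";
    foreach ($rows as $row) {
        $html .= sprintf("  <li>%s: %.2F</li>\n", htmlspecialchars($row['name']), $row['price']);
    }
    return $html . "</ul>\n";
}

// Presentation layer, Controller: passes the request to the Model,
// then passes the Model's data to the View.
function addProductController(Product $model, array $request): string
{
    $errors = $model->add($request);
    if ($errors !== []) {
        return '<p class="error">' . htmlspecialchars(implode('; ', $errors)) . "</p>\n";
    }
    return renderProductList($model->all());
}

// Usage
$model = new Product(new InMemoryDataAccess());
$okPage  = addProductController($model, ['name' => 'Tea & Cake', 'price' => '4.5']);
$badPage = addProductController($model, ['name' => '', 'price' => '-1']);
```

With this arrangement the storage mechanism could be replaced without touching Product, and the HTML could be replaced without touching either of the other two layers.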

How much reusability should I have?

The purpose of OOP is to use encapsulation, inheritance and polymorphism to produce code with a greater degree of reusability. Code which can be written once and reused multiple times does not have to be rewritten multiple times, and code which you don't have to write takes zero time to write. Taking less time to achieve a given objective results in lower costs, which means that the developer can become more cost-effective and more productive. The more reusability you have the less time you have to spend in maintenance as a module which is reused 100 times only has to be tested once, not 100 times.
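A small sketch may help to show where the reuse comes from. The names Entity, Customer, Invoice, describe and describeAll are hypothetical and are not taken from my framework. The describe method is written once and shared through inheritance, and the describeAll function is written once and works with any subclass through polymorphism, so neither has to be rewritten when a new entity is added.

```php
<?php
// Hypothetical sketch: all names are illustrative only.
abstract class Entity
{
    // What differs between entities is supplied by each subclass.
    abstract public function name(): string;
    abstract public function fields(): array;

    // Written once, inherited by every subclass.
    public function describe(array $row): string
    {
        $parts = [];
        foreach ($this->fields() as $field) {
            $parts[] = $field . '=' . ($row[$field] ?? 'NULL');
        }
        return $this->name() . '(' . implode(', ', $parts) . ')';
    }
}

final class Customer extends Entity
{
    public function name(): string
    {
        return 'customer';
    }

    public function fields(): array
    {
        return ['id', 'name'];
    }
}

final class Invoice extends Entity
{
    public function name(): string
    {
        return 'invoice';
    }

    public function fields(): array
    {
        return ['id', 'customer_id', 'total'];
    }
}

// Written once, usable with any Entity subclass, present or future.
function describeAll(Entity $entity, array $rows): array
{
    return array_map([$entity, 'describe'], $rows);
}

// Usage
$customers = describeAll(new Customer(), [['id' => 1, 'name' => 'Ann']]);
$invoices  = describeAll(new Invoice(), [['id' => 9, 'customer_id' => 1]]);
```

Adding a third entity in this sketch means writing only the two small methods which are unique to it; everything else is reused.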

If you are not sure how much reusability you have in your framework then ask yourself the following questions:

The volume of reusability in my framework is detailed in Levels of Reusability. If you cannot achieve similar levels of reusability in your framework then stop wasting your time in telling me that my methods are wrong. If my methods produce 10 times the reusability that you can, then this surely indicates that my methods are 10 times better than yours.


Conclusion

Many people use different words to describe what OOP is supposed to mean, but the problem with words is that they are slippery. As Humpty Dumpty proclaimed in Lewis Carroll's Through the Looking Glass:

When I use a word, it means just what I choose it to mean -- neither more nor less.

If you take the words used by the originators of OOP and apply different meanings to those words, then others take your words and apply different meanings to them, then you can end up with something which is nothing like the original, as immortalised in that children's game called Chinese Whispers.

There are only three features which really differentiate an Object Oriented language from a non-OO language, and these are Encapsulation, Inheritance and Polymorphism. Everything else is either bullshit or hype. Object Oriented Programming is therefore the use of these features in a programming language. High reusability and high maintainability cannot be guaranteed - that depends entirely on how these features are implemented.

Some people accuse me of having a view of OOP which is too simplistic, but if my view can be described as "more simple than it need be" then surely theirs can just as easily be described as "more complex than it need be"? As a long-time follower of the KISS principle, which has a more modern variant in Do The Simplest Thing That Could Possibly Work, I know which view I prefer, and I also know which view is easier to teach to others.


References

The following articles describe aspects of my framework:

The following articles express my heretical views on the topic of OOP:

These are reasons why I consider some ideas to be complete rubbish:

Here are my views on changes to the PHP language and Backwards Compatibility:

The following are responses to criticisms of my methods:

Here are some miscellaneous articles:


Amendment History

04 Feb 2023 Added What Abstraction is not
17 Dec 2021 Added Common implementation errors
09 May 2020 Added Other Terminology
20 Apr 2017 Added What types of object should I create?
20 Apr 2017 Added How many objects should I create?
20 Apr 2017 Added What structure should I use?
20 Apr 2017 Added How much reusability should I have?
10 Mar 2017 Added What Inheritance is not
10 Mar 2017 Added OOP requires a totally different thought process
29 Apr 2012 Added Popular misconceptions
10 Apr 2012 Added Optional Extras
17 Jun 2010 Amended What OOP is to include a reference to a definition provided by Bjarne Stroustrup.
