Abstraction is supposed to be an important part of OOP, but what exactly does it mean? What is it, and how is it implemented? This has confused me for a long time as wherever I look I seem to find a different definition, such as the following which I found by searching the internet:
- The process of removing or generalizing physical, spatial, or temporal details or attributes in the study of objects or systems to focus attention on details of greater importance; it is similar in nature to the process of generalization;
- the creation of abstract concept-objects by mirroring common features or attributes of various non-abstract [concrete] objects or systems of study - the result of the process of abstraction.
- the process of reorganizing common behavior from non-abstract [concrete] classes into "abstract classes" using inheritance to abstract over sub-classes as seen in the object-oriented C++ and Java programming languages.
Note that in the above I have inserted the word "[concrete]" to indicate where it is normally used instead of the term "non-abstract".
Each significant piece of functionality in a program should be implemented in just one place in the source code. Where similar functions are carried out by distinct pieces of code, it is generally beneficial to combine them into one by abstracting out the varying parts.
Computer scientists use abstraction to make models that can be used and re-used without having to re-write all the program code for each new application
Abstraction is the process of taking away or removing characteristics from something in order to reduce it to a set of essential characteristics.
Abstraction is the concept of wrapping up complex actions in simple verbs. Describe each thing you've abstracted clearly, and hide the complexity.
Abstraction is an extension of encapsulation. It is the process of selecting data from a larger pool to show only the relevant details to the object.
Abstraction is a technique of providing only the essential details to the user by hiding the unnecessary or irrelevant details of an entity. This helps in reducing the operational complexity at the user-end.
Abstraction of Data or Hiding of Information is called Abstraction! or in other words, what are those things that a user is concerned about.
Often, it's easier to reason and design a program when you can separate the interface of a class from its implementation, and focus on the interface. This is akin to treating a system as a "black box," where it's not important to understand the gory inner workings in order to reap the benefits of using it.
Abstraction is the process of showing only essential/necessary features of an entity/object to the outside world and hide the other irrelevant information.
Abstraction is a process of hiding the implementation details and showing only functionality to the user. It only shows essential things to the user and hides the internal details. Abstraction lets you focus on what the object does instead of how it does it.
Abstraction can be defined as hiding internal implementation and showing only the required features or set of services that are offered.
I found some more definitions in Abstraction, Encapsulation, and Information Hiding by Edward V. Berard of The Object Agency:
A view of a problem that extracts the essential information relevant to a particular purpose and ignores the remainder of the information.
The essence of abstraction is to extract essential properties while omitting inessential details.
Abstraction is a process whereby we identify the important aspects of a phenomenon and ignore its details.
Abstraction is generally defined as 'the process of formulating generalised concepts by extracting common qualities from specific examples.'
Abstraction is the selective examination of certain aspects of a problem. The goal of abstraction is to isolate those aspects that are important for some purpose and suppress those aspects that are unimportant.
The meaning [of abstraction] given by the Oxford English Dictionary (OED) closest to the meaning intended here is 'The act of separating in thought'. A better definition might be 'Representing the essential features of something without including background or inessential detail.'
[A] simplified description, or specification, of a system that emphasizes some of the system's details or properties while suppressing others. A good abstraction is one that emphasizes details that are significant to the reader or user and suppress details that are, at least for the moment, immaterial or diversionary.
An abstraction denotes the essential characteristics of an object that distinguish it from all other kinds of object and thus provide crisply defined conceptual boundaries, relative to the perspective of the viewer.
You can find even more misleading descriptions if you read What abstraction is not where you will see, when associated with computer programming, the term has been twisted to mean "separating what data can be hidden from that which should be visible" instead of "separating the abstract from the concrete".
So many different definitions, so many different descriptions, but they still fail to answer the basic question "How do I apply this concept called abstraction when designing a computer system, and what are the results?" To muddy the waters even more Edward V. Berard makes this observation:
One point of confusion regarding abstraction is its use as both a process and an entity. Abstraction, as a process, denotes the extracting of the essential details about an item, or a group of items, while ignoring the inessential details. Abstraction, as an entity, denotes a model, a view, or some other focused representation for an actual item. Abstraction is most often used as a complexity mastering technique. For example, we often hear people say such things as: "just give me the highlights" or "just the facts, please." What these people are asking for are abstractions.
We can have varying degrees of abstraction, although these "degrees" are more commonly referred to as "levels." As we move to higher levels of abstraction, we focus on the larger and more important pieces of information (using our chosen selection criteria). Another common observation is that as we move to higher levels of abstraction, we tend to concern ourselves with progressively smaller volumes of information, and fewer overall items. As we move to lower levels of abstraction, we reveal more detail, typically encounter more individual items, and increase the volume of information with which we must deal.
We also note that there are many different types of abstraction, e.g., functional abstraction, data abstraction, process abstraction, and even object abstraction.
How can novice programmers become masters of the art of abstraction if even the current set of so-called "masters" cannot describe it in a consistent and unambiguous manner?
In the above list of random definitions you will see the following:
Abstraction is a technique of providing only the essential details to the user by hiding the unnecessary or irrelevant details of an entity. This helps in reducing the operational complexity at the user-end.
Abstraction of Data or Hiding of Information is called Abstraction! or in other words, what are those things that a user is concerned about.
Abstraction is the process of showing only essential/necessary features of an entity/object to the outside world and hide the other irrelevant information.
Abstraction is a process of hiding the implementation details and showing only functionality to the user. It only shows essential things to the user and hides the internal details. Abstraction lets you focus on what the object does instead of how it does it.
Abstraction can be defined as hiding internal implementation and showing only the required features or set of services that are offered.
These are all saying that abstraction means data hiding, which is not the case. As explained below the process of abstraction has to do with the separation of the abstract from the concrete so that you can place common functionality in an abstract class which can then be inherited and shared in a number of concrete subclasses. The hiding of data is a totally separate, unconnected and unrelated concept.
You can perform an abstraction and the result will be and abstraction, meaning that it is both a verb/process and a noun/entity. So when authors write about "abstraction" which type do they mean? On top of that there are also different types of abstraction, which potentially leads to even more confusion. To muddy the waters even more the only reference in the programming language which includes the word "abstract" is to denote a type of class, one that cannot be instantiated into an object. So if there are different types of abstraction and different types of class, which type of abstraction produces which type of class? Confused? I know I was. Things started to become clearer when I came across the following statements in in a paper called Designing Reusable Classes which was published in 1988 by Ralph Johnson and Brian Foote. While this was published 35 years ago with just the Smalltalk language in mind, the basic concepts are still relevant in many of today's Object Oriented languages.
Introduction
The first section of the paper describes the attributes of object-oriented languages that promote reusable software. Data abstraction encourages modular systems that are easy to understand. Inheritance allows subclasses to share methods defined in superclasses, and permits programming-by-difference. Polymorphism makes it easier for a given component to work correctly in a wide range of new contexts. The combination of these features makes the design of object-oriented systems quite different from that of conventional systems.
Protocol
The specification of an object is given by its protocol, i.e. the set of messages that can be sent to it.
...
Objects with identical protocol are interchangeable. Thus, the interface between objects is defined by the protocols that they expect each other to understand. If several classes define the same protocol then objects in those classes are "plug compatible".
...
Standard protocols are given their power by polymorphism.
Inheritance
Most object-oriented programming languages have another feature that differentiates them from other data abstraction languages; class inheritance. Each class has a superclass from which it inherits operations and internal structure. A class can add to the operations it inherits or can redefine inherited operations. However, classes cannot delete inherited operations.
Class inheritance has a number of advantages. One is that it promotes code reuse, since code shared by several classes can be placed in their common superclass, and new classes can start off having code available by being given a superclass with that code. Class inheritance supports a style of programming called programming-by-difference, where the programmer defines a new class by picking a closely related class as its superclass and describing the differences between the old and new classes. Class inheritance also provides a way to organize and classify classes, since classes with the same superclass are usually closely related.
One of the important benefits of class inheritance is that it encourages the development of the standard protocols that were earlier described as making polymorphism so useful. All the subclasses of a particular class inherit its operations, so they all share its protocol. Thus, when a programmer uses programming-by-difference to rapidly build classes, a family of classes with a standard protocol results automatically. Thus, class inheritance not only supports software reuse by programming-by-difference, it also helps develop standard protocols.
Abstract Classes
Standard protocols are often represented by abstract classes [Goldberg & Robson 1983].
An abstract class never has instances, only its subclasses have instances. The roots of class hierarchies are usually abstract classes, while the leaf classes are never abstract. Abstract classes usually do not define any instance variables. However, they define methods in terms of a few undefined methods that must be implemented by the subclasses.
...
A class that is not abstract is concrete. In general, it is better to inherit from an abstract class than from a concrete class. A concrete class must provide a definition for its data representation, and some subclasses will need a different representation. Since an abstract class does not have to provide a data representation, future subclasses can use any representation without fear of conflicting with the one that they inherited.
After reading this I could eventually see the light at the end of the tunnel. Out of all the previous definitions of abstraction the only ones which were a close match were:
Thought of or stated without reference to a specific instance. Separated from matter, practice, or particular examples; not concrete.
The act of comparing commonality between distinct objects and organizing using those similarities; the act of generalizing characteristics; the product of said generalization.
So the aim of abstraction is to separate out the abstract from the concrete from a group of objects where the abstract identifies the similarities and the concrete identifies the differences. This concept, called programming-by-difference, means that you look at several entities which are of interest to your application and separate out the similarities from the differences. You look at the data for these entities as well as the operations that can be performed on their data. If the data representations (properties) are different but the protocols (methods) are the same then you can put the similarities in an abstract superclass and the differences in separate concrete subclasses. While each concrete class has its own data representation an abstract class does not. The abstract class may contain placeholders for data, but these placeholders are not populated until a concrete class is instantiated into an object and methods are called to insert data. Any shared protocols (ie: operations or methods) can be defined in the abstract class and may use the contents of these placeholders. Data can be inserted into an object either by being pushed from a calling object or pulled from a database.
While all the standard protocols/methods can be defined in the abstract class, how do you deal with any non-standard methods which are unique to particular subclasses? You implement the Template Method Pattern, of course.
It then became clear to me that the practices which I had adopted instinctively and intuitively when I began to develop my framework were completely in tune with the concept of programming-by-difference. These practices were as follows:
After having performed the process of data abstraction and producing a list of tables and classes for each entity in the business domain the next step is to look for similarities and differences in the operations that can be performed on those entities. I have already determined that I am not writing an application which communicates with objects in the real world, only the data which is held on those objects in the database, I am not interested in the operations which are available in those real world objects, only those which are available in the database. A product such as a ride-on lawn mower may have operations such as "switch engine on", "switch engine off", "start moving", "stop moving", "turn left", "turn right", "raise blades" and "lower blades", but these are completely irrelevant in a Sales Order Processing (SOP) system. A person/customer may have operations such as "stand", "sit", "walk", "run", "eat", "sleep" and "defecate", but these are completely irrelevant in a Sales Order Processing (SOP) system.
Regardless of the fact that entities such as products and customers in the real world are as different as chalk and cheese, a SOP system does not interact with those entities directly, it interacts with nothing but information about those entities, and that information is stored in a database as columns of data arranged into tables. Regardless of how many different tables I have, and how many different columns I have on each table and how different the data is in each of those columns, the only operations that can be performed on a database table are Create, Read, Update and Delete (CRUD). So just as I use the DDL language to define the structure of each domain object (table) I use the DML language to define the operations that can be performed on each of those objects.
As these four operations are common to every database table they can be moved to an abstract class from which they can be inherited, thus removing large amounts of boilerplate code that would otherwise be duplicated. In order to cater for the possibility that some concrete subclasses may require additional or non-standard processing then the use of an abstract superclass allows the Template Method Pattern to be employed so that any non-standard processing can be added to "hook" methods within each concrete subclass.
In the description of abstract classes by Johnson and Foote it says Abstract classes usually do not define any instance variables
. The term "usually" means to me that this is an option which may or may not be implemented at the developer's discretion. There is no rule that says Abstract classes must not define instance variables
. I have found that I can define placeholders for common pieces of data in the abstract class and fill these placeholders with actual data within each concrete subclass. These placeholders are as follows:
All these details are loaded into the object when it is instantiated from a separate <tablename>.dict.inc file which is exported from my Data Dictionary.
By using these simplified and less ambiguous definitions of abstraction I have created a large ERP application which contains hundreds of database tables each of which inherits its standard processing from a single abstract class. I have also created standard objects to perform the common services. This is a large amount of code which is reused in a large number of objects, and that is supposed to be a good thing, right?
This section is for objects which exist in the Business/Domain layer. There is a different approach for objects which exist in the Presentation and Data Access layers.
Before you can start creating classes you have to identify those objects/entities which will be relevant to your application, then you can create classes for those entities. This is where a lot of clueless newbies make their first mistake. They have read in the literature that one of the selling points of OO is that you can model the real world!
However, just because you can does not mean that you should. It is obvious to every programmer who has experience with database applications that the software does *NOT* communicate with objects in the real world, it only ever communicates with objects in a database, and those objects are called tables.
The next thing to consider is that with a database application the most important part is the database. Some developers are taught to start with the software design using the rules of Object Oriented Analysis and Design (OOAD), Domain Driven Design (DDD) and the SOLID principles, and then try to create the database to match this design. This always produces a condition known as Object-Relational Impedance Mismatch for which the usual answer is the creation of that abomination known as an Object Relational Mapper (ORM). If you create a software design that cannot be supported in the database then you should throw away that design and start again, which means designing the database according to the rules of Data Normalisation, then designing software objects which match the database objects. This matches what Eric S. Raymond wrote in his book The Cathedral and the Bazaar when he said:
Smart data structures and dumb code works a lot better than the other way around.
This means that you design your database first, then build the software to match this design. Anybody who designs the software first, then tries to build the database to match this design is more likely to produce something that resembles a Heath Robinson contraption than a cost-effective computer system. The purpose of an enterprise application is put data into and get data out of a database by having a User Interface (UI) and the front end, a database at the back end, and software in the middle to move the data between the two ends and to process any business rules. Once the front-end screens and reports have been designed, and the back-end database has been designed, the software in the middle, even down to the choice of language to be used, is nothing more than an implementation detail.
The next thing to consider is granularity (level of detail) which can only be explained with a real-world example. Supposing an organisation wants an application to report product sales, and their product catalog contains numerous product lines ranging from toothbrushes, toasters, clock radios, food blenders and ride-on lawn mowers. A novice might think that each of those products is so different that it requires a separate class, but you would be wrong. No competent database designer would ever create a separate table for each of those products, so why should they each have their own class? If you ignore those differences which are not relevant to the application you should end up with nothing but similarities in the way that the essential data for each of those products is to be stored in the database. These similarities are usually limited to data items such as:
The similarities are the names of the pieces of data which are essential to the application and therefore will need to be stored in the database. This list of data items will be used to define the structure of each table in the form of a DDL script.
The differences are the values which will be stored in that table structure for each different product, with a different row in the table for each product. If your organisation has 100 products available for sale, then there will be 100 rows in the PRODUCT table.
Note that there may be more items depending on the precise requirements of your particular application. All these properties can be recorded as columns in either a single PRODUCT table, but sometimes in a group of related tables. There is never a need to have a separate table for each type of product, so there is never a need to have a separate class for each different type of product.
The result of this process is the definition of a PRODUCT table for which I will then create a PRODUCT class which identifies the structure of that table. Why do I have a separate class for each table? Because it matches the definition of a class:
A class is a blueprint, or prototype, that defines the variables and the methods common to all objects (entities) of a certain kind. A class represents a common abstraction of a set of entities, suppressing their differences.
The DDL script is the "blueprint" for each row in that table, so I use the same blueprint to create a class which will be used to manipulate that data.
It does not matter how many different and diverse products the organisation may deal with, or how different and diverse are the properties and operations which may be performed on or by those physical products, within the application they will be nothing more than rows in a database table.
I then repeat this process for every entity to create more database tables and more table classes.
This process called "data abstraction" is not something which is exclusive to OOP, it is a natural function in the designing of a database irrespective of the programming language that will be used to access it. I was designing and building database applications for several decades before I switched to using an OO language, so even though the code that I wrote in COBOL and UNIFACE, and now PHP, was completely different, the method of designing the database and following the rules of Data Normalisation was always the same.
OO developers who have experience of modular programming should be aware that there are basically two types of object:
Unlike an entity which can have numerous methods to load, modify and interrogate its data (state), with a service the data is loaded and processed in a single operation without being stored for later interrogation or manipulation.
The components in the RADICORE framework fall into the following categories:
Notice here that transforming an entity's data into HTML, CSV or PDF is not a function that is carried out within the entity itself. Mixing presentation logic with business logic and SQL logic is frowned upon in modern applications as it produces a tangled mess that is difficult to maintain. In my long career I have personally dealt with monolithic single-tier applications, then 2-tier applications, finally ending up with the 3-Tier Architecture which is an implementation of the Single Responsibility Principle (SRP). I loved this architecture so much that I made it the starting point when I redeveloped my framework in PHP. By splitting my Presentation layer into two separate components, a Controller and a View, I also accidentally created an implementation of the Model-View-Controller (MVC) design pattern.
When an object such as a service performs a single operation on a set of application data there is little scope for a data abstraction unless you fall into the trap of treating the data for each entity as being so different that you have to create a different version of that service for each entity. As soon as I started programming with PHP I recognised that this was not the case.
By deliberately designing the entities in the Business/Domain layer so that their data can be both input and output in a single array instead of being forced to use separate getters and setters for each column it then became much easier to design a single service for each operation that can work with any data rather than having a separate version of that service that can only work with the data for a particular entity.
In the section on Inheritance vs. decomposition the article states the following:
Since inheritance is so powerful, it is often overused. Frequently a class is made a subclass of another when it should have had an instance variable of that class as a component. For example, some object-oriented user-interface systems make windows be a subclass of Rectangle, since they are rectangular in shape. However, it makes more sense to make the rectangle be an instance variable of the window. Windows are not necessarily rectangular, rectangles are better thought of as geometric values whose state cannot be changed, and operations like moving make more sense on a window than on a rectangle.
Behavior can be easier to reuse as a component than by inheriting it. There are at least two good examples of this in Smalltalk-80. The first is that a parser inherits the behavior of the lexical analyzer instead of having it as a component. This caused problems when we wanted to place a filter between the lexical analyzer and the parser without changing the standard compiler. The second example is that scrolling is an inherited characteristic, so it is difficult to convert a class with vertical scrolling into one with no scrolling or with both horizontal and vertical scrolling. While multiple inheritance might solve this problem, it has problems of its own. Moreover, this problem is easy to solve by making scrollbars be components of objects that need to be scrolled.
Most object-oriented applications have many kinds of hierarchies. In addition to class inheritance hierarchies, they usually have instance hierarchies made up of regular objects. For example, a user-interface in Smalltalk consists of a tree of views, with each subview being a child of its superview. Each component is an instance of a subclass of View, but the root of the tree of views is an instance of StandardSystemView. As another example, the Smalltalk compiler produces parse trees that are hierarchies of parse nodes. Although each node is an instance of a subclass of ParseNode, the root of the parse tree is an instance of MethodNode, which is a particular subclass. Thus, while View and ParseNode are the abstract classes at the top of the class hierarchy, the objects at the top of the instance hierarchy are instances of StandardSystemView and MethodNode.
This distinction seems to confuse many new Smalltalk programmers. There is often a phase when a student tries to make the class of the node at the top of the instance hierarchy be at the top of the class hierarchy. Once the disease is diagnosed, it can be easily cured by explaining the differences between the instance and class hierarchies.
The first statement Since inheritance is so powerful, it is often overused.
tells me that some people implement an idea indiscriminately instead of intelligently. They do not understand when the use of an idea is appropriate and when it is not. I'm afraid there is no cure for this disease. You either have the ability to think, or you don't. Those people whose thought processes are sub-optimal will end up as being nothing more than Cargo Cult programmers.
The second statement Behavior can be easier to reuse as a component than by inheriting it
does not mean that it is impossible to reuse behaviour through inheritance, just more difficult, and sometimes it is beyond the capabilities of some programmers. After having built hundreds of database programs in several languages I had noticed that while some of the code which I wrote was unique there was some which was similar and what became to be known as boilerplate code. The common practice in my COBOL days was simply to churn out code whether it was unique or similar, but this ended up with chunks of similar code which was duplicated in many places. As time went by I learned how to convert this "similar" code into "identical" code which could then be defined once in a central library and then called many times without being duplicated. The OO capabilities of PHP allowed me to go one giant step further and combine the use of an abstract table class and the Template Method Pattern so that the sharable code could be defined in the abstract class as invariant methods while the unique code could be defined in "hook" methods within each concrete table class.
The remainder of the section in the above article is totally irrelevant when it comes to programming with PHP. It is talking about using a compiled language which is communicating with a bit-mapped display in which a copy of the GUI is held in memory, and changes to any part of this memory would result in a corresponding change in the visible display. PHP does not use a bit-mapped display, it constructs an HTML document which is sent to the client's web browser after which the PHP script dies. There is no further interaction with the web page until the browser sends another request which could either be a GET or a POST. A web page is not an object which is comprised of other objects which can be read from or written to in isolation, so it has no instance hierarchies. An HTML document is just a huge string of text containing values which are enclosed in HTML tags. In order to change the display a fresh copy of the entire HTML document has to be constructed and returned to the client's browser. I don't have to waste time developing hierarchies of classes to deal with the different parts of a web page as every page can be built using a single View object which has three simple steps:
Regardless of the structure of an HTML document, whenever a SUBMIT button is pressed the data areas in that form are presented to the PHP script on the server in the form of the $_POST array. This contains nothing but field names and their values, totally ignoring any HTML tags and any other irrelevant details.
Instance hierarchies have no place in a database application for the simple reason that a database does not have hierarchies of objects. There is no such thing as a table being a container for other tables. There may be logical hierarchies of tables, as identified by foreign keys, but it is up to the software to handle these relationships in a user-friendly way. Each table is an independent object which can be addressed directly without the necessity of going through another table. While an ERD diagram may show several tables in what appears to be a hierarchy, they do not constitute a composite object in the OO sense so should not be developed as a composite object in the software. Relationships between tables have no effect on the way that tables are accessed, they are accessed using the same CRUD operations whether or not they are related to other tables.
In the introduction of Designing Reusable Classes it states the following:
Object-oriented programming is often touted as promoting software reuse [Fischer 1987]. Languages like Smalltalk are claimed to reduce not only development time but also the cost of maintenance, simplifying the creation of new systems and of new versions of old systems. This is true, but object-oriented programming is not a panacea. Program components must be designed for reusability. There is a set of design techniques that makes object-oriented software more reusable. Many of these techniques are widely used within the object-oriented programming community, but few of them have ever been written down. This article describes and organizes these techniques. It uses Smalltalk vocabulary, but most of what it says applies to other object-oriented languages. It concentrates on single inheritance and says little about multiple inheritance.
This makes it clear that simply writing programs that use classes and objects is no guarantee that you will be automatically creating software that is more reusable and will require less maintenance. It is how you design your classes to take advantage of encapsulation, inheritance and polymorphism which counts. The more reusability you produce the better.
In the section on abstract classes in the same article it says:
Creating new abstract classes is very important, but is not easy. It is always easier to reuse a nicely packaged abstraction than to invent it. However, the process of programming in Smalltalk makes it easier to discover the important abstractions. A Smalltalk programmer always tries to create new classes by making them be subclasses of existing ones, since this is less work than creating a class from scratch. This often results in a class hierarchy whose top-most class is concrete. The top of a large class hierarchy should almost always be an abstract class, so the experienced programmer will then try to reorganize the class hierarchy and find the abstract class hidden in the concrete class. The result will be a new abstract class that can be reused many times in the future.
This quite clearly says that creating a class hierarchy whose top-most class is concrete is bad, but large numbers of programmers are still doing it. Why? Because that is the way they are taught to do it. This can create problems, but instead of using inheritance correctly they came up with a new principle called favour composition over inheritance. It also leads to such statements as inheritance breaks encapsulation and Inheritance produces tight coupling. I ignore all these principles simply because I don't have the problems created by having deep class hierarchies whose top-most class is concrete. I avoid such problems altogether by only ever inheriting from an abstract class. Avoiding a problem altogether is always much better than trying to deal with the consequences of that problem. As the old saying goes: Prevention is better than cure.
The article goes on to say:
We have already seen that object-oriented programming languages encourage software reuse in a number of ways. Class definitions provide modularity and information hiding. Late-binding of procedure calls means that objects require less information about each other, so objects need only to have the right protocol. A polymorphic procedure is easier to reuse than one that is not polymorphic, because it will work with a wider range of arguments. Class inheritance permits a class to be reused in a modified form by making subclasses from it. Class inheritance also helps form the families of standard protocols that are so important for reuse.
These features are also useful during maintenance. Modularity makes it easier to understand the effect of changes to a program. Polymorphism reduces the number of procedures, and thus the size of the program that has to be understood by the maintainer. Class inheritance permits a new version of a program to be built without affecting the old.
Here the article states that creating useful abstractions is a rare skill among programmers.
The most important attitude is the importance given to the creation of reusable abstractions. Kent Beck describes the difficulty in finding reusable abstractions and the importance placed on them by saying:
Even our researchers who use Smalltalk every day do not often come up with generally useful abstractions from the code they use to solve problems. Useful abstractions are usually created by programmers with an obsession for simplicity, who are willing to rewrite code several times to produce easy-to-understand and easy-to-specialize classes.
Later he states:
Decomposing problems and procedures is recognized as a difficult problem, and elaborate methodologies have been developed to help programmers in this process. Programmers who can go a step further and make their procedural solutions to a particular problem into a generic library are rare and valuable. [O' Shea et. al. 1986]
Here the article states that useful abstractions are discovered after writing code, not invented before writing code.
The sixth section of this article describes design rules. These rules are based on the fact that useful abstractions are usually designed from the bottom up, i.e. they are discovered, not invented. We create new general components by solving specific problems, and then recognizing that our solutions have potentially broader applicability. The design rules in this paper are a way of converting specific solutions into reusable abstractions, not a way of deducing abstractions from first principles.
This is precisely how I did it in my own application. I did not start with an abstract class and work my way down to a concrete class, I started by building a Model, View and Controller to handle Table#1 where the Model did not inherit anything. I then copied these three modules to deal with Table#2 which involved changing all the references for Table#1 to Table#2. I then went through the classes line by line and moved all the code which was duplicated into an abstract class. When I was finished the Model classes ended up with noting but their constructors. You can read the full details in Evolution of the RADICORE framework.
The idea of programming-by-difference can also be used to create reusable code in places other than abstract classes from which each Model can be built. All that it requires is the ability to spot some similarities in a group of objects and to find a way to provide those similarities with code that can be shared instead of having to be duplicated.
After having built many screens in my previous languages I had already come to notice that many had the same basic structure or layout, sometimes the same behaviour, but with the only difference being with the content. Some of this content could be supplied by the framework and some could be supplied by each application component. In my previous languages each screen had to be built individually so that it could be compiled, but this restriction did not exist with PHP as no screens are ever compiled, they are sent to the browser as complete HTML documents which are text files containing values surrounded by HTML tags with a bit of CSS thrown in to set the style. Although all the early PHP books and online tutorials which I read showed the HTML document being output in little chunks in different parts of the code I had already dismissed this idea as being far too primitive for my needs. Instead I wanted to create each web page from a template, which meant that I needed to make use of a templating engine. Fortunately I had encountered XML documents and XSL stylesheets in my previous language, so I knew that these would work, and after having confirmed that PHP contained the necessary extensions I made XSL Transformations the standard templating engine in my RADICORE framework. This is how I managed to build a single View object which can be used to create any HTML document for any applications built using my framework.
Here is an example of one of my earliest stylesheets which is described in Using PHP 4's Sablotron extension to perform XSL Transformations:
<?xml version='1.0'?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:output method='html'/> <!-- param values may be changed during the XSL Transformation --> <xsl:param name="title">List PERSON</xsl:param> <xsl:param name="script">person_list.php</xsl:param> <xsl:param name="numrows">0</xsl:param> <xsl:param name="curpage">1</xsl:param> <xsl:param name="lastpage">1</xsl:param> <xsl:param name="script_time">0.2744</xsl:param> <!-- include common templates --> <xsl:include href="std.pagination.xsl"/> <xsl:include href="std.actionbar.xsl"/> <xsl:template match="/"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <title><xsl:value-of select="$title"/></title> <style type="text/css"> <![CDATA[ <!-- caption { font-weight: bold; } th { background: #cceeff; } tr.odd { background: #eeeeee; } tr.even { background: #dddddd; } .center { text-align: center; } --> ]]> </style> </head> <body> <form method="post" action="{$script}"> <div class="center"> <table border="0"> <caption><xsl:value-of select="$title"/></caption> <thead> <tr> <th>Select</th> <th>Id</th> <th>First Name</th> <th>Last Name</th> <th>Star Sign</th> <th>Person Type</th> </tr> </thead> <tbody> <xsl:apply-templates select="//person" /> </tbody> </table> <!-- insert the page navigation links --> <xsl:call-template name="pagination" /> <!-- create standard action buttons --> <xsl:call-template name="actbar"/> </div> </form> </body> </html> </xsl:template> <xsl:template match="person"> <tr> <xsl:attribute name="class"> <xsl:choose> <xsl:when test="position()mod 2">odd</xsl:when> <xsl:otherwise>even</xsl:otherwise> </xsl:choose> </xsl:attribute> <td><xsl:value-of select="selectbox"/></td> <td><xsl:value-of select="person_id"/></td> <td><xsl:value-of select="first_name"/></td> <td><xsl:value-of select="last_name"/></td> <td><xsl:value-of select="star_sign"/></td> <td><xsl:value-of select="pers_type_desc"/></td> </tr> </xsl:template> </xsl:stylesheet>
Here I am using templates called pagination
and actbar
which are obtained from external files which are loaded using the <xsl:include>
command. These are the equivalent calling subroutines from an external library. The <xsl:apply-templates>
command will then iterate over every person
element and process the matching template which is hard-coded at the bottom of that stylesheet. This method meant that I had to create a separate XSL stylesheet for each screen as both the table names and the columns were hard-coded.
The previous example was for a LIST screen where all the columns are display-only, but for ADD screens or UPDATE screens each field/column must be specified using the correct HTML control, as in the following example:
<tr> <td class="label">First Name</td> <td> <input type="text" name="first_name" size="//person/first_name/@size"> <xsl:attribute name="value"> <xsl:value-of select="//person/first_name"/> </xsl:attribute> </input> </td> </tr>
Note that the code required for other controls, such as dropdown lists and radio groups, can be more complex.
My next step was to move the code for each HTML control into its own template, as in the following:
<tr> <td class="label">First Name</td> <td> <xsl:call-template name="textbox"> <xsl:with-param name="field" select="//person/first_name"/> </xsl:call-template> </td> </tr>
Here I am still hard-coding which control goes with which field, but what if I wanted to change that choice in my PHP code? I decided to specify the desired control in the XML document as an attribute called control
and create a new template called datafield
which would call the relevant template:
<tr> <td class="label">First Name</td> <td> <xsl:call-template name="datafield"> <xsl:with-param name="field" select="//person/first_name"/> </xsl:call-template> </td> </tr>
This is the datafield
template:
<xsl:template name="datafield"> <xsl:param name="field"/> <xsl:choose> <xsl:when test="$field/@control='dropdown'"> <xsl:call-template name="dropdown"> <xsl:with-param name="field" select="$field"/> </xsl:call-template> </xsl:when> <xsl:when test="$field/@control='radiogroup'"> <xsl:call-template name="radiogroup"> <xsl:with-param name="field" select="$field"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <!-- this is the default control type --> <xsl:call-template name="textbox"> <xsl:with-param name="field" select="$field"/> </xsl:call-template> </xsl:otherwise> </xsl:choose> </xsl:template>
This method still forced me to have a separate XSL stylesheet for each screen as I needed to specify which fields needed to be extracted from the XML document and placed where in the screen. This had been reduced to a simple list which basically said "place column X in the next cell of the current row in the screen", so I asked myself the question "Can I define this list in the XML document and process it in the XSL stylesheet?" I started by creating a new element called structure
in the XML document which looked like the following:
<structure> <main id="person"> <row> <cell label="Select"/> <cell field="selectbox"/> </row> <row> <cell label="Id"/> <cell field="person_id"/> </row> <row> <cell label="First Name"/> <cell field="first_name"/> </row> <row> <cell label="Last name"/> <cell field="last_name"/> </row> <row> <cell label="Star Sign"/> <cell field="star_sign"/> </row> <row> <cell label="Person Type"/> <cell field="pers_type_desc"/> </row> </structure>
I then played with the code in my XSL stylesheet to process this new element (see std.detail1.xsl for details). In order to populate the structure
element in the XML document I made use of a small screen structure file which specifies which piece of application data goes where on the screen, as shown in the following:
<?php $structure['xsl_file'] = 'std.list1.xsl'; $structure['tables']['main'] = 'person'; $structure['main']['columns'][] = array('width' => 5); $structure['main']['columns'][] = array('width' => 70); $structure['main']['columns'][] = array('width' => 100); $structure['main']['columns'][] = array('width' => 100); $structure['main']['columns'][] = array('width' => 100); $structure['main']['columns'][] = array('width' => '*'); $structure['main']['fields'][] = array('selectbox' => 'Select'); $structure['main']['fields'][] = array('person_id' => 'Id'); $structure['main']['fields'][] = array('first_name' => 'First Name'); $structure['main']['fields'][] = array('last_name' => 'Last Name'); $structure['main']['fields'][] = array('star_sign' => 'Star Sign'); $structure['main']['fields'][] = array('pers_type_desc' => 'Person Type'); ?>
This file is read into memory at the start of the script, and is copied into the XML document just before the script finishes. This allows the in-memory version to be modified at runtime.
In this way I have separated the similarities from the differences, the what data needs to be processed from the how it needs to be processed. The what is contained within the XML document which is freshly built when each task is run, while the how is defined within a small library of just 12 (twelve) Reusable XSL stylesheets. I have used these 12 stylesheets in my main ERP application to produce the web pages for over 4,000 (four thousand) different tasks, and if that does not qualify as the height of reusability then I'll eat my hat.
Note that my ability to create a single View component which can extract the data from any Model and transform it into XML and then HTML was greatly enhanced by the fact that no Model contains separate named properties for each database column. Instead they all use a ubiquitous $fieldarray property which can hold any data from any table or even several tables. All this data can be extracted using the standard getFieldArray() method. This is an example of loose coupling which is considered to be "good". If each column of data had its own named property then I would need separate calling components for each Model which contained hard-coded references to these named properties, and each of these components would be tightly coupled to a single Model, which is considered to be "bad".
Every modern programmer should be familiar with the term use case, but in my earlier COBOL days they were known as user transactions or units of work, but for the last 20 years I prefer to use the name task as it is short and to the point.
While some OO methodologies teach that each task should have its own method in the domain/business layer I was totally unaware of this idea, so I chose a totally different approach which has turned out to provide enormous benefits. As soon as I started programming with objects I noticed that OOP was 2 Tier by nature - after writing a class for an object in the business/domain layer to encapsulate its own collection of properties and methods it was also necessary to have an additional object in the presentation/UI layer in order to instantiate that class into an object and then call whatever methods were necessary to satisfy the needs of a particular task. In my framework the object in the business/domain layer is known as the Model, while the object in the presentation/UI layer is known as the Controller. This means that each task is actually comprised of two separate components - a Controller which calls a specific set of methods in a specific sequence on a Model, and a Model which contains its own implementation of those methods. The same Model can be forced to produce different outcomes simply by combining it with a different Controller.
My previous experience with database applications had also taught me that each task, from a starting point, performs one or more operations on one or more tables, and regardless of what data a table holds it is always subject to the same operations which are Create, Read, Update and Delete (CRUD). Code to handle the unique business rules is handled separately. The most common set of maintenance tasks for a database table is this family of forms where each task performs a different combination of these operations:
Note that each of the above methods is just a wrapper for a group of methods which are defined in the abstract table class. Some of these methods are invariant/fixed while others are "hook" methods which can be defined in each concrete subclass.
When creating Controllers the question you should ask is where do I start? Do I start with the Model and then add in the operations? Or do I start with the operations and add in the Model? It is only after creating several sets of Controllers for different Models that you can really answer this question. Supposing that you create 10 sets of forms for 10 different Models - what are the similarities and what are the differences? The similarities are that each Controller performs the same set of operations regardless of the Model, and the differences are that each set performs it operations on a different Model.
The OO capabilities of PHP provided me with the ability to encapsulate the similarities into a series of reusable Controllers plus the ability to supply the identity of the Model at runtime. All the operations are available in every Model because they are inherited from the same abstract table class, which means that they can be called in a polymorphic manner. You make use of polymorphism by calling the known operations on an unknown object where the identity of that object is not provided until runtime using a mechanism known as Dependency Injection. In my framework this is achieved with the use of a simple component script which says "Run a task using this Model, this View and this Controller". In this way a Controller can be used with any Model, and a Model can be used with any Controller.
Because the behaviour of each reusable Controller is fixed I found it necessary to document this behaviour in Transaction Patterns for Web Applications. Building a new task, or a family of tasks, then became a series of steps which I performed manually. Because these steps were always predictable I eventually decided to automate it. I had already created a Data Dictionary to automate the creation of both the table class file and the table structure file, so it was relatively straightforward to add in a procedure to match a pattern to a table, press a button, and have it create the necessary scripts and perform the necessary database updates.
Over time I have created new Controllers to deal with more complex scenarios, especially those which deal with relationships (associations) between two tables. In some cases this has meant adding new methods to the abstract class, both invariant and variable, but these have always been in addition to the existing methods so that they would continue to work as they always did.
By making my Controllers loosely coupled to any Model instead of tightly coupled to a particular Model, and by tying them to a particular XSL stylesheet which defines the screen's structure, I have been able to create a library of Transaction Patterns. Unlike design patterns where you have to manually create your own implementation each time these allow you link a pattern with a database table and by pressing a button you can generate the code for a task (or in some cases a family of tasks) which you can run immediately without having to write a single line of code - no PHP, no HTML, no SQL. While the generated task can only handle standard validation, the developer can implement any complex business rules by inserting code into the relevant "hook" method in the table's subclass.
The article has this to say about frameworks:
One of the most important kinds of reuse is reuse of designs. A collection of abstract classes can be used to express an abstract design. The design of a program is usually described in terms of the program's components and the way they interact.
An object-oriented abstract design, also called a framework, consists of an abstract class for each major component. The interfaces between the components of the design are defined in terms of sets of messages. There will usually be a library of subclasses that can be used as components in the design.
Here I disagree slightly. In my framework the major components are Models, Views, Controllers and Data Access Objects, but I only have an abstract class for the Model components as these are the only components that are generated by the developer. All the others are pre-written objects which are supplied in the framework.
Frameworks are more than well written class libraries.
...
A framework, on the other hand, is an abstract design for a particular kind of application, and usually consists of a number of classes. These classes can be taken from a class library, or can be application-specific.
...
Frameworks provide a way of reusing code that is resistant to more conventional reuse attempts. Application independent components can be reused rather easily, but reusing the edifice that ties the components together is usually possible only by copying and editing it. Unlike skeleton programs, which is the conventional approach to reusing this kind of code, frameworks make it easy to ensure the consistency of all components under changing requirements.
Since frameworks provide for reuse at the largest granularity, it is no surprise that a good framework is more difficult to design than a good abstract class. Frameworks tend to be application specific, to interlock with other frameworks by sharing abstract classes, and to contain some abstract classes that are specialized for the framework. Designing a framework requires a great deal of experience and experimentation, just like designing its component abstract classes.
In the case of RADICORE the particular kind of application
is that of a web-based database application. While some people consider that applications such as Invoicing and Inventory cover separate business domains and therefore require separate designs, I do not. It does not matter that each "application domain" requires a totally different set of database tables, totally different business rules and totally different tasks (user transactions, use cases or units of work), as each of those is handled in exactly the same way. RADICORE is a system for creating and then running database applications which itself is comprised of 4 subsystems - Menu, Audit, Workflow and Data Dictionary. Applications such as Order Processing, Invoicing, Shipments and Inventory are nothing more than additional subsystems which can be added in at random intervals.
White-box vs. Black-box Frameworks
One important characteristic of a framework is that the methods defined by the user to tailor the framework will often be called from within the framework itself, rather than from the user's application code. The framework often plays the role of the main program in coordinating and sequencing application activity. This inversion of control gives frameworks the power to serve as extensible skeletons. The methods supplied by the user tailor the generic algorithms defined in the framework for a particular application.
A framework's application specific behavior is usually defined by adding methods to subclasses of one or more of its classes. Each method added to a subclass must abide by the internal conventions of its superclasses. We call these white-box frameworks because their implementation must be understood to use them.
What is being described here is the Template Method Pattern. My abstract table class is full of template methods which means that every concrete table class, which is a subclass of this abstract class, shares the same methods. It does not matter that the data held in each table is totally different as the only operations that can be performed on a table are always the same - Create, Read, Update and Delete (CRUD). Every Controller communicates with its Model(s) using one or more of these template methods. The invariant methods in the abstract class are always executed, but the empty variable "hook" methods may be overridden in any concrete subclass.
The major problem with such a framework is that every application requires the creation of many new subclasses. While most of these new subclasses are simple, their number can make it difficult for a new programmer to learn the design of an application well enough to change it.
Not with RADICORE it doesn't. You only need to create one concrete table class for each table in your database. All the other components - abstract table class, Views, Controllers and Data Access Objects - come supplied with the framework.
A second problem is that a white-box framework can be difficult to learn to use, since learning to use it is the same as learning how it is constructed.
There is a learning curve with every framework, but if all you are going to do is write and then maintain database applications then you should treat any learning curve as an investment that will pay off over time.
Another way to customize a framework is to supply it with a set of components that provide the application specific behavior. Each of these components will be required to understand a particular protocol. All or most of the components might be provided by a component library. The interface between components can be defined by protocol, so the user needs to understand only the external interface of the components. Thus, this kind of a framework is called a black-box framework.
RADICORE is a white-box framework for building and then running web-based database applications, which means that the Presentation layer does nothing but deal with the sending a receiving of HTML forms while the Data Access layer deals with nothing but the reading and writing of data within a database. These two layers are not affected by what data is passed between them, so they can be built as standard and reusable components. It is only the components in the Business layer which need be created and maintained by the developer. While all standard behaviour is supplied by the invariant methods within the abstract class, any custom behaviour can be supplied by customisable/variable methods within each table's subclass.
The idea with RADICORE is that you should never need to customise the framework. You build a new subsystem for each new application domain and then run it. Everything is taken care of by the framework except the business rules which the developer deals with by inserting code into the relevant "hook" methods in each table's subclass.
Before I started programming with PHP I had heard so many people singing the praises of OOP and claiming that it was the best thing since sliced bread, so I started looking for answers to the questions "what is OOP and what makes it so special?" After searching the interweb thingy for a while I found the following statements:
OOP involves writing programs which are oriented around objects. Such programs can take advantage of Encapsulation, Inheritance and Polymorphism to increase code reuse and decrease code maintenance.
...
The power of object-oriented systems lies in their promise of code reuse which will increase productivity, reduce costs and improve software quality.
...
OOP is easier to learn for those new to computer programming than previous approaches, and its approach is often simpler to develop and to maintain, lending itself to more direct analysis, coding, and understanding of complex situations and procedures than other programming methods.
Having spent the previous 20 years writing database applications in several non-OO languages I was familiar with such notions as preferring modular to monolithic programs, structured programming, database design (using a mixture of hierarchical, network and relational databases), the building of libraries of reusable code and the use of frameworks to speed up development. My first framework in COBOL, which I designed and built for one particular project, was adopted as the company standard for all future projects as it cut out a lot of repetitious work and increased the productivity of the entire development team. When the company switched to UNIFACE I rebuilt the framework to take advantage of the new features that the language offered, and this increased the levels of productivity even further.
When I switched to PHP I redeveloped my framework for the third time, and I judged that my efforts would only be successful if my use of encapsulation, inheritance and polymorphism had a positive effect on my productivity. This turned out to be the case so it would seem that my implementation of the principles of Object Oriented Programming was very effective.
Abstraction is the process by which you identify entities that can be represented by classes as well as standard operations within those classes. Those operations which can be shared can be put into an abstract class which can then be inherited by those classes. If you do not perform the process of abstraction correctly then the result will be software that sucks instead of being successful because you will miss out on the following:
By following a more meaningful interpretation of "abstraction" my implementation of OOP works as follows:
If you do not create sharable code nor the mechanism to use that sharable code then you are not using OOP in the way it was designed to be used. If that is the case then shame on you!
The following articles describe aspects of my framework:
The following articles express my heretical views on the topic of OOP:
The following articles express my views on the topic of Backwards Compatibility:
The following are responses to criticisms of my methods:
11 Feb 2023 | Added Reusable Controllers |
04 Feb 2023 | Added Reusable Views |