3rd December 2003
Amended 25th November 2004
In August this year I started a thread called What is/is not considered to be good OO programming in the PHP newsgroup which created quite a lot of response, both in support of my views as well as against. A continuation of this thread was Re: All-Out Flamewar... (was good/bad OOPS) which appeared as a separate item in the online newsreader and which I nearly missed.
This all started because of an article which I published on my website called Using PHP Objects to access your Database Tables (Part 1) and Part 2 which describes how I have used the OO capabilities of PHP 4 to construct objects (what other people call Data Access Objects or DAO) to handle the business rules and data handling for all entities/tables in my development environment. I use this development environment to construct web applications, and by maximising the utilisation of reusable modules I am able to build new components very rapidly.
For the record I must state that I have never been trained as an OO programmer, nor have I used any language with OO capabilities until PHP. I have been a software engineer for over 25 years and have developed in a variety of 2nd, 3rd and 4th generation languages on a mixture of mainframes, mini- and micro-computers. I have worked with flat files, indexed files, hierarchical databases, network databases and relational databases. The user interfaces have included punched card, paper tape, teletype, block mode, CHUI, GUI and web. I have written code which has been procedural, model-driven, event-driven, component-based and object oriented. I have built software using the 1-tier, 2-tier and the 3-tier architectures. I have created development infrastructures in 3 different languages. My latest achievement is to create an environment for building web applications using PHP that encompasses a mixture of 3 tier architecture, OOP, and where all HTML output is generated using XML and XSL transformations. You can get other details by following the links on my About Me page.
Before I start listing the complaints against my article let me explain how I designed and built the code which was the subject of that article. But even before that let me explain how I acquired my knowledge of OOP as I have already admitted that I have never received any training. I have bought several books on OO and browsed the web for suitable articles, but I found them extremely uninspiring.
One thing that really annoys me about articles and tutorials on OOP that I have found on the web and in books - they all talk about creating a class called 'shape' with various subclasses for 'square', 'circle', 'triangle' etc. This is of absolutely no use when I want to build a system to deal with real-world objects such as 'customer', 'product' and 'invoice' which have corresponding database tables. This has often led me to believe that OOP is therefore unsuitable for building common-or-garden business systems as it appears to have been designed for nothing but graphical applications.
This caused me to ignore all the verbosity in those books and articles and concentrate on the principles of OOP which I obtained from the chapter on Object-Oriented Programming Concepts from The Java Tutorial which is available on the Sun Microsystems website. As Sun Microsystems are the authors of the Java language, the most popular Object Oriented language in the world (according to TIOBE Programming Community Index), I think that what they have to say on the matter is worthy of some consideration.
Reference: What is a Class?
Reference: What is an Object?
Reference: encapsulation The localization of knowledge within a module. Because objects encapsulate data and implementation, the user of an object can view the object as a black box that provides services. Instance variables and methods can be added, deleted, or changed, but as long as the services provided by the object remain the same, code that uses the object can continue to use it without being rewritten.
Reference: What is Inheritance?
Reference: abstract class A class that contains one or more abstract methods, and therefore can never be instantiated. Abstract classes are defined so that other classes can extend them and make them concrete by implementing the abstract methods.
In my previous language I successfully implemented the 3 tier architecture which meant that the application code was split into 3 tiers or layers:-
The purpose of the 3 tier architecture is to delegate tasks to particular layers so that any one layer can be changed without having any effect on the others. In this way it should be possible to switch from one database to another just by making a change to the data access layer, or to switch from one user interface to another just by making a change to the presentation layer. All the complicated code, the processing of the business rules, is maintained within the business layer which should be able to withstand any change in the other layers without missing a beat, thus preserving all the investment that went into its construction. It should even be possible to run two different user interfaces at the same time (such as client/server and web) which share the same components in the business layer.
In order to deal with such real-world entities as 'customer', 'product' and 'invoice' each entity had its own component in the business layer. Each business layer component contained all the processing rules for an entity, and any component in the presentation layer which wanted to deal with an entity had to communicate with that entity's business component. Business components would access the physical database by communicating with a database driver in the data access layer. It seemed reasonable to me that all I had to do was transfer all the logic and functionality of my business layer 'component' into an 'object' and my first foray into the mystical world of OOP would be complete.
One aspect of this previous language was that it utilised an application model (aka data dictionary), and each real-world entity had a corresponding entry in the application model, also known as an entity, which identified the data requirements for the real-world entity. Each entity in the application model translated directly to a table in the database, so I am used to treating the term 'entity' as being any one of 'real-world entity', 'application model entity' and 'database entity'.
The principles of OOP do not include a data dictionary or application model, but they do recognise that you need somewhere to hold an object's variables when it no longer resides in memory. Using OO terminology this is known as a persistent data store. Each instance of an object can be translated into a corresponding instance (row or occurrence) in that object's persistent data store. It is obvious to me that an object must have knowledge of its persistent data store in order to communicate with it. It is also obvious to me that an object is useless without its persistent data store, and a persistent data store is just as useless without an object to maintain its contents. As an object and its persistent data store are therefore intertwined, interwoven, interchangeable, inseparable, indivisible and interdependent, when I talk about one I am also including the other. So when I talk about creating a class for a database table I do not mean just the database table on its own, I mean a class for the-entity-with-its-properties-and-methods-and-business-rules-and-its-persistent-data-store-which-is-known-as-a-database-table. As I don't like using such long words I have shortened it to 'class for a table' or 'table class'.
Having identified that I need a different object for each real-world entity and that each object will have its own table in the database I set about building my first OO class. In my previous language each service component in the business layer had two operations (methods) called getData and putData. The way that data was interchanged between the presentation and business layers was through XML streams, and the putData operation could handle a mixture of insert, update and delete at the same time as each occurrence (row) contained a status attribute which identified what action was necessary. Both the getData and putData operations could handle XML streams containing any number of occurrences in any number of entities in any structure.
PHP does not have the same powerful commands to handle XML streams, so I decided to stick with simple associative arrays. This was not actually such a retrograde step as PHP's array handling functions are very flexible and very powerful, and I found that anything I wanted to do could be done, and in a lot of cases with surprising ease. I decided to stick with my getData method which would be capable of receiving an array containing any number of rows, but as my putData method could not work I changed it to insertRecord, updateRecord and deleteRecord. I have subsequently added other methods to deal with other circumstances, but these were all I needed to begin with.
One feature I have observed in examples in books and on the web is that some OO programmers like to have method names which are tailored to the object. Thus for objects such as 'customer', 'product' and 'invoice' they would have a 'getCustomer', 'getProduct' and 'getInvoice' method. I would never ever dream of doing such a thing as I like to make use of as much generic code as possible, and having a single routine which calls the 'getData' method on an object means that I can use the same routine on any object within the system rather than have a separate routine for each individual object. This I believe is acceptable in the world of OOP as it embraces the principle of polymorphism.
Another common OOP feature that I deliberately chose to ignore is having a separate 'getter' and 'setter' for each of my entity variables. Why should I waste my time in feeding an object one item of data at a time when I can feed all data in a single associative array? Why should I waste my time in retrieving data from an object one item at a time when I can retrieve all data items in a single associative array? Not only does this enable my generic code to use common methods on any object, it also means that my generic code does not even have to concern itself with the names of any data items which are in the inbound/outbound array. Thus even the data structure being used by the object is irrelevant to my generic code. This I believe is acceptable in the world of OOP as it embraces the principle of implementation hiding.
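The two ideas above (common method names on every object, and whole arrays instead of individual getters and setters) can be sketched as follows. This is only an illustration written for current PHP rather than PHP 4: the classes Customer and Product, the helper addAndList() and the in-memory $rows store are hypothetical stand-ins for real table classes.

```php
<?php
// Hypothetical table classes. Both expose the same method names,
// so generic code can use either without knowing which it has.
class Customer
{
    public $rows = [];   // stands in for the database table

    public function insertRecord(array $fieldarray): array
    {
        $this->rows[] = $fieldarray;
        return $fieldarray;
    }

    public function getData(): array
    {
        return $this->rows;
    }
}

class Product
{
    public $rows = [];

    public function insertRecord(array $fieldarray): array
    {
        $this->rows[] = $fieldarray;
        return $fieldarray;
    }

    public function getData(): array
    {
        return $this->rows;
    }
}

// Generic routine: it never mentions a field name or a class name.
function addAndList(object $table, array $input): array
{
    $table->insertRecord($input);
    return $table->getData();
}

$customerRows = addAndList(new Customer(), ['customer_id' => 1, 'name' => 'Smith']);
$productRows  = addAndList(new Product(), ['product_id' => 'P1', 'price' => '9.99']);
```

The same addAndList() call works for both objects because the data travels as an associative array whose contents the routine never inspects.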
I then set about writing a class that dealt with one of the database tables in my sample application, and tested it to make sure that it could read, insert, update and delete data correctly. I then created a duplicate of this class for another database table. I was then in a position to compare the two classes to see what code was similar and what code was different, the objective being to isolate the code that was similar and make it sharable in some way so that I could use it without having to duplicate it each time.
How do you share code in OO systems? Through inheritance. How does inheritance work? Through subclasses.
So I created a superclass to hold the sharable methods and properties, and converted each existing class into a subclass so that it could share those methods and properties through inheritance. Then I began the painstaking process of moving code out of each subclass into the superclass, then testing to check that nothing had been broken. Remember that I was dealing with different classes with entirely different database tables, different data structures and different business rules, so I needed one set of code that could deal with any set of circumstances.
I used a similar technique when building the scripts in the presentation layer which communicate with my objects. I discovered that the only difference between one component and another was the entity name and the XSL file name, so I split my scripts into two:
Each of the four initial methods which I identified as common to each class needs to generate its own SQL/DML statement in order to communicate with the database. When I searched through various books and websites for examples of how other programmers dealt with the generation of sql statements all I saw was code in which the field names were hard-wired. I did not like this idea as it would mean having to write specific code within each class to deal with the class variables, and specific code for each of the insert, update, select and delete operations. What I wanted was a generic mechanism whereby I could throw some data at it and it would generate the necessary sql statement for me. As one apparently did not exist I set about building one.
Those of you who are familiar with PHP will know that when the client presses the SUBMIT button in his browser all the data from the HTML form is made available to the receiving script in the form of the $_GET or $_POST array. This is an associative array of 'name=value' pairs. Bearing in mind that values for $this->tablename and $this->dbname have already been supplied in the class constructor, how easy would it be to turn the contents of the $_GET/$_POST array into a fully functioning sql statement? Answer: very, very easy.
This is when the first obstacle appeared. The $_POST array also contains an entry for the SUBMIT button, and if this is included in the sql INSERT statement you will hit a brick wall. So, I needed a generic mechanism in which items which did not belong in the table were automatically excluded from any sql statement. Notice that I still wanted a generic method - I did not want to create a list of names to be excluded as I had no idea at this stage as to the possible variations of non-database items that could appear in the $_POST array. I chose instead to have another class variable called $this->fieldlist to contain a list of all the column names that were valid for that particular table. That single decision proved to be the start of something which grew to be a very useful and powerful feature in my table class.
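A minimal sketch of that filtering idea, written for current PHP: the function name buildInsert() and the sample table and column names are hypothetical, and addslashes() is used only to keep the sketch self-contained (real code should use the escaping or prepared statements offered by the database driver).

```php
<?php
// Build an INSERT statement from whatever the form submitted,
// silently dropping any entry that is not a known column.
function buildInsert(string $tablename, array $fieldlist, array $input): string
{
    // Keep only the entries whose key appears in the list of valid columns.
    $fields = array_intersect_key($input, array_flip($fieldlist));

    $columns = implode(', ', array_keys($fields));
    $values  = implode(', ', array_map(
        function ($value) {
            // Placeholder escaping only; see the note above.
            return "'" . addslashes((string) $value) . "'";
        },
        $fields
    ));

    return "INSERT INTO $tablename ($columns) VALUES ($values)";
}

// Simulated $_POST contents, including the SUBMIT button.
$post = ['name' => 'Smith', 'city' => "King's Lynn", 'submit' => 'Save'];
$sql  = buildInsert('customer', ['name', 'city', 'phone'], $post);
```

Because the exclusion is driven by the list of valid columns rather than a list of names to reject, any new non-database item in the form is ignored without further code.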
The second obstacle appeared when I wanted to perform an update. In order to identify the row that you wish to update it is necessary to supply values for the primary key fields in the WHERE clause, and as some of you may be aware each database table has its own primary key. A simple and generic solution came to mind very quickly - simply extend my $fieldlist array in the class constructor to indicate which field(s) in that table formed the primary key. This meant a small change from this:
$this->fieldlist = array('column1', 'column2', 'column3');
to this:
$this->fieldlist['column1'] = array('pkey' => 'y');
$this->fieldlist['column2'] = array();
$this->fieldlist['column3'] = array();
This also helped with the construction of the DELETE statement where I also need the primary key details for the WHERE clause.
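As an illustration of how the 'pkey' marker could drive the WHERE clause, here is a small sketch for current PHP. The function name buildWhere() and the table name mytable are hypothetical, and addslashes() again stands in for proper driver-level escaping.

```php
<?php
// Build a WHERE clause from the fields marked as primary key.
function buildWhere(array $fieldlist, array $fieldarray): string
{
    $conditions = [];
    foreach ($fieldlist as $name => $spec) {
        // Only primary key columns with a supplied value take part.
        if (isset($spec['pkey']) && array_key_exists($name, $fieldarray)) {
            $conditions[] = "$name='" . addslashes((string) $fieldarray[$name]) . "'";
        }
    }
    return implode(' AND ', $conditions);
}

$fieldlist = [
    'column1' => ['pkey' => 'y'],
    'column2' => [],
    'column3' => [],
];

$where = buildWhere($fieldlist, ['column1' => 42, 'column2' => 'abc']);
$sql   = "DELETE FROM mytable WHERE $where";
```

A compound primary key needs no extra code: every column marked 'pkey' contributes one condition, joined with AND.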
The next area where I wanted to use generic code instead of custom code was in data validation. Any experienced programmer will tell you that you cannot simply take what the user has entered and write it to the database 'as is'. It has to be cleansed (fumigated, sterilized, disinfected, sanitized, purified, decontaminated). At the very least you have to check that the data is of the correct type for the field so that numeric fields only contain numbers, date fields only contain dates, and so on. I have used several languages which have incorporated a data dictionary which has allowed this primary validation to be handled automatically.
In case you do not know what a data dictionary is (also known as an application model), let me offer a brief explanation. It is a facility built into the language which allows the layout of each database table and the specifications of each field within the table to be described. Some languages allow you to import table definitions from the database schema into the dictionary, others do the reverse by allowing you to export table definitions in the form of CREATE TABLE scripts. Thus you can either build your application model from an existing database schema, or you can create your database schema from your application model.
Because the data dictionary describes the type and size of each field this allows the language to perform primary validation before control is ever passed to any program code that the developer may have written. Fields marked as required will generate an error if they are empty. It is also usual for data dictionaries to identify primary keys, candidate keys, indices, relationships and sometimes even delete constraints. Some also allow a field's display format to be defined, some allow a field's label or heading to be defined. Most also allow descriptive comments to be stored for each field and table in order to provide a central repository for documentation.
It should be fairly obvious that PHP does not come with a data dictionary, or anything closely resembling a data dictionary. Those of you with sharp minds may have noticed that my $fieldlist array could easily be extended to provide most of the information that can be found in a data dictionary, which is exactly what I did. By adding more detailed specifications for each field I turned a simple field list into a comprehensive array of field specifications, which is why I renamed it $fieldspec in Part 2 of my article. It was now a fairly straightforward step to write a routine which took the contents of my $fieldspec array and use it to perform primary validation against the contents of the $_POST array.
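A rough sketch of what such a primary validation routine might look like in current PHP. The specification keys 'type', 'size' and 'required', the function name validateFields() and the YYYY-MM-DD date format are all assumptions made for this illustration; they are not taken from the article being discussed.

```php
<?php
// Check each submitted value against its field specification.
// Returns an array of error messages keyed by field name.
function validateFields(array $fieldspec, array $input): array
{
    $errors = [];
    foreach ($fieldspec as $name => $spec) {
        $value = isset($input[$name]) ? trim((string) $input[$name]) : '';

        if ($value === '') {
            if (!empty($spec['required'])) {
                $errors[$name] = "$name is required";
            }
            continue;   // nothing more to check on an empty field
        }

        if (isset($spec['size']) && strlen($value) > $spec['size']) {
            $errors[$name] = "$name is too long";
        } elseif (isset($spec['type']) && $spec['type'] === 'integer'
            && filter_var($value, FILTER_VALIDATE_INT) === false) {
            $errors[$name] = "$name must be an integer";
        } elseif (isset($spec['type']) && $spec['type'] === 'date') {
            $date = DateTime::createFromFormat('Y-m-d', $value);
            // Comparing the round trip rejects dates such as 2003-02-30.
            if ($date === false || $date->format('Y-m-d') !== $value) {
                $errors[$name] = "$name must be a date (YYYY-MM-DD)";
            }
        }
    }
    return $errors;
}

$fieldspec = [
    'name'       => ['type' => 'string', 'size' => 10, 'required' => true],
    'quantity'   => ['type' => 'integer'],
    'start_date' => ['type' => 'date'],
];

$errors = validateFields(
    $fieldspec,
    ['name' => '', 'quantity' => 'abc', 'start_date' => '2003-12-03']
);
```

The rules live in the $fieldspec array while the checking code exists once, so adding a column to a table means adding one array entry rather than writing new validation code.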
The $fieldspec array does not contain application data but information about that data. As such it can be referred to as meta-data.
This is also an example of declarative programming (opposite to imperative programming) as it involves the definition or creation of a set of rules or conditions, but leaves the execution or implementation of those rules to another process or module. Thus a rule may be defined in many places but executed in only one. This avoids the duplication of code required to implement the rule, and allows the same rule to be executed in a consistent manner across the whole application. It also means that should the implementation of a rule need to be changed or enhanced then the code need only be changed in a single place.
Primary validation is good enough to check that each field contains valid data, but what if it is necessary to check the contents of one field against another? For example, primary validation will ensure that fields start_date and end_date both contain valid dates, but to check that end_date is not earlier than start_date you need an additional layer of validation which I call secondary validation. I sometimes refer to primary validation as generic to denote the fact that it can be used by any and all tables whereas secondary (or custom) validation is specific to a single table. Where and how is this secondary validation specified? Rather than waste time trying to hold these rules outside the class and having to invent some mechanism to feed them into the object at runtime I decided to create an abstract method in the processing cycle into which any and all such custom validation could be inserted. In case you do not know how to implement an abstract method take a look at the internals of my getData method which contains calls to additional internal methods called pre_getData and post_getData. Both of these methods are defined in the abstract class, but they are empty. It is therefore possible for each subclass to contain its own implementation of the abstract method.
Here, for example, is the code I would have to validate my two dates:
function _cm_commonValidation ($fieldarray)
// perform validation that is common to INSERT and UPDATE.
{
    if ($fieldarray['start_date'] > $fieldarray['end_date']) {
        $this->errors['start_date'] = 'Start Date cannot be later than End Date';
        $this->errors['end_date'] = 'End Date cannot be earlier than Start Date';
    } // if
    return $fieldarray;
} // _cm_commonValidation
This is an example of imperative programming (opposite to declarative programming) as it provides the actual instructions which are to be executed.
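To show where such a method fits into the processing cycle, here is a simplified sketch in current PHP. The class names GenericTable and Contract are hypothetical, and the body of insertRecord() is reduced to the bare minimum; the point is only that the superclass calls an empty method which a subclass may override.

```php
<?php
// Hypothetical superclass holding the generic processing cycle.
class GenericTable
{
    public $errors = [];

    public function insertRecord(array $fieldarray): array
    {
        $this->errors = [];

        // Customisable step: does nothing unless a subclass overrides it.
        $fieldarray = $this->_cm_commonValidation($fieldarray);

        if (empty($this->errors)) {
            // A real class would build and execute the INSERT here.
        }
        return $fieldarray;
    }

    // Empty default implementation.
    protected function _cm_commonValidation(array $fieldarray): array
    {
        return $fieldarray;
    }
}

// Hypothetical subclass supplying its own secondary validation.
class Contract extends GenericTable
{
    protected function _cm_commonValidation(array $fieldarray): array
    {
        if ($fieldarray['start_date'] > $fieldarray['end_date']) {
            $this->errors['start_date'] = 'Start Date cannot be later than End Date';
        }
        return $fieldarray;
    }
}

$contract = new Contract();
$contract->insertRecord(['start_date' => '2004-02-01', 'end_date' => '2004-01-01']);

$plain = new GenericTable();
$plain->insertRecord(['start_date' => '2004-02-01', 'end_date' => '2004-01-01']);
```

The string comparison of the two dates is only reliable here because both values are in YYYY-MM-DD form.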
As I had managed to find a way to duplicate most of the functionality of a data dictionary within my code I wondered what I would have to do to fit in the remainder. As my test application included a table with a candidate key I thought I would tackle this problem next. For those of you unfamiliar with database theory every table must have one unique key in the form of a primary key, but it may also have any number of other unique keys which are known as candidate keys. Just as with the primary key these candidate keys may consist of more than one field, so I needed a mechanism to represent such a structure. My solution was to create a $unique_keys array, and the code I built to process it is documented at Changing Candidate Keys.
If a table is the ONE in a ONE-to-MANY relationship (sometimes known as a PARENT_CHILD relationship) there may be some rules to apply before a row can be deleted. For instance, you might want to prevent a parent row from being deleted if any child rows still exist, or you may want to allow the deletion but process the child rows in some way, perhaps by deleting them or by replacing the foreign key with nulls. This is another area that I wanted to automate in some way rather than have to hard-code it each time. Thus I created a $relationship array to hold the necessary rules, with the code to process this array being documented at Deleting Rows.
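The general shape of such a rule array can be sketched as follows in current PHP. The keys 'child' and 'type', the rule name 'restricted', the function checkDelete() and the row-counting callback are all invented for this illustration; the actual layout of $relationship is described in the referenced documentation, not here.

```php
<?php
// Decide whether a parent row may be deleted, given a list of rules.
// $countChildren is a callback returning the number of child rows
// in the named table that still refer to the parent row.
function checkDelete(array $relationship, callable $countChildren): array
{
    $errors = [];
    foreach ($relationship as $rule) {
        if ($rule['type'] === 'restricted' && $countChildren($rule['child']) > 0) {
            $errors[] = "Cannot delete: rows still exist in {$rule['child']}";
        }
        // Other rule types (delete the children, set the foreign key
        // to null) would be processed at this point.
    }
    return $errors;
}

$relationship = [
    ['child' => 'invoice',       'type' => 'restricted'],
    ['child' => 'customer_note', 'type' => 'cascade'],
];

// Fixed counts stand in for real database lookups.
$counts = ['invoice' => 3, 'customer_note' => 5];
$errors = checkDelete($relationship, function ($table) use ($counts) {
    return $counts[$table];
});
```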
The end result of all this work gave me a class hierarchy with two levels:
The subclass inherits all the code from the superclass while at the same time providing the missing details which allow an object to be instantiated. These missing details are:
You should notice that my superclass matches perfectly the description given for an abstract class, and as such it is perfectly legitimate, nay obligatory, that it can only be instantiated into an object via a subclass. Accordingly I have a separate subclass for each entity/database table which contains the implementation details for that particular entity/table. When I instantiate a subclass into an object the end result is a merging of the generic code from the superclass and the specific code from the subclass.
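In outline, the arrangement might look like the sketch below. It uses the abstract keyword available from PHP 5 onwards, which PHP 4 did not have, and the names GenericTable, Customer, describe() and the sample database name are hypothetical. Only the idea of the subclass constructor supplying $dbname, $tablename and $fieldlist comes from the text above.

```php
<?php
// Hypothetical superclass: generic code only, cannot be instantiated.
abstract class GenericTable
{
    public $dbname;
    public $tablename;
    public $fieldlist = [];

    // Shared by every subclass; relies only on values the subclass supplies.
    public function describe(): string
    {
        return $this->dbname . '.' . $this->tablename
            . ' (' . implode(', ', array_keys($this->fieldlist)) . ')';
    }
}

// Hypothetical subclass: supplies the details for one table.
class Customer extends GenericTable
{
    public function __construct()
    {
        $this->dbname    = 'sample';
        $this->tablename = 'customer';
        $this->fieldlist['customer_id'] = ['pkey' => 'y'];
        $this->fieldlist['name']        = [];
    }
}

$customer    = new Customer();
$description = $customer->describe();
```

Adding another table then means writing another small constructor, while describe() and every other generic method are inherited unchanged.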
My entire development infrastructure is described in A Development Infrastructure for PHP.
When retrieving data from the database it is possible to retrieve more rows than can be comfortably displayed within a single screen, therefore it is customary to split the database result into more manageable chunks or 'pages'. The display starts at page 1, and the user is given options on his screen, usually as hyperlinks, to jump to other pages that are currently available. This is called 'pagination'. In order for this to work successfully you need the following variables somewhere in your code:
$rows_per_page - the maximum number of database rows to be displayed on each page.
$pageno - the page number required by the user. This is combined with rows-per-page to calculate the values for LIMIT and OFFSET in the sql SELECT statement.
$lastpage - the total number of pages available from the current query. This is so the user can be informed 'you are currently on page X of Y'.
My software is written using the 3 tier architecture, which means that I have totally separate components dealing with the presentation, business and data access layers.
There is no direct communication between the presentation and data access layers. The presentation layer passes the request to the business layer which in turn passes it (with possible modifications) to the data access layer. The data access layer gives its result to the business layer which in turn passes it (with possible modifications) on to the presentation layer. Clean and simple.
In order for the pagination as described above to work you have to consider the following:
$pageno comes in via the user interface.
$rows_per_page may also be supplied by the user interface, or a default value may be determined by the business layer.
The data access layer needs $pageno and $rows_per_page so that it can calculate the correct values for LIMIT and OFFSET. These are then combined with the current SELECT, FROM and ORDER BY values into the SQL query string which is then sent to the database for processing.
$lastpage is determined, and if $pageno is greater than this value then it has to be adjusted downwards. This means that the value for OFFSET will be lower than if it were calculated using the original value for $pageno. If you issue an SQL query with an OFFSET which is greater than NUMROWS the resultset will be empty, so you must ensure that OFFSET is still within bounds.
The data access layer returns $lastpage to the business layer. This value is then passed back untouched to the presentation layer.
You should notice here that the request for a particular page comes from the presentation layer but can only be processed by the data access layer, therefore it must pass through the business layer to get there. Similarly the value for $lastpage is generated by the data access layer, but has to pass through the business layer before it can be presented to the user by the presentation layer. This is the most efficient way I have found to achieve the desired result. It seems to me that some of these so-called OO experts are unfamiliar with the concept of "efficient programming", and this is the main reason I consistently reject their "advice".
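The arithmetic behind those steps can be sketched as follows in current PHP. The function name paginate() and the shape of its return value are hypothetical, and the sketch assumes that $rows_per_page is greater than zero and that the total row count has already been obtained.

```php
<?php
// Work out the last page, pull the requested page back within bounds,
// and produce the LIMIT/OFFSET clause for the SELECT statement.
// Assumes $rows_per_page > 0.
function paginate(int $numrows, int $pageno, int $rows_per_page): array
{
    // Total pages available; an empty result still counts as one page.
    $lastpage = max(1, (int) ceil($numrows / $rows_per_page));

    // Keep the requested page between 1 and $lastpage.
    $pageno = max(1, min($pageno, $lastpage));

    $offset = ($pageno - 1) * $rows_per_page;

    return [
        'pageno'   => $pageno,
        'lastpage' => $lastpage,
        'limit'    => "LIMIT $rows_per_page OFFSET $offset",
    ];
}

// 45 rows at 10 per page gives 5 pages, so a request for page 9
// is adjusted down to page 5.
$page = paginate(45, 9, 10);
```

Here $page['limit'] is 'LIMIT 10 OFFSET 40', which stays within the 45 available rows, whereas the unadjusted page 9 would have produced an offset of 80 and an empty result.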
These are the arguments and criticisms I received about my article. It was initially focussed on one small area but quickly spread to cover my whole approach to OOP.
I received this comment from someone who is supposed to be a senior and well-respected member of the PHP community (he used to be the technical editor for Zend Technologies but is no longer in that position for some reason) which stated that my class broke the rules of OO programming and would therefore be rejected by serious OO programmers. This is what he wrote about my section on class variables, specifically the variables $rows_per_page, $pageno and $lastpage.
How are these three possibly related to the DATA inside the table? They're just formatting things and therefore should not belong in the data itself. That's the whole point of OO. They have nothing to do with your object 'Database_Table'. They do not belong in the class, and are quite inflexible. The whole idea is encapsulation and inheritance. It's all about delegating responsibilities to the specific objects. If you want to write about OO Data Objects than that should definitely be in.
My response came in several parts:
Despite me asking him what he considered to be the *right* way of achieving my objective he failed to respond, probably because he didn't have an answer. It was at this point I decided to ask the wider PHP community for their views and initiated the thread in the PHP newsgroup.
André Næss wrote:
You have put *pagination* information into the DB-class, and that is what I react to. You could insert two variables, $offset and $numRows, and that would be ok. But the pagination belongs in the presentation layer.
Pagination cannot be handled in the presentation layer alone - part of it MUST be in the presentation layer, but part of it can ONLY be done in the data access layer:
[...] $pageno, while the value for 'Z' is passed down as $rows_per_page,
[...] $rows_per_page to produce the total number of pages currently available ($lastpage). If $pageno becomes greater than $lastpage due to records being deleted it must be reset to $lastpage otherwise the full query will fail.
[...] $lastpage and the count of rows ($numrows).
It is quite clear to me that it is physically impossible to handle every part of the pagination process within the presentation layer as part of it can only be done within the data access layer. I hope it is now clear to you.
As an application will contain multiple presentation layer components but only a single data access component, then surely it would not be considered 'best practice' to have that code duplicated in multiple places when it could just as easily be defined within a single place? Or do you have a different definition of 'best practice'?
Dave Martin wrote:
Well, in a way, they are for formatting really. Your layer doesn't need any concept of a page per se since it really has nothing to do with actually doing anything with the data. I certainly wouldn't call it 'wrong' that's for sure. You've simply added a small bit of presentation logic to your class. If you took what you have and moved the 'formatting/presentation' bits into a separate class leaving just the data manipulation bits in your class it might squash that criticism. Personally I'd probably do it exactly the way you have since it's rare that data manipulation and presentation are totally separate things (IMO).
Stephan Beal wrote:
I wouldn't go so far as to say BAD, but I would consider it to be ill-suited to the problem. I agree with him entirely that it's strictly formatting info, and therefor has no place in a db abstraction layer. It does, however, have a place in a DbTableRenderer, e.g., or a layer which builds off of the db layer. Consider that a PAGE is a layout convention, and layout is formatting. Data selection has no unambiguous concept of 'page'. I am of the opinion that SELECT data, for example, is in essence a formatting option (the WHERE clause, on the other hand, is specifically a logic operation), and that shoe-horning such things as paging into the db layer isn't necessary. I would first write the db layer and then make another layer which includes the selection/limitation code.
André Næss wrote:
The information is typical formatting information. The organization of data into pages belongs in presentation code. From a purist perspective, SQL operates on sets, and so operations like 'get me item 1 through 10' makes no sense since sets have no intrinsic ordering. But the second one applies the SORT operator, the result is no longer a set, but a list -- a structure with a defined order, and hence the retrieval of rows 1 through 10 suddenly makes sense.
Some people seem to have a bit of a problem distinguishing the difference between the selection of data and the formatting or presentation of data. First let me provide some definitions from the dictionary:
selection n
format n
format vt
presentation n
In my code data is selected by the data access layer when it issues the sql SELECT statement, as in:
SELECT <columns> FROM <table> WHERE <conditions> LIMIT <row count> OFFSET <offset>
If I were not allowed to use the LIMIT/OFFSET clause on the SELECT statement the only other option would be to retrieve ALL possible rows and pass them ALL back to the presentation layer which would then have the task of filtering out the chunk that it wanted to display. That would be an extremely inefficient way to implement pagination, and anyone who advocates such a level of inefficiency is clearly in the wrong profession (IMHO).
Data is formatted or presented to the user by the presentation layer which takes whatever has been returned by the data access layer (via the business layer) and generates the necessary HTML output. I do not show any code from my presentation layer here as all my HTML output is generated not by any PHP script but via XML and XSL transformations, as discussed in the following articles:
My variables $rows_per_page and $pageno originate in the presentation layer but are passed to the data access layer where they are used to construct the sql SELECT statement.
My variable $lastpage is generated by the data access layer after it has determined how many rows would satisfy that query without being restricted by the LIMIT clause. It is then passed back to the presentation layer so that it can be included in the 'page x of y' display.
They all appear in the business layer simply because they are passing through between the other two layers.
I cannot conceive of any way to achieve my objective without having those variables pass through the business layer, therefore I cannot accept any arguments that say they should not be there.
An offshoot of this technique means that my generic PHP code does not even have to deal with field names in order to generate the HTML output. How is this possible? Simply because my code passes the entire array, whatever it contains, to a standard function which turns it into an XML file. A second function then turns this into HTML output by performing an XSL transformation.
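A minimal sketch of the first of those two steps, assuming that every array key is a valid XML element name. The function name `rowsToXml` and the element layout are hypothetical, and the XSL transformation step is left out.

```php
<?php
// Hypothetical function: the name and the XML layout are assumptions made
// for illustration only.
function rowsToXml(string $root, string $rowname, array $rows): string
{
    $xml = "<$root>";
    foreach ($rows as $row) {
        $xml .= "<$rowname>";
        foreach ($row as $field => $value) {
            // The element name comes from the array key, so no column name
            // is hard-coded anywhere in this function.
            $xml .= "<$field>" . htmlspecialchars((string) $value, ENT_QUOTES | ENT_XML1) . "</$field>";
        }
        $xml .= "</$rowname>";
    }
    return $xml . "</$root>";
}

echo rowsToXml('root', 'person', [
    ['person_id' => 1, 'name' => 'Smith & Jones'],
]), "\n";
```

Because the function never mentions a field by name, adding a column to the array changes the XML without any change to this code.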
In my original implementation I had a separate XSL stylesheet for each screen as the table and column names were hard-coded within each stylesheet. I have subsequently found a way to supply the table names, column names and column labels within the XML data itself, which means that my library of XSL stylesheets is now much smaller but more re-usable. This is described in detail in Reusable XSL Stylesheets and Templates.
Stephan Beal wrote:
If it serves the purpose, great, and that makes it good for it's purpose. However, that almost inherently makes it less flexible for later adaption into other contexts. What if I wanted to use your db layer but didn't want the paging code? Is the class designed in such a way that I am forced to use it?
My code currently has the following levels of flexibility:
* [...] $curpage and $lastpage.
* [...] getData() method only has to deal with one possible value. I do not need additional methods such as getPreviousPage() or getNextPage() as my existing method can cope with all possible circumstances.
* If $pageno is less than 1 it will be reset to 1.
* If $pageno is greater than $lastpage it will be reset to $lastpage.
* If no $pageno is provided it will automatically default to 1.
* [...] $pageno until it is changed.
* $rows_per_page is not hard-coded anywhere (although a default value is defined with the class constructor), therefore it is possible for it to be changed at any time. I have added hyperlinks to the middle line of my navigation bar which allow the user to change the page size at will.
* [...] $rows_per_page to zero.
If you know of any way I can make this more flexible than it already is I would be pleased to hear from you.
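The three page number rules (too low, too high, not provided) are small enough to show as code. This is a sketch under assumptions: the function name `normalisePageNo` is hypothetical, and `$lastpage` is assumed to be at least 1.

```php
<?php
// Hypothetical helper; the function name is an assumption for illustration.
function normalisePageNo($pageno, int $lastpage): int
{
    if ($pageno === null) {
        return 1;               // nothing supplied: default to the first page
    }
    $pageno = (int) $pageno;
    if ($pageno < 1) {
        return 1;               // too low: reset to 1
    }
    if ($pageno > $lastpage) {
        return $lastpage;       // too high: reset to the last page
    }
    return $pageno;
}

// Try a missing value, a value that is too low, a valid one and one too high.
foreach ([null, -3, 4, 99] as $requested) {
    echo var_export($requested, true), ' => ', normalisePageNo($requested, 7), "\n";
}
```

With the rules held in one place, a single retrieval method can accept any requested page number without needing separate "previous" and "next" methods.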
André Næss wrote:
This code leads to a problematic dependency between the DB and the class.
How so? All software I have ever written which communicates with a database table has to have some knowledge of the structure of that table within its code. Without that knowledge the software would not know which piece of data goes where. You cannot add a field to a table, or remove a field from a table without making a corresponding change to some part of your code. The database table and the software which accesses it must always be kept synchronised otherwise there will be big problems. This is not a unique failing in my design, it is a common feature of all languages.
I understand the point of reusing code to generate SELECT/INSERT/UPDATE/DELETE statements, but you can do that without having to create a class for every table you have. Say I wanted to talk to the table 'cars'. I'd compose an object by creating an instance of a Table class and add rules represented as objects to it. I think that if you ask some good designers they will tell you that an approach which uses instances of a single Table class is better than one which requires the declaration of a new class for each new table.
What you seem to be proposing is a structure like this:
Figure 1 - Your implementation
In this structure CLASS A contains the definition of the generic table class while SCRIPT B1, B2, B3...Bn contain the physical implementation details for individual database tables, one per table. At runtime SCRIPT Bn would create an instance of CLASS A then load the object with those implementation details. This would work for defining the table structure and primary validation rules, but it would not be able to handle any secondary validation rules.
Figure 2 - My implementation
In this structure CLASS A still exists, but it is an abstract class insofar as it is never instantiated into an object directly. Where you had SCRIPT Bn I now have SUBCLASS Bn which extends CLASS A and therefore includes its entire contents through that mechanism called inheritance. This means that I do not require any extra code to transfer details between SCRIPT Bn and OBJECT A as they are already there. The problem of secondary validation is also solved as I simply place whatever code I require in the relevant custom method which overrides the abstract method which is defined in the superclass.
If you examine these two methods closely you will find that mine is more successful than yours. My implementation demonstrates that I understand the principles of OOP whereas your implementation demonstrates that you and your so-called 'good designers' do not.
In my implementation all generic code goes into an abstract superclass, while all non-generic code goes into subclasses, one subclass per database table. The generic code is therefore shared through inheritance. Your method, and that of your so-called 'good designers', of instantiating an instance of what is clearly an abstract class to create working objects does not adhere to the principles of OOP. If I added the keyword abstract to the definition of my superclass the language would physically prevent it from being instantiated into an object, therefore the method that both you and your so-called 'good designers' use is totally at variance with the principles of OOP. It is YOUR method which is wrong, not mine.
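The structure in Figure 2 can be illustrated with a deliberately small sketch. The class, property and method names here are assumptions made for the example, and the syntax is PHP 5 or later (which has the `abstract` keyword) rather than the PHP 4 that the original code targets.

```php
<?php
// Illustrative only: class, property and method names are assumptions.
abstract class GenericTable
{
    protected $tablename;
    protected $fieldspec = [];

    // Generic code is written once here and inherited by every table class.
    public function describe(): string
    {
        return $this->tablename . ': ' . implode(', ', array_keys($this->fieldspec));
    }
}

class Customer extends GenericTable
{
    public function __construct()
    {
        // Only the details that are unique to this table live in the subclass.
        $this->tablename = 'customer';
        $this->fieldspec = ['customer_id' => ['type' => 'integer'], 'name' => ['type' => 'string']];
    }
}

class Product extends GenericTable
{
    public function __construct()
    {
        $this->tablename = 'product';
        $this->fieldspec = ['product_id' => ['type' => 'integer'], 'price' => ['type' => 'numeric']];
    }
}

echo (new Customer)->describe(), "\n";
echo (new Product)->describe(), "\n";
// "new GenericTable" would be a fatal error because the class is abstract.
```

Neither subclass contains any generic code, and the superclass contains nothing that is specific to one table.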
He also wrote:
Abstract concepts are classes, their instances are objects. IMO The table 'cars' is not an abstract concept but an object in the world.
You are getting lost in your own interpretation of the terminology. You cannot create an object for 'cars' without having a class for 'cars', just as you cannot write a 'cars' record into the database without having a 'cars' table. As far as I am concerned 'class' relates to 'database table' just as 'class instance' (object) relates to 'table instance' (record). It is not rocket science, just simple logic.
According to OOP Principle #2 a 'class' is the blueprint while an 'object' is a working instance of that blueprint. If you look at the CREATE TABLE script for a table is this not a blueprint? Is not each row within the table a working instance of that blueprint? Is it therefore not reasonable to put the table's blueprint into a class so that you can create instances of that class to manipulate the instances (rows) within that table?
He also wrote:
Classes are supposed to represent abstract concepts. The concept of a table is abstract. A given SQL table is not, it's an object in the world. You wouldn't model people by creating a Person class and subclass it for each new person would you? The concept of a Person is abstract, a given person is not.
I would not create a subclass for 'person' unless I had to process a person with totally different attributes, properties and business rules. The reason that I create a separate subclass for my 'customer', 'product' and 'invoice' tables is that they are totally different entities with totally different structures and totally different business rules. The fact that they are all database tables makes them similar but not identical, and I take care of the similarities by inheriting generic methods and variables from my abstract superclass. That which is unique (non-generic) about each particular table, such as its structure and all its business rules, goes into a separate subclass for each table.
Matty wrote:
In a system with 12 database tables, having a different class for each table? That's not good. Having a separate class *instance* is good OO, having a separate *class* is bad OO, since you lose all the potential benefits of low maintenance, etc.
Having a separate class for each database table most certainly does NOT lose the potential benefit of low maintenance. In my class hierarchy I have identified two distinct levels:
Thus all the information required to process any individual entity is contained within its own class. This is called encapsulation. Every piece of code that is common to all entities is held within a single superclass and shared through inheritance. These are two principles of OOP that were specifically designed to promote low maintenance, and unless you can find any fault with my implementation of these two principles I think you will find it extremely difficult to come up with ANY design that could possibly offer less maintenance.
Jochen Daum wrote:
This means you write the same code for each table - select, insert, update, delete again and again. But basically its always the same.
You obviously have completely misunderstood my design. All the generic code for select, insert, update and delete exists within my abstract superclass. This is inherited, not duplicated, in each table class as each table class is a subclass of my abstract table class.
If having a separate class for each database table is not considered to be good OO, then can you please explain why Martin Fowler, supposedly an expert in this field, in his book Patterns of Enterprise Application Architecture has a pattern called Table Module? This contains the following text in its description:
A Table Module organizes domain logic with one class per table in the database, and a single instance of a class contains the various procedures that will act on the data.
There is yet another pattern called Class Table Inheritance which has the following in its description:
Class Table Inheritance supports this by using one database table per class in the inheritance structure.
If I am wrong then Martin Fowler is also wrong. Are you going to tell him, or shall I?
lawrence wrote:
In PHP it is possible to break encapsulation by directly referencing a class variable, rather than getting the value through an accessor method. If you teach people why this is stupid, then only the stupid people will continue to do it. Thus, preserving encapsulation, even when a language allows it, is clearly a practice of such merit that we can without reservation call it a 'best practice'.
If this is supposed to be an example of 'best practice' that everyone knows, then I'm afraid that everyone has been seriously mis-informed. This whole idea is based on one inaccurate statement after another leading to a false conclusion. Take the following:
If it can be demonstrated that any of the leading statements is false, then it must follow that the conclusion in the final statement is also false.
Consider these articles I found after a quick search of the internet using google and the word 'encapsulation':
The idea of implementation hiding is to do with the hiding of critical design decisions, such as whether a particular piece of data is obtained directly from the database or derived at runtime using some magical algorithm or even plucked out of thin air. It is information about the data, not the data itself.
Here is a definition of 'data hiding' I found at SearchDatabase.com:
Data hiding is a characteristic of object-oriented programming. Because an object can only be associated with data in predefined classes or templates, the object can only 'know' about the data it needs to know about. There is no possibility that someone maintaining the code may inadvertently point to or otherwise access the wrong data unintentionally. Thus, all data not required by an object can be said to be 'hidden'.
Did you notice that it said 'data not required by an object' and not 'data inside an object' or 'data belonging to an object'?
Encapsulation means the placing of data and the operations that perform on that data in the same class. Breaking encapsulation therefore means to NOT put the data and associated operations in the same class. It has nothing to do with variables, public or otherwise.
This is just a case of misassociation, misconception, misinformation, misunderstanding, misinterpretation and misrepresentation being marketed as truth which most of you seem willing to accept without question. I seem to be one of the rare individuals who is not so gullible, probably because in my many years of experience I have encountered many so-called 'truths' which have crumbled under close scrutiny. I do not accept anything as 'true' until it has been proven to my satisfaction.
As this so-called 'rule' regarding encapsulation and public variables has been proven to be totally without merit I do not see that it is a rule worth following. It is a non-rule. It has ceased to be. It is deceased. It is defunct. It has met its demise. It has expired. It has become extinct. It has snuffed it. It has gone belly-up. It has passed away. It is bereft of life. It is a late rule. It is an ex-rule. It does not exist. It has bought the farm. It has been deep-sixed. It has departed this life. It has popped its clogs. It has kicked the bucket. It has shuffled off this mortal coil. It is pushing up daisies. It has gone to meet its maker. It has joined the celestial choir. It has been carried away by the Grim Reaper. R.I.P.
Even more off the wall is the idea being touted around in some quarters that inheritance breaks encapsulation (refer to the third paragraph). OOP was founded on three basic principles, and these people have the nerve to say that one of these principles should be discarded because it interferes with one of the others? Puh-lease! Do you people use your brains for anything other than keeping your ears apart?
This comment came from Stephan Beal and was echoed by Zurab Davitiani.
This type of blanket statement always raises warning bells with me because I have encountered many of them in my long career and without exception they have proved to be full of holes. I do not accept such a rule until I see definitive proof and justification. If no proof is available then as far as I am concerned it falls into the same category as myths, legends and old wives' tales.
I was once told by someone 'You can't use that technique because it's inefficient'. When I asked him to provide some sort of proof to justify that statement he could not. His only response was 'I read it somewhere'. After a lot of digging I discovered that the statement was written about a much earlier version of the development tool running on much slower hardware, but when tested on the current versions the difference had completely disappeared. That statement had passed its sell-by date and was no longer relevant, yet that particular developer was still sticking to it, which meant that his method was now the least efficient.
Sometimes a so-called problem can be greatly exaggerated, or claimed to be applicable to every situation when in fact its scope is actually quite limited. Take the case of the COMPUTE verb in a COBOL project which I documented here.
Too few programmers today are able to judge the seriousness of a problem. They cannot tell whether it is major or minor, just that there is some kind of problem. What you need is a series of structured questions to help you gauge the scale of a problem:
If you have a problem that scores mostly a's, then you have a real problem. But if it scores mostly e's then you are making a mountain out of a molehill. Using this scale it is quite obvious that the so-called problem with the COMPUTE verb was just a minor problem in education and training which the team leader was just too lazy to implement. He found it easier to impose an outright ban. When I published my own standards in 1984 I chose a different approach. Which do you think is the more professional?
People sometimes impose blanket bans because they are either unable or unwilling to investigate the problem in any depth and therefore cannot judge whether a particular instance is beneficial or harmful. For example, there are some instances where using a technical primary key on a database table avoids certain problems. Certain people of a limited mentality have expanded this to mean that you MUST use a technical primary key on ALL database tables WITHOUT QUESTION. This is absolute nonsense as there are circumstances where a technical primary key can actually cause problems, as I have documented in my article Technical Keys - Their Uses and Abuses.
As far as I am concerned there are two ways of applying a rule - intelligently or indiscriminately. Whenever I see a rule applied indiscriminately I immediately suspect a lack of intelligence on the part of that rule's author. The following rules are prime candidates for rejection:
These rules show that the author has thought about the problem and is attempting to educate rather than dictate:
I have searched the web trying to find actual examples of how the use of public variables causes problems, and I have concluded that the scale of this problem is nowhere near as large as people would have you think. There are times when making a variable private instead of public, thus forcing you to use a 'getter' or a 'setter' is a good thing. In this way should it ever be necessary to adjust the data before it is input or output you only have to change the code inside the 'getter' or 'setter' rather than all those places which reference the 'getter' or 'setter'. If a variable is only used by internal functions then there is no reason to make it accessible to the outside world, in which case it should be declared private and not public.
There are circumstances where accessing a public variable cannot cause any problems therefore there is no advantage in banning its use. Even the authors of Java agree. The Java Tutorial in its chapter Controlling Access to Members of a Class says the following:
Declare public members only if such access cannot produce undesirable results if an outsider uses them.
Here are some examples from my article which fall into this category:
    $dbobject = new mytable;
    $dbobject->sql_select  = '...';
    $dbobject->sql_from    = '...';
    $dbobject->sql_where   = '...';
    $dbobject->sql_groupby = '...';
    $dbobject->sql_having  = '...';
    $dbobject->sql_orderby = '...';
    $data = $dbobject->getData($where);
In this example all the variables I am writing to are immediately processed by code within the getData method. These variables are not used by any other methods, therefore there is no harm in leaving them public.
* $dbobject->numrows will return the total number of rows which satisfied the selection criteria.
* $dbobject->pageno will return the current page number based on $rows_per_page.
* $dbobject->lastpage will return the last page number based on $rows_per_page.
These variables are set by code within my getData method and are destined to be output, therefore there is no harm in accessing them directly instead of via a getter.
There is an old saying in the engineering world: "If it ain't broke don't fix it". So, if my use of public variables does not cause a problem why do I need a solution?
André Næss wrote:
In general I think your design is bad because you have to create a new class for every table you create, and you have to recode both the table and the class every time you change the table, which obviously doubles the chances of bugs. I can't really see how this can save you very much work I guess. So when you add a field to your DB, you also have to change the class, this sort of dependency can quickly lead to maintenance problems, especially if someone else is supposed to use the classes.
Having a separate class for each individual database table is supposed to be a GOOD idea in OOP as it encapsulates all the information required to process that particular database table. This includes a list of all fields and their individual characteristics so that primary validation can be done by the generic code which is inherited from the superclass instead of having to write separate code to validate each table, plus the inclusion of custom code to process business rules which are specific to that table or entity.
In the 20+ years that I have been programming I have never come across a case where software which communicated with a file or database table did not have knowledge of the structure of that file or table built into it. If you amend the file's structure without amending the software's view of that structure then you quickly run into trouble. If your software tries to write to a column that no longer exists the whole operation will fail. If you add a column without amending your software then how can it possibly supply any value for that column? This principle holds true even for software which is built around a data dictionary or application model (and I have used several in the past 20 years) - if you change the database without making a corresponding change in the dictionary/model you *will* have a problem. So when you say that my software poses a maintenance problem and doubles the chance of bugs because each table class must be kept synchronised with the structure of its table then you could not possibly be any more wrong. My software incurs no more of a maintenance problem and causes no more bugs than any other software.
As an example let me compare my techniques with those contained within Building Database Interfaces, an article recently published on the Zend website which is supposed to demonstrate how things *should* be done according to the rules of OOP.
This is supposed to be the *right* way using 'setters':
    <?php
    $client = new Client();
    $client->setUserID($_POST['userID']);
    $client->setEmail($_POST['email']);
    $client->setFirstname($_POST['firstname']);
    $client->setLastname($_POST['lastname']);
    $client->setAddress1($_POST['address1']);
    $client->setAddress2($_POST['address2']);
    $client->setCity($_POST['city']);
    $client->setProvince($_POST['province']);
    $client->setCountry($_POST['country']);
    if ($client->submit($db) !== true) {
        // do error handling
    }
    ?>
This is my way:
    <?php
    $dbobject = new Client;
    $dbobject->updateRecord($_POST);
    $errors = $dbobject->getErrors();
    ?>
Why should I waste effort in unpacking the $_POST array and feeding the data in one field at a time when I can provide it all in one single step? It is just as easy for code inside the class to address each field as $array['field'] as it is by $this->field.
Whose method will require the most changes if the structure of that table changes?
I am not the only one who thinks this way. Take a look at Why getter and setter methods are evil.
Let us look at another area within the same article. Take this code which cleans the data:
    <?php
    $userID    = (int) $_POST['userID'];
    $email     = trim(addslashes($_POST['email']));
    $firstname = trim(addslashes($_POST['firstname']));
    $lastname  = trim(addslashes($_POST['lastname']));
    $address1  = trim(addslashes($_POST['address1']));
    $address2  = trim(addslashes($_POST['address2']));
    $city      = trim(addslashes($_POST['city']));
    $province  = trim(addslashes($_POST['province']));
    $country   = trim(addslashes($_POST['country']));
    ?>
This is the code I use:
    <?php
    $cleanarray = $this->validateUpdateArray($fieldarray, $this->fieldspec);
    if ($this->errors) {
        return false;
    }
    ?>
In this code $fieldarray is the array that was input via the updateRecord method, and $this->fieldspec (described in Using PHP Objects to access your Database Tables (Part 2)) is my array of field specifications for the table. My validateUpdateArray routine (which is contained within the superclass and automatically inherited by all subclasses) will examine each field in $fieldarray and where a field is specified as 'type' => 'string' it will automatically perform trim(addslashes()). Using other settings in my $fieldspec array I am also able to perform the following standard data checking:
Whose method will require the most changes if the structure of that table changes?
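To make the comparison concrete, here is a minimal sketch of validation that is driven by a field specification array. It is not the actual validateUpdateArray routine: the function name `validateFields` and the error messages are assumptions, and only the 'type', 'size' and 'required' settings and the trim(addslashes()) treatment of strings come from the article.

```php
<?php
// Hypothetical sketch: the function name and error messages are assumptions.
function validateFields(array $fieldarray, array $fieldspec, &$errors): array
{
    $clean  = [];
    $errors = [];
    foreach ($fieldspec as $field => $spec) {
        $value = isset($fieldarray[$field]) ? $fieldarray[$field] : '';
        if ($spec['type'] === 'string') {
            // Standard cleansing for every string field, driven by the spec.
            $value = trim(addslashes((string) $value));
        }
        if ($value === '' && ($spec['required'] ?? 'n') === 'y') {
            $errors[$field] = 'This field is required';
        } elseif (isset($spec['size']) && strlen((string) $value) > $spec['size']) {
            $errors[$field] = 'Value is longer than ' . $spec['size'] . ' characters';
        }
        $clean[$field] = $value;
    }
    return $clean;
}

$fieldspec = [
    'name' => ['type' => 'string', 'size' => 32, 'required' => 'y'],
    'city' => ['type' => 'string', 'size' => 5],
];
$clean = validateFields(['name' => "  O'Brien ", 'city' => 'Birmingham'], $fieldspec, $errors);
print_r($clean);
print_r($errors);
```

In a design like this, a new column needs one new entry in the specification array and no new validation code.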
Let us look at yet another area within the same article. Take this code which generates an sql UPDATE statement:
    <?php
    $DML = 'UPDATE client SET '.
           "firstName = '$firstName', ".
           "lastName = '$lastName', ".
           "emailAddress = '$emailAddress' ".
           "WHERE clientID = $clientID";
    if ($db->Execute($DML)) {
        return true;
    } else {
        return false;
    }
    ?>
This is the code that I use:
    <?php
    $DML->updateRecord($this->dbname, $this->tablename, $fieldarray, $this->fieldspec);
    $this->errors = $DML->getErrors();
    ?>
Here I am using a separate object to construct and execute all my SQL/DML statements. Notice here that both the database name and the table name are passed in as arguments. They are standard member variables within each table subclass and are assigned actual values within the class constructor. The WHERE clause of the UPDATE statement is easy to deal with as my $fieldspec array identifies those fields which are part of the primary key. For details on how this is actually implemented please refer to updateRecord Method.
Whose method will require the most changes if the structure of that table changes?
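As a sketch of how an UPDATE statement can be assembled from a field array and a field specification, consider the following. It is not the actual updateRecord method: the function name `buildUpdateSql` and the `'pkey' => 'y'` marker are assumptions, since the article says only that the specification identifies the primary key fields. The addslashes() escaping mirrors the code quoted above; a current data access layer would normally use the database driver's own escaping or prepared statements instead.

```php
<?php
// Hypothetical sketch: the function name and the 'pkey' marker are assumptions.
function buildUpdateSql(string $tablename, array $fieldarray, array $fieldspec): string
{
    $set   = [];
    $where = [];
    foreach ($fieldspec as $field => $spec) {
        if (!array_key_exists($field, $fieldarray)) {
            continue;           // field not supplied, so leave it alone
        }
        $pair = $field . "='" . addslashes((string) $fieldarray[$field]) . "'";
        if (($spec['pkey'] ?? 'n') === 'y') {
            $where[] = $pair;   // primary key fields identify the row
        } else {
            $set[] = $pair;     // every other field is updated
        }
    }
    return "UPDATE $tablename SET " . implode(', ', $set)
         . ' WHERE ' . implode(' AND ', $where);
}

$fieldspec = [
    'clientID'  => ['type' => 'integer', 'pkey' => 'y'],
    'firstName' => ['type' => 'string'],
    'lastName'  => ['type' => 'string'],
];
$fieldarray = ['clientID' => 7, 'firstName' => 'Ann', 'lastName' => "O'Hara"];
echo buildUpdateSql('client', $fieldarray, $fieldspec), "\n";
```

No column name appears in the function itself, so a change to the table structure is confined to the specification array.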
If you look closely at the code I think you will find that not only does my code NOT produce more of a maintenance problem when compared with what is supposed to be the *right* way, you should observe that it actually produces LESS of a problem as there are far fewer lines of code that would need to be modified if the structure of that table ever changed.
He also wrote:
In your approach a change in the abstract superclass could potentially wreak havoc on existing code. You can't know, all you know is that you have an enormous amount of dependency, and that's a bad thing. It could be as simple as you adding a variable for some purpose, forgetting that in one of the hundreds of derived classes you have shadowed this variable, and before you know it parts of your applications starts behaving randomly. That's no fun.
One of the benefits of reusable code is that you can write it once and share it many times. One of the drawbacks of reusable code is that you can screw up that one copy and you effectively screw up every place where it is shared. You simply have to weigh up the benefits and risks of a particular implementation, make a choice, then live with the consequences. My design has the same ratio of risk/benefit as any other design, so your statement has no value (IMHO). It is possible that my design can be broken by a fool, but then so can everybody else's. It is simply not possible to make a design foolproof - as soon as you think you've cracked it the universe will promptly invent a better class of fool.
André Næss wrote:
If you say that these variables are in fact selection criteria, then certainly you must agree that seeing as $where is a collection of selection criteria, and it's being passed to getData(), it would be much more consistent to pass them as parameters!
I do not see how that would provide any sort of improvement. It is just *different* to satisfy your personal preference.
First of all let me reiterate the fact that all OO programming languages allow 3 possible ways to feed data into an object:
* [...] set method, as in $object->setVariable('value');
Secondly, let me say that I do not have to choose just one of those options and stick to it religiously just to be consistent - I will use whatever option that seems appropriate for the particular circumstances.
Where a method has potentially a large number of arguments, and most of these come from the $_GET/$_POST array and may not actually be provided, I have found code such as the following to be the most efficient:
    <?php
    if (isset($_GET['page'])) {
        $dbobject->setPageNo($_GET['page']);
    } // if
    if (isset($_GET['orderby'])) {
        $dbobject->setOrderBy($_GET['orderby']);
    } // if
    if (isset($_GET['orderbyseq'])) {
        $dbobject->setOrderBySeq($_GET['orderbyseq']);
    } // if
    if (isset($search)) {
        $dbobject->setSqlSearch($search);
    } // if
    if (isset($_POST['reset']) or (isset($_POST['reset_x']))) {
        // initialise all settings to start afresh
        $dbobject->setPageNo(NULL);
        $dbobject->setOrderBy(NULL);
        $dbobject->setOrderBySeq(NULL);
        $dbobject->setSqlSearch(NULL);
    } // if
    $fieldarray = $dbobject->getData($where);
    $errors = $dbobject->getErrors();
    ?>
I have found this approach to have the following advantages:
* [...] set method I can include code within that method to cleanse the input of any bad values instead of having to include the code within the method which uses that variable, as in the following:
    function setPageNo ($pageno)
    {
        $this->pageno = abs((int)$pageno);
    }
That is just my personal preference, and I see no need to change it just to please you.
(sigh) I do wish you guys would not keep using such big words. I have had enough trouble with 'encapsulation' and 'polymorphism'.
André Næss wrote:
Your design is less reusable, because the validation code cannot be used in other places where it makes sense due to the lack of orthogonality.
So I asked: What has bird watching got to do with it?
He replied with http://www.britannica.com/dictionary?book=Dictionary&va=orthogonality
Two can play at that game, so I came up with http://dictionary.reference.com/search?q=orthogonal which contains the following reference:
Also used loosely to mean 'irrelevant to', e.g. 'This may be orthogonal to the discussion, but ...', similar to 'going off at a tangent'.
I know that I am not the one who keeps going off at a tangent and losing the plot.
André Næss wrote:
Your design is less reusable, because the validation code cannot be used in other places where it makes sense due to the lack of orthogonality. The design is inherently more static, and static designs are always less resistant to change.
I don't think you understand my design at all. The code which performs primary validation by comparing the contents of the $_POST array with the field specifications in the $fieldspec array is automatically invoked on every insert and update operation. All the developer has to do when creating a class for a table is set the contents of the $fieldspec array and the inherited code will take care of everything else. What is less reusable about that?
My validation code is used in every place where it makes sense. I have built several web applications using 48 tables spread across 4 databases, and which are utilised by over 200 components, so my design has been tested quite thoroughly and has met every challenge.
I don't understand the accusation that my design is too static and therefore less resistant to change. Each field in the database is of a particular type and size, therefore the primary validation is limited to checking that the data input by the user conforms to these specifications. As the number of data types is quite static the code I need to validate each data type is just as static, but it is flexible enough to deal with every combination of data type and size that is thrown at it. Where is the problem with that? Tell me, is your validation code contained within a single reusable object, or do you have to write it separately for each table?
When it comes to secondary or custom validation I have provided the following abstract methods in my superclass which are automatically called at the relevant point in the processing cycle:
* _cm_validateInsert for records being inserted.
* _cm_validateUpdate for records being updated.
* _cm_commonValidation for records being either inserted or updated.
* _cm_validateDelete for records being deleted.
I think I have covered all the bases. Do you think you can do any better?
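The way such hook methods slot into the processing cycle can be sketched as follows. Apart from the `_cm_` method names, everything here is an assumption made for illustration: the class names, the body of insertRecord and the sample business rule. The syntax is PHP 5 or later, and the hooks are given empty default bodies rather than being declared `abstract` so that a subclass only has to override the ones it needs.

```php
<?php
// Illustrative only: apart from the _cm_* names, all details are assumptions.
abstract class HookedTable
{
    public $errors = [];

    public function insertRecord(array $fieldarray): bool
    {
        $this->errors = [];
        // Primary validation would run here, followed by the custom hooks.
        $fieldarray = $this->_cm_commonValidation($fieldarray);
        $fieldarray = $this->_cm_validateInsert($fieldarray);
        if ($this->errors) {
            return false;
        }
        // ... the generic INSERT would be issued here ...
        return true;
    }

    // Default hooks do nothing; a table class overrides only what it needs.
    protected function _cm_commonValidation(array $fieldarray): array { return $fieldarray; }
    protected function _cm_validateInsert(array $fieldarray): array { return $fieldarray; }
}

class Invoice extends HookedTable
{
    // A secondary (business) rule that applies to this table only.
    protected function _cm_commonValidation(array $fieldarray): array
    {
        if ($fieldarray['due_date'] < $fieldarray['invoice_date']) {
            $this->errors['due_date'] = 'Due date cannot be earlier than invoice date';
        }
        return $fieldarray;
    }
}

$invoice = new Invoice;
var_dump($invoice->insertRecord(['invoice_date' => '2004-11-25', 'due_date' => '2004-11-01']));
print_r($invoice->errors);
```

The generic insertRecord never changes; each table class supplies only its own rules.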
André Næss wrote:
Say I wanted to talk to the table 'cars'. I'd compose an object by creating an instance of a Table class and add rules represented as objects to it. Of course, the business rules should be taken care of by the DBMS, not your application code, but unfortunately that is not completely possible in current SQL systems.
But if I put the business rules inside the database instead of the class then won't this break encapsulation? Also, no DBMS can handle business rules in a 'user friendly' manner, they can either accept an insert/update or reject it. In the case of a failure the error message may not be very useful. That is why it is necessary to write program code to validate the user's input before it gets sent to the database. It has been this way for decades, and will still be this way for many more decades.
So let's take a simple rule like 'the name of the car should be at least 1 character, and no more than 32'. To model this I'd create a String object that has this limitation built in, and link it to the 'name' field in the Table object.
I do not like this idea for the following reasons:
$this->fieldspec['name'] = array('type' => 'string', 'size' => 32, 'required' => 'y');
Other common data types are Number, Date, File, Time, Creditcard etc. These are fairly simple type validations, but they should go a long way. You'll probably also want to have some sort of generic Rule object for rare cases where none of the available objects suffice, and you don't want to create a new class. It is of course very important that these things are implemented separately, so that they can be reused whenever you need to validate any data.
You mean have a separate object to deal with each of these data types? I prefer to have all my primary validation code in a single place and have it driven by the contents of my $fieldspec array. It is simple and it works very effectively, so I won't waste any time investigating your method.
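As a rough illustration of validation driven by a field specification, the sketch below checks a row against an array laid out like the 'name' entry shown earlier. The function name validateRow(), the 'integer' type and the error wording are assumptions for the example, not the actual code, and it is written for a current PHP release.

```php
<?php
// Hypothetical sketch: the array layout follows the 'name' entry quoted
// earlier; the function name and the 'integer' type are assumptions.
function validateRow(array $fieldspec, array $row): array
{
    $errors = [];
    foreach ($fieldspec as $field => $spec) {
        $value = isset($row[$field]) ? trim((string) $row[$field]) : '';

        if ($value === '') {
            if (($spec['required'] ?? 'n') === 'y') {
                $errors[$field] = "$field is required";
            }
            continue; // nothing more to check on an empty value
        }
        // strlen() counts bytes, which is enough for this illustration.
        if (isset($spec['size']) && strlen($value) > $spec['size']) {
            $errors[$field] = "$field cannot exceed {$spec['size']} characters";
            continue;
        }
        if ($spec['type'] === 'integer' && !preg_match('/^-?\d+$/', $value)) {
            $errors[$field] = "$field must be a whole number";
        }
    }
    return $errors;
}

$fieldspec = [
    'name' => ['type' => 'string',  'size' => 32, 'required' => 'y'],
    'year' => ['type' => 'integer', 'size' => 4],
];

// Reports one error for the empty name and one for the malformed year.
print_r(validateRow($fieldspec, ['name' => '', 'year' => '19x8']));
```

One function serves every table because only the specification array changes from table to table.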
But the database can enforce a lot of constraints, and one should really put as much business logic into the database as possible because:
* It's less work
* It's safer from bugs, because the DBMS developers have (hopefully) tested it properly.
* It's most likely faster.
I do not like this idea for the following reasons:
I see that you among other things have code to enforce foreign key constraints, but wouldn't it be much better not to need that code at all?
The code has to go somewhere, and I prefer to have it all in one place where it is easy to get at and easy to maintain. I have worked on a system where half the code was in program modules while the other half was spread around a multitude of database triggers and stored procedures, and I can only describe it as a 'mucking fess' (if you catch my drift). Besides, foreign key constraints in the database will only work in the way that the database designers allow them to work, but if the constraints are handled in my code then I have total control and total flexibility over how they work.
André Næss wrote:
There is a very common error which people do the first time they do OO, and that is to overuse inheritance. It's not that strange really as there's always a lot of fighting over inheritance. But object composition is a second technique that frequently applies. Where you see only inheritance, I see inheritance and composition.
I have investigated this thing called 'object composition' and found the following references:
At http://www.eagle-software.com/object.htm I found:
Object composition allows you to group components together, creating a new component. Composition of visual components requires a container component, that acts as a parent window for its subcomponents.
As this is talking about graphical/visual components I find it of no use whatsoever.
At http://brighton.ncsa.uiuc.edu/~prajlich/T/node14.html I found:
Most designers overuse inheritance, resulting in large inheritance hierarchies that can become hard to deal with. Object composition is a different method of reusing functionality. Objects are composed to achieve more complex functionality. The disadvantage of object composition is that the behavior of the system may be harder to understand just by looking at the source code. A system using object composition may be very dynamic in nature so it may require running the system to get a deeper understanding of how the different objects cooperate.
In the first place I do not have a large inheritance hierarchy (just one abstract class and one level of subclass below it) therefore I am not overusing inheritance by any stretch of the imagination. If I don't have a problem then why do I need a solution?
In the second place I am not going to change a method which is simple and effective for one that introduces another level of complexity for absolutely no discernible benefit. This is another one of those looney OO ideas that I shall consign to the khazi.
Interestingly enough the same article also contains this:
One way around this problem is to only inherit from abstract classes.
Guess whose method is only inheriting from a single abstract class? Answers on a postcard to .....
In response to my statement 'The language allows me to do it, therefore it cannot be wrong'
André Næss wrote:
It depends on your usage of the word wrong. C allows me to write code that generates random segmentation faults, and assembler allows me to write self-modifying code.
Any programmer who writes code that causes such problems has obviously not been trained correctly. If he has been trained but still does it then he is an idiot. If I ever come across such problem code I investigate it to find out the seriousness of the problem - does it happen always or only in some circumstances - then I document it so that my findings can be passed on to others.
The ability to write self-modifying code sounds like the ALTER verb in COBOL. I wrote about that in my COBOL Programming Standards way back in 1984.
If the language has a statement/command/verb which does not cause problems, or my usage of it avoids any such problems, then I do not see any reason why I should not use it. If you have a problem with that then the problem is yours, not mine. As a software engineer my task is to write code that performs its task reliably and efficiently, and I will use any command that the language provides in any way I see fit to achieve that objective. Any command which is proven to be unreliable or inefficient I will avoid like the plague, as any competent programmer should. Notice the requirement for proof, not the acceptance of unsubstantiated opinion.
André Næss wrote:
Each language has it's collection common of knowledge known as best-practices. If you want to write good software in whatever language you choose there are basically two approaches:
- I'm so much smarter than all the people who have worked with this language before, I can do it better, because I *really* know how it should be done. Screw the best-practices and the knowledge they have collected through decades of experience, I'll do it my way.
- I'll better listen to the masters if I want to be an expert in this language.
But who exactly are my 'masters'? Where is this 'best practice' documented?
The project leader who decided that the COMPUTE verb was bad was clearly not a master of anything as I eventually ignored his project standards, his idea of 'best practice', and produced software that was praised as being the most well written, well structured and well documented in the whole project when audited by a senior consultant.
I ignored his development standards and began to create a set of my own. Several years later when I was a senior programmer in another company my personal standards were adopted as the company standards. Some time later, after my standards had been used successfully on several projects, I received praise from 2 quarters. My fellow workers came to me with their questions and their problems, so to them I was the master, the guru.
Several years later the company switched to a new language for which I developed new standards. Again I was able to produce a development environment that was unrivaled in its productivity. I was still the master.
In 1999 I joined a team of clowns who attempted to implement the 3 tier architecture in Uniface, my primary language at the time. They had a team of 6 developers who took 6 months to design and build a development infrastructure, but when it came to building live components they found that it was taking weeks instead of the estimated days. When I told them that their entire methodology was wrong they said 'It cannot be, we are following all the rules!' They asked me how I would do it, but when I told them they said 'You cannot possibly do it that way because it is the wrong way.' The client found their implementation schedule to be totally unacceptable, so the whole project was cancelled. As I had time on my hands I set about converting my existing 2-tier development infrastructure into 3-tier, and in a space of two weeks the conversion was complete and I was able to create working components in less than an hour apiece. Only someone of questionable mental ability could possibly conclude that their methodology, which took weeks to build a single component, was *right* whereas my methodology, which took hours to build the same components, was *wrong*. I found their approach and their attitude so laughable that I documented their failure in an article on my website called UNIFACE and the N-Tier Architecture and later in Disaster #3. I followed it by documenting my success in 3 Tiers, 2 Models, and XML Streams.
By this time I was getting extremely frustrated with other people's attempts at defining their version of 'best practice' so I wrote Development Standards - Limitation or Inspiration? which highlighted all the areas where their techniques were actually slowing down the development process instead of helping to speed it up.
Later on I joined another company who called themselves 'innovators' and who boasted that they had successfully implemented the 3 tier architecture. Although it worked it was extremely inefficient compared with what I was used to. I found so many places where their standards created obstacles to speedy development that I wrote them a 17 page 9,000 word document which highlighted all the deficiencies which I found. They seemed content with their development environment which took 1½ days to build a simple component until I showed them that using my own techniques I could create the same component in under an hour. I documented their 'success story' in How not to implement the 3 Tier architecture in UNIFACE.
The last article I wrote on this particular language was called UNIFACE is not a Rapid Application Development tool in which I stated that although the language had the potential for being a RAD tool every implementation I had ever seen was so ineffective and inefficient that it had obliterated that potential and turned the 'R' in RAD from 'rapid' to 'retarded'.
If the term 'masters' is supposed to identify those of superior knowledge and ability then I can only say that I have not met anyone who fits that description for over 20 years. The only thing that others seem eager to teach me is how to slow down development instead of speed it up, how to create obstacle courses for developers instead of smooth paths. This only has the effect of teaching me which practices to avoid like the plague.
Lawrence wrote:
If all the programmers using one language adopt some practice because it seems great then that becomes a 'best practice' for the language. Look around and you'll see dozens of examples of this everyday. 'Do not break encapsulation' is a good example of a 'best practice'. We are talking about those practices that programmers tend to pick up and imitate. If a company (or, more likely, individual programmer) develops a practice and it catches on worldwide, and most of the programmers using that language adopt it as one of the loose rules to live by, then it can be considered a best practice for that language. If a company comes up with a rule and the rule does not catch on worldwide, then the rule is not a best practice for the language.
Just because a lot of programmers follow a particular practice does not necessarily make it 'best', just 'most imitated'. Some follow a practice because they don't know any better. Others follow a practice because they are told to by people who are supposed to be their 'superiors', and they dare not disobey. I have already taken your don't break encapsulation rule and shown it to be a pile of pooh, and I have also shown that the method of using abstract classes by so-called good designers goes totally against the principles of OOP.
In response to my questions 'Who creates these best practices? Where are they published?' André Næss wrote:
Nobody creates them, they emerge by the collective work of thousands of developers.
But somebody must write them down and publish them, otherwise how are the rest of us mere mortals supposed to know about them? Have these standards been published anywhere?
No and yes. There is no bible, but there is a large body of books and articles and newsgroup discussions that all taken together is the collected knowledge.
Knowledge that is scattered far and wide and is not readily accessible is of no use to anyone. There is just not enough time to read every book that has been printed, every article that has been published, every posting in every newsgroup. The novice programmer expects to go to the library of knowledge and pick out one or two volumes he can read and digest and not have to search through every document in the entire building. Even among all this collected knowledge you will find disagreement, different interpretations and conflicting opinions. How is the novice programmer supposed to separate the wheat from the chaff? The pragmatic from the dogmatic? The awesome from the awful? The beneficial from the barmy? The creditable from the crap? The definitive from the defective? The excellent from the excrement? The fact from the fiction? The great from the grotesque? The harmonious from the hazardous? The important from the irrelevant? The joyous from the jaded? The knowledge from the sewage? The laudable from the lamentable? The meritorious from the mediocre? The notable from the naff? The obvious from the obfuscated? The praiseworthy from the pathetic? The quintessential from the questionable? The ripe from the rancid? The sweet from the sour? The truth from the tripe? The useful from the useless? The veritable from the vomitable? The worthy from the worthless?
Certain books becomes classics, classics are considered such because a large number of people view them as such. They might of course all be wrong, which is why we have the occasional paradigm shift :)
For every book you find which advocates one particular theory or method you will find others which support a contradictory theory or method. They cannot all be right, and they cannot all be wrong. Each theory will find its own group of supporters, and each group may have success while following their chosen path. If there is room in the universe for all these different methods and theories to co-exist, then why cannot I be allowed to choose my own method of implementing the principles of OOP?
All I can say that I assume nobody comes up with practices because they want to be *worse* programmers, so I think it's safe to assume that they do so to produce better software, to be better at their craft.
Nobody deliberately publishes a document of 'worst practice' (unless they have a sense of humour, in which case take a look at How To Write Unmaintainable Code), but you have to remember that each document is not the best that the entire universe of programmers has to offer but the best that a particular individual (or group of individuals) has to offer. It will be limited to his/their experiences and abilities, and not the sum total of experience and abilities of the entire universe of programmers. It is therefore highly probable that someone of greater experience or ability will eventually come along and produce something that is even better. This is called progress. When I was teaching myself PHP I took a look at some of the methods and techniques that had been published and I said to myself 'I can do better than this.' And I have.
Most of the examples of 'best practice' that I have encountered in my long career have been nothing more than brain dumps from people recently promoted to team/project leader. Instead of saying 'this is the best way to do it' their approach is 'this is the way that I do it, and I want everybody to do it the same way'. These people do not like their opinions being questioned and so refuse to discuss the possibility that there may be a better way. They do not experiment with various ways and evaluate the results to find out what is best, they simply pick an idea out of a hat (or so it seems to me) and stick with it regardless of how effective and efficient it is. Because they cannot be bothered to look for better ways they will never improve and will therefore always be second-rate.
In all the years that I have been programming I have never stopped looking for a better way, a quicker way, a more flexible way. Most of the time it has been down to a little change here, a little tweak there. Individually the changes may appear to be insignificant, but taken collectively there is a big difference. I regularly ignore the advice of those who consider themselves to be masters and I regularly outperform their feeble efforts. I did it in COBOL, I did it in UNIFACE, and it looks like I've done it again with PHP.
André Næss wrote:
If I was to code this I'd probably write a sort of PagedPresentation thingy, which is meant for this sort of situation. PagedPresentation should just supply a framework for creating paged presentations, and it has to be coupled with a data source. The data source supplies the PagedPresentation with data, and the PagedPresentation transforms these data as defined by the developer using some sort of template (a simple HTML/PHP mix is sufficient).
But there obviously has to be a connection between the data source and the presentation, because the presentation outputs stuff which in the end has to result in $_GET variables which define what page to display. And this data then has to be used to figure out what rows to fetch.
So, we have to figure out how to implement this as clean as possible. Let us start by considering the purist solution, which is to fetch *all* the rows and supply them to the presentation layer, which outputs the rows in question. How do we achieve this? Well, presumably we have a structure where a presentation module must request data from it's data source, so in the presentation module we have something like this:
$collection = $source->getData();
The source defines *what* the data are, of course. In most cases a data source is just an abstraction of a particular SQL query. getData() returns an collection, because after all, we are requesting a (possibly empty) collection of data.
So now the presentation layer is about to begin it's work, what we in a PagedPresentation would expect is something like this:
$offset = $_GET['pageNum'] * $itemsPerPage;
$data = range($collection, $offset, $itemsPerPage);
applyTemplate($data);
So does this interface make sense? It does IMO, and it can be implemented in a fashion that makes it as efficient as a solution which supplies the offset and the number of items per page at an earlier stage. Why? Because we don't have to actually perform the query until the data are accessed, and they aren't really accessed until applyTemplate() is called! In this case range() is a selection utility which defines a certain subrange of the collection, but it doesn't really have to do it, nor does $source need be an actual query result, they are just a facade designed to make the interface coherent and highly reusable, the implementation can be made as efficient as possible using any trick in the book.
So let me stress that this was the *implementation* of PagedPresentation, to use PagedPresentation we would expect an interface like this:
$source = new DataSource('SELECT ALL RED CARS FROM 1998');
$presentation = new PagedPresentation($source, $templateFile, $itemsPerPage);
$presentation->display();
All highly simplified, of course.
Well, umm, that may ring your bell (ding dong), but it sounds like *dung* to me.
Glen Vermeylen wrote:
A part of your solution is Object Oriented but I have some considerations: Tables are part of a database, your approach only leaves room to work with one database.
Wrong. Each class constructor provides a database name along with the table name, and each call to my db_connect function passes this in as $this->dbname. It is therefore possible to access multiple databases in the same application, which I do quite regularly, or even in the same transaction, which I do almost as regularly.
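A minimal sketch of that arrangement might look like the following. The names db_connect() and $this->dbname come from the text above, while the stub body, the class names, the getData() method and the database and table names are assumptions for the illustration. The stub opens no real connection and simply reports which database was requested.

```php
<?php
// Hypothetical sketch: db_connect() and $this->dbname are named in the text;
// the stub body, class names and database names are assumptions.
function db_connect(string $dbname): string
{
    // A real version would open or reuse a connection and select $dbname.
    // This stub only reports which database was asked for.
    return "connected:$dbname";
}

abstract class DefaultTable
{
    protected string $dbname;
    protected string $tablename;

    public function getData(): string
    {
        $link = db_connect($this->dbname);
        return "$link SELECT * FROM {$this->tablename}";
    }
}

class Car extends DefaultTable
{
    public function __construct()
    {
        $this->dbname    = 'garage';
        $this->tablename = 'cars';
    }
}

class Invoice extends DefaultTable
{
    public function __construct()
    {
        $this->dbname    = 'accounts';
        $this->tablename = 'invoice';
    }
}

// Two tables from two different databases used in the same script.
echo (new Car())->getData(), "\n";
echo (new Invoice())->getData(), "\n";
```

Because each object carries its own database name, the calling script does not need to know or care where a table lives.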
If you create a database class (as in: $db = new dataBase('server', 'username', 'password') you could have your table class make use of this dataBase class as in $table = new table($db). Then you can distanciate yourself from the whereabouts of the tables and only worry about their data.
What would be the benefit in that?
Considering your getData() - this requires you to have insight into the inner mechanics of your class (you have to set your desired pagesize, and you have to go through trouble if you don't want to get the standard subsequent pages, set by pagesize).
You do not need to know the mechanics of the class, just the fact that there is a variable called $rows_per_page. If you can't be bothered to set it then it will use the default value. If you really want to change the page size then how is it so much trouble to issue the single command $dbobject->setRowsPerPage(n)? It is hardly rocket science.
I suggest you have a method $table->getData(selection_criteria), which gives you every result there is.
No way, José. I would never retrieve thousands of rows if all I wanted to display was 10. Do you have any idea how inefficient that would be?
Then you could have a method $table->getPage(selection_criteria, pagesize='defaultvalue', pagenumber='1'). (the '=' will use a defaultvalue, if you don't pass that parameter).
I could, but I won't. In my method I only need to specify pagesize and pagenumber when I actually want to change the current values. See 'You should use arguments, not variables' for other reasons.
Now you can make a getNextPage(), getPrevPage(), ... which do what you can guess... :).
No thanks. I prefer using a single method that can get ANY page rather than multiple methods that can only get a particular page. It is simple, efficient and elegant. These are concepts that you would do well to learn.
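To make the single-method idea concrete, here is a minimal sketch in which the page size and page number are held as object variables and one method builds the query for whichever page is current. $rows_per_page is named in the text above. The setter spellings, $pageno, buildQuery() and the use of LIMIT/OFFSET syntax are assumptions for the illustration, and the total row count is passed in rather than queried.

```php
<?php
// Hypothetical sketch: $rows_per_page comes from the text; the setter names,
// $pageno and buildQuery() are assumed for this example.
class Table
{
    protected int $rows_per_page = 10; // default used when nothing is set
    protected int $pageno = 1;

    public function setRowsPerPage(int $rows): void
    {
        $this->rows_per_page = max(1, $rows);
    }

    public function setPageNo(int $pageno): void
    {
        $this->pageno = max(1, $pageno);
    }

    // Builds the query for the current page, whichever page that is.
    public function buildQuery(string $table, int $totalRows): string
    {
        $lastPage = max(1, (int) ceil($totalRows / $this->rows_per_page));
        $page     = min($this->pageno, $lastPage); // never run past the end
        $offset   = ($page - 1) * $this->rows_per_page;
        return "SELECT * FROM $table LIMIT {$this->rows_per_page} OFFSET $offset";
    }
}

$cars = new Table();
$cars->setPageNo(3);
echo $cars->buildQuery('cars', 1000), "\n";
// prints: SELECT * FROM cars LIMIT 10 OFFSET 20
```

Only ten rows are requested from the database no matter how many match, and moving to the next, previous, first or last page is just a different value passed to the same setter.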
I don't know anything about error-handling with php, ...
Now why does that not surprise me?
... but you might want to consider tying mysql-specific errors to your mysql-classes.
$results = $table->getPage(...);
if ($table->hasError())
    $table->printError();
That way, with delegation (and overriding) your $db errors can be delegated to $table...
And what exactly do you find wrong with my present error handler? What does your technique do that mine does not?
I hope this has been of any help.
Absolutely no help whatsoever.
ps: This is not written out of arrogance and I don't think I know it better than anyone else. I'm just a student, and also was pondering about a nice oo-approach to php/mysql scripting.
Here's my advice (for what it's worth):
This is just a mere braindump.
It is certainly a 'dump' of some kind, but I suspect from a part of your anatomy which is nowhere near your brain.
I think I have demonstrated that by simply following the principles of OOP and by using the OO capabilities within PHP I have managed to produce software that is truly object oriented and which enables me to create new components very quickly. Most of the criticisms I have received have been along the lines of 'you must not do it that way because it breaks the rules.' What rules? There is no rule in the principles of OOP which says I *must* create a class for *this* and not for *that*. There is no rule that says I *must* use inheritance in *this* way and not *that* way. There is no rule that says I *must* use variables in *this* way and not *that* way. These 'rules' may have been formulated by various individuals to show how they personally have chosen to implement the principles of OOP, but they are in addition to the principles, not part of them. They do not identify the only way that the principles can be implemented, just one way out of a myriad of possibilities. I just happen to have chosen a different method.
My method is right for me, your method is right for you, and Tom/Dick/Harry's method is right for Tom/Dick/Harry. Provided that these different methods are capable of producing workable and maintainable software each should be allowed to exist and be judged by the results it produces rather than being slammed simply for 'being different' or 'not invented here'. As the French would say, 'Vive la différence'. If everybody always did things the same way there would be no progress made, ever. I choose to do things differently because I have a creative streak. I choose to innovate, not imitate. I don't like relying on other people's methods when I reckon that I can do better.
Instead of reviewing my work and saying 'It is different, but it works' you have chosen 'It is different, therefore it is unacceptable'. That shows that you have closed your minds to the possibility that there may be different and better ways. Just like a religious fanatic you seem to think that your way is the only way, that yours is the only 'true' religion, and anyone who refuses to kowtow at the altar of your religion is an unbeliever, a heretic, a nonconformist and a renegade. To that charge I plead guilty.
I have met people like you before, people who take a perfectly good set of principles and make them totally unworkable by adding on layer upon layer of extraneous rules and regulations. This crowd of clowns tried to create a development infrastructure based on the 3 tier architecture, but after spending 3 man-years designing and building their monstrosity they discovered that it was taking weeks to build each live component. I took my own 2-tier infrastructure and converted it into 3-tier, and within 2 man-weeks I was able to create working components in under an hour each. So how did I manage to succeed where they had failed? Simply because I ignored their petty rules and regulations and stuck to the basic principles. I used my knowledge of the language to write code that fulfilled the principles as efficiently as possible.
You have taken this concept of orthogonality [1] too far and wandered off at a tangent. You have strayed so far from the path of righteousness that you have ended up in a place which is sticky, smelly and downright unpleasant. Yet you have the audacity to tell me that I am wrong! Pull the other one chaps, it's got bells on.
As far as I am concerned you bunch of tragicians [2] have taken a reasonable (note that I do not go so far as to say 'brilliant') concept called OOP and turned it into POOP (as in pooh, bowel movements, buffalo chips, compost, cow pats, crap, doo-doo, doggy-doo, droppings, dung, effluent, excrement, excreta, faeces, fertilizer, guano, jobbies, manure, muck, prairie oysters, road apples, sewage, turds [3]). Your petty rules are a hindrance to productivity rather than a help, so I suggest you take them away and flush them down the toilet where they belong. Instead of being OOPers you are POOPers [4].
In his article Object Oriented Programming Oversold! the author devotes a section to Cult Oriented Programming in which he identifies two types of zealotry - ideological and practical. The ideological zealot will follow a methodology to the letter without regard to the effect it has on the result ('I have followed all the rules therefore it cannot be wrong!') whereas the practical zealot will use his skills and abilities to get the best results in the shortest possible time, even if it means ignoring any aspects of the methodology that get in the way. I prefer to worship at the altar of pragmatism, not dogmatism.
Computer programming is an art, not a science, therefore you will never be any good unless you have some artistic skill. The purpose of all these different methodologies with their different rules is to offer the budding artist a sort of 'painting by numbers' kit. A true artist will quickly see beyond the limitations of any particular kit and experiment with different rules, methodologies and techniques in order to produce a bigger, better and brighter picture. Those without any worthwhile talent will struggle blindly along and complain 'I am following all the rules, so why is the result not very good?'
If I were to write my code according to your rules then I would be no better than you. But I believe that your best is simply not good enough, not by a long chalk, and rather than work down to your level I prefer to seek better ways. The only way to improve something is to try a different approach, a different angle, a different way. Inevitably this means adopting a different set of rules. I have been disregarding other people's views of 'best practice' for over 20 years and I have regularly and consistently produced work of a higher quality. I see no reason to take any of your advice and reverse that trend. My long experience has given me the ability to tell the difference between excellent and excrement, between magic and tragic, so I know which category your advice falls into.
André Næss wrote:
In general I think it takes something like 10 years to become a good programmer .....
I've been working with OO for something like 5 years, so I feel that I have a fairly good grasp on it.
So by your own admission you are only halfway there. Or does it mean that you are only half as good as you think you are?
Of course, there's also a bit of designer's gut-feeling here.
I have been designing and building software for over 25 years and I have learnt to tell the difference between something that works and something that fails, something that aids productivity and something that obstructs it. I may only have been programming with objects for a relatively short time, but I have been programming with components (which are not that much different) for the past 10 years. I found it relatively easy to take a successful design from my previous component-based development language and implement it equally successfully in PHP using OOP techniques. It is not just my gut that tells me that the infrastructure I have created is a neat piece of work, it is the fact that I have used it to create over 200 components which deal with a wide range of database structures (one-to-many, many-to-many, even tree structures). I can create new components faster than most people can draw their silly UML diagrams. For example, I can create a database table and write a family of six components (list, search, add, update, enquire, delete) in under an hour, making an average of one component every 10 minutes. Can you match that?
I regard myself as a well-rounded 'software engineer' and therefore a breed apart from most of today's narrow-minded OO programmers who have only been taught one discipline, and judging by the comments I have received regarding my article, taught very badly at that. I am only interested in building practical solutions to real-world problems, not in playing around with impractical theories which have been misinterpreted and mangled beyond all recognition. In the story The Engineer vs the IT Consultant I always side with the engineer's point of view, not the IT nerd.
If you think my ideas and views are total rubbish then don't bother to try out my sample forms which are documented in A Sample PHP Application. And please don't bother to run them online from here or download the code from here as you won't like what you see. The fact that it works, and works rather well thank you very much, should be irrelevant to people who are quite content in writing crap code provided that it follows all their stupid rules and regulations.
Here endeth the lesson. Don't applaud, just throw money.
[1] Look mum! I've found a place to use that big word!
[2] Tragicians - those who deal in the tragic instead of the magic.
[3] Did you notice that I did not use the sh** word? One has to draw the line somewhere, doesn't one?
[4] POOPers - those who think their work is excellent when in reality it is excrement.
© Tony Marston
3rd December 2003
http://www.tonymarston.net
http://www.radicore.org
| 25th Nov 2004 | Updated Argument #5 to contain a link to the Table Module pattern described by Martin Fowler in his book Patterns of Enterprise Application Architecture. |
| 1st Nov 2004 | Updated Argument #15 to contain a link to How To Write Unmaintainable Code. |
| 5th Dec 2003 | Added You should try object composition instead of inheritance. |