Creating a PostgreSQL driver for the Radicore Development Infrastructure

Posted on 10th September 2005 by Tony Marston

Introduction
Administrative programs
Connecting to a Database
CREATE TABLE syntax
SQL queries
- CONCAT() function
- JOIN
- GROUP BY
- INSERT syntax
- AUTO_INCREMENT
- IFNULL() function
- ARRAY datatype
- INFORMATION_SCHEMA.COLUMNS
Conclusion
Comments

Introduction

The software that I write is not intended for just one customer who is locked into a single RDBMS. Instead it can be used by virtually any customer using the RDBMS of their choice. How is this possible? I deliberately designed and built my infrastructure around the 3 Tier Architecture which breaks down application code into three distinct areas of responsibility:

Presentation logic = User Interface, displaying data to the user, accepting input from the user.
Business logic = Business Rules, handles data validation and task-specific behaviour.
Data Access logic = Database Communication, constructing SQL queries and executing them via the relevant API.

It is only the Data Access layer which communicates with the database, therefore it should be possible to communicate with another database simply by changing the component which exists within this layer. I have already produced two versions of my Data Access Object for MySQL:

One for versions up to 4.0 which uses the original mysql_* functions.
Another for version 4.1 and above which uses the 'improved' mysqli_* functions.

The purpose of this article is to document my experiences while producing a DAO for a totally different RDBMS, in this case for PostgreSQL (version 8.0.3).

Administrative programs

Although both MySQL and PostgreSQL come with command line interfaces I think that as we are now in the 21st century the vast majority of developers would insist on a set of proper GUI tools.

MySQL has the following:

MySQL Workbench - a multi-platform GUI tool.
phpMyAdmin - a web-based tool written in PHP.

PostgreSQL has the following:

pgAdmin - a multi-platform GUI tool.
phpPgAdmin - a web-based tool written in PHP.

These are all freely available, although there are also commercial products available for both databases.

In my personal opinion the tools for PostgreSQL are second rate when compared with those for MySQL. They do not have the same set of features, and those for PostgreSQL seem clumsy and clunky. In particular phpPgAdmin has the following faults which need addressing:-

After pressing on the SQL link at the top of the screen a window opens up ino which you can insert your SQL query. You then press the GO button and nothing happens! This is because it has actually opened up a second browser window for the SQL query instead of overwriting the first, and the results of that query are displayed in the first window while you are sitting there looking at the second window. How user-unfriendly is that?
The version that I tested (3.5.5) suffers from a bad programming practice which generates an E_NOTICE in PHP version 4.4.0. The code is littered with lines such as:
```
$rs = @$this->Execute($sql,$inputarr);
```
which produce the message NOTICE: Only variable references should be returned by reference

Connecting to a Database

MySQL has tables within databases while PostgreSQL has tables within schemas within databases. Every database has a default "public" schema, while other schemas can be created on demand. It is possible for the same table name to exist within different schemas within the same database.

With MySQL once you have established a connection to a server instance you may access any table within any database. One database is nominated as the "default" or "current" database by using the mysqli_select_db function, in which case any table names within that database do not have to be qualified. Tables in other databases can only be accessed by qualifying them with the database name, as in dbname.tablename.

With PostgreSQL a connection is limited to a single database, and you may only access tables that belong to schemas within that particular database. You can make a particular schema "default" or "current" by issuing an SQL query to set the schema search_path. Tables in other schemas can only be accessed by qualifying them with the schema name, as in schema.tablename.

I personally like to create a separate database for each of my software subsystems as it makes backups and restores that much easier, so the ability to swap a MySQL "database" for a PostgreSQL "schema" is very convenient.

In order to make my code access database tables through PostgreSQL I made the following changes to my code:

1. DB.INC

This is a small include file which exists outside the web root, and which contains the details necessary to connect to the database. It has been changed from this:

<?php 
$dbms       = 'mysql';
$dbhost     = 'localhost';
$dbusername = '****';
$dbuserpass = '****';
?>

to this:

<?php 
$dbms       = 'pgsql'; 
$PGSQL_dbname = '****'; 
$dbhost     = 'localhost';
$dbusername = '****';
$dbuserpass = '****';
?>

The global variable $dbms identifies which class file to access before creating the Data Access Object that will be responsible for all communication with the database. This is documented in my FAQ.

The new global variable $PGSQL_dbname identifies the database name to be used in the pg_connect function.

2. Database Class (dml.<engine>.class.inc)

Each database class has a function which connects to the relevant server. The code in the MySQL class is as follows:

    function connect ($dbname=null)
    // establish a connection to the database
    {
        global $dbhost, $dbusername, $dbuserpass;
        
        $this->errors = array();
        $this->dbname = $dbname;
        
        static $dbconn;
        
        if (!$dbconn) {
            $dbconn = mysqli_connect($dbhost, $dbusername, $dbuserpass);
            $this->dbconnect = $dbconn;
        } // if
        
        if (!$dbconn) {
            return FALSE;
        } elseif ($dbname) {
            if (!mysqli_select_db($dbconn, $dbname)) {
                return FALSE;
            } // if
        } // if
        
        return TRUE;
   
    } // connect

The code in the PostgreSQL class is as follows:

    function connect ($schema=null)
    // establish a connection to the database
    {
        global $dbhost, $dbusername, $dbuserpass, $PGSQL_dbname;
        
        $this->errors = array();
        $this->dbname = $PGSQL_dbname;
        
        static $dbconn;
        
        if (!$dbconn) {
            $string = "host=$dbhost user=$dbusername password=$dbuserpass dbname=$PGSQL_dbname ";
            $dbconn = pg_connect($string);
            $this->dbconnect = $dbconn;
        } // if
        if (!$dbconn) {
            return FALSE;
        } elseif ($schema) {
            $this->query = "SET search_path TO $schema";
            $result = pg_query($dbconn, $this->query) or trigger_error($this, E_USER_ERROR);
            return TRUE;
        } // if
        
        return TRUE;
   
    } // connect

As you can see a schema in PostgreSQL maps easily to a database in MySQL.

CREATE TABLE syntax

Although both MySQL and PostgreSQL claim to be SQL compliant, there are some differences in the syntax of the CREATE TABLE command:

Does not like IF NOT EXISTS on CREATE TABLE.
Does not like ticks around table or column names, wants either nothing or double quotes.
Does not like UNIQUE to specify a unique key, wants a separate ADD CONSTRAINT statement instead.
Does not like KEY to specify an index, wants a separate CREATE INDEX statement instead.
Does not like a column type of DATETIME, wants TIMESTAMP instead.
Does not like a column type of TINYINT, will accept SMALLINT instead.
Does not like a column type of MEDIUMINT, will accept BIGINT instead.
Does not like a column type of DOUBLE, will accept DOUBLE PRECISION instead.
Does not like size after integer column type.
Does not like the qualifier UNSIGNED or ZEROFILL on any numeric column type.
Does not like '0000-00-00 00:00:00' as the default for TIMESTAMP columns, will only accept a valid date and time.
Does not like TINYTEXT, MEDIUMTEXT or LONGTEXT, wants nothing but TEXT instead.

This means that the following in MySQL:

CREATE TABLE IF NOT EXISTS `tablename` (
  `field1` varchar(8) NOT NULL default '',
  `field2` tinyint(4) unsigned default NULL,
  `field3` smallint(3) unsigned default NULL,
  `field4` mediumint(8) unsigned default NULL,
  `field5` double default NULL,
  `field6` datetime NOT NULL default '0000-00-00 00:00:00',
  PRIMARY KEY  (`field1`),
  UNIQUE KEY  (`field2`),
  KEY `field2` (`field3`)
);

will have to be changed to work in PostgreSQL, as follows:

CREATE TABLE tablename (
  field1 varchar(8) NOT NULL default '',
  field2 smallint default NULL,
  field3 smallint default NULL,
  field4 bigint default NULL,
  field5 double precision default NULL,
  field6 timestamp NOT NULL default '2000-01-01 00:00:00',
  PRIMARY KEY  (field1)
);
ALTER TABLE tablename ADD CONSTRAINT tablename_field2_key UNIQUE (field2);
CREATE INDEX tablename_index(n) ON tablename (field3);

SQL queries

As well as differences in the syntax for creating tables there is also a difference in the syntax of various SQL queries.

CONCAT() function

The following in MySQL:

SELECT CONCAT(first_name, ' ', last_name) AS person_name FROM ....

has to be converted to the following in PostgreSQL:

SELECT first_name || ' ' || last_name AS person_name FROM ....

I can deal with this difference by creating a user-defined function in the PostgreSQL installation to perform the necessary conversion, thus enabling me to leave the original MySQL syntax intact. Here is an example of a suitable function:

create or replace function concat(text,text) returns text as
'select $1 || $2' language sql strict immutable;

create or replace function concat(text,text,text) returns text as
'select $1 || $2 || $3' language sql strict immutable;

create or replace function concat(text,text,text,text) returns text as
'select $1 || $2 || $3 || $4' language sql strict immutable;

This function is included in the string.sql script which is included in the mysqlcompat library written by Chris Kings-Lynne. This can be found in the radicore/dict/sql/postgresql/mysqlcompat/ directory.

Note that because of a difference in the way that double quotes are handled the following will work in MySQL but will fail in PostgreSQL:

SELECT CONCAT(first_name, " ", last_name) AS person_name FROM ....

This is because PostgreSQL uses double quotes around identifiers (names of tables, columns or other database objects) when they contain non-standard characters or are case-sensitive. MySQL on the other hand uses the backtick (`) for this purpose, leaving strings to be delimited by either single or double quotes.

JOIN

The following query produces ERROR: JOIN/ON clause refers to "a", which is not part of JOIN.

SELECT a.a_id, b.b_id, b.b_desc, x.a_id
FROM a, b 
LEFT JOIN x ON (x.b_id = b.b_id AND x.a_id = a.a_id) 
WHERE (a.a_id = 'whatever')

This was cured by changing it to the following:

SELECT a.a_id, b.b_id, b.b_desc, x.a_id
FROM a, b 
LEFT JOIN x ON (x.b_id = b.b_id AND x.a_id = 'whatever') 
WHERE (a.a_id = 'whatever')

This error seems rather dubious to me as it may not be possible to have every part of the ON clause limited to the two tables 'b' and 'x'. Provided that at least one JOIN condition specifies those two tables then any other JOIN condition should not be so limited. In the above example I was able to get around the problem by specifying a literal instead of a column name, but this may not be possible in all circumstances.

In another script I also found that changing the order of the table names in the FROM clause produced the same error. Thus FROM b,a fails while FROM a,b works. MySQL would appear to be more intelligent as it uses the ON conditions to work out how to read the joined table rather than insisting that any table name in the ON condition is either the table immediately following the JOIN clause or the table immediately preceding the JOIN clause. This means, for example, that using PostgreSQL I would not be unable to perform a JOIN on anything other than the last table specified in the FROM clause. The following code would therefore be invalid in PostgreSQL:

SELECT a.a_id, b.b_id, b.b_desc, x.a_id, z.z_desc  
FROM a, b 
LEFT JOIN z ON (z.z_id = a.z_id)
LEFT JOIN x ON (x.b_id = b.b_id AND x.a_id = 'whatever') 
WHERE (a.a_id = 'whatever')

However, this same code DOES work in MySQL, so it's a simple matter to convert all JOIN statements so they work with both databases.

GROUP BY

The following query produces ERROR: column "a.a_desc" must appear in the GROUP BY clause or be used in an aggregate function.

SELECT a.a_id, a.a_desc, COUNT(b.a_id) AS child_count 
FROM a
LEFT JOIN b ON (b.a_id=a.a_id) 
GROUP BY a.a_id

This is perfectly valid in MySQL as it does not insist that the GROUP BY clause contains the names of ALL selected columns. This gives performance advantages for common queries. This seems reasonable to me as on table 'a' the column 'a_id' is the primary key while 'a_desc' is merely a textual description and therefore irrelevant as far as any grouping is concerned.

PostgreSQL is not so forgiving, therefore every column in the SELECT clause must also appear in the GROUP BY clause, as in the following:

SELECT a.a_id, a.a_desc, COUNT(b.a_id) AS b_count 
FROM a
LEFT JOIN b ON (b.a_id=a.a_id) 
GROUP BY a.a_id, a.a_desc

Just as with the JOIN the solution is to amend the SQL statement until it works with both databases.

INSERT syntax

In standard SQL there are two forms of the INSERT command:

INSERT INTO tablename VALUES ('value1','value2',...) 
INSERT INTO tablename (col1,col2,...) VALUES ('value1','value2',...)

What I like about MySQL is that it also allows UPDATE-style syntax, as follows:

UPDATE tablename SET col1='value1', col2='value2', ...
INSERT INTO tablename SET col1='value1', col2='value2', ...

I prefer this approach as it keeps column names and values together instead of in two separate lists. This seems such a common sense idea that I'm surprised it is not part of the standard already. Why allow 2 formats for INSERT and a completely different 3rd one for UPDATE?

Even though PostgreSQL is not so accommodating, the fact in my infrastructure all QUERY statements are constructed and issued within the relevant database driver means that I can customise each driver according to the whims of the individual database. The remainder of my infrastructure remains totally oblivious to the internal workings of any individual driver.

AUTO_INCREMENT

MySQL makes it very easy to use a technical key in any table. All the DBA need do is include the AUTO_INCREMENT keyword in the table definition, as in the following:

CREATE TABLE `test` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `desc` varchar(40) default NULL,
  PRIMARY KEY  (`id`)
);

Although the term AUTO_INCREMENT does not exist in PostgreSQL, it has the equivalent in SERIAL, as in the following:

CREATE TABLE test (
  id serial NOT NULL,
  desc varchar(40),
  PRIMARY KEY  (id)
);

The only other difference is the method used to obtain the last number issued. In MySQL there is a simple function:

$id = mysql_insert_id($link);

With PostgreSQL the same can be achieved with:

$query = "SELECT currval('" .$tablename .'_' .$pkey ."_seq')";
$result = pg_query($link, $query) or trigger_error($this, E_USER_ERROR);
$id = pg_fetch_result($result, 0, 0);

This is because the SERIAL keyword makes use of a counter with the default name of <tablename>_<fieldname>_seq. This can be accessed using the currval() and nextval() functions.

IFNULL() function

The ifnull() function does not exist in PostgreSQL, but as the same functionality can be achieved with the coalesce() function, which does exist in both MySQL and PostgreSQL, it is a simple enough task to swap one function for the other within my application and have code that works in both databases.

ARRAY datatype

Although a database column is only supposed to contain a single value, most modern databases allow a column to contain an array of values. MySQL has the SET datatype which can be defined as follows:

CREATE TABLE `x_person` (
  ....,
  `favourite_food` set('1','2','3','4','5','6','7','8','9','10') default NULL,
  ....,
  PRIMARY KEY  (`person_id`)
);

The values in this array are not displayed to the user, they are merely keys to another array which exists in the language_array.inc file. This is part of the Internationalisation feature which allows text to be displayed in various languages. Sample contents are shown below:

$array['favourite_food'] = array('1' => 'Eggs',
                                 '2' => 'Bacon',
                                 '3' => 'Chips',
                                 '4' => 'Beans',
                                 '5' => 'Sausages',
                                 '6' => 'Mushrooms',
                                 '7' => 'Tomatoes',
                                 '8' => 'Hash Browns',
                                 '9' => 'Toast',
                                 '10' => 'Fried Bread');

PostgreSQL has the ARRAY datatype. This is defined by appending square brackets ([]) to the data type name of the array elements, as shown in the following example:

CREATE TABLE x_person (
    ....,
    favourite_food character varying(2)[],
    ....
);

Although PostgreSQL allows multi-dimensional arrays the Radicore framework currently only supports one-dimensional arrays.

Inserting values into the table requires a slightly different technique depending on which database is being used. For example, when inserting/updating the column with the values 2,4 and 5 the following variations are necessary:

With MySQL it is as simple as putting the values into a comma-delimited string, as in the following:

INSERT INTO x_person (..., favourite_food, ...) VALUES ('...', '2,4,5', '...');

With PostgreSQL each element has to be enclosed in double quotes and the entire array surrounded by curly braces, as in the following:

INSERT INTO x_person (..., favourite_food, ...) VALUES ('...', '{"2", "4", "5"}', '...');

INFORMATION_SCHEMA.COLUMNS

I use this to help populate my Data Dictionary, but unfortunately the PostgreSQL implementation is broken as it is based on faulty sample code in the SQL:1999 standard (but corrected in the SQL:2003 standard) which causes the COLUMN_DEFAULT data to be blanked out even though the user has access privileges on the target table. Fortunately this can be corrected by updating the COLUMNS view within the INFORMATION_SCHEMA schema using the script which is located at radicore/dict/sql/postgresql/information_schema.sql. Note: this fix is not required in PostgreSQL version 8.2

Conclusion

In theory it should be a relatively simple exercise to switch from one SQL database to another. As is usual in this universe there is a big difference between theory and practice. Although all the database vendors 'supposedly' follow the same standards, they all seem to have followed different interpretations of those standards, followed by different implementations, all of which are capped off with different proprietary extensions. This makes the developer's job of switching database vendors more complicated than it need be, but by following some simple steps it is possible to take that mountain and turn it back into a mole hill:

When it comes to database capabilities it is better to stick to the lowest common denominator and avoid those custom extensions which are unique to a particular vendor's database. Some people say that if you are paying money for a database with fancy extensions then it is a waste of money not to use them. But by locking yourself into a particular vendor's product you may actually be throwing money down the drain as you are effectively making it impossible to switch to a cheaper database from a different vendor.
By having all the database access code isolated in a single component, a database driver, instead of having it spread throughout the system means that you only have to change that one component instead of the whole system. Although the internals of each database driver are totally unique they all share the same API's, therefore the remainder of the system is totally unaffected when one driver is swapped for another.

I have included a version of my PostgreSQL driver with my sample application, so you now have the option of running it with either MySQL or PostgreSQL.

NOTE: All testing was done against PostgreSQL version 8.0.3 running on Windows XP using PHP versions 4.3.11 and 5.0.4.

counter

Tony Marston's Blog About software development, PHP and OOP

Creating a PostgreSQL driver for the Radicore Development Infrastructure