Creating a SQL Server driver for the Radicore Development Infrastructure

Posted on 1st January 2011 by Tony Marston

Amended on 1st November 2015

Introduction

The software that I write is not intended for just one customer who is locked into a single RDBMS. Instead it can be used by virtually any customer using the RDBMS of their choice. How is this possible? I deliberately designed and built my infrastructure around the 3 Tier Architecture which breaks down application code into three distinct areas of responsibility:

Presentation logic = User Interface, displaying data to the user, accepting input from the user.
Business logic = Business Rules, handles data validation and task-specific behaviour.
Data Access logic = Database Communication, constructing SQL queries and executing them via the relevant API.

It is only the Data Access layer which communicates with the database, therefore it should be possible to communicate with another database simply by changing the component which exists within this layer. I have already produced the following data access classes:

One for MySQL version <= 4.0 which uses the original mysql_* functions.
One for MySQL version >= 4.1 which uses the 'improved' mysqli_* functions.
One for PostgreSQL version >= 8.0 which uses the pg_* functions.
One for Oracle version >= 10g which uses the OCI8 functions.

The purpose of this article is to document my experiences while producing a DAO for a totally different RDBMS, in this case for SQL Server 2008 R2 Express Edition.

Administrative programs

Both MySQL and SQL Server come with GUI administrative tools. MySQL has the following:

MySQL Workbench
phpMyAdmin - a web-based tool written in PHP.

SQL Server has the following:

Connecting to a Database

Just like MySQL, SQL Server has tables within databases. There are schemas, but unlike SQL Server 2005 (and Oracle) schemas are no longer equivalent to database users. Each schema is now a distinct namespace that exists independently of the user who created it. In other words a schema is simply a container of objects. A schema can be owned by any user, and its ownership is transferable.

Once you have established a connection to a server instance you may access any table within any database. One database is nominated as the "default" or "current" database either by specifying it in the connection string, or by switching to it at runtime by issuing the USE <dbname> query. Tables in other databases can only be accessed by qualifying them with the database and schema names, as in dbname.schema.tablename.

In order to make my code access database tables through the Oracle OCI8 interface I made the following changes to my code:

1. CONFIG.INC (DB.INC in the small sample application)

This is a small include file which exists outside the web root, and which contains the details necessary to connect to the database. It has been changed from this:

<?php 
$dbms       = 'mysql';
$dbhost     = 'localhost';
$dbusername = '****';
$dbuserpass = '****';
?>

to this:

<?php 
$dbms           = 'sqlsrv'; 
$serverName     = '(local)';
$connectionInfo = array('CharacterSet' => 'UTF-8',
                        'ReturnDatesAsStrings' => true);
$SQLSRV_schema  = 'dbo';
?>

The global variable $dbms identifies which class file to access before creating the Data Access Object that will be responsible for all communication with the database. This is documented in my FAQ.

The global variables $serverName and $connectionInfo are used in the sqlsrv_connect function which uses Windows Authentication instead of a username and password. Note that I use the sqlsrv driver (which can be downloaded from http://msdn.microsoft.com/en-us/sqlserver/ff657782.aspx) instead of the msql extension.

The global variable $SQLSRV_schema is used in any JOIN clauses which specify a database name as well as a table name.

2. Database Class (dml.<engine>.class.inc)

Each database class has a function which connects to the relevant server. The code in the MySQL class is as follows:

    function connect ($dbname=null)
    // establish a connection to the database
    {
        global $dbhost, $dbusername, $dbuserpass;
        
        $this->errors = array();
        $this->dbname = $dbname;
        
        static $dbconn;
        
        if (!$dbconn) {
            $dbconn = mysqli_connect($dbhost, $dbusername, $dbuserpass);
            $this->dbconnect = $dbconn;
        } // if
        
        if (!$dbconn) {
            return FALSE;
        } // if
        
        if ($dbname) {
            if (!mysqli_select_db($dbconn, $dbname)) {
                return FALSE;
            } // if
        } // if
        
        return TRUE;
   
    } // connect

The code in the SQL Server class is as follows:

    function connect ($dbname=null)
    // establish a connection to the database
    {
        global $serverName, $connectionInfo;
        
        $this->errors = array();
        
        static $dbconn;
        
        if (!$dbconn) {
            $dbconn = sqlsrv_connect($serverName, $connectionInfo) or trigger_error($this, E_USER_ERROR);
            $this->dbconnect = $dbconn;
        } // if
        
        if (!$dbconn) {
            return FALSE;
        } // if
        
        if (!empty($dbname)) {
            $result = sqlsrv_query($dbconn, "USE $dbname") or trigger_error($this, E_USER_ERROR);
        } // if
        
        return TRUE;
   
    } // connect

As you can see the two methods are quite similar.

CREATE TABLE syntax

Although both MySQL and Oracle claim to be SQL compliant, there are some differences in the syntax of the CREATE TABLE command:

Does not like IF NOT EXISTS on CREATE TABLE.
Does not like ticks around table or column names, wants either nothing or '[' and ']'.
Does not like UNIQUE KEY to specify a unique key, wants just UNIQUE instead.
Does not like KEY to specify an index, wants a separate CREATE INDEX statement instead.
Does not like a column type of MEDIUMINT, but will accept TINYINT, SMALLINT, INT and BIGINT.
Does not like a column type of DOUBLE, will accept FLOAT instead.
Does not like size after integer column type.
Does not like the qualifier UNSIGNED on any numeric column type, use a check connstraint instead.
Does not like the qualifier ZEROFILL on any numeric column type.
Does not like '0000-00-00 00:00:00' as the default for DATETIME columns, will only accept a valid date and time.
Does not like TINYTEXT, MEDIUMTEXT, TEXT and LONGTEXT are being deprecated in favour of VARCHAR(n) or VARCHAR(MAX).
Does not like LONGBLOB, but will accept VARBINARY(MAX).
Does not like default '' NOT NULL as an empty string is treated as NULL.
In order to use the UNICODE character set you should use NCHAR, NVARCHAR and NVARCHAR(MAX) instead of CHAR, VARCHAR and TEXT.
The DATETIME data type should be changed to DATETIME2(0). If you do not specify a scale value (microseconds) it will assume (7)
The TIMESTAMP data type does not in fact store a date and time, it is a synonym for ROWVERSION which contains an automatically generated binary number. This is used as a mechanism for version-stamping table rows.

This means that the following in MySQL:

CREATE TABLE IF NOT EXISTS `tablename` (
  `field1` varchar(8) NOT NULL default '',
  `field2` tinyint(4) unsigned default NULL,
  `field3` smallint(3) unsigned default NULL,
  `field4` mediumint(8) unsigned default NULL,
  `field5` double default NULL,
  `field6` datetime NOT NULL default '0000-00-00 00:00:00',
  PRIMARY KEY  (`field1`),
  UNIQUE KEY `field2` (`field2`),
  KEY `field3` (`field3`)
);

will have to be changed to work in SQL Server, as follows:

CREATE TABLE tablename (
  field1 nvarchar(8) NOT NULL,
  field2 tinyint CHECK(field2 >= 0),
  field3 smallint CHECK(field3 >= 0),
  field4 number(10) CHECK(field4 >= 0),
  field5 float,
  field6 datetime2(0) default '2000-01-01 00:00:00' NOT NULL,
  PRIMARY KEY (field1),
  UNIQUE (field2)
);
CREATE INDEX tablename_field3 ON tablename (field3);

Note: If you use UNIQUE (<fieldlist>) to specify a unique key the system will generate a random name for that key. In order to specify a particular name you must use one of the following commands:

ALTER TABLE <tablename> ADD CONSTRAINT <name> UNIQUE (<fieldlist>)
- or -
CREATE UNIQUE INDEX <name> ON <tablename>(<fieldlist>) [WHERE <column> IS NOT NULL];

If any of the columns in the unique key are optional you need the optional [WHERE <column> IS NOT NULL] otherwise it will not allow duplicate entries with NULL values.

Note also that the <name> must be unique within the database, not just for the table.

Loading data

Separate scripts are provided to create the tables and load the data for each subsystem. These scripts can be found in the radicore/<subsystem>/sql/sqlsrv/ directories and processed using the administration program of your choice.

CREATE TABLE scripts

Each <subsystem>-schema.sql script starts by creating the database and then making that database current, as shown in the following example:

CREATE DATABASE [dict];
GO
USE [dict];
GO

Note that this uses the default schema of dbo.

This will be followed by the code which creates each table within that database, as shown in the following example:

CREATE TABLE dict_database (
  database_id nvarchar(64) NOT NULL,
  database_desc nvarchar(255) NOT NULL,
  db_comment nvarchar(MAX),
  subsys_id nvarchar(16) NOT NULL,
  created_date datetime2(0) NOT NULL DEFAULT '2000-01-01 00:00:00',
  created_user nvarchar(16) NOT NULL DEFAULT 'UNKNOWN',
  revised_date datetime2(0) DEFAULT NULL,
  revised_user nvarchar(16) DEFAULT NULL,
  PRIMARY KEY (database_id)
);
go

Note that there are no GRANT statements as default access is automatically supplied via Windows Authentication.

INSERT data scripts

Each <subsystem>-data.sql script starts by setting the current database, as shown in the following example:

USE dict;

This will be followed by the code which loads records into each table, as shown in the following example:

INSERT INTO tablename (column1, column2, ...) VALUES ('value1', 'value2', ...);
INSERT INTO tablename (column1, column2, ...) VALUES ('value1', 'value2', ...);

Please note the following:

String values containing apostrophes (single quote) must be changed from \' (backslash+quote) to '' (quote+quote) (not a double-quote character).
If a database table has a primary key of type IDENTITY (auto-increment) you cannot load data into that table with a pre-determined key value unless you use the SET IDENTITY_INSERT <table> ON;

SQL queries

As well as differences in the syntax for creating tables there is also a difference in the syntax of various SQL queries. These differences can be handled by code within the SQL Server DAO.

CONCAT() function

The following is valid in MySQL:

SELECT CONCAT(first_name, ' ', last_name) AS person_name FROM ....

SQL Server does have a CONCAT() function, instead it uses the '+' operator, so it has to be converted to the following:

SELECT first_name + ' ' + last_name AS person_name FROM ....

Note that if any of the column names is not of type VARCHAR (string) then it may cause a conversion error. To avoid this the value should be converted by replacing column with CAST(column AS VARCHAR).

The 'select' string is therefore examined before the query is issued to replace the CONCAT() function with '+'.

CONCAT_WS() function

The following is valid in MySQL:

SELECT CONCAT_WS(',', A, B, C)' AS foobar FROM ....

SQL Server does not have a CONCAT_WS() function, instead it uses a combination of the COALESCE function and the '+' operator, so it has to be converted to the following:

SELECT (COALESCE(A, '') + COALESCE(', ' + B, '') + COALESCE(', ' + C, '')) AS foobar FROM ....

The 'select' string is therefore examined before the query is issued to replace the CONCAT_WS() function with the alternative code shown above.

GROUP_CONCAT() function

The following is valid in MySQL:

(SELECT GROUP_CONCAT(table.field ORDER BY .... SEPARATOR ',') FROM table WHERE a=b) AS alias

SQL Server does have a GROUP_CONCAT() function, but it uses a different syntax which is as follows:

SELECT alias=STUFF((SELECT ','+field FROM table WHERE user_id=mnu_user.user_id ORDER BY field ASC FOR XML PATH('')) , 1, 1, '' ) FROM ...

The 'select' string is therefore examined before the query is issued to replace the GROUP_CONCAT() function with the alternative code shown above.

Using expression results

In MySQL it is possible for the SELECT clause to contain an expression whose result can be referred to by a name, such as (expression) AS expression_result, and for expression_result to be reused in the same SELECT clause and the HAVING clause.

For example, the following is valid in MySQL:

SELECT col1, col2, (SELECT ......) AS result1, (SELECT .... WHERE id=result1) AS result2
FROM table
WHERE ...
HAVING result1 > 0

This is not allowed by SQL Server, so the framework has to detect where an expression result is referenced so that it can be replaced by the expression which produced that result.

Backslashes in WHERE

Some parts of the framework automatically escape the WHERE clause with backslashes, but as SQL Server does not use backslash as its escape character this has to be altered before the query is issued. This means that all occurrence of \' (backslash+quote) have to be replaced with '' (quote+quote).

Escaping single quotes in data

In order to have a quote character inside a quoted string in an SQL query it is necessary to escape each quote with another quote. There is no function such as mysqli_real_escape_string() or pg_escape_string(), so it has to be done with code.

LIMIT and OFFSET (pagination)

Unlike MySQL and PostgreSQL there is no LIMIT and OFFSET which can be used for pagination, but the same effect can be achieved using a totally different mechanism.

In the Radicore framework there are two variables which are passed into the DAO from above:

$rows_per_page - indicates the number of rows in each page. A value of zero means 'no pagination'.
$pageno - the page number to be extracted, starting at 1.

If no pagination is required (i.e. when $rows_per_page is zero) then the query can be issued as normal. If pagination is required then the normal query string must be rewritten as follows:

select * from (
SELECT $select_str, ROW_NUMBER() OVER ($sort_str) AS rownum
FROM $from_str $where_str $group_str $having_str
) AS x WHERE rownum BETWEEN $min_rows and $max_rows

The calculation for $max_rows and $min_rows is as follows:

    if ($rows_per_page > 0) {
        // count the rows that satisfy this query
        $this->query = "SELECT count(*) FROM $from_str $where_str $group_str $having_str";
        $this->numrows = $this->getCount($schema, $tablename, $this->query);
        // calculate the total number of pages from this query
        $this->lastpage = ceil($this->numrows/$rows_per_page);
    } else {
        $this->lastpage = 1;
        $this->numrows  = null;
    } // if
    
    // ensure pageno is within range
    if ($pageno < 1) {
        $pageno = 1;
    } elseif ($pageno > $this->lastpage) {
        $pageno = $this->lastpage;
    } // if
    $this->pageno = $pageno;
    
    if ($rows_per_page > 0) {
        // insert code for pagination
        $min_rows = (($pageno - 1) * $rows_per_page) +1;
        $max_rows = ($min_rows + $rows_per_page) -1;
        $this->query = "select * from ("
                     . "SELECT $select_str, ROW_NUMBER() OVER ($sort_str) AS rownum"
                     . "FROM $from_str $where_str $group_str $having_str"
                     . ") AS x WHERE rownum BETWEEN $min_rows and $max_rows";
    } else {
        // read all available rows
        $this->query = "SELECT $select_str FROM $from_str $where_str $group_str $having_str $sort_str";
    } // if
    
    $result = sqlsrv_query($this->dbconnect, $this->query, 
                           array(), array('scrollable' => SQLSRV_CURSOR_STATIC)) 
                           or trigger_error($this, E_USER_ERROR);

INSERT syntax

In standard SQL there are two forms of the INSERT command:

INSERT INTO tablename VALUES ('value1','value2',...) 
INSERT INTO tablename (col1,col2,...) VALUES ('value1','value2',...)

What I like about MySQL is that it also allows UPDATE-style syntax, as follows:

UPDATE tablename SET col1='value1', col2='value2', ...
INSERT INTO tablename SET col1='value1', col2='value2', ...

I prefer this approach as it keeps column names and values together instead of in two separate lists. This seems such a common sense idea that I'm surprised it is not part of the standard already. Why allow 2 formats for INSERT and a completely different 3rd one for UPDATE?

Even though SQL Server is not so accommodating, the fact that in my infrastructure all QUERY statements are constructed and issued within the relevant database driver means that I can customise each driver according to the whims of the individual database. The remainder of my infrastructure remains totally oblivious to the internal workings of any individual driver.

AUTO_INCREMENT

MySQL makes it very easy to use a technical key in any table. All the DBA need do is include the AUTO_INCREMENT keyword in the table definition, as in the following:

CREATE TABLE `test` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `desc` varchar(40) default NULL,
  PRIMARY KEY (`id`)
);

There is no AUTO_INCREMENT keyword in SQL Server, but the same effect can be achieved by using the IDENTITY keyword, as in the following example:

CREATE TABLE test (
  id int NOT NULL IDENTITY(1,1),
  desc varchar(40) default NULL,
  PRIMARY KEY (id)
);

Obtaining the sequence number that has just been allocated so that it can be passed back to the application can be done with the following code:

    if (!empty($auto_increment)) {
        $this->query = "SELECT @@identity';
        $result = sqlsrv_query($this->dbconnect, $this->query) or trigger_error($this, E_USER_ERROR);
        $identity = sqlsrv_fetch_array($result, SQLSRV_FETCH_NUMERIC);
        $fieldarray[$auto_increment] = $identity[0];
    } // if

UNSIGNED

In MySQL you can use the keyword 'unsigned' to specify that a column does not allow negative numbers, but in SQL Server you have to use a check contstraint intead. This can be specified in one of the following ways:

In the CREATE TABLE script, such as:

CREATE TABLE tablename (
  ...
  field2 tinyint CHECK(field2 >= 0),
  ...
);

In an ALTER table statement, such as:

ALTER TABLE <table> ADD CONSTRAINT <name> CHECK (<column> >= 0);

SET datatype

Although a database column is only supposed to contain a single value, most modern databases allow a column to contain an array of values. MySQL has the SET datatype which can be defined as follows:

CREATE TABLE `x_person` (
  ....,
  `favourite_food` set('1','2','3','4','5','6','7','8','9','10') default NULL,
  ....,
  PRIMARY KEY  (`person_id`)
);

Unfortunately SQL Server does not have anything which resembles the SET data type, so this feature cannot be supported.

GROUP BY clause

If the GROUP BY clause is not empty then SQL Server will throw an error if it does not contain every column that also appears in the SELECT list. This was correct according to the SQL standard of 1991, but was changed in the 1999 standard so that any column in the GROUP BY clause which is functionally dependent on another column in the GROUP BY clause can be excluded. Relational theory states that within a relation (table) a non-key field is functionally dependent on the key, so that if the SELECT list contains a mixture of key and non-key fields then only the key fields need to be specified in the GROUP BY clause. This is the way that MySQL works.

This means that before the SQL query for SQL Server is constructed from the separate SELECT, FROM, WHERE, GROUP-BY, HAVING and ORDER-BY strings the GROUP-BY string has to be inspected, and if it is non-empty it must be modified to ensure that it contains every field in the SELECT string (except for fields defined as <expression> AS aliasname), plus every field which appears in the ORDER-BY string.

HAVING clause

If the SELECT list contains an expression such as SUM(amount) AS amount_alias or (SELECT ...) AS select_alias, and the results need to be filtered based on the value of this expression, then this must be done using the HAVING clause instead of the WHERE clause. This is because the value of the expression might not be known until after the WHERE clause has been processed.

MySQL is intelligent because it allows the HAVING clause to refer to the expression by its aliasname instead of the function which produces the value, so it permissible to use the following:

SELECT user, MAX(salary) AS max_salary 
FROM users
GROUP BY user 
HAVING max_salary>10;

SQL Server is not so intelligent as it does not allow the HAVING clause to reference an aliasname, so the aggregate function must be used instead, as in:

SELECT user, MAX(salary) AS max_salary 
FROM users
GROUP BY user 
HAVING MAX(salary)>10;

However, unlike MySQL this only works if the expression is an aggregation, so does not work for functions and subqueries.

The following query is valid in MySQL but not for SQL Server:

SELECT table_id, table_desc, 
      (SELECT COUNT(table_id) FROM dict_column 
                             WHERE dict_column.database_id=dict_table.database_id 
                               AND dict_column.table_id=dict_table.table_id 
                          GROUP BY dict_table.table_id) AS column_count
FROM dict_table  
WHERE database_id='audit'  
HAVING column_count>0 
ORDER BY table_id

This will only work in SQL Server if the query is enclosed in an outer query, with the HAVING clause switched to the WHERE clause of the outer query, as shown in the following example:

SELECT * FROM (
	SELECT table_id, table_desc, 
	      (SELECT COUNT(table_id) FROM dict_column 
                               WHERE dict_column.database_id=dict_table.database_id 
                                 AND dict_column.table_id=dict_table.table_id 
                            GROUP BY dict_table.table_id) AS column_count
	FROM dict_table  
	WHERE dict_table.database_id='audit'
) AS x
WHERE column_count>0 
ORDER BY table_id

This modification of the SQL statement is possible because the SELECT, FROM, WHERE, GROUP-BY, HAVING and ORDER-BY strings are delivered as separate parts, so they can be examined and manipulated individually before they are assembled into a final SQL query.

JOIN clause

In MySQL it is possible to JOIN to a table in a different database simply by placing the database name in front of the table name, as in the following:

FROM ...
LEFT JOIN dbname.tblname ON (...)

In SQL Server it is also necessary to specify the schema name, as in the following:

FROM ...
LEFT JOIN dbname.schema.tblname ON (...)

TRUE/FALSE

In MySQL it is possible to have some expressions which evaluate to either TRUE or FALSE, and to subsequently filter the recordset on the result of that expression, but SQL Server does not allow the words TRUE or FALSE to appear in a query, so the following has to take place:

The word TRUE has to be changed to 1.

The word FALSE has to be changed to 0.

Multi-table deletes

In MySQL it is possible to issue a query such as the following:

DELETE FROM ssn, trn, tbl, fld 
USING audit_ssn AS ssn
LEFT JOIN audit_trn AS trn ON (trn.session_id=ssn.session_id)
LEFT JOIN audit_tbl AS tbl ON (tbl.session_id=trn.session_id 
                           AND tbl.tran_seq_no=trn.tran_seq_no)
LEFT JOIN audit_fld AS fld ON (fld.session_id=tbl.session_id 
                           AND fld.tran_seq_no=tbl.tran_seq_no
                           AND fld.table_seq_no=tbl.table_seq_no)
WHERE ssn.ssn_datetime < '2015-11-01 00:00:00';

This will delete qualifying entries from the audit_ssn table, and all related entries from the audit_trn, audit_tbl and audit_fld tables. This is particularly useful if you are using a storage engine which does not support foreign key constraints.

This form of DELETE statement is not supported in SQL Server as the same functionality can be provided by deleting from the first table only, and have the database itself take care of the others by defining foreign key constraints such as the following:

ALTER TABLE audit_trn ADD CONSTRAINT fk_audit_trn_audit_ssn FOREIGN KEY (session_id) 
                                                            REFERENCES audit_ssn(session_id) 
                                                            ON DELETE CASCADE;

ALTER TABLE audit_tbl ADD CONSTRAINT fk_audit_tbl_audit_trn FOREIGN KEY (session_id, tran_seq_no) 
                                                            REFERENCES audit_trn(session_id, tran_seq_no) 
                                                            ON DELETE CASCADE;

ALTER TABLE audit_fld ADD CONSTRAINT fk_audit_fld_audit_tbl FOREIGN KEY (session_id, tran_seq_no, table_seq_no) 
                                                            REFERENCES audit_tbl(session_id, tran_seq_no, table_seq_no) 
                                                            ON DELETE CASCADE;

Other

Default Sort Sequence

As has been noted in LIMIT and OFFSET (pagination) the SQL query includes the OVER() function which requires a sort string. To avoid the possibility of error you should amend every table definition in the Data Dictionary so that it includes a default sort sequence. If you do not supply one the framework will use the table's primary key.

Conclusion

In theory it should be a relatively simple exercise to switch from one SQL database to another. As is usual in this universe there is a big difference between theory and practice. Although all the database vendors 'supposedly' follow the same standards, they all seem to have followed different interpretations of those standards, followed by different implementations, all of which are capped off with different proprietary extensions. This makes the developer's job of switching database vendors more complicated than it need be, but by following some simple steps it is possible to take that mountain and turn it back into a mole hill:

When it comes to database capabilities it is better to stick to the lowest common denominator and avoid those custom extensions which are unique to a particular vendor's database. Some people say that if you are paying money for a database with fancy extensions then it is a waste of money not to use them. But by locking yourself into a particular vendor's product you may actually be throwing money down the drain as you are effectively making it impossible to switch to a cheaper database from a different vendor.
By having all the database access code isolated in a single component, a database driver, instead of having it spread throughout the system means that you only have to change that one component instead of the whole system. Although the internals of each database driver are totally unique they all share the same API's, therefore the remainder of the system is (or should be) totally unaffected when one driver is swapped for another.

NOTE: All testing was done against SQL Server 2008 R2 Express Edition running on Windows XP using PHP version 5.2.15. I did not use the msql extension as Microsoft have provided their own SQL Server driver for PHP.

Amendment history:

01 Nov 2015	Added section Multi-table deletes.
18 Mar 2015	Added section CONCAT_WS, GROUP_CONCAT and Using expression results.
10 Jan 2015	Added section Default Sort Sequence.

counter

Tony Marston's Blog About software development, PHP and OOP

Creating a SQL Server driver for the Radicore Development Infrastructure

1. CONFIG.INC (DB.INC in the small sample application)

2. Database Class (dml.<engine>.class.inc)