1st January 2011
The software that I write is not intended for just one customer who is locked into a single RDBMS. Instead it can be used by virtually any customer using the RDBMS of their choice. How is this possible? I deliberately designed and built my infrastructure around the 3 Tier Architecture which breaks down application code into three distinct areas of responsibility:
It is only the Data Access layer which communicates with the database, therefore it should be possible to communicate with another database simply by changing the component which exists within this layer. I have already produced the following data access classes:
The purpose of this article is to document my experiences while producing a DAO for a totally different RDBMS, in this case for SQL Server 2008 R2 Express Edition.
Both MySQL and SQL Server come with GUI administrative tools. MySQL has the following:
SQL Server has the following:
Just like MySQL, SQL Server has tables within databases. There are schemas, but unlike SQL Server 2005 (and Oracle) schemas are no longer equivalent to database users. Each schema is now a distinct namespace that exists independently of the user who created it. In other words a schema is simply a container of objects. A schema can be owned by any user, and its ownership is transferable.
Once you have established a connection to a server instance you may access any table within any database. One database is nominated as the "default" or "current" database either by specifying it in the connection string, or by switching to it at runtime by issuing the USE <dbname> query. Tables in other databases can only be accessed by qualifying them with the database and schema names, as in dbname.schema.tablename.
In order to make my code access database tables through the Oracle OCI8 interface I made the following changes to my code:
This is a small include file which exists outside the web root, and which contains the details necessary to connect to the database. It has been changed from this:
<?php $dbms = 'mysql'; $dbhost = 'localhost'; $dbusername = '****'; $dbuserpass = '****'; ?>to this:
<?php $dbms = 'sqlsrv'; $serverName = '(local)'; $connectionInfo = array('CharacterSet' => 'UTF-8', 'ReturnDatesAsStrings' => true); $SQLSRV_schema = 'dbo'; ?>
The global variable $dbms identifies which class file to access before creating the Data Access Object that will be responsible for all communication with the database. This is documented in my FAQ.
The global variables $serverName and $connectionInfo are used in the sqlsrv_connect function which uses Windows Authentication instead of a username and password. Note that I use the sqlsrv driver (which can be downloaded from http://msdn.microsoft.com/en-us/sqlserver/ff657782.aspx) instead of the msql extension.
The global variable $SQLSRV_schema is used in any JOIN clauses which specify a database name as well as a table name.
function connect ($dbname=null)
// establish a connection to the database
{
global $dbhost, $dbusername, $dbuserpass;
$this->errors = array();
$this->dbname = $dbname;
static $dbconn;
if (!$dbconn) {
$dbconn = mysqli_connect($dbhost, $dbusername, $dbuserpass);
$this->dbconnect = $dbconn;
} // if
if (!$dbconn) {
return FALSE;
} // if
if ($dbname) {
if (!mysqli_select_db($dbconn, $dbname)) {
return FALSE;
} // if
} // if
return TRUE;
} // connect
The code in the SQL Server class is as follows:
function connect ($dbname=null)
// establish a connection to the database
{
global $serverName, $connectionInfo;
$this->errors = array();
static $dbconn;
if (!$dbconn) {
$dbconn = sqlsrv_connect($serverName, $connectionInfo) or trigger_error($this, E_USER_ERROR);
$this->dbconnect = $dbconn;
} // if
if (!$dbconn) {
return FALSE;
} // if
if (!empty($dbname)) {
$result = sqlsrv_query($dbconn, "USE $dbname") or trigger_error($this, E_USER_ERROR);
} // if
return TRUE;
} // connect
As you can see the two methods are quite similar.
Although both MySQL and Oracle claim to be SQL compliant, there are some differences in the syntax of the CREATE TABLE command:
default '' NOT NULL as an empty string is treated as NULL.This means that the following in MySQL:
CREATE TABLE IF NOT EXISTS `tablename` ( `field1` varchar(8) NOT NULL default '', `field2` tinyint(4) unsigned default NULL, `field3` smallint(3) unsigned default NULL, `field4` mediumint(8) unsigned default NULL, `field5` double default NULL, `field6` datetime NOT NULL default '0000-00-00 00:00:00', PRIMARY KEY (`field1`), UNIQUE KEY `field2` (`field2`), KEY `field3` (`field3`) );
will have to be changed to work in SQL Server, as follows:
CREATE TABLE tablename ( field1 varchar(8) NOT NULL, field2 tinyint, field3 smallint, field4 number(10), field5 float, field6 datetime default '2000-01-01 00:00:00' NOT NULL , PRIMARY KEY (field1), UNIQUE (field2) ); CREATE INDEX tablename_field3 ON tablename (field3);
Separate scripts are provided to create the tables and load the data for each subsystem. These scripts can be found in the radicore/<subsystem>/sql/sqlsrv/ directories and processed using the administration program of your choice.
Each <subsystem>-schema.sql script starts by creating the database and then making that database current, as shown in the following example:
CREATE DATABASE [dict]; GO USE [dict]; GO
Note that this uses the default schema of dbo.
This will be followed by the code which creates each table within that database, as shown in the following example:
CREATE TABLE dict_database ( database_id nvarchar(64) NOT NULL DEFAULT '', database_desc nvarchar(255) NOT NULL DEFAULT '', db_comment ntext, subsys_id nvarchar(16) NOT NULL DEFAULT '', created_date datetime NOT NULL DEFAULT '2000-01-01 00:00:00', created_user nvarchar(16) NOT NULL DEFAULT 'UNKNOWN', revised_date datetime DEFAULT NULL, revised_user nvarchar(16) DEFAULT NULL, PRIMARY KEY (database_id) ); go
Note that there are no GRANT statements as default access is automatically supplied via Windows Authentication.
Each <subsystem>-data.sql script starts by setting the current database, as shown in the following example:
USE dict;
This will be followed by the code which loads records into each table, as shown in the following example:
INSERT INTO tablename (column1, column2, ...) VALUES ('value1', 'value2', ...);
INSERT INTO tablename (column1, column2, ...) VALUES ('value1', 'value2', ...);
As well as differences in the syntax for creating tables there is also a difference in the syntax of various SQL queries. These differences can be handled by code within the SQL Server DAO.
The following in MySQL:
SELECT CONCAT(first_name, ' ', last_name) AS person_name FROM ....
SQL Server does have a CONCAT() function, instead it uses the '+' operator, so it has to be converted to the following:
SELECT first_name + ' ' + last_name AS person_name FROM ....
The 'select' string is therefore examined before the query is issued to replace the CONCAT() function with '+'.
Some parts of the framework automatically escape the WHERE clause with backslashes, but as SQL Server does not use backslash as its escape character this has to be altered before the query is issued. This means that all occurrence of \' (backslash+quote) have to be replaced with '' (quote+quote).
In order to have a quote character inside a quoted string in an SQL query it is necessary to escape each quote with another quote. There is no function such as mysqli_real_escape_string() or pg_escape_string(), so it has to be done with code.
Unlike MySQL and PostgreSQL there is no LIMIT and OFFSET which can be used for pagination, but the same effect can be achieved using a totally different mechanism.
In the Radicore framework there are two variables which are passed into the DAO from above:
If no pagination is required (i.e. when $rows_per_page is zero) then the query can be issued as normal. If pagination is required then the normal query string must be rewritten as follows:
select * from ( SELECT $select_str, ROW_NUMBER() OVER ($sort_str) AS rownum FROM $from_str $where_str $group_str $having_str ) AS x WHERE rownum BETWEEN $min_rows and $max_rows
The calculation for $max_rows and $min_rows is as follows:
if ($rows_per_page > 0) {
// count the rows that satisfy this query
$this->query = "SELECT count(*) FROM $from_str $where_str $group_str $having_str";
$this->numrows = $this->getCount($schema, $tablename, $this->query);
// calculate the total number of pages from this query
$this->lastpage = ceil($this->numrows/$rows_per_page);
} else {
$this->lastpage = 1;
$this->numrows = null;
} // if
// ensure pageno is within range
if ($pageno < 1) {
$pageno = 1;
} elseif ($pageno > $this->lastpage) {
$pageno = $this->lastpage;
} // if
$this->pageno = $pageno;
if ($rows_per_page > 0) {
// insert code for pagination
$min_rows = (($pageno - 1) * $rows_per_page) +1;
$max_rows = ($min_rows + $rows_per_page) -1;
$this->query = "select * from ("
. "SELECT $select_str, ROW_NUMBER() OVER ($sort_str) AS rownum"
. "FROM $from_str $where_str $group_str $having_str"
. ") AS x WHERE rownum BETWEEN $min_rows and $max_rows";
} else {
// read all available rows
$this->query = "SELECT $select_str FROM $from_str $where_str $group_str $having_str $sort_str";
} // if
$result = sqlsrv_query($this->dbconnect, $this->query,
array(), array('scrollable' => SQLSRV_CURSOR_STATIC))
or trigger_error($this, E_USER_ERROR);
In standard SQL there are two forms of the INSERT command:
INSERT INTO tablename VALUES ('value1','value2',...)
INSERT INTO tablename (col1,col2,...) VALUES ('value1','value2',...)
What I like about MySQL is that it also allows UPDATE-style syntax, as follows:
UPDATE tablename SET col1='value1', col2='value2', ... INSERT INTO tablename SET col1='value1', col2='value2', ...
I prefer this approach as it keeps column names and values together instead of in two separate lists. This seems such a common sense idea that I'm surprised it is not part of the standard already. Why allow 2 formats for INSERT and a completely different 3rd one for UPDATE?
Even though SQL Server is not so accommodating, the fact that in my infrastructure all QUERY statements are constructed and issued within the relevant database driver means that I can customise each driver according to the whims of the individual database. The remainder of my infrastructure remains totally oblivious to the internal workings of any individual driver.
MySQL makes it very easy to use a technical key in any table. All the DBA need do is include the AUTO_INCREMENT keyword in the table definition, as in the following:
CREATE TABLE `test` ( `id` int(10) unsigned NOT NULL auto_increment, `desc` varchar(40) default NULL, PRIMARY KEY (`id`) );
There is no AUTO_INCREMENT keyword in SQL Server, but the same effect can be achieved by using the IDENTITY keyword, as in the following example:
CREATE TABLE test ( id int NOT NULL IDENTITY(1,1), desc varchar(40) default NULL, PRIMARY KEY (id) );
Obtaining the sequence number that has just been allocated so that it can be passed back to the application can be done with the following code:
if (!empty($auto_increment)) {
$this->query = "SELECT @@identity';
$result = sqlsrv_query($this->dbconnect, $this->query) or trigger_error($this, E_USER_ERROR);
$identity = sqlsrv_fetch_array($result, SQLSRV_FETCH_NUMERIC);
$fieldarray[$auto_increment] = $identity[0];
} // if
Although a database column is only supposed to contain a single value, most modern databases allow a column to contain an array of values. MySQL has the SET datatype which can be defined as follows:
CREATE TABLE `x_person` (
....,
`favourite_food` set('1','2','3','4','5','6','7','8','9','10') default NULL,
....,
PRIMARY KEY (`person_id`)
);
Unfortunately SQL Server does not have anything which resembles the SET data type, so this feature cannot be supported.
If the GROUP BY clause is not empty then SQL Server will throw an error if it does not contain every column that also appears in the SELECT list. This was correct according to the SQL standard of 1991, but was changed in the 1999 standard so that any column in the GROUP BY clause which is functionally dependent on another column in the GROUP BY clause can be excluded. Relational theory states that within a relation (table) a non-key field is functionally dependent on the key, so that if the SELECT list contains a mixture of key and non-key fields then only the key fields need to be specified in the GROUP BY clause. This is the way that MySQL works.
This means that before the SQL query for SQL Server is constructed from the separate SELECT, FROM, WHERE, GROUP-BY, HAVING and ORDER-BY strings the GROUP-BY string has to be inspected, and if it is non-empty it must be modified to ensure that it contains every field in the SELECT string (except for fields defined as <expression> AS aliasname), plus every field which appears in the ORDER-BY string.
If the SELECT list contains an expression such as SUM(amount) AS amount_alias or (SELECT ...) AS select_alias, and the results need to be filtered based on the value of this expression, then this must be done using the HAVING clause instead of the WHERE clause. This is because the value of the expression might not be known until after the WHERE clause has been processed.
MySQL is intelligent because it allows the HAVING clause to refer to the expression by its aliasname instead of the function which produces the value, so it permissible to use the following:
SELECT user, MAX(salary) AS max_salary FROM users GROUP BY user HAVING max_salary>10;
SQL Server is not so intelligent as it does not allow the HAVING clause to reference an aliasname, so the aggregate function must be used instead, as in:
SELECT user, MAX(salary) AS max_salary FROM users GROUP BY user HAVING MAX(salary)>10;
However, unlike MySQL this only works if the expression is an aggregation, so does not work for functions and subqueries.
The following query is valid in MySQL but not for SQL Server:
SELECT table_id, table_desc,
(SELECT COUNT(table_id) FROM dict_column
WHERE dict_column.database_id=dict_table.database_id
AND dict_column.table_id=dict_table.table_id
GROUP BY dict_table.table_id) AS column_count
FROM dict_table
WHERE database_id='audit'
HAVING column_count>0
ORDER BY table_id
This will only work in SQL Server if the query is enclosed in an outer query, with the HAVING clause switched to the WHERE clause of the outer query, as shown in the following example:
SELECT * FROM (
SELECT table_id, table_desc,
(SELECT COUNT(table_id) FROM dict_column
WHERE dict_column.database_id=dict_table.database_id
AND dict_column.table_id=dict_table.table_id
GROUP BY dict_table.table_id) AS column_count
FROM dict_table
WHERE dict_table.database_id='audit'
) AS x
WHERE column_count>0
ORDER BY table_id
This modification of the SQL statement is possible because the SELECT, FROM, WHERE, GROUP-BY, HAVING and ORDER-BY strings are delivered as separate parts, so they can be examined and manipulated individually before they are assembled into a final SQL query.
In MySQL it is possible to JOIN to a table in a different database simply by placing the database name in front of the table name, as in the following:
FROM ... LEFT JOIN dbname.tblname ON (...)
In SQL Server it is also necessary to specify the schema name, as in the following:
FROM ... LEFT JOIN dbname.schema.tblname ON (...)
In MySQL it is possible to have some expressions which evaluate to either TRUE or FALSE, and to subsequently filter the recordset on the result of that expression, but SQL Server does not allow the words TRUE or FALSE to appear in a query, so the following has to take place:
The word TRUE has to be changed to 1.
The word FALSE has to be changed to 0.
In theory it should be a relatively simple exercise to switch from one SQL database to another. As is usual in this universe there is a big difference between theory and practice. Although all the database vendors 'supposedly' follow the same standards, they all seem to have followed different interpretations of those standards, followed by different implementations, all of which are capped off with different proprietary extensions. This makes the developer's job of switching database vendors more complicated than it need be, but by following some simple steps it is possible to take that mountain and turn it back into a mole hill:
NOTE: All testing was done against SQL Server 2008 R2 Express Edition running on Windows XP using PHP version 5.2.15. I did not use the msql extension as Microsoft have provided their own SQL Server driver for PHP.
© Tony Marston
1st January 2011
http://www.tonymarston.net
http://www.radicore.org