How to handle a Many-to-Many relationship with PHP and MySQL

By Tony Marston

21st May 2003

Intended Audience
Introduction
Database schema
Form design
- Simple
- Complex
- Complex update
Summary

Intended Audience

This tutorial is intended for those developers who come across the situation where they need to cater for what is known as a 'many-to-many' relationship. It will explain how to generate the database tables to deal with such a relationship, and how to design HTML forms to display and update the contents of these tables.

This situation can often catch the inexperienced designer/developer unawares, often leading to clumsy or unworkable solutions. This tutorial will show a technique that has been successful for me for many years.

Introduction

What exactly is a many-to-many relationship? When your user tries to explain to you how a pair of business objects are related he might say "many of these can be related to many of those, and many of those can be related to many of these." If you attempt to draw such a relationship in an Entity Relationship Diagram (ERD) you will probably end up with something like this:

many-to-many-01 (1K)

Those of us who are more experienced in database design will immediately see this as a totally illegal relationship. The only sort of relationship which is possible is a 'one-to-many', so the novice might redraw his ERD as follows:

many-to-many-02 (1K)

That may look better, but again it is impossible to implement in any RDBMS that I have come across. You can have A-to-B in a 'one-to-many' relationship, or B-to-A in a 'one-to-many' relationship, but you cannot have both at the same time. So what are we to do?

The solution is to create a new table which will act as the 'many' in two 'one-to-many' relationships, with each of the original tables acting as the 'one'. This is shown in the following diagram, with the new table identified as 'X'.

many-to-many-03 (1K)

Table 'X' is sometimes referred to as a 'link', 'intersection' or 'xref' (cross-reference) table.

Database schema

Let us assume that the existing tables, 'A' and 'B', are defined as follows:

CREATE TABLE `a` (
  `a_id` varchar(6) NOT NULL default '',
  `a_desc` varchar(40) NOT NULL default '',
  PRIMARY KEY  (`a_id`)
)

CREATE TABLE `b` (
  `b_id` varchar(6) NOT NULL default '',
  `b_desc` varchar(40) NOT NULL default '',
  PRIMARY KEY  (`b_id`)
)

This new table must be constructed to allow the following:

In order to satisfy all these rules all I need in table 'X' are two columns, 'a_id' and 'b_id'. I do not need an additional column for a primary key as I can use the existing two columns. This produces a database schema as follows:

CREATE TABLE `x` (
  `a_id` varchar(6) NOT NULL default '',
  `b_id` varchar(6) NOT NULL default '',
  PRIMARY KEY  (`a_id`,`b_id`)
)

Note here that I have created a primary key from two columns, not one. Some naive database designers insist that every database table should have a single-column technical primary key called 'id' which obtains its value from an automatically-incrementing internal counter. In MySQL this can be done with the 'auto-increment' option. While I agree that there are places where a technical primary key is definitely beneficial, in my years of experience I have also encountered instances where a technical primary key is not only unnecessary, it is actually detrimental. A cross-reference table is one of those places.

Also note that it is not necessary to create a separate index for 'a_id' as this is already covered by virtue of the fact that it forms the leading portion of the primary key. A separate index on 'b_id' might be of benefit depending on the volume of rows in the table and the frequency of searches on that column.

Form design

Having designed the database we must now turn our attentions to designing the forms that will manipulate the data. For this I will suggest two alternatives although experienced developers may be able to suggest more.

Simple

In this design I have 4 simple forms which are interconnected as shown in the following diagram:

many-to-many-04 (1K)

We start off with the form <Browse 'A'> which enables the user to browse through all occurrences of table 'A'. During the processing of this form the user will be able to select an occurrence of 'A', then press a control (a button or a hyperlink) which will pass the identity of the selected occurrence to a second form which I have identified as <Browse 'X' for selected 'A'>.

This second form will show which occurrence of table 'A' has been selected, then list the current contents of table 'X' for that occurrence. For each occurrence of table 'X' it will include the relevant description from table 'B'. The MySQL query to achieve this is as follows:

SELECT x.a_id, x.b_id, b.b_desc
FROM x
LEFT JOIN b ON (b.b_id = x.b_id)
WHERE (x.a_id = 'whatever')

A similar result may also be achieved using the following:

SELECT x.a_id, x.b_id, b.b_desc
FROM x, b
WHERE (x.a_id = 'whatever') AND (b.b_id = x.b_id)

The difference between these two SQL queries will only become apparent if an occurrence of table 'B' is missing for an occurrence of table 'X'. With the first query all columns selected from table 'B' will be returned as empty. With the second query the effect will be that the occurrence of 'X' will be dropped from the selection.

The second form can only show existing occurrences of table 'X'. In my design this form would have controls to invoke a third form to add new occurrences to 'X' or a fourth form to delete occurrences from 'X'.

Complex

An alternative to this combines the facilities of forms (2), (3) and (4) into a single more complex form, as shown in the following diagram:

many-to-many-05 (2K)

This modified form will still show which occurrence of table 'A' has been selected, but it will then list all occurrences of table 'B'. Against each occurrence of table 'B' will be a checkbox to indicate if an associated occurrence exists on table 'X' or not. The user will be able to toggle the checkbox in order to create or delete the entry on table 'X'. The MySQL query to select the data is as follows:

SELECT a.a_id, b.b_id, b.b_desc,  
CASE WHEN x.b_id IS NULL THEN 'F' ELSE 'T' END AS selected 
FROM a 
CROSS JOIN b 
LEFT JOIN x ON (x.a_id = a.a_id AND x.b_id = b.b_id) 
WHERE (a.a_id = 'whatever')

Notice here that I am selecting from table 'A' and 'B', then doing a LEFT JOIN on table 'X'. The CASE..WHEN..THEN..END statement checks the existence of an associated occurrence on 'X' and sets a column named 'selected' to 'T' if one exists or 'F' if one does not.

As an added refinement I use a third form to allow the entry of selection criteria so that the contents of the second form can be refined. This is usually more relevant in those cases where there are more columns to be displayed, but it does provide the ability to select only those occurrences of 'B' where associated occurrences of 'X' either do or do not exist. This is accomplished by altering the WHERE clause to be one of the following:

WHERE (a.a_id = 'whatever') AND x.a_id IS NULL

or

WHERE (a.a_id = 'whatever') AND x.a_id IS NOT NULL

Complex update

If you thought that the code to retrieve the current data from the database was tricky then what will you think about the code needed to update the database after the user has made his changes? For this I have a standard procedure which has as its input two arrays:

  1. $fieldarray which has an entry for each row displayed in the form, and for each row it contains the names and values for the primary key required for table 'X'.
  2. $selectarray which has an entry for the checkbox on each of the rows. Note that this will only contain entries where the checkbox is ON. If it has been checked OFF then the $_POST array will not contain an entry for that row.

I shall now describe the code to process these two arrays.

I begin by looping through each row that was displayed in the form and initialise two string variables:

foreach ($fieldarray as $rownum => $rowdata) {
   $insert = NULL;
   $delete = NULL;

Each row provides me with the names and values for the primary key, so I can move their details into the two string variables.

   foreach ($rowdata as $fieldname => $fieldvalue) {
      $insert .= "$fieldname='$fieldvalue',";
      $delete .= "$fieldname='$fieldvalue' AND ";
   } // foreach

When there are no more fields left I can trim the unwanted ',' and ' AND '.

   $insert = rtrim($insert, ',');
   $delete = rtrim($delete, ' AND ');

Now I examine the contents of the checkbox in $selectarray and construct the SQL query to either create the entry if the checkbox is ON or delete the entry if the checkbox is OFF:

   if (isset($selectarray[$rownum])) {
      $query = "INSERT INTO $tablename SET $insert";
   } else {
      $query = "DELETE FROM $tablename WHERE $delete";
   } // if

Finally I execute the query and check for errors. Note that I ignore errors concerning duplicate entries. This is caused by a checkbox being ON originally and not being changed to OFF by the user:

   $result = @mysql_query($query, $dbconnect);
   if (mysql_errno() <> 0) {
      if (mysql_errno() == 1062) {
         // ignore duplicate entry
      } else {
         trigger_error("SQL", E_USER_ERROR);
      } // if
   } // if
} // foreach

Summary

I first designed this solution to the many-to-many problem several years ago while developing in a different language. It was an interesting exercise to convert it to PHP and MySQL.

In my long career I have also encountered variations of this problem. In this solution table 'X' contains nothing but the foreign key fields which link up to the two related tables, but this can be made more complicated by the following:

However, I shall leave the solutions to these additional problems to a later date.


© Tony Marston
21st May 2003

http://www.tonymarston.net
http://www.radicore.org

counter