Internationalisation and the Radicore Development Infrastructure (Part 1)

By Tony Marston

15th July 2005
Amended 1st January 2013

As of 10th April 2006 the software discussed in this article can be downloaded from www.radicore.org

Introduction
Possible Methods
Design Decisions
The RADICORE Implementation
- Database Structure
- Directory Structure
- File Names
- File Contents
- Determine User Language
- Locate Language File
- Load Screen/Report Structure file
- Get Language Text
- Get Language Array
- Handling Dates
- Handling Numbers
-- Convert to external (user's) format
-- Convert to internal format
- Character Encoding
Conclusion
References
Amendment History

Introduction

The term "internationalisation" is sometimes referred to as "globalisation" or "localisation", but what does it actually mean? The following description is taken from java.sun.com:

Internationalisation is the process of designing an application so that it can be adapted to various languages and regions without engineering changes. Sometimes the term internationalisation is abbreviated as i18n, because there are 18 letters between the first "i" and the last "n."

An internationalised program has the following characteristics:

Internationalisation in a software application covers the ability to communicate with a user in his/her own language. It can be said to exist at the following levels:

Level 1 is supported in the Radicore framework by having the text for such things as button labels, field labels and error messages contained in text files which are separate from the program code. Each set of files contains text in a single language and is held in a subdirectory whose name identifies that language. Each supported language therefore has a copy of these files in its own subdirectory. The framework will detect the user's preferred language, and will access the text files in the appropriate subdirectory. The details are explained in the following sections of this document. Screen labels and help text are maintained in the database, with separate tables for translations in alternative languages.

Level 2 is supported in the Radicore framework by maintaining translated text in separate tables within the application database. The framework will detect the user's preferred language, and will retrieve either the native text or the translated text as appropriate. Please refer to Internationalisation and the Radicore Development Infrastructure (Part 2) for full details.


Possible Methods

There are several ways in which text in language 'A' can be replaced with text in language 'B'. Before a solution can be designed it is necessary to examine the range of possibilities and weigh up the pros and cons of each one.

  1. Are you going to run strings of text through a general-purpose translator, or replace one identifiable string with another?
  2. Are you going to perform the translation/substitution as early as possible (i.e. as soon as you know what text needs to be output), or as late as possible (i.e. just before it is presented to the user)?
  3. Are you going to put text into the output area and then translate it, or translate it first and then put it into the output area?
  4. Are you going to identify each piece of text as a complete string, or give each one a smaller identity code?
  5. Are you going to store the language variations in a database or in external text files?
  6. Are you going to put all the language variations into a single file, or have a separate file for each language?
  7. If you use XML and XSL to produce all HTML output (as does Radicore) could you perform all the translation during the XSL transformation?

Design Decisions

While reviewing the possible options I made the following decisions:


The RADICORE Implementation

Database Structure

Although the vast majority of text will be obtained from the text files which are described below, there are two areas where the text, and its foreign language equivalents, is already maintained in the database and the extra step of exporting that text to a non-database file would therefore be an unnecessary overhead. These tables are:

- MNU_TASK and MNU_TASK_ALT, for task details.
- HELP_TEXT and HELP_TEXT_ALT, for help text.

Directory Structure

The full Radicore development infrastructure consists of a series of discrete subsystems each of which has its own directory in the file system. (Note that the smaller sample application consists of just a single subsystem). Each of these directories contains the following subdirectories:

text     For all field labels, menu labels, button labels, error messages and other text.
screens  For all screen structure files.
reports  For all report structure files.

Each of these new directories will be further broken down into subdirectories where the subdirectory name matches a language code, such as:

en     English (the default language)
en_us  English (United States)
fr     French
fr_ca  French (Canada)
de     German (Germany)
de_ch  German (Switzerland)

Note that the library of locale data which is available within your operating system may define language codes in the format 'xx-XX', but when Radicore uses language codes for directory names it will use the format 'xx_xx' (all lower-case characters, with '_' (underscore) instead of '-' (hyphen)).
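
The conversion from the 'xx-XX' format to the 'xx_xx' directory name format can be sketched as follows. The function name languageCodeToDirName() is hypothetical and is not part of Radicore; it simply applies the rule described above.

```php
<?php
// Hypothetical helper (not part of Radicore): convert a locale-style
// language code such as 'fr-CA' into the directory name format 'fr_ca'.
function languageCodeToDirName($code)
{
    // lower-case everything, then swap hyphen for underscore
    return str_replace('-', '_', strtolower(trim($code)));
}

echo languageCodeToDirName('en-US'), "\n";   // en_us
echo languageCodeToDirName('fr-CA'), "\n";   // fr_ca
echo languageCodeToDirName('de'), "\n";      // de
```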

The screens directory has a set of screen structure files within the en subdirectory to provide screen labels in the default English language. If versions in any foreign language are required there are two choices:

The reports directory has a set of report structure files within the en subdirectory to provide labels in the default English language. If versions in any foreign language are required, the same choices apply as for the screen structure files.

All subdirectories except 'en' (English, the author's native language) are optional. The files in the 'en' subdirectories identify every piece of text which can be translated, so these should be used as the patterns for any non-English translations.

Each language subdirectory should contain a copy of the file(s) containing the text for that particular language code. At runtime the framework will search for a file using the procedure described in Locate Language File.

File Names

Within the screens subdirectory there will be a separate file for each screen structure with the suffix '.screen.inc'.

Within the reports subdirectory there will be a separate file for each report structure with the suffix '.report.inc'.

Within the text subdirectory there will be the following files:

  1. A file called language_text.inc that will contain all the translations for that application in that language.
  2. A file called language_array.inc that will contain all the arrays of values that are normally used in picklists (dropdown lists or radio groups) where the key (as referenced internally) remains consistent, but where the value (as seen by the user) may be expressed in different languages.

For each installation there will also be a 'sys.language_text.inc' and 'sys.language_array.inc' to contain all the translatable text required by the system libraries. In the sample application these will reside in the 'sample/text/' directory, but in the full RADICORE infrastructure these will reside in the 'menu/text/' directory. These 'system' files contain the text that may be used by any of the system libraries (controller scripts, validation class, generic table class and DML class) regardless of the application to which any particular component belongs.

This means that:

File Contents

language_array.inc

This is for storing arrays of values that will be used for such things as dropdown lists and radio groups. Each entry should look something like the following:

$array['direction'] = array('L' => 'Left', 'R' => 'Right', 'U' => 'Up', 'D' => 'Down');

$array['month_names_short'] = array(1 => 'Jan', 'Feb', 'Mar', 'Apr', 
                                         'May', 'Jun', 'Jul', 'Aug', 
                                         'Sep', 'Oct', 'Nov', 'Dec');

$array['month_names_long'] = array(1 => 'January', 'February', 'March', 'April', 
                                        'May', 'June', 'July', 'August', 
                                        'September', 'October', 'November', 'December');

Entries from this file can be extracted using the getLanguageArray() function.
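
Because the internal keys must stay the same in every language, it can be useful to compare a translated array against the English pattern. The following is a minimal sketch of such a check; the function missingPicklistKeys() and the French values are illustrative assumptions, not part of Radicore.

```php
<?php
// Hypothetical helper: list the picklist keys that exist in the reference
// (English) array but are missing from a translated array.
function missingPicklistKeys(array $reference, array $translation)
{
    $missing = array();
    foreach ($reference as $name => $entries) {
        if (!isset($translation[$name])) {
            $missing[$name] = array_keys($entries);   // whole picklist is absent
            continue;
        }
        $diff = array_diff(array_keys($entries), array_keys($translation[$name]));
        if (!empty($diff)) {
            $missing[$name] = array_values($diff);
        }
    }
    return $missing;
}

// illustrative data only: the French version has no entry for 'D'
$en = array('direction' => array('L' => 'Left', 'R' => 'Right', 'U' => 'Up', 'D' => 'Down'));
$fr = array('direction' => array('L' => 'Gauche', 'R' => 'Droite', 'U' => 'Haut'));

print_r(missingPicklistKeys($en, $fr));   // reports that 'D' is missing from 'direction'
```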

language_text.inc

This should have several sections, one for each category of text.

The first section is for all your error messages, such as:

// application error messages
$array['e0001'] = "This is error message number 1";
$array['e0002'] = "This is error message number 2";
$array['e0003'] = "This is error message number 3";
....
$array['e0099'] = "This is error message number 99";

The second and third sections are for text that can be extracted from the MENU database. Once you have finished entering or amending your details in the MENU database you can go to the List Subsystem screen and press the Export button. One of the files it creates will be <subsys>.menu_export.txt, and the contents can be copied directly into the language_text.inc file which deals with your native language.

// menu details for subsystem MENU
$array['Audit']                         = 'Audit';
$array['Dialog Type']                   = 'Dialog Type';
$array['Dictionary']                    = 'Dictionary';
$array['Menu Controls']                 = 'Menu Controls';
$array['Menu System']                   = 'Menu System';
$array['Role']                          = 'Role';
$array['Subsystem']                     = 'Subsystem';
$array['Task (All)']                    = 'Task (All)';
$array['Task (Menu)']                   = 'Task (Menu)';
$array['Task (Proc)']                   = 'Task (Proc)';
$array['ToDo']                          = 'ToDo';
$array['User']                          = 'User';
$array['Workflow']                      = 'Workflow';

// navigation button details for subsystem MENU
$array['Change Password']               = 'Change Password';
$array['Export']                        = 'Export';
$array['Field Access']                  = 'Field Access';
$array['Help Text']                     = 'Help Text';
$array['List Fields']                   = 'List Fields';
$array['List Task']                     = 'List Task';
$array['List User']                     = 'List User';

The fourth section is for field labels. These are defined in your various screen structure files, and in your native language each key will probably be the same as the value. Even though it may seem to be wasted effort, the real benefit comes when creating a copy of the text in a different language as all the text you require can be found in a single place.

// field labels for subsystem MENU
$array['Access']                        = 'Access';
$array['Button Id']                     = 'Button Id';
$array['Button Text']                   = 'Button Text';
$array['Default Language']              = 'Default Language';
$array['Dialog Type']                   = 'Dialog Type';
$array['Directory']                     = 'Directory';
$array['Field Id']                      = 'Field Id';

Entries from this file can be extracted using the getLanguageText() function.

Determine User Language

Before text can be extracted from a language file the first step is to determine the user's language. For the logon screen this is provided by the client browser (user agent) in the $_SERVER['HTTP_ACCEPT_LANGUAGE'] variable. For this the User Agent Language Detection script provided by http://techpatterns.com is used. The output from this script is stored in a session variable as follows:

if (!isset($_SESSION['user_language_array'])) {
    // get language codes from HTTP header
    require 'language_detection.inc';
    $_SESSION['user_language_array'] = get_languages();
} // if

This returns an array of entries, one for each of the languages that the user may have set in his/her browser. Each language entry is another array of 4 entries as follows:

  1. Full language abbreviation, such as: 'en-gb' or 'en-us'
  2. Primary language, such as: 'en'
  3. Full language string, such as: 'English (United Kingdom)' or 'English (United States)'
  4. Primary language string, such as: 'English'

Once the user has logged on the language for that session will be taken from the language_id column in the MNU_USER table. If it is blank then the system's default language will be used.
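
To illustrate how language codes can be extracted from the HTTP header, here is a simplified sketch. It is NOT the techpatterns.com script: the function parseAcceptLanguage() is hypothetical, and it produces only the first two of the four entries listed above (full code and primary language), ordered by the browser's stated preference.

```php
<?php
// Simplified sketch, NOT the techpatterns.com script: turn an Accept-Language
// header into a list of language codes ordered by preference (q-value).
function parseAcceptLanguage($header)
{
    $weights = array();
    foreach (explode(',', $header) as $position => $item) {
        $parts = explode(';', trim($item));
        $code  = strtolower(trim($parts[0]));
        if ($code == '' || $code == '*') {
            continue;   // ignore empty items and the wildcard
        }
        $q = 1.0;       // default weight when no q-value is supplied
        if (isset($parts[1]) && preg_match('/^\s*q=([0-9.]+)\s*$/', $parts[1], $regs)) {
            $q = (float) $regs[1];
        }
        $weights[$code] = array($q, $position);
    }
    // highest q-value first; ties keep their original order
    uasort($weights, function ($a, $b) {
        if ($a[0] == $b[0]) {
            return $a[1] - $b[1];
        }
        return ($a[0] > $b[0]) ? -1 : 1;
    });
    $result = array();
    foreach (array_keys($weights) as $code) {
        // entry 0 = full code, entry 1 = primary language
        $primary  = explode('-', $code);
        $result[] = array($code, $primary[0]);
    }
    return $result;
}

print_r(parseAcceptLanguage('fr-CA,fr;q=0.8,en-US;q=0.6,en;q=0.4'));
```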

Locate Language File

Before the contents of a particular file can be loaded it is first necessary to locate a version of that file in an appropriate language subdirectory. This is done with the following function:

function getLanguageFile ($filename, $directory)
// look for '$directory/$language/$filename' where $language is variable.
{
    $language_array = array();

    if (!empty($GLOBALS['party_language'])) {
        // change hyphen to underscore before file system lookup
        $language_array[] = str_replace('-', '_', strtolower($GLOBALS['party_language']));
    } // if

    $browser_language = getBrowserLanguage($directory);
    if (!empty($browser_language)) {
        $language_array[] = $browser_language;
    } // if

    $language_array[] = $_SESSION['default_language'];
    $language_array[] = 'en';

    // search directories in priority order and stop when the file is found
    foreach ($language_array as $language) {
        $fname = "$directory/$language/$filename";
        if (file_exists($fname)) {
            break;
        } // if
    } // foreach

    if (!file_exists($fname)) {
        // 'File $fname cannot be found'
        trigger_error(getLanguageText('sys0056', $fname), E_USER_ERROR);
    } // if

    return $fname;

} // getLanguageFile

The $directory argument is one of text, screens or reports.

At runtime the framework will build an array of subdirectory names which suit the user's preferences, then scan these subdirectories one at a time looking for a file with the specified name. It will stop looking when the file is found. If a subdirectory does not exist, or the file does not exist in a subdirectory, then the search will skip to the next subdirectory. Note that the array of possible language subdirectories is always terminated with the default language, so a file should always be found. In the rare event that no matching file can be found the application will be terminated with a fatal error.

Within Radicore it is also possible for each user to define his/her preferred language code in the Update User screen. This value is made available in $GLOBALS['party_language'], so is used in preference to the browser language and the installation default language.

Note here that as an absolute minimum there MUST be a subdirectory for the default language, and this subdirectory MUST contain a full set of the expected files otherwise the application will be terminated.
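
The search order can be tried out in isolation with the following sketch. It assumes a simplified stand-in function, findLanguageFile(), which takes the list of candidate languages as an argument instead of building it from the session; the temporary directory and file are created purely for the demonstration.

```php
<?php
// Sketch of the search order only; $candidates stands in for the list that
// getLanguageFile() builds from the user, browser and default languages.
function findLanguageFile($filename, $directory, array $candidates)
{
    foreach ($candidates as $language) {
        $fname = "$directory/$language/$filename";
        if (file_exists($fname)) {
            return $fname;      // first match wins
        }
    }
    return false;               // the framework raises a fatal error at this point
}

// build a throw-away directory tree that only has an 'en' version
$base = sys_get_temp_dir() . '/i18n_demo_' . uniqid();
mkdir("$base/text/en", 0777, true);
file_put_contents("$base/text/en/language_text.inc", '<?php return array();');

// user asks for fr_ca, browser offers fr, default is en
$found = findLanguageFile('language_text.inc', "$base/text", array('fr_ca', 'fr', 'en'));
echo basename(dirname($found)), "\n";   // en
```

Because neither 'fr_ca' nor 'fr' exists in this example, the search falls through to the 'en' subdirectory.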

Load Screen/Report Structure file

This uses the getLanguageFile() function to locate the relevant language subdirectory in the './screens/' or './reports/' path before loading in the specified file.

function getFileStructure ($filename, $directory)
// load the contents of the $structure variable from a disk file.
{
    // locate file in subdirectory which matches user's language code
    $fname = getLanguageFile ($filename, $directory);
    
    require $fname; // import contents into $structure
    if (empty($structure)) {
        // 'File $fname is empty'
        trigger_error(getLanguageText('sys0124', $fname), E_USER_ERROR);
    } // if

    return $structure;

} // getFileStructure

Note that it does not matter if the only subdirectory that exists for screen structure files is in the default language as all the field labels will be translated into the chosen language at a later stage.

Get Language Text

Individual pieces of translated text will be extracted from the relevant language_text.inc or sys.language_text.inc files using the following function:

function getLanguageText ($id, $arg1=null, $arg2=null, $arg3=null, $arg4=null, $arg5=null)
// get text from the language file and include up to 5 arguments.
{
    static $array1;
    static $array2;
    
    if (!is_array($array1)) {
        // find file in a language subdirectory
        $fname = getLanguageFile('sys.language_text.inc', '../menu/text');
        $array1 = require_once $fname;
        if (empty($array1)) {
            // 'File $fname is empty'
            trigger_error(getLanguageText('sys0124', $fname), E_USER_ERROR);
        } // if
        unset ($array);
        // extract identity of language subdirectory
        $language = basename(dirname($fname));
        // use this language in the XSL transformation
        $GLOBALS['output_language'] = $language;
    } // if
        
    if (!is_array($array2)) {
        // find file in a language subdirectory
        $fname = getLanguageFile('language_text.inc', './text');
        $array2 = require_once $fname;
        if (empty($array2)) {
            // 'File $fname is empty'
            trigger_error(getLanguageText('sys0124', $fname), E_USER_ERROR);
        } // if
        unset ($array);
    } // if
    
    // perform lookup for specified $id ($array2 first, then $array1)
    if (isset($array2[$id])) {
        $string = $array2[$id];
    } elseif (isset($array1[$id])) {
        $string = $array1[$id];
    } else {
        // nothing found, so return original input
        return $id;
    } // if
    
    $string = convertEncoding($string, 'UTF-8');

    if (!is_null($arg1)) {
        // insert argument(s) into string
        $string = sprintf($string, $arg1, $arg2, $arg3, $arg4, $arg5);
    } // if
    
    return $string;
    
} // getLanguageText

Please note the following:

This new function has been inserted into the following places:

  1. Inside addParams2XMLdoc() to load script titles:
    $xsl_params['title'] = getLanguageText($task_id);
    
  2. Inside setActBar() to load action buttons:
    $label = getLanguageText($label);
    
  3. Inside setMenuBar() to load menu buttons:
    $button['button_text'] = getLanguageText($button['button_text']);
    
  4. Inside setNavBar() to load navigation buttons:
    $button['button_text'] = getLanguageText($button['button_text']);
    
  5. Inside setScreenStructure() to load field labels:
    $fieldlabel = getLanguageText($fieldlabel);
    
  6. In various places for all error messages, such as:
    if (strlen($fieldvalue) > $size) {
        // '$fieldname cannot be > $size characters'
        $this->errors[$fieldname] = getLanguageText('sys0021', $fieldname, $size);
    } // if
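
The way arguments are inserted into a message by sprintf() can be sketched as follows. The message texts here are hypothetical (the real wording lives in the language files), and the use of numbered placeholders such as %1$s is an assumption made for this illustration; it allows a translation to place the arguments in a different order from the English text.

```php
<?php
// Hypothetical message texts; the real wording lives in the language files.
$messages = array(
    'en' => '%1$s cannot be > %2$s characters',
    'de' => 'Maximal %2$s Zeichen sind für %1$s erlaubt',
);

foreach ($messages as $language => $text) {
    // same arguments, same order, regardless of where they appear in the text
    echo $language, ': ', sprintf($text, 'Surname', 40), "\n";
}
```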
    

Get Language Array

Individual arrays of translated text will be extracted from the relevant language_array.inc or sys.language_array.inc files using the following function:

function getLanguageArray ($id)
// get named array from the language file.
{
    static $array1;
    static $array2;
    
    if (!is_array($array1)) {
        // find file in a language subdirectory
        $fname = getLanguageFile('sys.language_array.inc', '../menu/text');
        $array1 = require_once $fname;
        if (empty($array1)) {
            // 'File $fname is empty'
            trigger_error(getLanguageText('sys0124', $fname), E_USER_ERROR);
        } // if
        unset ($array);
    } // if
        
    if (!is_array($array2)) {
        // find file in a language subdirectory
        $fname = getLanguageFile('language_array.inc', './text');
        $array2 = require_once $fname;
        if (empty($array2)) {
            // 'File $fname is empty'
            trigger_error(getLanguageText('sys0124', $fname), E_USER_ERROR);
        } // if
        unset ($array);
    } // if
    
    // perform lookup for specified $id ($array2 first, then $array1)
    if (isset($array2[$id])) {
        $result = $array2[$id];
    } elseif (isset($array1[$id])) {
        $result = $array1[$id];
    } else {
        // nothing found, so return original input as an array
        $result = array($id => $id);
    } // if
    
    foreach ($result as $key => $value) {
        $result[$key] = convertEncoding($value, 'UTF-8');
    } // foreach
    
    return $result;
    
} // getLanguageArray

Please note the following:

This new function should be used to obtain any array where the values will be displayed to the user. For example, instead of:

$languages = array('en' => 'English',
                   'es' => 'Spanish',
                   'fr' => 'French');

you should use the following:

$languages = getLanguageArray('languages');
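
The lookup order (application file first, then system file, then the original input) can be demonstrated without any files using the following sketch. The function lookupArray() and the array contents are illustrative assumptions.

```php
<?php
// Sketch of the lookup order, with the two language files replaced by
// in-memory arrays (contents are illustrative).
function lookupArray($id, array $application, array $system)
{
    if (isset($application[$id])) {
        return $application[$id];   // application text takes priority
    }
    if (isset($system[$id])) {
        return $system[$id];        // fall back to the system libraries
    }
    return array($id => $id);       // nothing found, so hand the id back
}

$system      = array('month_names_short' => array(1 => 'Jan', 'Feb', 'Mar'));
$application = array('languages' => array('en' => 'English',
                                          'es' => 'Spanish',
                                          'fr' => 'French'));

print_r(lookupArray('languages', $application, $system));
print_r(lookupArray('unknown_list', $application, $system));
```

An unknown id is returned as a single-entry array, so a missing translation shows up on screen as the id itself rather than causing an error.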

Handling Dates

In the Radicore framework all date validation is handled by a standardised class (refer to A class for validating and formatting dates) so it was very easy to change this:

    $this->monthalpha = array(1 => 'Jan','Feb','Mar','Apr','May','Jun',
                                   'Jul','Aug','Sep','Oct','Nov','Dec');

or

    $this->monthalpha = array(1 => 'Janv', 'Févr', 'Mars', 'Avr', 'Mai', 'Juin',
                                   'Juil', 'Août', 'Sept', 'Oct', 'Nov', 'Déc');

to this:

    $this->monthalpha = getLanguageArray('month_names_short');

This means that when the date format is 'dd Mmm yyyy' then the 'Mmm' portion will contain the month names in the user's language.
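
As an illustration, the following sketch formats a date as 'dd Mmm yyyy' using whichever month array has been loaded. The function formatDate() is hypothetical and is not the Radicore date class, and the internal format 'yyyy-mm-dd' is an assumption; the month abbreviations are those shown above.

```php
<?php
// Hypothetical helper: format an internal 'yyyy-mm-dd' date as 'dd Mmm yyyy'
// using whichever month name array was loaded for the user's language.
function formatDate($internal, array $monthalpha)
{
    list($year, $month, $day) = explode('-', $internal);
    return $day . ' ' . $monthalpha[(int) $month] . ' ' . $year;
}

$en = array(1 => 'Jan','Feb','Mar','Apr','May','Jun',
                 'Jul','Aug','Sep','Oct','Nov','Dec');
$fr = array(1 => 'Janv', 'Févr', 'Mars', 'Avr', 'Mai', 'Juin',
                 'Juil', 'Août', 'Sept', 'Oct', 'Nov', 'Déc');

echo formatDate('2005-07-15', $en), "\n";   // 15 Jul 2005
echo formatDate('2005-07-15', $fr), "\n";   // 15 Juil 2005
```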

Please note the following:

Handling Numbers

Although the English notation for numbers is to use '.' (period) for the decimal separator and ',' (comma) for the thousands separator there are some countries which use a different notation. Some have the two separators completely reversed, and some use ' ' (space) as the thousands separator. Regardless of any national conventions all numbers are processed within the program code, and stored within the database, in a common format. That is, the decimal point is a '.' (period) and there are no thousands separators.

This means that all decimal values must be formatted before they can be output to the user, and any user input must be unformatted before it can be handled by the program.

Convert to external (user's) format

A very important step in this process is therefore to identify all the decimal format conventions expected by the user. Fortunately all the relevant information can be provided by the localeconv() function. Unfortunately this requires the user's actual locale to be identified first with the setlocale() function. I say 'unfortunately' because the input to this function is the user's current locale or location whereas the only information available at present is the user's preferred language as supplied in the HTTP variables. I have got round this minor annoyance by modifying the 'languages' array used to determine the user's language to include a locale in the full language string, as in the following examples:

This means that I can now set the user's locale using code similar to the following:

    // get full language string from first entry in user_language_array
    $country = $_SESSION['user_language_array'][0][2];
    // extract locale which is enclosed in '[' and ']'
    if (!preg_match('?\[[^\[]+\]?', $country, $regs)) {
        // 'Locale is not defined in string' 
        trigger_error(getLanguageText('sys0078', $country), E_USER_ERROR);
    } // if
    $locale = trim($regs[0], '[]');
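    // find out if this is a valid locale (setlocale() returns FALSE on failure)
    $result = setlocale(LC_ALL, $locale);
    if ($result === false) {
        // 'Cannot set locale' (report the name that was requested)
        trigger_error(getLanguageText('sys0079', $locale), E_USER_ERROR);
    } // if
    $locale = $result;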

Having set the locale it is then a relatively simple exercise to convert any decimal number from internal to external format using code similar to the following:

    $decimal_places = $this->fieldspec[$fieldname]['scale'];
    $locale = localeconv();
    $decimal_point  = $locale['decimal_point'];
    $thousands_sep  = $locale['thousands_sep'];
    if ($thousands_sep == chr(160)) {
        // change non-breaking space into ordinary space
        $thousands_sep = chr(32);
    } // if
    $fieldvalue = number_format($fieldvalue,
                                $decimal_places,
                                $decimal_point,
                                $thousands_sep);
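
To show the effect of the different separators without depending on which locales are installed on the server, the following sketch passes the separators to number_format() explicitly instead of taking them from localeconv().

```php
<?php
$value = 1234567.891;

// English notation: period for decimals, comma for thousands
echo number_format($value, 2, '.', ','), "\n";   // 1,234,567.89

// the two separators reversed
echo number_format($value, 2, ',', '.'), "\n";   // 1.234.567,89

// space as the thousands separator
echo number_format($value, 2, ',', ' '), "\n";   // 1 234 567,89
```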

Convert to internal format

When the user presses the SUBMIT button any numbers that have been input will need to be converted back into internal format before they can be processed. This is done with code similar to the following:

function number_unformat ($input)
// convert input string into a number using settings from localeconv()
{
    $locale = localeconv();
    $decimal_point  = $locale['decimal_point'];
    $thousands_sep  = $locale['thousands_sep'];
    if ($thousands_sep == chr(160)) {
        // change non-breaking space into ordinary space
        $thousands_sep = chr(32);
    } // if
    
    if (substr_count($input, $decimal_point) > 1) {
        // too many decimal points, so return the input unchanged
        return $input;
    } // if
    
    // split number into 2 distinct parts (the fraction may be absent)
    $parts    = explode($decimal_point, $input);
    $integer  = $parts[0];
    $fraction = isset($parts[1]) ? $parts[1] : '';
    
    // remove thousands separator
    $integer = str_replace($thousands_sep, '', $integer);
    
    // join the two parts back together again
    $number = $integer;
    if ($fraction != '') {
        $number .= '.' .$fraction;
    } // if
    
    return $number;
    
} // number_unformat
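
The conversion can be tried with different national conventions using the following sketch, in which the separators are passed as arguments rather than taken from localeconv(). The function unformatNumber() is a hypothetical variant written for this illustration only.

```php
<?php
// Sketch with the separators passed in explicitly rather than from localeconv()
function unformatNumber($input, $decimal_point, $thousands_sep)
{
    if (substr_count($input, $decimal_point) > 1) {
        return $input;   // malformed, so leave it for validation to reject
    }
    $parts   = explode($decimal_point, $input);
    $integer = str_replace($thousands_sep, '', $parts[0]);
    // only append the fraction when one was supplied
    return isset($parts[1]) ? $integer . '.' . $parts[1] : $integer;
}

echo unformatNumber('1.234.567,89', ',', '.'), "\n";   // 1234567.89
echo unformatNumber('1 234 567,89', ',', ' '), "\n";   // 1234567.89
echo unformatNumber('1,234,567',    '.', ','), "\n";   // 1234567
```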

Character Encoding

The process of internationalisation can be as simple as replacing a string of text in one language with a string of text in another language, or it can be much more complicated involving the use of different character sets, as explained in Notes on Internationalisation.

In order to deal with the various accented characters that exist in various languages it is necessary to use the correct character encoding, otherwise the display may be incorrect. In some cases it may result in an invalid character being written to the XML file, which will cause the XSL transformation to fail.
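
One way to detect text that is not valid UTF-8 before it is written to the XML document is sketched below. The function isValidUtf8() is hypothetical and is not part of Radicore; it relies on the fact that preg_match() with the 'u' modifier refuses a subject string containing malformed UTF-8.

```php
<?php
// Sketch: detect text that is not valid UTF-8 before it reaches the XML document.
function isValidUtf8($string)
{
    // the 'u' modifier makes PCRE reject malformed UTF-8 subjects
    return preg_match('//u', $string) === 1;
}

var_dump(isValidUtf8("Ao\xC3\xBBt"));   // bool(true)  - 'Août' encoded as UTF-8
var_dump(isValidUtf8("Ao\xFBt"));       // bool(false) - same word in ISO-8859-1
```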

The best character set for internationalisation is UTF-8, which is why the next version of PHP, version 6, will have much better UTF-8 support. In the meantime, if you wish to support as wide a range of foreign languages as possible it is recommended that you take the following steps:


Conclusion

Now that the software is in place it should be simple (in theory) to cater for new languages simply by creating a new subdirectory for the language, then dropping in a set of files which contain the translated text. This system may not be able to cater for every language or locale that exists, but it will deal with the most common ones.

My sample application has been updated with all this code, so feel free to download it and try it out. Contributions of translated files will be most welcome.

If it is necessary to provide application data in multiple languages from a single installation, then please refer to Internationalisation and the Radicore Development Infrastructure (Part 2).


References


Amendment History

01 Jan 2013  Added the Database Structure to identify that text which is obtained directly from the database.
             Modified Directory Structure and File Names to remove the HELP directory and all its language subdirectories. All text will now be obtained directly from the HELP_TEXT and HELP_TEXT_ALT tables.
             Modified language_text.inc to remove task details as all text will now be obtained directly from the MNU_TASK and MNU_TASK_ALT tables.
01 Jun 2009  Modified Locate Language File so that it looks for the existence of a file in the language subdirectory instead of just the existence of the language subdirectory.
01 Feb 2008  Included a reference to Internationalisation and the Radicore Development Infrastructure (Part 2) which describes a method of providing translations of application data.
27 Jan 2007  Added a new section on Character Encoding.
10 Mar 2006  Added a new section on File Contents.
