Tony Marston's Blog - About software development, PHP and OOP

Case Sensitive Software is EVIL

Posted on 27th January 2006 by Tony Marston

Amended on 5th August 2006

In my article Breaking Backwards Compatibility is EVIL I voiced my disapproval of the idea of introducing case sensitivity for function names into PHP when version 6 is released. This would mean that a function name with the same spelling but different case would be treated as a completely different function. PHP already has case-sensitive variable names (yuck!), and some of the core developers want to extend it for no other reason than to be "consistent" with other languages. This is such a controversial topic that I decided to split it off as a separate article.

As far as I am concerned the introduction of case-sensitivity into software (operating systems, compilers and tools) was the worst ever decision in the entire history of computing. Personally I blame the authors of Unix as they not only created an operating system which was illogical, unintuitive and user-unfriendly, but they also were either too stupid or too lazy to create a case-insensitive file system. All the existing computer systems were case-insensitive, so what was the justification for the change? To make matters worse this caused every piece of software written to run on a Unix system to also be case-sensitive, and so this stupid mistake has spread like a plague.

In the past 30 years I have worked on a variety of mainframe, mini-computer and micro-computer (PC) systems, and none of these has been case-sensitive in any way - this includes the operating systems, compilers, text and document editors, and database query tools. The authors of this software saw no need for case-sensitivity, and none of the users ever requested it, so where did this stupid idea come from?

The Windows operating system, the most widely used OS in the world today, is not sensitive to case, and neither is any tool or application which runs on it. Does this cause any problems? I think not. Would it cause any problems if this software were to decide that case was important? You betcha!

Can you name me one single problem where introducing case-sensitivity was the solution? On the other hand I have lost count of the situations where having case-sensitivity was actually the cause of the problem, so the idea of implementing something which causes problems instead of solving them strikes me as being incredibly stupid. If you think that case-sensitivity is important, can you answer the following questions:

  1. Is any computer language issued with a set of function names in different combinations of upper and lower case, such as readfile(), readFile(), ReadFile() and READFILE(), where each combination of case means something different?
  2. Is any computer language issued with a set of variable names in different combinations of upper and lower case, such as box, Box and BOX, where each combination of case means something different?
  3. If this feature were available would you encourage or discourage its use?
  4. If you would discourage the use of this feature as (presumably) it would lead to code which was difficult to maintain, would it not make sense to remove it from the language so that it could not be used even by accident?
  5. A change in case does not change the meaning of a word in spoken language (just check any dictionary to see if the same word has different entries for upper and lower case) so why should it be any different in a computer language?

NOTE: Some people who read this article are jumping to the wrong conclusion. I am not advocating FOR the right to deliberately use one combination of case for a variable or function only to use a different combination of case elsewhere in the same program. Anybody who deliberately does such a thing deserves a good talking to. What I am advocating AGAINST is the situation where a variable or function, if defined or referenced in a different case, actually becomes a totally different object. If I encounter readfile(), readFile(), ReadFile() and READFILE() it causes far more problems if they are totally different functions than it does by being the same function but with different case.
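A minimal sketch of the kind of problem described in this note, using Python purely as an illustration of a case-sensitive language (the function and variable names are hypothetical and not taken from any real code base). A single slip of the shift key creates a second variable instead of updating the first, and nothing complains:

```python
def sum_prices(prices):
    """Intended to add up the prices, but one line uses the wrong case."""
    total = 0
    for price in prices:
        # Slip: 'Total' is a brand-new variable, not another spelling of 'total'.
        Total = total + price
    return total  # still 0, because 'total' was never updated


def sum_prices_fixed(prices):
    """The same loop with the name written consistently."""
    total = 0
    for price in prices:
        total = total + price
    return total


print(sum_prices([1, 2, 3]))        # 0
print(sum_prices_fixed([1, 2, 3]))  # 6
```

In a case-insensitive language both spellings would refer to the same variable, so the first version would behave like the second.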

Using a different case may offend the sensibilities of some delicate souls out there (oh, the poor little darlings), but the consequences of having a series of different functions or variables which have the same spelling but different case are far more serious as they can create genuine problems. In my many decades of experience only a complete moron cannot handle a simple change of case, and only a complete moron thinks that having readfile(), readFile(), ReadFile() and READFILE() as a series of different functions is a Good Thing.

In order to aid those of you who are intellectually impaired let me give you some practical examples.

Those who think that option (1) is best and option (2) should be avoided are actually agreeing with my argument. Those who think that option (2) is a Good Thing are beyond redemption and should be subject to involuntary euthanasia at the earliest possible opportunity. I am not the only person who thinks that using duplicate names that differ subtly only in case is not a good idea - check out item 21 on How to write unmaintainable code.

Those of you who say that the ability to write the same name in different case is wrong are confusing "mildly annoying" with "catastrophic". They completely fail to realise that the potential for genuine mistakes is far greater in software which is case sensitive. Most reasonable people would not even notice a slight change in case as they concentrate on the spelling of a word, not the case in which it is written. Those who complain about case sensitivity are being overly sensitive (pun intended). In fact I would go so far as to call them nit-picking, anal retentive, OCPD sufferers who are in serious need of a reality check.

If constructs such as GOTO are removed from modern languages due to their propensity to produce spaghetti code, then why include a feature that can help produce an even worse mess? Some languages which have case-sensitivity (such as Visual Basic) avoid any such problems by automatically changing the case of any variable or function names as they are keyed in to whatever has been previously declared, thus making the fact that they are case-sensitive totally invisible. While this can work with languages which are statically typed and compiled (such as VB), it is more difficult to implement in those which are dynamically typed and interpreted (such as PHP).

Some of the arguments I have heard in favour of case-sensitivity are pretty weak:

The fact that some languages and tools have case-sensitivity is no excuse for insisting that ALL languages and tools be changed to implement case-sensitivity "just to be consistent". Ideas are supposed to be implemented because they are good ideas, because they provide benefits or solve problems. Implementing a bad idea just to be consistent with other languages simply perpetuates a consistently bad idea.

Some of my critics like to argue that "plenty of modern languages are case sensitive, yet nobody complains about any problems they cause". But what they fail to notice is that most of these languages trap the situation where a variable or function is declared or referenced with the same spelling but a different case and are able to deal with it before it can cause any problems. This is what happens in Visual Basic for example:

So even though Visual Basic is case sensitive it does not allow the same name (i.e. with the same spelling) to exist more than once with different combinations of upper and lower case. Thus the functions readfile(), readFile() and ReadFile() are exactly the same, and the variables somedata, someData and SomeData are exactly the same. The VB IDE automatically corrects any variation in case, so the use of different case does not cause any problems. How many other languages shield the unwary programmer from differences in case in the same way? THAT is why programmers never complain about case sensitivity causing problems, for the simple reason that any problems which COULD be caused are automatically detected and corrected by the IDE or notified at the time of compilation. It is just not possible to create a compiled VB program which contains the same spelling but with different case even though the language is (apparently) sensitive to case.

Where a language does NOT provide this auto-correction facility, thus deliberately allowing the same name with different case to refer to different objects, this can lead to situations which are a maintenance nightmare.
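Where the language or IDE offers no such safeguard, a similar check can be approximated with a small tool. The following is only a sketch, written in Python, of one way to flag names that share a spelling but differ in case. The function name `find_case_collisions` and the sample identifiers are hypothetical, and this is not a description of how any real IDE or compiler does it:

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group identifiers that are equal once case is ignored.

    Returns a dict mapping the case-folded spelling to the sorted list of
    distinct spellings found, for groups with more than one spelling only.
    """
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    return {
        folded: sorted(spellings)
        for folded, spellings in groups.items()
        if len(spellings) > 1
    }


declared = ["readfile", "readFile", "ReadFile", "box", "lid"]
print(find_case_collisions(declared))
# {'readfile': ['ReadFile', 'readFile', 'readfile']}
```

A check like this could be run over the declared names in a code base to report the collisions before they turn into a maintenance problem.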

The reason that few programmers complain about the problems caused by case sensitive software is that their complaints are instantly rejected. "It's the standard" they are told, "so you must learn to live with it". Few programmers have the audacity to question such stupid practices, but I am not so timid. I have had my share of being forced to work with second-rate standards which were full of half-baked ideas, and I know what a joy it is to work with first-rate standards where every statement is properly explained and justified. Statements which cannot be explained or justified have no place in any standards, and I'm afraid that explanations such as "it's the standard", or "it's consistent" or "because I say so" just don't qualify.

How many languages or libraries come supplied with functions and variables which exist more than once but with different case? The answer is NONE! Why not? Because it is not considered to be good practice. It would cause immense amounts of confusion and maintenance headaches. So, if this language "feature" is avoided by all language authors and competent programmers because of its potential for misuse, then why do these languages allow this "feature" to exist in the first place? If the GOTO statement has been eliminated from many languages due to the problems which can be caused by its misuse, then why not remove case-sensitivity for exactly the same reason? After all, this would be "consistent" and "promote good practice".

Traditional naming conventions state that all functions and variables should be given names that are meaningful and descriptive - a function name should describe what it does, and a variable name should describe what it contains. This means that if you want a different function or a different variable then you create a different name with different spelling and therefore a different meaning, not the same name in a different case. Am I really the only person to see this?

Those who say that the correct use of naming conventions avoids any problems with case sensitivity are missing the point - it does not solve the problem, it merely hides it. It simply papers over the crack, but the crack is still there and waiting to catch the unwary. It does not prevent programmers from using what is supposed to be the "wrong" case, either accidentally or deliberately. If you have ever debugged a program where the problem was caused by the incorrect use of case you will know what a ridiculous problem this is. Doing this accidentally is excusable, but some perverse programmers do it deliberately just to cause confusion, to create obfuscated code which only they can maintain. In my opinion if the accidental or deliberate use of the wrong case can cause such problems then the ability to use the wrong case should be removed from the language. The re-introduction of case insensitive software, or at least case preserving software, even if it were limited to variable names and function names, would eliminate such annoying problems without any downside whatsoever.

As far as I am concerned if a computer language does not care which case a token is written in then neither should any programmer. If a programmer cannot look at code and understand how that code will be processed by the computer then he is, quite frankly, in the wrong profession. Every programmer is exposed to mixtures of upper and lower case in the outside world before he becomes a programmer, so anyone who cannot understand source code which is written in a mixture of upper and lower case is, quite frankly, in the wrong profession.

There are some people who try to justify the use of case sensitive software with spurious reasons:

  1. They say it is more efficient because it doesn't have to try all possible combinations of case when it performs a token lookup, such as when processing an HTML or XML document. Why don't these people use a simple trick that I came across decades ago? When a file is first read into memory just convert all the tokens to lowercase, then all lookups need be done only once in lowercase.
  2. They say it is necessary in some languages because they have characters which exist in lower case only and do not have an equivalent in upper case. This just shows that their thinking is deficient - instead of saying that such a character does not exist in upper case they should instead say that such a character does not exist in a different form in upper case. So every character will exist in both lower and upper case, but in some circumstances both characters will be the same.
  3. They say that it is necessary in some languages as trying to convert all tokens to upper- or lower-case for comparisons is not possible as there is a one-to-many relationship instead of the normal one-to-one relationship. The prime example given is the lowercase German character ß which for a long time was represented as SS in uppercase. Although typographers in Germany created and used a single uppercase equivalent as far back as 1905, it was not formally adopted until 2017.
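The normalise-once trick from point 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the helper names `build_lookup` and `lookup` and the sample tokens are hypothetical, and it uses Python's built-in `str.casefold()` rather than a plain lowercase conversion, because case folding also copes with the ß case raised in point 3.

```python
def build_lookup(pairs):
    """Store every key exactly once, folded to a single case."""
    return {key.casefold(): value for key, value in pairs}


def lookup(table, token):
    """Fold the incoming token once, then do one exact-match lookup."""
    return table.get(token.casefold())


handlers = build_lookup([("TITLE", "title handler"), ("Body", "body handler")])
print(lookup(handlers, "title"))  # title handler
print(lookup(handlers, "BODY"))   # body handler

# The German sharp s from point 3: upper-casing yields two characters,
# so a plain lower() comparison fails while casefold() still matches.
print("ß".upper())                                    # SS
print("straße".lower() == "STRASSE".lower())          # False
print("straße".casefold() == "STRASSE".casefold())    # True
```

Each token is folded once when it is stored and once when it is looked up, so there is no need to try every combination of case.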

Have you also given thought to the time when keyboards give way to voice-controlled input? How cumbersome will it be not just to say the word but to spell out the case of every single letter? Do you think your audience will congratulate you for your foresight and wisdom? I don't think so.

So remember, when you say that you are in favour of case sensitive software you are also saying that you are in favour of the following:

Am I really the only one who thinks that these are NOT good ideas? Apparently not. Please take a look at the following:

I think that the article sums up the argument quite nicely:

There is no longer any excuse for making humans learn and handle the quirks of the way computers store upper- and lower-case characters. Instead, software should handle the quirks of human language.


Amendment History

5th Aug 2006 Added a NOTE for those of questionable intelligence who fail to understand exactly what it is I am arguing about, and whether I am FOR it or AGAINST it.