3rd September 2006
A lot of developers are taught to use database stored procedures, triggers and database constraints at every possible opportunity, and they cannot understand why an old dinosaur like me should choose to take an opposite view. The reason can be summed up quite simply:
You only know what you have been taught, whereas I know what I have learned.
I was weaned on file systems and databases which did not have any facilities for stored procedures and triggers, so I learned how to build applications without them. When such facilities became available my colleagues and I still never used them for practical reasons:
Our golden rule was:
Use stored procedures and triggers only when it is an absolutely necessity.
This is in total conflict with the attitude of today's wet-behind-the-ears tenderfoot greenhorns who seems to think:
Use stored procedures and triggers at every possible opportunity simply because you can.
Amongst the arguments in favour of stored procedures are:
Some people argue that putting ad-hoc SQL in your business layer (BL) code is not that good. Agreed, but who said that the only alternative is stored procedures? Why not have a DAL that generates the SQL query at runtime based on information passed to it by the BL? It is correct to say that small changes to the database can have severe impacts on the application. However, changes to a relational model will always have an impact on the application that targets that model: add a non-nullable column to a table and you will see what I mean. You can use stored procedures or ad-hoc queries, you have to change the calling code to make sure that column gets a value when a new row is inserted. For Ad-hoc queries, you change the query, and you're set. For stored procedures, you have to change the signature of the stored procedure, since the INSERT/UPDATE procs have to receive a value for the new column. This can break other code targeting the stored procedure as well, which is a severe maintenance issue. A component which generates the SQL on the fly at runtime doesn't suffer from this: it will for example receive an entity which has to be saved to the database, that entity contains the new field, the SQL is generated and the entity is saved. No maintenance problems. With a stored procedure this wouldn't be possible.
This is a common argument that many people echo without realising that it became defunct when role-based security was made available. A good DBA defines user-roles in the database, and users are added to those roles and rights are defined per role, not per user. This way, it is easy to control which users can insert / update and which users can for example select or delete or have access to views in an easy way.
With a view it is possible to control which data is accessed on a column basis or row basis. This means that if you want user U to select only 2 or so columns from a table, you can give that user access to a view, not the underlying table. The same goes for rows in one or more tables. Create a view which shows those rows, filtering out others. Give access rights to the view, not the table, obviously using user-roles. This way you can limit access to sensitive data without having to compromise your programming model because you have to move to stored procedures.
It is also said that stored procedures are more secure because they prevent SQL injection attacks. This argument is false for the simple reason that it is possible to have a stored procedure which concatenates strings together and therefore open itself up to sql injection attacks (generally seen in systems which use procedures and have to offer some sort of general search routine), while the use of parameterized queries removes this vulnerability as no value can end up as being part of the actually query string.
The execution of SQL statements in stored procedures may have been faster than with dynamic SQL in the early days of database systems, but that advantage has all but disappeared in the current versions. In some cases a stored procedure may even be slower than dynamic SQL, so this argument is as dead as a Dodo.
Performance should not be the first question. My belief is that most of the time you should focus on writing maintainable code. Then use a profiler to identify hot spots and then replace only those hot spots with faster but less clear code. The main reason to do this is because in most systems only a very small proportion of the code is actually performance critical, and it's much easier to improve the performance of well factored maintainable code.
While stored procedures may run faster, they take longer to build, test, debug and maintain, therefore this extra speed comes at a price. If the same function can be performed inside application code at an acceptable speed, what is the advantage of spending more money to make it run faster at a more-than-acceptable speed? It is OK to use stored procedures when you absolutely need a performance gain, but until then they're nothing but premature optimization.
A similar argument is that by not using what the company has paid for, you are effectively wasting the company's money. I'm sorry, but using something because it's there is just not good enough. If I can achieve something inside my application with application code, then I must be given a very good reason to move it out of my application and into the database. Believe it or not there are costs involved in moving logic from one place to another, and those costs must be offset by measurable benefits.
No it's not. Application code is built using a programming language whereas SQL is nothing more than a data manipulation language, and is therefore very limited in its scope. There is absolutely nothing that can be done in a stored procedure that cannot also be done in application code, but the converse is not true.
Amongst the arguments against stored procedures are:
Instead of having a structure which separates concerns in a tried and trusted way - GUI, business logic and storage - you now have logic intermingling with storage, and logic on multiple tiers within the architecture. This causes potential headaches down the road if that logic has to change.
The reason for this is that stored procedures form an API by themselves. Changing an API is not that good, it will break a lot of code in some situations. Adding new functionality or new procedures is the "best" way to extend an existing API. A set of stored procedures is no different. This means that when a table changes, or behaviour of a stored procedure changes and it requires a new parameter, a new stored procedure has to be added. This might sound like a minor problem but it isn't, especially when your system is already large and has run for some time. Every system developed runs the risk of becoming a legacy system that has to be maintained for several years. This takes a lot of time, because the communication between the developer(s) who maintain/write the stored procedures and the developer(s) who write the DAL/BL code has to be intense: a new stored procedure will be saved fine, however it will not be called correctly until the DAL code is altered. When you have Dynamic SQL in your BL at your hands, it's not a problem. You change the code there, create a different filter, whatever you like and whatever fits the functionality to implement.
Microsoft also believes stored procedures are over: it's next generation business framework MBF is based on Objectspaces, which generates SQL on the fly.
Business logic in stored procedures is more work to test than the corresponding logic in the application. Referential integrity will often force you to setup a lot of other data just to be able to insert the data you need for a test (unless you're working in a legacy database without any foreign key constraints). Stored procedures are inherently procedural in nature, and hence harder to create isolated tests and prone to code duplication. Another consideration, and this matters a great deal in a sizable application, is that any automated test that hits the database is slower than a test that runs inside of the application. Slow tests lead to longer feedback cycles.
If all the business logic is held in the database instead of the application then the database becomes the bottleneck. Once the load starts increasing the performance starts dropping. With business logic in the application it is easy to scale up simply by adding another processor or two, but that option is not readily available if all that logic is held in the database.
If you have a system with 100's of distributed databases it is far more difficult to keep all those stored procedures and triggers synchronized than it is to keep the application code synchronized.
This is a big issue if you want an application where the customer can insert their own business logic, or where different logic is required by different customers. Achieving this with application code is a piece of cake, but with database logic it is a can of worms.
A big problem with database triggers is that the application does not know that they exist, therefore does not know whether they have run or not. This became a serious issue in one application (not written by me) which I was maintaining. A new DBA who was not aware of the existence of all these triggers did something which deactivated every trigger on the main database. The triggers were still there, they had not been deleted, but they had been turned off so did not fire and do what they were supposed to do. This mistake took several hours to spot and several days to fix.
It is easy to control all changes to application code by running it through a proper version control system, but those facilities do not exist for stored procedures and triggers. How much damage could be caused if a stored procedure were to get out of sync with the application code? How easy is it to check that the application is running with the correct versions? How much more difficult would it be if the application you were supporting was running on a remote site with nothing more than a dial-up connection?
This is a reason why some teams avoid stored procedures like the plague - it eliminates an area of potentially disastrous screw-ups.
You may think that this is not a problem if you build and maintain the applications for a single company where a change in database vendor is highly unlikely, but what happens should the company decide that their DBMS is no longer flavour of the month and they want to change to a different DBMS? This may be due to several factors, such as spiraling costs or poor performance, but when it happens you will find that a lot of code will have to be rewritten. Porting the data will be one exercise, but porting the stored procedures and triggers will be something else entirely. Now, if all that logic were held inside the application, how much simpler would it be?
Believe it or not there are people out there who write applications which are database-independent for the simple reason that the applications may be used by many different companies, and those many companies may not all use the same DBMS. Those that do use the same DBMS may not be using the same version, and stored procedures written for one version may not be compatible with another.
As far as I am concerned the use of stored procedures, database triggers and foreign key restraints is OPTIONAL, not MANDATORY, therefore I am free to exercise my option not to use them. That is my choice, and the software that I produce does not suffer in any way, therefore it cannot be defined as the wrong choice.
The web application framework that I have built using PHP does not use stored procedures, database triggers or foreign key constraints, yet it does not suffer from any lack of functionality. This is possible simply because I can do everything I want inside my application where it is instantly accessible and customisable. To those of you who instantly jump to the (wrong) conclusion that this must mean that I have to write a huge amount of duplicated SQL statements my answer is simple - I don't write any SQL statements at all, they are all generated dynamically at runtime. This is all due to the framework being built using the 3 Tier Architecture which has a clear separation of concerns:
© Tony Marston
3rd September 2006