Don’t DELETE ON CASCADE #356

josecelano · 2023-10-20T11:21:37Z

josecelano
Oct 20, 2023
Maintainer

In one of our team’s weekly meetings, we ended up discussing a very common topic:

"Whether we should put domain logic in the database or not"

It seems to be a commonly accepted rule that stored procedures are no longer considered a good practice (at least by most of my colleagues). As always, this is not a generic rule you should never break. Reality is usually more complex and requires analyzing every single case.

Introduction

In this discussion, I’m not going to analyze the pros and cons deeply; I’ve added a couple of links for articles (above) that I consider the best starting point if you want to have a good understanding of the relationship between Object object-oriented programming and relational databases.

What I want to do is try to explore concrete examples in this project and also try to push the problem to the limit:

Could we consider the “DELETE ON CASCADE” SQL option a syntax sugar for a database-stored procedure?
Can it be considered a business logic rule that should be kept in memory objects instead of in the database implementation?

Why Embedding Business Rules in Infrastructure is Generally NOT a Good Idea

First, I would start by just mentioning some of the problems of having business logic in the database (although, again, Martin Fowler’s article is the best article to understand the trade-offs).

While the database provides powerful mechanisms for enforcing constraints, such as stored procedures, foreign keys, and triggers, over-relying on these mechanisms for business logic can create challenges:

Tight Coupling: Embedding business rules into database constraints or stored procedures can create tight coupling between the database and application logic. Changes in business requirements often lead to changes in business rules. When these rules are tightly bound to the infrastructure, changes can be cumbersome and may require coordinated updates to both the application code and the database.

Decreased Portability: Business rules embedded in the database can make it challenging to switch database providers or move to a different database architecture in the future.

Performance Issues: While database constraints like DELETE ON CASCADE can be convenient, they can sometimes lead to unintentional mass deletions or performance issues when large cascading operations are triggered.

Limited Testing: Testing logic inside stored procedures and triggers can be more challenging than testing application-level code. This is due to the inherent difficulty in creating isolated test environments for databases, and the potential side effects that stored procedures and triggers can introduce.

Complexity: Business logic can be intricate, and cramming that logic into stored procedures or database constraints can lead to very complex and hard-to-maintain database code. This is especially true when there are multiple interacting stored procedures and triggers.

Lack of Version Control: Modern software development relies heavily on version control systems. While databases can be versioned, it's generally more accessible and more straightforward to manage versioning and branching for application code than for database code. Refactoring database schema is usually more complex.

Reduced Flexibility: Over time, businesses evolve, and so do their rules. Relying heavily on database constraints and triggers can make it harder to adapt to these changes. For instance, there might be a need for a more nuanced business rule that doesn’t fit neatly into a standard database constraint.

Development Overhead: Developers proficient in writing efficient, safe, and scalable application code may not be experts in writing stored procedures or managing database triggers. This can lead to an over-reliance on specialized database developers or increase the learning curve for application developers.

Different Data Sources: It’s increasingly common that data come from different sources, not only relational databases. Trying to keep invariants only with database constraints could not be possible in such scenarios.

Schema Evolution: Changing the structure of tables with many foreign key constraints can be cumbersome.

Data Loading or interoperability: Loading data into a database can be slower and more complex due to the need to respect foreign key constraints. Migrating data to different environments can also be more challenging.

The decision to use or not use them should be based on specific project requirements and constraints. The arguments above are not definitive reasons to avoid them but rather considerations to take into account.

Pushing the limits of the rule “don’t use database constraints”

I want to continue with some examples in our application and evaluate whether using the database constraints makes sense.

I’m going to use the Torrust Index application. The Torrust Index (this repo) is a website where registered users can upload and classified torrents and guests can search for them. It is a simple application with some simple business rules that would help understand database constraints' tradeoffs.

Torrents can be classified using categories and tags.

First case: category uniqueness

Categories are defined by the admin and it does not make sense to allow duplicate categories. Right now, the name has to be unique, but in a multi-language application, we should have another identifier that represents the same category, which can be shown to the end-user in different languages:

The original definition of the table was the following:

CREATE TABLE torrust_categories (
category_id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE
);

We could say that “not allowing duplicate category names” is a business rule. The business rule is enforced at the database level. In the future, we could add a new field to the category, like a parent category. Two categories could have the same name if they belong to a different parent. That business rule change would require changing the database schema. We could argue that this is a business rule and we should not put it in the database. We should check in a service domain that we do not allow duplicate names, removing the database constrain. This way, if we change the rule, we just need to change the code inn order to change the rule.

Prons

We only need to change the code.
The business rule is not coupled to the database we are using.
We can implement a new database driver without worrying about duplicating the constraint.
We can test the rule with a repository mock without using the actual database.
We can capture the error at the service level instead of catching the database exception that can differ for each database driver.

Why don’t we do that?

In this case, we would need to implement a mechanism to avoid race conditions. There could be cases where two users submit a form to add the same category, and the database would not complain, leading to duplicated category names. In this case, we prefer to put the business rule in the database because implementing the rule is more effortless.

This is one of the cases no one complains about, but I wanted to analyze even the simplest case.

Second case: delete torrent on cascade when the category is deleted

In the first implementation of the application, there was a foreign key between the torrents and categories. The category ID was a foreign key in the torrents table. The problem with that approach is all torrents using that category were deleted after deleting a category. You can see the issue here. That could have been the intended behaviour; it would be easy for admins to remove support from some torrents. For example, if the Index admin does not want to have “movies” in the index, it can simply delete the “movies” category. The original intention was not that. And I would probably not be the expected way to implement it by most end-users. We only wanted to force all torrents to have precisely one category. Finally, we decided to allow empty categories for torrents, but only if the category is removed.

This is related to an article written by Udy Dahan, “Dont’ Delete - Just Don’t”. Maybe we should not allow delete categories but move them to a different set of “deprecated categories” and let the uploaders choose a new category from the new ones. But that’s another topic. What I want to highlight is that the rule “remove all torrents if a category is removed” was only written in the database migration. When developers read the code, they don’t see that behaviour, and we would not have needed to create a new migration if we had removed the categories manually from the torrents that were using it. If that were the case, we only would have needed to change one line to assign the empty category.

Allowing us to make this change required this new database migration:

-- Step 1: Create a new table with the new structure
CREATE TABLE IF NOT EXISTS "torrust_torrents_new" (
"torrent_id" INTEGER NOT NULL,
"uploader_id" INTEGER NOT NULL,
"category_id" INTEGER NULL,
"info_hash" TEXT NOT NULL UNIQUE,
"size" INTEGER NOT NULL,
"name" TEXT NOT NULL,
"pieces" TEXT NOT NULL,
"piece_length" INTEGER NOT NULL,
"private" BOOLEAN DEFAULT NULL,
"root_hash" INT NOT NULL DEFAULT 0,
"date_uploaded" TEXT NOT NULL,
FOREIGN KEY("uploader_id") REFERENCES "torrust_users"("user_id") ON DELETE CASCADE,
FOREIGN KEY("category_id") REFERENCES "torrust_categories"("category_id") ON DELETE SET NULL,
PRIMARY KEY("torrent_id" AUTOINCREMENT)
);

-- Step 2: Copy rows from the current table to the new table
INSERT INTO torrust_torrents_new (torrent_id, uploader_id, category_id, info_hash, size, name, pieces, piece_length, private, root_hash, date_uploaded)
SELECT torrent_id, uploader_id, category_id, info_hash, size, name, pieces, piece_length, private, root_hash, date_uploaded
FROM torrust_torrents;

-- Step 3: Delete the current table
DROP TABLE torrust_torrents;

-- Step 4: Rename the new table
ALTER TABLE torrust_torrents_new RENAME TO torrust_torrents;

Which is not very simple in SQLite. The code change would have been easier.

Conclusions

The main problems I see in using database constraints to enforce business are:

No single responsibility. If you want to change a business rule, you have to change the code and also the database migrations. If you support multiple databases it's even worse becuase you have to apply the change for all of them.

It's hard to test the business rules. You cannot unit test the business rules. You need to use the database.

Handling errors when rules are broken is harder. In fact, we have a related issue for that: #253.

I’m not saying we should not use “UNIQUE” and “DELETE ON CASCADE”, but we should always ask if the rules can change and if we need the database to keep those invariants or the best place to put them.

Extra considerations

Although I prefer to put the business rules in the code, I'm not blaming databases; they are powerful tools, and we need them in some cases, not only to persist data. For example, when you define DDD aggregates, you need both the aggregate design and the database transactions. Many people think that some race conditions leading to breaking business invariants can be solved only with database transactions, and transactional consistency requires designing the DDD aggregates in a proper way. So, in-memory objects and database features are both needed. See DDD in PHP book for more information about this.

Related Issues

Move logic to avoid duplicate categories from database to services

Do not remove torrents when their categories are removed and allow torrents without categories

Quotes

“Many application developers, particularly strong OO developers like myself, tend to treat relational databases as a storage mechanism that is best hidden away.”
Martin Fowler

“My philosophy is that most of the time you should focus on writing maintainable code.”
Martin Fowler

“For any long-lived enterprise application, you can be sure of one thing - it's going to change a lot. As a result you have to ensure that the system is organized in such a way that's easy to change. Modifiability is probably the main reason why people put business logic in memory.”
Martin Fowler

“Some people still need portability, such as people who provide products that can be installed and interfaced with multiple databases. In this case there is a stronger argument against putting logic into SQL since you have to be so careful about which parts of SQL you can safely use.”
Martin Fowler

Links

Author: Udi Dahan
Title: Don’t Delete
Link: https://udidahan.com/2009/09/01/dont-delete-just-dont/

Author: Vaughn Vernon
Title: The Ideal Domain-Driven Design Aggregate Store?
Link: https://kalele.io/the-ideal-domain-driven-design-aggregate-store/

Author: Martin Fowler
Title: Domain Login and SQL
Link: https://martinfowler.com/articles/dblogic.html

Author: Joel Coehoorn
Title: When/Why to use Cascading in SQL Server?
Link: https://stackoverflow.com/q/59297/3012842

Feedback

Please let me know if you know more articles, videos or resources about this topic. I am interested in knowing what other people think about using or not only stored procedures but also basic constraints or triggers like “UNIQUE” and “DELETE ON CASCADE”.

cc @da2ce7 @mario-nt @WarmBeer @grmbyrn

dailos · 2023-10-22T20:09:07Z

dailos
Oct 22, 2023

Hi Jose, in my case, I like what Fowler says: "for long-lived enterprise", due to the requirements and feature changes, better if you don't rely on the DB for any business logic decision, including deletes on cascade, in the other hand, I would be totally fine using cascades for small application with low chances of evolving to a point where you may need to run a migration to change the cascade. In other words....for me is not black or white, you just use it depending on the use case: if it will be convenient, safe and cost-effective in the long term, why not?

0 replies

josecelano · 2024-03-08T16:01:52Z

josecelano
Mar 8, 2024
Maintainer Author

One problem using DELETE ON CASCADE. With SQLite altering a column is not direct and can lead to lose of data if you do not do it carefully #526 :-/

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don’t DELETE ON CASCADE #356

{{title}}

Replies: 2 comments

{{title}}

{{title}}

Select a reply

Don’t DELETE ON CASCADE #356

josecelano Oct 20, 2023 Maintainer

Introduction

Why Embedding Business Rules in Infrastructure is Generally NOT a Good Idea

Pushing the limits of the rule “don’t use database constraints”

First case: category uniqueness

Second case: delete torrent on cascade when the category is deleted

Conclusions

Extra considerations

Related Issues

Quotes

Links

Feedback

Replies: 2 comments

dailos Oct 22, 2023

josecelano Mar 8, 2024 Maintainer Author

josecelano
Oct 20, 2023
Maintainer

dailos
Oct 22, 2023

josecelano
Mar 8, 2024
Maintainer Author