Don’t DELETE ON CASCADE #356
Replies: 2 comments
-
Hi Jose, in my case, I like what Fowler says: "for long-lived enterprise", due to the requirements and feature changes, better if you don't rely on the DB for any business logic decision, including deletes on cascade, in the other hand, I would be totally fine using cascades for small application with low chances of evolving to a point where you may need to run a migration to change the cascade. In other words....for me is not black or white, you just use it depending on the use case: if it will be convenient, safe and cost-effective in the long term, why not? |
Beta Was this translation helpful? Give feedback.
-
One problem using DELETE ON CASCADE. With SQLite altering a column is not direct and can lead to lose of data if you do not do it carefully #526 :-/ |
Beta Was this translation helpful? Give feedback.
-
In one of our team’s weekly meetings, we ended up discussing a very common topic:
"Whether we should put domain logic in the database or not"
It seems to be a commonly accepted rule that stored procedures are no longer considered a good practice (at least by most of my colleagues). As always, this is not a generic rule you should never break. Reality is usually more complex and requires analyzing every single case.
Introduction
In this discussion, I’m not going to analyze the pros and cons deeply; I’ve added a couple of links for articles (above) that I consider the best starting point if you want to have a good understanding of the relationship between Object object-oriented programming and relational databases.
What I want to do is try to explore concrete examples in this project and also try to push the problem to the limit:
Why Embedding Business Rules in Infrastructure is Generally NOT a Good Idea
First, I would start by just mentioning some of the problems of having business logic in the database (although, again, Martin Fowler’s article is the best article to understand the trade-offs).
While the database provides powerful mechanisms for enforcing constraints, such as stored procedures, foreign keys, and triggers, over-relying on these mechanisms for business logic can create challenges:
Tight Coupling: Embedding business rules into database constraints or stored procedures can create tight coupling between the database and application logic. Changes in business requirements often lead to changes in business rules. When these rules are tightly bound to the infrastructure, changes can be cumbersome and may require coordinated updates to both the application code and the database.
Decreased Portability: Business rules embedded in the database can make it challenging to switch database providers or move to a different database architecture in the future.
Performance Issues: While database constraints like
DELETE ON CASCADE
can be convenient, they can sometimes lead to unintentional mass deletions or performance issues when large cascading operations are triggered.Limited Testing: Testing logic inside stored procedures and triggers can be more challenging than testing application-level code. This is due to the inherent difficulty in creating isolated test environments for databases, and the potential side effects that stored procedures and triggers can introduce.
Complexity: Business logic can be intricate, and cramming that logic into stored procedures or database constraints can lead to very complex and hard-to-maintain database code. This is especially true when there are multiple interacting stored procedures and triggers.
Lack of Version Control: Modern software development relies heavily on version control systems. While databases can be versioned, it's generally more accessible and more straightforward to manage versioning and branching for application code than for database code. Refactoring database schema is usually more complex.
Reduced Flexibility: Over time, businesses evolve, and so do their rules. Relying heavily on database constraints and triggers can make it harder to adapt to these changes. For instance, there might be a need for a more nuanced business rule that doesn’t fit neatly into a standard database constraint.
Development Overhead: Developers proficient in writing efficient, safe, and scalable application code may not be experts in writing stored procedures or managing database triggers. This can lead to an over-reliance on specialized database developers or increase the learning curve for application developers.
Different Data Sources: It’s increasingly common that data come from different sources, not only relational databases. Trying to keep invariants only with database constraints could not be possible in such scenarios.
Schema Evolution: Changing the structure of tables with many foreign key constraints can be cumbersome.
Data Loading or interoperability: Loading data into a database can be slower and more complex due to the need to respect foreign key constraints. Migrating data to different environments can also be more challenging.
The decision to use or not use them should be based on specific project requirements and constraints. The arguments above are not definitive reasons to avoid them but rather considerations to take into account.
Pushing the limits of the rule “don’t use database constraints”
I want to continue with some examples in our application and evaluate whether using the database constraints makes sense.
I’m going to use the Torrust Index application. The Torrust Index (this repo) is a website where registered users can upload and classified torrents and guests can search for them. It is a simple application with some simple business rules that would help understand database constraints' tradeoffs.
Torrents can be classified using categories and tags.
First case: category uniqueness
Categories are defined by the
admin
and it does not make sense to allow duplicate categories. Right now, the name has to be unique, but in a multi-language application, we should have another identifier that represents the same category, which can be shown to the end-user in different languages:The original definition of the table was the following:
We could say that “not allowing duplicate category names” is a business rule. The business rule is enforced at the database level. In the future, we could add a new field to the category, like a parent category. Two categories could have the same name if they belong to a different parent. That business rule change would require changing the database schema. We could argue that this is a business rule and we should not put it in the database. We should check in a service domain that we do not allow duplicate names, removing the database constrain. This way, if we change the rule, we just need to change the code inn order to change the rule.
Prons
Why don’t we do that?
In this case, we would need to implement a mechanism to avoid race conditions. There could be cases where two users submit a form to add the same category, and the database would not complain, leading to duplicated category names. In this case, we prefer to put the business rule in the database because implementing the rule is more effortless.
This is one of the cases no one complains about, but I wanted to analyze even the simplest case.
Second case: delete torrent on cascade when the category is deleted
In the first implementation of the application, there was a foreign key between the torrents and categories. The category ID was a foreign key in the
torrents
table. The problem with that approach is all torrents using that category were deleted after deleting a category. You can see the issue here. That could have been the intended behaviour; it would be easy for admins to remove support from some torrents. For example, if the Index admin does not want to have “movies” in the index, it can simply delete the “movies” category. The original intention was not that. And I would probably not be the expected way to implement it by most end-users. We only wanted to force all torrents to have precisely one category. Finally, we decided to allow empty categories for torrents, but only if the category is removed.This is related to an article written by Udy Dahan, “Dont’ Delete - Just Don’t”. Maybe we should not allow delete categories but move them to a different set of “deprecated categories” and let the uploaders choose a new category from the new ones. But that’s another topic. What I want to highlight is that the rule “remove all torrents if a category is removed” was only written in the database migration. When developers read the code, they don’t see that behaviour, and we would not have needed to create a new migration if we had removed the categories manually from the torrents that were using it. If that were the case, we only would have needed to change one line to assign the empty category.
Allowing us to make this change required this new database migration:
Which is not very simple in SQLite. The code change would have been easier.
Conclusions
The main problems I see in using database constraints to enforce business are:
No single responsibility. If you want to change a business rule, you have to change the code and also the database migrations. If you support multiple databases it's even worse becuase you have to apply the change for all of them.
It's hard to test the business rules. You cannot unit test the business rules. You need to use the database.
Handling errors when rules are broken is harder. In fact, we have a related issue for that: #253.
I’m not saying we should not use “UNIQUE” and “DELETE ON CASCADE”, but we should always ask if the rules can change and if we need the database to keep those invariants or the best place to put them.
Extra considerations
Although I prefer to put the business rules in the code, I'm not blaming databases; they are powerful tools, and we need them in some cases, not only to persist data. For example, when you define DDD aggregates, you need both the aggregate design and the database transactions. Many people think that some race conditions leading to breaking business invariants can be solved only with database transactions, and transactional consistency requires designing the DDD aggregates in a proper way. So, in-memory objects and database features are both needed. See DDD in PHP book for more information about this.
Related Issues
Move logic to avoid duplicate categories from database to services
Do not remove torrents when their categories are removed and allow torrents without categories
Quotes
“Many application developers, particularly strong OO developers like myself, tend to treat relational databases as a storage mechanism that is best hidden away.”
Martin Fowler
“My philosophy is that most of the time you should focus on writing maintainable code.”
Martin Fowler
“For any long-lived enterprise application, you can be sure of one thing - it's going to change a lot. As a result you have to ensure that the system is organized in such a way that's easy to change. Modifiability is probably the main reason why people put business logic in memory.”
Martin Fowler
“Some people still need portability, such as people who provide products that can be installed and interfaced with multiple databases. In this case there is a stronger argument against putting logic into SQL since you have to be so careful about which parts of SQL you can safely use.”
Martin Fowler
Links
Author: Udi Dahan
Title: Don’t Delete
Link: https://udidahan.com/2009/09/01/dont-delete-just-dont/
Author: Vaughn Vernon
Title: The Ideal Domain-Driven Design Aggregate Store?
Link: https://kalele.io/the-ideal-domain-driven-design-aggregate-store/
Author: Martin Fowler
Title: Domain Login and SQL
Link: https://martinfowler.com/articles/dblogic.html
Author: Joel Coehoorn
Title: When/Why to use Cascading in SQL Server?
Link: https://stackoverflow.com/q/59297/3012842
Feedback
Please let me know if you know more articles, videos or resources about this topic. I am interested in knowing what other people think about using or not only stored procedures but also basic constraints or triggers like “UNIQUE” and “DELETE ON CASCADE”.
cc @da2ce7 @mario-nt @WarmBeer @grmbyrn
Beta Was this translation helpful? Give feedback.
All reactions