Date: 6.11.2024
We have a simple SQL table.
CREATE TABLE person (
name VARCHAR(40),
year INT,
place VARCHAR(20),
number INT
);
Let us have an SQL command with AND
operator (conjunctive SQL query).
SELECT * FROM person
WHERE place IN ('London', 'Neverland') and year >= 1900;
We ask a simple question. Is it equivalent to the following query?
SELECT * FROM person
WHERE place in ('London', 'Neverland')
INTERSECT
SELECT * FROM person
WHERE year >= 1900;
If they are not equivalent, we should simply find a counterexample. It should not be difficult because the INTERSECT
removes duplicates; therefore, if we insert two same rows, the SQL queries return different results. Here is an example of data.
INSERT INTO person VALUES('James Bond', 1953, 'London', 7);
INSERT INTO person VALUES('James Bond', 1953, 'London', 7);
Now let's improve the query using INTERSECT ALL
:
SELECT * FROM person
WHERE place in ('London', 'Neverland')
INTERSECT ALL
SELECT * FROM person
WHERE year >= 1900;
Now it is not possible to find any counterexample. Fine. Lets ask another question. If we have a query
SELECT name FROM person
WHERE place IN ('London', 'Neverland') and year >= 1900;
Is it equivalent to the following SQL query?
SELECT name FROM person
WHERE place in ('London', 'Neverland')
INTERSECT ALL
SELECT name FROM person
WHERE year >= 1900;
If we think about the queries more thoroughly, we can realize that a name
could be duplicated by the INTERSECT ALL
version of the query. Can you pick data where these two queries return different results? It is not straightforward. I even asked chatGPT out of curiosity. He correctly recognized that the SQL queries were probably not equivalent. However, he was unable to provide a counterexample. Therefore, after several prompts, I convinced him these SQL queries are equivalent. Here is the ending of our conversation. :)
Deciding a SQL query equivalenece is a hard problem. Proving that two SQL commands are equivalent is still an open question for many queries. Here is a list of several articles that deal with SQL formalization and SQL query equivalence.
- Benzaken, Véronique, and Evelyne Contejean. "A Coq mechanised formal semantics for realistic SQL queries: formally reconciling SQL and bag relational algebra." In Proceedings of the 8th ACM SIGPLAN International Conference on Certified Programs and Proofs, pp. 249-261. 2019.
- Zhou, Qi, Joy Arulraj, Shamkant Navathe, William Harris, and Dong Xu. "Automated verification of query equivalence using satisfiability modulo theories." Proceedings of the VLDB Endowment 12, no. 11 (2019): 1276-1288.
- Chu, Shumo, Konstantin Weitz, Alvin Cheung, and Dan Suciu. "HoTTSQL: Proving query rewrites with univalent SQL semantics." ACM SIGPLAN Notices 52, no. 6 (2017): 510-524.
- Wang, Shuxian, Sicheng Pan, and Alvin Cheung. "QED: A Powerful Query Equivalence Decider for SQL." Proceedings of the VLDB Endowment 17, no. 11 (2024): 3602-3614.