When tuning SQL queries, sometimes I see these patterns, and I can generally speed them up with these solutions:
“EXISTS” beats “IN”.
If you’re writing a query like:
SELECT *
FROM foobar
WHERE foobar.bar_id IN
( SELECT bar.id
FROM bar)
This query will often be faster (and I’ve never seen it slower)
SELECT *
FROM foobar
WHERE EXISTS
( SELECT *
FROM bar
WHERE foobar.bar_id = bar.id)
No DISTINCT in subqueries
“Distinct” causes two extra steps (sort and filter). If you’re writing a query like this:
SELECT *
FROM foobar
WHERE EXISTS
( SELECT DISTINCT *
FROM bar
WHERE foobar.bar_id = bar.id)
The distinct is unnecessary and may not be “optimized away” by the database engine.
Avoid Subqueries Where Possible
Relational engines are built to do joins. So you an often (but not always) rewrite an “in” subquery as a join. Especially if the subquery is on the “one” side of a “one-to-many” relationship with the outer query.
Only Correlate 1 Level Deep
If you’re writing code like this:
SELECT *
FROM foobar
WHERE EXISTS
( SELECT *
FROM bar
WHERE foobar.bar_id = bar.id
AND EXISTS
( SELECT *
FROM snafu
WHERE bar.id = snafu.bar_id)
)
The interior queries can be flattened into a join, which will be quicker.
SELECT *
FROM foobar
WHERE EXISTS
( SELECT *
FROM bar
JOIN snafu ON bar.id = snafu.bar_id
WHERE foobar.bar_id = bar.id
)
Avoid Applying Functions to Columns in WHERE Clauses
This query:
SELECT *
FROM foobar
WHERE LEFT(foobar.name, 5) = 'Chris'
May be slower (and will never be faster) than:
SELECT *
FROM foobar
WHERE foobar.name LIKE 'Chris%';
(This is because in the first query, an index on foobar.name may not be usable, since the engine has to change the value of the column to do the comparison. It’s no longer a “search argument” or “SARG”)
Avoid “select from ( select)”
I’ve never seen this be fast.
SELECT *
FROM
( SELECT *
FROM foo
)
You can take whatever you have been doing in the two statements and flatten them into a traditional query using joins and aggregates.
And lastly…
Complicated “WHERE” Clauses Are Slow
You can’t always get around it – sometimes you’re asking a complicated question. But the more complicated the question, the more likely the database server is going to give up and just brute-force the solution. Sometimes you can do better by breaking the query into a series of simpler queries. Options for doing this are:
- Using a temporary table.
- Using UNION (or UNION ALL) instead of OR