SQL Optmization theory

Brishen and I have been having ongoing disucussions revolving around query optimization theory. I thought I’d toss this out to the world (or the tiny proportion of the world that sees this site).

Currently, we’re discussing constraints. If a.childID = b.objectID is all that I need to create the right link between tables, would a.childID = b.objectID AND a.childtype = ‘B’ be more efficient? Tests against SQL Server for me show that this appears to be true if a.childtype is indexed, but if it is not indexed, it appears to be about the same speed (a.childID is indexed in both cases). Is there a general rule that can be applied here? Such as: if the constraints apply to indexed columns, it is better to include as many as possible, but if the columns are not indexed, as few as possible?

A further questions concerns using WHERE a IN (option1,option2,option3) vs. a = option1 OR a = option2 OR a=option3. Which is better when? Is there a limit to how large the list used with IN () should be? Personal experience has suggested not really, but the largest I’ve used is not that large.

Furthermore, while they should be equal, is FROM tableX JOIN TableY on x.objectID = y.objectID equal to FROM tableX,TableY WHERE x.objectID = y.objectID, performance-wise?

Every time I leave a largeish query alone for a while, and return to it a few months later, I find little tweaks that I can do to improve it. What I’d really like is some way of learning all the ways of optimizing queries in the first place, even if it is technology specific. Can anyone out there in the world recommend any books about query optimization theory, or a series of experiments comparing the various ways of accomplishing the same things through SQL? If so, I, and Brishen, would be most appreciative.

8 Replies to “SQL Optmization theory”

brishen says:

October 15, 2002 at 11:22 pm

just for those that are wondering where my page is, it’s at littlebluething.com (with the ‘e’) 😉

Reply
brishen says:

October 15, 2002 at 3:22 pm

just for those that are wondering where my page is, it’s at littlebluething.com (with the ‘e’) 😉

Reply
David Grant says:

October 16, 2002 at 2:56 am

I did an IN() with about 30 items in it the other day. Of course if there is a limit, it’s probably more like 1.2 bo-zillion.

Some of this info is pretty good:
http://www.sql-server-performance.com/transact_sql.asp

Reply
David Grant says:

October 15, 2002 at 6:56 pm

I did an IN() with about 30 items in it the other day. Of course if there is a limit, it’s probably more like 1.2 bo-zillion.

Some of this info is pretty good:
http://www.sql-server-performance.com/transact_sql.asp

Reply
Stv. says:

October 16, 2002 at 5:22 pm

Yeah – I’ve had up to 300 before in an in, and the query (which is a fairly complex one beforehand), ran at about 0.8 seconds, which is horribly slow, really.
I’ll definitely check out that link; thanks for that.

Reply
Stv. says:

October 16, 2002 at 9:22 am

Yeah – I’ve had up to 300 before in an in, and the query (which is a fairly complex one beforehand), ran at about 0.8 seconds, which is horribly slow, really.
I’ll definitely check out that link; thanks for that.

Reply
David Grant says:

October 24, 2002 at 9:29 pm

I’m a little late on this, but I have done a tiny bit of research on the “IN () vs. OR OR OR” topic.

x IN (1,2,3) is a bit slower than its OR counterpart: x = 1 OR x = 2 OR x = 3

Test subjects:
SELECT COALESCE(SUM(oy.price * oy.qty),0)
FROM Order_Yearbook oy, Transactions t
WHERE oy.orderID = t.orderID
AND oy.optionID IN (3,5,7,8,9,10)

which a trace showed: on the average 180 for the duration

-and-

SELECT COALESCE(SUM(oy.price * oy.qty),0)
FROM Order_Yearbook oy, Transactions t
WHERE oy.orderID = t.orderID
AND (
oy.optionID = 3
OR oy.optionID = 5
OR oy.optionID = 7
OR oy.optionID = 8
OR oy.optionID = 9
OR oy.optionID = 10
)

trace duration: 150 on average.

This doesn’t mean a whole lot for a single query, but in my current situation this is being run as a subquery in a HUGE stored procedure for reporting, sometimes a thousand times. The report is coming back in about 10 seconds usually, which kinda sucks.

The downside is sending the extra bytes for all the “OR x = N” clauses

I think I’ll change my IN stuff to OR

Thanks for sparking the discussion.

Reply
David Grant says:

October 24, 2002 at 1:29 pm

I’m a little late on this, but I have done a tiny bit of research on the “IN () vs. OR OR OR” topic.

x IN (1,2,3) is a bit slower than its OR counterpart: x = 1 OR x = 2 OR x = 3

Test subjects:
SELECT COALESCE(SUM(oy.price * oy.qty),0)
FROM Order_Yearbook oy, Transactions t
WHERE oy.orderID = t.orderID
AND oy.optionID IN (3,5,7,8,9,10)

which a trace showed: on the average 180 for the duration

-and-

SELECT COALESCE(SUM(oy.price * oy.qty),0)
FROM Order_Yearbook oy, Transactions t
WHERE oy.orderID = t.orderID
AND (
oy.optionID = 3
OR oy.optionID = 5
OR oy.optionID = 7
OR oy.optionID = 8
OR oy.optionID = 9
OR oy.optionID = 10
)

trace duration: 150 on average.

This doesn’t mean a whole lot for a single query, but in my current situation this is being run as a subquery in a HUGE stored procedure for reporting, sometimes a thousand times. The report is coming back in about 10 seconds usually, which kinda sucks.

The downside is sending the extra bytes for all the “OR x = N” clauses

I think I’ll change my IN stuff to OR

Thanks for sparking the discussion.

Reply

SQL Optmization theory

Like this:

Related

8 Replies to “SQL Optmization theory”

Leave a ReplyCancel reply

Share this:

Like this:

Related

8 Replies to “SQL Optmization theory”

Leave a ReplyCancel reply