SQL Optmization theory

Brishen and I have been having ongoing disucussions revolving around query optimization theory. I thought I’d toss this out to the world (or the tiny proportion of the world that sees this site).

Currently, we’re discussing constraints. If a.childID = b.objectID is all that I need to create the right link between tables, would a.childID = b.objectID AND a.childtype = ‘B’ be more efficient? Tests against SQL Server for me show that this appears to be true if a.childtype is indexed, but if it is not indexed, it appears to be about the same speed (a.childID is indexed in both cases). Is there a general rule that can be applied here? Such as: if the constraints apply to indexed columns, it is better to include as many as possible, but if the columns are not indexed, as few as possible?

A further questions concerns using WHERE a IN (option1,option2,option3) vs. a = option1 OR a = option2 OR a=option3. Which is better when? Is there a limit to how large the list used with IN () should be? Personal experience has suggested not really, but the largest I’ve used is not that large.

Furthermore, while they should be equal, is FROM tableX JOIN TableY on x.objectID = y.objectID equal to FROM tableX,TableY WHERE x.objectID = y.objectID, performance-wise?

Every time I leave a largeish query alone for a while, and return to it a few months later, I find little tweaks that I can do to improve it. What I’d really like is some way of learning all the ways of optimizing queries in the first place, even if it is technology specific. Can anyone out there in the world recommend any books about query optimization theory, or a series of experiments comparing the various ways of accomplishing the same things through SQL? If so, I, and Brishen, would be most appreciative.

8 Replies to “SQL Optmization theory”

  1. Yeah – I’ve had up to 300 before in an in, and the query (which is a fairly complex one beforehand), ran at about 0.8 seconds, which is horribly slow, really.
    I’ll definitely check out that link; thanks for that.

  2. Yeah – I’ve had up to 300 before in an in, and the query (which is a fairly complex one beforehand), ran at about 0.8 seconds, which is horribly slow, really.
    I’ll definitely check out that link; thanks for that.

  3. I’m a little late on this, but I have done a tiny bit of research on the “IN () vs. OR OR OR” topic.

    x IN (1,2,3) is a bit slower than its OR counterpart: x = 1 OR x = 2 OR x = 3

    Test subjects:
    SELECT COALESCE(SUM(oy.price * oy.qty),0)
    FROM Order_Yearbook oy, Transactions t
    WHERE oy.orderID = t.orderID
    AND oy.optionID IN (3,5,7,8,9,10)

    which a trace showed: on the average 180 for the duration

    -and-

    SELECT COALESCE(SUM(oy.price * oy.qty),0)
    FROM Order_Yearbook oy, Transactions t
    WHERE oy.orderID = t.orderID
    AND (
    oy.optionID = 3
    OR oy.optionID = 5
    OR oy.optionID = 7
    OR oy.optionID = 8
    OR oy.optionID = 9
    OR oy.optionID = 10
    )

    trace duration: 150 on average.

    This doesn’t mean a whole lot for a single query, but in my current situation this is being run as a subquery in a HUGE stored procedure for reporting, sometimes a thousand times. The report is coming back in about 10 seconds usually, which kinda sucks.

    The downside is sending the extra bytes for all the “OR x = N” clauses

    I think I’ll change my IN stuff to OR

    Thanks for sparking the discussion.

  4. I’m a little late on this, but I have done a tiny bit of research on the “IN () vs. OR OR OR” topic.

    x IN (1,2,3) is a bit slower than its OR counterpart: x = 1 OR x = 2 OR x = 3

    Test subjects:
    SELECT COALESCE(SUM(oy.price * oy.qty),0)
    FROM Order_Yearbook oy, Transactions t
    WHERE oy.orderID = t.orderID
    AND oy.optionID IN (3,5,7,8,9,10)

    which a trace showed: on the average 180 for the duration

    -and-

    SELECT COALESCE(SUM(oy.price * oy.qty),0)
    FROM Order_Yearbook oy, Transactions t
    WHERE oy.orderID = t.orderID
    AND (
    oy.optionID = 3
    OR oy.optionID = 5
    OR oy.optionID = 7
    OR oy.optionID = 8
    OR oy.optionID = 9
    OR oy.optionID = 10
    )

    trace duration: 150 on average.

    This doesn’t mean a whole lot for a single query, but in my current situation this is being run as a subquery in a HUGE stored procedure for reporting, sometimes a thousand times. The report is coming back in about 10 seconds usually, which kinda sucks.

    The downside is sending the extra bytes for all the “OR x = N” clauses

    I think I’ll change my IN stuff to OR

    Thanks for sparking the discussion.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.