24

I was working with a query I wrote today had to change the code from the WHERE clause to use a IN(list of stuff) filter instead of using something like

item_desc = 'item 1'
OR item_desc = 'item 2'
OR item_desc = 'item 3'
OR item_desc = 'item 4'

The above ran for 15 minutes and returned nothing, yet the following gave me my result set in 1.5 minutes

item_desc IN (
'item 1'
,'item 2'
,'item 3'
,'item 4'
)

I did this in SQL and am wondering why the IN(list of items) performed so much faster then the OR statement.

-- EDIT -- SQL Server 2008, I do apologize for not putting this bit of info in in the first place.

Here is the Query in its entirety using the OR statements:

DECLARE @SD DATETIME
DECLARE @ED DATETIME
SET @SD = '2013-06-01';
SET @ED = '2013-06-15';

-- COLUMN SELECTION
SELECT PV.PtNo_Num AS 'VISIT ID'
, PV.Med_Rec_No AS 'MRN'
, PV.vst_start_dtime AS 'ADMIT'
, PV.vst_end_dtime AS 'DISC'
, PV.Days_Stay AS 'LOS'
, PV.pt_type AS 'PT TYPE'
, PV.hosp_svc AS 'HOSP SVC'
, SO.ord_no AS 'ORDER NUMBER'
--, SO.ent_dtime AS 'ORDER ENTRY TIME'
--, DATEDIFF(HOUR,PV.vst_start_dtime,SO.ent_dtime) AS 'ADM TO ENTRY HOURS'
, SO.svc_desc AS 'ORDER DESCRIPTION'
, OSM.ord_sts AS 'ORDER STATUS'
, SOS.prcs_dtime AS 'ORDER STATUS TIME'
, DATEDIFF(DAY,PV.vst_start_dtime,SOS.prcs_dtime) AS 'ADM TO ORD STS IN DAYS'

-- DB(S) USED
FROM smsdss.BMH_PLM_PtAcct_V PV
JOIN smsmir.sr_ord SO
ON PV.PtNo_Num = SO.episode_no
JOIN smsmir.sr_ord_sts_hist SOS
ON SO.ord_no = SOS.ord_no
JOIN smsmir.ord_sts_modf_mstr OSM
ON SOS.hist_sts = OSM.ord_sts_modf_cd

-- FILTER(S)
WHERE PV.Adm_Date BETWEEN @SD AND @ED
AND SO.svc_cd = 'PCO_REMFOLEY'
OR SO.svc_cd = 'PCO_INSRTFOLEY'
OR SO.svc_cd = 'PCO_INSTFOLEY'
OR SO.svc_cd = 'PCO_URIMETER'

AND SO.ord_no NOT IN (
    SELECT SO.ord_no
    FRROM smsdss.BMH_PLM_PtAcct_V PV
    JOIN smsmir.sr_ord SO
    ON PV.PtNo_Num = SO.episode_no
    JOIN smsmir.sr_ord_sts_hist SOS
    ON SO.ord_no = SOS.ord_no
    JOIN smsmir.ord_sts_modf_mstr OSM
    ON SOS.hist_sts = OSM.ord_sts_modf_cd
    WHERE OSM.ord_sts = 'DISCONTINUE'
    AND SO.svc_cd = 'PCO_REMFOLEY'
    OR SO.svc_cd = 'PCO_INSRTFOLEY'
    OR SO.svc_cd = 'PCO_INSTFOLEY'
    OR SO.svc_cd = 'PCO_URIMETER'
)
ORDER BY PV.PtNo_Num, SO.ord_no, SOS.prcs_dtime

Thank you,

MCP_infiltrator
  • 745
  • 2
  • 6
  • 12
  • 11
    Have you looked at the query plan? –  Jul 11 '13 at 22:46
  • 2
    This is VERY implementation specific. Which DBMS are you using? – James Anderson Jul 12 '13 at 06:54
  • I did not look at the query plan, I did not know if this was query specific or if it was a matter of fact, as in this would always work in this fashion. – MCP_infiltrator Jul 12 '13 at 11:42
  • 3
    @MCP_infiltrator So the execution plans won't be equivalent because the logic is not equivalent. When using `OR` like you do in the actual query above, you allow the engine to short circuit. `WHERE A AND B OR C` will evaluate to true even if A AND B are false, if C is true. If you say `WHERE A and B OR C OR D OR E OR F` like you do above, the `AND` can be factored out. The actual equivalent logic would encapsulate the `OR` series above in parenthesis so they are treated as a set: `WHERE A AND (B OR C OR D OR E)`. This is how an `IN` is treated. – JNK Jul 12 '13 at 13:55
  • 5
    Operator precendence in SQL Server specified that `AND` is handled before `OR`, so your query above is equivalent to `WHERE (OSM.ord_sts = 'DISCONTINUE' AND SO.svc_cd = 'PCO_REMFOLEY') OR SO.svc_cd = 'PCO_INSRTFOLEY' OR SO.svc_cd = 'PCO_INSTFOLEY' OR SO.svc_cd = 'PCO_URIMETER'` which means if any of the last 3 conditions are true it will be able to short circuit the rest of the evaluation. – JNK Jul 12 '13 at 13:58
  • I expanded the explanation in my answer. – JNK Jul 12 '13 at 14:25
  • If you expect to get the same result and don't, speed isn't the issue. – JeffO Jul 12 '13 at 17:50

2 Answers2

29

Oleski's answer is incorrect. For SQL Server 2008, an IN list gets refactored to a series of OR statements. It may be different in say MySQL.

I'm fairly certain that if you generated actual execution plans for both your queries they would be identical.

In all likelihood the second query ran faster because you ran it second, and the first query had already pulled all the data pages from the database and paid the IO cost. The second query was able to read all the data from memory and execute a lot faster.

Update

The actual source of the variance is likely that the queries are not equivalent. You have two different OR lists below:

WHERE PV.Adm_Date BETWEEN @SD AND @ED
AND SO.svc_cd = 'PCO_REMFOLEY'
OR SO.svc_cd = 'PCO_INSRTFOLEY'
OR SO.svc_cd = 'PCO_INSTFOLEY'
OR SO.svc_cd = 'PCO_URIMETER'

and later

 WHERE OSM.ord_sts = 'DISCONTINUE'
    AND SO.svc_cd = 'PCO_REMFOLEY'
    OR SO.svc_cd = 'PCO_INSRTFOLEY'
    OR SO.svc_cd = 'PCO_INSTFOLEY'
    OR SO.svc_cd = 'PCO_URIMETER'

In both those WHERE clauses, operator precendence (where AND is handled before OR) means that the actual logic run by the engine is:

WHERE (ConditionA AND ConditionB)
OR ConditionC
OR ConditionD
OR ConditionE

If you replace the OR lists with an IN expression, the logic will be:

WHERE ConditionA
AND (ConditionB OR ConditionC OR ConditionD OR ConditionE)

Which is radically different.

JNK
  • 479
  • 4
  • 9
  • He had answered it before I edited the question, I saw a comment that I left out the DBMS version so I added it in after he answered the question and running the query a second time did not matter, I ran it this morning after everything has been shutdown for 16 hours and it ran in just over a minute again. – MCP_infiltrator Jul 12 '13 at 12:37
  • 2
    @MCP_infiltrator Well that's the problem with making assumptions :) You really should get actual exec plans for both and see if there's a difference, I don't think there will be. – JNK Jul 12 '13 at 12:53
  • I did that for the OR statement and saw that 68% of my cost was scanning a Clustered Index, I will try and get both plans and post them since I have only looked at them not disected and really understood them. I have been teaching myself and using SO to learn SQL over the last really 5 or 6 months, before that never touched it or seen it so a lot is still beyond me but I'm having fun learning it. – MCP_infiltrator Jul 12 '13 at 13:13
  • 4
    Well if you have advanced DB question, you can also ask on [DBA.SE] - full disclosure, I'm a moderator over there, but if it's an advanced SQL or SQL optimization question we have a ton of experts, especially for SQL Server – JNK Jul 12 '13 at 13:18
  • 1
    I just looked at the two execution plans and they are way different. The query with the OR statements takes up 68% of cost in Clustered Index Scan, where the IN statement is 26%, along with what seems to be less execution steps as well. – MCP_infiltrator Jul 12 '13 at 13:21
  • thank you, I will post it over there, its really a curiosity thing now, I have to know lol – MCP_infiltrator Jul 12 '13 at 13:21
  • @MCP_infiltrator The costs are always estimated costs so may not be accurate. I'd be interested to see the execution plans though, can you share them in the body of the question? I'd also be interested to see the actual query text. – JNK Jul 12 '13 at 13:22
  • Sure I can do that, I'll post the query and then how should I go about posting the execution plan? – MCP_infiltrator Jul 12 '13 at 13:23
  • @MCP_infiltrator If you have a screenshot you can just use that I think – JNK Jul 12 '13 at 13:48
  • I'll have to use a couple of them, I'll do that. – MCP_infiltrator Jul 12 '13 at 13:50
  • will need to find another way to do it, they are just to big, they won't make sense all chopped up – MCP_infiltrator Jul 12 '13 at 14:01
  • 3
    @MCP_infiltrator No need, see my comments on your original post at top. `IN` is not equivalent to your `OR`s above because of the other conditions in your `WHERE` clause in the actual query. Basically the queries will return different results. – JNK Jul 12 '13 at 14:02
  • 3
    @MCP_infiltrator There is no need to post identical question at DBA.SE, JNK has answered it (and you'll get similar answers there.) If you do want to move ("migrate") it there though, you can always flag it (your question) mentioning in the comment box what you want. The mods will take care. – ypercubeᵀᴹ Jul 12 '13 at 14:35
  • Thanks @ypercube I'm not going to move it as JNK took care of it, thank you though. – MCP_infiltrator Jul 12 '13 at 14:44
7

The best way to tell is to look at the actual query plan using something like EXPLAIN. This should tell you exactly what the DBMS is doing, and then you can get a much better idea why it's more efficient.

With that said, DBMS systems are really good at doing operations between two tables (like joins). A lot of the optimizer's time is spent on these parts of the queries because they are generally more expensive.

For example, the DBMS could sort that IN list and, using an index on item_desc, filter the results very quickly. You can't do that optimization when you list a bunch of selections like in the first example.

When you use IN, you are making an impromptu table and filtering using these more efficient table combining techniques.

EDIT: I posted this answer before OP mentioned the specific DBMS. This is turns out to NOT be how SQL Server treats this query, but might be valid for other DBMS systems. See JNK's answer for a more specific, accurate answer.

Oleksi
  • 11,874
  • 2
  • 53
  • 54
  • I would imagine the cardinality has a lot to do with it. That `IN` wouldn't be so fast if it was a subselect with 100 records in it, or a thousand. – Robert Harvey Jul 11 '13 at 23:46
  • @RobertHarvey Yes, that's probably true, but I wouldn't expect it to be that much worse either. – Oleksi Jul 11 '13 at 23:47
  • Thanks @Oleksi I did not know that the DBMS would make the IN statement an impromptu list – MCP_infiltrator Jul 12 '13 at 11:44
  • 1
    -1 - In SQL Server the `IN` statement is not converted to a table, it's treated identically to a series of `OR`s. – JNK Jul 12 '13 at 12:09
  • @JNK: Even if Oleksi's theory is wrong, 'EXPLAIN' should give the correct answer as he said. – Katana314 Jul 12 '13 at 13:46
  • 2
    @Katana314 If EXPLAIN were a keyword in SQL Server (which the OP is using) I would agree with you, but it's not so it's not relevant. – JNK Jul 12 '13 at 13:48
  • I updated my answer to reflect this. I wasn't aware it was SQL Server when I answered. – Oleksi Jul 12 '13 at 17:25