sql error: "NOT IN" vs "MINUS"

Can anyone offer any help ?

I've got two queries...

1. SELECT mod_code
FROM SRS.Table1
WHERE mod_code NOT IN
(SELECT mod_code
FROM SRS.Table2);

2. SELECT mod_code
FROM SRS.Table1
MINUS
SELECT mod_code
FROM SRS.Table2;

And upon timing them (two tables are quite large) Query 1 takes ALOT longer than Query 2.

My problem is that I don't understand how "NOT IN" works - MINUS works by comparing the two tables and removing duplicates across the mod_code.SRS.Table1 and mod_code.SRS.Table2

But how does NOT IN work, and why is it alot slower?

thanks for your help!!Here is my answer based on the way Oracle works, other DBMSs may do things differently. I am also assuming there is a unique index on the table2 columns being compared.

With MINUS, a full scan is done on both tables and the results for table2 are removed from the results for table1.

With NOT IN, a full table scan is done on table1. For each table1 row, a lookup is then done in table2. If no row is found in table2, the table1 row is returned - at least, that is what I have found.
The reason the NOT IN is slower concerns the number of reads required to perform the query. Let's suppose the tables have the following characteristics:

TABLE1: 20,000 rows in 1000 blocks
TABLE2: 10,000 rows in 500 blocks

Reads required for minus:
Full scan of TABLE1 = 1000 blocks
+
Full scan of Table2 = 500 blocks
= 1500 reads

Reads required for NOT IN:
Full scan of TABLE1 = 1000 blocks
20,000 lookups in TABLE2 = 20,000 x (depth of index on TABLE2)
= 21,000 at least

i.e. a lot more work is done by the NOT IN query.

Here is my test example:

SQL> create table t1 as select object_id from all_objects;

Table created.

SQL> select count(*) from t1;

COUNT(*)
----
42169

SQL> create table t2 as select object_id from all_objects where rownum < 42000;

Table created.

SQL> alter table t1 add primary key(object_id);

Table altered.

SQL> alter table t2 add primary key(object_id);

Table altered.

SQL> analyze table t1 compute statistics;

Table analyzed.

SQL> analyze table t2 compute statistics;

Table analyzed.

SQL> set timing on
SQL> select count(*) from
2 ( select object_id from t1
3 minus
4 select object_id from t2
5 )
6 /

COUNT(*)
----
171

real: 1072

SQL> select count(*) from
2 ( select object_id from t1
3 where object_id not in
4 ( select object_id from t2
5 )
6 )
7 /

COUNT(*)
----
171

real: 2143

SQL> set timing off
SQL> set autotrace on

SQL> select count(*) from
2 ( select object_id from t1
3 minus
4 select object_id from t2
5 )
6 /

COUNT(*)
----
171

Execution Plan
------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=280 Card=1)
1 0 SORT (AGGREGATE)
2 1 VIEW (Cost=280 Card=84168)
3 2 MINUS
4 3 SORT (UNIQUE) (Cost=140 Card=42169 Bytes=168676)
5 4 TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=42169 By
tes=168676)

6 3 SORT (UNIQUE) (Cost=140 Card=41999 Bytes=167996)
7 6 TABLE ACCESS (FULL) OF 'T2' (Cost=10 Card=41999 By
tes=167996)

Statistics
------------------
0 recursive calls
24 db block gets
136 consistent gets
64 physical reads
0 redo size
380 bytes sent via SQL*Net to client
518 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> select count(*) from
2 ( select object_id from t1
3 where object_id not in
4 ( select object_id from t2
5 )
6 )
7 /

COUNT(*)
----
171

Execution Plan
------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=2109 Bytes=8
436)

4 2 INDEX (UNIQUE SCAN) OF 'SYS_C00128497' (UNIQUE) (Cost=
1 Card=1 Bytes=4)

Statistics
------------------
0 recursive calls
12 db block gets
84406 consistent gets
0 physical reads
0 redo size
405 bytes sent via SQL*Net to client
541 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed|||Andrew,

That was just the answer I was looking for!

Many thanks for your help!

Matt|||Is there a similar command to "minus" in SQL Server? I use "NOT IN" but would prefer a faster command.

Originally posted by andrewst
Here is my answer based on the way Oracle works, other DBMSs may do things differently. I am also assuming there is a unique index on the table2 columns being compared.

With MINUS, a full scan is done on both tables and the results for table2 are removed from the results for table1.

With NOT IN, a full table scan is done on table1. For each table1 row, a lookup is then done in table2. If no row is found in table2, the table1 row is returned - at least, that is what I have found.
The reason the NOT IN is slower concerns the number of reads required to perform the query. Let's suppose the tables have the following characteristics:

TABLE1: 20,000 rows in 1000 blocks
TABLE2: 10,000 rows in 500 blocks

Reads required for minus:
Full scan of TABLE1 = 1000 blocks
+
Full scan of Table2 = 500 blocks
= 1500 reads

Reads required for NOT IN:
Full scan of TABLE1 = 1000 blocks
20,000 lookups in TABLE2 = 20,000 x (depth of index on TABLE2)
= 21,000 at least

i.e. a lot more work is done by the NOT IN query.

Here is my test example:

SQL> create table t1 as select object_id from all_objects;

Table created.

SQL> select count(*) from t1;

COUNT(*)
----
42169

SQL> create table t2 as select object_id from all_objects where rownum < 42000;

Table created.

SQL> alter table t1 add primary key(object_id);

Table altered.

SQL> alter table t2 add primary key(object_id);

Table altered.

SQL> analyze table t1 compute statistics;

Table analyzed.

SQL> analyze table t2 compute statistics;

Table analyzed.

SQL> set timing on
SQL> select count(*) from
2 ( select object_id from t1
3 minus
4 select object_id from t2
5 )
6 /

COUNT(*)
----
171

real: 1072

SQL> select count(*) from
2 ( select object_id from t1
3 where object_id not in
4 ( select object_id from t2
5 )
6 )
7 /

COUNT(*)
----
171

real: 2143

SQL> set timing off
SQL> set autotrace on

SQL> select count(*) from
2 ( select object_id from t1
3 minus
4 select object_id from t2
5 )
6 /

COUNT(*)
----
171

Execution Plan
------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=280 Card=1)
1 0 SORT (AGGREGATE)
2 1 VIEW (Cost=280 Card=84168)
3 2 MINUS
4 3 SORT (UNIQUE) (Cost=140 Card=42169 Bytes=168676)
5 4 TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=42169 By
tes=168676)

6 3 SORT (UNIQUE) (Cost=140 Card=41999 Bytes=167996)
7 6 TABLE ACCESS (FULL) OF 'T2' (Cost=10 Card=41999 By
tes=167996)

Statistics
------------------
0 recursive calls
24 db block gets
136 consistent gets
64 physical reads
0 redo size
380 bytes sent via SQL*Net to client
518 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> select count(*) from
2 ( select object_id from t1
3 where object_id not in
4 ( select object_id from t2
5 )
6 )
7 /

COUNT(*)
----
171

Execution Plan
------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=2109 Bytes=8
436)

4 2 INDEX (UNIQUE SCAN) OF 'SYS_C00128497' (UNIQUE) (Cost=
1 Card=1 Bytes=4)

Statistics
------------------
0 recursive calls
12 db block gets
84406 consistent gets
0 physical reads
0 redo size
405 bytes sent via SQL*Net to client
541 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed|||Originally posted by acg_ray
Is there a similar command to "minus" in SQL Server? I use "NOT IN" but would prefer a faster command.
Amazingly (to me) it appears that SQL Server does not support MINUS, nor INTERSECT - according to the on-line manual here:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_sa-ses_9sfo.asp

The ANSI name for MINUS is EXCEPT, but SQL Server doesn't seem to have that either. I find that strange, because relational databases are all about set processing, and UNION, MINUS/EXCEPT and INTERSECT are operators that work on sets (remember those Venn diagrams at school?)|||Thanks for confirming (unfortuanately) what I already expected. I learned Oracle in school, but have always used SQL Server professionally, and I was hoping I was missing something from SQL Server... but apparently not!

Originally posted by andrewst
Amazingly (to me) it appears that SQL Server does not support MINUS, nor INTERSECT - according to the on-line manual here:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_sa-ses_9sfo.asp

The ANSI name for MINUS is EXCEPT, but SQL Server doesn't seem to have that either. I find that strange, because relational databases are all about set processing, and UNION, MINUS/EXCEPT and INTERSECT are operators that work on sets (remember those Venn diagrams at school?)

Tuesday, March 6, 2012

"NOT IN" vs "MINUS"

No comments:

Post a Comment

sql error

Blog Archive

About Me