Hi,
I'm trying to troubleshoot a slow-running query which is part of a
stored procedure. When I run the query as it is - it takes 53 seconds
to run.
--
Example original query:
SELECT DISTINCT SP.SP_ID
FROM StandardProject SP
INNER JOIN Classes cls ON SP.Cls_ID = cls.Cls_ID
INNER JOIN Groups ON cls.Cls_ID = Groups.Cls_ID
INNER JOIN ClassesGroups ON ClassesGroups.groupid = Groups.groupid
WHERE ClassesGroups.StudentID = 7615
AND ((AccessReport & 2) > 0 OR (AccessReport & 8) > 0)
AND SP.ProjectType = 0
AND Groups.status = 1
AND exists (SELECT SP_ID FROM Schedules WHERE SP_ID = SP.SP_ID)
--
If I removed the most time-consuming inner join tables and set this
part to insert values needed by the rest of the query into a table
variable and ran both queries (INSERT INTO table variable and the
SELECT), the query finishes in 0 seconds.
I've used the "Query Cost (relative to the batch)" in Query Analyzer
counter to analyze the performance of queries in the past (this value
displays when Show Execution Plan is set). In this case I'm
because when I run the original SELECT query and the modified query
(INSERT INTO table variable and then SELECT) - it says the Query Cost
relative to the batch for the first query is 35%, the INSERT = 64%, the
SELECT = 1%.
Why does the first query COST only 35% when it takes the longest to
finish? (53 seconds as opposed to under 1 second)? Does this mean that
the first query will perform better under load?
--
According to SQL Server Performance.com:
"Sometimes, it is valuable to compare two more queries at the same
time, checking to see which one is the better performer. This is
especially true if you have a query that you have rewritten two or more
different ways, and you want to find out which variation of the query
is the most efficient.
While you can analyze each query, one at a time in Query Analyzer, one
trick you should consider trying is to place all of the queries you
want to compare in Query Analyzer, turn on "Show Execution Plan," then
run all the queries at once. When you do this, the "Execution Plan"
window will divide into a separate window for each query ran, allowing
you to compare their execution plans much easier. In addition, in each
execution plan window, you will see a "Query Cost (relative to the
batch)" number. What this number is telling you is how long each query
took as a percentage of the total amount of time it took all of the
queries to run. The query with the lowest number is the query that
performed the quickest, and so on.
As you can imagine, this information can make it much easier to compare
different queries."
--
So what else should I be taking into account? What am I missing here?
Thank you,
Smitha
SQL Server DBAsmithabreddy@.gmail.com,
When comparing two or more queries, it is fair to:
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
before each one is executed. (DO NOT DO THIS ON A PRODUCTION SERVER)
AMB
"smithabreddy@.gmail.com" wrote:
> Hi,
> I'm trying to troubleshoot a slow-running query which is part of a
> stored procedure. When I run the query as it is - it takes 53 seconds
> to run.
> --
> Example original query:
> SELECT DISTINCT SP.SP_ID
> FROM StandardProject SP
> INNER JOIN Classes cls ON SP.Cls_ID = cls.Cls_ID
> INNER JOIN Groups ON cls.Cls_ID = Groups.Cls_ID
> INNER JOIN ClassesGroups ON ClassesGroups.groupid = Groups.groupid
> WHERE ClassesGroups.StudentID = 7615
> AND ((AccessReport & 2) > 0 OR (AccessReport & 8) > 0)
> AND SP.ProjectType = 0
> AND Groups.status = 1
> AND exists (SELECT SP_ID FROM Schedules WHERE SP_ID = SP.SP_ID)
> --
> If I removed the most time-consuming inner join tables and set this
> part to insert values needed by the rest of the query into a table
> variable and ran both queries (INSERT INTO table variable and the
> SELECT), the query finishes in 0 seconds.
> I've used the "Query Cost (relative to the batch)" in Query Analyzer
> counter to analyze the performance of queries in the past (this value
> displays when Show Execution Plan is set). In this case I'm
> because when I run the original SELECT query and the modified query
> (INSERT INTO table variable and then SELECT) - it says the Query Cost
> relative to the batch for the first query is 35%, the INSERT = 64%, the
> SELECT = 1%.
> Why does the first query COST only 35% when it takes the longest to
> finish? (53 seconds as opposed to under 1 second)? Does this mean that
> the first query will perform better under load?
> --
> According to SQL Server Performance.com:
> "Sometimes, it is valuable to compare two more queries at the same
> time, checking to see which one is the better performer. This is
> especially true if you have a query that you have rewritten two or more
> different ways, and you want to find out which variation of the query
> is the most efficient.
> While you can analyze each query, one at a time in Query Analyzer, one
> trick you should consider trying is to place all of the queries you
> want to compare in Query Analyzer, turn on "Show Execution Plan," then
> run all the queries at once. When you do this, the "Execution Plan"
> window will divide into a separate window for each query ran, allowing
> you to compare their execution plans much easier. In addition, in each
> execution plan window, you will see a "Query Cost (relative to the
> batch)" number. What this number is telling you is how long each query
> took as a percentage of the total amount of time it took all of the
> queries to run. The query with the lowest number is the query that
> performed the quickest, and so on.
> As you can imagine, this information can make it much easier to compare
> different queries."
> --
> So what else should I be taking into account? What am I missing here?
> Thank you,
> Smitha
> SQL Server DBA
>|||I performed the compare after running both DBCC FREEPROCCACHE and DBCC
DROPCLEANBUFFERS.|||But did you run this between each query, or before the entire batch?
<smithabreddy@.gmail.com> wrote in message
news:1147268627.698501.322300@.v46g2000cwv.googlegroups.com...
> I performed the compare after running both DBCC FREEPROCCACHE and DBCC
> DROPCLEANBUFFERS.
>|||To capture relative cost information (compare the cost of both
approaches), I ran DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS first
and then ran both batches. This is where I get cost information for
the 53 seconds duration query = 34% and cost for the under 1 second
query = 76%.
I then ran DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS, ran the
original query. Took 1.05 minutes to run the first time and 54 seconds
to run the second time.
Ran DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS again, then ran the
modified query (with INSERT INTO table variable and then SELECT). Took
6 seconds to run the first time and 0 seconds to run the second time.
Basically -
I'd been relying on this measure (relative cost for each query in a
batch) somewhat to determine which query is more efficient. And when
both queries return values in under 1 second - I figured it would be
better to go with the lower cost option. But what happens in a
situation such as this where the lower cost query take SO much longer
to run?
I review the output from SET STATISTICS IO ON, Query execution time and
Execution plan to determine if a query has the most efficient
construct. This relative cost information seems counter-intuitive to
me...what rules of thumb should I use to guage query efficiency?|||This is puzzling, and I don't really know the answer, but off the top of my
head I can think of a possibility. This is more a guess than anything, so
take it as such.
I believe the cost considers IO, Memory, and CPU, among other things. I
don't think run time is necessarily as big an issue as the other resources.
It is possible that the faster query is using much more of one of these
resources, which results in the higher cost calculation, even though it
finishes in less time.
i.e. (these numbers are completely bogus, included only to illustrate the
idea):
Query1 uses a lot of IO (5,000) and CPU (5,000), but virtually no memory
(10), and finishes in 60 seconds.
Query2 uses minimal IO (10) and CPU (10) but a lot of memory (10,000).
Because it is using memory without the more time consuming IO or CPU usage,
everything is much faster.
Now, the example above is highly flawed and over simplified, but hopefully
it illustrates my point anyway.
The faster query may actually use more resources over all, and although it
runs faster in test, it may perform slower under peak usage (depending on
your available resources).
I have seen cases where a query would run in 30 seconds, but use 100 percent
of the CPU and make all other transactions grind to a halt, where a slight
change to the query would make it run in 60 seconds on 5 percent CPU and
everything else continued to run without issue.
I guess the point is that run time by itself is not always the best
indication of efficiency, and there is not a simple answer.
<smithabreddy@.gmail.com> wrote in message
news:1147280303.752842.58360@.v46g2000cwv.googlegroups.com...
> To capture relative cost information (compare the cost of both
> approaches), I ran DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS first
> and then ran both batches. This is where I get cost information for
> the 53 seconds duration query = 34% and cost for the under 1 second
> query = 76%.
> I then ran DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS, ran the
> original query. Took 1.05 minutes to run the first time and 54 seconds
> to run the second time.
> Ran DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS again, then ran the
> modified query (with INSERT INTO table variable and then SELECT). Took
> 6 seconds to run the first time and 0 seconds to run the second time.
> Basically -
> I'd been relying on this measure (relative cost for each query in a
> batch) somewhat to determine which query is more efficient. And when
> both queries return values in under 1 second - I figured it would be
> better to go with the lower cost option. But what happens in a
> situation such as this where the lower cost query take SO much longer
> to run?
> I review the output from SET STATISTICS IO ON, Query execution time and
> Execution plan to determine if a query has the most efficient
> construct. This relative cost information seems counter-intuitive to
> me...what rules of thumb should I use to guage query efficiency?
>|||Makes sense...although I've found quite often that something making
sense to me doesn't amount to a hill of beans when dealing with the
OPTIMIZER... :)
I will check this out. Thanks very much.|||Please let me know what you find. I am curious to see if my theory holds
any water.
<smithabreddy@.gmail.com> wrote in message
news:1147289299.238269.87340@.g10g2000cwb.googlegroups.com...
> Makes sense...although I've found quite often that something making
> sense to me doesn't amount to a hill of beans when dealing with the
> OPTIMIZER... :)
> I will check this out. Thanks very much.
>|||So the optimizer has estimated that the first query will use 34% of the
total elapsed time and the second query the remaining 76%, but in
reality the first query is taking 98% of the time and the second query
2%.
There could be a couple of things going on:
a) the table statistics might be out of date. This could easily explain
the incorrect relative cost, and cause the optimizer to choose a
suboptimal plan. When in doubt run UPDATE STATISTICS on all tables in
the query, preferably WITH FULLSCAN.
b) the data distribution of one the the table columns could be very
a-typical. If the optimizer has no usuable statistics for that column
and the column is used in the predicates, then this could result in a
poor query plan. This could be the case with column AccessReport when
evaluating the expression ((AccessReport & 2) > 0 OR (AccessReport & 8)
> 0). If you know how to read query plans, you might be able to determine if that is
the case. Also, the IO statistics would show a marked difference between the fast a
nd slow query with respect to that table.
c) maybe the original query runs into a bug/flaw in the optimizer. For
example the inappropriate use of parallellism (when in doubt, add OPTION
(MAXDOP 1) to the query).
By the way, you can rewrite
((AccessReport & 2) > 0 OR (AccessReport & 8) > 0)
as
AccessReport & (8+2) > 0
which might run a tiny bit faster.
Also, if the column SP.ProjectType or Groups.status is of data type bit,
then make sure you explicitely convert the literal to a bit. For
example, if Groups.status is a bit, then write
AND Groups.status = CAST(1 as bit)
And finally, if column StandardProject.SP_ID is unique, then you can
eliminate the DISTINCT keyword by replacing the joins that could cause
duplicates by pushing them to the EXISTS clause.
HTH,
Gert-Jan
smithabreddy@.gmail.com wrote:
> To capture relative cost information (compare the cost of both
> approaches), I ran DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS first
> and then ran both batches. This is where I get cost information for
> the 53 seconds duration query = 34% and cost for the under 1 second
> query = 76%.
> I then ran DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS, ran the
> original query. Took 1.05 minutes to run the first time and 54 seconds
> to run the second time.
> Ran DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS again, then ran the
> modified query (with INSERT INTO table variable and then SELECT). Took
> 6 seconds to run the first time and 0 seconds to run the second time.
> Basically -
> I'd been relying on this measure (relative cost for each query in a
> batch) somewhat to determine which query is more efficient. And when
> both queries return values in under 1 second - I figured it would be
> better to go with the lower cost option. But what happens in a
> situation such as this where the lower cost query take SO much longer
> to run?
> I review the output from SET STATISTICS IO ON, Query execution time and
> Execution plan to determine if a query has the most efficient
> construct. This relative cost information seems counter-intuitive to
> me...what rules of thumb should I use to guage query efficiency?|||a) the table statistics might be out of date. This could easily explain
the incorrect relative cost, and cause the optimizer to choose a
suboptimal plan. When in doubt run UPDATE STATISTICS on all tables in
the query, preferably WITH FULLSCAN.
--> No difference after updating stats with full scan for all tables
involved. Issued DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS before
executing the queries.
b) Wouldn't the stats below suggest that the table variable option is
more efficient?
Fast Query: IO Stats for Groups:
Table 'Groups'. Scan count 0, logical reads 111026, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.
Slow Query: IO Stats for Groups:
Table 'Groups'. Scan count 0, logical reads 27419652, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.
When I looked at the costs associated with the various nodes in the
execution plan - the one cost that is consistently higher than the
others for the fast query is: Estimated Operator Cost which then causes
the Estimated Subtree Cost for that node to be higher. According to
SQL Server 2005 BOL:
Estimated Operator Cost = The cost to the query optimizer for executing
this operation. The cost of this operation as a percentage of the total
cost of the query is displayed in parentheses. Because the query engine
selects the most efficient operation to perform the query or execute
the statement, this value should be as low as possible.
Estimated Subtree Cost = The total cost to the query optimizer for
executing this operation and all operations preceding it in the same
subtree.
Does this mean that this is just an estimation or that this is a real
cost which will affect the SP each time it executes?
--
Thanks again.
No comments:
Post a Comment