One of the most common issues a DBA encounters is slow server response time. This blog will help DBAs understand this problem and how to overcome it. To start with, a DBA needs to know that the cause can lie in memory, I/O, CPU, or TempDB.

PROBLEM: High CPU Usage / Slow Server Response / Slow-Running Query / Low Throughput
Tools and Metrics:
• System Monitor: In the Processor object, if the % Processor Time counter regularly exceeds 80% per CPU, you have a potential CPU bottleneck.
• Task Manager-->Performance-->CPU Usage
• SQL Server: the sys.dm_os_schedulers DMV. An average runnable_tasks_count greater than 0 indicates a CPU bottleneck.
• SQL Server: the sys.dm_exec_query_stats DMV shows statistics for the currently cached batches and stored procedures, including their CPU usage. Watch the sum of total_worker_time and the execution_count per plan handle.
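As a quick starting point, the scheduler check above can be sketched in T-SQL as follows (a minimal sketch; run it a few times and watch the trend, since a single snapshot can mislead):

```sql
-- Sample runnable_tasks_count on the visible schedulers.
-- A sustained average above 0 suggests tasks are queuing for CPU.
SELECT scheduler_id,
       current_tasks_count,
       runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255;   -- exclude hidden/internal schedulers
```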
Root Cause Analysis:
1. Query Compilation / Recompilation – When you fire a query, the SQL Server optimizer chooses the most cost-effective execution plan. [Re]compilation is CPU intensive, so to conserve system resources SQL Server caches recently used execution plans in memory. If the optimizer does not find a matching execution plan in the cache, it [re]compiles the query. Changes in the query text, indexes, search parameters, or schema; use of WITH RECOMPILE; changes to SET options; mixing DDL and DML in the same batch; and use of temp tables all lead to [re]compilation and, consequently, CPU usage.
The following SQL Server: SQL Statistics performance counters can be used to track compilation.
Batch Requests/sec – Indicates the number of batches processed per second. A high value means good throughput.
SQL Compilations/sec – Indicates the number of queries and procedures compiled per second. Ideally it should be less than 10% of Batch Requests/sec.
SQL Re-Compilations/sec – A high value is a clear indication of frequent query recompilation; ideally it should be close to zero. There is no universal threshold for these counters, so define a baseline based on your workload.
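You can read these same counters from inside SQL Server via sys.dm_os_performance_counters. A minimal sketch (note these counters are cumulative since server start, so sample twice and diff the values, or treat the raw numbers as a rough indicator only):

```sql
-- Pull the compilation-related counters from the SQL Statistics object.
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%SQL Statistics%'
  AND counter_name IN ('Batch Requests/sec',
                       'SQL Compilations/sec',
                       'SQL Re-Compilations/sec');
```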
Now that you know [re]compilation is high, what next? The answer is finding the specific queries involved. You can use a SQL Profiler trace to find them.
SP:Recompile – This event tracks recompilation of stored procedures.
SQL:StmtRecompile – This event tracks recompilation at the statement level, including ad-hoc queries.
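If you prefer not to run a trace, the plan cache itself records how often a plan has been regenerated. A sketch using sys.dm_exec_query_stats (plan_generation_num greater than 1 means the plan was recompiled at least once):

```sql
-- Cached statements whose plans have been regenerated most often.
SELECT TOP 20
       qs.plan_generation_num,
       qs.execution_count,
       st.text AS batch_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE qs.plan_generation_num > 1
ORDER BY qs.plan_generation_num DESC;
```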
2. Inefficient Query Plans – The availability of an execution plan in memory is no guarantee of its efficiency. A query plan can be inefficient for many reasons, including:
• Outdated Statistics
• Missing Indexes
• Inaccurate cardinality estimates
• Distributed Transaction
By using certain Dynamic Management Views (DMVs) you can access statistics and metadata about which queries were run, which query plans were used, what system resources they consumed, and so on.
DMV sys.dm_exec_requests – This is the first place to look, as you can quickly see what is running at present and consuming CPU time. A simple SELECT ordered by cpu_time lists the most expensive requests right now.
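The "what is running right now" check can be sketched like this (the session_id filter is a rough convention for excluding system sessions):

```sql
-- Currently executing requests, most CPU-hungry first.
SELECT r.session_id,
       r.status,
       r.cpu_time,            -- milliseconds of CPU consumed so far
       r.total_elapsed_time,
       r.command
FROM sys.dm_exec_requests AS r
WHERE r.session_id > 50       -- rough filter for user sessions
ORDER BY r.cpu_time DESC;
```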
DMV sys.dm_exec_cached_plans – One row per cached query plan, referenced by plan_handle. You can use this to determine how many times a plan has been used, its memory size, the objects it references, and whether it belongs to an ad-hoc query or a stored procedure.
DMF sys.dm_exec_sql_text – Returns the batch text for a cached plan. All you need to do is call this function and pass the handle as a parameter.
DMV sys.dm_exec_query_stats / sys.dm_exec_procedure_stats – These show queries executed in the past and their related cost. Divide total_worker_time by execution_count to find the average CPU time per plan handle. Problem queries are the ones with the highest CPU consumption.
With a combination of these you can identify the CPU intensive queries, related database/objects, execution count and a lot of other related information.
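Putting the pieces together, a sketch that joins historical statistics to the batch text (total_worker_time is reported in microseconds):

```sql
-- Top cached plans by average CPU per execution, with batch text.
SELECT TOP 10
       qs.total_worker_time / qs.execution_count AS avg_cpu_us,
       qs.execution_count,
       qs.total_worker_time,
       st.text AS batch_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_cpu_us DESC;
```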
3. Intra-Query Parallelism – A query that runs in parallel breaks its work into smaller pieces that are executed concurrently by spawned threads. The CPU must manage the communication between these threads, which can itself slow the query down. Parallelism requires more than one CPU, and one parallel thread can fall behind the others, for example while waiting on a lock for a specific data range. Missing indexes are another common cause of excessive intra-query parallelism.
CXPACKET – You will see this wait type if a query is using parallelism.
MAXDOP – The max degree of parallelism setting controls the number of parallel threads a query can use.
DMV sys.dm_os_tasks – Identifies the number of tasks associated with a session or a process and its execution status.
Execution Plan – If your execution plan contains a parallelism (exchange) operator, the query will be executed by multiple threads on different CPU schedulers.
You can also use the DMVs mentioned in point 2 to find out more about the query plan and query text.
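A sketch of the two parallelism checks above, combining sys.dm_os_tasks with the CXPACKET wait type:

```sql
-- Sessions currently running with more than one task (i.e., in parallel).
SELECT t.session_id,
       COUNT(*) AS task_count
FROM sys.dm_os_tasks AS t
GROUP BY t.session_id
HAVING COUNT(*) > 1
ORDER BY task_count DESC;

-- Tasks currently waiting on the parallelism exchange.
SELECT session_id, wait_type, wait_duration_ms
FROM sys.dm_os_waiting_tasks
WHERE wait_type = 'CXPACKET';
```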
Once you have identified the problem query, you can work on resolving it, depending on the causes discussed above. Some of the techniques you can use are:
• Monitor SET option changes and avoid changing SET options inside stored procedures.
• Don't mix DDL (Data Definition Language) and DML (Data Manipulation Language) statements in the same batch.
• Use the Database Engine Tuning Advisor
• Keep statistics up to date
• Check for missing statistics
• Check cardinality estimates
• Evaluate whether the query can be rewritten more efficiently in T-SQL
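Two of these fixes can be sketched directly in T-SQL. The table and column names below are illustrative only, not from the post:

```sql
-- Keep statistics current on a suspect table (dbo.Orders is hypothetical).
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Cap parallelism and force a fresh plan for one problem query
-- (useful when a stale cached plan or runaway parallelism is the cause).
SELECT OrderID, CustomerID
FROM dbo.Orders
WHERE OrderDate >= '2020-01-01'
OPTION (MAXDOP 1, RECOMPILE);
```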
It can be concluded that, with the above methodology, one can find the exact cause of a CPU spike within SQL Server. We will discuss other resource bottlenecks in upcoming blogs. Feel free to write to me at the ID below if you have any queries or comments: vagarwal@Tricoresolutions.com
Have TriCore provide comprehensive remote managed service solutions for your Microsoft SQL Server environment