Wednesday, August 26, 2009

Execute sp_msforeachdb by excluding the system databases

For executing a query against each database, we can use either dynamic SQL (search my other posts for info on that) or use undocumented stored procedure sp_msforeachdb. It is particularly helpful when we want to do this against multiple databases.

The other challenge in using this is excluding the system databases if we want the query to be run just against the user databases. Use the below sql statements to accomplish that. Consider executing a sql statement of sp_addrole 'rolename' against each database on just the user databases.

EXEC master..sp_MSForeachdb
'USE [?]
IF DB_ID(''?'') > 4
exec sp_addrole ''rolename'''

Similar commands like above can be used to force an update statistics on all the tables in a given database:

EXEC sp_MSForEachTable  "update statistics ? with fullscan"

Tuesday, August 25, 2009

Threshold of Performance counter values to be considered as ideal

Performance monitor has a set of counters that can be monitored over a given period of time to see if the Windows server that is hosting the Sql Server is in good health.

A general question for a SQL Server DBA while monitoring counters in performance monitor is 'What is the threshold for each of the counters that can be considered as alarm bells for further investigation?'

The below table summarizes the counters and their ideal values. It was given to me by a Microsoft Engineer who works at our client place. All the values have been put together by his team at Microsoft.

The below should be a very good starting point for a DBA to either zoom in or rule out OS as the culprit.

PerformanceObject

Counter

Observed value

Threshold

Memory

Available MBytes

<10mb>

Memory

Cache Bytes

Memory

Commit Limit

Memory

Committed Bytes

> Physical Bytes is bad

Memory

Free System Page Table Entries

<5,000>very bad; <7,000>

Memory

Page Faults/sec

<>

Memory

Pages/sec

>2500/s bad for modern servers; less for older servers

Memory

Pool NonPaged Bytes

32-bit: 100MB-260MB; 64-bit: 1GB-128GB (depends on total physical RAM; is it close to being consumed)

Memory

Pool Paged Bytes

32-bit: 120MB-480MB; 64-bit: 1GB-128GB (depends on total physical RAM; is it close to being consumed)

Memory

System Cache Resident Bytes

Network Interface

Bytes total/sec

*

look for descending numbers to indicate network problems (kk)

Paging File

% Usage

*

>70% is a cause for concern

Paging File

% Usage Peak

*

PhysicalDisk/LogicalDisk

% Disk Time

*

>85% (consider spindle count)

PhysicalDisk/LogicalDisk

% Free Space

*

10-30%

PhysicalDisk/LogicalDisk

% Idle Time

*

PhysicalDisk/LogicalDisk

Avg. Disk Bytes/Read

*

PhysicalDisk/LogicalDisk

Avg. Disk Bytes/Transfer

*

Avg. I/O size; theoretical ~90-100MB/s sequential; reality significantly less

PhysicalDisk/LogicalDisk

Avg. Disk Bytes/Write

*

PhysicalDisk/LogicalDisk

Avg. Disk Queue Length

*

1.5 x # of spindles

PhysicalDisk/LogicalDisk

Avg. Disk Read Queue Length

*

1.5 x # of spindles

PhysicalDisk/LogicalDisk

Avg. Disk sec/Read

*

Disk Latency > 50ms POOR

PhysicalDisk/LogicalDisk

Avg. Disk sec/Transfer

*

Disk Latency > 50ms POOR

PhysicalDisk/LogicalDisk

Avg. Disk sec/Write

*

Disk Latency > 50ms POOR

PhysicalDisk/LogicalDisk

Avg. Disk Queue/sec

*

Disk Latency

PhysicalDisk/LogicalDisk

Avg. Disk Write Queue Length

*

1.5 x # of spindles

PhysicalDisk/LogicalDisk

Current Disk Queue Length

*

1.5 x # of spindles

PhysicalDisk/LogicalDisk

Disk Bytes/sec

*

PhysicalDisk/LogicalDisk

Disk Read Bytes/sec

*

PhysicalDisk/LogicalDisk

Disk Write Bytes/sec

*

PhysicalDisk/LogicalDisk

Disk Reads/sec

*

}~150 (for random)

PhysicalDisk/LogicalDisk

Disk Transfers/sec

*

}

PhysicalDisk/LogicalDisk

Disk Writes/sec

*

}

PhysicalDisk/LogicalDisk

Split IO/sec

*

Fewer the better

Process

% Processor Time

_Total,

Process

Page Faults/sec

_Total,

<>

Process

Handle Count

_Total,

Process

IO Data Operations/sec

_Total,

Process

IO Other Operations/sec

_Total,

Process

Private Bytes

_Total,

Process

Virtual Bytes

_Total,

Process

Working Set

_Total,

Processor

% Idle

Processor

% Privileged Time

Processor

% Processor Time

Processor

% User

User+Priv>80% CPU bottleneck

Processor

Interrupts/sec

System

Context Switches/sec

5,000/s x # of CPUs; somewhat flexible if other CPU-related counters are good--including low SQL compiles/s

System

Processor Queue Length

2 x # of CPUs; threads waiting to be executed; >2 means CPU bottleneck

Access Methods

Full Scans/sec

(Index Searches/sec)/(Full Scans/sec) > 1000

Access Methods

Index Searches/sec

(Index Searches/sec)/(Full Scans/sec) > 1000

Access Methods

Mixed Page Allocations/sec

baseline

Access Methods

Page Splits/sec

baseline or <>

Access Methods

Mixed Page Allocations/sec

baseline

Access Methods

Workfiles Created/sec

baseline

Access Methods

Worktables Created/sec

baseline

Access Methods

Worktables from Cache Ratio

baseline

General Statistics

Temp Tables Creation Rate

baseline

General Statistics

Temp Tables for Destruction

baseline

Buffer Manager

Buffer Cache Hit Ratio

98% OK, 99% better, 99.8% desirable

Buffer Manager

Checkpoint pages/sec

baseline

Buffer Manager

Database Pages

<>

Buffer Manager

Lazy writes/sec

baseline

Buffer Manager

Page Life Expectancy

<300s>1,000 or >10,000s much better

Buffer Manager

Procedure Cache Pages

(2000) prefer stable values, otherwise cache thrash may be occurring

Plan Cache

Cache Pages

(2005) prefer stable values, otherwise cache thrash may be occurring

Buffer Manager

Total pages

baseline

Cache Manager

Cache Hit Ratio

<99.9>98 not optimal

Databases

Data File(s) Size (kb)

Databases

Log Flushes/sec

baseline

Databases

Percent Log Used

Databases

Transactions/sec

baseline

General Statistics

Logins/sec & Logouts/sec

baseline; >2/sec suggests app's implementation of connection pooling may be flawed

General Statistics

User Connections

~

Latches

Average Latch Wait Time (ms)

~

Latches

Latch Waits/sec

(Total Latch Wait Time) / (Latch Waits/Sec) <>

Latches

Total Latch Wait Time (ms)

(Total Latch Wait Time) / (Latch Waits/Sec) <>

Locks

Avg. Wait Time (ms)

~

Locks

Lock Timeouts/sec

> 0

Locks

Lock Wait Time (ms)

Locks

Lock Waits/sec

> 0

Locks

Number of Deadlocks/sec

> 0

Memory Manager

Granted Workspace Memory (KB)

baseline

Memory Manager

Memory Grants Pending

>0 is cause for concern

Memory Manager

Memory Grants Outstanding

Memory Manager

Target Server Memory

high or rising value indicates insufficient memory

Memory Manager

Total Server Memory

~=Physical Ram

SQL Statistics

Batch Requests/sec

OLTP workloads: server should support ~2,500/s/core

SQL Statistics

SQL Compilations/sec

if compiles > 10% of Batch Requests, then app may not be caching effectively

SQL Statistics

SQL Recompilations/sec

Recompiles > 10% of Compilations should be investigated

SQL Server:Buffer Manager

Free list stalls/sec

<>

SQL Server:Buffer Manager

Free pages

> 640

SQL Server:Buffer Manager

Lazy Writes/Sec

<>

SQLServer:Buffer Manager

Page lookups/sec

(Page lookups/sec) / (Batch Requests/sec) <>

SQL Server:Buffer Manager

Page reads/sec

<>

SQL Server:Buffer Manager

Page writes/sec

<>


Failed to obtain the resource handle for cluster resource with name or ID 'xxxxxxxxxxxxxxx' error on Availability Group databases

We recently had failures for our Backup Maintenance Plan (performing T-Log backups) and it would fail with the below error message: Message ...