The following are the counters that Microsoft Service Support engineers rely on
for monitoring. This data is very important for server monitoring, and you can
collect it with Operations Manager or the Windows Performance console.
LogicalDisk\% Free Space
This measures the percentage of free space on the
selected logical disk drive. Take note if this falls below 15 percent, as you
risk running out of free space for the OS to store critical files. One obvious
solution here is to add more disk space.
PhysicalDisk\% Idle Time
This measures the percentage of time the disk was
idle during the sample interval. If this counter falls below 20 percent, the
disk system is saturated. You may consider replacing the current disk system
with a faster disk system.
PhysicalDisk\Avg. Disk Sec/Read
This measures the average time, in seconds, to
read data from the disk. If the number is larger than 25 milliseconds (ms),
that means the disk system is experiencing latency when reading from the disk.
For mission-critical servers hosting SQL Server® and
Exchange Server, the acceptable threshold is much lower, approximately 10 ms.
The most logical solution here is to replace the current disk system with a
faster disk system.
PhysicalDisk\Avg. Disk Sec/Write
This measures the average time, in seconds, it
takes to write data to the disk. If the number is larger than 25 ms, the disk
system experiences latency when writing to the disk. For mission-critical
servers hosting SQL Server and Exchange Server, the acceptable threshold is
much lower, approximately 10 ms. The likely solution here is to replace the
disk system with a faster disk system.
PhysicalDisk\Avg. Disk Queue Length
This indicates how many I/O operations are
waiting for the hard drive to become available. If the value here is larger
than two times the number of spindles, the disk itself may be
the bottleneck.
Memory\Cache Bytes
This indicates the amount of memory being used for the
file system cache. There may be a disk bottleneck if this value is greater than
300MB.
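The disk thresholds above can be collected into one small check. The following is only a sketch in Python, not a Microsoft tool: the `DiskSample` container and the `disk_warnings` function are hypothetical names of my own, the sample values are assumed to have been read from the counters already, and "300MB" is assumed to mean 300 * 2**20 bytes.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: "300MB" is read as 300 * 2**20 bytes


@dataclass
class DiskSample:
    """One reading of the disk-related counters (hypothetical container)."""
    free_space_pct: float    # LogicalDisk\% Free Space
    idle_time_pct: float     # PhysicalDisk\% Idle Time
    sec_per_read: float      # PhysicalDisk\Avg. Disk Sec/Read, in seconds
    sec_per_write: float     # PhysicalDisk\Avg. Disk Sec/Write, in seconds
    avg_queue_length: float  # PhysicalDisk\Avg. Disk Queue Length
    cache_bytes: int         # Memory\Cache Bytes


def disk_warnings(sample, spindles, mission_critical=False):
    """Return a warning for each threshold from the text that is crossed."""
    # 25 ms in general, about 10 ms for SQL Server or Exchange Server hosts.
    latency_limit = 0.010 if mission_critical else 0.025
    warnings = []
    if sample.free_space_pct < 15:
        warnings.append("free space below 15 percent")
    if sample.idle_time_pct < 20:
        warnings.append("idle time below 20 percent")
    if sample.sec_per_read > latency_limit:
        warnings.append("read latency above threshold")
    if sample.sec_per_write > latency_limit:
        warnings.append("write latency above threshold")
    if sample.avg_queue_length > 2 * spindles:
        warnings.append("queue length above two times the spindle count")
    if sample.cache_bytes > 300 * MB:
        warnings.append("cache bytes above 300MB")
    return warnings
```

Note that the latency counters report seconds, so the 25 ms and 10 ms limits are written as 0.025 and 0.010.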
Memory Bottleneck
A memory shortage is typically due to insufficient
RAM, a memory leak, or a memory switch set in the boot.ini file. Before I get
into the memory counters, I should discuss the /3GB switch.
More memory reduces disk I/O activity and, in turn,
improves application performance. The /3GB switch was introduced in Windows NT® as a way to provide more memory for the user-mode programs.
Windows uses a virtual address space of 4GB
(independent of how much physical RAM the system has). By default, the lower
2GB are reserved for user-mode programs and the upper 2GB are reserved for
kernel-mode programs. With the /3GB switch, 3GB are given to user-mode
processes. This, of course, comes at the expense of the kernel memory, which
will have only 1GB of virtual address space. This can cause problems because
Pool Non-Paged Bytes, Pool Paged Bytes, Free System Page Table Entries, and desktop
heap are all squeezed together within this 1GB space. Therefore, the /3GB
switch should only be used after thorough testing has been done in your
environment.
Keep the /3GB switch in mind if you suspect you are
experiencing a memory-related bottleneck. If the /3GB switch is not the cause
of the problems, you can use these counters for diagnosing a potential memory
bottleneck.
Memory\% Committed Bytes in Use
This measures the ratio of Committed
Bytes to the Commit Limit—in other words, the amount of virtual memory in use.
This indicates insufficient memory if the number is greater than 80 percent.
The obvious solution for this is to add more memory.
Memory\Available MBytes
This measures the amount of physical memory, in
megabytes, available for running processes. If this value is less than 5
percent of the total physical RAM, that means there is insufficient memory, and
that can increase paging activity. To resolve this problem, you should simply
add more memory.
Memory\Free System Page Table Entries
This indicates the number of page
table entries not currently in use by the system. If the number is less than
5,000, there may well be a memory leak.
Memory\Pool Non-Paged Bytes
This measures the size, in bytes, of the
non-paged pool. This is an area of system memory for objects that cannot be
written to disk but instead must remain in physical memory as long as they are
allocated. There is a possible memory leak if the value is greater than 175MB
(or 100MB with the /3GB switch). Typically, Event ID 2019 is recorded in the
system event log.
Memory\Pool Paged Bytes
This measures the size, in bytes, of the paged
pool. This is an area of system memory used for objects that can be written to
disk when they are not being used. There may be a memory leak if this value is
greater than 250MB (or 170MB with the /3GB switch). Typically, Event ID 2020 is
recorded in the system event log.
Memory\Pages/sec
This measures the rate at which pages are read from or
written to disk to resolve hard page faults. A value greater than 1,000
indicates excessive paging, which may point to a memory leak.
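As a rough illustration of how the memory thresholds fit together, including the lower pool limits that apply with the /3GB switch, here is a minimal Python sketch. The function name `memory_warnings` and its parameters are my own and purely illustrative; the values are assumed to come from the counters listed above, and MB is assumed to mean 2**20 bytes.

```python
MB = 1024 * 1024  # assumption: MB means 2**20 bytes


def memory_warnings(committed_in_use_pct, available_mbytes, total_ram_mbytes,
                    free_ptes, nonpaged_pool_bytes, paged_pool_bytes,
                    pages_per_sec, three_gb=False):
    """Return a warning for each memory threshold from the text that is crossed."""
    # The kernel has only 1GB of address space with /3GB, so the pool
    # limits given in the text are lower in that case.
    nonpaged_limit = (100 if three_gb else 175) * MB
    paged_limit = (170 if three_gb else 250) * MB

    warnings = []
    if committed_in_use_pct > 80:
        warnings.append("committed bytes in use above 80 percent")
    if available_mbytes < 0.05 * total_ram_mbytes:
        warnings.append("available memory below 5 percent of physical RAM")
    if free_ptes < 5000:
        warnings.append("free system page table entries below 5,000")
    if nonpaged_pool_bytes > nonpaged_limit:
        warnings.append("non-paged pool above limit")
    if paged_pool_bytes > paged_limit:
        warnings.append("paged pool above limit")
    if pages_per_sec > 1000:
        warnings.append("pages per second above 1,000")
    return warnings
```

The same non-paged pool reading of 120MB passes on a default system but is flagged when `three_gb=True`, which is one reason the switch deserves testing before it goes into production.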
Processor Bottleneck
An overwhelmed processor can be due to the processor
itself not offering enough power or it can be due to an inefficient
application. You must double-check whether the processor spends a lot of time
in paging as a result of insufficient physical memory. When investigating a
potential processor bottleneck, the Microsoft Service Support engineers use the
following counters.
Processor\% Processor Time
This measures the percentage of elapsed time the
processor spends executing a non-idle thread. If the percentage is greater than
85 percent, the processor is overwhelmed and the server may require a faster
processor.
Processor\% User Time
This measures the percentage of elapsed time the processor
spends in user mode. If this value is high, the server is busy with the
application. One possible solution here is to optimize the application that is
using up the processor resources.
Processor\% Interrupt Time
This measures the time the processor spends
receiving and servicing hardware interrupts during specific sample
intervals. This counter indicates a possible hardware issue if the value is
greater than 15 percent.
System\Processor Queue Length
This indicates the number of threads in the
processor queue. The server doesn’t have enough processor power if the value is
more than two times the number of CPUs for an extended period of time.
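The processor checks can be sketched the same way. The text does not say how long "an extended period of time" is, so this Python sketch assumes it means every sample in whatever window you pass in; the names `sustained_above` and `processor_warnings` are hypothetical.

```python
def sustained_above(samples, limit):
    """True if there is at least one sample and every sample exceeds limit."""
    return len(samples) > 0 and all(value > limit for value in samples)


def processor_warnings(processor_time_pct, interrupt_time_pct,
                       queue_length_samples, cpu_count):
    """Return a warning for each processor threshold from the text that is crossed."""
    warnings = []
    if processor_time_pct > 85:
        warnings.append("processor time above 85 percent")
    if interrupt_time_pct > 15:
        warnings.append("interrupt time above 15 percent")
    # Assumption: "extended period" = the whole window of queue samples.
    if sustained_above(queue_length_samples, 2 * cpu_count):
        warnings.append("processor queue above two times the CPU count")
    return warnings
```

Requiring the whole window to exceed the limit keeps a single spike in the queue from raising an alarm.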
Network Bottleneck
A network bottleneck, of course, affects the
server’s ability to send and receive data across the network. It can be an
issue with the network card on the server, or perhaps the network is saturated
and needs to be segmented. You can use the following counters to diagnose
potential network bottlenecks.
Network Interface\Bytes Total/Sec
This measures the rate at which bytes are sent and
received over each network adapter, including framing characters. The network
is saturated if you discover that more than 70 percent of the interface is
consumed. For a 100-Mbps NIC, 70 percent of the interface is about 8.75MB/sec
(100Mbps = 100000kbps = 12.5MB/sec; 12.5MB/sec * 70 percent = 8.75MB/sec). In a
situation like this, you may want to add a faster network card or segment the
network.
Network Interface\Output Queue Length
This measures the length of the
output packet queue, in packets. There is network saturation if the value is
more than 2. You can address this problem by adding a faster network card or
segmenting the network.
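The 70 percent figure is easy to recompute for other link speeds. This short Python sketch works through the arithmetic; the function names are illustrative only, and the link speed is assumed to be given in decimal megabits per second (1 Mbps = 1,000,000 bits per second).

```python
def saturation_threshold(link_mbps, percent=70):
    """Bytes per second at which the link counts as saturated."""
    link_bytes_per_sec = link_mbps * 1_000_000 / 8  # bits to bytes
    return link_bytes_per_sec * percent / 100


def network_warnings(bytes_total_per_sec, output_queue_length, link_mbps):
    """Return a warning for each network threshold from the text that is crossed."""
    warnings = []
    if bytes_total_per_sec > saturation_threshold(link_mbps):
        warnings.append("more than 70 percent of the interface is consumed")
    if output_queue_length > 2:
        warnings.append("output queue length above 2")
    return warnings
```

For a 100-Mbps NIC, `saturation_threshold(100)` works out to 8,750,000 bytes per second, that is, 12.5MB/sec times 70 percent.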
Process Bottleneck
Server performance will be significantly affected if
you have a misbehaving process or non-optimized processes. Thread and handle
leaks will eventually bring down a server, and excessive processor usage will
bring a server to a crawl. The following counters are indispensable when
diagnosing process-related bottlenecks.
Process\Handle Count
This measures the total number of handles that are
currently open by a process. This counter indicates a possible handle leak if
the number is greater than 10,000.
Process\Thread Count
This measures the number of threads currently active in a
process. There may be a thread leak if the difference between the minimum and
maximum number of threads is more than 500.
Process\Private Bytes
This indicates the amount of memory that this process has
allocated that cannot be shared with other processes. If the difference between
the minimum and maximum values is greater than 250, there may be a
memory leak.
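Leak checks of this kind compare the lowest and highest readings over a monitoring period rather than a single value. Here is a minimal Python sketch under that reading; `spread` and `process_warnings` are hypothetical names, each argument is assumed to be a non-empty list of samples for one process, and because the unit of the Private Bytes figure of 250 is not stated, that limit is left as a required argument instead of being hard-coded.

```python
def spread(samples):
    """Difference between the highest and lowest sample."""
    return max(samples) - min(samples)


def process_warnings(handle_counts, thread_counts, private_bytes,
                     private_bytes_spread_limit):
    """Return a warning for each process threshold from the text that is crossed."""
    warnings = []
    if max(handle_counts) > 10_000:
        warnings.append("possible handle leak")
    if spread(thread_counts) > 500:
        warnings.append("possible thread leak")
    # Assumption: the caller supplies the limit in the same unit as the samples.
    if spread(private_bytes) > private_bytes_spread_limit:
        warnings.append("possible memory leak")
    return warnings
```

A process that simply runs with many threads is not flagged; only growth between the minimum and maximum is.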