Troubleshooting poor Windows logon performance in Active Directory environments

Problems based around performance are often the most frustrating to resolve, mainly because there are so many variables to consider. In this article, I will focus on the difficult issue of diagnosing and resolving slow logon performance for users when logging in to their domain accounts.

When troubleshooting any performance problem, you must first define what is an acceptable delay. I’ve seen some environments where users experience 5-10 minute logon times and they don’t complain simply because they are used to it. Then I’ve seen others scenarios where even a one minute delay is considered unacceptable. That’s why it’s important to first define what is reasonable so that you know when you have solved the problem.

Windows logon performance factors

It’s important to consider a variety of factors when looking for the cause of logon performance issues. Some of these factors include:

  • the proximity of domain controllers to your users
  • network connections and available bandwidth
  • hardware resources on the DCs (x64 vs. x86, memory, etc.)
  • the number of Group Policy Objects (GPOs) applied to the user and computer (which directly affects bandwidth)
  • the number of security groups the user and computer are members of (also directly affects bandwidth)
  • GPOs containing settings that require extra processing time such as:
    • loopback processing
    • WMI filters
    • ACL filtering
  • heavily loaded domain controllers caused by:
    • applications requiring authentication
    • inefficient LDAP queries from user scripts or applications (see my article on taming the LSASS.exe process for more details)
    • a DC hosting other apps such as Exchange, IIS, SQL Server, etc.
  • client configuration
    • memory, disk, processor,etc.
    • network Interface (10/100/1000)
    • subnet mapped properly to the site
    • DNS configuration

Define the scope

I always spend time asking basic questions in order to define the true scope of the problem. This will take some effort because these problems are usually defined by users who complain, while there may also be users who have just learned to live with it. Below are some important questions to ask:

  • Are the problems defined to a single site, security group, OU, department, type of client (laptop or desktop), or OS?
  • Does the problem happen at a particular time of day?
  • Does the problem occur when you are in the office or connecting over the VPN?
  • Describe the symptoms:
    • Does the delay occur at a specific point each time (i.e. “Network Settings” on the logon screen)
    • Does it occur before or after the logon screen?
  • When did this start happening?

Tools and data gathering

There are some basic tools that I use to gather data. For performance problems, I like to cast a wide net and collect all that I can. Here are some examples:

  • Run Microsft Product Support Reports (MPSreports) on clients and their authenticating DCs. This is a common tool that collects data for all event logs, MSINFO32, NetDiag, IPConfig, drivers, hotfixes and more. Hewlett-Packard also has its own version called HPS Reports which is, in my opinion, superior to Microsoft’s tool and will collect specific Active Directory data if run on a DC. It also collects a plethora of hardware-related information, even for non-HP hardware.
  • On the client, use Microsoft KB article 221833 to set verbose logging for Winlogon. This will provide excellent details in the %Systemroot%\Debug\UserMode\Userenv.log file. Note that this log does not contain date stamps, so you must:
    1. delete the existing userenv.log from the client
    2. enable verbose logging per KB 221833
    3. logoff, logon, and save the userenv.log to a new location in order to limit data collection for the logon period.

Note that the userenv.log is excellent at following GPO and profile processing, and often you can clearly see where a logon delay occurs, indicated by a long interval between events.

  • Enable Net Logon logging. The Netlogon log is located in %systemroot%\debug and will be empty if logging is not enabled. This is an excellent source of information. For instance, it will show you which clients in subnets that are not mapped to a site. This can cause a client to go to an out-of-site DC for authentication and result in a longer than expected logon time.
  • Run Process Monitor from Sysinternals. Look in the Help section for details on enabling boot logging. You can capture the process information during the slow boot to see which processes might be affecting performance.

Other tips for troubleshooting slow client logons

There are a few more quick things you can do to see if your logon performance is caused by a known issue.

First, examine the GPResult.exe and LOGONSERVER environment variable on the client. While MPSreports and HPS Reports collect the GPResult for the logged on user, they don’t collect the LOGONSERVER variable which points to the authenticating DC. This is important because each time a user logs in, the GPOs are downloaded to the client. SYSVOL — which contains the GPOs — is a DFS root, however, and does not obey client site awareness. Instead, it collects the DCs (hosting the SYVOL DFS root) in a randomized order, then the GPOs are downloaded from the first DC in the list.

I have seen situations where clients in a main hub site would go across a slow WAN link to an out-of-site DC in order to get the GPOs, causing very slow logon times. Since this could change on each logon, the problem was intermittent.

Examine the GPResult for the DC that the GPOs were downloaded from and see if the GPOs are coming from an out-of-site DC. Also compare the LOGONSERVER variable to see if the client is being authenticated to an out-of-site DC. The logon delay could be explained through this “normal” behavior using known slow or busy links.

Another good test is to boot to Safe Mode with Networking and see if the delay occurs. If not, then do a Net Start and list all the services started. Then boot in normal mode and run Net Start and list all the services again. The difference should point to services that may be suspect, and eliminating them one at a time should help you identify the problem. You can also try disabling applications that start on boot to see if an application is getting in the way.

One final technique is usually to take a network trace using Netmon, Wireshark or another network capture utility. Since you are trying to capture the logon process, one good way to do this is to connect a dumb hub to the network cable going to the switch, then connect a cable from the hub to the problem PC and connect another cable to another PC or laptop that has Netmon or WireShark installed. Run the capture tool in promiscuous mode and reproduce the logon. This setup will ensure that the capture collects traffic in and out of the client and eliminates the network noise.

These are the basics to get you started. Just remember that there are no magic solutions – it really just takes time and detective work to find the problem. In an upcoming article, I will describe the methods I used in some case studies that should help tie this all together.

Texto Gary Olsen

fonte: Windows Server Tips:

No comments yet

Deixe um comentário