Kerberos SSPI/PAC errors and NetLogon errors 5719 and 5783 and Login Failure Audits – Oh my!


Our SQL Server clusters have been throwing bursts of Kerberos errors, and each burst means 2 to 30 minutes of downtime.

A recent network change seems to have helped a lot, although technically it should not have affected these issues at all. For now we are waiting to see whether the errors come back.

The outages occurred at random over five days (from last Thursday through yesterday, Monday), roughly 4 to 6 incidents a day, each lasting the 2 to 30 minutes mentioned above.

Samples of the associated events in the Event Logs:

The Application log on the SQL cluster (clustered via Microsoft Clustering) shows bursts of failure audits alternating with Kerberos SSPI handshake errors.

Kerberos SSPI handshake error:

Event Type: Error
Event Source: MSSQLSERVER
Event Category: (4)
Event ID: 17806
Date: 9/29/2008
Time: 8:17:52 AM
User: N/A
Computer:
Description:
SSPI handshake failed with error code 0x80090311 while establishing a connection with integrated security; the connection has been closed. [CLIENT: xxx.xxx.xxx.xxx]

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
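As a reading aid for the error code in the event above, here is a minimal Python sketch that splits the HRESULT 0x80090311 into its standard fields. The function name and the one-entry lookup table are my own illustration, not part of any Microsoft tooling. As far as I know, 0x80090311 is SEC_E_NO_AUTHENTICATING_AUTHORITY ("No authority could be contacted for authentication"), which would fit a domain controller being unreachable.

```python
# Minimal sketch: split a Windows HRESULT into severity, facility, and code.
# The lookup table is illustrative and holds only the code seen in this post.
KNOWN_SSPI_ERRORS = {
    0x80090311: "SEC_E_NO_AUTHENTICATING_AUTHORITY",
}


def decode_hresult(value: int) -> dict:
    """Return the standard HRESULT bit fields for a 32-bit value."""
    return {
        "failure": bool((value >> 31) & 1),   # top bit set means failure
        "facility": (value >> 16) & 0x7FF,    # 11-bit facility (9 = SSPI/security)
        "code": value & 0xFFFF,               # low 16 bits are the error code
        "name": KNOWN_SSPI_ERRORS.get(value, "unknown"),
    }


info = decode_hresult(0x80090311)
print(info)
```

The facility value of 9 places the error in the security (SSPI) facility, so the failure is reported by the authentication layer rather than by SQL Server itself.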

The alternating failure audits:

Event Type: Failure Audit
Event Source: MSSQLSERVER
Event Category: (4)
Event ID: 18452
Date: 9/29/2008
Time: 8:17:52 AM
User: N/A
Computer:
Description:
Login failed for user ''. The user is not associated with a trusted SQL Server connection. [CLIENT: xxx.xxx.xxx.xxx]

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

In the System log, at around the same times as the Application log events, we commonly see one Netlogon event (Event ID 5719) and one Kerberos event (Event ID 7), plus, rarely, a second Netlogon event (Event ID 5783):

Netlogon Event ID 5719:

Event Type: Error
Event Source: NETLOGON
Event Category: None
Event ID: 5719
Date: 9/29/2008
Time: 8:15:59 AM
User: N/A
Computer:
Description:
This computer was not able to set up a secure session with a domain controller in domain due to the following:
There are currently no logon servers available to service the logon request.
This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.

ADDITIONAL INFO
If this computer is a domain controller for the specified domain, it sets up the secure session to the primary domain controller emulator in the specified domain. Otherwise, this computer sets up the secure session to any domain controller in the specified domain.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 5e 00 00 c0 ^..À

Kerberos Event ID 7:

Event Type: Error
Event Source: Kerberos
Event Category: None
Event ID: 7
Date: 9/29/2008
Time: 8:15:59 AM
User: N/A
Computer:
Description:
The kerberos subsystem encountered a PAC verification failure. This indicates that the PAC from the client in realm had a PAC which failed to verify or was modified. Contact your system administrator.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 5e 00 00 c0 ^..À
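
The Data field in both the Netlogon 5719 and the Kerberos 7 events above carries the same four bytes. Read as a little-endian 32-bit value they form an NTSTATUS code. The sketch below shows the conversion; the helper name and the one-entry table are hypothetical, and the mapping of 0xC000005E to STATUS_NO_LOGON_SERVERS is from memory of the NTSTATUS list, although it does match the "no logon servers available" text in the 5719 description.

```python
# Minimal sketch: turn the event "Data" bytes into an NTSTATUS value.
# The table is illustrative and holds only the code seen in this post.
KNOWN_NTSTATUS = {
    0xC000005E: "STATUS_NO_LOGON_SERVERS",
}


def data_to_ntstatus(hex_bytes: str) -> int:
    """Interpret space-separated hex bytes as a little-endian 32-bit integer."""
    raw = bytes.fromhex(hex_bytes)  # fromhex skips the spaces between bytes
    return int.from_bytes(raw, byteorder="little")


status = data_to_ntstatus("5e 00 00 c0")
print(f"0x{status:08X}", KNOWN_NTSTATUS.get(status, "unknown"))
```

If that reading is right, both events are reporting the same underlying condition: the server could not reach a domain controller at that moment.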

The rare NetLogon Event ID 5783:

Event Type: Error
Event Source: NETLOGON
Event Category: None
Event ID: 5783
Date: 9/27/2008
Time: 1:00:56 PM
User: N/A
Computer:
Description:
The session setup to the Windows NT or Windows 2000 Domain Controller \\ for the domain is not responsive. The current RPC call from Netlogon on \\ to \\ has been cancelled.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
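
To check how closely the System and Application log events line up, the sample events above can be grouped into incidents by timestamp. This is only a sketch: the five-minute gap is an assumption I picked for illustration, and the event list is typed in by hand from the samples in this post rather than read from the real logs.

```python
from datetime import datetime, timedelta

# Timestamps, sources, and IDs copied from the sample events in this post.
EVENTS = [
    ("9/29/2008 8:17:52 AM", "MSSQLSERVER", 17806),
    ("9/29/2008 8:17:52 AM", "MSSQLSERVER", 18452),
    ("9/29/2008 8:15:59 AM", "NETLOGON", 5719),
    ("9/29/2008 8:15:59 AM", "Kerberos", 7),
    ("9/27/2008 1:00:56 PM", "NETLOGON", 5783),
]


def group_incidents(events, max_gap=timedelta(minutes=5)):
    """Group events whose timestamps are within max_gap of the previous event."""
    parsed = sorted(
        (datetime.strptime(ts, "%m/%d/%Y %I:%M:%S %p"), source, event_id)
        for ts, source, event_id in events
    )
    incidents = []
    for event in parsed:
        # Compare against the last event of the most recent incident.
        if incidents and event[0] - incidents[-1][-1][0] <= max_gap:
            incidents[-1].append(event)
        else:
            incidents.append([event])
    return incidents


incidents = group_incidents(EVENTS)
for number, incident in enumerate(incidents, start=1):
    print(f"Incident {number}:")
    for when, source, event_id in incident:
        print(f"  {when}  {source} {event_id}")
```

With these samples, the Netlogon and Kerberos errors land about two minutes before the SQL Server handshake failures in the same incident, which is consistent with the domain controller becoming unreachable first.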

We are still watching the situation, but we suspect that high network latency is the cause. Other possibilities are a stale DNS cache somewhere or a bad install/upgrade of Veritas Diskeeper 2007. I plan to escalate the general troubleshooting questions to our Microsoft Technical Account Manager today or tomorrow.
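
One cheap way to test the latency theory is to time TCP connections from the SQL nodes to a domain controller on the Kerberos (88) and LDAP (389) ports. The sketch below is a rough probe under that assumption, not a diagnosis tool. The function names are my own, and the host name in the usage comment is a placeholder.

```python
import socket
import time


def tcp_connect_ms(host: str, port: int, timeout: float = 3.0) -> float:
    """Return the time in milliseconds to open (and close) a TCP connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def probe(host: str, ports=(88, 389), attempts: int = 3) -> dict:
    """Time several connects per port; a failed attempt is recorded as None."""
    results = {}
    for port in ports:
        samples = []
        for _ in range(attempts):
            try:
                samples.append(tcp_connect_ms(host, port))
            except OSError:
                samples.append(None)  # timeout, refusal, or DNS failure
        results[port] = samples
    return results


# Example usage (placeholder host name, replace with a real domain controller):
# print(probe("dc01.example.com"))
```

Run on a schedule from each cluster node, a probe like this would show whether connect times spike or fail at the times the Netlogon 5719 events are logged. A name resolution failure also lands in the None bucket, so it would hint at the DNS theory as well.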


2 responses to “Kerberos SSPI/PAC errors and NetLogon errors 5719 and 5783 and Login Failure Audits – Oh my!”

  1. Hi,

    I am seeing the same issue (win 2003, MOSS 2007 and SQL 2005)

    Did you find a solution to this problem?

    I gather it is due to connection problems talking to the AD server – due to latency or DNS??

    Cheers!

  2. @IUser, the problem went away with a network change that was supposed to be unrelated. I can only guess that the network change changed something about routing and latency, but I don’t know for sure. I’m sorry, but this one remains unresolved for now.
