Kerberos SSPI/PAC errors and NetLogon errors 5719 and 5783 and Login Failure Audits – Oh my!

We appear to be having a bunch of Kerberos errors in our SQL clusters that represent 2-30 minutes of downtime at a stretch.

A recent network change seemed to help a lot, even though it technically should not affect these issues at all. For now we are waiting to see whether the errors recur.

The outages have been happening randomly for the past 5 days (starting last Thursday, ending yesterday – Monday), probably 4 – 6 incidents a day, for the aforementioned 2 – 30 minutes at a stretch.

Samples of the associated events in the Event Logs:

Bunches of failure audits alternating with Kerberos SSPI Handshake errors in the Application Logs of the SQL Cluster (clustered via Microsoft Clustering):
Kerberos SSPI:

Event Type: Error
Event Source: MSSQLSERVER
Event Category: (4)
Event ID: 17806
Date: 9/29/2008
Time: 8:17:52 AM
User: N/A
Computer:
Description:
SSPI handshake failed with error code 0x80090311 while establishing a connection with integrated security; the connection has been closed. [CLIENT: xxx.xxx.xxx.xxx]

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

The alternate failure audits:

Event Type: Failure Audit
Event Source: MSSQLSERVER
Event Category: (4)
Event ID: 18452
Date: 9/29/2008
Time: 8:17:52 AM
User: N/A
Computer:
Description:
Login failed for user ''. The user is not associated with a trusted SQL Server connection. [CLIENT: xxx.xxx.xxx.xxx]

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

In the System log, we commonly see one Netlogon event (Event ID 5719) and one Kerberos event (Event ID 7), and more rarely a second Netlogon event (Event ID 5783), all concurrent with the Application Log events:

Netlogon Event ID 5719:

Event Type: Error
Event Source: NETLOGON
Event Category: None
Event ID: 5719
Date: 9/29/2008
Time: 8:15:59 AM
User: N/A
Computer:
Description:
This computer was not able to set up a secure session with a domain controller in domain due to the following:
There are currently no logon servers available to service the logon request.
This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.

ADDITIONAL INFO
If this computer is a domain controller for the specified domain, it sets up the secure session to the primary domain controller emulator in the specified domain. Otherwise, this computer sets up the secure session to any domain controller in the specified domain.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 5e 00 00 c0 ^..À

Kerberos Event ID 7:

Event Type: Error
Event Source: Kerberos
Event Category: None
Event ID: 7
Date: 9/29/2008
Time: 8:15:59 AM
User: N/A
Computer:
Description:
The kerberos subsystem encountered a PAC verification failure. This indicates that the PAC from the client in realm had a PAC which failed to verify or was modified. Contact your system administrator.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 5e 00 00 c0 ^..À

The rare NetLogon Event ID 5783:

Event Type: Error
Event Source: NETLOGON
Event Category: None
Event ID: 5783
Date: 9/27/2008
Time: 1:00:56 PM
User: N/A
Computer:
Description:
The session setup to the Windows NT or Windows 2000 Domain Controller \\ for the domain is not responsive. The current RPC call from Netlogon on \\ to \\ has been cancelled.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

We are still watching the situation but suspect that high network latency may be the issue. Other possibilities are a stale DNS cache somewhere and possibly a Veritas Diskeeper 2007 bad install/upgrade. I plan to escalate the general troubleshooting questions to our Microsoft Technical Account Manager today or tomorrow.
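
While waiting, it helps to translate the raw codes in the events above into names. Here is a minimal Python sketch, assuming the "Data:" bytes in the Netlogon and Kerberos events are a little-endian 32-bit status code. The lookup table and the function names are mine, for illustration only, and cover just the two codes that appear in this post.

```python
import struct

# Small lookup table covering only the codes seen in this post (not exhaustive).
KNOWN_CODES = {
    0xC000005E: "STATUS_NO_LOGON_SERVERS",
    0x80090311: "SEC_E_NO_AUTHENTICATING_AUTHORITY",
}


def decode_event_data(hex_bytes: str) -> int:
    """Turn an event 'Data:' dump such as '5e 00 00 c0' into a 32-bit code."""
    raw = bytes.fromhex(hex_bytes)      # spaces between byte pairs are allowed
    (code,) = struct.unpack("<I", raw)  # assumption: little-endian unsigned 32-bit
    return code


def describe(code: int) -> str:
    """Format a code as hex plus its name, if the name is in the table."""
    name = KNOWN_CODES.get(code, "unknown (not in this small table)")
    return f"0x{code:08X} {name}"


if __name__ == "__main__":
    print(describe(decode_event_data("5e 00 00 c0")))  # Netlogon 5719 / Kerberos 7 data
    print(describe(0x80090311))                        # SQL Server event 17806
```

Both codes decode to "could not reach an authenticating authority / logon server", which fits the Netlogon 5719 text and points at connectivity to the domain controllers rather than at SQL Server itself.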

MediaWiki RSS feed on Recent Changes page error

If you get an error like:

XML Parsing Error: xml declaration not at start of external entity
Location: http://10.2.48.69/hdwiki//index.php?…anges&feed=rss
Line Number 1, Column 2: <?xml version="1.0" encoding="utf-8"?>
-^

when clicking the RSS or Atom link in your toolbox from the Recent Changes page, you probably have an extra blank line, either before the <?php tag or after the ?> tag, in one of the settings files or extension files for your MediaWiki.

In my case, I found it in my EasyTimeline Timeline.php extension file. I had two blank lines at the end. Erasing those lines and re-saving the file fixed the issue.
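
If you would rather not open every file by hand, a rough Python sketch like the one below can hunt for the offending whitespace. It is a heuristic of my own, not a MediaWiki tool: the function names are made up, it only checks files ending in .php under a directory you give it, and it allows one newline right after ?> because PHP itself swallows that one.

```python
import sys
from pathlib import Path


def find_stray_whitespace(data: bytes) -> list[str]:
    """Report whitespace that PHP would send to the browser ahead of the feed."""
    problems = []
    stripped = data.lstrip()
    if stripped.startswith(b"<?php") and stripped != data:
        problems.append("whitespace before opening <?php tag")
    end = data.rfind(b"?>")
    if end != -1:
        tail = data[end + 2:]
        # Only flag a tail that is pure whitespace and more than one newline.
        if tail.strip() == b"" and tail not in (b"", b"\n", b"\r\n"):
            problems.append("extra whitespace after closing ?> tag")
    return problems


def scan(root: Path) -> dict[Path, list[str]]:
    """Check every .php file under root and collect the ones with problems."""
    report = {}
    for path in sorted(root.rglob("*.php")):
        problems = find_stray_whitespace(path.read_bytes())
        if problems:
            report[path] = problems
    return report


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path, problems in scan(root).items():
        for problem in problems:
            print(f"{path}: {problem}")
```

Point it at your wiki directory (for example, the folder that holds LocalSettings.php and the extensions folder) and fix whatever it lists. A file with two blank lines after ?>, like my Timeline.php, would be flagged.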

See uncle Google for more information, but the discussion I relied on to fix my issue was here.

Various

NOTE: Apparently I wrote this around early-August, but never posted it, so I posted it today after finding the draft in my WordPress.

– Updated to WordPress 2.3.1. Apparently some security fixes as well as normal feature updates. Requires a DB upgrade, so be sure to hit that admin link to do the DB upgrade.
– Have been tinkering with using GMail IMAP as a spamfilter and integrating with Thunderbird 2.0+. Using forwarding from old POP3 accounts, I now have a couple extra GMail accounts (for keeping some IDs separate), managed to delete about 7 Thunderbird account profiles and now have GMail spam filter working in conjunction with Thunderbird Junk Mail filtering. I wrote it up (roughly) here: http://health.malcolmgin.com/Using_GMail_to_Filter_Spam
– Mostly doing SharePoint 2003 maintenance at work while we also prepare for SharePoint 2007 (finally – my last job started with 2007 in November of 2006). I’m also doing various other system engineer stuff like working with maintenance, licensing, support, etc. for some other interrelated products my team supports. Still learning, though, so I’m OK on the happiness & fulfillment side, and my commute totals about 3 hours less per day, I mostly get every other Friday off, and I have time and a partner with which to hit the gym 3-4 days a week, work permitting.

Internet Literacy 301: NAVTEQ and you

I decided that since I’m not solely a SharePoint Guy anymore, I’d add other geeky articles I write (usually for private audiences) here as well.

Here’s one about the mapping data source that provides most of the mapping data you see on Google Maps and GPSes and so on:

NAVTEQ is the mapping data company that supplies most of the major mapping companies and utilities with that data. They provide the address resolution (geocoding), the squiggly lines that map to our real world roads and highways and a lot of Point of Interest data like gas stations, hotels, hospitals, police stations, etc. Anything that later ends up on your view of Google or Yahoo Maps or your GPS in your car or that you walk around with, whatever.

So if you notice a mapping error (not necessarily a directions error, but something like a missing street, a one-way street pointing the wrong way, a hotel missing from the Points of Interest, or an address that lands on the wrong end of the street), NAVTEQ are the folks you need to notify.

Last time I did this (when NAVTEQ was showing a schoolyard as a street and thus GPS devices were directing presumably lost motorists to drive through it), NAVTEQ’s form was clunky and difficult to use, no tracking information or updates were available to you when you submitted a change request, and it kinda sucked. Still, they did eventually get the update out to Google Maps, and I presume to many of the GPS software/devices folks use (they issue updates as frequently as quarterly, but it depends on the vendors who use their data).

Anyhow, now they have a newfangled form that’s integrated into their mapping data and if you enter changes, it provides you a tracking number, and you can tie an e-mail address to the report so you can get updates.

So here’s where to go if you want to enter any updates (don’t fret, it’s mostly a form, and you can attach a document or picture to help with the issue)