Kerberos SSPI/PAC errors and NetLogon errors 5719 and 5783 and Login Failure Audits – Oh my!

We appear to be having a bunch of Kerberos errors in our SQL clusters that represent 2-30 minutes of downtime at a stretch.

A recent network change seemed to help a lot, even though it technically should not have any bearing on these errors. For now we are waiting to see whether the errors happen again.

The outages have been happening randomly for the past 5 days (starting last Thursday, ending yesterday – Monday), probably 4 – 6 incidents a day, for the aforementioned 2 – 30 minutes at a stretch.

Samples of the associated events in the Event Logs:

Bunches of failure audits alternating with Kerberos SSPI Handshake errors in the Application Logs of the SQL Cluster (clustered via Microsoft Clustering):
Kerberos SSPI:

Event Type: Error
Event Source: MSSQLSERVER
Event Category: (4)
Event ID: 17806
Date: 9/29/2008
Time: 8:17:52 AM
User: N/A
Computer:
Description:
SSPI handshake failed with error code 0x80090311 while establishing a connection with integrated security; the connection has been closed. [CLIENT: xxx.xxx.xxx.xxx]

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

The alternate failure audits:

Event Type: Failure Audit
Event Source: MSSQLSERVER
Event Category: (4)
Event ID: 18452
Date: 9/29/2008
Time: 8:17:52 AM
User: N/A
Computer:
Description:
Login failed for user ''. The user is not associated with a trusted SQL Server connection. [CLIENT: xxx.xxx.xxx.xxx]

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

In the System log, we commonly see one Netlogon event (Event ID 5719) and one Kerberos event (Event ID 7), and occasionally a Netlogon event with Event ID 5783, all concurrent with the Application Log events:

Netlogon Event ID 5719:

Event Type: Error
Event Source: NETLOGON
Event Category: None
Event ID: 5719
Date: 9/29/2008
Time: 8:15:59 AM
User: N/A
Computer:
Description:
This computer was not able to set up a secure session with a domain controller in domain due to the following:
There are currently no logon servers available to service the logon request.
This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.

ADDITIONAL INFO
If this computer is a domain controller for the specified domain, it sets up the secure session to the primary domain controller emulator in the specified domain. Otherwise, this computer sets up the secure session to any domain controller in the specified domain.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 5e 00 00 c0 ^..À

Kerberos Event ID 7:

Event Type: Error
Event Source: Kerberos
Event Category: None
Event ID: 7
Date: 9/29/2008
Time: 8:15:59 AM
User: N/A
Computer:
Description:
The kerberos subsystem encountered a PAC verification failure. This indicates that the PAC from the client in realm had a PAC which failed to verify or was modified. Contact your system administrator.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 5e 00 00 c0 ^..À

The rare NetLogon Event ID 5783:

Event Type: Error
Event Source: NETLOGON
Event Category: None
Event ID: 5783
Date: 9/27/2008
Time: 1:00:56 PM
User: N/A
Computer:
Description:
The session setup to the Windows NT or Windows 2000 Domain Controller \\ for the domain is not responsive. The current RPC call from Netlogon on \\ to \\ has been cancelled.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

We are still watching the situation but suspect that high network latency may be the issue. Other possibilities are a stale DNS cache somewhere and possibly a Veritas Diskeeper 2007 bad install/upgrade. I plan to escalate the general troubleshooting questions to our Microsoft Technical Account Manager today or tomorrow.
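
As a side note, the raw codes in the events above can be decoded mechanically. The sketch below is only an illustration: the two symbolic names in the lookup table are the ones I know from the Windows SDK headers (ntstatus.h and winerror.h), and the function names are my own. The "Data:" bytes in the Netlogon and Kerberos events are stored little-endian, so 5e 00 00 c0 reads as 0xC000005E.

```python
import struct

# Codes seen in the events above. The symbolic names come from the Windows
# SDK headers (ntstatus.h / winerror.h); the table is deliberately tiny.
KNOWN_CODES = {
    0xC000005E: "STATUS_NO_LOGON_SERVERS",
    0x80090311: "SEC_E_NO_AUTHENTICATING_AUTHORITY",
}


def decode_event_data(hex_bytes: str) -> int:
    """Turn an Event Viewer 'Data:' dump such as '5e 00 00 c0' into an int.

    The four bytes are little-endian, so they are unpacked with '<I'.
    """
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    (value,) = struct.unpack("<I", raw)
    return value


def describe(code: int) -> str:
    """Format a code as hex plus its symbolic name, if we know it."""
    name = KNOWN_CODES.get(code, "unknown code")
    return f"0x{code:08X} {name}"


if __name__ == "__main__":
    # Data field of the Netlogon 5719 and Kerberos 7 events.
    print(describe(decode_event_data("5e 00 00 c0")))
    # Error code quoted in the SQL Server 17806 event.
    print(describe(0x80090311))
```

Both codes point the same way as the Netlogon text: the machine could not reach anything that would authenticate it, which is at least consistent with the latency theory.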

MediaWiki RSS feed on Recent Changes page error

If you get an error like:

XML Parsing Error: xml declaration not at start of external entity
Location: http://10.2.48.69/hdwiki//index.php?…anges&feed=rss
Line Number 1, Column 2: <?xml version="1.0" encoding="utf-8"?>
-^

When clicking the RSS or Atom link in your toolbox from the Recent Changes page, you probably have an extra blank line, either before the <?php tag or after the ?> tag, in one of the settings files or extension files for your MediaWiki.

In my case, I found it in my EasyTimeline Timeline.php extension file. I had two blank lines at the end. Erasing those lines and re-saving the file fixed the issue.
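
If you have a lot of extensions, hunting for the offending file by hand gets old fast. Here is a rough sketch of how the search could be automated. It is a heuristic, not a PHP parser: it only looks at the very start of the file and at whatever follows the last ?> tag, and the function names are my own. It relies on the fact that PHP swallows a single newline directly after a closing ?> tag, so only anything beyond that counts as stray output.

```python
from pathlib import Path
from typing import Dict, List


def stray_whitespace(source: bytes) -> List[str]:
    """Report whitespace that PHP would emit before the XML declaration."""
    problems = []

    # Anything before the opening tag is sent to the browser verbatim.
    if source[:1].isspace() and source.lstrip().startswith(b"<?"):
        problems.append("whitespace before opening <? tag")

    # Look at what follows the last closing tag. PHP eats exactly one
    # newline after ?>, so a lone "\n" or "\r\n" there is harmless.
    close = source.rfind(b"?>")
    if close != -1:
        tail = source[close + 2:]
        if tail.strip() == b"" and tail not in (b"", b"\n", b"\r\n"):
            problems.append("whitespace after closing ?> tag")

    return problems


def scan_tree(root: str) -> Dict[str, List[str]]:
    """Check every .php file under root; return only files with problems."""
    findings = {}
    for path in Path(root).rglob("*.php"):
        problems = stray_whitespace(path.read_bytes())
        if problems:
            findings[str(path)] = problems
    return findings
```

Calling scan_tree() on the wiki's install directory (wherever yours lives) should list files like my Timeline.php with its two trailing blank lines.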

See Uncle Google for more information, but the discussion I used to fix my issue was here.

Alternate Access Mapping 2007 Research Links

We just migrated from 2003 to 2007 this past weekend (Friday, actually), and now the mapping is causing problems for our Mac users, who can’t use UNC hostnames and have to use full hostnames instead.

So I’m doing some background research on the issue.

Links:
– http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=521109&SiteID=17
– http://technet2.microsoft.com/Office/en-us/library/be9d31d2-b9cb-4442-bfc6-2adcdbff8fae1033.mspx
– http://blogs.officezealot.com/mauro/archive/2007/03/02/20178.aspx
– http://www.experts-exchange.com/OS/Microsoft_Operating_Systems/Server/MS-SharePoint/Q_22117646.html
– http://msdn2.microsoft.com/en-us/library/ms771995.aspx
– http://groups.google.com/group/microsoft.public.sharepoint.windowsservices/browse_thread/thread/580e9a2b981c0319/acdd966b3ce3ada8?lnk=raot
– http://blogs.msdn.com/sharepoint/archive/2007/03/06/what-every-sharepoint-administrator-needs-to-know-about-alternate-access-mappings-part-1.aspx
– http://blogs.msdn.com/sharepoint/archive/2007/03/19/what-every-sharepoint-administrator-needs-to-know-about-alternate-access-mappings-part-2-of-3.aspx
– http://blogs.msdn.com/sharepoint/archive/2007/04/18/what-every-sharepoint-administrator-needs-to-know-about-alternate-access-mappings-part-3-of-3.aspx
– http://support.microsoft.com/kb/913113
– http://mindsharpblogs.com/Driskell/archive/2007/05/15/1769.aspx
– http://www.codeplex.com/SLK/Thread/View.aspx?ThreadId=9590
– http://blog.henryong.com/2007/01/17/alternate-access-mapping-in-sharepoint/
– http://www.toddklindt.com/blog/Lists/Posts/Post.aspx?ID=18
– http://www.toddklindt.com/blog/Lists/Posts/Post.aspx?ID=39
– http://www.jjfblog.com/2006/12/how-to-change-server-name-post.html
– http://msmvps.com/blogs/obts/archive/2007/03/27/717296.aspx

For 2003:
– http://office.microsoft.com/en-us/sharepointportaladmin/HA011603021033.aspx

Getting Partial and Complete Full Farm backups to restore properly on 2007

This is my research for today/this week, until I get it working and properly documented, at least on our configuration tracking wiki.

Links:

Colleagues have alluded to “The GUID Problem”: if you don’t take a site collection’s content database offline before restoring a copy of it to the same farm, you’ll get errors because the GUIDs will match and confuse SharePoint. The recommendation is that if you have older data you wish to restore to a different web application/site collection while maintaining your main branch on a prod server, you should instead stand up an entirely different farm, restore to that farm, use stsadm with the -o export operation to back up the content, and then restore it again on Production.

Or better yet, don’t fiddle with Production!

Anyhow, here’s the steps I am going through to get this to work. Again, my situation is that I’m just trying to verify that a full farm backup can have part of its content (one web application’s site collections) restored somewhere else if need be.

  1. Take the full farm backup, either with stsadm -o backup -directory \\UNC\path\ -backupmethod full -url http://mossdev1/ or via the Central Administration UI (left as an exercise for the reader). (Note: make sure that the service account running your back-end instance of SQL Server has write access to the filesystem/UNC path you provide during the backup setup steps.)
  2. Copy the xml files and directory tree generated to a new farm for restore. Share this directory to make sure it has a valid UNC path. Make sure your SQL instance Service Account has full access to the share/UNC path.
  3. Check to make sure there are no failed Job Statuses for Backup/Restore on your target (restore) farm.
  4. Locate the directory where you want to store (or already store) SQL database files. Your SQL admin may already have placed this somewhere else on the server, or it may be the default: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data. Make sure you’ve got proper permissions to write to that directory. (I used a very high-level account, permissions-wise, to request the restore, and made sure that it had write permissions to the directory, but I am not entirely sure that’s the correct account to configure; this needs more research.)
  5. Restore part of the full farm backup (in my case, only one web application, http://mossdev1:44444/, and one Content Database, WSS_Content_Database).
  6. Because the security model on the target server was completely different from the one on the source server, go into SQL 2005 Management Studio, connect to the proper instance, and find the Content Database to make sure it was properly restored. Open the AllDocs table to make sure data is in there, then edit the SQL instance’s logins to make sure that the Farm Service Account has proper rights to access the Content Database. (I gave it dbo rights on the database, but you can probably get away with User rights specifically granting Connect rights within the database in question.)
  7. Because the host name of the Web Application I restored is different from the VPC’s hostname, make sure that IIS recognizes the proper host header, and that you have DNS or hosts file entries that map to the proper IP address. Go into the IIS Manager and right-click the Virtual Server that corresponds to the Web Application you just restored. Click the Advanced button in the Web Site Identification control group on the Web Site tab. In the pop-up box, click the entry for your port, click the Edit button, and add the proper host header value(s) to the Host Header Value text box. If you are changing DNS records, be sure to create an A and a PTR record. If using a hosts file, just go to C:\WINDOWS\system32\drivers\etc\ and edit the hosts file there!
  8. Because the different security model probably kept the Content Database from being properly attached to your Web Application, go back and do that manually. Go to Central Administration and choose Application Management. Then choose the Content databases link under the SharePoint Web Application Management section. Click the Add a content database link in the title bar. On the next screen, specify the database server/instance and database name; you can probably leave the other fields at their defaults (unless your organization specifies other settings).
  9. Because of the different security model, you’ll also need to add your current login account to the Web Application’s policy. Do that now. From Central Administration’s Application Management area, choose Policy for Web Application (Under the Application Security section). Click Add Users, then make sure you’ve got the right web application and click Next. Specify the user(s) you wish to have Site collection administration, choose Full Control and click Finish.
  10. Now try out your site restore by going to the URL it should be at. If you’re not sure about that, check out your Site Collection List under Application Management for that Web Application.
  11. If you have any other issues, you may be on your own, because honestly that was enough problems to surmount for me today! 🙂
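
For step 7, it is easy to fat-finger the hosts file and then wonder why the restored site won't come up. Below is a small sketch for double-checking an entry. The sample text and the 192.0.2.10 address are placeholders I made up for illustration (only the mossdev1 name comes from my setup), and the helper names are mine too. It understands nothing beyond the basic hosts format of an IP address followed by one or more names, with # starting a comment.

```python
from typing import Dict

# Hypothetical hosts file content; on the server you would read the real
# file from C:\WINDOWS\system32\drivers\etc\hosts instead.
SAMPLE_HOSTS = """\
# sample hosts file
127.0.0.1    localhost
192.0.2.10   mossdev1   # restored web application
"""


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each hostname (lower-cased) to the first IP listed for it."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # The first entry for a name wins, so keep the earliest one.
            mapping.setdefault(name.lower(), ip)
    return mapping


def maps_to(text: str, hostname: str, expected_ip: str) -> bool:
    """True if the hosts text resolves hostname to expected_ip."""
    return parse_hosts(text).get(hostname.lower()) == expected_ip


if __name__ == "__main__":
    print(maps_to(SAMPLE_HOSTS, "mossdev1", "192.0.2.10"))
```

This only checks the hosts file itself; it says nothing about DNS records or the IIS host header, which still need to be verified separately.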

Creating new My Site hosts for MOSS 2007

If you should happen to recreate your SSP or your MySite host in MOSS 2007, you may find that the wizard that helped you out the first time with properly configuring your MySites host may have flown the coop and you’re left at sea about how to proceed. I know I was.

On trying to create a new My Site (for a user that doesn’t already have one), typical error messages will tell you that Self-Service Site Creation is disabled, or that there was an error in creating your personal site. Both error messages will entreat you to contact your administrator.

Here’s the full scoop on creating a My Site host in MOSS 2007 by hand from the ground up (i.e. at Web application creation on up):

Prepare new web application to take up My Site host duties:

  1. Create a new web application (e.g. http://mossdev1:25000/)
  2. Inspect Managed Paths for the new web application. You should already have:
    1. (root) - Explicit inclusion
    2. sites - Wildcard inclusion
  3. Delete managed paths:
    1. sites - Wildcard inclusion
  4. Create managed paths:
    1. personal - Wildcard inclusion
    2. mysite - Explicit inclusion
  5. End state for managed paths should be:
    1. (root) - Explicit inclusion (thanks to imsaurabh for catching this!)
    2. personal - Wildcard inclusion
    3. mysite - Explicit inclusion
  6. Create a site collection at /mysite/ managed path. This will use a My Site Host template:
    1. Choose correct web application (e.g. http://mossdev1:25000/)
    2. Title: My Site Host (doesn’t matter, really)
    3. URL: http://mossdev1:25000/mysite (no fill-in because path is explicit in managed paths)
    4. Template: Enterprise (tab) -> My Site Host
    5. Specify primary and secondary administrators.
    6. Click OK.
  7. Create a blank site collection at the / managed path to enable self-service site creation:
    1. Choose correct web application (e.g. http://mossdev1:25000/)
    2. Title: Blank site (doesn’t matter, really)
    3. URL: http://mossdev1:25000/ (no fill-in because path is explicit in managed paths)
    4. Template: Collaboration (tab) -> Blank Site
    5. Specify primary and secondary administrators.
    6. Click OK.
  8. Enable Self-Service Site Creation. Choose Self-service site management from Application Management -> Application Security.

Now that you’ve created the host, here’s how to make sure it works properly in the SSP’s My Site Settings:

  1. Navigate to My Site Settings (go to your SSP’s admin pages, it’s the 3rd link in the 1st section).
    1. For form’s sake, inspect the Preferred Search Center entry. This URL should end in /SearchCenter/Pages/.
    2. Set Personal Site Services to http://mossdev1:25000/mysite/. Note that this points to the URL for the explicit inclusion path and My Site Host Template site collection you created above.
    3. Set Personal Site Location to just personal. Note that this points to the URL (after SharePoint puts context to it) for the Wildcard inclusion managed path you created above.
    4. Choose the 2nd Site Naming Format: User name (resolve conflicts by using domain_username).
    5. Enable Allow user to choose the language of their personal site.
    6. Disable My Site to support global deployments.
    7. Default Reader Site Group: NT AUTHORITY\authenticated users.

Now try to navigate to your MySite link and you should be golden. Creation should go just fine.
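
If the explicit versus wildcard distinction is still fuzzy, here is a toy model of how I think about it. To be clear, this is my own simplification for illustration, not how SharePoint actually resolves URLs internally, and it only handles single-segment managed paths like the ones above. An explicit inclusion hosts exactly one site collection at the path itself; a wildcard inclusion hosts one site collection per name directly below the path.

```python
from typing import List, Optional, Tuple

# The end state from the managed paths list above ("" stands for the root).
MANAGED_PATHS: List[Tuple[str, str]] = [
    ("", "explicit"),
    ("personal", "wildcard"),
    ("mysite", "explicit"),
]


def site_collection_for(url_path: str,
                        managed_paths: List[Tuple[str, str]]) -> Optional[str]:
    """Return the site collection path that would own url_path, if any."""
    segments = [s for s in url_path.strip("/").split("/") if s]

    if segments:
        for prefix, kind in managed_paths:
            if prefix and segments[0].lower() == prefix:
                if kind == "explicit":
                    # One site collection, living at the path itself.
                    return "/" + prefix
                if len(segments) >= 2:
                    # Wildcard: one site collection per child name.
                    return "/" + prefix + "/" + segments[1]
                # The wildcard path itself hosts no site collection.
                return None

    # Everything else belongs to the root, if the root is included.
    if ("", "explicit") in managed_paths:
        return "/"
    return None


if __name__ == "__main__":
    for path in ("/mysite/default.aspx", "/personal/jsmith/default.aspx",
                 "/default.aspx", "/personal"):
        print(path, "->", site_collection_for(path, MANAGED_PATHS))
```

In this picture the My Site Host lives at /mysite, each user's personal site gets its own site collection under /personal, and the blank site sits at the root, which matches the three site locations configured above.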

Good luck!

Links

I’m doing research today to answer multiple questions.

Questions:

  1. I recreated my MOSS 2007 SSP a while back and now MySite creations aren’t working for anyone. It’s not working exactly like the discussion at Technet Forums. Still researching this one.
  2. On a related note, I need to nail down exactly how to create personalized MySite services scoped to a particular web application on the MOSS 2007 farm. No articles yet.
  3. How do we federate/rollup content (if possible, what’s best practice?) from multiple sites? Client has security policies that require internet/extranet servers/farms be separate from intranet servers/farms, but they also have a requirement (it’s thankfully more of a “nice to have”, since even they are not sure it’s possible) to make it so that a single user doesn’t have to go to multiple sites to see all of their content, especially their personalized content. I’m aware that this is possible in many ways in SharePoint, but not sure if any implementation is ideal. The first really helpful link I’ve found along these lines of thinking is Joel Oleson’s blog entry about managing Global and Multifarm deployments. Another good one from Mr. Oleson. I’m reading it right now.
  4. I have to do some research on the best ways to integrate outside LDAP, AD and custom-schema organizational directories for user information into MOSS 2007. No links there yet.
  5. I need to get on the stick and do Workflows in VS 2005 against WSS 3.0/MOSS 2007. I did try SharePoint Designer for my needs and while it does address most of them, one thing I couldn’t figure out how to do was to make a workflow that publishes documents across sites (up, down, sideways, between sites, subsites and unrelated sites). All I could figure out how to do was publish from one document library to another in the same site. There are other options:
    1. Major/Minor versions in Document Libraries: Probably the most elegant of the solutions, since it’s already built-in to SharePoint 2007, the major down-side is that this may be too complicated a new feature for users to learn given that check in/check out is already foreign to them (unless they’re developers). I know the pat answer is learn, but honestly that doesn’t cut any ice with client-focused business analysts. They have a point. The “learn” answer just offloads the effort on another group: either training or support. Not everyone is as technically focused as implementors are. Not everyone wants to learn a new feature every version upgrade just to do their jobs right.
    2. The Send-To->Other Location option on document libraries’ documents works just fine with Firefox 2.0 but barfs completely with IE7. See my discussion of it on the Microsoft Newsgroups (I think you’ll need a passport identity – alternate link via Google Groups) for more information. It’s possible I’ll call MS support about this, but only if the client says it’s critical path and means it. It’s too risky to burn a support call on a bug. I wish MS really provided other meaningful ways of reporting bugs.
  6. I also need to find out whether the helpful Weather and other free, useful, fuzzy good feelings web parts exist any more, like they do in SPS 2003. Weather’s a big request these days. If they’re a download/install I need to do that. No research here yet either.
  7. Finally, I asserted to a friend/co-worker a few days ago that from a programmer’s perspective, I can’t see why Perfmon would, as his manager asserted, bring a server to its knees. Given that in the programming I’ve done that does create Perfmon counter objects, I never check to see if any monitors are running, I just throw the stats over the wall for the OS to do with it as it will. The guy’s job would be made so much simpler if his manager relaxed about this, and I simply don’t have the resources myself to do the exhaustive system profiling and performance monitoring this might take to convince anyone. So maybe someone else has. No research here yet.

So what do you think? Do I have enough to do?

Installing .NET 3.0 on a balky workstation

So if you keep trying to install .NET 3.0 and keep getting an error during install (and the Error Log link in the failure message refers to the Windows Communication Foundation being missing), go look for a file named something like “dd_dotnetfx3install.txt” in your %temp% directory. In my case, that was C:\Documents and Settings\[username]\Local Settings\Temp, but just open a command shell and type “echo %temp%” to find out what it is for your system/login. If THAT log shows something like:

[02/02/07,12:55:41] WapUI: ***ERRORLOG EVENT*** : DepCheck indicates Windows Communication Foundation is not installed.
[02/02/07,12:55:41] WapUI: Return for Windows Presentation Foundation indicates a successful installation. DepCheck indicates the component is installed.
[02/02/07,12:55:41] WapUI: Return for Windows Workflow Foundation indicates a successful installation. DepCheck indicates the component is installed.

Then you need to restart your workstation in Diagnostic mode with minimal services and then try the install. It should work.
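
The install log is long, so it helps to pull out just the lines that matter. A minimal sketch follows, assuming the only thing you care about is the ***ERRORLOG EVENT*** marker seen in the excerpt above. The function name is mine, and I am not making any claim about the log's encoding or its other line formats; the function just takes text you have already read in.

```python
from typing import List

MARKER = "***ERRORLOG EVENT***"

# Excerpt copied from the log shown above.
SAMPLE_LOG = """\
[02/02/07,12:55:41] WapUI: ***ERRORLOG EVENT*** : DepCheck indicates Windows Communication Foundation is not installed.
[02/02/07,12:55:41] WapUI: Return for Windows Presentation Foundation indicates a successful installation. DepCheck indicates the component is installed.
[02/02/07,12:55:41] WapUI: Return for Windows Workflow Foundation indicates a successful installation. DepCheck indicates the component is installed.
"""


def errorlog_events(log_text: str) -> List[str]:
    """Return the message part of every line carrying the error marker."""
    events = []
    for line in log_text.splitlines():
        if MARKER in line:
            # Keep what follows the marker, minus the " : " separator.
            message = line.split(MARKER, 1)[1].lstrip(" :").strip()
            events.append(message)
    return events


if __name__ == "__main__":
    for message in errorlog_events(SAMPLE_LOG):
        print(message)
```

If the only event that comes back is the Windows Communication Foundation one, you are in the same boat I was.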

Details are here in an obscure Microsoft forums post. The method I used, transcribed for your pleasure is as follows:

  1. Set your system to start in diagnostic mode with some additional services:
    1. Click Start, then Run..., then type msconfig. Click OK.
    2. Click the radio button to start your machine in Diagnostic Startup.
    3. Click the Services tab and enable the checkbox to the left of “Windows Installer”, “Plug and Play” and “System Restore”.
    4. Click OK. Authorize your system to restart.
  2. In Diagnostic Mode, do the .NET Framework 3.0 install:
    1. When your system restarts, click OK to acknowledge the msconfig popup warning and set the msconfig window aside (or close it and call it back later to set your system to boot normally).
    2. Now install the .NET 3.0 Framework. Install should go off without a hitch.
  3. Put your system back into Normal startup:
    1. When the installation is complete, recall or get back the msconfig window and set your workstation to Normal startup. Click OK.
    2. Authorize your system to restart.

You’re done!

Oy! – Inaccuracy in order of steps in a MS support document screws up the process

This almost sucked. Hard.

The Microsoft Technet article I linked to previously about moving the WSS Content Database in WSS 2003 is pretty good! Except for the last section, under Moving the Databases, subsection “Set the content database in Windows SharePoint Services”.

Unfortunately, the article says you should, for each Virtual Server, remove/disconnect each content database, THEN add the new content database. This is ass-backwards, and SharePoint (SP2, at least) won’t let you do it. It will only work if you add the new content database first and THEN remove the old database.

I thought my goose was cooked until we tried that, then we were okay.

In my dream world, Microsoft and all other vendors will actually QA their documentation as thoroughly as they say they QA their code.

Miscellaneous research points from this morning

Getting MOSS 2007 RTM Backup to work when your SQL Database is one you don’t control

In this post I will be somewhat generic about my machine names and account names. This is partly to keep me in form and not thinking too specifically about my special situation and partly for security reasons. I don’t believe in security through obscurity but I also don’t believe in making it extra easy for an attacker to get the first steps of the puzzle for free.

Configuration of my MOSS 2007 RTM install in development:

  • 1 Server that serves all databases to all development environments (Call it SQLDEV1).
    • I have a specific instance (Call it MOSD) in which I can put all my various MOSS databases and my domain user account (Call it DOMAIN\myuser) is a security admin and dbcreator in the Server.
    • Since I used my domain user account (DOMAIN\myuser) to give all the various configuration services and app pools and databases an ID during initial setup/configuration of MOSS 2007, I (DOMAIN\myuser) am also dbo on all the MOSS databases in the instance.
    • Additionally, it should be noted that on SQLDEV1, the MSSQLSERVER service is running as one domain user account (Call it DOMAIN\dbuser1), and the instance (MOSD) is running as another (Call it DOMAIN\dbuser2).
  • 1 Server that has all the MOSS 2007/.NET 2.0/.NET 3.0 WFX installs (Call it MOSSDEV1).
    • On this machine, my user account (DOMAIN\myuser) is a local administrator. I also have some additional rights that allow me to log into the machine remotely with a terminal services client.
    • It was a precondition to my getting these rights that I set up the MOSS 2007 install so that all configuration aspects that required local admin access be set up with my domain account’s ID (DOMAIN\myuser). I know this isn’t ideal in some respects, but it seems to work okay in this development environment. This will be the main factor in making this blog entry not be entirely helpful in debugging YOUR configuration problem, but I hope it’ll help anyway.

With no additional special steps, I couldn’t get MOSS 2007’s Operations interface to successfully back up the Farm (via the Central Administration: http://your_server:your_port/_admin/Backup.aspx). I kept getting some progress, but each and every actual site collection failed with a failure message ending in “Operating system error 5(Access is denied.). BACKUP DATABASE is terminating abnormally.” [It should be noted that I was previously getting an error 3, which appeared to be tied to using a non-UNC path in the Backup Location field.]

The steps necessary for me to get it all working were:

  1. Create a shared folder on the system for where you want the backups to go. (i.e. \\MOSSDEV1\Farm Backups\).
  2. Add full control permissions to this share for all three accounts: The service account for MSSQLSERVER on the database server (DOMAIN\dbuser1), the service account for the database instance hosting the content/configuration databases (DOMAIN\dbuser2) and the service account running the MOSS 2007 application pool (my guess of the most likely suspect in this operation – DOMAIN\myuser).
  3. Use the share name you created in the Backup Location text box in the 2nd step of the Backup configuration in Central Administration (i.e. \\MOSSDEV1\Farm Backups\)

I had been using the Admin’s traditional UNC path (i.e. \\MOSSDEV1\d$\Farm Backups\) without explicitly sharing a folder, but that tripped me up, because the SQL Server service account IDs were not Local Admins on the MOSS 2007 box (i.e. MOSSDEV1).
Also it took a bit of digging and asking my DBAs about the service account identities for the SQL Server.
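
To keep myself from falling into the admin-share trap again, a quick sanity check on the Backup Location could look something like the sketch below. The helper names are my own, and the check is purely about the shape of the path: it flags the built-in administrative shares (a drive letter followed by $, or ADMIN$), which are normally only reachable by administrators on the target box. It cannot tell you whether the SQL service accounts actually have permissions on an explicit share.

```python
import re
from typing import NamedTuple


class UncPath(NamedTuple):
    server: str
    share: str
    rest: str


def parse_unc(path: str) -> UncPath:
    """Split a UNC path into server, share and the remaining folders."""
    if not path.startswith("\\\\"):
        raise ValueError("not a UNC path: " + path)
    parts = [p for p in path[2:].split("\\") if p]
    if len(parts) < 2:
        raise ValueError("UNC path needs a server and a share: " + path)
    return UncPath(parts[0], parts[1], "\\".join(parts[2:]))


def looks_like_admin_share(path: str) -> bool:
    """True for drive-letter shares such as d$ and for ADMIN$."""
    share = parse_unc(path).share
    return bool(re.fullmatch(r"[A-Za-z]\$", share)) or share.upper() == "ADMIN$"


if __name__ == "__main__":
    # The two paths discussed above, written as raw strings.
    for candidate in (r"\\MOSSDEV1\d$\Farm Backups", r"\\MOSSDEV1\Farm Backups"):
        print(candidate, "->", looks_like_admin_share(candidate))
```

The first path is the one that tripped me up; the second is the explicitly shared folder that worked once all three accounts had full control on it.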

My main reference is about SQL Server Backup in the Microsoft KB #255235.