“Unknown Error” and OOTB SharePoint 2007 builtin Workflows

If you are playing with Approval workflows, and you find that your workflows are erroring out even when you think it should have completed successfully, make sure you aren’t in a situation where you’re updating the approval status without having the approval functionality of your document or workflow library enabled.

It’s all built-in, but it doesn’t all automatically enable itself on need.

I ultimately found my answer on SharePoint Blogs, but my discovery route was circuitous.

First, the Unknown Error statuses on my workflow status page for the workflow. These workflows would error out and require termination from the workflow status page so they’d stop contributing to my active workflow counts. Searching around, I found out (look at Eilene Hao’s response to Misha) that you can get more info about these by going to where your Diagnostic Logs are kept. Find out where those logs are kept by going to SharePoint Central Administration, Operations page/tab, then under the Logging and Reporting section, click the link to Diagnostic Logging. On the next page, you’ll find out where those logs are under Trace Log.

Go to that directory in your front end web server(s) and use a decent directory search/grep (I used Textpad I have installed in portable mode on a USB key) to find log files with the word “workflow”. Upon doing that, I found the string/error, “System.ArgumentNullException: Value cannot be null. Parameter name: name”. Googling eventually led me back to the SharePoint Blogs post above.

So the fix is that if you’re going to use an ootb (out of the box) builtin workflow that updates the approval status of an item, you should also enable the “Require content approval for submitted items?” option in Versioning Settings for the list settings. This will do a lot of things automatically for you:

  • When a list item is changed, the approval status for the item gets automatically changed back to “Pending”.
  • When a person with the permission level to approve items looks at the list, they get a special view option called “Approve/reject Items”. (So they can bypass the approval workflow)
  • The workflow that updates approval status stops erroring out.

It’s all pretty cool, but you have to know how it all hooks together.

Kerberos SSPI/PAC errors and NetLogon errors 5719 and 5783 and Login Failure Audits – Oh my!

We appear to be having a bunch of Kerberos errors in our SQL clusters that represent 2-30 minutes of downtime at a stretch.

A recent network change seemed to help a lot, but is unfortunately not technically supposed to affect our issues at all. Still we wait and see if the errors happen again.

The outages have been happening randomly for the past 5 days (starting last Thursday, ending yesterday – Monday), probably 4 – 6 incidents a day, for the aforementioned 2 – 30 minutes at a stretch.

Samples of the associated events in the Event Logs:

Bunches of failure audits alternating with Kerberos SSPI Handshake errors in the Application Logs of the SQL Cluster (clustered via Microsoft Clustering):
Kerberos SSPI:

Event Type: Error
Event Source: MSSQLSERVER
Event Category: (4)
Event ID: 17806
Date: 9/29/2008
Time: 8:17:52 AM
User: N/A
Computer:
Description:
SSPI handshake failed with error code 0x80090311 while establishing a connection with integrated security; the connection has been closed. [CLIENT: xxx.xxx.xxx.xxx]

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

The alternate failure audits:

Event Type: Failure Audit
Event Source: MSSQLSERVER
Event Category: (4)
Event ID: 18452
Date: 9/29/2008
Time: 8:17:52 AM
User: N/A
Computer:
Description:
Login failed for user ”. The user is not associated with a trusted SQL Server connection. [CLIENT: xxx.xxx.xxx.xxx]

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

In the Systems log, we commonly see one Netlogon (Event ID 5719) and one Kerberos event (Event ID 7), and one rare Netlogon (Event ID 5783) event that are concurrent with the Application Log events:

Netlogon Event ID 5719:

Event Type: Error
Event Source: NETLOGON
Event Category: None
Event ID: 5719
Date: 9/29/2008
Time: 8:15:59 AM
User: N/A
Computer:
Description:
This computer was not able to set up a secure session with a domain controller in domain due to the following:
There are currently no logon servers available to service the logon request.
This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.

ADDITIONAL INFO
If this computer is a domain controller for the specified domain, it sets up the secure session to the primary domain controller emulator in the specified domain. Otherwise, this computer sets up the secure session to any domain controller in the specified domain.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 5e 00 00 c0 ^..À

Kerberos Event ID 7:

Event Type: Error
Event Source: Kerberos
Event Category: None
Event ID: 7
Date: 9/29/2008
Time: 8:15:59 AM
User: N/A
Computer:
Description:
The kerberos subsystem encountered a PAC verification failure. This indicates that the PAC from the client in realm had a PAC which failed to verify or was modified. Contact your system administrator.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 5e 00 00 c0 ^..À

The rare NetLogon Event ID 5783:

Event Type: Error
Event Source: NETLOGON
Event Category: None
Event ID: 5783
Date: 9/27/2008
Time: 1:00:56 PM
User: N/A
Computer:
Description:
The session setup to the Windows NT or Windows 2000 Domain Controller \\ for the domain is not responsive. The current RPC call from Netlogon on \\ to \\ has been cancelled.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

We are still watching the situation but suspect that high network latency may be the issue. Other possibilities are a stale DNS cache somewhere and possibly a Veritas Diskeeper 2007 bad install/upgrade. I plan to escalate the general troubleshooting questions to our Microsoft Technical Account Manager today or tomorrow.

Installing .NET 3.0 on a balky workstation

So if you keep trying to install .NET 3.0 and keep getting an error during install (and looking at the Error Log link in the failure message refers to the Windows Communication Framework being missing), go look for a file named something like “dd_dotnetfx3install.txt” in your %temp% directory (in my case, that was at C:\Documents and Settings\[username]\Local Settings\Temp, but just open a command shell and type “echo %temp%” to find out what it is for your system/login). If THAT log shows something like:

[02/02/07,12:55:41] WapUI: ***ERRORLOG EVENT*** : DepCheck indicates Windows Communication Foundation is not installed.
[02/02/07,12:55:41] WapUI: Return for Windows Presentation Foundation indicates a successful installation. DepCheck indicates the component is installed.
[02/02/07,12:55:41] WapUI: Return for Windows Workflow Foundation indicates a successful installation. DepCheck indicates the component is installed.

Then you need to restart your workstation in Diagnostic mode with minimal services and then try the install. It should work.

Details are here in an obscure Microsoft forums post. The method I used, transcribed for your pleasure is as follows:

  1. Set your system to start in diagnostic mode with some additional services:
    1. Click Start, then Run..., then type msconfig. Click OK.
    2. Click the radio button to start your machine in Diagnostic Startup.
    3. Click the Services tab and enable the checkbox to the left of “Windows Installer“, “Plug and Play” and “System Restore“.
    4. Click OK. Authorize your system to restart.
  2. In Diagnostic Mode, do the .NET Framework 3.0 install:
    1. When your system restarts, click OK to acknowledge the msconfig popup warning and set the msconfig window aside (or close it and call it back later to set you system to boot normally).
    2. Now install the .NET 3.0 Framework. Install should go off without a hitch.
  3. Put your system back into Normal startup:
    1. When the installation is complete, recall or get back the msconfig window and set your workstation to Normal startup. Click OK.
    2. Authorize your system to restart.

You’re done!

Getting MOSS 2007 RTM Backup to work when your SQL Database is one you don’t control

In this post I will be somewhat generic about my machine names and account names. This is partly to keep me in form and not thinking too specifically about my special situation and partly for security reasons. I don’t believe in security through obscurity but I also don’t believe in making it extra easy for an attacker to get the first steps of the puzzle for free.

Configuration of my MOSS 2007 RTM install in development:

  • 1 Server that serves all databases to all development environments (Call it SQLDEV1).
    • I have a specific instance (Call it MOSD) in which I can put all my various MOSS databases and my domain user account (Call it DOMAIN\myuser) is a security admin and dbcreator in the Server.
    • Since I used my domain user account (DOMAIN\myuser) to give all the various configuration services and app pools and datbases an ID during initial setup/configuration of MOSS 2007, I (DOMAIN\myuser) am also dbo on all the MOSS databases in the instance.
    • Additionally, it should be noted that on SQLDEV1, the MSSQLSERVER service is running as one domain user account (Call it DOMAIN\dbuser1), and the instance (MOSD) is running as another (Call it DOMAIN\dbuser2).
  • 1 Server that has all the MOSS 2007/.NET 2.0/.NET 3.0 WFX installs (Call it MOSSDEV1).
    • On this machine, my user account (DOMAIN\myuser) is a local administrator. I also have some additional rights that allow me to log into the machine remotely with a terminal services client.
    • It was a precondition to my getting these rights that I set up the MOSS 2007 install so that all configuration aspects that required local admin access be set up with my domain account’s ID (DOMAIN\myuser). I know this isn’t ideal in some respects, but it seems to work okay in this development environment. This will be the main factor in making this blog entry not be entirely helpful in debugging YOUR configuration problem, but I hope it’ll help anyway.

With no additional special steps, I couldn’t get MOSS 2007’s Operations interface to successfully back up the Farm (via the Central Administration: http://your_server:your_port/_admin/Backup.aspx). I kept getting some progress, but each and every actual site collection failed with a failure message ending in “Operating system error 5(Access is denied.). BACKUP DATABASE is terminating abnormally.” [It should be noted that I was previously getting an error 3, which appeared to be tied to using a non-UNC path in the Backup Location field.]

The steps necessary for me to get it all working were:

  1. Create a shared folder on the system for where you want the backups to go. (i.e. \\MOSSDEV1\Farm Backups\).
  2. Add full control permissions to this share for all three accounts: The service account for MSSQLSERVER on the database server (DOMAIN\dbuser1), the service account for the database instance hosting the content/configuration databases (DOMAIN\dbuser2) and the service account running the MOSS 2007 application pool (my guess of the most likely suspect in this operation – DOMAIN\myuser).
  3. Use the share name you created in the Backup Location text box in the 2nd step of the Backup configuration in Central Administration (i.e. \\MOSSDEV1\Farm Backups\)

I had been using the Admin’s traditional UNC path (i.e. \\MOSSDEV1\d$\Farm Backups\) without explicitly sharing a folder, but that tripped me up, because the SQL Server service account IDs were not Local Admins on the MOSS 2007 box (i.e. MOSSDEV1).
Also it took a bit of digging and asking my DBAs about the service account identities for the SQL Server.

My main reference is about SQL Server Backup in the Microsoft KB #255235.