“Unknown Error” and OOTB SharePoint 2007 builtin Workflows

If you are playing with Approval workflows and find that they error out even when you think they should have completed successfully, make sure you aren’t in a situation where you’re updating the approval status without having the approval functionality of your document or workflow library enabled.

It’s all built-in, but it doesn’t all automatically enable itself when needed.

I ultimately found my answer on SharePoint Blogs, but my discovery route was circuitous.

First came the Unknown Error statuses on my workflow status page. These workflows would error out and require termination from the workflow status page so they’d stop contributing to my active workflow counts. Searching around, I found out (look at Eilene Hao’s response to Misha) that you can get more info about these errors from your Diagnostic Logs. To find out where those logs are kept, go to SharePoint Central Administration, open the Operations page/tab, then under the Logging and Reporting section, click the link to Diagnostic Logging. On the next page, the location is listed under Trace Log.

Go to that directory on your front-end web server(s) and use a decent directory search/grep (I used a copy of TextPad installed in portable mode on a USB key) to find log files containing the word “workflow”. Doing that, I found the error string “System.ArgumentNullException: Value cannot be null. Parameter name: name”. Googling it eventually led me back to the SharePoint Blogs post above.
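If you don’t have a grep-like tool handy, the same search is easy to script. This is only a rough sketch, not what I actually ran: the function name, the *.log file pattern and the UTF-8 encoding are all assumptions, so adjust them to match how your trace logs are actually written.

```python
from pathlib import Path


def find_log_hits(log_dir, needle="workflow", pattern="*.log", encoding="utf-8"):
    """Yield (file name, line number, line) for each line containing needle.

    The pattern and encoding defaults are assumptions; change them to match
    the trace log files on your own server.
    """
    needle = needle.lower()
    for path in sorted(Path(log_dir).glob(pattern)):
        # Trace logs can contain stray bytes, so skip anything undecodable.
        with open(path, "r", encoding=encoding, errors="ignore") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path.name, number, line.rstrip()


if __name__ == "__main__":
    # "." is a placeholder; point this at the Trace Log directory instead.
    for name, number, line in find_log_hits("."):
        print(f"{name}:{number}: {line}")
```

The match is case-insensitive, so “Workflow” and “workflow” both turn up.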

So the fix is that if you’re going to use an ootb (out of the box) builtin workflow that updates the approval status of an item, you should also enable the “Require content approval for submitted items?” option under Versioning Settings in the list’s settings. This will do a lot of things automatically for you:

  • When a list item is changed, the approval status for the item gets automatically changed back to “Pending”.
  • When a person with the permission level to approve items looks at the list, they get a special view option called “Approve/reject Items”. (So they can bypass the approval workflow)
  • The workflow that updates approval status stops erroring out.

It’s all pretty cool, but you have to know how it all hooks together.

Kerberos SSPI/PAC errors and NetLogon errors 5719 and 5783 and Login Failure Audits – Oh my!

We appear to be having a bunch of Kerberos errors in our SQL clusters that represent 2-30 minutes of downtime at a stretch.

A recent network change seemed to help a lot, but it is unfortunately not technically supposed to affect our issues at all. Still, we are waiting to see whether the errors happen again.

The outages have been happening randomly for the past 5 days (starting last Thursday, ending yesterday – Monday), probably 4 – 6 incidents a day, for the aforementioned 2 – 30 minutes at a stretch.
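To put rough bounds on what that adds up to, here is a back-of-the-envelope calculation using only the estimates above (5 days, 4 to 6 incidents a day, 2 to 30 minutes each). The function name is made up for illustration.

```python
def downtime_bounds(days, incidents_per_day, minutes_per_incident):
    """Return (lowest, highest) total downtime in minutes from (min, max) estimates."""
    low = days * incidents_per_day[0] * minutes_per_incident[0]
    high = days * incidents_per_day[1] * minutes_per_incident[1]
    return low, high


low, high = downtime_bounds(5, (4, 6), (2, 30))
print(f"{low} to {high} minutes")  # 40 to 900 minutes
```

So somewhere between about 40 minutes and 15 hours of cumulative downtime over the five days, depending on which end of the estimates you believe.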

Samples of the associated events in the Event Logs:

Bunches of failure audits alternating with Kerberos SSPI Handshake errors in the Application Logs of the SQL Cluster (clustered via Microsoft Clustering):
Kerberos SSPI:

Event Type: Error
Event Source: MSSQLSERVER
Event Category: (4)
Event ID: 17806
Date: 9/29/2008
Time: 8:17:52 AM
User: N/A
Computer:
Description:
SSPI handshake failed with error code 0x80090311 while establishing a connection with integrated security; the connection has been closed. [CLIENT: xxx.xxx.xxx.xxx]

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

The alternate failure audits:

Event Type: Failure Audit
Event Source: MSSQLSERVER
Event Category: (4)
Event ID: 18452
Date: 9/29/2008
Time: 8:17:52 AM
User: N/A
Computer:
Description:
Login failed for user ''. The user is not associated with a trusted SQL Server connection. [CLIENT: xxx.xxx.xxx.xxx]

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

In the System log, we commonly see one Netlogon event (Event ID 5719) and one Kerberos event (Event ID 7), and on rare occasions a second Netlogon event (Event ID 5783), all concurrent with the Application Log events:

Netlogon Event ID 5719:

Event Type: Error
Event Source: NETLOGON
Event Category: None
Event ID: 5719
Date: 9/29/2008
Time: 8:15:59 AM
User: N/A
Computer:
Description:
This computer was not able to set up a secure session with a domain controller in domain due to the following:
There are currently no logon servers available to service the logon request.
This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.

ADDITIONAL INFO
If this computer is a domain controller for the specified domain, it sets up the secure session to the primary domain controller emulator in the specified domain. Otherwise, this computer sets up the secure session to any domain controller in the specified domain.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 5e 00 00 c0 ^..À

Kerberos Event ID 7:

Event Type: Error
Event Source: Kerberos
Event Category: None
Event ID: 7
Date: 9/29/2008
Time: 8:15:59 AM
User: N/A
Computer:
Description:
The kerberos subsystem encountered a PAC verification failure. This indicates that the PAC from the client in realm had a PAC which failed to verify or was modified. Contact your system administrator.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 5e 00 00 c0 ^..À
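A side note on the Data bytes in the two events above: they are a 32-bit status code written in little-endian byte order. Here is a small sketch of decoding them. The function name and the tiny lookup table are mine for illustration; the table covers only the two codes that show up in this post, as I understand their standard Windows names.

```python
def decode_event_data(hex_bytes):
    """Convert a little-endian byte dump such as '5e 00 00 c0' to an integer code."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    return int.from_bytes(raw, byteorder="little")


# Only the codes seen in this post; not a complete table.
KNOWN_CODES = {
    0xC000005E: "STATUS_NO_LOGON_SERVERS",
    0x80090311: "SEC_E_NO_AUTHENTICATING_AUTHORITY",
}

code = decode_event_data("5e 00 00 c0")
print(f"0x{code:08X}", KNOWN_CODES.get(code, "unknown"))
# 0xC000005E STATUS_NO_LOGON_SERVERS
```

That agrees with the Netlogon description text (“no logon servers available”), and the SSPI handshake code 0x80090311 from the Application Log points the same way: nothing could be reached to do the authentication.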

The rare NetLogon Event ID 5783:

Event Type: Error
Event Source: NETLOGON
Event Category: None
Event ID: 5783
Date: 9/27/2008
Time: 1:00:56 PM
User: N/A
Computer:
Description:
The session setup to the Windows NT or Windows 2000 Domain Controller \\ for the domain is not responsive. The current RPC call from Netlogon on \\ to \\ has been cancelled.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
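Since the interesting part is which events fire at the same moment, here is a rough sketch of how event text pasted in the format above could be parsed and grouped by timestamp. The function names are hypothetical, and it assumes the plain “Key: Value” layout shown in these samples.

```python
from collections import defaultdict

FIELDS = ("Event Type", "Event Source", "Event ID", "Date", "Time")


def parse_events(text):
    """Parse pasted event-log text into a list of dicts, one per event."""
    events, current = [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in FIELDS:
            continue  # description text, data dumps, blank lines
        if key == "Event Type":  # every event in these dumps starts here
            current = {}
            events.append(current)
        if current is not None:
            current[key] = value.strip()
    return events


def group_by_timestamp(events):
    """Map (date, time) to the list of Event IDs logged at that moment."""
    groups = defaultdict(list)
    for event in events:
        groups[(event.get("Date"), event.get("Time"))].append(event.get("Event ID"))
    return dict(groups)


sample = """Event Type: Error
Event Source: MSSQLSERVER
Event ID: 17806
Date: 9/29/2008
Time: 8:17:52 AM

Event Type: Failure Audit
Event Source: MSSQLSERVER
Event ID: 18452
Date: 9/29/2008
Time: 8:17:52 AM
"""
print(group_by_timestamp(parse_events(sample)))
# {('9/29/2008', '8:17:52 AM'): ['17806', '18452']}
```

Grouping on the exact timestamp is crude. The System log events here land about two minutes before the Application Log ones, so a real correlation would want a time window rather than an exact match.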

We are still watching the situation but suspect that high network latency may be the issue. Other possibilities are a stale DNS cache somewhere or a bad install/upgrade of Veritas Diskeeper 2007. I plan to escalate the general troubleshooting questions to our Microsoft Technical Account Manager today or tomorrow.

SharePoint 2003 Reader Permissions include… Export to Spreadsheet!

I had to find this out for a customer this morning.

It’s true: anyone who can read a list can also export it to a spreadsheet. We may need to “lock this down”, but all I can think of is to use JavaScript to hide the control, which is not really locking it down per se.

Dev VM Farm Notes

Had a similar problem to this one, wherein the SharePoint Timer service appeared to be screwing around with my app pool settings for my SSP web application.

Not entirely sure what caused it (because I wasn’t taking notes – trying to set this one up quickly) but I found the problem while trying to set a trusted location for Excel Services.

The big bold black “Service Unavailable” kept balking me. I kept double-checking passwords (I only use one password in Dev environments), Web Application Policies, the Server Farm Administrators’ group and AD Security Group memberships to no avail.

For me, trying to set a new App Pool and App Pool Identity didn’t in fact work when trying it with IIS Manager. The Timer Service appeared to keep messing me around.

What did finally work was using the Service Accounts settings on the Operations page of SharePoint Central Administration.

Various

NOTE: Apparently I wrote this around early-August, but never posted it, so I posted it today after finding the draft in my WordPress.

– Updated to WordPress 2.3.1. Apparently some security fixes as well as normal feature updates. Requires a DB upgrade, so be sure to hit that admin link to do the DB upgrade.
– Have been tinkering with using GMail IMAP as a spamfilter and integrating with Thunderbird 2.0+. Using forwarding from old POP3 accounts, I now have a couple extra GMail accounts (for keeping some IDs separate), managed to delete about 7 Thunderbird account profiles and now have GMail spam filter working in conjunction with Thunderbird Junk Mail filtering. I wrote it up (roughly) here: http://health.malcolmgin.com/Using_GMail_to_Filter_Spam
– Mostly doing SharePoint 2003 maintenance at work while we also prepare for SharePoint 2007 (finally – my last job started with 2007 in November of 2006). I’m also doing various other system engineer stuff like working with maintenance, licensing, support, etc. for some other interrelated products my team supports. Still learning, though, so I’m OK on the happiness & fulfillment side, and my commute totals about 3 hours less per day, I mostly get every other Friday off, and I have time and a partner with which to hit the gym 3-4 days a week, work permitting.

Alternate Access Mapping 2007 Research Links

We just migrated from 2003 to 2007 this past weekend (Friday, actually), and now the mapping is causing problems for our Mac users, who can’t use UNC hostnames, but have to use full hostnames.

So I’m doing some background research on the issue.

Links:
– http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=521109&SiteID=17
– http://technet2.microsoft.com/Office/en-us/library/be9d31d2-b9cb-4442-bfc6-2adcdbff8fae1033.mspx
– http://blogs.officezealot.com/mauro/archive/2007/03/02/20178.aspx
– http://www.experts-exchange.com/OS/Microsoft_Operating_Systems/Server/MS-SharePoint/Q_22117646.html
– http://msdn2.microsoft.com/en-us/library/ms771995.aspx
– http://groups.google.com/group/microsoft.public.sharepoint.windowsservices/browse_thread/thread/580e9a2b981c0319/acdd966b3ce3ada8?lnk=raot
– http://blogs.msdn.com/sharepoint/archive/2007/03/06/what-every-sharepoint-administrator-needs-to-know-about-alternate-access-mappings-part-1.aspx
– http://blogs.msdn.com/sharepoint/archive/2007/03/19/what-every-sharepoint-administrator-needs-to-know-about-alternate-access-mappings-part-2-of-3.aspx
– http://blogs.msdn.com/sharepoint/archive/2007/04/18/what-every-sharepoint-administrator-needs-to-know-about-alternate-access-mappings-part-3-of-3.aspx
– http://support.microsoft.com/kb/913113
– http://mindsharpblogs.com/Driskell/archive/2007/05/15/1769.aspx
– http://www.codeplex.com/SLK/Thread/View.aspx?ThreadId=9590
– http://blog.henryong.com/2007/01/17/alternate-access-mapping-in-sharepoint/
– http://www.toddklindt.com/blog/Lists/Posts/Post.aspx?ID=18
– http://www.toddklindt.com/blog/Lists/Posts/Post.aspx?ID=39
– http://www.jjfblog.com/2006/12/how-to-change-server-name-post.html
– http://msmvps.com/blogs/obts/archive/2007/03/27/717296.aspx

For 2003:
– http://office.microsoft.com/en-us/sharepointportaladmin/HA011603021033.aspx

Getting Partial and Complete Full Farm backups to restore properly on 2007

This is my research today/this week, until I get it working and properly documented, at least on our configuration tracking wiki.

Links:

Colleagues have alluded to “The GUID Problem”: if you don’t take a site collection’s content database offline before restoring a copy of it to the same farm, you’ll get errors because the GUIDs will match and confuse SharePoint. The recommendation is that if you have older data you wish to restore to a different web application/site collection while keeping your main branch on a prod server, you should instead stand up an entirely separate farm, restore to that farm, use stsadm with the -o export operation to back up the content, and then restore it again on Production.

Or better yet, don’t fiddle with Production!

Anyhow, here are the steps I am going through to get this to work. Again, my situation is that I’m just trying to verify that a full farm backup can have part of its content (one web application’s site collections) restored somewhere else if need be.

  1. Take the full farm backup, either with stsadm -o backup -directory \\UNC\path\ -backupmethod full -url http://mossdev1/ or via the Central Administration UI (left as an exercise for the reader). Note: make sure that the service account running your back-end SQL Server instance has write access to the filesystem/UNC path you provide during the backup setup steps.
  2. Copy the xml files and directory tree generated to a new farm for restore. Share this directory to make sure it has a valid UNC path. Make sure your SQL instance Service Account has full access to the share/UNC path.
  3. Check to make sure there are no failed Job Statuses for Backup/Restore on your target (restore) farm.
  4. Locate the directory where you want to store (or already store) SQL database files. Your SQL admin may already have placed this somewhere else on the server, or it may be the default: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data. Make sure you’ve got proper permissions to write to that directory. (I used a very high-level account, permissions-wise, to request the restore, and made sure that it had write permissions to the directory, but am not entirely sure that’s the correct account to configure; more research is needed here.)
  5. Restore part of the full farm. (I restored only one web application, http://mossdev1:44444/, and one Content Database, WSS_Content_Database.)
  6. Because the security model on the target server was completely different from the source server’s, I went into SQL 2005 Management Studio, connected to the proper instance, found the Content Database (to make sure it was properly restored), opened the AllDocs table to make sure data was in there, and then edited the SQL instance’s logins so that the Farm Service Account had proper rights to access the Content Database. (I gave it dbo rights on the database, but you can probably get away with User rights that specifically grant Connect within the database in question.)
  7. Because the host name of the Web Application I restored is different from the VPC’s hostname, make sure that IIS recognizes the proper host header, and that you have DNS or hosts file entries that map to the proper IP address. Go into the IIS Manager and right click the Virtual Server that corresponds to the Web Application you just restored. Click the Advanced button in the Web Site Identification control group on the Web Site tab. In the pop-up box, click the entry for your port, click the Edit button, and add the proper host header value(s) to the Host Header Value text box. If you are changing DNS records, be sure to create an A and a PTR record. If using hosts files, just edit the hosts file in C:\WINDOWS\system32\drivers\etc\.
  8. Because the different security model probably kept the Content Database from being properly attached to your Web Application, go back and attach it manually. Go to Central Administration and choose Application Management. Then choose the Content databases link under the SharePoint Web Application Management section. Click the Add a content database link in the title bar. On the next screen, specify the database server/instance and database name; you can probably leave the other fields at their defaults (unless your organization specifies other settings).
  9. Because of the different security model, you’ll also need to add your current login account to the Web Application’s policy. Do that now. From Central Administration’s Application Management area, choose Policy for Web Application (Under the Application Security section). Click Add Users, then make sure you’ve got the right web application and click Next. Specify the user(s) you wish to have Site collection administration, choose Full Control and click Finish.
  10. Now try out your site restore by going to the URL it should be at. If you’re not sure about that, check out your Site Collection List under Application Management for that Web Application.
  11. If you have any other issues, you may be on your own, because honestly that was enough problems to surmount for me today! 🙂
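For the hosts file part of step 7, it’s easy to fat-finger an entry, so here is a quick sketch for checking what a hostname maps to in hosts-file text. The function name is made up and the IP address in the sample is hypothetical; only the mossdev1 name comes from my setup above.

```python
def hosts_lookup(hosts_text, hostname):
    """Return the IP address mapped to hostname in hosts-file text, or None."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        address, *names = line.split()
        if wanted in (name.lower() for name in names):
            return address
    return None


# Sample content; the 192.168.1.50 address is a made-up placeholder.
sample = """
# comment line
127.0.0.1    localhost
192.168.1.50 mossdev1    # restored web application
"""
print(hosts_lookup(sample, "mossdev1"))  # 192.168.1.50
```

To check a real machine, read C:\WINDOWS\system32\drivers\etc\hosts into a string and pass that in instead of the sample.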

SharePoint Developer Environments (Esp. 2007)

About a year and a half ago, our petition at my current work site to create/use developer environments for SharePoint (then 2003) was punted, mostly because the requirements for dev environments are almost completely diametrically opposed to the various security and systems policies at this site. The détente we finally reached was that we could run entirely self-contained VPCs, but only if they ran on our (my consulting company’s) hardware and were built from developer-licensed software licensed to my company.

Anyhow, now that 2007’s out, deployed at my current work site, and people are taking seriously the task of converting from MS CMS 2002 to MS SharePoint 2007, among other things, the need for developer environments cannot be denied, and the VPCs my company is creating for such an environment are too resource-intensive to run on our old laptops, so something’s got to give.

As such, I’m doing a lot of research about it to justify/position the arguments for such environments. In general I will have at least skimmed these links, but may not have read them thoroughly. These link dump posts are really something I do to keep track of my current research and provide pointers to those who might need them.

Basic assumptions:

  • You want one server context (IIS/SharePoint) per developer for the initial code-writing steps
    This is for non-shared resources, like web parts, maybe features, etc. Something that a single developer could conceivably do in a short chunk of time in the project.
    The reason is primarily that when you attach VS 2005 to a dll/process during interactive debugging, the step-through operations lock up that application pool for anyone else sharing the same execution context.
  • Your basic virtual machine for this kind of dev work is going to be running Windows Server 2003, Windows SharePoint Services 3.0 and Visual Studio 2005, at minimum.
    In our situation we’ll probably need the whole shooting match, because unapproved SQL servers are not allowed on the open net, and they’re unapproved if they’re not solely administered by our DBA group. Similar injunctions apply to domain controllers and unapproved DC/AD activity, as well as simply running Windows Server 2003 on a non-server box.
  • If you are allowed to run the server type applications on the VPC and then use your host operating system (i.e. the Windows XP or whatever you’re running VPC in) to run the client developer tools against the services on the VPC, the actual recommendation is to do that instead.
    So you’d have a standard VPC image that has Server 2003, SQL Server (unless you’re sharing that as a network resource hosted somewhere else – but be sure you have the permissions you need to noodle around in it if it is a shared resource), AD (same cautions apply to this as SQL Server), and WSS3.0/MOSS 2007. You’d run this in a host operating system where you’d run all your client development tools. You’d still have one execution environment per developer, but the performance might be a little better.

Anyhow, here’s a list of relevant blog entries/MSDN/Technet articles:

Some links also for software architecture/development approaches:

And some links to relevant tools:

I’ll add more links as I ID them.

Send to -> Other Location sort of works!

(Woops, had to replace the images after pixelizing out all the identifying URL and ID information from the screenshots.)

(Woops 2, apparently I have now broken the image attachments here, so maybe I have to do it all again, sigh. Will do so later – Short version is that IE7 seems to be able to copy like this only across web applications, whereas Firefox 2.0 and IE6 seem only to be able to copy from subsite to parent site (I haven’t tried the other direction yet). Issue is now active with Microsoft support. The SharePoint support team have escalated the issue to the IE team, but I have yet to hear from the IE team.)

I have an open Microsoft Support case for this, and will update when I get more information.

In a document library in MOSS2007/WSS3.0, if you use the drop-down menu on the document, you can choose “Send To” -> “Other Location”. In the following dialogue box, you can select another location to send this document to. It seems to work in both IE7 and Firefox 2.0 if you are publishing from one site collection to another, but not if you are publishing, say, from a locked down subsite to a parent site that’s more public. In that case, the Send works fine in Firefox 2.0 but not in IE7. In IE7, you get prompted for permissions no matter who you are (or at least I do, being admin up and down the tree of sites, subsites, server farms, on the machine, using the account that runs the whole farm, etc.).

So here’s the typical functioning copy process, for a copy to other location between two site collections on the same Web Application (aka IIS’s virtual server, and specifically in this case, the same port: 80).

Start the copy process:
01PX - Dropdown to Copy to Other Location in IE7 - Cross Site Collection

Specify parameters:
02PX - Initial Copy settings screen - Cross-Site Collection Copy

Confirm copy:
03PX - Copy Progress popup with confirmation (OK) button - Cross-Site-Collection Copy

Review success message:
04PX - Copy Progress Successful/Done - Cross-Site-Collection Copy

Confirm that the copy succeeded in destination document library:
05PX - Confirmation of Copy - Cross-Site-Collection Copy

In contrast, here’s the same operation NOT working in IE7 while copying a file from a locked down subsite to the subsite’s parent (and top-level) site.

Start the operation normally:
06PX - Drop down command - Copy from Subsite

Set the parameters for the copy:
07PX - Copy Screen - Copy from Subsite

The Copy Progress confirmation popup window:
08PX - Copy Confirmation popup - Copy from Subsite

Get prompted for access (click cancel after trying lots of different possibilities):
09PX - Login Prompt - Copy from Subsite

Get returned to the Copy Progress popup with failures:
10PX - Copy Progress failures - Copy from Subsite

And here, I demonstrate that Firefox 2.0 has no problem with the same operation IE7 just failed at:

Initial authentication (Because it’s Firefox):
11PX - Firefox 2.0 - Initial Authentication - From Subsite

Start the copy via the dropdown box (normal):
12PX - Firefox 2.0 - Start the Copy - From Subsite

Normal copy parameters:
13PX - Firefox 2.0 - Copy Parameters - From Subsite

No popup window for Copy Progress – you get a nice website screen instead. Same deal, though:
14PX - Firefox 2.0 - Copy Confirmation - From Subsite

Copy Progress Screen reports finished:
15PX - Firefox 2.0 - Finish status from the Copy Progress - From Subsite

Confirm the copy worked – go to the destination Document Library:
16PX - Firefox 2.0 - Confirmation that document is where it should be - From Subsite