DCDIAG.EXE Problem On Windows 2008(R2): VerifyEnterpriseReferences indicates problem “Missing Expected Value” & points to Knowledge Base Article: Q312862

I was preparing to replace some 5 year old DELL PE1850 servers running Active Directory with new DELL R610 servers when the DCDIAG.exe output showed a possible issue with SYSVOL FRS and some missing expected value.

Starting test: VerifyEnterpriseReferences

The following problems were found while verifying various important DN

references.  Note, that  these problems can be reported because of

latency in replication.  So follow up to resolve the following

problems, only if the same problem is reported on all DCs for a given

domain or if  the problem persists after replication has hadreasonable time to replicate changes.

[1] Problem: Missing Expected Value

Base Object: CN=DC1,OU=CITY,OU=Domain Controllers,DC=corp,DC=com

Base Object Description: "DC Account Object"

Value Object Attribute Name: msDFSR-ComputerReferenceBL

Value Object Description: "SYSVOL FRS Member Object"

Recommended Action: See Knowledge Base Article: Q312862

The log points to a knowledge base article at  but that has no relevance here.This is a phantom error when found under following circumstances. It occurs on Windows 2008 or Windows 2008 R2 when you are running in Windows 2008 or Windows 2008 R2 domain functional level. Since Windows 2008 the File Replication Service (FRS) that sysvol uses has been replaced with the  Distributed File Replication service (DFRS) as used by DFS. If you’re not yet running DFRS when you can (which is highly recommend  http://blogs.technet.com/b/askds/archive/2010/04/22/the-case-for-migrating-sysvol-to-dfsr.aspx but not required), you’ll see this error show up when running DCDIAG.exe, so no real issue at all.

There are lots of posts on the internet pointing to various possible issues or causes: http://social.technet.microsoft.com/Forums/en-US/winserverDS/thread/2ce07c3f-9956-4bec-ae46-055f311c5d96/  & http://social.technet.microsoft.com/Forums/en-IE/winserverDS/thread/3062d40a-b73e-42ea-b27a-e817ee29abc1. But before you worry to much I suggest you check that everything that has to do with replication is running well. Is so and you’re running in Windows 2008 or Windows 2008 R2 domain functional level you’ll see this error go way once you complete your migration to DFRS.

So, to recapture, if you have a well maintained & working Active Directory, do not panic when you see some warning or failures in diagnostic test results. Make sure things are indeed fine and if you conclude that you don’t have any lingering problems, do some further research on what the real reason might . This pahnatom error is a fine example of this.

There is an absolute brilliant step by step guide to get the move from FRS to DFRS completed without a problem in a series by the storage team at Microsoft . You can find the first of a 5 part blog series over here http://blogs.technet.com/b/filecab/archive/2008/02/08/sysvol-migration-series-part-1-introduction-to-the-sysvol-migration-process.aspx.

While you are at it. If your still running DFS in Windows 2000 native mode, you might want to upgrade that as well. More on that later Smile

Exchange 2007 & 2010 Event ID’s: 2601, 2604, 2501 & Users Can’t Access Mailboxes / Public Folders On My Day Off

I took the day off as I needed some time to deal with government administration. Good thing this is a blog about IT issues because holey crap what a time eating, confusing and rather pointless mess government administration can be. The process to get to the desired outcome is very tedious, prone to misunderstanding & pretty inefficient . What the entire duration of the process and the number of administrative entities involved contribute to the desired result is a mystery. It’s pure show and window dressing. But OK, we took the day of to finally get it all sorted after 5 months of patiently waiting for this day.

So I sleep until 08:00, get up and head for the kitchen for a jar of coffee. With the only Java I truly like in my hand I make my way to the home office. I check mails/alerts from System Center, Support Requests etc. I’m like a responsible guy dude, even when I need a day off. I do monitor the condition of my projects in production and I do step in when needed and document my findings. It keeps me honest when I design and sell my solutions. Beware of some architects that are not the ones having to deal with the crap architectures they design, they are often empty suits. Anyway, I see an issue that could be a warning of more to come. Someone has a problem with Outlook 2007 which reports the following error (translation from Dutch):

“Unable to expand the folder. The Microsoft Exchange Server computer is not available. Either there are network problems or the Microsoft Exchange Server computer is down for maintenance.(/o=<DOMAIN>/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Configuration/cn=servers/cn=<dagmember1>)”

Now I know that user. Smart, diligent and reliable. That user even provides the relevant and necessary information in their support request. Yes they do exist and HRM should hire those exclusively. So in combination with that error we knew we did not have an PEBKAC or ID-10T on our hands but a real issue.

I quickly check that DAG member node Outlook of that user is trying to connect to but I know that due to maintenance their mailboxes currently reside on another member of the DAG. So i could very well be just the public folders. Bingo. A quick test reveals this to be the case. Also the Windows 2008 R2 server and Exchange 2010 itself are running perfectly fine, happy as can be, except on that one node we see the Application Event Log messages:

Log Name:      Application
Source:        MSExchange ADAccess
Date:          8/19/2010 7:12:43 AM
Event ID:      2601
Task Category: General
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      dagmember1.company.blog
Description:
Process MSEXCHANGEADTOPOLOGY (PID=1620). When initializing a remote procedure call (RPC) to the Microsoft Exchange Active Directory Topology service, Exchange could not retrieve the SID for account <WKGUID=XXXXXXXXXXNOREALIDXXXXXXXXXXXXXX,CN=Microsoft Exchange,CN=Services,CN=Configuration,…> – Error code=8007077f. The Microsoft Exchange Active Directory Topology service will continue starting with limited permissions.

Log Name:      Application
Source:        MSExchange ADAccess
Date:          8/19/2010 7:12:43 AM
Event ID:      2604
Task Category: General
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      dagmember1.company.demo
Description:
Process MSEXCHANGEADTOPOLOGY (PID=1620). When updating security for a remote procedure call (RPC) access for the Microsoft Exchange Active Directory Topology service, Exchange could not retrieve the security descriptor for Exchange server object DAGMEMBER1 – Error code=8007077f. The Microsoft Exchange Active Directory Topology service will continue starting with limited permissions.

Log Name:      Application
Source:        MSExchange ADAccess
Date:          8/19/2010 7:12:43 AM
Event ID:      2501
Task Category: General
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      dagmember1.company.blog
Description:
Process MSEXCHANGEADTOPOLOGY (PID=1620). The site monitor API was unable to verify the site name for this Exchange computer – Call=DsctxGetContext Error code=8007077f. Make sure that Exchange server is correctly registered on the DNS server.

I think I’m OK when I see the possible cause. Why? Because I also know even if that probable cause isn’t the problem, it’s a hiccup I’ve seen before and I know how to fix its one. When you search those errors you can find a TechNet article describing a possible cause: “An inactive network connection is first on the binding list” http://technet.microsoft.com/en-us/library/dd789571(EXCHG.80).aspx. The fix is quite simple. Correct the NIC order and restart the MSExchange ADTopology Service. I had my scare about Active Directory and DNS horrors the first time I ever saw this one. So no gut wrenching panic here 🙂

But why do servers ever get in to this state when the NIC ordering is just fine? We did some firmware and upgrade recently after hours but that didn’t affect the NIC binding order. Now I’m pretty weird at times but I still know what I’m doing. Those NIC where OK when I configured those servers. Checking that has become a second nature on multi homed and clustered servers. I also remember happening this to me once before somewhere in February 2010 with another setup of Exchange 2010 on Windows 2008 R2. And in that case the NIC order in the binding list was also OK. I checked back then as well just to make sure. But since I build those Exchange 2010 setups myself I just know they are close to godliness both in design and implementation :-). Back then the issue went away by restarting the server, restarting the MSExchange ADTopology Service will do however, and the problem never came back. For some reason the AD Site information query fails. Now Windows retries and is OK after a while. Exchange, tries to get the AD Site information once, fails and keeps thinking there is an issue. With as a result clients have no connectivity and those errors that initially make you think you could have DNS issues, AD problems etc. But fortunately it’s a lot less serious.

So when the NIC binding order is OK why does this happen? I can’t tell you for sure but I do know that I’m not the only one (not that weird after all) since Microsoft published KB Article “MSExchange ADAccess Event ID’s 2601, 2604, 2501” http://support.microsoft.com/kb/2025528 . This article is a so called FAST PUBLISH from Microsoft Support and states that the issue only occurs on Windows 2008 R2 and that it affect Exchange 2007 and Exchange 2010. The cause? Well this is where they provide only what I already knew:

“During a restart of the server, the operating system queries Active Directory to get its AD Site information.  On a Windows 2008 R2 server, this will sometimes fail.  As the Exchange services are starting, it also will do a query for its AD Site and that too will fail. Windows will continue to try and determine its AD Site name and will eventually succeed.  However, Exchange does not re-try the query and the above errors are logged in the application log every 15 minutes.”

And yes the workaround/fix is also nothing new:

“After the server has been up for a minute or two, run NLTest /DSGetSite to verify that that the proper Active Directory Site is being returned by Windows.  Once that has been verified, restart the MSExchange ADTopology Service.”

Do note that this will also restart a slew of dependant Exchange services so it takes a little while.

  • Microsoft Exchange Transport Log Search
  • Microsoft Exchange Transport Log
  • Microsoft Exchange Service Host
  • Microsoft Exchange Search Indexer
  • Microsoft Exchange Replication Service
  • Microsoft Exchange Mail Submission
  • Microsoft Exchange Mailbox Assistants
  • Microsoft Exchange File Distribution
  • Microsoft Exchange EdgeSync
  • Microsoft Exchange Anti-spam Update

So after some manual intervention we had the users back in business. And all is well for them, as they rise and sleep under the watchful eye of a bunch of good IT Pro’s who’ll protect them form further harm and problems 😉 Now I need to get an auto fix for this I think until Microsoft fixes this one for good. SCOM where are you? No, no, no … It’s my day off for getting that administration done!

Windows 2008 R2 SP1 Beta Install Gone Wrong: Service Pack Installation failed with error code 0x800f0818

Today I was in the lab installing Windows 2088 R2 SP1 beta on the nodes of a test Hyper-V Live Migration Cluster. It went pretty well and quick on all nodes except for one. I got “An unknown error has occurred” and the details said Service Pack Installation failed with error code 0x800f0818. A quick search on the internet didn’t provide any applicable results.  Bummer. I really need all nodes on SP1 Beta. The CBS log didn’t reveal much either but the tool Microsoft advices to use when clicking on the link to get more information about the error helped out. The link sends you to http://windows.microsoft.com/en-gb/windows7/troubleshoot-problems-installing-service-pack and explains things to check, which is apart from anti virus tools and such inconsistencies in the Windows Servicing Store . It also points you towards the System Update Readiness Tool and provides a link http://windows.microsoft.com/en-GB/windows7/What-is-the-System-Update-Readiness-Tool. Click on the link to get more information on the use of it. Download the appropriate versions (in our case Windows 2008 R2 x64) and install the tool. This will check for any issues and repair them if possible. After that you can try to install SP1 beta again. But this didn’t work. What now? Well that tool creates a log named produced a log named CheckSUR.log  in the folder C:WindowsLogsCBS. This is something that is documented in http://support.microsoft.com/?kbid=947821 a KB titled “Description of the System Update Readiness Tool for Windows Vista, for Windows Server 2008, for Windows 7, and for Windows Server 2008 R2”

So we went to look for the log and yes it was there.

In that log file at C:WindowsLogsCBSCheckSUR.log I found the following warning:

=================================
Checking System Update Readiness.
Binary Version 6.1.7600.20667
Package Version 8.0
2010-07-17 12:40

Checking Windows Servicing Packages

Checking Package Manifests and Catalogs
(f)    CBS MUM Corrupt    0x00000000    servicingPackagesMicrosoft-Windows-FileServices-BPA-Package-MiniLP~31bf3856ad364e35~amd64~en-US~7.1.7600.16422.mum        Expected file name Microsoft-Windows-Rights-Management-Services~31bf3856ad364e35~amd64~~6.1.7600.16385.mum does not match the actual file name

Checking Package Watchlist

Checking Component Watchlist

Checking Packages

Checking Component Store

Summary:
Seconds executed: 371
Found 1 errors
CBS MUM Corrupt Total count: 1

Unavailable repair files:
servicingpackagesMicrosoft-Windows-FileServices-BPA-Package-MiniLP~31bf3856ad364e35~amd64~en-US~7.1.7600.16422.mum
servicingpackagesMicrosoft-Windows-FileServices-BPA-Package-MiniLP~31bf3856ad364e35~amd64~en-US~7.1.7600.16422.cat

Ah well we’ve dealt with issues like this before with Vista and Windows 2008 when files in the C:Windowswinsxs folder get corrupted. No sweat, especially since this is a lab and I have other servers available. The trick is to copy these from another Windows 2008 R2 server (those where a more recent version than the ones on the problematic server). Now to be able to do this you might need to take ownership of the folder and grant yourself full control so you can overwrite the files.

The default permissions on the Package folder.

An example of the default permissions of a file in the Package folder.

Afterwards you give ownership back to the original owner of the folder and files and take away your permissions to restore the original state of the server. The Local Administrator group is the default owner and the Trusted Installer is the one with Full Control permissions, so make sure that’s back in order.

But the main thing is using the System Update Readiness Tool and checking the log file resulted in me being able to find the root cause of the SP1 beta install issue and fix it. It’s a typical issue your see once with a service pack install. The solution is a bit convoluted and you need a good second machine to borrow the files from but in the end it’s not very complicated to fix.

So if you run into some issues during the Windows 2008 R2 SP1 Beta installation you know what to try so you to can enjoy testing Windows 2008 R2 SP1 just like me 🙂