Exchange 2007 & 2010 Event ID’s: 2601, 2604, 2501 & Users Can’t Access Mailboxes / Public Folders On My Day Off

I took the day off as I needed some time to deal with government administration. Good thing this is a blog about IT issues, because holy crap, what a time-eating, confusing and rather pointless mess government administration can be. The process to get to the desired outcome is very tedious, prone to misunderstanding & pretty inefficient. What the entire duration of the process and the number of administrative entities involved contribute to the desired result is a mystery. It’s pure show and window dressing. But OK, we took the day off to finally get it all sorted after 5 months of patiently waiting for this day.

So I sleep until 08:00, get up and head for the kitchen for a jar of coffee. With the only Java I truly like in my hand I make my way to the home office. I check mails/alerts from System Center, Support Requests etc. I’m like a responsible guy dude, even when I need a day off. I do monitor the condition of my projects in production and I do step in when needed and document my findings. It keeps me honest when I design and sell my solutions. Beware of some architects that are not the ones having to deal with the crap architectures they design, they are often empty suits. Anyway, I see an issue that could be a warning of more to come. Someone has a problem with Outlook 2007 which reports the following error (translation from Dutch):

“Unable to expand the folder. The Microsoft Exchange Server computer is not available. Either there are network problems or the Microsoft Exchange Server computer is down for maintenance.(/o=<DOMAIN>/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Configuration/cn=servers/cn=<dagmember1>)”

Now I know that user. Smart, diligent and reliable. That user even provides the relevant and necessary information in their support request. Yes, they do exist and HRM should hire those exclusively. So in combination with that error we knew we did not have a PEBKAC or ID-10T on our hands but a real issue.

I quickly check the DAG member node that this user’s Outlook is trying to connect to, but I know that due to maintenance their mailboxes currently reside on another member of the DAG. So it could very well be just the public folders. Bingo. A quick test reveals this to be the case. Also, the Windows 2008 R2 server and Exchange 2010 itself are running perfectly fine, happy as can be, except that on that one node we see these Application Event Log messages:

Log Name:      Application
Source:        MSExchange ADAccess
Date:          8/19/2010 7:12:43 AM
Event ID:      2601
Task Category: General
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      dagmember1.company.blog
Description:
Process MSEXCHANGEADTOPOLOGY (PID=1620). When initializing a remote procedure call (RPC) to the Microsoft Exchange Active Directory Topology service, Exchange could not retrieve the SID for account <WKGUID=XXXXXXXXXXNOREALIDXXXXXXXXXXXXXX,CN=Microsoft Exchange,CN=Services,CN=Configuration,…> – Error code=8007077f. The Microsoft Exchange Active Directory Topology service will continue starting with limited permissions.

Log Name:      Application
Source:        MSExchange ADAccess
Date:          8/19/2010 7:12:43 AM
Event ID:      2604
Task Category: General
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      dagmember1.company.blog
Description:
Process MSEXCHANGEADTOPOLOGY (PID=1620). When updating security for a remote procedure call (RPC) access for the Microsoft Exchange Active Directory Topology service, Exchange could not retrieve the security descriptor for Exchange server object DAGMEMBER1 – Error code=8007077f. The Microsoft Exchange Active Directory Topology service will continue starting with limited permissions.

Log Name:      Application
Source:        MSExchange ADAccess
Date:          8/19/2010 7:12:43 AM
Event ID:      2501
Task Category: General
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      dagmember1.company.blog
Description:
Process MSEXCHANGEADTOPOLOGY (PID=1620). The site monitor API was unable to verify the site name for this Exchange computer – Call=DsctxGetContext Error code=8007077f. Make sure that Exchange server is correctly registered on the DNS server.

I think I’m OK when I see the possible cause. Why? Because I also know that even if that probable cause isn’t the problem, it’s a hiccup I’ve seen before and I know how to fix this one. When you search those errors you can find a TechNet article describing a possible cause: “An inactive network connection is first on the binding list” http://technet.microsoft.com/en-us/library/dd789571(EXCHG.80).aspx. The fix is quite simple. Correct the NIC order and restart the MSExchange ADTopology Service. I had my scare about Active Directory and DNS horrors the first time I ever saw this one. So no gut-wrenching panic here 🙂

But why do servers ever get into this state when the NIC ordering is just fine? We did some firmware and other upgrades recently after hours, but that didn’t affect the NIC binding order. Now I’m pretty weird at times but I still know what I’m doing. Those NICs were OK when I configured those servers. Checking that has become second nature on multihomed and clustered servers. I also remember this happening to me once before, somewhere in February 2010, with another setup of Exchange 2010 on Windows 2008 R2. And in that case the NIC order in the binding list was also OK. I checked back then as well just to make sure. But since I build those Exchange 2010 setups myself I just know they are close to godliness both in design and implementation :-). Back then the issue went away by restarting the server (restarting the MSExchange ADTopology Service will do, however) and the problem never came back. For some reason the AD Site information query fails. Windows retries and is OK after a while. Exchange tries to get the AD Site information once, fails and keeps thinking there is an issue. As a result clients have no connectivity and you get those errors that initially make you think you could have DNS issues, AD problems etc. But fortunately it’s a lot less serious.
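The error code in those three events actually points in the same direction. 8007077f is an HRESULT, and its low 16 bits hold a plain Win32 error number: 0x77F, or 1919, which Windows defines as ERROR_NO_SITENAME. Here is a minimal sketch of that arithmetic in Python. The function name is mine, purely for illustration.

```python
from typing import Tuple


def decode_hresult(value: int) -> Tuple[int, int, int]:
    """Split a 32-bit HRESULT into (severity, facility, code)."""
    severity = (value >> 31) & 0x1     # 1 means the call failed
    facility = (value >> 16) & 0x7FF   # 7 is FACILITY_WIN32
    code = value & 0xFFFF              # the wrapped Win32 error number
    return severity, facility, code


# The code logged by events 2601, 2604 and 2501.
severity, facility, code = decode_hresult(0x8007077F)
print(severity, facility, code)  # prints: 1 7 1919
```

So it is not a generic DNS or AD meltdown being reported, just "no site name available for this machine" at the moment Exchange asked.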

So when the NIC binding order is OK, why does this happen? I can’t tell you for sure, but I do know that I’m not the only one (not that weird after all) since Microsoft published the KB article “MSExchange ADAccess Event ID’s 2601, 2604, 2501” http://support.microsoft.com/kb/2025528 . This article is a so-called FAST PUBLISH from Microsoft Support and states that the issue only occurs on Windows 2008 R2 and that it affects Exchange 2007 and Exchange 2010. The cause? Well, this is where they provide only what I already knew:

“During a restart of the server, the operating system queries Active Directory to get its AD Site information.  On a Windows 2008 R2 server, this will sometimes fail.  As the Exchange services are starting, it also will do a query for its AD Site and that too will fail. Windows will continue to try and determine its AD Site name and will eventually succeed.  However, Exchange does not re-try the query and the above errors are logged in the application log every 15 minutes.”

And yes the workaround/fix is also nothing new:

“After the server has been up for a minute or two, run NLTest /DSGetSite to verify that that the proper Active Directory Site is being returned by Windows.  Once that has been verified, restart the MSExchange ADTopology Service.”

Do note that this will also restart a slew of dependent Exchange services, so it takes a little while.

  • Microsoft Exchange Transport Log Search
  • Microsoft Exchange Transport Log
  • Microsoft Exchange Service Host
  • Microsoft Exchange Search Indexer
  • Microsoft Exchange Replication Service
  • Microsoft Exchange Mail Submission
  • Microsoft Exchange Mailbox Assistants
  • Microsoft Exchange File Distribution
  • Microsoft Exchange EdgeSync
  • Microsoft Exchange Anti-spam Update
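As a starting point for automating that workaround, here is a rough Python sketch of the two steps: check that NLTest returns a site, and only then restart the topology service. Treat it as a sketch under assumptions, not a tested fix. I am assuming that a successful nltest /dsgetsite prints the site name on the first line followed by "The command completed successfully", that the service short name is MSExchangeADTopology, and that PowerShell's Restart-Service with -Force takes care of the dependent services. The function names are made up for this example.

```python
import subprocess

# Assumed last line of a successful `nltest /dsgetsite` run.
SUCCESS_MARKER = "The command completed successfully"


def parse_site_name(nltest_output):
    """Return the AD site name from nltest output, or None if the query failed."""
    lines = [line.strip() for line in nltest_output.splitlines() if line.strip()]
    if len(lines) >= 2 and lines[-1].startswith(SUCCESS_MARKER):
        return lines[0]
    return None


def restart_command():
    """Command line that restarts the topology service and its dependents."""
    return ["powershell.exe", "-NoProfile", "-Command",
            "Restart-Service -Name MSExchangeADTopology -Force"]


def fix_if_site_resolves(run=subprocess.run):
    """Restart the service only when Windows itself already knows its site."""
    result = run(["nltest", "/dsgetsite"], capture_output=True, text=True)
    site = parse_site_name(result.stdout)
    if site is None:
        return None  # Windows has no site yet, restarting Exchange would not help
    run(restart_command(), check=True)
    return site
```

Calling fix_if_site_resolves() on the affected server (elevated) would return the site name after a restart, or None when Windows has not found its site yet. The run parameter is only there so the logic can be exercised without touching a real server.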

So after some manual intervention we had the users back in business. And all is well for them, as they rise and sleep under the watchful eye of a bunch of good IT Pro’s who’ll protect them from further harm and problems 😉 Now I need to get an auto fix for this, I think, until Microsoft fixes this one for good. SCOM, where are you? No, no, no … It’s my day off for getting that administration done!

Exchange 2007-2010 Public Folders Issues “The Active Directory user wasn’t found.”

I was working on an Exchange 2007 to Exchange 2010 project when we ran into trouble creating our first public folder database on an Exchange 2010 server. Mind you, this was just creating the database. We did not even set up replication for this database yet. All mailboxes still resided in Exchange 2007 databases pointing to an Exchange 2007 public folder. Very soon after creating the database we got notified users could no longer send mails to mail enabled public folders. The exact error was this:

554 5.6.0 STOREDRV.Deliver.Exception:ObjectNotFoundException; Failed to process message due to a permanent exception with message The Active Directory user wasn’t found.

Also, browsing the public folders in Outlook was slow and the application froze/hung. These issues were fixed very fast by getting rid of the still unused public folder database altogether. Now we could commence our search for the root cause. The error seemed related to the issue described in “Public Folder Replication Fails Due To Empty Legacy Administrative Group”, which can be found @ http://msexchangeteam.com/archive/2010/05/05/454821.aspx. The blog describes this error during replication:

Log Name: Application

Source: MSExchange Store Driver

Event ID: 1020

Level: Error

Description:

The store driver couldn’t deliver the public folder replication message “Hierarchy ([email protected])” because the following error occurred: The Active Directory user wasn’t found.

But apart from replication not working there were other, more severe issues impacting end users, who can all still be on Exchange 2007: the Outlook clients hanging and mail enabled folders no longer being available. Dave Stork blogged about this in http://blogs.dirteam.com/blogs/davestork/archive/2010/03/16/mail-enabled-public-folder-recipient-not-found.aspx

Now the first mentions of the replication issue were reported back in November 2009 (see http://get-exchange.blogspot.com/2009/11/public-folder-mayhem-exchange-2010.html) but it still hasn’t been fixed. For the moment that fix is planned to be included in E2K10 RU5. Currently we’re at RU3, so that might well be August 2010.

The workaround described in the above mentioned blog posts works & is effective immediately. Now they described the issue and the fix very well, but I’ll add two tips.

Tip 1

“Practical End User Friendly Detection” of this issue can be done using exfolders.exe. You can read more about this tool here: “Exchange, meet ExFolders” (http://msexchangeteam.com/archive/2009/12/04/453399.aspx). The error only occurs when you create a public folder database on Exchange 2010 and can be very annoying for the users, so I’ll share this tip with you. Download the tool here http://msexchangeteam.com/files/12/attachments/entry453398.aspx and install it on an Exchange 2010 server in the bin directory (follow the readme.txt and don’t forget to merge the .reg file or the tool will crash). Run exfolders.exe and connect to any Exchange 2007 public folder. When you get this error …

---------------------------
ExFolders
---------------------------
An error occurred while trying to establish a connection to the Exchange server. Exception: The Active Directory user wasn’t found.
---------------------------
OK
---------------------------

… you know you are affected. Deleting the empty Servers containers from ALL legacy Administrative Groups fixes the error. You can then connect successfully to an Exchange 2007 public folder with exfolders.exe. This is a cool way to test for the issue and to check whether the fix works, as you don’t need to create a public folder database and possibly hinder your users.

Tip 2

Also note that you need to delete (using ADSIEDIT) every empty Servers container out of every legacy Administrative Group, not just the one in the “First Administrative Group”. Don’t worry if you renamed that one to something more descriptive, that doesn’t matter at all. All the Servers containers in the legacy Administrative Groups should be empty if you have no more E2K3 servers left in your Exchange organization. Feel free to leave comments on your experiences.
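If you have many legacy Administrative Groups, it can help to list the candidates before you start clicking around in ADSIEDIT. The Python sketch below works on a plain list of distinguished names, however you exported them, and reports every CN=Servers container under CN=Administrative Groups that has no child objects. It deletes nothing. The function names and the example organization are hypothetical, and the suffix matching is naive (it does not handle escaped commas in names).

```python
def is_child_of(dn, parent_dn):
    """True when dn sits anywhere below parent_dn (naive suffix match)."""
    return dn.lower().endswith("," + parent_dn.lower())


def empty_servers_containers(all_dns):
    """Return the Servers containers under Administrative Groups without children."""
    servers = [
        dn for dn in all_dns
        if dn.lower().startswith("cn=servers,cn=")
        and ",cn=administrative groups," in dn.lower()
    ]
    return [
        container for container in servers
        if not any(is_child_of(dn, container) for dn in all_dns)
    ]


# Hypothetical export from the configuration partition.
base = ("CN=Administrative Groups,CN=Example Org,CN=Microsoft Exchange,"
        "CN=Services,CN=Configuration,DC=example,DC=com")
dns = [
    "CN=First Administrative Group," + base,
    "CN=Servers,CN=First Administrative Group," + base,
    "CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT)," + base,
    "CN=DAGMEMBER1,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT)," + base,
]
for dn in empty_servers_containers(dns):
    print(dn)
```

With the example data only the Servers container of the First Administrative Group is reported, since the Exchange 2007/2010 administrative group still holds a server object.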


DELL PE1850 Domain Controller Upgrade to Windows 2008 R2

UPDATE May 20th 2010: DELL changed the location of the PDFs. I updated the links to them in this blog post as I see lots of people clicking on them. Hope this helps.

Just in case someone needs to do something similar, I’m posting some of the issues I needed to resolve when I did an Active Directory upgrade for a partner’s IT department (September/October 2009). There was a need for Windows 2008 R2 DNS (for Direct Access), the Active Directory Recycle Bin, as well as the desire to have as many servers as possible running the same OS to keep management easy. This is not an extensive manual of any tool used, but it will give you some pointers.

The hardware in use was DELL. Seven PowerEdge 1850 servers spread around the country in corporate HQ and large branch offices. Those servers were 3 years old at the time and had 2 years of support remaining on the contract. A hardware replacement or virtualization was not an option. So we needed to find out if the upgrade was possible. The good news was we could get our hands on a spare PE1850 for testing and, if need be, a "swing type" migration. But to reduce the work an in-place upgrade was preferred. The original installation of these domain controllers was x64, otherwise that would have been a no go, as W2K8R2 is x64 only. It was quite a smart and forward looking chap who did the original project. OK, I admit it was me, so this is blatant self-promotion. The anti-virus had W2K8R2 support; we got an updated agent for the UPS from the vendor, etc. It all looked pretty good.

I checked the following support documents on the Dell website:

Microsoft® Windows Server® 2008 R2 for Dell™ PowerEdge™ Systems

Important Information Guide

https://support.dell.com/support/edocs/software/win2008/WS08_R2/en/IIG/WS08R2IG.pdf

Microsoft® Windows Server® 2008 R2 for Dell™ PowerEdge™ Systems

Installing Microsoft Windows Server 2008 R2

http://support.dell.com/support/edocs/software/win2008/WS08_R2/en/ING/WS08R2In.pdf

There I found that all drivers and firmware updates needed to support W2K8R2 on a PE1850 were available except for one, and that was the driver for the RAID controller, a PERC 4e/Si. That was a potential show stopper, but a driver was coming. So I kept a close eye on the DELL FTP site and around 3 September it showed up. Using the SUU 6.1 DVD or manually downloaded installation packages I upgraded the firmware of the servers (BIOS, DRAC).

I also ran the upgrade advisor and found that we needed to remove the Dell Open Manage Server Assistant version, as the aging Dell® OpenManage Diagnostic Service used an unsigned driver (C:\Program Files (x86)\Dell\SysMgt\oldiags\packages\PORTACCESSOR64.sys). All potentially problematic software, unneeded tools and drivers like video, anti-virus, UPS … were removed as well, as this makes any upgrade process less risky.

No Native RAID Driver

As the DELL PERC 4e/Si is not natively supported by Windows 2008 R2 we need to use the DELL driver (R227150.exe) from the FTP site. You could put the drivers in a subfolder $WinPEDriver$ on the root of a volume that Windows can find during the upgrade (hard disk, USB thumb drive…). Now to make absolutely sure we didn’t have any issues with the RAID controller we decided to inject the driver into the WIM files to build a custom ISO. That might be redundant, but we wanted to have an ISO with all needed drivers for disaster recovery purposes anyway. The drivers need to be injected into the boot.wim and the install.wim files using DISM (Deployment Image Servicing and Management from the WAIK for Windows 7 and W2K8R2), see The Windows® Automated Installation Kit (AIK) for Windows® 7 @ http://www.microsoft.com/downloads/details.aspx?familyID=f1bae135-4190-4d7c-b193-19123141edaa&displaylang=en.

We used the x64 versions of the tools as our WIM files are x64.

The following commands inject the driver into the boot wim file’s two indexes:

DISM /MOUNT-WIM /WIMFILE:D:\InjectDriver\boot.wim /INDEX:1 /MOUNTDIR:D:\MOUNTHERE

DISM /IMAGE:D:\MOUNTHERE /ADD-DRIVER /DRIVER:D:\DRIVERS\DELL\R227150

DISM /UNMOUNT-WIM /MOUNTDIR:D:\MOUNTHERE /COMMIT

DISM /MOUNT-WIM /WIMFILE:D:\InjectDriver\boot.wim /INDEX:2 /MOUNTDIR:D:\MOUNTHERE

DISM /IMAGE:D:\MOUNTHERE /ADD-DRIVER /DRIVER:D:\DRIVERS\DELL\R227150

DISM /UNMOUNT-WIM /MOUNTDIR:D:\MOUNTHERE /COMMIT

Index 1 is the Microsoft Windows Preinstallation Environment (WinPE) and Index 2 is the actual Windows Setup that you can run when booted into WinPE. DISM has a command to find out more info about the image files: DISM.exe /Get-WimInfo. The documentation in the WAIK is quite good. Read it!

The following commands inject the driver into the install wim file. You need to do that for any index you want or need (Web, Standard, Enterprise, core and full install …) Just paste everything you need in a cmd file and you’re good to go.

DISM /MOUNT-WIM /WIMFILE:D:\InjectDriver\install.wim /INDEX:1 /MOUNTDIR:D:\MOUNTHERE

DISM /IMAGE:D:\MOUNTHERE /ADD-DRIVER /DRIVER:D:\DRIVERS\DELL\R227150

DISM /UNMOUNT-WIM /MOUNTDIR:D:\MOUNTHERE /COMMIT
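Since the mount, add-driver, unmount cycle is identical for every index, you can also generate the lines for that cmd file instead of pasting them by hand. Below is a small Python sketch that does just that. It only builds the command text and runs nothing. The helper name is hypothetical, the paths assume the folder layout used in the commands above (D:\InjectDriver, D:\MOUNTHERE, D:\DRIVERS\DELL\R227150), and the list of install.wim indexes is an assumption: check yours with DISM /Get-WimInfo first.

```python
def dism_inject_commands(wim_file, indexes, mount_dir, driver_dir):
    """Build the mount / add-driver / unmount-commit lines for each WIM index."""
    commands = []
    for index in indexes:
        commands.append(
            f"DISM /Mount-Wim /WimFile:{wim_file} /Index:{index} /MountDir:{mount_dir}"
        )
        commands.append(f"DISM /Image:{mount_dir} /Add-Driver /Driver:{driver_dir}")
        commands.append(f"DISM /Unmount-Wim /MountDir:{mount_dir} /Commit")
    return commands


MOUNT_DIR = r"D:\MOUNTHERE"
DRIVER_DIR = r"D:\DRIVERS\DELL\R227150"

# boot.wim has two indexes (WinPE and Windows Setup).
lines = dism_inject_commands(r"D:\InjectDriver\boot.wim", [1, 2], MOUNT_DIR, DRIVER_DIR)
# Assumed example: only index 1 of install.wim, add the others you need.
lines += dism_inject_commands(r"D:\InjectDriver\install.wim", [1], MOUNT_DIR, DRIVER_DIR)

print("\n".join(lines))
```

Redirect the output to a .cmd file and review it before you run it from an elevated WAIK command prompt.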

Video Driver Injection Hiccup

As we wanted to have a good screen resolution for the sys admins we also embedded the video drivers. The screen resolution with the native driver wasn’t very good so we looked around to find one that would work. We found the Radeon 7000M driver (ATI_Radeon-7000M_A00_R177829.exe) on the DELL website and also injected it into the boot.wim and install.wim image files using DISM. That way we didn’t need to update the video drivers after installation. Cool. Most video drivers are packed twice. The trick is that you still need to expand the driver files after you have extracted them from the installer using WinZip, 7Zip, WinRAR or whatever it is you use or prefer. Otherwise you’ll get an error like this after DISM has found the inf:

Searching for driver packages to install…
Found 2 driver package(s) to install.
Installing 1 of 2 – D:\DRIVERS\DELL\R177829\Driver\XP6A_INF\CA_58688.inf: Error – An error occurred. The driver package could not be installed. For more information, check for log files in the <windir>\inf folder of the target image.
Installing 2 of 2 – D:\DRIVERS\DELL\R177829\Driver\XP_INF\CX_58688.inf: Error – An error occurred. The driver package could not be installed. For more information, check for log files in the <windir>\inf folder of the target image.

For more information, check for log files in the <windir>\inf folder of the target image.

Error 30

The command completed with errors. For more information, refer to the log file.

The DISM log file can be found at C:\Windows\Logs\DISM\dism.log

If you look in the dism.log you’ll find an error code like 0x8007001E as the cause of the error. This is rather cryptic (0x1E is simply the Win32 error 30 from the output above, in HRESULT form). Anyway, you can prevent this by expanding the files:

Expand D:\DRIVERS\DELL\R177829\Driver\XP6A_INF\B_58469\*.* D:\DriversExpanded

Copy the expanded files into a copy of the original folder structure to replace the original files. Make sure that you repeat this exercise for any subfolder as well if needed, or you’ll expand none or only a portion of the files. When you’ve done that you can add the drivers to the WIM files. Beware that adding those large video drivers can take rather long.

Now that we have added all drivers to the WIM files we write the customized installation to an ISO file (oscdimg.exe, WAIK). We can burn this to a CD to add to the disaster recovery kit or mount the ISO using the DRAC media.

"C:\Program Files\Windows AIK\Tools\amd64\oscdimg.exe" -n -m -bD:\W2K8R2WithDellPerc4esiDriver\W2K8R2\boot\etfsboot.com "D:\W2K8R2WithDellPerc4esiDriver\W2K8R2" "D:\W2K8R2WithDellPerc4eDiAndRadeon7000M.iso"

Using the custom install we were able to upgrade the domain controllers fast and without any issues. All that was left to do after the upgrade was check if all was well with the DC. After that we had to clean up the post-upgrade artifacts, install anti-virus, UPS software and Dell Management tools, and provide for and schedule the backups.

So all in all we did 7 in-place upgrades and now they have been running in native Windows 2008 Domain & forest functional level for over 4 months without any issues. Nice job. That’s the good thing about that partner, they always have some interesting jobs to do and helping out is always appreciated.