Fix Windows Server 2012 R2 DFSR Event ID 4614

Recently had a ticket come in where a newly created Domain with two DC’s was not replicating properly. Upon logging into the DC’s I noted the following log entry in the DFS Replication Log


The DFS Replication service initialized SYSVOL at local path S:\SYSVOL\domain and is waiting to perform initial replication. The replicated folder will remain in the initial synchronization state until it has replicated with its partner . If the server was in the process of being promoted to a domain controller, the domain controller will not advertize and function as a domain controller until this issue is resolved. This can occur if the specified partner is also in the initial synchronization state, or if sharing violations are encountered on this server or the synchronization partner. If this event occurred during the migration of SYSVOL from File Replication service (FRS) to DFS Replication, changes will not replicate out until this issue is resolved. This can cause the SYSVOL folder on this server to become out of sync with other domain controllers.

Additional Information:
Replicated Folder Name: SYSVOL Share
Replicated Folder ID: C0037E37-EF20-4CDF-968A-932E669ED810
Replication Group Name: Domain System Volume
Replication Group ID: 35F23B97-543A-4310-A08E-5F28D6342C18
Member ID: 41E8F809-5114-48DC-8297-B7E866502101
Read-Only: 0

Working backwards through the log I found this entry


The DFS Replication service encountered an error communicating with partner AD-02 for replication group Domain System Volume.

Partner DNS address: AD-02.contoso.com

Optional data if available:
Partner WINS Address: AD-02
Partner IP Address: 192.168.1.4

The service will retry the connection periodically.

Additional Information:
Error: 1726 (The remote procedure call failed.)
Connection ID: 3E337B81-109F-4A97-880C-63E30F52E63F
Replication Group ID: 35F23B97-543A-4310-A08E-5F28D6342C18

I didn’t see that error on AD-02 but I did see several alerts like this one leading up to when the above alert fired on AD-01.


The DNS server is waiting for Active Directory Domain Services (AD DS) to signal that the initial synchronization of the directory has been completed. The DNS server service cannot start until the initial synchronization is complete because critical DNS data might not yet be replicated onto this domain controller. If events in the AD DS event log indicate that there is a problem with DNS name resolution, consider adding the IP address of another DNS server for this domain to the DNS server list in the Internet Protocol properties of this computer. This event will be logged every two minutes until AD DS has signaled that the initial synchronization has successfully completed.

As both DC’s are also DNS servers I configured the binding to be the statically set IP address. Which should once and for all rule out any sort of weird resolution issues. This had no affect either. There is not a lot else to go on from either server around the date where I think things went sideways. There is an odd 15 day gap in the DFSR Log, but maybe that’s normal? I don’t really see any indication of a problem in the Directory Services, System or Application logs around that time.

I then installed the Active Directory Replication Status Tool (https://www.microsoft.com/en-us/download/details.aspx?id=30005), while it was interesting to run there were no issues reported. The next thing I ran was the health report from the DFS Management console, which told me more or less what I already knew, DFS was waiting on initial synchronization.

So the next thing to try was to figure out how to force that synchronization to happen. I found the following article (https://support.microsoft.com/en-us/kb/2218556) on Microsoft’s Support site. I ran through the Authoritative Synchronization steps and after going through that the servers were happy. I was able to drop a test file in SYSVOL and see it replicate to the other server, delete that file on the server and see it drop off the other server.

While I don’t really know what the root cause for this issue was, hopefully the next time it crops up I’ll be able to figure it out a lot more quickly. As a side note to the steps I followed below I would tell you that I restarted the DFSR service after making changes in ADSI, each time.

Prior to running dfsrdiag, you may need to install that feature

Add-WindowsFeature -Name RSAT-DFS-Mgmt-Con

In order to force Active Director Replication throughout the domain you will run this command


repadmin /syncall /APed

You want to force the non-authoritative synchronization of SYSVOL on a domain controller. In the File Replication Service (FRS), this was controlled through the D2 and D4 data values for the Burflags registry values, but these values do not exist for the Distributed File System Replication (DFSR) service. You cannot use the DFS Management snap-in (Dfsmgmt.msc) or the Dfsradmin.exe command-line tool to achieve this. Unlike custom DFSR replicated folders, SYSVOL is intentionally protected from any editing through its management interfaces to prevent accidents.

How to perform a non-authoritative synchronization of DFSR-replicated SYSVOL (like “D2” for FRS)

  1. In the ADSIEDIT.MSC tool modify the following distinguished name (DN) value and attribute on each of the domain controllers that you want to make non-authoritative:
    1. CN=SYSVOL Subscription,CN=Domain System Volume,CN=DFSR-LocalSettings,CN=servername,OU=Domain Controllers,DC=domainname
    2. msDFSR-Enabled=FALSE
  2. Force Active Directory replication throughout the domain.
  3. Run the following command from an elevated command prompt on the same servers that you set as non-authoritative:
    1. DFSRDIAG POLLAD
  4. You will see Event ID 4114 in the DFSR event log indicating SYSVOL is no longer being replicated.
  5. On the same DN from Step 1, set:
    1. msDFSR-Enabled=TRUE
  6. Force Active Directory replication throughout the domain.
  7. Run the following command from an elevated command prompt on the same servers that you set as non-authoritative:
    1. DFSRDIAG POLLAD
  8. You will see Event ID 4614 and 4604 in the DFSR event log indicating SYSVOL has been initialized. That domain controller has now done a “D2” of SYSVOL.

How to perform an authoritative synchronization of DFSR-replicated SYSVOL (like “D4” for FRS)

  1. Stop the DFSR service on all domain controllers
  2. In the ADSIEDIT.MSC tool, modify the following DN and two attributes on the domain controller you want to make authoritative (preferrably the PDC Emulator, which is usually the most up to date for SYSVOL contents):
    1. CN=SYSVOL Subscription,CN=Domain System Volume,CN=DFSR-LocalSettings,CN=servername,OU=Domain Controllers,DC=domainname
    2. msDFSR-Enabled=FALSE
    3. msDFSR-options=1
  3. Modify the following DN and single attribute on all other domain controllers in that domain:
    1. CN=SYSVOL Subscription,CN=Domain System Volume,CN=DFSR-LocalSettings,CN=servername,OU=Domain Controllers,DC=domainname
    2. msDFSR-Enabled=FALSE
  4. Force Active Directory replication throughout the domain and validate its success on all DCs.
  5. Start the DFSR service set as authoritative:
  6. You will see Event ID 4114 in the DFSR event log indicating SYSVOL is no longer being replicated.
  7. On the same DN from Step 1, set:
    1. msDFSR-Enabled=TRUE
  8. Force Active Directory replication throughout the domain and validate its success on all DCs.
  9. Run the following command from an elevated command prompt on the same server that you set as authoritative:
    1. DFSRDIAG POLLAD
  10. You will see Event ID 4602 in the DFSR event log indicating SYSVOL has been initialized. That domain controller has now done a “D4” of SYSVOL.
  11. Start the DFSR service on the other non-authoritative DCs. You will see Event ID 4114 in the DFSR event log indicating SYSVOL is no longer being replicated on each of them.
  12. Modify the following DN and single attribute on all other domain controllers in that domain:
    1. CN=SYSVOL Subscription,CN=Domain System Volume,CN=DFSR-LocalSettings,CN=servername,OU=Domain Controllers,DC=domainname
    2. msDFSR-Enabled=TRUE
  13. Run the following command from an elevated command prompt on all non-authoritative DCs (i.e. all but the formerly authoritative one):
    1. DFSRDIAG POLLAD

Week In Review : 07/20/2014

I had intended these to be weekly, and I got off track, sorry about that. So I missed the week of Jun 29, Jul 6 and Jul 13. I can tell you that the week of the 29th ended on July 5th, and I was pretty wiped out from all the fireworks and stuff. The week of the 6th we were in Florida on vacation and the last couple of weeks (Jul 13 and this week Jul 20) were about the same.

This week has been all about the code. My boss asked me to focus on getting the revised provisioning web app up and running, so I’m happy to report that it is on par with the previous version. The previous version was just a basic web app, nothing fancy, it had some pages that worked like a form and it was all driven by click events and such. It worked really well, but it lacked flexibility.

See, everything was written on a single page, the various “pages” were in reality Asp Panels separated by div’s. In order to add a feature I would have to figure out where to place it, then adjust all the code to handle it. Some of that was problem, I should have broken things out more earlier into separate functions, that may have made it easier, but I doubt it.

The new version is using MVC, each page has a model on the back to represent the data. So each page is strongly typed, For most things I use the built-in validation provided to me from that, for a few things I wrote some custom validation, for example to make sure someone chooses a valid option from a drop down and not the text “–Select an item–“. There are a few places where the custom validation just wasn’t appropriate. So I was able to create a couple of functions to handle that, I must say that in this new format it is a lot easier to work.

The way I have it now I just pass around the models from one page to the next, updating as we click along. I can pass those models to my custom validation and return a simple error message quite easily. Displaying the validation error’s previously was a pain, so I really enjoy the way this works.

Another nice feature is I can re-use a lot of what I do so I can provide new features. For example, I want to provide a way for admins to create a bare VM. I can use all the models I already have, as well as just about all the logic behind how that is wired up. If I want to add a new feature it’s as simple as writing up a model and then building a page for it, then writing a controller to handle it all, or updating an existing controller.

Some thoughts I’m having for the next version is to use the database, currently I’m not doing that at all, I’m actually querying VMware directly, this introduces some lag that the previous version didn’t have. I think that some of that can be resolved by me working through my code to get it optimized, but a lot of that is just inherent in how it all works.

So the thought was a separate job that would at regular intervals poll VMware for clusters, datastores, vm’s and so forth. Then the app becomes almost instantaneous, since instead of asking VMware we’ll ask the database. Then before we provision we will validate everything against VMware, which may introduce some entertaining issues, but as our change process is fairly slow I think this will work nicely.

The nice thing about storing this data I can add tables to give me logic that I had to code for before. I will be able to access some data without calling VMware since it will be stored in the database and some of the code I wrote to pull data from VMware I can dump as it won’t be needed.

So what all code am I using?

mod-VMware : allows users to connect to, query and clone (soon create) virtual machines
mod-proteus : allows users to connect to, query and add host records to Proteus and retrieve IP information for new hosts
mod-servicenow : allows users to connect to, query and create service tickets and configuration items
mod-Zenoss : allows users to connect to, query and add devices to Zenoss monitoring
mod-ads : allows users to connect to, query and create objects in Active Directory

I feel that I’m getting closer to my idea of being modular, where we can plug in the various things we want. I still haven’t worked that out yet, but in my head it works 😉

enjoy!

Week In Review : 06-22-2014

Not a terribly eventful week, I’ve been working on tuning Ops down and clearing out the errors, and lowering the signal to noise ratio. One of the hardest ones has been the domain trust monitor in the AD MP. So we use the firewall to isolate VLAN’s from one another and I don’t know that I totally agree with that, but one thing at a time I suppose. Our DC’s couldn’t resolve another domain to which we have a trust established, you read that right. So, I tried several different requests to make that go. I created their zone as a secondary in my DNS, had to allow TCP/UDP 53 through for that to work, still no go. Then I noticed that LDAP was being dropped on one of the firewalls so had to allow that through, still no go. Finally took out the big hammer and had those DC’s added to the same firewall group as mine. That worked, sort of, yesterday I noticed I’m still getting some errors, so I’ve decided that Monday I’m going to work on trying to get that and as much of everything else sorted out as I can.

I did get some code written this week, I have a proof of concept wired up that will allow us to copy profile information from one system over to SharePoint Online. I would feel better about it if we weren’t forced to use a deprecated web service. Which I need to bring up in our next meeting about this. The other code I worked on was the Arin Whois-RWS interface. I didn’t realize this, but Arin has a REST interface that allows you to query their Whois database.

So I’ve been working on that and wrote up a very nice little piece of code that is suitably generic for me, which I like!

So, if I want to start using the .net HTTP stuff on the Windows Phone and tablets, it looks like I have start using the async stuff. Also, I think a lot of their HTTP request stuff is moving to that anyway, so I’ll really need to keep that in mind as I progress in code writing. At any rate, I really like this GetResource function. A lot of the code I write I try to make as reusable as possible, I hate having to write code that more or less does something another piece of code already does. So we send in a string URL and cast it against the XML classes we created to return objects that we can easily work with. The disappointing part of this for me was I should be able to pass as part of the request that I want a JSON response, but the server seemed to ignore my ContentType request and kept spitting back XML.

As I was working through this I started thinking about the other code that I’ve written for VMware, Hyper-V, ServiceNow and so on. I think I’m going to re-write my provisioning web app so that I can be a little more pluggable than I currently am. For the VMware stuff I think this will be easier as I’ve already re-written that code to use the ManagedObjectReference for everything, and since at it’s root, that’s just a string this may be pretty simple. I’ll just need to create a couple of simple interfaces that I can pass strings or arrays into, as well as some functions to take the VMware objects and pass them up as strings or arrays.

Also I got notified from Microsoft that they are doing away with domains.live.com and that if you use that service for mail and so on, it will stop working very soon. So, I decided I would not wait around and moved my wife and daughter over to my Office365 subscription getting their email to work was simple, but my account to forward to the patton-tech.com one…not so much. Hopefully I can figure this out, because just adding that address to my account was not an option.

Week In Review : 06/08/2014

Still a lot of programming this week, but like I said before I think anymore that is more the norm than not. We did some interesting Active Directory stuff this week. We had a handful of servers get their AD objects deleted at some point, and we found out about the beginning of this week. Now my guess is these were deleted close to about 3mo ago, and they either rebooted recently or attempted to change their password recently.

About a year or more ago we changed our audit policy and started using Advanced Auditing. We were concerned about user account and group account management, but it turns out we should also have put in computer account management as well. When a computer object is deleted event 4743 is logged in the security log of the domain controller. We searched and couldn’t find that entry anywhere, when I started researching that event is when I found you need to tick the boxes for computer management.

Along those lines we had a similar issue, our admin accounts in our QA domain were disabled, since we do very little auditing at all in there, I enabled the same features so we can see when that happens. When a user account is disabled event 4725 is logged. To go along with both of these events I’m going to update our reporting in Ops on things like this.

While doing all this I found a very nice support article listing our the various event id’s and what they mean.

All of the servers that are supposed to report in to System Center Advisor, are now doing so. I feel rather stupid about the issue originally. So my first problem is that I wasn’t patched up to where I needed to be in order to even use the preview, so that was step 1. The next part is where it gets a little fuzzy, I don’t actually recall patching the clients on any of the agents reporting in, yet all 3 domain controllers reported an update agent. Coincidentally all 3 domain controllers were the only servers showing up. After some investigating with the SCA guys from Microsoft they quickly realized I had not patched my agents. So, I must have patched the DC’s, I just don’t recall doing it, hence the stupid.

So the result of this is a working program in SCCM that will patch outdated clients, which is good as my next step in this whole saga is to patch production. It’s either patch, or move over to R2 and currently I’m leaning towards patching. So currently in QA when a server gets discovered the ops client gets pushed down to them, now it will also get patched. Then the only manual part of this process left is to add them to the advisor management pack.

It’s been lots of fun talking with these guys about stuff, I’ve been invited to participate in an SCA board to go over new features and talk about how things work. My recent experiences dealing with some of the internal folks with Microsoft really make me want to work there more.

I’ve done some fun things with PowerShell this week. A new SQL module has been fleshed out and validated against just about all instances of SQL. I’m still having a hard time working with a connectionstring for the Windows Internal Database, but it will come. I’ll most likely write about this module after I’m done with this WIR.

I’ve updated the Orchestrator module. The Start-scoRunboook function worked incredibly well if you only ever had one parameter, as soon as you throw more than one it freaks out. How I originally handled it was dumb, so now the function accepts requires a hashtable object, it then compares the key (property name) field against what the Parameter object returns. This worked out extremely well, again probably a topic for a whole blog post as well.

One last pure PowerShell item is a function that writes functions. It’s not too terribly complicated and I *WILL* post about this later, but basically the idea is that Orchestrator contains Runbooks that perform some action, my module reads those Runbooks in, gets their parameters and allows you the admin to run them. What if we could have a function that would build cmdlets based on that information on the fly…

SharePoint Online! How much fun is it working with UserProfiles in SPO? Well, let me tell you, in order to do anything meaningful it appears you have to access a 10yr old web service that must be ripe for deprecation but has been forgotten about! I’d really like to get some more information direct from Microsoft about that. At any rate, I’ve got some POC code that will allow me to programmatically populate a SharePoint user’s profile with information that we glean from another source. The next step down this rabbit hole is using a 7yr old SDK (Office Server 2007) to see if I can create UserProfile subtypes! I’ve got some examples of how this works, but I’ve not written anything up yet to see if it will go, fun times ahead!

Keeping in line with the SharePoint Online topic, creating admin cloud accounts. So we have an Azure subscription that allows us to get into Azure AD for our tenant, which isn’t anything special. If you have an Office365 subscription, you can create an Azure account, hook the two together and boom…Azure AD! So I created an admin account for me, and one of the other guys on the project. After that I enabled the Multi-Factor Authentication on these two accounts. Now, when I login with my admin account, I receive a txt message with a verification code. So we have looked at this as THE way to secure access to these accounts as we begin to think about the cloud.

With that out of the way, I can talk about the Orchestration. I’ve created a Runbook that will connect to our tenant and provision a user. This came out of the Provisioning project for the larger SPO project. This code takes a single parameter, samaccountname, and then provisions that user in o365 with the appropriate licensing. There are two differences between an o365 user and a cloud admin. The first is licensing, a cloud admin gets none by default (our design), second the all important UPN, user@tenant.onmicrosoft.com. The idea is these accounts live solely in the cloud, and are used specifically for administering cloud things. I have a couple modification in mind, first I need to populate the AlternateAddresses field, as well as the MobilePhone field. Then I need to see if I can enable MFA in Azure for these accounts automatically.

Lots of Orchestrator this week, but now that I’m ready and the network is ready it’s time to start working on Orchestrating Windows Updates. I’ve started a rough draft of that at the moment:

  1. basically get a list of servers (or service)
  2. for each server start maintenance mode (ops and Zenoss)
  3. get the applicable updates (SCCM perhaps)
  4. apply the updates
  5. reboot if needed
  6. make sure the server is back online
  7. check if required services are running
  8. leave maintenance mode
  9. and move on to the next server

If one server in a group fails then we need to stop the update process and throw an alert in ops and Zenoss. This will prevent an entire service from going offline if the updates cause an issue.