DPM 2010 console crashes when pushing an agent Install

This is a new one for me; I’ve been running DPM for quite a while now and I’ve not seen this behavior. In a recent staff meeting it came up that the DPM server was having some RPC issues, and since I’m jonesing for stuff to do I said I wouldn’t mind taking a look at it.

When you open the DPM Management Console and click the Management tab and then Agents, you are presented with all the servers that have the DPM agent installed. From here you are also able to install/uninstall/update the agent. Working through the Agent Install wizard, I selected the server to be backed up, entered my credentials, and within a minute received a nasty error message.

<FatalServiceError>
  <__System>
    <ID>19</ID>
    <Seq>0</Seq>
    <TimeCreated>8/1/2012 3:12:18 PM</TimeCreated>
    <Source>DpmThreadPool.cs</Source>
    <Line>163</Line>
    <HasError>True</HasError>
  </__System>
  <ExceptionType>ArgumentException</ExceptionType>
  <ExceptionMessage>Value does not fall within the expected range.</ExceptionMessage>
  <ExceptionDetails>
    System.ArgumentException: Value does not fall within the expected range.
    at System.Management.ManagementScope.Initialize()
    at Microsoft.Internal.EnterpriseStorage.Dls.UI.InstallAgentsWizard.Win32Cluster.GetNodeClusterState(String nodeName, ConnectionOptions options, UInt32& clusterState)
    at Microsoft.Internal.EnterpriseStorage.Dls.UI.InstallAgentsWizard.CredentialsPage.CheckForCluster(ProductionServerCollection errorNodesAccessDenied, ProductionServerCollection errorNodesClusterDetectionFailed, ProductionServerCollection errorNodesDRDetectionFailed)
    at Microsoft.Internal.EnterpriseStorage.Dls.UI.InstallAgentsWizard.CredentialsPage.FormListOfTargetServers(WindowsIdentity runAsIdentity)
    at Microsoft.Internal.EnterpriseStorage.Dls.UI.InstallAgentsWizard.CredentialsPage.OnLeavePage(LeavePageEventArgs e)
    at Microsoft.Internal.EnterpriseStorage.UI.WizardFramework.WizardPage.RaiseLeavePage(LeavePageEventArgs e)
    at Microsoft.Internal.EnterpriseStorage.UI.WizardFramework.WizardForm.ValidateAndLeavePage(WizardPage page, LeavePageEventArgs e)
    at Microsoft.Internal.EnterpriseStorage.UI.WizardFramework.WizardForm.TraversePagesToTarget(WizardPage startPage, WizardPage targetPage, NavigationDirection direction)
    at Microsoft.Internal.EnterpriseStorage.UI.WizardFramework.WizardForm.InternalNavigateToPage(WizardPage targetPage, NavigateEventArgs e)
    at Microsoft.Internal.EnterpriseStorage.UI.WizardFramework.WizardForm.NextPage()
    at System.Windows.Forms.Control.OnClick(EventArgs e)
    at System.Windows.Forms.Button.WndProc(Message& m)
    at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
    at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
  </ExceptionDetails>
</FatalServiceError>

I know, nasty right? At any rate we ran through several different things: making sure the server we wanted to get at had the proper firewall rules, that we could access the hidden admin share, that the groups were there, and so on. We even fired up Netmon and reproduced the problem just to make sure they were talking. Everything seemed ok, so we called up Microsoft and opened a ticket.

After talking with one of the DPM support techs we found that it was an issue with the remote server we were attempting to connect to. While everything appeared to be ok, there was a problem with the RPC settings in the registry. At some point all the entries in the Internet subkey of RPC were removed. Turns out it’s OK if the entire key is missing, or if the key is there and has the proper settings in it, but if it’s there and empty…that’s hurty.

Here is some information he pasted over to me about this key:

With Registry Editor, you can modify the following parameters for RPC. The RPC Port key values discussed below are all located in the following key in the registry: HKEY_LOCAL_MACHINE\Software\Microsoft\Rpc\Internet
Ports (REG_MULTI_SZ)
Specifies a set of IP port ranges consisting of either all the ports available from the Internet or all the ports not available from the Internet. Each string represents a single port or an inclusive set of ports.

For example, a single port may be represented by 5984, and a set of ports may be represented by 5000-5100. If any entries are outside the range of 0 to 65535, or if any string cannot be interpreted, the RPC runtime treats the entire configuration as invalid.

PortsInternetAvailable (REG_SZ): Y or N (not case-sensitive)
If Y, the ports listed in the Ports key are all the Internet-available ports on that computer. If N, the ports listed in the Ports key are all those ports that are not Internet-available.

UseInternetPorts (REG_SZ): Y or N (not case-sensitive)
Specifies the system default policy.
If Y, the processes using the default will be assigned ports from the set of Internet-available ports, as defined previously.
If N, the processes using the default will be assigned ports from the set of intranet-only ports.
Example:

In this example ports 5000 through 5100 inclusive have been arbitrarily selected to help illustrate how the new registry key can be configured. This is not a recommendation of a minimum number of ports needed for any particular system.

1. Add the Internet key under: HKEY_LOCAL_MACHINE\Software\Microsoft\Rpc
2. Under the Internet key, add the values “Ports” (MULTI_SZ), “PortsInternetAvailable” (REG_SZ), and “UseInternetPorts” (REG_SZ).

For example, the new registry key appears as follows:
Ports: REG_MULTI_SZ: 5000-5100
PortsInternetAvailable: REG_SZ: Y
UseInternetPorts: REG_SZ: Y

3. Restart the server. All applications that use RPC dynamic port allocation use ports 5000 through 5100, inclusive. In most environments, a minimum of 100 ports should be opened, because several system services rely on these RPC ports to communicate with each other.
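If you’d rather script those steps than click through regedit, a minimal PowerShell sketch might look like this; the 5000-5100 range is just the KB’s illustrative value, so adjust it for your environment and restart the server afterward:

$rpcKey = 'HKLM:\Software\Microsoft\Rpc\Internet'

# Create the Internet subkey if it is not already there
if (-not (Test-Path $rpcKey)) {
    New-Item -Path $rpcKey | Out-Null
}

# Ports is a REG_MULTI_SZ; the other two values are REG_SZ
New-ItemProperty -Path $rpcKey -Name 'Ports' -PropertyType MultiString -Value @('5000-5100') -Force | Out-Null
New-ItemProperty -Path $rpcKey -Name 'PortsInternetAvailable' -PropertyType String -Value 'Y' -Force | Out-Null
New-ItemProperty -Path $rpcKey -Name 'UseInternetPorts' -PropertyType String -Value 'Y' -Force | Out-Null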

The solution was very easy: simply delete (or correct) the malformed entry and reboot. Worked like a charm!
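For reference, here is a quick sketch of what that cleanup could look like in PowerShell on the remote server (the reboot still applies); it only removes the key when it is sitting there empty, which is the broken state described above. You could just as easily repopulate it as sketched earlier.

$rpcKey = 'HKLM:\Software\Microsoft\Rpc\Internet'

if (Test-Path $rpcKey) {
    # An empty Internet key is what trips up the agent install;
    # a populated key (or no key at all) is fine
    if ((Get-Item $rpcKey).GetValueNames().Count -eq 0) {
        Remove-Item -Path $rpcKey
    }
}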

Get recent events from servers

I’ve been working with Microsoft on an issue that I am having with my DPM server. We have been doing some fairly intense logging, and today I enabled several performance counters in an attempt to ascertain whether something external is triggering this issue.

Along those lines I thought it would be cool to get a list of log entries from two hours before the event occurs. The event I’m tracking is DPM 3101, Volume Missing. We have seen that during a regular backup something happens and then DPM stops with the message that the disk I’m backing up to is no longer connected.

I’ve started a thread and have participated in several other threads on the forums about this issue.

At any rate, I decided that I would write a script that would grab all the events from my DPM server and the two file servers that I’m backing up. The hope is that maybe something interesting will be logged.

Why the two hours? Well, it’s silly, but I’ve noticed that two hours seems to be significant in the timeline of how these things are happening.
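This isn’t the Gallery script itself, just a sketch of the idea: find the most recent DPM 3101 and then pull everything the servers logged in the two hours leading up to it. The server names are placeholders, and I’m assuming the 3101 shows up in the Application log on the DPM server.

$servers = 'dpm01', 'fs01', 'fs02'

# Most recent DPM 3101 (Volume Missing) on the DPM server
$trigger = Get-WinEvent -ComputerName 'dpm01' -FilterHashtable @{LogName='Application'; Id=3101} -MaxEvents 1
$end = $trigger.TimeCreated
$start = $end.AddHours(-2)

# Grab System and Application entries from every server for that two-hour window
foreach ($server in $servers) {
    foreach ($log in 'System', 'Application') {
        Get-WinEvent -ComputerName $server -FilterHashtable @{LogName=$log; StartTime=$start; EndTime=$end} -ErrorAction SilentlyContinue |
            Select-Object MachineName, LogName, TimeCreated, Id, LevelDisplayName, Message
    }
}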

The script is also available on the TechNet Gallery.

DPM Sizing Script

Yesterday I told you how I had decided to automate a portion of my DPM routine. As usual this got the fires burning and a second script was born. I would have told you about it yesterday, but I wanted to keep up the appearance of doing actual work 😉

So today I give you the Get-DPMSizingValues.ps1 script. This is basically the portion of the DPM Sizing tool that I use regularly, the part that deals with file servers. I must say I’m rather proud of it as it worked out better than I thought it would. It uses some of the same basic stuff as the previous script, which was nice for me.

My Get-PSDrive statement is a little different. I noticed when I ran this against my Windows 7 machine I had a lot of cruft I didn’t care about, so you’ll note the Where-Object bit. That filters out any results that have no used space.


Get-PSDrive -PSProvider FileSystem | Where-Object {$_.Used -gt 0} | Select-Object -Property Name, @{Label='Used';Expression={$_.Used / 1gb}}

The nitty-gritty part of it uses the same formula found in the spreadsheet. Now, there are some values that are hard-coded; these come direct from Microsoft and I don’t really know what they mean, as they have not been terribly forthcoming about it, or my fu is just not working for me today.


if (($ReplicaOverheadFactor/100) -gt 1)
{
    $ReplicaVolume = $VolumeIdentifier.Used * ($ReplicaOverheadFactor/100)
}
else
{
    $ReplicaVolume = $VolumeIdentifier.Used * 1.5
}

if ($VolumeIdentifier.Used -gt 0)
{
    $ShadowCopyVolume = ($VolumeIdentifier.Used * $RetentionRange * ($DataChange/100)) + (1600/1024)
}

So I just found a bug while writing this and fixed it; it turned out I forgot to convert the ReplicaOverheadFactor into a fraction in that first test. Oh well, it’s working now, which is good. At any rate, that is the heart of the script, and it gets looped through for every drive that has used space. I had thought about not doing the second test, since my scriptblock actually shouldn’t return any volumes that have zero used space, but what the heck, it doesn’t hurt anything.
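To give a sense of where that fits, here is a rough sketch (not the Gallery version) of the per-drive loop assembling the output objects shown below; $Volumes is assumed to hold the filtered Get-PSDrive results from the scriptblock above, and the property names simply mirror the sample output.

foreach ($VolumeIdentifier in $Volumes)
{
    # ...the Replica and ShadowCopy calculations above run here for each drive...

    New-Object PSObject -Property @{
        Name            = $VolumeIdentifier.Name
        UsedSpace       = $VolumeIdentifier.Used
        Retention       = $RetentionRange
        Replica         = $ReplicaVolume
        ShadowCopy      = $ShadowCopyVolume
        DataChange      = $DataChange
        ReplicaOverhead = $ReplicaOverheadFactor
    }
}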

The resulting output is pretty nifty. I would imagine you could potentially pipe this into a DPM cmdlet, but I haven’t verified that. If someone needs it I’ll look into doing that, but for now it’s a very nice little reporting tool that will give you calculated values for Replica Volumes and ShadowCopy Volumes.


Name : C
UsedSpace : 44.3877143859863
Retention : 7
Replica : 53.2652572631836
ShadowCopy : 32.6339000701904
DataChange : 10
ReplicaOverhead : 120

There is also a version on the TechNet Gallery.

Weekly DPM Monitoring

Part of my responsibility is handling storage. This includes allocating, deallocating, backing up, and restoring. Now we’ve been using DPM for quite some time and are currently running on DPM 2010. Since this past summer I have personally made peace with the fact that my users don’t know what the delete key is, so I have set some things in place to make it easy for me to monitor overall usage of storage for the School.

Since storage is always increasing, three weeks ago I decided that I would start to regularly monitor the used space on the file servers and update DPM accordingly. For that I used the DPM sizing tool; it’s a wonderful set of spreadsheets and scripts, and if you haven’t played with them, you should!

What I love most about this tool is that you can just type in the used space of a given volume and it will calculate, based on various settings, the new size of the Replica Volume and Recovery Point Volume. So, for the past three weeks I’ve been manually opening up the spreadsheet, firing up RDP, connecting to my server and running Get-PSDrive from inside PowerShell.

For whatever reason, today I decided that enough was enough and that I would automate this for myself. After all, I get regular updates from my file server when it runs out of space so I can add more; why can’t I have something similar for DPM? That’s how the Update-DPMSpreadSheet.ps1 script was born.

The idea is pretty simple: for each file server, get a list of drives and the amount of used space in GB. So I created a scriptblock that gives me the bits of information I require.


Get-PSDrive -PSProvider FileSystem | Select-Object -Property Name, @{Label='Used';Expression={$_.Used / 1gb}}

I use Invoke-Command, pass it a session object and the above scriptblock, and capture the results. When I’m done I close out of my session with Remove-PSSession so that I don’t consume too many resources.
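The remoting portion looks roughly like this; the server names are placeholders for my file servers, not the real ones:

$scriptblock = {
    Get-PSDrive -PSProvider FileSystem |
        Select-Object -Property Name, @{Label='Used';Expression={$_.Used / 1gb}}
}

$results = @()
foreach ($server in 'fs01', 'fs02')
{
    $session = New-PSSession -ComputerName $server
    $results += Invoke-Command -Session $session -ScriptBlock $scriptblock
    # Close the session right away so open sessions don't pile up
    Remove-PSSession -Session $session
}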

There is a max limit on the number of concurrent sessions an account can have open. The default is 5, and it can be modified as needed. Please see the following article for details on how to do this.
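If you do bump into that limit, one knob I know of is the per-user shell quota exposed through the WSMan: drive on the remote machine; the value here is just an example, not a recommendation:

# Raise the number of concurrent remote shells a single user may have open
Set-Item -Path WSMan:\localhost\Shell\MaxShellsPerUser -Value 25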

Once I have all that data I create a new instance of Excel, open the DPM Sizing Tool spreadsheet, and set my worksheet to the DPM File Volume sheet. I use the Volume Identification column to match up against the list of drives that are returned from my servers. As of v3.3 of this tool that column is column D. Once I find the current drive in the spreadsheet I hop over one column and update the value of the Used space in GB column (Column E as of v3.3).
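In code that works out to something like the following sketch; the spreadsheet path is a placeholder, the column numbers match the v3.3 layout described above, and the drive-matching logic is simplified compared to what the full script does. $results is the data collected from the servers earlier.

# Fire up Excel and open the sizing tool (placeholder path)
$excel    = New-Object -ComObject Excel.Application
$workbook = $excel.Workbooks.Open('C:\path\to\DPMvolumeSizing.xlsx')
$sheet    = $workbook.Worksheets.Item('DPM File Volume')

$row = 2
# Walk down column D (Volume Identification) until we hit an empty cell
while ($sheet.Cells.Item($row, 4).Text)
{
    $drive = $results | Where-Object {$_.Name -eq $sheet.Cells.Item($row, 4).Text}
    if ($drive)
    {
        # Column E is "Used space in GB"
        $sheet.Cells.Item($row, 5).Value2 = [math]::Round($drive.Used, 2)
    }
    $row++
}

$workbook.Save()
$excel.Quit()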

If there are any errors along the way, I log them to the Application log and close out of everything.
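Something along these lines handles that part; the source name is made up here, and it has to be registered once with New-EventLog before Write-EventLog will accept it:

try
{
    # ...the remoting and spreadsheet work from above...
}
catch
{
    # Register once with: New-EventLog -LogName Application -Source 'Update-DPMSpreadSheet'
    Write-EventLog -LogName Application -Source 'Update-DPMSpreadSheet' -EntryType Error -EventId 101 -Message $_.Exception.Message
}
finally
{
    # Close out of Excel whether or not anything went wrong
    if ($excel) { $excel.Quit() }
}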

I had thought about creating a scheduled job to have this run every Monday, but seeing as how my computer might be off or something I took the low-tech route. I updated my $PROFILE with the following chunk of code.


if ((Get-Date).DayOfWeek -eq 'Monday')
{
    C:\scripts\powershell\production\Update-DPMSpreadSheet.ps1
    Invoke-Item 'C:\Users\jspatton\SyncStuff\DPMvolumeSizing v3.3\DPMvolumeSizing.xlsx'
}

Hopefully it’s pretty straightforward: if today is Monday, run the Update-DPMSpreadSheet.ps1 script and then open the spreadsheet up in Excel.

I have also uploaded a version of this script to the TechNet Gallery.