Register-PSSessionConfiguration causes WinRM service hanging in state 'stopping'

Open jnury opened this issue 8 years ago • 15 comments

Hi, I use DSC to deploy JEA configuration on many Windows Server 2012 R2 hosts:

PS > $psversiontable

Name                           Value
----                           -----
PSVersion                      5.1.14409.1012
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.14409.1012
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1

About 3 times out of 4, when Register-PSSessionConfiguration is triggered by the DSC module, WinRM service is restarted but hangs on Stopping.

It seems to happen more frequently when the configuration causes WinRM to change Logon As (from Network Service to Local System).

Is there a 'correct' way to avoid this behaviour ?

We use the following script to force restart WinRM service (with SCCM as we lost PS remoting ability on host):

$winRMService = Get-Service -Name 'WinRM'
if ($winRMService -and $winRMService.Status -eq 'StopPending') {
    $processId = Get-CimInstance -ClassName 'Win32_Service' -Filter "Name LIKE 'WinRM'" | Select-Object -Expand 'ProcessId'
    $serviceList = Get-CimInstance -ClassName 'Win32_Service' -Filter "ProcessId=$processId" | Select-Object -Expand 'Name'
    $failure = @()
    Write-Host "Forcing process $processId to stop ..." -NoNewline
    try {
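        # -ErrorAction Stop turns a failure into a terminating error so the catch block runs
        Stop-Process -Id $processId -Force -ErrorAction Stop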
        Write-Host ' done'
        Write-Host 'Waiting 5 seconds'
        Start-Sleep -Seconds 5
        foreach ($service in $serviceList) {
            Write-Host "Starting service $service ..." -NoNewline
            try {
                Start-Service -Name $service -ErrorAction Stop
                Write-Host ' done'
            } catch {
                Write-Host ' failed'
                $failure += "Start service $service"
            }
        }
    } catch {
        Write-Host ' failed'
        $failure += "Kill WinRM process"
    }

    if ($failure) {
        Throw "Failed to execute following operation(s): $($failure -join ', ')"
    }
}

Should we add WinRM restart problem detection/mitigation directly in the DSC resource? I can provide a PR for that (with less verbose code ;-))

jnury avatar Nov 17 '17 07:11 jnury

@PaulHigin: are you aware of problems in Register-PSSessionConfiguration/WinRM that could explain this behavior (and issue #31 ) ?

jnury avatar Nov 17 '17 09:11 jnury

/cc @manojampalam for the WinRM aspects

Thanks for reporting these issues, Julien. I've seen this behavior a few times, but nowhere near as frequently or consistently as you are describing. Have you seen this behavior on 2008 R2 / 2012 / 2016 as well, or just 2012 R2?

rpsqrd avatar Nov 17 '17 17:11 rpsqrd

For now, I only deployed my configuration on 85 hosts, but only on Windows 2012 R2. I'll have some targets on 2008 R2 soon, but only a few. I plan to deploy on 2016 soon too... So, only tested on 2012 R2.

jnury avatar Nov 17 '17 17:11 jnury

This bug reproduced consistently on 2016. I worked around it by putting the call to Register-PSSessionConfiguration within a PSJob, waiting for 10 seconds, and setting $global:DSCMachineStatus = 1 at the end of the Set block.

I will try and make my code a bit cleverer and then post it
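
A minimal sketch of the workaround described above, not the author's actual code: run a script block in a background job and stop waiting after a timeout. `Invoke-WithTimeout` is a hypothetical helper name, and the endpoint name and path in the usage comment are placeholders.

```powershell
# Hypothetical helper: runs a script block in a background job and reports
# whether it finished (successfully or not) within the timeout.
function Invoke-WithTimeout {
    param(
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [object[]] $ArgumentList,
        [int] $TimeoutSeconds = 10
    )
    $jobParams = @{ ScriptBlock = $ScriptBlock }
    if ($ArgumentList) { $jobParams.ArgumentList = $ArgumentList }

    $job = Start-Job @jobParams
    try {
        # Wait-Job returns the job only if it finished before the timeout.
        $finished = Wait-Job -Job $job -Timeout $TimeoutSeconds
        return [bool] $finished
    }
    finally {
        # Always discard the job, even if it is still running.
        Remove-Job -Job $job -Force -ErrorAction SilentlyContinue
    }
}

# Usage inside a DSC Set block (placeholder names):
# $done = Invoke-WithTimeout -TimeoutSeconds 10 -ArgumentList 'MyEndpoint', 'C:\Temp\my.pssc' -ScriptBlock {
#     param($name, $path)
#     Register-PSSessionConfiguration -Name $name -Path $path -Force
# }
# $global:DSCMachineStatus = 1   # signals the LCM that a reboot is required
```

Returning a plain boolean keeps the caller free to decide what to do on a timeout, for example checking the WinRM service state before requesting the reboot.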

djwork avatar Dec 11 '17 03:12 djwork

@djwork: what about calling Register-PSSessionConfiguration within a PSJob and entering a loop until the job is OK or a timeout of, say, 30 seconds has expired? While in the loop, if the service has been 'stopping' for more than, say, 5 seconds, we run the 'force restart' script I mentioned above.

It would be quite safe, as WinRM isn't left hanging, other services running in the same process are restarted as well, and the resource would be compliant on the first run.
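
The loop proposed here could be sketched as follows. This is an illustration under assumptions, not code from the module: `Wait-JobWatchingService` is a hypothetical name, the status probe and the recovery action are injected as script blocks, and the 30 and 5 second values are the ones suggested in this comment.

```powershell
# Hypothetical helper: polls a job until it ends or the timeout expires.
# If the watched service stays in 'StopPending' longer than $StuckSeconds,
# the $OnStuck action (e.g. the force-restart script) is invoked.
function Wait-JobWatchingService {
    param(
        [Parameter(Mandatory)] [System.Management.Automation.Job] $Job,
        [int] $TimeoutSeconds = 30,
        [int] $StuckSeconds = 5,
        [scriptblock] $GetStatus = { [string] (Get-Service -Name 'WinRM').Status },
        [scriptblock] $OnStuck = { Write-Warning 'WinRM has been stopping for too long' }
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    $stoppingSince = $null

    while (([string] $Job.State -in 'NotStarted', 'Running') -and ((Get-Date) -lt $deadline)) {
        if ((& $GetStatus) -eq 'StopPending') {
            if ($null -eq $stoppingSince) {
                # First time the service is seen stopping: start the clock.
                $stoppingSince = Get-Date
            }
            elseif (((Get-Date) - $stoppingSince).TotalSeconds -ge $StuckSeconds) {
                $null = & $OnStuck
                $stoppingSince = $null
            }
        }
        else {
            $stoppingSince = $null
        }
        Start-Sleep -Milliseconds 500
    }
    return [string] $Job.State
}
```

Injecting `$GetStatus` and `$OnStuck` keeps the loop independent of WinRM itself, so the timing logic can be exercised without touching a real service.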

Of course, patching WinRM to avoid hanging would be the best solution ;-)

jnury avatar Dec 19 '17 21:12 jnury

@jnury That's what I did

[DscResource()]
class JeaEndpoint
{
## The mandatory endpoint name. Use 'Microsoft.PowerShell' by default.
[DscProperty(Key)]
[string] $EndpointName = 'Microsoft.PowerShell'

## The mandatory role definition map to be used for the endpoint. This
## should be a string that represents the Hashtable used for the RoleDefinitions
## property in New-PSSessionConfigurationFile, such as:
## RoleDefinitions = '@{ Everyone = @{ RoleCapabilities = "BaseJeaCapabilities" } }'
[Dscproperty(Mandatory)]
[string] $RoleDefinitions

## The optional groups to be used when the endpoint is configured to
## run as a Virtual Account
[DscProperty()]
[string[]] $RunAsVirtualAccountGroups

## The optional Group Managed Service Account (GMSA) to use for this
## endpoint. If configured, will disable the default behaviour of
## running as a Virtual Account
[DscProperty()]
[string] $GroupManagedServiceAccount

## The optional directory for transcripts to be saved to
[DscProperty()]
[string] $TranscriptDirectory

## The optional startup script for the endpoint
[DscProperty()]
[string[]] $ScriptsToProcess

## The optional switch to enable mounting of a restricted user drive
[Dscproperty()]
[bool] $MountUserDrive

## The optional size of the user drive. The default is 50MB.
[Dscproperty()]
[long] $UserDriveMaximumSize

## The optional number of seconds to wait for registering the endpoint to complete.
## The default is 10 seconds.
[Dscproperty()]
[int] $HungRegistrationTimeout = 10

## The optional number of times to retry starting the WinRM service.
## The default is 10.
[Dscproperty()]
[int] $MaximumWinRMStartRetry = 10

## The optional expression declaring which domain groups (for example,
## two-factor authenticated users) connected users must be members of. This
## should be a string that represents the Hashtable used for the RequiredGroups
## property in New-PSSessionConfigurationFile, such as:
## RequiredGroups = '@{ And = "RequiredGroup1", @{ Or = "OptionalGroup1", "OptionalGroup2" } }'
[Dscproperty()]
[string] $RequiredGroups

## Applies the JEA configuration
[void] Set()
{
    $psscPath = Join-Path ([IO.Path]::GetTempPath()) ([IO.Path]::GetRandomFileName() + ".pssc")

    ## Convert the RoleDefinitions string to the actual Hashtable
    $roleDefinitionsHash = $this.ConvertStringToHashtable($this.RoleDefinitions)

    $configurationFileArguments = @{
        Path = $psscPath
        RoleDefinitions = $roleDefinitionsHash
        SessionType = 'RestrictedRemoteServer'
    }

    if($this.RunAsVirtualAccountGroups -and $this.GroupManagedServiceAccount)
    {
        throw "The RunAsVirtualAccountGroups setting can not be used when a configuration is set to run as a Group Managed Service Account"
    }

    ## Set up the JEA identity
    if($this.RunAsVirtualAccountGroups)
    {
        $configurationFileArguments["RunAsVirtualAccount"] = $true
        $configurationFileArguments["RunAsVirtualAccountGroups"] = $this.RunAsVirtualAccountGroups
    }
    elseif($this.GroupManagedServiceAccount)
    {
        $configurationFileArguments["GroupManagedServiceAccount"] = $this.GroupManagedServiceAccount -replace '\$$', ''
    }
    else
    {
        $configurationFileArguments["RunAsVirtualAccount"] = $true
    }

    ## Transcripts
    if($this.TranscriptDirectory)
    {
        $configurationFileArguments["TranscriptDirectory"] = $this.TranscriptDirectory
    }

    ## Startup scripts
    if($this.ScriptsToProcess)
    {
        $configurationFileArguments["ScriptsToProcess"] = $this.ScriptsToProcess
    }

    ## Mount user drive
    if($this.MountUserDrive)
    {
        $configurationFileArguments["MountUserDrive"] = $this.MountUserDrive
    }

    ## User drive maximum size
    if($this.UserDriveMaximumSize)
    {
        $configurationFileArguments["UserDriveMaximumSize"] = $this.UserDriveMaximumSize
        $configurationFileArguments["MountUserDrive"] = $true
    }

    ## Required groups
    if($this.RequiredGroups)
    {
        ## Convert the RequiredGroups string to the actual Hashtable
        $requiredGroupsHash = $this.ConvertStringToHashtable($this.RequiredGroups)
        $configurationFileArguments["RequiredGroups"] = $requiredGroupsHash
    }

    ## Register the endpoint
    try
    {
        ## If we are replacing Microsoft.PowerShell, create a 'break the glass' endpoint
        if($this.EndpointName -eq "Microsoft.PowerShell")
        {
            $breakTheGlassName = "Microsoft.PowerShell.Restricted"
            if(-not (Get-PSSessionConfiguration -Name ($breakTheGlassName + "*") |
                Where-Object Name -eq $breakTheGlassName))
            {
                Register-PSSessionConfiguration -Name $breakTheGlassName
            }
        }

        ## Remove the previous one, if any.
        $existingConfiguration = Get-PSSessionConfiguration -Name ($this.EndpointName + "*") |
            Where-Object Name -eq $this.EndpointName

        if($existingConfiguration)
        {
            Unregister-PSSessionConfiguration -Name $this.EndpointName
        }

        ## Create the configuration file
        New-PSSessionConfigurationFile @configurationFileArguments
        #Register-PSSessionConfiguration has been hanging because the WinRM service is stuck in Stopping state
        #therefore we need to run Register-PSSessionConfiguration within a job to allow us to handle a hanging WinRM service
        Start-Job -ScriptBlock {
            param($endpointName, $psscPath)
            Register-PSSessionConfiguration -Name $endpointName -Path $psscPath -Force -ErrorAction Stop
        } -ArgumentList ($this.EndpointName), $psscPath | Wait-Job -Timeout ($this.HungRegistrationTimeout) | Remove-Job -Force -ErrorAction SilentlyContinue
        #Note: above I used the "ArgumentList" rather than "$using:" because I don't know if "$using:this.EndpointName" will work

        #if WinRM is still stopping after the job has completed / exceeded $this.HungRegistrationTimeout, force kill the underlying WinRM process
        #note: Get-Service reports this state as 'StopPending', not 'Stopping'
        if ((Get-Service -Name WinRM).Status -ieq 'StopPending') {
            $id = Get-WmiObject -Class Win32_Service -Filter "Name LIKE 'WinRM'" | Select-Object -ExpandProperty ProcessId
            Stop-Process -Id $id -Force
        }

        #if stopped try to start WinRM, with $this.MaximumWinRMStartRetry retries
        [int]$tryCount = 0
        while (((Get-Service -Name WinRM).Status -ieq 'Stopped') -and ($tryCount -le $this.MaximumWinRMStartRetry))
        {
            Write-Verbose -Message 'Starting WinRM service'
            Start-Service -Name WinRM
            Start-Sleep -Seconds 1
            #count the attempt so the loop ends once the retry limit is reached
            $tryCount++
        }

        ## Enable PowerShell logging on the system
        $basePath = "HKLM:\Software\Policies\Microsoft\Windows\PowerShell\ScriptBlockLogging"
        if(-not (Test-Path $basePath))
        {
            $null = New-Item $basePath -Force
        }
        Set-ItemProperty $basePath -Name EnableScriptBlockLogging -Value "1"
    }
    finally
    {
        Remove-Item $psscPath
    }
}

# Tests if the resource is in the desired state.
[bool] Test()
{
    $currentInstance = $this.Get()

    ## If this was configured with our mandatory property (RoleDefinitions), dig deeper
    if($currentInstance.RoleDefinitions)
    {
        if($currentInstance.EndpointName -ne $this.EndpointName)
        {
            Write-Verbose "EndpointName not equal: $($currentInstance.EndpointName)"
            return $false
        }

        ## Convert the RoleDefinitions string to the actual Hashtable
        $roleDefinitionsHash = $this.ConvertStringToHashtable($this.RoleDefinitions)
        Write-Verbose ($currentInstance.RoleDefinitions.GetType())

        if(-not $this.ComplexObjectsEqual($this.ConvertStringToHashtable($currentInstance.RoleDefinitions), $roleDefinitionsHash))
        {
            Write-Verbose "RoleDefinitions not equal: $($currentInstance.RoleDefinitions)"
            return $false
        }

        if(-not $this.ComplexObjectsEqual($currentInstance.RunAsVirtualAccountGroups, $this.RunAsVirtualAccountGroups))
        {
            Write-Verbose "RunAsVirtualAccountGroups not equal: $(ConvertTo-Json $currentInstance.RunAsVirtualAccountGroups -Depth 100)"
            return $false
        }

        if($currentInstance.GroupManagedServiceAccount -ne ($this.GroupManagedServiceAccount -replace '\$$', ''))
        {
            Write-Verbose "GroupManagedServiceAccount not equal: $($currentInstance.GroupManagedServiceAccount)"
            return $false
        }

        if($currentInstance.TranscriptDirectory -ne $this.TranscriptDirectory)
        {
            Write-Verbose "TranscriptDirectory not equal: $($currentInstance.TranscriptDirectory)"
            return $false
        }

        if(-not $this.ComplexObjectsEqual($currentInstance.ScriptsToProcess, $this.ScriptsToProcess))
        {
            Write-Verbose "ScriptsToProcess not equal: $(ConvertTo-Json $currentInstance.ScriptsToProcess -Depth 100)"
            return $false
        }

        if($currentInstance.MountUserDrive -ne $this.MountUserDrive)
        {
            Write-Verbose "MountUserDrive not equal: $($currentInstance.MountUserDrive)"
            return $false
        }

        if($currentInstance.UserDriveMaximumSize -ne $this.UserDriveMaximumSize)
        {
            Write-Verbose "UserDriveMaximumSize not equal: $($currentInstance.UserDriveMaximumSize)"
            return $false
        }
        # Check for null required groups

        $requiredGroupsHash = $this.ConvertStringToHashtable($this.RequiredGroups)
        if(-not $this.ComplexObjectsEqual($this.ConvertStringToHashtable($currentInstance.RequiredGroups), $requiredGroupsHash))
        {
            Write-Verbose "RequiredGroups not equal: $(ConvertTo-Json $currentInstance.RequiredGroups -Depth 100)"
            return $false
        }



        return $true
    }
    else
    {
        return $false
    }
}

## A simple comparison for complex objects used in JEA configurations.
## We don't need anything extensive, as we should be the only ones changing
## them.
hidden [bool] ComplexObjectsEqual($object1, $object2)
{
    $json1 = ConvertTo-Json -InputObject $object1 -Depth 100
    Write-Verbose "Argument1: $json1"

    $json2 = ConvertTo-Json -InputObject $object2 -Depth 100
    Write-Verbose "Argument2: $json2"

    return ($json1 -eq $json2)
}

## Convert a string representing a Hashtable into a Hashtable
hidden [Hashtable] ConvertStringToHashtable($hashtableAsString)
{
    if ($hashtableAsString -eq $null)
    {
        $hashtableAsString = '@{}'
    }
    $ast = [System.Management.Automation.Language.Parser]::ParseInput($hashtableAsString, [ref] $null, [ref] $null)
    $data = $ast.Find( { $args[0] -is [System.Management.Automation.Language.HashtableAst] }, $false )

    return [Hashtable] $data.SafeGetValue()
}

# Gets the resource's current state.
[JeaEndpoint] Get()
{
    $returnObject = New-Object JeaEndpoint

    $sessionConfiguration = $null

    [int]$tryCount = 0
    while (((Get-Service -Name WinRM).Status -ine 'Running') -and ($tryCount -le 10))
    {
        Write-Verbose -Message 'Starting WinRM service'
        Start-Service -Name WinRM
        Start-Sleep -Seconds 1
        #count the attempt so the loop cannot spin forever
        $tryCount++
    }

    $winRMService = Get-Service -Name WinRM
    if (($winRMService -ne $null) -and ($winRMService.Status -ieq 'running')) {
        #This code will fail if winrm not running
        $sessionConfiguration = Get-PSSessionConfiguration -Name ($this.EndpointName + "*") |
            Where-Object Name -eq $this.EndpointName
    }

    if((-not $sessionConfiguration) -or (-not $sessionConfiguration.ConfigFilePath))
    {
        return $returnObject
    }
    else
    {
        $configFileArguments = Import-PowerShellDataFile $sessionConfiguration.ConfigFilePath
        $rawConfigFileAst = [System.Management.Automation.Language.Parser]::ParseFile($sessionConfiguration.ConfigFilePath, [ref] $null, [ref] $null)
        $rawConfigFileArguments = $rawConfigFileAst.Find( { $args[0] -is [System.Management.Automation.Language.HashtableAst] }, $false )

        $returnObject.EndpointName = $sessionConfiguration.Name

        ## Convert the hashtable to a string, as that is the input format required by DSC
        $returnObject.RoleDefinitions = $rawConfigFileArguments.KeyValuePairs | Where-Object { $_.Item1.Extent.Text -eq 'RoleDefinitions' } | ForEach-Object { $_.Item2.Extent.Text }

        if($sessionConfiguration.RunAsVirtualAccountGroups)
        {
            $returnObject.RunAsVirtualAccountGroups = $sessionConfiguration.RunAsVirtualAccountGroups -split ';'
        }

        if($sessionConfiguration.GroupManagedServiceAccount)
        {
            $returnObject.GroupManagedServiceAccount = $sessionConfiguration.GroupManagedServiceAccount
        }

        if($configFileArguments.TranscriptDirectory)
        {
            $returnObject.TranscriptDirectory = $configFileArguments.TranscriptDirectory
        }

        if($configFileArguments.ScriptsToProcess)
        {
            $returnObject.ScriptsToProcess = $configFileArguments.ScriptsToProcess
        }

        if($configFileArguments.MountUserDrive)
        {
            $returnObject.MountUserDrive = $configFileArguments.MountUserDrive
        }

        if($configFileArguments.UserDriveMaximumSize)
        {
            $returnObject.UserDriveMaximumSize = $configFileArguments.UserDriveMaximumSize
        }

        if($configFileArguments.RequiredGroups)
        {
            $returnObject.RequiredGroups = $rawConfigFileArguments.KeyValuePairs | Where-Object { $_.Item1.Extent.Text -eq 'RequiredGroups' } | ForEach-Object { $_.Item2.Extent.Text }
        }

        return $returnObject
    }
}

}

djwork avatar Dec 20 '17 02:12 djwork

Hi all, This is my proposal of a workaround for this bug: https://github.com/jnury/JEA/blob/issue%2330/DSC%20Resource/JustEnoughAdministration/JustEnoughAdministration.psm1

I've done a 'lot' of refactoring and would appreciate a code review before filing a PR ;-)

This is what I've done:

  • implementing proposal from @djwork (with some small corrections)
  • adding restart of services that share the same process as WinRM
  • adding WinRM status verification before each call to xxx-PSSessionConfiguration
  • improving Verbose messages
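
As an illustration of the "status verification before each call" item, a guard along these lines could run before every xxx-PSSessionConfiguration cmdlet. It is a sketch only, not the code in the linked branch; `Assert-ServiceRunning` and its parameters are hypothetical names.

```powershell
# Hypothetical guard: make sure a service is running before continuing,
# trying to start it up to $MaxRetry times and throwing if that fails.
function Assert-ServiceRunning {
    param(
        [string] $Name = 'WinRM',
        [int] $MaxRetry = 10,
        [scriptblock] $GetStatus = { param($n) [string] (Get-Service -Name $n).Status },
        [scriptblock] $StartService = { param($n) Start-Service -Name $n -ErrorAction SilentlyContinue }
    )
    for ($attempt = 1; $attempt -le $MaxRetry; $attempt++) {
        if ((& $GetStatus $Name) -eq 'Running') { return }
        Write-Verbose "Starting service $Name (attempt $attempt of $MaxRetry)"
        $null = & $StartService $Name
        Start-Sleep -Seconds 1
    }
    if ((& $GetStatus $Name) -ne 'Running') {
        throw "Service $Name is not running after $MaxRetry start attempts"
    }
}

# Usage before a session configuration call:
# Assert-ServiceRunning -Name 'WinRM'
# Get-PSSessionConfiguration -Name 'MyEndpoint*'
```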

jnury avatar Feb 25 '18 20:02 jnury

@manojampalam It is a shame you have to do this. It would be better for the WinRM service to restart rather than hang on stopping. It might be due to the service host process not being able to restart for some reason. If this is the case then you should be able to ensure WinRM always runs in its own process. I am not familiar with WinRM, but Manoj may be able to help.
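
To see which services share a host process with WinRM, and therefore what else is affected when that process is killed, something like the sketch below could help. `Get-CoHostedService` is a hypothetical helper that works on any objects exposing `Name` and `ProcessId`; the `sc.exe` line in the comments is the standard way to give a service its own process, shown here as an option to evaluate rather than a confirmed fix for this issue.

```powershell
# Hypothetical helper: from a list of Win32_Service-like objects, return the
# names of the services that run in the same process as the named service.
function Get-CoHostedService {
    param(
        [Parameter(Mandatory)] [object[]] $Services,
        [string] $Name = 'WinRM'
    )
    $target = $Services | Where-Object { $_.Name -eq $Name } | Select-Object -First 1
    # A ProcessId of 0 means the service is not running, so nothing is shared.
    if (-not $target -or -not $target.ProcessId) { return }

    $Services |
        Where-Object { $_.ProcessId -eq $target.ProcessId -and $_.Name -ne $Name } |
        ForEach-Object { $_.Name }
}

# On a Windows host:
# Get-CoHostedService -Services (Get-CimInstance -ClassName Win32_Service)
#
# To run WinRM in its own svchost.exe (takes effect when the service restarts):
# sc.exe config WinRM type= own
```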

PaulHigin avatar Feb 26 '18 17:02 PaulHigin

We are looking into this now and will follow up.

manojampalam avatar Mar 05 '18 18:03 manojampalam

Hello @manojampalam, any news ?

The workaround shipped in PR #46 is really 'heavy' ... but it was triggered on half of my last deployments, so it's really useful.

Hope you can fix the WinRM restart problem directly in the WinRM service and we can remove the workaround some day.

If it helps: on some of my hosts, it seems that the LanmanWorkstation service (which is co-hosted in the same process as WinRM) ended on an error while WinRM restarted after Register-PSSessionConfiguration.

jnury avatar May 10 '18 13:05 jnury

Hello guys. @rpsqrd: have you been able to have a look at PR #46? @manojampalam: have you found something in WinRM?

jnury avatar Jun 08 '18 12:06 jnury

Hi jnury, I am Chenming YU; I work with Manoj on the WinRM area at Microsoft. Based on the symptoms of your situation, I suspect it is similar to a past case in which the pending WinRM action hung in the dsctimer wakeup activities (hosted inside the WinRM service). If so, the workaround is to call "Start-Dsc*" in one of two ways:

  1. With -Force, to make sure that any deadlock between WinRM and WMI is broken by cancelling the existing operation.
  2. Using the DCOM protocol instead of the WinRM protocol, to avoid errors while WinRM is transitioning between start-stop-start states.

To check whether the dsctimer wakeup is activated within WinRM:

  1. Check whether the registry value below is 1: HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System:DSCAutomationHostEnabled
  2. Check whether these files exist under %windir%\system32\Configuration: "MetaConfig.mof", "Pending.mof"
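
The two checks above can be gathered in one call, roughly as sketched here. `Get-DscTimerState` is a hypothetical helper name; the registry path, value name, and file names are the ones listed above, and both locations are parameters so the function can be pointed elsewhere.

```powershell
# Hypothetical helper: collects the two dsctimer indicators in one object.
function Get-DscTimerState {
    param(
        [string] $RegistryPath = 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System',
        [string] $ConfigDir = (Join-Path $env:windir 'System32\Configuration')
    )
    # Returns nothing (no error) when the key or the value is missing.
    $item = Get-ItemProperty -Path $RegistryPath -Name 'DSCAutomationHostEnabled' -ErrorAction SilentlyContinue

    [pscustomobject]@{
        DSCAutomationHostEnabled = if ($item) { $item.DSCAutomationHostEnabled } else { $null }
        MetaConfigMof            = Test-Path -Path (Join-Path $ConfigDir 'MetaConfig.mof')
        PendingMof               = Test-Path -Path (Join-Path $ConfigDir 'Pending.mof')
    }
}

# On an affected host:
# Get-DscTimerState | Format-List
```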

If it still reproduces with those workarounds, please send me a memory dump of the hung WinRM service (svchost.exe, via Task Manager >> select the process >> "Create dump file").

cmyu-gh avatar Jun 19 '18 20:06 cmyu-gh

Hi @cmyu-gh, it seems I missed your answer, sorry for that.

I'm not able to use Start-DSC* with -Force as I use the Pull mode, so the configuration is automatically triggered.

Will the second option apply to Pull mode as well or is it only for the Start-DSC* commands ?

jnury avatar Oct 16 '18 19:10 jnury

Sorry about the late response. In your situation, can you check the registry value below on the repro machines: HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System:DSCAutomationHostEnabled

If it is '1', or it is '2' and .mof files exist under %windir%\system32\Configuration, then that would support my suspicion about the dsctimer plugin of WinRM. Otherwise, can you forward me a dump of the hung WinRM service for further analysis?
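
The rule stated above can be written as a small predicate. `Test-DscTimerSuspected` is a hypothetical name, and the inputs (the registry value and whether .mof files are present) are assumed to have been collected separately.

```powershell
# Hypothetical predicate for the rule above: the dsctimer plugin is suspected
# when the value is 1, or when it is 2 and .mof files are present.
function Test-DscTimerSuspected {
    param(
        [AllowNull()] $HostEnabled,
        [bool] $MofPresent
    )
    # Compare as text so both 1 and '1' are accepted; $null becomes ''.
    $value = [string] $HostEnabled
    return (($value -eq '1') -or (($value -eq '2') -and $MofPresent))
}

# Example:
# Test-DscTimerSuspected -HostEnabled 2 -MofPresent $true
```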

Since you use pull mode, there is a workaround: directly stop the dsctimer plugin within WinRM (it takes effect only after a Restart-Service WinRM step):

  • Set HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System:DSCAutomationHostEnabled to 0.
  • After restarting the WinRM service, retry your execution.

**Please ignore the 2nd option I posted before; it forces the CimSession protocol to DCOM instead of WinRM, a switch available in some management cmdlets.
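
The two steps above could be applied as in this fragment. It is a sketch to adapt, not a tested procedure: it assumes an elevated session on the affected host, and recording the previous value first is an added precaution so the change can be reverted.

```powershell
# Run elevated on the affected host. The key and value name come from the
# steps above; saving the previous value is an assumption added for rollback.
$key  = 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System'
$name = 'DSCAutomationHostEnabled'

# Record the current value (empty if the value does not exist yet).
$previous = (Get-ItemProperty -Path $key -Name $name -ErrorAction SilentlyContinue).$name
Write-Verbose "Previous $name value: $previous" -Verbose

# Step 1: disable the dsctimer plugin.
Set-ItemProperty -Path $key -Name $name -Value 0

# Step 2: restart WinRM so the change takes effect, then retry the deployment.
Restart-Service -Name WinRM -Force

# To revert later, assuming $previous held a value:
# Set-ItemProperty -Path $key -Name $name -Value $previous
```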

cmyu-gh avatar Oct 22 '18 20:10 cmyu-gh

Note, this is still an issue on PowerShell 5.1 on Windows 2019 and the workaround of setting HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System:DSCAutomationHostEnabled to 0 seems to cause problems continuing to apply DSC settings after a reboot.

djwork avatar May 06 '20 20:05 djwork