agnhost throws `Class not registered` in HPC container with containerd 1.7.1

Open AbelHu opened this issue 2 years ago • 24 comments

Describe the bug: agnhost throws `Class not registered` in a HostProcess container (HPC) with containerd 1.7.1

• HPC:

k logs agnhost-win
 Start-Process : This command cannot be run due to the error: Class not registered.
At line:1 char:1
+ Start-Process -Wait -FilePath 'C:/hpc/agnhost' -ArgumentList 'help';  ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Start-Process], InvalidOperationException
    + FullyQualifiedErrorId : InvalidOperationException,Microsoft.PowerShell.Commands.StartProcessCommand

• Non-HPC:

k logs agnhost-win
Usage: app [command]
Available Commands:
  audit-proxy                           Listens on port 8080 for incoming audit events
  completion                            Generate the autocompletion script for the specified shell
  connect                               Attempts a TCP, UDP or SCTP connection and returns useful errors
  crd-conversion-webhook                Starts HTTP server on port 443 for testing CustomResourceConversionWebhook
  dns-server-list                       Prints the host's DNS Server list
  dns-suffix                            Prints the host's DNS suffix list
  entrypoint-tester                     Prints the args it's passed and exits
  etc-hosts                             Prints the host's /etc/hosts file
  fake-gitserver                        Fakes a git server
  grpc-health-checking                  Starts a simple grpc health checking endpoint
  guestbook                             Creates a HTTP server with various endpoints representing a guestbook app
  help                                  Help about any command

To Reproduce:

  1. Create a Windows 2022 agent pool with k8s 1.27 and containerd 1.7.1
  2. Deploy test pod yaml

• HPC pod yaml:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: agnhost-win
    hostnetwork: "true"
  name: agnhost-win
spec:
  containers:
  - command: ["powershell", "-c", "Start-Process -Wait -FilePath 'C:/hpc/agnhost' -ArgumentList 'help'; Sleep 16800"]
    image: registry.k8s.io/e2e-test-images/agnhost:2.40
    imagePullPolicy: IfNotPresent
    name: agnhost-win
    ports:
    - containerPort: 18080
      hostPort: 18080
      protocol: TCP
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
  hostNetwork: true
  nodeSelector:
    kubernetes.io/os: windows
  restartPolicy: Always
  securityContext:
    windowsOptions:
      hostProcess: true
      runAsUserName: NT AUTHORITY\SYSTEM
  terminationGracePeriodSeconds: 30

• Non-HPC pod yaml:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: agnhost-win
    hostnetwork: "true"
  name: agnhost-win
spec:
  containers:
  - command: ["powershell", "-c", "Start-Process -Wait -FilePath 'C:/agnhost' -ArgumentList 'help'; Sleep 16800"]
    image: registry.k8s.io/e2e-test-images/agnhost:2.40
    imagePullPolicy: IfNotPresent
    name: agnhost-win
    ports:
    - containerPort: 18080
      hostPort: 18080
      protocol: TCP
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
  nodeSelector:
    kubernetes.io/os: windows
  restartPolicy: Always
  terminationGracePeriodSeconds: 30

Expected behavior: The HPC container should produce the same output as the non-HPC container.

Configuration:

  • Edition: Windows 2022
  • Base Image being used: registry.k8s.io/e2e-test-images/agnhost:2.40
  • Container engine: containerd v1.7.1
  • Container Engine version: 1.7.1-azure

Additional context: We found this while testing Windows containerd v1.7.1 on AKS.

AbelHu avatar May 31 '23 02:05 AbelHu

Thanks to @gaopenghigh for pointing out that adding the env below makes it work:

    env:
    - name: PATHEXT
      value: ".COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.CPL;;"

Working HPC yaml:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: agnhost-win
    hostnetwork: "true"
  name: agnhost-win
spec:
  containers:
  - command: ["powershell", "-c", "%CONTAINER_SANDBOX_MOUNT_POINT%/agnhost help; Sleep 16800"]
    image: registry.k8s.io/e2e-test-images/agnhost:2.40
    imagePullPolicy: IfNotPresent
    name: agnhost-win
    env:
    - name: PATHEXT
      value: ".COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.CPL;;"
    ports:
    - containerPort: 18080
      hostPort: 18080
      protocol: TCP
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
  hostNetwork: true
  nodeSelector:
    kubernetes.io/os: windows
  restartPolicy: Always
  securityContext:
    windowsOptions:
      hostProcess: true
      runAsUserName: NT AUTHORITY\SYSTEM
  terminationGracePeriodSeconds: 30
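
For anyone who needs to apply the same PATHEXT workaround to several manifests, here is a minimal sketch in Python. It assumes the Pod manifest has already been loaded into a plain dict (for example from JSON); `add_pathext_env` is a hypothetical helper name, not part of kubectl or any Kubernetes client, and the value is copied from the workaround above. It only adds the env entry and does not change the container command.

```python
import copy
import json

# Value copied verbatim from the workaround in this thread.
PATHEXT_VALUE = ".COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.CPL;;"


def add_pathext_env(pod, value=PATHEXT_VALUE):
    """Return a copy of a Pod manifest dict with PATHEXT set on every container.

    Hypothetical helper: a container that already defines PATHEXT is left as-is.
    """
    patched = copy.deepcopy(pod)  # do not mutate the caller's manifest
    for container in patched.get("spec", {}).get("containers", []):
        env = container.setdefault("env", [])
        if not any(entry.get("name") == "PATHEXT" for entry in env):
            env.append({"name": "PATHEXT", "value": value})
    return patched


if __name__ == "__main__":
    # Trimmed-down version of the HPC pod from this issue, for illustration only.
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "agnhost-win"},
        "spec": {
            "containers": [
                {
                    "name": "agnhost-win",
                    "image": "registry.k8s.io/e2e-test-images/agnhost:2.40",
                }
            ],
            "hostNetwork": True,
        },
    }
    print(json.dumps(add_pathext_env(pod), indent=2))
```

The helper works on a deep copy so the original manifest stays untouched, and running it twice on the same manifest does not add a duplicate entry.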

AbelHu avatar May 31 '23 07:05 AbelHu

Were you able to get your question answered @AbelHu?

fady-azmy-msft avatar May 31 '23 17:05 fady-azmy-msft

@fady-azmy-msft The workaround works, but this looks like a regression, so it will block every customer until they apply the same workaround.

AbelHu avatar Jun 01 '23 01:06 AbelHu

FYI @msscotb and @kiashok

fady-azmy-msft avatar Jun 02 '23 16:06 fady-azmy-msft

Successfully repro'd by Dawei. Working on a mitigation strategy.

ntrappe-msft avatar Jun 15 '23 00:06 ntrappe-msft

This issue has been open for 30 days with no updates. @msscotb, @kiashok, please provide an update or close this issue.

We're still looking into this, but don't have a timeline to share.

fady-azmy-msft avatar Jul 19 '23 17:07 fady-azmy-msft

This issue has been open for 30 days with no updates. @msscotb, @kiashok, please provide an update or close this issue.

We are actively looking into this. Don't have a timeline yet - will share more details on this once we have them.

kiashok avatar Sep 18 '23 16:09 kiashok

This issue has been open for 30 days with no updates. @msscotb, @kiashok, please provide an update or close this issue.
