
484 blob upload stress test

Open munishchouhan opened this issue 1 year ago • 7 comments

This PR is only for testing visibility purposes, not intended to be merged

munishchouhan avatar May 02 '24 13:05 munishchouhan

Getting an error while creating pods: Try 'sleep --help' for more information.

munishchouhan avatar May 02 '24 13:05 munishchouhan

The unknown blob error happens even with just a 5 minute sleep:

munish.chouhan@Munishs-MacBook-Pro ~ % docker pull f938d118d5bc.ngrok.app/wt/2bbeb1a29da7/public/nf-jdk:corretto-17.0.7
corretto-17.0.7: Pulling from wt/2bbeb1a29da7/public/nf-jdk
bf72c394abb7: Downloading
4edf64cf85c0: Pulling fs layer
f938cbd6d06c: Downloading
unknown blob
munish.chouhan@Munishs-MacBook-Pro ~ % docker pull f938d118d5bc.ngrok.app/wt/2bbeb1a29da7/public/nf-jdk:corretto-17.0.7
corretto-17.0.7: Pulling from wt/2bbeb1a29da7/public/nf-jdk
bf72c394abb7: Pulling fs layer
4edf64cf85c0: Pulling fs layer
f938cbd6d06c: Downloading
unknown blob
munish.chouhan@Munishs-MacBook-Pro ~ % wave -i cr.seqera.io/public/nf-jdk:corretto-17.0.7 --wave-endpoint http://localhost:9090
f938d118d5bc.ngrok.app/wt/5a007fbdf24a/public/nf-jdk:corretto-17.0.7
munish.chouhan@Munishs-MacBook-Pro ~ % docker pull f938d118d5bc.ngrok.app/wt/5a007fbdf24a/public/nf-jdk:corretto-17.0.7
corretto-17.0.7: Pulling from wt/5a007fbdf24a/public/nf-jdk
bf72c394abb7: Downloading
4edf64cf85c0: Downloading
f938cbd6d06c: Downloading
unknown blob

munishchouhan avatar May 02 '24 15:05 munishchouhan

weird [Screenshot 2024-05-02 at 18 19 49]

munishchouhan avatar May 02 '24 16:05 munishchouhan

@pditommaso Tested locally:

  1. Pulled the image with a 1 minute sleep
  2. After the pods were created, drained the node
  3. After the drain completed, uncordoned the node
  4. The pull completed after the docker client retried multiple times; there were errors in the logs:

error:

22:44:28.353 [io-executor-thread-7] WARN  i.s.w.s.b.impl.BlobCacheServiceImpl - == Blob cache failed for object 'cr.seqera.io/v2/public/nf-jdk/blobs/sha256:4edf64cf85c039184023bdfaa7e82e8a607c7f0a55286cce0c938431af0d83d3' - cause: 
io.kubernetes.client.openapi.ApiException: 
	at io.kubernetes.client.openapi.ApiClient.handleResponse(ApiClient.java:989)
	at io.kubernetes.client.openapi.ApiClient.execute(ApiClient.java:905)
	at io.kubernetes.client.openapi.apis.CoreV1Api.readNamespacedPodWithHttpInfo(CoreV1Api.java:26769)
	at io.kubernetes.client.openapi.apis.CoreV1Api.readNamespacedPod(CoreV1Api.java:26747)
	at io.seqera.wave.service.k8s.K8sServiceImpl.getPod(K8sServiceImpl.groovy:212)
	at io.seqera.wave.service.k8s.K8sServiceImpl.waitPod(K8sServiceImpl.groovy:441)
	at io.seqera.wave.service.blob.impl.KubeTransferStrategy.transfer(KubeTransferStrategy.groovy:53)
	at io.seqera.wave.service.blob.impl.BlobCacheServiceImpl.store(BlobCacheServiceImpl.groovy:207)
	at io.seqera.wave.service.blob.impl.BlobCacheServiceImpl.storeIfAbsent(BlobCacheServiceImpl.groovy:186)
	at io.seqera.wave.service.blob.impl.BlobCacheServiceImpl.retrieveBlobCache(BlobCacheServiceImpl.groovy:102)
	at io.seqera.wave.controller.RegistryProxyController.fromDownloadResponse(RegistryProxyController.groovy:326)
	at io.seqera.wave.controller.RegistryProxyController.handleDelegate0(RegistryProxyController.groovy:231)
	at io.seqera.wave.controller.RegistryProxyController.handleGet0(RegistryProxyController.groovy:200)
	at io.seqera.wave.controller.RegistryProxyController.handleGet(RegistryProxyController.groovy:141)
	at io.seqera.wave.controller.$RegistryProxyController$Definition$Exec.dispatch(Unknown Source)
	at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:371)
	at io.micronaut.context.DefaultBeanContext$4.invoke(DefaultBeanContext.java:594)
	at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:303)
	at io.micronaut.web.router.RouteMatch.execute(RouteMatch.java:111)
	at io.micronaut.http.context.ServerRequestContext.with(ServerRequestContext.java:103)
	at io.micronaut.http.server.RouteExecutor.lambda$executeRoute$14(RouteExecutor.java:659)
	at reactor.core.publisher.FluxDeferContextual.subscribe(FluxDeferContextual.java:49)
	at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62)
	at reactor.core.publisher.FluxSubscribeOn$SubscribeOnSubscriber.run(FluxSubscribeOn.java:194)
	at io.micronaut.reactive.reactor.instrument.ReactorInstrumentation.lambda$init$0(ReactorInstrumentation.java:62)
	at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)
	at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)
	at io.micronaut.scheduling.instrument.InvocationInstrumenterWrappedCallable.call(InvocationInstrumenterWrappedCallable.java:53)
	at io.micrometer.core.instrument.composite.CompositeTimer.recordCallable(CompositeTimer.java:129)
	at io.micrometer.core.instrument.Timer.lambda$wrap$1(Timer.java:206)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1589)
22:44:28.353 [io-executor-thread-3] WARN  i.s.w.s.b.impl.BlobCacheServiceImpl - == Blob cache failed for object 'cr.seqera.io/v2/public/nf-jdk/blobs/sha256:f938cbd6d06ceb181f12f0acdd75f343bdc5bff5b6253d32c886ea0c75ec1ebb' - cause: 
io.kubernetes.client.openapi.ApiException: 
	at io.kubernetes.client.openapi.ApiClient.handleResponse(ApiClient.java:989)
	at io.kubernetes.client.openapi.ApiClient.execute(ApiClient.java:905)
	at io.kubernetes.client.openapi.apis.CoreV1Api.readNamespacedPodWithHttpInfo(CoreV1Api.java:26769)
	at io.kubernetes.client.openapi.apis.CoreV1Api.readNamespacedPod(CoreV1Api.java:26747)
	at io.seqera.wave.service.k8s.K8sServiceImpl.getPod(K8sServiceImpl.groovy:212)
	at io.seqera.wave.service.k8s.K8sServiceImpl.waitPod(K8sServiceImpl.groovy:441)
	at io.seqera.wave.service.blob.impl.KubeTransferStrategy.transfer(KubeTransferStrategy.groovy:53)
	at io.seqera.wave.service.blob.impl.BlobCacheServiceImpl.store(BlobCacheServiceImpl.groovy:207)
	at io.seqera.wave.service.blob.impl.BlobCacheServiceImpl.storeIfAbsent(BlobCacheServiceImpl.groovy:186)
	at io.seqera.wave.service.blob.impl.BlobCacheServiceImpl.retrieveBlobCache(BlobCacheServiceImpl.groovy:102)
	at io.seqera.wave.controller.RegistryProxyController.fromDownloadResponse(RegistryProxyController.groovy:326)
	at io.seqera.wave.controller.RegistryProxyController.handleDelegate0(RegistryProxyController.groovy:231)
	at io.seqera.wave.controller.RegistryProxyController.handleGet0(RegistryProxyController.groovy:200)
	at io.seqera.wave.controller.RegistryProxyController.handleGet(RegistryProxyController.groovy:141)
	at io.seqera.wave.controller.$RegistryProxyController$Definition$Exec.dispatch(Unknown Source)
	at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:371)
	at io.micronaut.context.DefaultBeanContext$4.invoke(DefaultBeanContext.java:594)
	at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:303)
	at io.micronaut.web.router.RouteMatch.execute(RouteMatch.java:111)
	at io.micronaut.http.context.ServerRequestContext.with(ServerRequestContext.java:103)
	at io.micronaut.http.server.RouteExecutor.lambda$executeRoute$14(RouteExecutor.java:659)
	at reactor.core.publisher.FluxDeferContextual.subscribe(FluxDeferContextual.java:49)
	at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62)
	at reactor.core.publisher.FluxSubscribeOn$SubscribeOnSubscriber.run(FluxSubscribeOn.java:194)
	at io.micronaut.reactive.reactor.instrument.ReactorInstrumentation.lambda$init$0(ReactorInstrumentation.java:62)
	at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)
	at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)
	at io.micronaut.scheduling.instrument.InvocationInstrumenterWrappedCallable.call(InvocationInstrumenterWrappedCallable.java:53)
	at io.micrometer.core.instrument.composite.CompositeTimer.recordCallable(CompositeTimer.java:129)
	at io.micrometer.core.instrument.Timer.lambda$wrap$1(Timer.java:206)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1589)
22:44:28.353 [io-executor-thread-5] WARN  i.s.w.s.b.impl.BlobCacheServiceImpl - == Blob cache failed for object 'cr.seqera.io/v2/public/nf-jdk/blobs/sha256:bf72c394abb748707ec4590d5017f36ad47098c9b92adc1b04c3ea3ba0b395f6' - cause: 
io.kubernetes.client.openapi.ApiException: 
	at io.kubernetes.client.openapi.ApiClient.handleResponse(ApiClient.java:989)
	at io.kubernetes.client.openapi.ApiClient.execute(ApiClient.java:905)
	at io.kubernetes.client.openapi.apis.CoreV1Api.readNamespacedPodWithHttpInfo(CoreV1Api.java:26769)
	at io.kubernetes.client.openapi.apis.CoreV1Api.readNamespacedPod(CoreV1Api.java:26747)
	at io.seqera.wave.service.k8s.K8sServiceImpl.getPod(K8sServiceImpl.groovy:212)
	at io.seqera.wave.service.k8s.K8sServiceImpl.waitPod(K8sServiceImpl.groovy:441)
	at io.seqera.wave.service.blob.impl.KubeTransferStrategy.transfer(KubeTransferStrategy.groovy:53)
	at io.seqera.wave.service.blob.impl.BlobCacheServiceImpl.store(BlobCacheServiceImpl.groovy:207)
	at io.seqera.wave.service.blob.impl.BlobCacheServiceImpl.storeIfAbsent(BlobCacheServiceImpl.groovy:186)
	at io.seqera.wave.service.blob.impl.BlobCacheServiceImpl.retrieveBlobCache(BlobCacheServiceImpl.groovy:102)
	at io.seqera.wave.controller.RegistryProxyController.fromDownloadResponse(RegistryProxyController.groovy:326)
	at io.seqera.wave.controller.RegistryProxyController.handleDelegate0(RegistryProxyController.groovy:231)
	at io.seqera.wave.controller.RegistryProxyController.handleGet0(RegistryProxyController.groovy:200)
	at io.seqera.wave.controller.RegistryProxyController.handleGet(RegistryProxyController.groovy:141)
	at io.seqera.wave.controller.$RegistryProxyController$Definition$Exec.dispatch(Unknown Source)
	at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:371)
	at io.micronaut.context.DefaultBeanContext$4.invoke(DefaultBeanContext.java:594)
	at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:303)
	at io.micronaut.web.router.RouteMatch.execute(RouteMatch.java:111)
	at io.micronaut.http.context.ServerRequestContext.with(ServerRequestContext.java:103)
	at io.micronaut.http.server.RouteExecutor.lambda$executeRoute$14(RouteExecutor.java:659)
	at reactor.core.publisher.FluxDeferContextual.subscribe(FluxDeferContextual.java:49)
	at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62)
	at reactor.core.publisher.FluxSubscribeOn$SubscribeOnSubscriber.run(FluxSubscribeOn.java:194)
	at io.micronaut.reactive.reactor.instrument.ReactorInstrumentation.lambda$init$0(ReactorInstrumentation.java:62)
	at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)
	at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)
	at io.micronaut.scheduling.instrument.InvocationInstrumenterWrappedCallable.call(InvocationInstrumenterWrappedCallable.java:53)
	at io.micrometer.core.instrument.composite.CompositeTimer.recordCallable(CompositeTimer.java:129)
	at io.micrometer.core.instrument.Timer.lambda$wrap$1(Timer.java:206)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1589)
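
All three traces fail at the same place: the pod status lookup (readNamespacedPod) called from waitPod throws while the transfer pod is being evicted. As a rough illustration only, and not Wave's actual code, a polling loop can treat a failed lookup as its own outcome so the caller can decide whether to resubmit the transfer. Everything named here (`PodLookupError`, `wait_pod`, the `"Gone"` and `"Timeout"` results) is hypothetical; only the pod phase names come from Kubernetes.

```python
import time
from typing import Callable


class PodLookupError(Exception):
    """Hypothetical stand-in for a failed pod status lookup (e.g. the pod was evicted)."""


def wait_pod(
    get_phase: Callable[[], str],
    timeout_s: float,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll a pod until it reaches a terminal phase.

    Returns "Succeeded" or "Failed" (Kubernetes pod phases), "Gone" when the
    lookup itself fails, or "Timeout" when the deadline passes first.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        try:
            phase = get_phase()
        except PodLookupError:
            # The pod can no longer be read; report it instead of propagating.
            return "Gone"
        if phase in ("Succeeded", "Failed"):
            return phase
        sleep(poll_s)
    return "Timeout"
```

With a result like `"Gone"`, the caller could start a fresh transfer right away instead of relying on the docker client to retry.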

Client

munish.chouhan@Munishs-MacBook-Pro ~ % wave -i cr.seqera.io/public/nf-jdk:corretto-17.0.7 --wave-endpoint http://localhost:9090
f938d118d5bc.ngrok.app/wt/15913b732c38/public/nf-jdk:corretto-17.0.7
munish.chouhan@Munishs-MacBook-Pro ~ % docker pull f938d118d5bc.ngrok.app/wt/15913b732c38/public/nf-jdk:corretto-17.0.7
corretto-17.0.7: Pulling from wt/15913b732c38/public/nf-jdk
bf72c394abb7: Retrying in 1 second
bf72c394abb7: Pull complete
4edf64cf85c0: Pull complete
f938cbd6d06c: Pull complete
Digest: sha256:5b884ddbca42a76df70e3b3658a5e14da51500a18a357f8e87c18750e4c52adc
Status: Downloaded newer image for f938d118d5bc.ngrok.app/wt/15913b732c38/public/nf-jdk:corretto-17.0.7
f938d118d5bc.ngrok.app/wt/15913b732c38/public/nf-jdk:corretto-17.0.7

K8s

munish.chouhan@Munishs-MacBook-Pro ~ % kubectl drain colima --delete-emptydir-data --force
node/colima already cordoned
Warning: deleting Pods that declare no controller: wave-local/transfer-4657e9b637556c73, wave-local/transfer-7c10506f7e70fd71, wave-local/transfer-ac3dc288d194ee36
evicting pod wave-local/transfer-ac3dc288d194ee36
evicting pod wave-local/transfer-7cab8462c8baea25
evicting pod kube-system/coredns-6799fbcd5-s6tld
evicting pod default/s5cmd-sleeper
evicting pod kube-system/metrics-server-67c658944b-7gjls
evicting pod wave-local/transfer-280fad5e9accaa78
evicting pod wave-local/transfer-4657e9b637556c73
evicting pod kube-system/local-path-provisioner-84db5d44d9-qrn5n
evicting pod wave-local/transfer-8bbfe77fd48232ff
evicting pod wave-local/transfer-7c10506f7e70fd71
pod/transfer-280fad5e9accaa78 evicted
pod/transfer-7cab8462c8baea25 evicted
pod/s5cmd-sleeper evicted
pod/transfer-8bbfe77fd48232ff evicted
pod/coredns-6799fbcd5-s6tld evicted
pod/metrics-server-67c658944b-7gjls evicted
pod/transfer-ac3dc288d194ee36 evicted
pod/local-path-provisioner-84db5d44d9-qrn5n evicted
pod/transfer-4657e9b637556c73 evicted
pod/transfer-7c10506f7e70fd71 evicted
node/colima drained
munish.chouhan@Munishs-MacBook-Pro ~ % kubectl get pods -n wave-local
No resources found in wave-local namespace.
munish.chouhan@Munishs-MacBook-Pro ~ % kubectl get pods -n wave-local
No resources found in wave-local namespace.
munish.chouhan@Munishs-MacBook-Pro ~ % kubectl start colima
error: unknown command "start" for "kubectl"
munish.chouhan@Munishs-MacBook-Pro ~ % kubectl uncordon colima
node/colima uncordoned
munish.chouhan@Munishs-MacBook-Pro ~ % kubectl get pods -n wave-local
NAME                        READY   STATUS    RESTARTS   AGE
transfer-42d59159ae931c4b   1/1     Running   0          97s
transfer-ac6247652b4af59b   1/1     Running   0          97s
transfer-44adb58a6e5cfc5c   1/1     Running   0          97s
munish.chouhan@Munishs-MacBook-Pro ~ % kubectl get pods -n wave-local
NAME                        READY   STATUS      RESTARTS   AGE
transfer-ac6247652b4af59b   0/1     Completed   0          5m57s
transfer-44adb58a6e5cfc5c   0/1     Completed   0          5m57s
transfer-42d59159ae931c4b   0/1     Completed   0          5m57s
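
To make the drain test repeatable, the kubectl steps above could be wrapped in a small script. This is only a sketch: the function names are made up for illustration, and the node (`colima`) and namespace (`wave-local`) are the ones from this local setup. The `runner` argument is there so the sequence can be checked without touching a cluster.

```python
import subprocess
from typing import Callable, List, Sequence


def _run(cmd: Sequence[str]) -> None:
    # Run one kubectl command and stop at the first failure.
    subprocess.run(list(cmd), check=True)


def drain_cycle_commands(node: str, namespace: str) -> List[List[str]]:
    """Return the kubectl commands for one drain/uncordon cycle, in order."""
    return [
        ["kubectl", "get", "pods", "-n", namespace],   # pods before the drain
        ["kubectl", "drain", node, "--delete-emptydir-data", "--force"],
        ["kubectl", "uncordon", node],
        ["kubectl", "get", "pods", "-n", namespace],   # pods after the node is back
    ]


def run_drain_cycle(
    node: str,
    namespace: str,
    runner: Callable[[Sequence[str]], None] = _run,
) -> None:
    """Execute the drain/uncordon cycle with the given runner."""
    for cmd in drain_cycle_commands(node, namespace):
        runner(cmd)
```

Calling `run_drain_cycle("colima", "wave-local")` while a `docker pull` is in progress should reproduce the sequence shown in the transcript.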

munishchouhan avatar May 02 '24 21:05 munishchouhan

I see. Should not the column RESTARTS report something different from zero?

pditommaso avatar May 03 '24 06:05 pditommaso

> I see. Should not the column RESTARTS report something different from zero?

I will test again and post the findings.

munishchouhan avatar May 03 '24 07:05 munishchouhan

On retest, I can see that when the node comes back, a new set of pods is created because of the docker client retry:

munish.chouhan@Munishs-MacBook-Pro blob_cache_testing % kubectl get pods -n wave-local
NAME                        READY   STATUS    RESTARTS   AGE
transfer-ff4b34bdf5bf113c   1/1     Running   0          8s
transfer-fd763895bd074254   1/1     Running   0          8s
transfer-0f24872ccd9e7aa8   1/1     Running   0          8s
munish.chouhan@Munishs-MacBook-Pro blob_cache_testing % kubectl drain colima --delete-emptydir-data --force
node/colima cordoned
Warning: deleting Pods that declare no controller: wave-local/transfer-ff4b34bdf5bf113c, wave-local/transfer-fd763895bd074254, wave-local/transfer-0f24872ccd9e7aa8
evicting pod wave-local/transfer-0f24872ccd9e7aa8
evicting pod kube-system/metrics-server-67c658944b-4lttl
evicting pod wave-local/transfer-ff4b34bdf5bf113c
evicting pod kube-system/coredns-6799fbcd5-kc8r2
evicting pod kube-system/local-path-provisioner-84db5d44d9-d7fgk
evicting pod wave-local/transfer-fd763895bd074254
pod/coredns-6799fbcd5-kc8r2 evicted
pod/metrics-server-67c658944b-4lttl evicted
pod/transfer-0f24872ccd9e7aa8 evicted
pod/transfer-fd763895bd074254 evicted
pod/transfer-ff4b34bdf5bf113c evicted
pod/local-path-provisioner-84db5d44d9-d7fgk evicted
node/colima drained
munish.chouhan@Munishs-MacBook-Pro blob_cache_testing % kubectl uncordon colima
node/colima uncordoned
munish.chouhan@Munishs-MacBook-Pro blob_cache_testing % kubectl uncordon colima
node/colima already uncordoned
munish.chouhan@Munishs-MacBook-Pro blob_cache_testing % kubectl get pods -n wave-local
NAME                        READY   STATUS    RESTARTS   AGE
transfer-fe9c635db1308343   1/1     Running   0          13s
transfer-32b454a16d1d2e94   1/1     Running   0          13s
transfer-274746be8e5f0c75   1/1     Running   0          13s
munish.chouhan@Munishs-MacBook-Pro blob_cache_testing % kubectl get pods -n wave-local
NAME                        READY   STATUS      RESTARTS   AGE
transfer-274746be8e5f0c75   0/1     Completed   0          3m28s
transfer-32b454a16d1d2e94   0/1     Completed   0          3m28s
transfer-fe9c635db1308343   0/1     Completed   0          3m28s
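
On the RESTARTS question: that column counts container restarts inside the same pod object, so pods that are evicted and then replaced by new ones still show 0. A quick way to confirm the pods are new is to compare the names before and after the drain. The sketch below does that on the two listings from this retest; `pod_names` is a hypothetical helper, not part of kubectl or Wave.

```python
from typing import Set


def pod_names(kubectl_output: str) -> Set[str]:
    """Return the pod names from `kubectl get pods` table output."""
    lines = [ln for ln in kubectl_output.strip().splitlines() if ln.strip()]
    # Skip the header row; the pod name is the first column of each data row.
    return {ln.split()[0] for ln in lines[1:]}


before = """
NAME                        READY   STATUS    RESTARTS   AGE
transfer-ff4b34bdf5bf113c   1/1     Running   0          8s
transfer-fd763895bd074254   1/1     Running   0          8s
transfer-0f24872ccd9e7aa8   1/1     Running   0          8s
"""

after = """
NAME                        READY   STATUS    RESTARTS   AGE
transfer-fe9c635db1308343   1/1     Running   0          13s
transfer-32b454a16d1d2e94   1/1     Running   0          13s
transfer-274746be8e5f0c75   1/1     Running   0          13s
"""

# Pods present both before and after the drain would be restart candidates.
survivors = pod_names(before) & pod_names(after)
print(sorted(survivors))  # prints []
```

No name survives the drain, which matches the observation that the pods were recreated rather than restarted.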

munishchouhan avatar May 03 '24 15:05 munishchouhan