
Error 500 on gateway. Got unknown Status.Failure when a 'Command' was expected.

Open neottil opened this issue 3 years ago • 13 comments

My team has to install Ditto 2.4 on an AWS EKS cluster. The cluster has some restrictions; for example, we cannot create Secrets, ServiceAccounts, or Roles. We therefore made some changes to the official Ditto Helm chart templates (https://github.com/eclipse/packages/tree/master/charts/ditto/templates):

  • Replaced the Secrets with ConfigMaps
  • Used the default service account by setting Values.serviceAccount.create=false
  • Did not deploy the networkPolicy, pdb, and podmonitor templates

With these changes in place, the pods are in Running status.

Running this command inside the nginx pod returns a successful response (200 OK) with the correct list of things:

    curl -v -u ditto:ditto http://release-name-ditto-nginx:8080/api/2/things

Running this command inside the nginx pod, however, fails with an error:

    curl -v -u ditto:ditto http://release-name-ditto-nginx:8080/api/2/whoami

The gateway logs the following:

2022-06-29 16:07:41,517 ERROR [] o.e.d.g.s.e.a.HttpRequestActor akka://ditto-cluster/user/$i - Got unknown Status.Failure when a 'Command' was expected.
java.lang.ClassCastException: class akka.http.javadsl.server.directives.RouteAdapter cannot be cast to class org.eclipse.ditto.base.model.signals.commands.Command (akka.http.javadsl.server.directives.RouteAdapter and org.eclipse.ditto.base.model.signals.commands.Command are in unnamed module of loader 'app')
        at org.eclipse.ditto.gateway.service.endpoints.routes.AbstractRoute.lambda$doHandlePerRequest$162cd935$1(AbstractRoute.java:220)
        at akka.stream.javadsl.Source.$anonfun$map$1(Source.scala:1991)
        at akka.stream.impl.fusing.Map$$anon$1.onPush(Ops.scala:52)
        at akka.stream.impl.fusing.GraphInterpreter.processPush(GraphInterpreter.scala:542)
        at akka.stream.impl.fusing.GraphInterpreter.execute(GraphInterpreter.scala:423)
        at akka.stream.impl.fusing.GraphInterpreterShell.runBatch(ActorGraphInterpreter.scala:650)
        at akka.stream.impl.fusing.GraphInterpreterShell.init(ActorGraphInterpreter.scala:620)
        at akka.stream.impl.fusing.ActorGraphInterpreter.tryInit(ActorGraphInterpreter.scala:727)
        at akka.stream.impl.fusing.ActorGraphInterpreter.preStart(ActorGraphInterpreter.scala:776)
        at akka.actor.Actor.aroundPreStart(Actor.scala:548)
        at akka.actor.Actor.aroundPreStart$(Actor.scala:548)
        at akka.stream.impl.fusing.ActorGraphInterpreter.aroundPreStart(ActorGraphInterpreter.scala:716)
        at akka.actor.ActorCell.create(ActorCell.scala:644)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:514)
        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:536)
        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:295)
        at akka.dispatch.Mailbox.run(Mailbox.scala:230)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
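The exception says that an object of type `RouteAdapter` reached a stream stage in `AbstractRoute.doHandlePerRequest` that cast its element to `Command`; `HttpRequestActor` then received the failure as an unexpected `Status.Failure`. The sketch below reproduces the same class of failure in isolation. The types `Command`, `WhoAmI`, `Route`, and `CastDemo` are hypothetical stand-ins, not Ditto's or Akka's classes, and the code only illustrates the failure mode, not Ditto's actual implementation.

```java
import java.util.Optional;

// Hypothetical stand-ins; these are NOT Ditto's or Akka's types.
interface Command { String name(); }
record WhoAmI() implements Command { public String name() { return "whoami"; } }
record Route(String path) { }

class CastDemo {
    // Mirrors the failing pattern: an unchecked cast of a stream element.
    static Command unsafe(Object element) {
        return (Command) element; // throws ClassCastException for a Route
    }

    // Defensive variant: only accept elements that really are Commands.
    static Optional<Command> safe(Object element) {
        if (element instanceof Command command) {
            return Optional.of(command);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Object[] elements = { new WhoAmI(), new Route("/api/2/whoami") };
        for (Object e : elements) {
            try {
                System.out.println("unsafe: " + unsafe(e).name());
            } catch (ClassCastException ex) {
                System.out.println("unsafe: " + ex.getClass().getSimpleName());
            }
            System.out.println("safe: " + safe(e).map(Command::name).orElse("<not a Command>"));
        }
    }
}
```

The `instanceof` variant avoids the exception, but in a case like this it would only hide the symptom; the open question is why a route object reaches a stage that expects a command at all.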

The /things API is the only one that works; all other APIs return the same error.
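To check which endpoints fail without opening a shell in the nginx pod, the curl checks from this issue can be repeated with a small Java 11+ client. This is a sketch: the base URL and the `ditto:ditto` credentials come from the commands in this issue, the endpoint list is only an example, and `EndpointProbe` is a hypothetical name. It sends the same `Authorization: Basic` header that `curl -u` builds and prints each status code.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;

class EndpointProbe {
    // Builds the same header that curl sends for `-u user:password`.
    static String basicAuth(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // Base URL from the curl commands in this issue; pass another one as the first argument.
        String base = args.length > 0 ? args[0] : "http://release-name-ditto-nginx:8080";
        List<String> paths = List.of("/api/2/things", "/api/2/whoami");
        HttpClient client = HttpClient.newHttpClient();
        for (String path : paths) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(base + path))
                    .header("Authorization", basicAuth("ditto", "ditto"))
                    .GET()
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(path + " -> " + response.statusCode());
        }
    }
}
```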

neottil, Jun 29 '22

Hi,

I tried to reproduce your error locally on k3s. The only difference is that I didn't replace the Secrets with a ConfigMap; I don't think that should matter. Could you provide a way to reproduce this (for example, a values.yaml file)?

kalinkostashki, Jul 05 '22

Hi,

We also tried to reproduce it on a local Kubernetes cluster (the one embedded in Docker for Windows), without success. The EKS cluster is managed by the customer, and we don't have admin privileges. We also tried changing the Ditto version from 2.4.0 to 2.3.2, and with that the product works correctly, but we would like to use the latest release!

Thanks

neottil, Jul 05 '22

Hi,

When this error happens, is there any other information available, for example from Ditto's /status/health endpoint? Have any pods been restarting during these attempts, or are any of them in an inconsistent state?

kalinkostashki, Jul 06 '22

Hi,

The deployed pods are all in Running status. During remote debugging sessions the gateway sometimes restarts, but when the remote debugger is disconnected, the gateway stays up. We called /status/health on the Ditto nginx endpoint, and this is the response:

{"label":"roles","status":"UP","children":[{"label":"expected-roles","status":"UP","details":[{"INFO":{"missing-roles":[],"extra-roles":[]}}]},{"label":"things","status":"UP","children":[{"label":"100.89.11.216:2551","status":"UP","children":[{"label":"persistence","status":"UP"}]}]},{"label":"connectivity","status":"UP","children":[{"label":"100.89.2.196:2551","status":"UP","children":[{"label":"persistence","status":"UP"}]}]},{"label":"concierge","status":"UP","children":[{"label":"100.89.9.67:2551","status":"UP"}]},{"label":"things-search","status":"UP","children":[{"label":"100.89.8.54:2551","status":"UP","children":[{"label":"persistence","status":"UP"},{"label":"backgroundSync","status":"UP","details":[{"INFO":{"enabled":true,"events":[],"progressPersisted":":_","progressIndexed":":_"}}]}]}]},{"label":"policies","status":"UP","children":[{"label":"100.89.8.160:2551","status":"UP","children":[{"label":"persistence","status":"UP"}]}]},{"label":"gateway","status":"UP","children":[{"label":"100.89.7.89:2551","status":"UP"}]}]}
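The health response is a tree of nodes, each with a `label`, a `status`, and optional `children`. Instead of scanning a long single-line JSON document by eye, a small walker can list every node that is not `UP`. This is a sketch under stated assumptions: `HealthNode` and `HealthCheck` are hypothetical names, the record models only `label`, `status`, and `children` (the `details` objects are ignored), and JSON parsing is omitted; any JSON library could map the response into these records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one node of the /status/health response.
record HealthNode(String label, String status, List<HealthNode> children) {
    // Convenience constructor for leaf nodes without children.
    HealthNode(String label, String status) { this(label, status, List.of()); }
}

class HealthCheck {
    // Collects the slash-separated path of every node whose status is not "UP".
    static List<String> notUp(HealthNode node) {
        List<String> result = new ArrayList<>();
        collect(node, node.label(), result);
        return result;
    }

    private static void collect(HealthNode node, String path, List<String> out) {
        if (!"UP".equals(node.status())) {
            out.add(path + " = " + node.status());
        }
        for (HealthNode child : node.children()) {
            collect(child, path + "/" + child.label(), out);
        }
    }

    public static void main(String[] args) {
        // A trimmed-down version of the response posted above.
        HealthNode root = new HealthNode("roles", "UP", List.of(
                new HealthNode("things", "UP", List.of(
                        new HealthNode("100.89.11.216:2551", "UP", List.of(
                                new HealthNode("persistence", "UP"))))),
                new HealthNode("gateway", "UP", List.of(
                        new HealthNode("100.89.7.89:2551", "UP")))));
        List<String> problems = notUp(root);
        System.out.println(problems.isEmpty() ? "all UP" : String.join("\n", problems));
    }
}
```

Applied to the response posted above, every node reports `UP`, which is consistent with the cluster itself looking healthy while individual requests fail.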

neottil, Jul 07 '22

Hi,

Ditto migrated from Java 11 to Java 17 between versions 2.3 and 2.4. Please check whether you have followed the migration notes: https://www.eclipse.org/ditto/release_notes_240.html#migration-notes

Also, are all pods upgraded to 2.4? You shouldn't have a mix of different pod versions.
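One way to rule out a version mix is to collect the image references of all running Ditto pods (for example from the pod specs via kubectl) and group them by tag. The sketch below does only the grouping. `VersionCheck` is a hypothetical name, the `eclipse/ditto-*` image names in the example are assumptions about the Docker Hub naming, and digest references (`@sha256:...`) are not handled.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class VersionCheck {
    // Groups image references (e.g. "repo/name:tag") by their tag.
    // Digest references ("name@sha256:...") are NOT handled by this sketch.
    static Map<String, Set<String>> byTag(List<String> images) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String image : images) {
            int colon = image.lastIndexOf(':');
            int slash = image.lastIndexOf('/');
            // A colon before the last slash belongs to a registry port, not a tag;
            // an untagged reference defaults to "latest".
            String tag = colon > slash ? image.substring(colon + 1) : "latest";
            result.computeIfAbsent(tag, t -> new TreeSet<>()).add(image);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical image list; replace with the images of your running pods.
        List<String> images = List.of(
                "eclipse/ditto-gateway:2.4.0",
                "eclipse/ditto-things:2.4.0",
                "eclipse/ditto-policies:2.3.2");
        Map<String, Set<String>> tags = byTag(images);
        System.out.println(tags.size() == 1 ? "consistent: " + tags.keySet() : "MIXED versions: " + tags);
    }
}
```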

kalinkostashki, Jul 11 '22

Hi,

We did not migrate from 2.3; we used the Helm files from the eclipse/packages repo (https://github.com/eclipse/packages/tree/master/charts/ditto). We are using chart version 2.5.2, and we made some changes because we have to deploy the chart on a "limited" customer platform. Purely as a test, we changed appVersion in the Chart file from 2.4.0 to 2.3.2, and the product worked. This was only a quick test; as you said, there are migration notes to respect.

Now we would like to deploy chart 2.5.2 with Ditto 2.4 (with the changes needed to make it compatible with our customer's platform). Attached are the files we are trying to deploy (some private information has been blurred): helm.zip

neottil, Jul 13 '22

Hi, is there any news?

neottil, Jul 22 '22

Hi,

sorry for the delayed response. Looking at the code where this happens, it is very odd that a ClassCastException would occur there. Without a reproducer it will be very hard to investigate further. Also, is this 2.4.0 image the official Ditto image from Docker Hub, or one built locally from the release-2.4 branch and pushed to your own registry?

kalinkostashki, Jul 25 '22

Hi,

We use the official Ditto image from Docker Hub. We will try to reproduce the error in a local setup. We need the customer to tell us how the Kubernetes cluster they have made available to us is configured. For now we know that it is a multi-cluster setup on AWS and that some resource kinds cannot be used, for example Secret and ServiceAccount. I hope this information gives you some ideas in the meantime.

neottil, Jul 26 '22

Hi,

We have sent you the Helm files we modified; unfortunately, with them we were not able to reproduce the error locally (probably the AWS EKS cluster where we have to deploy the product has some configuration we don't know about). We are wondering whether any of the changes we made to the Helm files could cause the error. By any chance, could you tell us whether any part we modified or removed could be the cause? As indicated in the first post, we cannot deploy network policies, PDBs, pod monitors, ServiceAccounts, or Secrets.

My team and I are working on an ambitious project for an important multinational operating worldwide. Ditto must be the digital twin of hundreds of thousands of devices around the world. The company wants to use open-source products as much as possible and to contribute to the projects it chooses. After doing some research, we think Ditto is the best choice for our purpose. We would like to adapt the deployment to the customer's platform and use version 2.4 or later.

neottil, Jul 27 '22

> My team and I are working on an ambitious project for an important multinational operating worldwide. Ditto must be the digital twin of hundreds of thousands of devices around the world.

Sounds good. 👍

And how exactly do you expect us to solve your problem? We also do not have access to your customer's EKS cluster. Do you expect to get paid for setting up Ditto and get free premium support here?

If you cannot reproduce it locally, neither can we. So I suggest that you try to reproduce it on an AWS EKS cluster yourself, ideally with the same restrictions your customer has set up.

We did not do a "diff" of the Helm files you sent; please do this "diff" yourself and respond with the changes you made to the chart. Maybe we can then get an idea of what the cause might be, although I doubt that the problem originates in the Helm chart.
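In a shell, `diff -r` over the upstream and modified chart directories answers this directly. As a Java alternative, a file-level comparison might look like the sketch below: it walks both directories and reports files that were added, removed, or changed (it does not show line-level differences). `ChartDiff` is a hypothetical name, and the two directory paths are whatever local copies of the charts you have.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

class ChartDiff {
    // Relative paths of all regular files under root, sorted.
    static Set<String> files(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            Set<String> result = new TreeSet<>();
            walk.filter(Files::isRegularFile)
                .forEach(p -> result.add(root.relativize(p).toString()));
            return result;
        }
    }

    // Lists files that were added, removed, or changed between two chart directories.
    static List<String> diff(Path original, Path modified) throws IOException {
        Set<String> all = new TreeSet<>(files(original));
        all.addAll(files(modified));
        List<String> changes = new ArrayList<>();
        for (String rel : all) {
            Path a = original.resolve(rel);
            Path b = modified.resolve(rel);
            if (!Files.exists(b)) changes.add("removed  " + rel);
            else if (!Files.exists(a)) changes.add("added    " + rel);
            else if (Files.mismatch(a, b) != -1) changes.add("changed  " + rel); // Java 12+
        }
        return changes;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ChartDiff <original-chart-dir> <modified-chart-dir>
        diff(Path.of(args[0]), Path.of(args[1])).forEach(System.out::println);
    }
}
```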

thjaeckle, Jul 27 '22

Hi, we are trying to reproduce the error locally or in an environment that we can give you access to. We are not looking for premium support, but for collaboration to understand where the problem lies. Attached is a file listing the changes we made to the chart files.

NOTES.md

neottil, Jul 27 '22

Hi,

just for completeness, I did take a look at your NOTES.md file, but couldn't find anything significant :( Did you perhaps manage to reproduce or fix the issue?

kalinkostashki, Aug 11 '22

When you are able to provide a reproducer, please open a new ticket. Closing this one for now ..

thjaeckle, Sep 15 '22