TzuHsiang (Andy) Su

Results 47 comments of TzuHsiang (Andy) Su

Update: I think the appengine dependency is interfering with the service. After I removed it, everything works perfectly. My appengine Maven dependency:

```
<dependency>
  <groupId>com.google.appengine</groupId>
  <artifactId>appengine-api-1.0-sdk</artifactId>
  <version>1.9.64</version>
</dependency>
```

@prakhar10 are you still working on this one or can we close this?

> Spitballing a scenario here -
>
> What happens in the case that Kube/Core DNS (or anything node specific tbh) fails to resolve FQDNs for one of the N...

Please see https://lucid.app/lucidchart/1058db28-d6f7-4d35-95db-1231e295b989/edit?viewport_loc=-128%2C270%2C2003%2C1154%2C0_0&invitationId=inv_e2f1449c-560e-458d-81e2-cffc0d2b98e7 for the flow chart of this design.

> If trino-gateway sees the health state is unhealthy, then it will maintain an in-memory storage where it will make Y...

One note: right now the logic to fetch `RUNNING` and `QUEUED` queries is coupled with setting the `healthy` state. See https://github.com/trinodb/trino-gateway/blob/b616d408c9bbb66ec88ec1b26935b63d91adb2ee/gateway-ha/src/main/java/io/trino/gateway/ha/clustermonitor/ClusterStatsJdbcMonitor.java#L85-L87 for `ClusterStatsJdbcMonitor` and https://github.com/trinodb/trino-gateway/blob/b616d408c9bbb66ec88ec1b26935b63d91adb2ee/gateway-ha/src/main/java/io/trino/gateway/ha/clustermonitor/ClusterStatsHttpMonitor.java#L70-L77 for `ClusterStatsHttpMonitor`. If we are...

Another note: once the admin deactivates a Trino backend, trino-gateway should clear that backend's state from both the shared DB and its local cache.
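A rough sketch of that cleanup, just to make the idea concrete. Everything here is hypothetical: `SharedStateStore`, `InMemorySharedStateStore`, `BackendStateTracker`, and `onBackendDeactivated` are made-up names for illustration, not existing trino-gateway classes, and the in-memory store only stands in for whatever shared DB table ends up holding the state.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over the shared DB; not a trino-gateway API.
interface SharedStateStore {
    void deleteState(String backendName);
}

// Stand-in for the shared DB so the sketch runs without a database.
final class InMemorySharedStateStore implements SharedStateStore {
    final Map<String, String> rows = new ConcurrentHashMap<>();

    @Override
    public void deleteState(String backendName) {
        rows.remove(backendName);
    }
}

// Hypothetical per-gateway tracker holding the local cache of backend states.
final class BackendStateTracker {
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final SharedStateStore sharedStore;

    BackendStateTracker(SharedStateStore sharedStore) {
        this.sharedStore = sharedStore;
    }

    void recordState(String backendName, String state) {
        localCache.put(backendName, state);
    }

    Optional<String> cachedState(String backendName) {
        return Optional.ofNullable(localCache.get(backendName));
    }

    // Called when an admin deactivates a backend: drop its state from the
    // shared store first, then from this gateway's local cache.
    void onBackendDeactivated(String backendName) {
        sharedStore.deleteState(backendName);
        localCache.remove(backendName);
    }
}
```

Deleting from the shared store before the local cache is an arbitrary choice in this sketch; other gateway instances would still need their own way to learn about the deactivation and drop their local copies.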

> Sounds all good except one thing. I am not so sure about creating a new table. What would be the reasons for not adding 2 fields to [gateway_backend](https://github.com/trinodb/trino-gateway/blob/main/gateway-ha/src/migrations/gateway-ha.sql#L3-L9) table?...

I'm leaning towards @oneonestar's way for this issue tbh. On second thought, the results from having the gateway check the DB for Trino states could be misleading. For example,...

@oneonestar are you proposing the following:

- Each gateway instance will check trino clusters' health individually, then store the health state of each cluster locally
- When routing, filter out...

> IMO, JMX metrics and JMX endpoint should be used.

Does that mean we need to start another JMX server?