Cluster terminate messages are not displayed in cluster log
Moved from HPCCloud #597
Been looking at the cluster_log_url and it's
http://localhost:9999/api/v1/clusters/59497ecf0640fd17c3a5fbdf/log
That would be the localhost of the machine, right? The pattern %s/clusters/%s/log appears in a few places, and we're passing different base URLs to it:
- task.taskflow.girder_api_url
- cumulus.config.girder.baseUrl
- getApiUrl()
@TristanWright That is the problem. I think 9999 is the port for the dev server, not Girder. I would guess that in this case the third option is being used. We can't rely on getApiUrl(). We should be using cumulus.config.girder.baseUrl or the girder_api_url from the taskflow object.
I think I have a fix; the culprit was getApiUrl here. Should the other uses of getApiUrl be swapped with cumulus.config.girder.baseUrl?
What I'm a bit curious about is why the URLs stopped working too?
@TristanWright Yes, we should not be using getApiUrl(...) for URLs that end up being called from within a Celery worker. It would be good to clean these up. It's possible that this worked in the past if everything was running on the same machine, or if the hostname inside and outside the VM were the same, not sure :man_shrugging:
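For illustration, here is a minimal sketch of the approach suggested above: build the log URL from an explicitly configured base URL instead of one derived from the incoming request. The helper name cluster_log_url and the example base URL are hypothetical; only the %s/clusters/%s/log pattern and the cluster id come from this thread, and the real code may be structured differently.

```python
def cluster_log_url(base_url, cluster_id):
    """Build the cluster log endpoint from an explicitly supplied base URL."""
    # Same '%s/clusters/%s/log' pattern quoted in this thread. Trailing
    # slashes are stripped so the result never contains a double slash.
    return '%s/clusters/%s/log' % (base_url.rstrip('/'), cluster_id)


# Hypothetical value standing in for cumulus.config.girder.baseUrl
# (or task.taskflow.girder_api_url); not taken from a real deployment.
GIRDER_BASE_URL = 'http://girder.example.org:8080/api/v1'

url = cluster_log_url(GIRDER_BASE_URL, '59497ecf0640fd17c3a5fbdf')
print(url)
```

Because the base URL is passed in rather than looked up from the request, a Celery worker on another host gets an address it can actually reach, provided the configured value is reachable from that worker.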
I cleaned them up in 1da84ba035c1f350acaf95e069655ee5b282a9e5 and fixed the tests along with them.
getApiUrl is still used in three other files:
- /girder/cumulus/server/utility/cluster_adapters.py
- /girder/cumulus/server/volume.py
- /girder/cumulus/server/job.py
None of these should be called from the Celery worker, right?
@TristanWright These should be changed as well.
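To check that no uses are left after the cleanup, a small scan like the following could help. This is only a sketch: find_uses is a hypothetical helper, not part of cumulus, and it does a plain substring match on .py files, so it will also report comments and strings that mention getApiUrl.

```python
import os


def find_uses(root, needle='getApiUrl'):
    """Return (path, line_number, stripped_line) for each match under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Only Python sources are scanned in this sketch.
            if not name.endswith('.py'):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding='utf-8', errors='replace') as f:
                for number, line in enumerate(f, start=1):
                    if needle in line:
                        hits.append((path, number, line.strip()))
    return hits
```

Calling find_uses('girder/cumulus/server') from the repository root (path assumed from the file list above) returns the remaining call sites, which can then be reviewed one by one.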