cloudml

Terminal crashes on Windows but job completes.

Open • andrie opened this issue 7 years ago • 4 comments

This may not be an R issue, but something on the CloudML end.

I received a crash report in the terminal, even though the job kept running on CloudML.

This happens after submitting:

cloudml::cloudml_train(...)

Terminal output:

INFO    2018-04-11 17:03:18 +0100       master-replica-0                Copying gs://adv-cloudml-test-195616/r-cloudml/cache/ubuntu_16044_lts/r_3_4_4/r/hms.tar...
INFO    2018-04-11 17:03:18 +0100       master-replica-0                Copying gs://adv-cloudml-test-195616/r-cloudml/cache/ubuntu_16044_lts/r_3_4_4/r/cloudml.tar...
INFO    2018-04-11 17:03:18 +0100       master-replica-0                Copying gs://adv-cloudml-test-195616/r-cloudml/cache/ubuntu_16044_lts/r_3_4_4/r/digest.tar...
INFO    2018-04-11 17:03:18 +0100       master-replica-0                / [0/48 files][    0.0 B/ 61.0 MiB]   0% Done
ERROR: gcloud crashed (IOError): [Errno 0] Error

If you would like to report this issue, please run the following command:
  gcloud feedback

To check gcloud for common problems, please run the following command:
  gcloud info --run-diagnostics
>>> Job 'cloudml_2018_04_11_155929102' is currently running -- please wait...
>>> [state: RUNNING; last updated 2018-04-11 17:03:48]
Execution halted
Error in shell.exec(url) :
  'C:/Users/apdev/OneDrive/github/experiments/cloudml-deployment/runs/cloudml_2018_04_11_155929102/tfruns.d/view.html' not found
Calls: <Anonymous> -> shell.exec
Execution halted

andrie • Apr 12 '18 22:04

This still happens. Another terminal dump, in case it helps:

INFO    2018-04-24 14:54:35 +0100       master-replica-0                / [5/48 files][900.0 KiB/ 61.0 MiB]   1% Done
INFO    2018-04-24 14:54:35 +0100       master-replica-0                Copying gs://adv-cloudml-test-195616/r-cloudml/cache/ubuntu_16044_lts/r_3_4_4/r/packrat.tar...
INFO    2018-04-24 14:54:35 +0100       master-replica-0                / [6/48 files][  3.0 MiB/ 61.0 MiB]   4% Done
ERROR: gcloud crashed (IOError): [Errno 0] Error

andrie • Apr 24 '18 14:04

Most likely this is external, and we would need a consistent repro to open an issue with Google CloudML. I've seen this a couple of times, but I can't reproduce it consistently.

javierluraschi • May 31 '18 22:05

I got the same problem when running mnist_mlp.R (https://github.com/rstudio/keras/blob/master/vignettes/examples/mnist_mlp.R) with cloudml_train on Google Cloud Platform.

I think the download functionality does not work properly. I also don't get the local runs directory that running the mnist_mlp.R script normally creates. I think job_collect is the problem:

cloudml::job_collect('Project Name', destination = '../runs', view = 'save')

does not copy anything into the destination folder.

Any idea what we can do?

R commands:

library(cloudml)
cloudml_train("mnist_mlp.R", config = "config.yml")

config.yml:

trainingInput:
  scaleTier: BASIC
  runtimeVersion: "2.1"
  pythonVersion: "3.7"
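A possible workaround, sketched under assumptions and not verified against this bug: submit without collecting, poll the job state from R, and collect with `view = FALSE` so no browser is opened (the `shell.exec` failure in the first report came from opening `view.html`). It assumes the installed cloudml version supports `cloudml_train(..., collect = FALSE)` returning the submitted job, `job_status()` returning a list with a `state` field, and `job_collect(..., view = FALSE)`. `wait_for_job()` and `train_then_collect()` are hypothetical helpers, not part of cloudml.

```r
# Hypothetical helper (not part of the cloudml package): call get_state()
# until it returns a terminal Cloud ML job state, then return that state.
wait_for_job <- function(get_state, interval = 30, max_polls = 240) {
  terminal <- c("SUCCEEDED", "FAILED", "CANCELLED")
  for (i in seq_len(max_polls)) {
    state <- get_state()
    # isTRUE() treats NULL/NA (e.g. a failed status lookup) as "not done yet"
    if (isTRUE(state %in% terminal)) return(state)
    Sys.sleep(interval)
  }
  stop("job did not reach a terminal state after ", max_polls, " polls")
}

# Hypothetical wrapper. Assumes cloudml_train() accepts collect = FALSE and
# returns the submitted job, that job_status() returns a list with a `state`
# field, and that job_collect() accepts view = FALSE (no browser is opened).
train_then_collect <- function(file, config, destination = "runs") {
  job <- cloudml::cloudml_train(file, config = config, collect = FALSE)
  state <- wait_for_job(function() cloudml::job_status(job)$state)
  if (identical(state, "SUCCEEDED")) {
    cloudml::job_collect(job, destination = destination, view = FALSE)
  }
  invisible(state)
}
```

For example, `train_then_collect("mnist_mlp.R", config = "config.yml")`. Polling from R instead of following the streamed gcloud output might avoid the code path that crashed, but that is untested, and it will not help if job_collect itself copies nothing.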

philipus • May 13 '20 08:05

> Most likely, this is external and we would need a consistent repro to open an issue with Google CloudML. I've seen this a couple times, but I can't hit this consistently.

Has there been any progress on this? I just noticed that the issue has been open for a long time.

philipus • May 14 '20 10:05