Add rocker/r-parallel template

Open grantmcdermott opened this issue 5 years ago • 12 comments

Hi Mark,

I might be missing something obvious here, but is there a reason that the rocker/r-parallel container isn't among the list of available templates? My thinking is that this could significantly reduce pull+build times of large VM clusters that rely on, say, nested parallelisation.

In other words, instead of

gce_vm_cluster(docker_image = "rocker/r-parallel")

we could specify

gce_vm_cluster(template = "r-parallel")

and take advantage of the off-the-shelf templating speeds.

(I know that I can add+cache my own parallel container in the registry, but this seems a common enough use-case for googleComputeEngineR that others would benefit from it being one of the package templates.)

Happy to contribute if you're short on time... and provided there's no technical barrier that I'm missing.

Cheers, Grant

grantmcdermott avatar May 25 '20 20:05 grantmcdermott

Nothing blocking it as far as I know. Will it alter startup time a lot? I suppose the templates do launch with the image already downloaded?

MarkEdmondson1234 avatar May 25 '20 20:05 MarkEdmondson1234

Yeah, exactly.

TBF the initial download obviously matters less as jobs scale. But — apart from the convenience of providing for a common use-case — my rather selfish motivation is not detracting from the "wow" factor when I demonstrate nested parallelization with googleComputeEngineR on small examples in class!

grantmcdermott avatar May 25 '20 23:05 grantmcdermott

Wow factors are good ;)

It should be simple, as I refactored to try to make new templates easy to add. It involves adding a new startup script and cloud config, named after the template, to the appropriate folders of /inst/, and adding the option to the switch in the function. I can do it, but if you fancy taking a look, either works.
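As a rough illustration of the "add the option to the switch" step, here is a minimal sketch. The function name and file names are hypothetical and are not the package's actual internals; the first entry simply stands in for any template that already exists.

```r
# Hypothetical sketch, not googleComputeEngineR's real code:
# map a template name to the startup script that should run on launch.
template_startup_script <- function(template) {
  switch(template,
    "r-base"     = "r-base.sh",      # stand-in for an existing template
    "r-parallel" = "r-parallel.sh",  # the newly added option
    # An unnamed final argument is switch()'s default branch
    stop("Unknown template: ", template, call. = FALSE)
  )
}

template_startup_script("r-parallel")
```

With this shape, a template that has no entry in the switch fails early with a clear message instead of failing later when its files are read.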

MarkEdmondson1234 avatar May 26 '20 04:05 MarkEdmondson1234

Thanks Mark. Probably best if you do it (and it's not a time suck), since I'd only get around to it once the quarter ends. Two more weeks!

grantmcdermott avatar May 26 '20 04:05 grantmcdermott

I think that's done it, but I haven't tested it yet, if you want to try it.

MarkEdmondson1234 avatar May 26 '20 05:05 MarkEdmondson1234

thanks, just gave it a try but ran into an error:

remotes::install_github("cloudyr/googleComputeEngineR", force = TRUE)
library(googleComputeEngineR)
gce_vm(
  name = "new-vm",
  predefined_type = "n1-standard-4",
  template = "r-parallel"
)
#> 2020-05-25 23:33:51> Creating template VM
#> Error in readChar(the_file, nchars = file.info(the_file)$size) : 
#>   invalid 'nchars' argument
#> In addition: Warning messages:
#> 1: In file(con, "rb") :
#>   file("") only supports open = "w+" and open = "w+b": using the former
#> 2: In readChar(the_file, nchars = file.info(the_file)$size) :
#>   text connection used with readChar(), results may be incorrect

So it looks like it's not recognising the "r-parallel" template. When I check the help documentation for gce_vm_template I don't see the option for "r-parallel" there... which is odd because I just installed from Master and am definitely on the latest SHA.
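One plausible reading of the traceback, assuming the template file is located with system.file() (an assumption, not something confirmed here): system.file() returns an empty string when it cannot find a file, and passing that empty path on to readChar() produces exactly this kind of 'nchars' error. The guarded reader below is a hypothetical helper, not part of the package.

```r
# system.file() does not error on a missing file; it returns "".
missing_path <- system.file("cloudconfig", "no-such-template.yaml", package = "base")
missing_path                  # ""
file.info(missing_path)$size  # NA, which readChar() rejects as 'nchars'

# Hypothetical guarded reader (not part of googleComputeEngineR):
# fail with a readable message before readChar() ever sees a bad path.
read_template_file <- function(path) {
  if (!nzchar(path) || !file.exists(path)) {
    stop("Template file not found: '", path, "'", call. = FALSE)
  }
  readChar(path, nchars = file.info(path)$size)
}
```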

Any ideas? If not, don't worry: Will try again in the morning.

grantmcdermott avatar May 26 '20 06:05 grantmcdermott

I only forgot to build the docs, may work now

MarkEdmondson1234 avatar May 26 '20 06:05 MarkEdmondson1234

Sorry one more thing:

Not to preempt me actually testing, but does the template argument (supplied via ...) override the docker_image argument in gce_vm_cluster()?

From the way the function is set up now, it looks like it will pull from some docker image (defaulting to rocker/r-parallel) regardless of whether a template is provided or not.

grantmcdermott avatar May 26 '20 06:05 grantmcdermott

I only forgot to build the docs, may work now

Better get some rest this side. But will test in the morning. Cheers!

grantmcdermott avatar May 26 '20 06:05 grantmcdermott

No worries, early morning here ;)

MarkEdmondson1234 avatar May 26 '20 06:05 MarkEdmondson1234

Not to preempt me actually testing, but does the template argument (supplied via ...) override the docker_image argument in gce_vm_cluster()? From the way the function is set up now, it looks like it will pull from some docker image (defaulting to rocker/r-parallel) regardless of whether a template is provided or not.

The template determines which startup script runs on VM launch. That script reads the image to run off the VM metadata GCER_DOCKER_IMAGE, so in this case it runs docker run --name=r-parallel rocker/r-parallel. If a docker_image is supplied, it changes the metadata value.

I think this means there won't be much difference in startup speed, and the only gain will be consistency of code, but when you're next available let me know if you see any difference.
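The override described above can be sketched roughly as follows. The function and argument names are hypothetical and this is not the package's internal code; only the metadata key GCER_DOCKER_IMAGE comes from the description.

```r
# Hypothetical sketch of the metadata override, for illustration only.
# template_image: the image the template would launch by default.
# docker_image:   an optional user-supplied image that replaces it.
build_metadata <- function(template_image, docker_image = NULL) {
  image <- if (is.null(docker_image)) template_image else docker_image
  list(GCER_DOCKER_IMAGE = image)
}

# Template default is used when no image is supplied
build_metadata("rocker/r-parallel")

# A supplied docker_image changes the metadata value
build_metadata("rocker/r-parallel", docker_image = "rocker/tidyverse")
```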

MarkEdmondson1234 avatar May 26 '20 07:05 MarkEdmondson1234

Just pulled the latest update and ran into the same error message. Stepping through the debugger, it looks like the fail point is at get_template_file() here.

In particular, it's trying to call the cloudconfig/r-parallel.yaml file when it actually needs (I think) the startupscripts/r-parallel.sh system file. (In turn, that behaviour is determined here.)
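A quick way to check which of the two files an installed copy of the package actually ships is sketched below. The helper name is hypothetical; the folder names and extensions are taken from the two paths mentioned above, and the check relies on system.file() returning "" for anything it cannot find.

```r
# Hypothetical helper: report which candidate template files exist
# in an installed package. Not part of googleComputeEngineR.
template_files <- function(template, package = "googleComputeEngineR") {
  candidates <- c(
    cloudconfig   = system.file("cloudconfig", paste0(template, ".yaml"),
                                package = package),
    startupscript = system.file("startupscripts", paste0(template, ".sh"),
                                package = package)
  )
  # TRUE where system.file() found the file, FALSE where it returned ""
  found <- nzchar(candidates)
  names(found) <- names(candidates)
  found
}

template_files("r-parallel")
```

If only one entry is TRUE, any code path that asks for the other file will receive an empty path, which would be consistent with the readChar() error reported earlier in the thread.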

(My test example is the same as before.)

grantmcdermott avatar May 26 '20 19:05 grantmcdermott