Ability to schedule batch jobs

Open dustindall opened this issue 8 years ago • 6 comments

Ability to schedule batch jobs. Let me know if I should move this request to rAzureBatch.

https://docs.microsoft.com/en-us/rest/api/batchservice/job-schedules

dustindall avatar Aug 15 '17 17:08 dustindall

@dustindall - Can you provide a bit more info about your scenario? Do you just want to create a job with Wait = FALSE and a schedule to run it on, or is there something else you're looking for? If the former, are you thinking of running the same job on a cadence, or just deferring the job to a later date?

paselem avatar Aug 22 '17 23:08 paselem

I was thinking in terms of being able to schedule one-time and routine jobs at a specific day/time, something like Windows Task Scheduler for Azure Batch. I know Azure has many different options for scheduling jobs, including Azure Scheduler and Azure Data Factory (ADF), and I need to do more research to wrap my head around these services.

I was just wondering how much work it would take to build this out using this package, rAzureBatch, or another R package. Reading through the REST API docs, one can add a job with a schedule and a pool specification, where the pool is created when the job runs. However, the scheduling API for Azure Batch does not look as robust or customizable as the Azure Scheduler API. This package takes a different route: it creates a pool first and then adds jobs/tasks to it when foreach is called.
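To make the one-time versus routine distinction concrete, here is a minimal sketch of the `schedule` object that a Batch job schedule request carries. It is written in Python purely to build the JSON-style structure; the same nested list could be built in R. The field names `doNotRunUntil` and `recurrenceInterval` are taken from the job-schedules REST reference linked above as I understand it, and the helpers `iso_duration` and `build_schedule` are hypothetical names, not part of doAzureParallel or rAzureBatch.

```python
from datetime import datetime, timedelta, timezone


def iso_duration(delta):
    """Format a positive timedelta as an ISO 8601 duration, e.g. 'P1DT2H30M'."""
    total = int(delta.total_seconds())
    if total <= 0:
        raise ValueError("duration must be positive")
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    date_part = f"{days}D" if days else ""
    time_part = "".join(
        f"{value}{unit}"
        for value, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if value
    )
    return "P" + date_part + ("T" + time_part if time_part else "")


def build_schedule(start, every=None):
    """Build the 'schedule' part of a job schedule request body.

    start: timezone-aware datetime of the earliest run.
    every: optional timedelta; omit it for a one-time job (assumption:
           a schedule without recurrenceInterval creates a single job).
    """
    schedule = {
        "doNotRunUntil": start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    }
    if every is not None:
        schedule["recurrenceInterval"] = iso_duration(every)
    return schedule


# One-time job at a fixed UTC time, and the same start repeated daily.
start = datetime(2017, 9, 1, 6, 0, tzinfo=timezone.utc)
one_time = build_schedule(start)
daily = build_schedule(start, every=timedelta(days=1))
```

The schedule is only one part of the request; the job specification and pool information would sit alongside it in the same body.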

I was hoping for a pure R solution, since that's what I'm most familiar with. I'm thinking ADF might be the best route, although my task is far from the big data workloads that ADF is geared towards.

Have any thoughts?

dustindall avatar Aug 23 '17 14:08 dustindall

Actually, the Batch scheduler should be able to do both one-offs and routine jobs. It would be reasonably easy to integrate this into the package if we used a pure Batch solution, since the backend service supports it. The hard part is figuring out how to structure the API and which options to surface to make this easy to use. For example, in Batch you can:

  • Create a pool that uses auto-scale and schedule a job against it, or
  • Create a job or scheduled job that creates the pool and then destroys it when the job completes.

Each of these has its own strengths, but communicating that via the API while keeping the package simple can be tough.
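As a rough illustration of how the two options differ at the REST level, the sketch below builds the `poolInfo` part of a job schedule body both ways. It is Python used only to assemble the structure, not package code. The field names (`poolId`, `autoPoolSpecification`, `poolLifetimeOption`, `keepAlive`, `targetDedicatedNodes`) are from my reading of the Batch REST reference and should be checked against the API version in use; the function names and example values are hypothetical, and a real auto pool specification needs more settings (for example the VM image configuration) than shown here.

```python
import json


def existing_pool_info(pool_id):
    """Option 1: run the scheduled job against a pool that already exists."""
    return {"poolId": pool_id}


def auto_pool_info(prefix, vm_size, node_count):
    """Option 2: let Batch create a pool for the job and delete it afterwards."""
    return {
        "autoPoolSpecification": {
            "autoPoolIdPrefix": prefix,
            "poolLifetimeOption": "job",  # pool lives only as long as each job
            "keepAlive": False,           # delete the pool when the job completes
            "pool": {
                "vmSize": vm_size,
                "targetDedicatedNodes": node_count,
                # Image/OS configuration omitted in this sketch.
            },
        }
    }


def job_schedule_body(schedule_id, schedule, pool_info):
    """Assemble a job schedule request body from its three parts."""
    return {
        "id": schedule_id,
        "schedule": schedule,
        "jobSpecification": {"poolInfo": pool_info},
    }


# Hypothetical ids and sizes, for illustration only.
schedule = {"doNotRunUntil": "2017-09-01T06:00:00Z", "recurrenceInterval": "P1D"}
shared = job_schedule_body("nightly-shared", schedule, existing_pool_info("my-pool"))
ephemeral = job_schedule_body(
    "nightly-auto", schedule, auto_pool_info("nightly", "Standard_D2_v2", 3)
)
payload = json.dumps(ephemeral, indent=2)
```

The only difference between the two bodies is the `poolInfo` value, which suggests a package-level API could expose this as a single choice rather than two separate code paths.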

As a side note, Azure Scheduler and ADF are good for scheduling especially when you want to tether multiple services together. In this case, my feeling is that Batch should be sufficient.

paselem avatar Aug 23 '17 15:08 paselem

Okay, maybe you can put this down for the next version or "a want" for the future? I'll continue to give it some thought and maybe I can put something together to help.

dustindall avatar Aug 23 '17 17:08 dustindall

One of our upcoming milestones is going to be job-centric, focused on figuring out how to provide a good experience around long-running jobs. I feel that this would be a good fit for that milestone.

Heads up that we may need to push that milestone out and fit in a debugging and troubleshooting milestone first. Several people have reached out for help figuring out issues, and we feel that is slightly more important for having a good self-service and support story with our existing features.

paselem avatar Aug 23 '17 18:08 paselem

I agree. It's been a learning experience for me trying to debug issues I've encountered. Any work towards this will definitely be valuable as the package matures and more people start using it.

dustindall avatar Aug 23 '17 22:08 dustindall