Mount PVC with reference data to executor

Open uniqueg opened this issue 4 years ago • 7 comments

Allow a PVC that holds reference data to be mounted on the executor pod for easy/fast data access.

uniqueg avatar Nov 06 '21 12:11 uniqueg

Requested by the Greek ELIXIR node, see here: https://docs.google.com/spreadsheets/d/1vBFhBQ-nFqhSL5dLjQfOWO6x9BzmV9x6l18p9GYRZdQ/edit#gid=0

Contacts: @zagganas & @vergoulis

uniqueg avatar Nov 06 '21 12:11 uniqueg

While this sounds like a useful feature for improving TESK's performance in some use cases, I think it goes beyond what TES tries to be: a thin API layer for executing atomic, containerized tasks on any compute backend. I therefore put this issue in the TESK repository, as I could imagine it being provided in an implementation-specific manner that does not break the TES/DRS pattern of data access envisioned by the GA4GH Cloud WS & FASP.

While I lack the technical k8s knowledge to devise a detailed design strategy, I could imagine optionally co-deploying a DRS API service with TESK that gives access to data stored on one or more PVCs mounted in the executor pods of TES tasks. Deployments using this setup could then access data on those PVCs without having to rely on network traffic via DRS (even if this means, and I'm not sure it does, that those data are not accessible outside of TESK executor pods). I'm not at all sure this is feasible, so let's discuss :)
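To make the idea a bit more concrete, here is a minimal sketch of how an executor could prefer a local copy on a mounted PVC over a network download. It assumes the co-deployed DRS service would advertise such copies as `file`-type access methods (a type the DRS schema lists), and that the PVC is mounted at `/reference`. The function name `pick_access_url` and the mount path are hypothetical, and nothing here reflects existing TESK behaviour.

```python
def pick_access_url(drs_object, pvc_mount="/reference"):
    """Pick where to read a DRS object from (illustrative sketch only).

    Prefers a `file` access method whose path lies under the mounted PVC
    (assumed mount point: pvc_mount); otherwise falls back to the first
    access method that carries a URL. Returns None if nothing is usable.
    """
    methods = drs_object.get("access_methods", [])
    prefix = pvc_mount.rstrip("/") + "/"

    # First pass: look for a local file under the PVC mount point.
    for method in methods:
        url = (method.get("access_url") or {}).get("url", "")
        if method.get("type") == "file" and url.startswith("file://"):
            path = url[len("file://"):]
            if path.startswith(prefix):
                return path

    # Second pass: fall back to any method with a URL (network access).
    for method in methods:
        url = (method.get("access_url") or {}).get("url")
        if url:
            return url
    return None
```

With this shape, a deployment without the PVC would simply fall through to the regular network URL, so the TES/DRS pattern itself stays untouched.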

uniqueg avatar Nov 06 '21 12:11 uniqueg

So there are two ideas on the table:

  • Define a PVC that will always be mounted on every task pod. The PVC will hold reference/public data and will persist for as long as the TESK API pod is running. This will be in addition to the PVC that is created for each task. I wonder how data would be copied to the PVC. By hand (kubectl cp)?
  • Add the option to deploy a DRS service in the same namespace as TESK that also shares storage with the executors. As far as I can see, DRS does not currently have any storage of its own, so maybe there is something I am not understanding about this solution.
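For the first idea, a rough sketch of what "always mount the PVC" could look like at the pod-spec level is below. It works on a plain dict shaped like a Kubernetes pod spec and uses the standard `volumes` / `volumeMounts` / `persistentVolumeClaim` fields. The helper name `add_reference_pvc`, the claim name, and the mount path are made up for illustration; this is not how TESK currently builds its executor pods.

```python
import copy


def add_reference_pvc(pod_spec, claim_name,
                      mount_path="/reference", volume_name="reference-data"):
    """Return a copy of pod_spec with a read-only PVC mounted in every container.

    pod_spec is a dict shaped like the `spec` of a Kubernetes Pod.
    All names and defaults here are hypothetical.
    """
    spec = copy.deepcopy(pod_spec)  # leave the caller's spec untouched

    # Declare the volume, backed by an existing PVC, as read-only.
    spec.setdefault("volumes", []).append({
        "name": volume_name,
        "persistentVolumeClaim": {"claimName": claim_name, "readOnly": True},
    })

    # Mount it at the same path in every container of the pod.
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append({
            "name": volume_name,
            "mountPath": mount_path,
            "readOnly": True,
        })
    return spec
```

Mounting read-only is a deliberate choice in this sketch: tasks can read the shared reference data but cannot modify it, and how the data gets onto the PVC in the first place is left open.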

lvarin avatar Nov 08 '21 08:11 lvarin

Same issue here. It seems multiple repos need to be updated:

https://github.com/broadinstitute/cromwell/issues/2190

And the TES api definition:

https://github.com/ga4gh/task-execution-schemas

noooonee avatar Aug 18 '22 08:08 noooonee

Thanks for bumping this, @hex43ver.

I have opened https://github.com/ga4gh/task-execution-schemas/issues/186 to discuss this on a wider scale. Perhaps you want to add your own opinions and use case? :)

uniqueg avatar Aug 18 '22 12:08 uniqueg