Mount PVC with reference data to executor
Allow a PVC that holds reference data to be mounted on the executor pod for easy/fast data access.
Requested by the Greek ELIXIR node, see here: https://docs.google.com/spreadsheets/d/1vBFhBQ-nFqhSL5dLjQfOWO6x9BzmV9x6l18p9GYRZdQ/edit#gid=0
Contacts: @zagganas & @vergoulis
While this sounds like a useful feature to increase the performance of TESK in some use cases, I think it goes beyond what TES tries to be: a thin API layer to execute atomic, containerized tasks in any compute backend. I therefore put this issue in the TESK repository, as I could imagine that this could be provided in an implementation-specific manner that does not break the TES/DRS pattern of gaining access to data envisioned by the GA4GH Cloud WS & FASP.
While I lack the technical k8s knowledge to devise a detailed design strategy, I could imagine that one could optionally co-deploy a DRS API service with TESK that gives access to data stored on one or more PVCs mounted in the executor pods of TES tasks. Deployments making use of this setup could then access data on those PVCs without having to rely on network traffic via DRS (even if this means - not sure - that those data are not accessible outside of TESK executor pods). I'm not at all sure if this is feasible, so let's discuss :)
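To make the co-deployed DRS idea a bit more concrete, here is a minimal Python sketch of how a client inside an executor pod might prefer a local path over a network download. It assumes the DRS service advertises a `file` access method whose URL points into the mounted PVC; the function name `pick_access_url`, the `/reference` mount path and the example object are all hypothetical and not part of TESK or any existing DRS implementation.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def pick_access_url(drs_object, mount_path="/reference"):
    """Return a local path if the object is available on the mounted PVC,
    otherwise the first non-file access URL (or None if there is none).

    `drs_object` is a dict shaped like a DRS object response; `mount_path`
    is the (assumed) location where the reference PVC is mounted.
    """
    fallback = None
    for method in drs_object.get("access_methods", []):
        url = method.get("access_url", {}).get("url")
        if url is None:
            continue
        if method.get("type") == "file":
            local = PurePosixPath(urlparse(url).path)
            # Only trust file URLs that actually live under the mount point.
            if PurePosixPath(mount_path) in local.parents:
                return str(local)
        elif fallback is None:
            fallback = url
    return fallback


# Hypothetical DRS object with both a local and a remote access method.
example_object = {
    "id": "hg38-genome",
    "access_methods": [
        {"type": "https", "access_url": {"url": "https://drs.example.org/data/genome.fa"}},
        {"type": "file", "access_url": {"url": "file:///reference/hg38/genome.fa"}},
    ],
}

print(pick_access_url(example_object))
```

Outside of an executor pod (or with a different mount path) the same object would simply resolve to the `https` URL, so the TES/DRS access pattern would stay intact.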
So there are two ideas on the table:
- Define a PVC that will always be mounted on every task pod. The PVC will hold reference/public data and will persist while the `TESK-api` pod is running. This will be in addition to the PVC that is created for tasks. I wonder how data will be copied to the PVC? By hand (`kubectl cp`)?
- Add the option to deploy a `DRS` in the same namespace as TESK that also shares storage with the executors? As far as I can see, `DRS` does not currently have any storage. So maybe there is something I am not understanding about this solution.
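For the first idea, a rough sketch of what "always mount the PVC" could look like at the manifest level, assuming executors are created as Kubernetes Jobs. This is plain Python operating on a manifest dict, not TESK's actual code; the function `mount_reference_pvc`, the claim name `reference-data-pvc`, the volume name and the `/reference` mount path are made up for illustration.

```python
import copy


def mount_reference_pvc(job, claim_name, mount_path="/reference",
                        volume_name="reference-data"):
    """Return a copy of a Job manifest with an existing PVC mounted
    read-only into every container of the pod template."""
    patched = copy.deepcopy(job)
    pod_spec = patched["spec"]["template"]["spec"]

    # Declare the pre-existing claim as a pod volume.
    pod_spec.setdefault("volumes", []).append({
        "name": volume_name,
        "persistentVolumeClaim": {"claimName": claim_name, "readOnly": True},
    })

    # Mount that volume in each container, next to any task-specific mounts.
    for container in pod_spec.get("containers", []):
        container.setdefault("volumeMounts", []).append({
            "name": volume_name,
            "mountPath": mount_path,
            "readOnly": True,
        })
    return patched


# Hypothetical executor Job manifest, reduced to the relevant fields.
executor_job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "task-1234-ex-00"},
    "spec": {"template": {"spec": {
        "restartPolicy": "Never",
        "containers": [{"name": "executor", "image": "alpine", "command": ["ls", "/reference"]}],
    }}},
}

patched_job = mount_reference_pvc(executor_job, "reference-data-pvc")
```

Mounting read-only keeps tasks from modifying the shared data. Note that sharing one claim across pods on different nodes would need a storage class that supports an access mode such as `ReadOnlyMany` or `ReadWriteMany`.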
Same issue here. It seems like multiple repos need to be updated:
https://github.com/broadinstitute/cromwell/issues/2190
And the TES API definition:
https://github.com/ga4gh/task-execution-schemas
Thanks for bumping this, @hex43ver.
I have opened https://github.com/ga4gh/task-execution-schemas/issues/186 to discuss this on a wider scale. Perhaps you want to add your own opinions and use case? :)