Make filesystem modelstore independent of creator's root path
Problem
I was trying to run a service in a docker container with a modelstore I created on my own machine attached to it as a mounted volume. This is not possible right now because the models store a fixed absolute path pointing to the user's system that created them.
It should not be hard to circumvent this by only storing the path starting from `operatorai-model-store/domain/date/artifacts.tar.gz` in the files within the operatorai directory. The path leading up to it can be prepended after the modelstore instance is initiated, since the path to the root directory has to be given anyway.
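To illustrate the idea, here is a minimal sketch using only the Python standard library (this is not modelstore code; the paths are the ones from the example below):

```python
import os

# The absolute path that currently gets written into the metadata file
absolute_path = (
    "/home/some1/Docs/stuff/ml-service/localstorage/"
    "operatorai-model-store/lr_prediction/2022.06.21-10.51.20/artifacts.tar.gz"
)
# The root directory that was passed to ModelStore.from_file_system()
root_directory = "/home/some1/Docs/stuff/ml-service/localstorage"

# Stripping the root leaves only the machine-independent part of the path
relative_path = os.path.relpath(absolute_path, root_directory)
print(relative_path)
# operatorai-model-store/lr_prediction/2022.06.21-10.51.20/artifacts.tar.gz

# On another machine, the full path can be rebuilt from that machine's root
print(os.path.join("/my/local/storage", relative_path))
# /my/local/storage/operatorai-model-store/lr_prediction/2022.06.21-10.51.20/artifacts.tar.gz
```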
Example
For example, I initiate a modelstore like this:
```python
self.__storage = ModelStore.from_file_system(
    root_directory="/my/local/storage",
    create_directory=False
)
```
The directory mounts just fine and an existing operatorai model store exists at that path. However, all the model artifact files are irretrievable, because the absolute path written into the generated metadata files points to the filesystem of the machine that created the models, not the one I chose when I initialized the modelstore.
`9b03e4f4-72d4-4c9b-894d-41a7c646295a.json`

```json
{
  "storage": {
    "type": "file_system",
    "path": "/home/some1/Docs/stuff/ml-service/localstorage/operatorai-model-store/lr_prediction/2022.06.21-10.51.20/artifacts.tar.gz"
  }
}
```
The problem here is the leading path `/home/some1/Docs/stuff/ml-service/localstorage`, which should not be there if I initialize the modelstore on a different machine, for instance if I were to share a modelstore on GitHub with multiple participants running on different systems.
Resolution
- Take the path from the initialization and store it (already being kept in `local.py`): `/my/local/storage`
- Store relative paths for the artifacts that are generated:
  `9b03e4f4-72d4-4c9b-894d-41a7c646295a.json`

  ```json
  {
    "storage": {
      "type": "file_system",
      "path": "operatorai-model-store/lr_prediction/2022.06.21-10.51.20/artifacts.tar.gz"
    }
  }
  ```
  Note the removal of the leading, user-filesystem-dependent path.
- Append the initialization path to the artifact path when needed:

  ```python
  source = f"{self.root_dir}/{storage.path}"
  print(source)
  # >>> /my/local/storage/operatorai-model-store/lr_prediction/2022.06.21-10.51.20/artifacts.tar.gz
  ```
It may be as simple as changing `local.py` to this, but I am not sure:
```python
def _storage_location(self, prefix: str) -> metadata.Storage:
    """Returns a dict of the location the artifact was stored"""
    return metadata.Storage.from_path(
        storage_type="file_system",
        path=self.relative_dir(prefix),
    )

def _get_storage_location(self, meta_data: metadata.Storage) -> str:
    """Extracts the storage location from a meta data dictionary"""
    return f"{self.root_prefix}/{meta_data.path}"
```
Thanks for raising this @hauks96! Am thinking about what could work here, stay tuned 👀
Main thoughts so far:
- I think this is both possible and will make the file system storage implementation work more similarly to the cloud storage ones
- I'm looking through the code base to ensure that I can make this change in a backwards-compatible way (i.e. without breaking current modelstore users who are using the file system storage); one option is sketched below
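For example, treating absolute paths in existing metadata as-is and only prepending the root when the stored path is relative could keep things backwards compatible. A rough sketch of that idea (this helper is illustrative, not the actual change):

```python
import os

def resolve_artifact_path(root_dir: str, stored_path: str) -> str:
    """Hypothetical helper: metadata written by older modelstore versions
    contains absolute paths, while new metadata would store paths that are
    relative to the modelstore root."""
    if os.path.isabs(stored_path):
        # Legacy metadata: use the absolute path unchanged
        return stored_path
    # New metadata: join the relative path onto this machine's root directory
    return os.path.join(root_dir, stored_path)
```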
A very short-term solution that could work in the meantime is to not copy the whole file system directory into your docker container, but to just copy over the model you need for that specific container (I'm assuming you just need 1?). I've been meaning to create an example of this, will do that first.
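For example (hypothetical paths, and assuming the one model you need lives in its own domain/timestamp directory under the modelstore root), the docker build step could copy just that directory instead of mounting the whole store:

```python
import shutil
from pathlib import Path

# Hypothetical example: copy only the single model this container serves
# into the docker build context, instead of mounting the entire modelstore.
store_root = Path("/my/local/storage/operatorai-model-store")
model_dir = store_root / "lr_prediction" / "2022.06.21-10.51.20"

# Mirror the same layout inside the build context so relative paths line up
target = Path("docker-build/operatorai-model-store/lr_prediction/2022.06.21-10.51.20")
shutil.copytree(model_dir, target, dirs_exist_ok=True)
```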
Thanks again for raising this @hauks96 - and apologies for the delay getting to this. I've opened the PR above which should enable this to work; I'll need to test it a bit more before merging & then this can get shipped with the next version of modelstore. Feedback welcome!
This has been released as part of modelstore==0.0.77, thanks again for raising!
- https://github.com/operatorai/modelstore/pull/216