TypeError: rename: src should be string, bytes or os.PathLike, not NoneType

Sypek opened this issue 2 years ago • 3 comments

While creating a flow triggered by AWS Step Functions, I get this error on an irregular basis: TypeError: rename: src should be string, bytes or os.PathLike, not NoneType

I'm using python=3.9 and metaflow=2.9.9.

Here is the logstream:

Setting up task environment.
Downloading code package...
Code package downloaded.
Task is starting.
/metaflow/metaflow/__init__.py:194: UserWarning: Unbuilt egg for flow_name [unknown version] (/metaflow)
__version__ = pkg_resources.get_distribution("metaflow").version
/metaflow/metaflow/__init__.py:194: UserWarning: Unbuilt egg for flow_name [unknown version] (/metaflow)
__version__ = pkg_resources.get_distribution("metaflow").version
Bootstrapping environment...
/metaflow/metaflow/__init__.py:194: UserWarning: Unbuilt egg for flow_name [unknown version] (/metaflow)
__version__ = pkg_resources.get_distribution("metaflow").version
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/metaflow/metaflow/plugins/conda/batch_bootstrap.py", line 104, in <module>
    bootstrap_environment(sys.argv[1], sys.argv[2], sys.argv[3])
  File "/metaflow/metaflow/plugins/conda/batch_bootstrap.py", line 16, in bootstrap_environment
    packages = download_conda_packages(flow_name, env_id, datastore_type)
  File "/metaflow/metaflow/plugins/conda/batch_bootstrap.py", line 75, in download_conda_packages
    shutil.move(
  File "/usr/local/lib/python3.9/shutil.py", line 825, in move
    os.rename(src, real_dst)
TypeError: rename: src should be string, bytes or os.PathLike, not NoneType
/metaflow/metaflow/__init__.py:194: UserWarning: Unbuilt egg for flow_name [unknown version] (/metaflow)
__version__ = pkg_resources.get_distribution("metaflow").version

The only manual workaround I found is deleting the .metaflow directory and creating the flow again.
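For reference, the TypeError itself is simply what shutil.move raises when it is handed None as the source path, i.e. when the bootstrap's download step produced nothing for a package. A minimal reproduction (the destination path here is arbitrary):

```python
import shutil

# shutil.move passes src straight through to os.rename, so a None source
# (e.g. a package download that silently returned nothing) fails exactly
# as in the traceback above.
try:
    shutil.move(None, "/tmp/dest")
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

So the bug isn't in shutil at all; the question is why the bootstrap ends up with a None source path in the first place.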

Sypek avatar Aug 08 '23 10:08 Sypek

@Sypek and I were looking into it more in depth, and it seems to come from around this line: https://github.com/Netflix/metaflow/blob/2eb661e428c8f82b910f8c0a5cbb376c1cb439c0/metaflow/plugins/conda/batch_bootstrap.py#L57-L60. It is caused by the cached URLs: the pkg variable holds a correct string value, but the actual S3 content is no longer there. In our case this is likely because the S3 bucket is treated as a temporary one and has a 30-day object lifetime rule, which removes the cached files at some point. We need this rule because of the project's data retention constraints.

Note that the above issue is mentioned in a comment in https://github.com/Netflix/metaflow/blob/2eb661e428c8f82b910f8c0a5cbb376c1cb439c0/metaflow/plugins/conda/batch_bootstrap.py#L34-L38. The cause of the missing files is different there, but it's a similar problem.

So the condition should check not only whether the manifest file is present, but also whether the cached URLs point to objects that still exist.
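A sketch of what such an existence check could look like, assuming the cached URLs are plain s3:// URLs. The helper names (split_s3_url, s3_object_exists) are hypothetical, not part of Metaflow:

```python
from urllib.parse import urlparse

def split_s3_url(s3_url):
    # "s3://bucket/path/to/pkg.tar.bz2" -> ("bucket", "path/to/pkg.tar.bz2")
    parsed = urlparse(s3_url)
    return parsed.netloc, parsed.path.lstrip("/")

def s3_object_exists(s3_url):
    # Hypothetical helper: True only if the cached package URL still points
    # at a live object, so a lifecycle-expired package is detected before
    # the bootstrap tries to shutil.move a download that never happened.
    import boto3  # imported lazily so the parsing helper stays dependency-free
    from botocore.exceptions import ClientError

    bucket, key = split_s3_url(s3_url)
    try:
        boto3.client("s3").head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] in ("404", "NoSuchKey"):
            return False
        raise
```

With something like this, the bootstrap could fall back to re-resolving and re-caching the packages whenever any manifest URL fails the check, instead of passing None on to shutil.move.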

Also, for some reason, the packages in the conda environment seem to be reused from previous runs, or created only at pipeline creation time and never checked later and re-bootstrapped if needed.

To be honest, apart from disabling the S3 lifecycle rule, we don't see any workaround at the moment that doesn't require changing the logic in this file. Also, it seems that the bucket for data and the package cache is shared; if they were two different buckets, we could remove the lifecycle rule for the "system" bucket and keep it for the "data" bucket only.
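If splitting into two buckets isn't an option, a possible middle ground is to scope the lifecycle rule to a data prefix so the package cache is never matched. This is only a sketch; the prefixes and bucket name are assumptions that would need to match your actual datastore layout:

```python
# Expire only objects under an assumed "data/" prefix; objects under the
# package-cache prefix (e.g. "conda/") fall outside this rule's filter
# and are therefore never expired.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "expire-data-only",
            "Status": "Enabled",
            "Filter": {"Prefix": "data/"},
            "Expiration": {"Days": 30},
        }
    ]
}

# Applied with boto3 (not executed here; bucket name is hypothetical):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-metaflow-bucket",
#     LifecycleConfiguration=lifecycle_configuration,
# )
```

This keeps the retention behaviour for data while leaving the cached packages alone, at the cost of the rule no longer covering everything in the bucket.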

jsnowacki avatar Aug 28 '23 13:08 jsnowacki

Getting this same error on Azure when using pypi environments.

Setting up task environment.
Downloading code package...
Code package downloaded.
Task is starting.
Bootstrapping virtual environment...
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/metaflow/metaflow/plugins/pypi/bootstrap.py", line 119, in <module>
    shutil.move(tmpfile, dest)
  File "/usr/local/lib/python3.11/shutil.py", line 825, in move
    os.rename(src, real_dst)
TypeError: rename: src should be string, bytes or os.PathLike, not NoneType

Using Metaflow version 2.10.7 and Python 3.11. It's what you described, @jsnowacki: the conda.manifest file has a path to the wheel in Azure Blob storage, but the file is not there. Even when I rerun the flow to bootstrap the environment, it doesn't put a new wheel file at that path. The only solution I've found is to delete everything under conda/files.pythonhosted.org and then rerun a flow so the wheels get re-downloaded. This only works for a short period before it happens again.

I checked on the Azure blob container and there is no automatic delete policy.

In my scenario, what's happening is that Metaflow is looking for the cryptography 41.0.7 wheel in blob storage based on what's in the conda.manifest file. That wheel doesn't exist there, but I did find cryptography wheels for versions 41.0.6 and 41.0.5. Is it possible that the conda.manifest file is getting updated without the wheel files in storage being updated?

Ashton-Sidhu avatar Dec 05 '23 07:12 Ashton-Sidhu

I've also hit this issue. Another way to fix it is to switch from conda to pypi and back (or vice versa) to force environment recreation.

DanielVZ96 avatar Jan 25 '24 13:01 DanielVZ96