Sagemaker doesn't convert path to module name properly
Describe the bug
I have the following project structure:
project
├── my_pacakge/
├── pipeline/
│   └── train.py
└── setup.py
I create the following estimator:
estimator = PyTorch(
    entry_point="pipeline/train.py",
    source_dir=".",
    role=role,
    py_version="py38",
    framework_version="1.10",
    instance_count=1,
    instance_type="ml.g5.2xlarge",
    session=session,
)
When training starts, it tries to run the following command and produces an error:
Invoking script with the following command:
/opt/conda/bin/python3.8 -m pipeline/train
/opt/conda/bin/python3.8: No module named pipeline/train
Clearly, it didn't convert the file path to a module name properly. It should be pipeline.train.
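To make the expected conversion concrete, here is a minimal sketch. `path_to_module_name` is a hypothetical helper written for this report, not a function from the SageMaker SDK or the training toolkit; it only illustrates the mapping from a relative script path to a dotted module name.

```python
from pathlib import PurePosixPath


def path_to_module_name(entry_point: str) -> str:
    """Turn a relative script path such as 'pipeline/train.py' into 'pipeline.train'.

    Hypothetical helper for illustration; not an SDK function.
    """
    path = PurePosixPath(entry_point)
    if path.is_absolute():
        raise ValueError(f"entry_point must be a relative path, got {entry_point!r}")
    # Drop the '.py' suffix and join the remaining path components with dots.
    return ".".join(path.with_suffix("").parts)


print(path_to_module_name("pipeline/train.py"))  # pipeline.train
print(path_to_module_name("train.py"))           # train
```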
If I try to specify entry_point="pipeline.train" during Estimator creation, it gives an error when I try to run .fit:
ValueError: No file named "pipeline.train" was found in directory ".".
To reproduce
See above
Expected behavior
Either convert the file path to a module name properly, or let us specify a module as entry_point.
Screenshots or logs
N/A
System information
Sagemaker Studio
Additional context
N/A
Hi @Rizhiy, according to the SageMaker docs, "If source_dir is specified, then entry_point must point to a file located at the root of source_dir." I believe this is coming up because you are passing source_dir together with an entry_point that is a nested path.
Can you try moving train.py to the root of the project?
That works, but it would be great if it checked for that restriction properly at the point of initialisation.
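Something along these lines is what I have in mind for an early check. This is only a sketch: `validate_entry_point` is a hypothetical function, not the SDK's actual validation, and it assumes the restriction is exactly the one quoted from the docs above.

```python
import os
from typing import Optional


def validate_entry_point(entry_point: str, source_dir: Optional[str]) -> None:
    """Raise early if entry_point is not a file at the root of source_dir.

    Hypothetical check for illustration; not the SDK's actual validation.
    """
    if source_dir is None:
        # The quoted restriction only applies when source_dir is specified.
        return
    if os.path.dirname(entry_point):
        # A non-empty directory part means the script is nested below the root.
        raise ValueError(
            f'entry_point "{entry_point}" must be a file at the root of '
            f'source_dir "{source_dir}", not a nested path.'
        )


validate_entry_point("train.py", ".")  # passes silently
try:
    validate_entry_point("pipeline/train.py", ".")
except ValueError as err:
    print(err)
```

Failing at estimator creation like this would surface the problem before a training job is launched, instead of in the container logs.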
Also, this greatly limits how I can structure my projects, basically forcing me to create a new repo for every project I want to run inside SageMaker. Would it be possible to just run the Python file by its path as is, without treating it as a module?
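In the meantime, one possible workaround that keeps the nested layout is a thin launcher at the root of source_dir that runs the real script by path. This is an untested sketch: `run_script` and the launcher file name are hypothetical, and it assumes the whole source_dir is available in the container's working directory when the entry point runs. It relies only on the standard library's `runpy.run_path`.

```python
import runpy


def run_script(script_path: str) -> dict:
    """Execute a Python file by path, similar to `python script_path`.

    Hypothetical launcher helper; returns the script's resulting globals.
    """
    # run_path loads the file by path, so no module name is needed.
    # run_name="__main__" makes the script's `if __name__ == "__main__":` block run.
    return runpy.run_path(script_path, run_name="__main__")
```

A root-level file, say launch.py, could then call `run_script("pipeline/train.py")` and be passed as entry_point="launch.py". Command-line arguments stay in sys.argv, so the nested script can still parse them.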