Passing ParameterString for Processor arguments fails in Pipeline definition
Describe the bug Passing a ParameterString parameter in the arguments of a Processor throws an error when defining it as part of a pipeline:
TypeError: Object of type ParameterString is not JSON serializable
To reproduce
import os

from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.parameters import ParameterString

# processing_instance_type, role, and sagemaker_session are defined elsewhere
train_or_infer = ParameterString(name="TrainOrInfer", default_value="infer")

script_processor = SKLearnProcessor(
    framework_version='0.20.0',
    instance_count=1,
    instance_type=processing_instance_type,
    role=role,
    sagemaker_session=sagemaker_session,
)

processor_step_args = script_processor.run(
    inputs=[
        ProcessingInput(source="s3://path", destination="/opt/ml/processing/payload"),
    ],
    outputs=[
        ProcessingOutput(output_name="payload_train", source="/opt/ml/processing/processed_payload_train"),
        ProcessingOutput(output_name="payload_infer", source="/opt/ml/processing/processed_payload_infer"),
    ],
    code=os.path.join("localpath", "processing.py"),
    arguments=[
        '--train_or_infer', train_or_infer,
    ],
)
Expected behavior The pipeline definition is printed out.
System information A description of your system. Please provide:
- SageMaker Python SDK version: 2.105.0
- Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): SKLearnProcessor
- Framework version: 0.20.0
- Python version: 3.8
- CPU or GPU: CPU
- Custom Docker image (Y/N): N
Additional context
- I'm passing LocalPipelineSession as sagemaker_session (from sagemaker.workflow.pipeline_context import LocalPipelineSession).
- Removing the pipeline parameter from arguments creates the pipeline definition successfully (see the sketch below).
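For example, a minimal sketch of the working variant, hard-coding the string in place of the pipeline parameter inside the script_processor.run(...) call above:

    arguments=[
        '--train_or_infer', 'infer',  # plain string instead of ParameterString; this compiles
    ]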
This seems similar to my problem; I would also add the label "component pipelines". Have you tried just passing in a string rather than a ParameterString?
Hi @amitpeshwani, thanks for using SageMaker!
I've tried your code snippet with both LocalPipelineSession and PipelineSession under v2.105.0, but I'm not able to reproduce the issue. Both cases passed on my side, and the parameterized arguments worked.
To help us investigate further, could you please provide the information below:
- The code snippet showing how you define the session. On my side, I defined the sessions as:
local_session = LocalPipelineSession()
pipeline_session = PipelineSession()

script_processor = SKLearnProcessor(
    framework_version='0.20.0',
    instance_count=1,
    instance_type=processing_instance_type,
    role=role,
    sagemaker_session=local_session,  # Or pipeline_session
)
- Could you provide us with the entire error log trace?
I tried one more time to test with the normal session:
from sagemaker import Session

sagemaker_session = Session()

script_processor = SKLearnProcessor(
    framework_version='0.20.0',
    instance_count=1,
    instance_type=processing_instance_type,
    role=role,
    sagemaker_session=sagemaker_session,  # <<<<<<<<<<<<
)
This time I can reproduce the error you've seen; see below.
.tox/py39/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py:248: in wrapper
return run_func(*args, **kwargs)
.tox/py39/lib/python3.9/site-packages/sagemaker/processing.py:569: in run
self.latest_job = ProcessingJob.start_new(
.tox/py39/lib/python3.9/site-packages/sagemaker/processing.py:793: in start_new
processor.sagemaker_session.process(**process_args)
.tox/py39/lib/python3.9/site-packages/sagemaker/session.py:943: in process
self._intercept_create_request(process_request, submit, self.process.__name__)
.tox/py39/lib/python3.9/site-packages/sagemaker/session.py:4304: in _intercept_create_request
return create(request)
.tox/py39/lib/python3.9/site-packages/sagemaker/session.py:940: in submit
LOGGER.debug("process request: %s", json.dumps(request, indent=4))
/usr/local/Cellar/[email protected]/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py:234: in dumps
return cls(
/usr/local/Cellar/[email protected]/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:201: in encode
chunks = list(chunks)
/usr/local/Cellar/[email protected]/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:431: in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
/usr/local/Cellar/[email protected]/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:405: in _iterencode_dict
yield from chunks
/usr/local/Cellar/[email protected]/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:405: in _iterencode_dict
yield from chunks
/usr/local/Cellar/[email protected]/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:405: in _iterencode_dict
yield from chunks
/usr/local/Cellar/[email protected]/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:438: in _iterencode
o = _default(o)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <json.encoder.JSONEncoder object at 0x129d27910>, o = ParameterString(name='MyPI', parameter_type=<ParameterTypeEnum.STRING: 'String'>, default_value=None)
def default(self, o):
...
> raise TypeError(f'Object of type {o.__class__.__name__} '
f'is not JSON serializable')
E TypeError: Object of type ParameterString is not JSON serializable
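The failure can also be reproduced in isolation: json.dumps has no encoder for ParameterString, so any request dict that still contains an unresolved parameter raises exactly this error. A minimal sketch (the request keys mirror the ContainerArguments structure of the processing request):

import json

from sagemaker.workflow.parameters import ParameterString

param = ParameterString(name="TrainOrInfer", default_value="infer")
request = {"AppSpecification": {"ContainerArguments": ["--train_or_infer", param]}}

try:
    json.dumps(request, indent=4)
except TypeError as err:
    print(err)  # Object of type ParameterString is not JSON serializable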
Note: only PipelineSession or LocalPipelineSession supports generating the step arguments (processor_step_args = script_processor.run(...)), as they prevent .run() from creating a processing job at SDK compile time and instead return the request body as the step arguments for the pipeline.
Please replace the sagemaker_session below with a LocalPipelineSession and let us know if it works for you:
local_session = LocalPipelineSession()

script_processor = SKLearnProcessor(
    framework_version='0.20.0',
    instance_count=1,
    instance_type=processing_instance_type,
    role=role,
    sagemaker_session=local_session,  # <<<<<<<<<<<<
)
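For reference, a minimal sketch of how the returned step arguments then feed into a pipeline definition, reusing train_or_infer and processor_step_args from the snippets above (the step and pipeline names are illustrative):

from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

step_process = ProcessingStep(
    name="PayloadGenerator",
    step_args=processor_step_args,
)

pipeline = Pipeline(
    name="MyPipeline",  # illustrative name
    parameters=[train_or_infer],
    steps=[step_process],
    sagemaker_session=local_session,
)

print(pipeline.definition())  # compiles the steps into the pipeline JSON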
Hey @qidewenwhen,
Thanks for looking into the issue. As specified in the Additional context section, I'm passing LocalPipelineSession as sagemaker_session (from sagemaker.workflow.pipeline_context import LocalPipelineSession).
The issue occurs when passing the pipeline ParameterString parameter (train_or_infer) in the arguments of script_processor.run().
Following is the code snippet:
boto_session = boto3.Session(region_name="eu-west-1", profile_name='dev')
local_pipeline_session = LocalPipelineSession(
    boto_session=boto_session,
)

train_or_infer = ParameterString(name="TrainOrInfer", default_value='infer')

script_processor = SKLearnProcessor(
    framework_version='0.20.0',
    instance_count=1,
    instance_type=processing_instance_type,
    role=role,
    sagemaker_session=local_pipeline_session,
)

processor_step_args = script_processor.run(
    inputs=[
        ProcessingInput(source="s3://path", destination="/opt/ml/processing/payload"),
    ],
    outputs=[
        ProcessingOutput(output_name="payload_train", source="/opt/ml/processing/processed_payload_train"),
        ProcessingOutput(output_name="payload_infer", source="/opt/ml/processing/processed_payload_infer"),
    ],
    code=os.path.join("/localpath", "processing.py"),
    arguments=[
        '--train_or_infer', train_or_infer,  # <<<<<<<<<<< Pipeline Parameter
    ],
)

step_process = ProcessingStep(
    name="PayloadGenerator",
    step_args=processor_step_args,
)
Error Message:
Pipeline step 'PayloadGenerator' FAILED. Failure message is: TypeError: Object of type ParameterString is not JSON serializable
Pipeline execution dd309799-85bb-4ea3-b7f2-735e5c6eb0be FAILED because step 'PayloadGenerator' failed.
Thanks for the details! It seems the error was raised during pipeline local mode execution rather than at compile time. I can reproduce the issue when starting an execution.
The issue may relate to these code lines: https://github.com/aws/sagemaker-python-sdk/blob/855552cf4c73576e17c51d80336db20394c1dbc2/src/sagemaker/local/pipeline.py#L94-L102
- As ContainerArguments is a list, it recursively invokes _parse_arguments and passes the parameterized train_or_infer into the function.
- However, as train_or_infer is not a dict, it is returned directly, still unresolved (see the sketch below).
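For illustration only, a sketch of the kind of handling the local executor would need: pipeline parameters nested inside lists should be resolved to their execution-time values instead of being returned as-is. The helper below is hypothetical, not the SDK's internal implementation:

from sagemaker.workflow.parameters import Parameter

def resolve_value(obj, parameter_values):
    # Hypothetical helper: recursively substitute pipeline parameters
    # with their execution-time values (falling back to the default).
    if isinstance(obj, Parameter):
        return parameter_values.get(obj.name, obj.default_value)
    if isinstance(obj, dict):
        return {key: resolve_value(val, parameter_values) for key, val in obj.items()}
    if isinstance(obj, list):
        return [resolve_value(item, parameter_values) for item in obj]
    return obj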
Engaging the feature owner to look into it.