
Passing ParameterString for Processor arguments fails in Pipeline definition

Open amitpeshwani opened this issue 3 years ago • 7 comments

Describe the bug Passing a ParameterString parameter in the arguments for Processor throws an error when defining it as part of a pipeline.

It fails with TypeError: Object of type ParameterString is not JSON serializable

To reproduce

import os

from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.parameters import ParameterString

train_or_infer = ParameterString(name="TrainOrInfer", default_value="infer")

script_processor = SKLearnProcessor(
    framework_version='0.20.0',
    instance_count=1,
    instance_type=processing_instance_type,
    role=role,
    sagemaker_session=sagemaker_session,
)

processor_step_args = script_processor.run(
    inputs=[
        ProcessingInput(source="s3://path", destination="/opt/ml/processing/payload"),
    ],
    outputs=[
        ProcessingOutput(output_name="payload_train", source="/opt/ml/processing/processed_payload_train"),
        ProcessingOutput(output_name="payload_infer", source="/opt/ml/processing/processed_payload_infer"),
    ],
    code=os.path.join("localpath", "processing.py"),
    arguments=[
        '--train_or_infer', train_or_infer,
    ],
)

Expected behavior The pipeline definition is printed out successfully.

Screenshots or logs If applicable, add screenshots or logs to help explain your problem.

System information A description of your system. Please provide:

  • SageMaker Python SDK version: 2.105.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): SKLearnProcessor
  • Framework version: 0.20.0
  • Python version: 3.8
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Additional context

  • I'm passing LocalPipelineSession as sagemaker_session (from sagemaker.workflow.pipeline_context import LocalPipelineSession).
  • Removing the pipeline parameter from arguments creates the pipeline definition successfully.

amitpeshwani avatar Aug 22 '22 05:08 amitpeshwani

Seems similar to my problem. I would also add the label: "component pipelines". Have you tried just passing in a string rather than a ParameterString?

george-chenvibes avatar Aug 23 '22 13:08 george-chenvibes

Hi @amitpeshwani , thanks for using SageMaker!

I've tried your code snippet with both LocalPipelineSession and PipelineSession under v2.105.0 but I'm not able to reproduce the issue. Both cases passed on my side and the parameterized arguments worked.

To help us for further investigation, could you please provide information below:

  1. The code snippet showing how you define the session. On my side, I defined the session as:

    local_session = LocalPipelineSession()
    pipeline_session = PipelineSession()

    script_processor = SKLearnProcessor(
        framework_version='0.20.0',
        instance_count=1,
        instance_type=processing_instance_type,
        role=role,
        sagemaker_session=local_session,  # Or pipeline_session
    )

  2. The entire error log trace.

qidewenwhen avatar Sep 02 '22 03:09 qidewenwhen

I tried one more time to test with the normal session:

from sagemaker import Session
sagemaker_session = Session()

script_processor = SKLearnProcessor(
    framework_version='0.20.0',
    instance_count=1,
    instance_type=processing_instance_type,
    role=role,
    sagemaker_session=sagemaker_session, # <<<<<<<<<<<<
)

This time I can reproduce the error you've seen, see below.

.tox/py39/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py:248: in wrapper
    return run_func(*args, **kwargs)
.tox/py39/lib/python3.9/site-packages/sagemaker/processing.py:569: in run
    self.latest_job = ProcessingJob.start_new(
.tox/py39/lib/python3.9/site-packages/sagemaker/processing.py:793: in start_new
    processor.sagemaker_session.process(**process_args)
.tox/py39/lib/python3.9/site-packages/sagemaker/session.py:943: in process
    self._intercept_create_request(process_request, submit, self.process.__name__)
.tox/py39/lib/python3.9/site-packages/sagemaker/session.py:4304: in _intercept_create_request
    return create(request)
.tox/py39/lib/python3.9/site-packages/sagemaker/session.py:940: in submit
    LOGGER.debug("process request: %s", json.dumps(request, indent=4))
/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py:234: in dumps
    return cls(
/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:201: in encode
    chunks = list(chunks)
/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:431: in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:405: in _iterencode_dict
    yield from chunks
/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:405: in _iterencode_dict
    yield from chunks
/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:405: in _iterencode_dict
    yield from chunks
/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:438: in _iterencode
    o = _default(o)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <json.encoder.JSONEncoder object at 0x129d27910>, o = ParameterString(name='MyPI', parameter_type=<ParameterTypeEnum.STRING: 'String'>, default_value=None)

    def default(self, o):
...
>       raise TypeError(f'Object of type {o.__class__.__name__} '
                        f'is not JSON serializable')
E       TypeError: Object of type ParameterString is not JSON serializable
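
The failure can be reproduced in isolation: json.dumps has no encoder for arbitrary objects, so any ParameterString left inside the request dict raises. A minimal stdlib-only sketch (FakeParameterString is a stand-in class, not the SDK's):

```python
import json

class FakeParameterString:
    """Stand-in for sagemaker's ParameterString; json.dumps has no encoder for it."""
    def __init__(self, name, default_value=None):
        self.name = name
        self.default_value = default_value

# Shape loosely mimics the processing request dict seen in the traceback above.
request = {
    "AppSpecification": {
        "ContainerArguments": ["--train_or_infer", FakeParameterString("TrainOrInfer")],
    },
}

try:
    json.dumps(request, indent=4)  # the same kind of call made by LOGGER.debug in session.py
except TypeError as e:
    print(e)  # -> Object of type FakeParameterString is not JSON serializable
```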

qidewenwhen avatar Sep 02 '22 03:09 qidewenwhen

Note: only PipelineSession or LocalPipelineSession supports generating the step arguments (processor_step_args = script_processor.run(...)), as they prevent .run from creating a processing job at SDK compile time and instead return the request body as step arguments for the pipeline.
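
The deferral described above can be illustrated with a small stdlib-only sketch; all class and method names here are invented for illustration and do not mirror the SDK's internals:

```python
# Illustrative sketch of the "compile time vs. run time" split: a normal
# session submits the job immediately, while a pipeline-style session
# captures the request body so it can become step arguments later.

class RegularSession:
    def process(self, request):
        # A normal Session would submit the processing job right away.
        return f"started job with {request}"

class PipelineSessionSketch:
    def process(self, request):
        # A pipeline session captures the request instead of submitting it.
        self.captured_request = request
        return None

class ProcessorSketch:
    def __init__(self, session):
        self.session = session

    def run(self, arguments):
        request = {"AppSpecification": {"ContainerArguments": arguments}}
        result = self.session.process(request)
        if isinstance(self.session, PipelineSessionSketch):
            return self.session.captured_request  # step args for the pipeline
        return result

step_args = ProcessorSketch(PipelineSessionSketch()).run(["--train_or_infer", "infer"])
print(step_args)  # -> {'AppSpecification': {'ContainerArguments': ['--train_or_infer', 'infer']}}
```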

Please replace the sagemaker_session below with a local pipeline session and let us know if it works for you:

local_session = LocalPipelineSession()

script_processor = SKLearnProcessor(
    framework_version='0.20.0',
    instance_count=1,
    instance_type=processing_instance_type,
    role=role,
    sagemaker_session=local_session, # <<<<<<<<<<<<
)

qidewenwhen avatar Sep 02 '22 03:09 qidewenwhen

Hey @qidewenwhen ,

Thanks for looking into the issue. As noted in the Additional context section, I'm already passing LocalPipelineSession as sagemaker_session (from sagemaker.workflow.pipeline_context import LocalPipelineSession).

The issue occurs because a pipeline ParameterString (train_or_infer) is passed in the arguments of script_processor.run().

amitpeshwani avatar Sep 02 '22 06:09 amitpeshwani

Following is the code snippet:

boto_session = boto3.Session(region_name="eu-west-1", profile_name='dev')

local_pipeline_session = LocalPipelineSession(
    boto_session=boto_session
)

train_or_infer = ParameterString(name="TrainOrInfer", default_value='infer')

script_processor = SKLearnProcessor(
    framework_version='0.20.0',
    instance_count=1,
    instance_type=processing_instance_type,
    role=role,
    sagemaker_session=local_pipeline_session,
)

processor_step_args = script_processor.run(
    inputs=[
        ProcessingInput(source="s3://path", destination="/opt/ml/processing/payload"),
    ],
    outputs=[
        ProcessingOutput(output_name="payload_train", source="/opt/ml/processing/processed_payload_train"),
        ProcessingOutput(output_name="payload_infer", source="/opt/ml/processing/processed_payload_infer")
    ],
    code=os.path.join("/localpath", "processing.py"),
    arguments=[
        '--train_or_infer', train_or_infer, # <<<<<<<<<<< Pipeline Parameter
    ]
)

step_process = ProcessingStep(
    name="PayloadGenerator",
    step_args=processor_step_args
)

Error Message:

Pipeline step 'PayloadGenerator' FAILED. Failure message is: TypeError: Object of type ParameterString is not JSON serializable
Pipeline execution dd309799-85bb-4ea3-b7f2-735e5c6eb0be FAILED because step 'PayloadGenerator' failed.

amitpeshwani avatar Sep 02 '22 06:09 amitpeshwani

Thanks for the details! It seems the error is raised during pipeline local-mode execution rather than at compile time. I can reproduce the issue when starting an execution.

The issue may relate to these code lines: https://github.com/aws/sagemaker-python-sdk/blob/855552cf4c73576e17c51d80336db20394c1dbc2/src/sagemaker/local/pipeline.py#L94-L102

  • Since the value v (the ContainerArguments entry) is a list, _parse_arguments is invoked recursively and the parameterized train_or_infer is passed into the function.
  • However, since train_or_infer is not a dict, it is returned as-is, unresolved.
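
A stdlib-only sketch of the suspected bug (the function shape is a simplified approximation of the linked pipeline.py code, not a copy, and ParameterSketch is a stand-in class):

```python
class ParameterSketch:
    """Stand-in for a pipeline ParameterString."""
    def __init__(self, name, default_value=None):
        self.name = name
        self.default_value = default_value

def parse_arguments_buggy(value, parameter_values):
    # Dicts and lists recurse, but a bare parameter falls through unresolved,
    # so it later reaches json.dumps and raises TypeError.
    if isinstance(value, dict):
        return {k: parse_arguments_buggy(v, parameter_values) for k, v in value.items()}
    if isinstance(value, list):
        return [parse_arguments_buggy(v, parameter_values) for v in value]
    return value  # a ParameterSketch is returned as-is

def parse_arguments_fixed(value, parameter_values):
    # Same recursion, but parameters are resolved to an override or default.
    if isinstance(value, dict):
        return {k: parse_arguments_fixed(v, parameter_values) for k, v in value.items()}
    if isinstance(value, list):
        return [parse_arguments_fixed(v, parameter_values) for v in value]
    if isinstance(value, ParameterSketch):
        return parameter_values.get(value.name, value.default_value)
    return value

args = ["--train_or_infer", ParameterSketch("TrainOrInfer", "infer")]
print(parse_arguments_buggy(args, {}))  # parameter object survives, unserializable
print(parse_arguments_fixed(args, {}))  # -> ['--train_or_infer', 'infer']
```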

Engaging the feature owner to look into it

qidewenwhen avatar Sep 02 '22 20:09 qidewenwhen

The fix was merged in v2.109.0 a month ago. Closing this issue; feel free to reopen if you have any questions.

qidewenwhen avatar Oct 12 '22 16:10 qidewenwhen