[QST] Support Triton ensemble runtime for SageMaker multi-model deployment
❓ Questions & Help
I was wondering whether it is possible to support SageMaker multi-model deployment using a Triton ensemble of Merlin models.
SageMaker already supports multiple hosting modes for model deployment with Triton Inference Server, including multi-model endpoints with the ensemble hosting mode. I tried to use that hosting mode with Triton ensembles of Merlin models, but according to the latest update of the Merlin SageMaker example implementation (#1040), the `--model-control-mode=explicit` flag (required by multi-model hosting for dynamic model loading) was removed.
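For reference, the multi-model endpoint itself can be created with plain boto3 by setting the container's `Mode` to `MultiModel` and invoking with `TargetModel`. This is only a minimal sketch, not the setup from #1040; the image URI, S3 prefix, role, endpoint name, and `payload` are placeholders:

import boto3

sm = boto3.client("sagemaker")

# Placeholders: use the SageMaker Triton image for your region and your own S3 prefix.
triton_image = "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<tag>"
model_prefix = "s3://<bucket>/triton-models/"  # holds one .tar.gz per Triton model repository

sm.create_model(
    ModelName="merlin-triton-mme",
    ExecutionRoleArn="<execution-role-arn>",
    PrimaryContainer={
        "Image": triton_image,
        "ModelDataUrl": model_prefix,
        "Mode": "MultiModel",  # lets SageMaker load/unload models dynamically
    },
)

# ... create_endpoint_config / create_endpoint as usual, then:
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="merlin-triton-mme",
    TargetModel="ensemble_model.tar.gz",  # loaded on first request
    ContentType="application/octet-stream",
    Body=payload,  # binary Triton inference request, assumed built elsewhere
)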
I hypothesize that the cause of this incompatibility is that the generated Merlin `executor_model` is not a proper Triton ensemble: its `config.pbtxt` file has neither the required `platform: "ensemble"` setting nor the required `ensemble_scheduling: {...}` section. It is just another Triton model that executes the `0_transformworkflowtriton` and `1_predictpytorchtriton` steps internally. Therefore, the `executor_model` is not automatically recognized as an ensemble of the `0_transformworkflowtriton` and `1_predictpytorchtriton` models.
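For comparison, a `config.pbtxt` that Triton actually recognizes as an ensemble looks roughly like this (abridged; the tensor names are illustrative, not the real schema):

name: "ensemble_model"
platform: "ensemble"
input [ ... ]
output [ ... ]
ensemble_scheduling {
  step [
    {
      model_name: "0_transformworkflowtriton"
      model_version: -1
      input_map { key: "item_id" value: "item_id" }
      output_map { key: "item_id__values" value: "item_id__values" }
    },
    {
      model_name: "1_predictpytorchtriton"
      model_version: -1
      input_map { key: "item_id__values" value: "item_id__values" }
      output_map { key: "next_item" value: "next_item" }
    }
  ]
}

The executor model's config instead declares a regular backend and runs both steps inside a single model, so Triton never sees an ensemble at all.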
EDIT: I realized that in #255 the Triton ensemble runtime was deprecated and replaced by the current executor model. Would it be possible to support exporting the recommender system artifacts as a Triton ensemble, at least for Transformers4rec system deployment?
I verified that writing the Triton ensemble with its `config.pbtxt` file by hand makes it possible to deploy Transformers4rec systems in multi-model hosting mode on SageMaker.
For anyone facing the same problem, below are the auxiliary functions I used to build the Triton ensemble from the outputs of the merlin-systems export method:
import os
from nvtabular.workflow import Workflow
from transformers4rec import torch as tr
def export_t4rec_triton_ensemble(
data_workflow: Workflow,
model: tr.Model,
executor_model_path: str,
output_path: str = ".",
) -> None:
"""
    Export a Triton ensemble with the `NVTabular` data processing step and the
    `Transformers4rec` model inference step.
Parameters
----------
data_workflow: nvtabular.workflow.Workflow
Data processing workflow.
model: transformers4rec.torch.Model
Recommender model.
executor_model_path: str
        Directory of the exported executor model (containing its `config.pbtxt`).
output_path: str
Output path to save the generated Triton ensemble files.
"""
ensemble_cfg = get_t4rec_triton_ensemble_config(
data_workflow=data_workflow,
model=model,
executor_model_config_path=os.path.join(executor_model_path, "config.pbtxt"),
)
    # Triton expects a versioned layout: ensemble_model/config.pbtxt plus a
    # (possibly empty) version directory such as ensemble_model/1/.
    os.makedirs(os.path.join(output_path, "ensemble_model", "1"), exist_ok=True)
    with open(os.path.join(output_path, "ensemble_model", "config.pbtxt"), "w") as f:
        f.write(ensemble_cfg)
def get_t4rec_triton_ensemble_config(
data_workflow: Workflow,
model: tr.Model,
executor_model_config_path: str,
) -> str:
"""
    Generate the `config.pbtxt` contents for the `Transformers4rec` Triton ensemble.
Parameters
----------
data_workflow: nvtabular.workflow.Workflow
Data processing workflow.
model: transformers4rec.torch.Model
Recommender model.
executor_model_config_path: str
        Path to the `config.pbtxt` file of the exported executor model.
Returns
-------
str
Triton ensemble `config.pbtxt` file contents.
"""
cfg = 'name: "ensemble_model"\nplatform: "ensemble"\n'
    # Ensemble inputs/outputs: reuse the tensor definitions from the executor
    # model's config.pbtxt, skipping its header lines and trailing parameter
    # block (the slice indices match the config layout generated at export).
    with open(executor_model_config_path, "r") as f:
        executor_cfg = f.read()
    cfg += "\n".join(executor_cfg.split("\n")[2:-4]) + "\n"
# Ensemble scheduling
cfg += "ensemble_scheduling {\n\tstep [\n"
## 0_transformworkflowtriton model step
cfg += "\t\t{\n"
cfg += '\t\t\tmodel_name: "%s"\n\t\t\tmodel_version: %s\n' % (
"0_transformworkflowtriton",
"-1",
)
    # Request columns pass straight through to the workflow step.
    for col in data_workflow.input_schema.column_names:
        cfg += '\t\t\tinput_map {\n\t\t\t\tkey: "%s"\n\t\t\t\tvalue: "%s"\n\t\t\t}\n' % (
            col,
            col,
        )
    # List columns are split by the workflow into __values/__offsets tensors.
    for col in model.input_schema.column_names:
        for suffix in ("__values", "__offsets"):
            cfg += '\t\t\toutput_map {\n\t\t\t\tkey: "%s%s"\n\t\t\t\tvalue: "%s%s"\n\t\t\t}\n' % (
                col,
                suffix,
                col,
                suffix,
            )
cfg += "\t\t},\n"
## 1_predictpytorchtriton model step
cfg += "\t\t{\n"
cfg += '\t\t\tmodel_name: "%s"\n\t\t\tmodel_version: %s\n' % ("1_predictpytorchtriton", "-1")
    for col in model.input_schema.column_names:
        for suffix in ("__values", "__offsets"):
            cfg += '\t\t\tinput_map {\n\t\t\t\tkey: "%s%s"\n\t\t\t\tvalue: "%s%s"\n\t\t\t}\n' % (
                col,
                suffix,
                col,
                suffix,
            )
for col in model.output_schema.column_names:
cfg += '\t\t\toutput_map {\n\t\t\t\tkey: "%s"\n\t\t\t\tvalue: "%s"\n\t\t\t}\n' % (
col,
col,
)
cfg += "\t\t}\n"
cfg += "\t]\n}"
    # config.pbtxt parsing is whitespace-insensitive; normalize tabs to spaces.
    cfg = cfg.replace("\t", " ")
return cfg
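For completeness, here is roughly how these helpers slot in after the merlin-systems export. This is a sketch assuming `workflow`, `model`, and a traced `traced_model` already exist, following the Transformers4rec serving example; paths are illustrative:

from merlin.systems.dag import Ensemble
from merlin.systems.dag.ops.pytorch import PredictPyTorch
from merlin.systems.dag.ops.workflow import TransformWorkflow

# Build and export the executor pipeline as in the Transformers4rec example.
pipeline = (
    workflow.input_schema.column_names
    >> TransformWorkflow(workflow)
    >> PredictPyTorch(traced_model, model.input_schema, model.output_schema)
)
ensemble = Ensemble(pipeline, workflow.input_schema)
ensemble.export("/workspace/models")

# Then write the hand-rolled ensemble config next to the exported models.
export_t4rec_triton_ensemble(
    data_workflow=workflow,
    model=model,
    executor_model_path="/workspace/models/executor_model",
    output_path="/workspace/models",
)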
Nevertheless, it would be awesome to support exporting the Triton ensemble artifacts via the merlin-systems library.