[QST] Support Triton ensemble runtime for SageMaker multi-model deployment
❓ Questions & Help
I was wondering whether it is possible to support SageMaker multi-model deployment using a Triton ensemble of Merlin models.
SageMaker already supports multiple hosting modes for model deployment with Triton Inference Server, including multi-model endpoints with the ensemble hosting mode. I tried to use that hosting mode with Triton ensembles of Merlin models, but according to the latest update of the Merlin SageMaker example implementation (#1040), the `--model-control-mode=explicit` flag (required by multi-model hosting for dynamic model loading) was removed.
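For reference, the multi-model endpoint itself can be created with plain boto3 by setting the container's `Mode` to `MultiModel` and invoking with `TargetModel`. This is only a minimal sketch, not the setup from #1040; the image URI, S3 prefix, role, endpoint name, and `payload` are placeholders:

import boto3

sm = boto3.client("sagemaker")

# Placeholders: use the SageMaker Triton image for your region and your own S3 prefix.
triton_image = "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<tag>"
model_prefix = "s3://<bucket>/triton-models/"  # holds one .tar.gz per Triton model repository

sm.create_model(
    ModelName="merlin-triton-mme",
    ExecutionRoleArn="<execution-role-arn>",
    PrimaryContainer={
        "Image": triton_image,
        "ModelDataUrl": model_prefix,
        "Mode": "MultiModel",  # lets SageMaker load/unload models dynamically
    },
)

# ... create_endpoint_config / create_endpoint as usual, then:
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="merlin-triton-mme",
    TargetModel="ensemble_model.tar.gz",  # loaded on first request
    ContentType="application/octet-stream",
    Body=payload,  # binary Triton inference request, assumed built elsewhere
)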
I hypothesize that the cause of this incompatibility is that the generated Merlin `executor_model` is not a proper Triton ensemble: its `config.pbtxt` file has neither the required `platform: "ensemble"` setting nor the required `ensemble_scheduling: {...}` section. It is just another Triton model that executes the `0_transformworkflowtriton` and `1_predictpytorchtriton` steps internally. Therefore, the `executor_model` is not automatically recognized as an ensemble of the `0_transformworkflowtriton` and `1_predictpytorchtriton` models.
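For comparison, a `config.pbtxt` that Triton actually recognizes as an ensemble looks roughly like this (abridged; the tensor names are illustrative, not the real schema):

name: "ensemble_model"
platform: "ensemble"
input [ ... ]
output [ ... ]
ensemble_scheduling {
  step [
    {
      model_name: "0_transformworkflowtriton"
      model_version: -1
      input_map { key: "item_id" value: "item_id" }
      output_map { key: "item_id__values" value: "item_id__values" }
    },
    {
      model_name: "1_predictpytorchtriton"
      model_version: -1
      input_map { key: "item_id__values" value: "item_id__values" }
      output_map { key: "next_item" value: "next_item" }
    }
  ]
}

The executor model's config instead declares a regular backend and runs both steps inside a single model, so Triton never sees an ensemble at all.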
EDIT: I realized that in #255 the Triton ensemble runtime was deprecated and replaced by the current executor model. Would it be possible to support exporting the recommender system artifacts as a Triton ensemble, at least for Transformers4rec system deployment?
I verified that writing the Triton ensemble with its `config.pbtxt` file by hand makes it possible to deploy Transformers4rec systems in multi-model hosting mode on SageMaker.
For anyone facing the same problem, below are the auxiliary functions I used to build the Triton ensemble from the outputs of the merlin-systems export method:
import os
from nvtabular.workflow import Workflow
from transformers4rec import torch as tr
def export_t4rec_triton_ensemble(
data_workflow: Workflow,
model: tr.Model,
executor_model_path: str,
output_path: str = ".",
) -> None:
"""
    Export a Triton ensemble with the `NVTabular` data processing step and the
    `Transformers4rec` model inference step.
Parameters
----------
data_workflow: nvtabular.workflow.Workflow
Data processing workflow.
model: transformers4rec.torch.Model
Recommender model.
executor_model_path: str
        Directory of the exported executor model (containing its `config.pbtxt`).
output_path: str
Output path to save the generated Triton ensemble files.
"""
ensemble_cfg = get_t4rec_triton_ensemble_config(
data_workflow=data_workflow,
model=model,
executor_model_config_path=os.path.join(executor_model_path, "config.pbtxt"),
)
    # Triton expects a versioned layout: ensemble_model/config.pbtxt plus a
    # (possibly empty) version directory such as ensemble_model/1/.
    os.makedirs(os.path.join(output_path, "ensemble_model", "1"), exist_ok=True)
    with open(os.path.join(output_path, "ensemble_model", "config.pbtxt"), "w") as f:
        f.write(ensemble_cfg)
def get_t4rec_triton_ensemble_config(
data_workflow: Workflow,
model: tr.Model,
executor_model_config_path: str,
) -> str:
"""
    Generate the `config.pbtxt` contents for the `Transformers4rec` Triton ensemble.
Parameters
----------
data_workflow: nvtabular.workflow.Workflow
Data processing workflow.
model: transformers4rec.torch.Model
Recommender model.
executor_model_config_path: str
        Path to the `config.pbtxt` file of the exported executor model.
Returns
-------
str
Triton ensemble `config.pbtxt` file contents.
"""
cfg = 'name: "ensemble_model"\nplatform: "ensemble"\n'
    # Ensemble inputs/outputs: reuse the tensor definitions from the executor
    # model's config.pbtxt, skipping its header lines and trailing parameter
    # block (the slice indices match the config layout generated at export).
    with open(executor_model_config_path, "r") as f:
        executor_cfg = f.read()
    cfg += "\n".join(executor_cfg.split("\n")[2:-4]) + "\n"
# Ensemble scheduling
cfg += "ensemble_scheduling {\n\tstep [\n"
## 0_transformworkflowtriton model step
cfg += "\t\t{\n"
cfg += '\t\t\tmodel_name: "%s"\n\t\t\tmodel_version: %s\n' % (
"0_transformworkflowtriton",
"-1",
)
    # Request columns pass straight through to the workflow step.
    for col in data_workflow.input_schema.column_names:
        cfg += '\t\t\tinput_map {\n\t\t\t\tkey: "%s"\n\t\t\t\tvalue: "%s"\n\t\t\t}\n' % (
            col,
            col,
        )
    # List columns are split by the workflow into __values/__offsets tensors.
    for col in model.input_schema.column_names:
        for suffix in ("__values", "__offsets"):
            cfg += '\t\t\toutput_map {\n\t\t\t\tkey: "%s%s"\n\t\t\t\tvalue: "%s%s"\n\t\t\t}\n' % (
                col,
                suffix,
                col,
                suffix,
            )
cfg += "\t\t},\n"
## 1_predictpytorchtriton model step
cfg += "\t\t{\n"
cfg += '\t\t\tmodel_name: "%s"\n\t\t\tmodel_version: %s\n' % ("1_predictpytorchtriton", "-1")
    for col in model.input_schema.column_names:
        for suffix in ("__values", "__offsets"):
            cfg += '\t\t\tinput_map {\n\t\t\t\tkey: "%s%s"\n\t\t\t\tvalue: "%s%s"\n\t\t\t}\n' % (
                col,
                suffix,
                col,
                suffix,
            )
for col in model.output_schema.column_names:
cfg += '\t\t\toutput_map {\n\t\t\t\tkey: "%s"\n\t\t\t\tvalue: "%s"\n\t\t\t}\n' % (
col,
col,
)
cfg += "\t\t}\n"
cfg += "\t]\n}"
    # config.pbtxt parsing is whitespace-insensitive; normalize tabs to spaces.
    cfg = cfg.replace("\t", " ")
return cfg
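For completeness, here is roughly how these helpers slot in after the merlin-systems export. This is a sketch assuming `workflow`, `model`, and a traced `traced_model` already exist, following the Transformers4rec serving example; paths are illustrative:

from merlin.systems.dag import Ensemble
from merlin.systems.dag.ops.pytorch import PredictPyTorch
from merlin.systems.dag.ops.workflow import TransformWorkflow

# Build and export the executor pipeline as in the Transformers4rec example.
pipeline = (
    workflow.input_schema.column_names
    >> TransformWorkflow(workflow)
    >> PredictPyTorch(traced_model, model.input_schema, model.output_schema)
)
ensemble = Ensemble(pipeline, workflow.input_schema)
ensemble.export("/workspace/models")

# Then write the hand-rolled ensemble config next to the exported models.
export_t4rec_triton_ensemble(
    data_workflow=workflow,
    model=model,
    executor_model_path="/workspace/models/executor_model",
    output_path="/workspace/models",
)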
Nevertheless, it would be awesome to support exporting the Triton ensemble artifacts via the merlin-systems library.