[Example Request] HuggingFace PyTorch 1.12 with SM Training Compiler - Multi Node Multi GPU

Open Lokiiiiii opened this issue 3 years ago • 0 comments

Describe the use case example you want to see

A notebook example describing how to use SM Training Compiler with PyTorch 1.12. This particular example will explore how to use SM Training Compiler in a Multi Node Multi GPU training setting for efficient training of Computer Vision models from the HuggingFace model zoo.

This example will compare the training performance of PyTorch DDP, SageMaker DDP, and SageMaker Training Compiler across varying cluster sizes.
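As a rough sketch of what the comparison could cover, the snippet below enumerates one benchmark run per strategy and cluster size. The instance type, GPU count, node counts, and the function name `build_benchmark_grid` are hypothetical placeholders that this request does not specify. The `distribution` dictionaries follow the style used by the SageMaker Python SDK estimators as I understand it, and should be checked against the SDK documentation before use.

```python
from itertools import product

# Hypothetical values for illustration only; the request does not specify them.
INSTANCE_TYPE = "ml.p3.16xlarge"  # assumed multi-GPU training instance
GPUS_PER_INSTANCE = 8             # assumed GPU count for that instance type
CLUSTER_SIZES = [1, 2, 4]         # assumed node counts to sweep over

# Assumed `distribution` settings, one per strategy being compared.
# Verify the exact keys against the SageMaker Python SDK documentation.
STRATEGIES = {
    "pytorch_ddp": {"pytorchddp": {"enabled": True}},
    "sagemaker_ddp": {"smdistributed": {"dataparallel": {"enabled": True}}},
    "training_compiler": {"pytorchxla": {"enabled": True}},
}


def build_benchmark_grid(strategies=STRATEGIES, cluster_sizes=CLUSTER_SIZES):
    """Return one run description per (strategy, cluster size) pair."""
    grid = []
    for (name, distribution), nodes in product(strategies.items(), cluster_sizes):
        grid.append(
            {
                "strategy": name,
                "instance_type": INSTANCE_TYPE,
                "instance_count": nodes,
                "total_gpus": nodes * GPUS_PER_INSTANCE,
                "distribution": distribution,
            }
        )
    return grid


grid = build_benchmark_grid()
for run in grid:
    print(run["strategy"], run["instance_count"], run["total_gpus"])
```

Each entry could then be passed to an estimator as `instance_count`, `instance_type`, and `distribution`, with the compiler configuration added only for the Training Compiler runs.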

How would this example be used? Please describe.

Onboarding new Computer Vision customers to SM Training Compiler

Describe which SageMaker services are involved

1. SageMaker Training
1. SageMaker Training Compiler

Describe what other services (other than SageMaker) are involved

None

Describe which dataset could be used. Provide its location in s3://sagemaker-sample-files or another source.

MNIST from the HuggingFace dataset store.

Lokiiiiii, Aug 18 '22 21:08