Pierre-Yves
Hey @adamantike , Fargate is not supported at the moment. We'll evaluate adding it. Happy for contributions on this one.
@sean-smith ready now?
Still draft or shall we merge?
You could link to the AWS doc:
- https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-NVIDIA-GPU.html
- https://docs.nvidia.com/deploy/xid-errors/index.html
- https://aws.amazon.com/blogs/compute/capturing-gpu-telemetry-on-the-amazon-ec2-accelerated-computing-instances/
- https://repost.aws/knowledge-center/ec2-linux-troubleshoot-xid-errors
Cancel, or do we move forward with it?
@mhuguesaws how about CloudWatch or profilers like Nsight?
Approaching 2 months, shall we close @bkulnik-auvaria ?
Hey @nithiyn , you'll want to amend the [readme](https://github.com/aws-samples/awsome-distributed-training/tree/main/3.test_cases/10.FSDP#0-prerequisites) file. For example, you could add a subsection in the **3.Launch Training** to show how to run this new case (example...
@awsankur @KeitaW are we good on this?
Add digits for the directory number?