Results 2 issues of sega-hsj

I deleted the metric learning part of the code and ran the CIFAR100 experiment, so every model is the pretrained model at test time. It seems like I get...

### 🐛 Describe the bug

```
world_size = torch.distributed.get_world_size()
shard_pg = ProcessGroup(tp_degree=world_size) if args.shardinit else None
default_dist_spec = ShardSpec([-1], [world_size]) if args.shardinit else None
with ColoInitContext(device=get_current_device(), dtype=torch.half, default_dist_spec=default_dist_spec, default_pg=shard_pg):
    model...
```

bug