FasterTransformer

[Question] Is it possible to use my own pretrained weights for ViT QAT

Open proevgenii opened this issue 2 years ago • 3 comments

Branch/Tag/Commit

main

Docker Image Version

nvcr.io/nvidia/pytorch:23.04-py3

GPU name

T4

CUDA Driver

470.141.03

Reproduced Steps

I'm trying to run calib.sh from https://github.com/NVIDIA/FasterTransformer/tree/main/examples/pytorch/vit/ViT-quantization, and I'm getting an error in def load_from(...) in 'FasterTransformer/examples/pytorch/vit/ViT-quantization/vit_int8.py':

Traceback (most recent call last):
  File "main.py", line 496, in <module>
    main()
  File "main.py", line 486, in main
    args, model = setup(args)
  File "main.py", line 114, in setup
    model.load_from(np.load(args.pretrained_dir))
  File "/workspace/quantization/FasterTransformer/examples/pytorch/vit/ViT-quantization/vit_int8.py", line 429, in load_from
    self.transformer.embeddings.patch_embeddings.weight.copy_(np2th(weights["embedding/kernel"], conv=True))
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/npyio.py", line 249, in __getitem__
    raise KeyError("%s is not a file in the archive" % key)
KeyError: 'embedding/kernel is not a file in the archive'

I have a pretrained ViT model, 'vit_base_patch32_224_clip_laion2b' from timm, and I want to quantize it to INT8. Is it possible to use my weights from torch, or do I have to obtain the weights using the training loop from 'https://github.com/NVIDIA/FasterTransformer/blob/f8e42aac45815c5be92c0915b12b9a6652386e8c/examples/pytorch/vit/ViT-quantization/main.py#L196'?
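A minimal sketch of why the KeyError occurs, under the assumption that the timm checkpoint was saved to .npz with its PyTorch-style parameter names (the name "patch_embed.proj.weight" below is illustrative): load_from() indexes the archive with JAX-style keys such as "embedding/kernel" (the key from the traceback), which a torch checkpoint does not contain.

```python
import io
import numpy as np

# Hypothetical illustration: a timm/PyTorch checkpoint saved to .npz
# keeps PyTorch-style parameter names (assumed name below), while
# load_from() in vit_int8.py looks up JAX-style keys such as
# "embedding/kernel" -- the lookup that fails in the traceback above.
buf = io.BytesIO()
np.savez(buf, **{"patch_embed.proj.weight": np.zeros((768, 3, 32, 32),
                                                     dtype=np.float32)})
buf.seek(0)
weights = np.load(buf)

print(weights.files)                        # PyTorch-style names only
print("embedding/kernel" in weights.files)  # False, hence the KeyError
```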

proevgenii avatar Jul 20 '23 12:07 proevgenii

perhaps there is an example of converting weights?

proevgenii avatar Jul 25 '23 08:07 proevgenii

I encountered the same problem, have you solved it?

douzi0248 avatar Oct 14 '23 07:10 douzi0248

@douzi0248 I can't say for sure that the problem is solved, but there are a couple of options. The first is to take the architecture from the FT code and train it yourself; even in fp32 this gave a metric about 10% lower than the model from timm. The second is simply to write a function that copies the weights from your architecture into the FT architecture; in fp32 the metric then matches the timm model, but after quantization the drop is very large, more than 20% of f1 score is lost.
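The second option can be sketched as follows. This is an assumption-heavy sketch, not the project's converter: load_from() calls np2th(weights["embedding/kernel"], conv=True), and in the standard ViT conversion code np2th with conv=True transposes a JAX-style (H, W, in, out) kernel to PyTorch's (out, in, H, W), so a converter must store conv kernels JAX-style. Only the patch-embedding kernel is shown; the full key mapping for every layer depends on your model and is omitted.

```python
import numpy as np

def torch_conv_to_flax(kernel_oihw: np.ndarray) -> np.ndarray:
    # PyTorch conv layout (out, in, H, W) -> JAX layout (H, W, in, out),
    # the inverse of what np2th(..., conv=True) is assumed to do.
    return kernel_oihw.transpose(2, 3, 1, 0)

# e.g. the patch-embedding kernel of a patch32 ViT-Base:
patch_embed = np.zeros((768, 3, 32, 32), dtype=np.float32)
ft_weights = {"embedding/kernel": torch_conv_to_flax(patch_embed)}
# ...map the remaining timm parameters the same way, then:
# np.savez("ft_vit_weights.npz", **ft_weights)
```

With such an archive, model.load_from(np.load(...)) should find the keys it expects, provided every key name and layout matches what vit_int8.py reads.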

proevgenii avatar Oct 16 '23 07:10 proevgenii