FasterTransformer

[Question] Is it possible to use my own pretrained weights for ViT QAT

Open proevgenii opened this issue 2 years ago • 3 comments

Branch/Tag/Commit

main

Docker Image Version

nvcr.io/nvidia/pytorch:23.04-py3

GPU name

T4

CUDA Driver

470.141.03

Reproduced Steps

I'm trying to run calib.sh from https://github.com/NVIDIA/FasterTransformer/tree/main/examples/pytorch/vit/ViT-quantization, and I'm getting an error in def load_from(...) in 'FasterTransformer/examples/pytorch/vit/ViT-quantization/vit_int8.py':

Traceback (most recent call last):
  File "main.py", line 496, in <module>
    main()
  File "main.py", line 486, in main
    args, model = setup(args)
  File "main.py", line 114, in setup
    model.load_from(np.load(args.pretrained_dir))
  File "/workspace/quantization/FasterTransformer/examples/pytorch/vit/ViT-quantization/vit_int8.py", line 429, in load_from
    self.transformer.embeddings.patch_embeddings.weight.copy_(np2th(weights["embedding/kernel"], conv=True))
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/npyio.py", line 249, in __getitem__
    raise KeyError("%s is not a file in the archive" % key)
KeyError: 'embedding/kernel is not a file in the archive'

I have a pretrained ViT model, 'vit_base_patch32_224_clip_laion2b' from timm, and I want to quantize it to INT8. Is it possible to use my weights from torch, or do I have to obtain the weights using the training loop from 'https://github.com/NVIDIA/FasterTransformer/blob/f8e42aac45815c5be92c0915b12b9a6652386e8c/examples/pytorch/vit/ViT-quantization/main.py#L196'?
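A minimal sketch of why the KeyError occurs, under the assumption that the timm checkpoint was saved to .npz with its PyTorch-style parameter names (the name "patch_embed.proj.weight" below is illustrative): load_from() indexes the archive with JAX-style keys such as "embedding/kernel" (the key from the traceback), which a torch checkpoint does not contain.

```python
import io
import numpy as np

# Hypothetical illustration: a timm/PyTorch checkpoint saved to .npz
# keeps PyTorch-style parameter names (assumed name below), while
# load_from() in vit_int8.py looks up JAX-style keys such as
# "embedding/kernel" -- the lookup that fails in the traceback above.
buf = io.BytesIO()
np.savez(buf, **{"patch_embed.proj.weight": np.zeros((768, 3, 32, 32),
                                                     dtype=np.float32)})
buf.seek(0)
weights = np.load(buf)

print(weights.files)                        # PyTorch-style names only
print("embedding/kernel" in weights.files)  # False, hence the KeyError
```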

proevgenii avatar Jul 20 '23 12:07 proevgenii

perhaps there is an example of converting weights?

proevgenii avatar Jul 25 '23 08:07 proevgenii

I encountered the same problem, have you solved it?

douzi0248 avatar Oct 14 '23 07:10 douzi0248

@douzi0248 I can't say for sure that the problem is solved, but there are a couple of options. The first is to take the architecture from the FT code and train it yourself; even in fp32 this gave a metric about 10% lower than the model from timm. The second is simply to write a function that copies the weights from your architecture into the FT architecture; in fp32 the metric then matches the timm model, but after quantization the drop is very large, more than 20% of f1 score is lost.
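The second option can be sketched as follows. This is an assumption-heavy sketch, not the project's converter: load_from() calls np2th(weights["embedding/kernel"], conv=True), and in the standard ViT conversion code np2th with conv=True transposes a JAX-style (H, W, in, out) kernel to PyTorch's (out, in, H, W), so a converter must store conv kernels JAX-style. Only the patch-embedding kernel is shown; the full key mapping for every layer depends on your model and is omitted.

```python
import numpy as np

def torch_conv_to_flax(kernel_oihw: np.ndarray) -> np.ndarray:
    # PyTorch conv layout (out, in, H, W) -> JAX layout (H, W, in, out),
    # the inverse of what np2th(..., conv=True) is assumed to do.
    return kernel_oihw.transpose(2, 3, 1, 0)

# e.g. the patch-embedding kernel of a patch32 ViT-Base:
patch_embed = np.zeros((768, 3, 32, 32), dtype=np.float32)
ft_weights = {"embedding/kernel": torch_conv_to_flax(patch_embed)}
# ...map the remaining timm parameters the same way, then:
# np.savez("ft_vit_weights.npz", **ft_weights)
```

With such an archive, model.load_from(np.load(...)) should find the keys it expects, provided every key name and layout matches what vit_int8.py reads.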

proevgenii avatar Oct 16 '23 07:10 proevgenii