infer_visiontransformer_op.py error

Open macrocredit opened this issue 2 years ago • 2 comments

Branch/Tag/Commit

main

Docker Image Version

nvcr.io/nvidia/pytorch:22.09-py3

GPU name

A-5000

CUDA Driver

12

Reproduced Steps


I encountered an error while running the following commands from the docs:

cd $WORKSPACE/examples/pytorch/vit
pip install ml_collections

##profile of FP16/FP32 model
python infer_visiontransformer_op.py \
  --model_type=ViT-B_16  \
  --img_size=384 \
  --pretrained_dir=./ViT-quantization/ViT-B_16.npz \
  --batch-size=32 \
  --th-path=$WORKSPACE/build/lib/libth_transformer.so

When running infer_visiontransformer_op.py, I was able to load torch.classes.VisionTransformerClass.

However, a runtime error is raised at FasterTransformer/src/fastertransformer/utils/cublasMMWrapper.cc:306.

It is triggered by the "op_tmp = vit.forward(images)" line in infer_visiontransformer_op.py.

Below is the call where the error is thrown:

check_cuda_error(cublasGemmEx(cublas_handle_,
                              transa,
                              transb,
                              m,
                              n,
                              k,
                              alpha,
                              A,
                              Atype_,
                              lda,
                              B,
                              Btype_,
                              ldb,
                              beta,
                              C,
                              Ctype_,
                              ldc,
                              computeType_,
                              static_cast<cublasGemmAlgo_t>(cublasAlgo)));

I am not sure what is happening. I followed all the build steps for FasterTransformer with C++; there was no problem.

macrocredit avatar Jun 22 '23 05:06 macrocredit

Actually, I was able to run this once I set up Docker. However, I am wondering why the error above still occurs without Docker. Is there some configuration on my own machine that differs from the Docker environment? Thanks.

If possible, could you provide an easy-to-run non-Docker setup?
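
To compare the host with the container, a small report like the one below could be run in both places and the outputs diffed. This is only a sketch: the function name collect_env_report and the choice of fields are my own, and nothing here is confirmed to be the cause of the cublasGemmEx failure. It only lists settings that commonly differ (driver, the CUDA version PyTorch was built against, the GPU compute capability, and the library search path).

```python
import importlib
import os
import platform
import shutil
import subprocess


def collect_env_report():
    """Collect settings that commonly differ between a host and a container.

    Hypothetical helper for illustration; not part of FasterTransformer.
    """
    report = {
        "python": platform.python_version(),
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH", ""),
        "nvidia_smi": None,
        "torch": None,
        "torch_cuda": None,
        "gpu_capability": None,
    }

    # GPU name and driver version as reported by the driver tools.
    # Skipped when nvidia-smi is not on PATH.
    if shutil.which("nvidia-smi"):
        try:
            out = subprocess.run(
                ["nvidia-smi", "--query-gpu=name,driver_version",
                 "--format=csv,noheader"],
                capture_output=True, text=True, timeout=30,
            )
            report["nvidia_smi"] = out.stdout.strip() or None
        except (OSError, subprocess.SubprocessError):
            pass

    # PyTorch is optional here; a missing or broken install leaves the
    # torch-related fields as None instead of raising.
    try:
        torch = importlib.import_module("torch")
    except Exception:
        return report

    report["torch"] = torch.__version__
    report["torch_cuda"] = torch.version.cuda  # CUDA version torch was built with
    if torch.cuda.is_available():
        report["gpu_capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

If the two reports differ in torch_cuda or in LD_LIBRARY_PATH, that would at least narrow down where the host setup diverges from the nvcr.io/nvidia/pytorch:22.09-py3 image.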

macrocredit avatar Jun 22 '23 22:06 macrocredit

@macrocredit Hey, can you please tell me how you convert PyTorch weights to the FasterTransformer format (.npz)?

proevgenii avatar Sep 20 '23 16:09 proevgenii