libth_transformer.so: cannot open shared object file: No such file or directory

Open ma-siddiqui opened this issue 2 years ago • 13 comments

While running the command below, I am getting errors. Please advise.

python3 ./FasterTransformer/examples/pytorch/t5/summarization.py --ft_model_location t5-v1_1-base/c-models/ --hf_model_location t5-v1_1-base/ --test_ft --test_hf

[INFO] load HF model spend 4.947393 sec
[INFO] MPI is not available in this PyTorch build.
[INFO] MPI is not available in this PyTorch build.
[INFO] load FT encoder model spend 0.317683 sec
[INFO] load FT decoding model spend 0.43306 sec
[INFO] MPI is not available in this PyTorch build.
Traceback (most recent call last):
  File "./FasterTransformer/examples/pytorch/t5/summarization.py", line 404, in <module>
    main()
  File "./FasterTransformer/examples/pytorch/t5/summarization.py", line 200, in main
    ft_encoder = FTT5Encoder(ft_encoder_weight.w, args.lib_path, encoder_config.num_heads,
  File "~FasterTransformer/examples/pytorch/t5/../../../examples/pytorch/t5/utils/ft_encoder.py", line 380, in __init__
    torch.classes.load_library(lib_path)
  File "~anaconda3/envs/nlp_dev/lib/python3.8/site-packages/torch/_classes.py", line 51, in load_library
    torch.ops.load_library(path)
  File "~anaconda3/envs/nlp_dev/lib/python3.8/site-packages/torch/_ops.py", line 573, in load_library
    ctypes.CDLL(path)
  File "~anaconda3/envs/nlp_dev/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: ~smart_nation/nlp/lib/libth_transformer.so: cannot open shared object file: No such file or directory

ma-siddiqui avatar May 29 '23 08:05 ma-siddiqui

Hi, I solved this problem. First, check whether libth_transformer.so actually exists at ~smart_nation/nlp/lib/libth_transformer.so.

If it does not, locate it first: run "find -name libth_transformer.so" in the root directory of FasterTransformer.

If you still can't find the file, delete the build directory and re-run cmake ... and make ... (in my case the file was missing, so I re-ran the build commands).

Then check your build log to see whether libth_transformer.so was produced.

In the end, I found the file at FasterTransformer/build/lib/libth_transformer.so. You can then change lib_path in your script and re-run your Python command.
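For anyone who prefers to script the search described above, here is a minimal Python sketch. The function name find_shared_lib and the checkout location "FasterTransformer" are illustrative assumptions, not part of the FasterTransformer repository.

```python
from pathlib import Path


def find_shared_lib(root, name="libth_transformer.so"):
    """Return sorted absolute paths of every file called `name` under `root`."""
    root_path = Path(root).expanduser().resolve()
    if not root_path.is_dir():
        # Nothing to search if the checkout directory itself is missing.
        return []
    return sorted(str(p) for p in root_path.rglob(name) if p.is_file())


if __name__ == "__main__":
    # "FasterTransformer" is an assumed checkout location; adjust as needed.
    matches = find_shared_lib("FasterTransformer")
    if matches:
        for match in matches:
            print(match)
    else:
        print("libth_transformer.so not found; the build may not have produced it")
```

Any path it prints can be used as the lib_path value for the example script.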

Hope this can be helpful.

Wanan-ni avatar Jun 02 '23 09:06 Wanan-ni

Thank you for the reply. I was able to rebuild it, but unfortunately there is no libth_transformer.so. Is it created under another name?

ma-siddiqui avatar Jun 02 '23 15:06 ma-siddiqui

Is it possible to share your libth_transformer.so?

ma-siddiqui avatar Jun 03 '23 09:06 ma-siddiqui

> Thank you for the reply. I was able to rebuild it, but unfortunately there is no libth_transformer.so. Is it created under another name?

After you run "make ...", check whether the output indicates that compilation succeeded. Do you see a notification like the one circled in red in the image?

Wanan-ni avatar Jun 03 '23 10:06 Wanan-ni

> Is it possible to share your libth_transformer.so?

I am willing to share it, but I am afraid it won't work: the compiled library depends on the environment, and ours are probably different. My environment is PyTorch 1.10 and CUDA 11.1.

Wanan-ni avatar Jun 03 '23 10:06 Wanan-ni

Thank you for your kind reply. No problem, I will use it with PyTorch 1.10 and CUDA 11.1.

Additionally, if possible, please share the source code of that release so I can try to build it on my own. I much appreciate your help and support.

ma-siddiqui avatar Jun 03 '23 10:06 ma-siddiqui

Should lib_path be set as a relative path or an absolute path? I've tried both, but couldn't get it to work. The file libth_transformer.so is definitely there, but I can't get the machine to find it...

taehyunzzz avatar Jun 26 '23 17:06 taehyunzzz

Maybe your build command is wrong. Can you share it?

sfyumi avatar Jun 27 '23 02:06 sfyumi

There was an issue while rebuilding the library... I've seen reports of CUDA compatibility issues with FT, but I can't find the post again. I was using CUDA 11.8, and I think that was the issue. Environment setups are so frustrating :(

taehyunzzz avatar Jun 27 '23 05:06 taehyunzzz

I'm running into this issue too. What are the build and make commands you are running?

shannonphu avatar Jul 07 '23 00:07 shannonphu

I'm running into this issue too. The build is successful and I can see the libth_transformer.so file in the build/lib folder, but I'm not sure why it isn't being detected. I am running the bart translate_example.py. I have tried multiple times and have provided both relative and absolute paths, still nothing.

Env: PyTorch 1.13.0, CUDA 11.8, RTX 3080 Ti. Same issue on the g5 instances of AWS too.

OSError: /workspace/FasterTransformer/build/lib/libth_transfomer.so: cannot open shared object file: No such file or directory

Can anyone please help? I've been stuck on this for a long time.
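When the file appears to exist but loading still fails with "No such file or directory", it can help to check the exact path string being passed. The sketch below is a hypothetical helper (diagnose_lib_path is not part of FasterTransformer) that reports whether the path exists and lists similarly named .so files in the same directory. The error above spells the file libth_transfomer.so while the build output is described as libth_transformer.so; the thread does not confirm whether that difference is the cause, but a check like this would flag it.

```python
import difflib
import os


def diagnose_lib_path(lib_path):
    """Explain why `lib_path` might fail to load, without actually loading it."""
    full = os.path.abspath(os.path.expanduser(lib_path))
    if os.path.isfile(full):
        return f"found: {full}"
    directory, wanted = os.path.split(full)
    if not os.path.isdir(directory):
        return f"directory does not exist: {directory}"
    # Look for near-miss names, e.g. a single dropped or swapped letter.
    candidates = [n for n in os.listdir(directory) if n.endswith(".so")]
    close = difflib.get_close_matches(wanted, candidates, n=3, cutoff=0.8)
    if close:
        return f"missing: {full}; similar files present: {', '.join(close)}"
    return f"missing: {full}; no similar .so files in {directory}"
```

Calling diagnose_lib_path with the same string given to the example script shows the absolute path that will actually be opened.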

arnab-photon avatar Jul 27 '23 11:07 arnab-photon

Maybe you need to build FasterTransformer with -DBUILD_PYT=ON, e.g.: cmake -DSM=80 -DCMAKE_BUILD_TYPE=Release -DBUILD_MULTI_GPU=ON -DBUILD_PYT=ON ..
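One way to confirm whether an existing build directory was configured with that flag is to read the CMakeCache.txt that CMake writes there. This is a rough sketch: the helper name cmake_option_enabled is made up for illustration, and it assumes the usual NAME:TYPE=VALUE layout of cache entries.

```python
import re
from pathlib import Path

# Values CMake commonly treats as true for a boolean option.
_TRUE_VALUES = {"ON", "1", "TRUE", "YES", "Y"}


def cmake_option_enabled(build_dir, option="BUILD_PYT"):
    """Return True/False for a cached CMake option, or None if it is absent."""
    cache = Path(build_dir) / "CMakeCache.txt"
    if not cache.is_file():
        return None
    # Cache entries look like NAME:TYPE=VALUE; the type part is optional here.
    pattern = re.compile(rf"^{re.escape(option)}(?::[^=]*)?=(.*)$")
    for line in cache.read_text(errors="replace").splitlines():
        match = pattern.match(line.strip())
        if match:
            return match.group(1).strip().upper() in _TRUE_VALUES
    return None
```

A result of False or None for BUILD_PYT would suggest re-running cmake with -DBUILD_PYT=ON before make.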

vuuihc avatar Aug 15 '23 10:08 vuuihc

@arnab-photon

  • According to the NVIDIA/FasterTransformer reference documentation:
  • If you change into /FasterTransformer/examples/pytorch/bert and run python bert_example .... there, it cannot find /examples/pytorch/bert/lib/libth_transformer.so.
  • You need to follow the instructions exactly: stay in /FasterTransformer/build and run python ../examples/pytorch/bert/bert_example.py 1 12 32 12 64 --data_type fp16 --time
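The difference between the two working directories can be illustrated with a short sketch. It assumes the example script is given a relative library path such as lib/libth_transformer.so (an assumption for illustration, consistent with the path in the bullet above); a relative path containing a slash is resolved against the current working directory when the library is opened.

```python
import posixpath


def resolve_lib_path(lib_path, cwd):
    """Return the absolute path that `lib_path` refers to when run from `cwd`."""
    return posixpath.normpath(posixpath.join(cwd, lib_path))


build_dir = "/FasterTransformer/build"
example_dir = "/FasterTransformer/examples/pytorch/bert"
relative = "lib/libth_transformer.so"  # assumed relative path, for illustration

print(resolve_lib_path(relative, build_dir))
# /FasterTransformer/build/lib/libth_transformer.so
print(resolve_lib_path(relative, example_dir))
# /FasterTransformer/examples/pytorch/bert/lib/libth_transformer.so
```

Only the first result points into the build output, which matches the advice to run the example from the build directory; passing an absolute path sidesteps the dependence on the working directory.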

Hope this helps! :)

HeyDavid633 avatar Jan 11 '24 11:01 HeyDavid633