venus comments

Results: 10 comments by venus

undefined symbol errors for libtransformer_server.so

I get this error too, with tritonserver r21.10. @Taka152 ![image](https://user-images.githubusercontent.com/15058321/167762761-9353aa14-d1d5-472a-a8d5-31967527d626.png)

undefined symbol errors for libtransformer_server.so

@Taka152 This seems to be a Protobuf problem (I had already kept the same version). cmake version: 3.20.6; Protobuf version: 3.20.1. I fixed this issue; it was a Protobuf version problem.

undefined symbol errors for libtransformer_server.so

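@Taka152 An update: building with USE_TRITONBACKEND ON generates the corresponding libtriton_lightseq.so file, and with it both loading the model and rdd -l work normally. Using the .so file under server/ still reports the error undefined symbol: _ZN6nvidia15inferenceserver19GetDataTypeByteSizeENS0_8DataTypeE.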
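
The undefined symbol quoted above is an Itanium-ABI mangled C++ name, and decoding it shows which namespace the missing function is expected in. The following is a minimal sketch, not part of the original comment: the helper `qualified_name` is hypothetical and handles only the simple `_ZN<len><id>...E` prefix, ignoring argument types, templates, and substitutions.

```python
import re


def qualified_name(mangled: str) -> str:
    """Extract the namespace-qualified name from a simple nested mangled symbol.

    Hypothetical helper for illustration only: it reads the length-prefixed
    identifiers after `_ZN` up to the first `E` and joins them with `::`.
    """
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos = 3  # skip the "_ZN" prefix
    parts = []
    while pos < len(mangled) and mangled[pos] != "E":
        match = re.match(r"\d+", mangled[pos:])
        if match is None:
            raise ValueError(f"unsupported component at offset {pos}")
        length = int(match.group())
        start = pos + match.end()
        parts.append(mangled[start:start + length])
        pos = start + length
    return "::".join(parts)


# The symbol reported in the comment above.
symbol = "_ZN6nvidia15inferenceserver19GetDataTypeByteSizeENS0_8DataTypeE"
print(qualified_name(symbol))  # nvidia::inferenceserver::GetDataTypeByteSize
```

In practice the standard `c++filt` tool does the full demangling, including the argument type; the sketch only makes visible that the library under server/ expects `GetDataTypeByteSize` in the `nvidia::inferenceserver` namespace.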

Confused about cuda version, cuda 10.2 or cuda 11.6 are supported or not by lightseq inference

Just confirmed: the V2.2.0 CMakeLists.txt requires CUDA 10.1, while the latest version requires CUDA 11.6. So that means you build a different .so file based on your CUDA version, right?...

Cannot find the gpt.pb.h file

Can you explain how to generate them, e.g. gpt.pb.h and bert.pb.h?

Based on https://github.com/bytedance/lightseq/tree/master/lightseq/inference/triton_backend, I built the dynamic link library. After cmake, when I run make install, it cannot find bert.pb.h: (env-3.8.8) [root@bedd035bb520 /data_dev/lightseq/lightseq/inference/triton_backend/build]# make install -j 10 Consolidate compiler generated dependencies of...

support for MBART (big models)?

> initializing bart tokenizer... creating lightseq model... Parsing hdf5: /home/sysadmin/downlaod/lightseq_models/lightseq_mbart_base.hdf5 loading 976 MB of embedding weight. Finish loading src_emb_wei from host to device loading 1073 MB of embedding weight. Finish...

torch 2.3.1+cuda12.1 with flux lora train, but got the train error when use single GPU, pls help

### Single GPU error: even when I train sd-lora or sdxl-lora on a single GPU, they all hit this error: sd-scripts commit ID: f8f5b1695842cce15ba14e7edfacbeee41e71a75 ### Command: python -m accelerate.commands.launch --num_cpu_threads_per_process=2 sd-scripts/train_network.py...

torch 2.3.1+cuda12.1 with the latest code, but got the train error when use single GPU, pls help

If I use 2 GPUs, everything is fine: ![image](https://github.com/user-attachments/assets/5c4562be-bc56-4bb7-a4ad-954ea768c395) ![image](https://github.com/user-attachments/assets/9f61af6e-2f59-4822-975a-cefd5e6606e6)

[Bug] version libcudnn_cnn_infer.so.8 not defined in file libcudnn_cnn_infer.so.8 with link time reference

Same here; my env is 2.3.1+cu12.1.