wuflyh


I converted the MPT-7B model into FasterTransformer format and served it as an HTTP service using Triton Inference Server. However, it failed to generate. E.g., with input "the model does...
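(For context, a minimal client sketch of the kind of request involved, assuming the standard FasterTransformer GPT tensor names input_ids / input_lengths / request_output_len and the output_ids output; the token IDs, URL, and generation length below are illustrative placeholders, not values from the original report.)

import numpy as np
import tritonclient.http as httpclient

# Connect to Triton's HTTP endpoint (default port 8000; adjust as needed).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Illustrative token IDs standing in for the tokenized prompt.
ids = np.array([[1169, 2746, 857]], dtype=np.uint32)    # [batch, seq_len]
lengths = np.array([[ids.shape[1]]], dtype=np.uint32)   # [batch, 1]
out_len = np.array([[32]], dtype=np.uint32)             # tokens to generate

inputs = []
for name, arr in [("input_ids", ids),
                  ("input_lengths", lengths),
                  ("request_output_len", out_len)]:
    t = httpclient.InferInput(name, list(arr.shape), "UINT32")
    t.set_data_from_numpy(arr)
    inputs.append(t)

result = client.infer("mpt7b", inputs,
                      outputs=[httpclient.InferRequestedOutput("output_ids")])
print(result.as_numpy("output_ids"))    # generated token IDs, to be detokenized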

name: "mpt7b"
backend: "fastertransformer"
max_batch_size: 1024
model_transaction_policy {
  decoupled: False
}
input [
  {
    name: "input_ids"
    data_type: TYPE_UINT32
    dims: [ -1 ]
  },
  {
    name: "start_id"
    data_type: TYPE_UINT32
    dims: [...

@nik-mosaic yes, I am following the GPT config.pbtxt, and the above is my file. I generated the FasterTransformer format for 2 GPUs, so I set "tensor_para_size" to 2.
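(For reference, a sketch of how "tensor_para_size" is typically declared in the parameters block of the FT backend's config.pbtxt; the layout follows the standard fastertransformer_backend GPT example, and the "2" matches the 2-GPU conversion described above.)

parameters {
  key: "tensor_para_size"
  value: {
    string_value: "2"
  }
}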

The Triton Inference Server launches normally, and there is no error in the log during inference.

The issue remains the same on 1 GPU (which I generated using "convert_hf_mpt_to_ft.py -o mpt7b_1gpu -i mosaicml/mpt-7b-instruct -i_g 1").

@nik-mosaic which version of the triton-ft-backend do you use to serve the MPT-7B model? Thanks.

Thanks @nik-mosaic, that might be the reason. I was using 22.12; I will install 23.04 and give it a try.