pngmafia
Tried using ``` input_ids = {"input_ids": decoder_input_ids.cpu(), "past_key_values": None, "attention_mask": None, "token_type_ids": None, "position_ids": None, "head_mask": None, "inputs_embeds": None, "encoder_hidden_states": encoder_hidden_states.cpu()} # some inference engines don't support int64 tensor as inputs, we convert all...
This script worked for exporting the model to ONNX, but it's not working for optimizing the ONNX graph: ``` input_ids = {"input_ids": decoder_input_ids.cpu(), "encoder_hidden_states": encoder_hidden_states.cpu()} # some inference engines don't support int64...
@nvpohanh Any luck with the KV-cache support? I could help if I get proper contextual info.