Results: 3 issues by cyh-ustc

**Is your feature request related to a problem? Please describe.** The Triton trace API timing only contains the total inference time. How can I get more detailed timing, such as operator-level or kernel-level timing?...

https://microsoftedge.microsoft.com/addons/detail/ustcpass/hbdkmpdpjgdimjopgeklhhejedmpiioj

https://github.com/huggingface/optimum/blob/c55f8824f58db1a2f1cfc7879451b4743b8f206b/optimum/onnxruntime/modeling_decoder.py#L649

```python
def prepare_inputs_for_generation(self, input_ids, past_key_values=None, **kwargs):
    if past_key_values is not None:
        past_length = past_key_values[0][0].shape[2]

        # Some generation methods already pass only the last input ID
        if input_ids.shape[1] >...
```

question
onnxruntime