intellinjun

Results: 12 comments of intellinjun

`model = AutoModelForCausalLM.from_pretrained(model_path, device_map='cpu', torch_dtype=torch.float16, quantization_config=woq_config, trust_remote_code=True, use_neural_speed=False)` Do you want to use Neural Speed? If so, set `use_neural_speed=True`.
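A minimal sketch of the suggested change, assuming the ITREX weight-only quantization API of that release; the model path and the `WeightOnlyQuantConfig` settings below are placeholders, not the reporter's actual values:

```python
import torch
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, WeightOnlyQuantConfig

model_path = "meta-llama/Llama-2-7b-hf"                   # placeholder; use your own model path
woq_config = WeightOnlyQuantConfig(weight_dtype="int4")   # placeholder weight-only quant config

# Same call as in the snippet above, with use_neural_speed flipped to True so the
# Neural Speed backend is used instead of the plain PyTorch path.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map='cpu',
    torch_dtype=torch.float16,
    quantization_config=woq_config,
    trust_remote_code=True,
    use_neural_speed=True,
)
```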

> C:\Windows\system32>D:\o\1\run_whisper.exe -l zh -m D:\o\1\whisper_gpu_int8_gpu-cuda_model.onnx -f D:\o\1\1.wav -osrt
> whisper_init_from_file_no_state: loading model from 'D:\o\1\whisper_gpu_int8_gpu-cuda_model.onnx'
> whisper_model_load: loading model
> NE_ASSERT: E:\whisper_opt\intel_extension_for_transformers\llm\runtime\graph\core\ne_layers.c:643: wtype != NE_TYPE_COUNT

This method uses the cpp model for...

@murilocurti The latest Neural Speed already supports Phi-2; you can try it now.
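A short sketch of one way to try it with the Neural Speed Python API; the Hugging Face model id `microsoft/phi-2`, the prompt, and the quantization dtypes below are assumptions, not something from the original thread:

```python
from transformers import AutoTokenizer, TextStreamer
from neural_speed import Model

model_name = "microsoft/phi-2"   # assumed Hugging Face model id for Phi-2
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids

# Convert/quantize the model on first use, then generate through Neural Speed.
model = Model()
model.init(model_name, weight_dtype="int4", compute_dtype="int8")
outputs = model.generate(inputs, streamer=TextStreamer(tokenizer), max_new_tokens=128)
```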

@RachelShalom please add `-b 2048` to `python scripts/inference.py --model_name llama -m llama_files/ne_llama_int4.bin -c 1500 -n 400 --color -p "$PROMPT1"`
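For reference, the full command with the flag added would look like this; the paths, other flag values, and `$PROMPT1` are taken from the command quoted above, not verified independently:

```sh
# Same command as in the report, with -b 2048 appended as suggested.
python scripts/inference.py --model_name llama -m llama_files/ne_llama_int4.bin \
    -c 1500 -n 400 -b 2048 --color -p "$PROMPT1"
```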

`--keep` should be the number of leading tokens preserved when the context is cut off by [streaming LLM](https://github.com/intel/neural-speed/blob/main/docs/infinite_inference.md). As for the second problem, we haven't developed that feature yet.
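Purely as an illustration of where `--keep` fits, assuming it is passed to the same inference script as in the earlier command (the value 4 is hypothetical):

```sh
# Hypothetical example: preserve the first 4 tokens (e.g. BOS/system tokens)
# whenever streaming LLM discards older context; other flags mirror the command above.
python scripts/inference.py --model_name llama -m llama_files/ne_llama_int4.bin \
    -c 1500 -n 400 --keep 4 --color -p "$PROMPT1"
```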


Accuracy (acc) extension test result: https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/159/artifact/report.html

@irjawais Can you check the memory usage when converting the model? From your description, it seems that there may be insufficient memory.
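If it helps, here is a small sketch (using `psutil`, not part of Neural Speed) for watching system memory while the conversion runs in another terminal:

```python
import time
import psutil

# Print memory headroom once per second; if "available" drops toward zero while
# the model is being converted, the process is likely running out of RAM.
while True:
    mem = psutil.virtual_memory()
    print(f"used {mem.used / 1e9:.1f} GB / {mem.total / 1e9:.1f} GB "
          f"(available {mem.available / 1e9:.1f} GB)")
    time.sleep(1)
```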

@bil-ash Thank you for your suggestion; we will assess the needs internally and get back to you as soon as possible.

> @LJ-underdog are you still working on this?

Yes, this PR is a preparation for mate.