Shira Guskin
Hello, should I expect a high F1 score when training only the first step (intermediate-layer distillation) on SQuAD1.1? Thanks
Hello, could you please elaborate on the data-augmentation procedure you used for the SQuAD1.1 task? Thank you, Shira
Below are my results when running the speculative-sampling notebook.

**Device:** GPU

**Models and drafts:**

**Phi-3 pair:**
draft_model_id = "OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov"
target_model_id = "OpenVINO/Phi-3-mini-4k-instruct-int4-ov"

**Llama 3.1 pair:**
draft_model_id = "OpenVINO/Llama-3.1-8B-Instruct-FastDraft-150M-int8-ov"
target_model_id = "fakezeta/Meta-Llama-3.1-8B-Instruct-ov-int4"

**Results:** ...
We want to use speculative decoding where one model runs on the XPU and another, significantly smaller model runs on the CPU. We installed the XPU build and ran the script...
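For context, the accept/reject loop at the heart of speculative decoding can be sketched in plain Python with toy stand-in "models". This is a simplified, hypothetical illustration (the function names, toy vocabulary, and distributions are invented, and the residual-resampling step used on rejection in full implementations is omitted), not the OpenVINO GenAI or IPEX API:

```python
import random

# Toy sketch of speculative decoding: a cheap draft model proposes k tokens;
# the expensive target model verifies them, accepting each proposal with
# probability min(1, p_target(token) / p_draft(token)).

VOCAB = [0, 1, 2, 3]  # hypothetical 4-token vocabulary

def draft_probs(prefix):
    # Hypothetical small draft model: uniform over the vocabulary.
    return {t: 1.0 / len(VOCAB) for t in VOCAB}

def target_probs(prefix):
    # Hypothetical large target model: strongly prefers token 0.
    return {0: 0.7, 1: 0.1, 2: 0.1, 3: 0.1}

def speculative_step(prefix, k, rng):
    """Draft k tokens, then accept/reject them against the target model."""
    # Phase 1: the draft model proposes k tokens autoregressively.
    proposals = []
    p = list(prefix)
    for _ in range(k):
        probs = draft_probs(p)
        tok = rng.choices(list(probs), weights=list(probs.values()))[0]
        proposals.append(tok)
        p.append(tok)

    # Phase 2: the target model verifies the proposals in order.
    accepted = []
    p = list(prefix)
    for tok in proposals:
        q, t = draft_probs(p)[tok], target_probs(p)[tok]
        if rng.random() < min(1.0, t / q):
            accepted.append(tok)
            p.append(tok)
        else:
            break  # first rejection ends the speculative run
    return accepted

rng = random.Random(0)
print(speculative_step([42], k=4, rng=rng))
```

In a device-split setup like the one described above, `draft_probs` would correspond to the small CPU model and `target_probs` to the large XPU model; the speedup comes from the target model scoring all k proposals in a single batched forward pass.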
### Describe the issue
I tried the example in https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.5.10/examples/gpu/llm/inference#learn-to-quantize-llm-and-save-quantized-model-then-run-inference-with-quantized-model using the `microsoft/Phi-3-mini-4k-instruct` model. It fails with:
```
File "C:\Users\sdp\.cache\huggingface\modules\transformers_modules\0a67737cc96d2554230f90338b163bc6380a2a85\modeling_phi3.py", line 1305, in prepare_inputs_for_generation
    elif past_length < input_ids.shape[1]:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: ...
```