coremltools
Need help with both weight and activation quantization when I only have a float32 mlmodel
From the thread "https://developer.apple.com/forums/thread/740518 how do we use the computational power of A17 Pro Neural Engine?", I learned that if I want to run inference with my mlmodel at full speed on the int8 38 TOPS ANE of my iPad Pro's M4 SoC, I have to use the Core ML Torch API to quantize both weights and activations with int8 training-time quantization.
My question is: I only have an fp32 mlmodel, without the original torch code or model. What can I do? Also, if I apply only int8 weight quantization, will the M4 ANE compute in fp16 or int8? Thanks for your help!
- If you don't have the torch model, you will not be able to do training-aware quantization; instead, you will only be able to run post-training quantization through the `ct.optimize.coreml` API (see the sketch below).
- Weight-only quantization produces a model whose weights are stored as int8, so the model size shrinks, but at runtime the compute precision is still fp16.
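
For reference, here is a minimal sketch of weight-only post-training quantization with the `ct.optimize.coreml` API. It assumes coremltools 7 or newer and a hypothetical model path `model.mlpackage`; adjust both to your setup.

```python
import numpy as np
import coremltools as ct
import coremltools.optimize.coreml as cto

# Load the existing float32 Core ML model (hypothetical path).
model = ct.models.MLModel("model.mlpackage")

# Configure 8-bit linear (symmetric) quantization for all weights.
op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric", dtype=np.int8)
config = cto.OptimizationConfig(global_config=op_config)

# Apply data-free, post-training weight quantization and save the result.
compressed_model = cto.linear_quantize_weights(model, config=config)
compressed_model.save("model_int8_weights.mlpackage")
```

This only compresses the stored weights; as noted above, the runtime compute precision remains fp16.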