Piecewise-Quantization
PyTorch implementation of Near-Lossless Post-Training Quantization of Deep Neural Networks via a Piecewise Linear Approximation
Usage
There are 5 main arguments:
- quantize: whether to quantize parameters (per-channel) and activations (per-tensor).
- imagenet_path: path to the folder containing the train/val folders of the ImageNet data.
- model: the model type, one of ['mobilenetv2', 'resnet50', 'inceptionv3']; defaults to mobilenetv2.
- qtype: the weight quantization type, one of ['uniform', 'pws', 'pwg', 'pwl']; defaults to uniform.
- bits_weight: number of bits for weight quantization; defaults to 8.
Run the 4-bit PWS-quantized mobilenetv2 model with:
```
python main_cls.py --quantize --qtype pws --model mobilenetv2 --bits_weight 4
```
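For intuition, piecewise quantization splits the weight range at a breakpoint and gives each region its own uniform grid, so the dense center of a bell-shaped weight distribution gets finer resolution than the sparse tails. Below is a minimal sketch of that idea only; the fixed breakpoint fraction `p` and the per-region bit allocation are illustrative assumptions, not this repo's actual pws implementation (the paper selects the breakpoint to minimize quantization error):

```python
import torch

def piecewise_quantize(w: torch.Tensor, bits: int = 4, p: float = 0.5) -> torch.Tensor:
    # Hypothetical sketch: two-region piecewise uniform quantization.
    m = w.abs().max().clamp(min=1e-8)
    bp = p * m                      # breakpoint splitting center from tails (assumed fixed)
    levels = 2 ** (bits - 1) - 1    # signed levels available in each region
    center = w.clamp(-bp, bp)       # dense central region [-bp, bp]
    tail = w - center               # nonzero only where |w| > bp
    s_center = bp / levels          # fine step inside the center
    s_tail = (m - bp) / levels      # coarser step covering each tail
    q = torch.round(center / s_center) * s_center \
      + torch.round(tail / s_tail) * s_tail
    return q                        # float tensor, snapped to the per-region grids

# Example: quantize a random weight matrix to 4 bits per region.
w_q = piecewise_quantize(torch.randn(64, 128), bits=4)
```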
Notes
Fake quantization
The quantization in this repo is fake (simulated) quantization: tensors are quantized and immediately dequantized back to float, so inference is NOT pure INT8 arithmetic.
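Concretely, fake quantization rounds a float tensor onto an integer grid and maps it straight back to float, so the quantization error is modeled while every convolution and matmul still runs in floating point. A minimal per-tensor sketch; the function name and the asymmetric min/max scheme are assumptions for illustration, not the repo's code:

```python
import torch

def fake_quantize(x: torch.Tensor, bits: int = 8) -> torch.Tensor:
    # Simulated (fake) asymmetric per-tensor quantization:
    # map to the integer grid [0, 2^bits - 1], then dequantize.
    qmin, qmax = 0, 2 ** bits - 1
    scale = ((x.max() - x.min()) / (qmax - qmin)).clamp(min=1e-8)
    zero_point = torch.round(qmin - x.min() / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale  # float output; no integer kernels are used
```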
TODO
- [x] Uniform quantization
- [x] PWS quantization
- [ ] Update results for classification models
- [ ] PWG quantization
- [ ] PWL quantization
- [ ] Detection model
- [ ] Segmentation model