
Piecewise-Quantization

PyTorch implementation of Near-Lossless Post-Training Quantization of Deep Neural Networks via a Piecewise Linear Approximation
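As a rough illustration of the piecewise idea (a sketch under assumed details, not the paper's exact algorithm: the breakpoint choice and function names here are illustrative), the weight range is split at a breakpoint p, and the dense center region and the sparse tails each get their own uniform quantizer:

import torch

def uniform_fq(x, lo, hi, num_bits):
    # Uniform fake quantization of x over [lo, hi]: round to the
    # integer grid, then map straight back to float.
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels
    q = torch.clamp(torch.round((x - lo) / scale), 0, levels)
    return q * scale + lo

def piecewise_fq(w, num_bits=4, break_ratio=0.7):
    # Two-region piecewise scheme: most weights cluster near zero, so
    # giving the center [-p, p] its own quantizer concentrates
    # resolution where it matters most.
    m = w.abs().max()
    p = break_ratio * m
    out = torch.empty_like(w)
    center = w.abs() <= p
    out[center] = uniform_fq(w[center], -p, p, num_bits)
    tails = ~center
    out[tails] = torch.sign(w[tails]) * uniform_fq(w[tails].abs(), p, m, num_bits)
    return out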

Usage

There are five main arguments:

  1. quantize: whether to quantize parameters (per-channel) and activations (per-tensor); see the sketch after this list.
  2. imagenet_path: path to the folder that contains the train/val folders of the ImageNet data.
  3. model: the model architecture; must be one of ['mobilenetv2', 'resnet50', 'inceptionv3']; defaults to mobilenetv2.
  4. qtype: the quantization scheme for weights; must be one of ['uniform', 'pws', 'pwg', 'pwl']; defaults to uniform.
  5. bits_weight: number of bits for weight quantization; defaults to 8.
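Per-channel vs. per-tensor only changes how many scales are computed. A minimal sketch, assuming a symmetric scheme (the function names are illustrative, not this repo's API):

import torch

def per_channel_scales(w, num_bits=8):
    # One symmetric scale per output channel of a conv/linear weight
    # of shape [out_channels, ...]: flatten everything but axis 0 and
    # take the per-row max magnitude.
    qmax = 2 ** (num_bits - 1) - 1
    return w.detach().abs().flatten(1).max(dim=1).values / qmax

def per_tensor_scale(x, num_bits=8):
    # A single scale shared by the entire activation tensor.
    qmax = 2 ** (num_bits - 1) - 1
    return x.detach().abs().max() / qmax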

Run the 4-bit PWS-quantized MobileNetV2 model with:

python main_cls.py --quantize --qtype pws --model mobilenetv2 --bits_weight 4
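Assuming --quantize is an on/off flag (as the argument list suggests), the unquantized FP32 baseline should run by simply omitting it:

python main_cls.py --model mobilenetv2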

Notes

Fake quantization

The quantization in this repo is fake quantization: tensors are quantized and immediately dequantized, so inference is NOT performed in pure INT8 arithmetic.
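As a rough illustration (a sketch, not this repo's code), asymmetric per-tensor fake quantization is just a quantize-dequantize round trip in float:

import torch

def fake_quantize(x, num_bits=8):
    # Snap values to the integer grid, then immediately map them back
    # to float, so every downstream op still runs in float32 rather
    # than in INT8 kernels.
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-x.min() / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale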

TODO

  • [x] Uniform quantization
  • [x] PWS quantization
  • [ ] Update results for classification models
  • [ ] PWG quantization
  • [ ] PWL quantization
  • [ ] Detection models
  • [ ] Segmentation models