quantization topic
optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
qonnx
QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX
cdma-simulation
CDMA communication system using MATLAB.
llama2gptq
Chat with LLaMa 2 and get responses backed by reference documents retrieved from a vector database. Uses a locally available model with GPTQ 4-bit quantization.
hsi-toolbox
Hyperspectral CNN compression and band selection
mixtral-offloading
Run Mixtral-8x7B models in Colab or on consumer desktops
llama.onnx
LLaMa/RWKV ONNX models, quantization, and test cases
tidy
Offline semantic text-to-image and image-to-image search on Android, powered by a quantized, state-of-the-art pretrained CLIP vision-language model and the ONNX Runtime inference engine
quantized_neural_networks
Python code (packaged in Docker container) to run the experiments in "A Greedy Algorithm for Quantizing Neural Networks" by Eric Lybrand and Rayan Saab (2020).