Alexandre Marques issues

Results 12 issues of Alexandre Marques

Created base class for knowledge distillation

This PR creates a base class for knowledge distillation modifiers and changes the existing knowledge distillation modifier to inherit from the base class. The goal of this change is to...

Add support for a total batch size argument in transformers transfer learning

Created the NMTrainingArguments class, which inherits from HF's TrainingArguments. This class allows one to add arguments to the training script and handle potential conflicts with other arguments. In particular, added "total_train_batch_size"...

Add support for ultrachat200k

Add ultrachat200k for perplexity eval

INT4 ONNX export

Replace quant_min/max values with INT8/UINT8 ranges so ONNX export is supported for other bit widths (e.g., 4).
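For reference, the integer ranges involved can be sketched as follows. This is only an illustration of the arithmetic: the helper names `int_range` and `fits_in` are hypothetical and do not come from the PR, and the sketch makes no claim about how the export code itself is written.

```python
from typing import Tuple


def int_range(num_bits: int, signed: bool = True) -> Tuple[int, int]:
    """Return (min, max) of an integer type with the given bit width."""
    if signed:
        return -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return 0, 2 ** num_bits - 1


def fits_in(inner: Tuple[int, int], outer: Tuple[int, int]) -> bool:
    """Check that every value of the inner range is representable in the outer one."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


int4 = int_range(4)                  # (-8, 7)
int8 = int_range(8)                  # (-128, 127)
uint8 = int_range(8, signed=False)   # (0, 255)

# A signed 4-bit value always fits in a signed 8-bit container.
print(int4, int8, uint8, fits_in(int4, int8))
```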

ONNX export for weights-only quantization

This PR adds a transformation that quantizes weights for weights-only quantization. It was tested on a Llama2 model.

Per-token dynamic quantization

This PR adds support for per-token dynamic quantization. Quantization scales and zero points are computed "on-the-fly" for each new tensor. Each token has its own quantization scale and zero-point (one...
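A minimal sketch of the idea, assuming asymmetric unsigned quantization and plain Python lists in place of tensors. The function names `quantize_per_token` and `dequantize_per_token` are hypothetical; the PR's actual implementation is not shown in this listing.

```python
from typing import List, Tuple


def quantize_per_token(
    tokens: List[List[float]], num_bits: int = 8
) -> Tuple[List[List[int]], List[float], List[int]]:
    """Quantize each token (row) with its own scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1  # unsigned integer range (assumption)
    quantized, scales, zero_points = [], [], []
    for row in tokens:
        # The range is taken from the current values, i.e. computed on the fly.
        # Including 0.0 keeps zero exactly representable.
        lo, hi = min(min(row), 0.0), max(max(row), 0.0)
        scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero row
        zero_point = round(qmin - lo / scale)
        q_row = [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in row]
        quantized.append(q_row)
        scales.append(scale)
        zero_points.append(zero_point)
    return quantized, scales, zero_points


def dequantize_per_token(
    quantized: List[List[int]], scales: List[float], zero_points: List[int]
) -> List[List[float]]:
    """Map the integers back to floats using each token's own parameters."""
    return [
        [(q - zp) * s for q in row]
        for row, s, zp in zip(quantized, scales, zero_points)
    ]


tokens = [[-1.0, 0.0, 1.0], [0.0, 10.0, 20.0]]  # two tokens with very different ranges
q, scales, zps = quantize_per_token(tokens)
restored = dequantize_per_token(q, scales, zps)
```

Because the second token spans a range ten times wider than the first, a single shared scale would waste most of the integer range on the first token; separate scales avoid that.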

Fix loading of state_dict for quantized transformers

The file_path was being joined to the files twice, leading to wrong paths (lines 663 and 683). This PR removes the duplicate path join.
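The effect of joining a directory twice can be reproduced in a few lines. The directory and file names below are hypothetical, and `posixpath` is used only so the result looks the same on every platform; the code in the PR is not reproduced here.

```python
import posixpath

model_dir = "checkpoints/model"                       # hypothetical directory
file_names = ["shard-00001.bin", "shard-00002.bin"]   # hypothetical file names

# First join: each file name becomes a path under the directory.
files = [posixpath.join(model_dir, name) for name in file_names]

# Second join on the already-joined paths: the directory appears twice.
doubled = [posixpath.join(model_dir, path) for path in files]

print(files[0])    # checkpoints/model/shard-00001.bin
print(doubled[0])  # checkpoints/model/checkpoints/model/shard-00001.bin
```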

Improve control of RN50 quantization

Define separate AddInput classes for the different branches of the bottleneck. This allows control of quantization via module class in addition to module name.

Updates to enable ultrachat200k

Ultrachat200k has two training splits, one for SFT and another for DPO. As a result, it doesn't have a "train" split per se. This PR allows for a train_sft...

Fix GSM template

There's no need for a period between the question and the line break, since the question will contain its own punctuation (normally a question mark). The period also doesn't match the...
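The doubled punctuation is easy to see with a toy template. The "Question:/Answer:" layout and both function names are assumptions made for illustration; the actual GSM template is not shown in this listing.

```python
def format_with_period(question: str) -> str:
    """Hypothetical template that appends a period after the question."""
    return f"Question: {question}.\nAnswer:"


def format_without_period(question: str) -> str:
    """Hypothetical template that relies on the question's own punctuation."""
    return f"Question: {question}\nAnswer:"


question = "How many apples are left?"
before = format_with_period(question)     # question ends in "?." before the line break
after = format_without_period(question)   # question ends in "?" before the line break
```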