AMMO Integration with Llama2 Post-Training Quantization Example and Tests
What does this PR do?
This PR integrates the AMMO library into the project and provides utilities for quantizing models, with a Llama2 PTQ example.
Several quantization algorithms are available, including INT8 SmoothQuant, INT4 AWQ, and FP8.
The main class `Quantizer` from the `nemo.export.quantize` submodule produces a `.qnemo` tarball to be consumed by the TensorRT-LLM toolbox for efficient inference. This will be part of the NeMo Framework Inference Container.
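As background on the INT8 path, post-training quantization typically maps floating-point values to 8-bit integers using a scale derived from the tensor's maximum magnitude. The sketch below illustrates plain per-tensor symmetric INT8 quantization; it is illustrative only and is not AMMO's actual implementation (SmoothQuant additionally migrates activation outliers into the weights before scaling):

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: scale = max|x| / 127."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax else 1.0
    # Round to nearest integer and clamp to the int8 range.
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floating-point values from INT8 codes."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.01, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
```

The round trip reconstructs each value to within one quantization step (`scale`), which is why calibrating the scale on representative data matters for accuracy.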
Collection: [NLP]
Changelog
- Adding nvidia-ammo package to requirements
- Adding `nemo.export.quantize` submodule for quantizing models
- Adding `tests.setup` module to facilitate Jenkins setup
- Adding PTQ test to Jenkins
Usage
Example for INT8 SmoothQuant method:
```sh
python examples/nlp/language_modeling/megatron_llama_quantization.py \
    model_file=llama2-7b-fp16.nemo \
    decoder_type=llama \
    quantization.algorithm=int8_sq \
    inference_tensor_parallel=1 \
    model_save_path=llama2-7b-fp16.qnemo
```
Jenkins CI
To run Jenkins, a NeMo User with write access must comment `jenkins` on the PR.
Before your PR is "Ready for review"
Pre checks:
- [x] Make sure you read and followed Contributor guidelines
- [x] Did you write any new necessary tests?
- [x] Did you add or update any necessary documentation?
- [ ] Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
- [ ] Reviewer: Does the PR have correct import guards for all optional libraries?
PR Type:
- [x] New Feature
- [ ] Bugfix
- [ ] Documentation
If you haven't finished some of the above items you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed. The Contributor guidelines list specific people who can review PRs to various areas.
Additional Information
For a more transparent and easier review process, some components were isolated into individual PRs:
- https://github.com/NVIDIA/NeMo/pull/8281
- https://github.com/NVIDIA/NeMo/pull/8429