larin92
Yes, but this can be accomplished in a single execution of `session.run`. It looks like the example expects one of those lines to be commented out at a time
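For reference, a minimal sketch of what I mean, assuming onnxruntime's `InferenceSession` (the model path and tensor names below are made up): both outputs can be requested in a single `session.run` call instead of commenting one fetch out at a time.

```python
# minimal sketch, assuming onnxruntime; model path and tensor names are hypothetical
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
input_ids = np.ones((1, 8), dtype=np.int64)  # dummy input batch

# fetch both outputs in one run() call instead of one at a time
logits, hidden = session.run(
    ["logits", "last_hidden_state"],  # hypothetical output names
    {"input_ids": input_ids},         # hypothetical input name
)
```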
yes please, support for pre-quantized models from HuggingFace would be great. I'm not even sure I can use a multi-GPU setup for DIY quantization with TensorRT-LLM, as this file doesn't have...
> I managed to quantize Mixtral 8x7B to 4 bpw.
>
> I first tried running this command:
>
> ```shell
> model="models--mistralai--Mixtral-8x7B-Instruct-v0.1"
> model_dir="/models/$model"
> model_chkpt_dir="/models/$model--trt-chkpt"
>
> python3...
> ```
a bit off-topic, but Longformer support would be nice as well
same issue, CUDA 12.4, originally used torch==2.4, tried these (didn't help):
```
pip install torch==2.6.0.dev20240922+cu124 --index-url https://download.pytorch.org/whl/nightly/cu124;
```
```
pip install torch==2.5.0.dev20240905+cu121 --index-url https://download.pytorch.org/whl/nightly/cu121;
```
```
pip install torch==2.6.0.dev20240923+cu121 --index-url...
```
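In case it helps with debugging: a quick sanity check (assuming a plain PyTorch install) to confirm which wheel and CUDA build actually ended up active:

```python
# quick environment check; assumes a plain PyTorch install
import torch

print(torch.__version__)          # e.g. 2.6.0.dev20240922+cu124
print(torch.version.cuda)         # CUDA version torch was built against
print(torch.cuda.is_available())  # whether torch can actually see the GPU
```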
for anyone bumping into this issue in the future: @mobicham explained on Discord that for `torchao` to work you need at least an Ampere GPU, and the same goes for `torch.compile`'ing the whole model
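A hedged sketch of that check, assuming plain PyTorch (the model below is a dummy placeholder): Ampere corresponds to compute capability 8.0, so you can gate `torch.compile` on `torch.cuda.get_device_capability()`:

```python
# minimal sketch, assuming plain PyTorch; the model is a dummy placeholder
import torch

model = torch.nn.Linear(16, 16)  # stand-in for the real model

major, minor = torch.cuda.get_device_capability()
if major >= 8:
    # Ampere (SM 8.0) or newer: torchao and full-model torch.compile should work
    model = torch.compile(model)
else:
    print(f"compute capability {major}.{minor} < 8.0; skipping torch.compile")
```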
should probably remove `self` at [line 968](https://github.com/huggingface/transformers/pull/34237/files#diff-ed55888e6665791fe92cc8fc0c499da54f4ace6738551cd9a2591881cda076deR968) as well
was thinking about implementing/using it for Longformers as well