Nathan Azrak
Ah thanks @ydm-amazon - I was aware of both, but am concerned about the quality difference in the model outputs given the reported MMLU decrease of SmoothQuant versus the "native"...
Readme updated in accordance with the current style! It's getting a bit repetitive - I suggest a separate PR that consolidates and explains the install pattern (and if Colab is...
Bumping this since this is a top Google result for this topic, and I haven't found an answer elsewhere - is there any way to "de-quantize" a model which was...
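To make the "de-quantize" question concrete, here is a minimal, hypothetical sketch of affine int8 quantization and its inverse. This is illustrative only (real libraries such as bitsandbytes or GPTQ quantize block-wise with their own formats); the function names and the toy weight vector are my own. The key point is that rounding is lossy, so de-quantizing recovers only an approximation of the original weights:

```python
import numpy as np

# Hypothetical affine (asymmetric) int8 quantization -- not any library's
# actual internals. Rounding in quantize() discards information, so
# dequantize() can only approximately reconstruct the original tensor.

def quantize(w, num_bits=8):
    qmin, qmax = 0, 2**num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = int(np.round(qmin - w.min() / scale))
    q = np.clip(np.round(w / scale + zero_point), qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Map integer codes back to floats: w_hat = (q - zero_point) * scale
    return (q.astype(np.float32) - zero_point) * scale

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, zp = quantize(w)
w_hat = dequantize(q, scale, zp)
print(np.max(np.abs(w - w_hat)))  # small but nonzero reconstruction error
```

So a quantized checkpoint can always be cast back to fp16/fp32 tensors this way, but the pre-quantization weights are not exactly recoverable.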
I'm curious about the official response here. My guess would be: * Currently packing does not work with completion-only training in TRL's implementation, which makes training much slower for training...
> Do u have any reference or evidence for worse performance on completion-only tuning for new tasks? I want to learn more! Nope, no references other than trying it on...
+1 to this. Copying to the default location `~/.kube/config` worked for me (I'm not sure how setting KUBECONFIG as an env var in a shell would have any effect since...
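For anyone landing here, the two approaches can be sketched as below (the `my-cluster-config` filename is a stand-in; the stand-in config contents are for illustration only). The scoping point is why the env var often "does nothing": `export` only affects the current shell and its child processes, not other terminals or already-running tools.

```shell
# Stand-in kubeconfig file for illustration.
printf 'apiVersion: v1\nkind: Config\n' > my-cluster-config

# Option 1: copy it to the default location kubectl falls back to.
mkdir -p "$HOME/.kube"
cp my-cluster-config "$HOME/.kube/config"

# Option 2: point KUBECONFIG at it. Note this export is visible only to
# this shell and its children -- a different terminal won't see it.
export KUBECONFIG="$PWD/my-cluster-config"
```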
I'm using the word "pretraining" loosely since afaik `pt` is the only task in LLaMA-Factory that accepts a raw `text` field. This is useful for training with datasets that...
@varad0309 We've begun using the [benchmark tool](https://github.com/huggingface/text-generation-inference/blob/0759ec495e15a865d2a59befc2b796b5acc09b50/benchmark/README.md). `ignore_eos_token` would still be a nice-to-have for forcing long output sequences, but the benchmark tool serves its purpose well!
@mikaylagawarecki If there are any public issues/PRs that we could follow to track progress for a fused linear loss in Core, that would be great. I don't see it currently...
@youkaichao I believe the motivation here is to use multiple vLLM processes on the same node to enable a data parallel setup, e.g. having 4 GPUs on a node, and...
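A minimal sketch of that data-parallel setup, assuming one OpenAI-compatible vLLM server per GPU (model name and ports are placeholders, and some external load balancer would round-robin requests across the ports):

```shell
# Hypothetical: pin one vLLM server process to each of 4 GPUs,
# each listening on its own port (8000-8003).
for i in 0 1 2 3; do
  CUDA_VISIBLE_DEVICES=$i python -m vllm.entrypoints.openai.api_server \
      --model meta-llama/Llama-2-7b-hf \
      --port $((8000 + i)) &
done
wait
```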