Nathan Azrak
Ah thanks @ydm-amazon - I was aware of both, but am concerned about the quality difference in the model outputs given the reported MMLU decrease of SmoothQuant versus the "native"...
Readme updated in accordance with the current style! It's getting a bit repetitive - I suggest a separate PR that consolidates and explains the install pattern (and if Colab is...
Bumping this since this is a top Google result for this topic, and I haven't found an answer elsewhere - is there any way to "de-quantize" a model which was...
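To make the "de-quantize" question concrete, here is a minimal, hypothetical sketch of affine int8 quantization and its inverse. This is illustrative only (real libraries such as bitsandbytes or GPTQ quantize block-wise with their own formats); the function names and the toy weight vector are my own. The key point is that rounding is lossy, so de-quantizing recovers only an approximation of the original weights:

```python
import numpy as np

# Hypothetical affine (asymmetric) int8 quantization -- not any library's
# actual internals. Rounding in quantize() discards information, so
# dequantize() can only approximately reconstruct the original tensor.

def quantize(w, num_bits=8):
    qmin, qmax = 0, 2**num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = int(np.round(qmin - w.min() / scale))
    q = np.clip(np.round(w / scale + zero_point), qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Map integer codes back to floats: w_hat = (q - zero_point) * scale
    return (q.astype(np.float32) - zero_point) * scale

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, zp = quantize(w)
w_hat = dequantize(q, scale, zp)
print(np.max(np.abs(w - w_hat)))  # small but nonzero reconstruction error
```

So a quantized checkpoint can always be cast back to fp16/fp32 tensors this way, but the pre-quantization weights are not exactly recoverable.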
I'm curious about the official response here. My guess would be: * Currently packing does not work with completion-only training in TRL's implementation, which makes training much slower for training...
> Do u have any reference or evidence for worse performance on completion-only tuning for new tasks? I want to learn more! Nope, no references other than trying it on...
+1 to this. Copying to the default location `~/.kube/config` worked for me (I'm not sure how setting KUBECONFIG as an env var in a shell would have any effect since...
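For anyone landing here, the two approaches can be sketched as below (the `my-cluster-config` filename is a stand-in; the stand-in config contents are for illustration only). The scoping point is why the env var often "does nothing": `export` only affects the current shell and its child processes, not other terminals or already-running tools.

```shell
# Stand-in kubeconfig file for illustration.
printf 'apiVersion: v1\nkind: Config\n' > my-cluster-config

# Option 1: copy it to the default location kubectl falls back to.
mkdir -p "$HOME/.kube"
cp my-cluster-config "$HOME/.kube/config"

# Option 2: point KUBECONFIG at it. Note this export is visible only to
# this shell and its children -- a different terminal won't see it.
export KUBECONFIG="$PWD/my-cluster-config"
```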
I'm using the word "pretraining" loosely since afaik `pt` is the only task in LLaMA-Factory that accepts a raw `text` field. This is useful for training with datasets that...
@varad0309 We've begun using the [benchmark tool](https://github.com/huggingface/text-generation-inference/blob/0759ec495e15a865d2a59befc2b796b5acc09b50/benchmark/README.md). `ignore_eos_token` would still be a nice-to-have for forcing long output sequences, but the benchmark tool serves its purpose well!
@mikaylagawarecki If there are any public issues/PRs that we could follow to track progress for a fused linear loss in Core, that would be great. I don't see it currently...
@youkaichao I believe the motivation here is to use multiple vLLM processes on the same node to enable a data parallel setup, e.g. having 4 GPUs on a node, and...
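A minimal sketch of that data-parallel setup, assuming one OpenAI-compatible vLLM server per GPU (model name and ports are placeholders, and some external load balancer would round-robin requests across the ports):

```shell
# Hypothetical: pin one vLLM server process to each of 4 GPUs,
# each listening on its own port (8000-8003).
for i in 0 1 2 3; do
  CUDA_VISIBLE_DEVICES=$i python -m vllm.entrypoints.openai.api_server \
      --model meta-llama/Llama-2-7b-hf \
      --port $((8000 + i)) &
done
wait
```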