Kawin

Results: 30 comments of Kawin

thanks for catching this @fahadh4ilyas ! you appear to be correct -- the completion IDs should have a BOS token at the beginning of the sequence. I think the confusion...
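
For context, a minimal sketch of the kind of check being discussed (a hypothetical helper, not the actual TRL patch; `input_ids` here would be the tokenized prompt + completion sequence):

```python
def prepend_bos(tokenizer, input_ids, attention_mask):
    """Illustrative helper: make sure a tokenized sequence starts with BOS."""
    if tokenizer.bos_token_id is not None and (
        not input_ids or input_ids[0] != tokenizer.bos_token_id
    ):
        # Prepend BOS and keep the attention mask in sync.
        input_ids = [tokenizer.bos_token_id] + input_ids
        attention_mask = [1] + attention_mask
    return input_ids, attention_mask
```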

@johncordeiro could you try the version here and see if you're still experiencing hanging? https://github.com/kawine/trl if so, more context would be helpful

@claralp thank you for all the fixes! seems...

> It sends the micro batch of every GPU directly to the CPU to calculate metrics there.

thanks @claralp ! maybe it's just me, but I'm not seeing the CPU...

@claralp if i add the line `metrics[f"device"] = torch.Tensor([float(str(self.args.device)[-1])]).cpu()` to `get_batch_loss_metrics`, i can see in wandb that the value is always 0 (i.e., the main process), suggesting that only metrics...

still getting the same thing with `metrics[f"device"] = torch.Tensor([float(str(self.accelerator.process_index)[-1])]).cpu()` and the latest version of accelerate (0.28.0)

@claralp i've tried this with deepspeed and the regular data parallel (examples/accelerate_configs/multi_gpu.yaml) and it still only reports stats from the main process. if you are printing the stats in store_metrics,...
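
For reference, a minimal sketch of what I'd expect the cross-process behaviour to look like when run with `accelerate launch` (this is my assumption about how one could verify it, not the trainer's actual code):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Each process contributes its own index; gather() concatenates tensors across
# processes, so after the call every rank sees one entry per GPU.
local_tag = torch.tensor([float(accelerator.process_index)], device=accelerator.device)
gathered = accelerator.gather(local_tag)  # shape: (num_processes,)

if accelerator.is_main_process:
    print("values gathered from processes:", gathered.tolist())
```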

Thanks for pointing this out @seanexp ! This looks like a bug, since I was under the (incorrect) impression that interleaving datasets with all_exhausted would preserve the relative size of...
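
For anyone following along, a quick sketch of the behaviour in question with 🤗 datasets (toy sizes, chosen only for illustration):

```python
from datasets import Dataset, interleave_datasets

small = Dataset.from_dict({"x": list(range(10))})
large = Dataset.from_dict({"x": list(range(100))})

# With stopping_strategy="all_exhausted" and no sampling probabilities, examples
# are taken round-robin and the smaller dataset is restarted until the larger one
# is exhausted, so the original 10:100 ratio is not preserved in the result.
mixed = interleave_datasets([small, large], stopping_strategy="all_exhausted")
print(len(mixed))  # noticeably larger than 110: the small dataset gets oversampled
```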

@winglian given that follow-up studies have found that unpaired KTO outperforms DPO/IPO/CPO on various tasks (https://www.semanticscholar.org/paper/Insights-into-Alignment%3A-Evaluating-DPO-and-its-Saeidi-Verma/db407c3a60c6dc768fde8dd1088dab3be951f04e), would it be possible to add support for it in axolotl? The...
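
For reference, a sketch of the unpaired format KTO consumes versus DPO's paired format (column names follow trl's KTOTrainer docs; axolotl's dataset config keys may differ):

```python
# Unpaired, KTO-style: each example is a single completion with a binary desirability label.
kto_examples = [
    {"prompt": "What is the capital of France?", "completion": "Paris.", "label": True},
    {"prompt": "What is the capital of France?", "completion": "The moon.", "label": False},
]

# Paired, DPO-style: every prompt needs both a chosen and a rejected completion.
dpo_examples = [
    {"prompt": "What is the capital of France?", "chosen": "Paris.", "rejected": "The moon."},
]
```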

sorry for the late reply @roshansridhar . I don't have access to the wandb logs anymore, but my coauthor should -- @xwinxu can you paste the plots from the last...

Is the model you're using on Huggingface? (asking so i can reproduce the issue)