Can AdaLomo be trained with the Trainer from transformers? Or might this be supported in the future?

Open lyt719 opened this issue 2 years ago • 16 comments

lyt719 avatar Jan 26 '24 09:01 lyt719

Hi, it's not possible to use the HF Trainer yet. To make it work, you would need to modify the backward and step in the HF Trainer's training loop and replace them with the fused_backward from AdaLomo.
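
For anyone who wants to try this before an official integration lands, here is a minimal sketch of the swap in a hand-rolled training loop. It assumes the lomo_optim package and that AdaLomo is constructed from the model; the constructor arguments shown are illustrative, only fused_backward(loss, lr) is confirmed in this thread.

import torch
import torch.nn as nn
from lomo_optim import AdaLomo

model = nn.Linear(16, 4).cuda()      # stand-in for a real causal LM
optimizer = AdaLomo(model, lr=1e-3)  # illustrative constructor call

for _ in range(10):
    x = torch.randn(8, 16, device="cuda")
    y = torch.randint(0, 4, (8,), device="cuda")
    loss = nn.functional.cross_entropy(model(x), y)
    # loss.backward() and optimizer.step() are replaced by a single fused call
    optimizer.fused_backward(loss, 1e-3)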

KaiLv69 avatar Jan 28 '24 05:01 KaiLv69

Hi, it's not possible to use the HF Trainer yet. To make it work, you would need to modify the backward and step in the HF Trainer's training loop and replace them with the fused_backward from AdaLomo.

Apart from the implementation in collie, is there an implementation of AdaLomo similar to the lomo_trainer in the LOMO code?

Jieni05 avatar Feb 01 '24 12:02 Jieni05

Since it has been integrated into collie, we probably won't implement a separate trainer like that. The implementation in collie is essentially the same as lomo_trainer; it is here: https://github.com/OpenLMLab/collie/blob/5a3041279c1840a3cac323401b3348958b982c8e/collie/controller/trainer.py#L469-L507

KaiLv69 avatar Feb 02 '24 02:02 KaiLv69

Hi, it's not possible to use the HF Trainer yet. To make it work, you would need to modify the backward and step in the HF Trainer's training loop and replace them with the fused_backward from AdaLomo.

One more question: once AdaLomo is integrated into HF, will it support other large models?

lyt719 avatar Feb 02 '24 03:02 lyt719

Yes, it is model-agnostic.

KaiLv69 avatar Feb 02 '24 03:02 KaiLv69

Hi, it's not possible to use the HF Trainer yet. To make it work, you would need to modify the backward and step in the HF Trainer's training loop and replace them with the fused_backward from AdaLomo.

@lyt719 I've raised an issue asking for the integration here: https://github.com/huggingface/transformers/issues/29649 but I can't give a timeline for the integration yet.

KaiLv69 avatar Mar 14 '24 06:03 KaiLv69

Hi, it's not possible to use the HF Trainer yet. To make it work, you would need to modify the backward and step in the HF Trainer's training loop and replace them with the fused_backward from AdaLomo.

@lyt719 I've raised an issue asking for the integration here: huggingface/transformers#29649 but I can't give a timeline for the integration yet.

Thank you very much!

lyt719 avatar Mar 15 '24 09:03 lyt719

Hi there! I am trying to integrate LOMO into the HF Trainer. Does AdaLOMO have a hard constraint on DeepSpeed: https://github.com/OpenLMLab/LOMO/blob/85d8105c48cbd676dbf6915ee755461cd241da9b/lomo_optim/adalomo.py#L85 ? ds_shape is only available for DeepSpeed models.

younesbelkada avatar Apr 11 '24 09:04 younesbelkada

Hi! Oh, actually there are no hard constraints on DeepSpeed. I have made the correction in this commit: https://github.com/OpenLMLab/LOMO/commit/2b826c5e641129e1a838284d46ee0082a12187f8 Thank you for pointing this out.
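
One way such a fix could look (illustrative only; the actual change is in the commit above, and param_shape is a hypothetical helper) is to fall back to the plain tensor shape whenever the ZeRO-3 attribute is missing, so the same code works with and without DeepSpeed:

def param_shape(p):
    # Hypothetical helper: prefer the DeepSpeed ZeRO-3 partitioned shape if present,
    # otherwise use the regular tensor shape.
    return p.ds_shape if hasattr(p, "ds_shape") else p.shape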

KaiLv69 avatar Apr 11 '24 09:04 KaiLv69

Ohh thanks for the quick fix!

younesbelkada avatar Apr 11 '24 09:04 younesbelkada

@KaiLv69 - I made another quick fix: https://github.com/OpenLMLab/LOMO/pull/78 let me know if the change looks good!

younesbelkada avatar Apr 11 '24 09:04 younesbelkada

@younesbelkada Thank you very much for fixing this edge case. I have merged the PR. (However, based on the two lines of code below, self.step_num must be > 0 when calculating beta2t.) https://github.com/OpenLMLab/LOMO/blob/ebbf410f3f7cb4d1951848fe55225777c3c83a67/lomo_optim/adalomo.py#L317-L318
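
For illustration only (this assumes an Adafactor-style decay and is not copied from adalomo.py): a step-dependent beta2t of the form below is undefined at step 0, which is why self.step_num has to be greater than 0 before it is computed.

def beta2t(step_num, decay_rate=0.8):
    # Raises ZeroDivisionError when step_num == 0, hence the step_num > 0 requirement.
    return 1.0 - step_num ** (-decay_rate)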

KaiLv69 avatar Apr 11 '24 12:04 KaiLv69

Thanks so much @KaiLv69 ! I think I am doing something wrong inside my training loop; here is the optimizer.step() logic:

                    if "Lomo" in self.optimizer.optimizer.__class__.__name__:
                        if self.optimizer.optimizer.clip_grad_norm is not None or (
                            hasattr(self.optimizer.optimizer, "loss_scaler")
                            and self.optimizer.optimizer.loss_scaler is not None
                        ):
                            self.optimizer.optimizer.grad_norm(tr_loss)
                    else:
                        # Optimizer step
                        self.optimizer.step()

What would be the correct way to call optimizer.step() here? (Note: self.optimizer is an instance of AcceleratedOptimizer: https://github.com/huggingface/accelerate/blob/b8c85839531ded28efb77c32e0ad85af2062b27a/src/accelerate/optimizer.py#L38)

younesbelkada avatar Apr 11 '24 12:04 younesbelkada

@younesbelkada LOMO fuses loss.backward() and optimizer.step() into optimizer.fused_backward(loss, lr), so I think lomo.fused_backward(loss, lr) should be called in training_step(): https://github.com/younesbelkada/transformers/blob/68a894a5875bfd958b8254afd3bbb23db9c2e813/src/transformers/trainer.py#L2483 There is then no need for a separate optimizer.step().

This code may serve as a reference:

# for other optimizers
loss = model(batch)
loss.backward()
optimizer.step()

# for LOMO
loss = model(batch)
optimizer.fused_backward(loss, lr)
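
(For context, the fusion is where LOMO's memory savings come from: each parameter is updated as soon as its gradient is computed during the backward pass, so the full set of gradients never needs to be kept in memory at once.)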

KaiLv69 avatar Apr 11 '24 12:04 KaiLv69

Thanks @KaiLv69 ! That seemed to work great!

Do you have any recommendations for best practices around hyperparameters? I am trying to fine-tune Mistral 7B on imdb but the loss struggles to converge 🤯

import torch
import datasets
from transformers import TrainingArguments, AutoTokenizer, AutoModelForCausalLM
import trl

train_dataset = datasets.load_dataset('imdb', split='train')

args = TrainingArguments(
    output_dir="./test-lomo",
    max_steps=1000,
    per_device_train_batch_size=2,
    optim="adalomo",
    gradient_checkpointing=False,
    logging_strategy="steps",
    logging_steps=1,
    learning_rate=5e-4,
    save_strategy="no",
    run_name="lomo-imdb",
)

model_id = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, low_cpu_mem_usage=True).to(0)

trainer = trl.SFTTrainer(
    model=model, 
    args=args,
    train_dataset=train_dataset,
    dataset_text_field='text',
    max_seq_length=512,
)

trainer.train()

younesbelkada avatar Apr 11 '24 16:04 younesbelkada

@younesbelkada Hi, sorry for the late response. Great work! I think learning_rate=5e-4 is fine, but per_device_train_batch_size=2 may be too small. I recommend setting a larger per_device_train_batch_size, such as 32.

KaiLv69 avatar Apr 12 '24 03:04 KaiLv69

However, when I run the fine-tune Mistral 7B on imdb code above, I get ValueError: adalomo is not a valid OptimizerNames, please select one of ['adamw_hf', 'adamw_torch', 'adamw_torch_fused', 'adamw_torch_xla', 'adamw_torch_npu_fused', 'adamw_apex_fused', 'adafactor', 'adamw_anyprecision', 'sgd', 'adagrad', 'adamw_bnb_8bit', 'adamw_8bit', 'lion_8bit', 'lion_32bit', 'paged_adamw_32bit', 'paged_adamw_8bit', 'paged_lion_32bit', 'paged_lion_8bit', 'rmsprop', 'rmsprop_bnb', 'rmsprop_bnb_8bit', 'rmsprop_bnb_32bit', 'galore_adamw', 'galore_adamw_8bit', 'galore_adafactor', 'galore_adamw_layerwise', 'galore_adamw_8bit_layerwise', 'galore_adafactor_layerwise']

The packages I have installed are: transformers==4.41.2, trl==0.9.4

luoruijie avatar Jun 27 '24 06:06 luoruijie

@luoruijie Hi, I took a look; you need to upgrade transformers to 4.42.2 or later. FYI: https://github.com/huggingface/transformers/blob/086c74efdf98b4e64ac40863ce190144316873a5/src/transformers/training_args.py#L176
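
A quick, illustrative way to check this from Python before hitting the error (packaging ships as a transformers dependency):

import transformers
from packaging import version

# Per the comment above, optim="adalomo" needs transformers 4.42.2 or later.
assert version.parse(transformers.__version__) >= version.parse("4.42.2"), (
    f"transformers {transformers.__version__} is too old for optim='adalomo'"
)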

KaiLv69 avatar Jul 01 '24 03:07 KaiLv69

Hi, we just tried the latest transformers and lomo-optim versions, and they still show the ds_shape error. Could you please update the release on PyPI? The current release there is from Mar 6th.

aaronlifenghan avatar Jul 02 '24 14:07 aaronlifenghan

@aaronlifenghan Sorry about that. I just released a new version; please give it a try! Glad to help if you have any other questions.

KaiLv69 avatar Jul 02 '24 15:07 KaiLv69