Could AdaLomo support training with the Trainer in transformers? Or might that be implemented in the future?
Hi, it cannot be used with the HF Trainer yet. To make it work, you would need to modify the backward and step calls in the HF Trainer's training loop and replace them with AdaLomo's fused_backward.
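For illustration, a minimal sketch of that substitution outside the Trainer (model, dataloader, and the learning rate are placeholders; AdaLomo is assumed to be importable from lomo_optim as in this repo):

from lomo_optim import AdaLomo  # optimizer shipped in this repo

lr = 1e-3                           # placeholder learning rate
optimizer = AdaLomo(model, lr=lr)   # AdaLomo is constructed from the model itself
for batch in dataloader:            # model / dataloader are placeholders
    loss = model(**batch).loss
    # one call replaces loss.backward() + optimizer.step()
    optimizer.fused_backward(loss, lr)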
Besides the implementation in collie, does AdaLomo have an implementation similar to lomo_trainer in the LOMO code?
Since it has been integrated into collie, we probably will not implement another trainer like that. The implementation in collie is essentially the same as lomo_trainer; it is here: https://github.com/OpenLMLab/collie/blob/5a3041279c1840a3cac323401b3348958b982c8e/collie/controller/trainer.py#L469-L507
One more question: once AdaLomo is integrated into HF, will it support other large models?
Yes, it is model-agnostic.
@lyt719 I've raised an issue asking for the integration here: https://github.com/huggingface/transformers/issues/29649, but I can't give a timeline for the integration yet.
Thank you very much!
Hi there!
I am trying to integrate LOMO into the HF Trainer. Does AdaLOMO have a hard constraint on DeepSpeed? https://github.com/OpenLMLab/LOMO/blob/85d8105c48cbd676dbf6915ee755461cd241da9b/lomo_optim/adalomo.py#L85 (ds_shape is only available for DeepSpeed models.)
Hi! Oh, actually there are no hard constraints on DeepSpeed. I have made the correction in this commit. Thank you for pointing this out. https://github.com/OpenLMLab/LOMO/commit/2b826c5e641129e1a838284d46ee0082a12187f8
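For reference, a guard along these lines removes the dependency on DeepSpeed-partitioned parameters (a sketch of the idea, not necessarily the exact code in the commit):

# Use the DeepSpeed-partitioned shape when it exists, otherwise the plain shape.
param_shape = p.ds_shape if hasattr(p, "ds_shape") else p.shape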
Ohh thanks for the quick fix!
@KaiLv69 - I made another quick fix: https://github.com/OpenLMLab/LOMO/pull/78. Let me know if the change looks good!
@younesbelkada Thank you very much for fixing this edge case. I have merged the PR.
(However, based on the two lines of code below, self.step_num must be > 0 when beta2t is calculated.)
https://github.com/OpenLMLab/LOMO/blob/ebbf410f3f7cb4d1951848fe55225777c3c83a67/lomo_optim/adalomo.py#L317-L318
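For context, AdaLomo uses an Adafactor-style decay schedule for the second-moment estimate, so a zero step count would cause a division by zero; a rough sketch of that schedule follows (variable names and the exact exponent are assumptions; see the lines linked above for the real code):

self.step_num += 1                                # increment before computing the decay
beta2t = 1.0 - self.step_num ** self.decay_rate   # e.g. decay_rate = -0.8, Adafactor-style
# step_num == 0 would make step_num ** decay_rate a division by zero,
# hence the requirement that self.step_num > 0 here.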
Thanks so much @KaiLv69 !
I think I am doing something wrong inside my training loop; here is the optimizer.step() logic:
if "Lomo" in self.optimizer.optimizer.__class__.__name__:
    if self.optimizer.optimizer.clip_grad_norm is not None or (
        hasattr(self.optimizer.optimizer, "loss_scaler")
        and self.optimizer.optimizer.loss_scaler is not None
    ):
        self.optimizer.optimizer.grad_norm(tr_loss)
else:
    # Optimizer step
    self.optimizer.step()
What would be the correct way to call optimizer.step() here? (Note: self.optimizer is an instance of AcceleratedOptimizer: https://github.com/huggingface/accelerate/blob/b8c85839531ded28efb77c32e0ad85af2062b27a/src/accelerate/optimizer.py#L38)
@younesbelkada LOMO fuses loss.backward() and optimizer.step() into optimizer.fused_backward(loss, lr).
So I think lomo.fused_backward(loss, lr) should be called in training_step():
https://github.com/younesbelkada/transformers/blob/68a894a5875bfd958b8254afd3bbb23db9c2e813/src/transformers/trainer.py#L2483
And there is no need for optimizer.step().
This code may serve as a reference for you:
# for other optimizers
loss = model(batch)
loss.backward()
optimizer.step()

# for LOMO
loss = model(batch)
optimizer.fused_backward(loss, lr)
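For example, a rough sketch of wiring this into a transformers Trainer subclass (the class name AdaLomoTrainer, the unwrapping of accelerate's AcceleratedOptimizer, and the handling of the stock optimizer.step() are assumptions for illustration, not the eventual official integration):

from transformers import Trainer

class AdaLomoTrainer(Trainer):
    # Hypothetical subclass: perform the update inside training_step via
    # fused_backward instead of loss.backward() + optimizer.step().
    def training_step(self, model, inputs):
        model.train()
        inputs = self._prepare_inputs(inputs)
        with self.compute_loss_context_manager():
            loss = self.compute_loss(model, inputs)
        lr = self._get_learning_rate()  # current learning rate from the scheduler
        # Unwrap AcceleratedOptimizer to reach the underlying AdaLomo instance.
        self.optimizer.optimizer.fused_backward(loss, lr)
        # Note: the outer Trainer loop still calls optimizer.step() and gradient
        # clipping; for a LOMO-style optimizer those would need to be skipped or
        # turned into no-ops, which this sketch glosses over.
        return loss.detach()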
Thanks @KaiLv69 ! That seemed to work great!
Do you have any recommendations for best practices around hyperparameters? I am trying to fine-tune Mistral 7B on imdb but the loss struggles to converge 🤯
import torch
import datasets
from transformers import TrainingArguments, AutoTokenizer, AutoModelForCausalLM
import trl

train_dataset = datasets.load_dataset('imdb', split='train')

args = TrainingArguments(
    output_dir="./test-lomo",
    max_steps=1000,
    per_device_train_batch_size=2,
    optim="adalomo",
    gradient_checkpointing=False,
    logging_strategy="steps",
    logging_steps=1,
    learning_rate=5e-4,
    save_strategy="no",
    run_name="lomo-imdb",
)

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, low_cpu_mem_usage=True).to(0)

trainer = trl.SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    dataset_text_field='text',
    max_seq_length=512,
)

trainer.train()
@younesbelkada Hi, sorry for the late response.
Great work! I think learning_rate=5e-4 is fine, but per_device_train_batch_size=2 is probably too small. I recommend setting a larger per_device_train_batch_size, such as 32.
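For example, only the batch-size-related fields in the script above would change (gradient_accumulation_steps is an assumed alternative if a per-device batch of 32 does not fit in memory):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./test-lomo",
    max_steps=1000,
    per_device_train_batch_size=32,    # larger batch as suggested above
    # gradient_accumulation_steps=16,  # assumed fallback if 32 does not fit per device
    optim="adalomo",
    learning_rate=5e-4,
    logging_strategy="steps",
    logging_steps=1,
    save_strategy="no",
    run_name="lomo-imdb",
)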
However, when I run the fine-tune Mistral 7B on imdb code above, I get: ValueError: adalomo is not a valid OptimizerNames, please select one of ['adamw_hf', 'adamw_torch', 'adamw_torch_fused', 'adamw_torch_xla', 'adamw_torch_npu_fused', 'adamw_apex_fused', 'adafactor', 'adamw_anyprecision', 'sgd', 'adagrad', 'adamw_bnb_8bit', 'adamw_8bit', 'lion_8bit', 'lion_32bit', 'paged_adamw_32bit', 'paged_adamw_8bit', 'paged_lion_32bit', 'paged_lion_8bit', 'rmsprop', 'rmsprop_bnb', 'rmsprop_bnb_8bit', 'rmsprop_bnb_32bit', 'galore_adamw', 'galore_adamw_8bit', 'galore_adafactor', 'galore_adamw_layerwise', 'galore_adamw_8bit_layerwise', 'galore_adafactor_layerwise']
The packages I have installed are: transformers==4.41.2, trl==0.9.4
@luoruijie Hi, I took a look; you need to upgrade transformers to 4.42.2 or later. FYI: https://github.com/huggingface/transformers/blob/086c74efdf98b4e64ac40863ce190144316873a5/src/transformers/training_args.py#L176
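A quick way to check this in Python (the 4.42.2 minimum is taken from the comment above):

import transformers
from packaging import version

# "adalomo" is only listed in OptimizerNames in newer transformers releases.
assert version.parse(transformers.__version__) >= version.parse("4.42.2"), (
    "Upgrade transformers, e.g.: pip install -U 'transformers>=4.42.2' lomo-optim"
)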
Hi, we just tried the latest transformers and lomo-optim versions, and it still shows the ds_shape error. Could you please update the release on PyPI? The current release is from Mar 6th.
@aaronlifenghan Sorry about that. I just released a new version; please give it a try! Happy to help if you have any other questions.