
Add Ernie model support to the Liger-kernel library

Open kadirnar opened this issue 8 months ago • 3 comments

I want to add TTS support to the Ernie-0.3B model, but there is no Liger Kernel support for it. Are you considering adding it? I would also like to release the resulting ernie-0.3b-tts model as open source.
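
For context, this is roughly what Liger Kernel integration looks like for architectures it already supports; ERNIE support would presumably mean adding an analogous patch function. A minimal sketch, using a currently supported model (Qwen2) for illustration; the ERNIE patcher named in the comments is hypothetical and does not exist at time of writing:

# Sketch: how Liger Kernel is applied to an architecture it already supports.
# ERNIE support would mean adding an analogous apply_liger_kernel_to_ernie()
# patcher; that function is an assumption, not an existing API.
from liger_kernel.transformers import (
    AutoLigerKernelForCausalLM,
    apply_liger_kernel_to_qwen2,
)

# Option 1: monkey-patch the architecture before instantiating the model.
apply_liger_kernel_to_qwen2(
    rope=True,                        # fused rotary position embeddings
    rms_norm=True,                    # fused RMSNorm
    swiglu=True,                      # fused SwiGLU MLP
    fused_linear_cross_entropy=True,  # avoids materializing the full logits
)

# Option 2: the Auto wrapper picks the right patcher for supported model types.
model = AutoLigerKernelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")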

kadirnar avatar Aug 20 '25 20:08 kadirnar

Thank you for your support of the ERNIE model. However, we currently lack experience in supporting Liger Kernel for ERNIE-based models, which would make it challenging for us to provide assistance. Would it be possible to explore training directly through ERNIEKit instead? If any issues arise, we would be glad to provide support for that approach.

cheng221 avatar Aug 22 '25 05:08 cheng221

Could you share a simple usage example for this?

An example of my training code:

....

# Load the tokenizer and model (FlashAttention 2 as the attention backend).
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, attn_implementation="flash_attention_2")


# Extend the vocabulary with custom TTS tokens and resize the embedding matrix to match.
number_add_tokens = 7 * 4096 + 10
new_tokens = [f"<custom_token_{i}>" for i in range(0, number_add_tokens + 1)]
tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))


# Two training sets, interleaved at a fixed ratio per global batch.
ds1 = load_dataset(dsn1, split="train")
ds2 = load_dataset(dsn2, split="train")


batch_total = batch_size * number_processes
train_dataset = BatchedRatioDataset(ds1, ds2, batch_total, ratio=config_ratio)


training_args = TrainingArguments(
    overwrite_output_dir=True,
    num_train_epochs=epochs,
    per_device_train_batch_size=batch_size,
    logging_steps=1,
    bf16=True,
    output_dir=f"./{base_repo_id}",
    fsdp="auto_wrap",
    report_to="wandb",
    save_steps=save_steps,
    remove_unused_columns=True,
    learning_rate=learning_rate,
    lr_scheduler_type="cosine", 
)


# FSDPTrainer and data_collator are custom helpers defined in the elided setup above.
trainer = FSDPTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
    log_ratio=config_ratio
)

trainer.train()
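
A note for readers: BatchedRatioDataset, FSDPTrainer, and data_collator are custom helpers from the elided part of the script. Below is a minimal sketch of what the ratio-batched dataset is assumed to do (fill each global batch from the two datasets at a fixed ratio); it is a guess reconstructed from the call site, not the author's actual code.

from torch.utils.data import Dataset

class BatchedRatioDataset(Dataset):
    # Hypothetical reconstruction: each global batch of size batch_total
    # draws `ratio` examples from ds1 for every one example from ds2.
    def __init__(self, ds1, ds2, batch_total, ratio):
        self.ds1, self.ds2 = ds1, ds2
        self.batch_total = batch_total
        self.n1 = int(batch_total * ratio / (ratio + 1))  # ds1 items per batch
        self.n2 = batch_total - self.n1                   # ds2 items per batch
        self.num_batches = min(len(ds1) // self.n1, len(ds2) // self.n2)

    def __len__(self):
        return self.num_batches * self.batch_total

    def __getitem__(self, idx):
        batch, offset = divmod(idx, self.batch_total)
        if offset < self.n1:
            return self.ds1[batch * self.n1 + offset]
        return self.ds2[batch * self.n2 + (offset - self.n1)]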

kadirnar avatar Aug 26 '25 10:08 kadirnar

May I ask whether you intend to expand the vocabulary for post-pretraining or for SFT? If it's the former, ERNIE does not currently support pretraining. If it's the latter, you can refer to similar code examples here.
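
As a general note (a common heuristic, not an official ERNIE recipe): when the vocabulary is expanded for SFT, the new rows of the embedding matrix are often initialized from the mean of the pre-trained embeddings rather than left randomly initialized. A minimal sketch, reusing tokenizer, model, and new_tokens from the snippet above:

import torch

# add_tokens returns how many tokens were actually added
num_new = tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

with torch.no_grad():
    emb = model.get_input_embeddings().weight
    # initialize the new rows with the mean of the pre-trained rows
    emb[-num_new:] = emb[:-num_new].mean(dim=0, keepdim=True)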

wtmlon avatar Sep 23 '25 07:09 wtmlon