
Add Ernie model support to the Liger-kernel library

Open kadirnar opened this issue 8 months ago • 3 comments

I want to add TTS support to the Ernie-0.3B model, but there is no Liger Kernel support for it. Are you considering adding it? I would also like to release the resulting ernie-0.3b-tts model as open source.
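
For context, this is roughly what Liger Kernel integration looks like for architectures it already supports; ERNIE support would presumably mean adding an analogous patch function. A minimal sketch, using a currently supported model (Qwen2) for illustration; the ERNIE patcher named in the comments is hypothetical and does not exist at time of writing:

# Sketch: how Liger Kernel is applied to an architecture it already supports.
# ERNIE support would mean adding an analogous apply_liger_kernel_to_ernie()
# patcher; that function is an assumption, not an existing API.
from liger_kernel.transformers import (
    AutoLigerKernelForCausalLM,
    apply_liger_kernel_to_qwen2,
)

# Option 1: monkey-patch the architecture before instantiating the model.
apply_liger_kernel_to_qwen2(
    rope=True,                        # fused rotary position embeddings
    rms_norm=True,                    # fused RMSNorm
    swiglu=True,                      # fused SwiGLU MLP
    fused_linear_cross_entropy=True,  # avoids materializing the full logits
)

# Option 2: the Auto wrapper picks the right patcher for supported model types.
model = AutoLigerKernelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")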

kadirnar avatar Aug 20 '25 20:08 kadirnar

Thank you for your support of the ERNIE model. However, we currently lack experience in supporting Liger Kernel for ERNIE-based models, which would make it challenging for us to provide assistance. Would it be possible to explore training directly through ERNIEKit instead? If any issues arise, we would be glad to provide support for that approach.

cheng221 avatar Aug 22 '25 05:08 cheng221

Could you share a simple usage example for this?

An example of my training code:

....

# Load the tokenizer and model (FlashAttention 2 as the attention backend).
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, attn_implementation="flash_attention_2")


# Extend the vocabulary with custom TTS tokens and resize the embedding matrix to match.
number_add_tokens = 7 * 4096 + 10
new_tokens = [f"<custom_token_{i}>" for i in range(0, number_add_tokens + 1)]
tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))


# Two training sets, interleaved at a fixed ratio per global batch.
ds1 = load_dataset(dsn1, split="train")
ds2 = load_dataset(dsn2, split="train")


batch_total = batch_size * number_processes
train_dataset = BatchedRatioDataset(ds1, ds2, batch_total, ratio=config_ratio)


training_args = TrainingArguments(
    overwrite_output_dir=True,
    num_train_epochs=epochs,
    per_device_train_batch_size=batch_size,
    logging_steps=1,
    bf16=True,
    output_dir=f"./{base_repo_id}",
    fsdp="auto_wrap",
    report_to="wandb",
    save_steps=save_steps,
    remove_unused_columns=True,
    learning_rate=learning_rate,
    lr_scheduler_type="cosine", 
)


# FSDPTrainer and data_collator are custom helpers defined in the elided setup above.
trainer = FSDPTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
    log_ratio=config_ratio
)

trainer.train()
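
A note for readers: BatchedRatioDataset, FSDPTrainer, and data_collator are custom helpers from the elided part of the script. Below is a minimal sketch of what the ratio-batched dataset is assumed to do (fill each global batch from the two datasets at a fixed ratio); it is a guess reconstructed from the call site, not the author's actual code.

from torch.utils.data import Dataset

class BatchedRatioDataset(Dataset):
    # Hypothetical reconstruction: each global batch of size batch_total
    # draws `ratio` examples from ds1 for every one example from ds2.
    def __init__(self, ds1, ds2, batch_total, ratio):
        self.ds1, self.ds2 = ds1, ds2
        self.batch_total = batch_total
        self.n1 = int(batch_total * ratio / (ratio + 1))  # ds1 items per batch
        self.n2 = batch_total - self.n1                   # ds2 items per batch
        self.num_batches = min(len(ds1) // self.n1, len(ds2) // self.n2)

    def __len__(self):
        return self.num_batches * self.batch_total

    def __getitem__(self, idx):
        batch, offset = divmod(idx, self.batch_total)
        if offset < self.n1:
            return self.ds1[batch * self.n1 + offset]
        return self.ds2[batch * self.n2 + (offset - self.n1)]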

kadirnar avatar Aug 26 '25 10:08 kadirnar

May I ask whether you intend to expand the vocabulary for post-pretraining or for SFT? If it's the former, ERNIE does not currently support pretraining. If it's the latter, you can refer to similar code examples here.
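
As a general note (a common heuristic, not an official ERNIE recipe): when the vocabulary is expanded for SFT, the new rows of the embedding matrix are often initialized from the mean of the pre-trained embeddings rather than left randomly initialized. A minimal sketch, reusing tokenizer, model, and new_tokens from the snippet above:

import torch

# add_tokens returns how many tokens were actually added
num_new = tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

with torch.no_grad():
    emb = model.get_input_embeddings().weight
    # initialize the new rows with the mean of the pre-trained rows
    emb[-num_new:] = emb[:-num_new].mean(dim=0, keepdim=True)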

wtmlon avatar Sep 23 '25 07:09 wtmlon