BitNet
KBLaM and/or fine-tuning?
I am no expert, so please forgive the naive questions, but:
- Is there any way to integrate KBLaM into these models?
- Is it possible to fine-tune the models, as I understand fine-tuning is recommended practice for KBLaM?
Links or information would be greatly appreciated.
Thanks for any response!
I can provide a minimal implementation using TRL.
Install the necessary packages:
pip install trl
pip install git+https://github.com/shumingma/transformers.git
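Before training, it can help to confirm the forked transformers build loads the model correctly. A minimal sanity-check sketch, assuming the fork exposes the BitNet architecture through the standard AutoModelForCausalLM API as shown on the model card:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/bitnet-b1.58-2B-4T-bf16"

# Loading only works with the forked transformers build installed above,
# which adds support for the BitNet architecture.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

print(model.config.model_type)  # quick check that the architecture resolved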
A sample code snippet (note that this is only a minimal example; hyperparameters such as batch size and learning rate should be tuned for optimal performance):
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

# Load a small instruction-tuning dataset for supervised fine-tuning.
dataset = load_dataset("trl-lib/Capybara", split="train")

# Training configuration; these hyperparameters are a starting point only.
training_args = SFTConfig(
    max_length=2048,
    output_dir="/tmp",
    per_device_train_batch_size=4,
)

# SFTTrainer accepts a model name string and loads it internally.
trainer = SFTTrainer(
    model="microsoft/bitnet-b1.58-2B-4T-bf16",
    train_dataset=dataset,
    args=training_args,
)

trainer.train()
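After training finishes, you can save the checkpoint and spot-check it with the usual transformers generation API. A minimal sketch; the save path and prompt are placeholders, and for best results you would apply the model's chat template:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Persist the fine-tuned weights and tokenizer.
trainer.save_model("/tmp/bitnet-sft")

tokenizer = AutoTokenizer.from_pretrained("/tmp/bitnet-sft")
model = AutoModelForCausalLM.from_pretrained("/tmp/bitnet-sft", torch_dtype=torch.bfloat16)

# Generate a short completion to spot-check the fine-tuned model.
inputs = tokenizer("What is KBLaM?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))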