Classification fine-tuning on my dataset: precision/recall is 0?
Very impressive work, thanks a lot for providing this tool. However, when I use it to fine-tune on my binary classification dataset (0/1), I encountered a strange problem: I got eval_precision/eval_recall/eval_f1 = 0.0, auc-roc = 0.5, and prc-auc = 0.504. These metrics stay unchanged during training, and I didn't get any error or warning. I don't think class imbalance is the cause, because I created a balanced dataset and the problem remains.
The balanced data is Balanced_lite.csv. I changed MAX_LEN in train_classification_models from 128 to 150 and added some parameters to TrainingArguments:
```python
training_args = TrainingArguments(
    output_dir=args.save_to,
    overwrite_output_dir=True,
    evaluation_strategy="epoch",
    # save_strategy="epoch",
    num_train_epochs=TRAIN_EPOCHS,
    learning_rate=LEARNING_RATE,
    weight_decay=WEIGHT_DECAY,
    per_device_train_batch_size=TRAIN_BATCH_SIZE,
    per_device_eval_batch_size=VALID_BATCH_SIZE,
    disable_tqdm=False,
    load_best_model_at_end=True,
    # metric_for_best_model="prc-auc",
    # greater_is_better=True,
    save_total_limit=50,
    lr_scheduler_type="cosine",
    logging_dir="logs/",
    logging_strategy="steps",
    save_strategy="epoch",
    seed=2504,
)
```
I tried the same train_classification_model.py on your example data (bbbp & hiv) and no problem occurred.
I can't understand why this happens; can you give me some clues? Maybe it's related to the SMILES? Should I provide canonical SMILES only?
Any help or discussion is valuable to me! Balanced_Lite.csv
Hi, thanks for reaching out and for your interest in our model. Based on your description, the issue most likely stems from changing MAX_LEN from 128 to 150. Because the model is being fine-tuned rather than trained from scratch, modifying the length could introduce inconsistencies that affect model behavior. Other factors seem less likely to be the root cause. Using non-canonical SMILES shouldn't be a problem, as they are converted to SELFIES; as long as they're valid, they should work fine. The dataset you're using contains around 500 positives and 40,000 negatives, which is quite imbalanced, but HIV has a similar class distribution and still trains successfully, so this likely doesn't explain the metrics. Reverting to MAX_LEN = 128 to see if that resolves the issue, or pretraining from scratch with the longer length, might be options worth exploring. Feel free to let us know how things go, and we'll be happy to help further if needed!
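To rule out invalid SMILES, a quick sweep over the dataset can flag any entries the SELFIES encoder rejects. This is only a sketch: it assumes the `selfies` package is installed and that you pass in the SMILES column as a list; the function name is hypothetical, and a different encoder can be injected for testing.

```python
def find_unencodable(smiles_list, encode=None):
    """Return the SMILES strings that the encoder fails on.

    By default this uses selfies.encoder (assumes the `selfies`
    package is installed); any callable that raises on invalid
    input can be substituted via `encode`.
    """
    if encode is None:
        import selfies as sf  # imported lazily so a custom encoder needs no dependency
        encode = sf.encoder
    bad = []
    for smi in smiles_list:
        try:
            encode(smi)
        except Exception:
            bad.append(smi)
    return bad
```

For example, `find_unencodable(df["smiles"].tolist())` on your Balanced_lite.csv would list every row that cannot be converted to SELFIES; if the list is non-empty, those rows are worth inspecting or dropping before training.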
Unfortunately, after I changed MAX_LEN back to 128, nothing changed. Precision/Recall/F1 stay at 0 and I still don't get any error. I have run out of options. Besides, I tried the HIV data from the fine-tuning example: it works with either MAX_LEN = 128 or 150, and the model performs well. Imbalanced data isn't the issue either, because when I run on another fairly balanced dataset (8000/30000), I get the same problem: Precision/Recall/F1 equal 0. So perhaps the problem comes from my data? I may try another dataset to see if the problem persists.
Hi, then it may be related to the SMILES. Could you please check whether the SELFIES representations were correctly generated? Another thing to check is data balance: although the model works well on the imbalanced HIV dataset, your task and data may be more sensitive to class imbalance. You could create a dataset with an exact 1:1 balance and rerun the training.
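For the 1:1 balance suggestion, one simple option is to downsample the majority class. A minimal sketch, assuming each row is a dict with a 0/1 label field (the function name, schema, and default seed are illustrative, not part of the repo):

```python
import random

def balance_1to1(rows, label_key="label", seed=2504):
    """Downsample the majority class so positives and negatives are 1:1.

    `rows` is a list of dicts; labels are assumed to be 0/1.
    The seed fixes the random subsample for reproducibility.
    """
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)  # avoid all-minority-then-majority ordering
    return balanced
```

With a pandas DataFrame you could apply the same idea via `df.to_dict("records")`, then write the balanced rows back out to CSV before training.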