Poor training performance (3.3% success rate) with pi0 model on locally downloaded official LIBERO dataset
I'm experiencing extremely poor performance when training the pi0 model using PyTorch with a locally downloaded version of the official LIBERO dataset. After training, the model achieves only a ~3.3% success rate, which is far below the expected performance.
My Personal Training Configuration:

```python
TrainConfig(
    name="pi0_lyh_libero_lora_official_full",
    model=pi0_config.Pi0Config(),
    # Data configuration
    data=LeRobotLiberoDataConfig(
        repo_id="/opt/liblibai-models/user-workspace2/dataset/libero_dataset",
        # local_files_only=True,  # Using local dataset downloaded from physical_intelligence/libero
        base_config=DataConfig(
            prompt_from_task=True,  # Load task descriptions from dataset's task field
        ),
        extra_delta_transform=True,  # Additional transformations required for LIBERO
    ),
    # Load converted PyTorch base model
    weight_loader=weight_loaders.CheckpointWeightLoader(
        "/opt/liblibai-models/user-workspace2/users/lyh/model_checkpoint/pi0/pytorch/pi0_base_lora"
    ),
    # Training hyperparameters
    num_train_steps=50_000,
    batch_size=32,  # Adjusted based on VRAM
    pytorch_training_precision="bfloat16",  # or "float32" if encountering numerical issues
    # LoRA specific settings
    # freeze_filter=pi0_config.Pi0Config().get_freeze_filter(),
    # Disable EMA (not needed for LoRA fine-tuning)
    ema_decay=None,
    # Optional: Enable gradient checkpointing to save VRAM
    # gradient_checkpointing=True,
)
```

Expected Behavior: The model should achieve success rates comparable to the official pi0 model performance on LIBERO tasks.

Actual Behavior: Training results in only a ~3.3% success rate, which suggests a fundamental issue with the training setup.

Questions/Requests:

1. Is there an issue with how I'm loading the locally downloaded dataset? (A sanity-check sketch is included below.)
2. Are there any critical configuration parameters missing or set incorrectly?
3. Could the LoRA configuration be causing this poor performance?
4. What is the expected success rate baseline for pi0 on LIBERO, and what training steps/configurations are recommended to achieve it?

Any guidance on debugging this performance issue would be greatly appreciated.
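For reference, this is the kind of sanity check I have in mind for the dataset-loading question: load the local copy directly and inspect one sample before training. This is only a minimal sketch; the `LeRobotDataset` import path and constructor arguments are assumptions and may differ between lerobot versions.

```python
# Minimal sanity check for the locally downloaded LIBERO dataset.
# NOTE: the import path and constructor arguments below are assumptions and may
# differ between lerobot versions -- adjust them to match the version openpi pins.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset(
    repo_id="physical_intelligence/libero",  # name the local copy was downloaded from
    root="/opt/liblibai-models/user-workspace2/dataset/libero_dataset",
)

print("number of frames:", len(dataset))

sample = dataset[0]
for key, value in sample.items():
    shape = getattr(value, "shape", None)
    print(f"{key}: type={type(value).__name__}, shape={shape}")

# Things worth verifying before training:
#  - image keys are present and not all zeros / wrongly normalized
#  - the task / prompt field is populated, since prompt_from_task=True
#  - state and action dimensions match what Pi0Config expects for LIBERO
```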
Same issue here, though not as low as 3.3%: pi05_libero PyTorch gets a much higher result than pi0_libero PyTorch.
Have you tried fine-tuning pi0.5_pytorch on LIBERO tasks? Are you observing that both pi0_pytorch and pi0.5_pytorch models trained with train_pytorch achieve significantly lower performance than the numbers reported in the paper and by the official JAX checkpoints? I'm curious whether this is a systemic issue with the PyTorch pipeline for the entire pi model family, or a problem specific to the pi0_pytorch pipeline.
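One way to narrow this down might be to compare the converted PyTorch checkpoint against the official JAX checkpoint on one identical observation, before any fine-tuning. The sketch below is only an illustration: `load_jax_policy` and `load_pytorch_policy` are hypothetical placeholders for however the two policies are constructed in your setup, and the `infer()["actions"]` interface is assumed from the openpi policy examples.

```python
# Hypothetical debugging sketch: compare the converted PyTorch checkpoint with the
# official JAX checkpoint on one identical observation, before any fine-tuning.
# load_jax_policy / load_pytorch_policy are placeholders, not real openpi functions,
# and the infer()["actions"] interface is assumed from the openpi policy examples.
import numpy as np

def compare_policies(load_jax_policy, load_pytorch_policy, observation):
    jax_policy = load_jax_policy()        # official JAX checkpoint
    torch_policy = load_pytorch_policy()  # converted PyTorch checkpoint

    jax_actions = np.asarray(jax_policy.infer(observation)["actions"])
    torch_actions = np.asarray(torch_policy.infer(observation)["actions"])

    diff = np.abs(jax_actions - torch_actions)
    print("max abs diff:", diff.max(), "| mean abs diff:", diff.mean())
    # Large differences already at this stage would point at weight conversion or
    # preprocessing, not at the PyTorch training loop itself.
```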
My pi0.5_pytorch fine-tuning (batch size 16, 14k iterations) only reaches around 75% on LIBERO tasks. pi0_pytorch is even worse, only about 50%.
It's the same for me!
> My pi0.5_pytorch fine-tuning (batch size 16, 14k iterations) only reaches around 75% on LIBERO tasks. pi0_pytorch is even worse, only about 50%.
Can you share your training loss? My training info: grad_norm=0.2106, loss=0.0280, param_norm=1380.0787.
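For what it's worth, here is a minimal sketch (plain PyTorch, placeholder model and batch) of how loss, grad_norm, and param_norm can be computed, so the numbers from different runs are comparable:

```python
# Minimal sketch of logging loss, grad_norm, and param_norm in a PyTorch training
# step. The model and batch are placeholders; the model is assumed to return a
# scalar loss when called on a batch.
import torch

def train_step(model, optimizer, batch, max_grad_norm=1.0):
    optimizer.zero_grad(set_to_none=True)
    loss = model(**batch)  # assumes the model returns a scalar loss
    loss.backward()

    # clip_grad_norm_ returns the total gradient norm *before* clipping
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)

    optimizer.step()

    # global L2 norm over all parameters, in the style of the param_norm metric above
    with torch.no_grad():
        param_norm = torch.norm(
            torch.stack([p.detach().norm(2) for p in model.parameters()]), 2
        )

    return {
        "loss": loss.item(),
        "grad_norm": grad_norm.item(),
        "param_norm": param_norm.item(),
    }
```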