
Fix LayerNorm gradient flow issue

Open tymat opened this issue 9 months ago • 2 comments

  • Fix LayerNorm.forward() to use tensor operations instead of scalar operations
  • Replace sum_keepdim()/size with mean_keepdim() to preserve gradients
  • Use broadcast_add() with epsilon tensor instead of scalar addition
  • Fix ops::layer_norm_slow() with same gradient-preserving changes
  • Update ops::layer_norm() to use slow implementation for proper gradients
  • Add comprehensive gradient flow test (now passes with 100% gradient flow)
  • Add numerical equivalence test to ensure accuracy is maintained
  • Fixes training issues where LayerNorm parameters weren't being updated

Resolves gradient propagation bug where only 33% of parameters received gradients during backpropagation, preventing proper model training. #3011
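For illustration, here is a minimal sketch of the LayerNorm math the bullets above describe, written against plain Rust slices rather than candle tensors (the `layer_norm` helper below is hypothetical, not the code from this PR). The point it mirrors: compute the mean as a single reduction (the analogue of candle's `mean_keepdim`) and keep epsilon inside the variance term before the square root, instead of mixing in ad-hoc scalar operations.

```rust
// Hypothetical scalar sketch of the LayerNorm forward pass; in candle
// the same steps would be tensor ops (mean_keepdim, broadcast_sub,
// broadcast_add with an epsilon tensor) so autograd can track them.
fn layer_norm(x: &[f32], gamma: &[f32], beta: &[f32], eps: f32) -> Vec<f32> {
    let n = x.len() as f32;
    // Mean over the normalized dimension (analogue of mean_keepdim).
    let mean = x.iter().sum::<f32>() / n;
    // Biased variance, as LayerNorm uses.
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    // Epsilon added to the variance before the square root.
    let inv_std = 1.0 / (var + eps).sqrt();
    x.iter()
        .zip(gamma.iter().zip(beta.iter()))
        .map(|(v, (g, b))| (v - mean) * inv_std * g + b)
        .collect()
}

fn main() {
    let x = [1.0_f32, 2.0, 3.0, 4.0];
    let gamma = [1.0_f32; 4];
    let beta = [0.0_f32; 4];
    let y = layer_norm(&x, &gamma, &beta, 1e-5);
    // The result is zero-mean and approximately unit-variance.
    println!("normalized = {:?}", y);
}
```

In the tensor version, expressing each of these steps as a differentiable tensor op is what lets gradients flow back into `gamma` and `beta` as well as the input.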

tymat · Jun 28 '25 19:06

Great. It would be awesome to have more training code examples and workflows with candle.

AlpineVibrations · Jun 29 '25 22:06

Hey! Thanks for this :)

I think we'll have to implement this in the optimized kernels as well before we can merge. I assume all the variants (cpu, cuda, metal) suffer from the same issue?

ivarflakstad · Jul 01 '25 10:07