n8programs

34 comments by n8programs

But my god, float32 is brutal. 1/10th the speed of float16...
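To put a rough number on that on a given machine, a minimal matmul micro-benchmark along these lines should do it (a sketch, assuming `mlx.core` is available; matrix size and iteration count are arbitrary):

```python
import time
import mlx.core as mx

def bench_matmul(dtype, n=4096, iters=20):
    """Time repeated n x n matmuls in the given dtype."""
    a = mx.random.normal((n, n)).astype(dtype)
    b = mx.random.normal((n, n)).astype(dtype)
    mx.eval(a, b)  # materialize inputs before timing
    start = time.perf_counter()
    for _ in range(iters):
        c = a @ b
        mx.eval(c)  # force MLX's lazy evaluation
    return time.perf_counter() - start

print("float16:", bench_matmul(mx.float16))
print("float32:", bench_matmul(mx.float32))
```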

How come mlx fails in 16-bit if most big models are pretrained that way? Is it because it doesn't use bfloat16?
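For context on the two 16-bit formats: float16 has a small dynamic range (max around 65504), while bfloat16 keeps float32's exponent range and only gives up mantissa bits, which is one reason pretraining setups favor it. A minimal illustration (a sketch, assuming `mlx.core`):

```python
import mlx.core as mx

x = mx.array(70000.0, dtype=mx.float32)
print(x.astype(mx.float16))   # inf: past float16's ~65504 ceiling
print(x.astype(mx.bfloat16))  # finite, just coarsely rounded
```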

Got it. Thank you for the info!

Can confirm the effectiveness of float32 end-to-end tuning on tinyllama.
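For anyone wanting to try the same thing, casting a model up to float32 before the tune is a one-liner over the parameter tree. A minimal sketch with a stand-in `nn.Linear`; the real model would be whatever TinyLlama loader you already use:

```python
import mlx.core as mx
import mlx.nn as nn
from mlx.utils import tree_map

# Stand-in module; any mlx.nn.Module (e.g. a loaded TinyLlama) works the same way.
model = nn.Linear(256, 256)

# Checkpoints usually ship in half precision...
model.update(tree_map(lambda p: p.astype(mx.float16), model.parameters()))

# ...cast everything back up to float32 before the end-to-end tune.
model.update(tree_map(lambda p: p.astype(mx.float32), model.parameters()))
print(model.weight.dtype)  # mlx.core.float32
```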

Do you perform your full fine-tune in float32?

Tried training qwen-1.8b. NaN loss immediately. Will try phi-2.

Think it's the float16.

Just checked - NaN w/ phi.
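A cheap way to confirm it's the dtype blowing up (rather than the data) is to trap the first non-finite loss in the training loop. A minimal sketch; `loss_and_grad_fn`, `model`, `batch`, and `step` are placeholders for whatever the loop already uses:

```python
import mlx.core as mx

def assert_finite(loss, step):
    """Stop on the first NaN/inf loss instead of optimizing on garbage."""
    bad = mx.isnan(loss).any() | mx.isinf(loss).any()
    if bad.item():
        raise FloatingPointError(f"non-finite loss at step {step}: {loss.item()}")

# Inside the loop (names are placeholders):
#   loss, grads = loss_and_grad_fn(model, batch)
#   mx.eval(loss)
#   assert_finite(loss, step)
```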

In-python implementation, yoinked from torch and ported w/ Claude - appears to work in training, though:

```python
def _compute_T1(A):
    """I + A"""
    return mx.eye(A.shape[-1]) + A

def _compute_T2(A):
    """I +...
```
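The listing cuts the snippet off inside `_compute_T2`. Going by the same Taylor-polynomial structure torch's `matrix_exp` helpers use, the next function would plausibly look like this (a reconstruction, not the exact ported code):

```python
import mlx.core as mx

def _compute_T2(A):
    """I + A + A @ A / 2"""
    A2 = A @ A
    return mx.eye(A.shape[-1]) + A + A2 / 2
```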