Sam Blouir

Results: 5 issues by Sam Blouir

The link from the text below is broken. **Extend and Test the Op in Python** Once you have built your custom op shared library, you can follow the example in...

**Please describe the bug** If `parallelize` with ShardParallel encounters JAX's FFT functions, the program crashes with an unhandled instruction error. **Please describe the expected behavior** The function runs FFT on...

**Please describe the bug** Hi, using bfloat16, whether by initializing an embedding layer or by casting a float32 to bfloat16, causes a double-free exception and a crash. Sometimes it just...

unknown error

**Please describe the bug** Hi, this code works with other methods but crashes when PipeshardParallel is used. **Please describe the expected behavior** The model compiles without crashing. **System information...

**Please describe the bug** When creating a toy model using ShardParallel/Zero2/PipeshardParallel and bfloat16, the first step works, but subsequent steps crash, citing an error in the arguments to `nccl_all_reduce_thunk.cc`. The...