Ran Ran

16 comments by Ran Ran

The training still hangs there with the latest change. Full logs: ``` 2023-10-11 01:27:34.284670: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No...

I think this is related to checkpoint saving/restoring. The previous logs mention `Saving checkpoint at step: 1`, while during restoring I noticed both: * `Restoring orbax checkpoint...

Thanks for adding `tf.` to `tf.nightly-se`

Hi @hyeygit, Chandra is updating `nightly-se` to `tf.nightly-se` for consistency. Could you send a separate PR to address that in your tests? Otherwise, SE tests won't be shown on...

Are we good to start the review? If so, please mark it as ready and assign it to @RissyRan, @gobbleturk, and @ZhiyuLi-goog. Thanks!

Hi, thanks for reaching out! Could you provide more detailed logs for the `not implemented` error with 64 dims? Yeah, padding may be needed based on hardware design for 192 dims....

The recent change is merged; please give it a try: https://github.com/jax-ml/jax/pull/30862

Thanks for reaching out! It seems you have tuned this general tile size a little ([here](https://github.com/AI-Hypercomputer/maxtext/blob/f69734088f4746a0507646be287f4f57e5e174d7/MaxText/layers/linears.py#L403)), but I'd like to mention this size could be very different based...

Thanks for the info! Yes, ideally, we should see pallas_call among the top operations. Our team is working on a DeepSeek-like model config and has onboarded some functional features recently. We are also...

Thanks for reaching out! We ran some internal benchmarks of DeepSeek v3 and Llama4 Maverick on [Cloud v5p](https://cloud.google.com/tpu/docs/v5p), using megablox, adamw, dtype=bf16, weight_dtype=f32, and FSDP sharding. The performance is around...