Celio Larcher Junior
Hi @hbq1! Thank you for the fix! One question: I am still seeing higher memory consumption with MultiStep than with the apply_every function. Is this expected?
As a follow-up, I did some debugging on my own, and it seems that the problem is in this part of the code (line 414): ``` new_updates, new_state = jax.lax.cond(...
I just added a PR merging the apply_every logic into the MultiStep function. From my initial tests, it reduces the memory footprint (I am able to train Llama2 7b on a v3-8 now) without affecting...
I'm glad to be able to help! About the issue, @hbq1, I can open it there, no problem.
Any news on this? I'm facing the same error.
Hi @joaoguilhermeS, I'm still facing the same error and I'm unable to follow this channel. It seems to improve in version 1.1.1, but that does not solve it.
Hi all, I managed to solve the problem. In the end, the issue was with Cloudflare, which has a default timeout of 100 seconds. I'm not sure if calls from...