Niket Kumar
@sw32-seo Can you please check the Orbax version in all regions?
Hi David, can you please try [0.4.7](https://github.com/google/orbax/blob/main/checkpoint/CHANGELOG.md#fixed)? It might help.
Thanks for debugging the issue and tracing it to the `SaveArgs.aggregate` option! While we reproduce the issue in our dev setup, please switch to `aggregate=False` if that works for your use...
Thank you for reporting this issue. We are working on a fix. As a workaround, would you be fine with renaming the prefix to something like `pponetworks`?
How are you planning to construct the Generator object back from the restored *state*?
Thanks for sharing the details. Will using `numpy.random.get_state(legacy=False)` meet your requirements? In that case, Orbax already supports it. Please take a look at this unit test: https://github.com/google/orbax/blob/53e2f22234717d29eca59282b496d3a6ba897b84/checkpoint/orbax/checkpoint/random_key_checkpoint_handler_test.py#L118 Alternatively, using Json...
Thanks for clarifying the difference between MT19937 and PCG64! A JSON-based solution is ideal for this scenario. I will look into it.
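To make the JSON idea concrete, here is a minimal sketch of round-tripping a NumPy `Generator` through JSON. It uses only NumPy and the standard library, not any Orbax API; the helper names `generator_state_to_json` and `generator_from_json` are hypothetical, and it assumes a bit generator such as PCG64 whose state dict contains only strings and Python ints.

```python
import json

import numpy as np


def generator_state_to_json(rng: np.random.Generator) -> str:
    """Serialize the bit generator state of `rng` to a JSON string.

    Assumption: the state dict holds only strings and Python ints,
    which is the case for PCG64.
    """
    return json.dumps(rng.bit_generator.state)


def generator_from_json(payload: str) -> np.random.Generator:
    """Rebuild a Generator from a string made by generator_state_to_json."""
    state = json.loads(payload)
    # The state dict records the bit generator class name, e.g. "PCG64".
    bit_generator = getattr(np.random, state["bit_generator"])()
    bit_generator.state = state
    return np.random.Generator(bit_generator)


rng = np.random.default_rng(seed=0)
rng.random(3)  # advance the stream so the saved state is not the initial one

payload = generator_state_to_json(rng)
restored = generator_from_json(payload)
```

After this, `restored` and `rng` should produce the same stream of values. MT19937 keeps its key as a NumPy array, so its state would need the array converted to a list (and back) before this approach applies.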
Based on the above error stack, it is not likely that `checkpoint_metadata_store` was called from a non-primary host. The checkpoint_metadata_store write is called right after the `tmp` dir creation, so...
Can you please attach additional details, such as the environment and the error stack? Generic debugging tips for JupyterLab may also be helpful: https://stackoverflow.com/questions/74154123/how-to-debug-jupyter-kernel-crashes
> repro code in the single cell does not crash (at least I've never seen)

Just to be clear, by `cell` you mean the notebook cell, correct?