shliu0
shliu0
In [tutorial](https://huggingface.co/docs/accelerate/quicktour), it is mentioned that `Some data at the end of the dataset may be duplicated so the batch can be divided equally among all workers.` So if my...
 I am training a vqvae and using 1*8 gpus, the code to log as belows, where variable `logs` is a dict: ``` if accelerator.sync_gradients and not...
In quant.py, I saw that you use ema_vocab_hit_SV rather than vocab_hit_V to record codebook hit frequency, what is the reason for doing this?