Xin (Simon) Dong
I have implemented ResNet-XNOR and reproduced the results of the XNOR-Net paper. I will release it soon, once I have time.
@tianyu-l @lessw2020 FYI, I am using this trick.

```python
hf_ds = HuggingFaceDataset(
    dataset_name, dataset_path, tokenizer, seq_len, world_size, rank, infinite
)
if shuffle:
    hf_ds._data = hf_ds._data.shuffle(seed=int(rank * 10007 + int(time.time())))
```
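For reference, here is a minimal standalone sketch of the same rank-dependent seeding applied directly to a `datasets` dataset; the function name and the shard-then-shuffle wiring are illustrative only, not torchtitan's actual implementation (the `HuggingFaceDataset` wrapper above already handles sharding internally):

```python
import time

from datasets import load_dataset


def build_rank_local_split(dataset_name: str, rank: int, world_size: int):
    # Illustrative only: load the raw dataset.
    ds = load_dataset(dataset_name, split="train")
    # Give each rank a disjoint shard of the data.
    ds = ds.shard(num_shards=world_size, index=rank)
    # Shuffle the shard with a rank-dependent seed (10007 is just a large
    # prime used to spread the per-rank seeds apart), mirroring the trick above.
    return ds.shuffle(seed=int(rank * 10007 + int(time.time())))
```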
Thanks for updating. @wanchaol Yes, I am talking about microbatching. https://github.com/pytorch/torchtitan/blob/58b11693507bc16e7df4618455ebe66e8094f71d/train.py#L291-L294

@awgu is it sufficient to change? Thanks.

From (current):

```python
with loss_parallel_ctx():
    pred = model(input_ids)
    loss = loss_fn(pred, ...
```
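For context, the question above is essentially about gradient accumulation. A minimal sketch of what a microbatched training step could look like, assuming a plain training loop; `num_microbatches` and the chunking rule are assumptions, and torchtitan's loss-parallel/pipeline specifics are omitted:

```python
def train_step(model, optimizer, loss_fn, input_ids, labels, num_microbatches):
    optimizer.zero_grad()
    # Split the global batch along the batch dimension into microbatches.
    micro_inputs = input_ids.chunk(num_microbatches)
    micro_labels = labels.chunk(num_microbatches)
    for mb_inputs, mb_labels in zip(micro_inputs, micro_labels):
        pred = model(mb_inputs)
        # Scale the loss so the accumulated gradient matches the
        # full-batch gradient.
        loss = loss_fn(pred, mb_labels) / num_microbatches
        loss.backward()
    optimizer.step()
```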
Could you please give some information on how to derive Eq. (3) in this paper?
Tested the branch:

```
  File "torchtitan/train.py", line 255, in main
    checkpoint.load()
  File "torchtitan/torchtitan/checkpoint.py", line 217, in load
    dcp.load(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/checkpoint/utils.py", line 427, in inner_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/checkpoint/state_dict_loader.py", line ...
```
I was wondering whether you have found the root cause of all ranks receiving the same dataloader state_dict? My guess is that it happens because the state_dict is not a DTensor?...
@tianyu-l @gokulavasan Thanks for the reply. One more note I want to mention here: the current implementation does not support `num_worker>1`. If we set `num_worker>1`, different workers will load...
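For illustration, the standard way to avoid workers loading duplicate data is to partition samples per worker via `torch.utils.data.get_worker_info()`; a minimal sketch follows (the class and its splitting rule are hypothetical, not torchtitan code):

```python
from torch.utils.data import IterableDataset, get_worker_info


class ShardedIterable(IterableDataset):
    """Illustrative iterable dataset that splits samples across DataLoader workers."""

    def __init__(self, samples):
        self.samples = samples

    def __iter__(self):
        info = get_worker_info()
        if info is None:
            # Single-process loading: this worker yields everything.
            worker_id, num_workers = 0, 1
        else:
            worker_id, num_workers = info.id, info.num_workers
        # Each worker keeps every num_workers-th sample, offset by its id,
        # so no two workers load the same sample.
        for idx, sample in enumerate(self.samples):
            if idx % num_workers == worker_id:
                yield sample
```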
I agree. For very large models, that may be the case. Torchtitan currently does on-the-fly tokenization. I really like the idea of on-the-fly tokenization, which is great for SFT and...
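As an illustration of the on-the-fly idea, a minimal sketch that streams raw text and tokenizes lazily into fixed-length chunks; the dataset name, the `"text"` field, and the chunking rule are assumptions, not torchtitan's implementation:

```python
from datasets import load_dataset
from transformers import AutoTokenizer


def stream_token_chunks(dataset_name: str, tokenizer_name: str, seq_len: int):
    # Streaming avoids materializing or pre-tokenizing the whole corpus.
    ds = load_dataset(dataset_name, split="train", streaming=True)
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    buffer = []
    for sample in ds:
        # Tokenize each document as it arrives (a "text" field is assumed).
        buffer.extend(tokenizer.encode(sample["text"]))
        # Emit fixed-length token chunks for the model.
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]
```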
Had the same issue here, and @chrisociepa's script was useful to me.