phile

6 issues by phile

This PR aims to move the attributes of `DistributedFusedAdam` to the correct device for the v1 state dict. After loading a v1 state dict, tensors in `DistributedFusedAdam.["buckets"]` will be on the CPU device....

**Describe the Bug** I am trying to build the latest apex from source against nightly PyTorch (2.1.0, commit id 3817de5d840bdff3f11ee23782494b5a13ae2001). I ran the following command: ``` python3 -m...

bug

Dear authors: Thank you for your nice work! I have a question about how the generator works in your code. It seems that the generator should take the random variables...

This PR aims to pass the `timeout` parameter to the `new_group` function. Previously, the ProcessGroups created by `new_group` did not set the `timeout` parameter, which would make communications under these ProcessGroups use...

**Describe the bug** Hi, I want to do deterministic training, which requires setting `NVTE_ALLOW_NONDETERMINISTIC_ALGO=0`. However, I find that it leads to the following NaN error at the first step when...

Hi, I tried to build the index of the wiki corpus using the script you provided in `scripts/run_retrieve_tevatron.sh`. However, the retrieval evaluation performance is very poor. The...