BradZhone
Results: 2 issues of BradZhone
As we can see, there are three latest official BitNet 2B models:
model1: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
model2: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16
model3: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf
What I know is that model1 is used for GPU inference, model2...
### Describe the bug
CPU memory utilization grows during training and eventually causes an OOM when the DataLoader's num_workers is greater than 0. Especially when more datasets are used, this memory growth...
bug
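The issue preview above is truncated and does not name a cause. One frequently reported cause of memory that grows only when `num_workers > 0` is a Dataset that holds many small Python objects (for example a list of dicts). Each forked worker starts out sharing the parent's memory pages, but simply reading a Python object updates its reference count, so those pages are gradually copied into every worker. PyTorch's `torch.utils.data` documentation warns about this and suggests keeping such data in a few large buffers instead. The sketch below assumes that cause; `SerializedList` is a hypothetical helper, not part of PyTorch. It packs every item into one pickled `bytes` buffer plus an offsets array, so a worker only touches two objects' refcounts no matter how many records there are.

```python
import pickle
from array import array


class SerializedList:
    """Hypothetical helper: a read-only list stored as one bytes buffer.

    Assumption: the memory growth comes from workers touching the refcounts
    of many small Python objects. Here only the buffer and the offsets array
    are Python objects, so reading items does not dirty the data pages.
    """

    def __init__(self, items):
        blobs = [pickle.dumps(item, protocol=pickle.HIGHEST_PROTOCOL) for item in items]
        # _offsets[i] holds the end position of item i inside _buffer.
        self._offsets = array("q")
        total = 0
        for blob in blobs:
            total += len(blob)
            self._offsets.append(total)
        self._buffer = b"".join(blobs)

    def __len__(self):
        return len(self._offsets)

    def __getitem__(self, idx):
        # Support negative indices like a normal list.
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError("SerializedList index out of range")
        start = 0 if idx == 0 else self._offsets[idx - 1]
        end = self._offsets[idx]
        # Deserialize a fresh copy of the item on each access.
        return pickle.loads(self._buffer[start:end])


if __name__ == "__main__":
    records = [{"path": f"img_{i}.jpg", "label": i % 3} for i in range(4)]
    data = SerializedList(records)
    print(len(data), data[2])
```

In a Dataset, one would build `SerializedList(records)` once in `__init__` and call `self.records[idx]` in `__getitem__`. The trade-off is a small unpickling cost per access in exchange for flat per-worker memory.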