
Request for AdamW8bit support on CPU (would help TorchTune)

Open · sanchitintel opened this issue 1 year ago · 5 comments

Feature request

Port AdamW8bit support for CPU from the multi-backend-refactor branch to the main branch

Motivation

Machines with GPUs from public cloud providers are usually expensive, while datacenter-grade CPUs are more readily available at lower prices. Toward the goal of making deep learning more accessible to developers and learners, the ability to fine-tune with AdamW8bit on CPU seems like a good milestone. TorchTune currently cannot support full fine-tuning on CPU with AdamW8bit because it uses bitsandbytes' AdamW8bit optimizer.
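For reference, TorchTune uses the optimizer as a drop-in replacement for torch.optim.AdamW, roughly like this (a minimal sketch; the model and hyperparameters are placeholders, not TorchTune's actual code):

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024)  # placeholder for the fine-tuned model

# Drop-in replacement for torch.optim.AdamW that keeps optimizer state in
# 8-bit. On the current main branch this path requires a CUDA device, which
# is what blocks CPU-only full fine-tuning.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-5, weight_decay=0.01)
```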

#898 enabled AdamW8bit for CPU in the multi-backend-refactor branch, but the main branch doesn't have it.

It'd be great if we could enable AdamW8bit for CPU on the bitsandbytes main branch before TorchTune's next release (provided there is a bitsandbytes release before then), so that users who install TorchTune automatically get a bitsandbytes version that supports AdamW8bit on CPU.
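To make the scope of the request concrete: AdamW8bit keeps the two Adam state tensors (exp_avg and exp_avg_sq) blockwise-quantized to int8 and dequantizes them for each update, which is device-agnostic in principle. Below is a rough pure-PyTorch sketch of the idea; this is not bitsandbytes' actual implementation (which uses dynamic quantization and fused kernels), and the helper names, block size, and hyperparameters are illustrative only.

```python
import torch

BLOCK = 256  # block size for blockwise quantization (illustrative value)

def quantize_blockwise(x: torch.Tensor, block: int = BLOCK):
    """Quantize a fp32 tensor to int8 with one fp32 absmax scale per block."""
    flat = x.flatten()
    pad = (-flat.numel()) % block
    flat = torch.nn.functional.pad(flat, (0, pad))
    blocks = flat.view(-1, block)
    scale = blocks.abs().amax(dim=1, keepdim=True).clamp_min(1e-8)
    q = torch.clamp((blocks / scale * 127).round(), -127, 127).to(torch.int8)
    return q, scale, pad

def dequantize_blockwise(q, scale, pad, shape):
    """Invert quantize_blockwise back to a fp32 tensor of the given shape."""
    flat = (q.to(torch.float32) / 127 * scale).flatten()
    if pad:
        flat = flat[:-pad]
    return flat.view(shape)

@torch.no_grad()
def adamw8bit_step(p, grad, state, lr=1e-3, betas=(0.9, 0.999),
                   eps=1e-8, weight_decay=1e-2):
    """One AdamW step with exp_avg / exp_avg_sq stored as int8 + scales."""
    m = dequantize_blockwise(*state["m"], p.shape)
    v = dequantize_blockwise(*state["v"], p.shape)
    m.mul_(betas[0]).add_(grad, alpha=1 - betas[0])
    v.mul_(betas[1]).addcmul_(grad, grad, value=1 - betas[1])
    state["step"] += 1
    bc1 = 1 - betas[0] ** state["step"]
    bc2 = 1 - betas[1] ** state["step"]
    p.mul_(1 - lr * weight_decay)               # decoupled weight decay
    p.addcdiv_(m / bc1, (v / bc2).sqrt().add_(eps), value=-lr)
    state["m"] = quantize_blockwise(m)          # re-quantize state to int8
    state["v"] = quantize_blockwise(v)

# Usage: state starts as quantized zeros; everything runs on plain CPU tensors.
p = torch.randn(64, 64)
state = {"step": 0,
         "m": quantize_blockwise(torch.zeros_like(p)),
         "v": quantize_blockwise(torch.zeros_like(p))}
adamw8bit_step(p, torch.randn_like(p), state)
```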

Thanks!

Your contribution

@jianan-gu could port over his code from the multi-backend-refactor branch to the main branch.

cc @mingfeima @ashokei @TimDettmers

sanchitintel avatar May 28 '24 19:05 sanchitintel

#1220 will fix this issue.

sanchitintel avatar May 28 '24 22:05 sanchitintel

> #1220 will fix this issue.

I don't recall seeing any optimizers implemented yet for CPU, but I may be mistaken.

A paged optimizer doesn't make sense to me for CPU, but I can understand the request for AdamW8bit.
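To make that distinction concrete (both classes exist in bnb.optim, though on main they currently require CUDA; the tiny model here is just a placeholder):

```python
import torch
import bitsandbytes as bnb

params = list(torch.nn.Linear(512, 512).parameters())  # placeholder parameters

# 8-bit optimizer: Adam state is stored blockwise-quantized to int8, cutting
# optimizer-state memory roughly 4x versus fp32. Useful on any device.
opt_8bit = bnb.optim.AdamW8bit(params, lr=1e-4)

# Paged optimizer: state lives in CUDA unified memory so the driver can page
# it between GPU and host under memory pressure. On a CPU-only run the state
# is already in host memory, so there is nothing to page.
opt_paged = bnb.optim.PagedAdamW8bit(params, lr=1e-4)
```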

matthewdouglas avatar May 29 '24 04:05 matthewdouglas

Thanks for pointing that out, @matthewdouglas! I've revised the description.

@jianan-gu @xia-weiwen, please clarify whether you have added an AdamW8bit implementation for CPU to bitsandbytes. If not, do you have plans to add it? Thanks!

sanchitintel avatar May 29 '24 04:05 sanchitintel

@sanchitintel Yes, we are going to do it. cc @jianan-gu @jiqing-feng

Xia-Weiwen avatar May 29 '24 05:05 Xia-Weiwen

@sanchitintel thanks for raising this. When is the next torchtune release foreseen?

Hmm, the problem is that the device abstraction / dispatcher situation is still not stable. Things will change fundamentally in the next three weeks. I'm not sure this can be done as a PR to main in isolation. @Xia-Weiwen, could you sketch out a bit more how you think this would make sense?

Titus-von-Koeller avatar Jun 03 '24 17:06 Titus-von-Koeller