
Using Bigscience Bloom 176B or Bloomz 176B instead of GPT-J 6B

Open sblaszak opened this issue 2 years ago • 1 comment

Would it be possible to take this software and substitute the Bigscience Bloom 176B or Bloomz 176B models for the present GPT-J 6B model, as a simple drop-in in the code? If so, would running such a fine-tuning be expected to take an equivalently larger amount of time and/or GPU resources? Thanks.

sblaszak avatar Mar 24 '23 21:03 sblaszak
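For illustration, in a Hugging Face `transformers`-based training script the swap itself is typically a one-line change of the model identifier, but the weight memory footprint scales with parameter count. A rough sketch (the Hub IDs below are assumptions to verify against the Hub; the memory estimate counts weights only, ignoring optimizer states, gradients, and activations):

```python
# Sketch: what a "drop-in" model substitution looks like, and why the
# resource cost is very different. Hub IDs are assumed, verify them.
MODEL_IDS = {
    "gpt-j-6b": "EleutherAI/gpt-j-6B",     # current base model
    "bloom-176b": "bigscience/bloom",      # proposed substitute (assumed ID)
    "bloomz-176b": "bigscience/bloomz",    # proposed substitute (assumed ID)
}

def estimated_fp16_gib(n_params_billion: float) -> float:
    """Weight-only memory estimate: 2 bytes per parameter in fp16."""
    return n_params_billion * 1e9 * 2 / 2**30

print(round(estimated_fp16_gib(6), 1))    # GPT-J 6B weights   → ~11.2 GiB
print(round(estimated_fp16_gib(176), 1))  # BLOOM 176B weights → ~327.8 GiB

# The code change itself would be roughly:
# model = transformers.AutoModelForCausalLM.from_pretrained(MODEL_IDS["bloom-176b"])
```

So while the substitution is mechanically simple, the weights alone grow by roughly 30x, pushing the job from single-GPU territory into multi-node sharded training.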

Unofficial comment - generally 'yes', but the real premise here is that you can achieve something near the state-of-the-art performance of models of that size with a much smaller model. Using a 176B-param model kind of defeats that purpose.

srowen avatar Mar 25 '23 03:03 srowen

Is the difference between this repo and Alpaca that the model used is GPT-J instead of LLaMA?

Syno8 avatar Mar 27 '23 06:03 Syno8

Yes, at the moment that is the substantial difference.

srowen avatar Mar 27 '23 20:03 srowen