Release 0.12.2 is broken due to #4480 and #4513
🐛 Bug
https://github.com/facebookresearch/fairseq/pull/4480 was merged on Jun 16th, 2022, but then reverted (by https://github.com/facebookresearch/fairseq/commit/956fcf495b2d5d696ba114520363f82148a8a649) on Jun 22nd.
However, it seems the 0.12.2 release was cut in between (despite the release date saying Jun 27th).
To Reproduce
E.g. pip install fairseq, then go to where the files have been installed and look at fairseq/modules/transformer_layer.py: the __init__() function of TransformerEncoderLayerBase contains lots of references to BT.
Also, models/transformer/transformer_layer.py has the load logic under forward_scriptable(). A quick way to check the installed wheel is sketched below.
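A minimal sketch for confirming what is in the installed wheel without hunting for the site-packages path by hand (assumes fairseq 0.12.2 is installed):

```python
# Print where pip put the file and whether the BT code is present.
import inspect
import fairseq.modules.transformer_layer as tl

src = inspect.getsource(tl.TransformerEncoderLayerBase.__init__)
print(tl.__file__)   # path of the installed module
print("BT" in src)   # True on the broken 0.12.2 wheel
```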
(Aside: shouldn't this kind of version-detection code be run once globally, with the result added to cfg, rather than living in the forward() function, which is going to get called a lot? See the sketch below.)
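For what it's worth, here is a minimal sketch of what I mean; the flag and class names are hypothetical, not fairseq's actual code:

```python
import torch

# Parsed once at import time; forward() then reads a cached boolean instead
# of re-inspecting torch.__version__ on every batch.
_TORCH_GE_1_12 = tuple(int(p) for p in torch.__version__.split(".")[:2]) >= (1, 12)

class MyLayer(torch.nn.Module):
    def forward(self, x):
        if _TORCH_GE_1_12:
            return x  # hypothetical BetterTransformer fast path
        return x      # hypothetical fallback path
```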
Additional context
The revert commit describes how this was causing a test failure. (So I think you need to check that your release procedure runs the tests on the release-candidate branch!)
I think it is also the cause of #4518, and possibly of #4740?
But beyond that, the code in transformer_layer.py was also buggy: when loading a big transformer model (created with a version of fairseq prior to this), that code added a number of variables to every encoder layer, which added 75 million float32 weights to the saved model. It has taken me a day and a half of troubleshooting to narrow it down to this "BT" code.
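If anyone else needs to check whether a checkpoint picked up the extra weights, this is roughly how I counted them (a sketch; the filename is a placeholder, and it assumes a standard fairseq checkpoint whose state dict lives under the "model" key):

```python
import torch

ckpt = torch.load("checkpoint_best.pt", map_location="cpu")
state = ckpt["model"]

# Total parameter count; compare against the same architecture saved by 0.12.1.
total = sum(t.numel() for t in state.values() if torch.is_tensor(t))
print(f"{total:,} float values across {len(state)} tensors")

# Listing the keys makes the unexpected per-layer entries easy to spot.
for name in sorted(state):
    print(name)
```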
(BTW, it was also not mentioned anywhere that "BT" refers to the BetterTransformer code added in PyTorch 1.12.)
I would suggest that 0.12.3 be released, based on the intended Jun 27th state of main?
Aha! It was re-landed on Jun 24th (see https://github.com/facebookresearch/fairseq/pull/4513) and then backed out a second time on Jul 14th. So that explains how it ended up in the 0.12.2 release.
Hi @DarrenCook, you are right, and loading 0.12.2 checkpoints with 0.12.1 is also buggy/risky. A 0.12.3 release does indeed seem to be the right action. Kind regards
Using pip install fairseq==0.12.1 has been working for me so far.
I had to throw away the models made with 0.12.2: the extra weights added to the encoder seem to be part of the model being trained, i.e. it has effectively created a different architecture. Which also means that supporting models built with 0.12.2 in 0.12.3 and later is not going to work.
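To see that the architectures really diverged, diffing the parameter names between an old and a new checkpoint makes it obvious (a sketch; the filenames are placeholders):

```python
import torch

old = set(torch.load("model_0.12.1.pt", map_location="cpu")["model"])
new = set(torch.load("model_0.12.2.pt", map_location="cpu")["model"])

# Keys present only in the 0.12.2 checkpoint are the extra BT weights.
print(sorted(new - old))
```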
@frank-wei can you investigate please? @dianaml0 @cbalioglu do we need to cut a new release to prevent people from working on the broken 0.12.2?
Yeah, we should have a 0.12.3 release and mark 0.12.2 as deprecated. Let me sync up internally.
Yes, the code has some problems (it brings in more variables and takes more memory) and we decided to back it out: https://github.com/facebookresearch/fairseq/pull/4568
@cbalioglu any news on 0.12.3 ?
Things like MMS have been added to the repo since the latest release; would it be possible to have a new release that includes the MMS things? I'm asking as I would like to package this for NixOS. Using an officially released version rather than the latest master would be great. Thank you for the software!
Hey all, any news on a new release? We are currently unable to use this repo because we have a dependency on numpy > 1.24, which requires a fix for the deprecated (now removed) np.float alias; that seems to have been taken care of here: https://github.com/ncoish/fairseq/commit/a24fdf2d1b36e699d8f4e3efd33b7b78d6a02e7e
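For anyone hitting the same wall: np.float was deprecated in NumPy 1.20 and removed in 1.24, so old code raises AttributeError there. The usual fix (which appears to be what that commit does) is to switch to the builtin or an explicit dtype:

```python
import numpy as np

# Fails on NumPy >= 1.24: AttributeError: module 'numpy' has no attribute 'float'
# a = np.zeros(3, dtype=np.float)

a = np.zeros(3, dtype=float)       # builtin float works on all NumPy versions
b = np.zeros(3, dtype=np.float64)  # or name the width explicitly
```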