
Port IDEFICS to TensorFlow

Open a8nova opened this issue 2 years ago • 106 comments

What does this PR do?

This PR ports IDEFICS to TensorFlow.

Fixes # (issue)

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [ ] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

a8nova avatar Oct 17 '23 13:10 a8nova

@VictorSanh just an FYI - I am hoping this model is TF-portable.

a8nova avatar Oct 20 '23 10:10 a8nova

also ccing @Rocketknight1

ArthurZucker avatar Oct 20 '23 14:10 ArthurZucker

🤗👀

VictorSanh avatar Oct 20 '23 15:10 VictorSanh

Hey @a8nova! I'm the TF maintainer around here, and this sounds like a great idea. Feel free to ping me if you encounter any difficulties. In general when porting code to TF you will first want to do the 'boilerplate'. Take a look at #22970 to see how this works! All of these steps are fairly quick:

  1. Copy the modeling_idefics.py file to modeling_tf_idefics.py
  2. Rename all the classes e.g. IdeficsForXXX -> TFIdeficsForXXX
  3. Add the relevant imports in src/transformers/models/idefics/__init__.py and /src/transformers/__init__.py
  4. Add the TF classes to modeling_tf_auto.py
  5. Add the TF classes in docs/source/en/model_doc/

After this, you can start actually porting the code in modeling_tf_idefics.py. In general, you can just replace torch.xxx() ops with tf.xxx(), and layers from torch.nn with layers from tf.keras.layers. When creating layers, subclass from tf.keras.layers.Layer instead of nn.Module. Many of the ops are exactly identical, or only have minor name changes. There are a few key changes to watch out for, though:

  1. The TF forward() method is called call(). NOT __call__()!
  2. TF layers usually don't have an input_dim argument. This value is inferred when they are built with their first input.
  3. When creating a layer (any class that is subclassed from tf.keras.layers.Layer, which includes TF built-in layers like Dense), pass the attribute name as the name argument. This ensures the TF weights layout lines up with the PyTorch state dict, like so:
self.layer_norm1 = tf.keras.layers.LayerNormalization(name="layer_norm1")
  4. In-place modifications of tensors are prohibited in TF (and JAX!). Most neural net code doesn't do this anyway, because it creates problems in backprop. If you need to do it, you can use something like tf.where() or tf.scatter_nd_update() to create a new tensor with the updates instead. This can be tricky, let me know if you need help!
  5. For... reasons... the base stem of the model is contained in TFIdeficsBaseLayer and TFIdeficsModel is just a wrapper around this. Derived classes all use TFIdeficsBaseLayer and don't create any TFIdeficsModel layers. This is different from Torch models, where the model stem is contained in IdeficsModel and the derived classes use it as a layer.
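For the mechanical renames, a rough lookup table can help. This is compiled from memory as a starting point, not an exhaustive or authoritative mapping, so double-check each entry against the TF docs:

```python
# Common PyTorch -> TensorFlow correspondences for porting modeling code.
# A hedged starting point, not a complete mapping.
TORCH_TO_TF_OPS = {
    "torch.cat": "tf.concat",
    "torch.unsqueeze": "tf.expand_dims",
    "torch.clamp": "tf.clip_by_value",
    "torch.permute": "tf.transpose",
    "torch.arange": "tf.range",
    "nn.Linear": "tf.keras.layers.Dense",
    "nn.LayerNorm": "tf.keras.layers.LayerNormalization",
    "nn.Embedding": "tf.keras.layers.Embedding",
    "nn.Conv2d": "tf.keras.layers.Conv2D",
}
```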

We've actually had some success using GPT-4 to draft a port of the modeling code, so let me know if you'd like me to do that and add the GPT port of modeling_tf_idefics.py to this PR as a starting point!

After all this, the last step is to make changes to any processing code and tests needed to also support the TF model. It's a long list, but it's doable!

Rocketknight1 avatar Oct 20 '23 16:10 Rocketknight1

Thank you @Rocketknight1 for the detailed guide. I have steps 1, 2 & 3 done already; I just updated the PR. I will continue to work on the rest.

Regarding the GPT-4 generated draft: I already started doing some of the work, but if you think the generated draft is easier to port from, please add it here and I can continue working from that. (I saw a comment about "auto-translation" in modeling_tf_sam.py and was wondering about the details... :)

A few questions:

  1. What about all the torch.nn code inside perceiver.py and vision.py, do they also need TF porting? (My goal is to port the inference code first; if this isn't needed for inference, maybe I can come back to it.)
  2. For modeling_tf_auto.py, what is that code doing? It isn't clear to me where to add the TF IDEFICS versions, since I don't understand that file.

Thanks for all the help!

a8nova avatar Oct 20 '23 18:10 a8nova

Hi @a8nova, to answer the questions:

  1. You'll probably need to convert those files too - the IdeficsVisionTransformer in vision.py seems to be a core part of the model. You might be able to skip perceiver.py initially, as it's only used in some model configs, but we should probably do it somewhere as part of this PR!
  2. That code is there because SAM was the first ForMaskGeneration model we added in TF. For Idefics it's easier, because you're not adding a whole new "category" of model like that. If you look in modeling_auto.py you'll see there are two lines for IdeficsModel and IdeficsForVisionText2Text - all you need to do is make copies of those lines in modeling_tf_auto.py for TFIdeficsModel and TFIdeficsForVisionText2Text.
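In other words, the additions are just mapping entries along these lines. This is a hedged sketch: the exact mapping-dict names in modeling_tf_auto.py may differ from what I'm writing from memory, so copy the pattern from the existing Idefics lines in modeling_auto.py:

```python
from collections import OrderedDict

# Sketch of the two entries to add. The real file defines many more models,
# and these mapping-dict names are assumptions based on the PyTorch side.
TF_MODEL_MAPPING_NAMES = OrderedDict([
    ("idefics", "TFIdeficsModel"),
])
TF_MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES = OrderedDict([
    ("idefics", "TFIdeficsForVisionText2Text"),
])
```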

I'll work on generating GPT-4 translations for you, and post them here when they're available! Since you've already started on modeling_tf_idefics.py I won't overwrite it. You can just copy pieces from it when you need to.

Rocketknight1 avatar Oct 20 '23 20:10 Rocketknight1

@a8nova I added the three files with _autotranslate.py endings! Note that there are likely to be issues (e.g. forgetting the name kwarg when initializing layers even though I told it to)

Rocketknight1 avatar Oct 20 '23 20:10 Rocketknight1

Thank you @Rocketknight1, yes that makes sense, taking a closer look, idefics is 2 pre-trained models combined together, so vision.py is a core part.

I will take a look at the auto-translated files!

a8nova avatar Oct 21 '23 06:10 a8nova

Hello @Rocketknight1 - Sorry I went MIA, I have been occupied with my 9-5. I just made some progress: I have the tests running, but I am running into a weird error. Any ideas? Full log attached: error.txt

Also, regarding: "For... reasons... the base stem of the model is contained in TFIdeficsBaseLayer and TFIdeficsModel is just a wrapper around this. Derived classes all use TFIdeficsBaseLayer and don't create any TFIdeficsModel layers. This is different from Torch models, where the model stem is contained in IdeficsModel and the derived classes use it as a layer." how come I don't see similar stuff for other TF implementations? Is this specific to IDEFICS?

a8nova avatar Nov 16 '23 07:11 a8nova

Hi @a8nova!

Firstly the error: The problem is that TF models don't let you assign to self.layers, because TensorFlow reserves that as a special keyword. What you should do is replace self.layers with something else, maybe self.modules or self.decoder_layers or something. However, you should keep the name kwarg as layers_{i}. We match TF layers to PT layers when doing weight cross-loading using the name attributes of layers, not the actual name like self.layers, so as long as you keep the argument the same then weight cross-loading should work.
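To illustrate the principle without needing TensorFlow installed, here's a toy sketch. ToyLayer and ToyDecoder are hypothetical stand-ins (ToyLayer plays the role of tf.keras.layers.Layer); the point is that the Python attribute name is free to change, while the name kwarg is what weight cross-loading matches on:

```python
class ToyLayer:
    # Stand-in for tf.keras.layers.Layer: records only its Keras-style name,
    # which is what TF<->PT weight cross-loading matches on.
    def __init__(self, name):
        self.name = name

class ToyDecoder:
    def __init__(self, num_layers):
        # Attribute renamed away from `self.layers` (reserved by Keras),
        # but the name kwarg keeps the layers_{i} scheme so the TF weight
        # names still line up with the PyTorch state dict.
        self.decoder_layers = [ToyLayer(name=f"layers_{i}") for i in range(num_layers)]
```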

Secondly, regarding TFIdeficsBaseLayer, that was actually just a typo on my part - it's actually called TFIdeficsMainLayer! If you check any of our other TF models, you'll see they have a MainLayer class like TFBertMainLayer that contains the model stem.

Rocketknight1 avatar Nov 16 '23 13:11 Rocketknight1

Thank you @Rocketknight1!

a8nova avatar Nov 17 '23 07:11 a8nova

Hi @Rocketknight1, I have a few questions:

  1. For processing_idefics.py, how do you suggest I handle both PyTorch and TF? Right now I just have it hacked in my view to only have TF stuff (just to unblock me).
  2. I am getting a weird error from freeze_model. I am probably doing something wrong, but I'm not sure what; the error is AttributeError: 'TFIdeficsRMSNorm' object has no attribute 'layers' (full stacktrace attached). Any ideas?
  3. There is also an "ALL_LAYERNORM_LAYERS" in the PyTorch code; I added it in this commit https://github.com/huggingface/transformers/pull/26870/commits/7e0a35119b4d7a6284d04d8c543fba1b29e573c9, does this look right to you?

Thanks in advance!

error.txt

a8nova avatar Nov 21 '23 22:11 a8nova

Hi @a8nova, let's see...

For 1, we usually add an argument like return_tensors which can take values like tf, pt, etc. You can take a look at e.g. models/sam/processing_sam.py for an example - the main thing is that you should guard any imports of tf or torch behind something like is_torch_available() to make sure that the code doesn't crash for people who only have one of them installed!
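The guard itself is just an import check. transformers already provides is_torch_available() and is_tf_available() in its utils, so use those rather than rolling your own, but a stand-alone sketch of the idea looks like this:

```python
import importlib.util

def is_torch_available():
    # True if torch is importable in this environment, without importing it.
    return importlib.util.find_spec("torch") is not None

def is_tf_available():
    return importlib.util.find_spec("tensorflow") is not None

# Only touch a framework inside a guarded branch, e.g.:
# if return_tensors == "pt" and is_torch_available():
#     import torch
#     ...
```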

For 2, I think the problem there is that freeze_text_layers iterates over multiple layers that include the normalization layer self.norm:

def freeze_text_layers(self, module_exceptions=[]):
    for module in [self.decoder_layers, self.norm]:
        freeze_model(module, module_exceptions=module_exceptions)

As a result, it tries to iterate over self.norm.layers, which doesn't exist because self.norm is just a LayerNormalization layer, not a model with additional sub-layers. I'll suggest a change in freeze_model that should help.
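Something like this sketch could work (duck-typed and untested here, so treat it as a starting point rather than the final implementation):

```python
def freeze_model(model, module_exceptions=None):
    # Recursively set trainable=False, skipping any module whose class name
    # matches an exception. getattr(..., []) makes leaf layers (which have
    # no .layers attribute, like a LayerNormalization) a no-op base case.
    module_exceptions = module_exceptions or []
    if not any(exc in type(model).__name__ for exc in module_exceptions):
        model.trainable = False
    for sublayer in getattr(model, "layers", []):
        freeze_model(sublayer, module_exceptions=module_exceptions)
```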

For 3, ALL_LAYERNORM_LAYERS is a value mainly used by the HuggingFace Trainer, which is only used for PyTorch. When training with TF, we just use Keras instead to get the same functionality. You can just skip/remove it!

Rocketknight1 avatar Nov 22 '23 14:11 Rocketknight1

Thank you @Rocketknight1 for the detailed explanation and help! Moving on to other errors..

a8nova avatar Nov 22 '23 16:11 a8nova

Sorry to bother you again @Rocketknight1 - do I need to worry about gradient checkpointing in the TF training tests?

I am asking because for TFIdeficsForVisionText2Text there are test_training and test_training_gradient_checkpointing tests, which call model.train() and model.gradient_checkpointing_enable(); these fail with an AttributeError.

I found gradient_checkpointing_enable() inside modeling_utils.py, but I don't see one inside modeling_tf_utils.py. Can you confirm whether my observation is correct, and whether I need to add all the gradient_checkpointing_* routines to modeling_tf_utils.py?

Thank you!

a8nova avatar Nov 22 '23 17:11 a8nova

@a8nova No, you can skip gradient checkpointing in TF ports!

Rocketknight1 avatar Nov 23 '23 13:11 Rocketknight1

Hi @Rocketknight1, how do I run the integration tests? For example, IdeficsModelIntegrationTest.

test_training for TF is failing due to model.train(). I see it is defined in the Trainer class in trainer.py and trainer_tf.py. I don't think trainer_tf.py is used anywhere, or is it? How do you suggest I resolve this? Thanks!

a8nova avatar Nov 29 '23 14:11 a8nova

Hi @a8nova, you're right - we used to have a TFTrainer class but it's now deprecated. We recommend just training our TF models using the Keras API like model.fit(), and that means methods like model.train() do not really exist for them. I wrote a blogpost about this here if you're curious, but it's mostly not that relevant to this PR!

Anyway, as a result of this the integration tests for TF models can be quite different from the integration tests for PT models - I'd recommend copying tests from another TF model instead, rather than trying to exactly copy the PT tests. You may also have to drop some tests entirely - the full IDEFICS tests use 4-bit quantization in PT, which isn't supported for TF, and as a result our CI machines may not be able to run it in TF at all!

Rocketknight1 avatar Nov 29 '23 16:11 Rocketknight1

Thanks @Rocketknight1. I have a few follow-up questions. There are some test failures I don't understand.

  1. For image_processing_idefics.py, I made changes in this commit to pass return_tensors to preprocess. I am testing it via the integration tests; I will share the diff I needed to get it to run below.

Does the integration test make sense?

diff --git a/tests/models/idefics/test_modeling_tf_idefics.py b/tests/models/idefics/test_modeling_tf_idefics.py
index f9bcec579..50eee25d6 100644
--- a/tests/models/idefics/test_modeling_tf_idefics.py
+++ b/tests/models/idefics/test_modeling_tf_idefics.py
@@ -454,7 +454,7 @@ class TFIdeficsForVisionText2TextTest(TFIdeficsModelTest, unittest.TestCase):
 
 @require_tf
 @require_vision
-class IdeficsModelIntegrationTest(TestCasePlus):
+class TFIdeficsModelIntegrationTest(TestCasePlus):
     @cached_property
     def default_processor(self):
         return (
@@ -463,8 +463,6 @@ class IdeficsModelIntegrationTest(TestCasePlus):
             else None
         )
 
-    @require_bitsandbytes
-    @slow
     def test_inference_natural_language_visual_reasoning(self):
         cat_image_path = self.tests_dir / "fixtures/tests_samples/COCO/000000039769.png"
         cats_image_obj = Image.open(cat_image_path)  # 2 cats
@@ -494,9 +492,7 @@ class IdeficsModelIntegrationTest(TestCasePlus):
             load_in_4bit=True,
             bnb_4bit_compute_dtype="float16",
         )
-        model = IdeficsForVisionText2Text.from_pretrained(
-            "HuggingFaceM4/idefics-9b", quantization_config=quantization_config, device_map="auto"
-        )
+        model = TFIdeficsForVisionText2Text.from_pretrained("HuggingFaceM4/idefics-9b")
         processor = self.default_processor
         inputs = processor(prompts, return_tensors="tf")
         generated_ids = model.generate(**inputs, max_length=100)

How do I pass return_tensors? In the other code I adapted from, return_tensors for image processing is only passed to the preprocess code, and I am not sure how to pass it here.

  2. I have attached an error log for another failure I get. This is coming from TFIdeficsDecoupledLinear: E assert <transformers.models.idefics.modeling_tf_idefics.TFIdeficsDecoupledLinear object at 0x7d4a9ac41540> is None

full error in file attached

There are a few other errors I don't understand but some look related to this. Thanks for the help @Rocketknight1 !

a8nova avatar Dec 03 '23 14:12 a8nova

Hi @a8nova, sorry for the delay!

Firstly, for return_tensors, generally our processors handle it like this:

def __init__(self, return_tensors=None):
    self.return_tensors = return_tensors

def __call__(self, return_tensors=None):
    if return_tensors is None:
        return_tensors = self.return_tensors

In other words, you can supply it as an argument to __call__, but if you don't supply that argument, then it uses the default value that's set in the __init__.
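As a toy illustration of the precedence (ToyProcessor is hypothetical, just to show the pattern):

```python
class ToyProcessor:
    def __init__(self, return_tensors=None):
        self.return_tensors = return_tensors

    def __call__(self, return_tensors=None):
        # An explicit call-site value wins; otherwise fall back to the
        # default set at construction time.
        if return_tensors is None:
            return_tensors = self.return_tensors
        return return_tensors

proc = ToyProcessor(return_tensors="tf")
```

Here proc() falls back to "tf", while proc(return_tensors="pt") overrides it per call.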

Also, I took a look in the attached file but I can't see the error - can you double-check the contents?

Rocketknight1 avatar Dec 06 '23 12:12 Rocketknight1

Whoops, I attached the wrong one; the correct one is below. Thank you @Rocketknight1! Edit: I can't seem to repro that error now, I will look into it over the weekend again.

a8nova avatar Dec 06 '23 13:12 a8nova

No probs - let me know if it recurs!

Rocketknight1 avatar Dec 07 '23 12:12 Rocketknight1

Thank you @Rocketknight1! There is another test failure I could use your help on. For test_resize_token_embeddings, the input pixel_values is CHW without the batch dimension N. Which code is responsible for resizing this kind of input? Right now it crashes when it tries to build the dummy weights; I have attached the full error. The vision_tf.py code is still a little buggy. In the TFIdeficsVisionEmbeddings forward pass, the first thing I do is change pixel_values to channels-last format so it runs on CPU. What do I do when I get this kind of input? Thank you!

test_resize_tokens_embeddings_error.txt

a8nova avatar Dec 09 '23 16:12 a8nova

Hi @a8nova, the models aren't intended to run with CHW input - they should always receive NCHW! The test_resize_token_embeddings test is mostly designed to work with text models - you might need to override for IDEFICS, but it's also not a critical test, so you can just skip it instead!

Also, the thing you did where you transpose to NHWC / channels_last is normal for TF models, because TF convolutions don't work on CPU in channels_first mode. Be careful that you transpose back after convolutional layers/blocks, though, or else the dimensions in the rest of the code will be wrong!
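To make the layout concrete, here's the transpose pair sketched with NumPy (tf.transpose takes the same perm argument; the shapes here are just illustrative):

```python
import numpy as np

pixel_values = np.zeros((2, 3, 30, 30))  # NCHW, as the processor produces it

# NCHW -> NHWC before the conv (TF convolutions on CPU need channels_last)
nhwc = np.transpose(pixel_values, (0, 2, 3, 1))
assert nhwc.shape == (2, 30, 30, 3)

# ... convolution would run here ...

# NHWC -> NCHW afterwards, so downstream code sees the expected layout
nchw = np.transpose(nhwc, (0, 3, 1, 2))
assert nchw.shape == pixel_values.shape
```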

Rocketknight1 avatar Dec 11 '23 16:12 Rocketknight1

Hi @Rocketknight1, happy holidays! I have a few more questions. I have fewer test failures now, and I can't wait to get this model working end-to-end. I am using the integration test as a start for fully testing the TensorFlow implementation.

EDIT: Update on 1 (Jan 7th): I have figured out the issue; it was due to a bad reshape. I am able to run the model end-to-end now using tiny-random-idefics. I will run it with idefics-9b next! (I still have some follow-up questions coming up, as there is some weirdness I don't understand.)

~~1. For the integration test, I have the following (this is from TFIdeficsModelIntegrationTest, slightly modified as below; I understand this test will fail with tiny-random-idefics, but I am using it to help flush out other bugs):~~

        model = TFIdeficsForVisionText2Text.from_pretrained("HuggingFaceM4/tiny-random-idefics", from_pt=True)
        processor = self.default_processor
        inputs = processor(prompts, return_tensors="tf")
        generated_ids = model.generate(**inputs, max_length=100)
        generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)

~~Right now this fails when calling from_pretrained: it goes into the forward pass with an invalid size for pixel_values. This goes back to what I asked last time, where I said the input is missing the batch dimension for some tests, but it looks like the problem is actually in my code. What I am observing is that processing_idefics.py::__call__ returns the correct size for pixel_values from the call to BatchFeatures(), where pixel_values is a 5d tensor of [2,2,3,30,30], but the first forward pass inside modeling_idefics and vision_tf.py sees pixel_values of [3,30,30]. Do you have suggestions for how this might happen?~~

  2. I tried converting the PyTorch weights to TF but failed with an error. Do I need to get the TensorFlow implementation working before I can convert the weights?

  3. Is dummy_inputs still needed for TF implementations? Like modeling_tf_bert.py#L916

a8nova avatar Dec 27 '23 20:12 a8nova

Hi @a8nova, sorry for the Christmas-related delay. Huge congratulations on getting the tiny-random model working though, that indicates you're probably quite close to getting the whole thing working!

Firstly, I'd suggest that you probably do not need to convert the weights. The reason for this is that our idefics checkpoints all have safetensors weights already, and those weights can be loaded by any framework - TF, PyTorch or JAX. Once the TF code is working, you should just be able to load those repos with from_pretrained with no additional changes to the repo.

Secondly, dummy inputs are much less important than they used to be, since we're moving to adding explicit build() methods on all of our TF models. That's something we'll have to add to this PR as well, but let's leave it to the end - I can help with it, and it shouldn't be too painful.

Finally, I notice a lot of the test failures are caused by the TF test runner trying to import torch - the reason for this is that you're importing ModelOutput from modeling_outputs.py, but this is a torch file - try importing it from modeling_tf_outputs instead!

Still, I think this PR is getting close now - I should be able to reply much more quickly now the holidays are over, so please let me know if you encounter any other difficulties!

Rocketknight1 avatar Jan 08 '24 14:01 Rocketknight1

Also @alshenkute it seems like the issue there is that the model is stored in bfloat16, which is supported in PyTorch on GPU and CPU, but maybe not on M1. I'm not too familiar with developing on M1 though - maybe load with torch_dtype=torch.float32?

Rocketknight1 avatar Jan 10 '24 16:01 Rocketknight1

Got it, thank you @Rocketknight1! Let me try that. (I accidentally replied from another email, sorry for the confusion; I deleted it.)

a8nova avatar Jan 10 '24 16:01 a8nova

Hi @Rocketknight1 - I am still having issues loading the idefics-9b model. The main issue right now is that in the call to TFIdeficsForVisionText2Text.from_pretrained, image_attention_mask is None in TFIdeficsMainLayer. It looks like image_attention_mask is populated by prepare_inputs_for_generation, which isn't called by from_pretrained. I have a hack, but it is ugly and doesn't feel correct.

For PyTorch, from_pretrained doesn't seem to invoke the inference code while loading the model weights, so I don't see the same issue.

What am I missing here? If none of the above makes sense, then there is a chance I have a bug elsewhere or am misunderstanding things.

By the way, it looks like an OOM is killing my process. I am looking into why, but first I need to understand how to handle the from_pretrained issue above.

Thanks in advance!

a8nova avatar Jan 15 '24 10:01 a8nova

Hi @a8nova - I think I have a suspicion of what's going on now, let me explain!

I made a PR just before the holidays at #27794. This PR replaces our old approach to building TF models, where we used dummy inputs and ran an inference step to build weights, with a new approach where we build weights with proper build() methods. We did this because there were several issues with the dummy inputs, including poor performance, and I suspect the issue you're encountering is related to this.

I think the solution here is twofold:

  1. Rebase this PR on the most recent transformers version (so you get the new build() PR)
  2. Add build() methods to IDEFICS.

Since this is a very new PR that was added while your PR was open, I realize that it's a bit awkward. Let me know if you want me to handle one or both of those steps!

Rocketknight1 avatar Jan 15 '24 17:01 Rocketknight1