transformers Adding hiera

What does this PR do?

Adds Hiera model from Meta as suggested in https://github.com/huggingface/transformers/issues/28993

GitHub Repo: https://github.com/facebookresearch/hiera/ arXiv: https://arxiv.org/abs/2306.00989

Model License recently changed to Apache

This PR is to merge the work done by @EduardoPach and me in #29339 and #29945 respectively.

c.c. @amyeroberts

Apr 19 '24 19:04 Namangarg110

@amyeroberts @Namangarg110 and I merged our branches and we agreed that @Namangarg110 will take care of your comments and I will assist him if needed.

The work should be already good to review as it has already implemented and tested the following:

HieraModel
HieraForImageClassification
HieraBackbone
HieraForPreTraining

I remember that you mentioned that the latter should be added with a different PR (maybe like it was done with ViTMAE), but I've added all in one as a hiera_mae would be a bunch of copy and pasta with an extra decoder implementation.

One thing that we both aren't sure about is HieraForVideoClassification (as we also have the checkpoints for that), but the ImageProcessor would look a bit different as it uses trilinear interpolation. How do we often go about models that have checkpoints for both image and video?

Apr 23 '24 19:04 EduardoPach

c.c. @amyeroberts

Apr 29 '24 11:04 EduardoPach

cc @NielsRogge could you do a first review?

Apr 29 '24 18:04 amyeroberts

@NielsRogge or @amyeroberts (don't know who is reviewing) I believe that for the xxxForVideoClassification checkpoints we could have a different implementation as it would be easier to maintain

May 06 '24 08:05 EduardoPach

@amyeroberts @NielsRogge, Writing to follow up on status of the review.

May 07 '24 02:05 Namangarg110

@EduardoPach feel free to comment on the PR what you'd change.

@Namangarg110 haven't had time to review your PR yet, will do so this week.

May 07 '24 06:05 NielsRogge

@EduardoPach feel free to comment on the PR what you'd change.

@Namangarg110 haven't had time to review your PR yet, will do so this week.

@NielsRogge I think that if we want to implement the video checkpoints of Hiera separately from the image checkpoints then the PR is done as we already have HieraModel, HieraForImageClassification, HieraForPretraining (For MAE) and HieraBackbone implemented and tested. You could fix some nits regarding how the config is structured (patch sizes, strides, padding are expected to be lists as I wasn't sure if HieraForVideoClassification would be done in the same implementation).

IMO we should implement the video Hiera separately as we can then isolate the ImageProcessors (normal image processor is simple and can be copied from Bit, but the video processor is a bit more complex as they use trilinear resampling), it's probably easier to maintain them as separate implementations and I haven't found an instance in the library where we have both image and video on the sample implementation. LMK what are your thoughts about this

May 07 '24 07:05 EduardoPach

@NielsRogge comments were addressed and tests are green (the failing one is unrelated to Hiera)

May 13 '24 07:05 EduardoPach

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

May 13 '24 09:05 HuggingFaceDocBuilderDev

c.c @amyeroberts

May 16 '24 11:05 EduardoPach

Thanks for the work adding this model!

I've just done an initial higher-level pass, so will need at least another in-depth review. Overall code looks good, just a bit complex in parts, which we can work to simplify to make sure any new readers can understand easily.

Main comments:

There's a lot of logic to handle images and videos here, but this doesn't seem possible from the image processors

There are a lot of tests which are reimplemented to include noise for the pretraining model. It should be fine to have a deterministic noise applied (through a seed) in which case, this can be added in the _prepare_for_class method. This should remove the need for all the custom test methods.

Thanks for the comments, yeah it's definitely a good idea to remove the complexity added to handle both video and images. I commented before that it should probably be a good idea to have the video checkpoints as their own implementation on a different PR.

May 20 '24 21:05 EduardoPach

@amyeroberts, All issues mentioned above in the comments are addressed. We look forward to your review, kindly let us know if any further changes are required.

May 28 '24 14:05 Namangarg110

@amyeroberts, All issues mentioned above in the comments are addressed. We look forward to your review, kindly let us know if we are ready to update the checkpoints.

Jun 13 '24 14:06 Namangarg110

Hey @amyeroberts I've updated the model cards for all hiera checkpoints here could you help out to transfer them to facebook org?

Jun 21 '24 10:06 EduardoPach

Hey @amyeroberts I've updated the model cards for all hiera checkpoints here could you help out to transfer them to facebook org?

c.c. @amyeroberts

Jun 25 '24 11:06 EduardoPach

@EduardoPach Sure! Just to confirm, should all of these checkpoints be moved?

Jun 26 '24 12:06 amyeroberts

@EduardoPach Sure! Just to confirm, should all of these checkpoints be moved?

Yeap, we have one for each (HieraModel, HieraForPreTraining, HieraForImageClassification) and 6 different sizes

Jun 26 '24 22:06 EduardoPach

Hi @EduardoPach so, it seems that @NielsRogge was handling this and was in contact with people from meta regarding the weights transfers. Niels is coming back from holiday I think next week, so let's wait for him to confirm!

Jun 27 '24 17:06 amyeroberts

@amyeroberts, @NielsRogge doesn't have permission to move the checkpoints from my account to the facebook orq and I was talking to @dbolya and he is only a part of facebookresearch org and therefore can't add Niels or me as contributors to the org. Could someone with the appropriate permissions just transfer the checkpoints from my account to the Facebook org? (or perhaps facebookresearch idk)

Jul 01 '24 19:07 EduardoPach

@amyeroberts Any updates regarding the same ?

Jul 04 '24 16:07 Namangarg110

Hi both, finally got to transfer the checkpoints :) they're now all here: https://huggingface.co/models?other=hiera&sort=created

Jul 04 '24 17:07 NielsRogge

Thankyou @NielsRogge. @amyeroberts, is there anything else required from our side ?

Jul 04 '24 17:07 Namangarg110

@amyeroberts fixed the checkpoints and cleaned the __init__.py the failing test seems unrelated to Hiera

Jul 07 '24 13:07 EduardoPach

@EduardoPach Great! The last thing is to have a run of the slow tests before merge. Could you push an empty commit with: git commit --allow-empty -m "[run_slow] hiera"? This will trigger a run of the slow integration tests for the model. I'll need to approve the workflow after the commit. Once it's all green we're good to merge!

Jul 09 '24 09:07 amyeroberts

Thank you for your response, @amyeroberts. I have pushed the empty commit, and all tests are green.

Jul 09 '24 16:07 Namangarg110

@Namangarg110 It seems there's some slow tests which fail and need to be addressed before merge:

Single-GPU tests

FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_attention_outputs - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_batching_equivalence - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_determinism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_hidden_states_output - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_model_from_pretrained - OSError: We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like facebook/hiera-tiny-224-hf is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_model_outputs_equivalence - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_save_load - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_torch_fx - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_torch_fx_output_loss - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training_gradient_checkpointing - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training_gradient_checkpointing_use_reentrant - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training_gradient_checkpointing_use_reentrant_false - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelIntegrationTest::test_inference_for_pretraining - AssertionError: False is not true
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelIntegrationTest::test_inference_fp16 - OSError: facebook/hiera-tiny-224-hf does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.

Multi-GPU tests

FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_attention_outputs - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_batching_equivalence - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_determinism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_hidden_states_output - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_model_outputs_equivalence - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_save_load - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_torch_fx - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_torch_fx_output_loss - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training_gradient_checkpointing - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training_gradient_checkpointing_use_reentrant - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training_gradient_checkpointing_use_reentrant_false - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelIntegrationTest::test_inference_for_pretraining - AssertionError: False is not true
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelIntegrationTest::test_inference_fp16 - ValueError: HieraModel does not support `device_map='auto'`. To implement support, the model class needs to implement the `_no_split_modules` attribute.
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelIntegrationTest::test_inference_image_classification_head - RuntimeError: Float did not match Half
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelIntegrationTest::test_inference_interpolate_pos_encoding - RuntimeError: Input type (float) and bias type (c10::Half) should be the same
FAILED tests/models/hiera/test_modeling_hiera.py::HieraBackboneTest::test_backbone_outputs - RuntimeError: Input type (float) and bias type (c10::Half) should be the same
FAILED tests/models/hiera/test_modeling_hiera.py::HieraBackboneTest::test_backbone_stage_selection - RuntimeError: Input type (float) and bias type (c10::Half) should be the same
FAILED tests/models/hiera/test_modeling_hiera.py::HieraBackboneTest::test_create_from_modified_config - RuntimeError: Input type (float) and bias type (c10::Half) should be the same

Jul 09 '24 20:07 amyeroberts

c.c. @amyeroberts

Jul 11 '24 19:07 EduardoPach

Thanks for addressing all the slow tests - we're good to merge!

Jul 11 '24 21:07 amyeroberts