Adding hiera
What does this PR do?
Adds Hiera model from Meta as suggested in https://github.com/huggingface/transformers/issues/28993
GitHub Repo: https://github.com/facebookresearch/hiera/ arXiv: https://arxiv.org/abs/2306.00989
Model License recently changed to Apache
This PR is to merge the work done by @EduardoPach and me in #29339 and #29945 respectively.
c.c. @amyeroberts
@amyeroberts @Namangarg110 and I merged our branches and we agreed that @Namangarg110 will take care of your comments and I will assist him if needed.
The work should be already good to review as it has already implemented and tested the following:
-
HieraModel -
HieraForImageClassification -
HieraBackbone -
HieraForPreTraining
I remember that you mentioned that the latter should be added with a different PR (maybe like it was done with ViTMAE), but I've added all in one as a hiera_mae would be a bunch of copy and pasta with an extra decoder implementation.
One thing that we both aren't sure about is HieraForVideoClassification (as we also have the checkpoints for that), but the ImageProcessor would look a bit different as it uses trilinear interpolation. How do we often go about models that have checkpoints for both image and video?
c.c. @amyeroberts
cc @NielsRogge could you do a first review?
@NielsRogge or @amyeroberts (don't know who is reviewing) I believe that for the xxxForVideoClassification checkpoints we could have a different implementation as it would be easier to maintain
@amyeroberts @NielsRogge, Writing to follow up on status of the review.
@EduardoPach feel free to comment on the PR what you'd change.
@Namangarg110 haven't had time to review your PR yet, will do so this week.
@EduardoPach feel free to comment on the PR what you'd change.
@Namangarg110 haven't had time to review your PR yet, will do so this week.
@NielsRogge I think that if we want to implement the video checkpoints of Hiera separately from the image checkpoints then the PR is done as we already have HieraModel, HieraForImageClassification, HieraForPretraining (For MAE) and HieraBackbone implemented and tested. You could fix some nits regarding how the config is structured (patch sizes, strides, padding are expected to be lists as I wasn't sure if HieraForVideoClassification would be done in the same implementation).
IMO we should implement the video Hiera separately as we can then isolate the ImageProcessors (normal image processor is simple and can be copied from Bit, but the video processor is a bit more complex as they use trilinear resampling), it's probably easier to maintain them as separate implementations and I haven't found an instance in the library where we have both image and video on the sample implementation. LMK what are your thoughts about this
@NielsRogge comments were addressed and tests are green (the failing one is unrelated to Hiera)
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
c.c @amyeroberts
Thanks for the work adding this model!
I've just done an initial higher-level pass, so will need at least another in-depth review. Overall code looks good, just a bit complex in parts, which we can work to simplify to make sure any new readers can understand easily.
Main comments:
There's a lot of logic to handle images and videos here, but this doesn't seem possible from the image processors
There are a lot of tests which are reimplemented to include noise for the pretraining model. It should be fine to have a deterministic noise applied (through a seed) in which case, this can be added in the
_prepare_for_classmethod. This should remove the need for all the custom test methods.
Thanks for the comments, yeah it's definitely a good idea to remove the complexity added to handle both video and images. I commented before that it should probably be a good idea to have the video checkpoints as their own implementation on a different PR.
@amyeroberts, All issues mentioned above in the comments are addressed. We look forward to your review, kindly let us know if any further changes are required.
@amyeroberts, All issues mentioned above in the comments are addressed. We look forward to your review, kindly let us know if we are ready to update the checkpoints.
Hey @amyeroberts I've updated the model cards for all hiera checkpoints here could you help out to transfer them to facebook org?
Hey @amyeroberts I've updated the model cards for all hiera checkpoints here could you help out to transfer them to facebook org?
c.c. @amyeroberts
@EduardoPach Sure! Just to confirm, should all of these checkpoints be moved?
@EduardoPach Sure! Just to confirm, should all of these checkpoints be moved?
Yeap, we have one for each (HieraModel, HieraForPreTraining, HieraForImageClassification) and 6 different sizes
Hi @EduardoPach so, it seems that @NielsRogge was handling this and was in contact with people from meta regarding the weights transfers. Niels is coming back from holiday I think next week, so let's wait for him to confirm!
@amyeroberts, @NielsRogge doesn't have permission to move the checkpoints from my account to the facebook orq and I was talking to @dbolya and he is only a part of facebookresearch org and therefore can't add Niels or me as contributors to the org. Could someone with the appropriate permissions just transfer the checkpoints from my account to the Facebook org? (or perhaps facebookresearch idk)
@amyeroberts Any updates regarding the same ?
Hi both, finally got to transfer the checkpoints :) they're now all here: https://huggingface.co/models?other=hiera&sort=created
Thankyou @NielsRogge. @amyeroberts, is there anything else required from our side ?
@amyeroberts fixed the checkpoints and cleaned the __init__.py the failing test seems unrelated to Hiera
@EduardoPach Great! The last thing is to have a run of the slow tests before merge. Could you push an empty commit with: git commit --allow-empty -m "[run_slow] hiera"? This will trigger a run of the slow integration tests for the model. I'll need to approve the workflow after the commit. Once it's all green we're good to merge!
Thank you for your response, @amyeroberts. I have pushed the empty commit, and all tests are green.
@Namangarg110 It seems there's some slow tests which fail and need to be addressed before merge:
Single-GPU tests
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_attention_outputs - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_batching_equivalence - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_determinism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_hidden_states_output - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_model_from_pretrained - OSError: We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like facebook/hiera-tiny-224-hf is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_model_outputs_equivalence - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_save_load - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_torch_fx - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_torch_fx_output_loss - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training_gradient_checkpointing - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training_gradient_checkpointing_use_reentrant - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training_gradient_checkpointing_use_reentrant_false - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelIntegrationTest::test_inference_for_pretraining - AssertionError: False is not true
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelIntegrationTest::test_inference_fp16 - OSError: facebook/hiera-tiny-224-hf does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.
Multi-GPU tests
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_attention_outputs - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_batching_equivalence - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_determinism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_hidden_states_output - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_model_outputs_equivalence - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_save_load - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_torch_fx - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_torch_fx_output_loss - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training_gradient_checkpointing - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training_gradient_checkpointing_use_reentrant - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelTest::test_training_gradient_checkpointing_use_reentrant_false - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA_gather)
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelIntegrationTest::test_inference_for_pretraining - AssertionError: False is not true
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelIntegrationTest::test_inference_fp16 - ValueError: HieraModel does not support `device_map='auto'`. To implement support, the model class needs to implement the `_no_split_modules` attribute.
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelIntegrationTest::test_inference_image_classification_head - RuntimeError: Float did not match Half
FAILED tests/models/hiera/test_modeling_hiera.py::HieraModelIntegrationTest::test_inference_interpolate_pos_encoding - RuntimeError: Input type (float) and bias type (c10::Half) should be the same
FAILED tests/models/hiera/test_modeling_hiera.py::HieraBackboneTest::test_backbone_outputs - RuntimeError: Input type (float) and bias type (c10::Half) should be the same
FAILED tests/models/hiera/test_modeling_hiera.py::HieraBackboneTest::test_backbone_stage_selection - RuntimeError: Input type (float) and bias type (c10::Half) should be the same
FAILED tests/models/hiera/test_modeling_hiera.py::HieraBackboneTest::test_create_from_modified_config - RuntimeError: Input type (float) and bias type (c10::Half) should be the same
c.c. @amyeroberts
Thanks for addressing all the slow tests - we're good to merge!