Add ProPainter
What does this PR do?
This PR adds ProPainter, a video inpainting model (5.4k stars, 635 forks). It fixes #26360 and supersedes the stale PR #26391 for that issue, rebuilding the model from scratch to follow the Transformers standards.
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [x] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [x] Did you write any new necessary tests?
Who can review?
@amyeroberts @ArthurZucker @NielsRogge (?) @rafaelpadilla (as he was the initial reviewer on the stale PR)
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
The PR is more than ready for a first pass of review!
TODO (will be done in a jiffy :)):
- [x] Fix all common test failures
- [x] Update weights conversion scripts with the working one on local machine
- [x] Review batching nits one more time in the applicable files
- [x] Update docs in corresponding files
- [x] Check for video 'outpainting' error
Results:
Here are GIFs of the original video, the original model's output for object removal via video inpainting, and this PR's HF-ported model's output for the same task:
Original video:
Original model output:
HF ported model output:
Example usage is provided in the doc file here
(Sorry about the CI failures; they should be fixed now.)
Thank you @RUFFY-369 @ArthurZucker @amyeroberts @NielsRogge all for your hard work in bringing our ProPainter model to the Hugging Face Transformers repository! I really appreciate your efforts to make it more accessible to the community. Cheers!🎉🎉🎉
cc @amyeroberts @NielsRogge, I have addressed all the suggested changes that were mentioned. Please check them out. ~~The only thing left to add is the checkpoint conversion file, which will also be added soon.~~ Update: this has also been done.
Update: All the files I've modified are ready for review.
Hi @amyeroberts and @NielsRogge! When you have some time, could you please take another look at this PR? I've resolved your previous remarks and left open the ones where I had questions (some are updated and closed from my side).
Thanks in advance!
@RUFFY-369 I can see that there's still commits being pushed. Could you address the failing tests first, and then ping when ready for review? Let us know if there's any areas of the PR you have specific questions about or need any help with resolving the tests
@amyeroberts Thank you for your reply. Yeah, I was working on solving the failing CI tests in these commits. Meanwhile, I also did some refactoring by moving the remaining hard-coded values into the config file wherever the code allows.
I tagged you in one of the review conversations in this PR where I asked about the VideoProcessor.
I will ping you as soon as the tests are green :+1: and may ask for help if I get stuck in it
Hi @amyeroberts, I need your help with the ~~`test_processors` test which is failing [here](https://circleci.com/gh/huggingface/transformers/1382859); I think this isn't related to the PR but to the `main` branch.~~
Update: I noticed that it was fixed when I pushed my latest commits. :+1:
Hi @amyeroberts. All the tests are green except one, and I could use your advice and help with it, as it's the only one remaining.
`tests_torch` is failing with: `worker 'gw6' crashed while running 'tests/models/propainter/test_modeling_propainter.py::ProPainterModelTest::test_attention_outputs'`. I have changed various config attributes in the test modeling file in the above commits to make the model as light as possible, but the crash still happens.
If you could take a look, this last test would go green and the PR would be ready for review.
Thank you :smile:
cc: @NielsRogge , @ydshieh
still failing on the same test
@RUFFY-369 Are you able to run the test locally?
@amyeroberts Yes, all the tests run locally, and all of them pass.
My previous experience suggests it's likely OOM, but the resource usage log doesn't point that way.
I have changed various config attributes in the test modeling file in the above commits to make the model as light as possible, but it still happens.
I suggest you set a breakpoint, print the content of the model's config, and compare it to the default config's values.
It would also be helpful to save the (created) model and check its size, and/or print the model with `print(model)`.
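That check can be sketched as follows. A tiny stand-in module is used so the snippet runs on its own; in the PR, `model` would be the ProPainter model instantiated from the test's config:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Tiny stand-in for the model under test; in the PR this would be the
# ProPainter model built from the test config.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 3, 3))

# Parameter count as a quick proxy for memory footprint.
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params:,} parameters")

# Saved size on disk is another useful signal.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pt")
    torch.save(model.state_dict(), path)
    size_mb = os.path.getsize(path) / 1e6
print(f"{size_mb:.3f} MB on disk")

print(model)  # layer-by-layer structure
```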
@ydshieh Thank you for your reply. Initially, I skipped the test that was crashing the worker, but then the next test in the run also failed; you can see that in the latest test failures. It seems to be OOM, but when I checked the analytics, they didn't point in that direction.
I had already set the config attributes as low as possible in the test file. What points to it being OOM, in my opinion, is that all the tests pass without any OOM on my own system.
You can still print the default config and the used config in full form, and let's see what the differences are.
@ydshieh By "default config", do you mean the config defined in the test file, and by "model's config" the one we get from `model.config`, right? For example: this and this, respectively.
PS: If I understand correctly, here they are, respectively:
```
ProPainterConfig {
  "adversarial_weight": 0.01,
  "channels": [64, 96, 128],
  "corr_levels": 4,
  "corr_radius": 4,
  "dropout": 0.0,
  "flow_weight_flow_complete_net": 0.25,
  "gan_loss": "hinge",
  "hidden_size": 512,
  "hole_weight": 1.0,
  "in_channels": [64, 64, 96],
  "initializer_range": 0.02,
  "interp_mode": "nearest",
  "kernel_size": [7, 7],
  "kernel_size_3d": [1, 3, 3],
  "kernel_size_3d_discriminator": [3, 5, 5],
  "model_type": "propainter",
  "neighbor_length": 10,
  "no_dis": false,
  "norm_fn": ["batch", "group", "instance", "none"],
  "num_attention_heads": 1,
  "num_channels": 128,
  "num_hidden_layers": 2,
  "num_local_frames_flow_complete_net": 8,
  "num_local_frames_propainter": 8,
  "padding": 1,
  "padding_inpaint_generator": [3, 3],
  "patch_size": 3,
  "perceptual_weight": 0.0,
  "pool_size": [4, 4],
  "raft_iter": 20,
  "ref_stride": 10,
  "stride": [3, 3],
  "stride_3d": [1, 1, 1],
  "strides": [1, 2, 2],
  "subvideo_length": 80,
  "transformers_version": "4.45.0.dev0",
  "valid_weight": 1.0,
  "window_size": [5, 9]
}
```
```
ProPainterConfig {
  "adversarial_weight": 0.01,
  "channels": [64, 96, 128],
  "corr_levels": 4,
  "corr_radius": 4,
  "dropout": 0.0,
  "flow_weight_flow_complete_net": 0.25,
  "gan_loss": "hinge",
  "hidden_size": 512,
  "hole_weight": 1.0,
  "in_channels": [64, 64, 96],
  "initializer_range": 0.02,
  "interp_mode": "nearest",
  "kernel_size": [7, 7],
  "kernel_size_3d": [1, 3, 3],
  "kernel_size_3d_discriminator": [3, 5, 5],
  "model_type": "propainter",
  "neighbor_length": 10,
  "no_dis": false,
  "norm_fn": ["batch", "group", "instance", "none"],
  "num_attention_heads": 1,
  "num_channels": 128,
  "num_hidden_layers": 2,
  "num_local_frames_flow_complete_net": 8,
  "num_local_frames_propainter": 8,
  "padding": 1,
  "padding_inpaint_generator": [3, 3],
  "patch_size": 3,
  "perceptual_weight": 0.0,
  "pool_size": [4, 4],
  "raft_iter": 20,
  "ref_stride": 10,
  "stride": [3, 3],
  "stride_3d": [1, 1, 1],
  "strides": [1, 2, 2],
  "subvideo_length": 80,
  "transformers_version": "4.45.0.dev0",
  "valid_weight": 1.0,
  "window_size": [5, 9]
}
```
and they are identical
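Instead of eyeballing the two dumps, the comparison can be done programmatically. A minimal sketch on plain dicts (the real ones would come from `model.config.to_dict()` and the test config's `to_dict()`; the trimmed-down dicts below are illustrative stand-ins):

```python
# Hypothetical trimmed-down config dicts standing in for the two
# ProPainterConfig dumps above.
default_cfg = {"hidden_size": 512, "num_hidden_layers": 2, "patch_size": 3}
tested_cfg = {"hidden_size": 512, "num_hidden_layers": 2, "patch_size": 3}

# Keys whose values differ, or that exist on only one side.
diff = {
    k: (default_cfg.get(k), tested_cfg.get(k))
    for k in default_cfg.keys() | tested_cfg.keys()
    if default_cfg.get(k) != tested_cfg.get(k)
}
print(diff or "configs are identical")
```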
I did CPU profiling with the `torch.profiler.profile` context manager while computing the outputs at this line, and this was the result:
Name                                                    Self CPU %    Self CPU      CPU total %   CPU total     CPU time avg  # of Calls
----------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
aten::uniform_ 46.82% 1.077s 46.82% 1.077s 29.121ms 37
cudaLaunchKernel 23.84% 548.694ms 23.84% 548.694ms 44.264us 12396
aten::convolution 0.18% 4.196ms 16.86% 387.925ms 352.019us 1102
aten::_convolution 0.44% 10.180ms 16.68% 383.729ms 348.212us 1102
aten::conv2d 0.11% 2.452ms 16.13% 371.109ms 350.765us 1058
aten::cudnn_convolution 8.29% 190.862ms 15.27% 351.484ms 318.951us 1102
aten::copy_ 4.51% 103.848ms 6.37% 146.647ms 152.758us 960
aten::pow 0.12% 2.851ms 3.05% 70.158ms 449.729us 156
aten::nonzero 0.02% 467.298us 2.31% 53.172ms 6.646ms 8
cudaFuncGetAttributes 2.19% 50.410ms 2.19% 50.410ms 3.151ms 16
aten::to 0.03% 776.032us 2.18% 50.269ms 74.142us 678
aten::_to_copy 0.09% 2.185ms 2.15% 49.493ms 115.908us 427
cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFla... 1.57% 36.145ms 1.57% 36.145ms 488.446us 74
cudaStreamSynchronize 1.51% 34.656ms 1.51% 34.656ms 150.025us 231
aten::cat 0.88% 20.281ms 1.39% 31.962ms 28.873us 1107
aten::remainder 0.00% 113.288us 1.29% 29.800ms 7.450ms 4
aten::linalg_vector_norm 0.09% 2.159ms 1.24% 28.519ms 250.165us 114
aten::softmax 0.01% 153.429us 1.24% 28.500ms 593.750us 48
aten::_softmax 0.03% 732.581us 1.23% 28.347ms 590.554us 48
aten::sub 0.41% 9.529ms 1.15% 26.554ms 34.219us 776
aten::mul 0.66% 15.154ms 1.14% 26.187ms 22.712us 1153
torchvision::deform_conv2d 0.21% 4.721ms 1.02% 23.512ms 489.830us 48
aten::grid_sampler 0.05% 1.086ms 1.02% 23.425ms 92.226us 254
aten::div 0.42% 9.729ms 1.01% 23.292ms 31.390us 742
aten::batch_norm 0.01% 203.242us 0.90% 20.699ms 344.990us 60
aten::_batch_norm_impl_index 0.01% 277.477us 0.89% 20.496ms 341.602us 60
aten::conv3d 0.01% 189.311us 0.85% 19.635ms 446.239us 44
aten::add_ 0.54% 12.523ms 0.80% 18.451ms 17.116us 1078
aten::flip 0.02% 406.855us 0.77% 17.812ms 614.206us 29
aten::eq 0.01% 260.818us 0.77% 17.728ms 1.477ms 12
aten::neg 0.01% 139.777us 0.67% 15.494ms 1.937ms 8
aten::add 0.45% 10.315ms 0.63% 14.571ms 19.454us 749
cudaOccupancyMaxActiveBlocksPerMultiprocessor 0.61% 14.033ms 0.61% 14.033ms 57.512us 244
aten::relu_ 0.04% 822.113us 0.60% 13.698ms 87.805us 156
aten::arange 0.09% 1.987ms 0.58% 13.420ms 32.893us 408
aten::instance_norm 0.01% 151.675us 0.57% 13.158ms 438.599us 30
aten::clamp_min_ 0.06% 1.343ms 0.56% 12.875ms 82.535us 156
aten::reshape 0.13% 3.072ms 0.55% 12.681ms 8.387us 1512
aten::native_batch_norm 0.04% 929.592us 0.54% 12.315ms 410.515us 30
aten::cudnn_grid_sampler 0.29% 6.769ms 0.53% 12.197ms 50.821us 240
aten::stack 0.09% 1.973ms 0.51% 11.804ms 32.340us 365
aten::any 0.01% 169.477us 0.51% 11.746ms 1.958ms 6
aten::round 0.00% 51.830us 0.49% 11.327ms 5.664ms 2
aten::empty 0.48% 11.011ms 0.48% 11.011ms 5.533us 1990
aten::clone 0.05% 1.213ms 0.46% 10.543ms 32.441us 325
aten::sum 0.06% 1.441ms 0.45% 10.242ms 160.027us 64
aten::min 0.00% 92.426us 0.44% 10.239ms 5.119ms 2
aten::grid_sampler_2d 0.01% 184.281us 0.44% 10.142ms 724.442us 14
aten::threshold 0.00% 61.058us 0.43% 9.929ms 4.964ms 2
aten::max 0.00% 73.069us 0.41% 9.524ms 4.762ms 2
aten::addmm_ 0.07% 1.546ms 0.41% 9.408ms 196.010us 48
aten::sigmoid 0.11% 2.637ms 0.40% 9.183ms 43.730us 210
aten::mean 0.02% 499.139us 0.39% 9.010ms 391.741us 23
aten::l1_loss 0.00% 48.459us 0.39% 8.907ms 1.485ms 6
aten::atan2 0.00% 58.627us 0.38% 8.689ms 4.345ms 2
aten::lt 0.02% 574.884us 0.33% 7.690ms 192.260us 40
aten::cudnn_batch_norm 0.14% 3.218ms 0.33% 7.652ms 255.079us 30
aten::leaky_relu_ 0.13% 3.002ms 0.31% 7.177ms 25.542us 281
aten::relu 0.04% 920.344us 0.30% 6.848ms 33.567us 204
aten::abs 0.01% 203.165us 0.29% 6.692ms 278.846us 24
aten::view 0.28% 6.346ms 0.28% 6.346ms 1.873us 3388
aten::clamp_min 0.21% 4.835ms 0.26% 5.927ms 29.056us 204
aten::empty_like 0.05% 1.259ms 0.26% 5.913ms 9.988us 592
aten::im2col 0.07% 1.716ms 0.25% 5.829ms 126.719us 46
aten::gelu 0.00% 111.354us 0.25% 5.818ms 1.455ms 4
aten::slice 0.18% 4.206ms 0.24% 5.575ms 2.796us 1994
aten::gather 0.01% 130.534us 0.24% 5.482ms 1.371ms 4
cudaMemcpyAsync 0.23% 5.251ms 0.23% 5.251ms 15.675us 335
aten::empty_strided 0.21% 4.737ms 0.22% 5.019ms 9.159us 548
aten::as_strided 0.20% 4.667ms 0.20% 4.667ms 0.763us 6115
aten::meshgrid 0.08% 1.845ms 0.20% 4.634ms 17.963us 258
cudaEventRecord 0.19% 4.471ms 0.19% 4.471ms 1.716us 2606
aten::upsample_nearest2d 0.01% 123.315us 0.19% 4.438ms 1.110ms 4
aten::pad 0.00% 113.255us 0.19% 4.429ms 170.363us 26
aten::select 0.14% 3.197ms 0.18% 4.136ms 2.631us 1572
cudaFree 0.18% 4.106ms 0.18% 4.106ms 513.260us 8
aten::tanh 0.08% 1.838ms 0.17% 4.018ms 30.436us 132
aten::where 0.03% 766.533us 0.17% 3.956ms 47.099us 84
aten::linear 0.01% 294.243us 0.17% 3.813ms 105.929us 36
aten::rsub 0.04% 906.408us 0.17% 3.802ms 23.324us 163
aten::contiguous 0.01% 152.726us 0.17% 3.800ms 32.478us 117
cudaMalloc 0.16% 3.783ms 0.16% 3.783ms 199.119us 19
aten::zeros 0.02% 379.029us 0.16% 3.779ms 26.240us 144
aten::fill_ 0.08% 1.895ms 0.15% 3.549ms 10.754us 330
aten::matmul 0.06% 1.395ms 0.15% 3.422ms 142.576us 24
aten::linspace 0.08% 1.903ms 0.14% 3.312ms 5.175us 640
aten::resize_ 0.11% 2.634ms 0.14% 3.222ms 5.692us 566
aten::roll 0.02% 538.465us 0.14% 3.185ms 33.179us 96
aten::zero_ 0.03% 693.011us 0.13% 3.042ms 15.212us 200
aten::narrow 0.06% 1.453ms 0.13% 2.973ms 4.504us 660
aten::type_as 0.01% 150.860us 0.13% 2.937ms 31.242us 94
aten::repeat 0.03% 640.731us 0.11% 2.458ms 53.427us 46
aten::index 0.05% 1.151ms 0.10% 2.210ms 42.503us 52
aten::expand 0.07% 1.670ms 0.10% 2.195ms 3.346us 656
aten::addmm 0.06% 1.447ms 0.09% 2.126ms 70.873us 30
aten::binary_cross_entropy_with_logits 0.00% 62.545us 0.08% 1.851ms 462.803us 4
aten::upsample_bilinear2d 0.01% 333.709us 0.08% 1.841ms 131.469us 14
aten::replication_pad3d 0.00% 71.759us 0.08% 1.832ms 915.865us 2
aten::permute 0.05% 1.233ms 0.07% 1.670ms 5.353us 312
aten::avg_pool2d 0.01% 130.910us 0.07% 1.666ms 277.674us 6
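For reference, a profile like the one above can be collected with a pattern roughly like the following (a stand-in module is used here so the snippet is self-contained; the real run wrapped the ProPainter forward pass and would also include CUDA activity):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in for the model's forward pass.
model = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(1, 3, 32, 32)

# Profile CPU activity around a single inference call.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    with torch.no_grad():
        model(x)

# Summarize the heaviest ops, as in the table above.
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(table)
```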
I made some changes and pushed them above; they reduced CPU usage somewhat, but the worker still crashes :disappointed_relieved:
The CPU profile above was taken after the latest attempt at a fix. Before that, the main consumer, `aten::uniform_`, had the following share:
Name Self CPU % Self CPU CPU total % CPU total CPU time avg CPU Mem Self CPU Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::uniform_ 53.62% 2.068s 53.62% 2.068s 55.886ms 0 b 0 b 37
For benchmarking purposes I checked the same with other models; for example, ViLT peaked at just 11.64% CPU usage, unlike the above.
cc @amyeroberts @ydshieh
Hi @RUFFY-369, thank you for trying ❤️ (and for running with `torch.profiler.profile` too!)
> with default config you mean the config which is defined in the test file and model's config is what we will get with model.config, right?
No. What I suggest to compare is:
- load the model from the Hub repository checkpoint (so the real model), and print its config: `real_model.config`
- the model created during a test, and print its config: `tested_model.config`
I can take a closer look if you can provide the above.
Hi @ydshieh, thank you for your reply, your help, and the solution you proposed ❤️. We got it done, cheers for that first of all :smiling_face_with_tear: :smile:. Secondly, it was indeed OOM causing the worker crash. I sorted it out by digging a little deeper, pinpointing the memory gobbler with `torch.profiler.profile`, and fixing it. This fix will help with general inference with the model as well.
cc @amyeroberts @NielsRogge
> @RUFFY-369 I can see that there's still commits being pushed. Could you address the failing tests first, and then ping when ready for review? Let us know if there's any areas of the PR you have specific questions about or need any help with resolving the tests
@amyeroberts All the tests are GREEN :green_circle: and the PR is ready for review :+1:
Glad it works!
Could you share what change fixed it?
Also we need to trigger the slow CI
Before merging this pull request, slow tests CI should be triggered. To enable this:
- Add the `run-slow` label to the PR (I did it now)
- When your PR is ready for merge and all reviewers' comments have been addressed, push an empty commit with the command `[run-slow]` followed by a comma-separated list of all the models to be tested, i.e. `[run-slow] model_to_test_1, model_to_test_2`
- A `transformers` maintainer will then approve the workflow to start the tests
> Glad it works!
>
> Could you share what change fixed it?
Yeah, so basically, when I looked back at the code with the profiler, I noticed all the consumption was happening in just one class: the one where the perceptual metric is created. In the original code, as here, pretrained VGG16 features from torchvision are used to calculate the perceptual loss. The worker crashed because in every mode, whether training or eval, those pretrained features were loaded, and that's 138M parameters :smiling_face_with_tear:. So I made changes here so that the pretrained features are only loaded when the model is used for training and the perceptual loss is actually needed. :heart:
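The shape of that fix can be sketched like this, with a tiny stand-in extractor instead of the real torchvision VGG16 (class and argument names here are illustrative, not the actual PR code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LazyPerceptualLoss(nn.Module):
    """Defer building the heavy feature extractor until a loss is
    actually requested, so pure inference never pays for it."""

    def __init__(self, build_extractor):
        super().__init__()
        self._build = build_extractor  # factory, called lazily
        self.extractor = None

    def forward(self, pred, target):
        if self.extractor is None:
            # In the PR, this is where the pretrained VGG16 features
            # (~138M parameters) would be instantiated.
            self.extractor = self._build()
        return F.l1_loss(self.extractor(pred), self.extractor(target))


# Tiny stand-in extractor for illustration.
loss_mod = LazyPerceptualLoss(lambda: nn.Conv2d(3, 8, 3, padding=1).eval())

assert loss_mod.extractor is None  # nothing heavy is loaded at init
x = torch.randn(1, 3, 16, 16)
loss = loss_mod(x, x)
assert loss_mod.extractor is not None  # built only on first use
```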
> Also we need to trigger the slow CI
>
> Before merging this pull request, slow tests CI should be triggered. To enable this:
>
> - Add the `run-slow` label to the PR (I did it now)
Thank you for adding.
> Before merging this pull request, slow tests CI should be triggered. To enable this:
>
> - When your PR is ready for merge and all reviewers' comments have been addressed, push an empty commit with the command `[run-slow]` followed by a comma-separated list of all the models to be tested, i.e. `[run-slow] model_to_test_1, model_to_test_2`
> - A `transformers` maintainer will then approve the workflow to start the tests
Okay I will get this done when the PR is ready to merge and the reviews have been addressed completely. Thank you for mentioning :smile:
@ydshieh I think you triggered a build run for the PR docs and it failed. I addressed that in the latest commit, though. So, is that PR docs build also meant to be run only once all the reviews are addressed?
> So, is that trigger run regarding PR docs also meant to be run when all the reviews are addressed?
We prefer to trigger (some) CI/build jobs when the PR is (almost) ready, but it's not 100% strict :-). I can trigger the PR doc-building job again.
(Also, since this PR is about a new model, we don't need the 2nd step, a commit with the command `[run-slow] ...`.)
> (also since this PR is about a new model, we don't need the 2nd step, a commit with the command `[run-slow] ...`)
Okay, noted :+1:
> > So, is that trigger run regarding PR docs also meant to be run when all the reviews are addressed?
>
> We prefer to trigger (some) CIs/building jobs when the PR is (almost) ready, but it's not 100% strict :-). I can trigger the PR doc building job again.
@ydshieh Thank you for the triggered build run. Just one question: right now, two of the jobs are failing, so can we address those build failures, after triggering the jobs again, once the final review has been done and addressed? Because even if I push fixes now, you'd have to re-trigger repeatedly to check them, which would be time-consuming and inefficient for you. Also, the code may change depending on the review.
> can we address these build fails after triggering the jobs when the final review has been done and addressed

For sure!
soft ping @molbap Thank you