
Details about CLIP fine-tuning and zero-shot text-guided editing

Open JamesLong199 opened this issue 1 year ago • 11 comments

Hi,

Could you kindly provide more details on the setting for model fine-tuning with CLIP and the zero-shot text-guided expression editing procedure?

For model fine-tuning with CLIP, my understanding is that the same losses used in emotion adaptation are applied, in addition to a CLIP loss, and that fine-tuning is performed on MEAD, where each training video is paired with a fixed text prompt for its emotion category (see the attached screenshot).
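The fixed pairing described above could be organized as a simple lookup. This is a hypothetical sketch: the actual prompt strings are whatever the screenshot shows, and `get_prompt` is an illustrative helper, not part of EAT_code.

```python
# Hypothetical mapping from MEAD emotion categories to fixed text prompts.
# The real prompts used for CLIP fine-tuning are the ones in the screenshot;
# these strings are placeholders for illustration only.
EMOTION_PROMPTS = {
    "angry":     "an angry face",
    "contempt":  "a contemptuous face",
    "disgusted": "a disgusted face",
    "fear":      "a fearful face",
    "happy":     "a happy face",
    "neutral":   "a neutral face",
    "sad":       "a sad face",
    "surprised": "a surprised face",
}

def get_prompt(emotion: str) -> str:
    """Return the fixed text prompt paired with a training video's emotion label."""
    return EMOTION_PROMPTS[emotion]
```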

For the zero-shot text-guided expression editing, I was wondering how the CLIP text feature is incorporated into the existing model structure (e.g., via a mapping from the CLIP feature to the latent code z, or to the emotion prompt?).

Thank you in advance for your time and help!

[Screenshot: the fixed text prompts for each emotion category]

Originally posted by @JamesLong199 in https://github.com/yuangan/EAT_code/issues/23#issuecomment-2110126521

JamesLong199 avatar May 14 '24 12:05 JamesLong199

Hi, thank you for your attention.

As an application of our proposed modules, we achieve this in a direct way: we optimize the latent code z with the CLIP loss. Given an emotion label and a video, we fine-tune the EAT components, including the mapping network, EDN, and EAM, to edit the expression in the video according to the input text.

The training of the text-guided mapper in StyleCLIP may help you understand this process.
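The optimization described above, tuning a latent code z against a CLIP loss in the spirit of StyleCLIP's latent mapper, can be sketched as below. This is a toy illustration, not EAT_code's actual training loop: `render_encode` is a stand-in for the full "generate a frame, then encode it with the CLIP image encoder" pipeline, and `text_emb` stands in for a real CLIP text feature.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """CLIP-style loss: 1 - cosine similarity between image and text embeddings."""
    return 1.0 - F.cosine_similarity(image_emb, text_emb, dim=-1).mean()

torch.manual_seed(0)
z = torch.randn(1, 128, requires_grad=True)    # latent code being optimized
render_encode = torch.randn(128, 512)          # stand-in for "generator + CLIP image encoder"
text_emb = torch.randn(1, 512)                 # stand-in for the CLIP text feature

optimizer = torch.optim.Adam([z], lr=0.05)
with torch.no_grad():
    initial_loss = clip_style_loss(z @ render_encode, text_emb).item()

for _ in range(100):
    optimizer.zero_grad()
    loss = clip_style_loss(z @ render_encode, text_emb)
    loss.backward()
    optimizer.step()

final_loss = clip_style_loss(z @ render_encode, text_emb).item()
```

In the real setting, the CLIP loss would be combined with the emotion-adaptation losses, and the gradients would also update the fine-tuned components (mapping network, EDN, EAM) rather than only z.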

If you have any other questions, please let us know.

yuangan avatar May 15 '24 05:05 yuangan

Thank you for your swift response and concise explanation. In addition, the description in the StyleCLIP paper is really helpful :)

JamesLong199 avatar May 17 '24 01:05 JamesLong199

Hi,

Would it be possible to upload a script for CLIP fine-tuning? Thank you in advance for your time and help.

JamesLong199 avatar May 25 '24 03:05 JamesLong199

Thank you for your consistent attention.

Yes. We plan to release a script for zero-shot video editing this week. I think this is an interesting phenomenon that deserves further research.

I am cleaning up the code for this part; you can try it soon.

yuangan avatar May 27 '24 09:05 yuangan

Good to see you continuing to work on this. In my experience, your implementation has been the best for replicating expressions.

G-force78 avatar May 28 '24 11:05 G-force78

I really appreciate your feedback. @G-force78

I've uploaded the zero-shot editing code, and you can find more details here.

It has been a long journey for me to develop and release this work. I hope you find emotional talking-head generation as interesting as I do. 😜

yuangan avatar May 28 '24 15:05 yuangan

Hi, I'm not sure what this refers to:

Traceback (most recent call last):
  File "/content/EAT_code/prompt_st_dp_eam3d_mapper_full.py", line 162
    train(text, config, generator, None, kp_detector, audio2kptransformer, mapper, sidetuning, opt.checkpoint, log_dir, dataset, opt.device_ids)
  File "/content/EAT_code/train_transformer.py", line 474, in train_batch_prompt_mapper3
    generator_full = GeneratorFullModelBatchDeepPromptSTEAM3DNewStyle3(text, kp_detector, audio2kptransformer, mapper, sidetuning, generator, discriminator, train_params, estimate_jacobian=config['model_params']['common_params']['estimate_jacobian'])
NameError: name 'GeneratorFullModelBatchDeepPromptSTEAM3DNewStyle3' is not defined. Did you mean: 'GeneratorFullModelBatchDeepPromptSTEAM3D'?

G-force78 avatar May 29 '24 09:05 G-force78

Hi, you need to download the latest version of our code. I've added these functions to train_transformer.py and the related files.

yuangan avatar May 29 '24 10:05 yuangan

I did update the files; maybe I've missed something, but here it is: https://github.com/yuangan/EAT_code/blob/622d5460d8308177e71edc5ee40ed0422a54ca82/train_transformer.py#L262 (GeneratorFullModelBatchDeepPromptSTEAM3D)

G-force78 avatar May 29 '24 10:05 G-force78

Hi, I can find GeneratorFullModelBatchDeepPromptSTEAM3D and GeneratorFullModelBatchDeepPromptSTEAM3DNewStyle3 in modules/model_transformer.py. Could you check for these classes in your code? You may need to run git pull to get the latest version.

yuangan avatar May 29 '24 11:05 yuangan

Thank you so much for your awesome project, and I really appreciate you taking the time to release this 💯.

JamesLong199 avatar May 30 '24 14:05 JamesLong199