
size mismatch for PPE.pe

Open godsonzhou opened this issue 1 year ago • 3 comments

When I prepared the environment using pip 24.2, all packages installed successfully except torchsde:

```
pip install torchsde==0.2.5
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting torchsde==0.2.5
  Using cached https://mirrors.aliyun.com/pypi/packages/73/8d/efd3e7b31ea854d0bd6886aa3cf44914adce113a6d460850af41ac1dd4dd/torchsde-0.2.5-py3-none-any.whl (59 kB)
WARNING: Ignoring version 0.2.5 of torchsde since it has invalid metadata: Requested torchsde==0.2.5 from https://mirrors.aliyun.com/pypi/packages/73/8d/efd3e7b31ea854d0bd6886aa3cf44914adce113a6d460850af41ac1dd4dd/torchsde-0.2.5-py3-none-any.whl#sha256=4c34373a94a357bdf60bbfee00c850f3563d634491555820b900c9a4f7eff300 has invalid metadata: .* suffix can only be used with == or != operators
    numpy (>=1.19.*) ; python_version >= "3.7"
          ~~~~~~~^
Please use pip<24.1 if you need to use this version.
ERROR: Could not find a version that satisfies the requirement torchsde==0.2.5 (from versions: 0.2.5, 0.2.6)
ERROR: No matching distribution found for torchsde==0.2.5
```

So I removed the version specification for torchsde in requirements.txt, and `pip install -r requirements.txt` completed successfully.
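For context on the pip failure: torchsde 0.2.5 declares a dependency as `numpy (>=1.19.*)`, but PEP 440 only permits the `.*` suffix after `==` or `!=`. pip 24.1 and later enforce this via the `packaging` library, which is why the log suggests `pip<24.1` as a workaround. A minimal sketch of that check (assumes a recent `packaging` release, 22.0 or later, where strict enforcement applies):

```python
from packaging.specifiers import SpecifierSet, InvalidSpecifier

# Valid per PEP 440: the .* suffix may follow == or !=.
SpecifierSet("==1.19.*")

# Invalid: .* after >= is exactly what torchsde 0.2.5's metadata contains.
try:
    SpecifierSet(">=1.19.*")
    rejected = False
except InvalidSpecifier:
    rejected = True

print("rejected:", rejected)
```

So either pinning `pip<24.1` (as the warning suggests) or dropping the torchsde version pin, as above, sidesteps the bad metadata.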

When I run `python -m scripts.app`, the following error occurs:

```
(anip) D:\gitrepos\AniPortrait>python -m scripts.app
Some weights of the model checkpoint at ./pretrained_model/wav2vec2-base-960h were not used when initializing Wav2Vec2Model: ['lm_head.weight', 'lm_head.bias']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at ./pretrained_model/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at ./pretrained_model/wav2vec2-base-960h were not used when initializing Wav2Vec2Model: ['lm_head.weight', 'lm_head.bias']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at ./pretrained_model/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "D:\pinokio\bin\miniconda\envs\anip\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\pinokio\bin\miniconda\envs\anip\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\gitrepos\AniPortrait\scripts\app.py", line 49, in <module>
    a2p_model.load_state_dict(torch.load(audio_infer_config['pretrained_model']['a2p_ckpt']), strict=False)
  File "D:\pinokio\bin\miniconda\envs\anip\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Audio2PoseModel:
        size mismatch for PPE.pe: copying a param with shape torch.Size([1, 630, 512]) from checkpoint, the shape in current model is torch.Size([1, 600, 512]).
```

Is this a torchsde version problem?

godsonzhou avatar Sep 12 '24 22:09 godsonzhou

I encountered the same problem; how did you solve it?

WarmCongee avatar Oct 22 '24 15:10 WarmCongee

+1

xwb123 avatar Oct 31 '24 03:10 xwb123

+1

I fixed this problem by changing the initialization length of the pose embedding. The specific location is line 43 of https://github.com/Zejun-Yang/AniPortrait/blob/main/src/audio_models/pose_model.py: I changed max_len to 630. But I'm not sure this is the intended fix.
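To see why changing max_len resolves the shape error: a transformer positional encoding is typically a precomputed buffer of shape (1, max_len, d_model), so a checkpoint saved with max_len=630 cannot be copied into a model built with max_len=600. A NumPy sketch of the standard sinusoidal encoding (the function name and NumPy port are illustrative, not the repo's actual PyTorch code):

```python
import numpy as np

def build_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal positional encoding, shape (1, max_len, d_model)."""
    position = np.arange(max_len)[:, None]                  # (max_len, 1)
    div_term = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(position * div_term)               # even dims
    pe[:, 1::2] = np.cos(position * div_term)               # odd dims
    return pe[None, ...]                                    # add batch dim

# The checkpoint's PPE.pe buffer is (1, 630, 512); a model built with
# max_len=600 produces (1, 600, 512), hence the size-mismatch error.
print(build_positional_encoding(630, 512).shape)  # (1, 630, 512)
print(build_positional_encoding(600, 512).shape)  # (1, 600, 512)
```

Note also that `strict=False` in PyTorch's `load_state_dict` only tolerates missing or unexpected keys; it does not permit shape mismatches on keys that do exist, which is why the load in the traceback above still fails.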

WarmCongee avatar Nov 01 '24 10:11 WarmCongee