Calmepro777
> Did you find a solution? I am reshaping all images to 224x224, but that seems a bit fishy, especially with the varying aspect ratio.

That was how I tackled...
> > > > I use the code in the `README.md (Zero-Shot Prediction)` section to test the accuracy of ViT-B/32 on the CIFAR100 dataset and get a result of about 62%...
> Hi,
>
> There can be numerical differences that we cannot fully control, e.g. different CUDA and driver versions, batch sizes, hardware, etc., that may cause the 0.5% difference...
Here are the best results I obtained with the ViT-B/16 image encoder and the prompt "itap of a {label}.":

| Dataset | Reproduced Acc. | Reported Acc. | Gap |
| ------- | --------------- | ------------- | --- |
...
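For anyone who wants to reproduce this, here is a minimal sketch of the zero-shot setup I mean, following the `Zero-Shot Prediction` example in the README; the data root, device handling, and the single-image scoring at the end are my own choices rather than anything prescribed by the repo:

```python
import clip
import torch
from torchvision.datasets import CIFAR100

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

# Build one text embedding per CIFAR-100 class from the prompt template.
cifar100 = CIFAR100(root="./data", download=True, train=False)
prompts = [f"itap of a {c}." for c in cifar100.classes]
with torch.no_grad():
    text_features = model.encode_text(clip.tokenize(prompts).to(device))
    text_features /= text_features.norm(dim=-1, keepdim=True)

# Score a single test image against all 100 class embeddings.
image, label = cifar100[0]
image_input = preprocess(image).unsqueeze(0).to(device)
with torch.no_grad():
    image_features = model.encode_image(image_input)
    image_features /= image_features.norm(dim=-1, keepdim=True)
probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print("predicted:", cifar100.classes[probs.argmax(dim=-1).item()], "| true:", cifar100.classes[label])
```

Note that the `preprocess` transform returned by `clip.load` already resizes and center-crops images to the model's input resolution, so no manual reshaping to 224x224 is needed.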
I noticed that my GPU utilization and VRAM usage are very low, about 2% and ~2 GiB respectively. Any hints on resolving this? Is there a specific hyper-parameter I should set to...
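Assuming this is about the zero-shot evaluation above (my own guess), one thing that would likely help is batching the image encoding with a `DataLoader` instead of scoring images one by one; the batch size and worker count below are arbitrary values I picked, not settings from the repo:

```python
import clip
import torch
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR100

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

# Apply CLIP's preprocessing inside the dataset so the loader yields ready tensors.
test_set = CIFAR100(root="./data", download=True, train=False, transform=preprocess)
loader = DataLoader(test_set, batch_size=256, num_workers=4, pin_memory=True)

prompts = [f"itap of a {c}." for c in test_set.classes]
with torch.no_grad():
    text_features = model.encode_text(clip.tokenize(prompts).to(device))
    text_features /= text_features.norm(dim=-1, keepdim=True)

correct = total = 0
with torch.no_grad():
    for images, labels in loader:
        images, labels = images.to(device, non_blocking=True), labels.to(device)
        image_features = model.encode_image(images)
        image_features /= image_features.norm(dim=-1, keepdim=True)
        preds = (image_features @ text_features.T).argmax(dim=-1)
        correct += (preds == labels).sum().item()
        total += labels.numel()

print(f"zero-shot top-1 accuracy: {correct / total:.4f}")
```

Larger batches should keep the GPU busier than scoring one image at a time, within whatever VRAM is available.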
Thanks for the clarification. I am following your guidance to process the Vox2 dataset. However, regarding the preprocessed MEAD dataset I downloaded via the link you provided, it appears to...
In addition, I noticed that even if the person in the video that serves as the head-pose source has minimal head movement, the person in the generated video looks like they are being...
> Thank you for your attention.
>
> You can download the preprocessed MEAD data from [Yandex](https://disk.yandex.com/d/yzk1uTlZgwortw) or [Baidu](https://pan.baidu.com/s/1Jxzow2anGjMa-y3F8yQwAw?pwd=lsle#list/path=%2F).
>
> As for Vox2, you can find some details...
> 1. Yes, we do not use Vox2 data in fine-tuning the emotional adaptation stage.
> 2. The deepfeature32 contains audio features extracted by the DeepSpeech [code](https://github.com/yuangan/EAT_code/tree/main/preprocess/deepspeech_features). Every dataset should...
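On my side, a small sanity check is to load one of the extracted feature files and compare its first dimension against the frame count of the corresponding clip; the `.npy` extension, the file name, and the assumption that the first axis is the per-frame axis are all guesses about the deepfeature32 layout on my part, not something documented in the repo:

```python
import numpy as np

# Hypothetical path; the actual directory layout and naming of deepfeature32 may differ.
feat = np.load("deepfeature32/some_clip.npy")
print("DeepSpeech feature array shape:", feat.shape)

# If the first dimension is the per-frame axis, it should roughly match
# the number of frames in the corresponding video clip.
```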