FFVT
FFVT copied to clipboard
[BMVC 2021] The official PyTorch implementation of Feature Fusion Vision Transformer for Fine-Grained Visual Categorization
Feature Fusion Vision Transformer for Fine-Grained Visual Categorization
Official PyTorch implementation of Feature Fusion Vision Transformer for Fine-Grained Visual Categorization (BMVC 2021).
If you use the code in this repo for your work, please cite the following bib entries:
@article{wang2021feature,
title={Feature Fusion Vision Transformer for Fine-Grained Visual Categorization},
author={Wang, Jun and Yu, Xiaohan and Gao, Yongsheng},
journal={British Machine Vision Conference},
year={2021}
}
Abstract
Prerequisites
The following packages are required to run the scripts:
- [Python >= 3.6]
- [PyTorch = 1.5]
- [Torchvision]
- [Apex]
Download Google pre-trained ViT models
- Get models in this link: ViT-B_16, ViT-B_32...
wget https://storage.googleapis.com/vit_models/imagenet21k/{MODEL_NAME}.npz
Dataset
You can download the datasets from the links below:
Training scripts for FFVT on Cotton dataset.
Train the model on the Cotton dataset. We run our experiments on 4x2080Ti/4x1080Ti with the batchsize of 8 for each card.
$ python3 -m torch.distributed.launch --nproc_per_node 4 train.py --name {name} --dataset cotton --model_type ViT-B_16 --pretrained_dir {pretrained_model_dir} --img_size 384 --resize_size 500 --train_batch_size 8 --learning_rate 0.02 --num_steps 2000 --fp16 --eval_every 16 --feature_fusion
Training scripts for FFVT on Soy.Loc dataset.
Train the model on the SoyLoc dataset. We run our experiments on 4x2080Ti/4x1080Ti with the batchsize of 8 for each card.
$ python3 -m torch.distributed.launch --nproc_per_node 4 train.py --name {name} --dataset soyloc --model_type ViT-B_16 --pretrained_dir {pretrained_model_dir} --img_size 384 --resize_size 500 --train_batch_size 8 --learning_rate 0.02 --num_steps 4750 --fp16 --eval_every 50 --feature_fusion
Training scripts for FFVT on CUB dataset.
Train the model on the CUB dataset. We run our experiments on 4x2080Ti/4x1080Ti with the batchsize of 8 for each card.
$ python3 -m torch.distributed.launch --nproc_per_node 4 train.py --name {name} --dataset CUB --model_type ViT-B_16 --pretrained_dir {pretrained_model_dir} --img_size 448 --resize_size 600 --train_batch_size 8 --learning_rate 0.02 --num_steps 10000 --fp16 --eval_every 200 --feature_fusion
Training scripts for FFVT on Dogs dataset.
Train the model on the Dog dataset. We run our experiments on 4x2080Ti/4x1080Ti with the batchsize of 4 for each card.
$ python3 -m torch.distributed.launch --nproc_per_node 4 train.py --name {name} --dataset CUB --model_type ViT-B_16 --pretrained_dir {pretrained_model_dir} --img_size 448 --resize_size 600 --train_batch_size 4 --learning_rate 3e-3 --num_steps 30000 --fp16 --eval_every 300 --feature_fusion --decay_type linear --num_token 24
Download Models
Acknowledgment
Thanks for the advice and guidance given by Dr.Xiaohan Yu and Prof. Yongsheng Gao.
Our project references the codes in the following repos. Thanks for thier works and sharing.