SketchyDatabase
This project is a reimplementation of The Sketchy Database: Learning to Retrieve Badly Drawn Bunnies
See the homepage of the original project for details.
Get the dataset via Google Drive (SketchyDataset).
Dataset
Sketchy Database
Test Set
Because I didn't notice that the Sketchy Database already provides a list of test photos, I randomly chose the test photos and their corresponding sketches myself. The test set is listed in TEST_IMG and TEST_SKETCH.
| category | photo | sketch |
|---|---|---|
| airplane | 10 | 75 |
| alarm_clock | 52 | |
| ant | 53 | |
| . | . | |
| . | . | |
| . | . | |
| window | 54 | |
| wine_bottle | 52 | |
| zebra | 66 | |
| Total | 1250 | 7875 |
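A random per-category split like the one above can be drawn with a few lines of standard-library Python. This is a hypothetical helper (not part of this repo), assuming photos are stored in one directory per category:

```python
import os
import random

def sample_test_split(photo_root, per_category=None, seed=0):
    """Randomly pick test photos for each category directory under
    photo_root. If per_category is None, take roughly 10% per class.
    Hypothetical helper, not part of this repo."""
    rng = random.Random(seed)
    split = {}
    for category in sorted(os.listdir(photo_root)):
        photos = sorted(os.listdir(os.path.join(photo_root, category)))
        k = per_category if per_category else max(1, len(photos) // 10)
        split[category] = rng.sample(photos, min(k, len(photos)))
    return split
```

Fixing the seed makes the split reproducible, which matters when comparing checkpoints against the same test set.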
The Dataset Structure in My Project
```
Dataset
├── photo-train            # the training set of photos
├── sketch-triplet-train   # the training set of sketches
├── photo-test             # the testing set of photos
└── sketch-triplet-test    # the testing set of sketches
```
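Each sketch must be matched to its source photo when building triplets. The sketch below is a hypothetical helper (not part of this repo), assuming the Sketchy naming scheme in which a sketch file `xxx-1.png` corresponds to the photo `xxx.jpg`:

```python
import os

def pair_sketches_to_photos(photo_dir, sketch_dir):
    """Pair each sketch with its source photo, assuming sketches are
    named '<photo_stem>-<k>.<ext>'. Hypothetical helper, not part of
    this repo."""
    # map photo stem (filename without extension) -> photo filename
    photos = {os.path.splitext(f)[0]: f for f in os.listdir(photo_dir)}
    pairs = []
    for s in sorted(os.listdir(sketch_dir)):
        # strip the trailing '-<k>' sketch index to recover the photo stem
        stem = os.path.splitext(s)[0].rsplit("-", 1)[0]
        if stem in photos:
            pairs.append((s, photos[stem]))
    return pairs
```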
Test
Use feature_extract.py to extract the feature files (*.pkl).
Use retrieval_test.py to compute the test results.
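The exact metric computed by retrieval_test.py is not shown here; the following is a minimal NumPy sketch of recall@k over extracted features, assuming a retrieved photo counts as a hit when it shares the query sketch's label:

```python
import numpy as np

def recall_at_k(sketch_feats, photo_feats, sketch_labels, photo_labels,
                k=1, metric="euclidean"):
    """Fraction of query sketches whose k nearest photos contain at
    least one photo with the same label. Hypothetical helper, not the
    repo's retrieval_test.py."""
    if metric == "cosine":
        s = sketch_feats / np.linalg.norm(sketch_feats, axis=1, keepdims=True)
        p = photo_feats / np.linalg.norm(photo_feats, axis=1, keepdims=True)
        dist = 1.0 - s @ p.T                      # cosine distance matrix
    else:
        # pairwise euclidean distances, shape (n_sketches, n_photos)
        dist = np.linalg.norm(
            sketch_feats[:, None, :] - photo_feats[None, :, :], axis=2)
    topk = np.argsort(dist, axis=1)[:, :k]        # k nearest photos per sketch
    hits = [any(photo_labels[j] == sketch_labels[i] for j in row)
            for i, row in enumerate(topk)]
    return float(np.mean(hits))
```

The `metric` argument mirrors the `metric='cosine'` / `metric='euclidean'` settings in the result tables below.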
Testing Result
GoogLeNet, which performed best in the original paper, was not implemented in PyTorch at the time, so I used vgg16 instead.
| model | epoch | recall@1 | recall@5 |
|---|---|---|---|
| resnet34 (pretrained; mixed training set; metric='cosine') | 90 | 8.51% | 18.68% |
| | 150 | 9.31% | 20.44% |
| resnet34 (pretrained; mixed training set; metric='euclidean') | 90 | 6.45% | 14.79% |
| | 150 | 6.96% | 16.46% |
| resnet34 (150 epoch; triplet loss m=0.02; metric='euclidean'; lr=1e-5; batch_size=16) | 85 | 9.87% | 22.37% |
| vgg16 (pretrained; triplet loss m=0.3; metric='euclidean'; lr=1e-5; batch_size=16) | 0 | 0.17% | 0.72% |
| | 5 | 17.59% | 45.51% |
| | 190 | 31.03% | 67.86% |
| | 275 | 32.22% | 68.48% |
| | 975 | 35.24% | 71.53% |
| vgg16 (fine-tuned from epoch 275; m=0.15; metric='euclidean'; lr=1e-7; batch_size=16) | 55 | 33.22% | 70.04% |
| | 625 | 35.78% | 72.44% |
| | 995 | 36.09% | 73.02% |
| resnet50 (pretrained; triplet loss m=0.15; metric='euclidean'; lr=1e-7; batch_size=16) | 0 | 0.71% | 11.48% |
| | 55 | 10.18% | 29.94% |
| | 940 | 15.17% | 47.61% |
| resnet50 (pretrained; triplet loss m=0.1; metric='euclidean'; lr=1e-6; batch_size=32) | 315 | 19.58% | 57.19% |
| resnet50 (pretrained; triplet loss m=0.3; metric='euclidean'; lr=1e-5; batch_size=48) | 20 | 21.56% | 57.50% |
| | 95 | 30.32% | 71.73% |
| | 265 | 40.08% | 78.83% |
| | 930 | 46.04% | 83.30% |
I am not sure why resnet34 performed so poorly, while vgg16 and resnet50 performed fairly well.
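The margin m in the tables above is the triplet-loss margin. A minimal NumPy sketch of the loss, assuming Euclidean distance (matching the retrieval metric) with sketches as anchors and photos as positives/negatives:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Triplet margin loss, averaged over the batch:
    mean(max(0, d(a, p) - d(a, n) + margin)) with Euclidean d.
    Illustrative sketch, not the repo's training code."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-positive distances
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-negative distances
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))
```

A larger margin (e.g. m=0.3 vs m=0.02) forces negatives further away before the loss reaches zero, which is consistent with the better results of the m=0.3 runs above.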
Retrieval Result
I randomly chose 20 sketches as query sketches; the retrieval results are shown below. The model used is resnet50 (pretrained; triplet loss m=0.3; metric='euclidean'; lr=1e-5; batch_size=48) after 265 training epochs.

Feature Visualization via t-SNE
All visualized categories are the first ten categories in alphabetical order.
Boxes represent photos; points represent sketches.
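The visualizations project photo and sketch features into a shared 2-D space. A minimal sketch of that projection using scikit-learn's `TSNE` (a hypothetical helper, not the repo's plotting script):

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_2d(photo_feats, sketch_feats, perplexity=30, seed=0):
    """Run t-SNE on photos and sketches jointly so both modalities land
    in the same 2-D space, then split the result back into the two
    groups for plotting (photos as boxes, sketches as points)."""
    feats = np.vstack([photo_feats, sketch_feats])
    xy = TSNE(n_components=2, perplexity=perplexity,
              init="pca", random_state=seed).fit_transform(feats)
    return xy[:len(photo_feats)], xy[len(photo_feats):]
```

Embedding both modalities in one t-SNE run is what makes the overlap (or separation) of boxes and points in the figures meaningful.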
| model | vis |
|---|---|
| resnet34 pretrained on ImageNet | ![]() |
| resnet34 (pretrained; sketch branch and photo branch trained separately) | ![]() |
| resnet34 (pretrained; mixed training set) after 90 training epochs | ![]() |
| resnet34 (pretrained; mixed training set) after 150 training epochs | ![]() |
| vgg16 (pretrained; triplet loss m=0.3; lr=1e-5) after 0 training epochs | ![]() |
| vgg16 (pretrained; triplet loss m=0.3; lr=1e-5) after 5 training epochs | ![]() |
| vgg16 (pretrained; triplet loss m=0.3; lr=1e-5) after 190 training epochs | ![]() |
| vgg16 (fine-tuned; triplet loss m=0.15; lr=1e-7) after 995 training epochs | ![]() |
| resnet50 (pretrained; triplet loss m=0.15; lr=1e-7) after 0 training epochs | ![]() |
| resnet50 (pretrained; triplet loss m=0.15; lr=1e-7) after 940 training epochs | ![]() |
| resnet50 (pretrained; triplet loss m=0.1; lr=1e-6) after 315 training epochs | ![]() |
| resnet50 (pretrained; triplet loss m=0.3; lr=1e-5; batch_size=48) after 265 training epochs | ![]() |











