Survey of Model Improvement Proposals
Creating a list to organize my own thoughts, but I would love to hear everyone else's ideas and suggestions, as well as what they're looking at.
Ways to improve the faceswap model ( accuracy, speed, robustness to outliers, model convergence )
- Improved face detection options ( Current code: dlib/CNN+mmod )
- MTCNN
- Mask R-CNN ( would also provide a pixel-by-pixel semantic mask for the face )
- YOLOv2
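As a concrete example of the MTCNN option, here is a minimal detection sketch using the third-party mtcnn pip package ( ipazc/mtcnn ); the package choice, the confidence threshold, and the helper name are my own assumptions, not anything in the current code:

```python
# Hypothetical swap of the detector for MTCNN via the ipazc/mtcnn package.
import cv2
from mtcnn.mtcnn import MTCNN

detector = MTCNN()

def detect_faces(image_path, min_confidence=0.9):
    """Return face bounding boxes (x, y, w, h) above a confidence threshold."""
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    results = detector.detect_faces(image)  # dicts with 'box', 'confidence', 'keypoints'
    return [face["box"] for face in results if face["confidence"] >= min_confidence]
```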
- Improved face recognition ( Current code: ageitgey/face_recognition )
- Adapt a submission from the 2017 NIST Face Recognition Prize Challenge or another survey
- Website - https://www.nist.gov/programs-projects/face-recognition-prize-challenge
- Results - https://nvlpubs.nist.gov/nistpubs/ir/2017/NIST.IR.8197.pdf
- Improved face alignment ( Current code: 1adrianb/face-alignment )
- Adapt a submission from the 2017 Menpo Face Alignment Challenge
- Website - https://ibug.doc.ic.ac.uk/resources/2nd-facial-landmark-tracking-competition-menpo-ben/
- Results - https://pdfs.semanticscholar.org/657a/58a220b1e69d14ef7a88be859d2f8d75e6a1.pdf
- Results from 1adrianb's own papers show that the competition winners are more accurate
- The winners vary in how open their source code is; one top submission is on GitHub
- https://github.com/MarekKowalski/DeepAlignmentNetwork
- https://www.dropbox.com/sh/u4g2o5kha0mt1uc/AADDMkoMKG2t4iiTxMUC6e2Ta?dl=0&preview=Deep+Alignment+Network.pptx
-
Add Batch Normalization after Conv and Dense Layers in the autoencoder ( Current code: No norm )
- lots of papers show it improves training speed, reduces over-fitting, and gives better stability ( rough Keras sketch after this list )
- https://towardsdatascience.com/batch-normalization-in-neural-networks-1ac91516821c
- explore alternatives to standard batch norm
- batch renormalization
- ghost normalization
- streaming normalization
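A minimal sketch of what this could look like in a Keras encoder, adding BatchNormalization between each Conv2D and its activation; the filter counts, kernel size, and input shape are placeholders rather than the actual faceswap architecture:

```python
# Illustrative encoder block: Conv2D -> BatchNormalization -> LeakyReLU
from keras.layers import Input, Conv2D, BatchNormalization, LeakyReLU
from keras.models import Model

def conv_block(x, filters):
    x = Conv2D(filters, kernel_size=5, strides=2, padding="same")(x)
    x = BatchNormalization()(x)   # normalize activations over the mini-batch
    return LeakyReLU(alpha=0.1)(x)

inputs = Input(shape=(64, 64, 3))   # placeholder input size
x = conv_block(inputs, 128)
x = conv_block(x, 256)
encoder = Model(inputs, x)
```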
-
Replace the rectifier in the conv layers with more robust alternatives ( Current code: Leaky ReLU; sketch after the list )
- PRELU
- ELU
- PELU
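Rough sketch of swapping the activation; PReLU and ELU ship with Keras, while PELU would need a custom layer. The conv parameters are placeholders:

```python
# PReLU: learned negative slope; ELU: smooth saturation for negative inputs.
from keras.layers import Conv2D, PReLU, ELU

def conv_prelu(x, filters):
    x = Conv2D(filters, kernel_size=5, strides=2, padding="same")(x)
    return PReLU()(x)

def conv_elu(x, filters):
    x = Conv2D(filters, kernel_size=5, strides=2, padding="same")(x)
    return ELU(alpha=1.0)(x)
```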
-
Adjust learning rate when user changes batch size ( Current code: no LR scaling with batch size )
- lots of papers on this, but simply put: with a larger batch you can afford to explore larger step sizes and train more quickly with the same model stability
- Apply a linear scaling factor to the learning rate when the batch size is increased ( doubling the batch size doubles the learning rate; sketch after this list )
- https://www.quora.com/Intuitively-how-does-mini-batch-size-affect-the-performance-of-stochastic-gradient-descent
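A tiny sketch of the linear scaling rule; the current initial LR of 5e-5 comes from the section below, and the reference batch size of 64 is an assumption:

```python
# Linear scaling rule: learning rate grows in proportion to batch size.
BASE_LR = 5e-5          # current initial LR
BASE_BATCH_SIZE = 64    # assumed reference batch size

def scaled_learning_rate(batch_size):
    return BASE_LR * (batch_size / BASE_BATCH_SIZE)

# e.g. batch size 128 -> 1e-4, batch size 32 -> 2.5e-5
```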
-
Explore using other optimizers in autoencoder ( Current code: Adam )
- SGD with momentum
- Cyclical Learning Rate
- L4Adam
- YellowFin
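Sketch of what swapping the optimizer looks like at compile time, using SGD with Nesterov momentum as the simplest alternative; the LR, betas, and loss shown are placeholders, and L4Adam / YellowFin / cyclical LR would need third-party or custom implementations:

```python
# Swapping the optimizer at compile time; values are placeholders.
from keras.optimizers import Adam, SGD

current = Adam(lr=5e-5, beta_1=0.5, beta_2=0.999)        # roughly the existing setup
alternative = SGD(lr=1e-3, momentum=0.9, nesterov=True)  # SGD with Nesterov momentum

# autoencoder.compile(optimizer=alternative, loss="mean_absolute_error")
```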
-
Effective learning rate schedule ( Current code: no adjustment of LR after start or at stagnation )
- best practice is to lower the learning rate or raise the batch size, either at set intervals or after the loss has stagnated ( Note: for a constant mini-batch size of, say, 16 that is limited by GPU memory, you can still increase the effective batch size, e.g. by running on multiple GPUs or sequential GPU runs; see the callback sketch after this section )
- https://openreview.net/pdf?id=B1Yy1BxCZ
- also related: restarting the model at set intervals with a higher LR to kick it out of a local minimum
- https://arxiv.org/pdf/1608.03983.pdf
- Initial learning rate ( Current code: 5e-5 )
- Dependent on model architecture ( normalization, batch size, regularization, better rectifiers, and better optimizers all allow you to increase the LR with the same stability/accuracy )
- Suspect it is too low, but the current model has few of the tweaks which promote training stability
- Also highly dependent on default batch size
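A sketch of a stagnation-based schedule using built-in Keras callbacks; the patience, factor, and min_lr values are guesses, and warm restarts ( SGDR ) would need a custom callback:

```python
# Lower the LR when the loss stagnates; stop if it never recovers.
from keras.callbacks import ReduceLROnPlateau, EarlyStopping

callbacks = [
    ReduceLROnPlateau(monitor="loss", factor=0.5, patience=10, min_lr=1e-6, verbose=1),
    EarlyStopping(monitor="loss", patience=50, verbose=1),
]

# autoencoder.fit(warped_faces, target_faces, batch_size=16, epochs=1000, callbacks=callbacks)
```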
Use keras.preprocessing.image.ImageDataGenerator ( Current code: random_transform and other custom functions )
- more built-in transforms ( shear, skew, whitening, etc. ) to create warped images that are sent to the trainer ( sketch after this list )
- built-in normalization for warped batches
- integrated into the Keras model pipeline for queuing and parallel processing
- https://machinelearningmastery.com/image-augmentation-deep-learning-keras/
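Sketch of a drop-in ImageDataGenerator setup; the transform ranges are illustrative only, and the whitening/normalization options are shown switched off just to flag where they live:

```python
# Keras-native augmentation replacing the custom random_transform step.
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.05,
    height_shift_range=0.05,
    shear_range=0.1,
    zoom_range=0.05,
    horizontal_flip=True,
    zca_whitening=False,       # whitening is available but requires datagen.fit() first
    samplewise_center=False,   # built-in per-sample normalization hook
)

# for warped_batch in datagen.flow(face_images, batch_size=64):
#     ...feed warped_batch to the trainer...
```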
Threw in a lot, but can add more if anyone ever looks at this. PS - I was looking at 4/5/7/9 with a re-write of the Original_Model
Very nice info, thanks