Survey of Model Improvement Proposals
Creating a list to organize my own thoughts, but I would love to hear everyone else's ideas and suggestions, as well as what they're looking at.
Ways to improve the faceswap model ( accuracy, speed, robustness to outliers, model convergence )
- Improved face detection options ( Current code: dlib/CNN+mmod )
- MTCNN
- Mask R-CNN ( would also provide a pixel-by-pixel semantic mask for the face )
- YOLOv2
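As a concrete example of the MTCNN option, here is a minimal detection sketch using the third-party mtcnn pip package ( ipazc/mtcnn ); the package choice, the confidence threshold, and the helper name are my own assumptions, not anything in the current code:

```python
# Hypothetical swap of the detector for MTCNN via the ipazc/mtcnn package.
import cv2
from mtcnn.mtcnn import MTCNN

detector = MTCNN()

def detect_faces(image_path, min_confidence=0.9):
    """Return face bounding boxes (x, y, w, h) above a confidence threshold."""
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    results = detector.detect_faces(image)  # dicts with 'box', 'confidence', 'keypoints'
    return [face["box"] for face in results if face["confidence"] >= min_confidence]
```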
- Improved face recognition ( Current code: ageitgey/face_recognition )
- Adapt a submission from the 2017 NIST Face Recognition Prize Challenge or another survey
- Website - https://www.nist.gov/programs-projects/face-recognition-prize-challenge
- Results - https://nvlpubs.nist.gov/nistpubs/ir/2017/NIST.IR.8197.pdf
- Improved face alignment ( Current code: 1adrianb/face-alignment )
- Adapt a submission from the 2017 Menpo Face Alignment Challenge
- Website - https://ibug.doc.ic.ac.uk/resources/2nd-facial-landmark-tracking-competition-menpo-ben/
- Results - https://pdfs.semanticscholar.org/657a/58a220b1e69d14ef7a88be859d2f8d75e6a1.pdf
- Results from 1adrianb's own papers show that the competition winners are more accurate
- The winners vary in how open their source code is; one top submission is on GitHub
- https://github.com/MarekKowalski/DeepAlignmentNetwork
- https://www.dropbox.com/sh/u4g2o5kha0mt1uc/AADDMkoMKG2t4iiTxMUC6e2Ta?dl=0&preview=Deep+Alignment+Network.pptx
-
Add Batch Normalization after Conv and Dense Layers in the autoencoder ( Current code: No norm )
- lots of papers show it improves training speed, reduces over-fitting, and gives better stability ( rough Keras sketch after this list )
- https://towardsdatascience.com/batch-normalization-in-neural-networks-1ac91516821c
- explore alternatives to standard batch norm
- batch renormalization
- ghost normalization
- streaming normalization
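A minimal sketch of what this could look like in a Keras encoder, adding BatchNormalization between each Conv2D and its activation; the filter counts, kernel size, and input shape are placeholders rather than the actual faceswap architecture:

```python
# Illustrative encoder block: Conv2D -> BatchNormalization -> LeakyReLU
from keras.layers import Input, Conv2D, BatchNormalization, LeakyReLU
from keras.models import Model

def conv_block(x, filters):
    x = Conv2D(filters, kernel_size=5, strides=2, padding="same")(x)
    x = BatchNormalization()(x)   # normalize activations over the mini-batch
    return LeakyReLU(alpha=0.1)(x)

inputs = Input(shape=(64, 64, 3))   # placeholder input size
x = conv_block(inputs, 128)
x = conv_block(x, 256)
encoder = Model(inputs, x)
```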
-
Replace the rectifier in the conv layers with more robust alternatives ( Current code: Leaky ReLU; sketch after the list )
- PRELU
- ELU
- PELU
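Rough sketch of swapping the activation; PReLU and ELU ship with Keras, while PELU would need a custom layer. The conv parameters are placeholders:

```python
# PReLU: learned negative slope; ELU: smooth saturation for negative inputs.
from keras.layers import Conv2D, PReLU, ELU

def conv_prelu(x, filters):
    x = Conv2D(filters, kernel_size=5, strides=2, padding="same")(x)
    return PReLU()(x)

def conv_elu(x, filters):
    x = Conv2D(filters, kernel_size=5, strides=2, padding="same")(x)
    return ELU(alpha=1.0)(x)
```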
-
Adjust learning rate when user changes batch size ( Current code: no LR scaling with batch size )
- lots of papers on this, but simply put: with a larger batch you can afford to explore larger step sizes and train more quickly with the same model stability
- Apply a linear scaling factor to the learning rate when the batch size is increased ( doubling the batch size doubles the learning rate; sketch after this list )
- https://www.quora.com/Intuitively-how-does-mini-batch-size-affect-the-performance-of-stochastic-gradient-descent
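A tiny sketch of the linear scaling rule; the current initial LR of 5e-5 comes from the section below, and the reference batch size of 64 is an assumption:

```python
# Linear scaling rule: learning rate grows in proportion to batch size.
BASE_LR = 5e-5          # current initial LR
BASE_BATCH_SIZE = 64    # assumed reference batch size

def scaled_learning_rate(batch_size):
    return BASE_LR * (batch_size / BASE_BATCH_SIZE)

# e.g. batch size 128 -> 1e-4, batch size 32 -> 2.5e-5
```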
-
Explore using other optimizers in autoencoder ( Current code: Adam )
- SGD with momentum
- Cyclical Learning Rate
- L4Adam
- YellowFin
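Sketch of what swapping the optimizer looks like at compile time, using SGD with Nesterov momentum as the simplest alternative; the LR, betas, and loss shown are placeholders, and L4Adam / YellowFin / cyclical LR would need third-party or custom implementations:

```python
# Swapping the optimizer at compile time; values are placeholders.
from keras.optimizers import Adam, SGD

current = Adam(lr=5e-5, beta_1=0.5, beta_2=0.999)        # roughly the existing setup
alternative = SGD(lr=1e-3, momentum=0.9, nesterov=True)  # SGD with Nesterov momentum

# autoencoder.compile(optimizer=alternative, loss="mean_absolute_error")
```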
-
Effective learning rate schedule ( Current code: no adjustment of LR after start or at stagnation )
- best practice is to lower the learning rate or raise the batch size, either at set intervals or after the loss has stagnated ( Note: for a constant mini-batch size of, say, 16 that is limited by GPU memory, you can still increase the effective batch size, e.g. by running on multiple GPUs or sequential GPU runs; see the callback sketch after this section )
- https://openreview.net/pdf?id=B1Yy1BxCZ
- also related: restarting the model at set intervals with a higher LR to kick it out of a local minimum
- https://arxiv.org/pdf/1608.03983.pdf
- Initial learning rate ( Current code: 5e-5 )
- Dependent on model architecture ( normalization, batch size, regularization, better rectifiers, and better optimizers all allow you to increase the LR with the same stability/accuracy )
- Suspect it is too low, but the current model has few of the tweaks which promote training stability
- Also highly dependent on default batch size
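A sketch of a stagnation-based schedule using built-in Keras callbacks; the patience, factor, and min_lr values are guesses, and warm restarts ( SGDR ) would need a custom callback:

```python
# Lower the LR when the loss stagnates; stop if it never recovers.
from keras.callbacks import ReduceLROnPlateau, EarlyStopping

callbacks = [
    ReduceLROnPlateau(monitor="loss", factor=0.5, patience=10, min_lr=1e-6, verbose=1),
    EarlyStopping(monitor="loss", patience=50, verbose=1),
]

# autoencoder.fit(warped_faces, target_faces, batch_size=16, epochs=1000, callbacks=callbacks)
```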
Use keras.preprocessing.image.ImageDataGenerator ( Current code: random_transform and other custom functions )
- more built-in transforms ( shear, skew, whitening, etc. ) to create warped images that are sent to the trainer ( sketch after this list )
- built-in normalization for warped batches
- integrated into the Keras model pipeline for queuing and parallel processing
- https://machinelearningmastery.com/image-augmentation-deep-learning-keras/
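Sketch of a drop-in ImageDataGenerator setup; the transform ranges are illustrative only, and the whitening/normalization options are shown switched off just to flag where they live:

```python
# Keras-native augmentation replacing the custom random_transform step.
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.05,
    height_shift_range=0.05,
    shear_range=0.1,
    zoom_range=0.05,
    horizontal_flip=True,
    zca_whitening=False,       # whitening is available but requires datagen.fit() first
    samplewise_center=False,   # built-in per-sample normalization hook
)

# for warped_batch in datagen.flow(face_images, batch_size=64):
#     ...feed warped_batch to the trainer...
```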
Threw in a lot, but can add more if anyone ever looks at this. PS - I was looking at 4/5/7/9 with a re-write of the Original_Model
Very nice info, thanks