Slide improvements
These slides are also being used as a reference for teaching the ML for Bioinformatics course. During class, a few points of improvement were found. This issue serves as a thread for collecting them.
In the first slide deck, the slide that shows the categories of ML has two typos that need to be fixed. First: Recommender system (written as "recommended system"). Second: Feature Elimination (written as "feature elicitation").
@FunnyPhantom Thank you for your suggestion. Please provide the exact name of the slide and page numbers.
@mahsayazdani the file that needs improvement resides here:
Slides/Chapter_02_Classical_Models/Introduction to ML/Figs/1.jpeg
There is another issue here: https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/3a595142161801b224e9fd06b1e447de7dfb0749/Slides/Chapter_02_Classical_Models/Loss/Loss.tex#L269
In the definition of this function, cosh should be replaced with Log-cosh.
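For reference, a minimal sketch of the Log-cosh loss as it is commonly defined (the exact symbols on the slide may differ):

```latex
% Log-cosh loss over n samples; y_i is the target, \hat{y}_i the prediction
L(y, \hat{y}) = \sum_{i=1}^{n} \log\!\big(\cosh(\hat{y}_i - y_i)\big)
```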
There is also another improvement that could be made. Ideally it would go after this slide: https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/a78a9025418769af3d4d99d4246b108cd504eff3/Slides/Chapter_03_Train_and_Evaluation/SVM_ML2022.tex#L554
There are multiple possible improvements in this slide deck: https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/main/Slides/Chapter_04_Tabular_Data_Models/Chapter%204%20(ML_Models_for_Tabular_Datasets).pdf
- The Majority Voting slide should be after soft clustering.
- In the Why Majority Voting slide, the ensemble error is missing a choose(n, k) binomial coefficient (see the sketch after this list).
- For the P(k) in the slide above, the condition k > [n/2] should be removed (but not for the ensemble error).
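For reference, a sketch of the ensemble error with the binomial coefficient included, assuming n independent base classifiers that each err with probability ε (symbols may differ from the slide):

```latex
% Majority voting fails when more than half of the n classifiers are wrong
P_{\text{ensemble}} = \sum_{k > n/2} \binom{n}{k}\, \varepsilon^{k} (1 - \varepsilon)^{\,n-k}
```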
At each node, the tree should use a random subset of the features (of size sqrt(n)); a sketch follows.
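A minimal sketch of that rule, assuming n_features total features and a subset of size sqrt(n_features) at every split (names are illustrative, not from the slides):

```python
import numpy as np

def candidate_features_for_split(n_features: int, rng: np.random.Generator) -> np.ndarray:
    """Pick the random subset of features a node is allowed to split on."""
    subset_size = max(1, int(np.sqrt(n_features)))
    return rng.choice(n_features, size=subset_size, replace=False)

rng = np.random.default_rng(0)
print(candidate_features_for_split(16, rng))  # e.g. 4 feature indices out of 16
```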

There is a sentence here that is not clear and does not convey the essence of the topic.
https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/be0f28ab925daae56ef2ae208b97add3cf00441f/Slides/Chapter_05_Deep_Neural_Networks/Introduction-to-NN.tex#L1244
"Sample size" should be removed from the following picture.
The fan-out notation is described here: https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/be0f28ab925daae56ef2ae208b97add3cf00441f/Slides/Chapter_05_Deep_Neural_Networks/Introduction-to-NN.tex#L1137
It should be added before this point: https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/be0f28ab925daae56ef2ae208b97add3cf00441f/Slides/Chapter_05_Deep_Neural_Networks/Introduction-to-NN.tex#L1223
Also, for the Xavier initialization slides, please state whether the weights are initialized with a normal distribution or a uniform distribution.
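For reference, the two common Xavier/Glorot variants, written in terms of the fan-in/fan-out of a layer (the slides should say which of the two they use):

```latex
% n_in = fan-in, n_out = fan-out of the layer
W \sim \mathcal{N}\!\left(0,\ \tfrac{2}{n_{\text{in}} + n_{\text{out}}}\right) \quad \text{(normal)}
\qquad
W \sim \mathcal{U}\!\left(-\sqrt{\tfrac{6}{n_{\text{in}} + n_{\text{out}}}},\ \sqrt{\tfrac{6}{n_{\text{in}} + n_{\text{out}}}}\right) \quad \text{(uniform)}
```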
In this figure, the "loss" should be changed to "training loss" https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/be0f28ab925daae56ef2ae208b97add3cf00441f/Slides/Chapter_05_Deep_Neural_Networks/Introduction-to-NN.tex#LL1627C13-L1627C13
In this figure, the "testing error" should be changed to "validation error" https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/be0f28ab925daae56ef2ae208b97add3cf00441f/Slides/Chapter_05_Deep_Neural_Networks/Introduction-to-NN.tex#L1743
Also, the early stopping slides should be placed before the L1/L2 regularization slides.
Also, the solution to dropout causing hyperactivation is missing: https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/be0f28ab925daae56ef2ae208b97add3cf00441f/Slides/Chapter_05_Deep_Neural_Networks/Introduction-to-NN.tex#L1823
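If "hyperactivation" refers to the mismatch in expected activation magnitude between training (some units dropped) and inference (all units kept), the usual fix is rescaling, e.g. inverted dropout. A minimal sketch with illustrative names, not taken from the slides:

```python
import numpy as np

def inverted_dropout(activations: np.ndarray, p_drop: float, rng: np.random.Generator) -> np.ndarray:
    """Drop units with probability p_drop and rescale survivors by 1/(1 - p_drop),
    so the expected activation is unchanged and no extra scaling is needed at test time."""
    keep_prob = 1.0 - p_drop
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
print(inverted_dropout(np.ones((2, 4)), p_drop=0.5, rng=rng))
```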
The wrong picture is being used for the result of CNN training; it is the same picture as the FCN one. https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/be0f28ab925daae56ef2ae208b97add3cf00441f/Slides/Chapter_06_Convolutional_Neural_Networks/CNN_Architecture/CNN-Architecture.tex#L593
In the same context as the comment above, the number of epochs for each network should be explicitly specified.
Moreover, the epoch numbering should start from 1, not 0 (since zero would mean gradient descent has not been run even once).
Pages 20-22 can be aggregated into one slide.
Moreover, providing a famous kernel with a well-known function (such as horizontal or vertical edge detection) could be more helpful for instruction.
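As an illustration of what such a kernel could look like (a sketch, not content taken from the slides): the Sobel kernel for vertical edges, applied with a plain 2D convolution.

```python
import numpy as np
from scipy.signal import convolve2d

# Sobel kernel that responds to vertical edges (left-to-right intensity changes)
sobel_vertical = np.array([[-1, 0, 1],
                           [-2, 0, 2],
                           [-1, 0, 1]])

# Toy image: dark left half, bright right half -> one vertical edge in the middle
image = np.hstack([np.zeros((5, 3)), np.ones((5, 3))])
edges = convolve2d(image, sobel_vertical, mode="valid")
print(edges)  # large magnitudes only near the middle columns
```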
On page 87, the number of dimensions is written as Channel * Width * Height. The common way to denote them is Width * Height * Channel; changing this would help reduce students' confusion.
Also, at the bottom of the same page, the number is written as n_kernels * channels * width * height. In addition to the change suggested above, it would be better to write it as n_kernels x (width * height * channels), since that makes clear the first number is the number of kernels and not a tensor dimension.
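For instance, with hypothetical numbers just to illustrate the grouping:

```latex
% 32 kernels of spatial size 3x3 applied to a 3-channel input
\underbrace{32}_{n_{\text{kernels}}} \times \underbrace{(3 \times 3 \times 3)}_{\text{width} \times \text{height} \times \text{channels}}
= 32 \times 27 = 864 \ \text{weights (biases excluded)}
```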
** VERY IMPORTANT CHANGE ** All the slides for the channel section in the CNN slides should be moved BEFORE the strides Section
This needs to change from E(yy_hat) to E(2yy_hat)
https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/dcadd7d80b422ff98389b059bcb1f72c6181cd12/Slides/Chapter_02_Classical_Models/Generalization%20Error/Generlaization_Error.tex#L283
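Presumably this is the cross term in the expansion of the squared error; a short sketch of where the factor 2 comes from:

```latex
\mathbb{E}\big[(y - \hat{y})^2\big]
= \mathbb{E}\big[y^2\big] - \mathbb{E}\big[2\, y\, \hat{y}\big] + \mathbb{E}\big[\hat{y}^2\big]
```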
Very important!
Please add the following images for the GAN slides to convey the concept better:
ref
ref
(Based on Dr. Sharifi's comments, it would be best to revise the GAN slides rigorously and let him review the results.) (Also, Dr. Sharifi searched for these terms in class: "CycleGan", "DiscoGan".)
On page 22/65 of the RNN slides, a simpler example would convey the meaning better, preferably with the same input and hidden dimensions and with a non-linear activation function.
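A minimal sketch of what such an example could look like (values and names are illustrative, not from the slides): one recurrent step where input and hidden state share the same dimension and tanh is the non-linearity.

```python
import numpy as np

d = 2  # same dimension for input and hidden state
W_xh = np.array([[0.5, 0.0], [0.0, 0.5]])  # input-to-hidden weights
W_hh = np.array([[0.1, 0.2], [0.2, 0.1]])  # hidden-to-hidden weights
b_h = np.zeros(d)

h = np.zeros(d)  # initial hidden state
xs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # a short input sequence

for t, x in enumerate(xs, start=1):
    h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # non-linear recurrent update
    print(f"h_{t} = {h}")
```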
- The notation in the RNN slides is not consistent; it should be made consistent. Ideally, everything should be defined based on the LSTM notation.
On page 37 of the RNN slides, there are two activation functions: one can only be tanh, but the other one can be either tanh or sigmoid. This should be explicitly specified.
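If page 37 shows the plain RNN cell, the distinction is presumably between the hidden-state update and the output activation; a hedged sketch of how it could be written out explicitly (symbols are assumptions, not copied from the slide):

```latex
h_t = \tanh\!\big(W_{xh} x_t + W_{hh} h_{t-1} + b_h\big) \quad \text{(hidden update: tanh only)}
\qquad
y_t = g\!\big(W_{hy} h_t + b_y\big),\ \ g \in \{\tanh, \sigma\} \quad \text{(output: tanh or sigmoid)}
```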
On page 38 of the RNN slides, the Gated Recurrent Unit should be given more architectural detail (compare with the previous slide, which shows the architecture of a simple RNN unit).
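For reference, the standard GRU update equations (a sketch; the notation should be aligned with whatever the slides already use):

```latex
z_t = \sigma\!\big(W_z x_t + U_z h_{t-1} + b_z\big)                      % update gate
r_t = \sigma\!\big(W_r x_t + U_r h_{t-1} + b_r\big)                      % reset gate
\tilde{h}_t = \tanh\!\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big)   % candidate state
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t                    % new hidden state
```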
In the introduction of the RNN slides, the limitations of the previous models should be specified first, to give more context about the problem RNNs solve.
The limitations of RNNs should be specified before moving on to GRUs, for the same reason explained above.
The limitations of GRUs should be specified before moving on to LSTMs, for the same reason.
Transformers should be introduced via the limitations of LSTMs: BPTT (has to be sequential), vanishing or exploding gradients, and long-range dependencies.