
Slide improvements

Open FunnyPhantom opened this issue 3 years ago • 30 comments

These slides are also being used as a reference for teaching the ML for Bioinformatics course. During the classes, several points of improvement were found; this issue serves as a thread for collecting them.

FunnyPhantom avatar Feb 05 '23 08:02 FunnyPhantom

The first slide, the one that shows the categories of ML, has two typos that need to be fixed. First: "Recommender system" (currently written as "recommended system"). Second: "Feature Elimination" (currently written as "feature elicitation").

FunnyPhantom avatar Feb 05 '23 08:02 FunnyPhantom

@FunnyPhantom Thank you for your suggestion. Please provide the exact name of the slide and page numbers.

mahsayazdani avatar Feb 06 '23 19:02 mahsayazdani

@mahsayazdani the file that needs improvement resides here: Slides/Chapter_02_Classical_Models/Introduction to ML/Figs/1.jpeg

FunnyPhantom avatar Feb 14 '23 11:02 FunnyPhantom

There is another issue here: https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/3a595142161801b224e9fd06b1e447de7dfb0749/Slides/Chapter_02_Classical_Models/Loss/Loss.tex#L269 In the definition of this function, cosh should be replaced with log(cosh).
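For reference, the usual definition of the Log-Cosh loss (assuming the slide follows the standard formulation) is:

```latex
L(y, \hat{y}) = \sum_{i=1}^{n} \log\left(\cosh\left(\hat{y}_i - y_i\right)\right)
```

so the outer log is what distinguishes it from a plain cosh.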

FunnyPhantom avatar Mar 12 '23 07:03 FunnyPhantom

There is also another improvement that can be made.

Ideally, it would go after this slide: https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/a78a9025418769af3d4d99d4246b108cd504eff3/Slides/Chapter_03_Train_and_Evaluation/SVM_ML2022.tex#L554

FunnyPhantom avatar Apr 04 '23 08:04 FunnyPhantom

In this slide deck, there are multiple points of improvement: https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/main/Slides/Chapter_04_Tabular_Data_Models/Chapter%204%20(ML_Models_for_Tabular_Datasets).pdf

  1. The Majority Voting slide should be after soft clustering.
  2. In the Why Majority Voting slide, the ensemble error is missing a choose(n, k) factor; see the corrected formula below.
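For reference, under the usual assumption of n independent classifiers that each err with probability ε, the ensemble error with the binomial coefficient restored would read (a sketch; the slide's symbols may differ):

```latex
P(\text{ensemble error}) = \sum_{k = \lfloor n/2 \rfloor + 1}^{n} \binom{n}{k}\, \varepsilon^{k} (1 - \varepsilon)^{n-k}
```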

FunnyPhantom avatar Apr 09 '23 07:04 FunnyPhantom

  1. For the P(k) in the slide above, the condition k > ⌊n/2⌋ should be removed (but not for the ensemble error); see the note below.
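That is, P(k) on its own is presumably just the binomial probability mass function, valid for every k:

```latex
P(k) = \binom{n}{k}\, \varepsilon^{k} (1 - \varepsilon)^{n-k}, \qquad k = 0, 1, \dots, n
```

The condition k > ⌊n/2⌋ belongs only to the ensemble-error sum.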

FunnyPhantom avatar Apr 09 '23 07:04 FunnyPhantom

The slide says: "On each node, I will make my tree use a random subset of features (sqrt(n))." [attached photo omitted]

This sentence is unclear and does not convey the essence of the topic; a sketch of the intended idea is below.
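A minimal sketch of the intended idea, assuming a NumPy-style implementation (names here are illustrative, not from the slides):

```python
import numpy as np

def candidate_features(n_features: int, rng: np.random.Generator) -> np.ndarray:
    """At each node, consider only a random subset of sqrt(n_features) features."""
    k = max(1, int(np.sqrt(n_features)))
    return rng.choice(n_features, size=k, replace=False)

rng = np.random.default_rng(0)
print(candidate_features(16, rng))  # e.g. 4 randomly chosen feature indices out of 16
```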

FunnyPhantom avatar Apr 09 '23 08:04 FunnyPhantom

"Sample size" should be removed in the following picture:

https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/be0f28ab925daae56ef2ae208b97add3cf00441f/Slides/Chapter_05_Deep_Neural_Networks/Introduction-to-NN.tex#L1244

FunnyPhantom avatar Apr 27 '23 06:04 FunnyPhantom

The fan-out notation is described here: https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/be0f28ab925daae56ef2ae208b97add3cf00441f/Slides/Chapter_05_Deep_Neural_Networks/Introduction-to-NN.tex#L1137

It should be introduced before this point: https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/be0f28ab925daae56ef2ae208b97add3cf00441f/Slides/Chapter_05_Deep_Neural_Networks/Introduction-to-NN.tex#L1223

FunnyPhantom avatar Apr 27 '23 06:04 FunnyPhantom

Also, for the Xavier initialization slides, please state whether the weights are initialized from a normal distribution or a uniform distribution.
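For reference, the two common Xavier/Glorot variants differ as follows (a NumPy sketch; the slides should state which one they use):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128

# Xavier/Glorot normal: zero mean, std = sqrt(2 / (fan_in + fan_out))
w_normal = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

# Xavier/Glorot uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))
limit = np.sqrt(6.0 / (fan_in + fan_out))
w_uniform = rng.uniform(-limit, limit, size=(fan_in, fan_out))
```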

FunnyPhantom avatar Apr 27 '23 06:04 FunnyPhantom

In this figure, the "loss" should be changed to "training loss" https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/be0f28ab925daae56ef2ae208b97add3cf00441f/Slides/Chapter_05_Deep_Neural_Networks/Introduction-to-NN.tex#LL1627C13-L1627C13

FunnyPhantom avatar May 02 '23 18:05 FunnyPhantom

In this figure, the "testing error" should be changed to "validation error" https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/be0f28ab925daae56ef2ae208b97add3cf00441f/Slides/Chapter_05_Deep_Neural_Networks/Introduction-to-NN.tex#L1743

FunnyPhantom avatar May 02 '23 18:05 FunnyPhantom

Also, the early stopping slides should be placed before the L1/L2 regularization slides.

FunnyPhantom avatar May 02 '23 18:05 FunnyPhantom

Also, the solution to dropout causing hyperactivation is missing: https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/be0f28ab925daae56ef2ae208b97add3cf00441f/Slides/Chapter_05_Deep_Neural_Networks/Introduction-to-NN.tex#L1823
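For context, the standard fix is inverted dropout: rescale the surviving activations by 1/(1 - p) during training so that the expected activation matches test time and no rescaling is needed at inference (a minimal NumPy sketch; names are illustrative):

```python
import numpy as np

def inverted_dropout(x: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Drop each unit with probability p, scale survivors by 1/(1-p)
    so that E[output] == x."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)
```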

FunnyPhantom avatar May 02 '23 18:05 FunnyPhantom

The wrong picture is being used for the result of CNN training; it is the same picture as the FCN one. https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/be0f28ab925daae56ef2ae208b97add3cf00441f/Slides/Chapter_06_Convolutional_Neural_Networks/CNN_Architecture/CNN-Architecture.tex#L593

FunnyPhantom avatar May 08 '23 15:05 FunnyPhantom

In the same context as the comment above, the number of epochs for each network should be explicitly specified.

Moreover, the epoch numbering should start from 1, not 0 (since zero would indicate that gradient descent has not been run even once).

FunnyPhantom avatar May 08 '23 15:05 FunnyPhantom

Pages 20-22 can be aggregated into one slide.

Moreover, presenting a famous kernel with a well-known function (such as horizontal or vertical edge detection) would be more helpful for instruction; see the example below.
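For instance, the Sobel kernels are a classic choice (shown here as plain NumPy arrays):

```python
import numpy as np

# Sobel kernels: strong response at vertical / horizontal intensity edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])   # detects vertical edges (horizontal gradient)
sobel_y = sobel_x.T                # detects horizontal edges (vertical gradient)
```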

FunnyPhantom avatar May 08 '23 15:05 FunnyPhantom

On page 87, the dimensions are written as Channel * Width * Height. The common way to denote them is Width * Height * Channel; changing this would help reduce students' confusion.

Also, at the bottom of the page, the count is written as n_kernels * channels * width * height. In addition to the change suggested above, it would be better to write it as n_kernels x (width * height * channels). This reduces confusion by making clear that the first number is the number of kernels, not a tensor dimension; a worked example is below.
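A worked example with hypothetical numbers: a layer with 64 kernels of spatial size 3 x 3 over a 32-channel input would then read

```latex
64 \times (3 \times 3 \times 32) = 64 \times 288 = 18{,}432 \text{ weights}
```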

FunnyPhantom avatar May 08 '23 15:05 FunnyPhantom

**Very important change:** all the slides in the channels section of the CNN slides should be moved BEFORE the strides section.

FunnyPhantom avatar May 08 '23 15:05 FunnyPhantom

This needs to change from E(yy_hat) to E(2yy_hat)

https://github.com/asharifiz/Introduction_to_Machine_Learning/blob/dcadd7d80b422ff98389b059bcb1f72c6181cd12/Slides/Chapter_02_Classical_Models/Generalization%20Error/Generlaization_Error.tex#L283
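Presumably the term comes from expanding the squared error, where the cross term carries the factor of 2:

```latex
\mathbb{E}\left[(y - \hat{y})^2\right] = \mathbb{E}\left[y^2\right] - \mathbb{E}\left[2 y \hat{y}\right] + \mathbb{E}\left[\hat{y}^2\right]
```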

FunnyPhantom avatar May 13 '23 16:05 FunnyPhantom

Very important!

Please add the following images for the GAN slides to convey the concept better: [attached images and references omitted]

(Based on Dr. Sharifi's comments, it would be best to revise the GAN slides rigorously and let him review the results.) (Also, Dr. Sharifi was searching for these terms in class: "CycleGAN", "DiscoGAN".)

FunnyPhantom avatar May 28 '23 06:05 FunnyPhantom

Page 22/65 of the RNN slides: a simpler example could convey the meaning better, preferably with the same dimensions and a non-linear activation function; a possible example is sketched below.
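Something along these lines could work (a hypothetical example with x_t and h_t both 2-dimensional and tanh as the non-linearity):

```python
import numpy as np

W_h = np.array([[0.5, -0.3],
                [0.1,  0.8]])    # hidden-to-hidden weights (2x2)
W_x = np.eye(2)                  # input-to-hidden weights (2x2, same dimension)
h = np.zeros(2)

for x in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    h = np.tanh(W_h @ h + W_x @ x)   # one recurrent step
    print(h)
```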

FunnyPhantom avatar May 28 '23 06:05 FunnyPhantom

  • The notation in the RNN slides is inconsistent and should be made consistent; ideally, everything should be defined using the LSTM notation.

FunnyPhantom avatar May 28 '23 11:05 FunnyPhantom

Page 37 of the RNN slides: there are two activation functions; one can only be tanh, while the other can be either tanh or sigmoid. This should be explicitly specified; see the note below.
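Assuming the slide follows the standard vanilla-RNN formulation, the distinction would be (a sketch; the slide's symbols may differ):

```latex
\begin{aligned}
h_t &= \tanh\left(W_{hh} h_{t-1} + W_{xh} x_t + b_h\right) && \text{(hidden state: tanh only)}\\
y_t &= \phi\left(W_{hy} h_t + b_y\right), \quad \phi \in \{\tanh, \sigma\} && \text{(output: tanh or sigmoid)}
\end{aligned}
```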

FunnyPhantom avatar May 28 '23 11:05 FunnyPhantom

Page 38 of the RNN slides: the gated recurrent unit should be presented with more architectural detail (compare with the previous slide, which shows the architecture of a simple RNN unit); the standard GRU equations are given below for reference.
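For reference, the standard GRU equations that a more detailed slide could include:

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate)}\\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)}\\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\right) && \text{(candidate state)}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```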

FunnyPhantom avatar May 28 '23 11:05 FunnyPhantom

In the introduction of the RNN slides, the limitations of the previous models should be stated first, to give more context about the problem RNNs solve.

FunnyPhantom avatar May 28 '23 11:05 FunnyPhantom

The RNN's limitations should be specified before moving on to the GRU, for the same reason explained above.

FunnyPhantom avatar May 28 '23 11:05 FunnyPhantom

The GRU's limitations should be specified before moving on to the LSTM, for the same reason as above.

FunnyPhantom avatar May 28 '23 11:05 FunnyPhantom

The Transformer should be introduced via the limitations of the LSTM: BPTT (must be processed sequentially), vanishing or exploding gradients, and long-range dependencies.

FunnyPhantom avatar Jun 06 '23 13:06 FunnyPhantom