DataScience

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Link to Notebooks:


Other Resources

Books

  • Understanding Machine Learning: From Theory to Algorithms. Shai Shalev-Shwartz and Shai Ben-David

Concepts in Machine Learning

  • Kernel Methods: http://kernel-methods.net
  • Support Vectors: http://support-vector.net
  • A collection of resources for SVMs: http://svms.org
  • Everything You Wanted to Know about the Kernel Trick: http://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html

Data Clustering

  • Vector Quantization: http://www.data-compression.com/vq.shtml
  • Mean-shift clustering: http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/TUZEL1/MeanShift.pdf
  • Affinity Propagation, Clustering By Passing Messages: http://www.cs.columbia.edu/~delbert/docs/DDueck-thesis_small.pdf

Text Analysis

  • Topic modeling with gensim: https://radimrehurek.com/gensim

Time-series data

  • Fourier Transform: http://www.thefouriertransform.com
  • Conditional Random Fields: https://pystruct.github.io/index.html

Hadoop (Java)

  • Customizing mapper and reducer: http://hadooptutorial.info/creating-custom-hadoop-writable-data-type/

Tools and Software

  • Building Hidden Markov Models in Python: http://hmmlearn.readthedocs.org/en/latest

Datasets

  • MNIST (handwritten digits) dataset: http://yann.lecun.com/exdb/mnist/
  • Movie Dataset: http://grouplens.org/datasets/movielens/
  • Million Song Dataset: http://labrosa.ee.columbia.edu/millionsong/
  • Adult dataset: http://archive.ics.uci.edu/ml/datasets/Adult
  • Star Cluster (Hertzsprung-Russell diagram data of star cluster CYG OB1): https://vincentarelbundock.github.io/Rdatasets/doc/robustbase/starsCYG.html
  • Speech Recognition: https://code.google.com/archive/p/hmm-speech-recognition/downloads