Data-Science-45min-Intros
Materials for our team teaching+learning sessions around CS, ML, stats, and related data science topics. Intended to take ~45 minutes, mostly in narrative IPython notebooks.
Data Science 45-min Intros
Every week*, our data science team @Gnip (aka @TwitterBoulder) gets together for about 50 minutes to learn something.
While these started as opportunities to collectively "raise the tide" on common stumbling blocks in data munging and analysis tasks, they have since grown to machine learning, statistics, and general programming topics. Anything that will help us do our jobs better is fair game.
For each session, someone puts together the lesson/walk-through and leads the discussion. Presentation platforms commonly include well-written READMEs, IPython notebooks, knitr documents, interactive code sessions... the more hands-on, the better.
Feel free to use these for your own (or your team's) growth, and do submit pull requests if you have something to add.
*ok, while we try to do it every week, sometimes it doesn't happen. In that case, we try to guilt trip the person who slacked.
Current topics
Python
- Object oriented programming concepts + modules/packaging
- Unit testing with unittest
- Iterators + Generators
- Introduction to pandas
- Introduction to Vertica with vertica_python
- Introduction to multiprocessing
- Python decorators
- Python Interfaces
- Python logging
Bash + command-line tools
- Using jq
- Bash data structures
- Regular expressions
Statistics
- Maximum Likelihood Estimation
- Count-Min algorithm
- A/B Testing
- Causal inference
- Error statistics
- Classical statistics applied to social data
- Meaningful comparisons of ordered lists
- Counting and Maximum Likelihood Estimation
- Estimating the number of classes in a population
- Long Tail Distributions I
- Long Tail Distributions II
- Maximum Likelihood Parameter Estimation
- Probability graph models
Machine Learning
- Intro to scikit-learn
- Introduction to K-means clustering
- Choosing k in k-means clustering
- Logistic Regression
- Naive Bayes Classifier
- Introduction to kNN
- Introduction to AdaBoost
- Decision Trees
- Basis expansions + kernels
- Model selection
- Introduction to SVM
- Text Mining with sklearn
- Bandit Algorithms
- Kernel smoothing
- Neural Networks I
- Neural Networks II
Natural Language Processing
- Intro to topic modeling
- More on topic modeling & a practical example
- Part of speech tagging
- Text processing
- Word vector spaces
Network structure
- Network statistics + igraph
- Network analysis: using null models
- Network analysis: community structures
- Network analysis: centrality metrics
Algorithms
- Count min sketch
Engineering
- Refactoring
Geographic Information Systems
- Shapefile utilities + reverse geocoding (and Makefile)
Web development
- Websockets
- Python + Flask basics
Visualization
- D3 and Javascript Intro
- D3 reusable charts: Heatmap
- Real Time Data - Websockets Intro
- Introduction to horizon charts
- Bokeh
- Matplotlib - Graphing for science in Python
Databases
- SQL 201 - script-based data and queries
- Vertica