ML_for_learner
Implementations of machine learning algorithms in Python and NumPy
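This project aims to implement a mini machine learning library in the style of scikit-learn using numpy. Each topic comes with a blog post explaining the theory, and some features also come with a notebook that analyzes the implementation details. The goal is to give people who are learning these algorithms a one-stop path from theory to implementation.
Because my knowledge is limited and I have no Python development experience, the library is currently still a very loose collection of code. If you find any mistake or bug in the blog, notebooks, or code, or even think something just reads awkwardly, feel free to contact me or open an issue on the project page. Thank you.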
QQ: 435248055 | WeChat: QQ435248055 | Blog
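Click an algorithm name to open the corresponding blog post on its theory. The notebook walks through implementing the algorithm step by step, and code is the modular source file.
Note: unless stated otherwise, every model accepts data as numpy.ndarray. Some also accept a List or a nested List. I make no guarantees for any other data format for now. Because Python type hints do not yet support numpy, the accepted types are not annotated in the code (thanks to WeChat user @Stream for pointing this out).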
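As a minimal sketch of the input convention described above, the snippet below shows one way a list or nested list could be coerced to `numpy.ndarray` before it reaches a model. The helper name `to_ndarray` is hypothetical and is not part of this library; it only relies on `np.asarray`, which returns an existing ndarray unchanged and converts list input.

```python
import numpy as np


def to_ndarray(data):
    """Hypothetical helper (not part of this repo): coerce input to numpy.ndarray."""
    # np.asarray leaves an existing ndarray untouched (no copy)
    # and builds a new array from a list or nested list.
    return np.asarray(data)


X_list = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # nested list input
X = to_ndarray(X_list)
print(type(X), X.shape)
```

Calling a helper like this at the top of `fit` or `predict` would let the same code path handle both input styles, assuming the nested list is rectangular.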
Supervised learning
| Class | Algorithm | Implementation | Code |
|---|---|---|---|
| Generalized Linear Models | Linear Regression | notebook | code |
| | Logistic regression | notebook | code |
| Nearest Neighbors | Nearest Neighbors Classification | notebook | code |
| Naive Bayes | Gaussian Naive Bayes | notebook | code |
| Support Vector Machine | SVC | notebook | code |
| Decision Trees | ID3 Classification | notebook | code |
| | ID3 Regression | notebook | code |
| | CART Classification | notebook | code |
| | CART Regression | notebook | code |
| Ensemble methods | Random Forests Classification | notebook | code |
| | Random Forests Regression | notebook | code |
| | AdaBoosting Classification | notebook | code |
Unsupervised learning
| Class | Algorithm | Implementation | Code |
|---|---|---|---|
| Gaussian mixture models | Gaussian Mixture | notebook | code |
| Clustering | K-means | notebook | code |
| | DBSCAN | notebook | code |
| Association Rules | Apriori | notebook | |
| Collaborative Filtering | User-based | notebook | |
| | Item-based | notebook | |
| | LFM | notebook | |
Model selection and evaluation
| Class | Approach | Code |
|---|---|---|
| Model Selection | Dataset Split | code |
| | K-Fold | code |
| | Stratified K-Fold | code |
| Metrics | Accuracy | code |
| | Log loss | code |
| | F1-score | code |
| | AUC | code |
| | Explained Variance | code |
| | Mean Absolute Error | code |
| | Mean Squared Error | code |
| | R Square | code |
| | Euclidean Distances | code |
Preprocessing data
| Class | Algorithm | Implementation | Code |
|---|---|---|---|
| Feature Scaling | StandardScaler | | code |
| | MinMaxScaler | | code |
| Unsupervised dimensionality reduction | PCA | notebook | code |
| | SVD | notebook | code |
| Supervised dimensionality reduction | Linear Discriminant Analysis | notebook | code |
| Text Feature | Count Feature | | code |
| | TF-IDF | | code |
Known Issues
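Code reuse across the library is low.
The random forest is not parallelized.
The LDA code is missing some functionality.
The K-Fold code uses np.append(), which is inefficient.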
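On the K-Fold item: `np.append()` allocates and copies a new array on every call, so growing an index array inside a loop gets slower as it grows. One possible alternative, sketched below, splits the indices once with `np.array_split` and builds each training set with a single `np.concatenate`. The function name `k_fold_indices` and its parameters are hypothetical and do not reflect the repo's actual K-Fold interface.

```python
import numpy as np


def k_fold_indices(n_samples, n_splits=5, shuffle=False, seed=None):
    """Yield (train_idx, test_idx) pairs. Hypothetical sketch, not the repo's API."""
    if n_splits < 2 or n_splits > n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")

    indices = np.arange(n_samples)
    if shuffle:
        # Shuffle once up front so the folds are random but still disjoint.
        rng = np.random.default_rng(seed)
        rng.shuffle(indices)

    # One split into n_splits nearly equal chunks (sizes differ by at most 1).
    folds = np.array_split(indices, n_splits)

    for i in range(n_splits):
        test_idx = folds[i]
        # A single concatenate per fold instead of repeated np.append calls.
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, test_idx


for train_idx, test_idx in k_fold_indices(6, n_splits=3):
    print("train:", train_idx, "test:", test_idx)
```

Each sample lands in exactly one test fold, and the number of array copies per fold stays constant no matter how large the dataset is.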