Note
#Machine learning enables a system to learn from data and improve its performance. Types:
1. Supervised learning: learns from labelled data, i.e. both the input and the output columns are used. Examples: Regression (numerical output) and Classification (categorical output such as yes/no).
2. Unsupervised learning: finds patterns in unlabelled data, i.e. only input columns are present. Example: Clustering, such as finding how many kinds of groups exist in the data.
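A minimal sketch of the difference, assuming a tiny made-up dataset (one input column, six rows). The supervised models receive both X and y, while the clustering model receives only X. The model choices (LinearRegression, LogisticRegression, KMeans) are illustrative.

```python
# Supervised vs unsupervised learning on tiny made-up data (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

# Input column X: one feature, six rows
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])

# Supervised, regression: numeric output column (here y = 2x + 1)
y_num = 2 * X.ravel() + 1
reg = LinearRegression().fit(X, y_num)        # learns from X AND y
print(reg.predict([[4.0]]))                   # close to 9.0

# Supervised, classification: yes/no output encoded as 0/1
y_cls = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y_cls)
print(clf.predict([[2.5], [11.5]]))

# Unsupervised, clustering: only X is given, no output column
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                             # group ids found from X alone
```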
#Supervised learning algorithms
Linear Regression – predicts continuous values.
Logistic Regression – binary classification.
Decision Trees
Random Forest, etc.
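A minimal sketch of the classifiers listed above, assuming scikit-learn's bundled breast-cancer dataset (a binary yes/no task) as stand-in data. All three models share the same fit/predict interface; the scaler in front of LogisticRegression is an added choice to help it converge, not something from these notes.

```python
# Train the listed supervised classifiers on a bundled binary dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # same interface for every model
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```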
#Unsupervised learning algorithms
K-Means Clustering
Hierarchical Clustering
Principal Component Analysis (PCA)
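A minimal sketch of the three unsupervised algorithms, assuming the bundled iris dataset with its labels discarded. The loop over k computes K-Means inertia (the "elbow" method), one common way to judge how many groups the data has. The choice of 3 clusters and 2 PCA components is an assumption for illustration.

```python
# K-Means, hierarchical clustering and PCA on unlabelled data.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)          # labels are ignored: unsupervised
X_scaled = StandardScaler().fit_transform(X)

# Elbow method: inertia (within-cluster sum of squares) for k = 1..6
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled).inertia_
            for k in range(1, 7)]
print(inertias)                             # look for the "bend" in this curve

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X_scaled)

# PCA: compress 4 input columns into 2 components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print(X_2d.shape)                           # (150, 2)
print(pca.explained_variance_ratio_)        # share of variance kept per component
```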
#Regression metrics
MSE (Mean Squared Error)
RMSE (Root Mean Squared Error)
R² Score
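A small worked example of the three regression metrics, assuming made-up true and predicted values. It computes each metric both by hand and with scikit-learn so the formulas are visible: MSE is the mean of squared errors, RMSE is its square root, and R² = 1 − SS_res / SS_tot.

```python
# Regression metrics by hand and with scikit-learn (made-up values).
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 10.0])

# By hand
mse_manual = np.mean((y_true - y_pred) ** 2)             # (0.25 + 0 + 0.25 + 1) / 4
ss_res = np.sum((y_true - y_pred) ** 2)                   # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)            # total sum of squares
r2_manual = 1 - ss_res / ss_tot

# With scikit-learn
mse = mean_squared_error(y_true, y_pred)                  # 0.375
rmse = np.sqrt(mse)                                       # same units as y
r2 = r2_score(y_true, y_pred)                             # 0.925
print(mse, rmse, r2)
```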
#Classification metrics
Accuracy
Precision
Recall
F1 Score
Confusion Matrix
ROC-AUC
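A small worked example of the classification metrics, assuming made-up labels and predicted probabilities for class 1. Hard predictions come from a 0.5 threshold (an assumed cut-off); ROC-AUC uses the probabilities directly. With 3 TP, 2 FP, 1 FN and 2 TN, precision is 3/5 and recall is 3/4.

```python
# Classification metrics on made-up binary predictions.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.55])  # P(class 1)
y_pred = (y_score >= 0.5).astype(int)                           # threshold at 0.5

cm = confusion_matrix(y_true, y_pred)       # rows = true class, cols = predicted
tn, fp, fn, tp = cm.ravel()

acc = accuracy_score(y_true, y_pred)        # (tp + tn) / total = 5/8
prec = precision_score(y_true, y_pred)      # tp / (tp + fp) = 3/5
rec = recall_score(y_true, y_pred)          # tp / (tp + fn) = 3/4
f1 = f1_score(y_true, y_pred)               # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_score)        # uses scores, not hard labels
print(cm, acc, prec, rec, f1, auc)
```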
To upload a data file to Google Colab, use the following code:
from google.colab import files
uploaded = files.upload()
Then choose the file from your system in the file picker that appears.
Now, for linear regression / logistic regression (the logistic regression code is almost the same as the linear regression code):
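`files.upload()` returns a dictionary mapping each uploaded file name to its bytes. A minimal sketch of turning that dictionary into a DataFrame; the helper name `uploaded_to_dataframe` is hypothetical, and a fake dictionary stands in for a real upload so the code also runs outside Colab.

```python
# Read the first uploaded CSV into a DataFrame (hypothetical helper).
import io
import pandas as pd

def uploaded_to_dataframe(uploaded):
    """Return a DataFrame from the first .csv entry of a {file_name: bytes} dict."""
    for name, content in uploaded.items():
        if name.lower().endswith(".csv"):
            return pd.read_csv(io.BytesIO(content))   # parse bytes as CSV
    raise ValueError("no .csv file was uploaded")

# In Colab: df = uploaded_to_dataframe(uploaded)
fake_uploaded = {"sample.csv": b"x,y\n1,3\n2,5\n"}   # stand-in for files.upload()
df = uploaded_to_dataframe(fake_uploaded)
print(df)
```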
TASK1. Importing the required packages
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, LogisticRegression  # LogisticRegression for the classification variant
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error, accuracy_score, confusion_matrix, classification_report
from sklearn.tree import DecisionTreeClassifier, plot_tree
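#Here sklearn (scikit-learn) is the library; linear_model, model_selection, metrics and tree are its submodules, which contain pre-written classes and functions (e.g. LinearRegression in linear_model, train_test_split in model_selection, r2_score in metrics).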
TASK2. Reading and exploring the data
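df = pd.read_csv('your_file_name.csv')   # replace with your actual file name; later steps use the name df
print(df.head())                         # first 5 rows
print(df.shape)                          # (number of rows, number of columns)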
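A few more exploration calls, sketched on a small hypothetical DataFrame (the columns `age`, `salary` and `purchased` are made up) standing in for the CSV file. `info()` shows data types and non-null counts, `describe()` summarises numeric columns, and `value_counts()` shows how balanced a categorical target is.

```python
# Exploring a DataFrame (hypothetical data in place of the real CSV).
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38],
    "salary": [30000, 42000, 61000, 58000, 45000],
    "purchased": ["no", "no", "yes", "yes", "no"],
})

print(df.head())                          # first rows
print(df.shape)                           # (5, 3)
df.info()                                 # dtypes and non-null counts per column
summary = df.describe()                   # count, mean, std, min, quartiles, max
print(summary)
counts = df["purchased"].value_counts()   # class balance of the target column
print(counts)
```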
TASK3. Check the following with the required code: null values; duplicate rows (and drop them); the data type of each column; outliers in the data (detect and remove them); and the necessary visualisations. Together these steps are called EDA (Exploratory Data Analysis).
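print(df.isnull().sum().sum())     # total number of null values
df.dropna(inplace=True)            # drop rows that contain nulls
print(df.duplicated().sum())       # number of duplicate rows
df.drop_duplicates(inplace=True)   # inplace=True returns None, so do not assign the result back to df
print(df.dtypes)                   # data type of each column
print(df.shape)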
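Outlier detection is mentioned above but not shown. A minimal sketch using the IQR rule, one common choice (not necessarily the one intended here): values outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR] are treated as outliers. The helper name `remove_outliers_iqr` and the `salary` data are hypothetical.

```python
# IQR-based outlier removal (hypothetical helper and data).
import pandas as pd

def remove_outliers_iqr(df, column, k=1.5):
    """Keep rows whose `column` value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = df[column].between(lower, upper)     # inclusive on both ends
    print(f"{column}: {(~mask).sum()} outlier(s) outside [{lower}, {upper}]")
    return df[mask]

df = pd.DataFrame({"salary": [30, 32, 35, 31, 33, 34, 250]})
clean = remove_outliers_iqr(df, "salary")       # Q1 = 31.5, Q3 = 34.5, bounds [27, 39]
print(clean)
```

A box plot of the column (e.g. with seaborn's `boxplot`) is a quick visual check of the same thing.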
TASK4. Steps in model building in the ML process
1. Create the x and y variables
2. Split the given dataset into training and testing data
3. Standardize/scale the data
4. Apply the algorithm to the data, also known as training the ML model
5. Check the performance of the model on the testing data
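The five steps can be sketched end to end as follows, assuming scikit-learn's bundled diabetes dataset (numeric target) in place of the uploaded CSV. The scaler is fitted on the training data only and then applied to the test data, so no test information leaks into training.

```python
# Model building steps 1-5 with linear regression (bundled dataset as stand-in).
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

df = load_diabetes(as_frame=True).frame      # features plus a "target" column

# 1. Create x (inputs) and y (output)
X = df.drop(columns=["target"])
y = df["target"]

# 2. Split into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Scale: fit on training data only, then transform both sets
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# 4. Train the model
model = LinearRegression()
model.fit(X_train_s, y_train)

# 5. Check performance on the testing data
y_pred = model.predict(X_test_s)
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("R2:", r2, "RMSE:", rmse)
```

For logistic regression the steps stay the same: use a categorical y, `LogisticRegression()` in step 4, and `accuracy_score`, `confusion_matrix` or `classification_report` in step 5.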
Label encoding converts every object (text) column to numeric codes, because the model can only be trained on numeric data:
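from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
for i in df.columns:
    if df[i].dtype == 'object':             # text (categorical) column
        df[i] = le.fit_transform(df[i])     # refit per column: categories -> 0, 1, 2, ...
df.info()                                   # all columns should now be numeric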
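A variant of the loop above that keeps one fitted encoder per column, so the numeric codes can be decoded later with `inverse_transform`. It is sketched on a small hypothetical DataFrame. It uses `pd.api.types.is_string_dtype` instead of `== 'object'`, an assumed change so that newer pandas versions, which may store text in a dedicated string dtype, are also handled.

```python
# Label encoding with one encoder per column (hypothetical data).
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Chennai"],
    "age": [25, 32, 47, 51],
    "purchased": ["no", "yes", "yes", "no"],
})

encoders = {}
for col in df.columns:
    if pd.api.types.is_string_dtype(df[col]):     # text column
        le = LabelEncoder()
        df[col] = le.fit_transform(df[col])      # classes sorted: Chennai=0, Delhi=1, Mumbai=2
        encoders[col] = le                        # keep the encoder to decode later

print(df)
print(encoders["city"].inverse_transform([0, 1, 2]))
```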