Mathematical Intuition and Implementation of Simple Linear Regression from Scratch
Day 1:
So let's get started. Today we will be discussing the mathematical intuition and implementation of Simple Linear Regression.
Simple linear regression is used to predict the value of one variable based on the value of another, under the assumption that the relationship between the data points is linear.
How it works, in simple terms:
We sketch a line that passes as close as possible to each data point, and we call this line the best fit line.
So you may be wondering: what is a best fit line?
A best fit line is the line that passes as close as possible to all the data points; using this line, we can predict the value of the dependent variable.
Geometric Intuition :
As we all know, the equation of a line is y = mx + b, where y is our dependent variable, x is our independent variable, m is the slope and b is the intercept.
In linear regression our main task is to find the values of m and b.
Let's talk about the slope and the intercept in a more intuitive way:
Slope: it acts like a weight on x; if it increases or decreases, it has a large impact on y.
For example, in the given figure let x be the CGPA and y be the package, and we want to predict what package we will get for a given CGPA.
Assumption 1:
Let's increase the value of the slope (m) in the figure above. Now what will happen?
Answer: the predicted package will change drastically, by a big margin, even when we change the CGPA by a very small amount, because the slope scales every unit of x.
Assumption 2:
Let's increase the value of the intercept (b) in the figure above. Now what will happen?
Answer: the steepness of the line does not change at all; the whole line simply shifts upward, because b acts like an offset that raises every prediction by the same amount.
Now, what is an offset? Let me explain:
As we know, y = mx + b. If we set b = 0, the equation becomes y = mx, and when x = 0 this makes y = 0 as well, which would mean a package of 0, which is not realistic. That is why b is treated as an offset (a baseline value): even when mx is 0, y equals b rather than 0.
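The effect of m and b can be seen in a tiny sketch (the CGPA/package numbers below are hypothetical, chosen only for illustration):

```python
def predict(x, m, b):
    """Predict y for input x using the line y = m*x + b."""
    return m * x + b

# Baseline line (made-up numbers): package = 0.5 * cgpa + 1
print(predict(8.0, m=0.5, b=1.0))  # 5.0
# Doubling the slope changes the prediction a lot
print(predict(8.0, m=1.0, b=1.0))  # 9.0
# Raising the intercept shifts every prediction up by the same amount
print(predict(8.0, m=0.5, b=2.0))  # 6.0
# The offset b is what remains when x = 0
print(predict(0.0, m=0.5, b=2.0))  # 2.0
```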
Mathematical intuition
How to find M and B ?
So before finding m and b, let's discuss the loss function:
A loss function is a measurement of how good your model is at predicting the expected outcome.
In the figure above, the green points are the data points and the line running through them is the best fit line; the vertical distance between a green point and the best fit line is known as the error.
For a single point we can write this as (y - ŷ), where y is the actual value and ŷ is the predicted value.
If we combine all the data points, we can write the loss as
E = (1/n) * Σ (y_i - ŷ_i)²
Here we are taking the mean squared error; if you want the total (summed) error instead, you can remove the n from the equation.
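The mean squared error above can be sketched in a few lines of Python (the numbers are made up for illustration):

```python
def mse(y_true, y_pred):
    """Mean squared error: average of (y - y_hat)^2 over all points."""
    n = len(y_true)
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n

# Errors are -0.5, 0.0 and 1.0, so MSE = (0.25 + 0 + 1) / 3
print(mse([3.0, 5.0, 7.0], [3.5, 5.0, 6.0]))
```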
Let's derive the formulae for m and b.
If you have any doubts while solving the above equations, please message me in the comments section!
So now we have both the formulae for calculating m and b!
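For reference, minimizing the mean squared error with calculus gives the standard closed-form solutions (these are the same formulae used in the code below):

```latex
% Setting dE/db = 0 gives b = \bar{y} - m\bar{x};
% substituting into dE/dm = 0 gives the slope:
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\,\bar{x}
```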
Implement a Scikit-Learn-style Linear Regression class of your own
Here we will build a linear regression class of our own and use it to make predictions:
```python
class MeraLR:
    def __init__(self):
        self.m = None
        self.b = None

    def fit(self, X_train, y_train):
        # Closed-form solution:
        # m = sum((x_i - x_mean)(y_i - y_mean)) / sum((x_i - x_mean)^2)
        num = 0
        den = 0
        for i in range(X_train.shape[0]):
            num = num + ((X_train[i] - X_train.mean()) * (y_train[i] - y_train.mean()))
            den = den + ((X_train[i] - X_train.mean()) * (X_train[i] - X_train.mean()))
        self.m = num / den
        # Intercept: b = y_mean - m * x_mean
        self.b = y_train.mean() - (self.m * X_train.mean())
        print(self.m)
        print(self.b)

    def predict(self, X_test):
        return self.m * X_test + self.b
```

Here we go through each row of the dataset and calculate the values of m and b using the formulae that we derived above.
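A quick usage sketch with toy, noise-free data (the CGPA/package numbers are hypothetical); since the points lie exactly on a line, the class should recover the slope and intercept exactly. The class is repeated here in vectorized form so the snippet runs standalone:

```python
import numpy as np

class MeraLR:  # same logic as the class above, written with NumPy array operations
    def __init__(self):
        self.m = None
        self.b = None

    def fit(self, X_train, y_train):
        x_mean, y_mean = X_train.mean(), y_train.mean()
        num = ((X_train - x_mean) * (y_train - y_mean)).sum()
        den = ((X_train - x_mean) ** 2).sum()
        self.m = num / den
        self.b = y_mean - self.m * x_mean

    def predict(self, X_test):
        return self.m * X_test + self.b

# Toy data generated from package = 0.5 * cgpa + 1 (made-up numbers)
X_train = np.array([6.0, 7.0, 8.0, 9.0])
y_train = 0.5 * X_train + 1.0

model = MeraLR()
model.fit(X_train, y_train)
print(model.m, model.b)                 # recovers 0.5 and 1.0
print(model.predict(np.array([10.0])))  # [6.0]
```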
Thank You !!