Linear regression is a supervised machine learning algorithm that performs a regression task: it models a target value as a function of one or more independent variables. It is mostly used to quantify the relationship between variables and to forecast. Regression models differ in the kind of relationship they assume between the dependent and independent variables, and in the number of independent variables they use.
Linear regression predicts the value of a dependent variable (y) from a given independent variable (x). It does so by finding a linear relationship between x (input) and y (output), hence the name linear regression.
In the figure above, X (input) is the work experience and Y (output) is the salary of a person. The regression line is the best fit line for our model.
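To make the idea of a "best fit line" concrete, here is a minimal sketch of the ordinary least squares calculation for a single input variable, written with NumPy. The data points are hypothetical toy values chosen to lie exactly on a line; they are not taken from Salary_Data.csv.

```python
import numpy as np

# Hypothetical toy data (NOT the Salary_Data.csv values):
# years of experience and the matching salary.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30000.0, 35000.0, 40000.0, 45000.0, 50000.0])

# Ordinary least squares for one feature:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print(f"y = {intercept:.1f} + {slope:.1f} * x")  # y = 25000.0 + 5000.0 * x
```

For a single input variable, fitting a line by least squares amounts to this calculation, so the intercept and slope found here are the same quantities that the fitted model reports later.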
Import the required libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
Read the CSV file using pandas
data = pd.read_csv('F:/Machine_learn/machine_learning/Salary_Data.csv')
View the first 5 rows using head()
data.head()
| | YearsExperience | Salary |
|---|---|---|
| 0 | 1.1 | 39343.0 |
| 1 | 1.3 | 46205.0 |
| 2 | 1.5 | 37731.0 |
| 3 | 2.0 | 43525.0 |
| 4 | 2.2 | 39891.0 |
Plot the data using matplotlib
data.plot(x='YearsExperience', y='Salary')
plt.title('Exp vs Salary')
plt.xlabel('YearsExperience')
plt.ylabel('Salary')
plt.show()
Create x (the input feature) and y (the target)
x = data['YearsExperience'].values.reshape(-1,1)
y = data['Salary'].values
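The `reshape(-1, 1)` call is there because scikit-learn estimators expect the inputs as a 2D array of shape (n_samples, n_features), while a single pandas column converts to a 1D array. A small sketch of the shape change, using the five YearsExperience values shown by `data.head()`:

```python
import numpy as np

# The five YearsExperience values shown by data.head()
years = np.array([1.1, 1.3, 1.5, 2.0, 2.2])

print(years.shape)  # (5,)   one-dimensional: 5 values

# -1 lets NumPy infer the number of rows; 1 is the number of feature columns
x = years.reshape(-1, 1)
print(x.shape)      # (5, 1) two-dimensional: 5 samples, 1 feature
```

The target `y` can stay one-dimensional, which is why only `x` is reshaped.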
Split the data into training and test sets using the train_test_split method
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
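As a rough illustration of what `test_size=0.2` and `random_state=42` do, the sketch below splits a hypothetical stand-in dataset. The 30 samples are an assumption, picked so that 20% gives six test rows; the arrays are made up and are not the salary data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 30 samples with one feature (an assumption,
# not the real Salary_Data.csv contents).
x = np.arange(30, dtype=float).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0

# test_size=0.2 holds out 20% of the rows; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (24, 1) (6, 1)

# Repeating the call with the same random_state gives the same split
X_train2, X_test2, _, _ = train_test_split(x, y, test_size=0.2, random_state=42)
print(np.array_equal(X_test, X_test2))  # True
```

Fixing `random_state` makes the split reproducible, so the fitted coefficients and predictions do not change from run to run.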
Train the algorithm
Create the regressor object and fit it to the training data
regressor = LinearRegression()
regressor.fit(X_train,y_train)
# Intercept of the fitted line
regressor.intercept_
# Slope of the fitted line (coefficient of YearsExperience)
regressor.coef_
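The intercept and slope fully describe the fitted line: a prediction is just `intercept_ + coef_[0] * x`. The sketch below checks this against `predict()` on hypothetical toy data (the values are made up for illustration and `model` is a stand-in name, not the `regressor` trained above).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data, roughly linear (NOT the Salary_Data.csv values)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([31000.0, 34000.0, 41000.0, 44000.0, 50000.0])

model = LinearRegression().fit(X, y)

# Rebuild the predictions by hand from the fitted line y = intercept + slope * x
manual = model.intercept_ + model.coef_[0] * X.ravel()

print(np.allclose(manual, model.predict(X)))  # True
```

Note that `coef_` is an array with one entry per feature, so with a single feature the slope is `coef_[0]`.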
Predict with our model
# Predict salaries for the test set
y_pred = regressor.predict(X_test)
y_pred
# Compare the actual values with the predicted values
df = pd.DataFrame({'Actual':y_test.flatten(), 'predicted':y_pred.flatten()})
print(df)
| | Actual | predicted |
|---|---|---|
| 0 | 112635.0 | 115790.210113 |
| 1 | 67938.0 | 71498.278095 |
| 2 | 113812.0 | 102596.868661 |
| 3 | 83088.0 | 75267.804224 |
| 4 | 64445.0 | 55477.792045 |
| 5 | 57189.0 | 60189.699707 |
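One way to summarize how close the predictions are to the actual values is to compute standard error metrics. The sketch below is an optional follow-up, not part of the original walkthrough: it copies the six actual and predicted values from the table above and uses the metric functions from `sklearn.metrics`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Actual and predicted salaries copied from the comparison table above
actual = np.array([112635.0, 67938.0, 113812.0, 83088.0, 64445.0, 57189.0])
predicted = np.array([115790.210113, 71498.278095, 102596.868661,
                      75267.804224, 55477.792045, 60189.699707])

mae = mean_absolute_error(actual, predicted)           # average absolute error
rmse = np.sqrt(mean_squared_error(actual, predicted))  # root mean squared error
r2 = r2_score(actual, predicted)                       # share of variance explained

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R^2:  {r2:.3f}")
```

MAE and RMSE are in the same units as the salary, while R^2 is unitless and closer to 1 indicates a better fit. With only six test rows, these numbers should be read as a rough indication rather than a precise estimate.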