Python Linear Regression Tutorial machine learning in hindi

Linear Regression Tutorial for Machine Learning in Python

# packages we will be using
import matplotlib.pyplot as plt
from sklearn import linear_model, metrics, model_selection
import numpy as np
import pandas as pd

What is Linear Regression?
Linear regression means finding the straight line of best fit through the data. It works well when the true underlying function is linear.

Example:

We use features x to predict a "response" y. For example, we might want to regress exam_score on num_hours_studied. In other words, we predict the exam score from the number of hours studied.

Let's generate some example data for this case and examine the relationship between x and y.

num_hours_studied = np.array([1, 3, 3, 4, 5, 6, 7, 7, 8, 8, 10])
exam_score = np.array([18, 26, 31, 40, 55, 62, 71, 70, 75, 85, 97])
plt.scatter(num_hours_studied, exam_score)
plt.xlabel('num_hours_studied')
plt.ylabel('exam_score')
plt.show()

We can see that the points lie almost on a straight line. With such a high linear correlation, we suspect that linear regression will be a successful technique for this task.

We will now build a linear model to fit this data.
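To put a number on how close to a straight line the scatter plot is, one option is the Pearson correlation coefficient. The sketch below uses np.corrcoef on the same toy arrays; the variable name r is just a label chosen for this illustration.

```python
import numpy as np

# Same toy data as in the scatter plot above.
num_hours_studied = np.array([1, 3, 3, 4, 5, 6, 7, 7, 8, 8, 10])
exam_score = np.array([18, 26, 31, 40, 55, 62, 71, 70, 75, 85, 97])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between the two arrays.
r = np.corrcoef(num_hours_studied, exam_score)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A value close to 1 indicates a strong positive linear relationship, which is what the plot suggests.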

Linear Model

Hypothesis:

A linear model makes a "hypothesis" about the true nature of the underlying function: that it is linear. In the univariate case we express this hypothesis as

$$h_\theta(x) = ax + b$$

Our simple example above was an instance of "univariate regression", that is, regression with just one variable (or "feature"): the number of hours studied. Below we will have more than one feature ("multivariate regression"), for which the hypothesis is

$$h_\theta(x) = a^\top X$$

Here a is the vector of learned parameters, and X is the "design matrix" containing all the data points. In this formulation the intercept term has been added to the design matrix as its first column (a column of ones).

Design Matrix:

In general, with n data points and p features, our design matrix will have n rows and p columns.

Returning to our exam score regression example, let's add one more feature: the number of hours slept the night before the exam. If we have 4 data points and 2 features, then our matrix will have shape 4 × 3 (remember that we add a bias column of ones). It might look like

$$\begin{bmatrix} 1 & 1 & 8 \\ 1 & 5 & 6 \\ 1 & 7 & 6 \\ 1 & 8 & 4 \end{bmatrix}$$
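As a rough illustration, the example matrix can be assembled with NumPy by stacking a column of ones next to the two feature columns. The name hours_slept and the parameter vector a below are hypothetical values picked for this sketch, not learned from data. Because the data points are stored as rows here, the predictions for all points are computed as X @ a.

```python
import numpy as np

# Feature columns of the example matrix (four data points).
hours_studied = np.array([1, 5, 7, 8])
hours_slept = np.array([8, 6, 6, 4])  # hypothetical name for the second feature

# Prepend a column of ones so the intercept is learned like any other parameter.
X = np.column_stack([np.ones(len(hours_studied)), hours_studied, hours_slept])
print(X.shape)

# Hypothetical parameters: intercept, weight for hours studied, weight for hours slept.
a = np.array([4.0, 9.0, 1.0])

# With data points stored as rows, one matrix-vector product gives every prediction.
predictions = X @ a
print(predictions)
```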

Univariate Example:

Let's now see what our univariate example looks like

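# Fit the model
# Note: the normalize argument was removed from LinearRegression in
# scikit-learn 1.2; the fitted line is the same without it.
exam_model = linear_model.LinearRegression()
x = np.expand_dims(num_hours_studied, 1)  # shape (11, 1): one column per feature
y = exam_score
exam_model.fit(x, y)
a = exam_model.coef_       # slope, one entry per feature
b = exam_model.intercept_  # intercept
print(a)
print(b)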

[9.40225564]
4.278195488721792

# Visualize the results
plt.scatter(num_hours_studied, exam_score)
x = np.linspace(0, 10)
y = a*x + b
plt.plot(x, y, 'r')
plt.xlabel('num_hours_studied')
plt.ylabel('exam_score')
plt.show()

Judging by eye, the line fits very well, as it should, because the true function is linear and the data contains only a little noise.
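The coefficients printed earlier can be cross-checked without scikit-learn. The sketch below builds a two-column design matrix (ones plus the feature) and solves the least-squares problem with np.linalg.lstsq. The names params, intercept and slope are chosen for this illustration only.

```python
import numpy as np

num_hours_studied = np.array([1, 3, 3, 4, 5, 6, 7, 7, 8, 8, 10])
exam_score = np.array([18, 26, 31, 40, 55, 62, 71, 70, 75, 85, 97])

# Design matrix: a column of ones (intercept) next to the single feature.
X = np.column_stack([np.ones(len(num_hours_studied)), num_hours_studied])

# np.linalg.lstsq minimizes the sum of squared residuals ||X @ params - y||^2.
params, *_ = np.linalg.lstsq(X, exam_score, rcond=None)
intercept, slope = params
print(slope, intercept)
```

The slope and intercept should agree with the values reported by exam_model up to floating-point rounding.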

What is a Good Fit?

To measure goodness of fit in a linear problem, we typically use the "mean squared error":

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y^{(i)} - h_\theta^{(i)} \right)^2$$

You can see that this measures how far each of our real data points is from the corresponding predicted point, which makes good sense.

This function is then taken to be our "loss" function - a measure of how badly we are doing. In general we want to minimize this.
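As a small sketch of the formula, the mean squared error can be computed by hand with NumPy and compared against sklearn.metrics.mean_squared_error. The coefficients a and b below are the rounded values printed in the univariate example; mse_manual and mse_sklearn are names chosen for this illustration.

```python
import numpy as np
from sklearn import metrics

# Toy data and the fitted line from the univariate example (rounded coefficients).
num_hours_studied = np.array([1, 3, 3, 4, 5, 6, 7, 7, 8, 8, 10])
exam_score = np.array([18, 26, 31, 40, 55, 62, 71, 70, 75, 85, 97])
a, b = 9.40225564, 4.27819549

predictions = a * num_hours_studied + b

# Mean of the squared differences between actual and predicted values.
mse_manual = np.mean((exam_score - predictions) ** 2)
mse_sklearn = metrics.mean_squared_error(exam_score, predictions)
print(mse_manual, mse_sklearn)
```

Both numbers should match, since the library function implements the same average of squared residuals.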

Practice 1

In this first practice our goals are:

Load the data
Visualize the relationships in the data
Prepare the data for the learning algorithm
Fit the data

Load the Data :

The data comes in .csv format. We will use the pandas library (which we imported as pd) to load and manage the data. It is a very convenient library for handling data and running machine learning experiments.

Your first task is to use the pd.read_csv() function to load the data.

Pay particular attention to sep and index_col (these are the only two arguments you need). The data is separated by the tab character '\t', and the index column is the first one (remember that Python uses zero-based indexing). Take a look at the raw data (just open the file in a text editor) to see what you are dealing with and to get a feel for this important step.

Make sure you load the data into a variable named data. The following cell will help confirm that you have completed this first step successfully.

If you want to inspect what you have loaded, put data.head() as the last line of the cell to debug.
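To see what sep and index_col do before touching the real file, here is a minimal sketch that parses a tiny tab-separated string held in memory. The column names feature_a and feature_b and all the values are hypothetical stand-ins, not part of the prostate data.

```python
import io
import pandas as pd

# Hypothetical three-row, tab-separated sample; the first column holds the row index.
raw_text = "\tfeature_a\tfeature_b\n1\t0.5\t10\n2\t1.5\t20\n3\t2.5\t30\n"

# sep='\t' splits fields on tabs; index_col=0 uses the first column as the index.
sample = pd.read_csv(io.StringIO(raw_text), sep='\t', index_col=0)
print(sample.head())
```

Without index_col=0, the first column would be loaded as an ordinary data column and pandas would add its own 0-based index.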

"""
Task: Load the data with pandas
"""
file_name = '../input/prostate.csv'
data = pd.read_csv(file_name, sep='\t', index_col=0)

assert len(data.columns) == 10
assert len(data) == 97
for column_name in ['lcavol', 'lweight', 'age', 'lbph', 'svi', 
                    'lcp', 'gleason', 'pgg45', 'lpsa', 'train']:
    assert column_name in data.columns
print('Success!')

Output: Success!

Let's now take a look at the data.

data.head()

     lcavol   lweight  age      lbph  svi       lcp  gleason  pgg45      lpsa train
1 -0.579818  2.769459   50 -1.386294    0 -1.386294        6      0 -0.430783     T
2 -0.994252  3.319626   58 -1.386294    0 -1.386294        6      0 -0.162519     T
3 -0.510826  2.691243   74 -1.386294    0 -1.386294        7     20 -0.162519     T
4 -1.203973  3.282789   58 -1.386294    0 -1.386294        6      0 -0.162519     T
5  0.751416  3.432373   62 -1.386294    0 -1.386294        6      0  0.371564     T

We can see that we have eight feature columns plus the response, lpsa. There is one more column, called train, which is a boolean flag (stored as 'T' or 'F'). It tells us which data points belong to our training set and which are held out for testing.

Visualize the Data:

To assess whether linear regression is suitable for each of the features, we should always try to plot the relationships in the data.

# function to help us plot
def scatter(_data, x_name):
    plt.scatter(_data[x_name], _data['lpsa'])
    plt.xlabel(x_name)
    plt.ylabel('lpsa')
    plt.show()

scatter(data, 'pgg45')

There is a fairly clear linear relationship there. It is not as clean as the toy example at the start of this tutorial, but this is more realistic data, so we should not expect such easy relationships or such clean data.

Prepare the Data

When we loaded the data, we saw that there was an extra column indicating whether a data point should be used to train the model parameters or held out for testing.

We now need to separate our data into train and test sets. This is your next task.

You will need to use the pandas selection syntax which for an "equals" relation is:

new_data = data[data['column_name'] == desired_value]

You must create two new variables, train and test, containing the correct data points. You also need to remove the train column from each of them, which can be done with drop:

new_data = data.drop(['unwanted_column'], axis=1)

The following cell will validate your work.

"""
Task: Split the data into train and test
"""
train = data[data['train'] == 'T'].drop(['train'], axis=1)
test = data[data['train'] == 'F'].drop(['train'], axis=1)

assert len(train) == 67
assert len(test) == 30
assert 'train' not in train.columns
assert 'train' not in test.columns
assert len(train.columns) == 9
assert len(test.columns) == 9
print('Success!')
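The selection and drop syntax used above can be tried on a small frame first. The sketch below uses a hypothetical four-row DataFrame named toy with made-up values; only the pattern (boolean mask, then drop with axis=1) mirrors the task.

```python
import pandas as pd

# Hypothetical miniature frame with the same kind of 'train' flag column.
toy = pd.DataFrame({
    'feature': [0.1, 0.2, 0.3, 0.4],
    'lpsa': [1.0, 1.5, 2.0, 2.5],
    'train': ['T', 'F', 'T', 'T'],
})

# Boolean mask: True where the flag equals 'T'.
mask = toy['train'] == 'T'

# Keep the matching rows, then drop the flag column (axis=1 means columns).
toy_train = toy[mask].drop(['train'], axis=1)
toy_test = toy[~mask].drop(['train'], axis=1)
print(toy_train)
print(toy_test)
```

Note that drop returns a new DataFrame and leaves the original untouched, so the result has to be assigned to a variable.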

You now have the tools you need to complete the next task. The variables you need to create are x_train, y_train, x_test and y_test:

x_train = train.loc[:, train.columns != 'lpsa']
y_train = train['lpsa']
x_test = test.loc[:, test.columns != 'lpsa']
y_test = test['lpsa']

assert len(x_train.columns) == 8
assert len(x_test.columns) == 8
assert len(y_train) == 67
assert len(y_test) == 30
print('Success!')

Fit the Data:

As far as coding is concerned, fitting the model is actually the easy part. Preparing the data is usually the most time-consuming step.

You need to create a new instance of sklearn.linear_model.LinearRegression and use the fit() function to fit it to x_train using y_train.

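Note that earlier scikit-learn releases accepted normalize=True here. That parameter was deprecated in version 1.0 and removed in version 1.2. For ordinary least squares the predictions are the same without it, so we use the default settings, which fit an intercept.

model = linear_model.LinearRegression()
model.fit(x_train, y_train)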
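Once a multivariate model is fitted, it can help to pair each learned coefficient with its column name. The sketch below does this on synthetic data, because the prostate file is not bundled here: x_demo, y_demo, the feature names and the true coefficients (2.0, -1.0, intercept 0.5) are all assumptions made for the illustration.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Synthetic stand-in for x_train / y_train: two hypothetical features and a
# response built from known coefficients plus a little noise.
rng = np.random.default_rng(0)
x_demo = pd.DataFrame({
    'feature_1': rng.normal(size=200),
    'feature_2': rng.normal(size=200),
})
noise = rng.normal(scale=0.1, size=200)
y_demo = 2.0 * x_demo['feature_1'] - 1.0 * x_demo['feature_2'] + 0.5 + noise

demo_model = linear_model.LinearRegression()
demo_model.fit(x_demo, y_demo)

# Pair each learned coefficient with its column name for readability.
for name, coef in zip(x_demo.columns, demo_model.coef_):
    print(f"{name}: {coef:.3f}")
print(f"intercept: {demo_model.intercept_:.3f}")
```

The same loop works with x_train.columns and model.coef_ from the practice task.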

Assess the Goodness of Fit:

Let's now assess the performance of this fit using mean squared error. sklearn makes this easy to calculate by providing the sklearn.metrics.mean_squared_error function.

train_pred = model.predict(x_train)
mse_train = metrics.mean_squared_error(y_train, train_pred)
print(mse_train)

If we take the square root of the MSE, we get a value in the same units as our response, y. We can therefore judge the accuracy of the fit relative to the scale of the quantity we are predicting.

np.sqrt(mse_train)
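The same calculation can be applied to the held-out points to see how the model does on data it was not fitted to. The following is only a sketch on synthetic data with the same 67/30 split sizes: the arrays, the true coefficients and the names x_tr, y_tr, x_te, y_te are hypothetical and do not come from the prostate data set.

```python
import numpy as np
from sklearn import linear_model, metrics

# Synthetic stand-in for the prepared splits (hypothetical data).
rng = np.random.default_rng(42)
features = rng.normal(size=(97, 3))
true_coefs = np.array([1.0, 0.5, -0.3])
response = features @ true_coefs + rng.normal(scale=0.5, size=97)
x_tr, y_tr = features[:67], response[:67]
x_te, y_te = features[67:], response[67:]

demo_model = linear_model.LinearRegression().fit(x_tr, y_tr)

# RMSE is the square root of the MSE, so it is in the units of the response.
rmse_train = np.sqrt(metrics.mean_squared_error(y_tr, demo_model.predict(x_tr)))
rmse_test = np.sqrt(metrics.mean_squared_error(y_te, demo_model.predict(x_te)))
print(f"train RMSE: {rmse_train:.3f}, test RMSE: {rmse_test:.3f}")
```

With the tutorial's own variables, the equivalent call would use model.predict(x_test) and y_test.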

Thank You .....
