[Machine Learning] Regression Analysis (Regression)_1

구루싸 2019. 10. 20. 00:47

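This post covers regression analysis (Regression), one family of machine learning (Machine Learning) algorithms.

Specifically, we look at simple linear regression (Simple Linear Regression), which models the statistical relationship between two variables: one independent variable and one dependent variable.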
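
As a minimal sketch of what simple linear regression computes, the slope a and intercept b of the line y = a*x + b can be estimated by least squares. The data below is made up for illustration and is not the dataset used later in this post.

```python
import numpy as np

# Made-up sample data (illustration only): y is roughly 2*x + 1 with small noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Least-squares estimates for the line y = a*x + b
x_mean, y_mean = x.mean(), y.mean()
a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - a * x_mean

print('slope a:', a)
print('intercept b:', b)
```

The slope is the covariance of x and y divided by the variance of x, and the intercept is chosen so that the line passes through the point of means.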

Before that, here is a quick look at the machine learning process (Machine Learning Process):

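Organize the data
- Before analysis begins, the data must be converted into a form the computer can work with. Observations of the subject under analysis are arranged by attribute (feature or variable).
Split the data (training/test)
Prepare the algorithm
Train the model
- Uses the training data
Predict
- Uses the test data
Evaluate the model
Put the model to use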
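
The steps above can be sketched end to end as follows. This is only an illustration on synthetic data: the column names `weight` and `mpg` and the numbers used to generate them are assumptions made for the example, not values from a real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Organize the data: one row per observation, one column per feature (synthetic values)
rng = np.random.default_rng(0)
weight = rng.uniform(1500, 5000, size=200)
mpg = 46.0 - 0.0076 * weight + rng.normal(0, 2.0, size=200)
df = pd.DataFrame({'weight': weight, 'mpg': mpg})

# Split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(
    df[['weight']], df['mpg'], test_size=0.3, random_state=10)

# Prepare the algorithm and train the model on the training data
model = LinearRegression()
model.fit(x_train, y_train)

# Predict on the test data
y_pred = model.predict(x_test)

# Evaluate the model with R-squared on the test data
r2 = model.score(x_test, y_test)
print('R-squared on test data:', r2)
```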

Machine learning is also broadly divided into two types, supervised learning and unsupervised learning, and regression analysis belongs to supervised learning.

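Category	Supervised Learning	Unsupervised Learning
Algorithms (analysis models)	◇ Regression ◇ Classification	◇ Clustering
Characteristics	◇ Learns with the correct answers known ◇ Relatively many ways to evaluate a model	◇ Groups similar data together without correct answers ◇ Limited ways to evaluate a model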
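
To make the contrast concrete, here is a small sketch on synthetic data. The regression model is fitted with the answers y supplied, while the clustering model sees only x. The choice of `KMeans` as the clustering algorithm and all numbers are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Synthetic data (illustration only): two well-separated groups of points
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.5, 50),
                    rng.normal(10.0, 0.5, 50)]).reshape(-1, 1)

# Supervised: the answers y are given and used during fitting
y = 3.0 * x.ravel() + 2.0
reg = LinearRegression().fit(x, y)
print('learned slope:', reg.coef_[0], 'intercept:', reg.intercept_)

# Unsupervised: only x is given; the algorithm groups similar points by itself
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(x)
print('cluster sizes:', np.bincount(km.labels_))
```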

Now let's work through simple regression analysis -_-

# Import libraries
import pandas
import numpy
import matplotlib.pyplot as mp
import matplotlib
import seaborn

# Prepare Data
filepath = "/Users/dennis_sa/Documents/"
# The file has no header row, so header = None; columns are separated by whitespace
read_data = pandas.read_csv(filepath+"auto-mpg.data-original", header = None, sep = r'\s+')
read_data.columns = ['연비(mpg)', '실린더 수(cylinders)', '배기량(displacement)', '출력(horsepower)',
                     '차중(weight)', '가속능력(acceleration)', '출시년도(model_year)', '제조국(origin)', '모델명(name)']

matplotlib.rc('font', family = 'AppleGothic') # On macOS, fixes broken Korean glyphs in plot labels
pandas.set_option('display.max_columns', 10)

# Explore Data
read_data.info() # info() prints its summary itself and returns None
print(read_data.describe())
print(read_data['연비(mpg)'])
print(read_data['출력(horsepower)'])
# Drop rows with a missing mpg or horsepower value
read_data.dropna(subset = ['연비(mpg)', '출력(horsepower)'], axis = 0, inplace = True)
read_data.info()

# Choose Variables
choose_data = read_data[['연비(mpg)', '실린더 수(cylinders)', '출력(horsepower)', '차중(weight)']]
print(choose_data, end = '\n')

# Use matplotlib
choose_data.plot(kind = 'scatter', x = '차중(weight)', y = '연비(mpg)', c = 'coral', s = 10, figsize = (10, 5))
mp.show()
mp.close()

# Use seaborn
fig = mp.figure(figsize = (10, 5))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)
seaborn.regplot(x = '차중(weight)', y = '출력(horsepower)', data = choose_data, ax = ax1) # with regression line
seaborn.regplot(x = '차중(weight)', y = '출력(horsepower)', data = choose_data, ax = ax2, fit_reg = False) # without regression line
mp.show()
mp.close()

seaborn.jointplot(x = '차중(weight)', y = '출력(horsepower)', data = choose_data) # without regression line
seaborn.jointplot(x = '차중(weight)', y = '출력(horsepower)', kind = 'reg', data = choose_data) # with regression line
mp.show()
mp.close()

# Plot every pairwise combination of the columns in choose_data
combination = seaborn.pairplot(choose_data)
mp.show()
mp.close()

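# 출력(horsepower) and 차중(weight) both show a linear relationship with 연비(mpg).
# Simple regression takes a single independent variable, so only 차중(weight) is used here; then split into training/test data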
x = choose_data[['차중(weight)']]
y = choose_data[['연비(mpg)']]
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 10)

# Train and evaluate
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(x_train, y_train)

# R-squared measured on the test data
r_square = lr.score(x_test, y_test)
print('Coefficient of determination (R-squared) : ', r_square)
print('Slope a : ', lr.coef_)
print('y-intercept b : ', lr.intercept_)

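# Compare the distribution of the actual y with the predictions y_hat over all rows
y_hat = lr.predict(x)
mp.figure(figsize = (10, 5))
# distplot is deprecated in recent seaborn releases; kdeplot draws the same density curve
ax1 = seaborn.kdeplot(numpy.ravel(y), label = 'y')
ax2 = seaborn.kdeplot(numpy.ravel(y_hat), label = 'y_hat', ax = ax1)
mp.legend()
mp.show()
mp.close()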
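
The coefficient of determination returned by `score` and the predictions returned by `predict` can both be reproduced by hand, which helps show what the fitted slope and intercept mean. The sketch below uses synthetic data rather than the auto-mpg file, so the numbers it prints are not the results of the script above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustration only)
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=(100, 1))
y = 2.5 * x.ravel() + 1.0 + rng.normal(0, 1.0, size=100)

model = LinearRegression().fit(x, y)
y_hat = model.predict(x)

# R-squared = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# A prediction is just slope * x + intercept
manual_pred = model.coef_[0] * x.ravel() + model.intercept_

print('manual R-squared :', r2_manual)
print('model.score      :', model.score(x, y))
```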

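Now that I am actually studying machine learning (Machine Learning), I can see that I am missing a lot of the basic concepts -_-

Still, for now I will keep moving through the material and fill in the gaps later..

That's it for today's study -_-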
