
13. My stock rose yesterday. What about tomorrow?

Today we look at ARIMA (Auto-regressive Integrated Moving Average), the best-known of the many statistical techniques for time-series prediction. We then use it in a hands-on exercise to predict the price of a specific stock.

Learning objectives


  • Understand the characteristics of time-series data and the concept of a stationary time series.
  • Understand the AR, MA, and differencing components that make up an ARIMA model, and apply them to simple time-series data.
  • Apply ARIMA to real stock data and check its prediction accuracy.

Time-series prediction

(1) Is it possible to predict the future?


  • Predicting the next stock price movement from the price curve so far
  • Predicting tomorrow's temperature change from a region's climate data
  • Predicting anomalies from the change history of a factory's sensor data

What these examples have in common is time-series data on which the prediction can be based. Time-series data is a sequence of data points ordered by the time at which they occurred.


The data does not have to be recorded at fixed time intervals. If you collect each day's stock trading price as date-price pairs in date order, that is just as good a time series.

Suppose, then, that we have several years of daily price data for a particular stock. Can we use it to predict tomorrow's price, or whether the price will go up or down? Strictly speaking, the future cannot be predicted. If we still want to forecast future values, we need two premises:

  • A consistent pattern can be found in past data.
  • That past pattern will repeat in the same way in the future.

Taken together, these two statements mean that forecasting is only possible for stationary data. A time series is stationary when its statistical properties do not change over time.

(2) Stationary time-series data


  1. The mean stays constant regardless of time.

    images00.png

  2. The variance stays constant regardless of time.

    images01.png

  3. The covariance between two time points does not depend on the reference time, only on the gap (lag) between them.

    images02.png
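The three conditions above define weak (covariance) stationarity. The formulas below are a sketch in standard notation, where $h$ is the lag between two time points:

$$
E[Y_t] = \mu, \qquad \operatorname{Var}(Y_t) = \sigma^2, \qquad \operatorname{Cov}(Y_t, Y_{t+h}) = \gamma(h) \quad \text{for all } t
$$

Here $\mu$ and $\sigma^2$ are constants. The autocovariance $\gamma(h)$ depends only on the lag $h$, not on $t$, and $\gamma(0) = \sigma^2$.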

Case studies of time-series data

(1) Daily Minimum Temperatures in Melbourne


Preparing the data

$ wget https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv
$ wget https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv

Building the time series

The first dataset is Daily Minimum Temperatures in Melbourne.

# Import modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os

# data load
dataset_filepath = os.getenv('HOME')+'/aiffel/stock_prediction/data/daily-min-temperatures.csv' 
df = pd.read_csv(dataset_filepath) 
print(type(df))
df.head()

# This time, use Date as the index column (index_col) and parse it as dates
df = pd.read_csv(dataset_filepath, index_col='Date', parse_dates=True)
print(type(df))
df.head()

ts1 = df['Temp']
print(type(ts1))
ts1.head()

from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 13, 6 

# Plot the time series. It draws fine without any further processing.
plt.plot(ts1)

# Show only the entries of the time series that are missing (NaN)
ts1[ts1.isna()]  

# If there are missing values, interpolate them, using the time index as the basis
ts1=ts1.interpolate(method='time')

# Check again for missing values (NaN) after interpolation
print(ts1[ts1.isna()])

# Plot the graph again
plt.plot(ts1)

images03.png

# Plot statistics computed over a sliding window (rolling statistics)
def plot_rolling_statistics(timeseries, window=12):

    rolmean = timeseries.rolling(window=window).mean()  # rolling mean
    rolstd = timeseries.rolling(window=window).std()    # rolling standard deviation

    # Plot the original series, the rolling mean, and the rolling standard deviation
    plt.plot(timeseries, color='blue', label='Original')
    plt.plot(rolmean, color='red', label='Rolling Mean')
    plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)

plot_rolling_statistics(ts1, window=12)

images04.png

The mean, variance, and autocovariance appear to follow a consistent pattern over time, so this looks like a stationary series. To say so with more confidence, we need a statistical approach.

(2) International airline passengers


# data load
dataset_filepath = os.getenv('HOME')+'/aiffel/stock_prediction/data/airline-passengers.csv' 
df = pd.read_csv(dataset_filepath, index_col='Month', parse_dates=True).fillna(0)  
print(type(df))
df.head()

ts2 = df['Passengers']
plt.plot(ts2)

plot_rolling_statistics(ts2, window=12)

images05.png

Unlike the previous case, the mean and variance increase over time. From this we can at least conclude, qualitatively, that this series is not stationary. Next we look at analysis techniques for non-stationary time series like this one.

Statistical methods for checking stationarity

(1) Augmented Dickey-Fuller Test


This section introduces the Augmented Dickey-Fuller test (ADF test), a statistical method for testing whether a time series is stationary. The test starts from the null hypothesis that the given series is not stationary (it has a unit root). If a statistical hypothesis test rejects this null hypothesis, we accept the alternative hypothesis that the series is stationary.
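For reference, the regression behind the test can be written as below. This is the standard textbook form, and the symbols are not from the original text. $k$ is the number of lagged differences, which appears as "#Lags Used" in the adfuller outputs in this section.

$$
\Delta Y_t = \alpha + \beta t + \gamma Y_{t-1} + \sum_{i=1}^{k} \delta_i \, \Delta Y_{t-i} + e_t,
\qquad H_0: \gamma = 0 \ (\text{unit root, non-stationary}), \quad H_1: \gamma < 0 \ (\text{stationary})
$$

The test statistic is the t-statistic of $\hat{\gamma}$. $H_0$ is rejected when the statistic falls below the critical value, or equivalently when the p-value is below the chosen significance level (e.g. 0.05). By default, statsmodels' adfuller includes only the constant $\alpha$ (regression='c'), which means $\beta = 0$.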

It is worth reviewing the basic terms of statistical hypothesis testing, such as the p-value, at least once.

(2) The statsmodels package and the adfuller method


statsmodels is a powerful statistics package that brings R-style functionality, such as statistical tests and time-series analysis, to Python. We will use it frequently in this node. The code below runs the Augmented Dickey-Fuller test on a given time series using the adfuller method from statsmodels.

from statsmodels.tsa.stattools import adfuller

def augmented_dickey_fuller_test(timeseries):
    # Call the adfuller function provided by statsmodels
    dftest = adfuller(timeseries, autolag='AIC')  
    
    # Organize and print the values returned by adfuller
    print('Results of Dickey-Fuller Test:')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
    for key,value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)

# Daily Minimum Temperatures in Melbourne
augmented_dickey_fuller_test(ts1)

"""
Results of Dickey-Fuller Test:
Test Statistic                   -4.444805
p-value                           0.000247
#Lags Used                       20.000000
Number of Observations Used    3629.000000
Critical Value (1%)              -3.432153
Critical Value (5%)              -2.862337
Critical Value (10%)             -2.567194
dtype: float64
"""

For the Daily Minimum Temperatures in Melbourne series, the p-value for the null hypothesis (the series is not stationary) is close to 0 (0.000247). The null hypothesis is therefore rejected, and we accept the alternative hypothesis that the series is stationary.

# International airline passengers
augmented_dickey_fuller_test(ts2)

"""
Results of Dickey-Fuller Test:
Test Statistic                   0.815369
p-value                          0.991880
#Lags Used                      13.000000
Number of Observations Used    130.000000
Critical Value (1%)             -3.481682
Critical Value (5%)             -2.884042
Critical Value (10%)            -2.578770
dtype: float64
"""

For the International airline passengers series, the p-value is close to 1 (0.99). This is not direct evidence that the null hypothesis is true. However, because we cannot reject it, we cannot call this series stationary.

The basic idea of time-series prediction: making the series stationary


We will use two broad approaches to turn a non-stationary series into a stationary one. The first is to process or transform the existing series, guided by qualitative analysis, so that it becomes more stationary. The second is to apply a technique called time series decomposition.

(1) Transforming the series to be more stationary


Log transformation

ts_log = np.log(ts2)
plt.plot(ts_log)

augmented_dickey_fuller_test(ts_log)

"""
Results of Dickey-Fuller Test:
Test Statistic                  -1.717017
p-value                          0.422367
#Lags Used                      13.000000
Number of Observations Used    130.000000
Critical Value (1%)             -3.481682
Critical Value (5%)             -2.884042
Critical Value (10%)            -2.578770
dtype: float64
"""

The p-value dropped to 0.42, less than half of the original 0.99. Qualitatively, the variance now also looks roughly constant over time. The transformation looks quite effective, but one obvious problem remains: the mean keeps increasing over time.
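Why does the log help? Some series grow roughly exponentially, with fluctuations that scale with the level. For such a series, the log turns multiplicative changes into additive ones, so the spread of the changes stops growing. Below is a minimal sketch on synthetic data. The growth rate, noise level, and the helper spread_ratio are assumptions for illustration, not part of the airline data.

```python
import numpy as np

# Synthetic series (assumption): 2% growth per step with multiplicative noise
rng = np.random.default_rng(0)
t = np.arange(200)
y = 100 * np.exp(0.02 * t) * np.exp(rng.normal(0, 0.05, size=t.size))

def spread_ratio(series):
    """Std of step-to-step changes over the last 50 steps divided by the first 50."""
    changes = np.diff(series)
    return changes[-50:].std() / changes[:50].std()

raw_ratio = spread_ratio(y)          # large: changes grow with the level
log_ratio = spread_ratio(np.log(y))  # close to 1: changes keep a similar spread
print(f"raw: {raw_ratio:.1f}, log: {log_ratio:.2f}")
```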

Removing the moving average: offsetting the trend

In time-series analysis, a change in the mean over time like this is called a trend. To remove it, we compute the moving average (the rolling mean) and subtract it from ts_log.
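To see what subtracting a rolling mean does, here is a toy sketch on a perfectly linear series (the values and window=3 are hypothetical). The first window-1 entries have no rolling mean, which is why the code below calls dropna(). Because a trailing window lags a linear trend, what remains is a constant rather than zero, and a constant does not affect stationarity.

```python
import pandas as pd

# Hypothetical series with a perfectly linear trend: 0, 2, 4, ..., 18
y = pd.Series([2.0 * t for t in range(10)])
window = 3

moving_avg = y.rolling(window=window).mean()  # NaN for the first window-1 entries
detrended = y - moving_avg                    # trend removed

# A trailing window lags a linear trend by (window-1)/2 steps,
# so a constant offset remains: slope 2.0 * (window - 1) / 2 = 2.0
print(detrended.dropna().tolist())
```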

# Compute the moving average
moving_avg = ts_log.rolling(window=12).mean()  
plt.plot(ts_log)
plt.plot(moving_avg, color='red')

# Subtract it to remove the trend
ts_log_moving_avg = ts_log - moving_avg 
ts_log_moving_avg.head(15)

# Drop missing values (no rolling mean exists for the first window-1 entries)
ts_log_moving_avg.dropna(inplace=True)
ts_log_moving_avg.head(15)

plot_rolling_statistics(ts_log_moving_avg)

augmented_dickey_fuller_test(ts_log_moving_avg)

"""
Results of Dickey-Fuller Test:
Test Statistic                  -3.162908
p-value                          0.022235
#Lags Used                      13.000000
Number of Observations Used    119.000000
Critical Value (1%)             -3.486535
Critical Value (5%)             -2.886151
Critical Value (10%)            -2.579896
dtype: float64
"""

images06.png

With a p-value of about 0.02, we can reject the null hypothesis at the 5% significance level and treat this time series as stationary. However, this approach has a hidden problem: the moving-average window has to be set to exactly 12. If we use window=6 in the code above instead, we get the following result.

moving_avg_6 = ts_log.rolling(window=6).mean()
ts_log_moving_avg_6 = ts_log - moving_avg_6
ts_log_moving_avg_6.dropna(inplace=True)

plot_rolling_statistics(ts_log_moving_avg_6)

augmented_dickey_fuller_test(ts_log_moving_avg_6)

"""
Results of Dickey-Fuller Test:
Test Statistic                  -2.273822
p-value                          0.180550
#Lags Used                      14.000000
Number of Observations Used    124.000000
Critical Value (1%)             -3.484220
Critical Value (5%)             -2.885145
Critical Value (10%)            -2.579359
dtype: float64
"""

images07.png

Qualitatively, the graph looks little different from the window=12 case. However, the ADF test gives a p-value of about 0.18, so we still cannot call the series stationary.

Because this dataset is monthly, it has a 12-month cycle, which suggests that window=12 is appropriate. Still, when using a moving average, remember that choosing the window size for the rolling mean is very important.

Differencing: offsetting seasonality

Some periodic changes are not captured by the trend, and their pattern is hidden within the series. Such changes make the series non-stationary and interfere with prediction, and removing the moving average does not cancel them. These seasonal or periodic patterns are called seasonality.

Differencing is an effective way to cancel such patterns. We shift the series forward by one step and subtract the shifted series from the original. What remains at each step is the current value minus the previous value, which is exactly the change that occurred at that step.
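Below is a toy sketch with a hypothetical series. pandas' diff() computes exactly this "series minus its one-step shift". Applying the same idea with a lag equal to the period, here diff(12) on a monthly pattern, gives a seasonal difference, which cancels a pattern that repeats exactly every 12 steps. The original text uses only the lag-1 difference; the seasonal difference is shown here as a related variant.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend + a pattern that repeats every 12 steps
t = np.arange(48)
pattern = np.tile([5, 3, 0, -2, -4, -5, -4, -2, 0, 3, 5, 1], 4)
y = pd.Series(0.5 * t + pattern, dtype=float)

diff_manual = y - y.shift()   # current value minus previous value
diff_builtin = y.diff()       # pandas shortcut for the same lag-1 difference

seasonal_diff = y.diff(12)    # value minus the value one period (12 steps) earlier

print(diff_manual.equals(diff_builtin))  # True
print(seasonal_diff.dropna().unique())   # only the trend's 12-step change remains
```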

ts_log_moving_avg_shift = ts_log_moving_avg.shift()

plt.plot(ts_log_moving_avg, color='blue')
plt.plot(ts_log_moving_avg_shift, color='green')

ts_log_moving_avg_diff = ts_log_moving_avg - ts_log_moving_avg_shift
ts_log_moving_avg_diff.dropna(inplace=True)
plt.plot(ts_log_moving_avg_diff)

plot_rolling_statistics(ts_log_moving_avg_diff)

augmented_dickey_fuller_test(ts_log_moving_avg_diff)

"""
Results of Dickey-Fuller Test:
Test Statistic                  -3.912981
p-value                          0.001941
#Lags Used                      13.000000
Number of Observations Used    118.000000
Critical Value (1%)             -3.487022
Critical Value (5%)             -2.886363
Critical Value (10%)            -2.580009
dtype: float64
"""

Applying first-order differencing to the trend-removed series partly offsets the seasonality. The p-value falls to about a tenth of its previous value, from 0.022 to 0.0019. Depending on the data, second-order differencing (the difference of the difference) or third-order differencing (the difference of the second-order difference) might lower the p-value further.
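Higher-order differencing simply repeats the operation. Below is a minimal sketch on a hypothetical quadratic trend: the first difference still trends, while the second difference is constant. Each round of differencing also shortens the series by one value.

```python
import numpy as np

# Hypothetical series with a quadratic trend: Y_t = t^2
y = np.arange(10, dtype=float) ** 2

d1 = np.diff(y, n=1)  # 1st-order difference: 2t - 1, still trending upward
d2 = np.diff(y, n=2)  # 2nd-order difference (difference of the difference): constant 2

print(d1)
print(d2)
```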

(2) Time series decomposition


statsmodels provides a seasonal_decompose function that directly separates the trend and seasonality in a time series. With it, we can extract a far more stationary series without the manual moving-average removal and differencing we did above.

from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(ts_log)

trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

plt.rcParams["figure.figsize"] = (11,6)
plt.subplot(411)
plt.plot(ts_log, label='Original')
plt.legend(loc='best')
plt.subplot(412)
plt.plot(trend, label='Trend')
plt.legend(loc='best')
plt.subplot(413)
plt.plot(seasonal,label='Seasonality')
plt.legend(loc='best')
plt.subplot(414)
plt.plot(residual, label='Residuals')
plt.legend(loc='best')
plt.tight_layout()

What remains of the original series after removing the trend and seasonality is called the residual. Put the other way around, Trend + Seasonality + Residual = Original (seasonal_decompose uses this additive model by default). This kind of decomposition offers an important perspective for understanding time-series data.
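To make the additive identity concrete, here is a simplified sketch of a classical additive decomposition in pandas. The trend is a centered 2×12 moving average, the seasonality is the per-month average of the detrended values, and the residual is whatever remains. It follows the same moving-average idea as seasonal_decompose but is not its exact code; the function name and the synthetic data are assumptions. Because ts_log is already log-transformed, an additive split of ts_log corresponds to a multiplicative split of the original passenger counts.

```python
import numpy as np
import pandas as pd

def classical_additive_decompose(y, period=12):
    """Simplified classical additive decomposition (sketch, not statsmodels' code)."""
    # Trend: centered moving average (a 2 x period average when period is even)
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    trend = y.rolling(window=len(weights), center=True).apply(
        lambda w: np.dot(w, weights), raw=True)

    # Seasonality: mean detrended value at each position in the cycle, centered on 0
    position = np.arange(len(y)) % period
    season_means = (y - trend).groupby(position).mean()
    season_means -= season_means.mean()
    seasonal = pd.Series(season_means.to_numpy()[position], index=y.index)

    # Residual: what trend and seasonality do not explain
    resid = y - trend - seasonal
    return trend, seasonal, resid

# Synthetic monthly series (assumption): trend + period-12 pattern + small noise
rng = np.random.default_rng(0)
t = np.arange(72)
y = pd.Series(10 + 0.3 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.1, t.size))
trend, seasonal, resid = classical_additive_decompose(y, period=12)

# Wherever the trend is defined, Trend + Seasonality + Residual reproduces the original
mask = trend.notna()
print(np.allclose((trend + seasonal + resid)[mask], y[mask]))
```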

# Check whether the residual is stationary
plt.rcParams["figure.figsize"] = (13,6)
plot_rolling_statistics(residual)

residual.dropna(inplace=True)
augmented_dickey_fuller_test(residual)

"""
Results of Dickey-Fuller Test:
Test Statistic                -6.332387e+00
p-value                        2.885059e-08
#Lags Used                     9.000000e+00
Number of Observations Used    1.220000e+02
Critical Value (1%)           -3.485122e+00
Critical Value (5%)           -2.885538e+00
Critical Value (10%)          -2.579569e+00
dtype: float64
"""

The residual obtained through decomposition has an extremely low p-value (about 2.9e-08). We can therefore treat it as a stationary series, at a level that makes prediction feasible.

The ARIMA model

(1) Definition of ARIMA


In the previous step, we saw that a time series decomposes into trend, seasonality, and residual. We also saw that separating out the trend and seasonality effectively leaves a stationary series with good predictive potential. ARIMA (Autoregressive Integrated Moving Average) builds time-series forecasting models on this principle.

ARIMA combines AR (Autoregressive), I (Integrated), and MA (Moving Average).

AR (Autoregressive)

images08.png

  • An autoregressive (AR) model treats $Y_t$ as a weighted sum of the previous p values $Y_{t-1}, Y_{t-2}, ..., Y_{t-p}$, plus a random error.
  • An AR model describes a stationary time series when its weights $\phi_i$ satisfy the stationarity condition. For AR(1), this means $|\phi_1| < 1$. For general AR(p), all roots of $1 - \phi_1 z - \cdots - \phi_p z^p$ must lie outside the unit circle; weights below 1 in magnitude are not enough on their own once p ≥ 2.
  • In a typical time series, AR can be seen as modeling the residual part left after removing trend and seasonality.
  • Modeling a stock series with AR corresponds to the view that the stock price will always stay around a constant equilibrium level.
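In formula form, an AR(p) model can be written as below. This is a sketch in standard notation, where $c$ is a constant, $\phi_1, \ldots, \phi_p$ are the weights, and $e_t$ is a random error (white noise):

$$
Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + e_t
$$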

MA (Moving Average)

images09.png

  • A moving average (MA) model treats $Y_t$ as a weighted sum of the previous q prediction errors $e_{t-1}, e_{t-2}, ..., e_{t-q}$, plus the current error.
  • In a typical time series, MA can be seen as modeling the trend part. A positive prediction error $e_{t-1}$ means the observation was higher than the model predicted. With a positive weight, the model therefore raises its next prediction of $Y_t$.
  • Modeling a stock series with MA corresponds to the view that the stock price will continue its most recent upward or downward pattern.
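Likewise, below is an MA(q) model in standard notation, with $\mu$ the mean, $\theta_1, \ldots, \theta_q$ the weights, and $e_t$ the prediction errors from the bullets above:

$$
Y_t = \mu + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \cdots + \theta_q e_{t-q}
$$

With $\theta_1 > 0$, a positive error $e_{t-1}$ pushes the prediction for $Y_t$ up, which is the behavior described in the second bullet.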

I (Integrated)

  • The integrated part treats $Y_t$ as the previous data plus the accumulated (integrated) d-th order differences.
  • For example, with d=1, $Y_t$ is the sum of $Y_{t-1}$ and the change $\Delta Y_t = Y_t - Y_{t-1}$.
  • In a typical time series, I can be seen as modeling the seasonality part.
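Integration is the inverse of differencing. Starting from a known first value, a cumulative sum of the differences rebuilds the original series. A model fitted on differences (d=1) uses the same step to turn its forecasts back into levels. Below is a minimal NumPy sketch with hypothetical values.

```python
import numpy as np

# Hypothetical series Y_0..Y_4
y = np.array([3.0, 5.0, 4.0, 8.0, 9.0])

dy = np.diff(y)  # differences: Y_t - Y_{t-1} for t = 1..4

# Integration: start from the first observed value and add up the differences
y_rebuilt = np.concatenate(([y[0]], y[0] + np.cumsum(dy)))

print(dy)
print(y_rebuilt)
```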

ARIMA considers all three of these components at once.

(2) The ARIMA parameters p, d, q


To build a successful forecasting model with ARIMA, its parameters must be set to fit the data. Put simply, a few key numbers have to be chosen well to get a correct forecasting equation.

ARIMA has three parameters:
  • p: the lag order of the autoregressive (AR) part
  • d: the number of differencing steps
  • q: the lag order of the moving average (MA) part
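Combining the pieces, ARIMA(p, d, q) is often written with the lag operator $L$, defined by $L Y_t = Y_{t-1}$. In the sketch below, which uses standard notation, $\phi_i$ are the AR weights, $\theta_j$ the MA weights, $c$ a constant, and $e_t$ the prediction error:

$$
\Big(1 - \sum_{i=1}^{p} \phi_i L^i\Big)(1 - L)^d \, Y_t = c + \Big(1 + \sum_{j=1}^{q} \theta_j L^j\Big) e_t
$$

For the ARIMA(1, 1, 0) model used later in this node, this reduces to $\Delta Y_t = c + \phi_1 \Delta Y_{t-1} + e_t$ with $\Delta Y_t = Y_t - Y_{t-1}$, i.e. an AR(1) model on the first difference.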

A common rule of thumb is to choose p and q so that p + q < 2 and p * q = 0. This means one of them is 0 and the other is at most 1. The reasoning is that most real time series show a strong tendency toward only one of the two, AR or MA.

So how do we choose the parameters of an ARIMA(p,d,q) model? Intuitively, q might seem to play a role similar to window=12 in the moving-average step above. Note, though, that ARIMA's MA term weights past prediction errors rather than averaging past values. How these values are chosen will strongly affect the stationarity of the series and the model's later forecasting performance.

There are many statistical approaches to choosing appropriate values of p, d, and q. Skipping the statistics and jumping to the conclusion, p and q can be determined from the ACF (Autocorrelation Function) and the PACF (Partial Autocorrelation Function), while d is found by differencing, as shown below. The autocorrelation here is the same concept we met in the very first step, where stationarity required the covariance between two time points to depend only on their lag.

The ACF measures the correlation between observations as a function of the lag between them. The PACF measures the correlation between observations at a given lag after removing the influence of the observations in between.
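Below is a minimal NumPy sketch of both functions. It is an illustration, not statsmodels' implementation; sample_acf, sample_pacf, and the simulated AR(1) data with weight 0.7 are assumptions. The PACF at lag k is computed as the last coefficient of an ordinary least-squares regression of $Y_t$ on $Y_{t-1}, \ldots, Y_{t-k}$, which is one standard way to define it. For an AR(1) series, the ACF decays gradually while the PACF drops to about 0 after lag 1. This is the same pattern read from the plots below.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(nlags + 1)])

def sample_pacf(x, nlags):
    """Partial autocorrelation: the lag-k value is the last OLS coefficient when
    regressing x_t on an intercept and x_{t-1}, ..., x_{t-k}."""
    x = np.asarray(x, dtype=float)
    values = [1.0]
    for k in range(1, nlags + 1):
        target = x[k:]
        lagged = [x[k - j:len(x) - j] for j in range(1, k + 1)]
        design = np.column_stack([np.ones(len(target))] + lagged)
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        values.append(coef[-1])
    return np.array(values)

# Simulated AR(1) series with weight 0.7 (assumption)
rng = np.random.default_rng(42)
n, phi = 5000, 0.7
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t]

acf = sample_acf(y, 3)    # should decay roughly like 1, 0.7, 0.49, 0.34
pacf = sample_pacf(y, 3)  # should be about 1, 0.7, then close to 0
print(np.round(acf, 2), np.round(pacf, 2))
```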

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

plot_acf(ts_log)   # ACF: plot the autocorrelation
plot_pacf(ts_log)  # PACF: plot the partial autocorrelation
plt.show()

images10.png

images11.png

The figure below shows the summary conclusion of an article that explains statistically why the ACF can be used to determine the MA lag q and the PACF to determine the AR lag p.

images12.png

Following this conclusion, the PACF plot suggests that p=1 is a good fit, since the PACF is close to 0 for lags of 2 or more. A PACF of 0 means that, once the intermediate lags are accounted for, the current value and the value p steps earlier are uncorrelated. That lag therefore does not need to be considered.

The ACF, on the other hand, decreases gradually, which resembles the pattern of an AR(1) model.

There does not seem to be a clearly suitable value for q. If the MA part is unnecessary, q can be set to 0, but it is also worth trying different values of q.

Finding d takes a different approach: compute the d-th order difference and check whether the resulting series is stationary.

# First-order difference
diff_1 = ts_log.diff(periods=1).iloc[1:]
diff_1.plot(title='Difference 1st')

augmented_dickey_fuller_test(diff_1)

"""
Results of Dickey-Fuller Test:
Test Statistic                  -2.717131
p-value                          0.071121
#Lags Used                      14.000000
Number of Observations Used    128.000000
Critical Value (1%)             -3.482501
Critical Value (5%)             -2.884398
Critical Value (10%)            -2.578960
dtype: float64
"""

# Second-order difference
diff_2 = diff_1.diff(periods=1).iloc[1:]
diff_2.plot(title='Difference 2nd')

augmented_dickey_fuller_test(diff_2)

"""
Results of Dickey-Fuller Test:
Test Statistic                -8.196629e+00
p-value                        7.419305e-13
#Lags Used                     1.300000e+01
Number of Observations Used    1.280000e+02
Critical Value (1%)           -3.482501e+00
Critical Value (5%)           -2.884398e+00
Critical Value (10%)          -2.578960e+00
dtype: float64
"""

Here the first-order difference is only borderline stationary (p ≈ 0.07), while the second-order difference is clearly stationary (p ≈ 7.4e-13). Even so, let's try d=1 first.

(3) Splitting the training data


# Split into train and test data (first 90% for training, last 10% for testing)
train_data, test_data = ts_log[:int(len(ts_log)*0.9)], ts_log[int(len(ts_log)*0.9):]
plt.figure(figsize=(10,6))
plt.grid(True)
plt.plot(ts_log, c='r', label='training dataset')  # plotting train_data alone leaves a visible gap at the split, so ts_log is drawn in full for a continuous line
plt.plot(test_data, c='b', label='test dataset')
plt.legend()

# Check the shape of the datasets
print(ts_log[:2])
print(train_data.shape)
print(test_data.shape)

Training and forecasting with the ARIMA model


Based on the analysis above, we first consider an ARIMA model with parameters p=1, d=1, q=0. Training an ARIMA model is as simple as the following.

# Legacy API: statsmodels.tsa.arima_model was removed in statsmodels 0.13,
# so this block needs an older statsmodels release to run as written.
from statsmodels.tsa.arima_model import ARIMA

# Build Model
model = ARIMA(train_data, order=(1, 1, 0))
fitted_m = model.fit(disp=-1)
print(fitted_m.summary())

# Plot the model's in-sample predictions
fitted_m.plot_predict()

# Forecast: the point forecasts are stored in fc
fc, se, conf = fitted_m.forecast(len(test_data), alpha=0.05)  # 95% conf

# Make as pandas series
fc_series = pd.Series(fc, index=test_data.index)   # forecasts
lower_series = pd.Series(conf[:, 0], index=test_data.index)  # lower bound of the 95% interval
upper_series = pd.Series(conf[:, 1], index=test_data.index)  # upper bound of the 95% interval

# Plot
plt.figure(figsize=(9,5), dpi=100)
plt.plot(train_data, label='training')
plt.plot(test_data, c='b', label='actual price')
plt.plot(fc_series, c='r',label='predicted price')
plt.fill_between(lower_series.index, lower_series, upper_series, color='k', alpha=.10)
plt.legend()
plt.show()

images13.png

Everything so far has used the log-transformed series. To compute the final model's error on a meaningful scale, we first convert all values back to the original scale with np.exp(), then compute MSE, MAE, RMSE, and MAPE.

from sklearn.metrics import mean_squared_error, mean_absolute_error
import math

mse = mean_squared_error(np.exp(test_data), np.exp(fc))
print('MSE: ', mse)

mae = mean_absolute_error(np.exp(test_data), np.exp(fc))
print('MAE: ', mae)

rmse = math.sqrt(mean_squared_error(np.exp(test_data), np.exp(fc)))
print('RMSE: ', rmse)

mape = np.mean(np.abs(np.exp(fc) - np.exp(test_data))/np.abs(np.exp(test_data)))
print('MAPE: {:.2f}%'.format(mape*100))

"""
MSE:  5409.550103512347
MAE:  63.136923863759435
RMSE:  73.54964380275644
MAPE: 14.08%
"""

In the end, the model shows an error of about 14% by MAPE, a suitable metric for forecasting models. The result is not satisfying, so it is worth looking for better parameters. With q=8, MAPE drops to about 10%. q=12 might be even better, but this dataset is too small to use it.

Taking on stock prediction

Search for a ticker on Yahoo Finance and open the "Historical Data" tab. Set "Time Period" to "Max" and click "Apply" to view the data from the listing date up to the most recent day, then click "Download" to download it.
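Before the downloaded file can go through the same pipeline, it has to become a time series. The minimal sketch below assumes the CSV has 'Date' and 'Close' columns (Yahoo's download format at the time of writing; check your own file). It reuses the time-based interpolation and log transform from above. load_close_series and the sample rows are hypothetical.

```python
import io
import numpy as np
import pandas as pd

def load_close_series(csv_source):
    """Load a Yahoo Finance-style CSV and return the log of the 'Close' column.

    Assumes 'Date' and 'Close' columns; check the header of your own file."""
    df = pd.read_csv(csv_source, index_col='Date', parse_dates=True)
    close = pd.to_numeric(df['Close'], errors='coerce')  # unparseable cells become NaN
    close = close.interpolate(method='time')             # fill gaps as done for ts1
    return np.log(close)

# Made-up rows for illustration only; the middle 'Close' value is missing
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2020-01-02,10,11,9,10,10,100\n"
    "2020-01-03,10,11,9,null,null,0\n"
    "2020-01-06,10,11,9,16,16,100\n"
)
ts_log_stock = load_close_series(sample_csv)
print(np.exp(ts_log_stock))
```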

Retrospective

  • When I first encountered time-series data, I thought it would be hard to handle. After working with it throughout this node, I feel a bit more comfortable with it.
  • I had long wanted to try predicting stock prices, but it is harder than I expected. Then again, if it were easy, everyone would be rich...
  • While processing raw data for the first time, I found outliers and fixed them using the methods I had learned so far! Doing it myself made me feel I had grown a little.
  • In an earlier hackathon, ARIMA performed poorly on a price-prediction problem with time-series data, so I used XGBoost and LightGBM instead. I should try predicting this data with ensemble models later as well.

Useful links

http://www.dodomira.com/2016/04/21/arima_in_r/ ARIMA models

https://destrudo.tistory.com/15 Covariance and the correlation coefficient

https://rfriend.tistory.com/264 Interpolating missing values

https://m.blog.naver.com/baedical/10109291879 p-value