*“Hiding within those mounds of data is the knowledge that could change the life of a patient, or change the world.”*

*— Atul Butte*

# Introduction

In this blog post, we develop ARIMA and Holt-Winters time series models and compare their performance. The data, available here, is a daily weather record for the city of Delhi, India, spanning 1st January 2013 to 24th April 2017. The four variables in the data are mean temperature, humidity, wind speed and mean pressure, but for this analysis we are interested only in the mean temperature. The data comes split into training and testing sets, which we will use to judge and compare model performance.

# Load dataset

```
#load libraries
library(forecast)  # auto.arima(), hw(), accuracy(), checkresiduals()
library(fma)
library(tidyverse)
library(seasonal)

#load train and test data
train <- read_csv("C:/Users/Adejumo/Downloads/DailyDelhiClimateTrain.csv")
test <- read_csv("C:/Users/Adejumo/Downloads/DailyDelhiClimateTest.csv")
```

The training data ranges from 1st January 2013 to 1st January 2017 while the testing data ranges from 1st January 2017 to 24th April 2017.

`head(train)`

```
## # A tibble: 6 x 5
## date meantemp humidity wind_speed meanpressure
## <date> <dbl> <dbl> <dbl> <dbl>
## 1 2013-01-01 10 84.5 0 1016.
## 2 2013-01-02 7.4 92 2.98 1018.
## 3 2013-01-03 7.17 87 4.63 1019.
## 4 2013-01-04 8.67 71.3 1.23 1017.
## 5 2013-01-05 6 86.8 3.7 1016.
## 6 2013-01-06 7 82.8 1.48 1018
```

`head(test)`

```
## # A tibble: 6 x 5
## date meantemp humidity wind_speed meanpressure
## <date> <dbl> <dbl> <dbl> <dbl>
## 1 2017-01-01 15.9 85.9 2.74 59
## 2 2017-01-02 18.5 77.2 2.89 1018.
## 3 2017-01-03 17.1 81.9 4.02 1018.
## 4 2017-01-04 18.7 70.0 4.54 1016.
## 5 2017-01-05 18.4 74.9 3.3 1014.
## 6 2017-01-06 19.3 79.3 8.68 1012.
```

# TS Graphics and Decomposition

We need to convert the data into time series (`ts`) objects to work with the time series functions in R.

```
#create time series objects
train_ts <- ts(train,
               frequency = 7,
               start = c(2013, 1),
               end = c(2017, 1))
test_ts <- ts(test,
              frequency = 7,
              start = c(2017, 1),
              end = c(2017, 114))

#plot the time series of the mean temperature
autoplot(train_ts[, "meantemp"]) +
  ylab("Mean temperature") +
  xlab("Year") +
  ggtitle("Mean Temperature")
```

The plot above shows the training data. Next we decompose the training data to see the trend and seasonality, using STL ("Seasonal and Trend decomposition using Loess"). The advantage of this method over other decomposition methods is its ability to handle any type of seasonality.

```
#STL decomposition
fit <- stl(train_ts[, "meantemp"],
           s.window = 5,
           robust = TRUE)
autoplot(fit) +
  ggtitle("STL decomposition of meantemp")
```

The decomposition shows a clear trend and seasonal component, so the data is not stationary.
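The same conclusion can be checked numerically: a non-stationary series has a sample ACF that decays very slowly, while differencing removes that behaviour. A minimal base-R sketch on a simulated random walk (a stand-in series, since the Delhi CSVs are local files):

```r
#a simulated random walk is non-stationary, like the raw meantemp series
set.seed(42)
x <- cumsum(rnorm(200))

#lag-1 autocorrelation of the level vs. the first difference
acf_level <- acf(x, plot = FALSE)$acf[2]
acf_diff  <- acf(diff(x), plot = FALSE)$acf[2]
acf_level  # close to 1: slow decay, non-stationary
acf_diff   # near 0: differencing has removed the trend
```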

# ARIMA model

The ARIMA model is selected with the Hyndman-Khandakar algorithm, which combines unit root tests, minimisation of the AICc and maximum likelihood estimation to obtain an ARIMA model.

```
#ARIMA modelling
fit_arima <- auto.arima(train_ts[, "meantemp"],
                        stepwise = FALSE,
                        approximation = FALSE)
fit_arima
```

```
## Series: train_ts[, "meantemp"]
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## -0.4473
## s.e. 0.1684
##
## sigma^2 estimated as 4.293: log likelihood=-59.73
## AIC=123.46 AICc=123.94 BIC=126.13
```
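For intuition, the same ARIMA(1,1,0) form can be reproduced with base R's `arima()` on simulated data (a hypothetical series, not the Delhi data, since the CSVs are local files); the estimated `ar1` coefficient should land near the value used to simulate:

```r
#simulate an ARIMA(1,1,0) series with ar1 = -0.45, then re-fit it
set.seed(7)
y <- arima.sim(model = list(order = c(1, 1, 0), ar = -0.45), n = 200)
fit <- arima(y, order = c(1, 1, 0))
coef(fit)["ar1"]  # estimate should be close to -0.45
```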

```
#accuracy of the ARIMA model on train data
a1 <- accuracy(fit_arima)
a1
```

```
## ME RMSE MAE MPE MAPE MASE
## Training set 0.2266035 1.999313 1.533964 -0.1518837 13.44902 0.4115334
## ACF1
## Training set 0.0598476
```

The selected model is an ARIMA(1,1,0): one autoregressive term, one difference and no lagged forecast errors (moving average terms). From the model summary, the AIC is 123.46 and the RMSE on the training data is 1.999. The residual plot below and the Ljung-Box test p-value of 0.9357 indicate no remaining autocorrelation.

```
#check for autocorrelation
checkresiduals(fit_arima)
```

```
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,0)
## Q* = 1.2926, df = 5, p-value = 0.9357
##
## Model df: 1. Total lags used: 6
```
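`checkresiduals()` wraps the portmanteau test that base R exposes as `Box.test()`; a hedged sketch on stand-in residuals (simulated white noise, not the actual model residuals):

```r
#Ljung-Box test by hand: fitdf = 1 because ARIMA(1,1,0) has one parameter
set.seed(1)
res <- rnorm(100)  # stand-in for model residuals
bt <- Box.test(res, lag = 6, fitdf = 1, type = "Ljung-Box")
bt$parameter  # degrees of freedom: lag - fitdf = 5
```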

Next we forecast with the model, plot the forecasts against the test data, and measure the model's accuracy on the test data.

```
#forecast from the ARIMA model
fit_arima %>%
  forecast() %>%
  autoplot()
```

```
#forecast against the test data
fit_arima %>%
  forecast() %>%
  autoplot() +
  autolayer(test_ts[, "meantemp"])
```

```
#accuracy of the model on test data
ARIMA_test <- Arima(test_ts[, "meantemp"],
                    model = fit_arima)
a2 <- accuracy(ARIMA_test)
a2
```

```
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.2054088 1.754148 1.385013 0.3583744 6.982821 0.4550031 0.2717389
```

The RMSE on the test data (1.754) shows that the model forecasts new data accurately. But is it better than the Holt-Winters model? Let us see in the next section.
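RMSE, the measure used throughout for comparing models, is simple to compute by hand; a quick sketch with made-up numbers:

```r
#root mean squared error from scratch
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
rmse(c(10, 12, 14), c(11, 11, 15))  # every error is 1 degree, so RMSE = 1
```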

# Holt Winters Model

The Holt-Winters multiplicative variation is used to model the training data because the seasonal variations change in proportion to the level of the series.
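The additive-versus-multiplicative choice can be illustrated with base R's `HoltWinters()` on the built-in `AirPassengers` series, whose seasonal swings visibly grow with the level; comparing the in-sample sum of squared errors of the two variants shows how the choice can be assessed (this is an illustration on a different dataset, not the Delhi data):

```r
#fit both seasonal variants and compare in-sample sum of squared errors
hw_add <- HoltWinters(AirPassengers, seasonal = "additive")
hw_mul <- HoltWinters(AirPassengers, seasonal = "multiplicative")
c(additive = hw_add$SSE, multiplicative = hw_mul$SSE)
```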

```
#multiplicative Holt-Winters model
fit_holts <- hw(train_ts[, "meantemp"],
                seasonal = "multiplicative",
                damped = FALSE)

#check residuals for autocorrelation
checkresiduals(fit_holts)
```

```
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' multiplicative method
## Q* = 8.1189, df = 3, p-value = 0.04362
##
## Model df: 11. Total lags used: 14
```

The Ljung-Box test p-value of 0.0436 is below 0.05, so unlike the ARIMA model there is evidence of some remaining autocorrelation in the residuals. From the summary below, the RMSE on the training data is 2.039.

```
#holt winter's model summary
summary(fit_holts)
```

```
##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = train_ts[, "meantemp"], seasonal = "multiplicative", damped = FALSE)
##
## Smoothing parameters:
## alpha = 0.6091
## beta = 5e-04
## gamma = 0.0027
##
## Initial states:
## l = 5.6374
## b = 0.4297
## s = 0.9582 0.9555 0.9387 1.0354 1.0962 1.0695
## 0.9464
##
## sigma: 0.2716
##
## AIC AICc BIC
## 174.3851 193.8851 190.7926
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1759436 2.039067 1.681243 -3.407764 15.23326 0.4510455
## ACF1
## Training set -0.1950938
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2017.143 16.57158 10.802900 22.34026 7.74914465 25.39402
## 2017.286 17.43068 10.307933 24.55343 6.53737717 28.32399
## 2017.429 16.93207 9.140486 24.72366 5.01586872 28.84828
## 2017.571 15.73300 7.778260 23.68774 3.56727475 27.89873
## 2017.714 16.42541 7.447375 25.40345 2.69468839 30.15614
## 2017.857 16.87547 7.017495 26.73344 1.79899876 31.95194
## 2018.000 17.11677 6.521310 27.71222 0.91241414 33.32112
## 2018.143 19.77169 6.881089 32.66229 0.05721814 39.48616
## 2018.286 20.70643 6.568569 34.84430 -0.91556544 42.32843
## 2018.429 20.03104 5.768135 34.29394 -1.78218826 41.84426
## 2018.571 18.53923 4.820339 32.25812 -2.44200362 39.52047
## 2018.714 19.28245 4.496322 34.06857 -3.33098009 41.89588
## 2018.857 19.73972 4.092696 35.38674 -4.19033641 43.66977
## 2019.000 19.95329 3.638189 36.26840 -4.99850511 44.90509
```

We now plot the Holt-Winters forecasts, first from the training fit and then against the testing data.

```
#forecast on training data
autoplot(fit_holts)
```

```
#forecast against the testing data
fit_holts %>%
  forecast() %>%
  autoplot() +
  autolayer(test_ts[, "meantemp"])
```

```
#accuracy of the holt winters train model
e1 <- accuracy(fit_holts)
e1
```

```
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1759436 2.039067 1.681243 -3.407764 15.23326 0.4510455
## ACF1
## Training set -0.1950938
```

```
#accuracy of the holt winters model on the test data
hw_test <- hw(test_ts[, "meantemp"],
              model = fit_holts)
e2 <- accuracy(hw_test)
e2
```

```
## ME RMSE MAE MPE MAPE MASE
## Training set -0.07483383 1.656473 1.294086 -0.9493109 6.560652 0.4251317
## ACF1
## Training set 0.04032252
```

The RMSE value in the summary above shows that this model also forecasts well on the test data.

# Model Comparison

```
#collect the four sets of accuracy measures
model_comp <- data.frame(ARIMA_train = a1[1:7],
                         ARIMA_test = a2[1:7],
                         HW_train = e1[1:7],
                         HW_test = e2[1:7],
                         row.names = c("ME", "RMSE", "MAE", "MPE",
                                       "MAPE", "MASE", "ACF1"))
model_comp
```

```
## ARIMA_train ARIMA_test HW_train HW_test
## ME 0.2266035 0.2054088 -0.1759436 -0.07483383
## RMSE 1.9993135 1.7541479 2.0390669 1.65647328
## MAE 1.5339639 1.3850134 1.6812427 1.29408594
## MPE -0.1518837 0.3583744 -3.4077636 -0.94931093
## MAPE 13.4490207 6.9828210 15.2332573 6.56065158
## MASE 0.4115334 0.4550031 0.4510455 0.42513173
## ACF1 0.0598476 0.2717389 -0.1950938 0.04032252
```

From the RMSE values above, the ARIMA model fits the training data better than the Holt-Winters model (1.999 vs 2.039), but the Holt-Winters model gives the more accurate forecasts on the testing data (1.656 vs 1.754).

# References

- Hyndman, R. J., & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(3), 1–22.
- Hyndman, R. J., & Athanasopoulos, G. Forecasting: Principles and Practice. OTexts.