This report details the analysis of monthly electricity consumption in France from January 2008 to March 2025. The primary objective is to develop a robust time series model capable of accurately forecasting future consumption.
The methodology follows the Box-Jenkins approach for building a Seasonal Autoregressive Integrated Moving Average (SARIMA) model. The process involves several key stages: exploratory analysis, transformation and differencing, model identification and estimation, residual diagnostics, and forecast evaluation.
The dataset is split into a training set for model building and a test set for forecast validation, as recommended for honest out-of-sample evaluation.
First, we load the necessary R libraries and the dataset. The data represents the monthly electricity available to the internal market in France, measured in Gigawatt-hours (GWh).
#install.packages(c("forecast", "urca", "lmtest", "tseries", "ggplot2", "MuMIn", "kableExtra"))
library(MuMIn)
library(forecast)
library(urca)
library(lmtest)
library(tseries)
library(ggplot2)
library(kableExtra)
# Load the data
dados <- read.csv("data/data2.csv")
# Create a time series object
# The data is monthly (frequency=12) and starts in January 2008.
ts_data <- ts(dados$OBS_VALUE, start = c(2008, 1), frequency = 12)
names(dados)
## [1] "STRUCTURE"
## [2] "STRUCTURE_ID"
## [3] "STRUCTURE_NAME"
## [4] "freq"
## [5] "Time.frequency"
## [6] "nrg_bal"
## [7] "Energy.balance"
## [8] "siec"
## [9] "Standard.international.energy.product.classification..SIEC."
## [10] "unit"
## [11] "Unit.of.measure"
## [12] "geo"
## [13] "Geopolitical.entity..reporting."
## [14] "TIME_PERIOD"
## [15] "Time"
## [16] "OBS_VALUE"
## [17] "Observation.value"
## [18] "OBS_FLAG"
## [19] "Observation.status..Flag..V2.structure"
## [20] "CONF_STATUS"
## [21] "Confidentiality.status..flag."
head(dados)
## STRUCTURE STRUCTURE_ID
## 1 dataflow ESTAT:NRG_CB_EM(1.0)
## 2 dataflow ESTAT:NRG_CB_EM(1.0)
## 3 dataflow ESTAT:NRG_CB_EM(1.0)
## 4 dataflow ESTAT:NRG_CB_EM(1.0)
## 5 dataflow ESTAT:NRG_CB_EM(1.0)
## 6 dataflow ESTAT:NRG_CB_EM(1.0)
## STRUCTURE_NAME freq
## 1 Supply, transformation and consumption of electricity - monthly data M
## 2 Supply, transformation and consumption of electricity - monthly data M
## 3 Supply, transformation and consumption of electricity - monthly data M
## 4 Supply, transformation and consumption of electricity - monthly data M
## 5 Supply, transformation and consumption of electricity - monthly data M
## 6 Supply, transformation and consumption of electricity - monthly data M
## Time.frequency nrg_bal Energy.balance siec
## 1 Monthly AIM Available to internal market E7000
## 2 Monthly AIM Available to internal market E7000
## 3 Monthly AIM Available to internal market E7000
## 4 Monthly AIM Available to internal market E7000
## 5 Monthly AIM Available to internal market E7000
## 6 Monthly AIM Available to internal market E7000
## Standard.international.energy.product.classification..SIEC. unit
## 1 Electricity GWH
## 2 Electricity GWH
## 3 Electricity GWH
## 4 Electricity GWH
## 5 Electricity GWH
## 6 Electricity GWH
## Unit.of.measure geo Geopolitical.entity..reporting. TIME_PERIOD Time
## 1 Gigawatt-hour FR France 2008-01 NA
## 2 Gigawatt-hour FR France 2008-02 NA
## 3 Gigawatt-hour FR France 2008-03 NA
## 4 Gigawatt-hour FR France 2008-04 NA
## 5 Gigawatt-hour FR France 2008-05 NA
## 6 Gigawatt-hour FR France 2008-06 NA
## OBS_VALUE Observation.value OBS_FLAG Observation.status..Flag..V2.structure
## 1 49891 NA NA NA
## 2 45277 NA NA NA
## 3 46605 NA NA NA
## 4 41417 NA NA NA
## 5 35384 NA NA NA
## 6 34787 NA NA NA
## CONF_STATUS Confidentiality.status..flag.
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
To check for outliers, we draw a boxplot of the observations.
# Safeguard: convert any decimal commas to points before numeric conversion
dados$OBS_VALUE <- as.numeric(gsub(",", ".", dados$OBS_VALUE))
boxplot(dados$OBS_VALUE, main = "Boxplot Of France Energy Consumption", col = "lightblue")
The boxplot shows no points beyond the whiskers, so there is no evidence of outliers.
We begin by plotting the time series to visually inspect its components.
autoplot(ts_data,
ylab = "Consumption (GWh)",
xlab = "Year",
main = "Monthly Electricity Consumption in France (2008-2025)")
The plot reveals two key characteristics: a strong 12-month seasonal pattern, with consumption peaking in the winter months, and a slowly changing, somewhat noisy trend.
To properly evaluate our model’s forecasting ability, we split the data into a training set (80% of the data) for model fitting and a test set (the remaining 20%) for validation.
# Split data: 80% for training, 20% for testing
n <- length(ts_data)
train_size <- floor(0.8 * n)
ts_train <- window(ts_data, end = time(ts_data)[train_size])
ts_test <- window(ts_data, start = time(ts_data)[train_size + 1])
cat("Training set length:", length(ts_train), "observations\n")
## Training set length: 165 observations
cat("Test set length:", length(ts_test), "observations\n")
## Test set length: 42 observations
We apply transformations to the training set only (repeating them on the test set when necessary) to avoid data leakage.
For this exploratory view we opted for classical decomposition rather than STL, since the trend and seasonality are clear and stable; it is applied to the full series for visualisation only.
# Decompose the series to show trend, seasonality, and remainder
decomposed <- decompose(ts_data)
autoplot(decomposed)
The decomposition confirms the strong seasonal pattern and a visible,
though somewhat noisy, trend.
To stabilise the variance, we apply a log transformation. Visually, the seasonal fluctuations are wider when the series is at higher levels (roughly 2008-2012) and narrower later on. This dependence of the variance on the level of the series justifies the use of log().
log_ts <- log(ts_train)
plot(log_ts)
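As a cross-check on this choice (a sketch, not part of the modelling pipeline), we can let the forecast package estimate a Box-Cox parameter; a value of lambda near 0 corresponds to the log transformation.
# Estimate the Box-Cox parameter on the training set (Guerrero's method by default);
# lambda close to 0 supports the log transformation chosen above.
BoxCox.lambda(ts_train)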
We already know from the time series plot that the level is not constant. Nevertheless, we run stationarity tests (ADF and KPSS) to check formally whether the series needs differencing.
# ADF test on the training data
adf.test(log_ts)
## Warning in adf.test(log_ts): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: log_ts
## Dickey-Fuller = -9.89, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
# KPSS test on the training data
kpss.test(log_ts)
##
## KPSS Test for Level Stationarity
##
## data: log_ts
## KPSS Level = 0.53416, Truncation lag parameter = 4, p-value = 0.03397
The ADF test's low p-value (< 0.05) rejects its null hypothesis $H_0$ of a unit root, suggesting stationarity. The KPSS test's low p-value (< 0.05) rejects its null hypothesis $H_0$ of level stationarity, indicating non-stationarity.
The two tests therefore disagree. In this ambiguous situation, it is safer to assume the series is not yet fully stationary and to difference it.
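Since the urca package is already loaded, a trend-aware KPSS variant offers a further cross-check; this is an illustrative sketch, to be read alongside the tests above.
# KPSS test for stationarity around a deterministic trend ("tau") rather than
# a constant level ("mu"); compare the statistic with the printed critical values.
summary(ur.kpss(log_ts, type = "tau"))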
We use the ndiffs() and nsdiffs() functions from the forecast package to determine how many regular and seasonal differences are needed to make the log-transformed series (log_ts) stationary.
# library(forecast)
ndiffs(log_ts) # trend
## [1] 1
nsdiffs(log_ts) # seasonality
## [1] 1
To make the series stationary, we apply differencing. Given the strong 12-month seasonality, we start with seasonal differencing (lag=12).
# Apply seasonal differencing (D=1)
ts_seasonal_diff <- diff(log_ts, lag = 12)
autoplot(ts_seasonal_diff, main = "Seasonally Differenced Series")
The seasonal pattern is gone, but a trend might still be present. We now
apply a regular first difference to remove any remaining trend.
# Apply regular differencing to the seasonally differenced series (d=1)
ts_stationary <- diff(ts_seasonal_diff, differences = 1)
autoplot(ts_stationary, main = "Seasonally and Regularly Differenced Series")
This series appears stationary. Let’s re-run the tests to confirm.
adf.test(ts_stationary)
## Warning in adf.test(ts_stationary): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: ts_stationary
## Dickey-Fuller = -8.1083, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
kpss.test(ts_stationary)
## Warning in kpss.test(ts_stationary): p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: ts_stationary
## KPSS Level = 0.028599, Truncation lag parameter = 4, p-value = 0.1
After applying seasonal differencing (lag = 12) and a first regular difference, we re-evaluated the series for stationarity. The Augmented Dickey-Fuller (ADF) test strongly rejected the null hypothesis of non-stationarity (p < 0.01), and the KPSS test failed to reject the null of stationarity (p > 0.1). These results confirm that the differenced series is stationary.
The ACF and PACF must be examined on the stationary series that will be modelled, i.e. after applying the necessary transformations and differencing. They are the basic tools for identifying candidate ARIMA orders.
# ACF/PACF on the log series (exploratory, not shown)
# acf(log_ts, main = "ACF of log-transformed series")
# pacf(log_ts, main = "PACF of log-transformed series")
# ACF/PACF on the differenced series (superseded by tsdisplay below)
# acf(ts_stationary, main = "ACF of the log-transformed, twice-differenced series")
# pacf(ts_stationary, main = "PACF of the log-transformed, twice-differenced series")
tsdisplay(ts_stationary, main="ACF/PACF of Stationary Series")
- Seasonal order (P, Q): at the seasonal lag 12, the ACF has a significant negative spike and the PACF tails off over the seasonal lags. This suggests a seasonal MA(1) term, so we set Q = 1 and P = 0.
- Non-seasonal order (p, q): in the non-seasonal lags, both the ACF and the PACF show a significant spike at lag 1, which could indicate either an AR(1) or an MA(1) term.
Based on this, a good candidate model is SARIMA(p,1,q)(0,1,1)[12]. We will let auto.arima() search for the optimal p and q values.
Both correlograms are somewhat difficult to interpret: the spikes neither cut off cleanly nor decay exponentially. For that reason we also run auto.arima(), which tests many combinations of (p,d,q)(P,D,Q)[s] and selects the model with the lowest AICc (or BIC). Note that auto.arima() chooses the differencing orders automatically, but it does not apply a log transformation unless a Box-Cox lambda is supplied; we therefore fit it to the untransformed training series (ts_train).
We will fit several models in order to compare methodologies. Since the series has several notable characteristics, it is informative to contrast different approaches and select the best-fitting model without overfitting. To achieve this, we will weigh the in-sample fit of point 3 against the forecast accuracy of point 4 to find the optimal balance between the two.
Because forecasting comes later, the data have already been divided into training and test sets; every model is fitted on the training set only, so that the forecasts can then be tested and evaluated on held-out data.
# Fit model using auto.arima on the training data
auto_fit <- auto.arima(ts_train, stepwise = FALSE, approximation = FALSE)
summary(auto_fit)
## Series: ts_train
## ARIMA(1,0,0)(0,1,2)[12] with drift
##
## Coefficients:
## ar1 sma1 sma2 drift
## 0.3712 -1.0536 0.1825 -38.3646
## s.e. 0.0769 0.0987 0.1008 4.5860
##
## sigma^2 = 3551888: log likelihood = -1378.82
## AIC=2767.64 AICc=2768.05 BIC=2782.8
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 107.0486 1790.939 1231.609 0.1500579 3.025571 0.566978 -0.02814851
auto.arima() selected a SARIMA(1,0,0)(0,1,2)[12] model with drift. This is partly in line with expectations from the initial exploratory analysis: given the behaviour of the observed data, a seasonal difference with period 12 was anticipated. Notably, however, the procedure chose no regular difference, capturing the downward trend through the drift term instead, and the order parameters p and q could not have been anticipated from the correlograms alone. This highlights the value of automated methods such as auto.arima() in capturing patterns that are not always evident in preliminary visual or statistical analyses.
Next, we perform diagnostic checks on the model’s residuals to ensure they behave like white noise (i.e., are random and uncorrelated).
# Perform residual diagnostics
checkresiduals(auto_fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0)(0,1,2)[12] with drift
## Q* = 22.728, df = 21, p-value = 0.3586
##
## Model df: 3. Total lags used: 24
The diagnostic plots show residuals centred on zero with no visible pattern, and the Ljung-Box p-value (0.359) is well above 0.05: we do not reject the null hypothesis of uncorrelated residuals, so they are consistent with white noise.
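As a supplementary check (a sketch beyond the original pipeline), we can test the residuals for approximate normality, which underpins the Gaussian prediction intervals used later.
# Shapiro-Wilk normality test on the residuals: a large p-value is consistent
# with approximately Gaussian errors.
shapiro.test(residuals(auto_fit))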
We now use this final model to forecast the next 42 observations, matching the length of the test set.
# Generate forecasts
forecast_horizon <- length(ts_test)
sarima_forecast <- forecast(auto_fit, h = forecast_horizon)
autoplot(sarima_forecast, include = forecast_horizon) +
autolayer(ts_test, series = "Actual Data") +
labs(title = "(Auto.Arima) SARIMA(2,0,0)(0,1,2)[12] Forecast vs. Actual Data",
x = "Year", y = "Consumption (GWh)") +
guides(colour = guide_legend(title = "Series")) +
theme_minimal()
The SARIMA forecast tracks the series well over time. The blue-shaded bands represent the prediction intervals around the central forecast line: the darker inner band is the 80% interval, while the lighter outer band is the wider 95% interval.
The observed values (red line), for the most part, remain within the 95% prediction intervals, suggesting that the model captures both the trend and the variability of the series. Moreover, the regular seasonal patterns are well represented, with the intervals successfully covering the extremes of the historical series, indicating a good fit of the model to the seasonality.
The widening of the bands over time reveals the increasing uncertainty in longer-term forecasts, a typical and expected behavior in time series models.
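To quantify this visual impression, we can compute the empirical coverage of the 95% interval on the test set; a minimal sketch using the interval matrices stored in the forecast object.
# Fraction of test observations falling inside the 95% prediction interval
inside_95 <- mean(as.numeric(ts_test) >= sarima_forecast$lower[, "95%"] &
                  as.numeric(ts_test) <= sarima_forecast$upper[, "95%"])
cat("Empirical coverage of the 95% interval:", round(100 * inside_95, 1), "%\n")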
Finally, we compare the forecasts to the held-out test data to evaluate the model’s accuracy.
# Calculate accuracy metrics by comparing forecast to test set
accuracy(sarima_forecast, ts_test)
## ME RMSE MAE MPE MAPE MASE
## Training set 107.0486 1790.939 1231.609 0.1500579 3.025571 0.5669780
## Test set -536.0502 1860.800 1547.569 -1.2363108 4.375022 0.7124317
## ACF1 Theil's U
## Training set -0.02814851 NA
## Test set 0.61872986 0.5181165
The accuracy metrics, particularly the Mean Absolute Percentage Error (MAPE) on the test set (4.38%), indicate that the model's forecasts are, on average, within approximately 4.4% of the actual values. This demonstrates a high level of accuracy. The Root Mean Squared Error (RMSE) gives a measure of the typical error magnitude in GWh.
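For transparency, the test-set MAPE can be reproduced directly from its definition, MAPE = 100 × mean(|actual − forecast| / |actual|); a short sketch.
# Recompute MAPE by hand; this should match the 4.38% reported by accuracy().
mape_by_hand <- 100 * mean(abs(ts_test - sarima_forecast$mean) / abs(ts_test))
round(mape_by_hand, 2)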
We now fit the manually identified SARIMA(1,1,1)(0,1,1)[12]. Because Arima() applies the regular and seasonal differences internally (d = 1, D = 1), we fit it directly to the training series rather than to the pre-differenced ts_stationary.
manual_fit <- Arima(ts_train, order = c(1,1,1),
seasonal = list(order = c(0,1,1), period = 12),
include.drift = FALSE) # usually better to exclude drift when d=1
summary(manual_fit)
## Series: ts_train
## ARIMA(1,1,1)(0,1,1)[12]
##
## Coefficients:
## ar1 ma1 sma1
## 0.3651 -0.9689 -1.0000
## s.e. 0.0821 0.0404 0.1472
##
## sigma^2 = 3428576: log likelihood = -1375.45
## AIC=2758.9 AICc=2759.17 BIC=2770.99
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -254.2903 1759.576 1236.777 -0.7515563 3.069451 0.5693571
## ACF1
## Training set -0.02769854
The coefficients are plausible and appear well estimated, each being large relative to its standard error.
Comparing the AIC, AICc and BIC with those of the first SARIMA model (SARIMA(1,0,0)(0,1,2)[12] with drift), all three criteria are lower here, indicating a better in-sample fit for the model's complexity.
Training error measures:
- ME: -254.29, a slight tendency to overestimate (the mean of actual minus fitted is negative).
- RMSE: 1759.58, the smaller the better.
- MAPE: 3.07%, values below 10% are generally good; excellent percentage accuracy.
- ACF1: -0.028, close to zero, which is desirable: no lag-1 residual autocorrelation.
To increase confidence in the model, we will check the residuals:
checkresiduals(manual_fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,1)(0,1,1)[12]
## Q* = 28.317, df = 21, p-value = 0.1314
##
## Model df: 3. Total lags used: 24
The diagnostic plots show no significant residual autocorrelation; the Ljung-Box test does not reject the white-noise hypothesis (p = 0.131).
The forecast generated by the SARIMA(1,1,1)(0,1,1)[12] model for future data was evaluated against the actual values of the test set. The main results are presented below:
# Forecast with manual SARIMA
sarima_manual_forecast <- forecast(manual_fit, h = length(ts_test))
# Plot forecasts
autoplot(sarima_manual_forecast, include = forecast_horizon) +
autolayer(ts_test, series = "Actual Data") +
labs(title = "SARIMA(1,1,1)(0,1,1)[12] Forecast vs. Actual Data",
x = "Year", y = "Consumption (GWh)") +
guides(colour = guide_legend(title = "Series")) +
theme_minimal()
As with the auto.arima model, this SARIMA forecast performs well when comparing the predicted data with the actual data. The blue-shaded bands represent the prediction intervals: the darker inner band is the 80% interval, while the lighter outer band is the 95% interval.
The actual values (red line) mostly remain within the 95% prediction intervals, suggesting that the model adequately captures the trend and variability of the series during the forecast period. Furthermore, the seasonal patterns observed in the actual data are well reproduced by the model, indicating a good fit to the seasonality present in the series.
The widening of the bands over time highlights the increasing uncertainty as the forecast extends further into the future, which is expected and characteristic of time series models.
# Forecast accuracy
accuracy(sarima_manual_forecast, ts_test)
## ME RMSE MAE MPE MAPE MASE
## Training set -254.2903 1759.576 1236.777 -0.7515563 3.069451 0.5693571
## Test set -631.7464 2016.456 1659.371 -1.4554391 4.673250 0.7639005
## ACF1 Theil's U
## Training set -0.02769854 NA
## Test set 0.65713280 0.5581256
The test-set Mean Absolute Percentage Error (MAPE, 4.67%) is still a very good value, demonstrating a high level of accuracy. Theil's U = 0.56: a value below 1 indicates that the model outperforms a naive forecast, supporting its predictive utility.
On the test set we observe a natural increase in absolute and percentage errors, though these remain at acceptable levels (MAPE of 4.67%). However, the residual autocorrelation also increases markedly (ACF1 = 0.657), suggesting that part of the structure in the test period has not been captured and that other SARIMA orders are worth testing, as sketched below. This is not severe overfitting; rather, there are signs of underfitting to the series' recent behaviour, i.e. a limited ability to generalise.
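Acting on that suggestion, the sketch below varies the non-seasonal orders p and q over {0, 1, 2} (an illustrative range, not an exhaustive search), keeping d = D = 1 and the seasonal MA(1) fixed, and ranks the candidates by AICc.
# Small illustrative grid over non-seasonal orders; d = D = 1 and seasonal (0,1,1)[12] fixed.
grid <- expand.grid(p = 0:2, q = 0:2)
grid$AICc <- apply(grid, 1, function(o) {
  fit <- try(Arima(ts_train, order = c(o[["p"]], 1, o[["q"]]),
                   seasonal = list(order = c(0, 1, 1), period = 12)),
             silent = TRUE)
  if (inherits(fit, "try-error")) NA else fit$aicc
})
grid[order(grid$AICc), ]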
We will now model with STL (Seasonal-Trend decomposition using Loess), which splits the series into three main components: trend, seasonality, and remainder. In stlm() with method = "arima", the series is decomposed, the seasonally adjusted part is modelled with an ARIMA process, and the seasonal component is added back when forecasting. STL can be tuned to be less sensitive to extreme values and more robust to outliers, and this hybrid strategy is useful when the seasonal component is complex or varies non-linearly.
stlm_fit <- stlm(ts_train, s.window = "periodic", method = "arima")
stlm_fit$model
## Series: x
## ARIMA(2,1,1)
##
## Coefficients:
## ar1 ar2 ma1
## 0.3385 0.0386 -0.9154
## s.e. 0.0854 0.0836 0.0341
##
## sigma^2 = 3291384: log likelihood = -1462.31
## AIC=2932.62 AICc=2932.87 BIC=2945.02
summary(stlm_fit)
## Length Class Mode
## stl 660 mstl numeric
## model 18 forecast_ARIMA list
## modelfunction 1 -none- function
## lambda 0 -none- NULL
## x 165 ts numeric
## series 1 -none- character
## m 1 -none- numeric
## fitted 165 ts numeric
## residuals 165 ts numeric
The ARIMA model fitted to the seasonally adjusted series after the STL decomposition was ARIMA(2,1,1).
checkresiduals(stlm_fit)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 32.585, df = 24, p-value = 0.1131
##
## Model df: 0. Total lags used: 24
The diagnostic plots show no significant residual autocorrelation; the Ljung-Box test does not reject the white-noise hypothesis (p = 0.113). The STL + ARIMA model was thus effective in capturing the seasonal and trend patterns of the series, leaving no unmodelled systematic patterns in the training residuals.
# Forecast with STL + ARIMA
stlm_forecast <- forecast(stlm_fit, h = length(ts_test))
# Plot forecasts
autoplot(stlm_forecast, include = forecast_horizon) +
autolayer(ts_test, series = "Actual Data") +
labs(title = "STL + ARIMA(2,1,1) Forecast vs. Actual",
x = "Year", y = "Consumption (GWh)") +
guides(colour = guide_legend(title = "Series")) +
theme_minimal()
The forecast generated by the STL + ARIMA(2,1,1) model shows satisfactory results over time. The blue-shaded bands represent the levels of uncertainty associated with the forecasts: the darker area, close to the central forecast line, indicates higher confidence in the estimated values, while the lighter bands indicate lower confidence.
The actual values (red line), for the most part, remain within the 95% confidence intervals, but unlike previous models, at times they fall only within the 80% range or even outside the intervals. Still, the model manages to capture the trend and variability of the time series well for most of the forecast horizon. The seasonal patterns present in the data are adequately represented, reflecting good performance of the seasonal decomposition performed by STL.
The widening of the bands over time highlights the natural increase in uncertainty for more distant periods, an expected behavior in time series forecasts.
# Forecast accuracy
accuracy(stlm_forecast, ts_test)
## ME RMSE MAE MPE MAPE MASE
## Training set -154.9727 1792.092 1284.789 -0.5790029 3.174841 0.5914597
## Test set -1975.0121 2879.010 2292.354 -5.3710130 6.317898 1.0552978
## ACF1 Theil's U
## Training set -0.004816872 NA
## Test set 0.696125655 0.7794289
The forecast plot shows that the model captures the broad shape of the series. In-sample performance is good: the training residuals are practically white noise (ACF1 ≈ 0), and MASE < 1 with MAPE ≈ 3% indicate a solid fit. Out of sample, however, performance deteriorates: all errors increase substantially and MASE > 1 (worse than the naive benchmark used for scaling). The test-set ACF1 of 0.70 signals strong autocorrelation in the forecast errors, which points to underfitting of the most recent data: recent structural changes, or an inappropriate ARIMA order for the seasonally adjusted series, may not have been captured. One possible refinement is sketched below.
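A hedged refinement, assuming nothing beyond the stlm() interface used above: let the seasonal component evolve over time (a numeric s.window; 13 is an arbitrary illustrative choice) and fit the decomposition robustly, then compare test accuracy.
# Sketch: evolving seasonal component (numeric s.window) with robust loess fitting;
# compare out-of-sample accuracy against the periodic fit above.
stlm_fit2 <- stlm(ts_train, s.window = 13, robust = TRUE, method = "arima")
accuracy(forecast(stlm_fit2, h = length(ts_test)), ts_test)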
We will now fit an ETS model to the training data. The ets() function in the forecast package automatically selects the best model by testing different combinations of error, trend, and seasonality components (additive, multiplicative, damped, none) and choosing the one with the lowest AICc. This family suits time series with systematic components such as trend and/or seasonality; unlike ARIMA, it does not require stationarity, so no differencing is applied.
# Fit an ETS model to the training data
ets_fit <- ets(ts_train)
summary(ets_fit)
## ETS(M,N,M)
##
## Call:
## ets(y = ts_train)
##
## Smoothing parameters:
## alpha = 0.2592
## gamma = 1e-04
##
## Initial states:
## l = 40906.018
## s = 1.243 1.0857 0.9475 0.8328 0.7968 0.8592
## 0.8235 0.8708 0.9344 1.1363 1.1667 1.3032
##
## sigma: 0.0445
##
## AIC AICc BIC
## 3314.285 3317.507 3360.874
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -111.0541 1824.54 1313.975 -0.4202313 3.22943 0.6048954 0.157429
The function selected an ETS(M,N,M) model. This notation stands for: multiplicative errors (M), no trend (N), and multiplicative seasonality (M).
Parameter | Value | Interpretation |
---|---|---|
alpha | 0.2592 | moderate; the level gives weight to both recent and past observations |
gamma | 1e-04 | practically zero; the seasonal pattern is essentially fixed over the period, not very adaptive |
The vector s contains the seasonal components for the 12 months of the seasonal cycle. The values vary over the months, with the winter months showing the highest indices.
The sigma value of 0.0445 measures the residual variability of the model; a relatively low value suggests well-controlled residuals. The information criteria, however, are much larger than those of the previous models, indicating that this is not the best-fitting model for this series.
The residual autocorrelation (ACF1 ≈ 0.16) suggests some remaining, though slight, dependence in the residuals. The negative mean errors (ME and MPE) indicate a slight tendency to overestimate. A plausible reason for the weaker fit is that the series requires differencing to handle its changing level, which the ETS framework does not perform, so the model adapts less well to these data.
Next, we perform diagnostic checks on the ETS model’s residuals.
# Perform residual diagnostics for the ETS model
checkresiduals(ets_fit)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,N,M)
## Q* = 53.978, df = 24, p-value = 0.0004292
##
## Model df: 0. Total lags used: 24
The diagnostic plots show clear remaining structure: the Ljung-Box test strongly rejects the white-noise hypothesis (p ≈ 0.0004). The model therefore does not pass the diagnostic checks, since the residuals do not behave like white noise.
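As a hedged follow-up, one option is to refit with an automatically selected Box-Cox transformation, which can sometimes whiten multiplicative residual structure; this is a sketch only, with no claim that it resolves the issue.
# Refit ETS with an automatic Box-Cox transformation and re-run the diagnostics.
ets_fit2 <- ets(ts_train, lambda = "auto")
checkresiduals(ets_fit2)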
# Generate forecasts using the fitted ETS model
ets_forecast <- forecast(ets_fit, h = length(ts_test))
autoplot(ets_forecast, include = forecast_horizon) +
autolayer(ts_test, series = "Actual Data") +
labs(title = "ETS Model Forecast vs. Actual",
x = "Year", y = "Consumption (GWh)") +
guides(colour = guide_legend(title = "Series")) +
theme_minimal()
As with the SARIMA models, the ETS forecast shows good results over time. The blue-shaded bands represent the prediction intervals: the darker inner band is the 80% interval, while the lighter outer band is the 95% interval.
The observed values (red line), for the most part, remain within the 95% prediction intervals, suggesting that the model effectively captures both the trend and the variability of the series. Moreover, the regular seasonal patterns are well represented, with the intervals successfully capturing the extremes of the historical series, indicating a good fit of the model to the seasonality.
The widening of the bands over time reveals the increasing uncertainty in longer-term forecasts, a typical and expected behavior in time series models.
# Calculate accuracy metrics
accuracy(ets_forecast, ts_test)
## ME RMSE MAE MPE MAPE MASE
## Training set -111.0541 1824.540 1313.975 -0.4202313 3.229430 0.6048954
## Test set -2149.3766 2856.831 2356.847 -6.0522804 6.604214 1.0849874
## ACF1 Theil's U
## Training set 0.1574290 NA
## Test set 0.6504006 0.7871231
In the training set, the model fits well, with relatively low error rates (MAPE ≈ 3.2%) and better performance than the naive model (MASE < 1). However, in the test set, errors increase significantly (MAPE >6%, MASE >1), indicating poorer out-of-sample performance. Additionally, the high residual autocorrelation in the test set (ACF1 ≈ 0.65) suggests that the model did not fully capture the series’ patterns, possibly due to underfitting or changes in the data’s dynamics.
In addition to error metrics, information criteria such as AIC (Akaike Information Criterion), AICc (corrected AIC) and BIC (Bayesian Information Criterion) are fundamental to assessing model fit, considering adherence to the data and model complexity.
These criteria penalise overly complex models and help avoid overfitting. They are particularly useful for comparing models fitted to the same set of data.
criteria <- data.frame(
Model = c("SARIMA_auto", "SARIMA_manual", "ETS", "STLM_ARIMA"),
AIC = c(AIC(auto_fit), AIC(manual_fit), ets_fit$aic, AIC(stlm_fit$model)),
AICc = c(AICc(auto_fit), AICc(manual_fit), ets_fit$aicc, AICc(stlm_fit$model)),
BIC = c(BIC(auto_fit), BIC(manual_fit), ets_fit$bic, BIC(stlm_fit$model))
)
criteria_num <- criteria
criteria_num[, -1] <- round(criteria[, -1], 2)
kable(criteria_num, caption = "Comparison of Information Criteria across Models") %>%
kable_styling(bootstrap_options = c("striped"), full_width = FALSE)
Model | AIC | AICc | BIC |
---|---|---|---|
SARIMA_auto | 2767.64 | 2768.05 | 2782.80 |
SARIMA_manual | 2758.90 | 2759.17 | 2770.99 |
ETS | 3314.29 | 3317.51 | 3360.87 |
STLM_ARIMA | 2932.62 | 2932.87 | 2945.02 |
When the AIC, AICc and BIC criteria are compared between the models, it can be seen that the SARIMA_manual model has the lowest values for all metrics, indicating a better balance between fit and model complexity. The SARIMA_auto model performed similarly, but slightly less well. In contrast, the STLM_ARIMA and ETS models had significantly higher values, suggesting that they fit the data less efficiently, possibly due to greater complexity or an inability to capture the patterns in the series. Therefore, based on the information criteria, the SARIMA_manual model is the most suitable for this time series.
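To sharpen this comparison, the criteria can be expressed as differences from the best model; by a common rule of thumb (stated here as an assumption, not a result from this analysis), a delta-AICc above roughly 10 indicates essentially no support for the weaker model.
# AICc differences relative to the best (lowest-AICc) model
criteria_num$delta_AICc <- round(criteria_num$AICc - min(criteria_num$AICc), 2)
criteria_num[order(criteria_num$delta_AICc), c("Model", "delta_AICc")]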
To evaluate the predictive ability of the various models, we examined the error metrics in the test set. The metrics considered were RMSE (root mean squared error), MAE (mean absolute error), MAPE (mean absolute percentage error) and Theil’s U. These metrics provide different perspectives on the quality of the forecasts.
The table below summarises these metrics for the four evaluated models: SARIMA with automatic parameter selection (SARIMA_auto); SARIMA with manual parameters (SARIMA_manual); the ETS model; and the hybrid STLM model with ARIMA in the residuals (STLM_ARIMA). The aim of this comparison is to identify the model with the best predictive performance and the lowest error in forecasting future values of the series.
library(knitr)
results <- rbind(
SARIMA_auto = accuracy(sarima_forecast, ts_test)[2, c("RMSE", "MAE", "MAPE", "Theil's U")],
SARIMA_manual = accuracy(sarima_manual_forecast, ts_test)[2, c("RMSE", "MAE", "MAPE", "Theil's U")],
ETS = accuracy(ets_forecast, ts_test)[2, c("RMSE", "MAE", "MAPE", "Theil's U")],
STLM_ARIMA = accuracy(stlm_forecast, ts_test)[2, c("RMSE", "MAE", "MAPE", "Theil's U")]
)
kable(round(results, 2), caption = "Forecast Accuracy Comparison on Test Set") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Model | RMSE | MAE | MAPE | Theil's U |
---|---|---|---|---|
SARIMA_auto | 1860.80 | 1547.57 | 4.38 | 0.52 |
SARIMA_manual | 2016.46 | 1659.37 | 4.67 | 0.56 |
ETS | 2856.83 | 2356.85 | 6.60 | 0.79 |
STLM_ARIMA | 2879.01 | 2292.35 | 6.32 | 0.78 |
When the models were compared based on the forecasting metrics in the test set, the SARIMA_auto model exhibited the best overall performance, showing the lowest RMSE, MAE, MAPE and Theil’s U values. This indicates greater accuracy and better predictive capacity. SARIMA_manual performed reasonably well, albeit slightly less well than the automatic model. The ETS and STLM_ARIMA models, on the other hand, produced significantly higher error values, suggesting that they are less effective for forecasting this series. Therefore, the automatic SARIMA model is the most suitable choice for future forecasting.
In addition to quantitative metrics, visually comparing the predictions with the actual data is fundamental to understanding how each model behaves over time. The graph below shows the actual values of the test set alongside the forecasts generated by each model.
This visualisation enables us to identify any systematic deviations, as well as the models’ ability to capture seasonality and trends, and any delays or one-off errors in the forecasts.
# Plot point forecasts only; autolayer() on a ts object does not take a PI argument
autoplot(ts_test, series = "Actual") +
autolayer(sarima_forecast$mean, series = "SARIMA Auto") +
autolayer(sarima_manual_forecast$mean, series = "SARIMA Manual") +
autolayer(ets_forecast$mean, series = "ETS") +
autolayer(stlm_forecast$mean, series = "STLM+ARIMA") +
labs(title = "Model Forecasts vs Actual Data", y = "GWh", x = "Time") +
guides(colour = guide_legend(title = "Forecasts")) +
theme_minimal()
By comparing the model forecasts with the actual data, we conclude that
SARIMA_auto and SARIMA_manual produced the most accurate
predictions.
When analysing the information criteria (AIC, AICc and BIC) and forecast accuracy metrics (RMSE, MAE, MAPE and Theil’s U) in the test set, the SARIMA_auto and SARIMA_manual models emerged as the most suitable for modelling the time series.
SARIMA_manual produced the lowest information criterion values, indicating the best balance between model fit and complexity. However, SARIMA_auto obtained the lowest forecast errors in terms of predictive performance, suggesting greater capacity for generalisation and accuracy in future data.
Conversely, the ETS and STLM_ARIMA models exhibited significantly higher values in both the information criteria and the error metrics, suggesting poorer model fit and forecasting performance.
Therefore, the ideal choice depends on the objective: SARIMA_manual is more suitable where historical fit matters most, while SARIMA_auto is superior for future forecasting with lower error. Overall, SARIMA models are more appropriate for these data than the ETS and STLM_ARIMA alternatives.