This report details the analysis of monthly electricity consumption in France from January 2008 to March 2025. The primary objective is to develop a robust time series model capable of accurately forecasting future consumption.
The methodology follows the Box-Jenkins approach for building a Seasonal Autoregressive Integrated Moving Average (SARIMA) model. The process involves several key stages: exploratory analysis, transformation and differencing, model identification and estimation, residual diagnostics, and forecast evaluation.
The dataset is split into a training set for model building and a test set for forecast validation, as recommended for honest out-of-sample evaluation.
First, we load the necessary R libraries and the dataset. The data represents the monthly electricity available to the internal market in France, measured in Gigawatt-hours (GWh).
#install.packages(c("forecast", "urca", "lmtest", "tseries", "ggplot2", "MuMIn", "kableExtra"))
library(MuMIn)
library(forecast)
library(urca)
library(lmtest)
library(tseries)
library(ggplot2)
library(kableExtra)
# Load the data
dados <- read.csv("data/data2.csv")
# Create a time series object
# The data is monthly (frequency=12) and starts in January 2008.
ts_data <- ts(dados$OBS_VALUE, start = c(2008, 1), frequency = 12)
names(dados)
## [1] "STRUCTURE"
## [2] "STRUCTURE_ID"
## [3] "STRUCTURE_NAME"
## [4] "freq"
## [5] "Time.frequency"
## [6] "nrg_bal"
## [7] "Energy.balance"
## [8] "siec"
## [9] "Standard.international.energy.product.classification..SIEC."
## [10] "unit"
## [11] "Unit.of.measure"
## [12] "geo"
## [13] "Geopolitical.entity..reporting."
## [14] "TIME_PERIOD"
## [15] "Time"
## [16] "OBS_VALUE"
## [17] "Observation.value"
## [18] "OBS_FLAG"
## [19] "Observation.status..Flag..V2.structure"
## [20] "CONF_STATUS"
## [21] "Confidentiality.status..flag."
head(dados)
## STRUCTURE STRUCTURE_ID
## 1 dataflow ESTAT:NRG_CB_EM(1.0)
## 2 dataflow ESTAT:NRG_CB_EM(1.0)
## 3 dataflow ESTAT:NRG_CB_EM(1.0)
## 4 dataflow ESTAT:NRG_CB_EM(1.0)
## 5 dataflow ESTAT:NRG_CB_EM(1.0)
## 6 dataflow ESTAT:NRG_CB_EM(1.0)
## STRUCTURE_NAME freq
## 1 Supply, transformation and consumption of electricity - monthly data M
## 2 Supply, transformation and consumption of electricity - monthly data M
## 3 Supply, transformation and consumption of electricity - monthly data M
## 4 Supply, transformation and consumption of electricity - monthly data M
## 5 Supply, transformation and consumption of electricity - monthly data M
## 6 Supply, transformation and consumption of electricity - monthly data M
## Time.frequency nrg_bal Energy.balance siec
## 1 Monthly AIM Available to internal market E7000
## 2 Monthly AIM Available to internal market E7000
## 3 Monthly AIM Available to internal market E7000
## 4 Monthly AIM Available to internal market E7000
## 5 Monthly AIM Available to internal market E7000
## 6 Monthly AIM Available to internal market E7000
## Standard.international.energy.product.classification..SIEC. unit
## 1 Electricity GWH
## 2 Electricity GWH
## 3 Electricity GWH
## 4 Electricity GWH
## 5 Electricity GWH
## 6 Electricity GWH
## Unit.of.measure geo Geopolitical.entity..reporting. TIME_PERIOD Time
## 1 Gigawatt-hour FR France 2008-01 NA
## 2 Gigawatt-hour FR France 2008-02 NA
## 3 Gigawatt-hour FR France 2008-03 NA
## 4 Gigawatt-hour FR France 2008-04 NA
## 5 Gigawatt-hour FR France 2008-05 NA
## 6 Gigawatt-hour FR France 2008-06 NA
## OBS_VALUE Observation.value OBS_FLAG Observation.status..Flag..V2.structure
## 1 49891 NA NA NA
## 2 45277 NA NA NA
## 3 46605 NA NA NA
## 4 41417 NA NA NA
## 5 35384 NA NA NA
## 6 34787 NA NA NA
## CONF_STATUS Confidentiality.status..flag.
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
To check for outliers, we draw a boxplot of the observations.
# Safeguard: convert any decimal commas to points before numeric conversion
dados$OBS_VALUE <- as.numeric(gsub(",", ".", dados$OBS_VALUE))
boxplot(dados$OBS_VALUE, main = "Boxplot Of France Energy Consumption", col = "lightblue")
The boxplot shows no points beyond the whiskers, so there is no evidence of outliers.
We begin by plotting the time series to visually inspect its components.
autoplot(ts_data,
ylab = "Consumption (GWh)",
xlab = "Year",
main = "Monthly Electricity Consumption in France (2008-2025)")
The plot reveals two key characteristics: a strong 12-month seasonal pattern, with consumption peaking in the winter months, and a slowly changing, somewhat noisy trend.
To properly evaluate our model’s forecasting ability, we split the data into a training set (80% of the data) for model fitting and a test set (the remaining 20%) for validation.
# Split data: 80% for training, 20% for testing
n <- length(ts_data)
train_size <- floor(0.8 * n)
ts_train <- window(ts_data, end = time(ts_data)[train_size])
ts_test <- window(ts_data, start = time(ts_data)[train_size + 1])
cat("Training set length:", length(ts_train), "observations\n")
## Training set length: 165 observations
cat("Test set length:", length(ts_test), "observations\n")
## Test set length: 42 observations
We apply transformations to the training set only (repeating them on the test set when necessary) to avoid data leakage.
For this exploratory view we opted for classical decomposition rather than STL, since the trend and seasonality are clear and stable; it is applied to the full series for visualisation only.
# Decompose the series to show trend, seasonality, and remainder
decomposed <- decompose(ts_data)
autoplot(decomposed)
The decomposition confirms the strong seasonal pattern and a visible,
though somewhat noisy, trend.
To stabilise the variance, we apply a log transformation. Visually, the seasonal fluctuations are wider when the series is at higher levels (roughly 2008-2012) and narrower later on. This dependence of the variance on the level of the series justifies the use of log().
log_ts <- log(ts_train)
plot(log_ts)
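As a cross-check on this choice (a sketch, not part of the modelling pipeline), we can let the forecast package estimate a Box-Cox parameter; a value of lambda near 0 corresponds to the log transformation.
# Estimate the Box-Cox parameter on the training set (Guerrero's method by default);
# lambda close to 0 supports the log transformation chosen above.
BoxCox.lambda(ts_train)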
We already know from the time series plot that the level is not constant. Nevertheless, we run stationarity tests (ADF and KPSS) to check formally whether the series needs differencing.
# ADF test on the training data
adf.test(log_ts)
## Warning in adf.test(log_ts): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: log_ts
## Dickey-Fuller = -9.89, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
# KPSS test on the training data
kpss.test(log_ts)
##
## KPSS Test for Level Stationarity
##
## data: log_ts
## KPSS Level = 0.53416, Truncation lag parameter = 4, p-value = 0.03397
The ADF test's low p-value (< 0.05) rejects its null hypothesis $H_0$ of a unit root, suggesting stationarity. The KPSS test's low p-value (< 0.05) rejects its null hypothesis $H_0$ of level stationarity, indicating non-stationarity.
The two tests therefore disagree. In this ambiguous situation, it is safer to assume the series is not yet fully stationary and to difference it.
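Since the urca package is already loaded, a trend-aware KPSS variant offers a further cross-check; this is an illustrative sketch, to be read alongside the tests above.
# KPSS test for stationarity around a deterministic trend ("tau") rather than
# a constant level ("mu"); compare the statistic with the printed critical values.
summary(ur.kpss(log_ts, type = "tau"))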
We use the ndiffs() and nsdiffs() functions from the forecast package to determine how many regular and seasonal differences are needed to make the log-transformed series (log_ts) stationary.
# library(forecast)
ndiffs(log_ts) # trend
## [1] 1
nsdiffs(log_ts) # seasonality
## [1] 1
To make the series stationary, we apply differencing. Given the strong 12-month seasonality, we start with seasonal differencing (lag=12).
# Apply seasonal differencing (D=1)
ts_seasonal_diff <- diff(log_ts, lag = 12)
autoplot(ts_seasonal_diff, main = "Seasonally Differenced Series")
The seasonal pattern is gone, but a trend might still be present. We now
apply a regular first difference to remove any remaining trend.
# Apply regular differencing to the seasonally differenced series (d=1)
ts_stationary <- diff(ts_seasonal_diff, differences = 1)
autoplot(ts_stationary, main = "Seasonally and Regularly Differenced Series")
This series appears stationary. Let’s re-run the tests to confirm.
adf.test(ts_stationary)
## Warning in adf.test(ts_stationary): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: ts_stationary
## Dickey-Fuller = -8.1083, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
kpss.test(ts_stationary)
## Warning in kpss.test(ts_stationary): p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: ts_stationary
## KPSS Level = 0.028599, Truncation lag parameter = 4, p-value = 0.1
After applying seasonal differencing (lag = 12) and a first regular difference, we re-evaluated the series for stationarity. The Augmented Dickey-Fuller (ADF) test strongly rejected the null hypothesis of non-stationarity (p < 0.01), and the KPSS test failed to reject the null of stationarity (p > 0.1). These results confirm that the differenced series is stationary.
The ACF and PACF must be examined on the stationary series that will be modelled, i.e. after applying the necessary transformations and differencing. They are the basic tools for identifying candidate ARIMA orders.
# ACF/PACF on the log series (exploratory, not shown)
# acf(log_ts, main = "ACF of log-transformed series")
# pacf(log_ts, main = "PACF of log-transformed series")
# ACF/PACF on the differenced series (superseded by tsdisplay below)
# acf(ts_stationary, main = "ACF of the log-transformed, twice-differenced series")
# pacf(ts_stationary, main = "PACF of the log-transformed, twice-differenced series")
tsdisplay(ts_stationary, main="ACF/PACF of Stationary Series")
- Seasonal order (P, Q): at the seasonal lag 12, the ACF has a significant negative spike and the PACF tails off over the seasonal lags. This suggests a seasonal MA(1) term, so we set Q = 1 and P = 0.
- Non-seasonal order (p, q): in the non-seasonal lags, both the ACF and the PACF show a significant spike at lag 1, which could indicate either an AR(1) or an MA(1) term.
Based on this, a good candidate model is SARIMA(p,1,q)(0,1,1)[12]. We will let auto.arima() search for the optimal p and q values.
Both correlograms are somewhat difficult to interpret: the spikes neither cut off cleanly nor decay exponentially. For that reason we also run auto.arima(), which tests many combinations of (p,d,q)(P,D,Q)[s] and selects the model with the lowest AICc (or BIC). Note that auto.arima() chooses the differencing orders automatically, but it does not apply a log transformation unless a Box-Cox lambda is supplied; we therefore fit it to the untransformed training series (ts_train).
We will fit several models in order to compare methodologies. Since the series has several notable characteristics, it is informative to contrast different approaches and select the best-fitting model without overfitting. To achieve this, we will weigh the in-sample fit of point 3 against the forecast accuracy of point 4 to find the optimal balance between the two.
Because forecasting comes later, the data have already been divided into training and test sets; every model is fitted on the training set only, so that the forecasts can then be tested and evaluated on held-out data.
# Fit model using auto.arima on the training data
auto_fit <- auto.arima(ts_train, stepwise = FALSE, approximation = FALSE)
summary(auto_fit)
## Series: ts_train
## ARIMA(1,0,0)(0,1,2)[12] with drift
##
## Coefficients:
## ar1 sma1 sma2 drift
## 0.3712 -1.0536 0.1825 -38.3646
## s.e. 0.0769 0.0987 0.1008 4.5860
##
## sigma^2 = 3551888: log likelihood = -1378.82
## AIC=2767.64 AICc=2768.05 BIC=2782.8
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 107.0486 1790.939 1231.609 0.1500579 3.025571 0.566978 -0.02814851
auto.arima() selected a SARIMA(1,0,0)(0,1,2)[12] model with drift. This is partly in line with expectations from the initial exploratory analysis: given the behaviour of the observed data, a seasonal difference with period 12 was anticipated. Notably, however, the procedure chose no regular difference, capturing the downward trend through the drift term instead, and the order parameters p and q could not have been anticipated from the correlograms alone. This highlights the value of automated methods such as auto.arima() in capturing patterns that are not always evident in preliminary visual or statistical analyses.
Next, we perform diagnostic checks on the model’s residuals to ensure they behave like white noise (i.e., are random and uncorrelated).
# Perform residual diagnostics
checkresiduals(auto_fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0)(0,1,2)[12] with drift
## Q* = 22.728, df = 21, p-value = 0.3586
##
## Model df: 3. Total lags used: 24
The diagnostic plots show residuals centred on zero with no visible pattern, and the Ljung-Box p-value (0.359) is well above 0.05: we do not reject the null hypothesis of uncorrelated residuals, so they are consistent with white noise.
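As a supplementary check (a sketch beyond the original pipeline), we can test the residuals for approximate normality, which underpins the Gaussian prediction intervals used later.
# Shapiro-Wilk normality test on the residuals: a large p-value is consistent
# with approximately Gaussian errors.
shapiro.test(residuals(auto_fit))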
We now use this final model to forecast the next 42 observations, matching the length of the test set.
# Generate forecasts
forecast_horizon <- length(ts_test)
sarima_forecast <- forecast(auto_fit, h = forecast_horizon)
autoplot(sarima_forecast, include = forecast_horizon) +
autolayer(ts_test, series = "Actual Data") +
labs(title = "(Auto.Arima) SARIMA(2,0,0)(0,1,2)[12] Forecast vs. Actual Data",
x = "Year", y = "Consumption (GWh)") +
guides(colour = guide_legend(title = "Series")) +
theme_minimal()
The SARIMA forecast tracks the series well over time. The blue-shaded bands represent the prediction intervals around the central forecast line: the darker inner band is the 80% interval, while the lighter outer band is the wider 95% interval.
The observed values (red line), for the most part, remain within the 95% prediction intervals, suggesting that the model captures both the trend and the variability of the series. Moreover, the regular seasonal patterns are well represented, with the intervals successfully covering the extremes of the historical series, indicating a good fit of the model to the seasonality.
The widening of the bands over time reveals the increasing uncertainty in longer-term forecasts, a typical and expected behavior in time series models.
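To quantify this visual impression, we can compute the empirical coverage of the 95% interval on the test set; a minimal sketch using the interval matrices stored in the forecast object.
# Fraction of test observations falling inside the 95% prediction interval
inside_95 <- mean(as.numeric(ts_test) >= sarima_forecast$lower[, "95%"] &
                  as.numeric(ts_test) <= sarima_forecast$upper[, "95%"])
cat("Empirical coverage of the 95% interval:", round(100 * inside_95, 1), "%\n")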
Finally, we compare the forecasts to the held-out test data to evaluate the model’s accuracy.
# Calculate accuracy metrics by comparing forecast to test set
accuracy(sarima_forecast, ts_test)
## ME RMSE MAE MPE MAPE MASE
## Training set 107.0486 1790.939 1231.609 0.1500579 3.025571 0.5669780
## Test set -536.0502 1860.800 1547.569 -1.2363108 4.375022 0.7124317
## ACF1 Theil's U
## Training set -0.02814851 NA
## Test set 0.61872986 0.5181165
The accuracy metrics, particularly the Mean Absolute Percentage Error (MAPE) on the test set (4.38%), indicate that the model's forecasts are, on average, within approximately 4.4% of the actual values. This demonstrates a high level of accuracy. The Root Mean Squared Error (RMSE) gives a measure of the typical error magnitude in GWh.
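For transparency, the test-set MAPE can be reproduced directly from its definition, MAPE = 100 × mean(|actual − forecast| / |actual|); a short sketch.
# Recompute MAPE by hand; this should match the 4.38% reported by accuracy().
mape_by_hand <- 100 * mean(abs(ts_test - sarima_forecast$mean) / abs(ts_test))
round(mape_by_hand, 2)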
We now fit the manually identified SARIMA(1,1,1)(0,1,1)[12]. Because Arima() applies the regular and seasonal differences internally (d = 1, D = 1), we fit it directly to the training series rather than to the pre-differenced ts_stationary.
manual_fit <- Arima(ts_train, order = c(1,1,1),
seasonal = list(order = c(0,1,1), period = 12),
include.drift = FALSE) # usually better to exclude drift when d=1
summary(manual_fit)
## Series: ts_train
## ARIMA(1,1,1)(0,1,1)[12]
##
## Coefficients:
## ar1 ma1 sma1
## 0.3651 -0.9689 -1.0000
## s.e. 0.0821 0.0404 0.1472
##
## sigma^2 = 3428576: log likelihood = -1375.45
## AIC=2758.9 AICc=2759.17 BIC=2770.99
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -254.2903 1759.576 1236.777 -0.7515563 3.069451 0.5693571
## ACF1
## Training set -0.02769854
The coefficients are plausible and appear well estimated, each being large relative to its standard error.
Comparing the AIC, AICc and BIC with those of the first SARIMA model (SARIMA(1,0,0)(0,1,2)[12] with drift), all three criteria are lower here, indicating a better in-sample fit for the model's complexity.
Training error measures:
- ME: -254.29, a slight tendency to overestimate (the mean of actual minus fitted is negative).
- RMSE: 1759.58, the smaller the better.
- MAPE: 3.07%, values below 10% are generally good; excellent percentage accuracy.
- ACF1: -0.028, close to zero, which is desirable: no lag-1 residual autocorrelation.
To increase confidence in the model, we will check the residuals:
checkresiduals(manual_fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,1)(0,1,1)[12]
## Q* = 28.317, df = 21, p-value = 0.1314
##
## Model df: 3. Total lags used: 24
The diagnostic plots show no significant residual autocorrelation; the Ljung-Box test does not reject the white-noise hypothesis (p = 0.131).
The forecast generated by the SARIMA(1,1,1)(0,1,1)[12] model for future data was evaluated against the actual values of the test set. The main results are presented below:
# Forecast with manual SARIMA
sarima_manual_forecast <- forecast(manual_fit, h = length(ts_test))
# Plot forecasts
autoplot(sarima_manual_forecast, include = forecast_horizon) +
autolayer(ts_test, series = "Actual Data") +
labs(title = "SARIMA(1,1,1)(0,1,1)[12] Forecast vs. Actual Data",
x = "Year", y = "Consumption (GWh)") +
guides(colour = guide_legend(title = "Series")) +
theme_minimal()
As with the auto.arima model, this SARIMA forecast performs well when comparing the predicted data with the actual data. The blue-shaded bands represent the prediction intervals: the darker inner band is the 80% interval, while the lighter outer band is the 95% interval.
The actual values (red line) mostly remain within the 95% prediction intervals, suggesting that the model adequately captures the trend and variability of the series during the forecast period. Furthermore, the seasonal patterns observed in the actual data are well reproduced by the model, indicating a good fit to the seasonality present in the series.
The widening of the bands over time highlights the increasing uncertainty as the forecast extends further into the future, which is expected and characteristic of time series models.
# Forecast accuracy
accuracy(sarima_manual_forecast, ts_test)
## ME RMSE MAE MPE MAPE MASE
## Training set -254.2903 1759.576 1236.777 -0.7515563 3.069451 0.5693571
## Test set -631.7464 2016.456 1659.371 -1.4554391 4.673250 0.7639005
## ACF1 Theil's U
## Training set -0.02769854 NA
## Test set 0.65713280 0.5581256
The test-set Mean Absolute Percentage Error (MAPE, 4.67%) is still a very good value, demonstrating a high level of accuracy. Theil's U = 0.56: a value below 1 indicates that the model outperforms a naive forecast, supporting its predictive utility.
On the test set we observe a natural increase in absolute and percentage errors, though these remain at acceptable levels (MAPE of 4.67%). However, the residual autocorrelation also increases markedly (ACF1 = 0.657), suggesting that part of the structure in the test period has not been captured and that other SARIMA orders are worth testing, as sketched below. This is not severe overfitting; rather, there are signs of underfitting to the series' recent behaviour, i.e. a limited ability to generalise.
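Acting on that suggestion, the sketch below varies the non-seasonal orders p and q over {0, 1, 2} (an illustrative range, not an exhaustive search), keeping d = D = 1 and the seasonal MA(1) fixed, and ranks the candidates by AICc.
# Small illustrative grid over non-seasonal orders; d = D = 1 and seasonal (0,1,1)[12] fixed.
grid <- expand.grid(p = 0:2, q = 0:2)
grid$AICc <- apply(grid, 1, function(o) {
  fit <- try(Arima(ts_train, order = c(o[["p"]], 1, o[["q"]]),
                   seasonal = list(order = c(0, 1, 1), period = 12)),
             silent = TRUE)
  if (inherits(fit, "try-error")) NA else fit$aicc
})
grid[order(grid$AICc), ]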
We will now model with STL (Seasonal-Trend decomposition using Loess), which splits the series into three main components: trend, seasonality, and remainder. In stlm() with method = "arima", the series is decomposed, the seasonally adjusted part is modelled with an ARIMA process, and the seasonal component is added back when forecasting. STL can be tuned to be less sensitive to extreme values and more robust to outliers, and this hybrid strategy is useful when the seasonal component is complex or varies non-linearly.
stlm_fit <- stlm(ts_train, s.window = "periodic", method = "arima")
stlm_fit$model
## Series: x
## ARIMA(2,1,1)
##
## Coefficients:
## ar1 ar2 ma1
## 0.3385 0.0386 -0.9154
## s.e. 0.0854 0.0836 0.0341
##
## sigma^2 = 3291384: log likelihood = -1462.31
## AIC=2932.62 AICc=2932.87 BIC=2945.02
summary(stlm_fit)
## Length Class Mode
## stl 660 mstl numeric
## model 18 forecast_ARIMA list
## modelfunction 1 -none- function
## lambda 0 -none- NULL
## x 165 ts numeric
## series 1 -none- character
## m 1 -none- numeric
## fitted 165 ts numeric
## residuals 165 ts numeric
The ARIMA model fitted to the seasonally adjusted series after the STL decomposition was ARIMA(2,1,1).
checkresiduals(stlm_fit)
##
## Ljung-Box test
##
## data: Residuals
## Q* = 32.585, df = 24, p-value = 0.1131
##
## Model df: 0. Total lags used: 24
The diagnostic plots show no significant residual autocorrelation; the Ljung-Box test does not reject the white-noise hypothesis (p = 0.113). The STL + ARIMA model was thus effective in capturing the seasonal and trend patterns of the series, leaving no unmodelled systematic patterns in the training residuals.
# Forecast with STL + ARIMA
stlm_forecast <- forecast(stlm_fit, h = length(ts_test))
# Plot forecasts
autoplot(stlm_forecast, include = forecast_horizon) +
autolayer(ts_test, series = "Actual Data") +
labs(title = "STL + ARIMA(2,1,1) Forecast vs. Actual",
x = "Year", y = "Consumption (GWh)") +
guides(colour = guide_legend(title = "Series")) +
theme_minimal()
The forecast generated by the STL + ARIMA(2,1,1) model shows satisfactory results over time. The blue-shaded bands represent the levels of uncertainty associated with the forecasts: the darker area, close to the central forecast line, indicates higher confidence in the estimated values, while the lighter bands indicate lower confidence.
The actual values (red line), for the most part, remain within the 95% confidence intervals, but unlike previous models, at times they fall only within the 80% range or even outside the intervals. Still, the model manages to capture the trend and variability of the time series well for most of the forecast horizon. The seasonal patterns present in the data are adequately represented, reflecting good performance of the seasonal decomposition performed by STL.
The widening of the bands over time highlights the natural increase in uncertainty for more distant periods, an expected behavior in time series forecasts.
# Forecast accuracy
accuracy(stlm_forecast, ts_test)
## ME RMSE MAE MPE MAPE MASE
## Training set -154.9727 1792.092 1284.789 -0.5790029 3.174841 0.5914597
## Test set -1975.0121 2879.010 2292.354 -5.3710130 6.317898 1.0552978
## ACF1 Theil's U
## Training set -0.004816872 NA
## Test set 0.696125655 0.7794289
The forecast plot shows that the model captures the broad shape of the series. In-sample performance is good: the training residuals are practically white noise (ACF1 ≈ 0), and MASE < 1 with MAPE ≈ 3% indicate a solid fit. Out of sample, however, performance deteriorates: all errors increase substantially and MASE > 1 (worse than the naive benchmark used for scaling). The test-set ACF1 of 0.70 signals strong autocorrelation in the forecast errors, which points to underfitting of the most recent data: recent structural changes, or an inappropriate ARIMA order for the seasonally adjusted series, may not have been captured. One possible refinement is sketched below.
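A hedged refinement, assuming nothing beyond the stlm() interface used above: let the seasonal component evolve over time (a numeric s.window; 13 is an arbitrary illustrative choice) and fit the decomposition robustly, then compare test accuracy.
# Sketch: evolving seasonal component (numeric s.window) with robust loess fitting;
# compare out-of-sample accuracy against the periodic fit above.
stlm_fit2 <- stlm(ts_train, s.window = 13, robust = TRUE, method = "arima")
accuracy(forecast(stlm_fit2, h = length(ts_test)), ts_test)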
We will now fit an ETS model to the training data. The ets() function in the forecast package automatically selects the best model by testing different combinations of error, trend, and seasonality components (additive, multiplicative, damped, none) and choosing the one with the lowest AICc. This family suits time series with systematic components such as trend and/or seasonality; unlike ARIMA, it does not require stationarity, so no differencing is applied.
# Fit an ETS model to the training data
ets_fit <- ets(ts_train)
summary(ets_fit)
## ETS(M,N,M)
##
## Call:
## ets(y = ts_train)
##
## Smoothing parameters:
## alpha = 0.2592
## gamma = 1e-04
##
## Initial states:
## l = 40906.018
## s = 1.243 1.0857 0.9475 0.8328 0.7968 0.8592
## 0.8235 0.8708 0.9344 1.1363 1.1667 1.3032
##
## sigma: 0.0445
##
## AIC AICc BIC
## 3314.285 3317.507 3360.874
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -111.0541 1824.54 1313.975 -0.4202313 3.22943 0.6048954 0.157429
The function selected an ETS(M,N,M) model. This notation stands for: multiplicative errors (M), no trend (N), and multiplicative seasonality (M).
Parameter | Value | Interpretation |
---|---|---|
alpha | 0.2592 | moderate; the level gives weight to both recent and past observations |
gamma | 1e-04 | practically zero; the seasonal pattern is essentially fixed over the period, not very adaptive |
The vector s contains the seasonal components for the 12 months of the seasonal cycle. The values vary over the months, with the winter months showing the highest indices.
The sigma value of 0.0445 measures the residual variability of the model; a relatively low value suggests well-controlled residuals. The information criteria, however, are much larger than those of the previous models, indicating that this is not the best-fitting model for this series.
The residual autocorrelation (ACF1 ≈ 0.16) suggests some remaining, though slight, dependence in the residuals. The negative mean errors (ME and MPE) indicate a slight tendency to overestimate. A plausible reason for the weaker fit is that the series requires differencing to handle its changing level, which the ETS framework does not perform, so the model adapts less well to these data.
Next, we perform diagnostic checks on the ETS model’s residuals.
# Perform residual diagnostics for the ETS model
checkresiduals(ets_fit)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,N,M)
## Q* = 53.978, df = 24, p-value = 0.0004292
##
## Model df: 0. Total lags used: 24
The diagnostic plots show clear remaining structure: the Ljung-Box test strongly rejects the white-noise hypothesis (p ≈ 0.0004). The model therefore does not pass the diagnostic checks, since the residuals do not behave like white noise.
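As a hedged follow-up, one option is to refit with an automatically selected Box-Cox transformation, which can sometimes whiten multiplicative residual structure; this is a sketch only, with no claim that it resolves the issue.
# Refit ETS with an automatic Box-Cox transformation and re-run the diagnostics.
ets_fit2 <- ets(ts_train, lambda = "auto")
checkresiduals(ets_fit2)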
# Generate forecasts using the fitted ETS model
ets_forecast <- forecast(ets_fit, h = length(ts_test))
autoplot(ets_forecast, include = forecast_horizon) +
autolayer(ts_test, series = "Actual Data") +
labs(title = "ETS Model Forecast vs. Actual",
x = "Year", y = "Consumption (GWh)") +
guides(colour = guide_legend(title = "Series")) +
theme_minimal()
As with the SARIMA models, the ETS forecast shows good results over time. The blue-shaded bands represent the prediction intervals: the darker inner band is the 80% interval, while the lighter outer band is the 95% interval.
The observed values (red line), for the most part, remain within the 95% prediction intervals, suggesting that the model effectively captures both the trend and the variability of the series. Moreover, the regular seasonal patterns are well represented, with the intervals successfully capturing the extremes of the historical series, indicating a good fit of the model to the seasonality.
The widening of the bands over time reveals the increasing uncertainty in longer-term forecasts, a typical and expected behavior in time series models.
# Calculate accuracy metrics
accuracy(ets_forecast, ts_test)
## ME RMSE MAE MPE MAPE MASE
## Training set -111.0541 1824.540 1313.975 -0.4202313 3.229430 0.6048954
## Test set -2149.3766 2856.831 2356.847 -6.0522804 6.604214 1.0849874
## ACF1 Theil's U
## Training set 0.1574290 NA
## Test set 0.6504006 0.7871231
In the training set, the model fits well, with relatively low error rates (MAPE ≈ 3.2%) and better performance than the naive model (MASE < 1). However, in the test set, errors increase significantly (MAPE >6%, MASE >1), indicating poorer out-of-sample performance. Additionally, the high residual autocorrelation in the test set (ACF1 ≈ 0.65) suggests that the model did not fully capture the series’ patterns, possibly due to underfitting or changes in the data’s dynamics.
In addition to error metrics, information criteria such as AIC (Akaike Information Criterion), AICc (corrected AIC) and BIC (Bayesian Information Criterion) are fundamental to assessing model fit, considering adherence to the data and model complexity.
These criteria penalise overly complex models and help avoid overfitting. They are particularly useful for comparing models fitted to the same set of data.
criteria <- data.frame(
Model = c("SARIMA_auto", "SARIMA_manual", "ETS", "STLM_ARIMA"),
AIC = c(AIC(auto_fit), AIC(manual_fit), ets_fit$aic, AIC(stlm_fit$model)),
AICc = c(AICc(auto_fit), AICc(manual_fit), ets_fit$aicc, AICc(stlm_fit$model)),
BIC = c(BIC(auto_fit), BIC(manual_fit), ets_fit$bic, BIC(stlm_fit$model))
)
criteria_num <- criteria
criteria_num[, -1] <- round(criteria[, -1], 2)
kable(criteria_num, caption = "Comparison of Information Criteria across Models") %>%
kable_styling(bootstrap_options = c("striped"), full_width = FALSE)
Model | AIC | AICc | BIC |
---|---|---|---|
SARIMA_auto | 2767.64 | 2768.05 | 2782.80 |
SARIMA_manual | 2758.90 | 2759.17 | 2770.99 |
ETS | 3314.29 | 3317.51 | 3360.87 |
STLM_ARIMA | 2932.62 | 2932.87 | 2945.02 |
When the AIC, AICc and BIC criteria are compared between the models, it can be seen that the SARIMA_manual model has the lowest values for all metrics, indicating a better balance between fit and model complexity. The SARIMA_auto model performed similarly, but slightly less well. In contrast, the STLM_ARIMA and ETS models had significantly higher values, suggesting that they fit the data less efficiently, possibly due to greater complexity or an inability to capture the patterns in the series. Therefore, based on the information criteria, the SARIMA_manual model is the most suitable for this time series.
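To sharpen this comparison, the criteria can be expressed as differences from the best model; by a common rule of thumb (stated here as an assumption, not a result from this analysis), a delta-AICc above roughly 10 indicates essentially no support for the weaker model.
# AICc differences relative to the best (lowest-AICc) model
criteria_num$delta_AICc <- round(criteria_num$AICc - min(criteria_num$AICc), 2)
criteria_num[order(criteria_num$delta_AICc), c("Model", "delta_AICc")]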
To evaluate the predictive ability of the various models, we examined the error metrics in the test set. The metrics considered were RMSE (root mean squared error), MAE (mean absolute error), MAPE (mean absolute percentage error) and Theil’s U. These metrics provide different perspectives on the quality of the forecasts.
The table below summarises these metrics for the four evaluated models: SARIMA with automatic parameter selection (SARIMA_auto); SARIMA with manual parameters (SARIMA_manual); the ETS model; and the hybrid STLM model with ARIMA in the residuals (STLM_ARIMA). The aim of this comparison is to identify the model with the best predictive performance and the lowest error in forecasting future values of the series.
library(knitr)
results <- rbind(
SARIMA_auto = accuracy(sarima_forecast, ts_test)[2, c("RMSE", "MAE", "MAPE", "Theil's U")],
SARIMA_manual = accuracy(sarima_manual_forecast, ts_test)[2, c("RMSE", "MAE", "MAPE", "Theil's U")],
ETS = accuracy(ets_forecast, ts_test)[2, c("RMSE", "MAE", "MAPE", "Theil's U")],
STLM_ARIMA = accuracy(stlm_forecast, ts_test)[2, c("RMSE", "MAE", "MAPE", "Theil's U")]
)
kable(round(results, 2), caption = "Forecast Accuracy Comparison on Test Set") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Model | RMSE | MAE | MAPE | Theil's U |
---|---|---|---|---|
SARIMA_auto | 1860.80 | 1547.57 | 4.38 | 0.52 |
SARIMA_manual | 2016.46 | 1659.37 | 4.67 | 0.56 |
ETS | 2856.83 | 2356.85 | 6.60 | 0.79 |
STLM_ARIMA | 2879.01 | 2292.35 | 6.32 | 0.78 |
When the models were compared based on the forecasting metrics in the test set, the SARIMA_auto model exhibited the best overall performance, showing the lowest RMSE, MAE, MAPE and Theil’s U values. This indicates greater accuracy and better predictive capacity. SARIMA_manual performed reasonably well, albeit slightly less well than the automatic model. The ETS and STLM_ARIMA models, on the other hand, produced significantly higher error values, suggesting that they are less effective for forecasting this series. Therefore, the automatic SARIMA model is the most suitable choice for future forecasting.
In addition to quantitative metrics, visually comparing the predictions with the actual data is fundamental to understanding how each model behaves over time. The graph below shows the actual values of the test set alongside the forecasts generated by each model.
This visualisation enables us to identify any systematic deviations, as well as the models’ ability to capture seasonality and trends, and any delays or one-off errors in the forecasts.
# Plot point forecasts only; autolayer() on a ts object does not take a PI argument
autoplot(ts_test, series = "Actual") +
autolayer(sarima_forecast$mean, series = "SARIMA Auto") +
autolayer(sarima_manual_forecast$mean, series = "SARIMA Manual") +
autolayer(ets_forecast$mean, series = "ETS") +
autolayer(stlm_forecast$mean, series = "STLM+ARIMA") +
labs(title = "Model Forecasts vs Actual Data", y = "GWh", x = "Time") +
guides(colour = guide_legend(title = "Forecasts")) +
theme_minimal()
By comparing the model forecasts with the actual data, we conclude that
SARIMA_auto and SARIMA_manual produced the most accurate
predictions.
When analysing the information criteria (AIC, AICc and BIC) and forecast accuracy metrics (RMSE, MAE, MAPE and Theil’s U) in the test set, the SARIMA_auto and SARIMA_manual models emerged as the most suitable for modelling the time series.
SARIMA_manual produced the lowest information criterion values, indicating the best balance between model fit and complexity. However, SARIMA_auto obtained the lowest forecast errors in terms of predictive performance, suggesting greater capacity for generalisation and accuracy in future data.
Conversely, the ETS and STLM_ARIMA models exhibited significantly higher values in both the information criteria and the error metrics, suggesting poorer model fit and forecasting performance.
Therefore, the ideal choice depends on the objective: SARIMA_manual is more suitable where historical fit matters most, while SARIMA_auto is superior for future forecasting with lower error. Overall, SARIMA models are more appropriate for these data than the ETS and STLM_ARIMA alternatives.