# Given *N* Forecasting Models, What To Do?

Fabrizio Culotta <sup>a</sup>

<sup>a</sup> Department of Political and International Science, University of Genoa, Genoa, Italy.

### 1. Introduction

It is well known that the future is uncertain. Against this uncertainty, economic agents plan their economic activity accordingly. In this planning, producing forecasts of the quantity of interest is the traditional way of uncovering possible not-yet-realized trajectories. Feedback from estimated future dynamics will then influence actual planning and business activities. This is true also for private decision-makers, like firms and other types of organizations, but especially for public policy-makers since their activities produce effects at the whole country level.

The increasing availability of data, together with progress in computational techniques, has incentivized researchers to construct more sophisticated forecasting models and to improve their accuracy. Nowadays, available forecasting models range from classical econometric models, e.g. ARIMA, to non-parametric models, e.g. exponential smoothing, to machine learning, e.g. trees and neural networks. The result is a plethora of single forecasting models available to both private and public decision-makers. Since the late '70s, a group of academic researchers has promoted the idea of competitions among different forecasting models (Makridakis et al., 1982). It emerged that statistically sophisticated models do not necessarily produce more accurate forecasts, whereas combinations of models outperform single models. Moreover, the ranking of forecasting models depends on the accuracy measure being adopted as well as on the forecast horizon. The success of the first so-called *M-competition* (M stands for Makridakis) allowed the tradition of forecasting competitions (Hyndman, 2020) to continue until today, with the recent M4 and M5 competitions (Petropoulos and Makridakis, 2020; Makridakis et al., 2021). Given a set of time series at different frequencies, several models compete to produce the best forecast. Models' performances are then ranked based on some accuracy measures. Following the idea of competition among different forecasting methods, this work compares their forecasting performances over a given time horizon. Unlike the tradition of the M-competitions, which are based on thousands of time series at different time frequencies, a single univariate time series at the monthly frequency is selected.

The motivation for this choice is to show that, even in the simplest exercise of forecasting a single time series, the ex-ante choice of a model is likely to be misleading, because a ranking of models exists and is specific to the time frequency and to the measured object of the single series. Indeed, when a set of forecasting models is available, a semi-automatic model-selection algorithm based on some performance measures would be a superior choice for the various decision-makers. In the case at hand, the choice of the monthly unemployment rate is dictated by the fact that it is the most common measure of the (mis-)functioning of the labour market and, as such, is of utmost importance for policymakers.

Forecasting models are finally ranked based on some accuracy measures. The main findings confirm that, given N forecasting models, combination techniques outperform single uncombined models in terms of accuracy and reduce the risk of adopting a single forecasting model.

Fabrizio Culotta, University of Genoa, Italy, fabrizio.culotta@edu.unige.it, 0000-0002-3910-3088


Fabrizio Culotta, *Given N Forecasting Models, What To Do*, © Author(s), CC BY 4.0, DOI 10.36253/979-12-215-0106-3.55, in Enrico di Bella, Luigi Fabbris, Corrado Lagazio (edited by), *ASA 2022 Data-Driven Decision Making. Book of short papers*, pp. 317-322, 2023, published by Firenze University Press and Genova University Press, ISBN 979-12-215-0106-3, DOI 10.36253/979-12-215-0106-3

### 2. Forecasting Models

The comparative forecasting exercise presented in this work comprises a set of 23 different uncombined and combined models. The selected time series, on which all models are trained, is the seasonally adjusted dynamics of the Italian unemployment rate at the monthly frequency over the years 2004-2019, freely available from the ISTAT data warehouse (http://dati.istat.it/). The observational period is split between a training set, from January 2004 to June 2019, and a test set, from July to December 2019. The set of selected forecasting models contains ARIMA-like models, Exponential Smoothing models, and machine learning models. It also contains combinations of them based on model averaging techniques. For the sake of brevity, the list is reported in table 1. All the computations are carried out with the statistical software R using the most recent packages. Model specifications and other details can be provided upon request.
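The split between training and test set can be sketched as follows. This is an illustrative Python snippet (the paper's computations are carried out in R), with a dummy monthly index standing in for the ISTAT series:

```python
from datetime import date

# Monthly index for the observational period January 2004 - December 2019.
# Values of the unemployment rate itself are not reproduced here.
months = [date(y, m, 1) for y in range(2004, 2020) for m in range(1, 13)]

# Split as in the paper: training set January 2004 - June 2019,
# test set July - December 2019 (a six-month forecast horizon).
split = months.index(date(2019, 7, 1))
train, test = months[:split], months[split:]
```

With this split, the training set covers 186 months and the test set the final 6 months of 2019.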


Table 1: *Selection of forecasting models*.

Once all forecasting models have been estimated, it is interesting to compare statistics of model fitting in terms of the moments of the corresponding error distribution. To this aim, table 2 below provides a rank value (column RANK) for each forecasting model based on a total score (SCORE). The latter statistic is computed as the sum of the single scores assigned in terms of mean (RANK MEAN), standard deviation (RANK SD), skewness (RANK SKEWNESS), and kurtosis (RANK KURTOSIS).
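The construction of the total SCORE can be sketched as follows. This is an illustrative Python version (the paper uses R), and the ranking criterion assumed here, that moments closer to zero rank better for mean and skewness, and smaller values rank better for dispersion and kurtosis, is an assumption, not necessarily the paper's exact rule:

```python
def moments(errors):
    """Mean, standard deviation, skewness and kurtosis of a list of errors."""
    n = len(errors)
    mean = sum(errors) / n
    m2 = sum((e - mean) ** 2 for e in errors) / n
    m3 = sum((e - mean) ** 3 for e in errors) / n
    m4 = sum((e - mean) ** 4 for e in errors) / n
    sd = m2 ** 0.5
    return mean, sd, m3 / sd ** 3, m4 / sd ** 4

def rank_models(errors_by_model):
    """Rank models by the sum of their per-moment ranks (lower is better)."""
    stats = {name: moments(e) for name, e in errors_by_model.items()}
    # One ranking key per moment: |mean|, sd, |skewness|, kurtosis.
    keys = [lambda s: abs(s[0]), lambda s: s[1],
            lambda s: abs(s[2]), lambda s: s[3]]
    scores = {name: 0 for name in stats}
    for key in keys:
        ordered = sorted(stats, key=lambda name: key(stats[name]))
        for rank, name in enumerate(ordered, start=1):
            scores[name] += rank  # SCORE = sum of the four single ranks
    return sorted(scores.items(), key=lambda kv: kv[1])
```

For example, a model with unbiased but dispersed errors and a model with biased but tight errors each win on some moments, and the total SCORE arbitrates between them.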


Table 2: *Ranking of forecasting models in terms of model fitting*.

What emerges from table 2 is that, in terms of model fitting, the best-performing forecasting model is SPL, followed by COMB5, COMB4 BG, COMB4 InvW, and so on. In detail, the error distribution of the NN model is associated with the lowest mean error and that of COMB4 BG with the lowest dispersion, whereas ARML and SPL are characterized by the lowest skewness and kurtosis, respectively. Although model fitting is an important quality of a forecasting model, it is not the definitive dimension to consider when a decision-maker needs to adopt a single forecasting model. As shown in the next section, the accuracy of the forecasting performances may deliver different conclusions.

### 3. Results

Figure 1 shows the forecasts produced by each model on the test set over a time horizon of six months. It is possible to observe that the ARML model fails to capture the dynamics of the actual data, despite its model-fitting performance being characterized by the lowest skewness. On the contrary, the COMB2 forecasts closely mimic the dynamics of the Italian unemployment rate, even though its model-fitting performance is not the best in any moment of the error distribution.

Figure 1: *Forecasts of the Italian unemployment rate. ARIMA models (solid line): ARFIMA, ARIMA, GARMA, SSARIMA. Combinations (COMB, two-dashed line): COMB1, COMB2, COMB3, COMB4 BG, COMB4 InvW, COMB4 MED. Exponential Smoothing (ES, dotted line): CES, ES, GUM, HOLT, THETA. Hybrid models (dot-dashed line): ADAM, ATA, BATS, SPL. Machine Learning models (ML, long-dashed line): ARML, BAG, NN.*

These considerations confirm that model fitting, despite being an important aspect to consider for the selection of forecasting models, does not ensure that out-of-sample forecasting performances are aligned with in-sample fitting performances. Instead, the use of various ensembling techniques delivers satisfactory results compared to those of single uncombined models. On this point, note also from figure 1 that the actual dynamics of the unemployment rate is contained within the full set of forecasts. This means that a suitable model combination can be obtained by appropriately ensembling some of the models under scrutiny.
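A forecast combination of the kind the COMB models represent can be as simple as a per-step mean or median across the individual model forecasts. A minimal Python sketch (the specific weighting schemes of the COMB models in table 1 may differ):

```python
from statistics import mean, median

def combine(forecast_paths, how="mean"):
    """Combine forecasts step by step across models.

    forecast_paths: list of equal-length forecast lists, one per model.
    how: "mean" for a simple average, "median" for the per-step median;
    both are standard combination schemes in the forecasting literature.
    """
    agg = mean if how == "mean" else median
    # zip(*...) groups, for each horizon step, the forecasts of all models.
    return [agg(step) for step in zip(*forecast_paths)]
```

The median combination is more robust to a single badly behaved model (such as ARML in figure 1), since an outlying forecast path cannot drag the combined forecast away from the bulk of the models.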

Finally, table 3 provides the values of various accuracy measures used in forecasting competitions: ME (mean error), MAE (mean absolute error), MPE (mean percentage error), MSE (mean squared error), MAPE (mean absolute percentage error), RMSSE (root mean squared scaled error), RAME (relative absolute mean error), RMAE (relative mean absolute error) and RRMSE (relative root mean squared error).
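A few of these measures can be sketched in Python as follows. The RMSSE scaling by the in-sample one-step naive forecast follows the M5 convention; the paper's exact implementation is in R and may differ:

```python
def accuracy(actual, forecast, train):
    """Compute ME, MAE, MSE, MAPE and RMSSE on a test set.

    RMSSE scales the test-set MSE by the in-sample MSE of the
    one-step naive (random walk) forecast on the training series.
    """
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    naive_mse = sum((train[t] - train[t - 1]) ** 2
                    for t in range(1, len(train))) / (len(train) - 1)
    rmsse = (mse / naive_mse) ** 0.5
    return {"ME": me, "MAE": mae, "MSE": mse, "MAPE": mape, "RMSSE": rmsse}
```

An RMSSE below 1 indicates that, on the test set, the model beats the naive forecast as calibrated on the training data.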


Table 3: *Ranking of forecasting models in terms of accuracy measures*.

As expected, the overall ranking of forecasting models in terms of accuracy measures differs from the ranking in terms of model fitting presented in table 2. Now, the best-performing forecasting model is GUM, followed by CES and SSARIMA. Among all model combinations, only COMB2 and COMB4 InvW rank well, being the fourth- and sixth-best performing models, respectively. Forecasting models SPL and ARML occupy the next-to-last and last positions, respectively.

### 4. Conclusions

Results confirm that a single universally superior model does not yet exist. On the contrary, the ranking of different forecasting models is specific to the adopted training set. For example, when the time series of interest switches to the employment rate instead of the unemployment rate, the ranking of model performances changes. Secondly, results confirm that machine learning and neural network models offer satisfactory alternatives to traditional econometric models like ARIMA or to non-parametric Exponential Smoothing. Finally, the results stress the importance of model ensembling techniques as a solution to model uncertainty as well as a tool to improve forecast accuracy (Shaub, 2020).

Overall, the flexibility provided by a rich set of forecasting models, and the possibility of combining them, together represent an advantage for decision-makers, who are often constrained to adopt a single pure, uncombined forecasting model.

### References

