#### **Sustainable development goals: classifying European countries through self-organizing maps**

Cristina Davino<sup>a</sup>, Nicola D'Alesio<sup>b</sup>

<sup>a</sup> Department of Economics and Statistics, University of Naples Federico II, Naples, Italy

<sup>b</sup> Department of Statistical Sciences, University of Padua, Padua, Italy

# **1. Introduction**


Environmental sustainability, despite being the subject of different interpretations (Hueting & Reijnders, 1998; Goodland, 1995), involves the preservation of things and qualities valued in the environment (Sutton, 2004). To achieve this goal, the United Nations (Brundtland et al., 1997) included three goals on environmental sustainability among the 17 proposed Sustainable Development Goals (SDGs): SDG 13, which refers to climate change and its impacts; SDG 14, which refers to the conservation of water and marine resources; and SDG 15, which refers to the preservation of forests. Each of these goals is measured through a set of indicators.

An important question is what Europe has achieved in terms of environmental sustainability. This paper proposes a mapping of environmental sustainability across the European territory using Machine Learning techniques. In particular, Self-Organizing Maps (SOMs), an unsupervised clustering method in the framework of artificial neural networks, are used to place and visualize European countries on a low-dimensional grid (Kohonen, 1982a, 1982b). The analysis considers the indicators related to the three environmental SDGs (SDG 13, 14, and 15) and aims to identify groups of countries with similar characteristics through dimensionality reduction, representing them on a two-dimensional map. The reference year is 2019, except for two indicators that refer to 2018 and 2020. To ensure the stability of our results, we built several SOMs with different grids and chose the best one using accuracy measures and a leave-one-out procedure.

The paper is organized as follows: Section 2 reviews the concept of environmental sustainability and the different methods of measuring it. Section 3 describes the data and methodology. Section 4 presents the results. All computations were carried out using the R packages *kohonen* (Wehrens & Buydens, 2007), *aweSOM* (Julien et al., 2021), *FactoMineR* (Husson et al., 2016), and *factoextra* (Kassambara & Mundt, 2017).

# **2. Literature review**

Sustainability has a long and complex history. It was discussed at the end of the eighteenth century as a "derivation from the noun sustenance" (Jenkins & Schröder, 2013). A key point of sustainability is its orientation toward the future: resources must be managed so that they remain available to future generations (Hueting & Reijnders, 1998). Because sustainability itself is difficult to define, environmental sustainability has also been subject to different interpretations and discussions over time (Goodland, 1995). A suitable definition is the following: "the ability to maintain things or qualities that are valued in the physical environment" (Sutton, 2004). This definition seems more appropriate because it covers the sustenance of all facets of physical capital.

Defining environmental sustainability is crucial to provide policymakers with precise information on its development, but an equally important step is understanding how to measure it. Efforts to build indicators of environmental sustainability have led to several evaluation exercises. Among the best known are the SDGs proposed by the United Nations, which cover all fields of sustainability (economic, social, and environmental). They are not exempt from criticism, as they are recent and, according to experts, must be integrated and updated constantly (Hak et al., 2016). Notwithstanding this, they provide an accurate framework of indicators to measure sustainability. In particular, SDGs 13, 14, and 15 comprise indicators that measure environmental sustainability: climate change and its impacts (Climate Action - SDG 13); conservation and sustainable use of the oceans, seas, and marine resources, together with the reduction of marine pollution and water acidification (Life Below Water - SDG 14); and protection, restoration, and sustainable use of terrestrial, inland, and mountain ecosystems (Life on Land - SDG 15).

Cristina Davino, University of Naples Federico II, Italy, cdavino@unina.it, 0000-0003-1154-4209 Nicola D'Alesio, University of Campania Luigi Vanvitelli, Italy, nicola.dalesio@unicampania.it

Referee List (DOI 10.36253/fup\_referee\_list)

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Cristina Davino, Nicola D'Alesio, *Sustainable development goals: classifying European countries through self-organizing maps*, © Author(s), CC BY 4.0, DOI 10.36253/979-12-215-0106-3.17, in Enrico di Bella, Luigi Fabbris, Corrado Lagazio (edited by), *ASA 2022 Data-Driven Decision Making. Book of short papers*, pp. 95-100, 2023, published by Firenze University Press and Genova University Press, ISBN 979-12-215-0106-3, DOI 10.36253/979-12-215-0106-3

# **3. Data and methods**

## **3.1 Data**

Data for the three SDGs considered are available on the Eurostat website. We used 2019 as the base year; only two indicators of SDG-15 refer to different years (2018 and 2020). Of the 21 available indicators, a subset of 14 was used for the analysis: the others were excluded because they are not available at the national level for every country and/or because they contained more than 80% missing values. The units of analysis are 31 countries<sup>1</sup>. Table 1 lists the indicators considered, divided by SDG, with the acronyms used in the figures and tables of results and with some descriptive statistics<sup>2</sup>. The asterisk ("\*") denotes indicators with negative polarity with respect to the concept of environmental sustainability. Missing data and outliers have not been treated, because the SOM algorithm can impute a value for the missing data and isolate the effect of the outliers in the extreme regions of the network. All the indicators considered were standardized before applying the SOM algorithm.
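As an illustration of the preprocessing described above, the following minimal sketch standardizes one indicator while leaving missing entries untouched and computes the variation coefficient reported among the descriptive statistics. It is not the authors' R code: the function names, the use of `None` for missing values, the sample (n - 1) standard deviation, and the toy values are all assumptions made for the example.

```python
import math


def _mean_and_sd(column):
    """Mean and sample standard deviation of the observed (non-None) values."""
    observed = [x for x in column if x is not None]
    mean = sum(observed) / len(observed)
    # Assumption: sample standard deviation with an n - 1 denominator.
    variance = sum((x - mean) ** 2 for x in observed) / (len(observed) - 1)
    return mean, math.sqrt(variance)


def standardize(column):
    """Z-score one indicator; missing entries stay missing."""
    mean, sd = _mean_and_sd(column)
    return [None if x is None else (x - mean) / sd for x in column]


def variation_coefficient(column):
    """Standard deviation divided by the absolute mean (VC)."""
    mean, sd = _mean_and_sd(column)
    return sd / abs(mean)


# Hypothetical indicator values for four units, one of them missing.
indicator = [2.0, 4.0, None, 6.0]
z_scores = standardize(indicator)
vc = variation_coefficient(indicator)
```

Keeping the missing entries in place, rather than dropping the unit, mirrors the choice stated in the text of leaving missing data to the SOM algorithm.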

## **3.2 Methods**

Self-Organizing Maps (SOMs) are artificial neural networks that produce a low-dimensional representation of the input space, allowing a dimensionality reduction (Kohonen, 1982a, 1982b, 1990). They use a neighborhood function to preserve the topological properties of the input space. The SOM algorithm is divided into two phases: the competitive phase and the cooperative phase. In the competitive phase, for each input vector the neuron at minimum distance from the input is selected as the winner. Although several distance measures are available, the Euclidean distance is the most widely used (Miljković, 2017). The neurons within a grid interact with each other through a neighborhood function such as the Gaussian function. In the cooperative phase, the weights are modified in topologically related subsets, on which similar weight updates are performed. During learning, not only the weight vector of the winning neuron is updated but also those of its neighbors on the grid, which therefore end up responding to similar inputs. This is achieved through the neighborhood function, which is centered on the winning neuron and decreases with the grid distance from it.

Once the units (the weights) have been initialized, the training phase starts. SOMs are trained through unsupervised learning, which can be carried out sequentially (online algorithm: a single statistical unit is presented to the network at a time) or in batch mode (batch algorithm: all statistical units are presented to the network at once) (Matsushita & Nishio, 2020). In our case, the online algorithm was preferred. We chose the Euclidean distance as the distance measure and the Gaussian function as the neighborhood function.
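The two phases can be sketched in a few lines of Python. This is only an illustrative online SOM with Euclidean distance and a Gaussian neighborhood, not the *kohonen* implementation used in the paper: the initialization from randomly drawn observations, the linear decay of the learning rate and of the neighborhood radius, the default parameter values, and the toy data are all assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Competitive phase: index of the unit closest to x (squared Euclidean)."""
    distances = [sum((wj - xj) ** 2 for wj, xj in zip(w, x)) for w in weights]
    return distances.index(min(distances))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Online SOM training on a rectangular rows x cols grid (illustrative)."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumption: units are initialized with randomly drawn observations.
    weights = [list(rng.choice(data)) for _ in grid]
    sigma0 = max(rows, cols) / 2.0  # assumed initial neighborhood radius
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:  # online mode: one statistical unit at a time
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)  # assumed linear decay of the learning rate
            sigma = sigma0 + (0.5 - sigma0) * frac  # radius moves from sigma0 towards 0.5
            winner_row, winner_col = grid[best_matching_unit(weights, x)]
            # Cooperative phase: the winner and its grid neighbors move towards x,
            # weighted by a Gaussian of their grid distance from the winner.
            for (r, c), w in zip(grid, weights):
                grid_d2 = (r - winner_row) ** 2 + (c - winner_col) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for j in range(len(w)):
                    w[j] += lr * h * (x[j] - w[j])
            step += 1
    return grid, weights


# Hypothetical standardized observations (two indicators, six units).
toy = [[-1.2, 0.3], [-0.8, 0.1], [0.9, -0.4], [1.1, 0.0], [0.0, 1.5], [0.2, -1.3]]
grid, weights = train_som(toy, rows=2, cols=3)
assignment = [grid[best_matching_unit(weights, x)] for x in toy]
```

Here `assignment` gives, for each unit, the grid cell of its winning neuron. The sketch assumes complete data; handling of missing values is deliberately left out.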

<sup>1</sup> Belgium, Bulgaria, Czechia, Denmark, Germany, Estonia, Ireland, Greece, Spain, France, Croatia, Italy, Cyprus, Latvia, Lithuania, Luxembourg, Hungary, Malta, the Netherlands, Austria, Poland, Portugal, Romania, Slovenia, Slovakia, Finland, Sweden, Iceland, Norway, Switzerland, and the United Kingdom.

<sup>2</sup> VC means variation coefficient.


Table 1: SDGs Indicators.

The most widespread accuracy measures used in the SOM framework are the following:


SOMs prove to be a useful and innovative tool for our study, as they reduce dimensionality and provide a two- or three-dimensional representation of European countries across the different facets of environmental sustainability. Many studies have applied these networks in environmental contexts, including in Italy (Carboni et al., 2015).

# **4. Results**

After the indicator selection described in Section 3.1, the analysis is carried out through the following steps: identification of the best SOM through the estimation of several SOMs and accuracy evaluation, clustering of countries, visualization, and interpretation of the results.

## **4.1 Identification of the best self-organizing map**

It is well known that one of the main drawbacks of neural networks is the selection of the architecture. We therefore trained several networks with different numbers of neurons, using grids compatible with the sample size, and selected the best SOM by comparing the accuracy measures. The results in Table 2 show that the SOMs with 3x5 and 5x4 grids have very similar performance.


*Table 2 - SOMs trials: evaluation with accuracy measures*

The choice between these two SOMs was made by taking into account the stability of the results, that is, their sensitivity to specific statistical units (countries). The two networks were trained using a leave-one-out procedure, i.e., each was estimated n times, every time on n-1 countries, excluding one country at a time. The aim is to assess how sensitive the results shown in Table 2 are to the exclusion of even a single country. Results are shown in Figure 1, which plots the percentage of variability explained and the quantization error of the 3x5 (left-hand side) and 5x4 (right-hand side) networks trained excluding one country at a time. We used these two measures because the other two accuracy measures give the same information about the topographic qualities of a SOM. The red lines represent the values of the reference network (trained on all statistical units and shown in Table 2). The two graphs show that the accuracy of the 3x5 SOM improves (bottom-right quadrant) for 5 of the excluded statistical units, while the 5x4 SOM is much more unstable, as it improves for more than half of them.

*Figure 1 - Scatter Plot of the accuracy measures for the two SOMs (grid 3x5 – left; grid 5x4 - right)*
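The leave-one-out check described above can be sketched as follows. The two measures are computed under assumed definitions (quantization error as the mean squared distance between each unit and its closest prototype, explained variability as one minus its ratio to the total variance), which may differ in detail from those of the R packages used in the paper. The function `fit_stub` is a hypothetical stand-in for SOM training, and the units and values are invented for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def quantization_error(data, prototypes):
    """Mean squared distance from each unit to its closest prototype (assumed definition)."""
    return sum(min(sq_dist(x, w) for w in prototypes) for x in data) / len(data)


def explained_variance(data, prototypes):
    """Percentage of total variance captured by the prototypes (assumed definition)."""
    centre = [sum(col) / len(data) for col in zip(*data)]
    total = sum(sq_dist(x, centre) for x in data) / len(data)
    return 100.0 * (1.0 - quantization_error(data, prototypes) / total)


def leave_one_out(data, labels, fit):
    """Refit once per excluded unit and score the fit on the remaining n - 1 units."""
    results = {}
    for i, label in enumerate(labels):
        subset = data[:i] + data[i + 1:]
        prototypes = fit(subset)
        results[label] = (explained_variance(subset, prototypes),
                          quantization_error(subset, prototypes))
    return results


def fit_stub(data):
    """Hypothetical stand-in for SOM training: one prototype per half of the sorted data."""
    ordered = sorted(data)
    half = len(ordered) // 2

    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    return [mean(ordered[:half]), mean(ordered[half:])]


# Hypothetical units; "E" is far from the others.
labels = ["A", "B", "C", "D", "E"]
data = [[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0], [10.0, 0.0]]

reference = fit_stub(data)
ref_ev = explained_variance(data, reference)
ref_qe = quantization_error(data, reference)
loo = leave_one_out(data, labels, fit_stub)
# Units whose exclusion raises explained variability and lowers quantization error,
# i.e. those that would fall in the bottom-right quadrant of a plot like Figure 1.
improving = [u for u, (ev, qe) in loo.items() if ev > ref_ev and qe < ref_qe]
```

Passing the training routine as the `fit` argument keeps the stability check independent of the specific network, so the same loop could be run for each candidate grid.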

Although the 3x5 network is the more stable of the two, its optimal configuration still has to be found by establishing which of the five countries in the bottom right-hand quadrant should be eliminated. The proposed procedure proceeds one step at a time, starting from the elimination of the statistical unit that provides the most benefit (Hungary) and ending with the one that provides the least benefit (Iceland). Table 6 shows the accuracy measures of these 3x5 SOMs and highlights that the best compromise is obtained by eliminating Hungary alone, because all the accuracy measures worsen if two or more countries are removed from the analysis.



## **4.2 Classification of countries**

Once a stable SOM has been achieved, it is possible to identify the best partition of countries by applying a clustering procedure. The SOM built without Hungary is shown in Figure 2 where colors highlight the four groups identified using the Ward criterion.

*Figure 2 - Visualization of the SOM 3x5 and the partition in four groups*
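A grouping based on the Ward criterion can be illustrated with a small agglomerative procedure: at each step, the two clusters whose merger produces the smallest increase in within-cluster sum of squares are joined. The sketch below assumes that the criterion is applied to the prototype vectors of the map units; the function names and the toy prototypes are hypothetical, and the code is not the routine used by the authors.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def ward_clusters(points, k):
    """Agglomerative clustering with the Ward criterion, stopped at k clusters."""
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na = len(clusters[a]["members"])
                nb = len(clusters[b]["members"])
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = na * nb / (na + nb) * sq_dist(clusters[a]["centroid"],
                                                     clusters[b]["centroid"])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["members"]), len(cb["members"])
        merged = {
            "members": ca["members"] + cb["members"],
            # Centroid of the merged cluster: size-weighted mean of the two centroids.
            "centroid": [(na * p + nb * q) / (na + nb)
                         for p, q in zip(ca["centroid"], cb["centroid"])],
        }
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return [sorted(c["members"]) for c in clusters]


# Hypothetical prototype vectors of four map units.
prototypes = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
partition = ward_clusters(prototypes, k=2)
```

Each country would then inherit the group of the map unit to which it is assigned.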

The characterization of the clusters is typically done by comparing, for each indicator, the group averages with the averages on the total sample. Due to lack of space, we report the result of this comparison and the countries belonging to each cluster directly below:


− Group 4, in red, is composed of Italy, Spain, Portugal, Greece, Croatia, Cyprus, Austria, Slovenia, Bulgaria, Poland, Slovakia, and Luxembourg (these are mainly countries in the Mediterranean region). These countries have a high number of protected areas (SDG-15) but high net emissions (SDG-13). It can be tagged as the group of "Countries close to achieving SDG-15 but far from achieving SDG-13".
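The comparison between group averages and overall averages used to characterize the clusters can be sketched as follows. The unit names, indicator names, and values are hypothetical, and the treatment of missing values (skipped when averaging) is an assumption.

```python
def mean(values):
    """Mean of the observed (non-missing) values, or None if none is observed."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed) if observed else None


def characterize(table, groups):
    """Difference between each group mean and the overall mean, per indicator."""
    indicators = sorted({name for row in table.values() for name in row})
    overall = {ind: mean([row.get(ind) for row in table.values()])
               for ind in indicators}
    profile = {}
    for g in sorted(set(groups.values())):
        members = [unit for unit in table if groups[unit] == g]
        profile[g] = {}
        for ind in indicators:
            group_mean = mean([table[unit].get(ind) for unit in members])
            if group_mean is None or overall[ind] is None:
                profile[g][ind] = None  # indicator not observed in this group
            else:
                profile[g][ind] = group_mean - overall[ind]
    return profile


# Hypothetical units, indicators, and group labels.
table = {
    "U1": {"ind_a": 1.0, "ind_b": 0.5},
    "U2": {"ind_a": 3.0, "ind_b": None},
    "U3": {"ind_a": 5.0, "ind_b": -0.5},
    "U4": {"ind_a": 7.0, "ind_b": -1.5},
}
groups = {"U1": 1, "U2": 1, "U3": 2, "U4": 2}
profile = characterize(table, groups)
```

A positive entry means that the group lies above the overall average on that indicator; for indicators with negative polarity, the sign has to be read in reverse.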

The previous classification separates countries that are closer to achieving a goal from those that are very far from some or all SDGs. This information could help policymakers assess what has been achieved so far, what policies need to be implemented to achieve the goals, and which policies in the countries furthest from attainment have either not been implemented or not been implemented appropriately. The main limitation of this paper is the typical black-box effect of neural networks, even if SOMs provide at least a visualization of the grid. A possible future development is a comparison with other techniques such as cluster analysis, although in that case it will be necessary to address the problem of missing data, which SOMs are capable of handling. A further problem is the small sample size, which has been addressed by studying the stability of the results through a leave-one-out procedure.

# **References**

