#### Fabrizio Antolini<sup>a</sup>, Samuele Cesarini<sup>a</sup>, Francesco Giovanni Truglia<sup>b</sup>

**Spread of Covid-19 epidemic in Italy between March 2020 and February 2021: empirical evidence at provincial level**

<sup>a</sup> Department of Business Communication, University of Teramo, Teramo, Italy.

<sup>b</sup> Istat, Directorate for Environmental and Territorial Statistics, Rome, Italy.

## **1. Introduction**

The literature on the determinants of Covid-19 contagion is recent and has not yet reached generally accepted conclusions on the factors that explain territorial differences in the severity of the Covid-19 impact (Moosa and Khatatbeh, 2021). The rate of contagion depends on many and varied factors that are not easy to interpret and that must be analysed taking their spatial component into account (Cutrini and Salvati, 2021).

To this end, convergence models were used, in which the initial level and growth of observed infections in a given province were related to the level and growth rate of infections in all other provinces. The model was estimated for each of the three waves that occurred in Italy from March 2020 to February 2021. It also includes environmental (Azuma et al., 2020; Copat et al., 2020) and demographic (Goumenou et al., 2020) factors as controls in a conditional β-convergence framework (Truglia, 2021).

Spatial regression models have been widely used in epidemiological studies (Guo et al., 2020; Liu et al., 2020; Zhao et al., 2020). To date, however, only a few studies have investigated the association between sociodemographic and environmental determinants and the spatial convergence of Covid-19 infection incidence. This study aims to address that research gap.

This work further contributes to the study and understanding of the impact of demographic and environmental parameters on the spread of Covid-19 cases by adopting a spatial regression approach.

The work is divided into four sections. The first describes the construction of the data panel and its recoding into indicators and indices. The second outlines the spatial approach used to implement the conditional β-convergence model, which investigates any convergence processes in the transmission of contagion between the areal units under study. The third presents the results. The fourth discusses the findings and offers some final considerations and possible implications for future studies.

## **2. Data**

In the following analysis, a balanced panel of data referring to the 107 Italian provinces was used. The data on contagion were retrieved and processed from the Civil Protection repository in the 'data-provinces' section. From these, for each of the 107 Italian provinces, the contagion rates for the three waves and their respective durations and distances (in days) were calculated. The spatial context data were collected from the ISTAT data warehouse and the ISPRA environmental data yearbook.

The infection rate was measured as the total number of registered cases of Covid-19 infection in period t, where t denotes the first (I), second (II) or third (III) wave, relative to a standard reference population of 100,000 individuals.
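As an illustration of the rate just described, the following minimal sketch assumes that the denominator is the resident population of the province and that the result is expressed per 100,000 residents. The function name and the input figures are hypothetical and do not come from the Civil Protection or ISTAT data.

```python
def infection_rate_per_100k(cases: int, population: int) -> float:
    """Registered cases in a wave per 100,000 residents (assumed definition)."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep the arithmetic exact for integer inputs.
    return cases * 100_000 / population


# Hypothetical province: 4,500 registered cases and 300,000 residents.
rate = infection_rate_per_100k(4_500, 300_000)
print(rate)  # 1500.0
```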

Fabrizio Antolini, Samuele Cesarini, Francesco Giovanni Truglia, *Spread of Covid-19 epidemic in Italy between March 2020 and February 2021: empirical evidence at provincial level*, © Author(s), CC BY 4.0, DOI 10.36253/979-12-215-0106-3.19, in Enrico di Bella, Luigi Fabbris, Corrado Lagazio (edited by), *ASA 2022 Data-Driven Decision Making. Book of short papers*, pp. 107-112, 2023, published by Firenze University Press and Genova University Press, ISBN 979-12-215-0106-3, DOI 10.36253/979-12-215-0106-3

The other indices relating to contagion (duration and distance), calculated for each province, do not require statistical formalisation: the first is the number of days elapsed between the beginning and the end of a wave, and the second is the number of days between the end of one wave and the beginning of the next. The indices that serve as explanatory variables, and that act as controlling factors for the convergence of infection, are:


As these control variables have different units of measurement, they are standardised for use in the convergence model.
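The text does not specify which standardisation was applied. A common choice is the z-score, sketched below under that assumption; the function name and the input values are hypothetical.

```python
import numpy as np


def standardise(Z: np.ndarray) -> np.ndarray:
    """Column-wise z-scores: subtract each column mean, divide by its std."""
    Z = np.asarray(Z, dtype=float)
    return (Z - Z.mean(axis=0)) / Z.std(axis=0)


# Hypothetical controls for three provinces, in different units
# (column 0: inhabitants per km^2, column 1: degrees Celsius).
Z_raw = np.array([[120.0, 12.5],
                  [450.0, 15.0],
                  [2600.0, 16.1]])
Z_std = standardise(Z_raw)
```

After this transformation each column has zero mean and unit standard deviation, so the estimated coefficients of the controls are comparable in magnitude.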

## **3. Method**

There are various procedures for analysing territorial convergence. The present study uses the best-known convergence concepts in the literature (Barro and Sala-i-Martin, 1992; Mankiw, 1992; Arbia, 2005), in particular β-convergence. This approach originates directly from the neoclassical theory of economic growth of Solow and Swan (Solow and Swan, 1956). This type of convergence describes an economic environment in which a poorer country grows faster than a richer one in terms of per capita income. Unlike formal models, which require a measure of physical and/or human capital, informal models grant greater freedom, because they need not be traceable to the variables used in growth accounting (Alexiadis, 2010). The conditional β-convergence model can therefore be written as follows (equation 1):

$$
\ln\left(\frac{Y_{i,t}}{Y_{i,0}}\right) = \beta_0 + \beta_1 \ln\left(Y_{i,0}\right) + \gamma Z + \varepsilon_i \tag{1}
$$

Where,

i and t denote, respectively, the spatial unit and the time reference at which the observation Y is measured

β<sub>0</sub> is the intercept

Z is the matrix of the *n* control variables that are assumed to influence the growth rate

γ is the vector of coefficients of the control variables

ε<sub>i</sub> is the error term with zero mean and variance σ<sup>2</sup>

ln(Y<sub>i,t</sub>/Y<sub>i,0</sub>) is the natural logarithm of the growth rate

ln(Y<sub>i,0</sub>) is the natural logarithm of the initial level

A statistically significant and negative β<sub>1</sub> coefficient supports the β-convergence hypothesis.
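A minimal sketch of how equation 1 could be estimated by ordinary least squares is given below. It uses synthetic data, not the data of the paper: the sample of 107 units mirrors the number of provinces, but the initial levels, the two controls and the "true" coefficients are invented for illustration, and the function name is hypothetical.

```python
import numpy as np


def beta_convergence_ols(y0, yt, Z):
    """OLS fit of ln(yt / y0) on an intercept, ln(y0) and the controls Z.

    Returns the coefficients ordered as [beta_0, beta_1, gamma_1, ..., gamma_n].
    """
    y0 = np.asarray(y0, dtype=float)
    yt = np.asarray(yt, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(y0), -1)
    growth = np.log(yt / y0)
    design = np.column_stack([np.ones(len(y0)), np.log(y0), Z])
    coef, *_ = np.linalg.lstsq(design, growth, rcond=None)
    return coef


# Synthetic example: 107 units, two standardised controls (all values invented).
rng = np.random.default_rng(0)
n = 107
y0 = rng.uniform(50.0, 500.0, n)          # hypothetical initial infection rates
Z = rng.standard_normal((n, 2))           # hypothetical standardised controls
gamma_true = np.array([0.2, -0.1])
growth = 3.0 - 0.4 * np.log(y0) + Z @ gamma_true + rng.normal(0.0, 0.05, n)
yt = y0 * np.exp(growth)

coef = beta_convergence_ols(y0, yt, Z)
beta_1 = coef[1]                          # negative here by construction
```

Because the synthetic data were generated with a negative slope on the initial level, the estimated `beta_1` is negative, which is the pattern that the β-convergence hypothesis looks for. Assessing statistical significance would additionally require standard errors, which this sketch does not compute.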

The β-convergence model thus captures whether territorial gaps in a specific aspect widen or narrow over a given time span (in our study, between the beginning and the end of each of the three successive waves). This research departs from the conventional convergence strategy by focusing on spatial convergence. An important issue in territorial convergence analysis is the recognised need to account for functional relationships between provinces. It is therefore appropriate to use procedures capable of considering the structure of connections between the units of analysis (Guliyev, 2020). In other words, the β-convergence model can be transformed so that it accounts for the spatial proximity of the N observations by means of a proximity matrix W, whose elements w<sub>ij</sub> take the value 1 if units i and j are contiguous and 0 otherwise.

Many spatial models can be constructed from this common basis, depending on the spatial effects to be investigated. Below we propose the conditional β-convergence model (in matrix form) with a spatial autoregressive lag of the dependent variable (SAR) (equation 2).

$$\mathbf{y} = \rho \mathbf{W}\mathbf{y} + \beta \mathbf{X} + \gamma \mathbf{Z} + \boldsymbol{\varepsilon} \tag{2}$$

Where,

**y** is the matrix containing the natural logarithm of the growth rate at time *t* and province *i*

**X** is the matrix containing the natural logarithm of the initial level

**Z** is the matrix of the *n* control variables that are assumed to influence the growth rate

**ρ (Rho)** denotes the spatial autoregressive coefficient

**W** represents the contiguity matrix of the provinces

β and γ are the coefficients to be estimated

**ε** is the error term with zero mean and variance σ<sup>2</sup>.
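The sketch below shows one way a SAR model of this form can be estimated by maximum likelihood: a grid search over ρ on the concentrated Gaussian log-likelihood, with the log-determinant of I − ρW computed from the eigenvalues of W. This is a generic textbook procedure and not necessarily the routine used by the authors. The ring-shaped, row-standardised weight matrix, the single regressor and all numerical values are assumptions made only for illustration; in the model of equation 2 the regressor matrix would hold an intercept, the log initial level and the controls.

```python
import numpy as np


def sar_ml(y, regressors, W, rho_grid=None):
    """Grid-search ML for y = rho * W y + regressors @ coef + e (Gaussian e)."""
    y = np.asarray(y, dtype=float)
    R = np.asarray(regressors, dtype=float)
    n = y.shape[0]
    if rho_grid is None:
        rho_grid = np.linspace(-0.95, 0.95, 191)
    eigvals = np.linalg.eigvals(W)       # det(I - rho W) = prod(1 - rho * lambda_i)
    R_pinv = np.linalg.pinv(R)           # least-squares projector
    Wy = W @ y
    best = None
    for rho in rho_grid:
        filtered = y - rho * Wy          # (I - rho W) y
        coef = R_pinv @ filtered         # OLS on the spatially filtered variable
        resid = filtered - R @ coef
        sigma2 = resid @ resid / n
        log_det = np.sum(np.log(np.abs(1.0 - rho * eigvals)))
        loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0) + log_det
        if best is None or loglik > best["loglik"]:
            best = {"rho": float(rho), "coef": coef,
                    "sigma2": float(sigma2), "loglik": float(loglik)}
    return best


# Synthetic check: n units on a ring, each with two neighbours (weights 0.5).
rng = np.random.default_rng(1)
n = 200
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
x = rng.standard_normal(n)
R = np.column_stack([np.ones(n), x])
rho_true, coef_true = 0.4, np.array([1.0, 2.0])
eps = rng.normal(0.0, 0.5, n)
y = np.linalg.solve(np.eye(n) - rho_true * W, R @ coef_true + eps)

fit = sar_ml(y, R, W)
```

The grid search keeps the example short; dedicated spatial econometrics libraries use numerical optimisation and also report standard errors, which are needed for the significance tests discussed in the results.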

A contiguity matrix W of the queen type was used. Under this criterion, provinces that share at least one side or vertex are considered contiguous (LeSage, 1998).
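To illustrate the queen criterion, the sketch below builds a binary contiguity matrix for a hypothetical regular grid of cells, where sharing a side or a corner counts as contiguity. Real provinces are irregular polygons, so an actual application would derive W from their boundary geometries; the grid and the function name are assumptions made for illustration.

```python
import numpy as np


def queen_contiguity_grid(n_rows: int, n_cols: int) -> np.ndarray:
    """Binary queen contiguity matrix for cells of an n_rows x n_cols grid."""
    n = n_rows * n_cols
    W = np.zeros((n, n), dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            # Visit the eight surrounding cells (sides and corners).
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbour
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        W[i, rr * n_cols + cc] = 1
    return W


W_grid = queen_contiguity_grid(3, 3)
neighbour_counts = W_grid.sum(axis=1)
```

On a 3 x 3 grid the corner cells have three neighbours, the edge cells five and the central cell eight. A rook criterion, by contrast, would count only shared sides.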

## **4. Results**

Table 1 shows the results of estimating the spatial autoregressive (SAR) specification of the conditional β-convergence model.


**Table 1.** Results conditional β-convergence (SAR): (a) first wave; (b) second wave; (c) third wave

Signif. codes: 0 <= '\*\*\*' < 0.001 < '\*\*' < 0.01 < '\*' < 0.05 < '.' < 0.1 < '' < 1

*Source: authors' elaboration of collected data*

The regression results show that the coefficient of the initial level of the infection rate, β<sub>1</sub>, is negative and significant for all three waves analysed in this study. This supports the convergence hypothesis (Baumol, 1986).

Unlike the OLS estimates, the spatial regression parameters were estimated by maximum likelihood (ML), so the R<sup>2</sup> index cannot be used to assess the goodness of fit of the model. Goodness of fit is therefore assessed by comparing the AIC statistics (Akaike, 1974) calculated for the OLS and SAR models (Table 2).
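For reference, the AIC is defined as 2k − 2 ln L, where k is the number of estimated parameters and ln L the maximised log-likelihood; the model with the lower value is preferred. The sketch below applies the definition to two hypothetical log-likelihoods, which are invented for illustration and are not the values of Table 2.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical values: the SAR model has one extra parameter (rho).
aic_ols = aic(log_likelihood=-120.0, n_params=6)
aic_sar = aic(log_likelihood=-110.0, n_params=7)
print(aic_ols, aic_sar)  # 252.0 234.0
```

In this invented case the SAR model is preferred, because its gain in log-likelihood outweighs the penalty for the additional parameter.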


**Table 2.** Goodness of fit conditional β-convergence (SAR): (a) first wave; (b) second wave; (c) third wave

*Source: authors' elaboration of collected data*

The AIC calculated for the SAR model is always lower than that of the OLS model. The coefficient ρ (Rho) is statistically significant, as is its relationship to the dependent variable (Wald test). Therefore, the spatial model fits the data better and interprets the observed convergence process more accurately.

## **5. Discussion**

The results obtained are robust and consistent with the established body of previous medical studies suggesting that chronic exposure to poor air quality favours respiratory disease. Population density, the old-age index and average temperature, on the other hand, were not always found to be conditioning elements of the observed convergence processes: their significance varies with the wave taken as the period of observation, which only partly confirms the reference literature. As far as the spatial lags are concerned, the spillover effects captured by the parameter ρ (Rho) are significant for all three waves and are equal to 0.41 for the first wave, 0.29 for the second and 0.26 for the third. According to these results, increases and decreases in the average growth rate in the *i-th* province can also be attributed to changes in growth levels in its neighbouring provinces.

According to the estimated SAR model, the *spillover effects* calculated for population density (0.12) and pollution (0.21) in the first wave are also significant. It would thus appear that provinces with a high population density and an above-average presence of air pollutants contribute to the growth of contagions in neighbouring areas. Density retains its spatial influence during the second wave, although its magnitude falls markedly (0.04). Pollution (0.02) becomes only weakly significant (p-value just under 10%) and has a smaller effect on the growth of contagions in neighbouring provinces. The second wave also shows a restraining effect of the old-age index (old\_index = -0.02): provinces with a high share of individuals aged 65 years or over, relative to the resident population, are negatively related to the growth rate of contagions in the contiguous provinces. In the third wave, a weak (p-value just under 10%) positive spatial relationship emerges between the observed temperature (0.02) and the level of contagions in neighbouring areas, while the significance of pollution (0.04) is confirmed: contagions increase in provinces that share a border with a province characterised by high levels of this variable. Finally, the observed durations are significant in all three waves (0.01 for the first, 0.001 for the second and 0.003 for the third), although their spatial influence on the average growth rate of contagion is weak.
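The spillover reading of ρ used above can be made explicit through the reduced form of equation 2. The following is a standard derivation, not taken from the paper. It assumes that I − ρW is invertible and that the series expansion converges, which holds for a row-standardised W when |ρ| < 1; whether W was row-standardised is not stated in the text.

$$
\mathbf{y} = (\mathbf{I} - \rho\mathbf{W})^{-1}\left(\beta\mathbf{X} + \gamma\mathbf{Z} + \boldsymbol{\varepsilon}\right), \qquad (\mathbf{I} - \rho\mathbf{W})^{-1} = \mathbf{I} + \rho\mathbf{W} + \rho^{2}\mathbf{W}^{2} + \dots
$$

The term ρW transmits effects from first-order neighbours and ρ<sup>2</sup>W<sup>2</sup> from neighbours of neighbours, with weights that decay as the order increases. Under these assumptions, a change in a covariate of one province affects the growth rate of contiguous provinces as well as its own.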

Although consistent with the initially hypothesised framework, the results have several limitations, which carry implications for future research. First, the dependent variable itself is problematic, because the true population exposed to the virus cannot be known. Further investigation could examine the actual number of people tested, but these data are currently not available at the provincial level, and those at the regional level suffer from multiple counting due to repeated testing of positive cases. Second, some provinces reallocated positive cases to other provinces because of health facility capacity or registration errors. To address these concerns, the paper analyses data aggregated at wave level, but biases may still exist. Future studies could implement estimation control procedures, for example by including dummy variables and re-estimating the model. A further possible source of bias is outliers: the results could be driven by a few provinces whose numbers of new cases are exceptionally far from the average. In addition, the Covid-19 testing policy in Italy, especially at the beginning of the pandemic, varied over time and across provinces. Initially, tests were performed on suspected patients who presented themselves in hospital and/or on persons who had been in contact with positive cases; later, only patients with severe symptoms were tested; finally, tests were also performed on suspected cases without severe symptoms.

Finally, the statistical significance of the conditioning factors does not necessarily imply causality in the recorded convergence process. Given the characteristics of the data, causality cannot be tested by means of a suitable counterfactual trend: it is impossible to construct a suitably randomised control group for a phenomenon that is already occurring at the time of the evaluation.

### **References**


the strategy for COVID-19 infection control. *Environmental Health and Preventive Medicine*, *25*(1).

