#### Alessandro Fusta Moroa , Matteo Salisa , Andrea Zucchia , Michela Camelettib, Natalia Golinia , Rosaria Ignaccoloa <sup>a</sup> Dept. of Economics and Statistics "Cognetti de Martiis", University of Turin, Turin (IT) **Ammonia emissions and fine particulate matter: some evidence in Lombardy**

Ammonia emissions and fine particulate matter: some evidence in Lombardy

<sup>b</sup> Dept. of Economics, University of Bergamo, Bergamo (IT) Alessandro Fusta Moro, Matteo Salis, Andrea Zucchi, Michela Cameletti, Natalia Golini, Rosaria Ignaccolo

# 1. Introduction

Air quality in the Lombardy region (northern Italy) is affected by high concentrations of pollutants. One of the reasons is that Lombardy is localised in the Po Valley where air circulation is very weak because of the mountains surrounding the area. The peculiar weather conditions, the industrial development and the population density make Lombardy one of the worst European region in terms of air quality1. As a result, epidemiological studies have found that Lombardy is characterized by an elevated mortality rate related to fine particulate matter (PM2.5) exposure. It is well known that a considerable part (from 10% up to 50%) of the PM2.5 is formed by the chemical reactions of the ammonia (NH3) with other precursors. In the Lombardy region, 97% of the total NH3 gaseous emissions are linked to the agriculture sector (INEMAR - ARPA Lombardia, 2022). Considering that Lombardy is the leading region in Italy for agriculture production, with the highest regional density of swine and bovines, it is clear that the agriculture section has a considerable impact on air quality.

The project *Agriculture Impact On Italian Air Quality*, hereafter *AgrImOnIA*, aims to estimate the local impact of ammonia emissions on particulate matters (PM10 and PM2.5). This information can be crucial for the policy-makers who have to prioritise interventions. *AgrImOnIA* is an ongoing research project, promoted and financed by Fondazione Cariplo within the framework of *Data Science for science and society*. More information on the project are available on https://agrimonia.net/.

In this work, we present preliminary results providing continuous spatial maps of PM2.5 concentrations (with daily temporal scale) in the Lombardy region using the *AgrImOnIA dataset*2, which contains harmonised data on meteorology, emissions and land use. We implement three spatial prediction methods whose performance will be compared by using standard indexes computed with the Leave-One-Out Cross-Validation strategy (LOOCV). In particular, we consider a spatio-temporal Kriging model with external drift, and two random forest algorithms which include spatial and temporal components.

# 2. Data

The *AgrImOnIA dataset* is an open access data set containing Air Quality (AQ), Weather (WE), Land cover (LA), Emission (EM) and Livestock (LI) data with daily temporal resolution. The data are available for all the air quality monitoring stations after a pre-processing step to change the support of spatial data from area to point, when necessary. We consider the period from 2017 to 2020. The area covered by the *AgrImOnIA dataset* includes the Lombardy region

Andrea Zucchi, University of Turin, Italy, andrea.zucchi925@edu.unito.it

Michela Cameletti, University of Bergamo, Italy, michela.cameletti@unibg.it, 0000-0002-6502-7779

Natalia Golini, University of Turin, Italy, natalia.golini@unito.it, 0000-0003-4457-5781

Rosaria Ignaccolo, University of Turin, Italy, rosaria.ignaccolo@unito.it, 0000-0003-2998-1714

Referee List (DOI 10.36253/fup\_referee\_list)

<sup>1</sup>https://www.eea.europa.eu//publications/air-quality-in-europe-2021

<sup>2</sup>The *AgrImOnIA dataset* will soon be available on Zenodo, which is an open repository operated by CERN (https://zenodo.org/).

Alessandro Fusta Moro, University of Turin, Italy, alessandro.fustamoro@unito.it, 0000-0003-1129-5038 Matteo Salis, University of Turin, Italy, matteo.salis@edu.unito.it

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Alessandro Fusta Moro, Matteo Salis, Andrea Zucchi, Michela Cameletti, Natalia Golini, Rosaria Ignaccolo, *Ammonia emissions and fine particulate matter: some evidence in Lombardy*, © Author(s), CC BY 4.0, DOI 10.36253/979-12-215-0106-3.40, in Enrico di Bella, Luigi Fabbris, Corrado Lagazio (edited by), *ASA 2022 Data-Driven Decision Making. Book of short papers*, pp. 227-232, 2023, published by Firenze University Press and Genova University Press, ISBN 979-12-215-0106-3, DOI 10.36253/979-12-215-0106-3

Figure 1: Area of interest with PM2.5 monitoring stations, classified by type: rural background (RB); rural industrial (RI); suburban background (SB); suburban traffic (ST); urban background (UB); urban industrial (UI); urban traffic (UT).

and a neighbouring area defined by applying a 0.3-degree buffer to the regional boundaries. The area and the PM2.5 monitoring network considered by the *AgrImOnIA project* can be seen in Figure 1.

Figure 2: PM2.5 annual mean concentrations (µg/m3) for each station from 2017 to 2020.

Figure 3: Time series of PM2.5 concentrations (top) and NH3 agriculture emissions (bottom) for all the monitoring sites

From the AQ available variables, we selected PM2.5 (AQ pm25, in µg/m<sup>3</sup>) as the response variable in logarithmic scale. Figure 2 shows the annual mean of PM2.5 concentrations in each monitoring station: higher values are located in the lower Po Valley, particularly in the provinces of Milan, Cremona, Lodi and Brescia. The other selected variables are described in Table 1. The overall NH3 emissions from the agriculture sector (nh3 agr) are calculated by summing up NH3 emissions from manure management, agriculture soil and agriculture waste burning (EM nh3 livestock mm + EM nh3 agr soils + EM nh3 waste burn). To generate continuous maps as output, the covariates are also obtained on a regular grid 0.1◦ x 0.1◦. Figure 3 displays the PM2.<sup>5</sup> and NH3 daily time series for all the monitoring stations. As it can be seen, PM2.5 concentrations follow a seasonality with peaks during the winter, while ammonia shows higher values in summer, likely because of a higher uses of fertilisers.


Table 1: Description of the selected variables

# 3. Spatial prediction techniques

In order to perform spatial prediction and to produce continuous spatial maps of PM2.5 concentrations, we consider two approaches: 1) spatio-temporal kriging with external drift (STKED), a classical approach in geostatistics framework; 2) a well-known machine learning method - random forest (RF) - extended to the case of data correlated in space and time.

### Spatio-temporal kriging with external drift

Spatio-temporal kriging is a supervised parametric model which assumes that the observed PM2.<sup>5</sup> data are generated by a given stochastic spatio-temporal model. In particular, we suppose that the response variable log(AQ pm25) is Normally distributed with a mean changing in space and time and a variance given by the measurement error variance (i.e. the nugget). The mean of the response field is in turn defined as the sum of a large-scale trend (or external drift, which includes the linear effect of the covariates described in Table 1), and a residual spatiotemporal process with separable space-time covariance function. For the implementation of this method we use the gstat R-package (Graler et al., 2016), which requires the estimation of a ¨ spatio-temporal variogram. Once all the models parameters are estimated, spatial prediction of the expected log(AQ pm25) value at any location in Lombardy region is straightforward: it is given basically by a local weighted mean of the spatio-temporal residuals plus the large scale component (evaluated using the covariate values at the new sites).

### Random forest for spatio-temporal data

Random forest is a data-driven non-parametric machine learning technique given by an ensemble of regression trees fitted using several bootstrapped version of the original data and subsets of the considered covariates. The main limitation of random forest is that it is not able to take directly into account the temporal and spatial correlation, as kriging does. In order to include in the fitting algorithm some information about the data spatial structure, we propose here two different implementations of the method: 1) the standard RF algorithm (RFbase) which includes in the covariate set, besides the variables described in Section 2., also the spatial coordinates (longitude and latitude) of each observation; 2) the spatial RF (RFsp) method proposed by Hengl et al. (2018). This method expands the set of covariates by including the buffer distances from the observation sites (i.e., if we have n monitoring stations we will have n additional columns in the training set each referring to a given station and including the distances from the remaining locations). To take into account the temporal component we consider as covariates the date of the day, the day of the week and the type of season. The two RF algorithms are implemented using the Ranger R-package. Prediction in a new spatial location is usually given by the averages of the, say, B predictions computed using the single trees in the forest. Indeed, we consider an ordinary spatio-temporal kriging model for the differences between observed and predicted data in order to include a term taking into account spatio-temporal correlation and predict a term for the small scale component.

## 4. Preliminary results

Starting considering the STKED technique, we subset covariates through stepwise strategy and we estimate the coefficients shown in Table 2 also referred to interaction terms between season and emissions. We can see that, among the emissions, nh3 agr has a larger impact on log(AQ pm25) during the winter, while EM nox sum shows larger effects in the remaining seasons; this is consistent with the results in Thunis et al. (2021). The sample variogram of the residuals of the large-scale component is used to choose the exponential variogram model.

Figure 4 (right) shows the 2020 mean of the daily predicted PM2.5 concentrations in the area of interest. It is worth to note that higher PM2.5 concentrations are predicted where we observe higher NH3 emissions from the agriculture sector, as shown in Figure 4 (left).

As for the ML approach, the variable importance analysis returns similar results for RFbase and RFsp. Figure 5 shows the weather components as the most important, in accordance with the literature (Cameletti et al., 2011) together with the temporal components. Moreover, it turns out that the euclidean distances between sites (dist from ) is not very important.

The comparison between the prediction capability of the three models is performed through LOOCV and the results are shown in Table 3. STKED shows higher performance compared to the two versions of RF, although it is worth to note that these results are based on a preliminary version of the *AgrImOnIA dataset*.

## 5. Discussion and further development

Further developments of this work will consider the forthcoming versions of the *AgrImOnIA dataset* and extensions of the considered techniques, always in the framework of the comparison


Table 2: Large-scale component coefficient estimates of STKED

*Note:* <sup>∗</sup>p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

Figure 4: Mean of daily NH3 emissions from agriculture over the period 2017-2020 (left); mean of daily PM2.5 concentrations predicted by STKED for 2020 (right).

Table 3: LOOCV comparison of the models described in Section 3..


Figure 5: Variable importance of the spatial random forest (RFsp) measured by the RSS mean reduction for each variable.

between classical geostatistics and machine learning approaches.

Recent studies found that NH3 emissions reductions are the most cost-effective way to reduce PM2.5 concentrations (Gu et al. 2021). Scenario analysis based on spatial prediction techniques, in compliance with the Regional Plane for emissions reductions3, will allow to assess the expected impact of NH3 emissions reduction's policies before their implementation.

# References


<sup>3</sup>https://www.regione.lombardia.it/wps/portal/istituzionale/

HP/DettaglioRedazionale/istituzione/direzioni-generali/

direzione-generale-ambiente-e-clima/piano-regionale-interventi-qualita-aria-pria