# **Statistics and Information Systems for Policy Evaluation ASA 2021**

BOOK OF SHORT PAPERS of the on-site conference

edited by Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci

### PROCEEDINGS E REPORT

ISSN 2704-601X (PRINT) - ISSN 2704-5846 (ONLINE)

– 132 –

# *Scientific Program Committee*

Luigi Fabbris (co-chair) (University of Padua); Alessandra Petrucci (co-chair) (SIS - University of Florence)

Luciana Annarumma (Assirm); Fabio Bacchini (ISTAT); Rossella Berni (University of Florence); Bruno Bertaccini (University of Florence); Luigi Biggeri (University of Florence); Eugenio Brentari (University of Brescia); Maurizio Carpita (University of Brescia); Giulia Cavrini (Free University of Bolzano-Bozen); Alessandro Celegato (AICQ-AISS, PSV Project Service and Value); Giuliana Coccia (Alleanza Sviluppo Sostenibile ASviS); Cristina Davino (Federico II University of Naples); Adriano Decarli (University of Milan); Loretta Degan (Galgano Group, Milan); Tonio Di Battista ("G. D'Annunzio" University of Chieti and Pescara); Enrico Di Bella (University of Genoa); Angela Maria Digrandi (CNR); Simone Di Zio ("G. D'Annunzio" University of Chieti and Pescara); Guido Ferrari (University of Florence); Benito Vittorio Frosini (Sacred Heart Catholic University of Milan); Antonio Giusti (University of Florence); Gabriella Grassia (Federico II University of Naples); Salvatore Ingrassia (University of Catania); Michele Lalla (University of Modena and Reggio Emilia); Corrado Lagazio (University of Genoa); Paolo Mariani (University of Milan-Bicocca); Stefania Mignani (University of Bologna); Francesco Palumbo (Federico II University of Naples); Alfonso Piscitelli (Federico II University of Naples); Giorgio Tassinari (University of Bologna); Laura Trinchera (NEOMA Business School, FR); Venera Tomaselli (University of Catania); Domenico Vistocco (Federico II University of Naples)

### *Local Program Committee*

Bruno Bertaccini (chair) (University of Florence)

Silvia Bacci (University of Florence); Chiara Bocci (University of Florence); Federico Crescenzi (University of Florence); Maria Veronica Dorgali (University of Florence); Carla Galluccio (University of Florence); Antonio Giusti (University of Florence); Alessandra Petrucci (University of Florence)

# ASA 2021 Statistics and Information Systems for Policy Evaluation


> FIRENZE UNIVERSITY PRESS 2021

ASA 2021 Statistics and Information Systems for Policy Evaluation : BOOK OF SHORT PAPERS of the on-site conference / edited by Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci. – Firenze : Firenze University Press, 2021. (Proceedings e report ; 132)

https://www.fupress.com/isbn/9788855184618

ISSN 2704-601X (print) ISSN 2704-5846 (online) ISBN 978-88-5518-461-8 (PDF) ISBN 978-88-5518-462-5 (XML) DOI 10.36253/978-88-5518-461-8

Cover graphic design: Lettera Meccanica SRLs Front cover: © man64|123rf.com

# **ASA 2021 On-site Conference on STATISTICS AND INFORMATION SYSTEMS FOR POLICY EVALUATION**

University of Florence, September 6 - 8, 2021

Bruno Bertaccini, Luigi Fabbris and Alessandra Petrucci (Editors)


*FUP Best Practice in Scholarly Publishing* (DOI https://doi.org/10.36253/fup\_best\_practice)

All publications are submitted to an external refereeing process under the responsibility of the FUP Editorial Board and the Scientific Boards of the series. The works published are evaluated and approved by the Editorial Board of the publishing house, and must be compliant with the Peer review policy, the Open Access, Copyright and Licensing policy and the Publication Ethics and Complaint policy.

### *Firenze University Press Editorial Board*

M. Garzaniti (Editor-in-Chief), M.E. Alberti, F. Vittorio Arrigoni, E. Castellani, F. Ciampi, D. D'Andrea, A. Dolfi, R. Ferrise, A. Lambertini, R. Lanfredini, D. Lippi, G. Mari, A. Mariani, P.M. Mariano, S. Marinai, R. Minuti, P. Nanni, A. Orlandi, I. Palchetti, A. Perulli, G. Pratesi, S. Scaramuzzi, I. Stolzi.

The online digital edition is published in Open Access on www.fupress.com.

Content license: except where otherwise noted, the present work is released under Creative Commons Attribution 4.0 International license (CC BY 4.0: http://creativecommons.org/licenses/by/4.0/legalcode). This license allows you to share any part of the work by any means and format, modify it for any purpose, including commercial, as long as appropriate credit is given to the author, any changes made to the work are indicated and a URL link is provided to the license.

Metadata license: all the metadata are released under the Public Domain Dedication license (CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/legalcode).


© 2021 Author(s)

Published by Firenze University Press, Università degli Studi di Firenze, via Cittadella, 7, 50144 Firenze, Italy, www.fupress.com, info@fupress.com. ISBN: XXXXX. This book is published only in PDF format. Copyright © 2021 Firenze University Press.

*This book is printed on acid-free paper. Printed in Italy.*


# **Preface**

The Association for Applied Statistics (ASA) and the Department of Statistics, Computer Science, Applications DiSIA "*Giuseppe Parenti*" of the University of Florence, jointly with the partners AICQ-CN (Italian Association for Quality Culture North and Centre of Italy), AISS (Italian Academy for Six Sigma), ASSIRM (Italian Association for Marketing, Social and Opinion Research), Comune di Firenze (the Florence Municipality), SIS (the Italian Statistical Society), Regione Toscana (the Tuscany Region) and Valmon – Evaluation & Monitoring Ltd, have organised a scientific conference titled "*Statistics and Information Systems for Policy Evaluation*", aimed at promoting new statistical methods and applications for the evaluation of policies.

Due to the health emergency caused by the COVID-19 pandemic, the Scientific and the Local Organizing Committees decided to split the conference into two scientific events: an online Opening Conference held in February and March 2021 and a postponed on-site Conference held in September 2021.

This book includes 40 peer-reviewed short papers discussed during the on-site Scientific Conference. The event was spread over three days and organized in thematic sessions; each session, led by a chair, collected works on one of the following homogeneous topics: "Evaluation of Educational Systems", "Decision Making", "Health and Well-Being", "Tourism and Gastronomy". The papers published in this book are organized according to those sessions.

On behalf of the Scientific Program Committee, we would like to thank the authors for submitting and presenting their interesting and inspiring works in the context of the evaluation of policies, the partners, the chairs, the discussants and the Local Organizing Committee. Finally, we are thankful to the members of the Scientific Committee for helping with the peer-reviewing process.

Florence (Italy), October 2021

Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci


Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

# SESSION EVALUATION OF EDUCATIONAL SYSTEMS

### **Determinants of the transition to upper secondary school: differences between immigrants and Italians**

Patrizio Frederic, Michele Lalla

> Department of Economics "Marco Biagi" and RECent (Center for Economic Research), University of Modena and Reggio Emilia, Modena, Italy. CAPP (Centre for the Analyses of Public Policies), Modena, Italy.

# **1. Introduction**

Education decisions that teenagers in the 13-15 age range face are the first important steps in the life cycle and determine their educational achievements and job trajectory. These choices occur at a particular stage in their lives, when influences inside the home are still strongly felt and knowledge about their interests, abilities or skills is vague and unstable. In this sense, such decisions strongly depend on both individual and family characteristics, including socioeconomic conditions, as well as on the environment or contextual background of the area where they reside.

The objective of this paper is to pinpoint differences with respect to citizenship, a binary variable distinguishing between immigrants and non-immigrants (hereinafter also referred to as Italians), in the *secondary* binary variable, defined as equal to one for individuals who were not enrolled in an upper secondary school and equal to zero otherwise. A Bayesian approach was applied to investigate the determinants of the secondary variable. The prior distribution of the coefficients was set to be a Laplace distribution with parameter λ; hence, the posterior mode of the model parameters corresponds to the Lasso estimate. The Lasso is a popular method that simultaneously selects the explanatory variables and their interactions and estimates the model coefficients. Starting from an initial model, which included all the selected quantitative and categorical variables and all the interactions between the categorical variables, the method led to a very parsimonious model which, surprisingly, did not include family income.

# **2. Data sources and descriptive statistics**

The data were extracted from two surveys, both with reference year 2009, carried out by the Italian National Institute of Statistics (Istat): the European Union Statistics (or Surveys) on Income and Living Conditions (EU-SILC) restricted to Italy alone, IT-SILC (Istat, 2008; Eurostat, 2009), and the Italian Survey on Income and Living Conditions of families with Immigrants (IM-SILC), a single cross-sectional survey (Istat, 2009) of families with at least one immigrant member residing in Italy. The IT-SILC sample was added to the IM-SILC sample to obtain a sample with a sizeable number of immigrants relative to non-immigrants. For further details about these two data sets and about the main variables introduced in the model, see Lalla and Frederic (2020). The target sample was obtained by first selecting individuals aged 16 to 19, which yielded 2,702 cases. Among these, the only eligible cases were individuals whose highest attained ISCED (International Standard Classification of Education) level was equal to 2 (lower secondary education). The final target sample was made up of 2,039 individuals.

The relationship between the secondary (binary) dependent variable and the ISCED Level Currently Attended (ILCA) showed that 16.9% of individuals were not enrolled in further education (termed "not-attending"), while 79.7% were currently attending an upper secondary school (Table 1).


Patrizio Frederic, Michele Lalla, *Determinants of the transition to upper secondary school: differences between immigrants and Italians*, pp. 13-18, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.04, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

<sup>3</sup> Patrizio Frederic, University of Modena and Reggio Emilia, Italy, patrizio.frederic@unimore.it, 0000-0001-9073-2878 Michele Lalla, University of Modena and Reggio Emilia, Italy, michele.lalla@unimore.it, 0000-0002-1639-7300


**Table 1.** Absolute frequencies and row percentages of secondary (binary) dependent variable by the ISCED level currently attended (ILCA)

The ILCA was examined with respect to several qualitative variables and revealed many significant relationships; for the sake of brevity, only some of them are reported. The ILCA showed a significant relationship with citizenship, CS(2)= 45.177 (p<0.001), where CS(g) stands for Chi-Square with g degrees of freedom: the percentage of immigrants attending upper secondary education was lower than that of Italian citizens (74.3% versus 81.7%), while the percentage of immigrants not in school was higher than that of Italians (24.9% versus 14.0%). There was a significant relationship between the ILCA and self-perceived health, CS(2)= 8.351 (p<0.015), implying that individuals perceiving fair, bad or very bad health were more likely to discontinue their education than those perceiving good or very good health (Ichou and Wallace, 2019). The ILCA was also related to the index of the total self-perceived health of parents, CS(6)= 27.356 (p<0.001). The ILCA proved to be linked to the Italian macro-regions, CS(8)= 39.092 (p<0.001): as industrialisation and the possibility of finding employment increased, the percentage of individuals not in school decreased. The ILCA was related to the maximum ISCED level attained by parents, CS(12)= 179.908 (p<0.001): as the education of parents increased, the percentage of young individuals in school increased. The ILCA also yielded significant relationships with several variables describing the working conditions of the parents, although these relationships were often weak.

The ILCA was also analysed with respect to the main quantitative variables.

The age of fathers, analysed according to the ILCA and citizenship, showed that the fathers of immigrants were about four years younger than the fathers of Italians. Similarly, the mothers of immigrants were about four years and nine months younger than the mothers of Italians. The Disposable Family Income (DFI) per capita (in thousands of euros) is reported in Table 2 by the ILCA and citizenship. On average, the DFI per capita of immigrants was significantly lower than that of Italians, by about four thousand euros (about 39.8%).


**Table 2.** Absolute frequencies, means, and standard deviations (SD) of the disposable family income per capita (in thousands of euros) by citizenship and by the ISCED level currently attended (ILCA) by their children

The other types of income considered in the models revealed various structures of relationships and levels of significance. For example, the gap in disposable personal income between immigrant and Italian fathers amounted to about ten thousand euros, i.e., 37.4%. The mothers' disposable personal income presented similar statistically significant differences for both marginal effects, with a gap of about five thousand nine hundred euros, i.e., 39.5%. The gender gaps in disposable personal income were 51% for Italians and 54% for immigrants.

Immigrant families proved to be slightly larger than Italian families, and the difference was statistically significant for both marginal effects, i.e., citizenship and the ILCA.

Citizenship was examined with respect to some other variables, even though it was not a target dependent variable. Its relationship with the maximum ISCED level attained by parents was statistically significant, CS(6)= 97.73 (p<0.001) (Bertolini and Lalla, 2012; Bertolini et al., 2015). Citizenship was significantly related to the degree of urbanisation, CS(2)= 24.225 (p<0.001): immigrants were more likely than Italians to settle in densely populated areas (38.4% versus 35.5%) or in moderately populated areas (44.6% versus 39.3%). Citizenship also showed a significant relationship with the Italian macro-regions and with the index summarising the total self-perceived health of parents, CS(3)= 29.832 (p<0.001) (Ichou and Wallace, 2019). Citizenship proved to be associated with many variables describing working conditions and revealed a significant relationship with the maximum position of parents on the job, CS(4)= 173.877 (p<0.001).

# **3. Model by Bayesian Lasso selection of regressors**

Let $Y_i$ be the binary variable coding whether the $i$-th individual is not attending upper secondary education ($Y_i = 1$) or is attending it ($Y_i = 0$). Let $\mathbf{x}_i$ be a vector of regressors, let $\pi_i$ be the probability that $Y_i = 1$ given $\mathbf{x}_i$, and let $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_K)'$ be the parameter vector of the model. The logit model is

$$\pi_i = \frac{\exp\left(\mathbf{x}_i' \boldsymbol{\beta}\right)}{1 + \exp\left(\mathbf{x}_i' \boldsymbol{\beta}\right)}. \tag{1}$$

A common method that performs estimation and model selection at the same time is the *Lasso* method (Tibshirani, 1996), which adds an $L_1$ penalty term to the negative log-likelihood of the model; the penalty depends on an additional parameter $\lambda \geq 0$. Many penalized methods can be interpreted, in a purely Bayesian fashion, as the negative logarithm of a posterior distribution. Let $p(y_i \mid \mathbf{x}_i, \boldsymbol{\beta}) = \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}$ be the usual logit model in Bayesian notation, and let $p(\boldsymbol{\beta} \mid \lambda) \propto \exp\left(-\lambda \sum_{j=0}^{K} |\beta_j|\right)$ be the Laplace prior distribution on the coefficients $\boldsymbol{\beta}$; then the posterior distribution is

$$\begin{aligned} p(\boldsymbol{\beta} \mid \mathbf{x}, \mathbf{y}, \lambda) & \propto p(\mathbf{y} \mid \mathbf{x}, \boldsymbol{\beta}) \cdot p(\boldsymbol{\beta} \mid \lambda) \\ & \propto \prod_{i=1}^{n} \pi_i^{y_i} \left(1 - \pi_i\right)^{1 - y_i} \exp\left(-\lambda \sum_{j=0}^{K} \left|\beta_j\right|\right) \end{aligned} \tag{2}$$
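Taking the negative logarithm of (2) gives, up to an additive constant, the logistic negative log-likelihood plus $\lambda \sum_{j=0}^{K} |\beta_j|$, i.e. the Lasso objective, so the posterior mode under the Laplace prior is the Lasso estimate. The Python sketch below (assuming NumPy) makes this concrete by locating that mode with proximal gradient descent (ISTA). ISTA is a generic algorithm swapped in for illustration; it is not the coordinate-descent solver used by *glmnet*. As in (2), the intercept (the first column of the hypothetical design matrix `X`) is penalized as well; the simulated data, `lam` and the iteration count are assumptions.

```python
import numpy as np

def neg_log_posterior(beta, X, y, lam):
    """Negative log of posterior (2), up to an additive constant:
    logistic negative log-likelihood plus lam * sum_j |beta_j|."""
    eta = X @ beta
    # log(1 + exp(eta)) - y * eta is each unit's negative log-likelihood contribution
    nll = np.sum(np.logaddexp(0.0, eta) - y * eta)
    return nll + lam * np.sum(np.abs(beta))

def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrinks towards zero and sets small entries exactly to 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_logit_map(X, y, lam, n_iter=5000):
    """Posterior mode under the Laplace prior (= Lasso estimate), found by ISTA."""
    # Step 1/L, where L = ||X||_2^2 / 4 bounds the curvature of the logistic log-likelihood
    step = 4.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))   # logit model (1)
        grad = X.T @ (pi - y)                    # gradient of the negative log-likelihood
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Simulated example: intercept plus 20 regressors, only two of which matter
rng = np.random.default_rng(0)
n, K = 500, 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])
true_beta = np.zeros(K + 1)
true_beta[:3] = [-1.5, 1.0, -1.0]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

lam = 20.0
beta_hat = lasso_logit_map(X, y, lam)
print("non-zero coefficients:", np.count_nonzero(beta_hat), "of", K + 1)
print("objective at mode:", neg_log_posterior(beta_hat, X, y, lam))
```

The soft-thresholding step is what sets many coefficients exactly to zero, which is how the Lasso selects variables while estimating them.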

The choice of the parameter *λ* plays a crucial role in the estimation procedure, and many studies have focused on this issue. Besides the classic AIC and BIC criteria, *k*-fold Cross Validation (CV) and the One Standard Error Rule (1SE) have been proposed. The applied estimation method consists of two steps:

1. The model was first estimated using the *glmnet* package (Friedman et al., 2010) in R (R Core Team, 2019). Then the *optimal* lambda, $\lambda_{1SE}$, and the mode estimates $\hat{\boldsymbol{\beta}}_{1SE}$ were evaluated.

2. Using the R package *MCMCpack*, N = 10,000 samples were drawn from the posterior distribution $p(\boldsymbol{\beta} \mid \mathbf{x}, \mathbf{y}, \lambda_{1SE})$ to perform a full Bayesian analysis, where the prior $p(\boldsymbol{\beta} \mid \lambda_{1SE})$ was chosen to be Laplace distributed.
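The selection of $\lambda_{1SE}$ in step 1 follows the one-standard-error rule: among the values on the cross-validation grid, take the largest $\lambda$ (the most parsimonious model) whose mean CV error is within one standard error of the minimum mean CV error. In *glmnet*, `cv.glmnet` reports this value as `lambda.1se`. The Python function below is a minimal, library-independent sketch of the rule (it assumes NumPy); the CV errors in the usage example are made-up numbers, not results from this study.

```python
import numpy as np

def lambda_1se(lambdas, cv_mean, cv_se):
    """One-standard-error rule.

    Returns (lambda_min, lambda_1se): the lambda with the smallest mean CV error, and the
    largest lambda whose mean CV error does not exceed that minimum plus its standard error.
    """
    lambdas = np.asarray(lambdas, dtype=float)
    cv_mean = np.asarray(cv_mean, dtype=float)
    cv_se = np.asarray(cv_se, dtype=float)
    i_min = int(np.argmin(cv_mean))
    threshold = cv_mean[i_min] + cv_se[i_min]
    eligible = lambdas[cv_mean <= threshold]   # models statistically "as good as" the best
    return lambdas[i_min], eligible.max()      # larger lambda = sparser model

# Made-up CV curve on a decreasing lambda grid (not results from the paper)
lambdas = [0.10, 0.05, 0.02, 0.01, 0.005]
cv_mean = [0.52, 0.47, 0.45, 0.44, 0.45]
cv_se = [0.02, 0.02, 0.02, 0.02, 0.02]
lam_min, lam_1se = lambda_1se(lambdas, cv_mean, cv_se)
print(lam_min, lam_1se)   # 0.01 0.02
```

Preferring $\lambda_{1SE}$ over the error-minimizing value trades a statistically negligible loss in CV error for a sparser model.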

Note that the model matrix of the starting model consists of 2,039 rows and 943 columns, so classical methods can be affected by the *curse of dimensionality*. The Lasso method, instead, is very stable and fast, and it shrinks 923 values (out of 943) of $\hat{\boldsymbol{\beta}}_{1SE}$ to zero; thus only 20 betas have a posterior distribution that is not symmetric around zero.
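Step 2 above draws from posterior (2) with $\lambda$ fixed at $\lambda_{1SE}$. The sketch below does not reproduce *MCMCpack*; as a hedged stand-in, it runs a plain random-walk Metropolis sampler on the log of (2) and, for each coefficient, reports the share of draws above zero, a simple way to see whether a posterior is roughly symmetric around zero or clearly shifted away from it. The proposal scale, burn-in, seed and simulated data are assumptions chosen for illustration (Python with NumPy).

```python
import numpy as np

def log_posterior(beta, X, y, lam):
    """Log of posterior (2) up to a constant: logit log-likelihood minus lam * sum_j |beta_j|."""
    eta = X @ beta
    return np.sum(y * eta - np.logaddexp(0.0, eta)) - lam * np.sum(np.abs(beta))

def rw_metropolis(X, y, lam, n_draws=10_000, burn_in=2_000, scale=0.1, seed=1):
    """Random-walk Metropolis with isotropic Gaussian proposals; returns the post-burn-in draws."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    current = log_posterior(beta, X, y, lam)
    draws = np.empty((n_draws, X.shape[1]))
    for t in range(burn_in + n_draws):
        proposal = beta + scale * rng.normal(size=beta.size)
        candidate = log_posterior(proposal, X, y, lam)
        # Accept with probability min(1, posterior ratio), compared on the log scale
        if np.log(rng.uniform()) < candidate - current:
            beta, current = proposal, candidate
        if t >= burn_in:
            draws[t - burn_in] = beta
    return draws

# Simulated example: intercept, one relevant regressor and two irrelevant ones
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
true_beta = np.array([-1.0, 1.0, 0.0, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

draws = rw_metropolis(X, y, lam=5.0)
prob_positive = (draws > 0).mean(axis=0)   # roughly 0.5 suggests a posterior centred on zero
print("P(beta_j > 0 | data):", np.round(prob_positive, 2))
```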

# **4. Outcomes of the logistic model**

The interpretation of coefficients in a logit model is not straightforward. The odds ratios (OR) are reported in Table 3, which presents only first-order interaction terms, because the analysis of interactions was limited to the first order to simplify interpretation. Interactions are indicated by the symbol ×, which may be read as "by".

A binary variable with an odds ratio greater than 1 implies that the group in which the binary variable equals 1 had a higher probability of having $y = 1$ than the group in which it equals 0. Among the binary variables ($\mathbf{x}^b$), odds ratios greater than 1 were observed for interactions only. For example, the odds ratio of the interaction term "father is limited in activity because of health problems" ($x_1$) × "father with a permanent contract" ($x_2$), denoted by $x_{12}$, was equal to 1.826, meaning that the odds of the event $y = 1$ when $x_{12} = 1$ (both $x_1$ and $x_2$ equal to 1) are 82.6% greater than the odds of the event $y = 1$ when $x_{12} = 0$. Let $\boldsymbol{\mu}$ be the vector of mean values of the continuous regressors $\mathbf{x}^c$. Note that: (1) the product of two binary variables is again a binary variable; (2) the percentage increment of the reference probability, $\pi_{i \mid \mathbf{x}^b = \mathbf{0}, \mathbf{x}^c = \boldsymbol{\mu}}$, is given by $100 \times (\mathrm{OR} - 1)$ and is reported below in parentheses; (3) the corresponding value of OR may be found in Table 3. The probability of having $y = 1$ (i.e., of discontinuing one's education) was $\pi_{i \mid \mathbf{x}^b = \mathbf{0}, \mathbf{x}^c = \boldsymbol{\mu}} = 0.160$, calculated at the mean values of the continuous regressors ($\mathbf{x}^c = \boldsymbol{\mu}$) and with the binary variables equal to 0 ($\mathbf{x}^b = \mathbf{0}$). Therefore, for $x_{12}$ the result was a probability of $\pi_{i \mid x_{12} = 1} = 1.826 \times 0.160 = 0.292$, nearly double the probability for $x_{12} = 0$.

Similarly, significantly high probabilities of discontinuing one's education or dropping out were observed for other interaction terms: "father is limited in activity because of health problems" × "family living in the macro-region Islands" (+149.0%), "father is limited in activity because of health problems" × "family living in a moderately populated area" (+56.3%), "assets reduction for needs" × "young individual with self-perceived bad health" (+234.0%), "assets reduction for needs" × "mother with self-perceived bad health" (+56.3%), "family living in a densely populated area" × "parents are unemployed or inactive" (+122.7%), and "mother only is employed" × "parents' skill level on the job is labourer" (+82.3%). In summary, real and self-perceived health conditions heavily affect the probability of discontinuing one's education in the transition from lower secondary to upper secondary school and throughout all the secondary school years, although this happens through interactions with other factors. Note that the "number of requests for aid because the family lives in need", which is formally a continuous variable, interacts with "mother suffering from any chronic (long-standing) illness or condition" and yielded an odds ratio greater than 1 at the mean of the first term.

Binary variables with an odds ratio lower than 1 imply that the represented group had a lower probability of having $y = 1$ than the complementary group. In Table 3 there are only two binary variables with an odds ratio lower than 1. For example, the binary variable "both parents employed" (BPE) had an odds ratio equal to 0.736; hence the corresponding complement to one, expressed as a percentage, was equal to $100 \times (1 - 0.736) = 26.4\%$. Therefore, the probability of discontinuing one's education changed by −26.4% (the negative sign indicates a reduction) relative to the probability of the complementary group, which did not have both parents employed, $\pi_{i \mid \mathbf{x}^b = \mathbf{0}, \mathbf{x}^c = \boldsymbol{\mu}}$. In other words, the group with BPE equal to 1 had a probability $\pi_{i \mid \mathrm{BPE} = 1} = 0.736 \times 0.160 = 0.118$, that is, 26.4% lower than that of the complementary group, whose probability was $\pi_{i \mid \mathbf{x}^b = \mathbf{0}, \mathbf{x}^c = \boldsymbol{\mu}} = 0.160$. Similarly, a significantly low probability of discontinuing one's education was observed for the interaction term "assets reduction for needs" × "mother with permanent employment contract" (−51.5%). The constant of the model was not statistically significant, even though its magnitude was comparable with that of other parameters.
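The probabilities quoted above follow the text's rule: scale the reference probability 0.160 by the odds ratio and read the percentage change as 100 × (OR − 1). The Python sketch below reproduces that arithmetic for $x_{12}$ and BPE. Because an odds ratio multiplies odds rather than probabilities, it also shows, for comparison only, the probability obtained by converting through the odds; the two agree closely only when the probabilities involved are small. The function names are illustrative.

```python
def pct_change(odds_ratio):
    """Percentage change used in the text: 100 * (OR - 1)."""
    return 100.0 * (odds_ratio - 1.0)

def scaled_probability(odds_ratio, p_ref):
    """Rule used in the text: reference probability multiplied by the OR."""
    return odds_ratio * p_ref

def probability_from_odds(odds_ratio, p_ref):
    """Conversion through the odds: multiply the reference odds by the OR, map back to a probability."""
    odds = odds_ratio * p_ref / (1.0 - p_ref)
    return odds / (1.0 + odds)

P_REF = 0.160  # reference probability at x^b = 0, x^c = mu (from the text)
for label, odds_ratio in [("x12", 1.826), ("BPE", 0.736)]:
    print(f"{label}: change {pct_change(odds_ratio):+.1f}%, "
          f"scaled {scaled_probability(odds_ratio, P_REF):.3f}, "
          f"via odds {probability_from_odds(odds_ratio, P_REF):.3f}")
# x12: change +82.6%, scaled 0.292, via odds 0.258
# BPE: change -26.4%, scaled 0.118, via odds 0.123
```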



The continuous variables. The individual age (range 16-19), expressed in decades, showed a parabolic and positive impact on the interruption of education before completion of upper secondary school. This strong effect may have specific causes: the survey protocols did not interview individuals under the age of 16, and the vocational school data were not collected well. The other continuous variables entering the model as single terms also showed significant effects on the interruption of education. As the age of fathers and the parents' education levels increased, the probability of discontinuing education decreased. As the number of objects owned at home (dishwasher, refrigerator, telephone, television, and so on) increased, the risk of interrupting one's education decreased. As indicated above, an increase in the "number of requests for aid because the family lives in need" for individuals whose "mother suffers from any chronic (long-standing) illness or condition" yielded an increase in the risk of dropping out of school. This empirical evidence highlights the importance of welfare programmes that help families experiencing economic and physical difficulties, with the specific aim of reducing the number of students who interrupt their education.

The main shortcoming of the Lasso method in selecting significant explanatory variables is that some income variables were left out of the model, even though various income components have frequently been found to be significant in the literature (Ochsen, 2011; Krause et al., 2015).

In applications, interactions should be supported by social, behavioural, psychological or economic theories. Otherwise, they can only be obtained automatically, as empirical findings, through an adaptive procedure such as the Lasso method. In fact, few models with interactions exist in the literature. Interactions are probably easiest to find among binary or categorical variables, and this case is relatively interesting because such interactions can be replaced with specific typologies. The same holds true for the interactions of a continuous variable with binary explanatory variables, whereas the interaction between two continuous variables is very difficult to grasp immediately. In general, it is useful to find a theoretical justification for the existence of interactions, instead of blindly searching for interaction terms. Nevertheless, it is highly plausible that almost all phenomena are outcomes of interactions among many variables, although understanding and explaining these results may become very complicated and challenging.

# **References**


# **The top candidate is an intermediate one: An analysis of online posts of Veneto industries**

Luigi Fabbris<sup>1</sup>

Tolomeo Studi e Ricerche, Padua and Treviso, Italy

# **1. Introduction**

In this work<sup>2</sup>, we examine the results of an experiment carried out in 2018 by sending a number of fictitious CVs in response to a sample of job vacancies posted by Veneto industries. The experiment aimed to highlight which applicant characteristics influence recruiters' call-back rate and speed (Brocco et al., 2021).

Common sense dictates that the best candidate for a job is the person who, among those who applied, possesses the most qualifying characteristics to fill the vacancy. So, "best" is relative to the vacancy. For this reason, a company's recruiter tends to match the expected characteristics with the exhibited ones and ignores candidates whose skills are entirely irrelevant to the vacancy, even if their human and social characteristics stand out.

We hypothesise that another factor affects the initial stage of the recruitment process: the applicant's expectations in terms of benefits and career as perceived by the recruiters. Our hypothesis is that recruiters match the applicants' expectations, as perceivable from their CVs, with the benefits the company is prepared to offer to the future employee and, for this reason, they discard both the worst applicants and the ones who are too good, thus favouring the intermediate ones.

The rest of this paper is organised as follows: Section 2 briefly describes the available data and the survey methodology, Section 3 presents the main results on reverse discrimination and Section 4 interprets the outcomes, refers to the mainstream literature and offers a conclusion.

# **2. Data and methods**


The experimental survey was carried out by sending fictitious CVs in response to online posts for 120 job vacancies in Veneto industries. The experiment consisted of creating and sending five different CVs to each job opening and waiting for the company to call back. The CVs differed according to a fractional factorial design aimed at controlling a total of ten applicant characteristics: gender, place of origin, field of study, academic degree level and final mark, English and computer skills, driving their own car, and being a music lover or a youth group volunteer. The rate and speed of call backs were expected to reflect how likely certain characteristics, or combinations of characteristics, were to be accepted or, conversely, to be socially discriminated against. As a whole, 600 CVs were emailed in response to online posts and 59 call backs were obtained.
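The design figures above can be checked with a few lines of arithmetic: five CVs for each of the 120 vacancies give the 600 CVs sent, and 59 call backs correspond to an overall call-back rate of about 9.8%. The Python snippet is only a consistency check on the reported counts; the constant names are illustrative.

```python
N_VACANCIES = 120      # job openings sampled (from the text)
CVS_PER_VACANCY = 5    # fictitious CVs sent to each opening
CALL_BACKS = 59        # call backs received

n_cvs = N_VACANCIES * CVS_PER_VACANCY
rate = CALL_BACKS / n_cvs
print(n_cvs, f"{rate:.1%}")   # 600 9.8%
```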

The job openings were drawn from a specialised job-search website, subito.it. In a preliminary comparison with other websites and newspapers, we verified that all job openings advertised through local newspapers were also available through the chosen website. Hence, we decided to collect the ads only on the internet. As a whole, these belonged to the manufacturing industry (20.5%), the service industry (43.2%), other service sectors (21.8%) and commerce (14.5%). Job vacancies were related to the following five activities: administrative offices, human resource offices, marketing activities, commercial offices and information systems. These were jobs for which all hypothesised graduates might be appropriate. By design, each type of activity received the same number of openings: one-fifth of the sample.

Luigi Fabbris, *The top candidate is an intermediate one: An analysis of online posts of Veneto industries*, pp. 19-24, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.05, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

<sup>1</sup> luigi.fabbris@unipd.it

<sup>2</sup> The author wishes to thank Professor Maria Cristiana Martini for her precious help with the segmentation analysis of the data.

Luigi Fabbris, University of Padua, Italy, luigi.fabbris@unipd.it, 0000-0001-8657-8361

The criterion variable *Y* of our analyses is having obtained a response to a mailed CV, so *Y* has two possible values: 1 if the company called back and 0 otherwise. For practical purposes, telephone call backs were treated as equivalent to email ones. Moreover, the CVs were randomly selected from a pool defined by logically crossing the ten experimental factors, so all responses were equally important. Some intra-post correlation is possible, owing to the partial similarity of CVs mailed in response to the same post.

We applied both a segmentation and a multilevel regression analysis. Segmentation, or regression-tree analysis, consists of a stepwise partitioning of the 600 CVs into subsamples, one predictor at a time, so as to maximize the between-subsample distance of the criterion variable. The procedure stops when either no partition is statistically significant or the size of a candidate subsample falls below a predefined minimum. This technique is particularly suited to highlighting multiple interactions, that is, interactions among several predictors. In this work, we adopted the CHAID algorithm of the SPSS package, which can split each node into any number (≥ 2) of subsamples and handles categorical predictors. The CHAID results are presented below; the results of other multivariate analyses have been published elsewhere (Brocco et al., 2021).
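The core of one splitting step can be sketched as follows, assuming a binary criterion (call back = 1) and categorical predictors stored in dictionaries. This is a simplified illustration, not the SPSS implementation: SPSS CHAID also merges similar predictor categories and applies adjusted (Bonferroni) p-values, which are omitted here. The function names, `alpha = 0.05` and the minimum child size of 18 (taken from the figure captions) are assumptions for illustration.

```python
import math
from collections import defaultdict

def _lower_gamma_series(a, x):
    """Regularized lower incomplete gamma P(a, x) via power series (x < a + 1)."""
    term = 1.0 / a
    total = term
    for n in range(1, 1000):
        term *= x / (a + n)
        total += term
        if term < total * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def _upper_gamma_cf(a, x):
    """Regularized upper incomplete gamma Q(a, x) via continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_sf(stat, df):
    """Upper-tail p-value of a chi-square statistic with df degrees of freedom."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _lower_gamma_series(a, x)
    return _upper_gamma_cf(a, x)

def pearson_chi2(categories, y):
    """Pearson chi-square for a (predictor categories x {0, 1}) table."""
    n = len(y)
    table = defaultdict(lambda: [0, 0])
    for c, yi in zip(categories, y):
        table[c][yi] += 1
    col = [y.count(0), y.count(1)]
    stat = 0.0
    for row in table.values():
        row_total = row[0] + row[1]
        for k in (0, 1):
            expected = row_total * col[k] / n
            if expected > 0:
                stat += (row[k] - expected) ** 2 / expected
    df = (len(table) - 1) * (sum(1 for c in col if c > 0) - 1)
    return stat, df, table

def best_split(records, y, predictors, alpha=0.05, min_size=18):
    """Most significant admissible split as (predictor, p_value), or None when
    no split is significant or some child subsample would be too small."""
    best = None
    for p in predictors:
        stat, df, table = pearson_chi2([r[p] for r in records], y)
        if df == 0 or min(sum(row) for row in table.values()) < min_size:
            continue
        p_value = chi2_sf(stat, df)
        if p_value < alpha and (best is None or p_value < best[1]):
            best = (p, p_value)
    return best
```

Applied recursively to each resulting subsample until `best_split` returns `None`, this step grows a tree of the kind shown in Figures 1 and 2.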

# **3. Results**

The synthetic results of call backs (Table 1) show the following highlights:


(Brocco et al., 2021), this variable was significant at 10%.


*Significance level: \*\*\* 1‰; \*\* 1%; \* 5%; ° 10%.* 

A multivariate analysis was carried out to better understand why recruiters showed higher preferences for intermediate levels of competencies and for graduates without a car. The segmentation analysis of the *Y* variable produced Figures 1 and 2 for the overall and the seven-day call-back rates, respectively. The tree configurations showed the following:


level users of computers (17.4% vs. 1.5% for basic and expert users, respectively).


**Figure 1. Regression tree obtained by partitioning the sample of CVs. Criterion variable: total proportion of call backs.** (Significance: \*=5%; \*\*=1%; \*\*\*=1‰; minimum group size=18)

# **4. Discussion and conclusion**

We hypothesised that the criteria recruiters adopt when screening CVs to select applicants for a job interview are complex. The gold standard of the selection process is the set of activities pertinent to the vacant job. Namely, when confronted with applicants with various competencies, recruiters restrict their choice to those whose competencies are pertinent to the vacancy.

**Figure 2. Regression tree obtained by segmenting the sample of CVs. Criterion variable: call back within seven days of mailing** (Significance: \*=5%; \*\*=1%; \*\*\*=1‰; minimum group size=18)

However, competencies are a matter of interpretation: even if recruiters (possibly helped by line operators) know exactly what the job entails, they are called to state whether the competence of the best candidate fits the organisation's expectations (Taylor and Bergmann, 1987; Rynes and Barber, 1990; Autor et al., 2003; Thebe and Van der Waldt, 2014). For instance, if they are confronted with two applicants, one with a Bachelor's degree and the other with a Master's in the same discipline, given equivalent financial standing, they tend to prudently choose the latter.

Besides competencies, the perceived attitudes and job-related values of applicants are the basic parameters for recruitment. At this level, recruiters' tastes may discriminate against certain candidates. A vast body of research indicates that gender, race, age, and physical, moral and cultural characteristics may cause discrimination. In this research too, the applicants' ethnicity appeared to be a cause of discrimination.

Our research highlighted a sort of reverse discrimination, which pushed us to investigate why recruiters implicitly preferred women to men, candidates with a degree in social sciences or humanities to those with a degree in a STEM field, and candidates with an intermediate rather than a higher level of English and computer skills, and why, finally, they showed an aversion to fresh graduates owning a car. Indeed, the multivariate analysis showed that the preference for women masked a prevalence of social sciences and humanities degrees among call backs, which means that gender is not a cause of discrimination.

The analysis of multiple interactions involving linguistic and/or computer skills showed a higher preference for applicants perceived as likely to be less demanding. The lower preference for graduates owning a car can be considered a further symptom of a reluctance to call back wealthier people. We could conclude that recruiters, contrary to job-market common sense (Autor et al., 2003), considered the risk of losing an exceptional but demanding candidate a minor regret. The practical implication of this outcome is that applicants should consider writing their CV accordingly.

In decision theory, this type of attitude refers to the so-called *minimax regret*, or *avoidance of regret*, criterion, which is typical of a risk-neutral decision-maker. An analogous theory in the recruitment field, called *uncertainty avoidance*, was initially developed by Hofstede (1980) with reference to country cultures and adapted to organizations by Barber (1998) and House et al. (2004). With regard to recruitment, the theory claims that companies should prevent applicants from dropping out during the selection process because retaining candidacies improves the organization's reputation.
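To make the minimax regret criterion concrete, here is a minimal sketch of how it selects an action. The payoffs are purely hypothetical numbers chosen for illustration (they are not estimates from this study): calling the top candidate pays off most if the candidate stays, but nothing if the candidate drops out of the selection process.

```python
def minimax_regret(payoffs):
    """payoffs[action][state] -> (chosen action, maximum regret per action).

    The regret of an action in a state is the best payoff attainable in that
    state minus the payoff of that action; the criterion picks the action
    whose worst-case regret is smallest."""
    actions = list(payoffs)
    n_states = len(payoffs[actions[0]])
    best = [max(payoffs[a][s] for a in actions) for s in range(n_states)]
    max_regret = {a: max(best[s] - payoffs[a][s] for s in range(n_states))
                  for a in actions}
    return min(actions, key=lambda a: max_regret[a]), max_regret

# Hypothetical payoffs (not from the study); states are
# [candidate stays in the process, candidate drops out].
payoffs = {
    "call top candidate": [10, 0],
    "call intermediate candidate": [7, 6],
}
choice, regrets = minimax_regret(payoffs)
```

With these numbers, the worst-case regret is 6 for the top candidate and 3 for the intermediate one, so the criterion favours calling the intermediate candidate.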

According to Hofstede, Italy is a country with a high uncertainty-avoidance culture: Italian companies, for instance, prefer predictability and dislike ambiguous situations. So, in general, recruiters fear that changes in applicants' pursuit intentions could cause a loss of image for their organization and could negatively affect their career (Barber, 1998). Indeed, it is easy to imagine that top candidates are given more occupational opportunities than others and are more prone to drop out of candidate pools (Highhouse et al., 2003).

These cultural considerations<sup>3</sup> interact with technology. Online job postings and company websites are now a main source of organizational information. Therefore, applicants are aware of the reputation of the company advertising the job, and recruiters are cognizant that applicants know this. This job-market transparency contrasts with the hypothesis that recruiters prefer a good candidate to the best one. However, graduates apply for job interviews even if they have been called back for an interview by another company. Uncertainty avoidance theory is relevant to job applicants too since, while trying to avoid the risk of unemployment, they apply even for dead-end and low-paying jobs.

Ultimately, we were led to the hypothesis that the quality of vacancies could also influence the search for a good rather than the best candidate. Unfortunately, job quality was not a factor in our experimental design. We suggest that, in future work, the type and quality of job offers be considered as an ulterior motive influencing recruiters' call-back rate and speed.

# **References**


<sup>3</sup> Cultural studies in business are developing continuously. See, among others, Venaik and Brewer (2010).

### Pasquale Anselmi<sup>a</sup>, Daiana Colledani<sup>a</sup>, Luigi Fabbris<sup>b</sup>, Egidio Robusto<sup>a</sup>, Manuela Scioni<sup>c</sup>. <sup>a</sup> FISPPA Department, University of Padua, Padua, Italy. <sup>b</sup> Tolomeo Studi e Ricerche, Padua and Treviso, Italy. <sup>c</sup> Department of Statistics, University of Padua, Padua, Italy. **Psychometric properties of a new scale for measuring academic positive psychological capital**

# **1. Introduction**

Understanding the factors that may influence students' academic performance and fresh graduates' ability to cope with the labor market is crucial for developing adequate educational policies. Individual dispositions and personality traits are among the most important variables to consider in pursuing this goal. Scholars have attributed an important role to a set of traits developed within the framework of positive psychology (Seligman & Csikszentmihalyi, 2014), named "psychological capital" (PsyCap; Luthans et al., 2007). PsyCap is defined as an individual's positive psychological state of development, characterized by four traits: self-efficacy, resilience, optimism, and hope. Self-efficacy (or confidence) represents one's awareness of having all the abilities and resources needed to accomplish one's own tasks and duties. Resilience indicates the ability to overcome difficulties and "bounce back" from adversity and failure. Optimism reflects the subjective tendency to interpret events and circumstances positively and to consider both positive and negative aspects of reality to draw new knowledge (Youssef & Luthans, 2005). Finally, hope defines a positive motivational state typical of people who are determined to reach their goals and able to redirect, if needed, their strategies to achieve them.

Several instruments for the assessment of these traits can be found in the literature. The most popular is the PsyCap Questionnaire (PCQ; Luthans et al., 2007). Since these instruments are meant for workers, they may not be appropriate for assessing PsyCap traits among fresh graduates who are only about to enter the labor market. To overcome this limitation, a new instrument has been recently developed for measuring PsyCap among students and fresh graduates: The Academic PsyCap (Anselmi et al., 2021; Robusto et al., 2019). It includes four scales that measure the traits of the psychological capital (i.e., self-efficacy, resilience, optimism, and hope) and has been found to be significantly associated with several variables (e.g., entrepreneurial disposition and the number of actions taken to search for a job) that are relevant for students and young workers at the beginning of their careers.

In its latest version, the Academic PsyCap includes 24 items, selected from an initial pool of 37, and has satisfactory psychometric properties (Anselmi et al., 2021). In this work, we present and discuss a refinement of the instrument based on a bifactor approach. The bifactor method models the structure of a questionnaire through a general factor and a set of domain-specific factors. In the case of PsyCap, the general factor is positive psychological capital, whereas the domain-specific factors are its four distinct dimensions. Using this method to refine the scale would allow for a better understanding of the structure of positive psychological capital and for developing an instrument that, while assessing the four dimensions of PsyCap, also provides an effective measure of its general factor. This also makes sense in light of several studies suggesting the existence of a core underlying factor accounting for the overlap between the four PsyCap dimensions (Baron et al., 2016; Choisay et al., 2021; Luthans et al., 2007). This research supported the usefulness of considering the single PsyCap components but also showed that they often act synergistically and that a broader construct may be more effective than the distinct components in predicting individuals' attitudes and performances (Baron et al., 2016; Dawkins et al., 2013; Luthans et al., 2007, 2016).

Pasquale Anselmi, University of Padua, Italy, pasquale.anselmi@unipd.it, 0000-0003-2982-7178 Daiana Colledani, University of Padua, Italy, daiana.colledani@unipd.it, 0000-0003-2840-9193 Luigi Fabbris, University of Padua, Italy, luigi.fabbris@unipd.it, 0000-0001-8657-8361 Egidio Robusto, University of Padua, Italy, egidio.robusto@unipd.it, 0000-0002-7583-2587 Manuela Scioni, University of Padua, Italy, manuela.scioni@unipd.it, 0000-0003-3192-4030

Pasquale Anselmi, Daiana Colledani, Luigi Fabbris, Egidio Robusto, Manuela Scioni, *Psychometric properties of a new scale for measuring academic positive psychological capital*, pp. 25-30, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.06, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

# **2. Method**

# **Participants**

A sample of 1,603 fresh graduates (Males 38.5%, Mean age = 24.44, SD = 4.36), recruited in the context of the PETERE project, took part in the study. All participants were surveyed within one month after graduation at the University of Padua. The survey was administered via a CAWI (Computer-Assisted Web-based Interviewing) system. Students from medicine and nursing courses were not included in the sample.

# **Measures**

The original pool of 37 items was used to measure the four facets of PsyCap: resilience (11 items), self-efficacy (9 items), optimism (9 items), and hope (8 items). All items were scored on a four-point Likert scale (from 1 "Completely disagree" to 4 "Completely agree").

# **Analytic approach**

A bifactor Exploratory Factor Analysis (EFA) was run on the 37 items. Relying on the results of this model and on an inspection of item content, 20 items (five per dimension) were selected to compose the new Academic PsyCap. Thus, starting from the original full item pool, a new version of the scale based on a bifactor approach was obtained. This new scale differs from the one developed by Anselmi et al. (2021), which relied on a different (non-bifactor) approach.

The factor structure of the resulting scale was investigated through confirmatory factor analysis (CFA). Three models were tested and compared: a one-factor model, a correlated four-factor model, and a bifactor model. In the first model, all 20 items of the scale loaded on a single dimension (PsyCap). In the second model, four distinct, correlated factors were defined (i.e., self-efficacy, resilience, optimism, and hope), each consisting of five items. Finally, a bifactor model was run that included one general factor (i.e., positive psychological capital), measured by all 20 items of the scale, and four domain-specific factors (i.e., self-efficacy, resilience, optimism, and hope), each measured by five items.
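The three competing structures can be written compactly in lavaan-style model syntax (`=~` for loadings, `~~` for covariances, `0*` to fix a parameter at zero). This is only a sketch of the model structures: the authors used Mplus, not lavaan, and the item names `se1`...`ho5` are hypothetical. The snippet generates the three specifications as strings, making explicit that the bifactor model adds a general factor on all 20 items and constrains all factors to be mutually uncorrelated.

```python
# Hypothetical item names: five items per dimension, as described in the text.
DIMENSIONS = {
    "SE": [f"se{i}" for i in range(1, 6)],  # self-efficacy
    "RE": [f"re{i}" for i in range(1, 6)],  # resilience
    "OP": [f"op{i}" for i in range(1, 6)],  # optimism
    "HO": [f"ho{i}" for i in range(1, 6)],  # hope
}
ALL_ITEMS = [item for items in DIMENSIONS.values() for item in items]

def one_factor():
    """All 20 items load on a single PsyCap factor."""
    return f"PsyCap =~ {' + '.join(ALL_ITEMS)}"

def correlated_four_factor():
    """Four factors of five items; factor covariances are left free
    (lavaan's cfa() estimates latent covariances by default)."""
    return "\n".join(f"{f} =~ {' + '.join(items)}" for f, items in DIMENSIONS.items())

def bifactor():
    """General factor G on all items plus four specific factors, all orthogonal."""
    lines = [f"G =~ {' + '.join(ALL_ITEMS)}"]
    lines += [f"{f} =~ {' + '.join(items)}" for f, items in DIMENSIONS.items()]
    factors = ["G"] + list(DIMENSIONS)
    for i, f1 in enumerate(factors):
        for f2 in factors[i + 1:]:
            lines.append(f"{f1} ~~ 0*{f2}")  # fix each factor covariance to zero
    return "\n".join(lines)

print(bifactor())
```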

All models were run using Mplus 7 (Muthén & Muthén, 2012) and the WLSMV estimator (weighted least squares mean and variance-adjusted; Muthén & Muthén, 2012), which is recommended for categorical observed data (e.g., Flora & Curran, 2004; Brown, 2006). The goodness of fit of the three models was evaluated using several fit indices: χ<sup>2</sup>, Comparative Fit Index (CFI), Standardized Root Mean Square Residual (SRMR), and Root Mean Square Error of Approximation (RMSEA). A non-significant χ<sup>2</sup> (*p* ≥ .05) suggests adequate fit. Since this statistic is sensitive to sample size, other fit measures were also considered. CFI values close to .90 (over .95 for excellent fit), SRMR values less than .08, and RMSEA values smaller than .06 (.06 to .08 for reasonable fit) are indicative of good model fit (Marsh et al., 2004). To compare the competing factor structures, the Akaike Information Criterion (AIC; Akaike, 1974) was considered. To this aim, following Olatunji et al. (2019) and Rhemtulla et al. (2012), the four-point Likert-scale data were temporarily treated as continuous and the Robust Maximum Likelihood estimator (Muthén & Muthén, 2012) was used. Smaller AIC values indicate better fit, and differences were considered meaningful if models differed in AIC (∆AIC) by 10 or more (Burnham et al., 2011). For the bifactor model, a series of additional indices were also considered, namely the Explained Common Variance (ECV; Sijtsma, 2009; Ten Berge and Sočan, 2004) and McDonald's (1999) coefficients omega (ω) and hierarchical omega (ωh). The ECV represents the ratio of the common variance explained by the general factor to the total common variance (Reise, Bonifay et al., 2013; Reise, Scheines et al., 2013; Rodriguez et al., 2016). 
High values (.70 to .80) indicate that the factor loadings obtained from a unidimensional model closely approximate those on the general factor of the bifactor solution, and suggest that the scale is essentially unidimensional (Rodriguez et al., 2016). McDonald's (1999) ω and ωh are factor-analytic, "model-based" estimates of internal consistency. The former represents the proportion of score variance attributable to all common sources of variance (i.e., general and domain-specific factors), whereas the latter quantifies the proportion of variance accounted for by the general factor alone (Revelle & Zinbarg, 2009; Zinbarg et al., 2005, 2007). Both ω and ωh were computed for the general factor, whereas ω alone was computed for the domain-specific factors. For this coefficient, values close to or greater than .70 are satisfactory. Concerning ωh, values larger than .75-.80 indicate that a factor can be interpreted as the measure of a single construct despite multidimensionality (Reise, Bonifay et al., 2013; Reise, Scheines et al., 2013).
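The indices above can be computed directly from standardized bifactor loadings. The sketch below uses the formulas commonly given for bifactor models (e.g., as summarized by Rodriguez et al., 2016), assuming that each item loads on the general factor and on exactly one specific factor, with unique variance 1 - λ<sub>g</sub><sup>2</sup> - λ<sub>s</sub><sup>2</sup>. The loadings in the usage line are hypothetical, not the values of Table 1.

```python
from collections import defaultdict

def bifactor_indices(general, specific, groups):
    """ECV, omega, omega-hierarchical and subscale omegas from standardized
    bifactor loadings. general[i] and specific[i] are item i's loadings on the
    general factor and on its own specific factor; groups[i] names that factor."""
    # Unique (error) variance of each standardized item.
    uniq = [1.0 - g * g - s * s for g, s in zip(general, specific)]
    sum_g = sum(general)
    spec_sum = defaultdict(float)
    for s, grp in zip(specific, groups):
        spec_sum[grp] += s
    common_spec = sum(v * v for v in spec_sum.values())
    total = sum_g ** 2 + common_spec + sum(uniq)  # total score variance
    sq_g = sum(g * g for g in general)
    ecv = sq_g / (sq_g + sum(s * s for s in specific))
    omega = (sum_g ** 2 + common_spec) / total
    omega_h = sum_g ** 2 / total
    subscale = {}
    for grp in spec_sum:
        idx = [i for i, lab in enumerate(groups) if lab == grp]
        g_part = sum(general[i] for i in idx) ** 2
        s_part = spec_sum[grp] ** 2
        e_part = sum(uniq[i] for i in idx)
        subscale[grp] = (g_part + s_part) / (g_part + s_part + e_part)
    return {"ECV": ecv, "omega": omega, "omega_h": omega_h, "omega_sub": subscale}

# Hypothetical loadings: four items, two specific factors (A and B).
indices = bifactor_indices([0.6] * 4, [0.4] * 4, ["A", "A", "B", "B"])
```

In this toy case the ECV is about .69 and ωh about .64. A general factor that dominates the common variance raises both indices, which is how the thresholds quoted above are read.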

The invariance of the scale across males and females and across bachelor's and master's graduates was tested through Multiple-Group Confirmatory Factor Analysis (MG-CFA). In the first step, the model was fitted simultaneously to the specific subsamples (males and females; bachelor's and master's graduates) to test configural invariance (i.e., the same pattern of fixed and free factor loadings was specified across groups). Subsequently, a series of constrained models were tested and compared to evaluate scalar invariance (i.e., invariance of both factor loadings and item thresholds) and strict invariance (i.e., invariance of factor loadings, item thresholds, and residual variances). The change in CFI (ΔCFI) was used to compare nested models, and invariance was indicated by |ΔCFI| ≤ .01 (Cheung & Rensvold, 2002).
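The following sketch shows how CFI and RMSEA relate to the model χ<sup>2</sup>, and how the ΔCFI rule is applied along the sequence of nested invariance models. It uses the standard textbook definitions (with the N - 1 convention for RMSEA, which varies across programs). Under WLSMV, Mplus plugs adjusted statistics into these indices, so the functions illustrate the indices rather than reproduce Mplus output. All input values in the usage lines are hypothetical.

```python
import math

def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative Fit Index from model and baseline (independence) chi-squares."""
    num = max(chi2_model - df_model, 0.0)
    den = max(chi2_model - df_model, chi2_baseline - df_baseline, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den

def rmsea(chi2_model, df_model, n):
    """Root Mean Square Error of Approximation (N - 1 convention)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))

def invariance_steps(cfis, tol=0.01):
    """cfis: CFI values of nested models ordered from least to most constrained
    (e.g. configural, scalar, strict). Returns one verdict per added set of
    constraints: True when |delta CFI| <= tol (Cheung & Rensvold, 2002)."""
    return [abs(later - earlier) <= tol for earlier, later in zip(cfis, cfis[1:])]

# Hypothetical values, for illustration only.
print(cfi(100.0, 50, 1000.0, 60), rmsea(100.0, 50, 1603))
print(invariance_steps([0.960, 0.955, 0.948]))  # [True, True]
```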

# **3. Results**

Table 1 shows the factor loadings of the three models run on the 20 items selected by the bifactor EFA, whereas Table 2 shows the fit indices of these models. The one-factor model did not fit the data, whereas the other two models fitted better. In the four-factor model, consistent with theoretical expectations, all items showed meaningful loadings on the intended dimensions (*λ*s from .505 to .887, *p*s ≤ .001), even though the correlations between factors were large (*r*s from .580 to .985, *p*s ≤ .001). In the bifactor model, all items loaded significantly on the general factor (*λ*s from .328 to .799, *p*s ≤ .001) and on their respective domain-specific factors (*λ*s from .095 to .705, *p*s ≤ .05). The ΔAICs indicated that the bifactor model was superior to the other two models (ΔAIC between the one-factor and correlated four-factor models = 1892.64; between the one-factor and bifactor models = 2462.13; and between the correlated four-factor and bifactor models = 569.49). Moreover, given the high correlations between the latent factors in the correlated four-factor model, the bifactor solution seems to be the most suitable option to represent the structure of the scale.
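Because the three ΔAIC values are differences between the same three AIC values, they must be additive. As a quick consistency check, writing 1F, 4F and BF for the one-factor, correlated four-factor and bifactor models:

$$
\Delta \mathrm{AIC}\_{1F,BF} = \Delta \mathrm{AIC}\_{1F,4F} + \Delta \mathrm{AIC}\_{4F,BF} = 1892.64 + 569.49 = 2462.13
$$

Each difference also far exceeds the threshold of 10 adopted in the Method, so the ordering one-factor > four-factor > bifactor in AIC is unambiguous.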

In the bifactor model, the ECV of the general factor was .67, indicating that the scale should be regarded as multidimensional. However, the ωh coefficient was high (.86), suggesting that, despite multidimensionality, the general factor can be interpreted as the measure of a single common construct (Reise, Bonifay et al., 2013; Reise, Scheines et al., 2013).

With regard to internal consistency, ω coefficients were satisfactory for both the general and domain-specific factors (ωs = .95, .88, .90, .83, and .81 for general, self-efficacy, optimism, resilience, and hope factors, respectively).

The invariance of the bifactor model was tested across males and females and across bachelor's and master's graduates. The results are reported in Table 3. All models fitted adequately in all subsamples, and the ΔCFI values supported the levels of invariance considered.

# **4. Discussion and conclusion**

In this work, a 20-item version of the Academic PsyCap was developed adopting a bifactor approach. The resulting scale was found to adequately assess the four dimensions of self-efficacy, resilience, optimism, and hope, as well as to appropriately define a general factor of psychological capital. In the bifactor model, both the domain-specific and the general factors showed adequate internal consistency and factorial validity.

The results of this work are in line with the literature indicating that PsyCap components often act synergistically as a broader construct that may be more effective than the distinct components in predicting individuals' attitudes and performances (Baron et al., 2016; Luthans et al., 2007; see Dawkins et al., 2013; Luthans et al., 2016).

Future studies should explore the relationships of the Academic PsyCap scales with indicators of students' and fresh graduates' achievements.


### **Table 1. Factor loadings and correlations between factors**

*Note*. All parameters were significant at *p* ≤ .001, excluding those indicated with \**p* ≤ .05 and \*\**p* ≤ .01. The parameter indicated with † was non-significant (*p* > .05).

**Table 2. Model fit indices**


**Table 3. Fit indices of multiple-group confirmatory factor analyses for invariance** 


# **References**


### Mariangela Zenga<sup>a</sup>. <sup>a</sup> Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milano, Italy. **Gender and Information and Communication Technologies interest: results from PISA 2018**

# 1. Introduction

Information and communication technology (ICT) has a growing presence in modern society, so knowledge and skills related to ICT should become an integral part of education. Moreover, digital literacy develops first of all at school, but also through informal learning at home, among peers and in other out-of-school contexts (Fraillon et al. (2014); Juhanak et al. (2019); Erstad (2012)). Gender and ICT has been a thriving research topic in recent years. Previous research pointed out that gender differences are much more pronounced for ICT use at home than at school (BECTA (2008)): boys use ICT outside school for leisure purposes (such as playing computer or console games) more often than girls, whereas girls make greater use of ICT for schoolwork and online social networking. As for attitudes, confidence and self-efficacy, girls show lower levels of ICT-related dispositions than boys.

In 2015, a new construct, *ICT Engagement*, was introduced by Zylka et al. (2015). ICT Engagement is theoretically based on self-determination theory (Deci and Ryan (2000)) and is assumed to be "a crucial individual factor for developing and adapting ICT skills in a self-regulated way" that "facilitates learning and acquiring new knowledge and skills through the life span by using ICT in both formal and informal learning environments" (Goldhammer et al. (2017)). ICT Engagement comprises ICT interest, perceived ICT competence, perceived autonomy related to ICT use, and ICT as a topic in social interaction (Goldhammer et al. (2017)). In this work we focus on ICT interest (ICTI), which represents a "content-specific motivational disposition" and describes "individuals' long-term preference for dealing with topics, tasks, or activities related to ICT" (Goldhammer et al. (2017)). The construct includes six items with a four-point Likert response scale ranging from 1 to 4 (1 = Strongly disagree, 2 = Disagree, 3 = Agree, 4 = Strongly agree):


The overall ICTI index based on these six items is scaled using a generalized partial credit model (Muraki (1992)), and the index values correspond to Warm's weighted likelihood estimates (Warm (1989)), which are subsequently standardized. In this way, the index has a mean of zero and a standard deviation of one across OECD countries (PISA (2018)).
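A minimal sketch of the two ingredients named above: the category probabilities of one item under the generalized partial credit model, P(X = k | θ) ∝ exp(Σ<sub>v≤k</sub> a(θ - b<sub>v</sub>)), and the final rescaling of scores to mean 0 and standard deviation 1. Warm's weighted likelihood estimation of θ itself is not implemented here, and the item parameters in the usage line are hypothetical. The optional weights stand in for survey weights, which is an assumption about the standardization step.

```python
import math

def gpcm_probs(theta, a, steps):
    """Category probabilities of one item under the generalized partial credit model.

    theta: latent trait; a: item discrimination;
    steps: step parameters b_1..b_m, giving response categories 0..m."""
    # Cumulative sums of a * (theta - b_v); category 0 has an empty sum (0).
    cumulative = [0.0]
    for b in steps:
        cumulative.append(cumulative[-1] + a * (theta - b))
    top = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

def standardize(values, weights=None):
    """Rescale scores to (weighted) mean 0 and (weighted) SD 1."""
    if weights is None:
        weights = [1.0] * len(values)
    wsum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / wsum
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / wsum
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in values]

# Hypothetical four-point item: three steps separate the four response categories.
probs = gpcm_probs(theta=0.5, a=1.2, steps=[-1.0, 0.0, 1.0])
```

As θ increases, probability mass shifts toward the higher response categories, which is what makes the estimated θ usable as an interest score.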

Using 2018 PISA data, we analyze the relationship between gender and ICTI for 15-year-olds in OECD countries. Moreover, a three-level multilevel model is used to show the effects on ICTI of the characteristics of the respondent, of the student's school and of the country in which the student lives.

Mariangela Zenga, University of Milano-Bicocca, Italy, mariangela.zenga@unimib.it, 0000-0002-8112-5627

Mariangela Zenga, *Gender and Information and Communication Technologies interest: results from PISA 2018*, pp. 31-36, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.07, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

# 2. Data

The OECD Programme for International Student Assessment (PISA) is a triennial international survey that aims to assess performance in mathematics, reading, science and financial literacy. It provides the most comprehensive and rigorous international assessment of 15-year-old students' learning outcomes to date. In several countries, the questionnaire includes questions on students' familiarity with and engagement in ICT. The assessments are also supplemented by background questionnaires. Pupils are asked about their motivation for study, attitudes to school, views on reading, and socio-economic background. Another questionnaire asks head teachers about the challenges facing their schools, school organisation and the factors they believe affect their students' performance.

In this paper we analyze 109,106 15-year-old students (49.1% female) from 8,115 schools who were sampled for PISA 2018 in 23 OECD countries. Approximately 38% of students started to use a digital device when they were 6 years old or younger, 37% when they were 7-9 years old and 25% when they were 10 years old or older.

# 3. Statistical method

Multilevel models are used when the aim of the analysis is to investigate the relationships between outcomes and explanatory variables in data that naturally present a hierarchical structure (Goldstein (2011); Hox et al. (2017) and Rice and Leyland (1996)). In this work, a three-level model is proposed: students within schools within OECD countries. In particular, the multilevel model controls for a possible school effect, which may make students within the same school more alike in terms of the outcome than students from different schools, everything else being equal. Moreover, it also accounts for the influence of the country. The proposed model thus includes three levels: student i as the level-1 unit (i = 1, ..., n<sub>j</sub>), school j as the level-2 unit (j = 1, ..., J) and country k as the level-3 unit (k = 1, ..., K). The aim of the analysis is to identify relationships between ICTI and characteristics of the students, schools and countries. Let Y<sub>ijk</sub> be the ICTI score of student i within school j belonging to country k. Following Hox et al. (2017), let X<sup>(1)</sup> = {X<sub>ijk</sub>} be the matrix of level-1 explanatory variables, X<sup>(2)</sup> = {X<sub>jk</sub>} the matrix of level-2 explanatory variables and X<sup>(3)</sup> = {X<sub>k</sub>} the matrix of level-3 explanatory variables. The level-1 model states a linear relationship between the observed response and the level-1 covariates:

$$Y\_{ijk} = \alpha\_{0jk} + X^{(1)}\alpha\_{1jk} + \epsilon\_{ijk}. \tag{1}$$

At the level-2, the intercept of the level-1 model (eq. 1) can be written as:

$$
\alpha\_{0jk} = \beta\_{00k} + X^{(2)}\beta\_{1k} + u\_{0jk}.\tag{2}
$$

Finally, the level-2 intercept in equation 2 can be modeled as:

$$
\beta\_{00k} = \lambda\_{00} + X^{(3)}\lambda\_1 + \gamma\_{0k}.\tag{3}
$$

Substituting equation 3 into equation 2, and the result into equation 1, yields:

$$Y\_{ijk} = \lambda\_{00} + X^{(1)}\alpha\_{1jk} + X^{(2)}\beta\_{1k} + X^{(3)}\lambda\_1 + u\_{0jk} + \gamma\_{0k} + \epsilon\_{ijk}.\tag{4}$$

In eq. 4, the fixed effects are the overall intercept (λ<sub>00</sub>) and the coefficients of the student-, school- and country-level covariates; the random effects are the school-level and country-level intercept deviations (u<sub>0jk</sub> + γ<sub>0k</sub>); and the residuals are represented by ε<sub>ijk</sub>. With respect to the meaning of the levels, u<sub>0jk</sub> is the unobserved random effect of school j on the intercept, with u<sub>0jk</sub> ∼ N(0, σ<sup>2</sup><sub>u</sub>); γ<sub>0k</sub> is the random variation of the intercept among countries, with γ<sub>0k</sub> ∼ N(0, σ<sup>2</sup><sub>γ</sub>); and ε<sub>ijk</sub> ∼ N(0, σ<sup>2</sup><sub>ε</sub>). Random components at different levels are assumed to be uncorrelated, whereas students in the same school or in the same country are correlated through the shared random effects. The random school effect can be interpreted as the deviation of a school's mean ICTI score from the overall mean, adjusted for the fixed effects of student, school and country characteristics. The u<sub>0jk</sub> estimates thus show the contribution of the j-th school to the mean ICTI score. Using the model in eq. 4, the intraclass correlations (ICC) are defined as:

$$\begin{cases}ICC\_{Level-2} = \frac{\sigma\_u^2}{\sigma\_u^2 + \sigma\_\gamma^2 + \sigma\_\epsilon^2} \\ ICC\_{Level-3} = \frac{\sigma\_\gamma^2}{\sigma\_u^2 + \sigma\_\gamma^2 + \sigma\_\epsilon^2} \end{cases} \tag{5}$$

In general, the ICC indicates the proportion of the total variance explained by the grouping structure in the population. In particular, the ICCs in eq. 5 identify the proportion of variance at the school level (ICC<sub>Level-2</sub>) and at the country level (ICC<sub>Level-3</sub>).
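A minimal sketch of eq. 5, together with a simulation of eq. 4 without covariates (Y<sub>ijk</sub> = λ<sub>00</sub> + u<sub>0jk</sub> + γ<sub>0k</sub> + ε<sub>ijk</sub>). The simulation shows that the total variance is the sum of the three components that eq. 5 divides. The variance components 0.19, 0.03 and 0.78 are hypothetical, chosen so that the ICCs match the approximate shares reported below for Model 0. The numbers of countries, schools and students are also assumptions.

```python
import random

def icc(var_school, var_country, var_resid):
    """Level-2 (school) and level-3 (country) intraclass correlations of eq. 5."""
    total = var_school + var_country + var_resid
    return var_school / total, var_country / total

def simulate_null_model(K, J, n, lam00, var_school, var_country, var_resid, seed=1):
    """Draw scores from eq. 4 with no covariates, for K countries,
    J schools per country and n students per school."""
    rng = random.Random(seed)
    scores = []
    for _ in range(K):
        gamma = rng.gauss(0.0, var_country ** 0.5)      # country random effect
        for _ in range(J):
            u = rng.gauss(0.0, var_school ** 0.5)       # school random effect
            for _ in range(n):
                eps = rng.gauss(0.0, var_resid ** 0.5)  # student-level residual
                scores.append(lam00 + u + gamma + eps)
    return scores

icc_school, icc_country = icc(0.19, 0.03, 0.78)
ys = simulate_null_model(100, 20, 10, 0.0, 0.19, 0.03, 0.78)
```

In practice the three variance components are estimated from the data by a multilevel routine; the ICCs are then obtained from those estimates exactly as `icc` does.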

# 4. Results

First, we analyze gender differences in ICTI. In 2018, the mean ICTI is -0.10 for females and 0.10 for males, and the *F* test for gender (F = 11.47, p = 0.0007) suggests that ICTI differs between males and females. The *F* test for the interaction between gender and country (F = 90.29, p < 0.0001) is also significant. Figure 1 reports, for each country, the gender difference in mean ICTI, defined as

$$
\Delta(ICTI) = ICTI\_{Female} - ICTI\_{Male} \tag{6}
$$

If Δ(ICTI) < 0, the female group has a lower mean ICTI than the male group; if Δ(ICTI) > 0, the female group has a higher mean ICTI than the male group; and if Δ(ICTI) = 0, the female group has the same level of interest in ICT as the male group. Czech students show the lowest difference in mean ICTI (-0.27), followed by Belgian (-0.25) and Luxembourg (-0.23) students, while Greek students show the highest difference (0.23), followed by Korean (0.17) and Irish (0.15) students.
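A minimal sketch of eq. 6 for the students of one country. Whether survey weights were used for Figure 1 is not stated here, so the optional `weights` argument is an assumption (PISA does supply student weights). When it is omitted, plain unweighted means are used. The gender labels `"F"`/`"M"` are also hypothetical.

```python
def gender_gap(scores, genders, weights=None, female="F", male="M"):
    """Delta(ICTI) of eq. 6: (weighted) female mean minus (weighted) male mean."""
    if weights is None:
        weights = [1.0] * len(scores)

    def weighted_mean(label):
        pairs = [(w, s) for s, g, w in zip(scores, genders, weights) if g == label]
        return sum(w * s for w, s in pairs) / sum(w for w, _ in pairs)

    return weighted_mean(female) - weighted_mean(male)

# Hypothetical scores for four students: a negative result means that
# females show a lower mean ICTI than males.
gap = gender_gap([1.0, 2.0, 3.0, 4.0], ["F", "F", "M", "M"])
```

Computing this gap separately within each country and sorting the results gives the ranking described above.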

Second, we fit a three-level multilevel model to estimate the effects of several explanatory variables on ICTI. The explanatory variables are: GENDER, AGE (age when the respondent first used a digital device), ESCS (index of economic, social and cultural status), WEALTH (family wealth possessions), ICTHOME (availability of ICT devices at the student's home), ICTSH (availability of ICT devices at the student's school) and USESH (ICT use at school). Table 1 reports the results of the models. The test on the random effects shows that a three-level model is required, underlining that ICTI depends on both the school effect and the country effect. As shown in Model 0, the ICC values of the three-level model indicate that approximately 19% of the variability in ICTI is accounted for by the school and 3% by the country.

Figure 1: Gender difference in ICTI for OECD countries.

The results of the complete model (Model 1) show that, other covariates being equal, the ICTI level of female respondents is higher than that of male respondents. The age at which children start using a digital device has a significant relationship with the ICTI: the earlier children start using a digital device, the higher their interest in ICT. The availability of ICT devices at home has a significant positive relationship with the ICTI. On the contrary, the availability of ICT devices at school has a negative one. Family wealth possessions have a positive effect on the ICTI. Moreover, the ICTI increases as the use of ICT at school increases.

# 5. Conclusion

ICT interest is an individual's preference for (long-term) participation in activities related to ICT and its use (Goldhammer et al., 2017). For this reason, it is important to investigate the relationship between the ICT interest of 15-year-old students and variables describing their family environment. A multilevel model allows this relationship to be studied while accounting for the influence of the school. We verified that a gender difference exists, but that it also depends on the country in which the students live. The results also suggest that ICT interest depends much more on the school effect than on the country effect. Among the other factors considered, the following emerge as relevant explanatory variables for the ICTI: gender, the age at which children start using a digital device, the availability of ICT devices at home, family wealth possessions and the use of ICT at school. No evidence was found of a relationship with the economic, social and cultural status of the student's family.


Table 1: Results of the three-level multilevel models. Source: calculations on 2018 PISA data. In the table, "rc" denotes the reference category.

# References

BECTA (2008). How do boys and girls differ in their use of ICT? Coventry: BECTA.


### **A structural equation model to measure logical competences**

Silvia Bacci <sup>a</sup>, Bruno Bertaccini <sup>a</sup>, Riccardo Bruni <sup>b</sup>, Federico Crescenzi <sup>a</sup>, Beatrice Donati <sup>b</sup>

<sup>a</sup> Department of Statistics, Computer Science, Applications, University of Florence, Florence, Italy; <sup>b</sup> Department of Humanities, University of Florence, Florence, Italy

# 1. Introduction

Logical abilities are a ubiquitous ingredient in all contexts that involve soft skills, argumentative skills or critical thinking. However, there is a substantial lack of research on whether students actually possess such logical abilities. To address this gap, since October 2020 the University of Florence has promoted a three-stage initiative. It collects data to measure the logical abilities of students when they enrol at the University. The first stage is an entrance test that assesses the students' initial abilities. This test comprises ten questions, each investigating a specific reasoning construct.

At the second stage, students attend a short training course to strengthen their logical abilities. At the third stage, they take an exit test, which evaluates the effectiveness of the course and replicates the structure and difficulty of the entrance test.

This paper builds on the previous work by Bertaccini et al. (2021), where the effectiveness of the course was tested via Item Response Theory (DeMars, 2010; Bartolucci et al., 2019) and test-equating techniques (Battauz, 2015). We use an enlarged database of students who took the training course and the examinations in the second half of 2021, together with auxiliary information about the students' characteristics. On this basis, we estimated a Structural Equation Model (SEM; Duncan, 2014; Bollen, 1989) to better understand and interpret the results reported by Bertaccini et al. (2021).

# 2. Data and methods

# Data

The data analysed in this work come from the 80 students who took both tests and the short training course in the second semester of the academic year 2020-2021. The items of each test investigate the same logical constructs, namely:

- Double negation (item code N);
- Disjunction negation (item code D);
- Conjunction negation (item code C);
- Hypothetical reasoning (item code IMPL);
- Sufficient and necessary conditions (item code NEC);
- Negation of the universal quantifier (item code NU);
- Negation of the existential quantifier (item code NE);
- Modus tollens (item code MT);
- Syllogism (item code S);
- Multiple steps deduction (item code DED).

Students who answer an item correctly receive a score of 1; otherwise, they receive a score of 0.

In addition, we obtained exogenous information on the students' characteristics. This includes their age, their final secondary-school grade, the scientific area in which they enrolled (i.e., science, social, technical, humanistic) and their years of university enrolment.

Compared with the previous work by Bertaccini et al. (2021), the novelty of this study lies in investigating how auxiliary information on students helps explain their logical abilities and the effectiveness of the training course. Bertaccini et al. (2021) assessed the effectiveness of the course by

<sup>27</sup> Silvia Bacci, University of Florence, Italy, silvia.bacci@unifi.it, 0000-0001-8097-3870 Bruno Bertaccini, University of Florence, Italy, bruno.bertaccini@unifi.it, 0000-0002-5816-2964 Riccardo Bruni, University of Florence, Italy, riccardo.bruni@unifi.it, 0000-0003-2695-0058 Federico Crescenzi, University of Florence, Italy, federico.crescenzi@unifi.it, 0000-0002-0701-4398 Beatrice Donati, University of Florence, Italy, beatrice.donati@unifi.it, 0000-0002-4707-8476

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Silvia Bacci, Bruno Bertaccini, Riccardo Bruni, Federico Crescenzi, Beatrice Donati, *A structural equation model to measure logical competences*, pp. 37-41, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.08, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

first estimating two Item Response Theory (IRT) models, one for the entrance test and one for the exit test. They then tested, through a test-equating procedure, whether there was a significant shift in the distribution of logical abilities. Given the availability of auxiliary information, we opted for a one-step procedure based on SEM. This procedure accounts both for the measurement of logical abilities before and after the training course and for the structural relations among the observed variables (i.e., student characteristics) and the latent variables (i.e., logical abilities).

# Methods

A SEM is a multivariate technique used to test complex relationships between observed (manifest) and unobserved (latent) variables, as well as relationships between two or more latent variables. Special observed variables, named indicators or items, are used to measure the latent variables. In turn, both observed and latent variables are classified as either exogenous or endogenous. Exogenous variables are not explained within the model. Endogenous variables are affected by other variables in the model (plus an error term). A SEM consists of a system of multiple equations that form two sub-models:

- a structural model, which describes the relationships among latent variables, as well as between endogenous latent variables and observed variables;
- a measurement model, which links the latent variables to the items.

In more detail, the structural model can be expressed by the following equation

$$
\eta = B\eta + \Gamma\xi + \zeta,\tag{1}
$$

where we model the latent logical ability at the exit test, η, as depending on the latent logical ability at entrance, ξ. In (1), B is the matrix of regression coefficients among the endogenous latent variables, Γ is the matrix of regression coefficients of the endogenous on the exogenous latent variables, and ζ is a vector of errors.
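As a reading aid, eq. (1) can be solved for η. This is the standard reduced-form derivation. It relies on assumptions not stated in the text: I − B is invertible and ζ is uncorrelated with ξ. The symbols Φ = Cov(ξ) and Ψ = Cov(ζ) are notation introduced here only for illustration.

$$
(I - B)\,\eta = \Gamma\xi + \zeta \;\Longrightarrow\; \eta = (I - B)^{-1}\left(\Gamma\xi + \zeta\right), \qquad
\operatorname{Cov}(\eta) = (I - B)^{-1}\left(\Gamma\Phi\Gamma^{\top} + \Psi\right)(I - B)^{-\top}.
$$

By convention, B has a zero diagonal. So with a single endogenous latent variable (ability at exit), B = 0 and the reduced form is simply η = Γξ + ζ. In that case, Γ is the regression coefficient of exit ability on entrance ability.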

Figure 1: Structural part of the theoretical SEM.

The measurement model is defined by two equations, respectively for the exogenous (2) and endogenous (3) latent variables:

$$x = \Delta\_x \xi + \delta,\tag{2}$$

$$y = \Delta\_y \eta + \epsilon,\tag{3}$$

where y is a vector of the item responses and x is a vector of exogenous individual characteristics. In (2) and (3), Δ<sub>x</sub> and Δ<sub>y</sub> are matrices of factor loadings, while δ and ε are vectors of error terms.

Note that when the exogenous variables are not affected by measurement error, ξ can be replaced by the observed vector x, and the structural model (1) simplifies to:

$$
\eta = B\eta + \Gamma x + \zeta \tag{4}
$$

# 3. Results

The final SEM, including all the significant variables, is reported in Figure 2, and more detailed estimates are shown in Table 1. The estimated SEM shows a good fit to the data, with a Comparative Fit Index (CFI) of 0.944, a Tucker-Lewis Index (TLI) of 0.935 and a Root Mean Square Error of Approximation (RMSEA) of 0.054. All estimates were obtained using the R package lavaan (Rosseel, 2012).
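The three indices are computed from the chi-square statistics of the fitted model and of the baseline (independence) model. The sketch below uses the textbook formulas. It assumes the N − 1 convention for RMSEA. Software may use N instead, or robust/scaled statistics, which are common with binary items. The input values are hypothetical, because the chi-square statistics are not reported here.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root Mean Square Error of Approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative Fit Index of model m relative to the baseline model b."""
    misfit_m = max(chi2_m - df_m, 0.0)
    misfit_b = max(chi2_b - df_b, misfit_m)
    return 1.0 if misfit_b == 0 else 1.0 - misfit_m / misfit_b


def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis Index (non-normed fit index); it can exceed 1."""
    ratio_b = chi2_b / df_b
    ratio_m = chi2_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)


# Hypothetical chi-square statistics for a sample of n = 80 students.
chi2_model, df_model = 250.0, 169
chi2_base, df_base = 1500.0, 190
print(f"RMSEA = {rmsea(chi2_model, df_model, 80):.3f}")
print(f"CFI   = {cfi(chi2_model, df_model, chi2_base, df_base):.3f}")
print(f"TLI   = {tli(chi2_model, df_model, chi2_base, df_base):.3f}")
```

CFI compares the model's excess chi-square with that of the baseline model. TLI compares chi-square/df ratios, which penalises model complexity.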

Figure 2: Final SEM.

Our results for the measurement part suggest that the course was effective, as the estimated item coefficients are greater in magnitude after the training course (see Figure 2).

Regarding the regression part of the model, the only significant determinant of logical skill at entrance was the final high-school grade. The only significant determinant of logical skill at exit was logical skill at entrance. These results confirm that the short training course was useful for sharpening students' logical abilities, and they are consistent with the preliminary results obtained by Bertaccini et al. (2021).


Table 1: SEM results: measurement part, regression part, and covariances.

# 4. Conclusions

In this paper, we extended the previous study by Bertaccini et al. (2021) to offer a more comprehensive and unified framework for testing the effectiveness of the training course. The course aims to develop the logical skills of students enrolling at the University of Florence. Its effectiveness is confirmed. It would therefore be advisable for the University of Florence to design an internal policy so that the course may become a standard tool for training and evaluation.

# References


Bertaccini, B., Bruni, R., Crescenzi, F., and Donati, B. (2021). Measuring logical competences and soft skills when enrolling in a university degree course. In Bertaccini, B., Fabbris, L., and Petrucci, A., editors, *ASA 2021 Statistics and Information Systems for Policy Evaluation: Book of short papers of the opening conference*, volume 127. Firenze University Press.

Bollen, K. A. (1989). *Structural equations with latent variables*, volume 210. John Wiley & Sons.

DeMars, C. (2010). *Item response theory*. Oxford University Press.


### **Clustering students according to their proficiency: a comparison between different approaches based on item response theory models**

Rosa Fabbricatore <sup>a</sup>, Francesco Palumbo <sup>b</sup>

<sup>a</sup> Department of Social Sciences, University of Naples Federico II, Naples, Italy; <sup>b</sup> Department of Political Sciences, University of Naples Federico II, Naples, Italy

# 1. Introduction

Evaluating learners' competencies is a crucial concern in education, and structured tests, administered at home or in the classroom, are an effective assessment tool. Structured tests consist of sets of items that can refer to several abilities or to more than one topic. Several statistical approaches evaluate students by treating the items in a multidimensional way that accounts for their structure. Depending on the final aim of the evaluation, the assessment process either assigns a final grade to each student or clusters students into homogeneous groups according to their level of mastery and ability. The latter is a helpful tool for developing tailored recommendations and remediation for each group (Davino et al., 2020; Fabbricatore et al., 2021). To this end, latent class models are a reference approach.

In the item response theory (IRT) paradigm, multidimensional latent class IRT models relax both the traditional constraint of unidimensionality and the assumption of a continuous latent trait. This makes it possible to detect sub-populations of students who are homogeneous in their proficiency level, while also accounting for the multidimensional nature of their ability (Bartolucci et al., 2014). Moreover, the semi-parametric formulation has several practical advantages: it avoids normality assumptions that may not hold and reduces the computational burden.

However, when the goal is also to estimate each student's ability level accurately, in addition to clustering, a two-step approach can be used.

In this vein, this study compares the results of multidimensional latent class IRT models with those obtained by a two-step procedure. The procedure first fits a set of unidimensional IRT models to estimate students' ability in each knowledge domain and then applies a clustering algorithm to classify students accordingly. For the second step, both parametric and non-parametric approaches were considered: the k-means clustering algorithm (MacQueen, 1967), Gaussian mixture model-based clustering (McLachlan and Peel, 2000) and archetypal analysis (Cutler and Breiman, 1994).

The aim is to investigate similarities and differences in group detection and student classification. Indeed, students' profiles described in terms of a set of reference groups can take many forms, depending on the adopted approach and estimation procedure.

# 2. Data and procedure

Data refer to the N = 944 candidates who took the 2014 admission test for the degree course in psychology at the University of Naples Federico II.

The admission test assesses five knowledge domains: Humanities (30 items), Reading (30 items), Mathematics (10 items), Science (10 items) and English (20 items). Correct answers receive one credit and are coded as 1, whereas blank and wrong answers receive no credit and are coded as 0.

<sup>33</sup> Rosa Fabbricatore, University of Naples Federico II, Italy, rosa.fabbricatore@unina.it, 0000-0002-4056-4375 Francesco Palumbo, University of Naples Federico II, Italy, fpalumbo@unina.it


Rosa Fabbricatore, Francesco Palumbo, *Clustering students according to their proficiency: a comparison between different approaches based on item response theory models*, pp. 43-48, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.09, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

First, we fitted the multidimensional latent class IRT model to cluster subjects into classes that are as homogeneous as possible according to their abilities, while accounting for the multidimensional structure of the data. Second, we implemented three two-step procedures based, respectively, on the k-means algorithm, Gaussian mixture modelling and archetypal analysis. Finally, we compared the different approaches by means of a graphical example and evaluated their agreement through the Adjusted Rand Index (ARI; Hubert and Arabie, 1985). The ARI is a commonly used measure of agreement between clusterings. It compares a partition with another partition of the same elements or with an external criterion. The index is computed from two counts:

- agreements: pairs of elements allocated to the same cluster, or to different clusters, in both partitions;
- disagreements: pairs placed in the same cluster in one partition but in different clusters in the other.

The ARI equals 1 when the two partitions agree perfectly and has an expected value of 0 under random partitioning. Negative values are possible when agreement is below chance.
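The pair-counting idea can be made concrete with the contingency-table form of the ARI (Hubert and Arabie, 1985). The function below is a hypothetical stand-alone sketch. The two allocations of six students are invented for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b) -> float:
    """Adjusted Rand Index (Hubert and Arabie, 1985) between two partitions.

    labels_a[i] and labels_b[i] are the cluster labels of element i in the two
    partitions; only the grouping matters, not the label values themselves.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same elements")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs placed together in both partitions: sum of C(n_ij, 2) over the contingency table.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately (row and column margins).
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    max_index = (together_a + together_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions are one cluster
        return 1.0
    return (together_both - expected) / (max_index - expected)


# Two hypothetical 3-class allocations of the same six students.
lc_irt = [1, 1, 1, 2, 2, 3]
kmeans = [2, 2, 1, 1, 3, 3]
print(round(adjusted_rand_index(lc_irt, kmeans), 3))
```

Because only co-membership of pairs is counted, relabelling the clusters does not change the index. The same quantity is available as `adjusted_rand_score` in scikit-learn (`sklearn.metrics`) and as `adjustedRandIndex` in the R package mclust.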

# 3. Statistical method

The methods compared in this study use IRT models to estimate students' ability (see Bartolucci et al. (2019) for a review of IRT models). In more detail, we considered the two-parameter logistic (2PL) IRT parametrization. In the 2PL, the guessing parameter (lower asymptote) is constrained to 0 and the ceiling parameter (upper asymptote) to 1. Thus, the probability of a correct response depends only on the item discrimination and difficulty parameters and on the student's ability. More formally, the probability that subject s correctly answers the dichotomously scored item i (with i = 1,...,I) can be expressed as follows:

$$P(X\_{si} = 1 | \theta\_s, a\_i, b\_i) = \frac{e^{a\_i(\theta\_s - b\_i)}}{1 + e^{a\_i(\theta\_s - b\_i)}},\tag{1}$$

where X<sub>si</sub> is the response of subject s to item i, with realization x<sub>si</sub> ∈ {0, 1}. θ<sub>s</sub> ∈ ℝ is the ability of subject s, a<sub>i</sub> ∈ ℝ is the discrimination parameter of item i, and b<sub>i</sub> ∈ ℝ is its difficulty. Traditional IRT models rest on three main assumptions: unidimensionality, monotonicity and local independence. Moreover, the latent trait is typically assumed to follow a continuous (normal) probability distribution.
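The following is a minimal sketch of eq. (1). To make the role of the constrained parameters explicit, the function is written in the four-parameter logistic form P = c + (d − c) · logistic(a(θ − b)). The keyword arguments `guess` (c) and `ceiling` (d) are illustrative names introduced here. With c = 0 and d = 1, the function reduces to the 2PL model. The item parameters in the usage lines are hypothetical.

```python
import math


def item_response_prob(theta: float, a: float, b: float,
                       guess: float = 0.0, ceiling: float = 1.0) -> float:
    """Probability of a correct answer under a logistic IRT model.

    With guess = 0 and ceiling = 1 (the defaults) this is the 2PL model of eq. (1);
    other values give the 4PL form with lower asymptote `guess` and upper
    asymptote `ceiling`.
    """
    z = a * (theta - b)
    # Numerically stable evaluation of the logistic function 1 / (1 + exp(-z)).
    if z >= 0:
        logistic = 1.0 / (1.0 + math.exp(-z))
    else:
        ez = math.exp(z)
        logistic = ez / (1.0 + ez)
    return guess + (ceiling - guess) * logistic


# Hypothetical item with discrimination a = 1.5 and difficulty b = 0.5.
for theta in (-1.0, 0.5, 2.0):
    print(theta, round(item_response_prob(theta, a=1.5, b=0.5), 3))
```

At θ = b, the 2PL probability is exactly 0.5. A larger discrimination a makes the curve steeper around that point.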

Within this theoretical paradigm, multidimensional latent class IRT models are a semi-parametric formulation of traditional IRT models. They relax both the unidimensionality constraint and the assumption of a continuous latent trait. This extension is particularly useful for detecting sub-populations of students who are homogeneous in their ability level.

Since we defined ability as a multidimensional latent trait, each subject is described by the ability vector Θ<sub>s</sub> = (Θ<sub>s1</sub>, Θ<sub>s2</sub>, ..., Θ<sub>sD</sub>), where D is the number of dimensions considered. Following the between-item multidimensional formulation, each item measures only one dimension. The items are therefore divided into subsets I<sub>d</sub>, with d = 1, 2, ..., D.

Moreover, according to the semi-parametric formulation, the latent trait vector has a discrete distribution with support points ξ<sub>1</sub>, ..., ξ<sub>k</sub>. These define k latent classes with weights π<sub>1</sub>, ..., π<sub>k</sub>. The main assumption is that subjects in the same latent class share common levels of the latent trait. The generic class weight π<sub>c</sub> (with c = 1, ..., k) is the probability of belonging to class c. It can be expressed as π<sub>c</sub> = P(Θ<sub>s</sub> = ξ<sub>c</sub>), with ∑<sub>c=1</sub><sup>k</sup> π<sub>c</sub> = 1 and π<sub>c</sub> ≥ 0.

Accordingly, the manifest distribution of the response vector X = (X<sub>1</sub>, ..., X<sub>I</sub>) can be formalized as:

$$P(\mathbf{X} = \mathbf{x}) = \sum\_{c=1}^{k} P(\mathbf{X} = \mathbf{x} | \boldsymbol{\Theta} = \boldsymbol{\xi}\_c) \pi\_c = \sum\_{c=1}^{k} \prod\_{d=1}^{D} \prod\_{i \in I\_d} P(X\_i = x\_i | \boldsymbol{\Theta}\_d = \boldsymbol{\xi}\_{cd}) \pi\_c,\tag{2}$$

where P(X<sub>i</sub> = x<sub>i</sub> | Θ<sub>d</sub> = ξ<sub>cd</sub>) is specified according to the 2PL parameterization in (1).

The number of classes k can be derived from theoretical assumptions or by comparing model fit measures across different values of k. Each unit was assigned to the class with the highest posterior probability of membership.
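Eq. (2) and the modal class assignment can be sketched together. For a given response pattern, the class-conditional likelihoods follow from local independence and the 2PL. Weighting them by π<sub>c</sub> and summing gives P(X = x). Normalising them gives the posterior class probabilities used for allocation. This assumes parameters are already estimated; in the paper they come from MultiLCIRT via EM, which is not reproduced here. All function names and parameter values below are hypothetical.

```python
import itertools
import math


def prob_2pl(theta, a, b):
    """2PL item response function, eq. (1)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def class_posteriors(x, item_dim, a, b, support, weights):
    """Manifest probability P(X = x) of eq. (2) and posterior class probabilities.

    x        : 0/1 responses, one per item
    item_dim : item_dim[i] = dimension d measured by item i (between-item structure)
    a, b     : 2PL discrimination and difficulty of each item
    support  : support[c][d] = support point xi_cd of class c on dimension d
    weights  : class weights pi_c, summing to 1
    """
    joint = []
    for xi_c, pi_c in zip(support, weights):
        # Local independence: product of item probabilities given the class.
        lik = 1.0
        for i, x_i in enumerate(x):
            p = prob_2pl(xi_c[item_dim[i]], a[i], b[i])
            lik *= p if x_i == 1 else 1.0 - p
        joint.append(lik * pi_c)
    marginal = sum(joint)                      # eq. (2)
    posterior = [j / marginal for j in joint]  # P(class c | X = x)
    return marginal, posterior


def modal_class(posterior):
    """Index of the class with the highest posterior probability."""
    return max(range(len(posterior)), key=posterior.__getitem__)


# Hypothetical setting: 4 items on 2 dimensions (items 0-1 -> dim 0, items 2-3 -> dim 1)
# and 3 classes ordered by ability. None of these values come from the paper.
item_dim = [0, 0, 1, 1]
a = [1.0, 1.0, 1.0, 1.0]
b = [0.0, 0.0, 0.0, 0.0]
support = [(-1.0, -1.0), (0.0, 0.0), (1.5, 1.5)]
weights = [0.4, 0.4, 0.2]

marginal, posterior = class_posteriors([1, 1, 0, 1], item_dim, a, b, support, weights)
print(f"P(X = x) = {marginal:.4f}, assigned class = {modal_class(posterior) + 1}")

# Sanity check: eq. (2) defines a distribution over all 2**4 response patterns.
total = sum(class_posteriors(list(x), item_dim, a, b, support, weights)[0]
            for x in itertools.product((0, 1), repeat=len(item_dim)))
print(f"sum over all patterns = {total:.6f}")
```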

Model parameters are usually estimated by Maximum Marginal Likelihood (MML); for the latent class formulation, the Expectation-Maximization (EM) algorithm is used (Dempster et al., 1977). Estimation was performed with the R packages mirt (Chalmers, 2012) and MultiLCIRT (Bartolucci et al., 2014) for the parametric and semi-parametric IRT formulations, respectively.

As stated above, latent class IRT models avoid parametric assumptions that may not hold and that make the estimation process computationally demanding. Moreover, they are more flexible than the parametric formulation when the main aim is to cluster individuals. However, the semi-parametric formulation provides less accurate estimates of individual ability than the continuous one.

The clustering algorithms applied to the ability estimates are briefly described below. The *k-means* algorithm produces a hard clustering. It iteratively updates the data partition according to the Euclidean distance of each point from the cluster centres. It is one of the most widely used algorithms in cluster analysis, mainly because it is easy to implement and interpret. Nevertheless, k-means works well only when clusters are roughly spherical and no outliers are present in the data set. First, relying only on cluster centroids is not sufficient to detect subpopulations whose covariance parameters differ substantially. Second, centroids can be dragged by outliers.
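The assign-then-update loop can be illustrated with a minimal sketch of Lloyd's batch iteration. This is one of the algorithms offered by R's `kmeans()` in the stats package, whose default is Hartigan-Wong. The sketch is not a reproduction of the authors' run. The starting centres are supplied by the caller here for reproducibility, whereas in practice several random starts are usual. The ability estimates are hypothetical.

```python
import math


def kmeans(points, init_centres, max_iter=100):
    """Lloyd's k-means: alternate nearest-centre assignment and centre updates.

    points       : list of coordinate tuples (e.g. ability estimates per domain)
    init_centres : list of k starting centres chosen by the caller
    Returns the final labels (0..k-1) and the final centres.
    """
    centres = [list(c) for c in init_centres]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre (Euclidean distance).
        new_labels = [min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:  # partition unchanged: converged
            break
        labels = new_labels
        # Update step: each centre moves to the mean of the points assigned to it.
        for j in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centres


# Hypothetical two-domain ability estimates for nine students.
abilities = [(-1.1, -0.9), (-0.9, -1.1), (-1.0, -1.0),
             (0.1, 1.0), (-0.1, 1.0), (0.0, 1.1),
             (1.5, 0.1), (1.6, -0.1), (1.4, 0.0)]
labels, centres = kmeans(abilities, init_centres=[abilities[0], abilities[3], abilities[6]])
print(labels)
```

Only the centres enter the assignment step. This is why k-means cannot distinguish groups that differ mainly in spread or orientation.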

To overcome these issues, the *Gaussian mixture model* provides model-based clustering. It can detect sub-populations that share the same (Gaussian) distributional form but differ in one or more parameter vectors. In particular, these models can estimate a specific covariance matrix for each cluster and better handle the presence of outliers. On the other hand, they entail a risk of overparameterization: increasing model complexity does not guarantee a better solution to the classification problem.
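The following sketch shows how a mixture model gives each cluster its own variance. It is a univariate Gaussian mixture fitted by EM with component-specific variances. It is deliberately simpler than mclust (Scrucca et al., 2016), which fits multivariate mixtures and selects among covariance parameterizations by BIC. The number of components is fixed, the start is deterministic, and the scores are hypothetical.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def gmm_em_1d(data, k, n_iter=200, var_floor=1e-6):
    """EM for a univariate Gaussian mixture with component-specific variances.

    Returns (weights, means, variances, labels); labels give, for each observation,
    the component with the highest responsibility.
    """
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: means at evenly spaced order statistics, common variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    variances = [max(sum((x - grand_mean) ** 2 for x in data) / n, var_floor)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: responsibilities P(component j | x_i), via log-sum-exp for stability.
        resp = []
        for x in data:
            logs = [math.log(w) + log_normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            expd = [math.exp(l - top) for l in logs]
            total = sum(expd)
            resp.append([e / total for e in expd])
        # M-step: update each component's weight, mean and its own variance.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2
                                   for r, x in zip(resp, data)) / nj, var_floor)
    labels = [max(range(k), key=r.__getitem__) for r in resp]
    return weights, means, variances, labels


# Hypothetical one-domain ability estimates forming two groups.
scores = [-2.1, -1.9, -2.0, -2.2, -1.8, 2.0, 2.1, 1.9, 2.2, 1.8]
weights, means, variances, labels = gmm_em_1d(scores, k=2)
print([round(m, 2) for m in means], [round(v, 3) for v in variances], labels)
```

Unlike k-means, each observation receives soft responsibilities. The hard labels are obtained only at the end, by taking the most probable component.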

Compared with the methods mentioned above, archetypal analysis yields more separated groups. It detects extreme representative observations that differ from each other as much as possible. It then approximates each point in a data set as a convex combination of this set of extreme points, called archetypes, which lie on the convex hull of the data. Its main drawback is its computational cost, especially as the number of observations increases.
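The "convex combination" step can be sketched for fixed archetypes. Each point x is approximated by Σ<sub>j</sub> α<sub>j</sub> z<sub>j</sub> with weights α on the simplex (non-negative, summing to 1). The sketch solves only for α, using projected gradient descent with a Euclidean projection onto the simplex. That solver is a choice made here for illustration, not necessarily the one used by the archetypes package. Full archetypal analysis (Cutler and Breiman, 1994) also estimates the archetypes themselves, which is not shown. The archetypes and points are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - 1.0) / j
        if u_j - t > 0:  # the largest such j determines the threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


def reconstruct(alpha, archetypes):
    """Convex combination sum_j alpha_j z_j of the archetypes."""
    dim = len(archetypes[0])
    return [sum(a * z[d] for a, z in zip(alpha, archetypes)) for d in range(dim)]


def convex_weights(x, archetypes, n_iter=2000):
    """Simplex weights alpha minimising ||sum_j alpha_j z_j - x||^2 (projected gradient)."""
    k, dim = len(archetypes), len(x)
    # 2 * trace(Z^T Z) bounds the gradient's Lipschitz constant, so 1/L is a safe step.
    lipschitz = 2.0 * sum(c * c for z in archetypes for c in z)
    step = 1.0 / lipschitz if lipschitz > 0 else 1.0
    alpha = [1.0 / k] * k
    for _ in range(n_iter):
        resid = [r - xd for r, xd in zip(reconstruct(alpha, archetypes), x)]
        grad = [2.0 * sum(z[d] * resid[d] for d in range(dim)) for z in archetypes]
        alpha = project_to_simplex([a - step * g for a, g in zip(alpha, grad)])
    return alpha


# Hypothetical archetypes in a two-domain ability space (corners of a triangle).
archetypes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
alpha = convex_weights((0.25, 0.25), archetypes)
print([round(w, 3) for w in alpha])
```

A point inside the convex hull is reproduced exactly. A point outside it is mapped to the nearest point on the hull, which is why archetypes sit at the extremes of the data.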

The corresponding R packages used to carry out the analyses were stats, mclust (Scrucca et al., 2016), and archetypes (Eugster, 2009).

# 4. Results

A set of multidimensional latent class IRT models with different numbers of latent classes k was estimated. Based on the Bayesian information criterion (BIC; Schwarz, 1978), we chose the model with k = 3 as the best one for describing our data. The support points show that the latent classes are ordered by increasing proficiency in all the considered domains (see Table 1). In particular:

- Class 1 encompasses students with poor performance in all five domains;
- Class 2 includes students with low performance in Humanities, Mathematics, Science and English, and high performance in Reading;
- Class 3 consists of students with good performance in all domains except Humanities, in which they achieved an average performance.

The class weights indicate that Class 2 (moderate ability) is the largest (π<sub>2</sub> = 0.48), followed by Class 1 (lower ability; π<sub>1</sub> = 0.39).

This result was compared with students' classifications obtained by a two-step procedure.

As stated above, the comparison involved different clustering algorithms applied to the students' ability estimates provided by the set of unidimensional IRT models (see Table 1). For comparison purposes, the number of classes was fixed at k = 3 in all clustering procedures.



The example reported in Figure 1 shows only two of the five ability dimensions, for simplicity and lack of space. It depicts the differences in student allocation produced by the considered clustering approaches. As the figure suggests, the multidimensional latent class IRT model reached the strongest agreement in classification with the k-means algorithm (ARI = 0.53). The confusion matrix showed where the two differed most: compared with the k-means-based approach, the multidimensional latent class IRT model allocated more students to Class 2 rather than to Class 1. A weaker agreement was found with archetypal analysis (ARI = 0.39). The lowest agreement was obtained with the parametric approach based on Gaussian mixture modelling (ARI = 0.09).

Note that, besides the ability estimates in Table 1, the considered clustering approaches also differ in their allocation procedure. This procedure strongly influences the level of agreement between the partitions and, consequently, the ARI.

Figure 1: Student allocation based on different clustering approaches. The x- and y-axes refer to students' ability estimates from the multidimensional IRT model in Humanities and Reading, respectively. Depending on the method, red points indicate: (a) standardized support points, (b) centroids, (c) component means and (d) archetypes.

# 5. Conclusion

The study provides useful insights into the differences between approaches used for clustering purposes. We take it as given that the adequacy of a method mainly depends on the research goals, so that no method is best in absolute terms. On this basis, we compared different approaches to show which of the clustering algorithms considered in the two-step procedure gives results most similar to those of the multidimensional latent class IRT model.

The proposed comparison also highlights the differences between the parametric and semi-parametric formulations of IRT models in practical applications.

Future research should investigate how the considered approaches perform under different data structures. It would also be interesting to consider the differences that arise when ability is estimated with classical test theory rather than the IRT paradigm.

# References

Bartolucci, F., Bacci, S., Gnaldi, M. (2014). MultiLCIRT: An R package for multidimensional latent class item response models. *Computational Statistics & Data Analysis*, 71, pp. 971–985.

Bartolucci, F., Bacci, S., Gnaldi, M. (2019). *Statistical analysis of questionnaires: A unified approach based on R and Stata*. Chapman and Hall/CRC, London, (UK).

Chalmers, R.P. (2012). mirt: A multidimensional item response theory package for the R environment. *Journal of Statistical Software*, 48(6), pp. 1–29.

Cutler, A., Breiman, L. (1994). Archetypal analysis. *Technometrics*, 36(4), pp. 338–347.


McLachlan, G.J., Peel, D. (2000). *Finite mixture models*. Wiley-Interscience, New York, (NY).


### **Sustainable Innovation: worldwide trends in the scientific production through a bibliometric study**

Rosanna Cataldo <sup>a</sup>, Corrado Crocetta <sup>b</sup>, Maria Gabriella Grassia <sup>a</sup>, Paolo Mazzocchi <sup>c</sup>, Antonella Rocca <sup>c</sup>, Claudio Quintano <sup>d</sup>

<sup>a</sup> Department of Social Sciences, University of Naples "Federico II", Italy; <sup>b</sup> Department of Economics, University of Foggia, Italy; <sup>c</sup> Department of Management and Quantitative Studies - University of Naples "Parthenope", Italy; <sup>d</sup> Department of Legal Sciences - University of Naples "Suor Orsola Benincasa", Italy

# 1. The Sustainable Innovation

Scientific production on innovation, and especially on sustainable innovation, has grown in recent years. Research on sustainable innovation has expanded rapidly as scholars seek to understand how new technologies can make societies more sustainable.

Various expressions and definitions of sustainability and innovation have been reported in the literature. Sometimes the two concepts are combined and described with a single term, *Sustainable Innovation*. Research on sustainable innovation has grown in popularity owing to the need to incorporate sustainability into business practices (Boons and Lüdeke-Freund, 2013). Innovation is seen not only as a tool to guarantee a competitive advantage for companies. It is also seen as a tool that provides environmental benefits and produces social well-being (Cillo et al., 2019).

Tello and Yoon (2008) define sustainable innovation as "the development of new products, processes, services and technologies that contribute to the development and well-being of human needs and institutions while respecting natural resources and regeneration capacities". Several studies have argued that sustainable innovation can be studied from three main perspectives: internal managerial, external relational and performance evaluation (Cillo et al., 2019).

The paper contributes to the literature on sustainable innovation by describing worldwide trends in scientific production over time. These trends are derived from the metadata of Web of Science, one of the main databases used by researchers. A bibliometric analysis was carried out on a total of 1,511 documents published between 2000 and 2021. Its purpose is to identify the research trends in this field and the main dimensions and words related to the term "Sustainable Innovation".

# 2. Methodology

A bibliometric analysis was used to explore the evolution of research in the innovation field. Bibliometric analysis is a quantitative approach to the study of academic literature. It uses bibliographic data to describe, evaluate and monitor published research (Garfield et al., 1964; White and McCain, 1989).

The methodological aim is to analyse publications, citations and sources of information (Rodriguez-Soler et al., 2020). The scientific community has long used bibliometric methods as an analytical tool. For this study, we used the Bibliometrix package (Aria and Cuccurullo, 2017) for the R programming language (https://www.r-project.org/). This recent R package

<sup>39</sup> Corrado Crocetta, University of Foggia, Italy, corrado.crocetta@unifg.it, 0000-0001-9059-5092 Maria Gabriella Grassia, University of Naples Federico II, Italy, mariagabriella.grassia@unina.it, 0000-0002-7128-7323

Paolo Mazzocchi, University of Naples Parthenope, Italy, paolo.mazzocchi@uniparthenope.it, 0000-0002-6632-314X Antonella Rocca, University of Naples Parthenope, Italy, antonella.rocca@uniparthenope.it, 0000-0001-8171-3149 Claudio Quintano, University of Naples Suor Orsola Benincasa, Italy, claudio.quintano@unisob.na.it, 0000-0001-8315-8476


Rosanna Cataldo, Corrado Crocetta, Maria Gabriella Grassia, Paolo Mazzocchi, Antonella Rocca, Claudio Quintano, *Sustainable Innovation: worldwide trends in the scientific production through a bibliometric study*, pp. 49-54, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.10, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Rosanna Cataldo, University of Naples Federico II, Italy, rosanna.cataldo2@unina.it, 0000-0002-6324-8252

provides a set of tools for quantitative research in bibliometrics and scientometrics, supporting scholars in all key phases of analysis from the data importing to the visualization of results.

# 3. Analysis

The data for this research project were collected from the Web of Science database of the Institute for Scientific Information (ISI). Web of Science (WoS) is one of the most widely used independent global citation databases, and it is recognised as covering a broad range of relevant, high-quality journals and peer-reviewed articles (Cataldo et al., 2019).

To collect the documents published in this field over the past two decades, we queried the WoS database on July 7th, 2021. A total of 1,511 documents published between 2000 and 2021 (inclusive) and containing the topic "Sustainable Innovation" were retrieved. The majority (907; 60%) were research articles, and the second most common document type was proceedings papers, which accounted for 32.43%. Details about the documents are shown in Table 1. These documents received an average of 13.08 citations per document over the period considered and were written by 3,897 authors in 663 different sources, such as journals and books. There are 3,694 author keywords and 2,341 Keywords Plus.

According to Garfield (1990), Keywords Plus "provides search terms extracted from the titles of papers cited in each new article in the ISI database, is an independent supplement for titlewords and author keywords". The collaboration index, which represents the mean number of authors per joint paper and is calculated as the total number of authors of multi-authored articles divided by the total number of multi-authored articles (Elango and Rajendran, 2012), is equal to 2.96. This implies that research teams in the field of sustainable innovation typically comprise between two and three authors.
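The collaboration index is a simple ratio. The Python sketch below computes it from a list giving the number of authors of each document. The function name and the toy counts are hypothetical and do not reproduce the 1,511 WoS records.

```python
def collaboration_index(authors_per_doc):
    """Collaboration index (Elango and Rajendran, 2012).

    authors_per_doc: number of authors of each document.
    Returns total authors of multi-authored documents divided by the number
    of multi-authored documents, or None if there are none.
    """
    multi = [n for n in authors_per_doc if n > 1]  # single-authored papers are excluded
    if not multi:
        return None
    return sum(multi) / len(multi)


# Hypothetical author counts for six documents (not the WoS data).
print(collaboration_index([1, 2, 3, 4, 1, 3]))  # (2 + 3 + 4 + 3) / 4 = 3.0
```

Single-authored documents are left out of both the numerator and the denominator. As a result, the index can never fall below 2.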


Table 1: General Information

Figure 1 (a) presents the annual trend of publications, which shows that the sustainable innovation literature has been growing since 2007 and peaked in 2018 with 333 documents published. Overall, an annual growth rate of almost 17% was observed in the production of research articles during the study period (see Table 1). Figure 1 (b) shows the annual number of citations. Works published in the first years of the period have accumulated considerable recognition: the average number of citations in 2002 was 4.74, and a similar average was reached again in 2017.
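The sketch below shows one way to compute an annual growth rate. It assumes the usual compound formula, which uses only the counts of the first and last year. To our understanding Bibliometrix reports a figure of this kind, but the text does not state the exact definition behind Table 1. The function name and the toy counts are hypothetical.

```python
def annual_growth_rate(counts_by_year):
    """Compound annual growth rate (%) between the first and last year.

    counts_by_year: {year: number of documents}; the first year must have a
    non-zero count. Intermediate years do not enter the formula.
    """
    years = sorted(counts_by_year)
    first, last = counts_by_year[years[0]], counts_by_year[years[-1]]
    n_years = years[-1] - years[0] + 1
    return ((last / first) ** (1 / (n_years - 1)) - 1) * 100


# Hypothetical counts: production doubles every year.
print(annual_growth_rate({2000: 1, 2001: 2, 2002: 4}))  # 100.0
```

The formula depends only on the endpoints. A partial final year, such as the first six months of 2021, therefore enters the rate directly.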

Figure 1: Scientific production (2000-2021), n=1511

Figure 2 shows the ten main sources of publication. The first source is a book titled "A Creative Path to Sustainable Innovation", related to the Siam Physics Congress 2018 (SPC2018), with 190 documents published. Figure 2 also shows that the most relevant sources, based on the number of articles, are *Sustainability*, *Journal of Cleaner Production* and *Green Technologies for Sustainable & Innovation in Materials*. These journals aim to provide up-to-date information on new developments and trends related to this topic.

Figure 2: Main sources of publication

Figure 3 shows the number of articles produced by the authors of different countries and the rate of cooperation of each country's authors with other countries' authors (SCP: Single Country Publications; MCP: Multiple Country Publications).
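The SCP/MCP split can be sketched as follows. The sketch assumes that each record carries the corresponding author's country and the countries of all co-authors. The input layout and the function name are illustrative assumptions and are not the Bibliometrix API.

```python
from collections import defaultdict


def scp_mcp(documents):
    """Single- vs multiple-country publications per corresponding-author country.

    documents: iterable of (corresponding_author_country, [countries of all authors]).
    Returns {country: (SCP, MCP, MCP share)}.
    """
    tally = defaultdict(lambda: [0, 0])  # country -> [SCP, MCP]
    for corr_country, author_countries in documents:
        if set(author_countries) - {corr_country}:
            tally[corr_country][1] += 1  # at least one co-author from another country
        else:
            tally[corr_country][0] += 1  # all authors from the same country
    return {c: (s, m, m / (s + m)) for c, (s, m) in tally.items()}


docs = [  # hypothetical records
    ("Thailand", ["Thailand", "Thailand"]),
    ("USA", ["USA", "China"]),
    ("USA", ["USA"]),
]
print(scp_mcp(docs))  # {'Thailand': (1, 0, 0.0), 'USA': (1, 1, 0.5)}
```

The MCP share is what separates countries with similar output but different openness, as in the Thailand/USA comparison.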

Thailand produced a large number of papers in the analysis period but shows a rather low collaboration rate (MCP) with authors from other countries. Thai authors who write on this topic therefore rarely collaborate with foreign researchers. The USA, despite having the same number of documents as Thailand, has a higher MCP. England is the nation with the highest rate of collaboration with foreign authors, followed by China and the Netherlands. These links are highlighted in Figure 4.

In the network, the size of each country's circle reflects the number of works it published on the analyzed topic. The colors of the countries and of the links represent the clusters identified by the Louvain algorithm, and the thickness of the links indicates the strength of the collaboration (Crocetta et al., 2021). The network analysis highlights the strong collaboration between the USA and China. The USA collaborates with almost all the countries shown in the network, with a few exceptions such as Malaysia and Portugal.

Figure 3: Corresponding Author Country

Figure 4: Collaboration Network

In the network we can see that there are only five connections from Thailand (to USA, China, Sweden, United Kingdom and France) and this reinforces what has been said about the low rate of collaboration.

The last figure, Figure 5, is the thematic map. This intuitive plot treats author keywords as themes and classifies them by density (which represents the degree of development) and centrality (which represents the degree of relevance) in the network of scientific keywords (Cataldo et al., 2019).
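The sketch below makes density and centrality concrete. It uses Callon-style definitions based on the equivalence index e_ij = c_ij^2 / (c_i c_j), where c_ij counts documents containing both keywords. Density sums the links inside a cluster, scaled by cluster size. Centrality sums the links from the cluster to keywords in other clusters. These definitions are an assumption: the normalization used by the Bibliometrix thematic map may differ, and the data structures are hypothetical.

```python
def equivalence_index(c_ij, c_i, c_j):
    """Association strength between two keywords: c_ij^2 / (c_i * c_j)."""
    return c_ij ** 2 / (c_i * c_j)


def centrality_density(clusters, cooc, freq):
    """Callon-style centrality and density of keyword clusters.

    clusters: {cluster name: set of keywords}
    cooc: {frozenset({kw1, kw2}): documents containing both keywords}
    freq: {keyword: documents containing the keyword}
    Returns {cluster name: (centrality, density)}.
    """
    result = {}
    for name, members in clusters.items():
        internal = external = 0.0
        for pair, c in cooc.items():
            a, b = tuple(pair)
            e = equivalence_index(c, freq[a], freq[b])
            inside = (a in members) + (b in members)
            if inside == 2:
                internal += e  # link between two keywords of the same cluster
            elif inside == 1:
                external += e  # link from the cluster to another theme
        result[name] = (10 * external, 100 * internal / len(members))
    return result


freq = {"a": 2, "b": 2, "c": 2}  # toy keyword frequencies
cooc = {frozenset({"a", "b"}): 2, frozenset({"b", "c"}): 1}
print(centrality_density({"X": {"a", "b"}, "Y": {"c"}}, cooc, freq))
# {'X': (2.5, 50.0), 'Y': (2.5, 0.0)}
```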

Figure 5: Thematic Map

The cluster named "business model innovation" represents the motor theme, a topic that is both well developed and relevant to the research field. This cluster includes keywords such as "innovation system", "barriers", "innovation ecosystems", "drivers" and "green buildings". Themes such as "sustainability" and "sustainable innovation" represent the basic themes. These topics appear ubiquitously in different scientific works and can be considered a common synthesis of the content expressed in the literature. The cluster named "innovation management" is positioned among the emerging or declining themes because it is formed by weakly developed and marginal keywords, such as "big data", "environment" and "research and development". Finally, the cluster "knowledge management" represents the isolated theme. It is formed by keywords such as "product innovation", "literature review", "organizational learning" and "process innovation", all topics of limited importance for the research topic as a whole.
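Given a centrality and a density for each cluster, the four quadrants used above can be assigned as in this sketch. Splitting the plane at the median of each measure is our assumption. The cluster names A-D and their values are invented for illustration and are not the values behind Figure 5.

```python
from statistics import median


def classify_themes(measures):
    """Assign each cluster to a thematic-map quadrant.

    measures: {cluster name: (centrality, density)}.
    The plane is split at the median centrality and median density.
    """
    c_med = median(c for c, _ in measures.values())
    d_med = median(d for _, d in measures.values())
    labels = {}
    for name, (c, d) in measures.items():
        if c > c_med and d > d_med:
            labels[name] = "motor theme"  # relevant and well developed
        elif c > c_med:
            labels[name] = "basic theme"  # relevant but less developed
        elif d > d_med:
            labels[name] = "isolated theme"  # developed but peripheral
        else:
            labels[name] = "emerging or declining theme"  # weak and marginal
    return labels


# Hypothetical (centrality, density) values for four clusters.
print(classify_themes({"A": (10, 10), "B": (10, 1), "C": (1, 10), "D": (1, 1)}))
# {'A': 'motor theme', 'B': 'basic theme', 'C': 'isolated theme', 'D': 'emerging or declining theme'}
```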

# 4. Final remarks

The main purpose of this paper was to review the literature on sustainable innovation. This study has tried to provide a comprehensive view of the scientific papers in this research field published between 2000 and the first six months of 2021. In doing so, we identified 1,511 relevant documents in the Web of Science database by searching for the topic "sustainable innovation". Scientific production grew gradually over the years and then rose sharply to a peak of 333 documents in 2018, compared with only 110 in the previous year. This shows that until a few years ago the concept of sustainable innovation was not yet widespread in the scientific community.

This research has shown that Thailand, the USA and China have been the most productive countries in this area. In particular, many of the authors who write on this topic are from Thailand and mostly collaborate with each other, with a very low collaboration rate with foreign researchers. British researchers, on the other hand, are those who collaborate most with authors from other countries. The thematic map analysis identified the "business model innovation" cluster as the motor theme in this research field and the "innovation management" cluster as an emerging or declining theme. The theme analyzed in this work is fairly new and constantly evolving in the literature, so the results of this bibliometric analysis could be different in a few years. Furthermore, the analysis was carried out only with documents downloaded from the Web of Science, and extending it to other scientific databases could give a more comprehensive picture. Nevertheless, we hope the present study may assist researchers in investigating this theme.

# References


# **Personal weaknesses recognized by high school students in the North-West of Italy**

Luigi Bollani<sup>a</sup>

<sup>a</sup> ESOMAS Department, University of Turin, Turin, Italy.

# **1. Introduction**

This study is part of a project aimed at supporting young people who are experiencing, or are at risk of reaching, a NEET condition (Not in Education, Employment or Training; aged between 15 and 29 or 34, depending on the definition).

At present the entire project consists of three phases. The first, concluded in 2020 with the volume "From Neet to Need" (see also Bollani, Rota, 2018), presents the European statistical picture of the Neet phenomenon and its sociological implications. It includes an empirical study based on in-depth interviews with people in a Neet condition, intended to collect their life stories (Merril, West, 2012). The aim was to classify their internal and external needs in order to suggest adequate forms of support. The second phase, to be completed in 2021 with a second book, is the object of this study. It searches for harbingers of Neet status in the difficulties encountered during high school (which in Italy is divided into an initial two-year cycle and a subsequent three-year cycle), since it is generally simpler and more effective to tackle minor problems preventively, starting from school age. Students as such are, of course, not Neets. Moreover, the anonymity required for a survey dealing with very personal issues does not allow a longitudinal study to be carried out on the same individuals. However, building a comparison base for a generic population, relative to the incidence of certain states of weakness (or combinations of them), will allow a subsequent comparison with the incidence of the same difficulties at school age among those who find themselves in a Neet situation. The third phase, started in 2021, concerns identifying good practices for helping Neets become working adults. The people selected to accompany the Neets in this process are now fully trained, and the first groups of Neets will be involved by the end of the year. The research group was drawn from members of the InCreaSe Association (Innovation Creativity Settings; www.increasegroup.org), which brings together cross-disciplinary research skills. Numerous local authorities are involved in the project, which is supported by the Compagnia di San Paolo Foundation.

For the specific purpose of this paper, the early signs of the extreme discomfort associated with the Neet condition were sought by investigating signs of weakness among high school students in the Piedmont, Valle d'Aosta and Liguria regions. A survey questionnaire was administered to students, with an operator present in the classrooms, shortly before the start of the pandemic. The results therefore refer to in-person school activities. However, the discontinuities in teaching methods caused by the pandemic may have introduced changes that will need to be monitored over time.

# **2. Themes of investigation and subjects involved**

School transmits knowledge accumulated and elaborated by society over the centuries, allowing younger generations to experience social, cultural and territorial belonging and to be citizens of the society with its rights and duties.

The proposed survey is set in the context of training institutions that must cope with rapid change and face uncertainty about how the training they offer affects personal growth and access to work in the current context. The study aims to highlight the positive states of well-being that are certainly present in a school context, and at the same time to identify situations of discomfort among students which, if not adequately prevented, can develop in a harmful way. It was decided to address the issue from within the school itself by administering questionnaires to students in secondary education.

Luigi Bollani, University of Turin, Italy, luigi.bollani@unito.it, 0000-0002-2488-3659

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Luigi Bollani, *Personal weaknesses recognized by high school students in the North-West of Italy*, pp. 55-60, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.11, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

The topics of the survey, included in the questionnaire within specific sections, were:

• You and the school (role attributed to the school)


Special attention was devoted to fieldwork organization. An operator was present in each class during the survey to ensure a correct presentation of the questionnaire, consistent assistance, and a sense of anonymity with respect to teachers while it was being completed.

It should be mentioned that the Italian secondary education system is divided into an initial compulsory two-year period ("biennio" below) and a subsequent voluntary three-year period ("triennio"); it is also divided into a more academic (classical or scientific) school called "liceo", a technical school ("tecnico") and a vocational school ("professionale").

Students were selected through a multi-stage process, with schools and classes as first- and second-stage units and students as third-stage units. The first-stage units, i.e. the schools, reflect different territorial situations, with several focuses in the three regions. In particular, for Piedmont, distinctions were made according to the size and location of the centres and among Turin city areas with different socio-economic patterns. The choice of second-stage units, i.e. classes, made it possible to achieve, to a certain extent, an overall balance among different educational paths. Finally, for the third-stage units, i.e. students, all students present in the classrooms at the time of the survey were surveyed. In this way, 14 schools (some classes in each) were surveyed across the three regions of interest. In Piedmont, 10 schools were considered: three in Turin city (see Torino1, 2 and 3 in Table 1), two in the city belt (in the municipalities of Nichelino and Settimo Torinese) and five in other areas of Piedmont (one in the municipality of Pinerolo, close to Turin, and four in different provinces). In Valle d'Aosta and Liguria, two schools were surveyed in each region (Aosta1 and 2 for Valle d'Aosta, and two schools in the municipalities of Genova and Savona for Liguria). Overall, 931 students were surveyed, keeping a sufficient number for each type of educational path, as shown in Table 1.


Table 1 - Number of students surveyed by education type and school area

# **3. Growth of weakness as an accumulation process**

A first research objective concerns how weakness forms and how it can grow. One possibility, which also seems to emerge from the life stories, is an accumulation phenomenon. A weakness in one aspect of school-age life, perhaps tolerable on its own, may be aggravated by a second circumstance of discomfort, and so on. The result can be a very intense state of discomfort arising from a sum of causes, none of which is particularly relevant on its own.

From this perspective, the approach used in this paper considers the key questions of each section of the questionnaire. Each of these questions detects, for a specific aspect, a condition of well-being or weakness of the responding student. The number of difficult situations was then summed for each student, and this sum makes it possible to distinguish increasing degrees of individual difficulty.

In particular, ten questions were examined, two for each section of the questionnaire; they are presented below; labels used in figures and the percentage of students in a state of weakness (see ending "\_yes") for each of them appear in brackets.

As regards the first part of the questionnaire, related to the role students attribute to the school and the relationship they have with it (first two sections), the selected questions were on:




The relationship with classmates is certainly a fundamental aspect in identifying the categories of students most in difficulty, because a problematic relationship can lead to serious episodes of exclusion and marginalization. The chosen questions were therefore whether they:



Other elements taken into consideration are the average grade and the relationship with their teachers; the selected questions concerned:



Finally, two questions were chosen from the section about the relationship with one's family, i.e. whether:



The answers to each of these ten questions were coded as presence or absence of weakness. The resulting synthetic variable counts the weakness situations of each responding student and ranges from zero (absence of weakness) to ten (maximum weakness intensity). Figure 1 presents the frequency distributions of this variable, called "intensity of weakness", for all respondents and for each territorial area of interest.
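Constructing the intensity of weakness amounts to counting "\_yes" answers. The Python sketch below assumes that each student's answers are stored as booleans keyed by the labels used in the figures. The text names only some of the ten labels, so the hypothetical students use a subset, and the function names are illustrative.

```python
from collections import Counter


def intensity_of_weakness(answers):
    """Number of weakness situations for one student.

    answers: {question label: True if the answer signals weakness ("_yes")}.
    With the ten questions of the survey the result ranges from 0 to 10.
    """
    return sum(bool(v) for v in answers.values())


def frequency_distribution(students, max_score=10):
    """Number of students at each intensity level 0..max_score."""
    counts = Counter(intensity_of_weakness(s) for s in students)
    return [counts.get(k, 0) for k in range(max_score + 1)]


# Hypothetical students; only some of the ten labels are shown.
students = [
    {"Life": False, "Growth": True, "Isolated": True, "Support": False},
    {"Life": False, "Growth": False, "Isolated": False, "Support": False},
    {"Life": True, "Growth": True, "Isolated": True, "Support": True},
]
print([intensity_of_weakness(s) for s in students])  # [2, 0, 4]
print(frequency_distribution(students))  # [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
```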

The shapes of the frequency distributions shown in the figure are very similar. The Turin belt and Liguria display the highest average levels of weakness intensity (3.0 and 3.1, respectively), while the city of Turin and the Turin belt show the highest variability (standard deviations of 2.0 and 1.9, respectively).

In all cases, frequencies decrease progressively at higher degrees of weakness. For a more stable classification, the intensity of weakness is also summarized in three qualitative degrees. Students with an intensity of 0 or 1 are categorized as "without weakness" (or nearly so), those with a score of 2 or 3 as having "lower weakness", and those with a score higher than 3 as having "greater weakness". These bands approximate the tertiles of the overall distribution (Total column), counted from the most disadvantaged students, and yield the following frequencies: greater weakness 33.94%, lower weakness 39.53% and without weakness 26.53%.
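The three-band recoding can be written as a small function. In the sketch below, the cut points follow the text (0-1, 2-3, above 3), while the function names and the eight example scores are hypothetical.

```python
def weakness_band(score):
    """Three-level synthesis of the 0-10 intensity of weakness."""
    if score <= 1:
        return "without weakness"
    if score <= 3:
        return "lower weakness"
    return "greater weakness"  # scores higher than 3


def band_percentages(scores):
    """Percentage of students in each band."""
    bands = ["without weakness", "lower weakness", "greater weakness"]
    n = len(scores)
    return {b: 100 * sum(weakness_band(s) == b for s in scores) / n for b in bands}


# Hypothetical intensities for eight students.
print(band_percentages([0, 1, 2, 3, 3, 4, 7, 10]))
# {'without weakness': 25.0, 'lower weakness': 37.5, 'greater weakness': 37.5}
```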

Figure 1 – Frequency distributions of the intensity of weakness (0-10)

To check empirically whether this variable is a useful measure of the degree of weakness, a multiple correspondence analysis (MCA) was performed on the ten binary-coded variables used to build it. The intensity of weakness, in both its 0-10 form and its three-level synthetic form, was included as a supplementary variable. The MCA map is shown in Figure 2. R software and the FactoMineR package were used for the analysis (Escofier, Pagès, 2008).

Figure 2 – Relationships among the ten main variables of the questionnaire (active variables are shown in the left box, while supplementary ones, relating to the intensity of weakness, are shown in the two boxes on the right)

Note: because of its coding scheme, MCA underestimates the percentage of inertia explained by the first factorial dimensions (Abdi, Valentin, 2007). Several methods for re-evaluating the variance explained by the first dimensions have been proposed in the literature to overcome this drawback (Benzécri, 1973; Greenacre, 2017). Greenacre's method, the more conservative of the two, yields a variance percentage of 74.65% for the first dimension and 3.97% for the second. The first factorial plane would therefore explain 78.62% of the variance overall, which seems satisfactory. R software, and in particular the ca package, was used for this re-evaluation, as in Nenadic, Greenacre (2007).
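The sketch below applies Greenacre's adjustment to the eigenvalues of an indicator-matrix MCA. Let Q be the number of active variables (10 here) and J the total number of categories (20 here). Each eigenvalue λ > 1/Q is rescaled to (Q/(Q-1))^2 (λ - 1/Q)^2, and the result is divided by the adjusted total inertia Q/(Q-1) (Σλ^2 - (J-Q)/Q^2). The function name is ours. In R, the ca package's mjca() offers this adjustment (to our knowledge via lambda = "adjusted").

```python
def greenacre_adjusted(eigenvalues, n_vars, n_categories):
    """Percentages of inertia after Greenacre's adjustment for MCA.

    eigenvalues: all non-zero principal inertias of the indicator-matrix MCA
    n_vars: number of active variables (Q)
    n_categories: total number of categories of the active variables (J)
    """
    q, j = n_vars, n_categories
    # Benzécri-type adjusted eigenvalues: only dimensions with lambda > 1/Q count.
    adjusted = [((q / (q - 1)) * (lam - 1 / q)) ** 2 if lam > 1 / q else 0.0
                for lam in eigenvalues]
    # Greenacre's adjusted total inertia, from the inertia of the Burt matrix.
    total = (q / (q - 1)) * (sum(lam ** 2 for lam in eigenvalues) - (j - q) / q ** 2)
    return [100 * a / total for a in adjusted]


# Two binary variables with canonical correlation 0.6: the indicator
# eigenvalues are (1 + 0.6) / 2 and (1 - 0.6) / 2.
print(greenacre_adjusted([0.8, 0.2], n_vars=2, n_categories=4))
```

In this two-variable example, MCA reduces to simple correspondence analysis, and the first dimension recovers essentially 100% of the adjusted inertia. With more variables, the adjusted percentages of the retained dimensions generally sum to less than 100.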

The intensity of weakness appears to be a good synthesis of the ten variables. Its levels are quite evenly spaced, grow from left to right of the graph and substantially follow the first axis of the map. This is consistent with the "\_no" categories (absence of weakness) of almost all variables lying on the left of the map and the "\_yes" categories (presence of weakness) lying on the right.

Negative answers on the usefulness of school for one's life and for personal growth correspond to a state of weakness and lie in an eccentric position at the top right of the map (Life\_yes and Growth\_yes). These answers are generally less frequent than other types of weakness (4.40% for Life\_yes and 13.43% for Growth\_yes).

# **4. Living conditions that can facilitate weakness**

Considering Figure 2 again, the most important reading direction is, as already shown, from the far left (absence of weakness) to the far right (greater weakness). However, the vertical dimension provides a second useful piece of information. This is especially true in the right-hand quadrants of the graph, where the points are more dispersed vertically and can indicate the type of discomfort experienced. On the right side, towards the top, lie the already mentioned negative answers on the importance of school for one's life and growth, together with prolonged absences (Absences\_yes). This could suggest a difficulty deriving from lack of motivation and a tendency to escape. Towards the bottom, by contrast, there is a lack of support and attention from adults (Confide\_yes; Support\_yes), as well as isolation at school and from peer groups outside school (Isolated\_yes, Outside\_yes).

Figure 3 – Supplementary variables in the same plane as Figure 2.

Figure 3 also shows, on the same factorial plane as Figure 2, some characteristics of the surveyed students.

On the left side of the map, where difficulty is generally low, we note a greater presence of both parents in the household (par. home\_2), a high parental qualification (for at least one of the two parents; par. degr. high) and "liceo" enrolment. Triennio students (who were asked more questions) in this area are also convinced that school prepares them for the future and want to continue studying.

Compared to "liceo" students, those from other paths ("professionale", "tecnico") generally show more weakness. Females appear to be in a better position than males. Piedmontese schools seem to be in a better position than those of Valle d'Aosta or Liguria.

At the top right of the map, triennio students who do not think that school prepares them for the future (Prepare\_no) and who do not want to continue studying (Continue\_no) occupy the same area as the students characterized by lack of motivation and a tendency to escape, already discussed for Figure 2.

The bottom right of the map is the area already associated with students who are unsupported by adults and isolated from peers. Figure 3 shows that this area also has a greater presence of parents with a low qualification (par. degr. low) and of households in which one or both parents are absent (par. home\_0 or par. home\_1).

# **5. Conclusions**

This study looks at secondary education institutions from the inside through a survey of students. It adopted a transversal view across the sections of the questionnaire to highlight one of the key aspects of the research: observing situations of well-being, but also of progressive weakness, among the students themselves.

The focus is on observation, in the sense of description, as a basis for reflecting on the causes of weakness. A method was proposed to place the surveyed students on a scale polarized between well-being on the one hand and extreme discomfort on the other. The analysis then showed how some characteristics, derived above all from the school and family context and to be explored as possible causes, accompany different levels of individual weakness intensity.

Furthermore, for students in the last three-year period (triennio), attention was devoted to how their "feeling" within the school context can influence their outlook on the future and the choices they make.

# **References**

AA.VV. (2020). *From Neet to Need. Il cortocircuito sociale dei giovani che non studiano e non lavorano*, eds. G. Lazzarini, L. Bollani, F. S. Rota, M. Santagati. Innovation Creativity Settings, Franco Angeli, Torino.

Abdi, H., Valentin, D. (2007). *Multiple correspondence analysis*, in N. Salkind (ed.), Encyclopedia of Measurement and Statistics, Sage Publications, Thousand Oaks (CA), **95**, pp. 116-128.

Benzécri, J.P. (1973). *L'Analyse des Données*. Dunod, Paris.

Merril, B., West, L. (2012). *Metodi biografici per la ricerca sociale*, Apogeo.

Nenadic, O., Greenacre, M. (2007). Correspondence analysis in R, with two- and three-dimensional graphics: The ca package. *Journal of Statistical Software*, **20**(3), pp. 1-13.

# **Emergency remote teaching: an explorative tool**

Emma Zavarrone<sup>a</sup>, Maria Gabriella Grassia<sup>b</sup>, Rocco Mazza<sup>b</sup>, Alessia Forciniti<sup>b</sup>

<sup>a</sup> Department of Humanities Studies, IULM University, Milan, Italy.

<sup>b</sup> Department of Social Sciences, University Federico II, Naples, Italy.

# **1. Introduction**

The rapid worldwide spread and severity of the infectious disease caused by the Coronavirus led the WHO to declare a global pandemic in March 2020. Governments around the world then adopted policies that caused the widest disruption of education systems in human history. Like 85% of countries worldwide, Italy temporarily closed all its educational institutions, disrupting tertiary education for 16.89% of the Italian learner population.

To ensure "pedagogic continuity", universities transitioned from traditional face-to-face to online learning (e.g., Tallent-Runnels *et al*., 2006; Sangrà *et al*., 2012; Todri *et al.*, 2021). Hodges *et al.* (2020) call this shift to fully remote teaching solutions in response to a crisis *emergency remote teaching (ERT).* This paradigm shift changed the perception of the learning process (Lederman, 2020) and required significant didactic efforts in terms of digitalisation and interactive pedagogical approaches. The main goal of ERT is not to re-design a long-term educational ecosystem, but to supply a rapid and temporary solution to a crisis (Appolloni *et al*., 2021). ERT therefore adopts a learning framework different from that of planned online learning.

This implies venturing into uncharted territory, with several logistical challenges and attitudinal changes (Ribeiro, 2020), including in teaching-learning assessment.

Evaluating the impact of ERT on the quality of higher education thus becomes a sensitive issue. This paper addresses the evaluation of the effectiveness of teaching delivery during the transition from the traditional model to ERT. The focus is on detecting how ERT is perceived and how it relates to students' performance and to the quality of education. Quality is considered in terms of the European Standards and Guidelines (ESG) adopted in 2005 by the Ministers of Higher Education of the countries participating in the Bologna Process (1999; Grano and Ricci, 2009). To ensure the quality of the tertiary education system, European countries have established monitoring agencies. In Italy, this agency is ANVUR, which received official accreditation from the European Association for Quality Assurance in Higher Education (ENQA) in 2019. However, during the Coronavirus health emergency, ANVUR did not provide guidance to universities on how to manage distance learning and its evaluation. It relied instead on the autonomy of universities, which continued to adopt their traditional evaluation systems. The higher education landscape is dominated by a quality-assurance view of evaluating teaching quality and student satisfaction (Fabbris, 2007). The ERT teaching-learning solutions adopted may not fit this view, since they are affected by several new factors and the principles recalled by the Bologna Process may not be appropriate.

The paper therefore focuses on a simple alternative tool for evaluating the quality of teaching-learning in ERT cases. Our research question is exploratory: we are interested in detecting empirical evidence about learning assessment and engagement in higher education, with a focus on students' engagement and their success performance during ERT.

These dimensions have been represented in the ERT map, which is inspired by the perceptual maps of consumer theory (Whitlark & Smith, 2001; Gower *et al*., 2010). In our model, the ERT map has been built from a data-integration perspective that combines university administrative data and textual data within a multivariate set of methodologies. Textual information is represented by the student voice, since this provides essential information for Quality Assurance systems and for monitoring and managing university processes. The student voice is one of the central issues in the most recent version of the Standards and Guidelines for Quality Assurance in the European Higher Education Area, adopted at the Yerevan meeting in 2015, and in AVA 2017 (where AVA stands for Self-assessment, Evaluation, Accreditation; ANVUR, 2021).

Emma Zavarrone, IULM, International University of Languages and Media, Italy, emma.zavarrone@iulm.it, 0000-0001-9509-8773 Maria Gabriella Grassia, University of Naples Federico II, Italy, mariagabriella.grassia@unina.it, 0000-0002-7128-7323

Rocco Mazza, University of Naples Federico II, Italy, rocco.mazza@unina.it, 0000-0002-4901-5225 Alessia Forciniti, IULM, International University of Languages and Media, Italy, alessia.forciniti1@gmail.com

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Emma Zavarrone, Maria Gabriella Grassia, Rocco Mazza, Alessia Forciniti, *Emergency remote teaching: an explorative tool*, pp. 61-66, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.12, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

In the following, section 2 introduces the theoretical framework; section 3 describes our model and data; section 4 shows the main findings; section 5 presents the future directions.

# **2. Theoretical framework**

Over the last two years, the ERT condition caused by the *SARS-CoV-2* pandemic has enriched the literature with several contributions that propose methods and techniques for evaluating aspects of online teaching-learning in higher education.

Among the approaches to evaluating engagement and performance, Bawa (2020) examined the effects of pandemic-related ERT on learners' grades, using an experimental design to investigate the shift to online learning. The analysis compared the same course content and assessment methods for an experimental group of students enrolled during 2019-2020 and a control group of students who attended college before the health crisis. The results showed better outcomes in the experimental group than in the control group, above all in the highest performance range. Dost *et al.* (2020) investigated the attendance and perceptions of medical students across 40 UK schools during May 2020. Through a national cross-sectional online survey based on a 20-item questionnaire measured on Likert scales, the study examined experiences of online teaching, its perceived benefits and barriers, and the outcomes reached. The findings indicate that online teaching platforms allow students to digest information in their own time while discussing it with peers, and that these platforms proved effective in achieving learning outcomes. Huang *et al.* (2020) analysed students' engagement using a mixed-methods design that combined a quantitative descriptive approach with a qualitative visual method.

At the end of the course, all students were given a questionnaire inspired by the Motivated Strategies for Learning Questionnaire, measured on Likert scales and covering four components: task value, metacognitive self-regulation, effort regulation, and peer learning. In addition, demographic information and answers to open-ended questions were coded and grouped into themes using a qualitative approach. The results showed that engagement does not depend on learning experience but on extrinsic goal orientation.

Given the emerging difficulties in evaluating students' performance and engagement in ERT contexts such as the one caused by the Coronavirus, our work proposes a quantitative analysis strategy aimed at providing empirical evidence on learning assessment and engagement in higher education.

# **3. Model and data**

To assess the quality and success of ERT, our proposal studies two dimensions: students' engagement (SE) and success performance (SP), the proxy variables used to construct our model of analysis. SE is textual in nature and comes from the students' voice, whilst SP uses students' career data. We obtained our model (*Fig.*1) by integrating these two sources of information: textual data, linked to the analysis of the strengths of engagement, and administrative data, related to students' performance results. We performed a multidimensional analysis to study the data from these different sources.

*Fig.1: Model flowchart*

The measurement of SE focused on the answers referring to the strengths of engagement, which were analysed with a textual approach. We applied a bag-of-words scheme to transform the unstructured text data into a structured data matrix. We performed the following pre-processing operations: 1. text normalization; 2. lemmatization; 3. stopword removal. We then built a Document-Term Matrix (DTM), removing low-frequency words (minimum term frequency of 5 and minimum document frequency of 2) and empty documents. The DTM has one row per text (755) and one column per word (790).
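The pre-processing steps and the frequency cuts described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the source does not name the tools used, so the tiny stopword list and the `lemma_map` dictionary (standing in for a real Italian lemmatizer) are hypothetical. The function also returns the indices of the non-empty documents it keeps, so that DTM rows can still be matched to students.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny Italian stopword list (a real study would use a full one).
STOPWORDS = {"il", "la", "di", "e", "che", "a", "per", "le", "lo", "un", "una"}

def normalize(text):
    """Lower-case the text and keep alphabetic tokens only (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def build_dtm(texts, lemma_map=None, min_tf=5, min_df=2):
    """Bag-of-words DTM: normalize, lemmatize, drop stopwords, prune rare terms.

    lemma_map is a hypothetical {inflected form: lemma} lookup standing in for a
    real lemmatizer. Returns the vocabulary, the count rows and the indices of
    the non-empty documents that were kept.
    """
    lemma_map = lemma_map or {}
    docs = []
    for text in texts:
        lemmas = (lemma_map.get(tok, tok) for tok in normalize(text))
        docs.append([lem for lem in lemmas if lem not in STOPWORDS])
    tf = Counter(t for doc in docs for t in doc)        # total term frequency
    df = Counter(t for doc in docs for t in set(doc))   # document frequency
    vocab = sorted(t for t in tf if tf[t] >= min_tf and df[t] >= min_df)
    col = {t: j for j, t in enumerate(vocab)}
    rows, kept = [], []
    for i, doc in enumerate(docs):
        row = [0] * len(vocab)
        for t in doc:
            if t in col:
                row[col[t]] += 1
        if any(row):                                    # drop empty documents
            rows.append(row)
            kept.append(i)
    return vocab, rows, kept
```

With the defaults `min_tf=5` and `min_df=2` the cuts match the thresholds stated in the text; on the real corpus this step would yield the 755 × 790 matrix reported above.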

SP measurement rests on quantifying students' results, for which administrative data were used. To construct the Success Performance Indicator (SPI), we considered the average mark (*M*) and the ECTS (European Credit Transfer and Accumulation System) credits (*C*) achieved before and during the pandemic. More precisely, the variation of marks and ECTS for each student was measured by (1):

$$
\Delta M_i C_i = \frac{M_{i1} C_{i1} - M_{i0} C_{i0}}{M_{i0} C_{i0}} \tag{1}
$$

where $M_{i0}$ and $C_{i0}$ denote the average mark and the ECTS obtained until February 2020 by the $i$-th student ($i = 1, \dots, N$), and $M_{i1}$ and $C_{i1}$ are the average mark and the ECTS achieved from February to November 2020.

The $\mathrm{SPI}_i$ was then computed by (2):

$$\mathrm{SPI}_i = \frac{\Delta M_i C_i}{\max(M, C)} \tag{2}$$

It relates the variation of each student's marks and ECTS to the maximum attainable product of mark and number of ECTS ($30 \times 180 = 5400$), scaled by 100.
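Equations (1) and (2) can be checked with a short Python sketch. It assumes that the denominator of (2) is the constant 5400 stated in the text; the factor of 100 mentioned there does not appear in (2), so it is exposed as an optional `scale` argument (an assumption, default 1). Students with no pre-pandemic marks or credits make (1) undefined; the sketch raises an error for them rather than guessing how the authors handled that case.

```python
MAX_MARK, MAX_ECTS = 30, 180          # maximum mark and ECTS of a three-year degree
MAX_PRODUCT = MAX_MARK * MAX_ECTS     # 5400, the denominator of (2)

def delta_mc(m0, c0, m1, c1):
    """Relative variation (1) of the mark x ECTS product between the two periods."""
    base = m0 * c0
    if base == 0:
        # (1) is undefined for students with no marks/credits before February 2020
        raise ValueError("no pre-pandemic marks or credits")
    return (m1 * c1 - base) / base

def spi(m0, c0, m1, c1, scale=1.0):
    """Success Performance Indicator (2); `scale` is an assumed factor (e.g. 100)."""
    return scale * delta_mc(m0, c0, m1, c1) / MAX_PRODUCT
```

For example, a hypothetical student moving from an average mark of 25 with 60 ECTS to 27 with 90 ECTS has $\Delta M_i C_i = (2430 - 1500)/1500 = 0.62$.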

To simplify the next steps, we recoded $\mathrm{SPI}_i$ for each $i$-th student according to the quartiles ($Q$) of the SPI distribution:

$$\mathrm{SPI}_i = \begin{cases} 1\ (\text{low}) & \text{if } \mathrm{SPI}_i \le Q_1 \\ 2\ (\text{medium}) & \text{if } Q_1 < \mathrm{SPI}_i \le Q_2 \\ 3\ (\text{high}) & \text{if } \mathrm{SPI}_i > Q_3 \end{cases}$$

The recoded SPI is interpreted as follows: low performance when $\mathrm{SPI}_i$ is lower than or equal to the first quartile ($\mathrm{SPI}_i \le Q_1$); medium performance when it lies between the first and the second quartile ($Q_1 < \mathrm{SPI}_i \le Q_2$); high performance when it is greater than the third quartile ($\mathrm{SPI}_i > Q_3$).
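The recoding rule can be sketched with Python's `statistics.quantiles`. The quartile definition used by the authors is not stated, so `method="inclusive"` is an assumption. Read literally, the printed rule assigns no class to values between $Q_2$ and $Q_3$; the sketch returns `None` for those values rather than guessing the intended boundary.

```python
import statistics

def recode_spi(values):
    """Recode SPI values into 1 (low), 2 (medium), 3 (high) using the rule as printed.

    Values in (Q2, Q3] fall outside every printed case and are returned as None.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    codes = []
    for v in values:
        if v <= q1:
            codes.append(1)
        elif v <= q2:          # here v > q1 already holds
            codes.append(2)
        elif v > q3:
            codes.append(3)
        else:
            codes.append(None)
    return codes
```

On the values 1 to 8 the inclusive quartiles are 2.75, 4.5 and 6.25, so the sketch returns `[1, 1, 2, 2, None, None, 3, 3]`.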

Using the recoded $\mathrm{SPI}_i$ and the lexical dimension of SE, we created a contingency table crossing the proxy variables of our model, with the performance information at the student level in the rows and the lexical keywords extracted from the DTM in the columns. This was possible because the DTM (from the SE study) and the SP measurement matrix had the same rows. Following a multidimensional data analysis approach, we performed a dimensionality reduction through Correspondence Analysis (CA) and created a factorial plane, or perceptual map, on which we plotted our features. With this strategy, we can study the association between the two dimensions considered in our model: on the factorial plane, SE is shown on the horizontal axis and SP on the vertical axis. According to our model, three main theoretical areas can be identified:


The population consists of students enrolled in three-year degree courses at Iulm University of Milan (*N*=5000) in the academic year 2019-2020. The survey on ERT weaknesses and strengths had a response rate of 14% of the population. The variables investigated relate to the students' careers: year of the course; type of high school; gender; average mark and ECTS obtained until February 2020, before the Coronavirus, and from February to November 2020, during ERT.

Female students are overrepresented, accounting for 82.3% of the whole student population, while students enrolled in the first year account for 44.3%. Iulm University comprises three faculties: 68% of the respondents studied at the Faculty of Communication, and the remaining respondents are equally split between the other faculties.

# **4. Results**

The barplot of the recoded SPI is symmetric, as shown in *Fig.2 (a)*.

*Fig.2: (a) Recoded SPI barplot; (b) Comparison wordcloud by SPI level*

For SE, 755 texts from the corpus were parsed and tokenized, yielding a set of strings containing the words used in the documents. We then reduced language variability to remove possible sources of noise and to improve the effectiveness of the subsequent analytical steps: this step normalized words and spelling and brought each inflected word back to its canonical form. Finally, we pruned non-informative words and non-alphabetic characters from the texts. The resulting vocabulary consisted of 790 types. We then reduced the dimensionality of this matrix by filtering out sparse words (sparsity threshold of 2%). At the end of the process, each document was represented as a document vector over 190 types. The comparison wordcloud shows the most frequent terms for each SPI level (*Fig. 2 (b)*) and captures some differences among the words associated with the different levels: terms like "riascoltare" ("listen to the lesson again") characterized the low level of SPI, whilst terms with a negative meaning were found in the high-level area.
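The sparsity filter and the per-level term frequencies behind a comparison wordcloud can be sketched as below. Two assumptions are made: the 2% threshold is read as "keep terms present in at least 2% of the documents" (one plausible reading; the source gives no formula), and the wordcloud input is approximated by simple per-level term totals, whereas dedicated wordcloud tools may weight terms differently.

```python
def filter_sparse(vocab, rows, min_doc_share=0.02):
    """Keep terms occurring in at least min_doc_share of the documents."""
    n_docs = len(rows)
    keep = [j for j in range(len(vocab))
            if sum(1 for row in rows if row[j] > 0) / n_docs >= min_doc_share]
    return [vocab[j] for j in keep], [[row[j] for j in keep] for row in rows]

def top_terms_by_level(vocab, rows, levels, k=10):
    """Most frequent terms within each SPI level (raw input for a comparison wordcloud)."""
    totals = {}
    for row, level in zip(rows, levels):
        acc = totals.setdefault(level, [0] * len(vocab))
        for j, count in enumerate(row):
            acc[j] += count
    return {level: [vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -acc[j])[:k]]
            for level, acc in totals.items()}
```

With 755 documents, a 2% share corresponds to terms appearing in at least about 15 texts.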

As described in Section 3, a contingency table was created, and the factorial plane was obtained through CA (*Fig.3*): the first dimension explains 54.4% of the inertia and represents ERT success, while the second explains 45.6% and denotes SE. Three sections characterized by the SPI levels split the map into three horizontal bands: low, middle and high. The SE dimension, instead, can be read from right to left, moving from individual student engagement to collective student engagement. At first glance, individual engagement is associated with the high level of SPI. Within the limits of an exploratory analysis, this places the semantic dimension related to individual experience close to both the high and the low performance factors. Notably, the two performance polarities (high and low) lie in the same half-plane, although they reflect two different ways in which students experienced distance learning. For high SPI, the difference lies in easier access to the technologies made available and in the possibility of optimizing the time available for study; low performance is close to the teaching dimension proper and to the contents of the courses.

*Fig.3: Factorial plan* 
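Simple Correspondence Analysis can be sketched with NumPy as an SVD of the standardized residuals of the contingency table. How the student-level information enters the rows is an assumption here: the hypothetical helper `contingency_by_level` sums the DTM rows of the students in each SPI level. With three SPI levels as rows, CA yields at most two non-trivial dimensions, which is consistent with the two reported shares (54.4% and 45.6%) summing to 100%.

```python
import numpy as np

def contingency_by_level(dtm, levels):
    """Sum the DTM rows (students) within each SPI level -> levels x terms table.

    Students whose level is None (unassigned) are left out.
    """
    dtm = np.asarray(dtm, dtype=float)
    labels = sorted({lv for lv in levels if lv is not None})
    table = np.vstack([dtm[np.array([lv == k for lv in levels])].sum(axis=0)
                       for k in labels])
    return labels, table

def correspondence_analysis(table):
    """Simple CA via SVD of the standardized residuals.

    Returns principal row and column coordinates and the share of total
    inertia explained by each non-trivial dimension.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                                  # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)              # row and column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)           # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(N.shape) - 1                             # non-trivial dimensions
    U, sv, V = U[:, :k], sv[:k], Vt[:k].T
    rows = U * sv / np.sqrt(r)[:, None]              # principal row coordinates
    cols = V * sv / np.sqrt(c)[:, None]              # principal column coordinates
    inertia = sv ** 2
    return rows, cols, inertia / inertia.sum()
```

The first two columns of `rows` and `cols` give the positions of SPI levels and keywords on a factorial plane such as the one in Fig. 3; the sketch assumes every row and column of the table has a non-zero total.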

# **5. Future directions**

The work proposes a data integration approach to create the ERT map. CA makes it possible to explore this integration through the contingency table built on SPI and SE, the proxies developed to detect, respectively, performance success and the level of student engagement. Attention should be paid to the use of short texts, inspired by dialogue on social media, which, in Italian, do not always make it possible to extract the true underlying concept. For this reason, future developments are moving in two directions: the creation of an Italian dictionary specifically for evaluating ERT, and the creation of indicators that can be used in an agile way in subsequent ERT evaluations. The indicators could also be useful for dropout screening and prevention by monitoring the level of collaborative engagement.

# **References**


Il%20Processo%20di%20Bologna.%20Documenti%20ufficiali.pdf. Last access: 15/04/2021.


# **Effects of an experimental online education support on lectures fruition and teaching effectiveness**

Maria Cristiana Martini, Marco Furini, Giovanna Galli

Department of Communication and Economics, University of Modena and Reggio Emilia, Reggio Emilia, Italy

# **1. Introduction**

The way university courses are attended has changed significantly in the last decade as a consequence of the greater accessibility of technological devices: universities, as well as for-profit companies, started to offer video lectures, either as a substitute for or in support of traditional lessons, and massive open online courses (MOOCs) have gained increasing importance in education. This trend responds to a growing need for flexibility expressed by working students, lifelong learners, students with families and care burdens, and those with some form of disability or special needs that makes it difficult to attend classes.

Many authors have investigated the effectiveness of video lectures, primarily in comparison with face-to-face classes, with mixed results: some found no significant differences between online and face-to-face courses (Lim et al., 2007; Neuhauser, 2002; Nemetz et al., 2017), while others reported better outcomes in online courses (Soffer and Nachmias, 2018; Burkhardt et al., 2008; Connolly et al., 2007; Lim et al., 2008). The Covid-19 pandemic has magnified and accelerated the surge of online teaching, making the change hardly reversible, and the ongoing debate on the effectiveness of video lectures in higher education is likely to continue and intensify.

In this paper, we describe and discuss the implementation, acceptability, and effectiveness of an experimental service designed to capture, record, edit and stream video lectures; the system was introduced with the main aim of supporting, not replacing, in-class learning. In detail, Section 2 illustrates the experimental service and the main usage behaviours, Section 3 presents the main results in terms of effectiveness and usage models, and Section 4 draws some conclusions.

# **2. ONELab: an experimental education support**

ONELab is a system designed to capture, record, edit and stream video lectures, introduced by the Department of Communication and Economics of the University of Modena and Reggio Emilia in September 2017. Traditional face-to-face classes continued to be held regularly, but ONELab was intended to ease the educational experience of students who cannot attend classes regularly and to provide additional support to traditional students. Each classroom is equipped with a video camera pointed at the teacher's desk, an audio system to capture and amplify the teacher's voice, a screen to display the slideshow, and a live video production system to capture, mix, record and stream the video signals (i.e. the teacher's video and the slideshow) and the audio. After minimal post-processing, the video lectures are uploaded to the online platform and made available to students (see Furini et al., 2018; 2020 for more details).

In the first year of experimentation, from September 2017 to June 2018, 1,376 video lectures were produced, covering the 49 courses offered in the first year of the five bachelor's and master's degrees offered by the Department, for a total of 2,064 hours. In the academic year 2018/19 these numbers doubled, and they increased further in 2019/20, as the courses offered in the second and third years joined the experimentation.

Maria Cristiana Martini, University of Modena and Reggio Emilia, Italy, cmartini@unimore.it

Marco Furini, University of Modena and Reggio Emilia, Italy, marco.furini@unimore.it, 0000-0003-1094-6521

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Maria Cristiana Martini, Marco Furini, Giovanna Galli, *Effects of an experimental online education support on lectures fruition and teaching effectiveness*, pp. 67-72, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.13, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Giovanna Galli, University of Modena and Reggio Emilia, Italy, giovanna.galli@unimore.it

The analysis of the first-year log files shows that students' reaction was enthusiastic, with an average of 8,323 video lectures played each month in the first year (September 2017 to August 2018) and a peak of 14,483 views in January 2018, during the exam session. Overall, during the year students watched video lectures for 71,488 hours. The most watched lectures relate to technical subjects, such as mathematics and statistics, where students take advantage of the possibility of replaying difficult passages until they are clear.

Video lectures are watched mostly during the teaching semester, but a significant share of students also watch them after the semester is over, especially during the exam sessions. The usage analyses show that students watch video lessons mostly during working hours, from Monday to Friday; however, 16% of the views occur at weekends and 22% in the evening or at night, suggesting that, when given the opportunity, students tend to customise the learning process to their needs and lifestyle.

Of the 1,251 freshmen in the academic year 2017/18, only 319 (25.4%) never accessed the ONELab platform to watch video lessons during their first year, while 13.4% never accessed it in either the first or the second year. Table 1 shows the percentage of non-users among different categories of students, separately for undergraduate and graduate students.


**Table 1.** Percentage of students who never accessed the ONELab video lectures during the first year and in the first two years, by student characteristics.

*Significance level: \*\*\* 99.9%; \*\* 99%; \* 95%; ° 90%.*

The use of ONELab is particularly popular among graduate students, while almost one undergraduate student in four never watched any video lecture. Females are more conscientious than males and look for every available support to enhance their preparation, but the difference is statistically significant only among undergraduates. Only 12 students are affected by Specific Learning Disorders, but none of them missed the new learning support, which allows a certain degree of self-paced study and gives them more control over their learning. On the other hand, undergraduate students coming from technical and vocational high schools are less organised in their study and overlook video lectures to a greater extent, while non-EU foreigners miss this support both as undergraduate and as graduate students.

The recourse to video lectures is extremely scarce among students who end up dropping out of university in the first year. To some extent, these students might have dropped out because they did not take advantage of ONELab to support their studies, but it might also be that some students decided to leave the university, or to transfer to a different degree, so early that they did not have time to try the video lectures.

# **3. Effectiveness of the video lectures**

The high percentage of usage is a first, indirect indicator of the effectiveness of the video lectures, but we aim to assess their benefits in terms of learning outcomes, namely the number of acquired (European) credits and the final grades. We focus on students enrolled in 2017 and analyse data on their ONELab accesses and academic achievements during the academic years 2017/18 and 2018/19. We do not consider the third year, even for three-year courses, because it coincides with the Covid-19 pandemic outbreak, when face-to-face classes were completely replaced by video lectures in the second semester, which makes the situation incomparable.

Since early dropouts produce few or no credits and a low level of access to the online platform, we remove them from the following analyses to rule out spurious relationships.

Table 2 shows that the average number of credits acquired by students who watched video lectures is much larger than for those who did not, both in the first and in the second year, and the difference is statistically significant. Students who accessed ONELab also performed better in terms of grades: the average grade of accessing students is higher than that of non-accessing students, although the difference is only marginally significant in the second year. However, separate analyses of undergraduate and master's students show that students who accessed the video lectures perform significantly better only in terms of acquired credits.


**Table 2.** Average number of acquired credits during the first year and average grade for ONELab users and non-users, per degree level.

The rough distinction between students who never watched video lectures and those who accessed the platform at least once, although simplistic, has proven meaningful in explaining performance differences among students. We now describe in more detail the different usage styles of those who accessed the platform at least once in the two years, through the following variables, measured separately for the first and the second year:

- *Total number of accesses*: this variable measures the general degree of usage in each year. It ranges from 0 to 864 in the first year and from 0 to 1,885 in the second year, with averages of 75.3 and 112.7, respectively;


Some of these variables show unexpectedly high values (for example, students registering 1,885 total accesses, or students who played the video lectures of a single course 698 times), for two reasons. First, not every access corresponds to a complete play of a video lecture: as reported by many students, "critical" passages, especially on some technical topics, were repeatedly reloaded and replayed, and sometimes a lecture is played by mistake while looking for another one, or for a different part of the same recording. Second, the platform was a novelty that probably raised curiosity among students, leading some to explore the resources far beyond their actual use.

Based on the ten variables described, we perform an agglomerative hierarchical cluster analysis; the agglomeration criterion is Ward's method, which at each step merges the pair of units/clusters leading to the minimum increase in total within-cluster variance. The distance is the squared Euclidean distance. Given their different orders of magnitude, all variables were rescaled to the [0, 1] range using min-max normalization.
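The procedure just described, min-max rescaling followed by Ward's agglomeration on squared Euclidean distances, can be illustrated with a small standard-library sketch. It is a naive loop written for readability, not the authors' code; since the ten usage variables are not all listed here, the input rows are generic numeric vectors. For a real sample, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method="ward")` would be the practical choice.

```python
def min_max(rows):
    """Rescale every column (usage variable) to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in rows]

def ward_clusters(points, n_clusters):
    """Naive agglomerative clustering with Ward's criterion on squared Euclidean distance.

    At each step the pair of clusters whose merge gives the smallest increase in
    total within-cluster sum of squares is merged:
        increase = n_a * n_b / (n_a + n_b) * ||centroid_a - centroid_b||^2
    """
    clusters = [[i] for i in range(len(points))]
    centroids = [list(p) for p in points]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                d2 = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        centroids[a] = [(na * x + nb * y) / (na + nb)
                        for x, y in zip(centroids[a], centroids[b])]
        clusters[a] += clusters[b]
        del clusters[b], centroids[b]     # b > a, so index a stays valid
    return clusters
```

Rescaling first matters here: without it, a variable such as total accesses (up to 1,885) would dominate the squared distances over variables measured on smaller scales. Calling `ward_clusters(min_max(rows), 4)` would mirror the four-group solution reported below.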

This cluster analysis suggests the existence of four distinct groups; combining these clusters with the group of absolute non-users, we obtain the five profiles described in Table 3 (first year dropouts are excluded from the analysis):


users. They are probably students who discovered the video lectures only late in the first year, or who first approached them without enthusiasm but found them more useful than expected for their preparation.

5. Zealous users: they amount to 9.5% of students and accessed the platform hundreds of times; they accessed almost all the courses provided in their study programme and were assiduous on most of them, playing the video lectures of each single course up to 70 times in the first year and up to 120 times in the second year.



For each group of students, the learning performances are reported in Table 4. Performance increases with the frequency of use of the ONELab services. Regarding the number of credits, all group means differ significantly at least at the 95% level, except for Converted users, who do not differ significantly from Regular users in the first year or from Zealous users in the second year. The average grade shows only slight differences, which are nevertheless consistent with better performance for Regular and Zealous users. Differences between graduate and undergraduate students are not noticeable.



# **4. Conclusions**

In this paper, we analysed the effectiveness of an experimental platform providing university students with remote access to video lectures in support of traditional face-to-face classes. Results show better learning outcomes for students who regularly watched the video lectures, primarily in terms of the number of acquired credits. This is consistent with the conclusions of Cagliero et al. (2017), who report higher student success rates following the introduction of an analogous system providing video-recorded lessons to complement in-class learning. In our experience, the benefit is particularly pronounced for undergraduate students, although they use the platform less than graduate students.

However, a more careful analysis of the main beneficiaries of the service casts doubt on the capacity of the system to smooth out differences in learning ability and to recover those students who struggle to keep pace with their studies and exams. Video lectures are in fact mainly watched by conscientious students, i.e. females, students coming from "lyceum" high schools, and graduate students, who aim to improve their learning through additional educational material, while critical students are those who access the platform less. This suggests that information about the new service should be conveyed to students in a more careful and focused way, targeting especially students at risk of being left behind and dropping out. In this sense, given the strong connection between dropping out and not using video lectures, monitoring and analysing the access data might help to detect critical students and try to prevent them from dropping out.

Finally, a negative consequence of the introduction of this service was a dramatic decrease in the number of students attending classes, well before university classrooms were emptied by the pandemic crisis. When face-to-face classes return to normal after more than one year of online teaching, the problem is likely to become even more pressing, forcing teachers and pedagogues to rethink face-to-face classes in a more interactive and engaging format.

# **References**


SESSION

# DECISION MAKING

# **Measuring the effectiveness of COVID-19 containment policies in Italian regions: are we doing enough?**

Demetrio Panarello<sup>a</sup>, Giorgio Tassinari<sup>a</sup>

<sup>a</sup> Department of Statistical Sciences "Paolo Fortunati", University of Bologna, Bologna, Italy.

# **1. Introduction**

The Coronavirus disease 2019 (COVID-19), caused by the SARS-CoV-2 virus, was first identified in Wuhan, China, in December 2019. The disease quickly spread to the rest of the world. The earliest cases of Italian citizens infected by the virus were detected on the 21st of February 2020 (Romagnani et al., 2020). Italy was sent into a severe lockdown on the 10th of March (Verma et al., 2020) and emerged from it on the 4th of May, slowly starting to reopen its economic activities (Buonomo and Della Marca, 2020).

While the lockdown conveyed a message of danger, the reopening might have led citizens to perceive that the threat had come to an end (Reinders Folmer et al., 2020). Indeed, during the lockdown, inhabitants were obliged to confine themselves under severe penalties; afterwards, the matter was largely entrusted to citizens, who could now choose how far they were willing to cooperate. An effective response to the pandemic relies heavily on citizens' compliance with the restrictive measures put in place to halt its spread (Sobol et al., 2020), which ultimately reduces the number of deaths.

In this paper, we aim to provide insight into how Italian citizens' compliance with the restrictions, measured through longitudinal data on sanctions and movement trends, has affected the number of deaths over time. Moreover, we investigate what would have happened if heavier restrictions had been put in place in response to insufficient compliance by citizens. In so doing, we provide an estimate of how many lives could have been spared by stricter public health regulations.

# **2. Data and Methods**

Our data come from several sources of information. For each considered variable and each of the 107 Italian provinces, we collected 260 daily observations, pertaining to the period running from the 24th of February to the 9th of November 2020.

First, we collected the daily numbers of COVID-19 positive cases, performed swabs, and recorded deaths in the country's 19 regions and 2 autonomous provinces, provided by the Italian Civil Protection Department (Dipartimento della Protezione Civile, 2020). Each province was assigned the corresponding regional values. Country-level daily swabs (in thousands), positive cases (in hundreds), and deaths are plotted in Figure 1. The number of swabs, remarkably low at the beginning of the pandemic, increases markedly in the second half of the period considered. From this point, the deaths line keeps pace with the swabs line, and the number of deaths approaches 1 per 100 positive cases.

Furthermore, we used the Containment and Health Index developed by the University of Oxford's Blavatnik School of Government (Hale et al., 2020), which traces the government response to the pandemic over time. It is a composite index made up of 12 country-level indicators: closure of schools and universities, closure of workplaces, cancellation of public events, restrictions on private gatherings, closure of public transport, stay-at-home requirements, restrictions on internal movement, restrictions on international travel, public information campaigns, testing policy, contact tracing, and facial coverings policy.

Demetrio Panarello, Giorgio Tassinari, *Measuring the effectiveness of COVID-19 containment policies in Italian regions: are we doing enough?*, pp. 75-80, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.15, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Demetrio Panarello, University of Bologna, Italy, demetrio.panarello@unibo.it, 0000-0003-1667-1936 Giorgio Tassinari, University of Bologna, Italy, giorgio.tassinari@unibo.it, 0000-0002-5161-7989

Moreover, we gathered the daily numbers of controls carried out and fines imposed on citizens for disregarding the COVID-19-related restrictive measures, made available at the national level by the Italian Ministry of the Interior (Ministero dell'Interno, 2020). We calculate the sanction rate as the ratio between the number of fines and the number of people controlled on a given day; the compliance rate is its complement to one and represents a proxy of citizens' degree of adherence and consent to the measures aimed at containing the spread of the Coronavirus.
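The compliance rate defined above is simple arithmetic, sketched here for daily series of fines and controls (plain lists, an assumed input format). Days with no controls would make the sanction rate undefined, so the sketch raises an error instead of silently returning a value.

```python
def compliance_rates(fines, controls):
    """Daily compliance rate = 1 - fines / people controlled (the sanction rate's complement)."""
    rates = []
    for f, k in zip(fines, controls):
        if k <= 0:
            raise ValueError("compliance rate undefined on a day with no controls")
        rates.append(1.0 - f / k)
    return rates
```

For instance, 5 fines out of 100 people controlled give a sanction rate of 0.05 and a compliance rate of 0.95.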

Additionally, we employ Google's Community Mobility Reports, which capture movement trends across various locations at the province level (Google LLC, 2020). We include five categories of places: retail stores and recreation sites, grocery stores and pharmacies, parks, transit stations, and workplaces. The data consist of daily percentage changes in the number of visitors compared with a pre-pandemic baseline.

Finally, we include some variables describing the demographic characteristics of the Italian provinces, taken from the Italian National Institute of Statistics (Istat): activity rate, population density, and ratio of over-65s to the total population.

**Figure 1 – Swabs, positive cases and deaths over time (Italy, 24 February – 9 November 2020).**

We estimate Negative Binomial regressions of the regional deaths count on regional positive cases, regional swabs, Containment and Health Index, Compliance rate, Google Mobility data (for retail and recreation, grocery and pharmacy, parks, transit stations, and workplaces), activity rate, population density, and percentage of over-65s to the total population.
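To make the estimated model concrete, the sketch below writes out the log-likelihood that a Negative Binomial regression with a log link maximizes. The source does not state the parameterization or the software used, so the NB2 form (variance $\mu + \alpha\mu^2$) is an assumption, albeit a common one; in practice the coefficients would be obtained with a statistical package, and the robust variance estimator mentioned below is not shown.

```python
import math

def nb2_logpmf(y, mu, alpha):
    """Log-probability of count y under NB2: mean mu, variance mu + alpha * mu**2."""
    if mu <= 0 or alpha <= 0:
        raise ValueError("mu and alpha must be positive")
    inv = 1.0 / alpha
    return (math.lgamma(y + inv) - math.lgamma(inv) - math.lgamma(y + 1)
            - inv * math.log1p(alpha * mu)
            + y * (math.log(alpha * mu) - math.log1p(alpha * mu)))

def nb2_loglik(beta, alpha, X, y):
    """Log-likelihood of an NB2 regression with log link, mu_i = exp(x_i . beta).

    Each row of X would hold the (lagged) regressors listed above plus a constant.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * x for b, x in zip(beta, xi)))
        total += nb2_logpmf(yi, mu, alpha)
    return total
```

As $\alpha \to 0$ the NB2 log-probability approaches the Poisson one, which is why the Negative Binomial can be seen as a Poisson model extended to allow for overdispersion.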

Since the dependent variable is a count, we use regression models based on the Negative Binomial distribution, which, unlike the Poisson, allows for overdispersion (Chan et al., 2021). The time-varying variables are lagged by 17 days relative to the dependent variable: the lag is the sum of the median time from symptom onset to death, estimated at 12 days in Italy (Gruppo della Sorveglianza COVID-19, 2020), and the mean incubation period (i.e., the time between contact with a positive individual and the onset of symptoms) of approximately 5 days (Linton et al., 2020). Our specifications use the robust variance estimator and do not include fixed effects.
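The 17-day lag and the length of the observation window can be checked with a short sketch. It assumes that deaths and regressors for one province are stored as daily lists covering the same days; how the authors handled the first 17 days of the series is not stated, so here they are simply dropped.

```python
from datetime import date

ONSET_TO_DEATH = 12   # median days from symptom onset to death in Italy
INCUBATION = 5        # approximate mean incubation period in days
LAG = ONSET_TO_DEATH + INCUBATION  # 17-day lag used for time-varying regressors

def lagged_pairs(deaths, regressors, lag=LAG):
    """Pair deaths on day t with the regressor row observed on day t - lag."""
    if len(deaths) != len(regressors):
        raise ValueError("series must cover the same days")
    return [(deaths[t], regressors[t - lag]) for t in range(lag, len(deaths))]

# Daily observations from 24 February to 9 November 2020, both ends included.
N_DAYS = (date(2020, 11, 9) - date(2020, 2, 24)).days + 1
```

`N_DAYS` evaluates to 260, matching the number of daily observations per province stated in the data description.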

We first run the model on the complete sample. Then, since the reopening of schools on 14 September 2020 has been indicated as the primary cause of the resurgence of the pandemic in Italy (Sebastiani and Palù, 2020), we estimate the same model on two subsamples: until the 13th of September and from the 14th of September onwards, the latter date marking the beginning of the "second wave" of the pandemic.

Finally, we add 10 points to the daily Containment and Health Index from the 1st of September onwards, to investigate what would have happened had stricter public health regulations been put in place about two weeks before schools reopened. In this way, we estimate how many lives could have been spared from 14 September to 30 October 2020 in the Italian regions with the highest lethality rates.
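Under a log link, shifting a regressor by $\delta$ multiplies the expected count by $\exp(\beta\delta)$, which gives a simple way to sketch the counterfactual exercise. This is an illustration under stated assumptions, not the authors' procedure: it takes hypothetical fitted daily means `mu`, the (lagged) index values `chi` that entered them, and the estimated index coefficient `beta_chi`, and caps the shifted index at 100, the top of the index scale.

```python
import math

def lives_spared(mu, chi, beta_chi, shift=10.0, cap=100.0):
    """Expected deaths avoided if the index had been `shift` points higher.

    mu: fitted daily expected deaths; chi: the (lagged) index values that entered
    those fits; beta_chi: the estimated index coefficient (log link).
    The shifted index is capped at `cap`, the top of the index scale.
    """
    spared = 0.0
    for m, c in zip(mu, chi):
        delta = min(c + shift, cap) - c      # actual increase after capping
        spared += m - m * math.exp(beta_chi * delta)
    return spared
```

With a negative coefficient (stricter measures, fewer deaths) the result is positive; summing it over the days and regions considered would give the estimate of lives that could have been spared.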

# **3. Results**

The results of our estimations are shown in Table 1. In the Negative Binomial regression models for the number of deaths, most variables are highly significant and show the expected signs. Our results confirm that the lockdown policies have had a beneficial impact on the pandemic, reducing the number of deaths caused by COVID-19. Moreover, the number of deaths exhibits a negative relationship with the Compliance rate.


**Table 1 – Results from Negative Binomial regressions of regional deaths.**

*Notes: \*, \*\* and \*\*\* stand for p < 0.10, p < 0.05 and p < 0.01.*

We replicated the analysis by dividing the sample into two subperiods: the first up to 13 September and the second from 14 September onwards. The results broadly confirm those obtained for the whole period, supporting the robustness of the model. Nevertheless, some regressors change sign from one period to the other: mobility towards parks is positive in the first period but negative in the second, and the same holds for the activity rate. Moreover, the magnitude of some coefficients changes considerably. In particular, the coefficient of the Compliance rate in the second period is over four times that of the first period, and the coefficient of the Containment and Health Index increases about elevenfold. This means that the importance of the restrictive measures, and of citizens' willingness to abide by them, has greatly increased since the end of the summer, also because the stringency of the adopted measures had declined markedly, paving the way for the "second wave" of the pandemic. Finally, the share of population aged 65 or more always shows a positive sign, which reflects the known higher lethality among the elderly (Rinaldi and Paradisi, 2020). However, in the second period its coefficient is about one fifth of that in the first period: this indicates that the demographic dynamics of the pandemic have changed since the beginning and that the elderly have become more cautious in the second phase of the pandemic.

To sum up, the restrictions captured by the Containment and Health Index appear essential to contain the pandemic until the vaccination campaign achieves so-called herd immunity. However, these restrictions are not sufficient unless they are accompanied by citizens' consent, which translates into adherence to the mobility restrictions, observed through the reduction in the Google mobility indices: indeed, it is not realistic to think that repressive actions alone are enough to enforce compliance with the new mobility rules.

Finally, we add 10 points to the Containment and Health Index from 1 September onwards and predict the deaths count from 14 September to 30 October 2020 in the six Italian regions with the highest overall lethality rate, under the hypothesis that a higher stringency level had been introduced two weeks before the reopening of schools. These simple estimates do not account for the changes in compliance and mobility that could result from a hypothetical change in stringency. The results are summarised in Table 2 and plotted in Figure 2.

Apart from Valle d'Aosta, which recorded a low number of deaths owing to its small population, the predictions show that a substantial number of deaths could have been averted by introducing more restrictions well before schools restarted. In particular, Lombardia, the region in which the outbreak started, could have saved 429 lives between 14 September and 30 October alone, out of the 563 deaths recorded in the same period (-76.20%).
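The percentage in brackets is the ratio between the deaths that could have been averted and the deaths actually recorded in Lombardia over the same period:

```latex
\frac{429}{563} \times 100 \approx 76.20\%
```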



**Figure 2 – Deaths over time: real values, prediction, and prediction in the case of a higher stringency level.**

# **4. Conclusive remarks**

We should be aware that mitigating the spread of infections is a cooperative process: hence, all policymakers (State and Regional authorities) should manage communication to motivate the citizens and avoid contradictory behaviours that confuse the population. Indeed, it is necessary to act to address people's behaviours, as the defeat of COVID-19 begins in people's minds.

But it is not just a psychological and political communication problem. The role played by the closure of workplaces, except for essential activities, should also be borne in mind. In the period that began on the 14th of September, the contribution of workplace-related mobility to the deaths count has almost doubled, which leads us to question whether in the second phase of the pandemic there has been some hesitation in taking more incisive measures, such as the partial closure of productive activities.

As we have seen, hundreds of deaths could have been avoided if more restrictions had been promptly introduced before the reopening of schools. Without additional interventions, the number of lives lost will eventually become much greater than that suffered in the very first period of the pandemic (Vollmer et al., 2020). Moreover, timeliness in introducing restrictive measures is essential to reduce their required duration (Chang et al., 2020).

# **References**


**11**(1), pp. 1-13.


### **Motivation of basketball players: a random-effects logit model for the probability of winning**

Silvia Bacci<sup>a</sup>, Tijan Juraj Cvetković<sup>b</sup>

<sup>a</sup> Department of Statistics, Computer Science, Applications "G. Parenti", University of Florence, Firenze, Italy

<sup>b</sup> Department of Neuroscience, Psychology, Drug Research and Child Health, University of Florence, Firenze, Italy

# **1. Introduction**

Professional sports are getting more competitive as athletes strive to improve their performance and sport organizations employ various coaches to help athletes achieve this aim. In an environment where athletes are physically dominant and have high skill mastery, psychological factors can make the difference in prevailing over other athletes. For this reason, sport psychology (Perry, 2015) plays an important role in preparing an athlete from a mental perspective, just as a coach does from a physical perspective. In sports, motivation is a key factor of success; hence sport organisations, decision makers, sport psychologists, and players themselves must address it constantly in order to sustain it and perform at the highest levels.

Psychology offers various theories and models to explain the motivational process, its benefits, and how to create a motivational climate. In this contribution we consider McClelland's Need Achievement Theory (McClelland, 1961) and Nicholls' Achievement Goal Theory (Nicholls, 1984). These theories have three elements in common: goal setting, the incentive value of success, and the probability of success. Estimating the probability of success is difficult, subjective and often inaccurate, and an error at any step of the motivational process may lead to mistakes in role assignment, performance, and goal setting.

This paper aims at estimating the probability of success and, consequently, at clarifying the motivational process, so that a team or an athlete can be more easily assigned to a certain role, can enhance their performance, and can set goals, for instance by deciding which aspects of the game must be improved. The estimation of the probability of success relies on detecting the variables that significantly affect the probability of winning. As these variables differ across sports, in this paper we focus on basketball, in particular on the U.S. National Basketball Association (NBA). The study is based on the analysis of the traditional box scores of the regular season games played in the 2016-17, 2017-18, 2018-19, and 2020-21 seasons. Because of the hierarchical structure of the data, with multiple observations for each team, a random-intercept logit model was formulated and estimated.

The remainder of the paper is organized as follows. The theoretical background on the motivational process from a psychological point of view is illustrated in Section 2, the data are described in Section 3, and the main results of the random-intercept logit model are shown in Section 4. Finally, some remarks conclude the paper.

# **2. Motivation**

Need achievement theory (McClelland, 1961) explains what a person goes through when he/she decides to adopt a certain behaviour. McClelland considered the implemented behaviour as the result of a combination of personality traits and situational, resultant, and emotional factors, as illustrated in Figure 1. In detail, two main personality traits drive behaviour along alternative paths: the "need to achieve" and the "need to avoid failure". The need to achieve is characterized by a drive to compete successfully against standards of excellence, whereas the need to avoid failure is characterized by a negative motivation oriented to avoiding failure and criticism. These traits interact with situational factors, including the probability of success and the incentive value of success: a person weighs his/her probability of success against what he/she stands to gain from it. This interaction is crucial as its

Silvia Bacci, University of Florence, Italy, silvia.bacci@unifi.it, 0000-0001-8097-3870

Tijan Juraj Cvetković, University of Florence, Italy, tijan.cvetkovic@stud.unifi.it, 0000-0002-6971-4023

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Silvia Bacci, Tijan Juraj Cvetković, *Motivation of basketball players: a random-effects logit model for the probability of winning*, pp. 81-85, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.16, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the onsite conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

resultant leads either to approaching success or to avoiding failure. Moreover, emotional factors influence whether we focus on pride or on shame. As a result of these two main paths, the implemented behaviour consists, respectively, in seeking challenges and enhanced performance, or in avoiding challenges, with less effort and risk.

Achievement goal theory (Nicholls, 1984) is an orthogonal theory based on a person's approach to a task. Nicholls emphasizes the journey towards the goal rather than the result itself. In achievement goal theory, a person focuses either on skill mastery, a self-comparative perspective in which beating a previous personal result counts as success, or on the ego, in which success is determined by comparison with others.

Figure 1 - Illustration of McClelland's (1961) Need Achievement Theory (our elaboration).

# **3. Data**

The data used in the model were collected from the websites NBA.com and Basketball-Reference.com. The dataset was constructed using the NBA traditional box score statistics for each game played in the 2016-17, 2017-18, 2018-19 and 2020-21 seasons. The traditional box score contains information about: opposing teams, final outcome of the match (win or loss), duration of the game (in minutes), total points scored, field goals made, field goals attempted, field goal percentage, 3-point field goals made, 3-point field goals attempted, 3-point field goal percentage, free throws made, free throws attempted, free throw percentage, offensive rebounds, defensive rebounds, total rebounds, assists, steals, turnovers, blocks, and personal fouls.

The data were arranged with one record per game (i.e., both teams in a single row), and variables were rescaled, with extreme values omitted to avoid singularities. Due to changes in the league's structure, we omitted the Play-in tournament games introduced in the 2020-21 season; the whole 2019-20 season was also excluded, to avoid variance due to exceptional circumstances. The resulting dataset consists of 4,770 games played by 30 teams.

# **4. Random intercept logit model**

To properly address the multilevel data structure, consisting of multiple observations per team, the probability of winning is modelled through a random-intercept logit model, where teams are the upper-level units and games are the lower-level units. The dependent variable of the model is binary, equal to 1 if the team won the game and 0 otherwise.
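In standard notation (ours, not the paper's), with game $i$ and team $j$, the model reads $\operatorname{logit} P(y_{ij} = 1 \mid u_j) = \beta_0 + \mathbf{x}_{ij}'\boldsymbol{\beta} + u_j$, with $u_j \sim N(0, \sigma_u^2)$. The sketch below evaluates this conditional probability; the function name is hypothetical and any coefficients passed to it are placeholders, not the estimates reported in Table 4.

```python
import math
from typing import Sequence


def win_probability(
    beta0: float,
    beta: Sequence[float],
    x: Sequence[float],
    u_team: float = 0.0,
) -> float:
    """Conditional probability of winning in a random-intercept logit model:

        logit P(y_ij = 1 | u_j) = beta0 + x_ij' beta + u_j,  u_j ~ N(0, sigma_u^2),

    where i indexes games and j teams. u_team = 0 gives the probability for a
    team whose random intercept equals the population average.
    """
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    eta = beta0 + u_team + sum(b * v for b, v in zip(beta, x))
    # Numerically stable logistic function: avoids overflow for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)
```

Setting `u_team` to a team's predicted random intercept instead of 0 shifts its probability up or down according to unobserved team-level characteristics.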

As concerns the independent variables, to estimate a team's probability of success we considered the *differences of the per game statistics*, which are averages of a team's statistics over previous games, *and of the opponent per game statistics*, which are averages of the other teams' performances against the team of interest.

For the sake of clarity, we illustrate how to build the independent variables and estimate the probability of success for a game played between the Utah Jazz (Team A) and the Sacramento Kings (Team B). Let us consider the following variables:


The values of per game statistics representing the offensive performance of Team A and Team B, respectively, are displayed in Table 1, whereas the opponent per game statistics, representing the defensive performance of Team A and Team B, are reported in Table 2.

Table 1 - Per game statistics


Table 2 - Opponent per game statistics


Cross-averaging Team A per game statistics with Team B opponent per game statistics, and vice versa, results in a set of variables that take into account both the offence and the defence of the teams of interest (Table 3). This set of variables captures how the attack of the Utah Jazz (Team A) will fare against the defence of the Sacramento Kings (Team B), and vice versa.

Table 3 - Cross averages of Team A per game and Team B opponent per game statistics, and vice-versa


Differences in the last column of Table 3 are then used as independent variables in the random-intercept logit model.
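A minimal sketch of the cross-averaging step for a single game. The sign convention (Team A minus Team B), the function name and the abbreviations `AST` and `TOV` used in the test are assumptions for illustration; the values actually used are those in Tables 1, 2 and 3. The `d` prefix follows the naming used in Table 4.

```python
from typing import Dict

Stats = Dict[str, float]


def cross_average_differences(a_pg: Stats, a_opp: Stats, b_pg: Stats, b_opp: Stats) -> Stats:
    """Build the 'd' variables for a game between Team A and Team B.

    a_pg / b_pg:   per game statistics (offence) of Team A / Team B
    a_opp / b_opp: opponent per game statistics (defence) of Team A / Team B
    For each statistic s:
        A_cross = (A per game + B opponent per game) / 2   # A's attack vs B's defence
        B_cross = (B per game + A opponent per game) / 2   # B's attack vs A's defence
        d<s>    = A_cross - B_cross
    """
    keys = set(a_pg)
    if not (keys == set(a_opp) == set(b_pg) == set(b_opp)):
        raise ValueError("all four tables must contain the same statistics")
    diffs: Stats = {}
    for s in sorted(keys):
        a_cross = (a_pg[s] + b_opp[s]) / 2
        b_cross = (b_pg[s] + a_opp[s]) / 2
        diffs["d" + s] = a_cross - b_cross
    return diffs
```

A positive `dAST`, for instance, would mean that Team A's passing game is expected to outperform Team B's once each attack is matched against the other team's defence.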

Estimates of the fixed effects of the model are shown in Table 4 (the letter "d" before the variable names stands for "difference") and the related correlation matrix in Table 5. The selected model fits the data very satisfactorily, with both the conditional and the marginal pseudo-R² equal to 95.5% (Nakagawa and Schielzeth, 2013).
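For a logit link, the Nakagawa and Schielzeth (2013) statistics compare the variance of the fixed-effects linear predictor, $\sigma_f^2$, and the random-intercept variance, $\sigma_u^2$, with the logistic residual variance $\pi^2/3$: $R^2_m = \sigma_f^2 / (\sigma_f^2 + \sigma_u^2 + \pi^2/3)$ and $R^2_c = (\sigma_f^2 + \sigma_u^2) / (\sigma_f^2 + \sigma_u^2 + \pi^2/3)$. Since the two differ only by $\sigma_u^2$ in the numerator, equal values at the reported precision imply a random-intercept variance that is negligible relative to $\sigma_f^2$. A minimal sketch, with a hypothetical function name and the population variance of the fitted fixed part as an assumed estimator of $\sigma_f^2$:

```python
import math
from statistics import pvariance
from typing import Sequence, Tuple


def r2_glmm_logit(fixed_linear_predictor: Sequence[float], sigma2_u: float) -> Tuple[float, float]:
    """Marginal and conditional pseudo-R2 of a random-intercept logit model,
    on the latent (logit) scale.

    fixed_linear_predictor: fitted values of x_ij' beta (fixed part only)
    sigma2_u:               estimated variance of the team random intercept
    """
    sigma2_f = pvariance(fixed_linear_predictor)  # variance explained by the fixed effects
    sigma2_e = math.pi ** 2 / 3                    # variance of the standard logistic distribution
    total = sigma2_f + sigma2_u + sigma2_e
    marginal = sigma2_f / total
    conditional = (sigma2_f + sigma2_u) / total
    return marginal, conditional
```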


Table 4 – Random-intercept logit model: estimates of fixed effects (significance level 5%)

We note that, in addition to the variables displayed in Table 4, we investigated other possible determinants, which, however, did not turn out to be statistically significant. In particular, no significant effect was found for the difference in field goal percentage, 3-point field goal percentage, free throws made, free throws attempted, offensive rebounds, rebounds, and blocks, or for the game season (dummies were added to the model for the 2016-17, 2017-18 and 2018-19 seasons, with 2020-21 as the reference season).


Table 5 – Correlation matrix for significant independent variables.

# **5. Conclusions**

By analysing the traditional box scores of regular season games of the National Basketball Association (NBA) played in the seasons 2016-17, 2017-18, 2018-19 and 2020-21 we found several variables influencing the probability of winning, such as field goals made, field goals attempted, 3-point field goals made, 3-point field goals attempted, free throw percentage, number of defensive rebounds, number of assists, number of steals, number of turnovers and number of personal fouls.

Knowing the effect of these variables on the probability of winning helps a sport organization improve the motivation of its athletes and adopt a team-oriented approach to games. By objectively defining the probability of success and knowing which aspects of the game to focus on, team decision makers can make changes accordingly. For instance, role assignment within a team can be improved by assembling players who are individually specialized in the significant categories and can consistently obtain values that favour winning. Moreover, goals such as keeping the opposing team under a certain number of made field goals (or any other category), or prioritizing certain categories to maximise the probability of winning, can be easily identified, benefiting both the team as a whole, by improving its chances, and the individual athletes, by making them more proficient in a single task.

In future research, we intend to investigate the role of an additional independent variable capturing how injuries of key players affect the probability of winning. Key players will be identified by analysing their win share statistics. In particular, it will be interesting to assess the effect on the probability of winning of the number of injured or missing key players (none, one, more than one) in a game.

# **References**

"NBA Win Shares - Basketball-Reference.com". Basketball-Reference.com. Retrieved 2016- 03-25.

McClelland, D. C. (1961). *The achieving society*. Van Nostrand, Princeton, NJ.


### **Reducing inconsistency in AHP by combining Delphi and Nudge theory and network analysis of the judgements: an application to future scenarios**

Simone Di Zio

Department of Legal and Social Sciences, University "G. d'Annunzio", Chieti-Pescara, Pescara, Italy. E-mail: simone.dizio@unich.it

# **1. Introduction**

The Delphi is a widely used method for collecting data from panels of experts (Dalkey and Helmer, 1963); its key characteristics are anonymity, interaction, controlled feedback, and statistical aggregation of responses (Rowe and Wright, 1999), while its main goal is reaching a consensus among the panel members on the issue dealt with (Linstone and Turoff, 2011). Another well-known and widespread method in the context of decision making is the Analytic Hierarchy Process (AHP), a Multi-Criteria Decision-Making (MCDM) method designed to solve problems involving multiple conflicting criteria (Pirdashti et al., 2011). Developed by Thomas Saaty (Saaty, 1980), it has many desirable properties, such as the inclusion of subjective aspects, the possibility of integrating objective and subjective data, and a way to combine individual and group priorities.

As far as we know, no study takes advantage of the Delphi features to reduce the inconsistency of AHP matrices, a well-known and practically unavoidable problem, given that it is mostly the product of cognitive biases (Bonaccorsi et al., 2020). In case of high inconsistency, experts are generally asked to evaluate the AHP matrices again, but no expert likes being asked to repeat judgements because the first ones were inconsistent, which basically means wrong. Furthermore, even if they accept, there is no guarantee that the new judgements will be less inconsistent. Our proposal is to exploit Nudge theory, which proposes suggestions to influence the behaviour of groups involved in a decision-making process (Thaler and Sunstein, 2008). A nudge is known as a "gentle push" towards better choices, which, in our context, means more consistent evaluations. In this paper we propose a new method that combines the Delphi method and Nudge theory to reduce the inconsistency of AHP matrices. The method has several advantages. In addition to reducing inconsistency, it allows the collection of textual material (expert comments), which is valuable in any decision-making context. A function of the inconsistency is used as the stopping criterion of the Delphi rounds. Given the Delphi logic, participants know from the beginning that they will be consulted again, so they do not feel scrutinized or pressured, and they are never told that their judgements are inconsistent. This, at least in principle, ensures freer and more sincere participation and a more willing attitude to re-evaluate the judgements. Finally, since at each Delphi round only the matrices with the highest inconsistency values are sent back to the experts, the questionnaire becomes shorter round after round, which helps reduce dropout.

Section 2 provides an overview of the AHP method and of the inconsistency of its matrices, while Section 3 shows how Nudge theory can help reduce this inconsistency. Section 4 presents a case study, and the paper ends with some concluding remarks.

# **2. The inconsistency of the AHP matrices**

The Analytic Hierarchy Process (AHP) is a general theory of measurement, useful to derive ratio scales for multi-criteria decision problems, and suitable when the decision problem is complex and ill-structured. The decision factors are organized in a hierarchical structure where *criteria* and *alternatives* are compared pairwise using the Saaty scale (Saaty, 1980). The goal is to find a set of weights $(w_1, w_2, \ldots)$ for each level of the hierarchy (called *local weights*) and, from these, a vector


Simone Di Zio, *Reducing inconsistency in AHP by combining Delphi and Nudge theory and network analysis of the judgements: an application to future scenarios*, pp. 87-92, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.17, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Simone Di Zio, Gabriele d'Annunzio University, Italy, s.dizio@unich.it, 0000-0002-9139-1451

of *global weights* $(G_1, G_2, \ldots, G_n)$ representing a rating of the alternatives with respect to the decision problem ($n$ denotes the number of alternatives). The AHP can be adapted to group decisions (group AHP), and there are two families of methods for combining individual preferences (Ossadnik et al., 2016), known as Aggregation of Individual Judgements (AIJ) and Aggregation of Individual Priorities (AIP) (Wu et al., 2008). However, no such technique considers the variability in the distribution of responses, so AIJ and AIP approaches do not take into account the degree of consensus/dissensus among participants, a fundamental issue in a group decision setting (Pirdashti et al., 2011). A voting procedure can overcome these limitations (Lai et al., 2002), but a majority vote is a "winner-take-all" system, where the opinions of the losers are completely disregarded (Di Zio and Maretti, 2014). This is why some scholars have proposed integrating the AHP with the Delphi (Tavana et al., 1993; Di Zio and Maretti, 2014), which gives the Delphi the task of structuring a convergence towards a single solution shared by all.

Given $a_{ij}$, the pairwise judgement of alternatives $i$ and $j$, for a perfectly consistent matrix we should have $a_{ij} = 1/a_{ji}$ ($\forall i, j$) and $a_{ij} = a_{ih} \cdot a_{hj}$ ($\forall i, j, h$), but human judgements are never perfectly consistent, and in practical applications these equalities do not hold. Inconsistency in expert judgements has been observed in many fields and, for lack of space, we refer the reader to the vast specialized literature. In short, inconsistency is practically inevitable, because it is the product of cognitive biases (Bonaccorsi et al., 2020) and/or problem complexity. Consequently, consistency needs to be checked by calculating a consistency index. The Consistency Ratio ($CR$) is the most common index used to check for consistency (Brunelli, 2018), calculated as $CR = CI/RI$, where $CI = (\lambda_{\max} - n)/(n - 1)$, $\lambda_{\max}$ is the maximum eigenvalue of the $n \times n$ matrix, and $RI$ (the random index) is the average of the $CI$ calculated over many random positive reciprocal square matrices. As a rule of thumb introduced by Saaty (1980), if $CR \le 0.1$ the judgements of a matrix can be considered consistent; otherwise the matrix must be reviewed by the expert (Liao, 2010), as many times as needed until $CR \le 0.1$. The critical point is having to go back to the expert and put him/her under stress by saying that he/she made a wrong evaluation that needs to be revised.
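A minimal sketch of the Consistency Ratio, assuming the Random Index values commonly tabulated after Saaty (1980) (other RI tables exist in the literature) and computing $\lambda_{\max}$ by power iteration, which converges for positive matrices. Function and constant names are hypothetical.

```python
from typing import Sequence

# Random Index (RI) by matrix order, as commonly tabulated after Saaty (1980).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def lambda_max(a: Sequence[Sequence[float]], iterations: int = 500) -> float:
    """Principal eigenvalue of a positive reciprocal matrix via power iteration."""
    n = len(a)
    v = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w) / sum(v)       # equals lambda_max once v is the principal eigenvector
        total = sum(w)
        v = [x / total for x in w]  # renormalise so that the entries sum to 1
    return lam


def consistency_ratio(a: Sequence[Sequence[float]]) -> float:
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    n = len(a)
    ri = RANDOM_INDEX[n]  # raises KeyError for orders not in the table
    if ri == 0.0:
        return 0.0  # reciprocal matrices of order 1 or 2 are always consistent
    ci = (lambda_max(a) - n) / (n - 1)
    return ci / ri
```

For a $3 \times 3$ reciprocal matrix, $\lambda_{\max} = 1 + \rho^{1/3} + \rho^{-1/3}$ with $\rho = a_{12} a_{23} / a_{13}$, which gives a closed-form check of the iteration: a cyclic set of judgements such as $a_{12} = 2$, $a_{23} = 2$, $a_{13} = 1/4$ yields a CR far above 0.1.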

All that being said, the reduction of inconsistency in the AHP method remains an open issue, and here we propose a new approach that asks the experts for new evaluations according to the Delphi logic, in a structured and iterative procedure that, by means of *nudges*, gently pushes them towards more consistent solutions.

# **3. Reducing the inconsistency by combining the Delphi and the Nudge theory**

Although it still has open issues (Pill, 1971), such as how to choose the experts, how many experts to include in the panel, or how to measure expertise, the Delphi is a method that offers undoubted advantages in the context of group decisions. In the Delphi-AHP the experts are consulted more than once and, starting from the second round, we propose to give a nudge as feedback for each AHP matrix. By using a "nudge approach" we obtain both a reduction of the inconsistency of the AHP matrices and the elimination of the problem of choosing an aggregation method. After the first round (time 1) we get $k + 1$ matrices for each expert ($k$ matrices of alternatives, one for each criterion, plus the matrix comparing the criteria). With $\mathbf{A}_{1,1}$ we denote the 3D array containing the $n \times n$ pairwise comparison matrices according to the first criterion, at time 1, of experts $e = 1, 2, \ldots, N$, where $N$ is the cardinality of the panel. Since each participant gives $p = n(n-1)/2$ judgements, we have $p$ vectors of size $N$. For the first criterion, the vector of the generic pairwise comparison $(i, h)$ is $[a_{ih,1}^{1,1}, a_{ih,2}^{1,1}, \ldots, a_{ih,N}^{1,1}]$. To synthesize these judgements we use the median (other syntheses are possible), obtaining a matrix that represents the judgements of the whole panel after the first round, say $\mathbf{M}_{1,1}$. On this matrix we calculate the consistency ratio $CR_{1,1}$ and, using the 17 values of the Saaty scale $(1/9, 1/8, \ldots, 8, 9)$, we replace the first element of $\mathbf{M}_{1,1}$, namely $m_{12}^{1,1}$ (as well as its reciprocal element $m_{21}^{1,1} = 1/m_{12}^{1,1}$), obtaining 17 different matrices. On each resulting matrix we calculate the consistency ratio and find the smallest, say $CR_{12}^{1,1}$. This value results from a specific value of the Saaty scale, say $V_{12}^{1,1}$, which represents the theoretical assessment that, for cell $(1,2)$, gives the best consistency of the matrix, given all the other values. By repeating the same search over the upper triangle of the matrix, we obtain $p$ different "best $CR$" values, among which we find the smallest, which represents the "best of the bests": $CR^{*}_{1,1} = \min\{CR_{ih}^{1,1} : i, h = 1, \ldots, n;\ h > i\}$. We denote the position of this value with $(i^*, h^*)$ and its corresponding value of the Saaty scale with $V^*$ (our actual nudge), that is, the judgement that most improves the consistency of the matrix.
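A compact sketch of the first-round search just described: the panel matrix is the cell-wise median of the experts' judgements, and every Saaty value is tried in every upper-triangular cell while the other cells are held fixed. The function names, the Random Index table and the power-iteration eigenvalue are assumptions made for illustration, not the author's implementation.

```python
import statistics
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

# Random Index by matrix order (commonly tabulated values; an assumption here).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

# The 17 values of the Saaty scale: 1/9, 1/8, ..., 1/2, 1, 2, ..., 9.
SAATY_SCALE = [1.0 / k for k in range(9, 1, -1)] + [float(k) for k in range(1, 10)]


def consistency_ratio(a: Matrix, iterations: int = 500) -> float:
    """CR = ((lambda_max - n) / (n - 1)) / RI, lambda_max by power iteration (n >= 3)."""
    n = len(a)
    v = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w) / sum(v)
        total = sum(w)
        v = [x / total for x in w]
    return (lam - n) / (n - 1) / RANDOM_INDEX[n]


def median_matrix(experts: Sequence[Matrix]) -> Matrix:
    """Cell-wise median of the experts' upper triangles; the lower triangle is
    filled with reciprocals so that the panel matrix stays reciprocal."""
    n = len(experts[0])
    m = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for h in range(i + 1, n):
            m[i][h] = statistics.median(e[i][h] for e in experts)
            m[h][i] = 1.0 / m[i][h]
    return m


def best_nudge(m: Matrix) -> Tuple[Tuple[int, int], float, float]:
    """Return ((i*, h*), V*, CR*). Taking one minimum over every upper-triangular
    cell and every Saaty value equals first finding the best CR of each cell
    and then the 'best of the bests'."""
    n = len(m)
    if n < 3:
        raise ValueError("matrices of order 1 or 2 are always consistent")
    best = ((0, 1), m[0][1], float("inf"))
    for i in range(n):
        for h in range(i + 1, n):
            for value in SAATY_SCALE:
                trial = [row[:] for row in m]  # all other cells held fixed
                trial[i][h] = value
                trial[h][i] = 1.0 / value
                cr = consistency_ratio(trial)
                if cr < best[2]:
                    best = ((i, h), value, cr)
    return best
```

When the median of a cell already lies on the Saaty scale, the unchanged matrix is among the candidates, so the returned $CR^*$ can never exceed the panel's current CR.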

In the second round of consultation, the panel is invited to judge each $a_{ih}$ (and $a_{hi}$) inside the proposed interquartile range $IQR_{ih} = [Q_1^1; Q_3^1]$, where $Q_1^1$ and $Q_3^1$ are, respectively, the first and third quartiles of the distribution of the first-round judgements in cell $(i, h)$ (Di Zio and Maretti, 2014). The quantity $\Delta = (Q_3^1 - Q_1^1)/2$, computed for cell $(i^*, h^*)$, is used to create a symmetric interval around $V^*$: for the judgement in position $(i^*, h^*)$, in the second round, the interval proposed to the panel is $[V^* - \Delta; V^* + \Delta]$ instead of the $IQR$. Therefore, among the $p$ proposed intervals, $p - 1$ are interquartile ranges, but one is a nudge that gently pushes the respondents towards a more consistent matrix. The same process applies to all the matrices of the hierarchy, and the procedure is repeated iteratively in the following rounds. If consensus is triggered, the consistency ratios of the $j$-th matrix will progressively decrease: $CR_{j,1} \ge CR_{j,2} \ge CR_{j,3} \ge \cdots$.
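A minimal sketch of the two kinds of intervals shown to the panel. It assumes the quartiles are computed on the raw Saaty values with Python's inclusive quantile method; the text does not specify the quartile method, nor how a bound falling outside the Saaty scale would be handled, so the sketch leaves such bounds as they are. Function names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def iqr_interval(judgements: Sequence[float]) -> Tuple[float, float]:
    """[Q1; Q3] of the panel's judgements for one cell (inclusive quartile method)."""
    q1, _, q3 = statistics.quantiles(judgements, n=4, method="inclusive")
    return q1, q3


def nudge_interval(judgements: Sequence[float], v_star: float) -> Tuple[float, float]:
    """[V* - Delta; V* + Delta] with Delta = (Q3 - Q1) / 2: an interval as wide
    as the cell's interquartile range, but centred on the nudge value V*."""
    q1, q3 = iqr_interval(judgements)
    delta = (q3 - q1) / 2
    return v_star - delta, v_star + delta
```

Keeping the width of the original IQR means the nudged cell does not look different from the others: only its centre moves towards the value that most improves consistency.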

This method has several advantages. The aggregation of judgements is managed optimally, by considering the degree of consensus, and this reduces, at least in principle, the dropout rate. At the same time, the inconsistency of the matrices is reduced gently, because no pressure is put on the participants. The experts do not perceive any kind of "mistake message" and are softly driven to revise their judgements. In the AHP context, a well-chosen nudge "pushes gently" the participants towards more consistent judgements. Thus, the method simultaneously stimulates consensus and reduces inconsistency.

The rule to stop the Delphi iterations is twofold. To reap the benefits of the Delphi, at least two rounds must be performed; therefore the first stopping criterion is $t \ge 2$, where $t$ denotes the round. During the rounds we have, for each matrix, a sequence of Consistency Ratios $CR_{j,1}, CR_{j,2}, \ldots, CR_{j,t}$ (here $j = 1, 2, \ldots, k + 1$ indexes the matrices, and the other indices are dropped to simplify notation), and the second stopping criterion is that at least one $CR$ in the sequence is less than or equal to 0.1. After round 2, for matrix $j$, there are four possible cases. 1) $CR_{j,1} > 0.1$ and $CR_{j,2} \le 0.1$: the Delphi for matrix $j$ stops and we take the matrix coming from the second round. 2) $CR_{j,1} \le 0.1$ and $CR_{j,2} > 0.1$: the Delphi stops, but we take the matrix from the first round. 3) $CR_{j,1} \le 0.1$ and $CR_{j,2} \le 0.1$: the Delphi stops, and between the first- and second-round matrices we choose the one with the lower inconsistency. 4) $CR_{j,1} > 0.1$ and $CR_{j,2} > 0.1$: in this last case only the condition $t \ge 2$ holds, therefore the Delphi continues. This double stopping criterion is appropriate because, after a reduction of the inconsistency, there is no guarantee that the $CR$ keeps decreasing monotonically in further rounds. If after $t$ rounds no index in the sequence of a matrix is less than 0.1, we suggest the following solution: take the round $s$ such that $CR_{j,s}$ is the minimum, and keep the matrix used to calculate the intervals for round $s + 1$. By definition, this matrix has $CR^{*}_{j,s} < CR_{j,s}$. Since the above algorithm applies to each matrix of the hierarchy, one matrix may need only two rounds while another may need three or more. The advantage is that the length of the AHP questionnaires decreases during the rounds. The result of the method is a vector of global weights, with lower levels of inconsistency in the pairwise matrices than in the classic AHP.
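The stopping rule can be summarised as a small decision function for a single matrix. This is a sketch: `max_rounds` is a hypothetical cap (the text does not fix a maximum number of rounds), and for rounds beyond the second it assumes that the round with the lowest CR is kept, generalising the four cases above.

```python
from typing import Optional, Sequence, Tuple

THRESHOLD = 0.1  # Saaty's rule of thumb


def stopping_decision(
    crs: Sequence[float], max_rounds: Optional[int] = None
) -> Tuple[str, Optional[int]]:
    """Decide what to do with one matrix j after t = len(crs) rounds.

    crs[t - 1] is CR_{j,t}. Returns:
      ("continue", None) -> send the matrix back in the next round;
      ("stop", s)        -> keep the matrix of round s (1-based);
      ("fallback", s)    -> max_rounds reached with no CR <= THRESHOLD: s is the
                            round with the lowest CR, whose nudged matrix is kept.
    """
    t = len(crs)
    if t == 0:
        return "continue", None
    best_round = min(range(t), key=lambda k: crs[k]) + 1
    if t >= 2 and any(cr <= THRESHOLD for cr in crs):
        # Cases 1-3: the round with the lowest CR is the one the rule selects.
        return "stop", best_round
    if max_rounds is not None and t >= max_rounds:
        return "fallback", best_round
    return "continue", None  # case 4, or fewer than two rounds so far
```

In cases 1 and 2 exactly one CR is at or below the threshold, and it is necessarily the lower one, so a single minimum covers all three stopping cases.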

# **4. Application on four future scenarios with network analysis**

We applied the proposed method to the evaluation of four future scenarios on genetic modification experiments. CRISPR is a new technology that allows the splicing of DNA molecules and, in the future, it could allow human selection of children's characteristics, including the avoidance of many diseases. The ethics of this technology is obviously questionable. Starting from these considerations, Theodore J. Gordon (one of the fathers of the Delphi method) sketched four brief future scenarios on CRISPR technology. For lack of space, we report only their titles: Scenario A. *Genetic tech self-regulation*; Scenario B. *Genetic tech external control*; Scenario C. *Genetic tech uncontrolled*; Scenario D. *Genetic tech downside*. Each scenario represents an alternative of the AHP hierarchy.

Following Gordon and Glenn (2018), the main factors measuring the usefulness of a scenario, which here constitute the criteria of the AHP, are the following. *Plausibility*: the paths to the futures must be seen as feasible, not as impossible. *Consistency*: the paths to the futures and the resulting images must not be mutually contradictory. *Simplicity*: a good scenario describes paths to the future in a way that is easily understood. Therefore, we had 4 alternatives (the scenarios) and 3 criteria, and we wanted to rank the scenarios by importance according to these criteria. The survey was performed on Alchemer (https://app.alchemer.com), where each pairwise comparison was built as a radio button that reproduces the whole Saaty scale. This spared the respondents from filling in the matrices, generally a complicated task for non-AHP experts.

The panel consisted of 26 experts, recruited worldwide, diversified by age, gender, expertise and employment, and skilled in both futures studies and genetics. In each round they gave 21 pairwise judgements and optional comments. For each round and for each matrix (the three scenario matrices, one per criterion, and the criteria matrix) we obtained the consistency ratios reported in Table 1.
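For reference, the Consistency Ratio used throughout is Saaty's CR = CI/RI, with CI = (λ_max − n)/(n − 1) computed from the principal eigenvalue of the pairwise matrix. The sketch below (Python with NumPy) builds a reciprocal matrix from its upper-triangle judgements, derives the eigenvector priorities and the CR, and checks the count of 21 judgements per round: three 4 × 4 scenario matrices with 6 comparisons each, plus the 3 × 3 criteria matrix with 3. The RI values are the commonly tabulated ones; the example judgements are hypothetical, and the authors may have used a different priority routine.

```python
import numpy as np

# Saaty's Random Index (RI) for matrices of order n (commonly tabulated values).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def n_judgements(n):
    """Number of pairwise comparisons above the diagonal of an n x n matrix."""
    return n * (n - 1) // 2


def reciprocal_matrix(n, upper):
    """Fill an n x n reciprocal matrix from {(i, j): a_ij} with i < j."""
    A = np.ones((n, n))
    for (i, j), a in upper.items():
        A[i, j] = a
        A[j, i] = 1.0 / a  # reciprocity: a_ji = 1 / a_ij
    return A


def priorities_and_cr(A):
    """Principal-eigenvector priorities and Saaty's Consistency Ratio (n >= 3)."""
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = int(np.argmax(eigvals.real))      # position of lambda_max
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                       # priorities sum to 1
    ci = (lam_max - n) / (n - 1)          # Consistency Index
    return w, ci / RANDOM_INDEX[n]        # CR = CI / RI


# 3 scenario matrices (4 x 4, one per criterion) + 1 criteria matrix (3 x 3).
print(3 * n_judgements(4) + n_judgements(3))  # 21

# Hypothetical, perfectly consistent criteria judgements: CR is ~0.
w, cr = priorities_and_cr(reciprocal_matrix(3, {(0, 1): 2.0, (0, 2): 4.0, (1, 2): 2.0}))
```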


For the calculation of the local and global weights, we take the matrix resulting from the last round for *plausibility* (round 3, CR = 0.0366) and *consistency* (round 3, CR = 0.0018). For the criterion *simplicity*, the best value comes from the first round (CR = 0.0195), and for the comparison of the three criteria we take the matrix from the second round (CR = 0.0056). In all cases the values are very good, all well below the 0.1 threshold. The result is a vector of global weights, which quantifies the relative importance of each future scenario. According to the panel of experts, the best scenario is B, *Genetic tech external control* (global weight 0.52), followed by scenario A, *Genetic tech self-regulation* (0.28), and scenario D, *Genetic tech downside* (0.10). The last is scenario C, *Genetic tech uncontrolled* (0.09). As for the local weights of the criteria, the experts considered *plausibility* the most important criterion (0.47), followed by *consistency* (0.43) and *simplicity* (0.10).
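Assuming the usual additive AHP synthesis, the global weight of a scenario is the sum, over the criteria, of each criterion's weight times the scenario's local weight under that criterion. The sketch below uses the criteria weights reported above (0.47, 0.43, 0.10); the local weights of the scenarios are made-up numbers, since the panel's local weights are not reported here, so the resulting ranking is purely illustrative.

```python
# Criteria weights as reported in the text.
CRITERIA_WEIGHTS = {"plausibility": 0.47, "consistency": 0.43, "simplicity": 0.10}

# Hypothetical local weights of scenarios A-D under each criterion
# (each criterion's local weights sum to 1); NOT the panel's values.
LOCAL_WEIGHTS = {
    "plausibility": {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10},
    "consistency": {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.40},
    "simplicity": {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25},
}


def global_weights(criteria_w, local_w):
    """Additive synthesis: sum over criteria of criterion weight x local weight."""
    scenarios = next(iter(local_w.values())).keys()
    return {s: sum(criteria_w[c] * local_w[c][s] for c in criteria_w) for s in scenarios}


gw = global_weights(CRITERIA_WEIGHTS, LOCAL_WEIGHTS)
ranking = sorted(gw, key=gw.get, reverse=True)
print(ranking)  # ['A', 'B', 'C', 'D'] with these made-up local weights
```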

After that, we explored the network structure of the scenarios and criteria. A network is a structure representing a group of objects and the relationships between them; its mathematical representation is a graph, which consists of nodes and edges. Since each scenario/criterion is linked to the others through a preference ratio, it is useful to represent the results of the AHP through weighted directed graphs, in which the nodes are the scenarios/criteria and the edge weights are proportional to the geometric mean (or median) of the judgements provided by the experts. By considering the matrices with the lowest CR (Table 1, bold digits), we obtained the four digraphs of Figure 1.
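A minimal way to turn the panel's judgements into weighted digraphs of this kind is to aggregate the experts' matrices element-wise by geometric mean and draw an edge from i to j, weighted by the aggregated ratio, whenever i is preferred to j. The edge direction convention (from the preferred to the less preferred node) and the function names are our assumptions for illustration; the text only states that edge weights are proportional to the geometric mean (or median) of the judgements.

```python
import math
from itertools import permutations


def geometric_mean_matrix(matrices):
    """Element-wise geometric mean of the experts' pairwise matrices.

    The geometric mean keeps the aggregated matrix reciprocal.
    """
    k, n = len(matrices), len(matrices[0])
    return [[math.exp(sum(math.log(m[i][j]) for m in matrices) / k) for j in range(n)]
            for i in range(n)]


def preference_edges(labels, A):
    """Directed edges i -> j with weight a_ij, kept where i is preferred to j (a_ij > 1)."""
    return [(labels[i], labels[j], A[i][j])
            for i, j in permutations(range(len(labels)), 2)
            if A[i][j] > 1]


# Two hypothetical experts comparing nodes X and Y: a_XY = 2 and a_XY = 8.
experts = [[[1.0, 2.0], [0.5, 1.0]], [[1.0, 8.0], [0.125, 1.0]]]
G = geometric_mean_matrix(experts)       # a_XY = sqrt(2 * 8) = 4
edges = preference_edges(["X", "Y"], G)  # one edge X -> Y with weight about 4
```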

Each graph shows at a glance the whole structure of the preferences expressed by the panel when comparing the future scenarios under each criterion and when comparing the criteria, as well as the structure of relationships between the scenarios/criteria, with evident advantages over the matrix representation. We can also build a network for each expert and, more interestingly, consider each expert as a layer of a multiplex network, that is, a network in which the same set of nodes is connected by more than one type of link (Kyu-Min et al., 2015). Moreover, we can consider each criterion as a layer, to study the interactions between scenarios and criteria, or even each Delphi round as a layer, to explore, within each criterion, the interactions between scenarios and rounds. In short, Network Analysis offers many ways to represent and analyse the outputs of a Delphi-AHP: we can study whether scenarios behave similarly across experts, across Delphi rounds or across criteria. Hence, this is not only a way of visualizing the results but also a statistical tool for modelling Delphi-AHP data so as to highlight the structure of relationships between experts, scenarios, criteria and Delphi rounds.

*Figure 1. Network representation of the results (node sizes are proportional to closeness)*

To give a taste of the measures that can be computed, we calculated the closeness of the nodes of the networks in Figure 1 (where node sizes are proportional to closeness), which indicates how close a scenario (or a criterion) is to all the others. Plausibility network: A = 0.549, B = 0.800, C = 0.799, D = 0.925. Consistency network: A = 0.627, B = 0.779, C = 0.474, D = 0.457. Simplicity network: A = 0.335, B = 0.422, C = 0.597, D = 0.626. Criteria network: plausibility = 1.246, consistency = 1.014, simplicity = 1.488. Scenario D is the "closest" to the others under the plausibility and simplicity criteria, while under consistency the scenario with the highest closeness is B; among the criteria, simplicity has the highest closeness.
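Closeness on a weighted digraph can be computed from shortest paths. The paper does not specify which closeness variant or distance transformation was used, so the sketch below makes two labelled assumptions: edge weights are converted to distances as 1/weight (a stronger preference means a shorter path), and the closeness of a node is the number of nodes it reaches divided by the sum of its shortest-path distances to them. It is not expected to reproduce the values reported above.

```python
import heapq
import math
from collections import defaultdict


def closeness(edges):
    """Closeness of each node in a weighted digraph (illustrative variant).

    edges: iterable of (source, target, weight). Assumption: distance = 1 / weight.
    Closeness(u) = (#nodes reachable from u) / (sum of shortest distances from u),
    and 0.0 for nodes that reach no other node.
    """
    graph = defaultdict(list)
    nodes = set()
    for u, v, w in edges:
        graph[u].append((v, 1.0 / w))
        nodes.update((u, v))
    result = {}
    for source in nodes:
        # Dijkstra's algorithm from `source`.
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, length in graph[u]:
                nd = d + length
                if nd < dist.get(v, math.inf):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        reached = [d for node, d in dist.items() if node != source]
        result[source] = len(reached) / sum(reached) if reached else 0.0
    return result


# Hypothetical toy digraph: A -> B (2), B -> C (4), A -> C (1).
c = closeness([("A", "B", 2.0), ("B", "C", 4.0), ("A", "C", 1.0)])
```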

# **5. Concluding remarks**

We have introduced a new method that uses the Delphi method to nudge participants' responses toward better consistency in the AHP pairwise comparison matrices. Network analysis helps to depict the structure of interactions between the alternatives and criteria of the AHP hierarchy. We applied the method to the evaluation of four future scenarios dealing with the management of genetic modification technologies. The study broadly confirmed the research hypothesis, since the inconsistency in all the AHP matrices remained under, or dropped below, the threshold of 0.1.

Although, according to the stopping rule, the Delphi would have stopped after the second round, in the application we performed three rounds for all the matrices, to explore the full potential of the method. The method removes the problem of choosing a method for aggregating the individual judgements, because the Delphi produces a convergence toward a synthesis of the evaluations that includes all points of view, even extreme or minority ones. By using a multiplex network approach, the structure of relationships between experts, scenarios, criteria and Delphi rounds can be studied.

As future developments, the graph representation could be included in the Delphi questionnaires, to help each participant visualize their answers in real time. Also, when each expert is considered as a layer of a multiplex network, the similarity measures between layers could be exploited to explore new measures of consensus in the Delphi method, and new ways of aggregating the individual judgements could also be studied.

# **References**


Saaty, T.L. (1980). *The Analytic Hierarchy Process*. McGraw-Hill, New York (NY).


### **Mapping and factoring the 2007 ATECO categories in regard to specialised human capital**

Luigi Fabbris, Paolo Feltrin**<sup>1</sup>**, Tolomeo Studi e Ricerche, Padua and Treviso, Italy.

# **1. Introduction**

The paper describes an exercise in classifying the five-digit categories of the 2007 Ateco classification system of economic activities (https://www.istat.it/en/methods-and-tools/classifications). The purpose of the work is to highlight the categories showing top levels of human capital (HC), in order to pinpoint those that are likely to lead Italian economic growth in the near future.

An attempt to measure the effects of HC concentration in a territory was made by Moretti (2012) in the United States. He showed that innovation can attract many other jobs to a territory and form the basis of a global knowledge economy (see also Etzkowitz and Leydesdorff, 1997). In Italy, the link between HC and territorial clusters was studied, among others, by Colombo and Delmastro (2002) with reference to technology incubators, Liberati et al. (2013) to science and technology parks, and Bertamino et al. (2014) to technological districts. All Italian studies conclude that locating firms within a specialised territory does not significantly influence business R&D. This may be due to the different demographic density of the United States and Italy. In this work, we ignored the location of economic activities and explored the complementary hypothesis that HC impacts certain activities more than others and that this can lead to an enduring development of those activities.

Our exercise was carried out through a multivariate mapping of the two-digit Ateco categories on the basis of HC indicators and a factorisation of the indicators, so as to understand whether and how higher competence can be considered a trait connecting certain economic categories.

The results of our analysis could be useful, among other things, to evaluate possible relations between academic education and economic growth in Italy. This possible relationship is also connected with some strategies of the Italian PNRR (National Plan for Recovery and Resilience; https://www.governo.it/sites/governo.it/files/PNRR.pdf) and may help forecast its possible outcomes.

The rest of the paper is organised as follows. Section 2 briefly describes the indicators created to define the HC of Ateco categories and the methodology adopted for the exercise; Section 3 presents the main results of the data analysis; and Section 4 discusses the results of the statistical analysis with reference to the mainstream literature and concludes.

# **2. Data and methodology**

The indicators of HC associated with the Ateco categories are the following.


<sup>1</sup> Luigi Fabbris, University of Padua, Italy, luigi.fabbris@unipd.it, 0000-0001-8657-8361; Paolo Feltrin, University of Trieste, Italy, paolo.feltrin@gmail.com, 0000-0003-2801-5151

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Luigi Fabbris, Paolo Feltrin, *Mapping and factoring the 2007 ATECO categories in regard to specialised human capital*, pp. 93-98, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.18, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8


To obtain more stable estimates, year *T* data were averaged over 2018 and 2019, and *T-1* data over 2011 and 2012. The Ateco categories that changed between 2011 and 2019 or were null in either year were merged or excluded from the analyses. We ended up with 84 Ateco categories. The Covid-19 pandemic particularly threatened employment; this is why, in this work, we considered the 2020 data anomalous and ignored them.
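The two-year averaging and the variation indices reported in Section 3 follow simple arithmetic, sketched below. Reading "+29.5% (basis 2011/12 = 100)" as a relative change, the 23.3% share of graduates among the employed in 2018/19 implies a 2011/12 share of about 18.0%. This back-calculation is ours, not a figure reported by the authors, and the function names are illustrative.

```python
def two_year_mean(first, second):
    """Average of two yearly values, e.g. 2018 and 2019 for year T."""
    return (first + second) / 2.0


def variation_pct(level_t, level_t_minus_1):
    """Relative change in per cent, with the base period T-1 set to 100."""
    return 100.0 * (level_t / level_t_minus_1 - 1.0)


def implied_baseline(level_t, variation):
    """Base-period level implied by a level at T and its percentage variation."""
    return level_t / (1.0 + variation / 100.0)


# National share of graduates among the employed: 23.3% in 2018/19, +29.5%.
base = implied_baseline(23.3, 29.5)
print(round(base, 1))  # 18.0
```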

The idea behind our choice of indicators was that a leading economic category is one characterised by a high frequency of college-educated workers, paralleled by a high frequency of people working in higher jobs. These frequencies are evaluated both for all Italian workers and for the self-employed. While the relevance of higher education as a distinctive trait of leading economic activities recurs in the mainstream literature (Autor et al., 2003, and Moretti, 2012, though the latter argues that excellent exceptions are numerous), that of self-employment as a qualitative indicator derives from studies on the future of work (European Commission, 2013; OECD, 2019), which forecast a growing relevance of self-employment for job creation or job restructuring in the coming decades.

The relational analysis of the indicators was based on a Varimax-rotated principal-component factor analysis (Browne, 2001). The analysis aimed to elicit the multiple relationships between indicators and define a mapping system of categories inclusive of all intercorrelated indicators.
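A Varimax-rotated principal-component analysis can be sketched in NumPy as follows: loadings are the eigenvectors of the indicators' correlation matrix scaled by the square roots of their eigenvalues, and Varimax then looks for the orthogonal rotation that maximises the variance of the squared loadings. This is the textbook SVD-based iteration without Kaiser row normalisation, not necessarily the routine used by the authors in R (R's `stats::varimax` normalises by default); the data below are random placeholders whose 84 rows match only the number of categories. Because the rotation is orthogonal, each indicator's communality is unchanged, which the example checks.

```python
import numpy as np


def pc_loadings(X, n_factors):
    """Principal-component loadings from the correlation matrix of X (rows = categories)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)               # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_factors]      # keep the largest ones
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    explained = eigvals[order].sum() / R.shape[0]      # share of total variance
    return loadings, explained


def varimax(L, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation of a loading matrix (SVD-based iteration)."""
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0))
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt                               # closest orthogonal matrix
        new_criterion = s.sum()
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return L @ rotation


rng = np.random.default_rng(0)
X = rng.normal(size=(84, 6))        # hypothetical: 84 categories x 6 indicators
loadings, share = pc_loadings(X, n_factors=2)
rotated = varimax(loadings)
```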

The estimates were computed in R, using the RStudio environment.

# **3. Results**

The statistical analysis of the basic indicators showed that, in Italy, both the percentage of workers with a college degree (from now on also "higher education") and that of workers in an intellectual or scientific job (from now on also "higher job") are important and show an increasing trend (Table 1). In particular, the proportion of the employed with a higher education degree was 23.3% in 2018/19 and had increased remarkably since 2011/12: +29.5% (basis 2011/12 = 100).

The employed in a higher job also represented a relevant and increasing share of Italian workers: in 2018 and 2019, the share of workers in an intellectual or scientific job was 14.8%, a notable increase (+14.8%) from the base year. This may be due to the diffusion of technological innovation in many traditional sectors as well, which, in turn, generated additional demand for highly qualified jobs.

The proportion of self-employed was relevant (22.8%, average of the years 2018 and 2019), at least in comparison with other European countries, but decreasing (-8.3%) from 2011/12. The decline concerned the categories of para-subordinate workers and of the self-employed in crafts, commerce and agriculture: in fact, most of those leaving these categories either retired or became employees. Instead, the number of employers and freelancers increased during the examined time span (Fabbris and Feltrin, 2021). Our data show that the increase concerned both the self-employed with a college degree and those in a higher job, and that this increase was larger than the one among employees. Interestingly, all level indicators are positively skewed, which makes it possible to pinpoint the Ateco categories with the highest levels of the examined indicators.

We therefore applied factor analysis twice: once to examine the relationships among the 2018/19 level and the variation from 2011/12 of three basic indicators (per cent of workers with a higher education degree, in a higher job, and self-employed), and once also including the level and variation of two qualified categories (workers with a college education and workers in an intellectual or scientific job) among the self-employed. The latter, named the 10-variable analysis, was an attempt to include in the analysis the interactions between self-employment and the other two basic indicators.


**Table 1. Mean values of HC indicators at years 2011/12 and 2018/19, Italy** 

**Table 2. Correlation coefficients between human capital indicators (Italy, 2018/19)** (significance levels in the upper triangle: \*\*\*<1‰; \*\*<1%; \*<5%; °<10%)




The correlation coefficients between the indicators, presented in Table 2, showed that:


- the correlation between the rates of variation of higher education and higher jobs is positive both among all Italian workers (0.50) and among the self-employed (0.13), while it is not significant or negative for all the other analysed variations.

Both the 6-indicator and the 10-indicator factor analyses (Table 3) showed that two factors are enough to represent the between-indicator correlations: the first two factors explained 62% and 55.5% of the total variance, respectively. However, we favoured the more comprehensive 10-indicator analysis. The two-factor solution (Figure 1) showed that:


**Figure 1. Map of the Ateco categories on the surface defined by the first (abscissa) and second factor (ordinate) obtained with a Varimax-rotated 10-indicator factor analysis, Italy** (numeric codes refer to the two-digit Ateco classification; X-arrows represent the indicators)

The Ateco categories represented in Figure 1 show that the categories leading the intensity scale of human capital, as measured by higher education, higher jobs and entrepreneurial spirit, were: 75 (veterinary services), 72 (scientific research and development), 70 (business management and consulting), 71 (architecture and engineering studios and other technical services), 86 (health services), 69 (legal and accounting offices), 85 (education) and 90 (creative, artistic and entertainment activities). In all these categories, graduates exceeded 50% of total workers and, with the exception of category 90, the share of workers holding a higher education degree exceeded 60%. Categories 58 (publishing activities), 62 (software production) and 74 (other professional, scientific and technical activities) also scored positively on this main factor. All the quoted categories except 62 (software production) also showed a positive trend at the end of the examined period. As expected, high-skill jobs are associated with innovative sectors and concern both employees and the self-employed.

The category scoring negatively on the first factor but showing a steep qualitative increase during the period considered is 97 (family and community assistance), meaning that the personnel assisting families with housework and/or people with impairments were less educated than average in 2011 but have notably increased their education and skills in recent years.

# **4. Discussion and conclusion**

In this work, we aimed to highlight the Ateco categories showing higher and/or increasing levels of human capital in 2018 and 2019. The indicators of HC intensity referred to college-educated skills, employment in higher jobs, and entrepreneurship. Taken together, the indicators aimed at representing the rate of superior knowledge required by jobs in certain economic activities, the innovation necessary to improve products and processes, and the entrepreneurial spirit that should accompany knowledge and innovation as drivers of business opportunities. We examined both the levels of the indicators and their dynamics, taking variation with reference to 2011 and 2012 as baseline years.

Our exercise is similar to defining what economists, referring to industrial clusters, call "the Marshallian trinity of information exchange, specialized suppliers, and a pool of labor with specialized skills" (Krugman, 2017). Of course, in our case, proximity refers not to territory but to similar economic activities: paraphrasing Becattini (1990), we asked ourselves whether there were economic categories sharing a system of values, views, language, expectations and behaviours, combined with an entrepreneurial culture and knowledge, that shape the productive atmosphere and drive the development of the firms and of the workers in them. In particular, people working for themselves should know what hitherto and in perspective would be managed for them.

We found that some categories lead the triad of knowledge, innovation and entrepreneurial spirit. Some of them could be taken for granted, such as medical, veterinary, education and R&D activities, which are mainly related to top jobs. Legal, accounting, architecture, engineering and other highly technical activities require superior education and training and are often carried out in a self-employed setting, either solo or in small offices.

What may be a novelty in this knowledge-oriented group of activities is the presence of business and management consulting and of the creative, artistic and entertainment industry. Business consultants and managers are relevant to the development of both local and global businesses and compete with each other at national and international levels. To advise and manage firms, one needs not only specific knowledge but also culture and a personality suited to making strategic decisions. Specific education and training and the capacity to identify with entrepreneurs are essential components of the professional personality of these workers.

The relevance of creative industries as a driver of local development is underlined in Moretti (2012). In Italy, these sectors include the so-called "four Fs" of Made in Italy (fashion, food, factory automation, furniture and design), as well as tourism, leisure and information diffusion. The peculiarity of this industry is that its technical activities are of a creative and cultural type.

All this raises an education issue. For universities, the issue involves decisions ranging from educational strategies to practicalities, such as thinking in terms of building transferable skills, and in particular developing attitudes and skills for running a business. Educating students to start their own business and developing their business competencies could support graduates in finding jobs and raise the productive capacity of the whole economic system. Also, matching the economic activities that involve graduates with higher education paths could help universities to identify their productive stakeholders and design future courses.

Our hypothesis of a strong relationship between the intensity of higher education required by certain jobs and the intensity of higher jobs in a given economic activity was confirmed. We did not find clear relations between these two variables and self-employment. The correlation between knowledge intensity and self-employment frequency was positive, but the two variables varied in opposite directions over time: while the intensity of knowledge employed by firms grows, the number of self-employed diminishes. This does not imply that self-employment requires lower education, but rather the opposite. In fact, the groups of self-employed that have diminished in recent years are craft, commerce and agriculture self-employed workers, who are, on average, less educated than the other self-employed.

So, while the number of self-employed diminishes, the knowledge required of them and of business owners, with or without employees, increases. Education and training help the self-employed to become self-reliant. They may become better at finding customers, or at least at handling their personal finances. They may even understand the world of business better if they have had to run one, however small, at one time or another; this introduces a variable we did not consider in our exercise, the age of workers. This could be a matter for future work.

Even though self-employment showed time trends diverging from those of knowledge and innovation, entrepreneurship remains a relevant pillar of our argument. We support the idea that creating the conditions to foster self-employment is socially and economically relevant. A better understanding of how the self-employed organise their work and harness the benefits of knowledge and innovation while managing their job activities can offer insights to policymakers, employers and employees on the changing world of work. However, the Covid-19 crisis may prevent some workers from running their own business, possibly for a long time.

Our exercise has one main limitation: the two-digit Ateco classification. This classification is coarse; an analysis at a finer, three-digit level might be more informative than ours. Even this exercise proved problematic when computing variations, because of low frequencies in some classes. This means that a more detailed analysis should be carried out *cum grano salis*. Finally, we did not assume a criterion variable, and the presented analyses concerned just the non-hierarchical relationships among the selected indicators. This too could be an issue for future work.

# **References**


### **Modelling the spatio-temporal dynamic of traffic flows with gravity models and mobile phone data**

Maurizio Carpita <sup>a</sup>, Rodolfo Metulini <sup>b</sup>

<sup>a</sup> Department of Economics and Management (DEM) - University of Brescia, Contrada Santa Chiara, 50, Brescia BS 25122, Italy.

<sup>b</sup> Department of Economics and Statistics (DISES) - University of Salerno, Via Giovanni Paolo II, 132, Fisciano SA 84084, Italy.

# 1. Introduction

The analysis of origin-destination traffic flows is useful in many application contexts, and such flows have commonly been studied through the *Gravity Model* (Tinbergen, 1962). The popularity of Tinbergen's log-linear specification of the Gravity Model is due to its good performance in modelling international trade flows and to the strong theoretical foundations provided in papers such as Anderson (1979) and Anderson & Van Wincoop (2003). At the macro level, this model states that the volume of trade between any two countries is proportional to the product of their gross domestic products (GDP) and a distance deterrence function, where distance is broadly construed to include all factors that might create trade resistance. The Gravity Model equation can be translated straightforwardly to micro-level flow data, such as passenger flows, simply by replacing trade flows with the total number of passengers travelling between two cities, GDP with a measure of the size of the cities of origin and destination (such as their population), and trade resistance with the geographical (or network) distance between the two cities.

Using data on the flows of mobile phone signals of TIM (*Telecom Italia Mobile*) users among different *census areas* (ACE of ISTAT, the *Italian National Statistical Institute*), recorded on an hourly basis for six months, in this preliminary study we model such flows in the *Mandolossa* area to predict their intensity during flood episodes, in the context of smart-city emergency management plans. Traffic flow data can be integrated with mobile phone densities and used to develop dynamic maps of exposure to flood risk, as proposed in Balistrocchi et al. (2020). From a prevention perspective, this could make it possible to identify preferential traffic flows, thus highlighting potential risks during the onset of inundations or in emergency situations.

Whereas, as explained above, the traditional *static* mass explanatory variable of the classical Gravity Model is GDP or residential population (Kepaptsoglou et al., 2010), we propose, thanks also to the availability of time-series data, a more accurate set of explanatory variables that better accounts for the *dynamics* over time. First, we employ a time-varying mass variable, the density of TIM users by area and time period, which has been estimated from mobile phone data using the method proposed by Metulini & Carpita (2021) and adopted by Balistrocchi et al. (2020) to derive crowding maps for flood exposure. Second, a suitable set of time effects is included. We show that the joint use of these two novel sets of explanatory features allows us to obtain a better linear fit of the Gravity Model and a better traffic flow prediction for flood risk evaluation.

# 2. The mobile phone flows and the other datasets

The TIM mobile phone flows used in this study have been provided by Olivetti (*www.olivetti.com/en/iot-big-data*) and FasterNet (*www.fasternet.it*) for the development of the MoSoRe Project 2020-2022, co-funded by the Lombardy Region (*bit.ly/2Xh2Nfr*), and have been used at the DMS StatLab of the University of Brescia (*dms-statlab.unibs.it*).

Maurizio Carpita, University of Brescia, Italy, maurizio.carpita@unibs.it, 0000-0001-7998-5102 Rodolfo Metulini, University of Salerno, Italy, rmetulini@unisa.it, 0000-0002-9575-5136

Maurizio Carpita, Rodolfo Metulini, *Modelling the spatio-temporal dynamic of traffic flows with gravity models and mobile phone data*, pp. 99-104, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.19, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

The original flow data are square origin-destination (OD) matrices of dimension N × N, where N = 235 is the number of *census areas* or ACE (*Aree di Censimento*, using the standard ISTAT definition) in the Province of Brescia, available at hourly intervals for six months, from September 2020 to February 2021, so the length of the time series is 24 × 181 (T = 4,344). Furthermore, ISTAT provided the shape files for the SCE (*Sezioni di Censimento*), with additional information on the ACE to which each SCE belongs and on its area (*www.istat.it/it/archivio/104317*).

We restrict our attention to a particular subset of OD matrices, as the core of the analysis regards the area of the *Mandolossa*, which has been identified with 4 ACE (Brescia Mandolossa, Cellatica, Gussago and Rodengo Saiano) intersecting the identified flooding-risk area (return period of 10 years), as reported in the left chart of Figure 1. We chose 38 other neighboring ACE, aggregated as represented in the map, which fulfil the criterion of receiving a minimum outflow of 10 from the four ACE of the Mandolossa in each of three randomly chosen sample days. The total flows between the 4 Mandolossa ACE and the 38 selected neighboring ACE account for about 84% of the total outflows from the Mandolossa ACE.
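The length of the hourly series and the subset selection can be checked with a few array operations on the OD data. In the sketch below, the array layout `flows[day, origin, destination]`, the function names and the toy numbers are hypothetical; we also assume that the minimum of 10 applies to the outflow summed over the four Mandolossa ACE, and that the coverage denominator is the origins' full row total (including internal flows), since the text does not settle either point.

```python
import numpy as np

# Hourly series length: 24 hours x (Sep 30 + Oct 31 + Nov 30 + Dec 31 + Jan 31 + Feb 28) days.
T = 24 * (30 + 31 + 30 + 31 + 31 + 28)  # 4344


def select_destinations(flows, origins, min_flow=10):
    """Destinations whose inflow from the origin ACE is >= min_flow on every sample day.

    flows[d, i, j]: hypothetical outflow from ACE i to ACE j on sample day d.
    """
    from_origins = flows[:, origins, :].sum(axis=1)   # shape (days, N)
    ok = (from_origins >= min_flow).all(axis=0)
    ok[origins] = False                               # exclude the origins themselves
    return np.flatnonzero(ok)


def coverage_share(total_flow, origins, destinations):
    """Share of the origins' total outflow going to the selected destinations."""
    selected = total_flow[np.ix_(origins, destinations)].sum()
    return selected / total_flow[origins, :].sum()


# Toy example: 4 ACE, origin 0, two sample days.
flows = np.zeros((2, 4, 4))
flows[:, 0, 1] = 12
flows[:, 0, 2] = [15, 5]   # fails the threshold on day 2
flows[:, 0, 3] = 20
dest = select_destinations(flows, [0])
share = coverage_share(flows.sum(axis=0), [0], dest)
```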

Figure 1: Map of flooding risk area, ACE in Mandolossa and neighboring (by macro-area) (left). Kriskograms of TIM flows between the eight macro-areas (right).

The three kriskograms in Figure 1 show the flows between the 8 macro-areas of interest at three different hours (7-8 am, 3-4 pm, 9-10 pm): the diameters of the circles (proportional to the total flow) highlight that flows increase from morning to afternoon and decrease from afternoon to evening, and that the internal flows of the four ACE of the Mandolossa are very high. As shown in Section 4, this evidence suggested introducing in the Gravity Model a parabolic effect for *hour* and a dummy effect for three internal flows.

As for the Gravity Model's variables, we collected ISTAT data on the residential *population* of each SCE (and, by aggregation, of each ACE) on January 1st, 2016, and on the *distance* in km between the centroids of the 4 Mandolossa ACE and the other 38 neighboring ACE. Furthermore, to extend the classical Gravity Model we used the *mobile phone density* of TIM users, computed for each hour and ACE of interest, which can be interpreted as the average number of mobile phones simultaneously connected to the TIM network in that area during that time interval (Carpita & Simonetto, 2014). These data were created by Metulini & Carpita (2021) and used in Balistrocchi et al. (2020) for the analysis of the Mandolossa in the period 2014-2016. As the mobile phone densities for 2020 and 2021 are not yet available, we used as a proxy the data for the same month, hour and day of the week of 2015 (from September to December) and 2016 (for January and February).

# 3. The Gravity Model and its extension

The classical Gravity Model states that the *flows* from origin $i$ to destination $j$ ($F\_{ij}$) are proportional to the *masses* of both origin and destination ($M\_i$ and $M\_j$) and inversely proportional to a power of the *distance* between them ($D\_{ij}$), where $G$ and $\gamma$ are positive constants:

$$F\_{ij} \propto G \cdot \frac{M\_i \cdot M\_j}{D\_{ij}^{\gamma}} \tag{1}$$

Assuming the masses to be functions of the *populations* ($P\_i$ and $P\_j$), the Gravity Model can be linearised by the logarithmic transformation of (1) and specified as a multiple linear regression model with a temporal subscript $t$ (in our case, the hour) and random errors $\epsilon\_{ijt}$ (LeSage & Pace, 2009):

$$\log(F\_{ijt}) = \alpha + \beta\_1 \cdot \log(P\_i) + \beta\_2 \cdot \log(P\_j) - \gamma \cdot \log(D\_{ij}) + \epsilon\_{ijt} \tag{2}$$

Model (2) can be extended by introducing as further explanatory variables the dynamic masses (dependent on $t$), namely the *mobile phone densities* ($MP\_{it}$ and $MP\_{jt}$), the fixed effect for *internal flows* ($IF\_{ij}$) and a vector of pure *time effects* ($\mathbf{TE}\_t$), with parameters $\alpha, \beta\_1, \beta\_2, \gamma, \delta\_1, \delta\_2, \omega$ and $\lambda$ to be estimated:

$$\begin{aligned} \log(F\_{ijt}) &= \alpha + \beta\_1 \cdot \log(P\_i) + \beta\_2 \cdot \log(P\_j) - \gamma \cdot \log(D\_{ij}) + \\ &\quad \delta\_1 \cdot \log(MP\_{it}) + \delta\_2 \cdot \log(MP\_{jt}) + \omega \cdot IF\_{ij} + \lambda^T \mathbf{TE}\_t + v\_{ijt} \end{aligned} \tag{3}$$
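Both (2) and (3) are linear in their parameters, so they can be estimated by OLS once the design matrix is built. The NumPy sketch below assembles the columns of model (2) and, optionally, the extra columns of model (3): mobile phone densities, the internal-flow dummy and a parabolic hour effect; the day-of-week and month dummies that also enter the time effects would be added as further 0/1 columns. Note that the coefficient estimated on log(D_ij) is −γ. Data and coefficients are synthetic, used only to check that OLS recovers them; the function names are hypothetical.

```python
import numpy as np


def gravity_design(log_p_i, log_p_j, log_d_ij, log_mp_i=None, log_mp_j=None,
                   internal=None, hour=None):
    """Columns of model (2), optionally extended towards model (3)."""
    cols = [np.ones_like(log_p_i), log_p_i, log_p_j, log_d_ij]
    if log_mp_i is not None:                  # dynamic masses MP_it, MP_jt
        cols += [log_mp_i, log_mp_j]
    if internal is not None:                  # internal-flow dummy IF_ij
        cols.append(internal)
    if hour is not None:                      # parabolic hour effect (part of TE_t)
        cols += [hour, hour ** 2]
    return np.column_stack(cols)


def fit_ols(X, y):
    """OLS estimates; the coefficient on log(D_ij) estimates -gamma."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Synthetic check with made-up coefficients (not the paper's estimates).
rng = np.random.default_rng(1)
n = 500
log_p_i, log_p_j = rng.normal(8, 1, n), rng.normal(8, 1, n)
log_d = rng.normal(1.5, 0.5, n)
log_mp_i, log_mp_j = rng.normal(5, 1, n), rng.normal(5, 1, n)
internal = rng.integers(0, 2, n).astype(float)
hour = rng.choice([7.0, 10.0, 13.0, 15.0, 18.0, 21.0], n)
true = np.array([1.0, 0.5, 0.4, -0.2, 0.3, 0.3, 1.5, 0.2, -0.01])
X = gravity_design(log_p_i, log_p_j, log_d, log_mp_i, log_mp_j, internal, hour)
log_f = X @ true + rng.normal(0, 0.01, n)
beta = fit_ols(X, log_f)
```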

It must be considered that this traditional log-linear specification of the Gravity Model, estimated by Ordinary Least Squares (OLS), can be inappropriate when bilateral flows are frequently zero. Many studies estimate the log-linear model by OLS on truncated samples, but disregarding pairs of areas with no positive flow between them can generate biased estimates (Helpman et al., 2008). Silva & Tenreyro (2006) showed that the log-linearisation of the Gravity Model leads to inconsistent estimates in the presence of heteroscedasticity in flow levels, and proposed a Poisson specification with the Poisson Pseudo Maximum Likelihood (PPML) estimator. However, when interested only in the flows between areas with positive flows, as in our exploratory case study, it is possible to rely on OLS without any loss in estimation efficiency.
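For comparison, the PPML estimator of Silva & Tenreyro (2006) is the Poisson maximum-likelihood estimator applied to flows that need not be integers; because it models the flow level rather than its logarithm, zero flows can stay in the sample. The sketch below fits it by Newton-Raphson (equivalent to IRLS for the log link) on synthetic data containing zeros. It illustrates the technique only and is not part of the estimation in this study, which relies on OLS; in practice a GLM routine with a Poisson family would normally be used.

```python
import numpy as np


def ppml(X, y, max_iter=100, tol=1e-10):
    """Poisson pseudo-maximum-likelihood by Newton-Raphson.

    Model: E[y | X] = exp(X @ beta). y may contain zeros and non-integers.
    X must include a constant in its first column.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                 # start from the intercept-only fit
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                 # gradient of the Poisson log-likelihood
        hessian = X.T @ (X * mu[:, None])      # minus the Hessian
        step = np.linalg.solve(hessian, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Synthetic flows with many zeros (hypothetical data).
rng = np.random.default_rng(7)
x = rng.normal(0, 1, 2000)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.3 + 0.5 * x)).astype(float)
beta_ppml = ppml(X, y)
```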

# 4. Application and preliminary results

The parameters of the classical Gravity Model (2) and of its extension (3), presented in Section 3, have been estimated by standard OLS using the data described in Section 2. For this preliminary study, a sample of flows was extracted for the 4 ACEs of the Mandolossa and the 38 neighboring ACEs. The sample covers 6 hours of the day (7, 10, 13, 15, 18 and 21), 4 days of the week (Monday, Wednesday, Thursday and Saturday) and the six months from September 2020 to February 2021. This sample of 6,912 observations has then been randomly partitioned into a *training set* (6,000 observations), used for estimation, and a *test set* (912 observations), used to evaluate prediction performance.
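The random partition described above can be reproduced along the following lines. This is a sketch in which the helper name `train_test_split_indices` and the seed are hypothetical; only the sizes (6,912, 6,000 and 912) come from the text.

```python
import numpy as np

def train_test_split_indices(n_obs, n_train, seed=None):
    """Randomly permute 0..n_obs-1 and cut the permutation into train/test indices."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_obs)
    return perm[:n_train], perm[n_train:]

train_idx, test_idx = train_test_split_indices(6912, 6000, seed=42)
print(len(train_idx), len(test_idx))  # 6000 912
```

Fixing the seed makes the partition reproducible, so that all four models are compared on the same training and test observations.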

To assess the goodness of fit of the four models considered in this preliminary analysis, the residual standard error and the adjusted R<sup>2</sup> have been used. To assess prediction performance, the AIC (Akaike's information criterion) for the training set and the correlation between observed and predicted flows, Cor(Y, Ŷ), for the test set have been used. The F tests of significance for the parameters of each (*full*) model against the model nested in it (*nested*) are also reported.
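The criteria listed above can be sketched for any OLS design matrix as follows. The helper names and the synthetic data are hypothetical. The AIC uses one common convention: a Gaussian log-likelihood with k coefficients plus the error variance. Other software may drop additive constants, which shifts every model's AIC equally and leaves comparisons unchanged. A p-value for the nested F statistic could be obtained from the F(q, n − k<sub>full</sub>) distribution, for example with `scipy.stats.f.sf`.

```python
import numpy as np

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def fit_summary(X, y):
    """Residual standard error, adjusted R^2 and Gaussian AIC of an OLS fit.
    X must include the intercept column; k counts all of its columns."""
    n, k = X.shape
    coef, resid = ols(X, y)
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    rse = np.sqrt(rss / (n - k))
    adj_r2 = 1.0 - (rss / (n - k)) / (tss / (n - 1))
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1.0)
    aic = 2 * (k + 1) - 2 * loglik       # +1 parameter for the error variance
    return {"coef": coef, "rss": rss, "rse": rse, "adj_r2": adj_r2, "aic": aic}

def nested_f(rss_nested, k_nested, rss_full, k_full, n):
    """F statistic for H0: the k_full - k_nested extra coefficients are all zero."""
    q = k_full - k_nested
    return ((rss_nested - rss_full) / q) / (rss_full / (n - k_full))

def holdout_cor(X_test, y_test, coef):
    """Cor(Y, Y-hat) on held-out observations."""
    return float(np.corrcoef(y_test, X_test @ coef)[0, 1])

# Hypothetical data: the full model adds regressor x2 to the nested one
rng = np.random.default_rng(2)
n = 1000
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.8 * x1 + 0.5 * x2 + rng.normal(0, 1.0, n)
X_full = np.column_stack([np.ones(n), x1, x2])
X_nested = X_full[:, :2]
tr, te = slice(0, 800), slice(800, n)
full = fit_summary(X_full[tr], y[tr])
nested = fit_summary(X_nested[tr], y[tr])
F = nested_f(nested["rss"], 2, full["rss"], 3, 800)
print(full["adj_r2"], full["aic"], nested["aic"], F, holdout_cor(X_full[te], y[te], full["coef"]))
```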

Table 1 shows preliminary results for the four Gravity Models described in the previous section.

MOD1 is the classical Gravity Model in formula (2), with only *Population* and *Distance* as explanatory variables. It is statistically significant (the t and F tests have p-values of essentially zero), but its goodness of fit is rather low (adjusted R<sup>2</sup> is 34.5%), and so is its prediction performance (for the test set, the correlation between observed flows Y and predicted flows Ŷ is 0.595). As expected, the estimated effects on *Flows* are positive for *Population* and negative for *Distance*.

MOD2 adds the *Mobile phone densities* as explanatory variables. It is statistically significant (the F test rejects the nested model MOD1), but it does not substantially improve the fit (adjusted R<sup>2</sup> is 34.9%). Its prediction performance is practically the same as that of MOD1: AIC is slightly lower, and Cor(Y, Ŷ) for the test set is 0.594.

Results improve noticeably when the dummy for the three *Internal flows* is added to the model (see the end of Section 2). For MOD3, the F test rejects the nested model MOD2, the fit improves (adjusted R<sup>2</sup> is 53.1%) and prediction performance increases (AIC decreases markedly and Cor(Y, Ŷ) for the test set is 0.741). Note that including *Internal flows* in the model strongly reduces the effect of *Distance* (from −0.186 to −0.06) and slightly increases the effects of the two *Mobile phone densities* on *Flows*.

Finally, introducing the temporal effects, as in MOD4, further improves the results: the F test rejects the nested model MOD3, adjusted R<sup>2</sup> is 62.7%, AIC decreases further and Cor(Y, Ŷ) for the test set is 0.808. *Hour* has the expected significant parabolic effect on *Flows*, increasing from morning to afternoon and decreasing from afternoon to evening. *Day of the week* has a significant negative effect for Saturday. *Month* has significant negative seasonal effects: flows are lower in autumn and winter than in September. This rather unexpected effect may be due to the restrictions imposed in response to the COVID-19 pandemic from October 2020 onwards. Note that introducing the time effects does not substantially change the parameter estimates of the other regressors with respect to MOD3.

# 5. Concluding remarks

Using data on the flows of mobile phone signals of TIM users among different ISTAT census areas, the classical Gravity Model and some of its extensions have been preliminarily adopted to study the dynamics of such flows over time. The study area is the *Mandolossa*, an area on the western outskirts of Brescia in northern Italy, and the final aim is to predict traffic flows during flood episodes.

In addition to the usual population and distance regressors, two further sets of explanatory variables were used jointly: a time-varying mass variable, represented by the density of TIM users by area and by time period, and a proper set of temporal effects. Together they give a better linear fit than the classical Gravity Model and a better traffic flow prediction for flood risk evaluation. These preliminary results are promising, but some in-depth analyses have yet to be carried out. As explained at the end of Section 3, it will be important to evaluate the possibilities offered by the most appropriate estimation methods. Moreover, the actual predictive capacity of the model for the purposes of the MoSoRe Project will have to be investigated further.

# Table 1: Preliminary results of four Gravity Models for the Mandolossa flows

*Notes:* For all the models, the variables flows, population, distance and mobile phone densities are in logarithms. Parameter estimates have been obtained using the standard OLS method. Significance codes for t and F tests: . p < 0.1; ∗ p < 0.05; ∗∗ p < 0.01; ∗∗∗ p < 0.001.

Finally, we are also considering introducing into the Gravity Model other non-standard explanatory variables, which may be retrieved from OpenStreetMap. These relate to the number and type of streets and to the number of offices, restaurants or cinemas, and they would allow us to better characterize the areas of interest and further improve model performance.

Future use of 5G and GPS technologies will facilitate real-time assessment of the spatial distribution of people. With an early-warning system, alternative safe pathways could be identified and communicated to exposed people in order to facilitate their evacuation.

# Acknowledgments

Research carried out at the DMS StatLab, University of Brescia (*dms-statlab.unibs.it*). It was co-funded by the *MoSoRe@UniBS* Project (Infrastrutture e servizi per la Mobilità Sostenibile e Resiliente, i.e. Infrastructure and Services for Sustainable and Resilient Mobility) of Lombardy Region, Italy (CallHub ID 1180965; bit.ly/2Xh2Nfr). We thank an anonymous reviewer for the fruitful comments.

# References


# **The effectiveness of marketing tools in a consumer goods market in Italy during the Great Recession (2010-2015)**

Giorgio Tassinari<sup>a</sup>, Demetrio Panarello<sup>a</sup>

<sup>a</sup> Department of Statistical Sciences "Paolo Fortunati", University of Bologna, Bologna, Italy.

# **1. Introduction**

During the Great Recession of 2008-2015, household consumption decreased in all EU countries (Streeck, 2016; Tooze, 2018). In Italy, for instance, the value of private national consumption at constant 2015 prices decreased by 4.6% (Istat, 2021). It must be borne in mind that the period considered is far from homogeneous. As is well known, the Great Recession of 2008-2015 was W-shaped. Its first trough came in 2009, with the financial crisis that spread from the USA to all high-income countries, and its second in 2013, with the sovereign debt crisis in EU countries. This cyclical profile therefore implies an alternation of depressive and moderately expansive phases within the period considered.

Faced with this situation, firms operating in consumer goods markets with a high purchase intensity mostly reacted with strategies based on price reductions and promotions; conversely, most of them reduced their advertising investments (Freo et al., 2020). Companies' marketing strategies in times of economic recession have been studied in depth (Deleersnyder et al., 2009; Van Heerde et al., 2013). Most of the evidence in the literature confirms that the adopted marketing strategies vary across the phases of the economic cycle (Lamey et al., 2012; Van Heerde et al., 2013). In a recession, households reduce consumer spending, for instance by switching from national brands to private-label brands. At the same time, companies react by changing the marketing mix: they reduce regular prices, make greater use of promotions and cut advertising investments. Van Heerde et al. (2013) show that price elasticity increases during the downward phases of the economic cycle, whereas advertising elasticity increases during the expansionary phases. Other studies find that increasing or maintaining advertising investments has a positive impact on brand performance during recessions (Deleersnyder et al., 2009; Kashmiri and Mahajan, 2014).

The subject of our analysis is the Italian market of tea-based beverages in the period 2010-2015. We examine the effectiveness of marketing tools and the competitive structure of this market, in order to ascertain the intensity and extent of price-based strategies compared with strategies that leverage advertising investments.

Based on the literature, we expect each brand's price elasticity to be greater, in absolute value, than its advertising elasticity. Since we are dealing with a stationary market, we employ a market share model, following the methodology described by Cooper and Nakanishi (1988). This approach allows us not only to measure the impact of the marketing mix on each brand's market share but also to identify the competitive structure.
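To illustrate the attraction (market share) models mentioned above, the following sketch computes shares from a fully extended MNL attraction function and from its MCI counterpart. It assumes our reading of the Cooper and Nakanishi (1988) forms:

- MNL: A<sub>i</sub> = exp(α<sub>i</sub> + Σ<sub>k</sub>Σ<sub>j</sub> β<sub>kij</sub> X<sub>kj</sub>)
- MCI: A<sub>i</sub> = exp(α<sub>i</sub>) Π<sub>k,j</sub> X<sub>kj</sub><sup>β<sub>kij</sub></sup>

In both cases s<sub>i</sub> = A<sub>i</sub> / Σ<sub>j</sub> A<sub>j</sub>. All parameter values and brand data are hypothetical.

```python
import numpy as np

def mnl_shares(alpha, B, X):
    """Fully extended MNL attraction model.
    alpha: (m,) brand constants
    B:     (K, m, m), B[k, i, j] = effect of tool k of brand j on brand i
    X:     (K, m), value of tool k for each brand in one period"""
    log_a = alpha + np.einsum("kij,kj->i", B, X)
    a = np.exp(log_a - log_a.max())      # common rescaling leaves shares unchanged
    return a / a.sum()

def mci_shares(alpha, B, X):
    """MCI counterpart: the tools enter through their logarithms, so X must be > 0."""
    if np.any(X <= 0):
        raise ValueError("MCI needs strictly positive explanatory variables")
    log_a = alpha + np.einsum("kij,kj->i", B, np.log(X))
    a = np.exp(log_a - log_a.max())
    return a / a.sum()

# Three brands, two tools (price per liter, advertising GRPs); hypothetical values
alpha = np.array([0.2, 0.0, -0.3])
B = np.zeros((2, 3, 3))
B[0] = np.diag([-1.5, -2.0, -2.5])       # own-price effects only
B[1] = np.diag([0.002, 0.001, 0.003])    # own-advertising effects only
X = np.array([[1.6, 1.0, 0.6],
              [150.0, 0.0, 40.0]])       # brand 2 does not advertise this period
print(mnl_shares(alpha, B, X))            # well defined despite the zero GRP value
```

Calling `mci_shares(alpha, B, X)` on the same data raises an error. This mirrors the practical difficulty of the MCI form when a tool such as advertising takes the value zero.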

# **2. Data and preliminary analyses**

The present study analyses the competitive situation and the effectiveness of price maneuvers and advertising investments in terms of increasing market shares.

We make use of monthly observations obtained by aggregating IRI Infoscan weekly surveys concerning Italian hypermarkets and supermarkets in the period from November 2010 to October 2015. For each brand, the sales in value and volume, the price per liter, the possible presence of price promotions and the weighted distribution are known. The advertising carried out by each brand, sourced from Nielsen, is expressed in terms of Gross Rating Points referring to all mass communication channels.

Demetrio Panarello, University of Bologna, Italy, demetrio.panarello@unibo.it, 0000-0003-1667-1936

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Giorgio Tassinari, Demetrio Panarello, *The effectiveness of marketing tools in a consumer goods market in Italy during the Great Recession (2010-2015)*, pp. 105-110, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.20, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Giorgio Tassinari, University of Bologna, Italy, giorgio.tassinari@unibo.it, 0000-0002-5161-7989

The brands on the market are numerous and heterogeneous. Price differences between brands are considerable, ranging from 0.56 to 1.65 euros per liter. Modelling every brand would require estimating an excessive number of parameters and saturate the information capacity of the model relative to the available data. We therefore consider five brands separately, chosen so that they adequately represent the heterogeneity of competitors on the market in terms of both price and market share. Altogether, these five brands cover about three quarters of the category's volume sales.

Since tea-based beverages are characterized by low emotional involvement, differentiation between brands is mostly based on tangible attributes, and the product's price is seen as a mirror of its intrinsic quality. This is also reflected in the ratio of market share to share of voice, which is high for the lower-priced brands with a higher market share. The product is chosen directly at the point of sale, based on habits and routines. The process of purchasing tea-based beverages, a "luxury" good, follows the do-learn-feel scheme: consumers get to know and evaluate the product only after purchasing it, and repeat purchases only happen after they learn about the product's actual quality and feel satisfied. Therefore, for products of this kind, advertising investment is primarily aimed at ensuring that consumers recognize the product and are induced to try it. Only after repeated purchases does the goal shift to strengthening users' loyalty.

As could be expected, the advertising investments made by the different brands over the analyzed period are highly variable, reflecting the different time intervals in which the companies ran their advertising campaigns. Before estimating the attraction model, we verify that the five brands' log-centered market shares do not have unit roots (stochastic non-stationarity). To do so, we perform a Dickey-Fuller test augmented with seasonal dummies and a deterministic trend (Table 1).
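Log-centering, mentioned above, expresses each brand's log share relative to the mean log share of all brands in the same period. In other words, it takes the log of the share over the geometric mean of the shares. The following minimal sketch uses hypothetical shares; the helper name `log_center` is ours.

```python
import numpy as np

def log_center(shares):
    """Log-centering transformation.
    shares: (T, m) array of strictly positive shares, rows = months, columns = brands.
    Returns log(s_it) minus the mean over brands of log(s_jt); each row sums to zero."""
    log_s = np.log(np.asarray(shares, dtype=float))
    return log_s - log_s.mean(axis=1, keepdims=True)

# Two months, five brands jointly covering about 75% of the category (hypothetical)
shares = np.array([[0.30, 0.20, 0.12, 0.08, 0.05],
                   [0.28, 0.22, 0.10, 0.09, 0.06]])
y = log_center(shares)
print(y.sum(axis=1))   # both entries are zero up to rounding error
```

A convenient property is that multiplying all the shares of a month by the same constant leaves the log-centered values unchanged. It therefore makes no difference whether the five brands' shares are expressed relative to the whole category or only to the five brands considered.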


*Note: The null hypothesis is the presence of a unit root.*

Volume sales follow a purely seasonal pattern, and the total volume of sales in the category is not markedly affected by the economic crisis. We note the existence of a non-negligible consumer segment that buys the category even in the winter months. Category-level advertising investments follow the seasonal pattern. Each year, they begin in the spring months, peak at the beginning of the summer and then gradually decrease to almost zero in winter.

The average price of the five considered brands increases over the period. Every year, the price decreases in the spring-summer months, due to more frequent price promotions, and increases in autumn and winter; the maximum average price of €1.06 was recorded in November 2014. Overall, prices gradually increase. Over the period considered, the price went from €0.89 to €0.99 per liter, an increase of more than 11%. Over the same period, the general level of consumer prices rose by 7.5% and the general category of non-alcoholic beverages by 8% (Istat, 2021). The fairly modest decrease in volume sales is therefore offset by the increase in unit prices. In our opinion, these characteristics make the study of this category particularly interesting and distinctive.

# **3. Attraction model estimation**

For the estimation, ordinary least squares applied separately to each equation was used in the first place. This procedure is equivalent to Zellner's estimator when the same regressors appear in every equation of the system (Cooper and Nakanishi, 1988). The prices and advertising investments of each brand, together with the weighted distribution, were used as independent variables in each equation, while the dependent variables are the log-centered market shares. Both the MCI and MNL models (Cooper and Nakanishi, 1988) are theoretically plausible, so empirical criteria are needed to choose between them. We opted for the MNL model because advertising investments are zero for many brands in several periods (flighting strategy). The MCI model enters the explanatory variables through their logarithms and therefore cannot handle zero values, whereas the MNL model enters them untransformed in an exponential attraction function.
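The equivalence stated above can be checked numerically. When every equation uses the same regressor matrix X, Zellner's SUR (GLS) estimator of the stacked system reduces algebraically to equation-by-equation OLS, whatever the cross-equation error covariance. The following sketch with hypothetical data and helper names verifies this.

```python
import numpy as np

def ols_each_equation(X, Y):
    """Equation-by-equation OLS; Y is (n, m), one column per brand equation."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef                                   # shape (k, m)

def sur_gls(X, Y, sigma):
    """Zellner SUR (GLS) for the stacked system in which every equation uses X.
    sigma is the (m, m) cross-equation error covariance."""
    n, m = Y.shape
    Z = np.kron(np.eye(m), X)                     # block-diagonal design matrix
    W = np.kron(np.linalg.inv(sigma), np.eye(n))  # inverse of (sigma kron I_n)
    y = Y.T.reshape(-1)                           # stack the equations one after another
    b = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)
    return b.reshape(m, -1).T                     # back to shape (k, m)

# 60 months, 3 equations with correlated errors (hypothetical data)
rng = np.random.default_rng(3)
n, m = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_coef = rng.normal(size=(3, m))
cov = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]])
Y = X @ true_coef + rng.multivariate_normal(np.zeros(m), cov, size=n)

b_ols = ols_each_equation(X, Y)
resid = Y - X @ b_ols
sigma_hat = resid.T @ resid / n                   # feasible SUR covariance estimate
b_sur = sur_gls(X, Y, sigma_hat)
print(np.allclose(b_ols, b_sur))                  # True: identical regressors
```

The equivalence breaks down once some coefficients are restricted to zero in some equations but not in others, because the equations then no longer share the same regressors. That is the situation in which a separate SUR estimation step pays off.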

We proceeded as follows: a) formal seasonality test; b) estimation of the complete MNL model by OLS after log-centering the data; c) analysis of residual autocorrelation; d) estimation of the final model through Zellner's SUR method (Cooper and Nakanishi, 1988).

As for seasonality, we performed a formal test through an OLS regression on seasonal dummies (not reported for the sake of brevity). In line with its results, seasonal dummies and a deterministic trend are included in the equation for each brand.

The choice between a static and a dynamic model was made by performing a Durbin-Watson autocorrelation test (Table 2) on the residuals of the OLS estimates. The results led us to prefer a dynamic formulation of the error correction type: in three equations out of five, we reject the null hypothesis of no first-order autocorrelation in the residuals at the 1% significance level.
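For reference, the Durbin-Watson statistic used above is DW = Σ<sub>t</sub>(e<sub>t</sub> − e<sub>t−1</sub>)² / Σ<sub>t</sub> e<sub>t</sub>². It is approximately 2(1 − ρ̂), where ρ̂ is the first-order autocorrelation of the residuals. Values well below 2 therefore point to positive autocorrelation. The following sketch uses simulated, hypothetical residuals. Decisions are then taken against tabulated critical bounds. The same statistic is also available as `statsmodels.stats.stattools.durbin_watson`.

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2; roughly 2 * (1 - rho_hat)."""
    e = np.asarray(resid, dtype=float)
    d = np.diff(e)
    return float(d @ d / (e @ e))

# 60 monthly residuals: white noise vs. an AR(1) process with rho = 0.8 (hypothetical)
rng = np.random.default_rng(4)
white = rng.normal(size=60)
ar1 = np.empty(60)
ar1[0] = white[0]
for t in range(1, 60):
    ar1[t] = 0.8 * ar1[t - 1] + white[t]   # positively autocorrelated residuals
print(durbin_watson(white), durbin_watson(ar1))
```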


**Table 2 – Durbin-Watson autocorrelation test.**

The presence of seasonality in market shares is confirmed by the OLS estimates. From brand to brand, the seasonality pattern presents different shapes and the deterministic trend shows different slopes.

The presence of non-significant coefficients in the OLS estimates led us to estimate the system of equations through the SUR method, constraining the barely significant parameters to zero. The equations also include the seasonal dummies, the deterministic trend and each brand's log-centered market share lagged by one period, in order to account for the dynamics highlighted by the Durbin-Watson test. The results are shown in Table 3. The weighted R<sup>2</sup> for the entire system is 0.947.

The parameters concerning the influence of a brand's price on its market share present a negative sign, while those relating to competitors' prices are generally positive. Most coefficients regarding advertising investments are not significant, while the distribution confirms itself as an important marketing tool.

To verify the appropriateness of the restrictions imposed on the parameters of the set of equations, we made use of the F test (not reported for the sake of brevity), which confirms that we can rely on the restricted model.


**Table 3 – Coefficients of the full-effects MNL model estimated through the SUR method.**

*Notes: p-value < 0.01 = \*\*\*; p-value < 0.05 = \*\*; p-value < 0.10 = \*.*

# **4. Elasticity of shares with respect to marketing tools and basic market shares**

The estimated parameters of the restricted model allow us to determine the cross-elasticity coefficients, according to the following formula (Cooper and Nakanishi, 1988):

$$\varepsilon\_{s\_i, X\_{kj}} = \left( \beta^\*\_{kij} - \sum\_{h=1}^{m} s\_h \beta^\*\_{khj} \right) X\_{kj}$$

where $\varepsilon\_{s\_i, X\_{kj}}$ is the elasticity of the market share of brand *i* with respect to the *k*-th marketing tool of brand *j*; $\beta^\*\_{kij}$ are the estimated coefficients; $s\_h$ are the average market shares of the *m* brands; and $X\_{kj}$ is the average value of the *k*-th marketing tool of brand *j*.
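The elasticity formula above, ε<sub>ij</sub> = (β<sub>kij</sub> − Σ<sub>h</sub> s<sub>h</sub> β<sub>khj</sub>) X<sub>kj</sub> for one marketing tool k, can be evaluated for all brand pairs at once. The following sketch uses hypothetical coefficients, shares and tool levels; the helper name `mnl_elasticities` is ours.

```python
import numpy as np

def mnl_elasticities(beta_k, shares, x_k):
    """Share elasticities for one marketing tool k in an MNL attraction model.
    beta_k[i, j]: coefficient of tool k of brand j in brand i's equation
    shares:       average market shares s_h (summing to one)
    x_k:          average value of tool k for each brand j
    Returns E with E[i, j] = (beta_k[i, j] - sum_h s_h beta_k[h, j]) * x_k[j]."""
    weighted = shares @ beta_k                    # entry j: sum_h s_h beta_k[h, j]
    return (beta_k - weighted[np.newaxis, :]) * x_k[np.newaxis, :]

# Hypothetical own-price-only (differential effects) coefficients for three brands
beta_price = np.diag([-2.0, -3.0, -1.0])
s = np.array([0.5, 0.3, 0.2])
price = np.array([1.0, 0.8, 1.2])
E = mnl_elasticities(beta_price, s, price)
print(np.round(E, 3))   # rows: brand whose share responds; columns: brand whose price moves
```

With only own effects, the formula reduces to β<sub>i</sub>(1 − s<sub>i</sub>)X<sub>i</sub> on the diagonal and −β<sub>j</sub> s<sub>j</sub> X<sub>j</sub> off the diagonal, so all cross elasticities in a column are equal. Asymmetric patterns like those discussed for Table 4 require the cross-effect coefficients of the fully extended model.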

Examined by rows, the elasticity matrices provide information on the effects of marketing variables (own and competitors') on the share of a brand, while by columns they indicate the effects produced by a specific brand's marketing tool on its own share and on that of competitors; in essence, they provide useful information on the competitive situation in the examined market. In the following tables, we present the elasticities of market shares to the various brands' prices, advertising investments and weighted distribution.


**Table 4 – Elasticity to prices (authors' elaboration, Italy, Nov. 2010 – Oct. 2015).**

Looking at the elasticities to the average prices over the whole period (Table 4), it is immediately clear that the elasticities to each brand's own price are all negative. Therefore, a price increase results in a more or less sharp decrease in the brand's own market share. Moreover, a modest proportion of cross-price elasticities have a sign different from the expected one. Read by columns, the table gives the elasticities of market shares with respect to the price of the examined brand. These differ considerably, both from one brand to another (showing that price variations have effects of varying strength depending on the brand) and within the same column. This is a clear sign of strong competitive asymmetries in the analyzed market.

Let us now examine the elasticities by brand, which present some noteworthy characteristics. Most of the values in the matrix are below one in absolute value; higher absolute values indicate stronger price competition between brands. Reading the values by rows, it emerges that Ferrero, the highest-priced brand, has a low direct elasticity in absolute value. It is also relatively isolated from the price maneuvers of other brands, confirming its role as market leader.

Moving on to the column relating to San Benedetto, which is the most important follower, we can see that its price has a relevant impact on the other medium- and low-priced brands.

Briefly, the examination of the cross-price elasticity matrix, while confirming the importance of competitive asymmetries, underlines that the two main brands are rather isolated from each other as regards the effect of prices on market shares.


**Table 5 – Elasticity to advertising investments (authors' elaboration, Italy, Nov. 2010 – Oct. 2015).**

The coefficients regarding the elasticity of market shares to advertising investments (Table 5) are all close to zero. Therefore, such investments almost never seem to produce any noteworthy variation in either own or competitors' market shares. This is very much in tune with the coefficients relating to advertising investments resulting from the MNL models estimated through the SUR method, which were mostly non-significant.



Moving on to Table 6 (elasticity to weighted distribution), we can see that most direct elasticities are positive, while cross elasticities, which should be negative, often show the opposite sign.

An overall evaluation of the effectiveness of each brand's marketing strategy can be carried out by comparing the basic and average market shares for the whole period.


The basic market share is the share that each brand would have if all brands had the same coefficients of marketing-tool effectiveness and used each tool with the same intensity. In brief, it represents the intrinsic attractiveness of each brand.
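Under our reading of the definition above, imposing identical coefficients and identical marketing effort makes the marketing term of the MNL attraction function the same for every brand. That term then cancels from the share formula, leaving s<sub>i</sub> = exp(α<sub>i</sub>) / Σ<sub>j</sub> exp(α<sub>j</sub>), where the α<sub>i</sub> are the brand-specific constants. The following sketch uses hypothetical intercepts and average shares; it is not a reproduction of Table 7.

```python
import numpy as np

def basic_shares(alpha):
    """Basic (intrinsic) shares implied by MNL brand constants alone.
    Adding the same constant to every alpha leaves the result unchanged, so
    constants identified only up to a common shift (as after log-centering) suffice."""
    alpha = np.asarray(alpha, dtype=float)
    a = np.exp(alpha - alpha.max())
    return a / a.sum()

alpha_hat = np.array([0.4, 0.9, -0.2, 0.1, -0.6])          # hypothetical intercepts
observed_avg = np.array([0.35, 0.20, 0.18, 0.15, 0.12])     # hypothetical average shares
basic = basic_shares(alpha_hat)
print(np.round(basic, 3))
print(np.round(observed_avg - basic, 3))   # > 0: marketing lifts the share above its basic level
```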

Basic market shares are very different from the average shares observed in the period considered (Table 7). Ferrero, the brand that invests the most in advertising, obtains a much higher average market share than its basic share, while a significant erosion of the basic share can be observed for San Benedetto.

# **5. Conclusive remarks**

From the combined analysis of the effect on market shares of price, advertising investments and weighted distribution, it can be seen that the availability of the brands within a point of sale also plays a relevant role in influencing own and competitors' market shares.

Price competition between brands does not seem to have considerable effects (cross-price elasticities are lower than one in absolute value). With this in mind, advertising investments, despite not having a major direct effect on market shares, are intended to increase brand awareness and stimulate brand recognition at the time of purchase. Price is the variable that ultimately determines the decision to buy. Therefore, the key elements in determining market shares in the examined category are the elasticity to price (especially the direct one) and the elasticity to weighted distribution. This is in line with the characteristics of this product category, i.e., high purchase frequency and weak emotional involvement.

In conclusion, it should be remarked that the analyzed category recorded an excellent performance over the period considered. Our analysis should, of course, be extended to a wider range of categories before more general conclusions can be drawn.

# **References**


# **The role of the extra-man play actions in elite water polo matches: which elements lead to a good shot?**

Alessandro Lubisco<sup>a</sup>

<sup>a</sup> Department of Statistical Sciences, University of Bologna, Bologna, Italy.

# **1. Introduction**

Many studies on team sports seek to identify which elements set the winning team apart from the losing one and which elements of play lead to victory (Lupo *et al*., 2014). This is one of the reasons why match analysis has developed a great deal over recent years in many disciplines, including water polo.

In water polo, two teams, each with six outfield players and a goalkeeper, play four quarters of 8 minutes of actual playing time in a 30×20 m playing area. All players are involved in both attack and defence. Generally speaking, the attacking team places one player at the centre (position 6, the centre forward) and arranges the others in a semicircle (Fig. 1a). The defence has a number of strategies, ranging from pressing to various types of zone defence. Each team has 30 seconds to complete a play action. If the attacking team still has possession of the ball after a shot, it then has at least 20 seconds available. This is similar to the system adopted in basketball: 24 seconds for an action and at least 14 seconds following a shot. The team that scores the most goals wins. A match may also end in a draw; in direct elimination matches, a penalty shootout is used to determine the winner.

A frequent situation thought to be very significant to the final result of a match (Takagi *et al*., 2005) is one of numerical superiority usually called extra-man, man-up or a 6-on-5 situation (XM). This occurs when a player is temporarily excluded following a major foul (FINA, 2020) and is sent out of play for 20 seconds.

Coaches dedicate a lot of time to training their teams to attack and defend in XM situations. Briefly, the attacking players line up along two lines. Two players stand at 5-6 metres from the goal, each in line with one of the goal posts, and the other four are on the two-metre line, a sort of off-side line. On the two-metre line, therefore, there are two players in line with the posts and two on the flanks. This kind of attack is known as a 4-2 and is used by most teams. A less frequent initial attack formation is the 3-3, in which 3 players are positioned on the outer line and 3 on the two-metre line.

For the 4-2 attack formation, the positions of the players are numbered from 1 to 6 clockwise as you face the goal, starting from the player on the right on the 2-metre line (Fig. 1b).

The defending team generally places three players on the 2-metre line who 'jump' sideways to mark the two opponents either side of each defender and stop the wings, at 2 metres (1 and 4), from scoring by raising their arms. The two external players also move backwards and forwards to cover the goal area with raised arms, stopping the wings from scoring (2 and 3) and intercepting any passes toward the central players on the 2-metre line (5 and 6) as shown by the arrows in Fig. 1b. This formation is called a 2-3 defence.

There are 20 seconds available to conclude the action, with some exceptions depending on the area where the exclusion takes place. With a series of passes and player movements, the attacking team must therefore quickly try to disrupt the defence and create a shot. For the defenders, these 20 seconds seem endless, because the physical effort involved in mounting an aggressive defence capable of preventing the attack from scoring is huge. This is why, in attack, a coach will aim to improve the players' ability to move the ball quickly from one position to another without leaving the defence enough time to take up positions. The defence, in turn, has to work on coordinating the movements of its players so that the attack finds it difficult to score.

It has to be said that nothing is easy in water polo, particularly at high levels. Even in the so-called 1-against-0 situation, when one attacker is alone in front of the goalkeeper, actually scoring is by no means a foregone conclusion.

Alessandro Lubisco, University of Bologna, Italy, alessandro.lubisco@unibo.it, 0000-0002-3440-7867 FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Alessandro Lubisco, *The role of the extra-man play actions in elite water polo matches: which elements lead to a good shot?*, pp. 111-115, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.21, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

This paper investigates XM actions in detail. More specifically, it analyses data from a recent European men's water polo championship to identify whether XM actions have any elements that lead to a good shot. A good shot here means a shot that ends up in the goal, or would have done so had the goalkeeper not saved it.

Section 2 describes a preliminary analysis of the 48 matches of this championship. The results of the analysis of the XM actions are presented in Section 3. The final section offers some concluding remarks.

# **2. Preliminary analysis**

The forty-eight matches of the 34th European men's water polo championships held in Budapest in January 2020 were taken into consideration.

To verify the importance of XM actions in water polo matches, a preliminary analysis of all the matches was carried out using the information in the play-by-play tables available on the championship website<sup>1</sup>.

For each match, the following variables were taken into consideration: outcome, number of possessions, total number of actions (i.e. possessions that end with a shot, an exclusion or a penalty), number of 6-on-6 actions (EA), number of 6-on-5 actions (XM), total number of goals, number of goals scored in 6-on-6 actions (EG) and number of goals scored in 6-on-5 actions (XG).

After regular time, three of the 48 matches ended in a draw and the winner was decided by a penalty shootout. Spain defeated Hungary in the preliminary phase and Serbia in the quarterfinals, but lost the match for 1st place against Hungary. In these cases, the winning and losing teams are defined by the result of the match after penalties.

In the tournament, the number of possessions per team averaged 37.8 (SD=3.1) per match. The number of actions differed significantly between winning (mean=30.8; SD=4.1) and losing (mean=26.0; SD=4.1) teams. The same holds for the number of goals per match, in both even (6-on-6) and man-up (6-on-5) situations (Table 1).

Overall, most goals resulted from even-player actions (6.6 per match against 3.7 in man-up), because 72% of actions were played at even strength.

However, the probability that an even-player action ends with a goal is lower than the same probability for man-up actions. As shown in Table 1, 31.2% of even-player actions end with a goal, whereas with an extra man the percentage is 46.6%.

Differentiating between winning and losing teams, the percentages in 6-on-6 actions (EG/EA) become 38.2% and 24.2% respectively. In other words, with even players the winning teams score roughly one goal every three play actions, whilst the losers score one every four.

<sup>1</sup> http://wp2020budapest.microplustiming.com/

When playing with a numerical advantage, the winning teams score on average one goal every two play actions (51.9%), whereas losing teams score on 41.3% of actions (XG/XM). Performance is therefore distinctly higher in situations of numerical superiority, and this is emphasised when differentiating between winning and losing teams.

So, when an opponent is excluded, the attacking team gets the opportunity to play an action with a higher probability of scoring a goal. This is why the objective for most of the game is to give the ball to the centre forward in order to draw an exclusion.


Table 1: European men's water polo matches. Means, standard deviations, differences between winning and losing teams and p-value of ANOVA F test (significant differences in bold)

# **3. The extra-man action analysis**

The analysis in the previous section underlined the importance of man-up situations.

It was therefore decided to analyse all man-up actions in all 48 matches, in order to identify any characteristics of numerically superior play actions that increase the probability of scoring a goal or, at least, of making a good shot at goal.

For this purpose, data from official FINA<sup>2</sup> video footage of the European men's water polo championships were collected and analysed. The dataset consists of 979 extra-man plays. This number is higher than the total number of man-up play actions in Table 1 (762) because numerically superior play actions were also counted when the excluded player had already returned to play but had not yet reached his defensive position.

Focusing on the characteristics of an action that depend on the way the team plays, the following variables were recorded for each selected action, regardless of its outcome: number of passes, action duration (in seconds), sequence of passes (positions in Fig. 1b) and time-out call (Yes/No).

The following variables were then defined:


<sup>2</sup> Dailymotion platform

with the ball; otherwise 0.


The GoodShot variable is used instead of Goal because a well-played action may not lead to a score if the goalkeeper makes an exceptional save. Such an action can still be regarded as well played and satisfactory for the coach, even though a goal would have been preferable.

Pearson's Chi Square test showed a significant association between the occurrence of a good shot and some of the variables considered (in bold), as shown in Table 2.


Table 2: Pearson's Chi Square test results for the selected variables and GoodShot
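For readers who want to reproduce a test like those summarised in Table 2, the sketch below computes Pearson's chi-square statistic of independence for an r x c contingency table (for example GoodShot by a categorised action characteristic) in plain Python. The p-value comes from the chi-square upper tail, evaluated through the series expansion of the regularised lower incomplete gamma function. No continuity correction is applied, and the counts used below are made up for illustration:

```python
import math

def chi2_sf(x, dof, max_terms=10_000):
    """Upper-tail probability of a chi-square(dof) variable at x.

    Uses the series for the regularised lower incomplete gamma P(a, z),
    with a = dof / 2 and z = x / 2, and returns 1 - P.
    """
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = total = 1.0 / a
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi_square_independence(table):
    """Pearson chi-square test of independence: (statistic, dof, p-value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, chi2_sf(stat, dof)

# Hypothetical 2 x 2 table: rows = GoodShot yes/no, columns = a binary feature.
stat, dof, p_value = chi_square_independence([[10, 20], [20, 10]])
```

Here the statistic is 20/3 with 1 degree of freedom. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to tables with one degree of freedom by default, so its p-value for a 2 x 2 table will differ from this uncorrected version.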

In order to illustrate the effect of significant variables on the probability of performing a good shot, a logistic regression model was estimated. As the number of passes is strongly correlated with the duration of the action (r=0.774), NPassesCat was not included in the model. These results are shown in Table 3.


Table 3: Logistic regression model for GoodShot
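To show how odds ratios like those in Table 3 arise, the sketch below fits a logistic regression by Newton-Raphson using only the standard library, with DurationCat dummy-coded against an assumed reference category of 11 to 15 seconds, and reports exp(β) for each dummy. The category labels and counts are hypothetical, and ZoneCat is left out for brevity. With a single categorical predictor the model is saturated, so each exp(β) equals the odds ratio computed directly from the counts, which makes a convenient check:

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_logit(X, y, iterations=25):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Each row of X starts with 1.0 for the intercept; y holds 0/1 outcomes.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k                       # X'(y - p)
        hess = [[0.0] * k for _ in range(k)]   # X'WX with W = p(1 - p)
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

def dummy_row(duration_cat):
    """Intercept plus two dummies; '11-15s' is the assumed reference category."""
    return [1.0, float(duration_cat == "<11s"), float(duration_cat == ">15s")]

# Hypothetical toy counts (category, good shots, actions): NOT the paper's data.
toy = [("<11s", 30, 100), ("11-15s", 20, 100), (">15s", 40, 100)]
X, y = [], []
for cat, good, total in toy:
    X += [dummy_row(cat)] * total
    y += [1.0] * good + [0.0] * (total - good)

beta = fit_logit(X, y)
odds_ratios = {"<11s vs 11-15s": math.exp(beta[1]), ">15s vs 11-15s": math.exp(beta[2])}
```

A second categorical predictor such as ZoneCat would enter as further dummy columns in the same way; a statistics package would also report standard errors and p-values, which this sketch omits.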

Given the substantial heterogeneity in the data, a Nagelkerke pseudo R-squared of 0.21 can be considered acceptable (Hu *et al*., 2006).
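Nagelkerke's pseudo R-squared rescales the Cox-Snell R-squared by its maximum attainable value, so that it can reach 1. A minimal sketch computing it from the log-likelihoods of the intercept-only model and of the fitted model; the numbers in the example are hypothetical:

```python
import math

def bernoulli_log_likelihood(y, p):
    """Log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi else math.log(1.0 - pi) for yi, pi in zip(y, p))

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox-Snell R^2 = 1 - (L0/L1)^(2/n), divided by its maximum 1 - L0^(2/n)."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell

# Hypothetical log-likelihoods for a sample of n = 100 actions.
r2 = nagelkerke_r2(ll_null=-60.0, ll_model=-50.0, n=100)
```

The null log-likelihood can be obtained with `bernoulli_log_likelihood(y, [mean_y] * n)`, where `mean_y` is the observed share of good shots.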

The results show that both DurationCat and ZoneCat significantly affect the probability of performing a good shot. In particular, the odds that an action lasting less than 11 seconds concludes with a good shot are 1.423 times those of an action lasting from 11 to 15 seconds, and the odds for "long" actions (those lasting more than 15 seconds) are 2.1 times those of "intermediate" ones. The reason may be that short actions often result from an extremely fast conclusion: for example, if the centre forward draws an exclusion and finds himself unmarked in front of the goal for a few seconds, a rapid pass to him provides a great opportunity to score. Longer actions, on the other hand, allow the attacking team to "upset" the defence, thus creating good scoring opportunities.
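Because exp(β) in a logistic model multiplies the odds rather than the probability, the implied change in probability depends on the baseline. A small sketch applying the odds ratio of 1.423 to a few baseline probabilities of a good shot; the baselines are hypothetical, since the probability for intermediate actions is not reported here:

```python
def apply_odds_ratio(p_ref, odds_ratio):
    """Probability implied by multiplying the odds of p_ref by odds_ratio."""
    odds = odds_ratio * p_ref / (1.0 - p_ref)
    return odds / (1.0 + odds)

for p_ref in (0.2, 0.4, 0.6):  # hypothetical baseline GoodShot probabilities
    p_short = apply_odds_ratio(p_ref, 1.423)
    print(f"baseline {p_ref:.2f} -> {p_short:.3f} (probability ratio {p_short / p_ref:.2f})")
```

For an odds ratio above 1, the probability ratio always stays below the odds ratio, approaching it only as the baseline probability tends to zero.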

As far as ZoneCat is concerned, the play actions most likely to generate a good attempt at scoring are those concluded from the "external" zone. Attempts from the "lateral" zones are more difficult because the shooting angle is narrower and the defence covers the goal area more effectively. Shots from the posts are also difficult because it is hard to get a good ball into that area: the defence covers it densely and easily neutralises the play action.

Despite its significant association with good shots in the chi-square test, the LongPasses variable was not significant in the model, even though long passes are thought to help upset the defence.

# **Conclusions**

This paper concentrates on extra-man play actions, which are believed to have a strong bearing on the outcome of a water polo match. The forty-eight matches of the 2020 men's European water polo championships in Budapest were considered.

In the 979 actions analysed, a significant association was observed between the occurrence of good shots and both the duration of man-up actions and the zone from which the shots originated. Man-up actions that last less than 11 seconds and "long" actions are more likely to produce good shots than intermediate ones. In addition, shots from the external zone are more likely to be good shots than shots from the posts or from the lateral zones.

Several characteristics were recorded for each man-up action, but few of them seemed to influence its outcome. This may be because the outcome of a play action is not only linked to the execution of a strategy, but is also influenced by factors that cannot all be measured. The opponents' performance naturally affects the game: faced with a solid defence, the effectiveness of the attack is likely to suffer. Psychological conditions can also have a positive or negative effect. The coach's role is to motivate the team and bring out the best in the players, particularly when the opponent is stronger or the stakes are high.

It is not surprising that clear-cut indications for a numerically superior attack were not found: the result of a match is not only a question of technique. The beauty of team sports also lies in the unexpected result.

# **References**

FINA (2020). *FINA Water Polo rules*. https://www.fina.org/water-polo


### Francesca Giambona, Adham Khalawi, Lucia Buzzigoli, Laura Grassini and Cristina Martelli **Big data analysis and labour market: an analysis of Italian online job vacancies data**


Dipartimento di Statistica, Informatica, Applicazioni, University of Florence, Florence, Italy

# 1. Introduction

Economists and social scientists are increasingly making use of web data to address socio-economic issues and to integrate existing sources of information. The data produced by online platforms and websites can provide a wealth of useful, multidimensional information with a variety of potential applications in socio-economic analysis. In this respect, with the growth and spread of the internet, many aspects of job search have been transformed by the availability of online tools for job searching, candidate searching and job matching.

In European countries, there is growing interest in designing and implementing evidence-based decision-making tools to analyse Internet labour market data. The analysis of online labour market data could provide useful information, as big data, jointly with official statistics, could help answer the question "How to tackle the mismatch between jobs and skills?".

In this regard, the skills gap, how to measure it and how to bridge it through education and continuous training have been tackled using big data collection, as in the Cedefop (European Centre for the Development of Vocational Training) initiative (Cedefop, 2018).

This contribution focuses on the issues arising from the use (and the usefulness) of online job vacancies (OJVs) for analysing the most recent Italian data. Data for the years 2019 and 2020 are analysed to evaluate whether the skills required in occupations changed after the onset of the COVID-19 pandemic. We use the index proposed by Deming and Noray (2020), which measures the change in the skills required for each occupation between 2019 and 2020. Furthermore, some regional information is provided, given the particular importance of territory in the Italian labour market.

# 2. Online job vacancies and data

For some years now, OJVs have received increasing attention as an important source of real-time information on the labour market: thanks to increasingly efficient big data analysis and text mining techniques, an enormous amount of information can be quickly collected and processed to monitor changes in job demand.

These data provide a detailed and timely description of jobs: the set of skills and the level of education and experience required by companies, the geographic location of the job, the type of contract, the economic sector of the company, etc.

In this sense, even if they cannot be used directly as a support tool for employment policies, they can be considered part of the modern view of Labour Market Information Systems (LMIS, see ETF, 2019), together with more traditional sources, such as statistical and administrative data. Moreover, OJVs also represent an important link between the labour market and the education system, because they provide updated information on the skills required by the market, an essential input to configure effective training offers (OECD, 2020). On the other hand, this type of data also has evident limitations and drawbacks, mainly related to representativeness and, in general, to quality issues (Cedefop, 2019).

105 Francesca Giambona, University of Florence, Italy, francesca.giambona@unifi.it, 0000-0002-1760-2062 Adham Kahlawi, University of Florence, Italy, adham.kahlawi@unifi.it, 0000-0003-4040-5590 Lucia Buzzigoli, University of Florence, Italy, lucia.buzzigoli@unifi.it, 0000-0003-3297-1023 Laura Grassini, University of Florence, Italy, laura.grassini@unifi.it, 0000-0003-4678-6507 Cristina Martelli, University of Florence, Italy, cristina.martelli@unifi.it

Francesca Giambona, Adham Khalawi, Lucia Buzzigoli, Laura Grassini, Cristina Martelli, *Big data analysis and labour market: an analysis of Italian online job vacancies data*, pp. 117-120, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.22, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Our study is based on OJV data produced for Italy by Burning Glass Technologies<sup>1</sup> (BGT), a company that collects millions of online job postings by scanning thousands of Internet sources (dedicated portals and company websites) daily.

The procedure for creating the database is elaborate and complex (ETF, 2019). The data are collected from different sources with various methods (API, scraping, crawling), depending on the characteristics of each web portal, and are pre-processed to eliminate noise, outliers and duplicate entries. Text classification algorithms are then applied to code the content of the ads using categories based on reference taxonomies: in short, the variables are standardised according to the official classifications used in the various countries. These data have received increasing attention and have been analysed in numerous research works; recently, Cammeraat and Squicciarini (2021) analysed BGT data from a statistical point of view to assess their representativeness. Our data refer to the OJVs posted on 239 online job portals in Italy between January 2019 and December 2020. The total number of ads is 1,741,621 in 2019 and 1,748,431 in 2020. The data contain about 70 variables, most of them coded according to official classifications (shown in brackets below): opening and closing date of publication, identification and description of the occupation and related skills (ESCO classification), geographic location of the job (LAU and NUTS), economic sector of the company (NACE) and educational level (ISCED).

For the purposes of this contribution, we use the BGT data to explore whether skill requirements changed between 2019 and 2020, by occupation and by region.

# 3. Methods

Skill change between 2019 and 2020 is measured with the index proposed by Deming and Noray (2020).

For each year, the BGT data record all the skills required by each job vacancy (JobAds) and each occupation. For a single occupation o, the index is:

$$SCI_{o} = \sum_{s=1}^{S} \left| \left( \frac{\#\mathrm{JobAds}_{os}}{\#\mathrm{JobAds}_{o}} \right)_{2020} - \left( \frac{\#\mathrm{JobAds}_{os}}{\#\mathrm{JobAds}_{o}} \right)_{2019} \right|$$

where #JobAds<sub>os</sub> is the number of job ads for occupation o that require skill s, and #JobAds<sub>o</sub> is the total number of job ads for occupation o. This index measures the net skill change in each occupation: the greater the index value, the greater the skill change.

Given the peculiarities of the Italian labour market, it may be useful to compute the index by region instead of occupation, in order to understand whether, and in which regions, the required skills changed most. To this aim, the above equation becomes:

$$SCI_{r} = \sum_{s=1}^{S} \left| \left( \frac{\#\mathrm{JobAds}_{rs}}{\#\mathrm{JobAds}_{r}} \right)_{2020} - \left( \frac{\#\mathrm{JobAds}_{rs}}{\#\mathrm{JobAds}_{r}} \right)_{2019} \right|$$

where r denotes an Italian region. Finally, crossing occupations and regions gives:

$$SCI_{ro} = \sum_{s=1}^{S} \left| \left( \frac{\#\mathrm{JobAds}_{ros}}{\#\mathrm{JobAds}_{ro}} \right)_{2020} - \left( \frac{\#\mathrm{JobAds}_{ros}}{\#\mathrm{JobAds}_{ro}} \right)_{2019} \right|$$
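The three indices differ only in how job ads are grouped: by occupation o, by region r, or by region-occupation pair ro. A minimal Python sketch of the formulas above, assuming each ad is a dict with hypothetical fields `year` (2019 or 2020), `occupation`, `region` and `skills`; these names are illustrative and not the BGT schema:

```python
from collections import Counter, defaultdict

def skill_shares(ads):
    """Share of ads requiring each skill; each ad is an iterable of skill labels."""
    counts = Counter(skill for ad in ads for skill in set(ad))
    return {skill: count / len(ads) for skill, count in counts.items()}

def skill_change_index(ads_2019, ads_2020):
    """Sum over skills of |share in 2020 - share in 2019|; absent skills count as 0."""
    s19, s20 = skill_shares(ads_2019), skill_shares(ads_2020)
    return sum(abs(s20.get(s, 0.0) - s19.get(s, 0.0)) for s in set(s19) | set(s20))

def sci_by_group(ads, key):
    """SCI for every group defined by key(ad); groups missing a year are skipped."""
    groups = defaultdict(lambda: {2019: [], 2020: []})
    for ad in ads:
        groups[key(ad)][ad["year"]].append(ad["skills"])
    return {g: skill_change_index(years[2019], years[2020])
            for g, years in groups.items() if years[2019] and years[2020]}

# Hypothetical ads; field names are illustrative, not the BGT schema.
ads = [
    {"year": 2019, "occupation": "web developer", "region": "Lazio", "skills": {"MySQL", "teamwork"}},
    {"year": 2019, "occupation": "web developer", "region": "Lazio", "skills": {"MySQL"}},
    {"year": 2020, "occupation": "web developer", "region": "Lazio", "skills": {"teamwork"}},
    {"year": 2020, "occupation": "web developer", "region": "Lazio", "skills": {"teamwork", "social media"}},
]
sci_o = sci_by_group(ads, key=lambda ad: ad["occupation"])                   # SCI_o
sci_r = sci_by_group(ads, key=lambda ad: ad["region"])                       # SCI_r
sci_ro = sci_by_group(ads, key=lambda ad: (ad["region"], ad["occupation"]))  # SCI_ro
```

In this toy example MySQL drops from a share of 1 to 0, teamwork rises from 0.5 to 1 and social media appears with a share of 0.5, so each index equals 1 + 0.5 + 0.5 = 2. The denominator counts all ads of the group in a given year, matching #JobAds<sub>o</sub>, #JobAds<sub>r</sub> and #JobAds<sub>ro</sub>.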


<sup>1</sup> Source: Burning Glass Technologies. burning-glass.com. 2021.

# 4. Empirical findings

The index SCI<sub>o</sub> is calculated for each occupation available in the BGT data to assess whether changes occurred between 2019 and 2020. The highest values (i.e. the largest skill changes) mainly concern ICT-related occupations, such as: statistical and mathematical technicians and similar; software and application developers and analysts not classified elsewhere; web and multimedia developers; software developers; specialists in databases and computer networks not classified elsewhere; web technicians; and specialists in the design and administration of databases. High values are also found for occupations such as public transport controllers and conductors, pawnbrokers and loan officers, and education specialists not classified elsewhere. Occupations such as geologists and geophysicists have the lowest SCI values.

Overall, some skills required in 2019, such as MySQL or searching for information online, disappear in 2020; conversely, new skills appear in 2020, such as buying raw materials, maintaining relations with suppliers, keeping up to date with social media, interpreting automatic call distribution data and creating animations.

For specific occupations, we find skills that were not required in 2019: for example, Android for the social networking occupation, or selling services for statistical and mathematical technicians and similar. Overall, the skills required in 2020 (compared with the previous year) mainly concern the (advanced) use of computer and statistical tools, the ability to adapt to change and to work in a team, and customer support.

Given the territorial characteristics of the Italian labour market, it is interesting to investigate, using the index SCI<sub>r</sub>, whether the skills required at regional level changed between 2019 and 2020. The results show that the index is highest for Molise, Calabria and Lazio, and lowest for Friuli Venezia Giulia, Marche and Emilia Romagna.

Graph 1: skill change index (SCI) at regional level

Crossing occupations and regions makes it possible to analyse, for the occupations with the highest SCI<sub>o</sub>, in which regions the change was greatest and, therefore, whether there are any notable regional differences. Graph 1 displays the quartiles of SCI<sub>r</sub> for all occupations combined, and the SCI<sub>ro</sub> values of some occupations with larger changes.

In this respect, if we consider, for example, the occupations with the highest values of SCI<sub>ro</sub>, we can observe different patterns across regions. Indeed, the coefficient of variation (CV) of SCI<sub>ro</sub> is high for the occupations with the largest skill changes: for example, CV=0.57 for mathematicians, actuaries and statisticians...
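The coefficient of variation used above is the standard deviation of an occupation's SCI<sub>ro</sub> values across regions divided by their mean. A minimal sketch; the paper does not say whether a population or sample standard deviation was used, so the population version is assumed here, and the regional values are hypothetical:

```python
import statistics

def coefficient_of_variation(values):
    """CV = standard deviation / mean, using the population standard deviation."""
    return statistics.pstdev(values) / statistics.mean(values)

# Hypothetical SCI_ro values of one occupation in three regions (not the paper's data).
sci_ro_by_region = {"Region A": 0.5, "Region B": 1.0, "Region C": 1.5}
cv = coefficient_of_variation(list(sci_ro_by_region.values()))
```

Replacing `statistics.pstdev` with `statistics.stdev` gives the sample-based CV, which is slightly larger for small numbers of regions.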

# 5. Some conclusions

Online job vacancy data offer the chance to improve information on the labour market, thanks to the availability of timely data on business demand and on the skills required for each occupation. In this contribution, using the BGT data for 2019 and 2020, we apply the skill change index proposed by Deming and Noray (2020) to understand whether skill demand changed, for which occupations, and whether there are regional differences within Italy.

The empirical findings suggest that skill requirements changed between 2019 and 2020, especially for some occupations and in some Italian regions. This result indicates that the change in required skills is linked to each occupation (ICT-related occupations are the most dynamic) and to the regional business environment.

When occupations are crossed with regions, skill change appears highly differentiated across regions: for the same occupation, the changes in skill requirements expressed by businesses are not the same everywhere, perhaps reflecting a different local "perception" of the skills required to carry out the same occupation.

# References


### Andrea Marletta<sup>a</sup> <sup>a</sup> Department of Economics, Management and Statistics, University of Milano-Bicocca, Milan, Italy. **Sizing & Allocation in Labour Market: business strategies and multivariate analysis**

# 1. Introduction

In the labour market, sizing and allocation is a widely discussed problem (Mariani, 2002). In this study, the topic is considered from a statistical point of view. Indeed, the decision to increase or decrease the number of employees after a change in marketing strategy requires very careful analysis. If, for example, a company decides to launch a new product on the market, it may need to recruit new resources. The proposed statistical approach aims to give some indications of how many resources are needed (Sizing) and where they should be placed (Allocation). This process is based on the features of the existing market and on the territorial geography. Multivariate analysis techniques are presented as exploratory tools. In the application, after qualitative interviews with the company's board, a Principal Component Analysis was used to investigate the business environment. In a second step, different scenarios were proposed to determine the exact number of new resources, using a data hybridisation technique that combines internal and external sources. Finally, the new hires were allocated across the Italian territory by constructing a territorial potential index.

# 2. Methodological tools

This study is the result of a collaboration between the Bicocca Applied Statistics Center (B-ASC) and a private company that requested a new rule, based on a statistical indicator, to reorganise its employees after introducing a new product to the market. The Bicocca Applied Statistics Center (B-ASC) aims to promote the application of statistical methodologies within private companies and public organisations. The Center's main objective is to represent a point of reference for companies wishing to develop a statistical approach to decision-making processes, using quantitative methods and integrated information processing systems.

In particular, this collaboration aims to offer different scenarios for representatives' activity, representing the most appropriate models to satisfy both the company's needs and its competitiveness within the reference market. The term "scenario" is here intended as a possible reallocation of the workforce following a change in the marketing strategy. To reach this purpose, some internal business data will be compared to Open Data, considering the type of subject of interest, the market dynamics and the prescriptive potential.

This project was divided into four phases: first, a qualitative analysis was carried out through semi-structured individual interviews with company managers involved in the strategic and operational management of the markets; second, a structured database was built by collecting and hybridising open data and internal business data; third, a sizing model was developed by synthesising indicators and weightings; finally, after measuring the actual and potential effort in terms of promotional pressure, indices of territorial potential may be applied to define the placement of the new resources in contiguous and/or nested areas.

109 Andrea Marletta, University of Milano-Bicocca, Italy, andrea.marletta@unimib.it, 0000-0002-4050-5316

Andrea Marletta, *Sizing & Allocation in Labour Market: business strategies and multivariate analysis*, pp. 121-124, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.23, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

From a methodological point of view, multivariate analysis techniques have been proposed as possible tools to achieve the company's purposes. Using data from the qualitative analysis based on individual interviews, a Principal Component Analysis was applied to the frequency distribution of the terms in the textual corpus (Jolliffe, 2002). After the construction of the structured dataset, a Principal Component Analysis on some internal and external indicators was used to synthesise the potential of a specific geographic area. Finally, the results of these two approaches are used to propose different solutions for the sizing and allocation of possible new resources in the company.

# 3. Principal results

The qualitative analysis based on individual interviews produced some evidence about the vision of the company's management board. Among the main results, the following considerations of the managers were extracted: "The effort required from the company appears blurred between the global and local vision. It is important to be able to integrate local needs with global strategies. The arrival of new products may be an opportunity for change. In view of a new product launch, the interviewees agree on rethinking the presence on the territory. This may happen in two ways, by acting on the mix of products on offer or on the sales force. A meeting point must be found between the company's revenue and the working efficiency of the employees."

Different scenarios have been proposed as alternatives to the current situation, contemplating the launch of a new product addressing a new target. Some managers underlined the "short blanket dilemma": to add a new product, something else should be removed; otherwise, an investment is necessary. Defining a new structure may help the company to be more efficient and to manage new product launches in the future. Optimal segmentation and targeting are crucial. Some external barriers, e.g. regional restrictions, should be considered.

All the interviews were analysed to extract the key concepts and obtain a multi-perspective vision of the company. First, term frequencies were used to build a dictionary, a standard text mining technique, and the main concepts were extracted from a detailed analysis of the interviews. Term frequency provides a quantitative variable, and a PCA was applied to these data to support the conceptual analysis. The PCA extracted two components explaining 74% of the variance.
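The share of variance explained by the first components can be illustrated with a correlation-matrix PCA. A minimal standard-library sketch, assuming a matrix with one row per interview and one column per dictionary term (the paper states neither this layout nor whether a covariance or correlation matrix was used). Eigenvalues are obtained with the cyclic Jacobi method, and every term is assumed to vary across interviews so that no standard deviation is zero:

```python
import math

def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of a list of equal-length rows."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    centred = [[v - m for v in c] for c, m in zip(cols, means)]
    sds = [math.sqrt(sum(v * v for v in c) / n) for c in centred]
    k = len(cols)
    return [[sum(a * b for a, b in zip(centred[i], centred[j])) / (n * sds[i] * sds[j])
             for j in range(k)] for i in range(k)]

def jacobi_eigenvalues(A, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(A)
    a = [row[:] for row in A]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J: update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A: update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def explained_variance_ratios(rows):
    """Share of total variance carried by each principal component."""
    eigenvalues = jacobi_eigenvalues(correlation_matrix(rows))
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]
```

With a term-frequency matrix `tf` built as described, `sum(explained_variance_ratios(tf)[:2])` corresponds to the kind of figure reported above for the first two components.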

Figure 1: Cartesian plane after PCA qualitative analysis, 2019, Italy

The first component (horizontal axis) represents the continuum between a strategic and a tactical vision. The second component (vertical axis) represents the continuum between an internal and an external vision. With this technique, each term can be associated with one quadrant of the Cartesian plane. Figure 1 shows the Cartesian plane with some key concepts for each quadrant. These concepts result from the set of words analysed; unfortunately, for privacy reasons, it was not possible to give more information about the respondents and the terms used.

After the qualitative analysis, the final dataset of quantitative indicators needed for the sizing and allocation model was built by combining internal and external sources. The internal sources are data on product sales in the market and reports on the employees' daily activity. The external sources are the healthcare data warehouse of the National Institute of Statistics (Istat), Health for All (Istat a, 2020), and the demo-istat portal, used to obtain demographic indexes for the Italian territory (Istat b, 2020).

Hybridising these sources produced a structured dataset in which each row represents an Italian geographical area, classified at NUTS 1, NUTS 2 and NUTS 3 levels. The variables used to define the potential were divided into three categories: structural, market and promotional pressure. The first group consists of indicators on total population, female population and female population aged 15-49, birth rate, number of deliveries, number of physicians and total number of beds in specialised wards. The second group includes some KPIs on market sales. Finally, the third group contains performance indexes of the employees.

To assess the capacity of a new team and determine the correct sizing, assumptions were made about the number of working days. The potential portfolio was computed by considering only the total number of physicians in the portfolio and a number of visits per day. The final sizing model was obtained from the potential portfolio, the number of physicians and the working days. The original team consisted of 112 employees. After the launch of a new product on the market, the sizing model proposes a new team of 132 members, a differential of +20 employees<sup>1</sup>.
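The paper does not give the sizing formula, and its figures were rescaled. As a hedged illustration of the general logic only, a common workload-based approach divides the number of visits the portfolio requires by the number of visits one representative can make in the available working days. All parameter names and values below are hypothetical:

```python
import math

def team_size(n_physicians, visits_per_physician, visits_per_day, working_days):
    """Hypothetical workload-based sizing: visits required / visits one rep can make.

    n_physicians          physicians in the potential portfolio
    visits_per_physician  visits planned per physician over the period
    visits_per_day        visits one representative can make per working day
    working_days          working days per representative over the same period
    """
    required_visits = n_physicians * visits_per_physician
    capacity_per_rep = visits_per_day * working_days
    return math.ceil(required_visits / capacity_per_rep)  # round up to whole people

current_team = 4                          # hypothetical
proposed = team_size(1000, 8, 8, 200)     # hypothetical inputs
differential = proposed - current_team
```

Rounding up reflects the fact that part of a person cannot be hired; a real model would also weight physicians by potential rather than counting them equally.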

Once the sizing phase is completed, the allocation phase distributes the new resources across the Italian territory. The first allocation concerns NUTS 1 and NUTS 2 units. An Index of Territorial Potential (ITP) for areas and regions was computed to detect under-estimated territories (Mariani, 2002). The ITP, based on the first principal component of the PCA, explains 79% of the variance.
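The text states that the ITP is built from the first principal component of the structural, market and promotional-pressure indicators. A minimal sketch of that idea, assuming standardised indicators (correlation-matrix PCA) and using power iteration to find the leading component; the orientation rule and the data are illustrative assumptions:

```python
import math

def standardise(rows):
    """Column-wise z-scores (population standard deviation)."""
    n = len(rows)
    stats = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        stats.append((mean, sd))
    return [[(v - m) / sd for v, (m, sd) in zip(row, stats)] for row in rows]

def territorial_potential(rows, iterations=500):
    """First-principal-component scores and the share of variance they explain.

    rows: one list of indicators per territory. Uses the correlation matrix
    and power iteration for the leading eigenvector.
    """
    z = standardise(rows)
    n, k = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(k)] for i in range(k)]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(k) for j in range(k))
    if sum(v) < 0:  # the sign of an eigenvector is arbitrary: orient it positively
        v = [-x for x in v]
    scores = [sum(zi * vi for zi, vi in zip(row, v)) for row in z]
    return scores, eigenvalue / k  # the trace of a k x k correlation matrix is k

# Hypothetical indicators for three territories (e.g. population, sales, visits).
scores, share = territorial_potential([[1, 10, 5], [2, 20, 7], [3, 30, 6]])
```

Territories can then be ranked by their scores; the second returned value is the analogue of the 79% of variance reported for the ITP.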

Table 1 shows, for each Italian NUTS 1 area, the allocation obtained through the ITP. The area with the largest increase is South & Islands, with a differential of 12 units. Similar results are available at NUTS 2 level.


Table 1: Allocation of new hirings in the company for NUTS 1, 2019, Italy

Table 2 displays the ITP for the Italian regions. Lombardy has the highest ITP, which means that, within the North-West area, a possible new hire could be assigned to this region.

<sup>1</sup>In order to respect a non-disclosure agreement between the B-ASC and the interested company, all quantitative results have been blinded and re-scaled.


Table 2: ITP for NUTS 2 regions, 2019, Italy

Similar considerations could be made for NUTS 3 areas.

# 4. Conclusions

In this work, the problem of sizing and allocation has been considered at business level. Starting from a real case, an exploratory approach based on multivariate techniques has been proposed to determine the number of new hires. The approach consists of a multi-step procedure. First, a preliminary analysis was based on qualitative interviews with the company's top managers. These interviews fed a text mining analysis, in which a dictionary based on term frequencies was built and analysed with a PCA. This qualitative analysis made it possible to visualise the possible strategies and the different visions proposed by the managers.

Starting from this qualitative analysis, a dataset was built by hybridising business and external sources to run the sizing and allocation model. Each variable refers to an Italian geographical area, and a similar analysis was performed at NUTS 1, 2 and 3 levels. The sizing step considered the initial number of employees, the working days, the potential portfolio and the number of stakeholders involved. The allocation step was achieved through a PCA based on selected KPIs on structural, market and promotional pressure, which led to an Index of Territorial Potential (ITP). At NUTS 1 level, the South & Islands area was identified as under-estimated, so most new hires could be assigned to this area. At NUTS 2 level, Lombardy is the region with the highest ITP.

In conclusion, this approach could be considered a valid alternative for solving the problem of sizing and allocating new resources in the labour market when a company decides to launch a new product.

# References

Istat a. (2020). Health for All. https://www.istat.it/it/archivio/14562

Istat b. (2020). Geo-demo. http://demo.istat.it/

Jolliffe, I. T. (2002). *Principal Component Analysis* (2nd ed.). Springer Series in Statistics. Springer, New York.

Mariani, P. (2002). *La statistica in azienda, contesti ed applicazioni*. Franco Angeli, Milano.

### F.D. d'Ovidio<sup>a</sup>, A.M. D'Uggento<sup>a</sup>, R. Mancarella<sup>b</sup>, E. Toma<sup>a</sup> <sup>a</sup> Department of Economics and Finance, University of Bari *Aldo Moro*, Bari, Italy. <sup>b</sup> ARTI - Apulia Region Agency for Technology and Innovation, Bari, Italy. **Post-stratification as a tool for enhancing the predictive power of classification methods**



# **1. Introduction**

As is well known, any decision-making model involving classification algorithms often faces the problem of predictive or diagnostic power (sensitivity or specificity), which tends to decrease rapidly as the asymmetry of the target variable increases (Sonquist et al., 1973; Fielding, 1977). For example, segmentation analyses with categorical target variables generally provide very little improvement in purity (or none at all) if the least represented category accounts for less than one-fourth of the cases of the most represented category. The same problem occurs with other, theoretically more exhaustive, techniques such as artificial neural networks. In fact, the optimal situation for any classification analysis is maximum uncertainty, namely an equal distribution of the target variable.
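That a balanced target is the maximum-uncertainty case can be checked with two impurity measures commonly used in segmentation trees, Shannon entropy and Gini impurity; both reach their maximum at a share of 0.5. A small sketch:

```python
import math

def binary_entropy(p):
    """Shannon entropy in bits of a binary target with share p in one class."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def gini_impurity(p):
    """Gini impurity of a binary target with share p in one class."""
    return 2 * p * (1 - p)

for share in (0.5, 0.25, 0.123):
    print(f"p = {share:.3f}: entropy {binary_entropy(share):.3f} bits, "
          f"Gini {gini_impurity(share):.3f}")
```

For a 12.3% share, entropy falls to about 0.54 bits, leaving a splitting algorithm much less impurity to reduce than with a balanced target.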

Certainly, some classification techniques are more robust, such as those based on a logit transformation of the target variable (Fabbris & Martini, 2002), which is less sensitive to the shape of the distribution. However, even this technique is affected by the asymmetry of the target variable's distribution, as will be shown below.

Indeed, starting from the results of a direct survey in which the (binary) target variable was highly asymmetric (12.3% versus 87.7%), the first analysis performed here shows that even logit models with highly significant parameter estimates can have an insufficient fit and a predictive power so low that they are useless in decision-making processes.

To address this prediction problem, we tested a post-stratification technique, originally developed for classification problems, that builds a training sample artificially made symmetric with respect to the distribution of the target variable.

In this way, goodness of fit and predictive ability increased substantially both for the symmetrised sample and, more importantly, for the original sample, whose probabilities of success are computed from the parameters estimated on the symmetrised sample.

# **2. The case study**

A sample of participants in a national survey on dietary habits was studied from December 2020 to the end of May 2021 (continuing similar surveys carried out since 2018), selecting only those who had properly completed the questionnaire, i.e., 2,562 people residing or domiciled in Italy. One of the research topics was the tendency of Italians to eat away from home, i.e., in restaurants or pizzerias, in view of the restrictions imposed by the COVID-19 pandemic.

The target variable was derived from a question about how often respondents ate out, distinguishing between sporadic customers (who did so, at most, occasionally) and those for whom eating at a restaurant was a usual habit. The percentage of the latter, which had never been very high in previous years, dropped sharply to zero during the pandemic period, but because the survey also investigated the respondents' eating habits retrospectively, the result was not quite so poor. However, considering that the pandemic had already affected the social habits of Italians before 2021, the response variable shows that more than 87.7% of respondents (2,248 people) fall into the "non-customers" group and less than 12.3% (314 people) fall into the "restaurant lovers" group.

Francesco D. d'Ovidio, University of Bari Aldo Moro, Italy, francescodomenico.dovidio@uniba.it, 0000-0003-1641-039X Angela Maria D'Uggento, University of Bari Aldo Moro, Italy, angelamaria.duggento@uniba.it, 0000-0001-9768-651X Rossana Mancarella, ARTI, Agency for Technology and Innovation of Apulia, Italy, r.mancarella@arti.puglia.it, 0000-0001-8179-4970 Ernesto Toma, University of Bari Aldo Moro, Italy, ernesto.toma@uniba.it, 0000-0002-4817-7169

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

F.D. d'Ovidio, A.M. D'Uggento, R. Mancarella, E. Toma, *Post-stratification as a tool for enhancing the predictive power of classification methods*, pp. 125-130, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.24, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Despite this marked asymmetry, we investigated the individual characteristics that were found to be related in some way to the target variable<sup>1</sup> and that may explain the motivations for eating away from home.

To this end, a standard logistic regression model was first developed for exploratory purposes<sup>2</sup>. This model is not reported here because it includes variables with insufficient or no significance, although some variables are highly significant: gender (p < 0.002), work position (p < 0.001) and food-delivery frequency (p < 0.001). However, the model fits the data poorly (Cox-Snell R<sup>2</sup> = 0.094; Nagelkerke R<sup>2</sup> = 0.179) and has minimal predictive power for the category of interest: only 28 cases were correctly identified as "habitual customers" (8.9% of actual cases).

In contrast, the *correct identification* of "non-customers" (2,224 cases, that is, 98.9% of the subgroup) is very high, but this result is trivial.

The overall percentage of correctly identified cases is 87.9%, but note that simply assigning the sample mode, "non-habitual customer", to all cases would have classified all non-customers correctly and, of course, none of the habitual customers. In short, more than 87.7% of cases could be correctly classified simply by assigning the modal value, without any complex statistical processing.
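The figures above can be reproduced from the reported counts. The plain-Python sketch below computes the sensitivity, specificity and overall accuracy of the exploratory model and compares them with the majority-class baseline; balanced accuracy (the mean of sensitivity and specificity) is added only as an illustrative summary and is not used in the original analysis.

```python
def classification_rates(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and balanced accuracy of a 2x2 table."""
    sensitivity = tp / (tp + fn)              # habitual customers correctly identified
    specificity = tn / (tn + fp)              # non-customers correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced = (sensitivity + specificity) / 2
    return sensitivity, specificity, accuracy, balanced


# Counts reported for the exploratory logit model
n_customers, n_non_customers = 314, 2248
tp, tn = 28, 2224
fn, fp = n_customers - tp, n_non_customers - tn

sens, spec, acc, bal = classification_rates(tp, fn, tn, fp)
baseline = n_non_customers / (n_customers + n_non_customers)  # always predict the mode

print(f"sensitivity {sens:.1%}, specificity {spec:.1%}, accuracy {acc:.1%}")
# sensitivity 8.9%, specificity 98.9%, accuracy 87.9%
print(f"majority-class baseline {baseline:.1%}, balanced accuracy {bal:.1%}")
# majority-class baseline 87.7%, balanced accuracy 53.9%
```

The overall accuracy exceeds the trivial baseline by only 0.2 percentage points, while the balanced accuracy is close to the 50% expected from an uninformative classifier.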

Estimating a more articulated logistic model, one that also included the most important interactions among the explanatory variables (but kept parsimonious through a forward stepwise criterion, i.e., gradually entering the most related variables and removing those that became non-significant), did not improve the result.
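The stepwise procedure, like the tests listed in footnote 2, compares a statistic with a chi-square distribution: the likelihood ratio statistic G = 2(LL<sub>full</sub> - LL<sub>restricted</sub>) when variables are added or removed, and the Wald statistic W = (b/SE)<sup>2</sup> for a single parameter. The sketch below is plain Python; to avoid external dependencies, the chi-square tail probability uses the closed forms available for integer degrees of freedom. The log-likelihoods and estimates in the usage lines are hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1 (closed forms)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) e^(-x/2) * sum_{i=1}^{(df-1)/2} x^(i-1) / (1*3*...*(2i-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def lr_test(loglik_restricted, loglik_full, df):
    """Likelihood ratio test: G = 2 (LL_full - LL_restricted) ~ chi2(df) under H0."""
    g = 2.0 * (loglik_full - loglik_restricted)
    return g, chi2_sf(g, df)


def wald_test(estimate, std_error):
    """Wald test for one coefficient: W = (b / se)^2 ~ chi2(1) under H0."""
    w = (estimate / std_error) ** 2
    return w, chi2_sf(w, 1)


# Hypothetical values: adding a 4-category variable (3 dummies) to a logit model
g, p_lr = lr_test(loglik_restricted=-850.0, loglik_full=-842.5, df=3)
w, p_w = wald_test(estimate=0.62, std_error=0.20)
print(f"LR: G = {g:.2f}, p = {p_lr:.4f}; Wald: W = {w:.2f}, p = {p_w:.4f}")
```

In a forward stepwise run, the candidate with the smallest likelihood-ratio p-value would be entered at each step, and entered variables whose Wald or likelihood-ratio p-value rises above the removal threshold would be dropped.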

Nevertheless, the final model, shown in Table 1, is interesting in its own right. The reference categories of the explanatory variables (the *baselines*, shown in brackets after the variable names in the table) are generally the first category, except for employment position.

Although better than the previous one, at least in terms of potential generalisation given the statistical significance of many variables and categories, this model also overestimates "non-habitual customers", as shown in Table 2. Compared with the almost perfect classification of these (99.1%), only 27 regular restaurant customers are correctly classified. Therefore, the overall rate of correct classification is almost entirely due to the predominance of "non-habitual customers" in the sample, so the model is of very limited, or practically no, use for predictive and decision-making purposes.

<sup>1</sup> The following individual characteristics were considered: *Gender* (F, M), *Age group* (18 to 80 years), *Highest level of education attained* (from primary or lower secondary school to PhD, also including higher non-university studies), *Employment position* (entrepreneur, full-time employee, part-time employee, self-employed, unemployed, student, retired, other position), *Marital status* ("Married or Cohabiting" to "Single/never married", but also "I prefer not to say"), *Dietary habits* (omnivore, omnivore with reduced meat in diet, vegetarian, vegan), *Average time spent preparing meals at home* ("No time, do not cook at home", and ranging from "Less than 30 minutes" to "4 hours or more"), *Frequency of using food delivery*, *Frequency of buying sustainable food*, *Frequency of buying fresh food*, *Frequency of buying local food*, *Frequency of buying organic food*, *Frequency of buying food "Made in Italy"* (all frequency questions ranging from "Never" to "Always"), *Willingness to pay an extra fee for "sustainable" food* (scale from "definitely not" to "definitely yes"), *Willingness to pay an extra fee for "Made in Italy" food* (same scale as previous question), *Annual income class* ("Not specified", and ranging from "Less than 4,500€" to "Over 130,000€").

<sup>2</sup> The statistical tests used in the analysis are 1) the likelihood ratio test, to assess the improvement in model fit when adding or removing variables, and 2) the Wald test, to assess the statistical significance of individual parameters.


**Table 1.** *Estimation of logit model's parameters related to respondents' propensity to consume meals at a restaurant or pizzeria.*

*Significance of parameters: (°) 10%; (\*) 5%; (\*\*) 1%; (\*\*\*) 1‰*

**Table 2.** *Matrix of correct classification of the model.*


# **3. Post-stratification for symmetrisation of the target variable**

The main reason for the poor performance described above is undoubtedly the extreme asymmetry of the alternatives investigated. If about 90% of the observations share one of the two categories, any analysis aimed at estimating the probability of the complementary category can, in practice, use only a minimal fraction of the necessary information. This phenomenon, which is almost fatal for other statistical techniques based on the search for the best predictability (for example, segmentation analysis; Fabbris, 1997; Fabbris & Martini, 2002), is less relevant in logit analysis, especially with large samples. However, it persists and sometimes makes any decision rule impossible or very difficult to derive.

Therefore, it seemed appropriate here to experiment with a "Deep Learning" technique that had previously given excellent results in overcoming the severe penalties caused by asymmetry in segmentation analysis and, later, in artificial neural networks built on dichotomous response variables (d'Ovidio, Mancarella & Toma, 2016): the construction of a *symmetric learning sample*, obtained by randomly extracting, from the group of statistical units with the majority response, a subgroup of the same size as the group with the minority response. Combining the two subgroups yields a (post-stratified) sample that is almost symmetric with respect to the target variable, although undoubtedly smaller. Through this procedure, in addition to the 314 surveyed customers of restaurants and pizzerias, 320 people who ate out only occasionally or never were randomly selected<sup>3</sup>. The corresponding percentages are 49.5% and 50.5%, and this almost perfect symmetry of the response distribution should improve the predictive power of the model.
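The construction of the symmetric learning sample can be sketched in plain Python as random undersampling of the majority group. The function and field names are hypothetical, and unlike the procedure described in footnote 3, which ended up with 320 non-customers, this sketch draws exactly as many majority units as there are minority units.

```python
import random


def symmetric_learning_sample(units, target, seed=None):
    """Keep every minority unit and add an equally sized simple random sample
    (without replacement) of majority units; return the shuffled union."""
    rng = random.Random(seed)
    ones = [u for u in units if target(u) == 1]
    zeros = [u for u in units if target(u) == 0]
    minority, majority = (ones, zeros) if len(ones) <= len(zeros) else (zeros, ones)
    sample = minority + rng.sample(majority, len(minority))
    rng.shuffle(sample)
    return sample


# Toy data with the survey margins: 314 habitual customers, 2,248 non-customers
data = [{"id": i, "customer": int(i < 314)} for i in range(2562)]
sym = symmetric_learning_sample(data, lambda u: u["customer"], seed=2021)
print(len(sym), sum(u["customer"] for u in sym))   # 628 314
```

Fixing the seed makes the draw reproducible, which is useful when the coefficients estimated on the symmetrised sample are later applied to the whole sample.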

Table 3, obtained with the same criteria as Table 1, highlights some important differences. First, there are no significant interactions, so *Gender*, whose effect was previously diluted in the interactions, acquires considerable and significant importance in its own right; both *Average time devoted to cooking at home* (but not its specific categories) and *Willingness to pay an extra fee for food "Made in Italy"* become statistically relevant; in contrast, *Marital status* and *Frequency of buying "sustainable" food* lose all relevance and do not appear in the model, while *Use of food delivery services* (which indeed replaced restaurants and pizzerias in the habits of many Italians during the pandemic) remains statistically significant and retains much of its relevance. The model fits better, even though the sample is smaller: Cox-Snell R<sup>2</sup> = 0.157; Nagelkerke R<sup>2</sup> = 0.210.
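The two pseudo-R<sup>2</sup> measures are linked: Nagelkerke's R<sup>2</sup> divides Cox-Snell's R<sup>2</sup> by its maximum attainable value, 1 - exp(2 LL<sub>0</sub>/n), where LL<sub>0</sub> is the log-likelihood of the intercept-only model and depends only on the distribution of the target variable. The sketch below computes this ceiling from the reported group sizes: about 0.525 for the original sample (314 vs 2,248) and about 0.75 for the symmetrised one (314 vs 320), so Cox-Snell values from the two samples are not directly comparable. It also recovers the reported Nagelkerke value of 0.179 and gives 0.209 for the symmetrised sample, against the reported 0.210, a gap plausibly due to rounding of the Cox-Snell value.

```python
import math


def null_loglik(n_pos, n_neg):
    """Log-likelihood of the intercept-only logit model for a binary target."""
    n = n_pos + n_neg
    return n_pos * math.log(n_pos / n) + n_neg * math.log(n_neg / n)


def cox_snell_max(n_pos, n_neg):
    """Upper bound of Cox-Snell R2: 1 - exp(2 * LL0 / n)."""
    return 1.0 - math.exp(2.0 * null_loglik(n_pos, n_neg) / (n_pos + n_neg))


def nagelkerke(r2_cox_snell, n_pos, n_neg):
    """Nagelkerke R2 = Cox-Snell R2 divided by its maximum attainable value."""
    return r2_cox_snell / cox_snell_max(n_pos, n_neg)


print(round(cox_snell_max(314, 2248), 3), round(nagelkerke(0.094, 314, 2248), 3))  # 0.525 0.179
print(round(cox_snell_max(314, 320), 3), round(nagelkerke(0.157, 314, 320), 3))    # 0.75 0.209
```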

Finally, the predictive power of the model reaches acceptable values (Table 4): almost two-thirds of the predictions for the target category are correct, with 63.4% of regular customers of restaurants and pizzerias correctly classified, and the share is even higher for respondents who do not tend to have lunch or dinner outside the home.


**Table 3.** *Estimation of logit model's parameters related to respondents' propensity to consume meals at a restaurant or pizzeria, symmetrised sample.*

*Significance of parameters: (°) 10%; (\*) 5%; (\*\*) 1%; (\*\*\*) 1‰*

<sup>3</sup> The numbers in the two groups do not match exactly because of an unavoidable approximation in the computerised procedure used to randomly extract the sample of non-customer respondents.

**Table 4.** *Matrix of correct classification of the model, symmetrised sample.*


The striking difference in structure between the model shown in Table 3 and the previous model is obviously due to the different hierarchy of objectives. The model shown in Table 1, while it aimed to identify the characteristics of individuals who tend to eat outside home, necessarily identified, instead, only the variables that characterise individuals who are not accustomed to eating in restaurants. The present model, on the other hand, correctly identified the primary required characteristics, certainly not optimally but well enough for the purposes of the study.

To investigate the reproducibility of the results, the probability of success *p* of each unit of the total sample can be calculated using the logit coefficients estimated on the symmetrised sample (setting, of course, the coefficient of each *baseline category* to zero):

$$\text{logit}(p) = b\_0 + b\_1 x\_1 + b\_2 x\_2 + \dots + b\_m x\_m$$

For each subject, the following is then calculated:

$$p = \frac{\exp[\text{logit}(p)]}{1 + \exp[\text{logit}(p)]},$$

and the result is rounded to the value that identifies the target characteristic "habitual customer" if this probability is close to 1, or to the value of the reference characteristic "non-customer" if it is close to zero. In practice, the cut-off is set at 0.5, the default threshold in statistical software.

Thus, once the "expected condition" has been identified (and assigned to a specific record) for each unit of the joint sample, the collected and "expected" data can be easily compared in a contingency table that plays the role of the correct classification matrix.
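The scoring and cross-tabulation steps can be sketched in plain Python. The coefficients below are hypothetical placeholders, not the estimates of Table 3; each unit is described by its categories, and categories missing from the coefficient dictionary are treated as baselines with coefficient zero. Units with a probability above 0.5 are classified as habitual customers (assigning a probability of exactly 0.5 to the reference category is an assumption).

```python
import math
from collections import Counter


def predicted_probability(intercept, coefs, unit):
    """exp(eta) / (1 + exp(eta)) with eta = b0 + sum of the unit's category coefficients.

    coefs maps (variable, category) -> b; baseline categories are absent (b = 0).
    """
    eta = intercept + sum(coefs.get((var, cat), 0.0) for var, cat in unit.items())
    return math.exp(eta) / (1.0 + math.exp(eta))


def classification_matrix(units, observed, intercept, coefs, cutoff=0.5):
    """Counts keyed by (observed, predicted), both coded 1 = habitual customer."""
    table = Counter()
    for unit, y in zip(units, observed):
        y_hat = 1 if predicted_probability(intercept, coefs, unit) > cutoff else 0
        table[(y, y_hat)] += 1
    return table


# Hypothetical coefficients standing in for those estimated on the symmetrised sample
intercept = -1.0
coefs = {("gender", "M"): 0.8, ("delivery", "often"): 1.2}
units = [
    {"gender": "M", "delivery": "often"},   # eta = 1.0
    {"gender": "F", "delivery": "never"},   # eta = -1.0 (both baselines)
    {"gender": "M", "delivery": "never"},   # eta = -0.2
    {"gender": "F", "delivery": "often"},   # eta = 0.2
]
observed = [1, 0, 1, 0]
table = classification_matrix(units, observed, intercept, coefs)
print(dict(table))
```

Applied to all 2,562 records, the same two functions would yield the counts of a correct classification matrix such as Table 5.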

This transfer of the results obtained on the symmetrised sample to the whole dataset, shown in Table 5, provides (as in previous experiments) results that are fully comparable to those obtained so far, that is, quite adequate but not optimal. Presumably, starting from a larger sample and randomly selecting units so that the categories of the target variable are symmetric would yield a post-stratified sample large enough to guarantee the power and representativeness of the procedure.


**Table 5.** *Matrix of correct classification of the model, applied to the whole sample.*

# **4. Final remarks**

The deep learning post-stratification method was first shown to be useful for symmetrising a categorical response in classification techniques such as segmentation analysis and artificial neural networks (d'Ovidio, Mancarella & Toma, 2016). In that study, which involved no inference, the method provided optimal and robust results. In the first analysis, using the CRT technique, 84% of the minority responses were correctly classified, compared with 79% of the alternative, whereas the analysis of the original sample correctly classified only 50% of the responses of interest and 99% of the alternative. The same results were obtained when the classification rules were applied to the entire dataset (well over one million cases).

In that research, artificial neural networks, of course, provided better results in the learning and testing samples and were more stable when applied to the whole population (84% to 88% correct classifications).

The application shown here thus demonstrates that post-stratification into symmetric groups provides an effective solution to the problem of correctly representing relationships in more complex analyses, such as logistic regression. Further applications (including multinomial response variables) could provide a better understanding of the advantages and limitations of this technique.

# **Information and Acknowledgements.**

This paper is the result of joint research, conducted in compliance with statistical ethics, but F.D. d'Ovidio handled the final editing of Section 3, A.M. D'Uggento edited Section 2, R. Mancarella handled Section 4 and E. Toma edited Section 1.

The authors thank Prof. M. G. Onorati and the University of Pollenzo (Bra) for their kind permission to use the survey results described in Sections 2–3.

# **References**


### A statistical information system in support of job policies orientation

Adham Kahlawi<sup>a</sup>, Francesca Giambona<sup>a</sup>, Lucia Buzzigoli<sup>a</sup>, Laura Grassini<sup>a</sup>, Cristina Martelli<sup>a</sup>

<sup>a</sup> Department of Statistics, Computer Science, Applications, University of Florence, Florence, Italy

# **1. Introduction**

One of the main issues in modern labour market governance is connecting people with jobs (Martin, 2015; Varanasi, 2021); bridging the skills gap is on the agenda of many governments and institutions (Mohla, 2020; Ras et al., 2017), and many efforts have been made, also at the European level, to align vocational training with the real needs of the different economic sectors. Under the European Blueprint Initiative, for instance, stakeholders work together in sector-specific partnerships, called alliances for sectoral cooperation for skills, which develop and implement strategies to address skills gaps in different sectors<sup>1</sup>.

From this perspective, the availability of suitable data on the skills needed in the labour market is strategic for directing vocational training investment and lifelong learning policies. However, data sources are still mainly organised around the concept of occupation, which is too broad to guide policies and investments. In this work, we use the recommendation systems approach to describe the skills most requested within the different occupations, in order to improve the granularity of the description of the labour market.

Born in the era of big data, recommendation systems are a family of information filtering procedures that help users make choices in an extremely rich and variable information context (for a brief, recent review, see Jariha and Jain, 2018). They can also be interpreted as methods for predicting whether a particular user will like a particular item based on the user's preferences and the item's characteristics. These methods are widely used in various fields: to suggest products to customers in e-commerce, to recommend news articles or blog content to online readers, to recommend movies or music to users of streaming services, etc. The two classic entities in recommendation systems are users (those who choose) and items (what is chosen): the user-item matrix, also called the preference or utility matrix, has users in rows and items in columns, and each cell contains a number representing the importance of that item for that user. This number can simply be 0/1 (the user has/hasn't chosen the item) or the rating given by the user to the item. The matrix is typically sparse because many, sometimes most, of the entries are empty: recommender systems fill in the empty cells with what similar users would choose. Additional information about users or items can be added to obtain better results. This article aims to use recommendation systems to predict which skills a person should acquire to reach a particular profession or to develop professionally and improve their chances of getting a job.
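A user-item matrix of the kind just described can be built from raw (user, item) observations with a few lines of Python and NumPy. In this paper users are occupations and items are skills; the occupation and skill labels below are hypothetical, and each observation adds one to its cell, so the matrix holds counts, from which the 0/1 version follows directly.

```python
import numpy as np


def user_item_matrix(pairs):
    """Count matrix with users in rows and items in columns, built from (user, item) pairs."""
    users = sorted({u for u, _ in pairs})
    items = sorted({i for _, i in pairs})
    row = {u: k for k, u in enumerate(users)}
    col = {i: k for k, i in enumerate(items)}
    R = np.zeros((len(users), len(items)))
    for u, i in pairs:
        R[row[u], col[i]] += 1          # each observation adds one to its cell
    return R, users, items


# Hypothetical (occupation, skill) observations
pairs = [
    ("data analyst", "SQL"), ("data analyst", "Python"), ("data analyst", "SQL"),
    ("web developer", "JavaScript"), ("web developer", "SQL"),
    ("accountant", "bookkeeping"), ("accountant", "spreadsheets"),
]
R, users, items = user_item_matrix(pairs)
binary = (R > 0).astype(int)             # 0/1 version: chosen / not chosen
sparsity = 1 - np.count_nonzero(R) / R.size
print(f"{sparsity:.0%} of cells are empty")   # 60% of cells are empty
```

Even in this tiny example most cells are empty; with thousands of occupations and skills, a sparse matrix format would be the natural storage choice.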

The data source is usually huge, and the system must produce timely responses while continuously updating the information set fed by users' feedback. The problem is therefore to combine traditional statistical methods for the analysis of professional skills with data mining and machine learning techniques that can handle the computational complexity of the system and optimise its performance. Many different approaches can be used to solve this problem (Leskovec et al., 2019). We refer to model-based collaborative filtering methods (Chen et al., 2018), which have been very successful in many fields of application. In this case, no information on users or items is required, and the user-item matrix is factorised by means of latent factor models to reduce its dimensionality. Different algorithms can be used to map each user and each item to their corresponding factor vectors (Koren et al., 2009).

<sup>1</sup> https://ec.europa.eu/social/main.jsp?langId=en&catId=89&furtherNews=yes&newsId=10035

Adham Kahlawi, University of Florence, Italy, adham.kahlawi@unifi.it, 0000-0003-4040-5590 Francesca Giambona, University of Florence, Italy, francesca.giambona@unifi.it, 0000-0002-1760-2062 Lucia Buzzigoli, University of Florence, Italy, lucia.buzzigoli@unifi.it, 0000-0003-3297-1023 Laura Grassini, University of Florence, Italy, laura.grassini@unifi.it, 0000-0003-4678-6507 Cristina Martelli, University of Florence, Italy, cristina.martelli@unifi.it

Adham Kahlawi, Francesca Giambona, Lucia Buzzigoli, Laura Grassini, Cristina Martelli, *A statistical information system in support of job policies orientation*, pp. 131-135, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.25, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

In our case, users and items are, respectively, occupations and skills; the user-item matrix is built from a database produced by Burning Glass Technologies, which collects online job vacancy ads scanned from Italian online portals and company websites in 2019 and 2020. Cell (i, j) of the matrix contains the number of ads that require skill j for occupation i. The use of recommender systems in this field of analysis is not new (Al-Otaibi and Ykhlef, 2012; Giabelli et al., 2021; Tavakoli et al., 2020; Valverde-Rebaza et al., 2018), but in this case the recommendation system is based on a dataset for Italy, in which occupations and skills follow the ESCO classification (European Skills, Competences, Qualifications and Occupations) (Kahlawi, 2020). In particular, the objective of the analysis is to help vocational training systems and institutions answer the question asked by everyone looking for a new job or professional opportunities: which skills should I acquire to enhance my professional profile? Finally, the matrix factorisation is performed with the Alternating Least Squares (ALS) method, described in the next section.
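The factorisation step can be illustrated with a dense NumPy sketch of ALS. Footnote 2 points to the documentation of the `implicit` Python library, which suggests the authors relied on it; that library's API is not reproduced here. Instead, the sketch assumes the confidence-weighted formulation for implicit feedback, in which a count r<sub>ij</sub> becomes a binary preference (r<sub>ij</sub> > 0) with confidence 1 + α r<sub>ij</sub>. The values of α, the regularisation and the number of factors, as well as the toy occupation-by-skill matrix, are illustrative assumptions. Each half-step fixes one set of factors and solves a small ridge regression per row.

```python
import numpy as np


def implicit_als(R, factors=2, alpha=10.0, reg=0.1, iterations=15, seed=0):
    """Confidence-weighted ALS for implicit feedback (dense sketch).

    R[i, j] = number of ads requiring skill j for occupation i.
    Preference p_ij = 1 if R[i, j] > 0 else 0; confidence c_ij = 1 + alpha * R[i, j].
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)
    C = 1.0 + alpha * R
    X = rng.normal(scale=0.1, size=(n_users, factors))   # occupation factors
    Y = rng.normal(scale=0.1, size=(n_items, factors))   # skill factors
    ridge = reg * np.eye(factors)
    for _ in range(iterations):
        for u in range(n_users):                 # fix Y, update each occupation
            YtC = Y.T * C[u]                     # Y^T diag(c_u)
            X[u] = np.linalg.solve(YtC @ Y + ridge, YtC @ P[u])
        for i in range(n_items):                 # fix X, update each skill
            XtC = X.T * C[:, i]                  # X^T diag(c_i)
            Y[i] = np.linalg.solve(XtC @ X + ridge, XtC @ P[:, i])
    return X, Y


def recommend(X, Y, R, user, k=3):
    """Top-k skills not yet associated with the occupation, by predicted preference."""
    scores = X[user] @ Y.T
    scores[R[user] > 0] = -np.inf                # exclude skills already required
    return [int(j) for j in np.argsort(-scores)[:k]]


# Hypothetical 4 occupations x 6 skills count matrix
R = np.array([
    [5, 3, 0, 0, 1, 0],
    [4, 0, 2, 0, 0, 0],
    [0, 0, 3, 4, 0, 2],
    [0, 1, 0, 5, 0, 3],
], dtype=float)
X, Y = implicit_als(R)
print(recommend(X, Y, R, user=1, k=2))
```

Dense loops are only practical for small matrices; with the sparse matrices produced by real job ad data, an implementation that exploits sparsity would be the natural choice.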

The results of the proposed methodology show which skills are most requested within a specific occupation. Workers, job seekers, vocational training institutions and recruitment companies may take advantage of these results in different ways: starting from the skills a worker already has, one can identify the closest occupation matching those skills in the matrix, compare the worker's current profile with the profile most requested by the labour market, and suggest new skills. Alternatively, one can start from the concept of occupation to design policies for updating skills.

# **2. Methodology**

The methodology in this article is based on six basic actions, as shown in Figure 1.


<sup>2</sup> https://implicit.readthedocs.io/en/latest/als.html


*Figure 1 The methodology*

# **3. Results and discussion**

The effectiveness indicator of the recommendation system refers to the model's ability to […] recommended with an accuracy of up to 95.9 percent, as shown in Table 1, which reports the model metadata.

*Table 1 Model metadata*


Table 2 shows the profiles of two job seekers and the top three recommendations of our model.

*Table 2 Professional profile and the recommended skills for two hypothetical job seekers*


Table 3 shows the extent to which the users' profiles would improve after acquiring these three skills, by listing the top four job ads they could then apply for. The results show how the recommendation system helps users improve their chances of getting a job directly and in line with market demand.


*Table 3 Personal profile improvement of two hypothetical job seekers*
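The matching step behind Table 3 is not described in detail. One plausible reading, sketched here as an assumption rather than the authors' procedure, is to rank job ads by the share of their required skills covered by a profile, before and after adding the recommended skills; the ads, skills and coverage score below are hypothetical.

```python
def rank_ads(profile, ads, top=4):
    """Rank ads by the share of their required skills covered by the profile (ties by ad id)."""
    scored = [(len(profile & required) / len(required), ad_id)
              for ad_id, required in ads.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[:top]


# Hypothetical ads (required skills) and job-seeker profile
ads = {
    "A1": {"SQL", "Python"},
    "A2": {"SQL", "Python", "Tableau"},
    "A3": {"Java"},
    "A4": {"Python", "statistics"},
    "A5": {"Excel"},
}
profile = {"SQL", "Excel"}
recommended = {"Python", "statistics", "Tableau"}   # e.g. the top three recommendations

before = rank_ads(profile, ads)
after = rank_ads(profile | recommended, ads)
print(before)
print(after)
```

With the added skills, four ads become fully covered instead of one, which is the kind of profile improvement Table 3 is meant to convey.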

# **4. Conclusion**

Choosing the skills a person has to learn to obtain a job opportunity or improve their job position is an ongoing problem, because the labour market, and with it the skills required to do a job, is constantly changing. Solutions must therefore be continuously updated as the market changes. This article proposed a recommendation system based on a database collected from the labour market and capable of updating itself with new labour market data that may become available in the future. Furthermore, the proposed recommendation system improved people's chances of getting new jobs through the skills it recommended to them. Finally, this work faced a set of limitations, the most important of which was the size of the matrix built from the initial data, which is why the same data were used for training and testing the model; for this reason, the proposed recommendation system is not considered a complete system and cannot find solutions for all users. In future work, we will therefore strive to develop this system so that it suits the largest possible segment of users.

# **References**


### Linear regression pathmox segmentation tree: the case of visitors' satisfaction to attend a Spanish football match at the stadium

Cristina Davino<sup>a</sup>, Giuseppe Lamberti<sup>b</sup>

<sup>a</sup> Department of Economics and Statistics, University of Naples Federico II, Italy; <sup>b</sup> Department of Business, Universitat Autonoma de Barcelona, Barcelona, Spain

# 1. Introduction

Segmentation trees have been attracting a great deal of attention as model comparison tools, mainly because they allow the identification of partitions of data characterised by different dependency structures. Apart from the MOdel-based recursive partitioning (MOB) procedure proposed by Zeileis *et al.* (2008), few algorithms that combine model estimation and segmentation trees have been proposed by the statistical community. In this work, we propose a new approach that generalizes the pathmox algorithm developed by Lamberti *et al.* (2016) to linear regression models, using a model comparison test to identify the most significant partitions (i.e., sub-groups) in the data. Further developments of the proposed approach will involve extensions to other contexts such as quantile regression.

# 2. State-of-the-art

The analysis of a dependency model can be furthered by assessing whether the model and/or the impact of the regressors on the dependent variable differ when heterogeneity is observed. In other words, it may be interesting to assess the differences between a global model estimated on the whole set of observations and models estimated on sub-groups identified by known categorical variables external to the model. These variables may identify partitions characterised by heterogeneous dependency structures. The most popular approaches to comparing regression models rely on comparative statistical testing or on recursive methods. The comparison approach compares the coefficients of a model common to all the data (i.e., a restricted model representing a homogeneous situation) with those of a model that reflects the interactions between categorical and predictor variables (i.e., an unrestricted model corresponding to a heterogeneous situation). This approach, which allows one categorical variable to be analysed at a time, is reflected in the F-tests developed by Chow (1960) and Lebart *et al.* (1979), which assume normality of the residuals of the two models. The comparison is based on the restricted deviance (SSR<sub>0</sub>) and the unrestricted deviance (SSR<sub>1</sub>). The latter will be substantially lower if the interaction between the categorical and predictor variables is significant. Under the null hypothesis that the two deviances are equal, the categorical variable produces no differences in the model coefficients. This null hypothesis is tested by computing an F-statistic:

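$$F = \frac{(SSR\_0 - SSR\_1)/p}{SSR\_1/(n - 2p)} \tag{1}$$

where *n* is the total number of observations and *p* is the number of coefficients (including the intercept) of each model; under the null hypothesis, *F* follows an F distribution with *p* and *n* - 2*p* degrees of freedom.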
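A minimal NumPy sketch of this test follows. It fits the restricted model on all observations and the unrestricted model as two separate regressions (equivalent to a full set of group-by-predictor interactions), and returns the F statistic in the standard Chow form, with p and n - 2p degrees of freedom, p being the number of coefficients including the intercept. The simulated data are purely illustrative.

```python
import numpy as np


def chow_f_test(X, y, in_group):
    """Chow-type F statistic: one pooled linear model vs separate fits in two groups.

    X: (n, k) regressors without intercept; y: (n,) response;
    in_group: boolean (n,) mask derived from a categorical variable external to the model.
    """
    A = np.column_stack([np.ones(len(y)), X])   # add the intercept column
    n, p = A.shape                               # p coefficients in each model

    def ssr(A_sub, y_sub):
        beta, *_ = np.linalg.lstsq(A_sub, y_sub, rcond=None)
        resid = y_sub - A_sub @ beta
        return float(resid @ resid)

    ssr0 = ssr(A, y)                                                        # restricted
    ssr1 = ssr(A[in_group], y[in_group]) + ssr(A[~in_group], y[~in_group])  # unrestricted
    df1, df2 = p, n - 2 * p
    F = ((ssr0 - ssr1) / df1) / (ssr1 / df2)
    return F, df1, df2


# Simulated example: the slope differs between the two groups
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
group = np.arange(100) < 50
y = 1.0 + np.where(group, 0.5, 2.0) * x[:, 0] + rng.normal(0, 1.0, size=100)
F, df1, df2 = chow_f_test(x, y, group)
print(f"F = {F:.1f} with ({df1}, {df2}) degrees of freedom")
```

If SciPy is available, the corresponding p-value is `scipy.stats.f.sf(F, df1, df2)`.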

The recursive approach, based on multiple model comparisons, ranks the variables that produce differences in the model coefficients. The outcome is a tree in which each node represents a model. Partitions are obtained by comparing the effect of each categorical variable on the model coefficients and choosing the partitions that produce the biggest differences. This approach requires a criterion to quantify differences in the model coefficients. In the MOB procedure, this criterion is based on a fluctuation test that measures the coefficient instability (Zeileis and Hornik, 2007) caused by a categorical variable. High instability points to a significant effect of the variable. Tree partitions are defined according to the variables that produce the highest instability.

Cristina Davino, University of Naples Federico II, Italy, cristina.davino@unina.it, 0000-0003-1154-4209

Giuseppe Lamberti, Autonomous University of Barcelona, Spain, giuseppe.lamberti@uab.cat, 0000-0002-8666-796X

Cristina Davino, Giuseppe Lamberti, *Linear regression pathmox segmentation tree: the case of visitors' satisfaction to attend a Spanish football match at the stadium*, pp. 137-140, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.26, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

# 3. Pathmox in a nutshell

Pathmox (Lamberti *et al.*, 2016), developed to detect heterogeneity in models, is a recursive algorithm based on segmentation trees. Although pathmox was introduced in the context of partial least squares structural equation modelling, it can be generalized to other contexts whenever a suitable test for comparing models is available. The algorithm applies binary segmentation principles to produce a tree with a different model in each node. It starts by fitting a global model to all the data (i.e., the tree root) and then looks for the child nodes whose models differ most significantly. The most different models are identified by minimizing the sum of squared residuals of the models estimated in each child node. The data are recursively partitioned according to the categorical variables, not included in the model, that yield the most significant differences between child nodes. Partitions are identified using a test that determines the degree of difference between the two compared sub-models. Finally, pathmox avoids overfitting through stopping rules based on maximum depth, minimum node size and non-significance of the partitioning criterion. As the partitioning criterion, we propose the F-test of Chow (1960) and Lebart *et al.* (1979) for comparing two linear regression models.
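The recursive procedure can be sketched as follows, using the Chow F-test as the partitioning criterion and the three stopping rules listed above. This is a simplified illustration, not the authors' pathmox implementation: it assumes NumPy and SciPy, treats every segmentation variable as nominal (all binary groupings of its categories are tried, whereas ordinal variables such as age might reasonably be restricted to contiguous groupings), and selects the split with the smallest p-value, which within a node is also the split with the smallest pooled residual sum of squares of the child models. The simulated data and variable names are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.stats import f as f_dist


def ssr(X, y):
    """Residual sum of squares of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def split_pvalue(X, y, mask):
    """Chow/Lebart F-test: pooled model vs separate models on mask and ~mask."""
    n, p = len(y), X.shape[1] + 1
    ssr0 = ssr(X, y)
    ssr1 = ssr(X[mask], y[mask]) + ssr(X[~mask], y[~mask])
    F = ((ssr0 - ssr1) / p) / (ssr1 / (n - 2 * p))
    return f_dist.sf(F, p, n - 2 * p)


def binary_splits(levels):
    """Every split of the categories into two non-empty groups (nominal treatment)."""
    levels = sorted(levels)
    first, rest = levels[0], levels[1:]
    for k in range(len(rest)):
        for combo in combinations(rest, k):
            yield {first, *combo}


def pathmox(X, y, segvars, depth=0, max_depth=2, min_share=0.10, alpha=0.05, n_root=None):
    """Recursive binary segmentation returning the tree as nested dicts."""
    n_root = n_root or len(y)
    node = {"n": len(y)}
    if depth >= max_depth:                      # stopping rule: maximum depth
        return node
    best = None
    for name, values in segvars.items():
        for left in binary_splits(set(values)):
            mask = np.isin(values, list(left))
            if min(mask.sum(), (~mask).sum()) < min_share * n_root:
                continue                        # stopping rule: minimum node size
            pval = split_pvalue(X, y, mask)
            if best is None or pval < best[0]:
                best = (pval, name, left, mask)
    if best is None or best[0] >= alpha:        # stopping rule: no significant split
        return node
    pval, name, left, mask = best

    def child(m):
        sub = {k: v[m] for k, v in segvars.items()}
        return pathmox(X[m], y[m], sub, depth + 1, max_depth, min_share, alpha, n_root)

    node.update(split=(name, sorted(left)), p_value=pval, left=child(mask), right=child(~mask))
    return node


# Simulated data: the slope depends on tourist status, not on gender
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(0, 10, size=(n, 1))
tourist = rng.choice(["yes", "no"], size=n)
gender = rng.choice(["F", "M"], size=n)
y = 1.0 + np.where(tourist == "yes", 3.0, 0.5) * x[:, 0] + rng.normal(0, 1.0, size=n)
tree = pathmox(x, y, {"tourist": tourist, "gender": gender}, max_depth=1)
print(tree["split"], tree["left"]["n"], tree["right"]["n"])
```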

# 4. Visitors' satisfaction to attend a Spanish football match: a pathmox application

We applied the pathmox approach in an empirical analysis measuring visitors' satisfaction with attending a Spanish football match at the stadium. The sample consisted of visitors aged 18 years and older who attended Barcelona Football Club home matches during the 2017/2018 season. Visitors were selected through non-random convenience sampling. Three hours before matches started, randomly selected visitors were approached by seven researchers, who had previously reviewed the questionnaire and resolved any doubts about it. The visitors were told about the purpose of the research and asked to collaborate. Those who agreed were asked to supply an email address to receive an online version of the questionnaire, to be completed after the match. The questionnaire was available in Catalan, Spanish, English and French to avoid bias due to tourists misunderstanding the questions. Those who did not want to give an email address could access the questionnaire through a QR code. Finally, to encourage participation, respondents were entered in a lottery to win an authentic Barça football shirt. A total of 944 visitors were invited to take part; the response rate of 38.45% meant that 362 usable questionnaires were collected. Men represented almost three-quarters (71.27%) of the respondents (women 28.72%), and nearly half (48.34%) were aged ≤30 years (34.52% were aged 31-45 years and 16.71% ≥46 years). Involvement was strong, moderate and slight in 32.50%, 48.76% and 18.45% of the respondents, respectively. Tourists accounted for 40.88% of respondents (non-tourists 59.12%), and 69.06% indicated that it was not their first visit to the Camp Nou stadium.

The questionnaire was designed with closed questions answered on a 5-point Likert scale and aimed to measure *visitors' satisfaction*, in terms of the perceived benefits of attending a Barcelona Football Club match (adapted from Ahrholdt et al., 2017; Oliver, 2010); the *image of the football team*, measured as visitors' perception of the attributes, players, management and condition of the club (adapted from Beccarini and Ferrand, 2006); and *stadium service quality*, measured as visitors' perception of service performance based on evaluations of several service dimensions such as ticket prices, accessibility and stadium facilities (adapted from Ahrholdt et al., 2017). The following categorical variables, reflecting specific visitor characteristics, were considered as potential sources of heterogeneity: *gender*, *age* (≤30, 31-45, ≥46 years), *involvement* (strong, moderate, slight), *tourist* (yes or no) and *first time at the stadium* (yes or no).

Pathmox analysis results are reported in Figure 1 and Table 1. We set the maximum depth to two levels, bounded the final number of segments to a maximum of four and set the minimum admissible node size to 10% of the total sample. The significance threshold for the partitioning algorithm was p = 0.05. The pathmox algorithm identified *involvement* as the most discriminating variable, distinguishing between not involved visitors (slight involvement, Node 2) and involved visitors (strong and moderate involvement, Node 3). Not involved visitors were further differentiated according to the variable *tourist*, defining two terminal nodes: Node 4 (non-tourists) and Node 5 (tourists). Involved visitors were further differentiated according to *age*: visitors aged ≤30 years form one group (Node 6), while visitors aged >30 years form another (Node 7). On the basis of involvement combined with age and tourist status, we could characterise the partitions and assign labels to the sub-groups: Node 4 can be defined as not involved-local visitors, Node 5 as not involved-tourist visitors, Node 6 as younger-involved visitors, and Node 7 as older-involved visitors. Finally, the global model coefficients were compared with the coefficients of the four models estimated for the sub-samples identified by the terminal nodes (Table 1). In terms of satisfaction, not involved-local visitors primarily valued the image of the football team, not involved-tourist visitors valued the quality of the stadium more, younger-involved visitors valued image and quality to a similar extent, and older-involved visitors again primarily valued the image of the football team.

Figure 1: Pathmox tree


<sup>∗</sup> indicates significance according to the t-test (p-value <0.05)

Table 1: Coefficient comparison for global and terminal nodes.

# 5. Discussion and conclusion

Our results suggest that pathmox can be used to compare regression models, opening up a future line of research in other contexts such as quantile regression. From a decision-making perspective, the paper provides evidence of how an apparently representative global model can in fact mask different relationships between variables due to heterogeneous data, underlining the importance of accounting for heterogeneity when defining new policies. While the algorithm identifies the partitions where the differences between model coefficients are greatest, it has the limitation that no overall significance criterion is considered once each partition is identified. This important aspect needs to be addressed in a future version of the algorithm. Note that pathmox aims to identify the most significantly different sub-groups, unlike a classic decision tree, whose objective is to obtain the best prediction by splitting observations into sub-groups. In this respect, the only similar method is the model-based recursive partitioning (MOB) approach proposed by Zeileis *et al.* (2008), which, however, uses a different criterion to identify the best partitions. A comparison of both approaches will be a natural next step in our research.

# References


### **Exploring competitiveness and wellbeing in Italy by spatial principal component analysis**

Carlo Cusatelli<sup>a</sup>, Massimiliano Giacalone<sup>b</sup>, Eugenia Nissi<sup>c</sup>

<sup>a</sup> Ionian Department, University Aldo Moro, Bari, Italy. <sup>b</sup> Department of Economics and Statistics, University Federico II, Naples, Italy. <sup>c</sup> Department of Economics, University G. d'Annunzio, Chieti-Pescara, Italy.

# **1. Introduction**

The statistical observation of a complex social phenomenon such as *well-being* involves a series of logical operations that lead to empirical indicators suitable for its study. This so-called statistical operationalization includes, in this case, the following basic steps:

- definition of the complex concept of well-being, which cannot be measured directly;



It is not easy to define the concept of well-being: it concerns a social phenomenon with very different semantic and interpretative orientations, rooted in various disciplines such as Economics, Sociology, Psychology and Urban Planning. A concept that embraces all aspects of living can therefore only be part of a theoretical model built ad hoc. Before discussing the problems that led to the construction of the model used in this work, we briefly recall how the different formulations of an evolving concept, *quality of life*, have developed over time.

Some authors<sup>1</sup> trace the origin of this problem to the period of the industrial revolution. At that time, the profound changes in the living conditions of the European population led some researchers<sup>2</sup> to study, together with the income earned by the new types of workers, the price of the basket of goods they consumed and the composition of their expenses. They identified significant differences between the budget structure of the less well-off classes (in which almost all income was spent on food) and that of the wealthier classes (in which the relative share of food expenditure decreased). Such research, purely descriptive in nature, spread during the nineteenth century to various European countries<sup>3</sup> and, being essentially linked to the study of the subsistence budgets of working families, gave studies on the survival level of these families a purely economic significance.

Subsequently, as the economic conditions of working families evolved, the concept under examination took on a different meaning, becoming increasingly identified with the satisfaction of needs that were no longer exclusively nutritional. One of the terms relating to this evolving phenomenon was *standard of living*, by which some scholars<sup>4</sup> defined the set of goods and services used by families whose type of life (*mode of life*) was determined by different parameters and social characteristics. Following a different approach, other authors<sup>5</sup> turned to measuring the individual psychic satisfaction that income could provide through the consumption of goods and services, formulating the utilitarian concept of *niveau de confort*. Still others<sup>6</sup>, considering a broader meaning of well-being, arrived at the concept of a *level of living* which, by loosening the centrality of economic needs, also extended into the social areas of *welfare*.

<sup>1</sup> See, e.g., W.S. and E.S. Woytinsky, *World population and production. Trends and outlook,* The Twentieth Century Fund, New York, 1953; J. Fourastié, *Machinisme et bien-être, niveau de vie et genre de vie en France de 1700 à nos jours,* Les éditions de minuit, Paris, 1962.

<sup>2</sup> See, e.g., F. Le Play, *Les ouvriers européens; études sur les travaux, la vie domestique et la condition morale des populations ouvrières de l'Europe, précédées d'un exposé de la méthode d'observation*, Paris, 1855.

<sup>3</sup> See, e.g., E. Engel, *La consommation comme mesure du bien-être des individus, des familles, des nations*, in Bulletin de l'Institut International de Statistique, Tome II, Roma, 1887.

<sup>4</sup> See, e.g., A. Bowley, *The nature and the purposes of the measurement of social phenomena*, P.S. King & Son Ltd., London, 1923.

Carlo Cusatelli, University of Bari Aldo Moro, Italy, carlo.cusatelli@uniba.it, 0000-0003-3770-3060 Massimiliano Giacalone, University of Naples Federico II, Italy, maxgiacit@yahoo.it, 0000-0002-4284-520X Eugenia Nissi, Gabriele d'Annunzio University, Italy, nissi@unich.it, 0000-0003-3440-601X

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Carlo Cusatelli, Massimiliano Giacalone, Eugenia Nissi, *Exploring competitiveness and wellbeing in Italy by spatial principal component analysis*, pp. 141-146, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.27, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Totally emancipated from economically based definitions, which are not suited to an approach inspired by purely social components, the concept of *quality of life* dates back to the early Seventies, a period in which a particular and active political-social climate was developing in opposition to the prevailing economistic conception of progress: in Western societies characterized by high rates of economic growth (the *most*, the quantity), people were beginning to doubt that such growth would always be equivalent to social progress (the *best*, the quality). Roads choked by traffic, air pollution, poor accessibility to services theoretically aimed at all citizens, the spread of new forms of poverty and difficulties in interpersonal relationships are just some of the phenomena easily found in the most industrialized and, particularly, urbanized contexts, where wealth and population are concentrated, but so are inequality and social hardship.

Another fundamental guideline of the debate on the quality of life was the requirement that the concept also contemplate the subjective aspects of human existence: the term *quality* in fact implies a personal judgment that generally cannot be measured except through subjective indicators. These make it possible to grasp how individuals internalize social problems (attitudes, judgments, perceptions, concerns, etc.). Furthermore, subjective indicators make it possible to complete and refine the information collected by means of objective indicators on those aspects (material and otherwise) of the quality of life with which individuals may be satisfied to different degrees.

Given the interdisciplinary nature of the phenomenon, and lacking a unanimous definition of the concept of well-being, the theoretical model formulated in this work stems from the consideration that the healthy evolution of a territory must correspond to a trend towards economic growth that also brings with it another type of prosperity: efficiency and effectiveness of services, interpersonal relationships, culture, good housing conditions, and so on. But if on the one hand urbanization offers undoubted localization advantages, on the other it produces growing disadvantages; therefore, individuals who expect high living standards from it may instead find themselves paying the price of the malaise produced by degradation: atmospheric pollution, crime, etc.

Assuming that urban space is still where these expectations can most easily be realized, the question of its livability becomes unavoidable when planning and governing the modernization processes of today's society. Therefore, in order to study the phenomenon in relation to a more circumscribed collective than the entire population of a nation (to which most existing studies on the subject refer, for reasons of international comparability), it is advisable to examine a territorial level of great socio-demographic importance: the province. The statistical units of this research are therefore the 107 Italian provinces, observed through the objective indicators described later.

The insights expressed so far highlight the elusiveness of the concept of well-being, which must therefore be pinned down in its empirically measurable dimensions. Well-being is an important element because it discriminates between human settlements, enriching the image of a territory, strengthening its attractiveness and highlighting, in short, its state of health, that is, its ability to fulfil its different roles: it is a place of private residence, a social place where meetings and interactions are easier, and a public place where services of collective demand can be used to the extent that agglomeration externalities exist. These are precisely the guidelines on which the model adopted here to evaluate the quality of life is based: an integral concept of human development capable of grasping other aspirations in addition to economic ones.

<sup>5</sup> See, e.g., Bureau International du Travail (BIT), *Les méthodes d'enquête sur les budgets familiaux*, Etudes et Documents, Série N, n.9, Genève, 1926.

<sup>6</sup> See, e.g., D.E. Christian, *International social indicators: the OECD experience*, Social Indicators Research, 1974, n °1, p. 169-186.

Nowadays a significant amount of data and information is available on topics of local interest which, especially at an objective level, can lead to a fairly complete examination of well-being according to the directions just outlined. Social concerns often examined in order to identify and compare levels of well-being between nations constitute a useful reference in the selection of thematic areas and indicators that can be adopted for this study. The topics that most interest the sphere of well-being can be summarized in the following areas of investigation: environment, housing conditions, roads, work, public services, crime, cultural level.

Within the aforementioned areas, suitable indicators must therefore be sought. Their identification is a very important phase of this study: after defining and specifying the object of the research and identifying the statistical units of reference (the Italian provinces), it is necessary to select the indicators best able to measure the various aspects of well-being.

Once the indicators had been chosen on this basis, data collection often provided only the raw material for the final product, namely the instruments for measuring well-being. The next step therefore concerned the construction of the indicators: this phase consists of all the operations in which some initial data are weighted or variously combined with each other in order to make them statistically comparable and theoretically representative of the phenomenon to be studied.

The simplest family of indicators is that of the so-called primary measures: they record the amount of a given characteristic possessed by each statistical unit. At a higher level of complexity are the simple (or elementary) weighted indicators, constructed by dividing the primary measure by a reference variable (often another primary measure) called the basic measure; by removing the source of variation due to the basic measure, this operation makes the data for the different statistical units comparable.
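A simple weighted indicator is just a ratio, which a few lines of Python can illustrate. The function name and the two-province figures below are hypothetical and are not data from this study; they only show how dividing by a basic measure (here, resident population) removes the effect of unit size.

```python
def simple_indicator(primary, basic, scale=1.0):
    """Simple (elementary) indicator: primary measure divided by a basic measure."""
    if len(primary) != len(basic):
        raise ValueError("primary and basic measures must refer to the same units")
    return [scale * p / b for p, b in zip(primary, basic)]


# Hypothetical figures for two provinces (not data from this study):
crimes = [1_200, 300]            # primary measure: crimes reported
residents = [400_000, 50_000]    # basic measure: resident population
print(simple_indicator(crimes, residents, scale=1_000))  # → [3.0, 6.0] per 1,000 residents
```

The raw counts rank the first province higher, while the ratio ranks the second province higher: this is the comparability that the basic measure restores.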

Often, then, the need arises to combine the simple indicators into compound (or synthetic) indicators, especially when the relationship between the phenomenon and an elementary indicator is not simply one-to-one but problematic and complex: the same phenomenon can in fact be measured by several simple indicators, all different and sometimes only partial with respect to the final dimension to be represented. They can be integrated into one or more models that return the level of each constitutive aspect of the phenomenon (partial compound indicators) or its overall level (a global compound indicator). These aggregations also make it easier to visualize conditions, especially when comparing different realities, as this research requires in order to interpret the livability of our country.
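One common way to build a partial compound indicator is to standardize each simple indicator across units, align their polarity, and average them. The sketch below assumes equal weights and z-score standardization; this is an illustrative choice, not the aggregation used in this paper, which relies on (spatial) principal component analysis. The indicator names and values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values, higher_is_better=True):
    """Standardize a simple indicator across units, flipping its sign when lower is better."""
    m, s = mean(values), pstdev(values)
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (v - m) / s for v in values]


def compound_indicator(indicators):
    """Equal-weight mean of standardized simple indicators, one value per unit.

    `indicators` maps a (hypothetical) indicator name to (values per unit, higher_is_better).
    """
    columns = [z_scores(values, hib) for values, hib in indicators.values()]
    return [mean(unit) for unit in zip(*columns)]


# Hypothetical values for three provinces:
index = compound_indicator({
    "employment_rate": ([60.0, 70.0, 80.0], True),
    "crime_rate": ([6.0, 4.0, 2.0], False),
})
print([round(v, 3) for v in index])
```

Flipping the sign of "lower is better" indicators matters: without it, a high crime rate would raise the composite well-being score.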

The work is organized as follows: Section 2 presents some methodological aspects of principal component analysis for spatial data; Section 3 presents an application to the BES at local level (NUTS 3) data.

# **2. Spatial Principal Component Analysis**

Principal Component Analysis (PCA) is one of the most popular multivariate statistical techniques for reducing data with many dimensions, and well-being indicators are often obtained using PCA, although it is implicitly based on a reflective measurement model that is not suitable for all types of indicators. Mazziotta and Pareto (2013) discuss the use and misuse of PCA for measuring well-being. Classical PCA is not suitable for data collected over a territory because it does not take into account the spatial autocorrelation present in the data.

Spatial PCA techniques specifically designed to handle spatial effects are available. The Geographically Weighted Principal Component (GWPC) method adapts PCA to spatial effects. Given $n$ observations, each observation $\mathbf{x}_i$ depends on its location $i$ in space, with coordinates $(u_i, v_i)$; supposing that $\mathbf{x}_i \sim N(\boldsymbol{\mu}(u_i, v_i), \Sigma(u_i, v_i))$, the geographically weighted principal components are obtained through the decomposition of the geographically weighted variance-covariance matrix (Harris et al., 2011):

$$\Sigma(u,v) = \mathbf{X}^T \mathbf{W}(u,v)\mathbf{X}$$

where $\mathbf{W}(u,v)$ is a diagonal matrix of weights. Different kernel functions can be employed to generate this diagonal matrix of weights. The local principal components at location $(u_i, v_i)$ are then obtained from

$$\Sigma(u_i, v_i) = \mathbf{L}(u_i, v_i)\,\mathbf{V}(u_i, v_i)\,\mathbf{L}(u_i, v_i)^T$$

where $\mathbf{L}(u_i, v_i)$ is the matrix of geographically weighted eigenvectors and $\mathbf{V}(u_i, v_i)$ is the diagonal matrix of the geographically weighted eigenvalues.

Considering a set of *p* variables, GWPC provides *p* components, *p* eigenvalues, *p* sets of component loadings and *p* sets of component scores for each location in the study area.
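The local decomposition can be sketched with NumPy by building, at each location, a kernel-weighted covariance matrix and taking its eigen-decomposition. This is a minimal sketch under stated assumptions: a fixed bi-square kernel, a hand-picked bandwidth (in practice it would be calibrated), and data centred on the local weighted mean; the function names are hypothetical. Dedicated implementations exist, for example the `gwpca` function of the R package GWmodel.

```python
import numpy as np


def bisquare_weights(coords, i, bandwidth):
    """Bi-square kernel weights of every location relative to location i."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.where(d < bandwidth, (1.0 - (d / bandwidth) ** 2) ** 2, 0.0)


def gwpca(X, coords, bandwidth):
    """Local eigenvalues, loadings and scores at every location (fixed bandwidth)."""
    n, p = X.shape
    eigvals, loadings, scores = np.empty((n, p)), np.empty((n, p, p)), np.empty((n, p))
    for i in range(n):
        w = bisquare_weights(coords, i, bandwidth)
        w = w / w.sum()                        # normalized diagonal of W(u_i, v_i)
        Xc = X - w @ X                         # centre on the weighted local mean
        sigma = Xc.T @ (w[:, None] * Xc)       # X^T W(u_i, v_i) X on centred data
        vals, vecs = np.linalg.eigh(sigma)     # eigh returns ascending eigenvalues
        order = np.argsort(vals)[::-1]
        eigvals[i], loadings[i] = vals[order], vecs[:, order]
        scores[i] = Xc[i] @ loadings[i]        # component scores of observation i
    return eigvals, loadings, scores
```

Each location gets *p* eigenvalues, a *p* × *p* loading matrix (columns are components, i.e. the columns of $\mathbf{L}(u_i, v_i)$) and *p* scores, matching the output listed above.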

An alternative way to account for the spatial variability of data in PCA is the Locally Weighted Principal Component (LWPC) approach, applied when the data are not well described by a universal set of principal components (Tipping and Bishop, 1999). This technique uses a moving-window weighting approach in the data space. For each local PCA centred at *x*, neighboring data points are first weighted according to a distance-decay kernel function. Each observation is then multiplied by its respective weight, and a standard PCA algorithm is applied (locally) to the weighted data.
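The moving-window idea can be sketched as follows. The sketch assumes a Gaussian distance-decay kernel and scales each centred observation by the square root of its weight, so that the resulting scatter matrix is weighted by the kernel weights themselves; the function name and bandwidth are hypothetical choices for illustration.

```python
import numpy as np


def local_pca(X, x0, bandwidth, n_components=1):
    """PCA of the data re-weighted around the query point x0 in data space."""
    d = np.linalg.norm(X - x0, axis=1)               # distances in the data space
    w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian distance-decay kernel
    mu = (w @ X) / w.sum()                           # locally weighted mean
    Xw = np.sqrt(w)[:, None] * (X - mu)              # Xw^T Xw = sum_i w_i (x_i - mu)(x_i - mu)^T
    _, s, vt = np.linalg.svd(Xw, full_matrices=False)
    variances = s[:n_components] ** 2 / w.sum()      # local eigenvalues (weighted variances)
    return mu, vt[:n_components], variances
```

Unlike GWPC, the weights here depend on distances between observations in the space of the variables, not on geographical coordinates, so two clusters with different internal structure get different local components.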

Spatial effects can also be taken into account by combining PCA with a measure of spatial autocorrelation. Jombart et al. (2008) introduced a modification of PCA, called sPCA, to investigate the spatial structure of multivariate data, measuring spatial autocorrelation with Moran's I (Moran, 1950). sPCA provides PC scores that summarize both the aspatial variability and the spatial autocorrelation structure in geographical space.
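A compact way to see how sPCA couples variance and autocorrelation is to eigen-decompose $\frac{1}{2n}\mathbf{Z}^T(\mathbf{L}+\mathbf{L}^T)\mathbf{Z}$, where $\mathbf{Z}$ is the column-centred data and $\mathbf{L}$ a row-standardized spatial weights matrix (here $\mathbf{L}$ denotes weights, not eigenvectors). Each eigenvalue then equals the variance of the component scores times their Moran's I, which is why explained variance can be split into a variance part and an autocorrelation part. The sketch below assumes binary contiguity weights and unscaled variables; function names are hypothetical, and an implementation is available as `spca` in the R package adegenet.

```python
import numpy as np


def row_standardize(A):
    """Row-standardized spatial weights L from a binary connection matrix A."""
    A = np.asarray(A, dtype=float)
    return A / A.sum(axis=1, keepdims=True)   # every unit needs at least one neighbour


def morans_i(x, L):
    """Moran's I under row-standardized weights (weights sum to n, so n/S0 = 1)."""
    z = x - x.mean()
    return float(z @ L @ z / (z @ z))


def spca(X, L):
    """Eigen-analysis of (1/2n) Z^T (L + L^T) Z, with Z the column-centred data."""
    n = X.shape[0]
    Z = X - X.mean(axis=0)
    H = Z.T @ (L + L.T) @ Z / (2 * n)
    vals, vecs = np.linalg.eigh(H)
    order = np.argsort(vals)[::-1]   # large positive: global; large negative: local
    return vals[order], vecs[:, order]
```

Components with large positive eigenvalues describe global structures (neighbours alike), while those with large negative eigenvalues describe local structures (neighbours dissimilar).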

# **3. Empirical Results**

The application considers the BES at local level data (NUTS 3), a system of equitable and sustainable well-being indicators for small regions that are consistent with the national Bes measures. To meet the statistical information needs of local communities, Istat designed Bes at local level in cooperation with local authorities, investigating the specific information needs of Italian Municipalities, Provinces and Metropolitan Cities and developing a shared theoretical framework. Bes measures at local level maintain a high level of quality and consistency with the Bes indicator system and constantly follow the evolution of the Bes framework. The two frameworks share a core of common, harmonized indicators. In addition, Bes at local level includes specific well-being indicators on issues related to the responsibilities and functions of local authorities (Istat, 2020).

The set of indicators, illustrating the 12 domains relevant for the measurement of well-being, is updated and illustrated annually in the Bes report. In 2020, the set of indicators was expanded to 152 (from 130 in previous editions), with a deep revision that takes into account the transformations that have characterised Italian society in the last decade, including those linked to the spread of the COVID-19 pandemic.

The first step in spatial analysis is to assess whether a source of spatial correlation is present in the data. Among the possible alternatives, we use Moran's *I* test to assess the presence of spatial autocorrelation. The results of Moran's *I* test reveal a statistically significant positive spatial autocorrelation for each pillar (Tab. 1).
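Moran's *I* and a simple significance check can be sketched in Python. The paper does not state how the test was carried out; the sketch assumes a one-sided permutation test for positive autocorrelation and binary rook-contiguity weights on a regular grid, both illustrative choices with hypothetical function names.

```python
import numpy as np


def moran_test(x, W, n_perm=999, seed=0):
    """Moran's I with a one-sided permutation p-value for positive autocorrelation."""
    x, W = np.asarray(x, dtype=float), np.asarray(W, dtype=float)
    n, s0 = len(x), W.sum()

    def moran_i(v):
        z = v - v.mean()
        return (n / s0) * (z @ W @ z) / (z @ z)

    observed = moran_i(x)
    rng = np.random.default_rng(seed)
    permuted = np.array([moran_i(rng.permutation(x)) for _ in range(n_perm)])
    p_value = (1 + np.sum(permuted >= observed)) / (n_perm + 1)
    return float(observed), float(p_value)


def rook_grid(side):
    """Binary rook-contiguity weights for a side x side grid of areas."""
    W = np.zeros((side * side, side * side))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            if r + 1 < side:
                W[i, i + side] = W[i + side, i] = 1.0
            if c + 1 < side:
                W[i, i + 1] = W[i + 1, i] = 1.0
    return W
```

A smooth gradient across the grid yields a large positive *I* with a small p-value, while a checkerboard pattern yields *I* = -1, the extreme of negative autocorrelation.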



The results show that the spatial dimension is relevant for all sustainable well-being dimensions and, within each dimension, for most indicators. This evidence further supports applying spatial PCA (sPCA) rather than classical PCA to investigate the determinants of provincial well-being and to construct a spatial composite indicator.

For each principal component obtained, there is a strong spatial differentiation between positive spatial correlation, recorded in the North of the country, and negative spatial correlation, recorded in the South.

Owing to its spatial nature, sPCA allows the percentage of total variance explained to be decomposed into pure variability and spatial autocorrelation.

The spatial patterns in the proportion of explained variance vary significantly across the study region, thereby highlighting territorial differences between urban areas. In general, for the majority of urban well-being domains, the highest PTVs (Proportion of the Total Variance) are located in the provincial capitals of southern Italy.
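The PTV mapped at each location is simply the share of the local total variance captured by the leading local components. A minimal sketch, assuming local eigenvalues are already available in descending order per location (for example from a geographically weighted PCA); the function name and the eigenvalues below are hypothetical.

```python
import numpy as np


def local_ptv(local_eigvals, k=1):
    """Proportion of the local total variance explained by the first k local components."""
    ev = np.asarray(local_eigvals, dtype=float)   # shape (locations, p), descending rows
    return ev[:, :k].sum(axis=1) / ev.sum(axis=1)


# Hypothetical local eigenvalues for three locations and three indicators:
print(local_ptv([[3.0, 0.5, 0.5], [1.5, 1.0, 0.5], [1.0, 1.0, 1.0]]))
```

A location where one component dominates gets a PTV close to 1, while a location where the variance is spread evenly across indicators gets a PTV close to 1/p.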

For the Health domain, the thematic maps of the local principal components reveal the presence of a global spatial pattern.

By mapping the spatial variation of the first local component of the Education domain, we find that it mainly reflects participation in primary school; this elementary indicator dominates in most urban areas, with some exceptions among the provincial capitals of Calabria and Sicily, where the leading variable is early leavers from education and training, which captures the problem of school dropout. The Education dimension is characterized by both local and global patterns.
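Identifying the leading variable of the first local component at each location amounts to taking the largest absolute loading. A minimal sketch, assuming local loadings stored as a (locations × variables × components) array with components in descending order; the function name, variable names and loading values are hypothetical.

```python
import numpy as np


def winning_variable(local_loadings, names):
    """Variable with the largest absolute loading on the first local component, per location."""
    first = np.abs(np.asarray(local_loadings)[:, :, 0])   # (locations, variables)
    return [names[j] for j in first.argmax(axis=1)]


# Hypothetical loadings (locations x variables x components) for two cities:
loadings = np.array([[[0.90, -0.44], [0.44, 0.90]],
                     [[0.20, 0.98], [-0.98, 0.20]]])
print(winning_variable(loadings, ["primary_school", "early_leavers"]))
# → ['primary_school', 'early_leavers']
```

Absolute values are used because the sign of an eigenvector is arbitrary, so a large negative loading indicates a leading variable just as a large positive one does.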

For the work and life balance pillar, we observe less geographic variation in the influence of each variable on the first component: for the majority of cities, the dominant variable is the employment rate of women.

# **References**


### **Total Process Error framework: an application to economic statistical registers**

Roberta Varriale<sup>a</sup>, Fabiana Rocci<sup>a</sup>, Orietta Luzi<sup>a</sup>

<sup>a</sup> Istat, Rome, Italy

# **1. Introduction**

Like many other National Statistical Institutes (hereafter NSIs), in recent years Istat has given new impetus to the renewal of its overall strategy for the production of Official Statistics. In this strategy, the required outputs in all statistical production areas are produced through the combined use of primary and secondary sources of information. Primary data are those obtained by direct surveys, while secondary data correspond to information that is made available to NSIs by external bodies and used by NSIs for statistical purposes (Memobust, AA.VV. 2014). Indeed, one of the fundamental principles of the new Istat production strategy is the massive and integrated use of microdata from administrative sources (hereafter AD), which are used in particular to construct statistical registers. Besides other methodological aspects, this deep change in the statistical production paradigm requires adapting the standards and tools used to evaluate and document, for final users, the quality of register outputs and, more generally, of the outputs of multisource processes.

In this context, the Total Process Error (TPE) framework has recently been proposed in the literature for assessing the quality of multisource processes, such as the production process of statistical registers. The TPE framework can be used both to support the design of multisource processes and to monitor an overall production process, and it can provide key elements for assessing the quality of both the processes and their statistical outputs.

In this paper, we describe how the TPE framework can be used, taking the Istat Register for Public Administrations as a case study. The production process of this register is still under construction and has a modular structure depending on the different subpopulations covered by the register. Using the TPE, we focus on the different steps and critical "decision points" of the production process for the different modules of the register. Section 2 describes the main elements of the TPE, and Section 3 describes its application to the Register for Public Administrations.

# **2. The Total Process Error framework**

The Total Process Error (TPE) framework has recently been proposed in the literature for assessing the quality of multisource processes (Rocci *et al*., 2022). It represents an evolution of Zhang's two-phase life-cycle approach (Zhang, 2012).

The TPE includes two phases of assessment, which can be described as follows:

- Phase 1. Assessment of single data sources w.r.t. original source purposes;
- Phase 2. Combination/re-use/integration of data sources w.r.t. target statistical purposes, which can be further split into:
  - Phase 2a. Assessment of single data sources w.r.t. target statistical purposes;
  - Phase 2b. Assessment of the combined data sources w.r.t. target statistical purposes.

For each phase, the framework identifies some potential errors that may arise, together with specific indicators to assess them.

Roberta Varriale, Fabiana Rocci, Orietta Luzi, *Total Process Error framework: an application to economic statistical registers*, pp. 147-151, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.28, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Roberta Varriale, ISTAT, Italian National Institute of Statistics, Italy, varriale@istat.it Fabiana Rocci, ISTAT, Italian National Institute of Statistics, Italy, rocci@istat.it Orietta Luzi, ISTAT, Italian National Institute of Statistics, Italy, luzi@istat.it

The TPE also includes an operative tool connecting the steps of a multisource production process to the phases of the quality evaluation framework: a cross-classification scheme describing the link between the steps of an entire production process and the above-mentioned phases of the TPE framework. The cross-classification scheme may be used both to support the design of the statistical production process and to monitor the whole process once it has been put into production. Furthermore, the scheme makes it possible to use the TPE in a very flexible way to represent different production processes. Table 1 shows the cross-classification scheme for a multisource production process using AD composed of *N* steps.
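The cross-classification scheme is essentially a step-by-phase incidence table. The Python sketch below represents it as a mapping from process steps to the TPE phases they involve; the phase labels follow the framework description, but the process steps and their assignment to phases are hypothetical and do not reproduce the rows of Table 1.

```python
from enum import Enum


class Phase(Enum):
    """TPE assessment phases, as listed in the framework description."""
    P1 = "single sources vs original source purposes"
    P2A = "single sources vs target statistical purposes"
    P2B = "combined sources vs target statistical purposes"


# Hypothetical process steps; the actual rows of Table 1 are not reproduced here.
SCHEME = {
    "acquire administrative source": {Phase.P1},
    "map source units and variables to targets": {Phase.P2A},
    "integrate sources": {Phase.P2B},
    "impute missing values": {Phase.P2B},
}


def steps_in_phase(scheme, phase):
    """Process steps whose potential errors are assessed under the given TPE phase."""
    return [step for step, phases in scheme.items() if phase in phases]


def incidence_table(scheme):
    """Cross-classification as a 0/1 table: one row per step, one column per phase."""
    return [[int(ph in phases) for ph in Phase] for phases in scheme.values()]
```

Adding or removing a step only changes one entry of the mapping, which reflects the flexibility of the scheme in representing different production processes.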


**Table 1. Cross-classification scheme: production process steps vs TPE phases**

# **3. The register for public administrations, territorial bodies**

The economic Register for Public Administrations (hereafter Frame PA) is the result of an Istat project started in 2019. Frame PA is a *satellite* register of the *base* Register of Public Administrations (hereafter S13), which defines the Italian public administrations as a subset of the units of the Italian business register. The difference between *base* and *satellite* registers lies in the role they play in the statistical production system, given the target (sub)populations and variables they refer to. Following Wallgren and Wallgren (2014), base registers can be defined as those representing the statistical reference populations for all statistical processes (individuals/households, economic units, etc.), and satellite registers as those releasing additional variables, usually representing specific phenomena. For each statistical unit, the final statistical Register Frame PA will contain both structural information coming from Register S13 and some economic variables respecting accountancy definitions.

Frame PA includes different subpopulations. Istat is currently working on the subpopulation of Local Authorities, which includes municipalities, unions of municipalities, provinces, mountain communities, metropolitan cities, regions and autonomous provinces.

The first step in building Frame PA for Local Authorities (hereafter Frame PALA) is to select the statistical units from Register S13, together with some structural information (address, number of employees, etc.). Subsequently, information from AD sources is extracted, integrated and treated to produce the final output, namely economic variables consistent with the target statistical accountancy definitions. The main AD sources on the economic variables of Local Authorities are the Public Administration Database (BDAP) and the Information System on the Operations of Public bodies (SIOPE). BDAP records the accounting variables of balance sheets according to the Financial Statement Management Schemes; SIOPE is a system for the digital collection of the receipts and payments made by the treasurers and cashiers of all Public Administrations. Both BDAP and SIOPE can be queried at different times of a reference year to acquire periodic updates.

Following the subject-matter experts' indications, and taking into account the target population and variables of Frame PALA, BDAP has been defined as the primary source of information, as it provides information consistent with the target statistical accountancy definitions. This choice implies that, after drawing and integrating information from BDAP and SIOPE, missing information in BDAP needs to be estimated (imputed) using SIOPE data as auxiliary variables.

The Local Authorities differ in how they are covered by the BDAP source: information on municipalities, unions of municipalities, provinces, mountain communities and metropolitan cities is affected by total missing values, while information on regions and autonomous provinces (22 bodies in total) usually does not suffer from this problem.

Three BDAP variables are considered in the process, on both the revenue and the expenditure side. Let $Y_1^{BDAP}$, $Y_2^{BDAP}$ and $Y_3^{BDAP}$, with $Y_2^{BDAP} + Y_3^{BDAP} = Y_4^{BDAP}$, be the variables observed in BDAP, and let $Y_4^{SIOPE}$ be the variable observed in SIOPE that corresponds to $Y_4^{BDAP}$. In Frame PALA, revenues and expenses are broken down into 148 and 22 "items", respectively, which are grouped into Titles. We refer to these 148 and 22 items as the Frame PALA "theoretical scheme".

In the case of total missing values in BDAP, as for municipalities, unions of municipalities, provinces, mountain communities and metropolitan cities, the missing BDAP information has to be fully imputed using SIOPE information as auxiliary variables.
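The imputation model itself is not specified here. Purely as an illustration, the sketch below assumes a simple ratio imputation: a non-responding unit receives $\hat Y_4^{BDAP} = R \cdot Y_4^{SIOPE}$, where $R$ is the ratio of BDAP to SIOPE totals among BDAP respondents, and the imputed $\hat Y_4^{BDAP}$ is then split into $Y_2$ and $Y_3$ using the pooled respondent share, so that $Y_2 + Y_3 = Y_4$ holds by construction. The `Unit` structure and `ratio_impute` function are hypothetical names, not part of the Istat process.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    """One Local Authority; BDAP fields are None when the unit is a total non-respondent."""
    y4_siope: float
    y2_bdap: Optional[float] = None
    y3_bdap: Optional[float] = None

    def responded(self) -> bool:
        return self.y2_bdap is not None and self.y3_bdap is not None


def ratio_impute(units: list[Unit]) -> list[tuple[float, float]]:
    """Return (Y2, Y3) for every unit, imputing non-respondents from Y4_SIOPE (assumed ratio model)."""
    resp = [u for u in units if u.responded()]
    # Ratio of BDAP to SIOPE totals, estimated on BDAP respondents only.
    y4_bdap_tot = sum(u.y2_bdap + u.y3_bdap for u in resp)
    ratio = y4_bdap_tot / sum(u.y4_siope for u in resp)
    # Pooled share of Y2 in Y4 among respondents, used to split the imputed Y4.
    share_y2 = sum(u.y2_bdap for u in resp) / y4_bdap_tot
    out = []
    for u in units:
        if u.responded():
            out.append((u.y2_bdap, u.y3_bdap))  # observed BDAP values are kept unchanged
        else:
            y4_hat = ratio * u.y4_siope
            out.append((share_y2 * y4_hat, (1 - share_y2) * y4_hat))  # Y2 + Y3 = Y4 by construction
    return out


units = [Unit(100, 60, 50), Unit(200, 120, 100), Unit(50)]  # invented figures
print(ratio_impute(units))
```

With these invented figures, $R = 330/300 = 1.1$, so the third unit receives $\hat Y_4 = 55$, split into about 30 and 25. A production process would more plausibly work item by item and by type of authority, but the consistency constraint is handled in the same way.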

Table 2 shows the coverage of BDAP at different times during 2020 and 2021 for units belonging to the base Register S13 population. The reference year for data of both Register S13 and BDAP is 2019.

**Table 2 – Coverage of the BDAP source with respect to the target population (Register S13), by type of Local Authority – Number of respondents. Year 2019.**


The presence or absence of total missing values in BDAP makes the design of the Frame PALA production process completely different for the two groups of local authorities. Tables 3 and 4 show how the cross-classification scheme may be used to support the design of these two production processes.

Without going into the details of the steps of the two production processes, it is clear that the process for municipalities, unions of municipalities, provinces, mountain communities and metropolitan cities is more complex: it comprises both an integration step and an imputation step that are not present in the Frame PA production process for regions and autonomous provinces. This process therefore has additional critical "decision points" at which errors may arise. The indicators linked to these steps (and phases) will be useful to support the design of the two production processes (Rocci *et al.*, 2022).


**Table 3. Frame PA, *municipalities, unions of municipalities, provinces, mountain communities, metropolitan cities*: production process steps vs TPE phases.**

**Table 4. Frame PA, *regions and autonomous provinces*: production process steps vs TPE phases.**


In the future, Frame PA will include additional statistical populations, characterized by different structures of information sources. The production process of their output economic variables will therefore involve different steps and critical "decision points". TPE proved a useful tool in the design phase of the Local Authorities component; it will be used in the design phase of the other components and, once they go into production, for their monitoring.

# **References**


SESSION

# HEALTH AND WELL-BEING

### Development of an innovative methodology to define patient-designed quality of life: a new version of a well-known concept in healthcare

Barbara Bartolini<sup>1</sup>, Serena Bertoldi<sup>2</sup>, Laura Benedan<sup>3</sup>, Carlotta Galeone<sup>3</sup>, Paolo Mariani<sup>3</sup>, Francesca Sofia<sup>2</sup>, Mariangela Zenga<sup>4</sup>

<sup>1</sup> HPS-AboutPharma srl, Milano, Italy; <sup>2</sup> Science Compass, Milano, Italy; <sup>3</sup> Bicocca Applied Statistics Center, University of Milano-Bicocca, Milano, Italy; <sup>4</sup> Dipartimento di Statistica e Metodi Quantitativi, University of Milano-Bicocca, Milano, Italy

# 1 Introduction

Quality of life (QoL) is a concept embracing several aspects and functions of people's lives, including health, relationships, social life and leisure (The International Society for Quality of Life Research). Achieving a good QoL is recognized as an essential aim of healthcare, regardless of the pathology and the therapy administered (M. Asadi-Lari et al., 2004). QoL is therefore a pivotal parameter used by clinicians to evaluate how treatments and therapies influence patients' functioning and emotional state, with the aim of improving interventions and their outcomes.

QoL is measured through indices derived from questionnaires, which can be either generic or disease-specific (D. L. Patrick & Deyo, R. A., 1989; R. Rabin & de Charro, F., 2001; J. E. Ware, Jr. et al., 2016). Currently, most QoL questionnaires are designed mainly by clinicians and therefore include items centered on the disease rather than on its multifaceted impact on people's lives. These tools help clinicians determine the best clinical approach, but may fail to grasp the patients' perspective, needs, aspirations, perceptions and emotional state. This is a major drawback, because medical care then rests on clinical parameters alone. A proper tool capturing the patient's perception of the pathology is missing.

To bridge this gap, a bottom-up, patient-designed QoL index could provide a new, patient-centric, unbiased tool for evaluating patients' perception of their own well-being. Here we describe the development of an innovative methodology to define patient-designed QoL, based predominantly on patients' contributions.

# 2 Working group and methodology

To define a patient-centric QoL tool, we used a consensus technique designed to give voice to the main actors involved in dealing with the pathology.

The Delphi method is widely used to reach consensus in academic research, industry, social sciences and healthcare (R. Boulkedid et al., 2011; I. R. Diamond et al., 2014; M. K. Murphy et al., 1998; E. G. Trevelyan & Robinson, N., 2015). Its main goal is to collect different opinions for evaluation by the panel, with the aim of reaching pluralistic evaluations of an issue. In a Delphi panel, the participants are either technical or non-technical experts (i.e., patient representatives), each reporting their own point of view (G. Mazziotta Marbach, C. & Rizzi, A., 1991).

In our model, patients and healthcare professionals form the working group that builds the settings and assertions of the questionnaire.



Paolo Mariani, University of Milano-Bicocca, Italy, paolo.mariani@unimib.it, 0000-0002-8848-8893 Francesca Sofia, Science Compass, Italy, francesca@sciencecompass.it

Mariangela Zenga, University of Milano-Bicocca, Italy, mariangela.zenga@unimib.it, 0000-0002-8112-5627

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Barbara Bartolini, Serena Bertoldi, Laura Benedan, Carlotta Galeone, Paolo Mariani, Francesca Sofia, Mariangela Zenga, *Development of an innovative methodology to define patient-designed quality of life: a new version of a wellknown concept in healthcare*, pp. 155-159, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.30, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Barbara Bartolini, HPS, Health Publishing and Services, Italy, bbartolini@aboutpharma.com, 0000-0001-9646-9310 Serena Bertoldi, Science Compass, Italy, serena.bertoldi@sciencecompass.it

The items of the QoL questionnaire were defined during focus groups involving a panel of patients, two clinicians, one statistician and one facilitator.

Patients are the active players who identify the settings and the main items. According to the definition of the European Patients' Academy on Therapeutic Innovation (EUPATI) (K. Warner et al., 2018), the patients involved may be:

● Individual Patients, i.e., "persons with experience of living with a disease. Their main role is to contribute with their subjective experience";

● Patient Organization Representatives, i.e., "persons who are mandated to represent and express the collective views of a patient organization on a specific issue or disease area".

Clinicians provide technical knowledge of the pathology, supporting a precise description of the items. Facilitators guide the discussion among the parties and have the important role of harmonizing the contributions of all participants, to avoid decisional bias caused by the influence of the clinicians' opinions on the patients' decisions.

The Pseudo-Delphi method we propose is an iterative process of successive steps aimed at identifying a shared solution. It is a flexible method, useful when the statistical model to be applied is uncertain and when the core of the problem under discussion is not completely known.

The Pseudo-Delphi steps are as follows:


● Second round, to discuss the answers to the questionnaire and to define the Likert scale and the closed questions

● Definition of the second questionnaire, with open questions and closed questions and their respective agreement scales

● Administration of the second questionnaire to each expert

● Third round, with discussion of the answers to the questionnaire, identification of the scale of importance, and evaluation of the possible consequences of each decision and of the feasibility of each defined option

● Moderated feedback, aggregation of results and sharing with the experts. In this round, the new questionnaires resulting from the previous opinions are also shared

● Repetition of the questionnaire, if necessary, until agreement is reached.
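The text does not state when agreement counts as reached. As a minimal sketch of the "repeat until agreement" step, assuming a hypothetical criterion under which an item is agreed when at least 75% of the experts choose one of the two upper categories of the 4-point scale, the items still lacking consensus would be the ones re-submitted in the next round. The threshold, the grouping of categories and the function names are illustrative assumptions.

```python
from collections.abc import Sequence

SCALE = ("not at all", "a little", "quite a bit", "very much")
TOP_TWO = set(SCALE[2:])  # assumption: the two upper categories count as agreement


def consensus_items(ratings: dict[str, Sequence[str]], threshold: float = 0.75) -> dict[str, bool]:
    """Map each item to True when the share of experts in the top two categories reaches threshold."""
    result = {}
    for item, answers in ratings.items():
        share = sum(a in TOP_TWO for a in answers) / len(answers)
        result[item] = share >= threshold
    return result


def items_to_resubmit(ratings: dict[str, Sequence[str]], threshold: float = 0.75) -> list[str]:
    """Items that would go back to the experts in another round."""
    return [item for item, ok in consensus_items(ratings, threshold).items() if not ok]


# Invented answers from four experts on two candidate items.
ratings = {
    "fatigue": ["very much", "quite a bit", "very much", "a little"],
    "travel costs": ["a little", "not at all", "quite a bit", "very much"],
}
print(items_to_resubmit(ratings))
```

Here "fatigue" reaches 3/4 = 75% and is agreed, while "travel costs" (50%) would be discussed again.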

Working anonymously prevents a charismatic individual from prevailing over the others, who can then express their opinions freely, without any social pressure.

Moreover, controlled feedback provides the experts with all the information needed to reach a final agreement. With the Pseudo-Delphi method, all participants can analyze and reconsider a variety of aspects included in the questionnaire. The method requires a considerable effort from the methodologists in estimating and summarizing facts, quantitative data and subjective variables. This approach is valuable in the analysis of real-world data, especially in QoL evaluation (S. Pietersma et al., 2014).

The workflow starts from the patients' evaluation of a list of settings identified by a literature search. Patients independently provide social, economic, and organizational information related to their pathology and their relationship with the healthcare system. This allows the identification of the settings of the questionnaire.

Each patient evaluates the settings individually and anonymously, to avoid any psychological subjection to the healthcare professionals' opinions. At the end of this process, the group meets in a roundtable session to discuss openly the settings that have emerged and to rank them in order of perceived importance. This step is crucial for narrowing down the settings to those to be included in the questionnaire, so that it remains usable in daily practice.

Within each setting, each patient individually and anonymously generates a series of assertions. Following similar steps, a set of assertions is selected and included in the questionnaire. Each assertion is then associated with a four-point Likert scale, and a synthetic patient-centric QoL index is defined on the basis of the scores (Figure 1).

Figure 1. Flow chart showing the generation of the QoL questionnaire.

The final version of the questionnaire contains scales for both agreement and importance. They link the agreement with each item to the importance of that item in the patient's life, and they are built on Customer Satisfaction techniques. After the different settings have been identified (e.g., physical, emotional, social, functional and economic), participants rate both their level of agreement with each statement within each setting and the importance of that statement on a 4-point Likert scale (response categories: not at all; a little; quite a bit; very much). The methodology yields a composite "uneasiness" index, which is then compared with an internal control, namely each respondent's evaluation of their own QoL on a scale from 1 to 10. The composite index is defined as follows.

Let $x_{ijs}$ ($i=1,\dots,n$; $j=1,\dots,k_s$; $s=1,\dots,S$) be the agreement of the $i$-th respondent with the $j$-th statement of the $s$-th setting. The agreement categories for a statement are treated as a numeric variable, with "not at all" = 0.001, "a little" = 0.33, "quite a bit" = 0.67 and "very much" = 1. The four categories are thus separated by three equal intervals of about 0.33, with "not at all" coded as 0.001 rather than 0; an agreement of "not at all" is treated as the absence of the condition expressed by the statement. Moreover, let $w_{ijs}$ ($i=1,\dots,n$; $j=1,\dots,k_s$; $s=1,\dots,S$) be the importance given by the $i$-th respondent to the $j$-th statement of the $s$-th setting. The importance categories are coded "not at all" = 0.25, "a little" = 0.5, "quite a bit" = 0.75 and "very much" = 1.

The indicator for the $j$-th statement of the $s$-th setting, for the $i$-th respondent, is

$$
u_{ijs} = x_{ijs} \cdot w_{ijs} \tag{I}
$$

$u_{ijs}$ takes values in $[0.00025;\ 1]$. Each value of $u_{ijs}$ corresponds to a unique combination of $x_{ijs}$ and $w_{ijs}$.
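The claim that each value of $u_{ijs}$ identifies a unique pair $(x_{ijs}, w_{ijs})$ can be checked by enumerating the 16 products of the numeric codes defined above. The sketch below does this (the dictionary and function names are ours) and also confirms the range $[0.00025;\ 1]$; the closest pairs of products, such as 0.2475 and 0.25, still differ, which is one reason why "not at all" is coded 0.001 rather than 0.

```python
from itertools import product

# Numeric codes taken from the text: agreement x and importance w.
AGREEMENT = {"not at all": 0.001, "a little": 0.33, "quite a bit": 0.67, "very much": 1.0}
IMPORTANCE = {"not at all": 0.25, "a little": 0.5, "quite a bit": 0.75, "very much": 1.0}


def product_table() -> dict[float, tuple[str, str]]:
    """Map every possible u = x * w (rounded to remove float noise) to its (agreement, importance) pair."""
    table = {}
    for (a, x), (b, w) in product(AGREEMENT.items(), IMPORTANCE.items()):
        u = round(x * w, 6)
        if u in table:
            raise ValueError(f"u = {u} is not unique: {table[u]} and {(a, b)}")
        table[u] = (a, b)
    return table


table = product_table()
print(len(table), min(table), max(table))
```

If two combinations produced the same product, `product_table` would raise an error; with the codes in the text it returns all 16 values.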

The questionnaire includes a section with structural questions exploring the current state of the disease, personal evaluation about the psychological state and the type of assistance received, geographical and demographic characteristics. This information completes the patient profile and can be used for further analysis and stratification.

For the i-th respondent, it is possible to create an uneasiness score for the s-th setting as

$$U_{is} = \sum_{j=1}^{k_s} x_{ijs} \cdot w_{ijs} \tag{II}$$

In (II), the statements of the $s$-th setting that run in the opposite direction are reverse-scored. $U_{is}$ takes values in $[0.00025\,k_s;\ k_s]$.

For the i-th respondent, the total composite index is given by:

$$TU_i = \sum_{s=1}^{S} U_{is} \tag{III}$$

which takes values in $\left[0.00025 \sum_{s=1}^{S} k_s;\ \sum_{s=1}^{S} k_s\right]$. The linear transformation of (III)

$$TU_i^{10} = (10 - 1) \cdot \frac{TU_i - 0.00025 \sum_{s=1}^{S} k_s}{(1 - 0.00025) \sum_{s=1}^{S} k_s} + 1 \tag{IV}$$

ensures that $TU_i^{10} \in [1;\ 10]$. $TU_i^{10}$ is the synthetic patient-centric QoL index, and it can be compared with the $i$-th respondent's own QoL score, $QoL_i$.
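Equations (I) to (IV) can be traced end to end in a short sketch. It assumes that any reverse-scoring required by (II) has already been applied to the answers, and it uses a hypothetical input layout: for one respondent, a list of settings, each setting being a list of (agreement, importance) category pairs, one per statement.

```python
# Numeric codes from the text.
AGREEMENT = {"not at all": 0.001, "a little": 0.33, "quite a bit": 0.67, "very much": 1.0}
IMPORTANCE = {"not at all": 0.25, "a little": 0.5, "quite a bit": 0.75, "very much": 1.0}
U_MIN = 0.001 * 0.25  # smallest possible u_ijs, i.e. 0.00025


def setting_score(answers: list[tuple[str, str]]) -> float:
    """U_is in (II): sum over the k_s statements of x_ijs * w_ijs, i.e. of u_ijs in (I)."""
    return sum(AGREEMENT[a] * IMPORTANCE[w] for a, w in answers)


def qol_index(settings: list[list[tuple[str, str]]]) -> float:
    """TU_i^10 in (IV): the total score TU_i of (III) rescaled linearly onto [1, 10]."""
    tu = sum(setting_score(s) for s in settings)  # TU_i, equation (III)
    k_total = sum(len(s) for s in settings)       # sum of k_s over the S settings
    return (10 - 1) * (tu - k_total * U_MIN) / (k_total * (1 - U_MIN)) + 1


# Invented respondent with two settings of three and two statements.
best = [[("very much", "very much")] * 3, [("very much", "very much")] * 2]
worst = [[("not at all", "not at all")] * 3, [("not at all", "not at all")] * 2]
mixed = [[("quite a bit", "very much"), ("a little", "quite a bit")], [("very much", "a little")]]
print(qol_index(best), qol_index(worst), qol_index(mixed))
```

A respondent answering "very much" to every agreement and importance question obtains the maximum of 10, and one answering "not at all" throughout obtains the minimum of 1, which matches the bounds stated for $TU_i^{10}$.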

# 3 Conclusions

With this pilot study, we propose a methodology for setting up a questionnaire that yields a synthetic index for evaluating the overall QoL of patients, independently of clinical data. The index enhances patients' awareness of their subjective experience of the disease and enables them to present their situation better to clinicians. This methodology is in line with the aim of improving patient engagement highlighted by the EUPATI PARADIGM project (P. Spindler & Lima, B. S., 2018). It needs to be further validated by administering it to patients with different pathologies, and compared with the methodologies already available from international sources. An index generated directly by patients can provide a descriptive model that is helpful not only to patients but also to clinicians and third parties, and that can be further integrated with clinical details to obtain an overall view of the course of treatment for each patient.

# References


### Measuring the impact of healthcare indicators on academic medical centers' scientific production

Corrado Cuccurullo<sup>a</sup>, Luca D'Aniello<sup>a</sup>, Massimo Aria<sup>b</sup>, Maria Spano<sup>b</sup>

<sup>a</sup> Department of Economics, University of Campania Luigi Vanvitelli, Caserta, Italy. <sup>b</sup> Department of Economics and Statistics, University of Naples Federico II, Naples, Italy.

# **1. Introduction**

Academic Health Centers (AHCs), also known as Academic Health Science Centers (AHSCs) or Academic Medical Centers (AMCs), are hospitals in which scientific research, teaching and patient care are fully integrated. These complex institutions pursue a triple mission of research, teaching and care, with an enormous impact on society and on the nation's health.

Policymakers and practitioners have recently been attaching increasing importance to the scientific activity of AMCs, for both welfare and national competitiveness. However, there is no commonly agreed definition of AMCs, because their structure and composition vary with the context in which each AMC is located. Indeed, some scholars comment that *"when you have seen one Academic Health Centre, you've seen one Academic Health Centre"* (Sanfilippo, 2009). The structural and operational characteristics of AMCs, such as the scope of services, location, size and market, could affect their scientific production and impact.

Our study aims to determine which factors may affect the research productivity and impact of AMCs. We develop a model to assess the academic value of AMCs by considering these factors and how they are related to healthcare performance, measured in terms of scientific productivity, impact and growth. We focus on Italian public-owned AMCs, namely 20 public AMCs organized as "Aziende Ospedaliere Universitarie", 9 public AMCs organized as "Ex Policlinici Universitari a gestione diretta" and 23 public-owned "Istituti di Ricovero e Cura a Carattere Scientifico" (IRCCS) (Ministry of Health - *www.dati.salute.gov.it*). We retrieve structural information mainly from AMC websites, and research data from bibliographic indexing databases (e.g. Web of Science, PubMed), for the period 2010-2019.

Our analysis is organized in two steps. First, we identify different groups of AMCs by applying Hierarchical Cluster Analysis (HCA); these groups share common structural and operational characteristics. Second, we test for statistically significant differences in research productivity and impact among the resulting groups using Analysis of Variance (ANOVA). Each group represents a distinct AMC configuration.

This work has been partially financed by the research project "Leading Change in Academic Medical Centers", funded by the competitive call for projects V:ALERE 2019. The project aims to provide evidence, advice and recommendations to help System and AMC decision-makers address the many challenges that AMCs face.

Corrado Cuccurullo, University of Campania Luigi Vanvitelli, Italy, corrado.cuccurullo@unicampania.it, 0000-0002-7401-8575 Luca D'Aniello, University of Campania Luigi Vanvitelli, Italy, lucadaniello94@gmail.com, 0000-0003-1019-9212 Massimo Aria, University of Naples Federico II, Italy, massimo.aria@unina.it, 0000-0002-8517-9411 Maria Spano, University of Naples Federico II, Italy, maria.spano@unina.it, 0000-0002-3103-2342


Corrado Cuccurullo, Luca D'Aniello, Massimo Aria, Maria Spano, *Measuring the impact of healthcare indicators on academic medical centers' scientific production*, pp. 161-165, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.31, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

# **2. Data and methodology**

As noted above, AMCs can differ widely in their structural and operational characteristics. To cover these aspects, we collected data from different sources: the official websites of AMCs, official documents published by the Italian Ministry of Health, and Google Maps. Table 1 provides an overview of the variables considered in our study, showing for each variable its synthetic label, how it was encoded and the data source. Some variables (e.g. *Type of AMC*, *Geographical localization*) do not change over the years, whereas others (e.g. *Structure Dimension*) changed value during the reference period 2010-2019.


**Table 1** Main structural and operational characteristics of Italian public-owned AMCs

We carried out an HCA to identify homogeneous groups of AMCs by minimizing the distance within groups (clusters) while maximizing the distance between groups. HCA is a multivariate technique that allows the association structure among statistical observations to be visualized at different levels of granularity.

We chose an agglomerative algorithm, in which each observation is initially considered as a single-element cluster. At each step of the agglomerative procedure, the two most similar clusters are merged into a new, larger cluster according to a specific linkage criterion. The procedure is iterated until all observations are in a single cluster. The resulting solutions are sequentially nested and displayed in a tree structure known as a dendrogram. Here, we used the Ward linkage algorithm (Ward, 1963) with Gower's distance (Gower, 1971), the most popular distance for mixed-type variables.
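The clustering step can be sketched in Python, assuming the common form of Gower's distance (range-normalized absolute differences for numeric variables, simple matching for categorical ones, averaged over variables, no missing values) and SciPy's `linkage` with `method="ward"` on the condensed distance matrix. SciPy documents the Ward update as formally valid only for Euclidean distances, so pairing it with Gower's distance, as in the text, is a pragmatic choice. The function names and the toy data are illustrative and do not come from the study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def gower_matrix(x_num: np.ndarray, x_cat: np.ndarray) -> np.ndarray:
    """Gower dissimilarity for n units with p numeric and q categorical columns (no missing values)."""
    n = x_num.shape[0]
    d = np.zeros((n, n))
    for k in range(x_num.shape[1]):
        col = x_num[:, k].astype(float)
        rng = col.max() - col.min()
        if rng > 0:  # a constant column contributes zero dissimilarity
            d += np.abs(col[:, None] - col[None, :]) / rng
    for k in range(x_cat.shape[1]):
        col = x_cat[:, k]
        d += (col[:, None] != col[None, :]).astype(float)  # simple matching: 0 if equal, 1 otherwise
    return d / (x_num.shape[1] + x_cat.shape[1])


def ward_gower_clusters(x_num: np.ndarray, x_cat: np.ndarray, n_clusters: int) -> np.ndarray:
    """Agglomerative clustering with Ward linkage on Gower distances; labels run from 1."""
    d = gower_matrix(x_num, x_cat)
    z = linkage(squareform(d, checks=False), method="ward")  # z encodes the dendrogram
    return fcluster(z, t=n_clusters, criterion="maxclust")   # cut the tree into n_clusters groups


# Illustrative toy data: number of beds, then codes for type, emergency department and layout.
beds = np.array([[300], [320], [310], [305], [900], [950], [920], [940]])
cats = np.array([[0, 1, 1]] * 4 + [[1, 0, 0]] * 4)
labels = ward_gower_clusters(beds, cats, n_clusters=2)
print(labels)
```

Passing `z` to `scipy.cluster.hierarchy.dendrogram` would draw a tree like the one in Figure 1; in the study, the cut was made at five clusters.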

Regarding the research activity of AMCs, we retrieved from the Web of Science (WoS) indexing database (launched by the Institute for Scientific Information (ISI) and now maintained by Clarivate Analytics) all publications from January 2010 to December 2019. To identify the publications of each AMC, we searched by full affiliation name (e.g. "IRCCS FND MILANO" for the Fondazione IRCCS Istituto Nazionale Tumori Milano, "IRCCS Ca Granda Ospedale Maggiore Policlinico" for the Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico). We restricted the search by document type, selecting only Articles, Proceedings Papers, Review Articles and Book Chapters in English. The records were exported in PlainText format. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) were followed in selecting the publications (Liberati *et al.*, 2009). We used three bibliometric indicators to capture different aspects of research activity: productivity (*n. of publications/total affiliated authors*), impact (*total citations/n. of publications*) and the annual percentage growth rate of publications (*percentage growth rate 2010-2019*).
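The three indicators can be written down as a small function. The first two follow the ratios given in the text; for the growth rate, the sketch assumes the compound annual growth rate between the first and last year of publication counts, which is one common reading of "annual percentage growth rate" and not necessarily the exact formula used in the study. The function name and the figures in the example are hypothetical.

```python
def research_indicators(pubs_per_year: dict[int, int], total_citations: int,
                        affiliated_authors: int) -> tuple[float, float, float]:
    """Return (productivity, impact, annual % growth) for one AMC."""
    years = sorted(pubs_per_year)
    n_pubs = sum(pubs_per_year.values())
    productivity = n_pubs / affiliated_authors  # n. of publications / total affiliated authors
    impact = total_citations / n_pubs           # total citations / n. of publications
    first, last = pubs_per_year[years[0]], pubs_per_year[years[-1]]
    span = years[-1] - years[0]                 # 9 intervals for 2010-2019
    growth = ((last / first) ** (1 / span) - 1) * 100  # assumed compound annual growth, in %
    return productivity, impact, growth


# Hypothetical AMC with three years of data: 331 papers, 662 citations, 100 affiliated authors.
p, i, g = research_indicators({2010: 100, 2011: 110, 2012: 121}, 662, 100)
print(round(p, 2), round(i, 2), round(g, 2))
```

With these figures, output grows by 10% a year, productivity is 3.31 and impact is 2.0.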

Analysis of Variance (ANOVA) (Jaccard *et al.*, 1984) and Tukey's post-hoc test were used to examine differences among the clusters resulting from the HCA.
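For readers who want to reproduce this step, SciPy provides both tests: `scipy.stats.f_oneway` for the one-way ANOVA and `scipy.stats.tukey_hsd` (available from SciPy 1.8) for Tukey's pairwise comparisons. The values below are invented for illustration and are not the study's data.

```python
from scipy.stats import f_oneway, tukey_hsd  # tukey_hsd requires SciPy >= 1.8

# Invented values of one indicator (e.g. publications per affiliated author) in three clusters.
cluster_a = [1.0, 1.2, 0.9, 1.1]
cluster_b = [2.0, 2.1, 1.9, 2.2]
cluster_c = [1.0, 1.1, 0.95, 1.05]

anova = f_oneway(cluster_a, cluster_b, cluster_c)   # H0: all cluster means are equal
tukey = tukey_hsd(cluster_a, cluster_b, cluster_c)  # pairwise tests, family-wise error controlled

print(f"F = {anova.statistic:.3f}, p = {anova.pvalue:.4g}")
# tukey.pvalue[i, j] is the adjusted p-value for the comparison of clusters i and j
print(tukey.pvalue.round(4))
```

Here the ANOVA rejects equal means, and Tukey's test attributes the difference to the second cluster, while the first and third do not differ significantly. This is the kind of follow-up that a significant F-statistic calls for.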

# **3. Findings and conclusion**

The HCA performed on the main characteristics of AMCs returned the dendrogram in Figure 1. We chose the five-cluster solution, highlighted in different colors in the graphical representation. Interestingly, the healthcare institutions separate naturally with respect to the variable *Type of AMC*. For instance, almost all IRCCSs fall in Cluster 5 (orange) and Cluster 2 (green). These two clusters differ only with respect to the *SSR* and *PAR* variables, because Cluster 5 includes all the IRCCSs subject to both Regional Health system turnaround plans [*SSR=1*] and Hospital turnaround plans [*PAR=1*].

Cluster 1 (blue) includes 75% of the AOUs and a small portion of the IRCCSs (13%). These AMCs are mainly characterized by a more articulated architectural structure [*LAYOUT=1*] and by the presence of an Emergency Department [*ED=1*]. None of them is subject to either Regional Health system turnaround plans [*SSR=0*] or Hospital turnaround plans [*PAR=0*].

The remaining 25% of the AOUs fall within Cluster 3 (red). They differ from the AOUs in Cluster 1 in their size: these AMCs are all organized in a single block [*LAYOUT=0*] and therefore have, on average, fewer beds and wards. Finally, Cluster 4 (light blue) includes about 80% of the AOU\_SSN, all located in metropolitan areas [*GEO\_LOC=1*], with an Emergency Department [*ED=1*] and mainly organized in pavilions [*LAYOUT=1*].

**Figure 1** Dendrogram resulting from the HCA of Italian public-owned AMCs

Table 2 shows the results of a one-way ANOVA. We found statistically significant differences among the cluster means for N. of publications per affiliated author (F-stat = 2.994, P-value = 0.0281\*) and for Total citations per N. of publications (F-stat = 4.523, P-value = 0.003\*\*), but not for Growth rate (F-stat = 0.307, P-value = 0.872).


**Table 2** Mean and standard deviation by clusters and ANOVA analysis among clusters.

We noted that the AMCs in Clusters 2, 5 and 1, which together include all the IRCCSs and 75% of the AOUs, are more productive than the others, with an average number of publications per affiliated author greater than 2. This result is also reflected in the impact of their research, with an average number of total citations per publication greater than 22. These preliminary results suggest that, in AMCs where research activity is regulated by strict guidelines (IRCCSs), these guidelines push institutions to produce more than AOUs and AOU\_SSN, where more time is probably devoted to teaching and patient care.

# **References**


### EGIPSS model for the evaluation of performance in healthcare

Pietro Renzi<sup>a,b</sup>, Alberto Franci<sup>a,b</sup>

<sup>a</sup> Department of Economics, Science and Law, University of the Republic of San Marino, Republic of San Marino; <sup>b</sup> Department of Economics, Society and Politics, Urbino University "Carlo Bo", Urbino, Italy

# 1. Introduction


Debate about the performance of healthcare systems has been amplified by the current COVID-19 pandemic. The crisis has highlighted the fragility of many such systems and the pressing need for policymakers and health service managers across the world to evaluate their performance. Arguably, the strategic development of any healthcare system should aim to reduce health inequalities; therefore, at a minimum, it is necessary to monitor how well it addresses inequalities in both health and its social determinants. Italy is a case in point, with demands for better quality of care, higher productivity, better responsiveness, greater efficiency and better sustainability. All of these are expressions of the same question: how can the performance of health services and health workers be improved? However, measuring healthcare performance is difficult because of its multidimensional nature, which can easily lead to conceptual and methodological confusion. As a consequence, few models fully analyse performance at the level of the healthcare system. Virtually all current performance frameworks include quality of care as a key element, with effectiveness, productivity and efficiency as recurrent themes. Examples include the World Health Organisation's (WHO) World Health Report 2000, the Organisation for Economic Co-operation and Development's (OECD) framework (2004), and Nuti's framework (2008). In contrast, social outcomes of healthcare and equity are missing or underdeveloped in most frameworks, with the Australian and Canadian national frameworks being notable exceptions. In this context, Sicotte et al. (1999) developed the comprehensive Evaluation Globale et Intégrée de la Performance des Systèmes de Santé (EGIPSS) framework for assessing the performance of Health Care Organizations (HCOs). The main aims of this paper are:


# 2. Key features of the EGIPSS model

In the healthcare sector, one framework stands out: the EGIPSS framework developed by Sicotte et al. (cit.), a comprehensive approach to assessing the performance of HCOs. It includes goal achievement, service production and adaptation to the environment as core dimensions of performance, and usefully adds a focus on values and culture. EGIPSS is geared towards North American settings and has been used mainly in OECD countries. For example, it underpins WHO-Europe's framework for assessing hospitals, and it has been used to assess accreditation schemes, to analyse how the actors and stakeholders of an HCO define performance, and to explore how HCOs learn.

In this paper, the authors present a practical, simplified version of the EGIPSS framework. While the key strengths of the framework were kept, some elements were redefined based on concepts of integrated healthcare systems and public service. Parsons' theory of social action systems inspired an integrative framework of performance, in which the performance of an HCO is considered multidimensional. More specifically, performance results from the interaction between four organisational functions (see Figure 1). Consequently, the success of an organisation depends not only on how each of these functions is organised, but also on how they are aligned with one another. Performance is therefore understood as something more comprehensive than the efficient production of desired outputs. The framework also incorporates the managerial approach of New Public Management (NPM). Finally, it describes six equilibria, or alignments, between the four functions, which are best understood as tensions that may arise between functions when one of them changes (Figure 1).

Pietro Renzi, University of San Marino, San Marino, pietro@unirsm.sm, 0000-0001-6200-7265

Alberto Franci, University of Urbino Carlo Bo, Italy, alberto.franci@uniurb.it, 0000-0002-8157-9792

Pietro Renzi, Alberto Franci, *EGIPSS model for the evaluation of performance in healthcare*, pp. 167-172, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.32, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

The tactical alignment links the Goal Achievement and Service Production functions. It deals first with the appropriateness of service provision in relation to the goals: "To what extent do the service production processes contribute to attaining the goals? Are they effectively producing the output needed to reach the goals?".

The allocative alignment links the Interaction with the Environment function and the Service Production function. It deals first with resource acquisition. Questions that can be used to assess it include: "Are the obtained resources adequate to organise the service production function? Is the service production function optimal in relation to available resources?".

The strategic alignment examines the link between the Goals that the HCO is pursuing and its Environment. Here, questions include whether the organisational goals correspond with the needs of the population and other key actors.

The legitimating alignment is about the congruence of the Goal Attainment function with the Culture and Values Maintaining function, and questions how the strategic choice of goals influences and shapes the organisational values.

The operational alignment covers the congruence of the Culture and Values Maintaining function with the Service Production modalities, and the impact of the Service Production system on the organisational culture and values.

Finally, the contextual alignment between Culture and Values Maintaining function and Adaptation to the environment deals with how the social, political and cultural dimensions of the environment influence the organisational culture and its core operational values.

Figure 1 sets out the model.

Fig. 1: EGIPSS model

# 3. Materials and methods

The research made use of various statistical sources (Istituto Nazionale di Statistica (ISTAT), Centro Studi Investimenti Sociali (CENSIS), Osservasalute, Istituto Superiore di Sanità) and an array of survey methods. The indicators used in the performance evaluation model were determined through an in-depth study of the existing literature (Sicotte, cit.) and in collaboration with experts from the two areas involved in the study.

Data for the Republic of San Marino were provided by its Health Authority, its Istituto per la Sicurezza Sociale (ISS RSM) and the Office of Statistics of the Republic. Data for the "Area Vasta" of the Marche Region were drawn from local and regional statistical sources, plus some internal information sources.

Once the list of indicators was identified, a "balise" (French terminology) or benchmark (English terminology) of excellence was determined for each indicator. This represents a norm or standard against which results can be compared, so that opinions and judgements can be formed. The EGIPSS model and methodology incorporate performance indices that allow comparisons with excellence, which can then be weighted relative to each other within a set of dimensional and subdimensional categories. For example, the Adaptation function covers the dimension 'Availability of resources', which has two subdimensions: 'Healthcare expenditure and financing' and 'Health workforce' (see Tab. 2). The weights were based on the original weights of the model, which were in turn validated by a panel of experts representing the various stakeholders of the healthcare systems studied, chosen according to their skills. This validation process used the Delphi method (Fabbris et al., 2007).

One analytical issue is establishing the relationship between an indicator and performance. The approach adopted was to determine, for each indicator, a balise of excellence, i.e. the value considered to be 'high performing'. The direction of the relationship between an indicator and its associated performance can be positive, negative or parabolic. An overall performance achievement index for a subdimension or dimension is calculated by applying the assigned weights to the percentage of achievement of the balise for each indicator and then aggregating the results. This can be done at each level, provided that, when the weights are expressed as percentages, they sum to 100 within each subdimension, dimension and function. The process can be repeated for the four functions of the model.
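The weighted aggregation just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the paper does not give the formula for the percentage of achievement, so the three rules below (ratio to the balise for positive and negative indicators, relative distance from the balise for parabolic ones, capped at 100) are hypothetical, as are the indicator names, values and weights. Only the dimension and subdimension labels come from the text.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str         # hypothetical indicator name
    value: float      # observed value for the area studied
    balise: float     # benchmark of excellence
    direction: str    # "positive", "negative" or "parabolic"


def achievement(ind: Indicator) -> float:
    """Percentage of achievement of the balise (assumed rules), capped to [0, 100]."""
    if ind.direction == "positive":      # higher values are better
        pct = 100 * ind.value / ind.balise
    elif ind.direction == "negative":    # lower values are better
        pct = 100 * ind.balise / ind.value
    elif ind.direction == "parabolic":   # best at the balise, worse on either side
        pct = 100 * (1 - abs(ind.value - ind.balise) / ind.balise)
    else:
        raise ValueError(f"unknown direction: {ind.direction}")
    return max(0.0, min(100.0, pct))


def weighted_index(scored):
    """Aggregate (score, weight%) pairs; the weights must sum to 100 at each level."""
    total = sum(w for _, w in scored)
    if abs(total - 100) > 1e-9:
        raise ValueError(f"weights sum to {total}, not 100")
    return sum(score * w / 100 for score, w in scored)


# Hypothetical indicators for the 'Availability of resources' dimension.
expenditure = weighted_index([
    (achievement(Indicator("expenditure_per_capita", 2400, 3000, "positive")), 60),
    (achievement(Indicator("out_of_pocket_share", 25, 20, "negative")), 40),
])
workforce = weighted_index([
    (achievement(Indicator("nurses_per_1000", 6, 8, "positive")), 50),
    (achievement(Indicator("doctors_per_1000", 4.4, 4.0, "parabolic")), 50),
])
availability = weighted_index([(expenditure, 50), (workforce, 50)])
print(f"{expenditure:.1f} {workforce:.1f} {availability:.2f}")  # 80.0 82.5 81.25
```

The same `weighted_index` call is reused at every level (subdimension, dimension, function), which mirrors the requirement that weights sum to 100 within each level. The parabolic rule is only one possible choice; a real application would take each indicator's functional form from the EGIPSS methodology.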

Once the percentage of achievement of the balise has been calculated, a qualitative scale of performance can be assigned. This adds precision; the levels used in this study are shown in Table 1.


Tab.1: Levels of performance

# 4. Results

This section presents a summary of the results of applying the EGIPSS model in the "Area Vasta" of the Marche region and in the Republic of San Marino (the latter in a reduced version, owing to missing data in local information systems).

Comparisons were made using the indicators for the Adaptation, Service Production, Goal Attainment, and Culture and Values Maintaining functions, which are set out in Tabs. 2, 3, 4 and 5, respectively. In addition to these tables, the authors devised a diagram that enables the reader to judge the relative performance of different healthcare organisations in terms of 'Strategic equilibrium'. Figure 2 illustrates this for the relationship between Infant Mortality and Healthcare Expenditure and Financing for the three healthcare organisations studied.



Tab. 3: Service Production function



Tab. 4: Goal Attainment function


Tab. 5: Culture and Values Maintaining function


# 5. Weaknesses and strengths of the results obtained

The model provides an overview of performance, especially for the "Area Vasta" of the Marche, through a prism of 107 indicators. These indicators can flag problems in the accessibility, technical quality, efficiency and fairness of a system.

Among the inevitable weaknesses that can affect any framework of this nature, it should be noted that evaluating performance amounts to comparing the result of an indicator, dimension or sub-dimension with a given standard. Where such standards exist, they have been used. Otherwise, performance was evaluated against external objectives (such as those set by Canada, the WHO and the OECD) or against similar results from other countries. The choice of Canada was justified by the fact that this country, in establishing its own empirical standards of excellence, used comparisons with the EU15 countries (Vrijens et al., 2016). This approach, the only practicable one available, made it possible to position the areas studied relative to states with similar healthcare systems. However, interpreting relative performance between micro-areas and states requires great caution, because methodological and contextual differences can compromise the validity of the comparisons. Further constraints were the absence of relevant indicators and the lack of data in the information systems of the two areas.

# 6. Conclusions

Performance evaluation is a process that enables the holistic analysis of healthcare systems using measurable indicators. Its role is to improve the quality of the decisions taken by all staff in the healthcare arena and in those services that can affect people's health. The principles and orientation of the EGIPSS model are useful for assessing healthcare systems at any level, whether a country, a province or even a local community.

# 7. References


Nuti S. (2008), La valutazione della performance in sanità, Il Mulino Editore.

Vrijens F., Renard F., Camberlin C. et al. (2016), Performance of the Belgian health system – Report 2015, Belgian Health Care Knowledge Centre (KCE), KCE Reports 259C.

### Yuri Calleo<sup>a</sup>, Simone Di Zio<sup>a</sup>, **Unsupervised spatial data mining for the development of future scenarios: a Covid-19 application**


<sup>a</sup> Department of Legal and Social Sciences, University "G. d'Annunzio", Chieti-Pescara, Pescara, Italy.

# **1. Introduction**

In the framework of Futures Studies, developing future scenarios can contribute to the social context by providing inputs at the decision-making level, so that action can be taken in the present. However, this requires considerable effort and a long time frame in the first two phases of scenario development (typically *Framing* and *Scanning*), which involve extensive desk research, such as reading documents, searching the scientific literature or consulting experts to identify the key factors (Bishop et al., 2007). In particular, the goal of the Scanning phase is to define a number of basic driving forces that form the basis for constructing alternative future scenarios. Some scholars (Kayser and Shala, 2020) estimated an average time of two weeks, during which the research team typically forms a panel aimed at understanding the object of study. With the exponential growth of social networks, users are constantly connected with each other, disseminating textual, multimedia and geographical content daily. Given the enormous growth of these data sources and the way in which users share ideas, thoughts and information, all this content could be exploited for scenario building.

Building on these premises, we developed a new approach that uses unsupervised classification models to speed up the first two phases of scenario development and optimize the entire process. To capture the topics and the relevant key factors, we used Machine Learning methods, including text-mining (Kayser and Blind, 2017) and Spatial Data Mining techniques. This work aims to answer the following questions: "Is it possible to obtain information on the object of study by extracting key factors from Twitter?", "Does this approach speed up the Scanning phase?" and, above all, "What contribution can spatial data mining offer to the development of future scenarios?". To apply the method, we extracted a dataset from Twitter containing textual and geo-spatial content relating to Covid-19.

# **2. Materials and Methods**

The approach used here applies unsupervised Machine Learning classification models to extract the major topics from a dataset of tweets, in order to use them as key factors in the scenario development process. During November 2020, a dataset of 60,000 tweets was extracted through the Twitter Streaming API using 95 keywords and hashtags related to discussions on Covid-19 (Uhl and Schiebel, 2017). After extracting the matrix, we imported it into Python to clean and manipulate it, and then applied the techniques needed for our analysis (after cleaning, 29,949 tweets remained). The first step converted the data matrix into numerical vectors, better defined as "number vectors" (Atenstaedt, 2012), through tokenization and lemmatisation. In natural language processing, these vectors are derived from the textual data so as to reflect various linguistic properties of the text, which requires an encoding of its features (Goldberg, 2017). First, we sought a qualitative general view of the dataset by applying the text-mining technique using the bag-of-words model, which extracts and flexibly represents the data of a given text by describing

Yuri Calleo, Gabriele d'Annunzio University, Italy, yuricalleo@yahoo.com, 0000-0002-0190-6061

Simone Di Zio, Gabriele d'Annunzio University, Italy, s.dizio@unich.it, 0000-0002-9139-1451

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Yuri Calleo, Simone Di Zio, *Unsupervised spatial data mining for the development of future scenarios: a Covid-19 application*, pp. 173-178, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.33, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

the occurrence of words within a document or corpus of documents. The model retains only the words present in a given vocabulary, and any other information is discarded a priori. We then applied Sentiment Analysis to measure the polarity of the terms in the dataset, using two distinct algorithms, *Vader* and *Afinn*, in order to compare the two results. We chose these algorithms because they are two of the most widely used in Sentiment Analysis for social networks (cf. Narasamma et al., 2021; Mayor & Bietti, 2021; Tan & Guann, 2021) and, compared with others, they are able to interpret abbreviations and emojis in the corpus of documents.
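The preprocessing and bag-of-words steps can be sketched as follows. The tiny lemma table, the vocabulary and the two example tweets are hypothetical; a real pipeline would use a proper lemmatiser (for instance from spaCy or NLTK) rather than a lookup dictionary.

```python
import re
from collections import Counter

# Hypothetical lemma table standing in for a real lemmatiser.
LEMMAS = {"cases": "case", "vaccines": "vaccine", "rising": "rise", "rises": "rise"}


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes.

    Hashtag and mention symbols are dropped, so "#covid" becomes "covid".
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def lemmatise(tokens):
    """Map each token to its lemma, leaving unknown tokens unchanged."""
    return [LEMMAS.get(tok, tok) for tok in tokens]


def bag_of_words(docs, vocabulary):
    """One count vector per document; words outside the vocabulary are discarded."""
    vectors = []
    for doc in docs:
        counts = Counter(t for t in lemmatise(tokenize(doc)) if t in vocabulary)
        vectors.append([counts[word] for word in vocabulary])
    return vectors


tweets = ["Covid cases rising again, vaccines soon?",
          "New vaccine trial: cases fall #covid"]
vocabulary = ["covid", "case", "vaccine", "rise"]
print(bag_of_words(tweets, vocabulary))  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

Lemmatising before counting is what lets "cases" and "case", or "vaccines" and "vaccine", fall into the same column of the vector.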

Vader and Afinn are rule-based lexical tools designed mainly for content published on social networks (Hutto & Gilbert, 2014). They use a vocabulary of words labelled a priori (manually) according to their semantic orientation (for example positive, negative or neutral), which the model then applies to the text. Both algorithms assign a final score based on the sum of the valence scores of the terms in the text, usually normalized between the negative (-1) and the positive (+1) extremes (Huang et al., 2019).
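The scoring principle described above (sum the valences of known terms, then squash the total into the interval from -1 to +1) can be illustrated with a minimal sketch. The six-word lexicon is hypothetical, and the normalisation x / sqrt(x² + α) with α = 15 is the one VADER uses for its compound score. VADER's additional rules for negation, intensifiers, capitalisation and emojis are deliberately omitted.

```python
import math

# Hypothetical mini-lexicon with Afinn-style integer valences (-5 to +5).
LEXICON = {"good": 3, "hope": 2, "safe": 1, "death": -2, "anxiety": -2, "bad": -3}


def raw_score(tokens):
    """Sum of the valences of the tokens found in the lexicon (unknown words count 0)."""
    return sum(LEXICON.get(tok, 0) for tok in tokens)


def normalise(score, alpha=15.0):
    """Squash an unbounded sum into (-1, 1), as VADER does for its compound score."""
    return score / math.sqrt(score * score + alpha)


for tweet in (["hope", "vaccine", "good", "safe"], ["anxiety", "death", "bad"], ["lockdown"]):
    s = raw_score(tweet)
    print(s, round(normalise(s), 3))
# 6 0.84
# -7 -0.875
# 0 0.0
```

Because the normalisation is monotonic, it preserves the ranking of tweets by raw score while making scores from texts of different lengths comparable on a bounded scale.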

Having obtained a general view of the dataset (the most cited terms and their polarity), we used topic modelling to extract possible topics and keywords from the tweets. In this case, we used Latent Dirichlet Allocation (LDA) (Tong and Zhang, 2016) with the following term frequency of term $t$ in document $d$:

$$\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}}$$

where $f_{t,d}$ is the raw count of term $t$ in document $d$.
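The term-frequency formula translates directly into code. A minimal sketch; the example tokens are hypothetical.

```python
from collections import Counter


def term_frequency(tokens):
    """tf(t, d) = f_{t,d} / sum over t' in d of f_{t',d}, for every term t in document d."""
    counts = Counter(tokens)           # raw counts f_{t,d}
    total = sum(counts.values())       # total number of tokens in d
    return {term: f / total for term, f in counts.items()}


doc = ["covid", "vaccine", "covid", "lockdown"]
print(term_frequency(doc))  # {'covid': 0.5, 'vaccine': 0.25, 'lockdown': 0.25}
```

Dividing by the document length makes the values sum to 1 within each document, so short tweets and long tweets are on the same scale.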

LDA is based on the distributional hypothesis and extracts a set of topics from a corpus of documents. Each document is mapped to a large share of the words it contains (Wang & Grimson, 2007), and the model assigns to each topic a distribution over words, which determines the key factors.

LDA assumes that topics follow a Dirichlet distribution (Minka, 2000): the similarity of documents and topics is controlled by two hyperparameters, $\alpha$ and $\beta$. If $\alpha$ is low, fewer topics are assigned to each document, while a high $\alpha$ has the opposite effect. A low $\beta$ uses fewer words per topic, while a high $\beta$ uses more words, making the topics more similar to each other. LDA does not know a priori which topics or terms will be extracted; only the number of topics $K$ is fixed in advance. For each document $d$, the model produces a vector containing the coverage of each topic, $\theta_d = (\theta_{d,1}, \theta_{d,2}, \dots, \theta_{d,K})$, where $\theta_{d,1}$ is the coverage of the first topic, and so on.
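To make the roles of α, β and θ_d concrete, here is a minimal collapsed Gibbs sampler for LDA, one standard way of fitting the model. It is an illustrative sketch, not the authors' implementation (the paper does not state which LDA estimator or library was used); the toy corpus, the number of topics K and the hyperparameter values are hypothetical.

```python
import random


def lda_gibbs(docs, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists. Returns (vocab, theta, phi), where theta[d][k] is the
    coverage of topic k in document d and phi[k][v] is the weight of word v in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * K for _ in docs]        # tokens of document d assigned to topic k
    nkw = [[0] * V for _ in range(K)]    # occurrences of word v assigned to topic k
    nk = [0] * K                         # tokens assigned to topic k
    z = []                               # current topic of every token
    for d, doc in enumerate(docs):       # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], index[w]
                ndk[d][k] -= 1           # remove the current token from the counts
                nkw[k][v] -= 1
                nk[k] -= 1
                # p(z = j | rest) is proportional to (n_dj + alpha)(n_jv + beta) / (n_j + V*beta)
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                           for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k              # add it back under the sampled topic
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    return vocab, theta, phi


docs = [["vaccine", "dose", "hospital", "vaccine"],
        ["recession", "employee", "shop", "recession"],
        ["vaccine", "hospital", "dose", "dose"],
        ["employee", "shop", "recession", "shop"]]
vocab, theta, phi = lda_gibbs(docs, K=2)
for d, row in enumerate(theta):
    print(d, [round(x, 2) for x in row])
```

Each row of `theta` is the coverage vector θ_d and sums to 1. With a small α the rows tend to concentrate on a single topic, and raising α tends to spread coverage across topics, consistent with the behaviour described above.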

To answer the research questions, we propose an analysis of georeferenced data that improves the whole process by adding important spatial information. Here we use the expression "georeferenced data" in a broad sense, including any kind of information that links a tweet to a geographic object, where the object can be a unit of a vector shapefile layer (for example a country). Numerous studies have analysed Twitter using text-mining or opinion-mining techniques (Pang & Lee, 2008; Taboada et al., 2011; Liu, 2012; Poria et al., 2014). Few studies, on the other hand, have focused on building future scenarios from georeferenced data extracted from social networks. In our case the spatial aspect is of fundamental importance, since a geographical view of the subject benefits the development process.

In the scientific literature, some studies (including Haining, 2010) have highlighted the importance of georeferenced data, so the presence of such information in the data (if any) is worth exploring. In practice, spatial information is not easy to extract through web mining, since geographic coordinates (latitude and longitude) are rarely available. Social networks, which previously provided large quantities of data on users' positions freely, have recently restricted the extraction of such data. In our case, having used extraction through the streaming API, it was possible to model them.

First, we replaced the missing values with the label "data\_2 NA"; after this step, we obtained 20,372 tweets with geographic information included. We then linked each tweet to the country from which it was written by means of its location information (e.g. village or city), so that, for example, a tweet from Paris is assigned to France. From this we obtained the frequency distribution of the number of tweets by country, topic and key factor, which allowed us to calculate the discussion rate for a single topic in a single country. These relative frequencies were then imported into GIS software (QGIS Development Team, Open Source Geospatial Foundation Project, http://qgis.osgeo.org) to create cartograms for the topics, representing their spatial distributions.
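The linking and counting steps can be sketched as below. Everything here is an assumption for illustration: the four-city gazetteer is hypothetical (a real study needs a far larger lookup table or a geocoding service), and the discussion rate is taken to be the share of a country's geolocated tweets that belong to a given topic, since the paper does not spell out the denominator.

```python
from collections import Counter, defaultdict

# Hypothetical gazetteer from user-declared locations to countries.
GAZETTEER = {"paris": "France", "lyon": "France", "rome": "Italy", "milan": "Italy"}
MISSING = "data_2 NA"   # label used for tweets without location information


def discussion_rates(tweets):
    """tweets: iterable of (location, topic) pairs.

    Returns {country: {topic: share of that country's geolocated tweets}}.
    Tweets with a missing or unrecognised location are skipped.
    """
    by_country = defaultdict(Counter)
    for location, topic in tweets:
        if location == MISSING:
            continue
        country = GAZETTEER.get(location.strip().lower())
        if country is not None:
            by_country[country][topic] += 1
    rates = {}
    for country, counts in by_country.items():
        total = sum(counts.values())
        rates[country] = {topic: n / total for topic, n in counts.items()}
    return rates


sample = [("Paris", "health"), ("Lyon", "economic"), ("Paris", "health"),
          ("Rome", "political"), ("data_2 NA", "health"), ("Atlantis", "social")]
print(discussion_rates(sample))
# {'France': {'health': 0.6666666666666666, 'economic': 0.3333333333333333}, 'Italy': {'political': 1.0}}
```

The resulting country-by-topic table is the kind of attribute table that would then be joined to a country layer in a GIS to draw one cartogram per topic.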

# **3. Results and discussion**

The results obtained fully answer the research questions. Through text-mining and spatial data mining techniques, it was possible to extract from our dataset the influencing factors for future scenario development. Sentiment analysis made it possible to measure the polarity of the terms in our matrix, and both algorithms identified more positive words than negative ones. The results are shown in Table 1.


The key factors were extracted through topic modelling, which identified five topics, each with six related keywords (Table 2). The first topic focuses on *health* aspects and helps to understand how the pandemic has caused health problems, discomfort and death. Beyond the physical aspect, the psychological aspect has also been affected, as shown by the term "anxiety" among the keywords of this topic. The uncertainty about vaccination that persisted in November 2020 should not be underestimated either: it is of fundamental importance precisely because it fuelled, and still feeds, discussions and conspiracy theories (see topic 3). The second topic describes *political* aspects; it is worth noting that the dataset, being in English and having been extracted during the American elections, was strongly influenced by them. This can be seen from the terms "government" and "trump". The third topic concerns denial and *conspiracy*, as shown by the words "forced", "reality", "protest" and "planning".


The fourth topic refers to the *economic* field: governments (see keyword "recession") were suffering from the pandemic. So were citizens (see keyword "employee"): forced to close shops, companies, etc. to prevent the spread of infection, they found themselves in a difficult situation, with consequences at work and in their personal lives. Finally, the fifth topic concerns the *social* context and shows how the pandemic has affected the social structure. Social distancing measures adopted by governments prevented the normal conduct of social activities. Moreover, the release by governments of applications for tracking movements sparked debate on social networks, specifically on possible complications and on the possible violation of citizens' privacy. The results are shown in Table 2. After analysing the keywords for each topic, we constructed a cartogram for each of them (Figures 1-5).

Specifically, topic 1 (Fig. 1), concerning health, was discussed most in Austria, Brazil, Canada, Greece, the Philippines and Turkey. These higher discussion rates may reflect the larger numbers of Covid-19 infections and deaths these countries were experiencing during the period studied. Topic 2 (Fig. 2), covering political discussions on Twitter, was discussed more in American countries, Australia, Germany, South Korea and New Zealand, probably because of political involvement in the American elections held in November 2020. Topic 3 (Fig. 3), on conspiracy, involves many countries, for example Spain, Sri Lanka, New Zealand, China and Pakistan. China, where the virus first appeared, subsequently had to confront conspiracy theories about the nature of the virus and tried to refute them rapidly. Topic 4 (Fig. 4), depicting discussion rates for the economic topic, was most discussed in Iran, China, Japan, Malaysia and the United Kingdom, probably because these countries were particularly affected by the economic damage, which prompted a strong response from central governments. The last topic (Fig. 5), on social issues, was most discussed in Singapore, Switzerland, Uganda and Sweden. African countries deserve particular attention: some of them, such as Nigeria, Uganda, Gambia and Kenya, show high discussion rates compared with other continents, probably because the pandemic added new social problems to those already existing.

Since the world scale does not show all the details, especially for smaller countries, Figure 6 reports the five cartograms with a focus on the European region.

The analysis of georeferenced data has fully answered our research questions, given that the results can be used in futures studies to support the initial process of constructing future scenarios. This approach provides an effective tool for the development of future scenarios, greatly reducing the time required for the *Framing* and *Scanning* phases. Furthermore, it contributes to futures research from a statistical-spatial perspective, in particular in the field of spatial scenarios.

Starting from these results, the scenario planning process will continue with the forecasting phase (Bishop et al., 2007; Hines and Bishop, 2015), which consists of the generation of a sufficient number of alternative futures.

# **4. Concluding remarks**

The approach developed above confirms the possibility of introducing text-mining and spatial data mining within the first two phases of scenario development (Framing and Scanning). It was therefore possible to extract the influencing factors in a short time frame, without any literature review of the object studied and without consulting experts. In addition to speeding up the process, our study enriches the analysis through the spatial component, which offers important insights by making it possible to observe dynamics in geographical distributions. Knowing in which situations and in which parts of the globe a certain key factor is discussed provides considerably more information. The analysis of Twitter data is only a starting point: future studies could also consider other social networks (e.g. Reddit, Facebook, Instagram). Furthermore, much larger datasets could be analysed to obtain a more complete view of a given subject.

We recommend that subsequent studies focus on spatial analysis, which is too often underestimated in futures studies but can provide important information and, combined with text-mining techniques, could mark an important turning point in the development of scenarios and/or spatial scenarios.

It is worth noting that the method proposed in this paper produces spatial data that can be analyzed with the typical tools of spatial statistics. For example, a spatial autocorrelation analysis could reveal similarities between adjacent countries, although this was not possible in this case study, given the very low contiguity of the nations included in the dataset.
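As an illustration of such an analysis, global Moran's I can be computed directly from a vector of discussion rates and a spatial weights matrix. A minimal sketch: the four "countries", their values and their chain-shaped binary contiguity matrix are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weights matrix w_ij.

    I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_sum) * cross / sum(d * d for d in dev)


# Hypothetical chain of four contiguous countries: 0-1, 1-2, 2-3.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(round(morans_i([1, 2, 3, 4], W), 3))   # 0.333: similar values sit next to each other
print(round(morans_i([1, 4, 1, 4], W), 3))   # -1.0: neighbouring values alternate
```

When almost no units share a border, as in this case study, most rows of the weights matrix are zero; the statistic then rests on very few pairs and becomes undefined if W sums to zero.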

# **References**


### Massimo Aria<sup>a</sup>, Corrado Cuccurullo<sup>b</sup>, Agostino Gnasso<sup>a</sup>, **Supporting decision-makers in healthcare domain. A comparative study of two interpretative proposals for Random Forests**

<sup>a</sup> Department of Economics and Statistics, University of Naples Federico II, Italy

<sup>b</sup> Department of Economics, University of Campania Luigi Vanvitelli, Italy



# **1. Introduction**

Today, the availability of data is growing exponentially in all sectors, especially in healthcare. Machine Learning (ML) techniques make it possible to analyze big data to extract knowledge and support healthcare activities (Miotto et al., 2018), such as models for the diagnosis of complex diseases (Dhillon and Singh, 2019; Aria et al., 2020). Although the use of ML is spreading in many applications, it has some limitations and disadvantages.

The main drawback of ML is its lack of interpretability, which prevents users from representing causal relationships and interactions between predictors and response. As a result, users cannot learn how particular decisions are made. This problem gives rise to the notion of the Black Box model: a highly accurate model whose complexity is too large to be represented by a relational structure. In other words, it is not possible to see how it works internally.

Furthermore, the opaque nature of these models hinders their application in various sectors, especially critical ones such as healthcare. In a decision-making process, trust in a machine learning model is essential for users to feel reassured when analyzing and using it.

Ribeiro et al. (2016) identify two different but related definitions of trust: trust in a prediction and trust in a model. Trusting a prediction means that the user will take a certain action based on it; measuring this confidence is important because the model will be used to make decisions. Consider, for example, a clinical decision-making process and the consequences of acting with absolute confidence on predictions without being able to understand how they were obtained. Trusting a model means evaluating the model as a whole and testing its ability to generalize with appropriate evaluation metrics. A recurring problem with real-world data is that they often differ significantly from the data used for evaluation, so the chosen metric may not be adequate; inspecting individual predictions and their explanations may therefore be the best choice.

In this work, we focus on one of the most widely used, accurate and high-performing models in Machine Learning, the Random Forest (RF) model (Breiman, 2001).

Random Forest is an evolution of Bagging (see Breiman, 1996), which aims to reduce the variance of a statistical model: it simulates the variability of the data by randomly drawing bootstrap samples from a single training set and aggregates the predictions for a new record. Random Forest goes further, aiming to obtain trees that are even more diverse and less correlated, by considering only a random subset of predictors at each split. It is known as an efficient ensemble learning model, as it ensures high predictive accuracy, flexibility and immediacy; its construction process is considered intuitive and understandable, but it is also considered a Black Box model because of the large number of deep decision trees it produces (Haddouchi and Berrado, 2019).
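The two sources of randomness just described, bootstrap samples and random predictor subsets, can be illustrated with a deliberately simplified forest. This is a teaching sketch, not Breiman's algorithm: each "tree" is a single-split stump, and the predictor subset is drawn once per tree instead of at every split of a deep tree. The toy data are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in an iterable."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Single split minimising misclassifications over the given features."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2                      # midpoint between observed values
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(l != l_lab for l in left) + sum(r != r_lab for r in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_lab, r_lab)
    if best is None:                                 # no possible split: constant model
        constant = majority(y)
        return lambda row: constant
    _, f, thr, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= thr else r_lab


def fit_forest(X, y, n_trees=25, m_features=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
        feats = rng.sample(range(p), m_features)     # random subset of predictors
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    return majority(tree(row) for tree in forest)   # majority vote of the ensemble


X = [[x, 2 * x] for x in (0, 1, 2, 3, 4, 10, 11, 12, 13, 14)]
y = [0] * 5 + [1] * 5
forest = fit_forest(X, y)
print([predict(forest, row) for row in ([2.5, 5], [12.5, 25])])
```

Even in this toy version, no single stump needs to be inspected to obtain a prediction, which hints at why a real forest of hundreds of deep trees is hard to interpret as a whole.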

Massimo Aria, University of Naples Federico II, Italy, massimo.aria@unina.it, 0000-0002-8517-9411

Corrado Cuccurullo, University of Campania Luigi Vanvitelli, Italy, corrado.cuccurullo@unicampania.it, 0000-0002-7401-8575

Agostino Gnasso, University of Naples Federico II, Italy, agostino.gnasso@unina.it, 0000-0002-9220-9754


Massimo Aria, Corrado Cuccurullo, Agostino Gnasso, *Supporting decision-makers in healthcare domain. A comparative study of two interpretative proposals for Random Forests*, pp. 179-184, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.34, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

The results obtained with Random Forest are valuable. Various studies have confirmed its effectiveness in many sectors, for example in biomedicine for gene selection (Díaz-Uriarte and De Andres, 2006). Breiman (2001) grades Random Forest A+ for predictive performance but, since its prediction process is difficult to understand, F for interpretability. This leads to Occam's dilemma (Domingos, 1998, 1999).

This poor interpretability has prevented the adoption of the model in sectors with little or no tolerance for errors, such as healthcare and clinical contexts (Ahmad et al., 2018). With interpretability as a common goal, in recent years the scientific community has shown considerable interest in Interpretable Machine Learning, which is now an extremely open and active research field, with numerous new approaches emerging every year (Adadi and Berrada, 2018; Du et al., 2019; Guidotti et al., 2018).

This research compares two approaches proposed in the literature to overcome the interpretability problem: Node Harvest by Meinshausen (2010) and inTrees by Deng (2019). Both are post-processing interpretation methods. They are also classed as Rule Extraction approaches (Haddouchi and Berrado, 2019), as they focus on extracting rule sets. Both proposals build an understandable model from the rules extracted from a Random Forest. The general idea is to identify a representative weak model that provides the interpretation, selected from the sequence of weak models generated by the ensemble procedure. In particular, Node Harvest selects the set of rules through weights assigned by quadratic programming with linear inequality constraints. In this way it reconciles two objectives: interpretability and predictive accuracy.

Similarly, inTrees obtains interpretable information by extracting and processing rules from a tree ensemble. The extracted rules are used to build a learner that makes predictions on new data.

inTrees works through a series of algorithms that first extract the rules and evaluate them, then prune each rule by removing conditions that are irrelevant or only add noise. Next, these algorithms select a compact set of relevant, non-redundant rules. Frequent variable interactions are extracted and, finally, everything is summarized in a learner (the Simplified Tree Ensemble Learner, STEL) that is used to make predictions on new data.
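The rule-level quantities involved in this process can be illustrated with a minimal sketch. Here a rule is a conjunction of conditions with a predicted class, its frequency is the share of observations it covers, and its error is the share of covered observations it misclassifies. The greedy pruning below drops a condition when doing so raises the error by no more than a fixed tolerance. It is written in the spirit of inTrees's pruning step but is not its exact algorithm, and the data, feature indices and tolerance are hypothetical.

```python
import operator

OPS = {"<=": operator.le, ">": operator.gt}


def covers(conditions, row):
    """True if the row satisfies every (feature, op, threshold) condition."""
    return all(OPS[op](row[f], thr) for f, op, thr in conditions)


def rule_metrics(conditions, pred, X, y):
    """Length, frequency (share covered) and error (share misclassified) of a rule."""
    covered = [lab for row, lab in zip(X, y) if covers(conditions, row)]
    freq = len(covered) / len(X)
    err = sum(lab != pred for lab in covered) / len(covered) if covered else 1.0
    return {"len": len(conditions), "freq": freq, "err": err}


def prune(conditions, pred, X, y, tolerance=0.05):
    """Drop conditions (last to first) whose removal raises the error by <= tolerance."""
    conds = list(conditions)
    for i in range(len(conds) - 1, -1, -1):
        if len(conds) == 1:
            break
        base = rule_metrics(conds, pred, X, y)["err"]
        trial = conds[:i] + conds[i + 1:]
        if rule_metrics(trial, pred, X, y)["err"] - base <= tolerance:
            conds = trial
    return conds


# Hypothetical columns: glucose, age, noise; label 1 exactly when glucose > 120.
X = [[130, 25, 3], [140, 50, 7], [125, 30, 9], [100, 45, 2],
     [90, 22, 6], [110, 60, 1], [150, 35, 4], [80, 19, 8]]
y = [1, 1, 1, 0, 0, 0, 1, 0]
rule = [(0, ">", 120), (1, ">", 20), (2, "<=", 5)]
print(rule_metrics(rule, 1, X, y))       # {'len': 3, 'freq': 0.25, 'err': 0.0}
pruned = prune(rule, 1, X, y)
print(pruned, rule_metrics(pruned, 1, X, y))
# [(0, '>', 120)] {'len': 1, 'freq': 0.5, 'err': 0.0}
```

Pruning here shortens the rule from three conditions to one and doubles its frequency without increasing its error, which is exactly the kind of simplification that makes extracted rules easier to read.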

# **2. Comparison Study**

We compare Node Harvest and inTrees on four health datasets.

The comparison is empirical: the performance of each approach is evaluated with metrics computed from its output and compared against a reference standard (Aria et al., 2021).

When predictive models are used for classification, performance metrics are based on the confusion matrix, which cross-tabulates the observed class labels against the predicted ones; Table 1 shows the structure of a 2×2 confusion matrix.
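For a binary problem the confusion matrix reduces to four counts. A minimal sketch, assuming the positive class is coded 1 and using one common layout (rows are observed classes, columns are predicted classes, positive first); the label vectors are hypothetical.

```python
def confusion_matrix(observed, predicted, positive=1):
    """Return [[TP, FN], [FP, TN]]: rows are observed classes, columns predicted ones."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    tp = fn = fp = tn = 0
    for obs, pred in zip(observed, predicted):
        if obs == positive:
            if pred == positive:
                tp += 1          # positive correctly predicted
            else:
                fn += 1          # positive missed
        elif pred == positive:
            fp += 1              # negative wrongly flagged as positive
        else:
            tn += 1              # negative correctly predicted
    return [[tp, fn], [fp, tn]]


observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_matrix(observed, predicted))  # [[3, 1], [1, 3]]
```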

The analysis is conducted on four binary classification health datasets, available in the UCI Machine Learning Repository, which have different characteristics (see Table 2).

Table 1: Confusion Matrix


Table 2: Main characteristics of the selected health datasets.


The analysis is structured as follows. First, a random forest is fitted to each of the four datasets to obtain the performance of the standard model, in terms of the confusion matrix and the prediction of the target variable. Then the set of rules is extracted to investigate the paths followed by each observation, and the most important and frequent rules in the set are shown.

Finally, the sets of rules obtained with the two methodologies are compared. The final performance evaluation uses nine metrics computed from the confusion matrices: Accuracy, Precision, Sensitivity, Specificity, G-Mean, F1 Score, Youden's Index, Balanced Accuracy, and Kappa (see Sokolova et al., 2006; García et al.; Akosa).
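All nine metrics can be computed from the four cells of a 2x2 confusion matrix. The Python sketch below uses the standard textbook definitions: G-mean as the geometric mean of sensitivity and specificity, and Cohen's kappa with chance agreement taken from the row and column margins. The study does not state which variants it used, so treat these definitions as assumptions. The counts are hypothetical.

```python
import math


def _ratio(num: float, den: float) -> float:
    """Division that returns NaN instead of failing on an empty denominator."""
    return num / den if den else float("nan")


def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Nine metrics from a 2x2 confusion matrix (rows: observed, columns: predicted)."""
    n = tp + fn + fp + tn
    sens = _ratio(tp, tp + fn)   # true positive rate
    spec = _ratio(tn, tn + fp)   # true negative rate
    prec = _ratio(tp, tp + fp)   # positive predictive value
    p_o = (tp + tn) / n          # observed agreement (accuracy)
    # chance agreement: predicted-positive x observed-positive + predicted-negative x observed-negative
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "accuracy": p_o,
        "precision": prec,
        "sensitivity": sens,
        "specificity": spec,
        "g_mean": math.sqrt(sens * spec),
        "f1": _ratio(2 * prec * sens, prec + sens),
        "youden": sens + spec - 1,
        "balanced_accuracy": (sens + spec) / 2,
        "kappa": _ratio(p_o - p_e, 1 - p_e),
    }


metrics = confusion_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in metrics.items():
    print(f"{name:>17}: {value:.3f}")
```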

Examples of the outputs of the Node Harvest and inTrees approaches are given below, taken from the analysis of the Pima Indians Diabetes data. Node Harvest displays the set of rules in an explanatory plot (Figure 1), whereas inTrees summarizes the most frequent rule sets in easy-to-read tables (Table 3).

Table 3: inTrees (STEL) on Pima Indians Diabetes: set of decision rules that are easily applicable to new data. The impRRF value measures the relative percentage decrease in the Gini index for each rule derived from the random forest. The impRRF considers the length of each rule as a proxy of its complexity.


Table 4 shows the nine performance metrics computed on the four health datasets; the highest score for each metric is marked in bold. First of all, the interpretative solutions proposed by Node Harvest (NH) and inTrees (STEL) are understandable approximations that provide an accurate summary of the Random Forest structure: on all datasets, their metrics are very close to the reference values provided by the RF.

Figure 1: Rule set plot obtained from Node Harvest on Pima Indians Diabetes.

Focusing on the comparison, inTrees generally obtained higher scores on all the analyzed datasets. In particular, on EEG Eye State and Diabetic Retinopathy Debrecen it shows much higher classification performance. It is worth noting, however, that Node Harvest reports higher sensitivity on all datasets; this may be because this classifier is better at recognizing positive observations.

# **3. Conclusion**

inTrees represents an excellent strategy for obtaining interpretable learners from Random Forest models.

The results of this methodology are all the more valuable because the simplified rules of the STEL classifier can be implemented in any programming language.

This work is a starting point for understanding the potential of Interpretable Machine Learning, which requires innovative approaches that can meet the interpretative needs of each application context, such as healthcare. A more complete comparative analysis should consider data characterized by unbalanced responses, missing data (D'Ambrosio et al., 2012), and multiclass responses.


Precision 0.78 0.74 **0.78**
G-mean 0.66 0.62 **0.68**
F1 0.79 **0.82** 0.79
Youden's Index 0.35 0.35 **0.38**

Precision 0.71 0.64 **0.70**
G-mean 0.73 0.68 **0.71**
F1 0.74 0.72 **0.73**
Youden's Index 0.47 0.38 **0.43**

Table 4: Summary tables of the performance metrics computed on the four health datasets.

# **References**


pp. 37–43.


*pattern recognition and image analysis*, pp. 441–448. Springer.


Meinshausen, N. (2010). Node harvest. *The Annals of Applied Statistics*, pp. 2049–2072.

Miotto, R., Wang, F., Wang, S., Jiang, X., and Dudley, J. T. (2018). Deep learning for healthcare: review, opportunities and challenges. *Briefings in bioinformatics*, **19**(6).

Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. In *Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining*, pp. 1135–1144.

Sokolova, M., Japkowicz, N., and Szpakowicz, S. (2006). Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In *Australasian joint conference on artificial intelligence*, pp. 1015–1021. Springer.

### **Media and fake news: An analysis of citizens' attitudes toward misinformation in European countries**

Mauro Ferrante<sup>a</sup>, Anna Maria Parroco<sup>b</sup>

<sup>a</sup> Department of Culture and Society, University of Palermo, Italy

<sup>b</sup> Department of Psychology, Educational Science and Human Movement, University of Palermo, Italy

# **1. Introduction**

The rapid changes brought about by the rise of the Internet and by the recent spread of social media in everyday life have had profound consequences on the quantity and quality of the information made available and on the mechanisms of its dissemination. Today, information is increasingly shared through decentralized mechanisms in which social media act as a distribution channel, thanks to tools and platforms that enable peer-to-peer sharing (Baldacci & Pelagalli, 2017). The rapid spread of online misinformation is one of the most discussed issues today and has been identified by the World Economic Forum (2013) as one of the top trends in modern societies, partly because of its link with political communication. Besides the decentralization of information already mentioned, the reasons for the relevance of this phenomenon include the loss of control by the media over the dissemination process, which is increasingly determined by algorithms that decide what to show, when, and to whom in an unpredictable way, and the growing power of Internet giants such as Google, Facebook, and Twitter, to mention but a few, in deciding whom to allow to publish news, what news to show, to whom to show it, and how to profit from this process. This matters because the aims of online disinformation include generating interaction on social media, earning profits from advertising, and discrediting someone's image (Figueira & Oliveira, 2017). It is therefore important to better understand citizens' attitudes and trust toward the media and, possibly, to identify the determinants of different attitudes.

Starting from these premises, the present work analyses the attitude of European citizens toward fake news and disinformation. After briefly reviewing the growing literature on fake news and disinformation, and using micro-data from the Flash Eurobarometer survey on "Fake news and disinformation online" (European Commission, 2018), we propose a segmentation of users according to their attitude towards different types of media. The resulting clusters are then characterized both in terms of socio-demographic characteristics and in relation to users' behaviour and opinions regarding misinformation. Given the social and political relevance of misinformation, potential strategies to tackle fake news and online misinformation are discussed.

# **2. Background**

Fake news and misinformation are not new phenomena. However, since the U.S. Presidential election of November 2016, the use of the term "fake news" has increased rapidly (Rose, 2020). Terms such as "post-fact" and "alternative facts" have also emerged in new media communications. These terms refer to the deliberate distortion of news with the aim of influencing public opinion and exacerbating internal divisions in society (Martens et al., 2018). This has raised concern about fake news and its capacity to generate confusion among the public. The term fake news generally refers to deliberately fraudulent media products; it is a more severe judgement than "biased news". Fake news also differs from online satire. However, except in striking cases, it is often not easy to draw the line between satire and the intention to discredit.

Allcott and Gentzkow (2017) define fake news as "news articles that are intentionally and verifiably false and could mislead readers" (Allcott and Gentzkow, 2017, p. 213), that is, based on entirely false facts. Dentith (2016) points out that fake news is an "allegation that some story is misleading – it contains significant omissions – or even false – it is a lie – designed to deceive its intended audience" (Dentith, 2016, p. 66), with facts that may be entirely false, contain partial truths, or involve omissions that distort the real facts. Fake news and misinformation have attracted the interest of researchers and institutions seeking to identify the mechanisms through which fake news spreads and, possibly, strategies for detecting it (Shao, 2018). The use of social media as a source of news has further amplified the problem: as news bounces from user to user, it is likely to be contaminated, that is, to undergo considerable changes until it becomes fake news itself. It is undisputed that mainstream media have been progressively displaced by social media as a source of information. Consequently, individuals must be able to distinguish reliable from unreliable information.

Mauro Ferrante, University of Palermo, Italy, mauro.ferrante@unipa.it, 0000-0003-1287-5851

Anna Maria Parroco, University of Palermo, Italy, annamaria.parroco@unipa.it, 0000-0003-3213-7805

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Mauro Ferrante, Anna Maria Parroco, *Media and fake news: An analysis of citizens' attitudes toward misinformation in European countries*, pp. 185-190, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.35, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Some studies have examined the main factors behind fake news and found that both individual (micro) and contextual variables play a role (Kim & Kim, 2020). People's attitudes and citizens' perceptions of fake news have recently been investigated by several authors (Reuter et al., 2019; Borges-Tiago et al., 2020; Quan‐Haase et al., 2018; Dinev et al., 2009; Fletcher et al., 2018). They agree that age, education, tech profile, and cultural and ideological differences among users are relevant variables in shaping attitudes towards fake news and disinformation. Reuter et al. (2018), referring to a survey conducted in Germany, find that younger or more educated people are better able to identify fake news, and that liberal or left-wing persons are more critical; Borges-Tiago et al. (2020) show that citizens' attitudes towards fake news differ among European countries and report that younger and tech-savvy users are more likely than others to recognize fake news. Quan‐Haase et al. (2018) highlight the importance of information literacy and information technology skills, and Dinev et al. (2009) focus on the cultural dimension. Finally, Fletcher et al. (2018), in presenting results on Italian and French attitudes towards fake news, stress the role of policy makers and of private and public companies in regulating information sources. Nonetheless, few studies have assessed whether populations can be segmented according to their attitude toward media. The present work aims to fill this gap and to assess whether these segments exhibit specific characteristics, both in terms of socio-demographic profile and of media use.

# **3. Data and Methods**

This study uses micro-data from the European Commission Flash Eurobarometer 464 on "fake news and disinformation online" (European Commission, 2018). The survey, carried out by telephone in the 28 Member States in 2018 on a sample of about 26 thousand respondents, explores EU citizens' awareness of and attitudes toward fake news and online disinformation. Detailed information on the survey, as well as the questionnaire and the micro-data, is made available by the European Commission through the official portal for European data: https://data.europa.eu.

To identify the main determinants of users' attitudes towards misinformation and fake news, users were first clustered according to their attitude toward the six media types considered (i.e. printed newspapers and news magazines; online newspapers and news magazines; online social networks and messaging apps; television; radio; video hosting websites and podcasts). Then, the association of socio-demographic characteristics and media use with the resulting clusters was explored, in order to characterize different user profiles across European countries.

Given the categorical nature of the data on the level of trust in media, *k*-mode clustering (Huang, 1997) was used. In this approach, let A1, A2, …, A6 be the attributes describing the categorical space Ω, which represent the users' opinions on the media types considered; the domain DOM(Aj) of each categorical attribute Aj consists of three answer categories: *trust*, *don't trust*, and *don't use that media type*. A categorical object X ∈ Ω is represented by its attribute-value pairs, one for each attribute, and can be written as a vector [*x1, x2, …, x6*]. Let X = {X1, X2, …, Xn} be the set of *n* categorical objects observed on the *n* sample units. We write Xi = Xz if *xi,j = xz,j* for 1 ≤ *j* ≤ 6, i.e. if two generic sample units, *i* and *z*, have the same value on each of the 6 attributes.

The *k*-mode algorithm extends the *k*-means clustering procedure to categorical variables (Chaturvedi et al., 2001): it partitions the objects into *k* groups so that the total distance from the objects to the modes of their assigned clusters is minimized. A mode of X is a vector Q = [*q1, q2, …, q6*] that minimises the total dissimilarity between Q and the objects in X, where the dissimilarity *d* between two objects is the number of attributes on which they differ (simple-matching distance). The *k*-mode algorithm works iteratively: it selects *k* initial modes, allocates each unit to the cluster with the nearest mode according to *d*, then retests the dissimilarity of the units against the updated modes and reallocates them to the cluster with the nearest mode, until no unit changes cluster after a full cycle through the whole dataset (Huang, 1997). The analysis was carried out with the *klaR* package for the *R* software.
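The procedure just described can be sketched in a few lines of Python. This is a simplified batch variant (all units are reallocated, then the modes are updated), not the *klaR* implementation used in the study. The answer codes `T`, `N`, `U` (trust, don't trust, don't use) and the six toy respondents are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Obj = Tuple[str, ...]


def matching_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple-matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_mode(objs: List[Obj]) -> Obj:
    """Attribute-wise most frequent category; this minimises the total matching distance."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*objs))


def k_modes(data: List[Obj], init_modes: List[Obj], max_iter: int = 100):
    """Alternate nearest-mode allocation and mode updates until no unit moves."""
    modes = [tuple(m) for m in init_modes]
    labels: list = [None] * len(data)
    for _ in range(max_iter):
        new_labels = [min(range(len(modes)), key=lambda c: matching_distance(x, modes[c]))
                      for x in data]
        if new_labels == labels:  # no unit changed cluster: stop
            break
        labels = new_labels
        for c in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous mode if a cluster becomes empty
                modes[c] = column_mode(members)
    cost = sum(matching_distance(x, modes[lab]) for x, lab in zip(data, labels))
    return labels, modes, cost


# Six hypothetical respondents answering on six media types.
answers = [
    ("T", "T", "T", "T", "T", "U"), ("T", "T", "N", "T", "T", "U"),
    ("T", "T", "T", "T", "T", "T"), ("N", "N", "N", "N", "N", "N"),
    ("N", "N", "N", "N", "T", "N"), ("N", "N", "N", "U", "N", "N"),
]
labels, modes, cost = k_modes(answers, [answers[0], answers[3]])
print(labels, modes, cost)
```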

Having identified the clusters that characterise our sample according to attitude toward different types of media, we used a regression model to quantify the association of socio-demographic characteristics, and of users' behaviour and opinions regarding misinformation, with cluster membership. The covariates included in the model were: 1) *Gender,* 2) *Age,* 3) *Occupation,* 4) *Social network use (How often do you use online social networks?),* 5) *Reading and sharing attitude on social network (Do you read or share things when using social network?),* 6) *Presence of fake news and misinformation in the media (Do you come across news which misrepresent reality or are even false?),* 7) *Confidence in the ability to detect fake news (Are you confident that you are able to identify news or information that misrepresent reality or is even false?),* 8) *Perception on the danger of misinformation and fake news (Is the existence of news or information that misrepresent reality a problem in your country or for democracy in general?).* Since the response variable (cluster membership) is categorical, a multinomial logistic regression model was used.

# **4. Results**

The dataset under analysis consists of 26,576 respondents residing in one of the EU28 countries. As also reported by the European Commission, at an aggregate level most respondents tend to trust the news and information they receive through radio (70%), television (66%) and printed media (63%). However, less than half (47%) trust online newspapers and magazines, and lower proportions trust video hosting websites and podcasts (27%) and online social networks and messaging apps (26%). These results are consistent across all the Member States (European Commission, 2018). Before running the clustering algorithm, cases with missing data on at least one of the covariates used in the analyses were removed, which reduced the dataset to 22,384 cases. The number of clusters for the *k*-mode algorithm was then set to 5 according to the elbow method. Table 1 reports the modes of each item, i.e. the modal attitude toward each type of media considered.
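The paper does not say how the elbow was located. One common formalization, assumed here, picks the number of clusters whose point on the cost curve lies farthest from the straight line joining the first and last points. The cost values below are hypothetical and are not the study's results.

```python
import math
from typing import List, Optional, Sequence


def elbow_k(costs: Sequence[float], ks: Optional[Sequence[int]] = None) -> int:
    """Return the k whose (k, cost) point is farthest from the chord joining the endpoints."""
    ks: List[int] = list(range(1, len(costs) + 1)) if ks is None else list(ks)
    (x1, y1), (x2, y2) = (ks[0], costs[0]), (ks[-1], costs[-1])
    norm = math.hypot(x2 - x1, y2 - y1)

    def distance(x: float, y: float) -> float:
        # perpendicular distance from (x, y) to the line through the two endpoints
        return abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / norm

    return max(zip(ks, costs), key=lambda point: distance(*point))[0]


# Hypothetical k-mode costs (total matching distance) for k = 1, ..., 6.
print(elbow_k([100, 60, 35, 25, 20, 18]))
```

Because the rule depends on the relative scaling of the two axes, it is best read as a heuristic aid to visual inspection of the curve.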

The results of the *k*-mode clustering procedure highlight different segments of users according to their attitude toward different types of media. Those we call *Impatients* are users who tend to trust online social networks, radio, and television, but tend not to trust printed or online newspapers and magazines. Conversely, the *Traditionalists* trust mainly traditional sources of information, such as printed or online newspapers and radio; they tend not to trust social networks and television, and they do not use video-hosting websites or podcasts. A distinct group, the *Sceptics*, tends not to trust any type of media. A fourth group, here named *News buff*, trusts printed or online newspapers, radio, and television. They are very similar to the *Traditionalists*, except that they trust television, whereas the *Traditionalists* do not. Finally, the last group can be labelled *Credulous*: they trust almost every type of media, the only exception being video hosting websites, which this group generally does not use.


**Table 1.** Cluster modes according to the attitude toward different types of media.

Table 2 summarizes the beta coefficients, odds ratios (OR), and related *p*-values of the multinomial logistic regression model. All the considered factors appear significant, although their effects differ across clusters. Conditional on the other variables, and taking the *Traditionalists* as the baseline cluster, *Gender* is significantly associated with the *Sceptics*: being "Male" multiplies the odds of belonging to this cluster rather than to the baseline by about 1.30, whereas being "Female" is associated with the *Credulous* (OR=1.11). As regards *Age*, being aged between "25 and 39 years old" and "older than 55 years old" decreases the 'risk' of belonging to the *Impatients* compared with the other age categories. Being older than 55 also decreases the risk of belonging to the *News buff*, who tend to be younger than the *Traditionalists.* Different occupational profiles characterize the various clusters. Being a "manual worker" or a "not worker" increases the 'risk' of belonging to the *Impatients* (OR=1.46 and 1.23, respectively); similarly, "not workers" are more likely to belong to the *Sceptics* (OR=1.23) than the other occupational categories, whereas being "self employed" is negatively associated with the *Credulous* (OR=0.75). Regarding *social network use*, frequent users are associated with the *Credulous* and *News buff* clusters, whereas infrequent users are associated with the *Sceptics*. A more active behaviour in terms of *reading or sharing things on social media* characterizes the *Impatients*, the *Sceptics*, and the *News buff*, compared with the *Traditionalists*. On the other hand, the *Credulous* tend not to read or share things on social media, indicating a more passive behaviour. As expected, the *Sceptics* tend to come across news that they think misrepresents reality or is even false, whereas the other clusters perceive this risk less than the *Traditionalists* do. Nonetheless, the *Sceptics* are less confident in their ability to identify fake news; the same holds for the *Impatients*, whereas the *News buff* and *Credulous* are more confident in this respect. Finally, the perception of fake news and disinformation as a problem for the country or for democracy in general mainly characterizes the *News buff*; conversely, those who do not perceive this problem are more likely to belong to the *Credulous.*
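In a multinomial (baseline-category) logit model, each non-baseline cluster *c* has its own linear predictor, and exp(β) is the factor by which a covariate multiplies the odds of cluster *c* relative to the baseline. The Python sketch below makes this concrete. Only the male coefficient for the *Sceptics* is set to log(1.30), to echo the reported OR; the intercepts, the other coefficient and the single-covariate model are hypothetical.

```python
import math
from typing import Dict

Coefs = Dict[str, Dict[str, float]]  # cluster -> {"intercept": ..., covariate: beta}


def odds_ratios(coefs: Coefs) -> Dict[str, Dict[str, float]]:
    """exp(beta) for every (cluster, term) coefficient."""
    return {c: {k: math.exp(b) for k, b in bs.items()} for c, bs in coefs.items()}


def cluster_probabilities(coefs: Coefs, x: Dict[str, float],
                          baseline: str = "Traditionalists") -> Dict[str, float]:
    """Baseline-category logit: P(c) = exp(eta_c) / (1 + sum_j exp(eta_j))."""
    eta = {c: bs["intercept"] + sum(b * x.get(k, 0.0) for k, b in bs.items() if k != "intercept")
           for c, bs in coefs.items()}
    denom = 1.0 + sum(math.exp(e) for e in eta.values())
    probs = {c: math.exp(e) / denom for c, e in eta.items()}
    probs[baseline] = 1.0 / denom  # the baseline has linear predictor 0
    return probs


# Hypothetical coefficients; "male" is a 0/1 dummy with female as reference.
coefs = {
    "Sceptics": {"intercept": -0.5, "male": math.log(1.30)},
    "Credulous": {"intercept": 0.2, "male": -0.1},
}
p_male = cluster_probabilities(coefs, {"male": 1.0})
p_female = cluster_probabilities(coefs, {"male": 0.0})
ratio = (p_male["Sceptics"] / p_male["Traditionalists"]) / (
    p_female["Sceptics"] / p_female["Traditionalists"])
print(odds_ratios(coefs)["Sceptics"]["male"], ratio)
```

The ratio of relative odds recovered from the fitted probabilities equals exp(β), which is why the ORs in Table 2 can be read directly as multiplicative effects relative to the *Traditionalists*.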

In summary, the segmentation results reported in Table 1, together with the results of the multinomial logistic regression, suggest that the *Impatients* and the *Credulous* are the groups most at risk from fake news and online misinformation. Moreover, they do not perceive misinformation as a problem for democracy in general. The *Impatients* also show an active online sharing behaviour, which could give them an active role in spreading online misinformation, especially since both groups consist of regular social network users.


**Table 2.** Multinomial regression coefficients, odds ratios (*Exp(β)*), and p-values. The baseline is the cluster of "Traditionalists" for the response variable.

# **5. Conclusion**

The results of the present work show that European citizens have different attitudes towards the media, and that these are related not only to socio-demographic characteristics but also to their behaviour and opinions regarding misinformation. Given the relevance of misinformation and fake news today, it is important to identify potential strategies for tackling them. Countering misinformation is the responsibility of a variety of actors. Policymakers could promote a climate of calm discussion around the decisions that have to be made. The media could make greater efforts to promote unbiased reporting and ensure high quality standards. Public institutions are responsible for providing support and monitoring misinformation, just as social media should pay more attention to the content disseminated through their platforms, playing a role that increasingly resembles that of a publisher. A key role, however, is played by education and training, which act on the final recipients of information and can make the effects of misinformation less dangerous.

Reflecting on the limitations of this study and on future research, it was not possible to include other potentially relevant information, such as users' tech profile and cultural background, because the Eurobarometer survey does not provide it. These aspects are likely to markedly affect users' attitudes toward media. Finally, the proposed clusters have not been validated in other contexts; further analysis with other data sources and in different geographical areas is required to assess the validity of the proposed user segments.

# **References**


### **Longitudinal profile of a set of biomarkers in predicting Covid-19 mortality using joint models**

Matteo Di Maso<sup>a</sup>, Monica Ferraroni<sup>a</sup>, Pasquale Ferrante<sup>b</sup>, Serena Delbue<sup>b</sup>, Federico Ambrogi<sup>a</sup>

<sup>a</sup> Department of Clinical Sciences and Community Health, Branch of Medical Statistics, Biometry and Epidemiology "G.A. Maccacaro", University of Milan, Milan, Italy

<sup>b</sup> Department of Biomedical, Surgical & Dental Sciences, Università degli Studi di Milano, Milan, Italy

# **1. Introduction**


In survival analysis, time-varying covariates (i.e., covariates that are measured repeatedly over time) are endogenous when (i) their measurements are directly related to the event status and (ii) incomplete information occurs at random points during follow-up, because subjects may skip scheduled visits and drop out of the study (Rizopoulos, 2012). In this case, the classical time-dependent Cox model (Therneau and Grambsch, 2000) yields biased estimates.

To correctly estimate the association between a time-to-event outcome and endogenous covariates, two approaches have come into widespread use. The first is the joint model (JM), which analyses longitudinal and time-to-event data simultaneously (Rizopoulos, 2012). In this approach, the survival sub-model (used to model the hazard given a set of time-invariant covariates) and the longitudinal sub-model (used to predict the time-varying covariates) are linked through a set of random effects (i.e., shared parameters). Random effects are individual-specific model terms, and their inclusion in the JM provides a way of producing overall predictions. The second approach is landmarking analysis (van Houwelingen and Putter, 2012), a more pragmatic method that avoids modelling the time-varying covariates. In this approach, the effect of a time-varying covariate is estimated from its value at the landmark time point, even though that value may change afterwards.

During the first wave of the Covid-19 pandemic, physicians at Istituto Clinico di Città Studi in Milan collected a set of inflammatory biomarkers to understand which of them might serve as prognostic factors for the progression of, and mortality from, Covid-19. Biomarkers were collected repeatedly during follow-up. Moreover, particularly during the first outbreak, physicians had no standard clinical protocols for managing Covid-19, and for this reason biomarker measurements were highly incomplete, especially at baseline.

The present study has two aims. First, using data on Covid-19 patients, we evaluate the association of a single biomarker with Covid-19 mortality using the JM, landmarking, and the time-dependent Cox model, in order to compare their estimates. Second, we present JM estimates for the whole set of biomarkers collected on Covid-19 patients to evaluate their association with mortality risk.

Matteo Di Maso, University of Milan, Italy, matteo.dimaso@unimi.it, 0000-0002-6481-990X

Monica Ferraroni, University of Milan, Italy, monica.ferraroni@unimi.it, 0000-0002-4542-4996

Pasquale Ferrante, University of Milan, Italy, pasquale.ferrante@unimi.it

Serena Delbue, University of Milan, Italy, serena.delbue@unimi.it, 0000-0002-3199-9369

Federico Ambrogi, University of Milan, Italy, federico.ambrogi@unimi.it, 0000-0001-9358-011X


Matteo Di Maso, Monica Ferraroni, Pasquale Ferrante, Serena Delbue, Federico Ambrogi, *Longitudinal profile of a set of biomarkers in predicting Covid-19 mortality using joint models*, pp. 191-196, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.36, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

# **2. Methods**

# *Theoretical framework of JM*

According to the shared parameter approach, the JM consists of two sub-models: one to model the time-to-event outcome (survival sub-model) and the other to model the time-varying covariates (longitudinal sub-model).

The survival sub-model is a typical semi-parametric (or parametric) model for time-to-event outcomes. Let $T\_i^\*$ be the true event time for the $i$th subject (with $i = 1, \dots, N$), and let $T\_i$ be the observed event time, defined as the minimum of the potential right-censoring time $C\_i$ and $T\_i^\*$, i.e., $T\_i = \min\left(T\_i^\*, C\_i\right)$; let $\delta\_i = I\left(T\_i^\* \le C\_i\right)$ be the event indicator. Furthermore, let $m\_i\left(t\right)$ be the true and unobserved value of a single time-varying covariate at time $t$. The (proportional) hazards model is:

$$h\_i\left(t^\*\right) = h\_0\left(t^\*\right) \cdot \exp\left\{\beta\_j X\_{ij} + \alpha m\_i\left(t\right)\right\}$$

where $h\_0$ denotes the baseline hazard function, $X\_{ij}$ is the set of $j$ time-invariant covariates measured at baseline for the $i$th subject, $\beta\_j$ is the corresponding vector of regression coefficients, and $\alpha$ is the regression coefficient for the time-varying covariate, quantifying the effect of this variable on the event risk.
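A useful consequence of this form is that, at the same time point and for the same baseline covariates, a one-unit difference in the current value of the time-varying covariate multiplies the hazard by exp(α), whatever the baseline hazard. The Python sketch below checks this numerically. The Weibull baseline, the coefficients and the log-ferritin-like trajectories are hypothetical choices made only for illustration.

```python
import math
from typing import Callable, Sequence


def weibull_baseline(shape: float, scale: float) -> Callable[[float], float]:
    """Hypothetical parametric baseline hazard h0(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return lambda t: (shape / scale) * (t / scale) ** (shape - 1)


def hazard(t: float, h0: Callable[[float], float], beta: Sequence[float],
           x: Sequence[float], alpha: float, m: Callable[[float], float]) -> float:
    """h_i(t) = h0(t) * exp(sum_j beta_j * x_ij + alpha * m_i(t))."""
    linear = sum(b * xj for b, xj in zip(beta, x)) + alpha * m(t)
    return h0(t) * math.exp(linear)


def m_low(t: float) -> float:
    """Hypothetical true trajectory of the time-varying covariate."""
    return 6.0 + 0.02 * t


def m_high(t: float) -> float:
    """Same trajectory shifted up by one unit."""
    return m_low(t) + 1.0


h0 = weibull_baseline(shape=1.5, scale=30.0)
beta, x = [0.04, 0.3], [70.0, 1.0]  # hypothetical age and male-sex effects
alpha = 0.4
ratio = hazard(10.0, h0, beta, x, alpha, m_high) / hazard(10.0, h0, beta, x, alpha, m_low)
print(ratio, math.exp(alpha))
```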

The longitudinal sub-model is a typical linear mixed model for a longitudinal outcome. Since the time-varying covariate is measured intermittently and with error at a few time points for each subject, the aim of the longitudinal sub-model is to predict the complete longitudinal history (also called trajectory) of the time-varying covariate (the outcome of the longitudinal sub-model) given a set of time-invariant covariates. The longitudinal sub-model is:

$$y\_i\left(t\right) = m\_i\left(t\right) + \varepsilon\_i\left(t\right)$$

where $m\_i\left(t\right) = \beta\_j X\_{ij} + g\_i Z\_i\left(t\right)$, with $g\_i \sim N\left(0, D\right)$ and $\varepsilon\_i\left(t\right) \sim N\left(0, \sigma^2\right)$. The quantity $y\_i\left(t\right)$ is the observed longitudinal outcome for the $i$th subject at time $t$, $\beta\_j$ denotes the fixed-effects coefficients of the covariates $X\_{ij}$, and $g\_i$ denotes the random effects associated with the design vector $Z\_i\left(t\right)$. In the shared parameter approach, the random effects are common to the longitudinal and survival sub-models.
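To see what the longitudinal sub-model assumes, it helps to simulate one subject. The Python sketch below uses a random intercept and a random slope in time with independent normal random effects (a diagonal *D*), and no covariates other than time. Both are simplifying assumptions, and the parameter values are hypothetical. The random effects are drawn once per subject and shared by all visits, while the measurement error is drawn afresh at each visit.

```python
import random
from typing import List, Sequence, Tuple


def simulate_subject(rng: random.Random, times: Sequence[float], beta: Tuple[float, float],
                     sd_random: Tuple[float, float], sigma: float
                     ) -> List[Tuple[float, float, float]]:
    """One subject from y_i(t) = m_i(t) + eps_i(t), where
    m_i(t) = (beta0 + g0_i) + (beta1 + g1_i) * t  (random intercept and slope)."""
    g0 = rng.gauss(0.0, sd_random[0])  # random intercept, shared by all visits
    g1 = rng.gauss(0.0, sd_random[1])  # random slope, shared by all visits
    visits = []
    for t in times:
        m = (beta[0] + g0) + (beta[1] + g1) * t  # true, error-free value
        visits.append((t, m, m + rng.gauss(0.0, sigma)))  # observed with error
    return visits


rng = random.Random(42)
for t, m, y in simulate_subject(rng, [0.0, 3.0, 7.0, 14.0], (6.0, 0.05), (0.5, 0.02), 0.3):
    print(f"day {t:4.1f}: true m = {m:.3f}, observed y = {y:.3f}")
```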

Recently, a Bayesian approach to fitting JMs has been introduced, in which the parameters are estimated with Markov chain Monte Carlo (MCMC) algorithms. The posterior distribution of the model parameters is derived under two assumptions: given the shared parameters, the longitudinal and survival sub-models are independent, and the longitudinal outcomes of each subject are independent. In this approach, non-informative priors can be used for exploratory purposes.
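MCMC can be illustrated on a much smaller problem than a JM. The Python sketch below is a random-walk Metropolis sampler for the mean of normally distributed data with known standard deviation and a flat (non-informative) prior. In this setting the posterior is available in closed form (normal, centred on the sample mean), so the output can be checked. The data, step size and chain length are hypothetical. Software for Bayesian JMs uses far more elaborate samplers over many parameters.

```python
import math
import random
from typing import List, Sequence


def log_posterior(mu: float, y: Sequence[float], sigma: float) -> float:
    """Flat prior on mu, so the log posterior is the Gaussian log likelihood up to a constant."""
    return -sum((yi - mu) ** 2 for yi in y) / (2.0 * sigma ** 2)


def metropolis(y: Sequence[float], sigma: float, n_iter: int = 20000, step: float = 0.3,
               start: float = 0.0, seed: int = 1) -> List[float]:
    """Random-walk Metropolis sampler for the mean mu."""
    rng = random.Random(seed)
    mu, lp = start, log_posterior(start, y, sigma)
    draws = []
    for _ in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)  # symmetric random-walk proposal
        lp_prop = log_posterior(proposal, y, sigma)
        # accept with probability min(1, exp(lp_prop - lp)); 1 - random() lies in (0, 1]
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            mu, lp = proposal, lp_prop
        draws.append(mu)
    return draws


y = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0]
draws = metropolis(y, sigma=0.5)
kept = draws[2000:]  # discard burn-in
print(sum(kept) / len(kept), sum(y) / len(y))
```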

# *Landmarking analysis*

The idea behind landmarking analysis is to select, for a given time point $t\_{LM} = s$, all subjects alive and under follow-up at time $s$. Landmarking involves fixing $s$ and using the value of the time-varying covariate at $s$ as a fixed covariate in a Cox model from $s$ onwards, fitted on the subset of subjects at risk at $s$. For a generic subject $i$, the objective is to use the part of the covariate history available up to $s$ to estimate the conditional probability that the subject is still alive after a predefined time window $w$. More specifically, at a prediction time point $s$, the conditional probability that the subject is still alive at time $s + w$, given that the subject is alive at time $s$ and given the history of the time-varying covariate up to $s$, is:

$$\pi\_i\left(s + w \mid s\right) = P\left\{T\_i > s + w \mid T\_i \ge s, m\_i\left(s\right)\right\}$$

with $m\_i(s)$ denoting the history of the time-varying covariate up to $s$.
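The landmark probability above is typically estimated by fitting a Cox model to a landmark dataset. The Python sketch below shows how such a dataset could be built for one pair $(s, w)$. It assumes last-observation-carried-forward for the covariate value at $s$ and excludes subjects with no measurement up to $s$; the text states neither rule. The `Subject` class and the `landmark_dataset` function are hypothetical; the analyses themselves used the dynpred package. The Cox fit is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    id: int
    time: float                                # event or censoring time (days)
    event: int                                 # 1 = death, 0 = censored
    meas: list = field(default_factory=list)   # (measurement time, value) pairs


def landmark_dataset(subjects, s, w):
    """Build the data for a landmark Cox model at time s with window w.

    Keeps subjects still at risk at s (time >= s), takes the last covariate
    value observed at or before s as a fixed covariate, and censors
    follow-up administratively at the horizon s + w.
    """
    horizon = s + w
    rows = []
    for subj in subjects:
        if subj.time < s:
            continue  # already dead or censored before the landmark time
        past = [(u, v) for u, v in subj.meas if u <= s]
        if not past:
            continue  # no covariate history available at s
        x_s = max(past, key=lambda uv: uv[0])[1]  # last observation carried forward
        rows.append({
            "id": subj.id,
            "time": min(subj.time, horizon),
            "event": subj.event if subj.time <= horizon else 0,
            "x_s": x_s,
        })
    return rows


cohort = [
    Subject(1, 10.0, 1, [(0, 6.1), (4, 6.8)]),   # dies within the window
    Subject(2, 2.0, 1, [(0, 7.0)]),              # not at risk at s = 3
    Subject(3, 30.0, 0, [(1, 5.5), (9, 5.2)]),   # censored at the horizon
]
for row in landmark_dataset(cohort, s=3, w=7):
    print(row)
```

Repeating this for each landmark time $s$ and window $w$ yields one dataset per $(s, w)$ pair.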

# *Data collection*

Between 21 February and 19 March 2020, a total of 403 Covid-19 patients were admitted to the Istituto Clinico Città Studi in Milan. Patients were aged 21-100 years, and 58.3% were men. Person-time at risk was computed as the time elapsed from the day of hospital admission to the first of the following: the day of Covid-19 death (event time), the day of hospital discharge, or the day of transfer to another facility (right-censoring time). Baseline characteristics included sex and age. Biomarker measurements included ferritin (ng/ml), lymphocyte count, neutrophil granulocyte count, D-dimer (ng/ml), C-reactive protein (mg/l), glucose (mg/dl) and lactate dehydrogenase (LDH; U/l).
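The person-time definition amounts to taking the earliest of the end-of-follow-up dates. The Python sketch below assumes one admission date and at most one date per exit type for each patient. The function name and the tie-handling rule are illustrative assumptions, because the text does not say how same-day events were handled. A follow-up of 0 days is possible, consistent with the reported range of 0-78 days.

```python
from datetime import date


def person_time(admission, death=None, discharge=None, transfer=None):
    """Days at risk and event indicator for one patient.

    Follow-up ends at death (event), discharge or transfer to another
    facility (right-censoring), whichever comes first. Ties are resolved
    in favour of death because it is listed first and min() keeps the
    first minimum.
    """
    endpoints = [(d, kind) for d, kind in
                 [(death, "death"), (discharge, "discharge"), (transfer, "transfer")]
                 if d is not None]
    if not endpoints:
        raise ValueError("no end-of-follow-up date recorded")
    end, kind = min(endpoints, key=lambda e: e[0])
    return (end - admission).days, int(kind == "death")


print(person_time(date(2020, 3, 2), death=date(2020, 3, 16)))    # (14, 1)
print(person_time(date(2020, 3, 2), transfer=date(2020, 3, 9)))  # (7, 0)
```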

# *Statistical analysis*

Ferritin was used to compare estimates from the JM, landmarking and the time-dependent Cox model. In particular, the logarithm of ferritin (log-ferritin) was used to account for the skewness of the measurements. Following the Bayesian approach, the JM used independent non-informative priors for the fixed effects of the longitudinal and survival sub-models (i.e., age and sex). The same priors were used for the shared parameter (i.e., the subject-specific predicted trajectories of log-ferritin). In addition, a natural cubic spline with 2 knots was used to model both the subject-specific log-ferritin trajectories over time and age. Given the available sample size, two knots are generally sufficient to detect mild non-linear effects while avoiding over-parametrization of the model.
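A natural cubic spline is cubic between knots and linear beyond the boundary knots, which limits erratic behaviour at the extremes of age or follow-up time. The Python sketch below builds the truncated-power form of this basis. Each column is a basis function and each row corresponds to one value of $x$. The knot placement (range plus tertiles) is an assumption, since the text only says "2 knots". R's `splines::ns()` uses a B-spline-based basis that spans the same space once an intercept is included. The function name is hypothetical.

```python
import numpy as np


def _pos(z):
    """Truncated power helper: (z)_+ = max(z, 0), elementwise."""
    return np.clip(z, 0.0, None)


def natural_cubic_spline_basis(x, knots):
    """Truncated-power basis of a natural cubic spline, without intercept.

    knots: sorted knots including the two boundary knots, e.g.
    [min, q1, q2, max] for 2 interior knots. The columns are x and
    d_j(x) - d_{K-2}(x) for j = 0, ..., K-3 (0-based), where
    d_j(x) = ((x - k_j)_+^3 - (x - k_{K-1})_+^3) / (k_{K-1} - k_j).
    Any linear combination is cubic between knots and linear outside them.
    """
    x = np.asarray(x, dtype=float)
    k = np.asarray(knots, dtype=float)
    K = len(k)

    def d(j):
        return (_pos(x - k[j]) ** 3 - _pos(x - k[K - 1]) ** 3) / (k[K - 1] - k[j])

    cols = [x] + [d(j) - d(K - 2) for j in range(K - 2)]
    return np.column_stack(cols)


# Age from 20 to 100 with boundary knots at the range and 2 interior knots
# at the tertiles (assumed placement).
age = np.linspace(20, 100, 9)
knots = np.quantile(age, [0, 1 / 3, 2 / 3, 1])
B = natural_cubic_spline_basis(age, knots)
print(B.shape)  # (9, 3)
```

With 2 interior knots the basis has three columns, so each spline term adds three parameters to the model. This is the over-parametrization trade-off mentioned above.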

In the landmarking analysis, landmark time points were chosen on the basis of the times of the log-ferritin measurements. In particular, $s$ ranged from 3 to 20 days. These values corresponded to the median time of the first log-ferritin measurement and the 75th centile of the time of the last measurement, respectively. Prediction windows $w$ were set at 7, 14, 21 and 28 days. Age was modelled in the same way as in the JM.

In the time-dependent Cox model, observed log-ferritin levels were incorporated as a time-varying covariate using a natural cubic spline with 2 knots, and age was modelled in the same way.

To assess the associations between biomarkers and Covid-19 mortality, a separate JM was fitted for each biomarker, linking it to the occurrence of Covid-19 death (univariable JM). A logarithmic transformation was applied to ferritin, lymphocytes (log-lymphocytes), neutrophil granulocytes (log-neutrophil granulocytes), D-dimer (log-D-dimer) and C-reactive protein (log-C-reactive protein). D-dimer was excluded from the multivariable JM including all biomarkers (multivariable JM) because of the high number of patients with missing values (78; 19%). Assumptions for priors, biomarker trajectories and age were the same as in the JM for log-ferritin described above.

Analyses were performed using the JMbayes (Rizopoulos, 2016) and dynpred (Putter, 2015) packages in R Statistical Software, version 4.0.5 (R Core Team, 2021).

# **3. Results**

Of the 403 Covid-19 patients admitted to the Istituto Clinico Città Studi, 140 died during follow-up. Of the 263 patients who survived, 99 were discharged and 164 were transferred to other facilities. The median follow-up was 14 days (range: 0-78 days).

For log-ferritin levels (ng/ml), the hazard ratios (HR) and corresponding 95% confidence intervals (CI) were 2.10 (1.67-2.64) from the (biased) time-dependent Cox model and 1.73 (1.38-2.20) from the JM. In the landmarking analysis, the HR was 1.73 (1.25-2.38) for a prediction window of 7 days. For prediction windows of 14, 21 and 28 days, the HRs were 1.86 (1.36-2.54), 1.91 (1.40-2.60) and 1.91 (1.40-2.61), respectively.

The estimates from the univariable JM showed that expected log-ferritin levels decreased over time, as indicated by the negative coefficients for the splines of time of measurement (Table 1). Conversely, expected log-ferritin levels increased with age, and men showed higher expected levels than women.

The expected log-lymphocyte count increased over time and decreased with age. No association emerged between log-lymphocytes and sex. The expected log-neutrophil granulocyte count decreased over time and increased with age, with higher levels in men. Likewise, expected log-D-dimer levels decreased over time and increased with age, with higher levels in men.

For log-C-reactive protein, expected levels showed a mixed trend over time. Levels initially decreased, according to the negative coefficient for the first part of follow-up, and increased thereafter. Expected log-C-reactive protein levels increased with age, and men showed higher levels than women. Expected levels of glucose and LDH decreased over time and increased with age, with higher levels in men.

In the univariable JMs, all biomarkers were significantly associated with Covid-19 mortality. Higher biomarker levels were associated with an increased mortality risk, except for lymphocytes. In particular, doubling the lymphocyte count was associated with approximately halving the mortality risk (HR=0.58; 95% CI: 0.46-0.73). The strongest associations were observed for:

- log-neutrophil granulocytes (HR=2.87; 95% CI: 2.30-3.51 for a doubling of levels);
- log-C-reactive protein (HR=2.44; 95% CI: 2.01-2.97 for a doubling of levels);
- glucose (HR=2.89; 95% CI: 1.92-4.26 for an increase of 100 mg/dl).
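Several biomarkers enter the JMs on the log scale, so HRs "for a doubling of levels" follow from the log-scale coefficient. The Python sketch below assumes natural logarithms and an association coefficient $\alpha$ expressed per unit of the log-transformed biomarker; the text does not state the log base. Under these assumptions, the HR for a doubling is $\exp(\alpha \ln 2) = 2^{\alpha}$. For an untransformed biomarker such as glucose, the HR for an increase of $\Delta$ units is $\exp(\alpha \Delta)$. The function names are hypothetical.

```python
import math


def hr_per_doubling(alpha):
    """HR for a doubling of a biomarker entered on the natural-log scale,
    where alpha is the log-HR per unit of log-biomarker: exp(alpha * ln 2)."""
    return math.exp(alpha * math.log(2.0))


def hr_per_increment(alpha, delta):
    """HR for an increase of `delta` units of an untransformed biomarker
    (e.g. delta = 100 for glucose in mg/dl), alpha being the log-HR per unit."""
    return math.exp(alpha * delta)


# Coefficient implied by the reported univariable HR of 0.58 per doubling
# of the lymphocyte count, and the HR recovered from it.
alpha_lymph = math.log(0.58) / math.log(2.0)
print(round(alpha_lymph, 3))                   # -0.786
print(round(hr_per_doubling(alpha_lymph), 2))  # 0.58
```

If base-2 logarithms were used instead, $\alpha$ would itself be the log-HR per doubling. The reported HR is the same either way; only the scale of $\alpha$ changes.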

The multivariable JM was estimated using data on 320 patients with 96 (30%) events, after excluding patients with missing values for D-dimer. For ferritin and lymphocytes, there was no longer evidence of an association with mortality. Compared with the univariable JMs, the strength of the association was attenuated for:

- log-neutrophil granulocytes (HR=1.78; 95% CI: 1.16-2.69 for a doubling of levels);
- log-C-reactive protein (HR=1.44; 95% CI: 1.13-1.83 for a doubling of levels);
- LDH (HR=1.28; 95% CI: 1.09-1.49 for an increase of 100 U/l);
- glucose (HR=2.44; 95% CI: 1.28-4.26 for an increase of 100 mg/dl).

However, the strongest effect in both the univariable and multivariable JMs was observed for age, with the HR starting to increase rapidly at approximately 60 years.

# **4. Conclusion**

In the present work, we first compared HR estimates for a single time-varying covariate (log-ferritin) obtained with different approaches. The HRs from the JM and landmarking approaches were lower than that of the time-dependent Cox model. The landmarking estimate for a 7-day prediction window was similar to the JM estimate, but it tended to increase with longer prediction windows. Nevertheless, all landmarking estimates remained lower than the time-dependent Cox model estimate.

Finally, the multivariable JM showed associations between some biomarkers and Covid-19 mortality. However, the strong association between age and mortality risk persisted after adjustment for the biomarkers considered.

# **References**

Rizopoulos D. (2012). *Joint Models for Longitudinal and Time-to-Event Data. With Application in R.* Boca Raton: Chapman & Hall/CRC.



**Table 1.** Univariable and multivariable joint model estimates.


### **Assessment of agricultural productivity change at country level: A stochastic frontier approach**

Alessandro Magrini

Department of Statistics, Computer Science, Applications – University of Florence, Italy. E-mail: alessandro.magrini@unifi.it

# 1. Introduction

Productivity growth in agriculture is widely recognized as a key resource for meeting the food demand of a rapidly increasing world population. Monitoring agricultural productivity change at country level is therefore of core importance for international decision makers. The United States Department of Agriculture (USDA) is the reference source for estimates of agricultural productivity change at country level. Its estimates cover almost all countries in the world over a long and regularly updated period (from 1961 to 2016).

USDA estimates consist of yearly changes in Total Factor Productivity (TFP) based on the growth accounting method. That is, they are obtained as the ratio between the aggregated output and the sum of the input quantities weighted by their cost shares (Caves *et al.*, 1982). Growth accounting is widely adopted to assess TFP change because it has several advantages. In particular, it requires no assumptions on the characteristics of the production processes and allows one decision making unit to be considered at a time.

However, input cost shares are often only partially available. They must then be approximated or imputed from several different sources, as in the case of USDA estimates (see Fuglie, 2015, Table A2), with uncontrolled consequences for the accuracy of the estimates. In addition, the growth accounting method assumes that decision making units operate at their optimal conditions, so it may overestimate TFP change in the presence of technical inefficiency. Frontier-based methods such as Data Envelopment Analysis (DEA, Charnes *et al.*, 1978) and Stochastic Frontier Models (SFMs, Schmidt & Sickles, 1984) are valid alternatives. Because they estimate the production frontier from the sample of decision making units, they can distinguish between change in technology and change in technical efficiency, and they do not require input cost shares.
The main difference between DEA and SFMs is that DEA makes no assumption about the production frontier but cannot account for random shocks independent of production. As a consequence, DEA attributes all deviations from the frontier to technical inefficiency. SFMs, instead, can disentangle technical inefficiency from external shocks, but they require parametric assumptions on the production frontier. Despite their appealing properties, SFMs and DEA have been employed only in a few scattered studies (see the review in Kryszak *et al.*, 2021). As such, the available estimates are not comparable with USDA ones.

In this paper, we apply an SFM with translog specification to the same data on agricultural output and inputs employed by USDA. We then use the generalized Malmquist index proposed by Orea (2002) to derive country-level measures of agricultural TFP change, which are compared with USDA estimates. We prefer SFMs over DEA because they allow us to account for external shocks and to assess differences in technology across countries, interactions among inputs and the trend of returns to scale.

# 2. Data and methodology

In this study, we employ the same data on which USDA estimates of agricultural TFP change are based (USDA, 2019). These data are sourced from the Food and Agriculture Organization (FAO) and the International Labour Organization (ILO), and are integrated with modeled estimates. The output variable is the gross agricultural production ($Y$, thousand US dollars at 2004–2006 average international prices). The input variables consist of six measures:

- land use ($X\_1$, rainfed cropland equivalents);
- labour force ($X\_2$, economically active adults);
- livestock ($X\_3$, cattle equivalents);
- machinery stock ($X\_4$, 40 CV tractor equivalents);
- fertilizer use ($X\_5$, tonnes of nutrients);
- animal feed ($X\_6$, megacalories of metabolizable energy).

The data have annual frequency over the period 1961–2016 and cover almost all countries in the world; in particular, the considered countries account for more than 99.7% of FAO's global gross agricultural output. Some national data have been aggregated to create political units that are consistent over time (e.g., former Yugoslavia, former Czechoslovakia, Ethiopia plus Eritrea, former Soviet Union) or to avoid very small measurements (e.g., Lesser Antilles, Micronesia), for a total of 170 countries. Further details and descriptive statistics can be found in Fuglie (2015).

Alessandro Magrini, University of Florence, Italy, alessandro.magrini@unifi.it, 0000-0002-7278-5332 FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Alessandro Magrini, *Assessment of agricultural productivity change at country level: A stochastic frontier approach*, pp. 197-202, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.37, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Let $i = 1, \ldots, n$ denote the decision making units (countries) and $t = 1, \ldots, T$ the time points (years). Also, let $y\_{i,t}$ be the output level produced by unit $i$ at time $t$ and $x\_{i,j,t}$ the level of the $j$-th input ($j = 1, \ldots, p$) employed by unit $i$ at time $t$. A Stochastic Frontier Model (SFM) has the following general form (Schmidt & Sickles, 1984):

$$y\_{i,t} = f(\mathbf{x}\_{i,t}; \Theta) \exp(v\_{i,t} - u\_{i,t}) \qquad i = 1, \ldots, n; \ t = 1, \ldots, T \tag{1}$$

where $f$ is the production frontier. It represents the maximum output level technically feasible for a given combination of the inputs $\mathbf{x}\_{i,t} = (x\_{i,1,t}, \ldots, x\_{i,j,t}, \ldots, x\_{i,p,t})$ and a given technology $\Theta$. The terms $v\_{i,t} \in \mathbb{R}$ and $u\_{i,t} \in \mathbb{R}^+$ are two random errors representing deviations from the production frontier $f$. They are due, respectively, to shocks independent of the producer and to shocks related to production. As such, the maximum feasible output may differ from the maximum output level technically feasible because of favourable or unfavourable events beyond the control of producers. Specifically, the maximum feasible output for unit $i$ at time $t$ is $y^{\ast}\_{i,t} = f(\mathbf{x}\_{i,t}; \Theta) \exp(v\_{i,t})$, so technical efficiency is $\mathrm{TE}\_{i,t} = y\_{i,t}/y^{\ast}\_{i,t} = \exp(-u\_{i,t})$. We employ the following translog specification for $f$:

$$\begin{split} f(\mathbf{x}\_{i,t};\Theta) = \exp\Big(&\alpha\_i + \delta t + \gamma t^2 + \sum\_{j=1}^p \beta\_j \log x\_{i,j,t} + \sum\_{j=1}^p \sum\_{k=j}^p \beta\_{j,k} \log x\_{i,j,t} \log x\_{i,k,t} \\ &+ \sum\_{j=1}^p \lambda\_j t \log x\_{i,j,t} + \sum\_{j=1}^p \eta\_j t^2 \log x\_{i,j,t}\Big) \end{split} \tag{2}$$

This formulation is identical to the one most commonly adopted in the literature (see the review in Laureti, 2006, Chapter 3, and in Magrini, 2021). The only difference is that we added the parameters $\eta\_1, \ldots, \eta\_p$, which allow output elasticities to vary over time according to a quadratic rather than a linear trend. The frontier specification in (2) leads to the following SFM:

$$\begin{aligned} \log y\_{i,t} &= \alpha\_i + \delta t + \gamma t^2 + \sum\_{j=1}^p \beta\_j \log x\_{i,j,t} + \sum\_{j=1}^p \sum\_{k=j}^p \beta\_{j,k} \log x\_{i,j,t} \log x\_{i,k,t} \\ &\quad + \sum\_{j=1}^p \lambda\_j t \log x\_{i,j,t} + \sum\_{j=1}^p \eta\_j t^2 \log x\_{i,j,t} + v\_{i,t} - u\_{i,t} \end{aligned} \tag{3}$$

where $\varepsilon\_{i,t} = v\_{i,t} - u\_{i,t}$ is the composed error. We complete the specification of the SFM by assuming:

$$\begin{aligned} v\_{i,t} &\sim\_{i.i.d.} \mathbf{N}(0, \sigma\_V^2) \\ u\_{i,t} &= \phi\_i t + \psi\_i t^2 + U\_i & U\_i &\sim\_{i.i.d.} \mathbf{N}^+(0, \sigma^2) \\ \mathbf{Cov}(v\_{i,t}, U\_i) &= 0 \; \forall i, t \end{aligned} \tag{4}$$

where the parameters $\phi\_i$ and $\psi\_i$ regulate the second-order polynomial trend of the logarithmic technical inefficiency of country $i$, and 'i.i.d.' stands for 'independent and identically distributed'. $\mathbf{N}(\cdot)$ and $\mathbf{N}^+(\cdot)$ denote the Normal and the half-Normal distribution, respectively. This specification for $u\_{i,t}$ is the same as in Battese & Coelli (1995), with the addition of the quadratic term.
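The efficiency-change term of the Malmquist decomposition relies on the conditional expectation $\mathbb{E}(u\_{i,t} \mid \varepsilon\_{i,t})$. For intuition, the Python sketch below implements the standard cross-sectional normal–half-normal formula, commonly known as the JLMS estimator. Unlike specification (4), it assumes that $u$ is itself half-Normal, so it ignores both the deterministic trend $\phi\_i t + \psi\_i t^2$ and the panel structure. It does not reproduce what the frontier package computes for this model. The function name is hypothetical.

```python
import math


def jlms_conditional_mean(eps, sigma_u, sigma_v):
    """E(u | eps) for eps = v - u, v ~ N(0, sigma_v^2), u ~ N+(0, sigma_u^2).

    Given eps, u follows a normal distribution N(mu_star, s_star^2)
    truncated at zero, with
        mu_star = -eps * sigma_u^2 / (sigma_u^2 + sigma_v^2)
        s_star  = sigma_u * sigma_v / sqrt(sigma_u^2 + sigma_v^2),
    whose mean is mu_star + s_star * pdf(z) / cdf(z), z = mu_star / s_star.
    """
    sigma2 = sigma_u ** 2 + sigma_v ** 2
    mu_star = -eps * sigma_u ** 2 / sigma2
    s_star = sigma_u * sigma_v / math.sqrt(sigma2)
    z = mu_star / s_star
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF
    return mu_star + s_star * pdf / cdf


# More negative residuals imply more inefficiency.
for eps in (-0.5, 0.0, 0.5):
    u_hat = jlms_conditional_mean(eps, sigma_u=0.3, sigma_v=0.2)
    print(eps, round(u_hat, 4), round(math.exp(-u_hat), 4))  # E(u|eps), exp(-E(u|eps))
```

Here $\exp(-\mathbb{E}(u \mid \varepsilon))$ is only a plug-in approximation of technical efficiency. Some authors instead report $\mathbb{E}(\exp(-u) \mid \varepsilon)$.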

To account for technological gaps among countries with different levels of development, we specify four separate models according to the WESP 2020 classification (United Nations, 2020): 'industrialized' (28), 'transition' (22), 'developing' (42) and 'least developed' (78). Before estimating the parameters, we code the time variable as the year minus 1961, so that $t = 0, 1, \ldots, 55$, and we divide each input variable by its sample mean. As a result, the first-order coefficients $\beta\_1, \ldots, \beta\_p$ can be interpreted as the output elasticities of the inputs evaluated at the sample mean and at the first time point (year 1961). Likewise, the output elasticity of the $j$-th input evaluated at the sample mean and at year $s$ equals $\beta\_j + \lambda\_j (s-1961) + \eta\_j (s-1961)^2$.
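Because each input is divided by its sample mean, $\log x = 0$ at the mean, and the elasticity reduces to $\beta\_j + \lambda\_j t + \eta\_j t^2$. The Python sketch below computes the output elasticities of the translog frontier (3) at any input level, together with their sum, the overall elasticity used to judge returns to scale. The function name and the coefficient values are hypothetical and are not estimates from this paper.

```python
import numpy as np


def output_elasticities(log_x, t, beta, B, lam, eta):
    """Output elasticities e_j = d log y / d log x_j of the translog frontier (3).

    log_x    : (p,) log inputs, already divided by their sample means
    t        : time index (year - 1961)
    beta     : (p,) first-order coefficients beta_j
    B        : (p, p) matrix whose upper triangle holds beta_{j,k}, k >= j
    lam, eta : (p,) coefficients of t * log x_j and t^2 * log x_j
    """
    U = np.triu(B)   # keep only the beta_{j,k} with k >= j
    S = U + U.T      # gradient of sum_{j<=k} beta_jk l_j l_k is (U + U^T) l
    return beta + S @ log_x + lam * t + eta * t ** 2


# Hypothetical coefficients for p = 3 inputs (not estimates from the paper)
beta = np.array([0.30, 0.20, 0.25])
B = np.triu(np.full((3, 3), 0.01))
lam = np.array([0.001, -0.002, 0.0])
eta = np.array([0.0, 1e-5, 0.0])
e = output_elasticities(np.zeros(3), t=10, beta=beta, B=B, lam=lam, eta=eta)
print(e, e.sum())  # overall elasticity; a sum below 1 means decreasing returns
```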

TFP change is assessed through the generalized Malmquist index proposed by Orea (2002), which accounts for variable returns to scale. Based on this index, the TFP change between two time points $s$ and $t$ ($\text{TFPC}\_{s,t}$) is decomposed into technological change ($\text{TC}\_{s,t}$), technical efficiency change ($\text{EC}\_{s,t}$) and scale change ($\text{SC}\_{s,t}$):

$$\text{TFPC}\_{s,t} = \text{TC}\_{s,t} \cdot \text{EC}\_{s,t} \cdot \text{SC}\_{s,t} \tag{5}$$

Orea (2002) showed that, under a translog production frontier, these three terms equate to:

$$\begin{split} \text{TC}\_{s,t} &= \exp\left[\frac{1}{2} \left( \frac{\partial \log y\_{i,s}}{\partial s} + \frac{\partial \log y\_{i,t}}{\partial t} \right) \right] \\ \text{EC}\_{s,t} &= \exp\left[\mathbb{E}(u\_{i,s} \mid \varepsilon\_{i,s}) - \mathbb{E}(u\_{i,t} \mid \varepsilon\_{i,t})\right] \\ \text{SC}\_{s,t} &= \exp\left[\frac{1}{2} \sum\_{j=1}^{p} \left( \frac{\sum\_{k=1}^{p} e\_{i,k,s} - 1}{\sum\_{k=1}^{p} e\_{i,k,s}} e\_{i,j,s} + \frac{\sum\_{k=1}^{p} e\_{i,k,t} - 1}{\sum\_{k=1}^{p} e\_{i,k,t}} e\_{i,j,t} \right) \log \frac{x\_{i,j,t}}{x\_{i,j,s}} \right] \\ e\_{i,j,s} &= \frac{\partial \log y\_{i,s}}{\partial \log x\_{i,j,s}}, \qquad e\_{i,j,t} = \frac{\partial \log y\_{i,t}}{\partial \log x\_{i,j,t}} \end{split} \tag{6}$$
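The three terms in (6) can be computed once the fitted frontier supplies the time derivatives, the estimates of $\mathbb{E}(u \mid \varepsilon)$ and the output elasticities. The Python sketch below takes those quantities as given inputs and assembles the decomposition (5). The function name and the numbers are illustrative assumptions.

```python
import numpy as np


def orea_malmquist(dlogy_ds, dlogy_dt, Eu_s, Eu_t, e_s, e_t, x_s, x_t):
    """Generalized Malmquist decomposition (6) for one unit between s and t.

    dlogy_ds, dlogy_dt : derivatives of the log frontier w.r.t. time at s and t
    Eu_s, Eu_t         : estimates of E(u | eps) at s and t
    e_s, e_t           : (p,) output elasticities at s and t
    x_s, x_t           : (p,) input levels at s and t
    Returns (TFPC, TC, EC, SC) with TFPC = TC * EC * SC.
    """
    e_s, e_t = np.asarray(e_s, dtype=float), np.asarray(e_t, dtype=float)
    tc = np.exp(0.5 * (dlogy_ds + dlogy_dt))
    ec = np.exp(Eu_s - Eu_t)                    # > 1 if inefficiency fell
    sf_s = (e_s.sum() - 1.0) / e_s.sum()        # scale factor at s
    sf_t = (e_t.sum() - 1.0) / e_t.sum()        # scale factor at t
    log_ratio = np.log(np.asarray(x_t, dtype=float) / np.asarray(x_s, dtype=float))
    sc = np.exp(0.5 * np.sum((sf_s * e_s + sf_t * e_t) * log_ratio))
    return tc * ec * sc, tc, ec, sc


# Illustrative inputs only: decreasing returns (elasticities sum to 0.8)
# combined with expanding input use give a scale change below 1.
tfpc, tc, ec, sc = orea_malmquist(0.010, 0.012, 0.20, 0.18,
                                  e_s=[0.3, 0.2, 0.3], e_t=[0.3, 0.2, 0.3],
                                  x_s=[1.0, 1.0, 1.0], x_t=[1.02, 1.01, 1.05])
print(tfpc, tc, ec, sc)
```

Under constant returns to scale the elasticities sum to one, both scale factors vanish, and $\text{SC}\_{s,t} = 1$. This is why the scale term only matters when returns to scale differ from constant.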

# 3. Results

We estimated model (3) by maximum likelihood for each group of countries using the R package frontier (Coelli & Henningsen, 2020). In all four models, the parameter estimates imply significant and positive output elasticities at the sample mean for almost all time points, consistent with economic theory. The quadratic components of the trends are also significant: those of the output elasticities (parameters $\eta\_j$, $j = 1, \ldots, p$) for most inputs, and those of the logarithmic technical inefficiencies (parameters $\psi\_i$, $i = 1, \ldots, n$) for most countries. This supports the adequacy of our model formulation.

Figure 1 displays the time series of the estimated overall elasticity at the sample mean, which equals the sum of all output elasticities at the sample mean at each time point. The overall elasticity is almost always significantly lower than one for all groups of countries. We therefore deduce that returns to scale were decreasing, rather than constant, over the considered period. In light of this result, the assumption of constant returns to scale made by many authors appears to be a simplification rather than a real property of the production processes of the various countries.

Figure 1: Time series of the estimated overall elasticity at the sample mean. Shaded areas indicate 95% confidence intervals.

Table 1: Average annual percentage variation of our and USDA's estimates of TFP change, averaged by geographical region. The region 'Africa, sub-Saharan' does not include South Africa, while the region 'Oceania' does not include Australia and New Zealand.

Based on the estimated models, we computed TFPC and its components (TC, EC and SC) with $s = t-1$ (chained index numbers) and with $s = 1961$ (index numbers with base year 1961). Table 1 reports average annual percentage changes by geographical region, while Figure 2 displays the time series of index numbers with base year 1961 for a selection of countries.

For most geographical regions and periods, USDA estimates of TFP change are greater in absolute value than ours. Exceptions include North America, Sub-Saharan Africa, East Europe, Central Asia, North Africa and Australia-New Zealand. In these regions, for at least half of the periods shown in Table 1, our estimates are greater in absolute value than USDA ones or even differ in sign.

At country level, our and USDA's estimates of TFP change are in substantial agreement for the United States, France, the United Kingdom, Australia, South Africa and India. USDA estimates are much higher than ours for Germany, Italy, Japan, China and Brazil, and moderately higher for the Russian Federation and former Yugoslavia. Conversely, our estimates are noticeably higher than USDA ones for Canada, Afghanistan and Somalia.
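The text does not state how average annual percentage changes are computed from the index numbers. One common convention, assumed in the Python sketch below, is the geometric mean of the chained indices. The sketch also cumulates the chained indices into a 1961 = 1 series. Indices of this type are generally not transitive, so the cumulated chain need not coincide exactly with the index computed directly with $s = 1961$. Function names and values are illustrative.

```python
import numpy as np


def cumulate_chained(chained):
    """Cumulate year-on-year indices (s = t - 1) into a series with 1961 = 1."""
    return np.concatenate(([1.0], np.cumprod(np.asarray(chained, dtype=float))))


def average_annual_pct_change(chained):
    """Average annual percentage change as the geometric mean of chained indices."""
    log_idx = np.log(np.asarray(chained, dtype=float))
    return 100.0 * (np.exp(log_idx.mean()) - 1.0)


chained_tfpc = [1.02, 0.99, 1.03, 1.01]   # illustrative values only
print(cumulate_chained(chained_tfpc))
print(round(average_annual_pct_change(chained_tfpc), 3))
```

The geometric mean is used because index numbers compound multiplicatively: a rise of 20% followed by an exactly offsetting fall gives an average change of zero, whereas an arithmetic mean of the percentage changes would not.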

Figure 2: Our and USDA's estimates of TFP change for a selection of countries (indices, 1961=1). The time series of TFPC is shown in blue (TC, EC and SC are denoted by solid, dashed and dash-dotted black lines, respectively), while USDA estimates are shown in red.

The differences between our and USDA's estimates may be due to technical inefficiency, which stochastic frontier models take into account and growth accounting ignores. They may also reflect inaccuracies in USDA's input cost shares and/or in our model specification. Furthermore, our model can detect changes in input use through the term SC, which appears generally non-negligible, consistent with the evidence of decreasing returns to scale.

To assess the overall agreement between our and USDA's estimates, we computed the Pearson correlation between the two series for each country. The median correlation was 0.857, with first and third quartiles of 0.554 and 0.943, respectively. These correlations indicate that the two methodologies yield different but substantially concordant results. The differences reflect their different theoretical foundations, while the agreement suggests that both are empirically valid. Full results are available at https://github.com/alessandromagrini/agrTFP.

# 4. Concluding remarks

We have estimated agricultural TFP change at country level from the same data employed by the United States Department of Agriculture (USDA), using a stochastic frontier model instead of the growth accounting method. This work provides, for the first time in the literature, a comparison between agricultural TFP changes estimated with different methodologies. It also provides an additional data source that can be employed in a wide variety of longitudinal economic analyses at country level.

Our methodology overcomes two limitations: that of USDA estimates, which rely on approximated and imputed input cost shares, and that of the growth accounting method in general, which ignores technical inefficiency. However, the accuracy of estimates based on a stochastic frontier is sensitive to model specification. For this reason, we employed a more flexible specification than those adopted in the literature. Since it is based on deterministic trends, though, it may be inadequate for long periods like the one considered in this study. We also took care to account for heterogeneity in technology among countries by specifying four separate models based on the level of development.

In the future, we plan to improve our methodology by introducing autoregressive coefficients to represent stochastic trends, and by specifying latent classes to account for heterogeneity in technology among the various countries.

# References


### **Patient-generated evidence in Epidermolysis Bullosa (EB): Development of a questionnaire to assess the Quality of Life**

Laura Benedan <sup>a</sup>, May El Hachem <sup>b</sup>, Carlotta Galeone <sup>a</sup>, Paolo Mariani <sup>a</sup>, Cinzia Pilo <sup>c</sup>, Gianluca Tadini <sup>d</sup>

<sup>a</sup> Bicocca-Applied Statistics Center, University of Milano-Bicocca, Milan, Italy. <sup>b</sup> Dermatology Unit and Genodermatosis Unit, Genetics and Rare Diseases Research Division, Bambino Gesù Children's Hospital, Rome, Italy. <sup>c</sup> Fondazione REB Onlus, Milan, Italy. <sup>d</sup> Centro Malattie Cutanee Ereditarie, UOC Dermatologia Pediatrica ospedale Policlinico e Università degli Studi di Milano

# **1. Introduction**

Epidermolysis Bullosa (EB) is a genetic disorder characterised by skin fragility and blistering after mild mechanical trauma. There are four major classical EB types: EB simplex, junctional EB, dystrophic EB and Kindler EB (Has et al., 2020). All types and subtypes of EB are rare. In the US, the overall prevalence of inherited EB is about 11 cases per 1 million population, and the incidence about 20 per 1 million live births (Fine, 2016). Similar results have been obtained in some European countries, including Italy (Tadini et al., 2005).

The clinical manifestations and severity of EB are very heterogeneous. Physical symptoms include:

- fragile skin that blisters easily, causing pain, itch and odour;
- dental problems;
- blisters inside the mouth and throat;
- dysphagia;
- hair loss.

The disease may also involve muscle, heart, brain, gastrointestinal, bone or kidney issues. The physical symptoms significantly affect daily life and everyday activities. They are associated with functional limitations and time-consuming wound dressings that can severely affect the Quality of Life (QoL) of patients and their families. Moreover, the disfiguring nature of these symptoms adds a psychological and social burden, and overall EB management may have detrimental financial consequences. The rarity of the disease is a further problem, because both laypeople and non-specialist healthcare professionals lack awareness and understanding of it. Dures and colleagues (Dures, Morris, Gleeson, & Rumsey, 2011) underlined that EB patients' unmet needs went beyond medical support. Informational needs, self-management, peer support, social skills and one-to-one therapy emerged as critical areas for improvement.

Given all the implications of living with EB, a valid and reliable scale to assess the QoL of these patients is essential in patient care and management. The most widely used instrument to assess EB patients' QoL is the QoLEB questionnaire, which has proven to be valid and reliable (Frew, Martin, Nijsten, & Murrell, 2009). It was initially developed in English with an Australian sample and was subsequently translated and validated in other languages (Cestari et al., 2016; Dănescu et al., 2019; Frew, Cepeda Valdes, Fortuna, Murrell, & Salas Alanis, 2013; Yuen et al., 2014).

Even though the translation of this existing tool would have been a valid option, in the present study it was decided to conduct a Delphi study to fully understand the patients' point of view, make their voices heard, and capture possible peculiarities of the Italian context.

A three-stage online Delphi consensus procedure was conducted to identify the key domains and specific statements to assess crucial areas of EB patients' QoL.


Gianluca Tadini, University of Milan, Italy, gtadinicmce@unimi.it, 0000-0003-1164-2802

Laura Benedan, University of Milano-Bicocca, Italy, laura.benedan@unimib.it, 0000-0003-0427-2487 May El Hachem, Bambino Gesù Children's Hospital, Italy, may.elhachem@opbg.net, 0000-0002-8145-4797 Carlotta Galeone, University of Milano-Bicocca, Italy, carlotta.galeone@statinfo.org, 0000-0003-1934-5167 Paolo Mariani, University of Milano-Bicocca, Italy, paolo.mariani@unimib.it, 0000-0002-8848-8893 Cinzia Pilo, REB Onlus Foundation, Italy, cinzia.pilo@fondazionereb.com

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Laura Benedan, May El Hachem, Carlotta Galeone, Paolo Mariani, Cinzia Pilo, Gianluca Tadini, *Patient-generated evidence in Epidermolysis Bullosa (EB): Development of a questionnaire to assess the Quality of Life*, pp. 203-207, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.38, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

# **2. Methodology**

The project started from a request by the Italian non-profit association for EB research and the Italian Registry for EB Foundation (REB) to develop, for the first time, a patient-centred questionnaire assessing the QoL of patients affected by EB. The questionnaire was developed in two phases: first, a critical review of the scientific literature was performed; second, an online pseudo-Delphi study was carried out.

The Delphi method is an iterative process in which several rounds are organised to identify a shared solution, and it has useful applications in health research as well (Trevelyan & Robinson, 2015). It is a flexible method for identifying the core of a problem when the problem is not entirely known and when a specific statistical model may be difficult to apply. The method consists in submitting one or more topics to a defined group of experts, who provide successive evaluations in an iterative process aimed at reaching a consensus. This consensus represents the final expression of the group opinion (Marbach, Mazziotta, & Rizzi, 1991). In this case, the procedure may be considered a "pseudo-Delphi". Each questionnaire was analysed anonymously and summarised for presentation to the group, but the discussions were open and each participant contributed to them.

A literature review was conducted to understand what was already known about this pathology and what instruments are used at both a national and international level.

After the problem definition, the expert panel was identified. A multidisciplinary panel including patients, caregivers, and clinicians actively participated in round tables.

The team comprised:


A first group meeting was then organised to discuss every step of the project, the main topics to cover and the primary aim to be achieved. Subsequently, the patients and clinicians were asked to provide a list of spontaneously generated items describing different areas of the EB patient's QoL. They worked separately, and all answers were collected anonymously, so that every person could freely express their opinions and personal state of mind without social pressure or external influence. As a result, some powerful statements appeared (e.g., *"Sometimes I think it would be better if I died"*). In total, more than 160 items were generated.

All answers were carefully considered and grouped into specific domains. The statements were then analysed and harmonised, as a first attempt to condense the questionnaire, combine items with the same meaning and obtain statements clearly generalisable to the entire reference population. The results were presented at the first Delphi table, i.e., a roundtable session in which all the implications of daily living with the disease were discussed openly. This group meeting was essential to narrow the scope and identify the most salient and relevant aspects to assess in daily practice. On this occasion, great care was taken to ensure a comprehensive and accurate understanding of the experts' points of view.

The first questionnaire (Q1) was then created. It also included some items from the literature that had not been mentioned during the Delphi roundtable. The questionnaire comprised seven core domains (see Table 1), for a total of 80 items. Each participant was asked to read every statement and rate its degree of importance. Participants were also asked to comment on the clarity and specificity of each item and to add any missing information that should have been included. Each expert completed the questionnaire anonymously and returned it for discussion in the second Delphi round.


All the answers were carefully examined, and the items within each domain were ranked according to the degree of importance indicated by the participants. The results of this analysis were discussed in the group, and the questionnaire was further refined: some items were changed or rephrased for greater clarity, and others were merged or removed because they were rated as less important.
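The within-domain ranking can be sketched in a few lines of Python. This is a minimal illustration that assumes each item's importance is summarised by its mean rating across participants. The text does not state which summary statistic was used, and the function name, domains and items below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_items_by_domain(ratings):
    """Rank items within each domain by their mean importance rating.

    `ratings` maps (domain, item) -> list of importance ratings, one per
    participant. Returns domain -> list of (item, mean rating), most
    important first; ties are broken alphabetically by item label.
    """
    by_domain = defaultdict(list)
    for (domain, item), scores in ratings.items():
        by_domain[domain].append((item, mean(scores)))
    return {
        domain: sorted(items, key=lambda pair: (-pair[1], pair[0]))
        for domain, items in by_domain.items()
    }


# Hypothetical ratings; domain and item labels are illustrative only
ratings = {
    ("Physical", "pain"): [4, 4, 3],
    ("Physical", "itching"): [3, 4, 4],
    ("Physical", "fatigue"): [2, 3, 2],
    ("Social", "school or work"): [3, 3, 4],
    ("Social", "friendships"): [4, 4, 4],
}
ranking = rank_items_by_domain(ratings)
```

Items at the bottom of each domain's list are the natural candidates for merging or removal in the next round.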

A new questionnaire (Q2) was defined, taking into account all the suggestions that emerged from the group meeting. The previously identified core domains remained unchanged, but some new items were suggested and added, bringing Q2 to 86 items. At this stage, each participant was asked to rate both the degree of agreement and the degree of importance of each item on a four-point Likert scale (*"Not at all"*, *"A little"*, *"Quite"*, *"Very"*). This step was necessary to remove irrelevant statements and to evaluate the order in which the items were presented. The agreement and importance ratings were constructed as satisfaction-importance measures, in line with widely used Customer Satisfaction techniques.
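Agreement and importance ratings are often combined in a two-way grid, as in customer-satisfaction importance-performance analysis. The sketch below shows one way to build such a grid. Several choices in it are assumptions for illustration, not the authors' procedure:

- the 1-4 coding of the Likert labels;
- the use of grand means as cut-offs;
- the item labels.

```python
from statistics import mean

# Numeric coding of the four-point scale (1-4) is an assumption for illustration
LIKERT = {"Not at all": 1, "A little": 2, "Quite": 3, "Very": 4}


def agreement_importance_grid(responses):
    """Classify each item into a quadrant of an agreement x importance grid.

    `responses` maps item -> list of (agreement, importance) label pairs.
    The cut-offs are the grand means of the item means, as in a classic
    importance-performance grid (the scale midpoint, 2.5, is an alternative).
    """
    scores = {
        item: (mean(LIKERT[a] for a, _ in pairs), mean(LIKERT[i] for _, i in pairs))
        for item, pairs in responses.items()
    }
    agree_cut = mean(a for a, _ in scores.values())
    imp_cut = mean(i for _, i in scores.values())
    grid = {}
    for item, (a, i) in scores.items():
        agree = "high" if a >= agree_cut else "low"
        imp = "high" if i >= imp_cut else "low"
        grid[item] = f"{agree} agreement / {imp} importance"
    return grid


# Hypothetical answers from two participants for three items
responses = {
    "item A": [("Very", "Very"), ("Quite", "Very")],
    "item B": [("A little", "Not at all"), ("Not at all", "A little")],
    "item C": [("Very", "A little"), ("Very", "Not at all")],
}
grid = agreement_importance_grid(responses)
```

Items falling in the low-importance quadrants would be the first candidates for removal.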

In addition to the seven domains mentioned above, specific questions were added on three topics:

- the type of EB diagnosed;
- socio-demographic characteristics (e.g., age group and Italian region of residence);
- care-related perceptions (e.g., the perceived need for psychological support and perceived satisfaction with the quality of care).

Finally, an overall QoL satisfaction question was asked ("On a scale from 1 to 10, how do you rate your quality of life?").

The results of this phase were presented to the group to further define the questionnaire structure and to prepare a new version (Q3) with 85 items, which each participant completed anonymously. Only one item was removed, and some others were modified to make them clearer and easier to understand.

It should be noted that, in some cases, clinicians and patients held different views. Some items drawn from the literature were therefore rejected or modified to adapt them to the patients' language and experience (e.g., the terms used to describe some physical symptoms).

The final version of the questionnaire will be administered to a larger sample to assess its validity and reliability.

# **3. Conclusions**

The present study is part of a more extensive research project aimed at developing a valid and reliable questionnaire to assess the QoL of EB patients. The tool is meant to capture the patient's point of view and subjective experience beyond clinical classifications, taking into account the patient's overall experience. Starting from an initial set of areas and through a three-round pseudo-Delphi methodology, the statements were gradually refined. This produced a list of items for an easy-to-use but meaningful patient-centred questionnaire.

Each participant could read and complete the questionnaire privately, with anonymity assured. This allowed opinions to be expressed freely, without the social pressure or compliance effects that may arise during group discussions. On the other hand, sharing the information gathered from the questionnaires and discussing it in the group gave participants the opportunity to critically analyse and reconsider all the items and areas of the questionnaire and to reach a final agreement. From a methodological point of view, this approach is valuable for analysing real-world data on a subjective topic such as QoL, especially in rare diseases.

The final patient-centred questionnaire is thus able to measure QoL beyond the physical symptoms and the clinical evolution of the disease. It also covers functional autonomy, psycho-emotional state, social relations inside and outside the family, work, and several aspects of medical care and assistance. The experts approved the final version after three iterations of anonymous online questionnaire completion, each followed by presentation and discussion of the results within the group. Future steps of this research will assess the psychometric properties of the questionnaire to establish its reliability and validity in measuring the QoL of EB patients.
This new tool may be a valid aid for clinicians to understand patients better and identify the areas that need more attention; moreover, it may allow them to follow the patients over time and evaluate the impact of any treatments.

# **Acknowledgements**

We are grateful to those who provided valuable contributions to this study, particularly to Simona Buonanno, Claudia Campus, Valeria Manca, Sandra Micich, and Valeria Romiti.

# **References**


# Fabrizio Culotta<sup>a</sup> **A Prospective Sustainability Indicator for Pension Systems**


<sup>a</sup> Department of Political Science, University of Genoa, Genoa, Italy. Fabrizio Culotta

# **1. Introduction**

Nowadays, population ageing is a central topic on the political agenda of many OECD countries. It is well known that, when the demographic structure of a population gradually shifts towards older ages, the pressure on the financial sustainability of the welfare system rises. Health care and pensions are the public systems most affected.

Focusing on PAYG (unfunded) public pension systems, i.e. the first pillar of the multi-pillar architecture of pension systems (Holzmann, 2005), the pressure from an older population is twofold. On the payment side, it increases both the proportion of recipients and the duration of payments. On the contribution side, it decreases the relative share of contributors. This pressure can be captured by an old-age dependency ratio, i.e. the ratio of pension recipients to pension contributors.

From a pension system perspective, the old-age dependency ratio can be conceived as an indicator of the *extensive margin* of financial sustainability, since it traces how many individuals are involved in the flow of pension payments and contributions. However, considering the extensive margin alone is not exhaustive. Indeed, within the set of indicators for *sustainable pensions* (i.e. the second objective of the Pension strand of the Open Method of Coordination), Eurostat considers indicators for two further types of margin.

The first is the *intensive margin*, i.e. how much workers contribute to, and pensioners receive from, the public pension system (as a share of GDP). On the contribution side, the intensive margin can be further decomposed into the product of the pensionable wage and the pension contribution rate.

The second is captured by durational indicators for the length of the post-retirement period and of working life. The former traces the effective duration of pension payments, which end at the individual level with the pensioner's death. The latter proxies the effective duration of pension contributions, which are often suspended in the case of unemployment (see Bravo and Herce, 2020) or even interrupted in the case of disability. These two indicators are important because they trace a third dimension of financial sustainability: for how long workers contribute and pensioners benefit. As such, they identify a *durational margin*.

Considering all three margins not only enriches the set of dimensions to be evaluated, but also stresses the importance of integrating labour market and pension statistics to analyse the sustainability of public pension systems more closely. No existing sustainability indicator for pension systems follows this approach. This work tries to fill this gap by proposing a *Prospective Indicator for the Sustainability of Pension systems* (hereafter, PISP), consistent with the information system proposed by Eurostat. As an application, a pool of European countries is considered and ranked. Finally, PISP is compared with an alternative formulation, to highlight the contribution of the durational margin, and with a benchmark indicator.

# **2. Data and Methods**

The data used to construct PISP cover five years (2015-2019) and are extracted from the Eurostat database. The pool of European countries consists of Austria, Germany, Finland, France, Italy and the Netherlands. Table 1 lists the statistics selected to construct PISP for each country in the pool.

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Fabrizio Culotta, *A Prospective Sustainability Indicator for Pension Systems*, pp. 209-214, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.39, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Fabrizio Culotta, University of Genoa, Italy, fabrizio.culotta@edu.unige.it, 0000-0002-3910-3088


*Table 1: Statistics used for the construction of PISP. Source: Eurostat.*

Once the statistics are collected, PISP is constructed as the difference between two products of margins, one for the contribution flow and one for the payment flow. Letting $i$ and $t$ index country and year, $PISP_{i,t}$ is defined as:

$$PISP\_{i,t} = PC\_{i,t} \cdot WL\_{i,t} - PP\_{i,t} \cdot LE\_{i,t} \tag{1}$$

where $PC_{i,t}$ and $PP_{i,t}$ are pension contributions and pension payments, respectively, as a share of GDP. $WL_{i,t}$ and $LE_{i,t}$ are, respectively, the duration of working life and life expectancy at retirement (age 65), both in years. Values of the statistics for each country and year are reported in Appendix A (Table 3). Note that constructing PISP does not require explicitly considering country-specific parameters of pension systems (e.g. contribution rates, pension formulas).
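Equation (1) translates directly into code. The Python sketch below computes PISP and ranks countries by it. The class and field names are hypothetical, and the input values are made-up placeholders rather than the Eurostat figures of Table 3.

```python
from dataclasses import dataclass


@dataclass
class CountryYear:
    """Statistics for one country-year (field names are illustrative)."""
    pc: float  # pension contributions, % of GDP
    wl: float  # duration of working life, years
    pp: float  # pension payments, % of GDP
    le: float  # life expectancy at 65, years


def pisp(x: CountryYear) -> float:
    """Equation (1): contributions x working life minus payments x life expectancy.

    The result is in "% of GDP x years"; it is positive exactly when the
    contribution product exceeds the payment product.
    """
    return x.pc * x.wl - x.pp * x.le


# Hypothetical values for two unnamed countries in the same year (not Eurostat data)
sample = {
    "country A": CountryYear(pc=10.0, wl=36.0, pp=12.0, le=20.0),
    "country B": CountryYear(pc=9.0, wl=32.0, pp=15.0, le=21.0),
}
scores = {name: pisp(x) for name, x in sample.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

With these placeholder inputs, country A scores 360 - 240 = 120 and country B scores 288 - 315 = -27. No contribution rates or pension formulas enter the computation, consistent with the remark above.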

# **3. Results**

The computed PISP scores are depicted for each country in Figure 1.

*Figure 1: PISP scores across countries and time. Source: author's own elaborations on Eurostat data.*

The Netherlands shows the highest PISP profile, although it has decreased in recent years. Germany and Austria show increasing profiles, while those of France and Finland are decreasing. Moreover, France's profile is penalised by a decreasing GDP share of pension contributions ($PC_{i,t}$). Italy shows the lowest profile, although it increases slightly after 2016.

The PISP defined by equation (1) is now compared with an alternative version that excludes the durational margins (CISP, standing for Current ISP). CISP is thus defined as:

$$\text{CISP}\_{i,t} = \text{PC}\_{i,t} - \text{PP}\_{i,t} \tag{2}$$

For each country in the pool, the time profile of $CISP_{i,t}$ is depicted in Figure 3.

*Figure 3: CISP scores across countries and time. Source: author's own elaborations on Eurostat data.*

Since CISP is simply the current balance of the pension system expressed as a share of GDP, its profiles show the direct impact of pension contribution and payment flows. All countries except Italy and Finland report positive values, with Finland having the lowest. Overall, three features emerge. First, Italy and Finland have the lowest profiles under both indicators. Second, the Netherlands remains in the top position, followed by Germany. Third, from 2017 onwards the financial sustainability of the French public pension system decreases. These patterns can be explained by the dynamics of each component on the contribution and payment sides.

Finally, the Mercer-Melbourne Global Pension Index - Sustainability ($GPI_{i,t}$) is taken as a benchmark (Mercer-Melbourne, 2015-2019). Country profiles are reported in Figure 4.

*Figure 4: GPI scores across countries and time. Source: Mercer-Melbourne (2015, 2016, 2017, 2018, 2019).*

Although GPI measures the sustainability of the whole pension system (i.e. all pillars), the Netherlands is again ranked first throughout the period considered. Finland, ranked second, shows a stable pattern. Germany and France, placed third and fourth respectively, follow the same trend. Lastly, Austria and Italy show a positive trend, with Italy having the lowest profile for the whole period.

PISP and CISP are then compared with each other and with the benchmark indicator GPI. For each pair of indicators, the Pearson correlation coefficient ρ is computed. Results are reported in Table 2.


*Table 2: Pearson correlation between pairs of indicators: PISP, CISP and GPI. Source: author's own elaboration.*

The first column shows that the correlation between PISP and CISP is high for all countries (0.95 on average). It also has the lowest dispersion across countries (0.06) compared with the other two columns. This result is noteworthy given that countries have different pension regimes, namely different rules determining pension contribution and payment flows.

The last two columns of Table 2 report the correlations of PISP and CISP, respectively, with the benchmark GPI. First, for France both correlations are negative. This suggests that GPI may underestimate the impact of current imbalances on the overall sustainability of public pension systems. Second, the correlations are very high for countries such as Germany, Austria, Finland and Italy. By contrast, the proposed indicators are not informative about the overall sustainability of France and the Netherlands as measured by GPI.

Overall, PISP shows some desirable properties. Its average correlation with current imbalances (CISP) is higher than that of GPI (0.95 vs 0.51), and it is also less dispersed across countries (0.06 vs 0.64).
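The correlation exercise behind Table 2 can be sketched as follows:

1. For each country, compute PISP and CISP year by year from equations (1) and (2).
2. Take the Pearson correlation between the two series over the five years.
3. Summarise the country correlations by their mean and dispersion.

The sketch treats "dispersion" as the sample standard deviation, which is an assumption, and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pisp(pc, wl, pp, le):
    """Equation (1): PC * WL - PP * LE."""
    return pc * wl - pp * le


def cisp(pc, pp):
    """Equation (2): current balance, PC - PP (% of GDP)."""
    return pc - pp


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pisp_cisp_correlation(years):
    """Correlation between PISP and CISP over one country's time series.

    `years` is a list of (pc, wl, pp, le) tuples, one per year (e.g. 2015-2019).
    """
    p = [pisp(pc, wl, pp, le) for pc, wl, pp, le in years]
    c = [cisp(pc, pp) for pc, _, pp, _ in years]
    return pearson(p, c)


def summarise(corr_by_country):
    """Cross-country mean and dispersion (sample standard deviation, assumed)."""
    values = list(corr_by_country.values())
    return mean(values), stdev(values)
```

One design consequence is visible in the code. If working life and life expectancy were equal and constant over time, PISP would be a positive multiple of CISP and ρ would be exactly 1. Correlations below 1 therefore come from the durational margin.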

# **4. Conclusions**

This work proposes an indicator of the financial sustainability of public pension systems. Its novelty lies in considering the durational margin on both the contribution and the payment side, namely the duration of working life and life expectancy at retirement. In doing so, it explicitly combines labour market and pension statistics in a single indicator, thus stressing their interplay. These features are consistent with the recent focus of Eurostat on the set of indicators selected to monitor the sustainability of pension systems.

The proposed indicator PISP satisfies some properties that are desirable in the context of the financial sustainability of public pension systems. It correlates highly with current imbalances and with the benchmark GPI, and its correlations are the least dispersed across countries. By contrast, GPI shows a weaker correlation with current imbalances and is negatively correlated with them in the case of France. This points to the need for a solid and reliable indicator that meaningfully tracks the financial performance of pension systems. The structure of PISP can be conceived as a first step towards a reliable and comprehensive information system for policymakers (Whitehouse, 2012). The challenge is of utmost importance in an ageing society, where the proportion of retired people, and accordingly the importance of pensions, increases.

# **References**


Mercer, Melbourne (2015). *Melbourne Mercer Global Pension Index*.

Mercer, Melbourne (2016). *Melbourne Mercer Global Pension Index*.

Mercer, Melbourne (2017). *Melbourne Mercer Global Pension Index*.

Mercer, Melbourne (2018). *Melbourne Mercer Global Pension Index*.

Mercer, Melbourne (2019). *Melbourne Mercer Global Pension Index*.

Whitehouse, E. (2012). Pension indicators: reliable statistics to improve pension policymaking (No. 70347, pp. 1-12). *World Bank Publications.*

# **APPENDIX A: Dataset**


*Table 3: Dataset for the construction of PISP and CISP. Source: Eurostat.*

### Illya Bakurov<sup>a</sup>, Fabrizio Culotta<sup>b</sup> **Unemployment dynamics in Italy: a counterfactual analysis at Covid time**


<sup>a</sup> Information Management School, NOVA University of Lisbon, Lisbon, Portugal. <sup>b</sup> Department of Political Science, University of Genoa, Genoa, Italy. Illya Bakurov, Fabrizio Culotta

# **1. Introduction**

The experience of the Covid pandemic has revealed the importance of statistical monitoring systems. When the phenomenon of interest is still evolving, information about past and future dynamics becomes fundamental for assessing both its effects and its future trajectories. This is especially true during the recovery and post-recovery phases.

The measures undertaken in each country to contain the spread of the virus aimed at reducing physical interaction and, a fortiori, gatherings of people. The underlying uncertainty, in the Knightian sense, forced the governments of almost all countries to impose harsh remedies, and freedom of movement was suspended for a period. Undoubtedly, the onset of Covid was an unprecedented and unexpected shock for the world population and, thus, for the whole economy. In this respect, Italy can be considered a case study.

Focusing on labour markets, two developments stand out. On the one hand, a substantial drop in unemployment was observed in Italy during 2020 (Fig. 1). Many individuals did not meet the conditions of active search and (immediate) availability for work, whose simultaneous fulfilment identifies an unemployed individual, and inactivity rose accordingly. On the other hand, the Italian government introduced a ban on dismissals that operated throughout 2020. The goal was to preserve employment by preventing firms from dismissing workers en masse, so that employment was kept from falling. Overall, this was the extraordinary regime under which the observed dynamics of unemployment evolved in Italy during 2020.

The aim of this work is to compare the observed dynamics of unemployment in Italy during 2020 with a counterfactual outcome, in order to assess the (broad) impact of Covid on the number of unemployed individuals. Counterfactual outcomes are generated by Seasonal ARIMA (SARIMA) models. Results are presented for the whole population of unemployed individuals and disaggregated by socioeconomic dimensions such as gender, age and education.

*Figure 1: Unemployed individuals (thousands) in Italy, quarterly data, 2014-2020. Source: Italian Labour Force Survey.*

Illya Bakurov, Nova University of Lisbon, Portugal, ibakurov@novaims.unl.pt, 0000-0002-6458-942X Fabrizio Culotta, University of Genoa, Italy, fabrizio.culotta@edu.unige.it, 0000-0002-3910-3088


Illya Bakurov, Fabrizio Culotta, *Unemployment dynamics in Italy: a counterfactual analysis at Covid time*, pp. 215-220, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.40, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

# **2. Data and Methods**

This work uses quarterly data from the Italian Labour Force Survey (Rilevazione sulle Forze di Lavoro, hereafter ILFS) for the years 2014-2020, focusing on the working-age population, i.e. individuals aged 15-64. Raw (not smoothed) data covering 2014-2019 are used to train SARIMA models that forecast the four quarters of 2020 (see Hyndman and Athanasopoulos, 2018, as a reference book). It is implicitly assumed that the first quarter of 2020 is the first period in which Covid affects unemployment dynamics, i.e. the training data are not affected by the treatment. The estimated projections are then compared with observed values. The causal impact of Covid is defined as the difference between what was observed during the 2020 quarters (under the influence of Covid measures) and what would have been observed in the absence of Covid measures.

This empirical exercise is performed not only for the total population but also for eleven socioeconomic groups:

- two by gender (males and females);
- five by age (15-24, 25-34, 35-44, 45-54, 55-64);
- four by educational level (primary, lower secondary, upper secondary, tertiary).

The analysis is performed in *R* with the help of the package *fpp2* (Hyndman et al., 2020).
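The counterfactual logic can be illustrated with the simplest seasonal ARIMA that applies both a first and a seasonal difference. This is SARIMA(0,1,0)(0,1,0) with quarterly seasonality (s = 4) and no drift, whose point forecasts have a closed form. This Python sketch is not the authors' R implementation. It does not reproduce the models of Table 1, which may include ARMA terms and drift, and the training series and 2020 values are hypothetical.

```python
def sarima_010_010_forecast(y, season=4, horizon=4):
    """Point forecasts from SARIMA(0,1,0)(0,1,0)_s without drift.

    The model (1 - B)(1 - B^s) y_t = e_t implies, for h <= s,
        y_hat[T + h] = y[T + h - s] + (y[T] - y[T - s]),
    i.e. the same quarter one year earlier plus the latest annual change.
    """
    if horizon > season:
        raise ValueError("this closed form only covers horizon <= season")
    if len(y) <= season:
        raise ValueError("need more than one seasonal cycle of training data")
    T = len(y) - 1
    annual_change = y[T] - y[T - season]
    return [y[T + h - season] + annual_change for h in range(1, horizon + 1)]


def covid_impact(observed, forecast):
    """Impact = observed minus counterfactual forecast, quarter by quarter."""
    return [o - f for o, f in zip(observed, forecast)]


# Hypothetical training quarters (thousands) and hypothetical 2020 observations
train = [100, 90, 80, 95, 104, 94, 84, 99]
observed_2020 = [100, 80, 95, 100]
counterfactual = sarima_010_010_forecast(train)
impact = covid_impact(observed_2020, counterfactual)
```

Here the counterfactual is [108, 98, 88, 103] and the impact is [-8, -18, 7, -3]. Negative values mean fewer unemployed than the no-Covid trajectory would imply.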

Diagnostic analyses were also performed and are available upon request. In particular, trend-cycle decompositions visually suggest that each series exhibits strong seasonality, which is, however, stable in variance over time. Visual inspection also suggests that both seasonal and first differencing may be needed. We therefore run a sequence of Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests (Kwiatkowski et al., 1992), whose null hypothesis is stationarity, and look for evidence of rejection. The KPSS results confirm that both seasonal and first differencing are needed. Model orders are selected by inspecting the ACF and PACF, and the final model is chosen using the common information criteria AIC and BIC. The estimated models are reported in Table 1.
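The differencing and stationarity checks can be sketched in plain Python. The sketch below computes the KPSS statistic for level stationarity, with a Bartlett-weighted long-run variance. The bandwidth rule 4(n/100)^(1/4) is an assumption (a common short-lag choice), and 0.463 is the 5% critical value for level stationarity tabulated by Kwiatkowski et al. (1992). The authors worked in R, so this is only an illustration.

```python
def difference(y, lag=1):
    """Lag-`lag` difference y[t] - y[t - lag]; lag=4 is the seasonal difference."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


def kpss_level(y, lags=None):
    """KPSS statistic for the null hypothesis of level stationarity.

    Residuals come from a regression on a constant; the long-run variance
    uses Bartlett weights. The default bandwidth rule is an assumption.
    """
    n = len(y)
    if lags is None:
        lags = int(4 * (n / 100) ** 0.25)
    m = sum(y) / n
    e = [v - m for v in y]
    # Partial sums S_t of the residuals
    partial, s = [], 0.0
    for v in e:
        s += v
        partial.append(s)
    # Bartlett-weighted (Newey-West) long-run variance
    lrv = sum(v * v for v in e) / n
    for k in range(1, lags + 1):
        gamma_k = sum(e[t] * e[t - k] for t in range(k, n)) / n
        lrv += 2 * (1 - k / (lags + 1)) * gamma_k
    if lrv <= 0:
        raise ValueError("series has zero long-run variance")
    return sum(p * p for p in partial) / (n * n * lrv)


KPSS_5PCT = 0.463  # 5% critical value, level stationarity (Kwiatkowski et al., 1992)
```

A sequence of tests in the spirit described above would apply `kpss_level` in turn to the raw series, to `difference(y, 4)` and to `difference(difference(y, 4))`. It would stop at the first version whose statistic falls below the critical value.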


*Table 1: Estimated Seasonal ARIMA (SARIMA) models. Drift refers to the time-invariant intercept component.*

The resulting residuals behave as white noise: Ljung-Box tests do not reject the null hypothesis of serially uncorrelated errors. Accordingly, we use these model specifications to produce the counterfactual trajectories for the 2020 quarters.
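For reference, the Ljung-Box statistic is Q = n(n+2) Σ_{k=1..h} ρ_k² / (n − k). It is compared with a chi-square distribution whose degrees of freedom equal h minus the number of estimated ARMA coefficients. The stdlib-only sketch below computes Q and, to avoid any dependency, a chi-square upper-tail probability whose closed form holds only for even degrees of freedom. Both function names are hypothetical.

```python
from math import exp, factorial


def ljung_box_q(residuals, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} rho_k^2 / (n - k)."""
    n = len(residuals)
    m = sum(residuals) / n
    e = [r - m for r in residuals]
    denom = sum(v * v for v in e)
    total = 0.0
    for k in range(1, h + 1):
        rho_k = sum(e[t] * e[t - k] for t in range(k, n)) / denom  # lag-k autocorrelation
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


def chi2_sf_even(x, df):
    """P(X > x) for X ~ chi-square(df); this closed form is valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))
```

A large p-value, i.e. failure to reject, is what supports treating the residuals as white noise. A strongly alternating series, by contrast, produces a large Q.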

# **3. Results**

Results are reported in Figures 2-5. For the sake of graphical clarity, confidence intervals are shown only for the total profile (Fig. 2); those for the other profiles are available upon request.

*Figure 2: Total profile. Observed (FITTED=F) vs Forecast (FITTED=T) at Covid time (COVID19=T). Source: authors' own elaborations on ILFS data.*

*Figure 3: Gender Profiles. Observed (FITTED=F) vs Forecast (FITTED=T) at Covid time (COVID19=T). Source: authors' own elaborations on ILFS data.*

*Figure 4: Age profiles. Observed (FITTED=F) vs Forecast (FITTED=T) at Covid time (COVID19=T). Source: authors' own elaborations on ILFS data.*

*Figure 5: Education profiles. Observed (FITTED=F) vs Forecast (FITTED=T) at Covid time (COVID19=T). Source: authors' own elaborations on ILFS data.*

All the profiles depicted above share some general features. The difference between observed and forecast values, i.e. the impact of Covid on the number of unemployed, is negative in the first quarter of 2020. It is even larger in absolute value in the second quarter, when Covid led to a marked drop in the number of unemployed. The difference turns positive in the third quarter, coinciding with the summer season, and becomes negative again in the fourth.

Results are also displayed in tabular form (Table 2). The largest impact of Covid on unemployed workers occurs in 2020-Q2 (-652000). This drop is concentrated among females (-385000, about 60% of the total drop). Across age classes, the 45-54 group reports the largest reduction (-237000, around 36%). Across educational levels, the unemployed with lower secondary education are the most affected group (-295000, about 45%).

In the third quarter of 2020, the number of unemployed rose (+410000). Men accounted for the majority of this rise (242000, about 60%). The 25-34 age class shows the largest share (153000, 37%). Similarly, the largest increase is observed for the upper secondary educational group (238000, 58%).

Over all quarters, the drop in unemployment caused by Covid during 2020 affected women more than men, especially from the second quarter onwards. Among age groups, Covid had its largest impact on unemployed individuals aged 45-54. Across educational levels, individuals without tertiary education were affected the most.
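The percentages above are each group's share of the quarter's total Covid impact. The sketch below recomputes them from the figures quoted in the text. Exact rounding gives 59% for both 385/652 and 242/410, which the text reports as about 60%. Each dimension (gender, age, education) splits the same total separately, so shares from different dimensions should not be summed.

```python
def shares(group_impacts, total):
    """Percentage of the total Covid impact attributed to each group (rounded)."""
    return {group: round(100 * value / total) for group, value in group_impacts.items()}


# Figures quoted in the text, in thousands of individuals
q2_total = -652
q2_groups = {"females": -385, "aged 45-54": -237, "lower secondary": -295}
q3_total = 410
q3_groups = {"males": 242, "aged 25-34": 153, "upper secondary": 238}

q2_shares = shares(q2_groups, q2_total)
q3_shares = shares(q3_groups, q3_total)
```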

# **4. Conclusions**

This work studies the impact of the Covid pandemic, and of the related measures, on the number of unemployed workers in Italy during the 2020 quarters. Observed and counterfactual outcomes are compared to identify the causal impact of the onset of Covid from the first quarter of 2020.

To this end, counterfactual outcomes are produced by means of SARIMA models applied to different socioeconomic groups in the population of unemployed. The causal impact is then measured as the difference between observed and forecast values. The results confirm that the drop in unemployment caused by Covid was heterogeneous, i.e. not evenly distributed across the population. Females, individuals aged 45-54 and those with secondary education were the groups associated with the largest drops.

In general, counterfactual analysis is used as a tool to identify causal mechanisms. In this work, the (macro-)econometric model is also offered as a (simple) statistical policy tool. It can be used to identify future patterns and to reason about possible thresholds or rebounds, offering informative statistical support for important decisions under uncertainty. Possibly, it can also reveal insights for future planning.


*Table 2: Counterfactual analysis on unemployment dynamics (thousands of individuals) in Italy at Covid time (2020 quarters). Covid impact is the difference between observed and forecast values. Source: authors' own elaborations on Italian Labour Force Survey data.*

# **References**

Hyndman, R. J., & Athanasopoulos, G. (2018). *Forecasting: principles and practice*. OTexts.

Hyndman, R. J., Athanasopoulos, G., Bergmeir, C., Caceres, G., Chhay, L., O'Hara-Wild, M., ..., Wang, E. (2020). *Package 'forecast'*. https://cran.r-project.org/web/packages/forecast/forecast.pdf.

Kwiatkowski, D., Phillips, P. C., Schmidt, P., & Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? *Journal of Econometrics*, **54**(1-3), 159-178.

SESSION

# TOURISM AND GASTRONOMY

### Alfonso Piscitelli<sup>a</sup>, Roberto Fasanelli<sup>b</sup>, Elena Cuomo<sup>c</sup> and Ida Galli<sup>b</sup> **Understanding the sensory characteristics of edible insects to promote entomophagy: A projective sensory experience among consumers**

<sup>a</sup> Department of Agricultural Sciences, Federico II University of Naples, Naples, Italy. <sup>b</sup> Department of Social Sciences, Federico II University of Naples, Naples, Italy. <sup>c</sup> Department of Political Sciences, Federico II University of Naples, Naples, Italy.


Alfonso Piscitelli, Roberto Fasanelli, Elena Cuomo, Ida Galli

# **1. Introduction**

The world population is continuously growing, and with it the demand for food. Processes such as urbanization and globalization are increasingly driving dietary change for a considerable part of the population. The result is a constant increase in the need for proteins of high biological value. Their production represents a challenge for the future, especially since current production techniques (i.e., animal protein farming) not only have a significant environmental impact but are also inefficient. These techniques produce high levels of carbon dioxide, consume considerable amounts of water and involve major waste-disposal problems (Amato et al., 2019).

The European Parliament has indicated that the deficit in protein sources is one of Europe's most critical problems: the Old Continent imported about 80% of its protein from other countries. Insects can be a sustainable solution to this problem, thanks to their efficient metabolism and their ability to transform organic waste into high-quality protein (Materia and Cavallo, 2015). Western countries' interest in insects as a potential source of food has grown considerably in recent years. Their high content of high-quality protein and the sustainability of their production process, compared with traditional sources (primarily meat), have fuelled the scientific debate on the topic. The progressive inclusion of insect-based ingredients in the human diet has attracted increasing attention as a valid way to overcome the major nutritional challenges the world is facing (Schrögel and Wätjen, 2019).

However, a diet based on insects (or their components) entails a radical departure from the current food traditions of Western societies. Although recent research shows that consuming insects (raw or processed) provides significant benefits in terms of protein content, social acceptance in Western societies is very low (Verneau et al., 2016; La Barbera et al., 2018; 2020). Nevertheless, insects and their derivatives are not entirely new in Western food products. Products such as jams and fruit juices contain traces of them, for an estimated average per capita consumption of 250 g/year (Materia and Cavallo, 2015; Sogari and Vantomme, 2014), even though consumers are still largely unaware of this. Several studies have analyzed consumer behavior towards insect-based foods, and many of them have identified factors that may positively or negatively influence the degree of acceptance.

Our basic hypothesis is that the intention to try insect-based dishes depends causally on sensory factors relevant to sensation seeking. To test this hypothesis, we surveyed a convenience sample of consumers to examine the relationships between their intention to try insect-based dishes and their anticipatory gustative sensations regarding "insects as food". The research data were obtained from a web questionnaire completed by consumers recruited through social media and e-mail lists.

The paper is organised as follows. After this introduction, Section 2 describes the sample survey and introduces the model for data analysis. Section 3 presents the main results of the statistical analysis of the collected data. Finally, Section 4 interprets the data with reference to sensation seeking in connection with food choice.

Alfonso Piscitelli, University of Naples Federico II, Italy, alfonso.piscitelli@unina.it, 0000-0001-6638-2759 Roberto Fasanelli, University of Naples Federico II, Italy, roberto.fasanelli@unina.it, 0000-0001-8908-3284 Elena Cuomo, University of Naples Federico II, Italy, elena.cuomo@unina.it, 0000-0001-7526-2353 Ida Galli, University of Naples Federico II, Italy, ida.galli@unina.it, 0000-0001-5159-9162

Alfonso Piscitelli, Roberto Fasanelli, Elena Cuomo, Ida Galli, *Understanding the sensory characteristics of edible insects to promote entomophagy: A projective sensory experience among consumers*, pp. 223-227, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.42, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

# **2. Data and methods**

# **2.1 Consumer survey on entomophagy**

A fact-finding/exploratory survey was conducted on a small convenience sample of Italian consumers. The inclusion criterion was having at least heard of the introduction of insects into human nutrition. The survey, conducted between July and December 2020, was carried out by means of a web questionnaire. It was completed by participants who met the inclusion criterion and were recruited through social media (e.g., Facebook, Twitter, WhatsApp chats) and e-mail lists (e.g., University of Naples student lists).

The self-administered questionnaire was structured into three sections, each with a specific data-collection aim. In the first section, we asked each participant to answer both semi-structured and structured questions built around the following dimensions: previous knowledge of "insects as food", information sources, and opinions about it. In addition, three items were administered to measure the intention to eat insects in general. This section ended with four items asking participants about their willingness to eat specific animals (cow, fish, chicken, and pig) fed with insects. All answers were collected on a 7-point scale ranging from "strongly unlikely" to "strongly likely".

A specific section was devoted to what we called the projective sensory experience (PSE). To identify respondents' anticipatory gustatory sensations regarding "insects as food", we used an ad hoc two-step tool. In the first step, we asked participants to imagine tasting an insect dish and then to rate, from 1 (imperceptible) to 10 (very perceptible), the following taste-olfactory sensations inspired by the work of Donadini et al. (2008): Sapidity, Bitter tendency, Acidity, Sweet, Spiciness, Aroma, Greasiness-Unctuosity, Succulence, Sweet, Fatness, Persistence. In the second step, since a representation is always built around a disturbing object (whether positive or negative), at the end of the task we asked interviewees to indicate, through a specific checklist, the most disturbing and the least disturbing imagined taste-olfactory sensation. The goal was to identify which kind of "sensory anchoring" participants activate when faced with an insect dish.

The questionnaire ended by collecting the respondents' descriptive characteristics (gender, age, education, living area) as well as their eating habits.

# **2.2 Analytical model**

The model for data analysis includes the intention to try insect-based dishes as a dependent variable, *Y*, a set of possible regressors, **X**, and a set of control variables, **Z**. The relationship may be written as

$$Y = f\left(X_i \mid Z_i\right),$$

where *X<sub>i</sub>* denotes the taste-olfactory sensations. The control variables, which were forced into the model, were gender, age, education, and living area. The *Y* variable is dichotomous (measured on two levels).

The logistic regression model is written as follows (Agresti, 2002; Bilder and Loughin, 2014):

$$\text{logit}(\pi) = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p,$$

where 0 < *π* < 1, logit(*π*) = log[*π*/(1 − *π*)], and *β<sub>i</sub>* measures the relation between *Y* and *X<sub>i</sub>* when all other variables in the model remain fixed. The probability *π* is defined as

$$\pi = \Pr\{Y = 1 \mid X, Z\},$$

where *Y* = 1 represents the positive intention to try insect-based dishes.

Maximum likelihood estimation is used to estimate the parameters *β*<sub>0</sub>, ..., *β*<sub>p</sub> of the logistic regression model. The probability of success is thus a function of the explanatory variables: the slope of the logit hyperplane along each *X* depends on the value of the corresponding parameter *β*.

Statistical analyses were carried out in the *R* environment (R Core Team, 2021). The logistic regression model for the dichotomous response variable was fitted with the *glm* function. The *step* function was then used to perform stepwise model selection based on the *AIC* criterion.
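The estimation and selection steps can be illustrated with a minimal sketch. It is written in Python rather than R and runs on synthetic data. The variable names `female`, `spiciness`, `acidity` and `noise`, and all coefficient values, are hypothetical; they are not the survey data.

- **Estimation.** The parameters are estimated by Newton-Raphson maximum likelihood. For the logit link this coincides with the iteratively reweighted least squares used by R's `glm`.
- **Selection.** AIC = 2k − 2 log L drives a greedy forward search, and the control variables are kept in the model throughout. R's `step` can also move backward or in both directions; this simplified search omits those moves.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_logit(X, y, iters=50, tol=1e-10):
    """Newton-Raphson MLE of logit(pi) = X beta; each row of X starts with 1.0."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        score = [0.0] * p                     # gradient of the log-likelihood
        info = [[0.0] * p for _ in range(p)]  # Fisher information (negative Hessian)
        for row, yi in zip(X, y):
            pi = sigmoid(sum(b * v for b, v in zip(beta, row)))
            w = pi * (1.0 - pi)
            for j in range(p):
                score[j] += (yi - pi) * row[j]
                for k in range(p):
                    info[j][k] += w * row[j] * row[k]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for row, yi in zip(X, y):
        pi = sigmoid(sum(b * v for b, v in zip(beta, row)))
        loglik += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return beta, loglik


def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik


def forward_select(columns, y, forced=()):
    """Greedy forward search on AIC; `forced` controls are always in the model."""
    chosen = list(forced)

    def design(names):
        return [[1.0] + [columns[c][i] for c in names] for i in range(len(y))]

    best = aic(fit_logit(design(chosen), y)[1], len(chosen) + 1)
    while True:
        candidates = [c for c in columns if c not in chosen]
        scored = [(aic(fit_logit(design(chosen + [c]), y)[1], len(chosen) + 2), c)
                  for c in candidates]
        if not scored or min(scored)[0] >= best:
            return chosen, best
        best, new = min(scored)
        chosen.append(new)


# Synthetic data: 154 respondents, 1-10 scores, hypothetical true effects
rng = random.Random(7)
n = 154
cols = {
    "female": [float(rng.random() < 0.58) for _ in range(n)],
    "spiciness": [float(rng.randint(1, 10)) for _ in range(n)],
    "acidity": [float(rng.randint(1, 10)) for _ in range(n)],
    "noise": [float(rng.randint(1, 10)) for _ in range(n)],
}
eta = [-1.0 + 0.35 * s - 0.30 * a for s, a in zip(cols["spiciness"], cols["acidity"])]
y = [1 if rng.random() < sigmoid(e) else 0 for e in eta]
chosen, best_aic = forward_select(cols, y, forced=["female"])
print(chosen, round(best_aic, 2))
```

Because the control variable is passed through `forced`, it always stays in the model, mirroring how the control variables were forced into the model in the text.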

# **3. Results**

The data were gathered from 154 consumers: about 58% of respondents were female and about 42% were male. The mean age of respondents was 43 years (SD = 14.12). Of the respondents, 53% lived in a highly urbanized area, 39% in urban suburbs, and the remaining 8% in a rural area. About 28% of respondents had a very high level of education (postgraduate specialization or PhD), 49% had a bachelor's or master's degree, and the remaining 23% had a secondary-school education.

Table 1 summarises the results of the regression analysis reporting the estimates of the regression betas obtained and their significance.

**Table 1.** Beta estimates, and related standard errors, of the regression model with intention to try insect-based dishes as criterion variable (forward stepwise selection of regressors, *n* = 154; AIC = 181.95; \*\*\* 0 < *α*<sub>obs</sub> < 0.001; \*\* 0.001 < *α*<sub>obs</sub> < 0.01; \* 0.01 < *α*<sub>obs</sub> < 0.05; ° 0.05 < *α*<sub>obs</sub> < 0.1; NS = not significant at the 10% threshold; interactions are labeled by a colon between the variables' names)


*Note: Some regressors are not individually significant but are retained according to the AIC criterion.*

Our results highlight that the willingness to try insect-based dishes depends on the respondent's fondness for Acidity (inverse relation) and Spiciness (direct relation), with a mediation effect of education level on the former. The results of the regression analysis support the following claims:





# **4. Discussion and conclusion**

Using a new, ad hoc tool, the Projective Sensory Experience (PSE), we tested whether the consumption of insects is driven by sensory reasons.

Lammers, Ullmann, and Fiebelkorn (2019) demonstrated that, in food choice, sensation seeking correlates with liking spicy food and with the willingness to try unusual foods. They also showed that people high in sensation seeking have lower food neophobia. From our results, we assume that sensations such as "spicy" are positively related to the willingness to consume insects, whereas "acidity" plays a negative role.

In summary, this study confirms the results of other research on Italian consumers, who are not completely familiar with the topic of entomophagy. Overall, only four in ten of our interviewees would try eating insects. This is in line with another study (Sogari et al., 2017), in which 47% of young Italian "foodies" envisaged eating insects. Moreover, the willingness of Italian consumers to adopt insects into their diet seems higher than in other countries of Mediterranean Europe (Mancini et al., 2019). This study also identified sensation seeking, and especially "sensory anchoring", as additional predictors of the willingness to consume insects as food. These add to already known influential factors such as gender, educational level, previous insect consumption, food neophobia and food technology neophobia. In our case, the highest level of education mediates the willingness to try spicy insect dishes.

These preliminary results allowed us to identify which aspects are worth focusing on when searching for the multidimensional motivations behind this particular food choice. They also highlight the moderating role played by some cultural factors. After the screening achieved in this pilot stage of the study, it would be interesting, for example, to understand the role of environmentalist ideology in the choice to regularly include insects in one's diet. The question arises as to whether ecological motivations really lead consumers to eat insects. A more in-depth analysis of the motivations of people already consuming insect products might provide further insights into how insects might be integrated into Italian diets. The findings of this preliminary analysis encourage us to keep exploring the sensory experience of food through a "projective" approach. Nonetheless, reducing the aversion to insects as food requires creating opportunities for the Italian population to have their own positive taste experiences. One possible route is to give these foods precisely the sensory characteristics that our interviewees already imagined as attractive to their palate.

# **References**

Agresti, A. (2002). *Categorical Data Analysis*, 2nd edition. Wiley, Hoboken, NJ.

Amato, M., Fasanelli, R. and Riverso, R. (2019). Emotional Profiling for Segmenting Consumers: The Case of Household Food Waste. *Calitatea*, **20**(S2), pp. 27–32.


### **Experience, sensorial skills and personality qualifying a wine consumer as an expert**

Luigi Fabbris<sup>a</sup>, Alfonso Piscitelli<sup>b</sup>

<sup>a</sup> Tolomeo studi e ricerche, Padua, Italy. <sup>b</sup> Department of Agricultural Sciences, Federico II University of Naples, Naples, Italy

# **1. Introduction**


This paper highlights the characteristics of wine consumers that may qualify them as wine experts. In this work, the expertise of wine consumers was measured through various degrees of self-perceived ability. Participants ranged from limited-knowledge consumers to consumers with enough knowledge to perceive wine quality or recognise certain wines and, finally, to professional experts.

Wine is an 'experience good' in that its quality is unknown before consumption. Thus, a wine expert is not only knowledgeable about wine but also practises wine consumption as a regular consumer. In Italy, wine is also a cultural product, as the consumption habitus depends on consumer taste, which is crucial in choosing products (Bourdieu, 2005). Wine culture is defined as the capacity to harmonise wine and food and to conceive of wine as a nutritional, social and health-related means. In this work, the cultural roots of wine were measured through a 'semantic differential' (Osgood et al., 1957), a rating scale designed to measure the assessors' preferences for wine.

Our basic hypothesis is that wine expertise is causally dependent on cognitive and noncognitive characteristics of the wine experience, sensorial skills that are relevant to wine assessment and wine consumption culture. To test this hypothesis, we evaluated a convenience sample of consumers to examine the relationships between their self-assessment of wine expertise and qualification of their wine-related training and experience (consumption, production, purchase), their sensorial skills (visual, olfactory, gustative), their enogastronomic culture and their approach to evaluating a set of selected wines. The research data were obtained from an evaluation questionnaire completed by a sample of wine assessors at a tasting experiment which was held during a scientific meeting in Pescara, Italy in September 2018. The sample includes both meeting participants and external experts involved in AIS-Abruzzo, the regional association of chartered sommeliers.

The paper is organised as follows. After this introduction, Section 2 describes the methodological aspects of the tasting experience and introduces the model for data analysis. Then, Section 3 presents the main results of the statistical analysis of the collected data. Finally, Section 4 interprets the data with reference to the mainstream literature on wine expertise analysis.

# **2. Data and methods**

# **2.1 The tasting experience**

In September 2018, a sensory evaluation experiment was conducted on 12 white wines originating from six grape varieties (*Trebbiano d'Abruzzo, Pecorino d'Abruzzo, Passerina d'Abruzzo, Pagadebit di Romagna* and *Pignoletto di Romagna*) from two Italian regions, Abruzzo and Romagna. All wines were controlled designation of origin (DOC) products. The pool of tasters included 48 individuals. Of these, 30 typically consumed mild amounts of wine (mild consumers), and 18 were professional sommeliers belonging to the AIS-Abruzzo association. Both mild consumers and sommeliers were selected on the basis of their interest in and availability for the experiment, as well as their experience in wine consumption.

Luigi Fabbris, University of Padua, Italy, luigi.fabbris@unipd.it, 0000-0001-8657-8361

Alfonso Piscitelli, University of Naples Federico II, Italy, alfonso.piscitelli@unina.it, 0000-0001-6638-2759

Luigi Fabbris, Alfonso Piscitelli, *Experience, sensorial skills and personality qualifying a wine consumer as an expert*, pp. 229-234, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.43, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

The wine characteristics considered in this evaluation experiment were recorded through an anonymous paper questionnaire. This questionnaire asked participants to judge 11 intrinsic attributes of appearance, nose and palate for four wines randomly selected from the 12 at hand. Participants were then instructed to provide an overall judgement of each wine. The questionnaire also gathered data on the tasters' background characteristics, their drinking habits, and the relevance of wine in their diet and social life. In this work, we confine the analysis to the characteristics of tasters. The characteristics of the assessed wines enter the analysis only as distributional parameters (mean and variance) of the scores that individual assessors assigned to the tasted wines.

# **2.2 The experiment**

The experiment involved a horizontal tasting, as it compared only white wines from the same terroir and of the same vintage. On this basis, it is possible to obtain comparative judgements between the selected wines. In accordance with a fractional factorial design, each taster was administered four randomly selected wines from different grapes. The sampling of the administered wines was carried out at the grape-variety level. Only four of the six possible varieties were administered to any taster, and for each variety one of the two potential cellars was randomly selected. In this case, the experiment sampled possible choices rather than choosers (Manski and Lerman, 1977).

The sampling design followed a systematic pattern such that each grape variety appeared 8 times every 12 trials. Thus, each wine variety had 32 repetitions once 48 tasters had performed their task; consequently, the number of repetitions of each variety by cellar was 16.
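The counts above can be checked with a small sketch. It builds one hypothetical systematic pattern, not necessarily the authors' scheme. In this pattern, taster t skips varieties t mod 6 and (t + 3) mod 6, and each variety alternates between its two cellars on successive occurrences. The text reports random cellar selection; here, alternation is used as an assumption that guarantees the 16 repetitions per variety and cellar.

```python
from collections import Counter

N_VARIETIES = 6
N_CELLARS = 2
N_TASTERS = 48


def systematic_design(n_tasters=N_TASTERS, n_varieties=N_VARIETIES):
    """Hypothetical pattern: taster t skips varieties t % 6 and (t + 3) % 6.

    Returns, for each taster, a list of (variety, cellar) pairs. The cellar
    alternates 0, 1, 0, 1, ... across successive occurrences of each variety.
    """
    next_cellar = [0] * n_varieties
    design = []
    for t in range(n_tasters):
        skipped = {t % n_varieties, (t + 3) % n_varieties}
        wines = []
        for v in range(n_varieties):
            if v in skipped:
                continue
            wines.append((v, next_cellar[v]))
            next_cellar[v] = (next_cellar[v] + 1) % N_CELLARS
        design.append(wines)
    return design


design = systematic_design()
first_12 = Counter(v for wines in design[:12] for v, _ in wines)
by_variety = Counter(v for wines in design for v, _ in wines)
by_variety_cellar = Counter(pair for wines in design for pair in wines)
print(sorted(first_12.values()))           # [8, 8, 8, 8, 8, 8]
print(sorted(by_variety.values()))         # [32, 32, 32, 32, 32, 32]
print(sorted(by_variety_cellar.values()))  # twelve values, all equal to 16
```

The arithmetic behind the output is simple. There are 48 tasters with four wines each, giving 192 tastings; spread over six varieties, that is 32 tastings per variety, and splitting each variety across two cellars gives 16.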

Each taster had five glasses: one for water and four for the wines. The wines were poured in a flight. In the tasting session, the judges received 6 centilitres of each of the four randomly selected wine varieties, which were served at the same cold temperature. The protocol allowed tasters to taste and re-taste before finalising their preference judgements, and required them to evaluate the intrinsic attributes of each tasted wine.

# **2.3 Analytical model**

The model for data analysis includes the self-evaluation of wine expertise as a dependent variable, *Y*, a set of possible regressors, **X**, and a set of control variables, **Z**. The relationship may be written as

$$Y = f(X_1, \, X_2, \, X_3, \, X_4, \, X_5 \mid Z),$$

where *X*<sub>1</sub> denotes wine expertise and learning experience, *X*<sub>2</sub> represents the descriptors of wine habits, *X*<sub>3</sub> refers to the sensorial skills, *X*<sub>4</sub> signifies the descriptors of wine-related attitudes and culture, and *X*<sub>5</sub> is the evaluation style of the tasted wines. The latter was measured through the mean and standard deviation of the scores for the four tasted wines. The underlying hypothesis was that the evaluations by experts would be more critical and uniform than those of nonexperts. The control variables, which were forced into the model, were gender, age and smoking experience. The *Y* (ordinal) variable was measured on four levels.

The ordinal logistic regression model is written as follows (Agresti, 2002; Bilder and Loughin, 2014):

$$\text{logit}\left[p(Y \le j)\right] = \beta_{j0} + \beta_1 X_1 + \dots + \beta_p X_p, \quad j = 1, \dots, J-1,$$

where logit(*p*) = ln[*p*/(1 − *p*)], and *β<sub>i</sub>* measures the relation between *Y* and *X<sub>i</sub>* when all other variables in the model remain fixed. We adopted the proportional odds model. This model assumes that the logit of the cumulative probabilities changes linearly as the regressors change. It also assumes that the slope of the relationship between *Y* and the *X*'s is the same regardless of the category *j* of variable *Y*.
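To see how the proportional odds model turns one shared linear predictor into category probabilities, consider the minimal sketch below. The category-specific intercepts and the values of the linear predictor are hypothetical.

The sketch follows the sign convention written above, logit P(Y ≤ j) = β<sub>j0</sub> + η. Under this convention, a larger η shifts probability towards the lower categories. R's `polr` parameterises the model as logit P(Y ≤ j) = ζ<sub>j</sub> − η, so its slope estimates carry the opposite sign.

```python
import math


def cumulative_logit_probs(intercepts, eta):
    """Category probabilities under logit P(Y <= j) = beta_j0 + eta, j = 1..J-1.

    intercepts: the J-1 category-specific intercepts beta_j0, in increasing
                order (otherwise some category probabilities would be negative).
    eta:        the shared linear predictor beta_1 X_1 + ... + beta_p X_p.
    """
    cum = [1.0 / (1.0 + math.exp(-(b + eta))) for b in intercepts] + [1.0]
    # P(Y = 1) = P(Y <= 1); P(Y = j) = P(Y <= j) - P(Y <= j - 1)
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


intercepts = [-1.0, 0.5, 2.0]  # hypothetical values for a 4-level Y
for eta in (0.0, 1.0):
    print(eta, [round(p, 3) for p in cumulative_logit_probs(intercepts, eta)])
```

Raising η by one unit raises every cumulative logit by exactly one unit, whichever cut-point *j* is considered. This is the "same slope regardless of category" assumption described above.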

The ordinal logistic regression model was fitted with the *polr* function from the MASS package (R Core Team, 2021). The *stepAIC* function was then used to perform stepwise model selection based on the *AIC* criterion.

# **3. Results**

Of the 48 assessors, five (10.4%) considered themselves to be wine experts, and eight (16.7%) stated that they were able to recognise some wines but did not consider themselves to be wine experts. The majority of the participating sommeliers classified themselves in the latter category. A larger group of assessors (47.9%) indicated that they possessed sufficient knowledge of wine to adequately understand its quality. Finally, 25% of the assessors admitted that they knew little or very little about wine.

Overall, our sample included a group of experts and a group of nonexperts (each accounting for approximately one-quarter of the tasters), as well as a larger, intermediate category of mildly informed amateurs (about one-half of the tasters). Only 3 of the 48 assessors produced or bottled their own wine. The others bought it occasionally or monthly, either at wineries or in supermarkets or wine shops. A few (8.3%) purchased wine through the internet.

Regarding wine practice, about 56% of assessors had been consuming wine for decades, usually with dinner. The majority (54.2%) had attended a wine-tasting session coordinated by a sommelier. One-half of the tasting sample was female, and the average age was 47. Most assessors had a college degree (66.7%), worked at a university (81.3%) and did not smoke (41.7% had never smoked, and 29.2% had formerly smoked).

Table 1 summarises the results of the regression analysis and presents the estimates of the regression betas and their significance. We highlight the high explanatory power of the model (R<sup>2</sup> = 61.3%). To corroborate the regression results, selected covariates are crossed with the self-perceived expertise of assessors (Table 2).



*Note: Some regressors are not individually significant but are retained according to the AIC criterion.*

The analysis supports the following claims:



**Table 2.** Some covariates, by the self-perceived expertise of assessors

*Note: The levels "being wine experts" and "being able to recognise some wines" of the self-evaluation of wine expertise have been merged into the "Expert" category.* 


# **4. Discussion and conclusion**

This work has aimed to define the characteristics of wine experts. An expert is a person who possesses in-depth knowledge, abundant experience, a proclivity for vivid imagery and a stronger descriptive capacity than that of other people (Parr et al., 2002; Ericsson et al., 2007; Croijmans and Majid, 2016; Croijmans et al., 2020). A wine expert also demonstrates an acute capacity to recognise, classify and evaluate wine characteristics. Notably, skill training can be particularly effective with practice.

The analysis illustrates that experts assumed a leading role in wine selection, a habit they adopted decades ago and have refined over time. Attending specific courses and tasting events led by sommeliers, as well as between-peer contests, improved their expertise and self-confidence. An expert clearly seeks to train themself both by exploring new sensations and products and by intensifying their sensorial skills.

Furthermore, the data analysis indicates that experts perceive themselves as different from producers and bottlers. For both producers and experts, wine is a professional means as well as an important part of their life. However, producers (should) know how to make high-quality wines, whereas experts approach wine from a position which resembles that of an explorer who is constantly seeking out new land to discover. For experts, such exploration targets unfamiliar sources of sensation (e.g. aromas, bouquets, flavours, terroirs, ages, faults) for themselves and possibly for other initiates as well.

Several experiments have reported the tendency of experts to scout out new sensations. For example, in a study by Mariño-Sánchez et al. (2010), Spanish wine tasters perceived more odours as intense but fewer as irritating compared to the non-trained healthy population. The identification of wine peculiarities, in particular when practicing the olfactory-gustatory skill, involves cognitive skills. Croijmans et al. (2020) have suggested that expertise entails a heightened ability to imagine hidden structures, thus extending the plasticity of cognition which underlies the chemical senses. On this basis, a wine expert can recognise, discriminate and match the peculiarities of a wine much more effectively than a novice. According to Ericsson et al. (2007), a particular kind of practice – a deliberate practice – is imperative to develop expertise. The practice of wine experts essentially concerns revelation, as they strive to identify elements of wine that were previously unknown or difficult for most people to detect independently. Therefore, a wine expert resembles a member of an uncovered sect or, in more politically correct terms, a highly exclusive professional cluster.

In this study, the experts evaluated the tasted wines more highly than the other tasters, which may be considered an indirect compliment to the people who selected the tasted wines. Moreover, the category of experts displayed significantly less divergence in their evaluation scores compared to those of the other tasters. This finding supports the view of experts as a compact cluster and implies that their judgements of wine are generally more reliable than those of other assessors.

Finally, the findings highlight some differences amongst the evaluation styles of experts. In view of this, future research could consider an analysis of between-expert differences.

# **References**

Agresti, A. (2002). *Categorical Data Analysis*, 2nd edition. Wiley, Hoboken, NJ.


### **Prediction of wine sensorial quality: a classification problem**

Maurizio Carpita<sup>a</sup>, Silvia Golia<sup>a</sup>

<sup>a</sup> Department of Economics and Management, University of Brescia, Brescia, Italy

# 1. Introduction

When dealing with a wine, it is of interest to be able to predict its quality based on chemical and/or sensory variables. There is no agreement on what wine quality means or how it should be assessed; it is often viewed in intrinsic (physicochemical, sensory) or extrinsic (price, prestige, context) terms (Jackson, 2017). For example, in Golia et al. (2017) quality was measured by a global score ranging from 0 to 100. The score was produced by Altroconsumo, an independent Italian consumers' association, and was based on a large set of variables, including chemical and sensory variables as well as context variables. Cortez et al. (2009) used an indicator ranging from 0 to 10 (0 meaning very bad and 10 excellent), obtained from the evaluations of experienced judges who scored the wines.

In this study we started from the Cortez et al. (2009) paper, but we maintained the categorical nature of the variable measuring the wine sensorial quality. The approach to the prediction of this categorical variable followed by Cortez and coauthors makes use of the observed wine quality. Its drawback is that the wine quality measure must be known. Instead, in this paper we started from the vector of predicted probabilities of the categories of the target variable, obtained by applying the Cumulative Logit Model, and then applied a classifier to predict the final category. This last step is the focus of this paper. We compare the predictive performance of the default method (Bayes Classifier), which assigns a unit to the most likely category, with that of two other methods (Maximum Difference Classifier and Maximum Ratio Classifier). To do so, we use the data analysed in Cortez et al. (2009) concerning both the white and red variants of the Portuguese "Vinho Verde" wine.

The paper is organized as follows. Section 2 discusses the categorical classifiers used in this study, whereas Section 3 reports the results concerning the prediction of the wine sensorial quality. Conclusions follow in Section 4.

# 2. The categorical classifiers

As stated in the introduction, the statistical problem of this study concerns how the vector of predicted occurrence probabilities of the categories of the categorical target variable is transformed into a single value. The default method is the *Bayes Classifier* (BC), which assigns a unit to the most likely category. BC minimizes, on average, the test error rate (James et al., 2013), so it is the optimal criterion when classification accuracy is the main goal. Nevertheless, BC favors the prevalent category. When there is no category of interest and all the categories have the same relevance, it may not be the best choice.

Starting from this observation, in Golia and Carpita (2018, 2020) we investigated the performance of different categorical classifiers, some of which also take into account the ordinal nature of the target variable. We found the so-called *Maximum Difference Classifier* (MDC) promising. In this study we considered MDC and a new classifier denoted as *Maximum Ratio Classifier* (MRC). Both classifiers are based on the comparison between the predicted probabilities and the sample frequencies, and they are defined as follows.

Maurizio Carpita, Silvia Golia, *Prediction of wine sensorial quality: a classification problem*, pp. 235-238, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.44, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Maurizio Carpita, University of Brescia, Italy, maurizio.carpita@unibs.it, 0000-0001-7998-5102 Silvia Golia, University of Brescia, Italy, silvia.golia@unibs.it, 0000-0003-0015-8126

Let pr<sub>i</sub> be the predicted probability of the category c<sub>i</sub> (i = 1, 2, ..., k) of the categorical variable C, and let fr<sub>i</sub> be the corresponding frequency computed from the observed data. The MDC computes the deviations of pr<sub>i</sub> from fr<sub>i</sub> and takes the category corresponding to the maximum difference, that is:

$$\text{MDC}: \arg\max_{i \in \{c_1, c_2, \dots, c_k\}} (pr_i - fr_i).$$

This classifier extends the rule proposed by Cramer (1999) for the dichotomous case.

The MRC computes the relative deviations of pr<sup>i</sup> from fr<sup>i</sup> and takes the category corresponding to the maximum ratio, that is:

$$\text{MRC}: \arg\max_{i \in \{c_1, c_2, \dots, c_k\}} (pr_i / fr_i).$$
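The three decision rules can be sketched as follows. The sketch assumes that the predicted probabilities pr<sub>i</sub> for one unit and the sample frequencies fr<sub>i</sub> are already available as lists; all numbers are hypothetical. The example shows how the rules can disagree:

- BC picks the prevalent category.
- MDC picks the category whose predicted probability most exceeds its base rate.
- MRC favours a rare category with the largest relative excess.

```python
def bayes_classifier(pr):
    """BC: index of the most likely category."""
    return max(range(len(pr)), key=lambda i: pr[i])


def max_difference_classifier(pr, fr):
    """MDC: index maximising pr_i - fr_i."""
    return max(range(len(pr)), key=lambda i: pr[i] - fr[i])


def max_ratio_classifier(pr, fr):
    """MRC: index maximising pr_i / fr_i (all fr_i must be > 0)."""
    return max(range(len(pr)), key=lambda i: pr[i] / fr[i])


categories = ["c1", "c2", "c3", "c4"]
fr = [0.05, 0.30, 0.45, 0.20]  # hypothetical sample frequencies
pr = [0.08, 0.28, 0.40, 0.24]  # hypothetical CLM probabilities for one unit

print(categories[bayes_classifier(pr)])               # c3
print(categories[max_difference_classifier(pr, fr)])  # c4
print(categories[max_ratio_classifier(pr, fr)])       # c1
```

Here the differences pr<sub>i</sub> − fr<sub>i</sub> are 0.03, −0.02, −0.05 and 0.04, while the ratios pr<sub>i</sub>/fr<sub>i</sub> are 1.6, about 0.93, about 0.89 and 1.2. This is why MRC can reward low-frequency categories more aggressively than MDC.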

# 3. The prediction of wine quality

The data under study concern the sensorial quality of the white and red variants of the Portuguese "Vinho Verde" wine (Cortez et al., 2009). The wine quality was measured by a sensory preference variable, from now on denoted as SPV, using a 0-10 scale. For each wine, eleven of the most common physicochemical variables were recorded. These represent the explanatory variables for the SPV, which is the target variable. Table 1 reports the frequencies of SPV scores observed in the white and red wine data sets. Not all the available scores were used, and some of them have a low frequency.

Table 1: Frequencies of the sensory preferences observed in the white and red wine data sets


The model used to study and predict the occurrence probabilities of each of the categories of the SPV is the *Cumulative Logit Model* (CLM) (Agresti, 2010), defined as follows. Let Y be a categorical target variable with k ordinal categories {1, 2, ..., k}, and let {X<sub>1</sub>, ..., X<sub>p</sub>} be a set of explanatory variables. For the statistical unit s, the CLM has the following form:

$$\text{logit}[P(Y_s \le i)] = \log \frac{P(Y_s \le i)}{1 - P(Y_s \le i)} = \alpha_i + \sum_{m=1}^p \beta_m x_{sm}, \quad \text{for } i = 1, 2, \dots, k - 1.$$

Once the parameters are estimated, the model can be used for predictive purposes. For each unit, the CLM gives the k predicted probabilities that are passed to the categorical classifier.

Several indicators computed from the confusion matrix can be used to evaluate the predictive performance of a classifier. In this study they are:

- the *Sensitivity* (Sen) of each category;
- the *Maximum Distance Between Sensitivities* (MDBSen);
- the *Overall Accuracy* (OvAc);
- the *Macro Average F1 score* (MAF1);
- the *Kappa statistic* (Kappa) (Raschka and Mirjalili, 2019).

Sen<sub>i</sub> expresses how well the classifier recognizes a unit belonging to the category c<sub>i</sub>. MDBSen, defined as:

$$\mathrm{MDBSen} = \max_{i \neq j} |\mathrm{Sen}_i - \mathrm{Sen}_j|,$$

highlights how balanced the classifier's ability to assign a unit to the right category is: the lower the MDBSen, the more balanced the classification. The OvAc is the rate of correct classification, and it is the indicator maximized by BC. The MAF1 is another indicator of the accuracy of the classifier, obtained as the average of the class-by-class F1 scores. MAF1 was chosen instead of the weighted average F1 score in order to attribute the same relevance to all classes. Kappa measures the agreement between the actual and the predicted classifications of a dataset, while correcting for agreement occurring by chance.
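The indicators can all be computed from a single confusion matrix. The sketch below assumes rows are actual categories, columns are predicted categories, and every category has at least one actual unit. The 3 × 3 matrix is hypothetical, not taken from the wine data. Kappa is computed here as Cohen's kappa, with chance agreement taken from the row and column margins.

```python
def classification_metrics(cm):
    """Indicators from a k x k confusion matrix cm[actual][predicted]."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]                       # actual counts
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted counts
    tp = [cm[i][i] for i in range(k)]

    sen = [tp[i] / row_tot[i] for i in range(k)]  # sensitivity (recall) per category
    mdbsen = max(sen) - min(sen)                  # equals max over i != j of |Sen_i - Sen_j|
    ovac = sum(tp) / n                            # overall accuracy
    # F1_i = 2 TP_i / (2 TP_i + FP_i + FN_i) = 2 TP_i / (row_tot_i + col_tot_i)
    f1 = [2 * tp[i] / (row_tot[i] + col_tot[i]) for i in range(k)]
    maf1 = sum(f1) / k                            # unweighted (macro) average
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2  # chance agreement
    kappa = (ovac - p_exp) / (1 - p_exp)
    return {"Sen": sen, "MDBSen": mdbsen, "OvAc": ovac, "MAF1": maf1, "Kappa": kappa}


cm = [[8, 2, 0],   # hypothetical 3-category confusion matrix
      [3, 30, 7],
      [0, 5, 15]]
for name, value in classification_metrics(cm).items():
    print(name, value)
```

For this matrix, the sensitivities are 0.80, 0.75 and 0.75, so MDBSen = 0.05. The other indicators are OvAc = 53/70 ≈ 0.757, MAF1 = 521/693 ≈ 0.752 and Kappa = 24/41 ≈ 0.585.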

Table 2 reports the values of these statistics computed on the basis of the in-sample prediction of the SPV of all the available wines. For the sake of clarity, the last three rows report the percentage variation of OvAc, MAF1 and Kappa with respect to the values obtained by applying BC.


In the face of an expected but limited reduction in OvAc (6.4% for white wines and 2.7% for red wines), MDC performs better than BC with respect to MAF1 and Kappa and shows more balanced values of the sensitivities, especially for the white wines. MRC outperforms both BC and MDC in terms of balancing the sensitivities, but loses a lot in terms of OvAc and Kappa.

Given that the lowest and highest sensory preferences have low frequencies, we merged the first two and the last two categories for both wine variants. This gives an SPV on a 5-category ordinal scale for white wines and on a 4-category ordinal scale for red wines.

Table 3 reports the indicators of Table 2, except for the sensitivities of the single categories. The results show the same behaviour observed in Table 2. Note that, in this case too, some categories have a low frequency while others absorb the majority of the statistical units.

# 4. Conclusions

In this paper we investigated the impact of different classifiers on the capability to predict the sensorial quality of the Portuguese "Vinho Verde" wine. We studied this variable by applying the CLM for prediction purposes. We then transformed the predicted occurrence probabilities of each of its categories into a single sensory preference through three different classifiers: the BC, the MDC and the MRC. The results show that, despite an expected but limited reduction of the overall accuracy, the MDC seems to be a suitable categorical classifier in two situations. The first is an unbalanced context, that is, when some categories absorb almost all the statistical units. The second is when all the categories have equal importance (i.e. different types of misclassification do not involve different costs).

Table 3: Performance Indicators for white and red wines after merging some categories

# References


Jackson R.S. (2017). *Wine Tasting, 3rd ed*. Academic Press.


### **Tourism of Italians in Italy through crisis and development: the last 15 years, region by region**

Fabrizio Antolini<sup>a</sup>, Antonio Giusti<sup>b</sup>

<sup>a</sup> Department of Business Communication, University of Teramo, Italy. <sup>b</sup> Department of Statistics, Computer Science, Applications, University of Florence, Italy.

# **1. Introduction**

Tourism is a very important economic activity for many nations, and Italy is among those that benefit most from it. However, the economic effects of tourist flows also depend on their composition. For example, the distinction between international and domestic tourism is important to understand the pattern of expenditure. Similarly, the distinction between tourism within the region and tourism from outside the region is important to improve the planning of certain services (in particular transport) or to evaluate the attractiveness of the area as a tourist destination.

Many believe that local tourism has increased in recent years; however, a precise estimate of the phenomenon is not easy to make. An indirect estimate from official statistical sources can be obtained from tourist flows within the region, or by observing the trend of same-day visitors (excursionists) as an indirect measure of the phenomenon. Finally, new information sources (big data) could be useful, even though their quality does not yet seem able to guarantee an adequate representation of the phenomenon.

However, the analysis of external tourist flows is also relevant, since they express the attractiveness of the territories (in this case, regions) as tourist destinations, which can increase or decrease over time, partly as a result of the policies implemented at the territorial level.

The paper examines the tourism of Italians in Italy, region by region, from 2006 to 2020, using the origin-destination matrix produced by ISTAT and distinguishing external flows from those within the region. Particular attention is given to 2020, for which economic information is provided, showing that, overall, the tourism sector in Italy continued to play an important role despite the pandemic crisis. The choice of arrivals, instead of nights-spent, reduces the influence of the specific type of tourism in each region. The initial results appear interesting and have also been summarised using correspondence analysis.

# **2. The territory and tourist flow among regions: different approaches**

Measuring tourism trends raises several problems, since the measurement can serve very different objectives. Moreover, whatever the variable considered, at the territorial level tourism almost always shows a high degree of variability, even between contiguous areas. This inevitably raises the question of the real usefulness of an aggregate analysis of tourism. While the territorial variable is important, the choice of territorial detail is often conditioned by the availability of data and by the sectoral policy competences of the territories. In our analysis, the detail used is regional, since most public tourism policies are decided at this level, and because more data, especially economic and social data, are available. On the other hand, there is still a lack of evaluation models at the territorial level that could also be used to monitor the various measures, using a simplified input-output scheme in which the two terms of the hypothetical function are public tourism expenditure (input) and arrivals or nights-spent (output). The latter two indicators, often used interchangeably, have a very different descriptive capacity (Antolini et al., 2017). For example, a territory whose nights-spent grow more than its arrivals is a territory that succeeds in retaining tourists. This may be because the services offered are competitively priced, or because the territory has been able to exploit its many tourist destinations.

Fabrizio Antolini, Università di Teramo, Italy, fantolini@unite.it, 0000-0002-3112-524X

Antonio Giusti, University of Florence, Italy, antonio.giusti@unifi.it, 0000-0001-9804-4578

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Fabrizio Antolini, Antonio Giusti, *Tourism of Italians in Italy through crisis and development: the last 15 years, region by region*, pp. 239-244, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.45, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

As far as policy is concerned, two different approaches can be followed: one carries out many small actions, appropriately coordinated across different sectors; the other carries out a small number of actions concentrated in the sectors considered a priority among the possible actions. Whichever approach is chosen (and the choice should follow an overall analysis of the territory's strengths and weaknesses), a detailed knowledge of tourist flows is indispensable.
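The retention argument about arrivals and nights-spent rests on the average length of stay, that is, nights-spent divided by arrivals. For positive figures, nights-spent growing proportionally more than arrivals is equivalent to the average stay increasing. The small Python sketch below makes this check explicit. The function names and the regional figures are hypothetical.

```python
def average_stay(nights: float, arrivals: float) -> float:
    """Average length of stay: nights-spent per arrival."""
    return nights / arrivals


def growth(before: float, after: float) -> float:
    """Relative growth between two periods (0.2 means +20%)."""
    return after / before - 1.0


def retains_tourists(n0: float, a0: float, n1: float, a1: float) -> bool:
    """True when nights-spent grow proportionally more than arrivals.

    For positive values, n1/n0 > a1/a0 is equivalent to n1/a1 > n0/a0,
    i.e. the average length of stay has increased.
    """
    return growth(n0, n1) > growth(a0, a1)


# Made-up figures for one region in two years: nights +20%, arrivals +5%.
n0, a0, n1, a1 = 1000.0, 400.0, 1200.0, 420.0
```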

Besides the distinction between arrivals and nights-spent, it is very useful, when analysing tourism flows, to distinguish between internal and external flows. Especially when describing the impact of Covid-19 on the tourism sector, reference is often made to proximity tourism, defined as the form of tourism characterised by trips relatively close to the visitors' place of residence. One way to measure proximity tourism is to break down regional tourism flows into internal and external components. As we shall see, at the regional level the two flows have not always shown similar trends over the period considered (2006-2020). Big data are another important tool for measuring movements: in the Covid-19 period they proved a powerful means of analysing people's movements across the territory. We refer to Google's COVID-19 Community Mobility Reports (https://www.google.com/covid19/mobility/), which provide information to assess movements in a specific territorial area. For example, these data show that in the metropolitan area of Rome, in the period July 15th - August 26th, park attendance increased by 4% compared with the reference period. The use of these data for the analysis of tourist flows still presents several critical issues, which could be resolved by better targeting the usability of the data. In fact, these data do not yet allow a comparison between geographical areas with specific tourist vocations (mountain, seaside, cultural, …).

# **3. The data employed and the descriptive analysis**

The breakdown of flows into internal and external uses the origin-destination matrix of arrivals at the regional level, as recorded in the survey on the movement of clients in accommodation establishments (ISTAT, 2020). This is a census survey (Petrei and Manente, 2018; Antolini and Grassini, 2020) that produces information down to the level of individual municipalities. The origin-destination matrix, however, is available only at the regional level, whereas it would be important to have it down to the provincial level. Tourism within the region could then be mapped very precisely, and this could also support regional transport planning. The analysis considers tourist arrivals, as we want to analyse tourist flows from origin to destination and highlight connectivity between regions. That analysis will be carried out at a later stage. The present preliminary analysis aims to break down regional tourist flows into external and internal components.
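Breaking an origin-destination matrix of arrivals into internal and external flows amounts to separating its diagonal from the rest of each destination column. The sketch below assumes rows are origin regions and columns are destination regions. That orientation, the function name and the three-region figures are illustrative assumptions, not the ISTAT layout.

```python
from typing import Dict, List


def internal_external(od: List[List[float]], regions: List[str]) -> Dict[str, Dict[str, float]]:
    """Split arrivals in each destination region into internal and external flows.

    Assumed layout: od[i][j] = arrivals of residents of region i (origin)
    in accommodation establishments of region j (destination).
    """
    result = {}
    for j, name in enumerate(regions):
        total = sum(od[i][j] for i in range(len(regions)))  # all arrivals in region j
        internal = od[j][j]  # diagonal: residents travelling within their own region
        result[name] = {"internal": internal, "external": total - internal, "total": total}
    return result


# Hypothetical three-region matrix (rows = origins, columns = destinations).
regions = ["A", "B", "C"]
od = [[50, 30, 20],
      [10, 40, 5],
      [15, 25, 60]]
flows = internal_external(od, regions)
```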

In Figure 1 we consider arrivals, because our initial objective is to verify whether the two flows always follow the same trend at the regional level and, subsequently, through correspondence analysis, to measure the level of similarity between regions (Rij). The first consideration is that, over the years, tourist flows within the region are always lower than those from other regions, except for Sicily from 2016 to 2017 and Piedmont from 2008 to 2012. In any case, the difference between external and internal tourism remains the simplest index of attractiveness, which is particularly high in some of the regions shown in Figure 1 (Friuli, Trentino, Tuscany, Umbria, Aosta Valley, Marche).
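The difference index can be tracked year by year, and years with a negative value are exactly the exceptions noted above, where internal flows exceed external ones. The series below are made up for one hypothetical region, and the function names are illustrative.

```python
from typing import Dict, List


def attractiveness_index(internal: Dict[int, float], external: Dict[int, float]) -> Dict[int, float]:
    """Simplest attractiveness index: external minus internal arrivals, year by year."""
    return {year: external[year] - internal[year] for year in sorted(internal)}


def years_internal_exceeds(internal: Dict[int, float], external: Dict[int, float]) -> List[int]:
    """Years in which arrivals from within the region exceed those from outside."""
    return [y for y, d in attractiveness_index(internal, external).items() if d < 0]


# Made-up series for one region (not ISTAT figures).
internal = {2015: 100.0, 2016: 110.0, 2017: 105.0}
external = {2015: 120.0, 2016: 100.0, 2017: 130.0}
```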

Overall, internal tourism is less variable than external tourism, partly because it is a more habitual flow. There is also a statistical measurement problem that affects the level of internal tourism (but not its trend). Only rented houses managed as a business or rented with a registered contract for tourism purposes are recorded, while privately owned second homes escape the survey. Finally, the level of internal tourism is also conditioned by the size of the resident population. Together, these aspects explain both the lower level and the lower variability of internal tourist flows compared with external ones.
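The paper does not state how variability is measured. One scale-free option is the coefficient of variation (standard deviation over mean). This measure is useful here because internal flows are systematically lower than external ones, so raw standard deviations would not be comparable. The sketch below is an assumption-based illustration with made-up yearly arrivals.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(series: Sequence[float]) -> float:
    """Population standard deviation divided by the mean (scale-free)."""
    return statistics.pstdev(series) / statistics.mean(series)


# Made-up yearly arrivals for one region.
internal = [100.0, 102.0, 98.0, 101.0, 99.0]
external = [200.0, 230.0, 170.0, 240.0, 160.0]
cv_int = coefficient_of_variation(internal)
cv_ext = coefficient_of_variation(external)
```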

However, some regions show variability in internal flows, for example Lazio, Campania, Apulia, Calabria (only from 2016) and Sardinia. Among these, Campania is the only region with an extremely irregular trend in internal flows: they decrease from 2008 to 2016 and increase from 2016 to 2019, though they never move far from the levels recorded in 2006 and 2007. The trend can be explained by the economic crisis, which was particularly severe at the regional level. Internal tourist flows follow the same trend as GDP, particularly in 2012-2013, when GDP fell by 0.8 and 1.3 per cent. Sicily is the only region where internal tourism exceeds external tourism in 2015 and 2016. This is due to the decrease in external flows, while the internal flow, although growing slightly, remains substantially stable. The dynamics of external flows appear to be conditioned by the lack of infrastructure, which makes the territory difficult to reach and to move around.

In addition, the small and sometimes negative difference (2016-2017) between flows from outside and inside the region indicates Sicily's low attractiveness. It should also be noted that in years when external tourist flows to Sicily decrease (e.g., 2016-2017), flows from neighbouring regions to Basilicata, Apulia and Calabria increase. Finally, the decrease in external flows in 2012 can be traced back to the country's recessionary crisis at that time.

The year 2020 deserves a separate analysis because, due to the pandemic, the levels and composition of tourist flows changed. In the summer period (July-September 2020), the number of clients in accommodation establishments fell by 36.1 per cent year on year. This was due to the sharp drop in foreign visitors, who fell by 39.7 per cent, although the overall flow was partly sustained by residents who, travelling abroad less, turned to destinations within the country. Italian tourists accounted for 86.2 per cent of the total (ISTAT, 2020). Moreover, with reference to these overall data, it should be remembered that travel was almost entirely for leisure. The remainder was business travel, which in any case usually has a lower weight in July-September. As for the impact of Covid-19 on productive sectors, an important proxy is the number of new VAT number registrations. Compared with 2019, the contraction was 34.1 per cent in accommodation and catering and 33.5 per cent in sports and entertainment activities. As is well known, both activities are an important part of the tourism industry. The situation remained difficult in 2021: in January-March, for the same economic activities, the contraction was 25.3% and 4.7% respectively (Ministry of Economy and Finance, 2021).

# **5. Some first results with the correspondence analysis**

To obtain a synthetic picture of these dynamics, we used correspondence analysis (Benzécri, 1973), an exploratory multivariate technique for analysing association patterns between qualitative variables. As is known, this technique treats each category of the qualitative variables as an element of the analysis; we therefore have 20 regions as destinations and 20 regions as origins. We analysed only four years: 2008, which reflects the recessionary economic crisis; 2014, the moment of recovery after the second economic crisis of 2012; 2019, an extremely positive year for Italian tourism; and 2020, the year of the pandemic.
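For readers unfamiliar with the technique, the sketch below shows simple correspondence analysis computed through the singular value decomposition of the standardized residuals of a contingency table. It assumes origins on the rows and destinations on the columns. The paper does not state which software was used, so this is a generic NumPy sketch, and the 3 x 3 table is hypothetical. The squared singular values are the principal inertias, and their sum equals the chi-square statistic of the table divided by its grand total.

```python
import numpy as np


def correspondence_analysis(table):
    """Simple correspondence analysis of a two-way contingency table.

    Returns row principal coordinates, column principal coordinates and the
    principal inertias (squared singular values), largest first.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                 # correspondence matrix
    r = P.sum(axis=1)               # row masses (e.g. origins)
    c = P.sum(axis=0)               # column masses (e.g. destinations)
    expected = np.outer(r, c)       # masses under independence
    # Standardized residuals D_r^{-1/2} (P - r c^T) D_c^{-1/2}.
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U / np.sqrt(r)[:, None]) * sv       # F = D_r^{-1/2} U Sigma
    cols = (Vt.T / np.sqrt(c)[:, None]) * sv    # G = D_c^{-1/2} V Sigma
    return rows, cols, sv ** 2


# Hypothetical 3 x 3 origin-destination table of arrivals.
od = np.array([[20, 10, 5],
               [10, 15, 10],
               [5, 10, 25]])
rows, cols, inertia = correspondence_analysis(od)
# The first two columns of `rows` and `cols` give a two-axis factorial map.
```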

Figure 2 shows the projection of the 20 Italian regions on a plane defined by two factorial axes: the regions as origins in blue, labelled with the year, and the regions as destinations in red. In this first reading we look only at the regions as destinations.

In 2008, a year of relative crisis for tourism, the graph shows three alignments identifying three groups of regions. Limiting ourselves to the destinations, these are: 1) Calabria, Basilicata, Campania, Molise, Apulia, Abruzzo, Umbria, and Lazio; 2) Piedmont, Aosta Valley and Liguria; 3) Friuli-Venezia Giulia, Trentino Alto Adige and Veneto. Marche, Tuscany and Lombardy are close to the center of gravity.

In 2014, a year of recovery after the previous crises, the graph is more compact. Campania moves away from the center of gravity, followed at a distance by Calabria, Molise, Basilicata, and Apulia; while Piedmont, Aosta Valley and Liguria show another alignment.

*Source: our processing of ISTAT data.*

2019 can be considered a year of development that could mark real growth for Italian tourism. Campania, Calabria, Basilicata and Molise, with Apulia, Abruzzo, Umbria and Lazio, again form an alignment already seen, as do Piedmont, Aosta Valley and Liguria.

Finally, 2020, with the health crisis, brings us back to a situation similar to that of 2008, with three alignments made up of almost the same regions as then. On this aspect, the analysis is still in progress and will require evaluating the significance of the factorial axes.

# **6. A final remark**

The work is still in progress and will require a careful analysis of the reasons that have led to a different role for the Italian regions in the context of domestic tourism.

As we have mentioned, the choice of domestic tourism was made considering it more stable than international tourism, which is more easily affected by positive or negative conditions (health, economic, political, etc.).

The choice of arrivals, rather than nights-spent, was made to make the analysis less dependent on the type of tourism, given that the average length of stay varies with the type of destination (sea, mountains, countryside, tourist cities, etc.).

# **References**

Antolini F., Grassini L. (2020). Issues in Tourism Statistics: A Critical Review. *Social Indicators Research*, **150**, 1021-1042.


ISTAT (2020). Movimento Turistico in Italia, *Statistiche Report*, 29/12/2020.


### **Assessment of visitors' perceptions in protected areas through a model-based clustering**

Annalina Sarra<sup>a</sup>, Adelia Evangelista<sup>a</sup>, Tonio Di Battista<sup>a</sup>

<sup>a</sup> Department of Philosophical, Pedagogical and Economic-Quantitative Sciences, University "G. d'Annunzio" of Chieti-Pescara, Italy.

# 1. Introduction

Protected areas are well-defined geographical spaces that receive protection in view of their recognized natural, ecological or cultural values. They have the twofold mandate of protecting natural resources and cultural values and of providing a space for nature-based tourism activities, including, among others, mountain hiking, nature photography, and bird and animal watching. In recent years, nature-based tourism has been experiencing positive and sustained growth worldwide, making it an important sector of tourism activity, with substantial impacts on the environment, the economy and local communities. As broadly highlighted in the literature, the visitor experience can be deemed a complex interaction between people and their internal states, the activity they are undertaking, and the social and natural environment in which they find themselves (Leung et al., 2008). Understanding the value that visitors attach to their destination, and knowing how they assess the various activities in which they engage during their stay, is a key element in shaping their satisfaction. A number of studies have shown that visitor satisfaction is essential to boost demand, since it increases the intention to revisit and to recommend the destination to other people (see, among others, Sangpikul, 2018; Su et al., 2016). Besides, a greater knowledge of the needs and perceptions of different visitor groups should lead to improved management and marketing strategies and to a more targeted provision of facilities. In this study, we focus on analyzing the perceived value of visitors who had a specific experience in the Majella National Park, located in the Abruzzo region (Italy). The research data were collected by means of a structured questionnaire administered to people who visited the sites of the protected area during the last three summer months of 2020. A total of 151 valid questionnaires were obtained and form the basis of the data analysis.
Our aim is to assess the views and preferences of visitors to this protected natural space in relation to specific profiles. To this end, we used a Bayesian model-based clustering, better known as Bayesian Profile Regression, to partition visitors into clusters characterized by similar profiles. The profiles concern their demographic characteristics (age, gender, educational attainment, professional activity, area of origin) as well as the features of their travel behaviour (accommodation, length of stay, past visitation experience). The benefit of this approach lies in the ability of the Bayesian technique to estimate simultaneously the contribution of all covariates to the outcome of interest. In our context, we explore the association of the detected groups with visitors' satisfaction. In the survey, the global quality of the tourism service is broken down into single features, and respondents were asked to rate their level of appreciation on a five-point Likert satisfaction scale. To estimate the latent trait measured by the items and related to overall satisfaction, we used an IRT model. The rest of the paper is organized as follows. Section 2 describes the study area and the data collected. Section 3 illustrates the methodology adopted. Finally, Section 4 presents the main results.

Annalina Sarra, Gabriele d'Annunzio University, Italy, annalina.sarra@unich.it, 0000-0002-0974-0799

Adelia Evangelista, Gabriele d'Annunzio University, Italy, adelia.evangelista@unich.it, 0000-0002-7596-9719

Tonio Di Battista, Gabriele d'Annunzio University, Italy, tonio.dibattista@unich.it, 0000-0003-2139-7273


Annalina Sarra, Adelia Evangelista, Tonio Di Battista, *Assessment of visitors' perceptions in protected areas through a model-based clustering*, pp. 245-250, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.46, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

# 2. Study case and data

Majella National Park (MNP) is a protected area located in the provinces of Chieti, Pescara and L'Aquila, in the region of Abruzzo, Italy. It was established in 1991 and extends over an area of about 74,000 hectares, comprising the Majella and Morrone mountains. The mountains dominate the territory of this national park: 55% of it lies above 2,000 meters. It includes vast areas with a marked wilderness character, the rarest and most precious part of the national biodiversity heritage. The diversity of environments, the richness of nature and the traces left by human presence make the Majella protected area attractive for visitors. Visitors can engage in different activities, ranging from visits to hamlets and hermitages and climbing and trekking excursions to participation in traditional festivals. In this study, a sample of visitors was intercepted through a non-probabilistic design. At the end of the survey period, a total of 151 valid questionnaires had been returned; they form the basis of the data analysis reported herein. Besides a few questions on respondents' background (age, gender, education, professional activity, area of origin), the questionnaire is organized in two sections investigating different aspects. Part 1 covers the travel behaviour characteristics of respondents (accommodation type, length of stay, past visitation experience, daily average expenditure) and their expectations. Since visitors increasingly demand high-quality recreational opportunities and the services that support them, the second section of the questionnaire deals with satisfaction. The satisfaction scale contains 23 items, corresponding to different aspects of the tourism experience (staff, food, excursions, outdoor activities, accommodation, information services, naturalistic and historical heritage, hospitality of the local population, safety and sustainability, sanitation).
Respondents were asked to indicate the degree to which they agree with each item on a Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). In our sample, the visitors surveyed were mostly men (55%) and from outside the region (68%), and the most represented age group was 36-45 (40%). Half of them had a university education. Regarding professional activity, the respondents were mainly employees (29%). The main reasons for visiting the Majella National Park are relaxation and contact with nature. Visitors are also interested in guided tours, and they choose the park mainly thanks to the testimony of their acquaintances (word of mouth). The tourist offices they contacted for information are mainly those of the province of Pescara. Visitors tend to return to the Majella National Park: about half of those interviewed have already been there more than 5 times and stayed longer (at least 1-3 nights). However, the average daily expenditure that tourists expect is between 10 and 30 euros.

# 3. Methodology

3.1 IRT model for polytomous response data: the Graded Response Model

Item response theory (IRT), initially developed in the 1960s, comprises a versatile family of models concerned with measuring an individual's latent trait assessed indirectly through a group of items (de Ayala, 2009). The basic idea behind IRT is that the structure in the manifest responses is explained by assuming the existence of one or more latent traits (θ). The mathematical characteristics of IRT models allow binary or ordinal response patterns (e.g. Likert-type data) to be transformed into measures on an equal-interval scale. In this study, the parametric Graded Response Model (GRM; Samejima, 1969) was applied to analyze the ordered response categories of the satisfaction scale included in the questionnaire administered to tourists. In the GRM, items are described by a discrimination parameter (λ) and two or more location parameters (γ). Graded-response model item parameters are easily interpretable. The location parameters locate the point on the latent trait continuum where there is a 50% chance of scoring at or above a given category of item k. The discrimination parameter reflects the degree to which the item is related to the underlying latent trait and can differentiate among persons with different trait levels. Specifically, in the GRM the probability that person i's response to item k falls in category c ($c = 1, \ldots, C\_k$), given the latent trait $\theta\_i$, is the difference between the probability of responding at or above c and that of responding at or above c + 1:

$$\Pr(Y\_{ik} = c \mid \lambda\_k, \theta\_i, \gamma\_k) = \Phi(\lambda\_k \theta\_i - \gamma\_{k, c-1}) - \Phi(\lambda\_k \theta\_i - \gamma\_{k, c}) \tag{1}$$

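Eq. 1 describes the normal-ogive formulation of the unidimensional two-parameter GRM, where Φ(·) denotes the standard normal cumulative distribution function. Here $\lambda\_k$ is the discrimination of item k, and $\gamma\_{k,1} < \dots < \gamma\_{k,C\_k-1}$ are its ordered location parameters, with the conventions $\gamma\_{k,0} = -\infty$ and $\gamma\_{k,C\_k} = +\infty$. Under these conventions, $\Phi(\lambda\_k \theta\_i - \gamma\_{k,c-1})$ is the probability of responding at or above category c. As written, the GRM is not identified, because it is overparameterized: for each item, $C\_k - 1$ location parameters and one slope (λ) are estimated together with the latent traits, whose location and scale are arbitrary. To overcome this issue, some restrictions on the parameters are necessary (e.g. fixing the latent trait distribution to have mean zero and unit variance).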
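A minimal Python sketch of Eq. 1 for a single item computes the cumulative probabilities of responding at or above each category and takes successive differences. With the λθ - γ parameterization of Eq. 1, the 50% point for "at or above category c + 1" sits at θ = γ_c / λ. The item parameters below are hypothetical and are not estimates from Table 1. The sketch uses the normal ogive of Eq. 1 and makes no claim about how mirt parameterizes its models.

```python
import math
from typing import List


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def grm_category_probs(theta: float, slope: float, thresholds: List[float]) -> List[float]:
    """P(Y = c), c = 1..C, for one item under the normal-ogive GRM of Eq. 1.

    thresholds = [gamma_1, ..., gamma_{C-1}], strictly increasing; the conventions
    gamma_0 = -inf and gamma_C = +inf give the leading 1.0 and trailing 0.0 below.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # cum[m] = P(Y >= m + 1); for 1 <= m <= C - 1 it equals Phi(slope * theta - gamma_m).
    cum = [1.0] + [std_normal_cdf(slope * theta - g) for g in thresholds] + [0.0]
    return [cum[m] - cum[m + 1] for m in range(len(thresholds) + 1)]


def fifty_percent_points(slope: float, thresholds: List[float]) -> List[float]:
    """Values of theta at which P(Y >= c + 1) = 0.5 under Eq. 1, i.e. gamma_c / lambda."""
    return [g / slope for g in thresholds]


# Hypothetical five-category item (not an estimate from Table 1).
slope, thresholds = 2.0, [-3.0, -1.0, 0.5, 2.0]
probs = grm_category_probs(0.0, slope, thresholds)
```

Higher slopes make the cumulative curves steeper, which is why items with larger λ separate respondents with nearby trait levels more sharply.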

3.2 Model-based clustering: the BPR

In this work, we focused on tracing the profile of the tourists who visited the Majella National Park, considering as covariates their socio-demographic background, vacation habits and typical activities at the destination. To this end, we opted for a cluster-based method known as Bayesian Profile Regression (BPR) (Molitor et al., 2010). Profile regression is a Bayesian clustering method that works by capturing the heterogeneity among covariates. It identifies specific covariate profiles that are representative of a population (i.e. clusters) and associates them with the outcome of interest (in our case, tourists' satisfaction) via a regression model. The Bayesian aspect of this clustering process has some advantages over traditional clustering approaches (e.g. Latent Class Analysis). The number of clusters does not have to be fixed in advance but is informed by the data. The model is also fitted as a unit, so that an individual's outcome can influence cluster membership, and the outcome and the clusters mutually inform each other. Additionally, BPR provides a unified procedure in which the uncertainty associated with clustering is naturally propagated into the regression model and incorporated into posterior inference via MCMC algorithms. Formally, for each individual $i = 1, \ldots, N$, $Y\_i$ denotes the outcome of interest and $\mathbf{X}\_i = (X\_{i1}, \ldots, X\_{iP})$ represents the covariate profile. The joint probability model for the outcome $Y\_i$ and the predictors $\mathbf{X}\_i$ can be written as an infinite mixture model (Molitor et al., 2010; Hastie et al., 2013):

$$f(Y\_i, \mathbf{X}\_i \mid \Theta) = \sum\_{c=1}^{\infty} \psi\_c \Pr(\mathbf{X}\_i \mid \Theta\_c, \Theta\_0) \Pr(Y\_i \mid \Theta\_c, \Theta\_0). \tag{2}$$

The probability model in equation (2) consists of two sub-models: the *profile sub-model*, $\Pr(\mathbf{X}\_i \mid Z\_i, \Theta\_{Z\_i}, \Theta\_0)$, and the *response sub-model*, $\Pr(Y\_i \mid Z\_i, \Theta\_{Z\_i}, \Theta\_0)$. For each mixture component, the probability models for the outcome $Y\_i$ and the profile $\mathbf{X}\_i$ are independent conditionally on some component-specific parameters $\Theta\_c$ and some global parameters $\Theta\_0$. To make inference, the BPR approach introduces an additional allocation parameter $Z\_i$, such that $Z\_i = c$ indicates that individual $i$ is assigned to component $c$, with $\Pr(Z\_i = c) = \psi\_c$, where $\psi\_c$ is the mixture component weight. As pointed out in Molitor et al. (2010), the infinite mixture model can be approximated by a finite one by specifying a maximum number $C$ of components. The mixture weights $\boldsymbol{\psi} = (\psi\_1, \ldots, \psi\_C)$ are modelled according to a stick-breaking prior (Ishwaran and James, 2001).
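The truncated stick-breaking construction and the allocation probabilities implied by Eq. (2) can be sketched in a few lines of Python. The truncation level C, the concentration parameter alpha and the likelihood values below are illustrative assumptions. The sketch does not reproduce PReMiuM's sampler: in the BPR, the two likelihood factors would come from the profile and response sub-models evaluated at the current parameter values.

```python
import random
from typing import List, Sequence


def stick_breaking_weights(C: int, alpha: float, rng: random.Random) -> List[float]:
    """Draw mixture weights psi_1..psi_C from a truncated stick-breaking prior.

    V_c ~ Beta(1, alpha) for c < C and V_C = 1, so the C weights sum to one.
    """
    weights, remaining = [], 1.0
    for c in range(C):
        v = 1.0 if c == C - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)   # psi_c = V_c * prod_{l < c} (1 - V_l)
        remaining *= 1.0 - v            # length of stick still unbroken
    return weights


def allocation_probs(psi: Sequence[float],
                     profile_lik: Sequence[float],
                     response_lik: Sequence[float]) -> List[float]:
    """Pr(Z_i = c | X_i, Y_i), proportional to psi_c * Pr(X_i | c) * Pr(Y_i | c).

    Multiplying the two sub-model likelihoods uses the conditional
    independence of X_i and Y_i within a component, as in Eq. (2).
    """
    unnorm = [p * px * py for p, px, py in zip(psi, profile_lik, response_lik)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


rng = random.Random(2021)
psi = stick_breaking_weights(C=20, alpha=1.0, rng=rng)  # C and alpha are illustrative
```

Small values of alpha put most of the prior mass on the first few sticks, which favours few occupied clusters. This is how the number of clusters can be informed by the data rather than fixed in advance.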


Table 1: Two-parameter GRM estimates

# 4. Results and conclusion

In this section, we first present the parameter estimates obtained by fitting the GRM to the items of the visitors' satisfaction scale. All data were analyzed in the R programming environment (R Core Team, 2020) with the mirt package (Chalmers, 2012). Table 1 shows the estimates of the discrimination and threshold parameters. Discrimination estimates ranged from 1.85 to 5.09, indicating that all items discriminate well between low and high levels of satisfaction (higher values indicate better discrimination). Specifically, the discrimination estimates suggest that the key indicators for distinguishing visitors with different satisfaction levels are those related to appropriate promotion and guided tours. The item ascertaining appreciation of the park's naturalistic heritage also plays a fundamental role in discriminating between visitors scoring high and low on the latent satisfaction trait. Conversely, the items with a smaller discriminative power are those linked to the quality of catering, food and wine products, the hospitality of the local population, and the park staff's knowledge of foreign languages. The threshold parameters reflect the cut-points between the five item categories, so each item has four thresholds. Each of them governs the probability of scoring above or below a given cut-point. In other terms, the thresholds can be read on the same scale as z-scores, that of a normal distribution centered at zero with unit standard deviation. Comparing threshold values across items, we see, for example, that the item on "catering quality" has the lowest initial threshold (-3.08), while the item on "public transport services" has the largest initial threshold (-1.09). This indicates that fewer people endorsed the first response category for "catering quality" than for "public transport services".
After estimating visitors' satisfaction by means of IRT modelling, the next step in our data analysis was to identify specific visitor groups. The BPR was fitted using the R package PReMiuM (Liverani et al., 2015). The BPR produced a partition of visitors into three clusters, each characterized by a similar covariate profile and a similar satisfaction level. The specific profile of each cluster can be delineated from its characterization in terms of covariates, as illustrated in Figure 1.

Figure 1: Characterization of visitors' satisfaction profiles associated with each cluster

Legend: category codes of each categorical variable included in the profile sub-model

- Area of origin: 0 = Abruzzo; 1 = Other regions
- Gender: 0 = Male; 1 = Female
- Age: 0 = less than 25 years; 1 = 25-45 years; 2 = 46-65 years; 3 = more than 65 years
- Education: 0 = Upper secondary school; 1 = Degree
- Job: 0 = Self-employed; 1 = Public employee; 2 = Other professions
- Reasons: 0 = Contact with nature; 1 = Visit to the historical, artistic and cultural heritage; 2 = Relax
- Expectation: 0 =; 1 =; 2 =
- Knowing: 0 = Personal recommendation; 1 = Park website; 2 = Guidebook
- Tourist office: 0 = Chieti; 1 = Pescara; 2 = L'Aquila
- Number of previous visits: 0 = Once; 1 = 2-5 times; 2 = more than 5 times
- Number of overnight stays: 0 = None; 1 = 1-3 overnights; 2 = 4-7 overnights; 3 = more than 7 overnights
- Average daily expenditures: 0 = less than €30; 1 = €30-50; 2 = more than €50

The left panel of Figure 1 shows the MCMC posterior draws of the satisfaction level of the identified clusters. For categorical variables such as those considered in this study, the right panel shows, for each identified cluster, the posterior distribution of the probability that an explanatory variable takes each of its categories. Each column corresponds to one covariate, and cluster labels are given on the horizontal axis. The box-plot colours (blue, green and red) indicate whether the 95% credible interval lies entirely below, overlaps, or lies entirely above the global average computed on all visitors (regardless of cluster). Clusters are displayed in order of their estimated satisfaction level. A close analysis of Figure 1 reveals that the typical profile of the cluster with the highest satisfaction level shows a prevalence of visitors coming from other regions, aged 25-45 years, visiting the Majella National Park area for the first time, and staying for 4 to 7 nights. Conversely, the visitors with a lower appreciation of the natural area include a greater number of Abruzzo residents, for whom word of mouth played a key role in choosing this tourist destination. In addition, a high number of previous visits (more than five) and of overnight stays (more than seven) are associated with lower visitor satisfaction.
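The colour rule used for the box-plots can be sketched in a few lines. The sketch below assumes we already have MCMC posterior draws of a cluster-specific quantity (the satisfaction level, or the probability that a covariate takes a given category) and a single global reference value; it computes an equal-tailed 95% credible interval for each cluster and labels it blue, green or red. The cluster names, the simulated draws and the global mean of 0.0 are hypothetical, and this is not PReMiuM's own plotting code.

```python
import random
import statistics

def ci95(draws):
    """Equal-tailed 95% credible interval from posterior draws (2.5% and 97.5% quantiles)."""
    q = statistics.quantiles(draws, n=40, method="inclusive")
    return q[0], q[-1]

def colour_code(draws, global_mean):
    """Blue if the interval lies entirely below the global mean, red if entirely above, green otherwise."""
    lo, hi = ci95(draws)
    if hi < global_mean:
        return "blue"
    if lo > global_mean:
        return "red"
    return "green"

# Hypothetical posterior draws of the satisfaction level for three clusters
random.seed(1)
draws_by_cluster = {
    "A": [random.gauss(0.70, 0.10) for _ in range(1000)],
    "B": [random.gauss(-0.60, 0.10) for _ in range(1000)],
    "C": [random.gauss(0.05, 0.10) for _ in range(1000)],
}
global_mean = 0.0  # hypothetical reference value computed over all visitors

# Order clusters by posterior mean satisfaction, as in the figure
order = sorted(draws_by_cluster, key=lambda c: statistics.fmean(draws_by_cluster[c]))
for c in order:
    lo, hi = ci95(draws_by_cluster[c])
    print(c, round(lo, 2), round(hi, 2), colour_code(draws_by_cluster[c], global_mean))
```

The same classification applies unchanged to the right-hand panels: the draws would be the posterior probabilities of a covariate category within each cluster, and the reference value would be the corresponding proportion over all visitors.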
The results of this study may have practical implications for managers of protected areas, giving them useful insights on how to design programs tailored to visitors' profiles. To our knowledge, this research represents the first attempt to identify clusters of visitors with similar covariate profiles through a Bayesian approach based on Dirichlet process mixture modeling. Alongside this contribution, it is important to stress the major limitation of this work, which concerns the selection of sample units: visitors were intercepted through a non-probabilistic design. As a result, we are not able to make inferences about actual visitor flows across the different seasons of the year.

# References


This book includes 40 peer-reviewed short papers submitted to the Scientific Conference titled *Statistics and Information Systems for Policy Evaluation*, aimed at promoting new statistical methods and applications for the evaluation of policies and organized by the Association for Applied Statistics (ASA) and the Dept. of Statistics, Computer Science, Applications DiSIA "G. Parenti" of the University of Florence, jointly with the partners AICQ (Italian Association for Quality Culture), AICQ-CN (Italian Association for Quality Culture North and Centre of Italy), AISS (Italian Academy for Six Sigma), ASSIRM (Italian Association for Marketing, Social and Opinion Research), Comune di Firenze, the SIS – Italian Statistical Society, Regione Toscana and Valmon – Evaluation & Monitoring.

Bruno Bertaccini is Associate Professor of Statistics at the "G. Parenti" Department of Statistics, Computer Science and Applications of the University of Florence. After earning his PhD in Applied Statistics, he became an expert in the effectiveness assessment of public policies, with particular attention to the evaluation of the quality of higher education and academic policies.

Luigi Fabbris is a freelance researcher in the field of statistics and social science; President of the ASA, (Italian) Association for Applied Statistics; formerly Professor of Social Statistics at the University of Padua; he has authored or co-authored more than 400 papers or books.

Alessandra Petrucci is Professor of Social Statistics and she holds a PhD in Applied Statistics. Her research interests include survey sampling methods, spatial statistics, multivariate statistical analysis applied to social and environmental issues. She is the author and co-author of numerous scientific papers published in national and international journals, and she has been a member of research projects and scientific committees for several national and international conferences.

ISSN 2704-601X (print); ISSN 2704-5846 (online); ISBN 978-88-5518-461-8 (PDF); ISBN 978-88-5518-462-5 (XML); DOI 10.36253/978-88-5518-461-8

www.fupress.com