# **Determinants of the transition to upper secondary school: differences between immigrants and Italians**

Patrizio Frederic, Michele Lalla

> Department of Economics "Marco Biagi" and RECent (Center for Economic Research), University of Modena and Reggio Emilia, Modena, Italy; CAPP (Centre for the Analyses of Public Policies), Modena, Italy.

## **1. Introduction**

Teenagers in the 13-15 age range face education decisions that are among the first important steps in the life cycle, and these decisions determine their educational achievement and job trajectory. The choices are made at a particular stage of life. Influences inside the home are still strongly felt, and knowledge about one's interests, abilities and skills is vague and unstable. Consequently, such decisions strongly depend on both individual and family characteristics, including socioeconomic conditions. They also depend on the environment or contextual background of the area where the teenagers reside.

The objective of this paper is to pinpoint differences with respect to two binary variables. The first is citizenship, which distinguishes between immigrants and non-immigrants (hereinafter also referred to as Italians). The second is *secondary*, defined as equal to one for individuals who were not enrolled in an upper secondary school and equal to zero otherwise. A Bayesian approach was applied to investigate the determinants of the *secondary* variable, with the prior distribution set to a Laplace distribution with parameter λ. Hence, the posterior mode of the model parameters coincides with the Lasso estimate. The Lasso is a popular method that simultaneously selects the explanatory variables and their interactions and estimates the model coefficients. The initial model included all the selected quantitative and categorical variables and all the interactions between the categorical variables. Starting from it, the method led to a very parsimonious model which, surprisingly, did not include family income.

## **2. Data sources and descriptive statistics**

The data were extracted from two surveys carried out by the Italian National Institute of Statistics (Istat), both with reference year 2009:

- the European Union Statistics (or Surveys) on Income and Living Conditions (EU-SILC) restricted to Italy alone, IT-SILC (Istat, 2008; Eurostat, 2009);
- the Italian Survey on Income and Living Conditions of families with Immigrants (IM-SILC), a single cross-sectional survey (Istat, 2009) of families residing in Italy with at least one immigrant member.

The IT-SILC sample was added to the IM-SILC sample to obtain a sample with a sizeable number of immigrants relative to non-immigrants. For further details about these two data sets and about the main variables introduced in the model, see Lalla and Frederic (2020). The target sample was obtained in two steps. First, individuals aged 16 to 19 were selected, which yielded 2,702 cases. Among these, the only eligible cases were the individuals whose highest attained ISCED (International Standard Classification of Education) level was equal to 2 (lower secondary education). The final target sample was made up of 2,039 individuals.
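The two selection steps amount to a simple filter on the pooled surveys. The sketch below shows that filter. The record layout (`Respondent`, with fields `age`, `highest_isced` and `immigrant`) and the function name `target_sample` are hypothetical, and they do not reproduce the actual IT-SILC or IM-SILC variable names.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    # Hypothetical record layout; the real survey variable names differ.
    age: int
    highest_isced: int  # highest ISCED level attained
    immigrant: bool


def target_sample(it_silc, im_silc):
    """Pool the two surveys, then apply the two selection steps described in the text."""
    pooled = list(it_silc) + list(im_silc)
    # Step 1: keep individuals aged 16 to 19 (2,702 cases in the paper)
    aged_16_19 = [r for r in pooled if 16 <= r.age <= 19]
    # Step 2: keep those whose highest attained level is ISCED 2 (2,039 cases in the paper)
    eligible = [r for r in aged_16_19 if r.highest_isced == 2]
    return aged_16_19, eligible
```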

The relationship between the secondary (binary) dependent variable and the ISCED Level Currently Attended (ILCA) showed that 16.9% of individuals were not enrolled in further education (termed "not-attending"), while 79.7% were currently attending an upper secondary school (Table 1).


Patrizio Frederic, Michele Lalla, *Determinants of the transition to upper secondary school: differences between immigrants and Italians*, pp. 13-18, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.04, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Patrizio Frederic, University of Modena and Reggio Emilia, Italy, patrizio.frederic@unimore.it, ORCID 0000-0001-9073-2878; Michele Lalla, University of Modena and Reggio Emilia, Italy, michele.lalla@unimore.it, ORCID 0000-0002-1639-7300


**Table 1.** Absolute frequencies and row percentages of the secondary (binary) dependent variable by the ISCED level currently attended (ILCA)

The ILCA was examined with respect to several qualitative variables and revealed many significant relationships; for the sake of brevity, only some of them are cited. Here CS(g) stands for the chi-square statistic with g degrees of freedom.

- **Citizenship**, CS(2) = 45.177 (p<0.001). The percentage of immigrants attending upper secondary education was lower than that of Italian citizens (74.3% versus 81.7%). Conversely, the percentage of immigrants not in school was higher than that of Italians (24.9% versus 14.0%).
- **Self-perceived health**, CS(2) = 8.351 (p≈0.015). Individuals perceiving their health as fair, bad or very bad tended to discontinue their education more often than those perceiving it as good or very good (Ichou and Wallace, 2019).
- **Index of the total self-perceived health of parents**, CS(6) = 27.356 (p<0.001).
- **Italian macro-regions**, CS(8) = 39.092 (p<0.001). As industrialisation and the possibility of finding employment increased, the percentage of individuals not in school decreased.
- **Maximum ISCED level attained by parents**, CS(12) = 179.908 (p<0.001). As the education of parents increased, the percentage of young individuals in school increased.

The ILCA also showed significant relationships with several variables describing the working conditions of the parents, although these relationships were often weak.
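The CS(g) statistics above are Pearson chi-square tests of independence on contingency tables. The sketch below is minimal and assumes unweighted counts, since the text does not say whether survey weights were used.

- The statistic sums (observed − expected)²/expected over all cells, where the expected counts come from the row and column totals.
- The degrees of freedom are g = (r − 1)(c − 1).
- For an even g, which covers every test reported here except CS(3), the upper-tail p-value has the closed form implemented in `chi_square_sf_even`. For odd g a general routine such as `scipy.stats.chi2.sf` would be needed.

Function names and the example counts are hypothetical.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_sf_even(stat, dof):
    """Upper-tail p-value P(X >= stat) for a chi-square with an EVEN number of dof.

    Closed form: exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)^i / i!
    """
    if dof <= 0 or dof % 2:
        raise ValueError("closed form only valid for positive even dof")
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i  # (x/2)^i / i!
        total += term
    return math.exp(-half) * total
```

For instance, the hypothetical 2 × 3 table `[[10, 20, 30], [30, 20, 10]]` gives a statistic of 20.0 with 2 degrees of freedom. The corresponding p-value, `chi_square_sf_even(20.0, 2)`, equals e^(−10), about 4.5e-5.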

The ILCA was also analysed with respect to the main quantitative variables.

Analysing the age of fathers by ILCA and citizenship showed that immigrants' fathers were about four years younger than Italians' fathers. Similarly, immigrants' mothers were about four years and nine months younger than Italians' mothers. Table 2 reports the Disposable Family Income (DFI) per capita (in thousands of euros) by ILCA and citizenship. On average, the DFI per capita of immigrants was significantly lower than that of Italians, by about four thousand euros (about 39.8%).


**Table 2.** Absolute frequencies, means, and standard deviations (SD) of the disposable family income per capita (in thousands of euros) by citizenship and by the ISCED level currently attended (ILCA) by their children

The other types of income considered in the models revealed various patterns of relationships and levels of significance. For example, the gap in disposable personal income between immigrant and Italian fathers amounted to about ten thousand euros, i.e., 37.4%. The mothers' disposable personal income presented similar, statistically significant differences for both marginal effects (citizenship and the ILCA), with a gap of about five thousand nine hundred euros, i.e., 39.5%. The gender gaps in disposable personal income were 51% for Italians and 54% for immigrants.

Immigrant families proved to be slightly larger than Italian families. The difference in family size was statistically significant for both marginal effects, i.e., citizenship and the ILCA.

Citizenship was also examined with respect to some other variables, even though it was not a target dependent variable.

- **Maximum ISCED level attained by parents**, CS(6) = 97.73 (p<0.001) (Bertolini and Lalla, 2012; Bertolini et al., 2015).
- **Degree of urbanisation**, CS(2) = 24.225 (p<0.001). Immigrants tended to settle more than Italians in densely populated areas (38.4% versus 35.5%) and in moderately populated areas (44.6% versus 39.3%).
- **Italian macro-regions**: the relationship was significant.
- **Index summarising the total self-perceived health of parents**, CS(3) = 29.832 (p<0.001) (Ichou and Wallace, 2019).
- **Working conditions**: citizenship was associated with many variables describing them, including the maximum position of parents on the job, CS(4) = 173.877 (p<0.001).

## **3. Model by Bayesian Lasso selection of regressors**

Let $Y_i$ be the binary variable coding whether the $i$-th individual is not attending upper secondary education ($Y_i=1$) or is attending it ($Y_i=0$). Let $\mathbf{x}_i$ be a vector of regressors, let $\pi_i$ be the probability that $Y_i=1$ given $\mathbf{x}_i$, and let $\boldsymbol{\beta}=(\beta_0,\ldots,\beta_K)'$ be the parameter vector of the model. The logit model is

$$\pi_i = \frac{\exp(\mathbf{x}_i'\boldsymbol{\beta})}{1+\exp(\mathbf{x}_i'\boldsymbol{\beta})}. \tag{1}$$

A common method that performs estimation and model selection at the same time is the *Lasso* method (Tibshirani, 1996). It adds an $L_1$ penalty term, $\lambda\sum_{j=0}^{K}|\beta_j|$, to the negative log-likelihood of the model, where $\lambda \geq 0$ is an additional parameter. Many penalised methods can be interpreted as the negative logarithm of a posterior distribution in a purely Bayesian fashion. Let $p(y_i\mid\mathbf{x}_i,\boldsymbol{\beta}) = \pi_i^{y_i}(1-\pi_i)^{1-y_i}$ be the usual logit model in Bayesian notation. Let $p(\boldsymbol{\beta}\mid\lambda) \propto \exp\big(-\lambda\sum_{j=0}^{K}|\beta_j|\big)$ be the Laplace prior distribution on the coefficients $\boldsymbol{\beta}$. The posterior distribution is then

$$\begin{aligned} p(\boldsymbol{\beta}\mid\mathbf{x},\mathbf{y},\lambda) &\propto p(\mathbf{y}\mid\mathbf{x},\boldsymbol{\beta})\, p(\boldsymbol{\beta}\mid\lambda) \\ &\propto \prod_{i=1}^{n}\pi_i^{y_i}(1-\pi_i)^{1-y_i}\exp\Big(-\lambda\sum_{j=0}^{K}|\beta_j|\Big). \end{aligned} \tag{2}$$
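The link between the Laplace prior and the Lasso can be made explicit with a standard derivation. Take minus the logarithm of (2), and collect in a constant every term that does not depend on $\boldsymbol{\beta}$:

$$\begin{aligned} -\log p(\boldsymbol{\beta}\mid\mathbf{x},\mathbf{y},\lambda) &= -\sum_{i=1}^{n}\big[y_i\log\pi_i+(1-y_i)\log(1-\pi_i)\big]+\lambda\sum_{j=0}^{K}|\beta_j|+\text{const},\\ \hat{\boldsymbol{\beta}}_{\text{MAP}} &= \arg\min_{\boldsymbol{\beta}}\Big\{-\sum_{i=1}^{n}\big[y_i\log\pi_i+(1-y_i)\log(1-\pi_i)\big]+\lambda\sum_{j=0}^{K}|\beta_j|\Big\}. \end{aligned}$$

The second line is the $L_1$-penalised logistic objective, so the posterior mode is the Lasso estimate. Because $|\beta_j|$ is not differentiable at zero, the minimiser can set coefficients exactly to zero, which is what makes the Lasso select variables.

When comparing with software, note two differences in *glmnet*:

- It minimises the *average* negative log-likelihood plus the penalty, so its λ corresponds to $\lambda/n$ in the parameterisation of (2).
- By default it does not penalise the intercept $\beta_0$.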

The choice of the parameter *λ* plays a crucial role in the estimation procedure, and many studies have focused on this issue. Besides the classic AIC and BIC criteria, *k*-fold Cross Validation (CV) and the One Standard Error Rule (1SE) have been proposed. The applied estimation method consists of two steps:

1. The model was first estimated using the *glmnet* package (Friedman et al., 2010) in R (R Core Team, 2019). Then the *optimal* lambda, $\lambda_{1SE}$, and the corresponding mode estimates, $\hat{\boldsymbol{\beta}}_{1SE}$, were evaluated.

2. Using the R package *MCMCpack*, N = 10,000 samples were drawn from the posterior distribution $p(\boldsymbol{\beta}\mid\mathbf{x},\mathbf{y},\lambda_{1SE})$ to perform a full Bayesian analysis, with the prior $p(\boldsymbol{\beta}\mid\lambda_{1SE})$ chosen to be a Laplace distribution.
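The choice of $\lambda_{1SE}$ in step 1 can be illustrated without reproducing glmnet. The function below is a minimal sketch of the one-standard-error rule as it is commonly defined, and as `cv.glmnet` reports it in `lambda.1se` next to `lambda.min`. Among the candidate penalties, it takes the largest λ (the sparsest model) whose mean *k*-fold CV error lies within one standard error of the minimum. The CV errors and their standard errors are assumed to be computed already. The function name is hypothetical.

```python
def one_se_rule(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from k-fold cross-validation results.

    lambdas : candidate penalty values
    cv_mean : mean cross-validated error (e.g. binomial deviance) at each lambda
    cv_se   : standard error of that mean at each lambda
    """
    lambdas = list(lambdas)
    if not lambdas or not (len(lambdas) == len(cv_mean) == len(cv_se)):
        raise ValueError("inputs must be non-empty and of equal length")
    best = min(cv_mean)
    # lambda_min: the largest lambda attaining the minimum CV error
    lam_min = max(lam for lam, m in zip(lambdas, cv_mean) if m == best)
    threshold = best + cv_se[lambdas.index(lam_min)]
    # lambda_1se: the largest lambda (most parsimonious model) whose CV error
    # is within one standard error of the minimum
    lam_1se = max(lam for lam, m in zip(lambdas, cv_mean) if m <= threshold)
    return lam_min, lam_1se
```

With the hypothetical inputs `one_se_rule([0.001, 0.01, 0.05, 0.1, 0.5], [0.50, 0.45, 0.46, 0.48, 0.60], [0.02] * 5)`, the minimum error is at λ = 0.01. The result is `(0.01, 0.05)`, because 0.05 is the largest λ whose error (0.46) stays within 0.45 + 0.02.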

Note that the model matrix of the starting model consists of 2,039 rows and 943 columns, so classical methods may be affected by the *curse of dimensionality*. The Lasso method, by contrast, is very stable and fast. It shrinks 923 of the 943 values of $\hat{\boldsymbol{\beta}}_{1SE}$ to zero; thus only 20 coefficients have a posterior distribution that is not symmetric about zero.
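Step 2 draws from the posterior (2) at λ = $\lambda_{1SE}$. The sketch below is not MCMCpack's code. It is a plain random-walk Metropolis sampler, written only to show what sampling from (2) involves.

- The function names, the step size `step`, the seed and the toy data are hypothetical choices.
- In higher dimensions, a tuned proposal and convergence diagnostics would be needed.

```python
import math
import random


def log_posterior(beta, X, y, lam):
    """Log of the unnormalised posterior (2): logit log-likelihood plus Laplace log-prior."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))  # linear predictor x_i' beta
        log1p_exp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))  # stable log(1 + e^eta)
        total += yi * eta - log1p_exp  # equals y log(pi) + (1 - y) log(1 - pi)
    return total - lam * sum(abs(b) for b in beta)  # penalty includes beta_0, as in (2)


def rw_metropolis(X, y, lam, n_draws=10_000, step=0.3, seed=2021):
    """Random-walk Metropolis draws from p(beta | x, y, lam); returns a list of beta vectors."""
    rng = random.Random(seed)
    beta = [0.0] * len(X[0])
    current = log_posterior(beta, X, y, lam)
    draws = []
    for _ in range(n_draws):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, X, y, lam)
        # Symmetric Gaussian proposal: accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        draws.append(list(beta))
    return draws


if __name__ == "__main__":
    # Hypothetical toy data: intercept column plus one regressor
    xs = (-2.0, -1.6, -1.2, -0.8, -0.4, 0.4, 0.8, 1.2, 1.6, 2.0)
    X = [[1.0, x] for x in xs]
    y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
    kept = rw_metropolis(X, y, lam=1.0)[2000:]  # discard burn-in
    slope = [d[1] for d in kept]
    print("posterior mean of slope:", sum(slope) / len(slope))
    print("share of draws with slope > 0:", sum(s > 0 for s in slope) / len(slope))
```

After discarding a burn-in, the retained draws of each coefficient can be summarised by the posterior mean, quantiles and the share of draws above zero. A coefficient whose draws are roughly symmetric about zero is one the Laplace prior effectively shrinks away.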

## **4. Outcomes of the logistic model**

The coefficients of a logit model are not easy to interpret, so the odds ratios (OR) are reported in Table 3. To simplify interpretation, the analysis was limited to first-order interactions, and Table 3 therefore presents only interaction terms of the first order. Interactions are indicated by the symbol ×, which may be read as "by".

A binary variable with an odds ratio greater than 1 implies that the group for which the binary variable equals 1 has a higher probability of *y*=1 than the group for which it equals 0. Among the binary variables ($\mathbf{x}_b$), odds ratios greater than 1 were observed only for interactions. For example, consider the interaction term "father is limited in activity because of health problems" ($x_1$) × "father with a permanent contract" ($x_2$), denoted by $x_{12}$. Its odds ratio was equal to 1.826. This means that the odds of the event *y*=1 when $x_{12}=1$ (both $x_1$ and $x_2$ equal to 1) are 82.6% greater than the odds of the event *y*=1 when $x_{12}=0$.

Let $\boldsymbol{\mu}$ be the vector of mean values of the continuous regressors $\mathbf{x}_c$. Note the following:

1. The product of two binary variables is again a binary variable.
2. The percentage increment of the reference odds, $100\times(\mathrm{OR}-1)$, is reported below in parentheses. It is used as an approximation of the percentage increment of the reference probability, $\pi_i \mid \mathbf{x}_b=\mathbf{0}, \mathbf{x}_c=\boldsymbol{\mu}$.
3. The corresponding value of the OR may be found in Table 3.

The reference probability of *y*=1 (i.e., of discontinuing education) was $\pi_i \mid \mathbf{x}_b=\mathbf{0}, \mathbf{x}_c=\boldsymbol{\mu} = 0.160$. It was calculated at the mean values of the continuous regressors ($\mathbf{x}_c=\boldsymbol{\mu}$) with all the binary variables equal to 0 ($\mathbf{x}_b=\mathbf{0}$). Approximating the change in probability by the OR, the result for $x_{12}$ was a probability of $\pi \mid x_{12}=1 \approx 1.826 \times 0.160 = 0.292$, nearly double the probability for $x_{12}=0$.

Similarly, significantly higher probabilities of discontinuing education (dropping out) were observed for other interaction terms:

- "father is limited in activity because of health problems" × "family living in the macro-region Islands" (+149.0%);
- "father is limited in activity because of health problems" × "family living in a moderately populated area" (+56.3%);
- "assets reduction for needs" × "young individual with self-perceived bad health" (+234.0%);
- "assets reduction for needs" × "mother with self-perceived bad health" (+56.3%);
- "family living in a densely populated area" × "parents are unemployed or inactive" (+122.7%);
- "mother only is employed" × "parents' skill level on the job is labourer" (+82.3%).

In summary, actual and self-perceived health conditions heavily affect the probability of discontinuing education in the transition from lower secondary to upper secondary school and throughout the secondary school years. However, this effect operates through interactions with other factors. Note that the "number of requests for help because the family lives in need", which is formally a continuous variable, interacts with "mother suffering from any chronic (long-standing) illness or condition". This interaction yielded an odds ratio greater than 1 at the mean of the first term.

A binary variable with an odds ratio lower than 1 implies that the represented group has a lower probability of *y*=1 than the complementary group. In Table 3 there are only two binary variables with an odds ratio lower than 1. For example, the binary variable "both parents employed" (BPE) had an odds ratio equal to 0.736. Its complement to one, expressed as a percentage, was therefore $100\times(1-0.736) = 26.4\%$.

Hence the probability of discontinuing education for the group with BPE equal to 1 was approximately 26.4% lower than that of the complementary group. The complementary group did not have both parents employed and had probability $\pi_i \mid \mathbf{x}_b=\mathbf{0}, \mathbf{x}_c=\boldsymbol{\mu} = 0.160$. In other words, the group with BPE equal to 1 had a probability of $\pi \mid \mathrm{BPE}=1 \approx 0.736 \times 0.160 = 0.118$.

Similarly, a significantly lower probability of discontinuing education was observed for the interaction term "assets reduction for needs" × "mother with permanent employment contract" (−51.5%). The constant of the model was not statistically significant, even though its magnitude was comparable with that of the other parameters.
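Under the logit model (1), an odds ratio rescales the odds of *y*=1, not its probability. The exact conversion is $p_1 = \mathrm{OR}\cdot o_0/(1+\mathrm{OR}\cdot o_0)$, with $o_0 = p_0/(1-p_0)$. The sketch below contrasts it with the product $\mathrm{OR}\cdot p_0$ used in the text, which is accurate only when the reference probability $p_0$ is small. The function names are hypothetical. The inputs are the reference probability 0.160 and the odds ratios 1.826 and 0.736 quoted above.

```python
def prob_from_or(p0, odds_ratio):
    """Exact probability implied by an odds ratio under the logit model (1)."""
    odds = p0 / (1.0 - p0) * odds_ratio  # the OR multiplies the reference odds
    return odds / (1.0 + odds)


def prob_or_approx(p0, odds_ratio):
    """Approximation p0 * OR, which treats the OR as a ratio of probabilities."""
    return p0 * odds_ratio


for label, odds_ratio in [("x12", 1.826), ("BPE", 0.736)]:
    print(label,
          "exact:", round(prob_from_or(0.160, odds_ratio), 3),
          "approx:", round(prob_or_approx(0.160, odds_ratio), 3))
```

With $p_0 = 0.160$, the exact probability is about 0.258 for OR = 1.826 (versus 0.292 from the product) and about 0.123 for OR = 0.736 (versus 0.118). The approximation therefore somewhat overstates both the increase and the decrease when the reference probability is not small.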



The continuous variables. Individual age (range 16-19), expressed in decades, showed a positive, parabolic effect on the interruption of education before completion of upper secondary school. This strong effect may have specific causes: the survey protocols did not interview individuals under the age of 16, and data on vocational schools were not collected well.

The other continuous variables entering the model as single terms also showed significant effects on the interruption of education:

- As the age of fathers and the education level of parents increased, the probability of discontinuing education decreased.
- As the number of objects owned at home (dishwasher, refrigerator, telephone, television, and so on) increased, the risk of interrupting one's education decreased.
- As indicated above, for individuals having a "mother suffering from any chronic (long-standing) illness or condition", an increase in the "number of requests for help because the family lives in need" increased the risk of dropping out of school.

This empirical evidence highlights the importance of welfare programmes that help families experiencing economic and physical difficulties, with the specific aim of reducing the number of students who interrupt their education.

The main weakness of the Lasso method in selecting significant explanatory variables is the absence of some income variables from the model. Various income components have frequently been found to be significant in the literature (Ochsen, 2011; Krause et al., 2015).

In applications, interactions should be supported by social, behavioural, psychological or economic theories. Otherwise, they are obtained automatically by an adaptive procedure such as the Lasso method and remain purely empirical findings. In fact, few models with interactions exist in the literature.

Interactions are probably found most easily among binary or categorical variables. This case is relatively interesting because such interactions can be replaced with specific typologies. The same holds for interactions of a continuous variable with binary explanatory variables, whereas an interaction between two continuous variables is very difficult to grasp immediately. In general, it is useful to find a theoretical justification for the existence of interactions instead of blindly searching for interaction terms. Nevertheless, it is highly plausible that almost all phenomena result from interactions among many variables, although understanding and explaining such results may become very complicated and challenging.

## **References**

