#### The relation between students' educational performances and their access test results: a focus on an Italian case

Matteo Corsi<sup>a</sup>, Luca Persico<sup>a</sup>, Sara Preti<sup>a</sup>, Agnese Sechi<sup>a</sup>

<sup>a</sup> Department of Economics, University of Genoa, Genoa, Italy


### 1. Introduction, Data and Descriptive analysis

This paper analyzes the relationship between the university performances of freshman students, measured by the University Credits (CUs)<sup>1</sup> gathered during the first semester, the results achieved in the TE.L.E.MA.CO. (*TEst di Logica E MAtematica e COmprensione verbale*) test, and their socio-demographic characteristics. Starting from the Bologna Declaration of 1999 (ministerial decree of November 3, 1999, no. 509), the Italian university system has seen important changes at the organizational, educational, and financial levels. The training credit model was introduced to harmonize national and international university systems. Another major change in the reform was the reorganization of degree courses into homogeneous classes. The reform established a three-cycle higher education system comprising undergraduate studies (3-year bachelor's degrees), master's or specialist degrees (2-year master-equivalent degrees), and doctoral studies. The education system also provides for the possibility of attending other courses, such as first- and second-level masters. Furthermore, in 2004 non-selective admission tests were introduced for all bachelor's degrees.

The Department of Economics and Business Studies (DIEC) of the University of Genoa (Italy), which has open-enrolment courses, adopted the TE.L.E.MA.CO. test as a tool for verifying the initial knowledge considered functional to effective participation in a university course. The test consists of two sections: a common core for all degree programs, aimed at assessing basic skills in the comprehension of Italian texts (literacy) and in logical reasoning (numeracy), and a section that differs according to the chosen program<sup>2</sup>. Additional mandatory tasks are assigned to students who score below the established thresholds.
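The thresholds reported in footnote 2 can be written out as a small rule. The following is only a sketch: the function name `telemaco_outcome` is hypothetical, and it assumes that the common core score out of 20 is the sum of the literacy and numeracy scores, which the text does not state explicitly.

```python
def telemaco_outcome(literacy, numeracy):
    """Apply the thresholds of footnote 2 to one student's section scores.

    Assumption (not stated in the paper): the common core score out of 20
    is the sum of the literacy and numeracy section scores.
    """
    core_score = literacy + numeracy
    passed_core = core_score >= 12  # pass iff at least 12 out of 20
    return {
        "core_score": core_score,
        "passed_core": passed_core,
        # Extensions require a passed core and at least 6 in the matching section.
        "text_extension": passed_core and literacy >= 6,
        "math_extension": passed_core and numeracy >= 6,
    }


# Hypothetical students, for illustration only.
print(telemaco_outcome(literacy=7, numeracy=5))
print(telemaco_outcome(literacy=5, numeracy=6))
```

The rule only encodes score thresholds. Whether an extension is actually offered to a student also depends on the chosen program.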

Data are collected by the DIEC. The main dataset derives from three different sources: the first contains information on socio-demographic characteristics and students' educational backgrounds; the second contains information on the university career; the third contains the results of the TE.L.E.MA.CO. admission test. The main dataset records information on 488 students enrolled in the Department of Economics of the University of Genoa; they are all pure freshmen (first matriculation in the university) and not exempted from the obligation to take the test<sup>3</sup>. The attributes considered are age, gender, high school, diploma grade, course of study, results of the TE.L.E.MA.CO. test, and average number of CUs.

Once the main dataset had been assembled, we performed a descriptive analysis of the students' characteristics. The average age of the students is 19 years, and females represent 31% of the sample. 55% of students are enrolled in Business Administration, 27% in Economics of Maritime Business, Logistics and Transport, and 18% in Economics. The average high school final grade is 74.78, and 25% of the students have a grade higher than 81. Women in the sample have, on average, better high school final grades than men: female students have a mean of 76.75, while male students have a mean of 73.88. A t-test confirmed that the difference between the two group means is significant.

Referee List (DOI 10.36253/fup\_referee\_list)

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

<sup>1</sup>CUs represent indicators that measure the workload required to attend the lessons and prepare for the specific exam.

<sup>2</sup>Students pass the common core if and only if they obtain a score equal to or higher than 12 out of 20. Then, those who have passed the common core and who have achieved a score equal to or greater than 6 in the individual sections (literacy and numeracy) can access the T (text) and M (mathematical) extensions respectively.

<sup>3</sup>Exempted students are those who have achieved a high school final score equal to or greater than 90/100 or who are in one of the other situations listed at the following link: https://unige.it/studenti/telemaco#cosaTELEMACO.

Matteo Corsi, University of Genoa, Italy, matteo.corsi@edu.unige.it, 0000-0001-7545-3600 Luca Persico, University of Genoa, Italy, luca.persico@unige.it, 0000-0002-5436-2627 Sara Preti, University of Genoa, Italy, sara.preti@edu.unige.it, 0000-0002-7424-9998 Agnese Sechi, University of Genoa, Italy, agnese.sechi@edu.unige.it

Matteo Corsi, Luca Persico, Sara Preti, Agnese Sechi, *The relation between students' educational performances and their access test results: a focus on an Italian case*, © Author(s), CC BY 4.0, DOI 10.36253/979-12-215-0106-3.05, in Enrico di Bella, Luigi Fabbris, Corrado Lagazio (edited by), *ASA 2022 Data-Driven Decision Making. Book of short papers*, pp. 23-28, 2023, published by Firenze University Press and Genova University Press, ISBN 979-12-215-0106-3, DOI 10.36253/979-12-215-0106-3

To complete the picture, it is interesting to examine how the differing performances are related to the different types of high schools. Table 1 shows the frequency distribution of students' high school and university performances by school of origin. High school performances are measured by the average diploma grade, while university performances are measured by the number of CUs gained during the first semester and by the average Common Core Score (named *CC Score* in Table 1) in the TE.L.E.MA.CO. test. It is worth noting that about 40% of students enrolled in Economics in the year 2021 come from the scientific high school, followed by the technical institute with 30% and the vocational institute and linguistic high school with 9%. Regarding the TE.L.E.MA.CO. test results, 346 out of 488 students were successful: 65% of the females and 74% of the males who took the test passed it<sup>4</sup>. Looking at the sample distribution of the common core scores by gender, there are no gender gaps in the literacy section; differences do emerge, however, in the numeracy section. While there is a gap in favor of females in high school performances, the picture reverses in the numeracy section, where male students perform better than females, a result that has been confirmed with a t-test<sup>5</sup>. These two results are not necessarily in conflict. Indeed, we do not know whether the differences in STEM<sup>6</sup> performances in favor of males (which occur in our sample for the numeracy section) also exist in the high school grades for STEM subjects. We know that females get higher graduation marks on average, but we do not know what their performance in STEM subjects is.

It should also be considered that our sample includes only students who must take the test (therefore not the best in terms of school performance) and that the male-to-female ratio among students from scientific high schools (students with a stronger inclination towards scientific subjects) is very high compared to other institutes. There is therefore a problem of sample selection and balance, which does not allow us to interpret the gender gap exhaustively.


Table 1: Distribution of students' high school and university performances by school of origin

<sup>4</sup>Moreover, 253 students pass the mathematical extension: 43% of the females and 55% of the males who take it pass it. No one is allowed to take the text extension.

<sup>5</sup>This result is consistent with the literature about the gender gap in STEM courses (Priulla et al., 2021).

<sup>6</sup>STEM is an acronym for the fields of science, technology, engineering, and math.

Focusing on the students' background, Figure 1 shows that, on average, students who come from scientific and classical high schools perform better than the others in all the sections, and that students who attended vocational or other types of high school (such as music or artistic high schools) on average do not pass the common core of the TE.L.E.MA.CO. test. Students who come from the scientific high school perform much better than the others in the mathematical extension, even compared to students from the classical high school. Finally, we examine the performance of students during the first semester by looking at the number of CUs (which ranges from 0 to 27); 33% of students do not pass any exams (0 CUs), while 28% reach the 27 CUs threshold. The number of CUs earned at the end of the first semester follows the same trend for both male and female students. Looking at the backgrounds, students from vocational institutes, human sciences, or other types of high schools perform worse, while students with scientific and classical backgrounds earn a greater number of CUs.

Figure 1: TE.L.E.MA.CO. test scores' distribution per school type

*Source*: Computed on the basis of data from DIEC, 2021

### 2. Empirical Model

In this section, we estimate two models, a logistic and an ordered logistic regression, to study the probability of acquiring CUs. These approaches help to understand when and how timely policies and programs can be implemented to avoid losing students, a frequent occurrence, especially in the first semester of the first year. Specifically, the logit model represents the probability of earning at least 18 CUs<sup>7</sup> during the first semester as a function of students' characteristics and their TE.L.E.MA.CO. test results. The choice of a dummy dependent variable reflects the fact that, only a few months after the start of a university career, a student has necessarily taken few exams, if any. The number of credits is therefore bounded between a minimum (0) and a maximum (27), which prompted us to use whether or not the threshold is reached as a proxy of academic performance. The binary dependent variable is equal to 1 if a student has earned at least 18 CUs (2 exams) by the end of the first semester, and 0 otherwise. The independent variables included in the model are the following: gender (dummy variable); age at enrolment<sup>8</sup>; high school final mark (normalized from 60 to 100); type of school; university course; two variables that capture the literacy and numeracy scores<sup>9</sup>; a variable that measures the distance in km between home (we use the high school address as a proxy) and the university; and a variable that represents the average income in the municipality of residence, as a proxy of the income of the students' parents. We suppose that these last two variables have an important, even if indirect, impact on students' performances. The idea that commuting or changing habits and home (especially at the very beginning) may negatively affect performance is widespread in the literature (Tigre et al., 2016). Socio-economic conditions can also influence school achievement. The left side of Table 2 reports the main results of the logit model (odds ratios, estimated coefficients, standard errors, and p-value significance).

<sup>7</sup>We have chosen this threshold because it represents 2 out of 3 exams, since in the first semester there are only 9-credit exams by default.

<sup>8</sup>The variable is dichotomized into <= 19 and > 19; the dummy takes the value 0 if the student has a regular or early schooling path, and 1 otherwise.

<sup>9</sup>We do not consider the mathematical extension score because this variable hides the effects of other covariates, even though only a part of the sample accesses the test.
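For reference, the binary specification described in this section can be written in the standard logit form. The notation below is ours and is not taken from the paper: $\mathrm{CU}_i$ is the number of credits earned by student $i$ in the first semester, $\mathbf{x}_i$ collects the covariates listed above, and $\beta_0$, $\boldsymbol{\beta}$ are the intercept and the coefficient vector.

$$
Y_i = \mathbb{1}\{\mathrm{CU}_i \ge 18\}, \qquad
\Pr(Y_i = 1 \mid \mathbf{x}_i) = \frac{\exp(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta})}{1 + \exp(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta})}, \qquad
\log\frac{\Pr(Y_i = 1 \mid \mathbf{x}_i)}{1 - \Pr(Y_i = 1 \mid \mathbf{x}_i)} = \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}
$$

Under this form, $\exp(\beta_k)$ is the odds ratio associated with a one-unit increase in the $k$-th covariate, and $\exp(\beta_0)$ gives the odds for a student whose covariates all equal their reference values.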


Table 2: Logit and Ordered Logit estimates

The baseline student has the following profile: female, who comes from the scientific high school, with an age of 19 years at most (therefore regular from the academic point of view), with a final grade equal to 74.78 (average diploma grade of the sample) and who has reached the average sample results in both literacy and numeracy sections. In addition, this student attends Business Administration, has an income equal to the average of the sample, and has a zero distance from the university.

Turning to the results of the logit regression, the intercept shows that for the baseline student the probability of earning at least 18 CUs is 76%, corresponding to odds of 3.125 (p<0.01). Regarding school type, students who attended a high school other than the scientific one are significantly less likely to reach the credit threshold; the *Other Types* category, on the other hand, is not significant. Another relevant variable is the high school final grade: for a one-unit increase in the final grade, the odds of reaching the CU threshold are multiplied by 1.081 (p<0.01). As for the admission test, the score achieved in the numeracy section is the only significant one: with a probability of 53%, students who score above the mean perform better. Distance also has a significant impact on students' performance: the further a student lives from the university, the less likely he or she is to pass two out of three exams. The role of commuting as a penalty on student performance has already been addressed in the literature, although not extensively: the time lost to travel, the physical and mental stress of living far away, and the greater difficulty in forming study and friendship groups are likely among the main components.
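The link between the baseline odds and the baseline probability reported above is simple arithmetic, sketched below. The helper names are hypothetical. The second part reads the 1.081 reported for the high school grade as an odds ratio per grade point; that reading is an assumption, since Table 2 is not reproduced here.

```python
import math


def odds_to_probability(odds):
    """Convert odds p / (1 - p) back to the probability p."""
    return odds / (1.0 + odds)


def probability_to_log_odds(p):
    """Logit transform: the scale on which the model coefficients are additive."""
    return math.log(p / (1.0 - p))


# Baseline odds reported in the text for the intercept.
baseline_odds = 3.125
baseline_probability = odds_to_probability(baseline_odds)
print(round(baseline_probability, 2))  # 0.76

# Assumption: 1.081 is an odds ratio for a one-point increase in the final grade.
assumed_grade_odds_ratio = 1.081
shifted_probability = odds_to_probability(baseline_odds * assumed_grade_odds_ratio)
print(shifted_probability > baseline_probability)  # True
```

An odds ratio acts multiplicatively on the odds, so the resulting change in probability depends on the starting point and is not constant across students.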

To assess the performance of the logit model we use the area under the receiver operating characteristic (ROC) curve (AUC). The AUC of the logit model is equal to 0.767; since a larger AUC indicates a more accurate prediction model, the logit model can be considered sufficiently accurate. Another way to assess the model's performance is to examine the agreement between actual observations and predictions through a contingency table. To transform a student's predicted probability (the probability of obtaining at least 18 CUs) into a predicted class (whether the student obtains at least 18 CUs), it is sufficient to define a cut-off probability value. This value is computed using *Youden's index*<sup>10</sup> (Youden, 1950), and it is equal to 0.570, as shown in Figure 2. Finally, we compare the actual and predicted classifications to measure the goodness of the logit model: 70% of the students are correctly classified.

*Source*: Computed on the basis of the logit model's output
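The evaluation steps described above (AUC, Youden's cut-off, share of correctly classified students) can be sketched in plain Python. This is an illustration under stated assumptions, not the computation used in the paper: the function names are hypothetical, the eight labels and predicted probabilities are invented toy values, and a student is assigned to class 1 when the predicted probability is at least the cut-off.

```python
def auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))


def youden_cutoff(y_true, y_prob):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(y_prob)):
        # Class 1 is predicted when the probability reaches the cutoff.
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= cutoff)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < cutoff)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def accuracy(y_true, y_prob, cutoff):
    """Share of observations whose predicted class matches the actual class."""
    hits = sum(1 for y, p in zip(y_true, y_prob) if (p >= cutoff) == (y == 1))
    return hits / len(y_true)


# Toy data (invented): 1 = at least 18 CUs, 0 = fewer than 18 CUs.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.55, 0.90]

cutoff, j = youden_cutoff(y_true, y_prob)
print(auc(y_true, y_prob), cutoff, j, accuracy(y_true, y_prob, cutoff))
```

Only observed predicted probabilities are tried as candidate cut-offs, which is enough to find the maximum of J on a finite sample.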

<sup>10</sup>Youden's index, also called Youden's J statistic, was developed in 1950 by W.J. Youden and is a single statistic that captures the performance of a dichotomous test. The index considers both the true positive rate (Sensitivity) and the true negative rate (Specificity), and it is given by Sensitivity+Specificity-1.

Since in the first semester students can only have earned 0, 9, 18, or 27 CUs, and every exam is worth the same number of CUs (9) and is therefore treated as equally difficult, we decided to estimate an ordered logistic model in an attempt to capture more information. In this model the dependent variable is again the students' performance in terms of CUs, but this time it is measured on an ordinal scale with 4 categories: 0 exams (inactive) corresponding to 0 credits, 1 exam to 9, 2 to 18, and 3 to 27. The right side of Table 2 shows the main results of the ordered logistic model. There are three estimates of the intercept because, with four categories, there are three cutoffs from one category to the next. Regarding the last cutoff, it is worth noting that the third and fourth categories (2 exams and 3 exams respectively) are not significantly different, so they could be aggregated without consequences. In this case too, it is more interesting to comment on the coefficients, which confirm the results of the logistic model, even if some differences emerge: the variable *Other Types* becomes significant, and the influence of other covariates (Technical, Classic, Score Numeracy) on the dependent variable increases. However, the distance from home loses its significance. Compared to the baseline, defined as before, males, students from schools other than the scientific one, and students with lower than average diploma and numeracy grades are more likely to obtain fewer CUs. We also perform a Brant test to check the parallel lines hypothesis, and the test suggests that the assumptions of the ordered logit regression are met.

In addition to the ordered logit coefficients, marginal effects are used to assess the direction and magnitude of the changes. Concerning the high school type, students who come from a high school other than the scientific one (model baseline) have a lower probability of passing two or three exams; in particular, the probability is much lower for the vocational and the human sciences high schools (in these cases the likelihood of passing one exam is also lower). Furthermore, students who attended classical and technical high schools have a higher probability of passing at least one exam: for example, the probability of passing two exams is 0.327 higher for a student from a classical high school than for a student from a human sciences high school. Moreover, if the student's high school grade or the score in the numeracy section increases by one point, the likelihood of passing zero exams decreases by 1.26% and 2.69% respectively.
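To make the role of the three cutoffs concrete, the sketch below turns a linear predictor and three cutpoints into the probabilities of the four CU categories. It assumes the common proportional-odds parameterization P(Y <= j) = logistic(cutpoint_j - linear predictor). The function names and all numeric values are hypothetical and are not estimates from Table 2.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(linear_predictor, cutpoints):
    """Category probabilities under P(Y <= j) = logistic(cutpoint_j - linear_predictor).

    `cutpoints` must be increasing; k cutpoints yield k + 1 categories.
    """
    cumulative = [logistic(c - linear_predictor) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for value in cumulative:
        probs.append(value - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = value
    return probs


categories = (0, 9, 18, 27)             # CUs earned in the first semester
hypothetical_cutpoints = [-1.0, 0.0, 1.0]  # illustrative values only

for eta in (0.0, 1.0):
    probs = ordered_logit_probs(eta, hypothetical_cutpoints)
    print(eta, {c: round(p, 3) for c, p in zip(categories, probs)})
```

With this sign convention, a larger linear predictor shifts probability mass away from 0 CUs and towards 27 CUs. The marginal effects discussed above are the changes in these category probabilities when a single covariate changes.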

### 3. Conclusions

The objective of this work was to analyze the relationship between students' university performances, measured by the University Credits (CUs) gathered during the first semester, the results achieved in the TE.L.E.MA.CO. test, a useful tool for orientation and access to university studies based on solid scientific methodologies, and their socio-demographic characteristics. A logit and an ordered logit model are used to compute the probabilities of reaching at least 18 CUs (logit) or of obtaining 0, 9, 18, or 27 CUs (ordered logit). What emerges from the models is that several factors are determinants. Regarding the students' background, the graduation grade and the type of school predict success at exams (negatively, in particular, for vocational, linguistic, and human sciences high schools). As for the test, the score in the numeracy section is the main determinant of success. Based on a consistent statistical approach, our result seems to confirm the ability of the admission test to predict academic success in the first year (Bestetti et al., 2020; Migliaretti et al., 2017; Carrieri et al., 2013; CISIA, 2020). Furthermore, although the students we consider all have a diploma grade lower than 90, the admission test remains significant in the presence of the high school grade, providing additional information that the latter fails to provide. For this reason too, the test can be a powerful tool and a good alternative to the high school final mark, which is often the only information used as a university admission indicator. As future work, it would be interesting to understand whether additional and perhaps differentiated approaches are necessary according to the background of each student, especially at the beginning of their university careers. In addition, hybrid solutions combining distance and face-to-face teaching could be implemented to support off-site students.

### References

