# Students' feedback on the digital ecosystem: a structural topic modeling approach

Annalina Sarra <sup>a</sup>, Adelia Evangelista <sup>a</sup>, Tonio Di Battista <sup>a</sup>

<sup>a</sup> Department of Philosophical, Pedagogical and Economic-Quantitative Sciences, University "G. d'Annunzio" of Chieti-Pescara, Italy


## 1. Introduction

In March 2020, to contain the spread of the COVID-19 pandemic, almost all educational ecosystems (schools, universities and private centres) around the world were forced to cancel face-to-face classes and replace them with online instruction. A wide variety of remote teaching methods was activated quickly. These solutions were undoubtedly meant to ensure the continuity of basic education and institutional activities, but they also made it possible to experiment, on a large scale, with screen-mediated didactic solutions at the level of design, didactic mediation and interaction. The way educational systems reacted to the emergency is likely to remain a subject of investigation in the coming years. In this respect, it is argued in (14) that the infrastructures for digital education chosen in response to the pandemic crisis will redefine public education for the future. In addition, other scholars, see for example (2) and (6), have already carried out research on screen-mediated didactics in the pandemic context. Their studies highlighted some essential conditions for a positive teaching-learning process, mainly related to sociality, the possibility of working in cooperative environments and the possibility of actively co-building knowledge within a community of practice. Following these lines of research, in this paper we aim to capture students' perspectives and perceptions of screen-mediated didactics during the pandemic emergency. Data were collected through a survey consisting of open-ended questions administered to students attending six large courses, held by four professors in two Italian universities (Macerata and Chieti-Pescara). In particular, the research involved students attending courses of the Educational Sciences degree (45 from the course "Didactics" and 48 from the course "Special Pedagogy"). The questionnaire was also administered to students enrolled in the Primary Education degree programme: 230 from the course "Technologies for Education and Learning", 230 from the course "Laboratory of Technologies" and 230 from the course "General Education". Finally, there were students who attended the course "Didactics of Training", enrolled in the Pedagogical Sciences degree. All courses refer to the academic year 2019/2020.

To circumvent the dilemma between the benefit of open-ended questions and the cost of analysing them, we adopt in this work an unsupervised topic modelling approach. More specifically, we focus on Structural Topic Modeling (10), a variant of Latent Dirichlet Allocation (1) suited to relaxing the strict statistical assumption that all texts in the modelled corpus are generated by the same underlying process. The remainder of the paper is organized as follows. Section 2 describes the unsupervised topic modelling approach adopted, while Section 3 presents the results. Section 4 contains an interpretation of the main findings and the conclusions.

Annalina Sarra, University of Chieti-Pescara G. D'Annunzio, Italy, annalina.sarra@unich.it, 0000-0002-0974-0799 Adelia Evangelista, University of Chieti-Pescara G. D'Annunzio, Italy, adelia.evangelista@unich.it, 0000-0002-7596-9719 Tonio Di Battista, University of Chieti-Pescara G. D'Annunzio, Italy, tonio.dibattista@unich.it, 0000-0003-2139-7273

Referee List (DOI 10.36253/fup\_referee\_list)

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Annalina Sarra, Adelia Evangelista, Tonio Di Battista, *Students' feedback on the digital ecosystem: a structural topic modeling approach*, © Author(s), CC BY 4.0, DOI 10.36253/979-12-215-0106-3.36, in Enrico di Bella, Luigi Fabbris, Corrado Lagazio (edited by), *ASA 2022 Data-Driven Decision Making. Book of short papers*, pp. 203-208, 2023, published by Firenze University Press and Genova University Press, ISBN 979-12-215-0106-3, DOI 10.36253/979-12-215-0106-3

## 2. Methodology

Topic modelling, which lies at the intersection of text mining and information retrieval, has attracted widespread interest among researchers in many fields in recent years. The core idea behind topic models is that documents are mixtures of multiple topics. One of the most widely used probabilistic topic modelling algorithms is Latent Dirichlet Allocation (LDA) (1). In the LDA approach, documents are generated via a three-level hierarchical Bayesian structure, under which each document $d_m$ is modelled as a finite mixture over a set of $K$ corpus-wide topics $z_k$ (1) and each topic is modelled as a distribution over a set of $V$ words $w_v$. The generative process assumed by LDA for a corpus of documents can be summarized as follows: for each topic $z$, choose the probabilities over words $\phi_z \sim \mathrm{Dir}(\beta)$, where $\phi_z$ is drawn from a symmetric Dirichlet prior distribution with parameter $\beta$; for each document $d$, choose the probabilities over topics $\theta_d \sim \mathrm{Dir}(\alpha)$, where $\theta_d$ is drawn from a symmetric Dirichlet prior distribution with parameter $\alpha$; for each word $w_{dn}$ in document $d$, choose a topic $z_{dn} \sim \mathrm{Multinomial}(\theta_d)$ and then choose a word $w_{dn} \sim \mathrm{Multinomial}(\phi_{z_{dn}})$. Since LDA is a bag-of-words model, the order in which words appear is disregarded. Additionally, although LDA is able to extract hidden topics from text documents, it does not allow one to examine the relationship between document-level information and the content of the documents. This limitation can be overcome by using Structural Topic Modelling (STM), developed by (10). STM is a natural-language processing algorithm expressly designed to represent the effect of external variables on topical content (the probabilities associated with words in each topic) and topical prevalence (the proportion of each document devoted to the different topics). Through STM, it is possible to estimate a series of regression models that treat the prevalence of each identified topic as an outcome variable. The capabilities of STM have been investigated in an extensive body of work in the fields of economics, finance, political science, education and new media (see, among others, (12), (15)).
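To make the LDA generative process described above concrete, the following is a minimal simulation sketch in Python, using only the standard library. It is an illustration and not the estimation procedure used in this study: the vocabulary, the corpus size and the values of the Dirichlet parameters are assumptions chosen for the example, and the helper names `sample_dirichlet` and `generate_corpus` are hypothetical.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw from a symmetric Dirichlet(alpha) by normalising Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(n_docs, doc_len, vocab, n_topics, alpha, beta, seed=0):
    """Simulate the LDA generative process and return (phi, theta, docs)."""
    rng = random.Random(seed)
    # For each topic z: phi_z ~ Dir(beta), a distribution over the V words.
    phi = [sample_dirichlet(beta, len(vocab), rng) for _ in range(n_topics)]
    theta, docs = [], []
    for _ in range(n_docs):
        # For each document d: theta_d ~ Dir(alpha), a distribution over the K topics.
        theta_d = sample_dirichlet(alpha, n_topics, rng)
        words = []
        for _ in range(doc_len):
            # Draw a topic from theta_d, then a word from that topic's phi.
            z = rng.choices(range(n_topics), weights=theta_d)[0]
            words.append(rng.choices(vocab, weights=phi[z])[0])
        theta.append(theta_d)
        docs.append(words)
    return phi, theta, docs


# Toy vocabulary and parameter values (assumptions for illustration only).
vocab = ["lesson", "home", "teacher", "feedback", "platform", "question"]
phi, theta, docs = generate_corpus(
    n_docs=3, doc_len=8, vocab=vocab, n_topics=2, alpha=0.5, beta=0.5
)
for d, words in enumerate(docs):
    print(d, " ".join(words))
```

Because word order plays no role in the sampling loop, shuffling the words within a simulated document leaves its probability unchanged, which is the bag-of-words property mentioned above. Topic model estimation works in the opposite direction, recovering the word distributions and topic proportions from observed documents.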

## 3. Results

The textual responses collected in this study were pre-processed using common steps for cleaning text data, including tokenization, lowercase conversion, stop-word removal and lemmatization/stemming. Corpus preparation and cleaning were done using the *quanteda* package (4) in R (8). The final corpus contains 1354 documents. To avoid any possible inconsistencies, we carried out our topic analysis on the original texts, written in Italian. The 20 most frequent words of the corpus are displayed in Figure 1. To extract hidden topics from the corpus, we used the *stm* package in R, developed by Roberts et al. (11). As argued by Roberts et al., for topics to be semantically interpretable, the top words of a topic should tend to co-occur within responses and should be unlikely to overlap with the top words of other topics. The first analytical step was the identification of the appropriate number of topics. By triangulating different diagnostic measures (namely, held-out likelihood, residuals, semantic coherence and lower bound), a 10-topic model was selected as the best option. In the topic labelling process, to come up with labels that reflect the main themes clearly and concisely, we used the highest-probability (Highest Prob) words, the frequency-exclusivity (FREX) words, the Lift and Score metrics and the top 10 representative words of each topic (Figure 2 and Figure 3).
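The cleaning steps listed above can be sketched as follows. This is a simplified Python illustration and not the *quanteda* pipeline used in the study: the stop-word list is a tiny assumed sample, the two responses are invented for the example, and lemmatization/stemming is left out because the Python standard library does not provide a stemmer.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real analysis would use
# a complete Italian stop-word list.
STOPWORDS = {
    "la", "le", "il", "lo", "i", "gli", "di", "a", "e", "è", "in",
    "con", "che", "per", "non", "un", "una", "sono", "più",
}


def preprocess(text, stopwords=STOPWORDS, min_len=3):
    """Lowercase the text, keep runs of letters as tokens, drop stop words and short tokens."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in stopwords and len(t) >= min_len]


# Invented responses, for illustration only.
responses = [
    "Le lezioni a distanza sono più comode, ma manca il confronto diretto!",
    "Con le lezioni registrate posso riascoltare la lezione a casa.",
]

print(preprocess(responses[0]))
# ['lezioni', 'distanza', 'comode', 'manca', 'confronto', 'diretto']

# Corpus-level word frequencies, the kind of summary shown in a top-words chart.
frequencies = Counter(token for r in responses for token in preprocess(r))
print(frequencies.most_common(3))
```

Without a lemmatization or stemming step, inflected forms such as "lezioni" and "lezione" are counted separately, which is one reason that step appears in the pipeline described above.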

The most interpretable topics retrieved by STM were assigned to the following dimensions: "Physical space home" (Topic 1), "Lack of direct confrontation and relationship" (Topic 2), "Building the community: use of whatsapp" (Topic 3), "Ask question to the professor" (Topic 4), "Communication and learning tools" (Topic 5), "Feedback" (Topic 6), "Listen to the recorded lesson again" (Topic 7), "Interaction with teacher" (Topic 8) (see the word clouds displayed in Figure 4).
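Labels such as these rest on word rankings, and FREX is the one that most directly balances how frequent a word is within a topic against how exclusive it is to that topic. The sketch below assumes the usual definition of FREX as a weighted harmonic mean of the within-topic empirical CDF of a word's probability and of its exclusivity. The function names, the toy topic-word matrix and the weight of 0.5 are assumptions for illustration, and the *stm* implementation may differ in detail.

```python
def ecdf_ranks(values):
    """Empirical CDF of each value within the list: share of entries less than or equal to it."""
    n = len(values)
    return [sum(1 for u in values if u <= v) / n for v in values]


def frex(beta, weight=0.5):
    """FREX score for every topic and word, from a K x V matrix of word probabilities."""
    n_topics, n_words = len(beta), len(beta[0])
    col_sums = [sum(beta[k][v] for k in range(n_topics)) for v in range(n_words)]
    scores = []
    for k in range(n_topics):
        # Exclusivity: share of a word's total probability that belongs to topic k.
        exclusivity = [beta[k][v] / col_sums[v] for v in range(n_words)]
        excl_rank = ecdf_ranks(exclusivity)
        freq_rank = ecdf_ranks(beta[k])
        # Weighted harmonic mean of the two ranks.
        scores.append([
            1.0 / (weight / e + (1.0 - weight) / f)
            for e, f in zip(excl_rank, freq_rank)
        ])
    return scores


# Toy topic-word probabilities (assumed values): rows are topics, columns are words.
vocab = ["lesson", "home", "contact", "absence"]
beta = [
    [0.40, 0.40, 0.10, 0.10],
    [0.40, 0.10, 0.25, 0.25],
]

scores = frex(beta)
for k, row in enumerate(scores):
    ranked = sorted(zip(vocab, row), key=lambda pair: pair[1], reverse=True)
    print(k, [(word, round(score, 3)) for word, score in ranked])
```

In this toy matrix "lesson" is equally probable in both topics, so it ranks below "home" in the first topic and below "contact" and "absence" in the second, even though it is the most probable word of the second topic. This is why FREX words tend to be more useful for labelling than the highest-probability words alone.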

Figure 1: Word frequencies (top 20) in open-ended responses

Figure 2: Top words for each topic according to highest probability, FREX, Lift and Score weighting

Figure 3: Top words associated with each topic resulting from structural topic modeling (k = 10)

Figure 4: Word clouds: a) Topic 1: *"Physical space home"*; b) Topic 2: *"Lack of direct confrontation and relationship"*; c) Topic 3: *"Building the community: use of whatsapp"*; d) Topic 5: *"Communication and learning tools"*; e) Topic 6: *"Feedback"*; f) Topic 7: *"Listen to the recorded lesson again"*.

The top words occurring within Topic 1 (lessons, distance, face-to-face, value, added, home) show that this topic is connected with a reinterpretation of the learning environment. In more detail, students underline two central aspects: the possibility of concentrating better at home, but also some elements of distraction or elements linked to the digital divide. Looking at the set of words linked to Topic 2 (contact, confrontation, absence, presence, direct), we can state that students perceive interaction as somewhat limited in the screen-mediated mode. Topic 3 focuses on the students' attempts to rebuild the community or contact with others. Words associated with Topic 4 (questions, asking, available, greater, professor) recall the possibility for students to constantly ask the teacher questions. Terms in Topic 5 refer to the online learning platform, perceived by students as essential both for supporting learning in an unusual situation and as a space for discussion. Topic 6 captures the centrality of interaction, and specifically of feedback, and highlights that the teacher's feedback did not change during the transition from face-to-face teaching to the online mode. The top-scoring words for Topic 7 clearly refer to the possibility of listening to the lesson again and of watching it repeatedly, returning to it recursively. Finally, the discussion in Topic 8 conveys the students' perception of having built a sound relationship with the professor. It was more challenging to gain insights from the last two dimensions, which are characterized by less focused words.

We also estimated the correlation between the identified topics. Except for "Interaction with teacher", each topic is associated with at least one other topic, meaning that they are likely to occur within the same documents. Finally, to complete the quantitative analysis of the textual data, we incorporated covariate information into the topic model. Specifically, we estimated topical prevalence as a function of the "teacher" covariate. The regression results support an impact of the "teacher" variable, which especially affects how Topics 2, 5, 6 and 7 vary across documents.
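The two quantities discussed here, topic correlation and topical prevalence by covariate, can be illustrated with a small Python sketch. It is a simplification under stated assumptions and not the procedure implemented in the *stm* package: the document-topic proportions and the teacher labels are invented, the function names are hypothetical, and uncertainty in the estimated proportions is ignored. For a single categorical covariate, regressing a topic's proportion on the covariate amounts to comparing its mean proportion across the covariate's levels, which is what the sketch computes.

```python
from collections import defaultdict
from math import sqrt


def mean_prevalence_by_group(theta, groups):
    """Average the topic proportions of the documents within each covariate level."""
    sums, counts = {}, defaultdict(int)
    for row, group in zip(theta, groups):
        if group not in sums:
            sums[group] = [0.0] * len(row)
        sums[group] = [s + x for s, x in zip(sums[group], row)]
        counts[group] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented document-topic proportions (rows: documents, columns: topics)
# and an invented "teacher" label for each document.
theta = [
    [0.7, 0.2, 0.1],
    [0.6, 0.2, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.6, 0.3],
]
teacher = ["A", "A", "B", "B"]

means = mean_prevalence_by_group(theta, teacher)
for level, proportions in means.items():
    print(level, [round(p, 2) for p in proportions])

# Correlation between the first two topics across documents.
topic0 = [row[0] for row in theta]
topic1 = [row[1] for row in theta]
print(round(pearson(topic0, topic1), 3))
```

A positive correlation between two topics indicates that they tend to be discussed within the same documents, while a marked difference in mean prevalence between covariate levels is the kind of pattern the prevalence regression is designed to detect.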

## 4. Discussion and Conclusions

The purpose of this study was to investigate how students attending courses in two Italian universities experienced online education during the coronavirus emergency. To this end, we used an unsupervised approach, based on the identification of latent topics, to automatically analyse responses to open-ended questions. A thorough analysis of the topic modelling results allows us to draw the following conclusions. Considering the perceptions of blended environments modelled by Chang and Fisher (3), we focus on the categories of "Interaction" and "Reply", which explore, respectively, to what extent communication is achieved from the students' point of view and how students felt about using a web-based medium. The topics retrieved by the structural topic modelling analysis can be aggregated into three broad themes: perceptions related to the *physicality of body and space*, perceptions related to *virtual relationships and communication*, and perceptions related to *feedback*. Topic 1 and Topic 7 fall into the category "Spatiality and corporeity". In the distance learning mode, students recognized the undeniable advantage of not having to travel: thanks to distance education technologies, remote learning is available to everyone, in any place. This aspect extends the very concept of access and participation and has to be considered an element of inclusion. Additionally, students reported the possibility of greater interaction and participation during the lesson and the opportunity to listen to the lesson again and watch it repeatedly, returning to it recursively over time and at different moments. Topics 2, 3 and 5 fall under the umbrella of the "virtual relationship" theme. Based on the results of the topic modelling algorithm, we found that students perceived the filter of the screen as a barrier. In fact, even though online learning enabled them to see and talk to each other, it interrupted the relational flow usually experienced in a classroom. Finally, Topics 4 and 6 are the relevant themes for the broad category "Feedback". Through these topics, students underlined that emergency remote education did not compromise the possibility of giving and receiving feedback. Overall, the results of this study point to the fluidity of the contemporary educational context: in other words, we are facing a dynamic, hybrid educational context, with a weak structure, in continuous transformation (7). This feature, exacerbated during crisis periods by the emergence of new obstacles and constraints, requires a rethinking of learning-teaching practices. A pedagogically robust learning environment can be guaranteed by hybridizing educational contexts. A "vertical blended" approach, which alternates moments of classroom teaching with moments of remote teaching, must be accompanied by a "horizontal blended" one, which integrates and hybridizes real and virtual, analogue and digital in a synchronous dimension (9).

## References

