**Advances in Analytics for Learning and Teaching**

Sofia Mougiakou · Dimitra Vinatsella · Demetrios Sampson · Zacharoula Papamitsiou · Michail Giannakos · Dirk Ifenthaler

# Educational Data Analytics for Teachers and School Leaders

# **Advances in Analytics for Learning and Teaching**

### **Series Editors**

Dirk Ifenthaler, Learning, Design and Technology, University of Mannheim, Mannheim, Baden-Württemberg, Germany

David Gibson, Teaching and Learning, Curtin University, Bentley, WA, Australia

This book series highlights the latest developments in analytics for learning and teaching and provides an arena for the further development of this rapidly developing field. It provides insight into the emerging paradigms, frameworks, methods, and processes of managing change to better facilitate organizational transformation toward implementation of educational data mining and learning analytics. The series accepts monographs and edited volumes focusing on the above-mentioned scope, and covers a number of subjects. Titles in the series *Advances in Analytics for Learning and Teaching* look at education from K-12 through higher education, as well as vocational, business, and health education. The series is also interested in teaching, learning, and instructional design and organization, as well as data analytics and technology adoption.

Sofia Mougiakou • Dimitra Vinatsella • Demetrios Sampson • Zacharoula Papamitsiou • Michail Giannakos • Dirk Ifenthaler

# Educational Data Analytics for Teachers and School Leaders

Sofia Mougiakou University of Piraeus Piraeus, Greece

Demetrios Sampson Department of Digital Systems University of Piraeus Piraeus, Greece

Michail Giannakos Dept of Computer Science Norwegian Univ of Science & Technology Trondheim, Norway

Dimitra Vinatsella University of Piraeus Piraeus, Greece

Zacharoula Papamitsiou Department of Technology Management SINTEF Digital Trondheim, Norway

Dirk Ifenthaler Learning, Design and Technology University of Mannheim Mannheim, Baden-Württemberg, Germany

This book is an open access publication.

ISSN 2662-2122  ISSN 2662-2130 (electronic)
Advances in Analytics for Learning and Teaching
ISBN 978-3-031-15265-8  ISBN 978-3-031-15266-5 (eBook)
https://doi.org/10.1007/978-3-031-15266-5

© The Editor(s) (if applicable) and The Author(s) 2023

**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

# **Preface**

**Educational Data Analytics** (**EDA**) have been attributed with significant benefits for enhancing on-demand personalised educational support of individual learners, as well as reflective course (re)design for achieving more authentic teaching, learning and assessment experiences integrated into real work-oriented tasks. As a result, most Course Management Systems now incorporate Educational Data Analytics tools. However, these tools are still not widely used because of the low **Educational Data Literacy** (**EDL**) competences of the education professionals who could be using them (e.g. K-12 teachers adopting the flipped classroom model in their teaching and school leaders leveraging educational data to support decision-making). Furthermore, online learning environments and education data-driven practice and assessment raise challenges such as ethical issues and implications, especially in terms of privacy, security of data and informed consent, that should be addressed via transparent and well-defined ethical policies and codes of practice.

Particularly nowadays, as the Covid-19 pandemic continues to unfold, **emergency remote teaching** has become the new reality for school education around the world. Consequently, **educational data**, the rich data footprint that students generate through their interactions in digital learning environments, has increased exponentially. This unprecedented crisis has brought to the forefront the urgent demand for all education professionals, including schoolteachers and leaders, to reinvent their teaching and learning environments. Educational Data Analytics (EDA) has been identified as a key enabler to seize the opportunities – through the use of educational data generated during **teaching** and **learning** (including **assessment**) – for better supporting learners in online and blended courses. To this end, the "upskilling imperative" of comprehensive EDL competences has been recognised as an immediate need to support **creative**, **flexible** and **inclusive education** and **training** in the long term, highlighting the importance for educators of grounding decisions in data and evidence, aiming to boost the **effectiveness** and **efficiency** of **education systems**, even beyond periods of emergency education such as public health crises or natural disasters.

This book aims to support the development of core competences for Educational Data Analytics of online and blended teaching and learning.

It combines:


It *targets*:


The content of this book has been developed within the action **Learn2Analyze** – *An Academia-Industry Knowledge Alliance for enhancing Online Training Professionals' (Instructional Designers and e-Trainers) Competences in Educational Data Analytics*, which is co-funded by the European Commission through the *Erasmus+ Programme* of the European Union (Cooperation for innovation and the exchange of good practices – *Knowledge Alliances*, Agreement n. 2017-2733 / 001-001, Project No 588067-EPP-1-2017-1-EL-EPPKA2-KA). The European Commission's support for the production of this publication does not constitute an endorsement of the contents, which reflect the views only of the authors, and the Commission will not be held responsible for any use which may be made of the information contained therein. More information about the project is available at www.learn2analyze.eu.

By studying this book, you will:


The learning objectives of this textbook cover the set of competences anticipated by the **Learn2Analyze Educational Data Literacy competence framework (L2A-EDL-CP)** – see Section 1.2.5. Each chapter is developed to support a given set of L2A-EDL-CP related learning objectives, which are clearly stated at the beginning of the chapter.


# **Contents**





# **Chapter 1 Online and Blended Teaching and Learning Supported by Educational Data**

# **1.1 Introduction and Scope**

# *1.1.1 Scope*

The goals of this chapter are to:


# *1.1.2 Chapter Learning Objectives*


© The Author(s) 2023 S. Mougiakou et al., *Educational Data Analytics for Teachers and School Leaders*, Advances in Analytics for Learning and Teaching, https://doi.org/10.1007/978-3-031-15266-5\_1

# *1.1.3 Introduction*

Data is identified as one of the **key enablers for driving change** in the twenty-first century.

In the context of online education, **learners are leaving behind a rich data footprint** throughout the course of their study. As a result, the existing **educational data about** learners, their learning and the **environments in which they learn** has increased exponentially.

Educators can grasp the great opportunities offered by educational data and the potential provided by data analytics technologies, to gain powerful insights and develop new ways of achieving excellence in both teaching and learning.

Educational data can reveal insights about the course design and teaching practice that might not be recognised otherwise. Moreover, through educational data analysis, tutors can gain **a holistic view of their learners' past, present and likely future** and develop a **deep understanding of their learners' activities, behaviour and preferences**. As a result, they can target their teaching and learning interventions accordingly, to provide the learners with a **personalised learning experience** and **better feedback**, and help them meet their educational goals.

**Educational Data-Driven Decision Making (DDDM)** can be a useful tool for reflecting on teaching practices and improving teaching and learning outcomes. For effective DDDM, educators need to be able to **identify, collect, combine, analyse, interpret and effectively act upon all types of educational data from diverse sources**.

**Educational Data Literacy for all Education Professionals** (such as instructional designers, teachers and tutors of online and blended courses) is now recognized internationally as **a key set of competences** and a **strong competitive advantage** to get the **best results in online and blended teaching and learning.**

However, emerging advancements related to the data-driven design and delivery of online and blended learning courses exploiting **Educational Data Analytics are not yet thoroughly addressed by existing competence frameworks for education professionals**.

To this end, the **Learn2Analyze project has developed a comprehensive proposal for an Educational Data Literacy Competence Framework to enhance existing competence frameworks with new Educational Data Literacy competences**. The Learn2Analyze Educational Data Literacy Competence Framework comprises **6 competence dimensions and 17 competence statements**.

The first competence dimension of this framework refers to **Data Collection**. Since educational data comes from a variety of sources in diverse formats, effective Data Collection is considered an essential competence that educators need to acquire and a prerequisite for this continuous process of evaluation, reflection and improvement.

It is fundamental for educators to distinguish the different types of Educational Data in Online and Blended courses, to identify the Educational Data Sources related to core elements of e-learning environments and to access and gather the appropriate educational data by combining data from different sources and in different formats, avoiding systematic errors induced from the data collection process employed.
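Combining data from different sources and in different formats, as described above, can be sketched in a few lines of code. The data and field names below (`student_id`, `quiz_score`, `posts`) are hypothetical, invented purely for illustration: the idea is merging a CSV export of quiz scores with a JSON export of forum activity into one record per learner.

```python
# Illustrative sketch: combining educational data from two sources in
# different formats -- quiz scores as CSV and forum activity as JSON.
import csv
import io
import json

quiz_csv = """student_id,quiz_score
s01,85
s02,42
"""

forum_json = '[{"student_id": "s01", "posts": 7}, {"student_id": "s02", "posts": 1}]'

# Parse each source into a dict keyed by student_id.
quiz = {row["student_id"]: int(row["quiz_score"])
        for row in csv.DictReader(io.StringIO(quiz_csv))}
posts = {rec["student_id"]: rec["posts"] for rec in json.loads(forum_json)}

# Combine into one merged record per learner, drawing on both sources.
combined = {sid: {"quiz_score": quiz[sid], "posts": posts.get(sid, 0)}
            for sid in quiz}
print(combined["s01"])  # {'quiz_score': 85, 'posts': 7}
```

Note the defensive `posts.get(sid, 0)`: a learner present in one source but absent from the other is a common, easily overlooked systematic error of the kind the data collection process should guard against.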

# **1.2 Educational Data as a Key Success Factor for Online and Blended Teaching and Learning**

# *1.2.1 Educational Data for Data-Driven Decision Making*

As described in the "What is Big Data and how does it work?" video (in the useful video resources), data is identified as one of the key factors driving change in the twenty-first century. Commonly referred to as the '*data revolution*', the '*era of big data*', or more simply '*big data*', the term is used to describe the tremendous increase in the amounts of data we generate in all aspects of our lives. Big Data can bring big possibilities and thus create big expectations (Shacklock, 2016).

"Big Data gives you the ability to achieve superior value from analytics on data at higher volumes, velocities, varieties or veracities". This claim is summarized in Fig. 1.1, based on the infographic "Extracting business value from the 4 V's of big data" by IBM (2019).


This video by Intel (see the useful video resources, "Big Data's Making Education Smarter") explains further how big data can make education smarter.

In the context of online education, learners are leaving behind a rich data footprint throughout the course of their study. Educational data comprises a wide range of datasets about learners, their learning and the environments in which they learn, stored in various sources. We will focus on and discuss in detail different types of educational data in the next topic of this chapter (Shacklock, 2016).

Educational data and data analytics technologies can support us in developing a better understanding of our learners' activities, behaviour and preferences, by identifying patterns and trends in the data that, in turn, can help us predict possible future outcomes and take actions for improving the learners' experience in our courses.
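As a minimal illustration of turning such patterns into action, the sketch below flags learners whose recent activity or scores suggest they may need support. Every name, number and threshold here is invented for the example; real analytics tools use far richer data and models, but the principle of mapping behavioural signals to an intervention is the same.

```python
# Hypothetical learner records: weekly logins and average quiz score.
learners = [
    {"name": "A", "logins_last_week": 5, "avg_quiz": 78},
    {"name": "B", "logins_last_week": 0, "avg_quiz": 35},
    {"name": "C", "logins_last_week": 2, "avg_quiz": 55},
]

def at_risk(learner, min_logins=1, pass_mark=50):
    """Flag a learner who has gone inactive or is scoring below the pass mark."""
    return (learner["logins_last_week"] < min_logins
            or learner["avg_quiz"] < pass_mark)

# The tutor gets a short list of learners to reach out to.
flagged = [l["name"] for l in learners if at_risk(l)]
print(flagged)  # ['B']
```

A rule this simple is only a starting point, but it already shows how patterns in engagement data can prompt a timely teaching intervention rather than a retrospective one.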

**Fig. 1.2** Educational data opportunities

Therefore, in both online and blended courses (Fig. 1.2),


On the other hand, data could potentially enable learners to take control of their own learning. When appropriately delivered, data can provide learners with better insights about their current academic performance in real time and about their progress (also in comparison to their peers), together with recommendations about what they need to do to meet their learning goals, helping them make informed, data-driven choices about their studying (Sclater et al., 2016).

The Data Quality Campaign, in the video "Data is Power" (in the useful video resources), highlights the importance of collecting and using quality data to transform education. Nevertheless, the provision of educational data by itself does not automatically lead to improved teaching and learning. Appropriate analyses and sensemaking of educational data allow us to identify actionable insights and inform decision making.

This is exactly what Data-Driven Decision Making is about.

### **Data-driven decision making (DDDM) is defined as**

the systematic collection, analysis, examination, and interpretation of data to inform practice and policy in educational settings (Mandinach, 2012).

Data-driven decision making has become an essential component of educational practice in order to ground decisions based on data and evidence.

**Data-Driven Decision Making (DDDM)** crosses all levels of the educational system and uses a variety of data from which decisions can be made. Therefore, it can be challenging to engage in DDDM due to data being siloed in different sources and at different levels.

Developing **competences for effective** DDDM is essential for education professionals. Such competences require "to effectively transform information into actionable knowledge and practices by collecting, analyzing, and interpreting all types of data" (Ridsdale et al., 2015).

**Decisions** fall into two categories (Marsh et al., 2006):


Data is not a static entity and therefore decisions based on data should not be static either. Data usage and evaluation should be continuous and integrated into existing decision-making processes (Fig. 1.3).

Don't approach data analysis as a cool "science experiment" or an exercise in amassing data for data's sake. The fundamental objective in collecting, analyzing, and deploying data is to make better decisions (Díaz et al., 2018).

As per Marsh et al. (2006), "*Once the decision to act has been made, new data can be collected to begin assessing the effectiveness of those actions, leading to a continuous cycle of collection, organization, and synthesis of data in support of decision making.*"
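The continuous cycle Marsh et al. describe can be sketched schematically. Every function, field and number below is a placeholder invented for illustration; the point is only the shape of the loop: collect, analyse, decide and act, then collect new data to assess the effect of the action.

```python
# Schematic sketch of the continuous DDDM cycle (placeholder logic throughout).
def collect_data(course):
    # In practice: pull logs from the learning environment.
    return course["raw_logs"]

def analyse(data):
    # In practice: proper analytics; here, mean completion rate.
    return sum(data) / len(data)

def decide_and_act(insight, threshold=0.6):
    # Decision rule: intervene when completion drops below the threshold.
    return "add scaffolding activity" if insight < threshold else "no change"

course = {"raw_logs": [0.9, 0.3, 0.5]}   # hypothetical completion rates
actions = []
for cycle in range(2):                   # each iteration is one pass of the cycle
    insight = analyse(collect_data(course))
    actions.append(decide_and_act(insight))
    course["raw_logs"].append(0.8)       # new data arrives after acting
print(actions)  # ['add scaffolding activity', 'no change']
```

The second pass uses data generated after the first intervention, which is precisely the "continuous cycle of collection, organization, and synthesis" the quotation describes: the decision is never final, only the latest step in the loop.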

**Data analytics** refers to methods and tools for analysing large sets of different types of data from diverse sources, to support and improve decision-making. Data analytics are mature technologies that are currently applied in real-life financial, business and health systems.

**Fig. 1.3** From problem identification to informed decision making

However, it is only recently (Johnson et al., 2011, pp. 28-30) that data analytics have been considered in education - first in higher education and, more recently, in school education (Bienkowski et al., 2012).

The "Engaging with students to build a better digital environment" video (in the useful video resources) shows a real-life case study of the implementation of the Jisc digital experience insights service (Jisc, 2018), aiming to improve the student experience of blended learning at Canterbury Christ Church University based on educational data analysis.

As the project lead Duncan MacIver concludes: "*The data we have from the insights service makes a significant difference to where we are moving digitally as an institution. This lends a credible voice to decisions being made and provides us with a level of confirmation that we are taking actions that are of direct benefit to students*."

### **Questions and Teaching Materials**


Is this sentence True or False?


Correct answer: False

### 2. **Please match the appropriate definition (from the right column), to the respective "V of Big Data" in the left column.**



Correct answers: 1-B, 2-C, 3-E, 4-A, 5-D

### 3. **Please select the right answer.**

According to the "Big Data's Making Education Smarter" video by Intel, that presents how education technology companies are leveraging big data to make learning more effective, analytics can


Correct answer: C.

4. **Please match how educational data and data analytics technologies (from the right column) can support each professional role (in the left column)**


Correct answers: 1-D, 2-C, 3-A, 4-B

### 5. **Please select the right answer(s). You may select more than one answer.**

Data-Driven Decision Making (DDDM) is about:


Correct answers: B, C, and D

### 6. **Please put the steps of Data-Driven Decision Making (DDDM) in the right order:**


Correct answers: 1-C, 2-F, 3-B, 4-E, 5-A, 6-D

### 7. **Please select the right answer(s). You may select more than one answer.**

As per the video "Engaging with students to build a better digital environment", the educational data analysis provided Canterbury Christ Church University with a deep understanding of the


Correct answers: B and E.

### 8. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response about the implementation of personalised learning in the following reflective task.

You may reflect on:


# *1.2.2 Why Educational Data Is Important for Online and Blended Teaching and Learning?*

Personalised learning is identified as one of the major educational challenges of the twenty-first century (2017 Horizon Report, Freeman et al. (2017)). Personalised learning refers to supporting individual student learning in a pedagogically effective and practically efficient personalised manner, based on each student's individual short-, mid- and long-term needs.

Education Elements (2018) states that personalised learning is increasingly recognized as a promising strategy to


by meeting their individual needs and customising their learning experiences to match their interests, using customised lessons, units and projects at their own pace.

Personalised learning has become easier by leveraging learners' performance, engagement and behaviour data, captured in online and blended learning environments and analysed with the help of data science.

The video published by Educause, "What Is Personalized Learning?" (in the useful video resources), explains aspects of personalised learning, emphasising the variety of tools and technologies that can support each learner's individual needs. The importance of a personalised learning experience that is tailored to the learners' unique needs, skills, and interests is also illustrated in the infographic "You Need Data to Personalize Learning" from the Data Quality Campaign.

A wide range of data is generated by the learners and stored in online and blended teaching and learning environments. Data is collected from explicit learners' activities, such as completing assignments and taking exams, and from tacit actions, including online social interactions, extracurricular activities, posts on discussion forums, and other activities that are not directly assessed as part of the learner's educational progress (U.S. Department of Education, 2012; Bienkowski et al., 2012).

Such learner-generated data is used to assess learning progress, to predict learning performance, to detect and identify potentially harmful behaviours, and to act upon the findings.

Nevertheless, as stated in the 2011 Horizon Report, we should not focus solely on learners' performance. Deeper analysis of educational data can be used to improve understanding of the teaching and learning taking place in online and/or blended courses.

As can be seen in the Arizona State University video "Using big data to customize learning" (in the useful video resources), online learner-generated data is used to customise teaching and learning in subjects like maths, by tailoring the content to the detected needs of the learners.

Every drop-off, click or share is a learner shouting their likes and dislikes. These actions are the eye-rolls, smiles and crossed arms from the classroom, simply in digital format (Greany & Niles-Hofmann (2018), An Everyday Guide to Learning Analytics).

# **Questions and Teaching Materials**

# 1. **Please select the right answer(s). You may select more than one answer.**

According to Anthony Kim, CEO and Founder of Education Elements "We don't need a model of superhuman superhero teachers. We need to use the power of technology and educational design—combined with the high aspirations we all begin with—in order to create innovative learning environments that foster personalised learning for everyone." Why does personalised learning matter, as per Chap. 2 "Why personalized Learning" in Education Elements?

### **Personalised learning**


Correct answers: A, C, E

### 2. **Please select the right answer(s). You may select more than one answer.**

As stated by Data Quality Campaign (in the infographic "You need Data to Personalize Learning") "For all students to be college and career ready, they need a learning experience that is tailored to their unique needs, skills, and interests. Data is a critical tool that makes this personalised learning possible". You are requested to explain the reasons.

With Data:


Correct answers: A, B, D

### 3. **Please select the right answer.**

As described in the Executive Summary of the Issue Brief published by U.S. Department of Education in 2012, K–12 schools and school districts are starting to adopt applications of educational data mining and learning analytics techniques in order to proceed with institution-level analyses


Correct answer: B

### 4. **Please select the right answer**

At Arizona State University (as per the video "Using big data to customize learning"), appropriate software adapts to each individual student's needs by analysing students' every keystroke to figure out their learning styles. The software harvests information from the devices the students use and collates grades, learning skills, strong and weak points and even hesitation patterns when using the computer mouse.

This is achieved using:


Correct answer: A.

### 5. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response about the implementation of personalised learning in the following reflective task.

You may reflect on:


# *1.2.3 How Can Educational Data Help Instructional Designers and e-Tutors of Online Courses?*

**Instructional design** - also referred to as Learning Design or Educational Design - is a systematic and iterative process for addressing any educational challenge (including professional training and human performance improvement) that requires an educational intervention.

Reiser (2001) states that: "The field of instructional design and technology encompasses the analysis of learning and performance problems, and the design, development, implementation, evaluation and management of instructional and non-instructional processes and resources intended to improve learning and performance in a variety of settings, particularly educational institutions and the workplace" (p. 53).

The widely used **ADDIE model**, illustrated in the infographic below from Obsidian Learning (2018) (Fig. 1.4), is a five-phase approach to analyse, design, develop, implement and evaluate any teaching and learning product and process in an effective and efficient way.

Within the context of the ADDIE approach


The roles of instructional designers and trainers/tutors in online and blended courses require new competences compared to those in traditional face-to-face education and training programs.

Instructional Designers are mainly engaged in the analysis, the design, the development and the evaluation phases of the ADDIE process.

### **Analysis Phase**

During this phase, the instructional designer identifies an instructional (educational or learning) problem and analyses the parameters of the context in which teaching and learning will take place, as well as the learners' characteristics and their existing competences (knowledge, skills and attitudes). As a result, the key elements of this phase can be codified as follows:


**Fig. 1.4** ADDIE model representation (Krisna kristiandi hartono [CC BY-SA 4.0])

The major outcome of the Analysis phase is a granulated overview of the contextual and learner conditions that will be used to configure and formulate the upcoming Design phase.

### **Design Phase**

During this phase, the instructional designer defines the educational objectives to be achieved and selects an appropriate teaching approach for attaining these objectives, as well as appropriate assessment methods for evaluating whether and to what extent the educational objectives have been met. As a result, the key elements of this phase can be codified as follows:

DES.1. **Definition of Educational Objectives:** this includes the definition of general educational objectives, as well as the development of specific subject-matter objectives aligned to the general objectives.



The main outcome of the Design phase is a detailed blueprint of the flow and description of the learning and assessment activities, which also accommodates the contextual and learner considerations from the Analysis phase.

### **Develop Phase**

During this phase, the development or selection of appropriate educational materials and the development/arrangement of the appropriate delivery setting are performed, based on the outcome of the Design phase. Besides the instructional designer, this phase can involve other individuals, such as subject-matter experts or technical and media experts. As a result, the key elements of this phase can be codified as follows:


The main outcome of the Develop phase is the selection or production of educational materials/tools that can appropriately support the outcome of the Design Phase.

### **Evaluate Phase**

During this phase, an evaluation of both the entire teaching and learning process and each individual phase is performed, towards identifying whether the desired results have been achieved. As a result, the key elements of this phase can be codified as follows:

E.1. **Formative Evaluation:** this includes an ongoing evaluation process during the design, development and implementation phases and aims to maximise pedagogical/andragogical effectiveness (e.g. achievement of educational objectives) and/or implementation efficiency (e.g. time/cost reduction).

E.2. **Summative Evaluation:** this is performed after completion of the Implement phase and aims to measure pedagogical/andragogical effectiveness (e.g. achievement of educational objectives) and/or implementation efficiency (e.g. time/cost reduction).

The main outcome of the Evaluate phase is to identify issues or changes needed, so as to refine the design, development and implementation phases of future designs, and to assess whether the desired results have been achieved.

Trainers or tutors, as part of their role, are mainly engaged in the Implement phase of the ADDIE process, whereas they can also inform the Evaluate phase.

### **Implement Phase**

During this phase, the outcome of the previous phases is delivered to the learners. Although delivery typically addresses groups of learners, emphasis should still be given to providing individual learning experiences, including *scaffolding* and *feedback*. To this end, it is important that learners' (and teachers'/tutors') actions are tracked and meaningful educational data is collected (to be analysed and to inform reflection and decision making). As a result, the key elements of this phase can be codified as follows:


The main outcome of the Implement phase is to support learners in attaining the educational objectives by appropriately monitoring them so, if needed, changes and adaptations can be made.

As presented in the figure below (Fig. 1.5), instructional designers and trainers/tutors, as part of their role, leverage educational data at all phases of the ADDIE process in which they are engaged.

### **Questions and Teaching Materials**



**Fig. 1.5** Instructional designers and trainers/tutors leverage educational data in all phases of the ADDIE process


Correct answers: 1: B, E, I – 2: H – 3: C, G, J – 4: D - 5: A, F

### 2. **Instructional Designers are mainly engaged in the analysis, the design, the development and the evaluation phases of the ADDIE process. Please mark the correct key elements corresponding to each phase of the ADDIE process.**



Correct answers: as marked with X above

### 3. **The key elements in the Implement Phase of the ADDIE process are Delivery and Monitoring. Please match the sentences (from the right column) corresponding to the respective key element of the Implement Phase of the ADDIE process (in the left column).**


Correct answers: 1: B, C - 2: A, D.

### 4. **Please mark the correct professional role and outcome to the respective phase of the ADDIE process**

Correct answers: as marked with X above


### 5. **Please select the right answer(s). You may select more than one answer.**

As described in the video "Data Driven Learning Design", "Data-Driven Learning Design very simply is all about just looking at data at the start before you begin any design, put all this data together and build a picture for how you want to design your content to respond to what insights they're telling you."

Can you give some examples of the data that you should be looking at, in order to decode the digital body language of your learners?


Correct answers: A, C, E, G.

### 6. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response about the use of educational data, in the following reflective task.

You may reflect on:


# *1.2.4 How Can Educational Data Help School Teachers of Blended Courses?*

Schools are using self-evaluation as an instrument to engage all key stakeholders (namely, school leaders, educators, parents and students) in reflecting on and improving school activities.

**Fig. 1.6** The six essential activities of the continuous, non-linear inquiry process for self-evaluation

For example, as presented in the "School Self-evaluation Guidelines 2016-2020 Primary", the Inspectorate of the Irish Department of Education and Skills (2016) defines School Self-evaluation as:

a collaborative, inclusive, and reflective process of internal school review. An evidence-based approach, it involves gathering information from a range of sources, and then making judgements. All of this with a view to bring about improvements in students' learning.

The Annenberg Institute for School Reform (Barnes, 2004) has developed a continuous, non-linear inquiry process for self-evaluation, comprised of six essential activities, depicted in the figure below (Fig. 1.6).

In the video "Data: It's Just Part of Good Teaching" (in the useful video resources) from the Data Quality Campaign, Sherman Elementary in Rhode Island demonstrates how the effective use of data by a school community can improve students' performance. Moreover, the video "How Data Help Teachers" (in the useful video resources), from the Data Quality Campaign, demonstrates how data helps school teachers and their students succeed. For more details, you may also review the corresponding infographic, "Ms. Bullen's Data-Rich Year" by DQC.

This video from the Data Quality Campaign, "Data Can Help Every Student Excel" (in the useful video resources), also discusses what it means to use data in service of student learning, taking the stand that data is one of the most powerful tools to inform, engage, and create opportunities for students along their education journey.

Before proceeding further, let's now discuss what the flipped classroom is all about. As per Panopto (2015) "*The flipped classroom is a teaching strategy in which the traditional class format is turned on its head. This inverted model "flips" the traditional order of class activities so that school work is done at home, and "homework" is done at school. In flipped classes, students review lecture materials prior to class, reserving in-class time for teacher-guided activities that allow students to put the lecture materials into practice. Activities can include in-depth discussion, labs, debates, problem-solving, or just open time for individual assignments — all with the added benefit of having the teacher nearby to help when questions arise.*"

This new approach enables teachers to make the shift from teacher-driven instruction to student-centred learning and thus to reinforce deeper learning. You may also review the video "The Flipped Classroom Model" (in the useful video resources).

As Brame (2013) from the Vanderbilt University Center for Teaching suggests, the flipped classroom approach yields statistically significant improvements in engagement, test scores and overall long-term learning.

The infographic "How Educational Data can help School Teachers of Blended (Flipped Classroom) Courses?" presents a use case with an example for the school teacher of blended learning courses in the K-12 education context (Fig. 1.7).

The video "Why Personalized Learning: 4 Stories from 4 School Districts" (in the useful video resources) shows 4 School Districts sharing their fndings on the

**Fig. 1.7** Infographic for school teacher in K12 blended courses

implementation of blended courses aiming to provide a unique personalised learning experience to their students.

### **Questions and Teaching Materials**

1. **Please select the right answer(s). You may select more than one answer.**

As described in Chap. 2 of the "School Self-evaluation Guidelines 2016-2020 Primary" (Inspectorate, Department of Education and Skills, 2016), self-evaluation requires a school to address the following key questions with regard to an aspect or aspects of its work.


Correct answers: B, D, E, G, H

### 2. **Please select the correct answer.**

According to Sherman Elementary in Rhode Island (please refer to the video "Data: It's Just Part of Good Teaching"), the use of data may be really beneficial for the school community and can lead to improved academic performance. Nevertheless, it creates an extra add-on and an overwhelming burden for both teachers and students.

Is this statement valid?


Correct answer: No

3. **Let's meet Alice! Alice is an enthusiastic English Language teacher who has just been appointed to an Experimental High School in Athens, Greece. She will be responsible for the English Language Course of class1 and class2 of the ninth grade (14- to 15-year-old students). Alice is very excited about her new role. Nevertheless, the school's principal, Alex, is concerned about the relatively low performance of last year's eighth graders compared to other experimental schools in the region. Alex encourages Alice to use student data to gain insights and plan her teaching activities accordingly, so as to improve this year's Grade 9 students' academic performance.**

Alice decides to watch the video by Data Quality Campaign "How Data Help Teachers", so as to find out how she can leverage data for her students to succeed. Alice is really inspired by Ms. Bullen, who empowers her student Joey to get on track to meet his educational goals.

Can you help Alice to arrange the instances of Ms. Bullen's story in the right order?

**A.** Joey's data shows that he's on track for success.

**B.** Ms. Bullen reviews her performance with the principal to note strengths and opportunities.

**C.** She uses data and her experience in the classroom to see where Joey and his classmates excel or struggle.

**D.** Ms. Bullen gets access to data on her students' past performance, behaviour and attendance.

**E.** When Ms. Bullen sees that Joey is at risk of failing, she works with Joey, his parents and his other teachers to get him on track by the end of the year.

**F.** Ms. Bullen talks with Joey about his own data and they work together to set goals throughout the year.

Correct answers: 1-D, 2-C, 3-F, 4-B, 5-E, 6-A

### 4. **Please select the appropriate answer(s). You may select more than one answer.**

Alice is really excited about the power of using data in service of learning and for personalizing her instruction to keep every student on track to excel. Thus, she searches for further information. Alice now watches the video "Data Can Help Every Student Excel". She then reviews the below statements. Something seems wrong...

### **Which of these statements are not valid?**


Correct answers: C, E, F

5. **Please select the right answer(s). You may select more than one answer.**

**Let's go back to Alice. The principal informs Alice about the Learning Management System used by the school to facilitate teaching and learning, pointing out that the previous teacher has already created some online activities there.**

Alice decides to apply the flipped classroom strategy to her new students using the school's LMS. For this purpose, she designs and develops online teaching resources for Class1 and Class2. Students of these classes enrol in the respective group and study the lecture material at home (prior to the classroom meeting). The material is in the form of video, text, small activities with automatic feedback (such as online quizzes), and forum discussions. During the classroom sessions, students perform more complex activities, typically in small groups, with the benefit of Alice's scaffolding, guidance and feedback. Then, they can undertake some additional homework online to further check their understanding and extend their learning through appropriately designed individual and group assignments.

Alice reads the article "Flipping the classroom" by Cynthia J. Brame. She is looking at the key elements of the flipped classroom.


Correct answers: A, B, D


6. **As referred to in the infographic "**How Educational Data can help School Teachers of Blended (Flipped Classroom) Courses?**", to improve this year's Grade 9 students' academic performance, Alice decides to apply the flipped classroom strategy to her new students using the school's Learning Management System. Alice starts creating a detailed plan about the needed steps to go through in order to make her flipped classroom strategy a success story for herself, her students, her principal and the parents. Can you help Alice to arrange the steps of her plan in the right order?**

**Please arrange the instances in the right order.**

A. **Following data analysis, Alice plans to use Learning Analytics to monitor students' learning process, to discover patterns, to identify problems early, to find indicators for success and for poor marks or drop-out, and to self-reflect to improve the design and delivery of her course by comprehending the story that the collected data reveals.**


Correct answers: 1E-2C-3F-4D-5A-6B

### 7. **Please select the right answer(s). You may select more than one answer.**

Alice is unstoppable. She meets regularly with other teachers for training and for identifying promising data use practices for her flipped classroom model. She now wonders: what are the findings of other schools that have successfully implemented their Blended Courses?

She retrieves the video describing the 4 stories from 4 School Districts on personalised learning, "Why Personalized Learning: 4 Stories from 4 School Districts, https://www.youtube.com/watch?v=ur2E_S1IBP0". She notes down some initial findings. She reads them again and notices that she probably mixed them up a bit. **Can you help her identify the right findings?**


Correct answers: B, C, E, H.

### 8. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response about using educational data in a blended learning environment, in the following reflective task. You may reflect on your experience from implementing the flipped classroom and/or express your opinion on why to use blended learning (or not!):


# *1.2.5 The Learn2Analyze Educational Data Literacy Competence Framework*

### **Educational data literacy is defined as:**

the ability to collect, manage, evaluate, and apply data, in a critical manner (Ridsdale et al., 2015).

the ability to accurately observe, analyse and respond to a variety of different kinds of data for the purpose of continuously improving teaching and learning in the classroom and school (Love, 2012).

the ability to understand and use data effectively to inform decisions … composed of a specific skill set and knowledge base that enables educators to transform data into information and ultimately into actionable knowledge (Mandinach & Gummer, 2013).

[the capacity to] continuously, effectively, and ethically access, interpret, act on, and communicate multiple types of data from state, local, classroom, and other sources in order to improve outcomes for students in a manner appropriate to their professional roles and responsibilities (Data Quality Campaign, 2014a).

Thus, educational data literacy refers to the competence set which is required to identify, collect, combine, analyse, interpret and act upon educational data from


**Fig. 1.8** Educational data literacy roadmap

different sources, with the aim of continuously improving the teaching, learning and assessment process.

In the "Roadmap for Educator Licensure Policy Addressing Data Literacy" report, the Data Quality Campaign recommends the following set of Data Literacy Competences for teachers, (Fig. 1.8):


As already discussed, Educational Data Analytics offer significant benefits for enhancing **personalised educational support** of the learners as well as **reflective course (re)design** for achieving improved **teaching, learning and assessment**.

However, emerging advancements related to the data-driven design and delivery of online and blended learning courses that exploit Educational Data Analytics are not yet thoroughly addressed by existing competence frameworks for education professionals (instructional designers, trainers, educators, teachers). Existing professional competence frameworks for instructional designers and trainers almost entirely ignore the dimension of Educational Data Literacy.

**Fig. 1.9** Learn2Analyze educational data literacy dimensions

To this end, the **Learn2Analyze project** has developed a comprehensive proposal for an **Educational Data Literacy Competence Framework** to enhance existing competence frameworks for instructional designers and e-trainers of online courses with new Educational Data Literacy competences.

The Learn2Analyze Educational Data Literacy Competence Framework comprises 6 competence dimensions and 17 competence statements, as captured in Fig. 1.9.

In addition, the following table (Table 1.1) provides a brief overview of the Learn2Analyze Educational Data Literacy Competence Framework.

### **Questions and Teaching Materials**

### 1. **Please select the right answer.**

Alice is back! She has realized that, for effective data use, Educational Data Literacy is a prerequisite. So, she is trying to understand the meaning of this much-discussed key component.

Alice believes that educational data literacy refers to the specific skill of using assessment data effectively in order to improve outcomes for students.

### **Is this definition by Alice valid?**


Correct answer: No


**Table 1.1** Learn2Analyze educational data literacy competences

### 2. **Please select the right answer.**

Alice decides to study further on Educational Data Literacy. She reads the brief intended for State Policy Makers by Data Quality Campaign "Teacher Data Literacy: It's About Time".

According to this brief, "One element of quality teaching for improving student outcomes is effective data use. Teacher data use is also the best way to maximize state investment in data systems."

Consequently, to date, policies have heavily promoted the skills teachers need to be data literate. Thus, many teachers regard data as a powerful tool for improving instruction and ultimately outcomes for students.

Is this interpretation by Alice true or false?


Correct answer: False.

3. **Alice is now interested in finding out the skills that teachers need to be qualified, so as to integrate the use of data into their everyday practice as one tool for improving student achievement. Are you ready to support Alice in this task?**

According to the "Roadmap for Educator Licensure Policy Addressing Data Literacy" report, the ability to effectively use data includes a set of skills that teachers (and administrators) need to use data both collaboratively and individually to inform instruction.

**Please match the appropriate definition (from the right column) to the respective data use skill in the left column.**


Correct answers: 1-D, 2-E, 3-B, 4-A, 5-C.

# 4. **Alice is confident with the flipped classroom approach, as she has used it before with great results. However, she realises that she is lacking data literacy competences**

The principal encourages her to enrol in the Learn2Analyze MOOC before the school year starts – it is only an 8-week course and it is free. She is really excited and she immediately reviews the Educational Data Literacy Competence Profile Framework that comprises 6 competence dimensions.

Can you assist Alice to arrange these 6 competence dimensions in the right order?

### **Please arrange the 6 competence dimensions in the right order.**


Correct answers: 1-E, 2-C, 3-A, 4-D, 5-F, 6-B

5. **Alice studies thoroughly the competence statements of the Learn2Analyze Educational Data Literacy Competence Profile Framework (L2A EDL-CP Framework). She is really interested in investigating the exact competences she needs to develop in order to be Educational Data Literate and use data both effectively and ethically to improve her students' achievements.**

But she needs your assistance, again!

**Please help Alice to mark the correct statements corresponding to each of the 6 competence dimensions of the EDL-CP Framework.**



Correct answers: as marked with X above


6. Enabling analytics to have maximum impact on her school involves addressing key concepts like diversity in the school's data and systems and, most importantly, diversity of people, to make sure that the school can draw on diverse ideas and diverse skills to really innovate. Do you agree with Alice's assumption?

### **Please select the correct answer:**


Correct answer: Yes.

### 7. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response to educational data literacy training in the following reflective task. You may reflect on:


# **1.3 Data Is Everywhere (Educational Data Collection)**

# *1.3.1 Posing Questions and Identifying Appropriate Educational Data*

In the previous sections we reviewed the key role of educational data. In this section we will have a closer look at what educational data is.

In the school context, educational data can be broadly defned as:

information that is collected and organised to represent some aspect of schools. This can include any relevant information about students, parents, schools, and teachers derived from qualitative and quantitative methods of analysis (Lai & Schildkamp, 2013, p. 10).

As this defnition suggests, educational data is not restricted to students' grades in national exams and standardised tests (although that is a common misconception). Instead, educational data comprises a wide range of data from various sources, both internal (school-wide and classroom-specifc data) and external (state and/or district data) to the school.

This defnition can be extended to higher education and professional training institutions, as represented in Fig. 1.10 (Long & Siemens, 2011).

We can distinguish two major categories of data: qualitative and quantitative. A combination of different types of data is the most effective in generating powerful evidence to assess learning performance and improve teaching practice; both quantitative and qualitative data are equally important in these processes (Fig. 1.11).

**Fig. 1.10** Different levels of educational data

**Fig. 1.11** Quantitative and qualitative educational data

**Fig. 1.12** The four Ws and one how of educational data

As discussed, for effective Data-Driven Decision Making, we need to be data literate, to **be able to understand basic data science processes (speak data)**. According to Gartner analysts Idoine, Schlegel, and Sallam (Pettey, 2018), "*learning to "speak data" is like learning any language. It starts with understanding the basic terms and describing key concepts*." In our case, the first key area of data literacy vocabulary is **Educational Data Collection**.

Educational Data is everywhere. To inform our decisions and benefit from them, we need to **collect the necessary data.** To do this, we need to answer the "*Four Ws and One How*" questions, presented in Fig. 1.12. We should know why we collect this data, what types of data we need to collect, when and how to get it and where to find it. "*Who will collect or grant access to the needed data*" is also a question that should be answered, since, obviously, we can only collect data to which we have access and which we have been granted permission to use.

You will probably need to utilize a variety of data types from different sources and use various methods to process and analyse them according to your goals.

Bear this strategy in mind and start posing questions that will help you identify and collect the appropriate educational data. Your ultimate goal is to improve your instructional e-learning strategy and make your online and blended course a success story for your target learners.

We'll now guide you step by step through the effective process for collecting educational data by answering each one of these key questions.

Most things start with a question. The first question to ask ourselves is: "Why is the data needed? Why do we need to collect the data, in the first place?"


**Fig. 1.13** Key questions to help us identify the needed data

**When you analyse and design any course** you need to gather the questions that are related to your instructional design, your teaching and tutoring strategy and your learners' support:


### **When the course is up and running:**


Figure 1.13 summarizes some key questions to help us identify the needed data.

Now that we have the right questions in place, we can identify the type of data that may help us find the answers we are looking for. As suggested by Fig. 1.14 and this infographic by the Data Quality Campaign project, the types of educational data commonly used can be classified into two types: static and dynamic data.

**Fig. 1.14** Static and dynamic educational data

**Static data** refers to **data which can remain unchanged for large periods of time**. According to Shacklock (2016), it is the data *"which is collected, recorded and stored by institutions and traditionally includes student records, staff data, financial data and estates data".*

As Shacklock (2016) points out, *"Static data has always been a strategic asset for both institutions and government. It informs all operational and business decision-making and planning in an institution, and indicates to government and the public how the sector is performing as a whole."*

**Dynamic data** refers to data generated at a more frequent rate, mainly related to **learners' activities during the learning process**. Such data is usually collected by e-tutors and classroom teachers, typically through Learning Management Systems.

If we manage to collect, link and analyse dynamic data, then we can probably get an instant, accurate view of how an individual learner or a group of learners is performing.
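As a minimal sketch of this idea, the snippet below links hypothetical dynamic-data records (the field names and values are illustrative assumptions, not taken from any specific LMS) into a per-learner engagement snapshot:

```python
from collections import defaultdict

# Hypothetical dynamic-data records, e.g. exported from an LMS activity log.
# The field names ("learner", "action", "minutes") are assumptions for this sketch.
activity_log = [
    {"learner": "s01", "action": "view_video", "minutes": 12},
    {"learner": "s01", "action": "quiz_attempt", "minutes": 8},
    {"learner": "s02", "action": "view_video", "minutes": 3},
    {"learner": "s01", "action": "forum_post", "minutes": 5},
]

def summarise(log):
    """Link dynamic events per learner into a simple engagement snapshot."""
    summary = defaultdict(lambda: {"events": 0, "minutes": 0})
    for record in log:
        entry = summary[record["learner"]]
        entry["events"] += 1
        entry["minutes"] += record["minutes"]
    return dict(summary)

print(summarise(activity_log))
# s01: 3 events totalling 25 minutes; s02: 1 event totalling 3 minutes
```

In practice the same grouping would be applied to real clickstream exports, giving the teacher an up-to-date view of each learner's activity.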

Lai and Schildkamp (2013, pp. 11–12) have extended Ikemoto and Marsh's (2007) categories of educational data, to input data, context data, process data and outcome data. Each category indicates when data will be collected. Figure 1.15 presents examples of educational data for each category.

To get a better understanding of the use of data to strengthen lifelong learning, you may watch the video (in the useful video resources) from UNESCO "Data for Lifelong Learning", presenting an overview of the tools developed by the UNESCO Institute for Statistics (UIS) to measure learning and improve learning outcomes.

**Fig. 1.15** Examples of educational data for each category

### **Questions and Teaching Materials**

### 1. **Please select the correct answer.**

Do you remember Alice? Alice is an English teacher who has just been appointed to an Experimental High School in Athens, Greece. She is responsible for the English Course of Class1 and Class2 of the ninth grade. Her principal has encouraged her to use student data to gain insights and prepare her instruction accordingly, so as to improve this year's Grade 9 students' achievement.

Alice is studying the categories of data and realizes that we can distinguish two major categories of data, qualitative and quantitative data.

In a school setting, quantitative data may include notes from classroom observations.

### **Do you agree with the assumption of Alice?**


Correct answers: No

### 2. **Please select the correct answer.**

Alice realizes that the first key to being data literate is Educational Data Collection. To achieve her goals, she needs to utilize a variety of data types from different sources and use various methods to process and analyse them.

### **Do you agree with the assumption of Alice?**


Correct answers: Yes

3. **Alice is a bit confused about navigation and clickstream data. What information could this data reveal?**

**Help Alice select the correct answer(s):**


Correct answers: A, B, D



Correct answers: 1-A, 2-B, 3-B, 4-B, 5-A, 6-A, 7-B, 8-A

	- A. **input data**
	- B. **context data**
	- C. **process data and**
	- D. **outcome data.**

**Alice wants to make effective instructional changes to her reading program to better cater for the boys in her class. She is considering using the following data:**


**Help Alice to match the data [1 to 4] with the categories of the data [a to d] mentioned above.**

Correct answer: 1a – 2d – 3c – 4b

### 6. **Please select the correct answer.**

After watching the video "Data for Lifelong Learning", from the UNESCO Institute for Statistics, Alice realizes that "robust monitoring is needed to track whether children and adults are gaining the skills they need to thrive in today's world."

### **Do you agree with the assumption of Alice?**


Correct answer: Yes.

### 7. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response about the use of educational data in the following reflective task. You may reflect on:

1. **As an instructional designer or a school teacher, you want to collect data to redesign your course. Describe your evaluation plan. Define the questions you need to answer and the data you will need to collect. Please share either your past experience or your thoughts for future actions.**

**Fig. 1.16** Difference between reliability and validity

2. **As a tutor of an online course, you want to collect data to enhance your learners' participation in the course. Define the questions you need to answer and the data you will need to collect. Please focus on either your past experience or your thoughts for future actions.**

# *1.3.2 Matching Appropriate Educational Data with Data Sources*

In this section we will discuss **where** to find the educational data you need and **how**.

**WHERE** applies to the location where you might have to go for the data collection, according to the data you need.

There are **numerous data sources of learners' information available**:


**Fig. 1.17** Different sampling methods

By EGalvez (WMF) – Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=31697223

Before proceeding further with data collection, we need to agree on a few basic concepts related to the nature of data itself. As Guerra-López (2008) points out, data must meet three basic characteristics:


Another important question is: which methods will be used to select a representative group of people from the learners' target audience? How can we avoid **biases** in sampling?

Rothwell et al. (2016) argue that the four types of sampling procedures commonly used are: (1) convenience or judgmental sampling, (2) simple random sampling, (3) stratified sampling, and (4) systematic sampling. To determine which one to select, we need to consider our goals and objectives, the certainty needed in the conclusions, the willingness of decision makers in the organisation to allow information to be collected for our study, and the resources (time, money, and staff) available (Rothwell et al., 2016) (Fig. 1.17)*.*
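To make two of these procedures concrete, here is a minimal sketch contrasting simple random sampling with stratified sampling on a hypothetical class roster (the roster, class names and sample sizes are illustrative assumptions):

```python
import random

# Hypothetical roster: each student is tagged with a stratum (here, class group).
roster = [("s%02d" % i, "class1" if i <= 12 else "class2") for i in range(1, 21)]

random.seed(42)  # fixed seed so the illustration is reproducible

# Simple random sampling: every student has an equal chance of selection,
# so one class may end up over- or under-represented by chance.
simple_sample = random.sample(roster, k=6)

# Stratified sampling: sample separately within each stratum so every
# class is guaranteed to be represented in the final sample.
def stratified_sample(population, key, k_per_stratum):
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, k_per_stratum))
    return sample

strat_sample = stratified_sample(roster, key=lambda s: s[1], k_per_stratum=3)
print(len(simple_sample), len(strat_sample))  # 6 6
```

The stratified variant reduces the sampling bias discussed next, because no stratum can be missed entirely.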

Let's have a closer look at this important aspect affecting our data: biases. **Biases** are systematic errors induced by the data collection process employed; reducing potential biases allows us to have data that represent the population. The "Bias when collecting data" video (in the useful video resources) explains different kinds of bias that occur when collecting data.

**Fig. 1.18** Barriers to educational data


As seen in Fig. 1.18, access to educational data may be a really serious barrier to overcome when gathering appropriate educational data. Here are some additional questions that we need to answer. What data do we need versus what data can we access? With whom in the organisation should we interact during our data collection process? How many people? For what issues? Whose approval is necessary to collect information?

Perhaps the most common failure during the collection process is failing to receive enough—or the right—permissions to collect data. To overcome this problem, we should make sure we have secured all necessary approvals before collecting data.

Failure to complete this step successfully can create significant, and often unfortunate, barriers to cooperation within the organisation.

In their 2006 report, *Making Sense of Data-Driven Decision Making in Education*, Julie Marsh and her colleagues (Marsh et al., 2006, p. 9) identified a number of barriers to the effective and efficient take-up of educational data use.

# **Questions and Teaching Materials**

1. **Alice wants to collect the following data, in order to study how her students' academic achievement is related to their learning behaviour.**

Help Alice decide which of this data is stored in the institutional Student Information System (select all that apply):


Correct answers: A, B, E


**Help Alice find the right answer (you may also refer to** https://www.thegraidenetwork.com/blog-all/2018/8/1/the-two-keys-to-quality-testing-reliability-and-validity**). The data collected is:**


Correct answer: D

3. **Alice has decided to apply the flipped classroom strategy with her students using the school's Learning Management System. This inverted model "flips" the traditional order of class activities so that school work is done at home, and "homework" is done at school.**

Before using the flipped classroom initiative, Alice wants to study students' perceptions of technology. She prepares a questionnaire and uses students from her class as a sample for her study. She wants to generalize her findings to all High-School students.

**The above procedure is an example of:**


Correct answer: A.

4. **Alice is excited. There are so many different kinds of bias that occur when collecting data.**

Alice wonders if she understood the different categories of bias. Can you help her?

The data collection process described in the above comic strip with Hagar the Horrible is an example of:


Correct answer: C

5. **Alice contacts Mr. Adams, who is appointed as the school's Data Protection Officer (DPO), to secure all necessary approvals for the sources handled by her school or by the corresponding district. As soon as Alice signs the required data protection consent form, she gets permission and downloads the datasets from the various sources. Alice also requests that she be granted access to the LMS used by the school (a new teacher account is created by the LMS administrator).**

Without the availability of high-quality data and perhaps technical assistance, data may become misinformation or lead to invalid inferences. Delayed or late access to data and/or its analysis might affect the efficiency of the planned activities in response to the data intervention.

### **This kind of data barrier is referred to as:**


Correct answer: B.

### 6. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response about data collection in the following reflective task. You may reflect on:


# *1.3.3 Combining Data from Different Educational Data Sources*

As we have already seen, there are many different data sources that contain useful educational data. Shacklock (2016) reports that *"some institutions are beginning to explore the possibility of incorporating more types of data into their analytics systems. The University of Lancaster is considering capturing and using data on which students are accessing library PCs and for how long, NTU are also looking at capturing data on e-book usage*".

Figure 1.19 summarizes indicative educational data sources:


Once we decide upon the "Ws" of the data we need, we have to define HOW we will collect the data. School of Data (2013) distinguishes three basic ways of getting hold of data:

**Fig. 1.19** Indicative educational data sources

	- surveys and polls
	- internal data sources, like Institutions' Management Information Systems and/or Student Information Systems.
	- online educational environments, such as LMSs, MOOCs and ITSs, which record learners' activities, such as reading, writing, taking tests, performing various tasks and commenting with peers (Fig. 1.20).

There are various sources of educational data where data is stored in different formats. Romero et al. (2014) state that "*the goal of data aggregation/ integration is to group together data from multiple sources into a coherent recompilation, normally into a database*".

**Aggregation** is the process of grouping together same type of data from different organisations/institutions and **integration** is the process that groups different types of data from the same organisation/institution.


**Fig. 1.20** Searching for answers

Using aggregation and integration we can combine data from different sources and in different formats, for example performance data, attendance records, past academic data and forum participation data into a single database.
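Integrating different types of data about the same students can be sketched in Python; the student IDs and field names below are hypothetical:

```python
# Hypothetical per-student data of different types, from different systems.
performance = {"s01": {"quiz_score": 85}, "s02": {"quiz_score": 72}}
attendance = {"s01": {"days_absent": 2}, "s02": {"days_absent": 5}}
forum = {"s01": {"forum_posts": 14}, "s02": {"forum_posts": 3}}

def integrate(*sources):
    """Integration: merge different types of data per student into one record."""
    combined = {}
    for source in sources:
        for student_id, fields in source.items():
            combined.setdefault(student_id, {}).update(fields)
    return combined

records = integrate(performance, attendance, forum)
print(records["s01"])  # -> {'quiz_score': 85, 'days_absent': 2, 'forum_posts': 14}
```

Aggregation, by contrast, would append records of the same type (e.g. quiz scores) coming from several institutions into a single collection.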

### **Questions and Teaching Materials**




Correct answer: 1A – 2D – 3A – 4E – 5A – 6E – 7B – 8C

	- A. **Aggregation**
	- B. **Integration**

Correct answer: B.

### 3. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response in the following reflective task. You may reflect on:


# **1.4 Concluding Self-Assessed Assignment**

# *1.4.1 Introduction*

Now that you have a better understanding of the power of educational data as a key success factor for online and blended teaching as well as of the fundamentals of Educational Data Collection, you are ready to link theory with practice and apply your Educational Data Literacy Competences focusing on the creation of a Data Collection plan.

In order to proceed, you are requested to complete a concluding self-assessed assignment. This self-assessed assignment is a real-life scenario activity (based on the use case of our teacher Alice), rated using a rubric across three proficiency levels and an exemplary solution. When you have completed this assignment, you will assess it yourself, following the rubric, which lists the required criteria and gives guidelines for the assessment.

This self-assessed assignment procedure consists of 5 steps:


# *1.4.2 Step 1. Real Life Scenario*

Let's go back to Alice. As introduced, Alice has decided to apply the flipped classroom strategy with her students, using the school's Learning Management System. She wants to use educational data to reveal insights about her course design and students' activities, to reflect on her teaching practices and to target her teaching and learning interventions accordingly, so as to help every student excel and meet their educational goals.

To inform her decisions and benefit from them, Alice needs to **access and gather the appropriate data.** She should know why she collects this data, what types of data she needs to collect, when and how to get it, and where to find it.

**You need to help Alice to design her Educational Data Collection plan, in order to monitor students' performance in the online course, for the flipped classroom initiative.**

# *1.4.3 Step 2. Getting Familiar with the Assessment Rubric*

Alice has already prepared an Initial Educational Data Collection Plan and asks you to evaluate it, using the Rubric for assessing the Educational Data Collection Plan, to identify potential issues.

**ACTIVITY/PRACTICE QUESTION (Reflect on)** We encourage you to elaborate on your response about the evaluation of the Initial Educational Data Collection Plan created by Alice, in the following reflective task. You may reflect on:

1. *Does this Educational Data Collection Plan address students' performance aspects?*

2. *If not, what would you advise Alice to add/modify, so that the data collection plan addresses all aspects and monitors students' performance during the flipped classroom initiative?*


# **1.4.3.1 Initial Educational Data Collection Plan**

### **1.4.3.2 Rubric for Assessing the Educational Data Collection Plan**


# *1.4.4 Step 3. Prepare Your Answer*


Please assist Alice in preparing an Educational Data Collection plan to monitor her students' performance in the online course for the flipped classroom initiative, by completing the following matrix.


**ACTIVITY/PRACTICE QUESTION (Reflect on)** We encourage you to elaborate on your response about the preparation of the Educational Data Collection plan to monitor students' performance in the online course, in the following reflective task. You may reflect on:


# *1.4.5 Step 4. Review a Sample Solution*

Please review a sample of an **Exemplary solution** that follows the criteria specified in the **Rubric for assessing the Educational Data Collection Plan.**

**ACTIVITY/PRACTICE QUESTION (Reflect on)** We encourage you to elaborate on your response about the Exemplary solution that follows the criteria specified in the Rubric for assessing the Educational Data Collection Plan, in the following reflective task. You may reflect on:

1. *Do you identify any requirements that you did not take into consideration when creating your Educational Data Collection Plan?*

# **1.4.5.1 Exemplary Sample Solution**

**Educational Data Collection plan to monitor students' performance in the English Language online course, for the flipped classroom initiative of the ninth Grade of Athens Experimental High School.**


Source: Bovo et al. (2013), Clustering Moodle Data as a Tool for Profiling Students.

# *1.4.6 Step 5. Self-Evaluate Your Answer*

Now that you have seen the Exemplary Sample Solution, please rate your initial answer (evaluate the *Educational Data Collection Plan* you created), using the Rubric table below.



# **References**


# *Useful Video Resources*

- What is Big Data and how does it work? [1:33]
- Big Data's Making Education Smarter [2:16]
- Data is Power [2:32]
- Engaging with students to build a better digital environment [2:56]
- Educause: What Is Personalized Learning? [2:27]
- Using big data to customize learning [3:08]
- Data Driven Learning Design [5:35]
- Data: It's Just Part of Good Teaching [3:43]
- How Data Help Teachers [1:51]
- Data Can Help Every Student Excel [2:00]
- The Flipped Classroom Model [3:00]
- Why Personalized Learning: 4 Stories from 4 School Districts [3:41]
- Develop a culture of data and analytics enablement at the summit [1:30]
- Data for Lifelong Learning [2:49]
- Bias when collecting data [5:59]

# *Further Reading*


making (WCER working paper no. 2014-3). Retrieved from https://files.eric.ed.gov/fulltext/ED556492.pdf


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 2 Adding Value and Ethical Principles to Educational Data**

# **2.1 Introduction and Scope**

# *2.1.1 Scope*

The goals of this chapter are to:


# *2.1.2 Chapter Learning Objectives*


(continued)


# *2.1.3 Introduction*

This chapter will introduce the second key competence of educational data literacy, namely, **Educational Data Management**.

The first step in this imperative process is **Data Cleaning**. Since educational data comes from various sources, it can be really messy: it may come in diverse formats and contain various types of inaccuracies. Thus, it is essential to know the most common quality issues of raw educational data and understand the data cleaning methods for educational datasets.

In order to add value to the datasets, educators need to understand the advantages of **enhancing educational data through data description by using Metadata**, usually defined as "data about data".

**Data Curation** is **of great importance in educational data management**, as it transforms raw data into consistent data that can then be analysed.

Moreover, to ensure continued and reliable long-term access, there are many important aspects we need to consider and manage when it comes to an **effective digital preservation process for the educational data**.

Special focus should be given to key technical elements of digital preservation. The selected **storage solution is of prime importance for digital preservation**, since security and privacy issues are significant concerns.

Along with the emerging opportunities offered, education data-driven practice and assessment **raise challenges such as ethical issues and implications**, especially in terms of **privacy, security of data and informed consent**, that **should be addressed via transparent and well-defined ethical policies and codes of practice**.

Several frameworks, policies and guidelines have been developed to help institutions and educators to identify potential ethical issues and to apply clear ethical policies that govern the use of educational data.

New regulations, like the GDPR (General Data Protection Regulation) have raised awareness of data ethics issues that can arise from data misuse.

**Informed consent** is declared by most international guidelines as one of the pivotal principles in Data Ethics. The way individuals are informed is crucial for the informed consent process. Educators should ensure that individuals fully realize the expected consequences of granting or withholding consent.

With regards to the collection of personal data about children, additional protection should be granted since children are less aware of the risks and consequences of sharing data and of their rights.

As mentioned, in the light of rapid development of Educational Data Analytics on a global basis, **new challenges to privacy and data protection have also emerged**.

Do educational data analytics challenge the principles of data protection? Is privacy a show-stopper? How is privacy guaranteed/secured, especially if minors and/or sensitive data are involved?

Education professionals need to pay extra attention to **sensitive data** (a special category of personal data), since an organisation can only process this data under specific conditions (explicit consent may be needed).

Moreover, the protection of the rights and freedoms of natural persons with regard to the processing of personal data requires that appropriate technical and organisational measures are taken. In order to identify sensitive data, assess and respond to data risks and monitor implemented security processes, a Data Protection Impact Assessment (DPIA) may be required whenever processing is likely to result in a high risk to the rights and freedoms of individuals (IT Governance UK, 2016).

# **2.2 Adding Value to Educational Datasets (Educational Data Management)**

# *2.2.1 Making Data Tidy (Data Cleaning)*

We are surrounded by a sea of data. As per BrightBytes (2017), "*The widespread availability of accurate and usable data has the potential to unlock a universe of information for educators.*" We could add that, without the appropriate process of getting data ready to use (whether you call it wrangling, cleansing or simply cleaning), data is simply a scatter of numbers. You may also review the video "Data Wrangling for Faster, More Accurate Analysis" (in the useful video resources), showing that "data discovery is a critical step when working with complicated data".

In this topic, we will continue studying the language of data. It is time for the second key area of data literacy vocabulary, **Educational Data Management**. The first step in this imperative process is **Data Cleaning**. Figure 2.1 depicts the framework of data cleaning as defined by Maletic and Marcus (2000) in Data Cleansing: Beyond Integrity Analysis.

As mentioned, educational data comes from various sources. There is data from online learning environments, data from state tests, demographic data, data from management information systems, from open educational resources and much more. It would be really useful if we could unify all these little pieces to reveal the big picture and realize the untapped potential.

**Fig. 2.1** Data cleaning framework. (based on Maletic & Marcus, 2000)

All this data can be really messy. It may come in diverse formats and contain various types of inaccuracies, like missing values, outliers and duplicate instances. To obtain an integrated and consistent database that is free from any sort of discrepancies, data clean-up is required.

As Romero et al. (2014) describe in A Survey on Pre-Processing Educational Data, the **data cleaning task** concerns the detection of erroneous or irrelevant data and how to discard it.

Let's move on and find out the most common discrepancies in data, like:


and how to handle them (Fig. 2.2).

Missing values occur when no value is stored for the variable in the current observation (Little & Rubin, 2002).

When using an e-learning environment, it is very common for learners to study at their own pace, to follow their own learning path. They usually skip some activities and complete only a part of the tasks in the course. Sometimes they even drop out and never come back. Thus, missing data is very common when collecting educational data.
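Two of the simplest strategies for handling such gaps, discarding incomplete records (listwise deletion) and filling them with a typical value (mean imputation), can be sketched in Python (the quiz scores below are made up):

```python
# Hypothetical quiz scores; None marks a missing value.
scores = [14, None, 17, 12, None, 15]

# Listwise deletion: simply drop the observations with missing values.
complete = [s for s in scores if s is not None]

# Mean imputation: replace each missing value with the mean of the rest.
mean = sum(complete) / len(complete)  # (14 + 17 + 12 + 15) / 4 = 14.5
imputed = [s if s is not None else mean for s in scores]
print(imputed)  # -> [14, 14.5, 17, 12, 14.5, 15]
```

Which strategy is appropriate depends on how much data is missing and why, so the choice should be logged along with the rest of the cleaning procedure.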

Romero et al. (2014) suggest several ways to handle missing data:


An **outlier** is an observation whose value deviates from the expected, being either too large or too small compared with most other observations (Fig. 2.3). Outliers may be caused by typographical errors or errors in measurement. Remember when NASA lost a spacecraft due to a metric math mistake (Harish, 2019)?

In datasets, different scales of numerical values are often used to make them easier for humans to read. For example, in budget datasets the units are often in the millions: 1,500,000 often becomes 1.5 m, while smaller amounts like 400,000 are still written in full. As a result, 1.5 m looks like an outlier, whereas it is actually an inconsistency in data types and formats.

However, Romero et al. (2010) indicate that "*outliers may be phenomena of interest in a dataset, it could be correct and represent real variability for the given attribute*."

In the context of educational data, outliers can often be true observations (Romero et al., 2014). For example, there are always exceptions among learners, who succeed with little effort or fail against all expectations. As another example, very high values are often recorded for *time-spent* because the learner had not signed out before leaving the digital learning environment.

It is clear that not all outliers are errors. It depends on the aims of the analysis, whether these outliers should be eliminated or not, and requires knowledge of the context in which the data was produced and collected.
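One widely used way to flag candidate outliers for such a context-aware review is Tukey's interquartile-range rule. A minimal Python sketch, applied to made-up file-download counts:

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical download counts: one student downloaded far more files
# than the rest, and one downloaded nothing at all.
downloads = [12, 15, 14, 180, 13, 16, 11, 14, 0, 15]
print(iqr_outliers(downloads))  # -> [180, 0]
```

The rule only *flags* candidates; whether 180 is a data-entry error or a genuinely hard-working student still requires knowledge of the context.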

**Inconsistent data** (Fig. 2.4) appears when a data set or group of data is dramatically different from a similar data set (a conflicting data set) for no apparent reason (Romero et al., 2014).

For example, imagine negative values for the age of a person, or height data measured either in meters or in centimetres. In fact, some incorrect data may also result from inconsistencies in naming conventions or data codes in use, or inconsistent formats for input fields, such as dates (Chakrabarti et al., 2009). The most common error is the mixed use of the American (MM/DD/YYYY) and European (DD/MM/YYYY) formats (see Date formats around the world).
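A minimal Python sketch of normalising such mixed date formats into unambiguous ISO 8601 strings; it assumes only these three input formats occur, and genuinely ambiguous values (e.g. 05/04/2021) would still need contextual knowledge:

```python
from datetime import datetime

def normalize_date(text):
    """Try a few known input formats and emit an ISO 8601 date string."""
    for fmt in ("%m/%d/%Y", "%d/%m/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"Unrecognised date format: {text!r}")

# An American-style date and a European-style date normalise to the same value;
# note that a date valid under BOTH formats is silently read as American here.
print(normalize_date("03/25/2021"))  # -> 2021-03-25
print(normalize_date("25/03/2021"))  # -> 2021-03-25
```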

People often try to save time when entering data by abbreviating terms. If these abbreviations are not consistent, they can cause errors in the dataset, as can differences in capitalisation, spacing and the genders of adjectives. There can be numerous inconsistencies, and we have to deal with them deliberately. In every case, it is better to carefully log the details of our procedure for future reference.

**Fig. 2.4** Inconsistent data

**Fig. 2.5** Double instances

**Data deduplication** is a process that reduces storage overhead by eliminating redundant copies of data, ensuring that storage media retain only unique instances of data. A duplicate record is one where the same piece of data has been entered more than once (Fig. 2.5). Duplicate records often occur when datasets have been combined, or because it was not known that there was already an entry.

In educational organisations, **data integration and correlation** are essential activities related to data collection. Information obtained from multiple sources usually leads to duplicated data observations and inaccurate data. Eliminating such duplicates is one of the most important steps in the data cleaning process. The procedure of detecting and eliminating duplicates from a particular data set is called deduplication.
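Deduplication can be sketched in Python; the records and key fields below are hypothetical, and the case normalisation illustrates why cleaning usually precedes duplicate detection:

```python
def deduplicate(records, key_fields):
    """Keep only the first occurrence of each record, assuming records
    sharing the key fields describe the same entity."""
    seen, unique = set(), []
    for record in records:
        key = tuple(record[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

rows = [
    {"name": "Maria", "country": "Greece", "maths": 18},
    {"name": "maria", "country": "Greece", "maths": 18},  # differs only in case
    {"name": "Nikos", "country": "Greece", "maths": 15},
]

# Normalising capitalisation before comparing catches the near-duplicate:
for r in rows:
    r["name"] = r["name"].title()
print(len(deduplicate(rows, ["name", "country"])))  # -> 2
```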

According to the CrowdFlower Data Science Report 2016, data scientists spend most of their time collecting and cleaning data (Fig. 2.6). Messy data is by far the most time-consuming aspect of the typical data scientist's workflow.

The point with data is that it needs to be regularly maintained to ensure that it remains clean and crystal clear (Ronald van Loon, 2018).

Much of the data may be unstructured, noisy and in need of thorough cleansing and preparation before it is ready to yield working insights (Big Data expert Bernard Marr, 2017).

### **Questions and Teaching Materials**

1. **Finally, after Alice collected the necessary parental consent for her intervention, the flipped classroom course is up and running.**

After running the online course for three weeks, Alice tracks her students' activity in the online learning environment. Thus, she also collects data related to students' engagement, behaviour and performance in the LMS, e.g. time spent in the platform, the videos her students watched, their progress in the online course, downloaded files, their online quiz scores, their participation in the forum as well as interaction among them.

**Fig. 2.6** What data scientists spend the most time doing

Before proceeding further, Alice confirms that the collected data meets basic quality characteristics. She watches the video "Data Wrangling for Faster, More Accurate Analysis" and then examines and verifies the educational data against different quality measures. Inconsistencies in data, like missing pieces, errors, or even differences in how the same value is expressed, produce inaccurate results.


Correct answer: True

2. **Alice has collected educational data from various sources (data from online learning environments, data from state tests, demographic data, data from management information systems, from open educational resources and much more) and she wants to unify the datasets in order to reveal the big picture.**

Alice soon realizes that the data coming from various sources in diverse formats, is quite messy, containing missing values, outliers, and duplicate instances. To obtain a consistent database, free from any sort of discrepancies, data cleaning is required so as to detect erroneous or irrelevant data and discard it.

In the framework of data cleaning, as defined by Maletic and Marcus (2000) and presented in Fig. 2.1, the following three phases define a data cleansing process.

Help Alice to arrange the phases in the right order:


Correct answer: B – C – A

3. **Alice has collected data from the Learning Management System and she realizes that some users accessed her course just once (in error, or in order to see one specific resource or to do an activity) but never returned to the course later.**

**What would you suggest Alice to do in order to handle the missing values?**


Correct answer: D

### 4. **Alice has extracted the following dataset containing file download data from the school's Learning Management System.**

She can easily identify two outliers (Student4 and Student11). Help Alice to decide what to do with these outliers, in order to proceed with the data analysis. These outliers:


Correct answer: B


### 5. **Alice participates in an International Conference on Teaching and Learning. Therefore, she must prepare a review of students' performance from 6 different countries in three main subjects, namely Maths, English, and Science.**

Students' performance data from 6 different countries are collected in the following table.


Alice soon realises that the key to finding the inconsistencies is to create a filter. The filter will allow her to see all of the unique values in the column, making it easier to isolate the incorrect values. (Source: https://edu.gcfglobal.org/en/excel-tips/a-trick-for-finding-inconsistent-data/1/).
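The same filter trick can be reproduced outside a spreadsheet; a minimal Python sketch over a hypothetical country column:

```python
from collections import Counter

# Hypothetical country column containing a few one-off misspellings.
country_column = ["Greece", "Italy", "Greece", "Grece", "Italy", "Itally", "Spain"]

# Listing the unique values with their counts makes rare misspellings
# stand out, much like filtering a spreadsheet column:
for value, count in Counter(country_column).most_common():
    print(f"{value}: {count}")
```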

**After examining this table carefully, please help Alice to select the inconsistencies you have identified.**


Correct answers: A, B, C, E, F. In our example, we can identify the following inconsistencies: in row 21 Greece is misspelled and in row 11 Italy has a double l; in row 18 there is a negative value for the grade in English; in row 13 grades are on a different scale; in rows 14 and 20 dates are out of range; and in rows 17 and 21 dates are in a different format (DD/MM instead of MM/DD).

# 6. **Alice participates in an International Conference on Teaching and Learning. Therefore, she must prepare a review of students' performance from 6 different countries in three main subjects, namely Maths, English, and Science.**

Students' performance data from 6 different countries are collected in the following table.


(continued)


After searching the web for answers, Alice finds out that she can identify duplicate rows in MS Excel by selecting Home → Conditional Formatting → Highlight Cells Rules → Duplicate Values.

**Help Alice identify the duplicates. How many duplicates can you identify?**


Correct answer: D

	- **True**
	- **False**

Correct answer: False.

### 8. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response about data cleaning in the following reflective task. You may reflect on:


# *2.2.2 Data to Describe Data (Metadata)*

**Metadata** is usually defined as "data about data". Johnson et al. (2018) provide the following definition of metadata: *"It is information about a data set that is structured (often in machine-readable format) for purposes of search and retrieval. Metadata elements may include basic information (e.g., title, author, date created) and/or specific elements inherent to data sets (e.g., spatial coverage, time periods)."*

However, in the context of education, metadata can more aptly be defined as tags used to describe educational assets.

Metadata **helps**:


Metadata **answers** the following questions about data:


Practical examples of metadata are given by Kononow (2018): https://dataedo.com/kb/data-glossary/what-is-metadata (Fig. 2.7).

In Understanding Metadata (2017), from the National Information Standards Organization, Riley distinguishes three types of metadata (see Fig. 2.8):

**Fig. 2.7** Examples of metadata

**Fig. 2.8** Types of metadata


**Descriptive metadata** can describe a learning asset or resource related to education (including learning standards, lessons, assessment items, books, etc.) for purposes such as identification, search and discovery. Descriptive metadata can be thought of as a keyword or tag on an asset that makes it easier to find. Examples include subject, grade level, and related skills and concepts.

**Administrative metadata** is used to manage a learning asset. Examples of this type of metadata include status, disposition, rights and licensing.

**Structural metadata** describes how data is organized or formatted and is often governed by a widely-adopted standard that ensures the data is accurately represented when exchanged and presented. Structural metadata enables content to be machine readable.
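As a small illustration of machine-readable metadata, the hypothetical record below groups descriptive, administrative and structural elements; the field names are illustrative, not taken from any particular metadata standard:

```python
import json

# A minimal, hypothetical metadata record for a learning object.
metadata = {
    "descriptive": {"title": "Fractions quiz", "subject": "Maths", "grade_level": 9},
    "administrative": {"license": "CC BY 4.0", "status": "published"},
    "structural": {"format": "text/html", "language": "en"},
}

# Because the record is structured and machine readable, a repository can
# serialise it for exchange and then search it by field:
serialised = json.dumps(metadata)
record = json.loads(serialised)
print(record["descriptive"]["subject"])  # -> Maths
```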

Metadata are used for the **purposes** of:


Primary uses of the various metadata types are presented in Table 2.1 below (adapted from Understanding Metadata, 2017).

The video from the National Archives of Australia "Meta… What? Metadata" (in the useful video resources) helps us understand the importance of metadata in order to describe, use, find and manage content and data.

The National Information Standards Organization describes "*data interoperability as the effective exchange of content between systems. Interoperability relies on metadata describing that content so that the systems involved can effectively profile incoming material and match it to their internal structures*." You may also review the video "Learn More About Data Interoperability" (in the useful video resources).

### **Questions and Teaching Materials**

1. **Alice has heard of "metadata", but she is not quite sure what it means or why she might need it. She downloaded this photo from** pxhere.com**, an online community sharing copyright-free images.**


**Table 2.1** Primary uses of various metadata types


### **Photo's properties**

**What information can Alice gather from the photo's metadata? Match the questions from the first column with the values in the second column.**


Correct answer: A8 – B1 – C4 – D6 – E2

2. **Open educational resources (OER) are freely accessible, openly licensed text, media, and other digital assets that are useful for teaching, learning, and assessing as well as for research purposes. The term OER describes publicly accessible materials and resources for any user to use, re-mix, improve and redistribute under some licenses.**

**OER Repositories** are repositories of open educational resources covering most educational disciplines. Open repositories are websites which house open books, textbooks, lectures, tutorials, quizzes/tests, case studies, assessment tools, images, syllabi, simulations, online courses and other resources of educational value.

**Photodentro OER** is the Greek National Learning Object Repository (LOR) for primary and secondary education. It hosts reusable learning objects (small, self-contained, reusable units of learning). It is open to everyone: pupils, teachers, parents, as well as anybody else interested. The URL for accessing the Photodentro LOR is http://photodentro.edu.gr/lor.

For the purpose of collecting learning material for the flipped classroom initiative, Alice has found the following Learning Object (LO) in the Photodentro OER repositories:

Alice is studying the Learning Object's metadata page (http://photodentro.edu.gr/lor/r/8521/2705?locale=en) to find answers to the following questions:

### 1. **What is the Subject Area of the LO?**


Correct answer: A.

### 2. **What are the Licence Terms of the LO?**


Correct answer: C.

### 3. **What is the Date of Publication?**


Correct answer: D.

### 4. **What is the File Size?**


Correct answer: A.

	- **True**
	- **False**

Correct answer: True.

6. **Alice watches the video from the League of Innovative Schools ("Learn More About Data Interoperability") promoting the movement to advance data interoperability in public education.**

In this video, data interoperability is defined as the seamless, safe and controlled exchange between applications, with clear standards for how to send and receive student information, privately and securely.


Correct answer: True

**Fig. 2.9** Data curation

### 7. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response about metadata, in the following reflective task. You may reflect on:

*The advantages of enhancing educational data through data description.*

# *2.2.3 The Significance of Data Curation*

According to ICPSR (2018), *"Through the curation process, data are organized, described, cleaned, enhanced, and preserved for public use, much like the work done on paintings or rare books to make the works accessible to the public now and in the future. Without curation, however, data can be difficult to find, use, and interpret"* (Fig. 2.9).

Michael Stonebraker (2014) defines **data curation** as the process of turning independently created data sources (structured and semi-structured data) into unified data sets ready for analytics, using domain experts to guide the process. It involves:


Castanedo (2015), on the other hand, describes **data curation** as the process that involves data cleaning, schema definition/mapping and entity matching to transform raw data into consistent data that can then be analysed. Schema definition/mapping is making associations among data attributes and features. Entity matching is finding data in different data sources that refer to the same entity, and is essential to remove duplicate records.

In the video "ICPSR 101: What is Data Curation?" (in the useful video resources), ICPSR explains the intricacies of the work data processors do every day to find and fix issues in the data, ensuring their long-term availability and value to the research community.

According to the Digital Curation Centre (DCC), Fig. 2.10 provides a graphical, high-level overview of the stages required for successful curation and preservation of data, from initial conceptualisation or receipt through the iterative curation cycle.

**Fig. 2.10** The DCC curation lifecycle model. (Source: diagram from Higgins, 2008)

We can identify **four full life cycle actions**:


The outer cycle represents the **sequential actions** of the data curation process:


Digital curation is all about maintaining and adding value to a trusted body of digital information for current and future use; specifically, the active management and appraisal of data over the entire life cycle (Jisc, 2006).

You may also review the video "Data Curation @UCSB", (in the useful video resources) to watch how UCSB Library eyes digital curation service to help preserve research data created across campus.

Now that we have completed the hard work to make our data tidy and meaningful, we will put in a little extra effort to preserve our valuable results.

Thus, we will discuss **Digital Educational Data Preservation** which is considered a key task in the data curation process, to safeguard our unique educational data from getting stolen, destroyed or simply lost.

### **Questions and Teaching Materials**

1. **Alice is studying the Data Curation Process to ensure that data is reliably retrievable for future reuse, and to determine what data is worth saving and for how long.**

**Help Alice match the following Data Curation processes to the appropriate Data Curation Phase.**


Correct answer: A1 – B3 – C2 – D3 – E1 – F2 – G1 – H3.
2. **Data Curation is not quite clear to Alice, so she watches the video from ICPSR ("ICPSR 101: What is Data Curation?") explaining what data curation is all about. According to this video, the purpose of data curation is to ensure that people can find data now and in the future. This can be achieved by following the 5 steps of data curation.**

**Please help Alice to arrange the following steps in the right order:**


Correct answer: B-E-A-D-C

3. **Alice studies the Digital Curation Centre's (DCC) Curation Lifecycle Model. According to this complex diagram, there are four full lifecycle actions and eight sequential actions of the data curation process.**

**Please help Alice to select only the full lifecycle data curation actions from the following list.**


Correct answers: B, E, F, H

4. **The last step of the Data Curation Cycle is to ensure that data will last forever (or at least for a very long time). Alice is anxious: how can digital records last "forever"? What if the technology becomes obsolete?**

Thankfully, in the "Data Curation @UCSB" video Alice just watched, Greg Janee, a Digital Library Research Specialist, claims that digital information is far more robust than paper.

**Is Alice's understanding correct?**


Correct answer: No.

### 5. **ACTIVITY/PRACTICE QUESTION (Short answer)**

Name some of the data curation actions described in this section.

### 6. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response in the following reflective task. You may reflect on:

*The significance of data curation in educational data management.*

# *2.2.4 Storage Issues for Preserving Educational Data*

As explained in the short Library of Congress video "Why Digital Preservation is Important for Everyone" (in the useful video resources), traditional information sources such as books, photos and sculptures can easily survive for years, decades or even centuries, but digital items are fragile and require special care to keep them usable. Rapid technological changes also affect digital preservation: as new technologies appear, older ones become obsolete, making it difficult to access older content.

This video explores the complex nature of the problem, how digital content, unlike content on traditional media, depends on technology to make it available and requires active management to ensure its ongoing accessibility.

Preservation is no longer simply a concern for memory institutions in the long term but for everyone interested in using and accessing digital materials. The greater the importance of digital materials, the greater the need for their preservation: digital preservation protects investment, captures potential and transmits opportunities to future generations and our own. Digital materials – and the opportunities they create – are fragile (Digital Preservation Handbook, Digital Preservation Coalition, 2015).

Jisc (2006) defines **Digital Preservation** as "*the series of actions and interventions required to ensure continued and reliable access to authentic digital objects for as* 

**Fig. 2.11** The most important aspects we need to consider and manage, so as to ensure an effective digital preservation process for our educational data

*long as they are deemed to be of value. This encompasses not just technical activities, but also all of the strategic and organisational considerations that relate to the survival and management of digital material*".

According to Principles and Good Practice for Preserving Data, "*A sustainable preservation programme addresses organisational issues, technological concerns and funding questions*" (Interuniversity Consortium for Political and Social Research (ICPSR), 2009). The simple questions to be answered are:


Figure 2.11 is based on Digital Preservation Handbook (Digital Preservation Coalition, 2015), and presents the most important aspects we need to consider and manage, so as to ensure an effective digital preservation process for our educational data.

Even though our main focus is not to drill down into the technical details of digital preservation, which are not part of educators' main role,

**Fig. 2.12** Digital preservation activities

it is essential to get an overview and understanding so as to be able to collaborate effectively with the responsible technical team, using a common language. Thus, next we will briefly discuss such issues for effective digital preservation of educational data.

The first steps that need to be undertaken in order to begin to build or enhance the needed digital preservation activities are summarized in Fig. 2.12. You may further review detailed information in the Digital Preservation Handbook (Digital Preservation Coalition, 2015).

Special focus should be given to these key technical elements of digital preservation, as specified in the USGS Guidelines (2014):


To assess an organization's readiness, it is recommended that these components are checked against the National Digital Stewardship Alliance (NDSA) 'Levels of Digital Preservation' (Phillips et al., 2013):



**Fig. 2.13** Two storage methods

• **Level 1 – protect your data**

• **Level 2 – know your data**

• **Level 3 – monitor your data**

• **Level 4 – repair your data**

Storage technology has changed dramatically over the last twenty years. Initially, the norm was storing data on discrete media items, such as CDs/DVDs and hard-disk drives. Today, it has become common practice to use IT storage systems for the increasingly large volumes of digital material that need to be preserved and to be easily and quickly retrievable (Digital Preservation Coalition, 2015).

At this point it is important to clarify the difference between backup and digital preservation. **Backup** refers to "*short-term data recovery solutions following loss or corruption*" (Jisc, 2006). **Preservation storage systems** "*require a higher level of geographic redundancy, stronger disaster recovery, longer-term planning, and most importantly active monitoring of data integrity in order to detect unwanted changes such as file corruption or loss*" (Digital Preservation Handbook).
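The "active monitoring of data integrity" that distinguishes preservation storage from simple backup is usually implemented with fixity checks: a checksum is recorded when a file is ingested, then periodically recomputed to detect silent corruption. The following is only a minimal sketch of the idea (the function names and the manifest format are our own illustration, not taken from any specific preservation system):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_fixity(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the files whose current checksum no longer matches the
    checksum recorded at ingest (i.e., candidates for repair)."""
    damaged = []
    for name, recorded in manifest.items():
        if sha256_of(root / name) != recorded:
            damaged.append(name)
    return damaged
```

In a preservation setting such a check would run on a schedule against every independent copy, so that a corrupted copy can be repaired from a healthy one.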

The selected storage solution is of prime importance for digital preservation. When selecting the storage strategy there are several options we need to consider, such as Cost and Scalability, required Capacity, Security, Remote Access, Collaboration and Disaster Recovery. Legal provisions due to privacy or confidentiality may also influence our decision. Figure 2.13 summarizes the pros and cons of each of the two basic storage methods, on-premises servers (local infrastructure/data centres) and cloud-based storage, as well as recommended actions to comply with the latest regulations (COMPARE THE CLOUD, 2018). You may also review the video "Public Cloud vs Private Cloud vs Hybrid Cloud" (in the useful video resources), which compares and contrasts public, private and hybrid clouds: the basic elements of each, the features and benefits that each delivers, and how each type meets specific business needs.

In their 2018 Data Management Life Cycle final report, Miller and his colleagues recognise the demand for cost-effective storage technologies: "*More and more organizations are considering outsourcing storage services or cloud storage options because the availability of cloud computing resources opens up possibilities for users to purchase access to computing power and storage space as a service instead of maintaining it themselves. This way, providers are responsible for the performance, reliability, and scalability of the computing environment, while users can concentrate on data analysis and production*".

Nevertheless, security and privacy are significant concerns holding back use of the cloud, particularly for confidential, sensitive, or personally identifiable information. Let's not forget what happened at Code Spaces, an incident which led to data deletion and the eventual shutdown of the company.

The most common risks we need to consider include:

• downtime and service outages, since cloud computing systems are internet based,

• vulnerability to external cyber-security attacks,

• compliance and legal issues, depending on the applicable regulation,

• lifetime costs that could end up being higher than expected,

• limited control and flexibility, since the cloud infrastructure is owned, managed and monitored by the service provider.

Despite these concerns, the potential of cloud storage seems to be more promising than the associated risks, which are expected to diminish over time. As per Gartner, "*Through 2025, 99% of cloud security failures will be the customer's fault*" (Panetta, 2019) and "*Organizations that do not have a high-level cloud computing strategy driven by their business strategy will significantly increase their risk of failure and wasted investment*" (Cearley, 2017).

Whichever our choice, even a hybrid storage solution, we need to realize that storage technologies present several risks to the long-term preservation of data. Moreover, "Many cases of content loss are not necessarily due to technical faults but can come from human error, lack of budget, or a failure to regularly monitor the integrity of the stored data" (Digital Preservation Coalition, 2015) (Fig. 2.14).

Let's now take a closer look at security issues and particularly cybersecurity. According to the Digital Preservation Handbook, security issues relate to:

• **system security** (e.g., protecting digital preservation and networked systems/services from exposure to external/internal threats),

**Fig. 2.14** Characteristics of good practice for storage strategy

**Fig. 2.15** Countermeasures against cyber-attacks


When it comes to **cybersecurity**, protecting educational data requires both administrative and technological security measures, in order to prevent unauthorized parties from accessing it. In Fig. 2.15 below, you may review some of these countermeasures to create an effective defence against cyber-attacks.

In order to help schools protect against cyberthreats and develop effective security programs, there is also a really useful report on a K-12 Security Risk Methodology (Woody, 2004), emphasizing that technology "is broadly used in the K-12 environment by many participants including administrators, teachers, parents, students, school board members, etc." and that "while this enables a wide range of useful activities, the risk for inappropriate and illegal behaviour that violates privacy, regulations, and common courtesy is increasing exponentially".

The thing that kept me awake at night (as NATO military commander) was cybersecurity. Cybersecurity proceeds from the highest levels of our national interest ... through our medical, our educational, to our personal finance (systems). (Admiral James Stavridis, Ret., former NATO Commander, in Cybersecurity and Digital Business Risk Management, 2020).

To this point we have provided an overview of the key issues of digital preservation and realized its importance for keeping our educational data usable over time. You may also review the video "How Toy Story 2 Almost Got Deleted: Stories From Pixar Animation: ENTV" (in the useful video resources), the (mostly) true story of how 'Toy Story 2' was almost deleted from Pixar Animation's computers during the making of the film, and how the film was saved by one mom's home computer!

Let us move forward to identify good practices and appropriate actions to collect the needed data, as well as to protect this data and safeguard its privacy, especially when it comes to sensitive educational data.

After all, "Data protection is all about protecting people – not just fles and computer systems" (Moore Barlow, 2018).

### **Questions and Teaching Materials**

1. **Following the discussion with the DPO about the school's preservation strategy and policies, Alice starts wondering. Is digital content so fragile, after all? Should I find out more about preservation issues to protect my course's digital content?**

Alice accesses the video "Why Digital Preservation is Important for Everyone".

She now understands that though traditional information sources can easily survive for years, decades and even centuries, digital items require special care to preserve them. More specifically, digital items are fragile, as they require special care to keep them usable, and dependent, as they rely on technology to make them available and require active management to ensure their ongoing accessibility.

**Is this assumption True or False? Please select the right answer.**


Correct answer: True

2. **Alice soon realises that she needs to seek "guidance on key issues and actions to consider when creating digital materials to ensure their longevity of active use and potential for long-term preservation" (Digital Preservation Handbook).**

**Please mark the correct key elements corresponding to each category of issues that Alice needs to address for digital preservation.**


Correct answers: as marked with X above

3. **Alice is presently at the point of investigating the key technical elements of digital preservation.**

It's a bit hard for her to deal with such technical issues. Are you ready to help her?

You may review the definitions of the key technical elements of digital preservation, presented on page 2 of the USGS Guidelines (2014).

**Please match the appropriate definition (from the right column) to the respective technical element (in the left column).**


Correct answers: 1-E, 2-A, 3-D, 4-C, 5-B

4. **Let's go back to Alice. She gets informed by the responsible colleague about the hybrid storage solution used by the school. It's a combination of local infrastructure/data centre and cloud-based storage. Moreover, as per her school guidelines for data storage good practice strategy, she needs to create multiple independent copies to stabilize her files. The copies are geographically separated in different locations, using different storage technologies, and are actively monitored to ensure any problems are detected and corrected.**

She wonders about the criteria that influenced the school's decision making for the selected storage solution for digital preservation. Can you help her specify these selection criteria?

**Please select the right answers.**


Correct answers: B, C, and E.

5. **Alice is now interested in learning more about cost-effective storage technologies and more specifically about storing data on the cloud. What is a cloud, and why are there different types of clouds? She decides to watch again the video "Public Cloud vs Private Cloud vs Hybrid Cloud".**

Can you assist Alice in getting a deeper understanding of cloud-based storage? **Please select the right answer(s). You may select more than one answer.**


Correct answers: A, C, E

6. **After reading the article "Murder in the Amazon cloud", Vadali (2017), presenting the story of Code Spaces, which led to data deletion and the eventual shutdown of the company, Alice is more concerned about storage security.**

What are the needed tasks for the school, and for herself personally, to keep the students' data safe?

You may review again Fig. 2.15, as well as the Techniques for protecting information according to Digital Preservation Handbook.

**Please select the right answer(s). You may select more than one answer.**


F. **Use Encryption, a cryptographic technique which protects digital material by converting it into a scrambled form.**

Correct answers: A, C, F

7. **Alice watches the video "How Toy Story 2 Almost Got Deleted: Stories From Pixar Animation: ENTV" and thinks "What an unbelievable story!"**

**She then starts laughing. The director could have avoided this "almost disaster" if he…**

**Please select the right answer.**


Correct answer: E.

### 8. **ACTIVITY/PRACTICE QUESTION (Short answer)**

Name some types of educational data that need long term preservation.

### 9. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response in the following reflective tasks. You may reflect on:


# **2.3 Educational Data Ethics**

# *2.3.1 Informed Consent*

The video "Introduction to data ethics" (in useful video resources) introduces the basic principles of data ethics.

As Pentland states when describing Big Data, *"the ability to track, predict and even control the behaviour of individuals and groups of people is a classic example of Promethean fre: it can be used for good or ill"* (Pentland, 2013).

New regulations, like the **GDPR (General Data Protection Regulation)** (Regulation (EU), 2016) that we will discuss later on, along with recent events such as the Cambridge Analytica and Facebook scandal, have raised awareness of data ethics issues that can arise from data misuse (Open Data Institute, 2018a).

The Open Data Institute (ODI) (Broad et al., 2017) defines Data Ethics as:

a branch of ethics that evaluates data practices with the potential to adversely impact on people and society – in data collection, sharing and use.

Several frameworks, policies and guidelines have been developed to address data ethics issues, including JISC's code of practice (Shacklock, 2016), updated in 2018, the LACE (Learning Analytics Community Exchange) framework in 2016 and the ICDE (International Council for Open and Distance Education) Global guidelines (Slade & Tait, 2019). To help identify potential ethical issues associated with a data project or activity and the steps needed to act ethically, the Open Data Institute has also designed the Data Ethics Canvas in 2018 (Open Data Institute, 2018b).

We will further discuss the basic common principles of these practices in Chap. 3.

As emphasized by Shacklock (2016), "*Institutions should put in place clear ethical policies and codes of practice that govern the use of educational data. These policies should, at a minimum, address privacy, security of data and consent*."

Before proceeding further, the brief video "What is the GDPR?" (in useful video resources) provides an overview of the European Union data protection rules, also known as the EU General Data Protection Regulation (or GDPR), which apply since 25 May 2018 to all entities that collect, store and process any personal data belonging to EU citizens and residents (even organisations that are not EU-based). The GDPR has strengthened the conditions for consent (GDPR.eu, 2019).

We will soon discuss this new regulation and how it should be applied by the various entities. First, let's see what informed consent is all about.

Informed consent is declared by most international guidelines as one of the pivotal principles in Data Ethics and "is explicitly mentioned as a principle in article 7 of the International Covenant on Civil and Political Rights (1966), a United Nations Treaty" (European Commission, 2013).

According to Griffiths et al. (2016), *"Informed consent refers to the requirement for an individual to give consent for the collection and analysis of the data which they generate,"* while *"Transparency refers to the degree to which users can observe the ways in which the data they generate is used".*

As per European Commission's report (2013) regarding Ethics for Researchers "*Informed consent consists of three components: adequate information, voluntariness and competence***.**"

Thus, prior to consenting, individuals should be clearly informed of the data collection goals, possible adverse impacts and the means available to them to refuse or withdraw consent, without consequences, at any time.

Moreover, individuals must be competent to understand the information and should be fully aware of the consequences of their consent. Greater attention is required for **some special categories of people**, such as children, vulnerable adults and people with certain cultural or traditional backgrounds.

At this point, it is important to understand the distinction between **consent** and **informed consent**. For informed consent, we need to ensure that individuals genuinely understand how we intend to use their data, e.g., by running focus groups and/or publishing explanatory documents.

As per European Commission guidelines about GDPR, *"when a company or organisation asks for consent to collect or reuse personal information, the data subjects have to make a clear action agreeing to this, for example by signing a consent form or selecting yes from a clear yes/no option on a webpage"…"It is not enough to simply opt out, for example by checking a box saying they don't want to receive marketing emails. They have to opt in and agree to their personal data being stored and/or re-used for this purpose."*

The European Commission emphasizes that informed consent means that before you consent, you must **be given information** about the processing of your personal data, including at least:


The way individuals are informed is crucial for the informed consent process. We should ensure that they fully realize the expected consequences of granting or withholding consent (Fig. 2.16).

With regards to the collection of personal data about **children**, additional protection should be granted since children are less aware of the risks and consequences of sharing data and of their rights.

In the U.S., the foundational federal law on student privacy, the **Family Educational Rights and Privacy Act (FERPA)**, establishes student privacy rights by restricting with whom and under what circumstances schools may share students' personally identifiable information. The Data Quality Campaign (DQC) has developed a tool that summarizes some of the main provisions of FERPA and can be used as a guide to help interested parties understand when they need to take a closer look at the law or consult an expert.

Under GDPR, any information addressed specifcally to a child should be adapted to be **easily accessible, using clear and plain language.**

**Fig. 2.16** Conditions for informed consent

For most online services (social networking sites) **the consent of the parent or guardian** is required in order to process a child's personal data on the grounds of consent up to a certain age.

The age threshold for obtaining parental consent is established by each EU Member State and can be between 13 and 16 years, according to the national Data Protection Authority.

As per European Commission clarifications for the Rights for Citizens, "*Companies have to make reasonable efforts, taking into consideration available technology, to check that the consent given is truly in line with the law. This may involve implementing age-verification measures such as asking a question that an average child would not be able to answer or requesting that the minor provides his parents' email to enable written consent*".
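Because each Member State sets its own threshold between 13 and 16, a service operating across countries has to look the limit up per country before deciding whether parental consent is needed. A minimal sketch of that decision follows; the country codes and thresholds shown are illustrative placeholders, and a real system must track the current national legislation:

```python
# Illustrative thresholds only -- real values must come from the current
# national legislation of each Member State (between 13 and 16 years).
AGE_OF_DIGITAL_CONSENT = {
    "DEFAULT": 16,  # the GDPR's default when a state sets no lower age
    "XX": 13,       # hypothetical state using the minimum allowed
    "YY": 15,       # hypothetical state in between
}

def parental_consent_required(age: int, country_code: str) -> bool:
    """An online service may rely on the child's own consent only once
    the child has reached the national age of digital consent."""
    threshold = AGE_OF_DIGITAL_CONSENT.get(
        country_code, AGE_OF_DIGITAL_CONSENT["DEFAULT"])
    return age < threshold
```

Under this sketch, a 15-year-old in a state using the GDPR default of 16 would still need parental consent, while the same student in a state with a 13-year threshold would not.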

Within the context of education, there are quite different approaches to consent in collecting learners' data, according to national guidelines (when available).

Figure 2.17 depicts the main principles and challenges that should be taken into consideration to comply with the GDPR. As presented, a data-related activity can still be lawful, by complying with legal obligations such as the GDPR, even though the data may not be treated ethically. Sclater (2017) also argues that "*consent is required for use of sensitive data and in order to take interventions directly with students on the basis of the analytics. This implies that if the data in question are not considered 'sensitive', and do not form the basis for any intervention, consent is not required (on the basis that this may be considered as of legitimate interest)"*.

Moreover, as per the ICDE's recent report (2019), many institutions seek consent to collect student data for additional purposes, beyond institutional reporting and basic student support, at the point of registration. As emphasized, the "*expectation that users should consent to uses of personal data unknown at the point of registration seems to be an unreasonable and unethical one*."

**Fig. 2.17** The main principles and challenges that should be taken into consideration to comply with GDPR

An alternative approach, supported by most of the existing guidelines (Higher Education Commission, JISC's code of practice, ICDE Global guidelines), might be to differentiate between the granting of initial consent for the collection of data and the obtaining of additional consent at the point where a specific personal intervention is proposed, where new data is incorporated into the institution's system, or where existing data is used in new ways.

As concluded in the ICDE report (2019), "*national legislation will influence positions taken, but generally this principle (of consent) should be built around a minimum of informed consent (that is, transparency before registration)*."

You may also review this video "Why develop a data science code of ethics?" (in useful video resources) where experts from the data science community explain why it's important to have a code of ethics.

### **Questions and Teaching Materials**

	- A. **7 days**
	- B. **2 months**
	- C. **6 months**
	- D. **3 years**

Correct answer: C


2. **Before using the flipped classroom initiative, Alice wants to study Grade 9 students' perceptions of technology, using an online questionnaire she made with Google Forms.**

Alice wants to prepare an informed parental consent form for her students (as they are under 15) in order to participate in the students' perceptions of technology survey, but she is a bit confused with all this information.

**Can you help Alice to have a better understanding?**

	- **True**
	- **False**

Correct answer: True

	- **True**
	- **False**

Correct answer: False

	- **True**
	- **False**

Correct answer: True

3. **You give some advice to Alice in order to help her prepare the consent form for the students' perceptions of technology study. Select all that apply.**

**A consent request must:**


Correct answers: A, C, D, G, H

4. **Alice has a colleague, Betty, who has just come on board and wants to conduct an online survey with her 17-year-old students about their eating habits. Betty asks Alice if it is necessary to collect parental consent in order to process her students' personal data.**

**Help Alice decide if a consent as a parent or guardian is required in order to process students' personal data**


Correct answer: No.

5. **Alice's Secondary High-School relies upon the sixth lawful basis (public task basis) to justify the processing of personal data (according to GDPR), where processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller.**

Is this lawful basis (public task basis) appropriate for Alice in order to take interventions directly with students on the basis of the participation data recorded within the Learning Management System?

**Help Alice find the correct answer.**


Correct answer: No.

6. **In the video "**Why develop a data science code of ethics?**", Paula Goldman, VP/Head of Omidyar Network's Tech and Society Solutions Lab, claims that data and algorithms are neutral.**


Correct answer: False.

### 7. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response in the following reflective task. You may reflect on:


# *2.3.2 Sensitive Educational Data Protection*

Balancing digital learning with privacy and security is essential to fostering a successful digital culture (iKeepSafe, 2017).

**Privacy** is a **fundamental human right** and a **core value** in the functioning of democratic societies. As already discussed in the previous topics, with the exponential progress in the field of information and communication technologies and in the light of the **rapid development of Educational Data Analytics** on a global basis, **new challenges to privacy and data protection have emerged**.

The "Privacy Overview for K12 Teachers and Administrators" video (in useful video resources) provides us with an overview of the privacy issues that may arise and growing concerns about educational data privacy. Is educational data privacy over in the digital age?

In the Quantified Student infographic you may see what a day in the data-driven life of the most measured and monitored student in the history of education looks like.

"*The data collection begins even before he steps into the school,*" says Khaliah Barnes, director of the Student Privacy Project at the Electronic Privacy Information Center. "*The issue is that this reveals specifically sensitive information*," says Barnes (Hill, 2014).

Moreover, as Jose Ferreira, CEO of Knewton (one of the biggest actors in the field of educational technology software), points out, "*We literally know everything about what you know and how you learn best, everything.*" Ferreira calls education "*the world's most data-mineable industry by far*" (Hill, 2014).

**Do educational data analytics challenge the principles of data protection? Is privacy a show-stopper?** How is privacy guaranteed/secured, especially if minors and/or sensitive data are involved?

The European position has been expressed in the European Commission's report: "New Modes of Learning and Teaching in Higher Education" (European Commission, 2014). In recommendation 14, the Commission clearly stated: "*Member States should ensure that legal frameworks allow higher education institutions to collect and analyse learning data. The full and informed consent of students must be a requirement and the data should only be used for educational purposes*", and in recommendation 15: "*Online platforms should inform users about their privacy and data protection policy in a clear and understandable way. Individuals should always have the choice to anonymise their data.*" This is a **widely accepted framework mirrored** in the laws of **multiple nations and international organisations** including many U.S. states (Drachsler & Greller, 2016).

Thus, it is essential that all educators understand how learners' personal information is used and adequately protect learners' data in order to strengthen the trust of all parties involved and encourage their participation in digital learning.

In the video by the Data Quality Campaign "Who Uses Student Data?" (in useful video resources), it is emphasized that most personal student information stays local. Districts, states, and the federal government all collect data about students for important purposes like informing instruction and providing information to the public. But the type of data collected, and who can access them, differs at each point.

As clearly stated in the Foundational Principles for Using and Safeguarding Students' Personal Information, developed by a coalition of US national education organisations, "*Everyone who uses student information has a responsibility to maintain the privacy and the security of students' data, especially when these data are personally identifiable.*"

The basic information security techniques, as specified by the Digital Preservation Handbook, include:

### **Encryption**

• Encryption is a cryptographic technique which protects digital material by converting it into a scrambled form. The use of a key is required to unscramble the data and convert it back to its original form.

### **Access Control**

• Access control enables an administrator to specify who is allowed to access digital material and the type of access that is permitted (for example read only, write).

### **Redaction**

• Redaction refers to the process of identifying and removing or replacing confidential or sensitive information, using anonymisation or pseudonymisation.
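Of the techniques above, redaction by pseudonymisation is the one an educator is most likely to apply directly, for example before sharing a gradebook export. A minimal sketch using keyed hashing follows; the salt handling and column names are our own illustration, and remember that pseudonymised data may still fall under the GDPR:

```python
import hashlib

def pseudonymise(identifier: str, salt: str) -> str:
    """Replace a direct identifier with a stable artificial one.
    The same (identifier, salt) pair always yields the same token,
    so records can still be linked across files; without the salt,
    reversing the mapping requires guessing identifiers."""
    token = hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()
    return "student-" + token[:12]

def redact_records(records: list[dict], salt: str) -> list[dict]:
    """Return copies of the records with the 'name' field replaced
    by a pseudonym ('name' is an illustrative column name)."""
    redacted = []
    for record in records:
        record = dict(record)  # copy, leaving the original untouched
        record["name"] = pseudonymise(record["name"], salt)
        redacted.append(record)
    return redacted
```

The salt must be kept separately and securely; whoever holds it (or the original records) can re-attribute the pseudonyms, which is exactly why pseudonymised data is not automatically outside the GDPR's scope.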

Now that we have a better understanding of the different types of data as categorized in terms of privacy, we will further review the levels of data as specified under the GDPR.

Figure 2.18 presents the main categories of personal data as defined by the GDPR.

**Fig. 2.18** The main categories of personal data as defined by GDPR

We need to pay extra attention to **sensitive data** (a special category of personal data), since an organisation can only process this data under specific conditions (explicit consent may be needed). Even **personal data**, as clarified under the GDPR, "*should only be processed where it isn't reasonably feasible to carry out the processing in another manner. Where possible, it is preferable to use anonymous data. Where personal data is needed, it should be adequate, relevant, and limited to what is necessary for the purpose ('data minimisation').*"

Once data is truly **anonymised** (it no longer contains any identifying elements, the anonymisation is **irreversible**, and individuals are no longer identifiable), the data no longer falls within the scope of the GDPR and becomes easier to use.

Before anonymisation, we should consider the purposes for which the data is to be used. Anonymisation may devalue the data, so that it is no longer useful for specific purposes.

The ICO's Code of Conduct on Anonymisation provides further guidance on anonymisation techniques (UCL, 2018). Unlike anonymisation, in **pseudonymised data** personally identifiable material is replaced with artificial identifiers. Pseudonymised personal data can still fall within the scope of the GDPR, depending on how difficult it is to attribute the pseudonym to a particular individual.

Whether 'de-identified' or pseudonymised data is in use, there is a residual risk of re-identification. Indeed, anonymisation is often seen as the "easy way out" of data protection obligations. However, experts around the world are adamant that 100% anonymisation is not possible: anonymised data can rather easily be de-anonymised when merged with other information sources (Drachsler & Greller, 2016).

**Fig. 2.19** Data protection by design and data protection by default

L. Sweeney (2000) showed that it is possible to personally identify 87% of the U.S. population based on just three data points: five-digit ZIP code, gender and date of birth (Wes, 2018). Later, in 2006, AOL's release of users' search logs (Hansell, 2006) and the case of Searcher No. 4417749, as recorded in "A Face Is Exposed for AOL Searcher No. 4417749" by M. Barbaro and T. Zeller (2006) of the New York Times, became one of the first widely known cases of re-identification. The Netflix case (Narayanan & Shmatikov, 2008) followed in 2007, when researchers de-anonymised some of the Netflix data by matching rankings and timestamps with public information on the Internet Movie Database. As per Hill (2012), in 2012 the retail company Target, using behavioural advertising techniques, managed to identify a pregnant teenage girl from her web searches and sent relevant vouchers to her home (D'Acquisto et al., 2015).
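
The re-identification risk in these cases comes from *combinations* of quasi-identifiers (ZIP code, gender, date of birth) rather than from any single field. A minimal sketch, using invented toy records, of counting how many quasi-identifier combinations are unique; these k = 1 cases are exactly what k-anonymity schemes try to eliminate:

```python
from collections import Counter

# Toy records: ZIP code, gender and date of birth are quasi-identifiers;
# none identifies a person alone, but together they often do.
records = [
    ("11676", "F", "2006-03-14"),
    ("11676", "F", "2006-03-14"),
    ("11676", "M", "2006-07-01"),
    ("10558", "F", "2005-11-30"),
]

counts = Counter(records)
# A record is re-identifiable when its quasi-identifier combination is
# unique (k = 1); k-anonymity requires every combination to occur >= k times.
unique = [combo for combo, n in counts.items() if n == 1]
print(f"{len(unique)} of {len(counts)} combinations are unique")
# -> 2 of 3 combinations are unique
```

Generalising fields (e.g. keeping only the birth year, or the first digits of the ZIP code) shrinks the number of unique combinations at the cost of data utility, which is the trade-off noted above.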

Thus, although de-identification techniques can reduce the risks to the data subjects concerned and help organisations meet their data-protection obligations, we need to properly assess the adequacy of these methods so as to decide whether further steps to de-identify the data are necessary (UCL, 2018).

The GDPR introduces two new principles: data protection by design and data protection by default, whose definitions are presented in Fig. 2.19.

As specified in GDPR (Regulation (EU), 2016), the protection of the rights and freedoms of natural persons with regard to the processing of personal data requires that appropriate technical and organisational measures be taken which meet, in particular, the principles of data protection by design and data protection by default.

**Fig. 2.20** Eight privacy by design strategies, as proposed by the European Union Agency for Network and Information Security (D'Acquisto et al., 2015)

"Data protection by design minimises privacy risks and increases trust", while "Data protection by default entails ensuring that your company always makes the most privacy friendly setting the default setting" (European Union, 2018).

An example of data protection by design is the use of pseudonymisation and encryption; examples of data protection by default include "data minimisation" (only the data necessary should be processed), limited accessibility, and a short storage period.

Let us now further review the privacy by design strategies and storage privacy (data protection by design), as well as storage limitation (data protection by default).

Figure 2.20 depicts eight privacy by design strategies, as proposed by the European Union Agency for Network and Information Security (D'Acquisto et al., 2015). These strategies enable us to identify the data protection and privacy requirements early in the educational analytics value chain and subsequently to implement the necessary technical and organisational measures. One of the most significant privacy-enhancing technologies that can be used for implementing such strategies is storage privacy.

Privacy challenges should be seen as opportunities that, if appropriately handled, can build trust in the big data ecosystem for the benefit of both users and the big data industry (D'Acquisto et al., 2015).

Danezis et al. (2014), in their report "Privacy and Data Protection by Design", define Storage Privacy as "*the ability to store data without anyone being able to read (let alone manipulate) them, except the party having stored the data (called here the data owner) and whoever the data owner authorises*."

As specified further in the report, "*a major challenge to implement private storage is to prevent non-authorised parties from accessing the stored data. If the data owner stores data locally, then physical access control might help, but it is not sufficient if the computer equipment is connected to a network: a hacker might succeed in remotely accessing the stored data. If the data owner stores data in the cloud, then physical access control is not even feasible.*"

A straightforward option for storage privacy is storing the data, either locally or in cloud storage, in encrypted form. One can use full disk encryption (FDE) or file-system-level encryption (FSE). As clarified in the report, "*encryption and decryption operations must be carried out locally, not by remote service, because both keys and data must remain in the power of the data owner if any storage privacy is to be achieved*". The report specifies that outsourced data storage on remote clouds is practical and relatively safe as long as only the data owner, not the cloud service, holds the decryption keys, and that such storage may be distributed for added robustness to failures.
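
The principle that only the data owner should hold the keys can be illustrated with a toy sketch that encrypts a record locally before it would be uploaded. This is deliberately simplistic (a SHA-256 counter keystream, no authentication) and must not be used in production; a real deployment would rely on a vetted library implementing authenticated encryption such as AES-GCM. All names here are our own.

```python
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a keystream from the owner's key (toy counter construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)  # fresh per message, stored with the data
    ct = bytes(a ^ b for a, b in zip(plaintext, keystream(key, nonce, len(plaintext))))
    return nonce + ct

def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ct = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(ct, keystream(key, nonce, len(ct))))

owner_key = secrets.token_bytes(32)  # stays with the data owner, never uploaded
blob = encrypt(owner_key, b"grade report: Jane Doe, 87/100")
# `blob` can now be stored on an untrusted server or cloud; without
# `owner_key` its contents are unreadable.
assert decrypt(owner_key, blob) == b"grade report: Jane Doe, 87/100"
```

The design point is the key location, not the cipher: encryption and decryption happen on the owner's machine, so the cloud provider only ever sees `blob`.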

When it comes to data protection by default, **storage limitation** is one of the key conditions for processing personal data under GDPR. It answers a simple question: "*For how long can data be kept and is it necessary to update it?*" The Regulation's answer is straightforward: "*You must ensure that personal data is stored for no longer than necessary for the purposes for which it was collected*". There are six basic guidelines, specified clearly by GDPR, which you need to take into consideration when storing personal data (Fig. 2.21).
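
The storage-limitation question, "for how long can data be kept", can be operationalised as a simple retention check tied to the purpose of collection. The purposes and periods below are hypothetical examples for illustration, not values prescribed by GDPR:

```python
from datetime import date, timedelta

# Hypothetical retention policy: each record keeps the purpose it was
# collected for, and each purpose justifies a retention period.
RETENTION = {
    "enrolment": timedelta(days=5 * 365),
    "newsletter": timedelta(days=365),
}

def must_be_erased(collected_on: date, purpose: str, today: date) -> bool:
    """Storage limitation: keep data no longer than the purpose requires."""
    return today > collected_on + RETENTION[purpose]

today = date(2024, 9, 1)
print(must_be_erased(date(2018, 9, 1), "enrolment", today))   # True: older than 5 years
print(must_be_erased(date(2024, 3, 1), "newsletter", today))  # False: within 1 year
```

In practice such a check would run as a scheduled job that flags or deletes expired records, making the retention schedule auditable rather than implicit.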

Before closing this chapter, it is essential to analyse individuals' rights. The main reason for the introduction of GDPR is to allow European Union citizens to better control their personal data. More specifically, it is designed to:


GDPR applies to "all companies operating in the EU, wherever they are based" (European Commission, 2018). The GDPR introduces stronger rights for data subjects (Intersoft Consulting, 2018), and creates new obligations for data controllers (the person or body handling the personal data).

Figure 2.22 presents individuals' rights to have control over their personal data under GDPR. To exercise these rights, individuals should contact the company or organisation processing their personal data, also known as the controller. If the company/organisation has a Data Protection Officer ('DPO'), they may address their request to the DPO. The company/organisation must respond to their requests without undue delay and **at the latest within 1 month.**

When the personal data for which a company/organisation is responsible is disclosed, either accidentally or unlawfully, to unauthorised recipients, or is made temporarily unavailable or altered, a data breach occurs. If a data breach occurs and poses a risk to individuals' rights and freedoms, the company/organisation should notify its Data Protection Authority (DPA) within 72 hours of becoming aware of the breach. Depending on whether or not the data breach poses a high risk to those affected, a business may also be required to inform all individuals affected by the data breach (European Commission, 2018h).

**Fig. 2.21** Six basic guidelines which you need to take into consideration when storing personal data

**Fig. 2.22** Individuals' rights to have control over their personal data, under GDPR
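
The one-month limit for answering data-subject requests and the 72-hour breach notification window mentioned above can be sketched as simple date arithmetic. The helper names and the 30-day approximation of a calendar month are our assumptions:

```python
from datetime import datetime, timedelta

def request_response_deadline(received: datetime) -> datetime:
    # Respond to a data-subject request without undue delay and at the
    # latest within one month; approximated here as 30 days.
    return received + timedelta(days=30)

def breach_notification_deadline(aware_at: datetime) -> datetime:
    # Notify the DPA within 72 hours of *becoming aware* of the breach,
    # not within 72 hours of the breach itself occurring.
    return aware_at + timedelta(hours=72)

aware = datetime(2024, 5, 6, 9, 30)
print(breach_notification_deadline(aware))  # 2024-05-09 09:30:00
```

Logging the "became aware" timestamp alongside the breach record is what makes the 72-hour clock verifiable after the fact.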

Whenever processing is likely to result in a high risk to the rights and freedoms of individuals, as specified by GDPR, a Data Protection Impact Assessment (DPIA) is required. A DPIA is required at least in the following cases:


National Data Protection Authorities, in collaboration with the European Data Protection Board, may provide lists of cases where a DPIA would be required. As emphasized, "*the DPIA should be conducted before the processing and should be considered as a living tool, not merely as a one-off exercise. Where there are residual risks that can't be mitigated by the measures put in place, the DPA must be consulted prior to the start of the processing*".

Figure 2.23 provides the 3 Basic Steps to Identify and Protect Sensitive Data, as per Krueger (2017).

A DPIA should be conducted as early as possible in the project lifecycle, so that its findings and recommendations can be incorporated into the design of the processing operation (itgovernance).

You may also review the video "Protecting Student-Data Privacy: An Expert's View" (see useful video resources), where Fordham University Law Professor Joel Reidenberg talks with Education Week correspondent John Tulenko about student data and the best ways to keep it secure.

**Fig. 2.23** The 3 Basic Steps to Identify and Protect Sensitive Data

### **Questions and Teaching Materials**

1. **Alice is a bit confused. Several state and federal laws require privacy protection for students and children. In the video she just watched, "Privacy Overview for K12 Teachers and Administrators", what laws are mentioned concerning data privacy for children?**

**There is more than one correct answer. Help Alice select the right ones**


Correct answers: A, C

2. **From watching the "Who Uses Student Data?" video, Alice understands that teachers have access only to de-identified data (i.e. information about individual students but with identifying information removed).**

Is Alice's understanding correct? **Please select the correct answer:**


Correct answer: No.

### 3. **For the purposes of research, Alice intends to release student data.**

Alice asks the responsible DPO to inform her of the school's policy and guidelines to protect students' data privacy, confidentiality, integrity and security. She becomes aware of personal and sensitive data handling and the use of anonymisation and pseudonymisation to remove personally identifiable information.

As student data might be released for the purposes of research, all names, postal codes and other identifiable data are removed. Completely removing fields that could be used in any way to identify a person is considered a strong form of


**Please select the correct term to complete the sentence.**

4. **Alice has concerns about her students' records, and more specifically about medical reports related to students' learning difficulties being accessed by unauthorised third persons. She contacts the responsible DPO and is informed about the appropriate technical and organisational measures taken by the school, so as to secure data protection by design and by default.**

More specifically, the DPO explains to Alice that the School Information System (SIS) has a mechanism for comprehensively logging who consulted the medical reports and preventing unauthorised access to these sensitive data. Moreover, personal and sensitive data are pseudonymised and "data minimisation" (only the data necessary should be processed) is applied.

Alice feels secure because the technical and organisational measures being taken meet in particular the principles of data protection by design and data protection by default.

Is Alice correct in feeling secure? **Please select the correct answer:**


Correct answer: Yes

5. **Storage privacy is about preventing non-authorized parties from accessing the stored data. This can be achieved only when encryption and decryption operations are carried out locally, not by remote service, because both keys and data must remain in the power of the data owner.**

Alice assumes that if any storage privacy is to be achieved, then data must be stored locally and cloud storage should be avoided.

**Do you agree with the assumption of Alice? Please select the correct answer:**


Correct answer: No.

6. **Alice's institution runs a recruitment office and for that purpose it collects CVs and keeps records of persons seeking employment.** They keep recruitment application forms and interview notes (for unsuccessful candidates) for 5 years in case they need them, without taking any measures to update the CVs.

Alice doubts that the storage period is proportionate to the purpose of finding employment and thinks that this is not compliant with GDPR. Do you agree with Alice?

You may review "For how long can data be kept and is it necessary to update it? | European Commission (europa.eu)".

### **Please select the correct answer:**


Correct answer: Yes.

7. **Alice is trying to understand the rights for data subjects described in GDPR. She reviews "Data protection and online privacy – Your Europe (europa.eu)" and "It's your data – take control – Data protection in the EU (europa.eu)".**


**Help Alice match the cases to the appropriate individual right.**

Correct answer: A4 – B1 – C2 – D3 – E6 – F5.

8. **Alice's institution's recruitment office decides to implement an innovative recruitment procedure which includes e-recruitment tools that automatically pre-select/exclude candidates without human intervention. Alice thinks that a Data Protection Impact Assessment (DPIA) is required.**

Study the "Decision of the European Data Protection Supervisor of 16 July 2019 on DPIA Lists issued under Articles 39(4) and (5) of Regulation (EU)" and **select** the "**Criteria for processing 'likely to result in high risk'**", that will trigger DPIA in the case of Alice's institution new recruitment procedure (select 3 criteria).

**Which are the criteria for processing "likely to result in high risk"?**


Correct answer: 1, 2, 8.

### 9. **According to Professor Joel Reidenberg, in the video "Protecting Student-Data Privacy: An Expert's View", the worst that could happen because of bad data practices is:**


Correct answer: B.

### 10. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response in the following reflective task. You may reflect on:


# **2.4 Concluding Self-Assessed Assignment**

# *2.4.1 Introduction*

Both Alice and you have come a long way in your understanding of the power of educational data as a key success factor for online and blended teaching and learning, as well as of the fundamentals of Educational Data Collection and Management, including issues related to ethics and privacy.

You are now ready to develop further your Educational Data Literacy Competences focusing on Educational Data Analysis, Comprehension and Interpretation.

In order to proceed, you are requested to complete a concluding self-assessed assignment. This self-assessed assignment is a real-life scenario activity (based on the use case of our teacher Alice), using a rubric across three proficiency levels and an exemplary solution rating. When you have completed this assignment, you will assess it yourself, following the rubric, which lists the required criteria and gives guidelines for the assessment.

This self-assessed assignment procedure consists of 5 steps:


# *2.4.2 Step 1. Real Life Scenario*

Alice is an enthusiastic English Language teacher who has just been appointed to an Experimental High School in Athens, Greece. She wants to use student data to gain insights and plan her teaching activities accordingly, so as to improve this year's Grade 9 students' academic performance.

Alice contacts Mr. Adams, the school's appointed Data Protection Officer (DPO), to secure all necessary approvals for the sources handled by her school or by the corresponding district. As soon as Alice signs the required data protection consent form, she gets permission and downloads the datasets from the various sources.

Alice also requests access to the LMS used by the school (a new teacher account is created by the LMS administrator). Before implementing her flipped classroom strategy, she contacts the school's DPO again to discuss any legal and ethical issues she needs to pay attention to. As advised by the DPO, she accesses the LMS and, via the "*User agreements page*", reviews the existing user agreements and confirms that **signed informed consent** has been given for all participating students (either parental consent on behalf of minors or directly by the students, as defined by the National Data Protection Authority).

Alice realises that she must update the current consent form based on the new General Data Protection Regulation policy.

**You need to help Alice to prepare a new consent form for the students participating in her flipped classroom model.**

# *2.4.3 Step 2. Getting Familiar with the Assessment Rubric*

Alice reviews the **Initial Consent Form**.

Please help Alice to evaluate this Initial Consent Form using the **Rubric for assessing the Consent Form** and to identify potential issues.

**ACTIVITY/PRACTICE QUESTION (Reflect on)** We encourage you to elaborate on your response about the evaluation of the Initial Consent Form created by Alice, in the following reflective task. You may reflect on:


### **2.4.3.1 Initial Consent Form**

### Introduction

Welcome to Athens Experimental High School (the "School" or "We") Learning Management System (LMS). The School provides this LMS to you subject to the following Terms of Use and Privacy Policy (together, the "Terms"). When you use this LMS, you agree to abide by these Terms. If you do not agree to abide by these Terms, you may not use this LMS. Please read the Terms carefully.

The School reserves the right to make changes to this LMS and to modify the Terms at any time at its sole discretion. We encourage you to review the Terms frequently for modifications. By your use of this LMS, you agree to abide by any such modifications to the Terms, which are binding on you.

### Privacy Policy

This Privacy Policy describes the School's agreement with you regarding how we will handle certain information on the LMS. This Privacy Policy does not address information obtained from other sources such as submissions by mail, phone or other devices or from personal contact. By accessing the LMS and/or providing information to the School on the LMS, you consent to the collection, use and disclosure of certain information in accordance with this Privacy Policy.

### *Information Collected on Our LMS:*

If you merely download material or browse through the LMS, our servers may automatically collect certain information from you which may include: (a) the name of the domain and host from which you access the Internet; (b) the browser software you use and your operating system; and (c) the Internet address of the website from which you linked to the LMS. The information we automatically collect may be used to improve the LMS to make it as useful as possible for our visitors; however, such information will not be tied to the personal information you choose to provide to us.

We do collect and keep personally identifable information when you choose to voluntarily register to the LMS and submit such information. After your registration, we retain the information you submit for our records and to contact you from time to time. Please note that if we decide to change the manner in which we use or retain personal information, we may update this Privacy Policy, at our sole discretion.

### *Disclosure of Personal Information to Third Parties:*

The School does not rent or sell personal information that you choose to provide to us, nor does the School disclose credit card or other personal financial information to third parties other than as necessary to complete a credit card or other financial transaction or as required by law. The School does engage certain third parties to perform functions and provide services, including, without limitation, hosting and maintenance, customer relationship, database storage and management, payment transaction and direct marketing campaigns. We will share your personal information with these third parties, but only to the extent necessary to perform the functions and provide the services, and only pursuant to binding contractual obligations requiring such third parties to maintain the privacy and security of your data.

### *Receiving Promotional Materials:*

We may send you information or materials such as newsletters, ebooks, whitepapers by e-mail or postal mail when you submit your address via the LMS. By your registration in the LMS, you are consenting to our sending you such information or materials.

If you do not want to receive promotional information or material, please send an email with your name, mailing address and email address to athens.expschool.online@gmail.com. When we receive your request, we may take reasonable steps to remove your name from such lists.

### *Cookies*

A cookie is a small text file that a website can place on your computer's hard drive for record-keeping or other administrative purposes. Our LMS may use cookies to help to personalise your experience on the LMS. Although most web browsers accept cookies automatically, usually you can modify your browser setting to decline cookies. If you decide to decline cookies, you may not be able to fully use the features of the LMS. Cookies may also be used at certain sites accessible through links on the LMS.

### *Links to Other Websites:*

The School is not responsible for the practices or policies of the websites linked to or from the LMS, including without limitation their privacy practices or policies. If you elect to use a link that accesses another party's website, you will be subject to that website's practices and policies.

### Terms of Use

### *For Informational Purposes Only*

The School makes available the information on this Website for informational purposes only. You are solely responsible for the information you provide on this Website and for the information you use that you view on this Website. Information on this Website is not intended to be a replacement for direct consultation with the School; if you have questions or concerns, please contact the School directly.

### *Copyright and Trademark Information*

The content included on this LMS, such as data, text, graphics, logos, images and software and its compilation, is the property of the School and/or its content suppliers and is protected by copyright and trademark laws. In the event you upload any content including, without limitation, photographs or videos to this LMS, you (i) represent to the School and its affiliates that you have all rights necessary to upload the content; (ii) agree to indemnify the School and its affiliates for any third party infringement or other claims related thereto; and (iii) hereby license to the School and its affiliates a perpetual non-cancellable royalty-free license to use such uploaded content for any purposes in any media now existing or hereafter developed.

### *License for Your Use*

For any period of time that you use this LMS and abide by these terms, the School grants to you a limited, revocable and nonexclusive license to access this LMS for your use, but not to copy, download or modify it, or any portion of it, except with the express written consent of the School. This LMS or any portion of this LMS may not be reproduced, duplicated, copied, sold, visited or otherwise exploited without the express written consent of the School. You may not utilize framing to enclose any trademark, logo, content or other proprietary information contained on this LMS without the express written consent of the School. You may not use any meta tags or any other "hidden text" utilizing the School or its affiliates' name or trademarks without the School's express written consent.

You agree to use this LMS only for lawful purposes, and you acknowledge that your failure to do so may subject you to civil or criminal liability. You are responsible for ensuring that any materials you upload, post or submit to this LMS do not violate the copyright, trademark, trade secret or other personal or proprietary rights of any third party and you hereby agree to indemnify the School for any third party infringement or personal rights claims. You agree not to disrupt, modify, or interfere with this LMS or its associated software, hardware and servers in any way and you agree not to impede or interfere with others' use of this LMS. You further agree not to alter or tamper with any information or materials on or associated with this LMS. Any unauthorized use or violation of these terms automatically terminates any permission or license granted by the School to access and use this LMS.

### *External Links*

This LMS may provide links or references to third party websites or applications, including without limitation, third party websites or applications of advertisers or of providers of informational articles or other users. The School is not responsible for any information you choose to provide to those third party websites or applications; any information, products or services you acquire from those third party websites or applications, or any damages arising from your access to or use of those third party websites or applications.

Any links to third party websites and applications are provided as a convenience to the visitors of this LMS and any inclusion of any such links in this Website does not imply an endorsement or warranty of the third party websites or applications or their security, content, products, offerings or services. You are cautioned that any third party websites or applications are governed by their own terms of use and privacy policies, so when linking you should make sure to visit the appropriate pages of those third party websites or applications to determine what terms of use and privacy policies will apply to your use.

### • **YES, I GIVE CONSENT FOR MY CHILD TO PARTICIPATE IN THE ONLINE COURSE AND AGREE TO THE TERMS AS NOTED ABOVE.**

• **NO, I DO NOT GIVE CONSENT FOR MY CHILD TO PARTICIPATE IN THE ONLINE COURSE.**

Adapted from: *https://www.whitbyschool.org/privacy-policy*


### **2.4.3.2 Rubric for Assessing the Consent Form**

(continued)


# *2.4.4 Step 3. Prepare Your Answer*

Please assist Alice in preparing a consent form for the students participating in the online course for the flipped classroom initiative.

**ACTIVITY/PRACTICE QUESTION (Reflect on)** We encourage you to elaborate on your response about the preparation of the consent form for Alice's students participating in the online course for the flipped classroom initiative, in the following reflective task. You may reflect on:


# *2.4.5 Step 4. Review a Sample Solution*

Please review a sample **Exemplary solution** that follows the criteria specified in the **Rubric for assessing the Consent Form.**

**ACTIVITY/PRACTICE QUESTION (Reflect on)** We encourage you to elaborate on your response about the Exemplary solution that follows the criteria specified in the Rubric for assessing the Consent Form, in the following reflective task. You may reflect on:

1. *Do you identify any GDPR requirements that you did not take into consideration when creating your consent form?*

### **2.4.5.1 Exemplary Sample Solution**

### **Consent Form to Register and Participate in the Online Course for the English Language Course of the ninth Grade of Athens Experimental High School.**

In order to register and participate in the online course that will be offered for the English Language Course of the ninth Grade, you are invited to indicate your consent for the collection and processing of your personal data for the purposes of the online course, administered by Athens Experimental High School.

Athens Experimental High School (or "we") uses a variety of resources to support student learning. Moodle™ software has been adopted as Athens Experimental High School's Learning Management System (LMS). Moodle™ software is free and open source, and allows educators to create a private space online, filled with tools that make it easy to create courses and various activities, all optimised for collaborative learning. In order to provide our students with access to the online course for the English Language Course of the ninth Grade on this platform/site, we need to collect and store personal information about them. You may also refer to https://moodle.com/privacy-notice/.

### *Please note:*


person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data; where the purposes and means of such processing are determined by Union or Member State law, the controller or the specific criteria for its nomination may be provided for by Union or Member State law.

### 4. The Data Controller for data processed under this Notice is:

Athens Experimental High School (VAT 021 27 76 45).

20 Makrygianni Road.

11676 Athens.

Greece.

email: athens.expschool.online@gmail.com

# **Legal basis for processing the personal and sensitive data:**

### **Personal Data:**

In connection with this online course, the Athens Experimental High School's collection and processing of the following Personal Data is lawful based on:

Article 6.1(a), GDPR, Consent.

Article 6.1(b), GDPR, Contract.

Article 6.1(c), GDPR, Legal Obligation.

Article 6.1(f), GDPR, Legitimate Interest:

□ Name, Surname, Email Address.

□ User activity and contribution data.

### **Sensitive Data:**

In connection with this online course, the Athens Experimental High School's collection and processing of the following Sensitive Data is lawful based on consent (Article 9.2(a), GDPR):

□ Gender.

### **Potential Benefits:**

Participation in this online course enables data subjects (students) to collaborate effectively with their peers, and tutor(s) to collect data and efficiently provide resources, timely feedback and differentiated learning opportunities.

### **Potential Risks or Discomforts:**

We do not perceive any risk or discomfort in participating in the online course.

### **Storage of Data:**

The installation of the Moodle™ software platform is hosted on a secure server at Athens Experimental High School's premises. The collected data is also stored on this secure server for the time required by the purposes described in this notice, for a maximum of 5 years.

### **Data transfer outside the European Union:**

We may share some of the data collected with services located outside the European Union, in particular through the aforementioned Moodle™ software services.

### **Right to Withdraw:**

Your participation in this online course is voluntary. You are under no obligation to participate in this online course and you may withdraw consent at any time, without being at a disadvantage, by contacting the Athens Experimental High School Data Controller for this online course at athens.expschool.online@gmail.com.

### **Rights of Data Subject:**

Whilst Athens Experimental High School is in possession of or processing your personal data, you, the data subject, have the following rights:


by contacting the Athens Experimental High School Data Controller for this online course at athens.expschool.online@gmail.com.

If the Athens Experimental High School's use of your information is pursuant to your consent, you have the right to withdraw consent without affecting the lawfulness of the Athens Experimental High School's use of the information prior to receipt of your request.

If you think your data protection rights have been breached, you have the right to lodge a complaint with the Athens Experimental High School Data Controller for this online course at athens.expschool.online@gmail.com and/or your national Data Protection Authority (DPA).

### **Data Subject Concerns and Reporting:**

If you have any questions concerning the online course or experience any discomfort related to the online course, please contact the Athens Experimental High School Data Controller for this online course at athens.expschool.online@gmail.com.

### **Conflict of Interest**

We do not perceive any conflicts of interest in the development of this online course.

### **Compensation:**

There is no compensation for data subjects in this online course.

### **Confidentiality:**

The only people processing your data will be the tutor(s) involved in the Athens Experimental High School's online course(s). The tutor(s) undertake to keep any information provided herein confidential, not to let it out of our possession, and to report on the findings from the perspective of the entire participating group and not from the perspective of an individual. Please note that confidentiality cannot be guaranteed while data is in transit over the Internet.

### **Purposes for which the data is being collected and processed:**

The data which is collected and processed via the online course in the Course Management System (Moodle) is being used by the Athens Experimental High School to facilitate teaching and learning. For this, online teaching resources are uploaded where the data subjects (students) enrol and study the lecture material at home. The material is in the form of videos, small activities with automatic feedback (online quizzes), and forum discussions. The data subjects (students) can undertake some additional homework online to further check their understanding and extend their learning. Through this online course and via the usage of CMS tools, the tutor(s) monitor the data subjects' (students') learning process, discover patterns, find indicators for success and indicators for poor marks or drop-out, and proceed with recommendations and revisions of the course's online learning activities and educational resources, aiming to improve data subjects' (students') academic performance.

We ensure that the information we collect, process and use is appropriate for these corresponding purposes.

By indicating consent to participate in this online course you also indicate consent for the possible use of data for automated decision making, such as profiling, to identify data subjects' (students') progress against a range of indicators and activities identified to have an impact on data subjects' (students') success in the online course.

### **Consent to register and participate in the Online Course for the English Language Course of the ninth Grade of Athens Experimental High School. Selecting "YES, I AGREE" below indicates that:**


# *2.4.6 Step 5. Self-Evaluate Your Answer*

Now that you have seen the **Exemplary sample solution**, please rate your initial answer (evaluate the consent form you created), using the criteria in the **Rubric for assessing the Consent Form**.

### **Language**


### **Explicit and Distinguishable**


### **Freely given consent**


### **Possibility to withdraw the given consent**


### **Rights of the data subject**


### **Identity of the organisation processing data**


### **Purposes for which the data is being processed**


### **Describes the type of data that will be processed**


### **International transfer of data**


# **References**


Regulation). Official Journal L. 119. Retrieved from https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex:32016R0679


# *Useful Video Resources*

External Video: Data wrangling for faster, more accurate analysis [1:47].

External Video: Meta… what? metadata! [5:25].

External Video: Learn more about data interoperability [1:12].

External Video: ICPSR 101: What is data curation? [1:29].

External Video: Data curation @UCSB [2:29].

External Video: Why digital preservation is important for everyone [2:51].

External Video: Public cloud vs private cloud vs hybrid cloud [3:28].


# *Further Readings*


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 3 Learning Analytics**

# **3.1 Introduction and Scope**

# *3.1.1 Scope*

The goals of this chapter are to:


# *3.1.2 Chapter Learning Objectives*



# *3.1.3 Introduction*

At the heart of this chapter lies **Learning Analytics.** Learning analytics has been a hot topic for a while in educational communities, organizations and institutions. There are four essential elements involved in all learning analytics processes: **data**, **analysis**, **report** and **action** (Fig. 3.1).


4. Action is the ultimate goal of any learning analytics process; it is the **set of the informed decisions and the practical interventions** that the **educational stakeholders will undertake**. The results of **follow-up actions** will determine the success or failure of the analytical efforts. Learning analytics is useful **only if there is "action"** as a result of its implementation.

The increased need to **inform decisions and take actions based on data** points out the significance of understanding and **adopting learning analytics in everyday educational practice**. And in order to treat educational data in a respectful and protected manner, the **policies for learning analytics** play a major role and need to be explicitly clarified.

# **3.2 Using Learner-Generated Data and Learning Context for Extracting Learning Analytics**

# *3.2.1 Definition and Objectives of Learning Analytics*

Learning analytics is defined by SOLAR as "*the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs*" (SOLAR, 2011). In other words, it is an **ecosystem of methods and techniques** (in general, procedures) that successively **gather, process, report and act** on machine-readable data **on an ongoing basis** in order to **improve** the **learning environments and experience**.

As described in the Learning Analytics video (in the useful video resources), like any other context-aware process, learning analytics procedures **track** and **record data** about learners and their contexts, **organize** and **monitor** them, and **interpret** and **map** the real current state of those data, **to use them** for providing **"actionable intelligence"**, i.e., insights to act upon.

Based on the shared common understanding of learning analytics, it is important to clarify and discuss what learning analytics can do, what they can be used for, why one needs to use learning analytics, or in other words, what are the **objectives** of learning analytics? Some simple examples from everyday experience can showcase those objectives.


• In *blended learning settings*, the students might not know how to **self-regulate their learning,** and they often **procrastinate**. It's also hard to **monitor each student's progress** and provide **feedback** accordingly.

These deficiencies can be identified immediately with learning analytics. More specifically, learning analytics aims to achieve the following objectives (Chatti et al., 2012; Papamitsiou & Economides, 2014), illustrated in Fig. 3.2:


Overall, **learning analytics are important** because *every "trace"* within an electronic learning environment may be valuable information that can be *tracked, analyzed and combined with external learner data*; every simple or more complex action within such environments can be *isolated*, *identified* and *classified* through computational methods *into meaningful patterns*; every type of interaction can be *coded into behavioural schemes* and *decoded into interpretable guidance* for decision making.

**Fig. 3.2** The objectives of learning analytics

# *3.2.2 Measurements as Indicators of Learners' Current Learning States*

Learning analytics seeks to produce **"actionable intelligence"**; the key is the action that is taken. Campbell and Oblinger (2007) have pointed out **five steps** in learning analytics: **Capture, Report, Predict, Act, Refine**. From (a) capturing and gathering the raw data, to (b) introducing metrics for sharing a common understanding of the data in educationally meaningful ways, to (c) analyzing the metrics for predicting the future states of the learners, to (d) gaining insights into the learning processes, and to (e) acting upon the data-based evidence for delivering personalized learning to each individual, the **cyclical process of learning analytics is fed with the continuously generated learner data,** as illustrated in Fig. 3.3.

**Learning analytics are about learners and their learning**. As such, Clow (2012) proposed a cycle for learning analytics that starts with **learners**. The next step is the generation and capture of **data** about or by the learners. The third step is the **processing of this data into metrics** or analytics, which provide some insight into the learning process. The cycle is not complete until these metrics are used to drive one or more **interventions** (actions) that have some effect on learners.

This learning analytics cycle can provide **a data-perspective to strong learning theories**. For instance, the cycle can be viewed as a data-driven aspect of Kolb's Experiential Learning Cycle (1984): taking the system as a whole, there is a direct correspondence: actions by or about learners (*concrete experience*) generate data (*observation*) from which metrics are derived (*abstract conceptualization*), which are used to guide an intervention (*active experimentation*). The role of the learner is fundamental in this process. And, since learning analytics are extracted from the learners' and learning data, two steps need to be clarified: (a) **what is the learner's data that will be used in learning analytics**, and (b) **what types of learning analytics can be formed from the learner's data**.

As already explained, learning analytics is a cyclical process. Learners generate data that can be processed into metrics and analyzed for patterns such as success, weakness, overall personal or comparable performance, and learning habits. Educators can administer "interventions" based on the data analyzed, and the process then repeats itself.
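The cycle described above can be sketched in a few lines of code. This is a minimal, illustrative sketch only: the function names, the event records, and the at-risk threshold are invented for the example, not taken from any real LMS API.

```python
# Sketch of the learning analytics cycle: learners generate events,
# events become data, data become metrics, metrics drive an intervention.

def capture(events):
    """Group raw click-stream events by learner ID (the 'data' step)."""
    data = {}
    for learner_id, action in events:
        data.setdefault(learner_id, []).append(action)
    return data

def to_metrics(data):
    """Process raw data into a simple metric: activity count per learner."""
    return {learner: len(actions) for learner, actions in data.items()}

def intervene(metrics, threshold=2):
    """Flag learners whose activity falls below an illustrative threshold,
    so the educator can follow up (the 'action' step)."""
    return [learner for learner, count in metrics.items() if count < threshold]

# Invented example events: (learner ID, action)
events = [("anna", "login"), ("anna", "quiz"), ("ben", "login"),
          ("anna", "forum_post")]
metrics = to_metrics(capture(events))   # {"anna": 3, "ben": 1}
at_risk = intervene(metrics)            # ["ben"]
```

In a real deployment the intervention would feed back into the next iteration of the cycle, as the text describes: the flagged learners receive support, generate new data, and the metrics are recomputed.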

Before beginning to analyze data, one should **understand what data are collected**, and **why they need to be collected:** data collection should have **specific objectives and outcomes**. The collected data on their own cannot give meaningful insights, unless they are associated with specific measurements, **depending on what one wants to measure:** learning outcomes, goal attainment, performance, behavioural changes, engagement, motivation, cognition, abilities, emotions, etc. **Metrics are what one measures, the measurements**.

There are many types of data that support student learning – and they are so much more than test scores. The type of information educational data often include and the sources they can be collected from are usually linked in a straightforward way. For instance, **student characteristic data and/or contextual information** are usually collected from enrolment records, student profiles, or attendance rolls; **student perception data** can be found in surveys and interviews; **student activity data** are available in logs from the LMS and interaction records; **student achievement data** lie within various kinds of assessment data such as rubrics, scores or observation notes; **student wellbeing data** capture students' social and emotional development, or school climate, and can be found in sources such as biosignals or social networks. Educational data and the respective data sources are explained in Chap. 1.

But **individual data points don't give the full picture** needed to support the incredibly important education goals of parents, students, educators, and policymakers. The What Is Student Data? video (see useful video resources) explains in simple terms what student data is about and when it can be used effectively. As explained in this video and in Chap. 1, there are learner and context data that can be captured **within the learning environment** (e.g., log-files, quiz scores, login data, content access, file downloads, discussion participation, etc.), and there are also other types of data that are **external to the learning environment** (e.g., survey-demographic data, biosensor data, online discussion forums, social network data, etc.). In addition, **aggregating/integrating different data sources** to **increase validity and relevance, and to reduce biases (improve reliability)** is also important. Once one understands what data need to be collected, one will be able to **locate and select the most appropriate data sources to extract them from**. Those data will **feed the learning analytics cycle**.
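Aggregating data sources can be as simple as joining records by learner ID. The sketch below is a hedged illustration: the field names (`logins`, `quiz_avg`, `motivation`) and values are invented, and a real integration would also handle mismatched IDs and missing records.

```python
# Illustrative sketch: merge per-learner records from two sources --
# LMS activity captured inside the learning environment, and survey
# responses captured outside it -- into one profile per learner.

lms_activity = {"s01": {"logins": 14, "quiz_avg": 0.82},
                "s02": {"logins": 3,  "quiz_avg": 0.55}}
survey = {"s01": {"motivation": "high"},
          "s02": {"motivation": "low"}}

def integrate(*sources):
    """Merge per-learner dictionaries from several sources by learner ID."""
    merged = {}
    for source in sources:
        for learner_id, fields in source.items():
            merged.setdefault(learner_id, {}).update(fields)
    return merged

profiles = integrate(lms_activity, survey)
# profiles["s02"] -> {"logins": 3, "quiz_avg": 0.55, "motivation": "low"}
```

The merged profile gives a fuller picture than either source alone: low login counts *together with* self-reported low motivation are more reliable evidence than either signal by itself.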

It has been explained in previous sections what student data are about and how they can be combined together to show the whole picture of student learning, which is deeply related to the context itself. Learning analytics is a context-aware process. **Both learner and context data are necessary** in this process. Different types of **data can come together** – under different objectives – **to form a full picture of student learning**. When used effectively, data empowers everyone. The first step is to **understand why one is collecting data** and **associate the data with metrics** according to the learning concept one **aims to measure and shed light on**. Each of **these measurements**, referred to as **learning analytics metrics, can be associated with one or more learning analytics objectives** (see Sect. 3.2.1), summarized in Fig. 3.4.

To understand that, let's take the following simple and generic example. How many views make an educational YouTube video a success? How about 300 K? That's how many views a video you posted got. It featured some well-known and successful professionals, who prompted young people to enrol in a Data Science course. It was twice as popular as any video you had posted to date. Success! Then came the data report: only eight viewers had signed up to take the course, and zero actually completed it. Zero completions. From 300 K views. Suddenly, it was clear that views did not equal success. In terms of completion rates, the video was a complete failure. **What happened?**

Well, not all important things in life can be measured and not everything that can be measured is important. **If one is measuring something, but not necessarily all the right things**, the end result could still not be right, or one is relying on the wrong data to make the case. The critical question is *which measurements are the "right" ones*. There is a difference between numbers and numbers that matter. This is **what separates data from metrics**. One can't control the educational data one is collecting, but one can control what one measures. When we talk about learning analytics metrics and measurements, we're typically referring to gathering data on three areas: *efficiency*, *effectiveness*, and *outcome* (Robbins, 2017), illustrated in Fig. 3.5.


*Learning efficiency* refers to more granular metrics, closer to raw data; their objective is to describe learners' actions at the task or activity level (**micro-level**), and they cannot sufficiently reveal a lot about learning (as a more general objective) on their own. Combining these metrics can contribute to understanding more complex learning constructs, such as engagement and collaboration. The metrics used to refer to this **meso-level** (activity or course) of more abstract and complex concepts are synopsized under the *learning effectiveness* metrics, and their objective is to quantify less fine-grained constructs. Finally, *learning outcome* can be described with metrics from the previous categories that are combined to give insight into and explain the results of the learning processes (**macro-level**).

**Fig. 3.4** Associating data to learning analytics metrics and objectives

**Fig. 3.5** Categories of learning analytics metrics
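To make the micro-to-meso step concrete, here is a minimal sketch of combining micro-level metrics into a single meso-level "engagement" indicator. The weights, the assumed maxima, and the very idea of a weighted sum are illustrative assumptions, not a standard formula; real engagement models are usually validated empirically.

```python
# Sketch: combine micro-level metrics (time on task, quiz attempts,
# resource downloads) into one meso-level engagement score in [0, 1].
# Weights and per-metric maxima are invented for illustration.

def engagement_score(time_on_task_min, attempts, downloads,
                     weights=(0.5, 0.3, 0.2)):
    """Normalise each micro-metric against an assumed course maximum,
    then combine the normalised values with illustrative weights."""
    maxima = (120, 10, 20)  # assumed maxima: minutes, attempts, downloads
    values = (time_on_task_min, attempts, downloads)
    normalised = [min(v / m, 1.0) for v, m in zip(values, maxima)]
    return round(sum(w * n for w, n in zip(weights, normalised)), 2)

score = engagement_score(time_on_task_min=60, attempts=5, downloads=10)
# each metric normalises to 0.5, so the weighted sum is 0.5
```

The point of the sketch is the structure, not the numbers: micro-level measurements are only interpretable at the meso-level once an explicit (and contestable) combination rule is chosen.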

**Depending on the goals** (i.e., the learning analytics objective), the learning analytics metrics will be **obtained from the same or different learner and context data**. The types/levels of the metrics will be decided according to their sophistication, the complexity of the analysis method employed, and the value they add for human decision-making (Lang et al., 2017; Scapin, 2015; Soltanpoor & Sellis, 2016):


Figure 3.6 illustrates the types of learning analytics based on their complexity and value for decision-making.

# **Questions and Teaching Materials**

	- (a) **What information to use? What has happened? What is likely to happen?**
	- (b) **What information to use? How is it gathered? What is likely to happen?**
	- (c) **What information to use? How is it gathered? How is it combined?**
	- (d) **What information to use? How is it gathered? What has happened?**

Correct answer: c.

	- (a) **Monitor progress, predict performance, create content, facilitate self-regulation**
	- (b) **Predict dropout, increase self-awareness, guide adaptation, detect emotions**
	- (c) **Generate feedback, model learners, support game-based learning, predict retention**
	- (d) **Evaluate learning, provide recommendations, assess collaboration, increase effort**

Correct answer: b.

	- (a) **LA helps teachers develop more appropriate interventions and learning opportunities for target learners (e.g., experiential learning)**
	- (b) **LA helps learners become aware of their progress on different tasks by combining the learning data that are generated during the process (e.g., self-regulated learning)**
	- (c) **LA make use of data generated by learners' online activity to identify behaviours and patterns within the learning environment that signify effective process (e.g., social learning).**
	- (d) **All the above**

Correct answer: d.

	- (a) **Survey-demographic data, biosensor data**
	- (b) **Gender, socioeconomic status, special education needs**
	- (c) **Test scores, educational fle downloads, educational content access**
	- (d) **Enrolment records, emotional development, social network data**

Correct answer: c.

	- (a) **Data are measurements (numbers/calculations) to help make decisions about how to move forward, whilst metrics are indicators of progress and achievement**
	- (b) **Data is the set of raw numbers or calculations gathered, whilst metrics are proxies for what ultimately matters (i.e., what we measure)**
	- (c) **Data is a mapping of observations into numbers, whilst metrics are numerical approximations of objectives**
	- (d) **Data is the raw measurements, whilst metrics are trends in the data**

Correct answer: b.

6. **Consider the following metrics: time on task, frequencies of resource downloads, quiz scores, attempts, hint usage. What category of metrics are they?**


Correct answer: a.

	- (a) **Descriptive analytics**
	- (b) **Diagnostic analytics**
	- (c) **Predictive analytics**
	- (d) **Prescriptive analytics**

Correct answer: b.

### 8. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response on data collection in the following reflective task.

You may refect on:


# *3.2.3 Limitations and Data Quality Issues of Learners' Data Measurements in Open and Blended Courses*

As already explained in Chap. 1, data often suffer from inaccuracies, biases or even manipulations; the educational data, apart from being **relevant** to be used for decision making (fit-for-purpose), should also be **reliable** and **valid**. According to Wikipedia (Data Quality, 2022), **data is generally considered high quality** if it is "*fit for [its] intended uses in operations, decision making and planning and data is deemed of high quality if it correctly represents the real-world construct to which it refers*."

**Fig. 3.7** The dimensions of data quality

Like in all kinds of organizations, data quality is critical for educational institutes as well. In online and blended learning settings, many factors add to the existing difficulty in **handling educational data quality:** for example, heterogeneous educational data sources, high volumes of learner and learning data, and a myriad of unstructured data types. The Data Quality Matters – Tech Vision 2018 Trend video (see useful video resources) explains the critical issues of data quality from a more general perspective. As discussed in this video, there are many aspects to data quality, including **completeness, consistency, accuracy, timeliness, validity,** and **uniqueness,** synopsized as follows (Mihăiloaie, 2015; Pipino et al., 2002) and illustrated in Fig. 3.7:


Among the six dimensions, completeness and validity are usually easy to assess, followed by timeliness and uniqueness. Accuracy and consistency are the most difficult to assess. The critical question is how those data limitations relate to learning analytics and why quality matters. Here, we will focus on how these principles/limitations apply in learning analytics.

Specifically, in the learning analytics cycle, learner and contextual data are collected and transformed into metrics (analytics), according to the learning objective that needs to be addressed; the different types of metrics shall next guide human decision-making and interventions. Yet, *the higher the need for data-driven decision-making, the more critical the integrity and quality of data become* (National Forum for the Enhancement of Teaching and Learning in Higher Education, 2017). The following example demonstrates in simple terms the impact of data limitations and quality on learning analytics.

Let's examine the case of an educator who wants to understand learners' engagement with an activity. To measure engagement at the activity level, it is common practice to use learners' participation data (e.g., frequency of logins, session duration, posts on the activity forum, etc.). If the learners' IDs are missing from the data that are available via the LMS (the data are *incomplete*), then the educator will not be able to identify each learner's participation. Similarly, if each learner's data were stored in different formats (e.g., dates: MM/DD/YY vs. DD/MM/YY), this would result in confusion about the validity of the data and their interpretation (the data are *not valid*). In the same example, this *inconsistency* in the data format would also result in *inaccurate* data – *when did the learner really log in to the activity?* – i.e., it would be unclear *what the correct values* of the stored data *are*. Furthermore, if the learners' data during the activity did not become available in time, the educator would not gain insight into what the learners are doing during that activity (violation of *timeliness*), making it impossible to intervene in a timely manner. Similarly, if the same learners' data are stored multiple times (e.g., each time a learner logs in to the activity, the login is duplicated) and all the information is considered for analysis, the results would be misleading (violation of *uniqueness*).
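Several of the quality violations in the example above can be detected mechanically. The sketch below illustrates simple checks for completeness (missing learner IDs), consistency (mixed date formats), and uniqueness (duplicated login records); the log records and the expected timestamp format are invented for the example.

```python
# Illustrative quality checks on an invented LMS login log.
from datetime import datetime

log = [{"id": "s01", "login": "2022-03-01 09:15"},
       {"id": None,  "login": "2022-03-01 09:20"},   # incomplete: no ID
       {"id": "s02", "login": "01/03/2022 10:05"},   # inconsistent format
       {"id": "s01", "login": "2022-03-01 09:15"}]   # duplicate record

# Completeness: records that cannot be attributed to a learner.
incomplete = [r for r in log if r["id"] is None]

# Consistency: records whose timestamp deviates from the expected format.
def consistent(record, fmt="%Y-%m-%d %H:%M"):
    try:
        datetime.strptime(record["login"], fmt)
        return True
    except ValueError:
        return False

inconsistent = [r for r in log if not consistent(r)]

# Uniqueness: repeated (learner, timestamp) pairs that would inflate counts.
seen, duplicates = set(), []
for r in log:
    key = (r["id"], r["login"])
    if key in seen:
        duplicates.append(r)
    else:
        seen.add(key)
```

Each list flags exactly one of the four records, mirroring the three violations described in the example; in practice such checks would run before any metric is computed, so that flawed records never reach the decision-making stage.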

It is important to clarify that raw data quality strongly affects the analytics quality; learning analytics metrics are **transformations of** the raw learner and learning **data** collected, according to the objectives set. These metrics will next be **treated as "data"** themselves, and they will be subjected to further processing. Just like with any kind of data, **quality also matters for learning analytics metrics**: what the specific metrics **can reveal** is strongly dependent on their quality. In most cases, limited quality will have the direct result of **lack of trust** in the metrics, and consequently, **poor decisions** and **gradual abandonment** of the data-driven educational decision-support system. Poor quality data is troublesome (The data quality benchmark report, 2015). Educators **cannot and will not trust insights** that are acquired by processing corrupted, duplicate, inconsistent, missing, broken, or incomplete data. Learning analytics metrics quality is expected to **increase the value of the learner and learning data** and the opportunities to use them properly.

**Fig. 3.8** Quality indicators for learning analytics. (Adapted from Scheffel et al., 2015)

The following approaches were developed to address exactly these concerns about **quality issues** in **learning analytics metrics.** In particular, the LACE project developed a proposal for a framework of quality indicators for learning analytics that contributes towards a standardized and holistic approach for the evaluation of learning analytics tools (Scheffel et al., 2015). It can potentially act as a means for providing evidence on the impact of learning analytics on educational practices. The suggested framework is *generic* and considers **multiple learning analytics aspects**, ranging from their objectives to organizational issues. For the measures and data aspects, the framework highlights *comparability, effectiveness, efficiency, and helpfulness, as well as transparency, data standards, data ownership, and privacy, respectively* (Fig. 3.8).

From a *more "data-oriented" approach to "quality"* aspects for *learning analytics metrics*, the above indicators can be combined and merged with those identifed before (illustrated in Fig. 3.7), as follows:


The "*quality indicators*" refer to how appropriate the learning analytics metrics are, how fit-for-purpose they are as data that will be used in the decision-making process in turn; the "condition" of the data themselves – the degree to which a set of characteristics of data fulfils requirements.

The "*ethics considerations*" refer to systemising, defending, and recommending concepts of right and wrong conduct in relation to data; they are considerations that tackle the potential for data misuse, and issues about the right, legitimate, and proper ways to use data. Ethics considerations are placed on top of quality indicators, since the latter are relevant to the data, whilst the former are relevant to the usage of the data (Fig. 3.9).

Like any kind of data, learning analytics metrics should be protected from misuse, mistreatment, or violations. The **quality of learning analytics** as (data) metrics themselves matters in terms of impacting the quality of the outcome as a data-driven decision. **Above all, it is important to control** who has access to those metrics, what can and cannot be done with the metrics, and for how long access is granted after the collection and analysis of the raw learning and context data occurs. Therefore, along with the learning analytics metrics quality indicators, the ethical limitations should be considered as well.

### **Questions and Teaching Materials**

### 1. **Data completeness refers to:**


Correct answer: c

	- (a) **Data consistency**
	- (b) **Data completeness**
	- (c) **Data accuracy**
	- (d) **Data timeliness**

Correct answer: a.

	- (a) **To evaluate how appropriate the learning analytics metrics are, how fit-for-purpose they are as data that will be used in the decision-making process in turn**
	- (b) **To contribute towards a standardized and holistic approach for the evaluation of learning analytics tools.**
	- (c) **To systemise, defend, and recommend concepts of right and wrong to tackle the potential for data misuse, and issues about the right, legitimate, and proper ways to use data**
	- (d) **To control who has access to the metrics, what can and cannot be done with the metrics, and for how long access is granted after the collection and analysis of the raw learning and context data occurs.**

Correct answer: b.

### 4. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response about learning analytics metrics limitations and quality, in the following reflective task:


# *3.2.4 Ethical Treatment of Learner-Generated Data and Measurements*

Learning analytics provides tremendous opportunities to assist learners – but they also pose ethical implications that shouldn't be ignored. The practical challenge of learning analytics metrics is the question of privacy of the learner and how to protect the learner from potential harm due to data misuse. Questions abound:


Towards addressing these issues, the Learning Analytics: The need for a code of ethics video (see useful video resources) elaborates on the need to establish a code of ethics for learning analytics. This code of practice aims to set out the responsibilities of educational institutions to ensure that learning analytics is carried out responsibly, appropriately and effectively, addressing the key legal, ethical and logistical issues which are likely to arise.

Slade and Prinsloo (2013) identified three broad classes of ethical issues: (a) the location and interpretation of data; (b) informed consent, privacy, and the de-identification of data; and (c) the management, classification, and storage of data. As we have explicitly explained, in the learning analytics cycle, data are collected about individuals and their learning activities, and metrics are constructed; the data will be analysed and interventions (might) take place. This entails opportunities for positive impacts on learning, as well as risks for misunderstandings, misuse of data and adverse impacts on students.

When learners perform learning tasks within a learning environment to increase their knowledge and develop skills and competences, they expect to receive support to overcome gaps in knowledge/competences. They also expect **to be in a "safe" environment** where their mistakes will be treated with respect, without serious consequences or unfair and unjustified discrimination against them, as individuals. Two critical issues are hidden in the implied "*safety*" of the learning environments: (a) the learners should feel **"secure"** and maintain the **"privacy"** of their data (integrity of the self), and (b) the learners' data should be treated in an **"ethical"** manner. Drachsler and Greller (2016) provided a clear differentiation between ethics and privacy: "*Ethics is the philosophy of morality that involves systematizing, defending, and recommending concepts of right and wrong conduct […] privacy is a living concept made out of continuous personal negotiations with the surrounding ethical environment*". The main ethics considerations are illustrated in Fig. 3.10 and are outlined as follows:

**Fig. 3.10** Ethical considerations in learning analytics


Ethics provides us with guides on what is the right thing to do in all aspects of life, while the law generally provides more specifc rules so that societies and their institutions can be maintained (Tsachuridou, 2015).

Over the past 5 years or so, a number of guidelines, codes of practice and policies have been developed in response to this. Slade and Prinsloo (2013) established one of the earliest frameworks with a focus on ethics in learning analytics. Others have followed, including JISC's code of practice in 2015, the **Learning Analytics Community Exchange (LACE)** framework in 2016 (Drachsler & Greller, 2016) and a learning analytics policy development framework for the EU by the **SHEILA project** (Tsai & Gasevic, 2017). More recently and in the light of the rapid development of Learning Analytics on a global basis, the **International Council for Open and Distance Education (ICDE)** has taken the initiative to produce a set of guidelines for ethically-informed practice that would be valuable to all regions of the world (March 2019).

To address the issues raised earlier in this section and **demystify the ethics and privacy limitations** around learning analytics, the LACE project published the **DELICATE instrument** to be used by any educational institution. The instrument includes **policies and guidelines** regarding privacy, legal protection rights or other ethical implications that address learning analytics. The DELICATE checklist helps to investigate the obstacles that could impede the rollout of learning analytics and the implementation of trusted learning analytics for higher education. The eight points are shown in Fig. 3.11 and include:


The EU SHEILA project focused on developing a learning analytics policy development framework for the EU under the 6 dimensions of the Rapid Outcome Mapping Approach (ROMA) (Ferguson et al., 2014; Macfadyen et al., 2014); the framework consists of 49 **action points**, 69 **challenges**, and 63 **policy questions.** The ROMA dimensions, as considered by the SHEILA framework, include: (1) The political context of an institution, i.e., identifying the 'purposes' for adopting learning analytics in a specific context; (2) The involvement of stakeholders, i.e., the implementation of learning analytics in a social environment involves collective efforts; (3) A vision of behavioural change and potential impacts; (4) Strategic planning, including resources, ethics & privacy, and stakeholder engagement and buy-in; (5) Institutional

**Fig. 3.11** The DELICATE checklist

capacity to effect change, i.e., assessing the availability of existing resources; (6) A framework to monitor and evaluate the efficacy and continue learning.

In addition, the ICDE report on Ethics in Learning Analytics identified several core issues that are important on a global basis for the use and development of Learning Analytics in ethics-informed ways. Those issues are shown in Fig. 3.12 and include:


**Fig. 3.12** Ethics in learning analytics based on ICDE report

accessed on a 'need-to-know' basis to facilitate the provision of academic and other support services.


### **Questions and Teaching Materials**

	- (a) **The location and secure storage of data**
	- (b) **Misinterpretation of data, or other data errors**
	- (c) **Informed consent, privacy, and ownership of student data**
	- (d) **The regulation about the purposes for which data will be collected and used**
	- (a) **Data privacy requires us to, at least conceptually, agree that you as the data subject own your data and the data you generate. Data ownership in itself does not necessitate that privacy be respected by default.**
	- (b) **Data privacy is the regulation of how personal data will be collected and used, under which conditions, and who will have access to data. Data ownership is the act of having legal rights and complete control over a single piece or set of data.**
	- (c) **Data privacy is the right of a citizen to have control over how personal information is collected and used. Data ownership is the regulation of how personal data will be collected and used, under which conditions, and who will have access to data.**
	- (d) **Data privacy is the act of having legal rights and complete control over a single piece or set of data. Data ownership defines and provides information about the rightful owner of data assets and the acquisition, use and distribution policy implemented by the data owner.**

Correct answer: a.

# 3. **Match the appropriate definition (from the right column), to the respective "point" in the left column**


Correct answer: 1.c / 2.g / 3.f / 4.h / 5.a / 6.d / 7.b / 8.e.

	- (a) **It was difficult to define ownership and responsibilities among professional groups within the university.**
	- (b) **The provision of opt-out options conflicts with the goal to tackle institutional challenges that involve all institutional members.**
	- (c) **Anonymised data could potentially be re-identified when matched with other pieces of data.**
	- (d) **All the above**

Correct answer: d.

	- (a) **The SHEILA framework is used to inform the development of policies for learning analytics, but strategies are not covered**
	- (b) **The DELICATE checklist addresses issues of power-relationship, data ownership, anonymity, data security, privacy, data identity, transparency, and trust.**
	- (c) **The ICDE report on Ethics in Learning Analytics identifies which core principles relating to ethics are core to all, unless there is legitimate differentiation due to separate legal or more broadly cultural environments.**
	- (d) **All the above**

Correct answer: b.

### 6. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response about learning analytics ethical considerations and policies, in the following reflective task:


# **3.3 Analyzing Data and Presenting Learning Analytics**

# *3.3.1 Methods for Analyzing the Learner-Generated Data and the Measurements Over Them*

As already explained, the learning analytics cycle describes the whole process from collecting the learner and context data to taking data-driven actions and interventions. The raw learner and context data say little on their own, but when converted to metrics, they have the potential to reveal what we do not know about our learners.

Good metrics have three key attributes: their data are consistent, clean, and valid to use (see Sect. 3.2.3). Data cleaning and management is a demanding task (see Chap. 2). Given that good and clean data are available, next the data analysis method needs to be selected. Here we explain **what methods** can be used for **analysing** the educational data and learning analytics. This step is the core "*game*" of Data Science: Data Science is a blend of various tools, algorithms, and machine learning principles with the goal to discover hidden patterns in the raw data (Sharma, 2019). The main generic categories of methods for this step are shown in Fig. 3.13 and include (but are not limited to):


However, not all data analysis methods can yield the results one is seeking. To achieve that, **a number of criteria need to be specified**, e.g., the learning analytics objective you want to address (modelling learners, prediction of performance, adaptation, recommendation, etc., see Sect. 3.2.1), the metrics you have to compute

**Fig. 3.13** The basic data analysis methods in learning analytics

**Fig. 3.14** Frequency of data analysis methods in learning analytics. (Data Source: http://bora.uib.no/handle/1956/17740)

(effectiveness, efficiency, outcome; see Sect. 3.2.2), and the type of analytics you want to use (descriptive, diagnostic, predictive, etc., see Sect. 3.2.2). The analysis methods will be utilized to form a better understanding of the educational settings and learners: learning analytics focus on the **application of known methods and models** to address issues affecting student learning and the environments in which it occurs.

Before we explain how the appropriate analysis method can be chosen according to the needs, we briefly introduce (in simple terms) the approaches commonly used in learning analytics. Specifically, the learning analytics metrics come from data related to learners' interactions with course content, other learners, and instructors. Different techniques are applied to **detect interesting patterns hidden** in the educational data sets.

Among the analysis techniques, some have received increased attention in the last couple of years, namely *statistics*, *data mining*, *machine learning*, *qualitative analysis*, *social network analysis*, and *visualizations* (Chatti et al., 2012; Khalil & Ebner, 2016; Papamitsiou & Economides, 2014). In a recent report on the current state-of-the-art in learning analytics, a corpus of 100 studies was considered (Misiejuk & Wasson, 2017). Figure 3.14 shows the frequency of the data analysis methods used in the corpus.

By far, *statistics* is the most commonly used method, including *descriptive statistics* (43%), *correlation analysis* (36%), *ANOVA* (10%) and *T-Test* (10%). Data mining methods like *regression analysis* (24%) and *cluster analysis* (13%) are also common techniques, followed by *network analysis* (16%) and *data visualisations* (13%). The remainder of the methods were reported 1–5 times. Some of these less used approaches are *machine learning methods* such as neural networks and support vector machines. More recently, *multimodal analysis* uses more sophisticated data such as video, gaze, gestures, and combines various methods such as computer vision, machine learning, etc.

Although the different analysis methods are **inherently technical**, they can **provide pedagogical insights** if properly used. For example, *descriptive statistics* (such as the mean, median and standard deviation) can be used to showcase the students' interaction with a learning system (the usage), as it is coded with *efficiency metrics* (see Sect. 3.2.2) like the time online, total number of visits, distribution of visits over time, frequency of students' postings/replies, percentage of material read, etc. Statistical methods can also be used to test the significance of the analysis results (e.g., analysis of variance – ANOVA, and t-tests), or to explain more complex constructs of learning (*effectiveness metrics*), such as engagement (e.g., Principal Component Analysis – PCA). *Data mining* methods like classification and clustering can be used to model and explain learner performance (*outcome metric*), and *machine learning* techniques can be successfully applied to detect learners' affective states (*effectiveness metrics*) during the learning activities.

Next, we focus on how the most commonly used **statistical methods** can **tell the story** in the data. In particular, **statistics** are used for measuring, controlling, communicating and understanding the data (Davidian & Louis, 2012). Statistics is a mathematical science that includes methods of collecting, organizing and analyzing data in such a way that meaningful conclusions can be drawn from them. In general, a statistical study begins with data collection using a **sampling method** (you have learned about that in Chap. 1); the subsequent investigations and analyses of the collected data fall into two broad categories called **descriptive** and **inferential statistics**. **Descriptive statistics** deals with the processing of data **without attempting to draw any inferences** from it (Kenton, 2018). **Inferential statistics**, in turn, is a scientific discipline that uses mathematical tools to **make forecasts** and generalizations about the larger population of subjects by analyzing the given data (Kuhar, 2010).

The **Statistics – Introduction to Statistics** video (see useful video resources) presents a brief introduction to statistics. Before advancing to more sophisticated techniques, we elaborate more on the fundamentals of statistical analysis and how they can tell the story in learning data analytics.

As already explained, descriptive statistics are used to summarize data in a way that makes sense. Descriptive statistics are, as their name suggests, descriptive: they illustrate what the data shows but do not generalize beyond the data considered. Here is a list of commonly used descriptive statistics (Dillard, 2017):




Mean, median and mode are measures of central tendency, while range and standard deviation are measures of dispersion.
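As a concrete illustration, the measures above can be computed directly with Python's standard `statistics` module; the quiz scores below are hypothetical:

```python
from statistics import mean, median, mode, stdev

# Hypothetical quiz scores (0-100) for a small class
scores = [55, 70, 70, 80, 85, 90, 95]

# Measures of central tendency
central = {
    "mean": mean(scores),      # average score
    "median": median(scores),  # middle value when scores are ordered
    "mode": mode(scores),      # most frequent score
}

# Measures of dispersion
dispersion = {
    "range": max(scores) - min(scores),  # spread between extremes
    "stdev": round(stdev(scores), 2),    # sample standard deviation
}

print(central)
print(dispersion)
```

Such a summary describes this particular class only; generalizing beyond it is the job of inferential statistics, discussed next.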

Descriptive statistics may be sufficient if the results do not need to be generalized to a larger population, e.g., outside the specific assignment; when comparing the percentage of students that have solved an assignment correctly versus wrongly, descriptive statistics may be sufficient. Most analytics fall into this basic data evaluation category, and there is tremendous value here, and opportunities for some huge wins.

However, using only this kind of statistics entails the risk of '*picking the low hanging fruit*' of learning analytics – descriptive information or **simple statistics** that *value what can be easily measured rather than measure what is valued*. If it matters to understand not only what happened, but also why it happened, utilizing the data to make inferences or predictions about learners is needed, and using inferential statistics is required.

Inferential statistics can be used to generalize the findings from sample data to a broader population, and to examine the differences and relationships between two or more samples of the population (Kuhar, 2010). These are more complex analyses that look for significant differences between variables and the sample groups of the population. Inferential statistics allow testing hypotheses and generalizing results to the population as a whole. Following is a list of basic inferential statistical tests (Rathi, 2018):


examined and proven to be significantly different. The ANOVA will tell you if the difference is significant, but it does not speculate regarding "why".

• **Regression** – used to determine whether one variable is a predictor of another variable. For example, a regression analysis may indicate whether or not participating in a test preparation program results in higher ACT scores for high school students. It is important to note that regression analyses are like correlations in that *causation cannot be inferred* from them.
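To make these tests concrete, here is a minimal pure-Python sketch of a pooled two-sample t statistic and a simple least-squares regression line. The two course sections and the study-hours data are hypothetical, and a real analysis would use a statistical package to obtain p-values; this sketch only shows what the tests compute:

```python
from statistics import mean, variance
from math import sqrt

# Hypothetical final scores of two course sections
group_a = [62, 70, 74, 80, 85, 88]
group_b = [55, 60, 66, 68, 72, 75]

def two_sample_t(x, y):
    """Pooled two-sample t statistic (equal-variance assumption)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))

def least_squares(x, y):
    """Slope and intercept of the simple linear regression y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

t = two_sample_t(group_a, group_b)  # a large |t| suggests the mean difference is unlikely to be chance

hours = [1, 2, 3, 4, 5]             # hypothetical hours spent on test preparation
scores = [52, 58, 63, 70, 77]       # hypothetical test scores
slope, intercept = least_squares(hours, scores)

print(round(t, 2), round(slope, 2), round(intercept, 2))
```

Even with a positive slope, the regression by itself does not establish that preparation *caused* the higher scores, only that the two variables move together.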

### **Questions and Teaching Materials**

	- (a) **Data Science is used to convert the raw student data into learning analytics metrics**
	- (b) **Data Science uses mathematical tools to make forecasts about the larger student population by analyzing their data**
	- (c) **Data Science is a blend of various tools, algorithms, and machine learning principles with the goal to discover hidden patterns from the raw student data and to understand learning in online and blended learning environments**
	- (d) **Data Science uses complex analyses and is looking for signifcant differences between variables for the sample groups of the student population**

Correct answer: c.

	- (a) **Median and standard deviation**
	- (b) **t-tests and/or ANOVA**
	- (c) **Principal Component Analysis**
	- (d) **Machine Learning**

Correct answer: b

	- (a) **Simple statistics, Complex statistics, Inferential statistics**
	- (b) **Sampling methods, Simple statistics, Complex statistics**
	- (c) **Sampling methods, Descriptive statistics, Inferential statistics**
	- (d) **Descriptive statistics, Complex statistics, Inferential statistics**

Correct answer: c.

### 4. **Match the appropriate definition (from the right column), to the respective "descriptive statistic" in the left column**


Correct answer: 1.d / 2.f / 3.a / 4.c / 5.b / 6.e.

	- (a) **3**
	- (b) **7**
	- (c) **8**
	- (d) **12**




Correct answer: a.

	- (a) **Mean and standard deviation of the participation variables: they illustrate what the data shows**
	- (b) **ANOVA of the participation variables: determine whether or not the difference in the means of the sampled groups is statistically signifcant or due to random chance**
	- (c) **Regression: determine whether the participation variables can explain the scores**
	- (d) **None of the above: more advanced data analysis methods are required**

Correct answer: c.

### 7. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response about the analysis methods employed in learning analytics, in the following reflection task:

1. **Provide 2 examples of learning analytics metrics and explain why you would use the mean and standard deviation to describe their values. Please, elaborate on your choices.**

2. **Provide examples of learning analytics metrics that could be used to explain a learning outcome, and elaborate on the statistical method you would use to explore the relationship.**

# *3.3.2 Presentation Methods for Reporting on Learner Data Analytics*

Now the educational data that were collected have been analyzed. *How did students perform in an assignment? How did they perform compared to the previous assignment? How many of them downloaded the material that was made available online? How much time did the students spend on studying the online material compared to the score they achieved on the assignments?*

These are common questions that can be answered when the educational data that have been collected are analyzed using the respective metrics. The collected learner and context data can be presented in many different ways to help make them **easier to understand** and **more interesting to read**. After collecting and organizing data, the next step is to **display them in an easy-to-read manner** – highlighting similarities, disparities, trends, and other relationships (or the lack thereof) in the dataset.

Data can be used to make data-driven and informed educational decisions, but all the data in the world **won't help if one cannot understand what the analysis reveals**. The first step to presenting data is to understand that **how data is presented matters** (Kiss, 2018). Take these two visuals. They display the scores that 250 students achieved on the five assignments and the midterm exams during one semester, on a scale of 0–100. The first one (infographic style – Fig. 3.15) is "prettier." However, the visual is difficult to understand unless one actually reads the information on it. Pretty, but not helpful…

On the other hand, the second one (Fig. 3.16) uses simple bars to display the same information. Helpful, and still pretty…

In this section we elaborate on the different ways used to represent educational data and learning analytics metrics in a meaningful manner.

As already explained, displaying the analysis results and what is within the educational dataset in a clear way, is helpful in telling the story and making sense of the data that have been collected. Data reports present the data, analyses, conclusions and recommendations in an easy to decipher and digest format (Lebied, 2016).

The methods commonly used to display data include **tables, charts, bar graphs, pie graphs, and line plots**. Other commonly used ways to present data are **histograms, box-plots, scatterplots, and stem-and-leaf plots**. Sometimes, a combination of the graphical representations is used as a **dashboard**: presenting data results together should tell a story or reveal insights that would not emerge if the graphs were left apart.

**Why** do we use tables, diagrams or charts to display the learner/learning information?

**Fig. 3.16** Simple visualization of learning data


Data can be presented in various forms **depending on the type of data collected**. For example, a **frequency distribution table** shows **how often** each value (or set of values) of the variable occurs in a dataset. A **frequency table** is used to summarize categorical or numerical data. Frequencies are also presented as **relative frequencies**, that is, the **percentage** of the total number in the sample. Apart from tables, there are other, **graphical ways** to present data. Analytics presented visually make it easier for decision makers to grasp difficult concepts or identify new patterns.
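A frequency table with relative frequencies can be built in a few lines of Python; the letter grades below are hypothetical:

```python
from collections import Counter

# Hypothetical letter grades awarded in one course
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]

counts = Counter(grades)  # frequency of each grade
total = len(grades)

# Print each grade with its frequency and relative frequency (% of the sample)
for grade in sorted(counts):
    freq = counts[grade]
    print(f"{grade}: {freq}  ({100 * freq / total:.0f}%)")
```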

The **Value of Data Visualization** video (see useful video resources) provides a quick introduction to the value of data visualization. Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, **data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data**. Data visualization is a powerful tool, especially in a world desperate for hard facts. When it comes to making sense of learning analytics and understanding learning patterns in the educational data, one can start from simple graphs that can demonstrate this information. For example, quiz submission data, discussion interaction data (e.g., participation in the forum), data from the access to the learning management system, assignment completion data have been gathered and analyzed. What's next is to answer questions like the following:


To address these questions, **graphic representations that are easy to interpret** are needed (Blits, 2017). Figure 3.17 illustrates the most common data visualization types.

A **bar graph** is a way of summarizing a set of **categorical data**. It displays the data using a number of rectangles, of the same width, each of which represents a particular category. Bar graphs can be displayed horizontally or vertically, and they are usually drawn with a gap between the bars (rectangles). For example, to answer

**Fig. 3.17** Data visualization types

how well an individual student did in comparison to the entire class, a bar graph can be used, where each student in the classroom is represented by a bar.

A **line graph** is particularly useful when we want to show the **trend of a variable over time**. Time is displayed on the horizontal axis (x-axis) and the variable is displayed on the vertical axis (y-axis). In the above example, a line graph can be used to showcase the overall performance on a quiz.

A **pie chart** is used to display a set of categorical data. It is a **circle**, which is divided into **segments**. Each segment represents a particular category. The area of each segment is **proportional** to the number of cases in that category. For example, a pie chart can be used to display the successful completion of an assignment.

A **histogram** is a way of summarizing data that are **measured on an interval scale** (either discrete or continuous). It is often used in **Exploratory Data Analysis (EDA)** to illustrate the features of the distribution of the data in a convenient form. In the above example, a histogram can be used to show the distribution of students' scores on the final exams.
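The binning that underlies a histogram can be sketched in plain Python; the exam scores and the bin width of 10 are hypothetical choices:

```python
# Hypothetical final-exam scores (0-100)
scores = [45, 52, 58, 61, 64, 67, 70, 72, 75, 78, 81, 84, 88, 93]

# Group the scores into equal-width intervals, as a histogram would
bin_width = 10
bins = {}
for s in scores:
    low = (s // bin_width) * bin_width  # lower edge of the bin this score falls into
    bins[low] = bins.get(low, 0) + 1

# Text rendering: one row of '#' marks per bin
for low in sorted(bins):
    print(f"{low:3d}-{low + bin_width - 1:3d}: {'#' * bins[low]}")
```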

A **scatter-plot** displays values, typically for two variables, for a set of data. The data are a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis. The scatter-plot is usually used to determine whether a correlation exists between the variables, and how strong it is. For example, a scatter-plot can show if there is a relationship between quiz performance and content access, or between assignment completion and quiz performance.
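The strength of the relationship a scatter-plot suggests is usually quantified with Pearson's correlation coefficient r; a minimal sketch, with hypothetical content-access and quiz-score data:

```python
from math import sqrt

# Hypothetical per-student data: hours of content access vs. quiz score
access_hours = [2, 4, 5, 7, 8, 10]
quiz_scores = [50, 58, 60, 70, 74, 82]

def pearson_r(x, y):
    """Pearson correlation coefficient: covariance scaled to the range [-1, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(access_hours, quiz_scores)
print(round(r, 3))  # r close to +1 indicates a strong positive linear relationship
```

As with regression, a strong r describes association only; it does not show that access hours caused the higher scores.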

It needs to be clarified that, in statistics, exploratory data analysis (EDA) is a preliminary data analysis approach to summarize the main characteristics of a given dataset, often with visual methods. EDA refers to a critical process of performing initial investigations on data to discover patterns, to spot anomalies, to test hypotheses and to check assumptions with the help of summary statistics and graphical representations. It is good practice to understand the data first and try to gather as many insights from it as possible.

In most cases, a single graph does not contain all the information that is hidden in the data, cannot provide all the insights that might be needed to understand students' learning behaviour or outcomes, and is not sufficient for informed decision-making. The solution is to use combined graphs of the learning analytics metrics that all together can tell the story in the data. These combined graphs are called **dashboards**. "*A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance*" (Few, 2004). Here are five examples of learning analytics dashboard implementations, in relation to the educational objective they aim to address.

**LAPA – Learning Analytics for Prediction & Action** The goal of the LAPA dashboard is to report learners' online learning behaviour back to the learners themselves and to the instructor, and to guide their learning in a smart, personalized way. The first version of LAPA (Fig. 3.18) consists of 7 graphs. The graph chosen for the online activity summary is the scatterplot, where individual learners can choose the X-axis and

**Fig. 3.18** The LAPA dashboard. (Source: Park & Jo, 2015)

Y-axis to locate their position in class. The other 6 graphs are provided with a trend line of their activity every week, along with the average activity information of their peers. All graphs in LAPA are updated every week until the end of the semester (Park & Jo, 2015).

**LADA – Learning Analytics Dashboard for Advisers** LADA is a learning analytics dashboard that supports academic advisers in compiling a semester plan for students based on their academic history. LADA also includes a prediction of the academic risk of the student (Gutiérrez et al., 2018). LADA visualizes two categories of information: (a) the chance of success and prediction quality components, and (b) the various information card components designed to support the adviser (Fig. 3.19).

**LISSA – Learning Dashboard for Insights and Support during Study Advice** LISSA provides an overview of every key moment in chronological order up until the period in which the advising sessions are held: the grades of the positioning test (a type of entry-exam without consequence), mid-term tests, January

**Fig. 3.19** The LADA dashboard. (Source: Gutiérrez et al., 2018)

exams, and June exams. A general trend of performance is visualised at the top: the student path consists of histograms showing the position of the student among their peers per key moment (Charleer et al., 2018). LISSA is shown in Fig. 3.20.

**SmartKlass (Moodle)** SmartKlass™ is a Learning Analytics dashboard for institutions, teachers and students. By analyzing students' behavioural data, SmartKlass™ creates a rich picture of the evolution of the students in an online course: it can help teachers identify the students lagging behind, identify the students for whom the content is not challenging enough, and compare participation and results with other courses, so that teachers can take action (Fig. 3.21). Students can also learn about their performance, individually and compared with the group.

**Acrobatiq** The Learning Dashboard (Fig. 3.22) generates summary graphs, tables and reports and dynamically displays student learning estimates, engagement data and activity data in real time. It enables faculty, students, and other stakeholders to visualize and act on student learning performance. It can be used to reveal what students did or did not learn, quantify how well students have learned each skill, identify consequential patterns in students' learning behaviours, and measure the effectiveness of instructional and design choices.

**Signals** Course Signals was developed to allow instructors the opportunity to employ the power of learner analytics to provide real-time feedback to a student. Course Signals relies not only on grades to predict students' performance, but also


**Fig. 3.20** The LISSA dashboard. (Source: Charleer et al., 2018)


**Fig. 3.21** The SmartKlass. (Source: https://moodle.org/plugins/local_smart_klass)

on demographic characteristics, past academic history, and students' effort as measured by interaction with Blackboard Vista, Purdue's learning management system (Arnold & Pistilli, 2012). The Course Signals Explanation video (see useful video resources) is a brief introduction to Signals.

**Fig. 3.22** The Acrobatiq. (Source: https://www.acrobatiq.us/products/the-learning-dashboard.html)

**KlassData** The learning process in virtual environments is more complex to analyze, but the generated data unlocks the power of learning analytics and opens the door to personalized paths in education. The KlassData: Learning Analytics for Education video (see useful video resources) explains how the KlassData application works.

### **Questions and Teaching Materials**

1. **Match the visualizations (from the left column) to the respective evaluation of data presentation clarity (i.e., "Easy to understand" / "Difficult to understand" in the right column)**

Correct answer: 1.b / 2.b / 3.b / 4.a.

	- (a) **To present data results together so that they tell a story or reveal insights together, that isn't possible if left apart**
	- (b) **To summarize categorical or numerical data**
	- (c) **To grasp diffcult concepts or identify new patterns**
	- (d) **To produce and deliver richly interactive visualizations**

Correct answer: a.

	- (a) **A scatter plot to visualize the students' scores on monthly assignments and the average class performance, and a histogram for the distribution of the scores**
	- (b) **A bar graph with a line to visualize the students' scores on monthly assignments and the average class performance, and a histogram for the distribution of the scores**
	- (c) **A bar graph with a line to visualize the students' scores on monthly assignments and the average class performance, and a pie for the distribution of the scores**
	- (d) **A scatter plot to visualize the students' scores on monthly assignments and the average class performance, and a pie for the distribution of the scores**

Correct answer: b.

4. **Select the visualization that best illustrates the performance of all students on all assignments**

Correct answer: c.

5. **Match the objective (from the right column), to the respective visualization dashboard in the left column.**


Correct answer: 1.c / 2.e / 3.d / 4.b / 5.a.

	- (a) **To capture moment-by-moment learning and students' achievements**
	- (b) **To increase students' awareness of their own progress, guide self-learning, and support self-regulation of learning**
	- (c) **To predict students' progress during the semester and make content recommendations**
	- (d) **To monitor individual students' learning and reveal gaps, misunderstandings, or difficulties and help teachers tailor their instruction to the students' needs**

Correct answer: d.

### 7. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response about the data representation techniques in learning analytics, in the following reflective task:


# **3.4 Interpreting Learning Analytics and Inferring Learning Changes**

# *3.4.1 Making Sense of Learners' Data Analytics and Analysis Results*

The **intersection of learning science with data and analytics** enables more sophisticated ways of **making meaning** to support student learning. All these available learner and context data "carry" so much knowledge about the learners and the learning processes, that remains hidden and waits to be revealed. But data from tracking systems are not inherently intelligent. *Hit counts* and *access patterns* do not really explain anything. **The intelligence is in the interpretation of the data**; what all those statistics about the learner's data and measurements can inform us about. For example, *login frequencies, time-spent on tasks* or *numbers of forum posts* do **not measure the impact** on students' learning. However, the data analysis

**Fig. 3.23** The path from learners to knowledge

techniques can reveal **potential relationships between metrics** that otherwise, from a human-analysis perspective, would be **undiscoverable** or even ignored. In the above example, learning analytics metrics such as *time-spent* or *frequencies of attempts* can be used to identify *specific units of study or assignments* in a course that are difficult (or trivial) for most of the students, and reveal *the correlation between task-difficulty and student behaviour*. Ideally, data analysis techniques enable the visualization of interesting data that in turn sparks the investigation of this data. Figure 3.23 illustrates the path from learners and their data to the interpretation of learning analytics.

The statistical analysis uses a *combination of potentially actionable metrics* to predict an outcome that needs attention and improvement. For example, to predict the successful completion of an assignment, metrics can include *measurable events*, such as time-spent on-task, on-task mental effort, number of attempts to solve a task, frequency of question posing, frequency of help-seeking, etc. *Less obvious data* can also be used, such as non-cognitive variables, like stress levels, emotional intensity, attention, etc. Analyses provide a score for each student, so students can be grouped objectively into categories needing high-, medium- or no-intervention to successfully complete the assignment. **The analysis cannot say that the learning analytics metrics** *caused* **the outcome, but it can show** *what combination of indicators is related* **to the outcome.** Your data reports and visualizations will help you to *identify historical trends* and *correlations*, which you can use to understand *what happened* and *(probably) why*.
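The grouping step described above can be sketched as simple thresholding on a per-student score; the risk scores and cut-off values below are purely illustrative and not part of any particular learning analytics system:

```python
# Hypothetical per-student risk scores (0-1) produced by a predictive model
# that combines metrics such as time-on-task and number of attempts
risk_scores = {"alice": 0.15, "bob": 0.55, "carol": 0.82, "dave": 0.40}

def intervention_group(risk, high=0.7, medium=0.3):
    """Map a risk score to an intervention category (thresholds are illustrative)."""
    if risk >= high:
        return "high-intervention"
    if risk >= medium:
        return "medium-intervention"
    return "no-intervention"

# Group students objectively by the model's score
groups = {name: intervention_group(r) for name, r in risk_scores.items()}
print(groups)
```

Note that the score only flags *which combination of indicators is related* to the outcome; deciding *why* a student is at risk, and what support to offer, remains the educator's interpretation.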

Behavioural data can also be used to track *students' approaches to study*. For example, *frequency* and *sequence of interactions* can be tracked, as students engage with learning tasks. While this may **not directly measure student learning**, it can provide insights on the student's on-task activity and help to **identify strategies** that could improve *how they plan* and *regulate their study*.
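As a minimal sketch (with illustrative event names), both the frequency and the sequence of interactions can be extracted from an interaction log, revealing, for instance, how often a student reads material directly before attempting a quiz:

```python
# Sketch: mining frequency and order of interactions from a hypothetical
# event log to surface a student's study strategy. Event names are
# illustrative only.
from collections import Counter

event_log = ["read", "read", "quiz", "read", "quiz", "forum", "read", "quiz"]

freq = Counter(event_log)                         # how often each action occurs
bigrams = Counter(zip(event_log, event_log[1:]))  # which action follows which

print(freq.most_common(1))        # the student's dominant activity
print(bigrams[("read", "quiz")])  # how often reading directly precedes a quiz
```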

Data science promises to have a substantial influence on the understanding of learning in online and blended learning environments. This, of course, implies a shift in the typical role of educators, from being instructors and facilitators to performing some of the tasks data analysts usually carry out (Fig. 3.24). They need to be able to discover the patterns in the data and convey their meaning in educational terms, that is, to interpret the analysis results into meaningful learning schemas.

**Fig. 3.24** The different roles of the educator in relation to data

The more an educator uses the learning analytics metrics, tools and visualization dashboards, the more she will understand the story that the data can tell, and which patterns in the data matter most in explaining students' engagement, progress and outcomes. The analysis might reveal correlations between metrics that the educator had never thought of before, and behavioural patterns that are repeated from student to student and from class to class.

As the educator moves from efficiency metrics to effectiveness metrics to outcomes (see Sect. 3.2.2), she should keep in mind that all **metrics are proxies for what ultimately matters**. The different types of analytics facilitate the selection of the most appropriate metrics and guide their interpretation. Next, we elaborate on how the analysis outcomes relate to the learning analytics objectives and the analytics types.

As already discussed, the common objectives of learning analytics include monitoring learners' progress, modelling learners/learners' behaviour, detecting learners' emotions, predicting learning performance/dropout/retention, generating feedback, providing recommendations, guiding adaptation, increasing self-reflection/self-awareness, and facilitating self-regulation. To address these objectives, four types of learning analytics can be used, namely descriptive, diagnostic, predictive and prescriptive analytics. The infographic by CommLabIndia and the article by eLearningIndustry give a comprehensive overview of the different levels of learning analytics and of how bases of and approaches to using analytics can lead to deeper insights.

Each analytics type can be supported and facilitated by specific data analysis methods that are appropriate for that type of data transformation. For example,

**Fig. 3.25** The learning analytics types with respect to the objectives and actions

descriptive statistics and simple visualizations (using bar graphs, histograms, etc.) are suitable analysis techniques for providing descriptive analytics. Similarly, correlation analysis best facilitates diagnostic analytics, whereas regression analysis is commonly used for prediction purposes, and as such is an indicative analysis technique for predictive analytics. When it comes to prescriptive analytics, more sophisticated analysis techniques can be employed (e.g., heuristics, machine learning), which, however, require a strong background in data science and are beyond the scope of this chapter. Depending on the objectives and the types of analytics used, the interpretation of the analysis results can vary from gaining insights, to making decisions, to taking actions (Fig. 3.25).
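A small, self-contained Python sketch with hypothetical data can make the first three analysis techniques concrete: summary statistics (descriptive), correlation between two metrics (diagnostic), and a least-squares regression line used for forecasting (predictive). The data values are invented for illustration:

```python
# Sketch: three analytics types on one hypothetical metric pair.
# Data: weekly time-on-task (hours) and quiz scores for six students.
from statistics import mean, stdev

hours  = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [52, 58, 61, 70, 74, 83]

# Descriptive analytics: summary statistics of the outcome metric
print(f"mean score = {mean(scores):.1f}, sd = {stdev(scores):.1f}")

# Diagnostic analytics: Pearson correlation between the two metrics
mx, my = mean(hours), mean(scores)
cov = sum((x - mx) * (y - my) for x, y in zip(hours, scores))
sxx = sum((x - mx) ** 2 for x in hours)
syy = sum((y - my) ** 2 for y in scores)
r = cov / (sxx * syy) ** 0.5
print(f"r = {r:.2f}")

# Predictive analytics: least-squares line, used to forecast a new student
slope = cov / sxx
intercept = my - slope * mx
print(f"predicted score for 7 h of study: {slope * 7 + intercept:.0f}")
```

Prescriptive analytics would go one step further and suggest an action (e.g., a recommended study plan), which, as noted above, typically requires machine-learning techniques beyond this chapter's scope.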

For example, let's assume that an educator wants to predict students' success in the final exams early, in order to provide them with proactive feedback and recommendations, support their self-regulated learning strategies, and prevent failure or drop-out. Let's also assume that the educator has available all the data from the students' activity during the semester (online participation, assignments' completion, quizzes' scores, etc.). The learning management system the educator is using can provide all the *descriptive statistics* about students' misconceptions, engagement, achievement, progress, etc., and deliver this information using multiple visualizations of the different learning analytics metrics, demonstrating some critical interrelationships between them and facilitating some *diagnostic* operations. The dashboard can also provide, in graphical formats, the result of a regression analysis that considers the most critical metrics, forecasts the evolution of the *prediction* variable (e.g., success in final exams) and displays the tendencies in the metrics. If the educator combines all this graphical information, that is the result of the analytics processing, she will be able to *associate the numerical facts with each student's progress and learning needs*.

### **Questions and Teaching Materials**

1. **How can learning analytics contribute to human learning?**


Correct answer: c.

	- (a) **Descriptive statistics (e.g., mean, standard deviation, min, max) → descriptive analytics (e.g., course enrolments, course compliance rates, what learning resources are accessed and how often)**
	- (b) **Correlation analysis (e.g., ANOVA, t-test) → descriptive analytics (e.g., course enrolments, course compliance rates, what learning resources are accessed and how often)**
	- (c) **Regression analysis → predictive analytics (e.g., high/low performance, high/low engagement)**
	- (d) **Machine learning (e.g., classification) → predictive analytics (e.g., high/low performance, high/low engagement)**

Correct answer: a.

### 3. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to elaborate on your response about the learning analytics interpretations, in the following reflective task:

1. **Provide 2 examples of learning analytics objectives and explain what learning analytics type you would employ to achieve those objectives. Please, elaborate on your choices**

# *3.4.2 Explaining the Data Analysis Results in an Educationally Meaningful Manner to Understand Learners and the Environment they Learn In*

What analytics cannot do by themselves is improve instruction. While they can point to areas in need of improvement and they can identify engaging practices, the numbers cannot make suggestions for improvement. This requires human intervention.

Intervention should be personalized to the learner – based on their engagement and/or performance data and any personal information you may have. For example, if the educator notices that a student stopped participating in online forums just before their performance began to drop, it would be appropriate to encourage the student to resume their involvement in the forums. At the same time, it could be helpful to get feedback from the student to find out why they stopped participating. There may have been an event in the course or some other obstacle that the educator should address in order to facilitate the student's involvement in the online forums.

Effective intervention may involve adapting teaching styles. If students tend to do better with certain kinds of media, interactivity, or assessments, the course design should be adapted to enable better learning. However, some learning professionals are hesitant to initiate a learning analytics practice for two reasons: the perception that they must address everything at once, and the concern that leadership will use the insights in a penalizing way. The Learning analytics to inform teaching practice video (see useful video resources) explains how learning analytics can be used to inform teaching practice.

If a metric is not informing a decision, there is no need to keep gathering it. If it is, optimize that specific data and learn how to turn it into insights that inform decisions that matter. Over time, add more metrics, always keeping in mind the decisions they inform. The data one collects should be a combination of engagement and performance data – but it is important to make sure that one is not collecting information that one will not use. The Jisc Learning analytics: Making data useful video (see useful video resources) demonstrates an example of how data can be effectively used and how one can give meaning to data.

### **Questions and Teaching Materials**

1. What is the first step teachers should consider for using learning analytics?


# **3.5 Concluding Self-Assessed Assignment**

# *3.5.1 Introduction*

In order to proceed, you are requested to complete a concluding self-assessed assignment. This self-assessed assignment is a real-life scenario activity (based on the use case of the instructional designer David), using a rubric across three proficiency levels and an exemplary solution rating. When you have completed this assignment, you will assess it yourself, following the rubric, which will list the criteria required and give guidelines for the assessment.

This self-assessed assignment procedure consists of 5 steps:


# *3.5.2 Step 1. Real Life Scenario*

David is an instructional designer. He always aims to create engaging learning activities and compelling course content. Recently he has been organizing the educational material and learning and assessment activities for a new course, and he wants to design a dashboard to monitor progress, engagement, and performance, both for individual students and for the whole class, that will advance the learning experience. He has available several types of student data tracked by the LMS during students' activities (e.g., login data, content/educational material access, timestamp for each activity, file downloads, assignments completed, correctness of assignments, grades on assignments, posting on online forums, quiz scores, discussion participation, etc.), as well as demographic and enrolment data (e.g., age, gender, socioeconomic status, special education needs, course enrolment, etc.). It is important for David to deliver a dashboard that will increase students' self-awareness about their progress, motivate them to self-reflect and identify their needs, and finally enhance their retention and performance.

However, David is new to learning analytics and educational data literacy. Help David design a dashboard that will integrate students' needs and will address the above learning objectives.

# *3.5.3 Step 2. Getting Familiar with the Assessment Rubric*

David has searched on the Internet for sample Learning Analytics Dashboards to get some design inspiration, and designs an Initial ExampleDB.

Please help David to evaluate this Initial ExampleDB using the Rubrics for assessing the dashboard and to identify potential issues.

**ACTIVITY/PRACTICE QUESTION (Discussion)** We encourage you to elaborate on your response about the evaluation of the Initial ExampleDB created by David, in the following discussion task, by posting your thoughts on the discussion board. You may discuss:


# **3.5.3.1 Initial Example DB**


### **3.5.3.2 Rubric for Assessing the Example DB**

# *3.5.4 Step 3. Prepare Your Answer*

Please assist David to design a prototype of the dashboard that will integrate students' needs and will address the above learning objectives. For this purpose, you will have to design a detailed prototype of the dashboard (using pen and paper and/or any tool of your preference). Please consider that David (and you!) has available all types of student data he might need, and help him select the most appropriate ones for each learning objective, mapping the learning analytics metrics to the respective and most suitable type of graph and/or chart.

**ACTIVITY/PRACTICE QUESTION (Reflect on)** We encourage you to elaborate on your response about the prototype of the dashboard that David wants to design to increase students' self-awareness about their progress, motivate them to self-reflect and identify their needs, and finally enhance their retention and performance, in the following reflective task:


# *3.5.5 Step 4. Review a Sample Solution*

Please review a sample of an Exemplary Sample Solution that follows the criteria specified in the Rubrics for assessing the dashboard.

**ACTIVITY/PRACTICE QUESTION (Reflect on)** We encourage you to elaborate on your response about the Exemplary Sample Solution that follows the criteria specified in the Rubrics for assessing the dashboard, in the following reflective task: Do you identify any design requirements that you did not take into consideration when creating your dashboard prototype?

# **3.5.5.1 Exemplary Sample Solution**

# *3.5.6 Step 5. Self-Evaluate Your Answer*


Now that you have seen the Exemplary Sample Solution, please rate your initial answer (evaluate the dashboard you created), using the Rubric table below.

# **References**


# *Useful Video Resources*

External video. Learning analytics [4:02].


External video. Learning analytics: The need for a code of ethics [9:59].

External video. Statistics – Introduction to statistics [3:45].


External video. Learning analytics to inform teaching practice [7:44].

External video. Jisc learning analytics: Making data useful [9:22].

# *Further Readings*


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 4 Teaching Analytics**

# **4.1 Introduction and Scope**

# *4.1.1 Scope*

The goal of this chapter is to:

• introduce the basics of methods and tools for analysing and interpreting educational data for facilitating educational decision making, including course and curricula design.

# *4.1.2 Chapter Learning Objectives*


(continued)


# *4.1.3 Introduction*

This chapter will introduce the basics of methods and tools for analysing and interpreting educational data for facilitating educational decision making, including course and curricula design. Teaching analytics use static and dynamic information about the design of learning environments for near real-time modelling, prediction, and optimisation of learning artefacts, learning designs, learning processes, curriculum designs, and educational decision making.


In order to warm up, explore the "didactic triangle" in Fig. 4.1 and reflect on what data may stem from each of the key concepts and related interactions.

**Fig. 4.1** Didactic triangle

# **4.2 Data Sources for Supporting Teaching Analytics**

# *4.2.1 Learning and Teaching*

According to Seel and Ifenthaler (2009), learning involves a stable and persisting change in what a person knows, requiring mental representations. The processes that result in learning (e.g., learning activities) can be and often are distinguished from the products of learning (e.g., learning outcomes), as discussed by Spector et al. (2014). Several theories of learning have been postulated over the 20th and 21st centuries: Behaviourism, Cognitivism, Constructivism, Connectivism. Figure 4.2 illustrates the theories of learning, how learning is conceptualised and what factors may influence learning.

Teaching is considered as deliberate actions undertaken with the intention of facilitating learning. Hence, when it comes to teaching, the relevant input and output characteristics for designing a learning environment need to be identified. The elementary parts of teaching include the matching of content elements, psychological operations and didactic considerations (Scheerens et al., 2007). Doyle (1985) defines seven key criteria for the effectiveness of teaching as follows:


Table 4.1 provides an overview of phases in the structuring of teaching (Scheerens et al., 2007):

# *4.2.2 Design of Learning Environments*

Learning environments are physical or virtual settings in which learning takes place. Learning theory provides the foundation for the design of learning environments. However, there is no simple recipe for designing learning environments (Ifenthaler, 2012). Generally, the design of learning environments involves three simple questions: What is taught? How is it taught? How is it assessed? Yet, the design of learning environments is not simply a matter of asking the above three questions. Rather, it comprises systematic analysis, planning, development, implementation, and evaluation phases (see Fig. 4.3).

**Fig. 4.2** Overview on learning theories. (Ifenthaler & Schumacher, 2016a, b)



Scheerens et al. (2007)

**Fig. 4.3** The ADDIE model. (Gustafson & Branch, 2002)

The analysis phase includes needs analysis, subject matter content analysis, and job or task analysis. The design phase includes the planning for the arrangement of the content of the instruction. The development phase results in the tasks and materials that are ready for instruction. The implementation phase includes the scheduling of instruction, training of instructors, preparing time tables, and preparing evaluation parts. The evaluation phase includes various forms of formative and summative assessments.

# *4.2.3 Learning Design*

Whereas instructional design is rooted in behaviourist learning theories and tends to focus, on the one hand, on learning products, such as learning objects and machine-readable representations, and, on the other hand, on delivery systems and the advancement of the automation of designs, learning design is rooted in constructivist learning theories and focuses on making the design process explicit and shareable. Table 4.2 includes a list of definitions of learning design exemplifying the roots of this research field.

# *4.2.4 TPACK Model*

At the heart of good teaching with technology are three core components: content, pedagogy, and technology, plus the relationships among and between them (Mishra & Koehler, 2006). The TPACK model (i.e., Technological Pedagogical Content Knowledge) describes the core components of teaching where content (what you


**Table 4.2** Overview on definitions of learning design

Ifenthaler et al. (2018)

teach) and pedagogy (how you teach) must be the basis for any technology that is used in a learning environment in order to support and enhance learning (see Fig. 4.4).

Pedagogical Content Knowledge (PCK) is the knowledge that teachers have about their content and the knowledge that they have about how to teach that specific content. Technological Pedagogical Knowledge (TPK) is the set of skills which teachers develop to identify the best technology to support a particular pedagogical approach. Technological Content Knowledge (TCK) is the set of skills which teachers acquire to help identify the best technologies to support their students as they learn content.

**Fig. 4.4** The TPACK model. (Mishra & Koehler, 2006)

**Questions and Teaching Materials**

	- (a) **Active participation and networking.**
	- (b) **Building ties for social networks.**
	- (c) **Providing rewards in relation to achievements.**
	- (d) **Active engagement and stimuli for social collaboration.**

Correct Answer: c

2. **Learning Design and Instructional Design have different origins and conceptual foundations. Still, the purpose of these disciplines can be summarised as follows:**


Correct Answer: b

	- (a) **Learner**
	- (b) **Teacher**
	- (c) **Content**
	- (d) **Technology**
	- (e) **Environment**

Correct Answer: a, b, c

### 4. **Effective teaching includes …**


Correct Answer: b, d, e.

### 5. **A key principle of learning design includes …**


Correct Answer: b

### 6. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to reflect on your teaching experience supported through data. You may reflect on:


# **4.3 Data Sources Within the Instructional Design Process**

# *4.3.1 Broadening the Perspective for Data-Driven Education*

The idea of grounding instructional design decisions on educational data has been around for some time. Traditionally, evidence-based instruction has used summative evaluation data to (re-)design instructional programs and systems. Immediate interventions based on formative evaluations have been conducted significantly less frequently. Research on learning and instruction brought attention to additional data sources, as summarized in the 3P-model of teaching and learning (Biggs et al., 2001): "presage" data focuses on student factors and the teaching context, "process" data on learning-focused activities, and "product" data on learning outcomes. Historically, most of this data has been collected with social science research methods. Surveys and questionnaires have been used most often, at times supplemented by different forms of observations.

Online teaching and learning has created a wide range of opportunities for data-driven education. Many more data sources are now at hand, as well as new technologies for data handling and analysis. While it seems impossible to create a complete list of potential data sources, educational data and the respective data sources can be systematized with a number of attributes.

Educational data can be primary data (direct data), that is: data that is collected especially for the purpose of improving teaching and learning. Secondary (indirect) data, on the other hand, has been initially collected for other purposes, but can also be used for teaching analytics. Data can be collected candidly and transparently. This means that the purpose of data collection is clear, as for example in a direct survey, interview or an eye-tracking study. Educational data can also be collected automatically and with little or no transparency, as is the case with user trails within the system or logging data. Educational data can be oriented toward the learning

**Fig. 4.5** Holistic learning analytics framework. (Ifenthaler & Widanapathirana, 2014)

outcome or the learning process. Educational data can be static, that is: stable over a defined period of time (e.g., personality traits). Educational data can be dynamic, that is: volatile over the course run (e.g., motivational and emotional states). Educational data can be sourced on the individual or on a collective level. Educational data can be idiosyncratic or generalizable. Educational data can refer to learner variables (person focus; e.g. personal learning goals), it can refer to contextual variables (environment focus; e.g. curricular learning objectives), or to learning behaviour (person-environment-interaction focus; e.g. course performance). Finally, educational data can be open and accessible to anyone (e.g., curriculum data, syllabi), or it can be protected (e.g., discussion posts within a course environment) – a distinction which is not always as straightforward as it may sound (Greller & Drachsler, 2012).

# *4.3.2 Data Sources Within a Holistic Analytics Framework*

Ifenthaler and Widanapathirana (2014) developed and empirically validated a holistic learning analytics framework that connects a number of different data sources (#1 to #5). A major aim of this model is to create a link between learner characteristics (e.g., prior learning), learning behaviour (e.g., access of materials), and curricular requirements (e.g., learning objectives, sequencing of learning) (see Fig. 4.5).

# *4.3.3 Sources of Learner Data*

Within the holistic learning analytics framework (see Fig. 4.5), three main areas of learner data and respective data sources have been differentiated. Characteristics of (1) individual learners include socio-demographic information, personal preferences and interests, responses to standardized inventories (e.g., learning strategies, achievement motivation, personality), demonstrated skills and competencies (e.g., computer literacy), acquired prior knowledge and proven academic performance, as well as institutional transcript data (e.g., pass rates, enrolment, dropout, special needs). Associated interactions with the (2) social web include preferences of social media tools (e.g., Twitter, Facebook, LinkedIn) and social network activities (e.g., linked resources, friendships, peer groups, web identity). Physical data (3) from outside the educational system is collected through various systems, for example through a library system (i.e., university library, public library). Other physical data may include sensor and location data from mobile devices (e.g., study location and time), or affective states collected through reactive tests (e.g., motivation, emotion, health, stress, commitments). Non-cognitive data in particular (i.e., emotional and motivational data) can provide deep insights into individual learning processes (D'Mello, 2017).

# *4.3.4 Sources of Online Learning Data*

Furthermore, there are two areas of data and respective data sources related to online learning behaviour (see Fig. 4.6). Rich information is available from learners' activities in the online learning environment (4) (i.e., learning management system, personal learning environment, learning blog). These mostly numeric data refer to logging on and off, viewing or posting discussions, navigation patterns, learning paths, content retrieval (i.e., learner-produced data trails), results on assessment tasks, and responses to ratings and surveys. More importantly, rich semantic and context-specific information is available from discussion forums as well as from complex learning tasks (e.g., written essays, wikis, blogs). Additionally, interactions of facilitators with students and the online learning environment are tracked. Closely linked to the information available from the online learning environment is the curriculum information (5), which includes metadata of the online learning environment. These data reflect the learning design (e.g., sequencing of materials, tasks, and assessments), and learning objectives as well as expected learning outcomes (e.g., specific competencies). Ratings of materials, activities, and assessments as well as formative and summative evaluation data are directly linked to specific curricula, facilitators, or student cohorts (Ifenthaler & Widanapathirana, 2014).

In summary, teaching analytics use static and dynamic data sources for informing learning and teaching processes as well as outcomes. The figure below summarises the profiles approach, which includes static and dynamic data from students

**Fig. 4.6** Profiles approach using static and dynamic data. (Ifenthaler & Widanapathirana, 2014)

(e.g., demographic information, academic performance), dynamic data of learning behaviour (e.g., navigation pathways), and static data defined in the curriculum (e.g., learning outcomes, learning artefacts).

### **Questions and Teaching Materials**

	- (a) **Forum activity, interaction with learning materials, assessment attempts**
	- (b) **Forum posts and historical grades**
	- (c) **Forum visits and learning objectives**
	- (d) **Forum activity, emotional states, place of living**

Correct Answer: a.

	- (a) **They help to understand the needs of a learner.**
	- (b) **They function as benchmark for adaptive feedback a teacher can relate to.**
	- (c) **They help the administrator to monitor the expertise of a teacher.**
	- (d) **Active engagement and stimuli for social collaboration.**

Correct Answer: d.

### 3. **What outcomes can be produced from a reporting engine?**


Correct Answer: a, b, e.

### 4. **The profiles approach includes the following parameters**


Correct Answer: b, c.

### 5. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to reflect on your teaching experience supported through data. You may reflect on:


# **4.4 Key Concepts of Data Quality and Limitations of Data Meaningfulness**

# *4.4.1 Data Quality in Educational Contexts*

As the amounts of educational data grow larger, the issue of data quality is becoming more and more important. 'Big Data' in education is characterized by the same attributes as in other domains: Volume, Velocity, Variety, and Value (Katal et al., 2013). Volume refers to the tremendous volume of the data, usually measured in TB or above. Velocity means that data are being formed at an unprecedented speed and must be dealt with in a timely manner. Variety indicates that big data comprises all kinds of data types, and this diversity divides the data into structured data and unstructured data. Finally, Value represents low value density: value density is inversely proportional to total data size, so the greater the scale of the big data, the relatively less valuable the data (Cai & Zhu, 2015).

Already on a smaller scale, data quality is of crucial importance for teaching and learning analytics, as 'poor data' can impede valid inferences and hamper subsequent educational interventions. However, there is no common definition of educational data quality to date. If the broad ISO 9000:2015 definition of quality is applied, data quality can be defined as the degree to which a set of characteristics of data fulfils pre-defined requirements. These requirements are usually described in quality dimensions, each with specific elements and indicators for measurement (Cai & Zhu, 2015).

Despite the complexity of the topic, the majority of the numerous frameworks on data quality share a common core of quality dimensions that can be transferred to education datasets (Akoka et al., 2007; Goasdoué et al., 2007; Laranjeiro et al., 2015): completeness, accuracy, consistency, freshness and relevancy.

# *4.4.2 Core Dimensions of Data Quality*

Data Accuracy is defined as the correctness and precision used for representing real-world data in an information system. Data needs to be precise, valid and error-free. Three main accuracy definitions have been established in the current research literature: (i) Semantic correctness describes how well data represent states of the real world, i.e., identifying the semantic distance between system-based data and real-world data. For instance, is the recorded address "99, Main Street" actually Mary's address? (ii) Syntactic correctness relates to the degree to which data is free of syntactic errors, for example misspellings and format discordances, i.e., identifying the syntactic distance between system-based data and the expected data representation. For example, is the address "99, Main Street" valid and well written? (iii) Precision refers to the level of detail of data representation, i.e., identifying the gap between the level of detail of system-based data and its expected level of detail

(Peralta, 2006). For instance, the amount "€ 98" is a more precise representation of the cost of a product than "€ 100".
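Syntactic correctness can often be checked automatically, whereas semantic correctness requires an external source of truth (someone has to know Mary's real address). A minimal sketch, assuming a hypothetical "`<number>, <street name>`" address format:

```python
# Sketch: a simple syntactic-correctness check for an address field.
# The expected format is an assumption for illustration; semantic
# correctness cannot be verified from the stored data alone.
import re

ADDRESS_PATTERN = re.compile(r"^\d+, [A-Za-z ]+$")

def syntactically_correct(address: str) -> bool:
    """True if the address matches the expected representation."""
    return bool(ADDRESS_PATTERN.match(address))

print(syntactically_correct("99, Main Street"))   # well-formed
print(syntactically_correct("Main Street 99!!"))  # format discordance
```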

Data Completeness is defined as the degree to which all relevant data have been recorded in an information system. It is expected that all relevant facts of the real world are represented in the information system (Gertz et al., 2004). Two aspects of completeness are differentiated: (i) Coverage, meaning whether all required entities for an entity class are included; (ii) Density, describing whether all data values are present (not null) for required attributes (Peralta, 2006).
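Both aspects reduce to simple ratios; a sketch on a hypothetical student-record table (where `None` marks a missing value):

```python
# Sketch: measuring the two completeness aspects on a small,
# invented student-record table.
records = [
    {"id": 1, "name": "Ann",  "grade": 78},
    {"id": 2, "name": None,   "grade": 64},
    {"id": 3, "name": "Caio", "grade": None},
]
enrolled_ids = {1, 2, 3, 4}  # entities that *should* be represented

# Coverage: share of required entities actually present in the table
coverage = len({r["id"] for r in records}) / len(enrolled_ids)

# Density: share of attribute values that are present (not null)
cells = [v for r in records for v in r.values()]
density = sum(v is not None for v in cells) / len(cells)

print(f"coverage = {coverage:.2f}, density = {density:.2f}")
```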

Data Consistency refers to the degree to which data satisfies a set of integrity requirements. Common requirements of data consistency include checks for null or missing values, key uniqueness or functional dependencies (Peralta, 2006).
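The first two of these integrity checks are straightforward to script; a sketch on a hypothetical enrolment table:

```python
# Sketch: two common consistency checks (null values in a required field
# and key uniqueness) on an invented enrolment table.
rows = [
    {"student_id": 1, "course": "MATH101"},
    {"student_id": 1, "course": "MATH101"},  # duplicate key pair
    {"student_id": 2, "course": None},       # missing required value
]

keys = [(r["student_id"], r["course"]) for r in rows]
has_duplicates = len(keys) != len(set(keys))          # key-uniqueness violation?
has_nulls = any(r["course"] is None for r in rows)    # required value missing?

print(has_duplicates, has_nulls)
```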

Data Freshness introduces the idea of how old the data is: Is it fresh enough with respect to the user's expectations? Does a given data source have the most recent data? Is the extracted data stale? When was the data produced? There are two main freshness definitions in the literature: (i) Currency describes how stale data is with respect to the sources. It captures the gap between the extraction of data from the sources and its delivery to the users. For example, given an account balance, it may be important to know when it was obtained from the bank data source. (ii) Timeliness describes how old data is (since its creation/update at the sources). It captures the gap between data creation/update and data delivery. For example, given a top-ten book list, it may be important to know when the list was created, no matter when it was extracted from the sources (Akoka et al., 2007).
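With timestamps for creation, extraction and delivery, both measures are simple differences; a sketch with hypothetical timestamps:

```python
# Sketch: computing the two freshness measures for one hypothetical record.
from datetime import datetime, timedelta

created_at   = datetime(2024, 3, 1, 9, 0)   # value produced at the source
extracted_at = datetime(2024, 3, 5, 9, 0)   # value pulled from the source
delivered_at = datetime(2024, 3, 5, 12, 0)  # value shown to the user

currency   = delivered_at - extracted_at  # gap between extraction and delivery
timeliness = delivered_at - created_at    # age since creation at the source

print(currency, timeliness)
```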

Data Relevancy corresponds to the usefulness of the data. Among the huge volumes of data, it is often difficult to identify what is useful. In addition, the available data is not always adapted to user requirements, which might lead to the impression of poor relevancy. Relevancy plays a crucial part in the acceptance of a data source. This dimension, usually evaluated by the rate of data usage, is determined by the user and thus not directly measurable by quality tools.

# *4.4.3 Dimensions of Educational Data Quality*

Valid examples for the core dimensions of educational data quality from the educational context could include the following (see Table 4.3):

# *4.4.4 Data Quality Problems*

Laranjeiro et al. (2015) classify data quality problems with respect to the source of information: single or multiple. Single-source problems are related to the (wrong or absent) definition of integrity constraints. Multi-source problems relate to the integration of data from multiple sources, which, for instance, might hold different representations of the same values, or contradictions. Each of these two classes of


**Table 4.3** Dimensions of educational data quality

problems is further divided into schema-level problems, which are related to defects in the definition of the data model and schema, and instance-level problems, which are not visible at the schema level and cannot be prevented by restrictions at the schema level (or by redesign).

In exchange for the user-determined 'relevancy' dimension, the authors added 'Accessibility: The degree to which data can be accessed in a specific context of use' to their synopsis of data quality problems (see Table 4.4).

### **Questions and Teaching Materials**

### 1. **An example for data accuracy is**


Correct Answer: c.

### 2. **Volume is referring to**


Correct Answer: d.


**Table 4.4** Data quality problems mapped into dimensions

### 3. **Missing data with reference to data quality can be mapped to**


Correct Answer: b, e.

### 4. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to reflect on your teaching experience supported through data. You may reflect on:

• **Please think of one type of educational data as introduced in the previous section. How would this data have to be characterised on the different dimensions of data quality in order to be a good source of information? Please specify your indicators for each dimension and explain your ratings according to those indicators.**

# **4.5 Data Ethics and Privacy Principles for Teaching Analytics**

# *4.5.1 Ethical and Privacy Challenges Associated with the Application of Educational Data Analytics*

Educational institutions have always used a variety of data about students, teachers and the learning environment, such as socio-demographic information, grades on entrance qualifications, or pass and fail rates, to inform their curricular planning and academic decision-making as well as for resource allocation. Such data can help to successfully predict students' dropout rates and to enable the implementation of strategies for supporting learning and instruction as well as retaining students (Ifenthaler & Tracey, 2016). However, serious concerns and challenges are associated with the application of data analytics in educational settings:


Consequently, educational institutions need to address ethics and privacy issues linked to educational data analytics: They need to define who has access to which data, where and how long the data will be stored, and which procedures and algorithms to implement for further use of the available educational data (Ifenthaler, 2015).

# *4.5.2 Privacy in the Digital World*

Within the digital world, many individuals are willing to share personal information without being aware of who has access to the data, how and in what context the data will be used, or how to control ownership of the data. Accordingly, data are generated and provided automatically by online systems, which limits the control and ownership of personal information in the digital world (Slade & Prinsloo, 2013).

There are several reasons why learners would like to keep their information private: First, there are competitive reasons; for example, a learner who performs poorly may not want fellow students to know about it. Second, there are personal reasons; for example, a learner might not want to share information about him-/herself. There are also country-specific differences regarding who owns personal data: In the United States, the collected data belongs to the collectors; in Europe, personal data belongs to the individual (e.g., the learner).

Table 4.5 provides an overview of privacy theories in the digital age. The first two concepts (1, 2) emphasize requirements for reaching privacy in a certain situation and focus on protection and normative or descriptive privacy. Early privacy theories (3) are based on control or limitation: Control refers to the influence of individuals on the flow of their personal data, whereas limitation means the possibility to prevent others from accessing personal data. Contemporary privacy theories (4) incorporate these earlier theories as well as normative and descriptive privacy concepts but go beyond them in being more holistic and applicable to different contexts (Ifenthaler & Schumacher, 2016).

# *4.5.3 Ethical Principles*

Ethical principles for educational data analytics have been developed to underpin decision-making processes and provide guidance in the application of ethics (West et al., 2016). The key principles, as outlined and used in healthcare settings, are also relevant to the discussion of educational data analytics:


**Table 4.5** Overview of privacy concepts

Contemporary privacy theories are more holistic and go beyond the early theories of privacy; they were developed to apply to diverse contexts

Ifenthaler and Schumacher (2016a, b)


Figure 4.7 presents a four-step framework that views ethical decision making as an operational process. The aim of this framework is to concisely model how a complex issue can be mapped, refined, decided on, and documented within a fairly linear process that would suit the busy operating environments of most institutions. There may be circumstances where reflection or new information means retracing earlier steps, and the framework does not oppose doing so (West et al., 2016).

# **Questions and Teaching Materials**

	- (a) **Respect for autonomy**
	- (b) **Building advantages over competitors**

**Fig. 4.7** Ethical decision making process for learning analytics. (West et al., 2016)


Correct Answer: a, c, d.

	- (a) **No**
	- (b) **Yes**

Correct Answer: b.

	- (a) **Environmental reasons**
	- (b) **Competitive reasons**

### (c) **Technical reasons**

### (d) **Personal reasons**

Correct Answer: b, d.

### 4. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to reflect on your teaching experience supported through data. You may reflect on:


# **4.6 Identify Issues of Authorship, Ownership, Data Access and Data-Sharing**

# *4.6.1 Privacy Calculus*

To enhance the acceptance of educational data analytics, it is relevant to involve all stakeholders as early as possible. Students need to be considered in particular, as they take on two roles in educational data analytics: (1) as producers of analytics data and (2) as recipients of the analyses derived from them (Slade & Prinsloo, 2013).

Figure 4.8 shows the deliberation process for disclosing information for educational data analytics. Students assess their concern over privacy on the basis of the specific information required for the learning analytics system (e.g., name, learning history, learning path, assessment results, etc.). This decision can be influenced by risk-minimizing factors (e.g., trust in the learning analytics systems and/or institution, control over data through self-administration) and risk-maximizing factors (e.g., non-transparency, negative reputation of the learning analytics system and/or institution). Concerns over privacy are then weighed against the expected benefits of the learning analytics system. The probability that the students will disclose required information is higher if they expect the benefits to be greater than the risk. Hence, the decision to divulge information on learning analytics systems is a cost–benefit analysis based on the information available to the student.

# *4.6.2 Educational Data Analytics Benefits*

Table 4.6 provides a matrix outlining the benefits of educational data analytics for stakeholders including three perspectives: (1) summative, (2) real-time/formative, and (3) predictive/prescriptive. The summative perspective provides detailed

**Fig. 4.8** Deliberation process for sharing information for learning analytics systems. (Ifenthaler & Schumacher, 2016a, b)

insights after completion of a learning phase (e.g., study period, semester, final degree), often compared against previously defined reference points or benchmarks. The real-time or formative perspective uses ongoing information for improving processes through direct interventions. The predictive or prescriptive perspective is applied for forecasting the probability of outcomes in order to plan for future strategies and actions (Ifenthaler, 2015).

Each cell of the educational data analytics benefits matrix includes examples to be implemented at different phases of the learning process as well as for different purposes. When choosing a specific benefit of educational data analytics, the teacher, e-Tutor or instructional designer needs to understand:


In sum, data ownership refers to the possession of, control of, and responsibility for information. Questions surrounding the ownership of data include considerations of who determines what data is collected, who has the right to claim possession over that data, who decides how any analytics applied to the data are created, used and shared, and who is responsible for the effective use of data. Ownership of data also


**Table 4.6** Educational data analytics benefts matrix

Ifenthaler (2015)

relates to the outsourcing and transfer of data to third parties. A number of scholars point to the lack of legal clarity with respect to data ownership (Corrin et al., 2019). In the absence of legal systems addressing this issue, the default position has been that the "data belongs to the owner of the data collection tool [who is], typically also the data client and beneficiary" (Greller & Drachsler, 2012, p. 50).

# *4.6.3 Data for Instructional Support*

Personalised learning is the notion of customising learning resources and activities to fit the interests and needs of individual learners. As with many educational technologies, personalised learning has a long history. However, with the growth of the Internet and ICTs and the advancement of intelligent systems, it is possible to use learning analytics as the basis for automated recommendation engines that drive individualised e-learning. This technology has been promised by several emerging LMSs, but has not yet become a sustainable reality on any scale. However, personalised learning technology can significantly change how instruction occurs and transform the notion of a learning place dramatically (Spector & Ren, 2015). Hence, data is a critical tool that makes such personalised learning possible. When students, parents, and teachers are empowered with access to timely, useful, safeguarded data, there are many ways to support students on their path to success.

# *4.6.4 Consent and Anonymity*

Corrin et al. (2019, p. 11) provide a well-informed overview on issues of educational data analytics focussing on (a) consent and (b) anonymity.

Consent refers to entering into a contract with data subjects in order to obtain their permission for their data to be gathered and analyzed. Consent must be informed in order to be valid; consequently, people should be given clear and transparent information about the purposes of data collection so that they may give informed consent. They should have the ability to opt out of having their data gathered at any time. Consent is not always a simple matter because it is not always a legal requirement, such as when data gathering is deemed necessary for an organization's 'legitimate interests' (Corrin et al., 2019, p. 32). An example referring to the issue of students not being able to opt out of having their data collected is given in the JISC code of practice (http://repository.jisc.ac.uk/6985/1/Code_of_Practice_for_learning_analytics.pdf).

A more challenging ethical practice is informed consent in the context of learning analytics, which has been critically debated in recent learning analytics research. West et al. (2016) refer to the problematic relationship between 'consent' and 'informed consent', noting that these concepts are often conflated in higher education digital environments. For example, students are frequently asked to agree to their data being collected; however, the purposes for which the data will be used are hidden or not communicated clearly (West et al., 2016, p. 914). Cormack (2016) adds that it is not always clear prior to the collection and analysis of data what correlations will emerge or what the impact on individuals will be. This makes it difficult for educational organizations to communicate clear and transparent information about the use and purposes of the data being collected and to obtain informed consent.

Anonymity gives individuals the option of concealing or revealing their identity and any identifying information about themselves. In the field of learning analytics, individuals' identities may be de-identified before data is shared or analyzed. Although it is widely recognized that institutions should make every attempt to anonymize data, experts have claimed that anonymity cannot always be guaranteed: "Anonymized data can relatively readily be de-anonymized when they are integrated with other information sources" (Drachsler & Greller, 2016, p. 94). Anonymity also limits the possible applications of learning analytics because it hinders or precludes meaningful bilateral communication, as well as the capacity for student intervention, feedback, and assistance.

# *4.6.5 Data Privacy in Productive Systems*

One of the main concerns of educational data analytics is the handling of data privacy issues. As almost every learning analytics feature collects and processes user data by default, this topic must be considered, particularly with regard to the country's data privacy act. It is even more important when the decision is to work within the running, productive environment of the educational institution as soon as possible.

As shown in Fig. 4.9, the educational institution decided to use pseudonymisation in two steps. Wherever students' activities are directly touched, a 32-bit hash value is used as an identifier. All tracking events and prompting requests use this hash value to communicate with the core application. The core API then takes this hash, enriches it with a secret phrase (a so-called pepper) and hashes

**Fig. 4.10** Individual setting for data collection and analytics. (Klasen & Ifenthaler, 2019)

it again. The doubled hash is then stored within the core's database. As a result, newly generated student data can be matched to already existing data without being directly traceable back to a specific student within the database.
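The two-step pseudonymisation described above might be sketched as follows; the pepper value is a placeholder, and CRC-32 stands in for the (unspecified) 32-bit hash mentioned in the text:

```python
import hashlib
import zlib

PEPPER = "secret-institutional-phrase"  # assumption: kept outside the database

def client_hash(student_id: str) -> str:
    # Step 1: short identifier used wherever student activity is tracked
    # (zlib.crc32 stands in for the 32-bit hash mentioned in the text).
    return format(zlib.crc32(student_id.encode()), "08x")

def stored_hash(h: str) -> str:
    # Step 2: the core API enriches the hash with a pepper and hashes it
    # again; only this doubled hash is stored in the core's database.
    return hashlib.sha256((h + PEPPER).encode()).hexdigest()

event_key = stored_hash(client_hash("student-4711"))
# New events from the same student map to the same stored key ...
assert stored_hash(client_hash("student-4711")) == event_key
# ... but without the pepper the key cannot be traced back to the student.
```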

Another important issue for implementing educational data analytics in productive systems is the configuration of data collection and data analytics functionalities. Figure 4.10 shows an example implemented in a productive Learning Management System allowing the student to change the settings for data collection and data analytics at any time. In addition, the student may request to delete the stored data or download all stored data for self-inspection. Hence, compliance with the EU GDPR is given in this case.

Given the examples of how to implement data privacy settings in productive systems, think about your own institution and how you might implement similar features in order to be compliant with the EU GDPR.

# *4.6.6 Case Study: Curtin Challenge I*

This case study demonstrates how the analysis of navigation patterns and network graph analysis informs the learning design of self-guided digital learning experiences.

The Curtin Challenge digital learning platform (http://challenge.curtin.edu.au) supports individual and team-based learning via gamified, challenge-based, open-ended, inquiry-based learning experiences that integrate automated feedback and rubric-driven assessment capabilities. The Challenge platform is an integral component of Curtin University's digital learning environment along with the Blackboard learning management system and the edX MOOCs platform. The Challenge development team at Curtin Learning and Teaching are working towards an integrated authoring system across all three digital learning environments with the view of creating reusable and extensible digital learning experiences (Ifenthaler et al., 2018).

Curtin Challenge includes several content modules, for example Leadership, Careers and English Language Challenge. Since 2015, over 2600 badges have been awarded for the completion of a challenge. The design features of each module contain approximately five activities that might include one to three different learner interactions.

Educational analytics data for the presented case study includes 2,753,142 database rows. Overall, 3550 unique users registered and completed a total of 14,587 navigation events within a period of 17 months. Figure 4.11 provides an overview of modules started (M = 3427, SD = 2880) and completed (M = 2903, SD = 2303)

**Fig. 4.11** Module completion of Curtin Careers Challenge. (Ifenthaler et al., 2018)

for the Curtin Careers Challenge. The average completion rate for the Curtin Careers Challenge was 87%. The most frequently started module was "Who am I?" (10,461) followed by the module "Resumes" (7996). The module "Workplace Rights and Responsibilities" showed the highest completion rate of 96%, followed by the module "Interviews" (92%).

# *4.6.7 Case Study: Curtin Challenge II*

The network analysis identifies user paths within the learning environment and visualises them as a network graph on the fly. The dashboard visualisations help the learning designer to identify specific patterns of learners and may reveal problematic learning instances. The nodes of the network graph represent individual interactions. The edges of the network graph represent directed paths from one interaction to another. The indicator on each edge represents the frequency of users taking the path from one interaction to another and, in parentheses, the percentage of users who took the path. An aggregated network graph shows the overall navigation patterns of all users. A network graph can be created for each individual user, for selected groups of users (e.g., with specific characteristics), or for all users of the learning environment.

The aggregation of all individual network graphs provides detailed insights into the navigation patterns of all users. Figure 4.12 shows the aggregated network graph including paths taken by all 3550 users, showing 14,587 navigation events. The five modules are highlighted using different colours.
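The edge frequencies and user percentages of such a network graph can be derived from navigation events roughly as follows; the per-user interaction sequences are hypothetical, not the study's data:

```python
from collections import Counter

# Hypothetical per-user navigation sequences (lists of interaction ids).
sessions = {
    "user1": ["A", "B", "C"],
    "user2": ["A", "B", "D"],
    "user3": ["A", "C"],
}

edge_freq = Counter()   # how often each directed path was taken
edge_users = Counter()  # how many distinct users took it
for user, path in sessions.items():
    edges = list(zip(path, path[1:]))  # consecutive interactions form edges
    edge_freq.update(edges)
    for edge in set(edges):
        edge_users[edge] += 1

n_users = len(sessions)
for (src, dst), freq in sorted(edge_freq.items()):
    pct = 100 * edge_users[(src, dst)] / n_users
    print(f"{src} -> {dst}: {freq} ({pct:.0f}%)")
# A -> B: 2 (67%)
# A -> C: 1 (33%)
# B -> C: 1 (33%)
# B -> D: 1 (33%)
```

Aggregating these counters over all users yields exactly the edge labels described in the text: a frequency plus, in parentheses, the share of users who took the path.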

Provided the case study above, the following questions arise:


**Fig. 4.12** Aggregated network graph. (Ifenthaler et al., 2018)

### **Questions and Teaching Materials**

	- (a) **False**
	- (b) **True**

Correct Answer: a.

	- (a) **Conduct cross-institutional comparisons**
	- (b) **Track enrolments**
	- (c) **Allocate financial resources.**
	- (d) **Plan for interventions.**

Correct Answer: d.

	- (a) **Non-transparency**
	- (b) **Positive reputation**
	- (c) **Holistic marketing of data**
	- (d) **Established data regulations**

Correct Answer: a, c.

### 4. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to reflect on your teaching experience supported through data. You may reflect on:

• **Are you able to provide your students all the data collected about them when they may request it?**

# **4.7 Applying and Communicating Educational Data and Analytics Findings**

# *4.7.1 Adaptive Learning Technologies*

Adaptive learning and teaching are an alternative to the traditional "one-size-fits-all" approach in the development of digital learning environments. Adaptive learning systems build a model of the goals, preferences and knowledge of each individual learner, and use this model throughout the interaction with the learner in order to adapt to the needs of that learner (Brusilovsky, 1996). Educational data analytics provides the key element for designing and implementing adaptive learning experiences. In sum, adaptive learning and teaching refer to customised learning experiences that focus on the just-in-time needs of an individual learner by providing meaningful interventions, feedback or support.

Learning management systems (LMSs), most commonly used in technology-enhanced learning, typically present identical courses and content for every learner without consideration of the learner's individual characteristics, situation, and needs (Graf & Kinshuk, 2014). As seen in Massive Open Online Courses (MOOCs), such a one-size-fits-all strategy frequently leads to frustration, learning challenges, and a high dropout rate.

Adaptive learning technologies aim to solve this problem by allowing learning systems to automatically modify the learning environment and/or learning activities to the learners' unique situation, traits, and needs, resulting in individualized learning experiences. The system must represent the student and the learning setting in order to create adaptive interventions. This is where data and analytics are required. According to Graf and Kinshuk (2014), adaptive interventions can be based on the following areas:


Other common terms besides "adaptive learning system" include "personalized learning system", which emphasizes the aim of the system to consider a learner's individual differences, and "intelligent learning (or tutoring) system", which focuses on the use of techniques from the field of artificial intelligence to provide learning support.

The phrase "adaptive learning system", on the other hand, emphasizes a learning system's ability to automatically provide different courses, learning materials, or learning activities for different learners. Adaptive, personalized, and intelligent learning systems are those that use learning analytics to tailor instruction to learners' traits and requirements. In their framework of personalization in technology-enhanced learning, FitzGerald et al. (2018) characterized learning analytics systems as follows (see Table 4.7):
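How a learner model might drive the adaptive selection of task difficulty and presentation format can be sketched as follows; the attribute names, thresholds, and the `select_activity` function are hypothetical illustrations, not taken from the frameworks cited above:

```python
from dataclasses import dataclass

@dataclass
class LearnerModel:
    # Hypothetical learner attributes tracked by the system.
    prior_knowledge: float   # 0.0 (novice) .. 1.0 (expert)
    prefers_video: bool

def select_activity(learner: LearnerModel) -> str:
    # Adapt task difficulty to the learner's modelled knowledge,
    # and the presentation format to the modelled preference.
    difficulty = "advanced" if learner.prior_knowledge >= 0.7 else "introductory"
    fmt = "video" if learner.prefers_video else "text"
    return f"{difficulty} {fmt} unit"

print(select_activity(LearnerModel(0.2, True)))   # introductory video unit
print(select_activity(LearnerModel(0.9, False)))  # advanced text unit
```

Real systems replace such hand-written rules with analytics-driven estimates of the learner attributes, but the principle of mapping a learner model to differentiated content is the same.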


**Table 4.7** Personalization dimensions and learning analytics

FitzGerald et al. (2018)

# *4.7.2 Automated and Semi-Automated Interventions*

Closely linked to the demand for new approaches to designing and developing up-to-date adaptive learning environments is the necessity of enhancing the design and delivery of assessment systems and automated computer-based diagnostics (Almond et al., 2002; Ifenthaler et al., 2010). These systems need to meet specific requirements, such as:


Recently, promising methodologies have been developed that provide a strong basis for applications in learning and instruction, meeting the demands that come with a better theoretical understanding of the phenomena that precede, form an integral part of, or accompany the learning process.

Several possible solutions to the assessment and analysis problems of knowledge representations have been discussed (Ifenthaler & Pirnay-Dummer, 2014). It is therefore worthwhile to compare the model-based assessment and analysis approaches in order to illustrate their advantages and disadvantages, strengths and limitations (see Table 4.8). As yet, there is no ideal solution for the automated assessment of knowledge. However, within the last five years, strong progress has been made in the development of model-based tools for knowledge assessment. Still, Table 4.8 highlights necessary further development of the available tools, especially for everyday classroom application.


**Table 4.8** Comparison of model-based assessment tools

# *4.7.3 Instructional Design Principles for Adaptivity*

Leutner (2004) has summarized ten instructional design principles for fostering adaptivity in open learning environments. These principles highlight various instructional elements that can be designed for adaptivity and personalized learning. The principles are:

Adapting …


# **Questions and Teaching Materials**

	- (a) **Features such as need for financial study support help to build adaptive interventions**
	- (b) **Features related to the social environment can help to build adaptive interventions**
	- (c) **Features related to cognitive processing can help to build adaptive interventions**
	- (d) **Features such as need for social collaboration help to build adaptive interventions**
	- (e) **Plan for interventions**

Correct Answer: c.

	- (a) **Adapting the speed of algorithms for data processing**
	- (b) **Adapting the presentation format of learning artefacts**
	- (c) **Adapting the task difficulty**
	- (d) **Adapting the sequence of instructional units**

Correct Answer: b, c, d.

	- (a) **False**
	- (b) **True**

Correct Answer: b.

# 4. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to reflect on your teaching experience supported through data. You may reflect on:


# **4.8 Methodologies for Improving Learning and Teaching Processes as Well as Curricula**

# *4.8.1 Creating Interventions in Classroom Settings*

Following Ann L. Brown's (1992) article, an effective methodology for improving learning and teaching processes as well as curricula is the combination of creating innovative educational environments and conducting experimental studies of those innovations. This so-called design experiment is illustrated in Fig. 4.13. Brown (1992) explains that a functional classroom is central to the design experiment before an investigation can be implemented. Hence, classroom life is synergistic: Aspects of it that are often treated independently, such as teacher training, curriculum selection, testing, and so forth, actually form part of a systemic whole. Just as it is impossible to change one aspect of the system without creating perturbations in others, so too it is difficult to study any one aspect independently from the whole operating system. Brown (1992) suggests that we must always operate under the constraint that an effective intervention should be able to migrate from our experimental classroom to average classrooms operated by and for average students and teachers, supported by realistic technological and personal support.

**Fig. 4.13** Features of design experiments. (Brown, 1992)

# *4.8.2 Educational Design Research at a Glance*

Educational Design Research (EDR) or Design-Based Research (DBR) – the terms are mostly used synonymously – is a meta-methodology in educational research. It represents a genre of applied research in which the iterative development of solutions to practical and complex educational problems provides the setting for scientific inquiry. The solutions can be educational products, processes, programs, or policies. EDR not only targets solving significant problems educational practitioners face but at the same time seeks to discover new knowledge that can inform the work of others with similar problems. EDR distinguishes itself from other forms of inquiry by attending to both solving problems by putting knowledge to use and, through that process, generating new knowledge (McKenney & Reeves, 2014). EDR projects seek to establish collaborations among researchers and practitioners in real-world settings in order to avoid the widespread theory vs. practice dilemma. EDR is closely related to research-based educational design as conducted with teaching and learning analytics, yet entails a bit more. Both concepts are shaped by iterative, data-driven processes to reach successive approximations of a desired intervention. However, research-based educational design focuses solely on intervention development, whereas design research strives explicitly to make a 'transferable' scientific contribution in the form of design principles (McKenney & Reeves, 2014). Major characteristics of Educational Design Research are shown in Table 4.9:

McKenney and Reeves (2014) described a process model for conducting educational design research. Figure 4.14 shows the model which has three main features (Huang et al., 2019):

• **Three core phases in a flexible, interactive structure: analysis, design, and evaluation.**


**Table 4.9** Characteristics of EDR/DBR

Wang and Hannafn (2005)

**Fig. 4.14** Generic model for conducting Educational Design Research. (McKenney & Reeves, 2014)


**Fig. 4.15** Interdependences of system, learning goals, learner, and learning environment. (Pirnay-Dummer et al., 2012a, b)

# *4.8.3 Designing Model-Based Learning Environments*

In model-based and model-oriented learning environments, two kinds of models need to be considered: (1) the model of the learning goal, which represents the expertise, set of skills, or, in general, the things to be learned, and (2) the model within the learner, which is constructed and retained in dependence on the learning environment and on the basis of the current epistemic beliefs active within the learner, i.e., whether and how the learner usually explains parts of the world. We will abbreviate the first type as the LE model (model of the learning environment) and the second as the L model (model of the learner), always assuming that the two types are closely intertwined, especially in well-designed learning environments (Pirnay-Dummer et al., 2012a, b).

As shown in Fig. 4.15 above, the educational system (meso- and exo-system) and the learners have different influences on the learning goals at different times. The learning goals constitute the constraints for the learning environment. The learning environment is a manifestation (a derivate) of the LE model. Possible and available learning environments (technology and/or best practices) influence the system by setting the boundaries for what is possible – and decidable as regards educational planning. The learner has influence on the learning environment (as more or less pre-structured by its design). Learning takes place as soon as the LE model and the L model interact. During that time, the learning goal influences and guides the interaction between the two models. Model-oriented technologies usually focus on the L model, while model-centered technologies concentrate more on the LE model. It is our understanding that the two (very similar) approaches will always go hand in hand and influence each other (Pirnay-Dummer et al., 2012a, b).

# **Questions and Teaching Materials**

	- (a) **EDR is well grounded**
	- (b) **EDR is following a single set of statistical procedures**
	- (c) **EDR is related to contextual issues.**
	- (d) **EDR is integrating various methods and approaches**

Correct Answer: b.

	- (a) **core phase management**
	- (b) **core phase analysis**
	- (c) **core phase design**
	- (d) **core phase transformation**
	- (e) **core phase evaluation**

Correct Answer: b, c, e.

	- (a) **No**
	- (b) **Yes**

Correct Answer: a.

# 4. **ACTIVITY/PRACTICE QUESTION (Reflect on)**

We encourage you to reflect on your teaching experience supported through data. You may reflect on:


# **4.9 Concluding Self-Assessed Assignment**

# *4.9.1 Introduction*

You are requested to complete a concluding self-assessed assignment. This self-assessed assignment is a real-life scenario activity (based on the use case of the instructional designer David), using a rubric across three proficiency levels and an exemplary solution rating. When you have completed this assignment, you will assess it yourself, following the rubric, which will list the criteria required and give guidelines for the assessment.

This self-assessed assignment procedure consists of 5 steps:


# *4.9.2 Step 1. Real Life Scenario*

David is an instructional designer. He recently got involved in a newly funded European research project which focusses on the implementation of teaching analytics for a workplace learning environment. The workplace learning environment includes data collection capabilities for students and teachers. All relevant data are securely stored. Data protection rights have been recognised and are fully in place, following the EU GDPR. In addition to the implementation part of the project, all project partners agreed to follow an educational design research approach.

While David has started to better understand the key features of teaching analytics and how to conduct educational design research, he knows that you have also just recently learned about these topics. Can you help David create a strategy for implementing robust teaching analytics capabilities following the **learning analytics profiles** (student, learning, curriculum) approach?

Another challenge, for which David asks for your help, focusses on the benefits of **learning analytics design**, i.e., using available data from the workplace learning environment to provide dynamic perspectives, including design decisions during the course of learning. Can you point out three benefits David may use for his project?

# *4.9.3 Step 2. Prepare Your Answer*

The implementation of robust teaching analytics capabilities is crucial for the design, implementation, and development of digitally enhanced learning environments. Think about your own educational institution and its current implementation strategy.


# *4.9.4 Step 3. Exemplary Sample Solution*

### **Learning Analytics Profiles**

The strategy for implementing robust teaching analytics capabilities in the workplace learning environment requires attention to at least the following key issues across the three profiles: (1) the student profile, (2) the learning profile, and (3) the curriculum profile.

The student profile includes static and dynamic indicators. Static indicators include gender, age, education level and history, work experience, current employment status, etc. Dynamic indicators include interest, motivation, response to reactive inventories (e.g., learning strategies, achievement motivation, emotions), computer and social media competencies, enrolments, drop-outs, pass/fail rate, academic performance, etc.
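To make the distinction concrete, the student profile can be thought of as a record whose static fields are set once (e.g., at enrolment) while its dynamic fields are updated as the learner interacts with the system. The following is a minimal Python sketch; all field names are hypothetical illustrations, not part of any specific analytics platform:

```python
from dataclasses import dataclass

@dataclass
class StudentProfile:
    # Static indicators: collected once, e.g. at enrolment
    gender: str
    age: int
    education_level: str
    work_experience_years: float
    # Dynamic indicators: updated as the learner interacts with the system
    motivation_score: float = 0.0
    enrolments: int = 0
    dropouts: int = 0
    pass_fail_rate: float = 0.0

# The static part is fixed at creation; the dynamic part evolves over time.
profile = StudentProfile(gender="f", age=34, education_level="BSc",
                         work_experience_years=8.5)
profile.enrolments += 1  # dynamic indicator changes during the course of study
```

Separating the two kinds of indicators in this way makes it explicit which parts of the profile an analytics pipeline must refresh continuously and which it can treat as constant.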

The learning profile includes indicators reflecting the current behaviour and performance within the learning environment (e.g., learning management system). Dynamic indicators include trace data such as time-specific information (e.g., time spent in the learning environment, time per session, time on task, time on assessment). Other indicators of the learning profile include login frequency, task completion rate, assessment activity, assessment outcome, learning material activity (upload/download), discussion activity, support access, ratings of learning material, assessment, support, effort, etc.
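Several of these dynamic indicators can be derived directly from trace data. The sketch below assumes a simplified, hypothetical log format of (student, event, timestamp) triples; real learning management systems export richer logs, but the derivation logic is analogous:

```python
from datetime import datetime

# Hypothetical trace-log format: (student_id, event, timestamp)
trace_log = [
    ("s01", "login",          datetime(2023, 3, 1, 9, 0)),
    ("s01", "task_completed", datetime(2023, 3, 1, 9, 20)),
    ("s01", "logout",         datetime(2023, 3, 1, 9, 45)),
    ("s01", "login",          datetime(2023, 3, 2, 10, 0)),
    ("s01", "logout",         datetime(2023, 3, 2, 10, 30)),
]

def learning_profile_indicators(log, student_id, assigned_tasks):
    """Derive a few dynamic learning-profile indicators for one student."""
    events = [(ev, t) for sid, ev, t in log if sid == student_id]
    logins = [t for ev, t in events if ev == "login"]
    logouts = [t for ev, t in events if ev == "logout"]
    completed = sum(1 for ev, _ in events if ev == "task_completed")
    # Time spent = total duration of paired login/logout sessions, in minutes
    minutes = sum((end - start).total_seconds() / 60
                  for start, end in zip(logins, logouts))
    return {
        "login_frequency": len(logins),
        "task_completion_rate": completed / assigned_tasks,
        "time_spent_minutes": minutes,
    }

indicators = learning_profile_indicators(trace_log, "s01", assigned_tasks=4)
# → {'login_frequency': 2, 'task_completion_rate': 0.25, 'time_spent_minutes': 75.0}
```

In practice such indicators would be recomputed periodically (or streamed) so that the learning profile reflects the learner's current, not historical, behaviour.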

The curriculum profile includes indicators reflecting the expected and required performance defined by the learning designer and course creator. Static indicators include course information such as facilitator, title, level of study, and prerequisites. Individual learning outcomes are defined, including information about knowledge type (e.g., content, procedural, causal, metacognitive), sequencing of materials and assessments, as well as required and expected learning activities.

The available data from all profiles are analysed using pre-defined analytic models allowing summative, real-time, and predictive comparisons. The results of the comparisons are used for specifically designed interventions which are returned to the corresponding profiles. The (semi-)automated interventions include reports, dashboards, prompts, and scaffolds for teachers. Additionally, teachers can send customised messages for following up on critical incidents (e.g., students at risk, assessments not passed, satisfaction not acceptable, etc.).
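The comparison step described above can be sketched as a simple rule check: a learner's current indicators (from the learning profile) are compared against expectations defined in the curriculum profile, and any mismatch yields an intervention message for the teacher. The thresholds and message texts below are hypothetical placeholders for illustration:

```python
# Hypothetical expectations taken from the curriculum profile
curriculum_expectations = {
    "expected_completion_rate": 0.8,
    "min_assessment_score": 60,
}

def derive_interventions(learning_indicators, expectations):
    """Compare a learner's current indicators against curriculum
    expectations and return (semi-)automated intervention messages."""
    messages = []
    if learning_indicators["task_completion_rate"] < expectations["expected_completion_rate"]:
        messages.append("prompt: task completion below expected rate")
    if learning_indicators["assessment_score"] < expectations["min_assessment_score"]:
        messages.append("alert: assessment not passed - follow-up recommended")
    return messages

# A learner falling short on both criteria triggers both interventions
at_risk = {"task_completion_rate": 0.4, "assessment_score": 55}
alerts = derive_interventions(at_risk, curriculum_expectations)
```

Real analytic models would of course go beyond fixed thresholds (e.g., predictive models for students at risk), but the flow — compare profiles, then route interventions back to teachers — is the same.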

### **Learning Analytics Design**

The traditional perspective on learning design is rather static and does not include changes to the learning environment within a short timeframe or while learning processes are under way. In contrast, learning analytics design provides a dynamic perspective, including design decisions on the fly. Especially for learning environments with a large number of learners, the benefits of learning analytics design are obvious:



# *4.9.5 Step 4. Rubrics for Assessing Your Work*

# *4.9.6 Step 5. Self-Evaluate Your Answer*

Now that you have seen the exemplary solution, please rate your own work using the criteria in the rubrics for assessing your work.

Calculate your overall score based on the rubrics for assessing your work.


For each of the criteria in the rubric assign to your solution:


Then add up the individual points to calculate your overall score.

My overall score is:

Please mark the applicable answer.


# **References**


Conole, G. (2013). *Designing for learning in an open world*. Springer.


*Technology Research and Development, 64*(5), 877–880. https://doi.org/10.1007/s11423-016-9480-3


# *Useful Video Resources*

Merrill on Instructional Design **[5:41].**

Enhancing educational data quality in heterogeneous learning contexts using Pentaho Data Integration **[1:09:06].**

Privacy and learning analytics [03:00].

GDPR for Schools & Universities | Is Your University GDPR Ready? [04:23].

GDPR Guidance for Schools [06:20].

Build your data eco-system [04:36].

You Need Data to Personalize Learning [03:10].

Adaptive e-Learning – considerations for lesson development [02:49].

Education Professor Wins Grant to Develop Digital Game-Based 'stealth' Assessments [02:49].

What are the differences between the various design-based methods of inquiry? [03:49].

# *Further Readings*


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Appendix**

### **Learn2Analyze Educational Data Literacy Competence Framework**



