# Nancy Devlin · David Parkin · Bas Janssen

# Methods for Analysing and Reporting EQ-5D Data


Nancy Devlin Centre for Health Policy University of Melbourne Melbourne, Australia

Bas Janssen EuroQol Research Foundation Rotterdam, The Netherlands

David Parkin Office of Health Economics London, UK

ISBN 978-3-030-47621-2 ISBN 978-3-030-47622-9 (eBook) https://doi.org/10.1007/978-3-030-47622-9

© The Editor(s) (if applicable) and The Author(s) 2020. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

#### Foreword By Niek Klazinga

Efforts to capture health and healthcare outcomes through measurement, at personal, group and population level, have been around for many years. In the broad field of today's patient reported measurement instruments, the generic EQ-5D instruments for capturing health status stand out: they have been in use for over three decades and have been implemented for a variety of purposes. These include the systematic assessment of the health of populations, both cross-sectionally and over time, the (economic) evaluation of health interventions and, more recently, support for the drive towards value-based health care. This means that users range from patients and clinicians to all stakeholders in the healthcare system that seek to underpin their decisions with aggregated patient reported information, such as healthcare managers, financiers and policymakers.

Although a lot of literature has become available over the years on the validity and reliability of the various EQ-5D instruments, general information about how to analyse the data once collected is scarce and scattered. This book seeks to fill this void and will prove to be a welcome support for all parties who want to analyse the collected data. After a concise explanation of the existing instruments, the reader will find detailed information on data analyses related to topics like the EQ VAS, the calculation and use of EQ-5D values and analyses of EQ-5D data for specific purposes.

With the broadening of the use of generic PROMs to include day to day management of health care and creating more value in the healthcare system, a new audience will start exploring its use. For all users this book will provide support in determining whether for them the EQ-5D instruments are 'fit for use' and how collected data can best be analysed.

In 2017 the Organization for Economic Cooperation and Development (OECD) started the Patient Reported Indicator Survey program (PaRIS) to support member states in strengthening a data-driven shift towards value-based healthcare systems. The EQ-5D instruments and the related expertise that has been gained over the years play an important role in this endeavour.

This book can help to turn enthusiasm for collecting PROMs data into evidence-based and well-informed analysis of the aggregated findings for local, national and international comparative use.

> Niek Klazinga, M.D. Ph.D. Strategic Lead Healthcare Quality and Outcomes Program OECD Paris, France

Professor of Social Medicine Amsterdam University Medical Centre, AMC Amsterdam, Netherlands

#### Foreword By Elly Stolk and Gouke Bonsel

When the authors informed us of the plans for this book about EQ-5D, we were immediately excited by its potential to support users in their analysis and reporting of EQ-5D data.

The EQ-5D is a very widely used measure of self-reported health globally. It is a concise, generic questionnaire which is accompanied by value sets, and this has made it particularly widely used in economic evaluation. EQ-5D use is supported by the availability of a wide range of language versions. The EQ-5D 'family' of instruments has expanded to include both three- and five-level versions and a version suitable for use in children, with further instruments planned or in development. User guides are available to support and guide data collection.

However, to date, there has been no comprehensive source of advice to users on the methods that can be applied to analyse EQ-5D data. This book addresses that gap, providing users for the first time with detailed explanation of methods for analysing and reporting EQ-5D data.

One of the main messages in this book is the rich and detailed insights that can be obtained by analysing all aspects of the EQ-5D data provided by respondents. The EQ-5D is unique, as a generic patient reported outcomes questionnaire, in yielding respondents' self-reported descriptions of their own health; value sets which can be used to summarise those descriptions; and a self-assessed measure of overall health, the visual analogue scale (EQ VAS). Each of these elements provides important data, with different properties, that require different methods of analysis and provide different insights.

Economic evaluation of healthcare interventions was the primary field of application for which the EQ-5D developers envisaged the instrument to be useful, and its uptake in this field has made EQ-5D mainstream since its inception. For two decades now, market decisions on pharmaceuticals have relied heavily on the seemingly simple numbers of EQ-5D values. However, even in this context, where the focus is on the use of value sets to estimate QALYs, the full analysis of patients' profile and EQ VAS data can enrich an understanding of patients' health problems and improvements in health from treatment.

Beyond economic evaluation, there are expanding uses of EQ-5D. It has long been used in population health surveys, for example. More recently, the collection of Patient Reported Outcomes Measures (PROMs) has emerged as a major field of application of EQ-5D. The 'PROMification' of healthcare systems reflects a deeply felt need for accountability and quality improvement. In this context, the collection of PROMs such as EQ-5D is at the core of a movement aiming at quality improvement. While not developed for this use, EQ-5D earned a position as primary candidate for PROMs in this setting, due to its brevity and the ample evidence available about its validity as a generic measure of health. Notably, while EQ-5D PROMs data can be analysed by applying value sets, the rationale for doing so is not clear. This broader application of EQ-5D implies a broader and appropriate set of metrics. Increased use of EQ-5D as a PROM has stressed the need for considering what we can learn from the data captured in the EQ-5D descriptive system, but guidance on how to analyse this data has been lacking. It is therefore particularly timely that authors Nancy Devlin, David Parkin and Bas Janssen provide a comprehensive set of methods that can be applied to analyse and report patients' EQ-5D responses in this context.

This book marks the start of a new phase in EQ-5D applications and analysis. We hope the reader feels as inspired as we did when reading the book.

> Elly Stolk, Ph.D. Scientific Team Leader Founding EuroQol Member EuroQol Research Foundation Rotterdam, The Netherlands

Gouke Bonsel, M.D. Ph.D. Founding EuroQol Member EuroQol Research Foundation Rotterdam, The Netherlands

#### Preface

The EQ-5D is a short questionnaire designed to measure patient reported health in a broad, 'generic' manner. Its strength lies both in its brevity and in its ability to measure patient health in a manner that can be compared across patients, diseases and treatments. Since its development nearly three decades ago, it has become the most widely used Patient Reported Outcomes questionnaire internationally, used in population health surveys, clinical studies and in routine outcomes measurement in healthcare systems (Devlin and Brooks 2017).

Yet, despite nearly 30 years of its use, there is no comprehensive guide for users on how to analyse EQ-5D data. The EuroQol Group, which developed the EQ-5D, provides user guides, but these emphasise an explanation of the questionnaire and how to collect the data, rather than how to analyse it. We frequently receive requests for advice on how to analyse EQ-5D data, once collected.

Our aim in writing this book is to fill this need by providing clear and comprehensive guidance on the methods which can be used to analyse EQ-5D data. In doing so, we set out to encourage users to make full use of the data collected from patients, in order to maximise the insights that can be obtained. Our intended audience is both new users of the EQ-5D, who may not be familiar with how to analyse the data, and experienced analysts, as a reminder that simple descriptive analysis can yield powerful insights that should precede and inform more sophisticated modelling.

In each chapter, we explain the methods in a straightforward way, with a focus on the underlying measurement properties of the EQ-5D instruments and how they affect the approach to data analysis. Understanding the nature of the various distinctive elements of data generated by the EQ-5D (the profile, the EQ VAS and the EQ-5D values) is critical, and our focus is on the intuition underlying each approach. We have not set out to write a statistics textbook, so where appropriate we refer readers to relevant sources for further information.

In order to encourage users to use the methods we describe here, this book will be accompanied by code in the most widely used statistical software: Stata, R, SAS, SPSS and, where possible, Excel. That code will be free to download and will be available from the EuroQol Group website: www.euroqol.org.

We hope you find this book useful!

Nancy Devlin Professor of Health Economics University of Melbourne Melbourne, Australia

Senior Visiting Fellow Office of Health Economics London, UK

David Parkin Senior Visiting Fellow Office of Health Economics London, UK

Honorary Visiting Professor, City University of London London, UK

Bas Janssen Senior Scientist, EuroQol Research Foundation Rotterdam, The Netherlands

#### Acknowledgements

The authors are grateful to Yan Feng, Bernarda Zamora, Mark Oppe, Sarah Dewilde, Eleanor Pullenayegum, Ning Yan Gu and Allan Wailoo for their contribution of examples for this book, and to Amy Livingstone for her assistance with seeking copyright permissions. Mike Herdman and Bram Roudijk provided helpful comments on earlier drafts. Funding to write this book was received from the EuroQol Research Foundation. Views expressed in the book are those of the authors and are not necessarily those of the EuroQol Research Foundation.

#### About the Authors

Prof. Nancy Devlin is Professor of Health Economics and Director of the Centre for Health Policy at the University of Melbourne, Australia, and Senior Visiting Fellow at the Office of Health Economics, London. She is President of ISPOR (2019–2020) and past president of the EuroQol Group. Her principal areas of research are the measurement and valuation of Patient Reported Outcomes; and the cost effectiveness thresholds used in making judgments about value for money in health care. Previous books include 'Economic Analysis in Health Care', 'Using Patient Reported Outcomes to Improve Health Care', and 'EQ-5D Valuation Sets: An Inventory, Comparative Review and User Guide'.

Prof. David Parkin is Visiting Professor of Health Economics at City, University of London, and Senior Visiting Fellow at the Office of Health Economics, London. He has been a member of the EuroQol group for over 20 years and was one of the developers of the EQ-5D-5L. He has published extensively on the EQ-5D as well as other topics in health economics and health status measurement. This includes a popular textbook on health economics, 'Economic Analysis in Health Care' with Stephen Morris, Nancy Devlin and Ann Spencer, and the handbook 'Using Patient Reported Outcomes To Improve Health Care', with John Appleby and Nancy Devlin.

Dr. Bas Janssen is an independent researcher and consultant in health economics and outcomes research. He is a senior scientist at the business office of the EuroQol Research Foundation and affiliated to the Erasmus University Medical Center in Rotterdam. He specialises in the measurement and valuation of Patient Reported Outcomes, psychometrics and biostatistics. He has published extensively on EQ-5D, including the handbook 'Self-Reported Population Health: An International Perspective based on EQ-5D' with Agota Szende and Juan Cabasés.

## **Chapter 1 An Introduction to EQ-5D Instruments and Their Applications**

The aims of this chapter are


Our focus, throughout this book, is on the analysis of EQ-5D data. The book is designed to meet the needs of those who have, or are planning to collect, EQ-5D data. Our hope is that this book will encourage all analysts, both those new to the EQ-5D and those experienced in using EQ-5D questionnaires, to make full use of the data provided by respondents, and to maximise the insights possible from those data.

It is also important to say what this book does not address. We do not provide guidance on methods of Patient Reported Outcome (PRO) data collection or PRO study design. For such guidance, you may wish to consult resources such as the SPIRIT-PRO<sup>1</sup> guidelines on inclusion of PROs in clinical trials (Calvert et al. 2018), the United States Food and Drug Administration (FDA) guidance to industry on the use of PRO measures in evidence to support labelling claims (FDA 2009); the European Medicines Agency (EMA) guidance regarding use of health-related quality of life (HRQoL) in labelling studies (EMA 2006); and the various good practice guidelines published by the International Society for Pharmacoeconomics & Outcomes Research (ISPOR), for example on electronic PROs (Zbrozek et al. 2013), and on collection of PROs in paediatric studies (Matza et al. 2013). Also, we do not offer guidance on which EQ-5D questionnaire to use in what circumstances: for example, in what populations to use the youth version of the EQ-5D (the EQ-5D-Y); whether to use the three- or five-level version; and how and when to use the paper, telephone, proxy or digital versions. Information on these issues is provided in the User Guides available online at: www.euroqol.org.

<sup>1</sup>SPIRIT: Standard Protocol Items: Recommendations for Interventional Trials.

N. Devlin et al., *Methods for Analysing and Reporting EQ-5D Data*, https://doi.org/10.1007/978-3-030-47622-9\_1

A glossary of the EQ-5D terms used in this and subsequent chapters is in an appendix.

#### **1.1 Measuring Health Using the EQ-5D**

The EQ-5D is a concise, generic measure of self-reported health which is accompanied by weights reflecting the relative importance to people of different types of health problems. The concept of health being measured by EQ-5D is variously described as health status or HRQoL,<sup>2</sup> the latter of which might be defined as:

The value assigned to duration of life as modified by the impairments, functional status, perceptions and social opportunities that are influenced by disease, injury, treatment or policy. (Patrick and Erickson 1993)

The EQ-5D is 'generic' because it measures health in a way that can be compared across different sorts of patients, disease areas, and treatments. The researchers who developed it, the EuroQol Group, aimed to develop a questionnaire which was brief, minimised the burden of data collection, and could be used in a wide variety of health care sector applications (Devlin and Brooks 2017). The '5D' in its name refers to its use of 5 dimensions for describing health states: Mobility, Self-care, Usual Activities, Pain & Discomfort and Anxiety & Depression. In the original EQ-5D questionnaire (Fig. 1.1), now known as the EQ-5D-3L, three levels of problems are described in each dimension, representing no, moderate, or extreme problems in the Pain & Discomfort and Anxiety & Depression dimensions and no, some, and inability to in the Mobility, Self-care and Usual Activities dimensions.<sup>3</sup> In the more recent EQ-5D-5L (Fig. 1.2), the number of levels has been expanded from three to five and these are explicitly expressed as no, mild, moderate, severe and extreme or unable to (Herdman et al. 2011). A version of the instrument, the EQ-5D-Y (Fig. 1.3), has been developed for young people and children, retaining the same five dimensions (Wille et al. 2010).

In each case, the questionnaires are designed mainly for self-completion, either by people who are receiving treatment (for example patients in a clinical trial) or people in other settings (for example a sample of the general public in a population health survey). (As well as the self-report questionnaire, there are also 'interview' and 'proxy' versions, designed for special cases where people whose EQ-5D data are being collected cannot complete a self-report questionnaire themselves.) For this reason, the EQ-5D belongs to a category of questionnaires often referred to as PROs and sometimes as Patient Reported Outcome Measures (PROMs). PROs aim to measure people's subjective assessment of their own health in a manner that is systematic, valid and reliable. There is growing recognition that such data from patients provide important information that complements the clinical endpoints traditionally used in medical care, and can pick up problems and issues missed by them (Appleby et al. 2015). For example, Robert Temple from the FDA stated that "The use of Patient Reported Outcome instruments is part of a general movement toward the idea that the patient, properly queried, is the best source of information about how he or she feels" (Bren 2006). The EQ-5D is one of the most widely used PRO measures internationally, and by 2016 the EQ-5D-3L was available in 176 language versions, the EQ-5D-5L in 123 and the EQ-5D-Y in 40 (Devlin and Brooks 2017).

<sup>2</sup>For a discussion of definitional and conceptual issues relating to HRQoL, see Morris et al. (2012), Sect. 11.3.

<sup>3</sup>For the Mobility dimension the worst level is 'confined to bed'.

**Fig. 1.1** EQ-5D-3L descriptive system. *Source* EuroQol Research Foundation. *EQ-5D-3L User Guide, 2018*. Latest version available from: https://euroqol.org/publications/user-guides

**Fig. 1.2** EQ-5D-5L descriptive system. *Source* EuroQol Research Foundation. *EQ-5D-5L User Guide, 2019*. Latest version available from: https://euroqol.org/publications/user-guides

**Fig. 1.3** EQ-5D-Y. *Source* EuroQol Research Foundation. *EQ-5D-Y User Guide, 2014*. Latest version available from: https://euroqol.org/publications/user-guides

The EQ-5D questionnaire comprises two parts. The first is the EQ-5D descriptive system, as shown in Figs. 1.1, 1.2, and 1.3. Respondents are asked to tick boxes to indicate the level of problem they experience on each of the five dimensions. The combination of these ticks under each dimension describes that person's EQ-5D self-reported health state, often called an 'EQ-5D profile', which is described in more detail below.

The second part of the questionnaire is the EQ VAS, so called because it incorporates a Visual Analogue Scale. This captures the respondent's overall assessment of their health on a scale from 0 (worst health imaginable) to 100 (best health imaginable). The current versions of the EQ-5D-3L and 5L use the same EQ VAS, shown in Fig. 1.4, but the original version of the 3L had a slightly different format, as does the EQ-5D-Y.

The EQ-5D profile data can also be supplemented by using a 'scoring' or 'weighting' system to convert profile data to a single number—EQ-5D values. These scoring systems are usually based on preferences—that is, the problems on each dimension are weighted to reflect how good or bad people think they are. So, for example, many studies have shown that problems with pain and discomfort often carry more weight than problems with self-care as reported by the EQ-5D (see Szende et al. 2007), and this is reflected in the way questionnaire respondents' profile data is summed. These EQ-5D values—which are sometimes referred to in the literature as the EQ-5D Index, or quality of life weights or utilities—are constructed to lie on a scale anchored by the value 1, full health, and 0, dead. EQ-5D values cannot take a value higher than 1, but values less than 0 are possible for health states considered to be worse than dead.

A full set of values for each possible EQ-5D profile is often called a 'value set'. These values are obtained from stated preference studies, where members of the general public<sup>4</sup> are asked to imagine living in health states described by the EQ-5D descriptive system, and to engage in a series of tasks designed to gauge how good or bad they consider those health states to be. A variety of methods can be used to elicit these preferences and to model them to create weights for the components of the EQ-5D profiles. The resulting 'value sets' (the complete lists of values for each of the 243 profiles described by the EQ-5D-3L and EQ-5D-Y, and for the 3125 states described by the EQ-5D-5L) differ depending on what methods were used to elicit and model the preferences. They may also differ by country, reflecting differences in preferences across cultures and regions. Being aware of the properties of these value sets, and the difference they might make to your analysis of EQ-5D profile data, is important, and we discuss this further below and in Chap. 4.

<sup>4</sup>By convention, and for normative reasons, the general public's stated preferences are usually argued to be those relevant to constructing these value sets (see, for example, Neumann et al. 2017). Value sets and their use are discussed in more detail in Chap. 4.

**Fig. 1.4** EQ VAS (current EQ-5D-5L and EQ-5D-3L version). *Source* EuroQol Research Foundation. *EQ-5D-5L User Guide, 2019*. Latest version available from: https://euroqol.org/publications/user-guides

#### **1.2 What does the EQ-5D Measure?**

The two parts of the EQ-5D questionnaire, combined with the value sets, mean that the instrument generates three distinct types of data: the EQ-5D profile; the EQ VAS; and the EQ-5D values.

Each of these elements measures a somewhat different underlying construct of health. It is important to understand the nature of what is being measured in each case, since this affects hypotheses both about the expected relationship between these elements and between them and other data collected on respondents' health and other characteristics.

#### *1.2.1 The EQ-5D Profile*

A respondent's EQ-5D profile is a summary of the responses that they give to the descriptive system component of the EQ-5D self-report questionnaire. It can be described as five sentences, or summarised as a series of numbers representing the levels of problems in the order that the dimensions appear. Boxes 1.1, 1.2, and 1.3 give a fuller description.

#### **Box 1.1. What are EQ-5D profiles?**

A set of responses to the statements given in the descriptive system element of the EQ-5D questionnaire describes a health state or 'profile' as a combination of dimensions and levels within dimensions. For example, a completed questionnaire may be like this:

This profile can be described as a series of five sentences. For example, this respondent has:


In Box 1.2 we describe how these profiles may be more concisely summarised.

#### **Box 1.2. Summarising EQ-5D profiles**

A simpler way than using five sentences to summarise a profile is to assign each level a number and describe the profile as a five-number string, representing the level of each dimension in the order in which they appear in the questionnaire. The numbers used are: no problems = 1; some problems = 2; and extreme problems or unable to = 3. So, for example, no problems in any dimension is 11111, some problems in every dimension is 22222, and extreme problems in every dimension is 33333. The profile shown in Box 1.1 is 11232.

EQ-5D-5L profile data can be summarised in the same way. 11111 again means no problem on any of the five dimensions of health and the worst health state is 55555. The profile labels are not directly equivalent between the 3L and the 5L, except for 11111, which means no problems on any dimension. The worst health profiles, 33333 and 55555, describe different underlying health states because the worst level for mobility in the 3L is 'confined to bed' whereas in the 5L it is 'unable to walk about'. Similarly, the 'middle' states, 22222 and 33333, mean different things, as 3L level 2 refers to 'some' problems, but 5L level 3 refers to 'moderate' problems.
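The labelling convention described above can be sketched in a few lines of Python. This is only an illustration: the function names `to_label` and `parse_label` are hypothetical and not part of any EuroQol software, and the sketch assumes that levels are listed in the order the dimensions appear in the questionnaire (Mobility, Self-care, Usual Activities, Pain & Discomfort, Anxiety & Depression).

```python
# Dimension order as it appears in the questionnaire (assumed here).
DIMENSIONS = ["Mobility", "Self-care", "Usual Activities",
              "Pain & Discomfort", "Anxiety & Depression"]


def to_label(levels, n_levels=3):
    """Join five dimension levels into a profile label such as '11232'.

    n_levels is 3 for the EQ-5D-3L and EQ-5D-Y, and 5 for the EQ-5D-5L.
    """
    if len(levels) != len(DIMENSIONS):
        raise ValueError("an EQ-5D profile has exactly five dimensions")
    if any(not isinstance(lv, int) or lv < 1 or lv > n_levels for lv in levels):
        raise ValueError(f"levels must be integers from 1 to {n_levels}")
    return "".join(str(lv) for lv in levels)


def parse_label(label, n_levels=3):
    """Split a profile label into a dict mapping each dimension to its level."""
    if len(label) != 5 or not label.isdigit():
        raise ValueError("a profile label is a string of five digits")
    levels = [int(ch) for ch in label]
    to_label(levels, n_levels)  # reuse the range check above
    return dict(zip(DIMENSIONS, levels))


print(to_label([1, 1, 2, 3, 2]))         # 11232
print(parse_label("55555", n_levels=5))  # every dimension at level 5
```

Note that the label is kept as a string rather than converted to an integer, which reflects the point that profile labels are categories and not numbers.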

The numbers given to levels within dimensions are ordinal—for example, 3 is worse than 2 and 2 is worse than 1. However, the profile labels are categories, not numbers, and do not even have ordinal properties. They do have a limited logical ordering—see Devlin et al. (2010) and Parkin et al. (2010) for further details—and in some cases can be used to compare profiles. For example, profile 11111 is better than profile 11112 (it logically dominates it) and 11112 is better than 11122. But we cannot say anything about *how much* better 11111 is compared to 11112. Moreover, we cannot say whether 11112 is better or worse than a profile such as 11121. That depends on the relative importance attached to some problems with anxiety & depression compared with some problems with pain & discomfort.
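The limited logical ordering described above can be illustrated with a short Python sketch. The function name `compare` and its return labels are assumptions made for this illustration; the logic uses only the ordinal property that a lower level number is better within a dimension.

```python
def compare(profile_a, profile_b):
    """Compare two profile labels using only the ordinal levels.

    Returns 'better', 'worse', 'same' or 'incomparable', describing
    profile_a relative to profile_b.
    """
    a = [int(ch) for ch in profile_a]
    b = [int(ch) for ch in profile_b]
    if len(a) != 5 or len(b) != 5:
        raise ValueError("profile labels must have five digits")
    # profile_a is no worse on any dimension (lower level = fewer problems)
    no_worse = all(x <= y for x, y in zip(a, b))
    # profile_a is no better on any dimension
    no_better = all(x >= y for x, y in zip(a, b))
    if no_worse and no_better:
        return "same"
    if no_worse:
        return "better"
    if no_better:
        return "worse"
    # Better on some dimensions and worse on others: weights are needed.
    return "incomparable"


print(compare("11111", "11112"))  # better
print(compare("11112", "11122"))  # better
print(compare("11112", "11121"))  # incomparable
```

The 'incomparable' result is the case where a judgement about relative importance, such as a value set, is required.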

Chapter 2 demonstrates how health profiles can be compared to make judgements about whether health has improved, using only the ordinal properties of the levels within profiles. But to compare health profiles such as 11112 and 11121 and to measure the magnitude of the difference between any profiles requires a scoring system that assigns weights to each profile. EQ-5D value sets achieve that, using data from stated preferences studies to convert the profile data into a single, cardinal number. We examine the use of value sets in detail in Chap. 4.

#### **Box 1.3. How many EQ-5D profiles?**

For the EQ-5D-3L, there are 3<sup>5</sup> = 243 possible profiles. There are three groups of profiles that include only two levels (1 and 2, 2 and 3 or 1 and 3), with 2<sup>5</sup> = 32 profiles (13% of all profiles) in each group. Therefore, for each level there are 3<sup>5</sup> − 2<sup>5</sup> = 211 profiles that include at least one of that level. So:


The number of unique profiles described by the EQ-5D-5L is 5<sup>5</sup> = 3125. There are five groups of profiles that include only four levels, with 4<sup>5</sup> = 1024 profiles (33% of all profiles) in each group. Therefore, for each level there are 5<sup>5</sup> − 4<sup>5</sup> = 2101 profiles that include at least one of that level, 5<sup>5</sup> − 3<sup>5</sup> = 2882 that contain at least one of two given levels and 5<sup>5</sup> − 2<sup>5</sup> = 3093 that contain at least one of three given levels. So:


In practice, not all profiles have an equal probability of being observed. For example, data obtained from the general population often contain a large proportion of profile 11111. In patient data sets, observations are often clustered on a sub-set of profiles relevant to those patients' condition; and some profiles are almost never observed because they contain unusual combinations of levels—for example the EQ-5D-3L profile 33133, in which there are extreme problems with everything except usual activities, where there are no problems.
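The profile counts in this box can be checked by brute-force enumeration. The following Python sketch is one way to do so; the helper names `all_profiles` and `count_with_any` are hypothetical, and level 1 is used only as an example because the counts are the same for any level by symmetry.

```python
from itertools import product


def all_profiles(n_levels):
    """Enumerate every profile as a tuple of five levels (1..n_levels)."""
    return list(product(range(1, n_levels + 1), repeat=5))


def count_with_any(n_levels, levels):
    """Count profiles that contain at least one of the given levels."""
    wanted = set(levels)
    return sum(1 for p in all_profiles(n_levels) if wanted.intersection(p))


print(len(all_profiles(3)))           # 243  = 3^5
print(count_with_any(3, [1]))         # 211  = 3^5 - 2^5
print(len(all_profiles(5)))           # 3125 = 5^5
print(count_with_any(5, [1]))         # 2101 = 5^5 - 4^5
print(count_with_any(5, [1, 2]))      # 2882 = 5^5 - 3^5
print(count_with_any(5, [1, 2, 3]))   # 3093 = 5^5 - 2^5
```

Enumeration of this kind is cheap for both instruments, so the same list can be reused to tabulate which of the possible profiles actually occur in a data set.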

The profile element of the EQ-5D questionnaire can be categorised as an example of a Health Status Measurement questionnaire, broadly defined (Bowling 2001, 2004). As noted earlier, the EQ-5D is often also described in the literature as measuring HRQoL. However, the concept of quality of life, and which aspects of it are seen as health-related, is often not precisely defined. Because the EQ-5D is a generic instrument, the EQ-5D profile will not capture everything that matters to all people with respect to their health status or HRQoL, and does not claim to do so. That means that, for some diseases and patients, there may be aspects of health that are important which the EQ-5D does not fully reflect, and this may be important to consider in your analysis of the data.

#### *1.2.2 EQ VAS*

The EQ VAS can be thought of as showing how patients feel about their own health overall. Their overall score will reflect both the relative importance that they place on the different aspects of their health that are included in the EQ-5D descriptive system and other dimensions of health that are not. The EQ VAS therefore provides information that is complementary to the EQ-5D profile. For example, it is often observed that some people who report no problems in any EQ-5D dimension rate their health as less than 100 on the EQ VAS (for example, see Devlin et al. 2004). Chapter 3 discusses other evidence for this, for example that the average EQ VAS scores decline with age even for those whose profile is 11111. Further, although profiles are systematically related to the EQ VAS scores in regression analyses, they only partially explain them (Feng et al. 2014).

#### *1.2.3 EQ-5D Values*

As noted above, EQ-5D values data are produced by applying value sets to summarise the EQ-5D profile data. The nature of these value sets, and their characteristics, are influenced by their principal application, which is in the estimation of quality-adjusted life years (QALYs). It is their use in this context that determines the anchors for the scale of 1 for full health and 0 for dead.<sup>5</sup>

It is important to note that using these value sets to generate EQ-5D values data introduces a source of exogenous variance into the analysis of profile data which can bias statistical inference (Parkin et al. 2010). Each value set places a different weight on the various levels and dimensions of the profile data, reflecting underlying differences in preferences, the methods used to elicit them, or both. This means that whether there are statistically significant differences in the EQ-5D values between, for example, two arms of a clinical trial, or between two regions in a national health survey, may depend on which value set is used, and the relative importance it puts on the different types of health problems and improvements in them.

More generally, there is *no* neutral way to summarise the data from the EQ-5D profile into a single number. This is not an issue that is only relevant to the EQ-5D instruments: these same points are relevant to the scoring and weighting systems used in *all* generic or condition specific PROs. Any method of combining responses to multiple questions must entail some weight being placed on each question. Even if preference-based weights were not used, and the dimensions of a PRO were equally weighted, that would imply a strong value judgement about the relative importance of various kinds of health problems that may or may not reflect the views of the people who self-reported their health on that PRO. Analysts should be aware of this, and check for the sensitivity of results to the choice of value set.
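The dependence of results on the choice of weights can be illustrated with a minimal sketch. The two 'value sets' below are hypothetical decrements invented purely for illustration (they are not any published EQ-5D value set), and the simple additive scoring rule, the dimension abbreviations and the example profiles are likewise assumptions.

```python
# Hypothetical decrements per dimension and level (level 1 carries no decrement).
# These numbers are invented for illustration; they are NOT a published value set.
DIMENSIONS = ["MO", "SC", "UA", "PD", "AD"]

VALUE_SET_A = {  # places more weight on pain and discomfort
    "MO": {2: 0.05, 3: 0.20}, "SC": {2: 0.05, 3: 0.15}, "UA": {2: 0.05, 3: 0.10},
    "PD": {2: 0.15, 3: 0.40}, "AD": {2: 0.05, 3: 0.15},
}
VALUE_SET_B = {  # places more weight on mobility
    "MO": {2: 0.15, 3: 0.40}, "SC": {2: 0.05, 3: 0.15}, "UA": {2: 0.05, 3: 0.10},
    "PD": {2: 0.05, 3: 0.20}, "AD": {2: 0.05, 3: 0.15},
}


def value(profile, value_set):
    """Additive score: 1 minus the sum of decrements for a profile such as '21121'."""
    total = 0.0
    for dim, level in zip(DIMENSIONS, profile):
        total += value_set[dim].get(int(level), 0.0)
    return 1.0 - total


def mean_value(profiles, value_set):
    """Mean score of a list of profile labels under one set of weights."""
    return sum(value(p, value_set) for p in profiles) / len(profiles)


# Hypothetical arms: arm 1 has mainly mobility problems, arm 2 mainly pain problems.
arm_1 = ["21111", "21121", "11111"]
arm_2 = ["11121", "11221", "11111"]

diff_a = mean_value(arm_1, VALUE_SET_A) - mean_value(arm_2, VALUE_SET_A)
diff_b = mean_value(arm_1, VALUE_SET_B) - mean_value(arm_2, VALUE_SET_B)
print(f"Difference (arm 1 - arm 2), set A: {diff_a:+.3f}")
print(f"Difference (arm 1 - arm 2), set B: {diff_b:+.3f}")
```

With these invented weights the same profile data favour arm 1 under one set and arm 2 under the other, which is the kind of sensitivity that analysts are advised to check.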

<sup>5</sup>The convention of anchoring at dead = 0 is very widely accepted, but could be debated; see Sampson et al. (2019).

#### *1.2.4 Which Aspect of the Information Provided by the EQ-5D Should be the Primary Focus of My Analysis?*

When considering which element of the EQ-5D data should be the primary focus of analysis, and what methods of analysis should be used, users should be guided by the *purpose* of collecting EQ-5D data and how the results will be used. Table 1.1 provides an overview of the main contexts in which EQ-5D data are collected, and implications regarding the analysis of the resulting data.

There are advantages in being able to summarise and represent a health profile by a single number like the EQ-5D values—for example, it simplifies statistical analysis. However, as we have already emphasised, there is no neutral set of weights that can be used for that purpose: they all embody judgements about what is meant by importance and the appropriate source of information for judging importance. It is therefore not possible to offer generalised guidance about which set of weights should be used if the sole purpose is to summarise profile data for descriptive or inferential statistical analysis. Users should consider the wider purpose for which the summary will be used. If the purpose is simply to provide descriptive information, then it may be better not to use EQ-5D values, but to focus analysis on the profile data themselves (see Chap. 2). This may also be preferable because the EQ-5D value provides less detailed information than the EQ-5D profile it is summarising. Focussing on the EQ-5D values may obscure the underlying information on the type and severity of problems affecting patients that the profile data provide (for example, see Gutacker et al. 2013).

Further, in some cases where a single number is required to represent health, for example, in the generation of population norms (Kind et al. 1999), it may be more appropriate to focus on the EQ VAS data provided by patients or populations, rather than applying the EQ-5D value sets to their profile data.

#### *Economic Evaluation*

Where the economic evaluation of treatment is the main goal of analysis, this has implications for the analysis of EQ-5D data. A key requirement for a health measure to use in cost effectiveness analysis is that it should provide an unambiguous measure of effectiveness. That is, higher EQ values should represent a better state of health and the same differences between EQ values should have the same level of importance. For example, the difference between 0.87 and 0.91 should represent the same degree of change as between 0.22 and 0.26. However, there is arguably a further requirement if the measure of effectiveness is to be based on economics principles, such as those embodied in cost utility analysis—essentially, that the weights need to represent 'values.' Just as costs represent the total value of resources used, that is the volume of each type of resource weighted by their individual value, effectiveness in the context of economic evaluation should represent the value of health output, that is the amount of health generated weighted by its value.


**Table 1.1** Example of types of studies and some considerations for analysis

There is ongoing debate over the extent to which the commonly used stated preference methods adequately reflect underlying notions of 'value', and about the adequacy of QALYs as a measure of societal benefit from treating ill health. However, there appears to be general acceptance (for example, among Health Technology Appraisal bodies such as the National Health Care Institute (Zorginstituut) in The Netherlands and the United Kingdom's National Institute for Health and Care Excellence) that value sets available for EQ-5D instruments, based on the preferences of adult members of the general public, are usually appropriate for use in cost effectiveness analysis (NICE 2013; Zorginstituut Nederland 2016; Neumann et al. 2017).

Further detail on EQ-5D values, including which value set to use, and the analysis of EQ-5D values data, is provided in Chap. 4.

#### **1.3 EQ-5D Data Collection and Data Handling**

Where EQ-5D data are captured electronically, manual data entry is not required. However, in many cases, EQ-5D questionnaires are still completed in paper format. Where this is the case, data will need to be coded and entered manually. As this process is subject to human error, best practice for EQ-5D questionnaires is the same as any other self-completed paper questionnaire and entails double entry—that is, data being entered twice, and files compared for anomalies, which are then checked against the hardcopy.

Coding and data entry for the descriptive system are relatively straightforward. It is recommended that levels are coded as 1, 2 and 3 (for the EQ-5D-3L) and 1, 2, 3, 4 and 5 (for the EQ-5D-5L) in each dimension, to enable easy generation of the conventional 5-number profile label. Missing data need to be flagged as do any unusual responses, for example if more than one level is ticked on a dimension, although the latter are relatively rare.
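The coding rules above can be sketched as follows. This is a minimal illustration rather than a prescribed procedure: the dimension names, the flag wording and the convention of recording the ticked boxes as a list are all assumptions made for the example.

```python
# Hypothetical field names for the five dimensions, in the conventional order.
DIMENSIONS = ["mobility", "self_care", "usual_activities",
              "pain_discomfort", "anxiety_depression"]


def code_profile(responses, n_levels=3):
    """Return (profile_label, flags) for one paper questionnaire.

    `responses` maps each dimension to the list of levels ticked:
    [] means nothing ticked, [2] a single tick, [2, 3] two boxes ticked.
    Use n_levels=3 for the EQ-5D-3L and n_levels=5 for the EQ-5D-5L.
    """
    digits, flags = [], {}
    for dim in DIMENSIONS:
        ticked = responses.get(dim, [])
        if len(ticked) == 0:
            flags[dim] = "missing"
        elif len(ticked) > 1:
            flags[dim] = "multiple levels ticked"
        elif ticked[0] not in range(1, n_levels + 1):
            flags[dim] = "out of range"
        else:
            digits.append(str(ticked[0]))
    # Assumption: the 5-number label is built only when every dimension
    # has exactly one valid level; otherwise it is left empty.
    label = "".join(digits) if not flags else None
    return label, flags


complete = {"mobility": [2], "self_care": [1], "usual_activities": [2],
            "pain_discomfort": [3], "anxiety_depression": [2]}
unusual = {"mobility": [2], "self_care": [], "usual_activities": [2, 3],
           "pain_discomfort": [3], "anxiety_depression": [2]}

print(code_profile(complete))
print(code_profile(unusual))
```

Keeping the flags alongside the label means that missing and unusual responses remain visible in the data set instead of being silently dropped.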

EQ VAS data collected electronically are also very straightforward. However, the paper format of the original and current versions of the EQ VAS used in the EQ-5D-3L and EQ-5D-5L (see Figs. 1.4 and 1.5) and the current version of the EQ-5D-Y (see Fig. 1.6) require respondents to draw a line or mark a cross on the VAS to record their response. The resulting data can require a considerable degree of interpretation in coding responses. For example, Feng et al. (2014) noted, from qualitative analysis of a sub-sample of English National Health Service (NHS) PROMs data, a number of common response types with respect to the EQ VAS data (see Table 1.2).

Whereas a type 1 response in Table 1.2 is the only response which strictly complies with the EQ VAS instructions, Feng et al. (2014) argue that types 2 and 3 also provide unambiguous responses that can be captured accurately and reflect the same meaning to the score intended by respondents. Together, types 1–3 covered 88% of respondents in the data presented in Table 1.2. Other types, including missing and ambiguous responses (types 5 and 6), require separate codes to flag these issues in analysis. Similar issues may exist with EQ VAS data from the EQ-5D-Y.

**Fig. 1.5** EQ VAS (Original EQ-5D-3L version). *Source* EuroQol Research Foundation. *EQ-5D-3L User Guide, 2015*. Latest version available from: https://euroqol.org/publications/user-guides

**Fig. 1.6** EQ VAS (EQ-5D-Y version). *Source* EuroQol Research Foundation. *EQ-5D-Y User Guide, 2014*. Latest version available from: https://euroqol.org/publications/user-guides


**Table 1.2** Types of responses to the original EQ-5D-3L EQ VAS

*Source* Feng et al. (2014). Response types have been combined across both pre- and post-surgery responses and re-ordered by frequency

The current format of the EQ VAS in the EQ-5D-5L and EQ-5D-3L (see Fig. 1.4) entails respondents both noting a number in the box and marking a cross on the scale. In electronic data capture, the two are identical. In paper completion, there is potential for the two responses to differ, and best practice would suggest capturing both and reporting any such discrepancies.
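Capturing both responses and reporting discrepancies might look like the sketch below. The function name, the flag labels and the decision to leave a discrepant score unresolved are assumptions for illustration, not rules taken from the EQ-5D user guides.

```python
def check_vas(box_value, scale_value, tolerance=0):
    """Return (score, flag) for one paper EQ VAS response.

    box_value:   number written in the box, or None if the box is empty.
    scale_value: position of the cross read off the scale, or None if absent.
    tolerance:   largest difference treated as agreement (an assumption).
    """
    if box_value is None and scale_value is None:
        return None, "missing"
    if box_value is None:
        return scale_value, "scale only"
    if scale_value is None:
        return box_value, "box only"
    if abs(box_value - scale_value) > tolerance:
        # Assumption: leave the score unresolved and report the discrepancy.
        return None, "discrepancy"
    return box_value, "consistent"


responses = [(80, 80), (80, 70), (None, 65), (None, None)]
for box, scale in responses:
    print(box, scale, check_vas(box, scale))
```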

#### **1.4 Before Starting Your Analysis**

#### *1.4.1 Treatment of Missing Data—What to Do, What Not to Do*

There are broadly two types of missing EQ-5D data. Data can be missing altogether, for example where an elective surgery patient in the English NHS fails to complete and return their post-surgery PROMs questionnaire. Or data can be missing in part, for example where the patient completes an EQ-5D questionnaire but provides incomplete profile data, or does not complete the EQ VAS.

General guidelines (i.e. relating to PRO data, rather than specifically the EQ-5D) often indicate that a substantial amount of missing data can compromise the validity of analysis—but what constitutes 'substantial' is a matter of opinion. For example, based on the German Institute for Quality and Efficiency in Health Care (Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen) standard approach, data from at least 70% of patients at both baseline and one follow up visit are needed to consider analysis of that data valid for its purposes. However, 'percent missing' is not defined consistently across the literature and different definitions on how to estimate the amount of missing data may lead to different practices and results (Coens et al. 2020). Further, even where there are high rates of missing data, analysis of available data may still yield insights into the sub-group who did respond, even if results cannot be generalised to non-responders. In short, there are no hard and fast rules. However, it is important for analysts to report missing data, and to be mindful of potential limitations arising from loss of generalisability.

In general, you should provide data descriptions, state the assumptions underlying the handling of the missing EQ-5D data, and conduct sensitivity analyses around the chosen assumptions. Included in the data description should be the amount of missing data, missing data patterns, and the association between missing data and observed data, for example respondents' age, gender and any previously observed EQ-5D data for that respondent (Faria et al. 2014).
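A data description of this kind can be sketched in a few lines. The records, field names and dimension abbreviations below are invented for illustration, and comparing mean age is only one simple way of looking at the association between missingness and observed characteristics.

```python
from collections import Counter

DIMS = ["MO", "SC", "UA", "PD", "AD"]

# Hypothetical records; None marks a missing profile item.
records = [
    {"age": 72, "MO": 2, "SC": 1, "UA": 2, "PD": 2, "AD": 1},
    {"age": 81, "MO": 2, "SC": None, "UA": 2, "PD": 3, "AD": None},
    {"age": 65, "MO": 1, "SC": 1, "UA": 1, "PD": 2, "AD": 1},
    {"age": 85, "MO": None, "SC": None, "UA": None, "PD": None, "AD": None},
]


def missing_pattern(rec):
    """Return one symbol per dimension: 'O' = observed, 'M' = missing (e.g. 'OMOOM')."""
    return "".join("M" if rec[d] is None else "O" for d in DIMS)


def mean_age(rows):
    return sum(r["age"] for r in rows) / len(rows)


# Amount of missing data and the patterns in which it occurs.
patterns = Counter(missing_pattern(r) for r in records)
incomplete = [r for r in records if "M" in missing_pattern(r)]
complete = [r for r in records if "M" not in missing_pattern(r)]
share_incomplete = len(incomplete) / len(records)

print(f"Profiles with any missing item: {100 * share_incomplete:.0f}%")
for pattern, count in patterns.most_common():
    print(pattern, count)

# Association between missingness and an observed characteristic (age).
print("Mean age, complete profiles:  ", mean_age(complete))
print("Mean age, incomplete profiles:", mean_age(incomplete))
```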

Analytical methods used for missing data in general are applicable to the EQ-5D; users are advised to consult a statistical text for details. Essentially, it is necessary to consider the assumed form that missingness takes for the data—Missing Completely At Random (MCAR), Missing At Random (MAR) or Missing Not At Random (MNAR) (Little and Rubin 1987)—and to select a method for dealing with this appropriate to that form.

If MCAR, where a respondent's missing data are not related to that person's socio-demographic or other characteristics, analysis can assume that the missing data follow the same patterns as the non-missing data.

If MAR, where a respondent's missing data are related to their observed characteristics, but not any unobserved characteristics, analysis can assume that we have a random sample of respondents with those characteristics and make inferences from that sample about the data that are missing. Multiple Imputation (MI) has been increasingly used in recent years for EQ-5D data assumed to be MAR (Ratcliffe et al. 2005; Kaambwa et al. 2012; Simons et al. 2015).

If MNAR, where a respondent's data are missing because of their characteristics, we do not have random samples of people with different characteristics and require more complex analytical methods to deal with resulting selection bias. The Heckman selection model has been applied to EQ-5D values data that are assumed to be MNAR (Kaambwa et al. 2012).

Recent guidance suggests that data analysts should evaluate the sensitivity of the analysis to the MAR assumption using methods such as the weighting or pattern mixture approaches (Faria et al. 2014; Simons et al. 2015). In particular, the evaluation should examine how the results might change when an MNAR assumption is made about the missing EQ-5D data.

There are two missing data issues specific to EQ-5D data. The first is what should be done where the user wishes to analyse profiles and some but not all of the profile items are missing. Bad practice includes replacing a respondent's missing profile items with an average derived from their own non-missing items, or with an average derived from the non-missing items in the sample as a whole. It might be possible to use MI in this context, but there are currently no examples on which to base guidance. Conservative guidance is therefore to treat as missing any profile in which one or more items are missing.

The second is where some or all of the profile items are missing, and the user wishes to analyse EQ-5D values. For this, MI may be an appropriate method if the data are assumed MAR, but an issue is whether this should be applied to profile items, from which an EQ-5D value is calculated, or to EQ-5D values directly (Faria et al. 2014). In practice, the decision depends on the observed missing data pattern and the sample size available for analysis (Simons et al. 2015).

#### *1.4.2 Planning Your Analysis*

A systematic review of the use of PROs in oncology conducted by the Setting International Standards in Analyzing Patient-Reported Outcomes and Quality of Life Endpoints Data (SISAQOL) Consortium (Pe et al. 2018) showed a widespread lack of clearly specified a priori research hypotheses and a link with the design and statistical methods to be employed. New guidelines for protocol development (for example SPIRIT-PRO) and reporting of PROs (for example CONSORT-PRO;<sup>6</sup> see Calvert et al. 2013) also recognise this to be a common issue in PRO studies generally.

Before beginning analysis of EQ-5D data, you should therefore consider what questions you want to answer with your data. What are your hypotheses about, for example, how a treatment arm is expected to behave relative to a reference arm in a clinical trial? What assumptions underpin these hypotheses, for example what is your rationale and what evidence has informed that? This, in turn, should inform the statistical analysis plan (SAP) developed prior to analysis. Note that the content of SAPs will vary depending on the study type and study aims.

#### **1.5 Guide to the Rest of this Book**

In the remainder of this book, we explain in detail how each element of the data generated from using EQ-5D instruments (the profile data, EQ VAS and EQ values) can be analysed. We provide both a basic introduction to analysis in each case, assuming no prior knowledge of analysis of EQ-5D data, and an introduction to more advanced topics relating to analysis of EQ-5D data.

<sup>6</sup>CONSORT: Consolidated Standards Of Reporting Trials.

#### **References**



#### **Chapter 2 Analysis of EQ-5D Profiles**

The aims of this chapter are


Profile data form the cornerstone of analyses of EQ-5D data and, in many cases, are likely to be the primary focus of interest. In this chapter, we provide an overview of methods that can be used to describe the profile data from respondents at a given point in time, and to describe the changes in profiles between different points in time.

Even when the ultimate goal of analysis is to generate EQ-5D values and to estimate quality-adjusted life years (QALYs), analysis of profile data provides important insights and should always be the starting point for analysts. For example, summarising EQ-5D patient data simply as values obscures the underlying information about which aspects of their health have been most affected by their condition, or improved by treatment. To know about that, you need to look at the data that respondents have given you: the boxes they ticked on an EQ-5D questionnaire.

The methods presented here need not be treated as alternatives, but rather as complementary. Although they are illustrated using EQ-5D data, and in some cases developed specifically for the analysis of EQ-5D profile data, these same methods could just as readily be applied to other generic or condition specific health status or patient reported outcome (PRO) measures.

It should be noted that we do not cover inferential statistics, either hypothesis testing or estimation, as this book is not intended as a statistical primer and we assume that readers will be able to apply appropriate inference procedures where required. For example, we describe contingency tables, to which tests of association such as a χ² test could be applied.

#### **2.1 Cross-Sectional Analysis: Describing Health at a Point in Time by Dimension and Level**

Exploratory data analysis (EDA) of EQ-5D data, including the use of simple descriptive statistics, is undervalued, and often underreported in papers that contain more complex econometric and psychometric analyses. This is bad practice and wasteful of information, because EDA not only generates information that helps in interpreting more complex analyses, but also generates information about health within populations and about the properties of the EQ-5D which is valuable in itself.

Describing health at the most detailed level possible for the EQ-5D can be done very simply, by reporting the number and percentage of patients reporting each level of problem on each dimension of the EQ-5D. An example of this is shown in Table 2.1, which shows EQ-5D-3L data provided by patients before and after hip surgery, using data from a pilot study for the Patient Reported Outcome Measures (PROMs) programme in the English National Health Service (NHS) (Devlin et al. 2010).

This very simple table provides some important information. For example, before hip surgery, 420 of these patients (95.7% of the sample) reported a level 2 problem on mobility, but none reported a level 3 problem. The reason is that Level 3 on the EQ-5D-3L mobility dimension is 'confined to bed'—and even patients with very poor mobility because of hip problems aren't confined to bed. That is a problem with the EQ-5D-3L—as has been pointed out previously (Oppe et al. 2011). This issue has been corrected in the EQ-5D-5L (Herdman et al. 2011), where the most severe problem with mobility is 'unable to walk about', and is an important advantage of the 5L over the 3L (Janssen et al. 2018).

The information on the types of problems experienced by a sample of patients at any given point in time can be simplified still further by collapsing levels together, to create just two categories: the number and percentage of patients reporting no problems (level 1), and the number reporting any level of problems (levels 2 and 3 for the 3L, and levels 2, 3, 4 and 5 for the 5L). This can also be seen in Table 2.1. For example, before surgery, mobility problems are common in these patients, as might be expected: only 4.3% of these patients had no problems with mobility. Of the 95.7% of patients who reported having at least some problem on mobility before surgery, all reported a level 2, as noted above. However, problems on other dimensions are just as prevalent: 99.8% of patients reported at least some problem with pain and discomfort, and 96.6% at least some problem with usual activities. Over 40% of these patients also reported problems with anxiety and depression—something that might be missed by condition specific instruments focused on mobility and function-related issues specific to hips, such as the Oxford Hip Score.
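A dimension-by-level table and its collapsed 'no problems' versus 'any problems' version can be produced with a few lines of code. The sketch below uses a handful of invented EQ-5D-3L profiles (not the data in Table 2.1), and the dimension abbreviations are assumptions.

```python
from collections import Counter

DIMS = ["MO", "SC", "UA", "PD", "AD"]

# Hypothetical EQ-5D-3L profile labels for five respondents.
profiles = ["21221", "21232", "11121", "22222", "21221"]


def level_table(profiles, n_levels=3):
    """Return {dimension: {level: (count, percent)}} for a list of profile labels."""
    n = len(profiles)
    table = {}
    for i, dim in enumerate(DIMS):
        counts = Counter(int(p[i]) for p in profiles)
        table[dim] = {lvl: (counts[lvl], 100.0 * counts[lvl] / n)
                      for lvl in range(1, n_levels + 1)}
    return table


def any_problem_percent(profiles):
    """Percentage reporting any problem (level 2 or above) on each dimension."""
    n = len(profiles)
    return {dim: 100.0 * sum(p[i] != "1" for p in profiles) / n
            for i, dim in enumerate(DIMS)}


table = level_table(profiles)
collapsed = any_problem_percent(profiles)
for dim in DIMS:
    cells = ", ".join(f"L{lvl}: {c} ({pct:.0f}%)" for lvl, (c, pct) in table[dim].items())
    print(f"{dim}  {cells}  | any problem: {collapsed[dim]:.0f}%")
```

For EQ-5D-5L data the same functions apply with `n_levels=5`.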

Examining the profile data by each dimension and level in this manner is a good starting point to understanding the nature of the health problems reported in the data you have collected. However, there are limitations to this way of reporting the data. Because the focus is on the frequency of observations in each level *within* each dimension, it doesn't tell us how these problems combine in the people reporting them. For example, are the people who report a level 3 on Anxiety and Depression also the same people who report a level 3 on Usual Activities? For this reason, it is also important to examine the way that observed levels of problems on each dimension combine into EQ-5D profiles, which is covered in Sect. 2.3.

<sup>a</sup>Results are for those who responded to both the pre- and the post-operative questionnaire. 84% of respondents to the pre-operative EQ-5D also responded to the post-operative EQ-5D

<sup>b</sup>'Some problems' = levels 2 + 3

*Source* Devlin et al. (2010)

#### **2.2 Longitudinal Analysis: Describing Changes in Health Between Two Time Points by Dimension and Level**

In addition to describing health states at any one point in time, if you have collected EQ-5D profile data at more than one time point, you are likely also to be interested in describing the changes between them—for example, before and after surgery, or between various time points in a clinical trial, compared to baseline. This too can be done at the level of the EQ-5D dimensions, as is also shown in Table 2.1.

'Eyeballing' the differences in numbers and percentages of patients in each of the levels tells us about the nature of the changes in health that resulted from surgery. For example, the results in Table 2.1 show there were quite striking improvements in patients' Anxiety and Depression, Self-care and Pain and Discomfort—not just Mobility. And because of the issue with level 3 Mobility noted above, whereby the worst level of problem these patients were likely to report on mobility was level 2, the only improvements to mobility that were possible as a result of hip replacement surgery were from 'some' to 'no' problems. This issue with the use of the EQ-5D-3L to measure health outcomes from hip surgery would not have been apparent if these patients' data had been analysed just in terms of EQ-5D values.

It can, however, be difficult to get an overall picture of improvements readily, even for these relatively simple EQ-5D-3L data. As with the analysis of cross-sectional data, this does not summarise the extent of improvement across dimensions. As noted in Sect. 2.1, one way of handling this is to collapse the levels into just two categories: no problems and some problems. The shift between these two categories provides a simpler way of capturing change. The change in health between time points, reported in this manner, provides a way of summarising the overall extent to which patients go from any level of problem to no problem within each dimension. This may be useful in some contexts, but it has some limitations as an indicator of improvement because of the loss of information caused by aggregation of levels. It doesn't capture improvements other than shifts to no problem, so other improvements that may be of value to patients, for example from extreme to moderate problems, are not captured. That means that, if applied to the EQ-5D-5L, the advantages of its more refined descriptive system will be lost.
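The collapsed measure of change, and the information it loses, can be seen in a small sketch. The before and after profiles are invented, and the function name and dimension abbreviations are assumptions.

```python
DIMS = ["MO", "SC", "UA", "PD", "AD"]

# Hypothetical (before, after) EQ-5D-3L profile pairs for three patients.
pairs = [("21232", "11121"), ("22232", "21222"), ("11331", "11221")]


def shifts_to_no_problem(pairs):
    """For each dimension return (number who moved to level 1,
    number who reported any problem at the first time point)."""
    result = {}
    for i, dim in enumerate(DIMS):
        had_problem = [(b, a) for b, a in pairs if b[i] != "1"]
        resolved = sum(a[i] == "1" for _, a in had_problem)
        result[dim] = (resolved, len(had_problem))
    return result


shifts = shifts_to_no_problem(pairs)
for dim, (resolved, at_risk) in shifts.items():
    print(f"{dim}: {resolved} of {at_risk} moved from some problems to no problems")
```

In this invented example all three patients improve from level 3 to level 2 on pain and discomfort, yet the collapsed measure records no change on that dimension, because none of them reaches level 1.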

#### **2.3 Cross-Sectional Analysis: Describing Health at a Point in Time Using Profiles**

While describing the number and percentage of observed levels within each dimension (as in Table 2.1) gives very useful information dimension-by-dimension, it does not tell you anything about the way these problems are *combined* in the health states reported by patients.

One of the simplest and most instructive things you can do with an EQ-5D profile data set is to report the cumulative frequency of these profiles. This will reveal the extent to which your observations are evenly distributed over many profiles, or instead concentrated on a relatively small number of health profiles.

The results can sometimes be quite surprising. For example, in Table 2.2 we show the cumulative frequency of self-reported EQ-5D-3L profiles reported by 7294 respondents in the 2012 Health Survey for England. In this example, the great majority of respondents self-reported their health using only a small number of profiles. The top three most frequently reported profiles represented almost three quarters of the respondents.
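A cumulative frequency table of profiles is straightforward to compute. The sketch below uses ten invented profiles, not the Health Survey for England data, and the tie-breaking rule (alphabetical by profile label) is an assumption.

```python
from collections import Counter


def cumulative_frequency(profiles):
    """Return rows of (profile, count, percent, cumulative percent),
    ordered from the most to the least frequently observed profile."""
    counts = Counter(profiles)
    n = len(profiles)
    # Sort by descending count; ties are broken by profile label.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    rows, running = [], 0
    for profile, count in ordered:
        running += count
        rows.append((profile, count, 100.0 * count / n, 100.0 * running / n))
    return rows


# Hypothetical self-reported profiles from ten respondents.
profiles = ["11111"] * 5 + ["11121"] * 3 + ["11112", "21221"]

rows = cumulative_frequency(profiles)
for profile, count, pct, cum_pct in rows:
    print(f"{profile}  n={count}  {pct:5.1f}%  cumulative {cum_pct:5.1f}%")
```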

In contrast, Table 2.3 shows the cumulative frequency of profiles reported by 996 respondents from the general public in the EQ-5D-5L value set study for England for their self-reported health on the EQ-5D-5L. This shows, in comparison to Table 2.2, a larger number of unique health states observed in this data set, and the observations are less concentrated on a small number of states. A large proportion of observations are accounted for by profile 11111 (no problems on any dimension) in both data sets, which is not surprising given that both samples comprise members of the general public, many of whom would not regard themselves as ill. But in general, this 'ceiling effect' is somewhat less in the EQ-5D-5L data (Devlin et al. 2018). Obviously, the states observed and their cumulative frequency will differ from data set to data set, but in general the EQ-5D-5L yields less concentrated data, reflecting the advantages of the larger number of response options.

**Table 2.2** Prevalence of the 10 most frequently observed self-reported health states and frequency of reporting of the worst possible health state in EQ-5D-3L

*Source* Feng et al. (2015)

*Source* Feng et al. (2015)

Understanding these patterns of observations in your data is important for three reasons:


Looking at the cumulative frequency is a simple and effective way of getting an insight into the distribution of health profiles in a data set. However, a limitation is that it does not provide a summary statistic that allows us readily to describe (a) how good or bad the health states are, or (b) the extent to which the observations cluster on just a few health states, or are evenly spread out over the available health states described by the descriptive system. Having a summary statistic to characterise the degree to which there is clustering or dispersion of observed health states is useful, especially if one wanted to compare this characteristic, for example to find out whether there are changes in the distribution of profile data from a group of patients observed at different time points, or between EQ-5D profile data from patients with different conditions.

#### **2.4 Longitudinal Analysis: Describing Changes in Health Between Two Time Points Using Profiles**

Descriptive analyses of profile data such as Table 2.1 can be very useful, but they contain a lot of information and sometimes an overall summary is required. One way of summarising profile data is to generate a single number for each profile using weights, for example using value sets. However, as noted in Chap. 1, this introduces possible problems of information loss and bias. The good news is that there are ways of summarising changes in EQ-5D health status without using value sets, just using the data that respondents have given you.

#### *2.4.1 The Paretian Classification of Health Change (PCHC)*

Devlin et al. (2010) introduced a way of summarising changes in profile data called the Paretian Classification of Health Change (PCHC). The approach is based on the principles of a Pareto improvement in Welfare Economics, drawing an analogy with the challenge of summing up changes in utility of different individuals, where utility can be measured only in ordinal terms. The idea is simple: an EQ-5D health state is deemed to be 'better' than another if it is better on at least one dimension and is no worse on any other dimension. And an EQ-5D health state is deemed to be 'worse' than another if it is worse in at least one dimension and is no better in any other dimension. Using that principle to compare a person's EQ-5D health states between any two time-points, there are only four possibilities: their health state is better, it is worse, it is exactly the same, or the change is 'mixed' (better on at least one dimension but worse on at least one other).


Applying this to the English NHS PROMs pilot hip replacement data, we found that under 5% had no change, 82% had improved health, under 5% had worse health, and under 10% had a 'mixed' change (Devlin et al. 2010). In other words, this simple analysis provides a very clear summary of what is happening to patients' health because of hip surgery, without relying on value sets. It also highlighted important differences in the benefits from hip surgery, compared with the other types of elective surgery analysed in the English NHS PROMs pilot, shown in Tables 2.4 and 2.5. Looking at Table 2.4, hip replacement operations were by far the best in terms of success in reducing the number of patients who had problems, with knee replacement operations a close second. Hernia and varicose vein repairs were much less successful, and cataract removals had a very low success rate, with more patients getting worse than improving, although the last of these should be interpreted carefully because the EQ-5D may not be capturing the kind of benefits that cataract operations provide. The numbers of patients who worsened or had no change show the same pattern.

**Table 2.4** Changes in health for five surgical procedures according to the PCHC

*Source* Devlin et al. (2010)

**Table 2.5** Changes in health state for three conditions according to the PCHC, taking account of those with no problems

*Source* Devlin et al. (2010)

One problem with this analysis is that 'No change' is confounded when patients record no problems according to any of the dimensions before treatment, because they are, according to the EQ-5D, healthy patients whose only alternative would be for their condition to worsen as a result of treatment. Recording no problems at all is rare for patients who have conditions serious enough to require a joint replacement but may occur for conditions whose need for treatment may not be fully captured by their EQ-5D profile. Table 2.5 shows for the three conditions to which this applies the PCHC taking into account those with no problems before surgery. In each case, this shows a slightly better performance than suggested by Table 2.4.

The advantage of the PCHC is that it provides a high-level summary of the nature of changes in health reported by patients, without the need to introduce any external scoring system or preference weighting.
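The classification rule described in this section can be expressed compactly in code. The following is a minimal sketch, assuming profiles are stored as five-digit labels in which a lower level means fewer problems; the function name and the example pairs are invented for illustration.

```python
from collections import Counter


def pchc(before, after):
    """Classify the change between two EQ-5D profile labels, e.g. '21232' -> '11121'.

    A lower level on a dimension is better, so a negative difference
    (after minus before) is an improvement on that dimension.
    """
    diffs = [int(a) - int(b) for b, a in zip(before, after)]
    improved = any(d < 0 for d in diffs)
    worsened = any(d > 0 for d in diffs)
    if improved and worsened:
        return "Mixed"
    if improved:
        return "Improved"
    if worsened:
        return "Worsened"
    return "No change"


# Hypothetical (before, after) pairs.
pairs = [("21232", "11121"), ("11111", "11111"),
         ("21221", "11222"), ("11121", "11122")]

summary = Counter(pchc(b, a) for b, a in pairs)
for category in ("No change", "Improved", "Worsened", "Mixed"):
    print(f"{category}: {100.0 * summary[category] / len(pairs):.0f}%")
```

Because the rule compares levels only within each dimension, the same function applies unchanged to EQ-5D-5L profiles. Respondents reporting 11111 at the first time point could be counted separately before classification, in the spirit of Table 2.5.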

The limitations of the PCHC are:


The PCHC can be extended to give information about the composition of differences between profiles according to how dimensions and levels differ. These extensions are illustrated with simple graphs, using newer data on hip replacement patients from the English NHS PROMs programme that was instituted following the pilot study referred to earlier. The graphs also show how data can be compared at different time periods. This could be adapted to compare, for example, patients in different populations.

First, Fig. 2.1 shows the PCHC for three years in graphical form.

Figure 2.2 shows which dimensions were improved for those patients whose PCHC category was 'Improved'.

This shows that improvements were spread over all dimensions, but were most frequently found in Pain and Discomfort, followed by Usual Activities and Mobility, with Self-care and Anxiety and Depression improving for less than 50% of those who improved overall.

**Fig. 2.1** The PCHC for hip replacement patients in the English NHS, 2009–12

**Fig. 2.2** Percentage of hip replacement patients who improved overall, by the dimensions in which they improved, English NHS 2009–12

Figure 2.3 shows which dimensions were worsened for those patients whose PCHC category was 'Worsened'.

This shows the opposite pattern to improvements. Worsening health was spread over all dimensions, but was most frequently found in Anxiety and Depression and Self-care, followed by Usual Activities, with Pain and Discomfort and Mobility getting worse for less than 20% of those whose health was worse overall.

Figure 2.4 shows a comparison of PCHC 'Mixed' patients, which is more complicated because it involves both worsening and improving in different dimensions.

For the EQ-5D-3L, it is possible to show every possible change in every dimension. Each dimension can change in one of three ways—no change, improvement or worsening—each of which has three possible specific level changes, resulting in 9 categories for each dimension. Table 2.6 shows how these can be displayed for the hip replacement data.

This shows that the dominant change for Mobility, Usual Activities and Pain and Discomfort is an improvement from level 2 to level 1, but for Self-care and Anxiety and Depression it is no change from 'no problems.' Within change categories, it is notable that in each dimension improvements are dominated by a change from level 2 to level 1; that improvements from level 3 to level 1 and worsening from 1 to 3 are rare, reflecting the rarity of level 3 observations in the data set; and worsening from 2 to 3 is the most common amongst those who worsened overall in Usual Activities and Pain and Discomfort.

**Fig. 2.3** Percentage of hip replacement patients whose health worsened overall, by the dimensions in which they worsened, English NHS 2009–12

**Fig. 2.4** Percentage of hip replacement patients who had a mixed change overall, by the dimensions in which they improved and worsened, English NHS 2009–12


**Table 2.6** Changes in levels in each dimension for hip patients, NHS PROMs, 2009–10, percentages of total and of type of change

% total = % of all in the relevant dimension; largest category highlighted in bold

% type = % of all in the change type in the relevant dimension; largest category highlighted in bold

Unfortunately, it is much more difficult to display the same analysis for the 5L version, as there are 25 possible categories for each dimension.

#### *2.4.2 The Probability of Superiority*

Buchholz et al. (2015) introduced a nonparametric effect size measure, the probability of superiority (PS), to analyse paired samples of EQ-5D profile data in the context of assessing changes in health in terms of improvement or deterioration. This measure was initially recommended by Grissom and Kim (2012). For each dimension, the number of patients with positive changes is divided by the total number of matched pairs (i.e. the number of respondents scoring EQ-5D at both time-points). To account for patients with no changes, that is 'ties', half the number of ties is added to the numerator. PS is therefore the probability that within a randomly sampled pair of dependent scores, the score obtained at follow-up will be smaller than the score obtained at baseline. It ranges from 0 to 1 and is


This is a further, useful way of examining the nature of change in EQ-5D data. A limitation is that it focuses on changes at the dimension level, rather than on how this combines at the patient level.
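The calculation described above can be sketched in a few lines of Python. The function name `probability_of_superiority` and the example levels are hypothetical and purely illustrative; the function expects the baseline and follow-up levels for one dimension, in the same respondent order.

```python
def probability_of_superiority(baseline, follow_up):
    """PS for one dimension: (improvements + half the ties) / matched pairs.

    A lower level at follow-up than at baseline counts as an improvement.
    """
    if len(baseline) != len(follow_up) or not baseline:
        raise ValueError("need two non-empty sequences of matched pairs")
    pairs = list(zip(baseline, follow_up))
    improved = sum(1 for b, f in pairs if f < b)
    ties = sum(1 for b, f in pairs if f == b)
    return (improved + 0.5 * ties) / len(pairs)


# Illustrative levels for five respondents on a single dimension
baseline = [2, 2, 3, 1, 2]
follow_up = [1, 2, 2, 1, 3]
print(probability_of_superiority(baseline, follow_up))  # 0.6
```

With this formula, a value of 0.5 arises when improvements and deteriorations are equally frequent, and values above 0.5 arise when improvements outnumber deteriorations.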

#### *2.4.3 Health Profile Grid (HPG)*

A further way of summarising changes in health in an EQ-5D data set is the Health Profile Grid (HPG), also introduced by Devlin et al. (2010). The HPG relies on profiles being ordered from best to worst. This can be done using a value set, a scoring system based on equally weighted dimensions and levels, or a scoring system based on the EQ VAS predicted from the profile (see Chap. 4).

The HPG plots the profiles between any two points in time. The example shown in Fig. 2.5, again taken from the English NHS PROMs pilot, shows profiles before and six months after hip replacement surgery. The rank ordering is determined by the EQ-5D-3L values according to the value set for the United Kingdom (Dolan et al. 1997). The PCHC category for each profile change is also shown.

The location of each point shows improvement and worsening according to the profiles' rank order. The 45° line represents 'no change'; the further above the line, the greater the improvement in health; below the line means health has worsened. The pattern of observations in the HPG in Fig. 2.5 suggests that most patients experience benefit from hip replacement surgery, as the observations lie predominantly above the 45° line. There is a spread of health profiles from less to more severe before surgery, but a much narrower distribution after surgery, concentrated in the least severe profiles, with some outliers. The PCHC category adds to this by identifying cases where overall improvement and worsening of the patients' 'before' and 'after' profiles according to their rank are 'Mixed Change', that is, they include both improvements in at least one dimension and worsening in at least one other. In these data, every mixed change case included only one dimension which changed in the opposite direction to the overall change according to the profiles' rank.

**Fig. 2.5** Health profile grid for hip operations, English NHS

By contrast, the HPG shown in Fig. 2.6, for the English NHS PROMs pilot cataract surgery data, shows a much more mixed picture of improvements and worsening. The immediately obvious observation is that similar numbers improved and worsened. However, another feature is that most of those with the worst health profiles before surgery improved and most of those with the worst profiles after surgery had amongst the least severe health profiles before surgery. Unlike the clear-cut conclusions that may be drawn from the hip HPG, such a pattern suggests further investigation is required into the impact of cataract operations on patients' health-related quality of life (HRQoL).

Presenting the profiles in this manner can suggest clusters of patients, characterised by the nature of their profiles at time point 1, and the direction and magnitude of the change between the time-points. However, it is important not to rely on visual inspection alone to identify clusters, because some of the gaps that are apparent simply identify EQ-5D health profiles that are very infrequently observed, for example states having no problems in four dimensions and the worst state in the other. It is essential to test for these formally using statistical cluster analysis techniques. An example, with clusters identified using a k-means procedure, is shown in Fig. 2.7.

**Fig. 2.6** Health profile grid for cataract operations, English NHS

**Fig. 2.7** Health profile grid showing clusters of changes in health for NHS hip replacement patients, using the k-means procedure

The numbers represent the 6 different clusters of patients identified. Most of the clusters seem to be identified as similar because of the patients' similar pre-surgery profiles. Cluster 4 is of more interest, identifiable as the patients with the worst health profiles after surgery. Also of interest is the comparison of clusters 2 and 5, with similar, relatively less severe profiles before surgery but with cluster 2 having more severe profiles after surgery. These observations could form the basis of further investigation into whether or not these are real clusters of clinical importance.

It is also possible to improve the appearance of the HPG and reduce the problem of artefactual gaps by including only those health states found within the data. It is also possible to take this further by including only the most frequently found profiles. In many data sets, only a few very common profiles are found, along with many rarer cases, so restricting the analysis to profiles covering, for example, 90% of all observations would be informative.

The advantage of the HPG is that it provides a ready means of displaying and examining the changes in health within a sample of patients. A limitation of the HPG is that it relies upon having a valid and appropriate means of ranking the EQ-5D profiles. The method used to rank the profiles may affect the HPG and the statistical identification of clusters.

#### **2.5 Summarising the Severity of EQ-5D Profiles**

It is sometimes useful to summarise the overall 'severity' of EQ-5D health states, by means other than generating weighted scores such as values. Because these involve information loss and hidden assumptions about the aggregation of dimensions and levels, they should be used with care.

#### *2.5.1 The Level Sum Score (LSS)*

It is possible to summarise a profile by calculating a Level Sum Score (LSS), sometimes misleadingly referred to as the 'misery score'. This simply adds up the levels on each dimension, treating each level's conventional label (1, 2 or 3) as if it were a number rather than simply a categorical description.

The best EQ-5D health state involves having no problems on any dimension and is conventionally represented by the label 11111. Treating the level labels as numbers, the best possible score is (1 + 1 + 1 + 1 + 1) = 5. Similarly, the most severe problem on any dimension has the label 3 for the EQ-5D-3L, so the LSS for the worst health state is (3 + 3 + 3 + 3 + 3) = 15. Every other health state on the EQ-5D-3L will have a level sum score between 5 (the best) and 15 (the worst), and as these are integers there are 11 possible scores; the larger the score, the worse the health state. For the EQ-5D-5L, the range is between 5 and 25 and there are 21 possible scores.
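The arithmetic above can be checked with a short Python sketch. The function names `level_sum_score` and `possible_scores` are hypothetical, and profiles are assumed to be stored as digit strings.

```python
from itertools import product


def level_sum_score(profile: str) -> int:
    """Add up the level labels of a profile, e.g. '11223' -> 9."""
    return sum(int(level) for level in profile)


def possible_scores(n_levels: int, n_dimensions: int = 5) -> list:
    """List every distinct LSS that the descriptive system can produce."""
    levels = [str(level) for level in range(1, n_levels + 1)]
    scores = {level_sum_score("".join(p)) for p in product(levels, repeat=n_dimensions)}
    return sorted(scores)


print(level_sum_score("11111"), level_sum_score("33333"))  # 5 15
print(len(possible_scores(3)), len(possible_scores(5)))    # 11 21
# Quite different profiles can share a score: 21111 and 11112 both give 6.
print(level_sum_score("21111") == level_sum_score("11112"))  # True
```

The last line illustrates the information loss involved: the LSS cannot distinguish between profiles that have the same total but problems on different dimensions.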

The LSS has been used as a crude measure of severity to gauge the validity of values obtained in valuation studies for different health states. Figure 2.8 shows the relationship between the English value set for the EQ-5D-5L and the LSS (Devlin et al. 2018). This shows that, as the LSS increases (states get worse), the values decline.

However, the LSS has some important limitations as a means of summarising health states across dimensions and levels:


**Fig. 2.8** EQ-5D-5L values (English value set) plotted against the LSS

These issues can be seen below, with respect to the EQ-5D-5L. Table 2.7 shows all possible LSSs for the EQ-5D-5L, together with descriptive statistics for the English value set for each LSS. Although the mean and median values relate reasonably well to the order of the LSS, the table shows big differences in the standard deviation. Importantly, it shows the overlap between the ranges of values for the different level sum scores. For example, the range for LSS = 15 includes the mean values of LSS = 12 and LSS = 18 and the lower or upper range respectively of LSS = 10 and LSS = 21. This issue can also be seen in Fig. 2.8. For these reasons, it is wrong to treat the LSS as ordinal.

#### *2.5.2 The Level Frequency Score (LFS)*

An alternative, although rarely used, means of summarising profile data is the level frequency score (LFS). The measure was proposed by Oppe and de Charro (2001) and used there to demonstrate the distribution of the EQ-5D-3L profiles in their data on the effects on HRQoL of a helicopter trauma team. The method characterises each health state by the frequency of levels at 1, 2 or 3 (for the EQ-5D-3L) or the frequency of levels at 1, 2, 3, 4 and 5 (for the EQ-5D-5L). For example, in the EQ-5D-5L, the full health profile 11111 has five 1s and no 2s, 3s, 4s or 5s, so the LFS is 50000; the worst health profile, 55555, has an LFS of 00005; profiles such as 31524 and 53412 have an LFS of 11111; and the 20 profiles such as 13211 have an LFS of 31100.
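A minimal Python sketch of the LFS calculation is shown below, reproducing the examples given in the text. The function name `level_frequency_score` and the use of digit strings for profiles are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def level_frequency_score(profile: str, n_levels: int = 5) -> str:
    """Count how often each level occurs, e.g. '13211' -> '31100'."""
    counts = Counter(profile)
    return "".join(str(counts[str(level)]) for level in range(1, n_levels + 1))


for profile in ("11111", "55555", "31524", "13211"):
    print(profile, level_frequency_score(profile))
# 11111 50000
# 55555 00005
# 31524 11111
# 13211 31100

# Number of EQ-5D-5L profiles that share a given LFS
all_profiles = ("".join(p) for p in product("12345", repeat=5))
profiles_per_lfs = Counter(level_frequency_score(p) for p in all_profiles)
print(profiles_per_lfs["31100"], profiles_per_lfs["11111"])  # 20 120
```

For the EQ-5D-3L, the same function can be called with `n_levels=3`, giving a three-digit score.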


**Table 2.7** Summary statistics for the EQ-5D-5L values (English value set) by all the different LSSs

Oppe and de Charro used the LFS to show how the EQ-5D-3L values observed in their data (using the UK EQ-5D-3L value set) were distributed over the various EQ-5D-3L profiles (see Table 2.8).

The distribution of EQ-5D-5L profiles by LFS is provided in an Appendix to this chapter.

#### **2.6 Analysing the Informativity of EQ-5D Profile Data**

#### *2.6.1 Shannon Indices*

Shannon's indices, originally developed to analyse the information content of strings of text, are widely used in the ecology literature to measure how many species are observed and how evenly animals or plants are spread over the various categories. They have also been applied widely in assessing distributional characteristics of the EQ-5D (Buchholz et al. 2018), where the categories of interest are EQ-5D profiles and we are interested in a summary measure of how evenly respondents to EQ-5D questionnaires are spread over the profiles defined by the descriptive system. The main application of the Shannon indices has been to compare the informational richness and evenness of dimensions, either between the EQ-5D-3L and the EQ-5D-5L or between similar dimensions of different generic health status instruments (Janssen et al. 2007). It is also possible to apply the Shannon indices to distributions of health profiles.

**Table 2.8** Number of observations in the LFS according to the UK EQ-5D-3L values

*Source* Taken from a EuroQol scientific plenary paper which preceded the subsequent journal articles

The Shannon index is defined as:

$$H' = -\sum_{i=1}^{C} p_i \log_2 p_i$$

where *H′* represents the absolute amount of informativity captured, *C* is the total number of possible categories (levels or profiles), and *p<sub>i</sub>* = *n<sub>i</sub>*/*N* is the proportion of observations in the *i*th category (*i* = 1, …, *C*), with *n<sub>i</sub>* the observed number of scores (responses) in category *i* and *N* the total sample size. The higher the index *H′*, the more information is captured by the dimension or instrument. In the case of a uniform (rectangular) distribution (i.e., *p<sub>i</sub>* = *p*\* for all *i*), the optimal amount of information is captured and *H′* reaches its maximum, *H′*<sub>max</sub>, which equals log<sub>2</sub> *C*. If the number of categories (*C*) is increased, *H′*<sub>max</sub> increases accordingly, but *H′* will only increase if the newly added categories are actually used. The Shannon Evenness index (*J′*) exclusively reflects the evenness (rectangularity) of a distribution, regardless of the number of categories, and is defined as *J′* = *H′*/*H′*<sub>max</sub>. The variance of the Shannon index can be calculated as described by Janssen et al. (2007), from which standard errors and 95% confidence intervals follow.
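The Shannon index and the evenness index defined above can be sketched in Python as follows. The function name `shannon_indices` and the example responses are hypothetical; categories that are never observed contribute nothing to the sum, which follows the usual convention that 0 × log 0 is taken to be 0.

```python
import math
from collections import Counter


def shannon_indices(observations, n_categories: int):
    """Return (H', H'max, J') for a list of observed categories.

    n_categories is the number of categories the instrument allows (C),
    which may exceed the number of categories actually observed.
    """
    if n_categories < 2:
        raise ValueError("need at least two possible categories")
    if not observations:
        raise ValueError("need at least one observation")
    n = len(observations)
    proportions = [count / n for count in Counter(observations).values()]
    # -p * log2(p) written as p * log2(1 / p), which is the same quantity
    h = sum(p * math.log2(1 / p) for p in proportions)
    h_max = math.log2(n_categories)
    return h, h_max, h / h_max


# Illustrative responses on one three-level dimension
h, h_max, j = shannon_indices([1, 1, 2, 3], n_categories=3)
print(h)                # 1.5
print(round(h_max, 3))  # 1.585
print(round(j, 3))      # 0.946
```

Passing profile strings rather than levels, with `n_categories` set to the number of possible profiles, gives the corresponding indices for a distribution of health profiles.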

The Shannon indices are purely descriptive measures of the informational richness and evenness of a classification system and have no relation to the content, meaning, or clinical relevance of what the instrument aims to measure. Both the Shannon index and the Shannon Evenness index are needed to make a useful interpretation of the measurement scale.

#### *2.6.2 Health State Density Curve (HSDC)*

Zamora et al. (2018) introduced a graphical means of depicting the nature of the distribution of EQ-5D profiles, the health state density curve (HSDC). This draws on an analogy with the Lorenz curve in describing an income distribution. The cumulative frequency of health states is compared against the cumulative frequency of the sample or population. A 45° line means that the observed health states are completely evenly spread across the sample: 10% of the sample accounts for 10% of the health states; 50% of the sample accounts for 50% of the health states, and so on.

**Fig. 2.9** HSDC for EQ-5D-5L profiles from Cambridgeshire NHS patients

A concentrated distribution, that is, one where relatively few profiles are reported and are common to a large proportion of the sample, will be shown as a curve which lies below the 45° line. The more unevenly distributed the profile data, the further below the diagonal line the HSDC will be. In the extreme, where just one profile is reported by all members of the sample, the HSDC will take a right-angled shape.

Figure 2.9 shows the HSDC for three groups of patients, and overall, from Cambridgeshire NHS in the UK. This shows that for all patients, observed profiles are not evenly distributed; that is, a small number of profiles accounts for a relatively large share of the observations. The musculoskeletal patients had the most concentrated data.

The HSDC provides a simple means of illustrating this property of a profile data set, in a manner that facilitates comparisons between data sets. It has limitations. As with Lorenz curves, where two curves cross (as is the case with rehabilitation and nursing data shown in Fig. 2.9), there is no unequivocal way of declaring one data set to be more concentrated than another. It also does not tell us which profiles are the most commonly self-reported. Therefore, the HSDC is best seen as a complement to the information from the cumulative frequency of profiles.

#### *2.6.3 Health State Density Index (HSDI) and Other Related Indices*

In the analysis of income distribution, the Lorenz curve is often accompanied by the Gini coefficient, which describes the extent of inequality which is apparent as the area between the diagonal line and the curve, divided by the entire area underneath the diagonal. In a similar way, an index can be calculated to summarise the inequality of observed health state profiles. Zamora et al. (2018) introduce a broadly similar summary measure, the Health State Density Index (HSDI). HSDI has a value of 1 where there is total equality, that is where there are the same number of patients in each profile, and HSDI = 0 for total inequality, that is where one profile accounts for all the observations.

The HSDI allows the degree of concentration in self-reported health to be compared both between different sets of patients and between different instruments, for example the 3 and 5 level versions of the EQ-5D. Zamora et al. (2018) use the HSDC to compare the EQ-5D-3L and EQ-5D-5L, their respective HSDIs indicating the advantages of the 5L in differentiating between patients and yielding less concentrated data.

The specific properties of the HSDI may be compared with the Shannon indices. Each performs somewhat differently as a measure in capturing specific aspects of the distribution of patients' data, such as the concentration over the most common states, and the influence of 'rare' states. For example, the Shannon index (absolute and relative) is not sensitive to random variations but decreases slowly with 'rare' health states. The HSDI decreases slowly with random variations and is strongly affected by infrequently observed health states, with large decreases towards zero (total inequality). For more detail see Zamora et al. (2018).

#### **Appendix: Analysis of the LFS for the EQ-5D-5L**

For the EQ-5D-5L, the LFS has a total of 126 possible scores, from 00005 (for the worst profile 55555) through to 50000 (no problem on any dimension, state 11111). Like the LSS, a problem with the LFS is that different scores cover different numbers of profiles. For example, LFS = 50000 and LFS = 00005 each contain 1 profile, whereas LFS = 11111 (meaning any health profile containing one level 1, one level 2, one level 3, one level 4 and one level 5) represents 120 different EQ-5D-5L profiles. Table 2.9 is a full list of the possible values for the LFS.


**Table 2.9** Distribution of the EQ-5D-5L profiles by LFS


**Table 2.9** (continued)

Table 2.10 shows how the LFS could be used to analyse the characteristics of an EQ-5D-5L value set, using data from the English value set (Devlin et al. 2018). It shows the mean and median values for each LFS.


**Table 2.10** Summary statistics of EQ-5D-5L values by LFS


**Table 2.10** (continued)

The following chart (Fig. 2.10) shows how the English value set (EVS) and the LFS are related. As with the LSS, this gives an indication of the general validity of a value set, in that there are no patterns that indicate that the value set has perverse or other undesirable characteristics.

**Fig. 2.10** EQ-5D-5L values (English value set) plotted against the LFS

#### **References**



#### **Chapter 3 Analysis of EQ VAS Data**

The aims of this chapter are


#### **3.1 Interpreting the EQ VAS**

It is important when analysing EQ VAS data to understand the nature of this element of the EQ-5D questionnaire and the measurement properties that it has. (For a more detailed discussion, see Feng et al. 2014.) The EQ VAS has a unique design that does not conform to conventional Visual Analogue Scale (VAS) formats, and the widely-observed properties that VAS data have therefore do not automatically apply to EQ VAS data. It has some features that conventionally belong to a 'rating scale' but again, because it has an unconventional design, the properties that rating scale data have may also not apply.

A conventional VAS is a straight line of a specified length with verbal descriptors at each end stating the meaning attached to the end points, without any demarcations of the line or numeric labels at any point. The EQ VAS is also a line that has endpoint descriptors, but it also demarcates the line in units of ones and tens, and places number labels on the tens markers. This format for the line is closer to a 'numerical rating scale', but such scales usually attach a number to every marker, have many fewer markers, and often do not have verbal end-point descriptors.

The versions of the EQ VAS contained in the EQ-5D-3L and EQ-5D-5L are also unconventional in how the scores are recorded by respondents. For the EQ-5D-3L version, the method of drawing a line from a box that states 'Your own health state today' to the scale (see Fig. 1.4, Chap. 1) is unique; it is a feature that was included for reasons related to the historical development of the EQ-5D<sup>1</sup> rather than evidence about the best way for respondents to record EQ VAS scores (Feng et al. 2014). The EQ-5D-5L uses a more conventional means of recording the score on the line (by marking a cross) but asks respondents also to record the score separately as a number. The aim of this is to overcome problems of imprecise marking on the line, but this introduces the possibility that respondents may respond primarily to the direct numeric estimate and are therefore undertaking a different measurement task, 'magnitude estimation', data from which may also have different properties to VAS and rating scale data.

The measurement properties that result from this, and hence the kinds of statistical analysis that are permitted, are not entirely clear. It is reasonable to assume that the resulting scores are at least ordinal. The line's design strongly suggests that the respondent is invited to supply interval level data, which is also the intention of magnitude estimation. The provision of a true zero and maximum even suggests providing scores that have ratio level properties. However, those who complete the VAS may in practice not respond to the visual stimuli provided in exactly this way. The evidence is mixed, with some studies finding reasonable interval scale properties; however, EQ VAS responses have very often been found to exhibit 'end aversion', which suggests that the data cannot be truly interval, though it is possible that a transformation could be estimated to repair this.

Another consideration is that, as with all health-related quality of life (HRQoL) measurement methods, EQ VAS responses may not be interpersonally comparable. For example, the end-point labels may mean different things to different respondents, and the meaning that they attach to different numbers may also differ (Devlin et al. 2019).

The guidelines for analysis of EQ VAS data below assume that the numerical values given to the EQ VAS behave as if they have at least an interval scale and are interpersonally comparable, such that it is meaningful to calculate descriptive statistics for a sample or population, such as means; to apply hypothesis testing, such as t-tests of differences in means; and to use estimation procedures, such as regression analysis. However, if there is evidence to suggest that the EQ VAS data are ordinal, then non-parametric versions of the descriptive and inferential statistics described below should be used.

It is also the case that EQ VAS data often exhibit digit preference, which is a tendency to choose numbers ending with 0 and to a lesser extent 5, rather than any others. In the context of sample or population data, this phenomenon may be treated as a lack of precision rather than the existence of bias.

Before beginning to analyse EQ VAS data that have been collected via paper questionnaires for the EQ-5D-3L, it is important to check how those data have been coded. Recall, from Chap. 1, that there are particular issues relating to the range of approaches which respondents have been observed to use in completing the EQ VAS in the EQ-5D-3L questionnaire. These may not strictly comply with the instructions but nevertheless represent valid responses. The EQ VAS in the EQ-5D-3L will, in future, be made consistent with the EQ VAS in the EQ-5D-5L, so this issue will no longer arise, but it does apply to historic data sets.

<sup>1</sup>Specifically, the EQ VAS was initially included as a warm-up task in studies to obtain VAS valuations for EQ-5D health states, and the format of the EQ VAS reflects the VAS which was used in those valuation tasks.

#### **3.2 Simple Descriptive Statistics and Inference**

With respect to summary measures, the distribution of EQ VAS data within a sample can be reported using a full range of descriptive statistics, such as minimums, maximums, means, medians, quartiles, standard deviations, interquartile ranges, skewness and kurtosis. Descriptions of relationships between EQ VAS data and other variables can also be reported, such as correlation coefficients. These may be subject to appropriate hypothesis testing using, for example, a t-test to test for differences between means or for the significance of a correlation coefficient. Similarly, EQ VAS data can be used for estimation, either as a dependent or independent variable.

The following example uses publicly available data from the Patient Reported Outcome Measures (PROMs) programme of the English National Health Service (NHS) (Devlin et al. 2010). It shows data collected from 38,187 patients before and after they had hip replacement surgery in 2010–11. Table 3.1 shows a range of descriptive statistics.

**Table 3.1** EQ VAS score for 38,187 patients before and after they had hip replacement surgery in the English NHS, 2010–11

**Fig. 3.1** EQ VAS scores for hip replacement patients before surgery, English NHS 2010–11

Because the raw data are recorded as integers, it is important to ensure that the figures reported do not have spurious accuracy. In this table, the median and the mode (in this case, there is only one) retain their integer format. The median might be presented to one decimal place, reflecting the possible values that it could take, but because of digit preference a value ending in 0.5 will be rarely observed. Other statistics are presented to three significant figures, since it is unlikely that greater precision than this is either necessary or justified. It is also good practice to report the number and percentage of missing values.
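The reporting conventions described here can be sketched with Python's standard `statistics` module. The function name `describe_eq_vas`, the use of `None` to mark missing responses and the example scores are assumptions made for illustration and are not taken from the PROMs data.

```python
import statistics


def describe_eq_vas(scores: list) -> dict:
    """Summarise EQ VAS scores (integers 0-100, with None for missing)."""
    observed = [s for s in scores if s is not None]
    n_missing = len(scores) - len(observed)
    q1, _, q3 = statistics.quantiles(observed, n=4)

    def sig3(x: float) -> float:
        # Round to three significant figures to avoid spurious accuracy
        return float(f"{x:.3g}")

    return {
        "n": len(observed),
        "missing": n_missing,
        "missing_pct": sig3(100 * n_missing / len(scores)),
        "mean": sig3(statistics.mean(observed)),
        "sd": sig3(statistics.stdev(observed)),
        "median": statistics.median(observed),  # not rounded
        "mode": statistics.mode(observed),      # not rounded
        "iqr": sig3(q3 - q1),
    }


# Illustrative scores only
summary = describe_eq_vas([50, 60, 70, 70, 80, None])
print(summary["mean"], summary["sd"], summary["median"], summary["mode"])
# 66.0 11.4 70 70
```

Note that `statistics.mode` returns a single value, so data with more than one mode would need `statistics.multimode` instead.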

It is also informative to report the full distribution of individual EQ VAS data points, especially graphically. A table showing the frequency of observations taking values from the full range of possible scores is possible but may not be very informative about key features of the distribution and will be affected by the issue of digit preference. It is most useful to use a graphical display, particularly spike plots. An example is shown in Fig. 3.1, again using the before-surgery hip replacement data.

This plot not only shows the shape and central tendency of the distribution, but also the extent of digit preference.

It is possible to reduce frequency tables to categories containing ranges, which makes them easier to read. However, end points for ranges should be chosen carefully, as this may affect the visual appearance of the distribution. It is misleading, for example, to define ranges such as 0–4, 5–9, 10–14 etc., because an observation such as 9 is more like 10 than 5. It is better to define ranges that are centred on a midpoint, specifically the multiples of 5 and 10. However, at the ends of the distribution it may be better to display individual scores for those below the range around 5 (0, 1 and 2) and above the range around 95 (98, 99 and 100) rather than assume they are all representative of 0 and 100. Table 3.2 and Fig. 3.2 show this procedure for the before-surgery hip replacement data.
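One way of implementing this grouping is sketched below. The exact category boundaries (3 to 7 around 5, 8 to 12 around 10, and so on up to 93 to 97 around 95) are inferred from the description above and may differ in detail from those used in Table 3.2; the function names and example scores are hypothetical.

```python
from collections import Counter


def vas_category(score: int) -> str:
    """Assign an EQ VAS score to a range centred on a multiple of 5.

    Scores 0, 1, 2 and 98, 99, 100 are kept as individual categories.
    """
    if not 0 <= score <= 100:
        raise ValueError("EQ VAS scores must lie between 0 and 100")
    if score <= 2 or score >= 98:
        return str(score)
    midpoint = 5 * ((score + 2) // 5)  # nearest multiple of 5
    return f"{midpoint - 2}-{midpoint + 2}"


def frequency_table(scores):
    """Count scores per category, ordered from lowest to highest range."""
    counts = Counter(vas_category(s) for s in scores)
    return sorted(counts.items(), key=lambda item: int(item[0].split("-")[0]))


# Illustrative scores only
for category, count in frequency_table([0, 9, 10, 12, 48, 50, 97, 100]):
    print(category, count)
# 0 1
# 8-12 3
# 48-52 2
# 93-97 1
# 100 1
```

With this grouping a score of 9 falls in the same category as 10, rather than being grouped with 5 as it would be under ranges such as 5–9.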

Table 3.3 shows analyses of the similarity and differences between the two observations. In this example, the two EQ VAS scores (before and after surgery) are paired, but it would be possible to undertake similar analyses for unpaired data.


**Table 3.2** EQ VAS scores for hip replacement patients before surgery, English NHS 2010–11

**Fig. 3.2** Mid-point EQ VAS scores for hip replacement patients before surgery, English NHS 2010–11


#### **3.3 Modelling Determinants of EQ VAS Scores**

It will usually be of interest in analysing EQ VAS data to examine the impact of other variables on the EQ VAS scores. In the example above, a before-and-after comparison was made between scores obtained from the same people at two time periods, but similar comparisons could be made for people according to different characteristics such as age, gender, social circumstances, location etc. Obviously, multivariate comparisons could also be made. Multivariate regression techniques have been applied to EQ VAS data and demonstrated good discriminatory properties, for example Parkin et al. (2004).

An analysis that is always available to users of EQ-5D questionnaire data is to model the relationship between the EQ-5D health state profile and the EQ VAS scores. This makes good use of the full questionnaire data by giving additional insights into the nature of the HRQoL of respondents, highlighting the importance of different aspects of their HRQoL, as described by the profile, on their overall HRQoL, as measured by the EQ VAS. Studies using the EQ-5D-3L have demonstrated a good relationship between these. They have consistently found that coefficients on the levels and dimensions of the EQ-5D-3L health state profile are in the correct direction and follow the expected gradient between levels, such that the coefficients on level 3 are greater than those on level 2 (Jelsma and Ferguson 2004; Whynes 2008, 2013; Feng et al. 2014).
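Regression models of this kind are usually estimated with the profile entered as a set of indicator (dummy) variables, one for each level above 'no problems' in each dimension. The sketch below shows one way of constructing those variables for the EQ-5D-3L; the short variable names (`MO2`, `MO3` and so on) and the function name `profile_dummies` are assumptions made for illustration, and the resulting rows would then be passed to whatever regression routine is being used.

```python
# Hypothetical short names for the five dimensions, in questionnaire order:
# Mobility, Self-care, Usual Activities, Pain & Discomfort, Anxiety & Depression
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def profile_dummies(profile: str, n_levels: int = 3) -> dict:
    """Convert a profile such as '21312' into 0/1 regressors.

    Level 1 (no problems) is the reference category in every dimension,
    so each coefficient measures the effect of moving away from level 1.
    """
    if len(profile) != len(DIMENSIONS):
        raise ValueError("expected one level per dimension")
    row = {}
    for dimension, level in zip(DIMENSIONS, profile):
        for candidate in range(2, n_levels + 1):
            row[f"{dimension}{candidate}"] = 1 if int(level) == candidate else 0
    return row


print(profile_dummies("21312"))
# {'MO2': 1, 'MO3': 0, 'SC2': 0, 'SC3': 0, 'UA2': 0, 'UA3': 1,
#  'PD2': 0, 'PD3': 0, 'AD2': 1, 'AD3': 0}
```

Coding the levels separately in this way allows the gradient between level 2 and level 3 to differ across dimensions, instead of imposing equal steps between levels.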

Table 3.4 shows an example from the PROMs hip data used earlier.

Amongst possible interpretations of these results, it is notable that Anxiety & Depression has the biggest impact on the EQ VAS scores and Pain & Discomfort the smallest. Although level 2 Mobility has an impact similar to that of level 2 Self-care, level 3 has a much greater impact, perhaps reflecting the extreme nature of the EQ-5D-3L level 3 descriptor for Mobility ('confined to bed'). Similarly, although level 2 Usual Activities has a much lower impact than level 2 Self-care, the level 3 coefficients for these two dimensions are similar.

It is important to note that the coefficients that will be obtained are specific to the characteristics of the population from which the data are collected. For patient data, the evidence is that the profile coefficients may differ according to the type of condition that the patient has. Moreover, other variables, including age and sex, may impact not only on the EQ VAS scores directly but also on the profile coefficients, via interaction effects. Such analyses add further to understanding the impact of different patient characteristics and conditions on HRQoL. An additional implication is that simple comparisons between the coefficients obtained from this analysis and those obtained by modelling valuation data should be avoided.

As a direct illustration of this, Fig. 3.3, which has been generated from a vast amount of EQ-5D data held by the EuroQol Group office in Rotterdam, shows that there is a sharply declining EQ VAS by age for those whose self-reported profile contains at least some problems (Oppe 2013). As age increases, the number and severity of problems reported increase and the EQ VAS decreases. But the EQ VAS declines with age even among patients reporting no problems on any EQ-5D dimension.

**Fig. 3.3** The relationship between age and EQ VAS for those with no problems in any EQ-5D dimension and those with problems in at least one dimension (The straight lines are based on linear models for all data, no problems on EQ-5D (11111) and problems in at least one dimension (NOT 11111). The dashed straight lines represent separate linear models for men (♂) and women (♀) for all data. The other lines depict observed mean scores for corresponding groups.)




#### **Chapter 4 Analysis of EQ-5D Values**

The aims of this chapter are


#### **4.1 Value Sets and Their Properties**

Despite the potential richness of the EQ-5D descriptive system demonstrated in the previous chapters, the instrument was originally developed as a measure of health status that could serve as the basis for summarising and comparing health outcomes (Williams 2005). In particular, it was designed to be a brief generic measure that would lend itself for the purpose of assigning a single summary value to each possible health profile (hereafter: 'EQ-5D values' or 'values'). The values, presented in country-specific value sets, are a major feature of the EQ-5D instrument, facilitating the calculation of quality-adjusted life years (QALYs) that are used to inform economic evaluations of health care interventions or policies on health.<sup>1</sup>

It is important to note that these values are a special case, based on people's strength of preference for different health profiles, of an index that generates a single summary number. Such indices in general (other than *values*), for example based on clinically-defined need, may be used for other purposes and, as we discuss below, it should not be assumed that using value sets is appropriate for any of those purposes.<sup>2</sup> Other possible uses of indices more broadly defined are to summarise EQ-5D profiles for statistical analysis, describing the health of a population, comparing population health (between regions, populations, or over time), describing severity of illness, and assessing population or patient priorities for treatment (Devlin and Parkin 2006). A relatively newly developed use is in routine outcomes measurement, to assess the performance of healthcare, for use as a hospital performance indicator (used to help patients choose which hospital to be referred to), and for use in measuring the productivity and performance of a healthcare system (Appleby et al. 2015).

<sup>1</sup>Note that it is beyond the scope of this book to offer guidance on how to conduct cost-effectiveness or cost-utility analysis. There are other resources providing detailed guidance on this, such as Drummond et al. (2015) and Sanders et al. (2016).

N. Devlin et al., *Methods for Analysing and Reporting EQ-5D Data*, https://doi.org/10.1007/978-3-030-47622-9\_4

A value set consists of weights that can convert each EQ-5D health profile into a value on a scale anchored at 1 (meaning full health) and 0 (meaning a state as bad as being dead). The scale allows negative values to be assigned to health states that are considered worse than dead. The values can be calculated by applying a formula that attaches a weight to each level in each dimension. In some cases, the formula allows for the possibility that combinations of problems might also affect preferences, via interaction effects.
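The mechanics of applying such a formula can be sketched in a few lines of Python. The sketch below is illustrative only: the function name `eq5d3l_value` is hypothetical, and the decrements are those commonly reported for the UK EQ-5D-3L TTO value set (Dolan 1997), including its constant and N3 interaction term. They should be checked against the published value set before any real use.

```python
# Decrements as commonly reported for the UK EQ-5D-3L TTO value set
# (Dolan 1997). Illustrative only: verify against the published table.
CONSTANT = 0.081  # subtracted for any profile other than 11111
N3 = 0.269        # subtracted once if any dimension is at level 3
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")
DECREMENTS = {
    "MO": {1: 0.0, 2: 0.069, 3: 0.314},  # Mobility
    "SC": {1: 0.0, 2: 0.104, 3: 0.214},  # Self-care
    "UA": {1: 0.0, 2: 0.036, 3: 0.094},  # Usual Activities
    "PD": {1: 0.0, 2: 0.123, 3: 0.386},  # Pain & Discomfort
    "AD": {1: 0.0, 2: 0.071, 3: 0.236},  # Anxiety & Depression
}


def eq5d3l_value(profile: str) -> float:
    """Convert a 5-digit EQ-5D-3L profile such as '11223' into a value."""
    if len(profile) != 5 or any(c not in "123" for c in profile):
        raise ValueError(f"not a valid EQ-5D-3L profile: {profile!r}")
    levels = [int(c) for c in profile]
    if all(level == 1 for level in levels):
        return 1.0  # full health is anchored at 1
    value = 1.0 - CONSTANT
    for dim, level in zip(DIMENSIONS, levels):
        value -= DECREMENTS[dim][level]
    if 3 in levels:
        value -= N3  # interaction term for any level 3 problem
    return round(value, 3)


for p in ("11111", "21111", "11223", "33333"):
    print(p, eq5d3l_value(p))
```

Note that the value for 33333 is negative, illustrating a state valued as worse than dead under this value set.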

The EQ-5D-3L describes 243 unique health profiles (3<sup>5</sup>), whereas the EQ-5D-5L describes 3,125 possible unique health profiles (5<sup>5</sup>). Most of the EQ-5D value sets have been obtained using stated preference data elicited from representative samples of the general public, thereby ensuring that they represent the societal perspective. The normative argument for using so-called "social" value sets is that for resource allocation purposes in publicly or collectively funded health care, the valuation of health states should reflect the preferences of the relevant general public (Weinstein et al. 1996; Sanders et al. 2016), since it is the general public who are ultimately funding health care and are the users of the health care system (Dolan 1997).
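As a quick check on these counts, the full set of profiles can be enumerated directly. This is a minimal sketch; the helper name `all_profiles` is hypothetical.

```python
from itertools import product


def all_profiles(n_levels: int, n_dimensions: int = 5) -> list[str]:
    """List every profile for a descriptive system with the given levels."""
    levels = [str(level) for level in range(1, n_levels + 1)]
    return ["".join(combo) for combo in product(levels, repeat=n_dimensions)]


profiles_3l = all_profiles(3)  # EQ-5D-3L: 3 levels on 5 dimensions
profiles_5l = all_profiles(5)  # EQ-5D-5L: 5 levels on 5 dimensions
print(len(profiles_3l), profiles_3l[0], profiles_3l[-1])
print(len(profiles_5l), profiles_5l[0], profiles_5l[-1])
```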

Value sets are commonly produced by valuing a selection of EQ-5D states and then using econometric techniques to extrapolate over the full set of health states. For the EQ-5D-3L, a subset of health states to be used in a valuation study was decided by the EuroQol Group in 1990 (Rabin et al. 2007) along with a preferred method for obtaining the values, using a visual analogue scale (VAS) approach. For various reasons, however, subsequent valuation studies have not always adhered to the standard approach, since these studies were often the result of locally led research initiatives. Apart from the choice of the health state design (i.e. deciding on the subset of states to be valued), studies differed in other ways, such as the valuation method and the (interviewer) protocol used, the number of respondents included, exclusion criteria for valuation responses, and modelling choices in arriving at a final value set. When the EQ-5D-5L was introduced, the EuroQol Group decided to return to having a more standardized approach by developing the EuroQol Valuation Technology platform (EQ-VT) (Oppe et al. 2014). Apart from standardization in terms of health state design, valuation methodology, and a computer-assisted personal interview mode of administration, a strict protocol of interviewer training and quality assurance during the entirety of the data collection process was developed and implemented (Ramos-Goñi et al. 2017a).

<sup>2</sup>However, we will only be discussing value sets in this chapter, not any other possible indices.

Values derived for EQ-5D have been based on various stated preference valuation techniques, such as the standard gamble (SG), time trade-off (TTO), VAS, person trade-off or rank-based techniques such as paired comparison, best-worst scaling and discrete choice methods. Since the first publication in 1997 (Dolan 1997), EQ-5D-3L value sets have been derived and published for many countries (www.euroqol.org). EQ-5D-5L valuation studies have been conducted from 2012 onwards, and the published value sets are listed on the EuroQol website at www.euroqol.org. EQ-5D-3L value sets were mainly based on TTO and VAS valuation methodology, although other techniques have also been used (Craig et al. 2009; Bansback et al. 2012). For the valuation of EQ-5D-5L, the EuroQol Group decided to explore the use of rank-based valuation methods to gain additional information (Devlin and Krabbe 2013). The current EQ-VT protocol for the valuation of EQ-5D-5L health states uses composite TTO and discrete choice valuation methodology (Oppe et al. 2014). There has been much discussion about the theoretical and empirical properties of the different valuation methods. In the health economics literature, choice-based methods such as SG and TTO are often argued to have a more solid basis in economic theory than a rating approach such as VAS (Brazier et al. 1999; Drummond et al. 2015; for an alternative view see Parkin and Devlin 2006), whereas discrete choice methodology is rooted in mathematical psychology and was further developed into random utility theory (McFadden 1974).

The EQ-5D-5L descriptive system was published before valuation studies were carried out and the subsequent publication of value sets derived from them. As an interim measure, the EuroQol Group coordinated a study that administered both the 3-level and 5-level versions of the EQ-5D to develop a mapping<sup>3</sup> function between the EQ-5D-3L value sets and the EQ-5D-5L descriptive system, resulting in (interim) value sets for the EQ-5D-5L (van Hout et al. 2012). A total of 3,691 respondents completed both the 3L and 5L across 6 countries: Denmark, England, Italy, the Netherlands, Poland and Scotland. Different subgroups were targeted, and in most countries, a screening protocol was implemented to ensure that a broad spectrum of levels of health would be captured across the dimensions of EQ-5D for both the 5L and 3L descriptive systems.

Tables 4.1a and 4.1b show two existing value sets, with examples of how to calculate the value for a given health profile.

Finally, an important consideration is that attaching values to descriptive data introduces an exogenous source of variance, which can bias statistical inference (Parkin et al. 2010; Wilke et al. 2010). This is a special problem for applications where people's preferences are not directly relevant and is a key reason why it should not be assumed that values provide a suitable index for non-economics applications. Conclusions about whether there are statistically significant differences in, for example, the health of 2 regions, or health over time, or between 2 arms of a clinical trial, may be influenced by which value set is used. Furthermore, note that there is no such thing as a neutral value set or index; any weighting of EQ-5D profile data will influence the results, including the equally weighted Level Sum Score index described in Chap. 2. This is not specific to the EQ-5D, applying equally to the scoring systems of other health measures, both generic and condition specific, including measures that simply sum ranked responses. For economic evaluation, the issue is rather different, because the exogenous influence of people's preferences is a desired feature when taking a societal perspective.

<sup>3</sup>For further information on mapping, see Sect. 5.2.

**Table 4.1 a** An example of applying the EQ-5D-3L value set for the United Kingdom (UK) to calculate EQ-5D values. **b** An example of applying the English EQ-5D-5L value set to calculate EQ-5D values

#### **4.2 Positive and Normative Considerations in Choice of Value Set**

In recent decades, a large number of EQ-5D value sets have been published, using a multitude of approaches and valuation techniques, with applications in various fields. Users of EQ-5D often ask which value set is appropriate for their particular use. The aim of this section is to provide advice on this question, largely following the earlier "Guidance to users of EQ-5D value sets", which was published as Chapter 4 of the EuroQol Group Monographs Volume 2: EQ-5D value sets: Inventory, comparative review and user guide (Devlin and Parkin 2007).

An obvious advantage of using a summary value to represent a health profile is that it simplifies statistical analysis. But since all value sets embody preferences about the relative importance of each level of each dimension, it is not possible to offer generalised guidance about which value set to use if the objective is to summarise profiles for descriptive or inferential statistical analysis. If there is not a clear purpose for using a summary value (especially one based on social values), but rather an aim to provide information, it may be better not to use values at all and instead to report the descriptive information as described in previous chapters. This also applies to describing the health of a population or patient group, or to comparing population health.

One of the most common uses of EQ-5D values remains in economic evaluation, with applications in cost-per-QALY/cost-effectiveness analysis (CEA) or cost-utility analysis (CUA). In CUA, the value set will be used to calculate QALYs, and the weights in the value set should represent "values", meaning that the health profiles described by the instrument should be weighted by the *value* of the health profile. To arrive at QALYs, the values should be anchored at 0 (corresponding to being dead or as bad as being dead) and 1 (representing full health). A further requirement, although not essential for all cost-effectiveness analyses, is that the value set should be based on the societal perspective.

Often, economic evaluation is performed to provide evidence for a formal decision-making process. National health technology assessment bodies across the world routinely use economic evaluations to make decisions and recommendations about health care services. At the time of writing the EQ-5D is the preferred (or one of the preferred) health outcome measures recommended by pharmaceutical reimbursement authorities in at least 29 countries, including countries in Europe, North America, South America, Asia and Australia (Kennedy-Martin et al. 2020). When a value set needs to be selected to perform such an evaluation, the first consideration is pragmatic: does the relevant decision maker specify any requirements or preferences regarding which value set should be used? If recommendations will be made to more than one country on the basis of the evaluation's results, for example when performed alongside a multi-country clinical trial, the value set relevant to each separate country should be applied to the effectiveness data and reported to the decision makers in each separate country.

In the absence of specific requirements or guidelines from decision makers, analysts are left to make their own choices, for which broadly there are three main considerations to take into account: relevance to the decision-making context; empirical characteristics of the valuation study and modelling techniques; and the theoretical properties of the valuation methods.

Relevance to the decision-making context entails whether the values reflect the geographical and economic context in which resource allocation decisions are made, and whose values are considered to be relevant in the decision-making process. As mentioned in Sect. 4.1, there is a strong normative argument to opt for social valuations in economic evaluations informing decisions about collectively-funded health care. An alternative would be to use patients' values, because the preferences of patients who are actually experiencing the health states would be better informed than values generated from the general public being asked to imagine health states that may be hypothetical to them. Differences between patients' values and social values are widely observed (Zethraeus and Johannesson 1999; de Wit et al. 2000; Brazier et al. 2005; Ogorevc et al. 2019). Since the value set arguably should reflect the preferences of the potential recipients of healthcare, local (i.e. country-specific) value sets should be used when available. For a country for which no value set has been published and no local guidelines are available, practical aspects might be taken into consideration, such as considering a value set of a country that is most similar in terms of e.g. demographics, geography, language, infrastructure, or health care system. Finally, the time period in which the valuation study has been performed is relevant. The UK EQ-5D-3L value set is still being used extensively at the time of writing, but the data collection for the valuation study dates back to 1993, while the UK has gone through many demographic and economic changes since then which might impact on preferences.

Empirical characteristics should also be considered when choosing a value set. It is recommended that users study those characteristics before making a choice, looking at, for example: the response rate of the valuation study; whether the sample was representative of the general public; which valuation method was used; whether the health state design was appropriate; which mode of administration was used; the 'quality' of the data (whether there were many missing values, inconsistencies, or low values for very mild health states or vice versa); whether the econometric modelling techniques were sound and appropriate; and whether the choice of the final model was appropriate. These questions largely apply to EQ-5D-3L, since with the introduction of the EQ-VT platform for the valuation of EQ-5D-5L, many potential issues have been resolved by a high level of standardization and rigorous interviewer training and quality assurance.

The theoretical properties of the underlying valuation methods have been a controversial issue for decades. As mentioned in Sect. 4.1, so-called 'choice-based' methods such as SG and TTO have been preferred over a rating approach such as VAS. For the EQ-5D-3L, mainly VAS and TTO value sets are available. TTO based value sets have generally been preferred for purposes of economic evaluation, although it has been suggested that VAS value sets may be used for non-economics studies (Kind 2003). The EQ-VT protocol for EQ-5D-5L valuation studies uses composite TTO and discrete choice valuation techniques, offering the possibility to model a composite TTO based value set, or a value set based on a hybrid model combining composite TTO and discrete choice data (Feng et al. 2018; Ramos-Goñi et al. 2017b).

Based on the criteria discussed above, there may not be a single 'best' value set for any given application. Therefore, it is recommended to perform sensitivity analysis using other suitable value sets, to assess the impact of the choice of value set on results and conclusions. As mentioned above, many countries do not have a value set of their own and therefore have to use 'foreign' values; Parkin et al. (2010) showed that in a simulated economic evaluation experiment, whether or not an intervention is seen as effective in such a country might depend on which other country's value set it chooses. This stresses the relevance of which value set one chooses, and the importance of performing sensitivity analysis (see Sect. 4.8).

The value sets that are used in economic evaluation have a clear theoretical rationale that is the foundation for the values, the way that they are derived, and their meaning. As mentioned above, this rationale might not be relevant for other uses. The values used in economic evaluation are explicitly regarded as 'utilities', with a very specific definition attached to them. There is a clear meaning for the values 1 and 0 and for negative values. As mentioned above, a recognized stated preference technique such as TTO is often recommended to derive the values. Finally, there is a justification for the use of the general population as a source of EQ-5D values. The values should be used in other applications only if the same theoretical rationale also applies.

Figure 4.1 provides an overview of the considerations that should determine your choice between the EQ-5D value sets. Choosing a value set is not simple, since many factors are involved, such as the specific nature of the research application, the sort of decisions it informs, and the context in which the evidence from your research will be used. In longitudinal studies, the same value set should be applied throughout the study. When the research aim is to make comparisons across respondents from different countries in a multinational cross-sectional study (rather than comparing value set characteristics) it will also be helpful to use a common value set if one is available, otherwise differences in country preferences would be added to the differences between respondents' health status. An example is the European VAS value set (Greiner et al. 2003).

**Fig. 4.1** Guidance on which EQ-5D value set to use

#### **4.3 Simple Descriptive Statistics and Inference**

EQ-5D values can be presented in much the same way as EQ VAS data. Since the valuation methods underlying the values are meant to provide a scale with cardinal properties, for exploratory data analysis you can present a measure of central tendency (e.g. a mean or median), a measure of spread or dispersion (e.g. the standard deviation) and a description of shape (e.g. skewness, mode, or kurtosis). If the data are skewed, as is often the case with EQ-5D value data for general populations or mildly diseased patients, the median value could be used as the measure of central tendency. As measures of dispersion one can also add the minimum, the maximum, and the interquartile range (IQR), which is the difference between the 75th and 25th percentiles. If you are interested in the precision of the mean, you can use the standard error of the mean and a 95% confidence interval. As with EQ VAS data, a t-test can be used for comparing differences between means of different populations (or the same population over time). When you want to compare more than 2 groups, an analysis of variance (ANOVA) can be used. The following tables and figures contain 2 examples of how to present EQ-5D value results. Table 4.2 and Fig. 4.2 present the results from a study where the effect of a treatment on health status is investigated (the tables and figures are based on hypothetical data and for illustration purposes only).

Table 4.3 and Fig. 4.3 show results for a patient population and 3 sub-groups.

Below there are two more examples on how to report descriptive statistics for EQ-5D values. Table 4.4 shows a comprehensive overview of EQ-5D-3L population norm values for the United States (US), stratified by age and sex and also including total values. The precision of the estimate of the mean is indicated by the standard error. The median (50th percentile) is included, being relevant in general population samples that tend to be skewed towards full health, and a measure of dispersion is represented by the interquartile range (75th percentile–25th percentile).
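The summary statistics discussed in this section can be computed with standard tools. The following is a minimal sketch using only the Python standard library; the function name `describe_values` and the data are hypothetical, and the 95% confidence interval uses a normal approximation (1.96 standard errors), which is an assumption that becomes less appropriate in small samples, where a t-based interval would usually be preferred.

```python
import math
import statistics


def describe_values(values):
    """Summary statistics for a list of EQ-5D values (one per respondent)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)   # sample standard deviation
    se = sd / math.sqrt(n)          # standard error of the mean
    p25, median, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "se": se,
        "ci95": (mean - 1.96 * se, mean + 1.96 * se),  # normal approximation
        "median": median,
        "p25": p25,
        "p75": p75,
        "iqr": p75 - p25,
        "min": min(values),
        "max": max(values),
    }


# Hypothetical values, skewed towards full health (1.0)
values = [1.0, 1.0, 1.0, 0.85, 0.80, 0.76, 0.69, 0.52, 0.26, -0.02]
summary = describe_values(values)
for name, stat in summary.items():
    print(name, stat)
```

In this hypothetical sample the median lies above the mean, which is the pattern expected when values are skewed towards full health.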

An illustrative way to present longitudinal values from different populations is shown in Fig. 4.4 by a scatter plot for the experimental and comparator arms in an intervention design. One can simply track the mean values for both patient groups over time. In this hypothetical example, the new treatment causes a more severe drop in health initially, but also shows a quicker recovery and finally leads to a higher level of health than the comparator treatment.


**Table 4.3** EQ-5D values for the total patient population and the 3 subgroups

All patients, Subgroup 1, Subgroup 2, Subgroup 3

**Table 4.4** General population EQ-5D-3L norm values for a representative sample of the US (Szende et al. 2014, Springer open access)


It is important to note that EQ-5D values are often not symmetrically distributed, and tend to be divided into multiple groups (clusters), which might mean that standard statistics such as means and standard deviations are harder to interpret. This will be discussed in more detail in Sects. 4.4 and 4.6.

**Fig. 4.4** Example of presentation of longitudinal EQ-5D values (hypothetical data with smoothed lines and confidence intervals)

EQ-5D values are often used to calculate QALYs, for use in CUA. Although QALYs are commonly used in an evaluative context, for example when comparing two or more health programmes, an example is shown here of calculating QALYs for descriptive purposes, e.g. for a single individual. In the standard QALY model, values are simply multiplied by the time spent in the corresponding health state, and when different health states occur over time, these products are added. This is shown in Fig. 4.5, where two health states in suboptimal health occur, with values ('utilities') of 0.4 and 0.8, as health gradually improves after the initial event.
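The arithmetic of the standard QALY model can be sketched as follows. The durations used here are hypothetical (they are not taken from Fig. 4.5), as are the function names. The second function is an assumption that goes slightly beyond the text: it treats health as changing linearly between measurement points and takes the area under the resulting curve, which is one common way of handling gradual improvement.

```python
def qalys_piecewise(segments):
    """QALYs when each health state has a constant value.

    segments: iterable of (value, duration_in_years) pairs.
    """
    return sum(value * years for value, years in segments)


def qalys_trapezoid(times, values):
    """QALYs as the area under a value curve measured at given times.

    Assumes values change linearly between consecutive time points (years).
    """
    total = 0.0
    points = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        total += (t1 - t0) * (v0 + v1) / 2
    return total


# Hypothetical: half a year at value 0.4, then one year at value 0.8
print(qalys_piecewise([(0.4, 0.5), (0.8, 1.0)]))

# Hypothetical: values of 0.4, 0.8 and 1.0 measured at years 0, 1 and 2
print(qalys_trapezoid([0, 1, 2], [0.4, 0.8, 1.0]))
```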

**Fig. 4.5** QALY calculation of an event-like condition with a recovery period

#### **4.4 Examining the Distribution of the EQ-5D Values**

Examining EQ-5D value distributions can be done in a graphical as well as in a numerical manner. First, we present graphical ways of exploring distributions. Distributions of EQ-5D values often show gaps and spikes or clusters of observations in certain parts of the scale. At the upper part of the scale there is often a gap which can be quite substantial, especially in EQ-5D-3L value distributions. This gap is caused by the ceiling often present in EQ-5D data and the intercept in the value function. In general population samples, but also in mildly or moderately diseased samples, often a relatively large proportion of respondents score no problems on all five dimensions: the ceiling. A large ceiling will result in a skewed distribution. For many value sets, there is a relatively large constant (or intercept) in the value set, leading to a gap between full health and the second-best health state. In distributional terms this may result in at least two clusters in the distribution. Apart from this "upper gap", more gaps may appear in EQ-5D value distributions. Parkin et al. (2016) demonstrated that two or three clusters often occur in value distributions for EQ-5D-3L. The left panel in Fig. 4.6 shows an example with 3 clusters: one caused by the ceiling and the intercept (the upper gap), and a low and a high cluster which are caused by differences between levels 2 and 3 value decrements being greater than those between levels 1 and 2, and also by the so-called N3 term<sup>4</sup> used in many EQ-5D-3L value sets, as shown by Parkin et al. The right panel in Fig. 4.6 shows a distribution of EQ-5D-5L values in the same patient group, resulting in a much smoother distribution. Note that these data were derived from a single patient sample: these respondents scored both the EQ-5D-3L and EQ-5D-5L descriptive systems, and subsequently the corresponding value sets (UK for EQ-5D-3L and English for EQ-5D-5L) were applied to the health profile data.

There are several differences between value sets across countries, but overall it was shown that EQ-5D-5L distributions resulted in smoother and more natural-looking distributions than EQ-5D-3L (Janssen et al. 2018). Interestingly, an exception occurred for an EQ-5D-5L value set including a model term similar to N3. Here again three clusters appear in the distribution, as depicted in Fig. 4.7.

**Fig. 4.6** Distribution of EQ-5D-3L and EQ-5D-5L values in a sample of cardiovascular disease (CVD) patients (N = 251)

<sup>4</sup>The N3 term results in an additional decrement of the value when at least one level 3 is present in the health profile.

**Fig. 4.7** Distribution of EQ-5D-5L values in a sample of cardiovascular disease patients (N = 251)

Sometimes histograms of distributions are not easy to assess, especially with large datasets in heterogeneous populations, e.g. showing a large spread of observations and perhaps spikes or clusters across the value scale. It becomes even more difficult when you want to compare two distributions in a single figure. In these cases, it might help to use a smoothing function such as kernel density estimation. Figure 4.8 shows an example of an EQ-5D-3L and EQ-5D-5L kernel density plot in a large heterogeneous dataset. Note that here too the EQ-5D-5L distribution resulted in a much smoother plot than the EQ-5D-3L distribution, which is much more irregular.

When depicting a single distribution one can also combine a histogram with a smoothing function, such as shown in Fig. 4.9.

A final comment in regard to graphical presentation by histograms is that the choice of number of interval ranges ("bins", each bar represents 1 interval range) might influence the density in areas with a high concentration of observations, e.g. the ceiling (proportion of 11111) will result in a larger spike when more bins are opted for. Figure 4.10 shows an example for an EQ-5D-5L value distribution in a pooled dataset of 9 condition groups with 35 bins in the left panel versus 100 bins in the right panel.
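The effect of the number of bins on the height of the ceiling spike can be illustrated with a small sketch. The data below are simulated and purely hypothetical (30% of respondents at 1.0, the rest spread between 0.2 and 0.95), and `density_histogram` is a hand-written helper rather than a library function. Because a density histogram divides each count by the bin width, the same 300 respondents at the ceiling produce a taller bar when the bins are narrower.

```python
import random


def density_histogram(values, n_bins, lo, hi):
    """Density histogram: bar heights are count / (n * bin width).

    Assumes every value lies within [lo, hi].
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Values equal to the upper bound go into the last bin
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    n = len(values)
    return [c / (n * width) for c in counts]


random.seed(1)
# Hypothetical sample: 300 respondents in full health, 700 with problems
values = [1.0] * 300 + [random.uniform(0.2, 0.95) for _ in range(700)]

for n_bins in (35, 100):
    density = density_histogram(values, n_bins, lo=-0.3, hi=1.0)
    print(n_bins, "bins: height of the top bar =", round(density[-1], 2))
```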

Many EQ-5D-3L value set distributions will result in a distribution with clusters and gaps. These patterns in the distribution are considered to be undesirable as they can diminish the sensitivity and accuracy of the instrument (Janssen et al. 2018). Moreover, they can lead to estimation problems if distributions result in a violation of homoscedasticity<sup>5</sup> when the values are used as dependent variable in regression analysis. With the introduction of the EQ-5D-5L the clusters and gaps largely disappeared, although to a lesser extent they still might occur. An extreme example is shown in Fig. 4.7 where clusters were caused by the large intercept and the interaction term in the value function. For other EQ-5D-5L country-specific value sets clusters and gaps hardly occur. Overall the interim ('mapped') EQ-5D-5L value distributions tend to be more similar in shape to the EQ-5D-5L value set distributions, although the range is identical to the EQ-5D-3L distributions (Feng et al. 2019; Mulhern et al. 2018). In Sects. 4.6 and 4.7 guidance is provided on how to deal with a clustered data distribution.

**Fig. 4.8** Distribution of EQ-5D-3L and EQ-5D-5L values in a pooled dataset of 9 condition groups (N = 3,790)

**Fig. 4.9** Distribution of EQ-5D-3L values in a sample of Asthma/COPD patients (N = 342)

**Fig. 4.10** Distribution of EQ-5D-5L values applying 35 versus 100 bins

A final remark can be made in regard to the terms bimodal and even trimodal that are often used to describe distributions with 2 or 3 clusters respectively. Parkin et al. (2016) point out that in regard to EQ-5D-3L data these terms are misleading, since the modes of the groups are not their most interesting feature. The groups do not always have a single local mode, and in practice these modes are never actually identified, reported, or analysed.

There are several numerical ways of assessing EQ-5D value distributions. A simple way is to report the proportion of the ceiling and the floor. More comprehensive methods are evenness measures such as the Shannon indices, or the Health State Density Index as described in Sect. 2.8. Note that the total number of unique *values* might be (almost) equal to the number of unique possible *health profiles*. In these cases, the resulting indices will be equal to or close to the indices applied to the profile data.
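These numerical summaries are straightforward to compute. The sketch below is a minimal illustration with hypothetical data and function names. It assumes the Shannon index takes the usual form H' = -Σ p<sub>i</sub> log p<sub>i</sub> over the observed values, with evenness J' = H'/H'<sub>max</sub> and H'<sub>max</sub> the logarithm of the number of possible values. The logarithm base and the number of possible values are left as parameters and should be matched to the definitions in Sect. 2.8.

```python
import math
from collections import Counter


def proportion_at(values, target, tol=1e-9):
    """Share of observations equal to a target value (e.g. the ceiling at 1)."""
    return sum(abs(v - target) <= tol for v in values) / len(values)


def shannon(values, n_possible, base=2):
    """Shannon index H' and evenness J' for a list of EQ-5D values.

    n_possible: number of distinct values the value set can produce
    (assumed here to be supplied by the analyst).
    """
    n = len(values)
    counts = Counter(values)  # relies on identical values being exactly equal
    h = -sum((c / n) * math.log(c / n, base) for c in counts.values())
    return h, h / math.log(n_possible, base)


# Hypothetical sample of 8 respondents with EQ-5D-3L values
values = [1.0, 1.0, 1.0, 1.0, 0.85, 0.85, 0.255, -0.594]
ceiling = proportion_at(values, 1.0)      # proportion in full health
floor = proportion_at(values, -0.594)     # proportion at the worst value
h, j = shannon(values, n_possible=243)
print(ceiling, floor, round(h, 3), round(j, 3))
```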

#### **4.5 Variance and Heteroskedasticity**

As described above, EQ-5D value data are often characterised by some specific features. By nature, the data are censored due to the upper bound at 1 (full health) and the lower bound for the most severe health profile (33333 in 3L and 55555 in 5L). Because 11111 describes full or "normal" health as indicated by having no problems across the five dimensions, there often is a ceiling present and the data distribution might be skewed. A consequence of these factors is that variances might vary across the value space, leading to heteroskedasticity. Heteroskedasticity refers to the situation where the variance of a variable is unequal across the range of values of a second variable that describes or predicts it. Figure 4.11 shows an example of observed values paired with self-reported EQ VAS ratings. Typically EQ-5D variances will be unequal across the scale, which is at least partly due to the censored nature of the value scale (e.g. the figure clearly shows reduced variance in the upper right corner of Fig. 4.11).

<sup>5</sup>See also Sect. 4.6.

**Fig. 4.11** EQ-5D-5L values (Spanish value set) plotted against EQ VAS for a sample of personality disorder patients (n = 384)

A graphical way of depicting heteroskedasticity (or homoscedasticity) is by using a residual-versus-predictor plot, which is a scatter plot of residuals against the predicted values. One can easily detect if there are any patterns visible in the scatter plot. If there are no visible patterns or the plot shows roughly a rectangular shape, or both, the data are likely to be homoscedastic. Note that a pattern could be present in a residual-versus-predictor plot but the data could still be homoscedastic, in which case the data are likely to be biased. Figure 4.12 shows an example of a residual-versus-predictor plot for the same data used in Fig. 4.11. Clearly, residuals are distributed unequally across the value scale, which means that heteroskedasticity is present.

There have been many reported cases of heteroskedasticity in EQ-5D data. Section 4.7 provides further information on how to deal with heteroskedasticity in EQ-5D data.

#### **4.6 Exploring Clusters in EQ-5D Value Distributions**

As described in Sect. 4.4, EQ-5D value distributions often show clusters of observations. Sometimes these can be clearly detected graphically, as is the case for many EQ-5D-3L distributions. In other cases, one can use statistical methods to test for the presence of clusters. A distribution with multiple clusters might imply that there are actually multiple patient populations that should be analysed separately. The mean value might actually refer to a point on the value scale where there are hardly any observations, so perhaps a better way to inform about these data would be to report simple descriptive statistics such as the mean, median, mode, range and standard deviation for the clusters *separately*.

**Fig. 4.12** Residual-versus-predictor plot of EQ-5D-5L residual values (Spanish value set) plotted against EQ VAS (ordinary least squares regression) in a sample of personality disorder patients (n = 384)

Parkin et al. (2016) and Feng et al. (2019) used statistical techniques to detect clusters, first by applying k-means clustering to demonstrate the presence of clusters in EQ-5D-3L and EQ-5D-5L distributions. The k-means cluster algorithm searches for the optimal partition into *k* clusters. There are many stopping rules available for determining the optimal number of clusters. Feng et al. identified the Calinski-Harabasz pseudo-F index as the most suitable stopping rule for EQ-5D value data. Because the number of clusters must be specified before the k-means procedure is applied, the procedure can be run for several candidate values of *k*, after which the stopping rule is applied to determine the optimal number of clusters. For more detail see Feng et al. (2019). Table 4.5 shows an example of applying this method to EQ-5D-5L values (value set for England) in a large pooled dataset across 2 patient groups. Different clusters are apparent, with different mean values and different dispersion and shape statistics. Note that different clusters are found for the different patient groups, and that the optimal number of clusters also varies across the patient groups.
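The combination of k-means and the Calinski-Harabasz pseudo-F index can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure used by Feng et al.: the values are hypothetical, the data are one-dimensional, and the cluster centres are started at evenly spaced quantiles (an assumption made here so that the result is deterministic; statistical packages typically use random starts).

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on one-dimensional data; returns one label per value."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start at evenly spaced quantiles (an assumption of this sketch)
    centres = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assign each value to its nearest centre
        labels = [min(range(k), key=lambda j: (v - centres[j]) ** 2) for v in values]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centres[j])
        if updated == centres:   # assignments have stabilised
            break
        centres = updated
    return labels

def calinski_harabasz(values, labels):
    """Pseudo-F: between-cluster over within-cluster sum of squares, per df."""
    n = len(values)
    clusters = sorted(set(labels))
    k = len(clusters)
    grand_mean = sum(values) / n
    between = within = 0.0
    for j in clusters:
        members = [v for v, lab in zip(values, labels) if lab == j]
        mean_j = sum(members) / len(members)
        between += len(members) * (mean_j - grand_mean) ** 2
        within += sum((v - mean_j) ** 2 for v in members)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical EQ-5D values: a group at full health, a middle group and a low group
values = ([1.0] * 6
          + [0.80, 0.76, 0.73, 0.69, 0.66, 0.62]
          + [0.25, 0.19, 0.12, 0.08, 0.03, -0.02])

# Run k-means for each candidate k, then let the stopping rule choose
scores = {k: calinski_harabasz(values, kmeans_1d(values, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
```

For these hypothetical data the pseudo-F index is highest for three clusters. With real data the index may differ little between candidate values of *k*, so the caution about arbitrary judgments still applies.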

Although this approach can be used as a useful exploratory tool, it does involve arbitrary judgments. Therefore, a careful examination of the data and resulting cluster statistics is advised before drawing conclusions about what the optimal clusters are, if any. Testing for clusters and identifying clusters can be useful before using the data for different applications, such as health technology assessment and health care management processes. The statistical techniques one intends to use should take account of clustering, in order to ensure that inferences drawn from the results are not biased.

**Table 4.5** Identifying clusters in EQ-5D-5L data (English value set) in 2 patient groups (Feng et al. 2019)

A potential exploratory use of cluster analysis is to provide a means of identifying distinct pre- and post-treatment patient groups, and to use that information to predict which patients might benefit the most from the treatment and for which patients it is less successful.

#### **4.7 Regression Analysis**

Regression analysis is a commonly used statistical technique for analysing EQ-5D values, quantifying the influence on values of their underlying determinants, such as clinical and socioeconomic characteristics. Applying multivariate regression enables multivariate comparisons, similar to the analysis of EQ VAS scores, as described in Sect. 3.3. The main uses are within economic evaluation, where the interest is in the values generated by different health care interventions, and in mapping studies, where the interest is in the values attached to different health states.

**Table 4.6** Utility coefficients for parameters obtained using the EQ-5D-3L (UK value set)<sup>a</sup>

Key: EOL, end of life; PD, progressed disease; PR, partial response; SD, stable disease; TRAE, treatment related adverse events

<sup>a</sup>EQ-5D-3L data were transformed into a utility decrement using "decrement = 1 − utility". The decrements were used as dependent variables in the regression model with response status, hospitalisation, adverse events, new primary malignancy, whether a patient is within 3 months prior to death, treatment allocation and time as independent variables, with interactions between time and response status

In Table 4.6 an example is shown of applying regression techniques for economic modelling for a treatment for relapsed or refractory multiple myeloma (NICE 2017). EQ-5D-3L data (UK value set) resulting from a randomized controlled trial were modelled by regression analysis for use in CUA. A repeated measurement mixed model was used to predict EQ-5D-3L values based on three types of response, whether a patient was ≤3 months prior to death, hospitalisations, (treatment related) adverse events, and new primary malignancies. The occurrence of adverse events and hospitalisation were included as covariates. The model used a log link and a Gamma distribution. The results from this regression showed that new primary malignancies and whether a patient is ≤3 months prior to death had the largest effects on utility. Variables associated with response status also had a significant impact. The coefficients associated with adverse events and hospitalisations were not significant. The utility coefficients can be used for the calculation of QALYs for inclusion in a CUA model.
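To show how coefficients from a model of this kind translate into utilities, the following sketch back-transforms a log-link linear predictor for the decrement. The coefficient names and values are hypothetical and are not those of Table 4.6. Note also that a Gamma model needs strictly positive decrements; how observations at full health were handled in the cited analysis is not stated here.

```python
import math

def predicted_utility(coefficients, covariates):
    """Back-transform a log-link linear predictor of the decrement into a utility."""
    eta = coefficients["intercept"] + sum(
        coefficients[name] * x for name, x in covariates.items()
    )
    decrement = math.exp(eta)   # log link: E[decrement] = exp(linear predictor)
    return 1.0 - decrement      # utility = 1 - decrement

# Hypothetical coefficients on the log-decrement scale (illustration only)
coefs = {
    "intercept": math.log(0.25),        # baseline decrement of 0.25
    "end_of_life": math.log(2.0),       # doubles the decrement
    "partial_response": math.log(0.8),  # reduces the decrement by 20%
}

baseline = predicted_utility(coefs, {"end_of_life": 0, "partial_response": 0})
end_of_life = predicted_utility(coefs, {"end_of_life": 1, "partial_response": 0})
```

Because of the log link, each coefficient acts multiplicatively on the decrement rather than additively on the utility, which matters when combining several covariates in a CUA model.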

As we have seen in Sect. 4.4, EQ-5D data are characterized by their censored nature, with bounds at full health and the worst health state. Moreover, for many country-specific EQ-5D value sets, there is a gap between full health and the second best health state. For EQ-5D-3L, clusters are often present, whereas for EQ-5D-5L data this occurs only for certain country-specific value sets. Given this specific nature of EQ-5D values, many different regression techniques have been applied.

Ordinary Least Squares (OLS) regression is the most commonly used regression technique. As always, it is necessary to test for violations of its underlying assumptions, although it is robust to small violations, especially in large samples. These include the assumption that the residuals are normally distributed<sup>6</sup> and homoscedastic, violations of which affect statistical testing of regression coefficients, though not the estimates themselves. However, EQ-5D values data may be subject to clustering, which violates the assumption that all of the observations in the data are independent, and censoring, which could affect the consistency of the OLS estimator, generating estimates that may be biased.

<sup>6</sup>Note that the data themselves do not need to be normally distributed, due to the Central Limit Theorem. The distribution of the means of samples from non-normal distributions will still be approximately normal as long as the samples are large enough, large being roughly above 30 (Norman and Streiner 2000, p. 28).

Various statistical tests are available to verify which regression techniques are most suitable for a given dataset. Normality of residuals can be assessed by several formal tests, including skewness and kurtosis estimates, the Shapiro-Wilk test, or the Jarque-Bera test. When using EQ-5D data in regression analysis, it is recommended to test for heteroskedasticity, for which many formal tests are available, such as the Breusch-Pagan test or the White test. When comparing two or more groups one also has to take the possibility of unequal variances into account. Again, it is recommended to test for unequal variances, e.g. by using the F-test of equality of variances. Note that the assumption of homoscedasticity is related to the residuals and not the dependent and independent variables included in a regression itself. Graphical and numerical approaches as described in Sects. 4.4 and 4.6 can be applied to test for the presence of clusters. Based on these results, one can determine which regression technique is most suited for the analysis of interest.
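As an illustration of a formal test for heteroskedasticity, the sketch below computes the studentised (Koenker) form of the Breusch-Pagan statistic for a regression with a single predictor: the squared OLS residuals are regressed on the predictor, and n times the R² of that auxiliary regression is compared with a chi-square distribution with one degree of freedom. The data are hypothetical, constructed so that the spread of values narrows as EQ VAS increases. Statistical packages provide more general versions of this test.

```python
import math

def ols_simple(x, y):
    """Intercept and slope of a one-predictor least squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope

def breusch_pagan(x, y):
    """Studentised Breusch-Pagan LM statistic and p-value (one predictor, 1 df)."""
    a, b = ols_simple(x, y)
    sq_resid = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression of the squared residuals on the predictor
    a2, b2 = ols_simple(x, sq_resid)
    mean_sq = sum(sq_resid) / len(x)
    ss_total = sum((e - mean_sq) ** 2 for e in sq_resid)
    ss_resid = sum((e - a2 - b2 * xi) ** 2 for xi, e in zip(x, sq_resid))
    lm = max(len(x) * (1 - ss_resid / ss_total), 0.0)   # n * R^2
    p_value = math.erfc(math.sqrt(lm / 2))              # chi-square tail, 1 df
    return lm, p_value

# Hypothetical data: EQ VAS scores and EQ-5D values whose spread shrinks with VAS
vas = [20, 20, 40, 40, 60, 60, 80, 80, 100, 100]
value = [0.52, -0.12, 0.64, 0.16, 0.76, 0.44, 0.88, 0.72, 1.0, 1.0]

lm_stat, p = breusch_pagan(vas, value)
```

A small p-value, as obtained for these constructed data, suggests that the residual variance depends on the predictor.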

Many regression modelling techniques are available to deal with the typical nature of EQ-5D value data, such as Tobit, censored least absolute deviation (CLAD) or other median models, two-part models, latent-class models, and limited dependent variable mixture models<sup>7</sup> (Austin 2002; Fu and Kattan 2006; Huang et al. 2008; Pullenayegum et al. 2010; Hernández Alava et al. 2012). These different models aim to deal with various characteristics of the data. Tobit and CLAD models can take account of the censored nature of EQ-5D data. Two-part models specifically take the ceiling effect and the upper gap into account. Pullenayegum et al. (2010) suggested that Tobit and CLAD models might lead to biased results and propose OLS coupled with robust standard errors or the nonparametric bootstrap as a simpler and more valid approach which corrects for heteroskedasticity. Hernández Alava et al. (2012) demonstrated that an adjusted limited dependent variable approach combined with a mixture model can also account for the typical nature of EQ-5D-3L data (censored, large upper gap, and clustering). Figure 4.13 shows how the various models relate to different distributions, and we can indeed see that the adjusted limited dependent variable mixture model might be a good fit for various EQ-5D-3L distributions. For EQ-5D-5L, less complex models might suffice.
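The nonparametric bootstrap mentioned above can be sketched for a one-predictor OLS regression as follows. This is an illustration under stated assumptions, not the procedure of Pullenayegum et al.: the data are hypothetical, observations are resampled as (predictor, value) pairs, and a simple percentile interval is used.

```python
import random

def ols_slope(x, y):
    """Slope of a one-predictor least squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx

def bootstrap_slope_ci(x, y, reps=2000, alpha=0.05, seed=1):
    """Percentile confidence interval for the slope from resampled pairs."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]   # sample pairs with replacement
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:    # slope is undefined if all resampled x are equal
            continue
        slopes.append(ols_slope(xs, [y[i] for i in idx]))
    slopes.sort()
    lower = slopes[int((alpha / 2) * len(slopes))]
    upper = slopes[int((1 - alpha / 2) * len(slopes)) - 1]
    return lower, upper

# Hypothetical data with variance that shrinks as the predictor increases
vas = [20, 20, 40, 40, 60, 60, 80, 80, 100, 100]
value = [0.52, -0.12, 0.64, 0.16, 0.76, 0.44, 0.88, 0.72, 1.0, 1.0]

slope = ols_slope(vas, value)
ci_lower, ci_upper = bootstrap_slope_ci(vas, value)
```

Because the interval is built from the empirical distribution of resampled estimates, it does not rely on the residuals having constant variance.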

The mixture model approach applied by Hernández Alava et al. can also be used to identify latent classes, which bears a resemblance to identifying clusters as described in Sect. 4.6. A latent class model might be applied in regression to account for the different classes or clusters.

<sup>7</sup>Note that Hernández Alava et al. (2012) use a wider term (limited dependent variable) for EQ-5D data being censored at 1.

**Fig. 4.13** Illustrative histograms of possible model distributions (Hernández Alava, copyright Value in Health)

We end this section by providing an example. Sullivan and Ghushchyan (2006) introduced an innovative technique to develop a "catalogue" of EQ-5D-3L values by applying regression techniques to a large representative population survey database collected in the US. CLAD regression was used to estimate the marginal disutility of conditions classified by International Classification of Diseases codes (Ninth Revision), controlling for age, comorbidity, gender, race, ethnicity, income, and education. The resulting list of EQ-5D-3L values could serve as an "off-the-shelf" catalogue that might be used by analysts to estimate QALYs in CUA.

#### **4.8 Uncertainty and Sensitivity Analysis**

As mentioned in Sect. 4.2, since there may not be a single 'best' value set for any given application, it is recommended to perform sensitivity analysis using other suitable value sets in order to assess the impact of the choice of value set on results and conclusions. Parkin et al. (2010) showed that the choice of value set might determine whether an intervention is seen as effective or not. Since many countries do not have a value set of their own, the choice of value set as well as performing sensitivity analysis, is very important. The analyst conducting CUA should treat the values in an economic evaluation as uncertain parameters which, just like other non-stochastic uncertain variables such as the discount rate, should be subject to sensitivity analysis, in order to improve confidence in the obtained results.

It must be noted that the magnitude of differences between value sets, and their implications for estimates of QALYs, is not always obvious. As one value set might contain values that are systematically higher (or lower) than another for the health states relevant to a given therapy, these differences may even out in economic evaluation, which focuses on the incremental change in health resulting from that therapy.

Due to the preference structure of a certain country-specific value set, an intervention might be considered effective in one country and not effective in another country, based on identical EQ-5D data resulting from clinical trials. In applications where the societal perspective is not relevant, but the values are used as a convenient way of summarizing the EQ-5D descriptive data, one has to be even more careful, since the influence of any given country-specific value set might bias the results. Sensitivity analysis will also give the researcher a sense of how stable the results are and whether robust conclusions might be drawn.

#### **References**



## **Chapter 5 Advanced Topics**

This chapter provides guidance on a number of advanced topics, building on the content of earlier chapters. Our aims are


#### **5.1 Analysing Changes and Differences in EQ-5D Values and EQ VAS Scores**

In the analyses described in Chaps. 3 and 4, the objects of interest are EQ-5D values or EQ VAS scores measured at one or more points in time for one person or a group of people. These can be compared for the same person at different time points, which we will call 'changes', or between different people or groups of people, which we will call 'differences'. When the object of interest is the change or difference itself, analysts should be cautious in how they analyse it.

#### *5.1.1 Defining the Outcome of an Intervention Study as a Change*

In clinical studies of the impact of a health care intervention on health-related quality of life (HRQoL), it is possible to define the outcome in two different ways—the final state of health or the change from initial to final state. In many contexts, the magnitude as well as direction of the change is the object of interest, but there are some well-known issues about estimating the size of changes directly. These relate to all outcome measures, not just health status or HRQoL (for example Vickers and Altman 2001; Bland and Altman 2011), and to Patient Reported Outcome Measures (PROMs) other than EQ-5D instruments, but the characteristics of EQ-5D and EQ VAS values data mean that they are particularly vulnerable to misleading analysis through misspecification of the underlying analytical model.

The key issue is the relationship between the size of initial, or baseline, health state values and the size of the change in them. The most obvious null hypothesis is that baseline and final mean HRQoL scores are the same, equivalent to a mean change score of zero. However, for conditions where underlying health is deteriorating or the condition is self-limiting, this null hypothesis may not be the correct choice. The size of the change may also be related to the baseline in different ways, depending on both the condition and the treatment. For example, if the treatment leads to the same final health state for all patients, the change will be greater, the lower the initial health state; if the treatment is less successful for those with a poorer health state, the change will be greater, the higher the initial health state. Only if the change is constant whatever the initial health state will there be no such complicating relationship.

**Fig. 5.1** Stylised example of treatment effects

Figure 5.1 illustrates this point. The horizontal and vertical axes show baseline and final EQ-5D values respectively. The solid 45° line shows points where baseline and final are the same, which would be the null hypothesis for patients whose condition is neither improving nor deteriorating. The dotted line shows a different assumption: that the condition would result in a deterioration of the patients' health over time if untreated. The two solid lines above the 'No Change' line show different relationships between baseline and final scores for two different treatments. Patients A and B undergo a treatment for which the outcomes are better for patients whose baseline score is higher. Assuming the null hypothesis of no improvement or deterioration without treatment, the change after treatment for patient A, who has a lower baseline score than patient B, is smaller (ΔA) than that for patient B (ΔB). Patients C and D undergo a treatment which results in the same final score for all patients. Again, assuming that there would be no change without treatment, the change for patient C, who has a lower baseline score than patient D, is bigger (ΔC) than that for patient D (ΔD).
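The two stylised treatments can be expressed numerically. The sketch below assumes hypothetical functional forms and baseline values, chosen only to reproduce the pattern described for patients A to D. It is not derived from data.

```python
def change_same_final(baseline, final_value=0.9):
    """Treatment that brings every patient to the same final value."""
    return final_value - baseline

def change_rising_with_baseline(baseline, fixed_gain=0.05, slope=0.2):
    """Treatment whose benefit grows with the baseline value (hypothetical form)."""
    return fixed_gain + slope * baseline

low, high = 0.3, 0.6   # hypothetical baseline values (lower and higher patient)

# Patients A (lower baseline) and B (higher): outcomes better at higher baseline
delta_a = change_rising_with_baseline(low)
delta_b = change_rising_with_baseline(high)

# Patients C (lower baseline) and D (higher): same final score for everyone
delta_c = change_same_final(low)
delta_d = change_same_final(high)
```

With these assumptions the change is smaller for A than for B, but larger for C than for D, so a single summary of "the" change hides opposite relationships with the baseline.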

The special problem that this raises for both EQ-5D values and EQ VAS scores is that the existence of fixed end-points—0 and 100 for the EQ VAS; 1 and the value of the worst health state for EQ-5D values—places limits on the possible size of change. (The same is true for any outcome measure that has the same properties.) For EQ-5D values, there is an additional problem that the distribution of scores at both baseline and final assessment may not be smooth because of the discrete nature of the EQ-5D health states from which the scores are calculated. These two issues also mean that there may not be the necessary linear relationship between the baseline and final outcome scores that would permit calculation of a single change-based effect size.

The recommendations are therefore to specify carefully the counterfactual to the observed change or difference, or where possible to ensure that there are control groups from which this can be directly measured, and to ensure that appropriate methods are used to transform the distribution of EQ-5D values into a form amenable to statistical analysis.

#### *5.1.2 Minimal Important Differences (MIDs)*

The calculation of Minimal Important Differences (MIDs) for HRQoL or PRO measures, including the EQ-5D, is a topic on which there is currently no consensus, either on its usefulness or on the best methods for its estimation. Those who wish to use or estimate MIDs are therefore advised to consult two review articles, one on MIDs in general (King 2011) and the other specifically on the EQ-5D (Coretti et al. 2014). Here we summarise some of the issues.

The term MID is used here, but other terms are used which, as King points out, may differ slightly in their definitions and meaning, such as *minimal clinically important difference (MCID)*, *clinically important difference*, *minimally detectable difference*, *minimum detectable change*, and *subjectively significant difference*. The most widely quoted definition of the concept is of a MCID (Jaeschke et al. 1989), but an updated MID-specific version of this (Guyatt et al. 2002) is "the smallest difference in score in the domain of interest that patients perceive as important, either beneficial or harmful, and which would lead the clinician to consider a change in the patient's management". Coretti et al. make use of a different term, *the smallest worthwhile effect* (SWE), defined by Ferreira et al. (2012) as "the smallest beneficial effect justifying costs, risks and inconveniences of an intervention."

There are three key questions to address when deciding whether and how to use MIDs with an HRQoL or PRO measure such as the EQ-5D: What is the purpose of using a MID? What definition should be used for that purpose? How should the MID be estimated to meet that definition? Although in principle it would be possible to ask these questions about EQ-5D health states, in practice they have only been explored for EQ-5D values and, to a lesser extent, for EQ VAS scores, so this guide has the same limitation.

In answering these questions, it is essential to note that EQ-5D values have a feature that distinguishes them from some other measures. They already embody a measure of importance as perceived by a group of people, usually a general population, based on their preferences for different health states (see Chap. 4). The values are estimated from an underlying continuous value function at discrete points on the value scale identified by the EQ-5D health states. Any differences in the underlying values, however small, are therefore important in that they indicate a difference that would be preferred or non-preferred by the person affected, other things being equal. Similar arguments apply to the scores generated by the EQ VAS.

A wider definition of importance, such as whether a change is worthwhile given the perceived importance to patients and resource costs of making the change and the duration for which the change is experienced, requires information that is not contained within the EQ-5D values or EQ VAS scores themselves. This suggests that there is no conceptual basis for a MID for EQ-5D values or EQ VAS scores in terms of *desirability*; however, it may be possible to base a MID on whether in practice differences and changes in the EQ-5D values or EQ VAS score are *detectable*. As King points out, this concept of 'minimally detectable' differences or changes has two separate bases. One is psychometric, and concerns whether a difference is capable of being perceived by people. The other is statistical, concerning measurement error, the precision with which perceived differences are recorded.

#### **Using EQ-5D MIDs**

#### *Using EQ-5D MIDs for decision making with individual patients*

As noted, the basis for an EQ-5D MID to judge the importance, in terms of desirability, of differences between or changes in health states is weak. A further problem for using this with individual patients is that they may not share the preferences of the average patient or member of the general population about their health. With respect to detectability, the ability to observe changes or differences in EQ-5D values is entirely based on detection of changes to the EQ-5D health states, and the calculation of a summary index in the form of an EQ-5D value may obscure rather than illuminate the nature of the change.

#### *Using EQ-5D MIDs for decision making about populations*

Again, it is not possible to judge, in terms of desirability, whether an observed difference or change in EQ-5D values or EQ VAS scores is important without further information. With respect to detectability, there is also a problem arising from how observations for individual people are aggregated to give a population score, exacerbated in the case of the EQ-5D values by their discrete nature. A population average MID will depend both on the size of changes to each individual person's health state and the number of people experiencing different levels of change. As an extreme example, if all but one member of a group recorded a change of EQ-5D values at the MID value and the exception scored below that, the population would be judged as having a difference below the MID. Comparing the mean to the MID would give a misleading account of the clinical importance of the overall observed differences.
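The extreme example can be reproduced with a few hypothetical numbers. The MID of 0.07 used below is an arbitrary illustrative figure, not a recommended estimate.

```python
def summarise_against_mid(changes, mid):
    """Mean change and the share of individuals whose change reaches the MID."""
    mean_change = sum(changes) / len(changes)
    share_reaching_mid = sum(c >= mid for c in changes) / len(changes)
    return mean_change, share_reaching_mid

mid = 0.07                     # hypothetical MID
changes = [0.07] * 9 + [0.0]   # nine people change by the MID, one does not change

mean_change, share = summarise_against_mid(changes, mid)
```

Here the mean change falls below the MID even though nine out of ten people changed by exactly the MID, which is why reporting the distribution of individual changes alongside the mean is more informative.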

#### *Using EQ-5D MIDs for clinical research*

A proposed use of MIDs is in determining the most efficient sample size for a clinical trial, based on the desired probabilities of avoiding type 1 and type 2 errors. The aim is to ensure that trials are not over-powered, and so do not generate statistically significant differences that have no clinical significance. A trial powered to detect differences at the level of the MID would be the correct approach for a trial for which HRQoL was the primary endpoint and was the sole determinant of clinical decision making. However, the MID is less useful for trials that have a different primary endpoint or where clinical decision making is not independent of factors other than a difference in HRQoL. In addition, it is again necessary to take account of the distribution of observed differences in EQ-5D values, as using an individually-based MID may be misleading about the total benefit over all patients.
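As a sketch of how a MID enters a sample size calculation, the following uses the standard normal-approximation formula for comparing two group means, n = 2(z<sub>1−α/2</sub> + z<sub>1−β</sub>)²σ²/δ² per group, with δ set to the MID. The MID and standard deviation are hypothetical, and the formula assumes equal group sizes, a common standard deviation and approximately normal means, which may not hold for EQ-5D values.

```python
import math
from statistics import NormalDist

def n_per_group(mid, sd, alpha=0.05, power=0.80):
    """Per-group sample size to detect a mean difference equal to the MID."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided type 1 error
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - type 2 error
    return math.ceil(2 * ((z_alpha + z_beta) * sd / mid) ** 2)

# Hypothetical inputs: MID of 0.07 and between-person SD of 0.25
required = n_per_group(mid=0.07, sd=0.25)
```

Halving the MID roughly quadruples the required sample size, so the choice of MID estimate has a large effect on trial design.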

#### **MID estimation methods**

A common finding of the different methods described below is that there is no identifiable single MID for EQ-5D values or EQ VAS scores. Instead, estimates differ by population, patient group, clinical context and sociodemographic factors; and might vary depending on whether health is improving or worsening. It is possible to calculate a score which is an average over different patient populations, such as the widely-quoted estimate by Walters and Brazier (2005) for EQ-5D values (which is described below), but although this is an interesting statistic, the size of the variability between different estimates means that an average EQ-5D MID should not be used for any of the purposes described above.

#### *Patient rating of change*

The most common and direct method of meeting the aim of assessing patients' own perceptions of the importance of differences in their health is to ask them specifically about that, using a *global transition question*. This is a retrospective assessment by patients of the change in their health between two points, at each of which their current health has been assessed using the HRQoL or PRO instrument. For example, Walters and Brazier (2005) re-analysed 11 studies in different clinical areas that collected both EQ-5D and SF-36 data at different time points. They compared the differences between EQ-5D values with a question taken from the SF-36, asking patients whether their general health was much better, somewhat better, the same, somewhat worse or much worse, compared to the last time they were assessed. Those who answered somewhat better or somewhat worse were considered as having experienced a change equivalent to the MID.
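An anchor-based calculation of this kind can be sketched as the mean change in EQ-5D value within each answer category of the transition question. The records below are hypothetical, and reporting improvement and deterioration separately is a choice made for this sketch rather than a description of how Walters and Brazier pooled their data.

```python
from collections import defaultdict

def mean_change_by_transition(records):
    """Mean change in value (follow-up minus baseline) per transition answer."""
    grouped = defaultdict(list)
    for answer, baseline, follow_up in records:
        grouped[answer].append(follow_up - baseline)
    return {answer: sum(ch) / len(ch) for answer, ch in grouped.items()}

# Hypothetical records: (transition answer, baseline value, follow-up value)
records = [
    ("much better", 0.50, 0.85),
    ("somewhat better", 0.60, 0.70),
    ("somewhat better", 0.55, 0.60),
    ("same", 0.70, 0.70),
    ("somewhat worse", 0.80, 0.70),
    ("somewhat worse", 0.65, 0.61),
]

means = mean_change_by_transition(records)
mid_improvement = means["somewhat better"]
mid_deterioration = means["somewhat worse"]
```

Keeping the two directions apart makes it visible when the estimate for improvement differs from that for deterioration.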

This method relies on the global transition question identifying the minimum perception that patients can have, which is in reality determined by the fixed wording of the text of the permitted answers. For example, patients are likely to have different thresholds for deciding that they have any improvement or deterioration at all, and also different perceptions of the boundaries between 'somewhat' and 'much'. If these do not match the boundaries between the descriptions contained in the EQ-5D health states, then the calculated EQ-5D value changes for the 'somewhat' categories may not reflect the true size of the minimum differences that patients perceive. In addition, global transition questions are affected by the ability of patients to recall their previous health state accurately and may be more subject to acquiescence bias and response shift (Sprangers and Schwartz 1999; Kamper et al. 2009).

#### *Clinical anchors*

Another common method of defining a MID is to examine the scores of patients classified according to a different measure of their clinical status. The rationale is that for clinical decision making, clinicians may have more confidence in an HRQoL measure if it is related to more familiar, clinically-focussed and well-validated measures. For example, Pickard et al. (2007) calculated the mean EQ-5D values and EQ VAS scores for cancer patients in the different grades of two clinical measures, the Eastern Cooperative Oncology Group (ECOG) performance status and the Functional Assessment of Cancer Therapy General (FACT-G). The differences in mean scores between different grades, ordered according to severity of the condition, provide MID estimates as a range and average.

This method emphasises the clinical decision-making aspect of the definition of a MID rather than the idea that it should reflect patients' own perception of the importance of change. It therefore depends on an assumption that the clinical anchor measure correctly distinguishes between important and unimportant changes in health states.

#### *Distribution-based*

Some estimates of the MID are based on statistics that describe the distribution of health states in a patient population, in particular the standard error of measurement (SEM) and the effect size (ES). Pickard et al. (2007) also estimated MIDs for EQ-5D values and EQ VAS scores using both of these approaches, stratified again according to FACT-G and ECOG grades. The SEM is based on reliability of the HRQoL or PRO instrument, usually measured with respect to test-retest reliability, the distribution around a true score of repeated assessments assuming no memory effect or other contextual changes, which is regarded as a fixed psychometric property of the instrument. An alternative measure is reliability based on internal consistency measured by Cronbach's alpha, which is what Pickard et al. used because of the scarcity of test-retest information for the EQ-5D.

The ES is calculated as the mean difference in HRQoL divided by the between-person standard deviation. Pickard et al. based their MIDs on the criterion of one-half of the standard deviation (SD), although one-third and one-fifth SD are also commonly used (King 2011).
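Both distribution-based quantities are simple to compute once a standard deviation and a reliability coefficient are available. The sketch below uses the usual formula SEM = SD × √(1 − reliability) and the one-half SD criterion. The scores and the reliability of 0.75 are hypothetical.

```python
import math
from statistics import stdev

def sem_based_mid(scores, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return stdev(scores) * math.sqrt(1 - reliability)

def sd_fraction_mid(scores, fraction=0.5):
    """MID defined as a fixed fraction of the between-person SD."""
    return fraction * stdev(scores)

# Hypothetical baseline EQ-5D values for a patient group
scores = [0.52, 0.64, 0.16, 0.76, 0.44, 0.88, 0.72, 1.0, 0.69, 0.59]

sem_mid = sem_based_mid(scores, reliability=0.75)   # hypothetical reliability
half_sd_mid = sd_fraction_mid(scores)
third_sd_mid = sd_fraction_mid(scores, fraction=1 / 3)
```

With a reliability of 0.75 the SEM equals exactly one-half SD, so the two criteria coincide. For other reliability values they diverge.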

These methods again do not reflect patients' perceptions of importance, and unless they are stratified in the way used by Pickard et al. also do not reflect importance as defined by a clinician for use in decision-making.

#### *Instrument-defined*

Luo et al. (2010) and McClure et al. (2017) have calculated MIDs for the 3L and 5L respectively using a method that does not require empirical EQ-5D or other data. It is calculated, for a specified value set, as the smallest difference in the values of any pair of health states, over all possible pairs. It is therefore the smallest possible observed difference in values either for a person whose EQ-5D health state is captured at two different times or for two people at the same time.
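Taken literally, the description above amounts to enumerating every health state, valuing each with the chosen value set, and finding the smallest non-zero gap between any two values. The sketch below does this for a hypothetical additive 3L-style value function. The decrements are invented for illustration and do not correspond to any published value set, and ignoring pairs with identical values is an assumption. The published methods may differ in detail.

```python
from itertools import product

# Hypothetical value function, in thousandths to avoid rounding error
CONSTANT = 80            # applied once if any dimension is above level 1
DECREMENTS = [
    (0, 70, 300),        # mobility: levels 1, 2, 3
    (0, 100, 210),       # self-care
    (0, 40, 90),         # usual activities
    (0, 120, 380),       # pain/discomfort
    (0, 75, 240),        # anxiety/depression
]

def value_thousandths(state):
    """Value of a health state such as (1, 2, 1, 3, 1), in thousandths."""
    total = sum(DECREMENTS[dim][level - 1] for dim, level in enumerate(state))
    any_problem = any(level > 1 for level in state)
    return 1000 - total - (CONSTANT if any_problem else 0)

states = list(product((1, 2, 3), repeat=5))   # all 243 3L health states
unique_values = sorted({value_thousandths(s) for s in states})

# Smallest non-zero difference over all pairs equals the smallest adjacent gap
smallest_gap = min(b - a for a, b in zip(unique_values, unique_values[1:]))
instrument_defined_mid = smallest_gap / 1000
```

Because the result depends only on the value set, repeating the calculation with a different value set gives a different figure, which is what makes it useful for comparing value sets.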

This highlights an important property of a value set, and is useful in examining the comparative performance of different value sets. However, it does not match with the usual definitions of a MID and it is not obvious how it might be used for any of the purposes described above. The differences between the values of different health states are entirely determined by perceived differences in the descriptions that the health states are given. This MID therefore does not reflect the smallest score that people find important, but the smallest difference between the health state descriptions, which is fixed by the descriptive system itself, not by the people who value them. As importantly, it is based on an assessment of health state differences for individual people, and in a group or population context, it is highly vulnerable to the problem outlined above of the mean giving a misleading account of overall clinical importance.

The overall recommendation for MIDs is that the purpose of using a MID in a particular context should be carefully considered, that a precise definition for the MID is derived from that purpose, and that the methods used to estimate that MID fit with the definition adopted.

#### *5.1.3 Case-Mix and Risk Adjustment of EQ-5D Data*

Although we refer here to case-mix adjustment, the principles also apply to the related concept of risk adjustment. Adjusting HRQoL or PRO scores for the differing characteristics of patients and external factors is often essential in making comparisons of outcomes. For example, when comparing the average observed EQ-5D value or EQ VAS score changes after treatment for patients in different hospitals, it is important to account for factors that affect outcomes but are not due to variations in the quality of care. One such factor may be the average age of patients treated, which may differ between different providers and affect the outcomes that can be achieved. To obtain a fair comparison of the outcomes of different hospitals, they should be adjusted to take account of the mix of cases that the hospitals see.

There are many different methods for calculating case-mix adjustments, including stratification and direct and indirect standardisation. Stratification refers to calculation of outcomes for subgroups of a population defined according to key characteristics that might affect outcomes, such as age, sex and ethnicity. Direct standardisation calculates outcomes for different units, such as hospitals, adjusted by comparison of the levels of the case-mix variables to those in a known reference population. Indirect standardisation uses, instead of a known reference population, the average level of the variables for the units as a whole. Here we give an example of how the United Kingdom's National Health Service (NHS) adjusts for case-mix in its PROMs programme (Nuttall et al. 2015; Department of Health 2012; NHS England Analytical Team 2013) using the indirect standardisation method.

The NHS case-mix adjustment method has two stages. The average impact of case-mix variables on EQ-5D values or EQ VAS scores is calculated over all patients using regression analysis. The regression coefficients are used to estimate, for each health care provider, the average EQ-5D value or EQ VAS score that would be expected for its mix of those variables. From this, the difference between expected and actual outcomes is calculated for each provider.

This is regarded as a measure of a provider's performance, but the 'expected' outcome is for a hypothetical provider that has the same case-mix, and does not compare the provider with other real providers. Each provider's outcomes are therefore transformed so that they can be compared to a standard case-mix, which is the mean level of the case-mix variables over all providers. This also generates the all-providers average EQ-5D value or EQ VAS score, by definition.
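The first stage can be sketched with a single case-mix variable, the pre-treatment score Q1. This is a simplified illustration under stated assumptions, not the NHS model, which uses many more variables: the patient records are invented, and `fit_line` and `provider_performance` are hypothetical helper names.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical patient records: (provider, Q1 before treatment, Q2 after).
patients = [
    ("A", 0.30, 0.72), ("A", 0.40, 0.80), ("A", 0.35, 0.74),
    ("B", 0.60, 0.78), ("B", 0.70, 0.84), ("B", 0.65, 0.83),
]

# Stage 1: average impact of the case-mix variable over all patients.
intercept, slope = fit_line([p[1] for p in patients],
                            [p[2] for p in patients])


def provider_performance(provider):
    """Actual minus expected mean Q2 for one provider's own case-mix."""
    rows = [p for p in patients if p[0] == provider]
    actual = mean(r[2] for r in rows)
    expected = mean(intercept + slope * r[1] for r in rows)
    return actual - expected


for name in ("A", "B"):
    print(name, round(provider_performance(name), 4))
```

A positive figure means the provider's patients did better on average than the regression predicts for patients with the same pre-treatment scores.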

Figure 5.2 illustrates this, using a very simple case-mix adjustment to the post-treatment EQ-5D value or EQ VAS score (Q2), taking account of the pre-treatment value of the score (Q1). An observation on the Q1 = Q2 line would mean that there had been no change in the average EQ-5D value or EQ VAS score. The hypothetical regression line lies above this, meaning that at all levels of Q1 there is on average an improvement following surgery. Q2 is higher with higher Q1, but the size of the improvement (the difference between Q2 and Q1) is smaller with higher Q1.

For provider A, its average post-surgery EQ-5D value or EQ VAS score is Q2a, so that the change in the EQ-5D value or EQ VAS score unadjusted for case-mix is ΔQ = Q2a − Q1a. Its expected EQ-5D value or EQ VAS score is Q2b and it therefore has performed better than would be expected for a provider that had the same case-mix.

Performance can be quantified as Q2a − Q2b; if this is positive, the provider achieves on average results greater than those predicted; negative if worse than predicted; and zero if as predicted. This difference is applied to the all-provider Q2 EQ-5D value or EQ VAS score, which is Q2d, to give the estimated actual EQ-5D value or EQ VAS score for Provider A if it had the all-provider case-mix. This EQ-5D value or EQ VAS score, Q2c, is calculated so that Q2c − Q2d = Q2a − Q2b, which means Q2c = Q2d + (Q2a − Q2b). The relevant Q1 comparator for this is the all-provider Q1 EQ-5D value or EQ VAS score, so the case-mix adjusted change in the EQ-5D value or EQ VAS score for Provider A is ΔQ' = Q2c − Q1.
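The arithmetic of this second stage is simple enough to check by hand. The sketch below follows the identities in the text; the numerical inputs are invented and the function name is hypothetical.

```python
def case_mix_adjusted(q2a, q2b, q2d, q1_all):
    """Apply the adjustment described in the text.

    q2a    -- provider's actual mean post-treatment score
    q2b    -- provider's expected mean post-treatment score for its case-mix
    q2d    -- all-provider mean post-treatment score
    q1_all -- all-provider mean pre-treatment score
    Returns (performance, adjusted score Q2c, adjusted change).
    """
    performance = q2a - q2b          # Q2a - Q2b
    q2c = q2d + performance          # Q2c = Q2d + (Q2a - Q2b)
    adjusted_change = q2c - q1_all   # case-mix adjusted change
    return performance, q2c, adjusted_change


# Hypothetical scores for Provider A and for all providers together.
q1a, q2a, q2b = 0.35, 0.75, 0.70
q1_all, q2d = 0.50, 0.80

unadjusted_change = q2a - q1a
performance, q2c, adjusted_change = case_mix_adjusted(q2a, q2b, q2d, q1_all)

print(f"unadjusted change = {unadjusted_change:.2f}")
print(f"performance = {performance:.2f}, Q2c = {q2c:.2f}, "
      f"adjusted change = {adjusted_change:.2f}")
```

In this invented example the provider treats patients with lower pre-treatment scores than average, so its unadjusted gain overstates its adjusted gain even though it performs better than expected.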

**Fig. 5.2** Stylised example of case-mix adjustment. This figure is taken from Chapter 3 of Appleby et al. (2015)

Amongst the problems with this method are those outlined in Sect. 5.1.1 concerning the assumed counterfactual to the observed changes and the effect of fixed end-points and discrete EQ-5D values on the distribution of changes and their relationship with the pre-surgery EQ-5D values or EQ VAS scores.

Case-mix adjustments can change the estimated outcomes for different units such that a very different assessment is made of their relative performance. For example, Appleby et al. (2015) showed that using a case-mix adjustment for changes in EQ-5D values in the English NHS PROMs programme reduced the range of average hospital scores and the size of their variability around the national average. More importantly, it produced a different performance ranking of hospitals in terms of health gain, as individual hospitals' adjusted and unadjusted gains differed considerably in many cases.

#### **5.2 Mapping**

In this context, mapping refers to methods that are used to convert the responses of one HRQoL or PRO measure to those of a different measure. The most usual application for the EQ-5D is based on an interpretation of EQ-5D values as numbers representing the values that people attach to health states. These have cardinal measurement properties, so they can be used to calculate Quality Adjusted Life Years, which can in turn be used as the denominator in an Incremental Cost Effectiveness Ratio. Mapping is used to convert data from a measure that does not have these properties, such as a condition-specific instrument, to EQ-5D values. This takes the form of an algorithm that is applied to the source measure and generates EQ-5D values. Mapping could also be used simply to translate the responses given in another HRQoL measure into EQ-5D health states.

Mapping is also used to convert between the values of the 3L and 5L versions. However, we will not discuss the methods used for this as they concern valuation of health states, which is outside the scope of this review. At the time of writing, there are no definitive guidelines for those who wish to convert 3L to 5L or vice versa, and there is continuing debate about the best methods. Those wishing to make use of such mapping are advised to consult the most up-to-date literature; current key references include van Hout et al. (2012), Hernandez-Alava et al. (2017) and Dakin et al. (2018).

There are useful statements of good practice in mapping to health state measures that have the value-based and cardinality properties described above from measures that do not (Wailoo et al. 2017), and for reporting those studies (Petrou et al. 2015). There is also an online database of existing mapping studies (Dakin 2013). Those who wish to undertake mapping or use existing mapping algorithms are advised to consult those papers, and here we simply summarise some of the issues. It should be emphasised that mapping is a second-best approach that produces only an approximation to true EQ-5D values. The availability of a mapping algorithm for a particular measure can never be a justification for failing to collect EQ-5D health state data as well as or instead of that measure.

The earliest studies that undertook mapping were often based on direct judgements by clinical experts, patients or researchers about the correspondence between the descriptive systems of the source measure and EQ-5D values. This is not now regarded as good practice. Acceptable mapping methods require data that have been collected from respondents who have completed both the source measure and the EQ-5D.

There is a broad division of mapping methods between those that map directly to EQ-5D values and those, known as *response mapping*, that map to EQ-5D health states, from which EQ-5D values are calculated using a value set. For the direct method, it is possible simply to assign EQ-5D values for the health state recorded by each respondent to the category or score that they report for the source measure, and calculate the mean over all respondents. However, this method is restrictive, because it only enables mapping for those health states present in the sample in large enough numbers. It is also known that other patient characteristics and health and treatment condition variables may impact on the mapping. As a result, it is regarded as best practice to use a regression-based method to ensure that the mapping algorithm is both more comprehensive and more precise.
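The contrast between the simple direct method and a regression-based one can be illustrated as follows. This is only a sketch: the paired observations are invented, the source measure is assumed to be a single ordered category score, and treating that score as a continuous regressor in a straight-line model is a simplifying assumption rather than recommended practice.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical paired data: (source measure category, EQ-5D value).
# No respondent reported category 3.
paired = [(1, 0.85), (1, 0.80), (2, 0.69), (2, 0.73), (4, 0.30)]


def mean_value_mapping(pairs):
    """Simple direct method: mean EQ-5D value per observed source category."""
    by_category = defaultdict(list)
    for category, value in pairs:
        by_category[category].append(value)
    return {category: mean(vals) for category, vals in by_category.items()}


def regression_mapping(pairs):
    """Least-squares line through the pairs, returned as a predictor."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in pairs)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda score: intercept + slope * score


lookup = mean_value_mapping(paired)
predict = regression_mapping(paired)

print("lookup has category 3:", 3 in lookup)
print("regression prediction for category 3:", round(predict(3), 3))
```

The lookup table cannot return a value for a category that no respondent reported, whereas the fitted model can, which is one sense in which a regression-based algorithm is more comprehensive.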

The response mapping method has the advantage, when *using* a mapping algorithm, that it produces an algorithm that generates EQ-5D health states, to which any value set can be applied, while the direct method is specific to a particular value set. The direct method has the advantage, when *generating* a mapping algorithm, that in estimating the relationship between the source measure and the EQ-5D, the response or dependent variable (EQ-5D values) can be treated as a continuous variable. The response mapping method is based on categories (EQ-5D health states) that do not even have ordinal properties. This is a problem because it potentially requires a data set large enough to contain a meaningfully large number of observations for each of the 243 (3L) or 3125 (5L) health states. However, in practice this problem is dealt with by assuming that the level recorded in each dimension is independent of the level recorded in other dimensions. This permits estimation of five separate ordered dependent variables, which is statistically much more amenable to analysis.
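One way to see how the independence assumption is used when applying a response mapping algorithm is sketched below. The predicted level probabilities stand in for the output of five separately estimated ordered models and are invented, as are the additive decrements, which are not a published value set.

```python
from itertools import product

LEVELS = (1, 2, 3)  # EQ-5D-3L; five levels would give 5 ** 5 = 3125 states

# Hypothetical additive decrements for levels 1..3 in each of five dimensions.
DECREMENTS = [
    (0.00, 0.05, 0.30),
    (0.00, 0.10, 0.20),
    (0.00, 0.04, 0.10),
    (0.00, 0.12, 0.40),
    (0.00, 0.07, 0.25),
]

# Hypothetical predicted probabilities of levels 1..3 for one respondent,
# one tuple per dimension, as five separate ordered models might produce.
LEVEL_PROBS = [
    (0.60, 0.30, 0.10),
    (0.80, 0.15, 0.05),
    (0.50, 0.40, 0.10),
    (0.30, 0.50, 0.20),
    (0.70, 0.20, 0.10),
]


def state_value(levels):
    """Value of one health state under the hypothetical additive value set."""
    return 1.0 - sum(DECREMENTS[d][lvl - 1] for d, lvl in enumerate(levels))


def expected_value(level_probs):
    """Expected EQ-5D value, treating the five dimensions as independent."""
    total = 0.0
    for levels in product(LEVELS, repeat=5):
        prob = 1.0
        for d, lvl in enumerate(levels):
            prob *= level_probs[d][lvl - 1]  # joint = product of marginals
        total += prob * state_value(levels)
    return total


n_states_3l = len(LEVELS) ** 5
expected = expected_value(LEVEL_PROBS)

print(n_states_3l, "health states; expected value =", round(expected, 3))
```

Because the predictions are probabilities over health states rather than values, a different value set can be substituted in `state_value` without re-estimating the mapping.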

#### **References**



#### **Glossary of EQ-5D Terms**

In this section, we set out the terms used in this book to describe specific aspects of the EQ-5D instruments. It is important to use these terms consistently in papers and reports, as this ensures effective communication and avoids confusion between terms that have very different meanings. For example, the visual analogue scale element of the EQ-5D questionnaire, which is used to report a respondent's overall current health state, should not be confused with a visual analogue scale used as a stated preference method for valuing defined EQ-5D health states. The first is therefore called the EQ VAS and the second the EQ-5D VAS.


