Nancy Devlin Bram Roudijk Kristina Ludwig *Editors*

# Value Sets for EQ-5D-5L

A Compendium, Comparative Review & User Guide

*Editors* Nancy Devlin Centre for Health Policy University of Melbourne Melbourne, VIC, Australia

Bram Roudijk EuroQol Research Foundation Rotterdam, The Netherlands

Kristina Ludwig Department of Health Economics and Health Care Management, School of Public Health Bielefeld University Bielefeld, Germany

ISBN 978-3-030-89288-3 ISBN 978-3-030-89289-0 (eBook) https://doi.org/10.1007/978-3-030-89289-0

© The Editor(s) (if applicable) and The Author(s) 2022. This book is an open access publication.

**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cover illustration: agsandrew/shutterstock.com.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

# **Foreword by Michael Drummond**

Since its development by the EuroQol Group in 1990, the EQ-5D instrument has probably become the most widely used measure of health-related quality of life in economic evaluations of healthcare treatment and programmes. It is also frequently used in health technology assessments (HTAs), which are a key element in technology adoption decisions in several countries. Reasons for its success include it being generic, concise – and it being accompanied by the value sets that are required to support economic evaluation. This prominence of the EQ-5D means that it is important that the users of the instrument fully understand the attributes and limitations of these value sets. This is the aim of the book *Value Sets for EQ-5D-5L: A Compendium, Comparative Review & User Guide* by Devlin, Roudijk and Ludwig. The primary 'users' in this case are those using EQ-5D data in economic evaluation or other applications, and decision makers interpreting economic evaluations or HTAs as part of their decision-making processes.

The main impetus for the book is the development of the new '5L' version of the EQ-5D instrument. The original version had only three levels on each of the 5 dimensions of health-related quality of life, which in some dimensions resulted in a rather large or 'abrupt' change between levels. For example, in the dimension of 'mobility', a change from level two to level three is expressed as moving from 'some problems in walking about' to being 'confined to bed'. By moving from 3 to 5 levels on each of the dimensions, it is possible to characterise a more gradual change between health states, which is arguably more realistic.

The main implication of the decision to move from 3 to 5 levels was the need to develop a new series of value sets for the EQ-5D-5L. These represent the value of each of the 3125 health states defined by the instrument (i.e. individual combinations of the 5 levels and 5 dimensions) and are critical to the calculation of the quality-adjusted life years (QALYs) gained by healthcare treatments and programmes. This is no mean task, especially as there is no reason to believe that the preferences for health states of the general public would be the same in different jurisdictions. Indeed, more than 35 distinct value sets have been developed for the original 3L instrument, and decision makers in several jurisdictions require that the value set used in HTAs should be representative of the preferences of the population of the country concerned.

To undertake this task, the EuroQol Group embarked on an extensive programme of methodological and empirical research which is reported in the book. Rather than just replicate the approaches used in the development of value sets for the original instrument, the group took the opportunity to develop and strengthen the methods of valuing health states, with a view to specifying a common protocol. (This programme of research is discussed in Chaps. 2 and 3 of the book.)

Users of the new instrument will probably be most interested in the value sets themselves. These are described and classified in the country-by-country overview given in Chap. 4. Looking at the classification given, it is hard to think of any critical information that is not given about the precise methods used to generate the value set for each country, the sample of individuals whose preferences were measured, and the main results obtained. I find this chapter useful both in providing information about each country, and for understanding the results for each country in the overall context of all the research conducted.

The comparisons between the various country value sets and issues of which value set to use are discussed in Chaps. 5 and 6. These choices can be quite complicated, especially during the period while value sets for the new instrument are still being developed. For example, if a value set is not currently available for EQ-5D-5L in my country, should I use a value set from a country I perceive to be similar to my own, or an earlier value set for my country from the 3L instrument, using the mapping/crosswalk algorithm that has been developed? What should I do if no value set for either instrument exists for my country?

The final chapter discusses the future for value sets. My take from it is that there will be continuing debate about the different results produced by different versions of the EQ-5D, what the change from 3 to 5 levels does to the sensitivity of the instrument, and the conditions under which it makes sense to develop a new value set for a particular jurisdiction. Being based in 'God's Own Country' (Yorkshire), I often wonder whether we should develop our own value set or continue to use those available for the UK! Some of these big questions will probably never be answered. But in the meantime, I have reached a simple conclusion. If you are a decision maker or researcher using the EQ-5D-5L, you should read this book.

Professor of Health Economics Michael Drummond University of York, York, UK

# **Foreword by Kim Rand**

The EuroQol Group started small, and made a concise instrument. After a short detour down the track towards a 6-dimension questionnaire, the EQ-5D was born in a form we easily recognize today, covering "no", "some", and "extreme" problems in mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. A visual analogue scale reminiscent of a thermometer completed the questionnaire, which took all of two pages. That's it.

Today, the EQ-5D is the most used instrument worldwide for measuring quality-adjusted life years (QALYs) for health-economic analyses, even though such use constitutes merely a fraction of publications using the instrument.

The brevity and simplicity of the EQ-5D is also its primary strength, and driver behind its apparent success. It is also the instrument's greatest weakness, and has spurred debates that continue unabated today: which dimensions of health should be included, how should they be described, and what levels of functioning should be provided?

The first large-scale time trade-off (TTO) based national value set for the EQ-5D, from the seminal UK Measuring and Valuing Health (MVH) study, was published in 1997, initiating a flurry of research and costly valuation studies for the 3-level version of the EQ-5D (EQ-5D-3L). In just a few years, the number of studies had risen to the point where potential users could easily lose track as to which value sets were available, and for what jurisdictions. The question as to which value set to use became a real issue. The 2007 book *EQ-5D Value Sets: Inventory, Comparative Review and User Guide* by Szende, Oppe, and Devlin came to the rescue, describing 8 TTO-based and 9 VAS-based national EQ-5D-3L value sets. Around the same time, the development of the new and current 5-level version of the EQ-5D was underway; after years of intense debates, it was decided that three levels of problems were impractically large grained as a descriptive tool. A new, five-level tool was developed: the EQ-5D-5L.

While the EQ-5D-5L remains the briefest of instruments in its class, and still takes up just two pages, the additional levels provided great challenges in terms of how the instrument should be valued. Over the past decade, the primary focus of the EuroQol Group has been a massive effort around the methods, logistics, tools, and decisions necessary to produce high-quality preference-based values for the instrument. As a consequence, 25 national value sets have been produced, and more are underway. While this is great news from a scientific perspective, the co-existence of value sets for the 3L and 5L versions of the instrument has further complicated the question as to which value set to use.

This book, I am happy to say, does not provide a definite answer. It does, however, provide insightful discussions around the topic, as well as simple and useful guidance as to how users of the EQ-5D worldwide can identify the value set best suited for their purposes.

The EQ-5D and associated value sets are powerful tools to guide decision-making in the health sector and beyond. However, value sets are derived in a particular setting, conventionally designed to reflect the health preferences of the adult general population in a particular country. For use in public decision making, where legitimacy and transparency are increasingly required, the decision as to what constitutes the best reference value set is best placed at a national jurisdictional level. For any reader of this book who happens to be a decision maker in a country contemplating the use of the EQ-5D, I hope that this book will provide inspiration and insight to help develop national guidelines. We would, of course, be happy to be of assistance in this process.

For non-economic use, and for research involving international comparison, different reasoning may come into play. Regardless of the purpose for which value sets are used, this book, which will be updated with new value sets in the future, provides an excellent source of structured information on available EQ-5D-5L value sets, including methods and other details around the studies from which they derive.

While I appreciate having such books in hardback format, to be retrieved from my bookshelf at need, I am delighted and proud that this book is also available online and for free as an open-source e-book. In the spirit of transparency, *Value Sets for EQ-5D-5L: A Compendium, Comparative Review & User Guide* by Devlin, Roudijk, and Ludwig presents to the world what the EuroQol Group has produced over the last decade, and highlights the important challenges we are currently facing. This book is not intended to stop the debates around the EQ-5D. Rather, it provides an overview of where we currently stand in terms of value sets, which should be of interest to end users, researchers, and decision makers. In the meantime, the EuroQol Group is funding an unprecedented volume of research in a wide range of areas, including a strong push forward for the youth version of the instrument. New books will be needed in just a few years.

In the meantime, enjoy!

Chair of the EuroQol Executive Committee, Kim Rand EuroQol Research Foundation, Rotterdam, The Netherlands Co-founder and Principal, Maths in Health B.V., Rotterdam, The Netherlands Senior Researcher, Health Services Research Unit, Akershus University Hospital, Lørenskog, Norway

# **Preface**

The aim of this book is to collate information about and provide guidance on the use of EQ-5D-5L value sets, providing an easy-to-use resource for users of the EQ-5D-5L instrument. By creating this compendium of value sets, our hope is not just to increase the accessibility of this material, but also to encourage users to be aware of how these value sets were created, the characteristics of value sets, their differences and similarities, and the implications for their use in analysing EQ-5D-5L data.

The availability of value sets is one of the reasons the EQ-5D instruments are widely used in economic evaluations in healthcare and in population studies around the world. However, these value sets are generally published in peer-reviewed scientific journals largely aimed at other researchers. The information presented about these studies differs somewhat between journals, authors, and studies. Furthermore, not all value sets are published Open Access, making it more difficult for some (especially non-academic) users to access the value sets. This creates a place for a book pulling together all the relevant information into a single source and similar format, allowing for an easier comparison.

Our focus in this book is on value sets for the EQ-5D-5L. The development of a standardised international protocol for conducting such studies, and the related training and quality control processes that nowadays accompany it, now constitutes a 'mature technology' that has been successfully employed to generate a considerable number of value sets since the first wave of studies commenced in 2012. The timing therefore seems appropriate to produce this compendium of these value sets. In doing so, we build on the precedent set by Szende, Oppe, and Devlin's 2007 book on EQ-5D-3L value sets, which continues to be widely used and cited as a resource on value sets for that instrument.

We are grateful to the EuroQol Research Foundation for their support of our work and in particular for their commitment to Open Access publication of this book. We would also like to express our gratitude to the principal investigators of the value set studies reported in this book for their support and input, and to all those who contributed to the authorship of chapters. This was a team effort and a credit to the collegiality and common purpose of the EuroQol Group in promoting the measurement and valuation of health.

We hope you find this book useful!


# **Acknowledgements**

We acknowledge with gratitude the EQ-5D-5L value set principal investigators, and their research teams in each country, for giving us permission to include their studies in this book, allowing us access to their valuation data, and for their input to and clarifications on the summaries presented in Chap. 4. Without their efforts in conducting these studies, this book would not have been possible. We are also grateful to Kim Rand, chair of the Executive Committee, and to Elly Stolk, scientific team leader, for their careful reviews of the chapters and thoughtful comments and suggestions, and to Rosalind Rabin and KMHO for helpful edits. Lastly, we would like to thank Gerben Bakker for his support with legal and copyright matters.

Funding for this book was provided by the EuroQol Research Foundation EQ169-2020RA. Views expressed in this book are not necessarily those of the EuroQol Group.


# **About the Editors**

**Nancy Devlin** is Professor of Health Economics at the University of Melbourne, Australia, and senior visiting fellow at the Office of Health Economics, London. She was 2019–2020 president of ISPOR and a past president of the EuroQol Group. Her principal areas of research are the measurement and valuation of health-related quality of life, priority setting in healthcare, and the cost effectiveness thresholds used in making judgements about value for money in healthcare. Her previous books include *Methods for Analysing and Reporting EQ-5D Data* (with Bas Janssen and David Parkin), *Using Patient Reported Outcomes to Improve Health Care* (with John Appleby and David Parkin), and *Economic Analysis in Health Care* (with Stephen Morris, David Parkin, and Anne Spencer). She was also a co-author (with Agota Szende and Mark Oppe) of the book on EQ-5D-3L values, *EQ-5D Value Sets: Inventory, Comparative Review and User Guide.* Nancy contributed to methodological work underpinning the EQ-VT protocol and to the development of EQ-5D-5L value sets in Mexico, Ireland, and England.

**Bram Roudijk** works as a scientist at the business office of the EuroQol Research Foundation in Rotterdam, the Netherlands. His main research interests are valuation of health-related quality of life, modelling health-related quality of life valuation data, and differences in preferences for health-related quality of life between countries and cultures. Bram is involved in supporting valuation studies for the EQ-5D family of instruments, including the EQ-5D-5L. Furthermore, Bram is a member of EuroQol's Valuation Working Group.

**Kristina Ludwig** works as a senior scientist in the Department of Health Economics and Health Care Management at Bielefeld University, Germany, and as a freelance senior scientist for the business office of the EuroQol Research Foundation in Rotterdam, the Netherlands. Her main research area is the measurement and valuation of health-related quality of life in the context of economic evaluation and measuring patient preferences. During her position at the EuroQol office, she provided support to various valuation studies for the EQ-5D-5L. She contributed methodological work for the refinement of the EQ-VT protocol and was significantly involved in the development of the value sets for the EQ-5D instruments, including EQ-5D-5L in Germany.

# **Contributors**

**Jan Busschbach** Section of Medical Psychology, Department of Psychiatry, Erasmus MC, Rotterdam, The Netherlands

**Nancy Devlin** Centre for Health Policy, University of Melbourne, Melbourne, VIC, Australia

**Aureliano Paolo Finch** EuroQol Research Foundation, Rotterdam, The Netherlands

**Bas Janssen** EuroQol Research Foundation, Rotterdam, The Netherlands Section of Medical Psychology, Department of Psychiatry, Erasmus MC, Rotterdam, The Netherlands

**Kristina Ludwig** Department of Health Economics and Health Care Management, School of Public Health, Bielefeld University, Bielefeld, Germany EuroQol Research Foundation, Rotterdam, The Netherlands

**Richard Norman** School of Population Health, Curtin University, Bentley, Australia

**Jan Abel Olsen** Department of Community Medicine, University of Tromsø – The Arctic University of Norway, Tromsø, Norway Division of Health Services, Norwegian Institute of Public Health, Oslo, Norway

**Mark Oppe** Maths in Health B.V., Rotterdam, The Netherlands

**David Parkin** Office of Health Economics, University of London, London, UK

**Simon Pickard** Department of Pharmacy Systems, Outcomes and Policy, College of Pharmacy, University of Illinois at Chicago, Chicago, IL, USA

**Juan Manuel Ramos-Goñi** Maths in Health B.V., Rotterdam, The Netherlands

**Bram Roudijk** EuroQol Research Foundation, Rotterdam, The Netherlands

**Elly Stolk** EuroQol Research Foundation, Rotterdam, The Netherlands

**Ben van Hout** School of Health and Related Research, University of Sheffield, Sheffield, UK Pharmerit International, York, UK

**Zhihao Yang** Health Services Management Department, Guizhou Medical University, Gui'an, People's Republic of China

# **Chapter 1 The Development of the EQ-5D-5L and its Value Sets**

**Nancy Devlin, Simon Pickard, and Jan Busschbach**

**Abstract** This chapter introduces the EQ-5D-5L questionnaire and its development by the EuroQol Group. The availability of the EQ-5D-5L, and the growing evidence of its pivotal role as a measurement system, generated a demand for 'values' to accompany it that would enable the use of EQ-5D-5L data in the estimation of quality-adjusted life-years (QALYs) and other applications where EQ-5D-5L profile data needs to be summarised by a single number. Chapter 1 sets out the main aim of the book: to provide an accessible source of information and guidance to support users of EQ-5D-5L and its value sets. Specifically, the book aims to improve users' understanding of how EQ-5D-5L value sets are generated using the internationally standardised EQ-VT protocol; to raise awareness of the characteristics and properties of value sets; and to inform users' choice of which value set to select for which purpose, and how that choice may affect analysis. The chapter concludes with an overview of the content of the book.

# **1.1 The EQ-5D as an Instrument for Measuring and Valuing Health**

Since the 1990s, the EQ-5D instrument has held a pivotal role in the measurement of self-reported health status and health-related quality of life (HRQoL) (Devlin and Brooks 2017). The availability of a *concise* generic instrument for measuring patients' and population self-reported health<sup>1</sup> meant that it could be included, with minimal responder burden, in clinical trials, observational studies, and population health surveys. More recently, it has become the cornerstone of routine outcomes measurement in health care systems such as the English NHS PROMs programme and Sweden's national quality registers. The ability of the EQ-5D to measure HRQoL in a *generic* manner has the important advantage of yielding data that can readily be compared across disease areas and between patient and population subgroups, and against population norms. This broad comparability of EQ-5D data is particularly crucial in providing evidence that quantifies health benefits in a standardised and transparent manner to inform decisions regarding alternative ways of using health care resources.

J. Busschbach Section of Medical Psychology, Department of Psychiatry, Erasmus MC, Rotterdam, The Netherlands

N. Devlin (\*)

Centre for Health Policy, University of Melbourne, Melbourne, VIC, Australia e-mail: nancy.devlin@unimelb.edu.au

S. Pickard

Department of Pharmacy Systems, Outcomes and Policy, College of Pharmacy, University of Illinois at Chicago, Chicago, IL, USA

The EQ-5D was developed by the EuroQol Group, then a small group of academics which has now grown into an international network of multidisciplinary researchers with more than 100 members worldwide (Devlin and Brooks 2017). The development of the EQ-5D was motivated in part by the specific goal of providing evidence on the outcomes of health care programmes in a manner that would facilitate economic evaluation. One of the considerations underpinning the development of the instrument was that it would be accompanied by the 'values' (sometimes also referred to as 'utilities', 'quality of life weights', the 'EQ-5D Index' or 'EQ Index weights') that would enable the quality adjustment of life years as required for the estimation of quality-adjusted life-years (QALYs) used in cost effectiveness analysis (Drummond et al. 2015). The availability of value sets for this purpose has been a notable part of the success and uptake of EQ-5D instruments.
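To illustrate what 'quality adjustment of life years' means in practice, the sketch below multiplies each period of time by the value of the health state experienced during it and sums the results. It is a minimal sketch, assuming the value stays constant within each period (real analyses often interpolate between measurement points); the function name `qalys` and the example numbers are hypothetical.

```python
from typing import Sequence, Tuple


def qalys(periods: Sequence[Tuple[float, float]]) -> float:
    """Sum of value x duration over consecutive periods.

    Each period is (value, years), where value is an EQ-5D value on the
    scale anchored at 1 (full health) and 0 (dead). Assumes the value is
    constant within each period (a step-function health profile).
    """
    return sum(value * years for value, years in periods)


# Hypothetical patient: 2 years in a state valued 0.75, then 4 years at 0.5
print(qalys([(0.75, 2.0), (0.5, 4.0)]))  # 3.5
```

Six life years are thus 'quality adjusted' down to 3.5 QALYs; the same arithmetic applied to two treatment arms gives the QALY gain used in cost effectiveness analysis.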

The value sets that accompany EQ-5D instruments provide a means of summarising, via a single number, how good or bad health status is as described by the EQ-5D. The responses to the EQ-5D instrument – that is, the particular combination of levels indicated on each of the five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) by those completing it – can be described as EQ-5D 'profiles' (see Box 1.1). The original version of the EQ-5D, the EQ-5D-3L, has three response levels for each of the five dimensions, describing a total of 3<sup>5</sup> = 243 possible profiles (Brooks 1996). The focus of this book is on the later five-level version, the EQ-5D-5L, which describes a total of 5<sup>5</sup> = 3125 profiles (Herdman et al. 2011); its development is described in more detail in the following section.
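The profile counts follow from choosing one of *L* levels on each of five dimensions independently, giving *L*<sup>5</sup> combinations. A minimal sketch that enumerates them, assuming the common convention of writing a profile as a five-digit code with one digit per dimension in the order listed above (the function name `all_profiles` is hypothetical):

```python
from itertools import product

# Dimension order used for the five-digit profile codes
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")


def all_profiles(levels: int) -> list[str]:
    """Return every EQ-5D profile as a five-digit code, e.g. '11111'.

    Each digit is the level (1..levels) reported on one dimension,
    in the order of DIMENSIONS.
    """
    return ["".join(map(str, combo))
            for combo in product(range(1, levels + 1), repeat=len(DIMENSIONS))]


profiles_3l = all_profiles(3)
profiles_5l = all_profiles(5)
print(len(profiles_3l), len(profiles_5l))  # 243 3125
print(profiles_5l[0], profiles_5l[-1])     # 11111 55555
```

A value set is then simply one number attached to each of these codes, which is why a 5L value set must cover 3125 entries rather than 243.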

The value sets for these instruments provide a single value for each of the possible profiles described by them. These values lie on a scale anchored at 1 (full health) and 0 (dead), as is required for the estimation of QALYs. The values are built up from a set of sub-weights which represent the relative importance of each level of problem in each dimension, and indicate how good or bad these are overall, when combined in EQ-5D-5L profiles. The term *value set* refers to a set of values for all possible profiles defined by a particular EQ-5D instrument, and is occasionally also referred to by other names, such as an EQ-5D 'tariff' or 'social values'. For the purposes of this book, we will use the terms value and value set.

<sup>1</sup>There is ongoing debate about the term health-related quality of life (HRQoL) and whether EQ-5D measures HRQoL or self-perceived health status (Brazier and Karimi 2016). For simplicity, in the remainder of this book, we refer to the EQ-5D-5L value sets as providing values for health states or health as described by the EQ-5D-5L.

#### **Box 1.1: EQ-5D Questionnaires, EQ-5D Profiles and Values**

EQ-5D questionnaires comprise two key parts: the descriptive system, on which respondents indicate their level of problems on each of the five dimensions, and the EQ visual analogue scale (EQ VAS).

#### **The EQ-5D-5L questionnaire**

© EuroQol Research Foundation. EQ-5D™ is a trademark of the EuroQol Research Foundation. Reproduced by permission of EuroQol Research Foundation. Reproduction of this version is not allowed. For reproduction, use or modification of the EQ-5D (any version), please register your study by using the online EQ registration page: www.euroqol.org.

The EQ VAS is an important part of the questionnaire and provides the patients' overall assessment of their own health on a visual analogue scale. However, many applications of EQ-5D data, including the estimation of QALYs for economic evaluation, focus instead on the use of EQ-5D profile data. The profile data, and the use of value sets to summarise those data, is the focus of this book, and therefore the EQ VAS is not discussed further. There are in fact many ways of analysing EQ-5D profile data, as detailed in Devlin et al. (2020). One of these ways is by weighting the profile using value sets. This is the most common way of using EQ-5D data in cost effectiveness analysis. This book focuses on the value sets available for the EQ-5D-5L and their use in weighting EQ-5D-5L profile data.

The value sets provide a way of converting the profiles into a single number that reflects how good or bad people think they are. The values are usually obtained using stated preference methods, and yield values that lie on a scale anchored by the value of 1 for full health, and 0 for dead. EQ-5D values cannot be higher than 1, but values <0 are possible, and indicate health states considered on average to be worse than dead (WTD). Value sets are generally intended to represent the average preferences of local/national populations – so EQ-5D value sets differ between countries. See Chap. 4 for a summary of the available value sets for EQ-5D-5L, and Chap. 6 for information about the differences and similarities between them.
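The idea of building values up from sub-weights can be sketched with the simplest additive form: start at 1 (full health) and subtract a decrement for the level reported on each dimension. This is a minimal sketch under that assumption only; published value sets may use other model specifications, and every decrement below is a made-up illustrative number, not taken from any published value set. The dimension abbreviations and function names are likewise hypothetical shorthand.

```python
# HYPOTHETICAL decrements (NOT from any published value set): the amount
# subtracted from 1 (full health) for levels 1..5 of each dimension.
# Level 1 ("no problems") carries a decrement of 0, so '11111' scores 1.
HYPOTHETICAL_DECREMENTS = {
    "MO": (0.0, 0.05, 0.10, 0.20, 0.30),  # mobility
    "SC": (0.0, 0.04, 0.08, 0.16, 0.22),  # self-care
    "UA": (0.0, 0.04, 0.08, 0.15, 0.20),  # usual activities
    "PD": (0.0, 0.06, 0.12, 0.30, 0.40),  # pain/discomfort
    "AD": (0.0, 0.05, 0.10, 0.25, 0.35),  # anxiety/depression
}
ORDER = ("MO", "SC", "UA", "PD", "AD")


def value(profile: str) -> float:
    """Value of a five-digit EQ-5D-5L profile under an additive model."""
    if len(profile) != 5 or any(c not in "12345" for c in profile):
        raise ValueError(f"not a valid EQ-5D-5L profile: {profile!r}")
    total = sum(HYPOTHETICAL_DECREMENTS[dim][int(level) - 1]
                for dim, level in zip(ORDER, profile))
    # Decrements are non-negative, so the value can never exceed 1
    return 1.0 - total


def worse_than_dead(profile: str) -> bool:
    """True when the profile's value falls below 0 (dead)."""
    return value(profile) < 0


print(value("11111"))            # 1.0
print(round(value("21325"), 2))  # 0.46
print(worse_than_dead("55555"))  # True
```

Because level 1 on every dimension contributes nothing, the best state scores exactly 1, while sufficiently severe combinations fall below 0, mirroring the properties described in Box 1.1.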

These values are usually based on the average preferences of the relevant adult general population, obtained using stated preference methods such as the Time Trade-Off (TTO). These stated preference methods aim to elicit values which have the desired properties for estimating QALYs (see Box 5.1 in Chap. 5). Indeed, the availability of EQ-5D values which are suitable for this purpose has led to the EQ-5D being the most widely recommended questionnaire for use in the cost effectiveness evidence submitted to Health Technology Assessment (HTA) bodies. The EQ-5D is recommended in 85% of HTA guidelines (Kennedy-Martin et al. 2020), including those of the UK's National Institute for Health and Care Excellence (NICE 2013).

The EuroQol Group was and continues to be a pioneer in the development of local/national value sets. The development of EQ-5D-3L value sets was, as an international research effort, unparalleled in the availability of country-specific values (Szende et al. 2007). There are currently EQ-5D-3L value sets available for 35 countries and, for the EQ-5D-5L, 25 countries, with still further value set studies underway or planned. Both the EQ-5D-3L and the EQ-5D-5L, and the value sets which accompany them, continue to be used alongside each other in many countries. The value sets facilitated the use of data from EQ-5D instruments in the estimation of QALYs based on local preferences, as well as in other, 'non-economic' applications where EQ-5D profile data are summarised in a way that reflects the relative importance of the different dimensions.

# **1.2 The Development of the EQ-5D-5L**

In 2005 the EuroQol Group initiated efforts to develop an expanded-level version of the EQ-5D-3L. This was motivated by concerns among some stakeholders about limitations of the original instrument, particularly ceiling effects and changes in health that were too small to be detected by the three-level version. Studies which had been undertaken by EuroQol Group members prior to 2005 had shown that various experimental five-level versions of EQ-5D could reduce ceiling effects while at the same time increasing reliability and sensitivity (discriminatory power) and maintaining feasibility (Janssen et al. 2008a, b; Pickard et al. 2007a, b).

The development and testing of the EQ-5D-5L is reported in Herdman et al. (2011). A decision was made early in the new instrument's development to retain the same five dimensions as the EQ-5D-3L, but to expand the number of response levels. This could in principle have been achieved simply by adding two 'unlabelled' intermediate levels between the existing three. However, in order to arrive at values for the EQ-5D-5L profiles, each health state to be evaluated by respondents needed to be capable of being described by five sentences. This in turn required a label for each level. An example of an EQ-5D-5L health state, displayed in the manner it might be presented in a stated preference task, is shown in Fig. 1.1. The state described in Fig. 1.1 is the same combination of levels and dimensions as the example in Box 1.1, i.e., it is EQ-5D-5L profile 21325.

Herdman et al. (2011) describe the process by which these labels were established, using both English and Spanish as root languages in order to support further translation and adaptation of the new instrument. Severity labels for 5 levels in each dimension were identified using response scaling. Selecting labels at approximately the 25th, 50th, and 75th centiles produced two alternative 5-level versions. Focus groups were used to investigate the face and content validity of the two versions, including hypothetical health states generated from those versions. This showed evidence in favour of the wording 'slight-moderate-severe' problems, with level one described as 'no problems' in each dimension, and level five being 'unable to' in the EQ-5D functional dimensions (mobility, self-care, usual activities) and 'extreme problems' in the pain/discomfort and anxiety/depression dimensions.
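The labelling scheme just described is what allows any five-digit profile code, such as 21325, to be turned back into one statement per dimension. The sketch below does this using only the generic level wording given in the paragraph above; it is an illustrative paraphrase, NOT the official EQ-5D-5L questionnaire wording, and the function names `level_label` and `describe` are hypothetical.

```python
# Generic level wording paraphrased from the text (NOT the official
# questionnaire wording): levels 1-4 are "no", "slight", "moderate" and
# "severe" problems; level 5 is "unable to" for the functional dimensions
# and "extreme problems" for pain/discomfort and anxiety/depression.
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")
FUNCTIONAL = {"mobility", "self-care", "usual activities"}
COMMON_LABELS = ("no problems", "slight problems",
                 "moderate problems", "severe problems")


def level_label(dimension: str, level: int) -> str:
    """Generic severity label for one level of one dimension."""
    if not 1 <= level <= 5:
        raise ValueError(f"level must be 1..5, got {level}")
    if level < 5:
        return COMMON_LABELS[level - 1]
    return "unable to" if dimension in FUNCTIONAL else "extreme problems"


def describe(profile: str) -> list[str]:
    """One line per dimension for a five-digit EQ-5D-5L profile code."""
    if len(profile) != len(DIMENSIONS) or not profile.isdigit():
        raise ValueError(f"not a five-digit profile: {profile!r}")
    return [f"{dim}: {level_label(dim, int(lv))}"
            for dim, lv in zip(DIMENSIONS, profile)]


for line in describe("21325"):
    print(line)
```

For profile 21325 this prints slight problems with mobility, no problems with self-care, moderate problems with usual activities, slight pain/discomfort problems and extreme anxiety/depression problems, i.e. the five-sentence structure shown in Fig. 1.1, though not its exact wording.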


**Fig. 1.1** An example of an EQ-5D-5L health 'state' described by five sentences

The final version of the five-level instrument that emerged from this work is described in the EQ-5D-5L User Guide (EuroQol Group 2019). Besides the increased number of levels, the five-level version of the EQ-5D has other notable features that represent improvements on the EQ-5D-3L. Most importantly, the wording of the mobility dimension is improved. The most severe level of the mobility dimension of the EQ-5D-3L is 'confined to bed', which means that it cannot capture severe problems with mobility that do not involve being confined to bed. This limits its usefulness both in detecting problems with mobility and in capturing improvements in mobility resulting from treatment (Oppe et al. 2011). In the EQ-5D-5L, the most severe level of mobility has been changed to 'unable to walk about'.

These improvements have yielded a number of advantages for the EQ-5D-5L over the EQ-5D-3L. These are summarised by Devlin et al. (2018) and include:


Overall, the evidence suggests that the EQ-5D-5L retains the principal benefits of EQ-5D-3L (its brevity and validity in a wide range of conditions) and produces a more accurate measurement of patient health than the EQ-5D-3L (Devlin et al. 2018). These advantages have been recognised by users, and use of the EQ-5D-5L has increased rapidly. There are now more than 130 language versions of the EQ-5D-5L available.

# **1.3 The need for EQ-5D-5L Values**

The availability of the EQ-5D-5L, and the supporting evidence of its improved measurement system, generated a demand for values to accompany it. Such values allow its data to be used in the estimation of QALYs and in any other application where EQ-5D-5L profile data need to be summarised by a preference-weighted single number.

In anticipation of the need to provide EQ-5D-5L value sets, the EuroQol Group initiated an ambitious programme of methodological research. This ran in parallel with the development of the EQ-5D-5L instrument and aimed at producing an internationally standardised, state-of-the-art valuation protocol. This was timely, as most of the EQ-5D-3L value sets were based on the so-called 'MVH protocol' developed in the early 1990s (Dolan 1997). There was a lack of consistency in the design and implementation of that protocol across value set studies. Furthermore, limitations of the MVH protocol had been recognised, suggesting that improved methods were required for valuation of the EQ-5D-5L.

The aim was therefore not just to improve on the instrument, but also to ensure that the valuation of EQ-5D-5L profiles would be based on the best possible stated preference methods. A further aim was to provide a well-described, standard valuation study protocol that could be fielded in a consistent way across different countries. This would ensure that the value sets generated for the new instrument would, as far as possible, be comparable across countries. That is, observed differences between EQ-5D-5L value sets would reflect the local variations in preferences and opinions that they are intended to capture, rather than being confounded by differences in methods.

As it was anticipated that value sets would take several years to be developed and disseminated, an interim solution was adopted: map EQ-5D-5L data to the EQ-5D-3L by linking the descriptive systems, and use the value sets that already existed for the EQ-5D-3L (van Hout et al. 2011). Further explanation of mapping is provided in Chap. 5. While this provided a practical stop-gap means of summarising EQ-5D-5L data, these mapped values were recognised to be suitable only as a temporary solution. Such indirect methods introduce additional error variance and would still rely upon old and non-standardised value sets. Further, one might question whether value sets for the EQ-5D-3L, developed in the 1990s, would adequately represent the average preferences of today's societies. There are numerous reasons to consider updating value sets, including:

- changes in the underlying preferences of populations;
- improvements in the methods available to value health;
- changes in the distribution of population demographics; and
- concerns about potential bias in previous studies (Pickard 2015).

These issues are discussed further in Chap. 7.

In order to arrive at an improved and standardised valuation protocol, the EuroQol Group therefore commissioned a substantial programme of research to develop and test methods suitable for creating new value sets for the EQ-5D-5L. This programme was initiated while the descriptive system was still under development, and it is detailed in Chap. 2. Its intention was to provide investigators around the world with the tools to conduct a valuation study that would follow a standardised protocol and produce high-quality data based on validated methods that support comparisons between countries. These efforts culminated in an international protocol for conducting EQ-5D-5L valuation studies, which has been used to produce the 25 country-specific value sets summarised in Chap. 4 of this book (Fig. 1.2).

This endeavour is unique in scale and ambition in the field of HRQoL valuation and represents a significant body of work with direct relevance to decision makers and impact on health care policy internationally.

# **1.4 The Aims of this Book**

The book draws together and summarises, for the first time, the body of evidence on EQ-5D-5L value sets that has been produced internationally from the EuroQol Group's programme of research and protocol development.

The primary aim of the book is to provide an accessible source of information and guidance to support users of EQ-5D-5L and its value sets. Specifically, we aim to improve users' understanding of how value sets are generated; raise awareness of the characteristics and properties of value sets; and inform users' choice of which value set to select for a particular application, and how that choice may affect their analysis and conclusions. The book will also be useful to health economics and outcomes researchers specialising in HRQoL who want information on the research practices and protocols developed by the EuroQol Group to support EQ-5D-5L valuation.

We begin in Chap. 2 by detailing the process of developing the research protocol underpinning EQ-5D-5L valuation studies. This process included:

- a methodological programme of work and international pilot testing;
- development of a protocol;
- the first wave of studies and the conclusions drawn from those early studies;
- modification and strengthening of the protocol and quality assurance processes; and
- use of the revised protocol in subsequent waves of value set studies.

The chapter shows the considerable learning and progress that was made through this journey of designing and refining the protocol.

Chapter 3 sets out the various aspects of the study design and the basis on which methodological choices were made with respect to the stated preference methods to use, the sub-set of states to value using these methods, and the minimum sample size needed.

Chapter 4 provides a reference source and 'thumbnail overview' of the characteristics of the value set in each of 25 countries. In each case, we provide:

- a summary of the value set itself and its characteristics;
- a worked example of the calculation of values from it;
- information on the sample from which values were obtained;
- the methods used in analysing the data and modelling the value set; and
- the uptake by local HTA bodies and other health care decision makers.

Chapter 5 provides guidance to those who have collected EQ-5D-5L data and want to know how to choose between the value sets reported in Chap. 4. This includes consideration of the purpose of using value sets, and how to proceed when there is no value set for a specific country of interest or when there is more than one value set for a given country. It also covers when it is appropriate to use mapping to obtain EQ-5D-5L values.

Chapter 6 draws together the value sets summarised in Chap. 4, and compares and contrasts their characteristics, reporting original comparative analysis undertaken specifically for this book. To what extent are there similarities between EQ-5D-5L value sets across countries, and are there important differences between them? Our intention in Chap. 6 is to encourage users to be aware of the specific properties of the value sets they select to use.

We conclude in Chap. 7 by reflecting on the value sets produced to date and considering a number of questions about future directions for this body of work. For example, what is the 'shelf-life' of a value set, and what factors should prompt an update to ensure that value sets remain an adequate representation of the average preferences of a society? What methodological questions remain, and how are improvements or variations in methods reconciled with the need for consistency in the evidence presented to HTA bodies and other users?

The book includes a glossary of terms for those unfamiliar with the EQ-5D and the valuation of the EQ-5D-5L.

The stated vision of the EuroQol Group is the aim "to improve decisions about health and health care throughout the world by developing, promoting and supporting the use of instruments with the widest possible applicability for the measurement and valuation of health" (EuroQol Group 2021). We hope this book contributes to that aim, and that it supports your use of EQ-5D-5L to provide evidence for better health care decision making.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 2 The Development and Strengthening of Methods for Valuing EQ-5D-5L – An Overview**

**Elly Stolk, Juan Manuel Ramos-Goñi, Kristina Ludwig, Mark Oppe, and Richard Norman**

**Abstract** The introduction of the EQ-5D-5L offered an opportunity to develop a standardised valuation protocol, the EQ-VT protocol, with improved methods for health state valuation that enable comparison of the resulting value sets between countries. This chapter summarises the process of developing and strengthening the methods for valuing EQ-5D-5L in the EQ-VT protocol, which underpins the valuation studies reported in this book. This includes an overview of the methodological research programme that informed the initial EQ-VT protocol. It also describes the key elements of the protocol and the included valuation techniques, i.e. composite time trade-off and discrete choice experiments. The chapter then discusses:

- the first wave of EQ-5D-5L valuation studies that used the protocol, and the resulting conclusions;
- the subsequent modification and strengthening of the EQ-VT protocol, including a quality control procedure; and
- experience with use of the improved EQ-VT protocol in subsequent waves of EQ-5D-5L valuation studies.

The chapter concludes with an overview of the lessons learned during this journey of evidence-based refinement of the EQ-VT protocol from version 1.0 to the current version 2.1.

E. Stolk

EuroQol Research Foundation, Rotterdam, The Netherlands

J. M. Ramos-Goñi · M. Oppe Maths In Health B.V., Rotterdam, The Netherlands

K. Ludwig (\*)

EuroQol Research Foundation, Rotterdam, The Netherlands e-mail: kristina.ludwig@uni-bielefeld.de

Department of Health Economics and Health Care Management, School of Public Health, Bielefeld University, Bielefeld, Germany

R. Norman School of Population Health, Curtin University, Bentley, Australia

# **2.1 Development of the EQ-VT Protocol**

Over the past 25 years, approaches taken to the valuation of EQ-5D-3L have not changed much from those used in Dolan (1997). Issues had been noted in regard to valuing the EQ-5D-3L. However, the desire to produce new EQ-5D-3L value sets using the same approaches as before lessened the impetus for change. The introduction of the EQ-5D-5L offered an opportunity to explore how methods for health state valuation could be improved to produce an updated valuation protocol (Oppe et al. 2014). To arrive at a protocol that could be supported broadly, the initial development, and later refinement, of that protocol coincided with an extensive programme of methodological research within the EuroQol Group. This chapter summarises the research that was undertaken, the results that underpinned the initial version of the EQ-5D-5L valuation protocol and later modifications, and the main lessons learned from the international EQ-5D-5L valuation work.

While the research programme had a broader scope, the focus was on two different methods to elicit preferences for health states: time trade-off (TTO) and discrete choice experiments (DCEs). TTO had emerged as the first method of choice in earlier valuation studies, and the introduction of the EQ-5D-5L did not change that. Yet concerns had been expressed about the extremely low values that could be produced for states worse than dead (WTD), which required arbitrary rescaling (Janssen et al. 2013). Refinement of the TTO method was therefore pursued within the research programme. Lead time TTO (LT-TTO) had been identified as a possible TTO approach that could mitigate issues with valuing states WTD (Robinson and Spencer 2006; Tilling et al. 2010; Devlin et al. 2011). The relative merits of that approach were therefore explored (Attema and Versteegh 2013; Devlin et al. 2013; Versteegh et al. 2013).

DCE was at that time recognised as a promising new method for health state valuation (e.g. Salomon 2003; McCabe et al. 2006), and it had become more widely used in other areas of health economics (Ryan 2004). DCE was therefore the second focus of the research programme and was studied both as a potential alternative to TTO and as a complement to it. DCE has the benefit of a generally simpler task than TTO: it requires simple choices rather than completion of an iterative process, with potentially significant benefits for data collection. Questions around how to collect and model DCE data were therefore also addressed.

The programme also aimed to replace the props used in TTO interviews (e.g. the TTO board) with computers and to develop a computerised TTO procedure. All tasks were therefore integrated into a digital aid (the EuroQol-Valuation Technology, EQ-VT), which was developed in conjunction with the protocol. As a result, the protocol is commonly referred to as the EQ-VT protocol.

We will not cover all findings of the research programme in this chapter. However, several findings require particular highlighting:

1. while the conducted research on LT-TTO produced ample proof of concept for the use of LT-TTO in health state valuation, values for states better than dead (BTD) seemed to be subject to a downward bias. Therefore, composite TTO (cTTO) was introduced (Janssen et al. 2013), which uses conventional TTO for the valuation of states BTD, and LT-TTO for states WTD;


Further results obtained in the methodological research programme have been documented in 19 journal articles. Oppe et al. (2014) described how those results supported the development of the EQ-VT protocol version 1.0.

# **2.2 Description of the EQ-VT Protocol**

# *2.2.1 Contents of the Protocol*

From its origins in 2012, some elements of the EQ-VT protocol have evolved but the overall structure has been retained, comprising the following six parts:


After a general welcome and an explanation of the purpose of the study, respondents report their own health using the EQ-5D-5L, including the EQ VAS. They also answer background questions regarding age, gender and experience with illness. The third section then introduces respondents to the cTTO valuation tasks (see Figs. 2.1a and 2.1b). The interviewer uses the example health state "being in a wheelchair" to explain how to interpret and carry out the cTTO tasks. Once task understanding is confirmed, respondents move on to value ten EQ-5D-5L health states and answer three debriefing questions regarding the cTTO tasks.

In the next part of the interview, the interviewer explains how to carry out the DCE. Respondents are asked to complete seven forced-choice paired comparisons of EQ-5D-5L health states without a "duration" attribute (see Fig. 2.2). The choice is therefore simply between two EQ-5D-5L health states, independent of time. Following this, respondents answer three debriefing questions regarding the DCE tasks. In the concluding part of the interview, respondents can leave feedback and are thanked for their participation.

**Fig. 2.1a** Presentation of the composite time trade-off used in the EQ-VT protocol: better than dead task. (© 2021 EuroQol Research Foundation. Reprinted with permission)

**Fig. 2.1b** Presentation of the composite time trade-off used in the EQ-VT protocol: worse than dead task. (© 2021 EuroQol Research Foundation. Reprinted with permission)

**Fig. 2.2** Presentation of a discrete choice experiment task used in the EQ-VT protocol. (© 2021 EuroQol Research Foundation. Reprinted with permission)

The cTTO approach begins with the 'conventional' TTO. The first question compares ten years in the health state being valued with ten years in full health (see Fig. 2.1a). The task only shifts to an LT-TTO when the respondent considers the health state to be WTD. In that case, the subsequent LT-TTO task involves a twenty-year time frame: ten years of lead time followed by ten years in the EQ-5D-5L health state being valued (see Fig. 2.1b). The resulting cTTO values range from −1 (trading all of the lead time) to 1 (trading no years in full health) in 0.05 increments. The exact iteration scheme is reported elsewhere (Oppe et al. 2016). The underlying experimental design, including the health state selection for both the cTTO and DCE tasks, is addressed in Chap. 3, together with other study design considerations such as sample size requirements.
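The arithmetic behind the cTTO value range can be sketched as follows. This is a simplified illustration and does not reproduce the EQ-VT iteration scheme, which is reported in Oppe et al. (2016). It assumes the respondent's indifference point t (years in full health) has already been found. It also assumes t lies between 0 and 10 in half-year steps, which is what a 0.05 increment implies on a ten-year frame. The function names are hypothetical.

```python
def ctto_value(t: float, worse_than_dead: bool) -> float:
    """Convert a cTTO indifference point into a health state value.

    t: years in full health at the point of indifference (0 to 10).
    worse_than_dead: True if the respondent moved to the lead-time (WTD) task.
    """
    if not 0 <= t <= 10:
        raise ValueError("t must lie between 0 and 10 years")
    if worse_than_dead:
        # Lead-time TTO: 10 years lead time + 10 years in the state vs t years
        # in full health, so 10 + 10x = t and x = (t - 10) / 10, in [-1, 0].
        value = (t - 10) / 10
    else:
        # Conventional TTO: 10 years in the state vs t years in full health,
        # so x = t / 10, in [0, 1].
        value = t / 10
    # Round to two decimals to strip floating-point noise.
    return round(value, 2)


def ctto_value_grid() -> list[float]:
    """All distinct values reachable when t moves in half-year steps."""
    half_years = [i / 2 for i in range(21)]  # 0.0, 0.5, ..., 10.0
    values = {ctto_value(t, False) for t in half_years}
    values |= {ctto_value(t, True) for t in half_years}
    return sorted(values)


if __name__ == "__main__":
    print(ctto_value(7.5, worse_than_dead=False))  # conventional TTO answer
    print(ctto_value(0, worse_than_dead=True))     # all lead time traded
    grid = ctto_value_grid()
    print(len(grid), grid[0], grid[-1])
```

Both sides of the scale share the value 0. Together they give 41 distinct values running from −1 to 1.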

To ensure that respondents can give valid and meaningful responses during the cTTO task, they first get the opportunity to experience the task by completing the wheelchair example. They are also made aware that they will be asked to evaluate a set of other health states in the same way. After that, still within the wheelchair example, they learn, amongst other things:

- how their responses will be interpreted;
- what the range of possible answers is;
- how the task proceeds in a slightly different way when a state WTD is encountered; and
- how they need to interpret the health states.

Delivering these instructions is challenging for the interviewer for three reasons: most of them are not prompted on screen, the task is difficult for some respondents, and the interview needs to be completed in a standardised and neutral way. Furthermore, a high level of task engagement is expected from the respondent. This depends on the level of engagement demonstrated by the interviewer and the quality of interactions with the respondent. Since the wheelchair example is the point in the interview where all of this needs to be discussed, this section is key to the successful implementation of the EQ-VT protocol, especially the cTTO part.

# *2.2.2 Why the cTTO Task was Adopted*

Concern with the way in which values for states WTD were produced in EQ-5D-3L value sets motivated much of the research carried out to develop the new protocol. A standard TTO task contrasts a ten-year life in a disease state with a shorter life of t years in full health. It is well known that such a task can only produce non-negative values. In this task the value x of the disease state is given by t/10 at the point where the respondent is indifferent between the options. Since a lifespan cannot be negative, t cannot be negative, and so only values of x in the [0, 1] range can be observed.

If respondents indicated that they preferred immediate death to living for ten years in the disease state, a modified task was offered. It invited them to compare immediate death with a health profile of t years in full health followed by 10−t years in ill health. Here the value x of the health state is given by −t/(10−t). The difference between t and 10 can be made infinitely small (e.g. counted in years, months, weeks, days or smaller units), so the magnitude of this negative ratio can become extremely large. To counteract the effect on mean values, an arbitrary transformation was applied to bound the negative values at −1. Various options to transform the data have been proposed. However, the choice between them remained arbitrary, yet it could affect the results substantially.
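The instability of the ratio −t/(10−t) is easy to see numerically. The sketch below evaluates it as t approaches 10 and then applies one illustrative bounding transformation, x/(1−x). This transformation maps any negative x into (−1, 0) and, for this task, reduces algebraically to −t/10. It is chosen only to show what bounding at −1 means and is not presented as the transformation used by any particular value set. The function names are hypothetical.

```python
def wtd_raw_value(t: float) -> float:
    """Value from the modified standard-TTO task for states worse than dead.

    The respondent is indifferent between immediate death and t years in full
    health followed by 10 - t years in the state: x = -t / (10 - t).
    """
    if not 0 <= t < 10:
        raise ValueError("t must satisfy 0 <= t < 10")
    return -t / (10 - t)


def bounded_value(x: float) -> float:
    """Illustrative monotonic rescaling of x <= 0 onto the interval (-1, 0]."""
    if x > 0:
        raise ValueError("only values for states worse than dead are rescaled")
    return x / (1 - x)


if __name__ == "__main__":
    one_day = 1 / 365
    for t in (5, 9, 9.9, 10 - one_day):
        raw = wtd_raw_value(t)
        print(f"t={t:.4f}  raw={raw:.1f}  bounded={bounded_value(raw):.4f}")
```

For t one day short of ten years, the raw value falls below −3,600, while the rescaled value stays above −1.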

In theory, LT-TTO offers a unified approach for the valuation of states BTD and WTD. As in standard TTO, respondents consider how good or bad it would be to spend ten years in a state of impaired health. However, the period of impaired health does not start now, but ten years from now, so that the total remaining lifespan is 20 years. This is compared with a life of t years in full health, and the duration t is varied between 0 and 20 to identify indifference. The value x of the disease state can be computed by solving 10+10x=t, which gives a positive value for all t>10 and a negative value for all t<10.<sup>1</sup> However, due to the presence of a bias, described below, the cTTO approach was preferred over LT-TTO.
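The lead-time formula can be written as a one-line function. The sketch below uses a hypothetical function name. It generalises the ten-plus-ten frame by solving lead + disease·x = t for x, and it reproduces the two worked examples in footnote 1. It also illustrates that, with the disease period held fixed, a longer lead time extends the range of negative values that can be observed.

```python
def lt_tto_value(t: float, lead_time: float = 10.0, disease_time: float = 10.0) -> float:
    """Solve lead_time + disease_time * x = t for the state value x.

    t: years in full health judged equivalent to the lead-time life, i.e.
       lead_time years in full health followed by disease_time years in the state.
    """
    total = lead_time + disease_time
    if not 0 <= t <= total:
        raise ValueError(f"t must lie between 0 and {total} years")
    return (t - lead_time) / disease_time


if __name__ == "__main__":
    print(lt_tto_value(16))  # positive: state better than dead
    print(lt_tto_value(6))   # negative: state worse than dead
    # Lowest observable value (t = 0) with a 10-year and a 20-year lead time.
    print(lt_tto_value(0), lt_tto_value(0, lead_time=20))
```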

Regarding LT-TTO, larger lead times *ceteris paribus* extend the range of negative values that can be observed. However, higher bounds on the maximum hypothetical lifespan and lower bounds on the size of the trade-off unit also need to be considered. Key findings were that values for states BTD seemed to be affected by a downward bias in the LT-TTO task, and that larger ratios of lead time to disease time amplified this problem. A possible explanation is that respondents considered what portion of their remaining years to trade off without recognising that trading into the lead time implied a WTD response. Therefore, the decision was made to use standard TTO in the valuation of states BTD and to adopt LT-TTO only for the valuation of states WTD. Consistent with previous valuation studies, the standard TTO was again specified with a ten-year time frame. In the LT-TTO frame, a lead time of ten years was offered, so that each year traded corresponds to the same change in value (0.1) on the BTD and WTD sides of the scale. The name 'composite' TTO was adopted for this protocol, which uses standard TTO for the valuation of states BTD and lead time TTO for states WTD.

<sup>1</sup>For example: if t=16, the formula 10+10x=t will read 10+10x=16, thus 10x=6, which solves to x=0.6. If t=6, the same formula reads 10+10x=6, thus 10x=−4, and x=−0.4.

# *2.2.3 Why the DCE Task was Adopted*

In most EQ-5D-3L valuation studies, respondents received multiple valuation tasks of increasing complexity. From the start, it was assumed that the EQ-VT protocol would also include at least two types of stated preference tasks. But which tasks?

The non-standardised protocols for EQ-5D-3L valuation (see Chap. 1) supplied researchers with rank, VAS (visual analogue scale) valuation and TTO responses. At the discretion of the study teams, the collected data were used in various ways. In the early years, both VAS- and TTO-based value sets were developed, while the ranking task was seen as a useful precursor. Gradually, however, views on these methods started to shift. TTO became the method of first choice and the use of VAS valuation started to decline. Alternatives to VAS valuation were therefore considered for the EQ-VT protocol. At the same time, the underused potential of rank data started to be recognised (e.g. Salomon 2003; Craig and Busschbach 2009; Craig et al. 2009). In the EQ-VT protocol, the ranking and VAS valuation tasks were eventually displaced by DCE, a method akin to a ranking task.

There were several reasons to choose DCE. One was the different nature of the instrument being valued, i.e., the EQ-5D-5L rather than the EQ-5D-3L. The subtler differences between levels, especially at the mild end of the descriptive system, meant that some people might not be willing to trade off any life years, whereas the DCE could still elicit preferences between mild states. Furthermore, DCEs were widely recognised as a promising new method for valuing health and had been shown to be feasible for EQ-5D (Stolk et al. 2010). Lastly, a DCE task can be set up in different ways. Depending on the chosen configuration, it can produce values either (a) on a latent scale or (b) directly anchored on the QALY (Quality Adjusted Life Year) scale, if either the attribute "duration" or the alternative "dead" is included in the DCE (Norman et al. 2016). In the latter case, DCEs yield values that can have the same cardinal measurement properties as TTO, but with a more straightforward and less costly data collection process. Anticipating future developments, it was also considered important to include DCE (instead of VAS or rank) at this stage, to familiarise more researchers with the DCE method and promote learning.

The DCE task included in the EQ-VT protocol was a basic one, comparing two EQ-5D-5L health states without reference to lifespan, i.e., the number of years lived in each state. The methodological research that guided this decision had suggested that this basic approach produces robust results. By contrast, the approaches that could produce values on the QALY scale initially suffered from unexplained high variability in the results, and researchers had different ideas about how to make these advanced tasks work. It therefore seemed unwise to push for a harmonised method when the protocol for EQ-5D-5L valuation was introduced. However, it was agreed to continue research on other DCE approaches and to see whether issues with those approaches could be resolved (for further discussion, see Chap. 7).

# *2.2.4 Value Set Generation*

Once data have been collected with the EQ-VT protocol, value set generation can be based either on hybrid models that draw on both types of data at the same time (i.e., cTTO and DCE), or on cTTO data only. The DCE data cannot be used independently as a basis for value set generation. DCE values are estimated on a latent scale and lack the interpretation of health state values that are anchored at 0 (dead) and 1 (full health). The option to generate a value set based on two types of data has the benefit of providing extra assurance that a value set can be constructed from the data collected in the valuation study.
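The idea of a hybrid model can be illustrated with a joint log-likelihood in which both data types share one set of parameters. The sketch below is a simplified illustration under stated assumptions, not the specification used by any valuation study in this book. It assumes:

- an additive main-effects model, in which a state's value is 1 minus one decrement for each dimension at levels 2 to 5;
- normally distributed errors for cTTO responses; and
- a logit choice model with a scale parameter theta for DCE pairs.

All names and parameter values are hypothetical. The DCE part depends only on theta times the difference between two state values. Multiplying all decrements by a constant and dividing theta by the same constant therefore leaves that part unchanged. This is why DCE data on their own yield values only on a latent scale; the cTTO part is what ties the decrements to the scale on which 0 is dead and 1 is full health.

```python
import math

# (dimension index 0-4, level 2-5) -> value decrement
Decrements = dict[tuple[int, int], float]


def state_value(profile: str, decrements: Decrements) -> float:
    """Additive model: 1 minus the decrement for each dimension above level 1."""
    value = 1.0
    for dim, digit in enumerate(profile):
        level = int(digit)
        if level > 1:
            value -= decrements[(dim, level)]
    return value


def hybrid_log_likelihood(
    decrements: Decrements,
    sigma: float,
    theta: float,
    ctto_obs: list[tuple[str, float]],
    dce_obs: list[tuple[str, str, bool]],
) -> float:
    """Joint log-likelihood of cTTO and DCE data sharing the same decrements.

    ctto_obs: (profile, observed cTTO value) pairs.
    dce_obs: (profile_a, profile_b, chose_a) triples.
    sigma: standard deviation of the normal cTTO error.
    theta: scale factor linking the value scale to the DCE latent scale.
    """
    log_lik = 0.0
    for profile, observed in ctto_obs:
        # Normal log-density of the observed cTTO value around the model value.
        z = (observed - state_value(profile, decrements)) / sigma
        log_lik += -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z
    for profile_a, profile_b, chose_a in dce_obs:
        diff = theta * (state_value(profile_a, decrements) - state_value(profile_b, decrements))
        p_a = 1.0 / (1.0 + math.exp(-diff))  # logit probability of choosing A
        log_lik += math.log(p_a if chose_a else 1.0 - p_a)
    return log_lik


if __name__ == "__main__":
    # Hypothetical decrements: 0.05 per level step in every dimension.
    decs = {(d, lvl): 0.05 * (lvl - 1) for d in range(5) for lvl in range(2, 6)}
    ctto = [("11112", 0.9), ("21325", 0.3)]
    dce = [("11112", "33333", True)]
    print(hybrid_log_likelihood(decs, sigma=0.3, theta=5.0, ctto_obs=ctto, dce_obs=dce))
```

In practice such a log-likelihood would be maximised over the decrements, sigma and theta with a numerical optimiser. Published models may add refinements, for example accounting for the lower bound of −1 on cTTO values, which this sketch omits.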

While cTTO and DCE results provide two measures of the same construct, preferences for health, perfect agreement of cTTO and DCE results is not to be expected due to the differences between the methods:


Choices between methods for value set generation must reflect judgments about the relative merits of each method, given theoretical considerations and/or the properties of the empirical data. If the two data sources agree, that could be an argument for including all data and so deriving a value set with greater precision. Conversely, if there are discrepancies, it might be questioned which is the "correct" one, and combining the two data sources might be considered problematic. However, this latter argument might be considered fallacious, because there is no gold standard against which the values derived from cTTO or DCE can be judged. Discrepancies can therefore also be regarded as providing complementary information.

As neither line of reasoning will be universally accepted, the EQ-VT protocol sets the frame for eliciting health state preferences. The local research team then decides how the value set is generated (e.g. the type of data included and the modelling approach).

# **2.3 How the EQ-VT Protocol Updates Evolved**

After the first wave of EQ-5D-5L valuation studies (Canada, China, England, Netherlands, and Spain) had been completed using the new EQ-VT protocol, it became apparent that there was scope to improve on the first version of the protocol, especially by strengthening its implementation. In some of those initial studies, issues with the cTTO data were observed, such as strong clustering effects, limited coverage of the value range, and a high number of inconsistencies.<sup>2</sup> The data issues seemed to reflect low levels of task engagement by respondents and/or interviewers, with detrimental effects on the quality of cTTO valuations. It was recognised that these issues clustered within interviewers and were not universally present. This led to the hypothesis that the data issues represented interviewer effects.

This hypothesis motivated the development and integration of a quality control (QC) procedure (Ramos-Goñi et al. 2017). The procedure allows the data to be monitored in real time, so that any issue can be detected and timely interventions made. Two further changes led to EQ-VT version 1.1 (see Table 2.1): the introduction of three practice cTTO tasks following the wheelchair example, and the inclusion of confirmatory pop-ups for each cTTO task to validate answers before they are stored.

In addition, a comprehensive EQ-VT research programme was launched to test a range of suggested strategies that could help to prevent the data quality issues and interviewer effects from occurring. Shah et al. (2015) described the studies (N=7) aimed at remedying cTTO data issues and improving EQ-VT. All studies were set up as experiments with at least two arms. This allowed results obtained from a modified version of the protocol (experimental arm) to be compared with EQ-VT version 1.1 (control arm). The battery of tests included:



**Table 2.1** Overview of EQ-VT elements by protocol versions

Note: The cross mark shows that an element was included in the protocol version

*cTTO* composite time trade-off, *DCE* discrete choice experiment, *EQ-VT* EuroQol-Valuation Technology, *QC* quality control

a Sometimes used as optional element

<sup>2</sup>There were spikes (i.e., clustering of values at −1, −0.5, 0, 0.5, 1), lower than expected values for mild health states (i.e., a big gap to 1) and a low number of negative values (i.e., few WTD values). Moreover, there was a high number of inconsistencies overall and with regard to the worst possible health state 55555 (i.e., valuing less severe health states lower than the value for 55555).
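The data issues listed in footnote 2 translate into simple descriptive checks. The sketch below uses hypothetical function names and computes three quantities:

- the share of cTTO values that fall on the spike values named there;
- the share of negative (WTD) values; and
- the states a respondent valued below 55555.

It only illustrates the issues described. It is not the EuroQol QC tool, whose flagging criteria are described in Box 2.1.

```python
SPIKE_VALUES = (-1.0, -0.5, 0.0, 0.5, 1.0)


def spike_share(values: list[float], tol: float = 1e-9) -> float:
    """Fraction of cTTO values that sit on one of the spike values."""
    if not values:
        return 0.0
    hits = sum(any(abs(v - s) <= tol for s in SPIKE_VALUES) for v in values)
    return hits / len(values)


def negative_share(values: list[float]) -> float:
    """Fraction of cTTO values below zero, i.e. states valued as worse than dead."""
    if not values:
        return 0.0
    return sum(v < 0 for v in values) / len(values)


def inconsistent_with_worst(responses: dict[str, float]) -> list[str]:
    """States that one respondent valued lower than 55555.

    Every other EQ-5D-5L state is at least as good as 55555 on every dimension,
    so a lower value for any of them is logically inconsistent.
    """
    worst = responses.get("55555")
    if worst is None:
        return []
    return sorted(s for s, v in responses.items() if s != "55555" and v < worst)


if __name__ == "__main__":
    one_respondent = {"11112": 0.9, "21325": -0.2, "33333": 0.5, "55555": -0.1}
    values = list(one_respondent.values())
    print(spike_share(values), negative_share(values), inconsistent_with_worst(one_respondent))
```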

The collected data provided strong support for integration of a QC module, as it improved data quality markedly. The data also supported implementation of the feedback module (see Fig. 2.3), since respondents frequently appreciated having the option to review and reconsider their own responses if needed. The other tested modifications did not produce clear benefits (Shah et al. 2015). Interviewer effects, clustering of cTTO values, and inconsistencies were strongly reduced in valuation studies that applied the updated EQ-VT protocol (Ramos-Goñi et al. 2017; Stolk et al. 2019). Guided by results obtained in this work, the EQ-VT received two updates, in 2013 (EQ-VT 1.1) and 2014 (EQ-VT 2.0). In 2017, a further update was implemented (EQ-VT 2.1), which altered the flow of the wheelchair example to include more prompts for interviewers (Stolk et al. 2019). Box 2.1 provides further details on the QC procedure, as implemented from protocol version 1.1 onwards.

**Fig. 2.3** Example of the feedback module used in the EQ-VT protocol since version 2.0. (© 2021 EuroQol Research Foundation. Reprinted with permission)

#### **Box 2.1: QC Procedure Since EQ-VT Protocol 1.1**

A QC procedure was introduced to monitor the interviewer's protocol compliance and interviewer effects as well as the face validity of the data. By looking at four QC criteria it is possible to determine whether an individual interview is of "suspect" quality. If any of the four following criteria is met for an individual interview, it is flagged:


Initial QC reports are used to evaluate whether interviewers meet the minimum quality requirements. If 40% or more of an interviewer's first ten interviews are flagged as being of *suspect* quality by the QC tool provided by the EuroQol Executive Office, all interviews conducted so far by that interviewer are removed and the interviewer is retrained. After a further ten interviews, the interviewer's performance and compliance are re-evaluated. If again 40% or more of the interviews are flagged, these interviews are also removed and the interviewer is removed from the interviewer team. A threshold of 40% was selected because flagged interviews could hold genuine responses (e.g., from respondents who quickly form their opinion and complete the cTTO tasks). The threshold also allows interviewers to grow into their roles as they build up experience with valuation interviews following the EQ-VT protocol.
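The two-stage decision rule above can be written as a small function. This is a sketch under stated assumptions: whether each interview is flagged is taken as given (the four criteria themselves are not implemented), and the function name and return labels are hypothetical.

```python
# Hypothetical sketch of the two-stage QC rule described in Box 2.1.
# Each element of `flags` is True if that interview met at least one QC criterion.
THRESHOLD = 0.40  # share of flagged interviews that triggers action


def qc_decision(flags, after_retraining=False):
    """Decide what happens to an interviewer after a batch of ten interviews."""
    if len(flags) != 10:
        raise ValueError("the rule is applied to batches of ten interviews")
    share_flagged = sum(flags) / len(flags)
    if share_flagged < THRESHOLD:
        return "continue"
    # 40% or more flagged: the batch of interviews is removed from the data.
    if after_retraining:
        return "remove interviews and remove interviewer from team"
    return "remove interviews and retrain"


print(qc_decision([True] * 3 + [False] * 7))  # continue
print(qc_decision([True] * 4 + [False] * 6))  # remove interviews and retrain
```

Counting flags per batch keeps the rule independent of how the individual criteria are computed, which mirrors how the box separates flagging from the retraining decision.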

During the entire study, the local study team continuously monitors data quality. Later QC reports make it possible to reflect on interviewers' performance, discuss possible improvements and intervene when an interviewer's performance worsens.

# **2.4 Lessons Learned**

The evidence from the valuation studies and the comprehensive EQ-VT research programme (Shah et al. 2015) led to increased awareness of how challenging the interview is, both for the respondent and for the interviewer. Data issues driven by interviewer effects showed that the interviewer and their skills are pivotal to the success of the interview, especially for the cTTO tasks. The amount of guidance given to respondents affects their engagement and task understanding, and thereby the accuracy and reliability of their responses. The DCE task may be more robust to interviewer effects, but it may also be that data issues are simply more visible in cTTO data. While technical aspects of the tasks were the key focus before the first valuation studies were launched, after the first wave the focus shifted to the human interaction side of the task, which is equally important and clearly needed more attention.

The changes made to the protocol can be categorised into (a) monitoring of and support for the interviewers and (b) support for the respondents. As outlined above, the QC procedure introduced in version 1.1, with its accompanying QC tool, enables monitoring of the *interviewers'* protocol compliance and of interviewer effects. It also helps support interviewers by providing data-based feedback. The items measured are reported elsewhere (Ramos-Goñi et al. 2017). To support the *respondents*, it was recognised that a more extensive introduction and more cTTO practice are necessary before the valuation tasks can be carried out as intended: three additional practice states and a dynamic question after the wheelchair example were added as EQ-VT elements. In the dynamic question, depending on their response to the wheelchair task, respondents are asked to imagine a health state that is much better or much worse than being in a wheelchair, in order to move to the other part of the evaluation space in the cTTO. Moreover, because mistakes and learning effects can still occur, confirmatory pop-ups after each task and the feedback module presented in Fig. 2.3 were added. The feedback module presents respondents with the rank ordering implied by their cTTO valuations and gives them the opportunity to flag problematic valuations for removal from the data. Further details on the EQ-VT elements and their changes are provided elsewhere (Stolk et al. 2019).
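The core logic of the feedback module, ranking a respondent's states by their own cTTO values and dropping any the respondent flags, can be sketched as follows. The data structures and function names are hypothetical; the actual EQ-VT software implementation is not described here.

```python
# Hypothetical sketch of the feedback-module logic: show the rank order implied
# by a respondent's cTTO values, then drop the states the respondent flags.
def implied_ranking(values):
    """Return states ordered from highest to lowest cTTO value."""
    return sorted(values, key=lambda state: values[state], reverse=True)


def apply_feedback(values, flagged):
    """Remove the valuations the respondent flagged as problematic."""
    return {state: v for state, v in values.items() if state not in flagged}


values = {"21111": 0.95, "55555": -0.4, "33232": 0.6, "44345": -0.1}
print(implied_ranking(values))  # ['21111', '33232', '44345', '55555']
kept = apply_feedback(values, flagged={"44345"})
print(implied_ranking(kept))    # ['21111', '33232', '55555']
```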

To prepare interviewers for their role in a study, the EuroQol Executive Office started to work more closely with study teams. Besides providing an interviewer script and EQ-VT software tailored to the needs of each team, the EuroQol Executive Office now also offers training for the local research team, who in turn train their interviewers (a 'train the trainer' approach). While this training helps, it will not prevent all issues, because the interview is complex and the topics taught remain abstract until interviewers start conducting interviews. Learning on the job, as supported by the QC process, therefore has a large additional impact on interviewer performance: information on an interviewer's behaviour is used to tailor and deliver a personalised set of additional instructions. The initial training therefore addresses a mix of content- and process-related topics, to build up interviewer skills and to discuss collaboration while the study is ongoing.

Related topics that need consideration are the selection of interviewers, the logistics of data collection and, more broadly, how investigators and interviewers can work together most effectively. This part of the study is not standardised, but the EuroQol Executive Office can offer recommendations. To date, working with a small team of dedicated students, who travel together with a data coordinator throughout the country and collect data in weekly rounds of ten interviews per interviewer, serves as an example of good practice (e.g., Pickard et al. 2019; Shafie et al. 2019; Welie et al. 2020). Students have relevant background knowledge, are familiar with the concepts of validity and bias, are keen to learn, want to do well and do not mind having their performance assessed. Working as a group allows the data coordinator to deliver effective feedback, and individuals are likely to be receptive to it, since they see other interviewers working on similar issues. Undertaking the work as a group also keeps everyone focused on the goal of the study.

# **2.5 Concluding Remarks**

Over the last ten years the accumulation of extensive multinational evidence supported the development and the subsequent refinement of a standardised EQ-VT protocol for conducting national EQ-5D-5L valuation studies. A multinational research programme examined alternative approaches for eliciting health state preferences, developed methods to improve data quality and demonstrated the robustness of these approaches across languages and countries.

The EQ-VT protocol was developed in a way that anticipates evidence-based refinements. Across the different versions of the protocol, from EQ-VT 1.0 to the current version 2.1, the valuation tasks have remained the same, but later versions pay more attention to the optimal implementation of these tasks, combined with a QC procedure. The refinements of the EQ-VT protocol have been shown to improve data quality and minimize interviewer effects.

The EQ-VT protocol has to date been successfully applied in about 30 countries worldwide and, at the time of writing, 25 of the resulting value sets have been published. These 25 value sets are summarised in Chap. 4, and their similarities and differences are described in Chap. 6. Even though the improved valuation protocol with its QC process provides a solid basis for estimating national EQ-5D-5L value sets, there remain methodological questions that can be addressed in future research (see Chap. 7). This might further improve the EQ-VT protocol.

# **References**


Devlin N, Buckingham K, Shah K, Tsuchiya A, Tilling C, Wilkinson G, van Hout B (2013) A comparison of alternative variants of the lead and lag time TTO. Health Econ 22(5):517–532

Dolan P (1997) Modeling valuations for EuroQol health states. Med Care 35(11):1095–1108


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 3 Experimental Design for the Valuation of the EQ-5D-5L**

#### **Mark Oppe, Richard Norman, Zhihao Yang, and Ben van Hout**

**Abstract** The EQ-VT protocol for valuing the EQ-5D-5L offered the opportunity to develop a standardised experimental design to elicit EQ-5D-5L values. This chapter sets out the various aspects of the EQ-VT design and the basis on which methodological choices were made in regard to the stated preference methods used, i.e., composite time trade-off (cTTO) and discrete choice experiments (DCE). These choices include the sub-set of EQ-5D-5L health states to value using these methods; the number of cTTO and DCE valuation tasks per respondent; the minimum sample size needed; and the randomisation schema. This chapter also summarises the research studies developing and testing alternative experimental designs aimed at generating a "Lite" version of the EQ-VT design. This "Lite" version aimed to reduce the number of health states in the design, and thus the sample size, to increase the feasibility of undertaking valuation studies in countries with limited resources or recruitment possibilities. Finally, this chapter outlines remaining methodological issues to be addressed in future research, focusing on refinement of current design strategies, and identification of new designs for novel valuation approaches.

M. Oppe (\*) Maths in Health B.V., Rotterdam, The Netherlands e-mail: moppe@mathsinhealth.com

R. Norman School of Population Health, Curtin University, Bentley, Australia

Z. Yang Health Services Management Department, Guizhou Medical University, Gui'an, People's Republic of China

B. van Hout School of Health and Related Research, University of Sheffield, Sheffield, UK

Pharmerit International, York, UK

# **3.1 Introduction**

As explained in Chap. 2, having decided that the protocol for valuing EQ-5D-5L would include both composite time trade-off (cTTO) and discrete choice experiment (DCE) as elicitation techniques, the next step was to provide a study design that would enable the researchers to identify a model that would appropriately predict values for all 3125 potential health states. For this, choices needed to be made about:


It was envisaged that not all respondents needed to value the same health states and that respondents could be randomised over different blocks of health states. The aim of this chapter is to describe the basis for the protocol designs and the factors that were considered in developing them. In addition, alternative designs and directions for future research with respect to designs will be addressed.

Valuation studies do not test hypotheses, and as such there is no classic power calculation as there is for randomised clinical trials. Generally, more respondents and more data per health state decrease the standard errors around the value of each health state and around the model estimates, and would be expected to decrease the probability of misspecification. A traditional way to test different designs is to simulate experiments (i.e., simulate respondents' answers to the tasks, informed by prior evidence on how people respond) and compare the simulated means with the expected means. The simulated experiments are analysed to determine whether the estimated model corresponds with the model that underlies the simulations (the true model) and how wide the confidence intervals surrounding the estimates are.
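To make the simulation logic concrete, the sketch below simulates responses for a single health state around an assumed "true" value and reports how the 95% confidence interval around the simulated mean narrows as the number of respondents grows. The true value (0.3), the noise standard deviation (0.6) and the normal, unbounded noise are illustrative assumptions, not parameters of the EQ-VT simulations; real cTTO data are bounded and non-normal.

```python
import random
import statistics


def simulate_state(true_value, n, sd=0.6, seed=0):
    """Simulate n responses for one state; return (mean, width of the 95% CI).

    Responses are drawn as true_value + normal noise, a simplification of
    the bounded, non-normal shape of real cTTO data.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(true_value, sd) for _ in range(n)]
    mean = statistics.fmean(draws)
    se = statistics.stdev(draws) / n ** 0.5
    return mean, 2 * 1.96 * se


for n in (25, 100, 400):
    mean, width = simulate_state(true_value=0.3, n=n)
    print(f"n={n:4d}  mean={mean:.3f}  CI width={width:.3f}")
```

Comparing the simulated mean with the assumed true value, and tracking the interval width, is the same check described in the paragraph, applied here to one state rather than to a full model.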

Within the above considerations it was also decided that both the cTTO task and the DCE task needed to be designed such that the data would allow for estimating separate models without the need for data from the other part of the study. This would leave room for the scientists conducting such studies to estimate models using only cTTO data, or only DCE data, or hybrid models combining the two sets of data. The EQ-VT designs were developed using a staged approach. Designs were created for the pilot studies that informed the development of the EQ-VT protocol. These pilot studies also informed refinements with respect to the experimental design.

As described in Chap. 1, for EQ-5D-3L valuation studies, the study protocols and experimental designs were not standardised, although most studies followed some or all of the protocol used in the first time trade-off (TTO) study for EQ-5D-3L: the Measurement and Valuation of Health (MVH) study conducted in the United Kingdom (Dolan 1997). As a result, different countries produced EQ-5D-3L value sets based on different elicitation tasks: some used a visual analogue scale (VAS) based elicitation technique, while others used TTO. In addition, different numbers of health states were used (e.g., 43 health states in total for MVH, whereas other studies used 17 or 24 states, or a more saturated approach with 196 health states), as well as different selections of health states. These methodological differences hampered comparison between countries: it is unknown to what extent differences in results obtained between countries were due to differences in the preferences of the study populations or to differences in the study protocol and experimental design. Therefore, for the valuation of the EQ-5D-5L, the EuroQol Group decided to create a standardised study protocol including an experimental design (see Chap. 2 for more details on this standardisation).

# **3.2 EQ-VT Designs**

# *3.2.1 cTTO Design*

The states selected for the design of the cTTO need to be optimised for model estimation. This means the objective is to avoid introducing a bias in the model that originates from the selection of the health states included in the design. For example, if mild or moderate states are highly overrepresented in the design, this could lead to a bias in the model estimation. In addition, there should be enough states included, so that the model can be specified. For example, since there are 20 main effect parameters (i.e., the four dummy parameters for each of the five EQ-5D dimensions) the theoretical minimum number of health states to be included would be 21 (20 main effects +1 error term). For the main pilot study, also referred to as the core multinational pilot study (Oppe et al. 2014), the number of states that would be required for estimating an EQ-5D-5L value set was expected to be around 100. It was considered that a main effects model would have 21 parameters (5\*4 dummy variables for the main effects + intercept) leaving 79 degrees of freedom. Such a number of states would allow estimation of random coefficient models, and inclusion of different kinds of interactions and/or the effects of background variables.
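The parameter count follows from dummy coding: each dimension gets one indicator for each of levels 2 to 5, with level 1 as the reference. A minimal sketch, with a hypothetical function name, that reproduces the counts quoted above:

```python
from itertools import product

DIMENSIONS = 5
LEVELS = 5


def dummy_code(state):
    """Map an EQ-5D-5L state such as '21345' to 20 main-effect dummies.

    Each dimension contributes four indicators (levels 2..5); level 1 is the
    reference category and is coded as all zeros.
    """
    if len(state) != DIMENSIONS or any(c not in "12345" for c in state):
        raise ValueError(f"not an EQ-5D-5L state: {state!r}")
    row = []
    for c in state:
        level = int(c)
        row.extend(1 if level == k else 0 for k in range(2, LEVELS + 1))
    return row


all_states = ["".join(p) for p in product("12345", repeat=DIMENSIONS)]
n_main_effects = DIMENSIONS * (LEVELS - 1)
print(len(all_states))                # 3125
print(n_main_effects)                 # 20
print(100 - (n_main_effects + 1))     # 79 degrees of freedom with 100 states
```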

When regarding the number of observations per EQ-5D-5L health state included in the cTTO tasks, we found that in the cTTO pilot study with 121 observations per state, the standard errors for the severe states were around 0.056, while those for the mild states were around 0.01 (Janssen et al. 2013). This suggested we would achieve adequate average precision of the mean observed values with 100 observations per cTTO state. This was based on the assumption that with the standard errors at those levels, a repetition of the sample would result in observed mean values that would very likely fall within the bounds provided by these standard errors.
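The step from 121 to 100 observations can be checked with the usual standard-error formula, SE = SD/√n. Assuming that formula applies to the observed means (the pilot's standard deviations are not reported here, so they are back-calculated), the implied standard deviations and the expected standard errors at n = 100 are:

```python
import math


def implied_sd(se, n):
    """SD implied by a standard error of the mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def standard_error(sd, n):
    """Standard error of a mean of n observations."""
    return sd / math.sqrt(n)


for label, se_at_121 in (("severe", 0.056), ("mild", 0.01)):
    sd = implied_sd(se_at_121, 121)
    print(f"{label}: SD ~ {sd:.3f}, SE at n=100 ~ {standard_error(sd, 100):.4f}")
# severe: SD ~ 0.616, SE at n=100 ~ 0.0616
# mild: SD ~ 0.110, SE at n=100 ~ 0.0110
```

Moving from 121 to 100 observations inflates each standard error by a factor of √(121/100) = 1.1, a modest loss of precision.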

From the pilot studies, as well as the valuation studies undertaken for the EQ-5D-3L, it was clear that respondents would be able to complete at least 17 cTTO tasks each without negatively impacting on data quality (Tsuchiya et al. 2002; Lamers et al. 2006). However, since we also wanted to include a DCE task for the same respondents, it was decided that we should limit the number of cTTO tasks per respondent to ten (excluding warm-up and practice tasks). In order to counteract biases due to framing effects, a blocked design was chosen to achieve a balanced mix of states with respect to where they are expected to lie on the overall value scale. That is to say, each respondent should complete a good balance of health states, covering the entire range from mild to severe. Therefore, each block was designed to include one of the five very mild states (i.e., states 21111, 12111, 11211, 11121, 11112) and the worst state (i.e., state 55555, sometimes referred to as the "pits" state). It was decided to include ten blocks with two fixed states in each block such that eight states per block would need to be generated, i.e. 80 states. This implied that we would have (10\*8 + 5 + 1=) 86 states in total, which is a little less than in the main pilot study, but still more than four times the number of parameters for a typical main effects model.

Putting the above together, ten blocks of ten EQ-5D-5L states each, with 100 observations per block, lead to a required sample size of 1000 respondents. This leaves the final part of the design for the cTTO part of the EQ-VT: selecting the set of 80 EQ-5D-5L states to be included.
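The arithmetic of the cTTO blocking can be checked with a short sketch. Which very mild state goes into which block is not stated above; the sketch assumes, for illustration, that each of the five very mild states appears in exactly two blocks, and it uses hypothetical placeholder labels (`G00` to `G79`) for the 80 generated states.

```python
from collections import Counter

MILD_STATES = ["21111", "12111", "11211", "11121", "11112"]
PITS = "55555"
N_BLOCKS, GENERATED_PER_BLOCK, RESPONDENTS_PER_BLOCK = 10, 8, 100
N_MAIN_EFFECTS = 20

blocks = []
for b in range(N_BLOCKS):
    # Placeholder labels G00..G79 stand in for the 80 simulated-design states.
    generated = [f"G{b * GENERATED_PER_BLOCK + i:02d}" for i in range(GENERATED_PER_BLOCK)]
    # Assumption: block b gets mild state b % 5, so each mild state is in two blocks.
    blocks.append(generated + [MILD_STATES[b % 5], PITS])

# Every respondent in a block values each of its ten states once.
obs_per_state = Counter()
for block in blocks:
    for state in block:
        obs_per_state[state] += RESPONDENTS_PER_BLOCK

generated_obs = sum(n for s, n in obs_per_state.items() if s.startswith("G"))

print(len(obs_per_state))                # 86 distinct states
print(N_BLOCKS * RESPONDENTS_PER_BLOCK)  # 1000 respondents
print(sum(obs_per_state.values()))      # 10000 cTTO observations
print(obs_per_state[PITS], obs_per_state["21111"], obs_per_state["G00"])  # 1000 200 100
print(generated_obs / N_MAIN_EFFECTS)    # 400.0
```

The oversampling of 55555 and of the very mild states falls out directly: they recur across blocks, while each generated state appears in only one.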

We selected the 80 states from the total set of 3119 (i.e., the 3125 states in the EQ-5D-5L, minus the six states that were already included in the design, namely the five mildest states and the "pits" state) using Monte Carlo simulation (see Box 3.1). First, values for all 3125 states for a sample of n=1000 respondents were simulated using a simulation programme implemented in R. Details of the simulation programme can be found in Oppe and van Hout (2010). For both the simulation and the optimisation algorithm, a main effects model (without constant) was used. This decision was based on the pilot studies and on two previous studies using the EQ-5D-3L. In the first EQ-5D-3L study, OLS models including main effects and the N3 term (an interaction parameter which takes the value 1 if any dimension is at level 3, and 0 otherwise) were estimated on the full data set of the MVH study, which resulted in an adjusted R<sup>2</sup> of 0.43, and on a data set that included only the mean observed values of the 42 states included in the MVH study (thereby removing the within-state variance), which resulted in an adjusted R<sup>2</sup> of 0.97 (Oppe et al. 2013). These results indicate that the main contributor to the uncertainty is the within-state variance, not the between-state variance; that there is very little to gain by adding interaction terms (i.e., R<sup>2</sup> can only increase marginally from 0.97); and that adding interactions runs the risk of overfitting. The EQ-5D-5L pilot valuation studies showed that interactions similar to N3 or D1 (a parameter which corresponds to the number of impaired dimensions beyond the first) from the EQ-5D-3L models did not improve the EQ-5D-5L models. Lastly, in a DCE study for EQ-5D-3L using a design optimised for main effects plus all two-way interactions, the (pseudo) R<sup>2</sup> increased only from 0.266 for main effects to 0.277 for a model including interactions. In total there were 12 model parameters, but three of the main effects were no longer included (Stolk et al. 2010). Therefore, it becomes an issue of parsimony: is adding interactions, and consequently making the model less interpretable, worth a small increase in model fit?
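The two EQ-5D-3L interaction terms mentioned above are simple functions of the health state. A minimal sketch following the definitions given in the text; the function names are hypothetical.

```python
def n3(state):
    """N3: 1 if any dimension of an EQ-5D-3L state is at level 3, else 0."""
    return int("3" in state)


def d1(state):
    """D1: number of impaired dimensions (level > 1) beyond the first."""
    impaired = sum(1 for c in state if c != "1")
    return max(0, impaired - 1)


for s in ("11111", "21111", "11213", "33333"):
    print(s, n3(s), d1(s))
# 11111 0 0
# 21111 0 0
# 11213 1 1
# 33333 1 4
```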



A random design of 80 states was generated from the simulated data. An OLS main effects regression model (without constant) was estimated on the simulated set of cTTO data comprising the 80 states and 1000 respondents. Next, the sum of the mean squared errors (MSE) was calculated between the parameters that were used to create the simulated preference data and the parameters resulting from the OLS model. The difference between perfect level balance and achieved level balance of the 80 generated states was also calculated. The construction of the level balance criterion can be found in Appendix A. The regression procedure was repeated 10,000 times and an iterative procedure was used where designs that had either worse level balance or worse MSE were discarded.
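The search loop described above can be sketched in Python with NumPy. This is an illustrative reconstruction, not the authors' R programme, and it rests on several assumptions: the "true" decrements in `TRUE_BETA` are invented; responses are simulated afresh for each candidate design as true value plus normal noise, rather than drawn from one pre-simulated dataset; the MSE is taken over the 20 coefficients; and `level_imbalance` is a hypothetical stand-in for the level balance criterion in Appendix A, which is not reproduced here. Iteration and respondent counts are reduced so the sketch runs quickly.

```python
from itertools import product

import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical "true" decrements for levels 2..5, identical for all five
# dimensions purely for simplicity (20 parameters in total).
TRUE_BETA = np.tile([0.05, 0.10, 0.25, 0.35], 5)

# Candidate pool: all 3125 states minus the five very mild states and 55555.
FIXED = {(2, 1, 1, 1, 1), (1, 2, 1, 1, 1), (1, 1, 2, 1, 1),
         (1, 1, 1, 2, 1), (1, 1, 1, 1, 2), (5, 5, 5, 5, 5)}
CANDIDATES = np.array([s for s in product(range(1, 6), repeat=5) if s not in FIXED])


def dummies(states):
    """20-column main-effects dummy matrix (level 1 is the reference)."""
    return np.concatenate(
        [(states[:, [d]] == np.arange(2, 6)).astype(float) for d in range(5)], axis=1
    )


def parameter_mse(design, n_resp, sd=0.3):
    """Simulate n_resp respondents valuing every state, refit OLS, return MSE vs truth."""
    X = np.tile(dummies(design), (n_resp, 1))
    values = 1 - X @ TRUE_BETA + rng.normal(0, sd, len(X))
    # Main-effects OLS without constant, fitted to the disutility 1 - value.
    beta_hat, *_ = np.linalg.lstsq(X, 1 - values, rcond=None)
    return float(np.mean((beta_hat - TRUE_BETA) ** 2))


def level_imbalance(design):
    """Stand-in balance measure: total |count(level) - n/5| over dimensions and levels."""
    counts = np.stack([(design == lvl).sum(axis=0) for lvl in range(1, 6)])
    return float(np.abs(counts - len(design) / 5).sum())


def search(n_states=80, n_iter=200, n_resp=50):
    best, best_mse, best_imb = None, np.inf, np.inf
    for _ in range(n_iter):
        design = CANDIDATES[rng.choice(len(CANDIDATES), size=n_states, replace=False)]
        mse, imb = parameter_mse(design, n_resp), level_imbalance(design)
        # Discard the candidate if it is worse on either criterion.
        if mse <= best_mse and imb <= best_imb:
            best, best_mse, best_imb = design, mse, imb
    return best, best_mse, best_imb


design, mse, imbalance = search()
print(design.shape, round(mse, 6), imbalance)
```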

The "optimal" set of 80 states was divided over the ten blocks using the blocking algorithm included in the "AlgDesign" package in R (Wheeler 2004). The blocking algorithm divides the states over the blocks in such a way that the within block variance is maximised (i.e., the full severity range is more or less covered within a block), while the between block variance is minimised (i.e., all blocks are more or less the same with respect to the mean severity per block).
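The goal of the blocking step (each block spanning the severity range, with similar mean severity across blocks) can be illustrated with a simple serpentine allocation. Note that this is a swapped-in heuristic for illustration, not the blocking algorithm implemented in AlgDesign, and the severity scores are hypothetical.

```python
def serpentine_blocks(scores, n_blocks):
    """Allocate items to blocks in 'snake' order of their scores.

    Items are sorted by score and dealt to blocks 0..n-1, then n-1..0, and so
    on, so every block receives items from across the whole range while the
    block means stay close together.
    """
    if len(scores) % n_blocks:
        raise ValueError("number of items must be a multiple of n_blocks")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = [[] for _ in range(n_blocks)]
    for rank, item in enumerate(order):
        round_no, pos = divmod(rank, n_blocks)
        block = pos if round_no % 2 == 0 else n_blocks - 1 - pos
        blocks[block].append(item)
    return blocks


# 80 hypothetical severity scores (0 = mildest, 79 = most severe) into 10 blocks.
scores = list(range(80))
blocks = serpentine_blocks(scores, 10)
print([len(b) for b in blocks])                     # ten blocks of 8
print({sum(scores[i] for i in b) for b in blocks})  # {316}
```

With evenly spaced scores every block ends up with exactly the same total, which is the "minimal between-block variance" property in its simplest form; real predicted values would give only approximately equal block means.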

In summary, the design of the cTTO experiment consists of 86 states divided over ten blocks with 100 observations per block, leading to about 10,000 observations in total, where the five very mild states and state 55555 were oversampled compared to the other 80 states. For a main effects model this means that there will be about 400 observations per model parameter (8000 observations/20 parameters). The required sample size was determined to be 1000 (i.e., 10 blocks \* 100 observations per block). The 86 states of the cTTO design can be found in Appendix B.

# *3.2.2 DCE Design*

A Bayesian efficient design algorithm was used to select the pairs for the DCE. The priors were based on the results of a main effects model (without intercept) estimated on the data of an EQ-5D-3L DCE study (Stolk et al. 2010). We assumed that the levels 1, 2 and 3 from the EQ-5D-3L study corresponded to the levels 1, 3 and 5 for the EQ-5D-5L, while the levels 2 and 4 were assumed to be the mid-points between the levels 1 and 3, and 3 and 5 respectively. The standard errors of the parameters of the model we estimated on the EQ-5D-3L DCE data varied between 0.06 and 0.08. Conservatively, these were increased to 0.10 for the priors. The priors that were used can be found in Table 3.1.
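The mapping from EQ-5D-3L coefficients to EQ-5D-5L priors is mechanical. The sketch below uses invented 3L coefficients purely for illustration (the actual priors are in Table 3.1); the function name and dimension labels are hypothetical.

```python
import random

PRIOR_SD = 0.10  # conservative prior standard deviation used for all parameters


def map_3l_to_5l(beta_3l_level2, beta_3l_level3):
    """Return 5L priors for levels 2..5 given 3L coefficients for levels 2 and 3.

    3L levels 1, 2, 3 map to 5L levels 1, 3, 5; 5L levels 2 and 4 are the
    midpoints of (1, 3) and (3, 5). Level 1 is the reference (0).
    """
    at_5l_level3 = beta_3l_level2
    at_5l_level5 = beta_3l_level3
    return [
        (0.0 + at_5l_level3) / 2,
        at_5l_level3,
        (at_5l_level3 + at_5l_level5) / 2,
        at_5l_level5,
    ]


# Invented 3L coefficients (level 2, level 3) for two dimensions.
hypothetical_3l = {"MO": (-0.30, -1.20), "PD": (-0.40, -1.50)}
priors = {dim: map_3l_to_5l(*b) for dim, b in hypothetical_3l.items()}
print(priors["MO"])  # [-0.15, -0.3, -0.75, -1.2]

# One random draw from the prior distribution, as used in Bayesian designs.
rng = random.Random(1)
draw = {dim: [rng.gauss(m, PRIOR_SD) for m in means] for dim, means in priors.items()}
```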

Similar to the cTTO design there was an interest in making sure that at least some pairs of health states containing only mild states would be included in the DCE design. Therefore, ten such pairs were created manually. Pilot studies showed that the sample size of 1000 respondents determined for the cTTO would also be sufficient for estimating a DCE model (Krabbe et al. 2014; Oppe et al. 2014). In order to put limits on respondent burden, the number of DCE pairs per respondent was set to seven. The minimum number of observations needed per pair was deemed to be 35. This was based on being slightly more conservative than Hensher and colleagues, who refer to a minimum of 30 responses per set, based on the law of large numbers as stated by Bernoulli (Hensher et al. 2005). Putting these numbers together, a 196 pair design divided over 28 blocks of seven pairs was created using a Bayesian D-efficient design algorithm (see Box 3.2).

First, the set of ten mild pairs was manually selected. Next, a set of 186 pairs was randomly generated. For this set of 186 pairs, the Bayesian D-error of the design was determined using 1000 randomly drawn sets of priors. This process was repeated 10,000 times and the 186 pair design with the best D-error was kept. The ten mild pairs were added to this design, and the total set of 196 pairs was then blocked into 28 blocks of seven pairs each.
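For a paired design analysed with a conditional logit model, the information contributed by one pair reduces to p_A p_B (x_A - x_B)(x_A - x_B)ᵀ, and the D-error is det(I)^(-1/K) for K parameters. The sketch below averages this over prior draws to obtain a Bayesian D-error and keeps the best of several random candidate designs. It is a NumPy illustration under assumptions (invented prior means in `PRIOR_MEAN`, far fewer draws and designs than the 1000 and 10,000 used in practice, level 1 as reference in the dummy coding), not the authors' R implementation, and it omits the ten manually selected mild pairs and the blocking step.

```python
from itertools import product

import numpy as np

rng = np.random.default_rng(7)
STATES = np.array(list(product(range(1, 6), repeat=5)))

# Hypothetical prior means for levels 2..5 (same for every dimension here),
# with the 0.10 prior standard deviation described in the text.
PRIOR_MEAN = np.tile([-0.3, -0.6, -1.2, -1.6], 5)
PRIOR_SD = 0.10


def dummies(states):
    """20-column main-effects coding (levels 2..5 per dimension; level 1 = reference)."""
    return np.concatenate(
        [(states[:, [d]] == np.arange(2, 6)).astype(float) for d in range(5)], axis=1
    )


def d_error(diffs, beta):
    """Local D-error of a paired design under a conditional logit model.

    Each row of `diffs` is x_A - x_B for one pair; the information matrix is the
    sum over pairs of p_A * p_B * (x_A - x_B)(x_A - x_B)^T.
    """
    p_a = 1.0 / (1.0 + np.exp(-(diffs @ beta)))
    info = (diffs * (p_a * (1.0 - p_a))[:, None]).T @ diffs
    sign, logdet = np.linalg.slogdet(info)
    if sign <= 0:
        return np.inf  # singular information: some parameter is not identified
    return float(np.exp(-logdet / diffs.shape[1]))


def bayesian_d_error(diffs, prior_draws):
    """Average local D-error over draws from the prior distribution."""
    return float(np.mean([d_error(diffs, beta) for beta in prior_draws]))


def search(n_pairs=186, n_designs=20, n_draws=50):
    """Keep the random candidate design with the lowest Bayesian D-error."""
    draws = rng.normal(PRIOR_MEAN, PRIOR_SD, size=(n_draws, PRIOR_MEAN.size))
    best_pairs, best_err = None, np.inf
    for _ in range(n_designs):
        pairs = np.array(
            [rng.choice(len(STATES), size=2, replace=False) for _ in range(n_pairs)]
        )
        diffs = dummies(STATES[pairs[:, 0]]) - dummies(STATES[pairs[:, 1]])
        err = bayesian_d_error(diffs, draws)
        if best_pairs is None or err < best_err:
            best_pairs, best_err = pairs, err
    return best_pairs, best_err


pairs, err = search()
print(pairs.shape, err)
```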

The Bayesian D-efficient design algorithm was implemented in R and we used the blocking algorithm included in the "AlgDesign" package in R (Wheeler 2004).

In summary, the DCE design consists of 196 pairs divided over 28 blocks of seven pairs each. With the same sample size as for the cTTO, this leads to a total of 7000 observations, or about 350 observations per parameter for a main effects model. The 196-pair DCE design can be found in Appendix C.
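The DCE numbers fit together as follows. Because every respondent in a block answers all seven of its pairs, observations per pair equal respondents per block. A quick check of the arithmetic (variable names are illustrative):

```python
N_PAIRS = 196
PAIRS_PER_BLOCK = 7
SAMPLE_SIZE = 1000
MAIN_EFFECTS = 20
MIN_OBS_PER_PAIR = 35

n_blocks = N_PAIRS // PAIRS_PER_BLOCK                 # 28 blocks
respondents_per_block = SAMPLE_SIZE / n_blocks        # about 35.7
total_choices = SAMPLE_SIZE * PAIRS_PER_BLOCK         # 7000
choices_per_parameter = total_choices / MAIN_EFFECTS  # 350.0

print(n_blocks, round(respondents_per_block, 1), total_choices, choices_per_parameter)
# 28 35.7 7000 350.0
print(respondents_per_block >= MIN_OBS_PER_PAIR)     # True
```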


**Table 3.1** Priors used for the DCE design

Reproduced from Oppe and van Hout (2017)


**Box 3.2: Algorithm for Selection of DCE Pairs of Health States Used for the EQ-VT Design**

# *3.2.3 Other Considerations*

Apart from the sample size, and the selection of the set of health states for the cTTO and pairs for the DCE, another consideration for the experimental design of the EQ-VT was the randomisation schema needed. This is important, as a proper randomisation schema can counteract potential biases. For the cTTO, each respondent is randomly allocated one block of ten health states. The order in which the ten health states appear for each respondent is also randomised. For the DCE, each respondent is randomly assigned to one of the 28 blocks of pairs. The order of appearance of the seven pairs allocated to each respondent is also randomised, and for each pair of health states the order of appearance on the screen of the two health states comprising a pair (i.e., left versus right) is randomised. The order of appearance of the dimensions was not randomised, because the EQ-5D-5L instrument itself has a fixed order of appearance with respect to the dimensions (see Chap. 1).
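The randomisation schema can be summarised in code. The sketch below is hypothetical (the EQ-VT software is not described at this level): block contents are placeholder labels, simple random allocation with Python's `random` module stands in for whatever mechanism the software uses, and the dimension order within each state is deliberately left fixed.

```python
import random

N_CTTO_BLOCKS, STATES_PER_BLOCK = 10, 10
N_DCE_BLOCKS, PAIRS_PER_BLOCK = 28, 7

# Placeholder design: labels only, standing in for the real EQ-VT blocks.
CTTO_BLOCKS = [[f"T{b}-{i}" for i in range(STATES_PER_BLOCK)] for b in range(N_CTTO_BLOCKS)]
DCE_BLOCKS = [[(f"A{b}-{i}", f"B{b}-{i}") for i in range(PAIRS_PER_BLOCK)]
              for b in range(N_DCE_BLOCKS)]


def randomise_respondent(rng):
    """Draw one respondent's cTTO and DCE task sequences."""
    ctto_block = rng.randrange(N_CTTO_BLOCKS)
    ctto_tasks = rng.sample(CTTO_BLOCKS[ctto_block], k=STATES_PER_BLOCK)  # random order

    dce_block = rng.randrange(N_DCE_BLOCKS)
    dce_tasks = []
    for left, right in rng.sample(DCE_BLOCKS[dce_block], k=PAIRS_PER_BLOCK):
        if rng.random() < 0.5:  # randomise which state appears on the left
            left, right = right, left
        dce_tasks.append((left, right))
    return ctto_block, ctto_tasks, dce_block, dce_tasks


rng = random.Random(42)
ctto_block, ctto_tasks, dce_block, dce_tasks = randomise_respondent(rng)
print(ctto_block, ctto_tasks[:3])
print(dce_block, dce_tasks[:2])
```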

# **3.3 Alternative cTTO Designs**

As noted above, the cTTO design of the EQ-VT protocol includes the selection of 86 different EQ-5D-5L health states and a minimal sample size of 1000. While this sample size is considered sufficient and achievable for most countries, reducing the number of states in the design, and thus the sample size, has the appeal that it could increase the feasibility of a valuation study in countries with limited resources or difficulty recruiting such a large number of respondents.

An important criterion for using a small design is that the accuracy of the estimated health state values should not be compromised (i.e., bias should be minimized). Following the approach established by Yang et al. for comparing EQ-5D-3L designs in a saturated VAS study (Yang et al. 2018), this process was replicated for the EQ-5D-5L. First, a saturated EQ-5D-5L VAS dataset was collected from a sample of Chinese university students, with 100 VAS values for each of the 3125 EQ-5D-5L health states. Next, 100 variants of an orthogonal design<sup>1</sup> with 25 health states were created and modelled. Their predictive performance was quantified by calculating the root mean squared error (RMSE) against the observed VAS values from the saturated dataset. Twenty-five health states were chosen because this is the minimum number for an orthogonal design in a five-dimension, five-level classification system. For comparison, 100 variants of a random design and 100 variants of a D-efficient design were created, also with 25 states each. The EQ-VT design was also included as a reference (Yang et al. 2019a). The results showed that the RMSE was 3.44 for the EQ-VT design and 3.40 for the orthogonal design on the VAS scale (from 0 to 100). Little variance was observed among the 100 variants of the orthogonal design. However, the inclusion of 11111 in the orthogonal design degraded the overall prediction performance. When the orthogonal design was extended with the five mildest states and the "pits" state (to counteract biases due to framing effects), the RMSE was 3.87. These results showed that the orthogonal design extended with the five mildest states and the "pits" state could allow robust and precise estimation of EQ-5D-5L VAS values, as its RMSE was only slightly higher than the RMSE of 3.44 for the EQ-VT design (a difference of 0.43 on the VAS scale).
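A 25-state orthogonal design for five dimensions with five levels can be built with arithmetic modulo 5, a standard Latin-square style construction of a strength-2 orthogonal array. The sketch below builds one such design and checks, for every pair of dimensions, the balance property defined in footnote 1. It is not necessarily one of the 100 variants used by Yang et al.; relabelling the levels within a dimension yields other designs with the same property. Note that this construction contains 11111, the state whose inclusion was found to degrade prediction.

```python
from collections import Counter
from itertools import combinations, product

# Each dimension's level is (u*a + v*b) mod 5; any two of these coefficient
# vectors are linearly independent mod 5, which makes the design orthogonal.
COEFFS = [(1, 0), (0, 1), (1, 1), (1, 2), (1, 3)]


def orthogonal_design():
    """25 EQ-5D-5L states in which every pair of dimensions shows each of the
    25 level combinations exactly once (so each level appears 5 times)."""
    states = []
    for a, b in product(range(5), repeat=2):
        levels = [(u * a + v * b) % 5 + 1 for u, v in COEFFS]
        states.append("".join(map(str, levels)))
    return states


def is_orthogonal(states):
    """Check that every pair of dimensions contains all 25 level pairs exactly once."""
    for i, j in combinations(range(5), 2):
        pair_counts = Counter((s[i], s[j]) for s in states)
        if len(pair_counts) != 25 or set(pair_counts.values()) != {1}:
            return False
    return True


design = orthogonal_design()
print(len(design), len(set(design)))  # 25 25
print(is_orthogonal(design))          # True
print("11111" in design)              # True
```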

Considering the distributional characteristics of the cTTO values from EQ-5D-5L valuation studies using the EQ-VT (e.g., they are not normally distributed, their distribution is split into two parts at the value for dead, and they display large heterogeneity), a second study was performed to validate the performance of orthogonal designs using cTTO data (Yang et al. 2019b). Following EQ-VT protocol version 1.1 (as described in Chap. 2), cTTO data were collected from a sample of Chinese university students. Three designs were included in the study: (1) the EQ-VT design; (2) the best-performing orthogonal design variant from the saturated VAS study; and (3) a D-efficient design with 25 states. In total, 100 observations per health state were collected across the three designs, covering 136 health states (i.e., 86 + 25 + 25). Next, value sets were modelled by design and their prediction accuracy was evaluated for the 136 states. The RMSEs of (1) the EQ-VT design, (2) the orthogonal design + five mildest states + the "pits" state and (3) the D-efficient design + five mildest states + the "pits" state were 0.053, 0.066 and 0.063 on the value scale (0-1), respectively. Based on the findings of these two studies, the use of the EQ-VT design was confirmed as the default design choice for EQ-5D-5L valuation studies. However, the orthogonal design with 25 states + five mildest states + one "pits" state can be used in some specific contexts, e.g., when resources are not available for a standard EQ-VT study. Peru is the first country to have used the orthogonal design + five mildest states + one "pits" state (referred to as the 'Lite' protocol; see Appendix D) to establish its EQ-5D-5L value set (see Chap. 4). In that study, the modelling results suggested that the 'Lite' protocol could produce logically consistent coefficients, but some coefficients were not significant. Additionally, the DCE and cTTO coefficients were found not to be equivalent in that study, and a hybrid model combining both sets of responses was not used for the final Peruvian EQ-5D-5L value set. For these reasons, the authors suggest that more research is needed to further explore the feasibility of such a 'Lite' protocol (Augustovski et al. 2020).

<sup>1</sup>An orthogonal design satisfies the criterion that all severity levels and all severity level combinations are equally prevalent and therefore balanced.
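Both design-comparison studies described above ranked designs by the root mean squared error (RMSE) between model-predicted and observed values. A minimal definition, with invented numbers in the usage line:

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between predicted and observed state values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


# Invented predicted and observed values for three states.
print(round(rmse([0.90, 0.55, -0.10], [0.95, 0.50, -0.05]), 3))  # 0.05
```

RMSE is expressed in the units of the values themselves, which is why the VAS comparisons are reported on a 0 to 100 scale and the cTTO comparisons on the value scale.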

# **3.4 Future Research**

Regarding the design principles employed in EQ-5D-5L valuation studies, ongoing work focuses on refining current design strategies and identifying new designs for emerging valuation approaches. Regarding the design used in the EQ-VT, evidence to date suggests that it is fit for purpose. Across the range of studies already conducted with the EQ-VT (see Chap. 1 for details), the design has allowed precise estimation of health state values. However, a number of issues require addressing in future research.

First, it has become apparent that the ten pairs of relatively mild health states appended to the DCE design are potentially problematic and may bias the parameter estimates. One plausible explanation is that the values of these health states are likely to be similar (as they are all close to full health and to each other), but the choice probabilities are not necessarily close to 50/50, since there may be a small but consistent preference for accepting level 2 on one particular dimension over level 2 on another. Second, using a broader set of EQ-5D-5L health states may yield a more accurate value set, as health states not directly observed in the data are then more likely to have a valued near-neighbour health state. For instance, the ongoing Indian EQ-5D-5L valuation study is exploring the use of an expanded set of 150 health states in the cTTO (Jyani et al. 2020). Third, the number of DCE choice pairs typically asked in the standard EQ-VT (i.e., seven) limits the models we might seek to estimate using the resulting data. For example, if we are interested in preference heterogeneity in the DCE data, having only a small amount of DCE data precludes reliable estimation of more sophisticated latent class, mixed logit or generalised multinomial logit models (Fiebig et al. 2010), particularly if we are concerned with estimating correlations.
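The point about the mild pairs can be seen from the binary logit choice probability, P(A) = 1 / (1 + exp(-(V_A - V_B)/σ)), where σ is a scale parameter reflecting the amount of unobserved noise. When respondents agree on a small difference (small σ), a tiny value gap can still produce lopsided choices. The values and scales below are invented for illustration.

```python
import math


def choice_probability(value_a, value_b, scale):
    """Binary logit probability of choosing A over B; smaller scale means less noise."""
    return 1.0 / (1.0 + math.exp(-(value_a - value_b) / scale))


# Two hypothetical very mild states whose values differ by only 0.02.
v_a, v_b = 0.96, 0.94
for scale in (0.10, 0.02, 0.005):
    print(scale, round(choice_probability(v_a, v_b, scale), 3))
```

Under these assumptions the probability of choosing A rises from about 0.55 to about 0.98 as the noise scale shrinks, even though the two values stay only 0.02 apart.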

One more novel valuation approach currently under consideration is the use of DCE as a stand-alone task, a concept which has been growing in popularity in the health preference literature more generally (Mulhern et al. 2019). A stand-alone DCE has a range of potential advantages. Most importantly, the task can be undertaken without an interviewer, reducing the cost significantly (Mulhern et al. 2013). Further, as it does not require interviewer travel, a smoother geographical distribution of respondents can be achieved (assuming that the survey approach is equally accessible across regions). However, if we rely on the DCE alone (rather than as a component of the EQ-VT alongside cTTO tasks), more than seven choice observations per person are needed, particularly if we want to move beyond estimating population mean preferences, for example to identify population subgroups with specific views. Further, the data need to be anchored so they can be used to populate cost-utility analysis, for instance by including one or more of a duration attribute, a 'dead' health state, or some other external anchor. Regarding design strategy for a stand-alone DCE, there has been particular focus on generator-type approaches and efficient designs, the relative merits of which have been widely discussed in the literature. For example, EuroQol-funded work has conducted a large DCE in Peru looking at different composite approaches to anchoring and design; these results have been reported as part of a larger study including cTTO and latent DCE tasks (Augustovski et al. 2020). Ongoing analysis of these data, and of similar data collected in Denmark (Jensen et al. 2021), will explore whether there is clear enough evidence of the superiority of one design approach over the other for this specific purpose, and then identify a design (or design approach) which can be used across countries conducting such a valuation survey.

This chapter has described the design strategies that have been used in existing EQ-5D-5L valuation projects, and their advances over those used for the EQ-5D-3L. The designs have been selected to balance statistical efficiency with respondent ease (and hence data quality), and the current approach appears to reflect a good trade-off between the two, with good completion rates, precise model estimates, and face validity of the final value sets across a number of languages, countries and cultures. The approaches used to this point are flexible and can be adapted to meet the challenge of novel valuation approaches which may become more prominent in future years, giving policy makers confidence that the valuation surveys have accurately captured the attitudes of the general public without bias.

# **Appendices**

# *Appendix A: Construction for Level Balance Optimisation Criterion*

**Step 1:** A matrix (labelled "EQ lvl mat") with the counts for each level-domain combination is constructed (note that the example tables below contain hypothetical data using ten EQ-5D-5L states for illustrative purposes):


Reproduced from Oppe and van Hout (2017)

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

**Step 2:** Using the data from "EQ lvl mat" a second matrix, containing the squares of the differences between the presence of levels per dimension is created (labelled "lvl dist mat"):


Reproduced from Oppe and van Hout (2017)

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

**Step 3:** The elements of "lvl dist mat" are summed and the square root is taken over the sum to obtain the optimisation parameter (labelled "lvl bal check"):

"lvl bal check" = square root ( sum ( "lvl dist mat" ) ) = 7.75

A value for "lvl bal check" = 0 indicates perfect level balance (i.e., each level-domain combination occurs twice).

A value for "lvl bal check" = 44.72 indicates the worst possible level balance: for each domain only 1 level is included. In this case "EQ lvl mat" contains one 10 and four 0's for each domain; "lvl dist mat" contains four 100's and six 0's, and the sum of "lvl dist mat" = 2000, with a square root = 44.72.
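The three steps can be sketched in Python as follows. This is a minimal illustration, not the implementation used by Oppe and van Hout (2017). It assumes states are written as five-digit strings (e.g. "12345"), and it interprets "lvl dist mat" as the squared differences between every pair of level counts within a dimension, which is the reading that reproduces the worst case described above (four 100's and six 0's per domain, square root of 2000 = 44.72). The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

N_DIMS = 5    # MO, SC, UA, PD, AD
N_LEVELS = 5  # levels 1..5


def level_counts(states):
    """Step 1 ("EQ lvl mat"): counts[d][l] = number of states with level l+1 on dimension d."""
    counts = [[0] * N_LEVELS for _ in range(N_DIMS)]
    for state in states:
        for d, ch in enumerate(state):
            counts[d][int(ch) - 1] += 1
    return counts


def level_distances(counts):
    """Step 2 ("lvl dist mat"): squared differences between each pair of level counts per dimension."""
    return [[(row[i] - row[j]) ** 2 for i, j in combinations(range(N_LEVELS), 2)]
            for row in counts]


def level_balance_check(states):
    """Step 3 ("lvl bal check"): square root of the summed "lvl dist mat"."""
    return sqrt(sum(sum(row) for row in level_distances(level_counts(states))))


# Worst case: every state has the same level on every dimension.
print(round(level_balance_check(["11111"] * 10), 2))               # 44.72
# Perfect balance: each level-domain combination occurs exactly twice.
print(level_balance_check([str(l) * 5 for l in range(1, 6)] * 2))  # 0.0
```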

Note that perfect level balance is not a requirement (and might actually be undesirable in some cases). Small deviations can be allowed by, e.g., setting a maximum allowable value for "lvl bal check" and letting the algorithm sample designs until it finds one for which "lvl bal check" is lower than this preset maximum.
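One simple way to implement such a tolerance is rejection sampling: draw random candidate designs and keep the first one whose "lvl bal check" falls below the chosen maximum. The sketch below is hypothetical and self-contained (it recomputes the check compactly). It samples distinct states uniformly from the 3125 EQ-5D-5L states and ignores the other criteria a real design algorithm would apply; the threshold of 12.0 is an arbitrary illustrative value.

```python
import random
from itertools import combinations, product
from math import sqrt

# All 5^5 = 3125 EQ-5D-5L states as five-digit strings.
ALL_STATES = ["".join(p) for p in product("12345", repeat=5)]


def level_balance_check(states):
    """Compact version of Steps 1-3: sqrt of summed squared pairwise level-count differences."""
    counts = [[0] * 5 for _ in range(5)]
    for state in states:
        for d, ch in enumerate(state):
            counts[d][int(ch) - 1] += 1
    return sqrt(sum((row[i] - row[j]) ** 2
                    for row in counts for i, j in combinations(range(5), 2)))


def sample_design(n_states, max_check, max_tries=100_000, seed=0):
    """Hypothetical rejection sampler: return the first random design with check < max_check."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        design = rng.sample(ALL_STATES, n_states)  # distinct states, no replacement
        if level_balance_check(design) < max_check:
            return design
    raise RuntimeError("no design met the threshold; relax max_check or raise max_tries")


design = sample_design(n_states=10, max_check=12.0)
print(design, round(level_balance_check(design), 2))
```

Lowering `max_check` makes acceptable designs rarer, so the loop runs longer; the `max_tries` cap keeps it from running indefinitely.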

# *Appendix B: The cTTO design of the EQ-VT*


**Table 3.2** The 86 EQ-5D-5L health states included in the composite TTO task of the EQ-VT


**Table 3.2** (continued)

Reproduced from Oppe and van Hout (2017)

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

# *Appendix C: The DCE design of the EQ-VT*


**Table 3.3** The 196 pairs of EQ-5D-5L health states included in the DCE task of the EQ-VT


**Table 3.3** (continued)


**Table 3.3** (continued)


**Table 3.3** (continued)


**Table 3.3** (continued)

Reproduced from Oppe and van Hout (2017)

# *Appendix D: The Design Used in the Peruvian EQ-5D-5L Valuation Study*


**Table 3.4** The 31 EQ-5D-5L health states included in the composite TTO task in the EQ-VT in the Peruvian valuation study (Augustovski et al. 2020)

a Health states 1 to 25 are based on the orthogonal design, 26 to 30 are the five mildest states and 31 is the "pits" state











*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities


# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 4 EQ-5D-5L Value Set Summaries**

**Bram Roudijk, Kristina Ludwig, and Nancy Devlin**

**Abstract** This chapter provides structured summaries of all 25 currently published national EQ-5D-5L value sets. The summaries were developed by extracting information from the published manuscripts of each value set and conducting secondary data analyses of the original valuation data generated in each country/region. The summaries include the mathematical formula for the preferred model for each national value set; information on the representativeness of the samples used to generate the value set; the mean values observed for each health state valued using composite time trade-off (cTTO); the distribution of responses in the discrete choice experiment (DCE); information on the number of interviewers and whether any interviewer effects were present in the valuation data; key characteristics of the predicted values and the relative importance of the EQ-5D-5L dimensions; and information on the uptake of the value set by local decision makers and health technology assessment bodies. This chapter serves as a compendium of EQ-5D-5L value sets, which may inform users about the characteristics of all published EQ-5D-5L value sets.

# **4.1 Introduction**

This chapter provides summary-level information on the currently published EQ-5D-5L value sets. The countries/regions are grouped by study wave and listed alphabetically within each wave, so the oldest studies are reported first and the newest studies last. Figure 4.1 provides an overview of the 25 studies included, by wave and protocol version.

B. Roudijk (\*), K. Ludwig

EuroQol Research Foundation, Rotterdam, The Netherlands

e-mail: roudijk@euroqol.org

K. Ludwig

Department of Health Economics and Health Care Management, School of Public Health, Bielefeld University, Bielefeld, Germany

N. Devlin

Centre for Health Policy, University of Melbourne, Melbourne, VIC, Australia


**Fig. 4.1** Overview of EQ-VT studies by study wave and protocol version

Most of the information reported in this chapter was extracted from the manuscripts in which these value sets were published. However, in some cases, relevant information was not included in the published papers. In these cases, we undertook secondary analyses of the data sets, with permission from the study authors, using the methods reported below.

From each value set, we have extracted the mathematical formula for the preferred model, presented as *V*(*X*), and we present the relative weights for each of the different dimension levels (20 parameters). Each parameter therefore represents the decrement from level 1 to the respective level. We also present some other key characteristics of the value sets, such as the order of importance of the five dimensions of the EQ-5D-5L, the values for the worst and best health states, and the value for the best suboptimal health state. Lastly, we report key aspects of each study, such as the time frame in which the valuation data were collected, the sample size, the sampling frame and the sample characteristics.

# **4.2 Methods**

As each study valued the same 86 health states using cTTO, following the study design discussed in Chap. 3,<sup>1</sup> we report the arithmetic means and standard errors for each of these 86 health states in each country/region. The means are calculated for the same sample used in modelling the value set in each case, i.e., following any exclusions, which we describe. For studies in which the feedback module was used, as discussed in Chap. 2, the arithmetic means are calculated after excluding any flagged responses.

For the DCE data, we report the proportion of respondents choosing EQ-5D-5L state A, by the difference in level sum score between the two states in the pair. Here, the level sum score is simply the sum of the levels of an EQ-5D-5L health state, giving a broad indication of the severity of the state. For example, for state 12312, the level sum score is 1+2+3+1+2=9. The difference in level sum scores within a DCE choice pair with alternative A being 12341 and alternative B being 22335 is then 11 − 15 = −4. We then report the percentage of responses choosing A when the difference in level sum scores equals −4. This percentage pools several choice pairs, as various pairs have a level sum score difference of −4.
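The tabulation can be sketched as below. The choice records are hypothetical and serve only to show the grouping; each record is assumed to be a tuple of (state A, state B, whether A was chosen), and the function names are illustrative.

```python
from collections import defaultdict


def level_sum(state):
    """Level sum score: the sum of the five digits of an EQ-5D-5L state."""
    return sum(int(ch) for ch in state)


def proportion_choosing_a(choices):
    """Share of responses choosing A, grouped by level sum score of A minus that of B."""
    tally = defaultdict(lambda: [0, 0])  # difference -> [times A chosen, total responses]
    for state_a, state_b, chose_a in choices:
        diff = level_sum(state_a) - level_sum(state_b)
        tally[diff][0] += int(chose_a)
        tally[diff][1] += 1
    return {diff: n_a / n for diff, (n_a, n) in sorted(tally.items())}


# Hypothetical responses: two different pairs share the difference -4.
choices = [
    ("12341", "22335", True),
    ("12341", "22335", True),
    ("13311", "23341", False),
    ("21111", "11211", True),
]
print(level_sum("12312"))              # 9
print(proportion_choosing_a(choices))  # {-4: 0.6666666666666666, 0: 1.0}
```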

<sup>1</sup>With the exception of Peru, where a different health state design for the cTTO was used, as this study was conducted using an EQ-VT 'Lite' protocol. A health state design including 31 unique health states was used instead of the 86-state design. For Peru, we therefore report the means of 31 health states.

The modelling and other data analysis strategies used by the value set research team in each country/region differ somewhat. Therefore, we report the following for each study: (1) data exclusion criteria and the number of excluded responses/respondents; (2) interviewer effects; and (3) a description of modelling choices. To determine whether there are any interviewer effects, we partition the variance in the valuation data into variance related to interviewers *i*, respondents *j* and responses *k*, and determine the relative share of variance attributed to differences between interviewers. This is done by fitting a mixed model of the form in Eq. 4.1 to each of the valuation datasets:

$$U\_{ijk} = \beta\_0 + \gamma\_i + \mu\_{ij} + \varepsilon\_{ijk} \tag{4.1}$$

This model assumes there is a mean value $\beta\_0$ for all health states, which varies by interviewer ($\gamma\_i$), respondent ($\mu\_{ij}$) and health state valued ($\varepsilon\_{ijk}$). Here, $\beta\_0$ is a fixed effects parameter, while all others are random effects. We then assume that $\gamma\_i \sim N(0, \sigma\_{\gamma}^{2})$, $\mu\_{ij} \sim N(0, \sigma\_{\mu}^{2})$ and $\varepsilon\_{ijk} \sim N(0, \sigma\_{\varepsilon}^{2})$.<sup>2</sup>

To determine the share of variance attributed to differences between interviewers, we then calculate the intraclass correlation coefficient (ICC) as in Eq. 4.2:<sup>3</sup>

$$ICC = \frac{\sigma\_{\gamma}^{2}}{\sigma\_{\gamma}^{2} + \sigma\_{\mu}^{2} + \sigma\_{\varepsilon}^{2}} \tag{4.2}$$
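To make the variance partition concrete, the sketch below simulates data from Eq. 4.1 and recovers the variance components with classical method-of-moments (nested ANOVA) estimators before applying Eq. 4.2. This is a substitute technique chosen for a short, dependency-free example, not the estimation approach used in this chapter: it assumes a balanced design (every interviewer sees the same number of respondents, each valuing the same number of states), whereas real EQ-VT data are unbalanced and would normally be analysed by fitting the mixed model itself (e.g. by REML). All variance values and function names are hypothetical.

```python
import random


def simulate(n_int, n_resp, n_states, var_int, var_resp, var_err, beta0=0.5, seed=1):
    """Draw U_ijk = beta0 + gamma_i + mu_ij + eps_ijk for a balanced design."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_int):
        gamma = rng.gauss(0.0, var_int ** 0.5)
        respondents = []
        for _ in range(n_resp):
            mu = rng.gauss(0.0, var_resp ** 0.5)
            respondents.append([beta0 + gamma + mu + rng.gauss(0.0, var_err ** 0.5)
                                for _ in range(n_states)])
        data.append(respondents)
    return data


def nested_anova_icc(data):
    """Method-of-moments variance components: responses within respondents within interviewers."""
    I, J, K = len(data), len(data[0]), len(data[0][0])
    resp_means = [[sum(r) / K for r in respondents] for respondents in data]
    int_means = [sum(row) / J for row in resp_means]
    grand = sum(int_means) / I
    # Expected mean squares: E[ms_err] = s2_err, E[ms_resp] = s2_err + K*s2_resp,
    # E[ms_int] = s2_err + K*s2_resp + J*K*s2_int.
    ms_int = J * K * sum((m - grand) ** 2 for m in int_means) / (I - 1)
    ms_resp = K * sum((resp_means[i][j] - int_means[i]) ** 2
                      for i in range(I) for j in range(J)) / (I * (J - 1))
    ms_err = sum((y - resp_means[i][j]) ** 2
                 for i in range(I) for j in range(J) for y in data[i][j]) / (I * J * (K - 1))
    s2_err = ms_err
    s2_resp = max((ms_resp - ms_err) / K, 0.0)  # truncate negative estimates at zero
    s2_int = max((ms_int - ms_resp) / (J * K), 0.0)
    icc = s2_int / (s2_int + s2_resp + s2_err)  # Eq. 4.2
    return icc, (s2_int, s2_resp, s2_err)


data = simulate(n_int=200, n_resp=20, n_states=10, var_int=0.05, var_resp=0.30, var_err=0.65)
icc, components = nested_anova_icc(data)
print(round(icc, 3), [round(c, 3) for c in components])
```

With the true components summing to 1, the simulated ICC is 0.05, so the estimate should land close to that value; with only a handful of interviewers (as in several studies in this chapter) the interviewer component would be estimated far less precisely.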

The relative importance of each EQ-5D-5L dimension was determined by taking the sum of the coefficients for that dimension and dividing it by the sum of the coefficients across all dimensions. This measure can be seen as the dimension's share of the total weight assigned to all dimension levels, and takes into account the relative importance of each dimension at all levels.
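The calculation can be illustrated with the England coefficients reported in Sect. 4.3.1.3 (the level-2 to level-5 decrements for each dimension). This is a minimal sketch for a 20-parameter value set only; the dictionary layout and function name are assumptions.

```python
# Decrements for levels 2-5 of each dimension, England value set (Sect. 4.3.1.3).
ENGLAND = {
    "MO": (0.058, 0.076, 0.207, 0.274),
    "SC": (0.050, 0.080, 0.164, 0.203),
    "UA": (0.050, 0.063, 0.162, 0.184),
    "PD": (0.063, 0.084, 0.276, 0.335),
    "AD": (0.078, 0.104, 0.285, 0.289),
}


def relative_importance(decrements):
    """Each dimension's summed decrements as a share of the summed decrements of all dimensions."""
    total = sum(sum(levels) for levels in decrements.values())
    return {dim: sum(levels) / total for dim, levels in decrements.items()}


shares = relative_importance(ENGLAND)
ranking = sorted(shares, key=shares.get, reverse=True)
print({dim: round(share, 3) for dim, share in shares.items()})
print(ranking)  # ['PD', 'AD', 'MO', 'SC', 'UA']
```

Because the measure sums all four levels, two dimensions with similar level-5 decrements can still differ in rank if their intermediate levels differ; here PD and AD are nearly tied.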

Finally, depending on the availability of the relevant information, we report the uptake of the value set by local HTA agencies, as reported in the manuscript or drawing on information provided by the principal investigators of the valuation studies to the authors of this book.

For each value set, we include full reference details and any other relevant literature directly related to the value set. Permission to reproduce these value sets and related information have been granted by the journals in which they are published, and access to the data to facilitate secondary analyses reported in this chapter was granted by the principal investigators, on behalf of the study teams in each case.

<sup>2</sup>We did not consider correlation between the variance components for interviewers, respondents and responses, which is a limitation. However, accounting for correlation between random slopes for interviewers and respondents may be challenging, as not every study used a sufficiently large number of interviewers to compute these correlations reliably.

<sup>3</sup>The ICC partitions variance into shares attributed to interviewers, respondents and responses. This is a way of operationalising interviewer effects, as it measures the share of variance caused by differences between interviewers. Small differences in distributions reflect good agreement between interviewers, and therefore small interviewer effects. However, they do not necessarily reflect good data quality overall, as other factors, such as clustering of values and inconsistent responses, are not captured by this measure.

# **4.3 Country-by-Country Overview of Value Sets**

# *4.3.1 Wave 1*

#### **4.3.1.1 Country/Region: Canada (Table 4.1)**


**Table 4.1** Overview of EQ-5D-5L value set for Canada<sup>a</sup>

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

a We report the value set to four decimal places instead of the three decimal places used in the other value set summaries in this chapter, due to the different modelling strategy that was applied to the Canadian data

The mathematical representation of the model for health state X is:<sup>4</sup>

$$\begin{aligned} V(X) &= 1.1351 - 0.0389\,MO\_{Level} - 0.0458\,SC\_{Level} - 0.0195\,UA\_{Level} - 0.0444\,PD\_{Level} \\ &- 0.0376\,AD\_{Level} - 0.051\,MO\_{45} - 0.0584\,SC\_{45} - 0.1103\,UA\_{45} - 0.1409\,PD\_{45} \\ &- 0.1277\,AD\_{45} - 0.0085\,Num45sq \end{aligned}$$

<sup>4</sup>MO45, SC45, UA45, PD45 and AD45 are dummy variables that equal 1 when level 4 or 5 problems are reported in that dimension, and 0 otherwise. For example, SC45 equals 1 in state 14111, but 0 in state 12111. Num45sq is a variable that represents the square of the number of level 4 or 5 problems in a health state, beyond the first. The level variables (e.g., MO Level) take the level reported on that dimension, from 1 to 5.

#### (i) **Date/wave of study**

Data were collected in the first wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 1.0. Additionally, conventional TTO was used as an elicitation technique to supplement the EQ-VT. Interviews were conducted in 2012.

#### (ii) **Sample size; sample frame**

1209 interviews with the general population were conducted in three English-speaking metropolitan areas (Vancouver, Hamilton and Edmonton) and in Montreal, where French-speaking respondents were recruited. Quota sampling with respect to age, gender, and education was applied. Of the 1073 respondents included in the final value set, 55.5% were female and 44.5% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was representative of the Canadian population in terms of age (over 18 years), gender, marital status, being born in Canada and language spoken at home. The sample was more educated, but had lower incomes, than the general population in Canada (Table 4.2).


**Table 4.2** Representativeness of the sample in the Canadian valuation study

Reproduced from Xie et al. (2016)

a Statistics Canada 2006 and 2011

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.3)


**Table 4.3** Mean observed cTTO<sup>a</sup> values by health state

*SE* standard error

a Conventional TTO data was collected alongside the cTTO data and was included in the modelling as well. Here, we report only the cTTO data for comparability with the other value sets reported in the book

# (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.2)

**Fig. 4.2** Proportions choosing A based on relative severities of A and B. (DCE data were collected during the study, but not used in the modelling stage)

#### (vi) **Exclusion criteria**

Respondents with inconsistent responses were excluded. Inconsistencies were defined as violations of strict dominance (e.g., assigning a higher value to state 11411 than to state 11311). For each respondent, the number of states dominated by the very mild health states (those with just one deviation from full health, e.g., 11121) was counted. Respondents were excluded if they assigned (a) a very mild state the same or a lower value than at least half of the states it dominated, or (b) a very mild health state the same or a lower value than the pits state (55555). 136 respondents met these exclusion criteria.
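The two exclusion rules can be sketched for one respondent's cTTO responses as follows. This is a hypothetical re-implementation, not the study's code: it assumes responses are held in a dict mapping five-digit state strings to values, treats "very mild" as exactly one dimension above level 1, and pools the dominance comparisons across all very mild states the respondent valued; the Canadian analysis may have counted them differently.

```python
def dominates(a, b):
    """True if state a is no worse than b on every dimension and differs from b (strict dominance)."""
    return a != b and all(x <= y for x, y in zip(a, b))


def is_very_mild(state):
    """Exactly one dimension deviates from level 1, e.g. 11121."""
    return sum(ch != "1" for ch in state) == 1


def exclude_respondent(values):
    """Apply rules (a) and (b) to a dict {state: cTTO value} for one respondent."""
    mild = [s for s in values if is_very_mild(s)]
    pairs = [(m, s) for m in mild for s in values if dominates(m, s)]
    # A violation: the very mild state gets the same or a lower value than a state it dominates.
    violations = sum(values[m] <= values[s] for m, s in pairs)
    rule_a = bool(pairs) and violations >= len(pairs) / 2
    rule_b = "55555" in values and any(values[m] <= values["55555"] for m in mild)
    return rule_a or rule_b


# Hypothetical respondents.
consistent = {"11121": 0.90, "12111": 0.85, "21345": 0.20, "55555": -0.30}
inconsistent = {"11121": -0.50, "12111": 0.85, "21345": 0.20, "55555": -0.30}
print(exclude_respondent(consistent), exclude_respondent(inconsistent))  # False True
```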

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1209 interviews were conducted by 8 interviewers. The variance of the responses included in the final value set can be partitioned into variance related to differences between interviewers (2.6%), respondents (35.8%) and responses (61.6%).

#### (viii) **Description of modelling choices**

The Canadian EQ-5D-5L value set was based on a combination of cTTO and conventional TTO data. The authors chose a final model that included a linear parameter for each dimension, with each dimension variable taking levels 1, 2, 3, 4 and 5. Furthermore, the authors added dummy variables for each of the 5 dimensions that equal 1 if that dimension reports level 4 or 5 problems, and 0 otherwise. Lastly, a term was added that represents the square of the number of dimensions reporting level 4 or 5 problems, beyond the first. The estimated model used a Tobit link function, assuming censoring at 0 for negative values and values equal to 0 in the cTTO, and included a random intercept for each respondent.

#### (ix) **Value Set** (Table 4.4 and Fig. 4.3)


**Table 4.4** Key characteristics of the Canadian value set

**Fig. 4.3** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/ discomfort, *SC* self-care, *UA* usual activities)


# (x) **Uptake by local HTA/health care decision makers**

Canadian Agency for Drugs and Technologies in Health (CADTH) is the national HTA body that reviews and makes reimbursement recommendations to the public insurance programs across Canada. Cost-utility analysis (CUA) is a recommended type of economic evaluation for reimbursement applications. Validated generic health state classification systems with Canadian-specific value sets are recommended for the estimation of QALYs. Although there is no preference for any specific instrument, the EQ-5D-5L has become one of the commonly used instruments in clinical trials and economic evaluations.

#### (xi) **Reference(s) of value set**

Xie F, Pullenayegum E, Gaebel K, Bansback N, Bryan S, Ohinmaa A, Poissant L, Johnson JA (2016) A time trade-off-derived value set of the EQ-5D-5L for Canada. Med Care 54(1):98–105

## **Further Literature**

Statistics Canada (2006) 2006 Census of Canada. https://www.statcan.gc.ca/. Accessed 14 July 2021

Statistics Canada (2011) https://www12.statcan.gc.ca/census-recensement/2011/dp-pd/tbt-tt/Index-eng.cfm. Accessed 28 July 2021

#### **4.3.1.2 Country/Region: China (Table 4.5)**


**Table 4.5** Overview of EQ-5D-5L value set for China

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) &= 1 - 0.066 \, MO\_2 - 0.158 \, MO\_3 - 0.287 \, MO\_4 - 0.345 \, MO\_5 - 0.048 \, SC\_2 \\ &- 0.116 \, SC\_3 - 0.210 \, SC\_4 - 0.253 \, SC\_5 - 0.045 \, UA\_2 - 0.107 \, UA\_3 \\ &- 0.194 \, UA\_4 - 0.233 \, UA\_5 - 0.058 \, PD\_2 - 0.138 \, PD\_3 - 0.252 \, PD\_4 \\ &- 0.302 \, PD\_5 - 0.049 \, AD\_2 - 0.118 \, AD\_3 - 0.215 \, AD\_4 - 0.258 \, AD\_5 \end{aligned}$$
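Because this is an additive model, the value of any state can be computed by subtracting the relevant decrements from 1. The sketch below is a minimal illustration using the China coefficients from the formula above (the coefficient layout and function names are assumptions). It also searches all 3125 states for the best suboptimal state, one of the key characteristics reported for each value set.

```python
from itertools import product

DIMS = ("MO", "SC", "UA", "PD", "AD")
# Decrements for levels 2, 3, 4 and 5 of each dimension, China value set.
CHINA = {
    "MO": (0.066, 0.158, 0.287, 0.345),
    "SC": (0.048, 0.116, 0.210, 0.253),
    "UA": (0.045, 0.107, 0.194, 0.233),
    "PD": (0.058, 0.138, 0.252, 0.302),
    "AD": (0.049, 0.118, 0.215, 0.258),
}


def value(state, coefs=CHINA):
    """V(X) = 1 minus the decrement for each dimension not at level 1."""
    v = 1.0
    for dim, ch in zip(DIMS, state):
        level = int(ch)
        if level > 1:
            v -= coefs[dim][level - 2]
    return v


states = ("".join(p) for p in product("12345", repeat=5))
best_suboptimal = max((s for s in states if s != "11111"), key=value)
print(round(value("55555"), 3))                           # -0.391
print(best_suboptimal, round(value(best_suboptimal), 3))  # 11211 0.955
```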

#### (i) **Date/wave of study**

Data were collected in the first wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 1.0. Interviews were conducted in 2012.

#### (ii) **Sample size; sample frame**

1332 interviews with the general population were conducted in five urban areas in different parts of China: Beijing, Chengdu, Guiyang, Nanjing and Shenyang. Within these cities, respondents were sampled to represent these cities with respect to age, gender and educational level using a nonprobability sampling strategy. Of the 1271 respondents included in the final value set, 49.9% were female and 50.1% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample** (Table 4.6)

| | Study sample (N=1271) | Chinese adult general population (2011)<sup>a</sup> |
|---|---|---|
| **Sampling characteristics** | | |
| Age, n (%) | | |
| 18–29 | 313 (24.6%) | 25.3% |
| 30–39 | 244 (19.2%) | 19.7% |
| 40–49 | 272 (21.4%) | 22.9% |
| 50–59 | 220 (17.3%) | 14.7% |
| ≥ 60 | 222 (17.5%) | 17.3% |
| Gender, n (%) | | |
| Female | 634 (49.9%) | 49.4% |
| Male | 637 (50.1%) | 50.6% |
| Education, n (%) | | |
| Primary or lower | 138 (10.9%) | |
| Junior high school | 396 (31.2%) | |
| Senior high school | 446 (35.1%) | |
| College or higher | 291 (22.9%) | |
| Residence of origin | | |
| City | 749 (58.9%) | |
| Country | 82 (6.5%) | |
| Township or village | 440 (34.6%) | |
| Employment status | | |
| Full time employee | 378 (29.7%) | |
| Temporary worker | 301 (23.7%) | |
| Individual freelancer | 148 (11.6%) | |
| Retired | 240 (18.9%) | |
| Student | 115 (9.1%) | |
| Unemployed | 48 (3.8%) | |
| Other | 41 (3.2%) | |

**Table 4.6** Representativeness of the sample in the Chinese valuation study

Reproduced from Luo et al. (2017)

a Chinese Statistical Yearbook 2011

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.7)


**Table 4.7** Mean observed cTTO values by health state

*SE* standard error

# (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.4)

**Fig. 4.4** Proportions choosing A based on relative severities of A and B

## (vi) **Exclusion criteria**

Respondents that were younger than 18 years old were excluded (n=25). Another 36 respondents were excluded as they did not finish the interview.

# (vii) **Number of interviewers; Interviewer effects**

In total, 1332 interviews were conducted by 20 interviewers. The variance of the responses included in the final value set can be partitioned into variance related to differences between interviewers (2.4%), respondents (28.8%) and responses (68.8%).

#### (viii) **Description of modelling choices**

The Chinese EQ-5D-5L value set was based on cTTO data only. The selected model was an 8-parameter multiplicative model with a random intercept, in which 5 coefficients are estimated for the EQ-5D-5L's 5 dimensions, and 3 coefficients are estimated for the 3 intermediate levels of the EQ-5D-5L (2, 3 and 4), representing level weights. For levels 1 and 5, these weights are assumed to be 0 and 1, respectively. The predicted coefficients were rescaled by (1 − intercept). The eight-parameter model can be converted to 20 parameters, as presented in Table 4.8, for consistency with the other value sets in this chapter.
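Under the multiplicative structure, each of the 20 decrements is a dimension coefficient multiplied by a level weight (0 for level 1, 1 for level 5), so the dimension coefficients equal the level-5 decrements. The estimated level weights are not reproduced in this section, so the sketch below back-calculates approximate weights from the mobility row of the China formula; these weights, and the function name, are assumptions for illustration. With them, the other dimensions' published coefficients are reproduced to within about 0.001.

```python
# Dimension coefficients: with a level-5 weight of 1, these equal the level-5 decrements.
DIM_COEF = {"MO": 0.345, "SC": 0.253, "UA": 0.233, "PD": 0.302, "AD": 0.258}
# Hypothetical level weights, back-calculated from the MO row (e.g. 0.066 / 0.345).
LEVEL_WEIGHT = {1: 0.0, 2: 0.066 / 0.345, 3: 0.158 / 0.345, 4: 0.287 / 0.345, 5: 1.0}


def to_twenty_parameters(dim_coef, level_weight):
    """Expand 5 dimension coefficients x 4 non-trivial level weights into 20 decrements."""
    return {f"{dim}_{lvl}": round(coef * level_weight[lvl], 3)
            for dim, coef in dim_coef.items()
            for lvl in (2, 3, 4, 5)}


params = to_twenty_parameters(DIM_COEF, LEVEL_WEIGHT)
print(params["SC_3"], params["UA_4"], params["AD_3"])  # 0.116 0.194 0.118
```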

#### (ix) **Value Set** (Table 4.8 and Fig. 4.5)


**Table 4.8** Key characteristics of the Chinese value set

**Fig. 4.5** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/ discomfort, *SC* self-care, *UA* usual activities)


# (x) **Uptake by local HTA/health care decision makers**

There is currently no HTA agency in China. There are only academic pharmacoeconomic/health technology assessment methodology guidelines (China Guidelines for Pharmacoeconomic Evaluations Writing Group 2019), which make no recommendations about which specific health-related quality of life instruments are preferred.

# (xi) **Reference(s) of value set**

Luo N, Liu G, Li M, Guan H, Jin X, Rand-Hendriksen K (2017) Estimating an EQ-5D-5L value set for China. Value Health 20(4):662–669

# **Further Literature**


#### **4.3.1.3 Country/Region: England (Table 4.9)**


**Table 4.9** Overview of EQ-5D-5L value set for England

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V\left(X\right) &= 1 - 0.058\,MO\_2 - 0.076\,MO\_3 - 0.207\,MO\_4 - 0.274\,MO\_5 - 0.050\,SC\_2 \\ &- 0.080\,SC\_3 - 0.164\,SC\_4 - 0.203\,SC\_5 - 0.050\,UA\_2 - 0.063\,UA\_3 - 0.162\,UA\_4 \\ &- 0.184\,UA\_5 - 0.063\,PD\_2 - 0.084\,PD\_3 - 0.276\,PD\_4 - 0.335\,PD\_5 - 0.078\,AD\_2 \\ &- 0.104\,AD\_3 - 0.285\,AD\_4 - 0.289\,AD\_5 \end{aligned}$$

#### (i) **Date/wave of study**

Data were collected in the first wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 1.0. Interviews were conducted between November 2012 and May 2013.

# (ii) **Sample size; sample frame**

1004 interviews were conducted with the general population (996 respondents completed the valuation tasks in full). A sample of 2020 addresses from 66 primary sampling units (based on postcode sectors) across England was randomly selected, using the Post Office small user Postcode Address File as the sampling frame. The sample was intended to be representative of adults aged 18 years and over living in private residential accommodation in England. Of the 912 respondents included in the analysis, 59.3% were female and 40.7% were male. The age distribution of the 912 respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was broadly representative of the English population in terms of age (over 18 years), sex and employment status. However, compared to the general population, the sample included a larger proportion of people aged over 75 and of retired people, and a smaller proportion of younger individuals and of males (Office for National Statistics 2011) (Table 4.10).


**Table 4.10** Representativeness of the sample in the England valuation study

Reproduced from Devlin et al. (2018)

a Office for National Statistics 2011

#### (iv) **Mean observed cTTO values by EQ-5D-5L state** (Table 4.11)


**Table 4.11** Mean observed cTTO values by health state

*SE* standard error

# (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.6)

**Fig. 4.6** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

Twenty-three participants (2.3%) gave all 10 health states the same value, and 61 participants (6.1%) valued 55555 no lower than the value they gave to the mildest health state. Excluding these participants gave a core modelling dataset of 912 participants (9120 cTTO observations). No DCE data were excluded.
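The two England rules can be sketched as below. This is hypothetical: it assumes a dict of one participant's cTTO values keyed by five-digit state strings, and takes the "mildest" state to be the one with the lowest level sum score, which may not match how the study identified it.

```python
def level_sum(state):
    """Sum of the five digits of an EQ-5D-5L state."""
    return sum(int(ch) for ch in state)


def exclude_england(values):
    """True if all values are identical, or 55555 is valued no lower than the mildest state."""
    all_same = len(set(values.values())) == 1
    mildest = min(values, key=level_sum)
    pits_not_lower = "55555" in values and values["55555"] >= values[mildest]
    return all_same or pits_not_lower


# Hypothetical participants (only three states shown for brevity).
print(exclude_england({"11112": 0.9, "34232": 0.4, "55555": -0.5}))  # False
print(exclude_england({"11112": 0.2, "34232": 0.6, "55555": 0.2}))   # True
```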

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1004 interviews were conducted by 48 interviewers. Primary data collection was carried out in England by the market research company Ipsos MORI. The valuation data were collected via face-to-face interviews in respondents' homes.

The variance of the responses (following exclusions) can be partitioned into variance related to differences between interviewers (5.39%), respondents (32.46%), and responses (62.15%).

#### (viii) **Description of modelling choices**

The England EQ-5D-5L value set was based on a 20-parameter hybrid model combining cTTO and DCE data. cTTO data were treated as censored at −1 and at 1 (to account for asymmetry in the error distributions) and, for specific responses, at 0 (e.g., for respondents who gave a value of 0 for more than one health state, including 55555). Heterogeneity was addressed via three latent classes, accounting for groups of respondents differing in their use of the scale. The latent class coefficients apply an adjustment across all dimension/level coefficients; the value set reported in Table 4.12 simplifies the presentation by reporting the dimension/level coefficients after applying the latent class coefficients.
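The censoring idea can be illustrated with the log-likelihood contribution of a single cTTO observation under a normal model censored at −1 and 1 (a two-limit Tobit). This is a deliberately simplified sketch, not the England model: it omits the DCE component, the latent classes and the respondent-specific censoring at 0, and it treats the mean and standard deviation as given. The function names are hypothetical.

```python
from math import erf, log, pi, sqrt


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def censored_loglik(y, mu, sigma, lower=-1.0, upper=1.0):
    """Log-likelihood of one observation y from N(mu, sigma^2) censored at lower and upper."""
    if y <= lower:  # at the lower bound: the latent value could be anywhere below it
        return log(normal_cdf((lower - mu) / sigma))
    if y >= upper:  # at the upper bound: the latent value could be anywhere above it
        return log(1.0 - normal_cdf((upper - mu) / sigma))
    z = (y - mu) / sigma  # interior observation: ordinary normal density
    return -0.5 * (log(2.0 * pi) + z * z) - log(sigma)


print(round(censored_loglik(0.5, 0.5, 1.0), 4))    # -0.9189
print(round(censored_loglik(-1.0, -1.0, 0.3), 4))  # -0.6931
```

For predicted means far from a bound, the CDF underflows to zero and `log` fails; a production implementation would use a numerically stable log-CDF instead.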

#### (ix) **Value Set** (Table 4.12 and Fig. 4.7)


**Table 4.12** Key characteristics of the English value set

**Fig. 4.7** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/ discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

The value set was subject to a 'quality assurance' commissioned by the Department of Health for England from the Economic Evaluation Policy Research Unit (EEPRU). EEPRU's critique is summarised in Hernandez-Alava et al. (2020) and a response from the authors is provided in van Hout et al. (2020). In response to concerns about data quality raised by EEPRU, NICE (2019) issued a position statement indicating it did not recommend use of the value set for England, recommending instead that EQ-5D-5L data be mapped to the EQ-5D-3L using the crosswalk published by van Hout et al. (2012) and the Dolan (1997) UK value set for the EQ-5D-3L be used. A new value set – for the UK, rather than for England – has been commissioned and NICE will review its policy once that study is complete (expected in 2022).

#### (xi) **Reference(s) for this value set**


#### **Further Literature**


#### **4.3.1.4 Country/Region: Netherlands (Table 4.13)**


**Table 4.13** Overview of EQ-5D-5L value set for Netherlands

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) &= 1 - 0.047 - 0.035 \, MO\_2 - 0.057 \, MO\_3 - 0.166 \, MO\_4 - 0.203 \, MO\_5 \\ &- 0.038 \, SC\_2 - 0.061 \, SC\_3 - 0.168 \, SC\_4 - 0.168 \, SC\_5 - 0.039 \, UA\_2 \\ &- 0.087 \, UA\_3 - 0.192 \, UA\_4 - 0.192 \, UA\_5 - 0.066 \, PD\_2 - 0.092 \, PD\_3 \\ &- 0.360 \, PD\_4 - 0.415 \, PD\_5 - 0.070 \, AD\_2 - 0.145 \, AD\_3 - 0.356 \, AD\_4 \\ &- 0.421 \, AD\_5 \end{aligned}$$

#### (i) **Date/wave of study**

Data were collected in the first wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 1.0. Interviews were conducted in the fall of 2012.

#### (ii) **Sample size; sample frame**

1003 interviews with the general population were conducted in five cities and their surroundings in different parts of the Netherlands: Utrecht, Rotterdam, Maastricht, Enschede and Groningen. Strata-based sampling was applied with respect to age, gender and educational level, as recorded by Statistics Netherlands (Centraal Bureau voor de Statistiek 2012). Of the 979 respondents included in the final value set, 50.85% were female and 49.15% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was representative of the Dutch population in terms of age (except for the age group 80 and older), gender, education, and employment status (Table 4.14).


**Table 4.14** Representativeness of the sample in the Dutch valuation study

Reproduced from Versteegh et al. (2016)

a This sample includes the recruited respondents where data on characteristics were available.

b Statistics Netherlands 2012

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.15)


**Table 4.15** Mean observed cTTO values by health state<sup>a</sup>

*SE* standard error

a The Dutch EQ-5D-5L value set manuscript reports slightly different means. This table reports the means for the analytic sample, which was used to generate the value set, whereas Versteegh et al. (2016) report the means of the full sample, before several respondents were excluded.

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.8)

**Fig. 4.8** Proportions choosing A based on relative severities of A and B (plotted against the level sum score of State A minus the level sum score of State B)

#### (vi) **Exclusion criteria**

cTTO data were excluded when the task was not finished or when interviewers had indicated that the respondent had clearly not understood the task. Furthermore, respondents who gave the same value to all health states in the cTTO tasks were excluded. In total, 13 respondents were excluded from analysis of the cTTO data. In addition, no data were obtained from another 11 respondents, owing to data loss caused by technical issues, respondents being unable to start the valuation tasks because of technical problems, the absence of an interviewer, unwillingness to participate after being informed about the topic of the research, and other or unknown reasons.

#### (vii) **Number of interviewers; Interviewer effects**<sup>5</sup>

In total, 1003 interviews were conducted by 21 interviewers.

#### (viii) **Description of modelling choices**

The Dutch EQ-5D-5L value set was based on the cTTO data only. The selected model was a Tobit model that accounts for censoring at −1. Levels 4 and 5 were collapsed for the self-care and usual activities dimensions, because their coefficients were ordered inconsistently in other models.
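An "inconsistent ordering" means that a worse level of a dimension receives a smaller decrement than a milder level, so the model would predict the worse state to be better. The sketch below checks for this; the uncollapsed self-care numbers are invented for illustration and are not estimates from Versteegh et al. (2016).

```python
def inconsistencies(decrements: dict) -> list:
    """Return (dimension, lower_level, higher_level) for every pair of adjacent
    levels where the higher (worse) level has a *smaller* decrement, i.e. is
    predicted to be better than the level below it."""
    problems = []
    for dim, by_level in decrements.items():
        previous_level, previous = 1, 0.0  # level 1 has no decrement
        for level in sorted(by_level):
            if by_level[level] < previous:
                problems.append((dim, previous_level, level))
            previous_level, previous = level, by_level[level]
    return problems


# Hypothetical uncollapsed self-care estimates in which level 5 < level 4
uncollapsed = {"SC": {2: 0.038, 3: 0.061, 4: 0.175, 5: 0.160}}
# Collapsing levels 4 and 5 forces equal decrements, as in the Dutch value set
collapsed = {"SC": {2: 0.038, 3: 0.061, 4: 0.168, 5: 0.168}}

print(inconsistencies(uncollapsed))  # [('SC', 4, 5)]
print(inconsistencies(collapsed))    # []
```

Equal decrements for adjacent levels pass the check, which is why collapsing is a simple way to restore logical consistency.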

<sup>5</sup> It was not possible to compute the variance attributable to differences between interviewers, because interviewers shared login information for the software used to conduct the interviews, so individual interviewers could not be distinguished.

#### (ix) **Value Set** (Table 4.16 and Fig. 4.9)


**Table 4.16** Key characteristics of the Dutch value set

**Fig. 4.9** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

Cost-utility analysis (CUA) is required by the Dutch regulatory body, Zorg Instituut Nederland (2016). For QALYs, the use of the EQ-5D-5L and the accompanying Dutch value set is recommended.

#### (xi) **Reference(s) of value set**

Versteegh MM, Vermeulen KM, Evers SMAA, De Wit GA, Prenger R, Stolk EA (2016) Dutch tariff for the five-level version of EQ-5D. Value Health 19(4):343–352

#### **Further Literature**


#### **4.3.1.5 Country/Region: Spain (Table 4.17)**


**Table 4.17** Overview of EQ-5D-5L value set for Spain<sup>a</sup>

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

a In the valuation study manuscript, the value set is reported as incremental dummies. For consistency, we report regular dummies here (see Sect. 4.2 for more information)
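As a minimal sketch of the conversion, assume the standard definitions: an incremental dummy coefficient for level k is the extra decrement for moving from level k−1 to level k, while a regular dummy coefficient is the decrement relative to level 1 (Sect. 4.2 gives the book's own explanation). Regular coefficients are then cumulative sums of incremental ones. The helper names are hypothetical; the numbers are the Spanish mobility coefficients from the formula for this value set.

```python
from itertools import accumulate


def incremental_to_regular(increments):
    """Regular decrement of level k = sum of the increments for levels 2..k."""
    return [round(x, 3) for x in accumulate(increments)]


def regular_to_incremental(regular):
    """Increment of level k = regular decrement of k minus that of level k-1."""
    previous = [0.0] + list(regular[:-1])  # level 1 has a decrement of 0
    return [round(cur - prev, 3) for prev, cur in zip(previous, regular)]


# Spanish mobility decrements for levels 2-5, as regular dummies
mo_regular = [0.084, 0.099, 0.249, 0.337]
mo_incremental = regular_to_incremental(mo_regular)
print(mo_incremental)                          # [0.084, 0.015, 0.15, 0.088]
print(incremental_to_regular(mo_incremental))  # [0.084, 0.099, 0.249, 0.337]
```

Both codings describe the same value set; they differ only in how the coefficients are reported.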

The mathematical representation of the model for health state X is:

$$\begin{aligned} V\left(X\right) &= 1 - 0.084\,MO\_2 - 0.099\,MO\_3 - 0.249\,MO\_4 - 0.337\,MO\_5 - 0.050\,SC\_2 \\ &- 0.053\,SC\_3 - 0.164\,SC\_4 - 0.196\,SC\_5 - 0.044\,UA\_2 - 0.049\,UA\_3 \\ &- 0.135\,UA\_4 - 0.153\,UA\_5 - 0.078\,PD\_2 - 0.101\,PD\_3 - 0.245\,PD\_4 \\ &- 0.382\,PD\_5 - 0.081\,AD\_2 - 0.128\,AD\_3 - 0.270\,AD\_4 - 0.348\,AD\_5 \end{aligned}$$

#### (i) **Date/wave of study**

Data were collected in the first wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 1.0. Interviews were conducted in June and July 2012.

#### (ii) **Sample size; sample frame**

1000 interviews with the general population were conducted, stratified across all provinces of Spain. Within each province, respondents were sampled to represent the age and gender distribution of that province. Of the 973 respondents included in the final value set, 52.4% were female and 47.6% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was representative of the Spanish population in terms of age (over 18 years) and gender (Table 4.18).

**Table 4.18** Representativeness of the sample in the Spanish valuation study


Reproduced from Ramos-Goñi et al. (2017)

a Spanish Ministry of Health 2012

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.19)


**Table 4.19** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.10)

**Fig. 4.10** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

Respondents who valued all health states in the cTTO tasks as equal to dead were excluded. Furthermore, respondents who had a positive slope on the regression of their cTTO values on the level sum score were excluded. In total, 27 of the 1000 respondents were excluded. No DCE data were excluded from the analysis.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1000 interviews were conducted by 33 interviewers. The variance of the responses included in the final value set can be partitioned into variance related to differences between interviewers (7.3%), respondents (31.9%), and responses (60.8%).
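The reported percentages are each variance component's share of the total variance. The sketch below assumes the three components have already been estimated (for instance from a multilevel model with random intercepts for interviewers and for respondents, which is an assumed specification); the component values used are hypothetical and not taken from the Spanish study.

```python
def variance_shares(interviewer: float, respondent: float, response: float) -> dict:
    """Express three estimated variance components as percentages of their sum."""
    total = interviewer + respondent + response
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {
        "interviewer": 100 * interviewer / total,
        "respondent": 100 * respondent / total,
        "response": 100 * response / total,
    }


# Hypothetical variance components (not estimates from any valuation study)
shares = variance_shares(0.004, 0.016, 0.030)
print({k: round(v, 1) for k, v in shares.items()})
# {'interviewer': 8.0, 'respondent': 32.0, 'response': 60.0}
```

By construction the three shares sum to 100%, as do the percentages reported in each study summary.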

#### (viii) **Description of modelling choices**

The Spanish EQ-5D-5L value set was based on a hybrid model combining a conditional logit model for the DCE data and a Tobit model censored at −1 for the cTTO data, with a correction for heteroskedasticity. Furthermore, cTTO responses were treated as intervals rather than point values. For respondents who were not shown the worse-than-dead (WTD) task in the explanations (see Chap. 2 for more details), the data were treated as censored at 0. The intercept was constrained in the final model, as it was not statistically significant.
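The core idea of a hybrid model is that both data sources share one value function V(X): cTTO values enter through a Tobit likelihood (left-censored at −1) and DCE choices through a conditional logit likelihood, and the two log-likelihoods are added. The simplified sketch below illustrates only that idea. It omits the heteroskedasticity correction, the interval treatment of cTTO responses and the censoring at 0 described above; `value_fn`, `sigma` and `theta` (a DCE scale parameter) are hypothetical names.

```python
import math


def norm_logpdf(z: float) -> float:
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def norm_logcdf(z: float) -> float:
    """Log of the standard normal CDF (adequate for moderate z)."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2)))


def tobit_loglik(y: float, mu: float, sigma: float, lower: float = -1.0) -> float:
    """One cTTO observation from a Tobit model left-censored at `lower`."""
    if y <= lower:  # observed at the bound: only P(latent <= lower) is known
        return norm_logcdf((lower - mu) / sigma)
    return norm_logpdf((y - mu) / sigma) - math.log(sigma)


def clogit_loglik(chose_a: bool, v_a: float, v_b: float, theta: float) -> float:
    """One paired DCE choice; theta rescales the values to the logit scale."""
    u_a, u_b = theta * v_a, theta * v_b
    m = max(u_a, u_b)  # log-sum-exp for numerical stability
    log_denominator = m + math.log(math.exp(u_a - m) + math.exp(u_b - m))
    return (u_a if chose_a else u_b) - log_denominator


def hybrid_loglik(ctto, dce, value_fn, sigma: float, theta: float) -> float:
    """ctto: [(state, observed value)]; dce: [(state_a, state_b, chose_a)].
    Both parts share the same value function V(X) = value_fn(state)."""
    total = sum(tobit_loglik(y, value_fn(s), sigma) for s, y in ctto)
    total += sum(clogit_loglik(c, value_fn(a), value_fn(b), theta) for a, b, c in dce)
    return total
```

In practice the coefficients inside `value_fn`, together with `sigma` and `theta`, would be chosen to maximise `hybrid_loglik`, for example by passing its negative to a general-purpose optimiser such as `scipy.optimize.minimize`.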

#### (ix) **Value Set** (Table 4.20 and Fig. 4.11)

**Table 4.20** Key characteristics of the Spanish value set


**Fig. 4.11** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities)


#### (x) **Uptake by local HTA/health care decision makers**

Cost-utility analysis (CUA) is currently not mandatory in pharmacoeconomic and health technology assessment reports for either AEMPS (Agencia Española de Medicamentos y Productos Sanitarios) or RedETS (Red Española de Agencias de Evaluación de Tecnologías Sanitarias y Prestaciones del Sistema Nacional de Salud), the agencies that provide evidence for reimbursement decisions on drugs and technical devices, respectively, in Spain. However, CUA is required by the State Health Authority when assessing national public health programmes such as population screening or vaccination programmes.

#### (xi) **Reference(s) of value set**

Ramos-Goñi JM, Craig BM, Oppe M, Ramallo-Fariña Y, Pinto-Prades JL, Luo N, Rivero-Arias O (2018) Handling data quality issues to estimate the Spanish EQ-5D-5L value set using a hybrid interval regression approach. Value Health 21(5):596–604

#### **Further Literature**

Ramos-Goñi JM, Pinto-Prades JL, Oppe M, Cabasés JM, Serrano-Aguilar P, Rivero-Arias O (2017) Valuation and modeling of EQ-5D-5L health states using a hybrid approach. Med Care 55(7):e51–e58

Spanish Ministry of Health (2012) Spanish national health survey 2011/2012. https://www.mscbs.gob.es/estadEstudios/estadisticas/encuestaNacional/encuesta2011.htm. Accessed 13 July 2021

### *4.3.2 Wave 2*

#### **4.3.2.1 Country/Region: Japan (Table 4.21)**


**Table 4.21** Overview of EQ-5D-5L value set for Japan

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) &= 1 - 0.0609 - 0.0639 \, MO\_2 - 0.1126 \, MO\_3 - 0.1790 \, MO\_4 - 0.2429 \, MO\_5 \\ &- 0.0436 \, SC\_2 - 0.0767 \, SC\_3 - 0.1243 \, SC\_4 - 0.1597 \, SC\_5 - 0.0504 \, UA\_2 \\ &- 0.0911 \, UA\_3 - 0.1479 \, UA\_4 - 0.1748 \, UA\_5 - 0.0445 \, PD\_2 - 0.0682 \, PD\_3 \\ &- 0.1314 \, PD\_4 - 0.1912 \, PD\_5 - 0.0718 \, AD\_2 - 0.1105 \, AD\_3 - 0.1682 \, AD\_4 \\ &- 0.1960 \, AD\_5 \end{aligned}$$

#### (i) **Date/wave of study**

Data were collected in the second wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 1.1. Interviews were conducted between March and June 2014.

#### (ii) **Sample size; sample frame**

1098 interviews with the adult general population aged over 20 years were conducted in five cities in Japan: Tokyo, Okayama, Nagoya, Osaka, and Niigata. Respondents were recruited by a research company (ANTE-RIO Inc.), which sampled approximately 200 respondents at each location. The sample size was not determined on the basis of statistical considerations; respondents were stratified by sex and age group in each location so that the same number was collected in each cell.

Following the application of quality control, data from respondents interviewed by three interviewers were excluded from analysis. Of the 1026 respondents included in the analysis, 49.8% were female and 50.2% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was indicated to be broadly representative of the adult general public of Japan – however the paper does not report descriptive statistics for the general population to enable direct comparison with the characteristics of the analysis sample. The sample characteristics are provided in Table 4.22.



Reproduced from Shiroiwa et al. (2016)

a Percentages for age and gender are based on 'Vital Statistics', Ministry of Health, Labour and Welfare; education statistics are from the Japan 2010 National Census

#### (iv) **Mean observed cTTO values by EQ-5D-5L state** (Table 4.23)


**Table 4.23** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.12)

**Fig. 4.12** Proportions choosing A based on relative severities of A and B
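Figures of this kind group DCE pairs by how much more severe state A is than state B, measured as the level sum score of A minus that of B, and plot the share of respondents choosing A in each group. A minimal sketch, with four hypothetical responses and hypothetical function names:

```python
from collections import defaultdict


def level_sum_score(state: str) -> int:
    """Sum of the five level digits of an EQ-5D-5L state."""
    return sum(int(ch) for ch in state)


def proportions_choosing_a(choices):
    """choices: iterable of (state_a, state_b, chose_a) tuples.
    Groups pairs by LSS(A) - LSS(B) and returns {difference: share choosing A}."""
    tallies = defaultdict(lambda: [0, 0])  # difference -> [times A chosen, pairs]
    for state_a, state_b, chose_a in choices:
        diff = level_sum_score(state_a) - level_sum_score(state_b)
        tallies[diff][0] += int(chose_a)
        tallies[diff][1] += 1
    return {d: chosen / total for d, (chosen, total) in sorted(tallies.items())}


# Four hypothetical responses to two DCE pairs
choices = [
    ("21111", "33111", True),   # A is milder (difference -3)
    ("21111", "33111", True),
    ("32211", "21211", False),  # A is more severe (difference +2)
    ("32211", "21211", True),
]
print(proportions_choosing_a(choices))  # {-3: 1.0, 2: 0.5}
```

With consistent responses, the share choosing A should fall as the difference grows, since a larger positive difference means A is the more severe state.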

#### (vi) **Exclusion criteria**

Quality control flagged three interviewers as failing to comply with the protocol; some or all of the data from the 72 respondents they interviewed were excluded from the analysis data set.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1098 interviews were conducted by 31 interviewers. The variance of the responses can be partitioned into variance related to differences between interviewers (0.95%), respondents (38.77%), and responses (60.28%).

#### (viii) **Description of modelling choices**

The preferred value set was modelled using cTTO data only. The data were analysed with a linear mixed model with "1 − QOL score" as the dependent variable. To account for intra-respondent correlation, a constant term and dummy variables representing the levels of the five dimensions were treated as fixed effects, and respondents were treated as random effects.
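Before such a model can be fitted, each cTTO response has to be turned into a regression row: the dependent variable 1 − value and 20 level dummies (MO_2 to AD_5, with level 1 as the reference). The sketch below builds one such row; the names are hypothetical. Rows for all responses, plus a respondent identifier, could then be passed to a mixed-model routine, for example statsmodels' `mixedlm` formula interface with `groups` set to the respondent identifier.

```python
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def design_row(state: str, ctto_value: float) -> dict:
    """One regression row: dependent variable 1 - value, plus 20 level dummies."""
    row = {"y": 1.0 - ctto_value}  # disutility, the outcome of the Japanese model
    for dim, ch in zip(DIMENSIONS, state):
        level = int(ch)
        for k in range(2, 6):  # level 1 is the reference category
            row[f"{dim}_{k}"] = 1 if level == k else 0
    return row


row = design_row("21345", 0.25)
print(row["y"], sum(v for k, v in row.items() if k != "y"))  # 0.75 4
```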

#### (ix) **Value Set** (Table 4.24 and Fig. 4.13)


**Table 4.24** Key characteristics of the Japanese value set

**Fig. 4.13** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

According to Sect. 8.2.1 of the Japanese HTA Guidelines, "If Japanese quality of life scores are newly collected for a cost-effectiveness analysis, the use of preference-based measures with a value set developed in Japan using TTO (or mapped onto a TTO score) is recommended as the first choice." (C2H 2019) The Japanese EQ-5D-5L value set reported here therefore meets the stated requirements of Japan's HTA body, although it is unclear how often it has been used in evidence submitted to it.

#### (xi) **Reference(s) for this value set**

Shiroiwa T, Ikeda S, Noto S, Igarashi A, Fukuda T, Saito S, Shimozuma K (2016) Comparison of value set based on DCE and/or TTO data: scoring for EQ-5D-5L health states in Japan. Value Health 19(5): 648–655

#### **Further Literature**


#### **4.3.2.2 Country/Region: Korea (Table 4.25)**


**Table 4.25** Overview of EQ-5D-5L value set for Korea

N4 is a dummy variable set to 1 where *any* dimension has *at least* a level 4, and 0 otherwise

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) &= 1 - 0.096 - 0.046 \, MO\_2 - 0.058 \, MO\_3 - 0.133 \, MO\_4 - 0.251 \, MO\_5 \\ &- 0.032 \, SC\_2 - 0.050 \, SC\_3 - 0.078 \, SC\_4 - 0.122 \, SC\_5 - 0.021 \, UA\_2 \\ &- 0.051 \, UA\_3 - 0.100 \, UA\_4 - 0.175 \, UA\_5 - 0.042 \, PD\_2 - 0.053 \, PD\_3 \\ &- 0.166 \, PD\_4 - 0.207 \, PD\_5 - 0.033 \, AD\_2 - 0.046 \, AD\_3 - 0.102 \, AD\_4 \\ &- 0.137 \, AD\_5 - 0.078 \, N4 \end{aligned}$$
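The Korean model adds one term to the usual main effects: N4 is switched on whenever any dimension is at level 4 or 5. A minimal sketch using the coefficients reported for the Korean value set; as for the Dutch set, it assumes the constant (0.096) applies to every state except 11111, and all names are hypothetical.

```python
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")
KOREA_CONSTANT = 0.096  # assumed to apply to every state except 11111
KOREA_N4 = 0.078        # extra decrement if any dimension is at level 4 or 5
KOREA_DECREMENTS = {
    "MO": {2: 0.046, 3: 0.058, 4: 0.133, 5: 0.251},
    "SC": {2: 0.032, 3: 0.050, 4: 0.078, 5: 0.122},
    "UA": {2: 0.021, 3: 0.051, 4: 0.100, 5: 0.175},
    "PD": {2: 0.042, 3: 0.053, 4: 0.166, 5: 0.207},
    "AD": {2: 0.033, 3: 0.046, 4: 0.102, 5: 0.137},
}


def korean_value(state: str) -> float:
    """Return V(X) for a five-digit EQ-5D-5L state under the Korean model."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    if state == "11111":
        return 1.0
    levels = [int(ch) for ch in state]
    value = 1.0 - KOREA_CONSTANT
    for dim, level in zip(DIMENSIONS, levels):
        if level > 1:
            value -= KOREA_DECREMENTS[dim][level]
    if max(levels) >= 4:  # N4 = 1
        value -= KOREA_N4
    return round(value, 3)


print(korean_value("11113"))  # 0.858 (no level 4 or 5, so N4 = 0)
print(korean_value("11114"))  # 0.724 (N4 = 1)
```

The N4 term produces a step in value as soon as any dimension reaches level 4, on top of that dimension's own decrement.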

#### (i) **Date/wave of study**

Data were collected in the second wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 1.1. Interviews were performed between August 9 and November 13, 2013.

#### (ii) **Sample size; sample frame**

Sampling was performed using a multistage stratified quota method. A sample quota was assigned to each of the 15 regions according to population structure (population in region, sex, age, and education level), as defined in the June 2013 resident registration data available through the Ministry of Administration and Security, South Korea.

Of the 1080 respondents included in the analysis, 50.6% were female and 49.4% were male.

The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was representative of the Korean population in terms of age (over 18 years), sex, education and employment status (Table 4.26).

**Table 4.26** Representativeness of the sample in the Korean valuation study


Reproduced from Kim et al. (2016)

a 2010 Census

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.27)


**Table 4.27** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.14)

**Fig. 4.14** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

Five respondents who responded with the same answer for all 10 health states of cTTO were excluded from the modelling dataset.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1080 interviews were conducted by 27 interviewers. The variance of the responses can be partitioned into variance related to differences between interviewers (12.54%), respondents (20.69%), and responses (66.77%).

#### (viii) **Description of modelling choices**

The value set was based on cTTO data only, using 20 dummy variables (4 levels for each of the 5 dimensions). Three criteria were used to select the final model: (1) the model must produce logically consistent predictions; (2) goodness of fit, judged using the mean absolute error (MAE), the generalized R² and the number of health states with an absolute error <0.05 or <0.1; and (3) where models with similar MAEs were consistent, the simplest model was selected to maintain parsimony. The final model includes a constant and a term (N4) indicating whether any dimension in the state was at level 4 or 5.
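The goodness-of-fit statistics in criterion (2) compare, state by state, the mean observed cTTO value with the model's prediction. A minimal sketch of the MAE and the threshold counts (the generalized R² is omitted); the states and values are hypothetical.

```python
def fit_summary(observed: dict, predicted: dict, thresholds=(0.05, 0.1)) -> dict:
    """Compare mean observed cTTO values with model predictions, state by state.
    Returns the MAE and, for each threshold t, how many states have |error| < t."""
    errors = [abs(observed[s] - predicted[s]) for s in observed]
    summary = {"MAE": sum(errors) / len(errors)}
    for t in thresholds:
        summary[f"n(|error| < {t})"] = sum(e < t for e in errors)
    return summary


# Hypothetical mean observed values and predictions for three states
observed = {"21111": 0.90, "33323": 0.40, "55555": -0.10}
predicted = {"21111": 0.88, "33323": 0.32, "55555": 0.10}
print(fit_summary(observed, predicted))
```

Here the absolute errors are about 0.02, 0.08 and 0.20, so the MAE is about 0.1, one state is within 0.05 and two are within 0.1.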

#### (ix) **Value Set** (Table 4.28 and Fig. 4.15)


**Table 4.28** Key characteristics of the Korean value set

**Fig. 4.15** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

CUA is currently the preferred form of economic evaluation in the third version of the Pharmacoeconomic Guideline, revised in 2021 by HIRA (Health Insurance Review Agency), the institute that assesses the cost-effectiveness of healthcare services to determine whether to include them in the benefit package and decides their reimbursement price (HIRA 2021). There is no preference for a specific multi-attribute utility instrument, but the recommended source of values is a representative sample of the general population, preferably Korean (Bae et al. 2013). For the EQ-5D, value sets for both the 3-level and the 5-level versions have been developed from the Korean general population (Jo et al. 2008; Lee et al. 2009; Kim et al. 2016), and these instruments have been applied in economic evaluations informing reimbursement decisions on new drugs.

#### (xi) **Reference(s) for this value set**

Kim SH, Ahn J, Ock M, Shin S, Park J, Luo N, Jo MW (2016) The EQ-5D-5L valuation study in Korea. Qual Life Res 25(7):1845–1852

#### **Further Literature**


#### **4.3.2.3 Country/Region: Thailand (Table 4.29)**


**Table 4.29** Overview of EQ-5D-5L value set for Thailand

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V\left(X\right) &= 1 - 0.066\,MO\_2 - 0.087\,MO\_3 - 0.211\,MO\_4 - 0.371\,MO\_5 - 0.058\,SC\_2\\ &- 0.071\,SC\_3 - 0.193\,SC\_4 - 0.250\,SC\_5 - 0.058\,UA\_2 - 0.071\,UA\_3\\ &- 0.154\,UA\_4 - 0.248\,UA\_5 - 0.056\,PD\_2 - 0.067\,PD\_3 - 0.207\,PD\_4\\ &- 0.256\,PD\_5 - 0.058\,AD\_2 - 0.096\,AD\_3 - 0.233\,AD\_4 - 0.295\,AD\_5 \end{aligned}$$

#### (i) **Date/wave of study**

Data were collected in the second wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 1.1. Interviews were conducted between August 2013 and January 2014.

#### (ii) **Sample size; sample frame**

1207 interviews with the general population were conducted in the capital Bangkok and the following 11 provinces: Sing Buri, Trat, Suphan Buri, Chiang Mai, Chiang Rai, Sukhothai, Surin, Nong Bua Lam Phu, Roi Et, Krabi, and Nakhon Si Thammarat. Within these provinces, probability-based sampling was used to select geographical subunits, from each of which 10 respondents were selected using quota sampling with respect to age and gender. Of the 1207 respondents included in the final value set, 51.6% were female and 48.4% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was representative of the Thai population in terms of age (over 18 years), gender, residential area, and number of children (Table 4.30).

**Table 4.30** Representativeness of the sample in the Thai valuation study

| | Study sample (N=1207) | Thai general population<sup>a</sup> |
|---|---|---|
| **Sampling characteristics** | | |
| Age, n (%) | | |
| 18–29 | 251 (20.8%) | 20.2% |
| 30–39 | 262 (21.7%) | 22.7% |
| 40–49 | 273 (22.6%) | 22.6% |
| 50–59 | 208 (17.2%) | 16.9% |
| ≥60 | 213 (17.7%) | 17.6% |
| Gender, n (%) | | |
| Female | 623 (51.6%) | 50.9% |
| Male | 584 (48.4%) | 49.1% |
| Residential area, n (%) | | |
| Urban | 523 (43.3%) | 44.2% |
| Rural | 684 (56.7%) | 55.8% |
| Education, n (%) | | |
| Primary school or lower | 543 (45.0%) | 52.8% |
| High school | 533 (44.2%) | 30.1% |
| Bachelor or higher | 131 (10.9%) | 17.1% |
| Number of children, mean (SD) | 1.7 (1.6) | 1.5 |

Reproduced from Pattanaphesaj et al. (2018)

a National Statistics Office 2012 and 2013
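Representativeness statements of this kind rest on comparing sample and population percentages category by category. The sketch below computes percentage-point gaps from Table 4.30 and flags those larger than a threshold; the 5-point threshold is an arbitrary assumption for illustration, not a criterion used in the Thai study.

```python
# (study sample %, Thai general population %) from Table 4.30
age = {
    "18-29": (20.8, 20.2), "30-39": (21.7, 22.7), "40-49": (22.6, 22.6),
    "50-59": (17.2, 16.9), ">=60": (17.7, 17.6),
}
education = {
    "Primary school or lower": (45.0, 52.8),
    "High school": (44.2, 30.1),
    "Bachelor or higher": (10.9, 17.1),
}


def large_gaps(table: dict, threshold: float = 5.0) -> dict:
    """Return {category: sample % - population %} where |gap| exceeds threshold."""
    gaps = {k: round(s - p, 1) for k, (s, p) in table.items()}
    return {k: g for k, g in gaps.items() if abs(g) > threshold}


print(large_gaps(age))        # {}
print(large_gaps(education))  # {'Primary school or lower': -7.8, 'High school': 14.1, 'Bachelor or higher': -6.2}
```

This matches the text: age is close to the population distribution, whereas education is not among the characteristics for which the sample is described as representative.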

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.31)


**Table 4.31** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.16)

**Fig. 4.16** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

cTTO responses were excluded if (1) a respondent assigned the same value to all 10 states valued, (2) a respondent had a positive slope on the regression of their cTTO values on the level sum score, or (3) there were severely irrational responses, defined as major inconsistencies (e.g. severe dominance violations). Two respondents were excluded from analysis. No DCE data were excluded from the analysis.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1207 interviews were conducted by 6 interviewers. The variance of the responses included in the final analysis can be partitioned into variance related to differences between interviewers (1.8%), respondents (17.1%) and responses (81.1%).

#### (viii) **Description of modelling choices**

The Thai EQ-5D-5L value set was based on a hybrid model combining a conditional logit model for the DCE data and a Tobit model censored at −1 for the cTTO data, with a correction for heteroskedasticity. The intercept was constrained in the final model.

#### (ix) **Value Set** (Table 4.32 and Fig. 4.17)

**Table 4.32** Key characteristics of the Thai value set


**Fig. 4.17** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

Economic evidence is one type of evidence to be considered during the decision-making process for both pharmaceutical and non-pharmaceutical interventions in Thailand (Leelahavarong et al. 2019). Cost-utility analysis (CUA) using QALYs is required for economic evidence (such as when considering high-cost medicines for public funding). EQ-5D-5L and its value set are recommended in the National Health Technology Assessment (HTA) guideline as a method to estimate health utility and subsequently QALYs. The Health Intervention and Technology Assessment Program (HITAP) has been offering annual training in economic evaluation to policymakers and researchers, which includes how to interpret and use EQ-5D-5L values in HTA. EQ-5D-5L is currently being widely adopted in health economic evaluations in Thailand.

#### (xi) **Reference(s) of value set**

Pattanaphesaj J, Thavorncharoensap M, Ramos-Goñi JM, Tongsiri S, Ingsrisawang L, Teerawattananon Y (2018) The EQ-5D-5L valuation study in Thailand. Expert Rev Pharmacoecon Outcomes Res 18(5): 551–558

#### **Further Literature**


#### **4.3.2.4 Country/Region: Uruguay (Table 4.33)**


**Table 4.33** Overview of EQ-5D-5L value set for Uruguay

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) &= 1 - 0.013 - 0.014 \, MO\_2 - 0.032 \, MO\_3 - 0.108 \, MO\_4 - 0.299 \, MO\_5 \\ &- 0.026 \, SC\_2 - 0.061 \, SC\_3 - 0.117 \, SC\_4 - 0.273 \, SC\_5 - 0.042 \, UA\_2 \\ &- 0.046 \, UA\_3 - 0.118 \, UA\_4 - 0.232 \, UA\_5 - 0.017 \, PD\_2 - 0.061 \, PD\_3 \\ &- 0.187 \, PD\_4 - 0.271 \, PD\_5 - 0.010 \, AD\_2 - 0.044 \, AD\_3 - 0.104 \, AD\_4 \\ &- 0.177 \, AD\_5 \end{aligned}$$

#### (i) **Date/wave of study**

Data were collected in the second wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 1.1. Interviews were conducted between October 2013 and June 2014.

#### (ii) **Sample size; sample frame**

805 interviews with the general population were conducted. Respondents were recruited from 3 Uruguayan regions (Montevideo and the departments of Maldonado and Paysandú) using a stratified approach, with quotas for location, age, gender and socio-economic status. Of the 794 respondents included in the final value set, 55.3% were female and 44.7% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was broadly representative of the Uruguayan population. However, younger and more highly educated respondents were slightly over-represented (Table 4.34).


**Table 4.34** Representativeness of the sample in the Uruguayan valuation study

Reproduced from Augustovski et al. (2016)

a National Institute of Statistics Uruguay 2011

b Missing values (N=2)

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.35)


**Table 4.35** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.18)

**Fig. 4.18** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

Respondents were excluded (1) if they had a positive slope on the regression of their cTTO values on the level sum score, or (2) if they assigned the same value to all health states (except if all states were assigned the value 1, i.e., non-traders). In total, 11 respondents were excluded from analysis.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 805 interviews were conducted by 11 interviewers. The variance of the responses included in the final value set can be partitioned into variance related to differences between interviewers (5.5%), respondents (26.9%) and responses (67.6%).

#### (viii) **Description of modelling choices**

The Uruguayan EQ-5D-5L value set was based on cTTO data only. The selected model was a robust regression with the tuning constant set to 8.5.
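Robust regression down-weights observations with large residuals, and the tuning constant controls how large a residual must be before its weight reaches zero. One common scheme is Tukey's biweight, applied repeatedly within iteratively reweighted least squares. The sketch below computes biweight weights for one iteration; whether the Uruguayan study used exactly this weight function and the MAD/0.6745 scale estimate is an assumption, since the text only reports the tuning constant.

```python
from statistics import median


def biweight_weights(residuals, c=8.5):
    """Tukey biweight weights, with the residual scale estimated as MAD/0.6745.
    Residuals more than c scale units from zero receive weight 0."""
    centre = median(residuals)
    mad = median(abs(r - centre) for r in residuals)
    scale = mad / 0.6745  # MAD rescaled to estimate a normal standard deviation
    if scale == 0:
        return [1.0] * len(residuals)
    weights = []
    for r in residuals:
        u = r / (c * scale)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return weights


# Hypothetical residuals with one gross outlier
residuals = [-0.1, -0.05, 0.0, 0.05, 0.1, 5.0]
weights = biweight_weights(residuals)
```

A larger tuning constant keeps more observations at high weight and makes the fit closer to ordinary least squares; a smaller one discards outliers more aggressively.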

#### (ix) **Value Set** (Table 4.36 and Fig. 4.19)

**Table 4.36** Key characteristics of the Uruguayan value set

**Fig. 4.19** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

The Uruguayan EQ-5D-5L value set has been disseminated in scientific meetings in Uruguay and Latin America and used in several scientific projects. Research is currently ongoing to collect population health data using EQ-5D-5L in relevant health conditions, in order to use the Uruguayan weights. There are two HTA bodies in Uruguay: División de Evaluación Sanitaria/Ministerio de Salud Publica and Fondo Nacional de Recursos (FNR). Neither of these HTA bodies currently makes a specific recommendation to use the Uruguayan EQ-5D-5L value set in economic evaluations.

#### (xi) **Reference(s) of value set**

Augustovski F, Rey-Ares L, Irazola V, Garay OU, Gianneo O, Fernández G, Morales M, Gibbons L, Ramos-Goñi JM (2016) An EQ-5D-5L value set based on Uruguayan population preferences. Qual Life Res 25(2):323–333

#### **Further Literature**

National Institute of Statistics Uruguay (Instituto Nacional de Estadística) (2011) Census http://www.ine.gub.uy/censos2011/index.html. Accessed 18 July 2021

#### **4.3.2.5 Country/Region: Hong Kong (Table 4.37)**


**Table 4.37** Overview of EQ-5D-5L value set for Hong Kong

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) &= 1 - 0.109 \, MO\_2 - 0.182 \, MO\_3 - 0.371 \, MO\_4 - 0.529 \, MO\_5 - 0.087 \, SC\_2 \\ &- 0.113 \, SC\_3 - 0.271 \, SC\_4 - 0.352 \, SC\_5 - 0.067 \, UA\_2 - 0.094 \, UA\_3 \\ &- 0.234 \, UA\_4 - 0.282 \, UA\_5 - 0.076 \, PD\_2 - 0.147 \, PD\_3 - 0.307 \, PD\_4 \\ &- 0.354 \, PD\_5 - 0.080 \, AD\_2 - 0.140 \, AD\_3 - 0.293 \, AD\_4 - 0.348 \, AD\_5 \end{aligned}$$

#### (i) **Date/wave of study**

Data were collected in the second wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 1.1. Interviews were conducted between June 2014 and October 2015.

#### (ii) **Sample size; sample frame**

A total of 1014 Hong Kong residents aged 18 and above participated in this study. Stratified quota sampling was applied based on age, sex and educational attainment over three geographical areas of Hong Kong: Hong Kong Island, Kowloon, and New Territories. Of the 999 respondents included in the final value set, 59.2% were female, and 40.8% were male. Furthermore, 29.8% of the respondents had at least 1 chronic health condition. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was representative of the Hong Kong general population in terms of age, sex and highest educational attainment. The distribution of the study sample in terms of marital status, employment status, and area of residence approximated that of the general population (Table 4.38).


**Table 4.38** Representativeness of the sample in the Hong Kong valuation study

Reproduced from Wong et al. (2018)

a Census and Statistics Department (2012)

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.39)


**Table 4.39** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.20)

**Fig. 4.20** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

Three respondents with a positive slope on a regression of their cTTO values on the severity of the states were excluded from analysis. In addition, one respondent who valued all states at zero and 11 respondents who valued all states at −1 were removed from analysis. All DCE data were included in the analysis. A further 515 of 9990 (5.16%) cTTO responses were removed following issues flagged in the feedback module.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1014 interviews were conducted by 6 interviewers. The variance of the responses included in the final value set can be partitioned into variance related to differences between interviewers (2.2%), respondents (33.1%), and responses (64.7%).

#### (viii) **Description of modelling choices**

The Hong Kong EQ-5D-5L value set was based on a hybrid model combining a conditional logit model for the DCE data and a Tobit model censored at −1 for the cTTO data. The intercept was constrained in the final model, as it was not statistically significant.

#### (ix) **Value Set** (Table 4.40 and Fig. 4.21)


**Table 4.40** Key characteristics of the Hong Kong value set

**Fig. 4.21** Value decrements across dimensions (*MO* mobility, *SC* self-care, *UA* usual activities, *PD* pain/discomfort, *AD* anxiety/depression)

#### (x) **Uptake by local HTA/health care decision makers**

The EQ-5D-5L has been adopted in healthcare management for individual patients, in education on healthcare delivery, and in health policy to promote patient-centred care in hospital settings in Hong Kong. It has been used to explore the association between health-related quality of life and shared decision-making among patients, with the aim of enhancing communication between health professionals and patients. The tool was incorporated into the locally validated patient-reported experience measures (PREMs) as a measure of health outcomes alongside patient experience in different healthcare settings, including inpatient, specialist outpatient, and accident and emergency departments under the Hospital Authority (HA) in Hong Kong. It has also been applied in health-related population surveys and in routine measurement among patients with chronic conditions, such as musculoskeletal problems and diabetes mellitus, and among older people with hypertension, in both clinical and non-clinical settings.

#### (xi) **Reference(s) for this value set**

Wong EL, Ramos-Goñi JM, Cheung AW, Wong AY, Rivero-Arias O (2018) Assessing the use of a feedback module to model EQ-5D-5L health states values in Hong Kong. Patient 11(2):235–247

#### **Further Literature**


### *4.3.3 Wave 3*

#### **4.3.3.1 Country/Region: France (Table 4.41)**


**Table 4.41** Overview of EQ-5D-5L value set for France

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

a The preferred model is the adjusted hybrid model; these values are taken from Andrade et al. 2020, Table 3 ('weighted model') and rounded to 3 decimal places

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) = 1 &- 0.038\,MO_2 - 0.048\,MO_3 - 0.179\,MO_4 - 0.325\,MO_5 \\ &- 0.037\,SC_2 - 0.051\,SC_3 - 0.172\,SC_4 - 0.258\,SC_5 \\ &- 0.033\,UA_2 - 0.040\,UA_3 - 0.157\,UA_4 - 0.240\,UA_5 \\ &- 0.022\,PD_2 - 0.047\,PD_3 - 0.264\,PD_4 - 0.444\,PD_5 \\ &- 0.020\,AD_2 - 0.047\,AD_3 - 0.200\,AD_4 - 0.258\,AD_5 \end{aligned}$$
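To make the value set concrete, the short Python sketch below scores a five-digit EQ-5D-5L profile (e.g. `12345`, one digit per dimension in the order MO, SC, UA, PD, AD) by subtracting the relevant regular-dummy decrements from 1. The coefficients are transcribed from the French model in Table 4.41; the function name `eq5d5l_value` and the rounding to three decimals are illustrative choices, not part of any official EQ-5D scoring tool.

```python
# Regular-dummy decrements for France (Table 4.41): levels 2..5 per dimension.
FRANCE = {
    "MO": (0.038, 0.048, 0.179, 0.325),
    "SC": (0.037, 0.051, 0.172, 0.258),
    "UA": (0.033, 0.040, 0.157, 0.240),
    "PD": (0.022, 0.047, 0.264, 0.444),
    "AD": (0.020, 0.047, 0.200, 0.258),
}
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")  # digit order in a profile


def eq5d5l_value(state: str, decrements: dict) -> float:
    """Return V(X) = 1 minus the decrements for a profile such as '21345'."""
    if len(state) != 5 or any(c not in "12345" for c in state):
        raise ValueError(f"invalid EQ-5D-5L profile: {state!r}")
    value = 1.0
    for dim, digit in zip(DIMENSIONS, state):
        level = int(digit)
        if level > 1:  # level 1 ("no problems") carries no decrement
            value -= decrements[dim][level - 2]
    return round(value, 3)


print(eq5d5l_value("12345", FRANCE))  # 0.401
```

Because every decrement is non-negative and grows with level, `eq5d5l_value("55555", FRANCE)` returns the lowest value of this set, −0.525. Another country's coefficients can be swapped in, provided they use the same regular-dummy coding.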

#### (i) **Date/wave of study**

Data were collected in the third wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 2.0. Interviews were conducted from March until November 2018.

#### (ii) **Sample size; sample frame**

A total of 1143 interviews with members of the general population were conducted. Quota-based sampling with respect to age, sex, and socioeconomic status was applied (National Institute for Statistics and Economic Studies [INSEE] 2018). Interviewers were selected to provide reasonable coverage of the French territory and of places of residence of different population sizes. Of the 1048 respondents included in the final value set, 55.4% were female and 44.6% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was broadly representative of the French population aged over 18. However, the final sample used in modelling (following the exclusions described in (vi) below) differed from the French general population in terms of age and gender. Female respondents were overrepresented, as were respondents aged 25–34 years of both sexes. Women aged 75 and older were underrepresented, whereas women aged 55–64 years were overrepresented (Table 4.42).


**Table 4.42** Representativeness of the sample in the French valuation study

a National Institute for Statistics and Economic Studies (INSEE) 2018

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.43)


**Table 4.43** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.22)

**Fig. 4.22** Proportions choosing A based on relative severities of A and B
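As an illustration of how a plot like Fig. 4.22 can be tabulated, the sketch below groups DCE pairs by the difference in level sum score (LSS, the sum of the five level digits) between states A and B and reports the share of pairs in which A was chosen. Using the LSS difference as the measure of relative severity is an assumption made here for illustration; the data format, the function names, and the responses in `demo` are hypothetical.

```python
from collections import defaultdict


def level_sum_score(state: str) -> int:
    """Sum of the five level digits: 5 for 11111 up to 25 for 55555."""
    return sum(int(c) for c in state)


def proportion_choosing_a(choices):
    """Share of pairs in which A was chosen, keyed by LSS(A) - LSS(B).

    choices: iterable of (state_a, state_b, chose_a) tuples, chose_a a bool.
    A positive key means A is the more severe state by this measure.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [times A chosen, pairs]
    for state_a, state_b, chose_a in choices:
        diff = level_sum_score(state_a) - level_sum_score(state_b)
        counts[diff][0] += int(chose_a)
        counts[diff][1] += 1
    return {d: chosen / n for d, (chosen, n) in sorted(counts.items())}


# Hypothetical responses, for illustration only.
demo = [
    ("21111", "12111", True),
    ("21111", "12111", False),
    ("11211", "44344", True),
    ("44344", "11211", False),
]
print(proportion_choosing_a(demo))  # {-13: 1.0, 0: 0.5, 13: 0.0}
```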

#### (vi) **Exclusion criteria**

Ninety-five interviews were excluded from the analysis because of poor data quality. An interview was excluded if the interviewer did not show the worse-than-dead (WTD) task in the wheelchair example, or if the respondent gave state 55555 a higher value than the mildest health state presented in the cTTO tasks. In addition, 6.5% of cTTO responses (n = 677) were removed following the feedback module. No DCE data were excluded from the analysis.
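The two interview-level exclusion rules described here can be written as a simple check. In this sketch, which is hypothetical and not the study's quality-control code, the "mildest" state is taken to be the valued state with the lowest level sum score (an assumption), and the input is a mapping from five-digit profiles to one respondent's cTTO values.

```python
def level_sum_score(state: str) -> int:
    """Sum of the five level digits of a profile such as '21111'."""
    return sum(int(c) for c in state)


def exclude_interview(showed_wtd_example: bool, ctto: dict) -> bool:
    """Hypothetical check of the two interview-level exclusion rules.

    showed_wtd_example: whether the interviewer showed the WTD task
        in the wheelchair example.
    ctto: mapping of five-digit profiles to this respondent's cTTO values.
    """
    if not showed_wtd_example:
        return True
    if "55555" not in ctto:
        return False
    others = [s for s in ctto if s != "55555"]
    if not others:
        return False
    # Assumption: "mildest" = the valued state with the lowest level sum score.
    mildest = min(others, key=level_sum_score)
    return ctto["55555"] > ctto[mildest]
```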

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1143 interviews were conducted by 11 interviewers. The variance of the responses included in the final value set can be partitioned into variance related to differences between interviewers (3.3%), respondents (11.1%), and responses (85.6%).
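The percentages in this partition are shares of total response variance. Assuming they come from a multilevel random-intercept model with interviewer, respondent, and response (residual) levels, each share is that level's variance component divided by the sum of all three. The sketch below shows only this arithmetic; `variance_shares` is a hypothetical helper, and the components in the example call are made-up numbers, not estimates from the French study.

```python
def variance_shares(interviewer: float, respondent: float, residual: float) -> dict:
    """Percentage of total variance at each level of a three-level
    random-intercept model (responses nested in respondents nested in
    interviewers). Inputs are variance components, not standard deviations."""
    total = interviewer + respondent + residual
    return {
        "interviewer": 100 * interviewer / total,
        "respondent": 100 * respondent / total,
        "response": 100 * residual / total,
    }


# Made-up variance components, for illustration only.
print(variance_shares(0.01, 0.04, 0.15))
```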

#### (viii) **Description of modelling choices**

The French EQ-5D-5L value set was based on a hybrid model combining a conditional logit model for the DCE data and a censored at −1 tobit model for the cTTO data, correcting for heteroskedasticity. The model was additionally adjusted to correct for imbalance in the sample in terms of age and gender distribution compared to the general population in France. The intercept was constrained in the final model because it was marginal and non-significant.
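One common way to correct for sample imbalance is post-stratification weighting, in which each respondent in an age-by-gender cell receives the weight population share ÷ sample share. The sketch below illustrates that general idea with hypothetical cells and shares; the exact adjustment used for the French 'weighted model' is described in Andrade et al. (2020) and may differ.

```python
def poststratification_weights(sample_counts: dict, population_shares: dict) -> dict:
    """Weight per cell = population share / sample share.

    sample_counts: cell -> number of respondents in the analysed sample.
    population_shares: cell -> share of the target population (sums to 1).
    """
    n = sum(sample_counts.values())
    return {
        cell: population_shares[cell] / (count / n)
        for cell, count in sample_counts.items()
    }


# Hypothetical age-by-gender cells; not the French census figures.
sample = {("F", "18-44"): 300, ("F", "45+"): 250, ("M", "18-44"): 200, ("M", "45+"): 250}
population = {("F", "18-44"): 0.25, ("F", "45+"): 0.25, ("M", "18-44"): 0.25, ("M", "45+"): 0.25}
weights = poststratification_weights(sample, population)
# Over-represented cells get weights below 1, under-represented cells above 1.
print(weights)
```

In a weighted model, each respondent's likelihood contribution would then typically be multiplied by the weight of their cell.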

#### (ix) **Value Set** (Table 4.44 and Fig. 4.23)


**Table 4.44** Key characteristics of the French value set

**Fig. 4.23** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

The French National Authority for Health (Haute Autorité de Santé [HAS]) recommends in its updated official methodological guide for economic evaluation that the EQ-5D-5L questionnaire and the French EQ-5D-5L value set should be the preferred measure used to derive utility values for use in HTA (HAS 2020).

#### (xi) **Reference(s) for this value set**

Andrade LF, Ludwig K, Ramos-Goñi JM, Oppe M, de Pouvourville G (2020) A French Value Set for the EQ-5D-5L. Pharmacoeconomics 38(4):413–425

#### **Further Literature**


#### **4.3.3.2 Country/Region: Germany (Table 4.45)**


**Table 4.45** Overview of EQ-5D-5L value set for Germany

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) = 1 &- 0.026\,MO_2 - 0.042\,MO_3 - 0.139\,MO_4 - 0.224\,MO_5 \\ &- 0.050\,SC_2 - 0.056\,SC_3 - 0.169\,SC_4 - 0.260\,SC_5 \\ &- 0.036\,UA_2 - 0.049\,UA_3 - 0.129\,UA_4 - 0.209\,UA_5 \\ &- 0.057\,PD_2 - 0.109\,PD_3 - 0.404\,PD_4 - 0.612\,PD_5 \\ &- 0.030\,AD_2 - 0.082\,AD_3 - 0.244\,AD_4 - 0.356\,AD_5 \end{aligned}$$

#### (i) **Date/wave of study**

Data were collected in the third wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 2.0. Interviews were conducted between December 2014 and March 2015.

#### (ii) **Sample size; sample frame**

A total of 1158 interviews with members of the general population were conducted in six cities and their surrounding areas in different parts of Germany: Berlin, Leipzig, Hamburg, Bielefeld, Munich, and Frankfurt. Quota-based sampling with respect to age, sex, educational level, and employment status was applied (Federal Statistical Office 2015). Of the 1158 respondents included in the final value set, 53.4% were female and 46.6% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was representative of the German population in terms of age, sex, education, and employment status (Table 4.46).


**Table 4.46** Representativeness of the sample in the German valuation study

Reproduced from Ludwig et al. (2018)

a Federal Statistical Office 2015

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.47)


**Table 4.47** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.24)

**Fig. 4.24** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

A total of 6.2% of cTTO responses (n = 713) were removed following the feedback module, but no respondent had all of their cTTO responses excluded. No DCE data were excluded from the analysis.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1158 interviews were conducted by 19 interviewers. The variance of the responses can be partitioned into variance related to differences between interviewers (2.7%), respondents (16.0%), and responses (81.3%).

#### (viii) **Description of modelling choices**

The German EQ-5D-5L value set was based on a hybrid model combining a conditional logit model for the DCE data and a censored at −1 tobit model for the cTTO data, correcting for heteroskedasticity. The intercept was constrained in the final model because it was marginal and non-significant.
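A hybrid model of this kind can be summarized as one log-likelihood with two parts that share the same 20 decrements: a tobit part (normal errors, left-censored at −1) for the cTTO values and a conditional logit part for the binary DCE choices. The sketch below writes that joint log-likelihood out under assumptions labeled in the code: the intercept is constrained so that 11111 scores 1, the DCE utilities are the values multiplied by a hypothetical scale parameter `theta`, and heteroskedasticity is modeled, hypothetically, as an error standard deviation that is log-linear in the level sum score. It is not the EQ-VT estimation code, and the published models may parameterize the error variance differently.

```python
import math


def dummies(state: str) -> list:
    """20 regular dummies ordered MO2..MO5, SC2..SC5, ..., AD2..AD5."""
    x = []
    for digit in state:
        level = int(digit)
        x.extend(1.0 if level == k else 0.0 for k in (2, 3, 4, 5))
    return x


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_logpdf(z: float) -> float:
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def hybrid_loglik(beta, theta, gamma, ctto_data, dce_data, lower=-1.0):
    """Joint log-likelihood of a sketch hybrid model.

    beta:      20 decrements (intercept constrained, so V(11111) = 1).
    theta:     scale linking values to DCE utilities (assumption).
    gamma:     (g0, g1); cTTO error SD = exp(g0 + g1 * LSS) (assumption).
    ctto_data: list of (state, observed cTTO value).
    dce_data:  list of (state_a, state_b, chose_a).
    """
    def value(state):
        return 1.0 - sum(b * x for b, x in zip(beta, dummies(state)))

    ll = 0.0
    for state, y in ctto_data:
        mu = value(state)
        lss = sum(int(c) for c in state)
        sigma = math.exp(gamma[0] + gamma[1] * lss)
        if y <= lower:
            # Censored at -1: probability that the latent value is at or below the bound.
            ll += math.log(norm_cdf((lower - mu) / sigma))
        else:
            ll += norm_logpdf((y - mu) / sigma) - math.log(sigma)
    for state_a, state_b, chose_a in dce_data:
        # Binary conditional logit on the scaled value difference.
        p_a = 1.0 / (1.0 + math.exp(-theta * (value(state_a) - value(state_b))))
        ll += math.log(p_a if chose_a else 1.0 - p_a)
    return ll
```

Estimates would then be obtained by minimizing the negative of `hybrid_loglik` over `beta`, `theta`, and `gamma` with a numerical optimizer, for example `scipy.optimize.minimize`.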

#### (ix) **Value Set** (Table 4.48 and Fig. 4.25)


**Table 4.48** Key characteristics of the German value set

**Fig. 4.25** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

Cost-utility analysis (CUA) is currently neither required nor the preferred form of economic evaluation in the pharmacoeconomic and health technology assessment guidelines of the Institute for Quality and Efficiency in Health Care (IQWiG), the official agency providing evidence for reimbursement decisions on drugs in Germany (IQWiG 2020; Kennedy-Martin et al. 2020; Rowen et al. 2017). CUA is commonly used in the assessment process underpinning vaccination recommendations by the Standing Committee on Vaccination (STIKO), but no preference for a specific multi-attribute utility instrument is stated (STIKO 2016).

#### (xi) **Reference(s) for this value set**

Ludwig K, Graf von der Schulenburg JM, Greiner W (2018) German Value Set for the EQ-5D-5L. Pharmacoeconomics 36(6):663–674

#### **Further Literature**


#### **4.3.3.3 Country/Region: Indonesia (Table 4.49)**


**Table 4.49** Overview of EQ-5D-5L value set for Indonesia<sup>a</sup>

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

a These coefficients present the decrement from level 1 to the respective level (regular dummies), whereas Purba et al. (2017a) report coefficients representing the additional decrement of moving from one level to another (incremental dummies)

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) = 1 &- 0.119\,MO_2 - 0.192\,MO_3 - 0.410\,MO_4 - 0.613\,MO_5 \\ &- 0.101\,SC_2 - 0.140\,SC_3 - 0.248\,SC_4 - 0.316\,SC_5 \\ &- 0.090\,UA_2 - 0.156\,UA_3 - 0.301\,UA_4 - 0.385\,UA_5 \\ &- 0.086\,PD_2 - 0.095\,PD_3 - 0.198\,PD_4 - 0.246\,PD_5 \\ &- 0.079\,AD_2 - 0.134\,AD_3 - 0.227\,AD_4 - 0.305\,AD_5 \end{aligned}$$
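The note to Table 4.49 distinguishes regular dummies (decrement from level 1) from incremental dummies (extra decrement of each one-level step). Converting between them is a cumulative sum in one direction and first differences in the other. The helpers below use hypothetical names; applied to the Indonesian MO decrements (0.119, 0.192, 0.410, 0.613) they give step increments of 0.119, 0.073, 0.218 and 0.203, which may differ from the coefficients printed in Purba et al. (2017a) by rounding.

```python
from itertools import accumulate


def incremental_to_regular(increments):
    """Regular decrement for each level = running sum of the one-step increments."""
    return [round(v, 3) for v in accumulate(increments)]


def regular_to_incremental(regular):
    """Increment for each level = its regular decrement minus the one below it."""
    previous = [0.0] + list(regular[:-1])
    return [round(cur - prev, 3) for prev, cur in zip(previous, regular)]


mo_regular = [0.119, 0.192, 0.410, 0.613]  # Indonesian MO, levels 2..5
mo_incremental = regular_to_incremental(mo_regular)
print(mo_incremental)  # [0.119, 0.073, 0.218, 0.203]
print(incremental_to_regular(mo_incremental))  # [0.119, 0.192, 0.41, 0.613]
```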

#### (i) **Date/wave of study**

Data were collected in the third wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 2.0. Interviews were conducted between March 2015 and January 2016.

#### (ii) **Sample size; sample frame**

A total of 1056 interviews with members of the general population were conducted in six cities and their surrounding areas in different parts of Indonesia: Jakarta, Bandung, Jogjakarta, Surabaya, Medan, and Makassar. Multi-stage stratified quota sampling was applied with respect to residence, gender, age, and level of education (stage 1) and with respect to religion and ethnicity (stage 2) (Indonesian Bureau of Statistics (BPS) 2015). Of the 1054 respondents included in the final value set, 49.9% were female and 50.1% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was representative of the Indonesian general population (aged over 17 years). The distribution of the study sample in terms of residence, gender, and religion was similar to that of the general population, while some age groups, education levels, and ethnicities differed slightly from the general population (Table 4.50).


**Table 4.50** Representativeness of the sample in the Indonesian valuation study



Reproduced from Purba et al. (2017a)

a BPS 2015

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.51)


**Table 4.51** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.26)

**Fig. 4.26** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

Because of data quality issues, all interviewers were retrained after the first 102 interviews, and the interviews collected up to that point were excluded from the analysis and treated as pilot interviews (they are not included in the study sample described above; details are reported in Purba et al. 2017b).

A total of 9.8% of cTTO responses (n = 1033) were removed following the feedback module. A further 45 cTTO responses were excluded in which the respondent preferred living in an impaired health state to full health. In addition, two respondents with a positive slope in the regression of their cTTO values on the level sum score were excluded. No DCE data were excluded from the analysis.
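The positive-slope rule can be checked by regressing a respondent's cTTO values on the level sum scores of the states they valued, using ordinary least squares: a positive slope means that, on average, worse states received higher values. The sketch below is a hypothetical implementation of that check, not the study's own code.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("level sum scores do not vary; slope undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def has_positive_slope(ctto: dict) -> bool:
    """ctto: mapping of five-digit profiles to one respondent's cTTO values."""
    lss = [sum(int(c) for c in state) for state in ctto]
    values = list(ctto.values())
    return ols_slope(lss, values) > 0
```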

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1056 interviews were conducted by 15 interviewers. The variance of the responses included in the final value set can be partitioned into variance related to differences between interviewers (2.7%), respondents (12.1%), and responses (85.2%).

#### (viii) **Description of modelling choices**

The Indonesian EQ-5D-5L value set was based on a hybrid model combining a conditional logit model for the DCE data and a censored at −1 tobit model for the cTTO data, correcting for heteroskedasticity. The intercept was constrained in the fnal model.

#### (ix) **Value Set** (Table 4.52 and Fig. 4.27)


**Table 4.52** Key characteristics of the Indonesian value set

**Fig. 4.27** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

The Indonesian Health Technology Assessment Committee (InaHTAC) produced an HTA guideline in 2017 that suggests the use of cost-utility analysis (CUA) and cost-effectiveness analysis (CEA) for economic evaluations in Indonesia. The guideline also recommends the EQ-5D as the preferred instrument for estimating QALYs (InaHTAC 2017). The availability of an Indonesian EQ-5D-5L value set reflecting a societal perspective has supported various HTA and non-HTA studies in Indonesia.

#### (xi) **Reference(s) for this value set**

Purba FD, Hunfeld JAM, Iskandarsyah A, Fitriana TS, Sadarjoen SS, Ramos-Goñi JM, Passchier J, Busschbach JJV (2017a) The Indonesian EQ-5D-5L Value Set. Pharmacoeconomics 35(11):1153–1165

#### **Further Literature**


#### **4.3.3.4 Country/Region: Ireland (Table 4.53)**


**Table 4.53** Overview of EQ-5D-5L value set for Ireland

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) = 1 &- 0.063\,MO_2 - 0.097\,MO_3 - 0.215\,MO_4 - 0.344\,MO_5 \\ &- 0.055\,SC_2 - 0.088\,SC_3 - 0.229\,SC_4 - 0.287\,SC_5 \\ &- 0.049\,UA_2 - 0.072\,UA_3 - 0.154\,UA_4 - 0.187\,UA_5 \\ &- 0.068\,PD_2 - 0.093\,PD_3 - 0.373\,PD_4 - 0.510\,PD_5 \\ &- 0.080\,AD_2 - 0.202\,AD_3 - 0.535\,AD_4 - 0.646\,AD_5 \end{aligned}$$

#### (i) **Date/wave of study**

Data were collected in the third wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 2.0. Interviews were conducted between March 2015 and September 2016.

#### (ii) **Sample size; sample frame**

A total of 1160 interviews with members of the general population were conducted. A representative sample of Irish residents was obtained using a two-stage stratified cluster sampling process. In the first stage, 54 small areas, stratified by income and urban/rural classification, were drawn at random from across the country. In the second stage, approximately 20 households were selected at random within each small area, by choosing a random starting point and inviting a resident from every third household to participate in the survey. The recruited sample was then compared with Central Statistics Office for Ireland (CSO) national population estimates for age and sex, and purposive sampling was used to increase the number of younger individuals and males in the sample. Of the 1160 completed surveys, 102 came from purposive sampling; 37% of respondents were male and 63% female.

The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

Including the purposive sample, the sample broadly reflects the Irish population, with some over-representation of people aged over 45 years and of females. Those with tertiary-level education were over-represented and those with only primary-level education were under-represented compared with the population at large (Table 4.54).



**Table 4.54** Representativeness of the sample in the Irish valuation study


Reproduced from Hobbins et al. (2018)

a Decimal places are shown only for consistency in reporting. The percentages shown here are from Hobbins et al. (2018) and are rounded to the closest whole number.

b Central Statistics Office for Ireland (2011)

#### (iv) **Mean values of cTTO states** (Table 4.55)


**Table 4.55** Mean cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A and B in DCE based on relative severities of A and B** (Fig. 4.28)

**Fig. 4.28** Proportions choosing A and B based on relative severities of A and B

#### (vi) **Exclusion criteria**

No data were excluded from analysis.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1160 interviews were conducted by 7 interviewers. The variance of the responses can be partitioned into variance related to differences between interviewers (1.39%), respondents (30.98%), and responses (67.63%).

#### (viii) **Description of modelling choices**

The observed cTTO values for the non-flagged health states after the feedback module were used; i.e., the respondents' flagged cTTO observations (which accounted for 2% of values) were excluded (details on the feedback module and its use are provided in Chap. 2). For the DCE data, the dependent variable was the binary stated choice (i.e., a 0/1 indicator of the choice made for each health state pair). No DCE data were excluded.

The Irish EQ-5D-5L value set was based on a main effects hybrid model combining DCE data and cTTO data, addressing the censoring of cTTO data at −1 and correcting for heteroskedasticity.

#### (ix) **Value Set** (Table 4.56 and Fig. 4.29)


**Table 4.56** Key characteristics of the Irish value set

**Fig. 4.29** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

There are two principal public entities involved in cost-utility analysis (CUA) in Ireland: the Health Information and Quality Authority (HIQA) for non-pharmaceutical technologies and the National Centre for Pharmacoeconomics (NCPE) for pharmaceutical technologies. The former can undertake CUAs; the latter can request CUA evidence from pharmaceutical companies. While HIQA has used the Irish EQ-5D-5L value set (HIQA 2018), as have a number of academic studies (for example, see Murphy et al. 2019; Cardwell et al. 2020), NCPE has not yet adopted it for use.

#### (xi) **Reference for this value set**

Hobbins A, Barry L, Kelleher D, Shah K, Devlin N, Ramos-Goñi JM, O'Neill C (2018) Utility Values for Health States in Ireland: A Value Set for the EQ-5D-5L. Pharmacoeconomics 36(11):1345–1353

#### **Further Literature**


#### **4.3.3.5 Country/Region: Malaysia (Table 4.57)**


**Table 4.57** Overview of EQ-5D-5L value set for Malaysia

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) = 1 &- 0.081\,MO_2 - 0.108\,MO_3 - 0.261\,MO_4 - 0.340\,MO_5 \\ &- 0.062\,SC_2 - 0.083\,SC_3 - 0.200\,SC_4 - 0.261\,SC_5 \\ &- 0.048\,UA_2 - 0.064\,UA_3 - 0.155\,UA_4 - 0.202\,UA_5 \\ &- 0.081\,PD_2 - 0.107\,PD_3 - 0.259\,PD_4 - 0.338\,PD_5 \\ &- 0.072\,AD_2 - 0.095\,AD_3 - 0.230\,AD_4 - 0.300\,AD_5 \end{aligned}$$

#### (i) **Date/wave of study**

Data were collected in the third wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 2.0. Interviews were conducted between August and September 2016.

#### (ii) **Sample size; sample frame**

A total of 1137 interviews with members of the general population were conducted in four Malaysian states: Penang (Northern), Selangor (Central), Kelantan (Eastern), and Malacca (Southern). Quota-based sampling with respect to urbanicity, gender, age (over 18 years), and ethnicity was applied, based on the Malaysian National Census (Department of Statistics Malaysia 2010). Of the 1125 respondents included in the final value set, 48.8% were female and 51.2% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was largely representative of the Malaysian general population in terms of gender, age, ethnicity, and residential area. Respondents in full-time employment or self-employment were slightly underrepresented in the study sample (Table 4.58).


**Table 4.58** Representativeness of the sample in the Malaysian valuation study

Reproduced from Shafie et al. (2019)

a Department of Statistics Malaysia 2010 and 2016

b Outside labour force: respondents with sickness or disability, caretakers of households, students, and the retired

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.59)


**Table 4.59** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.30)

**Fig. 4.30** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

Respondents whose cTTO values increased with health state severity, and those who valued all health states at −1, were excluded from the analysis (n = 12). No DCE data were excluded from the analysis.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1137 interviews were conducted by 18 interviewers. The variance of the responses included in the final value set can be partitioned into variance related to differences between interviewers (1.1%), respondents (21.8%), and responses (77.1%).

#### (viii) **Description of modelling choices**

The Malaysian EQ-5D-5L value set was based on a non-linear constrained hybrid model combining a conditional logit model for the DCE data and an additive model assuming a normal distribution for the cTTO data. A constrained eight-parameter model was selected. It fits one parameter per dimension, representing the decrement for level 5 of that dimension, and one parameter for each of levels 2, 3, and 4, which is multiplied by the respective dimension parameter. For consistency with the other value sets in this chapter, the eight-parameter model has been converted to the 20 parameters presented in Table 4.57. The intercept was constrained in the final model because it was non-significant.
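The conversion from the eight-parameter model to 20 coefficients is a simple product: each level-5 dimension parameter multiplied by a level multiplier, with the multiplier for level 5 fixed at 1. In the sketch below, the level-5 values are read off Table 4.57, but the three level multipliers (0.239, 0.317, 0.767) are back-calculated from the rounded 20 coefficients for illustration only; they are not the estimates reported by Shafie et al. (2019). With these inputs, every product agrees with the corresponding coefficient in Table 4.57 to within 0.0005.

```python
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")

# Level-5 decrement per dimension, as in Table 4.57.
LEVEL5 = {"MO": 0.340, "SC": 0.261, "UA": 0.202, "PD": 0.338, "AD": 0.300}

# Illustrative multipliers back-calculated from the rounded 20 coefficients;
# NOT the estimates reported by Shafie et al. (2019).
LEVEL_MULTIPLIERS = {2: 0.239, 3: 0.317, 4: 0.767, 5: 1.0}


def expand_to_20(level5: dict, multipliers: dict) -> dict:
    """Map (dimension, level) -> decrement = level-5 parameter x level multiplier."""
    return {
        (dim, level): level5[dim] * multipliers[level]
        for dim in DIMENSIONS
        for level in (2, 3, 4, 5)
    }


coefs = expand_to_20(LEVEL5, LEVEL_MULTIPLIERS)
print(round(coefs[("PD", 4)], 3))  # 0.259
```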

#### (ix) **Value Set** (Table 4.60 and Fig. 4.31)


**Table 4.60** Key characteristics of the Malaysian value set

a Minimum value was calculated using the non-rounded values as reported in Shafie et al. (2019)

**Fig. 4.31** Value decrements across dimensions. (Please note that the lines of MO and PD are virtually identical. The line for MO is obscured by the yellow line for PD; AD anxiety/depression, MO mobility, PD pain/discomfort, SC self-care, UA usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

According to the current Malaysia Pharmacoeconomic Guidelines, the preferred economic evaluation techniques to inform health technology assessment decisions are cost-effectiveness analysis (CEA) and cost-utility analysis (CUA) (Ministry of Health Malaysia 2019). CUA is the recommended technique when HRQOL is an important outcome and when the intervention affects both morbidity and mortality. The guideline also states that EQ-5D is the preferred patient-reported outcomes measure and that a locally derived value set is strongly recommended for use, with the Malaysian EQ-5D-5L value set study being cited in the document.

#### (xi) **Reference(s) for this value set**

Shafie AA, Vasan Thakumar A, Lim CJ, Luo N, Rand-Hendriksen K, Yusof FAM (2019) EQ-5D-5L Valuation for the Malaysian Population. Pharmacoeconomics 37(5):715–725

#### **Further Literature**


#### **4.3.3.6 Country/Region: Poland (Table 4.61)**


**Table 4.61** Overview of EQ-5D-5L value set for Poland

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) = 1 &- 0.025\,MO_2 - 0.034\,MO_3 - 0.126\,MO_4 - 0.314\,MO_5 \\ &- 0.031\,SC_2 - 0.047\,SC_3 - 0.111\,SC_4 - 0.264\,SC_5 \\ &- 0.023\,UA_2 - 0.040\,UA_3 - 0.097\,UA_4 - 0.205\,UA_5 \\ &- 0.030\,PD_2 - 0.050\,PD_3 - 0.261\,PD_4 - 0.575\,PD_5 \\ &- 0.018\,AD_2 - 0.029\,AD_3 - 0.108\,AD_4 - 0.232\,AD_5 \end{aligned}$$

#### (i) **Date/wave of study**

Data were collected in the third wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 2.0. Interviews were conducted from June until October 2016.

#### (ii) **Sample size; sample frame**

A total of 1281 interviews with members of the general population were conducted. Quota-based sampling was applied using Polish census data from November 2014, based on the personal identification number registry and Central Statistical Office data on education (Central Statistical Office 2015). Of the 1252 respondents included in the final value set, 52.5% were female and 47.5% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was representative of the Polish general population in terms of age, sex, education, and employment status, as well as the size and geographical location of the place of residence (Table 4.62).


**Table 4.62** Representativeness of the sample in the Polish valuation study

Reproduced from Golicki et al. (2019)

a Central Statistical Office 2015

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.63)


**Table 4.63** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.32)

**Fig. 4.32** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

Twenty-nine interviews with data quality issues in the cTTO part were excluded from the analysis (i.e., interviews flagged by the QC tool; see Chap. 2 for more details). A total of 8.3% of cTTO responses (n = 1040) were removed following the feedback module. No DCE data were excluded from the analysis.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1281 interviews were conducted by 15 interviewers. The variance of the responses included in the final value set can be partitioned into variance related to differences between interviewers (3.5%), respondents (13.7%), and responses (82.8%).

#### (viii) **Description of modelling choices**

The Polish EQ-5D-5L value set was based on a hybrid model that accounted for random parameters, error scaling with fat tails, censoring at −1, the unwillingness of religious respondents to trade life years in the cTTO task, and a Cauchy distribution in the DCE component. The intercept was constrained in the final model.
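To see what a Cauchy distribution in the DCE component changes, one can compare the choice probability it implies with that of the usual logistic (conditional logit) link. Both are symmetric around a zero value difference, but the Cauchy distribution has fatter tails, so choosing the clearly worse state is less improbable and such responses tend to weigh less on the estimates. The sketch below compares only the two link functions; the `scale` parameter is hypothetical, and the full Polish model (random parameters, error scaling, censoring, religiosity) is considerably richer.

```python
import math


def p_choose_a_logit(v_a: float, v_b: float, scale: float = 1.0) -> float:
    """Logistic CDF of the value difference (conditional logit for a pair)."""
    return 1.0 / (1.0 + math.exp(-(v_a - v_b) / scale))


def p_choose_a_cauchy(v_a: float, v_b: float, scale: float = 1.0) -> float:
    """Cauchy CDF of the value difference: 1/2 + arctan(d / scale) / pi."""
    return 0.5 + math.atan((v_a - v_b) / scale) / math.pi


# Probability of choosing A as the value advantage of A grows.
for diff in (0.0, 1.0, 3.0, 10.0):
    print(diff, round(p_choose_a_logit(diff, 0.0), 4), round(p_choose_a_cauchy(diff, 0.0), 4))
```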

#### (ix) **Value Set** (Table 4.64 and Fig. 4.33)



**Fig. 4.33** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

A pharmacoeconomic analysis is required in Polish HTA submissions, and cost-utility analysis is the preferred form of economic evaluation in HTA reports and reimbursement dossiers for drugs, according to the Reimbursement Act (2011) and the guidelines of the Polish HTA agency (Agencja Oceny Technologii Medycznych i Taryfikacji, AOTMiT) (AOTMiT 2016). The preferred way of obtaining health state values is to use secondary sources, i.e., published data collected using questionnaires for which values are available. The first choice is the EQ-5D (EQ-5D-5L or EQ-5D-3L). If EQ-5D-based data are not available, the second choice is SF-6D- or HUI-based utilities. The third choice covers health state values based on other instruments. When health state valuation data are collected directly, the use of the EQ-5D (EQ-5D-3L or EQ-5D-5L) with the Polish value sets is recommended.

#### (xi) **Reference(s) for this value set**

Golicki D, Jakubczyk M, Graczyk K, Niewada M (2019) Valuation of EQ-5D-5L Health States in Poland: the First EQ-VT-Based Study in Central and Eastern Europe. Pharmacoeconomics 37(9):1165–1176

#### **Further Literature**


#### **4.3.3.7 Country/Region: Portugal (Table 4.65)**


**Table 4.65** Overview of EQ-5D-5L value set for Portugal

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) = 1 &- 0.048\,MO_2 - 0.092\,MO_3 - 0.182\,MO_4 - 0.356\,MO_5 \\ &- 0.048\,SC_2 - 0.070\,SC_3 - 0.156\,SC_4 - 0.294\,SC_5 \\ &- 0.044\,UA_2 - 0.063\,UA_3 - 0.135\,UA_4 - 0.263\,UA_5 \\ &- 0.041\,PD_2 - 0.101\,PD_3 - 0.254\,PD_4 - 0.406\,PD_5 \\ &- 0.036\,AD_2 - 0.085\,AD_3 - 0.212\,AD_4 - 0.284\,AD_5 \end{aligned}$$

#### (i) **Date/wave of study**

Data were collected in the third wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 2.0. Interviews were conducted between October 2015 and July 2016.

#### (ii) **Sample size; sample frame**

A total of 1451 interviews with members of the general population were conducted across the country. Random sampling stratified by gender and age group, based on the Portuguese census (Portuguese Statistical Office 2012), was applied; the sampling was originally designed using the random route method. Of the 1450 respondents included in the final value set, 56.9% were female and 43.1% were male. The age distribution of the respondents was:

#### (iii) **Representativeness of achieved sample**

The study sample was representative of the general population of mainland Portugal and the islands (aged over 18 years) in terms of age and gender (Table 4.66).

**Table 4.66** Representativeness of the sample in the Portuguese valuation study


Reproduced from Ferreira et al. (2019)

a All respondents included in the final value set are presented

b Portuguese Statistical Office 2012

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.67)


**Table 4.67** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.34)

**Fig. 4.34** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

A total of 7.7% of cTTO responses (n = 1119) were removed following the feedback module. In addition, one participant with a positive slope in the regression of their cTTO values on the level sum score was excluded. No DCE data were excluded from the analysis.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1451 interviews were conducted by 28 interviewers. The variance of the responses included in the final value set can be partitioned into variance related to differences between interviewers (5.7%), respondents (20.0%), and responses (74.3%).

#### (viii) **Description of modelling choices**

The Portuguese EQ-5D-5L value set was based on a hybrid model combining a conditional logit model for the DCE data and a censored at −1 tobit model for the cTTO data, correcting for heteroskedasticity. The intercept was constrained in the final model.

#### (ix) **Value Set** (Table 4.68 and Fig. 4.35)

**Table 4.68** Key characteristics of the Portuguese value set


**Fig. 4.35** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

The National Authority of Medicines and Health Products (Infarmed) recommends in its official methodological guidelines for economic evaluation studies of health technologies that the EQ-5D-5L questionnaire and the Portuguese EQ-5D-5L value set should be the preferred measure used to assess HRQoL in cost-utility analyses (CUA) (Perelman et al. 2019).

#### (xi) **Reference(s) for this value set**

Ferreira PL, Antunes P, Ferreira LN, Pereira LN, Ramos-Goñi JM (2019) A hybrid modelling approach for eliciting health states preferences: the Portuguese EQ-5D-5L value set. Qual Life Res 28(12):3163–3175

#### **Further Literature**

Perelman J, Soares M, Mateus C, Duarte A, Faria R, Ferreira L, Saramago P, Veiga P, Furtado C, Caldeira S, Teixeira MC, Sculpher M (2019) Methodological Guidelines for Economic Evaluation Studies of Health Technologies. INFARMED – National Authority of Medicines and Health Products, I.P., Lisbon. https://www.infarmed.pt/web/infarmed-en/human-medicines. Accessed 25 June 2021

Portuguese Statistical Office (2012) Census 2011. Portuguese Statistical Office, Lisbon

#### **4.3.3.8 Country/Region: Taiwan (Table 4.69)**


**Table 4.69** Overview of EQ-5D-5L value set for Taiwan<sup>a</sup>

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

a These coefficients present the decrement from level 1 to the respective level (regular dummies), whereas Lin et al. (2018) report coefficients representing the additional decrement of moving from one level to another (incremental dummies)

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) = 1 &- 0.108\,MO_2 - 0.200\,MO_3 - 0.365\,MO_4 - 0.477\,MO_5 \\ &- 0.076\,SC_2 - 0.132\,SC_3 - 0.264\,SC_4 - 0.324\,SC_5 \\ &- 0.073\,UA_2 - 0.123\,UA_3 - 0.280\,UA_4 - 0.351\,UA_5 \\ &- 0.087\,PD_2 - 0.158\,PD_3 - 0.340\,PD_4 - 0.453\,PD_5 \\ &- 0.064\,AD_2 - 0.183\,AD_3 - 0.340\,AD_4 - 0.421\,AD_5 \end{aligned}$$
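To make the additive structure concrete, the sketch below scores a five-digit EQ-5D-5L profile (MO, SC, UA, PD, AD order) with the Taiwanese decrements. It is an illustrative implementation under stated assumptions, not official EuroQol scoring code: the function name is hypothetical, the coefficients are transcribed from the formula, and rounding to three decimals is a choice made here.

```python
# Taiwanese decrements for levels 2-5 (regular dummies), transcribed from the formula.
# Index 0 = level 2, ..., index 3 = level 5; level 1 carries no decrement.
TAIWAN_DECREMENTS = {
    "MO": (0.108, 0.200, 0.365, 0.477),
    "SC": (0.076, 0.132, 0.264, 0.324),
    "UA": (0.073, 0.123, 0.280, 0.351),
    "PD": (0.087, 0.158, 0.340, 0.453),
    "AD": (0.064, 0.183, 0.340, 0.421),
}
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def taiwan_value(state: str) -> float:
    """Return V(X) for a five-digit profile such as '21345' (hypothetical helper)."""
    if len(state) != 5 or any(c not in "12345" for c in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    value = 1.0
    for dim, level in zip(DIMENSIONS, map(int, state)):
        if level > 1:
            value -= TAIWAN_DECREMENTS[dim][level - 2]
    return round(value, 3)


print(taiwan_value("11111"))  # 1.0
print(taiwan_value("21111"))  # 0.892
print(taiwan_value("55555"))  # -1.026
```

The value of 55555 is simply 1 minus the sum of the five level-5 decrements (2.026).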

#### (i) **Date/wave of study**

Data were collected in the third wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 2.0. Interviews were conducted between January and July 2017.

#### (ii) **Sample size; sample frame**

1000 interviews with the general population were conducted in nine randomly selected cities located in six geographic regions of Taiwan: Taipei, New Taipei, Taoyuan, Chinju, Hualien, Taichung, Chiayi, Tainan, and Kaohsiung. Multi-stage stratified quota sampling was applied with respect to region (stage 1) and to age (over 20 years), gender, and education (stage 2) (Department of household registration, Ministry of the Interior, Taiwan 2016). Of the 1000 respondents included in the final value set, 50.5% were female and 49.5% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was representative of the Taiwanese population in terms of age, gender, and living area. Respondents with higher education were overrepresented, while respondents with primary school education were underrepresented (Table 4.70).


**Table 4.70** Representativeness of the sample in the Taiwanese valuation study

Reproduced from Lin et al. (2018)

a Department of household registration, Ministry of the Interior, Taiwan 2016

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.71)


**Table 4.71** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.36)

**Fig. 4.36** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

No data were excluded from the analysis.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1000 interviews were conducted by 10 interviewers. The variance of the responses can be partitioned into variance related to differences between interviewers (1.3%), respondents (15.5%), and responses (83.2%).
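The three percentages describe how the total variance of the cTTO responses splits across levels. Assuming they are ratios of variance components estimated with a multilevel (random-intercept) model for interviewers and respondents, the arithmetic is sketched below. The component values are hypothetical numbers chosen only to reproduce the Taiwanese shares; they are not estimates from the study.

```python
def variance_shares(interviewer: float, respondent: float, response: float) -> dict:
    """Return each variance component as a percentage of the total variance."""
    components = {"interviewer": interviewer, "respondent": respondent, "response": response}
    total = sum(components.values())
    if total <= 0:
        raise ValueError("variance components must sum to a positive number")
    return {name: round(100 * value / total, 1) for name, value in components.items()}


# Hypothetical variance components (not taken from the study)
shares = variance_shares(interviewer=0.0026, respondent=0.031, response=0.1664)
print(shares)  # {'interviewer': 1.3, 'respondent': 15.5, 'response': 83.2}
```

A small interviewer share, as here, suggests that values depend little on who conducted the interview.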

#### (viii) **Description of modelling choices**

The Taiwanese EQ-5D-5L value set was based on a hybrid model combining a conditional logit model for the DCE data and a tobit GLS model censored at -1 for the cTTO data. The intercept was constrained in the final model because it was non-significant.

#### (ix) **Value Set** (Table 4.72 and Fig. 4.37)

**Table 4.72** Key characteristics of the Taiwanese value set


**Fig. 4.37** Value decrements across dimensions (AD anxiety/depression, MO mobility, PD pain/ discomfort, SC self-care, UA usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

Conducting local cost-effectiveness analyses for new technologies is encouraged in the official guideline of the Center for Drug Evaluation (CDE) in Taiwan (CDE 2021a, 2021b; Taiwan Society for Pharmacoeconomics and Outcomes Research [TaSPOR] 2014). The EQ-5D is listed in this HTA guideline as one of three recommended multi-attribute utility instruments (TaSPOR 2014). A local value set was recommended, although none was available when the guideline was published. Under Taiwan's National Health Insurance (NHI), it was announced that "new drug applications with local pharmacoeconomic studies are more likely to get reimbursement (…) [and the] additional markup of up to 10% of pricing decision (…) [can be added once the] applicants conduct a local pharmacoeconomic study" after quality review by HTA/CDE (Chen et al. 2018). As a result, the EQ-5D-5L value set for Taiwan reported here was endorsed by CDE and TaSPOR at a joint conference ("Workshop of health utility applications - How to use Taiwan EQ-5D-5L value set to promote the impacts of pharmacoeconomics and outcomes research?", November 29, 2019). Because the wide value range of the Taiwanese value set might influence future calculations of quality-adjusted life years (QALYs) in cost-utility analyses, it was also agreed at that conference that further research and discussion were needed to support more reasonable decision-making about adopting a national set of EQ-5D-5L weights for QALY calculations. This was also endorsed by HTA/CDE. In addition to its use in HTA, the EQ-5D-5L is included in the National Health Interview Survey.

#### (xi) **Reference(s) for this value set**

Lin HW, Li CI, Lin FJ, Chang JY, Gau CS, Luo N, Pickard AS, Ramos Goñi JM, Tang CH, Hsu CN (2018) Valuation of the EQ-5D-5L in Taiwan. PLoS ONE 13(12). https://doi.org/10.1371/journal.pone.0209344

#### **Further Literature**


#### **4.3.3.9 Country/Region: Denmark (Table 4.73)**


**Table 4.73** Overview of EQ-5D-5L value set for Denmark

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) = 1 &- 0.041\,MO_2 - 0.054\,MO_3 - 0.157\,MO_4 - 0.220\,MO_5 \\ &- 0.035\,SC_2 - 0.050\,SC_3 - 0.144\,SC_4 - 0.209\,SC_5 \\ &- 0.033\,UA_2 - 0.040\,UA_3 - 0.139\,UA_4 - 0.174\,UA_5 \\ &- 0.048\,PD_2 - 0.094\,PD_3 - 0.381\,PD_4 - 0.537\,PD_5 \\ &- 0.072\,AD_2 - 0.191\,AD_3 - 0.430\,AD_4 - 0.618\,AD_5 \end{aligned}$$
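When every dimension's decrement increases with its level, as it does in the Danish model, the lowest value of an additive value set is that of 55555 and the highest value below full health comes from the single smallest level-2 decrement. The sketch below checks both by brute force over all 3125 profiles; the names are hypothetical and rounding to three decimals is an assumption.

```python
from itertools import product

# Danish decrements for levels 2-5 (regular dummies), transcribed from the formula above
DENMARK_DECREMENTS = {
    "MO": (0.041, 0.054, 0.157, 0.220),
    "SC": (0.035, 0.050, 0.144, 0.209),
    "UA": (0.033, 0.040, 0.139, 0.174),
    "PD": (0.048, 0.094, 0.381, 0.537),
    "AD": (0.072, 0.191, 0.430, 0.618),
}
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def denmark_value(levels) -> float:
    """Value of a profile given as five levels (1-5) in MO, SC, UA, PD, AD order."""
    value = 1.0
    for dim, level in zip(DIMENSIONS, levels):
        if level > 1:
            value -= DENMARK_DECREMENTS[dim][level - 2]
    return round(value, 3)


# Score all 5**5 = 3125 states, keyed by their five-digit label
values = {
    "".join(map(str, levels)): denmark_value(levels)
    for levels in product(range(1, 6), repeat=5)
}
print(len(values))                                        # 3125
print(min(values.values()), values["55555"])              # -0.758 -0.758
print(max(v for k, v in values.items() if k != "11111"))  # 0.967
```

The second-best value (0.967) belongs to 11211, since UA level 2 carries the smallest decrement (0.033).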

#### (i) **Date/wave of study**

Data were collected in the third wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 2.1. Interviews were conducted between October 2018 and November 2019.

#### (ii) **Sample size; sample frame**

1052 interviews with the general population were conducted. Statistics Denmark and a panel from a Danish market research company provided random samples, representative with respect to age, gender, education, and geographical region, from which respondents were invited to the study. Of the 1014 respondents included in the final value set, 51.6% were female and 48.4% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was representative of the Danish population in terms of age (over 18 years), gender, education, and geographical region. However, higher-educated respondents were slightly overrepresented compared with the general population (Table 4.74).


**Table 4.74** Representativeness of the sample in the Danish valuation study

Reproduced from Jensen et al. (2021)

a Danmarks Statistik 2019

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.75)


**Table 4.75** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.38)

**Fig. 4.38** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

Five interviews were dropped because the two interviewers who conducted them were not sufficiently available to conduct further interviews. Additionally, 12 interviews were dropped due to technical issues, respondents' cognitive or emotional difficulties, or withdrawal of consent. Lastly, 21 respondents were excluded because they did not provide both cTTO and DCE data. A total of 7.0% of cTTO responses (n = 712) were removed following the feedback module.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1052 interviews were conducted by 13 interviewers, 11 of whom were included in the final interviewer team (see section vi). The variance of the responses included in the final value set can be partitioned into variance related to differences between interviewers (1.9%), respondents (23.8%), and responses (74.3%).

#### (viii) **Description of modelling choices**

The Danish EQ-5D-5L value set was based on a hybrid model combining a conditional logit model for the DCE data and a Tobit model censored at −1 for the cTTO data, correcting for heteroskedasticity. The intercept was constrained in the final model.

#### (ix) **Value Set** (Table 4.76 and Fig. 4.39)


**Table 4.76** Key characteristics of the Danish value set

**Fig. 4.39** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/ discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

As of 1 January 2021, the Danish Medicines Council began using quality-adjusted life years (QALYs) in its evaluation of new hospital-dispensed pharmaceuticals and of changes of indication. QALY estimates are expected to be based on EQ-5D-5L and the new Danish value set (Danish Medical Council 2021). Voluntary submissions of cost-utility analyses to the Reimbursement Committee (for prescription medicines prescribed by GPs) are also expected to use EQ-5D-5L with the Danish value set (Danish Medicines Agency). Furthermore, a new priority-setting council for new technologies (excluding medicines), the Danish Health Technology Council, was established on 1 January 2021. Submissions to this council that include a cost-utility analysis are requested to use QALYs based on EQ-5D-5L and the Danish value set.

#### (xi) **Reference(s) of value set**

Jensen CE, Sørensen SS, Gudex C, Jensen MB, Pedersen KM, Ehlers LH (2021) The Danish EQ-5D-5L Value Set: A Hybrid Model Using cTTO and DCE Data. Appl Health Econ Health Policy 19(4):579–591

#### **Further Literature**


#### **4.3.3.10 Country/Region: Ethiopia (Table 4.77)**


**Table 4.77** Overview of EQ-5D-5L value set for Ethiopia<sup>a</sup>

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

a In the valuation study manuscript, the value set is reported in incremental dummies. For consistency, we report regular dummies here.

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) = 1 &- 0.034\,MO_2 - 0.064\,MO_3 - 0.228\,MO_4 - 0.360\,MO_5 \\ &- 0.024\,SC_2 - 0.040\,SC_3 - 0.142\,SC_4 - 0.222\,SC_5 \\ &- 0.032\,UA_2 - 0.048\,UA_3 - 0.157\,UA_4 - 0.272\,UA_5 \\ &- 0.036\,PD_2 - 0.052\,PD_3 - 0.270\,PD_4 - 0.406\,PD_5 \\ &- 0.026\,AD_2 - 0.085\,AD_3 - 0.299\,AD_4 - 0.458\,AD_5 \end{aligned}$$
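The formula can also be read as a linear model over 20 regular dummies (MO_2 to AD_5), each equal to 1 when a dimension is at that level and 0 otherwise. The hypothetical sketch below builds that dummy vector for a profile and takes its dot product with the Ethiopian coefficients; the helper names are illustrative and rounding to three decimals is an assumption.

```python
# Ethiopian decrements in the order of the formula above: MO_2..MO_5, SC_2..SC_5, ..., AD_2..AD_5
ETHIOPIA_COEFFS = [
    0.034, 0.064, 0.228, 0.360,  # MO
    0.024, 0.040, 0.142, 0.222,  # SC
    0.032, 0.048, 0.157, 0.272,  # UA
    0.036, 0.052, 0.270, 0.406,  # PD
    0.026, 0.085, 0.299, 0.458,  # AD
]


def regular_dummies(state: str) -> list:
    """Encode a five-digit profile as 20 dummies: D[dim, k] = 1 if the dimension is at level k (k = 2..5)."""
    if len(state) != 5 or any(c not in "12345" for c in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    dummies = []
    for level in map(int, state):
        dummies.extend(1 if level == k else 0 for k in range(2, 6))
    return dummies


def ethiopia_value(state: str) -> float:
    """1 minus the dot product of coefficients and dummies."""
    x = regular_dummies(state)
    return round(1.0 - sum(b * d for b, d in zip(ETHIOPIA_COEFFS, x)), 3)


print(regular_dummies("12345"))
print(ethiopia_value("55555"))  # -0.718
```

With incremental dummies, the encoding would instead set D[dim, k] = 1 for every k up to the observed level; the coefficient vector would then have to hold the incremental values.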

#### (i) **Date/wave of study**

Data were collected in the third wave of EQ-5D-5L valuation studies using the EQ-VT protocol 2.1 and the EQ-PVT software. Interviews were conducted between March and May 2018.

#### (ii) **Sample size; sample frame**

1050 interviews with the general population were conducted, with respondents recruited from the capital, Addis Ababa, and from Butajira, a rural region. Multi-stage stratified quota sampling with respect to geographic area/residence, age, gender, and religion was applied. Of the 1048 respondents included in the final value set, 47.9% were female and 52.1% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was representative of the Ethiopian population in terms of age (over 18 years), gender, residence, and religion (Table 4.78).


**Table 4.78** Representativeness of the sample in the Ethiopian valuation study

Reproduced from Welie et al. (2020)

a Central Statistical Agency 2017

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.79)


**Table 4.79** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.40)

**Fig. 4.40** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

Respondents who valued all states as equal in the cTTO were excluded. Furthermore, respondents who valued the worst health state (55555) higher than the mildest health state in their block were excluded. In total, 9 respondents were excluded on the basis of their cTTO responses. In addition, the DCE data of 2 respondents were excluded due to technical problems.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1050 interviews were conducted by 10 interviewers. The variance of the responses included in the fnal value set can be partitioned into variance related to differences between interviewers (0.1%), respondents (10.5%) and responses (89.4%).

#### (viii) **Description of modelling choices**

The Ethiopian EQ-5D-5L value set was based on a hybrid model combining a conditional logit model for the DCE data and a Tobit model censored at −1 for the cTTO data, correcting for heteroskedasticity. The intercept was constrained in the final model.

#### (ix) **Value Set** (Table 4.80 and Fig. 4.41)


**Table 4.80** Key characteristics of the Ethiopian value set

**Fig. 4.41** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/ discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

The Ethiopian Federal Ministry of Health (FMOH) has set up an HTA agency in Ethiopia. However, HTA is not yet used for reimbursement decisions in Ethiopia.

#### (xi) **Reference(s) of value set**

Welie AG, Gebretekle GB, Stolk E, Mukuria C, Krahn MD, Enquoselassie F, Fenta TG (2020) Valuing health state: an EQ-5D-5L value set for Ethiopians. Value Health Reg Issues 22:7–14

#### **Further Literature**

Central Statistical Agency (2017) Demographic and Health Survey 2016. https://dhsprogram.com/pubs/pdf/FR328/FR328.pdf. Accessed 28 July 2021

#### **4.3.3.11 Country/Region: Hungary (Table 4.81)**


**Table 4.81** Overview of EQ-5D-5L value set for Hungary

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) = 1 &- 0.035\,MO_2 - 0.089\,MO_3 - 0.263\,MO_4 - 0.455\,MO_5 \\ &- 0.045\,SC_2 - 0.089\,SC_3 - 0.241\,SC_4 - 0.366\,SC_5 \\ &- 0.035\,UA_2 - 0.085\,UA_3 - 0.217\,UA_4 - 0.276\,UA_5 \\ &- 0.043\,PD_2 - 0.073\,PD_3 - 0.288\,PD_4 - 0.411\,PD_5 \\ &- 0.040\,AD_2 - 0.093\,AD_3 - 0.261\,AD_4 - 0.340\,AD_5 \end{aligned}$$
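In practice, a value set is applied to the profiles reported by study participants, and the resulting index values are then summarised, for example by their mean. The sketch below does this with the Hungarian decrements; the three profiles are hypothetical, the helper name is illustrative, and rounding to three decimals is an assumption.

```python
from statistics import mean

# Hungarian decrements for levels 2-5 (regular dummies), transcribed from the formula above
HUNGARY_DECREMENTS = {
    "MO": (0.035, 0.089, 0.263, 0.455),
    "SC": (0.045, 0.089, 0.241, 0.366),
    "UA": (0.035, 0.085, 0.217, 0.276),
    "PD": (0.043, 0.073, 0.288, 0.411),
    "AD": (0.040, 0.093, 0.261, 0.340),
}
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def hungary_value(state: str) -> float:
    """Unrounded index value of a five-digit profile in MO, SC, UA, PD, AD order."""
    if len(state) != 5 or any(c not in "12345" for c in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    value = 1.0
    for dim, level in zip(DIMENSIONS, map(int, state)):
        if level > 1:
            value -= HUNGARY_DECREMENTS[dim][level - 2]
    return value


# Hypothetical EQ-5D-5L profiles reported by three respondents
responses = ["11111", "21121", "33232"]
scores = [round(hungary_value(s), 3) for s in responses]
print(scores)                  # [1.0, 0.922, 0.674]
print(round(mean(scores), 3))  # 0.865
```

Averaging index values (rather than dimension levels) is what makes the summary interpretable on the value scale.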

#### (i) **Date/wave of study**

Data were collected in the third wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 2.1. Interviews were conducted from May 2018 until March 2019.

#### (ii) **Sample size; sample frame**

1000 interviews with the general population were conducted. Non-probability quota sampling was used, with quotas for age and sex set according to the latest data reported by the Hungarian Central Statistical Office (2016). Of the 1000 respondents included in the final value set, 53.3% were female and 46.7% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was representative of the Hungarian general population in terms of age and sex. The distributions of marital status, employment status, and area of residence in the study sample approximated those of the general population. Higher-educated respondents and inhabitants of Central Hungary were slightly overrepresented in the Hungarian valuation study (Table 4.82).


**Table 4.82** Representativeness of the sample in the Hungarian valuation study



Reproduced from Rencz et al. (2020)

a Hungarian Central Statistical Office 2016

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.83)


**Table 4.83** Mean observed cTTO values by health state



*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.42)

**Fig. 4.42** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

A total of 6.3% of cTTO responses (n = 634) were removed following the feedback module, but no respondent's cTTO responses were excluded in their entirety.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1000 interviews were conducted by 13 interviewers. The variance of the responses can be partitioned into variance related to differences between interviewers (1.7%), respondents (10.7%), and responses (87.6%).

#### (viii) **Description of modelling choices**

The Hungarian EQ-5D-5L value set was based on the cTTO data only. The selected model was a pooled heteroskedastic tobit model, left-censored at -1. The intercept was constrained in the final model because it was non-significant.
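As a rough illustration of what "left-censored at -1" means for estimation, the sketch below computes a generic tobit log-likelihood: cTTO values above -1 contribute a normal density, while values at the -1 floor contribute the probability that the latent value lies at or below -1. This is not the study's estimation code; the function names are hypothetical, and the heteroskedasticity correction of the actual model is not reproduced (here sigma is simply supplied per observation).

```python
import math


def norm_logpdf(z: float) -> float:
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def norm_logcdf(z: float) -> float:
    """Log CDF of the standard normal; erfc keeps precision in the lower tail."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2)))


def tobit_loglik(y, mu, sigma, lower=-1.0):
    """Log-likelihood of cTTO values y, left-censored at `lower`.

    mu: model predictions, e.g. 1 minus the summed decrements (no intercept).
    sigma: per-observation standard deviations (a simplification of the
    heteroskedastic variance model).
    """
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        if yi <= lower:
            # Censored: only known that the latent value is at or below the bound
            total += norm_logcdf((lower - mi) / si)
        else:
            # Uncensored: normal density of the residual, rescaled by sigma
            total += norm_logpdf((yi - mi) / si) - math.log(si)
    return total


# Hypothetical observations: one interior value and one at the -1 floor
print(tobit_loglik(y=[0.8, -1.0], mu=[0.75, -0.9], sigma=[0.2, 0.3]))
```

Maximising this function over the decrements that define `mu` (and over the variance parameters) yields tobit estimates; ordinary least squares would instead treat the -1 responses as exact values.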

#### (ix) **Value Set** (Table 4.84 and Fig. 4.43)


**Table 4.84** Key characteristics of the Hungarian value set

**Fig. 4.43** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/ discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

In Hungary, guidelines on methods for economic evaluations in health care are developed by the Ministry of Human Capacities (EMMI 2017), and submissions are critically appraised by the Division for HTA at the National Institute for Pharmacy and Nutrition (OGYÉI 2019). Cost-utility analysis is the preferred form of economic evaluation of new health technologies, and EQ-5D-5L is the preferred measure for calculating QALYs (EMMI 2017). From 2020, the availability of Hungarian value sets for both EQ-5D-5L and EQ-5D-3L is expected to increase the use and diffusion of the EQ-5D in Hungary.

#### (xi) **Reference(s) for this value set**

Rencz F, Brodszky V, Gulácsi L, Golicki D, Ruzsa G, Pickard AS, Law EH, Péntek M (2020) Parallel Valuation of the EQ-5D-3L and EQ-5D-5L by Time Trade-Off in Hungary. Value Health 23(9):1235–1245

#### **Further Literature**


#### **4.3.3.12 Country/Region: Mexico (Table 4.85)**


**Table 4.85** Overview of EQ-5D-5L value set for Mexico

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) = 1 &- 0.0160\,MO_2 - 0.0473\,MO_3 - 0.1786\,MO_4 - 0.2697\,MO_5 \\ &- 0.0476\,SC_2 - 0.0819\,SC_3 - 0.1697\,SC_4 - 0.2589\,SC_5 \\ &- 0.0553\,UA_2 - 0.0952\,UA_3 - 0.1798\,UA_4 - 0.2758\,UA_5 \\ &- 0.0531\,PD_2 - 0.0808\,PD_3 - 0.2283\,PD_4 - 0.4579\,PD_5 \\ &- 0.0551\,AD_2 - 0.0824\,AD_3 - 0.1611\,AD_4 - 0.3337\,AD_5 \end{aligned}$$
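A quick sanity check when transcribing any value set is that, within each dimension, a worse level carries a strictly larger decrement, and that the value of 55555 equals 1 minus the sum of the level-5 decrements. The hypothetical helpers below apply both checks to the Mexican coefficients (four decimals, as reported); they are a sketch, not part of the published methodology.

```python
# Mexican decrements for levels 2-5 (regular dummies), transcribed from the formula above
MEXICO_DECREMENTS = {
    "MO": (0.0160, 0.0473, 0.1786, 0.2697),
    "SC": (0.0476, 0.0819, 0.1697, 0.2589),
    "UA": (0.0553, 0.0952, 0.1798, 0.2758),
    "PD": (0.0531, 0.0808, 0.2283, 0.4579),
    "AD": (0.0551, 0.0824, 0.1611, 0.3337),
}


def monotonicity_violations(decrements: dict) -> list:
    """List (dimension, level) pairs whose decrement is not larger than that of the level below."""
    problems = []
    for dim, levels in decrements.items():
        for k in range(1, len(levels)):
            if levels[k] <= levels[k - 1]:
                problems.append((dim, k + 2))  # levels[k] holds the decrement for level k + 2
    return problems


def worst_state_value(decrements: dict) -> float:
    """Value of 55555: 1 minus the level-5 decrement of every dimension."""
    return round(1 - sum(levels[-1] for levels in decrements.values()), 4)


print(monotonicity_violations(MEXICO_DECREMENTS))  # []
print(worst_state_value(MEXICO_DECREMENTS))        # -0.596
```

Ties are reported too, because a value set that deliberately constrains two levels to be equal (such as one with identical level-4 and level-5 decrements) is worth flagging during transcription.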

#### (i) **Date/wave of study**

Data were collected in the third wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 2.1. Interviews were conducted from June until August 2019.

#### (ii) **Sample size; sample frame**

A nationally representative sample of Mexican adults aged over 18 years, stratified by sex, age, and socioeconomic level, was obtained using sample frames developed by CONAPO, the Mexican Office of Statistics and Geography, and the socioeconomic classification of households by the Mexican Association of Marketing Research and Public Opinion Agencies (AMAI). Of the 1000 respondents included in the analysis, 48.6% were female and 51.4% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was representative of the adult Mexican population in terms of age, sex, geographical region, and socioeconomic group (Table 4.86).



**Table 4.86** Representativeness of the sample in the Mexican valuation study

Reproduced from Gutierrez-Delgado et al. (2021)

a Instituto Nacional de Estadística y Geografía 2015

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.87)


**Table 4.87** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.44)

**Fig. 4.44** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

Using the feedback module, 2.1% of the cTTO responses were deemed problematic and were excluded from analysis. No DCE data were excluded from the analysis.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1000 interviews were conducted by 15 interviewers. The variance of the responses can be partitioned into variance related to differences between interviewers (0.99%), respondents (7.61%), and responses (91.40%).

#### (viii) **Description of modelling choices**

The observed cTTO values for the non-flagged health states after the feedback module were used (i.e. each respondent's flagged cTTO observations were excluded); details on the feedback module and its use are provided in Chap. 2.

The Mexican EQ-5D-5L value set was based on a 20-parameter model estimated using only cTTO data. The value set is based on a heteroscedastic Bayesian model with censoring at −1.

#### (ix) **Value Set** (Table 4.88 and Fig. 4.45)


**Table 4.88** Key characteristics of the Mexican value set

**Fig. 4.45** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/ discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

Mexico's General Health Council (GHC) is the collegiate body responsible for updating the National Compendium of Healthcare Supplies of the public health care institutions. The Compendium aims to strengthen the evaluation of health care technologies, to optimize the public resources directed at addressing health problems, and to notify and update health professionals. The GHC periodically updates the health technology assessment (HTA) processes used to determine inclusion in the Compendium. The most recent update of the HTA process includes cost-utility analysis (CUA) as a complementary evaluation that can be presented to strengthen the cost-effectiveness and budget impact analyses of health technologies seeking inclusion in the Compendium. The Mexican value set is expected to encourage the development of CUA in the near future.

#### (xi) **Reference(s) for this value set**

Gutierrez-Delgado C, Galindo-Suárez RM, Cruz-Santiago C, Shah K, Papadimitropoulos M, Zamora B, Feng Y, Devlin N (2021) EQ-5D-5L Health-State Values for the Mexican Population. Appl Health Econ Health Policy. https://doi.org/10.1007/s40258-021-00658-0

#### **Further Literature**


#### **4.3.3.13 Country/Region: Peru (Table 4.89)**


**Table 4.89** Overview of EQ-5D-5L value set for Peru<sup>a</sup>

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

a In the valuation study manuscript, the value set is reported in incremental dummies. For consistency, we report regular dummies here.

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) = 1 &- 0.104\,MO_2 - 0.223\,MO_3 - 0.312\,MO_4 - 0.473\,MO_5 \\ &- 0.117\,SC_2 - 0.214\,SC_3 - 0.264\,SC_4 - 0.355\,SC_5 \\ &- 0.143\,UA_2 - 0.157\,UA_3 - 0.231\,UA_4 - 0.347\,UA_5 \\ &- 0.072\,PD_2 - 0.132\,PD_3 - 0.287\,PD_4 - 0.476\,PD_5 \\ &- 0.123\,AD_2 - 0.126\,AD_3 - 0.188\,AD_4 - 0.422\,AD_5 \end{aligned}$$
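Because the model is additive, the loss of value for any profile can be split into one contribution per dimension, which is what the value-decrement figure for each country visualises. The hypothetical sketch below does this for the Peruvian coefficients: at level 5, pain/discomfort carries the largest decrement (0.476), narrowly ahead of mobility (0.473). Names are illustrative and rounding is an assumption.

```python
# Peruvian decrements for levels 2-5 (regular dummies), transcribed from the formula above
PERU_DECREMENTS = {
    "MO": (0.104, 0.223, 0.312, 0.473),
    "SC": (0.117, 0.214, 0.264, 0.355),
    "UA": (0.143, 0.157, 0.231, 0.347),
    "PD": (0.072, 0.132, 0.287, 0.476),
    "AD": (0.123, 0.126, 0.188, 0.422),
}
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def contributions(state: str) -> dict:
    """Decrement contributed by each dimension for a five-digit profile."""
    out = {}
    for dim, level in zip(DIMENSIONS, map(int, state)):
        out[dim] = PERU_DECREMENTS[dim][level - 2] if level > 1 else 0.0
    return out


parts = contributions("55555")
print(max(parts, key=parts.get))          # PD
print(round(1 - sum(parts.values()), 3))  # -1.073
```

The ranking depends on the level considered: at level 3, for example, mobility (0.223) outweighs pain/discomfort (0.132).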

#### (i) **Date/wave of study**

Data were collected in the third wave of EQ-5D-5L valuation studies using a "Lite" version of the EQ-5D-5L valuation protocol 2.1 (see also Chap. 3 for more details). Interviews were conducted between April 2018 and February 2019.

#### (ii) **Sample size; sample frame**

1000 interviews with the general population were conducted in three cities located in different parts of Peru: Lima, Arequipa, and Iquitos. Of these respondents, 300 completed a full cTTO + DCE interview, while 700 completed DCE tasks only. Sampling was stratified by region, age, and gender. Of the 300 respondents included in the final value set, 49.7% were female and 50.3% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample** (Table 4.90)


**Table 4.90** Representativeness of the sample in the Peruvian valuation study

Reproduced from Augustovski et al. (2020)

a Instituto Nacional de Estadística e Informática 2017

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.91)


**Table 4.91** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.46)

**Fig. 4.46** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

Respondents were excluded if they were outside the intended age range (18–75 years) or did not live in a household selected through the sampling strategy. A total of 30 respondents were excluded for these reasons.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1000 interviews were conducted by 12 interviewers. 5 interviewers performed cTTO + DCE interviews, while the other 7 interviewers conducted DCE interviews only. The variance of the responses included in the fnal value set can be partitioned into variance related to differences between interviewers (0.7%), respondents (27.5%) and responses (71.8%).

#### (viii) **Description of modelling choices**

The Peruvian EQ-5D-5L value set was based on the cTTO data only. The selected model was a Tobit model accounting for censoring at −1, with a correction for heteroskedasticity. The model was additionally adjusted using differential weights to improve the representativeness of the sample relative to the general population of Peru. The intercept was constrained in the final model.

#### (ix) **Value Set** (Table 4.92 and Fig. 4.47)


**Table 4.92** Key characteristics of the Peruvian value set

**Fig. 4.47** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/ discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

Cost-utility analysis (CUA) is currently not required by the Peruvian Ministry of Health or by the national insurance agencies (Seguro Integral de Salud and Fondo Intangible Solidario de Salud). However, CUAs using weights from other countries have previously been used for decision making, especially for vaccination recommendations (Bolaños-Díaz et al. 2017; Bolaños-Díaz et al. 2016). Furthermore, researchers at the CRONICAS institute for chronic diseases have been using the Peruvian EQ-5D-5L values in ongoing research. The National HTA Network, Red Nacional de Evaluación de Tecnologías Sanitarias (RENETSA), currently makes no specific recommendation on the use of EQ-5D-5L in economic evaluations in health care.

#### (xi) **Reference(s) of value set**

Augustovski F, Belizán M, Gibbons L, Reyes N, Stolk E, Craig BM, Tejada RA (2020) Peruvian Valuation of the EQ-5D-5L: A Direct Comparison of Time Trade-Off and Discrete Choice Experiments. Value Health 23(7):880–888

#### **Further Literature**


#### **4.3.3.14 Country/Region: United States (Table 4.93)**


**Table 4.93** Overview of EQ-5D-5L value set for the United States

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) = 1 &- 0.096\,MO_2 - 0.122\,MO_3 - 0.237\,MO_4 - 0.322\,MO_5 \\ &- 0.089\,SC_2 - 0.107\,SC_3 - 0.220\,SC_4 - 0.261\,SC_5 \\ &- 0.068\,UA_2 - 0.101\,UA_3 - 0.255\,UA_4 - 0.255\,UA_5 \\ &- 0.060\,PD_2 - 0.098\,PD_3 - 0.318\,PD_4 - 0.414\,PD_5 \\ &- 0.057\,AD_2 - 0.123\,AD_3 - 0.299\,AD_4 - 0.321\,AD_5 \end{aligned}$$
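The US model constrains the decrements for usual activities levels 4 and 5 to be equal (see the description of modelling choices for this value set), so two profiles that differ only in UA level 4 versus 5 receive the same value. The hypothetical scorer below makes that visible; the coefficients are transcribed from the formula and three-decimal rounding is an assumption.

```python
# US decrements for levels 2-5 (regular dummies), transcribed from the formula above
US_DECREMENTS = {
    "MO": (0.096, 0.122, 0.237, 0.322),
    "SC": (0.089, 0.107, 0.220, 0.261),
    "UA": (0.068, 0.101, 0.255, 0.255),  # levels 4 and 5 constrained to be equal
    "PD": (0.060, 0.098, 0.318, 0.414),
    "AD": (0.057, 0.123, 0.299, 0.321),
}
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def us_value(state: str) -> float:
    """Index value of a five-digit profile in MO, SC, UA, PD, AD order."""
    if len(state) != 5 or any(c not in "12345" for c in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    value = 1.0
    for dim, level in zip(DIMENSIONS, map(int, state)):
        if level > 1:
            value -= US_DECREMENTS[dim][level - 2]
    return round(value, 3)


print(us_value("11411"), us_value("11511"))  # 0.745 0.745
print(us_value("55555"))                     # -0.573
```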

#### (i) **Date/wave of study**

Data were collected in the third wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 2.1. Interviews were conducted between May and September 2017.

#### (ii) **Sample size; sample frame**

1134 interviews with the general population were conducted in 6 US metropolitan areas: Chicago, Philadelphia, Seattle, Birmingham, Phoenix, and Denver. Quota sampling with respect to age, gender, ethnicity, and race was applied based on the US census (US Census Bureau 2015). Of the 1062 respondents included in the final value set, 51.0% were female and 48.5% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was representative of the US adult population in terms of age, gender, race, ethnicity, prevalence of chronic disease, and general health status (Table 4.94).


**Table 4.94** Representativeness of the sample in the US valuation study



Reproduced from Pickard et al. (2019)

a All respondents of the 'analytic' sample are included, each of whom provided useable cTTO and DCE data (n=1102)

b Li et al. 2011; Centers for Disease Control and Prevention (CDC) 2018; US Census Bureau 2015

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.95)


**Table 4.95** Mean observed cTTO values by health state



*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.48)

**Fig. 4.48** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

Respondents who, based on interviewer assessment, did not understand the cTTO tasks (n = 72) were excluded from the analysis. Among the remaining respondents, who were judged to have understood the task, a total of 11.6% of cTTO responses (n = 1234) were removed following the feedback module.

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1134 interviews were conducted by 11 interviewers. The variance of the responses included in the fnal value set can be partitioned into variance related to differences between interviewers (0.04%), respondents (35.25%), and responses (64.71%).

#### (viii) **Description of modelling choices**

The US EQ-5D-5L value set was based on the cTTO data only. The selected model was a tobit model, left-censored at −1, correcting for heteroskedasticity and accounting for panel data (i.e., random intercept). Usual activities levels 4 and 5 were constrained to have the same value decrement. The intercept was constrained in the final model because it was non-significant.

#### (ix) **Value Set** (Table 4.96 and Fig. 4.49)


**Table 4.96** Key characteristics of the US value set

**Fig. 4.49** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/ discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

In the US, the Affordable Care Act prohibits the denial of coverage of medical products or services in a manner that treats extending the life of an elderly, disabled, or terminally ill individual as of lower value than extending the life of a healthy individual. However, the absence of drug price regulation and instances of predatory business practices have led to the emergence of the nonprofit Institute for Clinical and Economic Review (ICER) as a highly influential organisation that assesses the fairness of drug prices using QALYs as one of its metrics. The EQ-5D-3L forms the basis for many utility inputs, and this will likely extend to the EQ-5D-5L as its use in the literature grows. In the two years since the US value set was published, the EQ-5D-5L has been employed as an endpoint in clinical trials of pharmaceuticals as well as psychosocial interventions, with evidence that favorably supports its validity and responsiveness across a range of applications and health care interventions (Courtin et al. 2020; Hanmer et al. 2021; Reveille et al. 2021; Xiang et al. 2020).

#### (xi) **Reference(s) for this value set**

Pickard AS, Law EH, Jiang R, Pullenayegum E, Shaw JW, Xie F, Oppe M, Boye KS, Chapman RH, Gong CL, Balch A, Busschbach JJV (2019) United States Valuation of EQ-5D-5L Health States Using an International Protocol. Value Health 22(8):931–941

#### **Further Literature**


#### **4.3.3.15 Country/Region: Vietnam (Table 4.97)**


**Table 4.97** Overview of EQ-5D-5L value set for Vietnam

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

The mathematical representation of the model for health state X is:

$$\begin{aligned} V(X) = 1 &- 0.069\,MO_2 - 0.079\,MO_3 - 0.206\,MO_4 - 0.376\,MO_5 \\ &- 0.043\,SC_2 - 0.046\,SC_3 - 0.147\,SC_4 - 0.231\,SC_5 \\ &- 0.046\,UA_2 - 0.059\,UA_3 - 0.174\,UA_4 - 0.299\,UA_5 \\ &- 0.084\,PD_2 - 0.152\,PD_3 - 0.270\,PD_4 - 0.367\,PD_5 \\ &- 0.064\,AD_2 - 0.113\,AD_3 - 0.171\,AD_4 - 0.239\,AD_5 \end{aligned}$$
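A common use of a value set is to express a change in health, for instance before and after an intervention, as a difference in index values. The sketch below does this with the Vietnamese decrements; the two profiles are hypothetical, the helper names are illustrative, and rounding to three decimals is an assumption.

```python
# Vietnamese decrements for levels 2-5 (regular dummies), transcribed from the formula above
VIETNAM_DECREMENTS = {
    "MO": (0.069, 0.079, 0.206, 0.376),
    "SC": (0.043, 0.046, 0.147, 0.231),
    "UA": (0.046, 0.059, 0.174, 0.299),
    "PD": (0.084, 0.152, 0.270, 0.367),
    "AD": (0.064, 0.113, 0.171, 0.239),
}
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def vietnam_value(state: str) -> float:
    """Unrounded index value of a five-digit profile in MO, SC, UA, PD, AD order."""
    if len(state) != 5 or any(c not in "12345" for c in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    value = 1.0
    for dim, level in zip(DIMENSIONS, map(int, state)):
        if level > 1:
            value -= VIETNAM_DECREMENTS[dim][level - 2]
    return value


def value_change(before: str, after: str) -> float:
    """Difference in index value between two profiles (positive means improvement)."""
    return round(vietnam_value(after) - vietnam_value(before), 3)


print(round(vietnam_value("32211"), 3), round(vietnam_value("21111"), 3))  # 0.832 0.931
print(value_change("32211", "21111"))                                      # 0.099
```

Rounding only the final difference, rather than each value first, avoids compounding rounding error.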

#### (i) **Date/wave of study**

Data were collected in the third wave of EQ-5D-5L valuation studies using the EQ-5D-5L valuation protocol 2.1. Two adjustments were made to the EQ-VT: (1) respondents were asked to complete the cTTO tasks for someone like them ("third person") rather than for themselves, as the protocol specifies; and (2) printed coloured DCE cards were additionally used as a visual aid. Interviews were conducted between November and December 2017.

#### (ii) **Sample size; sample frame**

A total of 1200 interviews with members of the general population were conducted in six provinces representing different geographical regions: the Northern mountains, the Red River delta, the Highlands, the Central Coast, the South-East and the Mekong River delta. Multi-stage stratified cluster sampling was applied with respect to region (stage 1) and residential area (stage 2), followed by a probabilistic quota-based method (stage 3) with respect to age (over 18 years) and gender (Vietnam General Statistic Office 2017). Of the 1200 respondents included in the final value set, 51% were female and 49% were male. The age distribution of the respondents was:


#### (iii) **Representativeness of achieved sample**

The study sample was largely representative of the Vietnamese general population in terms of age, gender, and residential area (Table 4.98).


**Table 4.98** Representativeness of the sample in the Vietnamese valuation study


#### **Table 4.98** (continued)

Reproduced from Mai et al. (2020)

a Vietnam General Statistic Book 2017

b Poverty level was based on the official Vietnamese poverty line

#### (iv) **Mean observed cTTO values of EQ-5D-5L states** (Table 4.99)


**Table 4.99** Mean observed cTTO values by health state

*SE* standard error

#### (v) **Proportions choosing A in the DCE based on relative severities of A and B** (Fig. 4.50)

**Fig. 4.50** Proportions choosing A based on relative severities of A and B

#### (vi) **Exclusion criteria**

A total of 8.9% of cTTO responses (n = 1068) were removed following the feedback module, but no respondent's cTTO responses were excluded in their entirety. The observations from the ten manually added DCE pairs were excluded from the data analysis (details on the DCE design are provided in Chap. 3).

#### (vii) **Number of interviewers; Interviewer effects**

In total, 1200 interviews were conducted by 10 interviewers. The variance of the responses can be partitioned into variance related to differences between interviewers (4.0%), respondents (13.7%), and responses (82.3%).

#### (viii) **Description of modelling choices**

The Vietnamese EQ-5D-5L value set was based on a hybrid model combining a conditional logit model for the DCE data and a tobit model censored at -1 for the cTTO data. The intercept was constrained in the final model.
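The sketch below shows, under simplifying assumptions, how a hybrid model of this kind combines the two data sources into a single log-likelihood in which the DCE and cTTO parts share the same level decrements. It assumes a normal likelihood for cTTO values that is left-censored at -1 (tobit), a conditional logit for each DCE pair with a hypothetical scale factor `theta` applied to value differences, homoskedastic errors and no intercept. These are illustrative assumptions, not the exact specification used by Mai et al. (2020); all names are hypothetical, and the numerical maximisation that would produce estimates is omitted.

```python
import math

DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def decrement_sum(profile, beta):
    """Total decrement for a 5-digit profile; beta[dim][k] is the decrement for level k + 1."""
    return sum(beta[d][int(c) - 1] for d, c in zip(DIMENSIONS, profile))


def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(z):
    """Log CDF of the standard normal distribution, computed via erfc."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def hybrid_loglik(beta, theta, sigma, ctto, dce, lower=-1.0):
    """Joint log-likelihood of a simplified cTTO tobit + DCE conditional logit model.

    beta:  dict mapping each dimension to 5 decrements (level 1 fixed at 0)
    theta: scale factor applied to value differences in the DCE part
    sigma: standard deviation of the normal cTTO error term
    ctto:  list of (profile, observed cTTO value) pairs
    dce:   list of (profile_a, profile_b, chose_a) triples
    """
    ll = 0.0
    for profile, y in ctto:
        mu = 1.0 - decrement_sum(profile, beta)  # no intercept: 11111 is fixed at 1
        if y <= lower:
            # Censored observation: probability that the latent value is <= -1.
            ll += norm_logcdf((lower - mu) / sigma)
        else:
            ll += norm_logpdf((y - mu) / sigma) - math.log(sigma)
    for a, b, chose_a in dce:
        # V_A - V_B = (1 - D_A) - (1 - D_B) = D_B - D_A, scaled by theta.
        diff = theta * (decrement_sum(b, beta) - decrement_sum(a, beta))
        # log P(A) = -log(1 + exp(-diff)); log P(B) = -log(1 + exp(diff)).
        ll += -math.log1p(math.exp(-diff)) if chose_a else -math.log1p(math.exp(diff))
    return ll


# Hypothetical starting values and toy data, for illustration only.
start = {d: (0.0, 0.05, 0.1, 0.2, 0.3) for d in DIMENSIONS}
ctto = [("21111", 0.9), ("55555", -1.0)]
dce = [("21111", "55555", True)]
print(hybrid_loglik(start, theta=5.0, sigma=0.3, ctto=ctto, dce=dce))
```

Sharing `beta` across both components is what makes the model "hybrid": the DCE pairs inform the relative size of decrements, while the cTTO data anchor them on the scale where full health is 1 and dead is 0.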

#### (ix) **Value Set** (Table 4.100 and Fig. 4.51)


**Table 4.100** Key characteristics of the Vietnamese value set

**Fig. 4.51** Value decrements across dimensions (*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities)

#### (x) **Uptake by local HTA/health care decision makers**

Vietnam's Ministry of Health (MOH) has taken a first step towards applying evidence-based medicine to the health policy-making process by enacting national HTA guidelines and upgrading the health insurance package with cost-effective drugs based on HTA evidence (Ministry of Health 2017). The MOH has applied evidence-based HTA to produce well-informed health care decisions, initially in health insurance. According to the national guideline on HTA submissions, the QALY is a required index. QALY estimates can either be sourced from the relevant literature or measured directly using the suggested instrument, the EQ-5D-5L. In Vietnam, the EQ-5D-5L is presently the only HRQoL instrument that can produce values based on the preferences of the Vietnamese general population (Mai et al. 2020).

#### (xi) **Reference(s) for this value set**

Mai VQ, Sun S, Minh HV, Luo N, Giang KB, Lindholm L, Sahlen KG (2020) An EQ-5D-5L Value Set for Vietnam. Qual Life Res 29(7):1923–1933

#### **Further Literature**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 5 Guidance to Users of EQ-5D-5L Value Sets**

**Nancy Devlin, Aureliano Paolo Finch, and David Parkin**

**Abstract** One of the most common questions that the EuroQol Group is asked by users of the EQ-5D-5L is: 'Which value set should I use?'. The aim of this chapter is to provide guidance on this issue for users. There are two principal ways in which EQ-5D-5L value sets are applied and used. The first is for summarising health-related quality of life to estimate quality-adjusted life-years (QALYs) and changes in QALYs that result from health care use. This kind of evidence is often part of health technology assessment (HTA). The second category of use is when value sets are employed as a way of summarising and statistically analysing EQ-5D-5L profile data without the aim of estimating QALYs. In each case, the stated requirements of those who use this evidence in decision making are a key consideration. This chapter summarises the considerations to be taken into account when choosing a value set for QALY estimation purposes, and those relevant to choosing a value set for other, 'non-QALY' applications.

# **5.1 Introduction**

One of the most common questions that the EuroQol Group is asked by users of the EQ-5D-5L is: '*Which value set should I use*?'. There is no simple answer to this, as it depends on the user's objectives in using the instrument, the decisions that it informs, and the context in which the information will be used. Selecting an EQ-5D-5L value set will also be affected by the availability of value sets and their acceptability to users. Which value set to use is straightforward under two conditions: (a) an EQ-5D-5L value set, based on the EQ-VT protocol described in Chap. 2, is available for the country to which the data to be analysed refer; and (b) that value set is acceptable to those who will make decisions based on it.

N. Devlin (\*)

Centre for Health Policy, University of Melbourne, Melbourne, VIC, Australia e-mail: nancy.devlin@unimelb.edu.au

A. P. Finch EuroQol Research Foundation, Rotterdam, The Netherlands

D. Parkin Office of Health Economics, London, UK

City, University of London, London, UK

However, in many countries a local EQ-VT-generated EQ-5D-5L value set is not available; and even if there is one there is no guarantee that local decision-makers will accept it. In these circumstances, alternatives include using another country's value set that was generated using the EQ-VT protocol; using a value set generated by an alternative valuation method; and mapping from the EQ-5D-5L to the EQ-5D-3L, where a local value set exists for the latter.

This chapter guides potential users through these and other issues that arise when choosing an EQ-5D-5L value set. Sections 5.2 and 5.3 present an overview of the principal considerations relevant to users, serving as an easy-access guide. Section 5.4 discusses some more technical and theoretical issues.

The first and most important question for any user of an EQ-5D-5L value set is: '*What is the purpose of representing EQ-5D-5L profile data as a single number?*'. Broadly, two main categories of use can be identified. The first important category is when the EQ-5D-5L is used to summarise health-related quality of life (HRQoL) in order to estimate Quality Adjusted Life Years (QALYs) and the changes in QALYs that result from health care use. This kind of evidence is often part of health technology assessment (HTA). Section 5.2 discusses relevant considerations in choosing a value set for QALY estimation.
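As a minimal illustration of how values enter QALY estimates, the sketch below computes QALYs as the area under a piecewise-linear curve of index values over time (the trapezoidal rule), a common approach for trial data that is assumed here rather than prescribed by this chapter. The function name, time points and values are hypothetical; real analyses would also need to address discounting, missing data and baseline imbalance.

```python
def qalys_auc(times, values):
    """QALYs as the area under a piecewise-linear curve of index values.

    times:  ascending assessment times in years (e.g. trial visits)
    values: EQ-5D-5L index values observed at those times
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired time points")
    total = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        if width < 0:
            raise ValueError("times must be in ascending order")
        # Trapezoid: average of the two values times the interval length.
        total += width * (values[i - 1] + values[i]) / 2.0
    return total


# Full health (value 1) maintained for one year gives one QALY.
print(qalys_auc([0.0, 1.0], [1.0, 1.0]))  # 1.0

# Two hypothetical trial arms assessed at 0, 6 and 12 months (illustrative values).
treated = qalys_auc([0.0, 0.5, 1.0], [0.6, 0.8, 0.9])
control = qalys_auc([0.0, 0.5, 1.0], [0.6, 0.7, 0.7])
print(round(treated - control, 3))  # 0.1
```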

The second important category of use is when value sets are employed as a way of summarising and statistically analysing EQ-5D-5L profile data without the aim of estimating QALYs. Section 5.3 summarises the considerations relevant to choosing which value set to use in these 'non-QALY' applications.

# **5.2 Which Value Set Should Be Used to Estimate QALYs? – An Overview**

The use of EQ-5D-5L values to estimate QALYs imposes requirements on the characteristics of those values. This specific use of values is of such importance that these requirements are largely built into the methods for eliciting and modelling them. Unfortunately, there is no consensus about the theoretical properties that values used to estimate QALYs should have, as reflected in ongoing debates about which valuation methods best meet those properties. However, some principles are widely adopted, and requirements that meet these, detailed in Box 5.1, underlie all of the value sets produced using the EQ-VT protocol (see Chap. 4). Other valuation protocols may not meet them. For example, value sets that rely exclusively on Discrete Choice Experiments (DCE) without a duration attribute or any other means of anchoring the DCE responses do not meet these requirements, which largely rules them out for QALY estimation.

#### **Box 5.1: What Properties do EQ-5D-5L Values Need to Have to Be Suitable for Use in Estimating QALYs for Economic Evaluation?**

For use in economic evaluation, QALYs must have some basic properties, for example that they can be used as an *unambiguous* measure of the *value* of *every* health care intervention (Morris et al. 2012). How this translates into requirements for the health state values that form the 'Q' element of QALYs is less clear and subject to debate over both economic and psychometric theory and practice. Possibly the only universally agreed property for these values derives from the definition of a QALY: full health maintained over one year will generate one QALY, implying that the value attached to full health should be equal to 1. Current practice underlying the value sets described in this book is therefore open to debate but does meet the basic requirements for measuring QALYs. It assumes that, at a minimum, values should be:


These requirements contributed to the EuroQol Group's decision to use time trade-off (TTO) and Discrete Choice Experiments (DCE) in the EQ-VT protocol for EQ-5D-5L valuation studies (see Chap. 2). Section 5.4.2 briefly discusses these issues further, with suggested further reading.

Values are sometimes referred to as 'utilities', but the value sets described in this book do not claim to measure utility according to any of its conventional technical definitions (see Drummond et al. 2015, Chapter 5, Section 5.4.2). For example, they may not conform to the axioms underlying von Neumann-Morgenstern measurable utility under conditions of uncertainty based on expected utility theory (EUT). The Standard Gamble (SG) method aims to elicit such utilities but is not widely used because of concerns about the validity of EUT and the ability of respondents to judge probabilities. Other value set properties required for estimating QALYs, such as constant proportionality and additive independence, are assumed to be satisfied, as is the case with all HRQoL instruments accompanied by values.

Figure 5.1 presents a summary of the main considerations in choosing an EQ-5D-5L value set when the main aim is QALY estimation. First, users should assess whether the QALY analysis is for use in HTA or other purposes, and who will be informed by it. HTA bodies and other decision-makers using QALY evidence may have specific recommendations about their preferred value set, which in most cases would be the first choice for the base case. If not, the choice to be made depends on factors such as the local availability of value sets, the relevance of available non-local value sets and, in either case, their empirical characteristics and their theoretical properties. These issues are discussed in more detail in the following sections.

# *5.2.1 End Users' Requirements and Recommendations*

'End users' refers to whoever the analysis of EQ-5D-5L data is intended to inform. This could be national or local government bodies, HTA organisations, local health care budget holders, health care providers and insurers, health care professionals, patients or the general public. In practice, it is likely that the only end users who will specify a preferred or accepted value set are HTA organisations. Hence, when EQ-5D-5L data are analysed to generate estimates of QALYs for cost-effectiveness analysis, we recommend first checking whether the relevant HTA body or other stakeholder has published a 'methods guide' or other guidance stating its requirements for value set selection.

Kennedy-Martin et al. (2020) provide a summary of the stated requirements of health care decision-making bodies internationally regarding the valuation of health states. For example, the National Institute for Health and Care Excellence (NICE) in the UK (NICE 2013; currently being updated), Zorginstituut Nederland in the Netherlands (Zorginstituut Nederland 2016) and the Haute Autorité de Santé in France (HAS 2020) each provide HTA methods guides on how EQ-5D-5L data should be valued for submissions to them. In contrast to these European agencies, the Pharmaceutical Benefits Advisory Committee (PBAC) in Australia is less prescriptive about which HRQoL instrument to use, and which value set to employ in conjunction with it (PBAC 2016). In most cases, HTA authorities' methods guides state that a value set based on the stated preferences of that country's general public is recommended. There are exceptions; for example, Sweden's Dental and Pharmaceutical Benefits Agency (TLV) indicates that the values used in submissions to it should reflect Swedish patients' experienced values, i.e. 'appraisals of persons in the health condition in question' (TLV 2003, 2017), rather than the stated preferences of the Swedish general public.

There may be cases in which there is no end-user guidance about value sets, or the guidance provided is too broad to assist in choosing between alternative value sets. This is a particular problem when QALY estimates are derived from multi-country trial data, are used as evidence in multiple HTA submissions, or both. The choice of value set may be made even more difficult if the end user is a global organisation making recommendations that affect multiple countries. In these instances, the choice of value set is left to the user. In Sects. 5.2.2 to 5.2.4 we describe the criteria that users should consider in such cases.

# *5.2.2 Relevance to the Population to Whom the Analysis Refers*

To our knowledge, most HTA methods guides recommend that QALY estimates should ideally be based on values obtained locally, that is, from the area over which that HTA body has jurisdiction. This ensures that resource allocation decisions reflect that country's preferences about the relative importance of different health problems. There are more national EQ-5D value sets available than for any other generic measure of HRQoL. The availability of EQ-5D-5L value sets will continue to expand, as further countries undertake valuation studies to support the development and expansion of HTA worldwide. However, there will inevitably remain countries where no local value sets are available.

For a country that does not have an EQ-5D-5L value set but does have an EQ-5D-3L value set, mapping between the two descriptive systems provides one means of valuing the EQ-5D-5L (see Box 5.2 below). Mapping methods have also been used to estimate a link between the EQ-5D and other measures of HRQoL, including condition-specific measures, but these will not be discussed here as they do not produce a value set for the EQ-5D-5L. The use of mapping methods may meet HTA requirements; for example, current NICE guidance recommends mapping the EQ-5D-5L to the EQ-5D-3L (NICE 2019), thereby allowing use of values from the York MVH 'A1 Tariff' EQ-5D-3L value set (MVH Group 1995).

Analysts are therefore recommended to consult relevant local HTA methods guides before choosing whether to use a mapping method, and which one to use. Box 5.2 provides further details on mapping.

If there are no local value sets for either the EQ-5D-5L *or* EQ-5D-3L, an obvious suggestion is to use a value set from a country that has a similar population, considering socio-demographic, cultural and linguistic characteristics that might be expected to influence health preferences (evidence about how such characteristics influence values is presented in Chap. 6). That is straightforward if there is only one such country, and their value set satisfies the other criteria detailed below. Where there is more than one value set which may be considered relevant and acceptable, the choice of value set should be subject to sensitivity analysis.

A special case is where a study is undertaken in more than one distinct population, as may be the case with, for example, a multi-country or multi-region clinical trial. While it has been proposed to use a single value set to represent the preferences for a region or continent when available (e.g., Greiner et al. 2003; Łaszewska et al. 2020), this solution is currently not widely applied. The possibility of developing regional value sets for EQ-5D-5L is explored in Chap. 6. If the results of the clinical study are to be used in different HTA jurisdictions, each of which makes recommendations about the use of value sets, these recommendations should be followed, which might result in more than one value set being applied to the same data.

There are advantages to having a single value set that could be used in cases where there is no local alternative, or where the values are required to cover more than one locality: for example, it enables comparison of results in such cases. The EQ-5D value sets for the UK and the USA have sometimes been used for this purpose. However, there is no scientific rationale for choosing any value set as a default option.

#### **Box 5.2: Mapping Between 3L and 5L to Create Value Sets**

The most notable example of the application of mapping methods to create value sets for the EQ-5D-5L that may be used when no valuation studies are available is van Hout et al. (2011). In this response mapping study, data from 3691 patients in six European countries who completed both the EQ-5D-3L and EQ-5D-5L were analysed using four different statistical methods. The chosen method was the 'indirect non-parametric method', which assumed independence of each EQ-5D dimension and removed inconsistent responses, such as choosing level 1 on the 3L and level 5 on the 5L. This generates transition probabilities: the probability that a person would have recorded a particular response to the EQ-5D-3L given the response they gave to the EQ-5D-5L. The resulting 243 × 3125 table of transition probabilities can be applied to any EQ-5D-3L value set to generate a 5L 'crosswalk' value set.

At the time when the van Hout et al. (2011) mapping was developed, EQ-5D-5L value set studies had not yet been initiated, which made it impossible to develop a bi-directional crosswalk. More recently, in response to users' demand and given the availability of EQ-5D-5L value sets, the same data used in the original van Hout et al. (2011) crosswalk were employed to map the EQ-5D-3L to the EQ-5D-5L, using indirect non-parametric and ordinal logistic regression methods (van Hout and Shaw 2021).

An alternative response mapping approach for deriving EQ-5D-5L or EQ-5D-3L values has been proposed by Hernández-Alava and Pudney (2017), but it remains less widely used. This mapping was re-estimated on multiple samples, the most recent estimation being based on a large dataset of English respondents (Hernández-Alava et al. 2020). Its statistical performance is similar to that of the van Hout crosswalk for mapping the EQ-5D-3L to the EQ-5D-5L (Hernández-Alava et al. 2020). The van Hout and Shaw (2021) mapping, using ordinal logistic regression including regressors coding for the other EQ-5D-3L dimensions, shows slightly better performance than that of Hernández-Alava and Pudney (2017) for mapping the EQ-5D-3L to the EQ-5D-5L. Notably, the current iteration of the Hernández-Alava and Pudney (2017) crosswalk only allows mapping to UK/English value sets, whereas the models developed in van Hout and Shaw (2021) are freely accessible in R and are easily adapted to other value sets.

As there is currently no consensus about which of these approaches should be used, users are encouraged to check the latest recommendations from the scientific advisers in the EuroQol office and the relevant HTA body. The analysis tools section of the EuroQol website reports generic and country-specific algorithms for both the van Hout et al. (2011) EQ-5D-5L to EQ-5D-3L and the van Hout and Shaw (2021) EQ-5D-3L to EQ-5D-5L crosswalks, as well as syntax for the value sets for some countries.

These are available at: https://euroqol.org/support/analysis-tools/
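The logic of the indirect non-parametric crosswalk described in Box 5.2 can be sketched as an expected value: assuming independent dimensions, the probability of each of the 243 EQ-5D-3L profiles given a 5L profile is the product of per-dimension transition probabilities, and the crosswalk value is the probability-weighted average of the 3L values. The transition probabilities and the additive 3L value set below are hypothetical placeholders, not the van Hout et al. (2011) estimates, and for simplicity one transition table is used for all five dimensions, whereas real estimates differ by dimension. For real analyses, use the algorithms on the EuroQol analysis tools page.

```python
from itertools import product

# HYPOTHETICAL per-dimension transition probabilities P(3L level | 5L level).
# Rows are 5L levels 1-5, columns 3L levels 1-3; each row sums to 1.
# Zeros encode "inconsistent" pairs (e.g. 5L level 5 with 3L level 1).
TOY_TRANSITION = (
    (1.00, 0.00, 0.00),
    (0.60, 0.40, 0.00),
    (0.10, 0.85, 0.05),
    (0.00, 0.70, 0.30),
    (0.00, 0.05, 0.95),
)


def toy_value_3l(profile_3l):
    """HYPOTHETICAL additive EQ-5D-3L value set: 0.1 off per level above 1."""
    return 1.0 - 0.1 * sum(int(c) - 1 for c in profile_3l)


def crosswalk_value(profile_5l, value_3l, transition=TOY_TRANSITION):
    """Expected 3L value for a 5L profile, assuming independent dimensions."""
    rows = [transition[int(c) - 1] for c in profile_5l]
    expected = 0.0
    for levels in product(range(3), repeat=5):  # all 3**5 = 243 3L profiles
        p = 1.0
        for row, level in zip(rows, levels):
            p *= row[level]  # independence: joint probability is a product
        if p > 0.0:
            profile_3l = "".join(str(level + 1) for level in levels)
            expected += p * value_3l(profile_3l)
    return expected


print(round(crosswalk_value("21111", toy_value_3l), 3))  # 0.96
print(round(crosswalk_value("55555", toy_value_3l), 3))  # 0.025
```

Because the result is an expectation, any 3L value set can be plugged in as `value_3l`, which is why a single table of transition probabilities can serve many countries.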

# *5.2.3 Empirical Characteristics of the Value Sets*

For most analysts, it is likely that the above considerations will suffice to choose a value set. However, there may remain cases where a choice between value sets must be made. In such cases, it is helpful to examine the quality of the study that generated the value set. This includes the quality of the valuation data and modelling choices made by the study authors and how the particular properties and characteristics of the value sets compare. Analysts who do not feel able to make judgements using the criteria discussed below are encouraged to contact the EuroQol office, whose scientific officers are well placed to advise.

A checklist for assessing value sets, such as the one provided by Xie et al. (2015) (Checklist for REporting VAluaTion studiEs, CREATE), provides a structured way of approaching the assessment of study quality; see Box 5.3. However, this checklist focuses on the quality of the *reporting* of the studies and does not directly address the quality of the collected data upon which models are based (other than where this leads to exclusions). Obvious questions to ask about the quality of the data collected in the value set study include: Was the sample size appropriate and was a reasonable response rate achieved? Is the sample representative of the general public? Is there any cause for concern about data quality: for example, were there high rates of missing or implausible valuations? Were there interviewer effects? Were the interviews conducted in a manner that was compliant with the protocol? These issues are addressed in Chap. 2 and are reported for each of the value sets summarised in Chap. 4.

With respect to the modelling methods used to produce value sets from the valuation data, quality may be judged both by the statistical methods used and also by conformity of the value set to properties that are essential or desirable for the way that they will be used. What criteria were used in selecting the specific model used to produce the value set?

In the case of the value sets reported in Chap. 4, many of these issues relating to data quality, though not the subsequent modelling of the data, are dealt with by the rigorous quality control (QC) process applied to EQ-VT-generated data from wave 2 onwards (see Chap. 2). Users of the resulting value sets can therefore have greater confidence in their use. The value sets reported in Chap. 4 follow the EQ-VT protocol and study designs<sup>1</sup> set out in Chaps. 2 and 3. They have also been published in peer-reviewed journals, and therefore meet the scientific standards of those journals. However, the EuroQol Group does not currently have a formal process for endorsing value sets, an issue which is discussed in Chap. 7. Furthermore, the QC processes used in the first wave of studies were not standardised and did not always satisfy the requirements of users; an example is the concerns expressed by NICE about the first EQ-5D-5L value set for England (see Hernández-Alava et al. 2020 and van Hout et al. 2020). This issue was addressed via strengthened QC in subsequent waves, as detailed in Chap. 2.

<sup>1</sup>With exceptions: for example, Peru used a 'Lite' version of the EQ-VT protocol, and Vietnam also used an adapted design. See Chap. 4 for further information.

**Box 5.3: The CREATE Checklist (Reproduced from Xie et al. 2015)**

#### *Descriptive systems*

#### *Health states valued*

#### *Sampling*

#### *Preference data collection*

#### *Study sample*

#### *Modelling*

#### *Scoring algorithm*

There are also 'non-standard' EQ-5D-5L value sets available that do not follow the EQ-VT protocol and were undertaken independently of the EuroQol Group, for example, Craig and Rand (2018) for the USA and Sullivan et al. (2020) for New Zealand. Other 'non-EQ-VT' value sets may be produced in future. Researchers have employed different methods, using different protocols, and analysed their data using different econometric procedures, and the resulting value sets will reflect this. The EuroQol Group encourages the use of its EQ-VT protocol in studies aiming to produce national value sets for the EQ-5D-5L, to enhance consistency and comparability. The EuroQol Group does not aim to prevent or discourage improvement or innovation in methods for valuing the EQ-5D family of instruments; indeed, it actively supports methodological studies.

Users should be aware of and familiarise themselves with the characteristics of the EQ-5D-5L value sets they choose, whether generated by the EQ-VT protocol or not. Are there important differences in preferences between dimensions? Are there any interaction effects in the values that apply when there are particular combinations of health problems? These characteristics of the value sets combine with the properties of the patients' EQ-5D-5L profile data to which they are applied, with important implications for QALY estimates (Parkin et al. 2016).

In general, users should be aware of the characteristics of value sets, such as the overall range of values, how these are distributed and whether there are interaction terms, as these will all exert an influence on their use in statistical analysis (Parkin et al. 2010). For example, if the health condition under consideration involves very severe states, the way in which values for states considered 'worse than dead' have been calculated, rescaled or bounded in the value set will be of particular relevance. If the health states are experienced for long durations, it will be relevant to examine how this relates to the duration of states described in the valuation exercise, given the possible effect of "maximum endurable time" on valuations (Sutherland et al. 1982) and the assumption of "constant proportionality" (Dolan and Stalmeier 2003). If the treatment under consideration involves marginal improvements from very good health states to full health, the way in which the constant term has been handled in modelling will affect the estimated change in QALYs.
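One way to inspect these characteristics is to score all 3125 EQ-5D-5L states and summarise the resulting distribution. The sketch below does this for the Vietnamese decrements reported in Sect. 4.3.3.15, reused purely as an example. It reports the range, the mean, the number of states valued below zero, and the gap between full health (11111) and the best state below it, which bears on the point about marginal improvements to full health. Names are hypothetical, and the approach as written applies only to additive value sets without interaction terms.

```python
from itertools import product
from statistics import mean

DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")
# Vietnamese decrements (Sect. 4.3.3.15); index 0 is level 1 (no decrement).
VIETNAM_DECREMENTS = {
    "MO": (0.0, 0.069, 0.079, 0.206, 0.376),
    "SC": (0.0, 0.043, 0.046, 0.147, 0.231),
    "UA": (0.0, 0.046, 0.059, 0.174, 0.299),
    "PD": (0.0, 0.084, 0.152, 0.270, 0.367),
    "AD": (0.0, 0.064, 0.113, 0.171, 0.239),
}


def all_values(decrements):
    """Score every one of the 5**5 = 3125 profiles with an additive value set."""
    values = {}
    for levels in product(range(5), repeat=5):
        profile = "".join(str(level + 1) for level in levels)
        values[profile] = 1.0 - sum(decrements[d][lv] for d, lv in zip(DIMENSIONS, levels))
    return values


values = all_values(VIETNAM_DECREMENTS)
v = list(values.values())
# Gap between full health and the best non-full-health state.
gap = 1.0 - max(x for p, x in values.items() if p != "11111")

print(len(v))                              # 3125
print(round(max(v), 3), round(min(v), 3))  # 1.0 -0.512
print(round(mean(v), 3))                   # 0.353
print(sum(x < 0 for x in v))               # number of states valued worse than dead
print(round(gap, 3))                       # 0.043
```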

# *5.2.4 Transparency and Uncertainty*

The most important decision about which value set to use is for the 'base case' for analysis, but it is also recommended that where possible and appropriate analysts also undertake sensitivity analysis using alternative value sets.

The choice of a base case value set should be carefully considered before undertaking analyses, as well as which sensitivity analyses are required given the decision context. For a prospective study, it is important that both the choice of base case and alternative value sets and the rationale for choosing them are clearly set out in the project protocol and statistical analysis plan, and that these are adhered to.

It may be that, considering the factors discussed in the previous sections, there is no value set which is unequivocally 'the best'. In such cases, the analyst's choice of base case value set should be carefully justified; it is essential that analysts are transparent about the reasons for their choice of base case value set. Usual good practice for such decisions is to choose the value set that is likely to generate the most conservative set of results for the base case. For example, in a trial comparing a new treatment with an established alternative, the principle should be to choose the value set that will generate the results least favourable to the new treatment. It would clearly be unethical and contrary to principles of good scientific practice to choose a value set on the basis that it will generate results most favourable to the analyst's preferred outcome for the study.

In cases where there remain doubts about which value set to use, analysing and reporting the sensitivity of results and conclusions to alternative value sets will increase the value of the information generated. If results are not substantially affected by the choice of value set, this increases confidence in the findings. Where results and conclusions are contingent on which value set is used, it is very important to convey this information to those who will use this evidence in health care decisions. However, it is important that this recommendation is not interpreted as meaning that users should simply undertake their analyses using different value sets.

In these cases, the EQ-5D-5L values used in an economic appraisal are appropriately considered as part of the uncertainty around the variables that form the economic appraisal model. The analyst should treat the values in an economic appraisal as uncertain parameters and subject them to sensitivity analysis, as with other non-stochastic uncertain variables such as the discount rate. Currently this is not common practice, but it is readily done and would improve confidence in results.
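A minimal sketch of treating the value set as an uncertain parameter follows. The same pair of hypothetical EQ-5D-5L outcomes is valued under two hypothetical additive value sets, and the incremental cost-effectiveness ratio (ICER) is compared with a hypothetical threshold. All numbers are illustrative and are not drawn from any real value set or study. The point is only that the conclusion can change with the value set, which is what such a sensitivity analysis should reveal and report.

```python
import math

DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")

# HYPOTHETICAL additive value sets: decrements for levels 2-5 of each dimension.
VALUE_SETS = {
    "A": {d: (0.05, 0.10, 0.20, 0.30) for d in DIMENSIONS},
    "B": {
        "MO": (0.02, 0.04, 0.08, 0.12),
        "SC": (0.02, 0.04, 0.08, 0.12),
        "UA": (0.02, 0.04, 0.08, 0.12),
        "PD": (0.10, 0.20, 0.35, 0.50),
        "AD": (0.10, 0.20, 0.35, 0.50),
    },
}
THRESHOLD = 20_000  # hypothetical cost-effectiveness threshold per QALY


def state_value(profile, decrements):
    """1 minus the decrement for each dimension's level (level 1 has none)."""
    return 1.0 - sum((0.0, *decrements[d])[int(c) - 1] for d, c in zip(DIMENSIONS, profile))


def icer_by_value_set(new_profile, old_profile, delta_cost, years, value_sets):
    """ICER of a new vs. an old treatment under each value set (states held for `years`)."""
    results = {}
    for name, decrements in value_sets.items():
        gain = state_value(new_profile, decrements) - state_value(old_profile, decrements)
        delta_qaly = gain * years
        # Simplification: a non-positive QALY gain is reported as an infinite ICER.
        results[name] = delta_cost / delta_qaly if delta_qaly > 0 else math.inf
    return results


icers = icer_by_value_set("21111", "11131", delta_cost=2_000, years=1.0, value_sets=VALUE_SETS)
for name, icer in icers.items():
    verdict = "cost-effective" if icer <= THRESHOLD else "not cost-effective"
    print(name, f"{icer:.0f}", verdict)
# A 40000 not cost-effective
# B 11111 cost-effective
```

In a full model the same loop would wrap the whole economic evaluation, so that every output (not just the ICER) is reported for each candidate value set.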

# **5.3 Which Value Set to Use in 'Non-QALY' Applications – An Overview**

Cost-effectiveness analysis is an obvious application for which a single-number summary of EQ-5D-5L profile data is essential, but there are other contexts in which such a summary may be useful. Examples of these kinds of applications include:

(a) Population health studies:


the effect on the New York population of the 2020 lockdown during the COVID-19 pandemic.

	- Describing the severity of illness amongst patients. For example, van Wilder et al. (2019) published EQ-5D-3L values for many chronic conditions, disaggregated by patient characteristics.
	- Waiting list management. For example, Derrett et al. (2003) applied EQ-5D-3L valuations to patients' EQ-5D-3L profiles as a means of creating a ranking of patients on elective surgery waiting lists in terms of the severity of their condition and their suggested priority for treatment.
	- Summarising the performance of hospitals in achieving improved health outcomes for patients as a result of surgery. For example, the National Health Service (NHS) in England publishes hospital-specific data from its Patient Reported Outcome Measures (PROMs) programme using EQ-5D values from the UK population as a whole, rather than from patients who use the hospital, reflecting the fact that the NHS is a national service (Appleby et al. 2015).

Many of the considerations for choosing which value set to use in QALY estimation are also relevant in the context of 'non-QALY' applications, in particular the applicability of the value set to the population to whom the analysis refers (Sect. 5.2.2) and the value sets' empirical characteristics (Sect. 5.2.3).

A further essential consideration in this context is that the values used should be appropriate to the proposed application and context. As values are not neutral, they should reflect the views of those populations and groups whose judgements of importance count in the decision context in which the values are applied.

Figure 5.2 provides an overview of the considerations concerning whether a value set is appropriate to use in applications where the principal aim is not to estimate QALYs, and which value set should be chosen in such applications.

As indicated at the start of this chapter, the first and most important question for *any* user of *any* value set is: '*What is the purpose of representing EQ-5D-5L profile data as a single number?*' Value sets are often used to provide a convenient means of summarising EQ-5D data as a 'single number' for the purposes of statistical analysis (Devlin et al. 2020).

There are important advantages in being able to summarise and represent an EQ-5D-5L profile by a single number; for example, it simplifies statistical analysis. However, there is no "neutral" set of values that can be used for this purpose. Any value set for the EQ-5D-5L explicitly or implicitly compares each level of each dimension with every other and attaches relative importance to them. No set of values is "objective": they all embody judgements about both what is meant by importance and the appropriate source of information for assessing it. It is therefore not possible to offer generalised guidance about *which* value set to use if the sole purpose is summarising profiles for descriptive or inferential statistical analysis. However, users should be aware that applying a value set can introduce an exogenous source of variance that may bias statistical inference. For example, using one value set rather than another may change conclusions about whether there are statistically significant differences in EQ-5D-5L responses between arms of a clinical trial, two groups of patients, or two regions (Parkin et al. 2010; Wilke et al. 2010). Of course, where the purpose of analysis is to reflect a society's view about the relative importance of different kinds of health problems, this may be considered a desirable feature.

Users should consider the wider purpose for which the summary will be used. If there is no single purpose, but simply a desire to provide information, then it may not be necessary to apply a value set to the data; instead, the EQ-5D-5L profiles themselves can be reported in some detail. This may also be preferable because EQ-5D values provide less detailed information than a profile. A range of methods for analysing and reporting profile data is provided in Devlin et al. (2020).

Further, in some cases where a single number is required to represent health, it may be more appropriate to focus on the EQ VAS data provided directly by the relevant patients or populations themselves, rather than using profile-based values. Whether the EQ VAS or value set-weighted profiles are more relevant will depend on the nature and purpose of the analysis, and on whether patients' or society's perspective is most important.

An alternative to applying EQ-5D-5L value sets of the kind reported in this book, or to focussing analysis solely on EQ-5D-5L profiles or EQ VAS data provided by patients, is to use a different means of aggregating profile data. One approach that has been explored is to develop a scoring algorithm based on predicted EQ VAS. Using a sample of patient or population data, the responses to the EQ-5D profile are used to predict the EQ VAS via regression analysis (Hardman et al. 2002; Whynes and The TOMBOLA Group 2008; Feng et al. 2014; Burstrom et al. 2014; Gutacker et al. 2020). These algorithms provide, for any given EQ-5D profile, the average EQ VAS on a 0–100 scale (representing worst to best health imaginable). Because such a scale is not anchored at dead = 0, it is not suitable for estimating QALYs, but it does represent an average view of how good or bad health states are. Where the relationship between the profile and EQ VAS is based on patient data, such value sets are also claimed to represent patients' experience. This use of VAS data is examined further in Sect. 5.4.1.
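The regression step can be sketched in a few lines. The sketch below assumes the simplest specification: ordinary least squares of EQ VAS on an intercept plus 20 level dummies, covering levels 2 to 5 of each dimension with level 1 as the reference. The studies cited above use a variety of specifications. The helper names (`level_dummies`, `fit_vas_model`, `predict_vas`) are hypothetical, and the data are simulated from invented "true" decrements purely for illustration.

```python
import numpy as np

N_DIMS, N_LEVELS = 5, 5  # EQ-5D-5L: five dimensions, five levels each


def level_dummies(profile: str) -> np.ndarray:
    """20 dummies (levels 2-5 per dimension, level 1 = reference) for e.g. '21345'."""
    x = np.zeros(N_DIMS * (N_LEVELS - 1))
    for dim, ch in enumerate(profile):
        level = int(ch)
        if level > 1:
            x[dim * (N_LEVELS - 1) + (level - 2)] = 1.0
    return x


def fit_vas_model(profiles, vas) -> np.ndarray:
    """OLS of EQ VAS (0-100) on an intercept plus the 20 level dummies."""
    dummies = np.array([level_dummies(p) for p in profiles])
    X = np.column_stack([np.ones(len(profiles)), dummies])
    coef, *_ = np.linalg.lstsq(X, np.asarray(vas, dtype=float), rcond=None)
    return coef


def predict_vas(coef: np.ndarray, profile: str) -> float:
    """Predicted mean EQ VAS for any of the 3125 profiles."""
    return float(coef[0] + level_dummies(profile) @ coef[1:])


# Usage with SIMULATED responses (hypothetical decrements, not real survey data)
rng = np.random.default_rng(seed=1)
profiles = ["".join(str(l) for l in rng.integers(1, 6, size=N_DIMS)) for _ in range(2000)]
true_decrements = np.tile([2.0, 5.0, 9.0, 14.0], N_DIMS)  # VAS points lost at levels 2-5
vas = 90.0 - np.array([level_dummies(p) for p in profiles]) @ true_decrements
vas = vas + rng.normal(0.0, 3.0, size=len(profiles))

coef = fit_vas_model(profiles, vas)
print(round(predict_vas(coef, "11111"), 1), round(predict_vas(coef, "55555"), 1))
```

With these simulated data, the fitted model should approximately recover the assumed values of 90 for 11111 and 20 for 55555. The predictions sit on the 0–100 VAS scale, not on a scale anchored at dead = 0.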

In contrast to the application of EQ-5D-5L data in QALY estimation, where the requirements of economic evaluation provide a broad theoretical foundation to guide the choice of value sets (see Box 5.1), the analysis of EQ-5D-5L data in other applications may lack an obvious theoretical foundation to guide how data are appropriately analysed or reported. For users concerned with choosing value sets with particular theoretical properties, Sect. 5.4.2 provides a brief discussion of the issues. Where the end user of the analysis is known, and where the kinds of decisions that the analysis will inform are clear, the choice of approach should be guided by any requirements of the end user or, where none are provided, by considering what is most relevant to the decisions at stake.

Note that in many of these 'non-QALY' applications of EQ-5D data, analysis of EQ-5D-5L profiles, EQ VAS and EQ-5D values may *all* be relevant to decision makers, as each provides different and complementary information. Where this is the case, the use of value sets to summarise EQ-5D-5L profile data should be accompanied by analyses of EQ-5D-5L profile and EQ VAS data. An example of this is the use of the EQ-5D-3L in studies of the general population in different countries, including those designed to generate population norms. The key EuroQol Group publication on this (Szende et al. 2014) includes values based on value sets, but also reports comparative EQ VAS and dimension and level data for 24 countries using the EQ-5D-3L.

Finally, where there is a clear rationale for using value sets to weight EQ-5D-5L data for statistical analysis (for example, where society's rather than patients' preferences are considered paramount), the advice provided in Sect. 5.2 will be equally relevant. For example, the basis for choosing which value set is used should be clearly stated, ideally in advance of analysis, and sensitivity analysis undertaken to determine whether the characteristics of that value set exert an important effect on results and conclusions.
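The following sketch shows why such sensitivity analysis matters. It applies two hypothetical additive value sets to the same profiles from two hypothetical trial arms. The coefficients are invented and are not any published EQ-5D-5L value set, and the model is assumed to have no intercept. Set A penalises pain/discomfort more heavily, while set B swaps the MO and PD weights. As a result, the sign of the between-arm difference reverses.

```python
from statistics import mean

DIMS = ("MO", "SC", "UA", "PD", "AD")

# Hypothetical decrements for levels 1..5 of each dimension (level 1 = 0).
# Invented for illustration; NOT published EQ-5D-5L coefficients.
VALUE_SET_A = {
    "MO": (0.0, 0.05, 0.08, 0.20, 0.25),
    "SC": (0.0, 0.04, 0.07, 0.15, 0.20),
    "UA": (0.0, 0.04, 0.06, 0.14, 0.18),
    "PD": (0.0, 0.07, 0.10, 0.30, 0.38),
    "AD": (0.0, 0.06, 0.09, 0.25, 0.30),
}
# Set B: identical to A except that the MO and PD decrements are swapped
VALUE_SET_B = {**VALUE_SET_A, "MO": VALUE_SET_A["PD"], "PD": VALUE_SET_A["MO"]}


def index_value(profile: str, value_set: dict) -> float:
    """1 minus the sum of level decrements (additive model, no intercept assumed)."""
    return 1.0 - sum(value_set[dim][int(level) - 1] for dim, level in zip(DIMS, profile))


def arm_difference(arm_1, arm_2, value_set) -> float:
    """Mean index value in arm 1 minus mean index value in arm 2."""
    return (mean(index_value(p, value_set) for p in arm_1)
            - mean(index_value(p, value_set) for p in arm_2))


arm_1 = ["21111", "31111", "41111"]  # hypothetical: mobility problems only
arm_2 = ["11121", "11131", "11141"]  # hypothetical: pain/discomfort problems only

for name, vs in (("A", VALUE_SET_A), ("B", VALUE_SET_B)):
    print(name, round(arm_difference(arm_1, arm_2, vs), 3))
```

The loop prints a difference of about +0.047 under set A and about -0.047 under set B. The profile data are identical in both cases; only the choice of value set changes the conclusion about which arm is better off.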

# **5.4 Choosing Value Sets – Some Further Considerations**

This section complements the overview provided in Sects. 5.2 and 5.3 with a more detailed discussion of two issues: relevance to the decision-making context, and theoretical properties of value sets.

# *5.4.1 Relevance to the Decision-Making Context*

We have already noted that, as a general principle, users should choose a value set which is relevant to the decision-making context. A first assessment of relevance relates to the country in which values were obtained, as described earlier. Yet other, more nuanced facets may need to be considered to deem a value set relevant, including whose values are relevant in the context of interest and what the appropriate source of such values is.

The question of whose values are relevant has been widely debated and there are different possible answers to that (Dolan et al. 2003). Most of the evidence and considerations presented in this chapter relate to "social" value sets (such as those reported in Chap. 4), which are meant to represent the average values of the general public. In essence, these "social" valuations for EQ-5D-5L are generated from members of the general public being asked to consider states that may be hypothetical to them, and to value them from the perspective of imagining being in those states.

There are normative arguments advanced for using social valuations in economic evaluation. Broadly speaking, the purpose of any economic evaluation is to assess the value for money of alternative uses of scarce health care resources. Where the context of these decisions is the public sector, it is generally argued that the valuation of health states used in the assessment of 'benefit' should reflect, as closely as possible, the preferences of the relevant general public. This is both because, in publicly-funded health care systems, it is the general public who are funding health care, e.g. via taxes; and because the general public are potential users of the health care system and can provide valuations 'behind a veil of ignorance'.

An alternative could be to create a "patient-based value set" consisting of values elicited from patients, using either the same stated preference methods used for the general population or revealed preferences based on self-reported EQ VAS values. Patient-based value sets are preferred in some countries, such as Germany and Sweden (Rowen et al. 2017). Proponents of this choice argue that "patient-based value sets" reflect the preferences of those who are actually experiencing the states, and are therefore better informed. Differences between patients' and the general public's valuations of states are common and have been extensively observed. For example, members of the general public often give a lower value to health states than those who experience them, as they cannot predict what their experience in that state would be or how they would adapt to it (Brazier et al. 2005). Ogorevc et al. (2019) report significant differences between patients' and general public values, but these varied by dimension, with patients considering mobility and self-care problems less problematic, but pain/discomfort and anxiety/depression more problematic. While it may be desirable to include an assessment of patients' values as an adjunct to the main analyses in most studies, there are theoretical concerns about using these values in the context of, for example, economic evaluation. For instance, the fact that values for health states may be modified by adaptation could be an argument against their use for decision making based on *ex ante* judgements about the value of health care interventions. Moreover, it may be difficult to include patients in valuation studies given their impaired health, and it may be unethical to subject them to an intrusive valuation interview.

These considerations and practical limitations have led most HTA bodies (with the notable exception of Sweden's TLV, as noted earlier) and end users to specify that general public values are required, and this is reflected in the protocol for valuation of EQ-5D-5L. For this reason, this chapter assumes that values from a representative sample of the general public are preferred.

Nevertheless, it may be that, pragmatically, the only available source of values is from the patients whose health states are being analysed, or that in some applications these are regarded by the relevant decision-makers as being the most appropriate. There have also been some debates about whether or not it is appropriate to use the values from sub-groups of the population rather than the population as a whole, for example the values of women or older people for conditions which only affect them (Sculpher and Gafni 2001, 2002; Robinson and Parkin 2002). Similarly, there are debates about whether the values of children and adolescents, who are generally excluded from sampling but are also members of the general public, should be included in social values (Hill et al. 2020). There is currently no consensus on these issues.

A second relevant issue is the point in time at which value sets were generated. Just as there are important differences in health state values between countries (as is evident in the value sets reported in Chap. 4, and compared in Chap. 6), there may be differences in average values within a country over time. This would arise if preferences regarding health are not stable, as is normally assumed in economics, but change over time (Bridges 2003), perhaps because of changing experience of and expectations about health. Further, the composition of the general public changes over time as a result of ageing, immigration and emigration, and sociodemographic shifts, and such changes may also affect the *average* preferences of society that value sets reflect. We currently have very little evidence on these matters for EQ-5D-5L valuations, because of the relative recency of these value sets, or for other HRQoL instruments, because differences in the methods used limit the comparability of valuation data over time. However, as a general rule, a more recent value set is preferable to an older one, provided they are equally relevant in other ways and are otherwise comparable on the empirical and theoretical grounds discussed below. The question of the appropriate 'shelf-life' of a value set is considered further in Chap. 7.

# *5.4.2 The Theoretical Properties of Values and Value Sets*

As well as the TTO and DCE methods used in the EQ-VT, there are other methods for valuing health states including Standard Gamble (discussed in Box 5.1), Magnitude Estimation, Paired Comparisons (PC), Rating scales, Visual Analogue Scales (VAS), the Better than Dead approach (van Hoorn et al. 2014), Number Equivalence (also known as Person Trade-Off) and Personal Utility Functions (PUF) (Devlin et al. 2019). And, while the EQ-VT uses a specific type of TTO, composite TTO (cTTO) (see Chap. 2), there are other forms of TTO (such as lead time TTO and lag time TTO); similarly, there are still other types of DCE (such as DCE with duration; and best worst scaling). These other methods are not currently widely used for valuing the EQ-5D-5L and in many cases have only been used in smaller experimental studies, rather than the large-scale representative sample studies appropriate to the construction of value sets for practical use. However, they have been used to estimate value sets for other instruments, for example VAS for the EQ-5D-3L and PC to estimate disability weights for the World Bank / World Health Organisation Disability Adjusted Life Years project. It is possible that future non-standard value sets may be generated that have different properties to those generated by the TTO and the DCE, which may be an important factor in the choice of value sets.

Unfortunately, the theoretical and empirical case for favouring one method of health state valuation over another is far from clear-cut. In the context of QALY estimation, for example, it has been argued that the QALY is no more than a convenient device to combine length and quality of life into a single metric (Parkin and Devlin 2006) and does not need to conform to theoretical concepts such as 'utility' or measurable 'utility'. The theoretical foundations of QALYs therefore do not require that quality of life be valued using a particular measurement method. However, the current dominant practice of using TTO and DCE methods, following the rationale provided in Box 5.1, has the merit of imposing consistency between the resulting value sets and giving a relatively clear interpretation to them.

The recommendation is therefore to exercise caution when considering using value sets resulting from non-standard valuation methods and to examine closely the rationale used by their developers.

# **5.5 Concluding Remarks**

There is no simple answer to the question of which value set to use: the answer depends on the specific nature of the research application, the sort of decisions it informs, and the context in which the evidence from your research will be used.

In some cases, which value set to use will be determined by the stated requirements of those using the evidence to inform decision-making. Where this is not the case, we encourage potential users of EQ-5D-5L value sets carefully to consider each of the practical and theoretical issues discussed in this chapter. We strongly recommend that users clearly justify their choice of value sets in a transparent manner. Where there remains uncertainty over which value set to use, we recommend that researchers should report the sensitivity of their results and conclusions to the use of alternative value sets. In applications where QALY estimation is not a goal, there may not be a clear rationale for using a value set as the focus of analysis, and users are encouraged to make full use of the EQ-5D-5L profile and EQ VAS data provided by respondents.

# **References**


Swedish). [Changes in the Dental and Pharmaceutical Benefits Agency's guidance for economic evaluations]. Stockholm: TLV. https://tlv.se/download/18.467926b615d084471ac3230c/1510316374332/TLVAR\_2017\_1.pdf. Accessed 19 July 2021


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 6 How Do EQ-5D-5L Value Sets Differ?**

**Bram Roudijk, Bas Janssen, and Jan Abel Olsen**

**Abstract** This chapter aims to explore the differences in EQ-5D-5L value sets between countries/areas, and to investigate whether common patterns can be identified between them. EQ-5D-5L value sets for 25 countries/areas were extracted from published literature. These national value sets were compared on key characteristics, such as: the relative importance of the EQ-5D-5L dimensions; the value scale length and the distribution of values over the value scale. Using these characteristics, distinct preference patterns were identified for Asian, Eastern European and Western countries/areas. The Asian countries/areas were split into East Asian and Southeast Asian countries/areas, as these subgroups shared similar characteristics. Using mean values for countries/areas with similar preference patterns, several aggregate value sets were generated. These aggregate value sets describe mean values for all 3125 health states described by the EQ-5D-5L for countries/areas with similar preference patterns. Applying these values to EQ-5D-5L profile data for 7933 respondents in an international survey showed that these aggregate value sets represent the individual national value sets relatively well. This chapter identified large differences between value sets, yet was able to identify common preference patterns between selected countries/areas.

B. Roudijk (\*) · B. Janssen

EuroQol Research Foundation, Rotterdam, The Netherlands

Section of Medical Psychology, Department of Psychiatry, Erasmus MC, Rotterdam, The Netherlands

J. A. Olsen

Division of Health Services, Norwegian Institute of Public Health, Oslo, Norway

EuroQol Research Foundation, Rotterdam, The Netherlands e-mail: Roudijk@EuroQol.org

Department of Community Medicine, University of Tromsø – The Arctic University of Norway, Tromsø, Norway

# **6.1 Introduction**

Since 2012, 25 EQ-5D-5L value sets have been published using the EuroQol Valuation Technology (EQ-VT), a standardised valuation protocol, as described in Chap. 4 of this book (Oppe et al. 2014; Stolk et al. 2019). These value sets have been developed across the world, concentrated initially in Western Europe and Canada, subsequently including more North American and Asian/Pacific countries/areas and (more recently) including countries/areas from Latin America, the Middle East, Africa and Eastern Europe.

Value sets have been shown to differ between countries/areas in a number of aspects such as: the relative importance of the five dimensions and their associated levels; the length of the value scale; how many health states are given values worse than dead (WTD); the location of the descriptive midpoint (33333) on the scale and the shape of the distribution of the values for all 3125 health states. In principle, there are two reasons why we observe such cross-country/area differences in values: (1) methodological differences and (2) genuine differences in populations' health state preferences.

In contrast to valuations elicited for the EQ-5D-3L instrument, the EQ-VT protocol used for the EQ-5D-5L instrument provides a standardised method to collect valuation data using a combination of composite Time Trade-Off (cTTO) and Discrete Choice Experiment (DCE). Although the use of EQ-VT standardises a large part of the valuation methodology, some methodological differences may persist between countries/areas. These include the choice of valuation method used for the final dataset (Norman et al. 2009), modelling strategy, translation of the EQ-5D-5L (which may lead to different interpretations of health problems as described by the instrument) and any imbalance in the socio-demographic composition of the sample that might undermine the sample's representativeness. The EQ-VT protocol allows local research teams to choose a sampling strategy that is acceptable to the HTA bodies of their respective countries/areas (Stolk et al. 2019). It also allows local research teams to decide on the modelling strategy, e.g., whether to use both the cTTO and DCE data or cTTO only (see Chap. 2 for more details). All of these aspects introduce heterogeneity due to methodological differences, which may be reflected in the value sets.

The differences in populations' health state preferences are assumed to be affected by a wide range of institutional and other country/area-specific circumstances that impact individuals' health opportunities and challenges and may shape health expectations and norms. Countries/areas differ in highly relevant factors such as: their healthcare system (e.g., whether universal coverage is in place or not), social insurance (including sickness benefit schemes), wealth measures such as Gross Domestic Product (GDP), governance, culture (norms and beliefs) and even climate and geography (e.g., the importance of mobility, which may be related to the infrastructure of a country/area).

The aim of this chapter is to identify in which ways currently published EQ-5D-5L value sets differ and whether we can establish distinctive preference patterns which are common across groups of countries/areas. Previous work (Olsen et al. 2018) analysed seven EQ-5D-5L value sets and identified a 'Western preference pattern' (WePP). This chapter extends the work of Olsen et al. (2018), using the value sets for all 25 countries/areas reported in Chap. 4. We hypothesise that countries/areas that are similar in terms of institutional settings and other country/area-specific circumstances will have similar value sets. As 21 additional value sets have been published since those for the four Western countries/areas included in the Olsen et al. (2018) study (Canada, England, Netherlands, Spain) (Xie et al. 2016; Devlin et al. 2018; Versteegh et al. 2016; Ramos-Goñi et al. 2017b, 2018), it may be possible to further validate and refine the Western preference pattern suggested by Olsen et al. (2018). We also investigate whether other preference patterns emerge for other groups of countries/areas, i.e., whether there are similarities between the value sets of countries/areas in other regions of the world that may share similar characteristics.

# **6.2 Methods**

# *6.2.1 Analysing Differences Between Value Sets*

To determine how the value sets differ from each other, several important characteristics of the value sets are used. Olsen et al. (2018) previously identified: (1) the relative importance of the different EQ-5D-5L dimensions; (2) differences in scale length between countries/areas, which give an indication of the willingness to trade off quality for quantity of life; (3) the marginal effect of moving from one severity level to another; and finally, (4) the location on the value scale of the descriptive midpoint of the EQ-5D-5L, state "33333". To compare the relative importance of the EQ-5D-5L dimensions, we compare (1) the relative importance of the functional dimensions (mobility (MO), self-care (SC) and usual activities (UA)) versus the symptom dimensions (pain/discomfort (PD), anxiety/depression (AD)); (2) the relative importance of pain; and (3) the relative importance of anxiety/depression.

For the current chapter, all 25 value sets published at the time of writing were used. A database was created in which the utilities of each value set were assigned to all 3125 possible EQ-5D-5L health states. These value sets include: ten from Europe (Ramos-Goñi et al. 2017b, 2018; Versteegh et al. 2016; Devlin et al. 2018; Andrade et al. 2020; Ludwig et al. 2018; Ferreira et al. 2019; Hobbins et al. 2018; Golicki et al. 2019; Rencz et al. 2020; Jensen et al. 2021); two from North America (Xie et al. 2016; Pickard et al. 2019); three from Latin America (Augustovski et al. 2016; Augustovski et al. 2020; Gutierrez-Delgado et al. 2021); one from Africa (Welie et al. 2020) and nine from Asia (Luo et al. 2017; Shiroiwa et al. 2016; Kim et al. 2016; Mai et al. 2020; Pattanaphesaj et al. 2018; Lin et al. 2018; Purba et al. 2017; Wong et al. 2018; Shafie et al. 2019). As a preliminary exploration of the value sets, the kernel density distributions of each value set were plotted and compared graphically.
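Building such a database amounts to enumerating the 5^5 = 3125 states and evaluating each value set on every one of them. The sketch below assumes a purely additive model with no intercept and uses invented decrements. The names `all_states` and `value_table` are hypothetical. One such table per country/area would make up the database.

```python
from itertools import product

DIMS = ("MO", "SC", "UA", "PD", "AD")

# Hypothetical additive value set: decrements for levels 1..5 of each dimension
# (level 1 = 0). Invented for illustration, not a published EQ-5D-5L value set.
HYPOTHETICAL_SET = {
    "MO": (0.0, 0.05, 0.08, 0.20, 0.25),
    "SC": (0.0, 0.04, 0.07, 0.15, 0.20),
    "UA": (0.0, 0.04, 0.06, 0.14, 0.18),
    "PD": (0.0, 0.07, 0.10, 0.30, 0.38),
    "AD": (0.0, 0.06, 0.09, 0.25, 0.30),
}


def all_states() -> list[str]:
    """All 5**5 = 3125 EQ-5D-5L states, from '11111' to '55555'."""
    return ["".join(levels) for levels in product("12345", repeat=len(DIMS))]


def value_table(value_set: dict) -> dict[str, float]:
    """Map every state to 1 minus the sum of its level decrements."""
    return {
        state: 1.0 - sum(value_set[dim][int(level) - 1] for dim, level in zip(DIMS, state))
        for state in all_states()
    }


table = value_table(HYPOTHETICAL_SET)
print(len(table), table["11111"], round(table["55555"], 2))
```

This prints `3125 1.0 -0.31` for the invented coefficients. The list of values in each table is the input from which a kernel density plot of that value set would be drawn.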

The relative importance of the EQ-5D-5L dimensions was assessed by comparing, between countries/areas, the values of the five health states with level 5 problems on a single dimension (51111, 15111, 11511, 11151, 11115), as these capture the maximum value decrement for each dimension.<sup>1</sup> A further sub-analysis was carried out to determine the relative importance of PD and of mental health (i.e., AD), and their ranking compared with the other dimensions.
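A minimal sketch of this ranking follows, again assuming an additive value set with invented decrements. Each dimension's importance is taken as the value of 11111 minus the value of the corresponding single-dimension level-5 state. The spread between the largest and smallest of these weights is the same "largest minus smallest weight" comparison used for Japan and Ireland in Sect. 6.3.1. The helper names are hypothetical.

```python
DIMS = ("MO", "SC", "UA", "PD", "AD")

# Hypothetical additive value set (invented decrements for levels 1..5; level 1 = 0).
HYPOTHETICAL_SET = {
    "MO": (0.0, 0.05, 0.08, 0.20, 0.25),
    "SC": (0.0, 0.04, 0.07, 0.15, 0.20),
    "UA": (0.0, 0.04, 0.06, 0.14, 0.18),
    "PD": (0.0, 0.07, 0.10, 0.30, 0.38),
    "AD": (0.0, 0.06, 0.09, 0.25, 0.30),
}


def state_value(state: str, value_set: dict) -> float:
    """1 minus the sum of level decrements (additive model, no intercept assumed)."""
    return 1.0 - sum(value_set[dim][int(level) - 1] for dim, level in zip(DIMS, state))


def level5_decrements(value_set: dict) -> dict[str, float]:
    """Value of 11111 minus value of each single-dimension level-5 state (51111, ..., 11115)."""
    full_health = state_value("11111", value_set)
    decrements = {}
    for i, dim in enumerate(DIMS):
        state = "1" * i + "5" + "1" * (len(DIMS) - i - 1)
        decrements[dim] = full_health - state_value(state, value_set)
    return decrements


dec = level5_decrements(HYPOTHETICAL_SET)
ranking = sorted(dec, key=dec.get, reverse=True)  # most to least important
spread = max(dec.values()) - min(dec.values())   # largest minus smallest weight
print(ranking, round(spread, 3))
```

For these invented numbers the output is `['PD', 'AD', 'MO', 'SC', 'UA'] 0.2`. Because the method compares state values rather than raw coefficients, it can be applied unchanged to models with an intercept.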

Differences in scale length between countries/areas are assessed by subtracting the value for state "55555" (extreme problems/unable to on all dimensions) from the value for state "11111" (no problems on any of the dimensions) for each country/area. The location of the descriptive midpoint in the value distribution is assessed by taking the difference between the values for states "11111" and "33333" and dividing it by the scale length.
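Written out, the scale length is v(11111) − v(55555). The midpoint location is (v(11111) − v(33333)) divided by that length: a value of 0.5 means 33333 sits exactly halfway down the scale. The sketch below computes both measures. The anchor-state values are invented and are not from any published value set.

```python
def scale_length(values: dict[str, float]) -> float:
    """Value of full health (11111) minus value of the worst state (55555)."""
    return values["11111"] - values["55555"]


def midpoint_position(values: dict[str, float]) -> float:
    """Share of the scale lying between 11111 and the descriptive midpoint 33333."""
    return (values["11111"] - values["33333"]) / scale_length(values)


# Hypothetical anchor-state values (invented, not from any published value set)
example = {"11111": 1.0, "33333": 0.8, "55555": -0.6}
print(round(scale_length(example), 3), round(midpoint_position(example), 3))
```

This prints `1.6 0.125`. Here 33333 lies only 12.5% of the way from 11111 to 55555. By comparison, the China and France cases discussed in Sect. 6.3.2 correspond to values near 0.5 and 0.15 respectively.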

Analysing the marginal effect of moving from one severity level to another on a dimension is trivial where a 20-parameter main effects model is the preferred model for a value set, as the coefficients of the 20-parameter model can be used directly (see Chap. 4 for a description of the modelling of valuation data). However, this is not the case for at least some of the value sets included in this analysis, such as value sets that include an intercept, or that were defined according to a constrained 8- or 9-parameter model, in which the distance between the levels of the EQ-5D-5L is kept constant across dimensions.<sup>2</sup> Therefore, we calculated the values for each health state with problems on a single dimension only (e.g. for AD: 11111, 11112, 11113, 11114, 11115) and plotted these in a line plot for each dimension for each country/area separately. This allows the 20-parameter model value sets to be compared with all other value sets.
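The sketch below illustrates why working with single-dimension state values puts all models on a common footing. It uses a hypothetical value set *with* an intercept, meaning a constant deducted from every state other than 11111. All numbers are invented. The intercept is absorbed into the step from level 1 to level 2, and the successive differences are what a line plot of the five values would display.

```python
DIMS = ("MO", "SC", "UA", "PD", "AD")

# Hypothetical value set WITH an intercept (a constant deducted from every state
# other than 11111). All numbers are invented for illustration.
INTERCEPT = 0.05
DECREMENTS = {
    "MO": (0.0, 0.04, 0.06, 0.18, 0.22),
    "SC": (0.0, 0.03, 0.05, 0.12, 0.15),
    "UA": (0.0, 0.03, 0.05, 0.11, 0.14),
    "PD": (0.0, 0.05, 0.08, 0.22, 0.27),
    "AD": (0.0, 0.04, 0.07, 0.20, 0.24),
}


def state_value(state: str) -> float:
    """1 for full health; otherwise 1 minus intercept minus the level decrements."""
    if state == "11111":
        return 1.0
    return 1.0 - INTERCEPT - sum(DECREMENTS[dim][int(level) - 1] for dim, level in zip(DIMS, state))


def single_dimension_values(dim: str) -> list[float]:
    """Values of the states with problems on `dim` only, at levels 1 to 5."""
    i = DIMS.index(dim)
    return [state_value("1" * i + str(level) + "1" * (len(DIMS) - i - 1)) for level in range(1, 6)]


def marginal_decrements(values: list[float]) -> list[float]:
    """Value lost when moving from level k to level k + 1 (k = 1..4), rounded to 3 d.p."""
    return [round(a - b, 3) for a, b in zip(values, values[1:])]


print(marginal_decrements(single_dimension_values("AD")))
```

This prints `[0.09, 0.03, 0.13, 0.04]`. The first step includes the intercept, and the invented numbers were chosen so that the larger step from level 3 to 4 mimics the kind of "kink" discussed in Sect. 6.3.2.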

# *6.2.2 Defining Preference Patterns and the Performance of These Patterns*

To test the performance of the identified preference patterns, data from the Multi Instrument Comparison (MIC) study (Richardson et al. 2012) were used, as in the Olsen et al. study (Olsen et al. 2018). In the MIC study, patients from seven disease areas (arthritis, asthma, cancer, depression, diabetes, hearing loss and heart disease), as well as a healthy respondent group, completed the EQ-5D-5L as part of a larger international survey. In total, 7933 respondents from six Western countries/areas (Australia, Canada, Germany, Norway, United Kingdom, United States) completed the EQ-5D-5L. The values from the 25 value sets used in the current chapter were assigned to the respondents' health profiles, as were the values generated by the identified preference patterns. Using line plots, we compared the distributions of the values between the countries/areas and the identified preference patterns. The preference patterns, or *aggregate value sets*, are defined as the mean of the coefficients of groups of value sets that share common properties. The properties used for this purpose are: (1) the relative importance of the EQ-5D-5L dimensions (calculated as described above; four different sub-characteristics are compared); (2) the distribution of values over the scale (six different sub-characteristics are compared); and (3) geographic proximity and cultural similarity.

<sup>1</sup>Note that this is a different and less sophisticated way of assessing dimension importance than that reported in Chap. 4 for each value set.

<sup>2</sup>The results of 8- or 9-parameter models can be presented as 20-parameter models, without changing anything of substance relating to the model.
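A minimal sketch of an aggregate value set follows. It assumes that each member value set is additive without an intercept, or has been re-expressed in 20-parameter form as footnote 2 allows. The aggregate is then the element-wise mean of the level decrements. For such linear models, averaging coefficients gives the same result as averaging the members' values state by state, which the final line checks. The two member sets are invented, and `aggregate` is a hypothetical helper.

```python
DIMS = ("MO", "SC", "UA", "PD", "AD")

# Two hypothetical member value sets of one preference pattern: decrements for
# levels 1..5 (level 1 = 0). Invented numbers, not published coefficients.
SET_1 = {
    "MO": (0.0, 0.05, 0.08, 0.20, 0.25),
    "SC": (0.0, 0.04, 0.07, 0.15, 0.20),
    "UA": (0.0, 0.04, 0.06, 0.14, 0.18),
    "PD": (0.0, 0.07, 0.10, 0.30, 0.38),
    "AD": (0.0, 0.06, 0.09, 0.25, 0.30),
}
SET_2 = {
    "MO": (0.0, 0.06, 0.10, 0.24, 0.30),
    "SC": (0.0, 0.03, 0.06, 0.13, 0.17),
    "UA": (0.0, 0.03, 0.05, 0.12, 0.16),
    "PD": (0.0, 0.06, 0.09, 0.26, 0.33),
    "AD": (0.0, 0.07, 0.11, 0.28, 0.35),
}


def aggregate(value_sets: list[dict]) -> dict[str, tuple[float, ...]]:
    """Element-wise mean of the level decrements across the member value sets."""
    n = len(value_sets)
    return {
        dim: tuple(sum(vs[dim][k] for vs in value_sets) / n for k in range(5))
        for dim in DIMS
    }


def state_value(state: str, value_set: dict) -> float:
    """1 minus the sum of level decrements (additive model, no intercept assumed)."""
    return 1.0 - sum(value_set[dim][int(level) - 1] for dim, level in zip(DIMS, state))


pattern = aggregate([SET_1, SET_2])
state = "21354"
print(round(state_value(state, pattern), 4),
      round((state_value(state, SET_1) + state_value(state, SET_2)) / 2, 4))
```

Both printed numbers are equal. The aggregate can then be applied to respondents' profiles exactly like any national value set.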

# **6.3 Differences Between Value Sets**

# *6.3.1 Relative Importance of the Dimensions*

Table 6.1 provides information on the geographical region and subregion of the countries/areas in which value set studies were conducted. It also shows which protocol version was used for data collection, which reflects some of the methodological choices made in each study, such as the use of the quality control (QC) procedure, practice health states, a feedback module and dynamic practice examples. More can be read about this elsewhere (Stolk et al. 2019) and in Chap. 2. Furthermore, Table 6.1 shows the order of importance of the dimensions, which differs between countries/areas. In each country/area, either MO, PD or AD is identified as the most important dimension. The least important dimension is either SC or UA, except in Uruguay (AD) and in the Indonesian value set (PD). MO is ranked as the most important dimension eleven times (including all nine Asian countries/areas), PD ten times (including seven Western countries/areas) and AD four times (including the remaining three Western countries/areas). In 16 value sets, UA is ranked as, or tied as, the least important dimension. SC is ranked as the least important dimension seven times, while AD and PD are each ranked least important once.

Table 6.2 reports the individual weights, or partial value decrements, for having a given level of problems on a given health dimension compared with having no problems on that dimension. The table is restricted to the maximum level, so only the weights for level 5 problems are reported. The smallest value decrement assigned to any dimension with level 5 problems is for UA in Spain (0.153). In contrast, the largest value decrement in Spain is for PD (0.381). The largest value decrement assigned to any dimension with level 5 problems is for AD in Ireland (0.646); in contrast, the smallest value decrement with level 5 problems in Ireland is 0.187, for UA. The size of the range of value decrements assigned to the dimensions differs substantially between countries/areas. The smallest difference is reported in Japan, where the largest weight (MO) is only 0.079 larger than the smallest weight (SC). The largest difference is reported in Ireland, where the largest weight (AD) is 0.459 larger than the smallest weight (UA). These results show that countries/areas can differ considerably both in which dimension is considered most important and in the absolute difference in weight assigned to the different dimensions.

**Table 6.1** Summary information by country/area

*NA* North America, *WE* Western Europe, *EE* Eastern Europe, *EA* East Asia, *SEA* Southeast Asia, *LA* Latin America, *AF* Africa

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities

<sup>a</sup> The order of importance of the dimensions is defined by the weights assigned to the states 51111, 15111, 11511, 11151, 11115

<sup>b</sup> Method is defined as the data used for the final model (i.e., whether a cTTO-only model, a hybrid model, or another strategy was used)


**Table 6.2** Weights assigned to level 5 problems in each dimension

*AD* anxiety/depression, *MO* mobility, *PD* pain/discomfort, *SC* self-care, *UA* usual activities
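As a minimal sketch of how the rankings in Table 6.1 and the ranges discussed for Table 6.2 can be derived, the Python below orders the dimensions by their level 5 decrement and computes the gap between the largest and smallest decrement. It assumes an additive model, under which ranking the single-dimension states 51111, 15111, 11511, 11151 and 11115 (footnote a of Table 6.1) is equivalent to ranking the level 5 decrements. The decrements in `EXAMPLE_LEVEL5` are hypothetical and do not come from any published value set; the function names are illustrative.

```python
# Hypothetical level 5 decrements for illustration only (not a published value set).
EXAMPLE_LEVEL5 = {"MO": 0.30, "SC": 0.22, "UA": 0.18, "PD": 0.35, "AD": 0.28}


def rank_dimensions(level5):
    """Order dimensions from most to least important (largest decrement first).

    Under an additive model the state 51111 loses exactly the MO5 decrement
    (plus any constant N1 term shared by all five states), so this ordering
    matches the ordering of the five single-dimension level 5 states.
    """
    return sorted(level5, key=level5.get, reverse=True)


def decrement_range(level5):
    """Gap between the largest and smallest level 5 decrement."""
    return max(level5.values()) - min(level5.values())


print(rank_dimensions(EXAMPLE_LEVEL5))            # ['PD', 'MO', 'AD', 'SC', 'UA']
print(round(decrement_range(EXAMPLE_LEVEL5), 3))  # 0.17
```

Applied to the two Irish decrements quoted above (AD 0.646 and UA 0.187), `decrement_range` reproduces the 0.459 gap reported in the text.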

# *6.3.2 Marginal Value Decrements of Moving from One Level to Another*

Figures 6.1a and 6.1b report the marginal effect of moving from one level to the next, by dimension. For some countries/areas, such as Canada and Japan, the marginal value decrement of moving to a higher level of problems is relatively similar across dimensions. However, in countries/areas such as Indonesia, Ireland, Germany, the Netherlands and Poland, the marginal effects for one or two dimensions are markedly more negative. The slopes of the graphs in Figs. 6.1a and 6.1b also differ between countries/areas, indicating different marginal value decrements. For example, in Japan the decrements in value by level appear to be roughly linear, while in several Western countries/areas (including Canada, England and the Netherlands) there seems to be a "kink" when moving from level 3 to 4 on any dimension, and a "reverse kink" when moving from level 4 to 5.
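The marginal decrements plotted in Figs. 6.1a and 6.1b are the differences between the cumulative decrements of consecutive levels. The sketch below, assuming level 1 carries a decrement of zero, computes those steps for one dimension and flags the "kink" pattern described above. The rule used in `has_kink` (the 3-to-4 step is larger than both neighbouring steps) is our own hypothetical operationalisation, and the numbers in the usage lines are illustrative.

```python
def marginal_steps(cumulative):
    """Steps 1->2, 2->3, 3->4, 4->5 from cumulative decrements for levels 2..5."""
    levels = [0.0] + list(cumulative)  # level 1 is the reference (no decrement)
    return [round(b - a, 6) for a, b in zip(levels, levels[1:])]


def has_kink(steps):
    """True when the 3->4 step exceeds both the 2->3 step and the 4->5 step."""
    _, step_23, step_34, step_45 = steps
    return step_34 > step_23 and step_34 > step_45


linear = marginal_steps([0.05, 0.10, 0.15, 0.20])  # equal steps: a roughly linear pattern
kinked = marginal_steps([0.05, 0.10, 0.30, 0.35])  # large jump from level 3 to level 4
print(linear, has_kink(linear))  # [0.05, 0.05, 0.05, 0.05] False
print(kinked, has_kink(kinked))  # [0.05, 0.05, 0.2, 0.05] True
```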

These findings are supported by Table 6.3, which reports values for specific health states representing the same level of problems on all dimensions: 11111, 22222, 33333, 44444 and 55555. Table 6.3 shows how these important health states in the descriptive system are spread over the value scale. For example, in China the values assigned to these states correspond well with their location in the descriptive system, with state 33333 lying roughly halfway on the value scale between states 11111 and 55555. For other countries/areas, such as France, this is not the case: the difference between 11111 and 33333 represents only 15% of the scale, with the remaining 85% representing the difference between 33333 and 55555.

**Fig. 6.1a** Value decrements by level and dimension, by country/area for Eastern European and Western countries/areas

**Fig. 6.1b** Value decrements by level and dimension, by country/area for Asian, African and Latin American countries/areas

# *6.3.3 Scale Length and Location of the Descriptive Midpoint on the Health Utility Scale*

Figure 6.2 reports the kernel density distributions of each value set. The differences in scale length between countries/areas are reflected in these distributions. The scale length for a country/area can be thought of as an indicator of its population's willingness to give up life years to improve quality of life. The shapes of the distributions also differ; in addition to scale length, shape is related to the relative importance of the dimensions and to modelling decisions, which may lead to normally or non-normally distributed value sets.

Table 6.3 includes the location of the descriptive midpoint (33333) of the EQ-5D-5L on the value distribution, expressed as a percentage of the total scale length. For the Western European countries/areas, the descriptive midpoints are in most cases assigned a higher value than the mathematical midpoint of the health value scale, indicating relatively larger weights for severe and extreme health problems than for slight and moderate problems. The same holds for Asian countries/areas, with the exception of Korea, where state 33333 is assigned a relatively low value compared with the scale range. Table 6.3 also reports the scale length for each value set. For Western countries/areas, these are relatively similar, lying between 1.096 and 1.757; Ireland is an exception, with a scale length of 1.974. For the Asian countries/areas, there is a clear distinction in scale length between East Asian and Southeast Asian countries/areas. For the East Asian countries/areas (China, Japan and Korea), scale lengths are relatively small (between 1.026 and 1.391). For Southeast Asian countries/areas the scale lengths are much larger, ranging from 1.420 in Thailand to 2.025 in Taiwan.

**Fig. 6.2** Kernel density distribution plots by country/area
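Scale length and the position of the descriptive midpoint follow directly from the values of 11111, 33333 and 55555. The sketch below assumes, as the France example above implies, that the midpoint position is measured from the top of the scale (the share of the scale between 11111 and 33333); Table 6.3 may use a different but equivalent convention. A share below 0.5 means that 33333 is valued above the mathematical midpoint. The function names and example values are hypothetical.

```python
def scale_length(v_11111, v_55555):
    """Distance between the best and worst states on the value scale."""
    return v_11111 - v_55555


def midpoint_share_from_top(v_11111, v_33333, v_55555):
    """Share of the scale lying between 11111 and the descriptive midpoint 33333."""
    return (v_11111 - v_33333) / scale_length(v_11111, v_55555)


# Hypothetical values: 33333 sits well above the mathematical midpoint.
share = midpoint_share_from_top(1.0, 0.8, -0.3)
print(round(scale_length(1.0, -0.3), 3), round(share, 3))  # 1.3 0.154
```

Under this definition, and with 11111 valued at 1, a scale length of 1.974 (Ireland) corresponds to a value of −0.974 for 55555.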

# **6.4 Identifying Preference Patterns**

Based on these findings, it appears that we can differentiate between four regions: Asian, Western, Eastern European, and Latin-American/African. Tables 6.4a and 6.4b provide more details on certain properties that are shared between the value sets for different countries/areas and, by contrast, also identify differences. The Asian countries/areas can further be divided into East Asian (Japan, Korea and China) and Southeast Asian (Vietnam, Thailand, Taiwan, Hong Kong, Indonesia and Malaysia) subgroups. The Latin-American (Mexico, Peru and Uruguay) and African (Ethiopia) value sets are pooled because few value sets are available for these regions and they do not fit into the other categories.

Tables 6.1, 6.2, 6.3, 6.4a and 6.4b show that the relative importance of the dimensions differs fundamentally between Western countries/areas and Asian, Latin-American, African and Eastern European countries/areas. For the Asian value sets, MO is always the most important dimension, mostly followed by PD, with UA and SC being of least concern. In contrast, PD or AD is usually the most important dimension in Western countries/areas, followed by MO, SC and UA. Eastern European countries/areas show a high importance for PD and MO, followed by SC, AD and UA. For the Latin-American and African value sets, the orders of relative importance are more mixed. Other characteristics appear mixed as well, such as the relative drops in value over the levels and a drop at the top of the scale, i.e., a drop in value associated with not being in state 11111, regardless of the health problems experienced. Although Olsen et al. (2018) found substantial differences between the value sets from the first wave of EQ-5D-5L valuation studies, these differences are less apparent in the newer studies.

The scale length also differs substantially between countries/areas, as can be seen from Tables 6.4a and 6.4b. Compared with other countries/areas, Western countries/areas seem more similar to one another in how the values for some key points on the scale (states 22222, 33333, 44444 and 55555) are distributed over the value scale.


**Table 6.4a** Key characteristics of each country's/area's value set: Western countries/areas and Eastern Europe


Among the Asian countries/areas, there seems to be a distinction in scale length between Southeast Asian and East Asian countries/areas, with East Asian countries/areas having shorter scale lengths than Southeast Asian countries/areas. This suggests less willingness to trade life years for quality of life in East Asian countries/areas than in Southeast Asian countries/areas.

Olsen et al. (2018) identified a Western preference pattern (WePP) in their earlier study, which represented a hybrid of the value sets for England, Canada, Spain and the Netherlands. Of the ten key characteristics in their WePP model (the characteristics in Tables 6.4a and 6.4b, except for the drop at the top/N1 term), Canada and England fulfilled all 10, while the Netherlands and Spain met 9. The new value sets from other Western countries/areas appear to confirm the existence of this WePP, with Portugal and Ireland fulfilling 8 of the criteria, Denmark 7, and France and Germany 6. The US and the two Eastern European countries/areas differ somewhat from the Western preference pattern, as they adhere to only 4 or 5 of the criteria. Interestingly, the relative importance of AD is much lower in the Eastern European countries/areas, while MO and SC are more important, although this also applies, to a lesser extent, to France and Portugal.

Table 6.4b reveals more heterogeneous preferences behind the value sets. Asian countries/areas seem to share similar characteristics, but can be subgrouped into East Asian and Southeast Asian preference patterns. The remaining countries/areas (Mexico, Peru, Uruguay and Ethiopia) differ in their characteristics from the (Southeast and East) Asian, Eastern European and Western value sets. As they also differ from each other, they are not grouped into a further preference pattern. However, one aspect that the value sets from Mexico, Peru, Uruguay and Ethiopia clearly share is that the value for state 55555 is lower than −0.2.

Preference patterns, and the aggregate value sets associated with them, are generated and defined as the means of the groups of value sets they represent. Taking the means of the values from several value sets that share similar characteristics is intended to ensure that each aggregate value set broadly represents its constituent value sets, without large variation. These aggregate value sets are reported in Table 6.5 and are presented as weights for level/dimension combinations (e.g., UA3 represents the weight for having moderate problems on UA).
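A minimal sketch of this averaging, assuming every constituent value set shares the same additive main-effects structure: an optional N1 term plus one decrement per level/dimension, labelled as in Table 6.5. Under that assumption, averaging coefficients and averaging the predicted values of each state give the same result, because the value is linear in the coefficients. Value sets with interaction or other nonlinear terms would instead need their predicted values averaged state by state. The helper names are hypothetical, and no published coefficients are used.

```python
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def state_value(state, weights):
    """Value of a state such as '21345' under an additive model with an optional N1 term.

    `weights` maps labels like 'UA3' to positive decrements; 'N1' (if present)
    is subtracted once for any state other than 11111.
    """
    if state == "11111":
        return 1.0
    value = 1.0 - weights.get("N1", 0.0)
    for dim, level in zip(DIMENSIONS, state):
        if level != "1":
            value -= weights[f"{dim}{level}"]
    return value


def aggregate_value_set(value_sets):
    """Average each coefficient across value sets; a missing coefficient counts as 0."""
    labels = set().union(*value_sets)
    n = len(value_sets)
    return {label: sum(vs.get(label, 0.0) for vs in value_sets) / n for label in sorted(labels)}
```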

# **6.5 How Do These Preference Patterns Perform?**

Figures 6.3a–e show how these aggregate value sets perform compared with the national value sets they represent. These figures show the values assigned to the EQ-5D-5L health profiles of the respondents of the MIC study, based on the value set for each country/area in the geographic region and on the aggregate value sets developed here, which are referred to by the prefix MN followed by the region. These values are plotted against the relative severity of the respondents' health profiles, as defined by the level sum score, calculated by taking the sum of the levels of problems for all dimensions of the EQ-5D-5L (e.g., for state 12315 this is 1 + 2 + 3 + 1 + 5 = 12) and rescaled to a scale on which 0 is the worst health state and 1 is the best health state. This allows us to see whether the aggregate value sets could represent single country/area value sets well when used in patient populations. If they perform well, these aggregate value sets may be useful for assessing quality of life in multi-country/area studies. The figures show that the Western aggregate value set based on means (MN-WEPP, Fig. 6.3a) performs relatively well and can be seen as an extension of the Western preference pattern (WePP) model suggested by Olsen et al. (2018). However, this preference pattern may misrepresent the value sets of some countries/areas, such as Ireland and France, to some degree. The values for Ireland are consistently lower than those of MN-WEPP. The value set for France frequently generates substantially higher values than the other value sets for mild and moderate states, yet for more severe states the values for France become more comparable to those of the other European value sets and the aggregate value set. Eastern European values (MN-EUR-E, Fig. 6.3c) are quite similar for mild and moderate states, yet seem to diverge for more severe states.

**Table 6.5** The aggregate value sets, presented as weights for level/dimension combinations (e.g., SC4 indicates level 4 problems with SC). N1 indicates a drop in value related to not being in full health/state 11111

**Fig. 6.3a** Performance of the preference patterns: MN-WEPP

**Fig. 6.3b** Performance of the preference patterns: MN-ASIA

**Fig. 6.3c** Performance of the preference patterns: MN-EUR-E

**Fig. 6.3d** Performance of the preference patterns: MN-EASIA

**Fig. 6.3e** Performance of the preference patterns: MN-SEA
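The severity axis in Figs. 6.3a–e can be reproduced with two small functions. The text states only that the level sum score is rescaled so that 0 is the worst and 1 the best health state; the linear mapping (25 − LSS)/20 used below is our assumption of the simplest such rescaling, and the function names are illustrative.

```python
def level_sum_score(state):
    """Sum of the five level digits, e.g. '12315' -> 12 (range 5 to 25)."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not an EQ-5D-5L state: {state!r}")
    return sum(int(ch) for ch in state)


def rescaled_severity(state):
    """Assumed linear rescaling: 55555 (LSS 25) -> 0 (worst), 11111 (LSS 5) -> 1 (best)."""
    return (25 - level_sum_score(state)) / 20
```

For the example state in the text, `level_sum_score("12315")` gives 12, which this rescaling maps to 0.65.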

The results for the Asian aggregate value set (MN-ASIA, Fig. 6.3b) are mixed. The Southeast Asian aggregate value set (MN-SEA, Fig. 6.3e) performs relatively well for mild and moderate states, yet for severe states there seems to be a split between two sets of three countries/areas: Taiwan, Hong Kong and Indonesia show substantially lower values than Thailand, Malaysia and Vietnam. In East Asia, Korea and Japan appear to have very similar preference patterns, yet the values for China differ considerably from theirs, so the East Asian aggregate value set (MN-EASIA, Fig. 6.3d) does not represent all three value sets well. Across Asia as a whole, there seems to be considerable heterogeneity, with smaller groups of countries/areas being more alike, but no real pattern in values that is shared among all countries/areas.

# **6.6 Discussion**

# *6.6.1 Main Findings*

In this chapter we have identified several key differences between currently published value sets by examining their distributions, scale lengths, the relative importance of the dimensions, the marginal differences in values between levels, and the relative focus on symptoms versus functional dimensions. Furthermore, we were able to identify several preference patterns for countries/areas that share common characteristics in terms of geography and/or institutional settings.

# *6.6.2 Preference Patterns*

We have identified five preference patterns: a Western pattern; an Eastern European pattern; and an Asian pattern, which is further subdivided into an East Asian and a Southeast Asian preference pattern. Our findings show that the group of countries/areas identified by Olsen et al. (2018) as sharing a similar pattern can be extended to other Western countries/areas, as these are also similar in their value set characteristics.

Eastern European countries/areas differ substantially from the Western preference pattern, as MO is considered more important relative to PD and AD than it is in Western countries/areas. Furthermore, AD is generally given low priority compared with the Western preference pattern.

The Asian preference patterns are distinct from the Western preference pattern, as MO is the most important dimension in all Asian value sets. Furthermore, a clear distinction between the Eastern European and Asian preference patterns is that, on average, AD is of higher importance in the Asian value sets than in the Eastern European preference pattern. The scale lengths of the Asian value sets do not differ substantially from those of the Western and Eastern European preference patterns, yet they differ substantially within the Asian preference pattern, as Figs. 6.3b, 6.3d and 6.3e illustrate. The difference in scale length and the divergence between East Asian and Southeast Asian countries/areas for severe health states are the main differences between the two subgroups and lead us to distinguish two separate Asian preference patterns. This confirms findings by Xie et al. (2017), who concluded that there is less variation within Western value sets than within Asian value sets.

# *6.6.3 Data Quality and Modelling Strategies*

In addition to the factors discussed in the methods and results sections, two other key elements may cause differences between value sets: (1) data quality and (2) modelling strategies. Especially in the first wave of valuation studies, some studies reported issues with data quality. Two key data issues were identified (Stolk et al. 2019; Ramos-Goñi et al. 2017a, b): a lack of worse than dead (WTD) responses, because the WTD task of the cTTO was not explained to respondents; and satisficing by respondents, leading to low values for very mild health states. Both are undesirable and may affect value sets, resulting in poor face validity. A lack of negative values may lead to a narrower value range than would have been found had genuine preferences been captured, while low values for mild health states in the data may lead to imprecise and underestimated values at the top of the scale in the resulting value sets.

Modelling strategies may also affect some of the key aspects of a value set. These refer to: (1) whether cTTO data are combined with DCE data (hybrid modelling); (2) the assumptions on censoring at −1; (3) the way heteroskedasticity is dealt with; (4) accounting for preference heterogeneity; and (5) allowing for nonlinear terms. Hybrid modelling combines the DCE and cTTO data into a single likelihood function, which allows the researcher to model both sets of data simultaneously (Ramos-Goñi et al. 2018). There is evidence that the scale length is somewhat longer in studies that use hybrid models when compared to cTTO models. For example, in the case of the US valuation study, the scale length differed by 0.126 between the cTTO and hybrid models, and the order of importance of the dimensions also differed (Pickard et al. 2019). Furthermore, there is an ongoing debate on whether hybrid modelling is an appropriate strategy when the DCE and cTTO data are not in agreement.
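To make the idea of a single likelihood concrete, the sketch below adds a normal log-likelihood for cTTO values to a conditional logit log-likelihood for paired DCE choices, with both parts sharing the same decrement coefficients `beta`. The parameter `theta`, which maps the decrements onto the latent DCE utility scale, and the uncensored, homoskedastic cTTO part are simplifying assumptions made here; published hybrid specifications (Ramos-Goñi et al. 2018) can be more elaborate. In practice the joint function would be maximised over `beta`, `sigma` and `theta` with a numerical optimiser.

```python
import math


def decrement(x, beta):
    """Linear index x . beta: the predicted loss of value relative to 11111."""
    return sum(xi * bi for xi, bi in zip(x, beta))


def ctto_loglik(ctto_data, beta, sigma):
    """Normal log-likelihood for cTTO values, modelled as 1 - x . beta + error."""
    ll = 0.0
    for x, value in ctto_data:
        resid = value - (1.0 - decrement(x, beta))
        ll += -0.5 * math.log(2.0 * math.pi * sigma ** 2) - resid ** 2 / (2.0 * sigma ** 2)
    return ll


def dce_loglik(dce_pairs, beta, theta):
    """Conditional logit for pairs; latent utility of a state is -theta * (x . beta)."""
    ll = 0.0
    for x_a, x_b, chose_a in dce_pairs:
        # U_a - U_b = theta * (x_b . beta - x_a . beta)
        diff = theta * (decrement(x_b, beta) - decrement(x_a, beta))
        p_a = 1.0 / (1.0 + math.exp(-diff))
        ll += math.log(p_a if chose_a else 1.0 - p_a)
    return ll


def hybrid_loglik(ctto_data, dce_pairs, beta, sigma, theta):
    """Single objective: both data sources share the decrement coefficients beta."""
    return ctto_loglik(ctto_data, beta, sigma) + dce_loglik(dce_pairs, beta, theta)
```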

How the censored nature of cTTO data is taken into account is another issue that may affect value sets systematically. In the cTTO task, respondents are constrained by a minimum value of −1 that they can assign to health states. In practice, some respondents may be willing to assign an even lower value to a health state. To account for this, assumptions can be made about the distribution of the underlying values for responses at −1, which may lie below this value. Tobit models are one way to deal with this, and they may substantially lengthen the scale compared with models that make no assumptions about censored data. For example, in the Dutch study, the scale length differed by 0.119 between the Tobit and regular linear models (Versteegh et al. 2016). There may also be consequences for the values at the top of the scale.
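A minimal sketch of the censored-data idea is a Tobit log-likelihood with left-censoring at −1: an observed value above −1 contributes the normal density, while a response at −1 contributes the probability that the latent value lies at or below −1. The function below assumes a normal error with a common standard deviation `sigma`; the predicted means would come from whatever value set model is being estimated, and the function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(observations, sigma, lower=-1.0):
    """Log-likelihood for values left-censored at `lower`.

    observations: (observed_value, predicted_mean) pairs.
    """
    ll = 0.0
    for y, mu in observations:
        if y <= lower:
            # Censored: the latent value is only known to be at or below the bound.
            ll += math.log(normal_cdf((lower - mu) / sigma))
        else:
            z = (y - mu) / sigma
            ll += -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z
    return ll
```

A model whose predicted mean for a state lies below −1 can therefore still fit responses piled up at −1, which is how Tobit estimation can lengthen the scale.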

Models accounting for heteroskedasticity may provide more precise estimates for cTTO data, as the variability of responses is substantially smaller for milder health states than for more severe states on the scale. Accounting for preference heterogeneity by employing random intercept models may also contribute to differences between value sets. Finally, allowing for nonlinear terms such as interactions in the models may cause differences between value sets, for example by producing non-normal distributions.
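One common way to model heteroskedasticity is to let the error standard deviation depend on covariates. The sketch below assumes, purely for illustration, that log σ rises linearly with a severity measure (for example, the level sum score); the variance function and the parameter names `gamma0` and `gamma1` are hypothetical rather than those of any particular value set study.

```python
import math


def sigma_for(severity, gamma0, gamma1):
    """Hypothetical variance function: log(sigma) = gamma0 + gamma1 * severity."""
    return math.exp(gamma0 + gamma1 * severity)


def heteroskedastic_loglik(observations, gamma0, gamma1):
    """Normal log-likelihood with an observation-specific standard deviation.

    observations: (observed_value, predicted_mean, severity) triples.
    """
    ll = 0.0
    for y, mu, severity in observations:
        sigma = sigma_for(severity, gamma0, gamma1)
        z = (y - mu) / sigma
        ll += -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z
    return ll
```

With `gamma1 > 0`, a given residual is penalised less for a severe state than for a mild one, so mild states, whose responses vary less, carry more weight in estimation.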

# *6.6.4 Differences Between Value Sets Between and Within Preference Patterns*

As noted in the introduction, many factors may explain differences and similarities between value sets. Differences in genuine preferences may result from differences in cultural values, wealth and the characteristics of health systems, whereas methodological differences can be caused by differences in measurement methods, data quality and modelling strategies.

A study by Wang et al. investigated the results of seven Asian cTTO datasets used for generating EQ-5D-5L value sets (Wang et al. 2019). They found substantial differences between the value sets for Asian countries/areas and recommended developing value sets for each country/area independently, on the basis that a value set from one Asian country/area may not adequately represent the values of a neighbouring country/area. We also find some differences between Asian countries/areas, but also some similarities that allow us to group Asian value sets approximately into two groups. One can only speculate about what causes the differences between these two groups. Some countries/areas may be more similar in their preferences because they share characteristics such as wealth, healthcare systems, social insurance and culture. However, studies of the effect of culture have produced mixed results. One study finds a relationship between the relative importance of the different attributes and differences in culture (Bailey and Kind 2010), while another finds no relationship between scale length and differences in cultural values (Roudijk et al. 2019). Other factors, such as the study protocol and QC, could also be important in explaining differences between value sets. For example, the studies from the earlier waves of EQ-5D-5L valuation studies (Japan, China, Korea) report much smaller scale lengths than most of the other Asian value sets (Shiroiwa et al. 2016; Luo et al. 2017; Kim et al. 2016), which may partially explain differences between Asian countries/areas. A similar observation can be made for Western countries/areas, although the difference between Western value sets seems to be smaller.

More research, possibly in the form of a meta-analysis, is needed to assess key methodological differences between the value sets within geographical regions and to explore whether these methodological differences and macroeconomic determinants, such as health systems, may explain differences between the value sets.

# *6.6.5 Limitations*

A limitation of this chapter is that the number of countries/areas included in each preference pattern differs substantially. For example, the Western aggregate value set includes value sets from ten countries/areas, while the Eastern European aggregate value set contains only two. Another limitation is that geographical/cultural regions such as Africa, the Middle East and Latin America are underrepresented among the available value sets. Once more value sets are available from these regions, it will become more feasible to determine whether there are any preference patterns in these regions and to compare them with the currently identified preference patterns. Finally, the patient data used to test the preference patterns were collected in Western countries/areas only. Therefore, the results for the value sets from non-Western countries/areas (and for the derived aggregate value sets) may not adequately reflect how they perform for patients in those countries/areas. Future research using patient data from non-Western countries/areas may improve our understanding of how well these aggregate value sets perform in non-Western countries/areas.

# **6.7 Conclusions**

This chapter identified key differences between value sets and attempted to group value sets on the basis of similarities in certain relevant characteristics. Five different preference patterns were identified. As differences between value sets for countries/areas included within a preference pattern can still be substantial, we still recommend the development and use of national value sets rather than using a value set from a different country/area or from a composite of countries/areas. However, these aggregate value sets could be used for sensitivity analyses when applying foreign value sets.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 7 Where Next for EQ-5D-5L National Value Sets and the EQ-VT Protocol?**

#### **Richard Norman, Nancy Devlin, and Elly Stolk**

**Abstract** The purpose of this chapter is to reflect on the future of EQ-5D-5L valuation studies, going beyond the value sets summarised in this book. This includes a number of linked themes. First, the EQ-5D-5L valuation research programme has allowed the continued evolution of methods, as methodological studies have demonstrated that aspects of the EQ-VT protocol could be strengthened or improved. This chapter describes some of the key candidates for future refinement of the methods for valuing EQ-5D-5L. Second, while the standardisation of valuation methodology is important, it is anticipated that many countries may require a less resource-intensive, but still rigorous, version of the valuation protocol. This chapter outlines the progress towards developing a 'lite' version of the EQ-VT protocol, and considers the future possibility of valuation protocols based exclusively on discrete choice experiments, with their accompanying strengths and weaknesses. Finally, the 'shelf-life' of value sets is considered, along with how demographic and other societal changes may affect how people value health, and the implications of that for the need to update EQ-5D-5L value sets.

# **7.1 Introduction**

Previous chapters have provided an overview of the EQ-5D-5L value sets produced to date. Taken together, these value sets, and the methodological development which underpins them, constitute a very substantial body of work. The availability of EQ-5D-5L value sets has facilitated the use of EQ-5D-5L data collected from patients around the world for a variety of purposes. Primarily, these value sets are intended to support the estimation of Quality Adjusted Life Years (QALYs) and QALY gains from health care for use in cost-effectiveness and cost-utility analysis, providing evidence to inform health technology assessment (HTA) processes globally. Additionally, the value sets allow the use of EQ-5D-5L in other applications, such as monitoring population health (in both general and patient populations), where EQ-5D profile data need to be summarised with a focus on those aspects of health that are considered most important by society (see Chap. 5).

R. Norman, School of Population Health, Curtin University, Bentley, Australia

N. Devlin (\*), Centre for Health Policy, University of Melbourne, Melbourne, VIC, Australia, e-mail: nancy.devlin@unimelb.edu.au

E. Stolk, EuroQol Research Foundation, Rotterdam, The Netherlands

The production of these EQ-5D-5L value sets, coordinated by the EuroQol Group, represents an endeavour unique in scale and breadth, unprecedented in the preference-weighting of other measures of health-related quality of life (HRQoL). It has improved on the earlier EQ-5D-3L valuation efforts, which were largely researcher-driven, used protocols that were not always fully documented, and consequently had limited comparability because of differences in methods and protocols. In contrast, the EQ-5D-5L valuation studies have been based on a common, well-documented protocol for collecting data that is carefully managed in accordance with agreed metrics and includes a deliberate process for incremental improvement. The high standards applied in developing the protocol, and in the application of quality control in its use, have resulted in a protocol (see Chap. 2) that has been successfully replicated in many different contexts. This suggests that a new level of maturity in valuation approach has been reached, and that the techniques used reflect modern best practice in the health valuation field.

While the EQ-5D-5L valuation effort already has significant global coverage, further EQ-5D-5L value sets are planned or underway (for example, in the Middle East and Africa, where such studies are relatively few), reflecting continued growth in use of the instrument. The development of universal health care systems around the world (for example, in China and Mexico) will further reinforce the demand for evidence on 'value for money' to support the allocation of resources in publicly funded health care systems. This is likely to result in continued demand for the EQ-5D-5L and its accompanying value sets in both existing and new contexts.

The purpose of this chapter is to reflect on the future of EQ-5D-5L valuation studies, beyond the value sets summarised in Chap. 4. This includes a number of linked themes.

First, the EQ-5D-5L valuation project has allowed continued evolution in methods, as methodological studies have demonstrated that aspects of the protocol could be strengthened or improved. This chapter describes some of the key candidates for future refinement of the methods.

Second, while the standardisation of the methodology is important, it is anticipated that many countries may seek a less resource-intensive, but still rigorous, version of the valuation protocol. We outline progress towards developing a 'lite' version of EQ-VT. This discussion also covers the development of a stand-alone discrete choice experiment (DCE) protocol, with its accompanying strengths and weaknesses relative to the 'gold-standard' approach described in previous chapters.

Finally, it is worth considering the shelf-life of value sets. As time progresses, value sets from earlier studies become increasingly unreliable estimates of what a contemporary study would report as the 'average' preferences of a society, due to methodological improvements, changes in the demographic makeup of the population, and preference shifts caused by broader cultural trends that may manifest in how people consider HRQoL and its value relative to life extension. More broadly, there are questions about who should make judgements about value sets: for example, who decides when a new value set is needed? Similarly, who decides whether it is the preferences of the general public (however defined) or of some other group that are relevant? And who should judge whether any given value set is acceptable for use? What is the role and responsibility of the EuroQol Group versus local HTA bodies or other bodies?

# **7.2 Future Directions for Improvements in the EQ-VT – An Overview**

As has been demonstrated in Chap. 2, significant work has gone into ensuring that the EQ-VT protocol is a reliable and defensible method for the valuation of EQ-5D-5L health states. EQ-VT is a *living product* which will continue to evolve. Any concern that has been or will be expressed regarding the methods adopted in the EQ-VT protocol can act as a catalyst for further research and development and can inform and shape future methodological choices. Some key areas for future progress are described below. Before discussing these, it is important to point out that changing the EQ-VT protocol necessarily involves balancing the improvements in data that may arise from incorporating enhanced methods against the resulting reduction in consistency and comparability between value sets. Each advance to the EQ-VT protocol needs to lead to demonstrably better data, ideally in multiple methodological studies in a multinational context. Given the level of existing work to refine the EQ-VT approach, as described in Chap. 2, this sets a high bar for change.

The principal questions concerning the future directions of EQ-VT are in effect the same questions that confront *any* stated preferences study for *any* HRQoL instrument, namely: (i) what method(s) to use to elicit stated preferences, using what mode of administration; (ii) what study design to use (what sample size is required; and what sub-sample of states to include in stated preference tasks); and (iii) what modelling approaches to use to interpolate values across the descriptive system for the HRQoL instrument.

# *7.2.1 What Methods to Use?*

The choice to include both time trade-off (TTO) and DCE methods, made early in the programme of work (see Chap. 2), reflected the growing popularity of DCE methods in health economics, the long-standing role of TTO in providing evidence to support QALY estimation, and the lack of consensus in health economics about any one method being optimal.

Despite the widespread acceptance of TTO, and the leading place it has earned among EQ-5D valuation methods, there are nevertheless remaining issues with TTO and with the variant of it used in EQ-VT, the composite TTO (cTTO). As with any TTO approach, the cTTO tasks in the EQ-VT protocol necessarily incorporate methodological choices, e.g., about the iterative routing process used to reach the point of indifference and about the duration of the states being valued (see Chap. 2 for more detail). Each of these choices has the potential to exert a framing effect on the values produced and might be challenged. For example, a ten-year duration for all states to be valued is very widely used and has come to be regarded as standard, but that duration might be considered an arbitrary choice, and the observed proportional trade-offs would likely differ if alternative durations were employed (Stalmeier et al. 2007; Craig et al. 2018; Jonker et al. 2018; Attema and Brouwer 2014). The use of a ten-year duration is known to encounter issues with violations of constant proportionality and with the difficulty of imagining states (especially severe ones) over such a long period without relief. The use of cTTO also involves different tasks for obtaining values > 0 (the conventional TTO) and < 0 (a lead-time TTO task) (Devlin et al. 2011; Janssen et al. 2013). The use of different methods for obtaining values across the scale raises questions about the comparability of values above and below 0. The particular design of the task for states < 0 sets the minimum observable value at −1 by design, which has the appeal of avoiding the likely need to rescale values. However, it also raises the question of whether −1 is the lowest meaningful value possible and, if lower values exist, how to reflect them (e.g., in modelling). These and other issues will remain the subject of future research.
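The mapping from a cTTO indifference point to a value can be written compactly. The sketch below assumes a ten-year state duration and, for worse-than-dead states, a lead time in full health of the same length preceding the state (so the task spans 20 years). Under these assumptions the lead-time task cannot produce a value below −1, which is the floor discussed above. Iterative routing and the step size of the task are not modelled, and the function name is illustrative.

```python
def ctto_value(years_in_full_health, worse_than_dead, duration=10.0):
    """Value implied by the indifference point of a composite TTO task.

    Better than dead: t years in full health ~ `duration` years in the state,
    so value = t / duration. Worse than dead (lead-time TTO): t years in full
    health ~ `duration` years in full health followed by `duration` years in
    the state, so t = duration + duration * value, i.e.
    value = (t - duration) / duration.
    """
    if not 0.0 <= years_in_full_health <= duration:
        raise ValueError("indifference point must lie between 0 and the duration")
    if worse_than_dead:
        return (years_in_full_health - duration) / duration
    return years_in_full_health / duration
```

At `years_in_full_health=0` in the lead-time task the value is exactly −1, so any respondent who would go lower is recorded at that bound.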

DCE methods have the appeal of presenting respondents with a potentially simpler choice task. This allows rapid collection of large quantities of stated preference data via online self-completion. However, the DCE tasks included in the EQ-VT protocol have a limitation: they produce values on a latent scale that is not anchored to 'dead'. When the protocol was first established, DCE approaches that allow calibration of values relative to 'dead' were still at an early stage of development. They were rejected, mainly because results obtained when the methods were tested varied considerably for reasons that were poorly understood. However, research in recent years has put these initial results into perspective. It has revealed that values derived from the DCE-duration approach depend on modelling choices, on design specification and on the interdependencies between the two (Lim et al. 2018; Jonker et al. 2018; Jonker and Bliemer 2019). This seems to have brought closer a future in which DCE can realise more of its potential and play a larger role in valuation studies of EQ-5D instruments. To some extent, this can already be seen in the valuation protocol for EQ-5D-Y, where DCE plays a bigger role (Ramos-Goñi et al. 2020).

# *7.2.2 Procedural Aspects*

Similarly, various procedural aspects of valuation studies continue to receive attention. A key one is the basis for deciding how many health states and choice tasks to include. It is important to select health states and pairs that allow unbiased estimation of coefficients for whichever functional form is required. Yang et al. (2018, 2019) advanced the field by showing how much the statistical properties of the set of health states or pairs matter to the predictive performance of a design. They demonstrated that many published approaches to selecting health states were suboptimal, including popular designs used to value EQ-5D-3L. By contrast, the design used in EQ-5D-5L valuation studies performed well in comparison with alternative approaches. In the statistical approach used to create the design for valuing EQ-5D-5L, the functional form, design and sample size were considered in parallel. A large number of candidate designs were created by random draws. The performance of each was evaluated using a given model (main effects) and priors derived from pilot studies, and the best-performing design was kept (Oppe and van Hout 2017) (see Chap. 3 for more details). However, scope for improvement may still exist, as we do not yet know how larger designs perform or what number of observations per state is optimal. Moreover, Yang et al. (2019) showed that accurately predicting the values of mild states is especially challenging. Some designs that perform well overall perform poorly for mild states. This in turn calls for more attention to the models as well.
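The random-draw search attributed above to Oppe and van Hout (2017) can be illustrated in a much simplified form. This sketch is *not* their procedure. It draws candidate sets of EQ-5D-5L states at random, as described, but it does not evaluate each candidate by simulation with pilot priors. Instead, it swaps in a standard D-optimality criterion: the log-determinant of X'X for a main-effects, dummy-coded model. The design size, number of draws and all function names are hypothetical.

```python
import itertools
import math
import random

# All 5^5 = 3125 EQ-5D-5L health states, from (1, 1, 1, 1, 1) to (5, 5, 5, 5, 5).
ALL_STATES = list(itertools.product(range(1, 6), repeat=5))


def main_effects_row(state):
    """Intercept plus 4 dummies per dimension (levels 2-5 vs level 1): 21 columns."""
    row = [1.0]
    for level in state:
        row.extend(1.0 if level == k else 0.0 for k in range(2, 6))
    return row


def log_det_information(design):
    """log det(X'X) by Gaussian elimination; -inf if X'X is singular."""
    X = [main_effects_row(s) for s in design]
    p = len(X[0])
    # Information matrix X'X (positive semi-definite, so det >= 0).
    M = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    log_det = 0.0
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return -math.inf
        M[col], M[pivot] = M[pivot], M[col]
        log_det += math.log(abs(M[col][col]))
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= factor * M[col][c]
    return log_det


def random_search_design(n_states, n_draws, seed=0):
    """Keep the best of `n_draws` random candidate designs of `n_states` states."""
    rng = random.Random(seed)
    best_design, best_score = None, -math.inf
    for _ in range(n_draws):
        candidate = rng.sample(ALL_STATES, n_states)
        score = log_det_information(candidate)
        if best_design is None or score > best_score:
            best_design, best_score = candidate, score
    return best_design, best_score


design, score = random_search_design(n_states=30, n_draws=200)
print(len(design), round(score, 2))
```

A criterion like this rewards spreading observations across all dimension levels. As Yang et al. (2019) caution, though, overall efficiency does not guarantee good prediction for mild states, so a real evaluation would also check predictions across the whole descriptive system.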

Questions also exist about the mode of data collection. This debate was fuelled by the COVID-19 pandemic, which disrupted face-to-face interviewer administration of EQ-VT (as described in Chap. 2) in countries that had been planning value set studies. The disruption gave rise to the idea of conducting EQ-VT interviews online: still interviewer-guided rather than self-completed, but conducted via an online platform rather than face-to-face. Initial experimentation suggested that online data collection is feasible, enables reasonable respondent engagement and yields data that appear to be of acceptable quality (Lipman 2020). Online interviews may even have some advantages. They can reach respondents across broader geographic areas, reduce the costs of interviewer travel, and allow the use of 'expert interviewers' who need not be based in the same region, or even the same country, as respondents. However, there are also potential disadvantages, e.g., in reaching people without internet access. Caution is also required, as preferences obtained via the two modes of administration may differ in important ways. Further evidence is needed to establish the equivalence of data obtained via online administration.

# *7.2.3 Analysis and Modelling*

The EuroQol Group has been prescriptive about the use of its protocol for study design and elicitation. Local research teams, however, can choose which further analyses to undertake, which modelling methods to use, and which criteria to apply when choosing the preferred algorithm. As shown in Chap. 4, modelling practice varies widely. The common underlying protocol nevertheless facilitates comparison of the resulting values and value sets between countries.

In particular, value sets differ in the data on which the preferred value set is based. Some use cTTO data only (for example, China and the US), while others use a hybrid of cTTO and DCE data (for example, England and Denmark) (see Chap. 4). Such differences reflect both scientific and strategic considerations. Strategically, HTA bodies in some countries have expressed a preference for TTO-based values, and this is reflected in the modelling approach taken. Scientifically, as is common whenever competing approaches to measurement exist, there is ongoing uncertainty about whether cTTO and DCE measure the same thing. It is also unclear what to make of inconsistencies between them. For instance, recent work has suggested that the relative importance of dimensions differs between cTTO and DCE in Peru and Mexico (Augustovski et al. 2020; Gutierrez-Delgado et al. 2021). Going forward, any disagreement between values derived from DCE and cTTO tasks needs careful review. That review should consider the degree of conceptual resemblance between the two tasks, the assumptions used in both methods (including modelling assumptions), and the scope for implementation issues to arise.

As we survey the future of EQ-5D value set development, we are cognisant that there will *always* be methodological questions. This is part of the inquisitive nature of science, and good science depends on scientific debate. Such questions can lead to different responses: either strengthening the methods currently in the protocol or investigating new methods. No method commands universal support, and this is likely to remain the case here because there is no external standard against which to validate values. As long as that holds, any methodological question will fuel debate and can lead to either type of response. The research and development investment of the EuroQol Group in recent years has mainly focussed on refinement of the methods included in EQ-VT, as described in Chap. 2. However, other methods development has also been supported. The EuroQol Group remains open to alternatives, both from within its membership and from the broader, vibrant community of health preference and valuation researchers.

The use of TTO over so many years means there is a considerable evidence base to support it. This has raised the bar for other methods. They require very considerable evidence on their performance and on the properties of the preference data they yield before they can be considered candidates for use. This is particularly apparent in our cautious approach to DCE, where an ambitious programme of research is underway to build a deep understanding of its use in valuing EQ-5D instruments. This is good scientific practice. It is also strategically important, as stakeholders have a lot riding on their use of EQ-5D data and value sets. No transition can be made lightly, and the level of maturity reached in the EQ-VT protocol is difficult to match. The EuroQol Group is committed to progressing the science of valuation and to ensuring that evidence supports a new generation of methods fit for purpose in the future.

# **7.3 Developing Alternative Approaches and Answering Different Questions**

The EQ-5D-5L exists in a dynamic environment, both in the methods that can be used to develop value sets and in the empirical questions it can help to answer. This changing context continues to present new challenges. The development of a 'Lite' protocol, a less resource-intensive version of EQ-VT (as described in Chap. 3), is a good example. As we move into more resource-constrained settings, we need to reduce the cost of conducting valuation surveys. We also need to make such work more accessible to those who bring essential local knowledge, context and contacts but relatively little experience of its more technical aspects. However, the impact of switching protocols is not yet known. Progressing down this path therefore requires caution and careful comparative evaluation.

Whether or not it forms part of the Lite valuation, the configuration of the DCE is an important ongoing consideration. DCEs that include comparisons of states with 'dead' have the appeal of simplicity. DCE with duration, however, arguably resembles TTO more closely in conceptual terms, which may be considered an advantage (Mulhern et al. 2014). This potential advantage was recognised when the EQ-VT protocol was developed, but it was coupled with concerns about the low values obtained in some initial applications. Stolk et al. (2019) suggest these results arise from a difference between the two tasks. cTTO observes values directly and uses lead-time TTO to assess the strength of preference for health states classified as worse than dead. In contrast, the DCE with duration task never indicates directly whether a health state is worse than dead. It therefore relies on extrapolation, which brings extra uncertainty and the potential for bias if the underlying assumptions are wrong. Evidence suggests that values estimated from DCE with duration are sensitive to model specification, in particular to assumptions about time preferences. Models applied to cTTO also rely on the assumption of constant proportionality, which may not hold. However, violations of this assumption can be a bigger problem for DCE with duration than for cTTO, because of the extrapolation required in the former. These issues with DCE with duration are an ongoing area of methodological research.
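The extrapolation problem can be made concrete with one common specification of the DCE-with-duration model. In this specification, utility is linear in duration and the state attributes interact with duration: U = α·T + T·Σβₖxₖ. Full health (all xₖ = 0) yields α per year and 'dead' (T = 0) yields 0. Dividing by α therefore anchors a state's value at v = 1 + Σβₖxₖ/α. The linear-in-duration form builds in constant proportionality and no discounting. The coefficients below are invented for illustration, and the function name is hypothetical.

```python
def dce_duration_value(alpha, betas, dummies):
    """Value of a health state on the full health = 1, dead = 0 scale (sketch).

    Assumes a DCE-with-duration utility U = alpha*T + T*sum(beta_k * x_k),
    i.e. linear in duration T (constant proportionality, no discounting).
    Full health (all x_k = 0) yields alpha per year and dead (T = 0) yields 0,
    so the anchored value is v = 1 + sum(beta_k * x_k) / alpha.
    """
    if alpha <= 0:
        raise ValueError("the duration coefficient alpha must be positive")
    if len(betas) != len(dummies):
        raise ValueError("betas and dummies must have the same length")
    decrement = sum(b * x for b, x in zip(betas, dummies))
    return 1.0 + decrement / alpha


# Invented coefficients for three attribute-level dummies (illustration only).
alpha = 0.5
betas = [-0.1, -0.3, -0.4]

mild = dce_duration_value(alpha, betas, [1, 0, 0])
severe = dce_duration_value(alpha, betas, [0, 1, 1])
print(round(mild, 3))    # 0.8
print(round(severe, 3))  # -0.4: worse than dead, implied only by extrapolation
```

No respondent ever compared the severe state with 'dead'. Its negative value follows purely from the assumed linearity in duration. If time preferences bend that relationship, the implied value moves with them.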

Quantitative approaches to valuing EQ-5D-5L are valuable and will remain a centrepiece of value set development within the EuroQol Group. However, there is a growing literature focused on greater reflection and deliberation by respondents (Robinson and Bryan 2013; Devlin et al. 2019; Karimi et al. 2017, 2019). This line of enquiry is potentially extremely valuable in two ways. It can identify *why* respondents place value on certain aspects of health, and it can minimise the risk of datasets being contaminated by ill-considered or hasty responses.

# **7.4 Making Scientific and Social Value Judgements About Value Sets**

As discussed in Chap. 5, users of value sets should consider both their inherent scientific quality and the social value judgements they embody. Indeed, decision makers are becoming more active in independently scrutinising value sets and applying their own quality assurance. For example, the England EQ-5D-5L value set, part of the first wave of studies, was subject to a formal review by the Department of Health for England (Hernández-Alava et al. 2020; van Hout et al. 2020). It was ultimately rejected for use by the National Institute for Health and Care Excellence (NICE) (NICE 2019). This has led to efforts, currently underway, to produce a new, UK-wide value set. More generally, the question remains of who is responsible for value set endorsement. Is this a case of 'caveat emptor', i.e., is it ultimately the responsibility of users and decision-making bodies, or is there a role for the EuroQol Group? To date, beyond allowing use of EQ-VT and monitoring data collection via quality control, the EuroQol Group has not imposed any process for approving or rejecting value sets modelled from EQ-VT data.

This question is particularly pertinent where value sets have been developed using methods quite different from those recommended by the EuroQol Group at the time. For instance, EQ-5D-5L value set studies using different methods to elicit the health state preferences of the general public have been conducted in the US (Craig and Rand 2018) and New Zealand (Sullivan et al. 2019). These value sets are not reported in this book, as our focus is on value sets produced using the EQ-VT protocol. Similarly, there is an emerging body of work examining the preferences of patients rather than the general public. An example of a value set based on such 'experienced' values, for Sweden, can be found in Burström et al. (2020). Such studies offer interesting methodological comparisons and can, under particular circumstances, be used in those countries. However, because their methods differ, comparisons of the EQ-5D-5L values they yield with the value sets reported in Chap. 4 should be treated with caution. Any differences are attributable to both local preferences and methodology, and the two are impossible to disentangle.

Moving away from scientific judgement of value sets, the social values that underpin the use of each are potentially important. Value sets are most commonly developed using the adult general population, but adulthood is defined differently in different countries. In Japan and Taiwan, for example, it is taken to mean those over 20 years of age. More commonly it is interpreted as those over 18 years, while in some countries, such as Indonesia, the threshold was set at 17 years and older (see Chap. 4 for details). The views of younger adolescents and children are typically excluded from such studies.<sup>1</sup> While the merits of such exclusion are open to debate, a key issue is how the age threshold is defined. At what age do we consider a person to have become an adult, able to complete the cognitively challenging valuation tasks we use? And are we imposing age criteria for practical, ethical or normative reasons, or for a combination of all three? Practical reasons include comprehension and data quality. Ethical reasons include concerns about confronting younger people with life/death trade-offs. Philosophical or normative reasons concern whose preferences should determine public policy. To the extent that age affects preferences, this can have significant implications for decision making in practice. It could be argued that such determinations are best made by the users of the value set themselves. The appropriate method for engagement on such topics is likely to be context-specific and will yield different decisions, affecting the comparability of value sets between nations. This trade-off between consistency and tailoring to the local context is an ongoing challenge.

# **7.5 Adapting to Change**

Previous value sets for the EQ-5D-3L have remained in use, and accepted by policy makers, for long periods of time. An example is the UK MVH value set (Dolan 1997), for which data were collected in 1993/94 and which NICE continues to recommend while awaiting a new EQ-5D-5L value set for the UK. This raises the question of what the shelf life of such value sets is. It also raises the question of what factors might prompt the need for new value sets, bearing in mind both the potential benefits of updated values and the costs of producing them.

Samples are recruited to be representative of the general public at the time data are collected, and value sets represent the average preferences of society. Over time, the socio-demographic composition of populations changes due to population ageing, trends in fertility rates and patterns of immigration. These changes could be expected to shift the average preferences of the general public if they alter the population shares of sub-groups with different preferences. Perhaps less obviously, other changes may also matter. One is the proportion of the population who are very elderly and more likely to be in residential care. Another is the proportion who are in prison or other institutions. These people often fall outside the sampling frames used to recruit the general public. Such changes might indicate the need for a new value set. An alternative would be to use population weights to account for such changes. This would, however, rely on appropriate demographic data having been collected during the initial value set development. That would be challenging, because we do not know in advance which population demographics we would want to weight on.

<sup>1</sup>Child health status can be measured and increasingly valued using the EQ-5D-Y, but the value sets that accompany the EQ-5D-Y are typically based on the stated preferences of the adult general public, and not those of younger people.
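The population-weighting alternative mentioned above can be sketched as simple post-stratification. Each sub-group's mean value is reweighted from its share of the original sample to its current share of the population. The age groups, shares and values are invented for illustration, and the function name is hypothetical. The sketch also assumes that preferences *within* each group are stable over time, which is exactly what a new value set would test. It cannot recover groups that were never sampled.

```python
import math


def post_stratified_mean(group_means, population_shares):
    """Mean value weighting each group by its population share (sketch).

    group_means: mean value for one health state within each sub-group
        (hypothetical data).
    population_shares: share of each sub-group in the population; must sum to 1.
    """
    if set(group_means) != set(population_shares):
        raise ValueError("groups in means and shares must match")
    if not math.isclose(sum(population_shares.values()), 1.0):
        raise ValueError("population shares must sum to 1")
    return sum(group_means[g] * population_shares[g] for g in group_means)


# Invented numbers: mean cTTO value for a single state by age group.
means = {"18-39": 0.60, "40-64": 0.55, "65+": 0.45}
shares_at_survey = {"18-39": 0.40, "40-64": 0.40, "65+": 0.20}
shares_now = {"18-39": 0.30, "40-64": 0.40, "65+": 0.30}

print(round(post_stratified_mean(means, shares_at_survey), 3))  # 0.55
print(round(post_stratified_mean(means, shares_now), 3))        # 0.535
```

The shift from 0.55 to 0.535 arises solely from the ageing of the invented population, with no change in any group's preferences.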

Changes in preferences could provide another reason for updating value sets, and may arise from other factors influencing society. For example, living standards, health and HRQoL have improved over time for many people. This may raise expectations about health and health care in ways that affect preferences for HRQoL. There may also be specific local health issues that exert an effect on preferences. One might speculate whether the high-profile debates over euthanasia in a number of countries might affect the trade-offs against 'dead' and duration that the general public is prepared to make. In Mexico, the relatively high importance placed on problems with mobility has been suggested to be linked to the widespread lack of support or social services for people with mobility problems (Gutierrez-Delgado et al. 2021). More generally, increasing awareness of mental health issues may affect how people consider these issues and their importance relative to other health problems. The COVID-19 pandemic and its global impact could also affect how people value HRQoL. However, research on such factors is lacking, and there is very little clear evidence on how they affect stated preferences.

These issues suggest a rationale for updating value sets from time to time. However, there are currently no guidelines about this and no consensus on what factors or *prima facie* evidence should trigger an update. One possibility is to conduct a less expensive survey, such as a DCE, at regular intervals with updated sampling frames. This would monitor for evidence of preference shifts that might justify a replication EQ-VT study to capture the shift accurately.

Further, the benefits of updating a value set need to be weighed against the costs. These include not just the costs of producing a new value set but also the costs and consequences for its use in decision making. For example, HTA bodies may be concerned about changes to the HRQoL values used in cost-effectiveness evidence and the implications for the consistency of their decisions. In economists' terms, such changes impose costs of their own, so updating may need to be balanced against these pragmatic and operational considerations.

# **7.6 Concluding Remarks**

The national value sets for EQ-5D-5L summarised in this book play a vital role in supporting the use of EQ-5D-5L data, providing evidence for HTA and other health care decision-making contexts. The EQ-VT protocol used to produce these value sets can now be considered to represent a mature and well-tested set of methods. However, there will *always* remain questions about which methods for eliciting and modelling values for HRQoL are best, and this is the case both for EQ-5D-5L and for other HRQoL instruments. The EuroQol Group actively encourages and supports innovative research and development into valuation methods and is a leading investor in such research internationally. This ensures that there is scope for researchers to develop and explore potential new methods, and a process for assessing the case for their inclusion in the protocol in future. These efforts not only benefit studies to value EQ-5D-5L, but also inform the wider scientific agenda on valuation of HRQoL instruments.

# **References**


EQ-5D tariff: methodology report. Health Technol Assess 18(12):vii–xxvi, 1–191. https://doi.org/10.3310/hta18120. PMID: 24568945; PMCID: PMC4781204


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Glossary of EQ-5D Terms**

In this section, we set out the terms used in this book to describe specific aspects of the EQ-5D instruments and the methods used to develop and report their value sets. This glossary builds on that provided in Devlin et al. (2020), which is reproduced here with permission from the publishers and the authors. It has been updated where necessary, including by adding terms relating to the valuation of EQ-5D-5L that arise in this book. Further information on EQ-5D nomenclature is provided in Brooks et al. (2020).

Terms that appear in bold within a description are also defined elsewhere in the glossary. Terms appear in alphabetical order.

Please note that general statistical terms used in this book which are not specific to the valuation of EQ-5D-5L are not defined in this glossary. Readers who require clarification on methods used for analysing and modelling valuation data are encouraged to refer to relevant textbooks (e.g., Cameron and Trivedi 2005) and journal articles (e.g., Ramos-Goñi et al. 2017).






# **References**

