#### Matteo Di Masoa , Monica Ferraronia , Pasquale Ferranteb , Serena Delbueb , Federico Ambrogia **Longitudinal profile of a set of biomarkers in predicting Covid-19 mortality using joint models**

**Longitudinal profile of a set of biomarkers in predicting Covid-19 mortality using joint models** 

 Department of Clinical Sciences and Community Health, Branch of Medical Statistics, Biometry and Epidemiology "G.A. Maccacaro", University of Milan, Milan, Italy Department of Biomedical, Surgical & Dental Sciences, Università degli Studi di Matteo Di Maso, Monica Ferraroni, Pasquale Ferrante, Serena Delbue, Federico Ambrogi

Milano, Milan, Italy

## **1. Introduction**

a

b

In survival analysis, time-varying covariates (i.e., covariates that are repeatedly measured over time) are endogenous when (i) their measurements are directly related to the event status and (ii) when incomplete information occur at random points during the follow-up because subjects may skip schedule visits and dropout from the study (Rizopoulos, 2012). Consequently, the classical time-dependent Cox model (Therneau and Grambsch, 2000) leads to biased estimates.

In order to correctly estimate the association between a time-to-event outcome and endogenous covariates, two approaches become in widespread use. The first is the joint model (JM) for the simultaneously analysis of longitudinal and time-to-event data (Rizopoulos, 2012). In this approach, the survival sub-model (used to predict hazards for a set of time-invariant covariates) and longitudinal sub-model (used to predict timevarying covariates) are interdependent by means of a set of random effects (i.e., shared parameters). Random effects are individual-specific model terms, and their inclusion in JM provides a way of producing overall predictions. The second approach is the landmarking analysis (van Houwelingen and Putter, 2012), a more pragmatic method that avoids modelling the time-varying covariates. In this approach, the estimated effect of the time-varying covariates is based on the value at the landmark time point, after which values of time-varying covariates may change.

During the first wave of Covid-19 pandemic, physicians at Istituto Clinico di Città Studi in Milan collected a set of inflammatory biomarkers in order to understand what might be used as prognostic factors in progression and mortality of Covid-19 disease. Biomarkers were collected repeatedly over the follow-up. Furthermore, particularly in the first epidemic outbreak, physicians did not have standard clinical protocols for management of Covid-19 disease and for this reason, measurements of biomarkers were highly incomplete especially at the baseline.

The aim of the present study is twice. Using data on Covid-19 patients, we firstly evaluate the association of a single biomarker on Covid-19 mortality using JM, landmarking, and time-dependent Cox model in order to compare estimates. Second, we present JM estimates for the whole set of biomarkers collected on Covid-19 patients to evaluate their association on mortality risk.

177 Matteo Di Maso, University of Milan, Italy, matteo.dimaso@unimi.it, 0000-0002-6481-990X Monica Ferraroni, University of Milan, Italy, monica.ferraroni@unimi.it, 0000-0002-4542-4996 Pasquale Ferrante, University of Milan, Italy, pasquale.ferrante@unimi.it Serena Delbue, University of Milan, Italy, serena.delbue@unimi.it, 0000-0002-3199-9369

Federico Ambrogi, University of Milan, Italy, federico.ambrogi@unimi.it, 0000-0001-9358-011X

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Matteo Di Maso, Monica Ferraroni, Pasquale Ferrante, Serena Delbue, Federico Ambrogi, *Longitudinal profile of a set of biomarkers in predicting Covid-19 mortality using joint models*, pp. 191-196, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88- 5518-461-8.36, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

# **2. Methods**

#### *Theoretical framework of JM*

According to the shared parameter approach, the JM consists of two sub-models: one to model the time-to-event outcome (survival sub-model) and the other to model the timevarying covariates (longitudinal sub-model).

The survival sub-model is a typical semi-parametric (or parametric) model for timeto-event outcome. Let \* *Ti* be the true event time for the *th i* subject (with *i N* 1, , ), *Ti* be the observed event time, defined as the minimum of the potential right-censoring time *Ci* and \* *Ti* , i.e., \* min , *T TC <sup>i</sup> i i* , and let \* *i ii IT C* be the event indicator. Furthermore, let *m t <sup>i</sup>* be the true and unobserved value of a single time-varying covariate at time *t* . The (proportional) hazards model is:

$$h\_i\left(t^\*\right) = h\_0\left(t^\*\right) \cdot \exp\left\{\beta\_j X\_{ij} + \alpha m\_i\left(t\right)\right\}$$

where *h*<sup>0</sup> denotes the baseline risk function, *Xij* is the set of *j* time-invariant covariates measured at baseline for the *th i* subject, *<sup>j</sup>* is the corresponding vector of regression coefficients, and is the regression coefficient for the time-varying covariate, quantifying the effect of such variable to the event risk.

The longitudinal sub-model is a typical linear mixed model for longitudinal outcome. As information on the time-varying covariate are collected intermittently and with error at a set of few time points for each subject, the aim of longitudinal sub-model is to predict the complete longitudinal history (also called trajectory) of the time-varying covariate (the outcome of the longitudinal sub-model) for a set of time-invariant covariates. In particular, longitudinal sub-model is:

$$\mathcal{Y}\_i\left(t\right) = m\_i\left(t\right) + \varepsilon\_i\left(t\right)$$

where *m t X gZ t <sup>i</sup> j ij i i* , with *gND <sup>i</sup>* ~ 0, and <sup>2</sup> ~ 0, *<sup>i</sup> t N* . The quantity *y t <sup>i</sup>* is the observed longitudinal outcome for the *th i* subject at time *t* , *<sup>j</sup>* denote the estimates for the fixed effects *Xij* and *<sup>i</sup> g* denote the estimates for the random effects *Z t <sup>i</sup>* . In the shared parameter approach, the random effects are common for the longitudinal and survival sub-models.

Recently, a Bayesian approach for fitting JM was introduced. In particular, estimation of JM's parameters proceeds using Markov chain Monte Carlo (MCMC) algorithm. The posterior distribution of the model parameters is derived under the assumptions that given the shared parameter, both longitudinal and survival sub-models are assumed independent, and the longitudinal outcomes of each subject are assumed independent. In this approach, non-informative priors can be used for explorative purposes.

### *Landmarking analysis*

The idea behind landmarking analysis is to select, for a given time point *LM t s* , all subjects alive and under follow-up at time *s* . In particular, landmarking involves to set *s* and using the value of the time-varying covariate at *s* as fixed covariate in a timedependent Cox model from *s* onwards, in a subset of subjects at risk at *s* . For a generic subject *i*, the objective is to use part of the information of the time-varying covariate of the subject to estimate the conditional probability that the subject is still alive after a predefined time window *w*. More specifically, at a prediction time point *s* , the conditional probability that the subject is still alive at time *w s* conditionally on being alive at time *s* and conditional the history of the time-varying covariate up to *s* is given by:

$$\pi\_i\left(\mathbf{s} + \boldsymbol{\omega} \mid \mathbf{s}\right) = P\left\{T\_i > \mathbf{s} + \boldsymbol{\omega} \mid T\_i \ge \mathbf{s}, m\_i\left(\mathbf{s}\right)\right\}$$

with *m s <sup>i</sup>* denoting the history of time-varying covariate up to *s* .

### *Data collection*

Between 21 February and 19 March 2020, a total of 403 Covid-19 patients were admitted at Istituto Clinico Città Studi in Milan. Patients aged 21-100 years and 58.3% were men. Person-time at risk was computed as the time elapsed from the day of hospital admission to the day of Covid-19 death (event time), to the day of hospital discharge, or to the day of moving in other structure (right-censoring time), whichever came first. Baseline characteristics included sex and age of patients, whereas biomarkers measurements included ferritin (ng/ml), lymphocytes count, neutrophil granulocytes count, D-dimer (ng/ml), C-reactive protein (ml/l), glucose (mg/dl) and lactate dehydrogenase (LDH; U/l).

## *Statistical analysis*

In order to compare JM, landmarking, and time-dependent Cox model estimates, ferritin was considered. In particular, logarithm of ferritin (log-ferritin) levels was used to account for the skewedness of the measurements. According to the Bayesian approach, independent and non-informative priors for the fixed effects of the longitudinal and survival sub-models (i.e., age and sex) and for the shared parameter (i.e. subject-specific predicted trajectories of log-ferritin level) were used in the JM. In addition, a natural cubic spline with 2 knots was used to model the subject specific log-ferritin trajectories through time and to model age. Two knots are generally sufficient to detect mild non-linear effects and to avoid over-parametrization of the model considering the available sample size.

In landmarking analysis, a set of landmarking time point for log-ferritin time of measurements was considered. In particular, data were analysed with *s* running from 3 to 20 days which corresponded to the median and the 75th centile of log-ferritin time of the first and last measurements, respectively. Prediction windows *w* were set at 7, 14, 21, and 28 days. Age was modelled in the same way as the JM.

In the time-dependent Cox model, observed log-ferritin levels was incorporated as time-varying covariate using a natural cubic spline with 2 knots as well as age.

In order to provide associations between biomarkers and Covid-19 mortality, a JM of each biomarker one at time with the occurrence of Covid-19 death was performed (univariable JM). Logarithmic transformation was considered for ferritin, lymphocytes (log-lymphocytes), neutrophil granulocytes (log-neutrophil granulocytes), D-dimer (log-D-dimer), and C-reactive protein (log-C-reactive protein). In the multivariable JM including all biomarkers (multivariable JM), D-dimer was excluded due to the high number (78; 19%) of patients with missing values. Assumptions for priors, biomarkers trajectories and age were the same as the JM for log-ferritin previously described.

Analyses were performed using JMbayes (Rizopoulos, 2016) and dynpred (Putter, 2015) packages in R Statistical Software, version 4.0.5 (R Core Team 2021).

# **3. Results**

Among 403 Covid-19 patients admitted at Istituto Clinico Città Studi, 140 patients died during the follow-up. Among 263 patients survived, 99 were discharged and 164 were moved in other structures. The median of follow-up was 14 days (range: 0-78 days).

Hazard ratios (HR) and corresponding 95% confidence intervals (CI) from the (biased) time-dependent Cox model and JM for log-ferritin levels (ng/ml) were 2.10 (1.67-2.64) and 1.73 (1.38-2.20), respectively. According to landmarking analysis, the HR was 1.73 (1.25-2.38) for a prediction window of 7 days. With regards to 14, 21, and 28 prediction windows, HRs were 1.86 (1.36-2.54), 1.91 (1.40-2.60), and 1.91 (1.40- 2.61), respectively.

The estimates obtained from univariable JM showed decreased level through time for expected log-ferritin according to the negative coefficients for the splines of time at measurements (table 1). Conversely, the expected level of log-ferritin increased with increasing age and men showed higher expected levels than women. The expected loglymphocytes count increased through time, whereas it decreased with age. No association emerged between log-lymphocytes and sex. The expected log-neutrophil granulocytes count decreased through time, whereas it increased with age and men showed higher levels. Likewise, expected log-D-dimer levels decreased through time, increased with age and men had higher levels. For log-C-reactive protein, expected levels showed a mixed trend through time. In particular, levels initially decreased according to the negative coefficient for the first part of follow-up and increased thereafter. The expected log-Creactive protein levels increased with age and men showed higher levels than women. Expected levels of glucose and LDH decreased through time, while increasing with age and men had higher levels.

In univariable JM, all biomarkers were significantly associated with Covid-19 mortality. An increase in the levels of biomarkers was associated with an increased in the mortality risk, except for lymphocytes. In particular, doubling of levels for loglymphocytes count was associated with approximately halving mortality risk (HR=0.58; 95% CI: 0.46-0.73). The strongest associations were observed for log-neutrophil granulocytes (HR=2.87; 95% CI: 2.30-3.51 for doubling of levels), for log-C-reactive protein (HR=2.44; 95% CI: 2.01-2.97 for doubling of levels) and glucose (HR=2.89; 95% CI: 1.92-4.26 for an increase of 100 mg/dl).

The multivariable JM was estimated using data on 320 patients with 96 (30%) events (after exclusion of patients with missing values for D-dimer). For ferritin and lymphocytes there were no more evidence of association with mortality. The strength of the association was attenuated with respect to the univariable JM for log-neutrophil granulocytes (HR=1.78; 95% CI: 1.16-2.69 for doubling of levels), log-C-reactive protein (HR=1.44; 95% CI: 1.13-1.83 for doubling of levels), LDH (HR=1.28; 95% CI: 1.09- 1.49 for an increase of 100 UI/l), and glucose (HR of 2.44; 95% CI: 1.28-4.26 for an increase of 100 mg/dl).

However, the strongest effect in both univariable and multivariable JM was observed for age with a HR starting to rapidly increase approximately at 60 years.

# **4. Conclusion**

In the present work, we firstly compared HR estimates of a single time-varying covariate (log-ferritin) using different approaches. The HRs from JM and landmarking approaches were lower than that of the time-dependent Cox model. In addition, landmarking estimate for a 7-day prediction window was similar to the estimate of the JM, but it tended to increase increasing prediction window. However, landmarking estimates were lower than the time-dependent Cox model one.

Finally, the multivariable JM model showed associations between some biomarkers and Covid-19 mortality but the strong association between age and mortality risk persisted after adjusted for biomarkers considered.

## **References**

Rizopoulos D. (2012). *Joint Models for Longitudinal and Time-to-Event Data. With Application in R.* Boca Raton: Chapman & Hall/CRC.



**Table 1.** Univariable and multivariable joint model estimates.

