Josafhat Salinas Ruíz Osval Antonio Montesinos López Gabriela Hernández Ramírez Jose Crossa Hiriart

# Generalized Linear Mixed Models with Applications in Agriculture and Biology


Josafhat Salinas Ruíz Innovación Agroalimentaria Sustentable Colegio de Postgraduados Córdoba, Mexico

Gabriela Hernández Ramírez Ciencia de los Alimentos y Biotecnología Tecnológico Nacional de México/ITS de Tierra Blanca, Mexico

Osval Antonio Montesinos López Facultad de Telemática University of Colima Colima, Mexico

Jose Crossa Hiriart International Maize and Wheat Improvement Center (CIMMYT) Texcoco, Mexico

ISBN 978-3-031-32799-5 ISBN 978-3-031-32800-8 (eBook) https://doi.org/10.1007/978-3-031-32800-8

The translation was done with the help of artificial intelligence (machine translation by the service DeepL.com). A subsequent human revision was done primarily in terms of content.

© The Editor(s) (if applicable) and The Author(s) 2023. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

# Foreword

This book is another example of CIMMYT's commitment to scientific advancement, and comprehensive and inclusive knowledge sharing and dissemination. By using alternative statistical models and methods to describe and analyze data sets from different disciplines, such as biology and agriculture, the book facilitates the adoption and effective use of these tools by publicly funded researchers and practitioners of national agricultural research extension systems (NARES) and universities across the Global South.

The authors aim to offer different and new models, methods, and techniques to agricultural scientists who often lack the resources to adopt these tools or face practical constraints when analyzing different types of data.

This work would not be possible without the continuous support of CIMMYT's outstanding partners and donors who invest in non-profit frontier research for the benefit of millions of farmers and low-income communities worldwide. For that reason, it could not be more fitting for this book to be published as an open access resource for the international community. I trust that this publication will greatly contribute to accelerating the development and deployment of resource-efficient and nutritious crops for a food-secure future.

> Bram Govaerts Director General, CIMMYT

# Acknowledgments

The work presented in this book describes models and methods for improving the statistical analysis of continuous, ordinal, and count data usually collected in agriculture and biology. The authors are grateful to the past and present CIMMYT Directors General, Deputy Directors of Research, Deputy Directors of Administration, Directors of Research Programs, and other administrative offices and laboratories of CIMMYT for their continuous and firm support of biometrical genetics and statistics research, training, and service in support of CIMMYT's mission: "maize and wheat science for improved livelihoods."

This work was made possible with support from the CGIAR Research Programs on Wheat and Maize (wheat.org, maize.org) and many funders, including Australia, the United Kingdom (DFID), the USA (USAID), South Africa, China, Mexico (SAGARPA), Canada, India, Korea, Norway, Switzerland, France, Japan, New Zealand, Sweden, and the World Bank. We thank the Government of Mexico for its financial support through MASAGRO and several other regional projects, and for close collaboration with numerous Mexican researchers.

We acknowledge the financial support provided by the (1) Bill and Melinda Gates Foundation (INV-003439 BMGF/FCDO Accelerating Genetic Gains in Maize and Wheat for Improved Livelihoods [AG2MW]) as well as (2) USAID projects (Amend. No. 9 MTO 069033, USAID-CIMMYT Wheat/AGGMW, AGG-Maize Supplementary Project, AGG [Stress Tolerant Maize for Africa]).

Very special recognition is given to the Bill and Melinda Gates Foundation for covering the Open Access fee of this book.

We are also thankful for the financial support provided by the (1) Foundations for Research Levy on Agricultural Products (FFL) and the Agricultural Agreement Research Fund (JA) in Norway through NFR grant 267806, (2) Sveriges Lantbruksuniversitet (Swedish University of Agricultural Sciences), Department of Plant Breeding, Sundsvägen 10, 23053 Alnarp, Sweden, (3) CIMMYT CRP, (4) the Consejo Nacional de Ciencia y Tecnología (CONACYT) of Mexico, and (5) the Universidad de Colima of Mexico.

The original version of this book has been revised. The Acknowledgment section, which was inadvertently omitted after the Foreword, has now been included.

We highly appreciate and thank the many students and professors at the Universidad de Colima, as well as the students and professors from the Colegio de Postgraduados (COLPOS), who tested and made suggestions on early versions of the material covered in this book; their many useful suggestions had a direct impact on the current organization and the technical content of the book.

# Contents







# Chapter 1 Elements of Generalized Linear Mixed Models

# 1.1 Introduction to Linear Models

Linear models are commonly used to describe and analyze datasets from different research areas, such as biology, agriculture, and the social sciences. A linear model aims to best represent/describe the nature of a dataset. A model is usually made up of a factor or a series of factors that can be nominal or discrete variables (sex, year, etc.) or continuous variables (age, height, etc.) that have an effect on the observed data. Linear models are the most commonly used statistical models for estimating and predicting a response based on a set of observations.

Linear models get their name because they are linear in the model parameters. The general form of a linear model is given by

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e} \tag{1.1}$$

where **y** is the n × 1 vector of observed responses, **X** is the n × (p + 1) design matrix of fixed constants, **β** is the (p + 1) × 1 vector of unknown parameters to be estimated, and **e** is the n × 1 vector of random errors. Linearity arises because the mean response of **y** is linear in the vector of unknown parameters **β**. Mathematically, this is checked by taking the first derivative of the predictor with respect to **β**: if, after differentiation, the result is still a function of any of the beta parameters, the model is said to be nonlinear; otherwise, it is a linear model. Here, the derivative of the predictor in (1.1) with respect to **β** is equal to **X**, which no longer depends on the **β** parameters, so model (1.1) is linear.

Several models used in statistics are examples of the general linear model y = Xβ + e. These include regression models and analysis of variance (ANOVA) models. Regression models generally refer to those in which the design matrix X is of a full column rank, whereas in analysis of variance models, the design matrix X is not of a full column rank. Some linear models are briefly described in the following sections.

# 1.2 Regression Models

Linear models are often used to model the relationship between a variable, known as the response or dependent variable, y, and one or more predictors, known as independent or explanatory variables, X1, X2, ⋯, Xp.

# 1.2.1 Simple Linear Regression

Consider a model in which a response variable y is linearly related to an explanatory variable X1 via

$$y_i = \beta_0 + \beta_1 X_{1i} + \epsilon_i$$

where ε<sub>i</sub> (i = 1, 2, ⋯, n) are uncorrelated random errors, commonly assumed to be normally distributed with mean 0 and constant variance σ² > 0, that is, ε<sub>i</sub> ~ N(0, σ²). If X<sub>11</sub>, X<sub>12</sub>, ⋯, X<sub>1n</sub> are constant (fixed), then this is a general linear model y = Xβ + e, where

$$\mathbf{y}_{n \times 1} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X_{n \times 2} = \begin{pmatrix} 1 & X_{11} \\ 1 & X_{12} \\ \vdots & \vdots \\ 1 & X_{1n} \end{pmatrix}, \quad \boldsymbol{\beta}_{2 \times 1} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad \mathbf{e}_{n \times 1} = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$
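The closed-form least squares solution for this two-parameter model can be sketched in a few lines of plain Python. This is an illustrative implementation under the usual OLS assumptions, with made-up data (the function name `fit_simple_ols` is ours, not from the book):

```python
# Closed-form OLS estimates for simple linear regression y_i = b0 + b1*x_i + e_i:
# b1 = Sxy / Sxx, b0 = ybar - b1 * xbar.

def fit_simple_ols(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)                    # sum of squares of x
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # cross-products
    b1 = sxy / sxx            # slope estimate
    b0 = ybar - b1 * xbar     # intercept estimate
    return b0, b1

# Sanity check: data generated exactly from y = 2 + 3x recovers (2, 3).
b0, b1 = fit_simple_ols([0, 1, 2, 3], [2, 5, 8, 11])
print(b0, b1)  # 2.0 3.0
```

With noisy data the same formulas return the least squares fit rather than the exact generating coefficients.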

Example Let us consider the relationship between the performance test scores and tissue concentration of lysergic acid diethylamide commonly known as LSD (from German Lysergsäure-diethylamid) in a group of volunteers who received the drug

Table 1.1 Average mathematical test scores and LSD tissue concentrations



(Wagner et al. 1968). The average scores on the mathematical test and the LSD tissue concentrations are shown in Table 1.1.

The components of this regression model are as follows:

Distribution: $y_i \sim N(\eta_i, \sigma^2)$
Linear predictor: $\eta_i = \beta_0 + \beta_1 \times \mathrm{Conc}_i$
Link function: $\mu_i = \eta_i$ (identity)

The syntax for performing a simple linear regression using the GLIMMIX procedure in Statistical Analysis Software (SAS) is as follows:

```
proc glimmix;
model y = X1/solution;
run;
```

Part of the results is shown in Table 1.2. The analysis of variance (part (a)) indicates that drug concentration has a significant effect on average mathematical performance (P = 0.0019). The estimates of the regression model parameters (part (b)), $\hat{\beta}_0$ and $\hat{\beta}_1$, and the mean squared error (MSE, "scale") are shown in Table 1.2(b) under "Parameter estimates."

With these results, the fitted linear predictor $\hat{\eta}_i$ that predicts the average mathematical performance as a function of LSD concentration is as follows:

$$
\hat{\eta}\_i = 89.124 - 9.01 \times \text{Conc}\_i
$$

This means that to predict the average mathematical performance of an individual, we only need to know the LSD concentration (Conc<sub>i</sub>) to be applied. From the estimated parameters, we can say that there is a negative relationship between LSD concentration and mathematical score: Fig. 1.1 clearly shows that an increase in the drug supplied has a negative effect on the subjects' mathematical scores. This fitted model explains 87.7% of the variability in the data (Fig. 1.1).
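Using the fitted line, a point prediction is a direct evaluation of the linear predictor. A minimal sketch with the coefficients reported above (the concentration values passed in are illustrative only):

```python
# Point prediction from the fitted line eta_hat = 89.124 - 9.01 * Conc
# reported in the text for the LSD example.

def predict_score(conc):
    return 89.124 - 9.01 * conc

print(predict_score(0.0))  # 89.124: predicted score at zero concentration
print(predict_score(5.0))  # lower score, reflecting the negative slope
```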

Fig. 1.1 Relationship between applied drug concentration and the mathematical score of the youth

Fitted model of the relationship between the average score and LSD concentration.

# 1.2.2 Multiple Linear Regression

Suppose that a response variable y is linearly related to several independent variables X1, X2, ⋯, Xp such that

$$\mathbf{y}\_i = \beta\_0 + \beta\_1 \mathbf{X}\_{i1} + \beta\_2 \mathbf{X}\_{i2} + \dots + \beta\_p \mathbf{X}\_{ip} + \varepsilon\_i$$

for i = 1, 2, ⋯, n. Here, ε<sub>i</sub> are uncorrelated random errors normally distributed with a zero mean and constant variance σ², i.e., ε<sub>i</sub> ~ N(0, σ²). If the explanatory variables are fixed constants, then the above model is a general linear model of the form y = Xβ + ε, as can be seen below:

$$\mathbf{y}_{n\times 1} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X_{n\times (p+1)} = \begin{pmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1p} \\ 1 & X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{np} \end{pmatrix}, \quad \boldsymbol{\beta}_{(p+1)\times 1} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \mathbf{e}_{n\times 1} = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$
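For more than one predictor, the OLS estimates come from the normal equations $(X'X)\hat{\beta} = X'y$. The sketch below solves them in plain Python with Gaussian elimination; the data are synthetic (generated from a known line), not the bull measurements of Table 1.3, and the helper names are ours:

```python
# OLS for multiple regression via the normal equations (X'X) b = X'y.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_ols(X, y):
    p = len(X[0])
    XtX = [[sum(X[r][i] * X[r][j] for r in range(len(X))) for j in range(p)]
           for i in range(p)]
    Xty = [sum(X[r][i] * y[r] for r in range(len(X))) for i in range(p)]
    return solve(XtX, Xty)

# Rows are [1, x1, x2]; y generated exactly from y = 1 + 2*x1 - 0.5*x2.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 2], [1, 2, 1], [1, 3, 3]]
y = [1 + 2 * x1 - 0.5 * x2 for _, x1, x2 in X]
print(fit_ols(X, y))  # approximately [1.0, 2.0, -0.5]
```

Because the responses are generated exactly from the line, the normal equations recover the generating coefficients up to rounding.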


Table 1.3 Body weight (kilograms) and its relationship with circumference (centimeters) and heart length (centimeters) of seven young bulls

A regression analysis can be used to assess the relationship between explanatory variables and the response variable. It is also a useful tool for predicting future observations or simply describing the structure of the data.

Example Let us fit a regression model of the relationship between body weight and the heart girth and heart length of seven young bulls, using the data shown in Table 1.3.

The components of this multiple regression model are as follows:

Distribution: $y_i \sim N(\eta_i, \sigma^2)$
Linear predictor: $\eta_i = \beta_0 + X_{i1}\beta_1 + X_{i2}\beta_2$
Link function: $\mu_i = \eta_i$ (identity)

The syntax for performing a multiple regression using the GLIMMIX procedure in SAS, assuming that there is no interaction between bull heart girth (X1) and length (X2), is shown below:

```
proc glimmix;
model y = X1 X2/solution cl;
run;
```

Based on the regression model specifications, the option "solution cl" prompts GLIMMIX to print the estimated parameters and their respective confidence intervals. Other useful options are "htype = 1, 2, and 3," which request type I, II, and III sums of squares. The type III fixed effects tests in part (a) of Table 1.4 indicate that there is a linear relationship between heart length (size) and weight in young bulls. The estimated parameters $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\beta}_2$, with their respective confidence intervals, as well as the MSE ("scale") of the fitted regression model, are listed in part (b).

Note that in a linear model, the parameters are linearly entered, but the variables do not necessarily have to be linear. For example, consider the following two examples:

$$y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 \log(X_{i2}) + \dots + \beta_k X_{ik} + \epsilon_i$$


Table 1.4 Results of the multiple regression analysis

$$y_i = \beta_0 + X_{i1}^{\beta_1} + \beta_2 X_{i2} + \dots + \beta_k \exp(X_{ik}) + \epsilon_i$$

The first example is a linear model, whereas the second one is not: all the derivatives of its predictor are free of the beta coefficients except that of the term $X_{i1}^{\beta_1}$, whose derivative is $X_{i1}^{\beta_1}\log(X_{i1})$. Because the derivative of the predictor still depends on $\beta_1$, the second example is a nonlinear model.
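The derivative criterion can also be checked numerically: for a predictor that is linear in the parameters, the gradient with respect to the betas is the same at every parameter value, while for a nonlinear predictor it changes. A toy sketch (the two predictors and the finite-difference helper are illustrative, not from the book):

```python
# Numerical linearity-in-parameters check via central finite differences:
# a linear predictor has a gradient w.r.t. beta that does not depend on beta.
import math

def grad(f, beta, h=1e-6):
    g = []
    for k in range(len(beta)):
        bp = beta[:]; bp[k] += h
        bm = beta[:]; bm[k] -= h
        g.append((f(bp) - f(bm)) / (2 * h))
    return g

x1, x2 = 2.0, 3.0
linear    = lambda b: b[0] + b[1] * x1 + b[2] * math.log(x2)  # linear in b
nonlinear = lambda b: b[0] + x1 ** b[1] + b[2] * x2           # x1**b1 term

g_a = grad(linear, [0.5, 1.0, 2.0])
g_b = grad(linear, [4.0, -2.0, 7.0])
print(g_a, g_b)        # identical gradients: [1, x1, log(x2)] at any beta

h_a = grad(nonlinear, [0.5, 1.0, 2.0])
h_b = grad(nonlinear, [0.5, 3.0, 2.0])
print(h_a[1], h_b[1])  # differ: d/db1 = x1**b1 * log(x1) depends on b1
```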

# 1.3 Analysis of Variance Models

# 1.3.1 One-Way Analysis of Variance

Consider an experiment in which you want to test t treatments (t ≥ 2), where n<sub>i</sub> experimental units are selected and randomly assigned to the ith treatment. The model describing this experiment is as follows:

$$y_{ij} = \mu + \tau_i + \epsilon_{ij}$$

for i = 1, 2, ⋯, t and j = 1, 2, ⋯, n<sub>i</sub>. Here, ε<sub>ij</sub> are uncorrelated random errors with a normal distribution with zero mean and constant variance σ² (ε<sub>ij</sub> ~ N(0, σ²)). If the treatment effects are considered fixed constants (drawn from a finite number of levels), then this model is a special case of the general linear model (1.1), with a total number of experimental units $n = \sum_i n_i$.


| Source of variation | Degrees of freedom |
|---|---|
| Treatments (bacteria) | t - 1 = 3 - 1 = 2 |
| Error | t(r - 1) = 3 × 2 = 6 |
| Total | tr - 1 = 3 × 3 - 1 = 8 |

In matrix terms, the information under this design of experiment is equal to:

$$\mathbf{y}_{n\times 1} = \begin{pmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{tn_t} \end{pmatrix}, \quad \mathbf{X}_{n\times (t+1)} = \begin{pmatrix} \mathbf{1}_{n_1} & \mathbf{1}_{n_1} & \mathbf{0}_{n_1} & \cdots & \mathbf{0}_{n_1} \\ \mathbf{1}_{n_2} & \mathbf{0}_{n_2} & \mathbf{1}_{n_2} & \cdots & \mathbf{0}_{n_2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbf{1}_{n_t} & \mathbf{0}_{n_t} & \mathbf{0}_{n_t} & \cdots & \mathbf{1}_{n_t} \end{pmatrix}, \quad \boldsymbol{\beta}_{(t+1)\times 1} = \begin{pmatrix} \mu \\ \tau_1 \\ \tau_2 \\ \vdots \\ \tau_t \end{pmatrix}, \quad \boldsymbol{\epsilon}_{n\times 1} = \begin{pmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \vdots \\ \epsilon_{tn_t} \end{pmatrix}$$

where $\mathbf{1}_{n_i}$ is the vector of ones of order $n_i$ and $\mathbf{0}_{n_i}$ is the vector of zeros of order $n_i$. Note that the matrix $\mathbf{X}_{n\times (t+1)}$ is not of full column rank because its first column can be obtained as a linear combination of its remaining columns.
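This rank deficiency can be verified directly. The sketch below builds the one-way design matrix for t = 3 treatments with two replicates each and computes its rank by Gaussian elimination (an illustrative helper of ours, not library code):

```python
# The one-way ANOVA design matrix [1 | treatment indicators] is rank
# deficient: the intercept column equals the sum of the indicator columns.

def matrix_rank(A, tol=1e-9):
    """Rank of a matrix via Gaussian elimination on a copy."""
    M = [row[:] for row in A]
    rows, cols = len(M), len(M[0])
    rank = 0
    for col in range(cols):
        piv = next((r for r in range(rank, rows) if abs(M[r][col]) > tol), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        for r in range(rows):
            if r != rank and abs(M[r][col]) > tol:
                f = M[r][col] / M[rank][col]
                for c in range(cols):
                    M[r][c] -= f * M[rank][c]
        rank += 1
    return rank

X = [[1, 1, 0, 0],   # treatment 1, replicate 1
     [1, 1, 0, 0],   # treatment 1, replicate 2
     [1, 0, 1, 0],   # treatment 2
     [1, 0, 1, 0],
     [1, 0, 0, 1],   # treatment 3
     [1, 0, 0, 1]]
print(matrix_rank(X))  # 3, not 4: one column is redundant
```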

Example Assume that measurements of the biomass produced by three different types of bacteria are collected in three separate Petri dishes (replicates) in a glucose broth culture medium for each bacterium (Table 1.5).

The sources of variation and degrees of freedom (DFs) for this experiment are shown in Table 1.6.

The components for this one-way model, assuming that each of the response variable yij is normally distributed, are as follows:

Distribution: $y_{ij} \sim N(\mu_{ij}, \sigma^2)$
Linear predictor: $\eta_i = \alpha + \tau_i$
Link function: $\mu_i = \eta_i$ (identity)

where yij is the response observed at the jth repetition in the ith bacterium, η<sup>i</sup> is the linear predictor, α is the intercept (the grand mean), and τ<sup>i</sup> is the fixed effect due to the type of bacterium.


The SAS syntax for a one-way analysis of variance (ANOVA) is as follows:

```
proc glimmix data=biomass;
class bacteria;
model y = bacteria;
lsmeans bacteria/lines;
run;
```
Similar to "proc glm" or "proc mixed," the "class" statement defines the class variables (categorical or nominal) to be included in the model, in this case "bacteria." The "model" statement declares the response variable "y" and all the class or continuous variables that enter the model, the "lsmeans" statement asks GLIMMIX to estimate the treatment means, and the "lines" option requests a comparison of means. Part of the results is presented below.

By default, "proc glimmix" provides the fit statistics (information criteria), which are extremely useful for comparing or choosing the model that explains the largest possible proportion of the variation present in a dataset, i.e., the best-fitting model (part (a) of Table 1.7). The statistic "-2 res log likelihood" is most useful for comparing nested models, whereas the rest of the statistics are useful for comparing models that are not necessarily nested. The mean squared error (MSE) in GLIMMIX is given by the statistic "Pearson's chi-square/DF"; in this analysis, its value is 8.78 ($\hat{\sigma}^2$ = MSE = 8.78). In part (b), the analysis of variance indicates that at least one type of bacterium produces a different biomass (P < 0.0001). That is, the null hypothesis ($H_0: \tau_A = \tau_B = \tau_C$) is rejected at a significance level of 5%.
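The F test behind this ANOVA table can be reproduced from first principles. The sketch below computes the treatment and error sums of squares by hand; the biomass numbers are invented for illustration (they are not the Table 1.5 data, which are not reproduced here):

```python
# One-way ANOVA F statistic: F = MS_treatment / MS_error, with
# df = (k - 1, n - k) for k groups and n total observations.

def one_way_anova(groups):
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_trt = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_err = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_trt, df_err = k - 1, n - k
    return (ss_trt / df_trt) / (ss_err / df_err), df_trt, df_err

F, df1, df2 = one_way_anova([[12.1, 11.8, 12.5],   # bacterium A (hypothetical)
                             [14.0, 13.6, 14.3],   # bacterium B (hypothetical)
                             [18.2, 17.9, 18.6]])  # bacterium C (hypothetical)
print(F, df1, df2)  # large F on (2, 6) df -> reject H0: tau_A = tau_B = tau_C
```

The degrees of freedom (2, 6) match the layout of Table 1.6 for t = 3 treatments with r = 3 replicates.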

The estimated least squares (LS) means obtained with "lsmeans" are tabulated under the "Estimate" column with their standard errors in the "Standard error" column of Table 1.8. These estimated means were obtained (by default) with Fisher's LSD (least significant difference).


Table 1.8 Means and estimated standard errors of the one-way model

Table 1.9 Comparison of the means (LSD) in the one-way model


Finally, Table 1.9 presents the comparison of means obtained with "lines" and indicates that bacterium C has a better fermentative conversion of glucose to lactic acid than bacteria B and A. Within a column, equal letters indicate means that are statistically equal.

# 1.3.2 Two-Way Nested Analysis of Variance

Let us consider an experiment with two factors, A and B, in which each level of B is nested within a level of factor A, that is, each level of factor B appears within a level of factor A. Then, the model that describes this experiment is as follows:

$$y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \epsilon_{ijk}$$

for i = 1, 2, ⋯, a; j = 1, 2, ⋯, bi; and k = 1, 2, ⋯, nij. In this model, μ is the overall mean, α<sup>i</sup> represents the effect due to the ith level of factor A, and βj(i) represents the effect of the jth level of factor B nested within the ith level of factor A. Assuming that all factors are fixed, and that the errors εijk are normally distributed, that is εijk~N(0, σ<sup>2</sup> ), this model is the general linear model of the form y = Xβ + e. For example, suppose that you have a = 3 levels of factor A, b = 2 levels of factor B, and nij = 2, then the vectors and matrices have the following form:

$$\mathbf{y} = \begin{pmatrix} y_{111} \\ y_{112} \\ y_{121} \\ y_{122} \\ y_{211} \\ y_{212} \\ y_{221} \\ y_{222} \\ y_{311} \\ y_{312} \\ y_{321} \\ y_{322} \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta_{11} \\ \beta_{12} \\ \beta_{21} \\ \beta_{22} \\ \beta_{31} \\ \beta_{32} \end{pmatrix}, \quad \mathbf{e} = \begin{pmatrix} \epsilon_{111} \\ \epsilon_{112} \\ \vdots \\ \epsilon_{322} \end{pmatrix}$$

Example Suppose that a researcher was studying the assimilation of fluorescently labeled proteins in rat kidneys and wanted to know whether his two technicians, technician A and technician B, were performing the procedure consistently. Technician A randomly chose three rats, and technician B randomly chose three other rats, and each technician measured the protein assimilation in each rat. Since rats are expensive and measurements are cheap, both technicians measured protein assimilation at various random locations in the kidneys of each rat (Table 1.10).

When performing a nested ANOVA, we are often interested in testing the null hypothesis (Ho : τ<sup>A</sup> = τ<sup>B</sup> ). As in this example, we do not wish to test whether the subgroups (rats within technicians) are significantly different, since the goal is to prove that both technicians are performing their jobs adequately. The sources of variation and degrees of freedom are shown in Table 1.11.

The components of this two-way model, assuming that the response variable yij is normally distributed, are as follows:


Table 1.10 Levels of protein assimilation in the rat kidneys measured by both technicians



Distribution: $y_{ij} \sim N(\mu_{ij}, \sigma^2)$
Linear predictor: $\eta_{ij} = \alpha + \tau_i + \beta(\tau)_{j(i)}$
Link function: $\mu_{ij} = \eta_{ij}$ (identity)

where yij is the level of assimilation of the fluorescent protein obtained from rat j by technician i, α is the intercept, τ<sup>i</sup> is the fixed effect due to the technician, and β(τ)j(i) is the nested effect of rat j within technician i.

The SAS commands for the main effects of factor A and factor B nested within A are as follows:

```
proc glimmix data=rata nobound;
class technician rat rep;
model protein = technician rat(technician);
lsmeans technician rat(technician)/lines;
run;
```
Part of the results is shown in Table 1.12. The results indicate that there is minimal residual variability, since the value of the mean squared error (Pearson's chi-square/DF) is 0.04 (part (a)). This means that the variance between group means is smaller than would be expected. The analysis of variance in part (b) indicates that there is no difference between technicians in the measurement of fluorescent proteins in the rats (P = 0.3065). Since there is variation between rats in the




Table 1.13 Comparison of the means (LSD) in the nested model


average protein uptake, mean differences in protein uptake between rats within technicians are to be expected (P = 0.0067).

In Table 1.13 part (a), the values of the least squares means tabulated under the "Estimate" column are shown with their respective "Standard errors." It can be seen that rats under technician A have statistically the same mean protein uptake as do rats under technician B (part (b)).

Comparison of means for rat subgroups under both technicians showed similar means for rats under technician A but different means for rats under technician B (part (a) and (b), Table 1.14).


Table 1.14 Comparison of the means (LSD) of the subgroups nested within technicians

# 1.3.3 Two-Way Analysis of Variance with Interaction

This experiment is used when one wishes to test two factors, A and B, with a levels of factor A and b levels of factor B. In this experiment, both factors are crossed; that is, each level of factor A occurs in combination with each level of factor B. The model with interaction is given by:

$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$$

for i = 1, 2, ⋯, a; j = 1, 2, ⋯, b; k = 1, 2, ⋯, nij; and εijk ~ N(0, σ<sup>2</sup> ). If all the parameters of the model are fixed, then this model can be expressed as y = Xβ + e. For this model with a = 3, b = 2, and nij = 3, the matrix expression has the form:

$$\mathbf{y} = \begin{pmatrix} y_{111} \\ \vdots \\ y_{113} \\ y_{121} \\ \vdots \\ y_{123} \\ y_{211} \\ \vdots \\ y_{213} \\ y_{221} \\ \vdots \\ y_{223} \\ y_{311} \\ \vdots \\ y_{313} \\ y_{321} \\ \vdots \\ y_{323} \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} \mathbf{1}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\ \mathbf{1}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\ \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\ \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\ \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 \\ \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 \end{pmatrix},$$

$$\boldsymbol{\beta} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta_1 \\ \beta_2 \\ \gamma_{11} \\ \gamma_{12} \\ \gamma_{21} \\ \gamma_{22} \\ \gamma_{31} \\ \gamma_{32} \end{pmatrix}, \quad \mathbf{e} = \begin{pmatrix} \epsilon_{111} \\ \epsilon_{112} \\ \vdots \\ \epsilon_{323} \end{pmatrix}$$

where $\mathbf{1}_3$ and $\mathbf{0}_3$ denote 3 × 1 vectors of ones and zeros, respectively.

Example This experiment consisted of developing an in vitro efficacy test for selftanning formulations. Two brands, 1 = erythrulose, 2 = dihydroxyacetone (factor A), and three formulations, 1 = solution, 2 = gel, and 3 = cream (factor B), were tested with four replicates for each condition according to Jermann et al. (2001). Total color change was measured for each of the combination conditions. The dataset is shown in Table 1.15.



Table 1.15 Color change (Y) in each of the brands and formulations



For this two-way model, assuming that the response variable yijk has a normal distribution, the components are as follows:

Distribution: $y_{ijk} \sim N(\mu_{ijk}, \sigma^2)$
Linear predictor: $\eta_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$
Link function: $\mu_{ij} = \eta_{ij}$ (identity)

where y<sub>ijk</sub> is the color change observed at the kth repetition at the ith level of factor A and the jth level of factor B, μ is the intercept (the overall mean), α<sub>i</sub> is the fixed effect of the ith level of factor A (brand), β<sub>j</sub> is the fixed effect of the jth level of factor B (type of formulation), and γ<sub>ij</sub> is the fixed effect of the interaction between brand and formulation. Table 1.16 shows the sources of variation and degrees of freedom.

The following code in GLIMMIX in SAS allows us to estimate the main effects and the interaction:

```
proc glimmix;
class brand formulation;
model y = brand|formulation;
lsmeans brand|formulation/lines;
run;
```


Table 1.17 Results of the analysis of variance of the two-way model with interaction

Table 1.18 Means and standard errors of the tanning brand


Part of the results is shown below. Of all the fit statistics in part (a) of Table 1.17, the value we are interested in highlighting is "Pearson's chi-square/DF," which corresponds to the mean squared error (MSE), even though we could use the other statistics to compare different possible models for this dataset. The value of the MSE is 5.53. The type III fixed effects tests in part (b) of Table 1.17 indicate that brand (P < 0.0001), formulation (P < 0.0001), and the interaction between both factors (P = 0.0231) all have a significant effect on the change of self-tanning color.

The least squares means obtained with "lsmeans" are shown in Table 1.18 for the levels of the tanning brand factor, in Table 1.19 for the levels of the formulation factor, and in Table 1.20 for the interaction of both factors. The "lines" option allows us to compare the means using the LSD method.

The least squares means for the tanning brand factor are given in Table 1.18.

The least squares means for the type of tanning brand formulation are given in Table 1.19.


Table 1.19 Means and standard errors of the tanning brand formulation

Table 1.20 Comparison of the means of the interaction of both factors


The hypothesis test for the interaction should be examined first; only if the interaction effect is not significant should the main effects be tested, since, when the interaction is significant, tests of the main effects are not meaningful. The interaction analysis shows that brand 2 (dihydroxyacetone), in all three formulations, shows a greater color change than brand 1 (erythrulose).

Now, considering the previous model without interaction (γ11=γ12=⋯=γ32=0) where factor A has a levels and factor B has b levels, the model without interaction is given by:

$$y_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk}$$

for i = 1, 2, ⋯, a; j = 1, 2, ⋯, b; k = 1, 2, ⋯, nij; and εijk ~ N(0, σ<sup>2</sup> ). The model without interaction with a = 3, b = 2, and nij = 3 reduces to:

$$\mathbf{y} = \begin{pmatrix} y_{111} \\ y_{112} \\ y_{113} \\ y_{121} \\ y_{122} \\ y_{123} \\ y_{211} \\ y_{212} \\ y_{213} \\ y_{221} \\ y_{222} \\ y_{223} \\ y_{311} \\ y_{312} \\ y_{313} \\ y_{321} \\ y_{322} \\ y_{323} \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 & 1 \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta_1 \\ \beta_2 \end{pmatrix}, \quad \mathbf{e} = \begin{pmatrix} \epsilon_{111} \\ \epsilon_{112} \\ \vdots \\ \epsilon_{323} \end{pmatrix}$$

Note that the design matrix for the model without interaction is the same as that for the model with interaction, except that the last six columns are removed.
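This column-removal relationship can be verified by constructing both design matrices programmatically. A sketch in Python (the `rows` helper is ours; sizes follow the example, a = 3, b = 2, n_ij = 3):

```python
# Build the two-way ANOVA design matrices and check that dropping the six
# interaction columns of the full model leaves the additive model's matrix.

a, b, n_ij = 3, 2, 3

def rows(with_interaction):
    X = []
    for i in range(a):
        for j in range(b):
            for _ in range(n_ij):
                alpha = [1 if t == i else 0 for t in range(a)]   # factor A dummies
                beta = [1 if t == j else 0 for t in range(b)]    # factor B dummies
                row = [1] + alpha + beta
                if with_interaction:
                    row += [1 if (t, s) == (i, j) else 0         # gamma_ij dummies
                            for t in range(a) for s in range(b)]
                X.append(row)
    return X

X_full = rows(True)    # 18 x 12: mu, alpha_i, beta_j, gamma_ij columns
X_add = rows(False)    # 18 x 6:  mu, alpha_i, beta_j columns only
print(all(full[:6] == add for full, add in zip(X_full, X_add)))  # True
```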

Let us assume that the interaction effect is not significant. The following SAS code estimates the main effects of both factors. Running the program and analysis is left as practice for the readers.

```
proc glimmix;
class brand formulation;
model y = formulation brand;
lsmeans brand formulation/lines;
run;
```
# 1.4 Analysis of Covariance (ANCOVA)

Consider an experiment to compare t ≥ 2 treatments after adjusting for the effects of a covariate x. The model for an analysis of covariance is given by:

$$y_{ij} = \mu + \tau_i + \beta_i x_{ij} + \varepsilon_{ij}$$

for i = 1, 2, ⋯, t and j = 1, 2, ⋯, n_i. Here, ε_ij are independent, normally distributed random errors with a zero mean and a constant variance σ² > 0. In this model, μ is the overall mean, τ_i is the fixed effect of the ith treatment (ignoring the covariates x's), β_i denotes the slope of the line that relates the response variable y to x for the ith treatment, and x_ij are fixed covariates. Assuming t = 3 and n_1 = n_2 = n_3 = 3, we have:

$$
\mathbf{y}=\begin{pmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\\ y_{23}\\ y_{31}\\ y_{32}\\ y_{33} \end{pmatrix},\quad
\mathbf{X}=\begin{pmatrix}
1&1&0&0&x_{11}&0&0\\ 1&1&0&0&x_{12}&0&0\\ 1&1&0&0&x_{13}&0&0\\
1&0&1&0&0&x_{21}&0\\ 1&0&1&0&0&x_{22}&0\\ 1&0&1&0&0&x_{23}&0\\
1&0&0&1&0&0&x_{31}\\ 1&0&0&1&0&0&x_{32}\\ 1&0&0&1&0&0&x_{33}
\end{pmatrix},\quad
\boldsymbol{\beta}=\begin{pmatrix} \mu\\ \tau_1\\ \tau_2\\ \tau_3\\ \beta_1\\ \beta_2\\ \beta_3 \end{pmatrix},\quad
\boldsymbol{\varepsilon}=\begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \vdots\\ \varepsilon_{33} \end{pmatrix}
$$

The analysis of covariance (ANCOVA), as can be seen, obeys a general linear model of the form y = Xβ + ε.
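A minimal numerical sketch of this general linear model form, using invented covariate values and parameters (not data from the book), shows that the treatment-specific slopes are recoverable by least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
t, n = 3, 3                             # three treatments, three units each

# Hypothetical covariate values and true parameters, for illustration only
x = rng.uniform(1.0, 5.0, size=(t, n))
mu_i = np.array([10.0, 12.0, 9.0])      # treatment intercepts mu + tau_i
beta_i = np.array([0.5, -0.3, 1.1])     # treatment-specific slopes

rows, y = [], []
for i in range(t):
    for j in range(n):
        trt = np.zeros(t); trt[i] = 1.0         # treatment indicator
        slope = np.zeros(t); slope[i] = x[i, j]  # slope column for trt i
        rows.append(np.concatenate(([1.0], trt, slope)))
        y.append(mu_i[i] + beta_i[i] * x[i, j])  # noise-free for clarity
X = np.array(rows)            # 9 x 7, same layout as the matrix above
y = np.array(y)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef[4:], 3))  # slope estimates recover beta_i
```

The intercept and treatment columns are over-parameterized, but the slopes are estimable, so they are identical across all least-squares solutions.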

For example, consider a hypothetical study of flower production in two subspecies of plants. The number of flowers per plant may vary between the subspecies, but, within each subspecies, flower production may also vary with the size of each plant, and this relationship may be positive or negative. A positive relationship might arise if plants with more resources (sunlight, water, nutrients) could invest more energy in both growth and flower production. A negative relationship could arise if there was a trade-off between the energy invested in growth and the energy invested in flower production. In this study, subspecies is a categorical variable and plant size is a continuous variable (the covariate). Measuring plant size and flower production in the two subspecies allows the investigation of three different questions:

Is flower production influenced by subspecies?

Is flower production influenced by plant size?

Is the effect of plant size on flower production influenced by subspecies?

Example 1. The central question in plant reproductive ecology is how hermaphroditic plant species allocate resources to male and female structures. A study conducted to address this question counted the number of stamens (male structures that produce pollen) and ovules (female structures that when fertilized by a pollen grain will become seeds) in the flowers of "prairie larkspur" plants in two populations in southeastern Minnesota. The total number of flowers produced by each plant was also determined to assess whether plant size affected ovule production per flower. The dataset for this example can be found in the Appendix (Data: Larkspur plants).

An ANCOVA is appropriate for this study to test the following three null hypotheses from these data:


The components of the ANCOVA model, assuming that the response variable $y_{ij}$ is normally distributed, are as follows:

Distribution: $y_{ij} \sim N\left(\mu_{ij}, \sigma^2\right)$

Linear predictor: $\eta_{ij} = \mu + \tau_i + \text{planta}(\tau)_{j(i)} + \beta_i\left(X_{ij} - \bar{X}_{..}\right)$

Link function: $\mu_{ij} = \eta_{ij}$ (identity)

where $y_{ij}$ is the number of ovules observed in the jth plant of the ith population, μ is the overall mean, $\tau_i$ is the fixed effect due to population i, $\text{planta}(\tau)_{j(i)}$ is the random effect due to plant j in population i, $\beta_i$ is the slope for population i, $\bar{X}_{..}$ is the overall mean of the size of all plants, and $X_{ij}$ is the size of plant j in population i. The ANCOVA results (sources of variation and degrees of freedom) are shown in Table 1.21.

The basic syntax in GLIMMIX for analysis of covariance with different slopes is as follows:

```
proc glimmix;
class population plant;
model ovules = population xbar population*xbar/ddfm=satterthwaite;
random plant(population);
lsmeans population/lines;
run;
```


Table 1.21 Analysis of covariance


Table 1.22 Results of the analysis of covariance for the two populations of larkspur plants

In the above syntax, the "class" command lists all classes or categorical variables, except the covariate (continuous variable), which – in this case – is a variable centered by the average size of all plants, $\text{xbar} = X_{ij} - \bar{X}_{..}$. The options "ddfm" and "lines" invoke proc GLIMMIX to do a degrees-of-freedom correction using the Satterthwaite method and a comparison of the means using the LSD method. Part of the results is shown in Table 1.22.

The estimates of the variance components (part (a)) due to plant and within-treatment variability are $\sigma^2_{\text{plant(population)}} = 12.795$ and σ² = MSE = 0.9321, respectively. The analysis of variance in (b) shows a significant effect of population (P = 0.0084), of plant size (P = 0.0001), and of the population × plant size interaction (P = 0.0066) on the average number of ovules per flower. The estimated means and their respective standard errors of the average number of ovules for both populations are tabulated in the "Estimate" column in part (c), as well as the comparison of the means in part (d).

If in the previous model we assume that the slopes are equal ($\beta_1 = \beta_2$), then the ANCOVA reduces to:

$$y_{ij} = \mu + \tau_i + \beta\left(X_{ij} - \bar{X}_{..}\right) + \varepsilon_{ij}$$

The ANCOVA model, with t = 3, n<sup>1</sup> = n<sup>2</sup> = n<sup>3</sup> = 3 for this case (equality of slopes) reduces to:

$$
\mathbf{y}=\begin{pmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\\ y_{23}\\ y_{31}\\ y_{32}\\ y_{33} \end{pmatrix},\quad
\mathbf{X}=\begin{pmatrix}
1&1&0&0&x_{11}\\ 1&1&0&0&x_{12}\\ 1&1&0&0&x_{13}\\
1&0&1&0&x_{21}\\ 1&0&1&0&x_{22}\\ 1&0&1&0&x_{23}\\
1&0&0&1&x_{31}\\ 1&0&0&1&x_{32}\\ 1&0&0&1&x_{33}
\end{pmatrix},\quad
\boldsymbol{\beta}=\begin{pmatrix} \mu\\ \tau_1\\ \tau_2\\ \tau_3\\ \beta \end{pmatrix},\quad
\boldsymbol{\varepsilon}=\begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \vdots\\ \varepsilon_{33} \end{pmatrix}
$$

The basic syntax using GLIMMIX for an analysis of covariance with equal slopes is as follows:

```
proc glimmix;
class population plant;
model ovules = population xbar/ddfm=satterthwaite;
random plant(population);
lsmeans population/lines;
run;
```
So far, we have exemplified the general linear model of the form y = Xβ + ε. In the following, some characteristics of a linear mixed model (LMM) will be described.

# 1.5 Mixed Models

# 1.5.1 Introduction

Linear mixed models (LMMs) are appropriate for analyzing continuous response variables in which the residuals are normally distributed. These types of models are well suited for studies of grouped datasets such as (1) students in classrooms, animals in herds, people grouped by municipality or geographic region, or randomized block experimental designs such as batches of raw materials for an industrial process and (2) longitudinal or repeated measures studies, in which subjects are measured repeatedly over time or under different conditions. These designs occur in a wide variety of settings: biology, agriculture, industry, and socioeconomic sciences. LMMs provide researchers with powerful and flexible analytical tools for these types of data.

The name linear mixed models comes from the fact that these models are linear in the parameters and that the covariates, or independent variables, may involve a combination of fixed and random effects. "Fixed effects" can be associated with continuous covariates, such as weight in kilograms of an animal, maize yield in tons per hectare, and reference test score or socioeconomic status, which will carry a continuous range of values, or with factors, such as gender, variety, or group treatment, which are categorical. Fixed effects are unknown constant parameters associated with continuous covariates or levels of the categorical factors in an LMM. The estimation of these parameters in LMMs is generally of intrinsic interest because they indicate the relationship of the covariates with the continuous response variable.

When the levels of a factor are drawn from a large enough sample such that each particular level is not of interest (e.g., classrooms, regions, herds, or clinics that are randomly sampled from a population), the effects associated with the levels of those factors can be modeled as random effects in an LMM. "Random effects" are represented by random (unobserved) variables that we generally assume to have a particular distribution, with normal distribution being the most common.

Mixed models are extremely useful because they allow us to address several important aspects:

	- (a) Multiple measurements of the same subject/organism
	- (b) Experiments organized into spatial blocks
	- (c) Observational data in which multiple investigations were conducted in different locations
	- (d) Synthesis of data from similar experiments that were performed by different researchers

# 1.5.2 Mixed Models

The matrix notation for a mixed model is highly similar to that for a fixed effects (systematic) model. The main difference is that, instead of using only one design matrix to explain the entire model in its systematic part, the matrix notation for a mixed model uses at least two design matrices: a design matrix X to describe the fixed effects in the model and a design matrix Z to describe the random effects in the model. The fixed effects design matrix X is constructed in the same way as a general linear fixed effects model (y = Xβ + ε ). X has a dimension of n × ( p + 1), where n is the number of observations in the dataset and p + 1 is the number of parameters of fixed effects in the model to be estimated. The design matrix for the random effects Z is constructed in the same way as the construction of the design matrix for fixed effects, but now for the random effects. The Z matrix has a dimension of n × q, where q is the number of coefficients of random effects in the model.
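A minimal sketch of how these two design matrices might be constructed for a small layout (assumed here: two fixed treatments crossed with three random blocks, one observation per combination; not from the book's code):

```python
import numpy as np

# Design matrices for t = 2 fixed treatments and q = 3 random blocks,
# rows ordered block 1: trt 1, trt 2; block 2: ...; block 3: ...
t, q = 2, 3
n = t * q

# X: n x (p + 1), an intercept column plus one indicator per treatment
X = np.hstack([np.ones((n, 1)), np.kron(np.ones((q, 1)), np.eye(t))])
# Z: n x q, one indicator column per block
Z = np.kron(np.eye(q), np.ones((t, 1)))

print(X.shape, Z.shape)   # (6, 3) (6, 3)
```

Each row of Z contains a single 1 (every observation belongs to exactly one block), and each column of Z sums to the number of observations in that block.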

In matrix notation, a linear mixed model can be represented as

$$\mathbf{y} = \overbrace{\mathbf{X}\boldsymbol{\beta}}^{\text{Systematic}} + \overbrace{\mathbf{Z}\mathbf{b}}^{\text{Random}} + \overbrace{\boldsymbol{\varepsilon}}^{\text{Experimental error}} \tag{1.2}$$
 
$$\mathbf{b} \sim N(\mathbf{0}, \mathbf{G}) \text{ and } \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \mathbf{R})$$

where y is the n × 1 vector of observations, β is the (p + 1) × 1 vector of fixed effects, b is the q × 1 vector of random effects, ε is the n × 1 vector of random error terms, X is the n × (p + 1) design matrix relating the observations to the fixed effects β, and Z is the n × q design matrix relating the observations to the random effects b.

Assuming that both b and ε are uncorrelated random variables with a zero mean and variance–covariance matrices G and R, respectively, we have

$$E(\mathbf{b}) = \mathbf{0}, \quad E(\boldsymbol{\varepsilon}) = \mathbf{0}$$

$$\operatorname{Var}(\mathbf{b}) = \mathbf{G}, \quad \operatorname{Var}(\boldsymbol{\varepsilon}) = \mathbf{R}$$

$$\operatorname{Cov}(\mathbf{b}, \boldsymbol{\varepsilon}) = \mathbf{0}$$

It is not difficult to verify that Var(y) = Var(Xβ + Zb + ε) is

$$\operatorname{Var}(\mathbf{y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R} = \mathbf{V}$$

Matrix V is an important component when working with linear mixed models (LMMs) because it contains random sources of variation and also defines how such models differ from ordinary least squares estimation. If the model contains only random effects, such as a randomized complete block design (RCBD), then matrix G is the first point of attention. On the other hand, for repeated measures or for spatial analysis, matrix R is extremely important. Assuming that the random effects (blocks) have a normal distribution,

$$\mathbf{b} \sim N(\mathbf{0}, \mathbf{G}) \text{ and } \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \mathbf{R})$$

Then, the vector of observations y will have a normal distribution, that is, y~N(Xβ,V). The same model can be written in the probability distribution form in two different but equivalent ways. The first is the marginal model

$$\mathbf{y} \sim N(E[\mathbf{y}] = \mathbf{X}\boldsymbol{\beta}, \mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}^{\prime} + \mathbf{R})\tag{1.3}$$

In this marginal model, the mean is based only on fixed effects and the parameters describing the random effects appear (are contained) in the variance and covariance matrix V (Littell et al. 2006). In general, a structure is imposed in b in terms of Var(b) = G, and, therefore, marginally, the components of y depend on the structure in V = ZGZ′ + R.

The second model is the conditional model

$$\mathbf{y} \mid \mathbf{b} \sim N(\mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b}, \mathbf{R}) \tag{1.4}$$

In this conditional model, b is distributed as shown in Eq. (1.2) for this parameter. For LMMs, the two models are exactly the same; but if the response variable is modeled under a non-normal distribution, then the models are different (Stroup, 2012) and generalized linear mixed models are required.

The fixed effects estimator (β) is useful to obtain the best linear unbiased estimators (commonly known as BLUEs), whereas the estimator b is useful for computing the best linear unbiased predictors (commonly known as BLUPs) for the random effects b. The estimation of the expected value of the marginal LMM (1.3) allows the estimation of the BLUEs and that of the conditional LMM (1.4), the BLUPs. The estimators for the BLUEs of β and the BLUPs of b are as follows:

$$
\widehat{\boldsymbol{\beta}} = \left(\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{V}^{-1} \mathbf{y}
$$

$$
\widehat{\mathbf{b}} = \mathbf{G} \mathbf{Z}^T \mathbf{V}^{-1} \left(\mathbf{y} - \mathbf{X}\widehat{\boldsymbol{\beta}}\right)
$$
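A small numerical sketch of these two formulas on a toy randomized-block layout (two treatments, three blocks; all values are invented for illustration, and the variance components are assumed known):

```python
import numpy as np

# Toy data: 2 treatments x 3 blocks, rows ordered within blocks
y = np.array([10.0, 12.0, 11.0, 13.5, 9.5, 11.8])
X = np.hstack([np.ones((6, 1)), np.kron(np.ones((3, 1)), np.eye(2))])  # mu, tau1, tau2
Z = np.kron(np.eye(3), np.ones((2, 1)))                                # block indicators

sigma2, sigma2_b = 1.0, 2.0            # assumed known variance components
G = sigma2_b * np.eye(3)
R = sigma2 * np.eye(6)
V = Z @ G @ Z.T + R

Vinv = np.linalg.inv(V)
# BLUE: pinv gives a solution of the generalized normal equations
# (X is not of full column rank because of the intercept)
beta_hat = np.linalg.pinv(X.T @ Vinv @ X) @ X.T @ Vinv @ y
# BLUP: b_hat = G Z' V^-1 (y - X beta_hat)
b_hat = G @ Z.T @ Vinv @ (y - X @ beta_hat)

print(np.round(b_hat, 3))   # block predictions; they sum to ~0
```

Because the model contains an intercept, the predicted block effects sum to zero, the usual shrinkage behavior of BLUPs.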

This solution is feasible when working with small datasets, but in the context of big data it is computationally very demanding because the inverse of matrix V has to be computed. For this reason, the solution for the BLUEs of β and the BLUPs of b is normally obtained from Henderson's mixed model equations, which are presented later in this chapter.

# 1.5.3 Distribution of the Response Variable Conditional on Random Effects (y| b)


# 1.5.4 Types of Factors and Their Related Effects on LMMs

In an LMM, there are two types of factors, namely, fixed factors that make up the systematic part and random factors that are the stochastic part, and their related effects on the dependent variable (response). In the following sections, we provide a brief description of these factors and their implications in the context of an LMM.

### 1.5.4.1 Fixed Factors

A fixed factor is commonly used in standard analysis of variance (ANOVA) or analysis of covariance (ANCOVA) models. It is defined as a categorical or classification variable, for which the researcher has included all levels (or conditions) in the model that are of interest in the study. Fixed factors may include qualitative covariates, such as gender; classification variables implied by a sampling design, such as a region or a stratum, or by a study design, such as the method of treatment in a randomized clinical trial; and so on. The levels of a fixed factor are chosen to represent specific conditions so that they can be used to define contrasts (or sets of contrasts) of interest in the research study.

### 1.5.4.2 Random Factors

A random factor is a classification variable whose levels are randomly sampled from a population of levels under study. Not all possible levels of the random factor are present in the dataset, but the researcher's intention is to make inferences about the entire population of levels from the selected sample of factor levels. Random factors are considered in an analysis so that the variation in the dependent variable across the levels of the random factor can be evaluated and the results of the data analysis can be generalized to all levels of the random factor in the population.

### 1.5.4.3 Fixed Versus Random Factors

In contrast to fixed factor levels, random factor levels do not represent conditions specifically chosen to meet the objectives of the study. However, depending on the objectives of the study, the same factor may be considered as either a fixed factor or a random factor.

Fixed effects, commonly referred to as regression coefficients or fixed effect parameters, describe the relationships between the dependent variable and predictor variables (i.e., fixed factors or continuous covariates) for an entire population of units of analysis or for a relatively small number of subpopulations defined by the levels of a fixed factor. Fixed effects may describe the contrasts or differences between levels of a fixed factor (e.g., sex between males and females) in the mean responses for a continuous dependent variable or may describe the effect of a continuous covariate on the dependent variable. Fixed effects are assumed to be unknown fixed quantities in an LMM and are estimated based on analysis of the data collected in a study.

Random effects are random values associated with the levels of a random factor (or factors) in an LMM. These values, which are specific to a given level of a random factor, generally represent random deviations from the relationships described by fixed effects. For example, random effects associated with levels of a random factor may enter an LMM as random intercepts (random deviations for a given subject or group as an overall intercept) or as random coefficients (random deviations for a given subject or group from the total fixed effects) in the model. In contrast to fixed effects, random effects are represented as stochastic variables in an LMM.

# 1.5.5 Nested Versus Crossed Factors and Their Corresponding Effects

When a given level of one factor (random or fixed) can be measured only at a single level of another factor and not across multiple levels, then the levels of the first factor are said to be nested within the levels of the second factor. The effects of the nested factor on the response variable are known as nested effects. For example, suppose that you want to conduct a particular study at the primary level in a school zone, you would select schools and classrooms at random. Classroom levels (one of the random factors) are nested within school levels (another random factor), since each classroom can appear within a single school.

When a given level of one factor (random or fixed) can be measured across multiple levels of another factor, one factor is said to be crossed with the other, and the effects of these factors on the dependent variable are known as crossed effects.

# 1.5.6 Estimation Methods

Standard methods of estimation in mixed models with a normal response are maximum likelihood (ML) and restricted maximum likelihood (REML). The linear mixed effects model is as follows:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} + \boldsymbol{\varepsilon}$$

The variance–covariance matrix V for a one-way analysis of variance (ANOVA) with a randomized block effect and with six observations is equal to:

$$\mathbf{V} = \operatorname{Var}(\mathbf{y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \sigma^2\mathbf{I} = \begin{pmatrix}
\sigma^2+\sigma_b^2 & \sigma_b^2 & 0 & 0 & 0 & 0\\
\sigma_b^2 & \sigma^2+\sigma_b^2 & 0 & 0 & 0 & 0\\
0 & 0 & \sigma^2+\sigma_b^2 & \sigma_b^2 & 0 & 0\\
0 & 0 & \sigma_b^2 & \sigma^2+\sigma_b^2 & 0 & 0\\
0 & 0 & 0 & 0 & \sigma^2+\sigma_b^2 & \sigma_b^2\\
0 & 0 & 0 & 0 & \sigma_b^2 & \sigma^2+\sigma_b^2
\end{pmatrix}$$

The variance of $y_{11}$ is $V_{11} = \sigma^2 + \sigma_b^2$, and the covariance between $y_{11}$ and $y_{21}$ is $V_{12} = V_{21} = \sigma_b^2$; these two observations come from the same block. The covariance between $y_{11}$ and observations in other blocks is zero. All possible covariances can be read off matrix V.
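These entries of V can be verified numerically; the sketch below uses illustrative variance values, not estimates from any dataset:

```python
import numpy as np

# V for 3 blocks of 2 observations, built from Z, G = sigma_b^2 I,
# and R = sigma^2 I (illustrative values only)
sigma2, sigma2_b = 1.5, 0.75
Z = np.kron(np.eye(3), np.ones((2, 1)))   # block indicator matrix
G = sigma2_b * np.eye(3)
V = Z @ G @ Z.T + sigma2 * np.eye(6)

print(V[0, 0])   # sigma^2 + sigma_b^2 = 2.25
print(V[0, 1])   # sigma_b^2 = 0.75 (same block)
print(V[0, 2])   # 0.0 (different blocks)
```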

### 1.5.6.1 Maximum Likelihood

The likelihood function l is a function of the observations and the model parameters. It gives a measure of the probability of observing a particular vector y, given a set of model parameters β and b. The log likelihood functions for y | b and for b in a mixed model are given by:

$$l(\mathbf{y}\mid\mathbf{b}) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|\mathbf{R}| - \frac{1}{2}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}-\mathbf{Z}\mathbf{b})^T\mathbf{R}^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}-\mathbf{Z}\mathbf{b})$$

and

$$l(\boldsymbol{b}) = -\frac{N\_b}{2}\log(2\pi) - \frac{1}{2}\log|\boldsymbol{G}| - \frac{1}{2}\boldsymbol{b}^T\boldsymbol{G}^{-1}\boldsymbol{b}$$

where $N_b$ represents the total number of random effect levels. Therefore, the joint log likelihood of y and b, up to an additive constant, is equal to:

$$l(\mathbf{y}, \mathbf{b}) = -\frac{1}{2}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}-\mathbf{Z}\mathbf{b})^T\mathbf{R}^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}-\mathbf{Z}\mathbf{b}) - \frac{1}{2}\mathbf{b}^T\mathbf{G}^{-1}\mathbf{b}$$

Now, differentiating the above expression with respect to β and b yields:

$$\frac{\partial l(\mathbf{y},\mathbf{b})}{\partial\boldsymbol{\beta}^T} = \mathbf{X}^T\mathbf{R}^{-1}\mathbf{y} - \mathbf{X}^T\mathbf{R}^{-1}\mathbf{X}\boldsymbol{\beta} - \mathbf{X}^T\mathbf{R}^{-1}\mathbf{Z}\mathbf{b}$$

$$\frac{\partial l(\mathbf{y},\mathbf{b})}{\partial\mathbf{b}^T} = \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{y} - \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X}\boldsymbol{\beta} - \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{Z}\mathbf{b} - \mathbf{G}^{-1}\mathbf{b}$$

Setting them to zero and solving for β and b, we obtain the following linear mixed equations:


$$
\begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}^T\mathbf{R}^{-1}\mathbf{Z}\\ \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{Z}+\mathbf{G}^{-1} \end{pmatrix}\begin{pmatrix} \widehat{\boldsymbol{\beta}}\\ \widehat{\mathbf{b}} \end{pmatrix} = \begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1}\mathbf{y}\\ \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{y} \end{pmatrix}
$$

The solution can be written as:

$$
\begin{pmatrix} \widehat{\boldsymbol{\beta}}\\ \widehat{\mathbf{b}} \end{pmatrix} = \begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}^T\mathbf{R}^{-1}\mathbf{Z}\\ \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{Z}+\mathbf{G}^{-1} \end{pmatrix}^{-1}\begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1}\mathbf{y}\\ \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{y} \end{pmatrix}.
$$
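A minimal numerical sketch of these mixed model equations on a toy two-treatment, three-block layout; all values are invented, and the variance components are assumed known for illustration:

```python
import numpy as np

y = np.array([10.0, 12.0, 11.0, 13.5, 9.5, 11.8])
X = np.hstack([np.ones((6, 1)), np.kron(np.ones((3, 1)), np.eye(2))])
Z = np.kron(np.eye(3), np.ones((2, 1)))
G = 2.0 * np.eye(3)
R = 1.0 * np.eye(6)

Rinv, Ginv = np.linalg.inv(R), np.linalg.inv(G)
# Coefficient matrix and right-hand side of the mixed model equations
C = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
              [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
sol = np.linalg.pinv(C) @ rhs          # pinv: X is not of full column rank
beta_hat, b_hat = sol[:3], sol[3:]

# The BLUPs agree with the V-based formula b_hat = G Z' V^-1 (y - X beta_hat),
# without ever inverting V in the mixed model equations themselves
V = Z @ G @ Z.T + R
b_check = G @ Z.T @ np.linalg.inv(V) @ (y - X @ beta_hat)
print(np.allclose(b_hat, b_check))     # True
```

Note that the mixed model equations require only the inverses of R and G, which are typically diagonal or block diagonal, rather than the n × n matrix V.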

Here, β is the vector of fixed effects parameters and b is the vector of random effects parameters. The information of these parameters is related to the two covariance matrices G and R, and it no longer depends on V as in the previous solution. Moreover, this solution, which is known as Henderson's (1950) mixed model equations, is computationally much more efficient than the previous one given for the parameters (β and b) since it does not need to obtain the inverse of the matrix V = ZGZ′ + R. The solution to these mixed model linear equations is based on the assumption that we know the components of G and R, which, in practice, need to be estimated. Therefore, the following is a popular method for estimating the variance components of G and R, which is extremely versatile and powerful.

### 1.5.6.2 Restricted Maximum Likelihood Estimation

The restricted maximum likelihood method is also known as the residual maximum likelihood method and is extremely useful, among other things, for estimating variance components. This method is also based on the maximum likelihood method but, instead of maximizing the likelihood function of the original data, it maximizes the likelihood function of a set of error contrasts obtained by removing the fixed effects from the original response. That is, instead of maximizing over y, one maximizes over Ky, where K is a matrix of constants such that KX = 0, which implies that:

$$E(\mathbf{K}\mathbf{y}) = \mathbf{K}\mathbf{X}\boldsymbol{\beta} + \mathbf{K}\mathbf{Z}E(\mathbf{b}) + \mathbf{K}E(\boldsymbol{\varepsilon}) = \mathbf{0}$$

$$\operatorname{Var}(\mathbf{K}\mathbf{y}) = \mathbf{K}\mathbf{V}\mathbf{K}^T$$

This implies that Ky is distributed as $N(\mathbf{0}, \mathbf{K}\mathbf{V}\mathbf{K}^T)$, and the likelihood of Ky is called the restricted maximum likelihood (REML). There are many options for choosing K; typically, $\mathbf{K} = \mathbf{I} - \mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T$, the ordinary least squares residual operator, is used. Therefore, the log likelihood of Ky is equal to

$$l(\mathbf{V}\mid\mathbf{K}\mathbf{y}) = -\frac{n-p}{2}\log(2\pi) - \frac{1}{2}\log\left|\mathbf{K}\mathbf{V}\mathbf{K}^T\right| - \frac{1}{2}(\mathbf{K}\mathbf{y})^T\left(\mathbf{K}\mathbf{V}\mathbf{K}^T\right)^{-1}(\mathbf{K}\mathbf{y})$$

This log likelihood after some algebra, according to Stroup (2012), is equal to:

$$l(\mathbf{V}\mid\mathbf{K}\mathbf{y}) = -\frac{n-p}{2}\log(2\pi) - \frac{1}{2}\log|\mathbf{V}| - \frac{1}{2}\log\left|\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X}\right| - \frac{1}{2}\mathbf{r}^T\mathbf{V}^{-1}\mathbf{r}$$

where $p = \operatorname{rank}(\mathbf{X})$ and $\mathbf{r} = \mathbf{y} - \mathbf{X}\widehat{\boldsymbol{\beta}}_{ML}$, with $\widehat{\boldsymbol{\beta}}_{ML} = \left(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{V}^{-1}\mathbf{y}$.

The variance components of G and R are estimated with iterative methods such as the Newton–Raphson or Fisher's scoring method, which maximize the likelihood function l(V | Ky) with respect to the variance components. The maximization process begins with starting values for the variance components of G and R; with these values, a new, more refined version of the parameters β and b is estimated; these values are then used to update the estimates of the variance components of G and R, and the process continues until the established convergence criterion is met.
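As an illustration, the REML log likelihood can be evaluated and maximized numerically. The sketch below replaces Newton–Raphson or Fisher's scoring with a crude grid search, using invented data and a full-rank cell-means design:

```python
import numpy as np

# Toy data: 2 treatments x 3 random blocks (values invented for illustration)
y = np.array([10.0, 12.0, 11.0, 13.5, 9.5, 11.8])
X = np.kron(np.ones((3, 1)), np.eye(2))   # cell-means coding, full column rank
Z = np.kron(np.eye(3), np.ones((2, 1)))   # block indicators
n, p = X.shape

def reml_loglik(sigma2, sigma2_b):
    """Profiled REML log likelihood for V = sigma2_b ZZ' + sigma2 I."""
    V = sigma2_b * (Z @ Z.T) + sigma2 * np.eye(n)
    Vinv = np.linalg.inv(V)
    XtVX = X.T @ Vinv @ X
    beta = np.linalg.solve(XtVX, X.T @ Vinv @ y)   # GLS estimate of beta
    r = y - X @ beta
    return (-(n - p) / 2 * np.log(2 * np.pi)
            - 0.5 * np.linalg.slogdet(V)[1]
            - 0.5 * np.linalg.slogdet(XtVX)[1]
            - 0.5 * r @ Vinv @ r)

# Crude grid search standing in for an iterative maximization algorithm
grid = [(s2, s2b) for s2 in np.linspace(0.2, 5.0, 25)
                  for s2b in np.linspace(0.05, 5.0, 25)]
best = max(grid, key=lambda g: reml_loglik(*g))
print(best)   # grid point maximizing the REML criterion
```

In practice, software such as PROC GLIMMIX uses analytic derivatives of this criterion rather than a grid, but the objective function being maximized is the same.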

# 1.5.7 One-Way Random Effects Model

Suppose that we randomly select a of the possible levels from a sufficiently large set of levels of the factor of interest. In this case, we say that the factor is random. Random factors are usually categorical. Continuous covariates, which cannot be measured at randomly sampled levels, are generally modeled as "systematic" or "fixed" effects (e.g., linear, quadratic, or even exponential terms). Random effects are not systematic. Let us assume a simple one-way model:

$$\mathbf{y}\_{ij} = \mu + \boldsymbol{\tau}\_i + \boldsymbol{\varepsilon}\_{ij}; \qquad i = 1, 2, \cdots, a; \qquad j = 1, 2, \cdots, n\_i$$

However, in this case, the treatment effects and the error term are random variables, i.e., $\tau_i \sim N\left(0, \sigma_\tau^2\right)$ and $\varepsilon_{ij} \sim N\left(0, \sigma^2\right)$, respectively. The terms $\tau_i$ and $\varepsilon_{ij}$ are uncorrelated, and the variances $\sigma_\tau^2$ and $\sigma^2$ are commonly referred to as "variance components."

There can be some confusion about the differences between noise factors and random factors. Noise factors can be fixed or random.

Factors are random when we think of them as being/coming from a random sample of a larger population, and their effect is not systematic. It is not always clear when a factor is random. For example, suppose that the vice president of a chain of stores is interested in the effects of implementing a management policy in his stores. If the experiment includes all five existing stores, he might consider "the store" as a fixed factor because the levels of the factor "store" do not come from a random sample. However, if the chain has 100 stores and 5 stores are sampled for the experiment, as the company is considering rapid expansion and plans to implement the selected new policy at the new locations, then "store" could be considered a random factor.

In fixed effects models, the researcher's interest would focus on testing the equality of means of treatments (stores). This would not be appropriate, however, for the case in which 5 stores are randomly selected out of 100 because the treatments are randomly selected and we are interested in the population of treatments (stores), not in a particular store or group of stores. The appropriate hypothesis test for this random effect model would be

$$H\_0: \sigma\_\tau^2 = 0 \quad \text{vs} \ H\_a: \sigma\_\tau^2 > 0$$

Partitioning a standard analysis of variance from the total sum of squares still works; however, the form of the appropriate test statistic depends on the expected mean squares. In this case, the appropriate test statistic would be

$$F_c = \frac{\text{Mean Square}_{\text{Treatment}}}{\text{Mean Square}_{\text{Error}}},$$

$F_c$ follows an F-distribution (Fisher–Snedecor) with a − 1 degrees of freedom in the numerator and N − a in the denominator, where $N = \sum_{i=1}^{a} n_i$.

In a completely random model, we are interested in estimating the variance components $\sigma_\tau^2$ and $\sigma^2$. To do so, we use the analysis of variance method, which consists of equating the expected mean squares to their observed values as follows:

$$
\widehat{\sigma}^2 + n\widehat{\sigma}_\tau^2 = \text{Mean Square}_{\text{Treatments}}
$$

where $\widehat{\sigma}^2 = \text{Mean Square}_{\text{Error}}$, so that

$$
\widehat{\sigma}_\tau^2 = \frac{\text{Mean Square}_{\text{Treatments}} - \widehat{\sigma}^2}{n}
$$
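A worked sketch of the analysis of variance method on invented data with a = 3 treatments and n = 4 observations per treatment:

```python
import numpy as np

# Invented balanced one-way data: rows are treatments, columns are replicates
y = np.array([[5.1, 4.9, 5.3, 5.0],
              [6.2, 6.0, 6.4, 6.1],
              [4.0, 4.2, 3.9, 4.1]])
a, n = y.shape
N = a * n

grand = y.mean()
trt_means = y.mean(axis=1)
ss_trt = n * np.sum((trt_means - grand) ** 2)      # treatment sum of squares
ss_err = np.sum((y - trt_means[:, None]) ** 2)     # error sum of squares
ms_trt = ss_trt / (a - 1)
ms_err = ss_err / (N - a)

F_c = ms_trt / ms_err                    # F statistic with a-1, N-a df
sigma2_hat = ms_err                      # estimate of sigma^2
sigma2_tau_hat = (ms_trt - ms_err) / n   # estimate of sigma_tau^2

print(round(sigma2_hat, 4), round(sigma2_tau_hat, 4))
```

With these numbers, most of the variability lies between treatments, so the treatment variance component dominates the residual variance.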

# 1.5.8 Analysis of Variance Model of a Randomized Block Design

Consider a one-way analysis of variance model with a randomized block additive effect. Assume two treatments and three blocks,

$$y_{ij} = \mu + \tau_i + b_j + \varepsilon_{ij}$$

where $b_j \sim N\left(0, \sigma_b^2\right)$ and $\varepsilon_{ij} \sim N\left(0, \sigma^2\right)$, with i = 1, 2 and j = 1, 2, 3. The random effects $b_j$ and $\varepsilon_{ij}$ are independent and uncorrelated. In addition, treatment effects are assumed to be fixed. The matrix notation of this model is as follows:

where b ~ N(0, G) and ε ~ N(0, R). The variance–covariance matrix G for the random effects in this case is a 3 × 3 diagonal matrix with diagonal elements $\sigma_b^2$. Note how the matrix representation of this model corresponds exactly to the mixed model formulation. That is,

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} + \boldsymbol{\varepsilon}, \quad \text{where} \quad \mathbf{b} \sim N(\mathbf{0}, \mathbf{G}) \text{ and } \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \mathbf{R}).$$

Example An animal nutritionist is interested in comparing the effect of three diets on weight gain in piglets. To conduct the experiment, the nutritionist randomly selects 3 litters from a set of 20, each containing 3 healthy, similar-sized, recently weaned piglets. In each litter, three piglets are selected and each piglet is randomly assigned to a treatment.

A randomized complete block design (RCBD) is a variation of the completely randomized design (CRD). In this design, blocks of experimental units are chosen so that the units within a block are as homogeneous as possible, while units in different blocks differ. In a randomized complete block design, each block generally contains one experimental unit per treatment, but nothing prevents having more than one experimental unit per treatment in each block.

An RCBD has two sources of variation: the factor of interest that includes the treatments to be studied and the "block factor" that identifies the litters used in the experiment.

Assumptions in RCBD:


Table 1.23 lists the weight in kilograms of piglets from three different litters under three different diets. To make inferences about the pattern of weight gain for the entire population (all litters) of piglets, the litters must be considered in the model as a random effect. Thus, the linear mixed model describing the variability of piglet weight gain in this research, as a function of diets, is as follows:




$$\mathbf{y}\_{ij} = \mu + \boldsymbol{\tau}\_i + \boldsymbol{b}\_j + \boldsymbol{\varepsilon}\_{ij} \text{ for } i = 1, 2, 3; j = 1, 2, 3.$$

where $y\_{ij}$ is the weight observed in the $ij$th piglet, $\mu$ is the overall mean, $\tau\_i$ is the fixed effect due to the $i$th diet, $b\_j$ is the random effect due to the $j$th block (litter), assuming $b\_j \sim N(0, \sigma\_b^2)$, and $\varepsilon\_{ij}$ is the independent and identically distributed, approximately normal, error term with mean 0 and variance $\sigma^2$, i.e., $\varepsilon\_{ij} \sim N(0, \sigma^2)$.

Random effects, bj and εij, are assumed to be independent and uncorrelated. Table 1.24 shows an outline of the analysis of variance for this dataset.

The SAS program to analyze this dataset is as follows:

```
proc glimmix data=piglets;
class litter diet;
model gain=diet/ddfm=satterthwaite;
random litter;
lsmeans diet/lines;
contrast "Diet1 vs Diet2" diet 1 -1 0;
contrast "Diet2 vs Diet3" diet 0 1 -1;
run; quit;
```
Two options in this syntax deserve mention: (1) "ddfm=satterthwaite" applies the Satterthwaite correction to the degrees of freedom, which is especially important when the number of experimental units (EUs) differs among treatments, and (2) "lines" groups the "lsmeans" estimates with letters; means labeled with different letters are significantly different.

The output for this code is shown in Table 1.25. Subsection (a) of this table shows the estimated variance due to litter ($\widehat{\sigma}\_{\text{litter}}^2 = 5.3117$) and the mean squared error ($\widehat{\sigma}^2 = 3.2961$). The analysis of variance, part (b), shows that there is a highly significant effect of diet on piglet weight gain (P = 0.0091). In part (c), we also observe the estimated means and their standard errors (obtained with "lsmeans diet/lines"), and, in part (d), the grouping of means that are statistically different. In these last results, we can observe that the weight gains of piglets under treatments I and II


Table 1.25 Results of the analysis of variance of the three different diets tested on piglet weight gain


are not statistically different from each other, but they are statistically different with respect to treatment III.

Since the researcher wishes to make an inference about the entire population of litters, the factor "litter" must be entered as a random effect; otherwise, the ability of the F-test to detect differences between treatments is diminished because the P-value changes from 0.0091 to 0.0248. Another way to see the importance of including random effects in an ANOVA is to calculate the relative efficiency (RE) between the two models.

Table 1.26 shows the results of the analysis of variance under a completely randomized design (CRD), i.e., $y\_{ij} = \mu + \tau\_i + \varepsilon\_{ij}$:

In this case, if the experiment had been analyzed under a CRD, then the relative efficiency (RE) between an RCBD and a CRD would be:

$$\text{RE} = \frac{\text{MSE}\_{\text{CRD}}}{\text{MSE}\_{\text{RCBD}}} = \frac{\frac{\text{SSB}\_{\text{RCBD}} + b(t-1)\text{MSE}\_{\text{RCBD}}}{bt-1}}{\text{MSE}\_{\text{RCBD}}} = \frac{(b-1)\text{MSB}\_{\text{RCBD}} + b(t-1)\text{MSE}\_{\text{RCBD}}}{(bt-1)\text{MSE}\_{\text{RCBD}}}$$

where $\text{MSE}\_{\text{CRD}}$ is the mean squared error under a CRD, $\text{MSE}\_{\text{RCBD}}$ is the mean squared error under an RCBD, $\text{SSB}\_{\text{RCBD}}$ is the sum of squares due to blocks in an RCBD, $\text{MSB}\_{\text{RCBD}}$ is the mean square due to blocks, and $t$ and $b$ are the numbers of treatments and blocks, respectively. If the blocks are not useful, then the RE is equal to 1. The higher the RE, the more effective the blocking is in reducing the error variance. This value can be interpreted as the ratio $r/b$, where $r$ is the number of experimental units that would have to be assigned to each treatment if a CRD were used instead of an RCBD.

In Table 1.27, we can observe the mean squared error (MSE) of a CRD and RCBD (Pearson's chi-square / DF) obtained with the GLIMMIX procedure in SAS as well as a series of fit statistics.

The MSE for a CRD and an RCBD are 8.61 and 3.3, respectively. Substituting these values into the above equation, we obtain

$$\text{RE} = \frac{\text{MSE}\_{\text{CRD}}}{\text{MSE}\_{\text{RCBD}}} = \frac{8.61}{3.3} = 2.609.$$

This value indicates that an RCBD is 2.609 times more efficient than a CRD. In other words, roughly 8 (2.609 × 3 ≈ 8) experimental units per treatment would have been required in a CRD to obtain the same MSE as that obtained in the RCBD.
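As a quick check of this arithmetic, the RE and the implied number of units per treatment can be computed as a short sketch, using the MSE values reported in the text:

```python
# Relative efficiency (RE) of the RCBD versus a CRD, using the mean squared
# errors reported in the text: 8.61 for the CRD and 3.3 for the RCBD.
def relative_efficiency(mse_crd: float, mse_rcbd: float) -> float:
    return mse_crd / mse_rcbd

re = relative_efficiency(8.61, 3.3)       # about 2.609
blocks = 3
# RE ~ r/b: units per treatment a CRD would need to match the RCBD's MSE.
units_per_treatment_crd = re * blocks     # about 7.8, i.e., roughly 8 units
```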

# 1.6 Exercises

Exercise 1.6.1 The following dataset corresponds to the growth of pea plants, in eye units, in tissue culture with auxins (0.114 mm). The purpose of this experiment was to test the effects of adding various types of sugars to the culture medium on growth in length. Pea plants were randomly assigned to one of five treatments: control (no sugar), 2% glucose, 2% fructose, 1% glucose + 1% fructose, and 2% sucrose. A total of 10 observations were taken in each of the treatments, assuming that the measurements are approximately normally distributed with constant variance. Here, the individual plants to which the treatments were applied are the experimental units. The data from this experiment are shown below (Table 1.28):


Table 1.28 Growth of pea plants in the culture medium with auxins with different types of sugars




Exercise 1.6.2 A forage company wants to test three different types of fertilizers (F1, F2, and F3) for the production of two forage species (A and B) for cattle and compare them with the fertilizer it usually applies, which we will call the control. For this, the company decides to use 48 pots with 6 replications in the greenhouse to test the combinations of fertilizers and forage species. The data from this experiment are shown in Table 1.29:


Exercise 1.6.3 The data in this experiment are the number of plants regrown after grazing with sheep–goats. The initial size of the plant at the top of its rootstock is recorded, and the weight of seeds (g) that it produces at the end of the season is the response or dependent variable. The data for this experiment are as follows (Table 1.30):

	- Is seed weight influenced by the type of grazing?
	- Is seed weight influenced by the plant size?
	- Is the effect of grazing type on seed weight influenced by the initial plant size?

Exercise 1.6.4 An experiment was conducted to study the effect of supplementation of weaned lambs on health and growth rate when exposed to helminthiasis. A total of 16 Dorper (breed 1) and 16 Red Maasai (breed 2) lambs were treated with an anthelmintic at 3 months of age (after weaning) and randomly allocated into "blocks" of 4 per breed, classified on the basis of 3-month body weight, for supplemented and unsupplemented groups. Thus, two lambs in each block were randomly allocated to the supplemented (night-fed cottonseed meal and wheat bran) and unsupplemented groups. All lambs were kept on grazing for a further 3 months. Data recorded included the initial body weight (kilograms) at weaning and the weight at 3 months after weaning, the percentage of red blood cell volume (PRBC), and the fecal egg count (FEC) at 6 months of age. Data from this experiment are shown below (Table 1.31):


Did supplementation improve weight gain? Did supplementation affect PRBC and FEC, and were there differences in weight gain, PRBC, or FEC between breeds?


Table 1.30 Fruit production after grazing


Table 1.31 Supplementation trial in Dorper (breed 1) and Red Maasai (breed 2) lambs

IW initial weight, FW final weight, PRBC percentage of red blood cells, FEC fecal egg count, WG weight gain


# Appendix



Data: Larkspur plants from two populations in the state of Minnesota

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Chapter 2 Generalized Linear Models

# 2.1 Introduction

In the general linear model y = Xβ + e (which, despite its name, is not very general), the response variables are normally distributed, with constant variance across the values of all the predictor variables, and the mean is a linear function of the predictor variables. Transformations were traditionally used to force the data into a normal linear regression model, or to find a transformation of a non-normal response variable (discrete, categorical, positive continuous scale, etc.) that is linearly related to the predictor variables; however, this is no longer necessary. Instead of a normal distribution, for example, a positively skewed distribution taking positive real values can be selected. Generalized linear models (GLMs) go beyond linear models in that the response variable need not be normally distributed or measured on a continuous scale, the variance may change with the mean (heteroscedasticity), and a function of the mean of the response variable, rather than the mean itself, is linearly related to the predictor or explanatory variables.

Nelder and Wedderburn (1972) implemented a unified methodology for linear models, thus opening a window for researchers to design models that can explain the variation of the phenomenon under study. Later, McCullagh and Nelder (1989) proposed an extension of linear models, called generalized linear models (GLMs). They pointed out that the key elements of a classical linear model are as follows: (i) the observations are independent, (ii) the mean of the observation is a linear function of some covariates, and (iii) the variance of the observation is a constant. To further extend these, points (ii) and (iii) are modified as follows: (ii') the mean of the observation is associated with a linear function of some covariates via a link function and (iii') the variance of the observation is a function of the mean. For more details, see the study by McCullagh and Nelder (1989). GLMs can be adapted to a wide variety of response variables. Special cases of GLMs include not only regression and analysis of variance (ANOVA) but also logistic regression, probit models, Poisson regression, log-linear models, and many more.

# 2.2 Components of a GLM

The construction of a GLM begins with choosing the distribution of the response variable, the predictor or explanatory variables to include in the systematic component, and how to connect the mean of the response to the systematic component. The three important components are described in the following sections:

# 2.2.1 The Random Component

The first component to specify is the random component, which consists of choosing a probability distribution for the response variable. This can be any member of the exponential family of distributions, such as normal, binomial, Poisson, gamma, and so on.

# 2.2.2 The Systematic Component

The second component of a GLM is the systematic component or linear predictor, which consists of a linear combination of explanatory variables (the predictor). The systematic component of a model is the fixed structural part of the model that explains the systematic variability between means. The linear predictor is found on the right-hand side of the equation in the specification of a linear or nonlinear regression model. Let x1, x2, ⋯, xp be the numerical (dummy) or discrete (category) predictor (explanatory) variables, then the linear predictor is

$$\eta\_{i} = \beta\_{0} + \beta\_{1}x\_{1i} + \beta\_{2}x\_{2i} + \cdots + \beta\_{p}x\_{pi} = \mathbf{x}\_{i}^{T}\boldsymbol{\beta}$$

where $\boldsymbol{\beta}^T = (\beta\_0, \beta\_1, \beta\_2, \cdots, \beta\_p)$ is the vector of regression parameters and $\mathbf{x}\_i^T = (1, x\_{1i}, x\_{2i}, \cdots, x\_{pi})$ is the vector of predictor variables. Although $\eta$ is a linear function of the parameters, the $x$'s can enter in nonlinear form. For example, $\eta$ can be a quadratic, cubic, or higher-order polynomial. The expected value of $y\_i$ and the linear predictor $\eta\_i$ are related through the link function. For example, in a Poisson GLM, the linear predictor equals $\log(\lambda\_i) = \mathbf{x}\_i^T\boldsymbol{\beta}$, since the link is the natural logarithm, better known as the log link.
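As an illustration, the following sketch evaluates the linear predictor η = x^T β and, for a Poisson GLM with the log link, the implied mean; the coefficients and covariate values are arbitrary made-up numbers, not estimates from the book:

```python
import math

# Linear predictor eta = x^T beta and, for a Poisson GLM with the log link,
# the implied mean lambda = exp(eta). All numbers are arbitrary
# illustrative values.
beta = [0.5, 0.2, -0.1]      # (beta0, beta1, beta2)
x = [1.0, 2.0, 3.0]          # (1, x1, x2): the leading 1 multiplies the intercept
eta = sum(b * xi for b, xi in zip(beta, x))
lam = math.exp(eta)          # the inverse of the log link maps eta back to a mean
```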

In normal linear regression models, the focus is on η and finding the predictors or explanatory variables that best explain or predict the mean of the response variable. This is also important in a GLM. Problems such as multicollinearity in normal linear regression are also problems in generalized linear models.

# 2.2.3 Predictor's Link Function η

Finally, we will look at the specification of the link function, which maps the mean of the response variable to the linear predictor. The link function g(·) connects the mean of the response variable with the linear predictor, allowing a nonlinear relationship between the two. That is,

$$\mathcal{g}\left(\mu\right) = \eta$$

The link function must be monotonic (and differentiable). The mean is, in turn, equal to the inverse transformation of g(·), that is,

$$\mu = \text{g}^{-1}(\eta)$$

The most natural and meaningful way to interpret the model parameters is in terms of the scale of the data. In other words,

$$\mu = \mathbf{g}^{-1}(\boldsymbol{\eta}) = \mathbf{g}^{-1} \left( \boldsymbol{\beta}\_0 + \boldsymbol{\beta}\_1 \mathbf{x}\_{1i} + \boldsymbol{\beta}\_2 \mathbf{x}\_{2i} + \cdots + \boldsymbol{\beta}\_p \mathbf{x}\_{pi} \right)$$

It is important to note that the link relates the mean of the response to the linear predictor, which is different from transforming the response variable directly. If the response variables (i.e., the y's) are transformed, then a distribution must be selected that describes the population distribution of the transformed data, making the interpretation on the original scale more difficult. A transformation of the mean is generally not equal to the mean of the transformed values, that is, g(E[y]) ≠ E(g[y]). For example, suppose we have a distribution that takes the values 1, 2, 3, and 4 with probabilities 0.125, 0.375, 0.375, and 0.125, respectively.


The mean of this distribution is E[y] = 1 × 0.125 + 2 × 0.375 + 3 × 0.375 + 4 × 0.125 = 2.5. Therefore, the logarithm of the mean of this distribution is ln(E[y]) = ln (2.5) = 0.916, whereas the mean of the logarithm is equal to E(ln[y]) = 0.845. The value of the linear predictor η could potentially equal any value, but the expected values of the response variable – as in the case of counts or proportions – can be bounded. If there are no restrictions on the response variable (positive or negative real numbers), then the "identity link" function could be used, where the mean is identical to the linear predictor, that is,
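This inequality is easy to verify numerically. A short sketch using the four-point distribution above:

```python
import math

# Numerical check that g(E[y]) != E(g[y]) for the four-point distribution
# in the text, with g the natural logarithm.
values = [1, 2, 3, 4]
probs = [0.125, 0.375, 0.375, 0.125]

mean_y = sum(v * p for v, p in zip(values, probs))                  # 2.5
log_of_mean = math.log(mean_y)                                      # ~0.916
mean_of_log = sum(p * math.log(v) for v, p in zip(values, probs))   # ~0.845
```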

μ = η:

As mentioned before, the link function establishes a connection between the linear predictor η and the mean of the distribution μ. It is important to note that the link function in some cases is in a sense similar to a function transformation, in that it establishes only a mathematical connection between the parameters of the model. A function transformation is applied to the observations to better understand the relationship between the mean and the response variables or, in some cases, to stabilize the variance. Special cases are mentioned below:


(a) A binomial distribution uses the logit link function:

$$\eta = \log\left(\frac{\pi}{1-\pi}\right) \to \quad \pi = \frac{e^{\eta}}{1+e^{\eta}}$$

(b) Another useful alternative for these types of data is the probit link function:

$$
\eta = \Phi^{-1}(\pi) \to \pi = \Phi(\eta).
$$
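The probit link and its inverse can be evaluated with the standard normal distribution available in Python's standard library; a minimal sketch, where the probability 0.8 is an arbitrary illustrative value:

```python
from statistics import NormalDist

# Probit link eta = Phi^{-1}(pi) and its inverse pi = Phi(eta), evaluated
# with the standard normal distribution.
phi = NormalDist()            # standard normal: mean 0, standard deviation 1

pi = 0.8
eta = phi.inv_cdf(pi)         # probit(0.8), about 0.8416
pi_back = phi.cdf(eta)        # applying the inverse link recovers pi
```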


(c) A Poisson distribution has the log link function:

$$
\eta = \log(\lambda) \to \quad \lambda = e^{\eta}
$$

The variance function has the form Var(λ) = λ and, similar to the binomial distribution, the scale parameter is 1. Poisson models with a log link function are often referred to as log-linear models, commonly used for contingency (frequency) tables with two or more classification factors.


(d) A gamma distribution has a link function of the form:

$$
\eta = \frac{1}{\mu} \quad \rightarrow \quad \mu = \frac{1}{\eta}
$$

The variance function is given by Var(μ) = μ², and the scale parameter ϕ is usually unknown. In some cases, the log link function is used instead, which results in an exponential inverse link. It should be noted that the reciprocal link does not restrict the linear predictor to the admissible range of the mean (1/η is not guaranteed to be positive), so the theory only provides reasonable approximations for most applications. An exponential distribution is a special case of the gamma distribution.

Previously, before the advances in computational methods, the classical way of working with non-normal data consisted of directly transforming the response variable; that is, the data were transformed using a function t(y) before being analyzed. The goal of the transformation was to obtain a simple connection between the mean and the linear predictor, although obtaining a consistent scale of variation is vitally important when selecting a transformation. The usual way of selecting a suitable transformation is based on the assumption that, within the region of variation of the random variable, the transformation captures the variability adequately through a simple linear approximation around the mean. That is, if the random variable y has a distribution with mean μ and variance σ²(μ), we want to find a transformation t(y) that forces a constant variance (stabilizes the variance). The commonly used variance-stabilizing functions are the square root ($\sqrt{y}$) when the data have a Poisson distribution, the arcsine square root when the data are binomial, and the logarithmic transformation for data with a constant coefficient of variation.
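The variance-stabilizing property of the square-root transformation for Poisson data can be checked with the delta method, Var(t(y)) ≈ [t′(μ)]² σ²(μ), which for t(y) = √y and σ²(μ) = μ equals 1/4 regardless of the mean. A small sketch:

```python
import math

# Delta-method check that the square-root transform stabilizes the variance
# of Poisson data: Var(t(y)) ~ [t'(mu)]^2 * Var(y), and with t(y) = sqrt(y)
# and Var(y) = mu this is (1 / (2*sqrt(mu)))^2 * mu = 1/4 for any mean mu.
def approx_var_sqrt(mu: float) -> float:
    deriv = 1.0 / (2.0 * math.sqrt(mu))   # derivative of sqrt(y) at y = mu
    return deriv ** 2 * mu                # Poisson variance equals the mean

stabilized = [approx_var_sqrt(mu) for mu in (1.0, 5.0, 25.0)]  # all ~0.25
```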

Table 2.1 provides an overview of the most common link functions that will give admissible values for certain types of response variables and the corresponding inverse of the link function.


Table 2.1 Common link functions for different response variables

Note: Φ is the cumulative distribution function of a standard normal distribution; μ and π are the expected values of the response; η is the linear predictor; and ϕ is the scale parameter

# 2.3 Assumptions of a GLM

According to McCullagh and Nelder (1989) and Agresti (2013) in Chap. 4, a GLM is defined under the following assumptions:


# 2.4 Estimation and Inference of a GLM

Estimators of the regression coefficients for linear models with a normal response are obtained by least squares or maximum likelihood (ML), and significance tests generally compare residual sums of squares under different hypotheses using the F-test. It is worth mentioning that these tests are exact, so no approximations are required for their implementation.

GLMs offer a natural extension of this situation in the sense that: (1) the computations used to obtain the ML estimates of the regression parameters/coefficients are highly similar to those used when the response is normal, with the difference that the estimation process is iterative, producing successive approximations that converge to the ML estimates, and (2) in the inference procedures, the test statistic commonly used is the likelihood ratio test, which parallels the F-tests in linear models with a normal response. Thus, GLMs provide a uniform method of estimation and inference. Estimation of the parameter β closely follows the ML method, whereas the inference methods are generally approximate, since they are based on large-sample distribution theory, as in the case of the likelihood ratio method. There are several alternative tests, such as the Wald test, the score test, and the likelihood ratio test.

# 2.5 Specification of a GLM

In the following examples, we will describe the components of a GLM for some normal, gamma, binomial, and Poisson regression models.

# 2.5.1 Continuous Normal Response Variable

In simple linear regression models, the expected mean value of a continuous response variable depends on a set of explanatory variables, as follows:

$$
y\_i = \beta\_0 + \beta\_1 x\_i + \varepsilon\_i, \quad \varepsilon\_i \sim N\left(0, \sigma^2\right).
$$

Equivalently,

$$E(\mathbf{y}\_i|\mathbf{x}\_i) = \boldsymbol{\beta}\_0 + \boldsymbol{\beta}\_1 \mathbf{x}\_i$$

This GLM can be expressed in terms of its three components:

$$\text{Distribution: } \mathbf{y}\_i \sim N\left(\mu\_i, \sigma^2\right)$$

$$E(\mathbf{y}\_i) = \mu\_i$$

$$\text{Var}(\mathbf{y}\_i) = \sigma^2$$

$$\text{Linear predictor: } \eta\_i = \beta\_0 + \beta\_1 \mathbf{x}\_i$$

$$\text{Link function: } \eta\_i = \mu\_i \quad (\text{identity link})$$

where β<sup>0</sup> and β<sup>1</sup> are the intercept and slope, respectively. This means that we are expressing the linear model as a GLM.

Example 1 A simple linear regression analysis was performed on the diamond price (y) as a function of the number of carats (Table 2.2), assuming that the response variable y has a normal distribution with mean $\beta\_0 + \beta\_1 x\_i$ and variance $\sigma^2$.

The basic Statistical Analysis Software (SAS) syntax for simple linear regression is as follows:

```
proc reg;
model price=weight/clb p r;
output out=diag p=pred r=resid;
id weight;
run;
```
In the above program, "proc reg" invokes a linear regression procedure in SAS. The "clb" option generates a confidence interval for the slope and intercept. The "p"


Table 2.2 Diamond price (dollars) based on weight (carats)

Table 2.3 Regression analysis results


option generates fitted values and standard errors. The "r" option performs a residual analysis (i.e., checks assumptions). The "output out" statement generates a new dataset called "diag" containing the residuals and the predicted/adjusted values. The "id weight" statement adds the specified variable to the fitted values output.

Part of the results is shown in Table 2.3. The estimated parameters, obtained from "proc reg," are shown below:

Note that the estimated parameters are all statistically significantly different from zero. Then, the linear predictor takes the form:

$$\eta = -259.63 + 3721.02 \times \text{weight}\_i$$

If the response variable " y " does not fit the data well, then the normal distribution may barely represent the response distribution; that is, it would weakly explain the variability of the data and, consequently, the "identity" may not be the best link function, since the linear predictor would not include all the relevant information or some combination of the three components of the GLM. Although other fit measure statistics exist in the linear regression model, such as the coefficient of determination (R<sup>2</sup> ), the residual analysis is used to determine whether there is a good fit of the model or whether the assumptions of a Gaussian model are met. In this example, the value of R<sup>2</sup> is R<sup>2</sup> = 0.9783, and this value indicates that the model used explains 97.83% of the total variability of the dataset. In Fig. 2.1, we can see that the simple linear regression model provides a good fit to this dataset.

Fig. 2.1 A dot plot of price vs. weight (carat) and fitted model

# 2.5.2 Binary Logistic Regression

Logistic regression and other binomial response models are widely used in research areas like biological sciences and agriculture. Given their importance in this section, some relevant features of these models are mentioned.

Let $y\_i$ be the observed response for a set of p explanatory variables $x\_1, x\_2, \cdots, x\_p$, where $y\_i$ follows a binomial distribution with $n\_i$ independent Bernoulli trials and probability of success $\pi\_i$ on each trial, i.e.,

$$\mathbf{y}\_i \sim \text{Binomial}(n\_i, \pi\_i)$$

Then, we can model the response using a GLM with a binomial response. The linear predictor in this case will be equal to

$$\log\left(\frac{\pi\_i}{1-\pi\_i}\right) = \beta\_0 + \beta\_1\chi\_{1i} + \dots + \beta\_p\chi\_{pi}$$

commonly known as "logit" because logit is defined as:

$$\text{logit}(\pi\_i) = \log \left[ \frac{\pi\_i}{(1 - \pi\_i)} \right]$$

which models the logarithm of the odds, $\frac{\pi\_i}{1-\pi\_i}$, as a function of the predictor variables. The components of this GLM for binomial data are:

Distribution: $y\_i \sim \text{Binomial}(n\_i, \pi\_i)$, with mean and variance

$$E(\mathbf{y}\_i) = n\_i \pi\_i \text{ and } \operatorname{Var}(\mathbf{y}\_i) = n\_i \pi\_i (1 - \pi\_i)$$

$$\text{Linear predictor: } \eta\_i = \beta\_0 + \beta\_1 \mathbf{x}\_{1i} + \dots + \beta\_p \mathbf{x}\_{pi}$$

$$\text{Link function: } \eta\_i = \text{logit}(\pi\_i) = \log\left[\frac{\pi\_i}{1 - \pi\_i}\right] \quad (\text{logit link})$$

Another highly useful link function is the "probit" link $\eta\_i = \Phi^{-1}(\pi\_i)$, which was mentioned before.

The basic GLM for this dataset, under the probit link, is almost identical to the logit link as seen below:

$$\text{Distribution} \colon \mathbf{y}\_i \sim \text{Binomial}(n\_i, \pi\_i)$$

$$\text{Linear predictor} \colon \eta\_i = \beta\_0 + \beta\_1 x\_{1i} + \dots + \beta\_p x\_{pi}$$

$$\text{Link function} \colon \eta\_i = \text{probit}(\pi\_i) = \Phi^{-1}(\pi\_i).$$

Example 1 An engineer is interested in studying the effect of temperature (Temp) from 0 to 40 °C and time in days from 0 to 15 days on the germination of seeds of a certain crop. For this reason, he placed seeds in different pots containing moist soil. After a certain number of days, the number of germinated seeds was counted. If the seeds germinated, then y = 1; otherwise, y = 0. The probability of germination πij can be modeled through

$$
\eta\_{ij} = \beta\_0 + \beta\_1 \text{Day}\_i + \beta\_2 \text{Temp}\_j
$$

where ηij is the linear predictor and β0, β1, and β<sup>2</sup> are the parameters to be estimated. In this GLM, the link function is

$$\eta\_{ij} = \text{logit}(\pi\_{ij}) = \log \left[ \frac{\pi\_{ij}}{(1 - \pi\_{ij})} \right]$$

and the probability in the interval (0, 1) is computed through the inverse of the link function

$$\pi\_{ij} = \frac{1}{1 + \exp\left(-\eta\_{ij}\right)} = \text{g}^{-1}\left(\eta\_{ij}\right).$$

This last expression allows us to estimate the probability of germination (πij) under different temperature (°C) and time (days) conditions. Note that the nonlinear relationship between the outcome πij and the linear predictor ηij is modeled through the inverse of the link function. In this particular case, the link function is the logit.

$$\eta\_{ij} = \log \left[ \frac{\pi\_{ij}}{(1 - \pi\_{ij})} \right] = g \left( \pi\_{ij} \right).$$

For the illustration of this example, a set of data was simulated using the values β<sup>0</sup> = 8, β<sup>1</sup> = - 0.19, and β<sup>2</sup> = - 0.37 in the linear predictor and the inverse of the linear function by varying the temperature from 0 to 40 °C and time from 0 to 15 days, i.e.,

$$\hat{\pi\_{ij}} = \frac{1}{1 + e^{(8 - 0.19 \times \text{Temp}\_i - 0.37 \times \text{Day}\_j)}}$$
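Evaluating this expression as printed at, say, T = 30 °C and D = 15 days (the values used in Table 2.5) gives a germination probability of about 0.96; a minimal sketch:

```python
import math

# Germination probability from the simulation formula as printed in the
# text, evaluated at T = 30 deg C and D = 15 days (the values of Table 2.5).
def germination_prob(temp: float, day: float) -> float:
    exponent = 8.0 - 0.19 * temp - 0.37 * day
    return 1.0 / (1.0 + math.exp(exponent))

p = germination_prob(temp=30.0, day=15.0)   # about 0.96
```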

Part of the simulated data is shown below:


The following commands allow us to perform a binomial regression using the "logit," "probit," and linear regression with the "identity" link. It is important to mention that we denote temperature as t and days as d in the codes used below.

```
Logit Regression
proc glimmix data=germ;
model p = t d/solution dist=binomial link=logit cl;
output out=logitout pred(noblup ilink)=predicted resid=residual;
run;
```

```
Probit Regression
proc glimmix data=germ;
model p = t d/solution dist=binomial link=probit cl;
output out=probitout pred(noblup ilink)=predicted resid=residual;
run;
```

```
Linear Regression (Identity)
proc glimmix data=germ;
model p = t d/solution cl dist=normal;
output out=identityout pred(noblup ilink)=predicted resid=residual;
run;
```
"proc GLIMMIX" in SAS uses complex models without modifying the response variable as occurs when a direct transformation is applied to the response variable. Instead, GLIMMIX uses a link function of the response variable that is modeled as having a linear relationship with the explanatory variables. The "model" command specifies the response variable p as a function of the explanatory variables t and d, which define Xβ. The "solution" option in the model specification invokes the regression procedure to list the fixed effects parameter estimates of the model (β0, β1, and β2). The "dist" option is used to specify the distribution of the response variable, and the "link" option is used to specify the link function.

To obtain predicted probability values for each observation, the "output" option in proc GLIMMIX is used. Two types of predicted values are available: solutions including the random effects (best linear unbiased predictors (BLUPs)) in the linearized model, and predictions based on the fixed effects only (best linear unbiased estimators (BLUEs)), requested here with "pred(noblup ilink)=predicted". The "ilink" sub-option in "pred" requests the inverse link function of the predicted values, that is, the predictions on the probability scale, which are stored in the variable predicted. Finally, the "resid" option requests the residuals of the regression, which are stored in the variable residual.

Table 2.4 shows part of the output (analysis of variance (part (a)) and estimation and significance of fixed effects (part (b)) of the regression procedure using the logit link function.


Table 2.4 Estimation and significance of fixed effects using the logit link function



Table 2.5 Parameter estimates, linear predictor, and probability of linear, logit, and probit models

\*The linear predictor η and the probability π were estimated using D = 15 and T = 30

Fig. 2.2 (a, b) Probability of seed germination as a function of temperature and day

In Table 2.5, parameter estimates of the linear predictor for the linear (identity), logit, and probit models are presented. The probabilities estimated by the probit and logit models are almost identical to each other, but those of the linear probability model differ; this is because the data were generated from a binomial distribution, and the linear predictor estimated under the identity link differs substantially from those estimated under the probit and logit links.

In Fig. 2.2a, b, we observe that in an interval between 3 and 7 days and 0 and 15 °C, there is approximately 20% seed germination, but, while both factors increase, the germination percentage also increases substantially.

### 2.5.2.1 Model Diagnosis

For a linear model, a plot of the predicted values against the residuals is probably the simplest way to decide whether the model used provides a good fit to the data; but, for a GLM, we must decide on the appropriate scale to use for the fitted values.

Fig. 2.3 Predicted vs. residual values using the logit link

Generally, it is better to use linear predictors η in the plot rather than the predicted responses μ. If there is no linear relationship between the linear predictors and the residuals, then it could indicate a lack of fit in the model. For a linear model, we could perform a transformation of the response variable, but this is not highly recommended for a GLM as this could change the response distribution. Another alternative would be to change the link function, but since there are not many link functions that allow interpreting a model easily, this is not a good option. Moreover, changing the linear predictor or transforming the predictor variables would not be the best way to go.

Figures 2.3, 2.4, and 2.5 plot the linear predictor against the residuals (the predicted value against the residuals can be inspected in the same way). Examining the nature of the relationship between the predictors and the residuals, in Fig. 2.3 we see a linear relationship between the predictor and the residuals under the logit function, whereas the probit and identity functions do not show this behavior. With the probit link function, we instead observe a curvilinear relationship between the predictor and the residuals, which may be because homogeneity of variance is not satisfied under this link function. Therefore, the logit link is shown to be the best choice.

Example 2 Fruit flies can be a year-round problem in fruit-growing areas in many regions of the world, such as in Mexico, and are most common especially in late summer and fall because ripe or fermented fruits and vegetables attract insects by serving as a natural host. If these insects are not controlled, economic losses in fruit-growing areas could be large and devastating to the producers. In response to this, entomologists have implemented experiments to help mitigate the damages caused by these insects. One such experiment attempted to establish the relationship between the concentration of a toxic agent (nicotine) for 5 hours and the number

Fig. 2.4 Predicted values vs. residuals using the probit link

Fig. 2.5 Predicted vs. residual values using the identity link

of insects killed (common fruit fly); the data are shown in Table 2.6, and, for more information, see the study by Myers et al. (2002).

The number of dead insects can be modeled under a binomial distribution (n, π). Let yi denote the number of dead insects at a concentration i. The GLM components for this dataset are:


Table 2.6 Ratio of the concentration of a toxic agent to the number of fruit flies killed

$$\text{Distribution: } y_i \sim \text{Binomial}(n_i, \pi_i), \text{ with mean } E(y_i) = n_i\pi_i \text{ and variance } \operatorname{Var}(y_i) = n_i\pi_i(1 - \pi_i)$$

$$\text{Linear predictor: } \eta_i = \beta_0 + \beta_1\text{conc}_i$$

$$\text{Link function: } \eta_i = \operatorname{logit}(\pi_i) = \log\left[\frac{\pi_i}{1 - \pi_i}\right] \quad (\text{logit link})$$

Note that we are using conc_i to denote the independent variable, the nicotine toxicant concentration. The following SAS code allows us to perform a binomial regression for the fruit fly dataset:

```
data fly;
input conc n y;
datalines;
0.1 47 8
0.15 53 14
0.2 55 24
0.3 52 32
0.5 46 38
0.7 54 50
0.95 52 50
;
proc glimmix nobound data=fly;
model y/n = conc/dist=binomial link=logit solution;
run;
```
The above syntax produces the following output:

The analysis of variance (Table 2.7 a) shows that there is a highly significant effect of nicotine concentration on the number of flies killed (P = 0.0004). From the results in part (b), we can observe that the maximum likelihood estimates of the intercept and slope are $\hat{\beta}_0 = -1.7361$ and $\hat{\beta}_1 = 6.2954$, respectively, which are used to construct the linear predictor:

$$
\hat{\eta}\_i = -1.7361 + 6.2954 \times \text{conc}\_i
$$


Fig. 2.6 Proportion of dead insects as a function of nicotine concentration

Therefore, with the logistic regression model, we can estimate the probability that an insect dies when exposed to a certain concentration i of nicotine using the following expression:

$$\widehat{\pi}(\text{conc}\_i) = \frac{e^{\widehat{\eta\_i}}}{1 + e^{\widehat{\eta\_i}}} = \frac{e^{-1.7361 + 6.2954 \times \text{conc}\_i}}{1 + e^{-1.7361 + 6.2954 \times \text{conc}\_i}}$$
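As a quick numerical check (a sketch, not part of the original SAS analysis), this expression can be evaluated in Python at, say, a concentration of 0.5:

```python
from math import exp

# Fitted coefficients reported for the fruit fly logistic regression
b0, b1 = -1.7361, 6.2954

def prob_dead(conc):
    """Predicted probability that a fly dies at a given nicotine concentration."""
    eta = b0 + b1 * conc                 # linear predictor
    return exp(eta) / (1.0 + exp(eta))   # inverse logit link

print(round(prob_dead(0.5), 3))  # -> 0.804
```

The predicted kill probability rises monotonically with concentration, consistent with Fig. 2.6.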

A plot of the mean proportion of dead insects exposed to a certain concentration of nicotine and the regression curve (linear, quadratic, and cubic) is shown in Fig. 2.6. In this figure, we observe that as the nicotine concentration increases, the mean proportion of dead insects increases. The best linear predictor is of a quadratic order.

# 2.5.3 Poisson Regression

Often, the outcome of a variable is numerical in the form of counts. Sometimes it is a count of rare events such as, for example, (1) the number of plants infected by a certain disease in a population over a period of time, (2) the number of insects surviving after the application of an insecticide over time, (3) the number of dead fish found per cubic kilometer due to a certain pollutant, (4) the number of sick animals occurring in a given month in a given country, and so on. The Poisson probability distribution is perhaps the most widely used for modeling count-type response variables. As λ (the average count) increases, the Poisson distribution becomes increasingly symmetric and eventually approaches a normal distribution.

The Poisson likelihood function is appropriate for nonnegative integer data. The underlying process assumes that events occur randomly over time, so the following conditions must be met: events occur independently of one another, the average rate at which they occur is constant, and two events cannot occur at exactly the same instant.


The probability distribution of a Poisson random variable " y, " which represents the number of successes occurring in a given time interval or in a given region of space, is given by the expression

$$P(\mathbf{y} = k) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad \lambda > 0, \ k = 0, 1, 2, \cdots$$

where λ is the average number of successes (the average count) in a time or space interval. The mean and variance of this distribution are the same, that is,

$$E(\mathbf{y}) = \mathbf{Var}(\mathbf{y}) = \lambda$$
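The equality of mean and variance can be verified numerically; the sketch below (illustrative only, not from the text) sums the Poisson probabilities for λ = 4:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(y = k) = exp(-lam) * lam**k / k! for a Poisson random variable."""
    return exp(-lam) * lam**k / factorial(k)

lam = 4.0
ks = range(100)  # the tail beyond k = 99 is negligible for lam = 4
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum(k * k * poisson_pmf(k, lam) for k in ks) - mean**2
print(round(mean, 4), round(var, 4))  # -> 4.0 4.0
```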

Poisson regression belongs to a GLM and is appropriate for analyzing count data or contingency tables. A Poisson regression assumes that the response variable "y" has a Poisson distribution and that the logarithm of its expected value can be modeled by a linear combination of unknown parameters and independent variables. As in a standard linear regression, the predictors, weighted by the coefficients of x1, x2, ⋯, xp, are summed to form the linear predictor,

$$
\eta\_i = \beta\_0 + \sum\_{p=1}^{P} x\_{pi}\beta\_p
$$

where β_0 is the intercept and β_p is the slope for covariate x_p (p = 1, ⋯, P). Thus, the expected value of y_i and the linear predictor η_i are related through the link function. The components of a GLM with a Poisson response (y_i ~ Poisson(λ_i)), where λ_i is the expected value of y_i, are as follows:

Fig. 2.7 Students infected with the disease

$$\text{Distribution: } \mathbf{y}\_i \sim \text{Poisson}(\lambda\_i), \text{with } E(\mathbf{y}\_i) = \text{Var}(\mathbf{y}\_i) = \lambda\_i$$

$$\text{Linear predictor: } \eta\_i = \beta\_0 + \beta\_1 \mathbf{x}\_{1i} + \dots + \beta\_p \mathbf{x}\_{pi}$$

$$\text{Link function: } \eta\_i = \log(\lambda\_i) = \mathbf{g}(\lambda\_i) \qquad (\log \text{link})$$

Example 1 The following dataset corresponds to the number of students diagnosed (Fig. 2.7) with a certain infectious disease within a period of days of an initial outbreak. We will fit a generalized linear model for "count" data assuming a Poisson distribution.

Note that the response distribution is skewed to the right and that the responses are positive integers. Since the response variable is a count, the initial choice of a Poisson distribution, with its canonical link, the natural logarithm, is reasonable for this dataset. The number of "days elapsed" after the initial disease outbreak is the predictor variable in the systematic component. Thus, the GLM for this dataset (Appendix: Data: Infected students) is:

$$\text{Distribution: Infected students}_i \sim \text{Poisson}(\lambda_i)$$

$$\text{Linear predictor: } \eta_i = \beta_0 + \beta_1\text{Days}_i$$

$$\text{Link function: } \eta_i = \log(\lambda_i) \quad (\text{log link})$$

Part of the data is shown below:



For the purposes of implementation, we use days to denote elapsed days and students to denote infected students. We can employ the Poisson regression model using GLIMMIX in SAS, as shown below:

```
proc glimmix data=students method=laplace;
model students=days/solution dist=poisson link=log;
output out=sal_infection pred(noblup ilink)=predicted
resid=residual;
run;
```
The "proc GLIMMIX" statement invokes the SAS generalized linear mixed model (GLMM) procedure. The "model" statement specifies the response variable and the predictor variable, whereas the "solution" option in the model specification requests a listing of the fixed effects parameter estimates. The "dist=poisson" option specifies the distribution of the data, and the "link=log" option declares the link function to be used in the model. The default estimation technique in generalized linear mixed models is restricted (residual) pseudo-likelihood (the "RSPL" method); in this example, we use "method=laplace." The "output" statement creates a dataset containing predicted values and residuals, calculated after fitting the model. By default, all variables in the original dataset are included in the output dataset, and the "out=sal_infection" specification gives the name of the output dataset. The "pred(noblup ilink)=predicted" option calculates the predicted values without taking into account the random effects of the model, and "ilink" requests the predicted values on the scale of the data. Finally, the "resid=residual" option calculates the residuals.

The likelihood of a GLMM involves an integral, which, in general, cannot be calculated explicitly. "GLIMMIX," by default, uses the RSPL method, but it also offers other options, such as the quadrature and Laplace integration methods. These integral approximation methods approximate the likelihood function of a GLMM, and the optimization of the function is carried out numerically; they provide a true objective function for optimization. For more details, see the SAS manual. In a GLM, however, this integral approximation is not necessary, since an exact solution can be obtained to estimate the parameters, as there are no random effects. The results of this analysis are shown below (Table 2.8).

The fit statistics in part (a) ("Fit statistics") give us an idea of the goodness of fit of the model; these statistics are very useful when we are comparing candidate models in order to find the best model for the data. In this case, the value of the generalized chi-squared statistic divided by its degrees of freedom (Pearson's chi-square/DF), which estimates the experimental error of the analysis, is close to 1. This indicates that the variability of these data has been reasonably modeled and that there is no residual overdispersion.

The "Type III tests of fixed effects" (in part (b)) and the solution for the intercept and the days effect ("Parameter estimates") in part (c) are shown in Table 2.8. The negative coefficient of the covariate days indicates that as the number of days increases, the average number of students diagnosed with the disease decreases.

That is, we reject the null hypothesis (P = 0.0001) that the expected number of infected students remains constant as the number of days increases.

We see that with a 1-day increase in the infection period, the expected (or average) number of students diagnosed with the disease decreases by a factor of $e^{-0.01746} = 0.9827$.

The estimated linear predictor for this GLM is:

$$
\hat{\eta}\_i = 1.9902 - 0.01746 \times \text{Days}.
$$

For example, we can calculate the probability of diagnosing " k = 2" infected students in a period of 2 days; i.e., " Days = 2" as follows:

$$
\widehat{P}(Y\_i = k) = \frac{\exp\left(-\widehat{\lambda\_i}\right) \left(\widehat{\lambda\_i}\right)^k}{k!}
$$

Fig. 2.8 Infected students and a Poisson regression fit

$$\begin{split} \widehat{P}(Y\_i = 2) &= \frac{\exp\left(-\exp[1.9902 - 0.01746 \times 2]\right) \left(\exp[1.9902 - 0.01746 \times 2]\right)^2}{2!} \\ &= \frac{\exp\left(-\exp(1.9553)\right) \left(\exp(1.9553)\right)^2}{2!} \approx 0.0213 \end{split}$$

This value indicates that the probability of observing/diagnosing two students with the disease in a 2-day period is approximately 0.0213 (2.13%).
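This hand computation can be reproduced with a short Python sketch (illustrative only, using the fitted coefficients reported in Table 2.8):

```python
from math import exp, factorial

# Fitted Poisson regression coefficients (intercept and days effect)
b0, b1 = 1.9902, -0.01746

def prob_count(k, days):
    """P(Y = k) under the fitted Poisson model at a given number of days."""
    lam = exp(b0 + b1 * days)  # inverse log link gives the expected count
    return exp(-lam) * lam**k / factorial(k)

print(round(prob_count(2, 2), 4))  # -> 0.0213
```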

In Fig. 2.8, we observe that the Poisson model is a good candidate for modeling this dataset, since there is no overdispersion in this regression model.

Example 2 A forest engineer is interested in modeling the number of trees recently infected by a certain virus. The data that he has are age (years), height (meters) of the trees, and the number of infected trees. Using a linear model could result in negative values of the parameter λ, which would not make sense. The link function g(λ) for a Poisson error structure is the logarithm. Therefore, the GLM, defining yi = infected treesi, can be as follows:

$$\text{Distribution: } y_i \sim \text{Poisson}(\lambda_i)$$

$$\text{Linear predictor: } \eta_i = \beta_0 + \beta_1 \times \text{Age}_i + \beta_2 \times \text{Height}_i$$

$$\text{Link function: } \eta_i = \log(\lambda_i) = g(\lambda_i) \quad (\text{log link})$$

For this example, a dataset was simulated using the following parameter values: β_0 = -2, β_1 = -0.03, and β_2 = -0.04. In addition, in order to obtain the linear predictor, the variable age (years) varied from 0 to 50 and height (meters) from 0 to 30, both with increments of one unit. Thus, the values of y_ij were simulated with the following expression:

Fig. 2.9 (a, b) Probability of tree infection as a function of tree height and age in years

$$\mathbf{y}\_{ij} = \exp\left(-2 - 0.03 \times \text{Age}\_i - 0.04 \times \text{Height}\_j\right)$$

In Fig. 2.9a, b, we can see that at a young age, between 1 and 10 years, and at a height of no more than 10 meters, trees are more susceptible to being infected by the virus. However, as their age increases, trees show greater resistance.

The following SAS code fits a Poisson regression model with two predictor variables, assuming that there is no interaction between the two explanatory variables.

```
proc glimmix data=infection method=laplace;
model infection=age height /solution dist=poisson link=log;
output out=sal_infection pred(noblup ilink)=predicted
resid=residual;
run;
```
In Table 2.9 part (a), the analysis of variance shows that age and tree height are highly significant, indicating that both variables help explain the infection mechanism of the trees through a Poisson model (P < 0.0001).

The linear predictor for this GLM, with a Poisson-distributed response, is:

$$
\eta\_{ij} = -2 - 0.03 \times \text{Age}\_i - 0.04 \times \text{Height}\_j
$$

The estimated values of the parameters of each of the explanatory variables indicate that as age (years) and height (meters) increase by one unit, the tree becomes less susceptible to the virus. If we want to calculate the probability of diagnosing " k = 3" trees infected with the virus when they are 2 years old and 3 meters tall, we can use the following equation:


Table 2.9 Part of the results of the analysis of variance under a Poisson distribution

$$
\widehat{P}(Y\_i = k) = \frac{\exp\left(-\widehat{\lambda\_i}\right) \left(\widehat{\lambda\_i}\right)^k}{k!}
$$

$$\begin{split} \widehat{P}(Y\_i = 3) &= \frac{\exp\left(-\exp\left[-2 - 0.03 \times \text{Age} - 0.04 \times \text{Height}\right]\right) \left(\exp\left[-2 - 0.03 \times \text{Age} - 0.04 \times \text{Height}\right]\right)^3}{3!} \\ &= \frac{\exp\left(-\exp\left[-2 - 0.03 \times 2 - 0.04 \times 3\right]\right) \left(\exp\left[-2 - 0.03 \times 2 - 0.04 \times 3\right]\right)^3}{3!} = 0.000215 \end{split}$$

This value indicates that the probability of observing/diagnosing three trees infected with the virus causing the disease when they are 2 years old and 3 meters tall is 0.000215 (0.0215%).
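The same calculation can be sketched in Python (illustrative, using the simulation coefficients given above):

```python
from math import exp, factorial

def prob_infected(k, age, height):
    """P(Y = k) under the Poisson model with the simulated coefficients."""
    lam = exp(-2 - 0.03 * age - 0.04 * height)  # expected count of infected trees
    return exp(-lam) * lam**k / factorial(k)

print(round(prob_infected(3, 2, 3), 6))  # -> 0.000215
```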

A Poisson regression model, sometimes referred to as a log-linear model, is especially useful when it is used in contingency table modeling. Log-linear models are models of associations between variables in a contingency table; they treat variables symmetrically and do not distinguish one variable as a response. They have a formal structure of double or more entries that can be fitted by binomial or Poisson regression. These models for contingency tables have several specific applications in biological and social sciences.

Variables can be nominal or ordinal. A nominal variable has no natural order; for example, gender (male, female, transgender), eye color (blue, brown, green), and type of pet (cat, bird, fish, dog, mouse). An ordinal variable has a range of orders; for example, when you want to measure the degree of consumer satisfaction with the consumption of a product (very dissatisfied, somewhat dissatisfied, neither satisfied nor dissatisfied, somewhat satisfied, very satisfied).

# 2.5.4 Gamma Regression

A gamma distribution occurs naturally in processes for which waiting times between events are relevant. Lifetime data are sometimes modeled with a gamma distribution. This distribution can take a wide range of forms due to the relationship between the mean and variance through its two parameters (α and β), and it is suitable for dealing with heteroscedasticity of nonnegative data. The probability of observing a particular value y, given the parameters α and β, is

$$f(\mathbf{y}) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \mathbf{y}^{\alpha - 1} e^{-(\mathbf{y}/\beta)}; \mathbf{y}, \alpha, \beta > 0$$

where Γ(∙) is the gamma function. A gamma regression uses the input variables X's and coefficients to make a prediction about the mean of " y, " but it actually focuses more of its attention on the scale parameter β. The mean and variance of a Gamma random variable are:

$$E(Y) = \alpha\beta = \mu \quad \text{and} \quad \operatorname{Var}(Y) = \alpha\beta^2 = \mu^2/\alpha$$

The probability density function gamma can be rewritten in terms of the mean μ and the scale parameter α as follows:

$$f(\mathbf{y}) = \frac{1}{\Gamma(\alpha)\,\mathbf{y}} \left(\frac{\mathbf{y}\alpha}{\mu}\right)^{\alpha} \exp\left(-\frac{\mathbf{y}\alpha}{\mu}\right), \quad \mathbf{y} > 0$$
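As a numerical sanity check of this parameterization (an illustration, not part of the text's analysis), integrating y·f(y) should recover the mean μ; for example, with μ = 2 and α = 3:

```python
from math import exp, gamma

def gamma_pdf(y, mu, alpha):
    """Gamma density written in terms of the mean mu and shape alpha."""
    return (1.0 / (gamma(alpha) * y)) * (y * alpha / mu) ** alpha * exp(-y * alpha / mu)

mu, alpha = 2.0, 3.0
step = 0.001
mean = 0.0
for i in range(40000):          # midpoint-rule integration over (0, 40)
    y = (i + 0.5) * step
    mean += y * gamma_pdf(y, mu, alpha) * step
print(round(mean, 3))  # -> 2.0
```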

Plotting the gamma distribution (Fig. 2.10) for three values of the shape parameter α = (0.75, 1, and 2), the mean μ has a multiplicative (scale) effect. In the first panel (α = 0.75), the density is infinite at 0; the second panel (α = 1) corresponds to the exponential density; and in the third panel (α = 2), we see a skewed unimodal distribution.

Fig. 2.10 Gamma density: from left to right, α = 0.75, 1, and 2

A gamma distribution can arise in different forms. The sum of " n" independent and identically distributed exponential random variables with parameter β has a gamma distribution (n, β). The chi-squared distribution χ² with k degrees of freedom is a special case of the gamma distribution, with α = k/2 and β = 2.

Theoretically, a Gamma distribution should be the best choice when the response variable has a real value in the range of zero to infinity and it is appropriate when a fixed relationship between the mean and variance is suspected. If we expect the values " y" to be small, then we should expect a small amount of variability in the observed values. Conversely, if we expect large values of " y, " then we should expect (observe) a lot of variability.

Models with a gamma distribution with multiplicative covariate effects provide additional support for modeling nonnegative right-skewed continuous responses, such as the gamma variable with the log link function. Whether the data are modeled with an inverse or logarithmic link function will depend on whether the rate of change or the logarithm of the rate of change is a more meaningful measure. For example, in studies of yield density that commonly assume that yield per plant is inversely proportional to plant density (Shinozaki and Kira 1956), the mean is the reciprocal of the linear predictor:

$$
\mu\_i = \left(\beta\_0 + \beta\_1 x\_i\right)^{-1}
$$

Example 1 In the development of coagulation agents, it is common to perform in vitro clotting time studies. The following data were reported by McCullagh and Nelder (1989). Plasma samples from healthy men were diluted to nine different percentages of prothrombin-free plasma concentration; the greater the dilution, the more interference with the ability of the blood to clot because the natural clotting ability of the blood has been weakened. For each sample, clotting was induced by introducing thromboplastin, a clotting agent, and the time until clotting occurred (in seconds) was recorded. Five samples were measured at each of the nine concentration percentages, and the mean clotting times were averaged; therefore, the response is the mean clotting time across the five samples. In Fig. 2.11, the response variable is plotted against the percentage thromboplastin concentration in which we observe that the longer clotting times tend to be more variable than the smaller clotting times, so a linear regression model may not be appropriate.

In this analysis, we will model clotting times as the response variable (yi) with plasma concentration percentage as the predictor variable. Conc denotes the independent variable concentration. The GLM for this dataset is:

$$\text{Distribution: } y_i = \text{Clotting time}_i \sim \text{Gamma}(\alpha, \beta)$$

$$\text{Linear predictor: } \eta_i = \beta_0 + \beta_1 \times \text{conc}_i$$

$$\text{Link function: } \mu_i = \frac{1}{\beta_0 + \beta_1 \times \text{conc}_i} \quad (\text{inverse link})$$

Fig. 2.11 Clotting time (seconds), depending on the thromboplastin concentration

The following syntax allows us to fit a GLM with gamma errors in GLIMMIX:

```
data coagu;
input num conc y;
datalines;
1 5 118
2 10 58
3 15 42
4 20 35
5 30 27
6 40 25
7 60 21
8 80 19
9 100 18
;
proc glimmix data = coagu;
model y = conc / dist=gamma link=power(-1) solution;
output out=salgamm1 pred(noblup ilink)=predicted resid=residual;
run;
```
Most of the syntax has already been described in the previous examples; the only new one is the link = power(-1) option. This statement invokes the inverse link function in the GLIMMIX procedure.

Some of the output from this analysis is shown in Table 2.10.

The dilution percentage, part (a) in Table 2.10, of the blood plasma concentration significantly affects the clotting time (P = 0.0004). The values for constructing the fitted linear predictor are tabulated in part (b) of Table 2.10.

$$\hat{\eta}_i = 0.008686 + 0.000658 \times \text{conc}_i$$


Scale parameter: estimate 0.05213, standard error 0.02436.

With the parameterization of the gamma distribution chosen previously, the intercept and the beta coefficient corresponding to the concentration variable were calculated through GLIMMIX in SAS, as well as the scale parameter (α), which in the SAS output is labeled "Scale." With this information, it is possible to calculate the mean (E[Y] = μ) and variance (Var[Y] = μ²/α) for a concentration conc = 10 as follows:

$$\hat{\mu} = \frac{1}{0.008686 + 0.000658 \times \text{conc}} = \frac{1}{0.008686 + 0.000658 \times 10} = 65.505$$

$$\widehat{\mathrm{Var}}(\mathbf{y}) = \frac{\widehat{\mu}^2}{\widehat{\alpha}} = \frac{65.505^2}{0.05213} \approx 82{,}311.7$$

The average time it takes for blood to clot – when a thromboplastin concentration of 10% is added – is 65.505 seconds, with a variance of approximately 82,311.7.
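These two quantities can be reproduced with a short Python sketch (using the estimates $\hat{\beta}_0 = 0.008686$, $\hat{\beta}_1 = 0.000658$, and the scale $\hat{\alpha} = 0.05213$ from the GLIMMIX output):

```python
# Coefficients and scale parameter from the SAS gamma regression output
b0, b1, alpha = 0.008686, 0.000658, 0.05213

def mean_time(conc):
    """Predicted mean clotting time via the inverse link."""
    return 1.0 / (b0 + b1 * conc)

mu = mean_time(10)        # about 65.505 seconds at 10% concentration
var = mu ** 2 / alpha     # gamma variance, mu^2 / alpha
print(round(mu, 3), round(var, 1))
```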

### 2.5.4.1 Model Selection

Selecting a model, from a set of candidate models, that provides the best fit and largely explains the variability in the data is a necessary but complex task. This process involves trying to minimize information loss. From the field of information theory, several information criteria have been proposed to quantify information, or the expected value of information; among these, the most widely used are the Akaike information criterion (AIC) (Akaike 1973, 1974) and the Bayesian information criterion (BIC) (Schwarz 1978). Both AIC and BIC are based on the ML estimates of the model parameters. In a regression fit, the estimates of the β's under the ordinary least squares method and the ML method are identical. The difference between the two methods comes from estimating the common variance σ² of the normal distribution of the errors around the true mean.
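Both criteria are simple functions of the maximized log-likelihood; the Python sketch below shows the generic formulas (the numeric inputs are hypothetical, not taken from the SAS output):

```python
from math import log

def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion: -2 log L + k log(n)."""
    return -2.0 * loglik + k * log(n)

# Hypothetical values: log-likelihood -12.3, 3 parameters, 9 observations
print(round(aic(-12.3, 3), 2), round(bic(-12.3, 3, 9), 2))  # -> 30.6 31.19
```

Smaller values of either criterion indicate a better fit, with BIC penalizing extra parameters more heavily as the sample size grows.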

Table 2.11 Goodness-of-fit metrics for each of the three models and regression analysis results for model 3


To get an idea of how to use these adjustment statistics, let us compare three possible models that best explain the effect of the plasma dilution percentage:

$$\text{Model 1: } \eta\_i = \beta\_0 + \beta\_1 \times \text{conc}\_i$$

$$\text{Model 2: } \eta\_i = \beta\_0 + \beta\_1 \times \text{conc}\_i + \beta\_2 \times \text{conc}\_i^2$$

$$\text{Model 3: } \eta\_i = \beta\_0 + \beta\_1 \times \text{conc}\_i + \beta\_2 \times \text{conc}\_i^2 + \beta\_3 \times \text{conc}\_i^3$$

Since the proposed models have a gamma error structure, the commonly used fit statistic (R<sup>2</sup> ) in a simple linear regression model is not reported. Part of the results of this analysis is shown below with various metrics as goodness-of-fit measures:

With regard to the values of the goodness-of-fit metrics (Table 2.11 part (a)), the smaller they are, the better the fit. Based on the above, the accuracy of the fit of the three regression models increased as the polynomial in the linear predictor increased. That is, model three best explained the variability of the plasma clotting time. The

Fig. 2.12 Fitting the gamma regression model with three predictors

type III sum of squares for fixed effects and the estimated parameters under model three are tabulated in parts (b) and (c) in Table 2.11, respectively.

Parameter estimates under the linear predictor with linear, quadratic, and cubic effects are highly significant. The results suggest that a cubic effect for the percentage dilution in plasma concentration in the linear predictor is more efficient in explaining the clotting time than taking only a linear predictor with only linear or both linear and quadratic effects (Fig. 2.12).

# 2.5.5 Beta Regression

Studies in various areas of knowledge, including agriculture, often need to explain a variable expressed as a proportion, percentage, rate, or fraction in the continuous range (0, 1). In economics, for example, the factors that influence the proportion of households without a cement floor have been studied. Similarly, in plant breeding, it is of interest to investigate the factors that influence the proportion of plant leaves damaged by a certain disease, and the proportion of impurities in chemical compounds is of everyday interest in physics and chemistry. Studies on electoral preferences analyze citizen participation rates and the variables that can explain them, while in education, researchers try to explain the proportion of success in standardized tests. Beta regression is also used to identify the factors associated with the proportion of credit used by credit card holders. The public health field has likewise needed to model the proportion of coverage in health programs in order to identify the sociodemographic and economic characteristics associated with whether a woman is covered. Johnson et al. (1995) presented the properties of the probability distribution of this type of variable; they showed that the beta distribution can be used to model proportions, since its density can take different forms depending on the values of the two shape parameters that index the distribution. The beta regression that results from using the beta distribution for the response variable in the context of generalized linear models is not yet widely known, but its use is growing, thanks to software that makes its implementation straightforward.

Definition Let y be a continuous random variable defined in the interval [0, 1] and α, β > 0. Then, y has a beta distribution with shape parameters α and β if and only if:

$$f\_Y(\mathbf{y}) = \frac{1}{B(\alpha, \beta)} \mathbf{y}^{\alpha - 1} (1 - \mathbf{y})^{\beta - 1}, \quad \mathbf{0} < \mathbf{y} < 1$$

where $B(\alpha, \beta)$ is the beta function, defined as $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$, and Γ is the gamma function. The mean and variance of this probability density function are given by

$$E(Y) = \frac{\alpha}{\alpha + \beta} \quad \text{and} \quad \operatorname{Var}(Y) = \frac{\alpha\beta}{(\alpha + \beta + 1)(\alpha + \beta)^2}.$$

In the context of regression analysis, the density of the beta distribution provided above is not very useful for modeling the mean of the response. Therefore, this density is reparameterized so that it contains a precision (or dispersion) parameter. This reparameterization consists of defining $\mu = \frac{\alpha}{\alpha + \beta}$ and $\phi = \alpha + \beta$, i.e., $\alpha = \mu\phi$ and $\beta = (1 - \mu)\phi$, which means that:

$$E(\mathbf{y}) = \mu$$

and

$$\operatorname{Var}(\mathbf{y}) = \frac{\mu(1-\mu)}{1+\phi}.$$

So, μ is the mean of the response variable and ϕ can be interpreted as a parameter of precision in the sense that, for a fixed μ, the higher the value of ϕ, the smaller the variance of y. The density function of y can be written as:

$$f(\mathbf{y};\mu,\phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\Gamma((1-\mu)\phi)} \mathbf{y}^{\mu\phi-1} (1-\mathbf{y})^{(1-\mu)\phi-1}, \quad 0 < \mathbf{y} < 1$$

where 0 < μ < 1 and ϕ > 0.
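This reparameterization is easy to verify numerically; the sketch below (illustrative only) confirms that α = μϕ and β = (1 − μ)ϕ reproduce the stated mean and variance:

```python
def beta_moments(mu, phi):
    """Mean and variance of a beta variable under the (mu, phi) parameterization."""
    a = mu * phi           # alpha = mu * phi
    b = (1.0 - mu) * phi   # beta = (1 - mu) * phi
    mean = a / (a + b)
    var = a * b / ((a + b + 1.0) * (a + b) ** 2)
    return mean, var

mean, var = beta_moments(0.3, 10.0)
print(round(mean, 3), round(var, 6))  # -> 0.3 0.019091, i.e., 0.3*0.7/(1+10)
```

For fixed μ, increasing ϕ shrinks the variance, which is why ϕ is interpreted as a precision parameter.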

Let y1, y2, . . . , yn be independent and identically distributed random variables, where each yi with i = 1, 2, . . . , n is modeled under the parametrized beta model with a mean μ and an unknown parameter ϕ. The model is obtained by assuming that the mean of yi can be written as:


$$g(\mu\_i) = \sum\_{j=1}^{k} x\_{ij}\beta\_j = \eta\_i$$

where β_1, β_2, ..., β_k are unknown regression parameters and the x_ij are the values of the k covariates (k < n), which are fixed and known. Finally, g(∙) is a strictly monotone and differentiable link function that maps the interval (0, 1) onto the real line.

There are several possible options for the link function g(∙). For example, we can use the logit link g(μ) = log[μ/(1 - μ)], which is considered the most popular and asymptotically efficient choice, but it is also feasible to use the probit link g(μ) = Φ<sup>-1</sup>(μ), where Φ(∙) is the cumulative distribution function of a standard normal random variable, or the complementary log-log link g(μ) = log{-log(1 - μ)}, among others (McCullagh and Nelder 1989).
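These link functions can be written directly in code. The sketch below (our addition, using only the Python standard library) implements the logit, probit, and complementary log-log links and shows that the inverse logit recovers μ:

```python
import math
from statistics import NormalDist  # standard normal CDF, for the probit link

# Three common links g: (0, 1) -> R for a mean parameter mu, and the
# inverse of the logit.
def logit(mu):
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def probit(mu):
    return NormalDist().inv_cdf(mu)

def cloglog(mu):
    return math.log(-math.log(1.0 - mu))

mu = 0.8
eta = logit(mu)
print(round(eta, 4))             # 1.3863
print(round(inv_logit(eta), 4))  # 0.8  (the inverse link recovers mu)
```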

Example 1 The objective of this experiment was to evaluate the effect of the concentration of a chemical compound on the proportion of damage ( y) in the fruits (Table 2.12). This compound is known to inhibit the growth of an insect, but, at a certain concentration, it can cause damage to the fruits.

The proportion of damage to the fruits can be modeled under a beta distribution (μ, ϕ). Let yi be the proportion of damage to the fruits due to the ith concentration. The GLM components for this dataset are as follows:

$$\text{Distribution: } \mathbf{y}\_i \sim \text{Beta}(\mu\_i, \phi), \text{with } E(\mathbf{y}) = \mu \quad \text{and} \quad \text{Var}(\mathbf{y}) = \frac{\mu(1-\mu)}{1+\phi}$$

$$\text{Linear predictor: } \eta_i = \beta_0 + \beta_1 \times \text{conc}_i$$

$$\text{Link function: } \eta_i = \text{logit}(\mu_i) = \log\left[\frac{\mu_i}{1-\mu_i}\right] \text{ (logit link)}$$

Table 2.12 Proportion of fruit damage ( y) as a function of concentration. Percentage is equal to proportion ×100


Note that we are using conc to denote the independent variable concentration of the chemical compound. The following SAS code allows us to perform a beta regression for the dataset:

```
proc glimmix method=laplace;
model y = conc / dist=beta s;
run;
```

The "method=laplace" option asks SAS to use Laplace integration as the estimation method, and the "dist=beta" and "s" (solution) options ask GLIMMIX to fit a beta regression and to print the fixed effects parameter estimates, respectively.

To determine whether a linear, quadratic, or cubic predictor best explains the observed variability in a dataset, we use the fit statistics (-2 log likelihood, AIC, etc.). Part of the output is shown below in Table 2.13. According to the fit statistics in part (a), the predictor that best models this experiment is the quadratic predictor.

In Fig. 2.13, we can see that the best linear predictor for modeling this dataset is of cubic order, but due to the indeterminacy (not shown here) of the t-values (infinite) in the hypothesis tests of the estimated parameters, it was decided to use the quadratic predictor. Both the quadratic and cubic predictors model well the proportion (percentage = proportion × 100) of fruit damage caused by the concentration of the applied chemical.

Fig. 2.13 Fitting the beta regression model

# 2.6 Exercises

Exercise 2.6.1 The partial dataset corresponds to an evaluation of the effects of increasing application rates of picloram (0, 1.1, 2.2, and 4.5 kg/ha) for the control of larkspur plants (data in Table 2.14). The objective of this study was to evaluate the efficacy of the herbicide picloram in controlling larkspur plants.


Exercise 2.6.2 Effect of pH, Brix, temperature, and nisin concentration on the growth of Alicyclobacillus acidoterrestris CRA7152 in apple juice. The objective of this experiment was to model the presence/absence of CRA7152 growth in apple juice as a function of pH (3.5–5.5), Brix (11–19), temperature (25–50 °C), and nisin concentration (0–70). The data are shown below (Table 2.15):


Exercise 2.6.3 The objective of this experiment was to evaluate the level of toxicity of concentrations of pyrethrin and piperonyl butoxide on the mortality of beetles (Tribolium castaneum). Pyrethrin is a natural insecticide found in the plant Chrysanthemum cinerariaefolium and its flowers. The active ingredients are pyrethrins I and II, cinerins I and II, and jasmolins I and II. The dried flowers contain 0.9–1.3% pyrethrum. The crude extract contains 50–60% pyrethrum and is imported from


Table 2.14 Toxicity of picloram in controlling larkspur plants





Rep = replicate, Conc = concentration, Y = 1 dead, Y = 0 alive

various countries. The extract is diluted to 20%, which is the maximum concentration commercially available in the United States. Pyrethrin oxidizes on exposure to air but has been shown to be stable for long periods in water-based emulsions and oil concentrates. Synergistic compounds (such as piperonyl butoxide or N-octyl bicycloheptene dicarboximide), which enhance the effect of pyrethrin on insects, are present in commercially available pyrethrin formulations. The results of this study are shown below (Table 2.16).



Table 2.15 Growth of Alicyclobacillus acidoterrestris CRA7152


Table 2.17 Results of the experiment with carbon disulfide


Exercise 2.6.4 The objective of this experiment was to model the probability of mortality from the toxic effect of carbon disulfide (CS2) gas on beetles. The insects were exposed to various concentrations of this gas (in mg/L) for 5 hours (Bliss 1935), and then the number of dead beetles (Y) was counted. The data are shown below (Table 2.17).


Exercise 2.6.5 A study was conducted to assess the fowlpox virus in the chorioallantois by the pock-counting technique. Membrane pock counts were recorded for 50 embryos exposed to one of four dilutions of the virus (multiples of 10^(-3.86)). The FD column heading corresponds to the dilution factor, followed by the number of pocks observed (Table 2.18).


Table 2.18 Results of the fowl pox experiment

Table 2.19 Number of revertant Salmonella TA98 colonies



Exercise 2.6.6 Data were provided by Margolin et al. (1981) from an Ames Salmonella reverse mutagenicity assay. The table shows the number of revertant colonies observed on each of three plates (replicates) tested at each of six quinoline dose levels. The focus is on testing for mutagenic effects in the presence of the excess variation typically observed between counts (Table 2.19).


# Appendix



Students infected with a certain disease

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Chapter 3 Objectives of Inference for Stochastic Models

Throughout this book, we have been using the acronym GLMM to denote generalized linear mixed models. The common denominator among all these models is that they contain a linear model (LM) part, which refers to the fixed effects component of the linear predictor, Xβ. In a GLMM, the prefix "G" indicates that the distribution of the observations may not be normal, and the first "M" indicates that the linear predictor includes mixed effects and thus contains random effects, expressed by the term "Zb." The fixed linear component Xβ of the predictor is important because the fixed effects describe the treatment design, which, in turn, is determined by the objectives or the initial research questions that the study wishes to answer. Therefore, if the researcher proposes a reasonable model to analyze an experiment, then he/she must be able to express each objective as a question about a model parameter or about a linear combination of model parameters.

Example Assume a factorial 2 × 2 model, with two levels in both factors A and B, in which all possible combinations are tested. In this case, Xβ corresponds to a two-way model with interaction and a predictor given by

$$\eta_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}; \quad i, j = 1, 2$$

As in all the statistical models studied so far, the linear predictor is expressed in terms of the link function, and ηij estimates the mean μij (of a treatment combination) directly if the data follow a normal distribution and indirectly if the data are not normally distributed. For this example, the inference should focus on one or more of the following options (estimable functions): a treatment combination mean; a main effect mean, i.e., the mean of factor A averaged over the levels of factor B, or vice versa; the difference of the main effects; or the difference of a simple effect, i.e., the difference between two levels of factor A at a given level of B, or the difference between two levels of factor B at a given level of factor A; and so on. Each of these options can be expressed in terms of the parameters of the linear predictor, as shown in Table 3.1.
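As a small numerical illustration of these estimable functions (the parameter values below are invented for the example, not taken from the text), the cell means, a main-effect mean, and a simple-effect difference can all be computed directly from the linear predictor:

```python
# Estimable functions in a 2x2 factorial, eta_ij = mu + a_i + b_j + ab_ij.
# The parameter values below are illustrative only.
mu = 10.0
a = {1: 2.0, 2: -2.0}                # main effects of factor A
b = {1: 1.0, 2: -1.0}                # main effects of factor B
ab = {(1, 1): 0.5, (1, 2): -0.5, (2, 1): -0.5, (2, 2): 0.5}

def cell_mean(i, j):
    """Treatment-combination mean mu_ij (identity link)."""
    return mu + a[i] + b[j] + ab[(i, j)]

# Main-effect mean of A at level 1: average over the levels of B.
mean_A1 = (cell_mean(1, 1) + cell_mean(1, 2)) / 2
# Simple-effect difference: A1 vs. A2 at B = 1.
simple_A_at_B1 = cell_mean(1, 1) - cell_mean(2, 1)
print(mean_A1)         # 12.0
print(simple_A_at_B1)  # 5.0
```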


Table 3.1 Estimable functions in a factorial 2 × 2 treatment structure using the identity link function

Assuming that the data have a normal distribution, which is equivalent to using an identity link function, the estimator in terms of the linear predictor (column 2) estimates the expected values in column 3. If the data do not follow a normal distribution, then column 2 estimates the expected values of column 3 only indirectly, and link functions are required to estimate them. For link functions other than the identity, the estimates in column 2 require more careful handling. In an experimental design with a factorial treatment structure, the analysis should first focus on the interaction of the two factors. If the interaction is significant, then the simple effects are not equal and the main effects are confounded with the interaction; in this case, it is better to focus on the simple effects. If the interaction is not significant, then the main effects provide useful information.

# 3.1 Three Aspects to Consider for an Inference

When constructing a model, the researcher must decide whether the effects are fixed or random. This decision has important implications with respect to the estimation criteria and in the interpretation of the tests and estimates obtained. Given these implications, three important aspects, described in the following sections, must be taken into consideration in statistical modeling.

# 3.1.1 Data Scale in the Modeling Process Versus Original Data

This is a very particular issue for models with a link function other than the "identity," since the scale of the data used in the modeling process is not always the same as the scale of the original data when the assumption of normality in the response variable is no longer valid. When the data are normally distributed, the estimable function directly estimates the expected value. However, this is not true if the data follow a non-normal distribution. For example, in a logistic model for binomial data in a completely randomized design, the estimable function η + τi estimates a logit, or log odds. In this vein, η + τi must be expressed as a probability and not as a logit; i.e., the expected value for individuals receiving the ith treatment is a probability. This requires converting the estimate to a probability using the inverse link; that is:

$$\pi_i = \frac{1}{1 + e^{-(\eta + \tau_i)}}$$

Thus, for link functions other than the "identity," there are two ways of expressing the estimates: (1) in terms of the parameters directly estimated from the GLMM (model scale) or (2) in terms of the expected value of the response variable (data scale).

# 3.1.2 Inference Space

This problem arises only when the linear predictor contains random effects. In these models, the estimates are obtained through a linear combination (an estimable function) of fixed effects, even though the linear predictor contains random effects. K′β denotes the estimable function, where K is a matrix of order [(p + 1) × k] and β is the vector of fixed effects parameters of order [(p + 1) × 1]. The estimable function K′β supports broad inference, as it generalizes results to the entire population represented by the random effects.

The linear combination K′β + Z′b is a predictable function, with Z′ a matrix of random effect coefficients that are not all zero, and its inference is limited to only those levels defined in b. Suppose that you are conducting an experiment with three treatments at different locations (L); then the estimable function τ1 - τ2 provides information for the inference about the difference between treatments 1 and 2 over the whole population under study, whereas the predictable function [τ1 - τ2 + (Lτ)1j - (Lτ)2j] constrains the inference space for the difference between treatments 1 and 2 to location Lj. The type of inference produced by predictable functions is called "narrow inference" because the nonzero coefficients in matrix Z reduce the scope of inference to those levels identified in Z. Thus, the predictable function K′β + Z′b should be used for location-specific estimates, whereas the estimable function K′β should be used for estimates valid for the entire population under study.

# 3.1.3 Inference Based on Marginal and Conditional Models

As mentioned in the previous chapter, the specification of a generalized linear mixed model (GLMM) is done in terms of two probability distributions: (1) the distribution of the observations given the random effects, y | b, and (2) the distribution of the random effects, b. This feature is shared by Gaussian and non-Gaussian mixed models (MMs); for this reason, it is also valid for mixed models with response variables that do not follow a normal distribution.

From probability theory, the marginal distribution of the data (y) can be obtained by integrating the joint probability distribution of y and b over the random effects b. Of the two distributions, the marginal distribution of the data is the only one that is observable. Many non-Gaussian mixed models that seem reasonable do not distinguish between the distribution of y | b and that of y. Models that do not make this distinction are called marginal models. Estimates obtained from marginal models have different expected values from those produced by conditional models; therefore, marginal models do not estimate the same quantities as conditional models.

# 3.2 Illustrative Examples of the Data Scale and the Model Scale

In linear models, inference begins with the estimable function K′β. These models, in turn, are defined in terms of the linear predictor η = g(μ) = Xβ (if there are no random effects) or η = Xβ + Zb (if there are random effects in the model), so K′β produces results in terms of the link function.

For linear normal response models such as LMs and LMMs, the link function is not visible because they use the "identity" function as the link. Linear combinations of model parameters directly estimate desired values such as differences between treatments and many other hypothesis tests of interest. Inference for an LM is straightforward.

For GLMs and GLMMs with a non-normal response, the estimate of K′β yields a linear combination of elements of the linear predictor η, which is a linear combination of g(μ), typically a nonlinear function of μ. For example, with Poisson data, K′β is on the log scale, and for binomial data, it is on the logit or probit scale. However, most of the time, the researcher wants binomial results expressed in terms of the probability of the outcome of interest and Poisson results expressed in terms of counts. This means that, since both GLMs and GLMMs carry out the estimation process on the model scale (depending on the link used), reporting the results of interest on the data scale requires applying the inverse link to the predictor. To mention two examples, in the case of the logit link for binomial data, the results are expressed in terms of probability and, in the case of the


Table 3.2 Percentage of germinated seeds (Y) out of total seeds (N)

Poisson model, they are expressed in terms of counts. To exemplify the model scale and the data scale, an example is shown below.

Example 3.1 Consider the following experiment in which three chemical seed coat softeners were tested for studying their effect on germination of tomato seeds in Styrofoam trays (Table 3.2).

To illustrate the above two concepts, we first analyze these data using a completely randomized design (CRD), assuming the response variable to be normal, and, then, we analyze the same experimental design but with a binomial response variable. We are interested in comparing the means of treatments using a completely randomized design. Note that for demonstrative purposes, we are assuming that Y has a normal distribution, when in fact it has a binomial distribution.

The components of this model are defined as follows:

Distribution: yij ~ N(μi, σ²)
Linear predictor: ηi = η + τi; (i = 1, 2, 3)
Link function: ηi = μi (identity link)

The analysis of variance (ANOVA) (part (a)) and estimated parameters (part (b)) of this experimental design indicate that there is a highly significant difference between the treatments (P = 0.0033) for the germination of tomato seeds. Table 3.3 shows part of the results.

The estimated parameter values of the model (obtained with the "solution" option) are shown in the table above for all effects except treatment three, which is set to zero because the model is over-parameterized. The estimable functions K′β for the treatment means are as follows:

$$\mathbf{K}' = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}; \quad \boldsymbol{\beta} = \begin{bmatrix} \eta \\ \tau_1 \\ \tau_2 \\ \tau_3 \end{bmatrix}$$


Table 3.3 Results of the analysis of variance using a CRD

From the estimated treatment parameters, μ̂i = η̂ + τ̂i, we can obtain the estimated mean for each one of the treatments (i = 1, 2, 3) as follows: for treatment 1, μ̂1 = η̂ + τ̂1 = 41.6667 + 7.3333 = 49; for treatment 2, μ̂2 = η̂ + τ̂2 = 41.6667 - 18 = 23.6667; and for treatment 3, μ̂3 = η̂ + τ̂3 = 41.6667 + 0 = 41.6667. The value of the mean squared error (σ̂²), which appears in the table as "Scale," is 29.5556.

For the differences between treatments, the τi - τi′ values for i ≠ i′ are as follows: τ̂1 - τ̂2 = (η̂ + τ̂1) - (η̂ + τ̂2) = 7.3333 - (-18) = 25.3333, τ̂1 - τ̂3 = (η̂ + τ̂1) - (η̂ + τ̂3) = 7.3333 - 0.0 = 7.3333, and τ̂2 - τ̂3 = (η̂ + τ̂2) - (η̂ + τ̂3) = -18.0 - 0.0 = -18.0. These estimates can be obtained using the Statistical Analysis Software (SAS) "estimate" and "lsmeans" commands, as shown below:
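These hand calculations can be verified with a short sketch (our addition; the values are the intercept and treatment effect estimates quoted above, with treatment 3 as the reference level):

```python
# Reproduce the CRD (identity link) arithmetic from the solution table.
# eta_hat is the intercept estimate; tau maps each treatment to its
# estimated effect (treatment 3 is the reference level, so its effect is 0).
eta_hat = 41.6667
tau = {1: 7.3333, 2: -18.0, 3: 0.0}

means = {i: eta_hat + tau[i] for i in tau}   # mu_i = eta + tau_i
diff_12 = tau[1] - tau[2]                    # treatment 1 vs. treatment 2

print(round(means[1], 4))   # 49.0
print(round(means[2], 4))   # 23.6667
print(round(diff_12, 4))    # 25.3333
```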

```
proc glimmix data=germi;
class trt;
model y=trt / solution;
lsmeans trt / diff e;
estimate 'lsm trt 1' intercept 1 trt 1 0 0;
estimate 'lsm trt 2' intercept 1 trt 0 1 0;
estimate 'lsm trt 3' intercept 1 trt 0 0 1;
estimate 'overall mean' intercept 1 trt 0.33333 0.33333 0.33333;
estimate 'overall mean' intercept 3 trt 1 1 1 / divisor=3;
estimate 'trt diff 1&2' trt 1 -1 0;
estimate 'trt diff 2&3' trt 0 1 -1;
run;
```
The "estimate" statement requires us to specify what we wish to estimate: "intercept" refers to the intercept (η) and "trt" to the treatment effects (τi) under evaluation; the coefficients needed for the estimates are shown above. The "lsmeans" statement asks GLIMMIX in SAS to estimate the treatment means, "diff" requests the differences between pairs of treatments, and "e"


Table 3.4 Results obtained using the "estimate" and "lsmeans" commands

displays the coefficients of the estimable functions used in "lsmeans." Some of the outputs of the above code are shown in Table 3.4.

Next, we analyze the same data, also using a CRD, but now assuming a binomial distribution in the response variable, where Nij indicates the number of independent Bernoulli trials in the ijth observation. The components of the model are as follows:

Distribution: yij ~ Binomial(Nij, πi)
Linear predictor: ηi = η + τi; (i = 1, 2, 3)
Link function: ηi = logit(πi) = log[πi/(1 - πi)] (logit link)

Fitting these data in a binomial model, the fixed effects solution of the parameters obtained in terms of the model scale are tabulated in Table 3.5.

The above results were obtained using the following SAS code:

```
proc glimmix data=germi;
class trt;
model y/n=trt / solution;
run;
```
Similar to the previous example, we can estimate the treatment means and the differences between pairs of treatments. The linear predictors for the treatments are as follows: η̂1 = η̂ + τ̂1 = 0.5108 + 0.5093 = 1.0201, η̂2 = η̂ + τ̂2 = 0.5108 - 1.1079 = -0.5971, and η̂3 = η̂ + τ̂3 = 0.5108 + 0.0 = 0.5108. For the differences between treatments (1 and 2, 1 and 3, and 2 and 3), we have η̂1 - η̂2 = 1.0201 - (-0.5971) = 1.6172, η̂1 - η̂3 = 1.0201 - 0.5108 = 0.5093, and η̂2 - η̂3 = -0.5971 - 0.5108 = -1.1079, respectively.


Table 3.5 Estimated parameters at the model scale

Using the relationship between the linear predictor and the link function, ηi = logit(πi) = log[πi/(1 - πi)], we can estimate the probability of observing a favorable outcome for each of the treatments, that is, π1, π2, and π3. Applying the inverse link, we obtain:

$$\hat{\pi}_1 = 1/\left(1 + e^{-(\hat{\eta} + \hat{\tau}_1)}\right); \quad \hat{\pi}_2 = 1/\left(1 + e^{-(\hat{\eta} + \hat{\tau}_2)}\right); \quad \hat{\pi}_3 = 1/\left(1 + e^{-(\hat{\eta} + \hat{\tau}_3)}\right).$$

Substituting the corresponding values, we obtain

$$
\hat{\pi}\_1 = 1/\left(1 + e^{-\left(\hat{\eta} + \hat{\tau}\_1\right)}\right) = 1/\left(1 + e^{-1.0201}\right) = 0.735,
$$

$$
\hat{\pi}\_2 = 1/\left(1 + e^{-\left(\hat{\eta} + \hat{\tau}\_2\right)}\right) = 1/\left(1 + e^{-\left(-0.5971\right)}\right) = 0.355,\text{and}
$$

$$
\hat{\pi}\_3 = 1/\left(1 + e^{-\left(\hat{\eta} + \hat{\tau}\_3\right)}\right) = 1/\left(1 + e^{-\left(0.5108\right)}\right) = 0.625
$$

Here, we can see that the treatment with the highest probability of success is treatment one, followed by treatment three, whereas treatment two has the lowest probability of success. Now, for the difference between two treatments, τi - τi′ for i ≠ i′, we can estimate the logarithm of the odds ratio as
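The inverse-link calculations above are easy to reproduce (a check we add, using the treatment-level linear predictors η̂ + τ̂i quoted earlier):

```python
import math

# Inverse logit: convert model-scale estimates to data-scale probabilities.
def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

eta_hat = {1: 1.0201, 2: -0.5971, 3: 0.5108}   # eta + tau_i per treatment
pi_hat = {i: round(inv_logit(e), 3) for i, e in eta_hat.items()}
print(pi_hat)   # {1: 0.735, 2: 0.355, 3: 0.625}
```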

$$\eta_i - \eta_{i'} = \log\left(\frac{\pi_i}{1-\pi_i}\right) - \log\left(\frac{\pi_{i'}}{1-\pi_{i'}}\right) = \log\left(\frac{\pi_i}{1-\pi_i} \Big/ \frac{\pi_{i'}}{1-\pi_{i'}}\right),$$

where, in this particular case, oddsi = πi/(1 - πi) is the odds of treatment i, and oddsratio = oddsi/oddsi′ is the odds ratio for treatments i and i′, for i ≠ i′, so that τi - τi′ estimates the log of the odds ratio.

When applying the inverse link to the above expression (odds ratio), we get

$$\text{Oddsratio} = 1\Big/\left(1 + e^{-(\hat{\tau}_i - \hat{\tau}_{i'})}\right)$$

The value of the odds ratios for treatments 1 and 3 is

$$\text{Oddsratio}_{1-3} = 1\Big/\left(1 + e^{-(\hat{\tau}_1 - \hat{\tau}_3)}\right) = 1\Big/\left(1 + e^{-0.5093}\right) = 0.6246$$

Similarly, for the pairs of treatments 1-2 and 2-3, the resulting values are Oddsratio1-2 = 0.8344 and Oddsratio2-3 = 0.2483, respectively. It is important to mention that these odds ratios are not the mean of the difference πi - πi′ for i ≠ i′.

From the previous example, it is clear that when the response variable is not normal, parameter estimation and inference occur at two levels. The linear predictor Xβ and the estimable function K′β are expressed in terms of the link function (here, the logit), i.e., as estimates on the model scale (the scale of the link function). Under the logit link, the log odds and the difference of log odds (the log odds ratio) are very common and useful quantities in categorical data analysis for the estimation of treatments or treatment differences in terms of the data scale.

Commonly, estimates on the model scale in GLMs are not easy to interpret, so the data scale plays a very important role. Moving to the data scale involves applying the inverse of the link function to the estimable function K′β, as we did in the previous example to convert the log odds for each treatment into a probability. In general, we use the inverse of the link function to transform estimates on the model scale to the data scale. However, because link functions are generally nonlinear, the inverse link should not be applied directly to differences between treatments, as doing so produces meaningless results.

Thus, in the logit model, we have two approaches for the difference. First, we could apply the inverse of the link function to each linear predictor and then take the difference between probabilities, π̂i - π̂i′; that is, we estimate πi - πi′ as 1/(1 + e^-(η̂+τ̂i)) - 1/(1 + e^-(η̂+τ̂i′)) and not as 1/(1 + e^-(τ̂i-τ̂i′)). Second, we know that τ̂i - τ̂i′ estimates the logarithm of the odds ratio, so e^(τ̂i-τ̂i′) produces an estimate of the odds ratio. Both approaches are valid, and the choice between them depends on the requirements of the particular study.
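The two approaches can be contrasted numerically (our sketch, using the fixed effects estimates quoted in the text for treatments 1 and 3):

```python
import math

# Two ways to express a treatment difference on the data scale (logit link).
# The estimates below are the fixed effects solution quoted in the text.
def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

eta_hat, tau1, tau3 = 0.5108, 0.5093, 0.0

# (1) Difference of probabilities: apply the inverse link first, then subtract.
prob_diff = inv_logit(eta_hat + tau1) - inv_logit(eta_hat + tau3)

# (2) Odds ratio: exponentiate the model-scale difference.
odds_ratio_13 = math.exp(tau1 - tau3)

print(round(prob_diff, 3))      # 0.11
print(round(odds_ratio_13, 3))  # 1.664
```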

With the GLIMMIX procedure, we can implement the solution in terms of the data scale with the "ilink," "exp," and "oddsratio" commands, as shown in the following SAS code:

```
proc glimmix;
class trt;
model y/n=trt / solution oddsratio;
lsmeans trt / diff oddsratio ilink;
estimate 'lsm trt 1' intercept 1 trt 1 0 0 / ilink;
estimate 'lsm trt 2' intercept 1 trt 0 1 0 / ilink;
estimate 'lsm trt 3' intercept 1 trt 0 0 1 / ilink;
estimate 'overall mean' intercept 1 trt 0.33333 0.33333 0.33333 / ilink;
estimate 'overall mean' intercept 3 trt 1 1 1 / divisor=3 ilink;
estimate 'trt diff 1&2' trt 1 -1 0 / oddsratio ilink;
estimate 'trt diff 1&3' trt 1 0 -1 / oddsratio ilink;
estimate 'trt diff 2&3' trt 0 1 -1 / oddsratio ilink;
estimate 'trt diff 1&2' trt 1 -1 0 / exp;
estimate 'trt diff 1&3' trt 1 0 -1 / exp;
estimate 'trt diff 2&3' trt 0 1 -1 / exp;
run;
```
Part of the output of "proc GLIMMIX" is shown in Table 3.6. The "Odds ratio estimates" (part (a)) are the result of the "oddsratio" command in the previous program, whereas the confidence intervals are provided by default.

What appears under "Estimate" (in part (b)) is on the model scale, η̂i = η̂ + τ̂i, and what appears under "Mean" (in part (b)) is the estimate from the inverse of the link function, π̂i = 1/(1 + e^-(η̂+τ̂i)), which in this case is a probability and corresponds to the data scale. Similarly, for the treatment differences, what appears under "Estimate" is on the model scale, τ̂i - τ̂i′, whereas the "Odds ratio" values were estimated using e^(τ̂i-τ̂i′) and are on the data scale.

Under the "Estimates" column in Table 3.7, the exponentiated log odds ratio appears as an "Exponentiated estimate," regardless of whether we use the "oddsratio" or "exp" option in the "estimate" statement. For the overall mean, the inverse of the link function applied to η̂ + (1/3)(τ̂1 + τ̂2 + τ̂3) is 0.5772, which is different from the average of the π̂i values, (1/3)(π̂1 + π̂2 + π̂3) = (1/3)(0.735 + 0.355 + 0.625) = 0.5717. This illustrates that we have to be extremely careful when using the output of proc GLIMMIX: it can produce results on both the model scale and the data scale through the application of the inverse link function, but the inverse link has to be applied appropriately; otherwise, we will get meaningless results.
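This distinction can be demonstrated in a few lines of code (our illustration, using the treatment-level linear predictors from this example):

```python
import math

# The inverse link of the average predictor is not the average of the
# inverse-linked predictors, because the logit link is nonlinear.
def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

eta = [1.0201, -0.5971, 0.5108]                  # eta + tau_i from the text

g_of_mean = inv_logit(sum(eta) / 3)              # inverse link of mean predictor
mean_of_g = sum(inv_logit(e) for e in eta) / 3   # mean of the probabilities

print(round(g_of_mean, 4))  # 0.5772
print(round(mean_of_g, 4))  # 0.5717
```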

### Example 3.2: Randomized complete block design (RCBD) with normal and binomial responses

Now, assume that we have the same example but in an RCBD. The three treatments were tested in each of the blocks, as shown in Table 3.8.

In this example, first, the data are analyzed assuming a normal response and assuming that the block effect is fixed; then, they are analyzed assuming a binomial response.

The model components under a Gaussian response variable are as follows:

Distribution: yij ~ N(μij, σ²)
Linear predictor: ηij = η + τi + blockj; (i, j = 1, 2, 3)
Link function: ηij = μij (identity link)

From the theory of linear models, we know that we can estimate the ith treatment mean through

$$\bar{\eta}_{i\bullet} = \frac{1}{3}\sum_{j=1}^{3} \eta_{ij} = \eta + \tau_i + \frac{1}{3}\sum_{j=1}^{3} \text{block}_j = \eta + \tau_i + \overline{\text{block}}_{\bullet}$$


Table 3.6 Results of the "ilink," "exp," and "oddsratio" commands

Table 3.7 Estimates at the model scale and at the data scale



Table 3.8 Percentage of germinated seeds (Y) out of total seeds (N) in a randomized complete block design

where $\overline{\text{block}}_{\bullet} = \frac{1}{3}\sum_{j=1}^{3} \text{block}_j$.

For the mean difference of two treatments i and i', this is estimated as

$$\bar{\eta}_{i\bullet} - \bar{\eta}_{i'\bullet} = \eta + \tau_i + \overline{\text{block}}_{\bullet} - \left(\eta + \tau_{i'} + \overline{\text{block}}_{\bullet}\right) = \tau_i - \tau_{i'}$$

The goal of this experiment could be to compare the treatment means, that is, η̄1• = η̄2• = η̄3• (equivalently, τ1 = τ2 = τ3), or to compare one treatment with the average of the other treatments: for example, to compare treatment 1 with the average of treatments 2 and 3 (Trt1.vs.average.Trt2.and.Trt3).

For the hypothesis test of the equality of treatments (τ<sup>1</sup> = τ<sup>2</sup> = τ3), the estimable function K′ β is given by:

$$\mathbf{K}' = \begin{bmatrix} 0 & 1 & 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 & 0 & 0 \end{bmatrix}; \quad \boldsymbol{\beta} = \begin{bmatrix} \eta \\ \tau_1 \\ \tau_2 \\ \tau_3 \\ \text{block}_1 \\ \text{block}_2 \\ \text{block}_3 \end{bmatrix}$$

While for the contrasts Trt1.vs.average.Trt2.and.Trt3 and Trt2.vs.average.Trt1.and.Trt3, $\mathbf{K}'\boldsymbol{\beta}$ is given by

$$\text{Trt1.vs.average.Trt2.and.Trt3} = \bar{\eta}\_{1\bullet} - \frac{1}{2}\left(\bar{\eta}\_{2\bullet} + \bar{\eta}\_{3\bullet}\right) = \tau\_1 - \left(\frac{\tau\_2 + \tau\_3}{2}\right)$$

$$\text{Trt2.vs.average.Trt1.and.Trt3} = \bar{\eta}\_{2\bullet} - \frac{1}{2}\left(\bar{\eta}\_{1\bullet} + \bar{\eta}\_{3\bullet}\right) = \tau\_2 - \left(\frac{\tau\_1 + \tau\_3}{2}\right)$$

$$\mathbf{K}' = \begin{bmatrix} 0 & 2 & -1 & -1 & 0 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 & 0 & 0 \end{bmatrix}; \ \boldsymbol{\beta} = \begin{bmatrix} \eta \\ \tau\_1 \\ \tau\_2 \\ \tau\_3 \\ \text{block}\_1 \\ \text{block}\_2 \\ \text{block}\_3 \end{bmatrix}$$
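These estimable functions can be checked numerically. The following Python/NumPy sketch is not part of the original SAS analysis, and the parameter solution it uses is hypothetical (not taken from Table 3.9); it only shows how the two contrast matrices act on a solution vector:

```python
import numpy as np

# Hypothetical solution: beta_hat = (eta, tau1, tau2, tau3, block1, block2, block3)
beta_hat = np.array([20.0, 4.0, 1.0, 0.0, 2.0, -1.0, 0.0])

# K' for H0: tau1 = tau2 = tau3 (rows estimate tau1 - tau3 and tau2 - tau3)
K_equal = np.array([[0, 1, 0, -1, 0, 0, 0],
                    [0, 0, 1, -1, 0, 0, 0]])

# K' for the treatment-vs-average contrasts (coefficients doubled to avoid fractions)
K_avg = np.array([[0, 2, -1, -1, 0, 0, 0],   # trt1 vs average of trt2 and trt3
                  [0, -1, 2, -1, 0, 0, 0]])  # trt2 vs average of trt1 and trt3

print(K_equal @ beta_hat)    # tau1 - tau3 and tau2 - tau3
print(K_avg @ beta_hat / 2)  # contrasts on the original (undoubled) scale
```

Dividing `K_avg @ beta_hat` by 2 plays the same role as the `divisor=` option in the GLIMMIX `estimate` statement.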

The following GLIMMIX procedure allows us to implement the above example.

```
proc glimmix;
class trt block;
model y = trt block/solution;
lsmeans trt / diff e;
estimate 'lsm trt1' intercept 3 trt 3 0 0 block 1 1 1 / divisor=3;
estimate 'overall mean' intercept 3 trt 1 1 1 block 1 1 1 / divisor=3;
estimate 'average trt1&trt2' intercept 6 trt 3 3 0 block 2 2 2 /
divisor=6;
estimate 'average trt1&trt2&trt3' intercept 9 trt 3 3 3 block 3 3 3
/divisor=9;
estimate 'trt1 vs trt2' trt 1 -1 0;
estimate 'trt1 vs trt3' trt 1 0 -1;
estimate 'trt2 vs trt3' trt 0 1 -1;
estimate 'trt1 vs trt2' trt 1 -1 0, 'trt1 vs trt3' trt 1 0 -1, 'trt2 vs
trt3' trt 0 1 -1/divisor=1,1,1 adjust=sidak;
contrast 'trt1 vs trt2' trt 1 -1 0;
contrast 'trt1 vs trt3' trt 1 0 -1;
contrast 'trt2 vs trt3' trt 0 1 -1;
contrast 'trt1 vs average trt2,trt3' trt 2 -1 -1;
contrast 'trt2 vs average trt1,trt3' trt -1 2 -1;
contrast 'type 3 trt ss' trt 1 0 -1, trt 0 1 -1;
contrast 'type 3 trt test' trt 2 -1 -1, trt -1 2 -1;
run;
```
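For readers without SAS, the least squares means that "lsmeans" reports for a balanced randomized complete block design can be reproduced with ordinary least squares. The sketch below uses made-up data (not the data of this example) and the same convention as the SWEEP operator of setting the last effect of each class to zero:

```python
import numpy as np

# Hypothetical 3x3 RCBD: y[i, j] = response of treatment i in block j (made-up data)
y = np.array([[24., 28., 26.],
              [19., 23., 21.],
              [30., 33., 27.]])
ntrt, nblk = y.shape

# Dummy coding with the last level of each class dropped (its effect fixed at zero)
rows = []
for i in range(ntrt):
    for j in range(nblk):
        trt = [1. if k == i else 0. for k in range(ntrt - 1)]
        blk = [1. if k == j else 0. for k in range(nblk - 1)]
        rows.append([1.] + trt + blk)
X = np.array(rows)
beta_hat, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)

# Least squares mean of treatment i: intercept + tau_i + average block effect
blk_bar = np.append(beta_hat[ntrt:], 0.).mean()  # last block effect is 0
tau = np.append(beta_hat[1:ntrt], 0.)            # last treatment effect is 0
lsm = beta_hat[0] + tau + blk_bar
print(lsm)  # equals the raw treatment means in a complete, balanced RCBD
```

In a complete, balanced design the least squares means coincide with the raw treatment means, which is a convenient check of the estimable function $\eta + \tau\_i + \overline{\text{block}}\_{\bullet}$.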
Part of the GLIMMIX output is shown below in Table 3.9. Parameter estimates for treatments 1–2 and blocks 1–2 are shown, but not for treatment 3 and block 3. This is because the model is of less than full rank: the estimation uses a generalized inverse, computed with the SWEEP operator of SAS, which in this case sets the last effect of each class equal to zero (Table 3.9).

The "Coefficients" output (part (a) of Table 3.10), obtained with the E option for the least squares means of treatments in "lsmeans," shows how SAS combines this information with the parameter solution to calculate the treatment means (part (b)). In part (c), we see the differences of means obtained with the "diff" option in "lsmeans."

The estimates obtained from the "estimate" command with multiple estimable functions, with and without the Sidak adjustment, and the contrasts are shown in Table 3.11. This adjustment allows us to control the type I error rate. The "adjust=sidak" option in "estimate" (part (b)) provides the adjusted P-values, denoted Adj P, in addition to Pr > |t|.


Table 3.9 Estimation of treatment and block parameters


Table 3.10 Coefficients for treatment and block used in least squares


The planned contrasts specified in matrix $\mathbf{K}'$ and the F-values obtained with the "contrast" command produce the same results (part (c)).

Now, the same dataset is fitted using the same predictor but assuming that the response variable is binomial. This analysis is intended to show the options available in the SAS commands for fitting non-normal responses, in this case binomial. Practically the same commands used in the previous program with normal data are used, but now some additional options ("ilink," "oddsratio," and "exp") are exemplified, with details on the circumstances under which each should be used. This is because all estimable functions produce estimates at the model scale, and we must


Table 3.11 Multiple estimates and contrasts

decide what conversions are necessary to obtain the results at the data scale. Below, the estimable functions and the appropriate conversion required to produce the results on the data scale are listed.


The following GLIMMIX program shows how to implement this model with a binomial response.

```
proc glimmix;
class trt block;
model y/n = trt block/solution oddsratio;
lsmeans trt / diff e oddsratio;
estimate 'lsm trt1' intercept 3 trt 3 0 0 block 1 1 1/divisor=3 ilink;
estimate 'difference trt1 vs trt2' trt 1 -1 0/exp;
estimate 'avg trt1&trt2&trt3' intercept 9 trt 3 3 3 block 3 3 3
/divisor=9;
estimate 'trt1 vs trt2' trt 1 -1 0/exp;
estimate 'trt1 vs trt3' trt 1 0 -1/exp;
estimate 'trt2 vs trt3' trt 0 1 -1/exp;
estimate 'trt1 vs trt2' trt 1 -1 0, 'trt1 vs trt3' trt 1 0 -1, 'trt2 vs
trt3' trt 0 1 -1/exp adjust=sidak;
contrast 'trt1 vs trt2' trt 1 -1 0;
contrast 'trt1 vs trt3' trt 1 0 -1;
contrast 'trt2 vs trt3' trt 0 1 -1;
contrast 'trt1 vs average trt2,trt3' trt 2 -1 -1;
contrast 'trt2 vs average trt1,trt3' trt -1 2 -1;
contrast 'type 3 trt ss' trt 1 0 -1, trt 0 1 -1;
contrast 'type 3 trt test' trt 2 -1 -1, trt -1 2 -1;
run;
```
Part of the output is shown in Table 3.12. The estimated treatment and block parameters of the model are given in part (a) of Table 3.12; the last effect of each class was restricted to zero because the design matrices are of less than full rank. Part (b) provides the type III tests of fixed effects, and part (c) the odds ratio estimates. Note that $\hat{\sigma}^2$ does not appear in the output because the variance of the binomial distribution is not an independent parameter.

In Table 3.12 (parts (b) and (c)), which shows the type III tests of fixed effects as well as the odds ratios, it can be seen that only the effect of treatments is significant, not the effect of blocks, which indicates that it would also be valid to analyze these data under a completely randomized design. Two sets of odds ratios were estimated (part (c)): one for the treatment effects and the other for the block effects in the model. In the calculation of odds ratios, by default, the last level of a factor is compared with each of the remaining levels of that same factor.

The estimates obtained with "estimate" in Table 3.13 (parts (a) and (b)) are on the model scale, whereas the last column, produced by the "exp" option, applies the exponential function $e^{\tau\_i - \tau\_{i'}}$ to convert each treatment difference into an odds ratio.

The least squares means for treatments and the linear predictors of treatment differences (parts (a) and (b) of Table 3.14, respectively) obtained with "lsmeans" are the values under the "Estimate" column; these, together with their corresponding standard errors, were obtained using the linear predictor $\hat{\eta}\_{ij} = \hat{\eta} + \hat{\tau}\_i + \overline{\text{block}}\_{\bullet}$.

These estimates are on the model scale, whereas the values under the "Mean" column and their respective standard errors were obtained by applying the inverse link to obtain the probability of success of each treatment ($\hat{\pi}\_i$). While the estimated linear predictors for the mean differences were obtained with the "oddsratio" option, the mean differences on the data scale are obtained by applying the inverse link to these predictors.
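The conversions performed by "ilink" and "exp" are simple enough to state in code. A minimal Python sketch (the model-scale values `eta1` and `eta2` are hypothetical, not taken from Table 3.14):

```python
import math

def inv_logit(eta):
    """Inverse link for the logit: model scale -> probability (data) scale."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical model-scale estimates (linear predictors) for two treatments
eta1, eta2 = 0.85, -0.40

pi1, pi2 = inv_logit(eta1), inv_logit(eta2)  # what "ilink" reports under "Mean"
odds_ratio = math.exp(eta1 - eta2)           # what "exp"/"oddsratio" report

print(round(pi1, 4), round(pi2, 4), round(odds_ratio, 4))
```

Note that the odds ratio is the exponential of a model-scale difference, not a difference of data-scale probabilities.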


Table 3.12 Results of the analysis of variance in a binomial model

# 3.3 Fixed and Random Effects in the Inference Space

In an analysis, inference can be directed solely at fixed effects (population inference) or at a combination of fixed and random effects (specific inference). To illustrate these two levels of inference, we will consider two examples:

# 3.3.1 A Broad Inference Space or a Population Inference

In practice, the random effects in a linear mixed model (LMM) should represent the population from which the data were collected and should be included in studies as if they came from a well-planned sample. In a model, random effects can be locations, regions, states, blocks, and so on, and they have two very particular characteristics.


These two characteristics allow us to have a broad inference space where we can calculate point estimates, estimate intervals, and perform hypothesis testing applicable to the entire population represented by the random effects. Formally, an estimate or hypothesis test based on an LMM indicates that we have a broad


Table 3.13 Different estimates obtained with "estimate"



Table 3.14 Estimated linear predictors for treatments and treatment differences with their respective inverse values

inference space defined by the estimable function $\mathbf{K}'\boldsymbol{\beta}$ if the coefficients on the random effects (the entries of $\mathbf{Z}$) are all equal to zero; otherwise, the estimate or hypothesis test is defined by the predictable function $\mathbf{K}'\boldsymbol{\beta} + \mathbf{Z}'\mathbf{b}$, which is a specific inference.

# 3.3.2 Mixed Models with a Normal Response

In Example 3.2, the response variable was assumed to be a function of fixed effects due to treatments and blocks, since block effects were also assumed to be fixed. Now, suppose that the treatments were applied by three different people (blocks); then assuming that the block effects are fixed would be questionable, since each person works according to his or her experience, skill, and so forth. Clearly, there is some variability between blocks that is not due to the experiment and has to be accounted for, so the effects due to blocks must be considered random. In this example, let us assume that the three blocks (persons) were randomly selected from a population. Thus, the components of the model are defined as follows:

Distributions: $y\_{ij} \mid \text{block}\_j \sim N(\mu\_{ij}, \sigma^2)$; $\text{block}\_j \sim N\left(0, \sigma\_{\text{block}}^2\right)$

Linear predictor: $\eta\_{ij} = \eta + \tau\_i + \text{block}\_j$ $(i, j = 1, 2, 3)$

Link function: $\eta\_{ij} = \mu\_{ij}$ (identity link)

Note the impact of changing the estimable function for the treatment means. In Example 3.2, the estimable function was defined by $E(y\_{i\bullet}) = \eta + \tau\_i + \overline{\text{block}}\_{\bullet}$. Now, with the mixed model (fixed treatment effects and random blocks), the estimable function is defined by $E(y\_{i\bullet}) = \eta + \tau\_i$ because $E(\text{block}\_j) = 0$. Therefore, the estimable function for the mean of each treatment is $\eta + \tau\_i$. In this situation, two questions arise:


The following program allows us to estimate a mixed model with a normal response.

```
proc glimmix;
class trt block;
model y = trt /solution;
random block/solution;
lsmeans trt / diff e;
estimate 'lsm trt1' intercept 1 trt 1 0 0 | block 0 0 0;
estimate 'lsm trt2' intercept 1 trt 0 1 0 | block 0 0 0;
estimate 'lsm trt3' intercept 1 trt 0 0 1 | block 0 0 0;
estimate 'blup trt1' intercept 3 trt 3 0 0 | block 1 1 1 / divisor=3;
estimate 'blup trt2' intercept 3 trt 0 3 0 | block 1 1 1 / divisor=3;
estimate 'blup trt3' intercept 3 trt 0 0 3 | block 1 1 1 / divisor=3;
run;
```
In the previous SAS GLIMMIX code, the "estimate" command lists the coefficients associated with the fixed effects before the vertical bar (|), and, after the vertical bar, the coefficients for the random effects of the model, that is:

$$\mathbf{K}'\boldsymbol{\beta} + \mathbf{Z}'\mathbf{b} = \overbrace{\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}}^{\text{fixed effects}}\begin{pmatrix} \eta \\ \tau\_1 \\ \tau\_2 \\ \tau\_3 \end{pmatrix} + \overbrace{\frac{1}{3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}}^{\text{random effects}}\begin{pmatrix} \text{block}\_1 \\ \text{block}\_2 \\ \text{block}\_3 \end{pmatrix}$$

Part of the output is shown in Table 3.15. Part (a) shows the estimated variance components: the variance due to blocks is $\hat{\sigma}\_{\text{block}}^2 = 11.2778$, whereas the mean squared error (MSE) is $\hat{\sigma}^2 = 18.2778$. The fixed effects solution obtained with the "solution" option is provided in part (b). The analysis of variance (part (c)) indicates that there is a significant difference between treatments (P = 0.0045), so the null hypothesis $H\_0: \mu\_1 = \mu\_2 = \mu\_3$ must be rejected.

For the means estimated with the "estimate" statement, given that the mean block effect is zero, the mean and the best linear unbiased predictor (BLUP) for each treatment are similar, as shown in Table 3.16 (part (a)). Parts (b) and (c) show the means and the differences between pairs of means estimated with the "lsmeans" statement and the "diff" option.
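The BLUPs that GLIMMIX reports come from solving Henderson's mixed model equations. The following NumPy sketch uses hypothetical data, fixing the variance components at the values reported above ($\hat{\sigma}\_{\text{block}}^2 = 11.2778$, $\hat{\sigma}^2 = 18.2778$):

```python
import numpy as np

# Henderson's mixed model equations for y = X*beta + Z*b + e, b ~ N(0, G), e ~ N(0, R):
# [ X'R^-1 X      X'R^-1 Z     ] [beta]   [X'R^-1 y]
# [ Z'R^-1 X   Z'R^-1 Z + G^-1 ] [ b  ] = [Z'R^-1 y]
# Hypothetical 3x3 RCBD data, treatment-major ordering
y = np.array([24., 28., 26., 19., 23., 21., 30., 33., 27.])
X = np.kron(np.eye(3), np.ones((3, 1)))  # cell-means coding: one column per treatment
Z = np.kron(np.ones((3, 1)), np.eye(3))  # one column per block
sigma2_block, sigma2_e = 11.2778, 18.2778
G_inv = np.eye(3) / sigma2_block
R_inv = np.eye(9) / sigma2_e

C = np.block([[X.T @ R_inv @ X, X.T @ R_inv @ Z],
              [Z.T @ R_inv @ X, Z.T @ R_inv @ Z + G_inv]])
rhs = np.concatenate([X.T @ R_inv @ y, Z.T @ R_inv @ y])
sol = np.linalg.solve(C, rhs)
beta_hat, b_hat = sol[:3], sol[3:]  # treatment means (BLUEs) and block BLUPs

print(beta_hat)     # with cell-means coding, these are eta + tau_i
print(b_hat.sum())  # block BLUPs shrink toward zero and sum to zero here
```

In this balanced design the fixed-effect solutions equal the raw treatment means, while the block BLUPs are shrunken versions of the block-mean deviations.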


Table 3.15 Variance components, fixed effects, and fixed effects test

# 3.4 Marginal and Conditional Models

The process of analyzing a dataset has two main objectives: the first is model selection, which aims to find well-fitting parsimonious models for the responses being measured, and the second is model prediction, where estimates from the selected models are used to predict quantities of interest and their uncertainties.

The differences that may arise in this analysis process are mainly due to the choice of unidentifiable constraints on random effects. To compare two different models, we must compare analogous quantities. Different constraints can lead to apparently extremely different but inferentially identical models. The conditional model is believed to be the basic model, and any conditional model leads to a specific marginal model. Lee and Nelder (2004) proposed and worked on conditional models derived from generalized hierarchical linear models (GHLMs) and marginal models derived from these conditional models. Marginal models have often been fitted using generalized estimating equations (GEEs), the drawbacks of which are also discussed.

# 3.4.1 Marginal Versus Conditional Models

Consider two models with a normal distribution: one is a random effects model (a mixed model)

$$y\_{ij} = \mu + \tau\_i + b\_j + \varepsilon\_{ij}$$

where $b\_j \sim N\left(0, \sigma\_b^2\right)$ and $\varepsilon\_{ij} \sim N(0, \sigma^2)$. The other is a marginal model


Table 3.16 Estimated means, best linear unbiased estimates (BLUEs), and BLUPs for treatment and the difference between two means

$$E\left(y\_{ij}\right) = \mu + \tau\_i$$

where the elements in V( y) = Σ are variances and covariances that have an arbitrary correlation structure. Zeger et al. (1988) pointed out that given a marginal model, the generalized estimating equations are consistent. An obvious advantage of using random effects models is that they allow conditional inferences in addition to marginal inferences (Robinson 1991). Using the model with random effects, we can obtain not only the conditional mean

$$\mu\_{ij}^C = E\left(y\_{ij} \mid b\_j\right) = \mu + \tau\_i + b\_j$$

but also the marginal mean

$$\mu\_{ij} = E\left(\mu\_{ij}^C\right) = E\left[E\left(y\_{ij} \mid b\_j\right)\right] = \mu + \tau\_i$$

whereas with the marginal model, we can obtain only the marginal mean μij.
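Under the identity link, averaging the conditional mean over the random effects gives exactly the marginal mean. With a nonlinear link this is no longer true: applying the inverse link and averaging over $b\_j$ do not commute. A small Monte Carlo sketch for a logit link (all parameter values hypothetical) illustrates the gap:

```python
import math
import random

random.seed(1)
mu_model = 0.8   # hypothetical fixed part (eta + tau_i) on the model scale
sigma_b = 1.2    # hypothetical block standard deviation

def inv_logit(e):
    return 1.0 / (1.0 + math.exp(-e))

# Marginal mean: average the conditional mean over many random block effects
draws = [inv_logit(mu_model + random.gauss(0.0, sigma_b)) for _ in range(200_000)]
marginal = sum(draws) / len(draws)   # approximates E[g^-1(eta + b_j)]
naive = inv_logit(mu_model)          # g^-1(eta): conditional mean at b_j = 0

print(round(naive, 3), round(marginal, 3))  # the marginal mean is pulled toward 0.5
```

This attenuation of the marginal mean toward 0.5 is why conditional and marginal GLMMs answer different questions for non-normal responses.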

It may be reasonable to assume that the unobservable block random effects ($b\_j$) follow a certain distribution. However, the center of this distribution cannot be identified because it is confounded with the intercept. Therefore, in the random effects model, we impose the unidentifiable constraints $E(b\_j) = 0$ and $E(\varepsilon\_{ij}) = 0$, as we do for error terms in linear models. In the mixed model, these restrictions are $\sum\_j \hat{b}\_j = 0$ and $\sum\_j \hat{\varepsilon}\_{ij} = 0$ in any estimation procedure. First, we



Table 3.17 Mortality of coffee seedling clones (C) in different substrates (S)

consider the case in which the data follow a normal distribution. We then briefly discuss how the results differ for data with a non-normal distribution.

# 3.4.2 Normal Distribution

Example The effect of different substrates (factor A), i.e., three substrates made from vermicompost and one from compost, on the development of physiological variables and mortality of cuttings of three clones (factor B) of robusta coffee (Coffea canephora p.) was evaluated. The levels of factor A are randomly assigned to rows in each block, with the following restriction: each block receives levels A1, A2, A3, and A4 and each level of factor B (B1, B2, and B3) is randomly assigned to each level of factor A in each block. The data for this experiment are tabulated in Table 3.17.

Note that while there are two randomization processes, there are effectively three sizes of experimental units: rows for A levels, columns for B levels, and row–column intersections for A × B combinations. Thus, the experimental design used was a complete randomized design with a strip-plot treatment arrangement.

The model, for these data, is given below:

$$y\_{ijk} = \mu + b\_k + \alpha\_i + (\alpha b)\_{ik} + \beta\_j + (\beta b)\_{jk} + (\alpha\beta)\_{ij} + \varepsilon\_{ijk}$$

where $y\_{ijk}$ is the response observed in the $k$th block at the $i$th level of factor A and the $j$th level of factor B; $\mu$ is the overall mean; $b\_k$ is the random effect due to blocks, assuming $b\_k \sim N\left(0, \sigma\_b^2\right)$; $\alpha\_i$ is the fixed effect due to substrate type (S); $(\alpha b)\_{ik}$ is the random effect due to the interaction of a substrate with blocks, assuming $(\alpha b)\_{ik} \sim N\left(0, \sigma\_{\alpha b}^2\right)$; $\beta\_j$ is the fixed effect due to the coffee clone type (C); $(\beta b)\_{jk}$ is the random effect due to the interaction of a coffee clone with blocks, assuming $(\beta b)\_{jk} \sim N\left(0, \sigma\_{\beta b}^2\right)$; $(\alpha\beta)\_{ij}$ is the fixed effect of the interaction between a substrate and a coffee clone; and $\varepsilon\_{ijk}$ is the normal random error, $\varepsilon\_{ijk} \sim N(0, \sigma^2)$. The components of the model for this dataset are as follows:

Linear predictor: $\eta\_{ijk} = \mu + b\_k + \alpha\_i + (\alpha b)\_{ik} + \beta\_j + (\beta b)\_{jk} + (\alpha\beta)\_{ij}$

Distributions: $y\_{ijk} \mid b\_k, (\alpha b)\_{ik}, (\beta b)\_{jk} \sim N(\mu\_{ijk}, \sigma^2)$

$$b\_k \sim N\left(0, \sigma\_b^2\right); \ (\alpha b)\_{ik} \sim N\left(0, \sigma\_{\alpha b}^2\right); \ (\beta b)\_{jk} \sim N\left(0, \sigma\_{\beta b}^2\right)$$

Link function: $\eta\_{ijk} = \mu\_{ijk}$ (identity link)
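To make the three error strata of the strip-plot model concrete, the model above can be simulated directly. The sketch below is illustrative only: all fixed effect values and variance components are hypothetical, not estimates from this experiment.

```python
import numpy as np

rng = np.random.default_rng(7)
# One simulated draw from the strip-plot LMM (hypothetical parameter values)
nblk, na, nb = 4, 4, 3
mu = 50.0
alpha = np.array([0., 2., -1., 3.])   # substrate (factor A) effects
beta = np.array([0., -2., 1.])        # clone (factor B) effects
ab = np.zeros((na, nb))               # no A x B interaction in this sketch

b_k = rng.normal(0, np.sqrt(23.0), nblk)            # block effects
alphab = rng.normal(0, np.sqrt(35.0), (na, nblk))   # substrate-strip (row) errors
betab = rng.normal(0, np.sqrt(67.0), (nb, nblk))    # clone-strip (column) errors
eps = rng.normal(0, np.sqrt(140.0), (na, nb, nblk)) # cell (intersection) errors

y = (mu + b_k[None, None, :] + alpha[:, None, None] + alphab[:, None, :]
     + beta[None, :, None] + betab[None, :, :] + ab[:, :, None] + eps)
print(y.shape)  # substrate x clone x block
```

Each of the three random arrays corresponds to one of the three sizes of experimental units noted above: rows, columns, and their intersections within blocks.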

The following GLIMMIX syntax sets a GLMM with a normal response.

```
proc glimmix;
class block s c;
model y=s|c;
random intercept s c/subject=block;
lsmeans s*c/ slicediff=s;
run;
```
Part of the results of this analysis is shown below. The estimated variance components for blocks, block × substrate, and block × clone and the MSE are $\hat{\sigma}\_b^2 = 23.4714$, $\hat{\sigma}\_{\alpha b}^2 = 35.4995$, $\hat{\sigma}\_{\beta b}^2 = 67.0160$, and $\hat{\sigma}^2 = \text{MSE} = 139.58$, respectively; they are listed in part (a) of Table 3.18. However, the fixed effects tests for both factors and their interaction (part (b)) are not statistically significant.

According to the "slicediff = s" option in the "lsmeans" statement, Table 3.19 shows the simple effects of each substrate level at varying clone levels.

# 3.4.3 Non-normal Distribution

Example Using the data in Table 3.17 but under a beta distribution, the components of the GLMM change slightly:


Table 3.19 Simple effect comparisons across substrate levels


Distributions: $y\_{ijk} \mid b\_k, (\alpha b)\_{ik}, (\beta b)\_{jk} \sim \text{Beta}(\mu\_{ijk}, \phi)$, where $\phi$ is the scale parameter.

$$b\_k \sim N\left(0, \sigma\_b^2\right); \ (\alpha b)\_{ik} \sim N\left(0, \sigma\_{\alpha b}^2\right); \ (\beta b)\_{jk} \sim N\left(0, \sigma\_{\beta b}^2\right)$$

Linear predictor: $\eta\_{ijk} = \mu + b\_k + \alpha\_i + (\alpha b)\_{ik} + \beta\_j + (\beta b)\_{jk} + (\alpha\beta)\_{ij}$

Link function: $\eta\_{ijk} = \text{logit}(\mu\_{ijk})$

The following GLIMMIX syntax sets a beta response variable.

```
proc glimmix method=laplace;
class block s c;
model pct=s|c/dist=beta;
random intercept s c/subject=block;
lsmeans s*c/plot=meanplot(sliceby=s join) slicediff=s ilink;
run;
```


Table 3.20 Variance components

Table 3.21 Simple effect comparisons across substrate levels


Simple effect comparisons of S\*C least squares means by S

Some of the SAS output from this analysis is shown below. The estimated variance components for blocks, block × substrate, and block × clone are $\hat{\sigma}\_b^2 = 0.06723$, $\hat{\sigma}\_{\alpha b}^2 = 0.1594$, and $\hat{\sigma}\_{\beta b}^2 = 0.1932$, with scale parameter $\hat{\phi} = 16.6041$; they are listed in part (a) of Table 3.20. However, the fixed effects tests for both factors and their interaction (part (b)) are not statistically significant. Unlike under the normal distribution (the previous example), the variance components (even multiplied by 100) under the beta distribution are smaller, and the type III fixed effects tests are closer to being significant.

Table 3.21 shows, for each substrate level at varying clone levels, the estimates (linear predictors) of the simple effects. These effects differ from the previous results, but this is mainly because in a GLMM, these values correspond to the linear predictors estimated at the model scale and not to the estimated means at the data


Table 3.22 Total yields (grams) of barley varieties in 12 independent trials

scale (Example 3.4.2). It is also important to note that the degrees-of-freedom correction in the estimation of means cannot yet be used in the estimation of a GLMM.

# 3.5 Exercises

Exercise 3.5.1 The data in Table 3.22 below show the yield of five barley varieties in a randomized complete block experiment conducted in Minnesota (Immer et al. 1934).


Exercise 3.5.2 Lew (2007) conducted an experiment to determine whether cultured cells respond to two drugs (chemical formulations). The experiment was conducted using a cell culture line placed in Petri dishes. Each experimental trial consisted of three Petri dishes: one treated with drug 1, one treated with drug 2, and one untreated as a control. The data are shown in Table 3.23:



Table 3.23 Number of cells cultured in different drugs


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Chapter 4 Generalized Linear Mixed Models for Non-normal Responses

# 4.1 Introduction

Generalized linear mixed models (GLMMs) have been recognized as one of the major methodological developments in recent years, which is evidenced by the increased use of such sophisticated statistical tools with broader applicability and flexibility. This family of models can be applied to a wide range of different data types (continuous, categorical (nominal or ordinal), percentages, and counts), and each is appropriate for a specific type of data. This modern methodology allows data to be described through a distribution of the exponential family that best fits the response variable. These complex models were not computationally possible up until recently when advances in statistical software have allowed users to apply GLMMs (Zuur et al. 2009; Stroup 2012; Zuur et al. 2013). Researchers in fields other than statistical science are also interested in modeling the structure of data. For example, in the social sciences there have been applications in the field of education when several tests are applied to students; in longitudinal personality studies when the occurrence of an emotion is repeatedly observed over time over a set of people; and in surveys to investigate the political preference of a population, among others.

Likewise, agriculture and the life sciences are other major areas where the measurement of response variables departs from the conventional classical methodology based on "normality" to model or describe the dataset, i.e., data that generally fall within the nominal, ordinal, or interval (continuous) scales of measurement. In a GLMM, the response data do not undergo any transformation; instead, the response is modeled as a function of the expected value through a linear relationship with the explanatory variables. GLMMs are a powerful tool that allows proper modeling of variation between groups and across space and time, leading to accuracy in the modeling of the observed data as well as in the estimation of variance components.

# 4.2 A Brief Description of Linear Mixed Models (LMMs)

Before addressing GLMMs, we present a brief overview of linear mixed models (LMMs). An LMM is a model whose response variable is normal and assumes: (1) that the relationship between the mean of the dependent variable ( y) and fixed and random effects can be modeled as a linear function; (2) that the variance is not a function of the mean; and (3) that random effects follow a normal distribution.

The classic representation of an LMM in the matrix form, is

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} + \boldsymbol{\varepsilon} \tag{4.1}$$

where **y** is the (n × 1) vector of the response variable; **X** is the (n × (p + 1)) design matrix of fixed effects with rank k; **β** is the ((p + 1) × 1) vector of unknown fixed effects parameters; **Z** is the (n × q) design matrix of random effects; and **b** is the (q × 1) vector of unknown random effects, assumed to follow a normal distribution with mean **0** and variance–covariance matrix **G**, that is, **b** ~ N(**0**, **G**). Finally, **ε** is the error vector, with a normal distribution with mean **0** and variance–covariance matrix **R** (**ε** ~ N(**0**, **R**)); the vectors **b** and **ε** are assumed to be independent of each other.

Model 4.1, as previously mentioned, can be described in terms of a probability distribution in two ways: the first is the marginal model y~N(E[y] = Xβ, Var [y] = V = ZGZ′ + R), where the mean is based solely on the fixed effects, and the parameters describing the random effects are contained in the variance and covariance matrix V (Littell et al. 2006), while the second form is the conditional model y j b~N(Xβ + Zb,R). Under normality assumptions, both models are exactly the same and hence produce the same solution, whereas when normality is not satisfied, the models produce different solutions (Stroup 2012).
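The marginal covariance $\mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R}$ can be constructed explicitly for a toy design. A NumPy sketch with hypothetical variance components, for two blocks of three observations each:

```python
import numpy as np

# Marginal covariance V = Z G Z' + R for 2 blocks of 3 observations
# (hypothetical variance components)
sigma2_b, sigma2_e = 4.0, 1.0
Z = np.kron(np.eye(2), np.ones((3, 1)))  # each observation belongs to one block
G = sigma2_b * np.eye(2)
R = sigma2_e * np.eye(6)

V = Z @ G @ Z.T + R
print(V)
# Within a block: variance sigma2_b + sigma2_e on the diagonal and covariance
# sigma2_b off it; observations in different blocks are uncorrelated
# (a compound symmetry structure).
```

This makes visible how the random block effect, absorbed into **V** in the marginal formulation, induces correlation among observations sharing a block.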

# 4.3 Generalized Linear Mixed Models

Most datasets in agricultural, biological, and social sciences often fall outside the scope of the traditional methods taught in introductory statistics and statistical methods. Often, these data (response variables) are: (a) binary (the presence or absence of a trait of interest, success or failure, the infection status of an individual, or the expression of a genetic disorder); (b) proportional (the ratio of females to males, infection or mortality rates within a group of individuals); or (c) counts (the number of emerging seedlings, the number of sprouts, etc.), where basic statistical methods attempt to quantify the effects of each predictor variable. However, often, studies of these experiments involve random effects, the purpose of which is to quantify variation among individuals or units. The most common random effects are blocks in experimental or observational studies that are replicated across sites (locations or environments) or over time. Random effects also encompass variations among individuals (when measuring multiple responses per individual such as survival of multiple offspring or sex ratios of multiple offspring), genotypes, species, and regions or periods over time.

GLMMs are a powerful class of statistical tools that combine the concepts and ideas of generalized linear models (GLMs) with linear mixed models (LMMs). That is, a GLMM is an extension of the GLM, in which the linear predictor contains random effects in addition to fixed effects. These models handle a wide range of both response distributions and scenarios in which observations are sampled. GLMMs extend the theory of LMMs to response variables that have a non-normal distribution. In GLMMs, the response data are not transformed; instead, the explanatory variables are expressed as a linear relationship through a function g of the expectation of y j b; that is, the response is conditional on random effects. This performs the link function that relates the response to the explanatory variables in a linear manner, thus allowing the use of standard LMM techniques for estimation and hypothesis testing.

A conditional model is used to describe a GLMM with non-Gaussian errors (Model 4.1), given a link function (g), as shown below:

$$\mathbf{g}(\boldsymbol{\eta}) = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b},$$

which is a function of the conditional expectation given by

$$E[\mathbf{y}|\mathbf{b}] = \mathbf{g}^{-1}(\mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b}) = \mathbf{g}^{-1}(\boldsymbol{\eta}) = \boldsymbol{\mu} \tag{4.2}$$

where $\mathbf{g}^{-1}(\cdot)$ is the inverse link function and the other terms were defined earlier. The fixed and random effects are combined to form the conditional linear predictor

$$\boldsymbol{\eta} = \mathbf{g}(E[\mathbf{y}|\mathbf{b}]) = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} \tag{4.3}$$

The relationship between the linear predictor and the vector of observations is modeled as follows:

$$\mathbf{y} \mid \mathbf{b} \sim \left( \mathbf{g}^{-1}(\boldsymbol{\eta}), \mathbf{R} \right) \tag{4.4}$$

The notation in (4.4) expresses that the conditional distribution of **y** given **b** has mean $\mathbf{g}^{-1}(\boldsymbol{\eta})$ and variance **R**. Note that instead of specifying the distribution of **y**, as in the case of a GLM, we specify a distribution for the conditional response **y** | **b**.

The variance and covariance matrix for the observations is given by:

$$V(\mathbf{y}) = E[V(\mathbf{y}|\mathbf{b})] + V(E[\mathbf{y}|\mathbf{b}]) = \mathbf{A}^{1/2}\mathbf{R}\mathbf{A}^{1/2} + \mathbf{Z}\mathbf{G}\mathbf{Z}' \tag{4.5}$$

where matrix A is a diagonal matrix containing the variance functions of the model. GLMMs cover an important group of statistical models, such as:


GLMMs have been formulated to correct the shortcomings of LMMs, as there are many cases where the assumptions made in linear mixed models are inadequate. First, an LMM assumes that the relationship between the mean of the dependent variable (y) and fixed and random effects (β, b) can be modeled through a linear function. This assumption is questionable, like when a researcher wishes to model the incidence of a disease or the success or failure of an event.

The second assumption of an LMM is that variance is not a function of the mean and that the random effects follow a normal distribution. The assumption of constant variance is not met when the response variable is binary (1, 0). In this case, the variance is π(1 - π), which is a function of the mean. The result is a random variable, which can take two values (0, 1); in contrast, the normal distribution can take any real number. Finally, the predictions for an LMM can take any real value, whereas the predictions for a binary variable are bounded in the interval (0, 1), since it is a probability and this prediction cannot support negative values.

Historically, a number of options have been used to address and work around some LMM problems, even though their use is not the most appropriate. These include logarithmic transformations (log(y)), square root transformations ($\sqrt{y}$), arcsine transformations (sin⁻¹(y)), and so on. However, many of these transformations are applied so that linear mixed models can be used, ignoring the fact that these models are not the most accurate when the response variable does not satisfy the assumption of normality. These options are attractive because they are relatively simple and easy to implement using the LMM machinery. However, they sidestep the fact that a linear mixed model is not the best model for analyzing such data.

# 4.4 The Inverse Link Function

In a GLMM, the link function maps the conditional mean of the response to the linear predictor of the model, g(μ) = η = Xβ + Zb. The linear predictor can be transformed back to the observed data scale through the inverse link function. In other words, the inverse link function maps the value of the linear predictor for the ith observation, ηi, to the conditional mean μi on the data scale. For example, suppose that we are conducting an experiment in which we are assessing the number of undesirable weeds observed in a crop of interest after the application of a certain number of treatments; the response variable is assumed to have a Poisson distribution with mean λij, and the linear predictor is given by

$$
\eta_{ij} = \eta + \tau_i + b_j
$$

where η is the intercept, τi is the fixed effect due to treatments, and bj is the random effect, assuming bj ~ N(0, σ<sup>2</sup>b).

To obtain the inverse function of the following predictor

$$
\log\left(\lambda_{ij}\right) = g\left(\lambda_{ij}\right) = \eta + \tau_i + b_j,
$$

we exponentiate both sides of the previous equation, which yields the inverse link function shown below:

$$
\lambda_{ij} = e^{\eta + \tau_i + b_j},
$$

which is denoted as λij = g<sup>-1</sup>(η + τi + bj).

Therefore, for this example, λij depends on the linear predictor through the inverse link function, and the variance of yij depends on λij through the variance function.
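As a quick numerical illustration of this round trip (a sketch with made-up values for η, τi, and bj, not the book's dataset):

```python
import math

# Hypothetical values for illustration: intercept eta, a treatment
# effect tau_i, and a block effect b_j.
eta, tau_i, b_j = 1.2, 0.5, -0.3

# Model scale: the linear predictor eta_ij = eta + tau_i + b_j.
eta_ij = eta + tau_i + b_j

# Data scale: the inverse link g^{-1}(.) = exp(.) recovers the
# conditional mean lambda_ij, which is always positive.
lambda_ij = math.exp(eta_ij)

print(f"linear predictor = {eta_ij:.2f}, conditional mean = {lambda_ij:.3f}")

# Round trip: applying the log link to the mean returns the linear predictor.
assert abs(math.log(lambda_ij) - eta_ij) < 1e-12
```

Note that whatever real value the linear predictor takes, the exponential maps it to a positive mean, which is exactly what a Poisson count requires.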

# 4.5 The Variance Function

The variance function is used to model non-constant variability in the phenomenon under study. In GLMMs, residual variability arises from two sources: the variability of the distribution of sampling units in an experimental arrangement (blocks, plots, locations, etc.) and the variability due to overdispersion. Overdispersion can be modeled in several ways. In a GLMM, the scale or dispersion parameter ϕ is extremely important since it can either increase or decrease the variance of each observation in the model:

$$\operatorname{Var}\left(y_{ij} \mid b_{j}\right) = \phi\, V\left(\mu_{ij}\right)$$

where V(∙) is the variance function evaluated at the conditional mean μij.

If overdispersion exists, one way to account for it is to add a random effect for each observation (in SAS, \_residual\_) to the linear predictor. Another alternative is to use a different distribution to model the dataset; for example, the two-parameter negative binomial (NB) distribution (λij, ϕ) instead of the single-parameter Poisson distribution (λij) in the case of count data.

# 4.6 Specification of a GLMM

A GLMM is composed of three parts: (1) fixed effects that convey systematic and structural differences in responses; (2) random effects that convey stochastic differences between blocks or other random factors, as these effects allow generalizations


Table 4.1 Common distributions with their respective link functions

of the population from which the sampling units have been (randomly) sampled; and (3) distribution of errors. Thus, a complete definition of a GLMM is as follows:

> y | b ~ f(μ, ϕ) (conditional distribution)
> b ~ N(0, G) (random effects)
> g(μ) = η (link function)
> η = Xβ + Zb (linear predictor)

where the distribution function f(∙) is a member of the exponential family, g(μ) is the link function, X and Z are the design matrices, and β and b are the unknown parameters for the fixed and random effects, respectively.

When fitting a GLMM, the data remain on the original measurement scale (data scale). However, when means are estimated from a linear function of the explanatory variables (the predictor), these means are on the model scale. A link function is used to link the model scale back to the original data scale. This is not the same as transforming the original measurements to a different measurement scale. For example, applying the log transformation for counts followed by an analysis of variance (ANOVA) under a normal distribution is not the same as fitting a generalized linear model, assuming a Poisson distribution and using a log link (Gbur et al. 2012). In the first case, the least squares means would normally be equal to the arithmetic means, whereas in the second case, the means are inversely linked to the data scale, which may not be equal to the arithmetic means of the original sample.
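The distinction between transforming the data and linking the mean can be seen numerically. In the following Python sketch (made-up counts, not data from the book), exponentiating the mean of the log-counts gives the geometric mean, which is generally smaller than the arithmetic mean that a Poisson GLM's inverse link targets:

```python
import math

# Illustrative counts (hypothetical): analyzing log-transformed counts
# under normality is not the same as a Poisson GLM with a log link,
# because the back-transformed mean of the logs differs from the
# arithmetic mean of the counts.
counts = [2, 5, 9, 14, 30]

arithmetic_mean = sum(counts) / len(counts)  # target of the inverse link
mean_of_logs = sum(math.log(c) for c in counts) / len(counts)
back_transformed = math.exp(mean_of_logs)    # this is the geometric mean

print(f"arithmetic mean   = {arithmetic_mean:.2f}")
print(f"exp(mean of logs) = {back_transformed:.2f}")

# exp(mean of logs) is the geometric mean, always <= the arithmetic mean.
assert back_transformed < arithmetic_mean
```

This is the practical content of the warning above: least squares means of log-counts, back-transformed, are not estimates of the arithmetic mean counts.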

The distribution specifications in "proc GLIMMIX" have default link functions, but it is always highly recommended to explicitly code the link function, since for some types of response variables, more than one alternative exists. This way, there is no doubt that an appropriate function was used. Using the wrong link function will lead to totally meaningless and incorrect results. Table 4.1 shows some common distributions, the appropriate link function, and the proper syntax for each.

For a complete list, see the online Statistical Analysis Software (SAS/STAT) documentation for PROC GLIMMIX.

# 4.7 Estimation of the Dispersion Parameter

The overall measures of fit compare the observed values of the response variable with fitted (predicted) values. The dispersion parameter is unknown and therefore must be estimated. There are two methods for estimating the overdispersion parameter. McCullagh (1983) proposed estimating overdispersion as follows:

$$\widehat{\phi} = \frac{(\mathbf{y} - \boldsymbol{\mu})^{\prime} \mathbf{V}\_{\mu}^{-1} (\mathbf{y} - \boldsymbol{\mu})}{N - p} = \frac{\text{Pearson's } \chi^{2}}{N - p}$$

where V<sup>-1</sup><sub>μ</sub> is the inverse of the diagonal matrix of variance functions and N - p is the degrees of freedom for lack of fit. Later, McCullagh and Nelder (1989) suggested using the deviance:

$$\widehat{\phi} = \frac{\text{Deviance}}{N - p} = \frac{-2\left[\ln(L(M\_1)) - \ln(L(M\_2))\right]}{N - p}$$

Deviance is a global fit statistic that also compares fitted and observed values; its exact form depends on the likelihood function of the random component of the model. Deviance compares the maximized likelihood of the model of interest, M<sub>1</sub>, with the maximum possible value of the likelihood, which is attained by the saturated model, M<sub>2</sub>. The saturated model has as many parameters as observations, so it fits the data exactly and gives the highest possible value of the likelihood.

If the overdispersion parameter is significantly greater than 1, overdispersion exists; in other words, the variance is greater than the mean, and the parameter should be used to adjust the variance. If overdispersion is not taken into account, inflated test statistics may result. When the dispersion parameter is less than 1, the test statistics become more conservative, which is usually not considered a serious problem.
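McCullagh's moment estimator can be sketched directly. The following Python fragment uses hypothetical counts and fitted Poisson means (all values invented for illustration; for a Poisson model the variance function is simply μ):

```python
# Sketch of phi_hat = Pearson chi-square / (N - p) with made-up data.
observed = [4, 0, 7, 2, 12, 1, 9, 3]
fitted = [3.0, 1.5, 6.0, 2.5, 8.0, 1.5, 7.0, 2.5]  # hypothetical mu_hat
p = 2  # number of parameters in the mean model (assumed)

# Pearson chi-square: sum of squared residuals scaled by the variance
# function, which equals mu for the Poisson distribution.
pearson_chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
phi_hat = pearson_chi2 / (len(observed) - p)

print(f"Pearson chi-square = {pearson_chi2:.2f}, phi_hat = {phi_hat:.2f}")

# phi_hat well above 1 would signal overdispersion; near 1 supports Poisson.
```

This is the same quantity that GLIMMIX reports as "Pearson chi-square/DF" in the example that follows.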

The following example is intended to show how GLIMMIX in SAS estimates the dispersion parameter in a GLMM.

Example An agronomist wants to test the effectiveness of a new herbicide offered on the market (we will denote this as herb\_N) and compare it with the herbicide that has been used for several cycles (herb\_C). The experimental arrangement used was a randomized complete block design as shown below (Table 4.2).

The components of a GLMM with a Poisson response variable are listed below:

$$\begin{aligned} \text{Distribution: } y\_{ij} &\mid b\_j \sim \text{Poisson}\left(\lambda\_{ij}\right) \\ b\_j &\sim N\left(0, \sigma^2\_{\text{block}}\right) \end{aligned}$$


Linear predictor: ηij = η + herbicidei + bj

Link function: log(λij) = ηij

This model assumes that the slopes are the same for each herbicide. The following SAS code is used for the proposed model:

```
proc glimmix nobound method=laplace;
class block trt;
model count = trt/dist=poisson link=log;
random block;
lsmeans trt/ilink lines;
run;
```
Explanation The "method=" option specifies the method used to optimize the logarithm of the likelihood function. In "proc GLIMMIX," there are two popular methods, adaptive quadrature (quad) and Laplace (laplace), which are the preferred methods for categorical response variables. Both fit a conditional model. When the quadrature method is used (method=quad), subjects (individuals) must be declared in the random effects statement (e.g., for the above program, "random intercept/subject=block"); processing random effects by subject is more efficient than using the syntax "random block." The "dist" option specifies the probability distribution appropriate for the type of response, in this case, the Poisson distribution. The "link" option specifies the link function of the distribution. The "ddfm" option is omitted so that GLIMMIX uses, by default, its method for calculating the denominator degrees of freedom for the fixed effects tests that result from the model. The "ilink" option converts the estimates of the treatment means (lsmeans) from the model scale to the data scale. Finally, "proc GLIMMIX" supports the "lines" option, which adds letter groupings to the mean comparisons produced by "lsmeans."

The most relevant parts of the SAS output, for the purposes of what we want to show, are shown in Tables 4.3 and 4.4. The fit statistics of the fitted model are shown in part (a) and part (b) of Table 4.3. The -2 log likelihood statistic is extremely


Table 4.4 Type III fixed effects tests and estimated least squares means


useful for comparing nested models, whereas the different versions of information criteria, such as the Akaike information criterion (AIC), the small-sample bias-corrected AIC (AICC), the Bayesian information criterion (BIC), Bozdogan's consistent AIC (CAIC), and the Hannan-Quinn information criterion (HQIC), are useful when comparing models that are not necessarily nested (part (a)). The table of fit statistics for the conditional distribution (part (b)) shows the sum of the independent contributions to the conditional -2 log likelihood, whose value is 139.03, whereas Pearson's statistic divided by its degrees of freedom (Pearson's chi-square/DF) is 4.85.

The estimated dispersion parameter (ϕ = Pearson's chi-square/DF) has a value far from 1, in this case ϕ = 4.85, which indicates strong overdispersion. This may be because the specified distribution of the data is not appropriate, the counts are too small, or the variance function was not correctly specified. The estimate of the variance component due to blocks is tabulated in part (c) of Table 4.3, with an estimated value of σ<sup>2</sup>block = 1.559.

The fixed effects test and least squares means are shown in Table 4.4. The type III fixed effects tests indicate a highly significant difference (part (a)) in the effectiveness of the herbicides in weed suppression; the estimated means with their respective standard errors are tabulated under the "Mean" column (part (b)). The "Estimate" column contains the least squares means on the model scale; they are derived from the log likelihood function, and SAS always lists lsmeans on the model scale when creating least squares means test tables. The "Mean" column is the result of converting these values back to the data scale with the inverse link ("ilink" option); these values estimate the average counts for each treatment level (in this case, herbicide type) on the data scale. When reporting results, these data-scale means (the values in the "Mean" column) should be used in place of the model-scale least squares values.

Since there is strong overdispersion (ϕ > 1), assuming that the data have a Poisson distribution is risky, because the Poisson distribution forces the mean and variance to be equal. A useful alternative is the negative binomial distribution, which has mean λ and variance λ + ϕλ<sup>2</sup>, with ϕ > 0 commonly known as the scale parameter.
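The extra flexibility of the negative binomial mean-variance relationship can be sketched numerically (illustrative values only; the scale parameter mirrors the one estimated later in this example):

```python
# Poisson forces Var = lambda, while the negative binomial allows
# Var = lambda + phi * lambda**2 with phi > 0, accommodating overdispersion.
lam = 6.0   # hypothetical mean count
phi = 0.58  # hypothetical scale parameter

poisson_var = lam
negbin_var = lam + phi * lam ** 2

print(f"Poisson variance           = {poisson_var:.2f}")
print(f"negative binomial variance = {negbin_var:.2f}")

assert negbin_var > poisson_var  # NB variance always exceeds the mean
```

As ϕ shrinks toward 0, the negative binomial variance collapses back to the Poisson variance, so the Poisson model is a limiting case.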

The following is the specification of the components of a GLMM with a negative binomial (NB) response variable:

> Distribution: yij | bj ~ Negative binomial(λij, ϕ), bj ~ N(0, σ<sup>2</sup>block)
> Linear predictor: ηij = η + herbicidei + bj
> Link function: log(λij) = ηij

The GLIMMIX procedure also allows modeling a GLMM with a negative binomial response variable:

```
proc glimmix data=itam nobound method=laplace;
class block trts;
model count = trts/dist=negbin;
random block;
lsmeans trts/ilink;
run;
```
Part of the output is shown in Table 4.5. The fit statistics for the model comparison (part (a)) and those for the conditional distribution (part (b)) are both provided by the GLIMMIX procedure when a conditional distribution is specified. Since the previous analysis showed that overdispersion exists under a Poisson distribution, the results under a negative binomial distribution indicate that this overdispersion problem no longer exists; i.e., the negative binomial model is no longer overdispersed (ϕ = 0.58). In other words, the negative binomial distribution does a better job than the Poisson distribution in fitting these data, since it effectively controls the overdispersion.

Comparing the fit statistics tabulated in Table 4.3 subsection (c) under both distributions, we can observe that when the data are modeled under a negative binomial distribution, the values of the fit statistics are lower than those under a Poisson distribution, and the dispersion parameter is ϕ < 1. This indicates that the negative binomial distribution models this dataset better.

# 4.8 Estimation and Inference in Generalized Linear Mixed Models

# 4.8.1 Estimation

In GLMMs, inference involves the estimation and testing of hypotheses about the unknown parameters in β, G, and R as well as the best linear unbiased predictions (BLUPs) of the random effects, b. In most modern statistical tools, including GLMMs, parameter estimation is performed via maximum likelihood (ML) or methods derived from it. For simple analyses in which the response variables are normal, classical ANOVA methods based on sums of squares produce the same results as ML estimation. However, this equivalence does not hold in models with more complex structures such as LMMs or GLMMs. To find the ML estimators in a GLMM, one must integrate over all possible values of the random effects. For GLMMs, this computation is at best slow and at worst (for a large number of random effects) computationally infeasible.

Statisticians have proposed several ways to approximate the parameter estimates of a GLMM, including penalized quasi-likelihood (PQL) and pseudo-likelihood methods (Schall 1991; Wolfinger and O'Connell 1993; Breslow and Clayton 1993), Laplace approximations (Raudenbush et al. 2000), Gauss–Hermite quadrature (Pinheiro and Chao 2006), and Bayesian methods based on Markov chain Monte Carlo (Gilks et al. 1996). In all these approaches, researchers must distinguish between a standard ML estimation, which estimates the standard deviations of the random effects assuming that the fixed effects estimates are precisely correct, and restricted maximum likelihood (REML), a variant that averages over the uncertainty in the fixed effects parameters (Pinheiro and Bates 2000; Littell et al. 2006).

The ML method underestimates the standard deviations of random effects, except in extremely large datasets, but it is most useful for comparing models with different fixed effects. Pseudo- and quasi-likelihood methods are the simplest and the most widely used in approximating a GLMM. They are widely implemented in statistical packages that promote the use of GLMMs in many areas of ecology, biology, and quantitative and evolutionary genetics (Breslow 2004). Unfortunately, pseudo- and quasi-likelihood methods produce biases in parameter estimation if the standard deviations of the random effects are large, especially when using binary data (Rodriguez and Goldman 2001; Goldstein and Rasbash 1996). Lee and Nelder (2001) have implemented several improvements to the PQL version, but these are not available in most common statistical software packages. As a rule of thumb, PQL performs poorly for Poisson data when the average number of counts per treatment combination is less than five or for binomial data when the expected numbers of successes and failures for each observation are less than five (Breslow 2004). Another disadvantage of PQL is that it calculates a quasi-likelihood rather than the true likelihood. Because of this, many statisticians believe that PQL-based methods should not be used for inference.

There are two more accurate approximations available, which also reduce bias. One is the Laplace approximation (Raudenbush et al. 2000), which approximates the true likelihood of a GLMM instead of a quasi-likelihood, allowing the maximum likelihood method in the GLMM inference process. The other approach is called Gauss–Hermite quadrature (Pinheiro and Chao 2006), which is more accurate than the Laplace approximation but is slower (requires more computational resources). Therefore, the procedures for parameter estimation of a GLMM that are approximations are as follows:

The penalized quasi-likelihood method performs the estimation process by alternating between (1) estimating the fixed parameters by fitting a GLM with a variance-covariance matrix based on an LMM fit and (2) estimating the variances and covariances by fitting a GLM with unequal variances calculated from the previous GLM fit. Pseudo-likelihood, a close cousin of the ML method, estimates variances differently and estimates a scale parameter to account for overdispersion (some authors use these terms interchangeably). In summary, GLMMs require an iterative process for parameter estimation. Two categories of iterative procedures are used by SAS: linearization and integral approximation. The GLIMMIX procedure uses the pseudo-likelihood method for linearization, whereas integral approximation uses the Laplace approximation or adaptive methods such as Gauss–Hermite quadrature. These methods maximize the log likelihood for members of the exponential family of distributions, including non-normal distributions. The pseudo-likelihood method is the default in the GLIMMIX procedure (Proc GLIMMIX). The Laplace and quadrature methods approximate maximum likelihood, but the Laplace method is computationally simpler than quadrature and also provides excellent estimates.

# 4.8.2 Inference

After estimating the parameter values in a GLMM, the next step is to extract information and draw statistical conclusions from a given dataset through careful analysis of the parameter estimates (confidence intervals, hypothesis testing) and to select a model that best describes or explains the most variability in the dataset. Inference can generally be based on three approaches: (a) hypothesis testing, (b) model comparison, and (c) Bayesian approaches. Hypothesis testing compares test statistics (e.g., the F-test in ANOVA) with their expected distributions under the null hypothesis (H<sub>0</sub>), computing a P-value to determine whether H<sub>0</sub> can be rejected. Model selection, on the other hand, compares the fits of candidate models. Candidate models can be compared through hypothesis tests, that is, testing nested against more complex models (Stephens et al. 2005) using Wald tests (Z, χ<sup>2</sup>, t, and F); likelihood ratio (LR) tests can assess the significance of factors or choose the better of a pair of nested candidate models. Information criteria, in contrast, allow multiple comparisons and the selection of non-nested models. Among these criteria are the Akaike information criterion (AIC) and related criteria that use deviance as a measure of fit, adding a term to penalize more complex models. Variations of AIC are common when sample sizes are not large (AICC), when there is overdispersion in the data (quasi-AIC, QAIC), or when one wishes to penalize the number of parameters in a model more strongly (Bayesian information criterion, BIC).
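The AIC trade-off between fit and complexity can be sketched with hypothetical log-likelihoods (the values below are invented for illustration, not taken from any model in this chapter):

```python
# Sketch: AIC = -2 log L + 2k penalizes extra parameters; the model with
# the smaller AIC is preferred even if its likelihood is slightly lower.
models = {
    "M1 (k=3)": {"loglik": -72.4, "k": 3},  # hypothetical values
    "M2 (k=6)": {"loglik": -70.9, "k": 6},
}
for name, m in models.items():
    aic = -2 * m["loglik"] + 2 * m["k"]
    print(f"{name}: AIC = {aic:.1f}")

# M1: AIC = 150.8; M2: AIC = 153.8 -> the simpler M1 is preferred despite
# a slightly worse likelihood.
```

The same bookkeeping underlies AICC, BIC, and the other criteria listed above; they differ only in the form of the penalty term.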

# 4.9 Fitting the Model

The mathematics behind a GLMM is quite complex. It is difficult to conceptualize the use of constructs such as distributions, link functions, log likelihood, and quasi-likelihood when fitting a model. Perhaps the following points will help explain the modeling process.


The key concepts of proc GLIMMIX are (1) it uses a distribution to estimate the model parameters; it does not fit the data to a distribution, and (2) the data values are not transformed by the link function; the inverse link function converts the estimated means (least squares means) from the model scale back to the data scale.

# 4.10 Exercises

	- (a) According to the response variable, what type(s) of probability distribution do you suggest for the variable?
	- (b) Construct a GLMM to study the effect of treatments on seed germination.
	- (c) Analyze the dataset according to the model proposed in (b). Is the probability distribution proposed in (a) adequate?
	- (d) Is there a significant difference in the proportion of germinated seeds between treatments?


Table 4.6 Number of seeds not germinating (out of 50)


Table 4.7 Control of cockchafer larvae

	- (a) Considering the type of response in this exercise, what type(s) of probability distribution(s) do you suggest for this type of response?
	- (b) Construct a GLMM to study the effect of treatments and the age of cockchafer larvae.
	- (c) Analyze the dataset according to the model proposed in (b).
	- (d) Is the model used in (c) sufficient? If so, discuss your findings.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Chapter 5 Generalized Linear Mixed Models for Counts

# 5.1 Introduction

Data in the form of counts regularly appear in studies in which the number of occurrences is investigated, such as the number of insects, birds, or weeds in agricultural or agroecological studies; the number of plants transformed or regenerated using modern breeding techniques; the number of individuals with a certain disease in a medical study; and the number of defective products in a quality improvement study, among others. Counts may be recorded per unit of time, area, or volume. When using a generalized linear model (GLM) with a Poisson distribution, it is often found that there is excess dispersion (extra variation) that is not captured by the Poisson model. In these cases, the data can be modeled with a negative binomial distribution, which has the same mean as the Poisson distribution but a variance greater than the mean. Most experiments have some form of structure due to the experimental design (completely randomized design (CRD), randomized complete block design (RCBD), incomplete block, or split-plot design) or the sampling design, which must be incorporated into the linear predictor to adequately model the data.

# 5.2 The Poisson Model

A Poisson distribution with parameter λ belongs to the exponential family and is a discrete random variable, whose probability function is equal to

$$f(\mathbf{y}) = \frac{e^{-\lambda}\lambda^y}{\mathbf{y}!}; \lambda > 0, \mathbf{y} = 0, 1, 2, \cdots.$$

The mean and variance of a Poisson random variable are equal, i.e., E( y) = Var ( y) = λ. A Poisson distribution is often used to model responses that are "counts." As λ increases, the Poisson distribution becomes more symmetric and eventually it can be reasonably approximated by a normal distribution.

Let yij be the value of the count variable associated with unit i at level one and with unit j at level two, given a set of explanatory variables. Therefore, we can express this as

$$f\left(y\_{ij}\right) = \frac{e^{-\lambda\_{ij}}\lambda\_{ij}^{y\_{ij}}}{y\_{ij}!},\quad y\_{ij} = 0, 1, 2, \cdots$$

and the logarithm of the likelihood is given by:

$$\log f\left(y\_{ij}\right) = \log \left( \frac{e^{-\lambda\_{ij}} \lambda\_{ij}^{y\_{ij}}}{y\_{ij}!} \right) = -\lambda\_{ij} + y\_{ij} \log \left( \lambda\_{ij} \right) - \log \left( y\_{ij}! \right).$$
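The simplified log-likelihood form above can be checked numerically; the following Python sketch (with an arbitrary count y and mean λ) evaluates the Poisson log-density both directly and via the expanded expression:

```python
import math

# Check that the direct Poisson log-density equals the simplified form
# -lambda + y*log(lambda) - log(y!) derived above. Values are arbitrary.
y, lam = 4, 2.5

direct = math.log(math.exp(-lam) * lam ** y / math.factorial(y))
simplified = -lam + y * math.log(lam) - math.log(math.factorial(y))

print(f"direct = {direct:.6f}, simplified = {simplified:.6f}")
assert abs(direct - simplified) < 1e-10
```

In practice the simplified form is preferred because it avoids the underflow of exp(-λ)λ^y for large counts.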

A Poisson distribution has very particular mathematical properties that are used when we model "counts." For example, the expected value of y is equal to the variance of y, such that

$$E\left(y\_{ij}\right) = \operatorname{Var}\left(y\_{ij}\right) = \lambda\_{ij}$$

Then, λij is necessarily a nonnegative number, which could lead to difficulties if we consider using the identity link function in this context. The natural logarithm is therefore mainly used as the link function for expected counts. For single-level (one-factor) data, a Poisson regression model for the natural logarithm of the counts, log(λi), is used, whereas for multilevel data (more than two factors), mixed models with Poisson responses, modeling log(λij), are a better choice.

Suppose that, given the random effects b, the counts y1, y2, ⋯, yn are conditionally independent such that yij | bj ~ Poisson(λij), where

$$
\log \left( \lambda\_{ij} \right) = \eta + \tau\_i + b\_j.
$$

This is a special case of a generalized linear mixed model (GLMM) in which the link function of this family of distributions is g(λij) = log (λij). The dispersion parameter ϕ, in this case, is equal to 1.

Sometimes, if the counts are extremely large, their distribution can be approximated by a continuous distribution. If all the counts are large enough, the square root of the counts can be used to fit the model, as this transformation stabilizes the variance. However, as mentioned in previous chapters, the estimation process under normality can be problematic, as it can produce negative fitted values and predictions, which is illogical for counts.
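The variance-stabilizing property of the square root can be verified by simulation. The following Python sketch (self-contained, using a simple textbook Poisson sampler; all settings are illustrative) shows that Var(Y) grows with the mean while Var(√Y) stays near 1/4:

```python
import math
import random

random.seed(1)

def poisson_sample(lam, n):
    # Knuth's multiplication algorithm; adequate for moderate lam in a sketch.
    out = []
    for _ in range(n):
        L, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= random.random()
            if p <= L:
                break
            k += 1
        out.append(k)
    return out

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Var(Y) tracks the mean, but Var(sqrt(Y)) is roughly constant (about 1/4)
# once the counts are large, which is why the square-root transformation
# was traditionally used to stabilize variance.
for lam in (10, 25, 50):
    ys = poisson_sample(lam, 5000)
    print(f"lambda={lam:3d}  Var(Y)={variance(ys):6.2f}  "
          f"Var(sqrt(Y))={variance([math.sqrt(y) for y in ys]):.3f}")
```

Even so, as the text notes, the transformed-data route still permits negative predictions, which is why the GLMM with a log link is preferred.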

# 5.2.1 CRD with a Poisson Response

A CRD is a design in which a fixed number of t treatments is randomly assigned to r experimental units. The linear predictor describing the mean structure of this GLM is

$$
\eta\_{ij} = \eta + \tau\_i
$$

where ηij denotes the link function value for the jth observation of the ith treatment, η is the intercept, and τi is the fixed effect due to treatment i (i = 1, 2, ⋯, t; j = 1, 2, ⋯, ri), with t treatments and ri replicates of treatment i.

Example Effect of a subculture on the number of shoots during micropropagation of sugarcane.

The objective of micropropagation in sugarcane is to produce vegetative material identical to the donor so that its genetic integrity is preserved. Despite this, somaclonal variation has been observed in plants derived from in vitro culture regardless of explant, variety, ploidy level, number of subcultures, and generation route used, among others. A total of 8 explants were planted in temporary immersion bioreactors (explant/bioreactor) to determine whether the number of subcultures (10 subcultures) influences the number of shoots observed per explant. In this example, we have ri observations ( j = 1, 2, ...,ri) on each of the 10 subcultures (i = 1, 2, ..., 10) in a completely randomized design (Appendix 1: Data: Subcultures). The analysis of variance (ANOVA) table (Table 5.1) for this model is given below:

The components of the GLM are set out below:

Distribution: yij ~ Poisson(λij)
Linear predictor: ηij = η + τi
Link function: log(λij) = ηij

where yij denotes the number of sprouts observed in subculture i explant j (i = 1, 2, ⋯, 10; j = 1, 2, ⋯, 8), ηij is the ijth link function, η is the intercept, and τ<sup>i</sup> is the fixed effect of subculture i.


Table 5.1 Analysis of variance


The following Statistical Analysis Software (SAS) code allows analyzing a CRD with a Poisson response.

```
proc glimmix data=sugar method=laplace;
class rep sub;
model nb=sub/dist=poisson s link=log;
lsmeans sub/lines ilink;
run;quit;
```
While most of the commands used have been explained before, the options "dist," "s," and "link" in the model statement tell SAS the type of data distribution, request the fixed effects solution, and specify the link function to use, respectively. In addition, the "lines" option in the "lsmeans" (least squares means) statement asks the GLIMMIX procedure for mean comparisons, and the "ilink" option applies the inverse link function.

Part of the output is shown in Table 5.2, where part (a) shows the model and the methods used to fit the statistical model, whereas part (b) lists the dimensions of the relevant matrices in the model specification.

Due to the absence of random effects in this model, there are no columns in matrix Z. The 11 columns in matrix X comprise an intercept and 10 columns for the effect of subcultures.

The goodness-of-fit statistics of the model are shown in part (a) of Table 5.3. The value of the generalized chi-squared statistic over its degrees of freedom (DF) is less than 1 (Pearson's chi-square/DF = 0.79). This indicates that there is no overdispersion and that the variability in the data has been adequately modeled with the Poisson distribution.

Subsection (b) of Table 5.3 shows the maximum likelihood (ML) ("Estimate"), parameter estimates, standard errors, and t-tests for the hypothesis of the parameters.

### 5.2 The Poisson Model 133


Table 5.3 Fit statistics and estimated parameters

(a) Fit statistics (Akaike's information criterion (AIC), a small sample bias Corrected Akaike's

Table 5.4 (part (a)) shows significance tests for the fixed effects in the model "Type III fixed effects tests." These tests are Wald tests and not likelihood ratio tests. The effect of a subculture on the number of shoots is highly significant in this model with a value of P < 0.0001, indicating that the 10 subcultures do not produce the same number of shoots, that is, the number of subcultures affects the average shoot production in the explant.

The least squares means obtained with "lsmeans" (part (b) in Table 5.4) are the values under the column "Estimate," which, along with their standard errors, were calculated with the linear predictor ηi = η + τi. These estimates are on the model scale, whereas the values in the "Mean" column and their respective standard errors are on the data scale; they were obtained by applying the inverse link, λi = exp(ηi).

A comparison of means, using the option "lines," is presented in Fig. 5.1. In this figure, we can see that in the first subcultures, the average production is minimal but it increases as subcultures increase from 5 to 8, and, in subculture 9, the average number of shoots per explant begins to decrease.


Table 5.4 Type III tests of fixed effects and least squares means (means)

Fig. 5.1 Average number of shoots per subculture. Bars with different letters are statistically different using α = 0.05

# 5.2.2 Example 2: CRDs with Poisson Response

Researchers want to determine whether the application of a new growth compound to walnut trees changes the number of nuts produced per tree. The compound was applied at three different times (pre-flowering = 1, flowering = 2, and post-flowering = 3) and



in two formulations (A and B), plus a control (C) in which no compound was applied. In total, seven treatments (Trt) were randomly assigned to the experimental units (trees), i.e., 35 trees, in a rectangular arrangement. The number of nuts y_ij observed for each formulation and time of application is provided in Table 5.5.

The components of the GLMM are listed below:

$$\begin{aligned} \text{Distribution}: \; & y_{ij} \mid r_j \sim \text{Poisson}(\lambda_{ij}), \quad r_j \sim N(0, \sigma^2_{\text{tree}})\\ \text{Linear predictor}: \; & \eta_{ij} = \eta + \tau_i + r_j\\ \text{Link function}: \; & \log(\lambda_{ij}) = \eta_{ij} \end{aligned}$$

where y_ij denotes the number of nuts in treatment i on tree j (i = 1, 2, ⋯, 7; j = 1, 2, ⋯, 5), η_ij is the linear predictor, η is the intercept, τ_i is the fixed effect due to treatment i, and r_j is the random effect due to tree j.
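Under assumed parameter values (hypothetical, not estimates from these data), the data-generating process of this GLMM can be sketched by simulation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical intercept eta, 7 treatment effects tau_i, and
# replicate-to-replicate standard deviation (not the book's estimates)
eta = 4.0
tau = np.array([0.0, 0.1, 0.2, -0.1, 0.15, 0.05, -0.2])
sigma_rep = 0.2

counts = np.zeros((7, 5), dtype=int)
for j in range(5):                        # replicates (rep in the SAS code)
    r_j = rng.normal(0.0, sigma_rep)      # random effect shared within rep j
    for i in range(7):                    # treatments
        lam = np.exp(eta + tau[i] + r_j)  # log link: lambda_ij = exp(eta_ij)
        counts[i, j] = rng.poisson(lam)
```

Conditional on r_j the counts are Poisson; marginally, the shared random effect induces extra variability, which is exactly what the variance component captures.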

The following SAS statements allow a GLMM to be fitted in a completely randomized design with a Poisson response variable.

```
proc glimmix data=crd_nuez nobound method=laplace;
class trt rep;
model count = trt/dist=Poi link=log;
random rep;
lsmeans trt/lines ilink;
run;
```
The dist and link options in the model statement tell SAS the distribution of the data and the link function, respectively. In the "lsmeans" (least squares means) statement, the "lines" option requests that the GLIMMIX procedure display the mean comparisons using letter groupings, and the "ilink" option applies the inverse of the link function to report the means on the data scale.

Part of the results is presented in Table 5.6. The value of the statistic for the conditional distribution (part (a)) indicates that there is a strong overdispersion (Pearson's chi-square/DF = 3.62), and the variance component estimate due to sampling in the experimental units (trees) is σ²_tree = 0.035 (part (b)).


In addition, Table 5.6 (part (c)) shows the type III tests of fixed effects, indicating that there is a significant difference between treatments in the average number of nuts per tree (P = 0.0001). However, it is not recommended to continue with the inference and analysis of the experiment due to the presence of extra variance (commonly known as overdispersion; Pearson's chi-square/DF = 3.62) in the data, which strongly affects the F-test and the standard errors of the means.

A highly effective alternative for dealing with overdispersion in the data is to use a distribution other than the Poisson. A negative binomial distribution is an excellent option for count data with overdispersion, assuming that the conditional distribution of the observations is given by:

$$y_{ij} \mid r_j \sim \text{Poisson}(\lambda_{ij}),$$

where λ_ij ~ Gamma(1/ϕ, ϕ), with ϕ as the scale parameter, and r_j ~ N(0, σ²_tree). The resulting new GLMM is:

$$\begin{aligned} \text{Distribution}: \; & y_{ij} \mid r_j \sim \text{Negative Binomial}(\lambda_{ij}, \phi), \quad r_j \sim N(0, \sigma^2_{\text{tree}})\\ \text{Linear predictor}: \; & \eta_{ij} = \eta + \tau_i + r_j\\ \text{Link function}: \; & \log(\lambda_{ij}) = \eta_{ij} \end{aligned}$$
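The Gamma–Poisson construction behind this model can be checked by simulation: taking the Gamma rate to have mean λ and variance ϕλ² (shape 1/ϕ, scale ϕλ) yields marginal counts whose variance is close to λ + ϕλ² rather than λ. A sketch with hypothetical λ and ϕ:

```python
import numpy as np

rng = np.random.default_rng(7)
lam, phi, n = 5.0, 0.5, 200_000  # hypothetical mean and scale parameter

# Gamma rate with mean lam and variance phi*lam**2 (shape 1/phi, scale phi*lam)
rates = rng.gamma(shape=1.0 / phi, scale=phi * lam, size=n)
y = rng.poisson(rates)  # marginally negative binomial

mean_y = y.mean()  # close to lam = 5.0
var_y = y.var()    # close to lam + phi * lam**2 = 17.5
```

The simulated variance exceeding the mean is precisely the extra dispersion that the Poisson model cannot absorb and the negative binomial can.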

The following GLIMMIX statements fit this model under a negative binomial distribution in a CRD.

```
proc glimmix data=crd_nuez nobound method=laplace;
class trt rep;
model count = trt/dist=Negbin link=log;
random rep;
lsmeans trt/lines ilink;
run;
```




Part of the results is listed below. The information criteria in Table 5.7 part (a) are helpful in choosing which model best fits the dataset. Clearly, the negative binomial distribution provides the best fit to these data. On the other hand, in the conditional fit statistics (part (b)), we observed that the Poisson model had a strong overdispersion (Pearson's chi-square/DF = 3.62) and that, by fitting the data under a negative binomial distribution, the overdispersion was removed (Pearson's chi-square/DF = 0.91).

Table 5.8 shows the variance component estimates (part (a)) and the type III tests of fixed effects (part (b)). The estimated variance parameter due to trees is σ²_tree = 0.04288, and the estimated scale parameter (Scale) is ϕ = 0.06141. The type III tests of fixed effects (part (b)) show that there is a highly significant effect of treatments on the average number of nuts (P < 0.0001).

The values under the column "Estimate" are the estimates of the linear predictor η_i (the model scale), and the values under "Mean" are the means λ_i (the data scale) with their respective standard errors, obtained with the "lsmeans" command and the "ilink" option (Table 5.9). The results show that the treatments implemented in this experiment produced a higher average number of walnuts than the control treatment C. In general, formulation B applied to the walnut trees at the full-flowering stage showed the highest nut production.


Table 5.9 Estimates on the model scale ("Estimate") and means on the data scale ("Mean")

Interest often arises in the agricultural and biological sciences to conduct experiments that involve random effects (blocks, locations, etc.) and response variables that are not normally distributed. For example, suppose that a certain number of treatments are being tested at different locations, randomly selected out of a sufficiently large number of locations. At each location, the experimental units are randomly assigned to each of the treatments. Let y_ij be the number of (observed) individuals possessing the characteristic of interest in the ith treatment in the jth block. The model for the mean structure of this experiment is

$$
\eta_{ij} = \eta + \tau_i + b_j
$$

where η is the intercept, τ_i is the fixed effect due to the ith treatment, and b_j is the random effect of block j, with b_j ~ N(0, σ²_block).

# 5.2.3 Example 3: Control of Weeds in Cereal Crops in an RCBD

One of the main problems when growing cereal crops is the competition between weeds and seedlings. A field supervisor is interested in testing five treatments plus a control for weed control in cereal crops, using a randomized complete block design with four blocks. Table 5.10 shows the number of weed plants (y_ij) observed in each treatment, with the treatment number given in parentheses.

Table 5.11 shows the sources of variation and the degrees of freedom of a randomized complete block design used in this experiment.

Since the response is count, it will be modeled using a GLMM with a Poisson response variable, which is stated below:

$$\begin{aligned} \text{Distribution}: & y\_{ij} \mid b\_j \sim \text{Poisson}(\lambda\_{ij})\\ b\_j &\sim N(0, \sigma\_{\text{block}}^2) \end{aligned}$$


Table 5.10 Number of weeds in each treatment (the number in parentheses corresponds to the treatment number)

Table 5.11 Analysis of variance


Linear predictor: η_ij = η + τ_i + b_j

Link function: log(λ_ij) = η_ij

where y_ij denotes the number of weed plants observed in treatment i and block j (i = 1, 2, ⋯, 6; j = 1, 2, 3, 4), η_ij is the linear predictor, η is the intercept, τ_i is the fixed effect due to treatment i, and b_j is the random block effect, b_j ~ N(0, σ²_block).

Using the GLIMMIX procedure, the following syntax specifies the analysis of a GLMM with a Poisson response.

```
proc glimmix nobound method=laplace;
class Block Trt;
model Count = Trt/dist=Poisson s;
random block;
lsmeans Trt/diff lines ilink;
run; quit;
```
Note that in the above syntax, we use "method=laplace" (or, alternatively, "method=quadrature") to fit the mixed model and obtain the chi-square/DF fit statistic for the conditional distribution. If the integration method is not specified, then only a generalized chi-square/DF statistic is obtained. The auxiliary options after the "lsmeans" command are described below: "diff" provides paired comparisons between treatments, "lines" presents the pairwise comparison of means using letters, and "ilink" provides the value of the inverse of the link function. Some of the outputs are listed below.

Table 5.12 (a) presents the basic information about the model and estimation procedure used.

Subsection (b) of Table 5.12 lists the "Dimensions" of the relevant matrices used in the model. The random effects matrix Z has four columns due to blocks, and the fixed effects matrix X has one column for the intercept plus six columns due to treatments.


The "Fit statistics" and "Fit statistics for conditional distribution" (parts (a) and (b) of Table 5.13, respectively) show information about the fit of the GLMM. The generalized chi-squared statistic measures the residual sum of squares in the final model, and its ratio to the degrees of freedom is a measure of the variability of the observations around the fitted means.

The value of Pearson's chi-square/DF for the conditional distribution is 11.8, well above 1. This value gives strong evidence of overdispersion in the dataset. In other words, it calls our distribution and linear predictor assumptions into question, suggesting that the variance function was not adequately specified.


Table 5.14 Variance component estimates, parameter estimates, and type III tests of fixed effects

Table 5.15 Estimated least squares means ("Mean")


The F-test for testing H_0: τ_1 = τ_2 = ⋯ = τ_6, or equivalently μ_1 = μ_2 = ⋯ = μ_6, indicates that there is a highly significant difference (P < 0.0001) in the average number of weeds in at least one treatment (part (c)) (Table 5.14).

The estimates of the linear predictor on the model scale for each of the treatments (η_i) and the inverse of the linear predictor (λ_i) on the data scale, with their respective standard errors, are calculated as η_i = η + τ_i and λ_i = exp(η_i), respectively. These values are listed in Table 5.15.

The "plots" option in the "proc GLIMMIX" statement creates a set of plots for the raw residuals, Pearson residuals, and studentized residuals.

The panel consists of a plot of studentized residuals versus the linear predictor (η_i), a histogram of the residuals with a normal density superimposed, a quantile plot of the residuals, and a box plot of the residuals. The panel of studentized residuals indicates the possibility of a slightly skewed distribution (Fig. 5.2). In this figure, we can see that the spread of the residuals changes with the values of the linear predictor, indicating that the assumption of constant variance is no longer met.

Fig. 5.2 Studentized conditional residuals

The residuals–quantiles plot confirms the constant variance violation. A nonconstant variance may also suggest an incorrect choice of the response distribution or variance function.

# 5.2.4 Overdispersion in Poisson Data

Linear mixed models assume that the observations have a normal distribution, conditional on the model effects. In addition, the mean μ is independent of the variance σ², whereas, in most GLMMs that assume a binomial or Poisson distribution, the dispersion (scale) parameter is fixed at 1; that is, if the mean is known, then we assume that the variance is also known. Overdispersion is the extra variability not predicted by a generalized linear model's random component. It can occur because the mean and variance of a GLM are related, depending on the same parameter that is being predicted through the linear predictor. If overdispersion is present in a dataset, then the estimated standard errors and test statistics of the overall goodness of fit will be distorted and adjustments must be made. In other words, when there is overdispersion in a dataset, the standard errors of the estimated parameters are too small, which leads to test statistics for the model parameters that are too large (i.e., the type I error rate increases).

Overdispersion can be caused by several factors: omission of predictor variables from the model, high correlation among observations due to nested effects, misspecification of the systematic component, or an incorrect distribution for the data. Such deviations may result from incorrect assumptions about the stochastic and/or systematic component of the model, and the model may also fit the dataset poorly because of an incorrect choice of the link function. Systematic deviations may likewise result from a lack of random effects or from non-independence of the observations; including the appropriate random factors generally addresses these violations.

According to Stroup (2013), overdispersion occurs when the variance exceeds the theoretical variance under the distribution model of the data. For any distribution with a nontrivial variance function, overdispersion is theoretically possible; distributions belonging to the one-parameter exponential family, such as the Poisson, are especially vulnerable because they lack a scale parameter to moderate the mean–variance relationship. In summary, overdispersion occurs when the observed variability exceeds what the assumed distribution implies for the fitted mean.


If we do not account for overdispersion, we underestimate the standard errors and inflate the statistical tests, causing the type I error rate to inflate and the confidence intervals to be unreliable. Fig. 5.3 shows that, as the predicted mean μ increases, the residuals have a larger spread in the plot, indicating that the variance may increase as a function of the mean, whereas Fig. 5.4 shows a nonconstant variance.
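One of the causes listed earlier, an omitted random effect, can be seen directly in a small simulation: counts that are Poisson conditional on a latent normal effect are overdispersed marginally once that effect is ignored (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(11)

# Poisson conditional on a latent (omitted) normal effect on the log scale
b = rng.normal(0.0, 0.5, size=100_000)  # e.g., an unmodeled block effect
y = rng.poisson(np.exp(2.0 + b))

# Marginally, the variance exceeds the mean: overdispersion
dispersion = y.var() / y.mean()
```

A marginal Poisson model for y would report a variance-to-mean ratio well above 1, even though the conditional model is exactly Poisson.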

In the fit statistics obtained under the GLMM with the Poisson distribution (part (b), Table 5.13), the value of Pearson's chi-square/DF = 11.8 indicates that there is a strong overdispersion in the dataset. Another symptom in the output is the value of the F statistic (F = 523.57) tabulated in part (c) of Table 5.14; a value this large may indicate that the fit is incorrect. Once the researcher has detected overdispersion, he/she must consider a strategy to remedy it. There are three possible alternatives for evaluating and handling overdispersion, which we review below.

Fig. 5.3 Conditional residuals versus predicted values on the data scale

### 5.2.4.1 Using the Scale Parameter

The first alternative is to add a scale parameter, replacing Var(y_ij | b_j) = λ_ij with Var(y_ij | b_j) = ϕλ_ij. This consists of replacing the logarithm of the conditional likelihood, y_ij log(λ_ij) − λ_ij − log(y_ij!), by the quasi-likelihood [y_ij log(λ_ij) − λ_ij]/ϕ, assuming that ϕ > 1 could adequately model the observed variance.
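The arithmetic of this adjustment can be written out directly: ϕ is estimated by the chi-square statistic over its degrees of freedom, standard errors are inflated by √ϕ̂, and F statistics are deflated by ϕ̂. A sketch (the standard error and degrees of freedom are hypothetical; the F value and scale estimate match those reported later for these data):

```python
import math

def quasi_poisson_adjust(se_model, f_model, chisq, df):
    """Scale-parameter adjustment: phi_hat = chi-square / df; standard
    errors are multiplied by sqrt(phi_hat), F values divided by phi_hat."""
    phi_hat = chisq / df
    return se_model * math.sqrt(phi_hat), f_model / phi_hat, phi_hat

# Hypothetical SE and df; F = 523.57 and phi_hat = 19.4848 as in the text
se_adj, f_adj, phi_hat = quasi_poisson_adjust(
    se_model=0.10, f_model=523.57, chisq=19.4848 * 15, df=15)
```

With these inputs, f_adj recovers the adjusted F of about 26.87 discussed below.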

The following GLIMMIX syntax invokes this alternative of adding a scale parameter under a Poisson response variable.

```
proc glimmix;
class Block Trt;
model Count = Trt/dist=Poisson;
random intercept/subject=block;
random _residual_;
lsmeans Trt/ ilink ;
run;
```
The SAS code is highly similar to that previously used, with the addition of the "random \_residual\_" statement. Note that the Laplace integration method ("method=laplace") has been removed, which causes the estimation to be performed by the pseudo-likelihood (PL) method; the scale parameter is estimated and used to adjust the standard errors and test statistics.

Fig. 5.4 Residuals on the model scale

The GLIMMIX procedure uses the generalized chi-square divided by its degrees of freedom (generalized chi-square/DF = ϕ) as the estimate of the scale parameter. All standard errors are multiplied by √ϕ, and all F values are divided by ϕ. Table 5.16 shows part of the results.

In Table 5.16, we observe the fit statistics (part (a)), the covariance parameter estimates (part (b)), and the value of the scale parameter, which is equal to ϕ = 19.4848 (Residual (VC)). The value of the F statistic under the Poisson distribution in this analysis is 26.87 (part (c)); this value is obtained by dividing the F value from the previous analysis by the scale estimate (523.57/19.4848 ≈ 26.87). The results indicate that, even under this adjustment, overdispersion remains, and the estimated dispersion increases from 11.8 to 19.4848 (part (a)). The inclusion of the scale parameter affects the variance estimate due to blocks, σ²_block, as well as the estimates of treatment means (part (d)), but the main impact is on the standard errors.

The inclusion of the scale parameter implies working with a quasi-likelihood; that is, there is no true likelihood for the model, only a specification that the conditional mean is λ and the variance is ϕλ.


Table 5.16 Results of the adjustment by adding the scale parameter

### 5.2.4.2 Linear Predictor Review

With count and binomial response variables, it is important to check whether the linear predictor is correctly specified, that is, whether the response is also affected by random variation among the experimental units within blocks. If λ_ij is affected by the experimental units within blocks, then the ANOVA table should include the block × treatment source of variation, and this term must be specified in the linear predictor of the GLMM. Thus, the linear predictor is specified as

$$\eta_{ij} = \eta + \tau_i + b_j + (b\tau)_{ij}$$

$$\begin{aligned} \text{Distribution}: \; & y_{ij} \mid b_j, (b\tau)_{ij} \sim \text{Poisson}(\lambda_{ij})\\ & b_j \sim N(0, \sigma^2_{\text{block}})\\ & (b\tau)_{ij} \sim N(0, \sigma^2_{\text{block} \times \text{Trt}}) \end{aligned}$$

$$\text{Linear predictor: } \eta\_{ij} = \eta + \tau\_i + b\_j + (b\tau)\_{ij}$$

Link function: log λij = ηij:

The following GLIMMIX program allows the above model to be adjusted:


Table 5.17 Results of the fit by redefining the predictor of the model

```
proc glimmix method=laplace;
class Block Trt;
model Count = Trt/dist=Poisson;
random intercept Trt/subject=block;
lsmeans Trt/ ilink ;
run;
```
Part of the output is shown in Table 5.17. The results tabulated in part (a) indicate that the overdispersion has been eliminated (Pearson's chi-square/DF = 0.11), but there is now a risk of underestimating the variance; for this reason, it is highly recommended that this value be close to 1. The estimated variance components (part (b)) for blocks and block × treatment are σ²_block = 0.05969 and σ²_block×Trt = 0.1152, respectively.

The type III tests of fixed effects are highly significant (P = 0.0001), indicating that the six treatments are not equally effective in weed control (part (c)). The values in part (d) under the "Mean" column are the means on the original scale of the data for each of the treatments, with their respective standard errors. Compared with the previous analysis (using the scale parameter), the means do not vary much, but the standard errors show a more marked change.

### 5.2.4.3 Using a Different Distribution

Another way to address overdispersion when using a Poisson distribution is to change the assumed distribution of the response variable. A Poisson variable has equal mean and variance, but for count variables in the biological sciences this assumption is not always true. A negative binomial distribution is a good alternative (see Example 5.2), as previously discussed. A negative binomial variable has mean λ > 0 and variance λ + ϕλ², where ϕ > 0 is the scale parameter; that is, E(y) = λ and Var(y) = λ + ϕλ². The components of this model are shown below:

Given that y_ij | b_j ~ Poisson(λ_ij), it is assumed that λ_ij ~ Gamma(1/ϕ, ϕ), with ϕ as the scale parameter and b_j ~ N(0, σ²_block). The new specification of the resulting GLMM is as follows:

$$\begin{aligned} \text{Distribution}: \; & y_{ij} \mid b_j \sim \text{Negative Binomial}(\lambda_{ij}, \phi), \quad b_j \sim N(0, \sigma^2_{\text{block}})\\ \text{Linear predictor}: \; & \eta_{ij} = \eta + \tau_i + b_j\\ \text{Link function}: \; & \log(\lambda_{ij}) = \eta_{ij} \end{aligned}$$
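A note on parameterization: NumPy's negative binomial sampler uses (n, p) rather than (λ, ϕ); the conversion is n = 1/ϕ and p = 1/(1 + ϕλ). A sketch verifying E(y) = λ and Var(y) = λ + ϕλ² by simulation (λ and ϕ hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, phi = 8.0, 0.25  # hypothetical mean and scale parameter

# Convert (lam, phi) to NumPy's (n, p) parameterization
n_par = 1.0 / phi
p_par = 1.0 / (1.0 + phi * lam)

y = rng.negative_binomial(n_par, p_par, size=300_000)
# Moments: E(y) = lam = 8.0, Var(y) = lam + phi*lam**2 = 24.0
```

The simulated moments make the mean–variance relation of the negative binomial concrete: the variance is the Poisson variance λ plus the extra term ϕλ².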

The following GLIMMIX statements fit the model with a negative binomial distribution.

```
proc glimmix method=laplace;
class block Trt;
model count = Trt/dist=NegBin;
random block;
lsmeans Trt/ ilink ;
run;
```
Some of the most relevant outputs from GLIMMIX are presented in Table 5.18. The Pearson's chi-square/DF value of 0.88 (part (a)) shows that overdispersion in the dataset has been removed. The estimated scale parameter tabulated in part (b) (Scale) is ϕ = 0.1080. This is not the same scale parameter estimated under the Poisson model with the "random \_residual\_" statement, since the two are computed differently. However, as mentioned above, both scale parameters affect the relationship between the mean and variance in the Poisson and negative binomial models.

The value of the test statistic shown in part (c) of Table 5.18, under the negative binomial distribution for the effect of treatments, is highly similar to the value obtained with the Poisson distribution when the block × treatment interaction was added to the linear predictor. The values under "Estimate" are estimates of the linear predictor on the model scale (part (d)), whereas those under the "Mean" column are the treatment means on the data scale, using the negative binomial distribution. Of the three proposed alternatives for fitting these data, the last two (including the block × treatment interaction in the predictor and assuming a negative binomial distribution) provide a better fit.


Table 5.18 Fitting results by redefining the model structure

# 5.2.5 Factorial Designs

Many experiments involve studying the effects of two or more factors. Factorial designs are the most efficient for these types of experiments. In a factorial design, all possible combinations of factor levels are investigated in each replicate. If there are a levels of factor A and b levels of factor B, then each replicate contains all ab treatment combinations.
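Enumerating the ab treatment combinations is mechanical; a sketch for a 2 × 4 factorial like the example that follows:

```python
from itertools import product

genotypes = [1, 2]       # a = 2 levels of factor A
media = [1, 2, 3, 4]     # b = 4 levels of factor B

# Each replicate contains all a*b = 8 treatment combinations
combinations = list(product(genotypes, media))
```

Each replicate of the design receives every pair in `combinations`, in randomized order.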

### 5.2.5.1 Example: A 2 × 4 Factorial with a Poisson Response

This application refers to a factorial experiment involving explants from cotyledons of cucumber (Cucumis sativus L.) with two factors, i.e., genotype (two levels) and culture medium (four levels). Each of the eight combinations of genotype and culture medium levels was applied to four Petri dishes, each containing six leaf explants. The response variable was the number of buds in each of the leaf explants, i.e., a count. There are two sources of variation in this application, namely, variation between Petri dishes and variation between the explants within the Petri dishes (Table 5.19).

The sources of variation and degrees of freedom for this experiment are shown in Table 5.20.

The components that define this model are shown below:

$$\begin{aligned} \text{Distribution}: \; & y_{ijkl} \mid \text{petri.dish}_k, \text{explant}(\text{petri.dish})_{l(k)} \sim \text{Poisson}(\lambda_{ijkl})\\ & \text{petri.dish}_k \sim N(0, \sigma^2_{\text{petri.dish}})\\ & \text{explant}(\text{petri.dish})_{l(k)} \sim N(0, \sigma^2_{\text{explant(petri.dish)}}) \end{aligned}$$

$$\text{Linear predictor: } \eta_{ijkl} = \eta + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \text{petri.dish}_k + \text{explant}(\text{petri.dish})_{l(k)}$$

Link function: log λijkl = ηijkl

where η_ijkl is the linear predictor for genotype i (i = 1, 2), culture medium j (j = 1, 2, 3, 4), Petri dish k (k = 1, 2, 3, 4), and explant l (l = 1, ⋯, 6); η is the intercept; α_i is the fixed effect due to genotype i; β_j is the fixed effect due to culture medium j; (αβ)_ij is the effect of the interaction between genotype i and culture medium j; petri.dish_k is the random effect of the Petri dish; and explant(petri.dish)_{l(k)} is the random effect of the explant within the Petri dish, assuming petri.dish_k ~ N(0, σ²_petri.dish) and explant(petri.dish)_{l(k)} ~ N(0, σ²_explant(petri.dish)).

The following GLIMMIX procedure fits a factorial experiment with a Poisson response.

```
proc glimmix method=laplace;
* periods are not valid in SAS names, so petri_dish is used;
class genotype culture petri_dish explant;
model y = genotype|culture/dist=Poisson;
random petri_dish explant(petri_dish);
lsmeans genotype|culture/ilink lines;
run;
```
Some of the SAS output is shown in Table 5.21, with the fit statistics for this dataset in part (a). Note that "method=laplace" was used for the estimation process and to obtain Pearson's chi-square/DF fit statistic. The result indicates that there is evidence of overdispersion (Pearson's chi-square/DF = 1.84).

Overdispersion, as discussed before, implies more variability in the data than would be expected, potentially explaining the lack of fit of a Poisson model. Part (b) shows the variance component estimate due to Petri dishes, σ²_petri.dish = 0.003616, and that for the explants within Petri dishes, σ²_explant(petri.dish) = 0.01462. The type III tests of fixed effects indicate statistically significant effects of genotype, culture medium, and the interaction of both factors (part (c)).

The plot of residuals against the linear predictor in Fig. 5.5 provides further evidence of possible overdispersion.

The least squares means on the model scale for the genotype (part (a)), the culture medium (part (b)), and the interaction between both factors (part (c)) are listed under the "Estimate" column of Table 5.22, whereas under the "Mean" column are the means of these factors but in terms of the data.



Table 5.19 Number of buds counted in the cucumber experiment

Table 5.20 Sources of variation and degrees of freedom



Table 5.21 Conditional fit statistics, variance component estimates, and type III tests of fixed effects under the Poisson distribution

Since there is overdispersion in the data, we will fit the GLMM again using the negative binomial distribution. That is, under the following GLMM:

$$\begin{aligned} \text{Distribution}: \; & y_{ijkl} \mid \text{petri.dish}_k, \text{explant}(\text{petri.dish})_{l(k)} \sim \text{Negative Binomial}(\lambda_{ijkl}, \phi)\\ & \text{petri.dish}_k \sim N(0, \sigma^2_{\text{petri.dish}})\\ & \text{explant}(\text{petri.dish})_{l(k)} \sim N(0, \sigma^2_{\text{explant(petri.dish)}})\\ \text{Linear predictor}: \; & \eta_{ijkl} = \eta + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \text{petri.dish}_k + \text{explant}(\text{petri.dish})_{l(k)}\\ \text{Link function}: \; & \log(\lambda_{ijkl}) = \eta_{ijkl} \end{aligned}$$

and the scale parameter ϕ.

The following GLIMMIX program allows us to fit a GLMM with a negative binomial response variable.

```
proc glimmix;
class genotype culture petri_dish explant;
model y = genotype|culture/dist=NegBin link=log;
random petri_dish explant(petri_dish);
lsmeans genotype|culture/lines ilink;
run;
```
It should be noted that this program is very similar to the previous one; the only difference is that a negative binomial distribution is now used ("dist=NegBin"). Part of the results is presented in Table 5.23. As already mentioned, the negative binomial distribution is another model for count variables when there is overdispersion in the dataset. If Pearson's chi-square divided by its degrees of freedom is less than or equal to 1, then the overdispersion is zero or close to zero, which means that the model adequately captures the dispersion in the data.

Fig. 5.5 Studentized conditional residuals

Based on the conditional distribution, Pearson's chi-square/DF fit statistic (χ²/DF = 0.83) indicates that we have no evidence of overdispersion, justifying the negative binomial distribution as a better choice than the Poisson distribution implemented above. Part (b) shows that the estimated scale parameter is ϕ = 0.1712. This value is not the same as the parameter for the quasi-Poisson model obtained with the "random \_residual\_" statement. Note that the variance components were only slightly affected. Additionally, the type III tests of fixed effects in part (c) of Table 5.23 show a significant effect of genotype, culture medium, and the interaction between both factors (genotype*culture) on the number of buds per leaf explant.

The "lines" option in the "lsmeans" command is used to obtain Fisher's least significant difference (LSD) comparisons for both factors and their interaction. The means and their respective standard errors, on the model scale ("Estimate" column) and on the data scale ("Mean" column), are tabulated in Table 5.24 for genotype, in Table 5.25 for the culture medium, and in Table 5.26 for the interaction between both factors. The estimated values in the mean comparison for genotype (Table 5.24) correspond to the values of the linear predictor η_i on the model scale, whereas the means on the data scale are the λ_i values (part (a)); the comparison of means (on the model scale) is tabulated in part (b).


Table 5.22 Estimates on the model scale and means on the data scale under the Poisson distribution

Table 5.23 Conditional fit statistics, variance component estimates, and type III tests of fixed effects under the negative binomial distribution



Table 5.24 Estimates on the model scale and means on the data scale under the negative binomial distribution

Table 5.25 Means estimates on the model scale and data scale for the culture medium


(b) T grouping of culture least squares means (α=0.05)

LS means with the same letter are not significantly different


For the culture medium (Table 5.25), the estimated values in this comparison of means correspond to the values of the linear predictor η_j (on the model scale); applying the inverse link to η_j gives the values under the "Mean" column, which are the means on the data scale (part (a)). The mean comparisons on the model scale are shown in part (b).

The results indicate that culture media 2 and 3 produced statistically similar average numbers of buds, which differed from those obtained in culture media 1 and 4 (see Fig. 5.6).

The average number of buds for the interaction between both factors and the corresponding mean comparisons are shown in Table 5.26.


Table 5.26 Estimates on the model scale and means on the data scale for the interaction between genotype and culture medium

Fig. 5.6 Comparison of the average number of buds as a function of the type of culture medium (LSD, α = 0.05)

The values under "Estimate" (Table 5.26) correspond to those of the linear predictor η_ij (model scale), whereas the values under "Mean" correspond to the means λ_ij on the data scale.

Graphically, Fig. 5.7 shows that genotype 1 in culture medium 2 provides the highest number of buds, whereas the lowest number of buds was observed in culture medium 4. For genotype 2, the highest number of buds was observed in culture media 2 and 3. Finally, culture medium 4 is less suitable for both genotypes.

Fig. 5.7 Effect of the cultivar × culture medium interaction on the average number of buds (LSD, α = 0.05)

# 5.2.6 Latin Square (LS) Design

A Latin square (LS) design is used when heterogeneity is associated with the crossing of two blocking factors, generally both with the same number of levels. This design was originally used in agricultural experimentation, with plots placed in a square arrangement and heterogeneity expected along both the rows and the columns of the square; the design therefore blocks in two directions at once. When blocking in two directions is appropriate, an LS design is a good option. Some examples are provided below to illustrate the use of this experimental design:


For an LS layout, the number of rows (r) and columns (c) must equal the number of treatments (t), which is also the number of replicates of each treatment. Treatments are assigned such that each treatment appears exactly once in each row and each column, so every row and every column contains a full set of treatments. Thus, the treatment effect estimates are independent of differences between rows or columns, and rows, columns, and treatments are mutually orthogonal.
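The defining constraint — each treatment exactly once in every row and every column — is easy to verify programmatically. The following sketch (in Python rather than SAS, using a simple cyclic construction; a real experiment would also randomize rows, columns, and treatment labels) illustrates the property:

```python
def cyclic_latin_square(t):
    """Build a t x t Latin square by cyclically shifting treatment labels."""
    return [[(i + j) % t for j in range(t)] for i in range(t)]

square = cyclic_latin_square(4)

# Every row and every column contains each treatment exactly once.
for row in square:
    assert sorted(row) == [0, 1, 2, 3]
for k in range(4):
    assert sorted(square[i][k] for i in range(4)) == [0, 1, 2, 3]
```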

The analysis of variance for this experimental design, assuming that there are r rows, c columns, and t treatments, with r = c = t, contains the following sources of variability (Table 5.27).

From the analysis of variance table, the linear model for an LS design with t treatments is as follows:

$$y\_{ijk} = \mu + f\_j + c\_k + \tau\_i + \varepsilon\_{ijk}$$

where $y_{ijk}$ is the response observed for treatment i in row j and column k, μ is the overall mean, $f_j$ is the random effect of row j, assuming $f_j \sim N(0, \sigma_f^2)$, $c_k$ is the random effect of column k with $c_k \sim N(0, \sigma_c^2)$, $\tau_i$ is the fixed effect of treatment i, and $\varepsilon_{ijk}$ is the random error term, distributed $N(0, \sigma^2)$. Note that the treatments are allocated to the jk-th cell (in row j and column k).

### 5.2.6.1 Latin Square Design with a Poisson Response

In a series of field experiments, several "inducer–attractant" strategies were tested to control insect pests in oilseed rape. In one experiment, the use of wild turnip rape as an earlier-flowering trap crop (TR, the "attractant") was tested together with the use of a repellent (an antifeedant) applied to oilseed rape in spring (S, the "inducer"). Untreated oilseed rape (U) was included as a control. The experiment was set up as a 6 × 6 Latin square with two replicates of each of the three treatments per row and column. The number of mature pollen beetles was assessed on 10 plants per plot in early April, 1 day after spraying the repellent (antifeedant). The average number of adult beetles sampled on 10 plants per plot was recorded (Appendix 1: Data: Beetles). The question is: Is there evidence that the attractant or inducer works? That is, are fewer beetles present in the proposed treatments than in the control?

The model components that define this GLMM are as described below:

$$\begin{aligned} \text{Distribution: } y\_{ijkl} \mid f\_j, c\_k &\sim \text{Poisson}\left(\lambda\_{ijkl}\right) \\ f\_j &\sim N\left(0, \sigma\_f^2\right), c\_k \sim N\left(0, \sigma\_c^2\right) \end{aligned}$$

$$\begin{aligned} \text{Linear predictor: } & \eta\_{ijkl} = \eta + f\_j + c\_k + \tau\_i \end{aligned}$$

$$\begin{aligned} \text{Link function: } & \log\left(\lambda\_{ijkl}\right) = \eta\_{ijkl} \end{aligned}$$

where $\eta_{ijkl}$ is the linear predictor that relates the effect of replicate l (l = 1, 2) in row j (j = 1, 2, ⋯, 6) and column k (k = 1, 2, ⋯, 6) when treatment i (i = 1, 2, 3) is applied; η is the intercept; $\tau_i$ is the fixed effect of treatment i; $f_j$ is the random effect of row j; and $c_k$ is the random effect of column k, assuming that there is no interaction between rows and columns, between treatments and rows, or between treatments and columns. The assumed distributions for rows and columns are $f_j \sim N(0, \sigma_f^2)$ and $c_k \sim N(0, \sigma_c^2)$, respectively. The model uses the linear predictor ($\eta_{ijkl}$) to estimate the means ($\lambda_{ijkl} = \mu_{ijkl}$) of the treatments.

The following GLIMMIX program fits a Latin square design with a Poisson response:

```
Proc glimmix nobound method=laplace;
class Row Column Treatment;
model count = treatment/dist=Poi link=log;
random row column;
lsmeans treatment/lines ilink;
run;
```
Part of the output is shown in Table 5.28. Among the fit statistics (part (a)), the value of Pearson's chi-square divided by the degrees of freedom is less than 1 (χ²/DF = 0.55), indicating that there is no overdispersion in the data and that the Poisson distribution adequately models the dataset.
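The Pearson chi-square/DF diagnostic quoted here can be computed directly from the observed counts and the conditional fitted means. A minimal sketch in Python, using hypothetical counts and fitted values (not the beetle data):

```python
# Pearson chi-square / DF overdispersion check for a Poisson fit.
# Counts and fitted means below are hypothetical, for illustration only.
counts = [4, 7, 5, 6, 9, 3, 5, 8]
fitted = [5.2, 6.8, 4.9, 6.1, 8.5, 3.6, 5.0, 7.9]  # conditional means exp(eta)

pearson_chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted))
df = len(counts) - 3  # residual degrees of freedom (depends on the model)
ratio = pearson_chi2 / df

# Values near 1 are consistent with the Poisson assumption that the
# conditional variance equals the mean; values well above 1 suggest
# overdispersion and motivate, e.g., a negative binomial model.
```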

The type III tests of fixed effects in part (b) indicate that there is no significant evidence of differences between the treatments (P = 0.0621).

Part (c) of Table 5.28 shows the estimates of treatments on the model scale ("Estimate") and on the data scale ("Mean") with their respective standard errors. The values 4.6191, 6.9396, and 5.1561 (under the "Mean" column) correspond to the treatment means for S, TR, and U, respectively.
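Since the model uses a log link, the "Mean" column is simply the inverse link applied to the "Estimate" column. A quick numeric check of this relationship (a Python sketch, using the treatment means reported in Table 5.28):

```python
import math

# Under a log link, the data-scale mean is the inverse link applied to the
# model-scale estimate: lambda = exp(eta).
# Reported treatment means from Table 5.28 (S, TR, U):
means = {"S": 4.6191, "TR": 6.9396, "U": 5.1561}

# The corresponding model-scale estimates are the logs of these means:
estimates = {trt: math.log(m) for trt, m in means.items()}

# Applying the inverse link recovers the data-scale means.
for trt in means:
    assert abs(math.exp(estimates[trt]) - means[trt]) < 1e-9
```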

Table 5.28 Results of the analysis of variance

### 5.2.6.2 Randomized Complete Block Design in a Split Plot

Sometimes the researcher is interested in testing multiple factors using different experimental units, and, in most cases, the treatment combinations cannot be fully randomized. Suppose that one wishes to test two factors, A and B, with a and b levels, respectively. The levels of the first factor (A) are randomly applied to the primary experimental units. Then, the levels of the second factor (B) are applied to the secondary units formed within each primary unit. In other words, the primary experimental unit (whole plot) receives a level of the first factor and is then divided into secondary experimental units (subplots) that receive the levels of the second factor. Since the split-plot design has two sizes of experimental units, the whole plots (primary units) and subplots (secondary units) have different experimental errors. Split-plot experiments were invented in agriculture by Fisher (1925), and their importance in industrial experimentation has been widely recognized (Yates 1935).

As a simple illustration, consider a study of three pulp preparation methods (factor A) and four temperature levels (factor B) on the effect of paper tensile strength (paper quality). A batch of pulp is produced by one of the three methods; it is then divided into four equal portions (samples). Each portion is cooked at a specific level of temperature. The assignment of treatments to plots and subplots is shown in Table 5.29.

The standard ANOVA model for two factors in a split-plot design, in which there are three levels of factor A and four levels of factor B nested within factor A, is described below:

$$y\_{ijk} = \mu + \alpha\_i + r\_k + \alpha(r)\_{ik} + \beta\_j + (\alpha\beta)\_{ij} + \varepsilon\_{ijk}$$

where $y_{ijk}$ is the observed response at level i (i = 1, 2, 3) of factor A and level j (j = 1, 2, 3, 4) of factor B in block k (k = 1, 2, 3); μ is the overall mean; $\alpha_i$ is the effect of level i of factor A; $r_k$ is the random effect of blocks, assuming $r_k \sim N(0, \sigma_r^2)$; $\alpha(r)_{ik}$ is the random whole-plot error, assuming $\alpha(r)_{ik} \sim N(0, \sigma_{ar}^2)$; $\beta_j$ is the effect of level j of factor B; $(\alpha\beta)_{ij}$ is the fixed interaction effect of level i of factor A with level j of factor B; and $\varepsilon_{ijk}$ is the normal random experimental error $\{\varepsilon_{ijk} \sim \text{iid } N(0, \sigma^2)\}$. The ANOVA table with the sources of variation for this experimental design is shown in Table 5.30.

Table 5.29 Assigning treatments to whole plots and subplots

Table 5.30 Sources of variation and degrees of freedom for a randomized block design with a split-plot treatment arrangement

Example 5.1 A split-plot design in randomized complete block arrangement with a Poisson response

A split plot is probably the most common design structure in plant and soil research. Such experiments involve two or more treatment factors. Typically, large units called whole plots are grouped into blocks. The levels of the first factor are randomly assigned to whole plots. Each whole plot is divided into smaller units, called subplots (split plots). Next, the levels of the second factor are randomly assigned to units of split plots within each whole plot.

In this example, four blocks were implemented, each divided into seven parts for the seven levels of the first factor (A1, A2, A3, A4, A5, A6, and A7), the whole plots. Each whole plot was then divided into four units, the subplots, to which the four levels of factor B (B1, B2, B3, and B4) were randomly assigned. Both factors were used to control the growth of a particular weed and were randomly allocated in each block.

The sources of variation and degrees of freedom for this experiment are shown in Table 5.31; in total, there are r × a × b − 1 = 4 × 7 × 4 − 1 = 111 degrees of freedom.
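The degrees-of-freedom bookkeeping can be checked directly from the standard split-plot partition (a sketch; the sources follow the usual ANOVA skeleton for r blocks, a whole-plot levels, and b subplot levels):

```python
r, a, b = 4, 7, 4  # blocks, levels of factor A, levels of factor B

df = {
    "blocks":           r - 1,                 # 3
    "A":                a - 1,                 # 6
    "whole-plot error": (r - 1) * (a - 1),     # 18
    "B":                b - 1,                 # 3
    "A x B":            (a - 1) * (b - 1),     # 18
    "subplot error":    a * (r - 1) * (b - 1), # 63
}

# The sources must add up to the total degrees of freedom, r*a*b - 1 = 111.
assert sum(df.values()) == r * a * b - 1 == 111
```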

In this experiment, the response variable was the number of weeds in each of the plots (Appendix 1: Weed counts). The components that define this GLMM are as shown below:

$$\begin{aligned} \text{Distribution: } y\_{ijk} \mid r\_k, a(r)\_{ik} &\sim \text{Poisson}\left(\lambda\_{ijk}\right) \\ r\_k &\sim N\left(0, \sigma\_r^2\right), a(r)\_{ik} \sim N\left(0, \sigma\_{ar}^2\right) \end{aligned}$$

$$\begin{aligned} \text{Linear predictor: } & \eta\_{ijk} = \eta + a\_i + r\_k + a(r)\_{ik} + \beta\_j + (a\beta)\_{ij} \end{aligned}$$

$$\begin{aligned} \text{Link function: } & \log\left(\lambda\_{ijk}\right) = \eta\_{ijk} \end{aligned}$$

where $\eta_{ijk}$ is the linear predictor that relates the effect of factor A with i levels (i = 1, 2, ⋯, 7) and factor B with j levels (j = 1, 2, 3, 4) in block k (k = 1, 2, 3, 4); η is the intercept; $\alpha_i$ is the fixed effect of level i of factor A; $\beta_j$ is the fixed effect of level j of factor B; $(\alpha\beta)_{ij}$ is the fixed effect of the interaction between level i of factor A and level j of factor B; $r_k$ is the random effect of blocks; and $\alpha(r)_{ik}$ is the random error effect of the whole plot, assuming $r_k \sim N(0, \sigma_r^2)$ and $\alpha(r)_{ik} \sim N(0, \sigma_{ar}^2)$, respectively. The model uses this linear predictor ($\eta_{ijk}$) to estimate the means ($\lambda_{ijk} = \mu_{ijk}$) of the treatments.

The following GLIMMIX program fits a split-plot block design with a Poisson response variable:

```
proc glimmix method=laplace;
class block a b;
model count=a|b / dist=Poisson link=log;
random block block*a;
lsmeans a|b /lines ilink;
run;
```
Part of the output is shown below.

As in the previous examples, the Poisson model was found to be inadequate because the value of Pearson's chi-square statistic divided by the degrees of freedom is greater than 1 (χ²/df = 4.50). This indicates that we have probably misspecified either the conditional distribution of y | b or the linear predictor; in this case, there is evidence that we need to consider other distributions for this dataset (part (a), Table 5.32). In addition, in part (b), the variance component estimates due to blocks and blocks × A are tabulated ($\hat{\sigma}_r^2 = 0.01526$; $\hat{\sigma}_{ra}^2 = 0.2454$). On the other hand, the type III tests of fixed effects (part (c)) show a significant effect of factor B and of the interaction between both factors.

An alternative to reduce the overdispersion is to keep the same linear predictor but replace the Poisson distribution of the response variable with the negative binomial distribution, that is:

$$\begin{aligned} \text{Distribution: } y\_{ijk} \mid r\_k, \alpha(r)\_{ik} &\sim \text{Negative binomial}\left(\lambda\_{ijk}, \phi\right) \\ r\_k &\sim \text{iid } N\left(0, \sigma\_r^2\right), \alpha(r)\_{ik} \sim \text{iid } N\left(0, \sigma\_{ar}^2\right) \end{aligned}$$

$$\begin{aligned} \text{Linear predictor: } & \eta\_{ijk} = \eta + \alpha\_i + r\_k + \alpha(r)\_{ik} + \beta\_j + (\alpha\beta)\_{ij} \end{aligned}$$

$$\begin{aligned} \text{Link function: } & \log\left(\lambda\_{ijk}\right) = \eta\_{ijk} \end{aligned}$$

The following syntax fits a GLMM under a negative binomial distribution.

```
proc glimmix method=Laplace;
class block a b;
model count=a|b / dist=NegBin link=log;
random intercept a /subject=block;
lsmeans a|b/lines ilink;
run;
```
Part of the output is shown below (Table 5.33). The results tabulated in part (a) indicate that the overdispersion has been removed from the analysis (χ²/df = 0.71).

Table 5.33 Results of the

The variance component estimates, tabulated in part (b), are $\hat{\sigma}_r^2 = 0.0024$ and $\hat{\sigma}_{ar}^2 = 0.1222$ for blocks and blocks × A, respectively. The estimated scale parameter is $\hat{\phi} = 0.3458$. Note that the results under the negative binomial distribution differ from those obtained under the Poisson distribution because the negative binomial distribution better accommodates the overdispersion. The fixed effects F-test for factor A is significant at the 5% level (part (c)), whereas factor B and the interaction effect do not significantly influence the response variable.
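The improvement over the Poisson fit reflects the negative binomial's variance function: in the GLIMMIX parameterization, the conditional variance is λ + φλ², so the variance can exceed the mean. A small numeric sketch using the estimated scale parameter (the mean count of 10 is hypothetical):

```python
phi = 0.3458  # estimated scale parameter from the negative binomial fit

def poisson_var(lam):
    # Poisson: conditional variance equals the mean.
    return lam

def negbin_var(lam, phi):
    # Negative binomial (GLIMMIX parameterization): quadratic in the mean.
    return lam + phi * lam ** 2

# For a hypothetical mean of 10 weeds per plot, the negative binomial
# accommodates far more variability than the Poisson:
lam = 10.0
assert poisson_var(lam) == 10.0
assert abs(negbin_var(lam, phi) - 44.58) < 1e-9  # 10 + 0.3458 * 100
```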

Example 5.2 A split-split plot in time in a randomized complete block design with a Poisson response.

The propagation of coffee seedlings through grafting in nurseries depends on several factors such as the type of substrate, the rootstock of the plant that will host the graft, type of graft, light intensity, type and size of the container, humidity, temperature, and so forth. The objective of this experiment was to evaluate the effect of shade cloth (light intensity), type of container, and clone on the number of leaves produced by the Coffea canephora P. clones grafted with the Coffea arabica L. variety Oro azteca.

The factors studied were the color of the shade cloth (black, pearl, and red), container size (tubes of 0.5 kg and 1 kg), and five coffee clones of the variety Coffea canephora P. plus an own-rooted ("franc foot") plant of Coffea arabica L. var. Oro azteca (Appendix 1: Coffee data). The clones used in the experiment are listed below (Table 5.34). Different physiological parameters were evaluated for 11 months.

This work was implemented in four randomized complete blocks. The following table exemplifies how a block was constructed.



Table 5.34 Clones of Coffea canephora P


The statistical model describing a split-split plot in time design is described below:

$$\begin{aligned} y\_{ijklm} = \, & \mu + \alpha\_i + r\_m + (\alpha r)\_{im} + \beta\_j + (\alpha\beta)\_{ij} + \gamma\_k + (\alpha\gamma)\_{ik} + (\beta\gamma)\_{jk} + (\alpha\beta\gamma)\_{ijk} \\ & + (r\alpha\beta\gamma)\_{ijkm} + \tau\_l + (\alpha\tau)\_{il} + (\beta\tau)\_{jl} + (\alpha\beta\tau)\_{ijl} + (\gamma\tau)\_{kl} + (\alpha\gamma\tau)\_{ikl} \\ & + (\beta\gamma\tau)\_{jkl} + (\alpha\beta\gamma\tau)\_{ijkl} + \varepsilon\_{ijklm} \end{aligned}$$

$$i = 1, 2, 3; \; j = 1, 2, 3, 4, 5; \; k = 1, 2, 3; \; l = 1, \cdots, 11; \; m = 1, 2, 3, 4$$

where $y_{ijklm}$ is the response variable in replicate m, shade cloth i, clone j, and tray k at time l; μ is the overall mean; $\alpha_i$ is the fixed effect of shade cloth type; $\beta_j$, $\gamma_k$, and $\tau_l$ are the fixed effects of clone type, tray, and sampling time, respectively; $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, $(\beta\gamma)_{jk}$, $(\alpha\tau)_{il}$, $(\beta\tau)_{jl}$, and $(\gamma\tau)_{kl}$ are the two-way interaction effects among shade cloth type, clone, tray, and sampling time; $(\alpha\beta\gamma)_{ijk}$, $(\alpha\beta\tau)_{ijl}$, $(\alpha\gamma\tau)_{ikl}$, $(\beta\gamma\tau)_{jkl}$, and $(\alpha\beta\gamma\tau)_{ijkl}$ are the three-way and four-way interaction effects; $r_m$, $(\alpha r)_{im}$, and $(r\alpha\beta\gamma)_{ijkm}$ are the random effects of blocks, blocks × shade cloth type, and blocks × shade cloth type × clone × tray, assuming $r_m \sim N(0, \sigma_r^2)$, $(\alpha r)_{im} \sim N(0, \sigma_{r\alpha}^2)$, and $(r\alpha\beta\gamma)_{ijkm} \sim N(0, \sigma_{\alpha\beta\gamma(rep)}^2)$; and $\varepsilon_{ijklm}$ is the random error $\{\varepsilon_{ijklm} \sim N(0, \sigma^2)\}$.

The following SAS program fits a GLMM in a split-split plot in time under a randomized complete block design with a Poisson response.

```
proc glimmix data=work.Nhojas_cafe nobound method=laplace;
class shade clone tray rep time;
model y = shade|clone|tray|time/dist=poi link=log;
random intercept shade shade*clone*tray/subject=rep type=ar(1);
lsmeans shade|clone|tray|time /lines ilink;
run;
```

Table 5.35 Fit statistics for choosing the correlation structure

Table 5.36 Conditional fit statistics and variance component estimates

Some of the results are listed below. To determine which correlation structure best fits this experimental design, five types of correlation structures were tested (Table 5.35): compound symmetry ("CS"), first-order autoregressive ("AR(1)"), unstructured ("UN"), Toeplitz of order 1 ("Toep(1)"), and antedependence ("ANTE(1)"). To do this, the type of correlation structure to be tested is specified with the "type" option of the "random" statement, and it is here that the variance–covariance structure must be changed. The fit statistics indicate that the variance–covariance structure that best fits the model is the first-order autoregressive structure, AR(1), as seen in the table reporting the goodness-of-fit statistics for all of these variance–covariance structures.
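For intuition, the AR(1) structure chosen here assumes that the correlation between repeated measures decays geometrically with the time lag: corr(y_s, y_t) = ρ^|s−t|. A sketch of the implied correlation matrix (the value ρ = 0.5 is hypothetical, for illustration only):

```python
def ar1_corr(n_times, rho):
    """AR(1) correlation matrix: corr(y_s, y_t) = rho ** |s - t|."""
    return [[rho ** abs(s - t) for t in range(n_times)] for s in range(n_times)]

R = ar1_corr(4, 0.5)  # rho = 0.5 is hypothetical

# Adjacent time points correlate at rho, two apart at rho**2, and so on,
# while each diagonal entry (a measure with itself) is 1.
assert R[0][1] == 0.5
assert R[0][2] == 0.25
assert all(R[t][t] == 1.0 for t in range(4))
```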

Table 5.36 shows the conditional fit statistics and variance component estimates. The fit statistic Pearson's chi-square/DF = 0.57 in part (a) indicates that, in the conditional model, there is no evidence of misspecification of the distribution or the linear predictor. In other words, there is no overdispersion in the dataset, and it is therefore reasonable to base the analysis and inference on the Poisson model.

The analysis of variance for the type III tests of fixed effects (Table 5.37) indicates a highly significant effect of the main effects shade cloth type (P = 0.0001), clone (P = 0.0001), and tray (P = 0.0001), as well as of most of the interactions, except for shade\_cloth\*clone (P = 0.3846), shade\_cloth\*tray\*time (P = 0.9289), clone\*tray\*time (P = 0.9760), and shade\_cloth\*clone\*tray\*time (P = 0.2484).

Table 5.37 Type III fixed effects tests

Table 5.38 Estimated means on the model scale and on the data scale for the shade cloth

(b) T grouping of shade cloth least squares means (α=0.05)

The means and standard errors of each of the main effects, on the data scale, for shade\_cloth, tray, and clone are shown in the "Mean" column in part (a) of Table 5.38, whereas in part (b), the mean comparisons for the type of shade cloth are shown.

Table 5.39 presents the estimates of the linear predictor ("Estimates" column) in terms of the model scale and treatment means in terms of the data scale ("Mean" column) for the type of clone (part (a)). In addition, in Table 5.39 (part (b)), the mean comparisons are presented for the type of clone.


Table 5.39 Estimated means on the model scale and on the data scale for the type of clone

(b) T grouping of clone least squares means (α=0.05)

C5 1.5965 A
C3 1.5064 B
C1 1.5008 B
C4 1.4750 C B
C2 1.4250 C

Table 5.40 Estimated means on the model scale and on the data scale for the tray factor


(b) T grouping of tray least squares means (α=0.05)

LS means with the same letter are not significantly different


Table 5.40 presents the estimates for the levels of the tray on both scales (part (a)). Similarly, in this table (part (b)), the treatment mean comparisons are presented for the levels of the tray.

Tables 5.41, 5.42, 5.43, and 5.44 show the means and standard errors on both scales of the two-factor and three-factor interactions.

Interaction type of shade cloth\*clone

Interaction type of shade cloth\*tray

Interaction clone\*tray

Interaction shade\*clone\*tray

Although a full discussion is not the objective of this book, part of the results is discussed below. In Fig. 5.8, it can be observed that the red shade cloth significantly stimulates leaf production in coffee grafts, followed by the black and pearl shade cloths. The production of leaves in coffee grafts shows a bimodal pattern that may be due to factors such as humidity and temperature: extreme conditions of both factors cause stress at the growing points and, therefore, influence the appearance of leaves.

Table 5.41 Estimated means on the model scale and on the data scale for the type of shade cloth\*clone

Table 5.42 Estimated means on the model scale and on the data scale for the interaction type of shade cloth\*tray

Regarding the type of clone used as rootstock, the clones showed a better average leaf production in months 5 and 6, whereas the lowest production was observed in months 1, 2, 8, and 9. The franc foot showed a higher average number of leaves than the rest of the clones (Fig. 5.9).

Table 5.43 Estimated means on the model scale and on the data scale for the clone–tray interaction

# 5.3 Exercises

Exercise 5.3.1 A researcher in the area of plant sciences wants to know the response, in terms of the number of shoots produced by the explant ($y_{ij}$), of a plant in vitro culture exposed to different concentrations (ppm) of a chemical compound. The data for this experiment are given below (Table 5.45):



Table 5.44 Estimated means on the model scale and on the data scale for the shade–clone–tray interaction


Table 5.44 (continued)

Fig. 5.8 Effect of shade cloth type on the average number of leaves

Fig. 5.9 Effect of clone type on the average number of leaves

Exercise 5.3.2 Earthworms (Lumbricus terrestris L.) were counted in four replicates of a factorial experiment at the W.K. Kellogg Biological Station in Battle Creek, Michigan, in 1995. A 2<sup>4</sup> factorial experiment was conducted. The factors and treatment levels were plowing (chiseled and unplowed), input level (conventional and low), manure application (yes/no), and crop (corn and soybean). The objective was to determine whether L. terrestris density varies according to these management protocols and how the various factors act and interact. The data (not pooled) in the table show the total worm counts (per square foot, juveniles and adults) for the 64 experimental units (2<sup>4</sup> × 4). The numbers in each cell of the table correspond to the counts in the replicates (Table 5.46).


Exercise 5.3.3 This experiment involves an investigation of genotypic variation within cultivars of leek (Allium porrum L.) with respect to adventitious shoot formation in callus tissue. The data in Table 5.47 refer to 20 genotypes of 1 cultivar. Each genotype is represented by six calluses, and the observations are the number of shoots per callus. The data are subject to two sources of variation: variation between genotypes and variation between calluses within genotypes.


Table 5.45 In vitro culture (Conc = concentration in ppm)


Table 5.46 Results of the experiment with earthworms




Exercise 5.3.4 In an experiment at the Research Institute for Animal Production "Schoonoord" in the Netherlands, the effects of active immunization against androstenedione on the fertility of Texel ewes were studied (Engel and te Brake 1993). The number of fetuses per ewe can be considered the net result of a process that determines the number of ovulations and a probability process for these ovulations to produce fetuses. In this study, the goals are to model and analyze (a) the number of ovulations and the number of fetuses in relation to Fecundin (androstenedione-7α-carboxyethylthioether) treatment, animal age, and mating period and (b) the number of fetuses in relation to treatment, animal age, and the number of ovulations observed. A summary of the experiment and of the data is shown below (Table 5.48).

Of the 125 Texel ewes, 63 were treated with Fecundin, whereas the remaining 62 served as a control group. The ewes were sorted into four age classes (<0.5, 0.5–1.5, 1.5–2.5, and >2.5 years) and two mating periods (starting on October 1 and October 22, 1986, respectively). The interactions with age are of interest, and age is easier to handle when entered as a factor than as a covariate. The numbers of animals in the four age classes are 25, 44, 24, and 32, respectively. Age class is evenly distributed across the combinations of mating period and treatment group. Ewes were slaughtered 75–80 days after the last mating, and the number of ovulations and the number of fetuses were determined. Ovulation numbers ranged from 1 to 5. For six animals, the number of ovulations was not known, so these ewes were excluded from the database.


Exercise 5.3.5 The following example deals with one of the most harmful insects of the root system of major crops, commonly known as the "blind hen" (white grub). The experiment consisted of six treatments formulated for larval control (A, B, C, D, E, and F) in a randomized block arrangement. The count per area shows the number of larvae in two age groups (a and b) (Table 5.49).





# Appendix 1

Data: Subcultures




### Data: Beetles



### Data: Weed counts




Coffee data






Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Chapter 6 Generalized Linear Mixed Models for Proportions and Percentages

# 6.1 Response Variables as Ratios and Percentages

In this chapter, we will review generalized linear mixed models (GLMMs) whose response is either a proportion or a percentage. By proportion and percentage data, we mean data whose expected value lies between 0 and 1 or between 0 and 100, respectively. For the remainder of this book, we will refer to this type of data only as proportions, knowing that they can be changed to a percentage scale simply by multiplying by 100.

Proportions can be classified into two types: discrete and continuous. Discrete proportions arise when the unit of observation consists of N distinct entities, of which y individuals have the attribute of interest. N must be a positive integer, and y must be a nonnegative integer with y ≤ N. Therefore, the observed proportion must be a discrete fraction that can take the values 0/N, 1/N, ⋯, N/N. A binomial random variable is the sum of a series of N independent binary trials (i.e., trials with only two possible outcomes: success or failure), where all trials have the same probability of success. For binary and binomial distributions, the target of inference is the parameter π, with 0 ≤ E(y/N) = π ≤ 1.

Continuous proportions (ratios) arise when the researcher measures responses such as the fraction of a leaf's area infested with a fungus, the proportion of damaged cloth in a square meter, the fraction of a contaminated area, and so on. As with the binomial parameter π, continuous proportions take values between 0 and 1, but, unlike the binomial, they do not result from a set of Bernoulli trials. Instead, the beta distribution is most often used when the response variable is a continuous proportion. In the following sections, we will first address issues in modeling binary and binomial data.
When the response variable is binomial, we have the option of using a linearization method (pseudolikelihood (PL)) or the Laplace or quadrature integral approximation (Stroup 2012).

# 6.2 Analysis of Discrete Proportions: Binary and Binomial Responses

A binomial random variable is the number of successes in a series of independent binary (Bernoulli) trials (i.e., trials with two possible outcomes: success or failure), where all trials have the same probability of success. In the context of a GLMM, there are N binomial responses, each of which is the result of a set of binary trials. The ith response consists of two pieces of information: the number of trials $n_i$ and the number of successes $y_i$, as shown in the following example.

# 6.2.1 Completely Randomized Design (CRD): Methylation Experiment

An agent that induces demethylation is applied to plants; this agent converts methylated nucleotides to their unmethylated forms, causing epigenetic changes that produce or induce abnormal phenotypes such as deformation or stunting (Amoah et al. 2008). A pilot study was implemented to investigate the relationship between the dose of the demethylating agent and the observed proportion of plants with a normal phenotype. Seeds were treated with the demethylating agent at six different doses, including the control. Plants were sown in trays, with each tray containing seeds previously treated with the same dose of the demethylating agent. Each dose was replicated 4 times: twice with 60 plants and twice with 100 plants. The trays were allocated following a completely randomized design (CRD). The number of plants with a normal phenotype in each tray is shown in Table 6.1, together with the number of plants per tray (N). The notation 59(60) indicates that 59 normal plants were found out of the 60 plants under study; likewise, 14(100) indicates that 14 normal plants were found out of the 100 plants under study.

The sources of variation and degrees of freedom (DFs) for this experiment are shown in Table 6.2.


Fig. 6.1 Effect of the demethylating agent on the proportion of normal plants

The statistical model of a completely randomized design (CRD) is

$$y\_{ij} = \mu + \tau\_i + \varepsilon\_{ij}$$

where $y_{ij}$ is the number of normal plants observed in tray j (j = 1, 2, 3, 4) at dose i (i = 1, 2, ⋯, 6), μ is the overall mean, $\tau_i$ is the effect of dose i of the demethylating agent, and $\varepsilon_{ij}$ is the non-normally distributed error term.

The number of normal plants out of a set of $n_i$ trials follows a binomial distribution, $y_i \sim \text{Binomial}(n_i, \pi_i)$, where $\pi_i$ is the probability of success in each trial, with $0 \le \pi_i \le 1$, estimated by $\hat{\pi}_i = y_i / n_i$. Thus, the probability of observing an outcome $y_i$ can be written as

$$P(Y\_i = y\_i | n\_i, \pi\_i) = \binom{n\_i}{y\_i} \pi\_i^{y\_i} (1 - \pi\_i)^{n\_i - y\_i}; \quad y\_i = 0, 1, \cdots, n\_i.$$

This probability depends on the number of known tests ni, whereas the probability of success (πi) is an unknown parameter. In Fig. 6.1, we observe that the probability of obtaining a normal plant depends on the applied dose of the demethylating agent. Given that yi has a binomial distribution, the expected value (the mean) is the product of the number of trials and the probability of success in each trial, that is, E(Yi) = niπi. Since the number of trials is fixed (once the data have been obtained), modeling the probability of success is equivalent to modeling the expected value as well as the variance since it is also a function of the number of trials and the probability of success. So, the expected value and variance of yi are

$$E(y\_i) = \mu\_i = n\_i \pi\_i; \quad \text{Var}(y\_i) = n\_i \pi\_i (1 - \pi\_i).$$

The variance is small when πi is close to 0 or 1 and reaches its maximum at πi = 0.5. This can be seen in Fig. 6.1, where proportions close to 0 or 1 show less variability than proportions between 0.1 and 0.2 observed at a demethylating agent dose of 0.5. The variance can also be written in terms of the expected value as:

$$\operatorname{Var}(\mathbf{y}\_i) = \frac{\mu\_i}{n\_i} (n\_i - \mu\_i).$$
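These mean–variance relationships can be checked numerically. A minimal Python sketch (the function names, tray size, and success probability are illustrative, not values from the experiment):

```python
from math import comb

def binom_pmf(y, n, pi):
    """P(Y = y) for a count Y ~ Binomial(n, pi)."""
    return comb(n, y) * pi**y * (1 - pi)**(n - y)

def binom_mean_var(n, pi):
    """E(Y) = n*pi and Var(Y) written as (mu/n)*(n - mu),
    which reduces algebraically to n*pi*(1 - pi)."""
    mu = n * pi
    return mu, mu * (n - mu) / n

# Illustrative tray: n = 60 plants with success probability pi = 0.9
mu, var = binom_mean_var(60, 0.9)      # mu = 54.0, var = 5.4
# The variance peaks at pi = 0.5, as noted in the text
_, var_mid = binom_mean_var(60, 0.5)   # var_mid = 15.0
```

Either form of the variance gives the same number, since μ(n − μ)/n = nπ(1 − π).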

In this CRD, a fixed number of treatments t (doses) was randomly assigned to the experimental units (trays), with r replicates per treatment. The linear predictor describing the mean structure of this GLMM is

$$
\eta\_i = \eta + \tau\_i
$$

where ηi denotes the ith linear predictor, η is the intercept, and τi is the fixed effect of treatment i (i = 1, 2, ⋯, t), with t treatments and ri replicates per treatment.

The components that define this GLMM are shown below:

Distribution: yi ~ Binomial(Ni, πi)
Linear predictor: ηi = η + τi
Link function: logit(πi) = log(πi/(1 − πi)) = ηi

where ηi is the linear predictor that relates the effect of dose i (i = 1, 2,⋯, 6) to probability πi. The model uses the linear predictor (ηi) to estimate the means (πi = μi) of the observations for each treatment.
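The logit link and its inverse can be written down directly; a small Python sketch (the function names are ours):

```python
import math

def logit(pi):
    """Link function: log-odds of a probability in (0, 1)."""
    return math.log(pi / (1 - pi))

def inv_logit(eta):
    """Inverse link: maps a linear predictor eta back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Round trip: an estimated probability maps to the model (log-odds)
# scale and back to the data scale
eta = logit(0.8813)   # model scale
pi = inv_logit(eta)   # data scale, recovers 0.8813
```

This round trip is exactly what the "ilink" option of "lsmeans" does: estimates are produced on the model scale and mapped back to probabilities.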

The following GLIMMIX program fits a CRD with a binomial response:

```
proc glimmix nobound method=Laplace; 
class Dose Rep; 
model y/N= dose/link=logit; 
lsmeans dose/lines ilink; 
run;
```
In this example, the distribution of the response was not specified in the model statement because, with the events/trials expression "y/N," proc GLIMMIX automatically assumes that the response has a binomial distribution. It is also important to note that the variables dose and repetition were declared in the "class" statement, which Statistical Analysis Software (SAS) interprets as explanatory variables that are nonnumerical factors. Note, however, that the variable "Rep" is not used in the model specification.


Table 6.3 Results of the analysis of variance

Part of the results is shown in Table 6.3. Pearson's chi-squared statistic divided by its degrees of freedom in part (a) (Pearson's chi-square/DF = 0.5) indicates that there is no evidence of overdispersion in the dataset. The analysis of variance (ANOVA) tabulated in part (b) of Table 6.3, with the type III tests of fixed effects, indicates that there is a highly significant difference (P = 0.0001) in the average proportion of normal plants with respect to the dose applied to the seeds.

The output when using the "lsmeans" command in conjunction with the "ilink" option is in the "Mean" column (part (c) in Table 6.3). These are the estimated probabilities π̂i, i.e., π̂0 = 0.9813 and π̂0.01 = 0.9813 of normal plants for the treatments whose doses are 0 and 0.01, respectively. For treatments with doses of 0.1 and 0.5, the estimated probabilities of normal plants are π̂0.1 = 0.8813 and π̂0.5 = 0.1375, respectively, whereas for the 1 and 1.5 doses, the estimated probabilities of normal plants decrease dramatically to π̂1 = 0.02501 and π̂1.5 = 0.03126, respectively.

Figure 6.2 shows the mean comparisons (least significant difference (LSD)) of the estimated probabilities according to the dose applied to the seeds in trays. In this figure, we can observe that in the treatments with dose = 0 (control) and dose = 0.01, the observed proportions of normal plants are not statistically different from each other, but they do differ from the other applied doses. At a dose of 0.1, the observed proportion of normal plants was 88.13%, and this was statistically different from all the other doses used. Finally, at doses of 0.5, 1, and 1.5 of the demethylating agent, the observed proportion of normal plants decreased drastically to 13.75%, 2.501%, and 3.126%, respectively. The doses of 1 and 1.5 produced statistically equal proportions of normal plants.

Fig. 6.2 Comparison of the estimated probabilities per dose of the demethylating agent

If the researcher wishes to model how dose levels of the demethylating agent affect normal plant proportions, then the dose must be declared as a continuous variable. The following SAS syntax with proc GLIMMIX runs a binomial regression:

```
proc glimmix data=crd_bin method=Laplace plots=all; 
class Rep; 
model y/N= dose/solution; 
random rep; 
run;
```
Most of the commands and options have already been discussed throughout this book; the "model y/N" specification indicates that the response is a ratio of successes (y) to trials (N). Therefore, this dataset is modeled with a binomial distribution, which accommodates the different number of individuals in each repetition. proc GLIMMIX interprets the distribution of the data as binomial, whereas the "solution" option requests the parameter estimates of the model (intercept and slope).

The components that define this GLMM are shown below:

Distribution: yi ~ Binomial(Ni, πi)
Linear predictor: ηi = η + β × dosei
Link function: logit(πi) = log(πi/(1 − πi)) = ηi

Thus, the model can be written as


Table 6.4 Regression analysis results

$$\eta\_i = \log\left(\frac{\mu\_i}{n\_i - \mu\_i}\right) = \log\left(\frac{n\_i \pi\_i}{n\_i - n\_i \pi\_i}\right) = \log\left(\frac{\pi\_i}{1 - \pi\_i}\right) = \text{logit}(\pi\_i) = \eta + \beta \, \text{dose}\_i$$

and, inverting the logit, the probability of success, πi, can be written as

$$\pi\_i = \frac{1}{1 + \exp(-\eta\_i)}$$

Part of the SAS output of the GLIMMIX syntax is shown below. The goodness-of-fit statistics, type III tests of fixed effects, and parameter estimates are shown in Table 6.4. The analysis of variance indicates that the demethylating agent has a highly significant effect on the observed proportion of normal plants (P < 0.0001) (part (b)). The maximum likelihood estimates of the intercept and slope are η̂ = 2.7927 and β̂ = −7.6232, respectively.

Figure 6.3 shows that as the value of the linear predictor increases (ηi), the value of the residuals rapidly decreases. We can also see that the residuals plotted against the quantiles clearly do not follow a normal distribution because this model is not a linear function of the explanatory variable "dose."

Figure 6.4 shows that the observed and fitted proportions are close, and, as such, the binomial model is suitable for this dataset. The estimated linear predictor of this model is as follows:

$$
\hat{\eta}\_i = \hat{\eta} + \hat{\beta} \times \text{dose}\_i = 2.7927 - 7.6232 \times \text{dose}\_i.
$$

Fig. 6.3 A graph of residuals versus the linear predictor, quantiles

Fig. 6.4 Observed and estimated proportion

The logit of the probability of success is a linear function of the explanatory variables, so the model can be written in terms of the probability of success (observing normal plants) as

$$
\pi\_i = \frac{1}{1 + \exp(-\eta\_i)}
$$

Given the parameter estimates, we can predict the success probability of observing a normal plant, and given a certain concentration of the demethylating agent, this estimated probability (using the estimated linear predictor) can be seen plotted in Fig. 6.4.

$$\hat{\pi}\_{i} = \frac{1}{1 + \exp\left(-\hat{\eta}\_{i}\right)} = \frac{1}{1 + \exp\left(-2.7927 + 7.6232 \times \text{dose}\_i\right)}$$
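Using the fitted intercept and slope from Table 6.4, the dose–response curve can be evaluated at any dose; a Python sketch (the function name is ours):

```python
import math

def p_normal_plant(dose, intercept=2.7927, slope=-7.6232):
    """Estimated probability of a normal plant at a given dose of the
    demethylating agent, from the fitted logistic regression."""
    eta = intercept + slope * dose
    return 1.0 / (1.0 + math.exp(-eta))

# The fitted curve falls sharply with dose
probs = {d: round(p_normal_plant(d), 4) for d in (0, 0.01, 0.1, 0.5, 1, 1.5)}
```

Note that the regression smooths across doses, so these predictions need not coincide with the per-dose lsmeans reported earlier from the class-variable model.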

# 6.3 Factorial Design in a Randomized Complete Block Design (RCBD) with Binomial Data: Toxic Effect of Different Treatments on Two Species of Fleas

A group of researchers wished to study the toxic effect of certain treatments (Trts) on two flea species (SP), Daphnia magna and Ceriodaphnia dubia. To compare the toxicity of the treatments on both flea species, a randomized complete block design (with bioassays as blocks) was implemented with three replicates per treatment, each replicate consisting of 10 fleas (Appendix: Fleas). The linear predictor describing this experiment is described below:

$$\eta\_{ijkl} = \eta + \alpha\_i + \beta\_j + (\alpha\beta)\_{ij} + \text{bioassay}\_k + \text{rep}(\text{bioassay})\_{l(k)}$$

where η is the intercept, αi is the fixed effect due to species i, βj is the fixed effect of treatment j, (αβ)ij is the fixed effect of the interaction between flea species and treatment, bioassayk is the random effect due to bioassay k, assuming bioassayk ~ N(0, σ²bioassay), and rep(bioassay)l(k) is the random effect due to replicate l nested within bioassay k, assuming rep(bioassay)l(k) ~ N(0, σ²rep(bioassay)).

The remaining components of this GLMM with a binomial response are described below:

Distribution: yijkl | bioassayk, rep(bioassay)l(k) ~ Binomial(Nijkl, πijk), with bioassayk ~ N(0, σ²bioassay) and rep(bioassay)l(k) ~ N(0, σ²rep(bioassay)), where Nijkl is the number of fleas observed for species i under treatment j in replicate l of bioassay k

Link function: logit(πijk) = log(πijk/(1 − πijk)) = ηijkl.

The following SAS syntax allows us to fit the GLMM with a binomial response.


Table 6.5 Results of the

```
proc glimmix data=pulgas nobound method=laplace; 
class Bioen SP Trt Rep; 
model Sobrevi/n = SP|Trt/dist=binomial; 
random Bioen sp*bioen(rep); 
lsmeans SP|Trt/lines ilink; 
run;
```

Part of the results is listed in Table 6.5. The fit statistics in part (a) and the conditional statistics in part (b) are useful for model comparison, whereas the variance component estimates are shown in part (c). The value of the statistic Pearson's chi-square/DF = 0.10 indicates that the binomial model provides a good fit to the dataset. The variance component estimates for bioassays and for replicates nested within bioassays are σ̂²bioassay = −0.1051 and σ̂²rep(bioassay) = −0.1192, respectively (negative estimates are permitted here because of the nobound option). The type III tests of fixed effects (part (d)) show the significance tests of the fixed effects in the model. The treatment effect and the interaction between flea species (SP) and treatment are clearly significant, with P < 0.0001 and P = 0.0009, respectively.

Since survival was statistically similar in both flea species, we will focus on the factors that were significant. Part (a) in Table 6.6 shows the means and standard errors of treatments on the model scale ("Estimate" column) and on the data scale ("Mean" column), obtained with "lsmeans" and the "ilink" option as well as the mean comparisons, which are on the model scale (part (b)).


Table 6.6 Means and standard errors on the model scale and on the data scale

The LINES display does not reflect all significant comparisons. The following additional pairs are significantly different: (T3,T4)

Based on the fixed effects tests, the flea species × treatment interaction is significant. The means on the model scale are listed under the "Estimate" column, followed by their standard errors, "Standard error" (Table 6.7). The output of the "ilink" option in "lsmeans" applies the inverse function of the link function to the estimates on the model scale to obtain the estimates on the data scale. The probabilities, on the data scale, are given under the "Mean" column with their respective standard errors and correspond to the probability of insect (flea) survival.

Figure 6.5 shows that the survival of both species is different in treatments 2–5; the Daphnia species showed more resistance in treatments 2 and 3, whereas the Ceriodaphnia species showed greater resistance in treatments 4 and 5. On the other hand, in treatments 1 and 6, survival was similar in both species.

# 6.4 A Split-Plot Design in an RCBD with a Normal Response

A split plot is the most common treatment structure design in agricultural and agroindustrial research areas. These experiments generally involve two or more factors under study. Typically, large or primary experimental units, commonly known as the whole plot, are grouped into blocks. The levels of the first factor are randomly assigned to the whole plots. Then, each whole plot is divided into smaller units, known as split or secondary plots. The levels of the second factor are randomly assigned to the subplots within each whole plot.



Fig. 6.5 The average survival rate of both species

The model equation for the analysis of variance assuming normality in the response is

$$y\_{ijk} = \eta + \alpha\_i + r\_k + (r\alpha)\_{ik} + \beta\_j + (\alpha\beta)\_{ij} + \varepsilon\_{ijk}$$

$$i = 1, 2, \cdots, a; j = 1, 2, \cdots, b; k = 1, 2, \cdots, r$$

where yijk is the response observed in the kth block at the ith level of factor A and the jth level of factor B; α and β refer to the fixed treatment effects due to factors A and B, respectively; r is the random effect due to blocks; (rα)ik is the whole-plot error term, an interaction between blocks and factor A; and εijk is the random residual effect. The errors and the other random terms are assumed to be normally distributed; when the response variable is not normally distributed, this way of specifying the model is not the most appropriate, but under the assumption of a normal response it is valid.

# 6.4.1 An RCBD Split Plot with Binomial Data: Carrot Fly Larval Infestation of Carrots

Data were obtained from an experiment that was designed to compare a number of carrot genotypes with respect to their resistance to infestation by carrot fly larvae. The data involved 16 genotypes that were compared at 2 pest levels to be controlled. The experiment was conducted in three randomized blocks. Each block consisted of


Table 6.8 The notation 44/53 denotes that 44 carrots were infested (y) out of a sample size of 53 studied (N)

Table 6.9 Sources of variation and degrees of freedom


32 plots, 1 for each combination of genotype and pest infestation level. At the end of the experiment, about 50 carrots were taken from each plot and assessed for infestation by carrot fly larvae. The data obtained are shown in Table 6.8.

Table 6.9 shows the analysis of variance summarizing the sources of variation and degrees of freedom.
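The split-plot degrees of freedom in Table 6.9 follow the usual partition; a minimal Python sketch, assuming the pest-control level is the whole-plot factor (a = 2 levels) and genotype the subplot factor (b = 16 levels), with r = 3 blocks:

```python
def split_plot_df(a, b, r):
    """Degrees of freedom for a split plot in an RCBD: whole-plot factor A
    with a levels, subplot factor B with b levels, and r blocks."""
    return {
        "blocks": r - 1,
        "A": a - 1,
        "blocks x A (whole-plot error)": (r - 1) * (a - 1),
        "B": b - 1,
        "A x B": (a - 1) * (b - 1),
        "subplot error": a * (r - 1) * (b - 1),
    }

# Carrot experiment: a = 2 pest levels, b = 16 genotypes, r = 3 blocks
df = split_plot_df(2, 16, 3)
```

The degrees of freedom sum to abr − 1, the total for the 96 plots, which is a quick check on any hand-built ANOVA table.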

Rewriting in terms of the linear predictor

$$
\eta\_{ijk} = \eta + a\_i + r\_k + (\text{ra})\_{ik} + \beta\_j + (a\beta)\_{ij}
$$

Since the observations were taken at the subplot level, conditioned on the structural effects of the design, these observations have a variance associated with the subplot. Here, α and β refer to the fixed treatment effects due to factors A and B, respectively; (αβ)ij refers to the interaction of these factors; rk is the random effect due to blocks, with rk ~ N(0, σ²r); and the blocks × whole-plot term (ra)ik is assumed to contribute to the variation, with (ra)ik ~ N(0, σ²block×A). This model uses the linear predictor ηijk to estimate the mean of the observations μijk.

The specification of this GLMM is as follows:

Distribution: yijk | rk, (ra)ik ~ Binomial(Nijk, πijk), with rk ~ N(0, σ²r) and (ra)ik ~ N(0, σ²block×A)
Link function: logit(πijk) = ηijk.

The following SAS GLIMMIX program allows the fitting of a GLMM with a split-plot structure in a randomized complete block design with a binomial response.

```
proc glimmix data=spd_pp nobound method=quadrature; 
class Genotype Trt Block ; 
model y/N = Genotype|Trt; 
random intercept trt /subject=block; 
lsmeans Genotype|Trt/lines ilink; 
run;
```
The program uses the quadrature estimation method (method=quadrature), which produces results similar to those of the Laplace method. Part of the results is provided in Table 6.10. The Pearson chi-square/DF value in part (a) indicates whether there is overdispersion (extra-variation) in the dataset. In this case, Pearson's chi-square/DF = 1.97 indicates that there is overdispersion, so it is advisable to use either the pseudo-likelihood (PL) estimation method or a different distribution. In addition, the variance components estimated for blocks and blocks × genotype (the whole plot) in part (b) are σ̂²block = 0.004272 and σ̂²block×A = 0.03344, respectively. The results of the fixed effects tests (part (c)) indicate that the effect of genotype and the interaction between genotype and treatment are significant.

The appropriate method for model evaluation depends on whether or not there is evidence of overdispersion, so we consider this issue below. The residual variance incorporates systematic discrepancies between the model and the observed responses, variation between replicates (observations on independent experimental units with the same values of the explanatory variables), and sampling variation arising from the distribution of the data, in this case, the binomial distribution. If there are no duplicate observations and the fitted model provides an adequate description of the systematic trend, then only sampling variation contributes to the residual variance. If this is true, then the residual deviance has an approximate chi-squared distribution with degrees of freedom equal to those of the mean squared error (MSE) (the residual).
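The Pearson chi-square/DF diagnostic quoted throughout this chapter can be computed directly from the counts and fitted probabilities; a minimal Python sketch (the data below are illustrative, not from the experiment):

```python
def pearson_chisq_per_df(y, n, pi_hat, df):
    """Pearson chi-square for binomial counts, divided by its degrees of
    freedom; values well above 1 point to overdispersion."""
    chisq = sum((yi - ni * pi) ** 2 / (ni * pi * (1 - pi))
                for yi, ni, pi in zip(y, n, pi_hat))
    return chisq / df
```

When the fitted model reproduces the counts exactly, the statistic is 0; purely binomial sampling variation puts it near 1, which is why values such as 1.97 flag extra-variation.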

Since there is overdispersion in the data using the binomial distribution, there are three alternatives we can explore: (1) review the linear predictor, which involves carefully revising the analysis of variance table; (2) add a scale parameter; or (3) use another distribution for the dataset. Each of these three possible alternatives is discussed below, in this order.

### 6.4.1.1 Linear Predictor Review (ηijk)

If the proportion of infested carrots (πijk) is affected by genotype within each infestation level (trt = αi) from plot to plot within each of the blocks, then a nested effect of genotype within infestation levels (trt) could be included in the analysis of variance. Thus, the linear predictor would be defined as

$$
\eta\_{ijk} = \eta + \alpha\_i + r\_k + (r\alpha)\_{ik} + \beta(\alpha)\_{j(i)}.
$$

where αi, β(α)j(i), rk, and (rα)ik are the fixed effects due to treatments, the fixed effects of genotypes nested within treatments, the random effects due to blocks with rk ~ N(0, σ²r), and the interaction between blocks and treatments with (rα)ik ~ N(0, σ²RA), respectively.

The following GLIMMIX syntax estimates the above linear predictor:

```
proc glimmix data=spd_pp method=laplace; 
class Genotype Trt Block ; 
model y/N = Trt genotype(trt); 
random trt/subject=block; 
lsmeans genotype(trt)/lines ilink slice=trt slicediff=trt; 
run;
```
The only difference between this proc GLIMMIX program and the previous one is that here we have included the nested effect of genotypes within treatments, genotype(trt), and removed the separate genotype main effect. Part of the results is shown in Table 6.11. The value of Pearson's chi-square/DF statistic (part (a)) as well as the fit statistics did not decrease when modifying the linear predictor. However, the F-values calculated for treatments and for genotypes within treatments (part (c)) are smaller than those obtained in the split-plot design.

Since the overdispersion is still present (Pearson's chi-square/DF = 1.97), another alternative is to add a scale parameter to the model. This alternative is presented below.

### 6.4.1.2 Scale Parameter

If the residual deviance is larger than expected when compared to critical values of the appropriate chi-squared distribution, and if this cannot be corrected by redefining the linear predictor of the model, then there is more variation present than the distributional assumption can account for. In this case, we say that the data show overdispersion. The simplest way to deal with overdispersion is to extend the model by scaling the variance function. Adding the scale parameter replaces Var(yij) = πij(1 − πij) with Var(yij) = φπij(1 − πij). The rationale for this approach is discussed by Collett (2002). The parameter φ is a scale factor, called the dispersion parameter, which summarizes the degree of overdispersion present in the observations. Clearly, φ = 1 corresponds to the original distributional model. This parameter can be estimated in several different ways. The logarithm of the likelihood of the binomial distribution is given by

$$\log\binom{N}{\mathbf{y}\_{ij}} + \mathbf{y}\_{ij}\log\left(\frac{\pi\_{ij}}{1-\pi\_{ij}}\right) + N\log\left(1-\pi\_{ij}\right)$$

In the log-likelihood, the term yij log(πij/(1 − πij)) is very important; any quantity that multiplies yij is known as the natural or canonical parameter, and this parameter is always a function of the mean. For the binomial distribution, the mean is Nijπij and the natural parameter is log(πij/(1 − πij)), known in categorical data analysis as the "log odds." The generalized estimating equation (GEE) method provides a valid analysis for marginal means, since under a binomial working assumption the quasi-likelihood variance is given by φπij(1 − πij). This is achieved by adding the "random \_residual\_" statement in the following SAS syntax.

The following GLIMMIX commands are used to invoke the scale parameter but using the first predictor proposed for these data.

```
proc glimmix data=spd_pp nobound; 
class Genotype Trt Block; 
model y/N = Trt|genotype; 
random intercept trt/subject=block; 
random _residual_; 
lsmeans Trt|genotype/lines ilink; 
run;
```
In this syntax, we still keep the binomial distribution (y/N is equivalent to telling GLIMMIX in SAS that the response is binomial) but add the "random \_residual\_" statement. In this case, we cannot obtain maximum likelihood estimates because neither the Laplace ("method = laplace") nor the adaptive quadrature ("method = quad") approximation can be used, so estimation is performed through the pseudo-likelihood (PL) method. This causes the scale parameter to be estimated and, consequently, used in the adjustment of all standard errors and statistical tests. Proc GLIMMIX uses the generalized statistic of McCullagh and Nelder (1989), i.e., χ²/df, as the estimator of the scale parameter (φ̂). All standard errors from the analysis under a binomial distribution are multiplied by the square root of φ̂, and all F-tests are divided by φ̂ to account for overdispersion. Part of the output is shown below.

The value of Pearson's statistic in part (a) indicates that the overdispersion has not been eliminated; on the contrary, chi-square/DF = 3.13 indicates that this value has increased. This result shows that adding a scale parameter to the model does not decrease the extra-variation present in the dataset, since the binomial assumption forces a relationship between the mean and variance that may not hold for the data being analyzed. On the other hand, the estimated scale parameter is φ̂ = 3.1263 (part (b)). Pearson's residual analysis showed that its variance is 3.6257, which is considerably larger than 1, implying large overdispersion. In addition, the results of the fixed effects tests (part (c)) differ from those above (Table 6.12).
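The overdispersion adjustment described above can be sketched numerically, assuming the usual quasi-likelihood convention that variances scale by φ (the inputs below are illustrative):

```python
import math

def adjust_for_overdispersion(se, f_stat, phi_hat):
    """Rescale naive binomial results by the estimated scale parameter:
    variances scale by phi, so standard errors are multiplied by
    sqrt(phi) and F statistics are divided by phi."""
    return se * math.sqrt(phi_hat), f_stat / phi_hat

# e.g., with phi_hat = 3.1263, a naive SE of 0.20 and F of 10.0 are
# inflated and deflated, respectively
se_adj, f_adj = adjust_for_overdispersion(0.20, 10.0, 3.1263)
```

This makes explicit why an inflated φ̂ widens confidence intervals and weakens F-tests relative to the naive binomial analysis.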

Therefore, the third option based on assuming an alternative distribution (beta distribution) on the response variable is discussed below.


### 6.4.1.3 Alternative Distribution

Another approach to control the overdispersion is to use a different distribution on the interval [0, 1], such as the beta distribution, to model the data. Generally, this distribution yields good results when all experimental units have the same number of observations (successes plus failures), i.e., when Nijk = N; even when Nijk varies somewhat, the beta distribution often yields acceptable results. It is important to mention that the proportions come from binomial counts, and, therefore, we now define the response variable as pijk = yijk/Nijk so that it can be modeled with the beta distribution. The components of the beta response model are listed below:

Distribution: pijk | rk, (ra)ik ~ Beta(πijk, φ), with φ as the scale parameter, rk ~ N(0, σ²r), (ra)ik ~ N(0, σ²RA)
Linear predictor: ηijk = η + αi + rk + (αr)ik + βj + (αβ)ij
Link function: logit(πijk) = log(πijk/(1 − πijk)) = ηijk
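Under the mean/precision parameterization commonly used in beta regression, with Var(p) = π(1 − π)/(1 + φ), a larger φ means less dispersion; a Python sketch (the function name is ours):

```python
def beta_moments(pi, phi):
    """Mean and variance of a beta response with mean pi and precision
    (scale) parameter phi: Var = pi*(1 - pi)/(1 + phi)."""
    return pi, pi * (1 - pi) / (1 + phi)

# Unlike the binomial, the variance involves no sample size N and is
# governed by phi; e.g., with a precision near the estimate reported later
mean, var = beta_moments(0.5, 25.7018)
```

Because the variance is no longer tied to nπ(1 − π), the beta model can absorb extra-variation that the binomial cannot.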

As mentioned before, we now use the response variable pijk = yijk/Nijk. This new response variable is not the same as the one used with the binomial distribution. The following SAS commands fit a GLMM in a split-plot randomized complete block design with a beta response. It is important to mention that before implementing this model in SAS GLIMMIX, the variable p = pijk = yijk/Nijk was created in the dataset.

```
proc glimmix data=spd_pp nobound method=laplace; 
class Genotype Trt Block; 
model p = Genotype|Trt/dist=beta; 
random intercept trt/subject=block; 
lsmeans Genotype|Trt/lines ilink; 
run;
```


Table 6.14 Results of the analysis of variance, assuming binomial and beta distributions


Some of the SAS GLIMMIX output is listed below. Based on the fit statistics under the binomial (first alternative) and beta distributions (Table 6.13), the statistics related to the degree of overdispersion are clearly lower under the beta distribution than under the binomial distribution, indicating that the beta distribution provides a better fit (part (a)). Looking at the fit statistics for the conditional model in part (b), the values of the three fit statistics in the binomial model are higher than those in the beta model. The value of Pearson's chi-square/DF under the beta distribution is 1.01. This value indicates that the overdispersion has been virtually eliminated from the data and that the beta distribution is therefore a better candidate model for this dataset.

Adding the scale parameter (φ) to the model changes the variance components and standard errors in Table 6.14 (part (a)), and, therefore, the F- and t-tests are affected (part (b)). The estimated value of the scale parameter is φ̂ = 25.7018.


Table 6.15 Estimated means and standard errors on the model scale and the data scale

The variance components based on the binomial and beta models are listed below.

The treatment means (part (a)) and genotypes (part (b)) are presented in Table 6.15. The estimates on the model scale are listed under the column "Estimate" with their respective standard errors "Standard error," and the values on the data scale are listed under the column "MEAN" with their respective standard errors "Standard error mean." In the table of least squares means for the effect of genotypes, inconsistencies are observed in the values of t and in the standard error values of the means, so other estimation alternatives should be sought.

In large samples, the binomial and normal distributions are quite similar. Logically, the two previous analyses, binomial and beta, are attractive because of their consistency with the nature of the data. However, because of the inconsistencies in the estimates of the means for genotypes (infinite t-values and their standard errors), a more robust estimation alternative can be used, in this case, the normal distribution.

Assuming that pijk has a normal distribution with a mean μijk and constant variance σ2 , the components of this model are as follows:


Table 6.16 Results of the analysis of variance, assuming a normal distribution

Distribution: pctijk | rk, (ra)ik ~ Normal(μijk, σ²), with rk ~ N(0, σ²r) and (ra)ik ~ N(0, σ²RA)
Linear predictor: ηijk = η + αi + rk + (αr)ik + βj + (αβ)ij
Link function: ηijk = μijk (identity)

Similarly, in this example, the response variable used was pctijk = yijk/Nijk. This new response variable is not the same as the one used with the binomial distribution. The following SAS GLIMMIX commands fit a linear mixed model (LMM) under a split plot in a randomized complete block design with a normal response.

```
proc glimmix data=spd_pct nobound; 
class Genotype Trt Block ; 
model pct = Genotype|Trt; 
random block block*trt; 
lsmeans Genotype|Trt/lines; 
run;
```
Part of the results is shown below. The values of the fit statistics in part (a) of Table 6.16 are clearly lower than those estimated under the previous options. This indicates that the normal distribution is reasonable, even though the response is a proportion. The estimated variance components tabulated in part (b), due to blocks, blocks × treatment, and the mean squared error (MSE) (Residual = Gener. chi-square/DF), are σ̂²block = 0.000123, σ̂²block×trt = 0.00039, and σ̂² = MSE = 0.009442 ≈ 0.01, respectively.


Table 6.17 Means and standard errors for genotypes and treatments

The F-statistics indicate significant effects of genotype, treatment, and their interaction on the proportion of infested carrots (part (c)). The least squares means for genotypes and treatments are reported in parts (a) and (b) of Table 6.17. The genotypes showing the highest fraction of infested carrots were 1, 2, 3, 5, and 13, whereas genotypes 12 and 16 showed the lowest percentage of infested carrots. As for treatments, the highest proportion of infested carrots was observed in treatment 1 with 24.78%, whereas in treatment 2, it was 13.58%.

Based on the fixed effects tests, the interaction effect of genotype × treatment on the proportion of infested carrots was statistically different. Genotypes 9 and 16 showed higher susceptibility in treatment 1 followed by treatment 2, whereas genotypes 5, 11, 13, and 15 showed the same proportions of infested carrots in both treatments (Fig. 6.6). On the other hand, the genotypes that showed higher resistance to infestation levels were genotypes 1, 2, and 6, followed by genotypes 3, 4, 7, 8, 10, and 12.

Fig. 6.6 The average proportion of infested carrots in genotypes as a function of treatment

# 6.5 A Split-Split Plot in an RCBD: In Vitro Germination of Seeds

The growth of a plant in tissue culture can be explained by the combined effects of factors A, B, and C. Here, the availability and efficient use of chemical resources (factors) is of great relevance when availability is scarce or the reagents are too expensive. In light of this, combinations of three reagents (A, B, and C), with reagent A at three levels and reagents B and C at two levels each, were tested on the in vitro germination of orchid seeds. The combination of the levels of each of the factors is schematized below.



In each of the factor combinations, N orchid seeds were placed to germinate for a period of time. Let yijk be the number of seeds germinated at the ith level of factor A, the jth level of factor B, and the kth level of factor C. Since the observations are made at the sub-subplot level, conditional on the structural effects of the design, these observations have a variance associated with the sub-subplot. Therefore, the statistical model for this experiment is given below:


Distribution: yijkl | rl, (rα)il, (rαβ)ijl ~ Binomial(Nijk, πijk), with rl ~ N(0, σ²r), (rα)il ~ N(0, σ²rA), and (rαβ)ijl ~ N(0, σ²rab)

Linear predictor:

ηijk = η + αi + rl + (rα)il + βj + (αβ)ij + (rαβ)ijl + γk + (αγ)ik + (βγ)jk + (αβγ)ijk,

where blocks (rl), blocks × A ((rα)il), and blocks × A × B ((rαβ)ijl) are assumed to contribute to the variation, such that rl ~ N(0, σ²r), (rα)il ~ N(0, σ²rA), and (rαβ)ijl ~ N(0, σ²rab), respectively. This model uses the linear predictor ηijk to estimate the mean of the observations μijk.

Link function: logit(πijk) = ηijk

Table 6.18 below shows the data obtained from this experiment.

Table 6.19 presents the analysis of variance and shows the sources of variation and degrees of freedom for this experimental design.

The following SAS GLIMMIX program allows a GLMM with a split-split plot structure to be fitted in an RCBD with a binomial response.


Table 6.19 Sources of variation and degrees of freedom for the randomized block design with an arrangement of treatments under the split-split-plot structure

```
proc GLIMMIX data=germ nobound method=laplace; 
class Block A B C; 
model Y/N = A|B|C/dist=binomial link=logit; 
random block block*A block*A*B; 
lsmeans A|B|C/lines ilink; 
run;
```
Part of the output is shown in Table 6.20. The value of the conditional statistic Pearson's chi-square/DF = 1.81 (part (a)) indicates that there is overdispersion in the dataset since this value is greater than 1. The estimated variance components tabulated in part (b) correspond to blocks, blocks × factor A, and blocks × factor A × factor B: σ̂²r = 0.0752, σ̂²rA = 0.088, and σ̂²rAB = 0.0425, respectively. The type III tests of fixed effects are shown in part (c). Here, we see that the test of equality of treatments is not significant for factors A and B or the interaction A × B (A: P = 0.1917; B: P = 0.0897; A × B: P = 0.6262), whereas for factor C and the interactions A × C, B × C, and A × B × C, it is significant at the 5% level.
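The overdispersion diagnostic quoted above is the Pearson chi-square divided by its degrees of freedom. A minimal Python sketch (with made-up counts and fitted probabilities, not the germination data) shows how the statistic is assembled for binomial responses:

```python
def pearson_chisq_over_df(y, n, pi_hat, df):
    """Sum of squared Pearson residuals for binomial counts, divided by df.

    Each residual is (y - n*pi) / sqrt(n*pi*(1 - pi)); a ratio well
    above 1 signals overdispersion relative to the binomial model.
    """
    chisq = sum((yi - ni * p) ** 2 / (ni * p * (1 - p))
                for yi, ni, p in zip(y, n, pi_hat))
    return chisq / df

# Hypothetical counts out of 20 seeds with hypothetical fitted probabilities
y = [12, 7, 15, 3]
n = [20, 20, 20, 20]
pi_hat = [0.55, 0.40, 0.70, 0.20]
ratio = pearson_chisq_over_df(y, n, pi_hat, df=4)
```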

Since there is overdispersion (Pearson's chi-square/DF = 1.81), the binomial distribution does not provide a good fit for this dataset. An alternative for modeling these data is the beta distribution. Under this assumption, let the response variable be pijk = yijk/Nijk, the proportion of seeds that germinated; then pijk is assumed to have a beta distribution rather than the binomial distribution used for the success count yijk out of a total of Nijk Bernoulli trials.
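GLIMMIX parameterizes the beta distribution by its mean π and a scale parameter ϕ, so that Var(p) = π(1 − π)/(1 + ϕ): larger ϕ means the proportions concentrate more tightly around the mean. The Python sketch below illustrates this parameterization and its mapping to the classical shape parameters a = πϕ, b = (1 − π)ϕ (illustrative only):

```python
def beta_shapes(mu, phi):
    """Map the mean/scale parameterization to classical Beta(a, b) shapes."""
    return mu * phi, (1.0 - mu) * phi

def beta_variance(mu, phi):
    """Variance of a beta response under the mean/scale parameterization."""
    return mu * (1.0 - mu) / (1.0 + phi)

a, b = beta_shapes(0.5, 20.0)    # symmetric case: a = b = 10
var = beta_variance(0.5, 20.0)   # 0.25 / 21
```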

The components of the model are listed below:

Distribution: pijk | rl, (rα)il, (rαβ)ijl ~ Beta(πijk, ϕ), with ϕ as the scale parameter; rl ~ N(0, σ²r), (rα)il ~ N(0, σ²rA), (rαβ)ijl ~ N(0, σ²rAB)

Linear predictor: ηijk = η + αi + rl + (rα)il + βj + (αβ)ij + (rαβ)ijl + γk + (αγ)ik + (βγ)jk + (αβγ)ijk

Link function: logit(πijk) = log(πijk/(1 − πijk)) = ηijk


Table 6.20 Results of the analysis of variance of the RCBD in the split-split plot under the binomial distribution

The following SAS commands fit a GLMM on a split-split plot in a randomized complete block design assuming a beta distribution for the response variable.

```
proc glimmix data=germ nobound method=laplace; 
class Block A B C; 
model p = A|B|C/dist=beta; 
random block block*A block*A*B; 
lsmeans A|B|C/lines ilink; 
run;
```
Part of the results under a beta distribution is listed in Table 6.21. The value of the fit statistic for the conditional model tabulated in part (a) (Pearson's chi-square/DF = 1.01) indicates that the overdispersion has been removed and that the beta distribution is a good model for this dataset. Part (b) shows the variance component estimates for blocks, blocks × A, and blocks × A × B (σ̂²r = −0.157, σ̂²rA = −0.05558, and σ̂²rAB = −0.227, respectively) and the value of the estimated scale parameter ϕ̂ = 19.2789. According to the type III tests of fixed effects in part (c), the main effect of factor C (P = 0.0128) and the interaction A × B × C (P = 0.0424) are statistically significant at the 5% level.

The estimates of the interactions are shown in Table 6.22 on the model scale under the "Estimate" column and as probabilities on the data scale under the "Mean" column with its corresponding standard errors under the "Standard error mean" column.


Table 6.21 Results of the analysis of variance of the RCBD in the split-split plot structure under the beta distribution

Table 6.22 Estimated least mean squares on the model scale ("Estimate" column) and the data scale ("Mean" column)


Fig. 6.7 The average seed germination rate

The simple effects of the factors show that the best combination of factor levels was A2 × B1 × C2, with the highest seed germination proportion, followed by the combinations A1 × B1 × C2 and A3 × B2 × C2; lower proportions were observed for the combinations A1 × B2 × C2, A2 × B2 × C1, and A2 × B2 × C2 (Fig. 6.7). Finally, the combination A2 × B1 × C1 showed the lowest proportion of seed germination.

# 6.6 Alternative Link Functions for Binomial Data

In previous chapters, we used proc GLIMMIX with binomial data and, by default, it works with the "logit" link function. However, in certain applications with binomial data, other link functions are preferable, either because they are easier to interpret or because for some binomial datasets the logit link cannot accurately model the data and, as a result, produces biased (misleading) results. In this section, we consider two alternatives to the logit link for binomial data: the "probit" link and the complementary log-log link.

The probit model is also used to model dichotomous (Bernoulli) or binomial (sums of Bernoulli trials) responses. For this model, the link function, called the probit link, uses the inverse of the cumulative distribution function of the standard normal distribution to transform probabilities to the standard normal scale. That is, Φ⁻¹(πi) = ηi, which implies that πi = Φ(ηi), where

$$\Phi(z) = \int\_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt$$
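The probit pair Φ and Φ⁻¹ can be sketched numerically; the bisection inverse below is an illustrative stand-in for library routines such as scipy.stats.norm.ppf, not the book's code:

```python
import math

def norm_cdf(z):
    """Standard normal CDF, Phi(z), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit(p, lo=-10.0, hi=10.0, tol=1e-12):
    """Inverse CDF (the probit link) computed by simple bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eta = probit(0.975)       # about 1.96, the familiar normal quantile
p_back = norm_cdf(eta)    # the inverse link recovers the probability
```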

The use of the probit regression model dates back to Bliss (1934). Bliss was interested in finding an effective pesticide to control insects that fed on grape leaves. He discovered that the relationship between the response and a dose of pesticide was sigmoid, and he applied the probit link function to transform the dose–response curve from a sigmoid to a linear relationship.

The complementary log-log link, defined as ηi = log(−log(1 − πi)), with inverse πi = 1 − e^(−e^ηi), is useful for data in which most of the probabilities are near zero or near one. For small values of πi, the complementary log-log transformation produces results highly similar to those of the logit link. As the probability increases, the transformation approaches infinity more slowly than the probit or logit links.
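These properties are easy to verify numerically; the Python sketch below (illustrative only) compares the complementary log-log and logit transformations at a small probability and checks the inverse link:

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def cloglog(p):
    """Complementary log-log link: eta = log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))

def cloglog_inv(eta):
    """Inverse link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

# Near zero the two links almost coincide...
eta_cll = cloglog(0.01)
eta_log = logit(0.01)
# ...and the round trip through the inverse link recovers the probability
p_back = cloglog_inv(cloglog(0.30))
```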

# 6.6.1 Probit Link: A Split-Split Plot in an RCBD with a Binomial Response

This example revisits the dataset of the split-split plot in an RCBD of Sect. 6.5, where the data were modeled using the "logit" link. In this exercise, we fit the dataset using the "probit" link, and we compare and contrast the results with those obtained under the logit link. The components of the GLMM are identical to those in Sect. 6.5, except for the link function. That is, we replace:

Link function: logit(πijk) = log(πijk/(1 − πijk)) = ηijk by Φ⁻¹(πijk) = ηijk.

The following GLIMMIX syntax implements the fitting of the binomial data using the "probit" link function.

```
proc glimmix data=germ nobound method=laplace; 
class Block A B C; 
model Y/N = A|B|C/link=probit; 
random block block*A block*A*B; 
lsmeans A|B|C/lines ilink; 
run;
```
Table 6.23 shows part of the results under the binomial distribution with the "probit" link function. In parts (a) and (b), we see the fit statistics and the variance component estimates for blocks, whole plot, subplot, and sub-subplot; these values are positive, unlike the ones obtained with the "logit" link function. Since the variance components are positive, this analysis makes more sense than the one based on the logit link.

The type III tests of fixed effects are tabulated in part (c) of Table 6.23; the main effects of factors A and B and the interactions A\*B, A\*C, and B\*C are not significant in both link functions, whereas the main effect of factor C and the interaction A\*B\*C are statistically significant under the "probit" link.

The estimated probabilities π^ijk and their respective standard errors are presented in Table 6.24 for each of the combinations of the three factors, which


Table 6.23 Results of the analysis of variance of the RCBD in the split-split plot structure under the binomial distribution using the "probit" link

are very similar for both link functions. However, the average standard error is slightly higher with the "logit" link function (0.0711) than with the "probit" link (0.0693).

# 6.6.2 Complementary Log-Log Link Function: A Split Plot in an RCBD with a Binomial Response

Researchers studied three different micro-minerals (A, B, and C) on the attachment of explants of a commercial culture. Micro-mineral A was tested at three levels (i = 1, 2, and 3), and micro-minerals B and C at two levels each (j, k = 1, 2). The combination of the different levels yielded a total of 12 combinations. Since the researchers wanted to study factor C with greater precision, a split-plot treatment structure was used in which micro-minerals A and B were placed in the whole plot (a large plot) and micro-mineral C in the subplot (a small plot). Treatment combinations were arranged in an RCBD with two blocks (r = 1, 2). The outcome of interest was the number of live plants (yijkr) out of the total number of plants growing in the


Table 6.24 Means and standard errors using the probit and logit link functions

unit (nijkr). The data can be referred to in the Appendix (Data: Commercial crop explant attachment).

The GLMM for this experiment, using the complementary log-log link, is described below:

Distribution: yijkl | rl, r(αβ)ijl ~ Binomial(Nijkl, πijkl), with rl ~ N(0, σ²r) and r(αβ)ijl ~ N(0, σ²rAB)

Linear predictor: ηijkl = η + rl + αi + βj + (αβ)ij + r(αβ)ijl + γk + (αγ)ik + (βγ)jk + (αβγ)ijk, where blocks (rl) and blocks × (A × B) (r(αβ)ijl) are assumed to contribute to the variation such that rl ~ N(0, σ²r) and r(αβ)ijl ~ N(0, σ²rAB), respectively.

Link function: log(−log(1 − πijkl)) = ηijkl

The following GLIMMIX code fits the binomial proportions with the complementary log-log link function in an RCBD.

```
proc glimmix data=spp nobound method=laplace; 
class block A B C; 
model y/n = A|B|C/link=cloglog; 
random block block(A*B); 
lsmeans A|B|C/lines ilink; 
run;
```
The "link = ccll" option specifies that "proc GLIMMIX" will fit the model using the complementary (log - log) link function. The "lsmeans A|B|C/lines ilink" command calls for estimation of the linear predictors ηijk, whereas the "lines" and "ilink" options provide the comparison between the linear predictors and their inverse. Part of the output is shown below. Table 6.25 shows the variance component estimates of blocks and blocks (A×B) using alternative link functions. Under


Table 6.25 Variance component estimates using the same distribution but a different link function

Table 6.26 Type III tests of fixed effects using the same distribution but with a different link function


Table 6.27 Fit statistics using the same distribution but a different link function


the link "probit," the variance components are smaller compared to those obtained with the link functions "log – log" and "logit."

The values of the hypothesis tests for the fixed effects, both main effects and interactions, are shown in Table 6.26. The three link functions behave similarly.

One tool that can be useful for choosing which link function provides a better fit, or best describes the variability of a dataset, is the set of model fit statistics. The fit statistics indicate that the model with the complementary log-log link function provides the best fit (Table 6.27).

Table 6.28 shows the maximum likelihood estimators π^ijk for each of the link functions and the combination of factor levels, and it can be verified that they provide very similar estimates. It is important to mention that the correct


Table 6.28 Means and standard errors using the same distribution but with a different link function

specification of the linear predictor as well as the distribution of the response variable are the most important elements for obtaining a good fit.

# 6.7 Percentages

In this section, we consider proportions that have been calculated from discrete counts, for example, the number of infected plants out of a total of Ni plants in treatment i, which is likely to follow a binomial distribution. This class of models allows the response to arise from distributions for proportions other than the binomial, such as the beta distribution.

# 6.7.1 RCBD: Dead Aphid Rate

An experiment was designed to study the effect of conidial density on the transmission of a fungus that attacks aphids. Aphid carcasses killed by the fungus, and from which the fungus released spores, were placed on bean plants at three densities (A = 1, B = 5, or C = 10 carcasses per plant) to provide different doses of fungal conidia. Densities were assigned to individual bean plants in a completely randomized design with six replicates. A total of 20 live uninfected (N) aphids were placed on each plant with a ladybug that was allowed to forage (feed on the bean plants) to facilitate the transfer of conidia between the carcasses and the live aphids. For each plant, the number of aphids infected with the fungus was counted (nij) and the proportion of aphids infected with the fungus was calculated 7 days after the



Table 6.30 Sources of variation and degrees of freedom


inoculum was placed. The results shown below correspond to the proportion of infected aphids (pij = nij/N; N = 20) at each of the conidial densities tested (Table 6.29).

The sources of variation and degrees of freedom for this experiment are shown in Table 6.30.

The components of the GLMM having a beta response are listed below:

Distribution: pij | density(plant)i(j) ~ Beta(πij, ϕ), with density(plant)i(j) ~ N(0, σ²density(plant))

Linear predictor: ηij = μ + densityi + density(plant)i(j); i = 1, 2, 3; j = 1, ⋯, 6

Link function: logit(πij) = log(πij/(1 − πij)) = ηij

The following GLIMMIX program fits a GLMM in a completely randomized design with a beta distribution. Here, density is conc\_ino.

```
proc glimmix data=thumbs nobound method=laplace; 
class plant conc_ino; 
model p = conc_ino /dist=beta link=logit; 
random conc_ino(plant); 
lsmeans conc_ino/lines ilink; 
run;
```


Table 6.32 Means and standard errors on the model scale and the data scale

Part of the results is shown in Table 6.31. The value of the conditional fit statistic in part (a), Pearson's chi-square/DF = 1.02, indicates that there is no overdispersion in the data and that the beta distribution is a good model for this dataset. The estimated variance of plants nested within inoculum density is σ̂²density(plant) = −0.1833 and the estimated scale parameter is ϕ̂ = 12.999; both are tabulated in part (b). In part (c) of the same table, the type III tests of fixed effects indicate that the density (concentration) of the inoculum has a significant effect (P = 0.0038) on the proportion of aphids infected with the fungus.

The values under the "Estimate" column are estimated mean proportions on the model scale, whereas the "Mean" column shows the estimated mean proportions on the data scale with their respective standard errors (Table 6.32). These estimates were obtained with the "lsmeans" statement and the "ilink" option.

Figure 6.8 shows an increasing trend in the proportion of infected aphids as conidial density increases. Densities A and B showed statistically equal proportions of infected aphids, whereas the highest proportion of infected aphids was observed at density C.

Fig. 6.8 Proportion of aphids infected at different conidia concentration densities

# 6.7.2 RCBD: Percentage of Quality Malt

An agro-industrial engineer is interested in studying the effect of germination time in hours (48, 96, and 144) on the percentage of quality malt obtained from six sorghum varieties (Sorghum bicolor): Gambella 1107, Macia, Meko, Red Swazi, Teshale, and 76T1#23 (Bekele et al. 2012). The percentage of quality malt (y) as a function of both factors is shown in Table 6.33.

For this purpose, an RCBD was implemented with a treatment factorial structure (variety × germination time). The statistical model to analyze the dataset is the following:

Distribution: yijk | rk ~ Beta(πijk, ϕ); i = 1, ⋯, 6; j = 1, 2, 3; k = 1, 2, 3; rk ~ N(0, σ²block), where yijk is the percentage of quality malt observed in block k for the ith variety with the jth germination time.

Linear predictor: ηijk = μ + rk + αi + βj + (αβ)ij, where μ is the overall mean, αi is the fixed effect due to variety i, βj is the fixed effect due to germination time j, and (αβ)ij is the interaction effect between variety and germination time.

Link function: logit(πijk) = ηijk

Table 6.34 shows the sources of variation and degrees of freedom for this experiment.

The following GLIMMIX commands adjust a GLMM with a beta response.

```
proc glimmix data=malting nobound method=laplace; 
class var_sorghum ger_time block; 
model p = var_sorghum|ger_time/dist=beta link=logit; 
random block; 
lsmeans var_sorghum|ger_time/lines ilink; 
run;
```


Table 6.33 Percentage of quality malt as a function of both factors (variety and germination time)

Table 6.34 Sources of variation and degrees of freedom


Part of the results of the above program is shown in Table 6.35. In part (a), the value of Pearson's chi-square/DF = 0.92 is tabulated, which indicates that the beta distribution is a good choice for modeling the malt percentage since this value is close to 1. The estimated variance due to blocks is


Table 6.36 Means and standard errors on the model scale and the data scale for sorghum varieties


σ̂²block = 0.012 and the estimated scale parameter is ϕ̂ = 431 (part (b)), whereas the type III tests of fixed effects in part (c) show that sorghum variety has a significant effect on the malt quality percentage (P = 0.0001).

The least squares means on the model scale and the data scale for the factor variety are listed under the columns "Estimate" and "Mean" with their respective standard errors "Standard error" in Table 6.36.

Figure 6.9 shows that Teshale produced the highest average malt percentage (0.2685 ± 0.01436), followed by the varieties 76T1#23 and Meko (0.2313 ± 0.01316 and 0.246 ± 0.01366, respectively), whereas the variety Macia produced the lowest malt percentage (0.09915 ± 0.0074).

# 6.7.3 A Split Plot in an RCBD: Cockroach Mortality (Blattella germanica)

An entomologist is interested in testing six isolates of insect pathogenic fungi: five obtained from different hosts and one already known isolate (Control) of a fungus

Fig. 6.9 Percentage of quality malt of bicolor sorghum varieties

Table 6.37 Analysis of variance with sources of variation and degrees of freedom for this experiment


with potential for biological control of a particular species of cockroaches. To do so, the entomologist decides to test these fungal isolates on three different insect ages (age1 = E1, age2 = E2, and age3 = E3). Each of the isolates was placed in a Petri dish with 10 insects of a specific age. Each set (isolate–age) was randomly assigned to two blocks (Appendix: Data: Cockroaches).

The analysis of variance table (Table 6.37) with the sources of variation and degrees of freedom for this experiment is presented below. The response variable (percentage mortality) for this experiment is assumed to have a beta distribution.

The components that describe the model of this experiment are listed below:

$$\begin{aligned} &\text{Distribution: } y\_{ijk} \mid r\_k, \ r(\alpha)\_{k(i)} \sim \text{Beta}(\pi\_{ijk}, \phi); \ i = 1, \cdots, 6; \ j = 1, 2, 3; \ k = 1, 2 \\ &r\_k \sim N\left(0, \sigma\_r^2\right), \ r(\alpha)\_{k(i)} \sim N\left(0, \sigma\_{r(\alpha)}^2\right) \\ &\text{Linear predictor: } \eta\_{ijk} = \mu + r\_k + \alpha\_i + r(\alpha)\_{k(i)} + \beta\_j + (\alpha\beta)\_{ij} \\ &\text{Link function: } \text{logit}(\pi\_{ijk}) = \eta\_{ijk} \end{aligned}$$


The following GLIMMIX commands adjust a GLMM with a beta response.

```
proc glimmix nobound method=laplace; 
class block Isolation Age; 
model y = Isolation|Age/dist=beta link=logit; 
random Isolation/subject=block; 
lsmeans Isolation|Age/slice=Isolation lines ilink; 
run;
```
Some of the output is listed below (Table 6.38). The conditional statistic Pearson's chi-square/DF = 1 indicates that the distribution used is appropriate for this dataset (part (a)). The variance component estimates are tabulated in part (b): for blocks, the estimate is σ̂²r = −0.03125, and the estimated scale parameter is ϕ̂ = 24.1882. Part (c) shows the type III tests of fixed effects for the equality of means for type of isolate, age of the insect, and the interaction between both factors; all have a significant effect on insect mortality.

We see the expected proportions with their respective standard errors for both factors on the data scale under the "Mean" column (Tables 6.39 and 6.40). These values arise from applying the inverse link to the estimates on the model scale under the "Estimate" column. Table 6.39 shows the estimated average mortality probabilities for the isolates; for example, for isolate A1, applying the inverse link to the linear predictor estimate η̂1· = 0.1722 gives π̂1· = 1/(1 + e^(−0.1722)) = 0.5429. In this manner, we see that the expected proportions for isolates 2 and 4 are π̂2· = 0.6555 and π̂4· = 0.5762, respectively, whereas for the control π̂control = 0.1157.
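The back-transformation used above is simply the inverse logit; a short Python check (illustrative only) reproduces the value quoted for isolate A1:

```python
import math

def inverse_logit(eta):
    """Back-transform a model-scale estimate to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Linear predictor estimate quoted in the text for isolate A1
pi_A1 = inverse_logit(0.1722)   # ~0.5429, as reported in Table 6.39
```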

Regarding the age of the insect (Table 6.40), the expected average probability of mortality was highest at age three (adults), with π̂·3 = 0.6435, whereas insects at age two (E2) had the highest resistance to the isolates, showing a mortality of π̂·2 = 0.2598.

In general, fungal isolates A1, A2, A3, and A4 showed an average mortality of more than 75% for adult insects (E3), whereas isolates A1, A2, and A5 showed a


Table 6.39 Means and standard errors on the model scale and the data scale for isolation

Table 6.40 Means and standard errors on the model scale and the data scale for insect age


mortality rate of around 65% for cockroaches of age E1 (juvenile insects). On the other hand, all isolates showed lower lethal effectiveness on insects of age E2 (Fig. 6.10).

# 6.7.4 A Split-Plot Design in an RCBD: Percentage Disease Inhibition

A plant pathologist wishes to compare the response of two plant varieties to different doses of a pesticide formulated to protect plants against a disease. Five racks (blocks) were chosen to account for local variation within the greenhouse. Each rack was divided into four sections, and one of the four pesticide levels (1, 2, 4, and 8 mg/L) was randomly assigned to each section. One plant of each variety was placed in each section of the rack. Of the two plant varieties, one was susceptible, labeled S, and the other was resistant, labeled R (Table 6.41). The response variable (y) is the percentage of disease inhibition in the plant.

The sources of variation and degrees of freedom for this experiment are shown in Table 6.42.

Following the same reasoning used in the examples above, the components of the GLMM with a beta response that models the observed disease inhibition proportion (pijk) under dose i with variety j in block k are listed as follows:

Distribution: pijk | rk, (rα)ik ~ Beta(πijk, ϕ); i = 1, ⋯, 4; j = 1, 2; k = 1, ⋯, 5; rk ~ N(0, σ²r), (rα)ik ~ N(0, σ²rA)

Fig. 6.10 Cockroach mortality percentage


Table 6.41 Percentage of inhibition


Table 6.42 Sources of variation and degrees of freedom

Linear predictor: ηijk = μ + rk + αi + (rα)ik + βj + (αβ)ij, where rk is the random block effect, αi is the fixed dose effect, βj is the fixed variety effect, (rα)ik is the random effect of the block × dose interaction, and (αβ)ij is the fixed dose × variety interaction effect.

Link function: logit(πijk) = ηijk

The following GLIMMIX commands adjust a GLMM.

```
proc glimmix nobound method=laplace; 
class Variety dose block; 
model y = dose variety dose*variety /dist=beta link=logit; 
random Block Block*dose; 
contrast 'Linear dose' dose -3 -1 1 3; 
contrast 'Quadratic dose' dose 1 -1 -1 1; 
contrast 'dose Cubic' dose -1 3 -3 1; 
lsmeans variety|dose / slice=(variety dose) lines ilink; 
ods output lsmeans=dose_means; 
run;
```
The "contrast" command in the program can perform a hypothesis testing to see what trend (linear, quadratic, or cubic) the "dose" factor has on the percentage of disease inhibition. Part of the output is shown in Table 6.43. The value of the conditional goodness-of-fit statistic Pearson' s chi - square/DF= 0.59 indicates that we have no evidence of overdispersion, and, therefore, the beta distribution is adequate to model this dataset (part (a)). The variance component estimates in part (b) for block and block × dose are σ^<sup>2</sup> <sup>r</sup>= 0:004898 and σ^<sup>2</sup> <sup>r</sup> <sup>∙</sup> dose = 0:002372, respectively. Finally, the F-value provides sufficient statistical evidence of the effect of dose on disease decline in plants (P = 0.0001), whereas the effect of variety and dose × variety do not provide sufficient evidence.

Table 6.44 shows the polynomial contrasts for the effect of "dose," which indicate that there is a significant quadratic effect on the percentage of disease inhibition.
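The contrast coefficients in the program are the standard orthogonal polynomial coefficients for four equally spaced levels (the doses 1, 2, 4, and 8 mg/L are equally spaced on the log2 scale, so the contrasts describe trends in log dose). A quick Python check (illustrative only, not part of the SAS analysis) confirms that each contrast sums to zero and that the three are mutually orthogonal, partitioning the 3 degrees of freedom for dose:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Orthogonal polynomial contrasts for 4 equally spaced levels
linear    = [-3, -1,  1, 3]
quadratic = [ 1, -1, -1, 1]
cubic     = [-1,  3, -3, 1]

# Each contrast sums to zero and every pair has zero dot product
checks = [sum(linear), sum(quadratic), sum(cubic),
          dot(linear, quadratic), dot(linear, cubic), dot(quadratic, cubic)]
```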

The inhibition percentage shows an almost linear increase as the dose rises from 1 to 4 mg/L in both varieties, but at doses higher than 4 mg/L, disease inhibition decreases in both varieties (Fig. 6.11).


# 6.7.5 Randomized Complete Block Design with a Binomial Response with Multiple Variance Components

The dataset corresponds to an experiment implemented by Madden and Hughes (1995) on the incidence of the disease caused by the fungus Plasmopara viticola on grape plants (Vitis labrusca). Six treatments were tested in a randomized block design (b = 3), where treatment 1 was the control, to study the disease on three grape plants (v = 3). On a single date in autumn, five sprouts (r = 5) were randomly selected from each of the three grape plants, and the number of leaves with at least one mildew lesion (m) was counted out of a total of n leaves. The number of leaves per shoot ranged from 7 to 21. The data for this experiment can be found in the Appendix (Data: Disease incidence on grape plants).

The statistical model that could describe the incidence of disease in this experiment, if the response variable pijkl were treated as a normal variable, would be as described below:

$$p\_{ijkl} = \eta + \tau\_i + b\_j + (b\nu)\_{jk} + (b\nu r)\_{jkl} + \varepsilon\_{ijkl}$$

$$i = 1, 2, \dots, 6; \ j = 1, 2, 3; \ k = 1, 2, 3; \ l = 1, 2, \dots, 5$$

Fig. 6.11 Percentage of disease inhibition in both varieties

where pijkl is the proportion of diseased leaves, η is the intercept, τi is the fixed effect of treatment i, bj is the random effect of blocks with bj ~ N(0, σ²block), (bv)jk is the block × plant random effect with (bv)jk ~ N(0, σ²block×plant), (bvr)jkl is the random effect due to block × plant × sprout with (bvr)jkl ~ N(0, σ²block×plant×sprout), and εijkl is the experimental error with εijkl ~ N(0, σ²).

For the disease incidence data, the assumption of a normal distribution for pijkl is not recommended. A good starting point for the analysis is to assume that the observed number of diseased leaves in the sprouts (yijkl) follows a binomial distribution with parameter πijkl and nijkl, the total number of leaves on the sprout.

Therefore, the components of the GLMM with a binomial distribution in the response variable are as follows:

Distribution: yijkl | bj, (bv)jk, (bvr)jkl ~ Binomial(nijkl, πijkl), with bj ~ N(0, σ²block), (bv)jk ~ N(0, σ²block×plant), (bvr)jkl ~ N(0, σ²block×plant×sprout)

Linear predictor: ηijkl = η + τi + bj + (bv)jk + (bvr)jkl

Link function: logit(πijkl) = ηijkl

The following GLIMMIX syntax fits a GLMM with a binomial response.

```
proc glimmix method=laplace nobound; 
class v r b t; 
model m/n = t /dist=bin; 
random intercept v v*r/subject=b; 
lsmeans t/lines ilink; 
run;
```


Part of the results based on the aforementioned model is shown in Table 6.45. By default, proc GLIMMIX provides the fit statistics useful for selecting the best model from a group of models (part (a)).

In addition to accuracy considerations, the Laplace (or quadrature) analysis allows us to obtain the conditional distribution fit statistics, specifically Pearson's χ²/df. Recall that this statistic helps assess the goodness of fit of the model. A value of χ²/df ≫ 1 is an indicator of overdispersion in the dataset, which may arise because the linear predictor is incomplete or the assumed distribution is not suitable (mis-specified) for this dataset. In part (b), we can see that the value of the conditional distribution statistic is Pearson's χ²/df = 1.47. This value indicates that we have evidence of overdispersion. The variance component estimates due to block, block × plant, and block × plant × sprout are tabulated in part (c), whereas the type III tests of fixed effects (part (d)) indicate that there is a significant difference (P < 0.0001) between treatments.

Since there is overdispersion in the data in the binomial model, an alternative distribution is the beta distribution. The components of the GLMM are as follows:

$$\begin{split} & \text{Distribution: } p\_{ijkl} \mid b\_j, (\text{bv})\_{jk}, (\text{bvr})\_{jkl} \sim \text{Beta}(\pi\_{ijkl}, \phi); \\ & b\_j \sim N\left(0, \sigma\_{\text{block}}^2\right), (\text{bv})\_{jk} \sim N\left(0, \sigma\_{\text{block}\times\text{plant}}^2\right), (\text{bvr})\_{jkl} \sim N\left(0, \sigma\_{\text{block}\times\text{plant}\times\text{sprout}}^2\right) \\ & \text{Linear predictor: } \eta\_{ijkl} = \eta + \tau\_i + b\_j + (\text{bv})\_{jk} + (\text{bvr})\_{jkl} \\ & \text{Link function: } \text{logit}(\pi\_{ijkl}) = \eta\_{ijkl} \end{split}$$


The following SAS commands fit a GLMM under a beta distribution.

```
proc GLIMMIX method=laplace nobound; 
class v r b t; 
model pct = t /dist=beta link=logit; 
random intercept v v*r/subject=b; 
lsmeans t/lines ilink; 
run;
```
Some of the outputs are shown below. Table 6.46 shows that the values of the fit statistics, as well as the conditional distribution statistics (parts (a) and (b)), are much smaller than when the binomial distribution was used.

This indicates that the beta distribution is more appropriate for the dataset, as Pearson's χ²/df = 1.03, showing that the overdispersion problem was almost completely controlled. The variance component estimates as well as the estimated scale parameter ϕ̂ are tabulated in part (c). Similar to the previous analysis, the type III tests of fixed effects indicate that there is a highly significant difference (part (d)) among treatments in the average proportion of leaves with fungal disease.

The least squares means on the model scale (column "Estimate") and on the data scale (column "Mean") are tabulated in Table 6.47. The results indicate that


Table 6.47 Estimated means (least squares means) on the model scale and on the data scale



all proposed treatments in this study reduce the proportion of diseased leaves compared to the control treatment (t = 1).

The mean comparison (LSD) obtained with the option "lines" indicates that the proportion of diseased leaves in treatment one is statistically different from the rest of the treatments (Table 6.48).

# 6.8 Exercises

Exercise 6.8.1 Seeds of a particular crop were stored at four different temperatures (T1, T2, T3, and T4) under four different chemical concentrations (0, 0.1, 1.0, and 10). To study the effects of temperature and chemical concentration, a completely randomized experiment was conducted with a 4 × 4 factorial treatment structure and four replicates. For each of the 64 experimental units, 50 seeds were placed in a dish and the number of seeds that germinated under standard conditions was recorded. Germination data were obtained from Mead et al. (1993, p. 325) (Table 6.49).


Table 6.49 Seed germination experiment results


Table 6.50 Results of the apple sprouts experiment

The first number refers to the number of inoculations (n) and the second to the number of inoculations that developed the gangrenous sore (Y)


Exercise 6.8.2 Data were obtained from an experiment in which separate sprouts of apple trees were inoculated with macroconidia of the fungus Nectria galligena, which causes apple canker (a gangrenous sore). The experimental factors were inoculum density (three levels: 200, 1000, and 5000 macroconidia per ml) and variety (three levels: Jonagold, Golden Delicious, and Jonathan). The experiment was carried out in 4 randomized blocks with 12 plots. Each plot consisted of one sprout on which five inoculations were made. The numbers of successful inoculations per plot on day 17 after inoculation are shown in the table below (Table 6.50).


Exercise 6.8.3 This experiment concerns the germination efficiencies of protoplasts obtained from plants of seven species of the genera Lycopersicon (tomato) and


Table 6.51 Protoplast germination experiment results

Solanum (potato). For each species, three or four protoplast isolates were used and, depending on the availability of the protoplasts, a variable number of plates was prepared. Per plate, approximately 10^5 protoplasts were placed in a Petri dish, and, after 4 weeks, the proportion of dividing protoplasts was recorded. The results in percentages are listed below (Table 6.51).


Exercise 6.8.4 The data in this example are the results of a triangle test for 12 raters tasting 10 pairs of coffee varieties (Table 6.52). In the triangle test, each rater drinks three cups, one of one variety and two of the other. Each rater had 12 triangles for each pair of varieties, 2 for each of the following sequences: AAB, ABA, BAA, ABB, BAB, and BBA. The response is the number of correct identifications of the variety appearing only once. The experiment was conducted in two groups of six


Table 6.52 Triangle test (G = group, Eval = panelist, PdV = variety pair, V\_A = variety A; V\_B = variety B; Y = number of correct discriminations, n = number of trials)


Table 6.52 (continued)

evaluators, each with the aim of discriminating the abilities of the panelists for future evaluations. The data for this example are shown below:


Exercise 6.8.5 Several brewing techniques are used in the production of espresso coffee. Among them, the most widespread are bar machines and single-dose pods, produced in large numbers due to their commercial popularity. This experiment compares the effects of three different brewing techniques on espresso quality, measured as the foaming rate (Y, in percentage): method 1 = bar machine (BM), method 2 = hyper-espresso method (HIP), and method 3 = I-espresso system (IT). Nine replicates per method were carried out (Table 6.53).



Table 6.53 Experimental results of espresso coffee



Exercise 6.8.6 The decision to adopt a particular scale for data involving small integers is not an easy one, because the analysis must be adequate enough to yield estimates with as little uncertainty as possible. As a simple example of this type of data, consider the following results from a potted wheat germination experiment (Table 6.54).


Exercise 6.8.7 A greenhouse experiment was carried out to investigate how a disease (agurkesyge) spreads in two varieties of cucumber; the spread is supposed to depend on the climate and the amount of fertilizer used for the two varieties. The following data come from the Department of Plant Pathology. Two climates were used: (1) change to day temperature 3 hours before sunrise and (2) normal change to day temperature. Three amounts of fertilizer were applied: normal (2.0 units), high (3.5 units), and very high (4.0 units). The two varieties were Aminex and Dalibor. To have a better controlled experiment, the plants were "standardized" to have equally many leaves, and then (on day 0, say) the plants were contaminated with the disease. Eight days after the plants were contaminated, the amount of infection (in percentage) was recorded. From the resulting infection curve, two measures were calculated (in a manner not specified here), namely, the rate of spread of the disease (%) and the level of infection at the end of the disease period. The experiment was implemented in three blocks, each of which consisted of two sections. Each section consisted of three plots, which were divided into two subplots, each of which had six to eight plants. Thus, there were a total of 36 subplots. The results were recorded for each subplot. The experimental factors were randomly assigned to the different units as follows: the two climates to the two sections within each block, the three amounts of fertilizer to the three plots within each section, and, finally, the two varieties to the two subplots within each plot. The data are shown below (Table 6.55).


Exercise 6.8.8 This example is an experiment to identify damage to the uterus in laboratory rodents after exposure to boric acid, a compound widely used in pesticides, pharmaceuticals, and other household products (Heindel et al. 1992). The study design included four doses of boric acid. The compound was administered to pregnant female mice during the first 17 days of gestation, and then the females were sacrificed and their litters examined. The table below presents the resulting counts of litters dying in utero (Y) out of the total number of trials conducted (N) at each of the four doses tested: d1 = 0 (control), d2 = 0.1, d3 = 0.2, and d4 = 0.4 (as a percentage of boric acid in the diet) (Table 6.56).



Table 6.55 Greenhouse experiment results of cucumber varieties


Table 6.56 Rodent experiment results

# Appendix







Data: Cockroaches (E1 = np, E2 = ng, E3 = adult)



Data: Disease incidence in grapevine plants (b = block, v = plant, r = shoot, t = treatment, m = number of diseased leaves per shoot, and n = total number of leaves per shoot).





Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Chapter 7 Time of Occurrence of an Event of Interest

# 7.1 Introduction

In the biological sciences, animal science, and agronomy, a common outcome of interest is the time at which an event of interest occurs. The main characteristic of these data is that the subjects/experimental units are usually observed for different periods of time until the event of interest occurs. These events may be adverse, such as the death of an experimental unit or the cessation of lactation, or positive, such as conception in a female receiving a particular treatment or the onset of estrus in a female undergoing hormone treatment, among others. Because of the characteristics of these response variables, a "normal" distribution is often a poor choice for modeling the time at which the event of interest occurs. Exponential, log-normal, gamma, Weibull, and other more complex distributions tend to be better choices for modeling these phenomena.

Fitting a generalized linear mixed model (GLMM) is a good option for analyzing these phenomena because the conditional response distribution of the random effects of this model has desirable properties. In this vein, it is conventional to speak of survival data and survival analysis, regardless of the nature of the event. Similar data also arise when measuring the time to complete a task, such as walking 50 meters, passing an agronomy exam, performing a sensory evaluation of coffee, and so on. The purpose of this chapter is to provide the reader with the essential language of linear models and the connection between GLMMs and survival analysis.

# 7.2 Generalized Linear Mixed Models with a Gamma Response

The gamma family of distributions encompasses continuous, nonnegative, right-skewed values. A gamma distribution has two positive parameters, α and β, and its probability density function is given by:

$$f(y; \alpha, \beta) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} y^{\alpha - 1} e^{-y/\beta}, \quad y \ge 0$$

where $\Gamma(\alpha) = \int\_0^{\infty} t^{\alpha - 1} e^{-t} dt$ is the gamma function (Casella and Berger 2002). The mean and variance of a gamma random variable are $E[Y] = \alpha\beta = \mu$ and $\text{Var}[Y] = \alpha\beta^2 = \mu^2/\alpha$, respectively. This density function can be rewritten in terms of the mean μ and the scale parameter ϕ = 1/α:

$$f(y; \mu, \phi) = \frac{1}{\Gamma\left(\frac{1}{\phi}\right)(\mu\phi)^{1/\phi}} y^{\frac{1}{\phi} - 1} e^{-\frac{y}{\mu\phi}}, \quad y \ge 0.$$
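The two forms describe the same density: substituting α = 1/ϕ and β = μϕ into the first expression yields the second, with E[Y] = μ and Var[Y] = ϕμ². The following Python sketch (standard library only; the values of μ and ϕ are arbitrary illustrative choices) verifies the equivalence numerically:

```python
import math

def gamma_pdf_shape_scale(y, alpha, beta):
    """Gamma density with shape alpha and scale beta."""
    return (y ** (alpha - 1) * math.exp(-y / beta)
            / (math.gamma(alpha) * beta ** alpha))

def gamma_pdf_mean_scale(y, mu, phi):
    """Same density re-expressed with mean mu and scale phi = 1/alpha."""
    inv = 1.0 / phi
    return (y ** (inv - 1) * math.exp(-y / (mu * phi))
            / (math.gamma(inv) * (mu * phi) ** inv))

mu, phi = 24.0, 0.0667            # arbitrary illustrative values
alpha, beta = 1.0 / phi, mu * phi  # alpha = 1/phi, beta = mu*phi
assert math.isclose(gamma_pdf_shape_scale(10.0, alpha, beta),
                    gamma_pdf_mean_scale(10.0, mu, phi))
```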

# 7.2.1 CRD: Estrus Induction in Pelibuey Ewes

Estrus induction in ewes is a very common practice carried out in livestock farms or at research centers. For this, an animal researcher uses gonadotropin-releasing hormone (GnRH), equine chorionic gonadotropin (eCG), and P4 in a controlled internal drug-releasing (CIDR) intravaginal device in female Pelibuey ewes (n = 78) with single, double, and triple lambing as treatments. In order to ensure that all animals were in good condition during the experiment, ewes received the same zootechnical management and feeding. For this experiment, the ewes were synchronized on the same day under a synchronization protocol. Table 7.1 presents the analysis of variance (ANOVA).

The variables evaluated in this experiment were the time of onset and the duration of estrus (yij) in hours according to the type of lambing. The variability among ewes in weight, age, and body condition must be taken into account in the



Table 7.2 Results of the analysis of variance

analysis. The data from this experiment can be found in the Appendix 1 of this book (Data: Pelibuey Sheep). Thus, the components of a gamma GLMM are as follows:

$$\begin{aligned} \text{Distributions: } y\_{ij} \mid r(\tau)\_{ij} &\sim \text{Gamma}\left(\mu\_{ij}, \phi\right); i = 1, 2, 3; j = 1, \cdots, r\_i; \\ r(\tau)\_{ij} &\sim N\left(0, \sigma^2\_{\tau(\text{animal})}\right) \end{aligned}$$

Linear predictor: $\eta\_{ij} = \mu + \tau\_i + r(\tau)\_{ij}$

Link function: $\log\left(\mu\_{ij}\right) = \eta\_{ij}$

where $\eta\_{ij}$ is the linear predictor for treatment i (type of birth: single, double, or triple) in ewe j, μ is the overall mean, $\tau\_i$ is the fixed effect due to type of birth (treatment), and $r(\tau)\_{ij}$ is the random effect due to ewe j within type of birth, with $r(\tau)\_{ij} \sim N\left(0, \sigma^2\_{\tau(\text{animal})}\right)$.

The following GLIMMIX program fits the model

```
proc glimmix nobound method=laplace;
class animal birthtype;
model Inestro = birthtype/dist=gamma;
random birthtype (animal);
lsmeans birthtype/lines ilink;
run;
```
Part of the results is reported in Table 7.2.

Subsection (a) shows the estimated variance component due to the type of parturition within females, $\hat{\sigma}^2\_{\text{birthtype(animal)}} = -0.0157 \pm 0.0837$, as well as the scale parameter $\hat{\phi} = 0.06668$.

Table 7.2 (b) shows the results of the hypothesis tests for type III fixed effects, which indicate that there is a statistically significant effect of treatment (type of birth) on the time of onset and duration of ewe estrus.


Table 7.3 Means and standard errors on the model scale ("Estimate" column) and the data scale ("Mean" column) for the onset and duration of estrus in Pelibuey ewe lambs

The last two columns of Table 7.3, labeled "Mean" and "Standard error," correspond to the means $(\hat{\mu}\_{ij})$ on the data scale for the ewes' mean onset and duration of estrus with their respective standard errors. For example, the mean time to onset of estrus in single-birth ewes was 26.87 ± 1.78 hours, whereas for double- and triple-birth ewes, it was 21.37 ± 0.98 and 21.1 ± 0.95, respectively. On the other hand, the average duration of estrus (in hours) was longer in double- and triple-birth ewes (14.46 ± 1.69 and 16.56 ± 1.63, respectively) than in single-birth ewes (5.38 ± 0.81).
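With a log link, the "ilink" back-transformation behind the "Mean" column is $\hat{\mu} = \exp(\hat{\eta})$, and the reported standard error is the delta-method approximation $\text{SE}(\hat{\mu}) = \exp(\hat{\eta}) \cdot \text{SE}(\hat{\eta})$. The sketch below illustrates this arithmetic in Python; the model-scale values are hypothetical, chosen only so that the back-transformed mean lands near the 26.87-hour onset mean for single-birth ewes:

```python
import math

def ilink_log(estimate, se):
    """Back-transform a log-link estimate and its standard error
    to the data scale (delta method)."""
    mean = math.exp(estimate)
    return mean, mean * se

# Hypothetical model-scale values (not taken from Table 7.3):
eta_hat, se_eta = 3.291, 0.066
mean, se_mean = ilink_log(eta_hat, se_eta)
# mean is roughly 26.9 hours; se_mean roughly 1.8 hours
```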

# 7.2.2 Randomized Complete Block Design (RCBD): Itch Relief Drugs

A total of 10 male volunteer patients between 20 and 30 years of age participated as a study group to compare 7 treatments (Trts) (5 drugs, 1 placebo, and 1 no drug) to relieve their itching. Since each subject responded differently to each drug, and, in addition, each subject received a different treatment in the 7 days of study, each of the subjects can be considered a block. Treatment assignment was randomized across days. Except for the drug-free day, subjects were administered the treatment intravenously, and, then, their forearms were induced to itch using an effective itch stimulus called cowage. The duration of itching, in seconds, was recorded. The data are shown in Table 7.4.

From left to right, the drugs used were papaverine = Papv, morphine = Morp, aminophylline = Amino, pentobarbital = Pent, and tripelennamine pentobarbital = Tripel.

The analysis of variance table (Table 7.5) shows the sources of variation and degrees of freedom for this experiment.


Table 7.4 Time taken to get rid of the itch

Table 7.5 Sources of variation and degrees of freedom


The components of the GLMM with a gamma response are as follows:

$$\begin{aligned} \text{Distributions: } y\_{ij} \mid r\_j &\sim \text{Gamma}\left(\mu\_{ij}, \phi\right); i = 1, \cdots, 7; j = 1, \cdots, 10; \\ r\_j &\sim N\left(0, \sigma^2\_{\text{patient}}\right) \end{aligned}$$

Linear predictor: $\eta\_{ij} = \mu + r\_j + \tau\_i$

Link function: $\log\left(\mu\_{ij}\right) = \eta\_{ij}$

where $\eta\_{ij}$ is the linear predictor for treatment i and block (patient) j, μ is the overall mean, $r\_j$ is the random effect of patient j with $r\_j \sim N\left(0, \sigma^2\_{\text{patient}}\right)$, and $\tau\_i$ is the fixed effect due to treatment.

Note that although the exponential and gamma distributions have a canonical link equal to the inverse of the mean, gamma and exponential GLMMs most often use the computationally more stable log link, which was used in this and in the previous analysis.
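The practical difference between the two links is easy to see: the inverse of the log link, exp(η), returns a valid (positive) mean for any real linear predictor, while the inverse of the canonical link, 1/η, can leave the gamma parameter space. A small Python illustration (not part of the original analysis):

```python
import math

def inv_log_link(eta):
    """Inverse of the log link: always a positive (legal) gamma mean."""
    return math.exp(eta)

def inv_canonical_link(eta):
    """Inverse of the canonical (inverse) link for the gamma family."""
    return 1.0 / eta  # undefined at 0, negative for eta < 0

etas = [-2.0, -0.5, 0.5, 2.0]
assert all(inv_log_link(e) > 0 for e in etas)         # always a legal mean
assert any(inv_canonical_link(e) <= 0 for e in etas)  # can leave the parameter space
```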

The following GLIMMIX syntax fits a GLMM in randomized complete blocks.

```
proc glimmix nobound method=laplace;
class Patient Trt;
model y = Trt/dist=gamma;
random Patient;
lsmeans Trt/lines ilink;
run;
```


The conditional model statistics (Pearson's chi-square/df = 0.08) as well as the variance component (patient) and the scale parameter ϕ of the model indicate that the gamma model adequately describes the dataset (Table 7.6, parts (a) and (b)). The analysis of variance (Table 7.6, part (c)) indicates that there is a highly significant difference between treatments in the mean itch duration (P = 0.0030).

The dispersion observed in the following plot (top left) of the residuals versus the linear predictor value suggests that the variance is constant and homogeneous (Fig. 7.1). The histogram (upper right) shows a nearly symmetrical pattern with little bias. Furthermore, the residuals versus quantile plot (bottom left) shows no marked deviations, indicating that the fit is adequate. Finally, the bottom right plot shows that the average residuals are zero and vary between -0.5 and 0.75.

The "lsmeans" on the data scale, for each of the five treatments, the placebo, and the control treatment, are shown under the "Mean" column with their respective "Standard error" in Table 7.7. Each of the five drugs appears to have a significant effect compared to the placebo and control, with papaverine (Papv) being the most effective drug. The placebo and control treatment have statistically similar means. The relatively large difference in the placebo group suggests that some patients responded negatively to the placebo compared to the control, whereas others responded positively.

Figure 7.2 shows that the drug papaverine significantly reduced the itching time, followed by the drugs aminophylline and morphine, whereas the efficacies of the drugs pentobarbital and tripelennamine were highly similar to each other in eliminating itching.

# 7.2.3 Factorial Design: Insect Survival Time

This experiment consisted of studying the effectiveness of four different types of insecticides (Insec1, Insec2, Insec3, and Insec4) at three different concentration levels (low, medium, and high) on the survival time (in hours) of a particular species

Fig. 7.1 Conditional residuals

Table 7.7 Means and standard errors on the model scale ("Estimate" column) and the data scale ("Mean" column) for the average duration time of the itch


of beetles (Appendix 1: Data: Beetles). The interaction between both factors (insecticide \* dose) yielded a total of 12 combinations (treatments). The objective of this study was to compare the insecticides, dose, and interaction with beetle survival time. Due to the intrinsic characteristics of each of the insects, these must be considered as a source of variation in the experiment, since they respond differently

Fig. 7.2 Average time taken to eliminate itching

Table 7.8 Sources of variation and degrees of freedom


to certain stimuli. Assuming that 48 beetles are available, they were randomly assigned equally to 4 groups (blocks) with 12 treatment combinations. That is, four beetles were randomly assigned to each treatment.

The sources of variation and degrees of freedom for this experiment are shown in the following analysis of variance table (Table 7.8).

The components of the gamma-response GLMM are as follows:

$$\begin{aligned} \text{Distributions: } y\_{ijk} \mid r\_k &\sim \text{Gamma}\left(\mu\_{ijk}, \phi\right); i = 1, \dots, 4; j = 1, 2, 3; k = 1, \dots, 4; \\ r\_k &\sim N\left(0, \sigma\_{\text{block}}^2\right) \end{aligned}$$

Linear predictor: $\eta\_{ijk} = \mu + r\_k + \alpha\_i + \beta\_j + (\alpha\beta)\_{ij}$

Link function: $\log\left(\mu\_{ijk}\right) = \eta\_{ijk}$

The following GLIMMIX commands fit a GLMM with a gamma response.


```
proc glimmix nobound method=laplace;
class dose insecticide insect;
model time = dose|insecticide/dist=gamma;
random insect;
lsmeans dose|insecticide/lines ilink;
run;
```

Part of the Statistical Analysis Software (SAS) output is shown in Table 7.9. The value of the conditional model's Pearson's chi-square/df = 0.04 indicates that the gamma distribution adequately models the data. The estimated variance component for blocks and the scale parameter given by the "residual" value are shown in part (b) ($\hat{\sigma}^2\_{\text{block}} = -0.00173$ and $\hat{\phi} = 0.04155$, respectively).

The analysis of variance in part (c) of Table 7.9 indicates that insecticide and dose (P = 0.0001) differ significantly in their effectiveness (toxicity) on beetle survival time, whereas the interaction between both factors is only close to significance (P = 0.0868). The "lsmeans" values on the data scale for dose $\mu\_{i\cdot\cdot}$ (part (a)) and insecticide $\mu\_{\cdot j\cdot}$ (part (b)), with their respective standard errors, are listed under the columns titled "Mean" and "Standard error mean" of Table 7.10.

The combination of levels of both factors affected the average survival time of the beetles (Table 7.11). For insecticides 1 and 3 at a high dose, the survival time was lower with average times of 2.1 ± 0.209 and 2.35 ± 0.334 hours, respectively. In general, low values of survival times were observed for insecticides 1 and 3 compared to insecticides 2 and 4.

# 7.2.4 A Split Plot with a Factorial Structure on a Large Plot in a Completely Randomized Design (CRD)

Four samples were obtained from each of two batches (Reps) of unprocessed gum from Acacia sp. trees, with eight samples in total. Within each batch, the four


Table 7.10 Means and standard errors on the model scale ("Estimate") and the data scale ("Mean") for the factor dose and type of insecticide

Table 7.11 Means and standard errors on the model scale and the data scale for the interaction between dose and type of insecticide



samples were randomly assigned to combinations of two factors with two levels each. The first factor refers to whether the gum was demineralized or not, and the second factor refers to whether the gum was pasteurized or not. An emulsion made from each gum sample was divided into three smaller parts, which were randomly assigned to the levels of a third factor, pH, adjusted to 2.5, 4.5, or 5.5 using citric acid (Appendix 1: Data: Gum Breakdown Times).

This is a split-plot design, with the gum samples as whole plots in a completely randomized arrangement. The combined levels of demineralization and pasteurization are the whole-plot factors. The split plots are the smaller parts, each with a specific pH, which is the only split-plot factor. The response measured (y) was the


Table 7.12 Sources of variation and degrees of freedom

time to break, i.e., the time (in hours) until the emulsion failed. The sources of variation and degrees of freedom for this experiment are shown in Table 7.12.

The components of the GLMM with a Gamma response are as follows:

$$\begin{aligned} \text{Distributions: } y\_{ijkl} \mid r\_l, \alpha\beta(r)\_{ijl} &\sim \text{Gamma}\left(\mu\_{ijkl}, \phi\right); i = 1, 2; j = 1, 2; k = 1, 2, 3; l = 1, 2; \\ r\_l \sim N\left(0, \sigma\_r^2\right), &\ \alpha\beta(r)\_{ijl} \sim N\left(0, \sigma\_{r(\alpha\beta)}^2\right) \end{aligned}$$

$$\text{Linear predictor: } \eta\_{ijkl} = \mu + \alpha\_i + \beta\_j + (\alpha\beta)\_{ij} + \alpha\beta(r)\_{ijl} + \gamma\_k + (\alpha\gamma)\_{ik} + (\beta\gamma)\_{jk} + (\alpha\beta\gamma)\_{ijk}$$

Link function: $\log\left(\mu\_{ijkl}\right) = \eta\_{ijkl}$

where $\alpha\_i$, $\beta\_j$, and $\gamma\_k$ are the fixed effects due to the factors demineralization, pasteurization, and pH, respectively; the effects $(\alpha\beta)\_{ij}$, $(\alpha\gamma)\_{ik}$, $(\beta\gamma)\_{jk}$, and $(\alpha\beta\gamma)\_{ijk}$ are the two- and three-way interactions of the factors under study; and $\alpha\beta(r)\_{ijl}$ is the random effect due to the demineralization × pasteurization × rep interaction, assuming $\alpha\beta(r)\_{ijl} \sim N\left(0, \sigma\_{r(\alpha\beta)}^2\right)$.

The GLIMMIX commands for setting this GLMM are as follows:

```
proc glimmix nobound method=laplace;
class Batch Demineralization Pasteurization pH;
model y = Demineralization|Pasteurization|pH/dist=gamma;
random batch(Demineralization*Pasteurization);
lsmeans Demineralization|Pasteurization|pH/lines ilink;
run;
```


Table 7.13 Results of the analysis of variance

The relevant results from the SAS output are shown in Table 7.13. The value of the conditional model statistic, χ2/df = 0.01, indicates that the gamma distribution does not cause overdispersion. The variance component due to block × demineralization × pasteurization, $\sigma\_{r(\alpha\beta)}^2$, and the scale parameter ϕ are shown in part (b).

The hypothesis tests for type III fixed effects are presented in part (c) of Table 7.13, where significant effects of the factors demineralization, pasteurization, and pH, as well as of the demineralization\*pasteurization interaction, are observed. However, the interactions demineralization\*pH (P = 0.0676) and demineralization\*pasteurization\*pH (P = 0.0535) are only close to significance. The emulsion breaking time is strongly affected by no demineralization (demineralization = 1) and no pasteurization (pasteurization = 1) of the gum and, to a lesser extent, by the pH adjusted in the gum (Table 7.14).

Analyzing the simple effects of the factors, we can observe that when the gum has not been pasteurized (B = 1), the average emulsion breaking time is very similar for the demineralized and non-demineralized gum at the three pH levels. However, when the gum has been pasteurized, demineralization has a significant impact on the emulsion breaking time; for example, for a gum that is not demineralized but pasteurized (A1B2), the emulsion breaking time is much lower than when the gum has been demineralized and pasteurized (A2B2) at all three pH levels. Finally, a demineralized, pasteurized gum at pH = 4.5 yields the highest breaking stability (Table 7.15).


Table 7.14 Means and standard errors of the main effects on the model scale (Estimate) and the data scale (Mean)

Table 7.15 Means and standard errors of the simple effects on the model scale (Estimate) and the data scale (Mean)


Demineralization\*pasteurization\*pH least squares means

A = demineralization (1 = no, 2 = yes), B = pasteurization (1 = no, 2 = yes), and C = pH (1 = 2.5, 2 = 4.5, and 3 = 5.5)

# 7.3 Survival Analysis

When research focuses on the time of occurrence of a specific event, we usually refer to survival times, and hence the statistical analysis of these times, as mentioned above, is known as survival analysis. A very characteristic feature of survival times is the presence of censored times, that is, cases in which an individual's actual survival time is not known.

For a set of survival times (including censored ones) of a sample of individuals, it is possible to estimate the proportion of the population that will survive a time interval under the same circumstances. The methods used to make this estimate are based on the proposal of Kaplan and Meier (1958). This method allows – through different statistical tests (log rank, Breslow, Tarone–Ware, etc.) – the comparison of the survival of two or more groups of individuals who differ with respect to certain factors.

Survival analysis focuses its interest on a group or several groups of individuals for whom an event is defined, which occurs after a time interval. To determine the time of interest, there are three requirements: an initial time, a scale to measure the passage of time (minutes, hours, days, etc.), and clarity about what is meant by the event of interest.

Survival of an individual is conceptually the probability of being alive in a given time " t " from diagnosis, i.e., initiation of treatment or complete remission for a group of individuals. In clinical studies, survival times often refer to time till death, development of a particular symptom, or relapse after complete remission of a disease. Failure is defined as death, relapse, or the occurrence of a new disease. In many survival analyses, when the end of the observation period previously set by the investigator is reached, there are individuals to whom the event has not occurred and we do not know when it will occur. Therefore, the actual survival time for them is unknown, and only the survival time to the end of the study is known. Such survival times are called censored times. It also happens, in some cases, that some individuals do not continue the study until the end of the analysis period for reasons unrelated to the research, e.g., death from other causes; these times are also censored. These censored data contribute valuable information and, therefore, should not be omitted from the analysis.

The pharmaceutical and food industries are legally required to label the shelf life of their product on the packaging. For pharmaceuticals, the requirements for how to determine shelf life are highly regulated. However, the regulatory standards do not specifically define shelf life. Instead, the definition is implicit through the estimation procedure. The interest is in the situation where multiple batches are used to determine a shelf life of a product that applies to all future batches. Consequently, both shelf life and label life are of great importance because of the variability within and between batches. Product development must be very well thought out before a company can have confidence in shelf life estimates. The company must be able to reliably produce a homogeneous product from batch to batch of ingredients, as physical and chemical factors impact the ability of bacteria to grow, such as pH, water activity, and uniformity of the mix (moisture distribution, salt, preservative or food acid) and, consequently, the shelf life of the product. Therefore, products should be inspected at appropriate times and samples should be tested for critical stability of physical and chemical characteristics. These tests also provide an opportunity to begin microbiological testing for spoilage organisms. Testing should continue beyond the intended shelf life unless the product fails earlier. Testing should lead to an understanding of target levels and ranges of ingredients for evaluation of the critical physical and chemical characteristics of the product over the intended shelf life.

Survival analysis is the name for a collection of statistical techniques used to describe and quantify the time until the event of interest occurs. The term "survival time" refers to the amount of time taken for the event to occur. Situations in which survival analyses have been used in epidemiology include:


# 7.3.1 Concepts and Definitions

To clearly understand and interpret a rate of change calculated from event data, a more formal approach is needed. The definition of a rate of change begins with the mathematical description of a pattern changing over time, represented by the symbol S(t). A ratio is created by dividing the change in the function S(t) [from S(t) to S(t + Δt)] by the corresponding change in time [from t to t + Δt], producing the rate of change

$$\text{rate of change} = \frac{\text{change in } S(t)}{\text{change in time}} = \frac{S(t) - S(t + \Delta t)}{(t + \Delta t) - t} = \frac{S(t) - S(t + \Delta t)}{\Delta t}$$

Rates of change, with respect to time, apply to a variety of situations, but one specific function, traditionally denoted by S(t), is fundamental to the analysis of survival data. This is called the survival function and is defined as the probability of surviving beyond a specific point in time (denoted by t). That is:

$$\begin{aligned} S(t) &= P(\text{surviving from time } 0 \text{ to time } t) \\ &= P(\text{survival in the interval } [0, t]) \end{aligned}$$

Equivalent to

$$S(t) = P(\text{surviving beyond time } t) = P(T \ge t) = 1 - F(t)$$

where F(t) is the cumulative distribution function with F(t) = P(T ≤ t). Another important concept in survival analysis is the hazard function h(t). The hazard function that depends on T is defined as

294 7 Time of Occurrence of an Event of Interest

$$h(t) = \lim\_{\Delta t \to 0} \left\{ \frac{P(t \le T < t + \Delta t | T \ge t)}{\Delta t} \right\}.$$

which can be rewritten as

$$h(t) = \lim\_{\Delta t \to 0} \left\{ \frac{F(t + \Delta t) - F(t)}{\Delta t} \right\} \times \frac{1}{P(T \ge t)}$$

$$h(t) = \frac{f(t)}{S(t)}$$

where f(t) is the probability density function. Any distribution defined for t ∈ [0, ∞) can serve as a survival distribution. Consequently,

$$h(t) = -\frac{d}{dt} \{ \log S(t) \}.$$

It then follows that

$$S(t) = \exp\{-H(t)\}$$

where H(t) is the cumulative hazard function

$$H(t) = \int\_0^t h(u) du$$

Another useful relationship is

$$H(t) = -\log S(t).$$

For the simplest model, the exponential model with h(t) = λ (λ is a constant), the survival function is given by

$$S(t) = \exp\left\{-\int\_0^t h(u)\,du\right\} = \exp\left\{-\int\_0^t \lambda\, du\right\} = e^{-\lambda t}$$

with the probability density function given by

$$f(t) = -\frac{d}{dt} S(t) = \lambda e^{-\lambda t}.$$

Thus, the survival, hazard, and cumulative hazard functions for the exponential model are given by:

$$\text{Survival function: } S(t) = e^{-\lambda t}$$

$$\text{Hazard function: } h(t) = \frac{f(t)}{S(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda$$

$$\text{Cumulative hazard function: } H(t) = \int\_{0}^{t} h(u)\, du = \int\_{0}^{t} \lambda\, du = \lambda t.$$
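Since the exponential model underlies the examples that follow, these identities can be checked numerically. The sketch below (in Python rather than SAS, with an arbitrary illustrative rate λ) verifies that h(t) = f(t)/S(t) = λ and H(t) = -log S(t) = λt:

```python
import math

lam = 0.5  # assumed constant hazard rate (arbitrary illustration value)
t = 2.0    # an arbitrary time point

S = math.exp(-lam * t)        # survival function S(t) = e^{-lambda t}
f = lam * math.exp(-lam * t)  # density f(t) = lambda e^{-lambda t}
h = f / S                     # hazard h(t) = f(t)/S(t), constant and equal to lambda
H = -math.log(S)              # cumulative hazard H(t) = -log S(t) = lambda t

print(h, H)
```

Any other positive value of `lam` gives the same relationships, which is exactly the "constant hazard" property of the exponential model.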

# 7.3.2 CRD: Aedes aegypti

The objective of this experiment was to test the vulnerability of Aedes aegypti mosquitoes to different fungal treatments (four treatments). A bioassay was conducted to determine the survival time of each of the mosquitoes. Three-day-old mosquitoes were maintained after hatching in 45-cm rearing cages with access to water but not food. The mosquitoes were kept in rearing cages with water and fed warm pig blood (37 °C) through a natural membrane (sausage casing) approximately every 3 days and allowed to oviposit freely during the waiting period. A total of 10 mosquitoes were placed in a chamber to which one of the treatments (four) plus a control was applied. Here, we present part of the data from a bioassay with four replicates. The complete data from this trial can be found in Appendix 1 (Data: Aedes aegypti).



The components of this GLMM are as follows:

$$\begin{aligned} \text{Distributions: } y\_{ij} \mid \text{rep}\_j &\sim \text{Gamma}\left(\mu\_{ij}, \phi\right),\\ \text{rep}\_j &\sim N\left(0, \sigma\_{\text{rep}}^2\right) \end{aligned}$$

$$\text{Linear predictor: } \eta\_{ij} = \eta + \tau\_i + \text{rep}\_j$$

$$\text{Link function: } \eta\_{ij} = \log\left(\mu\_{ij}\right)$$

where η is the intercept, τ<sub>i</sub> is the treatment effect, and rep<sub>j</sub> is the random effect due to the mosquito chamber, assuming $\text{rep}\_j \sim N\left(0, \sigma^2\_{\text{rep}}\right)$.

The following GLIMMIX commands fit a GLMM with a gamma response:

```
proc glimmix data=mosquitos method=laplace;
class bio trt rep;
model y = trt/dist=gamma;
random rep;
lsmeans trt/lines ilink;
run;
```
Part of the output is shown in Table 7.16. The statistic in (a) indicates that there is no overdispersion in the fit of the data, as Pearson's chi-square/DF = 0.18. The analysis of variance (type III tests of fixed effects) indicates a highly significant effect (P = 0.0001) of the fungal treatments on the mean mosquito survival time.

The relevant information in Table 7.17 ("lsmeans") comes from the columns labeled "Estimate" and "Mean": these are the estimates on the model scale and the data scale, respectively, and the average survival time in each of the treatments is reported as $\mu\_i \pm$ standard error.

The estimated hazard function for each treatment is $\hat{\lambda}\_i = 1/\hat{\mu}\_i$. For example, for treatment Ma1, the estimated hazard function is $\hat{\lambda}\_{\text{Ma1}} = 1/3.4223 = 0.2922$. We can manually calculate these values from the Mean column, or we can automate the process by adding the command "ods output lsmeans = mu" in the GLIMMIX


Table 7.17 Means and standard errors of the main effects on the model scale (Estimate) and the data scale (Mean)

program above. Once we have saved the treatment means, we can ask SAS to estimate the estimated hazard function for the treatments. The commands are as follows:

```
data hazard;
set mu;
hazard=1/mu;
proc print data=hazard;
run;
```
The results are listed below in Table 7.18. The hazard column contains the estimated hazard function for each treatment, $h\_i(t) = \lambda\_i$.

From the values $\hat{\lambda}\_i$, we can calculate the estimated survival function $S\_i(t) = e^{-\hat{\lambda}\_i t}$ for each of the treatments. Figure 7.3 shows the probability of survival over time, obtained with $S\_i(t) = e^{-\hat{\lambda}\_i t}$, for each of the proposed treatments and the control. Clearly, the treatments MaS, Ma1, MaC, and Mam showed greater efficacy in the biological control of these mosquitoes.
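As a check on these calculations, the hazard data step above can be mirrored outside SAS; the minimal Python sketch below uses the Ma1 mean (3.4223) reported in the text and reproduces the estimated hazard and the survival curve S(t) = e^{-λ̂t}:

```python
import math

mu_ma1 = 3.4223      # estimated mean survival time for treatment Ma1 (from the text)
hazard = 1 / mu_ma1  # estimated hazard: lambda_hat = 1/mu_hat, ~0.2922 as reported

# estimated survival probabilities S(t) = exp(-lambda_hat * t) at days 0..5
surv = [math.exp(-hazard * t) for t in range(6)]
print(round(hazard, 4), [round(s, 3) for s in surv])
```

The same two lines applied to each row of the lsmeans output would reproduce the hazard column of Table 7.18 and the curves in Fig. 7.3.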

# 7.3.3 RCBD: Aedes aegypti

Similar to the previous example, this experiment consisted of testing the vulnerability of Aedes aegypti mosquitoes to different fungal treatments (four treatments). For this, two bioassays were conducted to determine the survival time of each of the mosquitoes. Three-day-old mosquitoes were maintained after hatching in 45-cm rearing cages with access to water but not food. Mosquitoes were maintained in rearing cages with water and were fed warm pig blood (37 °C) through a natural membrane (sausage casing) approximately every 3 days. They were allowed to freely oviposit during the waiting period. A total of 10 mosquitoes were placed in a chamber to which one of the treatments (four) plus a control was applied. The data can be found in Appendix 1 (Data: Aedes aegypti).



Fig. 7.3 Estimated survival probability for each treatment

The components of this GLMM are as follows:

$$\begin{aligned} \text{Distributions: } y\_{ijk} \mid \text{bio}\_j, \text{rep(bio)}\_{k(j)} &\sim \text{Gamma}\left(\mu\_{ijk}, \phi\right) \\ \text{bio}\_j \sim N\left(0, \sigma^2\_{\text{bio}}\right), \ \text{rep(bio)}\_{k(j)} &\sim N\left(0, \sigma^2\_{\text{rep(bio)}}\right) \\ \text{Linear predictor: } \eta\_{ijk} &= \eta + \tau\_i + \text{bio}\_j + \text{rep(bio)}\_{k(j)} \end{aligned}$$

where η is the intercept, τ<sub>i</sub> is the treatment effect, and bio<sub>j</sub> and rep(bio)<sub>k(j)</sub> are the random effects of the bioassay and of the mosquito chamber within the bioassay, respectively, assuming $\text{bio}\_j \sim N\left(0, \sigma^2\_{\text{bio}}\right)$ and $\text{rep(bio)}\_{k(j)} \sim N\left(0, \sigma^2\_{\text{rep(bio)}}\right)$.

$$\text{Link function: } \eta\_{ijk} = \log\left(\mu\_{ijk}\right)$$

The following GLIMMIX program fits a block GLMM with a gamma response.

```
proc glimmix method=laplace nobound;
class bio trt ind rep;
model y = trt/dist=gamma;
random bio rep(bio);
ods output lsmeans=mu;
lsmeans trt/lines ilink;
run;quit;
```
The results obtained are shown below. Part of the statistics and variance components are listed in Table 7.19. In part (a), the value of the statistic of


Table 7.20 Means and standard errors of the main effects on the model scale (Estimate) and the data scale (Mean)


Table 7.21 Means and standard errors of the main effects on the model scale (Estimate), the data scale (Mean), and the hazard function λ<sup>i</sup>


Pearson's chi-square/DF = 0.34, and, in part (b), the estimated variance components due to blocks, within-block replicates, and experimental error are $\hat{\sigma}^2\_{\text{bio}} = 0.1859$, $\hat{\sigma}^2\_{\text{rep(bio)}} = 0.02562$, and $\hat{\sigma}^2 = 0.2822$, respectively. The type III tests of fixed effects (part (c)) indicate a highly significant difference between treatments in mean survival time, as indicated by P = 0.0001.

Tables 7.20 and 7.21 show the estimates on the model scale and the data scale, the linear predictors (η<sub>i</sub>) and means (μ<sub>i</sub>) with their respective standard errors, and the estimated hazard function. The results indicate that the MaC treatment has the greatest lethal effect in the control of A. aegypti mosquitoes.

Fig. 7.4 Estimated survival probability for each treatment

Figure 7.4 shows the survival curves for the different treatments tested. These curves were obtained with $S\_i(t) = e^{-\hat{\lambda}\_i t}$.

# 7.4 Exercises

Exercise 7.4.1 This experiment focused on studying the incapacitation times of animals exposed to the burning of aircraft interior materials (M1–M9) and the yields, in milligrams per gram of material burned, of seven combustion gases (CO, HCN, H2S, HCl, HBr, NO2, SO2) (Spurgeon 1978). The recorded incapacitation time of the animal when exposed to the different combustion materials (under the column "Material") is found under the column "Time in minutes," and the third column gives the value of (1000/Time); these data are shown below (Table 7.22):



Table 7.22 Time of incapacity of the animal when exposed to different combustion gases


Table 7.22 (continued)

Exercise 7.4.2 Cockroaches are responsible for 80% of infestations in spaces used by humans. They associate with humans and can contaminate food with their feces and secretions, with both medical and economic implications. Different insecticides, mainly synthetic, have been formulated and, in some cases, have led to the development of resistance in cockroaches. This example deals with the study of survival in days (y) of this insect when exposed to two promising fungi for its biological control plus an already known control. The data for this example are shown below (Table 7.23):


Table 7.23 Results of the cockroach biological control experiment


Table 7.23 (continued)

(a) Write down a statistical model of this experiment.


Exercise 7.4.3 Consider a study on the effect of analgesic treatments (Trt) in elderly patients with neuralgia. Two test treatments (A and B) and a placebo (P) are compared. The response variable is whether the patient reported pain or not (yes = 1, no = 0). The investigators recorded the age (E) and sex (S) of 60 patients and the duration (time = T) in which the pain disappeared after starting the treatment. The data are presented in Table 7.24 below.



Table 7.24 Results with neuralgia patients (Trt = Treatment, S = Sex, E = Age, T = Time, D = Pain with yes = 1 and no = 0)

Exercise 7.4.4 Refer to the previous exercise and perform an analysis of covariance.


# Appendix 1

Data: Onset and duration of estrus in Pelibuey ewes (age in weeks, weight in kilograms, Inestro = number of days from the onset of estrus, Durestro = number of days in the duration of estrus)





Data: Beetles




Data: Rubber break time

Data: Aedes aegypti (Trt = treatment, Rep = repetition, Y = survival time)



Data: Aedes aegypti (Bio = bioassay, Trt = treatment, Rep = repetition, Y = survival time)







### Data: Pelibuey Sheep





Data: Gum Breakdown Times


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Chapter 8 Generalized Linear Mixed Models for Categorical and Ordinal Responses

# 8.1 Introduction

According to Agresti (2013), the multinomial distribution is a generalization of the binomial distribution to cases with more than two possible ordered (ordinal) or unordered (nominal) outcomes. Given a response with more than two possible outcomes and independent trials with the same category probabilities for each trial, the distribution of counts across categories follows a multinomial distribution. Quinn and Keough (2002) note that several methods exist for multinomial data analysis. The most common form of categorical data analysis in the biological sciences, where data are frequency counts, is to create cross-tabulations (contingency tables) and use chi-squared tests to examine associations between two or more categorical variables. However, such an approach is ill suited for a study aimed at estimating the response when there is a change in the explanatory variable(s), as contingency tables analyze the association between variables without designating a predictor or a response variable. In this analysis, the results are valid as long as fewer than 20% of the cells have an expected count less than five and none is less than one (Logan 2010). Fisher's exact test extends the chi-squared test to studies involving small sample sizes.

There are several methods for modeling multinomial data; traditional methods of multinomial data analysis include frequency analysis (counts), which uses the chi-squared test and the log-linear model for contingency tables. This chapter focuses on describing multinomial logit and probit models in detail.

# 8.2 Concepts and Definitions

For the multinomial distribution, each observation drawn from a total of N observations belongs to exactly one of C mutually exclusive categories, and each category c (c = 1, ⋯, C) has probability π<sub>c</sub>. The multinomial distribution gives the probability that, out of N randomly sampled observations from the population, y<sub>1</sub> belong to category 1, y<sub>2</sub> belong to category 2, and so forth up to category C, where $\sum\_{c=1}^{C} y\_c = N$ and $\sum\_{c=1}^{C} \pi\_c = 1$. The density function of this distribution

is equal to

$$f(\mathbf{y}\_1, \mathbf{y}\_2, \dots, \mathbf{y}\_C) = \frac{N!}{\mathbf{y}\_1! \mathbf{y}\_2! \dots \mathbf{y}\_C!} \pi\_1^{\mathbf{y}\_1} \pi\_2^{\mathbf{y}\_2} \dots \pi\_C^{\mathbf{y}\_C}$$
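This density can be evaluated directly. As a hedged illustration (in Python rather than SAS; the counts and probabilities below are invented for the example), note that with C = 2 categories the multinomial reduces to the familiar binomial:

```python
from math import factorial, prod

def multinomial_pmf(y, pi):
    """Density f(y_1,...,y_C) = N!/(y_1!...y_C!) * prod_c pi_c^{y_c}."""
    N = sum(y)
    coef = factorial(N)
    for yc in y:
        coef //= factorial(yc)  # multinomial coefficient N!/(y_1!...y_C!)
    return coef * prod(p ** yc for p, yc in zip(pi, y))

# C = 2 reduces to the binomial: P(2 successes in 3 trials, p = 0.5)
p = multinomial_pmf([2, 1], [0.5, 0.5])
print(p)  # C(3,2) * 0.5^3 = 0.375
```

The same function evaluates any C-category case, e.g. `multinomial_pmf([1, 1, 1], [1/3, 1/3, 1/3])`.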

Multinomial models are applied in data analysis where the categorical response variable has more than two possible outcomes, while the independent variables can be continuous, categorical, or both (Hosmer and Lemeshow 2000). The categorical response variable can be either ordinal (ordered) or nominal (unordered). Ordinal response variables take values that represent a rank order on some dimension but have too few values to be treated as a continuous variable. Nominal (unordered) response variables take values that identify categories but provide no indication of order. Models for multinomial data are constructed in a similar way as for binomial data. The link functions used in these types of models are similar to the logit and probit functions used for binomial data. Cumulative logit and cumulative probit models define the link function such that, when properly fitted to the data, they allow for parsimonious modeling of ordinal multinomial data. Generalized logit and probit models do not require ordered categories and are therefore suitable for nominal multinomial data.

In terms of generalized linear models (GLMs) and generalized linear mixed models (GLMMs), a multinomial distribution with C categories requires C - 1 link functions to fully specify a model that relates the response probabilities (π1, π2, ..., πC) to the linear predictor. The commonly used models are the cumulative logit model, also known as the proportional odds model proposed by McCullagh (1980), and the cumulative probit model, also known as the threshold model. Throughout this chapter, we will use either of these two link functions interchangeably.

The link functions for a cumulative logit model with C categories are

$$\eta\_1 = \log\left(\frac{\pi\_1}{1-\pi\_1}\right) = \eta\_1 + \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{b}$$

$$\eta\_2 = \log\left(\frac{\pi\_1 + \pi\_2}{1 - (\pi\_1 + \pi\_2)}\right) = \eta\_2 + \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{b}$$

$$\vdots$$

$$\eta\_{C-1} = \log\left(\frac{\pi\_1 + \pi\_2 + \dots + \pi\_{C-1}}{1 - (\pi\_1 + \pi\_2 + \dots + \pi\_{C-1})}\right) = \eta\_{C-1} + \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{b}$$

where X and Z are the design matrices, whereas β and b are the vectors of fixed and random effects parameters, respectively. The inverse links of each of the functions are as follows:

$$\begin{aligned} \pi\_1 &= \frac{1}{1 + e^{-\eta\_1}} = h(\eta\_1) \\ \pi\_1 + \pi\_2 &= \frac{1}{1 + e^{-\eta\_2}} = h(\eta\_2) \\ &\vdots \\ \pi\_1 + \pi\_2 + \dots + \pi\_{C-1} &= \frac{1}{1 + e^{-\eta\_{C-1}}} = h(\eta\_{C-1}). \end{aligned}$$

Once $h(\eta\_1), h(\eta\_2), \ldots, h(\eta\_{C-1})$ have been estimated, we can then estimate the probabilities $\hat{\pi}\_1, \hat{\pi}\_2, \ldots, \hat{\pi}\_C$.

# 8.3 Cumulative Logit Models (Proportional Odds Models)

Multinomial logit models are used to model the relationships between a polytomous response variable and a set of predictor variables. These polytomous response models can be classified – as mentioned above – into two different types, depending on whether the response variable has an ordered or an unordered structure.

In a proportional odds model, the covariates (linear predictor η) have the same effect on the probabilities that the response variable has in any category when considering different values of the covariates, thus shifting the response distribution to the right (or left) without changing the shape of the distribution. In a proportional odds model, the cumulative logits model the effect of the covariates on the response probabilities below or equal to the category cutoff.

A multinomial logit model assumes independence of categories, which implies that the probabilities of choosing a category c relative to a category c ′ are independent of the category characteristics of c and c ′ for c ≠ c ′ . The assumption requires that if a new category is available, then the prior probabilities are precisely adjusted to preserve the original probabilities between all pairs of outcomes. The proportional odds model employs a strict assumption that the odds ratio does not depend on the category, and, therefore, we need to test the proportional odds assumption, which is also called the "parallel regression assumption."

# 8.3.1 Completely Randomized Design (CRD) with a Multinomial Response: Ordinal

Data are obtained from an experiment related to red core disease in strawberries, which is caused by the fungus Phytophthora fragariae. In this example, 12 strawberry populations were evaluated in a completely randomized experiment with 4 replications (Table 8.1). Plots generally consisted of 10 plants; in some cases, only 9 plants were observed. At the end of the experiment, each plant was assigned to one of three ordered categories representing fungal damage (1 = no damage, 2 = moderate damage, and 3 = severe damage).

A total of 12 populations were obtained by crossing 3 genotypes of male parents with 4 genotypes of female parents. The variation between and within plots is considered minimal, whereas the genetic and nongenetic effects are more significant, as plants from the same cross are not genetically identical.

The model that fits these data for the cumulative probabilities is a GLMM with a classification effect for the treatment variable (population resulting from crossing genotypes). Thus, the GLMM for multinomial ordered outcomes with C categories requires C - 1 link function equations to fully specify the model that relates the response probabilities (π1, π2, ..., πC) to the linear predictor η<sub>ij</sub> (Stroup 2013). The C - 1 cumulative logit equations are formed for categories 1, 2, ..., C - 1.


Table 8.1 Evaluation of red core disease in strawberry plants

The components of the GLMM with an ordinal multinomial response are as follows:


$$\log\left(\frac{\pi\_{1ij}}{1-\pi\_{1ij}}\right) = \eta\_{1ij}$$

$$\log\left(\frac{\pi\_{1ij}+\pi\_{2ij}}{1-\left(\pi\_{1ij}+\pi\_{2ij}\right)}\right) = \eta\_{2ij}$$

The following GLIMMIX program fits a cumulative logit model with an ordinal multinomial response in a CRD.

```
proc glimmix data=FRESA;
class rep trt cat;
model cat(order=data)= trt/dist=Multinomial link=clogit solution
oddsratio;
random intercept/subject=rep solution ;
estimate 'c=1, t=1' intercept 1 0 trt 1 0 0 0 0 0 0 0 0 0 0 0,
'c=2, t=1' intercept 0 1 trt 1 0 0 0 0 0 0 0 0 0 0 0,
'c=1, t=2' intercept 1 0 trt 0 1 0 0 0 0 0 0 0 0 0 0,
'c=2, t=2' intercept 0 1 trt 0 1 0 0 0 0 0 0 0 0 0 0,
'c=1, t=3' intercept 1 0 trt 0 0 1 0 0 0 0 0 0 0 0 0,
'c=2, t=3' intercept 0 1 trt 0 0 1 0 0 0 0 0 0 0 0 0,
'c=1, t=4' intercept 1 0 trt 0 0 0 1 0 0 0 0 0 0 0 0,
'c=2, t=4' intercept 0 1 trt 0 0 0 1 0 0 0 0 0 0 0 0,
'c=1, t=5' intercept 1 0 trt 0 0 0 0 1 0 0 0 0 0 0 0,
'c=2, t=5' intercept 0 1 trt 0 0 0 0 1 0 0 0 0 0 0 0,
'c=1, t=6' intercept 1 0 trt 0 0 0 0 0 1 0 0 0 0 0 0,
'c=2, t=6' intercept 0 1 trt 0 0 0 0 0 1 0 0 0 0 0 0,
'c=1, t=7' intercept 1 0 trt 0 0 0 0 0 0 1 0 0 0 0 0,
'c=2, t=7' intercept 0 1 trt 0 0 0 0 0 0 1 0 0 0 0 0,
'c=1, t=8' intercept 1 0 trt 0 0 0 0 0 0 0 1 0 0 0 0,
'c=2, t=8' intercept 0 1 trt 0 0 0 0 0 0 0 1 0 0 0 0,
'c=1, t=9' intercept 1 0 trt 0 0 0 0 0 0 0 0 1 0 0 0,
'c=2, t=9' intercept 0 1 trt 0 0 0 0 0 0 0 0 1 0 0 0,
'c=1, t=10' intercept 1 0 trt 0 0 0 0 0 0 0 0 0 1 0 0,
'c=2, t=10' intercept 0 1 trt 0 0 0 0 0 0 0 0 0 1 0 0,
'c=1, t=11' intercept 1 0 trt 0 0 0 0 0 0 0 0 0 0 1 0,
'c=2, t=11' intercept 0 1 trt 0 0 0 0 0 0 0 0 0 0 1 0,
'c=1, t=12' intercept 1 0 trt 0 0 0 0 0 0 0 0 0 0 0 1,
'c=2, t=12' intercept 0 1 trt 0 0 0 0 0 0 0 0 0 0 0 1/ilink;
freq freq;
run;
```
Although most of the GLIMMIX commands have already been described in previous examples, it is important to emphasize that the data should be structured in a logical way: one row for each combination of repetition, treatment, and lesion category, together with the frequency or number of observations (Y), referenced in this case by the variables rep, trt (trt = cross), cat (category), and freq, respectively. Part of the data arrangement can be seen in Table 8.2, whereas the rest of the dataset can be found in the Appendix (Data: CRD with multinomial response: ordinal).

In the program commands of this example, "order = data" indicates that the categories are ordered (ordinal) as they are arranged in the dataset. Consider that the observations in each line always have ordered categories: no injury (Without), moderate injury (Moderate), and severe injury (Severe). If there is no congruent order in the arrangement of the dataset to be analyzed, then GLIMMIX will reorder the categories alphabetically or numerically, depending on the initial coding of the data. The "estimate" command specifies the estimable functions that form the boundaries between the categories for each of the populations (trt). Finally, the "freq" command instructs GLIMMIX to use "freq" as the number of observations (frequency) under the corresponding categorization. In this way, the first estimate "c = 1, t = 1" defines the predictor η<sub>1</sub> + τ<sub>1</sub>, that is, the boundary between the "Without" and "Moderate" categories for treatment 1, with its corresponding logit $\log\left(\frac{\pi\_{11}}{1-\pi\_{11}}\right)$, whereas the second estimate "c = 2, t = 1" defines the boundary between the "Moderate" and "Severe" damage categories with the logit $\log\left(\frac{\pi\_{11}+\pi\_{21}}{1-\left(\pi\_{11}+\pi\_{21}\right)}\right)$, which estimates the probability of observing a plant from


Table 8.3 Results of the multinomial analysis of variance for injury level in strawberry plants


Table 8.4 Fixed effects solution for injury categories

population 1 (trt = M1H1) with "Without" or "Moderate" damage when exposed to the fungus (Phytophthora fragariae). Part of the output is presented in Table 8.3.

The estimated variance component (part (a)) due to plants is $\hat{\sigma}^2\_r = 0.1453$, whereas the type III tests of fixed effects (part (b)) indicate that the crosses differ significantly in their tolerance to fungal attack (Pr > F = 0.0032). The results of the fixed effects solution, obtained by specifying the "solution" option in the model statement, are shown in Table 8.4.

From the fixed effects solution, we can estimate the linear predictors for the two boundary categories of each treatment on the model scale. For example, for treatment 1, the linear predictor for the first injury boundary is $\eta\_{11} = \eta\_1 + \tau\_1 = -0.4571 + (-1.1456) = -1.6027$, where η<sub>1</sub> defines the boundary between the "Without" and "Moderate" damage categories and η<sub>2</sub> defines the boundary between the "Moderate" and "Severe" damage categories; the second linear predictor is $\eta\_{21} = \eta\_2 + \tau\_1 = 1.0631 + (-1.1456) = -0.0825$. Note that for the proportional odds model, the τ<sub>i</sub> values are not category specific; treatment effects move the boundaries as a group.


Table 8.5 Estimated odds ratio

The odds ratio (Table 8.5) is the result of taking $e^{\tau\_i}$ for crosses 1–12. Since the odds ratios are not specific to a particular category, this value is the same for all three categories, hence the name proportional odds.

In Table 8.6, we show the maximum likelihood estimates of the linear predictors $\hat{\eta}\_{ci} = \hat{\eta}\_c + \hat{\tau}\_i$ in the "Estimate" column, on the model scale, as well as the means on the data scale for each of the categories of the treatments tested ("Mean").

Thus, for c = 1, t = 1 (response category "Without" damage and treatment 1), the estimate is $\hat{\eta}\_{11} = -1.6027$, and, for c = 2, t = 1 ("Moderate" damage and treatment 1), the linear predictor is $\hat{\eta}\_{21} = -0.0825$. Taking the inverse of the link function yields the probability $\hat{\pi}\_{11} = 1/\left(1 + e^{1.6027}\right) = 0.1676$. This is the estimated probability that the cross (treatment) M1H1 has a response score of "Without damage." This inverse value is presented under the "Mean" column (Table 8.6).

Now, for c = 2, t = 1, the inverse of the link yields the cumulative probability $\hat{\pi}\_{11} + \hat{\pi}\_{21} = 1/\left(1 + e^{0.0825}\right) = 0.4794$. From this value, we deduce the probabilities of observing "Moderate" damage and "Severe" damage in a plant of the cross M1H1. For "Moderate" damage, the probability is $\hat{\pi}\_{21} = 0.4794 - \hat{\pi}\_{11} = 0.4794 - 0.1676 = 0.3118$, and, for "Severe" damage, it is $\hat{\pi}\_{31} = 1 - \left(\hat{\pi}\_{11} + \hat{\pi}\_{21}\right) = 1 - 0.4794 = 0.5206$. The probabilities for the remaining crosses are estimated similarly.
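The back-transformation just described can be sketched in Python (not part of the SAS workflow; the two linear predictor estimates for cross M1H1 are taken from the text):

```python
import math

def inv_logit(eta):
    """Inverse link for the cumulative logit model: 1/(1 + e^{-eta})."""
    return 1.0 / (1.0 + math.exp(-eta))

eta_11 = -1.6027  # boundary "Without"/"Moderate" for treatment 1 (model scale)
eta_21 = -0.0825  # boundary "Moderate"/"Severe" for treatment 1 (model scale)

pi_1 = inv_logit(eta_11)   # P(no damage),          ~0.1676 per the text
cum_2 = inv_logit(eta_21)  # P(damage <= moderate), ~0.4794 per the text
pi_2 = cum_2 - pi_1        # P(moderate damage),    ~0.3118 per the text
pi_3 = 1.0 - cum_2         # P(severe damage),      ~0.5206 per the text
print(round(pi_1, 4), round(pi_2, 4), round(pi_3, 4))
```

The three category probabilities sum to 1 by construction, which is a useful sanity check when repeating this calculation for the other crosses.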

# 8.3.2 Randomized Complete Block Design (RCBD) with a Multinomial Response: Ordinal

In recent years, poultry production has become conscious of animal welfare, which is associated with bird mortality, behavior, and health, among others (Stanley 1981; Martrenchar et al. 2002). One of the diseases related to animal welfare is footpad dermatitis, and, among many repercussions, it affects a bird's ability to walk (Bilgili et al. 2009). Pododermatitis is known as contact dermatitis or footpad dermatitis and


Table 8.6 Estimates on the model scale (Estimate) and on the data scale (Mean) for the damage categories in strawberry plants

is characterized by inflammation and necrotic lesions from the plantar surface to deep within the footpads of chickens. Deep ulcers may result in abscesses and in the thickening of the underlying tissues and structures (Greene et al. 1985).

Chicken feet have great economic importance because they are in high demand in foreign markets, mainly in Southeast Asia and China; however, diseases or alterations such as pododermatitis cause significant economic losses, since diseased feet are not suitable for human consumption, which is subsequently reflected in market prices (Taira et al. 2014). Due to the economic importance of this product, Garcia et al. (2010) have focused on studying the factors that cause this disease and


Table 8.7 Treatment design

on finding strategies to reduce leg and carcass lesions in poultry. Important factors in broiler fattening are the type of litter, litter height, nutrition and feeding programs, and bird health, among others.

The objective of this study was to evaluate the effect of litter density and organic minerals (Availa Zn and Availa Mn), with an extract of Yucca schidigera (Micro-Aid) as a supplement to a traditional fattening program, on the development of footpad dermatitis in broilers. The genetic material used in this experiment was mainly male Ross line chickens. The traditional broiler fattening program of the poultry farm consists of three phases: a starter diet (1–18 days), a grower diet (19–35 days), and a finisher diet (36–50 days), applied over a period of 50 days, with rice husk used as bedding material at a density of 1 kg m<sup>-2</sup>. In this research, a foot health program was implemented in addition to the traditional fattening program, which included the addition of 125 ppm of Micro-Aid (Yucca schidigera extract), 40 ppm of Availa Zn, and 40 ppm of Availa Mn to the fattening diet.

Based on the above information, four treatments were evaluated at two poultry farms, as described below:


The response variable evaluated was the degree of foot lesion (pododermatitis) at the end of the fattening period (50 days), assessed on 1250 chickens per treatment. The degree of a footpad lesion was determined according to a visual guide for lesions in chickens based on the method of De Jong and Guémené (2012). This method defines three grades: grade 0 for legs with no lesions, grade 1 if lesions exist in some areas of the footpad (<50%), and grade 2 if the leg has extensive lesions over the footpad (50–100%). Table 8.8 shows the dataset indicating the block, treatment, level of lesion, and the number of birds observed with a given lesion (frequency).


Table 8.8 Pododermatitis in broilers

Note: Without stands for no lesion, slight stands for moderate lesion, and severe stands for severe lesion

The GLMM for multinomial ordered outcomes with C categories requires C - 1 link function equations, rather than a single one, to fully specify a model that relates the response probabilities (π1, π2, ..., πC) to the linear predictor η<sub>ij</sub> (Stroup 2013). The C - 1 cumulative logit equations are formed for categories 1, 2, ..., C - 1.

The link functions for the cumulative logit model to describe the response variable with C categories are as follows:

$$\begin{aligned} \eta\_{(1)ij} &= \log\left(\frac{\pi\_{1ij}}{1-\pi\_{1ij}}\right) = \eta\_1 + \tau\_i + b\_j\\ \eta\_{(2)ij} &= \log\left(\frac{\pi\_{1ij} + \pi\_{2ij}}{1 - \left(\pi\_{1ij} + \pi\_{2ij}\right)}\right) = \eta\_2 + \tau\_i + b\_j\\ &\vdots\\ \eta\_{(C-1)ij} &= \log\left(\frac{\pi\_{1ij} + \pi\_{2ij} + \dots + \pi\_{(C-1)ij}}{1 - \left(\pi\_{1ij} + \pi\_{2ij} + \dots + \pi\_{(C-1)ij}\right)}\right) = \eta\_{C-1} + \tau\_i + b\_j \end{aligned}$$

The components of the GLMM with an ordinal multinomial response variable are as follows:


$$\log\left(\frac{\pi\_{0ij}}{1-\pi\_{0ij}}\right) = \eta\_{(0)ij}$$

$$\log\left(\frac{\pi\_{0ij} + \pi\_{1ij}}{1 - \left(\pi\_{0ij} + \pi\_{1ij}\right)}\right) = \eta\_{(1)ij}$$

The following GLIMMIX commands fit a cumulative logit model with an ordinal multinomial response.

```
proc glimmix data=multinomial_ord;
class block trt;
model categoria(order=data)= trt/dist=Multinomial link=clogit
solution oddsratio(DIFF=LAST LABEL);
random intercept/subject=block;
estimate 'c=0, t=1' intercept 1 0 trt 1 0 0 0,
'c=1, t=1' intercept 0 1 trt 1 0 0 0,
'c=0, t=2' intercept 1 0 trt 0 1 0 0,
'c=1, t=2' intercept 0 1 trt 0 1 0 0,
'c=0, t=3' intercept 1 0 trt 0 0 1 0,
'c=1, t=3' intercept 0 1 trt 0 0 1 0,
'c=0, t=4' intercept 1 0 trt 0 0 0 1,
'c=1, t=4' intercept 0 1 trt 0 0 0 1/ilink;
freq y;
run;
```
The data should have one column for block, treatment, lesion category, and frequency or number of observations (Y), which, in this case, is referenced by the variables block, trt, category, and frequency, respectively.

Most of the options in the above syntax have already been explained previously; the "order=data" option specifies that the order in which the categories appear in the dataset will be treated as the ordinal ordering, from lowest to highest, for the analysis. If this option is not used with the response variable in the model specification, "proc GLIMMIX" will rearrange the categories in alphabetical or numerical order, depending on whether the categories are entered as numbers or names. The "freq y" option tells GLIMMIX to use y as the number of observations in the corresponding category. The "estimate" command specifies the estimable functions that form the boundaries between categories for each of the four treatments. For example, the first estimate "c = 0, t = 1" defines η<sub>0</sub> + τ<sub>1</sub>, that is, the boundary between the categories "Without" (no lesion) and "Moderate" (slight lesion) for treatment 1. This first estimate corresponds to the logit log(π<sub>01</sub>/(1 - π<sub>01</sub>)), which relates to the probability that a chicken that received treatment 1 will respond with a degree of lesion classified under category 0 (no lesion). The second estimate "c = 1, t = 1" defines η<sub>1</sub> + τ<sub>1</sub>, that is, the boundary between the categories "Moderate" (slight lesion) and


Table 8.9 Results of the analysis of variance in the multinomial cumulative logit model

"Severe" (severe lesion) for treatment 1 and corresponds to the logit log(π<sub>11</sub>/(1 - π<sub>11</sub>)), and so on. By taking the inverse of these link values, we can obtain the estimated probabilities π<sub>01</sub> and π<sub>11</sub>. Part of the Statistical Analysis Software (SAS) GLIMMIX output is presented below:

The results of the analysis of variance in part (a) of Table 8.9 indicate that the degrees of lesion in the chicken footpad (pododermatitis) differed significantly among the treatments tested (P < 0.0001). Therefore, the hypothesis of equal treatment effects is rejected (H<sub>0</sub>: τ<sub>i</sub> = 0 for all i, that is, all odds ratios equal 1).

In part (b) of Table 8.9, we can see that the estimated intercepts η<sub>1</sub> = 0.6144 and η<sub>2</sub> = 3.8787 define the boundary between the categories "Without" lesion and "Moderate" lesion and the boundary between the categories "Moderate" lesion and "Severe" lesion, respectively. The estimated treatment effects (τ<sub>i</sub>) show that the boundaries move either upward or downward when a certain treatment is applied. In this sense, all estimated treatment coefficients are negative with respect to treatment 4. This means that chickens under treatments 1–3 have a lower probability of showing no lesion and a higher probability of developing a moderate or severe lesion than when treatment 4 is applied.

To calculate the probability that a chicken will not develop footpad dermatitis (c = 0) when receiving treatment 1, that is, "c = 0, Trt = 1," we first estimate the linear predictor η<sub>01</sub> = η<sub>0</sub> + τ<sub>1</sub> = 0.6144 + (-1.5034) = -0.889 and, taking the inverse link, we obtain π<sub>01</sub> = 1/(1 + e<sup>0.889</sup>) = 0.291. This value is the estimated probability that a chicken will not develop footpad dermatitis when receiving treatment 1. Now, for "c = 1, Trt = 1," η<sub>11</sub> = η<sub>1</sub> + τ<sub>1</sub> = 3.8787 + (-1.5034) = 2.3753, whose inverse value is 0.915. This value is an estimate of the cumulative probability π<sub>01</sub> + π<sub>11</sub>. From this value, we obtain the probability that a chicken will develop a moderate lesion and a severe lesion. For a moderate lesion, the probability is π<sub>11</sub> = 0.915 - π<sub>01</sub> = 0.915 - 0.291 = 0.624, and, for a severe lesion, the probability is π<sub>21</sub> = 1 - 0.915 = 0.085. In a similar way, the probabilities for the categories (c = 0, 1, 2) of the rest of the treatments are computed.
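This back-transformation is easy to verify numerically. The sketch below uses the boundary and treatment estimates quoted above (η<sub>0</sub> = 0.6144, η<sub>1</sub> = 3.8787, τ<sub>1</sub> = -1.5034); it is a verification sketch, not GLIMMIX output:

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: probability from a linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))

# First and second boundary intercepts and the treatment-1 effect
# quoted in the text from Table 8.9, part (b)
eta0, eta1, tau1 = 0.6144, 3.8787, -1.5034

p_c0 = inv_logit(eta0 + tau1)   # P(no lesion)           ~ 0.291
p_c01 = inv_logit(eta1 + tau1)  # P(no or moderate)      ~ 0.915
p_c1 = p_c01 - p_c0             # P(moderate lesion)     ~ 0.624
p_c2 = 1.0 - p_c01              # P(severe lesion)       ~ 0.085
```

The three per-category probabilities are obtained by differencing the cumulative ones, so they necessarily sum to 1.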


Table 8.10 Estimated odds ratio

Table 8.11 Estimates on the model scale (Estimate) and on the data scale (Mean) for footpad dermatitis categories in the multinomial cumulative logit model


The odds ratios tabulated in Table 8.10 are the estimated odds ratios e<sup>τ<sub>i</sub></sup> of treatments i (i = 1, 2, 3) relative to treatment 4. Because the values of τ<sub>i</sub> are not category-specific, the odds ratio for "Without" lesion versus "Moderate" lesion and that for "Moderate" lesion versus "Severe" lesion are the same (hence the name "proportional odds").

From the above odds ratio results, it should be obvious why the F- and P-values in the fixed effects tests are what they are. Adding the "ilink" option to the end of the "estimate" command prompts GLIMMIX to estimate the inverse of the linear predictors (η<sub>ci</sub>), i.e., the probabilities per category π<sub>ci</sub> = 1/(1 + e<sup>-η<sub>ci</sub></sup>) (Table 8.11).

In the above table, several estimates are shown for η<sub>c</sub> + τ<sub>i</sub>. For example, the probability that a chicken will not develop a lesion under treatment 1 can be represented by "c = 0, t = 1," that is, η<sub>0</sub> + τ<sub>1</sub> = -0.8893. This result matches the one obtained from the "Solutions for fixed effects" table previously shown. Taking the inverse of the link yields the probability

Fig. 8.1 Estimated probabilities for the footpad lesion categories in the treatments tested, using the cumulative logit model

π<sub>01</sub> = 1/(1 + e<sup>0.8893</sup>) = 0.2914. This probability is the maximum likelihood estimate that a chicken will have no footpad lesion with treatment 1. The inverse of the link function is found under the "Mean" column of Table 8.11. Now, for the category "c = 1, t = 1," the inverse of the linear predictor is 0.9149; this is the estimate of π<sub>01</sub> + π<sub>11</sub>. From this value, we can obtain the probability of a chicken showing a "Moderate" lesion when receiving treatment 1: substituting the value of π<sub>01</sub> into π<sub>01</sub> + π<sub>11</sub> = 0.9149, we obtain π<sub>11</sub> = 0.9149 - 0.2914 = 0.6235. Finally, for a "Severe" lesion (category "c = 2, t = 1"), the probability that a chicken will present a severe lesion is π<sub>21</sub> = 1 - 0.9149 = 0.0851. Following the same procedure, we can obtain the probabilities for each of the categories (c = 0, 1, 2) of the rest of the treatments.

Figure 8.1 shows that under the traditional feeding program with a litter density of 1 kg m<sup>-2</sup> of rice husks (Trt1), there is a high probability that broilers will develop moderate and severe footpad lesions, as shown by π<sub>11</sub> = 0.624 and π<sub>21</sub> = 0.085, respectively. When the litter density was increased from 1 to 2 kg m<sup>-2</sup> of rice husks under the traditional broiler program (Trt2), the probability of developing moderate and severe footpad lesions decreased markedly to π<sub>12</sub> = 0.384 and π<sub>22</sub> = 0.026, respectively, compared to Trt1, whereas the probability of not developing a footpad lesion increased to π<sub>02</sub> = 0.590 (Trt2) compared to π<sub>01</sub> = 0.291 (Trt1). Regarding the implementation of the two foot health programs combined with a litter density of 2 kg m<sup>-2</sup> of rice husks, the probability of chickens not developing a footpad lesion was π<sub>04</sub> = 0.649 (Trt4) compared to π<sub>03</sub> = 0.396 (Trt3), whereas the probabilities of developing moderate and severe lesions decreased to π<sub>14</sub> = 0.331 and π<sub>24</sub> = 0.025 in Trt4 from π<sub>13</sub> = 0.549 and π<sub>23</sub> = 0.055 in Trt3.

# 8.4 Cumulative Probit Models

An ordinal cumulative probit model, first considered by Aitchison and Silvey (1957), generalizes a binary probit model to ordinal responses. This model results from the probit modeling of the cumulative probabilities as a linear function of the covariates. The link functions for the cumulative probit model with C categories are listed below:

$$\begin{aligned} \eta\_{(1)} &= \Phi^{-1}(\pi\_1) = \eta\_1 + \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} \\ \eta\_{(2)} &= \Phi^{-1}(\pi\_1 + \pi\_2) = \eta\_2 + \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} \\ &\vdots \\ \eta\_{(C-1)} &= \Phi^{-1}(\pi\_1 + \pi\_2 + \dots + \pi\_{C-1}) = \eta\_{C-1} + \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} \end{aligned}$$

where X and Z are the design matrices, β and b are the vectors of fixed and random effects, respectively, and Φ<sup>-1</sup>(·) is the inverse of the standard normal cumulative distribution function. The inverse of each link function is as follows:

$$\begin{aligned} \pi\_1 &= \Phi(\eta\_1) = h(\eta\_1) \\ \pi\_1 + \pi\_2 &= \Phi(\eta\_2) = h(\eta\_2) \\ &\vdots \\ \pi\_1 + \pi\_2 + \dots + \pi\_{C-1} &= \Phi(\eta\_{C-1}) = h(\eta\_{C-1}). \end{aligned}$$

Once h(η<sub>1</sub>), h(η<sub>2</sub>), ..., h(η<sub>C-1</sub>) are estimated, we can estimate π<sub>1</sub>, ..., π<sub>C</sub>. The quality of the estimates of the ordinal cumulative probit model is usually very similar to that of an ordinal cumulative logit model for some datasets but not all. Both models involve stochastic ordering of the levels of the response variable and are designed to detect changes in the location of the response distribution.

Returning to Example 8.3.1, for the cumulative probit model, we simply change the link option to "link=cprobit" in the model statement of the above program syntax. The output contains all the same elements, except the odds ratios. The analysis for the cumulative probit model is exactly the same as the one we performed for the cumulative logit model. Part of the output is shown in parts (a)–(c) of Table 8.12.

The estimated variance component due to blocks is σ<sup>2</sup><sub>block</sub> = 0.0092. The results of the analysis of variance showed that the degrees of lesion in the chickens' footpads (pododermatitis) differ significantly among the tested treatments (P < 0.0001).

In part (b) of Table 8.12, it is possible to observe that the estimated intercepts η<sub>1</sub> = 0.3880 and η<sub>2</sub> = 2.2407 define the boundary between the "Without" lesion and "Moderate" lesion categories and the boundary between the "Moderate" lesion and "Severe" lesion categories, respectively. The estimated treatment effects (τ<sub>i</sub>) move the boundaries either upward or downward when a certain treatment is applied. In this sense, all estimated treatment coefficients have a negative effect


Table 8.12 Results of the analysis of variance in the multinomial cumulative probit model

with respect to treatment 4. This means that chickens under treatments 1–3 have a lower probability of remaining free of footpad lesions and a higher probability of developing a severe lesion with respect to treatment 4.

From part (b) of Table 8.12, the probabilities for each of the categories can be obtained. For the probability that a chicken will not develop a footpad lesion (c = 0) under treatment 1, i.e., "c = 0, Trt = 1," the estimated linear predictor is η<sub>01</sub> = η<sub>0</sub> + τ<sub>1</sub> = 0.3880 + (-0.9278) = -0.5398 and, taking the inverse link, π<sub>01</sub> = Φ(-0.5398) = 0.2946, that is, the estimated probability that a chicken will not develop a footpad lesion when receiving treatment 1. For "c = 1, Trt = 1," η<sub>11</sub> = η<sub>1</sub> + τ<sub>1</sub> = 2.2407 + (-0.9278) = 1.3129, whose inverse value is 0.9054. This value is an estimate of π<sub>01</sub> + π<sub>11</sub>. From this value, we can obtain the probability that a chicken will develop a moderate lesion and a severe lesion. For a moderate lesion, π<sub>11</sub> = 0.9054 - π<sub>01</sub> = 0.9054 - 0.2946 = 0.6108, and, for a severe lesion, π<sub>21</sub> = 1 - 0.9054 = 0.0946. Similarly, we can obtain the probabilities of the categories (c = 0, 1, 2) for the rest of the treatments.
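The inverse probit link is simply the standard normal CDF, which can be evaluated with the error function. The sketch below checks the treatment-1 calculations using the boundary and treatment estimates quoted above (η<sub>1</sub> = 0.3880, η<sub>2</sub> = 2.2407, τ<sub>1</sub> = -0.9278); it is a verification sketch, not GLIMMIX output:

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF, the inverse link of the cumulative probit model."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Boundary intercepts and treatment-1 effect quoted from Table 8.12, part (b)
eta0, eta1, tau1 = 0.3880, 2.2407, -0.9278

p_c0 = std_normal_cdf(eta0 + tau1)   # P(no lesion)          ~ 0.295
p_c01 = std_normal_cdf(eta1 + tau1)  # P(no or moderate)     ~ 0.905
p_c1 = p_c01 - p_c0                  # P(moderate lesion)    ~ 0.611
p_c2 = 1.0 - p_c01                   # P(severe lesion)      ~ 0.095
```

The same differencing scheme used for the cumulative logit model applies; only the inverse link changes.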

Similar to the previous example, adding the "ILINK" option to the end of the "ESTIMATE" command prompts GLIMMIX to estimate the values of the linear predictors (η<sub>ci</sub>) and their inverses, which are the probabilities per category π<sub>ci</sub> = Φ(η<sub>ci</sub>). Table 8.13 shows the estimates of the linear predictors as well as their inverse values (probabilities in this case).

From the above table, we show the estimates of η<sub>c</sub> + τ<sub>i</sub>. For example, the estimated linear predictor for a chicken not developing a footpad lesion under treatment 1, i.e., "c = 0, t = 1," is calculated as η<sub>0</sub> + τ<sub>1</sub> = -0.5398. This result matches the values obtained from the "Solutions for fixed effects" table previously shown. Taking the inverse of the link function, π<sub>01</sub> = Φ(-0.5398) = 0.2947. This is the


Table 8.13 Estimates on the model scale (Estimate) and on the data scale (Mean) for footpad lesion categories in the multinomial cumulative probit model

probability that a chicken will not develop a footpad lesion when receiving treatment 1. This probability is under the "Mean" column.

Now, for the category "c = 1, t = 1," the inverse of the link function is a probability of 0.9054, which results from the inverse of the linear predictor η<sub>1</sub> + τ<sub>1</sub> = 1.3129. This value is the estimate, in terms of probability, of π<sub>01</sub> + π<sub>11</sub>. From this value, we can obtain the probability that a chicken presents a "Moderate" lesion when receiving treatment 1: since π<sub>01</sub> + π<sub>11</sub> = 0.9054, using the value of π<sub>01</sub>, we obtain π<sub>11</sub> = 0.9054 - 0.2947 = 0.6107 and π<sub>21</sub> = 1 - 0.9054 = 0.0946. Following the same procedure, we can obtain the rest of the probabilities for each of the categories (c = 0, 1, 2) and for the rest of the treatments (2–4).

# 8.5 Effect of Judges' Experience on Canned Bean Quality Ratings

Canning quality is one of the most essential traits required in all new dry bean (Phaseolus vulgaris L.) varieties, and the selection for this trait is a critical part of bean breeding programs. Advanced lines that are candidates for release as varieties must be evaluated for canning quality for at least 3 years from samples grown at different locations. Quality is evaluated by a panel of judges with varying levels of experience in evaluating breeding lines for visual quality traits. A total of 264 bean breeding lines from 4 commercial classes were retained according to the procedures described by Walters et al. (1997). These included 62 white (navy), 65 black,


Table 8.14 Frequency of ratings of different types of beans as a function of the bean-rating experience

55 kidney, and 82 pinto bean lines plus control or "check" lines. The visual appearance of the processed beans was determined subjectively by a panel of 13 judges on a 7-point hedonic scale (1 = very undesirable, ..., 4 = neither desirable nor undesirable,..., 7 = very desirable). Beans were presented to the panel of judges in random order at the same time. Before evaluating the samples, all judges were shown examples of samples rated as satisfactory.

There is concern that certain judges, due to lack of experience, may not be able to correctly score the canned samples. Inferences about the effects of experience on attribute-based product evaluations can be drawn from the psychology literature (Wallsten and Budescu 1981). Prior to the bean canning quality rating experiment, it was postulated not only that less experienced judges rate more severely than more experienced judges but also that experience should have little or no effect on white beans, for which the canning procedure was developed. Judges are stratified for the purpose of analysis by experience (less than 5 years, greater than 5 years). Counts by canning quality, judge experience, and bean breeding lines are listed in Table 8.14.

The link functions for the cumulative logit model for describing a variable with C categories are as follows:

$$\begin{aligned} \eta\_{(1)ij} &= \log\left(\frac{\pi\_{1ij}}{1 - \pi\_{1ij}}\right) = \eta\_1 + \alpha\_i + \beta\_j + (\alpha\beta)\_{ij}\\ \eta\_{(2)ij} &= \log\left(\frac{\pi\_{1ij} + \pi\_{2ij}}{1 - \left(\pi\_{1ij} + \pi\_{2ij}\right)}\right) = \eta\_2 + \alpha\_i + \beta\_j + (\alpha\beta)\_{ij}\\ &\vdots\\ \eta\_{(C-1)ij} &= \log\left(\frac{\pi\_{1ij} + \dots + \pi\_{(C-1)ij}}{1 - \left(\pi\_{1ij} + \dots + \pi\_{(C-1)ij}\right)}\right) = \eta\_{C-1} + \alpha\_i + \beta\_j + (\alpha\beta)\_{ij} \end{aligned}$$

The components of the GLMM with an ordinal multinomial response are as follows:


$$\begin{aligned} \log\left(\frac{\pi\_{1ij}}{1-\pi\_{1ij}}\right) &= \eta\_{1ij}\\ \log\left(\frac{\pi\_{1ij} + \pi\_{2ij}}{1 - \left(\pi\_{1ij} + \pi\_{2ij}\right)}\right) &= \eta\_{2ij}\\ \log\left(\frac{\pi\_{1ij} + \pi\_{2ij} + \pi\_{3ij}}{1 - \left(\pi\_{1ij} + \pi\_{2ij} + \pi\_{3ij}\right)}\right) &= \eta\_{3ij}\\ \log\left(\frac{\pi\_{1ij} + \dots + \pi\_{4ij}}{1 - \left(\pi\_{1ij} + \dots + \pi\_{4ij}\right)}\right) &= \eta\_{4ij}\\ \log\left(\frac{\pi\_{1ij} + \dots + \pi\_{5ij}}{1 - \left(\pi\_{1ij} + \dots + \pi\_{5ij}\right)}\right) &= \eta\_{5ij}\\ \log\left(\frac{\pi\_{1ij} + \dots + \pi\_{6ij}}{1 - \left(\pi\_{1ij} + \dots + \pi\_{6ij}\right)}\right) &= \eta\_{6ij} \end{aligned}$$

The following GLIMMIX commands fit a cumulative logit model with an ordinal multinomial response.

```
proc glimmix data=beans;
class Exper Class;   /* "Class" added so the factor can appear in the model */
model cal(order=data)= Exper|Class/dist=Multinomial link=clogit
solution oddsratio;
freq count;          /* frequency variable; name assumed */
run;
```


Table 8.16 Hypothesis testing in quality assessment


Part of the results is shown below. The results of the analysis of variance show that the class of bean (Class), the experience of the evaluator (Exper), and the interaction between class and experience (Class×Exper) all significantly affect bean canning scores (P = 0.0001). That is, the results of comparing judges with more and fewer years of experience depend on the line (variety) of beans (Table 8.15).

The contrasts address this interaction (Table 8.16). The hypothesis tested for each class of bean is H<sub>0</sub>: π<sub>(class of bean, <5 years of experience)</sub> = π<sub>(class of bean, >5 years of experience)</sub>.

The results show that judges with more than 5 years of experience differ from those with less than 5 years of experience in evaluating the quality of canned kidney and pinto beans (Table 8.16). With the "solution" option in the model specification, the fixed parameter estimates table shows the maximum likelihood solutions of the fixed effects parameters. In this table, we can observe the values of the estimated intercepts: η<sub>1</sub> = -4.6421 defines the boundary between the categories "1 = very undesirable" and "2 = moderately undesirable," η<sub>2</sub> = -2.9316 defines the boundary between the categories "2 = moderately undesirable" and "3 = slightly undesirable," and η<sub>3</sub> = -1.3995 defines the boundary between the categories "3 = slightly undesirable" and "4 = neither undesirable nor desirable," and so on.

The estimated effects of bean type (α<sub>i</sub>), evaluator (β<sub>j</sub>), and their interaction (αβ)<sub>ij</sub> are shown below. From these values, we can estimate the linear predictors for each of the categories. For example, the linear predictor for canned black beans evaluated by an inexperienced judge who assigns the category "1 = very undesirable" is η<sub>111</sub> = η<sub>1</sub> + α<sub>1</sub> + β<sub>1</sub> + (αβ)<sub>11</sub> = -4.6421 + 1.9670 + 1.0284 - 0.8066 = -2.4533; for category "2 = moderately undesirable," it is η<sub>211</sub> = η<sub>2</sub> + α<sub>1</sub> + β<sub>1</sub> + (αβ)<sub>11</sub> = -2.9316 + 1.9670 + 1.0284 - 0.8066 = -0.7428; for category "3 = slightly undesirable," it is η<sub>311</sub> = η<sub>3</sub> + α<sub>1</sub> + β<sub>1</sub> + (αβ)<sub>11</sub> = -1.3995 + 1.9670 + 1.0284 - 0.8066 = 0.7893; and,


Table 8.17 Maximum likelihood estimation of the estimated parameters in the fixed effects solution of canned bean quality ratings in the multinomial cumulative logit model

for category "4 = neither undesirable nor desirable," it is η<sub>411</sub> = η<sub>4</sub> + α<sub>1</sub> + β<sub>1</sub> + (αβ)<sub>11</sub> = 0.004287 + 1.9670 + 1.0284 - 0.8066 = 2.1931. This is how the other categories are calculated for each type of bean and assessor (Table 8.17).
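These linear-predictor sums and their back-transformation can be checked numerically. The sketch below uses the four intercepts and the black-bean/inexperienced-judge effects quoted above; it is a verification sketch, not GLIMMIX output:

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: probability from a linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))

# Intercepts eta_1..eta_4 and the combined effect alpha1 + beta1 + (alpha*beta)11
# for black beans rated by an inexperienced judge, as quoted from Table 8.17
intercepts = [-4.6421, -2.9316, -1.3995, 0.004287]
effect = 1.9670 + 1.0284 - 0.8066

# Cumulative probabilities P(rating <= c) at the first four cut points
cum = [inv_logit(e + effect) for e in intercepts]   # ~ [0.079, 0.322, 0.688, 0.900]
```

The first two values agree with the probabilities quoted later in the text (0.08 for rating 1 and 0.3224 for ratings 1 or 2), and the sequence is necessarily increasing.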

The results of Table 8.18 were obtained with the "estimate" command in conjunction with the "ilink" option, which prompts GLIMMIX to compute the values of the linear predictors η̂<sub>cij</sub>, tabulated under the "Estimate" column, and the estimated probabilities π̂<sub>cij</sub> for all categories of each treatment, tabulated under the "Mean" column, except for the reference category.

From Table 8.18 ("Estimates"), we can obtain the probabilities reported under the "Mean" column: an inexperienced (<5 years) panelist (judge) would rate canned black beans as category 1 (1 = very undesirable) with a probability of π̂<sub>111</sub> = 0.08 compared to an experienced panelist (>5 years) who would give a


Table 8.18 Estimates on the model scale (Estimate) and on the data scale (Mean) based on judges' experience in canned bean quality ratings in the multinomial cumulative logit model






Table 8.18 (continued)

probability of π̂<sub>112</sub> = 0.0646. To calculate the probability that a judge with less than 5 years' experience would assign a rating of 2 (2 = moderately undesirable) to canned black beans, we derive it from the cumulative probability of 0.3224, which corresponds to π<sub>211</sub> + π<sub>111</sub>, from which we get π<sub>211</sub> = 0.3224 - π<sub>111</sub> = 0.3224 - 0.08 = 0.2424. On the other hand, for an experienced judge (>5 years), the probability of assigning a score of 2 to canned black beans is π<sub>212</sub> = 0.2760 - π<sub>112</sub> = 0.2760 - 0.0646 = 0.2114.
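The differencing of cumulative probabilities described here is one line of arithmetic per category; a quick sketch with the cumulative values quoted from Table 8.18:

```python
# Cumulative probabilities P(rating <= 1) and P(rating <= 2) for black beans,
# as quoted in the text from Table 8.18
cum1_inexp, cum2_inexp = 0.08, 0.3224     # judge with < 5 years' experience
cum1_exp, cum2_exp = 0.0646, 0.2760       # judge with > 5 years' experience

# P(rating = 2) is the difference of adjacent cumulative probabilities
pi2_inexp = cum2_inexp - cum1_inexp   # ~ 0.2424
pi2_exp = cum2_exp - cum1_exp         # ~ 0.2114
```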

Following the same procedure, the other probabilities for the rest of the categories are obtained. The probabilities calculated for each of the categories are shown in Table 8.19 and can be seen in Fig. 8.2.

# 8.6 Generalized Logit Models: Nominal Response Variables

With nominal data, the polytomous response variable has no ordered structure. Two classes of models, generalized logit models and conditional logit models, can be used with nominal response data. A generalized logit model consists of a combination of several binary logits estimated simultaneously. The logit model is the simplest and best-known probabilistic choice model; however, the traditional multinomial logit model can be too inflexible for some applications. A generalized logit model is essentially more flexible than the traditional multinomial cumulative logit model.

A generalized logit model shows the same flexibility as a probit model but is much more tractable. Like cumulative logit and probit models, a generalized logit model has C – 1 link functions, where C denotes the number of response categories.


Table 8.19 Probabilities calculated for each of the canned bean grades

Cal1 = rating 1, Cal2 = rating 2, ..., Cal7 = rating 7; J1 = panelist with less than 5 years' experience; J2 = panelist with more than 5 years' experience

Fig. 8.2 Estimated probabilities for each category of the acceptability of canned beans, according to the experience of the panelist (judge)

Moreover, in this class of models, a category is first defined as the reference category. This may be arbitrary or it may make compelling logical sense in the study to designate a particular response category as the reference. In practice and throughout the analysis, the category used as the reference is irrelevant, as long as we are consistent about it. For example, if C is used as the reference category, then the generalized logits are defined as shown below:

$$\begin{aligned} \eta\_{1ij} &= \log\left(\frac{\pi\_{1ij}}{\pi\_{Cij}}\right) = \alpha\_1 + \mathbf{X}\boldsymbol{\beta}\_1 + \mathbf{Z}\mathbf{b}\_1 \\ \eta\_{2ij} &= \log\left(\frac{\pi\_{2ij}}{\pi\_{Cij}}\right) = \alpha\_2 + \mathbf{X}\boldsymbol{\beta}\_2 + \mathbf{Z}\mathbf{b}\_2 \\ &\vdots \\ \eta\_{(C-1)ij} &= \log\left(\frac{\pi\_{(C-1)ij}}{\pi\_{Cij}}\right) = \alpha\_{C-1} + \mathbf{X}\boldsymbol{\beta}\_{C-1} + \mathbf{Z}\mathbf{b}\_{C-1} \end{aligned}$$

Given the different effects in the models, the intercepts (the α's), the β's, and the b's vary across the pairs of response variable categories for each link function. Using algebra, it can be shown that the general form of the inverse of the link functions is given by

$$\pi\_c = \frac{e^{\eta\_c}}{1 + \sum\_{c=1}^{C-1} e^{\eta\_c}}, \qquad c = 1, 2, \dots, C - 1$$

Once π<sub>1</sub>, π<sub>2</sub>, ..., π<sub>C-1</sub> are estimated, the probability of the reference category is obtained as $\pi\_C = 1 - \sum\_{c=1}^{C-1} \pi\_c$.

# 8.6.1 CRDs with a Nominal Multinomial Response

In practice, cumulative models are used for analyzing ordinal data and generalized logit models for nominal data. Returning to Example 8.3.1, we will now implement the analysis with a generalized logit model. This model relaxes the proportional-odds assumption, but it is less parsimonious than the proportional odds model since it fits C - 1 binary logit models, where C is the number of categories of the response variable. The linear predictor and distribution are the same as in the previous example.

The following GLIMMIX syntax implements the analysis of the generalized logit model:

```
proc glimmix data=chickens;
class trt block category;
model category(reference='severe')= trt/dist=Multinomial
link=glogit oddsratio;
random intercept/subject=block solution group=category;
estimate 't=1' intercept 1 trt 1 0 0 0,
't=2' intercept 1 trt 0 1 0 0,
't=3' intercept 1 trt 0 0 1 0,
't=4' intercept 1 trt 0 0 0 1/ilink bycat;
freq y;
run;
```


Table 8.21 Maximum likelihood estimates on the model scale (Estimate) for footpad lesion level in the multinomial generalized logit model


Most of the syntax of the program has already been explained. The "reference=" option, specified in the model statement, is new to this program and designates the reference category. If the "reference=" option is not specified, GLIMMIX by default uses the last category in the dataset. The "link=glogit" option prompts GLIMMIX to fit a generalized logit model. The "bycat" option in the "estimate" command is unique to the generalized logit model. Finally, the "ilink" option asks GLIMMIX to estimate all category probabilities for each treatment, except those for the reference category. Part of the output is shown in Table 8.20. The fixed effects test shows highly significant differences (P = 0.0001) in footpad lesion levels between treatments.

Unlike the cumulative logit model, in the generalized logit model, the estimates of the fixed effects (treatments), as well as the intercepts, are separate for each link function. For the estimation of the linear predictors, we use the estimated values in Table 8.21 ("Solutions for fixed effects"). The estimated intercepts α<sub>1</sub> = 4.8525 and α<sub>2</sub> = 4.2485 are the intercepts of the logits for the "Without" lesion and "Moderate" lesion categories, respectively, each relative to the reference category ("Severe" lesion). For treatment 1, the estimated treatment effect (τ̂<sub>i</sub>) for the "Without" lesion category is τ<sub>1</sub> = -3.8447 and, for the "Moderate" lesion category, it is τ<sub>1</sub> = -2.6478. With these values, the linear predictors for the "Without" lesion and "Moderate" lesion categories under treatment 1 are η<sub>01</sub> = 4.8525 - 3.8447 = 1.0077 and η<sub>11</sub> = 4.2485 - 2.6478 = 1.6007, respectively.

The estimated probabilities for each of the categories ("Without" lesion and "Moderate" lesion) in each treatment, except for the reference category, are found


Table 8.22 Estimates on the model scale ("Estimate") and on the data scale ("Mean") for footpad lesion level observed in treatments in the multinomial generalized logit model

under the "Mean" column of Table 8.22. The probability that a chick has no footpad lesion when receiving treatment 1 is π<sub>01</sub> = 0.315, and the probability of a moderate lesion is π<sub>11</sub> = 0.57. Note that, unlike the cumulative logit model, the inverse link of the generalized logit model yields per-category probabilities rather than cumulative ones, so the probability of observing a severe footpad lesion under treatment 1 is π<sub>21</sub> = 1 - (0.315 + 0.57) = 0.115. Following the same logic, we can estimate the probabilities for the rest of the treatments.

Another important result is the odds ratio estimates. These estimates are shown in Table 8.23.

These odds ratios compare the odds for the labeled category to those for the reference category for treatments 1–3 relative to treatment 4. These odds ratio values are derived from the estimated probabilities in each of the categories. For example, the probabilities that a chicken under treatment 4 presents no lesion and a moderate lesion are π<sub>04</sub> = 0.6433 and π<sub>14</sub> = 0.3517, respectively. From these probabilities, we can estimate the probability of observing a severe lesion as π<sub>24</sub> = 1 - (0.6433 + 0.3517) = 0.005. The estimated odds ratio of not observing a lesion ("Without" lesion) between treatments 1 and 4 is

$$\text{Odds ratio}_{\text{Tr1},\text{Tr4}} = \frac{\hat{\pi}_{01}/\hat{\pi}_{21}}{\hat{\pi}_{04}/\hat{\pi}_{24}} = \frac{0.315/0.115}{0.6433/0.005} = 0.02135$$

the value provided in the odds ratio estimates table. If we compare the analysis using the cumulative logit link with that using the generalized logit link, we observe only negligible


Table 8.23 Estimated odds ratio

changes in the estimated category probabilities by treatment as well as in the significance level in the test of treatment effects.
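The odds-ratio arithmetic can be checked directly. A short sketch using the probabilities that appear in the odds-ratio expression above (treatment 4 is the reference treatment, severe lesion the reference category):

```python
# Odds-ratio check for treatment 1 vs. treatment 4, with the probabilities
# quoted in the odds-ratio expression in the text.
pi_01, pi_21 = 0.315, 0.115    # without / severe lesion, treatment 1
pi_04, pi_24 = 0.6433, 0.005   # without / severe lesion, treatment 4

odds_ratio = (pi_01 / pi_21) / (pi_04 / pi_24)
# ≈ 0.02129, matching the reported 0.02135 up to rounding of the inputs.
print(round(odds_ratio, 5))
```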

# 8.6.2 CRD: Cheese Tasting

Consider a study investigating the effects of various additives on the flavor of cheese. Researchers tested 4 cheese additives and obtained 52 response ratings for each additive. Each response was measured on a 9-category scale ranging from "I dislike it very much" (1) to "I like it very much (excellent flavor)" (9). The data are from the study by McCullagh and Nelder (1989) (Table 8.24).

The components of the GLMM with an ordinal multinomial response are as follows:


$$\log\left(\frac{\pi\_{1i}}{1-\pi\_{1i}}\right) = \eta\_{1i}$$

$$\log\left(\frac{\pi\_{1i}+\pi\_{2i}}{1-\left(\pi\_{1i}+\pi\_{2i}\right)}\right) = \eta\_{2i}$$

$$\log\left(\frac{\pi\_{1i}+\pi\_{2i}+\pi\_{3i}}{1-\left(\pi\_{1i}+\pi\_{2i}+\pi\_{3i}\right)}\right) = \eta\_{3i}$$
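The cumulative logits above can be inverted to recover the individual category probabilities: the inverse logit of each $\eta_{ci}$ gives a cumulative probability, and successive differences give the per-category probabilities. A small numerical sketch with hypothetical probabilities (four categories, logit link; not values from this experiment):

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Hypothetical category probabilities for one treatment (sum to 1).
pi = [0.10, 0.25, 0.40, 0.25]

# Cumulative logits eta_1..eta_3, as in the model equations above.
cums = [sum(pi[: c + 1]) for c in range(3)]
etas = [logit(c) for c in cums]

# Inverting each eta and differencing recovers the original probabilities;
# the last category is the complement of the last cumulative probability.
recovered = [inv_logit(etas[0])] + [
    inv_logit(etas[c]) - inv_logit(etas[c - 1]) for c in (1, 2)
] + [1 - inv_logit(etas[2])]
print([round(p, 2) for p in recovered])  # [0.1, 0.25, 0.4, 0.25]
```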


Table 8.24 Effect of additives on the flavor of cheese


The following GLIMMIX commands fit a cumulative logit model with an ordinal multinomial response.

```
proc glimmix ;
class id additive scale;
model scale(order=data)= additive/dist=Multinomial link=clogit
solution oddsratio;
estimate 'c=1, a=1' intercept 1 0 0 0 0 0 0 0 additive 1 0 0 0,
'c=2, a=1' intercept 0 1 0 0 0 0 0 0 additive 1 0 0 0,
'c=3, a=1' intercept 0 0 1 0 0 0 0 0 additive 1 0 0 0,
'c=4, a=1' intercept 0 0 0 1 0 0 0 0 additive 1 0 0 0,
'c=5, a=1' intercept 0 0 0 0 1 0 0 0 additive 1 0 0 0,
'c=6, a=1' intercept 0 0 0 0 0 1 0 0 additive 1 0 0 0,
'c=7, a=1' intercept 0 0 0 0 0 0 1 0 additive 1 0 0 0,
'c=8, a=1' intercept 0 0 0 0 0 0 0 1 additive 1 0 0 0,
'c=1, a=2' intercept 1 0 0 0 0 0 0 0 additive 0 1 0 0,
'c=2, a=2' intercept 0 1 0 0 0 0 0 0 additive 0 1 0 0,
'c=3, a=2' intercept 0 0 1 0 0 0 0 0 additive 0 1 0 0,
'c=4, a=2' intercept 0 0 0 1 0 0 0 0 additive 0 1 0 0,
'c=5, a=2' intercept 0 0 0 0 1 0 0 0 additive 0 1 0 0,
'c=6, a=2' intercept 0 0 0 0 0 1 0 0 additive 0 1 0 0,
'c=7, a=2' intercept 0 0 0 0 0 0 1 0 additive 0 1 0 0,
'c=8, a=2' intercept 0 0 0 0 0 0 0 1 additive 0 1 0 0,
'c=1, a=3' intercept 1 0 0 0 0 0 0 0 additive 0 0 1 0,
'c=2, a=3' intercept 0 1 0 0 0 0 0 0 additive 0 0 1 0,
'c=3, a=3' intercept 0 0 1 0 0 0 0 0 additive 0 0 1 0,
'c=4, a=3' intercept 0 0 0 1 0 0 0 0 additive 0 0 1 0,
'c=5, a=3' intercept 0 0 0 0 1 0 0 0 additive 0 0 1 0,
'c=6, a=3' intercept 0 0 0 0 0 1 0 0 additive 0 0 1 0,
'c=7, a=3' intercept 0 0 0 0 0 0 1 0 additive 0 0 1 0,
'c=8, a=3' intercept 0 0 0 0 0 0 0 1 additive 0 0 1 0,
'c=1, a=4' intercept 1 0 0 0 0 0 0 0 additive 0 0 0 1,
'c=2, a=4' intercept 0 1 0 0 0 0 0 0 additive 0 0 0 1,
'c=3, a=4' intercept 0 0 1 0 0 0 0 0 additive 0 0 0 1,
'c=4, a=4' intercept 0 0 0 1 0 0 0 0 additive 0 0 0 1,
'c=5, a=4' intercept 0 0 0 0 1 0 0 0 additive 0 0 0 1,
'c=6, a=4' intercept 0 0 0 0 0 1 0 0 additive 0 0 0 1,
'c=7, a=4' intercept 0 0 0 0 0 0 1 0 additive 0 0 0 1,
'c=8, a=4' intercept 0 0 0 0 0 0 0 1 additive 0 0 0 1/ilink;
freq freq;
run;
```
Part of the results is shown in Table 8.25. The analysis of variance shows that the type of additive used in the manufacture of the cheese significantly affects the degree of consumer acceptance (P = 0.0001); that is, the type of additive affects the sensory characteristics of the cheese.

The hypothesis contrasts are presented in Table 8.26. The hypothesis tests are as follows:

$$
H_0: \pi_{\text{additive}_i} = \pi_{\text{additive}_j}, \quad \forall\, i \neq j
$$

The results show that the additives provide different sensory characteristics that are reflected in the evaluation of preference.

With the "solution" option in the model specification, Table 8.27 (fixed parameter estimates) shows the solution of the maximum likelihood estimates for the fixed


Table 8.26 Contrast of hypothesis in the acceptance of cheese made with four additives

Table 8.27 Maximum likelihood estimates of the fixed effects in the preference ratings of cheese made with different types of additives in the multinomial cumulative logit model


effects parameters. In this table, we observe the estimated intercepts: $\hat{\eta}_1 = -7.0802$ defines the boundary between categories "1" and "2," whereas $\hat{\eta}_2 = -6.0250$ defines the boundary between categories "2" and "3." The third intercept, $\hat{\eta}_3 = -4.9254$, defines the boundary between categories "3" and "4," and so forth. The estimated effects of additive type, $\hat{\alpha}_i$ ($i = 1, 2, 3, 4$), are 1.6128, 4.9646, 3.3227, and 0, respectively. From these values, the linear predictors are estimated for each of the categories.

For example, the estimated linear predictor for a cheese made with additive 1, where the evaluator (consumer) assigns it category "1 = highly undesirable," is $\hat{\eta}_{11} = \hat{\eta}_1 + \hat{\alpha}_1 = -7.0802 + 1.6128 = -5.4674$; for category "2 = moderately undesirable," it is $\hat{\eta}_{21} = \hat{\eta}_2 + \hat{\alpha}_1 = -6.0250 + 1.6128 = -4.4122$; for category "3 = slightly undesirable," it is $\hat{\eta}_{31} = \hat{\eta}_3 + \hat{\alpha}_1 = -4.9254 + 1.6128 = -3.3126$; and for category "4 = neither undesirable nor desirable," it is $\hat{\eta}_{41} = \hat{\eta}_4 + \hat{\alpha}_1 = -3.8568 + 1.6128 = -2.2440$. These values are shown in the "Estimate" column of Table 8.28; the other categories are calculated similarly for each type of additive.

The "estimate" statements, used in conjunction with the "ilink" option, prompt GLIMMIX to calculate the values of the linear predictors $\hat{\eta}_{ci}$, tabulated in the "Estimate" column, and the estimated cumulative probabilities for the categories of each treatment, tabulated in the "Mean" column of Table 8.28, except for the reference category.

From Table 8.28 (Estimates), we obtain the probabilities for each category reported under the "Mean" column. In this case, $\hat{\pi}_{11} = 0.004205$. This value is obtained by applying the inverse link to the linear predictor $\hat{\eta}_{11} = -5.4674$, that is, $\hat{\pi}_{11} = 1/(1 + \exp(5.4674)) = 0.004205$. To calculate the probability that a panelist would assign a rating of 2 (moderately undesirable) to cheese made with additive 1, we use the cumulative probability 0.01198, which corresponds to $\hat{\pi}_{11} + \hat{\pi}_{21}$. From this value, we obtain $\hat{\pi}_{21} = 0.01198 - \hat{\pi}_{11} = 0.01198 - 0.004205 = 0.007775$, and, for the probability of assigning a rating of 3 to cheese made with additive 1, $\hat{\pi}_{31} = 0.03514 - (\hat{\pi}_{21} + \hat{\pi}_{11}) = 0.03514 - 0.01198 = 0.02316$. Following the same procedure, we obtain the probabilities for the remaining categories of each of the additives used in the manufacture of the cheese, which are tabulated in Table 8.29 and shown in Fig. 8.3.
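The chain of calculations for additive 1 can be reproduced from the intercepts and additive effect quoted from Table 8.27. A minimal sketch (the exact decimals depend on rounding in the reported estimates):

```python
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Intercepts (category boundaries 1|2, 2|3, 3|4) and the additive 1 effect,
# as quoted from Table 8.27.
intercepts = [-7.0802, -6.0250, -4.9254]
alpha_1 = 1.6128

# Cumulative probabilities from the inverse logit of each linear predictor.
cum = [inv_logit(eta + alpha_1) for eta in intercepts]

pi_1 = cum[0]            # P(rating 1)
pi_2 = cum[1] - cum[0]   # P(rating 2)
pi_3 = cum[2] - cum[1]   # P(rating 3)
# ≈ 0.0042, 0.0078, 0.0232, matching the text up to rounding.
print(round(pi_1, 4), round(pi_2, 4), round(pi_3, 4))
```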

Figure 8.3 shows the estimated probability of each flavor rating for each of the additives (note that some probability values were suppressed to avoid overlapping labels). Additive 1 primarily receives ratings of 5–7; additive 2, ratings of 2–5; additive 3, ratings of 4–6; and additive 4, ratings of 7–9.

The odds ratio results (Table 8.30) show the preferences more clearly. For example, the odds ratio for additive 1 vs. 4 indicates that the first additive is 5.017 times more likely to receive a lower score than the fourth additive.

# 8.7 Exercises

Exercise 8.7.1 The dataset for this exercise corresponds to the results of 9 judges who rated 2 classes of wine, namely, white wine (WW = 1) and red wine (RW = 2); within each wine class, they rated 10 wines on a scale of 1–20 points. The minimum rating for a particular wine was 7, and the maximum was 19.5. For didactic purposes, ratings between 7 and 11 were assigned low quality, ratings between 12 and 15 medium quality, and anything above 15 excellent quality. The data are shown in Table 8.31 of the wine evaluation experiment under the columns "Judge" (wine evaluator panelist), "Wine\_type" (white wine: 1, red wine: 2), "Quality" (low, medium, and excellent), and the frequency of the observed qualities ("y").

	- (i) An evaluation of the effects of the combination of treatments
	- (ii) Interpretation of the odds ratios
	- (iii) The expected probability per category for each treatment

Table 8.28 Estimates on the model scale (Estimate) and on the data scale (Mean) based on judges' preference ratings of cheese made with different types of additives in the multinomial cumulative logit model



Table 8.28 (continued)

Exercise 8.7.2 Data were obtained from a series of experiments conducted to reduce damage to potato tubers due to a potato lifter. The experiments were conducted at the Institute of Agricultural Engineering (IMAG-DLO) in Wageningen, the Netherlands. One source of damage was the type of rod used in the lifter. In the experiment under consideration, eight types of rods were compared. It is an empirical fact that the degree of damage varies considerably between potato varieties and with the type of rod used in lifting full potato sacks. Three blocks of observations were obtained for the combinations of varieties and rod types. Most of the combinations involved about 20 potatoes. For some combinations, there are no data due to an insufficient number of large potatoes. Tubers were dropped from a given height. To determine the damage, all tubers were peeled, and the degree of blue coloration was classified into one of four classes (class 1: no damage; class 2: slight damage; class 3: moderate damage; and class 4: severe damage). The observations, in the form of counts per class and combination, are shown in Table 8.32 of the tuber experiment, whose columns are "Variety" (1, 2, 3, 4, 5, 6), "String" (rod type; 1, 2, 3, 4, 5, 6, 7, 8), "Block" (1, 2, 3), type of damage (sd = no damage, dl = light damage, dm = moderate damage, ds = severe damage), and the observed frequency (Y).



Note: Grade1 = Grade 1, Grade2 = Grade 2, ..., Grade9 = Grade 9

Fig. 8.3 Estimated probabilities for the categories of acceptability for the cheese according to the type of additive



(a) List the components of the multinomial GLMM.

	- (i) An evaluation of the effects of the combination of treatments
	- (ii) Interpretation of the odds ratios
	- (iii) The expected probability per category for each treatment

Exercise 8.7.3 Fit a generalized multinomial logit model using the dataset from Exercise 8.7.2 of this section, following the instructions:


Table 8.31 Results of the wine evaluation experiment


Table 8.31 (continued)

(a) List the components of this model.

(b) Perform a thorough and appropriate analysis of the data, focusing on:


Exercise 8.7.4 In this exercise, the effects of judges' experience on quality ratings of canned beans are assessed. Canning quality is one of the most essential traits required in all new dry bean (Phaseolus vulgaris L.) varieties, and selection for this trait is a critical part of bean breeding programs. Advanced lines, which are candidates for release as varieties, must be evaluated for canning quality for at least 3 years from samples grown at different locations. Quality is evaluated by a panel of judges with varying levels of experience in evaluating breeding lines for visual quality traits. In all, 264 bean breeding lines from 4 commercial classes were canned according to the procedures described by Walters et al. (1997).

These included 62 white (navy), 65 black, 55 kidney, and 82 pinto bean lines plus control lines and "checks." The visual appearance of the processed beans was determined subjectively by a panel of 13 judges on a 7-point hedonic scale (1 = very undesirable, 4 = neither desirable nor undesirable, 7 = very desirable). The beans were presented to the panel of judges in random order at the same time. Prior to evaluating the samples, all judges were shown examples of samples rated as satisfactory (4). There is concern that certain judges, due to lack of experience, may not be able to score canned samples correctly.

From attribute-based product evaluations, inferences about the effects of experience can be drawn from the psychology literature (Wallsten and Budescu 1981). Prior to the bean canning quality rating experiment, it was postulated that not only do


Table 8.32 Results of the tuber experiment. V = variety, C = string, B = block, D = damage (sd = no damage, dl = slight damage, dm = moderate damage, ds = severe damage), and Y = observed frequency

Table 8.32 (continued)



Table 8.32 (continued)

Table 8.32 (continued)



Table 8.32 (continued)

less experienced judges have a more severe rating than do more experienced judges but also that experience should have little or no effect on the white beans for which the canning procedure was developed. Judges are stratified for the purpose of analysis by experience (less than 5 years, greater than 5 years).

Counts by canning quality, judge experience, and bean breeding lines are listed in the following table (Table 8.33).


Table 8.33 Bean experiment results

	- (i) An evaluation of the effects of the combination of treatments
	- (ii) Interpretation of the odds ratios
	- (iii) The expected probability per category for each treatment

Exercise 8.7.5 An experiment was conducted to look at the damage levels (ordinal categories 0–4) of Picea sitchensis shoots in two time periods (10 November and 8 December), at four temperatures (different on each date), and at four ozone levels (Table 8.34).

	- (i) An evaluation of the effects of the combination of treatments
	- (ii) Interpretation of the odds ratios
	- (iii) The expected probability per category for each treatment


Table 8.34 Experimental results of Picea sitchensis sprouts


Table 8.34 (continued)


Table 8.34 (continued)


Table 8.34 (continued)


# Appendix



Data: CRD with a multinomial response: ordinal

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Chapter 9 Generalized Linear Mixed Models for Repeated Measurements

# 9.1 Introduction

Repeated measures data, also known as longitudinal data, are derived from experiments in which observations are made on the same experimental units at various planned times. These experiments can be of the regression or analysis of variance (ANOVA) type, can contain two or more treatments, and are set up using familiar designs, such as a completely randomized design (CRD), a randomized complete block design (RCBD), or randomized incomplete blocks, if blocking is appropriate, or using row–column designs such as Latin squares when appropriate. Repeated measures designs are widely used in the biological sciences and are fairly well understood for normally distributed data but less so for binary, ordinal, and count data. Nevertheless, recent developments in statistical computing methodology and software have greatly increased the number of tools available for analyzing categorical data.

A generalized linear mixed model (GLMM) is one of the most useful and sophisticated structures in modern statistics, as it allows complex structures to be incorporated into the framework of a general linear model. Fitting such models has been the subject of much research over the last three decades. GLMMs, for repeated measures, combine both generalized linear model (GLM) theory (e.g., a binomial, multinomial, or Poisson response variable) and linear mixed effects models.

Experimentation is sometimes not well understood since researchers believe that it involves only the manipulation of the levels of independent variables and the observation of subsequent responses in dependent variables. Independent variables, whose levels are determined or set by the experimenter, are said to have fixed effects, although random effects are also very common, where the levels of the effects are assumed to be randomly selected from an infinite population of possible levels. Many variables of interest in research are not fully amenable to experimental manipulation but can nevertheless be studied by considering them to have random effects. For example, the genetic composition of individuals of a species cannot be manipulated experimentally, but it is of great interest to geneticists aiming to assess the genetic contribution to individual variation of some specific behaviors.

A GLMM with repeated measures is a generalization of the standard linear model, and this generalization is due to (1) the presence of more than one measurement of the response variable per experimental unit, where the response can be binary, ordinal, count, and so on, and (2) the nonconstant correlation and/or variability exhibited by the data. The linear mixed model therefore gives you the flexibility to model not only the means of your data (as in the standard linear model) but also their variances and covariances. Usually, a normal distribution is assumed for random effects. Since normally distributed data can be modeled entirely in terms of their means and variances/covariances, the two sets of parameters in a linear mixed model actually specify the full probability distribution of the data. The parameters of the mean structure are known as fixed effects parameters, which can be qualitative (as in traditional analysis of variance) or quantitative (as in standard regression), whereas the parameters of the variance–covariance structure are known as covariance parameters; it is the latter that distinguish a linear mixed model from the standard linear model. Covariance parameters arise quite frequently in applications, with two typical scenarios: experimental units grouped into clusters that share common sources of variability, and repeated measurements taken on the same experimental unit.


The first scenario can be generalized to include a set of clusters nested within one another. For example, if students are the experimental unit, they can be grouped into classes (clusters), which, in turn, can be grouped into schools. Each level of this hierarchy may present an additional source of variability and correlation. The second scenario occurs in longitudinal studies, in which repeated measurements of the same experimental unit over time are taken. Alternatively, these repeated measures could be spatial or multivariate.

# 9.2 Example of Turf Quality

The proportional odds model, introduced by McCullagh (1980), was proposed as an extension of the generalized linear model used for ordinal responses. One can recall that the proportional odds model is a special case of a GLM with a cumulative link function in which the probability of an observation falling into a category or below is modeled. In the case of a logit link, with only two categories (a binary response), the proportional odds model reduces to a standard logistic regression or a classification model. As with any other type of response variable, repeated measurements are common in agronomic research. They result in clustered data structures with correlations between repeated observations in the same experimental unit that must be taken into account in the analysis.


Table 9.1 Turf quality of five grass varieties (low, Med = medium, Excel = Excellent, Sept = September)

The data were obtained from an experiment studying the turf quality of five grass varieties. The varieties were sown independently in 17 or 18 plots. The evaluations of the plots (experimental units) were carried out in the months of May, July, and September of the growing season, and turf quality was classified on an ordinal scale into three categories: low quality, medium quality, and excellent quality, as demonstrated in Table 9.1.

The components of the GLMM, with repeated measures with an ordinal multinomial response, are as follows:


$$\log\left(\frac{\pi\_{0ij}}{1-\pi\_{0ij}}\right) = \eta\_{0ij}$$

$$\log\left(\frac{\pi\_{0ij} + \pi\_{1ij}}{1 - \left(\pi\_{0ij} + \pi\_{1ij}\right)}\right) = \eta\_{1ij}$$

The following Statistical Analysis Software (SAS) program fits a repeated measures GLMM with an ordinal response.

```
proc glimmix data=turfgrass method=laplace;
class Variety time;
model cat(order=data)=variety|time/dist=Multinomial link=clogit
solution oddsratio;
random intercept/subject=variety type=cs solution;
estimate 'c=1, var=1' intercept 1 0 variety 1 0 0 0 0,
'c=2, var=1' intercept 0 1 variety 1 0 0 0 0,
'c=1, var=2' intercept 1 0 variety 0 1 0 0 0,
'c=2, var=2' intercept 0 1 variety 0 1 0 0 0,
'c=1, var=3' intercept 1 0 variety 0 0 1 0 0,
'c=2, var=3' intercept 0 1 variety 0 0 1 0 0,
'c=1, var=4' intercept 1 0 variety 0 0 0 1 0,
'c=2, var=4' intercept 0 1 variety 0 0 0 1 0,
'c=1, var=5' intercept 1 0 variety 0 0 0 0 1,
'c=2, var=5' intercept 0 1 variety 0 0 0 0 1/ilink;
freq y;
run;
```

Table 9.2 Fit statistics under different correlation structures
Mixed models have advantages over fixed-effects linear models (Littell et al. 1996) because they incorporate both fixed effects (Xβ) and random effects (Zb), which allows us to select different variance–covariance structures for repeated measures experiments (with or without missing data) and to see which covariance structure best fits the model (Henderson 1984; Smith et al. 2005). Selecting or building a good enough model involves selecting a covariance structure that fits the dataset well. The information criteria (minus two times the restricted log likelihood (-2RLL), Akaike information criterion (AIC), corrected Akaike information criterion (AICC), Bayesian information criterion (BIC), etc.) provided by proc GLIMMIX are used as statistical fit measures to select the variance structure (compound symmetry ("CS"), first-order autoregressive ("AR(1)"), Toeplitz ("Toep(1)"), or unstructured ("UN")) that best models the dataset.
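The standard definitions of these criteria can be sketched from the log likelihood, the number of parameters, and the number of observations (exact conventions, e.g., which sample size BIC uses, vary by software; the fit values below are hypothetical):

```python
import math

def info_criteria(neg2ll, p, n):
    """AIC, AICC, and BIC from -2 log likelihood, p parameters,
    and n observations (textbook definitions)."""
    aic = neg2ll + 2 * p
    aicc = aic + (2 * p * (p + 1)) / (n - p - 1)  # small-sample correction
    bic = neg2ll + p * math.log(n)
    return aic, aicc, bic

# Hypothetical -2RLL values for two covariance structures; smaller is better.
for name, neg2ll, p in [("CS", 512.3, 2), ("AR(1)", 505.1, 2)]:
    print(name, info_criteria(neg2ll, p, n=90))
```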

Most of the commands have already been explained. To specify the correlation structure to be modeled, you vary the "TYPE=" option (CS, AR(1), Toep(1), and UN) in the above program, fitting one covariance structure at a time. Part of the results is shown below.

According to the fit statistics (Table 9.2), the covariance structure that best fits the dataset is Toeplitz of order 1 (Toep(1)). The type III tests of fixed effects, shown in Table 9.3 part (a), indicate that grass variety provides different turfgrass qualities


Table 9.3 Results of the analysis of variance

Table 9.4 Estimated linear predictors and means on the model scale (Estimate) and on the data scale (Mean) for observed turfgrass quality in grass varieties in the multinomial generalized logit model


(P = 0.0202). The "solution" option in the "model" statement provides the fixed effects solutions of the model (intercepts and treatments), which we use to estimate the linear predictors $\hat{\eta}_{ci} = \hat{\eta}_c + \widehat{\text{variety}}_i$ (part (b)).

The probabilities πci obtained using the "Estimate" information are tabulated under the "Mean" column of Table 9.4.

From these values, we can observe that for the category "c = 1, var = 1," the value of the linear predictor is $\hat{\eta}_{11} = \hat{\eta}_1 + \widehat{\text{variety}}_1 = -2.0248$. Applying the inverse link to $\hat{\eta}_{11}$ gives the probability $\hat{\pi}_{11} = 0.1166$ of observing "Low"-quality grass of variety 1. Now, for the category "c = 2, var = 1," the inverse of the linear predictor is 0.6507, which is the estimate of the cumulative probability $\hat{\pi}_{11} + \hat{\pi}_{21}$. Substituting the value of $\hat{\pi}_{11}$, we obtain the probability that variety 1 provides grass of "Medium" quality, $\hat{\pi}_{21} = 0.6507 - 0.1166 = 0.5341$. With these two probability estimates, it is possible to estimate the probability that variety 1 will yield an "Excellent"-quality turf, which is $\hat{\pi}_{31} = 1 - 0.6507 = 0.3493$. Likewise, we obtain the remaining probabilities $\hat{\pi}_{ci}$ for the rest of the grass varieties.
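The same back-calculation for variety 1 can be sketched numerically (linear predictor and cumulative mean as quoted above; decimals depend on rounding):

```python
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

eta_11 = -2.0248              # boundary Low | Medium, variety 1
cum_low = inv_logit(eta_11)   # P(Low) ≈ 0.1166
cum_low_med = 0.6507          # P(Low) + P(Medium), from Table 9.4

pi_med = cum_low_med - cum_low   # P(Medium) ≈ 0.5341
pi_exc = 1 - cum_low_med         # P(Excellent) ≈ 0.3493
print(round(cum_low, 4), round(pi_med, 4), round(pi_exc, 4))
```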

# 9.3 Effect of Insecticides on Aphid Growth

A cage experiment was used to investigate the effect of three insecticides on aphid colonies with partial resistance to a common active compound. There were eight treatments: all combinations of the three insecticides and a control (no insecticide) with two types of colonies (susceptible or partially resistant). The experiment was organized as an RCBD with six blocks of eight cages, and each cage was assigned a treatment combination in each block. A colony of aphids was reared in each cage, and the number of live aphids was recorded before insecticide treatment was applied and then 2 and 6 days after application. Both hatches and deaths could occur within each cage between evaluations. The dataset from this experiment is shown below (Table 9.5).

Following the same reasoning as in previous examples, the components of the GLMM with a Poisson response and repeated measures, which models the number of aphids (yijkl), is described in the following lines.

Distributions:

$$y_{ijkl} \mid b_l,\ (IC(b))_{ij(l)} \sim \text{Poisson}\left(\lambda_{ijkl}\right)$$

$$b_l \sim N\left(0, \sigma^2_{\text{block}}\right), \quad (IC(b))_{ij(l)} \sim N\left(0, \sigma^2_{\text{insecticide} \times \text{clone} \times \text{block}}\right)$$

Linear predictor: $\eta_{ijkl} = \theta + I_i + C_j + (IC)_{ij} + b_l + (IC(b))_{ij(l)} + \tau_k + (I\tau)_{ik} + (C\tau)_{jk} + (IC\tau)_{ijk}$, where $\eta_{ijkl}$ is the linear predictor, $\theta$ is the intercept, $I_i$ is the fixed effect due to the insecticide (including the control), $C_j$ ($j = 1, 2$) is the fixed effect due to the aphid clone, $(IC)_{ij}$ is the fixed effect due to the interaction between the type of insecticide and clone, $b_l$ is the random block effect, assuming $b_l \sim N\left(0, \sigma^2_{\text{block}}\right)$, $(IC(b))_{ij(l)}$ is the random effect of the interaction between the insecticide and clone within blocks, assuming $(IC(b))_{ij(l)} \sim N\left(0, \sigma^2_{\text{insecticide} \times \text{clone} \times \text{block}}\right)$, $\tau_k$ ($k = 1, 2, 3$) is the fixed effect due to measurement time, and $(I\tau)_{ik}$, $(C\tau)_{jk}$, and $(IC\tau)_{ijk}$ are the fixed effects due to the interactions of time with insecticide and clone.


Table 9.5 Effect of insecticides (C = control, R = resistant, S = susceptible) on aphid growth

Link function: $\log(\lambda_{ijkl}) = \eta_{ijkl}$ is the link function that relates the linear predictor to the mean ($\lambda_{ijkl}$).

The following SAS program fits the GLMM with a Poisson distribution for repeated measures.

```
proc glimmix nobound method=laplace;
class ID Block Insecticide Cage Clone time;
model y = Insecticide|clone|time/dist=poi;
random intercept Insecticide*Clone/subject=block;
lsmeans Insecticide|Clone|time/lines ilink;
run;quit;
```


Table 9.6 Results of the analysis of variance in the Poisson GLMM

Before fitting the GLMM, we compare covariance structures for the Poisson model. According to the fit statistics, the covariance structure that best models the data is first-order autoregressive (AR(1)). The value of the conditional-distribution fit statistic, Pearson's chi-square/DF = 5.77, indicates that there is extra variation (overdispersion) and that the Poisson distribution does not adequately fit the data (Table 9.6).
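The Pearson chi-square/DF diagnostic compares squared Pearson residuals to the residual degrees of freedom; under a well-fitting Poisson model, it should be near 1. A toy sketch (made-up counts and fitted values, not from this experiment):

```python
# Pearson chi-square / DF for a Poisson fit: sum of squared Pearson
# residuals (obs - fitted)^2 / fitted, divided by the residual df.
obs    = [12, 30, 7, 45, 22]
fitted = [15.0, 25.0, 10.0, 40.0, 26.0]

pearson_chi2 = sum((o - f) ** 2 / f for o, f in zip(obs, fitted))
df = len(obs) - 1   # toy residual df for this illustration

# Values well above 1 indicate overdispersion (e.g., the 5.77 above).
print(round(pearson_chi2 / df, 2))
```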

Since there is overdispersion in the data, a highly recommended alternative is to find another suitable (or more appropriate) distribution for this dataset. In this case, the linear predictor will be the same, although now, a negative binomial distribution will be assumed in the response variable. That is,

$$y_{ijkl} \mid b_l,\ (IC(b))_{ij(l)} \sim \text{Negative Binomial}\left(\lambda_{ijkl}, \phi\right)$$

This negative binomial model arises by assuming that, conditional on the random blocks and the insecticide × clone(block) effects, the observations follow $y_{ijkl} \mid b_l, (IC(b))_{ij(l)} \sim \text{Poisson}\left(\lambda_{ijkl}\right)$, where $\lambda_{ijkl} \sim \text{Gamma}\left(\frac{1}{\phi}, \phi\right)$. The resulting marginal distribution of $y_{ijkl} \mid b_l, (IC(b))_{ij(l)}$ is negative binomial ($\text{Negative Binomial}\left(\lambda_{ijkl}, \phi\right)$). The link function is $\log(\lambda_{ijkl}) = \eta_{ijkl}$.
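This gamma–Poisson construction can be checked by simulation: mixing a Poisson rate over a gamma distribution with mean $\lambda$ yields counts with variance $\lambda + \phi\lambda^2$ rather than $\lambda$. A sketch with hypothetical parameter values:

```python
import math
import random

random.seed(1)
lam, phi = 10.0, 0.4   # hypothetical mean and dispersion parameters

def poisson_draw(rate):
    # Simple inversion sampler, adequate for moderate rates.
    l, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= random.random()
        if p <= l:
            return k
        k += 1

def neg_binomial_draw():
    # Gamma-distributed rate with mean lam (shape 1/phi, scale phi*lam),
    # then a Poisson count: marginally negative binomial with
    # Var = lam + phi * lam**2.
    return poisson_draw(random.gammavariate(1 / phi, phi * lam))

draws = [neg_binomial_draw() for _ in range(20000)]
m = sum(draws) / len(draws)
v = sum((d - m) ** 2 for d in draws) / len(draws)
print(round(m, 1), round(v, 1))  # mean ≈ 10, variance ≈ 10 + 0.4 * 100 = 50
```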

The following SAS code fits the GLMM with a negative binomial distribution.

```
proc glimmix nobound method=laplace; 
class ID Block Insecticide Cage Clone time; 
model y = Insecticide|clone|time/dist=negbi; 
random intercept Insecticide*Clone/subject=block; 
lsmeans Insecticide|Clone|time/lines ilink; 
run;
```


Table 9.7 Fit statistics


Part of the results is shown in Table 9.7. The fit statistics, assuming a negative binomial distribution of the data, are shown in part (a), and the value of the conditional statistic appears in part (b) (Pearson's chi-square/DF = 0.81). This indicates that the overdispersion has been removed and that the negative binomial distribution adequately models the response variable.

The estimated variance components under an AR(1) covariance structure are shown in part (a) of Table 9.8. The estimates of the variance components for blocks, for the insecticide × clone interaction within blocks, and for the scale parameter are $\hat{\sigma}^2_{\text{block}} = 0.06613$, $\hat{\sigma}^2_{\text{insecticide} \times \text{clone(block)}} = -0.7575$, and $\hat{\phi} = 0.1584$, respectively. The type III tests of fixed effects (part (b)) indicate significant effects of insecticide type (P < 0.0001), clone (P = 0.0387), measurement time (P = 0.0137), and the insecticide × time (P < 0.0001) and clone × time (P = 0.0259) interactions on the average number of aphids. The insecticide × clone × time interaction is close to significance (P = 0.0663).


Table 9.9 Estimates of insecticide least squares (LS) means on the model scale (Estimate) and the data scale (Mean)

Table 9.10 Clone least squares means on the model scale (Estimate) and the data scale (Mean)


Table 9.11 Insecticide\*clone least squares means on the model scale (Estimate) and the data scale (Mean)


Table 9.12 Time least squares means on the model scale (Estimate) and the data scale (Mean)


The linear predictors and estimated means of the factors and interactions are under the "Estimate" and "Mean" columns, respectively. The average numbers of aphids by insecticide, clone, and time are given below:

For insecticide type (Table 9.9):

For clone (Table 9.10):

For the interaction insecticide\*clone (Table 9.11):

For measurement time (Table 9.12):

For the interaction insecticide\*time (Table 9.13):

For the interaction clone\*time (Table 9.14):

For the interaction insecticide\*clone\*time (Table 9.15):


Table 9.13 Insecticide\*time least squares means on the model scale (Estimate) and the data scale (Mean)

Table 9.14 Clone\*time least squares means on the model scale (Estimate) and the data scale (Mean)


# 9.4 Manufacture of Livestock Feed

In this experiment, two types of pelleted feed were manufactured using different amounts of whole sorghum. Using the whole grain resulted in one feed with a high pellet durability index (PDI) and one with a low PDI. The researcher was interested in how much impact this difference in PDI would have on the amount of intact and pelleted feed distributed to the different positions along the feeding line. The line was fed four times with the high PDI feed and four times with the low PDI feed. After each run, the total weight of the feed in each of the 12 identified trays was measured. The feed in each tray was then sieved, and the crushed fine granules were weighed. The response of interest was the ratio (proportion) between the weight of fine granules and the total weight of the feed for each tray. The data for this experiment are in the Appendix (Data: Feeding line experiment).

The experimental design used in this study was a split plot in a completely randomized design. There were two fixed factors: feed with two levels (high PDI feed (H) and low PDI feed (L)) and tray with 12 levels (1, 2, 3, ..., 12 locations along the feed line). Different run levels (1, 2, 3, 4 runs in the feed line) may influence the inference of this experiment, so it is advisable to analyze which variance structure is suitable for this analysis.

The ANOVA table (Table 9.16) with degrees of freedom for this experiment is shown below.

The researcher aims to draw conclusions about the destructiveness in the feed line with two types of feed, high PDI and low PDI. The following GLMM is used to describe the experiment:

$$y\_{ijk} = \mu + \alpha\_i + \alpha(r)\_{ik} + \beta\_j + (\alpha\beta)\_{ij} + \varepsilon\_{ijk}$$

where yijk is the proportion observed in run k (k = 1, 2, 3, 4), tray j ( j = 1, 2, ..., 12), and feed i (i = 1, 2); μ is the overall mean; αi is the fixed effect of feed i; α(r)ik is the random effect of the ith feed within the kth run, assuming α(r)ik ~ N(0, σ²α(r)); βj is the fixed effect due to the jth tray; (αβ)ij is the effect of the interaction between the ith feed and the jth tray; and εijk is the experimental error. The components of the conditional GLMM, assuming that the response variable follows a beta distribution, are listed below:

The distribution of the response variable is given by yijk | α(r)ik ~ Beta(πijk, ϕ), with linear predictor ηijk = μ + αi + α(r)ik + βj + (αβ)ij and link function logit(πijk) = log(πijk/(1 - πijk)) = ηijk. The following GLIMMIX syntax fits a GLMM with a beta distribution.

```
proc glimmix method=laplace; 
class tray feed run; 
model ratio = feed|tray/dist=beta; 
random intercept/subject=feed(run) type=toep(1); 
lsmeans feed|tray/lines ilink; 
run;
```
Part of the output is shown below. Four covariance structures ("CS," "AR(1)," "Toep(1)," and "UN") were tested to see which one best fits the response variable. Of these covariance structures, "Toep(1)" produced the best fit statistics (part (a), Table 9.17).
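The candidate structures differ in the correlation they impose among repeated runs on the same subject; a short Python sketch (hypothetical variance σ² = 1 and correlation ρ = 0.5) of the covariance matrices implied by "CS," "AR(1)," and "Toep(1)" for three repeated measures — "UN" simply leaves every entry free:

```python
import numpy as np

def cs(n, sigma2, rho):
    """Compound symmetry: the same covariance between every pair of runs."""
    return sigma2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))

def ar1(n, sigma2, rho):
    """AR(1): correlation decays as rho**|i-j| with the lag between runs."""
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

def toep1(n, sigma2):
    """Toeplitz of order 1: only the main diagonal, i.e., independent runs."""
    return sigma2 * np.eye(n)

print(cs(3, 1.0, 0.5))
print(ar1(3, 1.0, 0.5))
print(toep1(3, 1.0))
```

Comparing the fit statistics of the same model under each structure, as done in Table 9.17, is what justifies the choice of "Toep(1)" here.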

Another important result that guides the rest of the analysis is the conditional distribution statistic (Pearson's chi-square/DF = 0.96), whose value indicates that the beta model adequately fits the data, whereas the fixed effects tests (part (c)) indicate a statistically significant effect of feed type (P = 0.0001) and tray (P = 0.0001).

Table 9.18 Feed least squares means on the model scale (Estimate) and the data scale (Mean)

The linear predictors and estimated probabilities of the factors and interaction are listed under the "Estimate" and "Mean" columns of the following tables, respectively.

For the feeding line (Table 9.18):

For the tray (Table 9.19):

For the interaction feeding\*tray (Table 9.20):

# 9.5 Characterization of Spatial and Temporal Variations in Fecal Coliform Density

During a 1-month period (June 1981), 30 river water samples were collected from the channel at 3 stations, A, B, and C (downstream to upstream) on 5 randomly selected days at 9:00 a.m. and 3:00 p.m. (1 sample per station per hour per day). Each sample was analyzed for fecal coliform by method FC-96. The data from this experiment are shown in Table 9.21.


Table 9.19 Tray least squares means on the model scale (Estimate) and the data scale (Mean)

Table 9.20 Tray\*feed least squares means on the model scale (Estimate) and the data scale (Mean)



Table 9.21 Variation in fecal coliform densities of the river water samples from three sampling stations on five sampling days at 9:00 a.m. (TM = 1) and 3:00 p.m. (TM = 2)

To assess the relative magnitudes of sources of variation due to time, site, and subsampling on the number of coliforms per milliliter (yijk), an analysis of variance using a GLMM with a Poisson response was performed, as described below:

We denote by yijk the number of colonies per milliliter, whose conditional distribution is given by yijk | sampling(site)ik ~ Poisson(λijk), with the linear predictor ηijk defined by

$$
\eta\_{ijk} = \theta + \text{site}\_i + \text{sampling}(\text{site})\_{ik} + \text{time}\_j + (\text{site} \times \text{time})\_{ij}
$$


Table 9.22 Results of the analysis of variance

$$(i = 1, 2, 3; j = 1, 2, 3, 4, 5; k = 1, 2)$$

where ηijk is the linear predictor that relates the linear function to the mean, θ is the intercept, sitei is the fixed effect due to sampling site i, sampling(site)ik is the random effect due to the sampling time nested within the site, assuming sampling(site)ik ~ N(0, σ²sampling(site)), timej is the fixed effect due to the sampling date, and (site × time)ij is the effect of the interaction between the site and the sampling date. The link function for this model is log(λijk) = ηijk.

The following GLIMMIX syntax fits a GLMM with a Poisson response.

```
proc glimmix data=ufc nobound method=laplace; 
class T TM Site ; 
model ufc = Site|T/dist=Poisson link=log; 
random intercept/subject=TM(Site) type=toep(1); 
lsmeans Site|T/lines ilink; 
run;
```
Part of the results is summarized in Table 9.22. To determine which covariance structure best models the response variable, four types were tested (part (a)), all of which produced very similar results. Based on these results, the "Toep(1)" covariance structure was chosen. From this fit, the fit statistics were obtained, and the value of the conditional distribution statistic is Pearson's chi-square/DF = 54.41. This value indicates that there is strong overdispersion in the dataset; therefore, it is important to look for an alternative distribution that resolves this problem.

The hypothesis tests in part (c) indicate that there is a significant difference in the date of sampling (P = 0.0001) as well as in the interaction between the site and date


of sampling (P = 0.0001). That is, the concentration of fecal coliform units per milliliter is affected by the date of data collection. However, we observed that there is an excessive dispersion in the data. One way to check for and deal with overdispersion is to run a quasi-Poisson model, which, during the fitting process, adds an additional dispersion parameter to account for that additional variance. Another option is to look for a distribution that adequately fits the data; in this case, the negative binomial distribution is a good alternative.

Next, we will implement the analysis assuming that the response variable follows a negative binomial distribution. This means that the distribution of yijk (number of colonies per milliliter) is given by yijk | sampling(site)ik ~ Negative Binomial(λijk, ϕ), where ϕ is the scale parameter. The linear predictor ηijk and the link function remain unchanged.
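The negative binomial can be motivated as a gamma-mixed Poisson, which is what gives it a variance larger than its mean; a simulation sketch (the values of μ and θ are hypothetical, under the common parameterization Var(y) = μ + μ²/θ):

```python
import numpy as np

rng = np.random.default_rng(1)

# Negative binomial as a gamma-mixed Poisson: draw a gamma-distributed rate
# for each observation, then a Poisson count given that rate. With
# lambda ~ Gamma(shape=theta, scale=mu/theta), E(y) = mu and
# Var(y) = mu + mu**2/theta, which always exceeds the Poisson variance mu.
mu, theta, n = 10.0, 2.0, 200_000
lam = rng.gamma(shape=theta, scale=mu / theta, size=n)
y = rng.poisson(lam)

print(y.mean())  # close to mu
print(y.var())   # close to mu + mu**2/theta, far above the Poisson variance mu
```

This extra quadratic term in the variance is what absorbs the overdispersion that the Poisson model could not accommodate.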

The following GLIMMIX commands fit a GLMM with a negative binomial distribution.

```
proc glimmix data=ufc nobound method=laplace; 
class T TM Site; 
model ufc = Site|T/dist=negbin; 
random intercept/subject=TM(Site) type=toep(1); 
lsmeans Site|T/lines ilink; 
run;
```
Part of the output of the above program is shown below. The fit statistics under the negative binomial distribution (part (a) of Table 9.23) are much smaller than those obtained under the Poisson model, indicating that the negative binomial distribution adequately fits the response variable. Furthermore, the value of the conditional distribution statistic confirms that the negative binomial is a good distribution for these data (Pearson's chi-square/DF = 0.76).

This statistic (Pearson's chi-square/DF = 0.76) indicates how many times larger the variance is than the mean. Since this value is less than 1 (part (b)), the conditional variance is actually smaller than the conditional mean, indicating that overdispersion was removed in fitting the data. Another direct effect observed when there is no overdispersion is on the F-values of the fixed effects tests (Table 9.24). In this case, the date on which the samples were collected was significant but not the interaction between the two factors, unlike the case when the data were fitted using the Poisson GLMM.

Table 9.25 Means and standard errors on the model scale (Estimate) and on the data scale (Mean) of the sampling site data

Table 9.26 Means and standard errors of measurement time on the model scale (Estimate) and the data scale (Mean)

The linear predictors and estimated probabilities of the main effects and the interaction between both factors are under the columns "Estimate" and "Mean," respectively. The sampling site averages are presented below (Table 9.25).

The averages by sampling date are listed below (Table 9.26).

The means of the interaction site × sampling date are shown below (Table 9.27).

# 9.6 Log-Normal Distribution

Positively skewed distributions are highly common, especially when modeling biological data. Data often have a lower bound, usually 0 or the detection limit, but have no restriction on the upper bound. Therefore, when the data are below the median, no observation can be further away than the lower bound; however, when the data are above the median, there may be values that are many times further away, giving a positively skewed distribution. These skewed distributions can often be approximated by a log-normal distribution (Limpert et al. 2001).


Table 9.27 Means and standard errors for the interaction T\*site on the model scale (Estimate) and the data scale (Mean)

Fig. 9.1 Density function of the log-normal distribution with parameters 1 and 0.6

A log-normal distribution is characterized by having only positive nonzero values, positive skewness, a nonconstant variance that is proportional to the square of the mean value, and a normally distributed natural logarithm. The probability density function for a log-normal distribution has an asymmetric appearance, with a larger amount of data below the expected value and a thinner right tail with higher values. Figure 9.1 shows the positive skewness of a log-normal distribution with mean 1 and standard deviation 0.6.
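Using the parameters of Fig. 9.1 (μ = 1, σ = 0.6), the moments of a log-normal variable can be checked numerically; a short sketch:

```python
import numpy as np

# If log(y) ~ Normal(mu, sigma**2), then y is log-normal with
#   E(y)   = exp(mu + sigma**2 / 2)
#   Var(y) = (exp(sigma**2) - 1) * exp(2*mu + sigma**2)
mu, sigma = 1.0, 0.6

mean = np.exp(mu + sigma**2 / 2)
var = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
median = np.exp(mu)

print(mean, var)
# Positive skewness: the mean lies above the median exp(mu)
print(mean > median)
```

The mean-exceeds-median property is the numeric counterpart of the thin right tail visible in Fig. 9.1, and the variance formula is the one used for the conditional log-normal model in the next example.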

# 9.6.1 Emission of Nitrous Oxide (N2O) in Beef Cattle Manure with Different Percentages of Crude Protein in the Diet

The experiment was conducted between January and February 2017 at the Colegio de Postgraduados Campus Córdoba, located in Amatlán de los Reyes, Veracruz, México. The genetic material consisted of four 5–6-month-old males of the Criollo lechero tropical (CLT) breed, randomly distributed in individual pens of 4.8 × 2.1 m2, each with 75% shade, a cup drinker, and a drawer-type feeder. To ensure the required crude protein percentages for each treatment, the following diets (treatments 1–4) were developed: Trt1 (12% crude protein), Trt2 (14% crude protein), Trt3 (16% crude protein), and Trt4 (commercial feed with 16% crude protein). Each animal randomly received the four treatments in different periods. Each treatment was applied for 11 days, of which the first 7 were considered adaptation days and the following 4 were used for the measurement of gases in the daily accumulated excreta. The experiment had a total duration of 44 days. The data from this experiment are tabulated in the Appendix (Data: Nitrous oxide emission). The N2O gas fluxes in ppm were calculated from the linear or nonlinear increase of the concentrations inside the static chambers over time, and these fluxes were converted to micrograms of N2O–N per m2 per hour (y); for more details, see Hernández-Tapia et al. (2019). The statistical model used in this study was an analysis of covariance model in a randomized complete block design with repeated measures, as described below.

$$y\_{ijk} = \mu + \tau\_i + \text{animal}\_j + \text{time}\_k + (\tau \times \text{time})\_{ik} + \beta\_i (x\_{ij} - \overline{x}) + \varepsilon\_{ijk}$$

where yijk is the flux of N2O–N (μg m-2 h-1); μ is the overall mean; τi is the fixed effect due to treatment i (i = 1, 2, 3, 4); animalj is the random effect due to animal j ( j = 1, 2, 3, 4), assuming animalj ~ N(0, σ²animal); timek is the fixed effect of measurement time k (k = 1, 2, 3, 4, 5); (τ × time)ik is the effect of the interaction between τi and timek; βi is the linear regression coefficient of the covariate xij in treatment i, where xij can be the pH, humidity (HE), or temperature (TE) of the manure, the maximum temperature (TMaxA), minimum temperature (TMinA), maximum humidity (HMaxA), or minimum humidity (HMinA) of the environment, or the initial weight (kilograms) at the start of a treatment; x̄ is the mean of the covariate in question; and εijk is the non-normal experimental error.

The linear predictor ηijk for N2O–N is ηijk = μ + τi + animalj + timek + (τ × time)ik + βi(xij - x̄). The response variable yijk has a conditional log-normal distribution with mean μijk and variance (exp(σ²) - 1)·exp(2μ + σ²), that is, yijk | animalj ~ Log-normal(μijk, (exp(σ²) - 1)·exp(2μ + σ²)); the rest of the parameters have already been described above.

The following GLIMMIX syntax fits a GLMM with a log-normal response:

```
proc glimmix data=co2 nobound method=laplace; 
class animal trt time; 
model flox =trt|time xbar/dist=lognormal; 
random intercept /subject=animal type=cs; 
lsmeans trt|time/lines ilink; 
run;
```
Most of the commands have already been described in previous chapters; here, the variable "xbar" is the TMinA covariate centered at its mean. Part of the output is shown below.

The gas emissions from cattle manure, regardless of the treatment applied, are influenced by several factors (covariates) that the researcher cannot control and that have a significant effect on the estimation of means and of the experimental error; these covariates are linearly related to the response variable. Covariates such as the pH, humidity, and temperature of the excreta, as well as the temperature and humidity (maximum and minimum) of the environment, influence the dynamics of gas emission. These covariates were considered and analyzed in the covariance model to adjust the estimated means of the N2O–N flux. Based on the fit statistics obtained from the proposed models (Table 9.28), the model that best explains the variability of the N2O–N flux is model 5, as it provides the lowest values of AIC, AICC, BIC, and MSE (mean square error). Therefore, the model that provides the best fit is the one that includes the minimum environmental temperature.

The conditional fit statistics (part (a)) and the estimated variance components (part (b)) are shown in Table 9.29. The type III fixed effects tests (part (c)) indicate that there is a significant effect of Trt (P = 0.0008), time (P = 0.0288), the interaction Trt × time (P = 0.0140), the covariate Tmin (P = 0.0079), and the interaction Tmin × Trt (P = 0.038).

The average N2O–N emissions between Trt1 (12% CP: Crude Protein) and Trt2 (14% CP) were statistically different from each other. Treatment 1 emitted the highest N2O–N flux despite being the treatment with the lowest percentage of CP (Table 9.30).

# 9.7 Effect of a Chemical Salt on the Percentage Inhibition of the Fusarium sp.

To assess the tolerance of the fungus Fusarium sp. to different concentrations of a chemical salt, a bioassay was implemented to evaluate the percentage of inhibition of the fungus. The bioassay consisted of placing a nutritive culture medium in Petri dishes for fungal development, to which different concentrations of the salt (0, 500, 1000, and 2000 ppm) were added. Mycelium growth was measured over 6 days, and the percentage of inhibition of Fusarium sp. growth was calculated. Part of the data is shown below, and the complete dataset is in the Appendix (Data: Percentage inhibition).


Table 9.28 Fit statistics in the different models proposed


Table 9.29 Conditional fit statistics, variance components, and type III fixed effect tests





Following the same reasoning as in the previous examples, the components of the repeated measures GLMM with a beta response distribution for the percentage inhibition of Fusarium sp. (yijkl) are listed below:

Distributions: yijkl | ωkl, conc(ω)i(kl) ~ Beta(πijkl, ϕ); i = 1, ..., 4; j = 1, ..., 6; k = 1, 2; l = 1, ..., ri; with ωkl ~ N(0, σ²ω) and conc(ω)i(kl) ~ N(0, σ²conc(ω)).

Linear predictor: ηijkl = θ + conci + ωkl + conc(ω)i(kl) + timej + (conc × time)ij

where ηijkl is the linear predictor, θ is the intercept, conci is the fixed effect of salt concentration, ωkl is the random effect of the Petri dish within the bioassay, assuming ωkl ~ N(0, σ²ω), conc(ω)i(kl) is the random effect of salt concentration–Petri dish–bioassay, assuming conc(ω)i(kl) ~ N(0, σ²conc(ω)), timej is the fixed effect due to the day of measurement, and (conc × time)ij is the interaction effect of the chemical salt concentration with the day of measurement.

Link function: logit(πijkl) = ηijkl is the link function that relates the linear predictor to the mean (πijkl).

The following SAS program fits the beta GLMM with repeated measures.

```
proc glimmix data=inhibition method=laplace nobound; 
class Bio Day Conc Rep; 
model pct = Conc|Day/dist=beta link=logit; 
random intercept/subject=Conc(Bio) type=cs; 
lsmeans Conc|Day/lines ilink; 
run;
```
Before fitting the generalized linear mixed model, we compare the estimates of the covariance structures with the beta distribution in the response variable (Table 9.31 part (a)). According to the fit statistics, the covariance structures that best fit the data are the Toeplitz type (Toep(1)) and unstructured (UN).

Having defined the covariance structure, in this case, Toeplitz of order 1, we present part of the results of the data fit (Table 9.31 part (b)). The fit statistic Pearson's chi-square/DF = 1.07 indicates that there is no overdispersion and that the beta distribution fits the data adequately. The estimated variance component, under Toeplitz(1), of the concentration–repetition bioassay is σ²conc(ω) = 0.00285, and the scale parameter is ϕ = 52.281 (part (c)).


Table 9.31 Fit statistics for the conditional distribution and variance components



The fixed effects tests indicate that there is a highly significant effect of salt concentration (P = 0.0002), time (P = 0.0001), and the interaction concentration × time (P = 0.0074) on the growth inhibition of Fusarium sp. (Table 9.32).

The linear predictors and estimated probabilities of the factors (Table 9.33 parts (a) and (b)) and interaction (Table 9.34) are found under the columns "Estimate" and "Mean," respectively.

# 9.8 Carbon Dioxide (CO2) Emission as a Function of Soil Moisture and Microbial Activity

Productive agricultural soil requires a certain level of ventilation to maintain active plant root growth and soil microbial activity. One scientist found that soil oxygenation levels had been affected in soils fertilized with nutrient-rich sludge from a sewage treatment plant. The level of soil aeration can be reduced by (1) the high water content of the added sludge and the compaction caused by the heavy machinery used to apply it and, ironically, (2) the increased microbial activity that occurs when sludge with a high organic matter content is added. The objective of the research was to determine the moisture levels at which aeration becomes a limiting factor for microbial activity in the soil. The study included a control treatment (no sludge) and three treatments using sludge as a fertilizer with different moisture contents; the moisture levels for the fertilized soil were 0.24, 0.26, and 0.28 kg water/kg soil.

Table 9.33 Concentration and measurement time least square means on the model scale (Estimate) and the data scale (Mean)

Soil samples were randomly assigned to the four treatments in a completely randomized design. The samples were placed in sealed containers and incubated under conditions favorable to microbial activity. The soil was compacted in the containers to simulate the degree of compaction experienced in the field. Microbial activity, measured as an increase in CO2, was used as a measure of the level of soil oxygenation. The CO2 evolution per kilogram of soil per day in each container was measured at 2, 4, 6, and 8 days after the start of the incubation period. Microbial activity in each soil sample was recorded as the percentage increase in CO2 produced above the atmospheric level. The data are shown in Table 9.35.

The analysis of variance table for this experiment is shown below (Table 9.36).

Let pctijk be the percentage of CO2 emission, assuming that pctijk has a beta distribution with a mean πijk and scale parameter ϕ, i.e., pctijk~Beta(πijk, ϕ). The linear predictor ηijk that relates the mean to the link function is given by

$$\eta\_{ijk} = \theta + \alpha\_i + \alpha(r)\_{i(k)} + \tau\_j + (\alpha\tau)\_{ij}; \quad i = 1, \dots, 4; \; j = 1, \dots, 4; \; k = 1, 2, 3$$

where θ is the intercept, αi is the fixed effect of treatment i, α(r)i(k) is the random effect of treatment nested in repetition k, assuming α(r)i(k) ~ N(0, σ²α(r)), τj is the fixed effect of measurement time j, and (ατ)ij is the interaction effect of treatment with measurement time. The link function is defined by logit(πijk) = ηijk.
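Throughout these tables, the "Estimate" column reports values on the logit (model) scale and the "Mean" column their back-transform obtained with the "ilink" option; the conversion is the inverse logit, sketched here with hypothetical estimates:

```python
import math

def inv_logit(eta):
    """Back-transform a logit-scale estimate to a probability (the 'Mean')."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical model-scale estimates (logits) for three treatment means
for eta in (-2.0, 0.0, 1.5):
    print(f"Estimate = {eta:+.2f} -> Mean = {inv_logit(eta):.4f}")
```

Note that the back-transform is nonlinear, which is why standard errors on the two scales differ and why comparisons are made on the model scale.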

Table 9.34 Measuring time\*salt concentration interaction on the model scale (Estimate) and the data scale (Mean)

The following SAS syntax fits a GLMM on repeated measures with a beta distribution.

```
proc glimmix data=co2 method=laplace; 
class trt container time; 
model pct = trt|time/dist=beta link=logit; 
random trt/subject=container type=toep(1); 
lsmeans trt|time /lines ilink; 
run;
```
Part of the results is shown below. The fit statistics under different covariance structures (Table 9.37 part (a)), such as AIC and AICC, indicate that a Toeplitz-type covariance structure of order 1 provides the best fit to the dataset of this experiment.

Table 9.38 part (a) shows the estimated variance component due to treatment × repetition, i.e., σ²α(r) = 0.03363, and the estimated scale parameter ϕ = 790.82, and the hypothesis test (part (b)) indicates that the treatments yielded statistically different means (P = 0.0011).


Table 9.35 Repeated measurements of emissions of CO2 by bacterial activity in soil under different moisture conditions



Table 9.37 Fit statistics of the beta GLMM under different covariance structures


Table 9.39 shows the estimated average CO2 emissions of the tested treatments: the treatment with moisture 0.24 kg water/kg soil favored higher microbial activity, whereas the treatments with moisture levels 0.26 and 0.28 kg water/kg soil showed similar microbial activity to each other.



Table 9.39 Means and standard errors on the model scale (Estimate) and the data scale (Mean)

Fig. 9.2 CO2 emission as a measure of microbial activity

Figure 9.2 clearly shows that the treatment with moisture 0.24 kg water/kg soil provides the best conditions for soil microbial activity, whereas the rest of the treatments significantly affect the activity of microorganisms.

# 9.9 Effect of Soil Compaction and Soil Moisture on Microbial Activity

A soil scientist conducted an experiment to evaluate the effects of soil compaction and soil moisture on microbial activity. Ventilation levels may be restricted in highly saturated or compacted soils, thus reducing microbial activity. The experiment consisted of three levels of soil compaction (1.1, 1.4, and 1.6 mg soil/m3) and three levels of soil moisture (0.1, 0.2, and 0.24 kg water/kg soil). The treated soil samples were placed in sealed containers and incubated under conditions favorable to microbial activity. The percentage increase in CO2 produced above atmospheric levels was measured in each soil sample. The experimental design was completely randomized (CRD) with a 3 × 3 factorial treatment structure. Two replicate soil containers were prepared for each treatment. The evolution of CO2 per kilogram of soil per day was measured on three successive days. The data from this experiment are shown below in Table 9.40.

The analysis of variance table for this experiment is shown below (Table 9.41).

Let pctijkl be the percentage of CO2 emission and assume that pctijkl has a beta distribution with mean πijkl and scale parameter ϕ, i.e., pctijkl ~ Beta(πijkl, ϕ). The linear predictor ηijkl that relates the mean to the link function is given by

$$\eta\_{ijkl} = \theta + \alpha\_i + \beta\_j + (\alpha\beta)\_{ij} + \alpha\beta(r)\_{ij(l)} + \tau\_k + (\alpha\tau)\_{ik} + (\beta\tau)\_{jk} + (\alpha\beta\tau)\_{ijk}$$


Table 9.40 Percentage of CO2 by bacterial activity as a function of soil density (mg soil/m3 ) and soil humidity (kg water/kg soil)


Table 9.41 Analysis of variance of an CRD with factorial structure of treatments in repeated measures

Note: Here, 1 degree of freedom was subtracted from the total observations of the experiment since there is a missing observation

$$i = 1, 2, 3, \; j = 1, 2, 3, \; k = 1, 2, 3, \; l = 1, 2$$

where θ is the intercept, αi is the fixed effect of the density factor, βj is the fixed effect of the humidity factor, (αβ)ij is the effect of the interaction between density and humidity, αβ(r)ij(l) is the random effect of the density × humidity × repetition interaction, assuming αβ(r)ij(l) ~ N(0, σ²αβ(r)), τk is the fixed effect of measurement time, (ατ)ik is the fixed effect of the interaction between density and measurement time, (βτ)jk is the fixed effect of the interaction between humidity and measurement time, and (αβτ)ijk is the fixed effect of the density × humidity × time interaction. The link function is defined by logit(πijkl) = ηijkl.

The following SAS GLIMMIX syntax fits a repeated measures GLMM with a beta distribution.

```
proc glimmix data=co2_fact nobound method=laplace; 
class density humidity rep time; 
model pct = density|humidity|time/dist=beta link=logit; 
random density*humidity/subject=rep type=toep(1); 
lsmeans density|humidity|time/lines ilink; 
run;
```
Part of the results is listed below. The fit statistics (AIC and AICC) in Table 9.42 part (a) indicate that a Toeplitz covariance structure of order 1 provides the best fit to the data.

The type III tests of fixed effects in Table 9.43 indicate that soil density (P = 0.0021), humidity (P = 0.0001), the evolution of emission over time (P = 0.0001), and the interaction between humidity and measurement time (P = 0.0001) are statistically significant.


Table 9.42 Fit statistics of a beta GLMM with a factorial structure of treatments under different covariance structures

Table 9.43 Hypothesis testing of the factors under study


The least squares means obtained with the "lsmeans" command are shown on the model scale under the "Estimate" column and on the data scale under the "Mean" column of Table 9.44.

# 9.10 Joint Model for Binary and Poisson Data

Another advantage of the GLIMMIX procedure is the ability to fit models to data where the distribution and/or link function varies across response variables. This is accomplished through the specification of DIST=BYOBS or LINK=BYOBS in the model definition. The dataset created below provides an example of a bivariate outcome, reflecting the condition and length of hospital stay of 32 patients who underwent herniorrhaphy. These data were provided by Mosteller and Tukey (1977) and reproduced by Hand et al. (1994) (Table 9.45).

For each patient, two responses were recorded. A binary response takes the value one if a patient experienced a routine recovery and the value zero if postoperative intensive care was required. The second response variable is a count variable that


Table 9.44 Means and standard errors and comparison of means (least significance difference (LSD)) on the model scale (Estimate) and data scale (Mean)

(b) T grouping of density\*humidity least squares means (α = 0.05)



measures the length of hospital stay after the surgery (in days). The binary variable "OKstatus" is a regressor variable that distinguishes patients according to their postoperative physical status ("1" implies better status), and the variable age is the age of the patient.

These data could be modeled with a separate logistic model for the binary outcome and a Poisson model for the count outcome, but such separate analyses would ignore the correlation between the two response variables. It is reasonable to assume that the duration of post-surgery hospitalization depends on whether the patient required intensive care.

In the following analysis, the correlation between the two types of response variables for a patient is modeled with shared random effects (G-side). The dataset variable "dist" identifies the distribution of each observation. For the observations that follow a binary distribution, the response variable option "(event='1')" determines which value of the binary variable is modeled as the event of interest. Since no "link" option is specified, the link is also chosen on an observation-by-observation basis as the default link for the respective distribution. The following GLIMMIX commands fit this dataset with two distributions:


Table 9.45 Hospital condition and length of stay of patients

```
data Poi_Bin; 
length dist $7; 
input d$ patient age OKstatus response @@; 
if d = 'B' then dist='Binary'; 
else dist='Poisson'; 
datalines; 
B 1 78 1 0 P 1 78 1 9 B 2 60 1 0 P 2 60 1 4 
B 3 68 1 1 P 3 68 1 7 B 4 62 0 1 P 4 62 0 35 
................................. 
```


Table 9.46 Model information

```
.................................... 
..................................... 
.................................. 
B 29 54 1 0 P 29 54 1 2 B 30 43 1 1 P 30 43 1 3 
B 31 4 1 1 P 31 4 1 3 B 32 52 1 1 P 32 52 1 8 
; 
proc glimmix data=joint; 
class patient dist; 
model response(event='1') = dist dist*age dist*OKstatus / 
noint s dist=byobs(dist); 
random int / subject=patient; 
lsmeans dist/lines ilink; 
run;
```
Some of the output is shown below. Table 9.46 ("Model information") shows that the distribution of the data is multivariate and that multiple link functions may be involved; by default, proc GLIMMIX uses a logit link for the binary observations and a log link for the Poisson observations.

Table 9.47 shows the value of the fit statistic generalized chi-square/DF = 0.90, which indicates that there is no overdispersion; it also shows the estimated variance component due to patient, $\widehat{\sigma}^2_{\text{patient}} = 0.299$. The tests of the fixed effects of age and status are shown in part (c).
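The generalized chi-square/DF statistic is a Pearson-type ratio: values near 1 suggest that the variance implied by the assumed distribution matches the data, while values well above 1 signal overdispersion. A minimal sketch of the idea for an intercept-only Poisson fit, using simulated data and pure Python (not the hospital data of this example):

```python
import math
import random

random.seed(7)

# Simulated Poisson(4) data; for an intercept-only Poisson fit the
# estimated mean is simply ybar, and the Pearson chi-square divided by
# the residual degrees of freedom should be close to 1 when the
# Poisson mean-variance assumption holds.
def poisson(lam):
    u, k, term = random.random(), 0, math.exp(-lam)
    acc = term
    while u > acc:
        k += 1
        term *= lam / k
        acc += term
    return k

y = [poisson(4.0) for _ in range(500)]
mu = sum(y) / len(y)                       # fitted mean (one parameter)
pearson = sum((yi - mu) ** 2 / mu for yi in y)
ratio = pearson / (len(y) - 1)             # chi-square / DF
print(round(ratio, 2))                     # near 1 for well-specified Poisson
```

If the data were generated with extra variability (for example, a gamma-mixed Poisson), this ratio would sit well above 1, which is the signal that a scale parameter or a different distribution is needed.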

In addition to the above results, the maximum likelihood estimators of the intercepts, as well as the values of the slopes of each of the variables of both probability distributions, are tabulated in Table 9.48.

Thus, to calculate the probability that a patient will experience a routine recovery, the following expression is used:

$$\widehat{\pi} = \frac{1}{1 + \exp\left\{-\widehat{\beta}_0 - \widehat{\beta}_1 \times \text{age} - \widehat{\beta}_2 \times \text{OKstatus}\right\}}$$

$$= \frac{1}{1 + \exp\left\{-5.7783 + 0.07872 \times \text{age} + 0.4697 \times \text{OKstatus}\right\}}$$


Table 9.48 Maximum likelihood estimators for fixed effects


whereas the following expression is used to calculate the average value of the length of hospital stay after the surgery (in days):

$$\widehat{\lambda} = \exp\left\{\widehat{\alpha}_0 + \widehat{\alpha}_1 \times \text{age} + \widehat{\alpha}_2 \times \text{OKstatus}\right\} = \exp\left\{0.8410 + 0.01875 \times \text{age} - 0.1896 \times \text{OKstatus}\right\}$$
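Plugging the reported estimates into the two expressions gives the predictions directly. The sketch below evaluates both for an arbitrary example patient (age 60, OKstatus = 1; these input values are illustrative, not taken from the dataset):

```python
import math

# Example patient (hypothetical values, for illustration only)
age, okstatus = 60, 1

# Binary part: probability of routine recovery, written exactly as in the
# text, pi = 1 / (1 + exp(-5.7783 + 0.07872*age + 0.4697*OKstatus))
q = -5.7783 + 0.07872 * age + 0.4697 * okstatus
pi_hat = 1 / (1 + math.exp(q))

# Poisson part: expected length of stay (days), log link
lam_hat = math.exp(0.8410 + 0.01875 * age - 0.1896 * okstatus)

print(round(pi_hat, 3), round(lam_hat, 2))  # prints 0.642 5.91
```

For this patient, the model predicts roughly a 64% chance of routine recovery and an expected stay of about 6 days.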

# 9.11 Exercises

Exercise 9.11.1 Consider an experiment in which three treatments are compared. There are r blocks of n animals, each formed using grouping criteria relevant to the experiment. Within each block, one animal is randomly assigned to each treatment. A measurement was taken on animals at "week 0," when treatments were applied, and again at weeks 4 and 12. Variables measured included weight, the presence or absence of disease symptoms, and severity of symptoms, classified as "worse," "no change," or "better." The focus of this exercise is on the repeated measures analysis of the last two types of data listed above: binary responses and ordinal responses/ratings, in a design with both a repeated measures and a treatment factor structure. Regardless of whether the observations are


Table 9.49 Results of a repeated measures experiment with an ordinal response variable

normally distributed, categorical, or have some other distribution, a general approach to repeated measures analysis based on the linear mixed model uses the following general form:

Observation = systematic between-subjects variation + random between-subjects variation + systematic within-subjects variation + random within-subjects variation

The following table shows the data from an experiment in which each cell contains the number of animals in a given treatment × week × response category combination (Table 9.49). Using these data, provide:

	- (i) An evaluation of the effects of the combination of treatments
	- (ii) Odds ratio interpretation
	- (iii) The expected probability per category for each treatment

Repeat (b) through (d) of Exercise 9.11.1, assuming a generalized multinomial logit model. Discuss your results.

Repeat (b) through (d) of Exercise 9.11.1, assuming a multinomial cumulative probit model. Discuss your results and compare them with those found in (1) and (2).
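The cumulative logit and cumulative probit models differ only in the CDF F used in P(Y ≤ j) = F(θ_j − η), so their fitted category probabilities are usually close. The sketch below evaluates both inverse links for a hypothetical three-category ordinal response; the thresholds and linear predictor are made-up values, not estimates from the exercise.

```python
from math import erf, exp, sqrt

def Phi(x):
    """Standard normal CDF (inverse link of the cumulative probit)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def inv_logit(x):
    """Logistic CDF (inverse link of the cumulative logit)."""
    return 1 / (1 + exp(-x))

thresholds = [-0.8, 0.9]   # theta_1 < theta_2 for a 3-category response (hypothetical)
eta = 0.3                  # hypothetical linear predictor for one treatment

results = {}
for name, F in [("logit", inv_logit), ("probit", Phi)]:
    cum = [F(t - eta) for t in thresholds] + [1.0]   # P(Y <= j)
    results[name] = [cum[0], cum[1] - cum[0], cum[2] - cum[1]]

for name, probs in results.items():
    print(name, [round(p, 3) for p in probs])
```

Under either link the three category probabilities are positive and sum to 1; comparing the two sets of probabilities is one concrete way to "discuss and compare" the logit and probit fits requested above.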

Alternatively, the contingency table approach can be implemented using a log-linear model. For Exercise 9.11.1, fit the log-linear model

$$\log\left(\lambda_{ijk}\right) = \mu + \tau_i + \varpi_j + \left(\tau\varpi\right)_{ij} + c_k + \left(\tau c\right)_{ik} + \left(\tau\varpi c\right)_{ijk}$$

where $\lambda_{ijk}$ is the expected count for treatment i, week j, and response category k, and τ, ϖ, and c refer to treatment, week, and response category effects, respectively.
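As a side note, if the three-way term (τϖc) is dropped, the expected counts of such a log-linear model can be computed without any GLM software via iterative proportional fitting (IPF), which rescales a fitted table until all two-way margins match the observed ones. A sketch on a made-up 2 × 2 × 2 table (the counts below are hypothetical, not the data of Table 9.49):

```python
import itertools

# Hypothetical 2x2x2 contingency table: counts[(treatment, week, response)]
counts = {cell: c for cell, c in zip(
    itertools.product(range(2), repeat=3),
    [12, 5, 7, 9, 8, 10, 6, 15])}

def fit_margin(m, obs, keep):
    """Rescale fitted table m so its margin over the axes in `keep`
    matches the corresponding observed margin."""
    cells = list(itertools.product(range(2), repeat=3))
    for key in itertools.product(range(2), repeat=len(keep)):
        sel = [c for c in cells if tuple(c[a] for a in keep) == key]
        o = sum(obs[c] for c in sel)   # observed margin total
        f = sum(m[c] for c in sel)     # current fitted margin total
        for c in sel:
            m[c] *= o / f
    return m

# Start from a table of ones and cycle over the three two-way margins;
# the limit is the MLE of the no-three-way-interaction log-linear model.
m = {c: 1.0 for c in itertools.product(range(2), repeat=3)}
for _ in range(100):
    for keep in [(0, 1), (0, 2), (1, 2)]:
        m = fit_margin(m, counts, keep)
```

After convergence, log m reproduces a model containing all main effects and two-way interactions but no three-way term; comparing m with the observed counts gives a direct check of that term.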


Table 9.50 Nitrogen injection treatment factors study

Exercise 9.11.2 Fertilization of turf has traditionally been accomplished through surface applications. The introduction of new equipment (Hydroject) has made it possible to place soluble materials below the surface (Table 9.50).

A study was conducted during the 1997 growing season to compare surface application and subsoil injection of nitrogen on the green color of bentgrass (Agrostis palustris L. Huds) 1 year after transplanting. The treatment structure was a full factorial of grass management practice (four levels) and nitrogen application rate (two levels, in g/m²). The eight treatment combinations were arranged in a completely randomized design with four replications. Turf color was evaluated in each experimental unit at weekly intervals for 4 weeks as poor, average, good, or excellent.

Of particular interest were the water injection effect, the subsurface injection effect, and the comparison of injection versus surface applications. These are contrasts between the levels of the management practice factor; a primary objective was to determine whether this factor interacts with the rate of application.

	- (i) An evaluation of the effects of the combination of treatments
	- (ii) Interpretation of the odds ratios
	- (iii) The expected probability per category for each treatment

(c) Test whether the proportional odds assumption is viable. Cite relevant evidence to support your conclusion regarding the adequacy of the assumption.

Exercise 9.11.3 Refer to Exercise 9.11.1.

	- (i) An evaluation of the effects of the combination of treatments
	- (ii) Interpretation of odds ratios
	- (iii) The expected probability per category for each treatment

Exercise 9.11.4 Refer to Exercise 9.11.1.

	- (i) An evaluation of the effects of the combination of treatments
	- (ii) Interpretation of the odds ratios
	- (iii) The expected probability per category for each treatment


# Appendix


Data: Feeding line experiment (the Run number applies to both tray/feeding pairs in a row; "." denotes a missing value)

| Tray | Feeding | Proportion | Run | Tray | Feeding | Proportion |
|------|---------|------------|-----|------|---------|------------|
| L | 1 | 0.70601 | 1 | L | 3 | 0.75524 |
| L | 1 | 0.68817 | 2 | L | 3 | 0.77249 |
| L | 1 | 0.68317 | 3 | L | 3 | . |
| L | 1 | 0.77805 | 4 | L | 3 | 0.84204 |
| L | 1 | 0.76692 | 5 | L | 3 | 0.81572 |
| L | 1 | 0.79127 | 6 | L | 3 | 0.79161 |
| L | 1 | 0.73653 | 7 | L | 3 | 0.81234 |
| L | 1 | 0.74939 | 8 | L | 3 | 0.81795 |
| L | 1 | 0.78773 | 9 | L | 3 | 0.8225 |
| L | 1 | 0.7381 | 10 | L | 3 | 0.79384 |
| L | 1 | 0.88486 | 11 | L | 3 | 0.8135 |
| L | 1 | 0.90401 | 12 | L | 3 | 0.83965 |
| H | 2 | 0.07547 | 1 | H | 4 | 0.07105 |
| H | 2 | 0.05801 | 2 | H | 4 | 0.05511 |
| H | 2 | 0.0565 | 3 | H | 4 | 0.05217 |
| H | 2 | 0.09579 | 4 | H | 4 | 0.10567 |
| H | 2 | 0.10954 | 5 | H | 4 | 0.0755 |
| H | 2 | 0.12154 | 6 | H | 4 | 0.0853 |
| H | 2 | 0.1144 | 7 | H | 4 | 0.09363 |
| H | 2 | 0.13728 | 8 | H | 4 | 0.11154 |
| H | 2 | 0.15012 | 9 | H | 4 | 0.16264 |
| H | 2 | 0.12113 | 10 | H | 4 | 0.09215 |
| H | 2 | 0.17633 | 11 | H | 4 | 0.1834 |
| H | 2 | 0.16408 | 12 | H | 4 | 0.21016 |
| L | 2 | 0.78318 | 1 | L | 4 | 0.76556 |
| L | 2 | 0.78418 | 2 | L | 4 | 0.78307 |
| L | 2 | 0.78589 | 3 | L | 4 | 0.76486 |
| L | 2 | 0.78867 | 4 | L | 4 | 0.82391 |
| L | 2 | 0.81988 | 5 | L | 4 | 0.8044 |
| L | 2 | 0.82793 | 6 | L | 4 | 0.81178 |
| L | 2 | 0.81384 | 7 | L | 4 | 0.84339 |
| L | 2 | 0.81037 | 8 | L | 4 | 0.78833 |
| L | 2 | 0.77528 | 9 | L | 4 | 0.79804 |
| L | 2 | 0.78916 | 10 | L | 4 | 0.82236 |
| L | 2 | 0.87109 | 11 | L | 4 | 0.83807 |
| L | 2 | 0.84704 | 12 | L | 4 | 0.85532 |






Data: Percentage inhibition (Bio bioassay, Con concentration, Rep repetition, Por percentage inhibition)



Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# References

Agresti A (2013) Categorical data analysis, 3rd edn. Wiley, Hoboken



